In today’s tutorial we are going to build on our last blog post by using the function we created to aggregate the ENTIRE history of team performance data. Once the data is collected, we will analyze it primarily using dplyr and wrap up with some ggvis-generated visualizations.
As an ode to a music group near to my heart, I have decided to assign naming rights to this process, which will help you tackle data questions big and small.
The first thing we need to do after firing up R is load (or install) the packages we’re going to use.
options(stringsAsFactors = F)
###Load Packages and Bring In Function
c('pipeR','dplyr','rvest','RCurl','ggvis') -> packages
#install.packages('ggvis') if you don't have this package you best install it
lapply(packages,library,character.only = T)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
##
## Attaching package: 'rvest'
##
## The following object is masked from 'package:utils':
##
## history
##
## Loading required package: bitops
Next, we have to feed the function we built into R.
This process is somewhat complicated since GitHub uses https. If you have any questions about the functions being used here, or anywhere in this tutorial, I highly recommend that you use R’s fantastic help function by entering ?functionName.
'https://raw.githubusercontent.com/abresler/blog_code/master/nba/functions/getBREFTeamStats.R' -> dope_function
dope_function %>>% (getURL(url = .,ssl.verifypeer=FALSE)) %>>%
(parse(text = .)) -> get
eval(expr = get)
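To see what the parse and eval steps are doing without touching the network, here is a minimal sketch that treats a local string as if it were the downloaded script. The function say_hi and its body are made up purely for illustration; they are not part of the real file.

```r
# Stand-in for the text that getURL() would return (hypothetical content)
script_text <- "say_hi <- function(name) paste('Hi,', name)"

# parse() turns the raw text into an unevaluated R expression
parsed <- parse(text = script_text)

# eval() runs that expression, which defines say_hi in the workspace
eval(parsed)

greeting <- say_hi("NBA")
print(greeting)
```

The same two steps are what bring getBREFTeamStatTable into the workspace above, only with the text coming from GitHub instead of a local string.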
You should see the function getBREFTeamStatTable in your workspace, but before we can use it we must decide which data we want to investigate. How about the entire history of NBA team statistics?
After some perusing it looks like the first official season was 1949-1950. Perfect, we know what to feed our function to make her happy: every season end from 1950 to 2015.
How do we go about that, you ask? Though R has multiple options up to this task, today we are going to use the magic of a for loop. Since every season besides 2014-2015 is in the record books, we are going to separate the historic seasons from the current 2014-2015 season and then bind the 2 data frames together. That way we are only scraping the historic data once.
### Historic Seasons
1950:2014 -> season_ends #these are the years
data.frame() %>>% tbl_df -> all_historic_years #create an empty data frame which will contain all the years
#our FOR loop for every season end from 1950 to 2014
for(s in season_ends){
getBREFTeamStatTable(season_end = s, date = T,table_name = 'team')-> table
s -> table$season_end #add a numeric year to make sorting easy
all_historic_years %>>% rbind_list(table) -> all_historic_years #bind year with the master table
table %>>% rm #remove the year table to free up memory
}
#Save Historic Data Somewhere
#all_historic_years %>>%
#write.csv('2014/november/loop_analyze_visualize/historic_team_data_1950_2014.csv',row.names = F)
#Pull in 2015
getBREFTeamStatTable(table_name = 'team',season_end = 2015, date = T) -> data_2015
2015 -> data_2015$season_end
#Bind Both Together -- sort descending
all_historic_years %>>%
rbind_list(data_2015) %>>%
arrange(desc(season_end)) -> all_nba_team_data
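For readers on a newer dplyr, where rbind_list has been superseded by bind_rows, the same loop-and-bind idea can be sketched as below. This is only an illustration under assumptions: fake_scrape is a hypothetical stand-in for getBREFTeamStatTable that returns one made-up row per season, so nothing is actually scraped.

```r
library(dplyr)
library(pipeR)

# Hypothetical stand-in for getBREFTeamStatTable(): one made-up row per season
fake_scrape <- function(season_end) {
  data.frame(team = "Example Team", pts.g = 100, season_end = season_end,
             stringsAsFactors = FALSE)
}

season_ends <- 1950:1953

# Build one data frame per season, then bind them all in a single step
season_tables <- lapply(season_ends, fake_scrape)

all_years <- season_tables %>>%
  bind_rows %>>%
  arrange(desc(season_end)) # newest season first, as in the post
```

Collecting the pieces in a list and binding once avoids growing the master data frame on every pass through the loop, which matters more as the number of seasons grows.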
That was pretty easy.
season | table_name | bref_team_id | team | g | mp | fg | fga | fg. | X3p | X3pa | X3p. | X2p | X2pa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2014-2015 | team_stats | DAL | Dallas Mavericks | 9.00 | 2160.00 | 364.00 | 755.00 | 0.48 | 80.00 | 227.00 | 0.35 | 284.00 | 528.00 |
2014-2015 | team_stats | POR | Portland Trail Blazers | 9.00 | 2160.00 | 353.00 | 775.00 | 0.46 | 97.00 | 251.00 | 0.39 | 256.00 | 524.00 |
2014-2015 | team_stats | TOR | Toronto Raptors | 9.00 | 2160.00 | 329.00 | 742.00 | 0.44 | 64.00 | 195.00 | 0.33 | 265.00 | 547.00 |
2014-2015 | team_stats | PHO | Phoenix Suns | 8.00 | 1970.00 | 304.00 | 674.00 | 0.45 | 69.00 | 205.00 | 0.34 | 235.00 | 469.00 |
2014-2015 | team_stats | GSW | Golden State Warriors | 8.00 | 1920.00 | 309.00 | 629.00 | 0.49 | 76.00 | 200.00 | 0.38 | 233.00 | 429.00 |
2014-2015 | team_stats | BOS | Boston Celtics | 7.00 | 1680.00 | 289.00 | 615.00 | 0.47 | 55.00 | 183.00 | 0.30 | 234.00 | 432.00 |
Now that we are in possession of the 1363 row data frame containing every team’s statistical performance since the 1949-50 season it’s time to explore it in order to better understand the data and find interesting ideas through visualization.
One nice way to quickly analyze a bunch of a data frame's numeric data is with the summary function. Let’s use it on our NBA data frame.
all_nba_team_data %>>% summary #summary of the variables
## season table_name bref_team_id
## Length:1363 Length:1363 Length:1363
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## team g mp fg
## Length:1363 Min. : 6.0 Min. : 1465 Min. : 214
## Class :character 1st Qu.:82.0 1st Qu.:19755 1st Qu.:2932
## Mode :character Median :82.0 Median :19805 Median :3179
## Mean :78.3 Mean :19087 Mean :3087
## 3rd Qu.:82.0 3rd Qu.:19855 3rd Qu.:3491
## Max. :82.0 Max. :20080 Max. :3980
## NA's :1 NA's :141 NA's :1
## fga fg. X3p X3pa
## Min. : 496 Min. :0.3100 Min. : 10.0 Min. : 75.0
## 1st Qu.:6526 1st Qu.:0.4410 1st Qu.:133.8 1st Qu.: 423.5
## Median :6899 Median :0.4570 Median :333.0 Median : 963.5
## Mean :6784 Mean :0.4538 Mean :329.3 Mean : 943.4
## 3rd Qu.:7380 3rd Qu.:0.4748 3rd Qu.:493.2 3rd Qu.:1388.2
## Max. :9295 Max. :0.5450 Max. :891.0 Max. :2371.0
## NA's :1 NA's :1 NA's :379 NA's :379
## X3p. X2p X2pa X2p.
## Min. :0.1040 Min. : 168 Min. : 358 Min. :0.3100
## 1st Qu.:0.3140 1st Qu.:2466 1st Qu.:5249 1st Qu.:0.4550
## Median :0.3430 Median :2785 Median :5996 Median :0.4750
## Mean :0.3293 Mean :2849 Mean :6102 Mean :0.4682
## 3rd Qu.:0.3620 3rd Qu.:3449 3rd Qu.:7197 3rd Qu.:0.4910
## Max. :0.4280 Max. :3972 Max. :9295 Max. :0.5580
## NA's :379 NA's :1 NA's :1 NA's :1
## ft fta ft. orb
## Min. : 99 Min. : 124 Min. :0.6250 Min. : 57
## 1st Qu.:1486 1st Qu.:1984 1st Qu.:0.7310 1st Qu.: 919
## Median :1646 Median :2196 Median :0.7510 Median :1044
## Mean :1635 Mean :2183 Mean :0.7498 Mean :1022
## 3rd Qu.:1836 3rd Qu.:2443 3rd Qu.:0.7700 3rd Qu.:1172
## Max. :2434 Max. :3411 Max. :0.8320 Max. :1520
## NA's :1 NA's :1 NA's :1 NA's :260
## drb trb ast stl
## Min. : 174 Min. : 243 Min. : 115 Min. : 34.0
## 1st Qu.:2335 1st Qu.:3374 1st Qu.:1679 1st Qu.: 585.5
## Median :2440 Median :3533 Median :1854 Median : 655.0
## Mean :2363 Mean :3610 Mean :1812 Mean : 647.5
## 3rd Qu.:2545 3rd Qu.:3763 3rd Qu.:2041 3rd Qu.: 731.5
## Max. :3074 Max. :6131 Max. :2575 Max. :1059.0
## NA's :260 NA's :18 NA's :1 NA's :260
## blk tov pf pts
## Min. : 21.0 Min. : 76 Min. : 101 Min. : 614
## 1st Qu.:347.0 1st Qu.:1175 1st Qu.:1731 1st Qu.: 7820
## Median :399.0 Median :1281 Median :1875 Median : 8340
## Mean :399.5 Mean :1279 Mean :1833 Mean : 8047
## 3rd Qu.:460.0 3rd Qu.:1433 3rd Qu.:2028 3rd Qu.: 8881
## Max. :716.0 Max. :2011 Max. :2470 Max. :10371
## NA's :260 NA's :260 NA's :1 NA's :1
## pts.g playoff_team scrape_time
## Min. : 70.0 Mode :logical Min. :2014-11-14 16:36:47
## 1st Qu.: 96.6 FALSE:584 1st Qu.:2014-11-14 16:37:02
## Median :102.4 TRUE :779 Median :2014-11-14 16:37:12
## Mean :102.5 NA's :0 Mean :2014-11-14 16:37:11
## 3rd Qu.:108.6 3rd Qu.:2014-11-14 16:37:22
## Max. :126.5 Max. :2014-11-14 16:37:31
## NA's :1
## season_end
## Min. :1950
## 1st Qu.:1978
## Median :1992
## Mean :1990
## 3rd Qu.:2004
## Max. :2015
##
We can now see all sorts of interesting things. For example, the highest ever team field goal percentage was 54.5% [the 1984-85 Lakers]. I wonder if we can quickly find all the unique NBA teams that ever played? Super easy to do in R.
all_nba_team_data %>>%
select(team) %>>% #select the team
arrange(team) %>>% #sort it so
unique %>>% # gives us the unique results
unlist %>>% #comes in a list form so need to unlist it
as.character ## every NBA Team
## [1] "Anderson Packers"
## [2] "Atlanta Hawks"
## [3] "Baltimore Bullets"
## [4] "Boston Celtics"
## [5] "Brooklyn Nets"
## [6] "Buffalo Braves"
## [7] "Capital Bullets"
## [8] "Charlotte Bobcats"
## [9] "Charlotte Hornets"
## [10] "Chicago Bulls"
## [11] "Chicago Packers"
## [12] "Chicago Stags"
## [13] "Chicago Zephyrs"
## [14] "Cincinnati Royals"
## [15] "Cleveland Cavaliers"
## [16] "Dallas Mavericks"
## [17] "Denver Nuggets"
## [18] "Detroit Pistons"
## [19] "Fort Wayne Pistons"
## [20] "Golden State Warriors"
## [21] "Houston Rockets"
## [22] "Indiana Pacers"
## [23] "Indianapolis Olympians"
## [24] "Kansas City Kings"
## [25] "Kansas City-Omaha Kings"
## [26] "Los Angeles Clippers"
## [27] "Los Angeles Lakers"
## [28] "Memphis Grizzlies"
## [29] "Miami Heat"
## [30] "Milwaukee Bucks"
## [31] "Milwaukee Hawks"
## [32] "Minneapolis Lakers"
## [33] "Minnesota Timberwolves"
## [34] "New Jersey Nets"
## [35] "New Orleans Hornets"
## [36] "New Orleans Jazz"
## [37] "New Orleans Pelicans"
## [38] "New Orleans/Oklahoma City Hornets"
## [39] "New York Knicks"
## [40] "New York Nets"
## [41] "Oklahoma City Thunder"
## [42] "Orlando Magic"
## [43] "Philadelphia 76ers"
## [44] "Philadelphia Warriors"
## [45] "Phoenix Suns"
## [46] "Portland Trail Blazers"
## [47] "Rochester Royals"
## [48] "Sacramento Kings"
## [49] "San Antonio Spurs"
## [50] "San Diego Clippers"
## [51] "San Diego Rockets"
## [52] "San Francisco Warriors"
## [53] "Seattle SuperSonics"
## [54] "Sheboygan Red Skins"
## [55] "St. Louis Bombers"
## [56] "St. Louis Hawks"
## [57] "Syracuse Nationals"
## [58] "Toronto Raptors"
## [59] "Tri-Cities Blackhawks"
## [60] "Utah Jazz"
## [61] "Vancouver Grizzlies"
## [62] "Washington Bullets"
## [63] "Washington Capitols"
## [64] "Washington Wizards"
## [65] "Waterloo Hawks"
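If all we need is the sorted names and a count, a shorter route is possible in base R. The sketch below runs on a tiny made-up data frame (teams_df is hypothetical) rather than the scraped data.

```r
# Tiny made-up stand-in for the team column of all_nba_team_data
teams_df <- data.frame(
  team = c("Utah Jazz", "Boston Celtics", "Utah Jazz", "Atlanta Hawks"),
  stringsAsFactors = FALSE
)

# unique() drops the repeats, sort() puts the names in alphabetical order
team_names <- sort(unique(teams_df$team))

# length() then answers "how many distinct teams are there?"
n_teams <- length(team_names)
```

Swapping teams_df for all_nba_team_data should give the same list of names printed above.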
I wonder if this arrogant troll knew that or could tell us how many official NBA teams there have been since 1949-50??
As fun as it is to explore the variables we already have in the data frame, we need to move on to creating some of our own. One thing this data frame doesn’t tell us is how many points each team scored during the course of the season. I’d also like to see a metric that captures the number of points per field goal attempt. Seems like a daunting task, and it would be in Excel, but we hate Excel and don’t need it when we have nuclear weapons like R. Yet again, there are countless ways to achieve both of these things in R, but my favorite way is to use dplyr’s mutate function.
all_nba_team_data %>>%
filter(!is.na(g)) %>>%
mutate(points_total = g * pts.g,
points_per_fga = points_total / fga) -> all_nba_team_data
all_nba_team_data %>>%
select(team,season,points_total, points_per_fga) #make sure it worked
## Source: local data frame [1,362 x 4]
##
## team season points_total points_per_fga
## 1 Dallas Mavericks 2014-2015 963.9 1.276689
## 2 Portland Trail Blazers 2014-2015 948.6 1.224000
## 3 Toronto Raptors 2014-2015 948.6 1.278437
## 4 Phoenix Suns 2014-2015 838.4 1.243917
## 5 Golden State Warriors 2014-2015 838.4 1.332909
## 6 Boston Celtics 2014-2015 732.2 1.190569
## 7 Brooklyn Nets 2014-2015 831.2 1.272894
## 8 Sacramento Kings 2014-2015 934.2 1.326989
## 9 Chicago Bulls 2014-2015 932.4 1.307714
## 10 Houston Rockets 2014-2015 826.4 1.341558
## .. ... ... ... ...
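To check the arithmetic by hand, here is a small sketch of the same mutate on two rows. The games and field goal attempts come from the table earlier in the post; the pts.g values (107.1 and 104.8) are back-calculated from the printed points_total, so treat them as an assumption.

```r
library(dplyr)
library(pipeR)

# Two rows rebuilt from the post's output; pts.g is back-calculated, not scraped
toy <- data.frame(
  team  = c("Dallas Mavericks", "Golden State Warriors"),
  g     = c(9, 8),
  pts.g = c(107.1, 104.8),
  fga   = c(755, 629),
  stringsAsFactors = FALSE
)

toy %>>%
  mutate(points_total   = g * pts.g,           # season points = games x points per game
         points_per_fga = points_total / fga   # can reference the column created just above
  ) -> toy
```

Note that mutate lets the second new column use the first one inside the same call, which is why points_per_fga can be built from points_total right away.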
Cool like Joe Johnson at the end of a game with the Brooklyn Nets looking for the win. But since we went to all that work to add those new columns, we should at least explore them a little. How about trying to find which team had the highest ever points per field goal attempt?
all_nba_team_data %>>%
filter(max(points_per_fga) == points_per_fga) %>>%
select(team, season_end, points_per_fga, playoff_team)
## Source: local data frame [1 x 4]
##
## team season_end points_per_fga playoff_team
## 1 Utah Jazz 1995 1.376369 TRUE
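A small caveat on the filter above: max returns NA if the column contains any missing values, which would leave the filter with nothing to match. A hedged sketch of a more defensive version, on made-up numbers, passes na.rm = TRUE.

```r
library(dplyr)
library(pipeR)

# Made-up data with a deliberate missing value
toy <- data.frame(
  team = c("A", "B", "C"),
  points_per_fga = c(1.21, NA, 1.34),
  stringsAsFactors = FALSE
)

# na.rm = TRUE makes max() ignore the NA; filter() drops rows where the test is NA
toy %>>%
  filter(points_per_fga == max(points_per_fga, na.rm = TRUE)) -> best
```

In the real data the earlier filter on g appears to have removed the one incomplete row already, so this is a safety net rather than a required fix.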
Well, well, well those 1994-95 Utah Jazz. That was a pretty good team with superstars like Felton Spencer, John Crotty, Blue Edwards, Adam Keefe and this bad ass at age 35.
I don’t know, maybe that team had a few other decent players, just can’t think of them right now! This whole analysis thing has been fun and we have just scratched the surface of what R can do, but we came here to visualize, so let’s do it.
No better way to think about it than those lyrics from the late Tupac Shakur. We can see the data of course, all 44,946 pieces of it, but it’s hard to really see what’s going on in a tabular format. That, ladies and gentlemen, is what data visualization is for.
Before we get started visualizing we need to decide what exactly we want to explore. I keep hearing about this Daryl Morey character and how he is into all these crazy things, including data science. I also hear rumors that he is encouraging a game plan centered around shooting tons of three pointers and even tried to build his roster around doing that.
Can data analysis tell us anything about this and the history of the three point shot? Of course it can.
First we need to create a data frame that includes only the seasons since the advent of the Three Point Shot Era. We could use Google to find the answer, but we are learning to be data scientists, so let’s use R to answer the question of what years cover this era. How can R do this? When we looked at the summary of our data frame we noticed that the X3p., X3p and X3pa columns all contained NA values. Hmmm, maybe that’s the answer: if we filter out the NAs in one of those columns, intuition says that should work.
all_nba_team_data %>>%
filter(!is.na(X3pa)) -> era3pt_shot #create the 3pt era data frame by filtering out NAs; remember that ! means NOT and is.na is a function that finds NA values
Looks like R got us the right answer and if you don’t trust R, this should do the trick.
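One way to read the answer off the filtered data is to look at the earliest and latest season_end that survive the filter. The sketch below uses a made-up four-season data frame; the X3pa numbers are placeholders, not real totals.

```r
library(dplyr)
library(pipeR)

# Made-up data: no three point attempts recorded before the season ending in 1980
toy <- data.frame(
  season_end = 1978:1981,
  X3pa = c(NA, NA, 300, 320)
)

toy %>>%
  filter(!is.na(X3pa)) -> toy_era # same filter as above

# range() returns the first and last season left after the filter
era_years <- range(toy_era$season_end)
first_season <- era_years[1]
```

Running range on era3pt_shot$season_end in the same way should show which seasons the era data frame covers.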
We have one item to take care of before getting to the best part, and that’s adding a discrete column for the decade, which we intend to use to add some spice to our visualizations.
era3pt_shot$season_end %>>% #select season end
cut(breaks = c(0,1989,1999,2009,2015), #use cut to tell it end of each decade
labels = c('1980s','1990s','2000s','2010s') #label the new column
) -> era3pt_shot$decade
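To make the behavior of cut concrete, here is a small sketch using the same breaks and labels on a handful of hand-picked season ends. Each interval is open on the left and closed on the right by default, so a season ending in 1989 lands in the 1980s and one ending in 1990 lands in the 1990s.

```r
# Hand-picked season ends around the decade boundaries
season_end <- c(1980, 1989, 1990, 2000, 2010, 2015)

# Same breaks and labels as above; default intervals are (lower, upper]
decade <- cut(season_end,
              breaks = c(0, 1989, 1999, 2009, 2015),
              labels = c('1980s', '1990s', '2000s', '2010s'))

# Pair each season with its label to eyeball the result
decade_check <- data.frame(season_end, decade)
print(decade_check)
```

One thing to keep in mind: a season_end above the last break (2016, say) would come back as NA, so the final break needs updating as new seasons are added.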
Alright, we have what we need, it’s #DataViz time.
Let’s look at a colored Scatter Plot of Total Three Point Shot Attempts against Total Field Goal Attempts. Note that these variables are raw season totals and are not normalized across seasons [remember our 2015 season data is ongoing], so we must filter out the 2014-15 data in order to keep things consistent and not ruin the visualization.
era3pt_shot %>>%
filter(season_end != 2015) %>>% #filter out 2015
ggvis(x =~fga, y =~X3pa, fill=~factor(decade), shape=~factor(decade), fillOpacity := 0.65) %>>% #fga vs 3pt attempts colored by decade
layer_points() %>>% #ggvis nomenclature for scatter
add_axis("x", title = "Field Goal Attempts",
title_offset = 50) %>>% #fix x title
add_axis("y", title = "3PT Shot Attempts",
title_offset = 55) %>>% #fix y title
add_legend(c("fill",'shape'),
title = "Decade") #clean up the legend
Look at that beautiful ggvis scatter plot. What do we see here? Well, first it looks like back in the 1980s and 1990s teams on average used to take significantly more field goal attempts. We even see one insane outlier, the absolutely DREADFUL 1990-91 Denver Nuggets, who attempted 8,668 field goals [bonus challenge to those up for it: try to write the 2 lines of dplyr I used to figure this out].
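For anyone attempting the bonus challenge, one possible shape for the answer is sketched below. It is not necessarily the author’s two lines, and it runs on a made-up data frame (toy_era and its numbers are hypothetical) rather than era3pt_shot.

```r
library(dplyr)
library(pipeR)

# Made-up stand-in for era3pt_shot with hypothetical attempt totals
toy_era <- data.frame(
  team = c("Team A", "Team B", "Team C"),
  season_end = c(1990, 1991, 1992),
  fga = c(7200, 8500, 7050),
  stringsAsFactors = FALSE
)

# Sort by field goal attempts, largest first, and keep the top row
toy_era %>>%
  arrange(desc(fga)) %>>%
  head(1) -> most_attempts
```

Replacing toy_era with era3pt_shot and adding a select for team, season_end and fga should point at the outlier in the plot.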
In addition to that, it looks like as we move through the decades, despite fewer overall field goal attempts, teams appear to be taking more 3 point shots. There are all sorts of factors that may have influenced this pattern outside of the magical powers of Daryl Morey. I don’t want to give it away, but if you have time, try to investigate how the NBA has changed the Three Point Shot rules since its adoption in the 1979-80 season. Though we won’t get into it in this post, it looks like the data from the 1990s and the 2010s would be ripe for Clustering Analysis [hint: there may be actual clusters in what we see, or these very apparent clusters may be the result of a different external force].
This visualization is quite interesting, and the more we look at and think about it the more potential investigative topics may come up. But this visualization isn’t fair: it’s missing the 2015 season data and it isn’t easy to see if there have been trends in how the 3 point shot progressed as the years have passed.
I know a way: let’s add a new column that is an apples to apples comparison across all seasons since 1979-80. But what? Hmm… what are potential real variables that could be used to generate something like this? Two good options come to mind, minutes and games played. We could use either, but for the purposes of this next analysis let’s use minutes played. Since we are investigating three point shot attempts, let’s take that column and divide it by total minutes played.
era3pt_shot %>>%
mutate(X3pa_per_min = X3pa/mp) -> era3pt_shot #add the new variable
OK, we have the variables we need, so let’s get to it. For this visualization we want to look at 3 Point Shot Attempts Per Minute by Season End.
We could do this visually in any number of ways, but since I want to get fancy and showcase the power of easy regression analysis in ggvis, thanks to the hard work of the man, the myth, the legend Hadley Wickham, we are going to stick with a scatter plot and layer on a regression line.
era3pt_shot %>>%
ggvis(x =~season_end, y =~X3pa_per_min, fill = ~factor(decade), fillOpacity := 0.45) %>>%
add_axis("x", title = "", values = seq(from = 1980,to =2015,1), format="####", title_offset = 50,
properties = axis_props(
ticks = list(stroke = "black"),
majorTicks = list(strokeWidth = 2),
labels = list(
angle = 50,
fontSize = 11,
align = "left",
baseline = "middle",
dx = 3
))) %>>%
add_axis("y", title = "3PT Attempts Per Minute",
title_offset = 55, ticks = 20) %>>%
layer_points(shape=~factor(decade)) %>>%
group_by(decade) %>>%
layer_model_predictions(model = 'lm', stroke =~factor(decade), se = T) %>>%
add_legend(c("fill","stroke","shape"),orient = 'right', title = "Decade",
properties = legend_props(
title = list(fontSize = 12),
labels = list(fontSize = 10, dx = 10),
symbol = list(stroke = "black", strokeWidth = 1,
size = 100)
))
Look at this amazing regression scatter plot showing every team’s 3 point shot attempts per minute since the introduction of the shot. It appears there was certainly an upward trend from the 1980s into the late 1990s and another starting in the mid 2000s through this season. There are a host of potential explanatory factors and other follow-on analyses we could perform based on this viz, some of which may be topics in forthcoming posts, but before we call it a day we want to look at, and visualize, one last item.
When looking at this graph it should become apparent that, with only a few exceptions, each season’s leader in three point shot attempts per minute appears to be higher than the prior season’s leader, signifying a potential trend. In fancy statistical words, this possible trend appears to display linearity. Appearances can deceive though, and we need to thoroughly investigate this hypothesis in keeping with the data ninja code of mathematically proving the possibility of a relationship.
There are a number of ways to do this while adhering to this code, all of which R or any good programming language makes easy for you. In this case we are going to use the results of a linear model, which in R is easiest to fit with the lm function.
era3pt_shot %>>%
group_by(season_end) %>>% #group by season
filter(X3pa_per_min == max(X3pa_per_min)) %>>% #take the max of each year
(lm(X3pa_per_min ~ season_end, data = .)) %>>% #apply lm against season
summary #let's look at the summary data to see the fit
##
## Call:
## lm(formula = X3pa_per_min ~ season_end, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.019115 -0.006928 -0.000761 0.002920 0.032767
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.5236007 0.3623104 -18.01 <2e-16 ***
## season_end 0.0033032 0.0001814 18.21 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01131 on 34 degrees of freedom
## Multiple R-squared: 0.907, Adjusted R-squared: 0.9043
## F-statistic: 331.7 on 1 and 34 DF, p-value: < 2.2e-16
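For anyone who wants the slope and R-squared as numbers rather than reading them off the printout, they can be pulled from the fitted object. The sketch below fits lm to made-up league-leader values (the leaders data frame is hypothetical, built from a 0.003 per season trend plus a small wobble), so its numbers will not match the output above.

```r
# Made-up league-leading rates: a steady upward trend plus a small wobble
leaders <- data.frame(
  season_end = 1980:1989,
  X3pa_per_min = 0.02 + 0.003 * (0:9) + rep(c(0.001, -0.001), 5)
)

fit <- lm(X3pa_per_min ~ season_end, data = leaders)

# coef() returns the named estimates; the season_end entry is the per-season change
slope <- unname(coef(fit)["season_end"])

# summary() stores the share of variance explained as r.squared
r_squared <- summary(fit)$r.squared
```

In the real fit above, the season_end estimate of 0.0033 says the league-leading rate rose by roughly 0.0033 attempts per minute each season.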
Our instincts served us well. There is unquestionably a linear relationship over time: the rate posted by the team that leads the league in 3 point shot attempts per minute has been going up year by year, and the season alone accounts for about 91% of the variation in it [R-squared of 0.907]. What this means for the future we can’t say definitively, and I by no means would extrapolate that this trend will continue, but historically it has. Now for our final visualization let’s look at which teams actually led the league in three point shot attempts per minute.
Maybe, but I think we could do better. We know that there are 5 players on the court per team at a time and that there are a minimum of 4 quarters, each 12 minutes long. That means, excluding overtime, there are 5 x 4 x 12 = 240 available minutes per game. So we can take our new variable and multiply it by 240 to get a version of it that contains the same information but per 240 minutes. Essentially it is a roundabout way that should nearly mimic three point shot attempts per game if we were to calculate it. Let’s add our new variable and then create a new data frame containing only the top performing teams by season.
era3pt_shot %>>%
mutate(X3pa_per_240_min = X3pa_per_min * 240) %>>%
group_by(season_end) %>>%
filter(X3pa_per_min == max(X3pa_per_min)) %>>%
select(season_end, team, X3pa_per_240_min) -> top_teams
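To see the per 240 minute conversion and the per-season filter working together, here is a sketch on four made-up team seasons. The team names and totals are hypothetical and chosen so the arithmetic is easy to follow.

```r
library(dplyr)
library(pipeR)

# Made-up totals: 19,800 team minutes is roughly a full season without much overtime
toy <- data.frame(
  season_end = c(2014, 2014, 2015, 2015),
  team = c("A", "B", "C", "D"),
  X3pa = c(1980, 990, 2475, 1485),
  mp   = c(19800, 19800, 19800, 19800),
  stringsAsFactors = FALSE
)

toy %>>%
  mutate(X3pa_per_min     = X3pa / mp,
         X3pa_per_240_min = X3pa_per_min * 240) %>>% # 5 players x 48 minutes
  group_by(season_end) %>>%
  filter(X3pa_per_min == max(X3pa_per_min)) %>>%     # max is taken within each season
  ungroup() %>>%
  select(season_end, team, X3pa_per_240_min) -> toy_leaders
```

Because the filter runs after group_by, the max is computed separately for each season, so one leader per season survives: team A at 24 attempts per 240 minutes and team C at 30.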
Now that we have the data and the new variable, let’s quickly visualize it. Since I am a stickler for colors matching the entities they are associated with, I want to find the right colors for each team. Fortunately I’ve already done this, and we can just read in the file with this data and add in the 2 missing teams [the San Diego Clippers and the late Seattle SuperSonics, RIP].
#Bring in Correct Colors
'https://asbcllc.com/data/NBA/team_colors.csv' %>>% read.csv %>>% data.frame %>>%
tbl_df -> active_colors
data.frame(team = top_teams$team %>>% unique) %>>% tbl_df %>>%
arrange(team) -> teams
teams %>>% merge(active_colors,all.x = T) -> teams
'#EE2944' -> teams[14,2] #add the San Diego Clippers
'#266A2E'-> teams[15,2] #add the Sonics
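The row numbers 14 and 15 above depend on how many teams end up in the table, so they could shift if the data changes. A sketch of an alternative that fills the gaps by team name is shown below. It assumes the color column is called primary_color, as used in the plotting code, and the small active_colors table with its '#123456' value is a made-up placeholder, not the real file.

```r
# Made-up stand-ins for the real tables
teams <- data.frame(
  team = c("Houston Rockets", "San Diego Clippers", "Seattle SuperSonics"),
  stringsAsFactors = FALSE
)
active_colors <- data.frame(
  team = "Houston Rockets",
  primary_color = "#123456", # placeholder color, not the real one
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps teams with no match; their color comes back as NA
teams <- merge(teams, active_colors, all.x = TRUE)

# Fill the two defunct teams by name instead of by row position
teams$primary_color[teams$team == "San Diego Clippers"]  <- "#EE2944"
teams$primary_color[teams$team == "Seattle SuperSonics"] <- "#266A2E"
```

Matching on the name keeps the assignment correct even if another team is added to, or dropped from, the list of league leaders.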
We are now ready to create our final visualization.
top_teams %>>%
ggvis(x =~season_end, #season
y =~X3pa_per_240_min, #our per 240 minute stat
text := ~team, #plot the team name
fill = ~team) %>>% #color the team
add_axis("x", title = "", #no title
values = seq(from = 1980,to =2015,1), #plot the years
format="####", title_offset = 50, #format the axis
properties = axis_props(
ticks = list(stroke = "black"),
majorTicks = list(strokeWidth = 2),
labels = list(
angle = 50,
fontSize = 11,
align = "left",
baseline = "middle",
dx = 3
))) %>>%
add_axis("y", title = "3PT Attempts Per 240 Available Minutes",
format="####", #uses d3 axis format
title_offset = 55, ticks = 20) %>>% #fix title and ticks
layer_text() %>>% #adds the text
scale_nominal(property = 'fill', #use our pretty colors
domain = as.character(teams$team),
range = as.character(teams$primary_color)) %>>%
add_legend("fill", orient = 'right',
title = "Team", #need a pretty legend
properties = legend_props(
title = list(fontSize = 12),
labels = list(fontSize = 10, dx = 10),
symbol = list(stroke = "black", strokeWidth = 1,
size = 100)
))
Wow, look at this amazing chart. This clearly validates the linearity we explored earlier and shows that in recent times this Daryl Morey guy has clearly pushed for and structured teams around taking lots of three point shots. What is even more interesting is that there appears to be another candy loving, last minute ETO signing, father of 8 who is nearly universally despised by Nets and Lakers fans and whose presence on a roster, whether by design or chance, appears as a constant in all but 1 of the teams that have led the NBA in 3 point shot attempts per 240 minutes since the 2009-10 season. Any idea who he may be? Here’s a hint.
Well, it looks like what we heard about Daryl Morey is true. He is optimizing his roster to do crazy things like taking insane amounts of threes. Will the strategy work, and can the Rockets keep up this barrage of 3 point shots? Only future us in late June will know for sure, but if I were a betting man, I’d think both could happen. But will execution of this strategy lead to a championship? Again, no one knows for sure, but looking at this data there were 2 teams that led the league in our 3 point shot attempts per 240 minutes who did win NBA titles. Who, you ask? Look into it, but they played in a state home to Cowboys and were coached by a guy who was nearly killed during a game by the baddest of the bad power forwards to play the game, Kermit Washington.
We did a lot today: we learned how to loop through a function, dug deeper into analyzing data and introduced the all powerful art of data visualization. We got down with L.A.V., something we should now be able to do anytime, anywhere as long as we have some data and
That wraps up today’s post. Here is the R source code and as always don’t hesitate to reach out to me with any questions or feedback on Twitter. I’ll be back soon with another post.