In today’s tutorial we are going to build off of our last blog post by using the function we created to aggregate the ENTIRE history of NBA team performance data. Once the data is collected, we will analyze it primarily using dplyr and wrap up with some ggvis-generated visualizations.

As an ode to a music group near to my heart, I have decided to give this process a name, one that will help you tackle data questions big and small.

L.A.V. [Loop-Analyze-Visualize]

By the end of this tutorial you will be down with it.


Let’s Get It Started In Here

The first thing we need to do after firing up R is load/install the packages we’re going to use.

options(stringsAsFactors = F)
###Load Packages and Bring In Function
c('pipeR','dplyr','rvest','RCurl','ggvis') -> packages
#install.packages('ggvis') if you don't have this package you best install it
lapply(packages,library,character.only = T)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## 
## Attaching package: 'rvest'
## 
## The following object is masked from 'package:utils':
## 
##     history
## 
## Loading required package: bitops

Next, we have to feed the function we built into R.

This process is somewhat complicated since GitHub serves files over HTTPS. If you have any questions about the functions being used here, or anywhere in this tutorial, I highly recommend using R’s fantastic help function by entering ?functionName

'https://raw.githubusercontent.com/abresler/blog_code/master/nba/functions/getBREFTeamStats.R' -> dope_function
dope_function %>>% (getURL(url = .,ssl.verifypeer=FALSE)) %>>%
    (parse(text = .)) -> get
eval(expr = get)
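If the parse/eval step feels like magic, here is a small offline sketch of what it does: the text downloaded from GitHub is just R source code, parse() turns that text into unevaluated expressions, and eval() runs them, which is what defines the function in your workspace. The function name add_two and its body are hypothetical stand-ins for the real file’s contents.

```r
# A hypothetical snippet of R source, standing in for the text returned by getURL()
code_text <- "add_two <- function(x) {\n  x + 2\n}"

# parse() converts the text into an expression vector without running it
parsed <- parse(text = code_text)
print(class(parsed))  # "expression"

# eval() runs the expression(s); here that defines add_two in the global environment
eval(parsed, envir = globalenv())

print(add_two(40))  # 42
```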

Getting Loopy to Suck In All the Data


You should now see the function getBREFTeamStatTable in your workspace, but before we can use it we must decide which data we want to investigate. How about the entire history of NBA team statistics?

After some perusing, it looks like the first official season was 1949-50. Perfect, now we know what to feed our function to make her happy: every season end from 1950 to 2015.

How do we go about that, you ask? Though R has multiple options up to this task, today we are going to use the magic of a for loop. Since every season besides 2014-15 is in the record books, we are going to separate the historic seasons from the current 2014-15 season and then bind the 2 data frames together; that way we only have to scrape the historic data once.

### Historic Seasons
1950:2014 -> season_ends #these are the years
data.frame() %>>% tbl_df -> all_historic_years #create an empty data frame which will contain all the years

#our FOR loop for every season end from 1950 to 2014
for(s in season_ends){
    getBREFTeamStatTable(season_end = s, date = T,table_name = 'team')-> table
    s -> table$season_end #add a numeric year to make sorting easy
    all_historic_years %>>% bind_rows(table) -> all_historic_years #bind year with the master table (bind_rows replaces the retired rbind_list)
    rm(table) #remove the year table before the next iteration
}

#Save Historic Data Somewhere
#all_historic_years %>>%
    #write.csv('2014/november/loop_analyze_visualize/historic_team_data_1950_2014.csv',row.names = F)

#Pull in 2015
getBREFTeamStatTable(table_name = 'team',season_end = 2015, date = T) -> data_2015
2015 -> data_2015$season_end

#Bind Both Together -- sort descending
all_historic_years %>>%
    bind_rows(data_2015) %>>% 
    arrange(desc(season_end)) -> all_nba_team_data

That was pretty easy
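Growing a data frame inside a loop works fine for 65 seasons, but each bind copies the whole table again. A common alternative is to collect each season’s table in a list and bind once at the end. Below is a base R sketch of that pattern; fake_season_table() is a hypothetical stand-in for getBREFTeamStatTable() so the example runs offline, and its numbers are made up.

```r
# Hypothetical stand-in for getBREFTeamStatTable(): returns a tiny made-up table
fake_season_table <- function(season_end) {
  data.frame(team = c("Team A", "Team B"),
             g = c(82, 82),
             stringsAsFactors = FALSE)
}

season_ends <- 1950:2014

# Pre-allocate one list slot per season instead of growing a data frame
tables <- vector("list", length(season_ends))
for (i in seq_along(season_ends)) {
  table_s <- fake_season_table(season_end = season_ends[i])
  table_s$season_end <- season_ends[i]  # numeric year for easy sorting
  tables[[i]] <- table_s
}

# Bind every season in a single step
all_historic_years <- do.call(rbind, tables)

print(dim(all_historic_years))  # 130 rows (65 seasons x 2 teams), 3 columns
```

With dplyr loaded, bind_rows(tables) can do the same final step.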

But Did It Work?


season     table_name  bref_team_id  team                    g     mp       fg      fga     fg.   X3p    X3pa    X3p.  X2p     X2pa
2014-2015  team_stats  DAL           Dallas Mavericks        9.00  2160.00  364.00  755.00  0.48  80.00  227.00  0.35  284.00  528.00
2014-2015  team_stats  POR           Portland Trail Blazers  9.00  2160.00  353.00  775.00  0.46  97.00  251.00  0.39  256.00  524.00
2014-2015  team_stats  TOR           Toronto Raptors         9.00  2160.00  329.00  742.00  0.44  64.00  195.00  0.33  265.00  547.00
2014-2015  team_stats  PHO           Phoenix Suns            8.00  1970.00  304.00  674.00  0.45  69.00  205.00  0.34  235.00  469.00
2014-2015  team_stats  GSW           Golden State Warriors   8.00  1920.00  309.00  629.00  0.49  76.00  200.00  0.38  233.00  429.00
2014-2015  team_stats  BOS           Boston Celtics          7.00  1680.00  289.00  615.00  0.47  55.00  183.00  0.30  234.00  432.00

Only one person can accurately describe the sense of achievement and accomplishment we’ve earned for successfully overcoming this data hurdle.

Time to Analyze

Now that we have a 1,363-row data frame containing every team’s statistical performance since the 1949-50 season, it’s time to explore it so we can better understand the data and find interesting ideas to visualize.

One nice way to quickly get a feel for a data frame’s numeric columns is the summary function. Let’s use it on our NBA data frame.

all_nba_team_data %>>% summary #summary of the variables
##     season           table_name        bref_team_id      
##  Length:1363        Length:1363        Length:1363       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##      team                 g              mp              fg      
##  Length:1363        Min.   : 6.0   Min.   : 1465   Min.   : 214  
##  Class :character   1st Qu.:82.0   1st Qu.:19755   1st Qu.:2932  
##  Mode  :character   Median :82.0   Median :19805   Median :3179  
##                     Mean   :78.3   Mean   :19087   Mean   :3087  
##                     3rd Qu.:82.0   3rd Qu.:19855   3rd Qu.:3491  
##                     Max.   :82.0   Max.   :20080   Max.   :3980  
##                     NA's   :1      NA's   :141     NA's   :1     
##       fga            fg.              X3p             X3pa       
##  Min.   : 496   Min.   :0.3100   Min.   : 10.0   Min.   :  75.0  
##  1st Qu.:6526   1st Qu.:0.4410   1st Qu.:133.8   1st Qu.: 423.5  
##  Median :6899   Median :0.4570   Median :333.0   Median : 963.5  
##  Mean   :6784   Mean   :0.4538   Mean   :329.3   Mean   : 943.4  
##  3rd Qu.:7380   3rd Qu.:0.4748   3rd Qu.:493.2   3rd Qu.:1388.2  
##  Max.   :9295   Max.   :0.5450   Max.   :891.0   Max.   :2371.0  
##  NA's   :1      NA's   :1        NA's   :379     NA's   :379     
##       X3p.             X2p            X2pa           X2p.       
##  Min.   :0.1040   Min.   : 168   Min.   : 358   Min.   :0.3100  
##  1st Qu.:0.3140   1st Qu.:2466   1st Qu.:5249   1st Qu.:0.4550  
##  Median :0.3430   Median :2785   Median :5996   Median :0.4750  
##  Mean   :0.3293   Mean   :2849   Mean   :6102   Mean   :0.4682  
##  3rd Qu.:0.3620   3rd Qu.:3449   3rd Qu.:7197   3rd Qu.:0.4910  
##  Max.   :0.4280   Max.   :3972   Max.   :9295   Max.   :0.5580  
##  NA's   :379      NA's   :1      NA's   :1      NA's   :1       
##        ft            fta            ft.              orb      
##  Min.   :  99   Min.   : 124   Min.   :0.6250   Min.   :  57  
##  1st Qu.:1486   1st Qu.:1984   1st Qu.:0.7310   1st Qu.: 919  
##  Median :1646   Median :2196   Median :0.7510   Median :1044  
##  Mean   :1635   Mean   :2183   Mean   :0.7498   Mean   :1022  
##  3rd Qu.:1836   3rd Qu.:2443   3rd Qu.:0.7700   3rd Qu.:1172  
##  Max.   :2434   Max.   :3411   Max.   :0.8320   Max.   :1520  
##  NA's   :1      NA's   :1      NA's   :1        NA's   :260   
##       drb            trb            ast            stl        
##  Min.   : 174   Min.   : 243   Min.   : 115   Min.   :  34.0  
##  1st Qu.:2335   1st Qu.:3374   1st Qu.:1679   1st Qu.: 585.5  
##  Median :2440   Median :3533   Median :1854   Median : 655.0  
##  Mean   :2363   Mean   :3610   Mean   :1812   Mean   : 647.5  
##  3rd Qu.:2545   3rd Qu.:3763   3rd Qu.:2041   3rd Qu.: 731.5  
##  Max.   :3074   Max.   :6131   Max.   :2575   Max.   :1059.0  
##  NA's   :260    NA's   :18     NA's   :1      NA's   :260     
##       blk             tov             pf            pts       
##  Min.   : 21.0   Min.   :  76   Min.   : 101   Min.   :  614  
##  1st Qu.:347.0   1st Qu.:1175   1st Qu.:1731   1st Qu.: 7820  
##  Median :399.0   Median :1281   Median :1875   Median : 8340  
##  Mean   :399.5   Mean   :1279   Mean   :1833   Mean   : 8047  
##  3rd Qu.:460.0   3rd Qu.:1433   3rd Qu.:2028   3rd Qu.: 8881  
##  Max.   :716.0   Max.   :2011   Max.   :2470   Max.   :10371  
##  NA's   :260     NA's   :260    NA's   :1      NA's   :1      
##      pts.g       playoff_team     scrape_time                 
##  Min.   : 70.0   Mode :logical   Min.   :2014-11-14 16:36:47  
##  1st Qu.: 96.6   FALSE:584       1st Qu.:2014-11-14 16:37:02  
##  Median :102.4   TRUE :779       Median :2014-11-14 16:37:12  
##  Mean   :102.5   NA's :0         Mean   :2014-11-14 16:37:11  
##  3rd Qu.:108.6                   3rd Qu.:2014-11-14 16:37:22  
##  Max.   :126.5                   Max.   :2014-11-14 16:37:31  
##  NA's   :1                                                    
##    season_end  
##  Min.   :1950  
##  1st Qu.:1978  
##  Median :1992  
##  Mean   :1990  
##  3rd Qu.:2004  
##  Max.   :2015  
## 

We can now see all sorts of interesting things; for example, the highest team field goal percentage ever was 54.5% [the 1984-85 Lakers]. I wonder if we can quickly find all the unique NBA teams that have ever played? Super easy to do in R.

all_nba_team_data %>>%
    select(team) %>>% #select the team
    arrange(team) %>>% #sort alphabetically
    unique %>>% # gives us the unique results
    unlist %>>% #comes in a list form so need to unlist it
    as.character ## every NBA Team
##  [1] "Anderson Packers"                 
##  [2] "Atlanta Hawks"                    
##  [3] "Baltimore Bullets"                
##  [4] "Boston Celtics"                   
##  [5] "Brooklyn Nets"                    
##  [6] "Buffalo Braves"                   
##  [7] "Capital Bullets"                  
##  [8] "Charlotte Bobcats"                
##  [9] "Charlotte Hornets"                
## [10] "Chicago Bulls"                    
## [11] "Chicago Packers"                  
## [12] "Chicago Stags"                    
## [13] "Chicago Zephyrs"                  
## [14] "Cincinnati Royals"                
## [15] "Cleveland Cavaliers"              
## [16] "Dallas Mavericks"                 
## [17] "Denver Nuggets"                   
## [18] "Detroit Pistons"                  
## [19] "Fort Wayne Pistons"               
## [20] "Golden State Warriors"            
## [21] "Houston Rockets"                  
## [22] "Indiana Pacers"                   
## [23] "Indianapolis Olympians"           
## [24] "Kansas City Kings"                
## [25] "Kansas City-Omaha Kings"          
## [26] "Los Angeles Clippers"             
## [27] "Los Angeles Lakers"               
## [28] "Memphis Grizzlies"                
## [29] "Miami Heat"                       
## [30] "Milwaukee Bucks"                  
## [31] "Milwaukee Hawks"                  
## [32] "Minneapolis Lakers"               
## [33] "Minnesota Timberwolves"           
## [34] "New Jersey Nets"                  
## [35] "New Orleans Hornets"              
## [36] "New Orleans Jazz"                 
## [37] "New Orleans Pelicans"             
## [38] "New Orleans/Oklahoma City Hornets"
## [39] "New York Knicks"                  
## [40] "New York Nets"                    
## [41] "Oklahoma City Thunder"            
## [42] "Orlando Magic"                    
## [43] "Philadelphia 76ers"               
## [44] "Philadelphia Warriors"            
## [45] "Phoenix Suns"                     
## [46] "Portland Trail Blazers"           
## [47] "Rochester Royals"                 
## [48] "Sacramento Kings"                 
## [49] "San Antonio Spurs"                
## [50] "San Diego Clippers"               
## [51] "San Diego Rockets"                
## [52] "San Francisco Warriors"           
## [53] "Seattle SuperSonics"              
## [54] "Sheboygan Red Skins"              
## [55] "St. Louis Bombers"                
## [56] "St. Louis Hawks"                  
## [57] "Syracuse Nationals"               
## [58] "Toronto Raptors"                  
## [59] "Tri-Cities Blackhawks"            
## [60] "Utah Jazz"                        
## [61] "Vancouver Grizzlies"              
## [62] "Washington Bullets"               
## [63] "Washington Capitols"              
## [64] "Washington Wizards"               
## [65] "Waterloo Hawks"

Look at Us, Masters of This Data

Wow I never knew there was a team called the Chicago Stags.

I wonder if this arrogant troll knew that or could tell us how many official NBA teams there have been since 1949-50??
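To turn the list into a count, wrap the unique names in length(). The sketch below uses a short hypothetical vector so it runs on its own; on the full data you would pass the team column instead, which should return 65, matching the last index printed above. Keep in mind this counts distinct team names, not franchises, so relocations and renames (the Seattle SuperSonics becoming the Oklahoma City Thunder, for example) count separately.

```r
# Hypothetical sample of the team column, with repeats across seasons
team_column <- c("Chicago Stags", "Boston Celtics", "Boston Celtics",
                 "Seattle SuperSonics", "Oklahoma City Thunder",
                 "Chicago Stags")

unique_teams <- sort(unique(team_column))
print(unique_teams)

# Number of distinct team names
n_teams <- length(unique_teams)
print(n_teams)  # 4
```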



Time to Create Some New Variables

As fun as it is to explore the variables we already have in the data frame, we need to move on to creating some of our own. First, let’s compute each team’s season point total from games played and points per game (the existing pts column should hold the same number, so this doubles as a sanity check). Second, the data frame has no metric that captures the number of points per field goal attempt, so let’s add one. Seems like a daunting task, and it would be in Excel, but we hate Excel and don’t need it when we have nuclear weapons like R. Yet again there are countless ways to achieve both of these things in R, but my favorite is dplyr’s mutate function.

all_nba_team_data %>>%
    filter(!is.na(g)) %>>%
    mutate(points_total = g * pts.g,
                 points_per_fga = points_total / fga) -> all_nba_team_data
all_nba_team_data %>>% 
    select(team,season,points_total, points_per_fga) #make sure it worked
## Source: local data frame [1,362 x 4]
## 
##                      team    season points_total points_per_fga
## 1        Dallas Mavericks 2014-2015        963.9       1.276689
## 2  Portland Trail Blazers 2014-2015        948.6       1.224000
## 3         Toronto Raptors 2014-2015        948.6       1.278437
## 4            Phoenix Suns 2014-2015        838.4       1.243917
## 5   Golden State Warriors 2014-2015        838.4       1.332909
## 6          Boston Celtics 2014-2015        732.2       1.190569
## 7           Brooklyn Nets 2014-2015        831.2       1.272894
## 8        Sacramento Kings 2014-2015        934.2       1.326989
## 9           Chicago Bulls 2014-2015        932.4       1.307714
## 10        Houston Rockets 2014-2015        826.4       1.341558
## ..                    ...       ...          ...            ...
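For comparison, here is the same pair of columns computed in base R. The two rows use the Dallas and Golden State 2014-15 figures from the output above, with pts.g back-calculated as points_total / g (107.1 and 104.8), and fga taken from the table printed earlier. Treat it as a sketch of the arithmetic rather than a replacement for the mutate call.

```r
teams <- data.frame(
  team  = c("Dallas Mavericks", "Golden State Warriors"),
  g     = c(9, 8),          # games played
  pts.g = c(107.1, 104.8),  # points per game (back-calculated)
  fga   = c(755, 629),      # field goal attempts
  stringsAsFactors = FALSE
)

# Season point total = games x points per game
teams$points_total <- teams$g * teams$pts.g
# Points per field goal attempt
teams$points_per_fga <- teams$points_total / teams$fga

print(teams[, c("team", "points_total", "points_per_fga")])
```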

Cool like Joe Johnson at the end of a game with the Brooklyn Nets looking for the win. But since we went to all that work to add those new columns, we should at least explore them a little. How about finding which team had the highest points per field goal attempt ever?

all_nba_team_data %>>%
    filter(points_per_fga == max(points_per_fga, na.rm = TRUE)) %>>% #keep the row(s) with the highest value
    select(team, season_end, points_per_fga, playoff_team)
## Source: local data frame [1 x 4]
## 
##        team season_end points_per_fga playoff_team
## 1 Utah Jazz       1995       1.376369         TRUE
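The filter(... == max(...)) trick keeps every row tied for the maximum. If you only need a single row, base R’s which.max() returns the position of the first maximum and skips NA values. The small data frame below is made up for illustration.

```r
# Made-up team seasons for illustration
seasons <- data.frame(
  team = c("Team A", "Team B", "Team C"),
  season_end = c(1995, 1996, 1997),
  points_per_fga = c(1.30, NA, 1.35),
  stringsAsFactors = FALSE
)

# which.max() ignores NA and returns the position of the (first) maximum
best <- seasons[which.max(seasons$points_per_fga), ]
print(best)  # the Team C row
```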

Well, well, well, those 1994-95 Utah Jazz. That was a pretty good team, with superstars like Felton Spencer, John Crotty, Blue Edwards, Adam Keefe and this badass at age 35.


I don’t know, maybe that team had a few other decent players, I just can’t think of them right now! This whole analysis thing has been fun, and we’ve only scratched the surface of what R can do, but we came here to visualize, so let’s do it.

Visualize What You Can’t C

No better way to think about it than those lyrics from the late Tupac Shakur. We can see the data, of course, all 44,946 pieces of it, but it’s hard to really see what’s going on in a tabular format. That, ladies and gentlemen, is what data visualization is for.

What Do We Visualize?

Before we get started visualizing, we need to decide what exactly we want to explore. I keep hearing about this Daryl Morey character and how he is into all sorts of crazy things, including data science. I also hear rumors that he is encouraging a game plan centered on shooting tons of three pointers and has even tried to build his roster around doing that.


Can data analysis tell us anything about this and the history of the three point shot? Of course it can.

Step 1: Filter Down to the Three Point Era

First we need to create a data frame that includes only the seasons since the advent of the Three Point Shot Era. We could use Google to find the answer, but we are learning to be data scientists, so let’s use R to answer the question of which years cover this era. How can R do this? When we looked at the summary of our data frame, we noticed that the X3p., X3p and X3pa columns all contained NA values (379 each). Hmm, maybe that’s the answer: if we filter out the NAs in one of those columns, intuition says that should work.

all_nba_team_data %>>%
    filter(!is.na(X3pa)) -> era3pt_shot #create the 3pt era data frame by filtering out NAs; ! means NOT and is.na() finds NA values

It Worked, JEAH

Looks like R got us the right answer: the earliest season left in era3pt_shot is 1979-80, the season the three point shot was adopted. If you don’t trust R, a quick search will confirm it.
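To pull the era’s first season out of the data rather than eyeballing it, take the earliest season_end among rows where the three point columns are not NA. The sketch uses a made-up mini table in which pre-1980 seasons have NA attempts, mirroring the pattern in our summary.

```r
mini <- data.frame(
  season_end = c(1977, 1978, 1979, 1980, 1981),
  X3pa       = c(NA, NA, NA, 543, 380)  # made-up values; NA before the 3pt line
)

has_3pt <- !is.na(mini$X3pa)
first_3pt_season_end <- min(mini$season_end[has_3pt])
print(first_3pt_season_end)  # 1980, i.e. the 1979-80 season
```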

Step 2: Add A New Discrete Column for the Decade

We have one item to take care of before getting to the best part: adding a discrete column for the decade, which we intend to use to add some spice to our visualizations.

era3pt_shot$season_end %>>% #select season end
    cut(breaks = c(0,1989,1999,2009,2015), #each break is the last season_end of a decade
            labels = c('1980s','1990s','2000s','2010s') #label the new column
            ) -> era3pt_shot$decade
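It helps to check how cut() assigns boundary years. By default its intervals are closed on the right, so each break is the last year of its decade: 1989 lands in the 1980s bin and 1990 in the 1990s bin. Remember that season_end 1990 is the 1989-90 season, so decades here are defined by the year a season ends.

```r
years <- c(1980, 1989, 1990, 1999, 2000, 2009, 2010, 2015)

decade <- cut(years,
              breaks = c(0, 1989, 1999, 2009, 2015),  # right-closed: (0,1989], (1989,1999], ...
              labels = c("1980s", "1990s", "2000s", "2010s"))

print(data.frame(years, decade))
```

Any season_end past 2015 would come back NA, so the last break needs bumping as new seasons arrive.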

Step 3A: Get Hyped


Step 3B: Visualization Time

Alright, we have what we need, it’s #DataViz time.

Let’s look at a colored Scatter Plot of Total Three Point Shot Attempts against Total Field Goal Attempts. Note that these variables are season totals, not rates, so they are not comparable across seasons of different lengths [remember, our 2015 season data is ongoing]; we must filter out the 2014-15 data to keep things consistent and avoid ruining the visualization.

Time to Admire Our First Visualization

era3pt_shot %>>%
    filter(!season_end == 2015) %>>% #filter out 2015
    ggvis(x =~fga, y =~X3pa, fill=~factor(decade), shape=~factor(decade), fillOpacity := 0.65) %>>% #fga vs 3pt attempts colored by decade
    layer_points() %>>% #ggvis nomenclature for scatter
    add_axis("x", title = "Field Goal Attempts", 
                     title_offset = 50) %>>% #fix x title
    add_axis("y", title = "3PT Shot Attempts",
                     title_offset = 55) %>>% #fix y title
    add_legend(c("fill",'shape'), 
                         title = "Decade") #clean up the legend


Look at that beautiful ggvis scatter plot. What do we see here? Well, first it looks like back in the 1980s and 1990s teams on average used to take significantly more field goal attempts. We even see one insane outlier, the absolutely DREADFUL 1990-91 Denver Nuggets, who attempted a whopping 8,668 field goals [bonus challenge for those up for it: try to write the 2 lines of dplyr I used to figure this out].

In addition, it looks like as we move through the decades, despite fewer overall field goal attempts, teams appear to be taking more 3 point shots. There are all sorts of factors outside of the magical powers of Daryl Morey that may have influenced this pattern. I don’t want to give it away, but if you have time, try to investigate how the NBA has changed the Three Point Shot rules since its adoption in the 1979-80 season. Though we won’t get into it in this post, it looks like the data during the 1990s and the 2010s would be ripe for Clustering Analysis [hint: there may be actual clusters in what we see, or these very apparent clusters may be the result of different external forces].

This visualization is quite interesting, and the more we look at and think about it, the more potential investigative topics we may come across. But it isn’t fair: it’s missing the 2015 season data, and it isn’t easy to see whether there have been trends in how the 3 point shot progressed as the years have passed.

Step 4: Figure Out A Way to Create a Real Variable that Compares All Seasons Equally

I know a way: let’s add a new column that gives an apples-to-apples comparison across all seasons since 1979-80. But what? Hmm… what are potential real variables that could be used to generate something like this? Two good options come to mind: minutes and games played. We could use either, but for the purposes of this next analysis let’s use minutes played. Since we are investigating three point shot attempts, let’s take those and divide by total minutes played.

era3pt_shot %>>%
    mutate(X3pa_per_min = X3pa/mp) -> era3pt_shot #add the new variable
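As a quick sanity check on the new rate, here is the same division for three 2014-15 rows from the table near the top of the post. mp counts team minutes, which for a regulation game is 240 (5 players × 48 minutes), so Dallas’s 2,160 minutes are exactly 9 games, while Phoenix’s 1,970 minutes over 8 games imply some overtime.

```r
early_2015 <- data.frame(
  team = c("Dallas Mavericks", "Phoenix Suns", "Golden State Warriors"),
  g    = c(9, 8, 8),           # games played
  mp   = c(2160, 1970, 1920),  # team minutes played
  X3pa = c(227, 205, 200),     # three point attempts
  stringsAsFactors = FALSE
)

# Three point attempts per team minute
early_2015$X3pa_per_min <- early_2015$X3pa / early_2015$mp

print(round(early_2015$X3pa_per_min, 4))
```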

Step 5: Time to Visualize It, the Iggy Azalea Way


OK, we have the variables we need, so let’s get to it. For this visualization we want to look at 3 Point Shot Attempts Per Minute by Season End.

We could do this visually in any number of ways, but since I want to get fancy and showcase the power of easy regression analysis in ggvis, thanks to the hard work of the man, the myth, the legend Hadley Wickham, we are going to stick with a scatter plot and layer on a regression line for each decade.
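Grouping by decade before layer_model_predictions(model = 'lm') should fit a separate straight line within each decade. A rough base R equivalent is to split the data and call lm() on each piece. The toy data below is made up and deliberately built so each decade has its own exact slope, which makes the result easy to check.

```r
# Made-up data: two decades with different linear trends
toy <- data.frame(
  season_end = c(1980:1989, 1990:1999),
  decade     = rep(c("1980s", "1990s"), each = 10),
  stringsAsFactors = FALSE
)
toy$X3pa_per_min <- ifelse(toy$decade == "1980s",
                           0.002 * (toy$season_end - 1980) + 0.01,
                           0.005 * (toy$season_end - 1990) + 0.05)

# One lm per decade, analogous to group_by(decade) + layer_model_predictions(model = 'lm')
fits <- lapply(split(toy, toy$decade),
               function(d) lm(X3pa_per_min ~ season_end, data = d))

slopes <- sapply(fits, function(m) unname(coef(m)["season_end"]))
print(round(slopes, 4))  # 0.002 for the 1980s, 0.005 for the 1990s
```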

era3pt_shot %>>%
    ggvis(x =~season_end, y =~X3pa_per_min, fill = ~factor(decade), fillOpacity := 0.45) %>>%
    add_axis("x", title = "", values = seq(from = 1980,to =2015,1), format="####", title_offset = 50,
                     properties = axis_props(
                        ticks = list(stroke = "black"),
                        majorTicks = list(strokeWidth = 2),
                        labels = list(
                            angle = 50,
                            fontSize = 11,
                            align = "left",
                            baseline = "middle",
                            dx = 3
                        ))) %>>%
    add_axis("y", title = "3PT Point Attempts Per Minute",
                     title_offset = 55, ticks = 20) %>>%
    layer_points(shape=~factor(decade)) %>>%
    group_by(decade) %>>%
    layer_model_predictions(model = 'lm', stroke =~factor(decade), se  = T) %>>%
    add_legend(c("fill","stroke","shape"),orient = 'right', title = "Decade",
                         properties = legend_props(
                            title = list(fontSize = 12),
                            labels = list(fontSize = 10, dx = 10),
                            symbol = list(stroke = "black", strokeWidth = 1,
                                                        size = 100)
                         ))

Damn That Nearly 1,000 Point Plot Is FINE


Look at this amazing regression scatter plot showing every team’s 3 point shot attempts per minute in every season since the introduction of the shot. There appears to have been an upward trend from the 1980s into the late 1990s, and another starting in the mid 2000s and running through this season. There are a host of potential explanatory factors and follow-on analyses we could perform based on this viz, some of which may be topics in forthcoming posts, but before we call it a day we want to look at, and visualize, one last item.

When looking at this graph, it should become apparent that, with only a few exceptions, each season’s leader in three point shot attempts per minute appears to be higher than the prior season’s leader, signifying a potential trend. In fancy statistical words, this possible trend appears to be linear. Appearances can deceive, though, and we need to test this hypothesis thoroughly, in keeping with the data ninja code of backing up a possible relationship with math.

There are a number of ways to do this while adhering to the code, all of which R, or any good programming language, makes easy for you. In this case we are going to fit a linear model, and the easiest way to do that in R is with the lm function.

Step 6: Let’s Do It, Linear Regression That Is.

era3pt_shot %>>% 
    group_by(season_end) %>>% #group by season
    filter(X3pa_per_min == max(X3pa_per_min)) %>>%  #take the max of each year
    (lm(X3pa_per_min ~ season_end, data = .)) %>>% #apply lm against season
    summary #let's look at the summary data to see the fit
## 
## Call:
## lm(formula = X3pa_per_min ~ season_end, data = .)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.019115 -0.006928 -0.000761  0.002920  0.032767 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.5236007  0.3623104  -18.01   <2e-16 ***
## season_end   0.0033032  0.0001814   18.21   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01131 on 34 degrees of freedom
## Multiple R-squared:  0.907,  Adjusted R-squared:  0.9043 
## F-statistic: 331.7 on 1 and 34 DF,  p-value: < 2.2e-16
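For comparison, here is a compact base R sketch of the same two steps: keep each season’s leader with ave(), then regress on season_end. The data are made up; the leader’s rate is built to rise by exactly 0.003 per season, roughly the size of the 0.0033 estimate above, so the recovered slope can be checked.

```r
set.seed(1)
toy <- data.frame(
  season_end = rep(1980:1989, each = 3),
  team       = rep(c("Team A", "Team B", "Team C"), times = 10),
  stringsAsFactors = FALSE
)
# Team A leads every season with an exact linear trend; the others trail (made-up values)
toy$X3pa_per_min <- 0.003 * (toy$season_end - 1980) + 0.02 -
  ifelse(toy$team == "Team A", 0, runif(nrow(toy), 0.001, 0.01))

# Keep rows equal to their season's maximum, like group_by() + filter(x == max(x))
season_max <- ave(toy$X3pa_per_min, toy$season_end, FUN = max)
leaders <- toy[toy$X3pa_per_min == season_max, ]

fit <- lm(X3pa_per_min ~ season_end, data = leaders)
print(coef(fit))  # slope on season_end is 0.003
```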

Good Job, Intuition, We Were Right: That’s a 90% Fit to the Data


Our instincts served us well. There is a strong linear relationship over time (an R-squared of about 0.91), indicating that the league leader in 3 point shot attempts per minute has been climbing year by year. What this means for the future we can’t say definitively, and I by no means would extrapolate that this trend will continue, but historically it has. Now, for our final visualization, let’s look at which teams actually led the league in three point shot attempts per minute.

Step 7: Create A Better “Real” Variable to Plot With

Before we do this: although I like the real statistic we created, it is kind of hard to process what it is actually telling us. Put it this way, could we use it to explain this trend to these basketball loving ladies?

Maybe, but I think we could do better. We know that there are 5 players on the court per team at a time and that a game has at least 4 quarters of 12 minutes each. That means, excluding overtime, there are 5 × 4 × 12 = 240 available player-minutes per team per game. So we can take our new variable and multiply it by 240 to get a version that contains the same information but per 240 minutes; essentially, it is a roundabout way of approximating three point shot attempts per game. Let’s add our new variable and then create a new data frame containing only the top performing team in each season.

era3pt_shot %>>%
    mutate(X3pa_per_240_min = X3pa_per_min * 240) %>>%
    group_by(season_end) %>>%
    filter(X3pa_per_min == max(X3pa_per_min)) %>>%
    select(season_end, team, X3pa_per_240_min) -> top_teams
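To see how closely the per-240-minute stat tracks attempts per game, here is the comparison for the same three 2014-15 rows from the table near the top of the post. When a team has played no overtime, the two numbers match exactly (Dallas and Golden State); Phoenix’s extra overtime minutes pull its per-240 figure a bit below its per-game figure.

```r
early_2015 <- data.frame(
  team = c("Dallas Mavericks", "Phoenix Suns", "Golden State Warriors"),
  g    = c(9, 8, 8),           # games played
  mp   = c(2160, 1970, 1920),  # team minutes played
  X3pa = c(227, 205, 200),     # three point attempts
  stringsAsFactors = FALSE
)

# Attempts per 240 team minutes (one regulation game) vs. attempts per game
early_2015$X3pa_per_240_min <- early_2015$X3pa / early_2015$mp * 240
early_2015$X3pa_per_game    <- early_2015$X3pa / early_2015$g

print(early_2015[, c("team", "X3pa_per_240_min", "X3pa_per_game")])
```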

Step 8: Let’s Visualize These Teams

Now that we have the data and our new variable, let’s quickly visualize it. But since I am a stickler for colors matching the entities they are associated with, I want to find the right colors for each team. Fortunately I’ve already done this, so we can just read in the file with this data and add in the 2 missing teams [the San Diego Clippers and the late Seattle SuperSonics, RIP].

#Bring in Correct Colors
'http://asbcllc.com/data/NBA/team_colors.csv' %>>% read.csv %>>% data.frame %>>%
    tbl_df -> active_colors
data.frame(team = top_teams$team %>>% unique) %>>% tbl_df %>>%
    arrange((team)) -> teams
teams %>>% merge(active_colors,all.x = T) -> teams
'#EE2944' -> teams[14,2] #add the San Diego Clippers
'#266A2E'-> teams[15,2] #add the Sonics
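Because merge(..., all.x = TRUE) leaves NA for any team missing from the color file, filling those gaps by team name rather than by row number keeps the fix correct even if the row order changes. A minimal sketch: the two colors for the Clippers and Sonics come from the code above, while "Team X" and its color are hypothetical placeholders.

```r
# Hypothetical color file containing only one team
active_colors <- data.frame(team = "Team X",
                            primary_color = "#000000",  # placeholder color
                            stringsAsFactors = FALSE)

teams <- data.frame(team = c("Team X", "San Diego Clippers", "Seattle SuperSonics"),
                    stringsAsFactors = FALSE)

# Left join on the shared "team" column: teams absent from active_colors get NA
teams <- merge(teams, active_colors, all.x = TRUE)

# Fill the gaps by name instead of by row position
missing_colors <- c("San Diego Clippers" = "#EE2944",
                    "Seattle SuperSonics" = "#266A2E")
fill_rows <- is.na(teams$primary_color)
teams$primary_color[fill_rows] <- missing_colors[teams$team[fill_rows]]

print(teams)
```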

We are now ready to create our final visualization.

top_teams %>>%
    ggvis(x =~season_end, #season
                y =~X3pa_per_240_min, #by our per-240-minute stat
                text := ~team, #plot the team name
                fill = ~team) %>>% #color the team
    add_axis("x", title = "",  #no title
                     values = seq(from = 1980,to =2015,1), #plot the years
                     format="####", title_offset = 50, #format the axis
                     properties = axis_props(
                        ticks = list(stroke = "black"),
                        majorTicks = list(strokeWidth = 2),
                        labels = list(
                            angle = 50,
                            fontSize = 11,
                            align = "left",
                            baseline = "middle",
                            dx = 3
                        ))) %>>%
    add_axis("y", title = "3PT Attempts Per 240 Available Minutes", 
                     format="####", #uses d3 axis format
                     title_offset = 55, ticks = 20) %>>% #fix title and ticks
    layer_text() %>>% #adds the text 
    scale_nominal(property = 'fill', #use our pretty colors
                                domain = as.character(teams$team),
                                range = as.character(teams$primary_color)) %>>%
    add_legend("fill", orient = 'right', 
                         title = "Team", #need a pretty legend
                         properties = legend_props(
                            title = list(fontSize = 12),
                            labels = list(fontSize = 10, dx = 10),
                            symbol = list(stroke = "black", strokeWidth = 1,
                                                        size = 100)
                            ))

Wow, look at this amazing chart. It lines up with the linear trend we found earlier and shows that in recent times this Daryl Morey guy has clearly pushed for, and structured teams around, taking lots of three point shots. What is even more interesting is that there appears to be another candy loving, last minute ETO signing, father of 8, nearly universally despised by Nets and Lakers fans, whose presence on a roster, whether by design or chance, is a constant in all but 1 of the teams that have led the NBA in 3 point shot attempts per 240 minutes since the 2009-10 season. Any idea who he may be? Here’s a hint


Wrapping It All Up, You Can Now Forever Be Down with L.A.V.

Well, it looks like what we heard about Daryl Morey is true. He is optimizing his roster to do crazy things like taking insane amounts of threes. Will the strategy work, and can the Rockets keep up this barrage of 3 point shots? Only future us in late June will know for sure, but if I were a betting man, I’d say both could happen. But can executing this strategy lead to a championship? Again, no one knows for sure, but looking at this data, 2 of the teams that led the league in our 3 point shot attempts per 240 minutes did win NBA titles. Who, you ask? Look into it, but they played in a state home to the Cowboys and were coached by a guy who was nearly killed during a game by the baddest of the bad power forwards to play the game, Kermit Washington.

We did a lot today: we learned how to loop through a function, dug deeper into analyzing data and introduced the all-powerful art of data visualization. We got down with L.A.V., something we should now be able to do anytime, anywhere, as long as we have some data and


That wraps up today’s post. Here is the R source code and as always don’t hesitate to reach out to me with any questions or feedback on Twitter. I’ll be back soon with another post.