---
title: "Exploring NBA team salary cap room with nbastatR and heatmaply"
author: "Alex Bresler"
date: "2016-07-07"
output: 
  html_notebook: 
    css: semantic_css/semantic.min.css
    fig_height: 5
    fig_width: 12
    toc: no
---
<script src="semantic_css/semantic.min.js"></script>
<br>
<p>This [R Markdown Notebook](http://rmarkdown.rstudio.com/r_notebooks.html) provides a quick tutorial on how to use my package, [nbastatR](https://github.com/abresler/nbastatR), and the [plotly](https://plot.ly/) extension [heatmaply](https://github.com/talgalili/heatmaply) to create an interactive heatmap exploring the future cap space of the 30 NBA teams.</p>

<h4>Step 1: Load the required packages</h4>

If you don't have the following packages installed, run the installation code in the comments. Even if you already have them, I recommend running that code to update to the most recent development version of each package.

```{r load_packages, echo=T, message=FALSE, warning=FALSE, results='hide'}
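# Attach every package the notebook uses; install any that are missing
# with the command in its comment first.
lapply(
  c('heatmaply', # devtools::install_github('talgalili/heatmaply')
    'mclust', # install.packages('mclust')
    'dplyr', # devtools::install_github('hadley/dplyr')
    'plotly', # devtools::install_github('ropensci/plotly')
    'nbastatR', # devtools::install_github('abresler/nbastatR')
    'purrr', # devtools::install_github('hadley/purrr')
    'tidyr' # devtools::install_github('hadley/tidyr')
  ),
  library,
  character.only = TRUE
)

# formattable is only called as formattable::formattable() in the table chunk,
# so it must be installed but not attached: install.packages('formattable')
options(digits = 4)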

```
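If you're not sure which of these packages you still need, a quick check like the one below can help. This is a minimal sketch, not part of nbastatR: `find_missing_packages()` and `required_packages` are hypothetical names, and the list also includes `formattable`, which the table chunk further down calls as `formattable::formattable()`. It only reports what is missing; you still install each package with the command in its comment.

```r
# Hypothetical helper: report which required packages are not installed yet.
# It does not install anything; use the commands in the comments above for that.
required_packages <- c("heatmaply", "mclust", "dplyr", "plotly",
                       "nbastatR", "purrr", "tidyr", "formattable")

find_missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE, without an error, when a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_packages <- find_missing_packages(required_packages)
if (length(missing_packages) > 0) {
  message("Still need to install: ", paste(missing_packages, collapse = ", "))
}
```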

<h4>Step 2: Acquire the data</h4>

This is the most important step: bringing the data into R. To do that, we use `nbastatR` and its function `get_teams_yahoo_team_salary_data`, which pulls team salary data from Yahoo's fantastic new team salary pages.

```{r get_data}
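all_team_data <-
  get_teams_yahoo_team_salary_data(
    use_all_teams = TRUE, # request the salary pages for all 30 teams
    nest_data = FALSE     # one flat data frame, which the Step 3 code expects
  )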

```
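The munging in Step 3 only relies on four columns of `all_team_data` (`idSeason`, `slugTeamYahoo`, `nameItem`, and `value`), so it can be worth failing early if a download didn't return them. The helper below is a hypothetical sketch, not an nbastatR function, and the one-row data frame in the usage lines is made up.

```r
# Hypothetical guard: stop with a clear message if any column that the
# munging step relies on is missing from the downloaded data
check_salary_columns <- function(df,
                                 required = c("idSeason", "slugTeamYahoo",
                                              "nameItem", "value")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

# Usage with a made-up one-row data frame; on the real data you would call
# check_salary_columns(all_team_data)
toy_download <- data.frame(idSeason = "2017-18", slugTeamYahoo = "bos",
                           nameItem = "amountCapSpace", value = 20e6,
                           stringsAsFactors = FALSE)
check_salary_columns(toy_download)
```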


<h4>Step 3: Munge and summarize the data</h4>

The next step involves a little data munging. First, we convert the `idSeason` variable into an ordered factor so that the seasons keep their order.
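As a quick illustration of what `factor(ordered = TRUE)` does, here is a sketch with made-up season labels. It assumes `idSeason` holds strings like `"2016-17"`; `factor()` sorts the unique values to form the levels, and for labels in that form alphabetical order is also chronological order.

```r
# Made-up season labels in "YYYY-YY" form (an assumption about idSeason)
seasons <- c("2018-19", "2016-17", "2017-18", "2016-17")
season_factor <- factor(seasons, ordered = TRUE)

levels(season_factor)       # levels are the sorted unique labels
season_factor < "2018-19"   # ordered factors also support comparisons
```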

Because we only want to explore future available cap space, we limit the items to projected cap space and cap holds (if you don't know what a cap hold is, I highly recommend reading about them [here](http://www.cbafaq.com/salarycap.htm#Q14)).

Because we want to look at the data by team and season, we group by season, team, and item, sum each group, and convert the totals to millions of dollars. We then convert the data from [tidy](http://vita.had.co.nz/papers/tidy-data.html) long form to wide form, with one column per item.

Spreading the data creates `NA` values because some teams have no cap holds in future seasons, so we replace those `NA`s with zeros. We then create a new variable, `amountSpaceLessHold`, that adds the projected cap holds to the available cap space. This is the number we will explore: nominally, the amount of money each team has available to sign players while staying under the salary cap. Note that salary cap figures after 2016-17 are projections.
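To make the munging concrete, here is a small sketch of the same logic (filter, total per group in millions, long to wide, replace missing holds with zero, add holds to space) on made-up numbers. It swaps base R's `aggregate()` and `reshape()` in for the dplyr/tidyr verbs so it runs without extra packages. The teams, amounts, and the `otherItem` row are hypothetical, and I assume cap holds are recorded as negative amounts (consistent with the "Less Cap Holds" wording), so adding them reduces cap space.

```r
# Made-up rows with the same columns the notebook uses. Amounts are in
# dollars, and cap holds are ASSUMED to be stored as negative numbers.
toy_long <- data.frame(
  idSeason      = rep("2017-18", 5),
  slugTeamYahoo = c("bos", "bos", "bos", "bkn", "bkn"),
  nameItem      = c("amountCapSpace", "amountCapHold", "amountCapHold",
                    "amountCapSpace", "otherItem"),
  value         = c(20e6, -3e6, -2e6, 35e6, 50e6),
  stringsAsFactors = FALSE
)

# 1. Keep only cap space and cap holds ("otherItem" is dropped)
kept <- toy_long[toy_long$nameItem %in% c("amountCapSpace", "amountCapHold"), ]

# 2. Total each season/team/item and convert dollars to millions
totals <- aggregate(value ~ idSeason + slugTeamYahoo + nameItem,
                    data = kept, FUN = sum)
totals$value <- totals$value / 1e6

# 3. Long to wide: one column per item; "bkn" has no holds, so it gets NA
wide <- reshape(totals, idvar = c("idSeason", "slugTeamYahoo"),
                timevar = "nameItem", direction = "wide")

# 4. Treat missing holds as zero, then add the (negative) holds to the space
wide$value.amountCapHold[is.na(wide$value.amountCapHold)] <- 0
wide$amountSpaceLessHold <- wide$value.amountCapSpace + wide$value.amountCapHold
wide[, c("idSeason", "slugTeamYahoo", "amountSpaceLessHold")]
```

For these made-up rows, `bos` ends up at 20 - 3 - 2 = 15 (million) and `bkn`, which has no holds, stays at 35.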

A little more tidying then gets the data frame into the shape we want.

Finally, we keep only the variables we want to visualize (season and projected cap space) and use the team slug as the row names of the data.

```{r}
team_cap_space <-
  all_team_data %>%
  # ordered factor so the season columns stay in order
  mutate(idSeason = idSeason %>% factor(ordered = TRUE)) %>%
  # keep only projected cap space and cap holds
  dplyr::filter(nameItem %in% c('amountCapSpace', 'amountCapHold')) %>%
  dplyr::select(idSeason, slugTeamYahoo, nameItem, value) %>%
  group_by(idSeason, slugTeamYahoo, nameItem) %>%
  # total each item and convert dollars to millions
  summarise(value = sum(value, na.rm = TRUE) / 1000000) %>%
  ungroup() %>%
  # long to wide: one column per item
  spread(nameItem, value) %>%
  # teams with no future cap holds get NA after spreading; treat them as 0
  replace_na(list(amountCapHold = 0)) %>%
  mutate(amountSpaceLessHold = amountCapHold + amountCapSpace) %>%
  dplyr::select(idSeason, slugTeamYahoo, amountSpaceLessHold) %>%
  # one row per team, one column per season
  spread(idSeason, amountSpaceLessHold)

# Setting row names on a tibble is deprecated, so let column_to_rownames()
# convert to a plain data.frame and move the team slug into the row names,
# which heatmaply uses as row labels
plot_data <-
  team_cap_space %>%
  tibble::column_to_rownames(var = "slugTeamYahoo")
```
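The last reshaping step, one column per season with the team slug as the row names, can be sketched in base R as well. A plain `data.frame` fully supports row names, which is why the chunk above converts away from a tibble before relying on them. The numbers below are made up and the team slugs are hypothetical.

```r
# Made-up net cap space (millions), one row per team and season
toy_net <- data.frame(
  idSeason            = c("2017-18", "2018-19", "2017-18", "2018-19"),
  slugTeamYahoo       = c("bos", "bos", "bkn", "bkn"),
  amountSpaceLessHold = c(15, 40, 35, 60),
  stringsAsFactors    = FALSE
)

# One column per season, one row per team
season_wide <- reshape(toy_net, idvar = "slugTeamYahoo",
                       timevar = "idSeason", direction = "wide")

# Move the team slug into the row names of a plain data.frame and drop it
# as a column, leaving only numeric season columns for the heatmap
rownames(season_wide) <- season_wide$slugTeamYahoo
toy_plot_data <- season_wide[, names(season_wide) != "slugTeamYahoo"]

# Tidy "amountSpaceLessHold.2017-18" back to "2017-18"
names(toy_plot_data) <- sub("^amountSpaceLessHold\\.", "", names(toy_plot_data))
toy_plot_data
```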

#### Team cap space by season, excluding cap holds

This data makes some assumptions that are unlikely to hold in reality: that no player exercises an [Early Termination Option](http://www.cbafaq.com/salarycap.htm#Q59), and that both [Team Options](http://www.cbafaq.com/salarycap.htm#Q59) and [Player Options](http://www.cbafaq.com/salarycap.htm#Q59) are always exercised. Put another way, it assumes that every player plays out his contract to its full term. While that assumption is unrealistic, it also tells us that, in all likelihood, teams will have more cap space than we see here whenever Player/Team options are declined or Early Termination Options are exercised.

<br>
```{r results = 'asis', echo=F}
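p2 <- 
  plot_data

# Look up each team's Yahoo salary page by its slug instead of relying on
# the sorted URL vector lining up with the rows
team_urls <-
  all_team_data %>%
  dplyr::distinct(slugTeamYahoo, urlTeamSalaryYahoo)

salary_urls <-
  team_urls$urlTeamSalaryYahoo[match(team_cap_space$slugTeamYahoo, team_urls$slugTeamYahoo)]

# Turn each row name into a link to that team's salary page
rownames(p2) <-
  paste0("<a href='", salary_urls, "' target='_blank'>", team_cap_space$slugTeamYahoo, "</a>")
formattable::formattable(p2)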
```

<h4>Step 4: Calculate the optimal number of clusters</h4>

This is an important step. There are many ways to estimate the optimal number of clusters in a set of data. I personally prefer the [mclust](http://www.stat.washington.edu/mclust/) package, which we will use today: `Mclust()` fits Gaussian mixture models with different numbers of components and keeps the one with the best BIC, and the number of components in that model becomes our number of clusters. For those interested in other methods for choosing the number of clusters, [this stackoverflow post](http://stackoverflow.com/questions/15376075/cluster-analysis-in-r-determine-the-optimal-number-of-clusters) is an absolute **MUST** read.

```{r team_clusters, include=TRUE}
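# Mclust() keeps the best-BIC Gaussian mixture model; G is its number of
# components (the same value as the number of columns of its z matrix)
team_clusters <- Mclust(plot_data)$G
team_clusters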
```
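For comparison with mclust's BIC-based choice, a common simpler alternative is the "elbow" method: run k-means for a range of k and look for the point where the total within-cluster sum of squares stops dropping sharply. This is a swapped-in technique, not what this notebook uses, sketched in base R on made-up data with three obvious groups; `elbow_wss()` is a hypothetical helper.

```r
# Made-up data: 30 points in three well-separated groups (10 points each)
set.seed(42)
toy_matrix <- rbind(
  matrix(rnorm(20, mean = 0), ncol = 2),
  matrix(rnorm(20, mean = 10), ncol = 2),
  matrix(rnorm(20, mean = 20), ncol = 2)
)

# Hypothetical helper: total within-cluster sum of squares for k = 1..max_k.
# nstart > 1 reruns k-means from several random starts and keeps the best fit.
elbow_wss <- function(x, max_k = 6, nstart = 10) {
  vapply(seq_len(max_k), function(k) {
    kmeans(x, centers = k, nstart = nstart)$tot.withinss
  }, numeric(1))
}

wss <- elbow_wss(toy_matrix)

# The "elbow" is where the curve flattens; for this toy data it is at k = 3
plot(seq_along(wss), wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Unlike `Mclust()`, the elbow method leaves the final choice to your eye, which is one reason a likelihood-based criterion such as BIC is attractive for a notebook that should run end to end.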

<h4>Step 5: Plot the clustered heat maps</h4>

The last step is to plot the heat maps. We will make two: one using the nominal projected cap space, and the other using projected cap space scaled by column, so that each season is centered at a mean of zero. heatmaply clusters the rows hierarchically, and `k_row = team_clusters` colors the row dendrogram into the number of groups that Mclust selected. Please note that the plots use the [Myriad Pro](http://fontsup.com/font/myriad-pro-semibold-condensed.html) font; if you don't have it installed, either install it or remove the `family` setting from the code.

<h5>Plot 1: Unscaled heatmap</h5>

```{r heatmap_salaries, include=TRUE}
unscaled <-
  plot_data %>%
  heatmaply(
    column_text_angle = 0,
    k_row = team_clusters,
    Colv = FALSE,
    Rowv = TRUE
  ) %>%
  layout(
    margin = list(l = 130, b = 40),
    font = list(
      family = "MyriadPro-Cond", # Requires Myriad Pro font to be installed on your computer
      size = 12,
      color = "#7f7f7f"
    ),
    title = "NBA Cap Space by Team and Season, Less Cap Holds (millions)"
  )

```

```{r unscaled_plot, results = 'asis', echo=F, fig.align = 'center', fig.width=12}
unscaled
```
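`k_row` doesn't change how heatmaply clusters the rows; to my understanding it cuts the row dendrogram (built with `dist()` and `hclust()` by default) into that many groups and colors the branches accordingly. The base R sketch below shows that kind of cut with `cutree()` on a made-up matrix of six hypothetical teams; it illustrates the idea rather than reproducing heatmaply's internals.

```r
# Made-up net cap space (millions) for six hypothetical teams, two seasons
toy_space <- matrix(
  c(5, 6, 30, 32, 60, 58,      # 2017-18
    10, 12, 40, 41, 70, 72),   # 2018-19
  ncol = 2,
  dimnames = list(paste0("team", 1:6), c("2017-18", "2018-19"))
)

# Euclidean distances between teams, then complete-linkage hierarchical clustering
row_tree <- hclust(dist(toy_space))

# Cutting the tree into 3 groups, as k_row = 3 would when coloring branches
row_groups <- cutree(row_tree, k = 3)
row_groups
```

Here the three pairs of teams with similar cap space land in the same group, and `cutree()` numbers the groups in the order their first member appears.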


<h5>Plot 2: Scaled heatmap</h5>

```{r heatmap_salaries_scaled, include=TRUE}
scaled <-
  plot_data %>%
  heatmaply(
    column_text_angle = 0,
    k_row = team_clusters,
    Colv = FALSE,
    Rowv = TRUE,
    scale = 'column'
  ) %>%
  layout(
    margin = list(l = 130, b = 40),
    font = list(
      family = "MyriadPro-Cond",
      size = 12,
      color = "#7f7f7f"
    ),
    title = "NBA Cap Space by Team and Season, Scaled Mean 0"
  )

```

```{r scaled_plot, results = 'asis', echo=F, fig.align = 'center', fig.width=12}
scaled
```
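The scaled heatmap standardizes each season's column before coloring it. Assuming heatmaply's column scaling matches base R's `scale()` (subtract the column mean, then divide by the column standard deviation), the sketch below shows what that does to a small table; the team slugs and values are made up.

```r
# Made-up net cap space in millions: rows are teams, columns are seasons
toy_table <- data.frame(
  `2017-18` = c(15, 35, 5),
  `2018-19` = c(40, 60, 20),
  check.names = FALSE,
  row.names = c("bos", "bkn", "nyk")
)

# Center each column at mean 0 and divide by its standard deviation
scaled_space <- scale(toy_table)

# Each season now has mean 0 and standard deviation 1, so the colors compare
# a team with the rest of the league in that season rather than in dollars
round(colMeans(scaled_space), 10)
round(apply(scaled_space, 2, sd), 10)
```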

<h4>Conclusion</h4>

You have now explored the projected cap space of every NBA team through the 2020-21 season in only a few lines of code, all thanks to R and its community of package developers! Until next time, keep learning and exploring, preferably in R.
