Primary Land Use Tax Lot Output [PLUTO]
For those who aren’t familiar with PLUTO, the best analogy I can offer is the list of franchises in a professional sports league. Instead of a sports team, each entry is a unique plot of land and the structure that may or may not be built on top of it.
PLUTO returns a stat sheet containing all the numeric and categorical data for every plot of land in New York City. The data is extensive; you can spend a few moments looking through the information it contains here.
The data is updated annually and changes as land use changes. It is also meant to be linked with other data stores, including ACRIS, the Department of Buildings, and the Department of Tax and Finance. When you do this, you get a clear view of the characteristics and history of every piece of land in New York.
One important thing to remember is that there is a one-to-many relationship between a unique plot of land and the tax IDs it contains. An easy-to-understand example is a condominium building. One condominium building can contain many units, each with its own tax lot. PLUTO will not give you information about the individual tax lots in the building, but nycRealEstateR has another method that lets you get that information.
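To make the one-to-many relationship concrete, here is a minimal sketch in base R using made-up data. The column names (`idLot`, `idTaxLot`, `typeBuilding`) and all values are assumptions for illustration only; they are not the PLUTO schema or the output of any nycRealEstateR function.

```r
# Toy data: names and values are illustrative, not the PLUTO schema.
lots <- data.frame(
  idLot = c("A", "B"),
  typeBuilding = c("condominium", "single-family"),
  stringsAsFactors = FALSE
)

tax_lots <- data.frame(
  idLot = c("A", "A", "A", "B"),
  idTaxLot = c("A-1001", "A-1002", "A-1003", "B-1"),
  stringsAsFactors = FALSE
)

# One row per tax lot; the parent lot's attributes repeat for each unit.
lot_units <- merge(lots, tax_lots, by = "idLot")

# Count the tax lots that belong to each parent lot.
units_per_lot <- table(lot_units$idLot)
print(units_per_lot)
```

The condominium lot appears once in `lots` but three times after the join, which is why lot-level sums should be computed before joining unit-level data, not after.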
Now that I have explained PLUTO, let's actually explore some of it.
Acquiring PLUTO Data
The package contains two methods for importing PLUTO.
get_data_nyc_property_file(
id_data_set = "PLUTO",
filter_na_address = TRUE,
return_message = FALSE
)
This method goes onto the web, then downloads, cleans, and parses PLUTO. This is time-consuming and bandwidth-intensive, but it is worth doing if you don’t yet have the data stored locally or if the data was recently updated.
You can also import a cached version of this data, which I have stored in the cloud. I am going to use this method to demonstrate some of the interesting things one can do with this data.
df_pluto <-
  import_pluto(only_key_data = FALSE)
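If you would rather keep your own local copy than download the data on every run, a simple read-or-fetch pattern can help. The sketch below is an assumption on my part, not part of the package: `load_or_fetch()` and `fake_fetch()` are hypothetical names, and the stand-in fetch function exists only so the example runs without the package or a network connection.

```r
# Hypothetical helper: read a local copy if present, otherwise fetch and save.
load_or_fetch <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  data <- fetch()
  saveRDS(data, path)
  data
}

# Stand-in for the real download so the sketch is self-contained.
fake_fetch <- function() {
  data.frame(idRow = 1:3, areaLotSF = c(2000, 2500, 10000))
}

# tempfile() returns a path that does not exist yet.
cache_path <- tempfile(fileext = ".rds")
df_first <- load_or_fetch(cache_path, fake_fetch)   # fetches and writes the file
df_second <- load_or_fetch(cache_path, fake_fetch)  # reads the file back
```

With the package loaded, the `fetch` argument could be a function wrapping the `get_data_nyc_property_file()` call shown earlier, and `path` a permanent location instead of a temporary file.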
Let's take a look at the structure of the data.
df_pluto %>%
glimpse()
Now let's take a look at the top 5 buildings by built size for each borough and parent class letter:
df_pluto %>%
group_by(codeBorough, letterClass) %>%
arrange(desc(areaBuildingSF)) %>%
slice(1:5) %>%
ungroup() %>%
DT::datatable(
  # Arguments that only restated the function's defaults have been dropped.
  options = list(pageLength = 5,
                 lengthMenu = c(5, 10, 15, 20),
                 scrollX = TRUE),
  class = "display",
  style = "default")
As you can see, there are 99 different variables covering things like land use, current building area, and allowable buildable area. We now also know that there are 858,396 unique plots of land throughout the 5 boroughs of New York.
Let's take this a bit further and perform some exploratory data analysis.
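The top-5-per-group pipeline used for the table relies on a detail of dplyr that is easy to miss: `arrange()` sorts the whole data frame and keeps the grouping, and `slice()` then picks rows within each group. Here is a minimal sketch on toy data, taking the top 2 instead of the top 5. The column names match the pipeline, but the values are made up.

```r
library(dplyr)

# Toy data: same column names as the pipeline, invented values.
df_toy <- tibble(
  codeBorough = c("MN", "MN", "MN", "BK", "BK", "BK"),
  letterClass = c("O", "O", "O", "D", "D", "D"),
  areaBuildingSF = c(500, 1500, 1000, 300, 100, 200)
)

top_two <- df_toy %>%
  group_by(codeBorough, letterClass) %>%
  arrange(desc(areaBuildingSF)) %>%  # sorts all rows; grouping is retained
  slice(1:2) %>%                     # first two rows within each group
  ungroup()

print(top_two)
```

Each group contributes its two largest buildings, so the smallest building in each toy group is dropped.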
Distribution by Borough
Let's take a look at how some of this information is distributed by borough.
df_buildings_borough <-
df_pluto %>%
group_by(codeBorough) %>%
summarise(countBuildings = n(),
amountSFBuilt = sum(areaBuildingSF),
areaSFBuildable = sum(areaAllowableSF),
areaLand = sum(areaLotSF)) %>%
mutate(ratioFAR = amountSFBuilt / areaLand,
ratioFARBuildable = areaSFBuildable / areaLand) %>%
# funs() is deprecated in dplyr; a formula lambda does the same job
mutate_if(is.numeric, ~ formattable::comma(.x, digits = 0)) %>%
arrange(desc(countBuildings))
df_buildings_borough %>% DT::datatable()
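A note on the two ratios computed above: floor area ratio (FAR) is built floor area divided by lot area. Because the summary sums the areas before dividing, each borough's figure is an area-weighted ratio, not an average of per-lot FARs. The sketch below uses two invented lots to show how far apart those two numbers can be; the variable names and values are assumptions for illustration.

```r
# Two made-up lots: a small, dense lot and a large, low-rise lot.
area_lot_sf <- c(1000, 9000)
area_built_sf <- c(5000, 9000)
area_allowable_sf <- c(6000, 18000)

# Per-lot FAR: 5 for the first lot and 1 for the second.
far_per_lot <- area_built_sf / area_lot_sf

# Aggregate FAR, computed the same way as ratioFAR in the summary above.
far_aggregate <- sum(area_built_sf) / sum(area_lot_sf)

# Aggregate buildable FAR and the floor area still available to build.
far_buildable <- sum(area_allowable_sf) / sum(area_lot_sf)
unused_sf <- sum(area_allowable_sf) - sum(area_built_sf)

print(c(mean_of_lots = mean(far_per_lot), aggregate = far_aggregate))
```

Here the mean of the per-lot FARs is 3, while the aggregate FAR is 1.4, because the large low-rise lot dominates the total land area.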
Number of Buildings
df_buildings_borough %>%
asbmisc::plot_hc_xy(
type = "column",
category = "codeBorough",
y_var = "countBuildings",
title = "New York City Buildings by Borough"
)
Amount of Built SF
df_buildings_borough %>%
mutate(amountSFBuilt = amountSFBuilt/1000000) %>%
asbmisc::plot_hc_xy(
  type = "waterfall",
  category = "codeBorough",
  y_var = "amountSFBuilt",
  title = "New York City Built Square Footage by Borough"
) %>%
hc_yAxis(
tickAmount = 8,
lineColor = "transparent",
minorGridLineWidth = 0,
gridLineColor = "transparent",
labels = list(format = "{value}M SF"))
FAR Distribution
df_buildings_borough %>%
arrange(ratioFAR) %>%
asbmisc::plot_hc_xy(
type = "columnrange" ,
category = "codeBorough",
low_var = "ratioFAR",
high_var = "ratioFARBuildable",
title = "FAR Built vs Buildable by Borough"
) %>%
hc_yAxis(
tickAmount = 8,
lineColor = "transparent",
minorGridLineWidth = 0,
gridLineColor = "transparent",
labels = list(format = "{value} FAR"),
title = list(text = "FAR")  # Highcharts expects the axis title as an object with a text field
)