Sec 3: Mapping
At this point, we’ll map the 35 most populous cities, scaling the color palette to reveal local detail within each one. The cities are listed below:
cities <- c("New York, NY", "Los Angeles, CA", "Chicago, IL", "Houston, TX",
            "Philadelphia, PA", "Phoenix, AZ", "San Antonio, TX", "San Diego, CA",
            "Dallas, TX", "San Jose, CA", "Austin, TX", "Jacksonville, FL",
            "San Francisco, CA", "Indianapolis, IN", "Columbus, OH", "Fort Worth, TX",
            "Charlotte, NC", "Detroit, MI", "El Paso, TX", "Seattle, WA",
            "Denver, CO", "Washington, DC", "Memphis, TN", "Boston, MA",
            "Nashville, TN", "Baltimore, MD", "Oklahoma City, OK", "Portland, OR",
            "Las Vegas, NV", "Louisville, KY", "Milwaukee, WI", "Albuquerque, NM",
            "Tucson, AZ", "Fresno, CA", "Sacramento, CA")
Next, we’ll set up the graph layout with no margins (mai), 7 rows and 5 columns of panels (mfrow), and a navy blue background (bg).
##Set graph layout
par(mai=c(0,0,0,0),mfrow = c(7,5),bg='#001a4d', bty='n')
Now let’s get looping to create a montage of city maps. The process is as follows:
- Set up a placeholder dataframe for coordinates.
- For each city, geocode the city name to its centroid using the geocode() function from the ggmap package, and append the coordinates to the placeholder. These coordinates will be reused later.
- Use the extent() function to specify the spatial bounding box (the frame around the city) as +/-1 degree of longitude and +/-0.25 degrees of latitude.
- Crop the raster file (rast) to the bounding box.
- Convert the cropped raster into a vector to get the radiances, then run a k-means clustering algorithm to find natural intervals within the radiance distribution, extracting the maximum radiance from each cluster.
- Map the city with a navy-to-yellow color palette using the intervals from the k-means clustering.
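Before applying this to a real raster, the break-derivation step can be sketched on simulated values. This is a toy base-R illustration (the three "radiance" regimes are made up), using tapply() as an equivalent of the aggregate() call in the loop: the maximum value within each k-means cluster, sorted, becomes a class break.

```r
##Toy sketch: derive class breaks from k-means cluster maxima
set.seed(42)
vals <- c(rnorm(100, 5), rnorm(100, 50), rnorm(100, 500)) ##three simulated radiance regimes
clusters <- 3
clust <- kmeans(vals, clusters)$cluster
##Maximum value within each cluster, sorted, gives the class breaks
brk <- sort(tapply(vals, clust, max))
brk
```

Because the breaks follow the data's own clustering rather than fixed bins, bright urban cores don't wash out the dimmer detail around them.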
##Loop through data
coords <- data.frame() ##placeholder for city coordinates
for(i in 1:length(cities)){
  ##Coords
  temp_coord <- geocode(cities[i], source = "google")
  coords <- rbind(coords, temp_coord)

  e <- extent(temp_coord$lon - 1, temp_coord$lon + 1,
              temp_coord$lat - 0.25, temp_coord$lat + 0.25)
  rc <- crop(rast, e)

  ##Rescale brackets
  sampled <- as.vector(rc)
  clusters <- 15
  clust <- kmeans(sampled, clusters)$cluster
  combined <- as.data.frame(cbind(sampled, clust))
  brk <- sort(aggregate(combined[,1], list(combined[,2]), max)[,2])

  ##Plots
  plot(rc, breaks=brk, col=colorRampPalette(c("#001a4d", "#0066FF", "yellow"))(clusters),
       legend=F, yaxt='n', xaxt='n', frame=F, asp=1.5)
  text(temp_coord$lon, temp_coord$lat + 0.15,
       substr(cities[i], 1, regexpr(",", cities[i]) - 1),
       col="white", cex=1.25)
  rm(combined)
}
Whereas the maps above reveal city-specific patterns, the second montage compares patterns across cities using a common color coding. Rather than running a k-means algorithm for each city to generate its own radiance intervals, a single set of intervals is generated from a random sample of pixels drawn from across the country.
#Set layout
par(mai=c(0,0,0,0),mfrow = c(7,5),bg='#001a4d', bty='n')
#Run clustering
set.seed(123) #set seed for reproducibility
sampled <- sample(rast, 20000) #sample 20,000 pixels
clusters <- 15 ##15 clusters
clust <- kmeans(sampled,clusters)$cluster
combined <- as.data.frame(cbind(sampled,clust))
brk <- sort(aggregate(combined[,1], list(combined[,2]), max)[,2])
##Loop through each city
for(i in 1:length(cities)){
  temp_coord <- coords[i,] ##re-use the geocoded coordinates
  e <- extent(temp_coord$lon - 1, temp_coord$lon + 1,
              temp_coord$lat - 0.25, temp_coord$lat + 0.25)
  rc <- crop(rast, e)

  ##Plots
  plot(rc, breaks=brk, col=colorRampPalette(c("#001a4d", "#0066FF", "yellow"))(clusters),
       legend=F, yaxt='n', xaxt='n', frame=F, asp=1.5)
  text(temp_coord$lon, temp_coord$lat + 0.15,
       substr(cities[i], 1, regexpr(",", cities[i]) - 1),
       col="white", cex=1.25)
}
rm(combined) ##clean up once, after the loop, since combined is created outside it
Sec 4: Comparing measures through visualizations
Histogram by select MSA
For a quick example, we’ve selected five MSAs and serially processed them into histograms of logarithmically transformed radiance. The data is processed for each of the five MSAs, then turned into a series of histograms using a combination of ggplot and plotly.
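The log transform matters because radiance distributions are heavily right-skewed: most pixels are dim while a few are extremely bright. A quick simulated comparison (the lognormal parameters and the skew() helper are ours, not part of the tutorial):

```r
##Simulated right-skewed "radiances": raw values pile up near zero,
##while the log transform spreads them into a roughly symmetric shape
set.seed(1)
rad <- rlnorm(1000, meanlog = 1, sdlog = 1.5)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
c(raw = skew(rad), logged = skew(log(rad)))
```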
##MSAs by GEOID
msa_list <- c(16180,19140,45820,42540,35620)
##Placeholder
radiances <- data.frame()
##Loop through the MSA file
for(i in msa_list){
  print(i)

  ##Extract MSA i polygon
  shp_temp <- msa[msa@data$GEOID==i,]

  ##Extract abbreviated MSA name: cut at the first hyphen if present,
  ##otherwise at the first comma
  name <- as.character(shp_temp@data$NAME)[1]
  if(regexpr("-", name)[1] == -1){
    loc <- substr(name, 1, regexpr(",", name) - 1)
  } else{
    loc <- substr(name, 1, regexpr("-", name) - 1)
  }

  ##Extract the radiances, append to the radiances placeholder
  rad <- masq(shp_temp, rast, 1)$avg_rad
  temp <- data.frame(loc = paste(loc, "(TNL = ", round(sum(rad), 0), ")", sep=""),
                     avg_rad = rad)
  radiances <- rbind(radiances, temp)
}
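The name-abbreviation logic in the loop can also be isolated as a small standalone helper; short_name() is our own illustrative name, not part of the tutorial's code:

```r
##Trim an MSA name such as "Austin-Round Rock, TX" to its first city,
##cutting at the first hyphen if present, otherwise at the first comma
short_name <- function(name){
  cut <- regexpr("-", name)[1]
  if(cut == -1) cut <- regexpr(",", name)[1]
  substr(name, 1, cut - 1)
}
short_name("Austin-Round Rock, TX") ## "Austin"
short_name("Fresno, CA")            ## "Fresno"
```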
#Use ggplot to create histograms by MSA group (ggplotly() will convert the last plot)
ggplot(radiances, aes(x=log(avg_rad))) +
geom_histogram(position="identity", alpha=0.4) +
facet_grid(. ~ loc)
#Remove all axes labels for style
x <- list(
zeroline = FALSE,
showline = FALSE,
showticklabels = FALSE,
showgrid = FALSE
)
y <- list(
zeroline = FALSE,
showline = FALSE,
showticklabels = FALSE,
showgrid = FALSE
)
#Initiate a plotly graph without axes
ggplotly() %>% layout(xaxis=x, yaxis=y)
Scatterplot of TNL vs. Population
Extracting radiance data for five MSAs takes a few seconds, but a few hundred MSAs takes a good deal longer. We rely on the doParallel and foreach libraries to parallelize the extraction. Below, two processor cores are used to extract the TNL for each MSA and record the results in a consolidated data frame of MSAs and TNL values. This data frame is then joined with the MSA population file to allow direct comparison between population and TNL. The resulting interactive graph illustrates a clear positive correlation between the two quantities.
##Set up comparisons
registerDoParallel(cores=2)
extract <- foreach(i = 1:nrow(msa@data), .combine = rbind) %dopar% {
  data <- masq(msa, rast, i)
  data.frame(GEOID = data$GEOID[1], sum = sum(data$avg_rad))
}
extract$GEOID <- as.numeric(as.character(extract$GEOID))
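For readers without the spatial data at hand, the foreach/.combine = rbind pattern is equivalent to this sequential base-R sketch, where fake_masq() is a made-up stand-in for masq() that returns random radiances:

```r
##Each iteration returns a one-row data frame; binding the rows together
##is exactly what .combine = rbind does in the foreach loop
fake_masq <- function(i) data.frame(GEOID = i, avg_rad = runif(5, 0, 100))
rows <- lapply(1:3, function(i){
  data <- fake_masq(i)
  data.frame(GEOID = data$GEOID[1], sum = sum(data$avg_rad))
})
extract <- do.call(rbind, rows)
extract
```

Swapping lapply() for %dopar% simply spreads those independent iterations across cores; the combine step is unchanged.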
##Join in data
joined <- merge(extract, msa_pop[,c("CBSA","NAME","POPESTIMATE2014")],
                by.x="GEOID", by.y="CBSA")
colnames(joined) <- c("GEOID","TNL","MSA","Population")

plot_ly(joined,
        x = log(TNL),
        y = log(Population),
        text = paste("MSA: ", MSA),
        mode = "markers",
        color = TNL, colors = "PuOr") %>%
  layout(title = "Total Nighttime Light vs. Population", showlegend = F)
Basic joining is just the beginning. In an upcoming tutorial, we will demonstrate how to mine county-level data and identify which measures can be accurately predicted using VIIRS data.