There is no single package for mapping data in R
. I’ve found that different packages and functions perform best for different types of maps. In this tutorial, I will walk you through several packages, data sources, and plotting tools. I realize this may be a bit more confusing than focusing on a single package, but after quite a bit of thought on the matter, I decided the most useful thing would be to introduce you to the best tools I’ve found to date and let you decide which package/toolkit is the most appropriate for your visualization needs. With this in mind, our objectives for this tutorial are as follows:
R
to…R
is fantastic for spatial analysis. R
is OK at spatial data visualization. R
is great for interactive data visualization (Leaflet, Shiny). If you have a highly map-centric project, there is nothing wrong with working in ArcGIS or QGIS if you think the mapping tools I show you in this tutorial are insufficient.R
.Make sure you’ve installed and loaded the following:
library(ggmap)
library(tidyverse)
library(rgdal)
library(tmap)
In what follows, I’m going to walk you through the basics of map-making in R
. We’ll start by learning how to access some great basemaps (not to be used for analysis, just for visualization), the slowly add shapefiles and rasters. We’ll learn how to make some common maps (i.e. choropleth) in R
and walk through a couple of packages that simplify the whole map-making process. We won’t have time to get to interactive maps, which is one of the cooler things you can do with R
, but if you’re interested in this, then you’ll need to learn more about Leaflet and Shiny. You can see a few examples of interactive maps made in R
here.
This week we are going to work with datasets from the Utah GIS repository. You can download each dataset at the links below:
Once you’ve downloaded the data, load them into R
. I’ve loaded them with the readOGR()
function from rgdal
, but by now you know several ways to load spatial data, so you do you:
library(rgdal)
trails <- readOGR("./data/HistoricTrails.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/HistoricTrails.shp", layer: "HistoricTrails"
## with 3415 features
## It has 9 fields
cities <- readOGR("./data/CitiesTownsLocations.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/CitiesTownsLocations.shp", layer: "CitiesTownsLocations"
## with 460 features
## It has 7 fields
## Integer64 fields read as strings: POPULATION
tracts <- readOGR("./data/CensusTracts2010.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/CensusTracts2010.shp", layer: "CensusTracts2010"
## with 588 features
## It has 31 fields
## Integer64 fields read as strings: P0010011 P0010012 P0010013
Luckily, all of these datasets are already in the same projection. This allows us to quickly visualize all three datasets in a single plot. If they weren’t in the same projection, you’d need to reproject them to a single projection before doing any visualization.
plot(tracts)
plot(cities, add = T, col = "red")
plot(trails, add = T, col = "blue")
You can actually get fairly far with the base R
plot()
function. The code gets dense quite quickly and the visualizations are nothing like what tmap
and ggplot/ggmap
can produce (see below). I recommend using plot()
only to quickly visualize your data. Data visualizations you intend to publish or share should be made with one of the packages I describe below. If you’re interested in learning more about using plot()
, I recommend Chapter 3 of Applied Spatial Data Analysis in R by Roger Bivand and colleagues. Below, I’ve inserted a table from p. 68 of this text that reviews useful arguments to be passed to the plot()
function when working with spatial objects. If you want to learn even more about plot()
, go here and here and/or here and/or here.
Before we begin, I am going to transform these datasets into a lat/long coordinate system. This will help us add them to web-based maps later in the tutorial:
crs <- CRS("+init=epsg:4269")
tracts <- spTransform(tracts, crs)
cities <- spTransform(cities, crs)
trails <- spTransform(trails, crs)
ggplot2
and ggmap
You can do quite a bit of mapping with our friend ggplot2
. The disadvantage of ggplot
is that you have to transform any sp
object you’re working with into a data.frame
.1 The tool you use to transform a sp
object to a data.frame
depends on the type of sp
object you’re working with. For point data, you can use the data.frame()
function like so:
cities_df <- data.frame(cities)
head(cities_df)
## NAME COUNTY COUNTYNBR COUNTYSEAT POPULATION TYPE
## 1 Milford Beaver 01 0 1422 City
## 2 Minersville Beaver 01 0 921 Town
## 3 Snowville Box Elder 02 0 178 Town
## 4 Bear River City Box Elder 02 0 869 City
## 5 Corinne Box Elder 02 0 694 City
## 6 Deweyville Box Elder 02 0 347 Town
## CARTO coords.x1 coords.x2 optional
## 1 Town/Place -113.0108 38.39707 TRUE
## 2 Town/Place -112.9240 38.21553 TRUE
## 3 Town/Place -112.7105 41.96263 TRUE
## 4 Town/Place -112.1270 41.61653 TRUE
## 5 Town/Place -112.1093 41.55077 TRUE
## 6 Town/Place -112.0928 41.71537 TRUE
This function essentially appends the cities@coords
coordinate columns (here coords.x1
and coords.x2
) to the cities@data
data.frame
.
How would we transform polygons or lines into a data.frame
? Think of a line shape as a collection of points connected by straight lines. We can store coordinates for each point and the order in which points should be drawn in rows describing a single line. Attributes for a single line can be repeated for each point on a line. Let me show you this with our trail data. First, I’ll select a single route using spdplyr
to make this a bit easier to understand:
library(spdplyr)
brl <- trails %>%
filter(RouteName == "Bear River Loop")
plot(brl)
I have isolated one route from the larger trails
dataset. This SpatialLinesDataFrame
route is stored in several line segments:
glimpse(brl@data)
## Observations: 7
## Variables: 9
## $ TrailName <fct> California Trail, California Trail, California Trail,…
## $ RouteName <fct> Bear River Loop, Bear River Loop, Bear River Loop, Be…
## $ AltRouteNa <fct> NA, NA, NA, NA, NA, NA, NA
## $ County <fct> Lincoln, Lincoln, Bear Lake, Bear Lake, Bear Lake, Be…
## $ State <fct> WY, WY, ID, ID, ID, ID, ID
## $ Scale <fct> "1:100,000", "1:100,000", "1:100,000", "1:100,000", "…
## $ AdminAgenc <fct> NPS, NPS, NPS, NPS, NPS, NPS, NPS
## $ Source <fct> NPS, NPS, NPS, NPS, NPS, NPS, NPS
## $ Shape_Leng <dbl> 4003.20738, 69.89398, 8575.35463, 10733.19146, 617.61…
To plot this using ggplot
, I need to transform this set of lines into a data.frame
. The fortify()
function essentially transforms this SpatialLinesDataFrame
into a data.frame
where each row describes the coordinates and plotting order of a single point along a line. Watch:
brl_df <- fortify(brl)
head(brl_df)
## long lat order piece id group
## 1 -111.0003 42.17425 1 1 219 219.1
## 2 -111.0071 42.17616 2 1 219 219.1
## 3 -111.0119 42.17795 3 1 219 219.1
## 4 -111.0173 42.17985 4 1 219 219.1
## 5 -111.0203 42.18088 5 1 219 219.1
## 6 -111.0229 42.18156 6 1 219 219.1
Notice that the new data frame contains coordinates (long
and lat
) for points along a line. order
indicates the order in which a point should be drawn and group
indicates the line segment to which the point belongs. We can append the other data in the original brl
SpatialLinesDataFrame
by using the id
variable:
brl@data$id <- rownames(brl@data)
brl_df <- merge(brl_df, brl@data, by = "id")
Now we can plot this line using ggplot()
:
ggplot(brl_df) +
geom_path(aes(x = long, y = lat, group = group)) +
coord_fixed()
I’ll explain how to build maps in a second. First, let me walk you through fortify()
with a polygon. Like lines, think of a polygon as an area described by a collection of points joined by lines to enclose a polygon. When we fortify()
a SpatialPolygonsDataFrame
into a data.frame
, we generate a row for each point, an order in which points should be plotted, and the group, or polygon, to which a point belongs:
tract_df <- fortify(tracts)
## Regions defined for each Polygons
tracts@data$id <- rownames(tracts@data)
tract_df <- merge(tract_df, tracts@data, by = "id")
head(tract_df)
## id long lat order hole piece group STATEFP10 COUNTYFP10
## 1 0 -112.0021 41.52189 1 FALSE 1 0.1 49 003
## 2 0 -112.0020 41.52181 2 FALSE 1 0.1 49 003
## 3 0 -112.0020 41.52179 3 FALSE 1 0.1 49 003
## 4 0 -112.0017 41.52155 4 FALSE 1 0.1 49 003
## 5 0 -112.0016 41.52147 5 FALSE 1 0.1 49 003
## 6 0 -112.0015 41.52136 6 FALSE 1 0.1 49 003
## TRACTCE10 GEOID10 NAME10 FUNCSTAT10 INTPTLAT10 INTPTLON10
## 1 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## 2 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## 3 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## 4 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## 5 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## 6 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## LOGRECNO AREALAND AREAWATR POP100 HU100 P0020001 P0020002 P0020003
## 1 0000009 2185421 0 3720 1395 3720 396 3324
## 2 0000009 2185421 0 3720 1395 3720 396 3324
## 3 0000009 2185421 0 3720 1395 3720 396 3324
## 4 0000009 2185421 0 3720 1395 3720 396 3324
## 5 0000009 2185421 0 3720 1395 3720 396 3324
## 6 0000009 2185421 0 3720 1395 3720 396 3324
## P0020004 P0020005 P0020006 P0020007 P0020008 P0020009 P0020010 MTFCC
## 1 3262 3188 17 27 15 6 9 G5020
## 2 3262 3188 17 27 15 6 9 G5020
## 3 3262 3188 17 27 15 6 9 G5020
## 4 3262 3188 17 27 15 6 9 G5020
## 5 3262 3188 17 27 15 6 9 G5020
## 6 3262 3188 17 27 15 6 9 G5020
## P0010011 P0010012 P0010013 P0010014 SqMiles SHAPE_Leng SHAPE_Area
## 1 11 30 17 0 0 6709.768 2184057
## 2 11 30 17 0 0 6709.768 2184057
## 3 11 30 17 0 0 6709.768 2184057
## 4 11 30 17 0 0 6709.768 2184057
## 5 11 30 17 0 0 6709.768 2184057
## 6 11 30 17 0 0 6709.768 2184057
Now let’s get to mapping. One of the nice things about ggplot
is that you can easily grab base maps off the web using ggmap
. Let’s say we want to grab a map of Utah from Google Maps or OpenStreetMap.
library(ggmap)
base_map <- get_map(location = 'utah', zoom = 6, maptype = 'satellite', source = 'google', crop = T)
plot(base_map)
You can play around with the zoom
and maptype
to get the map you need. This website provides a nice overview of the maptype
and source
options. You drop in coordinates for your location or use the bounding box of a shapefile you’re working with to define the basemap extent. Note that these basemaps are projected in a Web Mercator projection that makes rendering visualizations more efficient. These basemaps aren’t to be used in analyses, just for visualization. If you plan on using a ggmap
basemap or other basemaps with projected data, I strongly recommend you read this article. Great overviews of all the things you can do with this package can be found here and here.
Once you find a basemap you like, you can add layers of data to ggmap()
just like you would with ggplot()
(e.g. + geom_point()
). Note, however, that ggmap()
sets the default map dataset and aesthetics. This means that as you add layers other than the basemap, you have to specify both the mapping
and data
arguments to the geom.
trails_df <- fortify(trails)
trails@data$id <- rownames(trails@data)
trails_df <- merge(trails_df, trails@data, by = "id")
ggmap(base_map) +
geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2)) +
geom_path(data = trails_df, aes(x = long, y = lat, group = group)) +
coord_fixed()
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 5472 rows containing missing values (geom_path).
The coord_fixed()
command is important. It tells ggplot
how to fix the aspect ratio between latitude and longitude (y/x). If we set a ratio above one, we exaggerate, latitude.
ggmap(base_map) +
geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2)) +
geom_path(data = trails_df, aes(x = long, y = lat, group = group)) +
coord_fixed(3.0)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 5472 rows containing missing values (geom_path).
And a low value exaggerates longitude:
ggmap(base_map) +
geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2)) +
geom_path(data = trails_df, aes(x = long, y = lat, group = group)) +
coord_fixed(0.3)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 5472 rows containing missing values (geom_path).
We can play around with aesthetics to make this look a bit better, say by coloring our cities based the county in which they are located and changing our trail color to gray so they are easier to detect:
ggmap(base_map) +
geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2, color = COUNTY)) +
geom_path(data = trails_df, aes(x = long, y = lat, group = group), color = "gray") +
coord_fixed(1.0)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 5472 rows containing missing values (geom_path).
We can also plot polygons in using geom_polygon()
:
ggmap(base_map) +
geom_polygon(data = tract_df, aes(x = long, y = lat, group = group), fill = "gray", color = "black", alpha = 0.4)
Just like our polygon, we need to specify the aesthetic x
, y
, and group
. I added information telling ggplot
how to plot the fill color, border color, and shading of the polygons.
Finally, let’s change the color of each census tract based on population. To do this, add fill
to your aesthetic and specify the variable of interest. I also dropped the alpha
option which specified transparency.
ggmap(base_map) +
geom_polygon(data = tract_df, aes(x = long, y = lat, group = group, fill = POP100), color = "black") +
guides(fill=guide_legend(title="Population"))
Don’t forget to change the legend title! More about manipulating legends with ggplot2
found here. Of course, you can also make maps without a basemap. Here are two quick examples:
ggplot() +
geom_polygon(data = tract_df, aes(x = long, y = lat, group = group), alpha = 0.2, color = "black") +
geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2), color = "orange") +
geom_path(data = trails_df, aes(x = long, y = lat, group = group), color = "dark green")
And if you want to drop the gray background, just add + theme_nothing()
:
ggplot() +
geom_polygon(data = tract_df, aes(x = long, y = lat, group = group), alpha = 0.2, color = "black") +
geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2), color = "orange") +
geom_path(data = trails_df, aes(x = long, y = lat, group = group), color = "dark green") +
theme_nothing()
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead
If you look at this code all at once, it may seem overwhelming, but the big advantage of ggplot
is that it’s modular. You can - and should - add pieces to your visualization one step at a time. If you think you’ll be mapping a lot with ggplot
, this tutorial walks you through the whole process with different data.
sp
With the sp
package, you can skip the fortify()
step and jump right to plotting your Spatial
object. Let’s say I have a line shapefile of historic trails in Utah, a point shapefile indicating the location of major cities and towns in the state, and a polygon shapefile indicating counties.
spplot(tracts, "POP100")
We can play around with the colors using RColorBrewer
. In this case, the high population of the area around Salt Lake City makes it hard to tease out population differences in other areas. We can do a quick scale transformation to help with visualization.
library(RColorBrewer)
#display.brewer.all() # prints out palette options to console
palette <- brewer.pal(n = 7, name = "YlGnBu")
spplot(tracts, "POP100", main = "Population", col.regions = palette, cuts = 6, col = "transparent")
Here, I’ve specified my own color palette using RColorBrewer
, added a title, and set the cuts
(breaks) in the symbology (note that in general cuts
should be one less than the n
breaks defined in your palette). I also set county boundaries to transparent, but am regretting this decision since there are so many contiguous boundaries with no difference in population. I’m also noticing that the high population of the Salt Lake City region makes it hard for us to see population variations in rural Utah. I’ll add a few more breaks to my color palette to help us see this. Another option is to log transform population.
palette <- brewer.pal(n = 9, name = "YlGnBu")
spplot(tracts, "POP100", main = "Population", col.regions = palette, cuts = 8, col = "gray")
If you need to specify specific color breaks, this tutorial will help.
What if we want to add additional layers to this choropleth map? We can do this using sp.layout
. To use sp.layout
, first create a new list where the first item is the type of layer to add, the second is the actual object to plot, and any following items are plotting options:
trail_layer <- list("sp.lines", trails, col = "dark green")
spplot(tracts, "POP100", sp.layout = trail_layer, main = "Population", col.regions = palette, cuts = 8, col = "gray")
If you want to add multiple shapefiles, just join them in a list:
city_layer <- list("sp.points", cities, col = "blue")
spplot(tracts, "POP100", sp.layout = list(trail_layer, city_layer), main = "Population", col.regions = palette, cuts = 8, col = "gray")
I find this sp.layout
step clunky. This is how you add additional info to spplot()
, which I find far more complicated than the +
approach of ggplot
. If you’re interested in going further with spplot()
, check out this example and this tutorial.
tmap
I’ve saved the best for last. The tmap
package makes it easy to visualize sp
and raster
objects in R
. It’s far more elegant than the other options I’ve shown you and makes creating interactive maps very simple. It also makes all of the DOGTAILS work straightforward. I am going to walk through some of the tools in the tmap
package as applied to our Utah dataset. The full package vignette is fantastic and found here.
Let’s start with the qtm()
or quick thematic map tool. This quickly creates maps of any shape object (similar to the base R
plot()
function):
library(tmap)
tmap_mode("plot") # set to "view" for interactive visualizations, see below
## tmap mode set to plotting
qtm(trails)
## Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3
We can use qtm()
to build fast choropleth maps:
qtm(tracts, fill = "POP100", fill.title = "Population in Utah")
Notice how without specification, qtm()
creates a legend! As you bang your head against your desk expressing frustration over the last four packages we’ve gone through when, ultimately, you’ll probably just use this package, remember that knowledge is POWER! I wanted to introduce you to the MANY tools out there, you never know when you might get stuck and need one of them. qtm()
has a number of cool arguments:
text
allows you to specify text to go over each polygon/attribute in your shapefile (i.e. state names, abbreviations, whatever)text.size
controls text size.format
- the package comes with a few pre-defined map formats that are worth checking out. There are also a few predefined style
options included.fill.title
titles the legendtext.root
determines how text size is increasedfill.textNA
allows you to specify the name of attributes with value NA
In addition to this awesome tool, tmap
has a number of other tools for specific spatial data types. I’ve reproduced the full list from the tmap
vignette here and recommend you go read this vignette if you plan on plotting with tmap
.
Think of these as similar to the geo
elements you add to a ggplot
. Like ggplot
, you can use tmap
to add complexity and layers to a map. Unlike ggplot
, we do not have to transform our sp
objects into data.frames
. Hooray! Here’s an example with our Utah data sets. First first element is tm_shape()
, which specifies the shape object. Then we can add additional information. Here’s a very simple example:
tm_shape(tracts) +
tm_fill()
Very informative map. Basically here I tell tmap
that I am working with the tracts
shapefile (tm_shape(tracts)
) and I just want to fill in the whole thing in the default color, a bleak shade of gray. We can do better than this:
tm_shape(tracts) +
tm_fill(col = "POP100", title = "Population (x100)") +
tm_compass() +
tm_scale_bar()
Oooohhh! We can add more information to tm_fill()
to specify exactly how we want to fill polygons. Here’s one example:
tm_shape(tracts) +
tm_fill(col = "POP100", convert2density = T, style = "jenks", title = "Population (x100)") +
tm_compass() +
tm_scale_bar()
You know what to do to get more info about fill options (?tm_fill
). Notice how the style
we chose significantly changes our visualization (i.e. maps can lie). Here, I’ve used jenks
, which minimizes each class’s average deviation from the class mean. I’ve also asked tmap
to plot population density, i.e. POP100
divided by the area of each polygon (which tmap
calculates in the background… sniff sniff, so beautiful).
And to add the tract polygons we can add tm_borders()
and specify the darkness of borders using alpha
:
tm_shape(tracts) +
tm_fill(col = "POP100", convert2density = T, style = "jenks", title = "Population (x100)") +
tm_borders(alpha = 0.3) +
tm_compass() +
tm_scale_bar()
Did you notice that tmap
makes adding our DOGTAILS painless? It’s a lot more painful in ggplot
and spplot
, which is why I didn’t show it in this tutorial (trust me, painful). Each tm_shape()
and it’s specifications are referred to as a group. We can stack multiple groups (i.e. specific visualization of tm_shape(tracts)
, tm_shape(trails)
, and tm_shape(cities)
). A crazy thing about tmap
is that each shapefile you intend to plot can have different projections and extents. tmap
will automatically reproject stacked groups of data to the projection of the first group (tracts
in our case).
tm <- tm_shape(tracts) +
tm_fill(col = "POP100", convert2density = T, style = "jenks", title = "Population (x100)") +
tm_borders(alpha = 0.3) +
tm_compass() +
tm_scale_bar() +
tm_shape(trails) +
tm_lines(col = "brown", alpha = 0.4)
tm
You can alter the format, shape, and legend, among other things, but digging a bit more into the tmap
elements and code. I strongly recommend you review the [tmap
in a Nutshell Vignette](https://cran.r-project.org/web/packages/tmap/vignettes/tmap-nutshell.html) and the full package documentation that explains all of the cool things you can do. I’ll quickly show you how to fix our map to make the legend a bit more legible:
tm <- tm +
tm_legend(text.size=1,
title.size=1.2,
position = c("left","bottom"),
bg.color = "white",
bg.alpha=.5,
frame="gray50",
height=.6)
tm
## Some legend labels were too wide. These labels have been resized to 0.87. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.
Want to save this beautiful map we made?
save_tmap(tm, "./figs/my_awesome_map.png", width = 800, height = 1000)
## Warning in save_tmap(tm, "./figs/my_awesome_map.png", width = 800, height =
## 1000): save_tmap is deprecated as of tmap version 2.0. Please use tmap_save
## instead
## Legend labels were too wide. The labels have been resized to 0.85, 0.57, 0.50, 0.50, 0.46. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.
## Map saved to /Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/figs/my_awesome_map.png
## Resolution: 800 by 1000 pixels
## Size: 2.666667 by 3.333333 inches (300 dpi)
Let me load this thing back into my RMarkdown doc:
The last amazing, mindbogglingly cool thing we can do with tmap
is quickly make interactive visualizations. I won’t have time to cover Leaflet and Shiny, the main tools for interactive mapping in R
. If, however, you want to QUICKLY build a cool interactive map, tmap
can help!
tmap_mode("view")
## tmap mode set to interactive viewing
last_map()
## Warning in last_map(): last_map is deprecated as of tmap version 2.0.
## Please use tmap_last
## Compass not supported in view mode.
## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.
## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.