R
.sf
package is in the works that will likely change the way we work with spatial data in R
. It’s brand spanking new, so most of the packages that have been built for spatial data still only work with sp
objects, which are the focus of this tutorial. Just keep in mind that in the future, these packages may switch to sf
. More info here.R
All new packages and functions this week, so the base R
we’ve covered to date should be sufficient!
Hooray! We’ve finally made it to spatial data! Well, actually nearly all of the data we’ve already played with is spatial, it just did not contain the coordinate and projection information required to make maps and conduct spatial analyses. This tutorial will focus on giving you the tools required to import, project, and create spatial datasets. There are three main packages used by the R
community to bring spatial data into R
. These are:
sp
rgdal
tidyverse
Make sure you have these packages installed on your machine.
library(sp)
library(rgdal)
library(tidyverse)
R
sp
is the mama-package of all the spatial packages. You can look through the sp
documentation to get a sense of all the things you can do with this powerful package. The team behind the sp
package created a set of new data classes that integrate our best friend the data.frame
with some friends who at times can be very difficult to hang out with: point, polygon, and raster. These new data classes include the following:
getClass("Spatial")
## Class "Spatial" [package "sp"]
##
## Slots:
##
## Name: bbox proj4string
## Class: matrix CRS
##
## Known Subclasses:
## Class "SpatialPoints", directly
## Class "SpatialMultiPoints", directly
## Class "SpatialGrid", directly
## Class "SpatialLines", directly
## Class "SpatialPolygons", directly
## Class "SpatialPointsDataFrame", by class "SpatialPoints", distance 2
## Class "SpatialPixels", by class "SpatialPoints", distance 2
## Class "SpatialMultiPointsDataFrame", by class "SpatialMultiPoints", distance 2
## Class "SpatialGridDataFrame", by class "SpatialGrid", distance 2
## Class "SpatialLinesDataFrame", by class "SpatialLines", distance 2
## Class "SpatialPixelsDataFrame", by class "SpatialPoints", distance 3
## Class "SpatialPolygonsDataFrame", by class "SpatialPolygons", distance 2
The sub-classes followed by DataFrame
like SpatialPointsDataFrame
, include additional attributes in the form of a data.frame
. For example, if you have a SpatialPolygon
object of US Census blocks, then this object would only include the spatial information required to plot the Census block polygons. With a SpatialPolygonDataFrame
, you can store this spatial information and additional data such as population, race, and income associated with each polygon. This may seem trivial, but, as I’ll show you below, this makes for an incredibly powerful way to organize, visualize, and analyze spatial data. You know me, I like data, so in this class, our spatial objects will almost always come in the DataFrame
flavor, i.e. with additional attributes attached to the spatial object. Since you guys are now experts on data.frame
wrangling, this should come as good news!1
Did you notice I referred to our new sp
datasets as objects? This is important. In the programming world, an object stores functions and variables relevant only to that object. Huh? For example, we could create a vehicle
object and assign it a number of attributes that are only relevant to vehicles, i.e. numberofwheels
, color
, weight
, etc. We could also build functions associated with that object, i.e. compute_mileage()
, enter_Batman_mode()
, etc. We can bundle all of these object-specific attributes and functions in our vehicle
object. In the programming world, this is known as object-oriented programming and it is very powerful.2
Our Spatial
objects have a series of attributes that describe things specific to spatial data such projections and bounding boxes. Let’s say I have loaded a shapefile in R
and called the new variable that stores this shapefile states
. I’ll show you how to load shapefiles below. For now, let’s just focus on how R
organizes the information stored in a shapefile. Let’s start by looking at the class
of the shapefile object:
## OGR data source with driver: ESRI Shapefile
## Source: "./data/states.shp", layer: "states"
## with 51 features
## It has 5 fields
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
class(states)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
Excellent. We confirmed that states
is, in fact, a spatial object. More precisely, it is a SpatialPolygonsDataFrame.
Remember, this means that there is a data.frame
with more info hiding somewhere in this object. Because this is an object, it is organized differently than a classic data.frame
whose variables can be accessed using $
. In R
, objects contain slots, which store different types of information or even functions specific to the object. We access different slots using the @
symbol. Let’s look at some of the slots stored in our states
object. We can access the actual data (or, in ArcGIS speak, the attribute table) using @data
:
states@data
## STATE_NAME DRAWSEQ STATE_FIPS SUB_REGION STATE_ABBR
## 0 Hawaii 1 15 Pacific HI
## 1 Washington 2 53 Pacific WA
## 2 Montana 3 30 Mountain MT
## 3 Maine 4 23 New England ME
## 4 North Dakota 5 38 West North Central ND
## 5 South Dakota 6 46 West North Central SD
## 6 Wyoming 7 56 Mountain WY
## 7 Wisconsin 8 55 East North Central WI
## 8 Idaho 9 16 Mountain ID
## 9 Vermont 10 50 New England VT
## 10 Minnesota 11 27 West North Central MN
## 11 Oregon 12 41 Pacific OR
## 12 New Hampshire 13 33 New England NH
## 13 Iowa 14 19 West North Central IA
## 14 Massachusetts 15 25 New England MA
## 15 Nebraska 16 31 West North Central NE
## 16 New York 17 36 Middle Atlantic NY
## 17 Pennsylvania 18 42 Middle Atlantic PA
## 18 Connecticut 19 09 New England CT
## 19 Rhode Island 20 44 New England RI
## 20 New Jersey 21 34 Middle Atlantic NJ
## 21 Indiana 22 18 East North Central IN
## 22 Nevada 23 32 Mountain NV
## 23 Utah 24 49 Mountain UT
## 24 California 25 06 Pacific CA
## 25 Ohio 26 39 East North Central OH
## 26 Illinois 27 17 East North Central IL
## 27 District of Columbia 28 11 South Atlantic DC
## 28 Delaware 29 10 South Atlantic DE
## 29 West Virginia 30 54 South Atlantic WV
## 30 Maryland 31 24 South Atlantic MD
## 31 Colorado 32 08 Mountain CO
## 32 Kentucky 33 21 East South Central KY
## 33 Kansas 34 20 West North Central KS
## 34 Virginia 35 51 South Atlantic VA
## 35 Missouri 36 29 West North Central MO
## 36 Arizona 37 04 Mountain AZ
## 37 Oklahoma 38 40 West South Central OK
## 38 North Carolina 39 37 South Atlantic NC
## 39 Tennessee 40 47 East South Central TN
## 40 Texas 41 48 West South Central TX
## 41 New Mexico 42 35 Mountain NM
## 42 Alabama 43 01 East South Central AL
## 43 Mississippi 44 28 East South Central MS
## 44 Georgia 45 13 South Atlantic GA
## 45 South Carolina 46 45 South Atlantic SC
## 46 Arkansas 47 05 West South Central AR
## 47 Louisiana 48 22 West South Central LA
## 48 Florida 49 12 South Atlantic FL
## 49 Michigan 50 26 East North Central MI
## 50 Alaska 51 02 Pacific AK
Our data! In data.frame
form. If you don’t believe me, class(states@data)
proves that a real live data.frame
is stored in our spatial object. All of our magical dplyr
tools apply… want to know how many states are in each sub-region?
states@data %>%
group_by(SUB_REGION) %>%
summarize(n_county = n())
## # A tibble: 9 x 2
## SUB_REGION n_county
## <fctr> <int>
## 1 East North Central 5
## 2 East South Central 4
## 3 Middle Atlantic 3
## 4 Mountain 8
## 5 New England 6
## 6 Pacific 5
## 7 South Atlantic 9
## 8 West North Central 7
## 9 West South Central 4
Is your mind blown? Tears streaming down your face? Aren’t you glad you invested all that time in learning about data.frames
and the dplyr
? Notice, however, that the output of this dplyr
ing is a data.frame
, not a sp
object. To keep the subset and cleaned shapefile in shapefile form, we have to use a different package called, unsurprisingly, spdplyr
, which you’ll learn about in another tutorial.
Let’s take a peak at the other info stored in our states
object. We can also access information about the bounding box of the spatial object:
states@bbox
## min max
## x -178.21760 -66.96927
## y 18.92179 71.40624
This is a matrix of bounding box coordinates with column names c(min,max)
with the first row showing the longitude (x-axis) and the second row showing the latitude (y-axis).
We can also access projection information:
states@proj4string
## CRS arguments:
## +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0
This slot holds a string that describes the projection of the dataset. This string should be in a PROJ.4
format.3 Wait, what? Hang on, I’ll go through the projection stuff in more detail below.
Together, these slots form a spatial object that we can plot. For now, we will use base R
to plot this object, but in a future class we’ll learn about more sophisticated mapping tools.
plot(states)
Now that you understand how a spatial object is structured and its slots are accessed, let’s move on to loading different types of spatial data into R
.
R
If you’re like me, you have folders full of shapefiles. A single shapefile is often split into many files (e.g. .cpg
, .dbf
, .prj
, etc.), the most important of which is the sacred .SHP
file. To get all of this info into an organized spatial object in R
is easy, and in fact, as is often the case in R
, there are multiple packages and functions that will do this for you. Today, we’ll learn how to use the rgdal
package to load a shapefile:
library(rgdal)
states <- readOGR(dsn = "./data/states.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "./data/states.shp", layer: "states"
## with 51 features
## It has 5 fields
summary(states)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x -178.21760 -66.96927
## y 18.92179 71.40624
## Is projected: FALSE
## proj4string :
## [+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0]
## Data attributes:
## STATE_NAME DRAWSEQ STATE_FIPS SUB_REGION
## Alabama : 1 Min. : 1.0 01 : 1 South Atlantic : 9
## Alaska : 1 1st Qu.:13.5 02 : 1 Mountain : 8
## Arizona : 1 Median :26.0 04 : 1 West North Central: 7
## Arkansas : 1 Mean :26.0 05 : 1 New England : 6
## California: 1 3rd Qu.:38.5 06 : 1 East North Central: 5
## Colorado : 1 Max. :51.0 08 : 1 Pacific : 5
## (Other) :45 (Other):45 (Other) :11
## STATE_ABBR
## AK : 1
## AL : 1
## AR : 1
## AZ : 1
## CA : 1
## CO : 1
## (Other):45
The dsn
argument is simply the path to the shapefile you want to open in R
. The summary()
of the shapefile provides some important information. I can immediately see that R
has read my shapefile in as a SpatialPolygonsDataFrame
. I can see the bounding box under the Coordinates
heading. I see that my data is not projected (Is projected: FALSE
) and the proj4string
string that gives me more information about the data’s coordinate reference system. I can also see the data attributes stored in the @data
part of the object and some descriptive stats about each variable. Nifty.
People tend to hate projections4 and generally don’t understand them very well. It is, however, VERY important that you project your data correctly. If you fail to do this, it’s highly likely that your spatial analyses will be wrong. So let’s learn how this stuff works once and for all.
Maps tend to be two-dimensional; the Earth is not. A coordinate system is a way to locate everything on Earth’s surface in x and y space. The way we portray (pseudo-)spherical Earth on a flat surface is by using a projection. Each projection is associated with a particular coordinate system. There are a bazillion coordinate systems, and each coordination system comes with strong assumptions and distortions (and fan-base):
12N
;Each coordinate system is tied to a datum. This is essentially a reference point, the starting point from which coordinates are measured (i.e. lat and lon are based on the equator and prime meridian). In R
we can access the projection information for a Spatial
object using the proj4string
attribute of the object:
states@proj4string
## CRS arguments:
## +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0
Excellent. Our data is currently in the WGS84 Geographic Coordinate System which plots locations on a sphere using longitude and latitude (but does not project these coordinates into two-dimensional space). This is fine if we’re only interested in visualizing our data spatially, but if we have any plans to use the spatial data to compute distance or area, then we need to project our data (from 3D to 2D). This is fairly straightforward in R
. Say we want to reproject this dataset into USA Contiguous Albers Equal Area Conic:
states <- spTransform(states, CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs"))
I got that crazy long string by searching for the projection on spatialreference.org and clicking the Proj4 link. R
expects projection information to come in this proj4
format.
If you happen to have a shapefile that has an undefined projection in R
, but you happen to know what projection the data is in, you can set the projection as follows:
shp <- readOGR("./path/to/shp.shp")
projection(shp) <- CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")
Finally, don’t forget that the map projection you use strongly impacts the way you visualize and analyze your data (see here or here)).
R
Let’s say I’m a volcanologist and I find a really cool dataset describing all eruptions on Earth during the Holocene.5 The data is stored in a .csv
document. We now know how to open the .csv
in R
and inspect its contents:
volcano <- read.csv('./data/volcano.csv')
glimpse(volcano)
## Observations: 1,496
## Variables: 13
## $ ï..Volcano.Number <int> 210010, 210020, 210030, 210040, 211001, 2...
## $ Volcano.Name <fctr> West Eifel Volcanic Field, Chaine des Pu...
## $ Country <fctr> Germany, France, Spain, Spain, Italy, It...
## $ Primary.Volcano.Type <fctr> Maar(s), Lava dome(s), Pyroclastic cone(...
## $ Activity.Evidence <fctr> Eruption Dated, Eruption Dated, Evidence...
## $ Last.Known.Eruption <fctr> 8300 BCE, 4040 BCE, Unknown, 3600 BCE, 1...
## $ Region <fctr> Mediterranean and Western Asia, Mediterr...
## $ Subregion <fctr> Western Europe, Western Europe, Western ...
## $ Latitude <dbl> 50.170, 45.775, 42.170, 38.870, 43.250, 4...
## $ Longitude <dbl> 6.850, 2.970, 2.530, -4.020, 10.870, 11.9...
## $ Elevation..m. <int> 600, 1464, 893, 1117, 500, 800, 949, 458,...
## $ Dominant.Rock.Type <fctr> Foidite, Basalt / Picro-Basalt, Trachyba...
## $ Tectonic.Setting <fctr> Rift zone / Continental crust (>25 km), ...
But we’re not here to learn how to work with .csv
data… we have now graduated to spatial data! So how do we turn our .csv
into a SpatialPointsDataFrame
(SpatialPoints
because we’re dealing with coordinates of eruptions and DataFrame
because there are more attributes than these coordinates, like Volcano.Name
and Country
)?
prj <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
select <- dplyr::select #this tells R to use the select() function from the dplyr package, not the select function stored in some of the spatial packages
coords <- volcano %>% select(Longitude, Latitude)
volcano_pt <- SpatialPointsDataFrame(coords = coords, data = volcano, proj4string = prj)
plot(volcano_pt)
Notice here that I had to define the projection of my dataset using the prj
variable. I make the prj
variable a CRS
object using the CRS()
function. This essentially smooshes the long projection string into a format that R
likes. Be sure you’ve formatted your projection data using the CRS()
function when you’re creating shapefiles. I have to create a coords
variable that stores the longitude and latitude for my point dataset. I used dplyr
to select these two columns from the full volcano
dataset. Then we mix it all together to bake a beautiful volcanic spatial cake using the SpatialPointsDataFrame()
function. To learn more about this function, you know what to do (?SpatialPointsDataFrame
).
Ohhh how I want to transform this ugly base R
plot into something better. That’s coming. For now, here’s a plot with base R
and some data from the maptools
package:
library(maptools)
data(wrld_simpl)
plot(wrld_simpl)
plot(volcano_pt, add=T, col='orange', pch=2)
Note the add=T
argument. This lets you add multiple layers to one plot. Could be useful for the homework! :)
Most of the polygon data I work with has already been built by someone else, so I don’t often build polygons from scratch. I can, however, think of many, many instances when this would be a very useful skill to master. I found a fantastic tutorial HERE on creating SpatialPolygon
objects. I’ll summarize it below.
Our ultimate objective is to have a complete SpatialPolygons
object. To do this in R
, we have to build each individual polygon and bundle them all together. Let’s start by building a set of simple polygons describing the outline of the main part of a house and of the roof (in this example, the coordinates are arbitrary; in the real world, you’d have actual coordinates). The rbind()
function pulls together each set of coordinates (i.e. c(1,1)
) and binds them together in separate rows to create a mini-matrix of coordinates. The hole=T
option in the house2.door
variable is a cool option that tells R
to view that polygon as detracting from the area of the larger polygon it is a part of, i.e. when you compute the surface area of House 2, don’t include the door.
house1.building <- Polygon(rbind(c(1, 1), c(2, 1), c(2, 0), c(1, 0)))
house1.roof <- Polygon(rbind(c(1, 1), c(1.5, 2), c(2, 1)))
house2.building <- Polygon(rbind(c(3, 1), c(4, 1), c(4, 0), c(3, 0)))
house2.roof <- Polygon(rbind(c(3, 1), c(3.5, 2), c(4, 1)))
house2.door <- Polygon(rbind(c(3.25, 0.75), c(3.75, 0.75), c(3.75, 0), c(3.25,
0)), hole=T)
Now, let’s bind the individual Polygon
objects into a Polygons
object for House 1 and House 2.
h1 <- Polygons(list(house1.building, house1.roof), "H1")
h2 <- Polygons(list(house2.building, house2.roof, house2.door), "H2")
And finally, let’s bundle everything into a SpatialPolygon
and plot our beautiful houses:
# create spatial polygons object from lists A SpatialPolygons is like a
# shapefile or layer.
houses <- SpatialPolygons(list(h1, h2))
plot(houses)
SpatialPolygons
are cool and all, but SpatialPolygonsDataFrames
are much cooler. How do we add data to our new SpatialPolygon
? First let’s make some attributes:
attr <- data.frame(attr1=c(0,100), attr2 = c(7, 24), row.names = c("H1", "H2"))
colnames(attr) <- c("Accessibility", "Party duration")
attr
## Accessibility Party duration
## H1 0 7
## H2 100 24
Now to make our SPDF
. We’ll learn more about spplot()
, the plotting tool in the sp
package in a later lesson:
houses.df <- SpatialPolygonsDataFrame(houses, attr)
spplot(houses.df)
Conclusion? If you want to have a party, make sure your house has a door.
You can apply the same basic principles to make a SpatialLines
object.
For shapefiles, I recommend using the rgdal
package. Here, I’m writing out the houses.df
SpatialPolygonDataFrame
we built above:
writeOGR(obj = houses.df, dsn="./data/house.shp", layer="house", driver="ESRI Shapefile", overwrite=T)
The writeOGR()
function requires you input the name of the object you want to save (obj
), the destination for the object (dsn
), the layer name you want to give the object (layer
) and the driver, or data type, you want to write out (driver
). I’ve written out a classic ESRI Shapefile
but there are lots of other drivers you can use found here.
There’s lots more information on writing out geospatial data in R
here.
sp
package.I know, I know… you’re dying to know if this means we can use the dplyr
with spatial data!? YES WE CAN! We’ll get there.↩
If you’re interested in how spatial objects are built in R
, check this out↩
Check out this website for allllll the info you could ever want about proj.4
strings.↩
The only person who was ever pumped about projections was an old Flemish dude named Mercator.↩
Volcanologists rejoice! This dataset exists and I encourage you to use it in your class project.↩