R
using the terra
package.The main tool we’ll be working with in this tutorial is the
terra
package, so go ahead and install/load that. You
can work with rasters using the older raster
, but
the terra
package is quickly becoming the best package for
gridded data in R
, so we’ll focus on that!
library(terra)
library(sf)
library(tidyverse)
Raindrops on roses and rasters of land use… these are a few of my favorite things. Rasters are fantastic tools to store massive amounts of massively cool data. Think of a raster as a matrix… or a grid of numbers. Each pixel in the raster has number associated with it that tells us something about the data of interest, i.e.
Let me pause here and state that you are very lucky because you have insanely easy access to incredible, beautiful visualizations of our precious planet.. in near real time, all the time, anywhere. Take a second to ponder that. Then go download Google Earth and start playing.
The data stored in rasters can be continuous (precipitation, elevation, reflectance) or categorical (land use category). In this tutorial we’ll work with both. For a raster to be a raster, we need a few key pieces of information:
When you’re working with raster data, especially satellite imagery, think carefully about trade-offs between resolution in space and time. Some datasets are available daily at a low spatial resolution; others are available monthly at a high spatial resolution; and others are high spatial and temporal resolution, but massively large and hard for you to work with on your laptop.
The main data type you’ll be working with in raster-world is the
SpatRaster
. Wuzdat? It’s basically a data structure that
allows R to read and understand gridded data. This data can be two
dimensional (so think a single picture) or three-dimensional (so think a
STACK of pictures, for example a stack of rasters showing land use in
one place through time). Note that terra
also supports
SpatVectors
, so points, lines and polygons like we’ve been
working with in sf
, but we’ll stick to sf
in
this class.
OK, so what are the basic ingredients you need to “bake” a raster? Well, think of the things that describe a picture (which is just a big matrix of pixels). A picture has:
Let’s start by making a simple raster from scratch. Then we’ll play
with some categorical and continuous raster data using the
terra
package.
my_rast <- rast()
Congrats! You just made a raster called my_rast
using
the rast()
function. If you ask R
what the
class()
of this new thing you built is, you’ll see it’s a
SpatRaster
:
class(my_rast)
## [1] "SpatRaster"
## attr(,"package")
## [1] "terra"
You can get more information about the raster by simply typing the name of the raster in the console:
my_rast
## class : SpatRaster
## dimensions : 180, 360, 1 (nrow, ncol, nlyr)
## resolution : 1, 1 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
Cool. We have made a raster with 180 rows, 360 columns and 1 layer
(so it’s not a stacked raster with multiple layers, e.g. showing change
through time or different wavelengths of light, it’s just ONE picture).
It’s got a resolution of 1x1. 1x1 what…? Let’s ask R
for
more information about the projection of this data (which we can see is
lon/lat WGS 84
above):
crs(my_rast)
## [1] "GEOGCRS[\"WGS 84 (CRS84)\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n USAGE[\n SCOPE[\"unknown\"],\n AREA[\"World\"],\n BBOX[-90,-180,90,180]],\n ID[\"OGC\",\"CRS84\"]]"
Well that’s a ton of info - but check out the piece that says
LENGTHUNIT
… you’ll see that the unit length here is in
meters. Note that this isn’t always the case. Sometimes the
resolution is reported in degrees, kilometers, and (for rogue Americans)
miles. So always check this to understand your data and to make sure it
is being read by R
correctly.
OK, so these are the default geometry parameters for a raster in
R
, but what if we want to make a special, fancy
raster with parameters we define?
my_fancy_rast <- rast(ncol = 10, nrow = 10)
ncell(my_fancy_rast)
## [1] 100
Now I’ve made a 10x10 raster with 100 cells. Cool.
hasValues(my_fancy_rast)
## [1] FALSE
But sadly there’s no data in this raster (aka, the cells don’t have any actual values, so we can’t really visualize anything here). It’s kinda’ like we have an empty matrix - it has specific dimensions, but is empty. Let’s add some random values:
values(my_fancy_rast) <- runif(ncell(my_fancy_rast)) # runif takes random numbers from a uniform distribution
plot(my_fancy_rast, main = "LOOK AT MY FANCY RASTER:")
Well congrats on your raster. But fortunately, we rarely have to make
our own rasters from scratch. Most of the time, we load rasters
already built by other folks (thanks NASA!). These data can
take many many many different forms. In this tutorial, we will focus on
learning how to load and manipulate the most commonly used types of
raster data. There are tons of raster extensions out there:
.grd
, .asc
, .sdat
,
.rst
, .nc
, .tif
,
.envi
, .bil
, .img
,
.hdf
. I’ve encountered all of these at some point… and the
terra
package can handle just about all of them.
We’ll get to raster stacks later in the tutorial. For now, let’s work with a single layer of my current favorite raster data set, the CropScape dataset.. This is a categorical raster that covers most of the country from 2000 to present at a 30 meter - YES T-H-I-R-T-Y meter - resolution. Since the data set was built by the folks at the U.S. Department of Agriculture, there are more detailed categorizations for agricultural lands than for ecosystems.1
I went to the CropScape website and downloaded data for DeKalb County, GA in 2022. To do this, click “Select Area” and type in the state (Georgia) and county (DeKalb). This should take you to DeKalb County. You’ll see lots of gray which is (you guessed it) - urban cover. Play with the slider bar at the bottom of the map to see if you can spot any interesting changes in land use through time with your eyes (you can use the Animate tool for this). Then, make sure you select 2022, and click Export in the top right corner of the screen. Again, confirm you’re downloading data for 2022. Note the projection (Web Mercator (WGS 84)) and especially the file type. I recommend sticking with GeoTiffs. This file structure can hold both the data (and thus the image) but also the other relevant information we need for spatial analysis (projection, extent, etc…). DPI of 96 is fine, then click export and Download Exported Area. Unzip the folder you’ve downloaded. In it, you should see several files… we need ALL of these files to load the raster in R, so move them ALL to your data folder.
Then you can load your raster with the easiest function ever
(pointing to the .tif
file):
cdl <- rast("./data/clipped.tif")
cdl
## class : SpatRaster
## dimensions : 1888, 1609, 1 (nrow, ncol, nlyr)
## resolution : 30, 30 (x, y)
## extent : -9395692, -9347422, 3972700, 4029340 (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / Pseudo-Mercator (EPSG:3857)
## source : clipped.TIF
## color table : 1
## categories : Class_Name
## name : Class_Name
## min value : Corn
## max value : Blueberries
OK, cool, what can we see here? We have a raster with 1888 rows and 1609 columns. Only one layer (so only 2022). The extent tells us about the coordinates (in the WGS84 CRS) that define the extent of the raster. Wait a sec… blueberries!? Corn!?
plot(cdl)
Before we dive into this categorical raster a bit more, let’s pull out some basic info about the data:
ext(cdl) # extent
## SpatExtent : -9395691.70219402, -9347421.70219402, 3972699.54617784, 4029339.54617784 (xmin, xmax, ymin, ymax)
print(paste("This groovy data has", ncol(cdl), "columns.", sep = " "))
## [1] "This groovy data has 1609 columns."
print(paste("This groovy data has", nrow(cdl), "rows.", sep = " "))
## [1] "This groovy data has 1888 rows."
Or we can simply call:
dim(cdl)
## [1] 1888 1609 1
We can also find the resolution of our data:
res(cdl)
## [1] 30 30
30 meters!!!!!!!!!!!! Can you believe it!? Go, science, go!!!!
There’s also the projection information (crs(cdl)
) and
the data itself (values(cdl)
). The nice thing about
terra
objects is that you don’t need to specify the
values(cdl)
piece for a number of useful functions to work.
Need the max and min values of the raster?
range(cdl)
## class : SpatRaster
## dimensions : 1888, 1609, 2 (nrow, ncol, nlyr)
## resolution : 30, 30 (x, y)
## extent : -9395692, -9347422, 3972700, 4029340 (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / Pseudo-Mercator (EPSG:3857)
## source(s) : memory
## names : range_min, range_max
## min values : 1, 1
## max values : 242, 242
Let’s see a histogram of our pixel values:
hist(cdl)
Any NA
s? Here are a few ways to answer that question:
length(cdl[is.na(cdl)])
## [1] 1906311
global(cdl, fun = "isNA")
## isNA
## Class_Name 1906311
global(cdl, fun = "notNA")
## notNA
## Class_Name 1131481
The global()
function lets you apply functions (even
those you build!) to your raster. Here, we’re just counting the amount
if isNA
(pixels that are NA
) and
notNA
(pixels that are not NA
in our
raster).
Lots of missing data! What percent of pixels are missing?
length(cdl[is.na(cdl)])/(ncell(cdl))
## [1] 0.6275318
Why so much missing data? Look at the plot of our raster. Raster datasets are stored with a well defined bounding box, so they are always rectangular. The region of interest, however, is rarely rectangular (unless you’re in one of those states with a bunch of square counties). This leads to a lot of missing data around the edges. Check for this and be wary of missing data when you compute summary stats on your raster… You can quickly check to see the value of this “black space” in the raster plot by indexing pixels at the corners of the raster, i.e.:
cdl[1,1]
## Class_Name
## 1 <NA>
This returns the value of the pixel at location row = 1, column = 1,
which is stored as NA
. Note that if all this missing data
around your raster really bothers you, you can use the
trim()
function to remove outer rows and columns that only
contain NA
values.
Another nice trick to help you check the contents of your raster: you
can use the freq()
function to build a frequency table for
your data. Probably only do this for categorical data, or you’ll have a
gigantic frequency table:
freq(cdl)
## layer value count
## 1 1 Corn 75
## 2 1 Cotton 37
## 3 1 Soybeans 149
## 4 1 Winter Wheat 37
## 5 1 Dbl Crop WinWht/Soybeans 29
## 6 1 Rye 10
## 7 1 Oats 6
## 8 1 Millet 10
## 9 1 Other Hay/Non Alfalfa 984
## 10 1 Tomatoes 1
## 11 1 Sod/Grass Seed 5
## 12 1 Peaches 7
## 13 1 Other Tree Crops 1
## 14 1 Pecans 6
## 15 1 Open Water 10985
## 16 1 Developed/Open Space 296101
## 17 1 Developed/Low Intensity 261846
## 18 1 Developed/Med Intensity 147132
## 19 1 Developed/High Intensity 79371
## 20 1 Barren 9384
## 21 1 Deciduous Forest 137228
## 22 1 Evergreen Forest 126904
## 23 1 Mixed Forest 17696
## 24 1 Shrubland 2525
## 25 1 Grassland/Pasture 28467
## 26 1 Woody Wetlands 12360
## 27 1 Herbaceous Wetlands 102
## 28 1 Dbl Crop WinWht/Corn 3
## 29 1 Dbl Crop Oats/Corn 1
## 30 1 Dbl Crop WinWht/Sorghum 16
## 31 1 Dbl Crop Soybeans/Oats 1
## 32 1 Blueberries 2
Or if you want to know the frequency of one specific land use category, check this out:
freq(cdl, value = 1) # Corn
## layer value count
## 1 1 1 75
You can use your plotting skillz to turn this into a ggplot:
library(tidyverse)
freq <- as.data.frame(freq(cdl)) %>% filter(value != "NA")
ggplot(data = freq) +
geom_bar(aes(x = reorder(value, desc(count)), y = count), stat = "identity") +
xlab("Land use category") + ylab("Count of pixels") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ggtitle("What's going on in our county?")
Oftentimes two raster datasets will not line up because of
differences in resolution and/or extent. The
projectRaster()
function from the raster
package lets us reproject datasets so they can talk to one another.
Let’s look at the projection info for our cdl
raster:
crs(cdl)
## [1] "PROJCRS[\"WGS 84 / Pseudo-Mercator\",\n BASEGEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4326]],\n CONVERSION[\"unnamed\",\n METHOD[\"Popular Visualisation Pseudo Mercator\",\n ID[\"EPSG\",1024]],\n PARAMETER[\"Latitude of natural origin\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8801]],\n PARAMETER[\"Longitude of natural origin\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8802]],\n PARAMETER[\"False easting\",0,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8806]],\n PARAMETER[\"False northing\",0,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8807]]],\n CS[Cartesian,2],\n AXIS[\"easting\",east,\n ORDER[1],\n LENGTHUNIT[\"metre\",1]],\n AXIS[\"northing\",north,\n ORDER[2],\n LENGTHUNIT[\"metre\",1]],\n ID[\"EPSG\",3857]]"
This is stored in CRS
format, which now you guys know
all about. Let’s say we want to transform our raster from its current
Albers Equal Area projection to a US
National Atlas Equal Area projection:
crs_code <- '+proj=utm +zone=11 +ellps=GRS80 +datum=NAD83 +units=m +no_defs'
cdl_albers <- project(cdl, crs_code)
Note that this can take awhile depending on the size of your raster.
It’s like adjusting a very elaborate patchwork quilt over a complex
surface. Also, BE SURE to think about whether your data
is categorical or continuous before you begin reprojecting. Reprojecting
rasters means that the cell centroids and even size may change. To
assign values to the new cells in the reprojected raster,
projectRaster()
may have to compute new values for the
reprojected cells using one of two interpolation procedures: nearest
neighbor (near
) or bilinear (bilinear
) which
is the default. If you apply bilinear
interpolation to a
categorical raster with integer values like 1, 22, 33, 4
etc, you’ll end up with pixels with values of averages of these
categories (11.5, 27.5
). This totally destroys your
categorization. So when you’re working with categorical rasters
like the CropScape data, be sure you set method = "near"
in
your reprojection. You can read more about the methods for projecting
different types of rasters by typing ?project
in the
console.
If you want to find the difference between values in two rasters, it’s fairly complicated:
diff <- raster_1 - raster_1
Or, if for some reason you wanted to add 10
to all cells
in our CDL raster:
newcdl <- cdl + 10
Don’t forget that a raster is just a matrix, so math is a breeze! Well, kinda… raster math that’s this simple only works if your rasters have the same extent and resolution and if the calculations are fairly simple. You can see the full set of operations you can use here.
You can also use simple R indexing to replace values in the raster. If for some reason, you wanted to turn all instances of corn (pixel values of 1) to soy (pixel values of five), you simple do the following:
nocorn <- cdl
nocorn[nocorn == 1] <- 5
Be careful when you replace values in the ORIGINAL raster you loaded.
This changes the cdl
SpatRaster
you’re working
with in this session of R. I recommend making a copy of the data when
you’re doing big replacements, so you can keep the unaltered original
data for other analyses.
If you want a single value summarizing the entire raster
(so, for example, the sum of the values of all pixels in raster), use
the global()
function we saw above:
global(cdl, "isNA")
## isNA
## Class_Name 1906311
If you want to do a deep read of other more complex things you can do with rasters in R, read this chapter of the RSpatial textbook.
Cropping rasters in R
is pretty straightforward. If you
know the coordinates of your new raster extent, then:
new.extent <- extent(c(-125, -100, 20, 50))
out <- crop(raster, new.extent)
You can get the new.extent
values from an existing
raster by calling ext(new_extent_raster)
. You can also pull
this info out of a shapefile.
If you want to crop to a specific shape rather than the
extent of that shape, then you need to add an additional step. Say we
want to extract the part of this land use raster located in the city of
Decatur2
(note that these two spatial datasets have to be in the same projection
for the crop()
to work):
shp <- readRDS("./data/decatur.RDS")
shp <- st_transform(shp, crs(cdl))
plot(cdl)
plot(shp, add=T, col = "red")
city_raster <- crop(cdl, shp)
plot(city_raster)
Notice that this only cropped the land use raster to the extent of the Decatur shapefile, not to the actual boundaries of the shapefile. To do that, we have to add one more step:
city_raster <- crop(cdl, shp, mask = T)
plot(city_raster)
mask=T
basically tells R
that we want to use
the shapefile as a mask for the raster and mark as NA
anything that isn’t within the shapefile boundaries we’re using.
Masking raster data is something we need to do quite a bit in the
remote sensing world. We can do this using the mask()
function. Let’s say we want to mask all pixels not classified
as “Developed - High Intensity” (124
) in our new city
raster:
city_raster[city_raster == 124] <- NA
plot(city_raster)
ggplot
So how do we pull this all together into a nice visual? With the
tidyterra
package, terra
objects can play nice
with tidyverse
tools. Be sure to install
tidyterra
, then try this:
library(tidyterra)
ggplot() +
geom_spatraster(data = city_raster) +
geom_sf(data = shp, fill = "transparent") +
theme_bw() +
labs(fill = "",
title = "LULC in Decatur, GA")
There’s a lot more you can do with tidyterra
, so read more
here!
Great! Now that we know how to do some basic things with categorical rasters in R, let’s look at some multi-band and continuous data. Let’s play with the ENACTS-BMD data created by Dr. Nachiketa Acharya and his colleagues (you can read more about the data here). This gridded data describes daily temperature and rainfall observations in Bangladesh. Today, we’ll work with a small subset of this data describing rainfall in January 2022.
jan22 <- rast("./data/JAN22.tif")
jan22
## class : SpatRaster
## dimensions : 134, 108, 31 (nrow, ncol, nlyr)
## resolution : 0.04999999, 0.05 (x, y)
## extent : 87.65, 93.05, 20.35, 27.05 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : JAN22.tif
## names : precip_1, precip_2, precip_3, precip_4, precip_5, precip_6, ...
## min values : 0.000000, 0.00000000, 0.000000, 0.000000000, 0.000000, 0.000000, ...
## max values : 3.211378, 0.01945309, 4.411213, 0.006855291, 2.826128, 3.603938, ...
## time (days) : 2022-01-01 to 2022-01-31
Note that this raster as 31 layers. Each layer represents precipitation for one day in January 2022. For example, if I wanted to visualize precipitation on January 11th, I could do this:
plot(jan22[[11]])
Similarly, I could ask for mean precipitation for the entire country on January 11th:
global(jan22[[11]], fun = mean, na.rm=T)
## mean
## precip_11 1.258884
We could also sum total precipitation for each pixel in Bangladesh for the month of January as follows:
totalJan <- app(jan22, fun = sum)
plot(totalJan)
Pretty cool, huh?!
You can write out a raster using the writeRaster()
function in the raster
package.
writeRaster(my_raster, "my/directory/raster.tif") # you can use whatever extension you like - I prefer .tif
# note that if you want to write netCDF files, use writeCDL()
USGS data repositories allow you to search for data by time, year, satellite, resolution, etc.:
You can also download data directly from NASA (easiest when downloading large time-series). University of Maryland maintains a database of pre-processed images for the remote sensing of land surface processes. You can find lots of DEMs here (ASTER and STRM are the most widely used). You can find cool, free high-resolution topographic and bathymetric data here. This website groups data sources by application (e.g. oceanography, land surface processes, forecasting) and can be helpful to explore.
RSpatial
text provides more information
about remote sensing in R
.