The main tool we’ll be working with in this tutorial is the raster package, so if you’re planning on working with lots of raster data, be sure to review this chapter to learn about the tools stored in the package. We’ll also lean on R indexing, which you should be familiar with by now. Go ahead and install/load the raster package. You can work with rasters in the sp package by loading a raster as a SpatialGridDataFrame, but the raster package is much better, so we’ll focus on that!
library(raster)
Raindrops on roses and rasters of land use… these are a few of my favorite things. Rasters are fantastic tools to store massive amounts of massively cool data. Think of a raster as a matrix (NO NOT THAT MATRIX)… or a grid of numbers. Each pixel in the raster has a number associated with it that tells us something about the data of interest (e.g. the elevation or land use category at that location).
Let me pause here and state that you are some of the luckiest people to ever live because you have insanely easy access to incredible, beautiful visualizations of our precious planet.. in near real time, all the time, anywhere. Take a second to ponder that. Then go download Google Earth and start playing.
The data stored in rasters can be continuous (precipitation, elevation, reflectance) or categorical (land use category). In this tutorial we’ll work with both. For a raster to be a raster, we need a few key pieces of information: the spatial extent, the resolution (pixel size), the projection (CRS), and the pixel values themselves.
When you’re working with raster data, especially satellite imagery, think carefully about trade-offs between resolution in space and time. Some datasets are available daily at a low spatial resolution; others are available monthly at a high spatial resolution; and others are high spatial and temporal resolution, but massively large and hard for you to work with on your laptop.
The raster package includes several raster classes:

RasterLayer contains a single-layer raster. This object stores the number of columns and rows in the raster, the spatial extent, the pixel values, and the projection information (stored in CRS format).

RasterBrick and RasterStack are great for multiband data (i.e. multiple spectral bands, or observations through time). The main difference is that a RasterBrick can be linked to only one (though multi-layer) file, while a RasterStack can be made from multiple files (a band from one file merged with a band from another file) - though these objects have to have the same extent and resolution.

You’ve already learned how to make a raster from scratch, so we won’t cover that here. Instead, we’ll focus on learning how to load and manipulate the most commonly used types of raster data. There are tons of raster extensions out there: .grd, .asc, .sdat, .rst, .nc, .tif, .envi, .bil, .img, .hdf. I’ve encountered all of these at some point… and the raster package can handle just about all of them. You can find the full list in the raster package documentation. And if you still feel confused about the different ways of storing raster data in the raster package, check this out. And finally, if you’re one of the unlucky ones whose data comes in HDF4 or HDF5 format (ahem NASA, get your stuff together!), read this. You’ll definitely want the HDFView tool.
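To make the class distinction concrete, here’s a minimal sketch (toy matrices, not real data) that builds each class in memory:

```r
library(raster)

# Two single-layer rasters (RasterLayer) with matching extent/resolution
r1 <- raster(matrix(1:16, nrow = 4))
r2 <- raster(matrix(16:1, nrow = 4))

# A RasterStack can combine layers from different objects/files
s <- stack(r1, r2)

# A RasterBrick holds the same layers as a single multi-layer object
b <- brick(s)

nlayers(s)  # 2
nlayers(b)  # 2
```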
We’ll get to bricks later in the tutorial. For now, let’s work with a single layer of my current favorite raster dataset, the CropScape dataset. This is a categorical raster that covers most of the country from 2000 to present at a 30 meter - YES T-H-I-R-T-Y meter - resolution. Since the dataset was built by the folks at the U.S. Department of Agriculture, there are more detailed categorizations for agricultural lands than for ecosystems.1
I went to the CropScape website and downloaded data for Cache Valley in 2016. To do this, click on the little USA map at the top of the page, select County, find Cache, then click Submit. The map should zoom to Cache Valley. Next, click the little folder icon with a green arrow on it. Select 2016, click Submit, and the data should download to your computer. You’ll have to unzip the files and move both files to a local data folder. So I did all that, and can load my raster with the easiest function ever:
cache <- raster("./data/CDL_2016_49005.tif")
plot(cache)
Oh wow that’s pretty. The value of each pixel is indicative of the land use in that pixel (i.e. 1 = CORN, etc). You can download the metadata here. What’s all that pink stuff? That’s alfalfa, and that’s a longggg conversation for another day. The data we downloaded is a GeoTiff, which raster handles seamlessly. Our new cache object is a raster object. What’s stored in this object? We have extent and dimensions:
cache@extent
## class : Extent
## xmin : -1323615
## xmax : -1267665
## ymin : 2148435
## ymax : 2223375
print(paste("This groovy data has", cache@ncols, "columns.", sep = " "))
## [1] "This groovy data has 1865 columns."
print(paste("This groovy data has", cache@nrows, "rows.", sep = " "))
## [1] "This groovy data has 2498 rows."
Or we can simply call:
dim(cache)
## [1] 2498 1865 1
We can also find the resolution of our data:
res(cache)
## [1] 30 30
30 meters!!!!!!!!!!!! Can you believe it!? Go, science, go!!!!
There’s also the projection information (cache@crs) and the data itself (cache@data). The nice thing about raster objects is that you don’t need to specify the @data piece for a number of useful functions to work. Need the max and min values of the raster?
maxValue(cache)
## [1] 255
minValue(cache)
## [1] 0
Let’s see a histogram of our pixel values:
hist(cache)
Lots of low values… possibly some zeros?
length(cache[cache == 0])
## [1] 1283028
Lots of zeros! What percent of pixels are zero?
length(cache[cache == 0])/(cache@ncols*cache@nrows)
## [1] 0.2754006
Ooohh 27 percent? This likely means a value of zero indicates missing data. Look at the plot of our raster. Raster datasets are stored with clear extents, so they are always rectangular. The region of interest, however, is rarely rectangular (unless you’re in one of those states with a bunch of square counties). This leads to a lot of missing data around the edges. Check for this and be wary of missing data when you compute summary stats on your raster… You can quickly check the value of this “black space” in the raster plot by indexing pixels at the corners of the raster, i.e.:
cache[1,1]
##
## 0
This returns the value of the pixel at location row = 1, column = 1. As we expected, the value is 0. I know from looking at the CropScape metadata that 0 indicates missing data, so let’s set our 0 pixels to NA:
cache[cache == 0] <- NA
minValue(cache)
## [1] 1
maxValue(cache)
## [1] 229
Now that I’ve walked you through this example, I’ll just tell you that trim() will drop the missing data around the edge of the dataset. Sorry. But you still needed to know how to replace raster values:
cache <- trim(cache)
plot(cache)
Alternatively, there is an extend() function that lets you add a buffer of NAs around a raster… this could be useful if you have a smaller raster and want it to have the same extent as a larger raster.
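Here’s a minimal sketch of that padding trick with extend(), using a toy raster (all object names here are made up):

```r
library(raster)

# A small 10 x 10 raster with 1-unit resolution
small <- raster(nrow = 10, ncol = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(small) <- 1:100

# Pad with NA cells so it spans a larger extent
big_extent <- extent(-5, 15, -5, 15)
small_padded <- extend(small, big_extent)
dim(small_padded)  # 20 rows, 20 columns
```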
Another nice trick: you can use the freq() function to build a frequency table for your data. Probably only do this for categorical data, or you’ll end up with a gigantic frequency table:
freq(cache)
## value count
## [1,] 1 41082
## [2,] 4 691
## [3,] 12 30
## [4,] 21 45817
## [5,] 23 3692
## [6,] 24 92755
## [7,] 28 1652
## [8,] 33 34813
## [9,] 36 355800
## [10,] 37 77071
## [11,] 43 1281
## [12,] 49 16
## [13,] 57 3
## [14,] 59 239
## [15,] 61 46904
## [16,] 66 56
## [17,] 67 63
## [18,] 68 10
## [19,] 70 1
## [20,] 111 22427
## [21,] 121 66079
## [22,] 122 52769
## [23,] 123 21041
## [24,] 124 7690
## [25,] 131 5098
## [26,] 141 897396
## [27,] 142 578479
## [28,] 143 7924
## [29,] 152 865163
## [30,] 176 90818
## [31,] 190 13812
## [32,] 195 43205
## [33,] 205 1850
## [34,] 229 15
## [35,] NA 1276169
Oftentimes two raster datasets will not line up because of differences in resolution and/or extent. The projectRaster() function from the raster package lets us reproject a raster so that it lines up with another. Let’s look at the projection info for our cache raster:
cache@crs
## CRS arguments:
## +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0
## +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
This is stored in CRS format, which now you guys know all about. Let’s say we want to transform our raster from its current Albers Equal Area projection to a UTM Zone 11 projection:
crs <- '+proj=utm +zone=11 +ellps=GRS80 +datum=NAD83 +units=m +no_defs'
cache_prj <- projectRaster(cache, res = res(cache), crs = crs, method = "ngb")
cache_prj@crs
Note that this can take awhile depending on the size of your raster. Also, BE SURE to think about whether your data is categorical or continuous before you begin reprojecting. Reprojecting a raster means that the cell centroids and even the cell sizes may change. To assign values to the new cells, projectRaster() computes new values using one of two interpolation procedures: nearest neighbor (ngb) or bilinear (bilinear), which is the default. If you apply bilinear interpolation to a categorical raster with integer values like 1, 22, 33, 4 etc., you’ll end up with pixels whose values are averages of these categories (11.5, 27.5). This totally destroys your categorization.
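Here’s a toy demonstration of the problem using resample() (the same ngb/bilinear choice applies in projectRaster()); the rasters are made up:

```r
library(raster)

# A 2 x 2 "categorical" raster with the codes from the example above
cats <- raster(nrow = 2, ncol = 2, xmn = 0, xmx = 2, ymn = 0, ymx = 2)
values(cats) <- c(1, 22, 33, 4)

# A finer target grid over the same extent
target <- raster(nrow = 4, ncol = 4, xmn = 0, xmx = 2, ymn = 0, ymx = 2)

# Nearest neighbor preserves the original category codes
unique(values(resample(cats, target, method = "ngb")))

# Bilinear produces in-between values that aren't valid categories
unique(values(resample(cats, target, method = "bilinear")))
```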
Another common raster problem is resolution mis-match. What happens if you’ve got a 30 meter resolution and a 1 kilometer resolution raster that you want to have talk to each other through raster math? So my bigtimeruleofthumb for mixed resolution data is that you HAVE TO go with the coarser resolution data when you resample… if you resample data to a higher resolution, you’re lying to yourself and pretending that you have better data than you actually have. You don’t have two 30 meter datasets. You have one awesome 30 meter dataset and one less-than-awesome 1 km dataset. If you want to interact the two, then you really have two 1 km datasets. Bummer, huh? But it’s honest.
Say we want to compute the most common land use category in a 1 kilometer grid. Our cache
dataset is a 30 meter dataset. We can fit 33.3333333 30 meter pixels along the edge of a single 1 kilometer pixel.
cache_1k <- aggregate(cache, fact = 33.3333, fun = modal, na.rm=T)
res(cache_1k)
## [1] 990 990
hist(cache_1k)
aggregate() is a cool function that lets you coarsen the resolution of your raster. Here we essentially move from a 30 meter pixel to a 990 meter pixel (33 pixels × 30 meters, approximately 1 km; the fact argument is rounded to the nearest integer, 33). The scaling factor is set as fact. Pixel values in the new raster are determined by a function of your choosing (fun). In our case, we compute the modal land use and assign that value to our new raster.
What if you don’t want a 990 meter resolution dataset? What if you need something that’s exactly 1000 meters (1 km)? resample() can help:
ideal_ras <- raster(ncol = 55, nrow = 75)
extent(ideal_ras) <- extent(cache_1k)
res(ideal_ras) <- c(1000,1000)
cache_1k_2 <- resample(cache_1k, ideal_ras, method = "ngb")
res(cache_1k_2)
## [1] 1000 1000
hist(cache_1k_2)
resample() is even easier if you have a raster in the projection/extent you desire. Say that raster is called ideal_ras… then you simply call resample(current_ras, ideal_ras, method = "ngb"). This is also a way to do subtle shifts in rasters, like shifting centroids.
Other useful functions for moving your raster around:

shift() performs linear shifts of your raster
rotate() - you guessed it - rotates your raster
flip() flips your raster over its horizontal or vertical axis

If you want to find the difference between values in two rasters, it’s fairly complicated:
diff <- raster_1 - raster_2
Don’t forget that a raster is just a matrix, so math is a breeze! Well, kinda… raster math that’s this simple only works if your rasters have the same extent and resolution and if the calculations are fairly simple. When datasets don’t match or you’re interested in complicated functions, the overlay() function is your friend. overlay() takes two or more rasters and applies the same function to them in a way that’s a bit more efficient than raster math or for-looping. If you get to a place where you think you’ll need overlay(), let me know. Otherwise, know that the syntax goes a little something like this:
out <- overlay(raster_1, raster_2, fun = mySpecialFunction)
It’s a good tool to know about, just in case. If your dataset is stacked or bricked, you’ll want to use calc() instead of overlay().
If you want to compute some statistic on the entire raster, use cellStats():
ras_mean <- cellStats(raster_1, stat = "mean", na.rm=T)
calc() is a useful function to know about. Say you have a raster brick (latitude by longitude by time) with observations of land use from 2008 to 2015 and you want to compute the most frequent category of land use through time. This is fairly simple:
calc(my_brick, fun = modal, na.rm = T)
Here calc() basically extracts a vector of observations (through time in this case) for each pixel and applies the function specified in fun to each vector.
Another nice function is stackApply(). You can use this to apply a function to a subset of rasters in a raster brick. You just have to provide the indices (i.e. locations) of the rasters of interest, for example:
stackApply(my_brick, fun = modal, indices = c(1, 1, 1, 2, 2, 2), na.rm = T)
In this case, if our my_brick object has six bands, our indexing approach tells stackApply() to compute the mode of the first three layers and then of the second three layers. Useful!
More on raster math here and here.
You can index a raster just like a matrix. If, for whatever reason, we want to change all pixels in our cache_1k raster with a value of 4 to 938, we can do that using simple base R indexing:
cache_1k[cache_1k == 4] <- 938
Say you want to do this reclassification for many categories. Then you’ll need to use the reclassify() function. To use this function, you’ll have to build a sort of “translation” matrix that tells R which values you want to reclassify. For example, if you want to classify values from 1 to 5 as 333, then you’d need to do the following:
new_cache <- reclassify(cache_1k, c(1, 5, 333))
# c(1, 5, 333) as c(from, to, as)
plot(new_cache)
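If you have several ranges to reclassify at once, reclassify() also accepts a matrix with one c(from, to, becomes) row per range. Here’s a sketch using the cache_1k raster from above (these groupings are hypothetical, not real CropScape classes):

```r
# Each row reads c(from, to, becomes); the ranges here are made up
rcl <- rbind(c(1,   60,  1),
             c(110, 150, 2),
             c(151, 255, 3))
grouped_cache <- reclassify(cache_1k, rcl)
plot(grouped_cache)
```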
Cropping rasters in R is pretty straightforward. If you know the coordinates of your new raster extent, then:
new.extent <- extent(c(-125, -100, 20, 50))
out <- crop(my_raster, new.extent) # my_raster is your raster object
You can get the new.extent values from an existing raster by calling extent(new_extent_raster). You can also pull this info out of a shapefile, but you’ll have to slightly tweak the format of the extent by forcing it into a matrix format using matrix():
new.extent <- matrix(st_bbox(new_extent_shapefile), nrow = 2) # st_bbox() comes from the sf package
If you want to crop to a specific shape rather than the extent of that shape, then you need to add an additional step. Say we want to extract the part of this land use raster located in the municipality of North Logan using a shapefile of North Logan called nl (note that these two spatial objects have to be in the same projection for the crop() to work):
nl_raster <- crop(cache, nl)
plot(nl_raster)
plot(nl, add=T)
Notice that this only cropped the land use raster to the extent of the North Logan shapefile, not to the actual boundaries of the shapefile. To do that, we have to add one more step:
nl_clean <- mask(nl_raster, mask = nl)
plot(nl_clean)
Masking raster data is something we need to do quite a bit in the remote sensing world. We can do this using the mask() function. Let’s say we want to mask out everything except pixels classified as alfalfa (36) in our cache dataset:
alfalfa <- mask(cache, cache == 36, maskvalue=F)
plot(alfalfa)
Here maskvalue=F masks all pixels that are NOT classified as alfalfa. If we set this to T, we mask all pixels classified as alfalfa:
not_alfalfa <- mask(cache, cache == 36, maskvalue=T)
plot(not_alfalfa)
If you want to mask multiple values, just add a list:
multi_mask <- mask(cache, cache %in% c(1, 36, 141, 142), maskvalue=T)
plot(multi_mask)
Say you have a Landsat image for the Berkane Region of Morocco:
ls <- brick("./data/maroc.tif")
plot(ls)
The gross lines are because of an error in the Landsat 7 satellite, which you can read more about here. Notice that instead of loading the data with the raster() function, which is great for single-band data, here I use brick(). Remember that the main difference is that a RasterBrick can be linked to only one (though multi-layer) file, in our case maroc.tif. A RasterStack can be made from multiple files (a band from one file merged with a band from another file), though these objects have to have the same extent and resolution. So if we happened to have our Landsat bands in separate .tif files, we’d want to load the data using stack().
Let’s inspect this data:
dim(ls)
## [1] 1159 1603 3
summary(ls)
## maroc.1 maroc.2 maroc.3
## Min. 327 363 438
## 1st Qu. 2259 1205 995
## Median 2536 1632 1230
## 3rd Qu. 2809 1949 1443
## Max. 4755 3858 3390
## NA's 122118 122118 122118
Most of the functions in the raster package that we’ve used on single-band rasters also work on multi-band rasters; they just spit out a result for each band in the raster.
minValue(ls)
## [1] 296 339 411
maxValue(ls)
## [1] 4858 4089 3647
We can index the third dimension in these rasters as follows:
b1 <- ls[,,1]
# or #
b1 <- ls[[1]] # note the double brackets
The two commas just tell R to select ALL rows and ALL columns, and then the 1 selects the first band of the multi-band object.
You can write out a raster using the writeRaster() function in the raster package.
writeRaster(my_raster, "my/directory/raster.tif") # you can use whatever extension you like - I prefer .tif
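A couple of writeRaster() arguments worth knowing about (my_raster is a placeholder, as above): overwrite = TRUE lets you replace an existing file, and datatype controls how the values are stored on disk.

```r
# Overwrite an existing file, and store values as unsigned 16-bit
# integers to save disk space (fine for categorical data)
writeRaster(my_raster, "my/directory/raster.tif",
            overwrite = TRUE, datatype = "INT2U")
```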
USGS data repositories allow you to search for data by time, year, satellite, resolution, etc.
You can also download data directly from NASA (easiest when downloading large time-series). University of Maryland maintains a database of pre-processed images for the remote sensing of land surface processes. You can find lots of DEMs here (ASTER and STRM are the most widely used). You can find cool, free high-resolution topographic and bathymetric data here. This website groups data sources by application (e.g. oceanography, land surface processes, forecasting) and can be helpful to explore.
The RSpatial text provides more information about remote sensing in R.

1. The way we categorize data in rasters can become EXTREMELY important as we move towards spatial analysis of raster data. More to come later in the class!↩