31 Raster Data
Rasters are a form of data that is georeferenced and (somewhat) continuous. Raster data is perhaps best envisioned as a matrix of values, whose entries represent spatially referenced data points. The raster itself can be visualized as you could for normal matrix output. What makes a raster different, however, is that it is (or should be) georeferenced. This means that each element of the matrix represents some measurement on the ground having a specific location and spread. This is analogous to an image, where if you zoom in on it enough, you will be able to differentiate between individual pixels, it is just that for rasters, each pixel has a spatial location and size associated with it that we can map onto the earth.
You can either create raster objects de novo or you can acquire them from some external source. To create one from scratch, you start with a matrix of values and then construct the raster object using the raster()
function as:
library(raster)
r <- matrix(runif(10000),nrow=100)
rnd <- raster( r )
which can be visualized using the normal plot command. The raster library has overridden several of the plotting functions and you can plot raster objects and decorate the images in the same way you do for normal plotting materials (5).
plot(rnd)
There are also many available repositories for raster data including Open Source, Governmental, and Municipal locations. One common source for these data is that of http://worldclim.org. This repository contains temperature and precipitation data generalized for the entire globe.
These data are available free of charge and have been used in numerous biological studies. Moreover, they provide a set of ‘biologically relevant’ layers, called BioClim
, that summarize both temperature and precipitation. They motivate these by saying:
Bioclimatic variables are derived from the monthly temperature and rainfall values in order to generate more biologically meaningful variables. These are often used in ecological niche modeling (e.g., BIOCLIM, GARP). The bioclimatic variables represent annual trends (e.g., mean annual temperature, annual precipitation) seasonality (e.g., annual range in temperature and precipitation) and extreme or limiting environmental factors (e.g., temperature of the coldest and warmest month, and precipitation of the wet and dry quarters).
These layers are encoded into 19 Bio-layers as defined in Table 31.1. These layers are available for download from their site directly (I recommend using the tiles approach so you do not have to download the entire world map) as well as from the R package dismo.
Layer | Description |
---|---|
BIO1 | Annual Mean Temperature |
BIO2 | Mean Diurnal Range (Mean of monthly (max temp - min temp)) |
BIO3 | Isothermality (BIO2/BIO7 * 100) |
BIO4 | Temperature Seasonality (standard deviation * 100) |
BIO5 | Max Temperature of Warmest Month |
BIO6 | Min Temperature of Coldest Month |
BIO7 | Temperature Annual Range (BIO5-BIO6) |
BIO8 | Mean Temperature of Wettest Quarter |
BIO9 | Mean Temperature of Driest Quarter |
BIO10 | Mean Temperature of Warmest Quarter |
BIO11 | Mean Temperature of Coldest Quarter |
BIO12 | Annual Precipitation |
BIO13 | Precipitation of Wettest Month |
BIO14 | Precipitation of Driest Month |
BIO15 | Precipitation Seasonality (Coefficient of Variation) |
BIO16 | Precipitation of Wettest Quarter |
BIO17 | Precipitation of Driest Quarter |
BIO18 | Precipitation of Warmest Quarter |
BIO19 | Precipitation of Coldest Quarter |
For the purposes of this chapter, I’ll use bioclim and altitude layers from tile 22, which encompasses the spatial distribution of sampling locations in Baja California for the Araptus attenuatus dataset
Common raster formats include GeoTiff, essentially an image file with some metadata associated with it, and BIL (Binary interleaved) file formats. Both of these types are available from WorldClim. In general, the GeoTiff format is slightly easier to work with as all the data is contained within a single file, whereas the BIL format has two files for each raster (the second file is a header file that has the spatial meta data associated with it). If you do use the BIL format, the file path you pass to raster()
would be of the BIL file, not the header one.
From Worldclim, I downloaded the elevation raster for Tile 22 and can load it into R using the raster()
function as:
alt <- raster("./spatial_data/alt.tif")
alt
## class : RasterLayer
## dimensions : 3600, 3600, 12960000 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : -120, -90, 0, 30 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## data source : /Users/rodney/Library/Mobile Documents/com~apple~CloudDocs/Books/AMG/spatial_data/alt.tif
## names : alt
## values : -202, 5469 (min, max)
The alt
object is summarized here. A couple of things should be pointed out here: - In total, this is an object with 12,960,000 entries in it!
- The resolution of each ‘pixel’ in this representation is 0.008, which is about 30-arc seconds or ~1km. That means that each loation in the study area is represented by the same exact value as the surrounding square kilometer. Obviously, if you are working on processes whose spatial scale is relevant less than 1000 m2, this kind of data is going to be of little value to you. - The values within the matrix range from -202 upwards to 5469. This is in meters.
In addition, the raster has a spatial extent and a projection associated with it. For more information on projections see 32.
bbox( alt )
## min max
## s1 -120 -90
## s2 0 30
proj4string( alt )
## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
This elevation raster looks like:
plot( alt, xlab="Longitude", ylab="Latitude" )
31.1 Cropping Rasters
Just because we have a large raster does not mean that it is in your best interest to use the entire object. Much of the spatial analyses routines used in population genetics require measurements of intervening distance, either Euclidean or ecological. Many of the routines for estimation of these distances require the estimation of pairwise distance between all pixels. For our purposes, the arapat
dataset does not occur throughout most of this map, so it is in our best interests to use only the portion of the raster relevant to our data rather than the entire thing.
Here is one way of going this. I first define an extent, which consists of a vector representing the coordinates for xmin
, xmax
, ymin
, and ymax
(in decimal degrees longitude and latitude). You then crop()
the raster to that extent.
e <- extent( c(-115,-109,22,30) )
baja_california <- crop( alt, e )
plot(baja_california, xlab="Longitude",ylab="Latitude")
Lets make this base map a bit more pretty by taking the altitude and estimating the slope of each pixel and the direction it is facing (aspect). From this, we can ‘shade’ the hills in the map giving it more of a relief view we commonly see in maps.
slope <- terrain( baja_california, opt="slope" )
aspect <- terrain( baja_california, opt="aspect")
baja <- hillShade( slope, aspect, 40, 270 )
plot(baja, xlab="Longitude",ylab="Latitude", legend=FALSE)
Onto this map, we can plot our populations. For this, we convert the raw coordinates into a SpatialPoints
object (see 30) and then overlay onto the map. I use two points()
commands to make the symbol for each population.
data(arapat)
coords <- strata_coordinates(arapat)
pts <- SpatialPoints( coords[,2:3], proj4string = CRS(proj4string(baja)))
plot(baja, xlab="Longitude",ylab="Latitude", legend=FALSE)
points( pts, col="darkred", pch=3)
points( pts, col="red", pch=16)
31.1.1 Cropping Rasters Via Polygons
It is also possible to crop a raster with a more fine grained approach using a polygon. Here is an example using five points picked around the region of Loreto, BCS (I just grabbed these by looking at Google Earth). You define a polygon by a series of points, the last of which has to be identical to the first one so that the polygon is a closed object and not just a series of points on a crooked line…
pts <- rbind( c(-111.5,27.0),
c(-112.4,26.7),
c(-111.7,25.7),
c(-111.1,25.4),
c(-110.8,26.0),
c(-111.5,27.0) )
pts
## [,1] [,2]
## [1,] -111.5 27.0
## [2,] -112.4 26.7
## [3,] -111.7 25.7
## [4,] -111.1 25.4
## [5,] -110.8 26.0
## [6,] -111.5 27.0
From these points, we construct a SpatialPolygons
object (see @ref{polygons} for more info on this convoluted construction) and then can overlay onto the map to make sure it in the correct vicinity (here we are eyeballing it a bit). For more on why this next line of code looks so crazy, see 30.3.
plot(baja, legend=FALSE)
plot(polys, add=TRUE)
To use the polygon to crop the raster, we have to both remove the part of the raster that is not contained within the polygon (mask
) and then cut down the remaining raster to change the bounding box to that representing the portion of the data that remains (trim
). If you do not trim the raster, it will have the same amount of data associated with it as the previous raster (e.g., the underlying data matrix will have 960 rows and 720 columns) but the part that is masked will be represented by NA
values. For many rasters, the data is held in-memory (see the entry for ’data sourcein the summary above) and as such removing as much of a raster that is
NA` improves your ability to manipulate it better.
loredo <- trim( mask( baja, polys ) )
plot(loredo)
31.1.2 Cropping Rasters Via Convex Hull
An analysis common to modern population genetics is that of finding ecological distances between objects on a landscape. The estimation of pairwise distance derived from spatial data is a computationally intensive thing, one that if you are not careful will bring your laptop to its knees! One way to mitigate this data problem is to use a minimal amount raster area so that the estimation of the underlying distance graph can be done on a smaller set of points.
Cropping by a polygon like demonstrated in the previous example is a ‘by hand’ approach to estimating a box that roughly encompasses your data points. A more efficient one is one where you simply provide your coordinates and we can estimate a polygon that surrounds those coordinates with the minimal amount of wasted space. This is called a Convex Hull, which is kind of like a polygon that is created as if there was a rubbrerband fit around all your points. It is a minimal area that includes all of your points.
For this example, I’m going to use the populations found along the peninsula and find the minimal area encompassing those points.
data(arapat)
coords <- strata_coordinates(arapat)
baja_coords <- coords[ !(coords$Stratum %in% c("101","102","32")), ]
baja_pts <- SpatialPoints( baja_coords[,2:3])
plot(baja, legend=FALSE)
plot(baja_pts,add=T,col="darkred")
plot(baja_pts,add=T,col="darkred",pch=16)
The methods for finding the hull and adding a buffer around it are found in the rgeos
package. These are pretty easy functions to use and are very helpful. If you are having trouble installing the rgeos
package from source, see 1.1.5.1.
hull <- gConvexHull(baja_pts)
plot(baja, legend=FALSE)
plot(baja_pts,add=T,col="darkred")
plot(baja_pts,add=T,col="darkred",pch=16)
plot(hull,add=T,border="red")
The function gConvexHull()
returns an object of type SpatialPolygons, just like we created before. However, we now have a polygon that has each of our most ‘outward’ populations on the very edge of the polygon. It may be beneficial for us to add a buffer around this polygon.
plot(baja, legend=FALSE)
plot(baja_pts,add=T,col="darkred")
plot(baja_pts,add=T,col="darkred",pch=16)
plot(hull_plus_buffer, add=T, border="red")
Now, we can mask and trim it to include only the area of interest.
plot(pop_hull, legend=FALSE, xlab="Longitude", ylab="Latitude")
plot(baja_pts,add=T,col="darkred",pch=16)
This would be a great raster to start looking at ecological separation in since we have removed the extraneous data that would unintentionally cause problems with the distance estimators.
31.2 Modifying Rasters
We can modify rasters just as easily as we can crop them. They are matrices, after all. The square bracket indexing you use for matrices are just as effective as before.
In the next example, I mask the landscape based upon elevation. I create a copy of the original raster and then make everything whose elevation is less than 500m as missing data. Plotting this over the top of the original raster shows only locations where elevation exceeds this cutoff.
bc[ bc < 500 ] <- NA
plot( baja_california, legend=FALSE, col="darkgrey" )
plot( bc, add=TRUE, legend=FALSE)
31.3 Extracting Point Data From Rasters
So far, rastes have been confined to representing a single static object. However, it is not uncommon to need to query a raster and find out the values at particular points. These points may be pre-defined or they may be dynamic (e.g., you need to point at a location on the map and determine the value there).
For queries of the first kind, we can use the extract()
function. For this I downloaded the average temperature and precipitation rasters from Worldclim.
baja_temp <- raster("./spatial_data/bio1.tif")
baja_prec <- raster("./spatial_data/bio12.tif")
And then extract the values from each of these layers into the coords
data we already have set up.
coords$elevation <- extract( baja_california, coords[,c(2,3)])
coords$mean_temp <- extract( baja_temp, coords[,c(2,3)])
coords$mean_precip <- extract( baja_prec, coords[,c(2,3)])
coords[1:10,]
## Stratum Longitude Latitude elevation mean_temp mean_precip
## 1 88 -114.2935 29.32541 681 178 143
## 11 9 -113.9449 29.01457 361 195 148
## 20 84 -113.6679 28.96651 368 197 124
## 29 175 -113.4897 28.72796 240 205 106
## 36 177 -113.9914 28.66056 177 203 120
## 46 173 -112.8698 28.40846 26 223 102
## 56 171 -113.1826 28.22308 522 195 145
## 66 89 -113.3999 28.03661 290 203 117
## 76 159 -113.3161 27.52944 87 211 123
## 85 SFr -112.9640 27.36320 305 205 107
A note should be made on the temperature and precipitation values. Temperature is denoted in tenths of a degree Celcius. Though it does get quite hot at times, it does not average 188°C at population 88! Similarly, the units for precipitation are mm (or tenths of centimeters if you will…).
library(ggrepel)
coords <- coords[ order(coords$Latitude),]
p <- ggplot( coords, aes(x=Latitude,y=elevation)) + geom_line(color="lightgrey")
p <- p + geom_point() + ylab("Elevation (m)")
p + geom_text_repel(aes(label=Stratum), color="red")
The package ggrepel
provides a pseudo-smart labelling geometry for ggplot allowing you to have labels that are shifted around the points so as to maximize visibility.
For inquires of the second type, we can use the function click()
to retrieve one of several outputs. Here is the help file that describes the various components.
click {raster} R Documentation Query by clicking on a map Description Click on a map (plot) to get values of a Raster* or Spatial* object at that location; and optionally the coordinates and cell number of the location. For SpatialLines and SpatialPoints you need to click twice (draw a box). Usage ## S4 method for signature 'Raster' click(x, n=Inf, id=FALSE, xy=FALSE, cell=FALSE, type="n", show=TRUE, ...) ## S4 method for signature 'SpatialGrid' click(x, n=1, id=FALSE, xy=FALSE, cell=FALSE, type="n", ...) ## S4 method for signature 'SpatialPolygons' click(x, n=1, id=FALSE, xy=FALSE, type="n", ...) ## S4 method for signature 'SpatialLines' click(x, ...) ## S4 method for signature 'SpatialPoints' click(x, ...) Arguments x - Raster*, or Spatial* object (or missing) n - number of clicks on the map id - Logical. If TRUE, a numeric ID is shown on the map that corresponds to the row number of the output xy - Logical. If TRUE, xy coordinates are included in the output cell - Logical. If TRUE, cell numbers are included in the output type - One of "n", "p", "l" or "o". If "p" or "o" the points are plotted; if "l" or "o" they are joined by lines. See ?locator show - logical. Print the values after each click? ... - additional graphics parameters used if type != "n" for plotting the locations. See ?locator Value The value(s) of x at the point(s) clicked on (or touched by the box drawn). Note The plot only provides the coordinates for a spatial query, the values are read from the Raster* or Spatial* object that is passed as an argument. Thus you can extract values from an object that has not been plotted, as long as it spatially overlaps with with the extent of the plot. Unless the process is terminated prematurely values at at most n positions are determined. The identification process can be terminated by clicking the second mouse button and selecting 'Stop' from the menu, or from the 'Stop' menu on the graphics window. See Also select, drawExtent Examples r <- raster(system.file("external/test.grd", package="raster")) #plot(r) #click(r) #now click on the plot (map)
Here is the output from a single inquire on the baja_california
raster map.
plot( baja_california, legend=FALSE)
click(baja_california, xy=TRUE)
## x y value
## 1 -113.3875 27.82083 116
31.3.1 Stacks of Rasters
It is not uncommon to be working with many different raster layers at the same time. Instead of loading them individually, the raster
library has a RasterStack
object that can hold several rasters at one time and can be used in places where we would use individual rasters. Here is an example using the elevation and temperature rasters for tile 22.
files <- c("./spatial_data/alt.tif", "./spatial_data/bio1.tif", "./spatial_data/bio5.tif", "./spatial_data/bio6.tif")
bio_layers <- stack( files )
bio_layers
## class : RasterStack
## dimensions : 3600, 3600, 12960000, 4 (nrow, ncol, ncell, nlayers)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : -120, -90, 0, 30 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## names : alt, bio1, bio5, bio6
## min values : -202, -20, 54, -83
## max values : 5469, 295, 424, 242
Performing operations on a stack is as easy as performing them on individual layers. Here, I trim them to the hull defined above.
e <- extent( c(-115,-109,22,30) )
bio_layers <- crop( bio_layers, e )
plot(bio_layers)
From which values may be extracted using normal methods as outlined above.
df <- extract( bio_layers, baja_pts)
df <- df[ !is.na(df[,1]),]
head(df)
## alt bio1 bio5 bio6
## [1,] 681 178 331 48
## [2,] 361 195 346 61
## [3,] 368 197 348 62
## [4,] 240 205 355 68
## [5,] 177 203 352 71
## [6,] 26 223 372 81
And visualized normally.
library( GGally )
ggpairs( as.data.frame(df))
31.4 Rasters & ggplot
As is the case with a lot of data types in R, there is a way to use the ggplot
library to visualize rasters. Essentially, what you need to do is to transform your raster objects into data.frame
objects for ggplot
’s geom_tile()
function. Here is an example.
library(ggplot2)
df <- data.frame( rasterToPoints( baja_california ))
names(df) <- c("Longitude","Latitude","Elevation")
p <- ggplot( df ) + geom_tile( aes(x=Longitude,y=Latitude,fill=Elevation))
p <- p + scale_fill_gradientn( colors=c('#a6611a','#dfc27d','#f5f5f5','#80cdc1','#018571'))
p <- p + coord_equal() + xlab("Longitude") + ylab("Latitude")
p
As usual, we can add additional information to the plot and as we would for any other ggplot
object. Here I’ll add the populations and indicate if they have samples in them that are of one species (Pure) or have a mix of the two (Mixed).
num.clades <- colSums( table(arapat$Species, arapat$Population) > 0 )
Stratum=names(num.clades)
Species= factor( c("Pure","Mixed")[num.clades] )
tmp.df <- data.frame( Stratum, Species )
sites <- merge( coords, tmp.df )
p + geom_point( aes(x=Longitude,y=Latitude, shape=Species), size=3, data=sites )
31.5 3D Visualization
It is also possible to visualize rasters in 3-space. The library rasterVis
provides an interface to the rgl
library to plot a surface. Once installed, these are easy to use for viewing surfaces. Here is an example using the elevation data we have been playing with.
library(rasterVis)
plot3D( baja_california , zfac=0.1)
The zfac
option in the plot is the amount to scale the z-axis (elevation) in relation to the x-axis and y-axis dimensions. It is a bit exagerated at zfac=0.1
but you get the idea.
31.6 Saving & Exporting Rasters
There are many situations where you need to save a raster you’ve manipulated. As these raster objects are R objects, you can save them directly to the file system using the write()
as:
write( baja_california, filename="baja_california.rda")
This will save the raster object to file exactly like it is in your R session. To load it back in you just use load()
and it is returned just like it was. The benefit to saving these as R objects is that you do not need to change it at all to pick up where you left off.
You may also need to export the raster in a non-R format for external analyses. To do this, you use the writeRaster()
function. The file extension is used to determine the file format used and R saves it automatically.
writeRaster( baja_california, filename="baja_california.tif")