30 Vector Data
For the purposes of this chapter, I will consider vector data as consisting of a finite set of points that may or may not be connected. In R
, these points can be used directly, as a numeric data type, or as a Spatial* class object. The sp
library contains a lot of functions that help deal with points, lines, and polygons and this is going to be a short overview of how they can be derived and manipulated in the pursuit of population genetic studies.
The construction of Spatial* objects is a bit convoluted. I don’t appreciate why that is, it is just the way it was constructed. Here is a set series of examples to get you going. I will wait until subsequent chapters for us to use these structures in analyses and data extraction, this is mostly a quick tutorial on how to create these objects.
30.1 Points
Points are defined by SpatialPoints objects. A collection of points may have additional data associated with each location and would make a SpatialPointsDataFrame. This is a bit different than the normal data.frame
objects we’ve been using with coordinates in them already—in fact it is the opposite of that. It is a set of points within which is located a data.frame
rather than data.frame
that has within it a set of points.
Confused yet? Lets get to the point and make some coordinates. Here is the way we’ve extracted points from the Arapatus attenuatus data set thus far.
library(gstudio)
data(arapat)
coords <- strata_coordinates(arapat)
summary(coords)
## Stratum Longitude Latitude
## 101 : 1 Min. :-114.3 Min. :23.08
## 102 : 1 1st Qu.:-112.9 1st Qu.:24.52
## 12 : 1 Median :-111.5 Median :26.21
## 153 : 1 Mean :-111.7 Mean :26.14
## 156 : 1 3rd Qu.:-110.4 3rd Qu.:27.47
## 157 : 1 Max. :-109.1 Max. :29.33
## (Other):33
However, we can also derive these points directly as a SpatialPoints object defined in the sp library by setting the optional flag as.SpatialPoints=TRUE
.
library(sp)
library(raster)
pts <- strata_coordinates( arapat, as.SpatialPoints = TRUE )
pts
## class : SpatialPoints
## features : 39
## extent : -114.2935, -109.1263, 23.0757, 29.32541 (xmin, xmax, ymin, ymax)
## coord. ref. : NA
Notice that there is no coordinate reference system in the default extraction. This is because you can pass a wide array of coordinates to this function and it only takes the centroid. It is up to you to define the projection and datum for the data. If it is Long/Lat data as in the example, it can be defined as:
proj4string(pts) <- CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
pts
## class : SpatialPoints
## features : 39
## extent : -114.2935, -109.1263, 23.0757, 29.32541 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
Any set of x- and y- coordinates can be turned into a SpatialPoints object. If we are to associate data with those points, the data has to have the same number of observations as there are coordinates. In our case here, we have 39 populations and as an example I’ll determine the number of individuals genotyped in each population as a
df <- data.frame( table(arapat$Population) )
names(df) <- c("Population","N")
pts.df <- SpatialPointsDataFrame(pts,df)
pts.df
## class : SpatialPointsDataFrame
## features : 39
## extent : -114.2935, -109.1263, 23.0757, 29.32541 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 2
## names : Population, N
## min values : 101, 4
## max values : SFr, 19
You can translate it back into a data.frame
object as:
as.data.frame( pts.df )[1:5,]
## Population N x y
## 88 101 9 -114.2935 29.32541
## 9 102 8 -113.9449 29.01457
## 84 12 10 -113.6679 28.96651
## 175 153 10 -113.4897 28.72796
## 177 156 6 -113.9914 28.66056
or access the data within the data.frame
directly (thereby not needing to make a new object) using the attribute @
operator
pts.df@data[1:5,]
## Population N
## 1 101 9
## 2 102 8
## 3 12 10
## 4 153 10
## 5 156 6
Since it is a SpatialPoints object, you can get information about it such as the bounding box (e.g., the coordinates of a box that encloses all the points).
bbox(pts.df)
## min max
## x -114.2935 -109.12633
## y 23.0757 29.32541
30.2 Lines
Lines are created by pairs of points. A single Line object
c1 <- cbind(coords$Longitude[1:2], coords$Latitude[1:2])
c2 <- cbind(coords$Longitude[2:3], coords$Latitude[2:3])
L1 <- Line(c1)
L2 <- Line(c2)
L1
## An object of class "Line"
## Slot "coords":
## [,1] [,2]
## [1,] -114.2935 29.32541
## [2,] -113.9449 29.01457
coordinates(L1)
## [,1] [,2]
## [1,] -114.2935 29.32541
## [2,] -113.9449 29.01457
A collection of Line objects can be put into a Lines object.
Ls1 <- Lines( list(L1), ID="88 to 9")
Ls2 <- Lines( list(L2), ID="9 to 84")
Ls1
## An object of class "Lines"
## Slot "Lines":
## [[1]]
## An object of class "Line"
## Slot "coords":
## [,1] [,2]
## [1,] -114.2935 29.32541
## [2,] -113.9449 29.01457
##
##
##
## Slot "ID":
## [1] "88 to 9"
And if they are spatial in context (e.g., if we need to plot them in any way, shape, or form), we need to put them into a SpatialLines object, which is also constructed from a list
of Lines objects.
SLs <- SpatialLines( list(Ls1,Ls2))
proj4string(SLs) <- CRS(proj4string(pts))
SLs
## class : SpatialLines
## features : 2
## extent : -114.2935, -113.6679, 28.96651, 29.32541 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
If we want to add data to the set of lines, we can associate a data.frame
with each of them with internal data.
df <- data.frame( Sequence = c("First","Second"), Like_It= c(TRUE,FALSE), row.names = c("88 to 9","9 to 84"))
SLDF <- SpatialLinesDataFrame( SLs, df )
SLDF
## class : SpatialLinesDataFrame
## features : 2
## extent : -114.2935, -113.6679, 28.96651, 29.32541 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 2
## names : Sequence, Like_It
## min values : First, TRUE
## max values : Second, FALSE
as.data.frame(SLDF)
## Sequence Like_It
## 88 to 9 First TRUE
## 9 to 84 Second FALSE
We can also extract the line lengths of each line.
SpatialLinesLengths(SLs, longlat = TRUE)
## [1] 48.34721 27.51264
30.3 Polygons
A polygon is simply a collection of line segments that closes in on itself. We can use polygons to identify habitat, define boundaries, etc. In the short description to follow, we will create a set Polygon* objects, culminating in a SpatialPolygonsDataFrame object.
We will start with the first 5 coordinates in the arapat
data set. To make the polygon, we must close the coordinates, which means take the first one we put in and append it to the end of the list of coordinates, such that in this case c[1,] == c[6,]
.
c <- cbind( coords$Longitude[1:5], coords$Latitude[1:5])
c <- rbind( c, c[1,])
P <- Polygon( c )
P
## An object of class "Polygon"
## Slot "labpt":
## [1] -113.89179 28.89114
##
## Slot "area":
## [1] 0.1849395
##
## Slot "hole":
## [1] FALSE
##
## Slot "ringDir":
## [1] 1
##
## Slot "coords":
## [,1] [,2]
## [1,] -114.2935 29.32541
## [2,] -113.9449 29.01457
## [3,] -113.6679 28.96651
## [4,] -113.4897 28.72796
## [5,] -113.9914 28.66056
## [6,] -114.2935 29.32541
As you can see, there is some additional information provided in the default layout. A few points to be made:
- The area
parameter is not georeferenced as the polygon itself has no projection.
- The labpt
is the coordinate where a label would be plot. - The hole
and ringDir
determine if the polygon represent a hole in some other polygon (e.g., the doughnut hole and the direction it is plot).
Similar to how we constructed SpatialLines, a Polygon must be in inserted into a set of Polygons
Ps <- Polygons( list(P), ID="Bob")
From which a list of can be created to make a SpatialPolygons object
SPs <- SpatialPolygons( list(Ps))
proj4string(SPs) <- CRS(proj4string(pts))
SPs
## class : SpatialPolygons
## features : 1
## extent : -114.2935, -113.4897, 28.66056, 29.32541 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
And data can be added to it making a SpatialPolygonsDataFrame (n.b., The row.names
of the data.frame
must match the ID
we set for making the Polygons objects). If they do not, there will be an error thrown.
df <- data.frame(Populations=paste(coords$Stratum[1:5],collapse=", "), row.names = "Bob")
SPDF <- SpatialPolygonsDataFrame( SPs, df)
SPDF
## class : SpatialPolygonsDataFrame
## features : 1
## extent : -114.2935, -113.4897, 28.66056, 29.32541 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 1
## names : Populations
## min values : 88, 9, 84, 175, 177
## max values : 88, 9, 84, 175, 177
30.4 Saving Vector Objects
As all of these objects are R objects, they can be saved to disk using the save()
function, which makes them a *.rda object. If you have objects that take a bit of time to create, it is in your best interests to save them after creation and on subsequent analyses, just use the saved versions.