30 Vector Data

For the purposes of this chapter, I will consider vector data as consisting of a finite set of points that may or may not be connected. In R, these points can be used directly, as a numeric data type, or as a Spatial* class object. The sp library contains a lot of functions that help deal with points, lines, and polygons and this is going to be a short overview of how they can be derived and manipulated in the pursuit of population genetic studies.

The construction of Spatial* objects is a bit convoluted. I don’t appreciate why that is, it is just the way it was constructed. Here is a set series of examples to get you going. I will wait until subsequent chapters for us to use these structures in analyses and data extraction, this is mostly a quick tutorial on how to create these objects.

30.1 Points

Points are defined by SpatialPoints objects. A collection of points may have additional data associated with each location and would make a SpatialPointsDataFrame. This is a bit different than the normal data.frame objects we’ve been using with coordinates in them already—in fact it is the opposite of that. It is a set of points within which is located a data.frame rather than data.frame that has within it a set of points.

Confused yet? Lets get to the point and make some coordinates. Here is the way we’ve extracted points from the Arapatus attenuatus data set thus far.

library(gstudio)
data(arapat)
coords <- strata_coordinates(arapat)
summary(coords)
##     Stratum     Longitude         Latitude    
##  101    : 1   Min.   :-114.3   Min.   :23.08  
##  102    : 1   1st Qu.:-112.9   1st Qu.:24.52  
##  12     : 1   Median :-111.5   Median :26.21  
##  153    : 1   Mean   :-111.7   Mean   :26.14  
##  156    : 1   3rd Qu.:-110.4   3rd Qu.:27.47  
##  157    : 1   Max.   :-109.1   Max.   :29.33  
##  (Other):33

However, we can also derive these points directly as a SpatialPoints object defined in the sp library by setting the optional flag as.SpatialPoints=TRUE.

library(sp)
library(raster)
pts <- strata_coordinates( arapat, as.SpatialPoints = TRUE )
pts
## class       : SpatialPoints 
## features    : 39 
## extent      : -114.2935, -109.1263, 23.0757, 29.32541  (xmin, xmax, ymin, ymax)
## coord. ref. : NA

Notice that there is no coordinate reference system in the default extraction. This is because you can pass a wide array of coordinates to this function and it only takes the centroid. It is up to you to define the projection and datum for the data. If it is Long/Lat data as in the example, it can be defined as:

proj4string(pts) <- CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
pts
## class       : SpatialPoints 
## features    : 39 
## extent      : -114.2935, -109.1263, 23.0757, 29.32541  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

Any set of x- and y- coordinates can be turned into a SpatialPoints object. If we are to associate data with those points, the data has to have the same number of observations as there are coordinates. In our case here, we have 39 populations and as an example I’ll determine the number of individuals genotyped in each population as a

df <- data.frame( table(arapat$Population) )
names(df) <- c("Population","N")
pts.df <- SpatialPointsDataFrame(pts,df)
pts.df
## class       : SpatialPointsDataFrame 
## features    : 39 
## extent      : -114.2935, -109.1263, 23.0757, 29.32541  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 2
## names       : Population,  N 
## min values  :        101,  4 
## max values  :        SFr, 19

You can translate it back into a data.frame object as:

as.data.frame( pts.df )[1:5,]
##     Population  N         x        y
## 88         101  9 -114.2935 29.32541
## 9          102  8 -113.9449 29.01457
## 84          12 10 -113.6679 28.96651
## 175        153 10 -113.4897 28.72796
## 177        156  6 -113.9914 28.66056

or access the data within the data.frame directly (thereby not needing to make a new object) using the attribute @ operator

pts.df@data[1:5,]
##   Population  N
## 1        101  9
## 2        102  8
## 3         12 10
## 4        153 10
## 5        156  6

Since it is a SpatialPoints object, you can get information about it such as the bounding box (e.g., the coordinates of a box that encloses all the points).

bbox(pts.df)
##         min        max
## x -114.2935 -109.12633
## y   23.0757   29.32541

30.2 Lines

Lines are created by pairs of points. A single Line object

c1 <- cbind(coords$Longitude[1:2], coords$Latitude[1:2])
c2 <- cbind(coords$Longitude[2:3], coords$Latitude[2:3])
L1 <- Line(c1)
L2 <- Line(c2)
L1
## An object of class "Line"
## Slot "coords":
##           [,1]     [,2]
## [1,] -114.2935 29.32541
## [2,] -113.9449 29.01457
coordinates(L1)
##           [,1]     [,2]
## [1,] -114.2935 29.32541
## [2,] -113.9449 29.01457

A collection of Line objects can be put into a Lines object.

Ls1 <- Lines( list(L1), ID="88 to 9")
Ls2 <- Lines( list(L2), ID="9 to 84")
Ls1
## An object of class "Lines"
## Slot "Lines":
## [[1]]
## An object of class "Line"
## Slot "coords":
##           [,1]     [,2]
## [1,] -114.2935 29.32541
## [2,] -113.9449 29.01457
## 
## 
## 
## Slot "ID":
## [1] "88 to 9"

And if they are spatial in context (e.g., if we need to plot them in any way, shape, or form), we need to put them into a SpatialLines object, which is also constructed from a list of Lines objects.

SLs <- SpatialLines( list(Ls1,Ls2))
proj4string(SLs) <- CRS(proj4string(pts))
SLs
## class       : SpatialLines 
## features    : 2 
## extent      : -114.2935, -113.6679, 28.96651, 29.32541  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

If we want to add data to the set of lines, we can associate a data.frame with each of them with internal data.

df <- data.frame( Sequence = c("First","Second"), Like_It= c(TRUE,FALSE), row.names = c("88 to 9","9 to 84"))
SLDF <- SpatialLinesDataFrame( SLs, df )
SLDF
## class       : SpatialLinesDataFrame 
## features    : 2 
## extent      : -114.2935, -113.6679, 28.96651, 29.32541  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 2
## names       : Sequence, Like_It 
## min values  :    First,    TRUE 
## max values  :   Second,   FALSE
as.data.frame(SLDF)
##         Sequence Like_It
## 88 to 9    First    TRUE
## 9 to 84   Second   FALSE

We can also extract the line lengths of each line.

SpatialLinesLengths(SLs, longlat = TRUE)
## [1] 48.34721 27.51264

30.3 Polygons

A polygon is simply a collection of line segments that closes in on itself. We can use polygons to identify habitat, define boundaries, etc. In the short description to follow, we will create a set Polygon* objects, culminating in a SpatialPolygonsDataFrame object.

We will start with the first 5 coordinates in the arapat data set. To make the polygon, we must close the coordinates, which means take the first one we put in and append it to the end of the list of coordinates, such that in this case c[1,] == c[6,].

c <- cbind( coords$Longitude[1:5], coords$Latitude[1:5])
c <- rbind( c, c[1,])
P <- Polygon( c )
P
## An object of class "Polygon"
## Slot "labpt":
## [1] -113.89179   28.89114
## 
## Slot "area":
## [1] 0.1849395
## 
## Slot "hole":
## [1] FALSE
## 
## Slot "ringDir":
## [1] 1
## 
## Slot "coords":
##           [,1]     [,2]
## [1,] -114.2935 29.32541
## [2,] -113.9449 29.01457
## [3,] -113.6679 28.96651
## [4,] -113.4897 28.72796
## [5,] -113.9914 28.66056
## [6,] -114.2935 29.32541

As you can see, there is some additional information provided in the default layout. A few points to be made:
- The area parameter is not georeferenced as the polygon itself has no projection.
- The labpt is the coordinate where a label would be plot. - The hole and ringDir determine if the polygon represent a hole in some other polygon (e.g., the doughnut hole and the direction it is plot).

Similar to how we constructed SpatialLines, a Polygon must be in inserted into a set of Polygons

Ps <- Polygons( list(P), ID="Bob")

From which a list of can be created to make a SpatialPolygons object

SPs <- SpatialPolygons( list(Ps))
proj4string(SPs) <- CRS(proj4string(pts))
SPs
## class       : SpatialPolygons 
## features    : 1 
## extent      : -114.2935, -113.4897, 28.66056, 29.32541  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

And data can be added to it making a SpatialPolygonsDataFrame (n.b., The row.names of the data.frame must match the ID we set for making the Polygons objects). If they do not, there will be an error thrown.

df <- data.frame(Populations=paste(coords$Stratum[1:5],collapse=", "), row.names = "Bob")
SPDF <- SpatialPolygonsDataFrame( SPs, df)
SPDF
## class       : SpatialPolygonsDataFrame 
## features    : 1 
## extent      : -114.2935, -113.4897, 28.66056, 29.32541  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 1
## names       :         Populations 
## min values  : 88, 9, 84, 175, 177 
## max values  : 88, 9, 84, 175, 177

30.4 Saving Vector Objects

As all of these objects are R objects, they can be saved to disk using the save() function, which makes them a *.rda object. If you have objects that take a bit of time to create, it is in your best interests to save them after creation and on subsequent analyses, just use the saved versions.