The location (physical, temporal, ecological, etc.) of an observation can is represented as a coordiante consisting of one or more values specifying the localization within a larger space. Coordinates repreenting a physical location are represented by datum embedded within a larger
We often start with data in raw format. For this example, we will be using the Araptus attenuata Sonoran desert bark beetle data set from the scape
library. If you do not have it, install it from the Dyerlab Github account. Here is what it looks like.
## Site Longitude Latitude Males
## 12 : 1 Min. :-114.3 Min. :23.29 Min. : 9.00
## 153 : 1 1st Qu.:-113.2 1st Qu.:24.88 1st Qu.:16.00
## 157 : 1 Median :-112.0 Median :26.64 Median :19.00
## 159 : 1 Mean :-111.9 Mean :26.42 Mean :24.93
## 160 : 1 3rd Qu.:-110.7 3rd Qu.:28.04 3rd Qu.:28.00
## 161 : 1 Max. :-109.3 Max. :29.33 Max. :64.00
## (Other):23
## Females Suitability
## Min. : 5.00 Min. :0.05628
## 1st Qu.:15.00 1st Qu.:0.26730
## Median :21.00 Median :0.41251
## Mean :23.62 Mean :0.43527
## 3rd Qu.:30.00 3rd Qu.:0.56413
## Max. :63.00 Max. :0.90188
##
These data represent:
1. Sampling sites in Baja California 2. Census of male and female beetles sampled. 3. The habitat suitability (from a Niche Model) based upon the host-plant range.
In addition to normal floating-point numbers being able to represent coordinate data such as latitude and longitude, we can also specify the data as objects recognizable for spatial databases, GIS, opensource libraries, GeoJSON, etc. These are ISO standards and if we put our data into these formats, we gain the ability to use a much wider array of tools.
For our purposes, we will use the implentation in the sf
(Simple Features) library.
Most of the methods in the library are prefixed with st_
as a nod to the routine names that are used in geospatial databases (so by learning sf
you are getting a leg up on how to use spatial database calls as well).
To make a point we can construct it as a Well-Known Text object (WKT) as:
## [1] "POINT (-111.7 26.14)"
or as a Well-Known Binary (WBK) as:
## [1] 01 01 00 00 00 cd cc cc cc cc ec 5b c0 a4 70 3d 0a d7 23 3a 40
We can convert the data.frame
above to one that contains these GEOMETRY objects.
## Simple feature collection with 29 features and 4 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -114.2935 ymin: 23.2855 xmax: -109.327 ymax: 29.32541
## epsg (SRID): NA
## proj4string: NA
## First 10 features:
## Site Males Females Suitability geometry
## 1 12 24 21 0.3519050 POINT (-112.6655 27.18232)
## 2 153 35 41 0.7324870 POINT (-110.4624 24.13389)
## 3 157 26 30 0.8810290 POINT (-110.096 24.0195)
## 4 159 22 15 0.1879650 POINT (-113.3161 27.52944)
## 5 160 48 36 0.3651910 POINT (-112.5296 27.40498)
## 6 161 64 63 0.2791050 POINT (-112.986 27.0367)
## 7 162 57 41 0.6136198 POINT (-112.408 27.2028)
## 8 163 21 21 0.4328730 POINT (-110.951 24.2115)
## 9 166 19 26 0.2673030 POINT (-112.0806 25.91409)
## 10 168 28 25 0.4964650 POINT (-111.2156 25.55757)
However, notice there is no Coordinate Reference System (CRS) associated with these data.
## Coordinate Reference System: NA
The ‘coordinates’ we specified are just points, which we happened to label as “Latitude” and “Longitude” but we could have called them “X” and “Y” and unless you specify the CRS, R will pretend like it has no idea what system you are using. If we want to make them represent real coordiantes (as in Latitude and Longitude we yanked off our GPS unit or grabbed from Google Earth), we need to specify the CRS directly. See the page on projections for more information.
Here is a quick way to set a CRS using EPSG codes, an EPSG code (see EPSG.io for more information) is a shorthand notation for a coordinate reference system. There are tons of different projects available and you can make up your own if you like. For this example, I’ll use one of the most common ones we will run into, epsg:3857
.
## Simple feature collection with 29 features and 4 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -114.2935 ymin: 23.2855 xmax: -109.327 ymax: 29.32541
## epsg (SRID): 3857
## proj4string: +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs
## First 10 features:
## Site Males Females Suitability geometry
## 1 12 24 21 0.3519050 POINT (-112.6655 27.18232)
## 2 153 35 41 0.7324870 POINT (-110.4624 24.13389)
## 3 157 26 30 0.8810290 POINT (-110.096 24.0195)
## 4 159 22 15 0.1879650 POINT (-113.3161 27.52944)
## 5 160 48 36 0.3651910 POINT (-112.5296 27.40498)
## 6 161 64 63 0.2791050 POINT (-112.986 27.0367)
## 7 162 57 41 0.6136198 POINT (-112.408 27.2028)
## 8 163 21 21 0.4328730 POINT (-110.951 24.2115)
## 9 166 19 26 0.2673030 POINT (-112.0806 25.91409)
## 10 168 28 25 0.4964650 POINT (-111.2156 25.55757)
The epgs:3857
is used by Google Maps, ESRI, OpenStreet Maps, Bing, and other providers and is equivallent to (for those who care) to the following WKT definition:
PROJCS["WGS 84 / Pseudo-Mercator",
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4326"]],
PROJECTION["Mercator_1SP"],
PARAMETER["central_meridian",0],
PARAMETER["scale_factor",1],
PARAMETER["false_easting",0],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
AXIS["X",EAST],
AXIS["Y",NORTH],
EXTENSION["PROJ4","+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs"],
AUTHORITY["EPSG","3857"]]
A bounding box is the minimal set of coordinates that create a square box (in Longitude and Latitude) that encompass all the points. Here is the box for the sites we collected.
## xmin ymin xmax ymax
## -114.29353 23.28550 -109.32700 29.32541
This is often useful in grabbing maps as we need to know the extent
of the data.