In this section, we will focus on lines and polygons. These are represented as sf objects, we can leverage a large amount of st_* functions to perform manipulations, and we can visualize them using either built-in routines or via ggplot (as expected).

Raw Data

In this section, we will be working with lines and polygons. These are going to be represented by roads and development zones in Richmond, Virginia. These data are made available by the GIS Department of the City of Richmond. For this example, we will be loading these in as shapefiles.

You can load in shapefile data directly into R but we have to do a little work. First, we should understand that a shapefile is not an actual file, it is a collection of several files. They are often zipped up into a single archive.

Here are two shape file archives that I have up on Github in the class repository.

roads_url <- "https://github.com/dyerlab/ENVS-Lectures/raw/master/data/Centerlines-shp.zip"
district_url <- "https://github.com/dyerlab/ENVS-Lectures/raw/master/data/Zoning_Districts-shp.zip"

We can use R to download and unzip the file in the current data directory (n.b., you can do it using a browser as well). To use R you need to first download them (I’ve set eval=FALSE to the chuck so it is not redownloaded each time. Run it by hand using CTRL/CMD + Return).

download.file( district_url , destfile = "./Districts.zip")
download.file( roads_url, destfile =  "./Roads.zip")

We can unzip them now as:

unzip("Districts.zip")
unzip("Roads.zip")

These routines will expand the archives in the current directory.

Depending upon how the archives were created, they may make a subdirectory or just a pile of files in the same directory. For this example, the are one of each with the Zoning_Districts. set of files expanded in the current directory and the Roads expanded to a subfolder nnamed Centerlines-shp.

system( "ls" )

Lines

We’ve covered points and now if we put them together in a sequence, we get lines. They are taken in the order given.

Instead of loading these in manually, I’m going to load in the shapefile with the roads. To load in shapefiles, we use the st_read() function and pass it the .shp file.

roads <- st_read( "Centerlines-shp/tran_Carriageway.shp" ) 
Reading layer `tran_Carriageway' from data source 
  `/Users/rodney/Documents/github/classes/ENVS-Lectures/lectures/spatial/spatial_polygons/Centerlines-shp/tran_Carriageway.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 29081 features and 25 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 11734060 ymin: 3682790 xmax: 11817490 ymax: 3751927
Projected CRS: NAD83 / Virginia South (ftUS)
names( roads )
 [1] "FID"        "Carriagewa" "AssetID"    "StreetType" "Functional"
 [6] "FIPS"       "LeftFromAd" "LeftToAddr" "RightFromA" "RightToAdd"
[11] "PrefixDire" "ProperName" "SuffixType" "SuffixDire" "FullName"  
[16] "RouteName"  "OneWay"     "PostedSpee" "CADRouteSp" "CreatedBy" 
[21] "CreatedDat" "EditBy"     "EditDate"   "GlobalID"   "SHAPE_Leng"
[26] "geometry"  
FID

Carriagewa

AssetID

StreetType

Functional

FIPS

LeftFromAd

LeftToAddr

RightFromA

RightToAdd

PrefixDire

ProperName

SuffixType

SuffixDire

FullName

RouteName

OneWay

PostedSpee

CADRouteSp

CreatedBy

CreatedDat

EditBy

EditDate

GlobalID

SHAPE_Leng

geometry

We can clean it up a bit by removing the extraneous columns.

roads %>%
  select(-CreatedBy,
         -CreatedDat,
         -EditBy,
         -EditDate) %>%
  select( FIPS, AssetID, StreetType, Functional, FullName, OneWay, geometry ) -> roads
roads

You can see that the geometry object is a LINESTRING (in sf terms). We can see the coordinates for one of these (say Dwyer St), by conveting the geometry object to a Well Know Text (WKT) version representing the sequence of points.

For any particular street, say Three Chopt Road in Richmond, we can filter out the rows of this for each LINESTRING object.

roads %>% 
  filter( FullName == "Three Chopt Road") -> three_chopt
three_chopt

This one has 65 elements, each of which is created by a sequence of points. We can loop through them and print out the coordinates in textual format as:

for( i in 1:nrow(three_chopt) ) {
  geo <- three_chopt$geometry[i]
  cat( i, st_as_text( geo ), "\n") 
}
1 LINESTRING (11763416 3739109, 11763452 3739376) 
2 LINESTRING (11763405 3738655, 11763403 3738666, 11763398 3738687, 11763386 3738761, 11763382 3738807, 11763383 3738849, 11763385 3738879, 11763392 3738922, 11763395 3738955, 11763414 3739084, 11763416 3739109) 
3 LINESTRING (11763466 3740210, 11763459 3740241, 11763452 3740269) 
4 LINESTRING (11763452 3740269, 11763455 3740306, 11763478 3740532) 
5 LINESTRING (11763466 3740210, 11763475 3740227, 11763483 3740243) 
6 LINESTRING (11763483 3740243, 11763487 3740298, 11763508 3740495) 
7 LINESTRING (11763460 3739957, 11763460 3739970, 11763462 3740051, 11763464 3740132, 11763466 3740210) 
8 LINESTRING (11763449 3739565, 11763451 3739646, 11763454 3739727, 11763456 3739813, 11763458 3739889, 11763460 3739957) 
9 LINESTRING (11765211 3734622, 11765195 3734652, 11765182 3734682, 11765169 3734713, 11765159 3734744, 11765145 3734793, 11765133 3734842, 11765123 3734892, 11765114 3734955, 11765103 3735018, 11765102 3735024, 11765096 3735051, 11765089 3735077, 11765079 3735102) 
10 LINESTRING (11765747 3733557, 11765743 3733573) 
11 LINESTRING (11765743 3733573, 11765625 3734022, 11765619 3734041, 11765612 3734061, 11765603 3734079) 
12 LINESTRING (11765603 3734079, 11765596 3734095, 11765587 3734109) 
13 LINESTRING (11765785 3733407, 11765751 3733540) 
14 LINESTRING (11765751 3733540, 11765747 3733557) 
15 LINESTRING (11765079 3735102, 11765079 3735131, 11765078 3735159, 11765075 3735187, 11765051 3735355) 
16 LINESTRING (11765587 3734109, 11765578 3734122, 11765569 3734135, 11765471 3734266, 11765356 3734419, 11765251 3734560) 
17 LINESTRING (11765251 3734560, 11765230 3734591, 11765211 3734622) 
18 LINESTRING (11763934 3736816, 11763898 3736854, 11763863 3736893, 11763829 3736934, 11763798 3736975, 11763768 3737018, 11763746 3737052, 11763726 3737087, 11763707 3737124) 
19 LINESTRING (11763707 3737124, 11763690 3737162, 11763674 3737202, 11763661 3737242) 
20 LINESTRING (11766160 3732239, 11766154 3732261, 11766146 3732281, 11766137 3732301, 11766125 3732320, 11766111 3732338) 
21 LINESTRING (11766111 3732338, 11766093 3732359, 11766077 3732381, 11766063 3732405, 11766051 3732429, 11766040 3732454, 11765967 3732710, 11765954 3732756) 
22 LINESTRING (11765954 3732756, 11765900 3732959) 
23 LINESTRING (11765900 3732959, 11765785 3733407) 
24 LINESTRING (11763442 3738419, 11763392 3738656) 
25 LINESTRING (11763661 3737242, 11763641 3737323, 11763560 3737754, 11763499 3738117, 11763442 3738419) 
26 LINESTRING (11765051 3735355, 11765040 3735477, 11765027 3735603, 11765006 3735720) 
27 LINESTRING (11765006 3735720, 11764995 3735768, 11764982 3735815, 11764967 3735862, 11764950 3735908, 11764932 3735953, 11764911 3735997, 11764882 3736048, 11764852 3736098, 11764826 3736138, 11764798 3736178, 11764769 3736216, 11764739 3736254) 
28 LINESTRING (11764137 3736666, 11764136 3736666, 11764093 3736693, 11764051 3736721, 11764011 3736751, 11763972 3736783, 11763934 3736816) 
29 LINESTRING (11764739 3736254, 11764665 3736336, 11764642 3736363, 11764617 3736388, 11764591 3736412, 11764564 3736434, 11764540 3736451, 11764516 3736467, 11764490 3736482, 11764464 3736495, 11764457 3736499, 11764390 3736532, 11764273 3736592, 11764227 3736615, 11764182 3736639, 11764137 3736666) 
30 LINESTRING (11765310 3734586, 11765295 3734599, 11765282 3734614, 11765268 3734634, 11765254 3734654, 11765242 3734675, 11765232 3734696, 11765219 3734727, 11765208 3734758, 11765199 3734789, 11765190 3734821, 11765177 3734889, 11765162 3734957, 11765160 3734976, 11765156 3734993, 11765150 3735011, 11765142 3735028, 11765133 3735043, 11765121 3735060, 11765108 3735075, 11765094 3735089, 11765079 3735102) 
31 LINESTRING (11765857 3733581, 11765848 3733588, 11765847 3733588, 11765836 3733599, 11765825 3733611, 11765817 3733624, 11765809 3733637, 11765804 3733652, 11765749 3733864, 11765694 3734088, 11765689 3734103, 11765683 3734118, 11765675 3734132) 
32 LINESTRING (11765675 3734132, 11765636 3734183, 11765359 3734547, 11765344 3734561, 11765327 3734574, 11765310 3734586) 
33 LINESTRING (11759671 3743060, 11759743 3743022, 11759962 3742885) 
34 LINESTRING (11759962 3742885, 11760198 3742743, 11760303 3742691, 11760432 3742645, 11760506 3742626, 11760570 3742617) 
35 LINESTRING (11760570 3742617, 11760922 3742570) 
36 LINESTRING (11760922 3742570, 11761128 3742538, 11761301 3742505) 
37 LINESTRING (11761301 3742505, 11761737 3742422) 
38 LINESTRING (11761737 3742422, 11761973 3742378, 11762064 3742355, 11762163 3742318, 11762234 3742279) 
39 LINESTRING (11762234 3742279, 11762345 3742202, 11762581 3742010) 
40 LINESTRING (11762581 3742010, 11762798 3741834) 
41 LINESTRING (11756794 3744048, 11757219 3743916, 11757442 3743857, 11757585 3743811) 
42 LINESTRING (11757585 3743811, 11757919 3743694) 
43 LINESTRING (11757919 3743694, 11758226 3743587) 
44 LINESTRING (11758226 3743587, 11758589 3743460) 
45 LINESTRING (11758589 3743460, 11758970 3743326) 
46 LINESTRING (11759295 3743234, 11759313 3743227) 
47 LINESTRING (11759313 3743227, 11759332 3743220) 
48 LINESTRING (11759332 3743220, 11759478 3743169, 11759644 3743095, 11759671 3743060) 
49 LINESTRING (11759280 3743196, 11759299 3743190) 
50 LINESTRING (11759299 3743190, 11759300 3743190, 11759318 3743183) 
51 LINESTRING (11759318 3743183, 11759463 3743132, 11759631 3743056, 11759671 3743060) 
52 LINESTRING (11755429 3744542, 11755434 3744537, 11755501 3744480, 11755502 3744479, 11755574 3744429, 11755576 3744428, 11755652 3744386, 11755654 3744385, 11755734 3744350, 11755736 3744349, 11756446 3744138, 11756739 3744045, 11756794 3744048) 
53 LINESTRING (11755466 3744563, 11755526 3744511, 11755596 3744463, 11755671 3744421, 11755749 3744387, 11756458 3744176, 11756458 3744176, 11756750 3744084, 11756794 3744048) 
54 LINESTRING (11762798 3741834, 11763253 3741478, 11763350 3741382) 
55 LINESTRING (11763350 3741382, 11763364 3741367) 
56 LINESTRING (11758970 3743326, 11759023 3743329, 11759295 3743234) 
57 LINESTRING (11758970 3743326, 11759009 3743291, 11759280 3743196) 
58 LINESTRING (11763508 3740495, 11763505 3740642) 
59 LINESTRING (11763505 3740642, 11763509 3740676, 11763510 3740774, 11763500 3740870, 11763471 3741028) 
60 LINESTRING (11763478 3740532, 11763505 3740642) 
61 LINESTRING (11763441 3739375, 11763446 3739407, 11763449 3739565) 
62 LINESTRING (11763393 3739110, 11763413 3739268, 11763413 3739284, 11763428 3739375) 
63 LINESTRING (11763383 3738657, 11763380 3738666, 11763373 3738704, 11763362 3738777, 11763358 3738817, 11763359 3738854, 11763361 3738885, 11763365 3738920, 11763393 3739110) 
64 LINESTRING (11763471 3741028, 11763438 3741210, 11763427 3741246, 11763398 3741312, 11763376 3741348) 
65 LINESTRING (11763376 3741348, 11763364 3741367) 

We can then plot this using the built-in plot commands as:

plot( three_chopt["StreetType"] )

Or using ggplot as:

ggplot( three_chopt ) + 
  geom_sf() + 
  coord_sf()

Polygons

Polygons are simply lines whose first and last point are the same (e.g., they close upon themselves). We can create these de novo

Polygons from Data Frames

As a first approximation, we can grab polygon data from ggplot itself. Here I pull in the data.frame representing the counties of Virginia.

map_data( "county", "virginia") %>%
  select( Longitude = long,
          Latitude = lat,
          group,
          County = subregion) -> va_counties
head( va_counties )

To get an idea of what theses data represent visually, let’s first plot it as a geom_point() object. This wil show you where all the coordinates are located (just not the connecting lines).

ggplot( va_counties, aes( Longitude, Latitude) ) + 
  geom_point( size=0.25 ) + 
  coord_quickmap()

ggplot( va_counties, aes( Longitude, Latitude) ) + 
  geom_polygon( aes( group=group ),
                fill="grey80",
                color = "black", 
                size = 0.25) + 
  coord_quickmap()

What is hidden here is the complexity of the the points themselves. Each county is identified by a group in the data.frame

If we look at a particular county, it may be a bit more informative on how these things are consturcted. Here are the points (in red) and the underlying connecting lines creating the polygon (in grey).

va_counties %>%
  filter( County %in%  c("hanover","henrico") ) %>%
  ggplot( aes(Longitude, Latitude) ) + 
  geom_polygon( aes( fill = County), alpha=0.1 ) +
  geom_point( aes( color = County) ) +
  coord_quickmap()

Notice that the points on the border are repeated in both County == "hanover" and County == "henrico".

Polygons from Shapefiles

We can also load these in from shapefiles. In the Richmond GIS data, we have Zoning District data. We can unzip them in the current directory as before.

unzip( "./Districts.zip")

And in this case, it simply expands all the files in the current directory as a set of files named Zoning_Districts.*.

system("ls -al Zoning*")

To load it in, we read the shapefile (.shp) from the local directory.

districts <- st_read( "Zoning_Districts.shp" )
Reading layer `Zoning_Districts' from data source 
  `/Users/rodney/Documents/github/classes/ENVS-Lectures/lectures/spatial/spatial_polygons/Zoning_Districts.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 634 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 11743500 ymin: 3687944 xmax: 11806060 ymax: 3744741
Projected CRS: NAD83 / Virginia South (ftUS)
class( districts )
[1] "sf"         "data.frame"
sf

data.frame

This has a lot of columns of information.

names( districts )
 [1] "OBJECTID"   "Name"       "Ordinance"  "OrdinanceP" "Conditiona"
 [6] "AdoptionDa" "Comment"    "CreatedBy"  "CreatedDat" "EditBy"    
[11] "EditDate"   "GlobalID"   "Shape__Are" "Shape__Len" "geometry"  
OBJECTID

Name

Ordinance

OrdinanceP

Conditiona

AdoptionDa

Comment

CreatedBy

CreatedDat

EditBy

EditDate

GlobalID

Shape__Are

Shape__Len

geometry
summary( districts )
    OBJECTID          Name            Ordinance          OrdinanceP       
 Min.   :   1.0   Length:634         Length:634         Length:634        
 1st Qu.: 162.2   Class :character   Class :character   Class :character  
 Median : 324.5   Mode  :character   Mode  :character   Mode  :character  
 Mean   : 389.6                                                           
 3rd Qu.: 486.8                                                           
 Max.   :2677.0                                                           
  Conditiona          AdoptionDa           Comment           CreatedBy        
 Length:634         Min.   :2000-01-01   Length:634         Length:634        
 Class :character   1st Qu.:2000-01-01   Class :character   Class :character  
 Mode  :character   Median :2000-01-01   Mode  :character   Mode  :character  
                    Mean   :2004-01-20                                        
                    3rd Qu.:2007-09-10                                        
                    Max.   :2020-07-27                                        
   CreatedDat            EditBy             EditDate         
 Min.   :2020-08-24   Length:634         Min.   :2020-08-24  
 1st Qu.:2020-08-24   Class :character   1st Qu.:2020-08-24  
 Median :2020-08-24   Mode  :character   Median :2020-08-24  
 Mean   :2020-08-24                      Mean   :2020-08-24  
 3rd Qu.:2020-08-24                      3rd Qu.:2020-08-24  
 Max.   :2020-08-24                      Max.   :2020-08-24  
   GlobalID           Shape__Are          Shape__Len                geometry  
 Length:634         Min.   :     2823   Min.   :   213.4   MULTIPOLYGON :634  
 Class :character   1st Qu.:    71070   1st Qu.:  1232.3   epsg:2284    :  0  
 Mode  :character   Median :   258852   Median :  2552.3   +proj=lcc ...:  0  
                    Mean   :  2749033   Mean   :  6265.9                      
                    3rd Qu.:  1175317   3rd Qu.:  6166.8                      
                    Max.   :171812574   Max.   :111874.1                      

More importantly, we can look at the raw data and see the other meta data.

head(districts, n=2)

The whole thing looks like this (I’ll use the area of each polygon as the fill color).

plot( districts["Shape__Are"], axes=TRUE )

Notice it is in CRS = NAD83/Virginia South (ftUS), which if we look at epsg.io and search for it relates to EPGS=32147. Let’s do some pre-processing1:
- Put it in Lat/Lon for simplicity
- Drop some of the unnecessary columns of data in the shapefile. - Crop to the VCU/Fan area (I went to google earth and found the bounding box and then just added it here so I had to make it lat/lon then crop then change it back).

districts %>% 
  select( OBJECTID, 
          Name, 
          GlobalID, 
          Area = Shape__Are,
          geometry) -> districts
head( districts )

And we can plot it normally using plot() for sf objects. Each row is a MULTIPOLYGON object.

districts %>%
  filter( OBJECTID == 368 ) %>%
  st_buffer(dist = 1500) %>%
  st_bbox() -> fan_bbox


districts %>%
  st_crop( fan_bbox ) -> theFan 
Warning: attribute variables are assumed to be spatially constant throughout all
geometries
plot( theFan["Name"] )

Or as a ggplot() object (notice how it converts to lat/lon when plotting),

ggplot( theFan ) + 
  geom_sf( aes( fill=Name ) ) + 
  coord_sf() 

Let’s go grab a key to those zoning types. I’ve uploaded a csv file with a translation. Here I left_join() with that new file that is read in dynamically2.

zone_url <- "https://raw.githubusercontent.com/dyerlab/ENVS-Lectures/master/data/DistrictCodes.csv"

theFan %>%
  left_join( read_csv( zone_url ),
             by="Name") %>%
  mutate( Category = factor( Category) ) %>%
  select( OBJECTID, 
          Name, 
          Category, 
          everything() )  -> theFan
Rows: 27 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Name, Category

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot( theFan ) +
  geom_sf( aes( fill=Category)) +
  scale_fill_brewer( type="qual", 
                     palette = "Set3")

Operations

So we will close this out by looking at a few different operations that we can use for polygons. First, I’m going to load in the road shapefile (that was named by some random sequence of letters) and reproject it.

unzip( "./Roads.zip",)
st_read("Centerlines-shp/tran_Carriageway.shp") -> roads
Reading layer `tran_Carriageway' from data source 
  `/Users/rodney/Documents/github/classes/ENVS-Lectures/lectures/spatial/spatial_polygons/Centerlines-shp/tran_Carriageway.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 29081 features and 25 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 11734060 ymin: 3682790 xmax: 11817490 ymax: 3751927
Projected CRS: NAD83 / Virginia South (ftUS)
head( roads, n=3)

Let’s clean it up a bit.

roads %>%
  select( -CreatedBy, -CreatedDat, -EditBy, -EditDate) -> roads
head( roads )
plot( theFan$geometry, lwd=2 )
fanRoads <- st_crop( roads, st_bbox( theFan ))
Warning: attribute variables are assumed to be spatially constant throughout all
geometries
plot( fanRoads$geometry, col="blue", cex=0.5, add=TRUE )

Let’s isolate one of the main polygons in theFan data set. The target one below is indicated by OBJECTID=368.

theFan %>%
  mutate( Target = ifelse( OBJECTID == 368, 
                           TRUE, 
                           FALSE) ) -> theFan


theFan %>%
  ggplot() + 
  geom_sf( aes(fill=Target) ) + 
  geom_sf_text( aes(label=OBJECTID), size=3 ) +
  coord_sf() 

Spatial Joins

names( theFan )
[1] "OBJECTID" "Name"     "Category" "GlobalID" "Area"     "geometry" "Target"  
OBJECTID

Name

Category

GlobalID

Area

geometry

Target
names( fanRoads )
 [1] "FID"        "Carriagewa" "AssetID"    "StreetType" "Functional"
 [6] "FIPS"       "LeftFromAd" "LeftToAddr" "RightFromA" "RightToAdd"
[11] "PrefixDire" "ProperName" "SuffixType" "SuffixDire" "FullName"  
[16] "RouteName"  "OneWay"     "PostedSpee" "CADRouteSp" "GlobalID"  
[21] "SHAPE_Leng" "geometry"  
FID

Carriagewa

AssetID

StreetType

Functional

FIPS

LeftFromAd

LeftToAddr

RightFromA

RightToAdd

PrefixDire

ProperName

SuffixType

SuffixDire

FullName

RouteName

OneWay

PostedSpee

CADRouteSp

GlobalID

SHAPE_Leng

geometry

We can use spatial joins to select features either directly. Here I’ll use the target polygon in theFan

target <- theFan[ theFan$OBJECTID == 368, ]
target

And then add an attribute to the data.frame if each multipolygon intersects that polygon.

fanRoads %>%
  mutate( OnTarget = st_intersects( fanRoads,
                                    target, 
                                    sparse = FALSE ) ) -> fanRoads
summary( fanRoads$OnTarget )
     V1         
 Mode :logical  
 FALSE:1567     
 TRUE :553      

We can get the names of these road using normal dplyr routines,

fanRoads %>%
  filter( st_intersects( fanRoads,
                         target, 
                         sparse = FALSE ) == TRUE ) %>%
  as_data_frame() %>%
  select( `Street Name` = FullName ) %>%
  arrange( `Street Name`) %>%
  unique() 
Warning: `as_data_frame()` was deprecated in tibble 2.0.0.
Please use `as_tibble()` instead.
The signature and semantics have changed, see `?as_tibble`.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

And we can plot them as:

fanRoads %>%
  filter( OnTarget==TRUE ) %>%
  ggplot() +
  geom_sf( aes( fill = Target ), data=theFan ) +
  geom_sf( color="green" ) + 
  scale_fill_manual( values=c("grey90","dodgerblue3"))

Go check out the sf cheatsheet for more geospatial joins and options.


  1. Dyer’s First Law: Reproject then forget about it!↩︎

  2. You should be careful when you use joins on sf objects. If you sf object is on the right side (see discussion of joins here) then the result will not be an sf object and you’ll have to coerce it back into one again. It always adopts the characteristics of the left object.↩︎

---
title: "Lines & Polygons"
output: html_notebook
---

```{r startup, include=FALSE}
library( sf )
library( maps )
library( ggplot2 )
library( tidyverse )
ggplot2::theme_set( theme_minimal( base_size=16) )
```


In this section, we will focus on lines and polygons.  These are represented as `sf` objects, we can leverage a large amount of `st_*` functions to perform manipulations, and we can visualize them using either built-in routines or via `ggplot` (as expected).

## Raw Data

In this section, we will be working with lines and polygons.  These are going to be represented by roads and development zones in Richmond, Virginia.  These data are made available by the GIS Department of the City of Richmond. For this example, we will be loading these in as *shapefiles*.

You can load in shapefile data directly into R but we have to do a little work.  First, we should understand that *a shapefile* is not an actual file, it is a collection of several files.  They are often zipped up into a single archive.  

Here are two shape file archives that I have up on Github in the class repository.  

```{r}
roads_url <- "https://github.com/dyerlab/ENVS-Lectures/raw/master/data/Centerlines-shp.zip"
district_url <- "https://github.com/dyerlab/ENVS-Lectures/raw/master/data/Zoning_Districts-shp.zip"
```

We can use `R` to download and unzip the file *in the current data directory* (n.b., you can do it using a browser as well).  To use `R` you need to first download them (I've set `eval=FALSE` to the chuck so it is not redownloaded each time.  Run it by hand using `CTRL/CMD + Return`).

```{r}
download.file( district_url , destfile = "./Districts.zip")
download.file( roads_url, destfile =  "./Roads.zip")
```

We can unzip them now as:

```{r}
unzip("Districts.zip")
unzip("Roads.zip")
```

These routines will expand the archives in the current directory.

Depending upon how the archives were created, they may make a subdirectory or just a pile of files in the same directory.  For this example, the are one of each with the *Zoning_Districts.* set of files expanded in the current directory and the *Roads* expanded to a subfolder nnamed *Centerlines-shp*.

```{r}
system( "ls" )
```


## Lines

We've covered [points](https://dyerlab.github.io/ENVS-Lectures/spatial/spatial_points/slides.html#1) and now if we put them together in a sequence, we get lines.  They are taken in the order given.  

Instead of loading these in manually, I'm going to load in the shapefile with the roads.  To load in shapefiles, we use the `st_read()` function and pass it the .shp file.

```{r}
roads <- st_read( "Centerlines-shp/tran_Carriageway.shp" ) 
names( roads )
```

We can clean it up a bit by removing the extraneous columns.

```{r}
roads %>%
  select(-CreatedBy,
         -CreatedDat,
         -EditBy,
         -EditDate) %>%
  select( FIPS, AssetID, StreetType, Functional, FullName, OneWay, geometry ) -> roads
roads
```

You can see that the `geometry` object is a `LINESTRING` (in `sf` terms).  We can see the coordinates for one of these (say *Dwyer St*), by conveting the `geometry` object to a Well Know Text (WKT) version representing the sequence of points.

For any particular street, say *Three Chopt Road* in Richmond, we can filter out the rows of this for each `LINESTRING` object. 

```{r}
roads %>% 
  filter( FullName == "Three Chopt Road") -> three_chopt
three_chopt
```

This one has `r nrow( three_chopt)` elements, each of which is created by a sequence of points.  We can loop through them and print out the coordinates in textual format as:

```{r}
for( i in 1:nrow(three_chopt) ) {
  geo <- three_chopt$geometry[i]
  cat( i, st_as_text( geo ), "\n") 
}
```

We can then plot this using the built-in plot commands as:

```{r}
plot( three_chopt["StreetType"] )
```

Or using `ggplot` as:

```{r}
ggplot( three_chopt ) + 
  geom_sf() + 
  coord_sf()
```

## Polygons

Polygons are simply lines whose first and last point are the same (e.g., they close upon themselves).  We can create these *de novo*

### Polygons from Data Frames

As a first approximation, we can grab polygon data from `ggplot` itself.  Here I pull in the `data.frame` representing the counties of Virginia.

```{r}
map_data( "county", "virginia") %>%
  select( Longitude = long,
          Latitude = lat,
          group,
          County = subregion) -> va_counties
head( va_counties )
```
To get an idea of what theses data represent visually, let's first plot it as a `geom_point()` object.  This wil show you where all the coordinates are located (just not the connecting lines).


```{r}
ggplot( va_counties, aes( Longitude, Latitude) ) + 
  geom_point( size=0.25 ) + 
  coord_quickmap()
```



```{r}
ggplot( va_counties, aes( Longitude, Latitude) ) + 
  geom_polygon( aes( group=group ),
                fill="grey80",
                color = "black", 
                size = 0.25) + 
  coord_quickmap()
```

What is hidden here is the complexity of the the points themselves.  Each county is identified by a `group` in the `data.frame` 




If we look at a particular county, it may be a bit more informative on how these things are consturcted.  Here are the points (in red) and the underlying connecting lines creating the polygon (in grey).  

```{r}
va_counties %>%
  filter( County %in%  c("hanover","henrico") ) %>%
  ggplot( aes(Longitude, Latitude) ) + 
  geom_polygon( aes( fill = County), alpha=0.1 ) +
  geom_point( aes( color = County) ) +
  coord_quickmap()
```

Notice that the points on the border are repeated in both `County == "hanover"` and `County == "henrico"`.

### Polygons from Shapefiles

We can also load these in from shapefiles.  In the Richmond GIS data, we have Zoning District data.  We can unzip them in the current directory as before.

```{r eval=FALSE}
unzip( "./Districts.zip")
```

And in this case, it simply expands all the files in the current directory as a set of files named `Zoning_Districts.*`. 

```{r}
system("ls -al Zoning*")
```

To load it in, we read the shapefile (.shp) from the local directory.

```{r}
districts <- st_read( "Zoning_Districts.shp" )
class( districts )
```

This has a lot of columns of information.

```{r}
names( districts )
```


```{r}
summary( districts )
```

More importantly, we can look at the raw data and see the other meta data.

```{r}
head(districts, n=2)
```


The whole thing looks like this (I'll use the area of each polygon as the fill color).

```{r}
plot( districts["Shape__Are"], axes=TRUE )
```
Notice it is in `CRS = NAD83/Virginia South (ftUS)`, which if we look at [epsg.io](http://epsg.io/32147) and search for it relates to EPGS=32147.  Let's do some pre-processing[^1]:  
- Put it in Lat/Lon for simplicity  
- Drop some of the unnecessary columns of data in the shapefile. 
- Crop to the VCU/Fan area (I went to google earth and found the bounding box and then just added it here so I had to make it lat/lon then crop then change it back).

```{r}
districts %>% 
  select( OBJECTID, 
          Name, 
          GlobalID, 
          Area = Shape__Are,
          geometry) -> districts
head( districts )
```





And we can plot it normally using `plot()` for `sf` objects.  Each row is a `MULTIPOLYGON` object.

```{r}

districts %>%
  filter( OBJECTID == 368 ) %>%
  st_buffer(dist = 1500) %>%
  st_bbox() -> fan_bbox


districts %>%
  st_crop( fan_bbox ) -> theFan 

plot( theFan["Name"] )


```
Or as a `ggplot()` object (notice how it converts to lat/lon when plotting),

```{r}
ggplot( theFan ) + 
  geom_sf( aes( fill=Name ) ) + 
  coord_sf() 
```


Let's go grab a key to those zoning types.  I've uploaded a csv file with a translation.  Here I `left_join()` with that new file that is read in dynamically[^2].

```{r}
zone_url <- "https://raw.githubusercontent.com/dyerlab/ENVS-Lectures/master/data/DistrictCodes.csv"

theFan %>%
  left_join( read_csv( zone_url ),
             by="Name") %>%
  mutate( Category = factor( Category) ) %>%
  select( OBJECTID, 
          Name, 
          Category, 
          everything() )  -> theFan
```


```{r}
ggplot( theFan ) +
  geom_sf( aes( fill=Category)) +
  scale_fill_brewer( type="qual", 
                     palette = "Set3")
```


## Operations

So we will close this out by looking at a few different operations that we can use for polygons.  First, I'm going to load in the road shapefile (that was named by some random sequence of letters) and reproject it.

```{r}
unzip( "./Roads.zip",)
st_read("Centerlines-shp/tran_Carriageway.shp") -> roads
head( roads, n=3)
```

Let's clean it up a bit.

```{r}
roads %>%
  select( -CreatedBy, -CreatedDat, -EditBy, -EditDate) -> roads
head( roads )
```

```{r}
plot( theFan$geometry, lwd=2 )
fanRoads <- st_crop( roads, st_bbox( theFan ))
plot( fanRoads$geometry, col="blue", cex=0.5, add=TRUE )
```



Let's isolate one of the main polygons in `theFan` data set.  The target one below is indicated by `OBJECTID=368`.

```{r}
theFan %>%
  mutate( Target = ifelse( OBJECTID == 368, 
                           TRUE, 
                           FALSE) ) -> theFan


theFan %>%
  ggplot() + 
  geom_sf( aes(fill=Target) ) + 
  geom_sf_text( aes(label=OBJECTID), size=3 ) +
  coord_sf() 
```

## Spatial Joins

```{r}
names( theFan )
names( fanRoads )
```

We can use *spatial joins* to select features either directly.  Here I'll use the target polygon in `theFan` 

```{r}
target <- theFan[ theFan$OBJECTID == 368, ]
target
```

And then add an attribute to the `data.frame` if each multipolygon `intersects` that polygon.

```{r}
fanRoads %>%
  mutate( OnTarget = st_intersects( fanRoads,
                                    target, 
                                    sparse = FALSE ) ) -> fanRoads
summary( fanRoads$OnTarget )
```



We can get the names of these road using normal `dplyr` routines,

```{r}
fanRoads %>%
  filter( st_intersects( fanRoads,
                         target, 
                         sparse = FALSE ) == TRUE ) %>%
  as_data_frame() %>%
  select( `Street Name` = FullName ) %>%
  arrange( `Street Name`) %>%
  unique() 
```

And we can plot them as:

```{r}
fanRoads %>%
  filter( OnTarget==TRUE ) %>%
  ggplot() +
  geom_sf( aes( fill = Target ), data=theFan ) +
  geom_sf( color="green" ) + 
  scale_fill_manual( values=c("grey90","dodgerblue3"))
```

Go check out the `sf` cheatsheet for more geospatial joins and options.



[^1]: **Dyer's First Law**: Reproject then forget about it!

[^2]: You should be careful when you use joins on `sf` objects.  If you `sf` object is on the right side (see discussion of joins [here](https://dyerlab.github.io/ENVS-Lectures/manipulation/relational_data/slides.html#5)) then the result will not be an `sf` object and you'll have to coerce it back into one again.  It always adopts the characteristics of the left object. 
