class: left, bottom
background-image: url("images/contour.png")
background-position: right
background-size: auto

# Visualization with GGPlot

### Environmental Data Literacy

<p> </p>

<p> </p>

<img src="images/logo1.svg" width="400px">

---

# Learning Objectives

- Set up online data repositories, such as Google Drive, and load their contents directly into RStudio.
- Describe how the Grammar of Graphics differs from the built-in `plot`ting commands.
- Use an `aes`-thetic to define which components of the data will be used in constructing the graphic.
- Apply `geom`-etric layers to existing plots.

---
class: sectionTitle

# Online Repositories

---

# Data Integrity

Where data lives is an indication of its integrity. The following characteristics are indications of integrity:

- Central location
- Easily accessible
- Described (metadata)
- Versioning

---
background-image: url("https://live.staticflickr.com/65535/50294277523_f97d3cfd61_c_d.jpg")
background-position: center
background-size: auto

# GitHub/GitLab

[Dyerlab GitHub Repository](https://github.com/dyerlab)

---

# Loading Directly

## GitHub or other Repository

```r
library( readr )
url <- "https://raw.githubusercontent.com/dyerlab/ENVS-Lectures/master/data/arapat.csv"
data <- read_csv(url)
```

---

# Loading Directly

```r
head( data )
```

```
## # A tibble: 6 x 3
##   Stratum Longitude Latitude
##   <chr>       <dbl>    <dbl>
## 1 88          -114.     29.3
## 2 9           -114.     29.0
## 3 84          -114.     29.0
## 4 175         -113.     28.7
## 5 177         -114.     28.7
## 6 173         -113.     28.4
```

---

# Loading From Google Drive

You can keep your data in a spreadsheet on Google Docs and then configure your repository to make it available for download to anyone who has a link.

[Here](https://docs.google.com/spreadsheets/d/1Mk1YGH9LqjF7drJE-td1G_JkdADOU0eMlrP01WFBT8s) is a data set from the Rice Rivers Center with the following characteristics:

- 8200 records
- The following variables: `DateTime, RecordID, PAR, WindSpeed_mph, WindDir, AirTempF, RelHumidity, BP_HG, Rain_in, H2O_TempC, SpCond_mScm, Salinity_ppt, PH, PH_mv, Turbidity_ntu, Chla_ugl, BGAPC_CML, BGAPC_rfu, ODO_sat, ODO_mgl, Depth_ft, Depth_m, SurfaceWaterElev_m_levelNad83m`

---

# Making Data Available

.pull-left[

- Put your data into a Google Spreadsheet
- Select `Publish to the Web`
- Select `Link`, which sheet to publish, and `Comma-separated values (csv)`
- Retrieve the very long URL.
]

.pull-right[

![Publish Dialog](https://live.staticflickr.com/65535/50295015666_d6aeb39873_c_d.jpg)

]

---

# Making Data Available

.pull-left[

- Put your data into a Google Spreadsheet
- Select `Publish to the Web`
- Select `Link`, which sheet to publish, and `Comma-separated values (csv)`
- Retrieve the very long URL.

]

.pull-right[

![Publish Dialog](https://live.staticflickr.com/65535/50294350108_d8ebc88016_c_d.jpg)

]

---

# Load in the Data

```r
##
## NOTICE I SPLIT THIS SO IT FITS ON ONE LINE IN THE SLIDES
##
url_parts <- c( "https://docs.google.com/spreadsheets/d/e",
                "/2PACX-1vRm_v8JJPSipkDywT354v2owDBWa1j82",
                "_OhvdQmSBRztSv8YuWZBYe73T3jiY6suQhYoGCiV",
                "Y3gu9jW/pub?gid=0&single=true&output=csv" )

rice <- read_csv( paste(url_parts, collapse="") )
```

---

# The Rice Data

```
## # A tibble: 6 x 23
##   DateTime RecordID   PAR WindSpeed_mph WindDir AirTempF RelHumidity BP_HG
##   <chr>       <dbl> <dbl>         <dbl>   <dbl>    <dbl>       <dbl> <dbl>
## 1 1/1/201…    43816     0          3.87    14.6     31.0        80.5  30.3
## 2 1/1/201…    43817     0          4.79    18.5     30.7        82.1  30.3
## 3 1/1/201…    43818     0          3.61    16.2     31.2        81.9  30.3
## 4 1/1/201…    43819     0          2.99    11.5     30.5        83    30.3
## 5 1/1/201…    43820     0          3.52    11.3     30.9        81.8  30.3
## 6 1/1/201…    43821     0          3.83    20.0     30.6        82.8  30.3
## # … with 15 more variables: Rain_in <dbl>, H2O_TempC <dbl>, SpCond_mScm <dbl>,
## #   Salinity_ppt <dbl>, PH <dbl>, PH_mv <dbl>, Turbidity_ntu <dbl>,
## #   Chla_ugl <dbl>, BGAPC_CML <dbl>, BGAPC_rfu <dbl>, ODO_sat <dbl>,
## #   ODO_mgl <dbl>, Depth_ft <dbl>, Depth_m <dbl>,
## #   SurfaceWaterElev_m_levelNad83m <dbl>
```

---
class: sectionTitle

# The Grammar of Graphics

---
class: middle
background-image: url("https://live.staticflickr.com/65535/50295214407_bc0f4d10b6_c_d.jpg")
background-position: center
background-size: auto

---
class: middle
background-image: url("https://live.staticflickr.com/65535/50294412713_25bbd52230_c_d.jpg")
background-position: center
background-size: auto

---

# Components of Graphical Objects

- An aesthetic statement indicating which columns of data to use and how to use them in the plot (designating x-axis vs color, etc.).
- An estimate of a trendline through the data (the red one), which displays a statistical summary of the raw data.
- A set of geometric overlays for the points which include size and shape configurations.
- A specified color scheme for the regions.
- Labeling of a subset of the data (which is done using a separate data.frame derived from the first).
- Labels on axes.
- A legend positioned in a specific fashion.
- A title over the whole thing.
- A theme for the rest of the coloring and customized lines and grids.

---
class: middle
background-image: url("https://live.staticflickr.com/65535/50295214407_bc0f4d10b6_c_d.jpg")
background-position: center
background-size: auto

---

# The Grammar of Graphics

.pull-left[

Components of graphics:

- Data
- Aesthetics
- Transformations
- Partitions
- Auxiliary Text
- Overlays

]

.pull-right[

![asdf](https://live.staticflickr.com/65535/50295255672_46390d9ee5_w_d.jpg)

]

---

# The `ggplot2` Library

.pull-left[

![Tidyverse](https://live.staticflickr.com/65535/50295284047_ebb5dec2e8_w_d.jpg)

]

.pull-right[

### R Packages for Data Science

- RStudio + Hadley Wickham
- Collection of Packages
- Makes you .red[AWESOME]

```r
library( ggplot2 )
```

]

---

# The Aesthetics

An *aesthetic* mapping, created with the `aes()` function, tells the graphics which columns of data are to be used in the creation of graph features.
```r
aes( x = Sepal.Length, y = Sepal.Width )
```

```
## Aesthetic mapping:
## * `x` -> `Sepal.Length`
## * `y` -> `Sepal.Width`
```

--

Commonly included **within** the initial call to `ggplot()`

```r
ggplot( iris, aes( x = Sepal.Length, y = Sepal.Width ) )
```

---

# Stepwise Creation of a Plot

```r
ggplot( iris )
```

<img src="slides_files/figure-html/unnamed-chunk-9-1.png" width="40%" style="display: block; margin: auto;" />

---

```r
ggplot( iris, aes( x = Sepal.Length) )
```

<img src="slides_files/figure-html/unnamed-chunk-10-1.png" width="40%" style="display: block; margin: auto;" />

---

# Adding Geometry Layer

```r
ggplot( iris, aes( x = Sepal.Length) ) +
  geom_histogram()
```

<img src="slides_files/figure-html/unnamed-chunk-11-1.png" width="40%" style="display: block; margin: auto;" />

---

# Simple Density Plot

```r
ggplot( iris, aes( x = Sepal.Length) ) +
  geom_density()
```

<img src="slides_files/figure-html/unnamed-chunk-12-1.png" width="40%" style="display: block; margin: auto;" />

---

# A Scatter Plot

```r
ggplot( iris, aes( x = Sepal.Length, y = Sepal.Width) ) +
  geom_point()
```

<img src="slides_files/figure-html/unnamed-chunk-13-1.png" width="40%" style="display: block; margin: auto;" />

---

# Scatterplot with Colors

*Aesthetics* also contribute to symbologies and colors

```r
ggplot( iris, aes( x = Sepal.Length, y = Sepal.Width, color=Species) ) +
  geom_point()
```

<img src="slides_files/figure-html/unnamed-chunk-14-1.png" width="40%" style="display: block; margin: auto;" />

---

# In & Out of `aes()`

```r
ggplot( iris ) +
  geom_point(aes( x = Sepal.Length, y = Sepal.Width, col=Species), shape=5)
```

<img src="slides_files/figure-html/unnamed-chunk-15-1.png" width="40%" style="display: block; margin: auto;" />

---

# Iterative Building of Graphics

```r
p <- ggplot( iris )
p <- p + geom_point( aes( x = Sepal.Length, y = Sepal.Width, col=Species, shape=Species), size=3, alpha=0.75 )
p <- p + xlab("Sepal Length")
p <- p + ylab("Sepal Width")   # y is mapped to Sepal.Width
class(p)
```

```
## [1] "gg"     "ggplot"
```

---

# Printing out `p`

```r
p
```

<img src="slides_files/figure-html/unnamed-chunk-17-1.png" width="40%" style="display: block; margin: auto;" />

---

# Scope of Visibility

Only things in `ggplot()` apply to *all* following components. Placing `aes()` or `data=` parts in a later component makes them visible *only* to that particular component.

```r
ggplot( iris, aes( x = Sepal.Length, y = Sepal.Width) ) +
  geom_point() +
  stat_smooth()
```

--

```r
ggplot( iris ) +
  geom_point( aes( x = Sepal.Length, y = Sepal.Width) ) +
  stat_smooth( aes( x = Sepal.Length, y = Sepal.Width) )
```

--

```r
ggplot() +
  geom_point( aes( x = Sepal.Length, y = Sepal.Width), data = iris ) +
  stat_smooth( aes( x = Sepal.Length, y = Sepal.Width), data = iris )
```

---
class: sectionTitle

# Themes

## Additive Components to the Plot

---
class: middle
background-image: url("https://media.giphy.com/media/dWEk3w1Uo97qw/giphy.gif")
background-position: center
background-size: cover

.top[
<font color="white" size=24>Themes are also Customizable!!</font>
]

<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>

---

```r
p + theme_bw()
```

<img src="slides_files/figure-html/unnamed-chunk-21-1.png" width="40%" style="display: block; margin: auto;" />

---

```r
p + theme_gray()
```

<img src="slides_files/figure-html/unnamed-chunk-22-1.png" width="40%" style="display: block; margin: auto;" />

---

```r
p + theme_linedraw()
```

<img src="slides_files/figure-html/unnamed-chunk-23-1.png" width="40%" style="display: block; margin: auto;" />

---

```r
p + theme_dark()
```

<img src="slides_files/figure-html/unnamed-chunk-24-1.png" width="40%" style="display: block; margin: auto;" />

---

```r
p + theme_minimal()
```

<img src="slides_files/figure-html/unnamed-chunk-25-1.png" width="40%" style="display: block; margin: auto;" />

---

```r
p + theme_classic()
```

<img
src="slides_files/figure-html/unnamed-chunk-26-1.png" width="40%" style="display: block; margin: auto;" />

---

```r
p + theme_void()
```

<img src="slides_files/figure-html/unnamed-chunk-27-1.png" width="40%" style="display: block; margin: auto;" />

---

# Customizing Theme Components

```r
p + theme_bw( base_size = 18 )
```

<img src="slides_files/figure-html/unnamed-chunk-28-1.png" width="40%" style="display: block; margin: auto;" />

---
class: inverse

# Create Your Own Themes

```r
source("theme_dyerlab_grey.R")
p + theme_dyerlab_grey()
```

<img src="slides_files/figure-html/unnamed-chunk-29-1.png" width="40%" style="display: block; margin: auto;" />

---

# Boxplot

```r
ggplot( iris, aes( x = Sepal.Length) ) +
  geom_boxplot( notch=TRUE )
```

<img src="slides_files/figure-html/unnamed-chunk-31-1.png" width="40%" style="display: block; margin: auto;" />

---

# Species Differences

```r
ggplot( iris, aes(x=Species, y=Sepal.Length) ) +
  geom_boxplot( notch=TRUE )
```

<img src="slides_files/figure-html/unnamed-chunk-32-1.png" width="40%" style="display: block; margin: auto;" />

---

# Species Differences Fill Colors

```r
ggplot( iris, aes(x=Species, y=Sepal.Length) ) +
  geom_boxplot( notch=TRUE, fill=c("#002145") ) +
  ylab("Sepal Length")
```

<img src="slides_files/figure-html/unnamed-chunk-33-1.png" width="40%" style="display: block; margin: auto;" />

---

# Species Differences Fill Colors

```r
ggplot( iris, aes(x=Species, y=Sepal.Length) ) +
  geom_boxplot( notch=TRUE, fill=c("#002145", "#a5acaf","#66c010") ) +
  ylab("Sepal Length")
```

<img src="slides_files/figure-html/unnamed-chunk-34-1.png" width="40%" style="display: block; margin: auto;" />

---
class: sectionTitle

# Overlays

---

# Overlaying a Trendline

```r
ggplot( iris, aes( x = Sepal.Length, y = Sepal.Width) ) +
  geom_point() +
  stat_smooth()
```

<img src="slides_files/figure-html/unnamed-chunk-35-1.png" width="40%" style="display: block; margin: auto;" />

---

# Overlaying a Trendline

```r
ggplot( iris, aes( x =
Sepal.Length, y = Sepal.Width) ) +
  geom_point() +
  stat_smooth( method="lm", formula = y ~ x )
```

<img src="slides_files/figure-html/unnamed-chunk-36-1.png" width="40%" style="display: block; margin: auto;" />

---

# Stacking Order

```r
ggplot( iris, aes( x = Sepal.Length, y = Sepal.Width) ) +
  geom_point( color="red") +
  stat_smooth( fill="black", alpha=1)
```

<img src="slides_files/figure-html/unnamed-chunk-37-1.png" width="40%" style="display: block; margin: auto;" />

---

# Stacking Order

```r
ggplot( iris, aes( x = Sepal.Length, y = Sepal.Width) ) +
  stat_smooth( fill="black", alpha=1) +
  geom_point( color="red")
```

<img src="slides_files/figure-html/unnamed-chunk-38-1.png" width="40%" style="display: block; margin: auto;" />

---

# On-The-Fly Transformations

Customizing the y-axis data format...

```r
# after_stat(density) replaces the older ..density.. notation (ggplot2 >= 3.3.0)
ggplot( iris, aes(x = Sepal.Length) ) +
  geom_histogram( aes( y = after_stat(density) ), color="green", fill="orange", bins = 15 ) +
  geom_density( color = "magenta", lwd=1.5 )
```

<img src="slides_files/figure-html/unnamed-chunk-39-1.png" width="30%" style="display: block; margin: auto;" />

---

# Textual Overlays

```r
cor_model <- cor.test( iris$Sepal.Length, iris$Sepal.Width)
cor_model
```

```
##
##  Pearson's product-moment correlation
##
## data:  iris$Sepal.Length and iris$Sepal.Width
## t = -1.4403, df = 148, p-value = 0.1519
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.27269325  0.04351158
## sample estimates:
##        cor
## -0.1175698
```

--

```r
names( cor_model )
```

```
## [1] "statistic"   "parameter"   "p.value"     "estimate"    "null.value"
## [6] "alternative" "method"      "data.name"   "conf.int"
```

---

# Formatting as Text

```r
cor.text <- paste( "r = ", format( cor_model$estimate, digits=4), ";\n P = ", format( cor_model$p.value, digits=4 ), sep="" )
cor.text
```

```
## [1] "r = -0.1176;\n P = 0.1519"
```

---

```r
p + geom_text( aes(x=7.25, y=4.25, label=cor.text) )
```

<img
src="slides_files/figure-html/unnamed-chunk-43-1.png" width="55%" style="display: block; margin: auto;" />

---

# Labels

```r
sLength <- by( iris$Sepal.Length, iris$Species, mean )
sWidth <- by( iris$Sepal.Width, iris$Species, mean )
df <- data.frame( Sepal.Length = as.numeric( sLength ),
                  Sepal.Width = as.numeric( sWidth ),
                  Species = levels( iris$Species ) )
df
```

```
##   Sepal.Length Sepal.Width    Species
## 1        5.006       3.428     setosa
## 2        5.936       2.770 versicolor
## 3        6.588       2.974  virginica
```

---

# Labels

```r
ggplot( iris, aes(Sepal.Length, Sepal.Width) ) +
  geom_point( aes(color=Species) ) +
  geom_text( aes(label=Species), data=df)
```

<img src="slides_files/figure-html/unnamed-chunk-45-1.png" width="40%" style="display: block; margin: auto;" />

---

# Smart Labels

```r
library( ggrepel )
```

![ggrepel Help](https://live.staticflickr.com/65535/50305860422_f3cf5f5545_c_d.jpg)

---

```r
ggplot( iris, aes(Sepal.Length, Sepal.Width) ) +
  geom_point( aes(color=Species) ) +
  geom_label_repel( aes(label=Species), data=df )
```

<img src="slides_files/figure-html/unnamed-chunk-47-1.png" width="40%" style="display: block; margin: auto;" />

---

# Remove Legend

```r
ggplot( iris, aes(Sepal.Length, Sepal.Width) ) +
  geom_point( aes(color=Species) ) +
  geom_label_repel( aes(label=Species), data=df ) +
  guides( color = "none" )
```

<img src="slides_files/figure-html/unnamed-chunk-48-1.png" width="40%" style="display: block; margin: auto;" />

---
class: middle
background-image: url("images/contour.png")
background-position: right
background-size: auto

.center[

# Questions?

![Peter Sellers](images/peter_sellers.gif)

]

<p> </p>

.bottom[ If you have any questions about the content presented herein, please feel free to send me an [email](mailto:rjdyer@vcu.edu) and I'll get back to you as soon as possible.]
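
---

# Appendix: Combining the Components

The items listed on the *Components of Graphical Objects* slide can be sketched as one plot. This is only an illustration built on `iris`, not the data behind the original figure; the object names `centers` and `p_all`, the title text, and the legend placement are assumptions made for this example.

```r
library( ggplot2 )

# Separate data.frame of per-species means, used only for the labels
# (hypothetical helper object, not part of the original slides)
centers <- aggregate( cbind(Sepal.Length, Sepal.Width) ~ Species,
                      data = iris, FUN = mean )

p_all <- ggplot( iris, aes( x = Sepal.Length, y = Sepal.Width ) ) +      # aesthetic statement
  geom_point( aes( color = Species ), size = 3, alpha = 0.75 ) +          # geometric overlay
  stat_smooth( method = "lm", formula = y ~ x, color = "red" ) +          # trendline summary
  scale_color_manual( values = c("#002145", "#a5acaf", "#66c010") ) +     # color scheme
  geom_text( aes( label = Species ), data = centers ) +                   # labels from second data.frame
  xlab( "Sepal Length" ) + ylab( "Sepal Width" ) +                        # axis labels
  ggtitle( "Sepal dimensions in iris" ) +                                 # title
  theme_bw() +                                                            # overall theme
  theme( legend.position = "bottom" )                                     # legend placement

p_all
```

Each `+` contributes one component, so the layers can be added or dropped one at a time while exploring the data.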