class: left, bottom
background-image: url("images/contour.png")
background-position: right
background-size: auto

# Basic Data Visualization

### Environmental Data Literacy

<p> </p>

<p> </p>

<img src="images/logo1.svg" width="400px">

---
background-image: url("images/throw_into_pool.gif")
background-position: center
background-size: cover

## .white[How most learn to make graphics]

---
background-color: black

<iframe width="100%" height="100%" src="https://www.youtube.com/embed/tDDzStY8WMU" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

---
class: center, middle
background-image: url("images/DataWorkCycle.png")
background-size: contain

---

# The Example Data - Iris

There is a classic data set in statistics called *Fisher's Iris Data Set* (see more about [Ronald Fisher](https://en.wikipedia.org/wiki/Ronald_Fisher)) containing measurements of *sepal* and *petal* lengths and widths for 50 flowers from each of three species of *Iris*.<sup>1</sup>

![Iris species](https://live.staticflickr.com/65535/50163458792_2e3e877468_c_d.jpg)

.footnote[[1] These data were first published in Fisher RA (1936) The use of multiple measurements in taxonomic problems. *Annals of Eugenics*, **7**: [doi](https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-1809.1936.tb02137.x).]

---

# Anatomy of a Flower

--

![Iris morphology](https://live.staticflickr.com/65535/50274494323_4e215e571b_c_d.jpg)

---

# The Example Data - Iris
---

# The Example Data - Iris

--

```r
summary( iris )
```

```
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
##
##
##
```

---

# Learning Objectives

For this lecture, the learning objectives include:

- Create univariate and bivariate plots of data (continuous-continuous & continuous-categorical).

- Apply varying basic symbologies for representing data in plots.

- Use named and hex colors to better

---
class: sectionTitle

# Univariate Data

---

# A Single Vector of Data

```r
sepal_length <- iris$Sepal.Length
head(sepal_length)
```

```
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
```

---

# A Single Vector of Data - Histograms

```r
hist( sepal_length )
```

<img src="slides_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

# Arguments to Customize Plots

- `xlab` & `ylab`: The labels attached to the x- and y-axes.
- `main`: The title on top of the graph.
- `breaks`: This controls how the original data are partitioned into bins (e.g., the width of the bars along the x-axis).
  - If you pass a single number, `n`, to this option, the data will be partitioned into approximately `n` bins (R treats the number as a suggestion and picks "pretty" break points).
  - If you pass a sequence of values, that sequence is used as the boundaries of the bins.
- `col`: The fill color of the bars (not the border).
- `probability`: A flag, either `TRUE` or `FALSE` (the default), that scales the y-axis to the probability density of each bin rather than a count of the number of elements in that range.
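
---

# Customizing a Histogram

The arguments listed above can be combined in a single call. The following is only a sketch: the bin boundaries, the color `"steelblue"`, and the label text are illustrative choices rather than anything prescribed by the lecture.

```r
sepal_length <- iris$Sepal.Length

# hist() invisibly returns the binning it used, so we can keep it for inspection
h <- hist( sepal_length,
           breaks = seq( 4, 8, by = 0.5 ),  # explicit bin boundaries (assumed values)
           col = "steelblue",               # fill color of the bars
           probability = TRUE,              # y-axis shows density instead of counts
           xlab = "Sepal Length (cm)",
           ylab = "Density",
           main = "Iris Sepal Lengths" )

# Counts per bin are still stored, even though densities are plotted
h$counts
```

Because a sequence was passed to `breaks`, the bins are exactly those boundaries; passing a single number instead would leave the final choice of boundaries to `R`.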
---

# Density Plots

```r
d_sepal.length <- density( sepal_length )
d_sepal.length
```

```
##
## Call:
##  density.default(x = sepal_length)
##
## Data: sepal_length (150 obs.);  Bandwidth 'bw' = 0.2736
##
##        x               y
##  Min.   :3.479   Min.   :0.0001495
##  1st Qu.:4.790   1st Qu.:0.0341599
##  Median :6.100   Median :0.1534105
##  Mean   :6.100   Mean   :0.1905934
##  3rd Qu.:7.410   3rd Qu.:0.3792237
##  Max.   :8.721   Max.   :0.3968365
```

--

```r
class(d_sepal.length)
```

```
## [1] "density"
```

---

# Density Plots

--

```r
plot( d_sepal.length )
```

<img src="slides_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---
class: sectionTitle

# Bivariate Data

---

# The Generality of `plot()`

In `R`, many objects understand how to `plot` themselves.

- Density objects
- Analyses (regression, ANOVA, etc.)
- Points, lines, polygons, & rasters

---

# A Scatter Plot

--

```r
plot( iris$Sepal.Length, iris$Sepal.Width )
```

<img src="slides_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

# Functional Forms

### Listing Data as Separate Values

```r
plot( x, y )
```

--

### Listing the Functional Form of the Data

```r
plot( y ~ x)
```

---
class: sectionTitle

# Customizing Plots

---

# `plot()` Options

Parameter | Description
----------|------------------------------------------------------------
`type`    | The kind of plot to show ('p'oint, 'l'ine, 'b'oth, or 'o'ver). A point plot is the default.
`pch`     | The character (or symbol) used to plot each point. There are 26 recognized general symbols (`pch` 0 through 25). The default is `pch=1`.
`col`     | The color of the symbols/lines that are plotted.
`cex`     | The magnification of the plotted symbols. The default is `cex=1`; larger values (`cex > 1`) enlarge the symbols and smaller values (`0 < cex < 1`) shrink them. The related `cex.lab` and `cex.axis` scale the axis labels and tick labels.
`lwd`     | The width of any lines in the plot.
`lty`     | The type of line to be plotted (solid, dashed, etc.)
`bty`     | The 'Box' type around the plot ("o", "l", "7", "c", "u", "]", and my favorite "n")

---

<img src="slides_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---

# Species Differences in the `iris` dataset

```r
summary( iris )
```

```
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
##
##
##
```

---

# Symbology

```r
symbol <- as.numeric( iris$Species)
symbol
```

```
##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
##  [75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3
## [112] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [149] 3 3
```

---

# Species Differences by Symbol

```r
plot( iris$Sepal.Length, iris$Sepal.Width, pch=symbol )
```

<img src="slides_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

# Additional Customizations

```r
plot( Sepal.Width ~ Sepal.Length, data = iris,
      pch = symbol, bty="n",
      cex=1.5, cex.axis=1.5, cex.lab = 1.5,
      xlab="Sepal Length", ylab="Sepal Width")
```

<img src="slides_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---
class: sectionTitle

# Color Spaces

## Named & Hexadecimal

---

# Named Colors

In `R`, there are 657 different *named* colors accessible through the function `colors()`.

```r
sample( colors(), size=5, replace = FALSE )
```

```
## [1] "red2"       "gray62"     "slateblue2" "gray83"     "pink3"
```

---

```r
raw_colors <- sample( colors(), size=3, replace=FALSE)
colors <- raw_colors[ symbol ]
colors
```

```
##   [1] "gold1"          "gold1"          "gold1"          "gold1"
##   [5] "gold1"          "gold1"          "gold1"          "gold1"
##   [9] "gold1"          "gold1"          "gold1"          "gold1"
##  [13] "gold1"          "gold1"          "gold1"          "gold1"
##  [17] "gold1"          "gold1"          "gold1"          "gold1"
##  [21] "gold1"          "gold1"          "gold1"          "gold1"
##  [25] "gold1"          "gold1"          "gold1"          "gold1"
##  [29] "gold1"          "gold1"          "gold1"          "gold1"
##  [33] "gold1"          "gold1"          "gold1"          "gold1"
##  [37] "gold1"          "gold1"          "gold1"          "gold1"
##  [41] "gold1"          "gold1"          "gold1"          "gold1"
##  [45] "gold1"          "gold1"          "gold1"          "gold1"
##  [49] "gold1"          "gold1"          "gray16"         "gray16"
##  [53] "gray16"         "gray16"         "gray16"         "gray16"
##  [57] "gray16"         "gray16"         "gray16"         "gray16"
##  [61] "gray16"         "gray16"         "gray16"         "gray16"
##  [65] "gray16"         "gray16"         "gray16"         "gray16"
##  [69] "gray16"         "gray16"         "gray16"         "gray16"
##  [73] "gray16"         "gray16"         "gray16"         "gray16"
##  [77] "gray16"         "gray16"         "gray16"         "gray16"
##  [81] "gray16"         "gray16"         "gray16"         "gray16"
##  [85] "gray16"         "gray16"         "gray16"         "gray16"
##  [89] "gray16"         "gray16"         "gray16"         "gray16"
##  [93] "gray16"         "gray16"         "gray16"         "gray16"
##  [97] "gray16"         "gray16"         "gray16"         "gray16"
## [101] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [105] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [109] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [113] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [117] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [121] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [125] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [129] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [133] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [137] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [141] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [145] "palevioletred3" "palevioletred3" "palevioletred3" "palevioletred3"
## [149] "palevioletred3" "palevioletred3"
```

---

# Adding a Legend

--

```r
plot( Sepal.Width ~ Sepal.Length, data = iris,
      col = colors, pch=20, bty="n", cex=2,
      xlab="Sepal Length", ylab="Sepal Width")
legend(6.5,4.3, pch=20, cex=1.5, col=raw_colors,legend=levels(iris$Species) )
```

<img src="slides_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

---

# Hex Colors

Color space defined by three channels:

- Red
- Green
- Blue

--

In base-16 no less:

> 0 1 2 3 4 5 6 7 8 9 A B C D E F

So with two digits, each color channel takes one of 256 distinct values (16 × 16):

> 00 → FF

---

# Hex Colors

Colors are represented as RRGGBB triplets preceded by a hash sign (`#`).

```r
raw_colors <- c("#86cb92", "#8e4162", "#260F26")
colors <- raw_colors[ symbol ]
```

---

<img src="slides_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

---

# Color Theme Generators

Search for something like "Color Theme Generator" and see what you find. One I use is:

[coolors](https://coolors.co)

---

# Color Brewer

[Color Brewer](https://colorbrewer2.org)

---

# Color Brewer in R

---

```r
library(RColorBrewer)
display.brewer.all()
```

<img src="slides_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

---
class: sectionTitle

# Categorical Data

---

# Categorical Dependent Data

---

# Mean Sepal Length, by Species

```r
mu.Setosa <- mean( iris$Sepal.Length[ iris$Species == "setosa" ])
mu.Versicolor <- mean( iris$Sepal.Length[ iris$Species == "versicolor" ])
mu.Virginica <- mean( iris$Sepal.Length[ iris$Species == "virginica" ])
meanSepalLength <- c( mu.Setosa, mu.Versicolor, mu.Virginica )
meanSepalLength
```

```
## [1] 5.006 5.936 6.588
```

---

# The BarPlot

```r
barplot( meanSepalLength,
         names.arg = c("setosa","versicolor","virginica"),
         xlab="Iris Species", ylab="Mean Sepal Length")
```

<img src="slides_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" />

---

# A Shortcut

```r
mu.Setosa <- mean( iris$Sepal.Length[ iris$Species == "setosa" ])
mu.Versicolor <- mean( iris$Sepal.Length[ iris$Species == "versicolor" ])
mu.Virginica <- mean( iris$Sepal.Length[ iris$Species == "virginica" ])
meanSepalLength <- c( mu.Setosa, mu.Versicolor, mu.Virginica )
meanSepalLength
```

--

```r
meanSepalLength <- by( iris$Sepal.Length, iris$Species, mean )
meanSepalLength
```

```
## iris$Species: setosa
## [1] 5.006
## ------------------------------------------------------------
## iris$Species: versicolor
## [1] 5.936
## ------------------------------------------------------------
## iris$Species: virginica
## [1] 6.588
```

---

```r
barplot( meanSepalLength,
         xlab = "Iris Species", ylab = "Average Sepal Length")
```

<img src="slides_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />

---

# The Boxplot - High Density Information

A boxplot packs a lot of information into a small space and is appropriate when the groupings on the x-axis are categorical. For each category, the graphical representation includes:

- The median value for the raw data.
- A box indicating the area between the first and third quartiles (i.e., the values enclosing the middle 50% of the data, from the 25th to the 75th percentile). The top and bottom are often referred to as the *hinges* of the box.
- A notch (if requested), representing confidence around the estimate of the median.
- Whiskers extending to the most extreme data points that lie within `\(1.5 \times IQR\)` of the box (IQR is the interquartile range).
- Any data points that fall beyond the whiskers are plotted as individual points.

---

```r
boxplot( Sepal.Length ~ Species, data=iris, notch=TRUE, ylab="Sepal Length")
```

<img src="slides_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" />

---
class: sectionTitle

# Overlaying Text

---

# Textifying Your Plot

--

```r
cor <- cor.test( iris$Sepal.Length, iris$Sepal.Width )
cor
```

```
##
##  Pearson's product-moment correlation
##
## data:  iris$Sepal.Length and iris$Sepal.Width
## t = -1.4403, df = 148, p-value = 0.1519
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.27269325  0.04351158
## sample estimates:
##        cor
## -0.1175698
```

--

```r
cor.text <- paste( "r = ", format( cor$estimate, digits=4), "; P = ", format( cor$p.value, digits=4 ), sep="" )
cor.text
```

```
## [1] "r = -0.1176; P = 0.1519"
```

---

```r
plot( Sepal.Width ~ Sepal.Length, data = iris,
      col=colors, pch=20, bty="n",
      xlab="Sepal Length", ylab="Sepal Width")
text( 6.5, 4.2, cor.text, cex=1.2 )
```

<img src="slides_files/figure-html/unnamed-chunk-32-1.png" style="display: block; margin: auto;" />

---
class: middle
background-image: url("images/contour.png")
background-position: right
background-size: auto

.center[

# Questions?

![Peter Sellers](images/peter_sellers.gif)
]

<p> </p>

.bottom[ If you have any questions about the content presented herein, please feel free to drop a comment on the class LMS and I'll get back to you as soon as possible.]