The Title

class: left, bottom
background-image: url("images/contour.png")
background-position: right
background-size: auto

# Numerical Data<br/> & Data Frames

### This is where we leave ![Excel](images/excel.png) behind

---

class: sectionTitle, inverse

# Numerical Data

---

# Numerical Data

In `R`, numerical data is largely represented by a data type called `numeric`.

--
- For most purposes, this is the only data type we will need (though `integer` types and specialized libraries exist).

--
- The range of magnitudes is determined by your computer (my MacBook can handle positive values from 2.225074e-308 up to 1.797693e+308).
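
If you want to check the limits on your own machine, `R` stores them in the built-in `.Machine` list. This is a small sketch; the values you see should match the ones quoted above on typical hardware, but that is an assumption about your computer.

```r
# Smallest positive (normalized) and largest finite double on this machine
.Machine$double.xmin
.Machine$double.xmax

# Going past the largest value overflows to infinity
.Machine$double.xmax * 10
```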

---

# Operators

In many ways, `R` can act just like an interactive calculator.  *Arithmetic operators* work just as you would expect.

```r
x <- 10
y <- 23
x + y
x - y
x * y
x / y
```

---

# Exponential Operators

*Exponents* use the caret on the keyboard (on US English keyboards it is above the 6 key). So the value of `$2^{16}$` is

```r
2^16
```

```
## [1] 65536
```

Roots are found by inverting the exponent.  For example, `$\sqrt[3]{27}$` (the cube root of 27) is

```r
27^(1/3)
```

```
## [1] 3
```

---

# Logarithms

Logarithms are provided by the function `log()`, which defaults to the natural log.

```r
log( 10 )
```

```
## [1] 2.302585
```

You can change the base by passing the optional `base` argument (make sure you separate the value from the optional argument with a comma).

```r
log( 10, base=10 )
```

```
## [1] 1
```

---
# Additional Operators

.center[ *Potential Operations >>> Symbols on Keyboard* ]

*Modulus Operator*

```r
23 %% 10 
```

```
## [1] 3
```
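
The modulus has a companion, the integer division operator `%/%` (it is listed with `%%` in the `?Syntax` table). A quick sketch of how the two fit together:

```r
23 %/% 10   # integer division: how many whole times 10 fits into 23
23 %% 10    # modulus: what is left over

# The two pieces always reassemble the original value
(23 %/% 10) * 10 + (23 %% 10)
```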

---

# Order of Operations

The order of precedence for operations is just like you learned in math class.

```r
x1 <- 23
y1 <- 55
x2 <- 56
y2 <- 63
distance <- sqrt(  (x1-x2)^2 + (y1-y2)^2 )
distance
```

```
## [1] 33.95585
```

---

# `?Syntax`

Operator        |    Description
----------------|-------------------------------------------
:: :::	        | access variables in a namespace
$ @	            | component / slot extraction
[ [[	          | indexing
^	              | exponentiation (right to left)
- +	            |  unary minus and plus
:	              |   sequence operator
%any%	          | special operators (including %% and %/%)
* /	            | multiply, divide
+ -	            | (binary) add, subtract
< > <= >= == !=	| ordering and comparison
!	              | negation
& &&	          | and
&vert; &vert;&vert;	      | or
~	              | as in formulae
-> ->>	        | rightwards assignment
<- <<-	        | assignment (right to left)
=	              | assignment (right to left)
?	              | help (unary and binary)
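
The table runs from highest precedence at the top to lowest at the bottom. A few one-liners, offered as a sketch, show two of the rows in action:

```r
-2^2      # exponent binds tighter than unary minus, so this is -(2^2)
(-2)^2    # parentheses force the minus to apply first
2^3^2     # exponents group right to left: 2^(3^2)
```

When in doubt, add parentheses; they cost nothing and make the intent obvious.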

---
class: sectionTitle

# Introspection & Coercion

---

# Introspection

In `R`, each variable can be queried about its `class` (what kind of data that particular variable holds).

```r
x <- 42
class( x )
```

```
## [1] "numeric"
```

You can also ask if it is a particular type using the `is.numeric()` function.

```r
is.numeric( x )
```

```
## [1] TRUE
```

---

# Coercion

We can also turn *one representation* of our data into a different type, though there are limitations.  For example, if we just read in a text file and a value is represented as text (a [Character Data Type](../character_data/slides.html) in `R`) but we need it to function as a `numeric` type, we can use the following approach.

```r
x <- "42"
class( x )
```

```
## [1] "character"
```

Then create a new variable that (if possible) contains the numeric representation of the character string `"42"`.

```r
y <- as.numeric( x )
class(y)
```

```
## [1] "numeric"
```

---

# Coercion Fail

When it fails, it returns a warning and a missing data value.

```r
as.numeric( "Bob" )
```

```
## Warning: NAs introduced by coercion
```

```
## [1] NA
```

&nbsp;

<div class="box-red">It is acknowledged that many error messages in R may not be "comprehensible" to the user and it is not clear if this is a *feature* or a *bug*.</div>
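
In practice the failure usually hides inside a longer vector. One possible way to track it down, sketched here with made-up values in a hypothetical variable named `raw`, is to coerce everything and then ask which entries came back as `NA`.

```r
raw <- c("42", "Bob", "3.14")               # hypothetical values read from a file
vals <- suppressWarnings(as.numeric(raw))   # silence the warning we already expect
vals
is.na(vals)                                 # TRUE marks the entries that failed
raw[is.na(vals)]                            # the offending original strings
```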

---
class: sectionTitle

# Caveats

---

# Order of Operations

There are times that the order of operations will really come back to .red[bite you].  Consider this example where I create a sequence of numbers using the sequence operator (`:`)

```r
n <- 4
1:n
```

```
## [1] 1 2 3 4
```

So if we wanted to make a sequence from 1 to `$n-1$`, we *could* type this:

```r
1:n-1
```

```
## [1] 0 1 2 3
```
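
What happened?  The sequence operator `:` sits above binary minus in the `?Syntax` table, so `1:n-1` is read as `(1:n) - 1`.  A sketch of two ways to get what was intended:

```r
n <- 4
1:(n - 1)        # parentheses make the subtraction happen first
seq_len(n - 1)   # same result, and safely empty when n is 1
```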

---

To *fix* this, feel free to be *verbose* in your use of parentheses.  If you are intending to get `$10^2$`, `$10^3$` `$\ldots$` `$10^6$` and type it as:

```r
10^2:6
```

```
##  [1] 100  99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83  82
## [20]  81  80  79  78  77  76  75  74  73  72  71  70  69  68  67  66  65  64  63
## [39]  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47  46  45  44
## [58]  43  42  41  40  39  38  37  36  35  34  33  32  31  30  29  28  27  26  25
## [77]  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10   9   8   7   6
```

What you want is:

```r
10^(2:6)
```

```
## [1] 1e+02 1e+03 1e+04 1e+05 1e+06
```

<div class="box-yellow">Notice how the second (and intended) code is actually easier to read than the first.</div>
---

# Numerical Approximations

Computers use binary switches to represent numbers.  For integers, it is great, but for floating point numbers it .red[sucks], big time.

Consider the following:

```r
x <- .1
y <- .3 / 3 
```

But if we ask if they are equal, what do you expect?

```r
x == y
```

```
## [1] FALSE
```

```r
print(x, digits=20)
```

```
## [1] 0.10000000000000000555
```

```r
print(y, digits=20)
```

```
## [1] 0.099999999999999991673
```
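
A common workaround is to compare with a tolerance instead of asking for exact equality.  The sketch below uses base `R`'s `all.equal()`; the hand-written threshold of `1e-9` is an arbitrary choice for illustration.

```r
x <- .1
y <- .3 / 3

isTRUE(all.equal(x, y))     # equal within a small numerical tolerance
abs(x - y) < 1e-9           # the same idea written by hand
```

Wrapping the call in `isTRUE()` matters because `all.equal()` returns a description of the difference, not `FALSE`, when the values differ.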

---

## 15 Minute Activity - Numerical Operations

Create an R script named `numerical_operators.R` in the project folder and answer the following questions.  Copy each of these questions as commented text into your script.

1. Define a variable named `temp` and set it to the temperature of this room.  Did you use degrees Fahrenheit?  Write the code to convert this to Celsius (or the other way around if you started in Celsius).

2. The function `rnorm(500)` will give you 500 random numbers from the normal probability distribution.   Use it to assign these values to a variable named `theData`.  Find the mean, variance, and standard deviation of these data (hint: `mean()`, `var()`, and `sd()` are what you are looking for; use the help function for these to learn more about them).  Also try `summary()`.

3. Consider Dr. Dyer’s need for fresh [charcuterie](https://en.wikipedia.org/wiki/Charcuterie) in his life. Luckily, Richmond has a spectacular butcher in Carytown, [Belmont Butchery](http://belmontbutchery.com/). Below are the coordinates for both Dyer’s office and the purveyor of fine meat products denoted as Meters in Virginia State Plane (4502). Use your old friend, the Pythagorean theorem (shown a few slides ago) to figure out the distance between these two points.  Present your results in miles.

```r
office <- c(3592374.948, 1134930.213)
belmont <- c(3590195.540, 1136003.201)
```

---
class: sectionTitle, inverse

# Data Frames!

![Yes](https://media.giphy.com/media/f6VfCFyOL5KmiICskp/giphy.gif)

---

# Data Frames & Related Materials

> Data frames are a structure that can hold many different data types in one simple structure.

Data frames are the *lingua franca* of `R`, especially once we start getting into more complicated analysis and manipulation.  For simplicity, one can think of a `data.frame` object as being much like a spreadsheet.  Each row represents a record for some object, and each column (which may hold a different kind of data) is a measurement on that object.

---

<div id="htmlwidget-757ae4a5ac65e16878b3" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-757ae4a5ac65e16878b3">{"x":{"filter":"none","vertical":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100","101","102","103","104","105","106","107","108","109","110","111","112","113","114","115","116","117","118","119","120","121","122","123","124","125","126","127","128","129","130","131","132","133","134","135","136","137","138","139","140","141","142","143","144","145","146","147","148","149","150"],[5.1,4.9,4.7,4.6,5,5.4,4.6,5,4.4,4.9,5.4,4.8,4.8,4.3,5.8,5.7,5.4,5.1,5.7,5.1,5.4,5.1,4.6,5.1,4.8,5,5,5.2,5.2,4.7,4.8,5.4,5.2,5.5,4.9,5,5.5,4.9,4.4,5.1,5,4.5,4.4,5,5.1,4.8,5.1,4.6,5.3,5,7,6.4,6.9,5.5,6.5,5.7,6.3,4.9,6.6,5.2,5,5.9,6,6.1,5.6,6.7,5.6,5.8,6.2,5.6,5.9,6.1,6.3,6.1,6.4,6.6,6.8,6.7,6,5.7,5.5,5.5,5.8,6,5.4,6,6.7,6.3,5.6,5.5,5.5,6.1,5.8,5,5.6,5.7,5.7,6.2,5.1,5.7,6.3,5.8,7.1,6.3,6.5,7.6,4.9,7.3,6.7,7.2,6.5,6.4,6.8,5.7,5.8,6.4,6.5,7.7,7.7,6,6.9,5.6,7.7,6.3,6.7,7.2,6.2,6.1,6.4,7.2,7.4,7.9,6.4,6.3,6.1,7.7,6.3,6.4,6,6.9,6.7,6.9,5.8,6.8,6.7,6.7,6.3,6.5,6.2,5.9],[3.5,3,3.2,3.1,3.6,3.9,3.4,3.4,2.9,3.1,3.7,3.4,3,3,4,4.4,3.9,3.5,3.8,3.8,3.4,3.7,3.6,3.3,3.4,3,3.4,3.5,3.4,3.2,3.1,3.4,4.1,4.2,3.1,3.2,3.5,3.6,3,3.4,3.5,2.3,3.2,3.5,3.8,3,3.8,3.2,3.7,3.3,3.2,3.2,3.1,2.3,2.8,2.8,3.3,2.4,2.9,2.7,2,3,2.2,2.9,2.9,3.1,3,2.7,2.2,2.5,3.2,2.8,2.5,2.8,2.9,3,2.8,3,2.9,2.6,2.4,2.4,2.7,2.7,3,3.4,3.1,2.3,3,2.5,2.6,3,2.6,2.3,2.7,3,2.9,2.9,2.5,2.8,3.3,2.7,3,2.9,3,3,2.5,2.9,2.5,3.6,3.2,2.7,3,2.5,2.8,3.2,3,3.8,2.6,2.2,3.2,2.8,2.8,2.7,3.3,3.2,2.8,3,2.8,3,2.8,3.8,2.8,2.8,2.6,3,3.4,3.1,3,3.1,3.1,3
.1,2.7,3.2,3.3,3,2.5,3,3.4,3],[1.4,1.4,1.3,1.5,1.4,1.7,1.4,1.5,1.4,1.5,1.5,1.6,1.4,1.1,1.2,1.5,1.3,1.4,1.7,1.5,1.7,1.5,1,1.7,1.9,1.6,1.6,1.5,1.4,1.6,1.6,1.5,1.5,1.4,1.5,1.2,1.3,1.4,1.3,1.5,1.3,1.3,1.3,1.6,1.9,1.4,1.6,1.4,1.5,1.4,4.7,4.5,4.9,4,4.6,4.5,4.7,3.3,4.6,3.9,3.5,4.2,4,4.7,3.6,4.4,4.5,4.1,4.5,3.9,4.8,4,4.9,4.7,4.3,4.4,4.8,5,4.5,3.5,3.8,3.7,3.9,5.1,4.5,4.5,4.7,4.4,4.1,4,4.4,4.6,4,3.3,4.2,4.2,4.2,4.3,3,4.1,6,5.1,5.9,5.6,5.8,6.6,4.5,6.3,5.8,6.1,5.1,5.3,5.5,5,5.1,5.3,5.5,6.7,6.9,5,5.7,4.9,6.7,4.9,5.7,6,4.8,4.9,5.6,5.8,6.1,6.4,5.6,5.1,5.6,6.1,5.6,5.5,4.8,5.4,5.6,5.1,5.1,5.9,5.7,5.2,5,5.2,5.4,5.1],[0.2,0.2,0.2,0.2,0.2,0.4,0.3,0.2,0.2,0.1,0.2,0.2,0.1,0.1,0.2,0.4,0.4,0.3,0.3,0.3,0.2,0.4,0.2,0.5,0.2,0.2,0.4,0.2,0.2,0.2,0.2,0.4,0.1,0.2,0.2,0.2,0.2,0.1,0.2,0.2,0.3,0.3,0.2,0.6,0.4,0.3,0.2,0.2,0.2,0.2,1.4,1.5,1.5,1.3,1.5,1.3,1.6,1,1.3,1.4,1,1.5,1,1.4,1.3,1.4,1.5,1,1.5,1.1,1.8,1.3,1.5,1.2,1.3,1.4,1.4,1.7,1.5,1,1.1,1,1.2,1.6,1.5,1.6,1.5,1.3,1.3,1.3,1.2,1.4,1.2,1,1.3,1.2,1.3,1.3,1.1,1.3,2.5,1.9,2.1,1.8,2.2,2.1,1.7,1.8,1.8,2.5,2,1.9,2.1,2,2.4,2.3,1.8,2.2,2.3,1.5,2.3,2,2,1.8,2.1,1.8,1.8,1.8,2.1,1.6,1.9,2,2.2,1.5,1.4,2.3,2.4,1.8,1.8,2.1,2.4,2.3,1.9,2.3,2.5,2.3,1.9,2,2.3,1.8],["setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","setosa","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","ve
rsicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","versicolor","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica","virginica"]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>Sepal.Length<\/th>\n      <th>Sepal.Width<\/th>\n      <th>Petal.Length<\/th>\n      <th>Petal.Width<\/th>\n      <th>Species<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":[1,2,3,4]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

---

# Introspection

```r
class( iris )
```

```
## [1] "data.frame"
```

```r
dim( iris )
```

```
## [1] 150   5
```

```r
summary( iris )
```

```
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
```

---

# Properties of Data Frame Objects

```r
nrow(iris)
```

```
## [1] 150
```

```r
ncol(iris)
```

```
## [1] 5
```

```r
names(iris)
```

```
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
```

---

# Accessing Internal Elements

Accessing elements within a `data.frame` can be done by grid position (row,col) or by column entry.  Here is an example showing the third entry in the `Petal.Width` column using numerical coordinates:

```r
iris[3,4]
```

```
## [1] 0.2
```

And here is the same value using the column name:

```r
iris$Petal.Width[3]
```

```
## [1] 0.2
```

### ☝️ column notation much more readable
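
The two styles can also be mixed.  As a small sketch, the same value can be reached by giving the row as a number and the column as a name, or by pulling the column out with double brackets:

```r
iris[3, "Petal.Width"]        # row by number, column by name
iris[["Petal.Width"]][3]      # pull the column out as a vector, then index it
```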

---

# Creating Raw Data Frames

Data frames can hold different kinds of data in a grid-like format.  *Rows* are records for observations and *columns* represent individual measurements on each object.

```r
site <- c( "Const","ESan", "Aqu")
longitude <- c( -111.675, -110.3686, -110.1043)
latitude <- c(25.0247, 24.45879, 23.2855)
```

```r
sites <- data.frame( Site = site,
                     Longitude = longitude,
                     Latitude = latitude )
class( sites )
```

```
## [1] "data.frame"
```

```r
dim( sites ) 
```

```
## [1] 3 3
```

```r
names( sites ) # shorthand for colnames
```

```
## [1] "Site"      "Longitude" "Latitude"
```

---

# Viewing Data Frame Objects

If the data are small enough, we can look at all of them by printing out the elements.  It is also possible to have each column of data summarize itself.

.pull-left[

```r
sites
```

```
##    Site Longitude Latitude
## 1 Const -111.6750 25.02470
## 2  ESan -110.3686 24.45879
## 3   Aqu -110.1043 23.28550
```

`RStudio` has a built-in spreadsheet if you need to make quick observations or edits

```r
View(sites)
```

]

.pull-right[

```r
summary( sites )
```

```
##      Site             Longitude         Latitude    
##  Length:3           Min.   :-111.7   Min.   :23.29  
##  Class :character   1st Qu.:-111.0   1st Qu.:23.87  
##  Mode  :character   Median :-110.4   Median :24.46  
##                     Mean   :-110.7   Mean   :24.26  
##                     3rd Qu.:-110.2   3rd Qu.:24.74  
##                     Max.   :-110.1   Max.   :25.02
```

]

---

# Tibbles

The `tidyverse` extends the `data.frame` with a type called a `tibble`, which gives it more functionality.  This happens *largely behind the scenes*, because most functions from the `tidyverse` do the conversion automatically.

```r
library( tidyverse )
```

```
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
```

```
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.1     ✓ forcats 0.5.1
```

```
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
```
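
If you ever need to do the conversion yourself, `as_tibble()` from the `tibble` package handles it.  A minimal sketch using the built-in `iris` data (the name `iris_tbl` is just for illustration):

```r
library(tibble)

iris_tbl <- as_tibble(iris)   # convert a plain data.frame
class(iris_tbl)               # tibble classes, with data.frame still at the end
is.data.frame(iris_tbl)       # so anything expecting a data.frame still works
```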

---

# Loading Data

We can load data from local files (on your computer), from databases (local or external), or from any location we can reach through a fully qualified address (e.g., a URL).

```r
url <- "https://raw.githubusercontent.com/dyerlab/ENVS-Lectures/master/data/arapat.csv"
```

```r
samples <- read_csv( url )
```

```
## Rows: 39 Columns: 3
```

```
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Stratum
## dbl (2): Longitude, Latitude
```

```
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

---

# Showing the Data

```r
summary( samples )
```

```
##    Stratum            Longitude         Latitude    
##  Length:39          Min.   :-114.3   Min.   :23.08  
##  Class :character   1st Qu.:-112.9   1st Qu.:24.52  
##  Mode  :character   Median :-111.5   Median :26.21  
##                     Mean   :-111.7   Mean   :26.14  
##                     3rd Qu.:-110.4   3rd Qu.:27.47  
##                     Max.   :-109.1   Max.   :29.33
```

Since `read_csv()` itself produces a tibble as output (as do most `tidyverse` functions that return tabular data), there is no need to convert it from a vanilla `data.frame`.

```r
class( samples )
```

```
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
```

---

# Sizes of Data Objects

Both `data.frame` and `tibble` objects have a number of rows and columns that make up their dimensions.

```r
nrow( samples )
```

```
## [1] 39
```

```r
ncol( samples )
```

```
## [1] 3
```

```r
dim( samples )
```

```
## [1] 39  3
```

```r
names( samples )
```

```
## [1] "Stratum"   "Longitude" "Latitude"
```

---

.pull-left[
# Visualizing Data

One of the first things I like to do is to look at the data that are being imported and see if there are any obvious problems.  These data have spatial coordinates for sites, so here I'll map them interactively (we'll get to this tomorrow).
]

.pull-right[

&nbsp;

<div id="htmlwidget-920409ae14da2732203f" style="width:504px;height:504px;" class="leaflet html-widget"></div>
<script type="application/json" data-for="htmlwidget-920409ae14da2732203f">{"x":{"options":{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}}},"calls":[{"method":"addProviderTiles","args":["OpenTopoMap",null,null,{"errorTileUrl":"","noWrap":false,"detectRetina":false}]},{"method":"addMarkers","args":[[29.32541,29.01457,28.96651,28.72796,28.66056,28.40846,28.22308,28.03661,27.52944,27.3632,27.40498,27.2028,27.18232,27.0367,26.94589,26.24905,26.20876,26.0155,25.91409,25.60521,25.55757,25.34819,25.0247,24.87611,24.74642,24.58843,24.2115,24.45879,24.13389,24.21441,24.0438,24.0195,24.00789,23.2855,23.08984,23.0757,27.90509,26.63783,26.38014],[-114.2935,-113.9449,-113.6679,-113.4897,-113.9914,-112.8698,-113.1826,-113.3999,-113.3161,-112.964,-112.5296,-112.408,-112.6655,-112.986,-112.0461,-112.4095,-111.3783,-111.3547,-112.0806,-111.3264,-111.2156,-111.6006,-111.675,-110.6917,-111.5441,-110.746,-110.951,-110.3686,-110.4624,-110.2725,-109.989,-110.096,-109.8507,-110.1043,-110.1091,-109.6487,-110.5744,-109.327,-109.1263],null,null,null,{"interactive":true,"draggable":false,"keyboard":true,"title":"","alt":"","zIndexOffset":0,"opacity":1,"riseOnHover":false,"riseOffset":250},["88","9","84","175","177","173","171","89","159","SFr","160","162","12","161","93","165","169","58","166","64","168","51","Const","77","164","75","163","ESan","153","48","156","157","73","Aqu","Mat","98","101","32","102"],null,null,null,null,{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]}],"limits":{"lat":[23.0757,29.32541],"lng":[-114.2935,-109.1263]}},"evals":[],"jsHooks":[]}</script>
]

---

# Small Items - Skipping Metadata

Sometimes there are metadata rows at the top that must be skipped.  Imagine a data file with the following contents (not too uncommon among people who harbor a mild grudge against most data analysts...).

```
Collected on 7 September 2021
By RJ Dyer
Site, Longitude , Latitude
Const, -111.6750,  25.02470
ESan,  -110.3686,  24.45879
Aqu,   -110.1043,  23.28550
```

---

```r
read_csv( "Collected on 7 September 2021
By RJ Dyer
Site, Longitude , Latitude
Const, -111.6750,  25.02470
ESan,  -110.3686,  24.45879
Aqu,   -110.1043,  23.28550" , skip=2)
```

```
## Rows: 3 Columns: 3
```

```
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Site
## dbl (2): Longitude, Latitude
```

```
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```
## # A tibble: 3 × 3
##   Site  Longitude Latitude
##   <chr>     <dbl>    <dbl>
## 1 Const     -112.     25.0
## 2 ESan      -110.     24.5
## 3 Aqu       -110.     23.3
```
(Note: I'm just passing a multiline string to the `read_csv` function.)
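
Recent versions of `readr` (2.0 and later, which matches the version loaded in these slides) treat a string containing a newline as literal data.  To be explicit about it, you can wrap the text in `I()`.  A small sketch, where `small` is a hypothetical variable name and `show_col_types = FALSE` just quiets the column message:

```r
library(readr)

# I() marks the string as literal data rather than a file path
small <- read_csv( I("Site,Longitude,Latitude\nConst,-111.6750,25.02470\nESan,-110.3686,24.45879"),
                   show_col_types = FALSE )
small
```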

---

# No Column Names 🛑

```r
read_csv( "Const, -111.6750,  25.02470
 ESan,  -110.3686,  24.45879
 Aqu,   -110.1043,  23.28550")
```

```
## Rows: 2 Columns: 3
```

```
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Const
## dbl (2): -111.6750, 25.02470
```

```
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```
## # A tibble: 2 × 3
##   Const `-111.6750` `25.02470`
##   <chr>       <dbl>      <dbl>
## 1 ESan        -110.       24.5
## 2 Aqu         -110.       23.3
```

---

# No Column Names 👍🏾

```r
read_csv( "Const, -111.6750,  25.02470
 ESan,  -110.3686,  24.45879
 Aqu,   -110.1043,  23.28550" , col_names = FALSE)
```

```
## Rows: 3 Columns: 3
```

```
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): X1
## dbl (2): X2, X3
```

```
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```
## # A tibble: 3 × 3
##   X1       X2    X3
##   <chr> <dbl> <dbl>
## 1 Const -112.  25.0
## 2 ESan  -110.  24.5
## 3 Aqu   -110.  23.3
```

---

# Adding or Overriding Names

```r
read_csv( "Const, -111.6750,  25.02470
 ESan,  -110.3686,  24.45879
 Aqu,   -110.1043,  23.28550", col_names = c("Site","Longitude","Latitude") )
```

```
## Rows: 3 Columns: 3
```

```
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```
## # A tibble: 3 × 3
##   Site  Longitude Latitude
##   <chr>     <dbl>    <dbl>
## 1 Const     -112.     25.0
## 2 ESan      -110.     24.5
## 3 Aqu       -110.     23.3
```

---

# Missing Data

```r
read_csv( "Site, Longitude , Latitude
 Const, ,  25.02470
 ESan,  -110.3686,  
 Aqu,   -110.1043,  23.28550") -> df
```

```
## Rows: 3 Columns: 3
```

```
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```r
df
```

```
## # A tibble: 3 × 3
##   Site  Longitude Latitude
##   <chr>     <dbl>    <dbl>
## 1 Const       NA      25.0
## 2 ESan      -110.     NA  
## 3 Aqu       -110.     23.3
```

`NA` is a valid value for any data type!

---

# Dealing with `NA` values

The absence of data, `NA`, is important and `R` makes a big deal about warning you when you have missing data so you do not make improper inferences.

```r
df$Longitude 
```

```
## [1]        NA -110.3686 -110.1043
```

```r
mean( df$Longitude )
```

```
## [1] NA
```

**By default**, `R` does not *assume* that you want to ignore the missing data; you **must** tell it to do so each time.

```r
mean( df$Longitude, na.rm=TRUE )
```

```
## [1] -110.2364
```

---

# Non-Traditional Missing Data

```r
read_csv( "Site, Longitude , Latitude
 Const, -9 ,  25.02470
 ESan,  -110.3686,  -9 
 Aqu,   -110.1043,  23.28550", na="-9")
```

```
## Rows: 3 Columns: 3
```

```
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```
## # A tibble: 3 × 3
##   Site  Longitude Latitude
##   <chr>     <dbl>    <dbl>
## 1 Const       NA      25.0
## 2 ESan      -110.     NA  
## 3 Aqu       -110.     23.3
```

---

# Slicing and Dicing

When we take a slice of data from a `tibble` (or `data.frame`), how we ask for it may determine the nature of what is returned to us.

```r
summary( samples )
```

```r
class( samples$Stratum )
```

```
## [1] "character"
```

```r
class( samples[,1])
```

```
## [1] "tbl_df"     "tbl"        "data.frame"
```
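
A plain `data.frame` behaves differently from a tibble here, which is worth knowing if you mix the two.  The sketch below builds a small hypothetical `sites_df` and compares three ways of asking for the first column:

```r
sites_df <- data.frame( Site = c("Const", "ESan", "Aqu"),
                        Longitude = c(-111.675, -110.3686, -110.1043),
                        stringsAsFactors = FALSE )

class( sites_df[, 1] )                 # a plain data.frame drops to a vector
class( sites_df[, 1, drop = FALSE] )   # drop = FALSE keeps it a data.frame
class( sites_df[[1]] )                 # double brackets give the column itself
```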

---

# Subsets by Position

```r
samples[1:10, ]
```

```
## # A tibble: 10 × 3
##    Stratum Longitude Latitude
##    <chr>       <dbl>    <dbl>
##  1 88          -114.     29.3
##  2 9           -114.     29.0
##  3 84          -114.     29.0
##  4 175         -113.     28.7
##  5 177         -114.     28.7
##  6 173         -113.     28.4
##  7 171         -113.     28.2
##  8 89          -113.     28.0
##  9 159         -113.     27.5
## 10 SFr         -113.     27.4
```

---

# Filtering By Logic

So far, we've used the actual row numbers to grab data from the tibble.  We can also use logic based upon data within the table itself.

Remember that a relational operator returns `TRUE` or `FALSE`, and we can use that to filter the whole thing.  Here is how we'd find all data where the longitude is greater than -110.

```r
samples[ samples$Longitude > -110,]
```

```
## # A tibble: 5 × 3
##   Stratum Longitude Latitude
##   <chr>       <dbl>    <dbl>
## 1 156         -110.     24.0
## 2 73          -110.     24.0
## 3 98          -110.     23.1
## 4 32          -109.     26.6
## 5 102         -109.     26.4
```

Notice that the column designation is left blank, so we get all of the columns.
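
Conditions can be combined with `&` (and) or `|` (or).  As a sketch, here is the small three-site data frame from earlier, retyped so the example stands alone, filtered on both coordinates at once; the cutoffs are arbitrary.

```r
sites <- data.frame( Site = c("Const", "ESan", "Aqu"),
                     Longitude = c(-111.675, -110.3686, -110.1043),
                     Latitude = c(25.0247, 24.45879, 23.2855) )

keep <- sites$Longitude > -111 & sites$Latitude > 24   # one TRUE/FALSE per row
keep
sites[ keep, ]
```

Storing the logical vector in its own variable (here the hypothetical `keep`) makes it easy to inspect before using it.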

---

# Filtering Individual Data

We can use some of the fancy string stuff we learned previously to pull out only the names of the sites that match a certain regular expression (here they must start with either `C`, `E`, or `S`).  Using the `$` notation returns the results as a vector.

```r
samples$Stratum[ str_detect( samples$Stratum, "^[CES]") ]
```

```
## [1] "SFr"   "Const" "ESan"
```

But using square bracket notation (giving the rows and indicating numerically which column) returns the result as a tibble.

```r
samples[str_detect( samples$Stratum, "^[CES]"),1]
```

```
## # A tibble: 3 × 1
##   Stratum
##   <chr>  
## 1 SFr    
## 2 Const  
## 3 ESan
```

---

# Adding New Data Columns

Adding a new column this way appends it onto the right side of the tibble.

```r
samples$ID <- 1:39
samples
```

```
## # A tibble: 39 × 4
##    Stratum Longitude Latitude    ID
##    <chr>       <dbl>    <dbl> <int>
##  1 88          -114.     29.3     1
##  2 9           -114.     29.0     2
##  3 84          -114.     29.0     3
##  4 175         -113.     28.7     4
##  5 177         -114.     28.7     5
##  6 173         -113.     28.4     6
##  7 171         -113.     28.2     7
##  8 89          -113.     28.0     8
##  9 159         -113.     27.5     9
## 10 SFr         -113.     27.4    10
## # … with 29 more rows
```
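
Typing the row count by hand works until the data change size.  One alternative, sketched on a small stand-in data frame, is to let `R` count the rows for you:

```r
sites <- data.frame( Site = c("Const", "ESan", "Aqu"),
                     Longitude = c(-111.675, -110.3686, -110.1043) )

sites$ID <- seq_len( nrow(sites) )   # 1, 2, ..., number of rows
sites
```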

---

# Changing Individual Values

.pull-left[
By column variable name

```r
samples$ID[2] <- 42
samples
```

```
## # A tibble: 39 × 4
##    Stratum Longitude Latitude    ID
##    <chr>       <dbl>    <dbl> <dbl>
##  1 88          -114.     29.3     1
##  2 9           -114.     29.0    42
##  3 84          -114.     29.0     3
##  4 175         -113.     28.7     4
##  5 177         -114.     28.7     5
##  6 173         -113.     28.4     6
##  7 171         -113.     28.2     7
##  8 89          -113.     28.0     8
##  9 159         -113.     27.5     9
## 10 SFr         -113.     27.4    10
## # … with 29 more rows
```

]

.pull-right[
By index coordinate.

```r
samples[2,4] <- 24
samples
```

```
## # A tibble: 39 × 4
##    Stratum Longitude Latitude    ID
##    <chr>       <dbl>    <dbl> <dbl>
##  1 88          -114.     29.3     1
##  2 9           -114.     29.0    24
##  3 84          -114.     29.0     3
##  4 175         -113.     28.7     4
##  5 177         -114.     28.7     5
##  6 173         -113.     28.4     6
##  7 171         -113.     28.2     7
##  8 89          -113.     28.0     8
##  9 159         -113.     27.5     9
## 10 SFr         -113.     27.4    10
## # … with 29 more rows
```

]

---

# Forced Coercion

```r
samples$ID[2] <- "Bob"
samples
```

```
## # A tibble: 39 × 4
##    Stratum Longitude Latitude ID   
##    <chr>       <dbl>    <dbl> <chr>
##  1 88          -114.     29.3 1    
##  2 9           -114.     29.0 Bob  
##  3 84          -114.     29.0 3    
##  4 175         -113.     28.7 4    
##  5 177         -114.     28.7 5    
##  6 173         -113.     28.4 6    
##  7 171         -113.     28.2 7    
##  8 89          -113.     28.0 8    
##  9 159         -113.     27.5 9    
## 10 SFr         -113.     27.4 10   
## # … with 29 more rows
```

---

# Deleting Content

.pull-left[
Individual values in a column can be deleted by assigning them `NA`, a missing value.  If you assign a single `NA` to the whole column, the *Recycle Rule* we saw earlier will repeat the `NA` throughout the whole column.

```r
samples$ID <- NA 
samples 
```

```
## # A tibble: 39 × 4
##    Stratum Longitude Latitude ID   
##    <chr>       <dbl>    <dbl> <lgl>
##  1 88          -114.     29.3 NA   
##  2 9           -114.     29.0 NA   
##  3 84          -114.     29.0 NA   
##  4 175         -113.     28.7 NA   
##  5 177         -114.     28.7 NA   
##  6 173         -113.     28.4 NA   
##  7 171         -113.     28.2 NA   
##  8 89          -113.     28.0 NA   
##  9 159         -113.     27.5 NA   
## 10 SFr         -113.     27.4 NA   
## # … with 29 more rows
```
]

.pull-right[

To delete the column entirely, instead of just setting all of its elements to missing, set the whole column equal to `NULL`.

```r
samples$ID <- NULL 
samples 
```

```
## # A tibble: 39 × 3
##    Stratum Longitude Latitude
##    <chr>       <dbl>    <dbl>
##  1 88          -114.     29.3
##  2 9           -114.     29.0
##  3 84          -114.     29.0
##  4 175         -113.     28.7
##  5 177         -114.     28.7
##  6 173         -113.     28.4
##  7 171         -113.     28.2
##  8 89          -113.     28.0
##  9 159         -113.     27.5
## 10 SFr         -113.     27.4
## # … with 29 more rows
```
]

---

# Adding Rows of Content

To add additional rows of content, we need to put the new data into their own `data.frame` or `tibble`.

```r
tibble( 
  Stratum = c("Los Barriles","Comondu"),
  Longitude = c(-109.7026, -111.8442),
  Latitude = c(23.6811, 26.0708) 
) -> newSites
newSites
```

```
## # A tibble: 2 × 3
##   Stratum      Longitude Latitude
##   <chr>            <dbl>    <dbl>
## 1 Los Barriles     -110.     23.7
## 2 Comondu          -112.     26.1
```

---

# Adding Rows of Content

And then bind it onto the existing samples with `rbind()`.

```r
samples <- rbind( samples, newSites)

tail( samples )
```

```
## # A tibble: 6 × 3
##   Stratum      Longitude Latitude
##   <chr>            <dbl>    <dbl>
## 1 98               -110.     23.1
## 2 101              -111.     27.9
## 3 32               -109.     26.6
## 4 102              -109.     26.4
## 5 Los Barriles     -110.     23.7
## 6 Comondu          -112.     26.1
```

---

# Deleting Rows

To delete rows, you use negative row indices.

```r
dim(samples)
```

```
## [1] 41  3
```

```r
samples <- samples[-41:-39,]
dim(samples)
```

```
## [1] 38  3
```

Notice: For all of this "add on" and "delete" stuff, if we want it to **persist** we .red[must] reassign the values back onto the original variable.
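
A note on the index itself: unary minus binds tighter than `:`, so `-41:-39` expands to `-41, -40, -39` and three rows are dropped.  The sketch below shows the same idea on a plain vector, along with a spelling that some find easier to read.

```r
x <- 1:41
length( x[-41:-39] )                      # drops positions 39, 40, and 41
identical( x[-41:-39], x[-(39:41)] )      # same result with explicit parentheses
```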

---

# Real Names

While not quite critical here, we often need more descriptive names for our data columns, some of which need spaces to be fully descriptive.  One of the last benefits of a `tibble` I'll discuss here is that it allows for spaces in the names of data columns.

```r
names(samples)
```

```
## [1] "Stratum"   "Longitude" "Latitude"
```

```r
names( samples )[1] <- "Population Name"
samples 
```

```
## # A tibble: 38 × 3
##    `Population Name` Longitude Latitude
##    <chr>                 <dbl>    <dbl>
##  1 88                    -114.     29.3
##  2 9                     -114.     29.0
##  3 84                    -114.     29.0
##  4 175                   -113.     28.7
##  5 177                   -114.     28.7
##  6 173                   -113.     28.4
##  7 171                   -113.     28.2
##  8 89                    -113.     28.0
##  9 159                   -113.     27.5
## 10 SFr                   -113.     27.4
## # … with 28 more rows
```

---

# Accessing Spaced Out Columns

`RStudio` will properly autoinsert valid column names for you if you hit the tab key.  However, if you are typing the name manually, surround it in backticks (the backtick is the character in the upper left corner of your keyboard).

```r
samples$`Population Name`
```

```
##  [1] "88"    "9"     "84"    "175"   "177"   "173"   "171"   "89"    "159"  
## [10] "SFr"   "160"   "162"   "12"    "161"   "93"    "165"   "169"   "58"   
## [19] "166"   "64"    "168"   "51"    "Const" "77"    "164"   "75"    "163"  
## [28] "ESan"  "153"   "48"    "156"   "157"   "73"    "Aqu"   "Mat"   "98"   
## [37] "101"   "32"
```
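
Backticks are needed with `$`, but double brackets accept an ordinary quoted string.  A sketch on a small hypothetical data frame named `pops` (the `check.names = FALSE` argument asks a plain `data.frame` to keep the space in the name):

```r
pops <- data.frame( `Population Name` = c("Const", "ESan"),
                    Latitude = c(25.0247, 24.45879),
                    check.names = FALSE )   # keep the space in the name

pops$`Population Name`        # backticks with $
pops[["Population Name"]]     # an ordinary quoted string with [[ ]]
```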

---

## BIG ACTIVITY - Your DATA!

Create a new R script called `data_frames.R` in the project folder.

1. At the top of the file, insert the line `library(tidyverse)` so that the script will load in the proper libraries.

2. Use the function `read_csv()` to load in the file as a variable named something that is meaningful for you.  The file *should be* in the same folder as the script (if you followed the instructions from the very first lecture), so you only have to give the name of the csv file (in quotes, since it is a character value).

3. What are the names of the columns of data in this object? What do they actually measure?

4. How many measurements are taken at each record and how many records are present?

5. How deep was the deepest measurement? What was the coldest temperature?

6. Assign the depth measurement to a variable named `depth` and do the same for the `Do_Optical` data (but of course name it something else appropriate).  What are the mean values for each of these variables?

7. **BONUS** The function `plot(x,y)` will make a quick scatter plot of two variables with the variables in `x` on the (wait for it) x-axis and those in the variable `y` on the y-axis.  Use the data from the last question to make the magnificent display of Do as a function of depth.

---

class: middle
background-image: url("images/contour.png")
background-position: right
background-size: auto

.center[

![## Any Questions](https://media.giphy.com/media/G0vYU697uKl0IiIJO2/giphy.gif)

&nbsp;

## Ask away!
]