Vectors & Data Frames

Vectors and data frames are the foundation of data analysis in R. Nearly everything we work with will be stored in one of these container types, so it is important to understand them well and to become comfortable accessing and setting the data they hold.

In this and most of the following homework and presentations, I will use the generic term “data frame” to mean a set of data that has several records and individual measurements on each record. These will typically be tibble objects rather than the older data.frame ones unless otherwise stated. As such, we need to load tidyverse at the beginning.

library( tidyverse )
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5     ✓ purrr   0.3.4
✓ tibble  3.1.4     ✓ dplyr   1.0.7
✓ tidyr   1.1.3     ✓ stringr 1.4.0
✓ readr   2.0.1     ✓ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
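
To make the tibble versus data.frame distinction concrete, here is a minimal sketch using made-up values (the names `df` and `tb` are placeholders, not part of the assignment). It shows that a tibble is still a data.frame underneath, but single-bracket column extraction behaves differently.

```r
library(tidyverse)

# A small, made-up data.frame and its tibble equivalent
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb <- as_tibble(df)

class(df)   # "data.frame"
class(tb)   # "tbl_df" "tbl" "data.frame"

# Extracting one column with [ , ]:
# a data.frame drops to a plain vector, a tibble stays a tibble
is.data.frame(df[, "x"])  # FALSE
is_tibble(tb[, "x"])      # TRUE
```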

Creating

In the chunk below, create three vectors of data: one for names, one for age, and another for grade.

# vectors of different types

Now, create a tibble from these vectors.

# tibble
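
As a rough illustration of the two steps above, the sketch below builds three vectors of different types and combines them into a tibble. All values and the object name `students` are invented for the example; your own data will differ.

```r
library(tidyverse)

# Hypothetical, made-up values: one vector per column, each a different type
name  <- c("Alice", "Bob", "Carla")    # character
age   <- c(23, 31, 27)                 # numeric
grade <- factor(c("A", "B", "A"))      # factor

# tibble() combines equal-length vectors as named columns
students <- tibble(Name = name, Age = age, Grade = grade)
students
```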

Add a new column of data to this data frame (you can make it up, which is what I did with the data above…).

## Add new column

Add a new row of data to the data.frame then summarize it.

## Add new row
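
One possible way to add a column and a row is sketched below. It assumes a small made-up tibble named `students` (a placeholder, rebuilt here so the example runs on its own) and uses `add_row()` from the tibble package; other approaches, such as `mutate()` or `bind_rows()`, work as well.

```r
library(tidyverse)

# Made-up starting data so the example is self-contained
students <- tibble(
  Name  = c("Alice", "Bob", "Carla"),
  Age   = c(23, 31, 27),
  Grade = c("A", "B", "A")
)

# New column: assign a vector with one value per existing row
students$Major <- c("Biology", "Statistics", "Ecology")

# New row: add_row() takes column = value pairs
students <- add_row(students,
                    Name = "Dev", Age = 29, Grade = "B", Major = "Geography")

summary(students)
```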

Manipulating Data

In reality, we spend very little time working with data as small as this, so let’s jump into a slightly larger data set. There is a built-in data set measuring air quality in New York entitled airquality (I know, tricky right?). It is available on every stock installation and this is what it looks like.

summary( airquality )
     Ozone           Solar.R           Wind             Temp      
 Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00  
 1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00  
 Median : 31.50   Median :205.0   Median : 9.700   Median :79.00  
 Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88  
 3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00  
 Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00  
 NA's   :37       NA's   :7                                       
     Month            Day      
 Min.   :5.000   Min.   : 1.0  
 1st Qu.:6.000   1st Qu.: 8.0  
 Median :7.000   Median :16.0  
 Mean   :6.993   Mean   :15.8  
 3rd Qu.:8.000   3rd Qu.:23.0  
 Max.   :9.000   Max.   :31.0  
                               

Let’s make a copy of these built-in data and turn them into a tibble.

data <- as_tibble( airquality )
summary( data  )
     Ozone           Solar.R           Wind             Temp      
 Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00  
 1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00  
 Median : 31.50   Median :205.0   Median : 9.700   Median :79.00  
 Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88  
 3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00  
 Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00  
 NA's   :37       NA's   :7                                       
     Month            Day      
 Min.   :5.000   Min.   : 1.0  
 1st Qu.:6.000   1st Qu.: 8.0  
 Median :7.000   Median :16.0  
 Mean   :6.993   Mean   :15.8  
 3rd Qu.:8.000   3rd Qu.:23.0  
 Max.   :9.000   Max.   :31.0  
                               

Manipulate the data frame in the following ways:

  1. These data were collected in 1973. Create a new column of data that represents a textual version of the date (month, day, and year) then drop (delete) the columns Month and Day.
## New Column for compound dates
data
  2. Convert the temperature from °F to °C.
# Convert F -> C 
  3. Change the name of the Temp column to Temperature °C and the name of the Solar.R column to Solar Radiation, to practice what is known as ‘Literate Programming’.
# set proper names for columns 

And show your creation by using head() to reveal the first 6 rows.

### Show the first few rows
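
The three manipulations above can be chained together in one pipeline. The following is only one possible approach: the column name `Date` and the "Month day, year" text format are my own choices, and the example rebuilds `data` from `airquality` so that it runs on its own.

```r
library(tidyverse)

data <- as_tibble(airquality)

data <- data %>%
  mutate(
    # Textual date, e.g. "May 1, 1973"; month.name is a built-in constant
    Date = paste0(month.name[Month], " ", Day, ", ", 1973),
    # Fahrenheit to Celsius
    Temp = (Temp - 32) * 5 / 9
  ) %>%
  select(-Month, -Day) %>%                    # drop the original date columns
  rename(`Temperature °C` = Temp,             # backticks allow spaces and symbols
         `Solar Radiation` = Solar.R)

head(data)
```

Column names that contain spaces or symbols must be wrapped in backticks whenever they are used later, for example ``data$`Solar Radiation` ``.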

Extracting Data Questions

OK, now that we have some data to work with, use that data frame to extract the following information and answer the following questions.

  1. What were the hottest and coldest dates recorded in this data set?

  2. How many of the days in the data recorded a wind speed higher than the average?

  3. How many rows of data are there with no missing values for any of the recorded observations?

  4. On what days was the solar radiation greater than 300 Langleys in Central Park?
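
The questions above all come down to logical filtering and counting. The sketch below shows one way to set up each query; it works on the original `airquality` column names (Temp, Wind, Solar.R, Month, Day) so that it runs on its own, and the object names `extremes`, `n_windy`, `n_complete`, and `sunny` are placeholders. Adapt the column names if you have already renamed them.

```r
library(tidyverse)

data <- as_tibble(airquality)

# 1. Rows holding the highest and the lowest temperature
extremes <- data %>%
  filter(Temp == max(Temp) | Temp == min(Temp))

# 2. Number of days windier than the mean wind speed
n_windy <- sum(data$Wind > mean(data$Wind))

# 3. Number of rows with no missing value in any column
n_complete <- sum(complete.cases(data))

# 4. Days with solar radiation above 300; filter() drops rows where
#    the comparison is NA, so missing readings are excluded
sunny <- data %>%
  filter(Solar.R > 300) %>%
  select(Month, Day, Solar.R)

extremes
n_windy
n_complete
sunny
```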
