This homework is going to play around with relational joins. We will use some flight data from 2013 in New York City1.

library( nycflights13 )

This library has five different data frames. Here is an overview of the raw data.

The Data

A table of airline carrier abbreviations (N = 16 different carriers).

head( airlines )

Information on the airports (N = 1458 airports).

head( airports )

Mostly in the US.

Weather conditions at each airport (a total of 26115 of them!)

head(weather)

And a list of airplanes (N = 3322).

head( planes )

From all of these auxiliary data, we can construct a series of flights which has the following variables in it.

names( flights )
 [1] "year"           "month"          "day"            "dep_time"      
 [5] "sched_dep_time" "dep_delay"      "arr_time"       "sched_arr_time"
 [9] "arr_delay"      "carrier"        "flight"         "tailnum"       
[13] "origin"         "dest"           "air_time"       "distance"      
[17] "hour"           "minute"         "time_hour"     
year

month

day

dep_time

sched_dep_time

dep_delay

arr_time

sched_arr_time

arr_delay

carrier

flight

tailnum

origin

dest

air_time

distance

hour

minute

time_hour

If you look at the flight table, you will see that several columns of data here are connected to the other data frames described above.

flights Foreign Key Data From
carrier airlines
dest airports
origin airport
tailnum planes
time_hour weather

The reason that these tables have keys connecting one to the other is that there is an economy of scale for very large data sets if we partition it into realtional components such as this.

Questions

Use the tables above to answer the following questions.

  1. Which carriers have planes with over 100 flights in 2013 (and how many)?
  1. During what 48 hour period were the worst delays? Is there any connection with weather?
  1. What are the five best and five worst airlines for keeping on schedule?

  2. Independent of the length of the flight, are you more likely to be delayed by taking a large plane or a small plane?

  3. Do all planes have tail numbers?


  1. If you get an error when attempting to load this library in, install it with install.packages("nycflights13") and then load it in again.↩︎

LS0tCnRpdGxlOiAiUmVsYXRpb25hbCBEYXRhIEhvbWV3b3JrIgphdXRob3I6IFlPVVJfTkFNRV9IRVJFCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCgpgYGB7ciBzZXR1cCwgaW5jbHVkZT1GQUxTRX0KbGlicmFyeSggdGlkeXZlcnNlICkKbGlicmFyeSgga25pdHIgKQprbml0cjo6b3B0c19jaHVuayRzZXQoIHdhcm5pbmcgPSBGQUxTRSwgCiAgICAgICAgICAgICAgICAgICAgICAgZXJyb3IgPSBGQUxTRSwKICAgICAgICAgICAgICAgICAgICAgICBmaWcucmV0aW5hID0gMykKYGBgCgpUaGlzIGhvbWV3b3JrIGlzIGdvaW5nIHRvIHBsYXkgYXJvdW5kIHdpdGggcmVsYXRpb25hbCBqb2lucy4gIFdlIHdpbGwgdXNlIHNvbWUgZmxpZ2h0IGRhdGEgZnJvbSAyMDEzIGluIE5ldyBZb3JrIENpdHlbXjFdLgoKYGBge3J9CmxpYnJhcnkoIG55Y2ZsaWdodHMxMyApCmBgYAoKVGhpcyBsaWJyYXJ5IGhhcyBmaXZlIGRpZmZlcmVudCBkYXRhIGZyYW1lcy4gIEhlcmUgaXMgYW4gb3ZlcnZpZXcgb2YgdGhlIHJhdyBkYXRhLgoKCiMjIFRoZSBEYXRhIAoKQSB0YWJsZSBvZiBhaXJsaW5lIGNhcnJpZXIgYWJicmV2aWF0aW9ucyAoTiA9IGByIG5yb3coYWlybGluZXMpYCBkaWZmZXJlbnQgY2FycmllcnMpLgoKYGBge3J9CmhlYWQoIGFpcmxpbmVzICkKYGBgCgpJbmZvcm1hdGlvbiBvbiB0aGUgYWlycG9ydHMgKE4gPSBgciBucm93KGFpcnBvcnRzKWAgYWlycG9ydHMpLgoKYGBge3J9CmhlYWQoIGFpcnBvcnRzICkKYGBgCk1vc3RseSBpbiB0aGUgVVMuCgpgYGB7ciBlY2hvPUZBTFNFLCBmaWcuY2FwPSJBaXJwb3J0dHMgaW4gY29udGluZW50YWwgVW5pdGVkIFN0YXRlcy4iLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQppZighcmVxdWlyZShsZWFmbGV0KSl7CiAgaW5zdGFsbC5wYWNrYWdlcygibGVhZmxldCIpCn0KCmxpYnJhcnkoIGxlYWZsZXQgKQphaXJwb3J0cyAlPiUKICBmaWx0ZXIoIGxvbiA8IDAsCiAgICAgICAgICBsb24gPiAtMTMwLAogICAgICAgICAgbGF0IDwgNTQpICU+JQogIGxlYWZsZXQoICkgJT4lCiAgYWRkTWFya2VycyggbGF0ID0gfmxhdCwKICAgICAgICAgICAgICBsbmcgPSB+bG9uLAogICAgICAgICAgICAgIHBvcHVwID0gfmZhYSkgJT4lCiAgYWRkUHJvdmlkZXJUaWxlcyggcHJvdmlkZXI9Ik9wZW5Ub3BvTWFwIikKYGBgCgpXZWF0aGVyIGNvbmRpdGlvbnMgYXQgZWFjaCBhaXJwb3J0IChhIHRvdGFsIG9mIGByIG5yb3cod2VhdGhlcilgIG9mIHRoZW0hKQoKYGBge3J9CmhlYWQod2VhdGhlcikKYGBgCkFuZCBhIGxpc3Qgb2YgYWlycGxhbmVzIChOID0gYHIgbnJvdyggcGxhbmVzIClgKS4KCmBgYHtyfQpoZWFkKCBwbGFuZXMgKQpgYGAKCkZyb20gYWxsIG9mIHRoZXNlIGF1eGlsaWFyeSBkYXRhLCB3ZSBjYW4gY29uc3RydWN0IGEgc2VyaWVzIG9mIGZsaWdodHMgd2hpY2ggaGFzIHRoZSBmb2xsb3dpbmcgdmFyaWFibGVzIGluIGl0LgoKYGBge3J9Cm5hbWVzKCBmbGlnaHRzICkKYGBgCgpJZiB5b3UgbG9vayBhdCB0aGUgZmxpZ2h0IHRhYmxlLCB5b3Ugd2lsbCBzZWUgdGhhdCBzZXZlcmFsIGNvbHVtbnMgb2YgZGF0YSBoZXJlIGFyZSAqY29ubmVjdGVkKiB0byB0aGUgb3RoZXIgZGF0YSBmcmFtZXMgZGVzY3JpYmVkIGFib3ZlLgoKYGZsaWdodHNgIEZvcmVpZ24gS2V5IHwgRGF0YSBGcm9tIAotLS0tLS0tLS0tLS0tLS0tLS0tLS0tfC0tLS0tLS0tLS0tLS0tLS0KY2FycmllciAgICAgICAgICAgICAgIHwgYWlybGluZXMKZGVzdCAgICAgICAgICAgICAgICAgIHwgYWlycG9ydHMKb3JpZ2luICAgICAgICAgICAgICAgIHwgYWlycG9ydAp0YWlsbnVtICAgICAgICAgICAgICAgfCBwbGFuZXMgCnRpbWVfaG91ciAgICAgICAgICAgICB8IHdlYXRoZXIKClRoZSByZWFzb24gdGhhdCB0aGVzZSB0YWJsZXMgaGF2ZSAqa2V5cyogY29ubmVjdGluZyBvbmUgdG8gdGhlIG90aGVyIGlzIHRoYXQgdGhlcmUgaXMgYW4gZWNvbm9teSBvZiBzY2FsZSBmb3IgdmVyeSBsYXJnZSBkYXRhIHNldHMgaWYgd2UgcGFydGl0aW9uIGl0IGludG8gcmVhbHRpb25hbCBjb21wb25lbnRzIHN1Y2ggYXMgdGhpcy4KCgojIFF1ZXN0aW9ucwoKVXNlIHRoZSB0YWJsZXMgYWJvdmUgdG8gYW5zd2VyIHRoZSBmb2xsb3dpbmcgcXVlc3Rpb25zLgoKMS4gV2hpY2ggY2FycmllcnMgaGF2ZSBwbGFuZXMgd2l0aCBvdmVyIDEwMCBmbGlnaHRzIGluIDIwMTMgKGFuZCBob3cgbWFueSk/ICAKCmBgYHtyfQoKYGBgCgoKCgoyLiBEdXJpbmcgd2hhdCA0OCBob3VyIHBlcmlvZCB3ZXJlIHRoZSB3b3JzdCBkZWxheXM/ICBJcyB0aGVyZSBhbnkgY29ubmVjdGlvbiB3aXRoIHdlYXRoZXI/CgoKYGBge3J9CgpgYGAKCgoKCgoKMy4gV2hhdCBhcmUgdGhlIGZpdmUgYmVzdCBhbmQgZml2ZSB3b3JzdCBhaXJsaW5lcyBmb3Iga2VlcGluZyBvbiBzY2hlZHVsZT8KCjQuIEluZGVwZW5kZW50IG9mIHRoZSBsZW5ndGggb2YgdGhlIGZsaWdodCwgYXJlIHlvdSBtb3JlIGxpa2VseSB0byBiZSBkZWxheWVkIGJ5IHRha2luZyBhIGxhcmdlIHBsYW5lIG9yIGEgc21hbGwgcGxhbmU/IAoKNS4gRG8gYWxsIHBsYW5lcyBoYXZlIHRhaWwgbnVtYmVycz8KCgoKCgoKCgpbXjFdOiBJZiB5b3UgZ2V0IGFuIGVycm9yIHdoZW4gYXR0ZW1wdGluZyB0byBsb2FkIHRoaXMgbGlicmFyeSBpbiwgaW5zdGFsbCBpdCB3aXRoIGBpbnN0YWxsLnBhY2thZ2VzKCJueWNmbGlnaHRzMTMiKWAgYW5kIHRoZW4gbG9hZCBpdCBpbiBhZ2Fpbi4g