This homework is going to play around with relational joins. We will use some flight data from 2013 in New York City.
library( nycflights13 )
This library has five different data frames. Here is an overview of the raw data.
The Data
A table of airline carrier abbreviations (N = 16 different carriers).
head( airlines )
Information on the airports (N = 1458 airports).
head( airports )
Mostly in the US.
Weather conditions at each airport (a total of 26115 of them!)
head(weather)
And a list of airplanes (N = 3322).
head( planes )
From all of these auxiliary data, we can construct a series of flights which has the following variables in it.
names( flights )
[1] "year" "month" "day" "dep_time"
[5] "sched_dep_time" "dep_delay" "arr_time" "sched_arr_time"
[9] "arr_delay" "carrier" "flight" "tailnum"
[13] "origin" "dest" "air_time" "distance"
[17] "hour" "minute" "time_hour"
year
month
day
dep_time
sched_dep_time
dep_delay
arr_time
sched_arr_time
arr_delay
carrier
flight
tailnum
origin
dest
air_time
distance
hour
minute
time_hour
If you look at the flight table, you will see that several columns of data here are connected to the other data frames described above.
carrier |
airlines |
dest |
airports |
origin |
airport |
tailnum |
planes |
time_hour |
weather |
The reason that these tables have keys connecting one to the other is that there is an economy of scale for very large data sets if we partition it into realtional components such as this.
Questions
Use the tables above to answer the following questions.
- Which carriers have planes with over 100 flights in 2013 (and how many)?
- During what 48 hour period were the worst delays? Is there any connection with weather?
What are the five best and five worst airlines for keeping on schedule?
Independent of the length of the flight, are you more likely to be delayed by taking a large plane or a small plane?
Do all planes have tail numbers?
LS0tCnRpdGxlOiAiUmVsYXRpb25hbCBEYXRhIEhvbWV3b3JrIgphdXRob3I6IFlPVVJfTkFNRV9IRVJFCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCgpgYGB7ciBzZXR1cCwgaW5jbHVkZT1GQUxTRX0KbGlicmFyeSggdGlkeXZlcnNlICkKbGlicmFyeSgga25pdHIgKQprbml0cjo6b3B0c19jaHVuayRzZXQoIHdhcm5pbmcgPSBGQUxTRSwgCiAgICAgICAgICAgICAgICAgICAgICAgZXJyb3IgPSBGQUxTRSwKICAgICAgICAgICAgICAgICAgICAgICBmaWcucmV0aW5hID0gMykKYGBgCgpUaGlzIGhvbWV3b3JrIGlzIGdvaW5nIHRvIHBsYXkgYXJvdW5kIHdpdGggcmVsYXRpb25hbCBqb2lucy4gIFdlIHdpbGwgdXNlIHNvbWUgZmxpZ2h0IGRhdGEgZnJvbSAyMDEzIGluIE5ldyBZb3JrIENpdHlbXjFdLgoKYGBge3J9CmxpYnJhcnkoIG55Y2ZsaWdodHMxMyApCmBgYAoKVGhpcyBsaWJyYXJ5IGhhcyBmaXZlIGRpZmZlcmVudCBkYXRhIGZyYW1lcy4gIEhlcmUgaXMgYW4gb3ZlcnZpZXcgb2YgdGhlIHJhdyBkYXRhLgoKCiMjIFRoZSBEYXRhIAoKQSB0YWJsZSBvZiBhaXJsaW5lIGNhcnJpZXIgYWJicmV2aWF0aW9ucyAoTiA9IGByIG5yb3coYWlybGluZXMpYCBkaWZmZXJlbnQgY2FycmllcnMpLgoKYGBge3J9CmhlYWQoIGFpcmxpbmVzICkKYGBgCgpJbmZvcm1hdGlvbiBvbiB0aGUgYWlycG9ydHMgKE4gPSBgciBucm93KGFpcnBvcnRzKWAgYWlycG9ydHMpLgoKYGBge3J9CmhlYWQoIGFpcnBvcnRzICkKYGBgCk1vc3RseSBpbiB0aGUgVVMuCgpgYGB7ciBlY2hvPUZBTFNFLCBmaWcuY2FwPSJBaXJwb3J0dHMgaW4gY29udGluZW50YWwgVW5pdGVkIFN0YXRlcy4iLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQppZighcmVxdWlyZShsZWFmbGV0KSl7CiAgaW5zdGFsbC5wYWNrYWdlcygibGVhZmxldCIpCn0KCmxpYnJhcnkoIGxlYWZsZXQgKQphaXJwb3J0cyAlPiUKICBmaWx0ZXIoIGxvbiA8IDAsCiAgICAgICAgICBsb24gPiAtMTMwLAogICAgICAgICAgbGF0IDwgNTQpICU+JQogIGxlYWZsZXQoICkgJT4lCiAgYWRkTWFya2VycyggbGF0ID0gfmxhdCwKICAgICAgICAgICAgICBsbmcgPSB+bG9uLAogICAgICAgICAgICAgIHBvcHVwID0gfmZhYSkgJT4lCiAgYWRkUHJvdmlkZXJUaWxlcyggcHJvdmlkZXI9Ik9wZW5Ub3BvTWFwIikKYGBgCgpXZWF0aGVyIGNvbmRpdGlvbnMgYXQgZWFjaCBhaXJwb3J0IChhIHRvdGFsIG9mIGByIG5yb3cod2VhdGhlcilgIG9mIHRoZW0hKQoKYGBge3J9CmhlYWQod2VhdGhlcikKYGBgCkFuZCBhIGxpc3Qgb2YgYWlycGxhbmVzIChOID0gYHIgbnJvdyggcGxhbmVzIClgKS4KCmBgYHtyfQpoZWFkKCBwbGFuZXMgKQpgYGAKCkZyb20gYWxsIG9mIHRoZXNlIGF1eGlsaWFyeSBkYXRhLCB3ZSBjYW4gY29uc3RydWN0IGEgc2VyaWVzIG9mIGZsaWdodHMgd2hpY2ggaGFzIHRoZSBmb2xsb3dpbmcgdmFyaWFibGVzIGluIGl0LgoKYGBge3J9Cm5hbWVzKCBmbGlnaHRzICkKYGBgCgpJZiB5b3UgbG9vayBhdCB0aGUgZmxpZ2h0IHRhYmxlLCB5b3Ugd2lsbCBzZWUgdGhhdCBzZXZlcmFsIGNvbHVtbnMgb2YgZGF0YSBoZXJlIGFyZSAqY29ubmVjdGVkKiB0byB0aGUgb3RoZXIgZGF0YSBmcmFtZXMgZGVzY3JpYmVkIGFib3ZlLgoKYGZsaWdodHNgIEZvcmVpZ24gS2V5IHwgRGF0YSBGcm9tIAotLS0tLS0tLS0tLS0tLS0tLS0tLS0tfC0tLS0tLS0tLS0tLS0tLS0KY2FycmllciAgICAgICAgICAgICAgIHwgYWlybGluZXMKZGVzdCAgICAgICAgICAgICAgICAgIHwgYWlycG9ydHMKb3JpZ2luICAgICAgICAgICAgICAgIHwgYWlycG9ydAp0YWlsbnVtICAgICAgICAgICAgICAgfCBwbGFuZXMgCnRpbWVfaG91ciAgICAgICAgICAgICB8IHdlYXRoZXIKClRoZSByZWFzb24gdGhhdCB0aGVzZSB0YWJsZXMgaGF2ZSAqa2V5cyogY29ubmVjdGluZyBvbmUgdG8gdGhlIG90aGVyIGlzIHRoYXQgdGhlcmUgaXMgYW4gZWNvbm9teSBvZiBzY2FsZSBmb3IgdmVyeSBsYXJnZSBkYXRhIHNldHMgaWYgd2UgcGFydGl0aW9uIGl0IGludG8gcmVhbHRpb25hbCBjb21wb25lbnRzIHN1Y2ggYXMgdGhpcy4KCgojIFF1ZXN0aW9ucwoKVXNlIHRoZSB0YWJsZXMgYWJvdmUgdG8gYW5zd2VyIHRoZSBmb2xsb3dpbmcgcXVlc3Rpb25zLgoKMS4gV2hpY2ggY2FycmllcnMgaGF2ZSBwbGFuZXMgd2l0aCBvdmVyIDEwMCBmbGlnaHRzIGluIDIwMTMgKGFuZCBob3cgbWFueSk/ICAKCmBgYHtyfQoKYGBgCgoKCgoyLiBEdXJpbmcgd2hhdCA0OCBob3VyIHBlcmlvZCB3ZXJlIHRoZSB3b3JzdCBkZWxheXM/ICBJcyB0aGVyZSBhbnkgY29ubmVjdGlvbiB3aXRoIHdlYXRoZXI/CgoKYGBge3J9CgpgYGAKCgoKCgoKMy4gV2hhdCBhcmUgdGhlIGZpdmUgYmVzdCBhbmQgZml2ZSB3b3JzdCBhaXJsaW5lcyBmb3Iga2VlcGluZyBvbiBzY2hlZHVsZT8KCjQuIEluZGVwZW5kZW50IG9mIHRoZSBsZW5ndGggb2YgdGhlIGZsaWdodCwgYXJlIHlvdSBtb3JlIGxpa2VseSB0byBiZSBkZWxheWVkIGJ5IHRha2luZyBhIGxhcmdlIHBsYW5lIG9yIGEgc21hbGwgcGxhbmU/IAoKNS4gRG8gYWxsIHBsYW5lcyBoYXZlIHRhaWwgbnVtYmVycz8KCgoKCgoKCgpbXjFdOiBJZiB5b3UgZ2V0IGFuIGVycm9yIHdoZW4gYXR0ZW1wdGluZyB0byBsb2FkIHRoaXMgbGlicmFyeSBpbiwgaW5zdGFsbCBpdCB3aXRoIGBpbnN0YWxsLnBhY2thZ2VzKCJueWNmbGlnaHRzMTMiKWAgYW5kIHRoZW4gbG9hZCBpdCBpbiBhZ2Fpbi4g