Time is the next dimension.

This topic covers the basics of how we work with data based upon date and time objects. For this, we will use a data frame (introduced below) with a single column of data representing dates and times as they are written in the US.

There are several challenges associated with working with date and time objects. Those of us who are used to US date formats can easily interpret a date written as Month/Day/Year (e.g., "2/14/2018"), which is how dates commonly appear, as strings of characters, in the input data we work with in R. Dates and times are sticky things in data analysis because they do not work the way we think they should. Here are some wrinkles:

  1. There are many types of calendars; we use the Gregorian calendar. However, there are many other calendars in use that we may run into. Each of these calendars has a different starting year (e.g., in the Assyrian calendar it is year 6770, it is 4718 in the Chinese calendar, 2020 in the Gregorian, and 1442 in the Islamic calendar).
  2. The Western calendar has leap years (+1 day in February) because it tracks the Earth's orbit around the sun, and leap seconds because the Earth's rotation is not perfectly regular. Other calendars are based upon the lunar cycle and have their own corrections.
  3. On this planet, we have at least 24 different time zones. Some states (looking at you Arizona) don't feel it necessary to change their clocks along with the other states, so they are the same as Pacific time some of the year and the same as Mountain time the rest of the year. The province of Newfoundland decided to be half-way between time zones, so it is GMT-3:30 (GMT-2:30 during daylight saving time). Some states have more than one time zone even if they are not large in size (hello Indiana).
  4. Dates and times are made up of odd units: 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, 7 days in a week, 2 weeks in a fortnight, 28, 29, 30, or 31 days in a month, 365 or 366 days in a year, 100 years in a century, etc.
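
To make the time zone wrinkle concrete, here is a minimal sketch in base R that prints one instant in several zones. The zone names and the helper local_clock() are my own picks for illustration (they are not used elsewhere in this topic), and the sketch assumes your system has the standard time zone database installed.

```r
# One instant in winter and one in summer, both defined in UTC.
winter <- as.POSIXct("2021-01-15 12:00:00", tz = "UTC")
summer <- as.POSIXct("2021-07-15 12:00:00", tz = "UTC")

# Zone names are illustrative picks, not part of the data used later.
zones <- c("America/Los_Angeles", "America/Phoenix",
           "America/Denver", "America/St_Johns")

# Hypothetical helper: wall-clock time (HH:MM) of an instant in a zone.
local_clock <- function(instant, zone) format(instant, "%H:%M", tz = zone)

clock <- data.frame(
  zone   = zones,
  winter = sapply(zones, function(z) local_clock(winter, z)),
  summer = sapply(zones, function(z) local_clock(summer, z)),
  row.names = NULL
)
print(clock)
```

Phoenix should match Denver in the winter column and Los Angeles in the summer column, and St. John's should land on the half hour in both.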

Fortunately, some smart programmers have figured this out for us already. They made the second the base unit of time and designated 00:00:00 UTC on 1 January 1970 as the Unix epoch. Time on most modern computers is measured from that starting point. It is much easier to measure the difference between two points in time using the seconds since the Unix epoch, and then translate the result into one or more of these calendars, than to deal with all the different calendars each time. So under the hood, most date and time values are kept in terms of epoch seconds.

unclass( Sys.time() )
[1] 1605102067
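
Going the other direction may help make this concrete. The sketch below takes the epoch seconds printed above and translates them back into a calendar date and time; the variable names are mine, and I picked UTC so the result does not depend on where your computer lives.

```r
# Epoch seconds printed above (the value from when this was written).
stamp_seconds <- 1605102067

# Translate the count of seconds into a calendar date and time in UTC.
stamp <- as.POSIXct(stamp_seconds, origin = "1970-01-01", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M:%S")
# [1] "2020-11-11 13:41:07"

# Date objects work the same way, but count days instead of seconds.
unclass(as.Date("1970-01-02"))
# [1] 1
```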

Basic Date Objects

R has some basic date functionality built into it. One of the easiest ways to create a date object is to specify the date as a character string and then coerce it into a date object. By default, this requires us to represent the date as "YEAR-MONTH-DAY", with a padding 0 for any month or day below 10 (i.e., each must be two digits).

So for example, we can specify a date object as:

class_start <- as.Date("2021-01-15")
class_start
[1] "2021-01-15"

And it is of type:

class( class_start )
[1] "Date"

If you want to make a date from a different format, you need to specify which elements of the string represent the month, day, and year using format codes. These codes (and many more) can be found by looking at ?strptime.

class_end <- as.Date( "5/10/21", format = "%m/%d/%y")
class_end
[1] "2021-05-10"
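
Here is a small sketch of a few other format codes in action. The example strings are made up for illustration; the codes themselves (%d, %m, %y, %Y, %j) are among those listed in ?strptime.

```r
# The same calendar day written three different ways.
iso  <- as.Date("2021-10-14")                        # default YEAR-MONTH-DAY
us   <- as.Date("10/14/2021", format = "%m/%d/%Y")   # US style, 4-digit year
euro <- as.Date("14.10.21",   format = "%d.%m.%y")   # day first, 2-digit year

# All three parse to the same Date.
iso == us & us == euro
# [1] TRUE

# format() runs the codes in the other direction; %j is the day of the year.
format(iso, "%j")
# [1] "287"

# A format that does not match the string gives NA rather than an error.
mismatch <- as.Date("10/14/2021", format = "%Y-%m-%d")
is.na(mismatch)
# [1] TRUE
```

That last case is worth remembering: a wrong format string fails quietly, so it is a good idea to check for NA values after converting a column.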

Date objects can be put into vectors and sequences just like other objects.

semester <- seq( class_start, class_end, by = "1 day")
semester
  [1] "2021-01-15" "2021-01-16" "2021-01-17" "2021-01-18" "2021-01-19"
  [6] "2021-01-20" "2021-01-21" "2021-01-22" "2021-01-23" "2021-01-24"
 [11] "2021-01-25" "2021-01-26" "2021-01-27" "2021-01-28" "2021-01-29"
 [16] "2021-01-30" "2021-01-31" "2021-02-01" "2021-02-02" "2021-02-03"
 [21] "2021-02-04" "2021-02-05" "2021-02-06" "2021-02-07" "2021-02-08"
 [26] "2021-02-09" "2021-02-10" "2021-02-11" "2021-02-12" "2021-02-13"
 [31] "2021-02-14" "2021-02-15" "2021-02-16" "2021-02-17" "2021-02-18"
 [36] "2021-02-19" "2021-02-20" "2021-02-21" "2021-02-22" "2021-02-23"
 [41] "2021-02-24" "2021-02-25" "2021-02-26" "2021-02-27" "2021-02-28"
 [46] "2021-03-01" "2021-03-02" "2021-03-03" "2021-03-04" "2021-03-05"
 [51] "2021-03-06" "2021-03-07" "2021-03-08" "2021-03-09" "2021-03-10"
 [56] "2021-03-11" "2021-03-12" "2021-03-13" "2021-03-14" "2021-03-15"
 [61] "2021-03-16" "2021-03-17" "2021-03-18" "2021-03-19" "2021-03-20"
 [66] "2021-03-21" "2021-03-22" "2021-03-23" "2021-03-24" "2021-03-25"
 [71] "2021-03-26" "2021-03-27" "2021-03-28" "2021-03-29" "2021-03-30"
 [76] "2021-03-31" "2021-04-01" "2021-04-02" "2021-04-03" "2021-04-04"
 [81] "2021-04-05" "2021-04-06" "2021-04-07" "2021-04-08" "2021-04-09"
 [86] "2021-04-10" "2021-04-11" "2021-04-12" "2021-04-13" "2021-04-14"
 [91] "2021-04-15" "2021-04-16" "2021-04-17" "2021-04-18" "2021-04-19"
 [96] "2021-04-20" "2021-04-21" "2021-04-22" "2021-04-23" "2021-04-24"
[101] "2021-04-25" "2021-04-26" "2021-04-27" "2021-04-28" "2021-04-29"
[106] "2021-04-30" "2021-05-01" "2021-05-02" "2021-05-03" "2021-05-04"
[111] "2021-05-05" "2021-05-06" "2021-05-07" "2021-05-08" "2021-05-09"
[116] "2021-05-10"
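
Because these are ordinary vectors, the usual tools apply. The sketch below rebuilds the two endpoints from above so it can run on its own, then steps by week, subtracts two dates, and subsets by month; the names weekly and march are mine.

```r
class_start <- as.Date("2021-01-15")
class_end   <- as.Date("2021-05-10")

# One entry per week instead of one per day.
weekly <- seq(class_start, class_end, by = "1 week")
length(weekly)
# [1] 17

# Subtracting two Dates gives the number of days between them.
class_end - class_start
# Time difference of 115 days

# Logical indexing works as it does for any other vector.
semester <- seq(class_start, class_end, by = "1 day")
march <- semester[format(semester, "%m") == "03"]
length(march)
# [1] 31
```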

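Some helpful functions come from the lubridate library (introduced more fully below), so we load it first. One of them, yday(), gives the ordinal day, sometimes called the Julian day (i.e., the number of days since the start of the year).

library( lubridate )
ordinal_day <- yday( semester[102] )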
ordinal_day
[1] 116

The weekday as an integer (1-7 starting on Sunday), which I use to index the named values.

days_of_week <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
x <- wday( semester[32] )
days_of_week[ x ]
[1] "Monday"
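
If lubridate is not available, roughly the same information can be pulled out with base R. This is only a sketch of one alternative, not what the code above does; note that base R's POSIXlt representation counts weekdays from 0 rather than 1, which is an easy thing to trip over.

```r
d1 <- as.Date("2021-04-26")   # semester[102] in the text
d2 <- as.Date("2021-02-15")   # semester[32] in the text

# Day of the year via the %j format code (1-based, like yday()).
as.integer(format(d1, "%j"))
# [1] 116

# POSIXlt stores the weekday as 0-6 starting on Sunday, so add 1 before indexing.
days_of_week <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
zero_based <- as.POSIXlt(d2)$wday
zero_based
# [1] 1
days_of_week[zero_based + 1]
# [1] "Monday"
```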

Since we did not specify a time, things like hour() and minute() do not provide any usable information.

Dates & Times

To add time to the date objects, we need to specify both the date and the time. Here are some example data:

df <- data.frame( Date = c("8/21/2004 7:33:51 AM",
                           "7/12/2008 9:23:08 PM",
                           "2/14/2010 8:18:30 AM",
                           "12/23/2018 11:11:45 PM",
                           "2/1/2019 4:42:00 PM",
                           "5/17/2012 1:23:23 AM",
                           "12/11/2020 9:48:02 PM") )
summary( df )
     Date          
 Length:7          
 Class :character  
 Mode  :character  

Just like above, if we want to turn these into date and time objects we must be able to tell the parsing algorithm what elements are represented in each entry. There are many ways to write dates and times: 10/14, 14 Oct, October 14, Julian day 287, etc. These are designated by a format string where we indicate which element represents a day or month or year or hour or minute or second, etc. The codes are found by looking at the documentation for ?strptime.

In our case, we have:
- Month as 1 or 2 digits
- Day as 1 or 2 digits
- Year as 4 digits
- a space to separate date from time
- hour on a 12-hour clock (not 24-hour)
- minutes in 2 digits
- seconds in 2 digits
- a space to separate time from the AM/PM indicator
- AM/PM indicator (not a timezone)
- / separating date elements
- : separating time elements

To make the format string, we need to look up how to encode these items. The items in df for a date & time object such as 2/1/2019 4:42:00 PM have the format string:

format <- "%m/%d/%Y %I:%M:%S %p"

Now, we can convert the character strings in the data frame to date and time objects. Instead of using the built-in as.Date() functionality, I like the lubridate library (see note 1 at the end) as it has a lot of additional functionality that we'll play with a bit later.

library( lubridate )
df$Date <- parse_date_time( df$Date, 
                            orders=format, 
                            tz = "EST" )
summary( df )
      Date                    
 Min.   :2004-08-21 07:33:51  
 1st Qu.:2009-04-29 14:50:49  
 Median :2012-05-17 01:23:23  
 Mean   :2013-07-11 07:28:39  
 3rd Qu.:2019-01-12 19:56:52  
 Max.   :2020-12-11 21:48:02  
class( df$Date )
[1] "POSIXct" "POSIXt" 
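
For comparison, here is a sketch of the same kind of parsing in base R using as.POSIXct() on a single string from the data. It assumes your locale recognizes AM and PM for the %p code (English and C locales do); the variable name stamp is mine.

```r
# One of the strings from df, parsed with the same format string.
stamp <- as.POSIXct("2/1/2019 4:42:00 PM",
                    format = "%m/%d/%Y %I:%M:%S %p",
                    tz = "EST")
class(stamp)
# [1] "POSIXct" "POSIXt"

# %I plus %p were combined into a 24-hour clock value.
format(stamp, "%Y-%m-%d %H:%M:%S")
# [1] "2019-02-01 16:42:00"
```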

Now, we can ask Date-like questions about the data such as what day of the week was the first sample taken?

weekdays( df$Date[1] )
[1] "Saturday"

What is the range of dates?

range( df$Date )
[1] "2004-08-21 07:33:51 EST" "2020-12-11 21:48:02 EST"

What is the median sample date?

median( df$Date )
[1] "2012-05-17 01:23:23 EST"

And what Julian ordinal day (i.e., how many days since the start of the year) is the fourth record (23 December 2018)?

yday( df$Date[4] )
[1] 357

Just for fun, I'll add a column to the data that has the weekday.

df$Weekday <- weekdays( df$Date )
df

However, we should probably turn it into a factor (i.e., a data type with pre-defined levels and, for us here, an intrinsic order of the levels).

df$Weekday <- factor( df$Weekday, 
                        ordered = TRUE, 
                        levels = days_of_week
                        )
summary( df$Weekday )
   Sunday    Monday   Tuesday Wednesday  Thursday    Friday  Saturday 
        2         0         0         0         1         2         2 
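
To see what the ordering buys us, here is a small sketch on a made-up vector of weekday names (not the values in df). It repeats the days_of_week definition so that it runs on its own.

```r
days_of_week <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
weekday <- factor(c("Saturday", "Sunday", "Friday", "Thursday"),
                  ordered = TRUE, levels = days_of_week)

# Sorting follows the level order, not the alphabet.
as.character(sort(weekday))
# [1] "Sunday"   "Thursday" "Friday"   "Saturday"

# Ordered factors can be compared against a level name.
weekday > "Wednesday"
# [1]  TRUE FALSE  TRUE  TRUE

# table() keeps empty levels, so days with no samples show up as 0.
as.integer(table(weekday)["Monday"])
# [1] 0
```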

Filtering on Date Objects

We can easily filter the content within a data.frame using some helper functions such as hour(), minute(), wday(), etc. Here are some examples, including pulling out the weekends.

weekends <- df[ df$Weekday %in% c("Saturday","Sunday"), ]
weekends

Finding items that are in the past (past being defined as the last time this document was knit).

past <- df$Date[ df$Date < Sys.time() ]
past
[1] "2004-08-21 07:33:51 EST" "2008-07-12 21:23:08 EST"
[3] "2010-02-14 08:18:30 EST" "2018-12-23 23:11:45 EST"
[5] "2019-02-01 16:42:00 EST" "2012-05-17 01:23:23 EST"

Items that fall during working hours (here, any time from 09:00 through 17:59, since the test is on the hour alone).

work <- df$Date[ hour(df$Date) >= 9 & hour(df$Date) <= 17 ]
work
[1] "2019-02-01 16:42:00 EST"
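
One thing to watch with filters on the hour alone is where the interval ends. The sketch below uses three made-up timestamps and base R's POSIXlt hour field (a stand-in for hour() so it runs without lubridate) to compare a closed interval with a half-open one.

```r
stamps <- as.POSIXct(c("2019-02-01 08:59:59",
                       "2019-02-01 16:42:00",
                       "2019-02-01 17:30:00"), tz = "EST")

# Hour of day (0-23) in the object's own time zone.
hr <- as.POSIXlt(stamps)$hour

# Closed interval, as in the text: keeps anything up to 17:59:59.
format(stamps[hr >= 9 & hr <= 17], "%H:%M")
# [1] "16:42" "17:30"

# Half-open interval: working hours end at 17:00 sharp.
format(stamps[hr >= 9 & hr < 17], "%H:%M")
# [1] "16:42"
```

Which one is right depends on what you mean by working hours, so it is worth deciding explicitly.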

And the total range of values in days, using normal arithmetic operations such as the minus operator.

max(df$Date) - min(df$Date)
Time difference of 5956.593 days
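
The minus operator chooses the units for us. If we want to pick them, base R's difftime() is one option; the sketch below retypes the earliest and latest records from df so it runs on its own, and the names first, last, and span are mine.

```r
first <- as.POSIXct("2004-08-21 07:33:51", tz = "EST")
last  <- as.POSIXct("2020-12-11 21:48:02", tz = "EST")

# Subtraction picks its own units; difftime() lets us choose them.
span <- difftime(last, first, units = "days")
round(as.numeric(span), 3)
# [1] 5956.593

# The same span re-expressed in weeks, as a plain number.
as.numeric(span, units = "weeks")
```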

  1. If you get an error saying something like, "there is no package named lubridate" then use install.packages("lubridate") to install it. You only need to do this once.

