Find the last day of the month

I have different sets of monthly data that I want to align and evaluate once a month. The different sources report the timestamp of the monthly data differently – one reports the date without the day, another as the last business day, and another as the last day of the month. For what I want to do, I’m content to align the data to the last day of the month. How do I do that in R?

The solution for this isn’t obvious in Excel or OpenOffice, either, but I thought it would be at least similarly simple in R. I’d looked around, and the best solution I’d found on r-help was something of a kludge:

index(x) = as.POSIXct(as.Date(as.yearmon(index(x)),frac=1), tz="UTC")

I said it wasn’t pretty, didn’t I? You are reading that right – convert the index first to yearmon, then to Date (with frac representing the fraction of a month with a number between 1 and 0), then finally to POSIXct.

That’s worked fine for me until I recently found strange, eight-second misalignments between my data sets that seemed to be caused by the transformation above. Timezone issues, perhaps? On Jeff Ryan’s advice to keep all my date transformations in POSIX, I finally found a better solution in r-help that had been posted by Whit Armstrong years and years ago. I’m not sure why I didn’t find it earlier, but his solution was somewhat more general than what I have been looking for. I’ve stripped his answer down a bit to the following utility function:

eom <- function(date) {
  # date character string containing POSIXct date
  date.lt <- as.POSIXlt(date) # add a month, then subtract a day:
  mon <- date.lt$mon + 2 
  year <- date.lt$year
  year <- year + as.integer(mon==13) # if month was December add a year
  mon[mon==13] <- 1
  iso = ISOdate(1900+year, mon, 1, hour=0, tz=attr(date,"tz"))
  result = as.POSIXct(iso) - 86400 # subtract one day
  result + (as.POSIXlt(iso)$isdst - as.POSIXlt(result)$isdst)*3600
}

This wasn’t as simple as I was hoping for, but once functionalized it becomes simple. The premise is the same as the spreadsheet solution – add a month then subtract a day.

# Whit's example
x <- seq(as.POSIXct("2001-01-10"),as.POSIXct("2005-12-10"),by="months")
data.frame(before=x,after=eom(x))
       before      after
1  2001-01-10 2001-01-31
2  2001-02-10 2001-02-28
3  2001-03-10 2001-03-31
4  2001-04-10 2001-04-30
5  2001-05-10 2001-05-31
6  2001-06-10 2001-06-30
... snip ...

Ah, much better. Thanks, Whit!

Advertisements

9 thoughts on “Find the last day of the month

  1. Charles says:

    I’ve long used the following fairly simple function:

    eom = function(x, p=as.POSIXlt(x)) as.Date(modifyList(p, list(mon=p$mon + 1, mday=0)))

  2. ilir says:

    Check out ceiling_date() from package lubridate.

    • msharp2013 says:

      ## Specifically:
      library(lubridate)
      x <- seq(as.POSIXct("2001-01-10"),as.POSIXct("2005-12-10"),by="months")
      data.frame(before = x, after = ceiling_date(x, 'month') – ddays(1))

      • msharp2013 says:

        ## However, it is slower; Changing x to make it longer ..
        > x system.time(eom(x))
        user system elapsed
        0.001 0.000 0.001
        > system.time(ceiling_date(x, ‘month’) – ddays(1))
        user system elapsed
        0.088 0.001 0.089

      • The ceiling_date() solution doesn’t work for dates on the first day of a month. For example, ceiling_date(as.Date(“2014-12-01″),”month”) returns 2014-12-01, not 2015-01-01.

      • Here’s a lubridate solution that works properly, and is pretty simple and easy to understand: update(x, days=days_in_month(x))

  3. G. Grothendieck says:

    1. I tried a variety of examples but was unable to get eom to work, e.g. `eom(format(Sys.time()))` and `eom(format(Sys.Date()))` but both fail for me. 2. The one line solution shown at the beginning has the advatnage that it is shorter, it is smoothly integrated in a seemless way into the rest of R rather than requiring a special function that you have to remember and it works with vector input as well as a wide variety of input classes. 3. Also its a better idea to use Date if you don’t have times since then you can’t get sutbtle time zone bugs and annoying subsecond inconsistencies that are seen from time to time when using POSIXct. Its really better to try to use POSIXct as little as possible. Here it would have been better to simply avoid POSIXct in the partial example shown and use Date instead, if this were feasible in the context of the problem. 4. If you can provide a reproducible example of the as.Date/as.yearmon code not working we will look into it. Maybe this workaround would solve it: `as.Date(as.yearmon(as.Date(index(x))),frac=1)`

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: