Find the last day of the month

I have different sets of monthly data that I want to align and evaluate once a month. The different sources report the timestamp of the monthly data differently – one reports the date without the day, another as the last business day, and another as the last day of the month. For what I want to do, I’m content to align the data to the last day of the month. How do I do that in R?

The solution for this isn’t obvious in Excel or OpenOffice, either, but I thought it would be at least similarly simple in R. I’d looked around, and the best solution I’d found on r-help was something of a kludge:

index(x) = as.POSIXct(as.Date(as.yearmon(index(x)),frac=1), tz="UTC")

I said it wasn’t pretty, didn’t I? You are reading that right – convert the index first to yearmon, then to Date (with frac representing the fraction of the month as a number between 0 and 1), then finally to POSIXct.
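As a quick sanity check of that kludge on a single date (zoo provides as.yearmon and the frac argument):

```r
# frac=1 picks the end of the month; frac=0 would be the first day
library(zoo)
d <- as.Date(as.yearmon(as.Date("2001-02-10")), frac = 1)
format(d)  # "2001-02-28"
```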

That worked fine for me until I recently found strange, eight-second misalignments between my data sets that seemed to be caused by the transformation above. Timezone issues, perhaps? On Jeff Ryan’s advice to keep all my date transformations in POSIX, I finally found a better solution on r-help that had been posted by Whit Armstrong years and years ago. I’m not sure why I didn’t find it earlier, but his solution was somewhat more general than what I had been looking for. I’ve stripped his answer down a bit to the following utility function:

eom <- function(date) {
  # date: a POSIXct date (or a character string coercible to one)
  # premise: add a month, then subtract a day
  date <- as.POSIXlt(date)
  mon <- date$mon + 2                   # $mon is zero-based, so this is next month
  year <- date$year
  year <- year + as.integer(mon == 13)  # if the month was December, add a year
  mon[mon == 13] <- 1
  iso <- ISOdate(1900 + year, mon, 1, hour = 0, tz = attr(date, "tz"))
  result <- as.POSIXct(iso) - 86400     # subtract one day
  result + (as.POSIXlt(iso)$isdst - as.POSIXlt(result)$isdst) * 3600
}

This wasn’t as simple as I had hoped for, but wrapped in a function it’s easy to use. The premise is the same as the spreadsheet solution – add a month, then subtract a day.

# Whit's example
x <- seq(as.POSIXct("2001-01-10"), as.POSIXct("2005-12-10"), by="months")
data.frame(before=format(x, "%Y-%m-%d"), after=format(eom(x), "%Y-%m-%d"))
       before      after
1  2001-01-10 2001-01-31
2  2001-02-10 2001-02-28
3  2001-03-10 2001-03-31
4  2001-04-10 2001-04-30
5  2001-05-10 2001-05-31
6  2001-06-10 2001-06-30
... snip ...

Ah, much better. Thanks, Whit!

Aggregate portfolio contributions through time

The last CRAN release didn’t have much new functionality, but Ross Bennett and I have completely re-written the Return.portfolio function to fix some issues and make the calculations more transparent.  The function calculates the returns of a portfolio given asset returns, weights, and rebalancing periods – which, although not rocket science, requires some diligence.

Users of this function frequently want to aggregate contribution through time – but contribution for higher periodicity data can’t be directly accumulated into lower periodicities (e.g., using daily contributions to calculate monthly contributions).   So the function now also outputs values for the individual assets and the aggregated portfolio so that contributions can be calculated at different periodicities.  For example, contribution during a quarter can be calculated as the change in value of the position through those three months, divided by the original value of the portfolio. The function doesn’t do this directly, but it provides the value calculation so that it can be done.
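As a toy sketch with made-up numbers (not the function’s actual output), the quarterly contribution described above works out like this:

```r
# Hypothetical numbers: an asset's contribution over a quarter is its change
# in value across the three months, divided by the portfolio's value at the
# beginning of the quarter
bop.portfolio <- 100              # portfolio value at beginning of quarter
asset.value <- c(60, 62, 59, 63)  # asset value at BOP and each month-end
q.contrib <- (tail(asset.value, 1) - head(asset.value, 1)) / bop.portfolio
q.contrib  # 0.03
```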

We’ve also added some other convenience features to that function.  If you do not specify weights, the function assumes an equal weight portfolio.  Alternatively, you can specify a vector or single-row matrix of weights that matches the length of the asset columns. In either case, if you don’t specify a rebalancing period, the weights will be applied at the beginning of the asset time series and no further rebalancing will take place. If a rebalancing period is specified (one of ‘days’, ‘weeks’, ‘months’, ‘quarters’, or ‘years’, as used by xts’ endpoints function), the portfolio will be rebalanced to the given weights at the specified interval.

That function can also do irregular rebalancing when passed a time series of weights. It uses the date index of the weights for xts-style subsetting of rebalancing periods, and treats those weights as “end-of-period” weights (which seems to be the most common use case).
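A minimal sketch of what that looks like – the dates and weights below are arbitrary illustrations, and `R` stands in for an xts of two assets’ returns:

```r
# Hypothetical sketch: a time series of weights drives irregular rebalancing;
# the weights are treated as end-of-period weights on the given dates
library(xts)
w.xts <- xts(matrix(c(0.6, 0.4,
                      0.5, 0.5), ncol = 2, byrow = TRUE),
    = as.Date(c("2005-06-30", "2009-03-31")))
# Return.portfolio(R, weights = w.xts)  # R: an xts of the two assets' returns
```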

When verbose=TRUE, Return.portfolio now returns a list of data and intermediary calculations.  Those should allow anyone to step through the specific calculations and see exactly how the numbers are generated.

Ross wrote a very nice vignette for the function (vignette("portfolio_returns")), and as usual there’s a lot more detail in the documentation – take a look.

Here’s an example of a traditional 60/40 portfolio. We’ll look at the results of different rebalancing period assumptions, and then aggregate the monthly portfolio contributions to yearly contributions.

library(quantmod)             # getSymbols, to.monthly
library(PerformanceAnalytics) # Return.calculate, Return.portfolio
symbols = c(
  "SPY", # US equities, SP500
  "AGG"  # US bonds, Barclay Agg
)
getSymbols(symbols, from="1970-01-01")
# indexAt="endof" is assumed here, stamping each month at its last trading day
x.P <-, lapply(symbols, function(x) {
  Cl(to.monthly(Ad(get(x)), drop.time = TRUE, indexAt = "endof"))
colnames(x.P) = paste0(symbols, ".Adjusted")
x.R <- na.omit(Return.calculate(x.P))

# head(x.R)
#            SPY.Adjusted AGG.Adjusted
# 2003-10-31   0.05350714 -0.009464182
# 2003-11-28   0.01095923  0.003380861
# 2003-12-31   0.05035552  0.009815412
# 2004-01-30   0.01975363  0.004352241
# 2004-02-27   0.01360322  0.011411238
# 2004-03-31  -0.01331329  0.006855184
# tail(x.R)
#            SPY.Adjusted AGG.Adjusted
# 2014-04-30  0.006931012  0.008151410
# 2014-05-30  0.023211141  0.011802974
# 2014-06-30  0.020650814 -0.000551116
# 2014-07-31 -0.013437564 -0.002481390
# 2014-08-29  0.039463463  0.011516492
# 2014-09-15 -0.008619401 -0.010747791

If we didn’t pass in any weights, the function would assume an equal-weight portfolio. We’ll specify a 60/40 split instead.

# Create a weights vector
w = c(.6,.4) # Traditional 60/40 Equity/Bond portfolio weights
# No rebalancing period specified, so buy and hold initial weights
result.norebal = Return.portfolio(x.R, weights=w)
table.AnnualizedReturns(result.norebal)
#                           portfolio.returns
# Annualized Return                    0.0705
# Annualized Std Dev                   0.0880
# Annualized Sharpe (Rf=0%)            0.8008

If we don’t specify a rebalancing period, we get buy and hold returns. Instead, let’s rebalance every year.

# Rebalance annually back to 60/40 proportion
result.years = Return.portfolio(x.R, weights=w, rebalance_on="years")
table.AnnualizedReturns(result.years)
#                           portfolio.returns
# Annualized Return                    0.0738
# Annualized Std Dev                   0.0861
# Annualized Sharpe (Rf=0%)            0.8565

Similarly, we might want to consider quarterly rebalancing. But this time we’ll collect all of the intermediary calculations, including position values. We get a list back this time.

# Rebalance quarterly; provide full calculations
result.quarters = Return.portfolio(x.R, weights=w,
                                   rebalance_on="quarters", verbose=TRUE)
table.AnnualizedReturns(result.quarters$returns)
#                           portfolio.returns
# Annualized Return                    0.0723
# Annualized Std Dev                   0.0875
# Annualized Sharpe (Rf=0%)            0.8254

That provides more detail, including the monthly contributions from each asset.

# We asked for a verbose result, so the function generates a list of 
# intermediary calculations, including asset contributions for each period:
names(result.quarters)
# [1] "returns"      "contribution" "BOP.Weight"   "EOP.Weight"   
# [5] "BOP.Value"    "EOP.Value"

# Examine the beginning-of-period weights; note the rebalancing periods
tail(result.quarters$BOP.Weight, 10)
#            SPY.Adjusted AGG.Adjusted
# 2014-01-31    0.6000000    0.4000000
# 2014-02-28    0.5876652    0.4123348
# 2014-03-31    0.5975060    0.4024940
# 2014-04-30    0.6000000    0.4000000
# 2014-05-30    0.5996912    0.4003088
# 2014-06-30    0.6023973    0.3976027
# 2014-07-31    0.6000000    0.4000000
# 2014-08-29    0.5973447    0.4026553
# 2014-09-30    0.6039059    0.3960941
# 2014-10-15    0.6000000    0.4000000

# Look at the monthly contribution from each asset
tail(result.quarters$contribution, 10)
#            SPY.Adjusted  AGG.Adjusted
# 2014-01-31 -0.021147406  0.0061514949
# 2014-02-28  0.026753095  0.0015515892
# 2014-03-31  0.004943173 -0.0006035523
# 2014-04-30  0.004178138  0.0033039234
# 2014-05-30  0.013920140  0.0046954857
# 2014-06-30  0.012434880 -0.0002195083
# 2014-07-31 -0.008069401 -0.0009942920
# 2014-08-29  0.023590440  0.0046081450
# 2014-09-30 -0.008343079 -0.0024215991
# 2014-10-15 -0.032250533  0.0067572530

Having the monthly contributions is nice, but what if we want to know what each asset contributed to the annual result of the portfolio? We get this question quite a bit (and it has prompted many attempts to “fix” the code – we appreciate that it isn’t as straightforward as it seems).

EDIT: Even knowing that, I got it wrong the first time… Based on the reference that Paolo points to in his comment below and some subsequent email conversation, I’ve replaced the last part of this post with the correct calculations.

For individual assets, such as a particular asset class or manager, the multi-period contribution is neither the sum nor the geometric compounding of the single-period contributions. Because the weights of the individual assets change through time as transactions occur, the capital base for each asset changes.

Instead, the asset’s multi-period contribution is the sum of the asset’s dollar contributions from each period, as calculated from the wealth index of the total portfolio. Once contributions are expressed as a change in dollar value relative to the wealth index of the portfolio, asset contributions then sum to the returns of the total portfolio for the period.
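A tiny two-asset, two-period check with made-up numbers shows why this works – scaling each period’s contributions by the portfolio’s begin-of-period wealth index makes them sum to the cumulative portfolio return:

```r
# Hypothetical numbers: contributions by asset (columns) for two periods (rows)
contrib <- matrix(c(0.03,  0.01,
                    0.02, -0.005), nrow = 2, byrow = TRUE)
port.ret <- rowSums(contrib)                    # per-period portfolio returns
wealth.bop <- c(1, cumprod(1 + port.ret))[1:2]  # begin-of-period wealth index
dollar.contrib <- contrib * wealth.bop          # scale each row by BOP wealth
colSums(dollar.contrib)                         # each asset's two-period contribution
sum(dollar.contrib)                             # 0.0556
prod(1 + port.ret) - 1                          # check: also 0.0556
```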

# Calculate weighted contributions
# cumulative returns lagged forward to represent beginning of the period portfolio value
lag.cum.ret <- na.fill(lag(cumprod(1+result.quarters$returns),1),1) 
# multiply by contributions to get weighted contributions
wgt.contrib = result.quarters$contribution * rep(lag.cum.ret, NCOL(result.quarters$contribution))

# Create end of year dates for xts timestamps
dates = c(seq(as.Date("2003/12/31"), tail(index(result.quarters$returns),1), "years"),
          tail(index(result.quarters$returns),1))

# Summarize weighted contributions by year
ann.wgt.contrib = apply(wgt.contrib, 2, function(x) apply.yearly(x, sum))
ann.wgt.contrib = as.xts(ann.wgt.contrib, = dates)

# Normalize to the beginning of period value
p.ann.contrib = NULL
for(i in 2003:2014)
  p.ann.contrib = rbind(p.ann.contrib,
                        colSums(wgt.contrib[as.character(i)] /
                                rep(head(lag.cum.ret[as.character(i)], 1),
                                    NCOL(wgt.contrib))))
p.ann.contrib = as.xts(p.ann.contrib, = dates)
p.ann.contrib = cbind(p.ann.contrib, rowSums(p.ann.contrib))
colnames(p.ann.contrib) = c("SPY Contrib", "AGG Contrib", "Portfolio Return")
#            SPY Contrib  AGG Contrib Portfolio Return
# 2003-12-31  0.07116488  0.001458576       0.07262346
# 2004-12-31  0.06465335  0.015395509       0.08004886
# 2005-12-31  0.02927809  0.009055321       0.03833341
# 2006-12-31  0.09375945  0.016132091       0.10989154
# 2007-12-31  0.03113908  0.027212530       0.05835161
# 2008-12-31 -0.23405576  0.028503181      -0.20555258
# 2009-12-31  0.15850172  0.011712674       0.17021440
# 2010-12-31  0.09597257  0.025305730       0.12127830
# 2011-12-31  0.01689928  0.031037641       0.04793692
# 2012-12-31  0.09585370  0.015927463       0.11178116
# 2013-12-31  0.18533901 -0.008329840       0.17700917
# 2014-08-29  0.03670684  0.019700427       0.05640727

So that provides the annual contribution of each asset for each year. Let’s check the result – do the annual contributions for each instrument sum to the portfolio return for the year?

# Calculate the annual return of the portfolio for each year between 2003 
# and current YTD
period.apply(result.quarters$returns, INDEX=endpoints(result.quarters$returns, "years"),
             FUN=Return.cumulative, geometric=TRUE)
#            portfolio.returns
# 2003-12-31        0.07262346
# 2004-12-31        0.08004886
# 2005-12-30        0.03833341
# 2006-12-29        0.10989154
# 2007-12-31        0.05835161
# 2008-12-31       -0.20555258
# 2009-12-31        0.17021440
# 2010-12-31        0.12127830
# 2011-12-30        0.04793692
# 2012-12-31        0.11178116
# 2013-12-31        0.17700917
# 2014-10-15        0.03793038 
# Yes, the results match!

So that’s an example of how one would go about aggregating return contributions from a higher periodicity (monthly) to a lower periodicity (yearly) within a portfolio.

Knowing that, I went ahead and drafted a function for aggregating contributions, called to.period.contributions, that’s in the sandbox on R-Forge. Once you’ve sourced the function into your environment, you can aggregate contributions like this:

to.period.contributions(result.quarters$contribution, "years")
#            SPY.Adjusted AGG.Adjusted Portfolio Return
# 2003-12-31   0.07116488  0.001458576       0.07262346
# 2004-12-31   0.06465335  0.015395509       0.08004886
# 2005-12-30   0.02927809  0.009055321       0.03833341
# 2006-12-29   0.09375945  0.016132091       0.10989154
# 2007-12-31   0.03113908  0.027212530       0.05835161
# 2008-12-31  -0.23405576  0.028503181      -0.20555258
# 2009-12-31   0.15850172  0.011712674       0.17021440
# 2010-12-31   0.09597257  0.025305730       0.12127830
# 2011-12-30   0.01689928  0.031037641       0.04793692
# 2012-12-31   0.09585370  0.015927463       0.11178116
# 2013-12-31   0.18533901 -0.008329840       0.17700917
# 2014-10-15   0.01455321  0.023377173       0.03793038

Along with that I created a few wrapper functions: to.weekly.contributions, to.monthly.contributions, to.quarterly.contributions, and to.yearly.contributions. Give those a shot and let me know if you see any issues. Thanks again to Paolo for the feedback!


PerformanceAnalytics update released to CRAN

Version number 1.4.3541 of PerformanceAnalytics was released on CRAN today.

If you’ve been following along, you’ll note that we’re altering our version numbering system.  From here on out, we’ll be using a “major.cran-release.r-forge-rev” form so that when issues are reported it will be easier for us to track where they may have been introduced.

Even though PerformanceAnalytics has been in development for almost a decade, we haven’t made significant changes to the interfaces of the functions – hence the major release number hasn’t changed from “1”.  I’ll warn you that we are working on revisions to many of the chart functions that might cause us to change some of those interfaces significantly (in ways that break backward compatibility), in which case we’ll increment the major release.  Hopefully we’ll be able to provide wrappers to avoid breaking much, but we’ll see.  That development is ongoing and there’s no deadline at the moment, so maybe next year. On the other hand, it’s going pretty well and generating a lot of excitement, so maybe sooner.

This is our 4th CRAN release after 1.0, so the minor number moves to 4.  We’ve been releasing the package to CRAN a couple of times a year with some regularity over the last seven years, although it’s slowed as the package has grown and demands from the CRAN maintainers have increased.

This release is tagged at rev. 3541 on R-Forge.  During the last year most of our development activity has been on other related packages, GSOC projects, and more speculative projects.  Little new functionality has found its way into this new release – this release is mostly bug fixes with a few new functions thrown in here and there. If you’re interested, you can follow along with package development by grazing through the sandbox directory on R-Forge. There’s quite a bit in there that is close but needs to be carried over the finish line.

We continue to welcome suggestions, contributions, and patches – whether for functionality or documentation.

GSOC 2014: Let’s do it again!

Google Summer of Code opened for students on Monday, March 10, more than a month earlier than last year.  If you weren’t following the announcements and that deadline caught you wrong-footed, the good news is that students will know their fate by April 21, well before the summer starts.

In other good news, the R Project has once again been selected as a mentoring organization, and a variety of mentors have proposed a number of projects for students to work on during this summer.  If you’re interested, here’s a quick introduction to the GSOC program and a pointer to the R-related project ideas that are lining up for students this summer.

About Google Summer of Code

A quick reminder – Google brings together students with mentors to work on open-source projects of their choosing.  Mentors get code written for their project, but no money; students get paid $5,000, equivalent to a nice summer internship.

If you’re a student and you’re interested in something R-related, pick something you’re interested in working on (whether a mentor has submitted an interesting idea you want to pursue, or you have an idea and want a mentor).  With an idea in hand, submit a project application directly to Google before Friday, March 21 at 12:00pm PDT.   Google will award a certain number of student slots to the “R Project for Statistical Computing,” and projects will be ranked and slots allocated by the GSOC-R administrators and mentors.

Finance-related Projects

Like last year, there are a few proposed R projects that are finance-related.  I won’t go through them here, but look for the names of past mentors: Jonathan Cornelissen, Kris Boudt, David Ardia, and Doug Martin; as well as new mentors such as Daniele Signori and David Matteson.  This promises to be a very productive summer…

This year, Brian Peterson and I are looking for a student to work on improving time series visualization for xts time series objects, building on what we learned in successful GSOC projects in 2012 and 2013.  This project will specifically focus on developing multi-panel time series charts that may be created from specified panels with a chart layout. Charts may (eventually) be composed of panels with several different chart types, but the focus this summer is only on time series charts that may be linked via the x- and/or y-axes.

By default, plot.xts will simply chart the time series data in the form it is passed in, using a default panel that is a simple line chart. The following will show a single panel line chart with six lines:

> dim(x.xts) 
[1] 60 6 
> plot(x.xts)

A panel function will be used to define transformations of the data and its display. For example, there might be a function called panel.CumReturns that takes return data, chains together the individual returns, and produces a line chart of cumulative returns through time. This code will show a single panel as defined in panel.CumReturns with six lines:

> dim(x.xts) 
[1] 60 6 
> plot(x.xts, panels=panel.CumReturns)

Multiple panels can be used in a chart.  Say we have a panel.BarVaR function that takes returns and plots only the first series in a bar chart overlayed with an expanding window calculation of VaR for that asset. And we have a panel.Drawdowns function that produces a line chart of drawdowns through time for all time series returns data passed in.  These panels can be passed the same way via an argument. In this case, the layout will be simply be divided by the number of panels, for example, divided into thirds in the following:

> dim(x.xts) 
[1] 60 6 
> plot(x.xts, panels = c(panel.CumReturns, panel.BarVaR, panel.Drawdowns)) 

This would result in a three-panel chart, each panel with six data series available (although a panel function may choose not to draw all of them).

There’s much more, but that should whet your appetite.

Functions will likely be included in xtsExtra, an R package that provides supplementary functionality for xts. The package also served as a development platform for the GSoC 2012 and 2013 xts projects, housing experimental code that may eventually end up in the xts package.

There are also several other very interesting projects proposed for the broader R Project organization as well. Take a look – these are in various states of needing students or mentors.

Students, start your proposal…

Students should also take a look at the R Project’s proposal template as a starting point.  Proposals are expected to be very detailed, and may run to ten or more pages.  In short, this is a competitive process and you will need to put your best foot forward.  I should also note that the process is very iterative – you’ll get feedback as time goes on and will be expected to be responsive to the questions people ask.  Project mentors usually also propose a test – some task that they think is representative of the summer’s work that will help demonstrate your skills and fitness for the project.

Or, consider bringing something new to the table.  This is an active, dynamic group of people who have a broad set of interests, and the process can accommodate well-proposed ideas that garner support.

Good luck, and I hope to hear from you soon.

Some belated spring cleaning

A very busy spring has transitioned into a very busy summer, so let me recap a few topics that probably deserve more time than I’ll give them here. Here are the things I’m overdue on, in no particular order:


In the March edition of the Journal of Risk, Kris Boudt, Brian Peterson and I published a paper titled Asset allocation with conditional value-at-risk budgets. You can also see a pre-publication version on SSRN. It was nice to finally see this in print – many thanks to my co-authors for all their work on an interesting topic.

Equal CVaR Concentration

Panel 3 of Figure 4 shows the weights through time and contribution of CVaR for a minimum CVaR concentration portfolio.

Dirk Eddelbuettel’s book is finally out. Congrats to him – that’s a nice accomplishment! I tried to steal the pre-print at the R/Finance conference, but Dirk made me buy my own copy.

R/Finance 2013

R/Finance 2013 went very well. This event has already been covered here and here – even with an article in the R Journal – but I thought I’d briefly mention a few highlights.

I thought the keynote speakers were fantastic. Every time I see Attilio Meucci speak, I learn something new about a topic I thought I already knew pretty well. This time, Attilio pulled out several animated visualizations that were very thoughtfully designed – each presented a huge amount of information in a linked way that showed relationships between measures and how they changed dynamically through time. Each of the animations served to underscore the (sometimes simple) intuition behind the complex math. “A quant presentation without equations,” he said. Exceptionally well done – developing the intuition behind these concepts is a significant challenge, even in a room full of quants. No slides, but more on that later.

Revolution’s blog did more justice to Ryan Sheftel’s talk than I’m going to do here. Ryan did an excellent job describing the implementation issues within a large organization, providing a strong dose of reality that I think was appreciated by the audience of practitioners.

Sanjiv Das hit one out of the park as well. Flip through his slides when you have a chance – it was a nice demonstration of how he’s used R in very different projects related to finance. He’s a polymath. I particularly enjoyed his talk on network analysis using SEC and FDIC filings to identify banks that pose systemic risk – a talk that echoed one given by Michael Gordy, a senior economist at the FRB, in 2012.

I have to plump for the hometown, as well. Ruey Tsay always has something interesting up his sleeve, and this presentation was no different. He warned that this is work in progress, but with Y. Hu he’s developing Principal Volatility Components as a way to identify common volatility components among financial assets. That struck me as work that is well worth tracking.

That was more than I had intended to write on the topic, but a few other presentations stood out to me as well: David Matteson’s talk on change points was accompanied by an excellent paper; Samantha Azzarello’s presentation on a Bayesian interpretation of the Taylor Rule was as well; Thomas Harte gave another from-the-trenches viewpoint; David Ardia; Ronald Hochreiter; Alexios Ghalanos, and many others – there were a number of excellent sessions. We were also glad to have several returning speakers – Doug Martin, Kris Boudt, Bernhard Pfaff, Jiahan Li, Bryan Lewis. Lightning talks were also well received, particularly Winston Chang’s demonstration of Shiny. Jan Humme and Brian Peterson did a very nice overview of quantstrat in the pre-conference tutorials. All of the slides for the 2013 conference are here. Great stuff – take a look. Then pencil in the 2014 conference in May of next year…

Other R Conferences

Continuing on the topic of finance-related R conferences, congratulations go to Markus Gesmann and Cass Business School for organizing the inaugural R in Insurance conference this year. If you missed it, as I did, check out the presentations here and plan your travel accordingly for next year.

On the other side of the ledger, this was the last year that Diethelm Wuertz’s Rmetrics conference will be held in Meielisalp. I didn’t make this one, but I’ve since heard that next year’s will be held in Paris.

Google Summer of Code 2013

GSOC 2013 has not only started, but is well underway. Of the nineteen R projects going on, six are finance-related. In no particular order:

All this activity is resulting in a tremendous amount of code covering a variety of topics and projects. Thanks to all who are participating – both mentors and students – and to Google for supporting open source! I’ll try to provide more detailed project wrap-ups at the end of the summer.


I’ve made some changes to blotter recently for handling account-level transactions, such as additions and withdrawals (rev. 1485). That should improve the package’s functionality for cash reconciliation. The functionality is pretty rudimentary, but it appears to work. Let me know if you see opportunities for improvement. Blotter is pretty close to CRAN-ready at this point, but requires a final push that is incongruous with good weather.

A very belated thanks to Brian Peterson for pushing out version 1.1 of PerformanceAnalytics to CRAN early this year. I’ve been intending to go over some of the significant changes in that version for months now, but we might have another version out before I get the posting done. Never mind.

R/Finance 2013 Is Coming Quickly…

There are about two weeks remaining until R/Finance 2013 – being held on May 17th and 18th at UIC in Chicago.  Make sure you register beforehand to ensure you have a spot, and – yes – you do want to come to the conference dinner on Friday.

I am particularly excited about the lineup of keynotes this year, which includes:

  • Sanjiv Das – Santa Clara University; Author of Derivatives: Principles and Practice;
  • Attilio Meucci – Chief Risk Officer at Kepos Capital, LP; Author of Risk and Asset Allocation;
  • Ryan Sheftel – Managing Director for Electronic Market Making at Credit Suisse; and
  • Ruey Tsay – University of Chicago; Author of An Introduction to Analysis of Financial Data with R.

In addition, the agenda for the two-day conference is quite interesting – I’m anticipating several pages of interesting things to try coming out of this lineup.

And there are several optional pre-conference sessions this year, some of which are close to sold out – you’ll want to act quickly if you want a seat.  Those cover topics and packages such as quantstrat, data.table, Rcpp, distributed computing, and whatever Jeff Ryan has on his mind (which is always interesting).

Make sure to introduce yourself – I hope to see you there!

Writing from R to Excel with xlsx

Paul Teetor, who is doing yeoman’s duty as one of the organizers of the Chicago R User Group (CRUG), asked recently if I would do a short presentation about a “favorite package”.  I picked xlsx, one of the many packages that provides a bridge between spreadsheets and R.  Here are the slides from my presentation last night; the script is below.

I’ll be honest with you – I use more than one package for reading and writing spreadsheets. But this was a good opportunity for me to dig into some unique features of xlsx and I think the results are worth recommending.

A key feature for me is that xlsx uses the Apache POI API, so Excel isn’t needed.  Apache POI is a mature, separately developed API between Java and Excel 2007.  That project is focused on creating and maintaining Java APIs for manipulating file formats based on the Office Open XML standards (OOXML) and Microsoft’s OLE 2 Compound Document format (OLE2).  As xlsx uses the rJava package to link Java and R, the heavy lifting of parsing XML schemas is being done in Java rather than in R.
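As a minimal example of the basic write path (the file name and data here are invented for illustration; a Java runtime is required):

```r
# Write a data frame to an .xlsx workbook without Excel installed
# (file name and data are hypothetical illustrations)
library(xlsx)
df <- data.frame(symbol = c("SPY", "AGG"), weight = c(0.6, 0.4))
write.xlsx(df, file = "portfolio.xlsx", sheetName = "weights",
           row.names = FALSE)
```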

GSoC and R: Off to the Races

Google Summer of Code has now opened for student applications, and the R Project has once again been selected as a mentoring organization.  I’ve discussed before that a variety of mentors have proposed a number of projects for students to work on during this summer, but I wanted to emphasize some points about the schedule.

The deadline for student submissions is May 03 at 19:00 UTC.  You have to have a credible application in Melange by this time, or the application will not get a slot.  That’s not a lot of time to create or pick an idea, write an application, identify a mentor, sign up for a Melange account, and post your application.  But students can improve their applications once they are posted, so it is worth putting up an incomplete draft if you need to.

Even after the deadline, students will receive questions and advice for improvements from the mentors once the application is up, and should be responsive to those requests.  All of the mentors are involved in voting about which projects will be funded.

Google has extended the amount of time spent determining slots this year, so the ‘behind-the-scenes’ process will be longer than it was in past years.  Accept/reject notices to students will come on May 31st.

Everyone who wants to participate in this year’s Google Summer of Code with R should join the Google Group:

Good luck!

GSoC 2013: At the starting line

Google Summer of Code will be open for students on Monday, April 22.  The R Project has once again been selected as a mentoring organization, and a variety of mentors have proposed a number of projects for students to work on during this summer.  Here’s a bit about the program, and more on the R-related projects that are lining up for students this summer.

About Google Summer of Code

The concept is relatively simple – Google brings together students with mentors to work on open-source projects of their choosing.  Mentors get code written for their project, but no money; students get paid $5,000, equivalent to a nice summer internship.

If you’re a student and you’re interested in something R-related, pick something you’re interested in working on (whether a mentor has submitted an interesting idea you want to pursue, or you have an idea and want a mentor).  With an idea in hand, submit a project application directly to Google.   Google will award a certain number of student slots to the R project, and projects will be ranked and slots allocated by the GSOC-R administrators and mentors.



GSOC 2013: IID Assumptions in Performance Measurement

Google Summer of Code for 2013 has been announced and organizations such as R are beginning to assemble ideas for student projects this summer. If you’re an interested student, there’s a list of project proposals on the R wiki. If you’re considering being a mentor, post a project idea on the site soon – project outlines end up being 1-2 pages of text, plus references – and they should be up on the wiki by mid-to-late March. Google will use the listed project outlines as part of its criteria for accepting the R project for another year of GSoC and in its preliminary budgeting of slots.

I’ve posted one project idea so far, one that would extend PerformanceAnalytics’ standard tools for analysis to better deal with various violations of a standard assumption that returns are IID (that is, each observation is drawn from an identical distribution and is independent of other observations).

Observable autocorrelation is one of those violations. A number of different approaches for addressing autocorrelation in financial data have been discussed in the literature. Various authors, such as Lo (2002) and Burghardt et al. (2012), have noted that the effects of autocorrelation can be huge, but are largely ignored in practice. Burghardt observes that the effects are particularly interesting when measuring drawdowns, a widely used performance measure that describes the performance path of an investment. Recently, Bailey and López de Prado (2013) have developed a closed-form solution for estimating drawdown potential without having to assume IID cash flows.
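To make the point concrete, here is a small illustration of my own (a sketch using Lo's (2002) variance scaling, not code from the project proposal) of how positive autocorrelation breaks the usual square-root-of-time volatility scaling:

```r
# Simulated AR(1) monthly returns; under IID the annualized volatility is
# sd * sqrt(12), but autocorrelation changes the multi-period variance
set.seed(42)
r <- as.numeric(arima.sim(list(ar = 0.3), n = 240, sd = 0.02))
q <- 12
rho <- acf(r, lag.max = q - 1, plot = FALSE)$acf[-1]  # sample autocorrelations
naive <- sd(r) * sqrt(q)                              # IID scaling
adjusted <- sd(r) * sqrt(q + 2 * sum((q - 1:(q - 1)) * rho))  # Lo (2002)
c(naive = naive, adjusted = adjusted)  # adjusted exceeds naive when rho > 0
```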

There’s more detail at the project site, including a long list of references. I’d be glad to hear from you if you have any ideas, thoughts, or even code in this vein (or others). Here are a few of the references to get you thinking: