The Paul Tol 21-color salute

You may or may not know that PerformanceAnalytics contains a number of specific color schemes designed for charting data in R (they aren’t documented well, but they show up in some of the chart examples). I’ve been collecting color palettes for years in search of good combinations of attractiveness, relative weight, and distinctiveness, helped along the way by great sites like ColorBrewer and packages like RColorBrewer. I’ve assembled palettes that work for specific purposes, such as the color-focus palettes (e.g., redfocus is red plus a series of dark to light gray colors). Others, such as the rich#equal palettes (where # is the number of colors), provide a palette for displaying data that all deserve equal treatment in the chart. Each of these palettes has been designed to create readable, comparable line and bar graphs, with specific objectives outlined before each category below.

I use this approach rather than generating schemes on the fly for two reasons: it creates fewer dependencies on libraries that don’t need to be called dynamically, and it guarantees the color used for the n-th column of data.
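For example, with a static palette the n-th column of data always gets the n-th color, no matter where or when the chart is drawn. Here’s a small illustration using a palette and the example data shipped with PerformanceAnalytics (rich6equal is one of the equal-weight sets mentioned above):

require(PerformanceAnalytics)
data(managers)
# the first six columns always map to the first six colors of the set
chart.CumReturns(managers[, 1:6], colorset = rich6equal,
                 legend.loc = "topleft", main = "Cumulative Returns")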

Oh, and here’s a little utility function for displaying a palette (one that I don’t think I wrote – EDIT: that I know I didn’t write, since it was written by Achim Zeileis and is found in his colorspace package – but have carried around for quite a while):

# Function for plotting colors side-by-side
pal <- function(col, border = "light gray", ...){
  n <- length(col)
  plot(0, 0, type="n", xlim = c(0, 1), ylim = c(0, 1),
       axes = FALSE, xlab = "", ylab = "", ...)
  rect(0:(n-1)/n, 0, 1:n/n, 1, col = col, border = border)
}
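To look at one of the palettes mentioned above (redfocus and its siblings are available once PerformanceAnalytics is loaded):

require(PerformanceAnalytics)
pal(redfocus)  # red in the lead position, then dark-to-light grays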



Visually Comparing Return Distributions

Here is a spot of code to create a series of small multiples for comparing return distributions. You may have spotted this in a presentation I posted about earlier, but I’ve been using it here and there and am finally satisfied that it is a generally useful view, so I functionalized it.

require(PerformanceAnalytics)
data(edhec)
page.Distributions(edhec[, c("Convertible Arbitrage", "Equity Market Neutral",
                             "Fixed Income Arbitrage", "Event Driven",
                             "CTA Global", "Global Macro", "Long/Short Equity")])

[Figure: Compare-Returns – small multiples comparing the return distributions of the selected EDHEC indexes]

R/Finance 2013 Call for Papers

It’s that time of year again – we’ve just posted our Call for Papers for the R/Finance 2013 conference, which focuses on applied finance using R. This is our fifth annual conference, again organized by a group of R package authors and community contributors and hosted by the International Center for Futures and Derivatives (ICFD) at the University of Illinois at Chicago.

The conference will be held this spring in Chicago, IL, on Friday May 17 and Saturday May 18, 2013.

I’m particularly excited about our lineup of speakers this year, which we’ve just finalized:

Sanjiv Das is a Professor of Finance and the Chair of the Finance Department at Santa Clara University’s Leavey School of Business. He is also the author of Derivatives: Principles and Practice, a senior editor of The Journal of Investment Management, and co-editor of The Journal of Derivatives. You’ll find R spread through most of his work and his blog.

Attilio Meucci is the Chief Risk Officer at Kepos Capital, L.P. and author of Risk and Asset Allocation. He is a thought leader in advanced risk and portfolio management, and somewhat rare in the world of financial research in that he regularly posts code along with his working papers – a characteristic that I deeply appreciate. Unfortunately for me and the broader R finance community, he prefers to code in MATLAB. All of Meucci’s original MATLAB source is available on http://www.symmys.com, but a recent Google Summer of Code project was dedicated to translating some of it to R.

Ryan Sheftel is a Managing Director for Electronic Market Making at Credit Suisse, where he has been rolling out increasing automation in the firm’s Treasury bond execution services for clients. He has noted publicly that many of Credit Suisse’s best traders spend a lot of time pounding away writing code – and, perhaps unusually for a senior manager at a large bank, he spends time coding in R himself.

Ruey Tsay is a Professor of Econometrics and Statistics at the University of Chicago Booth School of Business. R users may be interested in his new book, An Introduction to Analysis of Financial Data with R, or may already own an edition of Analysis of Financial Time Series, a core text that he applies in his time series analysis course at Chicago. Also look for companion packages on CRAN.

Hopefully that will whet your appetite enough for you to make plans to attend.

But perhaps you should consider speaking. We’re looking for speakers whose work focuses significantly on the application of R (and R packages) to finance. We strongly encourage speakers to provide working R code to accompany the presentation or paper, as our audience enjoys being able to take concrete ideas and apply them to their own problems after the conference.

Ideally, data sets would also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). We tend to give preference to presenters who have released R packages.

As in previous years, we will keep all presentations in one track in a large presentation hall with dual projection screens and a stage. This allows all of our conference participants to see all presentations. Given that we have had well over 200 attendees in prior years and a mix of academics and practitioners, you should plan for this type of large and varied audience.

So, unlike an academic conference where you may be presenting your work to 10-15 people who are highly knowledgeable in your field of expertise, you will be presenting to an audience with more varied skills and interests: think TED talk and not detailed exposition of your theory to experts.

Presentations that have been best received in the past have clearly communicated the motivation for the work and how it could be applied in practice. Presentations that have been less well received have sought to go through the detailed math behind the theories, or have had an unclear link to R.

Hopefully that will give you a sense of what we’re looking for, assuming you haven’t attended before. This has been a conference I’ve really enjoyed in the past, and I’m sure this year will be no exception. Much of that comes from hanging out with the attendees – and I hope to see you there, too.

xts and GSoC 2012

Josh Ulrich and Jeff Ryan mentored a Google Summer of Code (GSoC) project this summer focused on experimental functionality for xts, in collaboration with R. Michael Weylandt, a student in operations research and financial engineering from Princeton. You might recognize Michael from his presentation at R/Finance this year, where he gave a talk entitled “A Short Introduction to Real-Time Portfolio/Market Monitoring with R”.

There were three main objectives of this GSoC project. One was to extend the plotting functionality of xts – to replace the existing plot.xts function with something much more generally useful, and to add a barchart.xts primitive that handles stacked bars for time series with negative values. The proof of concept for both of these graphics comes from chart functions in PerformanceAnalytics, but a variety of other improvements were also discussed.
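If you want a feel for the stacked-bar behavior in the meantime, the existing proof of concept in PerformanceAnalytics can be exercised directly; here is a quick sketch using the edhec data that ships with the package:

require(PerformanceAnalytics)
data(edhec)
# stacked bars by period; negative values are drawn below the axis
chart.StackedBar(edhec["2008", 1:6], colorset = rich6equal)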

Another objective was to experiment with supporting multiple data types within the same object for time series. The concept here is something like a data.frame, which allows class-specific list elements, aligned on an index. Michael wrote a prototype and definitely moved the ball forward here. Fuller functionality will require more test cases to be written to validate the approach and flush out bugs, as well as to add a number of utility functions such as rbind, cbind, etc.
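To see why that’s non-trivial, remember that an xts object is a matrix at its core, so mixed column types collapse to a single type on the way in – exactly the limitation the prototype aims to remove:

library(xts)
df <- data.frame(price = c(100.5, 101.2), note = c("open", "halt"),
                 stringsAsFactors = FALSE)
idx <- as.Date(c("2012-01-02", "2012-01-03"))
x <- xts(as.matrix(df), order.by = idx)
storage.mode(coredata(x))  # "character" -- the prices are now strings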

The third objective was to provide ‘bridge’ functionality connecting xts objects to methods that assume a regular time series, such as AR/ARIMA, Holt-Winters, or VAR methods, using something like the zooreg subclass and some translations. Michael provides a number of these for arima, acf, pacf, HoltWinters, and others. These are convenience wrappers for xts users that marshal the xts data into the underlying functions and then, where appropriate, coerce the results (such as residuals in the case of arima) back into xts objects.
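Hand-rolling that pattern shows the idea (a sketch of the approach, not Michael’s implementation):

library(xts)
x <- xts(rnorm(24),
         order.by = seq(as.Date("2010-01-01"), by = "month", length.out = 24))
fit <- arima(as.numeric(coredata(x)), order = c(1, 0, 0))   # the underlying function sees a plain vector
res <- xts(as.numeric(residuals(fit)), order.by = index(x)) # results coerced back to xts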

The result is contained in a supplementary package called xtsExtra, which Michael constructed as a side-pocket for newly developed functionality, any or all of which may end up in the xts package at some point. Beyond Jeff and Josh, Michael opened the work up to the broader r-sig-finance community for feedback on xtsExtra, which resulted in several helpful conversations with Jonathan Cornelison, Eric Zivot, Rob Hyndman, Stuart Greenlee, Kenton Russell, Brian Peterson, and me.

I want to step back to the first objective to talk for a moment about plot.xts. klr at TimelyPortfolio immediately took to the code and exercised it well – here is a particularly good chart. Here’s another. And another. Oh, and this one! These were great examples, and I think they are suggestive of how the function could be extended even further, perhaps simplifying the interface and extending the panel functionality. That might require some significant re-work, but I think the results will be well worth it. I think Jeff Ryan might have some tricks up his sleeve as well…

We’ll see where some of this speculation goes, but I want to thank Michael again for his commendable efforts this summer! He has made a considerable effort to extend and improve xts in some very useful ways, and I’m looking forward to his continued involvement in this and perhaps other endeavors.

FinancialInstrument Moves to CRAN

I thought I would break up the posts about GSOC (no, I’m not done yet – there are a few more to do) with a quick note about FinancialInstrument.

The FinancialInstrument package provides a construct for defining and storing meta-data for tradable contracts (referred to as instruments, e.g., stocks, futures, options, etc.). The package can be used to define instruments of any asset class, including derivatives, so it is required by packages like blotter and quantstrat.

FinancialInstrument was originally conceived as blotter was being written. Blotter provides portfolio accounting functionality, accumulating transactions into positions, then into portfolios and an account. Blotter, of course, needs to know something about the instrument being traded.

FinancialInstrument is used to hold the meta-data about an instrument that blotter uses to calculate the notional value of positions and the resulting P&L. FinancialInstrument, however, has plenty of utility beyond portfolio accounting, such as pre-trade pricing, risk management, etc., and was carved out so that others might take advantage of its functionality. Brian Peterson did the heavy lifting there, constructing FinancialInstrument as a meta-data container based on a data design we developed for a portfolio management system years ago.

Utility packages like this are generally thankless work, although incredibly useful and powerful for (potentially several) end applications. They quietly do a bunch of heavy lifting that allows user interfaces to be simpler, more powerful, and more flexible than they otherwise might be, and they allow the developer to focus on the specific application rather than re-inventing already-existing but trapped functionality.

Thankfully, Garrett See has found both the time and motivation to take what was a useful but unfinished package and help Brian carry it across the finish line into CRAN. Garrett also added a great deal of functionality around managing the .instrument environment, such as ls_instruments() and many other ls_* and rm_* functions. The ls_* functions return the names of instruments of a particular type or denominated in a given currency (or currencies), while the rm_* functions remove instruments. Similarly, a series of update_* functions help update instruments from various sources, such as Yahoo!.
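A quick taste of that workflow, using function names as they appear in the CRAN release:

require(FinancialInstrument)
currency("USD")                    # define the base currency first
stock("AAPL", currency = "USD")    # register a stock and its meta-data
update_instruments.yahoo("AAPL")   # fill in name, exchange, etc. from Yahoo!
ls_instruments()                   # names of everything defined so far
getInstrument("AAPL")              # retrieve the stored meta-data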

At this point, FinancialInstrument has a lot of functionality. Let’s take a closer look…

Conversion of Meucci’s MATLAB Code

You might remember a second proposal I put forward for this summer’s Google Summer of Code (GSoC). This project was ambitious, looking to convert a subset of Attilio Meucci’s MATLAB code to R. Thankfully, Brian Peterson took the lead mentor position for this particular project.

Coincidentally, the day before GSoC started, we received a very generous contribution of ported code from Ram Ahluwalia at Wingfoot Capital. He even provided a csv file that detailed the forty-some functions and mapped them to Meucci’s original papers and script files.

In response, we re-scoped the project to focus on the code contribution and shape it into a stand-alone package that would mirror Attilio’s code and match the results he displays in his papers and publications. Manan, our GSoC student for the project, proceeded through Ram’s code, testing and organizing it into a package named ‘Meucci’ that is now available on r-forge.

Ram had converted several of Meucci’s interesting MATLAB scripts to R, such as robust Bayesian portfolio optimization from his script “Meucci_RobustBayesian,” and entropy pooling for blending views on scenarios with a prior scenario-probability distribution covered in his paper, “Fully Flexible Views: Theory and Practice.” Other topics included detecting outliers using the minimum volume ellipsoid from the script for “S_HighBreakdownMVE.m,” and his marginal copula algorithm.

Manan was also able to extend the code. He added code around Meucci’s “Managing Diversification” article, published in Risk Magazine in June 2009, and code from his “Review of Statistical Arbitrage, Cointegration, and Multivariate Ornstein-Uhlenbeck”.

This has all been factored into an installable package on r-forge that contains twenty-some exported functions, a number of utility functions, six demos, and some degree of documentation for most of them. There is still much to do here, but this package should provide a good foundation for future work. All of Meucci’s original MATLAB source is available on http://www.symmys.com; browsing through the extensive material he makes available will give you a sense of how ambitious we are…

A very big thanks goes to Ram Ahluwalia for the significant code contribution, and congratulations to Manan Shaw on a successful GSoC 2012.

By the way, I probably haven’t said enough about Brian Peterson’s involvement in the GSoC this summer. Beyond being the lead mentor for this project and co-mentoring others, Brian was one of the representatives of the R Project overall. His work in leading and organizing the broader R Project effort this summer took a great deal of time, but resulted in a project list that was enthusiastically received by students and mentors alike. Many thanks to him as well for all his efforts this summer.

New Attribution Functions for PortfolioAnalytics

Another Google Summer of Code (GSoC) project this summer focused on creating functions for doing returns-based performance attribution. I’ve always been a little puzzled about why this functionality wasn’t covered already, but I think that most analysts do this kind of work in Excel. That, of course, has its own perils. But beyond the workflow issues, there have been a number of methodological refinements through time that I’m guessing most analysts don’t take advantage of. Meanwhile, FactSet, Morningstar, and other vendors provide attribution functionality embedded within their reporting products.

I’m of the opinion that R is a natural place for this kind of functionality to reside, and portfolio attribution has long been on the list of additions to the PortfolioAnalytics package. The calculations are relatively straightforward, but the mechanics of specifying and managing a portfolio hierarchy can be inconvenient, and they require attention to calculation order and other details as well.

The mentors for this project included David Cariño, a Research Fellow at Russell Investments, co-author of the book used as a reference[1] for the project, and teacher of a course on this topic at the University of Washington. Doug Martin, Professor of Statistics, Adjunct Professor of Finance, Director of Computational Finance, and former Chairman of the Department of Statistics at the University of Washington, was also a mentor.

Again, student interest in the project was strong. The mentors for GSoC decided that Andrii Babii, a student at Toulouse School of Economics, provided the strongest proposal and example code.

Working from Christopherson, Carino and Ferson (2009), Bacon (2008), and several other sources, Andrii proceeded to construct functions for calculating contribution and linking returns through time both arithmetically and geometrically. He supports a variety of methods for multi-period linking including Carino, Menchero, Davies-Laker, Frongello, and GRAP.
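The reason linking methods are needed at all is easy to see in a toy example: per-period arithmetic excess returns don’t sum to the geometrically linked multi-period excess return, so the residual has to be smoothed back into the periods – which is what the Carino, GRAP, and related algorithms each do differently:

rp <- c(0.05, -0.02, 0.03)  # portfolio returns for three periods
rb <- c(0.03,  0.01, 0.02)  # benchmark returns for the same periods
(prod(1 + rp) - 1) - (prod(1 + rb) - 1)  # linked excess return, about -0.0012
sum(rp - rb)                             # naive sum of period excesses: 0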

Using his functions, arithmetic effects can be displayed either as suggested in Brinson, Hood and Beebower (1986), or as in Brinson and Fachler (1985). Priority can be switched around from top-down to bottom-up, as well. Currency attribution is included with arithmetic effects handled according to Ankrim and Hensel (1992).

Andrii established a data format for specifying a portfolio hierarchy, which allows him to provide multi-level attribution. This is usually discussed within the Brinson model as “asset-country-sector,” although the function is written generally. It then returns the total multi-period effects and attribution effects at each level. He provides functions for weight aggregation to accomplish this, which I think will also be generally useful in PortfolioAnalytics.

Finally, Andrii also constructed functions for fixed income attribution and for delta-adjusting option returns for attribution.

All of these attribution-related functions will be moved into the PortfolioAnalytics package proper shortly. I think you will find that they are well documented and come with good examples.

But, wait, there’s more! Andrii also knocked out a function called MarketTiming for estimating market timing attribution according to either the Treynor-Mazuy or Merton-Henriksson model. That function is likely to end up in PerformanceAnalytics, since it is estimated using multiple regression and doesn’t require portfolio weights.
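Here’s a hedged example of how that might be called once it lands there (the signature is assumed from the development sources):

require(PerformanceAnalytics)
data(managers)
# Treynor-Mazuy: quadratic regression of a manager against the S&P 500
MarketTiming(managers[, 1], managers[, 8], Rf = managers[, 10], method = "TM")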

He also provided an AcctReturns function that extends the blotter package. This function takes the account data calculated from transactions and prices, along with external cash flows (such as contributions or withdrawals), to calculate time-weighted returns or linked modified Dietz returns. This will be a welcome addition to blotter.

Congratulations to Andrii on a very successful GSoC 2012! This is a substantial contribution to PortfolioAnalytics, and one that I think will see a great deal of use through time. I’m also looking forward to seeing where Andrii’s interests take him, and I hope they result in more contributions. Thanks also go to the mentors, David and Doug, and to Google for making the summer possible.

[1] Christopherson, Jon A., David R. Cariño, and Wayne E. Ferson. Portfolio Performance Measurement and Benchmarking. Wiley Finance Series, 2009. 466 pp.

…Now With More Bacon (2008)!

I’m sure that Carl Bacon[1] sighs deeply when he reads such headlines, but it is clearly appropriate in this case. Perhaps you remember that I proposed a Google Summer of Code project for 2012 around a considerable code contribution to PerformanceAnalytics from Diethelm Wuertz at ETHZ. That code was focused on adding a large number of functions from Carl Bacon’s book, “Practical Portfolio Performance Measurement and Attribution”.

The project garnered strong interest from students. Ultimately, this year’s mentors awarded the project to Matthieu Lestel who is a student at ENSIMAG, an engineering school in France. Matthieu was selected for his detailed proposal and plan for the summer, as well as a good code sample.

Matthieu went on to produce dozens of new functions, extend several existing ones, and add more than 40 pages of additional documentation (complete with formulae and examples) to PerformanceAnalytics. He included Bacon’s small data set and several new table.* functions for testing and demonstrating that the functions match the published results. All on plan, I would add.

He also wrote a very nice overview of the functions developed from Bacon (2008) that are included in PerformanceAnalytics, which should be helpful to readers or teachers of Bacon’s work. Matthieu’s summary document will also be distributed as a vignette in the package, accessible using vignette('PA-Bacon'). Additional detail is included in the documentation for each function, as well. I’ll highlight some of those functions in later posts.
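Once the release is out, a quick way to kick the tires (the data set and function names here are assumed from the development version on r-forge):

require(PerformanceAnalytics)
data(portfolio_bacon)             # Bacon's small example data set
BurkeRatio(portfolio_bacon[, 1])  # one of the new Bacon (2008) measures
vignette("PA-Bacon")              # Matthieu's overview document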

His GSoC code and the vignette currently reside on r-forge in the development codebase. I fully expect that Matthieu’s code will be included in the next CRAN release sometime this fall. You will find that his functions support multi-column data, preserve data labels, and otherwise act consistently with the other functions in PerformanceAnalytics.

Congratulations to Matthieu on a very successful GSoC 2012! This has been a considerable contribution to PerformanceAnalytics, one that I think many users will appreciate and benefit from. I’m looking forward to seeing where his interests take him, and I hope they culminate in more contributions. Thanks also go to Diethelm for the original code contribution, and to Google for making the summer possible.

I want to, again, encourage such code or documentation contributions. If there’s a function or twelve that you think would be broadly useful, please consider contributing. Even if it’s a proof of concept of an interesting idea, a good starting point is very helpful. Hand us enough code, and perhaps we’ll have another great GSoC project for 2013…

References:
[1] Bacon, Carl. Practical Portfolio Performance Measurement and Attribution. London: John Wiley & Sons, September 2004. ISBN 978-0-470-85679-6. 2nd edition, May 2008. ISBN 978-0-470-05928-9.

Framing investing as a decision-making process

Brian Peterson and I had a chance to visit the University of Washington a couple of weeks ago at the behest of Doug Martin, where we gave a seminar covering various R packages we’ve written. Here are the slides we used.

We also had quite a bit of time that we spent with Doug, Eric Zivot, Guy Yollin, David Carino, and others. We had some very good working sessions, mainly around this summer’s Google Summer of Code (GSoC) projects. I’ll talk about those projects as the summer progresses, but for now I wanted to post the slides we used for our seminar and talk a bit about them.

For the seminar, Doug asked us to provide a one-hour overview of some of the R packages we’ve been working on. That was something we hadn’t done in quite a while, so it was a good opportunity for us to pull together a few of our favorite applications that we’ve shown separately before. We thought it might help tie them together if we provided a bit more about the framework we’ve used to organize our thinking through time. So we dusted off some graphics that we used several years ago to frame the discussion.

After some reflection, I think that this framework is still useful.

[Figure: Stylized process and capabilities view of an investing business]

The framework begins with a stylized view of an investment management business and the core business processes that cover innovation, production, compliance, and distribution. My focus, naturally, has been on functionality that supports the research and investment components of the business. That focus can be (and has been, in different contexts) decomposed into sub-processes covering the generation of ideas through implementation and monitoring.

The process view has its limitations, of course, because at the core of any investment business is a set of recurring decisions that need to be made. The decisions that get made are not easily contained within a sub-process — a view on risk can be as relevant to idea generation or portfolio construction as it is to risk remediation. So different tools for analyzing and supporting those decisions end up being used over and over again, in slightly different ways.

The whole purpose of developing these packages in R, then, is to provide tools to help people make high-quality decisions efficiently and effectively, wherever they occur in the investment process. That requires more than tools, of course. Decision-making is, unto itself, a process. Users need decision-focused information to better develop evidence and confidence. They need to make decisions consistently, and to quickly receive and assess feedback about the quality of those decisions. Context does matter, of course, but it is usually better developed by the user than by the tool developer (hence the success of Excel and end-user computing in finance).

With some key decisions outlined, a number of capabilities seem useful. Years ago we decomposed those capabilities into applications, and then broke the applications down further to provide a functional view. In updating these slides, it was a pleasant surprise to see that we and the broader R community have been able to make quite a bit of progress filling in functionality. A fair amount remains to be done, certainly.

The rest of the slides discuss three specific applications built using the R toolchain. The first is returns-based performance analysis, in this case examining a hedge fund against a set of peers. The second examines the construction of a portfolio of hedge fund indexes. And the third is a backtest, in this case a simple trend-following strategy.

[Figure: Packages are separated so that the returns-and-weights context is treated separately from the prices-and-transactions context]

I should point out that these applications bridge two different data contexts. In developing this framework, we decided to separate the returns-and-weights context from prices-and-transactions. That’s been a good decision for helping us to scope projects, although we will eventually provide the functionality for bridging the two contexts seamlessly. The two contexts already work well together with a package for time series data and another for meta-data definition.

So that’s a bit of description about how we came to develop much of the functionality available today. This framework continues to be useful for identifying needs and scoping new projects, as I hope you will see as the summer progresses.


Download and parse EDHEC hedge fund indexes

In our pre-conference workshop, Brian Peterson and I worked with the EDHEC hedge fund indexes as a way to demonstrate how to use PortfolioAnalytics within the context of long-term allocation problems.

Although they are not investible, these indexes are probably more representative than most given that they are, in fact, meta-indexes. Other indexes might be preferable when considering a specific portfolio, however.

Here’s how to parse the data. Unfortunately, there’s no good way to download the data directly (as far as I can figure), so you’ll have to log in (registration is free) and download the data manually. The code shown here only requires PerformanceAnalytics.

Once you have the history.csv file in a local directory, you can parse it into an xts object.
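Here is a minimal sketch of such a parser, assuming the layout EDHEC used at the time (a semicolon-separated file with a date column followed by one column per index, and returns formatted as percentages); adjust it to match the file you actually download:

require(PerformanceAnalytics)  # loads xts as a dependency
raw <- read.csv("history.csv", sep = ";", check.names = FALSE,
                stringsAsFactors = FALSE)
dates <- as.Date(raw[, 1], format = "%d/%m/%Y")
returns <- apply(raw[, -1], 2,
                 function(x) as.numeric(sub("%", "", x)) / 100)
edhec.xts <- xts(returns, order.by = dates)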