Visually Comparing Return Distributions

Here is a bit of code that creates a set of small multiples for comparing return distributions. You may have seen it in a presentation I posted about earlier. I’ve been using it here and there and am finally satisfied that it is a generally useful view, so I turned it into a function.

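require(PerformanceAnalytics)
# page.Distributions() is not yet part of PerformanceAnalytics;
# source page.Distributions.R (listed below) before running this
data(edhec)
page.Distributions(edhec[, c("Convertible Arbitrage", "Equity Market Neutral",
                             "Fixed Income Arbitrage", "Event Driven",
                             "CTA Global", "Global Macro", "Long/Short Equity")])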

When visually comparing distributions, a few characteristics have to be consistent across the graphs. For example, every histogram should use the same bin sizes, and the minimum and maximum of each chart’s axis should line up.
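One way to enforce both rules is to compute a single x-range and a single set of histogram breaks from all series at once, then hand the same values to every panel. The sketch below assumes returns are stored as decimals in the columns of a matrix; `common_scale` is a hypothetical helper, not a PerformanceAnalytics function.

```r
# Hypothetical helper: one shared x-range and one shared set of bins for all series.
common_scale <- function(R, bin.width = 0.01) {
  lo <- min(R, na.rm = TRUE)
  hi <- max(R, na.rm = TRUE)
  # Snap to the bin grid and pad by one bin on each side so hist() never
  # sees data outside the breaks because of floating-point rounding
  breaks <- seq(floor(lo / bin.width) * bin.width - bin.width,
                ceiling(hi / bin.width) * bin.width + bin.width,
                by = bin.width)
  list(xlim = c(lo, hi), breaks = breaks)
}

set.seed(1)
R <- cbind(a = rnorm(120, 0.005, 0.02), b = rnorm(120, 0.002, 0.04))
sc <- common_scale(R)
# Both histograms use sc$breaks, so their bins are directly comparable
ha <- hist(R[, "a"], breaks = sc$breaks, plot = FALSE)
hb <- hist(R[, "b"], breaks = sc$breaks, plot = FALSE)
```

page.Distributions() does the same thing in spirit, using `round(..., digits = 2)` plus or minus 0.01 to build 1% bins.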

I prefer to see all three views together: histogram, QQ plot, and empirical CDF. The histogram is the more familiar view of a distribution, and it is improved by overplotting a normal distribution and marking zero, both of which are important references. The QQ plot is more important, and it too is improved by adding a confidence band for the normal distribution.
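To illustrate what a confidence band on a normal QQ plot captures, here is a base-R sketch of a pointwise simulation envelope. It is an assumption-laden stand-in: chart.QQPlot(envelope = 0.95) may compute its band analytically rather than by simulation, and `qq_envelope` is a hypothetical name.

```r
# Hypothetical sketch: pointwise simulation envelope for a normal QQ plot.
qq_envelope <- function(x, level = 0.95, nsim = 1000) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  # Draw nsim normal samples with the same size, mean and sd; sort each one.
  # Each column of sims is one sorted simulated sample (n x nsim matrix).
  sims <- replicate(nsim, sort(rnorm(n, m, s)))
  alpha <- (1 - level) / 2
  data.frame(
    theoretical = qnorm(ppoints(n)),              # x-axis of the QQ plot
    sample      = x,                              # observed order statistics
    lower       = apply(sims, 1, quantile, probs = alpha),
    upper       = apply(sims, 1, quantile, probs = 1 - alpha)
  )
}

set.seed(42)
r <- rnorm(150, 0.004, 0.02)
env <- qq_envelope(r)
# Points outside [lower, upper] depart from normality at that order statistic
outside <- with(env, sample < lower | sample > upper)
# plot(env$theoretical, env$sample, pch = 20)
# lines(env$theoretical, env$lower); lines(env$theoretical, env$upper)
```

Because the band is pointwise, a few points outside it are expected even for truly normal data; long runs outside it in the tails are the more telling signal.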

This is just a first cut. There’s no reason the normal distribution has to be the reference for these charts, but supporting other references will take more parameterization. There is also a trade-off between the number of rows on the device and readability. Maybe I’ll add some “pagination” like the kind charts.BarVaR uses… What else?
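A minimal sketch of the pagination idea, assuming the goal is simply to cap the number of series (rows) per page. `paginate_columns` is hypothetical, and charts.BarVaR’s own pagination may work differently.

```r
# Hypothetical helper: split column indices into groups of at most rows.per.page
paginate_columns <- function(n.series, rows.per.page = 4) {
  idx <- seq_len(n.series)
  # ceiling(idx / rows.per.page) assigns 1,1,...,2,2,... page numbers
  split(idx, ceiling(idx / rows.per.page))
}

pages <- paginate_columns(7, rows.per.page = 3)
# pages holds 1:3, 4:6 and 7
# for (cols in pages) page.Distributions(edhec[, cols])
```

Each call then draws a page with the same four-panel rows, though the shared x-range would need to be computed across all series first if pages are to stay comparable.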

This is checked into PApages on R-Forge right now, in /sandbox as page.Distributions.R. I’m contemplating including it in PerformanceAnalytics, but I’d like your feedback before I do. Here’s the code:

# Histogram, QQ plot and ECDF plots aligned by scale for comparison
page.Distributions <- function (R, ...) {
  require(PerformanceAnalytics)
  op <- par(no.readonly = TRUE)
  on.exit(par(op))  # restore graphics settings even if plotting fails
  # outer margins are c(bottom, left, top, right)
  par(oma = c(5,0,2,1), mar=c(0,0,0,3))
  # one row per series: label, histogram, QQ plot, ECDF
  layout(matrix(1:(4*NCOL(R)), ncol=4, byrow=TRUE), widths=c(.6,1,1,1))
  # layout.show(n=4*NCOL(R))
  # common range shared by every panel so the charts line up
  chart.mins=min(R, na.rm=TRUE)
  chart.maxs=max(R, na.rm=TRUE)
  # common 1% bins for every histogram
  chart.breaks=seq(round(chart.mins, digits=2)-0.01, round(chart.maxs, digits=2)+0.01, by=0.01)
  # wrap long column names onto several lines for the label column
  row.names = sapply(colnames(R), function(x) paste(strwrap(x,10), collapse = "\n"), USE.NAMES=FALSE)
  colorset = c("black", "#00008F", "#005AFF", "#23FFDC", "#ECFF13", "#FF4A00", "#800000")
  for(i in 1:NCOL(R)){
    if(i==NCOL(R)){
      # last row: draw the axes so the shared scale is labelled once
      plot.new()
      text(x=1, y=0.5, adj=c(1,0.5), labels=row.names[i], cex=1.1)
      chart.Histogram(R[,i], main="", xlim=c(chart.mins, chart.maxs), breaks=chart.breaks,
                      show.outliers=TRUE, methods=c("add.normal"), colorset=colorset)
      abline(v=0, col="darkgray", lty=2)
      chart.QQPlot(R[,i], main="", pch=20, envelope=0.95, col=c(1,"#005AFF"), ylim=c(chart.mins, chart.maxs))
      abline(v=0, col="darkgray", lty=2)
      chart.ECDF(R[,i], main="", xlim=c(chart.mins, chart.maxs), lwd=2)
      abline(v=0, col="darkgray", lty=2)
    }
    else{
      # other rows: suppress the axes to keep the panels compact
      plot.new()
      text(x=1, y=0.5, adj=c(1,0.5), labels=row.names[i], cex=1.1)
      chart.Histogram(R[,i], main="", xlim=c(chart.mins, chart.maxs), breaks=chart.breaks,
                      xaxis=FALSE, yaxis=FALSE, show.outliers=TRUE, methods=c("add.normal"), colorset=colorset)
      abline(v=0, col="darkgray", lty=2)
      chart.QQPlot(R[,i], main="", xaxis=FALSE, yaxis=FALSE, pch=20, envelope=0.95, col=c(1,"#005AFF"), ylim=c(chart.mins, chart.maxs))
      abline(v=0, col="darkgray", lty=2)
      chart.ECDF(R[,i], main="", xlim=c(chart.mins, chart.maxs), xaxis=FALSE, yaxis=FALSE, lwd=2)
      abline(v=0, col="darkgray", lty=2)
    }
  }
}
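To see how the panels are arranged, here is the layout matrix that page.Distributions() passes to layout(), built for a hypothetical three-series input. Panels are filled in order, so each row holds the label, histogram, QQ plot and ECDF for one series. That is why the function calls plot.new() and text() before drawing the three charts.

```r
n <- 3  # hypothetical number of return series
m <- matrix(1:(4 * n), ncol = 4, byrow = TRUE)
m
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    8
# [3,]    9   10   11   12

# Draw the layout on a null device; layout() returns the number of figures
pdf(NULL)
nf <- layout(m, widths = c(.6, 1, 1, 1))  # narrow first column for labels
# layout.show(nf)  # on a screen device, outlines the 12 panel regions
invisible(dev.off())
```

The `widths` vector needs one entry per layout column, so four values are enough regardless of how many series are plotted.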

6 thoughts on “Visually Comparing Return Distributions”

vonjd says:

January 18, 2013 at 1:25 pm

This is great – it should definitely be included in PerformanceAnalytics (which, by the way, is surely one of the best R packages anyway!)

Ralph says:

January 18, 2013 at 2:49 pm

This is a very good use of the small multiples idea and I think keeping them together on the same display is sensible. I was wondering whether you would gain any benefit from an additional column with the original time series?

John says:

January 18, 2013 at 10:55 pm

Am I the only one who gets this error when I run the code (R-2.15.2)?

Error in q.function(P, ...) :
unused argument(s) (xaxis = FALSE, yaxis = FALSE, ylim = c(-0.1237, 0.0745))

- John says:
  
  January 19, 2013 at 5:40 pm
  
  I’d like to publicly thank Peter for his help. In case others encounter this problem, it turned out that I needed the development version of the PerformanceAnalytics package (1.0.5.2). My first attempt to install it from R-Forge failed with “not available for R-2.15.2”, so I ended up cloning the svn repository and building from source. But the multiple-panel plot capabilities are worth the effort.
  
  - Rafik Margaryan (@raffdoc) says:
    
    January 22, 2013 at 6:25 am
    
    How can we install the development version of PerformanceAnalytics?

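For readers hitting John’s error, a quick check of whether the installed package is new enough may help. The 1.0.5.2 threshold comes from John’s comment; testing chart.QQPlot() for `xaxis`/`yaxis` arguments is an assumption based on the error message above. `pa_supports_page_distributions` is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if the installed PerformanceAnalytics looks recent
# enough for page.Distributions() (version >= 1.0.5.2 and chart.QQPlot() accepts
# xaxis/yaxis); FALSE otherwise, including when the package is not installed.
pa_supports_page_distributions <- function() {
  if (!requireNamespace("PerformanceAnalytics", quietly = TRUE)) return(FALSE)
  new.enough <- utils::packageVersion("PerformanceAnalytics") >= "1.0.5.2"
  qq.args <- names(formals(PerformanceAnalytics::chart.QQPlot))
  new.enough && all(c("xaxis", "yaxis") %in% qq.args)
}

res <- pa_supports_page_distributions()
res
```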
klr (@timelyportfolio) says:

January 26, 2013 at 3:31 am

Great work and very helpful. I thought it might be nice to also offer another view showing distributions across multiple accounts per manager: if we have 3 managers, each with 100 accounts, we could compare the distributions across accounts instead of across time. I’ll play with it and see if I can accomplish that with minimal code changes.

Thanks again for all the great work.