Category Archives: R-project

Visualizing the S&P 500

It’s difficult to grasp what is happening to the entire S&P 500 all at once. A friend of mine who was a stellar technical analyst came up with an idea to view the S&P 500 as a packed array of boxes, organized by sector and with each stocks ‘box’ area scaled to market cap. Then each box was colour-coded by return for the month. The net result was something like this:

I recreated that image for the monthly economics report using the following code. The boxChartHelper function takes a data.frame holding the stocks and returns another data.frame with the co-ordinates for each area. The data frame should have a column named “size” which is used to scale the boxes. The plotBoxChart function takes a data.frame of stocks (with columns specifying their ticker, sector, size or market cap, and the colour you want the box) and plots it using base graphics.

I know: (1) Hadley Wickham could do it in 2 lines with ggplot2, and (2) this isn’t a box plot so the naming is terrible. I’m just putting it here for reference because some people wanted to know how I make the chart – hint: not in Excel.

boxChartHelper<-function(data, # data.frame holding data
                         left, right, bottom, top) {
  if($size))) browser()
  # calc values
  width = right - left
  height = top - bottom
  aspect = width/height  # asp > 1, width bigger than height
  # if there is only one row, set and leave
  if (nrow(data)==1) {
    data$left = left
    data$right = right
    data$left = left
    data$top = top
    data$bottom = bottom
  # if two rows, split by width
  if (nrow(data)==2) {
    if (aspect >= 1) { # split the width
      data$top = top
      data$bottom = bottom
      width.1 = width*(data$size[1]/sum(data$size))
      data$left = c(left, left+width.1)
      data$right = c(left+width.1, right)
    } else { # split the height
      data$left = left
      data$right = right
      height.1 = height*(data$size[1]/sum(data$size))
      data$bottom = c(bottom, bottom + height.1)
      data$top = c(bottom + height.1, top)
  # else, cut into two and recurse
  data<-data[order(data$size, decreasing=TRUE),]
  splitter <- c(TRUE, rep(FALSE, nrow(data)-1))
  for(i in 2:nrow(data)) {
    if(sum(data$size[splitter])/sum(data$size) > 0.5) break
    splitter[i] <- TRUE
  if (aspect >= 1) { # split the width
    width.1 <- width*sum(data$size[splitter])/sum(data$size)
    return(rbind(boxChartHelper(data[splitter,], left, left+width.1, bottom, top),
                 boxChartHelper(data[!splitter,], left+width.1, right, bottom, top)))           
  } else { # split the height
    height.1 <- height*sum(data$size[splitter])/sum(data$size)
    return(rbind(boxChartHelper(data[splitter,], left, right, bottom, bottom+height.1),
                 boxChartHelper(data[!splitter,], left, right, bottom+height.1, top)))           

  # make sure the data is okay
  if (any([[sec.col]]))) stop("Error: NA Sectors")
  if (any([[size.col]]))) stop("Error: NA Sizes")
  if(!is.null(title)) = 2 else
  par(mar=c(,0.25,,0.25), cex=1)
  plot.window(xlim=c(0,100)*aspect, ylim=c(0,100), xaxs="i", yaxs="i")
  if(!is.null(title)) title(main=title)
  bounds <- c(par("usr"))<-tapply(data[[size.col]], data[[sec.col]], sum)<$sec <- rownames(
  names( <- c("size", "section")$change<-0$left<-0$right<-0$top<-0$bottom<-0<-boxChartHelper(, bounds[1], bounds[2], bounds[3], bounds[4])
  # map and plot individual stocks
  for (i in 1:nrow( { =$section[i]
    stock.list <- data[data$,]
    stock.list$size <- stock.list[[size.col]]
    stock.list <- boxChartHelper(stock.list,$left[i], 
                       $top[i])$change[i] <- sum(stock.list[change.col] * stock.list$size)/
    # plot the results
    for (j in 1:nrow(stock.list)) {
      # draw the filled rectangle
           border="grey", lwd=0.25, col=stock.list$colour[j])
      # draw the name of the stock
      if( {
        stock.text <- paste(stock.list$Ticker[j],"\n",format(stock.list[[change.col]][j], digits=2),"%",sep="")
        stock.cex <- 0.75
        box.width <- (stock.list$right[j] - stock.list$left[j])/2
        box.height <- (stock.list$top[j] - stock.list$bottom[j])/2
        # remove the ticker symbol if too small
        if ((strheight(stock.text, cex=stock.cex) >= box.height * 0.98) ||
              (strwidth(stock.text, cex=stock.cex) >= box.width * 0.98)) {
          stock.text <- stock.list$Ticker[j]
        # try one level smaller
        if(strheight(stock.text,cex=stock.cex) >= box.height * 0.98) stock.cex <- 0.5
        if(strwidth(stock.text,cex=stock.cex) >= box.width * 0.98) stock.cex <- 0.5
        # draw if it fits
        if ((strheight(stock.text, cex=stock.cex) < box.height * 0.99) &&
              (strwidth(stock.text, cex=stock.cex) < box.width * 0.99)) {
          text(x=mean(c(stock.list$left[j], stock.list$right[j])),
               y=mean(c(stock.list$top[j], stock.list$bottom[j])) - strheight(stock.text, cex=stock.cex)/2,
  # plot the sections
  for (i in 1:nrow( {
    sec.width <-$right[i] -$left[i]
         border="black", lwd=3)
    if (strwidth($section[i], cex=1.5) > sec.width) {
      label <- gsub(" ", "\n",$section[i])
    } else {
      label <-$section[i]
         labels=label, pos=1, offset=0, cex=1.5)

Are Housing Prices High?

Many US recoveries in the past have been driven by housing. Conversely, a major factor in the meltdown in 2008 was also driven by housing. It’s reasonable to ask: how can we identify housing bubbles?

Bubbles are tied to discussions about whether the current price levels are sustainable. There are a lot of ways to skin that particular cat, but one item I like to keep track of is the relation between housing prices and disposable income. There’s a clear linear relationship between the two and a very aggressive “reversion to mean” behaviour. Whether this is more about prices collapsing or incomes rising is up for debate and the trigger to make that happen is something I haven’tĀ figured out, but its a great relationship to watch.

Housing chart

The latest data point is the green dot. Note how far above the trendline it is sitting.

Let’s look at the data behind the chart. The best free source I can think of is FRED – it’s timely, comprehensive and easy to download with R. Here’s the specific series that I’m looking at:

 # Load required data
getSymbols(c("MSPNHSUS", "CUUR0000SA0L2", "A229RX0"), src="FRED")

This shows new home pricing (MSPNHSUS) and disposable income per capita (A229RX0). Finally, since we are looking for such long periods of time, its worthwhile to take inflation into account, however I wanted to look at inflation excluding house pricing (CUUR0000SA0L2). Next step is to marshal the data into a nice format:

# Calculate raw data
real.home.sales <- MSPNHSUS / CUUR0000SA0L2 * as.numeric(last(CUUR0000SA0L2))
real.ave.income <- A229RX0<-cbind(real.home.sales/1000, real.ave.income/1000)
names(<-c("real.home.sales","real.ave.income") <-[complete.cases(,])

# simple linear model of home sales to average income
home.income.lm <- lm(real.home.sales ~ real.ave.income,

Now that we have the data in some nice data frames, here's the code to built the plot of the results above.

# create the plot
par(mar=c(4,3,2,0.5), cex=1)
     ylab="", xlab="",main="", type="n")
grid(lty=2, col="lightgrey")
       pch=20, col="blue", cex=0.5)
title(main="Personal Income vs. Housing Prices (Inflation adjusted values)",
      cex.main=1.1, font.main=1)
mtext(side=2, "New Home Price (000's)", line=2)
mtext(side=1, "Disposable Income Per Capita (000's)", line=2)

# plot the latest point
last.point=c(last($real.ave.income), last($real.home.sales))
points(x=last.point[1], y=last.point[2], pch=20, col="green", cex=3)

# plot the best fit line
abline(home.income.lm, col="red")

text(x=par("usr")[1], y=par("usr")[4]-strheight("R")*1.1,
        pos=4, offset=0.5,
        labels=bquote(r^2: ~ .(paste0(format((summary(home.income.lm))$adj.r.squared*100,
        digits=2, nsmall=1),"%") ) ))

text(x=par("usr")[1], y=par("usr")[4]-2*strheight("R")*1.5, 
        pos=4, offset=0.5,
        labels=paste0("Range: ", format(as.POSIXct(first(rownames(, "%b %Y")," - ",
        format(as.POSIXct(last(rownames(, "%b %Y")))