Category Archives: Programming

Texas Holdem revisited – Wars of Attrition

[ed. note: poker research was something I started a while back with my friend Pat Wong when we were developing pokerbots. The ideas in this are from those conversations – thanks Pat!]

When playing holdem, the key strategic tension is between playing as many hands as possible and playing only "good" hands. Thanks to the mandatory antes in the small and big blind positions, you cannot simply fold until you get pocket Aces (AA). There are only 6 possible draws of AA out of the 1326 possible 2-card draws. As it turns out, you win about 80% of the time with AA, but you are only playing 6/1326 = 0.45% of the time. Meanwhile, at a table with 10 players, 2 hands out of 10 you have a mandatory ante as the big or small blind, which is a much steadier drain on your capital.

There are lots of books written on opening strategy, but I was interested to see if there was an expectation-based approach: play as many hands as is reasonable to see the flop, without taking on a negative expectation value.

To do so, we must first figure out the probabilities. As discussed in my previous post, you can compute a decent estimate of the winning percentage of each two-card hand. There are 1326 possible two-card hands that can be drawn from a 52-card deck, but you can simplify this by only considering whether the cards are suited (s) or off-suit (d); this drops the number of combinations to 169.
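As a quick sanity check on those counts (the 169 is 13 pairs plus 78 suited and 78 off-suit rank combinations):

choose(52, 2)                        # 1326 possible two-card hands
choose(4, 2)                         # 6 ways to be dealt pocket Aces
13 + choose(13, 2) + choose(13, 2)   # 169 rank/suitedness combinations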

Next, we have to take into account the effect of playing against multiple players. I did a brute force calculation showing the win/loss/tie percentages for 2- and 3-player games (i.e. you against one opponent or two opponents). Usually you simplify matters by assuming that each opponent is an independent game – but I wanted to do the research to see how close that is to the truth.

[Figures: unweighted win probability tables for 2-player and 3-player games]

Above are hand tables showing the percentage chance of winning for 2 and 3 players. Obviously the odds of winning drop the more opponents you face, but what keeps the game interesting is that the pot increases faster than the odds drop. Note that these are the unweighted probabilities – i.e. they don't take into account the fact that no sane opponent is going to play 27d (2-7 off-suit). If you are facing a big blind that hasn't put up any money beyond their ante – so no raises thus far – then the unweighted probability table is the one you would use. On the other hand, if you want to consider "real" tables where players fold crappier cards (a process discussed in my earlier post), then you would want the weighted probability tables – in general I'll use these weighted numbers going forward.

[Figures: weighted win probability tables for 3-player and 2-player games]

To see the effect of my weighting, here's a quick graph showing the unweighted versus weighted probabilities. Note that my weighting scheme is certainly open to criticism – you may have a very different view of how opponents' hands should be weighted.

[Figure: weighted vs. unweighted win probabilities]

I wanted to see how closely I could estimate the multi-player odds from the 2-player odds. If a 3-player game were equivalent to 2 independent games played at the same time, then the probability of winning would simply be the square of the 2-player probability. Let's say you have cards with a 50% chance of winning against a single opponent. If the games were independent, the chance of beating two opponents should drop to 25% (i.e. 50% x 50%). However, in texas holdem you share community cards, so the games are clearly not independent. The question is by how much. Below is a graph showing the differential between the independent case (i.e. just squaring the 2-player probability) and what we see when we do an exhaustive calculation.

[Figure: estimated (independence assumption) vs. exhaustively calculated 3-player win probabilities]

The graph shows that the independence assumption actually underestimates your chance of winning. If your hole cards are good enough to beat one player, then you are more likely to beat the second player. This adds about 8% to your chance of winning versus the calculation assuming independence (that's what the histogram below is showing). These are unweighted estimates, by the way. For the rest of the calculations I'll be using the actual odds, not just an estimate – mainly because it took 4 solid days of computer time to calculate them, so why throw them away? I'm curious how 4-10 players would look, but I fear the calculation time that would take – I'm not willing to blow the money on EC2 to figure out a hunch.
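For reference, the naive independence estimate is just a squaring exercise – here's a one-line sketch with an illustrative 2-player probability (the exhaustive 3-player numbers come out higher than this, as the graph shows):

p2 <- 0.50        # illustrative 2-player win probability, not taken from the tables
p3.naive <- p2^2  # 0.25 if the two matchups were independent; the exhaustive value runs higher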

Okay – here's where the rubber hits the road. We have our weighted probabilities for 3 players, so we can now figure out which cards we should play so that seeing the flop costs us as little as possible. We need one additional set of assumptions. I'm assuming a 10-person table, but that most of the time people fold so that only two opponents are willing to see the flop. Therefore, the expected cost of folding is (on average) just the antes for the big and small blinds. Assuming the ante is $1, that means if you did nothing but fold and never played a hand, your average cost per hand would be $1/10 + $0.5/10 (the antes for the two blinds averaged over the 10 positions). But we will be playing some hands – for simplicity's sake let's assume that the cost of losing is $1 (your ante, assuming no raises). On the other hand, the profit from winning is $2.5 (i.e. you get the antes from the two opponents, plus the small blind assuming they fold as well).

This gives us a way of calculating the expectation value for our strategy. If we play 50% of the hands, and the hands we do play have a 40% chance of winning, then the expectation per hand is:

Expected Value = 50% x (-$1.5/10) + 50% x (40% x $2.5 - 60% x $1) = -$0.075 + $0.20 = $0.125 per hand
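As a quick sketch of that calculation in R (the 50% play rate and 40% win rate are just the illustrative numbers from above, and the payoffs are the assumptions from the previous paragraph):

# Assumptions from the text (illustrative values, not the optimized strategy)
fold.cost  <- (1 + 0.5) / 10   # average blind cost per hand when folding: $0.15
win.payoff <- 2.5              # antes from two opponents plus the folded small blind
loss.cost  <- 1                # your ante, assuming no raises
play.rate  <- 0.5              # fraction of hands played
win.rate   <- 0.4              # win probability of the hands we do play

ev <- (1 - play.rate) * (-fold.cost) +
      play.rate * (win.rate * win.payoff - (1 - win.rate) * loss.cost)
ev   # 0.125 per hand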

Now, let's look at the weighted win percentages for the cards from above. If we set a cutoff percentage – playing cards with win probabilities above the cutoff and folding cards with win probabilities below it – and calculate the weighted expectation given the win percentages for each card pair, then we can create a quick graph that shows us the cutoff with the maximum expectation:

[Figure: expected value per hand as a function of the cutoff win percentage, showing the cutoff with the maximum expectation]
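Here's a minimal sketch of how that sweep might look in R. It assumes a hypothetical data.frame called hands, with one row per two-card combination, a win.prob column holding the weighted 3-player win probability, and a freq column holding how often that combination is dealt – the real calculation used my full weighted tables.

# Hypothetical inputs: hands$win.prob (weighted win probability) and
# hands$freq (probability of being dealt that combination)
expected.value <- function(cutoff, hands,
                           fold.cost = 0.15, win.payoff = 2.5, loss.cost = 1) {
  played <- hands$win.prob >= cutoff
  play.rate <- sum(hands$freq[played])
  # frequency-weighted EV of the hands we choose to play
  ev.play <- sum(hands$freq[played] *
                 (hands$win.prob[played] * win.payoff -
                  (1 - hands$win.prob[played]) * loss.cost))
  (1 - play.rate) * (-fold.cost) + ev.play
}

cutoffs <- seq(0.2, 0.8, by = 0.01)
evs <- sapply(cutoffs, expected.value, hands = hands)
cutoffs[which.max(evs)]   # cutoff with the maximum expectation
plot(cutoffs, evs, type = "l", xlab = "Cutoff win probability", ylab = "EV per hand")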

Going back to our weighted win percentages, that gives us the following table:

[Table: weighted win percentages for each starting hand, colour-coded by the play/fold decision]

The values in pink (red is too hard to read) show us the cards that we play. The values in blue are the ones we fold. Yellow values are ones on the line (i.e. within 10% of the cutoff value). If we play these cards, we’ll see enough flops and have a good enough chance of winning to be basically breakeven. At this point, your skill at playing poker and bluffing has to come in – which is where you’ll see your profits.

As an exercise, I simulated playing 1,000,000 hands to see what your bank balance would look like under this strategy – on average it's about breakeven (a profit of $6,000 over 1 million hands is about $0.006/hand). The bottom histogram also looks at drawdowns – i.e. once you hit a new high, how far down you go on average. Again, given it is 1 million hands, there's not too much variance in there.

[Figure: simulated bank balance over 1,000,000 hands, with a histogram of drawdowns]
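For anyone wanting to reproduce something similar, here's a rough sketch of that kind of simulation. It reuses the hypothetical hands data.frame from above and a fixed cutoff; the actual run resolved each hand against real opponents rather than a single win probability, so treat this as illustration only.

set.seed(1)
n <- 1e6

# Draw which starting hand we get each round, weighted by how often it occurs
idx    <- sample(nrow(hands), n, replace = TRUE, prob = hands$freq)
dealt  <- hands[idx, ]
played <- dealt$win.prob >= 0.40          # illustrative cutoff, not the optimized one
won    <- runif(n) < dealt$win.prob

# Per-hand profit: average blind cost when folding, win/loss payoff when playing
profit <- ifelse(!played, -0.15, ifelse(won, 2.5, -1))

balance  <- cumsum(profit)
drawdown <- cummax(balance) - balance     # distance below the running high
plot(balance, type = "l", xlab = "Hand", ylab = "Bank balance")
hist(drawdown, breaks = 50, main = "Drawdowns")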

I’m trying to build this into a nice spreadsheet that you can use to create your own strategies – I did most of this work in R and it’s not easy to go from code to spreadsheets. I’ll post an update when I figure it out.

Texas Holdem win/loss percentages on the Ante

Texas Hold'em is a bloodsport, primarily because of its computational complexity, which requires you to guess probabilities that are not easy to estimate. The "blood" part comes from the consequences of making uninformed bets, which can get you into big trouble – if you don't know which guy at the table is the mark, then it's always you. Poker is not meant to be kind to the beginner.

I used to write pokerbots. For the flop, turn and river rounds, calculating hand odds by computer in real time is pretty straightforward with a bit of pre-calculation. The hard part was dealing with bluffing and adjusting the hand odds accordingly. The University of Alberta's pokerbot program did a really good job figuring that out – I usually wrote my bots to combat theirs.

The hardest part – and the most computationally intensive – was figuring out proper bets during the ante round, before the flop. You had your 2 cards, you faced a table of opponents holding theirs – and you had to properly guess the odds. I saw lots of opening round strategies, but I didn't see anyone publishing a simple table with the probabilities of winning or losing. So I wrote a quick program to figure that out.

The first step is to show the "base" probabilities. This is the chance of a win or a draw if you go right to the river card – it's not pot odds, since that would require knowing the size of the pot etc., but you can use this info to figure out the odds pretty easily from there. I simplified things (for reasons that will be clear later) by reducing opening cards to suited or off-suit for me versus my opponent. Here's the table showing that result:

You use it by looking at your hand and reading off the table. Let's say I was dealt 10-H and A-D. That's an off-suit combination, so I'm looking at the upper part of the triangle (note that all pairs are necessarily off-suit), specifically at the cell for TAd (T=10, A=Ace, d=the off-suit portion of the grid). In this case, it gives me a probability of 65%. That means that playing against every other hand combination, and assuming all those combinations appear equally often, I would have a 65% chance of winning. That means against 22d (pair of 2's), AAd (pair of Aces), 27s (2,7 same suit), etc. – all 169 combinations (13 x 13).
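If you had the table loaded in R, the lookup is a one-liner. This sketch assumes a hypothetical 13x13 matrix called base.odds with rows and columns named "2" through "A", where entries with the row rank below the column rank (the upper triangle) hold the off-suit values, the lower triangle holds the suited values, and the diagonal holds the pairs:

# Hypothetical lookup against a 13x13 matrix 'base.odds' laid out as described above
lookup.base.odds <- function(rank1, rank2, suited, base.odds) {
  ranks <- c("2","3","4","5","6","7","8","9","T","J","Q","K","A")
  i <- sort(match(c(rank1, rank2), ranks))
  lo <- ranks[i[1]]; hi <- ranks[i[2]]
  if (suited) base.odds[hi, lo] else base.odds[lo, hi]   # pick the right triangle
}

lookup.base.odds("T", "A", suited = FALSE, base.odds)    # ~65% per the table above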

This grid would be good for figuring out your chances against the blinds if no one raised during the ante – but not good for much else.

Things get more complicated if you are figuring out the probability of a win against other players who have to bet money in order to stay in the game (or if the blinds raised or called a raise). In that case, the likelihood of your TAd playing against an opponent's 27d is zero: that's literally the worst hand and no one is going to bet money on it. So we need to weight the hands of your opponents in order to come up with a more realistic estimate of what your winning odds are. In this case, I used the odds from the base table to help me figure it out – anything with less than a 50% base chance of winning isn't going to get played by an opponent, anything with a 70% or higher base chance will get played 100% of the time, and the numbers in between are simply copied (i.e. the base win probability is used directly as the chance the opponent plays the hand). When I use this weighting, I come up with a new table:
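Here's a minimal sketch of that weighting rule applied to a vector of base win probabilities (folding the weights back into the matchup odds is the brute-force part and isn't shown here):

# Chance an opponent actually plays a hand, given the hand's base win probability:
# 0 below 50%, 1 at 70% or above, and the base probability itself in between.
play.weight <- function(base.prob) {
  ifelse(base.prob < 0.50, 0,
         ifelse(base.prob >= 0.70, 1, base.prob))
}

play.weight(c(0.35, 0.55, 0.65, 0.80))   # 0.00 0.55 0.65 1.00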

For the record, here are the probabilities I use to calculate table 3:

I'm trying to put this into a spreadsheet so that you can make your own probabilities based on your own estimates. The code used to calculate this was a separate program I wrote, which ran for a couple of days to produce the base cases I use to calculate the estimates – I need to clean it up before releasing it to the world, so I'll probably post that later. I originally did this years ago in C, but for simplicity I wrote this version in Java. Oddly enough, highly optimized C code on a machine from 10 years ago and unoptimized Java code on a modern laptop (not even the fastest one out there – I probably could have multithreaded it to take advantage of the multiple cores on my i5) took about the same length of time.

Finally, for the record, even this is a simplification. Given TAd, there are actually twelve combinations of cards that match that pattern (10 of hearts – Ace of diamonds, 10 of spades – Ace of hearts, etc.). The issue lies in the base program I use to calculate the odds of various matchups like TAd versus 27s. If the 10 in TA is a heart and both cards of the 27s are hearts, then the chance of a flush resulting in a tie is greater than if the two hands held entirely different suits. Still, flushes are relatively rare and don't change the percentages by much – you can get a sense of the effect by looking at the relative win percentages for, say, TAs versus TAd – it's a delta of 1-2%. I figured that was a fine margin of error if it meant I could fit everything into a 13x13 table.

Visualizing the S&P 500

It's difficult to grasp what is happening to the entire S&P 500 all at once. A friend of mine, a stellar technical analyst, came up with an idea to view the S&P 500 as a packed array of boxes, organized by sector and with each stock's box area scaled to market cap. Each box is then colour-coded by its return for the month. The net result was something like this:

[Figure: S&P 500 box chart for April – boxes grouped by sector, sized by market cap, coloured by monthly return]

I recreated that image for the monthly economics report using the following code. The boxChartHelper function takes a data.frame holding the stocks and returns another data.frame with the co-ordinates for each area. The data frame should have a column named “size” which is used to scale the boxes. The plotBoxChart function takes a data.frame of stocks (with columns specifying their ticker, sector, size or market cap, and the colour you want the box) and plots it using base graphics.

I know: (1) Hadley Wickham could do it in 2 lines with ggplot2, and (2) this isn’t a box plot so the naming is terrible. I’m just putting it here for reference because some people wanted to know how I make the chart – hint: not in Excel.

boxChartHelper<-function(data, # data.frame holding data
                         left, right, bottom, top) {
  if (any(is.na(data$size))) stop("Error: NA sizes in boxChartHelper")
  
  # calc values
  width = right - left
  height = top - bottom
  aspect = width/height  # asp > 1, width bigger than height
  
  # if there is only one row, set and leave
  if (nrow(data)==1) {
    data$left = left
    data$right = right
    data$top = top
    data$bottom = bottom
    return(data)
  }
  
  # if two rows, split by width
  if (nrow(data)==2) {
    if (aspect >= 1) { # split the width
      data$top = top
      data$bottom = bottom
      width.1 = width*(data$size[1]/sum(data$size))
      data$left = c(left, left+width.1)
      data$right = c(left+width.1, right)
      return(data)
    } else { # split the height
      data$left = left
      data$right = right
      height.1 = height*(data$size[1]/sum(data$size))
      data$bottom = c(bottom, bottom + height.1)
      data$top = c(bottom + height.1, top)
      return(data)
    }
  }
  
  # else, cut into two and recurse
  data<-data[order(data$size, decreasing=TRUE),]
  splitter <- c(TRUE, rep(FALSE, nrow(data)-1))
  for(i in 2:nrow(data)) {
    if(sum(data$size[splitter])/sum(data$size) > 0.5) break
    splitter[i] <- TRUE
  }
  
  if (aspect >= 1) { # split the width
    width.1 <- width*sum(data$size[splitter])/sum(data$size)
    return(rbind(boxChartHelper(data[splitter,], left, left+width.1, bottom, top),
                 boxChartHelper(data[!splitter,], left+width.1, right, bottom, top)))           
  } else { # split the height
    height.1 <- height*sum(data$size[splitter])/sum(data$size)
    return(rbind(boxChartHelper(data[splitter,], left, right, bottom, bottom+height.1),
                 boxChartHelper(data[!splitter,], left, right, bottom+height.1, top)))           
  }
}


plotBoxChart<-function(data, 
                       sec.col="Sector", 
                       size.col="MarketCap", 
                       change.col="Change",
                       colour.col="colour", 
                       title=NULL,
                       bottom.space=4,
                       show.box.labels=FALSE) {
  # make sure the data is okay
  if (any(is.na(data[[sec.col]]))) stop("Error: NA Sectors")
  if (any(is.na(data[[size.col]]))) stop("Error: NA Sizes")
  
  
  if(!is.null(title)) title.space = 2 else title.space=0.25
  par(mar=c(bottom.space,0.25,title.space,0.25), cex=1)
  aspect=dev.size()[1]/dev.size()[2]
  plot.new()
  plot.window(xlim=c(0,100)*aspect, ylim=c(0,100), xaxs="i", yaxs="i")
  if(!is.null(title)) title(main=title)
  
  bounds <- c(par("usr"))
  sec.data<-tapply(data[[size.col]], data[[sec.col]], sum)
  sec.data<-as.data.frame(sec.data)
  sec.data$sec <- rownames(sec.data)
  names(sec.data) <- c("size", "section")
  sec.data$change<-0
  sec.data$left<-0
  sec.data$right<-0
  sec.data$top<-0
  sec.data$bottom<-0
  
  sec.data<-boxChartHelper(sec.data, bounds[1], bounds[2], bounds[3], bounds[4])
  
  # map and plot individual stocks
  for (i in 1:nrow(sec.data)) {
    sec.name = sec.data$section[i]
    stock.list <- data[data[[sec.col]]==sec.name,]
    stock.list$left<-0
    stock.list$right<-0
    stock.list$top<-0
    stock.list$bottom<-0
    stock.list$size <- stock.list[[size.col]]
    
    stock.list <- boxChartHelper(stock.list, sec.data$left[i], 
                                 sec.data$right[i],
                                 sec.data$bottom[i],
                                 sec.data$top[i])
    
    sec.data$change[i] <- sum(stock.list[change.col] * stock.list$size)/
      sum(stock.list$size)
    
    # plot the results
    for (j in 1:nrow(stock.list)) {
      # draw the filled rectangle
      rect(xleft=stock.list$left[j],
           ybottom=stock.list$bottom[j],
           xright=stock.list$right[j],
           ytop=stock.list$top[j],
           border="grey", lwd=0.25, col=stock.list$colour[j])
      
      # draw the name of the stock
      if(show.box.labels==TRUE) {
        stock.text <- paste(stock.list$Ticker[j],"\n",format(stock.list[[change.col]][j], digits=2),"%",sep="")
        stock.cex <- 0.75
        box.width <- (stock.list$right[j] - stock.list$left[j])/2
        box.height <- (stock.list$top[j] - stock.list$bottom[j])/2
        
        # remove the ticker symbol if too small
        if ((strheight(stock.text, cex=stock.cex) >= box.height * 0.98) ||
              (strwidth(stock.text, cex=stock.cex) >= box.width * 0.98)) {
          stock.text <- stock.list$Ticker[j]
        }
        
        # try one level smaller
        if(strheight(stock.text,cex=stock.cex) >= box.height * 0.98) stock.cex <- 0.5
        if(strwidth(stock.text,cex=stock.cex) >= box.width * 0.98) stock.cex <- 0.5
        
        # draw if it fits
        if ((strheight(stock.text, cex=stock.cex) < box.height * 0.99) &&
              (strwidth(stock.text, cex=stock.cex) < box.width * 0.99)) {
          text(x=mean(c(stock.list$left[j], stock.list$right[j])),
               y=mean(c(stock.list$top[j], stock.list$bottom[j])) - strheight(stock.text, cex=stock.cex)/2,
               labels=stock.text,
               cex=stock.cex,
               pos=3,
               offset=0)
        }
      }
      
    }
  }
  
  # plot the sections
  for (i in 1:nrow(sec.data)) {
    sec.width <- sec.data$right[i] - sec.data$left[i]
    rect(xleft=sec.data$left[i],
         ybottom=sec.data$bottom[i],
         xright=sec.data$right[i],
         ytop=sec.data$top[i],
         border="black", lwd=3)
    if (strwidth(sec.data$section[i], cex=1.5) > sec.width) {
      label <- gsub(" ", "\n", sec.data$section[i])
    } else {
      label <- sec.data$section[i]
    }
    text(x=(sec.data$left[i]+sec.data$right[i])/2,
         y=(sec.data$bottom[i]+sec.data$top[i])/2+strheight(label)/2,
         labels=label, pos=1, offset=0, cex=1.5)
  }
  invisible(sec.data)
}
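For reference, here's a hypothetical usage example – the tickers, sectors, market caps and returns below are made up purely to show the input format the function expects:

# Made-up input: one row per stock, with the columns plotBoxChart expects
stocks <- data.frame(
  Ticker    = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  Sector    = c("Tech", "Tech", "Energy", "Energy", "Financials"),
  MarketCap = c(500, 300, 250, 150, 400),
  Change    = c(2.1, -1.3, 0.4, -3.2, 1.8),
  stringsAsFactors = FALSE
)

# Colour each box green for a positive monthly return, red for a negative one
stocks$colour <- ifelse(stocks$Change >= 0, "darkgreen", "firebrick")

plotBoxChart(stocks, title = "Toy box chart", show.box.labels = TRUE)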


Are Housing Prices High?

Many US recoveries in the past have been driven by housing. Conversely, housing was also a major factor in the 2008 meltdown. It's reasonable to ask: how can we identify housing bubbles?

Bubbles are tied to discussions about whether the current price levels are sustainable. There are a lot of ways to skin that particular cat, but one item I like to keep track of is the relationship between housing prices and disposable income. There's a clear linear relationship between the two and a very aggressive "reversion to mean" behaviour. Whether that reversion is more about prices collapsing or incomes rising is up for debate, and the trigger that makes it happen is something I haven't figured out, but it's a great relationship to watch.

[Figure: new home prices vs. disposable income per capita, inflation adjusted, with regression line and latest point highlighted]

The latest data point is the green dot. Note how far above the trendline it is sitting.

Let's look at the data behind the chart. The best free source I can think of is FRED – it's timely, comprehensive and easy to download with R. Here are the specific series that I'm looking at:

 # Load required data
library(quantmod)
getSymbols(c("MSPNHSUS", "CUUR0000SA0L2", "A229RX0"), src="FRED")

This gives us new home pricing (MSPNHSUS) and disposable income per capita (A229RX0). Finally, since we are looking at such long periods of time, it's worthwhile to take inflation into account; however, I wanted to use an inflation measure that excludes housing costs (CUUR0000SA0L2, CPI all items less shelter). The next step is to marshal the data into a nice format:

# Calculate raw data
real.home.sales <- MSPNHSUS / CUUR0000SA0L2 * as.numeric(last(CUUR0000SA0L2))
real.ave.income <- A229RX0
house.data<-cbind(real.home.sales/1000, real.ave.income/1000)
names(house.data)<-c("real.home.sales","real.ave.income")
house.data <- as.data.frame(house.data[complete.cases(house.data),])

# simple linear model of home sales to average income
home.income.lm <- lm(real.home.sales ~ real.ave.income, data=house.data)

Now that we have the data in some nice data frames, here's the code to build the plot of the results above.

# create the plot
par(mar=c(4,3,2,0.5), cex=1)
plot(x=house.data$real.ave.income, 
     y=house.data$real.home.sales, 
     ylab="", xlab="",main="", type="n")
grid(lty=2, col="lightgrey")
points(x=house.data$real.ave.income, 
       y=house.data$real.home.sales, 
       pch=20, col="blue", cex=0.5)
title(main="Personal Income vs. Housing Prices (Inflation adjusted values)",
      cex.main=1.1, font.main=1)
mtext(side=2, "New Home Price (000's)", line=2)
mtext(side=1, "Disposable Income Per Capita (000's)", line=2)

# plot the latest point
last.point=c(last(house.data$real.ave.income), last(house.data$real.home.sales))
points(x=last.point[1], y=last.point[2], pch=20, col="green", cex=3)

# plot the best fit line
abline(home.income.lm, col="red")

text(x=par("usr")[1], y=par("usr")[4]-strheight("R")*1.1,
        pos=4, offset=0.5,
        labels=bquote(r^2: ~ .(paste0(format((summary(home.income.lm))$adj.r.squared*100,
        digits=2, nsmall=1),"%") ) ))

text(x=par("usr")[1], y=par("usr")[4]-2*strheight("R")*1.5, 
        pos=4, offset=0.5,
        labels=paste0("Range: ", format(as.POSIXct(first(rownames(house.data))), "%b %Y")," - ",
        format(as.POSIXct(last(rownames(house.data))), "%b %Y")))
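To put a number on how far above the trendline that latest green dot sits, here's a quick follow-on sketch using the same home.income.lm and house.data objects from above:

# Compare the latest observed home price to what the regression predicts
# for the latest income level
latest    <- tail(house.data, 1)
predicted <- predict(home.income.lm, newdata = latest)
residual  <- latest$real.home.sales - predicted

100 * residual / predicted   # premium over the trendline, in percent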