Leng and I have been working on a few improvements to our model that we wanted to share.

## The story so far

We’ve been converting the betting odds for each seat into probabilities of victory for each party. This is straightforward, and many others (such as the Guardian’s Simon Jackman) are doing the same. Using these probabilities to estimate the most likely number of seats each party will win (and ultimately, who will win the election) is more subtle and difficult. In a previous post, we talked about a common way people get this wrong. So far, our approach has been to treat electorates as independent of each other (which makes the number of seats won by each party a Poisson-Binomial random variable).

But electorates aren’t independent. All politics might be local in some parts of the world, but in Australia people vote not just for their local representative but also for who they want to be Prime Minister. For example, voters in seats A and B might weigh both local and national issues in choosing who to vote for. If they considered only local issues (e.g., local planning decisions), the electorates would be independent. If they considered only national issues (e.g., federal issues such as border protection), the results in the two electorates would be highly correlated. The truth is somewhere in between. But if Kevin Rudd were found to be secretly from New Zealand tomorrow, we think there would probably be a national swing against him in horror and disgust, and a national swing like that can’t be modelled correctly if we assume seats are independent.
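
For readers who want to reproduce the independent-seats baseline, here is a minimal Python sketch. It assumes decimal odds and removes the bookmaker’s margin by rescaling the implied probabilities proportionally, which is one common convention and not necessarily the exact one we use; the function names and the example odds are hypothetical. The second function computes the Poisson-Binomial distribution of a party’s seat total exactly, by adding one independent seat at a time.

```python
from typing import Dict, List


def implied_probabilities(decimal_odds: Dict[str, float]) -> Dict[str, float]:
    """Turn one seat's decimal odds into win probabilities that sum to 1.

    The raw implied probability of each outcome is 1 / odds. Bookmakers build in a
    margin, so the raw values sum to more than 1; here they are rescaled
    proportionally (one common convention, assumed for this sketch).
    """
    raw = {party: 1.0 / odds for party, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {party: r / total for party, r in raw.items()}


def poisson_binomial_pmf(probs: List[float]) -> List[float]:
    """Distribution of a party's seat total, assuming seats are independent.

    pmf[k] is the probability of winning exactly k of the len(probs) seats.
    Seats are added one at a time: each either adds a win (probability p)
    or does not (probability 1 - p).
    """
    pmf = [1.0]  # before any seat is counted, the total is 0 with certainty
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)  # party loses this seat
            new[k + 1] += mass * p      # party wins this seat
        pmf = new
    return pmf


# Hypothetical odds for a single seat, for illustration only.
seat = implied_probabilities({"Labor": 2.50, "Coalition": 1.50})
print(seat["Labor"])  # about 0.375

print(poisson_binomial_pmf([0.5, 0.5]))  # [0.25, 0.5, 0.25]
```

With one Labor probability per seat for a 150-seat House, `sum(pmf[76:])` gives the probability of a Labor majority under the independence assumption.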

How important is this covariance? Let’s take a look.

## Updating the model

First, we needed to update our model. There are two problems here: (i) including covariance makes the model much more complicated, and (ii) we don’t know what the actual covariances between seats are. We sucked it up and tackled the first part, so we can now include covariances in our model: our simulations are now samples from a correlated multivariate Bernoulli distribution.
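
One standard way to sample from a correlated multivariate Bernoulli distribution is a Gaussian copula: draw correlated normal scores and threshold each one at the quantile matching its seat’s win probability. The sketch below is an illustration under stated assumptions (a single shared correlation `rho` for every pair of seats, and a hypothetical function name), not necessarily the sampler behind our results. Setting `rho = 0` recovers independent seats, and `rho = 1` gives the maximum-covariance case.

```python
import random
from statistics import NormalDist
from typing import List


def sample_correlated_seats(probs: List[float], rho: float,
                            n_sims: int, seed: int = 0) -> List[List[int]]:
    """Draw correlated win/loss outcomes (1 = win) for every seat.

    Each seat gets a latent score Z_i = sqrt(rho) * W + sqrt(1 - rho) * E_i,
    where W is a swing shared by all seats and E_i is local noise, all standard
    normal, so Z_i is standard normal too. Seat i is won when Z_i < Phi^{-1}(p_i),
    which keeps its marginal win probability at p_i for any rho in [0, 1].
    The single shared rho is an assumption of this sketch.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Clamp p away from 0 and 1 so that inv_cdf is defined.
    thresholds = [std_normal.inv_cdf(min(max(p, 1e-12), 1 - 1e-12)) for p in probs]
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    sims = []
    for _ in range(n_sims):
        w = rng.gauss(0.0, 1.0)  # shared national swing for this simulation
        sims.append([int(a * w + b * rng.gauss(0.0, 1.0) < t) for t in thresholds])
    return sims


# Hypothetical seat probabilities; rho = 0.5 is an arbitrary illustrative choice.
sims = sample_correlated_seats([0.3, 0.6, 0.9], rho=0.5, n_sims=10_000, seed=42)
totals = [sum(s) for s in sims]
print(sum(totals) / len(totals))  # close to 0.3 + 0.6 + 0.9 = 1.8
```

The shared term `W` plays the role of a national swing, which matches the intuition above that national issues move all seats together while local issues act seat by seat.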

The second part is harder: it’s not clear how to estimate the covariances between seats. Instead, we’ve calculated an upper bound on the covariance between each pair of seats. This maximum is determined by the win probabilities inferred from the betting odds; you can derive it from the definition of covariance for Bernoulli random variables and a bit of algebra.
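
The bound follows in a few lines. Writing $X_i$ for the indicator that a party wins seat $i$, with win probability $p_i$ (notation introduced here for illustration):

$$
\begin{aligned}
\operatorname{Cov}(X_i, X_j) &= \mathbb{E}[X_i X_j] - \mathbb{E}[X_i]\,\mathbb{E}[X_j] \\
&= \Pr(X_i = 1,\ X_j = 1) - p_i p_j \\
&\le \min(p_i, p_j) - p_i p_j,
\end{aligned}
$$

since the event of winning both seats is contained in the event of winning either one. Equality holds when winning the less likely seat implies winning the more likely one. For example, with $p_i = 0.3$ and $p_j = 0.6$ the bound is $0.3 - 0.18 = 0.12$, which corresponds to a correlation of only about 0.53, so even at maximum covariance two seats with different probabilities are not perfectly correlated.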

To get an idea of how covariances affect the seat totals, we plotted histograms of the number of seats won by Labor and the Coalition under two different models. In the first model there is no covariance at all. This is a common assumption, and the one we’ve been using so far: the electorates are treated as independent. Here’s the histogram using betting odds from August 6:

If we assume the seats are independent, Labor has practically zero chance (less than 1%) of winning more than 75 seats, and therefore the election. What happens if we introduce covariance? We rerun the analysis, but this time set every covariance to the maximum possible for the given probabilities. Here are the histograms for the new model:

The difference is substantial! Labor now has a 34% chance of winning enough seats to form government. The shape of the distribution also changes, with the biggest changes at the extremes. This makes sense: if seat outcomes are highly correlated, we would expect ‘election tsunamis’ to happen frequently, with one party or the other winning in a landslide.
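
If maximal covariance is imposed on every pair of seats, the joint distribution is fully determined: a single uniform draw `U` decides every seat, and a seat is won exactly when `U` falls below its win probability. Under that construction the seat-total distribution can be computed exactly from the sorted probabilities, with no simulation. The sketch below assumes our maximum-covariance model coincides with this nested construction; the function names are hypothetical.

```python
from typing import List


def max_covariance_pmf(probs: List[float]) -> List[float]:
    """Exact seat-total distribution when every pair of seats has maximal covariance.

    Maximal pairwise covariance forces the outcomes to be nested: one uniform draw U
    decides every seat, and seat i is won exactly when U < p_i. The party then wins
    at least k seats exactly when U is below the k-th largest probability q_k,
    so P(total >= k) = q_k and P(total == k) = q_k - q_{k+1}.
    """
    q = [1.0] + sorted(probs, reverse=True) + [0.0]  # q[0] = 1, q[n + 1] = 0
    return [q[k] - q[k + 1] for k in range(len(probs) + 1)]


def prob_at_least(pmf: List[float], k: int) -> float:
    """P(total >= k) for a seat-total distribution given as a list of masses."""
    return sum(pmf[k:])


pmf = max_covariance_pmf([0.9, 0.6, 0.3])  # hypothetical seat probabilities
print(pmf)                    # approximately [0.1, 0.3, 0.3, 0.3], up to float rounding
print(prob_at_least(pmf, 2))  # approximately 0.6, the 2nd largest probability
```

In a 150-seat House, `prob_at_least(pmf, 76)` is simply the 76th largest Labor win probability. The mass at exactly k seats is the gap between the k-th and (k+1)-th largest probabilities, so large gaps between consecutive seat probabilities show up as spikes in the histogram.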

Notably, the mean seat counts barely change between models. Under the maximum covariance model, Labor is expected to win 64 seats to the Coalition’s 83; under the independent electorates model, Labor is expected to win 65 to the Coalition’s 83. This is what we should expect: the expected number of seats is the sum of the individual seat probabilities, whatever the covariances between seats, so any small difference comes from simulation noise and rounding. That’s why we’ve felt comfortable reporting these figures in the AFR articles despite assuming independent electorates up until now.
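
The reason the mean barely moves can be checked directly: the expected seat total is the sum of the seat probabilities under any dependence structure, while the covariances enter only the variance. The sketch below (hypothetical function name) computes both quantities for the two extreme models, using the pairwise upper bound $\min(p_i, p_j) - p_i p_j$ for the covariance.

```python
from typing import List, Tuple


def seat_total_mean_var(probs: List[float], max_covariance: bool) -> Tuple[float, float]:
    """Mean and variance of the number of seats won.

    The mean is sum(p_i) under any dependence structure (linearity of expectation).
    The variance is sum_i p_i (1 - p_i) + 2 * sum_{i<j} Cov(X_i, X_j); the covariance
    terms are 0 for independent seats and min(p_i, p_j) - p_i p_j at their upper bound.
    """
    mean = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if max_covariance:
        n = len(probs)
        for i in range(n):
            for j in range(i + 1, n):
                var += 2.0 * (min(probs[i], probs[j]) - probs[i] * probs[j])
    return mean, var


probs = [0.9, 0.6, 0.3]  # hypothetical seat probabilities
print(seat_total_mean_var(probs, max_covariance=False))  # mean 1.8, variance 0.54 (up to rounding)
print(seat_total_mean_var(probs, max_covariance=True))   # mean 1.8, variance 0.96 (up to rounding)
```

The centre stays put while the spread widens, and it is the wider spread that moves the tail probability of reaching a majority.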

## Getting our covariance on

What are the implications for our predictions? For now, we’re comfortable that our mean seat counts are reasonable, whether or not the model includes covariance. Probabilities of overall election victory, however, are sensitive to the covariances between electorates, which we’re unlikely ever to know fully. But we can bound the covariance, and so still extract useful information from the electorate betting odds. For instance, on August 6 the probability of a Labor victory was somewhere between 0 and 34%. This is a wide range! But it’s still useful information: it tells us that the betting markets believe Labor are likely to lose the election. We’ll be keeping an eye on this over the coming weeks.

Covariance, yes! Great to see you guys thinking about this and building it into your models. But I’m suspicious of the probability distributions coming out of the maximum covariance model. Do we really believe that it is almost twice as likely that Labor wins 25 seats rather than ~64 seats? It looks like you’re getting some edge effects from the bounding of the covariance, and maybe you need to look into those further. Those distributions fail a sanity test at the moment.

Completely different topic: I would be really interested to see how the betting odds are changing over time. It could be cool to trace the odds over time for a handful of interesting seats (e.g. Eden-Monaro), if not all seats. It could make for an interesting and pretty graphic (especially if you mark major stories over that time as well) and would also give you an idea of the variability in the odds themselves.

Love your work. Will it be you guys giving the keynote address at the Joint Statistical Meetings next year instead of Nate Silver? (Whoever that is?)
