AFR – July 31

Here’s a link to the latest AFR write up re the predictions K and I have been making. It does a neat job of summarizing all the predictions over time. If anyone is feeling the momentum, it looks like it’s the ALP.

Re the election date, it’s currently scheduled for week 2 of the AFL finals (Sept 14). This will be the first federal election held during the AFL finals since 1946 (thanks AEC). Looks like most elections are held in March, or between Oct-Dec. Football and elections always remind me of the 1999 Victorian election held on preliminary final day – two massive upsets in one day. Hopefully there will be no upsets in the football this year.



We get our electoral betting market data from Sportsbet: we scrape it using Python, use a cron job to automate the scraping every day, and then use a little script to send it to Dropbox. It was a nice little setup because it was all pretty automated.
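For the curious, here is a rough sketch of what the parsing step of a scraper like this could look like. Everything specific in it is made up for illustration: the HTML snippet, the class names, the prices and the function names are hypothetical and are not Sportsbet’s real page layout. It also uses Python’s built-in html.parser module rather than Beautiful Soup, purely so it runs without installing anything.

```python
from html.parser import HTMLParser

# Hypothetical markup and made-up prices: NOT the real Sportsbet page.
SAMPLE_HTML = """
<div class="market" data-seat="Melbourne">
  <span class="runner">ALP</span><span class="price">1.80</span>
  <span class="runner">Greens</span><span class="price">2.10</span>
</div>
<div class="market" data-seat="Denison">
  <span class="runner">Independent</span><span class="price">1.25</span>
  <span class="runner">ALP</span><span class="price">3.50</span>
</div>
"""


class OddsParser(HTMLParser):
    """Collects {seat: {candidate: decimal odds}} from the sample markup."""

    def __init__(self):
        super().__init__()
        self.odds = {}
        self._seat = None    # seat of the market block we are inside
        self._field = None   # "runner" or "price" while inside such a span
        self._runner = None  # last candidate name seen, awaiting its price

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "market":
            self._seat = attrs.get("data-seat")
            self.odds[self._seat] = {}
        elif tag == "span" and attrs.get("class") in ("runner", "price"):
            self._field = attrs["class"]

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None or self._seat is None:
            return
        if self._field == "runner":
            self._runner = text
        elif self._field == "price" and self._runner is not None:
            self.odds[self._seat][self._runner] = float(text)
            self._runner = None
        self._field = None


def parse_odds(html):
    """Return the odds found in an HTML string laid out like SAMPLE_HTML."""
    parser = OddsParser()
    parser.feed(html)
    return parser.odds


odds = parse_odds(SAMPLE_HTML)
```

In a real set up, the HTML would come from the downloaded page rather than a string in the script, and cron would simply run the script once a day.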

But from July 22 until July 29, betting data for around 40 seats were not available. Our scraping was still working, but the data wasn’t on the site. So in that time, me and K just sat and waited. Now, truly great bloggers would have gone to other betting sites and tried to find alternative sources of data. Not us. We just waited.

Lucky for us, Sportsbet are now back with all 150 seats. And in our down time, our pal at the AFR, Edmund Tadros, told us that Sportingbet have some data too. So maybe we’ll try to write another little Python script to scrape it. (If you ever need to scrape data off a website, check out the Python library Beautiful Soup.)

Thankfully today we checked and it looks like all 150 seats are available again. More predictions coming tomorrow!

Longshot bias

So it’s now July 30, but below we share the predicted results based off the betting markets on July 22. One small thing we did is to tweak the model a little bit to deal with the longshot bias. This has changed the results a bit.

What exactly is the longshot bias? 

In gambling, it is an empirically regular occurrence that bettors tend to overvalue the longshot (ie. the outcome or candidate that has a very low chance of winning). And they tend to undervalue the favorite. So in betting odds, we might see that the ALP candidate pays out $5 for a $1 bet, while the Coalition candidate will only pay out $1.30 for a $1 bet. The ALP candidate is the longshot, the Coalition is the favorite. But the true underlying odds might be $8 for the ALP and $1.10 for the Coalition. Because bettors have a bias in betting on the longshot, this will increase the demand for that outcome, thus reducing the payout.
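To put rough numbers on that example, here is a small sketch that turns payouts into implied probabilities. The probability implied by a payout is 1 divided by the payout. Rescaling so the two candidates sum to 1 is one simple, common convention and is an assumption of this sketch; the function name is made up for illustration.

```python
def implied_probabilities(decimal_odds):
    """Convert payouts per $1 bet into probabilities that sum to 1.

    The raw implied probability is 1 / payout. Dividing by the total
    (an assumption of this sketch) makes the probabilities sum to 1.
    """
    raw = {name: 1.0 / price for name, price in decimal_odds.items()}
    total = sum(raw.values())
    return {name: p / total for name, p in raw.items()}


# The illustrative payouts from the paragraph above.
market = implied_probabilities({"ALP": 5.00, "Coalition": 1.30})
underlying = implied_probabilities({"ALP": 8.00, "Coalition": 1.10})

for name in ("ALP", "Coalition"):
    print(f"{name}: market {market[name]:.3f}, underlying {underlying[name]:.3f}")
```

With these numbers the market gives the ALP longshot roughly a 21% chance, while the underlying odds give it roughly 12%. That gap is the longshot bias.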

What are some explanations of the longshot bias?

Justin Wolfers and Erik Snowberg have a nice paper that looks at two competing theories of the longshot bias. The traditional thinking is that some punters are risk loving in nature. This assumes that the punters are rational, but bet on the longshot because they derive utility from taking risky bets. Given a choice between a guaranteed $25 or a gamble with a 25% chance of winning $100, the risk loving person takes the gamble even though the expected value of both choices is the same (0.25 × $100 = $25). They love the punt.

The paper looks at another possibility and concludes that this is a bigger driver of the longshot bias. The authors argue that nope, the bettors aren’t really rational, risk loving economic actors. Nope, they’re just naughty children with misguided perceptions of probability!

How do we correct for longshot bias and how did this affect results?

Having learned a bit more re this longshot bias lark, we decided that in seats where a candidate has less than a 10% chance of winning, that candidate is effectively a longshot. For those candidates, we rounded the probability down to 0. The effect of this is pretty stark. Looking at the category of ‘other’, it has gone down to only 2 seats. This is due to the changes we made to the model, not changes in the betting odds. Let’s compare this to what we’re seeing in the real world by having a look at the MPs who are currently in the ‘other’ category.
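As a minimal sketch, the rule looks something like this. The 10% cut-off and the rounding down to 0 are as described above. Rescaling the remaining candidates so the seat still sums to 1 is an assumption added for this sketch, and the names and example probabilities are made up.

```python
LONGSHOT_THRESHOLD = 0.10  # below a 10% chance counts as a longshot


def drop_longshots(probs, threshold=LONGSHOT_THRESHOLD):
    """Set longshot probabilities to 0 for one seat.

    Assumption of this sketch: the remaining probabilities are rescaled
    so they still sum to 1.
    """
    kept = {name: (p if p >= threshold else 0.0) for name, p in probs.items()}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("every candidate fell below the threshold")
    return {name: p / total for name, p in kept.items()}


# Made-up probabilities for a single seat.
adjusted = drop_longshots({"ALP": 0.55, "Coalition": 0.38, "Other": 0.07})
print(adjusted)
```

A seat where ‘other’ sits at 7% no longer contributes anything to the ‘other’ tally, which is why that category shrinks so much.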

It looks like with Rob Oakeshott and Tony Windsor both retiring, Lyne and New England are definitely in play for the major parties. And Tony Crook has also retired, putting O’Connor in play too.

Adam Bandt (Melbourne: Greens) benefited from Liberal preferences at the last election and will be hoping to take advantage of the progressive vote being angered by the hardline ALP asylum seeker position. But it looks like he might struggle to hold onto Melbourne if the chat is true that the Liberals will preference the ALP in this election.

Andrew Wilkie is looking pretty good in Denison, having scored some union love from the NTEU. We love that he’s up against newcomer Jane Austin from the ALP. People have been making high brow literary jokes about her name – we have instead chosen to focus on the metropolitan hospital genre but have come up short. Everyone’s favorite Katter’s Australian Party candidate, Bob Katter, is also looking pretty good in his seat. So if we’re counting right, the two predicted seats for ‘other’ are Wilkie and Katter. If you’ve heard anything different, let us know.

Checking in

So it appears that K and I aren’t the best at this blogging caper. Looking at the blogs we like to read, those people are very consistent with a post a day or so. Looking at our short track record, we’re duds. Sorry about this. K has been doing the academic thing, flying from MIT back to Melbourne for a conference – some stuff about the environment. He’s a good kid like that. I don’t have any real excuses. Just a bit soft I guess.

Anyway, enough agony aunt stuff and onto the stats. Below, please find the predicted results from the July 21 data. Looks like the ALP is creeping up. We’ve been paying attention to the latest national polling by the big guys and they’re all calling it very close. Looking at the number of current crossbenchers (there are 6 of them), our predicted ‘other’ win count looks a bit high. We’re going to tweak our model a bit to try to fix this up. Our usual disclaimers hold (ie. very early out, maybe the data is not good if there is not enough betting etc). Thanks for reading our blog!

Some stale / fresh data: Data from July 11

So K and I are running a few days behind the data. We’ve set up a cron job to automate the scraping of the data each day from the Sportsbet website, but it takes us a bit of time to organize ourselves re doing the analysis and then writing up the post. So the data is kind of fresh in that it’s new to the blog. But we know it’s a bit stale given we’re blogging about it 6 days late. Sorry about that.

We’ve just completed the analysis for the data from July 11. In the press, it’s all been about how the national level polls are tightening and the election is going to be close. Accordingly, we’re seeing more stories where the inevitability of an Abbott government is increasingly being questioned whereas before all the pressure and focus was on the ALP.

Our electorate-level analysis differs a bit from the national polls. It shows the ALP making up a bit of ground, but the margin between the two parties re the expected number of seats won is still pretty large.

On July 11, the ALP had increased the expected number of seats they would win by 4 from the number predicted on June 30 (increase from 56 to 60). We’re mindful that the original election date of Sept 14 is pretty far away, and possibly going to change. (There’s a bit of chat that Nov 2 could be the new date.) But the electorate-level analysis has not swung as dramatically as the national level analysis. We’ll look at a few reasons why this might be the case in coming posts.

July 11 – Predicted seats won

Latest analysis in the AFR

Our latest analysis has been written up in the AFR. In short: the probability of a Labor victory has increased again, but is still very small.

NB – We don’t link directly to the AFR website because the articles are behind a paywall. So we save the file as a pdf, put it in Dropbox, and then make use of the public link facility. We know it looks a bit janky. If you have a better solution, let us know!

Probability Distributions

In the last post, we showed some early probability distributions that model the number of seats each party would win, and the associated probabilities. Probability distributions are perhaps not polite dinner party conversation topics (K and I have made this mistake many a time) but they are hugely important. They’re a neat way to represent the degree of uncertainty about the outcome of certain events. The only things we know for sure re how many seats the ALP will win are:

  1. They will definitely win between 0 and 150 seats.
  2. The probability of them winning exactly 0 seats, 1 seat, 2 seats etc must be between 0 and 1.
  3. If we add up the probabilities of every possible seat count (0 seats, 1 seat, 2 seats, ..., 150 seats), they add up to 1.

I thought it might be relevant to point a few things out about the probability distribution we showed in the previous blog.

  1. We’ve obviously put two probability distributions into the same chart. In future, we’ll likely separate them out in case there is any overlap which makes it hard to see stuff.
  2. The distributions look bell shaped (ie. like a normal distribution). Strictly speaking, the number of seats won follows a Poisson-Binomial distribution, which is well approximated by a normal distribution in some cases.
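For anyone who wants to see the Poisson-Binomial idea concretely, here is a small sketch. It assumes each seat is an independent win/lose event with its own probability, and builds the exact distribution one seat at a time. The mean and standard deviation of the matching normal approximation come from the standard formulas (sum of p, and sum of p × (1 − p)). The function names and the example probabilities are made up for illustration.

```python
import math


def seat_distribution(win_probs):
    """Exact distribution of seats won; dist[k] is P(exactly k seats)."""
    dist = [1.0]  # before any seat is counted, P(0 seats) = 1
    for p in win_probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob_k in enumerate(dist):
            new[k] += prob_k * (1 - p)  # this seat is lost
            new[k + 1] += prob_k * p    # this seat is won
        dist = new
    return dist


def normal_approximation(win_probs):
    """Mean and standard deviation of the approximating normal."""
    mean = sum(win_probs)
    variance = sum(p * (1 - p) for p in win_probs)
    return mean, math.sqrt(variance)


# Made-up win probabilities for five seats.
example_probs = [0.9, 0.7, 0.5, 0.3, 0.1]
dist = seat_distribution(example_probs)
mean, sd = normal_approximation(example_probs)
print([round(p, 4) for p in dist], round(mean, 2), round(sd, 2))
```

The list that comes out has one probability for each possible seat count, each between 0 and 1, and together they sum to 1.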

Only bell we could find

We hope to dive into more analysis of the politics in future blog posts instead of making you go through our version of stats 101!

Modeling election outcomes

So far, all our posts have been set up stuff. We’ve covered:

  1. self-important intros and why we’re doing this
  2. how not to use electoral probabilities
  3. a bit of talking ourselves up re the AFR article
  4. advantages and disadvantages of using electoral betting data

It’s time to look at how we use the probabilities we have from the betting markets to model the election.

What are we modeling? 

To start with, we want to predict how many seats the governing ALP will win at the next election. Don’t be surprised, but the ALP will definitely win between 0 and 150 seats! The aim of our analysis is to figure out the probability of each possible outcome. So we want to figure out the probability that the ALP will win 0 seats, the probability they will win 1 seat, will win 2 seats,…, will win 150 seats. And we will then do the same for the Coalition.

We can then draw this as a nice little probability distribution, analyze the shape, and talk about the likelihood of various scenarios.

How do we go about doing this?

For each electorate, imagine there are only two relevant probabilities. The probability ALP wins, and the probability ALP loses. We can then think of the result of each electorate as a toss of an unfair coin. Some coins will favor the ALP, and some coins will favor the Coalition. We can then simulate an election by tossing each of the unfair coins. And then we can count the number of electorates won by the ALP, and not won by the ALP to make a prediction.

We repeat this process 100,000 times. The more times we do it, the better we can approximate the true probability distribution. After simulating 100,000 elections, we can then see how often certain outcomes occur. Maybe we see that the ALP winning 63 seats happens in 40,000 of the simulated elections. We’d then say the probability of the ALP winning 63 seats is 40%. We would do this for every possible outcome and then plot a probability distribution.
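The coin-tossing procedure described above can be sketched in a few lines of Python. This is an illustration rather than our production script: the function name, the fixed random seed and the example probabilities are assumptions made for the sketch.

```python
import random
from collections import Counter


def simulate_elections(win_probs, n_sims=100_000, seed=0):
    """Simulate n_sims elections; return {seats won: estimated probability}.

    win_probs holds one ALP win probability per electorate. Each seat is
    an unfair coin: it is won when a uniform random draw falls below p.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        seats_won = sum(rng.random() < p for p in win_probs)
        counts[seats_won] += 1
    # The share of simulated elections with each outcome is our estimate.
    return {seats: n / n_sims for seats, n in sorted(counts.items())}


# Made-up probabilities for five electorates, with fewer simulations
# than the 100,000 we use so the example runs quickly.
example = simulate_elections([0.9, 0.7, 0.5, 0.3, 0.1], n_sims=10_000)
for seats, prob in example.items():
    print(seats, round(prob, 3))
```

For the real thing, win_probs would hold 150 numbers, one per electorate, and n_sims would stay at its default of 100,000.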

What do the probability distributions look like?

Below we show a probability distribution we did just before Julia Gillard was replaced. The model predicts that:

  • the probability the ALP will win government (ie. more than 75 seats) is pretty low.
  • the expected number of seats the ALP will win is 49.
  • the expected number of seats the Coalition will win is 95.
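Numbers like these fall straight out of a probability distribution. Here is a hedged sketch of the arithmetic, assuming the distribution is stored as a dictionary mapping each seat count to its probability. The function names and the toy distribution are made up and are not our actual results.

```python
def expected_seats(dist):
    """Expected number of seats: each seat count weighted by its probability."""
    return sum(seats * prob for seats, prob in dist.items())


def prob_of_majority(dist, total_seats=150):
    """Probability of winning more than half the seats (76+ out of 150)."""
    needed = total_seats // 2 + 1
    return sum(prob for seats, prob in dist.items() if seats >= needed)


# Toy distribution over three possible outcomes, for illustration only.
toy = {70: 0.5, 76: 0.3, 80: 0.2}
print(expected_seats(toy), prob_of_majority(toy))
```

In the toy example, half of the probability sits on outcomes with more than 75 seats, so the probability of winning government comes out at 0.5.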

Some of these numbers were covered in the AFR last weekend. But we wanted to show the shape of the probability distribution before Gillard was replaced.

June 25 – Before Rudd

After Rudd replaced Gillard, the ALP’s probability distribution shifted to the right a bit. But the probability of them winning government (ie. winning more than 75 seats) is still not so good. The ALP increased their expected number of seats to 56, with the Coalition down to 88.

June 30 – After Rudd

As things stand, the electorate-level betting odds imply the probability of a Labor victory is still very, very small. Whether this is true, or a limitation of the data, will be a focus of future posts.

Predictions from betting odds

Leng and I use betting odds to make our predictions. None of this is new — academics have been using them for years — but they’re pretty new to the mainstream media, so we thought it’d be good to talk through some of their strengths and weaknesses.

Why use betting odds?

  1. They’re more accurate: Previous studies have shown that they often yield predictions comparable to, if not better than, those from polls. Betting odds reflect the market’s estimate of the underlying probabilities of victory for each party, implicitly combining information from many sources. Unlike poll respondents, bettors are putting their money on the line, so their views are likely to be better informed than those given to pollsters.
  2. They include more information: Participants will include not just their own views, but those of their friends, colleagues, pundits, polls and other sources in forming these views. This means betting markets tend to have a larger “effective” sample size, compared to polls.
  3. They’re naturally designed for making predictions: While pollsters ask participants about their current voting preferences, betting markets implicitly ask participants, ‘who will win?’, so their results are more useful in making predictions.
  4. They allow us to answer questions national polls can’t: Elections are won by winning seats. But we don’t have separate polls for all 150 electorates. Betting markets provide us with electorate-level data that allows us to make direct predictions about who will win the election. An added bonus is that we can also predict answers to many other interesting questions that are difficult to answer with national polling data: how many seats will the ALP lose in western Sydney? How likely is a hung parliament? What are the odds of the Liberal party winning the bellwether seat of Eden-Monaro, but losing the election? Provided we have good betting odds, we can use our general framework to answer all kinds of interesting questions that can’t be answered using national polling data without making lots of assumptions.
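As one example of the kind of question electorate-level data can answer, here is a rough sketch of estimating the chance of a hung parliament by simulation. It assumes each seat has a probability for the ALP, the Coalition and ‘other’, that seats are independent, and that a hung parliament means neither major party wins more than half the seats. The function name and the example probabilities are made up for illustration.

```python
import random

PARTIES = ("ALP", "Coalition", "Other")


def hung_parliament_probability(seat_probs, n_sims=10_000, seed=0):
    """Estimate P(neither major party wins a majority of seats).

    seat_probs is a list with one dict per electorate, mapping each
    party in PARTIES to its probability of winning that seat.
    """
    rng = random.Random(seed)
    majority = len(seat_probs) // 2 + 1  # 76 when there are 150 seats
    hung = 0
    for _ in range(n_sims):
        tally = dict.fromkeys(PARTIES, 0)
        for probs in seat_probs:
            weights = [probs[party] for party in PARTIES]
            winner = rng.choices(PARTIES, weights=weights)[0]
            tally[winner] += 1
        if tally["ALP"] < majority and tally["Coalition"] < majority:
            hung += 1
    return hung / n_sims


# Made-up probabilities for a tiny three-seat parliament.
tiny_parliament = [
    {"ALP": 0.6, "Coalition": 0.3, "Other": 0.1},
    {"ALP": 0.2, "Coalition": 0.7, "Other": 0.1},
    {"ALP": 0.1, "Coalition": 0.2, "Other": 0.7},
]
print(hung_parliament_probability(tiny_parliament, n_sims=2_000))
```

The same loop can be pointed at other questions, such as how many seats change hands in a particular region, by changing what gets counted at the end of each simulated election.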

What are the weaknesses of using betting odds?

This blog is about exploring the potential of betting odds for making predictions and, while we believe these predictions are likely to be accurate close to the election, we are just as interested in exploring the weaknesses of these techniques as their strengths. Here are a few:

  1. Betting odds can yield poor predictions when not many people are betting, particularly in the early stages of the campaign. In the extreme case, where no-one has placed a bet on an electorate, the betting odds will reflect the views of a few people in the betting agency and are essentially useless (although we’re able to exclude some of them from our analysis by looking for electorates where the odds have not changed from their opening values). This is why we’re cautious about making predictions at this early stage in the campaign.
  2. We can’t quantify the uncertainty on our predictions: Unfortunately, the betting agencies don’t make available information on market size, let alone individual betting behaviour, so we can’t put error bars on our predictions.
  3. They have their own biases: for instance, they tend to overestimate the probability of victory of underdogs (this is called “longshot bias”).

Overall, we think electoral betting odds are a really rich source of data for making predictions about elections, as long as they’re interpreted with caution. As we get closer to the election, we’ll start rolling out more of our analyses.

Coverage in the AFR

Last weekend, the Australian Financial Review ran a story that made use of the analysis K and I did. The story talks about the predicted number of seats the ALP and the Coalition would win before and after the change in the ALP leadership, and some of the differences between polling data and betting market data.