K and I are obviously not the only ones doing this prediction caper for the Australian election. Aside from the AFR coverage using our stuff, we enjoy reading Simon Jackman’s stuff in The Guardian. We don’t know Simon but he’s got a stellar background – professor at Stanford in both the politics and statistics departments. Some of his research uses Bayesian techniques, which is stuff K and I have been reading about.
On Aug 2, Simon Jackman had an article in The Guardian about betting markets. He uses effectively the same simulation we do – get implied probabilities from betting markets, run lots of simulations and get the distribution of outcomes. Based on his 1 August simulation, he predicts the ALP will win 61 seats. This is a few fewer than our model predicts but in the same ballpark, unlike the national polls which are effectively predicting the ALP winning around 75 seats.
Why do Simon’s predictions differ from ours? From what we can tell, the only difference is that he is using data from both Centrebet and Sportsbet, and then averaging the implied probabilities. We’ve only got data from Sportsbet. It may also be possible that he is not correcting for the longshot bias that we discussed in a previous post. This might mean that in his model, seats that will likely end up in the ALP column are given to ‘other’ candidates. If you look at our predictions before we corrected for the longshot bias, we had thought the ALP would win around 60-62 seats too. Anyway, after reading Simon’s article, it’s good to know me and K are not cray. We’re going to enjoy this song and the rest of our weekend. Hope you do too!
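For anyone wondering what "implied probabilities" means in practice, here is a rough sketch of how prices from two bookmakers could be turned into probabilities and averaged. The prices are made up, the function names are ours for illustration, and dividing by the total to remove the bookmaker's margin is just one common convention, not necessarily what anyone mentioned here actually does.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for one seat into probabilities that sum to 1."""
    raw = {party: 1.0 / price for party, price in decimal_odds.items()}
    total = sum(raw.values())  # usually above 1 because of the bookmaker's margin
    return {party: p / total for party, p in raw.items()}


def average_across_bookmakers(*books):
    """Average the implied probabilities party by party across bookmakers."""
    parties = books[0].keys()
    return {
        party: sum(book[party] for book in books) / len(books)
        for party in parties
    }


# Made-up prices for a single seat at two hypothetical bookmakers.
bookmaker_a = implied_probabilities({"ALP": 2.50, "Coalition": 1.50})
bookmaker_b = implied_probabilities({"ALP": 2.80, "Coalition": 1.40})
averaged = average_across_bookmakers(bookmaker_a, bookmaker_b)
print(averaged)
```

Because each bookmaker's probabilities are rescaled to sum to 1 before averaging, the averaged probabilities also sum to 1.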
By now you have probably seen that the election will be on Sept 7 – four short weeks away! I think this means me and K will have to crank up our general rate of output. We’ve been working on looking at different types of analyses beyond the current stuff and hopefully will execute it.
Re the election, it’s interesting to see the campaign slogans and tag lines the parties go with. Seems like we’ll be seeing a lot of ‘Hope, reward and opportunity’ via the Real Solutions Plan. And some stuff about a positive vision which is not the same old negativity. Three word slogans always remind me of ‘Peace, Bread, Land’. Not knocking it at all because they can be very effective.
Some exciting news for us is that we’ve agreed with the AFR to conduct our analysis for them on an exclusive basis. That was pretty good to nail down. But if anyone wants to put K’s face on radio, I’m sure everyone will enjoy that. Looking forward to the next four weeks!
Here’s a link to the latest AFR write up re the predictions K and I have been making. It does a neat job in summarizing all the predictions over time. If anyone is feeling the momentum, it looks like it’s the ALP.
Re the election date, it’s currently scheduled for week 2 of the AFL finals (Sept 14). This will be the first federal election held during the AFL finals since 1946 (thanks AEC). Looks like most elections are held in March, or between Oct-Dec. Football and elections always remind me of the 1999 Victorian election held on preliminary final day – two massive upsets in one day. Hopefully there will be no upsets in the football this year.
We get our electoral betting market data from Sportsbet – scraping it with Python, using a cron job to automate the scraping every day, and then a little script to send it to Dropbox. It was a nice little setup because it was all pretty automated.
But from July 22 until July 29, betting data for around 40 seats was not available. Our scraping was still working, but the data wasn’t on the site. So in that time, me and K just sat and waited. Now, truly great bloggers would have gone to other betting sites and tried to find alternative sources of data. Not us. We just waited.
Lucky for us, Sportsbet are now back with all 150 seats. And in our down time, our pal at the AFR, Edmund Tadros, told us that Sportingbet have some data too. So maybe we’ll try to write another little Python script to scrape it. (If you ever need to scrape data off a website, check out the Python library Beautiful Soup.)
Thankfully today we checked and it looks like all 150 seats are available again. More predictions coming tomorrow!
So it’s now July 30, but below we share the predicted results based off the betting markets on July 22. One small thing we did is to tweak the model a little bit to deal with the longshot bias. This has changed the results a bit.
What exactly is the longshot bias?
In gambling, it is an empirically regular occurrence that bettors tend to overvalue the longshot (ie. the outcome or candidate that has a very low chance of winning). And they tend to undervalue the favorite. So in betting odds, we might see that the ALP candidate pays out $5 for a $1 bet, while the Coalition candidate will only pay out $1.30 for a $1 bet. The ALP candidate is the longshot, the Coalition is the favorite. But the true underlying odds might be $8 for the ALP and $1.10 for the Coalition. Because bettors are biased towards betting on the longshot, this increases the demand for that outcome, thus reducing the payout.
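To put numbers on that example, here is a quick sketch that turns the prices above into raw implied probabilities (1 divided by the payout). The function name is ours, and we deliberately skip any adjustment for the bookmaker's margin to keep the arithmetic easy to follow.

```python
def raw_implied_probability(decimal_price):
    """Raw implied probability of a $1 bet paying `decimal_price` dollars."""
    return 1.0 / decimal_price


# Prices from the example above: what the market quotes vs the "true" odds.
quoted = {"ALP": 5.00, "Coalition": 1.30}
true_odds = {"ALP": 8.00, "Coalition": 1.10}

comparison = {}
for party in quoted:
    q = raw_implied_probability(quoted[party])
    t = raw_implied_probability(true_odds[party])
    comparison[party] = (q, t)
    print(f"{party}: quoted {q:.3f}, true {t:.3f}")
```

The quoted price makes the ALP longshot look like a 20% chance when the "true" odds say 12.5%, while the Coalition favorite is quoted at about 77% against a "true" figure of about 91%.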
What are some explanations of the longshot bias?
Justin Wolfers and Erik Snowberg have a nice paper that looks at two competing theories of the longshot bias. The traditional thinking is that some punters are risk loving in nature. This assumes that the punters are rational, but bet on the longshot because they derive utility from taking risky bets. Given a choice between a guaranteed $25 or a gamble with a 25% chance of winning $100, the risk loving person takes the gamble even though the expected value of both choices is the same (0.25 × $100 = $25). They love the punt.
The paper looks at another possibility and concludes that this is a bigger driver of the longshot bias. The authors argue that nope, the bettors aren’t really rational, risk loving economic actors. Nope, they’re just naughty children with misguided perceptions of probability!
How do we correct for longshot bias and how did this affect results?
Having learned a bit more re this longshot bias lark, we decided that in seats where a candidate has a less than 10% chance of winning, that candidate is effectively a longshot. For those candidates, we rounded the probability down to 0. The effect of this is pretty stark. Looking at the category of ‘other’, it has gone down to only 2 seats. This is due to the changes we made to the model, not changes in the betting odds. Let’s compare this to what we’re seeing in the real world by having a look at the MPs who are currently in the ‘other’ category.
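As a rough sketch, the correction could look something like the code below. The 10% cut-off comes from the paragraph above. The seat probabilities are made up, the function name is ours, and rescaling the remaining candidates so they still add up to 1 is an assumption we have added for the example.

```python
def correct_longshots(seat_probs, threshold=0.10):
    """Zero out candidates below the threshold, then rescale the rest to sum to 1.

    The rescaling step is an assumption made for this sketch.
    """
    kept = {c: (p if p >= threshold else 0.0) for c, p in seat_probs.items()}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("every candidate fell below the threshold")
    return {c: p / total for c, p in kept.items()}


# Made-up implied probabilities for one seat.
seat = {"ALP": 0.55, "Coalition": 0.38, "Other": 0.07}
corrected = correct_longshots(seat)
print(corrected)
```

Here the ‘other’ candidate drops to 0 and that 7% is shared between the ALP and the Coalition in proportion to their original probabilities.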
It looks like with Rob Oakeshott and Tony Windsor both retiring, Lyne and New England are definitely in play for the major parties. And Tony Crook has also retired, putting O’Connor in play too.
Adam Bandt (Melbourne: Greens) benefited from Liberal preferences at the last election and will be hoping to take advantage of the progressive vote being angered by the hardline ALP asylum seeker position. But it looks like he might struggle to hold onto Melbourne if the chat is true that the Liberals will preference the ALP in this election.
Andrew Wilkie is looking pretty good in Denison, having scored some union love from the NTEU. We love that he’s up against newcomer Jane Austin from the ALP. People have been making high brow literary jokes about her name – we have instead chosen to focus on the metropolitan hospital genre but have come up short. Everyone’s favorite Katter’s Australian Party candidate, Bob Katter, is also looking pretty good in his seat. So if we’re counting right, the two predicted seats for ‘other’ are Wilkie and Katter. If you’ve heard anything different, let us know.
So it appears that K and I aren’t the best at this blogging caper. Looking at the blogs we like to read, those people are very consistent with a post a day or so. Looking at our short track record, we’re duds. Sorry about this. K has been doing the academic thing, flying from MIT back to Melbourne for a conference – some stuff about the environment. He’s a good kid like that. I don’t have any real excuses. Just a bit soft I guess.
Anyway, enough agony aunt stuff and onto the stats. Below, please find the predicted results from the July 21 data. Looks like the ALP is creeping up. We’ve been paying attention to the latest national polling by the big guys and they’re all calling it very close. Looking at the number of current crossbenchers (there are 6 of them), our predicted ‘other’ win count looks a bit high. We’re going to tweak our model a bit to try to fix this up. Our usual disclaimers hold (ie. very early out, maybe the data not good if not enough betting etc). Thanks for reading our blog!
So K and I are running a few days behind the data. We’ve set up a cron job to automate the scraping of the data each day from the Sportsbet website, but it takes us a bit of time to organize ourselves re doing the analysis and then writing up the post. So the data is kind of fresh in that it’s new to the blog. But we know it’s a bit stale given we’re blogging about it 6 days late. Sorry about that.
We’ve just completed the analysis for the data from July 11. In the press, it’s all been about how the national level polls are tightening and the election is going to be close. Accordingly, we’re seeing more stories where the inevitability of an Abbott government is increasingly being questioned whereas before all the pressure and focus was on the ALP.
Our electoral level analysis differs a bit from the national polls. It shows the ALP making up a bit of ground, but the margin between the two parties re the expected number of seats won is still pretty large.
On July 11, the ALP had increased the expected number of seats they would win by 4 from the number predicted on June 30 (increase from 56 to 60). We’re mindful that the original election date of Sept 14 is pretty far away, and possibly going to change. (There’s a bit of chat that Nov 2 could be the new date.) But the electoral level analysis has not swung as dramatically as the national level analysis. We’ll look at a few reasons why this might be the case in coming posts.
In the last post, we showed some early probability distributions that model the number of seats each party would win, and the associated probabilities. Probability distributions are perhaps not polite dinner party conversation topics (K and I have made this mistake many a time) but they are hugely important. They’re a neat way to represent the degree of uncertainty about the outcome of certain events. The only things we know for sure re how many seats the ALP will win are:
- They will definitely win between 0 and 150 seats.
- The probability of them winning exactly 0 seats, 1 seat, 2 seats etc must be between 0 and 1.
- If we add up the probabilities of every possible seat total (0 seats, 1 seat, 2 seats, ..., 150 seats), they add up to 1.
I thought it might be relevant to point a few things out about the probability distribution we showed in the previous blog.
- We’ve obviously put two probability distributions into the same chart. In future, we’ll likely separate them out in case there is any overlap which makes it hard to see stuff.
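- The distributions look bell shaped (ie. roughly normally distributed). Strictly speaking, if each seat is treated as an independent contest with its own win probability, the number of seats won follows a Poisson-Binomial distribution, which is well approximated by a normal distribution in some cases.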
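For the stats-curious, here is a small sketch of the Poisson-Binomial point above. It builds the exact distribution of seats won one seat at a time and compares it to a normal curve with the same mean and variance. The 150 win probabilities are invented (evenly spread from 5% to 95%), the function names are ours, and the whole thing assumes seats are independent, which real electorates almost certainly are not.

```python
import math


def poisson_binomial_pmf(win_probs):
    """Exact distribution of the number of seats won, assuming independent seats."""
    pmf = [1.0]  # before counting any seat, 0 wins is certain
    for p in win_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)  # this seat is lost
            nxt[k + 1] += mass * p    # this seat is won
        pmf = nxt
    return pmf


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Invented win probabilities for 150 seats, evenly spread from 5% to 95%.
win_probs = [0.05 + 0.9 * i / 149 for i in range(150)]
pmf = poisson_binomial_pmf(win_probs)

mean = sum(win_probs)                      # expected number of seats won
var = sum(p * (1 - p) for p in win_probs)  # variance under independence
largest_gap = max(abs(pmf[k] - normal_pdf(k, mean, var)) for k in range(len(pmf)))
print(f"mean {mean:.1f}, variance {var:.1f}, largest gap {largest_gap:.5f}")
```

With probabilities this evenly spread, the gap between the exact distribution and the normal curve is small. It would be worth rerunning with lopsided probabilities before trusting the approximation.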
We hope to dive into more analysis of the politics in future blog posts instead of making you go through our version of stats 101!
So far, all our posts have been set up stuff. We’ve covered
- self-important intros and why we’re doing this
- how not to use electoral probabilities
- a bit of talking ourselves up re the AFR article
- advantages and disadvantages of using electoral betting data
It’s time to look at how we use the probabilities we have from the betting markets to model the election.
What are we modeling?
To start with, we want to predict how many seats the governing ALP will win at the next election. Don’t be surprised, but the ALP will definitely win between 0 and 150 seats! The aim of our analysis is to figure out the probability of each possible outcome. So we want to figure out the probability that the ALP will win 0 seats, the probability they will win 1 seat, will win 2 seats,…, will win 150 seats. And we will then do the same for the Coalition.
We can then draw this as a nice little probability distribution, analyze the shape, and talk about the likelihood of various scenarios.
How do we go about doing this?
For each electorate, imagine there are only two relevant probabilities. The probability ALP wins, and the probability ALP loses. We can then think of the result of each electorate as a toss of an unfair coin. Some coins will favor the ALP, and some coins will favor the Coalition. We can then simulate an election by tossing each of the unfair coins. And then we can count the number of electorates won by the ALP, and not won by the ALP to make a prediction.
We repeat this process 100,000 times. The more times we do it, the better we can approximate the true probability distribution. After simulating 100,000 elections, we can then see how often certain outcomes occur. Maybe we see that the ALP winning 63 seats happens in 40,000 of the simulated elections. We’d then say the probability of the ALP winning 63 seats is 40%. We would do this for every possible outcome and then plot a probability distribution.
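The coin tossing described above can be sketched in a few lines of Python. This is a simplified illustration and not our actual script: the seat probabilities are invented, the function name is ours, each seat is tossed independently, and the example run uses 10,000 simulated elections instead of 100,000 so it finishes quickly.

```python
import random
from collections import Counter


def simulate_elections(alp_win_probs, n_sims=100_000, seed=0):
    """Toss one unfair coin per seat, n_sims times, and tally ALP seat totals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(n_sims):
        seats_won = sum(rng.random() < p for p in alp_win_probs)
        counts[seats_won] += 1
    # Turn the tallies into estimated probabilities for each seat total.
    return {seats: n / n_sims for seats, n in sorted(counts.items())}


# Invented ALP win probabilities for 150 seats (hypothetical, not real data).
alp_win_probs = [0.2] * 60 + [0.5] * 40 + [0.8] * 50

distribution = simulate_elections(alp_win_probs, n_sims=10_000)
expected_seats = sum(seats * prob for seats, prob in distribution.items())
prob_majority = sum(prob for seats, prob in distribution.items() if seats > 75)
print(f"expected ALP seats: {expected_seats:.1f}")
print(f"probability of more than 75 seats: {prob_majority:.3f}")
```

The dictionary that comes back is the probability distribution we plot: each key is a seat total and each value is the share of simulated elections that ended with that total.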
What do the probability distributions look like?
Below we show a probability distribution we did just before Julia Gillard was replaced. The model predicts that:
- the probability the ALP will win government (ie. more than 75 seats) is pretty low.
- the expected number of seats the ALP will win is 49.
- the expected number of seats the Coalition will win is 95.
Some of these numbers were covered in the AFR last weekend. But we wanted to show the shape of the probability distribution before Gillard was replaced.
After Rudd replaced Gillard, the ALP’s probability distribution shifted to the right a bit. But the probability of them winning government (ie. winning more than 75 seats) is still not so good. The ALP increased their expected number of seats to 56, with the Coalition down to 88.