Leng and I had a really interesting exchange with Simon Jackman on Twitter yesterday. My whopping eight followers (hi mum) and I would probably agree that I suck at Twitter. It became difficult to continue the discussion within the 140-character limit, so Leng and I thought we’d try to outline it here.
The discussion was about whether it matters to model covariance between seats, given the extremely low probability of Labor victory (< 1%) that the seat-level betting data imply when seats are modelled as independent. Leng and I have put a bit of effort into including this covariance in our model, and we think the very low implied probabilities of Labor victory are due to ignoring this covariance. Simon suggested, though, that the Efficient Market Hypothesis (EMH), were it to hold, implied the seats should be treated as independent.
Here’s what we think Simon’s saying:
1. Bettors can bet on any seat.
2. If they think that seats are NOT independent but move in certain directions together, then this will be reflected in how they bet.
3. How they bet affects the prices of each seat.
4. Assuming the EMH holds, the price of each seat then accurately represents the probability of a party winning that particular seat.
5. Built into that probability is any covariance that bettors think exists. Therefore, it is OK to assume independence because each seat’s probability already builds in any covariance.
We agree with 1-4, but disagree with 5. For a correlated multivariate Bernoulli distribution, the probabilities don’t uniquely define the covariances (although they do constrain the allowable choices of covariances). So even if the EMH holds, and the seat probabilities are known exactly, these probabilities still imply a range for each covariance, rather than a single value. If the EMH holds, we would expect this range to include the true covariance, but we don’t have enough information to define it exactly.
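To make the "range, not a single value" point concrete, here is the two-seat case written out. The notation is ours and is only for illustration: X_i is 1 if the party wins seat i and 0 otherwise, p_i is the market-implied probability for seat i, and p_ij is the (unobserved) probability of winning both seats i and j.

```latex
\begin{align*}
p_{ij} &= \Pr(X_i = 1,\ X_j = 1), \\
\operatorname{Cov}(X_i, X_j) &= p_{ij} - p_i p_j, \\
\max(0,\ p_i + p_j - 1) &\le p_{ij} \le \min(p_i,\ p_j).
\end{align*}
```

The seat prices pin down p_i and p_j but say nothing directly about p_ij, so the covariance can sit anywhere in the interval these bounds allow. With made-up values p_i = 0.3 and p_j = 0.4, for example, p_ij can be anything from 0 to 0.3, so the covariance can be anything from -0.12 to 0.18. These are pairwise bounds only; with more than two seats, the covariances also have to be jointly consistent with each other, which can narrow the ranges further.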
With a little algebra, you can show that for any set of seat probabilities, choosing all covariances to be zero (i.e., consistent with assuming independence) is always within the valid range of choices for covariances. But this doesn’t mean it’s the right choice. Indeed, there are good reasons to think that there are significant correlations between many seats. And if we choose valid non-zero covariances, the seat-level betting data imply a much more realistic probability of Labor victory (as much as 34% using odds from August 6), which is in better agreement with national-level betting odds.
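Here is a rough sketch of why the choice of covariance matters so much for the overall result. It is not our actual model: it uses a one-factor Gaussian copula (a swapped-in technique chosen for brevity), a made-up 51-seat chamber, invented seat probabilities, and an arbitrary latent correlation `rho`. The function name `majority_probability` and all the numbers are hypothetical. The seat-level probabilities are identical in both runs; only the dependence between seats changes.

```python
import random
from statistics import NormalDist


def majority_probability(seat_probs, rho, seats_needed, trials=10_000, seed=1):
    """Monte Carlo estimate of P(party wins at least seats_needed seats).

    Each seat has a latent standard-normal score made of a shared national
    factor and a seat-specific factor. The seat is won when the score falls
    below the threshold matching its marginal probability, so every seat's
    win probability equals seat_probs[k] whatever the value of rho.
    rho is the correlation between latent scores (an illustrative knob,
    not the covariance between the win indicators themselves).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    thresholds = [std_normal.inv_cdf(p) for p in seat_probs]
    shared_weight = rho ** 0.5
    local_weight = (1.0 - rho) ** 0.5

    majorities = 0
    for _ in range(trials):
        national = rng.gauss(0.0, 1.0)  # swing common to every seat
        seats_won = sum(
            1
            for t in thresholds
            if shared_weight * national + local_weight * rng.gauss(0.0, 1.0) < t
        )
        if seats_won >= seats_needed:
            majorities += 1
    return majorities / trials


# Hypothetical chamber: 15 safe seats, 16 marginal seats, 20 long shots.
seat_probs = [0.85] * 15 + [0.35] * 16 + [0.10] * 20
seats_needed = 26  # majority of 51

p_indep = majority_probability(seat_probs, rho=0.0, seats_needed=seats_needed)
p_corr = majority_probability(seat_probs, rho=0.5, seats_needed=seats_needed)

print(f"independent seats: {p_indep:.3f}")
print(f"correlated seats:  {p_corr:.3f}")
```

Under independence the party needs a long run of separate upsets, which is very unlikely. With a shared swing, a single favourable national move tips many seats at once, so the chance of a majority is far higher even though no individual seat price has changed.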
We hope we haven’t misrepresented Simon’s views about the role of EMH in making the independence assumption. Hopefully we can clear this up and better understand the proper role of covariance in making predictions using betting markets.