# Does covariance matter?

Leng and I had a really interesting exchange with Simon Jackman on Twitter yesterday. Me and my whopping eight followers (hi mum) would probably agree that I suck at Twitter. It became difficult to continue the discussion within the 140-character limit, so Leng and I thought we’d try to outline it here.

The discussion was about the importance or otherwise of modelling covariance between seats, in light of the extremely low probability of Labor victory (< 1%) implied by the seat-level betting data, when modelled assuming seats are independent. Leng and I have put a bit of effort into including this covariance in our model, and we think the very low implied probabilities of Labor victory are due to ignoring this covariance. Simon suggested, though, that the Efficient Market Hypothesis (EMH), were it to hold, implied the seats should be treated as independent.

Here’s what we think Simon’s saying:

1. Bettors can bet on any seat.
2. If they think that seats are NOT independent but move in certain directions together, then this will be reflected in how they bet.
3. How they bet affects the prices of each seat.
4. Assuming the EMH holds, the price of each seat then accurately represents the probability of a party winning that particular seat.
5. Built into that probability is any covariance that bettors think exists. Therefore, it is OK to assume independence because each seat’s probability already builds in any covariance.

We agree with 1-4, but disagree with 5. For a correlated multivariate Bernoulli distribution, the probabilities $p_{1}, \ldots, p_{150}$ don’t uniquely define the covariances (although they do constrain the allowable choices of covariances). So even if the EMH holds, and $p_{1}, \ldots, p_{150}$ are known exactly, these probabilities still imply a range for each covariance, rather than a single value. If the EMH holds, we would expect this range to include the true covariance, but we don’t have enough information to define it exactly.
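
For a single pair of seats, the allowable range can be written down directly. The following is a minimal sketch, assuming only the standard Fréchet bounds on the joint probability $P(X_i = 1, X_j = 1)$. The function name `covariance_bounds` and the example probabilities are made up for illustration. Note that satisfying these pairwise bounds is necessary but not sufficient for a valid joint distribution over all 150 seats.

```python
def covariance_bounds(p_i, p_j):
    """Range of Cov(X_i, X_j) allowed by two Bernoulli marginals.

    The joint probability P(X_i = 1, X_j = 1) must lie between the
    Frechet bounds max(0, p_i + p_j - 1) and min(p_i, p_j).
    Subtracting p_i * p_j (the value under independence) turns those
    bounds into bounds on the covariance.
    """
    lowest_joint = max(0.0, p_i + p_j - 1.0)
    highest_joint = min(p_i, p_j)
    independent_joint = p_i * p_j
    return lowest_joint - independent_joint, highest_joint - independent_joint


# Hypothetical pair of seats: Labor at 30% in one and 60% in the other.
low, high = covariance_bounds(0.3, 0.6)
print(f"covariance must lie in [{low:.3f}, {high:.3f}]")
```

The lower bound is never positive and the upper bound is never negative, so zero covariance is always permitted for a pair, but so is every other value in the interval.
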

With a little algebra, you can show that for any set of seat probabilities, choosing all covariances to be zero (i.e., consistent with assuming independence) is always within the valid range of choices for covariances. But this doesn’t mean it’s the right choice. Indeed, there are good reasons to think that there are significant correlations between many seats. And if we choose valid non-zero covariances, the seat-level betting data imply a much more realistic probability of Labor victory (as much as 34% using odds from August 6), which is in better agreement with national-level betting odds.
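
Here is a rough simulation sketch of the effect. It is not the model described in this post: it assumes a one-factor Gaussian copula with a single hypothetical correlation parameter `rho`, and the seat probabilities are invented for illustration rather than taken from betting data. The point is only that the same marginal probabilities can give very different chances of a majority once seats move together.

```python
import random
from statistics import NormalDist


def simulate_majority(probs, rho, n_sims, seed=0):
    """Estimate P(majority) and mean seats won under a one-factor Gaussian copula.

    Each seat i has a latent score sqrt(rho) * m + sqrt(1 - rho) * e_i, where m
    is a swing shared by all seats and e_i is seat-specific noise. The seat is
    won when the score falls below inv_cdf(p_i), so its marginal win probability
    is exactly p_i whatever rho is. rho = 0 reproduces independent seats.
    """
    rng = random.Random(seed)
    thresholds = [NormalDist().inv_cdf(p) for p in probs]
    shared_weight = rho ** 0.5
    own_weight = (1.0 - rho) ** 0.5
    needed = len(probs) // 2 + 1  # 76 of 150 seats
    majorities = 0
    total_seats = 0
    for _ in range(n_sims):
        m = rng.gauss(0.0, 1.0)
        seats = sum(
            1
            for t in thresholds
            if shared_weight * m + own_weight * rng.gauss(0.0, 1.0) < t
        )
        total_seats += seats
        majorities += seats >= needed
    return majorities / n_sims, total_seats / n_sims


# Invented marginal probabilities of a Labor win in each of 150 seats.
seat_probs = [0.03] * 65 + [0.30] * 25 + [0.95] * 60

for rho in (0.0, 0.5):
    p_majority, mean_seats = simulate_majority(seat_probs, rho, n_sims=2000, seed=1)
    print(f"rho={rho}: P(majority) ~ {p_majority:.3f}, mean seats ~ {mean_seats:.1f}")
```

In both runs the expected number of seats is about the same, because the marginals are unchanged. Only the spread of the seat total differs, and that spread is what drives the probability of reaching 76.
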

We hope we haven’t misrepresented Simon’s views about the role of EMH in making the independence assumption. Hopefully we can clear this up and better understand the proper role of covariance in making predictions using betting markets.

## 2 thoughts on “Does covariance matter?”

1. Nice clarification.

(1) I’m really only interested in what the EMH implies; I doubt that it is true as an empirical matter.

(2) The thing I don’t know (and I should try to find a proof or see if I can develop one) is whether EMH implies zero (conditional) covariances among the events being wagered on. I understand that the marginal Bernoulli success probabilities don’t uniquely give you the joint distribution (i.e., covariances) of correlated Bernoullis. But my (possibly false) conjecture runs in the reverse direction: EMH implies zero covariances.

(3) I should be really clear about what I am proposing: my (possibly false) conjecture is that if market prices are set under EMH then seat outcomes are conditionally independent given market prices (and implied marginal probabilities). That is, given the probabilities in each seat from the EMH market, observing that seat j has gone one way or another gives you no more information about what will happen in seat k, $\forall j \neq k$.

(4) I’m thinking my conjecture is wrong in the following way. Consider this. On election night the result in seat j is revealed before the result in seat k. Labor wins seat j. If a betting market was still open for seat k, might that not affect prices in that market? Certainly yes, if the result in seat j was a shock (which could contradict EMH in itself); and maybe still yes, but perhaps only by some epsilon, if Labor was overwhelmingly expected to win seat j.

The interesting case of course is where seat j was thought to be 50-50, and then nature reveals the true state of the world. One can easily imagine the price of some seat k reacting to this information; indeed, EMH would demand that this relevant information be incorporated into the new market price in seat k, updating via Bayes’ rule, etc.
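
The updating step can be made concrete for two seats. This is a minimal sketch, assuming the two outcomes are Bernoulli with a known covariance; the function name `updated_probability` and the numbers are hypothetical.

```python
def updated_probability(p_j, p_k, cov_jk):
    """P(Labor wins seat k | Labor has won seat j) for two Bernoulli seats.

    Uses P(k | j) = P(j and k) / P(j), with P(j and k) = p_j * p_k + cov_jk.
    """
    joint = p_j * p_k + cov_jk
    # The joint probability must respect the Frechet bounds for the marginals.
    if not (max(0.0, p_j + p_k - 1.0) <= joint <= min(p_j, p_k)):
        raise ValueError("covariance is not feasible for these marginals")
    return joint / p_j


# Seat j was thought to be 50-50; seat k was priced at 40% before the result.
for cov in (0.0, 0.05):
    print(f"cov={cov}: P(k | j won) = {updated_probability(0.5, 0.4, cov):.2f}")
```

With zero covariance the price in seat k should not move at all, which is the conditional independence case in (3). Any positive covariance pushes it up once seat j falls to Labor.
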

(5) This then gets us to the more interesting problem of what is the correct correlation structure to impose across the seats. This is where historical information, modeling etc would come in. Clearly there is some state-level clustering I’d be looking at, perhaps socio-demographic characteristics of the seats too. I’ve done work on this (in another context) for the American states in my poll averaging work for Huffington Post; it is a cool problem. I really like what you did with it, averaging over the set of feasible correlation structures.

I’ve really enjoyed wrestling with this problem. I’ve almost always used a conditional independence assumption when simulating aggregate probabilities (based on some joint event) using the marginal probabilities of the underlying Bernoulli events, if only because it is so damned simple.

I’ve occasionally hedged my bets in print on whether this was correct, or asserted that the procedure is rationalized under EMH. It is good to think harder about whether this is correct. I now suspect not. Thanks.