Here at Pollsposition, our goal is not to frantically comment on the latest published poll. Though that may be entertaining, it is counterproductive to making good predictions. Our focus is not so much on polling institutes individually but more on them globally – what we refer to as “the market.” And contrary to popular belief, this market has performed quite well in recent years, in line with its historical performance.
But in order to acquire this global vision, we must analyze pollsters individually. This is what our pollster ratings do in a systematic, methodical and reproducible way. Because one of the worst uses of polls that we encounter is to select a survey that confirms what you already believe (in other words, cherry-picking).
At least two things don’t work in this tweet:
- A 5-point decline in one survey doesn’t mean much: in early August, when this tweet was sent, the aggregate decline across all polls was significantly smaller (-1 point), or a net -2 points (approval minus disapproval). That’s not great, but it’s also not a massive plunge. Just another example of a myopic reading of polls.
- A 27% approval rating may be shocking, but if you insist on reading polls one after the other, at least examine where they come from. YouGov in this case – which, as we clearly see in our tracker, systematically reports approval ratings significantly lower than the average pollster. Why? It’s hard to say. It may come from the way they ask the question, their sample, their collection method, etc. The fact is that there is a bias, and it must be taken into account. A typical example of cherry-picking.
And it is not your run-of-the-mill journalist making these mistakes: it is the editor-in-chief of the political department at Le Figaro, and his tweet was retweeted close to 500 times! Now that we’ve seen what not to do, let’s look at what we do know about French pollsters.
The differences in performance are small but noticeable
To evaluate a pollster’s performance over time, we have developed an indicator that we call "relative error". It tries to answer a simple question: during each election, how far did each institute stray from the market? The more the relative error is below 0, the more the institute outperformed the market; the more positive it is, the more it underperformed the market. For example, a relative error of -2 means the pollster was less mistaken than the market by 2 points. Our model then aggregates these errors over all elections, giving more weight to presidential elections and to the most recent elections (those taking place after 2006). With all parties combined, this is what we get:
The relative errors of pollsters, all parties combined
| Pollster | Relative error (% points) |
|---|---|
The more negative the relative error, the more the institute has outperformed the market; the more positive the error, the more it has underperformed the market. For example, a relative error of -2 means that the pollster was less mistaken than the market by 2 points.
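The indicator described above can be sketched in a few lines. This is a minimal illustration, not the article's actual model: it assumes a pollster's error for an election is its mean absolute error across parties, the market is the simple average of all firms, and the relative error is the difference between the two (the per-election aggregation weights are omitted). All numbers are invented.

```python
# Sketch of the "relative error" indicator (assumptions: pollster error =
# mean absolute error across parties; market = simple average of all firms;
# relative error = pollster error - market error, negative = beat the market).
from statistics import mean

def relative_error(pollster_scores, all_scores, result):
    """pollster_scores: one firm's final poll numbers (one per party);
    all_scores: every firm's numbers; result: the actual election outcome."""
    market = [mean(firm[i] for firm in all_scores) for i in range(len(result))]
    pollster_err = mean(abs(p - r) for p, r in zip(pollster_scores, result))
    market_err = mean(abs(m - r) for m, r in zip(market, result))
    return pollster_err - market_err

# Toy example: two parties, three hypothetical firms.
firms = [[24.0, 21.0], [26.0, 19.0], [25.0, 20.0]]
outcome = [24.0, 21.0]
print(relative_error(firms[0], firms, outcome))  # → -1.0 (beat the market by 1 point)
```

Here firm 0 nailed the outcome while the market was off by 1 point on each party, so its relative error is -1: exactly the "less mistaken than the market" reading described above.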
Our first lesson is that French institutes rarely stray far from their colleagues – no more than +/-0.5 point of difference. This is somewhat reassuring, as none is visibly very bad. But none is very good either. And this lack of dispersion in the results, combined with the fact that pollsters almost all use the same collection method (online forms), can turn into herding, which is harmful for forecasting: if all institutes err in the same direction, the models using these polls will be wrong too.
The differences in performance are nevertheless noticeable, especially when analyzed by party:
Relative errors of polling firms, by party
We can see that some institutes, such as Elabe, Odoxa or OpinionWay, stray significantly from the market – a sure sign that there are real differences in performance between firms. Others remain remarkably close to the market (Kantar and Ipsos). Perhaps they are herding. Or, on the contrary, maybe they influence other pollsters through their reputations? I tend to lean toward the second hypothesis: Ipsos and Kantar are old institutes, signatories of the ESOMAR quality charter, with historically accurate results, established collection methods and exhaustive publication of their methods and results for every survey. This likely influences their colleagues and, as a result, they come to embody the market.
Finally, the difficulty does not look symmetrical: no institute beats the market for all parties, while four perform worse than the market for all parties (BVA, Ifop, OpinionWay, and even CSA). In other words, it's easier to do worse than the market than to beat it.
Regression to the mean renews the hierarchy
Even when you are at the top of the hierarchy, nothing guarantees that you will stay there. On the one hand, this is good news: "rentier effects" seem to be small in this industry. But it also means it is harder to predict which institutes will be the best in a given election – hence the point of aggregating polls.
We can observe this regression to the mean by dividing the market into 3 categories (best, average, poor) and calculating the share of time each institute spends in each one. If there really is regression to the mean, the time each pollster spends in each category should be fairly balanced – at least not completely tilted in favor of one category. And essentially, this is what we see:
Pollsters’ performance against the market, by % of elections they covered
A relative error within +/-0.1 point counts as being in the market. To be better than the market, it takes a relative error below -0.1 point; to be worse than the market, a relative error above +0.1 point.
At best, pollsters spend 2/3 of their elections in the top category – and the same goes for the "poor" category. This means there is a certain hierarchy, but it is partially reshuffled at every election.
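The banding used above is straightforward to reproduce. This sketch applies the +/-0.1 cutoff from the chart's legend to a series of per-election relative errors (the numbers are invented) and computes the share of elections spent in each band:

```python
# Classify per-election relative errors into the three bands from the text:
# "better" (< -0.1), "market" (within +/-0.1), "worse" (> +0.1).
from collections import Counter

def band(rel_err, cutoff=0.1):
    if rel_err < -cutoff:
        return "better"
    if rel_err > cutoff:
        return "worse"
    return "market"

# Hypothetical relative errors for one firm, one value per election.
errors = [-0.4, 0.2, 0.05, -0.15, 0.3, -0.02]
shares = {b: n / len(errors) for b, n in Counter(map(band, errors)).items()}
print(shares)  # one-third of elections in each band for this invented firm
```

A perfectly mean-reverting pollster would show roughly equal shares across the three bands; a firm stuck above 2/3 in one band would contradict the pattern described above.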
Pollsters’ biases are small... except for the National Front
To explain polling errors, numerous theories circulate in public debate: polls are supposedly biased toward each pollster's favorite candidate; National Front voters are shy, so their party is systematically underestimated by the polls; and so on. The advantage of these claims is that they can be confronted with data. And – spoiler alert – they are false.
As shown in the graph below, out of the 800 polls and 15 elections that we analyzed, the only substantial bias (more than 1 point) concerns… the FN. The sign of a vast conspiracy? Not really, since the errors favor the FN: on average, pollsters overestimate the far right by 2 points. In other words, the FN is the only party for which pollsters show a substantial bias – and it is an overestimation, which contradicts the shy-voter theory – something we had already pointed out.
Weighted average bias of all pollsters, by party
The bias is the difference between the poll and the election result. A positive bias indicates an overvaluation of a given party; a negative bias indicates a party that has been undervalued. The weighting is calculated in a way that the 5 most recent elections hold the most weight.
But I have kept something from you: these averages give most of the weight to the last 5 elections in our database. When we compare them to a simple average (that gives equal weight to all elections, regardless of their date) we notice the overestimation of the FN appeared just recently – ironically, around the same time as the “shy voter” theory.
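The gap between the two averages is easy to see in miniature. This sketch uses an invented series of FN biases (poll minus result, oldest election first) and an assumed geometric decay for the recency weighting – the article's exact weighting scheme is not specified here:

```python
# Weighted vs. simple average bias (bias = poll - result per election).
# The geometric decay favoring recent elections is an illustrative assumption.
def weighted_bias(biases, decay=0.8):
    """biases: per-election (poll - result), oldest first."""
    weights = [decay ** (len(biases) - 1 - i) for i in range(len(biases))]
    return sum(w * b for w, b in zip(weights, biases)) / sum(weights)

# Hypothetical FN biases: near zero early on, overestimation recently.
fn_biases = [0.2, -0.1, 0.3, 1.8, 2.2]
simple = sum(fn_biases) / len(fn_biases)
print(round(simple, 2), round(weighted_bias(fn_biases), 2))  # → 0.88 1.15
```

Because the overestimation is concentrated in the recent elections, the recency-weighted average sits well above the simple one – the same pattern the comparison above reveals for the real data.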
Weighted average bias towards the far right of each polling institute
The bias is defined by the difference between the poll and the election result. A positive bias indicates an overvaluation of a given party; a negative bias indicates an undervaluation of the party. Elabe and Odoxa have only participated in 3 elections.
Conclusion: in the last 5 elections, all pollsters overestimated the far right – and by more than usual. So the next time you meet a believer in “shy voting”, there is a good chance she reached that conclusion either because she did not look at the data (in which case I would be grateful if you told her about us), or because she chose to ignore it. In a partisan debate, that is almost a requirement; if you want to understand what polls actually say, it is a real hurdle.