Two big polling errors, and two single parties in the lead - PollsPosition

Post mortem of the European elections

By Alexandre Andorra

On 06/18/2019

Post-election articles always present a risk of "La Bruyère syndrome" - "Everything has been said, and we come too late, for men have been living and thinking for more than seven thousand years". What can we say, then, now that the dust has settled, the subject has been discussed many times, and eyes are already turning to next year's municipal elections?

Yes, the French political landscape is now structured around two main forces - RN and LREM. But we wrote that in our first analysis. No, summarizing the election as "RN is in the lead" is not the most insightful take. But we wrote that in our last update.

The most relevant thing to do seems to be to focus on what sets us apart - the model and its performance, in light of the European election results. After all, this feedback allows us to criticize and improve our models - imagine having to evaluate a model that estimates a variable that can never be observed; it is possible, but it takes much longer. If you ask me, the ideal for the model would be to have one election per week!

So let's look at what surprised the model, what didn't surprise it but surprised the non-statistical analyses, and what everyone - including the model - expected. Spoiler alert: we're going to talk about the Greens, LR, LFI, RN, entropy, polling errors... Make yourself comfortable, it's going to be legendary.

Why did the model not anticipate errors on LR (right) and EELV (green)?

Let's start by setting the scene: how far was the model from the results? The following table makes this comparison on the 7 parties we model (the others are too close to 0 to be properly modeled):

Raw errors of the model - vote share and seats
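| Party | Vote share (%) | Median share forecast (%) | Vote share error (pts) | Seats | Median seats forecast | Seats error |
|-------|----------------|---------------------------|------------------------|-------|-----------------------|-------------|
| EELV  | 13.5           | 8.3                       | -5.2                   | 13    | 8                     | -5          |
| PS    | 6.2            | 5.2                       | -1                     | 6     | 5                     | -1          |
| LREM  | 22.4           | 22.6                      | 0.2                    | 23    | 22                    | -1          |
| DLF   | 3.5            | 4                         | 0.5                    | 0     | 0                     | 0           |
| RN    | 23.3           | 24.4                      | 1.1                    | 23    | 24                    | 1           |
| LFI   | 6.3            | 8.3                       | 2                      | 6     | 8                     | 2           |
| LR    | 8.5            | 14.2                      | 5.7                    | 8     | 13                    | 5           |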

The raw error measures the distance between the model's median and the election results but also indicates the direction of that error. A positive error means the model overvalued a given party, while a negative error indicates an undervaluation. See our method for more details.
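As a small illustration of the definition above, here is a sketch that recomputes the raw errors from the figures in the table (forecast minus result). The names `RESULTS` and `raw_errors` are ours, chosen for this example only; this is not the model's code.

```python
# Election results and model medians, copied from the table above.
# party -> (vote share %, median share forecast %, seats, median seats forecast)
RESULTS = {
    "EELV": (13.5, 8.3, 13, 8),
    "PS": (6.2, 5.2, 6, 5),
    "LREM": (22.4, 22.6, 23, 22),
    "DLF": (3.5, 4.0, 0, 0),
    "RN": (23.3, 24.4, 23, 24),
    "LFI": (6.3, 8.3, 6, 8),
    "LR": (8.5, 14.2, 8, 13),
}


def raw_errors(results):
    """Return {party: (share_error, seat_error)}, computed as forecast minus result."""
    errors = {}
    for party, (share, share_fc, seats, seats_fc) in results.items():
        # Round to one decimal to avoid floating-point noise such as -5.199999...
        errors[party] = (round(share_fc - share, 1), seats_fc - seats)
    return errors


ERRORS = raw_errors(RESULTS)
for party, (share_err, seat_err) in ERRORS.items():
    if share_err > 0:
        direction = "overvalued"
    elif share_err < 0:
        direction = "undervalued"
    else:
        direction = "on target"
    print(f"{party:5s} {share_err:+5.1f} pts  {seat_err:+d} seats  ({direction})")
```

A positive number means the model's median was above the result, a negative number means it was below.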

Of course, the two big errors on EELV (green) and LR (right) are striking: 5.2 and 5.7 points respectively, compared to a historical average error of 1 point for the Greens and 1.7 for the right. An error 3 to 5 times the historical average does seem very large.

"But is it really? Because, after all, it is normal for polls to be wrong," you will tell me, because you are an attentive reader. "Absolutely, my dear Watson," I will answer you. But as we wrote in February: "our results are conditional on the fact that 2019 polling errors are not significantly different from past polling errors". For LR and EELV, however, the errors were too large compared to their historical errors, so the model could not even consider them. That is why it gave 0% to the "LR at 8.5%" and "EELV at 13.5%" scenarios.

A good empirical way to observe this phenomenon is to check whether the distributions simulated by the model contained the final result: if so, the error was in the historical norm and the model was not surprised; if not, the error really was surprising:

Were the raw errors surprising?
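| Party | Seats | Seats error | Surprising? |
|-------|-------|-------------|-------------|
| EELV  | 13    | -5          | Yes         |
| PS    | 6     | -1          | No          |
| LREM  | 23    | -1          | No          |
| DLF   | 0     | 0           | No          |
| RN    | 23    | 1           | No          |
| LFI   | 6     | 2           | No          |
| LR    | 8     | 5           | Yes         |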

The only historically surprising errors are those on LR and EELV. This illustrates what we said in our last update: when polls are so wrong, the model can do little (at least as long as it is based only on polls, which may not be the case for the next election... #Suspense).

Let us note the two cases:

  1. LFI (far left): the error is quite high historically, but it remains within the norm. The model is not surprised and anticipated the overvaluation, giving LFI more than a 1-in-4 chance of winning 6 seats or fewer - while conventional wisdom was that LFI was likely to be underestimated, partly because it had 20% in 2017, and partly because the party itself regularly attacked the polls.
  2. LR-EELV: the error is very far from the historical norm (3 to 5 times the average). The model does not consider these scenarios to be realistic and gives them (wrongly) a near-zero probability.
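To get a feel for how far outside the norm the LR and EELV errors are, here is a rough back-of-the-envelope sketch. It is not how the model works: it assumes, purely for illustration, that polling errors are normally distributed around zero and that the historical errors quoted earlier (1 point for the Greens, 1.7 for the right) are mean absolute errors. The function name `two_sided_tail` is ours.

```python
import math

# Historical average polling errors quoted in the article, in points.
# ASSUMPTION: we treat them as mean absolute errors (MAE).
HISTORICAL_MAE = {"EELV": 1.0, "LR": 1.7}
# Size of the 2019 errors, in points, from the raw errors table.
OBSERVED_ERROR = {"EELV": 5.2, "LR": 5.7}


def two_sided_tail(observed, mae):
    """P(|error| >= observed) for a zero-mean normal error with the given MAE.

    ASSUMPTION: normal errors. For a normal distribution,
    MAE = sigma * sqrt(2 / pi), hence sigma = MAE * sqrt(pi / 2).
    """
    sigma = mae * math.sqrt(math.pi / 2)
    z = observed / sigma
    # erfc(z / sqrt(2)) is the two-sided tail probability of a standard normal.
    return math.erfc(z / math.sqrt(2))


TAILS = {p: two_sided_tail(OBSERVED_ERROR[p], HISTORICAL_MAE[p]) for p in HISTORICAL_MAE}
for party, prob in TAILS.items():
    print(f"{party}: error of {OBSERVED_ERROR[party]} pts, tail probability ~ {prob:.1e}")
```

Under these simplifying assumptions both errors come out as rare events, the EELV one even more so than the LR one, which matches the near-zero probabilities the model assigned to them.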

Do you prefer Athens or New York?

I cannot resist the temptation to introduce here a fundamental concept, both for modeling and for making decisions under uncertainty: entropy. As its name does not suggest, it is a way of quantifying the lack of information, and therefore the unpredictability, of a system.

Imagine that you live in Athens, and that the weather is nice today (I'm borrowing this example from Richard McElreath's fantastic course). What would you say if I asked you to forecast the weather for tomorrow? "The weather will be nice." Why? Because the weather in Athens is good almost all the time. Your uncertainty is low, and you will be shocked if it rains. Note that the reasoning is the same if you live in Glasgow; just replace "nice" with "rainy": you will be shocked if the weather is nice.

Now, imagine that you live in Paris or New York. Even if the weather is fine today, your uncertainty is high and you have trouble being sure of tomorrow’s weather, because the weather often varies in these two cities. Silver lining: you'll be shocked neither by rain nor sun.

Entropy measures this potential for surprise, and your goal, in a model as in any decision, is to maximize entropy, to be as little shocked as possible by the results. In short, like a Scout, the goal is to always "be prepared".
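For the curious, here is a minimal sketch of how this "potential for surprise" can be computed with Shannon's entropy formula, H = -sum(p * log2(p)). The weather probabilities below are invented for the example (they are not real climate statistics), and the function name `entropy_bits` is ours.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability events contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# HYPOTHETICAL probabilities of (nice weather, rain) tomorrow, for illustration only.
ATHENS = [0.95, 0.05]
NEW_YORK = [0.5, 0.5]

H_ATHENS = entropy_bits(ATHENS)
H_NEW_YORK = entropy_bits(NEW_YORK)

print(f"Athens:   {H_ATHENS:.2f} bits")
print(f"New York: {H_NEW_YORK:.2f} bits")
```

The "New York" distribution has the higher entropy: with two equally likely outcomes, neither can really shock you. The "Athens" distribution has low entropy, so the rare rainy day comes as a big surprise.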

"Why would he talk about entropy?”

The model's distributions illustrate this point. We have already talked about LFI, so let's take the Socialist Party (PS), which ended up with 6 seats. The model gave it roughly a 4-in-9 chance (46%) of getting 0 seats, which also means it had roughly a 5-in-9 chance of getting at least 4 seats. Here we have a "New York" distribution (remember the example above?): many events are possible, and if the model's assumptions are reasonable, no event is really surprising. The same goes for the RN-LREM duel, which could range from +7 seats for LREM to +10 seats for RN (they ended up in a tie).

Distribution of the gap between RN (far right) and LREM (center)

Each bar represents the probability that the seat-gap between far right RN and centrist LREM is equal to the given number. A positive gap means that RN finishes ahead of LREM; a negative gap means that LREM finishes ahead of RN. The higher the bar, the higher the probability.

This type of distribution may make people uncomfortable because it reflects a great uncertainty that does not lend itself to binary headlines such as "X leads in the polls". I tend to like these distributions, though, because they prepare us for the maximum number of possible scenarios (always under the model's assumptions and with the available data).

Basically, they remind us that reality is rarely simple and deterministic. After the results, most newspapers’ and editorials’ headlines were "RN wins European elections". However, with a few random variations, LREM could have finished first by a small margin, and these same newspapers would have claimed that "Macron was strengthened by European elections".

Conversely, if you want surprising distributions, take LR or EELV. The environmentalists ended up with 13 seats, when the model gave them a 95-in-100 chance of getting 5 to 11. The "13 seats" scenario involved errors that were literally too abnormal to be considered by the model.

Green Party's number of seats - Distribution anticipated by the model and final result

Each bar represents the probability that the Green Party (EELV) gets the indicated number of seats. The higher the bar, the higher the probability. When a party gets less than 5% of the votes it gets 0 seats, which explains the threshold.

It's like when you live in Athens and it starts raining even though the weather was fine yesterday: it's so weird that you didn't even imagine it. And as a result, you are surprised - and you are soaked. In this case, one month before the elections, we wrote: "depending on the model and available data, it would be surprising to see left-wing parties outperform on Election Day if one of them underperforms". LFI underperformed, but EELV still outperformed. So like you and us, the model was surprised by this scenario!

But this is the kind of information that reduces your uncertainty: next time, the model will know that this kind of error is plausible. Let's not forget, however, that the model was calibrated to get it right 5 times out of 6 (83%), and it got 5 out of 7 parties (71%) - not bad! This performance should of course be evaluated on a larger sample, but for the next elections we will consider raising this threshold to 100%: can the model be trained so that it retrospectively predicts all parties correctly, for all elections?
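How does 5 out of 7 compare with a 5-in-6 calibration target? Here is a quick sketch using the "Surprising?" column of the table above. It makes one strong simplifying assumption that the real election does not satisfy: that the 7 parties are independent trials (in reality their vote shares are linked, since they must add up). The names `HIT` and `binom_cdf` are ours.

```python
import math

# True if the result fell inside the model's simulated range,
# i.e. the party is marked "Surprising: No" in the table above.
HIT = {
    "EELV": False,
    "PS": True,
    "LREM": True,
    "DLF": True,
    "RN": True,
    "LFI": True,
    "LR": False,
}
NOMINAL = 5 / 6  # calibration target stated in the article


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


HITS = sum(HIT.values())
N = len(HIT)
COVERAGE = HITS / N
# ASSUMPTION: parties treated as independent trials, which is only a rough approximation.
P_AT_MOST = binom_cdf(HITS, N, NOMINAL)

print(f"Observed coverage: {HITS}/{N} = {COVERAGE:.0%} (target {NOMINAL:.0%})")
print(f"P(at most {HITS} hits out of {N} if well calibrated) = {P_AT_MOST:.2f}")
```

Under that independence assumption, a well-calibrated model would land on 5 hits or fewer out of 7 about a third of the time, so this single election is far too small a sample to conclude anything about calibration.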

We are skeptical about this approach, mainly because it is difficult to find a scientifically justifiable level of uncertainty to build in: we could reach 100% by generating 15-point polling errors, but such a model would not have much to say... However, skepticism must not prevent us from thinking about solutions.

Turnout, that black swan that was white?

We talked about it in our last update: turnout is often the subject of many fantasies, as if it were the black swan that could turn everything upside down. We will not repeat here our doubts about these interpretations. In very close elections, turnout can play a decisive role. But for that to happen, it must systematically favor one party while disadvantaging others. In short, we are talking about very special cases.

I think it is more appropriate to maximize our entropy by reflecting on the many factors of uncertainty that influence an election, not just by looking at turnout. This year, turnout was about 50%, 8 points higher than in 2014 and 4 points higher than in the polls. So despite this substantial polling error, we ended up with a turnout that was quite similar to previous elections, because this variable does not seem so volatile.

Is the overvaluation of extremist parties statistical or systematic?

Another feature of recent elections seems to be the overvaluation of the far-right party: pollsters have now overestimated the RN for the fourth time in a row, and 5 times in the last 6 elections. While there is no particular trend for the other parties, let's note that the far left has been overvalued too, but for a longer time (that's why the trend line is less steep than for RN):

Scatter plot and best-fit line of all pollsters' raw error for far right and far left parties

The raw error measures the distance between the weighted average of pollsters and the election results but also indicates the direction of that error. A positive error means that pollsters overvalued a given party, while a negative error indicates an undervaluation. PollsPosition calculations on 800+ polls in 16 elections. See our method for more details.

As for RN, we anticipated it in our last update: RN's 3-point increase over the last two weeks of an eminently stable campaign was surprising and seems to highlight herding – at least in part:

5-in-6-chance interval of the popular vote

Solid lines represent the median share of the popular vote of each party. Shaded areas show the range in which the true popular vote is, with a 5 in 6 chance (83%). So a hypothetical range from 20% to 25% with a 22.5% median means that the party has a 5-in-6 chance of getting 20% to 25% of the popular vote, with a median share of 22.5%.
Why take a 5-in-6 chance as a benchmark? Look at it as the probability of getting any number but the 6 when throwing a fair die.
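As an aside, here is a minimal sketch of how a 5-in-6 interval like the one described above can be read off a set of simulated vote shares. The draws below are randomly generated placeholders, not the model's output, and we assume an equal-tailed interval (1/12 of the draws left out on each side), which is one reasonable convention among others. The function name `five_in_six_interval` is ours.

```python
import random
import statistics


def five_in_six_interval(draws):
    """Return (low, median, high), where low and high are the 1/12 and 11/12 quantiles.

    ASSUMPTION: equal-tailed interval, leaving 1/12 of the draws on each side.
    """
    cuts = statistics.quantiles(draws, n=12)  # 11 cut points: 1/12, 2/12, ..., 11/12
    return cuts[0], statistics.median(draws), cuts[-1]


# HYPOTHETICAL simulated vote shares (%), for illustration only.
rng = random.Random(42)
DRAWS = [rng.gauss(22.5, 1.8) for _ in range(10_000)]

LOW, MEDIAN, HIGH = five_in_six_interval(DRAWS)
INSIDE = sum(LOW <= d <= HIGH for d in DRAWS) / len(DRAWS)

print(f"Median {MEDIAN:.1f}%, 5-in-6 interval from {LOW:.1f}% to {HIGH:.1f}%")
print(f"Share of draws inside the interval: {INSIDE:.1%}")
```

By construction, about 5 draws out of 6 fall inside the interval, which is exactly what the shaded areas of the chart represent.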

If this is the case, this behavior is quite rational on the part of pollsters: as long as they are less criticized for an overvaluation of extreme parties than for an undervaluation, they will have an interest in overvaluing.

Beyond potential herding, if this overvaluation is not explained solely by random statistical variations, it must be expected to happen again and taken into account for the next elections. Not definitively, but probabilistically: we must move our prior towards something like "the most likely outcome is that polls will overestimate far right and far left parties".

Finally, note that, for both turnout and RN, conventional wisdom got it wrong: turnout was supposed to be very low, and it was higher than expected; RN was expected to finish well ahead of LREM, and it ended up less than one point ahead. The goal is not to single out conventional wisdom but to emphasize that the model is better than us humans at managing these uncertainties, because it does not forget that they go both ways.

The model is also partially protected from herding thanks to the distributions it generates (which take into account these polling errors). But we can see in the graph above that the model's median gets misled by the polls.

What's next?

As mentioned above, the model performed well overall and provides a good baseline for what we want to accomplish - using all the available information and weighting it according to its importance, to better estimate the uncertainties and the balance of power surrounding an election.

But we saw that polls have their limits - occasionally abnormal errors, herding... Our goal for the next election will be to depend less on them, by integrating other variables into the model. One of our frustrations is also that French polls stay at the national level. We will therefore study the possibility of overcoming this constraint for the 2020 local elections and especially for the 2021 departmental elections.

So we've got many exciting projects in store! The numbers also tell us that you prefer our statistical models and analyses to more frequent but less refined publications such as the PollsCatcher. Between elections, we will therefore publish articles and podcast episodes from time to time, when we have something interesting and unique to share. We will let you know by one single means: the newsletter. So if you haven't already, I encourage you to subscribe (it's free).

In the meantime, we would like to thank you for your support and attention. There were nearly 4,000 of you reading us during the last month of the campaign. It's heart-warming, and it motivates us for the future!

Alexandre Andorra is a cofounder of PollsPosition.