Car reliability has always played a central role in Formula 1 — an inevitable consequence of teams not wanting to leave any potential performance unexploited. Mechanical DNFs at inopportune times have decided numerous championships. Who can forget Mansell’s puncture at Adelaide 1986 or Hamilton’s shock engine failure at Malaysia 2016? Many of the sport’s most memorable upsets have also occurred when reliability problems or weather conditions eliminated much of the field.
Today, it is commonly assumed that Formula 1 drivers have little influence over the reliability of their cars, with mechanical failures down to the roll of the dice. Sophisticated electronics and monitoring tools carefully protect sensitive components from inappropriate inputs. Drivers are fed constant information on their brake temperatures, fluid pressures, etc., with instructions from the pitwall on how to manage issues as soon as they appear on the telemetry trace. Gone are the days of a driver accidentally missing a gear shift or overrevving the engine. Drivers today can still put undue wear on the car in other ways, such as running too aggressively over kerbs, but tracks have become relatively sanitized and the cars ever more robust. In this article, I’ll investigate whether there is evidence of drivers, past or present, affecting their own car’s reliability.
Mechanical reliability across F1 history
Before going into this analysis, it’s important to note that car failure rates have varied over the course of Formula 1 history. To explore this, I used my database of Formula 1 race results, in which I have previously coded all DNFs as either driver DNFs (e.g., crashes) or non-driver DNFs (e.g., engine failures). This is the same database I use for my model-based driver rankings and update annually for my end of season performance rankings.
I note that this database only includes drivers who completed at least three “counting races” (i.e., races without a non-driver DNF) in at least one season in their career. It therefore excludes drivers with very short careers or drivers who rarely managed to qualify. Since these excluded drivers tended to be in poor machinery, the season-average DNF rates shown below are probably slight underestimates compared to the full historical record, but they will nevertheless capture the historical trends.
The graph below shows how the percentages of starts ending in different types of DNFs have varied over time.
Across Formula 1 history, driver DNFs (chiefly, crashes) have ranged between ~5-20% of starts, perhaps reflecting a combination of varying difficulty of tracks and cars, along with changing levels of driver skills between eras. On the other hand, non-driver DNFs (chiefly, mechanical failures) have ranged between ~10-50% of starts. Since almost all non-driver DNFs are mechanical DNFs (rare exceptions include technical disqualifications), I will refer to non-driver DNFs as mechanical DNFs from here onward in this article.
Reliability issues and total DNF rates both peaked in the mid-1960s and mid-1980s. Since 2007, we have experienced an historically low level of DNFs, driven mostly by a reduction in mechanical DNFs. Each season in the past decade has had mechanical DNFs in ≤15% of starts. This trend has had the following outcomes.
- Upset results are generally less common now. In the 1980s, a new driver got their first career win every 11 races on average, and a new driver got their first podium every 8 races. From 2007-2017, a new winner was crowned only every 20 races on average, and a new podium finisher only every 12 races.
- Given the relative rarity of DNFs today, as well as the fact that every race now counts toward the championship, a single DNF for a championship contender is viewed as a much costlier blow. Before 1991, a driver’s worst 2-7 results (rules varying by year) were dropped, meaning some DNFs had no actual consequence.
Whether these are positive or negative outcomes from a sporting standpoint depends on the aspects of the sport you find most entertaining.
Meaningful differences (and avoiding jumping to conclusions)
First, let’s consider an example that illustrates how poorly intuition can serve us when dealing with statistics of small numbers (a similar point was made in a previous article on driver crash rates).
In the image below, imagine that each box represents a race in a 20-race season, with a red box marking a mechanical DNF. Driver 2 clearly had much worse reliability in this season, with 8 mechanical DNFs to Driver 1’s 2.
[Note that such cases certainly occur in practice, and this example is similar to the 2004 McLaren drivers, who experienced 8 and 2 mechanical DNFs respectively in 18 starts.]
Can we say with confidence that these two samples are significantly different in a statistical sense? To pose the question another way, how unlikely would this season result be if Drivers 1 and 2 were actually identical in their likelihood of experiencing a mechanical DNF, and the observed difference were all down to luck?
Perhaps surprisingly, the result for this null hypothesis is p=0.065, meaning we would expect to see a difference at least this large between identical teammates in 6.5% of seasons. Using the customary (arbitrary) cut-off of p<0.05, we should conclude that there is no statistically significant difference here. We would need to see such a difference play out over a longer time period before concluding that there is likely a systematic difference in the rates of mechanical failure for these drivers. To put it another way, we don’t have the statistical power to be justified in concluding differently.
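For readers who want to reproduce this number, the two-sided test can be sketched in a few lines of Python. The implementation below is my own illustrative version of Fisher’s exact test using only the standard library; it is not necessarily the code used for the article’s analysis, but it gives the same p-value for this example.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed one."""
    n = a + b + c + d      # total races across both drivers
    k = a + c              # total mechanical DNFs
    row1 = a + b           # Driver 1's total races
    denom = comb(n, k)

    def prob(x):
        # Hypergeometric probability that Driver 1 has exactly x of the k DNFs
        return comb(row1, x) * comb(n - row1, k - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, k - (n - row1)), min(k, row1)
    # Include a tiny tolerance so floating-point ties count as "as extreme"
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Driver 1: 2 mechanical DNFs in 20 races; Driver 2: 8 in 20 races
p = fisher_two_sided(2, 18, 8, 12)
print(round(p, 3))  # 0.065
```

The same calculation is available as `scipy.stats.fisher_exact` if you have SciPy installed; the hand-rolled version just makes the hypergeometric tail sum explicit.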
Analysis of driver reliability rates
Since reliability rates differ greatly between years and between teams, we can’t look at absolute reliability rates if we are trying to establish a driver’s influence on reliability. Instead, we can compare a driver’s reliability to their teammates’ to see if there are systematic relative differences. For this analysis, I used all drivers in my database who debuted in 1980 or later. This choice was to simplify the analysis, as before this date teams often ran more than two cars and customer cars were common, both of which would make the analysis more complex.
For each driver, the p-value was computed (from Fisher’s exact test), giving the probability that such a one-sided record could be observed by chance. The table below presents the drivers with the most extreme p-values.
[Note: The statistics used below assume that each driver’s record is an independent sample from others’. In reality, this is not completely valid, as one driver’s reliability record also influences the tallies of their teammates. For drivers with longer careers and multiple teammates, we can reasonably assume that they see a sample of drivers with differing effects and are thus being compared to an estimate of the population mean. For drivers with very short careers, this is potentially problematic, however. For instance, a driver with only one teammate might appear to have a positive effect on reliability if that one teammate was particularly hard on their car. Or they might have their own positive effect masked if their only teammate had an equal positive effect on reliability. A full treatment of this statistical dependence would require a much more complicated statistical model, so here I proceed with a simplified approach to determine the general results, under a non-ideal assumption.]
Since we are performing multiple comparisons here (178 drivers in total), we need to be alert to the possibility of finding improbable results just due to the sheer number of comparisons. Even if the universal null hypothesis is true (i.e., every single driver has the same probability of DNFs as their teammates), a standard threshold of p<0.05 is expected to yield significant results for ~1 in 20 cases. A more stringent cut-off is therefore needed here.
In a large dataset such as this, we can correct for multiple comparisons using the Benjamini-Hochberg method. Allowing a typical false discovery rate of 10%, we find that only one of these comparisons can be considered statistically significant: Alain Prost’s significantly lower rate of mechanical DNFs than his teammates. Michele Alboreto’s higher rate of DNFs than his teammates is close to statistical significance. All other drivers in the sample are well over the significance threshold, meaning their results are easily accounted for by chance.
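As a sketch of how the Benjamini-Hochberg procedure works in this setting: the two smallest p-values below are the ones reported in this article (Prost’s 0.0005 and Alboreto’s 0.007), while the 0.5 values are placeholders standing in for the other 176 drivers, whose actual p-values are not listed here.

```python
def benjamini_hochberg(pvals, fdr=0.10):
    """Return a list of booleans: True where the null hypothesis is
    rejected while controlling the false discovery rate at `fdr`."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-indexed) with p_(k) <= (k / m) * fdr
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * fdr:
            k_max = rank
    # Reject the nulls for the k_max smallest p-values
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject

# 178 tests: only the smallest p-value clears its sliding threshold
# (1/178 * 0.10 ≈ 0.00056); 0.007 narrowly misses 2/178 * 0.10 ≈ 0.0011.
pvals = [0.0005, 0.007] + [0.5] * 176
flags = benjamini_hochberg(pvals, fdr=0.10)
print(sum(flags))  # 1
```

This mirrors the result in the text: one significant driver, with the next-most-extreme record just missing the cut-off.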
Although not statistically significant, Gabriele Tarquini’s record is worthy of a brief comment just for its absurdity. Tarquini drove for several of the least competitive teams of the late 1980s and early 1990s, including Osella, Coloni, Fondmetal, and AGS. This is reflected in his record of 79 race entries but only 38 starts, due to frequent difficulties qualifying. In some races he was the only entrant for his team, but in general his teammates did a much worse job of qualifying, reflected in the fact that he has only 9 starts alongside a teammate. To add insult to injury, his car broke down in 6 of those.
Michele Alboreto’s very poor reliability record is surprising. As far as I’m aware, Alboreto did not acquire a reputation as a car destroyer, although his 1985 title challenge was notably affected by poor reliability. Title protagonist Alain Prost had 3 mechanical DNFs vs. 7 mechanical DNFs for Alboreto (including 5 consecutive failures in the last 5 races), whereas Alboreto’s teammate Johansson had only 3 mechanical DNFs. Looking through Alboreto’s career, lower reliability than his teammates is a consistent feature. Was it all incredible misfortune or was Alboreto actually too hard on his equipment? This one is difficult to call.
Alain Prost: a unique case
In Alain Prost’s case, and his case alone, it is extremely likely that we are seeing a systematic difference in mechanical reliability, rather than a difference that could be attributed to chance.
As to the cause of this difference, we can consider two possibilities. One is Prost’s widely held reputation as a driver who was exceptionally gentle with his machinery and never pushed more than was required. The other is the possibility that Prost was given superior equipment to his teammates. The latter possibility can be tested (and rejected) by comparing Prost’s DNF rates against his various teammates, and by comparing Prost’s record to other top drivers.
The table below shows how Prost’s reliability compared to each of his teammates. As we can see, Prost’s reliability was consistently better, even against teammates such as Lauda, Rosberg, and Senna, whom he faced on relatively equal terms. Moreover, his advantage was no larger against more junior teammates, such as Johansson and Alesi, where he would be expected to benefit most from number 1 driver status.
It’s interesting to note that Rene Arnoux also has a very favorable career reliability record, with 43 mechanical DNFs to his teammates’ 67 (p=0.004). He is not included in this analysis sample, due to debuting before 1980, but a quick-and-dirty analysis shows that in terms of the p-value he would rank only behind Alain Prost (41-73, p=0.0005) and Jean-Pierre Jabouille (a car destroyer: 23-9, p=0.0008) if the analysis were formally extended to include drivers who debuted from 1950-1979.
Next, we can look at Prost’s record alongside other world champions from the sample, many of whom enjoyed strict number 1 driver status for most of their careers.
As we can see from this, only Prost has a record that is well outside the range of chance. The next closest in terms of statistical significance is Nigel Mansell, who actually trends in the other direction (more mechanical failures than his teammates).
We are therefore left with the inescapable conclusion that Alain Prost had a significant positive influence as a driver on the reliability of his own cars.
Other interesting cases
Using this approach, it’s interesting to investigate some other drivers who are perceived as frequently breaking down or experiencing abnormal levels of misfortune.
Andrea de Cesaris was a renowned DNF specialist, holding the record for most total DNFs and most consecutive DNFs. My previous analysis of crashes also ranked him among the most crash-prone drivers (living up to his unflattering nickname), which helped to significantly boost his DNF tally. While the cars he drove across his career were highly unreliable, his reliability record was not significantly worse than that of his teammates: 90 mechanical DNFs to 79 of his teammates (p=0.30).
Mark Webber was often considered prone to poor reliability throughout his Formula 1 career, especially during his days at Jaguar and Red Bull. An analysis of his career reliability failures shows no bias, however. Webber had 34 mechanical DNFs to 36 of his teammates. This analysis doesn’t include cases where a driver had mechanical issues yet was able to finish the race, which perhaps afflicted Webber to some degree.
Jean Alesi is another name often mentioned in discussions of the most unlucky Formula 1 drivers. His decision to drive for Ferrari over Williams in 1991 was surely an unfortunate one, and he is well remembered for near misses on the way to his first win. As one of my previous analyses showed, he may have won the 1992 championship title, had he been at Williams. As far as reliability went, however, he was luckier than his teammates if anything, with 57 mechanical DNFs to 70 for his teammates (p=0.20).
Conclusions
To my knowledge, this is the first formal analysis of drivers’ reliability rates compared to their teammates across a large part of Formula 1 history. The key findings are:
- Since about 2007, reliability has been much higher in Formula 1. It is reasonable to assume that drivers today have less influence on mechanical reliability than drivers in the 1980s and earlier.
- Caution is needed whenever trying to draw conclusions about luck or driver influence from a single season — pure chance can cause large discrepancies in small samples.
- In the great majority of cases, career differences in reliability between drivers and their teammates are explainable by chance alone (i.e., the null hypothesis). This is not to say that drivers cannot influence mechanical reliability, but it does imply that if there is an effect of the driver it must generally be quite small (too small to detect from a typical Formula 1 career). This is consistent with an assumption of some of my modeling, which is that non-driver DNFs are mostly down to chance.
- Alain Prost stands out as a singular example of a driver who positively influenced the reliability of his cars. In models that ignore non-driver DNFs altogether, he is therefore going to be slightly underrated due to this virtue being neglected.
Comments
Mate, it makes me very happy every time I see one of your f1metrics emails. This was a very nice read and very well timed given the current Ricciardo/Verstappen situation. 😀
A great post as always!
I always thought Michael Schumacher had much better reliability than his teammates. Apparently that was not the case. Possibly my perception was skewed by Schumacher’s near-bulletproof reliability in his dominant Ferrari years (2001-2004). Compared to Barrichello at Ferrari the number of technical retirements were:
2001: 2-1 (although Barrichello retired a few laps before the end of the US Grand Prix)
2002: 0-2 (although Barrichello failed to even take the start in two GPs)
2005: 2-2 (Schumacher retired after suffering two punctures, so I counted this as a technical problem; however, these punctures were most likely a result of his too-aggressive driving style or a poor car set-up. I don’t know how this was interpreted by you. Barrichello had similar tire-related problems in Malaysia)
So in the end Barrichello “beat” Schumacher with 10 to 6 in terms of mechanical issues in the six years they were teammates, which can be increased to 13-6 if you include his failure to start two races in 2002 and his blown engine in the 2001 US GP. The failure rates were still quite low and the difference is likely not statistically significant. But what teammate did beat Schumacher in terms of reliability?
Great work as usual
Fantastic post, and a very thorough analysis – I am so happy I have found this website! I do have some questions though. First, what was the significance level used? Alboreto’s p-value is 0.007, which should be considered significant with the standard 0.05 cutoff. (Is this because of the Benjamini-Hochberg correction?) Second: in the last table, some drivers have p-values equal to and very close to 1.00. How should those be interpreted? Thanks!
Yes, the Benjamini-Hochberg correction gives a sliding scale of p-value thresholds, below which a result is considered significant. Alboreto’s doesn’t quite make that cut-off. A value of 1 or very close to 1 means it’s one of the most probable outcomes (i.e., almost all other outcomes are less probable).
OK, makes sense, thanks!
A fantastic and well-constructed analysis as always! I have a query about the way the data was handled. If I’ve understood the methodology correctly, you’ve treated the data as a discrete variable, with each race resulting in either a finish or a (non-driver) DNF, i.e., 1s and 0s. If this is the case, I can understand the approach, as it ties in with your other analyses on scoring rates, points per race, and measuring driver performance.
However, in this case, I wonder whether it might make more sense to treat race completion as a continuous variable, by the percentage of the race completed leading up to each non-driver DNF. Although failures can be catastrophic, some can also develop gradually, so a continuous variable approach may reflect the reality better.
Furthermore, a driver can have a substantially more positive influence on the reliability of the car than a given teammate does, but one which is still insufficient to make the car last the whole race. For example, a race where one teammate finishes and the other completes 85% of the race distance is treated as the same result as a race where one teammate finishes and the other manages only 10% of the race. Similarly, a race where one driver has a failure after 20% race distance and their teammate after 40% suggests an influence on reliability similar to a pairing where one driver finishes and the other retires after 50% of the race – but only the latter result would be included in your results.
Approaching race completion as continuous would expand your dataset, and may then improve the statistical confidence in some of the results.
At the moment, I think the method is addressing the question ‘do drivers influence mechanical reliability enough to affect their finishing position’, which is more meaningful for your measurement of points-per-race on the expanded points system, but is more restrictive than the stated question of ‘do drivers influence mechanical reliability’ – and I’d love to hear your thoughts!
Looking forward to an end-of-season post, time permitting.
That is a very good point. I don’t have lap of retirement recorded in this particular dataset, but it’s something I can obtain without much trouble. I’ll try looking into this!
[…] himself somehow causing the failures! But as I showed recently, we should always be cautious in reaching conclusions about reliability from small samples. By the end of the season, Verstappen actually had fewer mechanical DNFs than his teammate […]
[…] of how important it is to finish second, and that on Sundays the priority is above all to score points. And the blog F1Metrics, which specializes in applying statistics to the sport, reached an astonishing conclusion. Alain Prost […]
Hi, don’t know if you’ll see this, but it would be interesting to know the p-value of Ricciardo’s breakdowns this year vs. Verstappen’s, maybe in the end-of-year write-up? Plenty of conspiracy theories out there that Red Bull are deliberately sabotaging his car as punishment for leaving, so I wondered what the random chance of it happening is…
[…] and that non-driver DNFs such as mechanical failures during races are largely down to chance (note: I’ve shown this to be true in general). There are many factors the model does not incorporate (largely because they are currently […]
Very interesting stuff 🙂
I remember that back in the 2000s, Räikkönen seemed to have more Mercedes engine failures than his teammates, and at the time I even heard a theory that he could have benefited from a different engine than his teammates: more powerful but less reliable. So, in order to look into it, I counted all of his engine failures during his McLaren years. I’m not talking about mechanical DNFs, and not even only about engine-failure DNFs, because since 2004 an engine failure during practice or qualifying has a direct impact on the race via the grid penalty, so the engine is supposed to last through the whole weekend. So, for this analysis, I counted every engine DNF in 2002 and 2003, and every engine failure during race weekends from 2004 to 2006.
My results are these (over 88 GPs):
Räikkönen: 19 engine failures
Teammates: 7 engine failures.
I guess the numbers are too weak to be statistically significant, but I still found these results too close to what I would have guessed to stop there. So, I decided to push further and analyse KR’s engine failures at other teams. FYI, I did this at the beginning of 2018, so this doesn’t take into account any engine failures during this season.
Over his whole career outside McLaren, Räikkönen had only 6 engine failures, to his teammates’ 7.
And about mechanical failures other than engine:
With McLaren, 9-10 for Räikkönen.
With other teams, 14-14 for Räikkönen.
Isn’t that peculiar?
Stumbled across this to provide ammo for a discussion I was having about driver reliability and I wanted to say how interesting I found it. I’m descending down the rabbit hole of your site now to find more gems!
As always an interesting read.
I was watching a Jim Clark documentary the other day. They mentioned several times that he was not only quicker than anybody else, but also smoother. That begs the question: how does Jim Clark compare to his factory teammates (at least when they drove the same model of car)? Would it be possible to investigate with the restriction that you only use data from teammates in the same entry (Team Lotus), and only if they drove the same car as him?
If possible, then another interesting case would be Jackie Stewart, as he often made a great deal of his smooth driving.
Yes, I agree! I’m actually working on such a distinction in the dataset right now.
[…] interesting facts that I might not have otherwise noticed, such as the role of F1 drivers in determining the likelihood of mechanical DNFs. It also allowed for other applications, such as reevaluating each year of the F1 world […]