F1 ELO Rankings

Data Source

Dataset Overview

The Formula 1 data used in this analysis is sourced from the Formula 1 World Championship (1950-2024) dataset available on Kaggle.

Race Data

Race results (1950-2024)
Qualifying session results
Detailed DNF reasons

Supporting Information

Driver biographical information
Constructor (team) details
Circuit specifications

ELO Rating System Overview

Origins and Applications

The ELO rating system, developed by Arpad Elo, was originally created for chess rankings but has since proven remarkably versatile across various competitive contexts:

Chess and other board games (FIDE World Rankings)
Video games (competitive matchmaking systems)
Sports analytics (football, basketball)
Academic performance assessment

The system's brilliance lies in its ability to create meaningful comparisons through transitive relationships. For example, in chess, if Player A beats Player B, and Player B beats Player C, the system can infer relative skill levels even if A and C never play each other directly.
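A few lines of Python show this transitive effect. The sketch uses the standard Elo expected-score curve and a fixed K of 32, which is a common chess value assumed here only for illustration. The helper names `expected_score` and `play` are hypothetical. Three players start at 1500. A beats B, then B beats C. A and C never meet, yet A ends up rated above C.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def play(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one decisive game in place: the winner scores 1, the loser 0."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta  # same K for both, so the exchange is zero-sum


ratings = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
play(ratings, "A", "B")  # A beats B
play(ratings, "B", "C")  # B beats C; A and C never play each other
print({name: round(r, 1) for name, r in ratings.items()})
# {'A': 1516.0, 'B': 1500.7, 'C': 1483.3}
```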

Project Inspiration

This project was inspired by the comprehensive analysis presented in "I made an F1 ELO Engine. Who's highest rated?" by Mr V's Garage. The video demonstrated how systematic analysis of historical F1 data could reveal insights about driver performance across eras.

Building upon this concept, this project aims to:

Create an objective, data-driven rating system
Account for historical context and era differences
Provide transparent methodology and results
Make findings accessible through interactive visualization

Key Implementation Principles

Core Challenges in F1 Analysis:

Car performance significantly influences results
Multiple competitors in each event
Technical failures affect outcomes
Varying track characteristics
Different era characteristics (reliability, season length, data quality)

Our implementation addresses these challenges through several key principles:

Teammate Comparisons: Teammates drive cars of essentially the same design, so comparing their results isolates driver skill from car performance far better than comparing drivers across different teams.
Cross-Team Comparisons: When drivers switch teams, they create a network of indirect comparisons. For example, if Driver A beats teammate B, and B later beats C at another team, we can infer relative performance between A and C. This network effect allows the system to build a comprehensive ranking across the entire grid.
Dynamic Learning Rates: The K-factor system adapts based on:
- Driver experience level (higher for rookies, lower for veterans)
- Season length (normalized across different eras)
- Historical era (adjusted for data reliability)
Reliability Handling: The system accounts for:
- Technical DNFs (excluded from comparison)
- Driver-caused DNFs (counted as losses)
- Era-specific withdrawal rules (pre/post 1970)
- Shortened races (50% weight)
Historical Context: Results are weighted differently to account for:
- Reliability differences across eras
- Frequency of races per season
- Quality of available performance data
- Different career patterns and longevity
Confidence Metrics: Rating reliability is assessed through:
- Career length and sample size
- Performance consistency
- Rating volatility
- Era-specific considerations
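The teammate principle can be sketched in Python, with technical DNFs excluded and driver-caused DNFs counted as losses. The `Entry` record and its field names are hypothetical and do not match the Kaggle dataset's columns. The sketch also assumes `finish_rank` already places driver-caused DNFs behind classified finishers, so those DNFs lose the duel automatically.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entry:
    # Hypothetical record shape; field names are illustrative only.
    race_id: int
    constructor: str
    driver: str
    finish_rank: int      # lower is better; assumed to rank driver-caused DNFs last
    technical_dnf: bool   # mechanical failure, excluded from comparison


def teammate_duels(entries: list) -> list:
    """Return (race_id, winner, loser) for every teammate pair in the same race."""
    groups = defaultdict(list)
    for e in entries:
        if not e.technical_dnf:  # a mechanical failure says nothing about the driver
            groups[(e.race_id, e.constructor)].append(e)
    duels = []
    for (race_id, _), team in groups.items():
        for a, b in combinations(team, 2):
            if a.finish_rank == b.finish_rank:
                continue  # this sketch does not handle ties
            winner, loser = (a, b) if a.finish_rank < b.finish_rank else (b, a)
            duels.append((race_id, winner.driver, loser.driver))
    return duels


entries = [
    Entry(1, "Ferrari", "A", 2, False),
    Entry(1, "Ferrari", "B", 5, False),
    Entry(1, "McLaren", "C", 1, False),
    Entry(1, "McLaren", "D", 20, True),  # technical DNF: C gets no duel this race
]
print(teammate_duels(entries))  # [(1, 'A', 'B')]
```

A driver who changes teams appears in duels against different teammates. The resulting list of pairs therefore links otherwise separate teams, and this is what makes cross-team comparisons possible.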

Core ELO Calculation

1. Base Rating

\[R_{initial} = 1500\]

All drivers start with a base rating of 1500 points. This value:

Serves as the average skill level
Provides numerical stability
Follows traditional ELO standards

2. Expected Score Calculation

\[E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}\]

\(E_A\): Expected score for Driver A
\(R_A\): Current rating of Driver A
\(R_B\): Current rating of Driver B
\(400\): Scaling factor; a 400-point rating advantage corresponds to an expected score of about 0.91 (10:1 odds)
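Here is a direct Python transcription of the expected-score formula. It shows what the 400 scale means in practice. Equal ratings give 0.5, and a 400-point gap gives 10/11 ≈ 0.909. The two drivers' expected scores always sum to 1. The function name is illustrative.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """E_A = 1 / (1 + 10^((R_B - R_A) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


print(round(expected_score(1500, 1500), 3))  # 0.5
print(round(expected_score(1900, 1500), 3))  # 0.909
print(round(expected_score(1500, 1900), 3))  # 0.091
```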

3. K-Factor Calculation

The K-factor determines the magnitude of each rating change and varies with driver experience:

Rookie Phase (0-10 races)

\[K_{rookie} = K_{max} \cdot 1.0\]

Maximum learning rate applied

Learning Phase (10-25 races)

\[progress = \frac{races - 10}{15}\]
\[K_{learning} = K_{max} \cdot (1.0 - 0.6 \cdot progress)\]

Learning rate decreases linearly from \(K_{max}\) at 10 races to \(0.4 \cdot K_{max}\) at 25 races

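Experience Phase (25+ races)

\[progress = \min(1.0, \frac{races - 25}{25})\]
\[K_{experienced} = K_{max} \cdot (0.4 - 0.2 \cdot progress)\]

Stabilized learning rate for experienced drivers: it decreases from \(0.4 \cdot K_{max}\) at 25 races to \(0.2 \cdot K_{max}\) at 50 races. It then stays constant, because progress is capped at 1.0.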
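Together, the three phases form one continuous, piecewise-linear schedule. Below is a minimal Python sketch of that schedule. `K_MAX = 32` is a hypothetical value, because this section does not state \(K_{max}\), and the era and season factors are left out.

```python
def k_phase(races: int, k_max: float) -> float:
    """Experience-phase K-factor as a function of races already completed."""
    if races <= 10:
        # Rookie phase: full learning rate
        return k_max * 1.0
    if races <= 25:
        # Learning phase: linear taper from K_max to 0.4 * K_max
        progress = (races - 10) / 15
        return k_max * (1.0 - 0.6 * progress)
    # Experience phase: taper from 0.4 * K_max to 0.2 * K_max, flat after 50 races
    progress = min(1.0, (races - 25) / 25)
    return k_max * (0.4 - 0.2 * progress)


K_MAX = 32.0  # hypothetical; the document does not give K_max
for n in (0, 10, 20, 25, 40, 50, 200):
    print(n, round(k_phase(n, K_MAX), 2))
```

With \(K_{max} = 32\), the schedule gives 32 through 10 races, 19.2 at 20 races, 12.8 at 25 races, and 6.4 from 50 races onward.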

Final K-Factor Calculation

\[K_{final} = K_{phase} \cdot F_{era} \cdot F_{season}\]

\(K_{phase}\): K-factor from the applicable experience phase (\(K_{rookie}\), \(K_{learning}\), or \(K_{experienced}\))
\(F_{era}\): Historical era adjustment
\(F_{season}\): Season length normalization

4. Rating Update Formula

\[R_{new} = R_{old} + K(S_{actual} - E_{expected})\]

\(R_{new}\): Updated rating
\(R_{old}\): Previous rating
\(K\): Calculated K-factor
\(S_{actual}\): Actual score (1 for win, 0 for loss)
\(E_{expected}\): Expected score
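The sketch below combines these pieces into a single head-to-head update. This section does not specify the era and season factors (`f_era`, `f_season`), so the example values are hypothetical placeholders. The function names are also illustrative.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """E_A = 1 / (1 + 10^((R_B - R_A) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def k_final(k_phase: float, f_era: float, f_season: float) -> float:
    """K_final = K_phase * F_era * F_season."""
    return k_phase * f_era * f_season


def update(r_old: float, r_opponent: float, s_actual: float, k: float) -> float:
    """R_new = R_old + K * (S_actual - E_expected); s_actual is 1 for a win, 0 for a loss."""
    return r_old + k * (s_actual - expected_score(r_old, r_opponent))


# Hypothetical example: driver A (1550) beats teammate B (1500).
k_a = k_final(k_phase=12.8, f_era=1.0, f_season=0.9)  # placeholder factors
k_b = k_final(k_phase=32.0, f_era=1.0, f_season=0.9)  # B is a rookie: larger K
new_a = update(1550.0, 1500.0, 1.0, k_a)
new_b = update(1500.0, 1550.0, 0.0, k_b)
print(round(new_a, 1), round(new_b, 1))
```

Each driver's K depends on their own experience, so two teammates can receive different K values. An update therefore need not be zero-sum. It is zero-sum only when both drivers share the same K.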

Confidence Metrics

1. Rating Volatility

Formula

\[\sigma = \sqrt{\frac{\sum_{i=1}^{n}(R_i - \bar{R})^2}{n-1}}\]

Where:

\(\sigma\): Rating volatility
\(R_i\): Individual rating in history
\(\bar{R}\): Mean rating
\(n\): Number of ratings

Interpretation Guide

Volatility (\(\sigma\))	Category	Interpretation
0-50	Very stable performance	Consistently meets expectations
51-100	Normal variation	Typical performance fluctuations
101-150	Moderate volatility	Occasional inconsistency
151-200	High volatility	Significant performance swings
201+	Extreme volatility	Very inconsistent results

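Higher volatility does not necessarily indicate poor performance. It measures how much a driver's rating has moved over a career. Because \(\sigma\) is computed around the career mean, a steady rise or decline in rating raises it just as erratic results do. Car reliability, team changes, and competing across different eras can all contribute to higher volatility.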
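Volatility is the sample standard deviation of a driver's rating history. Python's `statistics.stdev` computes it with the same \(n-1\) denominator. The sketch below also maps \(\sigma\) onto the interpretation bands. Two choices here are assumptions: each band's upper bound is treated as inclusive, so 50.5 falls in "Normal variation", and a history with fewer than two ratings returns 0. The example is a steady climb with no erratic results, which still lands in the high-volatility band.

```python
import statistics

# (inclusive upper bound, label) pairs taken from the interpretation guide
BANDS = [
    (50.0, "Very stable performance"),
    (100.0, "Normal variation"),
    (150.0, "Moderate volatility"),
    (200.0, "High volatility"),
]


def volatility(history: list) -> float:
    """Sample standard deviation (n - 1 denominator) of a rating history."""
    if len(history) < 2:
        return 0.0  # assumption: too few ratings is treated as zero volatility
    return statistics.stdev(history)


def volatility_band(sigma: float) -> str:
    """Return the interpretation-guide label for a volatility value."""
    for upper, label in BANDS:
        if sigma <= upper:
            return label
    return "Extreme volatility"


steady_climb = [1500, 1600, 1700, 1800, 1900]
sigma = volatility(steady_climb)
print(round(sigma, 1), volatility_band(sigma))  # 158.1 High volatility
```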

2. Confidence Interval Calculation

Base Standard Error

\[SE_{base} = \frac{200}{\sqrt{\max(1, RaceCount)}}\]

Initial uncertainty based on number of races

Adjusted Standard Error

\[SE_{adjusted} = SE_{base} \cdot (1 + \frac{\sigma}{100}) \cdot (1 + \frac{CareerSpan}{50})\]

\(\sigma\): Rating volatility
\(CareerSpan\): Driver's F1 career length in years

95% Confidence Interval

\[CI = Rating \pm (1.96 \cdot SE_{adjusted})\]

This interval is intended to contain the driver's true rating with 95% confidence; 1.96 is the two-sided z-score for a 95% confidence level.
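The interval calculation translates directly into Python. This is a minimal sketch, and the function and argument names are hypothetical. Take a driver with 100 races, \(\sigma = 50\), and a 10-year career. Then \(SE_{base} = 20\), \(SE_{adjusted} = 20 \cdot 1.5 \cdot 1.2 = 36\), and the half-width is \(1.96 \cdot 36 = 70.56\) points.

```python
import math


def confidence_interval(rating: float, race_count: int, sigma: float,
                        career_span_years: float) -> tuple:
    """Return the (lower, upper) 95% interval around a driver's rating."""
    se_base = 200.0 / math.sqrt(max(1, race_count))
    se_adjusted = se_base * (1 + sigma / 100.0) * (1 + career_span_years / 50.0)
    half_width = 1.96 * se_adjusted
    return rating - half_width, rating + half_width


low, high = confidence_interval(1700.0, race_count=100, sigma=50.0, career_span_years=10.0)
print(round(low, 2), round(high, 2))  # 1629.44 1770.56
```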

3. Confidence Score Calculation

\[Score = 100 \cdot \frac{Width_{max} - Width_{current}}{Width_{max} - Width_{min}}\]

\(Width_{current}\): Driver's CI width (\(2 \cdot 1.96 \cdot SE_{adjusted}\))
\(Width_{max}\): Largest CI width among all drivers
\(Width_{min}\): Smallest CI width among all drivers
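The confidence score is a min-max rescaling of interval widths across all rated drivers. In the Python sketch below, the narrowest interval scores 100 and the widest scores 0. The driver names and widths are made up. When all widths are equal, the formula would divide by zero; returning 100 for every driver in that case is an assumption.

```python
def confidence_scores(widths: dict) -> dict:
    """Map each driver's CI width to a 0-100 score (narrower interval -> higher score)."""
    w_max = max(widths.values())
    w_min = min(widths.values())
    if w_max == w_min:
        return {driver: 100.0 for driver in widths}  # assumption for the degenerate case
    return {driver: 100.0 * (w_max - w) / (w_max - w_min) for driver, w in widths.items()}


print(confidence_scores({"A": 100.0, "B": 200.0, "C": 300.0}))
# {'A': 100.0, 'B': 50.0, 'C': 0.0}
```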

4. Reliability Grade Scale

Grade	Confidence Score Range	Interpretation
A+	90-100	Extremely reliable: Large sample size, consistent performance
A	80-89	Very reliable: Substantial data, stable performance
B+	70-79	Reliable: Good sample size, relatively consistent
B	60-69	Moderately reliable: Decent sample size
C+	50-59	Somewhat reliable: Limited but meaningful data
C	40-49	Limited reliability: Small sample size or high volatility
D+	30-39	Questionable reliability: Very small sample or extreme volatility
D	20-29	Very questionable reliability: Minimal data points
F	0-19	Highly unreliable: Insufficient data for meaningful rating
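The grade table is a threshold lookup. In this Python sketch, each grade starts at the lower bound of its range. A fractional score such as 89.5 therefore falls into the lower grade (A rather than A+). This handling of non-integer scores is an assumption, and the names are illustrative.

```python
# (minimum score, grade), checked from highest to lowest
GRADE_FLOORS = [
    (90, "A+"), (80, "A"), (70, "B+"), (60, "B"),
    (50, "C+"), (40, "C"), (30, "D+"), (20, "D"),
]


def reliability_grade(score: float) -> str:
    """Return the reliability grade for a 0-100 confidence score."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"


print([reliability_grade(s) for s in (95, 89.5, 55, 20, 7)])
# ['A+', 'A', 'C+', 'D', 'F']
```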

Driver Classification System

Overview

The driver classification system is designed to reflect a driver's experience level while accounting for the significant historical changes in Formula 1. The sport has evolved dramatically over its history, particularly in season length and career duration. To address this, the project uses a dual-era classification system that applies different thresholds depending on when a driver competed.

Historical Context

Pre-1980 Era

Shorter seasons (7-11 races per year through the 1960s, rising to roughly 15-17 by the late 1970s)
Higher safety risks leading to shorter careers
Common practice of participating in multiple racing series
Less specialized preparation and testing
Limited practice sessions and qualifying runs

Modern Era (1980-Present)

Longer seasons (14-16 races per year in the 1980s, rising past 20 in recent years)
Improved safety standards enabling longer careers
Increased specialization and focus on F1
Extensive testing and practice opportunities
Complex qualifying formats and race weekends

Classification Levels

Pre-1980 Era Thresholds

Level	Races	Description
Rookie	0-10	Initial entry period and learning phase
Intermediate	11-25	Developing consistency and race craft
Experienced	26-40	Demonstrated competency and consistent performance
Veteran	41-50	Proven long-term competitor
Legend	51+	Sustained presence and extensive experience

Modern Era Thresholds (1980-Present)

Level	Races	Description
Rookie	0-25	Roughly one season on the current calendar
Intermediate	26-75	Multiple seasons of experience
Experienced	76-150	Comprehensive understanding and proven race craft
Veteran	151-250	Long-term competitor across multiple eras
Legend	251+	Exceptional longevity and mastery
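Both threshold tables can be applied with a single lookup. The text does not say which era applies to a career that straddles 1980. This Python sketch assumes the debut year decides, and the function and constant names are hypothetical.

```python
# (inclusive upper bound on races, level); anything above the last bound is "Legend"
PRE_1980 = [(10, "Rookie"), (25, "Intermediate"), (40, "Experienced"), (50, "Veteran")]
MODERN = [(25, "Rookie"), (75, "Intermediate"), (150, "Experienced"), (250, "Veteran")]


def classify(races: int, debut_year: int) -> str:
    """Return the experience level for a race count under the era's thresholds."""
    thresholds = PRE_1980 if debut_year < 1980 else MODERN  # assumption: era set by debut year
    for upper, level in thresholds:
        if races <= upper:
            return level
    return "Legend"


print(classify(45, 1958), classify(45, 2010))  # Veteran Intermediate
```

The same 45-race career reads as Veteran in the 1950s and as Intermediate today. This is the effect the dual-era scheme is designed to produce.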

System Justification

This dual-era classification system provides several key advantages:

Historical Accuracy: Recognizes the different demands and circumstances of each era
Fair Comparison: Allows meaningful comparison of drivers across different periods
Career Context: Accounts for the evolution of season length and career duration
Experience Recognition: Acknowledges both short-term intensity and long-term consistency
Era-Appropriate Evaluation: Reflects the different skill development patterns of each period

Special Cases and Data Processing

Race Event Handling

Race Exclusions

Indianapolis 500 (1950-1960)

Different regulations and non-representative competition

50% Weight Races

1976 Japanese GP Shortened race
1991 Australian GP Rain-shortened
2009 Malaysian GP Rain-shortened
2021 Belgian GP Minimal racing laps
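One way to apply these rules is to convert each race into a weight on the rating change. An excluded race gets 0, a listed shortened race gets 0.5, and every other race gets 1. This Python sketch assumes that "50% weight" means scaling the K-factor by the weight. It also assumes the race names match the dataset's naming (for example "Malaysian Grand Prix"). Both are assumptions, not confirmed details of the implementation.

```python
# Races listed above with 50% weight, keyed by (season, race name)
HALF_WEIGHT = {
    (1976, "Japanese Grand Prix"),
    (1991, "Australian Grand Prix"),
    (2009, "Malaysian Grand Prix"),
    (2021, "Belgian Grand Prix"),
}


def race_weight(year: int, race_name: str) -> float:
    """Return the weight applied to rating changes from this race."""
    if race_name == "Indianapolis 500":
        return 0.0  # excluded: contributes no rating change
    return 0.5 if (year, race_name) in HALF_WEIGHT else 1.0


def weighted_delta(k: float, s_actual: float, e_expected: float, weight: float) -> float:
    """Rating change K * (S - E), scaled by the race weight (assumed approach)."""
    return weight * k * (s_actual - e_expected)


w = race_weight(2009, "Malaysian Grand Prix")
print(w, weighted_delta(32.0, 1.0, 0.5, w))  # 0.5 8.0
```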

Race Result Processing

Scope of Analysis

Only main race results are processed, excluding:

Qualifying sessions
Sprint races
Practice sessions

Non-Start Classifications

Injury
107% Rule
Did not qualify
Pre-race injury
Safety concerns
Excluded
No pre-qualify
Fatal accident

Penalized Results

Driver-caused DNFs count as an automatic loss (an actual score of 0):

Accident
Collision
Spun off

Withdrawal Handling

Pre-1970 Era

Counted if driver had grid position

Post-1970 Era

Counted only if at least one lap completed
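The result-processing rules reduce to a small decision function for each entry. The Python sketch below is hypothetical. The non-start and driver-fault status strings come from the lists above. Several other details are illustrative assumptions:

- the "Withdrew" status string
- the technical-failure set
- treating 1970 itself as part of the post-1970 era
- comparing a counted withdrawal on finishing order

```python
NON_START = {
    "Injury", "107% Rule", "Did not qualify", "Pre-race injury",
    "Safety concerns", "Excluded", "No pre-qualify", "Fatal accident",
}
DRIVER_FAULT = {"Accident", "Collision", "Spun off"}
TECHNICAL = {"Engine", "Gearbox", "Hydraulics", "Electrical"}  # illustrative subset only


def outcome(status: str, year: int, had_grid_position: bool, laps_completed: int) -> str:
    """Return "skip" (not compared), "loss" (score 0), or "compare" (use finishing order)."""
    if status in NON_START or status in TECHNICAL:
        return "skip"
    if status in DRIVER_FAULT:
        return "loss"
    if status == "Withdrew":  # assumed status string
        counted = had_grid_position if year < 1970 else laps_completed >= 1
        return "compare" if counted else "skip"  # assumption: counted -> compared
    return "compare"


print(outcome("Collision", 2005, True, 12))  # loss
print(outcome("Withdrew", 1965, True, 0))    # compare
print(outcome("Withdrew", 1985, True, 0))    # skip
```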

Methodology