Methodology
Data Source
Dataset Overview
The Formula 1 data used in this analysis is sourced from the Formula 1 World Championship (1950-2024) dataset available on Kaggle.
Race Data
- Race results (1950-2024)
- Qualifying session results
- Detailed DNF reasons
Supporting Information
- Driver biographical information
- Constructor (team) details
- Circuit specifications
ELO Rating System Overview
Origins and Applications
The ELO rating system, developed by Arpad Elo, was originally created for chess rankings but has since proven remarkably versatile across various competitive contexts:
- Chess and other board games (FIDE World Rankings)
- Video games (competitive matchmaking systems)
- Sports analytics (football, basketball)
- Academic performance assessment
The system's brilliance lies in its ability to create meaningful comparisons through transitive relationships. For example, in chess, if Player A beats Player B, and Player B beats Player C, the system can infer relative skill levels even if A and C never play each other directly.
Project Inspiration
This project was inspired by the comprehensive analysis presented in "I made an F1 ELO Engine. Who's highest rated?" by Mr V's Garage. The video demonstrated how systematic analysis of historical F1 data could reveal insights about driver performance across eras.
Building upon this concept, this project aims to:
- Create an objective, data-driven rating system
- Account for historical context and era differences
- Provide transparent methodology and results
- Make findings accessible through interactive visualization
Key Implementation Principles
Core Challenges in F1 Analysis:
- Car performance significantly influences results
- Multiple competitors in each event
- Technical failures affect outcomes
- Varying track characteristics
- Different era characteristics (reliability, season length, data quality)
Our implementation addresses these challenges through several key principles:
- Teammate Comparisons: Teammates drive cars of essentially the same design, so comparing their results isolates driver skill from car performance. Differences between teammates therefore reflect driving ability more accurately than comparisons between drivers at different teams.
- Cross-Team Comparisons: When drivers switch teams, they create a network of indirect comparisons. For example, if Driver A beats teammate B, and B later beats C at another team, we can infer relative performance between A and C. This network effect allows the system to build a comprehensive ranking across the entire grid.
- Dynamic Learning Rates: The K-factor system adapts based on:
  - Driver experience level (higher for rookies, lower for veterans)
  - Season length (normalized across different eras)
  - Historical era (adjusted for data reliability)
- Reliability Handling: The system accounts for:
  - Technical DNFs (excluded from comparison)
  - Driver-caused DNFs (counted as losses)
  - Era-specific withdrawal rules (pre/post 1970)
  - Shortened races (50% weight)
- Historical Context: Results are weighted differently to account for:
  - Reliability differences across eras
  - Frequency of races per season
  - Quality of available performance data
  - Different career patterns and longevity
- Confidence Metrics: Rating reliability is assessed through:
  - Career length and sample size
  - Performance consistency
  - Rating volatility
  - Era-specific considerations
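The teammate-comparison principle can be sketched as follows. This is a minimal illustration rather than the project's actual code: the `RaceEntry` record, its field names, and the `teammate_pairs` helper are hypothetical, and a real pipeline would populate these fields from the dataset's results table.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class RaceEntry:
    # Hypothetical record; the field names are illustrative only.
    race_id: int
    constructor: str
    driver: str
    finish_position: int  # lower is better


def teammate_pairs(entries):
    """Return (winner, loser) driver pairs for teammates in the same race."""
    by_team = defaultdict(list)
    for entry in entries:
        by_team[(entry.race_id, entry.constructor)].append(entry)

    pairs = []
    for group in by_team.values():
        for a, b in combinations(group, 2):
            if a.finish_position == b.finish_position:
                continue  # no ordering information between the two
            if a.finish_position < b.finish_position:
                pairs.append((a.driver, b.driver))
            else:
                pairs.append((b.driver, a.driver))
    return pairs
```

Only drivers who share both a race and a constructor are paired, so a driver at another team never enters the comparison directly. Cross-team information then arises indirectly, as drivers change teams over time.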
Core ELO Calculation
1. Base Rating
All drivers start with a base rating of 1500 points. This value:
- Serves as the average skill level
- Provides numerical stability
- Follows traditional ELO standards
2. Expected Score Calculation
\[E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}\]
- \(E_A\): Expected score for Driver A
- \(R_A\): Current rating of Driver A
- \(R_B\): Current rating of Driver B
- \(400\): Scaling factor; a 400-point rating advantage corresponds to an expected score of about 91%
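A minimal sketch of the expected-score calculation, assuming the standard Elo logistic form with the 400-point scale listed above. The function and parameter names are illustrative, not taken from the project.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of driver A against driver B (standard Elo form)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))
```

Two equally rated drivers each get an expected score of 0.5, and the two expected scores in any pairing always sum to 1.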
3. K-Factor Calculation
The K-factor determines rating change magnitude and varies based on driver experience:
- Maximum learning rate applied
- Gradually decreasing learning rate
- Stabilized learning rate for experienced drivers
- \(K_{phase}\): Experience phase factor
- \(F_{era}\): Historical era adjustment
- \(F_{season}\): Season length normalization
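One way the three factors might combine is sketched below. Everything numeric here is an assumption: the phase values (40 and 16), the race thresholds (10 and 50), the linear decay between them, and the multiplicative combination of \(K_{phase}\), \(F_{era}\) and \(F_{season}\) are all hypothetical and are not taken from the project.

```python
def phase_k(races_completed: int,
            k_max: float = 40.0,    # hypothetical maximum learning rate
            k_min: float = 16.0,    # hypothetical stabilized learning rate
            ramp_start: int = 10,   # hypothetical end of the maximum phase
            ramp_end: int = 50) -> float:  # hypothetical start of the stable phase
    """Experience phase factor: maximum, then decreasing, then stabilized."""
    if races_completed <= ramp_start:
        return k_max
    if races_completed >= ramp_end:
        return k_min
    # Assumed linear decay between the two plateaus.
    frac = (races_completed - ramp_start) / (ramp_end - ramp_start)
    return k_max - frac * (k_max - k_min)


def k_factor(races_completed: int, f_era: float = 1.0, f_season: float = 1.0) -> float:
    """Assumption: the phase, era and season factors are multiplied together."""
    return phase_k(races_completed) * f_era * f_season
```

With these placeholder values, a driver midway through the decreasing phase (30 races) gets K = 28, and an era factor of 0.5 halves whatever the phase factor yields.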
4. Rating Update Formula
\[R_{new} = R_{old} + K \cdot (S_{actual} - E_{expected})\]
- \(R_{new}\): Updated rating
- \(R_{old}\): Previous rating
- \(K\): Calculated K-factor
- \(S_{actual}\): Actual score (1 for win, 0 for loss)
- \(E_{expected}\): Expected score
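A short worked sketch of a single rating update, assuming the standard Elo update rule. The K value of 32 in the usage note is a placeholder chosen for the arithmetic, not a value from the project.

```python
def update_rating(r_old: float, k: float, s_actual: float, e_expected: float) -> float:
    """Move the rating toward the observed result by K times the surprise."""
    return r_old + k * (s_actual - e_expected)
```

For two teammates both rated 1500, each has an expected score of 0.5. With a placeholder K of 32, the driver who finishes ahead moves to 1500 + 32 × (1 − 0.5) = 1516 and the other drops to 1500 + 32 × (0 − 0.5) = 1484, so the total number of rating points is preserved when both drivers share the same K.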
Confidence Metrics
1. Rating Volatility
Formula
\[\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(R_i - \bar{R}\right)^2}\]
Where:
- \(\sigma\): Rating volatility
- \(R_i\): Individual rating in history
- \(\bar{R}\): Mean rating
- \(n\): Number of ratings
Interpretation Guide
| Volatility | Interpretation |
|---|---|
| 0-50 | Very stable performance: Consistently meets expectations |
| 51-100 | Normal variation: Typical performance fluctuations |
| 101-150 | Moderate volatility: Shows occasional inconsistency |
| 151-200 | High volatility: Significant performance swings |
| 200+ | Extreme volatility: Very inconsistent results |
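The volatility measure and its interpretation bands can be sketched as below. This assumes volatility is the population standard deviation of a driver's rating history (dividing by \(n\)); if the project uses the sample form (dividing by \(n - 1\)), `statistics.stdev` would be the counterpart. The function names are illustrative.

```python
from statistics import pstdev


def rating_volatility(rating_history):
    """Population standard deviation of a driver's historical ratings."""
    return pstdev(rating_history)


def volatility_label(sigma: float) -> str:
    """Map a volatility value onto the interpretation bands."""
    if sigma <= 50:
        return "Very stable performance"
    if sigma <= 100:
        return "Normal variation"
    if sigma <= 150:
        return "Moderate volatility"
    if sigma <= 200:
        return "High volatility"
    return "Extreme volatility"
```

For example, a history of `[1400, 1600]` has a mean of 1500 and a volatility of 100, which falls in the "Normal variation" band.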
2. Confidence Interval Calculation
Initial uncertainty based on number of races
- \(\sigma\): Rating volatility
- \(CareerSpan\): Driver's F1 career length in years
The interval is the range expected to contain the driver's true rating with 95% confidence; 1.96 is the z-score for the 95% level.
3. Confidence Score Calculation
- \(Width_{current}\): Driver's CI width
- \(Width_{max}\): Largest CI width
- \(Width_{min}\): Smallest CI width
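One plausible reading of these three widths is an inverted min-max normalization, in which the narrowest interval scores 100 and the widest scores 0. The sketch below makes that assumption explicit. The exact formula, the 0-100 scaling, and the handling of the degenerate case are assumptions rather than the project's confirmed method.

```python
def confidence_score(width_current: float, width_min: float, width_max: float) -> float:
    """Assumed inverted min-max scaling of CI width onto a 0-100 score."""
    if width_max == width_min:
        # Degenerate case (all drivers share one width): assumed to score 100.
        return 100.0
    return 100.0 * (width_max - width_current) / (width_max - width_min)
```

Under this assumption, a driver whose interval width sits exactly halfway between the smallest and largest widths receives a score of 50.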
4. Reliability Grade Scale
| Grade | Confidence Score Range | Interpretation |
|---|---|---|
| A+ | 90-100 | Extremely reliable: Large sample size, consistent performance |
| A | 80-89 | Very reliable: Substantial data, stable performance |
| B+ | 70-79 | Reliable: Good sample size, relatively consistent |
| B | 60-69 | Moderately reliable: Decent sample size |
| C+ | 50-59 | Somewhat reliable: Limited but meaningful data |
| C | 40-49 | Limited reliability: Small sample size or high volatility |
| D+ | 30-39 | Questionable reliability: Very small sample or extreme volatility |
| D | 20-29 | Very questionable reliability: Minimal data points |
| F | 0-19 | Highly unreliable: Insufficient data for meaningful rating |
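The grade scale translates directly into a lookup. In this sketch the lower bound of each band is taken from the table, and fractional scores are assumed to fall into the band whose lower bound they reach (so 89.5 maps to A); the names are illustrative.

```python
# (lower bound, grade), ordered from the highest band to the lowest
GRADE_FLOORS = [
    (90, "A+"), (80, "A"), (70, "B+"), (60, "B"), (50, "C+"),
    (40, "C"), (30, "D+"), (20, "D"), (0, "F"),
]


def reliability_grade(score: float) -> str:
    """Return the reliability grade for a 0-100 confidence score."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"  # scores below 0 are treated as the lowest grade
```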
Driver Classification System
Overview
The driver classification system is designed to accurately reflect a driver's experience level while accounting for the significant historical changes in Formula 1. The sport has evolved dramatically over its history, particularly in terms of season length and career duration. To address this, there is a dual-era classification system that applies different thresholds based on when a driver competed.
Historical Context
Earlier Era
- Shorter seasons (typically 7-11 races per year)
- Higher safety risks leading to shorter careers
- Common practice of participating in multiple racing series
- Less specialized preparation and testing
- Limited practice sessions and qualifying runs
Modern Era
- Extended seasons (20+ races per year)
- Improved safety standards enabling longer careers
- Increased specialization and focus on F1
- Extensive testing and practice opportunities
- Complex qualifying formats and race weekends
Classification Levels
Earlier Era
| Level | Races | Description |
|---|---|---|
| Rookie | 0-10 | Initial entry period and learning phase |
| Intermediate | 11-25 | Developing consistency and race craft |
| Experienced | 26-40 | Demonstrated competency and consistent performance |
| Veteran | 41-50 | Proven long-term competitor |
| Legend | 50+ | Sustained presence and extensive experience |

Modern Era
| Level | Races | Description |
|---|---|---|
| Rookie | 0-25 | Equivalent to about one full season |
| Intermediate | 26-75 | Multiple seasons of experience |
| Experienced | 76-150 | Comprehensive understanding and proven race craft |
| Veteran | 151-250 | Long-term competitor across multiple eras |
| Legend | 250+ | Exceptional longevity and mastery |
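The two sets of thresholds listed above can be sketched as a single lookup. Because the cutoff between the two eras is not stated here, the sketch takes a hypothetical `modern_era` flag rather than a year, and it treats the upper bound of each band as inclusive (so "50+" and "250+" mean more than 50 and more than 250 races).

```python
# (inclusive upper bound in races, level); anything above the last bound is "Legend"
EARLY_LIMITS = [(10, "Rookie"), (25, "Intermediate"), (40, "Experienced"), (50, "Veteran")]
MODERN_LIMITS = [(25, "Rookie"), (75, "Intermediate"), (150, "Experienced"), (250, "Veteran")]


def classify_driver(races: int, modern_era: bool) -> str:
    """Return the experience level for a race count under the chosen era."""
    limits = MODERN_LIMITS if modern_era else EARLY_LIMITS
    for upper, level in limits:
        if races <= upper:
            return level
    return "Legend"
```

The same race count can map to different levels: 30 races is "Experienced" under the earlier-era thresholds but only "Intermediate" under the modern ones.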
System Justification
This dual-era classification system provides several key advantages:
- Historical Accuracy: Recognizes the different demands and circumstances of each era
- Fair Comparison: Allows meaningful comparison of drivers across different periods
- Career Context: Accounts for the evolution of season length and career duration
- Experience Recognition: Acknowledges both short-term intensity and long-term consistency
- Era-Appropriate Evaluation: Reflects the different skill development patterns of each period
Special Cases and Data Processing
Race Event Handling
Race Exclusions
Different regulations and non-representative competition
50% Weight Races
- 1976 Japanese GP: Shortened race
- 1991 Australian GP: Rain-shortened
- 2009 Malaysian GP: Rain-shortened
- 2021 Belgian GP: Minimal racing laps
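A minimal sketch of how the half-weight races listed above might be applied. It assumes the weight scales the K-factor for that race, which is one natural reading of "50% weight" but is not confirmed by the text. The `(year, name)` keys and function names are illustrative.

```python
# Races listed above as carrying 50% weight, keyed by (year, race name).
HALF_WEIGHT_RACES = {
    (1976, "Japanese GP"),
    (1991, "Australian GP"),
    (2009, "Malaysian GP"),
    (2021, "Belgian GP"),
}


def race_weight(year: int, name: str) -> float:
    """Return 0.5 for a listed half-weight race, otherwise 1.0."""
    return 0.5 if (year, name) in HALF_WEIGHT_RACES else 1.0


def weighted_k(k: float, year: int, name: str) -> float:
    """Assumption: the race weight scales the K-factor for that event."""
    return k * race_weight(year, name)
```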
Race Result Processing
Scope of Analysis
Only main race results are processed, excluding:
- Qualifying sessions
- Sprint races
- Practice sessions
Non-Start Classifications
- Injury
- 107% Rule
- Did not qualify
- Pre-race injury
- Safety concerns
- Excluded
- No pre-qualify
- Fatal accident
Penalized Results
Driver-caused DNFs result in automatic loss (0 points):
Withdrawal Handling
- Pre-1970 Era: Counted if the driver had a grid position
- Post-1970 Era: Counted only if at least one lap was completed
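The era-dependent withdrawal rule can be sketched as a small predicate. Two details are assumptions: the 1970 season itself is treated as belonging to the later era, and a grid position of `None` or `0` is treated as "no grid position". The function and parameter names are hypothetical.

```python
from typing import Optional


def withdrawal_counts(year: int, grid_position: Optional[int], laps_completed: int) -> bool:
    """Decide whether a withdrawal is included in the rating comparison."""
    if year < 1970:
        # Earlier era: counted whenever the driver held a grid position.
        return grid_position is not None and grid_position > 0
    # 1970 onward (assumed boundary): counted only after at least one lap.
    return laps_completed >= 1
```

Under these assumptions, a 1965 withdrawal from fifth on the grid with no laps completed is counted, while the same situation in 1985 is not.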