How We Build the Dataset

Our multi-stage data pipeline combines OEM documentation, regulatory filings, and proprietary validation to produce the most reliable fitment database available.

48 Manufacturers · 402K+ Fitment Records · 7 Pipeline Stages

The Problem With Fitment Data

Most fitment databases on the market are incomplete, outdated, or riddled with errors. A single wrong tire size recommendation can cause handling issues, speedometer inaccuracies, or even safety hazards.

Existing sources suffer from three problems: incomplete coverage, stale records, and unverified errors.

We solve this with a multi-source ingestion pipeline that cross-validates every record before it enters the production dataset.

Our 7-Stage Data Pipeline

1. Primary Source Ingestion
We ingest structured fitment data from multiple authoritative European sources including regulatory filings and manufacturer documentation. Each source uses a different schema — our ingestion layer normalizes across naming conventions, regional variants, and classification systems.
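As a sketch of the normalization idea (the actual sources and schemas aren't disclosed, so every source name and field name below is hypothetical): each source gets a field map that rewrites its records into the unified schema.

```python
# Hypothetical per-source field maps; the real source list and schemas
# are not public, so these names are illustrative only.
FIELD_MAPS = {
    "source_a": {"marke": "make", "modell": "model", "reifengroesse": "tire_size"},
    "source_b": {"manufacturer": "make", "series": "model", "tyre_dim": "tire_size"},
}

def normalize(source, record):
    """Rename source-specific keys to the unified schema, dropping unknowns."""
    field_map = FIELD_MAPS[source]
    return {field_map[k]: v for k, v in record.items() if k in field_map}

raw = {"marke": "BMW", "modell": "3er Reihe Limousine", "reifengroesse": "225/45R18"}
print(normalize("source_a", raw))
```

The real ingestion layer additionally handles regional variants and classification systems; this shows only the renaming step.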
2. Vehicle Taxonomy Resolution
European vehicles have different internal designations across markets. A BMW G20 in Germany may be listed as a "3er Reihe Limousine" while the same car in Sweden appears as "3-Serien Sedan." Our taxonomy resolver maps 38,000+ engine configurations to a unified vehicle tree using a combination of displacement, power output, fuel type, and production year matching. This stage alone handles over 4,500 body type variants.
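One way to picture the resolver (the four matching attributes come from the text; the table entry and lookup shape are illustrative): market-specific display names carry no weight, and matching goes through the physical attributes alone.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EngineKey:
    displacement_cc: int
    power_kw: int
    fuel: str
    production_year: int

# Illustrative unified tree; the real resolver covers 38,000+ configurations.
UNIFIED_TREE = {
    EngineKey(1998, 135, "petrol", 2020): "BMW 3 Series (G20) 320i",
}

def resolve(market_name: str, key: EngineKey) -> Optional[str]:
    # "3er Reihe Limousine" and "3-Serien Sedan" are ignored here:
    # matching uses displacement, power output, fuel type, and year.
    return UNIFIED_TREE.get(key)

print(resolve("3-Serien Sedan", EngineKey(1998, 135, "petrol", 2020)))
```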
3. Standards Normalization
Raw fitment data references tire sizes using various regional and manufacturer-specific standards. We normalize everything to a unified format, resolving conflicts between different specification sources, handling dual-fitment configurations, and standardizing size notation across the dataset.
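A minimal sketch of the notation side of this stage, assuming the common metric pattern (width/aspect R rim): variant spellings collapse to one canonical string.

```python
import re

# Matches metric sizes like "225/45R18", "225 / 45 r 18", "225/45 R 18".
SIZE_RE = re.compile(r"(\d{3})\s*/\s*(\d{2})\s*[Rr]\s*(\d{2})")

def canonical_size(raw):
    """Collapse spacing and casing variants into width/aspectRrim form."""
    m = SIZE_RE.search(raw)
    if m is None:
        raise ValueError(f"unrecognized tire size: {raw!r}")
    width, aspect, rim = m.groups()
    return f"{width}/{aspect}R{rim}"

print(canonical_size("225 / 45 r 18"))
```

The full stage also reconciles conflicting specification sources and dual-fitment configurations, which this sketch omits.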
4. Multi-Source Validation
Every fitment record is validated against multiple independent sources. Discrepancies between sources are flagged for review. This cross-validation catches data entry errors, regional exclusions, and discontinued specifications. A meaningful percentage of records require manual adjudication at this stage.
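The cross-check can be sketched as a vote over independent sources (the status names and majority rule here are illustrative, not the actual adjudication policy):

```python
from collections import Counter

def cross_validate(values_by_source):
    """Cross-check one field across sources; return (status, value)."""
    counts = Counter(values_by_source.values())
    value, votes = counts.most_common(1)[0]
    if len(counts) == 1:
        return ("accepted", value)      # all sources agree
    if votes > len(values_by_source) / 2:
        return ("majority", value)      # majority wins, but logged for review
    return ("flagged", None)            # no majority: manual adjudication

print(cross_validate({"reg": "225/45R18", "oem": "225/45R18", "cat": "225/40R18"}))
```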
5. Physical Constraint Verification
We run proprietary validation rules against the full dataset to detect physically impossible fitment combinations — checking clearance parameters, load capacity, and dimensional constraints. A 295/30R22 on a Fiat 500 would pass a naive database lookup. Our constraint layer catches it.
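The text's own example can illustrate one such rule. The overall diameter of a metric size is rim diameter plus two sidewall heights; the ±3% tolerance window below is a hypothetical threshold, not the actual rule set.

```python
def overall_diameter_mm(width_mm, aspect_pct, rim_in):
    """Rim diameter plus two sidewalls (sidewall = width * aspect ratio)."""
    return rim_in * 25.4 + 2 * width_mm * aspect_pct / 100

def within_window(candidate_mm, reference_mm, tol=0.03):
    """Hypothetical diameter tolerance check against a reference fitment."""
    return abs(candidate_mm - reference_mm) / reference_mm <= tol

reference = overall_diameter_mm(175, 65, 14)   # a typical Fiat 500 OE size
candidate = overall_diameter_mm(295, 30, 22)   # ~736 mm vs ~583 mm reference
print(within_window(candidate, reference))     # False: rejected
```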
6. Temporal Alignment
Vehicle production runs don't align with calendar years. A "2024 model year" BMW may have been produced from March 2023 to February 2025, with a mid-cycle facelift changing the available wheel options in September 2024. Our temporal alignment engine maps exact production date ranges to fitment validity windows, handling model year overlaps, LCI (Life Cycle Impulse) transitions, and market-specific launch dates.
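In miniature, the facelift case from the paragraph looks like this (the cutover date comes from the example; the wheel option sets are invented for illustration):

```python
from datetime import date

LCI_CUTOVER = date(2024, 9, 1)  # facelift date from the example above

PRE_LCI_WHEELS = {"225/45R18", "225/40R19"}   # illustrative option sets
POST_LCI_WHEELS = {"225/40R19", "245/35R20"}

def wheel_options(build_date):
    """Validity follows the exact build date, not the nominal model year."""
    return POST_LCI_WHEELS if build_date >= LCI_CUTOVER else PRE_LCI_WHEELS

# Two "2024 model year" cars, built months apart, get different options.
print(wheel_options(date(2024, 6, 15)) == wheel_options(date(2024, 10, 1)))  # False
```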
7. Production Export & QA
The validated dataset is exported to our production database with full relational integrity. Every quarterly update goes through a regression test suite that compares the new dataset against the previous version, flagging any records that changed, were removed, or were added. Anomalous changes (>5% delta in any manufacturer's fitment count) trigger a manual review before release.
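The >5% trigger can be sketched as a simple diff over per-manufacturer record counts (the function name and data shapes are illustrative):

```python
def needs_review(prev_counts, new_counts, threshold=0.05):
    """Flag manufacturers whose fitment count moved more than `threshold`."""
    flagged = []
    for maker in sorted(prev_counts.keys() | new_counts.keys()):
        prev = prev_counts.get(maker, 0)
        new = new_counts.get(maker, 0)
        if prev == 0 or abs(new - prev) / prev > threshold:
            flagged.append(maker)  # includes manufacturers new to the dataset
    return flagged

print(needs_review({"BMW": 10000, "Fiat": 4000}, {"BMW": 10300, "Fiat": 4900}))
# BMW moved 3% (under the 5% trigger); Fiat moved 22.5% and is flagged.
```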

Technology

Our pipeline is built on a proprietary data processing stack developed over several years. Due to the competitive nature of this market, we don't disclose specific tooling or implementation details.

The full pipeline involves significant computation time per run. We process a large volume of raw records to produce the final validated dataset — the majority of raw inputs are filtered out through deduplication, validation, and constraint checks.

Update Cadence

The dataset is updated quarterly, timed to capture new model year introductions, which typically happen in Q3 for the European market and Q1 for Asian manufacturers. Each update adds approximately 5,000-15,000 new fitment records and corrects any errors identified since the previous release.

API and embed widget subscribers receive updates automatically. CSV/SQLite customers can download the latest version from their account.

Accuracy Commitment

We regularly audit the dataset by sampling records and verifying them against original OEM sources. Accuracy is a core priority — every quarterly update includes corrections from the previous cycle.

If you find an error in the dataset, report it. Confirmed errors are corrected in the next quarterly update, or sooner for safety-critical issues.

Why Not Just Scrape It Yourself?

We get asked this sometimes. The honest answer: you could try, but it's harder than it looks.

The raw data is scattered across dozens of sources in different formats and languages. Normalizing vehicle names alone requires a mapping table that took us months to build and validate: a "VW Golf Mk8", a "Volkswagen Golf CD1", and a "Golf VIII 5H" are all the same car.
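In miniature, that mapping table is a many-to-one alias lookup; the canonical id below is invented for illustration.

```python
# Miniature alias table; every market-specific name maps to one canonical id.
ALIASES = {
    "vw golf mk8": "GOLF_8",
    "volkswagen golf cd1": "GOLF_8",
    "golf viii 5h": "GOLF_8",
}

def canonical_vehicle(name):
    """Case- and whitespace-insensitive lookup into the alias table."""
    return ALIASES.get(" ".join(name.lower().split()))

print(canonical_vehicle("Volkswagen  Golf  CD1"))
```

The hard part is not the lookup but populating and validating the table itself, which is what took months.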

Then there's the validation. Without cross-referencing against physical constraints and multiple sources, you end up with a dataset full of impossible fitments that will damage your credibility with customers.

We've been doing this for years. We've built the tooling, refined the validation rules, and established the source relationships. The dataset you're buying represents thousands of engineering hours — compressed into a single file you can start using today.

Ready to use production-grade fitment data?

Skip the months of data engineering. Get the validated dataset today.

View pricing →