You need three years of monthly KPIs by shop. Maybe it's for a PE process. Maybe it's for a board meeting. Maybe you just want to know if cycle time is actually improving or if it just feels that way.
You open CCC. You click run on the all-time Vehicles Delivered report. You wait. Then you watch CCC crash, or time out, or return a partial result that doesn't match the total you're expecting.
Welcome to the historical data problem. Here's how MSOs actually solve it.
Why CCC Can't Just Give You History
CCC's reporting layer is a live-query system running against a transactional database. It wasn't designed to produce multi-year cross-shop reports. The longer the time window, the more records it touches, and the more likely it is to fall over.
At a certain scale, the "all-time" report isn't a report. It's a denial-of-service attack against the same database your shops are writing to in real time.
The fix isn't to make CCC produce history faster. It's to get the history somewhere else—once—and query it from there.
The Three Paths
Path 1: Forward-Looking Only
You accept you don't have real history. You start a sync today. Your warehouse has data from this point forward, and in 12 months you'll have a year of reliable history.
This is the cheapest path. It's also what most MSOs default to, often without realizing they defaulted. One year later, they're asked for a trailing-three-year data room and they don't have it.
Path 2: Incremental Backfill
You sync going forward and run a backfill process that pulls historical records in chunks. Month by month, shop by shop, using whatever API pagination the system supports without triggering timeouts.
This works but takes time. Depending on the volume, backfilling three years of data across 50 shops can take weeks of careful sync work.
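The chunked backfill can be sketched as a loop over shop-month windows, following whatever pagination cursor the API returns. This is a minimal illustration, not a real CCC client: `fetch_page` and `store` are hypothetical placeholders for your API wrapper and warehouse loader.

```python
from datetime import date

def month_windows(start: date, end: date):
    """Yield (first_day, first_day_of_next_month) for each month from
    start's month through end's month, inclusive."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        yield date(y, m, 1), date(ny, nm, 1)
        y, m = ny, nm

def backfill(shops, start, end, fetch_page, store):
    """Pull history one shop-month at a time, following pagination cursors.

    fetch_page(shop, win_start, win_end, cursor) -> (records, next_cursor)
    and store(records) are placeholders for your own API client and loader.
    Small windows keep each request cheap enough to avoid timeouts."""
    for shop in shops:
        for win_start, win_end in month_windows(start, end):
            cursor = None
            while True:
                records, cursor = fetch_page(shop, win_start, win_end, cursor)
                store(records)
                if cursor is None:
                    break
```

Resumability matters here: if you also persist the last completed (shop, month) pair, a failed run can restart without re-pulling weeks of data.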
Path 3: Month-End Snapshot Reconstruction
If you can't pull the raw data, you reconstruct from whatever monthly snapshots your team has saved—board decks, month-end Excel exports, accounting closes. You normalize them into your warehouse so the metrics can be trended.
This works when you need high-level trend data, not RO-level detail. It doesn't work when a PE sponsor asks for the underlying records.
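The normalization step is mostly column-name wrangling: each era of board deck or month-end export names the same metrics differently. A minimal sketch, with a hypothetical alias map you'd build by inspecting your own snapshots:

```python
# Hypothetical crosswalk from column names seen across old exports
# to one canonical schema. Built by inspecting the actual snapshots.
COLUMN_ALIASES = {
    "shop": ["shop", "location", "store_name"],
    "month": ["month", "period", "close_month"],
    "revenue": ["revenue", "total_sales", "sales_$"],
    "ro_count": ["ro_count", "ros_closed", "repair_orders"],
}

def normalize_snapshot(row: dict) -> dict:
    """Coerce one row from any snapshot format into the canonical schema.
    Missing columns come through as None rather than raising, so you can
    see which snapshots are incomplete."""
    lowered = {k.strip().lower(): v for k, v in row.items()}
    out = {}
    for canon, aliases in COLUMN_ALIASES.items():
        for alias in aliases:
            if alias in lowered:
                out[canon] = lowered[alias]
                break
        else:
            out[canon] = None  # this snapshot never had the column
    return out
```

The point of keeping it table-driven is that every new snapshot format you discover becomes one more alias entry, not a new code path.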
Most MSOs end up using a combination. Path 1 for live data, Path 2 for anything the API can still reach, Path 3 for the older stuff that's been aged out of the live system.
What You Actually Care About in History
Be specific about what "history" you need. The answer changes the approach:
| Use case | Granularity needed | Path |
|---|---|---|
| PE diligence, 36-month KPI trends | Monthly, by shop | 2 + 3 |
| Board reporting, YoY comparisons | Monthly, by shop | 2 + 3 |
| Individual RO-level analysis | Record-level | 2 only |
| Fraud / audit investigation | Record-level | 2 only, with reconciliation |
| Marketing / customer analysis | Record-level with contact info | 2, with PII considerations |
If you only need monthly trend data, snapshot reconstruction is cheap and fast. If you need record-level detail, you're doing a real backfill.
The Gotchas Nobody Warns You About
Rate Sheet Versioning
Labor codes and rate sheets change over time. If you pull historical ROs today, the system may recompute them against today's rates. Your three-year trend line becomes a re-render, not a true historical series.
Snapshot the rate tables at the time of sync. Store them as a versioned dimension. Join ROs to the rate sheet that was in effect at the time of the RO, not the current one.
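The effective-dated join reduces to: given a list of rate-sheet versions sorted by effective date, find the latest version on or before the RO date. A sketch with illustrative data (the rates and dates are made up):

```python
import bisect
from datetime import date

def rate_in_effect(rate_versions, ro_date):
    """Return the rate sheet whose effective date is the latest one on or
    before ro_date. rate_versions is a list of (effective_date, rates)
    tuples sorted ascending by effective date."""
    dates = [eff for eff, _ in rate_versions]
    i = bisect.bisect_right(dates, ro_date) - 1
    if i < 0:
        raise ValueError("RO predates the earliest rate sheet on file")
    return rate_versions[i][1]

# Illustrative versioned dimension for one shop:
VERSIONS = [
    (date(2021, 1, 1), {"body_rate": 52.0}),
    (date(2022, 6, 1), {"body_rate": 58.0}),
    (date(2023, 1, 1), {"body_rate": 62.0}),
]
```

In a warehouse this is the same logic as a SQL "effective date <= RO date, take latest" join against the versioned rate dimension; the key is that the join uses the RO's own date, never `CURRENT_DATE`.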
Orphaned and Deleted Records
Records deleted from CCC between when they were created and when you backfill don't exist anymore. A historical report CCC runs today may not match what the same report showed at the time, because the underlying data has changed.
The only way to reconcile is against external records: board decks, month-end exports, whatever was saved at the time. If numbers don't match, the current backfill is wrong—not the historical snapshot.
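That reconciliation can be automated as a comparison of backfilled (shop, month) aggregates against the saved snapshot totals, with a small tolerance for rounding. A sketch, assuming you've already normalized both sides to the same keys:

```python
def reconcile(backfilled: dict, snapshot: dict, tolerance: float = 0.005):
    """Compare backfilled {(shop, month): total} against snapshot totals.

    Returns a list of discrepancies. Per the rule above, the snapshot is
    authoritative: a mismatch means the backfill needs investigating."""
    issues = []
    for key, expected in snapshot.items():
        actual = backfilled.get(key)
        if actual is None:
            issues.append((key, "missing from backfill", expected, None))
        elif expected and abs(actual - expected) / abs(expected) > tolerance:
            issues.append((key, "total mismatch", expected, actual))
    return issues
```

The 0.5% default tolerance is an assumption, not a standard; tighten it to zero for count metrics like RO volume, where there's no rounding to forgive.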
Shop-Level Metadata Changes
A shop that closed two years ago. A shop that changed names. A shop that was acquired from a different MSO with a different DMS. All of these break the shop-id-over-time assumption.
Normalize shop identifiers in your warehouse before you start joining history together. This is tedious but unavoidable.
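One workable pattern is a hand-built crosswalk: every identifier a shop has ever carried, mapped to one stable warehouse key, compiled during planning and then frozen. The ids below are invented for illustration:

```python
# Hypothetical crosswalk. Every id a shop has ever had maps to one
# stable warehouse key, so history joins cleanly across renames,
# closures, and acquisitions.
SHOP_CROSSWALK = {
    "CCC-1042": "shop_a",   # current CCC id
    "CCC-0311": "shop_a",   # id before the 2022 rename
    "ACME-17":  "shop_a",   # legacy DMS id from the acquired MSO
    "CCC-2088": "shop_b",
}

def canonical_shop_id(raw_id: str) -> str:
    try:
        return SHOP_CROSSWALK[raw_id.strip()]
    except KeyError:
        # Fail loudly: an unmapped id means the crosswalk is incomplete.
        # Passing it through silently would split one shop's history
        # across two keys, which is worse than a hard stop.
        raise KeyError(f"no canonical mapping for shop id {raw_id!r}")
```

Failing hard on unknown ids is the design choice worth copying: the crosswalk should be exhaustively wrong-proof before any multi-year join runs against it.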
Timezone Drift
CCC stores timestamps in local shop time in some reports and UTC in others. When you compute cycle times or cross-shop rollups across years, timezone mismatches compound into unexplained variance.
Convert everything to UTC at the warehouse layer, with the shop's timezone as a dimension. Compute business-day and business-hour metrics from there.
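The conversion itself is small if you treat CCC's local-time fields as naive timestamps and attach the shop's zone before converting, which Python's stdlib `zoneinfo` handles including DST:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc(naive_local: datetime, shop_tz: str) -> datetime:
    """Interpret a naive timestamp as the shop's local time and convert
    to UTC. shop_tz is an IANA zone name stored on the shop dimension,
    e.g. "America/Chicago"."""
    return naive_local.replace(tzinfo=ZoneInfo(shop_tz)).astimezone(timezone.utc)
```

Fields CCC already reports in UTC should skip this step; the dangerous case is applying the conversion twice, which shifts every cycle-time by the shop's offset.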
Backfill Throttling
Pulling three years of data through the same API endpoints that are serving your live sync can starve either process. Don't backfill during business hours. Don't backfill against the same credentials your live pipeline uses. Rate-limit the backfill aggressively so it doesn't create incident-grade pressure on the API.
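The aggressive rate limit doesn't need to be clever. A crude fixed-spacing throttle is predictable, which is exactly what you want when a live sync shares the API. A minimal sketch (the 50-requests-per-second figure is illustrative, not a CCC limit):

```python
import time

class Throttle:
    """Minimal rate limiter: at most `rate` calls per second, enforced
    by sleeping between requests. Crude but predictable pressure."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self._last = 0.0

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = time.monotonic()
        delay = self._last + self.min_interval - now
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

Call `throttle.wait()` before every backfill request, and pair it with a scheduler check so the backfill simply refuses to run during business hours.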
A Realistic Timeline
For a collision MSO running 30 shops, backfilling three years of data to the level a PE sponsor expects:
- Planning and dimension normalization: 1–2 weeks
- Incremental backfill per shop: 2–4 weeks
- Reconciliation against external records: 1–2 weeks
- Dimensional modeling and testing: 1–2 weeks
Total: 6–10 weeks for a clean historical load at an MSO this size. Faster is possible but usually produces data that doesn't reconcile.
When to Start
If you might need historical data in the next two years—for any reason, diligence or otherwise—start the forward-looking sync today and start the backfill planning this month.
The cheapest version of this problem is one that gets solved before anyone's asking for the data. The most expensive version is one that has to be solved in six weeks during a diligence process.