Multi-Tenant Ingestion at Scale: Re-Architecting CDC for ~2,000 Tenant Schemas
Case Study Summary
Client: A multi-tenant SaaS platform with ~2,000 PostgreSQL tenant schemas (~100+ tables each).
Impact Metrics:
- 90% lower ingestion cost
- 80% lower transformation cost and latency
- 99% lower Snowflake ingestion cost on pipelines migrated to Iceberg + Polaris
- 99.9% platform uptime under data contracts and TDD
- 20-person data team onboarded to a standardised dbt framework — onboarding time 1 week → hours
- ~90% drop in ad-hoc metric-definition requests after the semantic-layer rollout
Challenge
The platform served ~2,000 tenants, each with their own PostgreSQL schema and 100+ tables. The existing ingestion stack tried to land every change from every tenant into Snowflake on near-real-time SLAs.
The result:
- Runaway ingestion cost. Per-tenant connectors and per-table extraction created a multiplicative bill that scaled with tenant growth, not with business value.
- Latency creep. End-to-end latency on transformations was minutes-to-hours and getting worse with every onboarded tenant.
- Schema-drift fragility. Source-system changes from product teams routinely broke downstream dashboards. Each break became a multi-day fire-fight.
- Inconsistent transformations. Twenty engineers, twenty styles. Onboarding a new team member took roughly a week before they could ship a tested pipeline.
- Metric chaos. Executives and operators argued constantly about whose number was right, and the data team spent hours every week answering "what does X actually mean?".
The brief was unambiguous: cut cost dramatically, cut latency dramatically, and stop the platform from breaking — without halting the business.
Strategic Approach
We treated ingestion as an engineering problem, not a tooling-procurement problem. The core decisions:
- Move to event-driven, log-based CDC. Replace per-tenant pull-style connectors with a single AWS DMS deployment streaming change events, fanned out via SQS, processed by Lambda, and landed via Snowpipe. One pipeline, sized to change volume, not tenant count.
- Push cold ingestion off Snowflake. Migrate selected high-volume, low-query-frequency pipelines to Apache Iceberg with the Polaris catalog, escaping per-byte ingest pricing on data that didn't need to live in Snowflake's hot path.
- Enforce contracts at the boundary. Every source-system topic published a versioned schema. Breaking changes failed CI before they reached production. Downstream consumers became resilient to upstream change.
- Standardise the transformation layer. One dbt framework for the whole 20-person team, GitHub Actions CI/CD running in Docker containers, pre-merge tests on every model.
- Ship a semantic layer. Snowflake-native, documentation-backed metric definitions — one source of truth for every number that hit a dashboard.
Reference Architecture
```mermaid
flowchart LR
    subgraph sources [Source Systems]
        Pg[PostgreSQL Multi-Tenant<br/>~2,000 schemas]
    end
    subgraph ingest [Event-Driven CDC]
        DMS[AWS DMS<br/>log-based CDC]
        SQS[SQS<br/>change-event fan-out]
        Lambda[Lambda<br/>batch and route]
    end
    subgraph storage [Storage]
        Snowpipe[Snowpipe<br/>hot path]
        Iceberg[Apache Iceberg<br/>+ Polaris catalog<br/>cold and bulk]
    end
    subgraph transform [Transformation and Governance]
        Dbt[dbt framework<br/>standardised, tested]
        Contracts[Data Contracts<br/>schema-drift firewall]
        Semantic[Snowflake Semantic Layer<br/>governed metrics]
    end
    subgraph consumers [Consumers]
        BI[Self-Serve BI]
        Apps[Internal Apps and APIs]
        Exec[Executives and Operators]
    end
    Pg --> DMS --> SQS --> Lambda
    Lambda --> Snowpipe
    Lambda --> Iceberg
    Snowpipe --> Dbt
    Iceberg --> Dbt
    Contracts --> Dbt
    Dbt --> Semantic
    Semantic --> BI
    Semantic --> Apps
    BI --> Exec
```
The hot path lands in Snowflake via Snowpipe for low-latency analytics. Bulk and cold workloads land in Apache Iceberg with the Polaris catalog, queryable from Snowflake (and other engines) without paying Snowflake's per-byte ingestion premium. Data contracts sit in front of the dbt layer; the semantic layer sits behind it.
What We Built
Ingestion: from connectors to a single CDC stream
- AWS DMS as the single log-based CDC source, replacing dozens of per-tenant connectors.
- SQS for change-event fan-out, with dead-letter queues and replay for safety.
- Lambda for routing and lightweight batching, deciding per-table whether the destination was Snowpipe (hot) or Iceberg (cold/bulk).
- Snowpipe for low-latency landing into Snowflake.
This cut ingestion cost by 90%, because the pipeline was sized to actual change volume rather than tenant count.
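The routing step in Lambda is deliberately thin. A minimal sketch, assuming a static hot/cold table list and illustrative table names (the real deployment would read its routing config from elsewhere, and would write each batch to its destination rather than just counting it):

```python
import json

# Hypothetical routing config: tables listed here go to the cold path,
# everything else defaults to the hot path. In production this would
# come from a config store, not a literal.
COLD_TABLES = {"audit_log", "event_history"}

def route_change_event(change: dict) -> str:
    """Decide whether a CDC change event lands via Snowpipe (hot)
    or in Iceberg (cold/bulk), based on the source table."""
    return "iceberg" if change["table"] in COLD_TABLES else "snowpipe"

def handler(event: dict, context=None) -> dict:
    """Lambda entry point: batch SQS-delivered change events by destination."""
    batches = {"snowpipe": [], "iceberg": []}
    for msg in event["Records"]:          # SQS delivers messages under "Records"
        change = json.loads(msg["body"])  # one DMS change event per message
        batches[route_change_event(change)].append(change)
    # A real handler would flush each batch here: an S3 stage for
    # Snowpipe auto-ingest, Iceberg data files for the lake.
    return {dest: len(rows) for dest, rows in batches.items()}
```

Because routing keys off change events rather than tenants, one function serves all ~2,000 schemas; adding a tenant adds volume, not infrastructure.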
Migration to Apache Iceberg + Polaris
- Identified high-volume, low-query-frequency tables where Snowflake's per-byte ingest pricing dominated cost.
- Migrated those pipelines to Apache Iceberg with Polaris catalog.
- Snowflake reads Iceberg tables natively via external volumes — no rewriting consumer queries.
This dropped Snowflake ingestion cost on the migrated pipelines by a further 99% with no measurable impact on consumer query performance.
Standardised dbt framework
- One project layout, one test pattern, one CI workflow for the whole 20-person team.
- GitHub Actions running dbt in Docker containers for reproducible builds and pre-merge testing.
- Test-driven development as the default — every model ships with unit tests and data tests.
Engineer onboarding time dropped from about a week to a few hours. Pipeline cost and latency both fell by 80% as redundant transformations were collapsed into the standard pattern.
Data contracts and the data-mesh boundary
- Each source-system team owned a published, versioned contract for the data they emitted.
- Breaking changes failed CI before they could reach production.
- Downstream consumers — dbt models, dashboards, internal apps — became resilient to upstream evolution.
End result: 99.9% platform uptime even as upstream source systems continued to evolve at product-team speed.
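The CI gate itself can be very small. A sketch, assuming contracts are represented as column-to-type mappings (the real setup versioned full schemas; the rule illustrated here is that removing a column or changing its type is breaking, while adding a column is backward-compatible):

```python
def breaking_changes(old: dict, new: dict) -> list:
    """Return the breaking changes between two contract versions,
    where a contract maps column names to declared types."""
    problems = []
    for col, typ in old.items():
        if col not in new:
            problems.append(f"column removed: {col}")
        elif new[col] != typ:
            problems.append(f"type changed: {col} {typ} -> {new[col]}")
    return problems  # additions to `new` are compatible, so not checked

def check_contract(old: dict, new: dict) -> None:
    """CI gate: fail the build on any breaking change."""
    problems = breaking_changes(old, new)
    if problems:
        raise SystemExit("contract violation: " + "; ".join(problems))
```

Running this pre-merge is what turns schema drift from a production incident into a failed pull request.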
Semantic layer
- Snowflake-native semantic layer with documentation-backed metric definitions.
- Every metric had a single owner, a definition, and a test.
- Self-serve BI wired to the semantic layer rather than to raw marts.
Ad-hoc "what does this metric mean?" requests to the data team dropped by roughly 90%.
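The governance rule (one owner, one definition, one governed computation per metric) can be enforced mechanically. A minimal registry sketch with illustrative metric names, not the client's actual definitions; the production layer was Snowflake-native:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str        # the single name dashboards are allowed to use
    owner: str       # exactly one accountable owner
    definition: str  # human-readable definition, surfaced in BI tooling
    sql: str         # governed expression, the only computation allowed

REGISTRY = {
    m.name: m
    for m in [
        # Illustrative examples only.
        Metric("active_tenants", "data-platform",
               "Tenants with at least one change event in the last 28 days",
               "count(distinct tenant_id)"),
        Metric("ingest_cost_usd", "data-platform",
               "Daily ingestion spend across Snowpipe and Iceberg paths",
               "sum(cost_usd)"),
    ]
}

def resolve(metric_name: str) -> Metric:
    """BI tools and apps resolve metrics here instead of hand-writing SQL."""
    if metric_name not in REGISTRY:
        raise KeyError(f"unknown metric: {metric_name}")
    return REGISTRY[metric_name]
```

Once consumers resolve names through a registry like this, "what does X actually mean?" has a lookup, not a meeting.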
Outcomes
| Metric | Outcome |
|---|---|
| Ingestion cost | −90% |
| Transformation cost & latency | −80% |
| Snowflake ingestion cost on Iceberg-migrated pipelines | −99% |
| Platform uptime | 99.9% |
| Engineer onboarding time | 1 week → hours |
| Ad-hoc metric-definition requests | ~−90% |
The data team stopped being a bottleneck. The CFO stopped flinching at the Snowflake bill. Product teams kept shipping changes upstream without breaking downstream. Executives stopped arguing about whose number was right.
Key Takeaways
- Size ingestion to change volume, not tenant count. Per-tenant or per-table connectors are a tax that scales the wrong way.
- Push cold workloads off your hot warehouse. Iceberg + Polaris is now a credible escape valve for per-byte ingest pricing.
- Contracts are the data-mesh primitive that actually matters. Without them, "mesh" is just "distributed chaos".
- Standardisation beats heroics. One dbt framework + CI + TDD across a 20-person team buys more reliability than any monitoring tool.
- A semantic layer is leverage. Once metrics are governed, the data team stops being a help-desk and starts being a platform.
Multi-tenant ingestion costs out of control?
- Free 30-minute ingestion architecture review
- Sized ingestion blueprint for your tenant model
- Cost, latency & migration risk

If your CDC bill is climbing faster than your tenant count, or onboarding a new source is a multi-week engineering project, let's review your ingestion architecture and design something that actually scales — and doesn't break the bank.