Multi-Tenant Ingestion at Scale: Re-Architecting CDC for ~2,000 Tenant Schemas
Case Study Summary
Client: A multi-tenant SaaS platform with ~2,000 PostgreSQL tenant schemas (~100+ tables each).
Impact Metrics:
- 90% lower ingestion cost
- 80% lower transformation cost and latency
- 99% lower Snowflake ingestion cost on pipelines migrated to Iceberg + Polaris
- 99.9% platform uptime under data contracts and TDD
- 20-person data team onboarded to a standardised dbt framework — onboarding time 1 week → hours
- ~90% drop in ad-hoc metric-definition requests after the semantic-layer rollout
Challenge
The platform served ~2,000 tenants, each with their own PostgreSQL schema and 100+ tables. The existing ingestion stack tried to land every change from every tenant into Snowflake on near-real-time SLAs.
The result:
- Runaway ingestion cost. Per-tenant connectors and per-table extraction created a multiplicative bill that scaled with tenant growth, not with business value.
- Latency creep. End-to-end latency on transformations was minutes-to-hours and getting worse with every onboarded tenant.
- Schema-drift fragility. Source-system changes from product teams routinely broke downstream dashboards. Each break became a multi-day fire-fight.
- Inconsistent transformations. Twenty engineers, twenty styles. Onboarding a new team member took roughly a week before they could ship a tested pipeline.
- Metric chaos. Executives and operators argued constantly about whose number was right, and the data team spent hours every week answering "what does X actually mean?".
The brief was unambiguous: cut cost dramatically, cut latency dramatically, and stop the platform from breaking — without halting the business.
Strategic Approach
We treated ingestion as an engineering problem, not a tooling-procurement problem. The core decisions:
- Move to event-driven, log-based CDC. Replace per-tenant pull-style connectors with a single AWS DMS deployment streaming change events, fanned out via SQS, processed by Lambda, and landed via Snowpipe. One pipeline, sized to change volume, not tenant count.
- Push cold ingestion off Snowflake. Migrate selected high-volume, low-query-frequency pipelines to Apache Iceberg with the Polaris catalog, escaping per-byte ingest pricing on data that didn't need to live in Snowflake's hot path.
- Enforce contracts at the boundary. Every source-system topic published a versioned schema. Breaking changes failed CI before they reached production. Downstream consumers became resilient to upstream change.
- Standardise the transformation layer. One dbt framework for the whole 20-person team, GitHub Actions CI/CD running in Docker containers, pre-merge tests on every model.
- Ship a semantic layer. Snowflake-native, documentation-backed metric definitions — one source of truth for every number that hit a dashboard.
Reference Architecture
```mermaid
flowchart LR
    subgraph sources [Source Systems]
        Pg[PostgreSQL Multi-Tenant<br/>~2,000 schemas]
    end
    subgraph ingest [Event-Driven CDC]
        DMS[AWS DMS<br/>log-based CDC]
        SQS[SQS<br/>change-event fan-out]
        Lambda[Lambda<br/>batch and route]
    end
    subgraph storage [Storage]
        Snowpipe[Snowpipe<br/>hot path]
        Iceberg[Apache Iceberg<br/>+ Polaris catalog<br/>cold and bulk]
    end
    subgraph transform [Transformation and Governance]
        Dbt[dbt framework<br/>standardised, tested]
        Contracts[Data Contracts<br/>schema-drift firewall]
        Semantic[Snowflake Semantic Layer<br/>governed metrics]
    end
    subgraph consumers [Consumers]
        BI[Self-Serve BI]
        Apps[Internal Apps and APIs]
        Exec[Executives and Operators]
    end
    Pg --> DMS --> SQS --> Lambda
    Lambda --> Snowpipe
    Lambda --> Iceberg
    Snowpipe --> Dbt
    Iceberg --> Dbt
    Contracts --> Dbt
    Dbt --> Semantic
    Semantic --> BI
    Semantic --> Apps
    BI --> Exec
```
The hot path lands in Snowflake via Snowpipe for low-latency analytics. Bulk and cold workloads land in Apache Iceberg with the Polaris catalog, queryable from Snowflake (and other engines) without paying Snowflake's per-byte ingestion premium. Data contracts sit in front of the dbt layer; the semantic layer sits behind it.
What We Built
Ingestion: from connectors to a single CDC stream
- AWS DMS as the single log-based CDC source, replacing dozens of per-tenant connectors.
- SQS for change-event fan-out, with dead-letter queues and replay for safety.
- Lambda for routing and lightweight batching, deciding per-table whether the destination was Snowpipe (hot) or Iceberg (cold/bulk).
- Snowpipe for low-latency landing into Snowflake.
This cut ingestion cost by 90%, because the pipeline was sized to actual change volume rather than tenant count.
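The routing step in Lambda is deliberately thin. A minimal sketch, assuming a static hot/cold table list and illustrative table names (the real deployment would read its routing config from elsewhere, and would write each batch to its destination rather than just counting it):

```python
import json

# Hypothetical routing config: tables listed here go to the cold path,
# everything else defaults to the hot path. In production this would
# come from a config store, not a literal.
COLD_TABLES = {"audit_log", "event_history"}

def route_change_event(change: dict) -> str:
    """Decide whether a CDC change event lands via Snowpipe (hot)
    or in Iceberg (cold/bulk), based on the source table."""
    return "iceberg" if change["table"] in COLD_TABLES else "snowpipe"

def handler(event: dict, context=None) -> dict:
    """Lambda entry point: batch SQS-delivered change events by destination."""
    batches = {"snowpipe": [], "iceberg": []}
    for msg in event["Records"]:          # SQS delivers messages under "Records"
        change = json.loads(msg["body"])  # one DMS change event per message
        batches[route_change_event(change)].append(change)
    # A real handler would flush each batch here: an S3 stage for
    # Snowpipe auto-ingest, Iceberg data files for the lake.
    return {dest: len(rows) for dest, rows in batches.items()}
```

Because routing keys off change events rather than tenants, one function serves all ~2,000 schemas; adding a tenant adds volume, not infrastructure.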
Migration to Apache Iceberg + Polaris
- Identified high-volume, low-query-frequency tables where Snowflake's per-byte ingest pricing dominated cost.
- Migrated those pipelines to Apache Iceberg with Polaris catalog.
- Snowflake reads Iceberg tables natively via external volumes — no rewriting consumer queries.
This dropped Snowflake ingestion cost on the migrated pipelines by a further 99% with no measurable impact on consumer query performance.
Standardised dbt framework
- One project layout, one test pattern, one CI workflow for the whole 20-person team.
- GitHub Actions running dbt in Docker containers for reproducible builds and pre-merge testing.
- Test-driven development as the default — every model ships with unit tests and data tests.
Engineer onboarding time dropped from about a week to a few hours. Pipeline cost and latency both fell by 80% as redundant transformations were collapsed into the standard pattern.
Data contracts and the data-mesh boundary
- Each source-system team owned a published, versioned contract for the data they emitted.
- Breaking changes failed CI before they could reach production.
- Downstream consumers — dbt models, dashboards, internal apps — became resilient to upstream evolution.
End result: 99.9% platform uptime even as upstream source systems continued to evolve at product-team speed.
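The CI gate itself can be very small. A sketch, assuming contracts are represented as column-to-type mappings (the real setup versioned full schemas; the rule illustrated here is that removing a column or changing its type is breaking, while adding a column is backward-compatible):

```python
def breaking_changes(old: dict, new: dict) -> list:
    """Return the breaking changes between two contract versions,
    where a contract maps column names to declared types."""
    problems = []
    for col, typ in old.items():
        if col not in new:
            problems.append(f"column removed: {col}")
        elif new[col] != typ:
            problems.append(f"type changed: {col} {typ} -> {new[col]}")
    return problems  # additions to `new` are compatible, so not checked

def check_contract(old: dict, new: dict) -> None:
    """CI gate: fail the build on any breaking change."""
    problems = breaking_changes(old, new)
    if problems:
        raise SystemExit("contract violation: " + "; ".join(problems))
```

Running this pre-merge is what turns schema drift from a production incident into a failed pull request.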
Semantic layer
- Snowflake-native semantic layer with documentation-backed metric definitions.
- Every metric had a single owner, a definition, and a test.
- Self-serve BI wired to the semantic layer rather than to raw marts.
Ad-hoc "what does this metric mean?" requests to the data team dropped by roughly 90%.
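The governance rule (one owner, one definition, one governed computation per metric) can be enforced mechanically. A minimal registry sketch with illustrative metric names, not the client's actual definitions; the production layer was Snowflake-native:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str        # the single name dashboards are allowed to use
    owner: str       # exactly one accountable owner
    definition: str  # human-readable definition, surfaced in BI tooling
    sql: str         # governed expression, the only computation allowed

REGISTRY = {
    m.name: m
    for m in [
        # Illustrative examples only.
        Metric("active_tenants", "data-platform",
               "Tenants with at least one change event in the last 28 days",
               "count(distinct tenant_id)"),
        Metric("ingest_cost_usd", "data-platform",
               "Daily ingestion spend across Snowpipe and Iceberg paths",
               "sum(cost_usd)"),
    ]
}

def resolve(metric_name: str) -> Metric:
    """BI tools and apps resolve metrics here instead of hand-writing SQL."""
    if metric_name not in REGISTRY:
        raise KeyError(f"unknown metric: {metric_name}")
    return REGISTRY[metric_name]
```

Once consumers resolve names through a registry like this, "what does X actually mean?" has a lookup, not a meeting.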
Outcomes
| Metric | Outcome |
|---|---|
| Ingestion cost | −90% |
| Transformation cost & latency | −80% |
| Snowflake ingestion cost on Iceberg-migrated pipelines | −99% |
| Platform uptime | 99.9% |
| Engineer onboarding time | 1 week → hours |
| Ad-hoc metric-definition requests | ~−90% |
The data team stopped being a bottleneck. The CFO stopped flinching at the Snowflake bill. Product teams kept shipping changes upstream without breaking downstream. Executives stopped arguing about whose number was right.
Key Takeaways
- Size ingestion to change volume, not tenant count. Per-tenant or per-table connectors are a tax that scales the wrong way.
- Push cold workloads off your hot warehouse. Iceberg + Polaris is now a credible escape valve for per-byte ingest pricing.
- Contracts are the data-mesh primitive that actually matters. Without them, "mesh" is just "distributed chaos".
- Standardisation beats heroics. One dbt framework + CI + TDD across a 20-person team buys more reliability than any monitoring tool.
- A semantic layer is leverage. Once metrics are governed, the data team stops being a help-desk and starts being a platform.
Multi-tenant ingestion costs out of control?
- Free 30-minute ingestion architecture review
- Sized ingestion blueprint for your tenant model
- Cost, latency & migration risk

If your CDC bill is climbing faster than your tenant count, or onboarding a new source is a multi-week engineering project, let's review your ingestion architecture and design something that actually scales — and doesn't break the bank.