Skip to content

End-to-End Reverse ETL: Snowflake → Salesforce (Lightweight, Config-Driven)

Case Study Summary

Client: A B2B SaaS Company

Impact Metrics:

  • $10,000/year saved (tooling fees avoided)
  • ~90% latency reduction (direct bulk path)
  • Operational visibility: logging + failed-record summaries

Business Context & Problem

Why change?

  • Managed Reverse ETL (e.g., Census) was overkill for simple syncs.

  • Latency & control - Third-party connectors added delay and limited tuning.

  • Ownership - Need a simple, auditable pipeline under engineering control.

Success criteria

  • Lower cost and latency, predictable transforms, clear logs/auditability, and fast iteration with minimal code.

Requirements & Constraints

  • Config over code - Add/adjust syncs via YAML, not app changes.
  • Simple transforms - Mostly renames/mappings; predictable & auditable.
  • Bulk-first - Prefer Salesforce Bulk API for higher volumes.
  • Safety rails - Avoid overwriting populated fields where not intended.
  • Fault isolation - One bad sync/record must not halt others.
  • Mixed operation modes - insert, update, and upsert available per config.

Solution Overview

  • Python app: run_reverse_etl.py discovers YAML configs and executes syncs independently.
  • Extract: Snowflake via Snowpark Session (compute pushed to warehouse).
  • Transform: Lightweight field mapping & conditional logic.
  • Load: Salesforce REST or Bulk (toggle per config).
  • Config-driven: Each sync defined in configs/*.yaml.
  • Selective queries: Snowflake SQL returns only rows that require syncing.

Architecture & Data Flow

flowchart LR
    Dev[(GitLab CI Schedules/Manual)] --> GR@{shape: rounded, label: "GitLab Runner (AWS)"}
    GR -->|Pull image & run| CTR@{shape: rounded, label: "Docker Container", animation: fast}
    VARS[(AWS Secrets Manager)] --> CTR
    CTR e1@-->|Logs| SNOW@{shape: rounded, label: "Snowflake"}
    SNOW e2@-->|Results only rows to sync| CTR
    CTR e3@-->|REST/Bulk| SF@{shape: rounded, label: "Salesforce"}
    SNOW --> MAIL@{shape: rounded, label: "Email Alerts (Snowflake)"}

    e1@{ animation: fast }
    e2@{ animation: fast }
    e3@{ animation: fast }

Flow (high-level)

  1. Runner starts container with app + configs + secrets from AWS Secrets Manager.
  2. App discovers YAML configs; each sync runs in isolation.
  3. Extract executes in Snowflake; queries are scoped to only rows requiring sync.
  4. Transform maps/renames and applies guardrails (e.g., set_if_empty).
  5. Load upserts/updates/inserts via REST or Bulk; results summarized per config.
  6. Observability: email alerts are sent via Snowflake.

Configuration-Driven Design

  • Add a new sync by dropping a YAML file—no code changes, faster iteration, clear Git diffs/reviews.
  • Consistent runtime: same runner, logging, and guardrails across all syncs.

Notes:

  • set_if_empty prevents overwriting populated fields.
  • Per-config operation enables mixed modes.

Deployment & Operations

  • Runner: GitLab Runner in AWS.
  • Image: Pulled from ECR or GitLab Registry.
  • Secrets: AWS Secrets Manager (Snowflake, Salesforce credentials, etc.).

Runtime

  • Triggering: CI schedules and/or manual runs.
  • Batching: Bulk API by default for volume; REST for per-record sets.
  • Selective loads: Snowflake SQL returns only rows needing sync.

Results & Impact

  • Cost: Removed managed Reverse ETL license for these syncs (≈ $10k/year).
  • Performance: ~90% latency reduction via direct Snowflake → SF bulk path.
  • Ops clarity: Report-only with clear summaries; email alerts via Snowflake; Git-backed audit trail.

Trade-offs & Alternatives

  • Managed Reverse ETL: Pre-built connectors; higher cost/less control.
  • Heavier custom build: More features (DQ, lineage, retries), but higher maintenance.
  • This pattern: Best ROI for simple, selective syncs where control matters.
  • Ready to optimize your data activation?


    Free 30-minute reverse ETL consultation
    Custom sync strategy & cost analysis
    Potential savings & performance gains

    Tired of expensive third-party tools for simple data syncs? Let's explore how a lightweight, custom reverse ETL solution could save you thousands while giving you better control and performance.

    Explore Custom Solutions