End-to-End Reverse ETL: Snowflake → Salesforce (Lightweight, Config-Driven)
Case Study Summary
Client: A B2B SaaS Company
Impact Metrics:
- $10,000/year saved (tooling fees avoided)
- ~90% latency reduction (direct bulk path)
- Operational visibility: logging + failed-record summaries
Business Context & Problem
Why change?
-
Managed Reverse ETL (e.g., Census) was overkill for simple syncs.
-
Latency & control - Third-party connectors added delay and limited tuning.
-
Ownership - Need a simple, auditable pipeline under engineering control.
Success criteria
- Lower cost and latency, predictable transforms, clear logs/auditability, and fast iteration with minimal code.
Requirements & Constraints
- Config over code - Add/adjust syncs via YAML, not app changes.
- Simple transforms - Mostly renames/mappings; predictable & auditable.
- Bulk-first - Prefer Salesforce Bulk API for higher volumes.
- Safety rails - Avoid overwriting populated fields where not intended.
- Fault isolation - One bad sync/record must not halt others.
- Mixed operation modes - insert, update, and upsert available per config.
Solution Overview
- Python app: run_reverse_etl.py discovers YAML configs and executes syncs independently.
- Extract: Snowflake via Snowpark Session (compute pushed to warehouse).
- Transform: Lightweight field mapping & conditional logic.
- Load: Salesforce REST or Bulk (toggle per config).
- Config-driven: Each sync defined in configs/*.yaml.
- Selective queries: Snowflake SQL returns only rows that require syncing.
Architecture & Data Flow
flowchart LR
Dev[(GitLab CI Schedules/Manual)] --> GR@{shape: rounded, label: "GitLab Runner (AWS)"}
GR -->|Pull image & run| CTR@{shape: rounded, label: "Docker Container", animation: fast}
VARS[(AWS Secrets Manager)] --> CTR
CTR e1@-->|Logs| SNOW@{shape: rounded, label: "Snowflake"}
SNOW e2@-->|Results only rows to sync| CTR
CTR e3@-->|REST/Bulk| SF@{shape: rounded, label: "Salesforce"}
SNOW --> MAIL@{shape: rounded, label: "Email Alerts (Snowflake)"}
e1@{ animation: fast }
e2@{ animation: fast }
e3@{ animation: fast }
Flow (high-level)
- Runner starts container with app + configs + secrets from AWS Secrets Manager.
- App discovers YAML configs; each sync runs in isolation.
- Extract executes in Snowflake; queries are scoped to only rows requiring sync.
- Transform maps/renames and applies guardrails (e.g., set_if_empty).
- Load upserts/updates/inserts via REST or Bulk; results summarized per config.
- Observability: email alerts are sent via Snowflake.
Configuration-Driven Design
- Add a new sync by dropping a YAML file—no code changes, faster iteration, clear Git diffs/reviews.
- Consistent runtime: same runner, logging, and guardrails across all syncs.
Notes:
- set_if_empty prevents overwriting populated fields.
- Per-config operation enables mixed modes.
Deployment & Operations
- Runner: GitLab Runner in AWS.
- Image: Pulled from ECR or GitLab Registry.
- Secrets: AWS Secrets Manager (Snowflake, Salesforce credentials, etc.).
Runtime
- Triggering: CI schedules and/or manual runs.
- Batching: Bulk API by default for volume; REST for per-record sets.
- Selective loads: Snowflake SQL returns only rows needing sync.
Results & Impact
- Cost: Removed managed Reverse ETL license for these syncs (≈ $10k/year).
- Performance: ~90% latency reduction via direct Snowflake → SF bulk path.
- Ops clarity: Report-only with clear summaries; email alerts via Snowflake; Git-backed audit trail.
Trade-offs & Alternatives
- Managed Reverse ETL: Pre-built connectors; higher cost/less control.
- Heavier custom build: More features (DQ, lineage, retries), but higher maintenance.
- This pattern: Best ROI for simple, selective syncs where control matters.
-
Ready to optimize your data activation?
Free 30-minute reverse ETL consultation
Custom sync strategy & cost analysis
Potential savings & performance gainsTired of expensive third-party tools for simple data syncs? Let's explore how a lightweight, custom reverse ETL solution could save you thousands while giving you better control and performance.