Supply Chain Data Pipeline
How we replaced a fragile chain of 23 SSIS packages with a modern Azure Data Factory pipeline, cutting the nightly ETL window from 6 hours to 45 minutes and achieving 99.7% data quality scores.
Client Context
A regional auto-parts manufacturer with 6 distribution centers relied on a sprawling SQL Server Integration Services (SSIS) architecture to move data between its ERP, warehouse management system, CRM, and reporting data warehouse. Over eight years, this had grown into 23 fragile SSIS packages with hard-coded connection strings, no error handling, and zero monitoring.
When a nightly ETL job failed — which happened 3-4 times per month — the operations team didn't know until the next morning, when warehouse managers reported missing inventory data. Each failure took 2-4 hours of DBA time to diagnose and restart, at an estimated cost of $2,200 per incident.
The data warehouse itself had grown to 1.2 TB with no partitioning strategy. Full-load refreshes were the only option, taking 6+ hours and blocking the reporting layer well into business hours in the western time zones.
A Data Pipeline Built on Quicksand
The existing data architecture had no concept of incremental loads, change tracking, or data quality validation. Everything was a full extract-transform-load that treated the data warehouse like it was being rebuilt from scratch every night.
📦 23 Fragile Packages
The 23 SSIS packages were built by different contractors over eight years. No standardization, no shared connection managers, and hard-coded server names that broke during every infrastructure change.
⏰ 6-Hour ETL Window
Full-load extracts from the ERP ran for 6+ hours, overlapping with business hours in western time zones. Warehouse managers couldn't access accurate inventory data until after 10 AM.
🔇 Silent Failures
No monitoring, no alerting, no logging. The only failure detection was angry warehouse managers calling the help desk. DBA response time averaged 45 minutes just to begin diagnosis.
📊 Query Performance
The 1.2 TB data warehouse had no partitioning, no columnstore indexes, and 200+ stored procedures with cursor-based operations. Complex reports timed out after 10 minutes.
Modern Data Architecture
We designed a three-tier data architecture that separates ingestion, transformation, and presentation layers — each independently scalable and monitorable.
01 — Change Data Capture
Enabled CDC on all 34 source tables in the ERP database. Instead of extracting millions of rows every night, the pipeline now processes only the rows that changed since the last run, typically 0.5% of the total volume (the enablement script is sketched after step 04).
02 — ADF Pipeline Architecture
Consolidated the 23 SSIS packages into 4 parameterized ADF pipelines with shared linked services, dynamic dataset references, and metadata-driven orchestration (an illustrative control-table sketch follows step 04). Each pipeline is self-documenting and version-controlled in Git.
03 — Warehouse Optimization
Implemented sliding-window table partitioning on all fact tables by month and added clustered columnstore indexes (see the DDL sketch after step 04), reducing storage from 1.2 TB to 340 GB (72% compression) while accelerating analytical queries by 15×.
04 — Observability Layer
Built an Azure Monitor integration with custom KQL dashboards tracking pipeline duration, row counts, data quality scores, and SLA compliance. Automated PagerDuty alerts fire within 30 seconds of any detected anomaly.
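Step 01 in practice: enabling CDC is a one-time operation at the database level and then per table. The sketch below uses a hypothetical source table and schema, not the client's actual ERP objects.

```sql
-- One-time CDC enablement on the ERP source database (illustrative names).
USE ERP_Source;
GO

-- Database-level enablement: creates the cdc schema and metadata objects.
EXEC sys.sp_cdc_enable_db;
GO

-- Table-level enablement, repeated for each of the 34 source tables; creates the
-- change table and (on the first table) the capture and cleanup jobs.
-- @supports_net_changes = 1 requires a primary key and exposes the
-- cdc.fn_cdc_get_net_changes_<capture_instance> function used by the pipeline.
EXEC sys.sp_cdc_enable_table
     @source_schema        = N'dbo',
     @source_name          = N'InventoryTransactions',
     @role_name            = NULL,   -- no gating role; rely on database permissions
     @supports_net_changes = 1;
GO
```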
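For step 02, the metadata that drives the parameterized pipelines can be pictured as a control table that an ADF Lookup activity reads and a ForEach activity iterates, one row per source table. The schema below is an illustrative sketch, not the engagement's actual metadata store.

```sql
-- Illustrative control table driving the parameterized ADF pipelines.
CREATE TABLE etl.PipelineControl (
    SourceSchema     SYSNAME       NOT NULL,
    SourceTable      SYSNAME       NOT NULL,
    CaptureInstance  SYSNAME       NOT NULL,  -- CDC capture instance to query
    TargetSchema     SYSNAME       NOT NULL,
    TargetTable      SYSNAME       NOT NULL,
    BusinessKey      NVARCHAR(200) NOT NULL,  -- key column(s) used by the MERGE
    IsEnabled        BIT           NOT NULL DEFAULT 1,
    LastProcessedLSN BINARY(10)    NULL,      -- watermark advanced after each successful run
    CONSTRAINT PK_PipelineControl PRIMARY KEY (SourceSchema, SourceTable)
);

-- One row per source table; onboarding a new table needs no pipeline code changes.
INSERT INTO etl.PipelineControl
    (SourceSchema, SourceTable, CaptureInstance, TargetSchema, TargetTable, BusinessKey)
VALUES
    (N'dbo', N'InventoryTransactions', N'dbo_InventoryTransactions',
     N'dw',  N'FactInventory',         N'TransactionID');
```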
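Step 03 as T-SQL DDL, with illustrative object names and boundary dates: a monthly RANGE RIGHT partition function, a matching partition scheme, and a clustered columnstore index built on that scheme.

```sql
-- Monthly sliding-window partitioning plus columnstore compression (illustrative names).
CREATE PARTITION FUNCTION pf_MonthlyDate (DATE)
    AS RANGE RIGHT FOR VALUES ('2023-01-01', '2023-02-01', '2023-03-01');  -- one boundary per month

CREATE PARTITION SCHEME ps_MonthlyDate
    AS PARTITION pf_MonthlyDate ALL TO ([PRIMARY]);

-- Building the fact table as a clustered columnstore index on the partition scheme
-- delivers both the compression and the batch-mode query speedup.
-- (If the table already has a clustered rowstore index, recreate it with DROP_EXISTING = ON.)
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactInventory
    ON dw.FactInventory
    ON ps_MonthlyDate (TransactionDate);

-- Sliding the window each month: add the next boundary, then switch the oldest
-- partition out to an archive table if retention requires it.
ALTER PARTITION SCHEME ps_MonthlyDate NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_MonthlyDate() SPLIT RANGE ('2023-04-01');
```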
CDC-Driven Incremental Loading
The core optimization was replacing full-table extracts with Change Data Capture incremental loads. The following pattern captures only the modified rows, validates data quality in-flight, and merges them into the warehouse with a single set-based MERGE statement.
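A representative version of the pattern in T-SQL is shown below. The table, column, and watermark names are illustrative (they reuse the hypothetical objects from the sketches above) rather than the client's actual schema; in the ADF pipelines these values come from the control-table metadata.

```sql
-- Incremental load for one source table: read the CDC net changes since the last
-- watermark, quarantine rows that fail validation, and MERGE the rest into the warehouse.
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10);

-- 1. LSN window: from the last successful run to the current maximum LSN.
--    (A production version would also step past the already-processed LSN,
--     e.g. with sys.fn_cdc_increment_lsn.)
SELECT @from_lsn = ISNULL(LastProcessedLSN, sys.fn_cdc_get_min_lsn(N'dbo_InventoryTransactions'))
FROM   etl.PipelineControl
WHERE  SourceSchema = N'dbo' AND SourceTable = N'InventoryTransactions';

SET @to_lsn = sys.fn_cdc_get_max_lsn();

-- 2. Net changes only (typically a fraction of a percent of the table).
SELECT  ct.__$operation AS Operation,   -- 1 = delete, 2 = insert, 4 = update
        ct.TransactionID,
        ct.PartNumber,
        ct.WarehouseCode,
        ct.QuantityOnHand,
        ct.ModifiedDate
INTO    #Changes
FROM    cdc.fn_cdc_get_net_changes_dbo_InventoryTransactions(@from_lsn, @to_lsn, N'all') AS ct;

-- 3. In-flight data quality checks: quarantine bad rows rather than loading them.
INSERT INTO etl.QuarantinedRows (TransactionID, PartNumber, QuantityOnHand, QuarantinedAt)
SELECT c.TransactionID, c.PartNumber, c.QuantityOnHand, SYSUTCDATETIME()
FROM   #Changes AS c
WHERE  c.PartNumber IS NULL OR c.QuantityOnHand < 0;

DELETE FROM #Changes
WHERE PartNumber IS NULL OR QuantityOnHand < 0;

-- 4. Apply the surviving changes to the fact table in one set-based statement.
MERGE dw.FactInventory AS tgt
USING #Changes         AS src
   ON tgt.TransactionID = src.TransactionID
WHEN MATCHED AND src.Operation = 1 THEN
    DELETE
WHEN MATCHED AND src.Operation IN (2, 4) THEN
    UPDATE SET tgt.PartNumber     = src.PartNumber,
               tgt.WarehouseCode  = src.WarehouseCode,
               tgt.QuantityOnHand = src.QuantityOnHand,
               tgt.ModifiedDate   = src.ModifiedDate
WHEN NOT MATCHED BY TARGET AND src.Operation IN (2, 4) THEN
    INSERT (TransactionID, PartNumber, WarehouseCode, QuantityOnHand, ModifiedDate)
    VALUES (src.TransactionID, src.PartNumber, src.WarehouseCode, src.QuantityOnHand, src.ModifiedDate);

-- 5. Advance the watermark only after the merge succeeds.
UPDATE etl.PipelineControl
SET    LastProcessedLSN = @to_lsn
WHERE  SourceSchema = N'dbo' AND SourceTable = N'InventoryTransactions';
```

Because net changes plus MERGE re-applies cleanly to the same target state, advancing the watermark only after a successful merge makes a retried run safe rather than duplicating data.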
This pattern processes an average of 12,000 changed rows per run instead of the previous 2.4 million-row full-table extract. Combined with columnstore compression, the warehouse now refreshes in 45 minutes, well before the first business user logs in.
Measurable Impact
The migration took 10 weeks, including a 2-week parallel-run validation period where both old and new pipelines ran simultaneously to verify data parity.
| Metric | Before | After | Improvement |
|---|---|---|---|
| ETL Duration | 6 hours | 45 minutes | ▲ 87% shorter |
| Pipeline Packages | 23 SSIS packages | 4 ADF pipelines | ▲ 83% fewer |
| Monthly Failures | 3-4 incidents | 0 incidents (6 mo.) | ▲ 100% fewer |
| Data Warehouse Size | 1.2 TB | 340 GB | ▲ 72% smaller |
| Report Query Time | 8-10 min (timeouts) | 15-40 seconds | ▲ 15× faster |
| Failure Detection | Next-day discovery | 30-second alerting | ▲ Real-time |
"We used to dread Monday mornings because that's when the weekend ETL failures surfaced. Now the data warehouse refreshes in under an hour and we haven't had a single failure in six months. Our warehouse managers have accurate inventory before their first coffee. The project paid for itself in the first quarter."
Ready to Fix Your Data Pipeline?
Let's replace fragile ETL with a modern, observable data architecture.
Start Your Project →