Corrupted‑to‑Gold


Corrupted‑to‑Gold: MDM Foundations

This build takes intentionally corrupted CSV files and transforms them into clean, trusted, analytics‑ready Delta tables. Along the way, the pipeline handles the problems real‑world data engineering throws up: inconsistent delimiters, malformed dates, broken encodings, duplicate entities, and multi‑format ingestion.

The solution includes:

✔ RAW → BRONZE ingestion

  • Normalized delimiters
  • Removed non‑ASCII noise
  • Safe splitting logic
  • Header detection
  • Schema validation
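
A minimal sketch of the RAW → BRONZE steps above, written in plain Python so it runs without a Spark cluster (the project itself targets Delta tables, so its real code likely differs). The column list, the candidate delimiters, and the function names `strip_non_ascii`, `split_safely`, `is_header`, and `raw_to_bronze` are hypothetical. Each line is stripped of non-ASCII characters, then parsed with Python's quote-aware `csv` reader under each candidate delimiter until one yields the expected column count; header rows are recognized and dropped, and lines that fit no delimiter are rejected rather than silently mangled.

```python
import csv

# Hypothetical bronze schema; the project's real column names are not given in the post.
EXPECTED_COLUMNS = ["customer_id", "name", "email", "phone", "country", "status", "signup_date"]

# Delimiters assumed to occur in the corrupted files, tried in this order.
CANDIDATE_DELIMITERS = [",", ";", "|", "\t"]


def strip_non_ascii(text: str) -> str:
    """Drop every character outside ASCII (BOMs, mojibake, stray symbols)."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def split_safely(line: str, width: int):
    """Try each delimiter with a quote-aware parser; accept the first that yields `width` fields."""
    for delimiter in CANDIDATE_DELIMITERS:
        fields = next(csv.reader([line], delimiter=delimiter))
        if len(fields) == width:
            return [field.strip() for field in fields]
    return None  # no delimiter produced the expected shape


def is_header(fields) -> bool:
    """Detect header rows, including headers repeated mid-file by concatenated extracts."""
    normalized = [field.lower().replace(" ", "_") for field in fields]
    return normalized == EXPECTED_COLUMNS


def raw_to_bronze(raw_text: str):
    """Return (rows, rejected): schema-valid rows as dicts, plus raw lines that failed validation."""
    width = len(EXPECTED_COLUMNS)
    rows, rejected = [], []
    for raw_line in raw_text.splitlines():
        line = strip_non_ascii(raw_line).strip()
        if not line:
            continue  # skip blank lines
        fields = split_safely(line, width)
        if fields is None:
            rejected.append(raw_line)  # schema validation failed: wrong column count
        elif not is_header(fields):
            rows.append(dict(zip(EXPECTED_COLUMNS, fields)))
    return rows, rejected
```

Trying delimiters per line, instead of rewriting every `;` or `|` to a comma, keeps delimiter characters that legitimately appear inside quoted values intact. Dropping all non-ASCII characters also removes legitimate accents (`José` becomes `Jos`), so repairing the encoding may be preferable when the source charset is known.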
    

✔ SILVER cleansing

  • Email, phone, country, and status normalization
  • Multi‑format date parsing
  • Null handling and type correction
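
The SILVER rules can be sketched as small per-field functions. Everything here is an assumption for illustration: the lookup tables, the null tokens, the accepted date formats (slash dates are read day-first), and the helper names such as `cleanse_row`. Unrecognized or unparseable values become `None` rather than guesses, so later scoring can treat them as missing.

```python
import re
from datetime import datetime

# Hypothetical lookup tables; extend them with whatever variants the raw data contains.
COUNTRY_MAP = {"us": "US", "usa": "US", "united states": "US",
               "uk": "GB", "gb": "GB", "united kingdom": "GB"}
STATUS_MAP = {"active": "ACTIVE", "a": "ACTIVE", "inactive": "INACTIVE", "i": "INACTIVE"}

# Formats tried in order; reading slash dates day-first is a guess about the source.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%Y/%m/%d"]

NULL_TOKENS = {"", "null", "none", "nan", "n/a", "na", "-"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def to_null(value):
    """Map placeholder strings such as 'N/A' or 'null' to a real None; trim everything else."""
    if value is None or value.strip().lower() in NULL_TOKENS:
        return None
    return value.strip()


def clean_email(value):
    value = (value or "").lower()
    return value if EMAIL_RE.match(value) else None


def clean_phone(value):
    digits = re.sub(r"\D", "", value or "")  # keep digits only
    return digits or None


def parse_date(value):
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value or "", fmt).date()
        except ValueError:
            continue
    return None  # unparseable dates become null rather than guesses


def cleanse_row(row: dict) -> dict:
    r = {key: to_null(value) for key, value in row.items()}
    raw_id = r.get("customer_id")
    return {
        "customer_id": int(raw_id) if raw_id and raw_id.isdecimal() else None,  # type correction
        "name": r.get("name"),
        "email": clean_email(r.get("email")),
        "phone": clean_phone(r.get("phone")),
        "country": COUNTRY_MAP.get((r.get("country") or "").lower()),
        "status": STATUS_MAP.get((r.get("status") or "").lower()),
        "signup_date": parse_date(r.get("signup_date")),
    }
```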
    

✔ Survivorship scoring (MDM‑light)

  • Email validity
  • Phone completeness
  • Date reliability
  • Status correctness
  • Row‑level scoring + best‑record selection
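
The post lists the survivorship criteria but not how they are weighted, so the weights, the 10-digit phone threshold, the valid status set, and grouping duplicates by `customer_id` in this sketch are all assumptions, as are the names `score_row` and `select_survivors`. Each cleansed row receives a score, and the highest-scoring row in each group of duplicates survives.

```python
from collections import defaultdict
from datetime import date

# Hypothetical weights: the post names the criteria but not their relative importance.
WEIGHTS = {"email": 3, "phone": 2, "date": 2, "status": 1}
MIN_PHONE_DIGITS = 10  # assumption: a "complete" phone number has at least 10 digits
VALID_STATUSES = {"ACTIVE", "INACTIVE"}


def score_row(row: dict, today=None) -> int:
    """Row-level quality score: a higher score means a more trustworthy record."""
    today = today or date.today()
    score = 0
    if row.get("email"):  # assumes invalid emails were already nulled during cleansing
        score += WEIGHTS["email"]
    if len(row.get("phone") or "") >= MIN_PHONE_DIGITS:
        score += WEIGHTS["phone"]
    signup = row.get("signup_date")
    if isinstance(signup, date) and signup <= today:  # reliable = parsed and not in the future
        score += WEIGHTS["date"]
    if row.get("status") in VALID_STATUSES:
        score += WEIGHTS["status"]
    return score


def select_survivors(rows, key="customer_id"):
    """Group duplicate rows by a match key and keep the highest-scoring row per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # max() returns the first of several equally scored rows, so ties keep input order.
    return {match_key: max(group, key=score_row) for match_key, group in groups.items()}
```

Breaking ties by input order is the simplest deterministic rule; a real pipeline might instead prefer the most recently updated record.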
    

✔ GOLD outputs

  • Clean, deduped, resequenced customer entities
  • Delta‑optimized storage
  • Ready for downstream MDM and ADF Mapping Data Flows
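
A sketch of the final GOLD shaping step, assuming survivors arrive as dicts with hashable values: exact duplicates are dropped and the remaining entities receive a dense, deterministic surrogate key. The function name `build_gold` and the `gold_customer_key` column are hypothetical, and writing the result to Delta (which needs Spark with Delta Lake) is out of scope here.

```python
def build_gold(rows):
    """Drop exact duplicate rows, then resequence them with a dense 1..n surrogate key."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))  # hashable snapshot of the whole row
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    # Order by source id so the new key is deterministic; rows without an id sort last.
    ordered = sorted(unique, key=lambda r: (r.get("customer_id") is None, r.get("customer_id") or 0))
    return [{"gold_customer_key": i, **row} for i, row in enumerate(ordered, start=1)]
```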
    

This project lays the foundation for the full MDM pipeline (matching, clustering, survivorship, and golden‑record creation) and for the upcoming ADF Mapping Data Flows (MDF) suite, which will demonstrate schema drift, parameterization, joins, window functions, branching, SCD‑light, partitioning, and KPI generation.

If you want the full code, architecture, or notebooks, the repo is here:

github.com/usman19zafar/Project_corrupted-to-gold-data-journey
