Workflow Orchestration


Just completed Module 2 of the Data Engineering Zoomcamp 2026. Built production-ready data pipelines using Kestra to process 26 million NYC taxi trip records.
What I accomplished:

Orchestrated ETL workflows with Kestra
Ingested data from GitHub into GCS, then loaded it into BigQuery
Implemented partitioned tables for query optimization
Built MERGE operations for data deduplication
Automated monthly data loads with schedule triggers
Completed all homework questions (6/6)
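The MERGE-based deduplication and partitioning from the list above can be sketched in plain Python. This is a minimal sketch, not the flow's actual SQL. It assumes each row is identified by an MD5 hash over a hypothetical set of key columns (KEY_COLUMNS), emulates `MERGE ... WHEN NOT MATCHED THEN INSERT` with a dict keyed by that hash, and groups rows by pickup date the way a day-partitioned BigQuery table would. The names unique_row_id, merge_new_rows and partition_by_pickup_day are illustrative only.

```python
import hashlib
from collections import defaultdict
from datetime import date, datetime

# Hypothetical key columns for yellow-taxi rows; the real flow may hash a different set.
KEY_COLUMNS = (
    "VendorID",
    "tpep_pickup_datetime",
    "tpep_dropoff_datetime",
    "PULocationID",
    "DOLocationID",
    "fare_amount",
    "trip_distance",
)


def unique_row_id(row: dict) -> str:
    """MD5 over the key columns; None becomes "" (like COALESCE(CAST(x AS STRING), ""))."""
    parts = ["" if row.get(col) is None else str(row[col]) for col in KEY_COLUMNS]
    # A separator keeps ("1", "23") and ("12", "3") from producing the same string.
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def merge_new_rows(target: dict[str, dict], staging: list[dict]) -> int:
    """Emulate MERGE ... WHEN NOT MATCHED THEN INSERT keyed on unique_row_id."""
    inserted = 0
    for row in staging:
        rid = unique_row_id(row)
        if rid not in target:  # matched rows are skipped, never updated
            target[rid] = {**row, "unique_row_id": rid}
            inserted += 1
    return inserted


def partition_by_pickup_day(target: dict[str, dict]) -> dict[date, list[dict]]:
    """Group rows by DATE(tpep_pickup_datetime), like a day-partitioned table."""
    partitions: dict[date, list[dict]] = defaultdict(list)
    for row in target.values():
        pickup: datetime = row["tpep_pickup_datetime"]
        partitions[pickup.date()].append(row)
    return dict(partitions)
```

Running merge_new_rows twice with the same batch inserts the new rows once and returns 0 the second time, which is what makes rerunning a monthly load safe. A query that filters on one pickup date only needs that date's partition, which is the intuition behind partition pruning in BigQuery.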

Key learnings:

Template rendering with Pebble syntax
Handling trigger.date for manual vs scheduled runs
GCS storage class compatibility (REGIONAL vs STANDARD)
IANA time zone names (e.g. America/New_York) instead of fixed UTC offsets, so schedules follow DST
BigQuery partitioning strategies
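The two scheduling lessons can be illustrated together in Python. This sketch assumes a scheduled run exposes its trigger date while a manual run does not, so a manual run has to fall back to explicit year and month inputs. resolve_period and source_file_name are hypothetical helpers, and the file pattern `<taxi>_tripdata_<YYYY-MM>.csv.gz` is an assumption. utc_offset_hours uses the standard-library zoneinfo module to show why an IANA zone name matters: the same 09:00 wall-clock time has a different UTC offset in winter and summer.

```python
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo  # stdlib since Python 3.9; needs the system tz database


def resolve_period(
    trigger_date: Optional[datetime],
    year: Optional[int] = None,
    month: Optional[int] = None,
) -> str:
    """Return the 'YYYY-MM' month a run should load.

    Scheduled runs carry a trigger date; manual runs do not, so they
    fall back to explicit inputs instead of reading trigger.date.
    """
    if trigger_date is not None:
        return trigger_date.strftime("%Y-%m")
    if year is None or month is None:
        raise ValueError("manual runs need explicit year and month inputs")
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return f"{year:04d}-{month:02d}"


def source_file_name(taxi: str, period: str) -> str:
    """Hypothetical naming pattern for the monthly CSV in the GitHub release."""
    return f"{taxi}_tripdata_{period}.csv.gz"


def utc_offset_hours(local_wall_time: datetime, tz_name: str = "America/New_York") -> float:
    """UTC offset (in hours) of a wall-clock time in an IANA zone; changes across DST."""
    aware = local_wall_time.replace(tzinfo=ZoneInfo(tz_name))
    return aware.utcoffset().total_seconds() / 3600
```

With America/New_York, 09:00 on 1 January 2024 is UTC-5 and 09:00 on 1 July 2024 is UTC-4. A schedule pinned to a fixed -05:00 offset would drift to 10:00 local time during daylight saving time, while an IANA name keeps it at 09:00.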

Cost efficiency:
Processed 2 GB of data for less than one dollar, staying well within GCP's free tier.
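The free-tier claim can be checked with back-of-the-envelope arithmetic. The Python sketch below assumes BigQuery on-demand pricing of 6.25 USD per TiB scanned and a 1 TiB monthly free query allowance; both are assumptions to verify against the current pricing page for your region. It treats the 2 GB as roughly 2 GiB scanned by queries and ignores storage, GCS costs and per-query minimums, so it is an estimate, not a bill.

```python
GIB = 1024**3
TIB = 1024**4


def on_demand_query_cost(
    bytes_scanned_in_month: int,
    price_per_tib: float = 6.25,  # assumed on-demand list price; check your region
    free_tib_per_month: float = 1.0,  # assumed monthly free query allowance
) -> float:
    """Estimate BigQuery on-demand query cost in USD after the free allowance."""
    billable_tib = max(0.0, bytes_scanned_in_month / TIB - free_tib_per_month)
    return billable_tib * price_per_tib


def share_of_free_tier(bytes_scanned_in_month: int, free_tib_per_month: float = 1.0) -> float:
    """Fraction of the monthly free query allowance that a workload uses."""
    return bytes_scanned_in_month / (free_tib_per_month * TIB)
```

Under these assumptions, 2 GiB of scanning uses about 0.2% of the free allowance and costs nothing; only scanning beyond 1 TiB in a month would be billed (3 TiB would come to 12.50 USD). Partition filters help here too, since they reduce the bytes scanned per query.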
Check out my project on GitHub: https://github.com/Derrick-Ryan-Giggs/module-2-workflow-orchestration
Thanks to DataTalksClub for this amazing free course.
