Workflow Orchestration


Just completed Module 2 of the Data Engineering Zoomcamp 2026. Built production-ready data pipelines using Kestra to process 26 million NYC taxi trip records.

What I accomplished:

Orchestrated ETL workflows with Kestra
Ingested data from GitHub to GCS to BigQuery
Implemented partitioned tables for query optimization
Built MERGE operations for data deduplication
Automated monthly data loads with schedule triggers
Completed all homework questions (6/6)
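
To make the deduplication idea concrete, here is a minimal Python sketch of what a MERGE with an insert-when-not-matched clause does. It is a simulation of the semantics, not the SQL from my flows: the function names, the column names, and the choice of an MD5 hash over a few key columns as the row identifier are all assumptions made for illustration.

```python
import hashlib


def row_id(row, key_columns):
    """Build a deterministic identifier from the chosen key columns (hypothetical scheme)."""
    raw = "|".join(str(row.get(col, "")) for col in key_columns)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def merge_insert_when_not_matched(target, staging, key_columns):
    """Insert staging rows whose identifier is not already in the target.

    `target` is a dict keyed by row identifier and stands in for the final table.
    Returns the number of rows inserted.
    """
    inserted = 0
    for row in staging:
        rid = row_id(row, key_columns)
        if rid not in target:  # equivalent of WHEN NOT MATCHED THEN INSERT
            target[rid] = row
            inserted += 1
    return inserted


# Hypothetical column names, for illustration only.
KEYS = ["vendor_id", "pickup_datetime", "dropoff_datetime"]
batch = [
    {"vendor_id": 1, "pickup_datetime": "2021-01-01 00:10:00",
     "dropoff_datetime": "2021-01-01 00:25:00", "fare": 12.5},
    {"vendor_id": 2, "pickup_datetime": "2021-01-01 00:12:00",
     "dropoff_datetime": "2021-01-01 00:40:00", "fare": 30.0},
]

final_table = {}
first_load = merge_insert_when_not_matched(final_table, batch, KEYS)
second_load = merge_insert_when_not_matched(final_table, batch, KEYS)
```

Loading the same batch twice leaves the target unchanged the second time, which is what makes re-running a monthly load safe.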

Key learnings:

Template rendering with Pebble syntax
Handling trigger.date for manual vs scheduled runs
GCS storage class compatibility (REGIONAL vs STANDARD)
IANA timezone format for DST handling
BigQuery partitioning strategies
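
Two of these learnings are easier to see in code. The Python sketch below is not Kestra code. It only illustrates the logic: a manual run has no trigger date, so the flow needs a fallback, and a named IANA zone such as America/New_York follows DST while a fixed UTC offset does not. The function names and the choice of fallback value are assumptions.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def resolve_run_date(trigger_date, execution_start):
    """Use the schedule's date when present, otherwise fall back to the execution start.

    A manual run is modeled here as trigger_date=None (an assumption of this sketch).
    """
    return trigger_date if trigger_date is not None else execution_start


def utc_offset_hours(local_naive, tz_name):
    """Return the UTC offset, in hours, that an IANA zone applies to a local wall-clock time."""
    aware = local_naive.replace(tzinfo=ZoneInfo(tz_name))
    return aware.utcoffset().total_seconds() / 3600


# Same 09:00 wall-clock time in winter and in summer.
winter = utc_offset_hours(datetime(2021, 1, 15, 9, 0), "America/New_York")
summer = utc_offset_hours(datetime(2021, 7, 15, 9, 0), "America/New_York")

# Manual run: no trigger date, so the execution start is used instead.
manual = resolve_run_date(None, datetime(2021, 3, 1, 9, 0))
scheduled = resolve_run_date(datetime(2021, 2, 1, 9, 0), datetime(2021, 3, 1, 9, 0))
```

The winter and summer offsets differ by one hour, so a schedule pinned to a fixed offset would drift against local time for part of the year.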

Cost efficiency:

Processed 2 GB of data for less than one dollar, staying well within GCP's free tier.

Check out my project on GitHub: https://github.com/Derrick-Ryan-Giggs/module-2-workflow-orchestration

Thanks to DataTalksClub for this amazing free course.
