Batch Processing with Apache Spark


Week 6 of the Data Engineering Zoomcamp by @DataTalksClub is complete!

Just finished Module 6 - Batch Processing with Spark. Learned how to:

✅ Set up PySpark and create Spark sessions

✅ Read and process Parquet files at scale

✅ Repartition data for optimal performance

✅ Analyze millions of taxi trips with DataFrames

✅ Use Spark UI for monitoring jobs

Processing 4M+ taxi trips with Spark - distributed computing is powerful

Here's my homework solution: https://github.com/Derrick-Ryan-Giggs/pyspark-homework

Following along with this amazing free course - who else is learning data engineering?

You can sign up here: https://github.com/DataTalksClub/data-engineering-zoomcamp/
