Lessons from a Kaggle Playground Competition

I just participated in this month's Kaggle Tabular Playground challenge, which focused on stellar classification.

The goal was to classify astronomical objects into:

  • STAR
  • GALAXY
  • QSO (Quasar)

A few interesting findings from the exploration:

  1. Redshift turned out to be one of the strongest predictive signals.

  2. Color indices such as u-g, g-r, and r-i were often more informative than raw photometric measurements.

  3. A simple ensemble of LightGBM, XGBoost, and CatBoost performed surprisingly well.

  4. Despite a moderately imbalanced dataset (~65% Galaxy, ~20% QSO, ~14% Star), explicit class weighting actually reduced validation performance in my experiments.

  5. Hyperparameter tuning with Optuna did not outperform a carefully engineered baseline model, reinforcing the importance of feature engineering and domain understanding.
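As a rough illustration of finding 2, a color index is just the difference between two adjacent band magnitudes. The sketch below assumes SDSS-style column names `u`, `g`, `r`, and `i`, which this post does not confirm, and the magnitudes are made-up values for demonstration only. The helper name `add_color_indices` is hypothetical.

```python
import pandas as pd


def add_color_indices(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with adjacent-band color indices added.

    Assumes columns named u, g, r, i (an assumption, not taken from
    the competition data description).
    """
    out = df.copy()
    out["u_g"] = out["u"] - out["g"]
    out["g_r"] = out["g"] - out["r"]
    out["r_i"] = out["r"] - out["i"]
    return out


# Made-up magnitudes for two objects, used only to show the transformation.
sample = pd.DataFrame(
    {
        "u": [19.5, 23.0],
        "g": [18.0, 22.5],
        "r": [17.5, 21.0],
        "i": [17.25, 20.5],
    }
)
features = add_color_indices(sample)
print(features[["u_g", "g_r", "r_i"]])
```

Tree-based models can only split on one feature at a time, so handing them the difference directly may save them from having to approximate it with many axis-aligned splits on the raw bands.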

To document the entire process, I created a beginner-friendly guided notebook covering:

• Dataset understanding
• Exploratory analysis
• Feature engineering
• Categorical encoding
• Stratified K-Fold validation
• LightGBM, XGBoost, and CatBoost
• Model ensembling
• Experimentation with class imbalance and hyperparameter tuning
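For readers new to ensembling, here is a minimal sketch of one common approach, soft voting: average the class probabilities from each model and pick the most probable class. The post does not say which blending method the notebook uses, so treat this as an assumption. The probability arrays are hand-written placeholders rather than real model outputs, and `soft_vote` and the class order in `CLASSES` are hypothetical names chosen for illustration.

```python
import numpy as np

# Assumed class order for the probability columns (hypothetical).
CLASSES = np.array(["GALAXY", "QSO", "STAR"])


def soft_vote(prob_list, weights=None):
    """Blend per-model class probabilities by (weighted) averaging.

    prob_list: list of arrays, each shaped (n_samples, n_classes).
    weights:   optional per-model weights; None means a plain mean.
    Returns (predicted_labels, blended_probabilities).
    """
    stacked = np.stack(prob_list)  # (n_models, n_samples, n_classes)
    blended = np.average(stacked, axis=0, weights=weights)
    return CLASSES[blended.argmax(axis=1)], blended


# Placeholder probabilities standing in for LightGBM, XGBoost, and CatBoost.
lgbm_probs = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]])
xgb_probs = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
cat_probs = np.array([[0.8, 0.1, 0.1], [0.2, 0.6, 0.2]])

labels, blended = soft_vote([lgbm_probs, xgb_probs, cat_probs])
print(labels)
```

In the second row the placeholder XGBoost output favors STAR, but the averaged probabilities still favor QSO. That is the basic appeal of blending: one model's mistake can be outvoted by the others.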

If you're learning machine learning, working on tabular data, or interested in astronomy-related datasets, feel free to take a look.

I'm also happy to discuss alternative approaches, feature engineering ideas, or leaderboard strategies. Constructive feedback and questions are always welcome.

Notebook Link: Click Here
