Expanding the GTZAN Dataset: A Journey from YouTube to Mel Spectrograms

Apr 27 • 2 min read

— Originally published at dev.to

I am currently finishing my specialization in Artificial Intelligence and Big Data, and I’ve decided to document the progress of my final project. This work integrates everything I’ve learned in the modules of AI Models (Modelos de Inteligencia Artificial) and Machine Learning Systems (Sistemas de Aprendizaje Automático).

The goal? A robust Music Genre Classifier. But as any data scientist will tell you, the model is only as good as the data. Today, I focused on building the "Data Kitchen": the pipeline that fetches, cleans, and prepares audio for training.

1. The Challenge: Expanding the GTZAN Dataset

While the GTZAN dataset is a standard benchmark for music genre classification, it lacks modern genres. To make my project unique, I wanted to include Lofi and Rap, among others.

I used yt-dlp to source high-quality audio from YouTube. However, I ran into a classic "Junior vs. Environment" boss fight: FFmpeg.

Technical Tip: Even if you install FFmpeg via Conda, Python subprocesses on Windows may not find the binaries, typically because the environment's directory is not on the PATH that the process inherits. I solved this by explicitly setting ffmpeg_location in my script, so the conversion to .wav no longer depends on PATH.
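As a rough sketch of that fix, the options below pass an explicit FFmpeg directory to yt-dlp's Python API. The helper names (build_ydl_options, download_audio), the output template, and any directory you pass in are assumptions for illustration, not the project's actual script.

```python
import shutil
from pathlib import Path


def build_ydl_options(output_dir, ffmpeg_dir=None):
    """Build a yt-dlp options dict that extracts the audio track to .wav."""
    opts = {
        "format": "bestaudio/best",
        # Hypothetical naming template; adjust to your own layout.
        "outtmpl": str(Path(output_dir) / "%(title)s.%(ext)s"),
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "wav"}
        ],
    }
    if ffmpeg_dir is None:
        # Fall back to whatever ffmpeg is visible on PATH, if any.
        found = shutil.which("ffmpeg")
        if found:
            ffmpeg_dir = str(Path(found).parent)
    if ffmpeg_dir is not None:
        # Tell yt-dlp where the FFmpeg binaries live instead of relying on PATH.
        opts["ffmpeg_location"] = str(ffmpeg_dir)
    return opts


def download_audio(url, output_dir, ffmpeg_dir=None):
    """Download one URL as .wav (requires yt-dlp and FFmpeg to be installed)."""
    import yt_dlp  # imported lazily so the options helper works without it

    with yt_dlp.YoutubeDL(build_ydl_options(output_dir, ffmpeg_dir)) as ydl:
        ydl.download([url])
```

Passing the directory explicitly keeps the script independent of which shell or environment launched it.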

2. Standardizing for Machine Learning Systems

In our Machine Learning Systems module, we emphasized that consistency is key. To make my new data compatible with GTZAN, I had to "clone" its technical specifications:

Sample Rate: 22,050 Hz.
Channels: Mono.
Duration: Exactly 30-second segments.
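One way to check that a file meets these three properties is to read its header with Python's standard wave module. This is a minimal sketch: the function name and the small duration tolerance are my assumptions, and the wave module only reads uncompressed PCM files.

```python
import wave

TARGET_SR = 22_050       # samples per second
TARGET_CHANNELS = 1      # mono
TARGET_SECONDS = 30.0    # clip length


def matches_gtzan_spec(path, tolerance_s=0.05):
    """Return True if a PCM .wav file is 22,050 Hz, mono, and about 30 s long.

    tolerance_s is an assumed allowance for clips that are a few samples
    longer or shorter than exactly 30 seconds.
    """
    with wave.open(str(path), "rb") as wf:
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / sample_rate
    return (
        sample_rate == TARGET_SR
        and channels == TARGET_CHANNELS
        and abs(duration - TARGET_SECONDS) <= tolerance_s
    )
```

Running a check like this over every new clip catches a stereo or 44.1 kHz file before it reaches training.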

I developed a script that takes a 1-hour "Lofi Study Beats" mix and slices it into perfect 30-second chunks, maintaining a strict naming convention: lofi.00000.wav, lofi.00001.wav, etc. This ensures the data is ready for bulk processing without manual intervention.
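The slicing step could look roughly like the sketch below. It assumes librosa and soundfile are installed; the function names are hypothetical. The whole mix is loaded into memory, which is workable for a one-hour mono file at this sample rate but would need streaming for much longer inputs.

```python
from pathlib import Path

SAMPLE_RATE = 22_050
CHUNK_SECONDS = 30


def chunk_bounds(total_samples, sr=SAMPLE_RATE, seconds=CHUNK_SECONDS):
    """Return (start, end) sample indices of every full chunk; a short tail is dropped."""
    size = sr * seconds
    return [(start, start + size)
            for start in range(0, total_samples - size + 1, size)]


def chunk_name(genre, index):
    """GTZAN-style file name, e.g. lofi.00000.wav."""
    return f"{genre}.{index:05d}.wav"


def slice_mix(mix_path, out_dir, genre):
    """Resample a long mix to 22,050 Hz mono and write it out in 30 s chunks."""
    import librosa            # imported lazily: only needed for real audio
    import soundfile as sf

    samples, sr = librosa.load(mix_path, sr=SAMPLE_RATE, mono=True)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, (start, end) in enumerate(chunk_bounds(len(samples), sr)):
        sf.write(str(out_dir / chunk_name(genre, i)), samples[start:end], sr)
```

Dropping the incomplete final chunk, rather than padding it with silence, keeps every clip at the same length.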

3. Feature Extraction: The Mel Spectrogram

For the AI Models part of the project, we aren't just "listening" to the audio—we are "seeing" it. Using librosa, I transform the raw waveforms into Mel Spectrograms.

The Mel scale is vital because it spaces frequencies approximately the way humans perceive pitch. The resulting spectrogram turns a complex audio signal into a 2D image, allowing me to use Convolutional Neural Networks (CNNs) to identify patterns, like the low-pass filters typical of Lofi or the sharp transients in Rap.

[Figure: Mel Spectrogram of Lofi]
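To make the transformation concrete, here is a minimal sketch of both ideas. The hz_to_mel helper uses one common (HTK-style) formula for the Mel scale; librosa's default conversion is a slightly different variant. The parameter values (128 Mel bands, FFT size 2048, hop length 512) are assumptions, not necessarily the settings used in this project.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style Hz to mel conversion: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_spectrogram_db(wav_path, sr=22_050, n_mels=128, n_fft=2048, hop_length=512):
    """Load a clip and return its Mel spectrogram in decibels.

    The result is a 2D array of shape (n_mels, frames), which is what a CNN
    would consume as a single-channel image.
    """
    import numpy as np   # imported lazily: only needed for real audio
    import librosa

    samples, sr = librosa.load(wav_path, sr=sr, mono=True)
    mel_power = librosa.feature.melspectrogram(
        y=samples, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    # Convert power to a log (dB) scale relative to the loudest bin.
    return librosa.power_to_db(mel_power, ref=np.max)
```

The formula shows why the scale matters: the step from 1,000 Hz to 2,000 Hz covers more mels than the step from 7,000 Hz to 8,000 Hz, so low frequencies get more resolution than high ones.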

4. Key Takeaways for Fellow Students

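Relative Paths are Dangerous: When running scripts from the terminal, ../data might point to nowhere, because relative paths are resolved against the current working directory rather than the script's location. I switched to Path(__file__).resolve() to make my project portable.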
Data Validation: GTZAN has a famous corrupt file (jazz.00054.wav). Learning to handle these exceptions programmatically is a crucial skill I've sharpened during this project.
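Both takeaways can be sketched in a few lines. The directory layout (a data folder next to the folder that holds the script) and the function names are assumptions for illustration. The loader is passed in as a parameter so the validation logic does not depend on a particular audio library.

```python
from pathlib import Path


def data_dir(script_file):
    """Return <project>/data for a script stored one folder below the project root.

    Anchoring on the script's own path gives the same result from any
    working directory, unlike a bare "../data".
    """
    return Path(script_file).resolve().parent.parent / "data"


def load_with_librosa(path):
    """Default loader (requires librosa); raises if the file cannot be decoded."""
    import librosa

    return librosa.load(path, sr=22_050, mono=True)


def split_readable(paths, loader=load_with_librosa):
    """Try to load every file; return (readable_paths, [(bad_path, error), ...])."""
    readable, failed = [], []
    for path in map(Path, paths):
        try:
            loader(path)
        except Exception as exc:  # broad on purpose: any decode error means "skip"
            failed.append((path, repr(exc)))
        else:
            readable.append(path)
    return readable, failed
```

In a script this would be called as data_dir(__file__), and the list of failed files can be logged once instead of crashing the whole preprocessing run on a file such as jazz.00054.wav.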

Next Steps

The pipeline is clean. The data is standardized. The next phase of my final project involves designing the CNN architecture and beginning the long-awaited training phase.

Are you a student or a pro in AI? How do you handle your audio preprocessing pipelines? Let’s discuss in the comments!

2 Comments

Alejandro Tacoronte Gonzalez

Spain • github.com/alejandrotg-code

wanderer · Answer 1 · 2026-04-29T05:16:29+0000

Cool project. Slicing mixes into 30s chunks is smart, but do you worry about label noise? Like parts of a mix not fully matching the genre?

alejandrotg-code • Apr 29

@wanderer That’s a very valid concern! To address it, I implemented logic to skip the beginning and the end of each track, specifically to avoid silences, long intros, or fade-outs that don't represent the genre's core features. This helps ensure the 30s chunks are pulled from the most information-dense part of the song. Do you think a fixed offset is enough, or would you go for dynamic detection of audio activity?
