Why not Sakila? Building a Modern SQL Learning Database for MariaDB

Question

Why not Sakila? Building a Modern SQL Learning Database for MariaDB

calendar_todayJun 26 • schedule5 min read

— Originally published at dev.to

If you've taught SQL or learned it from a course, you've almost certainly met Sakila. The little DVD rental database has been the go-to sample schema for MySQL and MariaDB tutorials for nearly two decades. It's clean, well-normalized, and comes pre-loaded with enough data to write interesting queries.

But here's the thing: Sakila was designed in 2006 for MySQL 5.0.

A lot has changed since then.

What Sakila can't show you

Open the Sakila schema today and count the column types: INT, VARCHAR, TEXT, ENUM, DATETIME, DECIMAL, TINYINT. That's it. No JSON. No FULLTEXT search beyond a basic demo. No SET type. And absolutely no VECTOR.

Meanwhile, MariaDB 11.7 ships with:

Native VECTOR(N) type for AI-era similarity search
Rich JSON functions (JSON_TABLE, JSON_VALUE, JSON_EXTRACT)
Window functions (RANK, LAG, LEAD, running aggregates)
Recursive CTEs
FULLTEXT with boolean mode and relevance scoring

Teaching someone SQL with Sakila in 2026 is like teaching someone to drive in a car with no GPS, no reversing camera, and a manual choke. The fundamentals still apply — but they're missing a huge chunk of what the tool can actually do.

There's also the domain problem. DVD rentals. If you're under 35, you may never have set foot in a video rental shop. The mental model is unfamiliar and the business rules feel arbitrary. What's the difference between a film, an inventory item, and a rental? Why does payment exist independently of rental? Explaining the schema takes time that should go into explaining SQL.

The domain that everyone understands

I needed something universal. Something with:

Obvious entities and relationships
Multiple natural hierarchies (good for recursive CTEs)
A mix of small lookup tables and large transactional tables
A clear reason for JSON (semi-structured data)
An excuse to use vectors (semantic search is everywhere now)

A university fits perfectly. Students, courses, faculty, enrollments, grades — everyone has lived inside this system. The relationships are intuitive. And the domain naturally produces the data shapes I needed:

Need	University equivalent
Big analytic table	`grade_events` — 120 000+ rows of scored items
Hierarchy for recursive CTE	Departments (Faculty → Department → Sub-dept) and course prerequisites
JSON for semi-structured data	Faculty office hours, student emergency contacts, grant funding details
FULLTEXT search	Course descriptions, publication abstracts
VECTOR search	Course semantic embeddings for "find similar courses"
Audit trail	Every enrollment and grade change logged with JSON diffs

What University DB looks like

The schema has 16 tables in four tiers:

Lookup (tiny, < 100 rows): semesters, rooms, scholarships

Domain (small, up to 2 000 rows): departments, faculty, students, courses, course_prerequisites, sections

Transactional (medium): enrollments, student_scholarships, research_projects, publications, project_members

Analytic (big): grade_events (~120k rows) and audit_log (~60k rows, populated by triggers)

Every significant MariaDB datatype appears at least once:

-- VECTOR on courses — semantic embeddings for similarity search
embedding  VECTOR(1536) NULL

-- JSON on faculty — semi-structured office hours
office_hours JSON NULL
-- [{"day":"Mon","start":"10:00","end":"12:00"}, ...]

-- SET on publications — multi-value keyword tags
keywords SET('AI','ML','Databases','Security','Bioinformatics', ...) 

-- FULLTEXT on courses and publications
FULLTEXT KEY ft_course (title, description)

The schema ships with 7 views, 6 stored procedures, and 7 triggers — including one that blocks enrollment when a section is full, and three that write JSON diffs to the audit_log table.

Four levels of example queries

I structured the example queries into four files so the database works for everyone from first-day learners to DBAs:

Level 1 — Basics: SELECT, WHERE, GROUP BY, single-table aggregation

Level 2 — Intermediate: multi-table JOIN (up to 5 tables), correlated subqueries, FULLTEXT search, JSON_VALUE

Level 3 — Advanced: window functions, CTEs, recursive CTEs, JSON_TABLE, SET/FIND_IN_SET, VEC_Distance similarity search

Level 4 — DBA/Developer: EXPLAIN ANALYZE, index strategy, stored procedure authoring, transaction isolation levels, audit log forensics

A level 3 query to find the full prerequisite chain for a course looks like this:

WITH RECURSIVE prereq_chain AS (
    SELECT cp.prerequisite_id,
           p.code   AS prereq_code,
           p.title  AS prereq_title,
           1        AS depth
    FROM   course_prerequisites cp
    JOIN   courses p ON p.course_id = cp.prerequisite_id
    WHERE  cp.course_id = (SELECT course_id FROM courses WHERE code = 'CS300')

    UNION ALL

    SELECT cp2.prerequisite_id,
           p2.code,
           p2.title,
           pc.depth + 1
    FROM   course_prerequisites cp2
    JOIN   prereq_chain         pc ON pc.prerequisite_id = cp2.course_id
    JOIN   courses              p2 ON p2.course_id = cp2.prerequisite_id
    WHERE  pc.depth < 10
)
SELECT DISTINCT depth, prereq_code, prereq_title
FROM   prereq_chain
ORDER  BY depth, prereq_code;

And a vector similarity search to find courses related to a given one:

SELECT c.code, c.title,
       VEC_Distance(ref.embedding, c.embedding) AS distance
FROM   courses ref
JOIN   courses c ON c.course_id <> ref.course_id
WHERE  ref.code = 'CS101'
ORDER  BY distance
LIMIT  5;

Try it right now

You don't need to install anything to explore the schema.

You can run queries against University DB directly in sqlize.online — my online SQL editor that supports MariaDB 11.7. Paste any query from the example files and see results immediately.

If you want a more structured learning experience with exercises and instant feedback, check out sqltest.online — designed for exactly this kind of hands-on SQL practice.

Get the database

Everything is open-source under the MIT license:

👉 github.com/rozhnev/university-db

The repository includes:

01_schema.sql — all 16 tables
02_objects.sql — views, procedures, triggers
03_seed_small.sql — static seed data
generate_data.py — Python/Faker script to populate ~130 000 rows
docker-compose.yml — one command to get a running database
queries/level1.sql through level4.sql — 50+ example queries

git clone https://github.com/rozhnev/university-db.git
cd university-db
cp .env.example .env
docker compose up --build

How to contribute

The project is open source and contributions of any kind are welcome.

Found a bug? — Open an issue on GitHub. Mistakes in the schema, incorrect queries, typos in comments — all worth reporting. The more specific the description, the faster it gets fixed.

Have an interesting query? — Send a pull request to queries/. Level 3–4 examples are especially valuable: window functions, recursive CTEs, JSON forensics, vector search.

Want to improve the data generator? — generate_data.py is intentionally kept simple. More realistic grade distributions, additional data scenarios, or faster bulk-insert batching are all good targets.

Need support for another environment? — A Kubernetes manifest, Helm chart, or a setup script for a cloud managed service (RDS, Cloud SQL, PlanetScale) would be a useful addition.

Educational materials? — Exercises with solutions, workshop slides, or Jupyter notebooks built on this schema are all welcome.

Fork the repository and send a pull request — code review within a few days.

Is Sakila dead?

Not at all. It's still a perfectly valid database for learning basic SQL, and its portability (it runs on any MySQL 5.x+ install) is a genuine advantage. But as a primary teaching tool for modern MariaDB, it's showing its age.

University DB fills the gap for anyone who wants to teach or learn the full surface area of what MariaDB 11.7 can do — from a first SELECT to a vector similarity search in a recursive CTE inside a stored procedure.

I hope it's useful. Feedback, issues, and pull requests are very welcome.

Slava Rozhnev — sqlize.online · sqltest.online · GitHub

🔥 Join developers growing publicly

Share your knowledge, build in public, and grow your developer presence with a global community.

Join CoderLegion

chevron_left

Slava Rozhnev

907 Points • 17 Badges

Haifa, Israel • sqltest.online

3Posts

1Comments

3Connections

I'm Slava Rozhnev, the creator and primary developer behind sqltest.online (interactive SQL practice... Show more

Commenters (This Week)

Contribute meaningful comments to climb the leaderboard and earn badges!

	Delivering Database Changes Steve Fentonverified - Jul 22
	5 Things This Playwright SQL Fixture Does So You Don't Have To vitalicset - Apr 13
	The Hidden Program Behind Every SQL Statement lovestaco - Apr 11
	AWS Certifications Are a Building Block, Not the Final Destination Ijay - Jun 16
	AI Reliability Gap: Why Large Language Models are not for Safety-Critical Systems praneeth - Mar 31

Why not Sakila? Building a Modern SQL Learning Database for MariaDB

What Sakila can't show you

The domain that everyone understands

What University DB looks like

Four levels of example queries

Try it right now

Get the database

How to contribute

Is Sakila dead?

0 Comments

Please log in to comment on this post.

More Posts

Delivering Database Changes

5 Things This Playwright SQL Fixture Does So You Don't Have To

The Hidden Program Behind Every SQL Statement

AWS Certifications Are a Building Block, Not the Final Destination

AI Reliability Gap: Why Large Language Models are not for Safety-Critical Systems

More From rozhnev

SQL Syntax Highlighter

INNER JOIN vs Correlated Subquery in SQL: What Is Actually Faster?

Related Jobs

Commenters (This Week)

Welcome to Coder Legion

Connect with 4,762 amazing developers

Don't have an account? Sign up

OR

Why not Sakila? Building a Modern SQL Learning Database for MariaDB

What Sakila can't show you

The domain that everyone understands

What University DB looks like

Four levels of example queries

Try it right now

Get the database

How to contribute

Is Sakila dead?

0 Comments

Please log in to comment on this post.

More Posts

Delivering Database Changes

5 Things This Playwright SQL Fixture Does So You Don't Have To

The Hidden Program Behind Every SQL Statement

AWS Certifications Are a Building Block, Not the Final Destination

AI Reliability Gap: Why Large Language Models are not for Safety-Critical Systems

More From rozhnev

SQL Syntax Highlighter

INNER JOIN vs Correlated Subquery in SQL: What Is Actually Faster?

Related Jobs

Commenters (This Week)