If you've taught SQL or learned it from a course, you've almost certainly met Sakila. The little DVD rental database has been the go-to sample schema for MySQL and MariaDB tutorials for nearly two decades. It's clean, well-normalized, and comes pre-loaded with enough data to write interesting queries.
But here's the thing: Sakila was designed in 2006 for MySQL 5.0.
A lot has changed since then.
What Sakila can't show you
Open the Sakila schema today and count the column types: INT, VARCHAR, TEXT, ENUM, DATETIME, DECIMAL, TINYINT. That's it. No JSON. No FULLTEXT search beyond a basic demo. No SET type. And absolutely no VECTOR.
Meanwhile, MariaDB 11.7 ships with:
- Native
VECTOR(N) type for AI-era similarity search
- Rich
JSON functions (JSON_TABLE, JSON_VALUE, JSON_EXTRACT)
- Window functions (
RANK, LAG, LEAD, running aggregates)
- Recursive CTEs
FULLTEXT with boolean mode and relevance scoring
Teaching someone SQL with Sakila in 2026 is like teaching someone to drive in a car with no GPS, no reversing camera, and a manual choke. The fundamentals still apply — but they're missing a huge chunk of what the tool can actually do.
There's also the domain problem. DVD rentals. If you're under 35, you may never have set foot in a video rental shop. The mental model is unfamiliar and the business rules feel arbitrary. What's the difference between a film, an inventory item, and a rental? Why does payment exist independently of rental? Explaining the schema takes time that should go into explaining SQL.
The domain that everyone understands
I needed something universal. Something with:
- Obvious entities and relationships
- Multiple natural hierarchies (good for recursive CTEs)
- A mix of small lookup tables and large transactional tables
- A clear reason for JSON (semi-structured data)
- An excuse to use vectors (semantic search is everywhere now)
A university fits perfectly. Students, courses, faculty, enrollments, grades — everyone has lived inside this system. The relationships are intuitive. And the domain naturally produces the data shapes I needed:
| Need | University equivalent |
| Big analytic table | grade_events — 120 000+ rows of scored items |
| Hierarchy for recursive CTE | Departments (Faculty → Department → Sub-dept) and course prerequisites |
| JSON for semi-structured data | Faculty office hours, student emergency contacts, grant funding details |
| FULLTEXT search | Course descriptions, publication abstracts |
| VECTOR search | Course semantic embeddings for "find similar courses" |
| Audit trail | Every enrollment and grade change logged with JSON diffs |
What University DB looks like
The schema has 16 tables in four tiers:
Lookup (tiny, < 100 rows): semesters, rooms, scholarships
Domain (small, up to 2 000 rows): departments, faculty, students, courses, course_prerequisites, sections
Transactional (medium): enrollments, student_scholarships, research_projects, publications, project_members
Analytic (big): grade_events (~120k rows) and audit_log (~60k rows, populated by triggers)
Every significant MariaDB datatype appears at least once:
-- VECTOR on courses — semantic embeddings for similarity search
embedding VECTOR(1536) NULL
-- JSON on faculty — semi-structured office hours
office_hours JSON NULL
-- [{"day":"Mon","start":"10:00","end":"12:00"}, ...]
-- SET on publications — multi-value keyword tags
keywords SET('AI','ML','Databases','Security','Bioinformatics', ...)
-- FULLTEXT on courses and publications
FULLTEXT KEY ft_course (title, description)
The schema ships with 7 views, 6 stored procedures, and 7 triggers — including one that blocks enrollment when a section is full, and three that write JSON diffs to the audit_log table.
Four levels of example queries
I structured the example queries into four files so the database works for everyone from first-day learners to DBAs:
Level 1 — Basics: SELECT, WHERE, GROUP BY, single-table aggregation
Level 2 — Intermediate: multi-table JOIN (up to 5 tables), correlated subqueries, FULLTEXT search, JSON_VALUE
Level 3 — Advanced: window functions, CTEs, recursive CTEs, JSON_TABLE, SET/FIND_IN_SET, VEC_Distance similarity search
Level 4 — DBA/Developer: EXPLAIN ANALYZE, index strategy, stored procedure authoring, transaction isolation levels, audit log forensics
A level 3 query to find the full prerequisite chain for a course looks like this:
WITH RECURSIVE prereq_chain AS (
SELECT cp.prerequisite_id,
p.code AS prereq_code,
p.title AS prereq_title,
1 AS depth
FROM course_prerequisites cp
JOIN courses p ON p.course_id = cp.prerequisite_id
WHERE cp.course_id = (SELECT course_id FROM courses WHERE code = 'CS300')
UNION ALL
SELECT cp2.prerequisite_id,
p2.code,
p2.title,
pc.depth + 1
FROM course_prerequisites cp2
JOIN prereq_chain pc ON pc.prerequisite_id = cp2.course_id
JOIN courses p2 ON p2.course_id = cp2.prerequisite_id
WHERE pc.depth < 10
)
SELECT DISTINCT depth, prereq_code, prereq_title
FROM prereq_chain
ORDER BY depth, prereq_code;
And a vector similarity search to find courses related to a given one:
SELECT c.code, c.title,
VEC_Distance(ref.embedding, c.embedding) AS distance
FROM courses ref
JOIN courses c ON c.course_id <> ref.course_id
WHERE ref.code = 'CS101'
ORDER BY distance
LIMIT 5;
Try it right now
You don't need to install anything to explore the schema.
You can run queries against University DB directly in sqlize.online — my online SQL editor that supports MariaDB 11.7. Paste any query from the example files and see results immediately.
If you want a more structured learning experience with exercises and instant feedback, check out sqltest.online — designed for exactly this kind of hands-on SQL practice.
Get the database
Everything is open-source under the MIT license:
👉 github.com/rozhnev/university-db
The repository includes:
01_schema.sql — all 16 tables
02_objects.sql — views, procedures, triggers
03_seed_small.sql — static seed data
generate_data.py — Python/Faker script to populate ~130 000 rows
docker-compose.yml — one command to get a running database
queries/level1.sql through level4.sql — 50+ example queries
git clone https://github.com/rozhnev/university-db.git
cd university-db
cp .env.example .env
docker compose up --build
How to contribute
The project is open source and contributions of any kind are welcome.
Found a bug? — Open an issue on GitHub. Mistakes in the schema, incorrect queries, typos in comments — all worth reporting. The more specific the description, the faster it gets fixed.
Have an interesting query? — Send a pull request to queries/. Level 3–4 examples are especially valuable: window functions, recursive CTEs, JSON forensics, vector search.
Want to improve the data generator? — generate_data.py is intentionally kept simple. More realistic grade distributions, additional data scenarios, or faster bulk-insert batching are all good targets.
Need support for another environment? — A Kubernetes manifest, Helm chart, or a setup script for a cloud managed service (RDS, Cloud SQL, PlanetScale) would be a useful addition.
Educational materials? — Exercises with solutions, workshop slides, or Jupyter notebooks built on this schema are all welcome.
Fork the repository and send a pull request — code review within a few days.
Is Sakila dead?
Not at all. It's still a perfectly valid database for learning basic SQL, and its portability (it runs on any MySQL 5.x+ install) is a genuine advantage. But as a primary teaching tool for modern MariaDB, it's showing its age.
University DB fills the gap for anyone who wants to teach or learn the full surface area of what MariaDB 11.7 can do — from a first SELECT to a vector similarity search in a recursive CTE inside a stored procedure.
I hope it's useful. Feedback, issues, and pull requests are very welcome.
Slava Rozhnev — sqlize.online · sqltest.online · GitHub