I'm building a production-grade AI airline assistant in public. Here's the plan.

I'm building a production-grade AI airline assistant in public. Here's the plan.

posted Originally published at dev.to 7 min read

A full stack developer learns AI engineering by building a real project, phase by phase — and writes about every architectural decision, mistake, and lesson along the way.


The CloudSeven Agent series. Building a production-grade AI agent in public, one phase at a time. This is the introduction. Part 1 — Phase 1: Foundation is coming next.


I’m a full-stack engineer with about a decade of experience building applications with Node, PHP, React, and some Python. About three months ago, I started learning AI engineering — and like many developers, I had to decide how to approach learning it.

There's no shortage of great material out there. Plenty of high-quality tutorials, courses, and books cover every part of this stack in depth. For developers with the time to work through them carefully, those are the right path.

I didn't have that time. Between a full-time job and the rest of my life, hours of structured video tutorials weren't realistic for me. So I made a different bet: learn by building something real, and document the journey publicly as I go. Not because tutorials don't work — they do — but because building forces a specific kind of learning that I needed.

This series is the result.

What I'm building

The project is called CloudSeven Agent — a customer-service assistant for the fictional CloudSeven Airlines. The assistant is named Sevi. A passenger can ask Sevi about their flight, look up their booking, check their loyalty points, or get help with a policy question. Sevi figures out what the passenger needs, looks up the relevant data, and responds in natural language.

That description sounds simple. The architecture behind it isn't. The goal is to build something architected the way a real airline's customer-service system would be — with proper layering, dependency injection, type safety, structured logging, and evaluation — and to do it across ten clearly-scoped phases, each building on the last.

The repo is here: github.com/riyons/cloudseven-agent. It's MIT-licensed and Phases 1 and 2 are already shipped at the v0.2.0 tag. You can clone it, run it locally without an API key, and try it yourself in about five minutes.

Why a real-world project, not a toy demo

Most AI tutorials follow a similar pattern: import the LLM library, write a prompt, call the API, print the response. Maybe wrap it in a simple loop. That's it.

This teaches you the mechanics of calling an LLM, which is useful. But it doesn't teach you any of the harder questions a real production system has to answer. Where do you draw the boundary between business logic and the LLM? What happens when the model returns something unexpected? How do you swap providers without rewriting half your code? How do you test it? How do you know when it's actually working?

A real-world project forces those questions to surface. You can't ignore them because they cause real problems — and when you solve them, you learn the actual craft of AI engineering, not just the API surface.

I picked the airline domain specifically because it's concrete. Passenger queries are unambiguous ("what's the status of flight CS-204?"). The data is structured (flights, bookings, loyalty accounts). The failure modes are obvious (wrong information, leaked private data, hallucinated answers). And the domain is familiar enough that anyone can read the queries and instinctively know whether the response is good.

Why I started with a local LLM (no cloud, no API keys)

This was a practical decision that shaped a lot of what came after.

Building an AI agent involves a lot of iteration. You run the same query fifty times to see how the model behaves. You tweak the prompt and rerun. You write evaluations and burn through hundreds of queries to validate that nothing regressed. If every one of those calls hit a paid cloud API, the development cost would add up quickly — even at the cheapest tier, hundreds of queries during a single phase of development isn't trivial.

I chose to develop everything on a local LLM instead. The project uses Ollama running Qwen 2.5 (14B). I chose Qwen 2.5 for this project because it offered a good balance between local performance, reasoning quality, and hardware requirements for my setup.

Qwen 2.5 is a family of free, open-source models that can run locally on modern consumer hardware. Depending on your machine, you can choose smaller variants like Qwen 2.5 7B or larger variants like 14B for stronger reasoning quality. No API keys. No usage tracking. No surprises on the bill at month-end.

There’s a side benefit too: building against a local 14B model is a useful engineering discipline. Qwen 2.5 14B is impressively capable for a local model, and other local models like Llama and Mistral are strong options too, but building within its constraints forces better engineering habits. If the project works well on smaller self-hosted models, it’ll usually work even better on larger hosted models such as Claude or GPT-4 — but the reverse isn’t always true. Many tutorials naturally optimize for faster iteration on powerful hosted models, where the model can often make up for imperfect prompts or design choices. Building on a constrained local model tends to surface those issues earlier.

The project's architecture is also provider-agnostic — the LLM client is behind an interface, so swapping to a paid API like Anthropic's Claude or OpenAI's GPT-4 would be a config change, not a rewrite. That flexibility is there if I ever need it. But for the entire MVP build, the goal is to use only what's free and runs locally.

The roadmap, briefly

The full project plan is ten phases. Don't worry about understanding everything in this list yet — each phase will get its own deep-dive article when it ships. For now, this is just the map:

Phases 1–2 (already shipped):

  • Phase 1 — Foundation: Project scaffold, configuration, repository pattern, LLM abstraction, basic chatbot
  • Phase 2 — Tool calling: Teaching the chatbot to look things up by calling real functions

Phases 3–6 (the MVP build):

  • Phase 3 — LangGraph + semantic routing + tracing: Turning the agent into a proper state machine
  • Phase 4 — RAG over policies: Retrieval-augmented generation for grounded policy answers
  • Phase 5 — Guardrails: PII detection, prompt-injection defense, hallucination checks
  • Phase 6 — FastAPI + evals + deployment: Wrapping it all in a real API with proper evaluation

Phases 7–10 (extensions and stretch goals):

  • Phase 7 — MCP server: Exposing the tools via Model Context Protocol
  • Phase 8 — Text-to-SQL: Letting the agent write SQL for complex queries
  • Phase 9 — Cost optimization: Semantic caching and smart model routing
  • Phase 10 — Pick one: Voice interface, or multimodal (vision-based) extensions

Each phase produces a working, demonstrable system. Each phase will have an article in this series.

What you'll get from this series

If you follow along, here's what you can expect:

  • Real architectural reasoning — not "here's the code, copy it," but "here's the choice I faced, here are the alternatives, here's why I picked this one, and here's what I'd reconsider in hindsight."
  • Honest evaluation of what works and what doesn't — every phase ends with a published evaluation document that captures the actual behavior of the system, including the failures. Phase 2 already documented a regression where the new tool-aware prompt made policy answers worse, not better. That's the kind of finding I won't hide.
  • Beginner-friendly explanations — every new concept is introduced when it becomes relevant, in plain English, with the architectural reason it exists. No assumed knowledge of agents, RAG, or LangGraph.
  • A working open-source project — you can clone, run, modify, and learn from it. The README and architecture diagrams are written to be self-contained.
  • The mistakes I make in real-time — the ones I've already caught, and the ones I haven't yet.

What this series is not

A few things to set expectations clearly.

It's not a step-by-step tutorial. You won't follow along by copy-pasting code into your terminal. The articles are more like director's commentary on a working codebase — they explain the reasoning, but the actual building happens in the repo.

It's not a course on LLMs themselves. I'm not going to explain transformer architecture, attention mechanisms, or how models are trained. This series is about building applications with LLMs, not about the models themselves. There are excellent resources for the deeper foundations — this isn't one of them.

It's also not finished. The series will be published phase by phase as I build. Phases 1 and 2 are already done in the repo, so those articles will go out on a roughly biweekly schedule. After that, articles ship when phases ship.

How to follow along

Three concrete ways to stay connected with the project:

  1. Star the repo on GitHubgithub.com/riyons/cloudseven-agent — to keep up with the code as it evolves.
  2. Follow for upcoming articles in this series.
  3. Run the code yourself. Phase 2 is at the v0.2.0 tag. Instructions are in the README. It takes about 5 minutes to get a working agent running on your laptop.

A note for other beginners

If you're a developer who's been watching tutorials about agents, RAG, and LangGraph but feeling like the knowledge isn't sticking — I'd gently suggest that the missing piece might be the same one I was missing: building something real, even if your version is messier than the tutorials, even if you don't fully understand every part as you go.

You won't have time to do everything the right way the first time. That's fine. The point of building in public isn't to demonstrate that you already know how to do this. It's to show the messy middle — the decisions, the corrections, the mistakes — and trust that the messiness is itself the lesson.

If this series helps even one developer pick a real project and start building, it'll have done its job.

Part 1 — covering Phase 1 (the foundation) — is up next.


Coming up next: Part 1 of this series will cover Phase 1 — the foundation. We'll build a simple chatbot with no tools yet, see exactly what it can and can't do, and discover why it needs to evolve into an agent. Expected publication: roughly a week from now.


The CloudSeven Agent series · Introduction

GitHub: riyons/cloudseven-agent · Learning guide: docs/CloudSeven_Learning_Guide.pdf

Star the repo to support the project and receive updates.


CloudSeven Airlines and the assistant "Sevi" are fictional, created for this educational project. This project is not affiliated with any real airline, company, or brand using similar names.

More Posts

I’m a Senior Dev and I’ve Forgotten How to Think Without a Prompt

Karol Modelskiverified - Mar 19

I Wrote a Script to Fix Audible's Unreadable PDF Filenames

snapsynapse - Apr 20

Your AI Agent Skills Have a Version Control Problem

snapsynapse - Apr 22

Your AI Doesn't Just Write Tests. It Runs Them Too.

Kevin Martinez - May 12

Split-Brain: Analyst-Grade Reasoning Without Raw Transactions on the Server

Pocket Portfolioverified - Apr 8
chevron_left

Related Jobs

Commenters (This Week)

2 comments
1 comment
1 comment

Contribute meaningful comments to climb the leaderboard and earn badges!