You're dropped into a 200,000-line codebase. No documentation. The original author left six months ago. Your task: fix a bug in the checkout flow by Friday.
Where do you even start?
Most developers default to reading code top-to-bottom, starting with main.py or index.ts, following imports, trying to build understanding linearly. Within five minutes, you're ten files deep, you've forgotten where you started, and you're no closer to understanding how the system actually works.
That's because codebases aren't books. They're cities. And you don't learn a city by walking down street #1, then street #2, then street #3. You look at a map, find landmarks, and navigate from there.
Here's how to do that with code.
1. Find the Entry Points
Every system has doors — the places where the outside world interacts with it. Find them before you read a single line of business logic.
For web applications, start with routes and endpoints. Open the router config or grep for route registrations: decorators like @app.get and @router.post, or Express-style calls like app.use. This gives you a table of contents, with every action the system can perform listed in one place. You'll immediately understand the scope: is this a 5-endpoint CRUD app or a 200-endpoint enterprise platform?
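If plain grep output gets noisy, a few lines of Python can turn route decorators into a tidy listing. This is a minimal sketch, not a general-purpose tool: it assumes decorator-style routes on objects named app or router, and the sample source is made up for illustration. It will not catch app.use or routes registered in a config file.

```python
import re

# Matches decorator-style routes such as @app.get("/users") or @router.post('/orders').
# The object names (app, router) and the verb list are assumptions; adjust to your framework.
ROUTE_PATTERN = re.compile(
    r"""@(?:app|router)\.(get|post|put|patch|delete)\(\s*["']([^"']+)["']"""
)


def find_routes(source: str) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs for every route decorator found in the source text."""
    return [(method.upper(), path) for method, path in ROUTE_PATTERN.findall(source)]


# Hypothetical handler file, for illustration only.
SAMPLE = '''
@app.get("/products")
def list_products(): ...

@router.post("/checkout")
def checkout(): ...
'''

for method, path in find_routes(SAMPLE):
    print(f"{method:6} {path}")
```

Run it over each file in the HTTP layer and you have the table of contents in one screen.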
For CLI tools, find the argument parser. Whether it's argparse, click, commander, or clap, the command definitions tell you what the tool does.
For libraries, look at the public API — the exports, the __init__.py, the index.ts. What's the contract this library offers to consumers?
Then, before you go deeper, read the configuration files. docker-compose.yml tells you what databases, caches, and message queues the system depends on. Makefile or package.json scripts show you how the system is built and run. CI config (.github/workflows/, Jenkinsfile) reveals the deploy pipeline and test strategy.
These files are metadata about the system. They're short, readable, and they answer the question: what does this thing depend on to run?
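For example, the scripts section of a package.json is just JSON, so you can list it without opening an editor. A small sketch, assuming a standard package.json layout; the manifest below is hypothetical.

```python
import json


def list_scripts(package_json_text: str) -> dict[str, str]:
    """Return the "scripts" mapping from a package.json document, or {} if it has none."""
    manifest = json.loads(package_json_text)
    return manifest.get("scripts", {})


# Hypothetical manifest, for illustration only.
SAMPLE = '{"name": "shop", "scripts": {"dev": "vite", "test": "vitest run", "build": "vite build"}}'

for name, command in list_scripts(SAMPLE).items():
    print(f"{name:6} -> {command}")
```

In a real project you would pass in the contents of the repository's own package.json instead of SAMPLE.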
2. Trace One Request End-to-End
Now pick the simplest user-facing action. A login. A list fetch. A form submission. Something with a clear start and end.
Trace it through the entire system:
HTTP request → router → handler → service layer → database query → response
Follow the function calls. Read each file only as far as you need to understand what happens next in the chain. Don't get distracted by neighboring functions or utility modules — stay on the path.
This single trace will teach you more than reading 50 files in isolation. You'll discover:
- How the application is layered (or not)
- Where business logic lives vs. where infrastructure code lives
- What patterns the team uses (repositories, services, controllers, or everything-in-the-handler)
- How data transforms as it moves through the system
I came to web application development from a data engineering background, where "follow the data" is instinct. Turns out it's the best approach for understanding any codebase. Data flows through a system like water through pipes — follow it and you'll find every important room in the building.
Once you've traced one request, trace a second one that touches different parts of the system. Two or three traces and you'll have a surprisingly solid mental model.
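If the project is in Python, you can also let the interpreter record the chain for you. The sketch below uses the standard library's sys.setprofile hook to log every Python function called while a handler runs. The three functions (handle_login, authenticate, find_user) are hypothetical stand-ins for a handler, a service, and a data-access layer, and the filter assumes everything you care about lives in one file; in a real project you would filter on your package's directory instead.

```python
import sys


def record_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions called in the same file as func)."""
    source_file = func.__code__.co_filename
    calls = []

    def profiler(frame, event, arg):
        # "call" fires for Python-level function calls; C functions emit "c_call" and are skipped.
        if event == "call" and frame.f_code.co_filename == source_file:
            calls.append(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook, even if func raises
    return result, calls


# Hypothetical three-layer login flow, for illustration only.
def find_user(username):
    return {"name": username} if username == "ada" else None


def authenticate(username):
    return find_user(username) is not None


def handle_login(username):
    return "200 OK" if authenticate(username) else "401 Unauthorized"


result, chain = record_calls(handle_login, "ada")
print(result, "|", " -> ".join(chain))
```

The recorded chain is a reading list: open those functions, in that order, and ignore everything else for now.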
3. Map the Architecture Before Reading the Code
Before you dive into the details of any module, understand the boxes — the high-level building blocks and how they connect.
Look at the top-level directory structure. In most codebases, this is the architecture laid bare:
/api — HTTP layer
/services — Business logic
/models — Data structures
/repos — Database access
/workers — Background jobs
/utils — Shared helpers
Ask yourself:
- What are the main modules or packages?
- What's each one responsible for?
- How do they talk to each other — direct imports, API calls, message queues, shared database?
Draw it. Seriously. Even a rough sketch with boxes and arrows on the back of a napkin fundamentally changes how you think about the system. It forces you to name things and identify relationships. When your sketch doesn't match what the code does, that gap is your learning.
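You can get a rough first draft of the arrows from the imports themselves. Here is a minimal sketch for Python code using the standard ast module. The SOURCES dictionary is a hypothetical stand-in for "all the source text under each top-level package", and the sketch only looks at absolute imports, so treat the result as a starting point for your drawing rather than the full picture.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level names of modules imported (absolutely) by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def package_edges(sources: dict[str, str]) -> list[tuple[str, str]]:
    """Return sorted (importer, imported) pairs, keeping only imports of known packages."""
    known = set(sources)
    edges = set()
    for package, source in sources.items():
        for target in imported_modules(source) & known:
            if target != package:
                edges.add((package, target))
    return sorted(edges)


# Hypothetical source text per top-level package, for illustration only.
SOURCES = {
    "api": "from services import checkout\nimport json",
    "services": "from repos.orders import save_order\nfrom models import Order",
    "repos": "import models",
    "models": "",
}

for importer, imported in package_edges(SOURCES):
    print(f"{importer} -> {imported}")
```

An arrow that points the "wrong" way, such as models importing from api, is exactly the kind of gap worth noting on your napkin.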
Look for boundaries — the places where one concern ends and another begins. Good codebases have clear boundaries. Messy codebases have blurred ones. Either way, identifying where the boundaries are (or should be) gives you the mental scaffolding to hang details on later.
4. Read the Tests
Most developers skip the tests when exploring a new codebase. This is a mistake.
Tests are executable documentation. They show you what the system is supposed to do, not just what it happens to do right now. And unlike comments or READMEs, tests fail when they drift out of sync with the code, so they tend to stay accurate.
Start with integration or end-to-end tests. These exercise full workflows and reveal the intended user journeys. A test called test_user_can_checkout_with_discount_code tells you more about the checkout flow than reading the checkout handler in isolation.
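Test names alone often read like a spec. For a Python test suite, a short sketch like this pulls them out of a file so you can skim the intended behavior before reading any test bodies. It assumes the common convention that test functions start with test_; the second test name in the sample is made up for illustration.

```python
import ast


def list_test_names(source: str) -> list[str]:
    """Return the names of functions starting with "test_", ordered by line number."""
    tests = [
        node
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("test_")
    ]
    return [node.name for node in sorted(tests, key=lambda node: node.lineno)]


# Hypothetical test file, for illustration only.
SAMPLE = '''
def test_user_can_checkout_with_discount_code(): ...

def test_checkout_rejects_expired_discount_code(): ...

def make_cart(): ...
'''

for name in list_test_names(SAMPLE):
    # Drop the prefix and underscores so each name reads as a sentence.
    print("-", name.removeprefix("test_").replace("_", " "))
```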
Then check unit tests for the modules you care about. These expose edge cases, boundary conditions, and assumptions the original developer thought were important enough to verify. If there's a test for "what happens when the payment gateway times out," that tells you timeouts actually happen and there's handling for it.
No tests? That tells you something too — about the team's practices, the system's maturity, and which parts of the code are most likely to have hidden bugs. Tread carefully in untested territory.
5. Mine the Git History
The codebase you see today is a snapshot. The git history is the story of how it got there — and the story is often more useful than the snapshot.
Recent history reveals current priorities:
git log --oneline -20
What's the team working on right now? What keeps changing? What areas are active vs. stable?
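To answer "what keeps changing?" with numbers, count how often each file shows up in the log. The sketch below works on the text produced by git log --name-only --pretty=format: (file paths only, one per line, blank lines between commits). The sample log is invented; in a real repository you would capture the command's output, for instance with subprocess.run, and pass it in.

```python
from collections import Counter


def churn(log_output: str, top: int = 3) -> list[tuple[str, int]]:
    """Count how often each path appears in `git log --name-only --pretty=format:` output."""
    paths = [line.strip() for line in log_output.splitlines() if line.strip()]
    return Counter(paths).most_common(top)


# Hypothetical log output covering three commits, for illustration only.
SAMPLE_LOG = """
checkout/cart.py
checkout/payment.py

checkout/payment.py
README.md

checkout/payment.py
checkout/cart.py
"""

for path, count in churn(SAMPLE_LOG):
    print(f"{count:3}  {path}")
```

Files at the top of this list are where the team spends its time, and often where the bugs live.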
File history explains design decisions:
git log --oneline -- path/to/confusing/file.py
That weird helper function might look pointless today, but the commit that introduced it might say "workaround for X bug in library Y." Now the code makes sense.
Search for pain points:
grep -rnE "TODO|HACK|FIXME|WORKAROUND" .
These comments are breadcrumbs left by previous developers marking where things are fragile, incomplete, or counterintuitive. They're a map of the system's weak spots.
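If you'd rather collect these breadcrumbs into something you can sort or count, the same search is a few lines of Python. A minimal sketch: the marker list mirrors the grep above, and the sample source is hypothetical.

```python
import re

# Same markers as the grep command; \b keeps "TODO" from matching inside longer words.
MARKER = re.compile(r"\b(TODO|HACK|FIXME|WORKAROUND)\b")


def find_markers(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, marker, stripped line) for each line containing a marker."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            hits.append((number, match.group(1), line.strip()))
    return hits


# Hypothetical source file, for illustration only.
SAMPLE = """def charge(order):
    # HACK: retry once because the gateway drops the first call
    return gateway.charge(order)  # TODO: handle timeouts
"""

for number, marker, text in find_markers(SAMPLE):
    print(f"line {number}: [{marker}] {text}")
```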
Blame strategically. When a piece of code confuses you, git blame shows who wrote it and when. The associated commit message and PR (if your team uses them) often contain the reasoning you need. This isn't about assigning fault — it's about finding context.
6. Use the Right Tools
You don't have to do all of this manually.
IDE features are your first line of support. "Go to definition," "find all references," and "call hierarchy" let you navigate code at the speed of thought. If you aren't fluent with the keyboard shortcuts for these features, learning them is one of the quickest ways to speed up your exploration.
Dependency visualization tools can generate architecture diagrams from code:
- madge for JavaScript/TypeScript module dependencies
- pydeps for Python package graphs
- Your IDE's built-in dependency diagrams
AI tools for code Q&A have become genuinely useful for codebase exploration. They're strong at answering "what does this module do?" and "how does data flow from X to Y?" across large codebases. They won't replace your understanding, but they accelerate it — especially for the initial orientation phase where you need breadth over depth.
Language-specific tools matter too. Know your ecosystem's profilers, debuggers, and analysis tools. Running the application with a debugger and stepping through your traced request is one of the most effective learning techniques available.
7. The Mindset Shift
Here's the most important thing: stop trying to understand everything.
When you explore a new codebase, your goal isn't comprehensive knowledge. It's a mental model — a simplified map that lets you navigate. You want to know the neighborhoods, the main roads, and the landmarks. You don't need to know every house.
This means:
Build your model top-down, validate it bottom-up. Start with the architecture, form hypotheses about how things work, then read code to confirm or correct those hypotheses. This is fundamentally different from reading code and hoping understanding emerges.
Accept temporary confusion. You'll see patterns you don't understand. Naming conventions that seem arbitrary. Abstractions that feel like overkill. Note them and move on. Half of them will make sense once you've explored more of the system. The other half might actually be bad code — but you can't tell the difference until you understand the context.
Know where to look, not what everything does. The goal is to build enough of a map that when someone asks "where does the email notification get triggered?" you can say "probably somewhere in the notification service, let me check" — not "I memorized all 400 files."
You're not reading code. You're reverse-engineering a system. These are different skills. Reading code is passive. Reverse-engineering is active — you're forming hypotheses, testing them, and refining your mental model with every file you open.
Every senior developer you admire has this skill. Not because they're smarter, but because at some point they stopped reading files and started exploring systems. They learned to orient themselves quickly, trace the important paths, and build just enough understanding to be effective.
The codebase isn't a mystery to solve once. It's territory to navigate. And like any navigation skill, you get faster every time you do it.
The next time you're dropped into unfamiliar code, resist the urge to start reading from line 1. Step back. Find the doors. Trace a path. Draw a map.
Then walk the streets.
I'm Selva, a self-taught web developer with a data engineering background, building tools for developers who learn by doing. I built Revibe because I wanted a better way to explore and understand real codebases — check out the gallery to see open-source projects broken down into interactive architecture maps and deep dives.