Shipping a SaaS Solo: The Boring Architecture Behind Market Verdict
How I built a 10-language business viability analyzer on a Go monolith, a six-step ship sequence, and a refusal to do anything manually twice.
Market Verdict answers one question: is your business idea viable in a specific location? Type in "coffee roastery in Hamburg" and it pulls market data, demographics, competition density, and local purchasing power, then renders a verdict — with a paid tier that generates a tailored go-to-market plan and a multilingual PDF report.
It's built and operated by one person. This article is about the engineering decisions that make that possible: the stack, the deployment pipeline, the testing philosophy, and the workflow automation that turns a solo project into something that ships to production several times a day without fear.
The Stack: Aggressively Boring
The architecture is a single Go binary talking to PostgreSQL, deployed on Render. No microservices, no Kubernetes, no message broker cluster. The frontend is server-rendered HTML with HTMX for interactivity — no React, no build step, no node_modules folder the size of a small moon.
This isn't a retro statement. It's a headcount calculation. Every moving part in a system is something one person has to monitor, patch, and debug at 2 a.m. A monolith means one deploy artifact, one log stream, one place where bugs live.
The supporting cast:
River for background jobs. River is a Postgres-native job queue for Go — jobs live in the same database as everything else, which means job enqueueing participates in the same transaction as the business logic that triggered it. No "we charged the customer but the email job got lost" failure mode. Scheduled work (a daily SEO audit at 05:00, an auto-blog pipeline at 08:00) runs as River PeriodicJobs. There is no cron package in the codebase, by rule. One scheduler, one set of retry semantics, one dashboard.
Stripe for payments, hardened the hard way. The webhook handler uses a three-phase atomic pattern: record the event, process it, mark it complete — all transactional, with duplicate-event detection so Stripe's at-least-once delivery can't double-apply anything. Customer resolution metadata round-trips through checkout so a webhook can always find its user, even on the unhappy paths.
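A stripped-down model makes it easiest to see why the three phases render duplicate deliveries harmless. In this sketch a mutex stands in for the database transaction, and a map stands in for a hypothetical processed-events table keyed by Stripe's event ID. In a real store, phase one would be an insert that a unique constraint on the event ID rejects for duplicates. The webhookLedger name and its API are illustrative, not the project's code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSimulated stands in for a failure inside the business logic.
var errSimulated = errors.New("simulated processing failure")

// webhookLedger is a hypothetical stand-in for a processed-events table.
type webhookLedger struct {
	mu     sync.Mutex        // plays the role of a single database transaction
	status map[string]string // Stripe event ID -> "processing" | "complete"
}

func newWebhookLedger() *webhookLedger {
	return &webhookLedger{status: make(map[string]string)}
}

// Handle records, processes, and completes an event as one atomic unit.
// It reports whether the event's side effects were applied on this call.
func (l *webhookLedger) Handle(eventID string, apply func() error) (bool, error) {
	l.mu.Lock()
	defer l.mu.Unlock()

	// Phase 1: record. A duplicate delivery is acknowledged and skipped.
	if _, seen := l.status[eventID]; seen {
		return false, nil
	}
	l.status[eventID] = "processing"

	// Phase 2: process. On failure, undo phase 1 as a rollback would,
	// so Stripe's retry is not mistaken for a duplicate.
	if err := apply(); err != nil {
		delete(l.status, eventID)
		return false, err
	}

	// Phase 3: mark complete.
	l.status[eventID] = "complete"
	return true, nil
}

func main() {
	ledger := newWebhookLedger()
	credits := 0
	grant := func() error { credits++; return nil }

	// Stripe delivers at least once, so the same event may arrive repeatedly.
	for i := 0; i < 3; i++ {
		applied, err := ledger.Handle("evt_123", grant)
		fmt.Println("applied:", applied, "err:", err)
	}
	fmt.Println("credits:", credits) // credits: 1

	_, err := ledger.Handle("evt_456", func() error { return errSimulated })
	fmt.Println("failed event:", err)
}
```

A duplicate returns success rather than an error on purpose: Stripe treats a non-2xx response as a failed delivery and retries it.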
Honeybadger for error monitoring and Pushover for the alerts that genuinely need to wake someone up. Fatal-level errors send synchronously before the process dies — an error reporter that loses the most important errors is worse than useless.
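The fatal-path rule comes down to ordering. A report handed to a background sender dies with the process, because os.Exit neither runs deferred calls nor waits for goroutines. This stdlib-only sketch shows the split between the two paths. The errorReporter and sendFunc names are hypothetical stand-ins for a wrapper around the Honeybadger client, whose own API isn't reproduced here. Exit is injected so the behavior can be tested.

```go
package main

import (
	"fmt"
	"os"
)

// errorReporter is a hypothetical wrapper around an error-monitoring client.
type errorReporter struct {
	queue    chan string        // drained by a background goroutine in normal operation
	sendFunc func(string) error // synchronous network call to the error service
	exit     func(int)          // os.Exit in production, a fake in tests
}

// Error enqueues without blocking; if the queue is full, the report is dropped.
func (r *errorReporter) Error(msg string) {
	select {
	case r.queue <- msg:
	default: // acceptable for non-fatal noise, never for fatal errors
	}
}

// Fatal sends synchronously and only then exits. Queueing here would lose
// the report, because the process ends before any background sender runs.
func (r *errorReporter) Fatal(msg string) {
	if err := r.sendFunc(msg); err != nil {
		fmt.Fprintln(os.Stderr, "error report failed:", err)
	}
	r.exit(1)
}

func main() {
	r := &errorReporter{
		queue:    make(chan string, 100),
		sendFunc: func(m string) error { fmt.Println("sent:", m); return nil },
		exit:     os.Exit,
	}
	r.Error("slow query") // queued; nothing drains it in this demo, so it is lost at exit
	r.Fatal("cannot start: required configuration missing")
}
```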
i18n across ten languages, served on language subdomains (de.marketverdict.app, ja.marketverdict.app, and so on). The interesting part isn't the translation files — it's the test: a reflection-based completeness check walks every translation struct and fails the build if any language is missing any key. Translation drift is a compile-time-adjacent error, not a user-reported one.
The Ship Sequence
Every change to production goes through the same six numbered scripts:
1.dev-go.sh # build + full test suite against a real DB
2.git-push-dev.sh "feat: ..." # stage, commit (conventional), push dev
3.user-dev-test.sh # Playwright E2E against the dev deploy
4.git-merge-dev2main.sh # merge dev → main, triggers prod deploy
5.git-rev-parse.sh # confirm the deployed SHA
6.prod-smoke.sh # 96 checks against live production
The numbers are in the filenames on purpose. There is no ambiguity about order, no tribal knowledge, no "wait, do I run the smoke test before or after the merge?" The sequence is the documentation.
A few details that took iteration to get right:
The pipeline waits for the deploy. An early version would merge to main and immediately smoke-test production — against the old deploy, because Render hadn't finished rolling out. The fix was a wait_for_deploy step that polls until the live SHA matches the merged one. Obvious in hindsight; every pipeline bug is.
The smoke test is wide, not deep. Ninety-six checks: every static page returns 200 in every language, hreflang tags are sane, the sitemap round-trips, JSON-LD parses, security headers are present, the demo flow responds. It's a tripwire, not a test suite — its job is to catch "the deploy is broken in a way the unit tests structurally can't see" within sixty seconds of going live.
Shell scripts have failure modes too. A PIPESTATUS bug once let a failing test run report success: go test was piped into a formatter, and because a pipeline's exit status is that of its last command unless pipefail is set, the formatter's clean exit masked the test failure. The pipeline now treats its own scripts as production code: reviewed, version-controlled, and fixed with the same urgency as the application.
Testing: Real Databases or Nothing
The testing philosophy has two non-negotiables.
First: 100% coverage on every new function, shipped in the same patch as the function. Not as a vanity metric — as a forcing function. If a function is hard to get to 100%, it's usually doing too much, and the coverage requirement surfaces that at write time instead of refactor time. Coverage is also audited: at one point the merged coverage number looked healthy until a check of the package list revealed five packages silently excluded from the test scope. True coverage was lower than reported, with one package at 57%. The lesson generalizes — a metric you don't verify the denominator of is a metric you don't have.
Second: database-backed tests run against a real Postgres, never mocks. The store layer is tested through the actual driver against an actual database (a local Docker container in dev). Mocking the database tests your mocks. The real thing catches the bugs that matter: transaction isolation surprises, constraint violations, the time Docker's bridge networking was silently dropping bytes mid-connection — TCP handshake fine, driver hanging forever. No mock would ever have found that. (The fix, for the curious: --network host.)
End-to-end coverage is Playwright against the deployed dev environment, organized as numbered phases through the full user journey: land, analyze, sign up via magic link, pay, receive the PDF. Email verification in E2E runs through testmail.app with a unique address per test action, so flows that depend on receiving an email (magic-link login, blog approval) are tested for real, not stubbed. One workflow improvement halved total E2E runtime: the magic-link confirmation page used to require a click, and making it auto-submit (with a CSP-nonced inline script and a fallback) removed a wait from every authenticated test.
Developing With an AI Pair — Without the Chaos
The primary development interface is Claude Code, the terminal-based AI coding agent. Making that productive rather than chaotic required treating the AI like a new senior hire who has amnesia between sessions: brilliant, fast, and in desperate need of process.
Three mechanisms do the heavy lifting:
A standing-rules file. A CLAUDE.md in the repo encodes every convention: where constants live, how patches are applied, which lint rules are non-negotiable, which scripts to use for which task. The AI reads it at session start. Rules that would otherwise be re-explained every session are written down once.
Recon before action. Every session opens with a read-only reconnaissance pass — inspect the actual current state of the files involved before planning anything. This rule exists because of two memorable "phantom bugs": defects described in a handoff note that, on systematic inspection, turned out not to exist in the code at all. The AI had been about to fix bugs that weren't there. Verify, then patch. Reads and writes never share a step.
A session ledger in Postgres. Context doesn't survive between AI sessions, so a small dedicated database records what each numbered session did: what shipped, what's parked, what's known-broken. Session hooks write to it automatically at start and end. It replaced an earlier system of markdown handoff files, which had a fatal flaw — stale notes. The ledger once carried forward a report of 1,186 unresolved payment errors that an API check revealed had all been resolved weeks earlier. Structured, queryable state beats prose that nobody re-validates.
The meta-lesson: AI-assisted development doesn't reduce the need for engineering discipline. It increases it, because the discipline is now the only thing standing between you and a very fast generator of plausible-looking mistakes.
Bugs Worth Remembering
A few production bugs that earned their place in the ledger:
The geocoder that moved Hamburg to New York. A location-resolution fallback was too eager, and ambiguous city names resolved to the wrong continent. Market analysis for the wrong hemisphere is worse than an error message.
The form that wiped itself. A browser back/forward-cache (pageshow) handler cleared form state unconditionally, missing the evt.persisted guard — so navigating back erased what the user typed. The same fix had already been applied to one template; a second, base template had the identical unguarded handler. Fixing a bug in one place is an invitation to grep for its siblings.
The job timeout that wasn't. A long-running report job had an 11-hour timeout configured — but the worker didn't override River's Timeout() method, so the framework's 1-minute default silently won. Configuration that isn't wired is decoration.
What Generalizes
Strip away the specifics and the operating principles are portable to any small team — or team of one:
Minimize moving parts. Every component is a pager you carry.
Number your pipeline. If the ship sequence fits in six scripts with ordinal filenames, nobody — human or AI — gets it wrong.
Test against real infrastructure. Mocks verify your assumptions; databases verify your software.
Automate the second occurrence. A manual step done twice is a system defect, not a workflow.
Verify before you fix. Half the bugs in your notes don't exist; some of the worst bugs aren't in your notes.
Give your AI tooling process, not just prompts. Standing rules, mandatory recon, and durable session state turn an autocomplete engine into a colleague.
None of it is glamorous. All of it is why one person can run a production SaaS in ten languages and still sleep.