The Stateless Inference Pipeline: Ephemeral Payloads and Quota Metadata

— Originally published at www.openportfolio.co.uk

POST /api/ai/chat is Pocket Analyst’s inference hop. The route header in app/api/ai/chat/route.ts states the contract plainly:

/**
 * Stateless: request payload (message, context, attachedContent) is used only
 * to build the LLM prompt and stream the response. No database write or cache
 * of the payload; only analytics/quota metadata are persisted.
 */

Stateless here is scoped to the portfolio context string: the route does not store that payload. It does not mean “the server never touches cloud.”


End-to-end flow (accurate)

Raw client corpus (trades / positions in memory)
  → buildPortfolioContext()     [client, Part 2]
  → bounded string in POST body
  → /api/ai/chat                [build prompt, stream out]
  → optional Firestore/KV quota + analytics metadata only
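
The "build prompt" step in the flow above can be sketched as a pure function. This is a minimal illustration, not the route's actual code: the field names message, context and attachedContent come from the route header comment, while ChatRequestBody, buildPrompt and the section labels are hypothetical.

```typescript
// Hypothetical request shape; field names follow the route header comment.
interface ChatRequestBody {
  message: string;
  context?: string;         // bounded string produced client-side
  attachedContent?: string; // optional user-uploaded file text
}

// Assembles the prompt from the request body only.
// The function returns a string and keeps no module-level state,
// so nothing from the payload outlives the call.
function buildPrompt(body: ChatRequestBody): string {
  const sections: string[] = [];
  if (body.context) {
    sections.push(`Portfolio context:\n${body.context}`);
  }
  if (body.attachedContent) {
    sections.push(`Attached content:\n${body.attachedContent}`);
  }
  sections.push(`User message:\n${body.message}`);
  return sections.join("\n\n");
}
```

Keeping this step free of side effects is what makes the "absence of write calls on context" claim easy to audit: the payload only ever flows into the returned string.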

Full engineering record: docs/IP-TECHNICAL-MECHANISMS.md (patent-aligned mechanisms, CSV truncation rules, quote injection).


What may persist server-side

Data class                | Typical store                              | Purpose
Portfolio context string  | Not written to portfolio DB by this route  | Ephemeral prompt input
Free-tier usage counters  | Firestore and/or Vercel KV                 | Quota enforcement (FREE_TIER_MONTHLY_LIMIT)
Auth / tier resolution    | Firebase Admin                             | Paid vs free model routing

Paid tiers skip the monthly cap; the free tier enforces it. In both cases the route persists counters and metadata only, so it is still not a ledger warehouse.
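
A quota check along these lines might look like the sketch below. Everything except the FREE_TIER_MONTHLY_LIMIT name is an assumption for illustration: the limit value, the UsageStore interface (an in-memory stand-in for Firestore or Vercel KV), the key format and the consumeQuota helper are all hypothetical.

```typescript
// Assumed value for illustration; the real limit is not stated in this article.
const FREE_TIER_MONTHLY_LIMIT = 20;

type Tier = "free" | "paid";

// Hypothetical counter store standing in for Firestore or Vercel KV.
interface UsageStore {
  get(key: string): number;
  increment(key: string): number;
}

class InMemoryUsageStore implements UsageStore {
  private counts = new Map<string, number>();
  get(key: string): number {
    return this.counts.get(key) ?? 0;
  }
  increment(key: string): number {
    const next = this.get(key) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// The key holds only a user id and a UTC month, never portfolio data.
function usageKey(userId: string, now: Date): string {
  const month = now.toISOString().slice(0, 7); // e.g. "2025-03"
  return `usage:${userId}:${month}`;
}

// Returns true if the request may proceed. Paid tiers bypass the counter.
function consumeQuota(store: UsageStore, userId: string, tier: Tier, now: Date): boolean {
  if (tier === "paid") return true;
  const key = usageKey(userId, now);
  if (store.get(key) >= FREE_TIER_MONTHLY_LIMIT) return false;
  store.increment(key);
  return true;
}
```

The read-then-increment here is not atomic; a real Firestore or KV backend would presumably use an atomic increment to avoid races between concurrent requests.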


Model routing and attachments

The route tries Gemini when GOOGLE_GENERATIVE_AI_API_KEY is set, with OpenAI fallback. MAX_ATTACHED_CONTENT_LENGTH caps server-side attachment size (frontend also caps).

Attachments are a deliberate second boundary: if the user uploads file text, that is explicit — distinct from the default sanitized snapshot path.
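
The routing and attachment cap described above could be sketched as two small pure functions. Only GOOGLE_GENERATIVE_AI_API_KEY and MAX_ATTACHED_CONTENT_LENGTH are names from the route; the cap value, the OPENAI_API_KEY variable name, and the choice to truncate rather than reject oversized attachments are assumptions. The article also does not say whether OpenAI is used only when the Gemini key is missing or also when a Gemini call fails, so the sketch just returns an ordered list of candidates.

```typescript
// Assumed cap for illustration; the real value is not stated in this article.
const MAX_ATTACHED_CONTENT_LENGTH = 50000;

type Provider = "gemini" | "openai";
type Env = Record<string, string | undefined>;

// Returns providers in the order they would be tried.
// OPENAI_API_KEY is an assumed variable name.
function providerOrder(env: Env): Provider[] {
  const order: Provider[] = [];
  if (env["GOOGLE_GENERATIVE_AI_API_KEY"]) order.push("gemini");
  if (env["OPENAI_API_KEY"]) order.push("openai");
  return order;
}

// Server-side cap on attachment text. Truncation is an assumption;
// the route could equally reject oversized input.
function capAttachment(text: string | undefined): string | undefined {
  if (text === undefined) return undefined;
  return text.slice(0, MAX_ATTACHED_CONTENT_LENGTH);
}
```

Passing the environment in as a parameter, rather than reading it inside the function, keeps the routing decision testable without touching real keys.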


Streaming

We use the Vercel AI SDK (streamText) for token streaming, which is both a UX requirement and an operational standard in this codebase (CLAUDE.md). The response is generated once per request; this handler keeps no server-side conversation memory of portfolio rows.


Procurement questions this answers

  • Can you prove non-retention of portfolio payload? → Route comment + IP doc + absence of write calls on context.
  • Do you still have cloud? → Yes: Firebase, KV, LLM APIs, Vercel — calibrated honesty (Part 4).
  • What does the model see? → Bounded aggregate + user message + optional attachment + server quotes — not “nothing.”

Part 3 of Sovereign Ingestion & Stateless Inference.

Read the full Sovereign Intelligence book, explore Open Portfolio, or try Pocket Portfolio.
