Architecting a Local-First Hybrid RAG for Finance

Originally published at www.pocketportfolio.app

  • Server: Next.js App Router, Vercel AI SDK (streamText, useChat), Gemini 1.5 Flash (default) and optional Pro for paid tiers. The API route /api/ai/chat is the gatekeeper: it receives the sanitized context, the user message, and any attached content; enforces quotas; calls the model; and streams the response.
  • Client: React, Redux (or equivalent) for app state, IndexedDB for persistence. The context builder runs in the browser and passes its output in the request body. There is no server-side session store for portfolio data; each request is self-contained.

Why this split. The browser is the only place that has access to the user's full data without copying it to the cloud. The server is the only place that can call the LLM and (optionally) Google Search grounding at scale. So we split: memory (what the user owns, what they've said this session) in the browser; reasoning (token generation, search) on the server. The API is stateless and does not persist portfolio or conversation history.

The flow

  1. Client: User has trades/positions in state. Context builder runs buildPortfolioContext(trades, positions) → string. User types a message (and optionally attaches a file). Client sends POST /api/ai/chat with { message, portfolioContext?, attachedContent? }.
  2. Server: Validates body; checks tier and quota (e.g. Firestore); selects model (Flash vs Pro); builds system prompt (role, scope, guardrails); appends portfolio context and optional attachment to the prompt; calls Gemini (with or without search grounding); streams tokens back.
  3. Client: Vercel AI SDK consumes the stream; updates message list; renders Markdown. No server storage of the conversation; the client holds the message list in component state.
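The client side of this flow can be sketched in a few lines. The endpoint and body shape come from the post; the helper names (buildChatRequest, readStream) are illustrative, and the actual fetch call is omitted so the sketch stays self-contained. In the real app the Vercel AI SDK's useChat handles the stream consumption shown in readStream.

```typescript
interface ChatRequest {
  message: string;
  portfolioContext?: string;
  attachedContent?: string;
}

// Build the POST payload for /api/ai/chat. Optional fields that are
// undefined are dropped by JSON.stringify, matching the request shape.
function buildChatRequest(
  message: string,
  portfolioContext?: string,
  attachedContent?: string
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  const body: ChatRequest = { message, portfolioContext, attachedContent };
  return {
    url: "/api/ai/chat",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Consume a streaming response body chunk-by-chunk, invoking onToken as
// tokens arrive so the UI can render them immediately.
async function readStream(
  stream: ReadableStream<Uint8Array>,
  onToken: (t: string) => void
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const chunk = decoder.decode(value, { stream: true });
    full += chunk;
    onToken(chunk);
  }
  return full;
}
```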

So: Client-side summarization → Stateless API route → LLM (with optional grounding) → Streaming response back to client.

Split Brain: memory vs. reasoning

Split Brain is the key concept. The "memory" lives in the browser: the full trade list, positions, and the conversation history for this session. The "reasoning" lives on the server: the LLM call, token generation, and optional search grounding. The API is a pure function: (sanitizedContext, userMessage, optionalFileContent) → stream. It does not read from a database of user data; it does not store the context after the request.
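The "pure function" framing can be made concrete. A minimal sketch of the server-side prompt assembly, assuming a composePrompt helper and system prompt wording that are hypothetical (the post does not show them): the point is that the output depends only on the request body, with no database reads of user data.

```typescript
interface ChatInput {
  message: string;
  portfolioContext?: string;
  attachedContent?: string;
}

// Illustrative system prompt; the real one defines role, scope, guardrails.
const SYSTEM_PROMPT =
  "You are a portfolio assistant. Answer only from the provided context.";

// (sanitizedContext, userMessage, optionalFileContent) -> prompt.
// Pure: same input, same output; nothing is read from or written to
// any store of user data.
function composePrompt(input: ChatInput): string {
  const parts = [SYSTEM_PROMPT];
  if (input.portfolioContext) parts.push(`Portfolio context:\n${input.portfolioContext}`);
  if (input.attachedContent) parts.push(`Attached document:\n${input.attachedContent}`);
  parts.push(`User: ${input.message}`);
  return parts.join("\n\n");
}
```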

This avoids the worst of both worlds: we do not send raw data to the cloud (privacy), and we do not run the LLM in the browser (performance, cost, capability). We get the best of both: local data sovereignty and cloud-scale reasoning.

Why we do not use server-side RAG or a vector store for portfolio data

Traditional RAG indexes documents on a server and retrieves relevant chunks at query time. We deliberately do not do that for portfolio data: it would require uploading and storing the user's data on our infrastructure, which violates the sovereign-data principle. Our "RAG" is client-side: the client holds the corpus (trades, positions), reduces it to a summary (context builder), and sends only that summary. There is no server-side index of the user's portfolio. The only thing that looks like "retrieval" is Gemini's optional search grounding for market data—and that retrieves public web data, not the user's private data. Move AI to the data, not data to the AI.
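The post does not show the context builder itself, so here is a minimal sketch of what buildPortfolioContext could look like. The Trade and Position field names are assumptions for illustration; the essential property is that the full local corpus is reduced to a short, bounded text summary, and only that summary leaves the browser.

```typescript
// Illustrative shapes -- the app's real trade/position records are not
// shown in the post, so these fields are assumptions.
interface Trade {
  symbol: string;
  side: "buy" | "sell";
  quantity: number;
  price: number;
}

interface Position {
  symbol: string;
  quantity: number;
  costBasis: number;
}

// Reduce the local corpus (trades, positions) to a compact summary string.
// This is the "retrieval" step of the client-side RAG: the raw records
// never leave the device.
function buildPortfolioContext(trades: Trade[], positions?: Position[]): string {
  const lines: string[] = [];
  if (positions?.length) {
    lines.push("Current positions:");
    for (const p of positions) {
      lines.push(`- ${p.symbol}: ${p.quantity} shares, cost basis $${p.costBasis.toFixed(2)}`);
    }
  }
  if (trades.length) {
    lines.push(`Recent trades (${trades.length} total):`);
    // Cap the summary at the last 10 trades so the prompt stays bounded
    // regardless of portfolio size.
    for (const t of trades.slice(-10)) {
      lines.push(`- ${t.side} ${t.quantity} ${t.symbol} @ $${t.price}`);
    }
  }
  return lines.join("\n");
}
```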

Request shape

The client sends a JSON body: { message: string, portfolioContext?: string, attachedContent?: string }. The message is required; the other two are optional. The server responds with a streaming body so the client can render tokens as they arrive. If quota is exceeded, the server returns 429; the client shows the quota-exceeded modal.

Implementation notes

  • Context builder: app/lib/ai/contextBuilder.ts — buildPortfolioContext(trades, positions?) returns a string; called in the client before each send.
  • Chat API: app/api/ai/chat/route.ts — POST handler; reads message, portfolioContext, attachedContent; enforces quota; calls Gemini; streams.
  • Usage and quotas: app/api/ai/usage/route.ts and Firestore for tier and remaining questions.

Part 2 of Sovereign Intelligence Serial — adapted from Sovereign Intelligence: Building Local-First RAG for Finance.

Read the full Sovereign Intelligence or Try the app.
