The Fragmentation Problem: Why Financial Data is Broken

Originally published at www.pocketportfolio.app

Every broker has a different CSV format: "Deal Date" vs "Trade Date" vs "Execution Date," "Epic" vs "Symbol" vs "Ticker." Supporting one broker means writing a parser; supporting ten means maintaining ten parsers and hoping none breaks when a broker changes its export. Users with exports from "unsupported" brokers hit a wall. The fragmentation problem is real.

We built Universal LLM Import (CSV) to invert this. Instead of "we support broker X," the system supports any CSV that carries the semantic content of trades: date, ticker, action, quantity, price. Recognition is schema inference—figuring out which column means what—not a fixed list of formats.
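To make "semantic content of trades" concrete, here is a minimal sketch of what a fixed normalized target schema could look like. The field and type names are illustrative assumptions, not the app's actual data model:

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Action(Enum):
    """Normalized trade direction, whatever the broker calls it."""
    BUY = "buy"
    SELL = "sell"


@dataclass
class Trade:
    """One normalized trade row: the schema every CSV is mapped onto."""
    trade_date: date
    ticker: str
    action: Action
    quantity: float
    price: float
```

Schema inference then reduces to answering, for each of these five fields, "which source column feeds it?"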

Why CSV is still the lingua franca

  • Every broker and bank can export CSV. No API, no partnership, no approval. The user exports a file and brings it to the app.
  • CSV is human-readable and tool-friendly. Excel, scripts, archives, and regulators all understand it.
  • Users already have these files. The product doesn't need to "pull" data; it only needs to interpret what the user provides.

So the bet is: CSV is the sovereign format. The hard part is interpreting messy, inconsistent headers and locales. That's where structure (heuristics) and semantics (LLM) come in.

Column mapping, not free-form parsing

The system does column mapping, not free-form parsing. Given headers and a few sample rows, it answers: which column is date? ticker? buy/sell? quantity? price? LLMs help when headers are non-standard—"Deal Date," "Epic," "No. of shares," "Open Rate"—or when multiple columns could match. They map user vocabulary to a fixed normalized schema. Parsing itself stays deterministic: numbers, dates, tickers are handled by rule-based, locale-aware functions. Only the mapping step can be probabilistic; the user can confirm or correct it before the final parse.
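A hedged sketch of the heuristic half of that split. Known header synonyms resolve deterministically; anything ambiguous (no match, or more than one) is left unresolved, which is where an LLM or the user would step in. The synonym lists and function name are illustrative:

```python
# Illustrative synonym table; a real one would be far larger.
SYNONYMS = {
    "date": {"deal date", "trade date", "execution date", "date"},
    "ticker": {"epic", "symbol", "ticker", "instrument"},
    "action": {"buy/sell", "action", "side", "type"},
    "quantity": {"no. of shares", "quantity", "qty", "shares"},
    "price": {"deal price", "price", "open rate", "execution price"},
}


def infer_mapping(headers):
    """Map each normalized field to a CSV header.

    Returns None for a field when zero or multiple headers match;
    those gaps are what the LLM (and the user) resolve.
    """
    mapping = {}
    for field, names in SYNONYMS.items():
        matches = [h for h in headers if h.strip().lower() in names]
        mapping[field] = matches[0] if len(matches) == 1 else None
    return mapping
```

The point of the design: the cheap deterministic pass handles the common cases, so the probabilistic step only runs on the residue.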

Example. A broker might export:

Deal Date,Epic,Buy/Sell,No. of shares,Deal Price
15/01/2024,AAPL,Buy,10,185.50

Schema inference produces a mapping: date → "Deal Date", ticker → "Epic", action → "Buy/Sell", quantity → "No. of shares", price → "Deal Price". The deterministic parser then reads each row using that mapping. The output is a list of normalized trade objects. The pipeline never says "we don't support this broker"; it says "we need to map your columns" and then proceeds.
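The deterministic parse step could be sketched as follows, assuming the mapping has already been confirmed. The function name and the idea of passing an explicit date format (from locale detection or user confirmation) are assumptions for illustration:

```python
import csv
import io
from datetime import datetime


def parse_rows(text, mapping, date_format="%d/%m/%Y"):
    """Read CSV text through a confirmed column mapping.

    All conversion is rule-based: dates via an explicit locale-aware
    format string, numbers via float(). Nothing probabilistic here.
    """
    trades = []
    for row in csv.DictReader(io.StringIO(text)):
        trades.append({
            "date": datetime.strptime(row[mapping["date"]], date_format).date(),
            "ticker": row[mapping["ticker"]].strip().upper(),
            "action": row[mapping["action"]].strip().lower(),
            "quantity": float(row[mapping["quantity"]]),
            "price": float(row[mapping["price"]]),
        })
    return trades
```

Run against the example export above with the inferred mapping, this yields one normalized trade object for the AAPL row.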


This is part 1 of the Sovereign Serial—a 12-part technical series adapted from the book Universal LLM Import: Building Local-First, Sovereign CSV Ingestion.

Read the full Bestseller Edition or try the app.
