The Fragmentation Problem: Why Financial Data is Broken

Originally published at www.pocketportfolio.app

Every broker has a different CSV format: "Deal Date" vs "Trade Date" vs "Execution Date," "Epic" vs "Symbol" vs "Ticker." Supporting one broker means writing a parser; supporting ten means maintaining ten parsers and hoping none breaks when a broker changes its export. Users with exports from "unsupported" brokers hit a wall. The fragmentation problem is real.

We built Universal LLM Import (CSV) to invert this. Instead of "we support broker X," the system supports any CSV that carries the semantic content of trades: date, ticker, action, quantity, price. Recognition is schema inference—figuring out which column means what—not a fixed list of formats.
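To make "semantic content of trades" concrete, here is a minimal sketch of what a fixed normalized target schema could look like. The field and type names are illustrative assumptions, not the app's actual data model:

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Action(Enum):
    """Normalized trade direction, whatever the broker calls it."""
    BUY = "buy"
    SELL = "sell"


@dataclass
class Trade:
    """One normalized trade row: the schema every CSV is mapped onto."""
    trade_date: date
    ticker: str
    action: Action
    quantity: float
    price: float
```

Schema inference then reduces to answering, for each of these five fields, "which source column feeds it?"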

Why CSV is still the lingua franca

  • Every broker and bank can export CSV. No API, no partnership, no approval. The user exports a file and brings it to the app.
  • CSV is human-readable and tool-friendly. Excel, scripts, archives, and regulators all understand it.
  • Users already have these files. The product doesn't need to "pull" data; it only needs to interpret what the user provides.

So the bet is: CSV is the sovereign format. The hard part is interpreting messy, inconsistent headers and locales. That's where structure (heuristics) and semantics (LLM) come in.

Column mapping, not free-form parsing

The system does column mapping, not free-form parsing. Given headers and a few sample rows, it answers: which column is date? ticker? buy/sell? quantity? price? LLMs help when headers are non-standard—"Deal Date," "Epic," "No. of shares," "Open Rate"—or when multiple columns could match. They map user vocabulary to a fixed normalized schema. Parsing itself stays deterministic: numbers, dates, tickers are handled by rule-based, locale-aware functions. Only the mapping step can be probabilistic; the user can confirm or correct it before the final parse.
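A hedged sketch of the heuristic half of that split. Known header synonyms resolve deterministically; anything ambiguous (no match, or more than one) is left unresolved, which is where an LLM or the user would step in. The synonym lists and function name are illustrative:

```python
# Illustrative synonym table; a real one would be far larger.
SYNONYMS = {
    "date": {"deal date", "trade date", "execution date", "date"},
    "ticker": {"epic", "symbol", "ticker", "instrument"},
    "action": {"buy/sell", "action", "side", "type"},
    "quantity": {"no. of shares", "quantity", "qty", "shares"},
    "price": {"deal price", "price", "open rate", "execution price"},
}


def infer_mapping(headers):
    """Map each normalized field to a CSV header.

    Returns None for a field when zero or multiple headers match;
    those gaps are what the LLM (and the user) resolve.
    """
    mapping = {}
    for field, names in SYNONYMS.items():
        matches = [h for h in headers if h.strip().lower() in names]
        mapping[field] = matches[0] if len(matches) == 1 else None
    return mapping
```

The point of the design: the cheap deterministic pass handles the common cases, so the probabilistic step only runs on the residue.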

Example. A broker might export:

Deal Date,Epic,Buy/Sell,No. of shares,Deal Price
15/01/2024,AAPL,Buy,10,185.50

Schema inference produces a mapping: date → "Deal Date", ticker → "Epic", action → "Buy/Sell", quantity → "No. of shares", price → "Deal Price". The deterministic parser then reads each row using that mapping. The output is a list of normalized trade objects. The pipeline never says "we don't support this broker"; it says "we need to map your columns" and then proceeds.
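The deterministic parse step could be sketched as follows, assuming the mapping has already been confirmed. The function name and the idea of passing an explicit date format (from locale detection or user confirmation) are assumptions for illustration:

```python
import csv
import io
from datetime import datetime


def parse_rows(text, mapping, date_format="%d/%m/%Y"):
    """Read CSV text through a confirmed column mapping.

    All conversion is rule-based: dates via an explicit locale-aware
    format string, numbers via float(). Nothing probabilistic here.
    """
    trades = []
    for row in csv.DictReader(io.StringIO(text)):
        trades.append({
            "date": datetime.strptime(row[mapping["date"]], date_format).date(),
            "ticker": row[mapping["ticker"]].strip().upper(),
            "action": row[mapping["action"]].strip().lower(),
            "quantity": float(row[mapping["quantity"]]),
            "price": float(row[mapping["price"]]),
        })
    return trades
```

Run against the example export above with the inferred mapping, this yields one normalized trade object for the AAPL row.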


This is part 1 of the Sovereign Serial—a 12-part technical series adapted from the book Universal LLM Import: Building Local-First, Sovereign CSV Ingestion.

Read the full Bestseller Edition or try the app.
