CSV Hell: How I Normalized 15 Broker Exports into a Single Local-First Schema
If you have ever tried to build anything on top of retail investing data, you already know the truth:
Every broker CSV export is garbage.
Not a bit inconsistent.
Not mostly usable.
Garbage.
This post is about how I spent three months normalizing 15 different broker CSV formats into a single, predictable, local-first schema, and why I extracted the whole thing into a standalone npm package so no one else has to suffer through this again.
This is not a marketing post.
This is an engineer-to-engineer write-up about pain, tradeoffs, and why CSV parsing is a harder problem than most people admit.
The Problem: Broker CSVs Are a Nightmare
Let us start with the reality.
You download a CSV from a broker thinking:
"Great, I will just parse this and move on."
Then you open the file.
Here are real examples I dealt with.
Robinhood
- Ticker is in column B
- Column name is "Instrument" instead of "Symbol"
- Buys and sells are distinguished only by the sign of the quantity, and fees are baked into the price
Fidelity
- Adds three blank rows at the top of every export
- Column headers do not start until row 4
- Dates are formatted inconsistently across exports
eToro
- Dates are formatted as DD/MM/YYYY even for US users
- Numbers use commas as decimal separators in some locales
- Asset names include emojis and marketing strings
And that is before we even talk about:
- Missing headers
- Renamed columns between years
- Silent format changes
- CSVs that are technically valid but semantically useless
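To give a taste of what handling this looks like, here is a minimal sketch of the kind of header-row detection an adapter ends up needing. The `findHeaderRow` helper and the expected-column list are illustrative, not part of the package API:

// Hypothetical helper: scan the first rows of a parsed CSV for the one
// that actually holds the column headers (Fidelity buries them under
// three blank rows; other brokers rename columns between years).
function findHeaderRow(rows: string[][], expectedColumns: string[]): number {
  const wanted = expectedColumns.map((c) => c.toLowerCase());
  for (let i = 0; i < Math.min(rows.length, 10); i++) {
    const cells = rows[i].map((c) => c.trim().toLowerCase());
    const hits = wanted.filter((col) => cells.includes(col)).length;
    // Treat a row as the header if it matches most expected columns.
    if (hits >= Math.ceil(wanted.length / 2)) return i;
  }
  return -1; // no plausible header: the adapter should refuse the file
}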
I looked for an existing library to handle this.
There was not one.
So I built it.
The Goal: One Schema to Rule Them All
Before writing a single line of parsing logic, I defined a rule:
Every broker must normalize into the same internal shape, no exceptions.
If the schema is not strict, everything downstream breaks:
- P&L calculations
- Portfolio timelines
- Tax logic
- Visualizations
So I designed a Standard Trade Schema that every adapter must output.
At a high level, every row becomes something like this:
{
  "id": "uuid",
  "symbol": "AAPL",
  "assetType": "equity",
  "side": "buy",
  "quantity": 10,
  "price": 172.34,
  "currency": "USD",
  "fees": 1.25,
  "executedAt": "2024-01-12T15:31:00Z",
  "broker": "robinhood",
  "raw": {}
}
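In TypeScript terms, that shape might be declared like this. This is a sketch inferred from the JSON above; the asset-type union members beyond "equity" are illustrative, and the real package may type things differently:

// Sketch of the normalized shape, inferred from the example above.
export interface StandardTrade {
  id: string;                   // UUID generated at import time
  symbol: string;               // normalized ticker, e.g. "AAPL"
  assetType: "equity" | "crypto" | "other"; // members beyond "equity" are illustrative
  side: "buy" | "sell";
  quantity: number;             // always positive; direction lives in `side`
  price: number;                // per-unit price in `currency`
  currency: string;             // ISO 4217 code, e.g. "USD"
  fees: number;                 // total fees for the fill, 0 if unknown
  executedAt: string;           // ISO 8601 timestamp, UTC
  broker: string;               // source adapter id, e.g. "robinhood"
  raw: Record<string, unknown>; // original row, kept for debugging
}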
Once everything looks like this, the rest of the system becomes boring, which is exactly what you want.
The Architecture: Adapter Pattern or Nothing
The only sane way to solve this was the Adapter Pattern.
Each broker gets:
- Its own adapter
- Its own parsing rules
- Its own ugly edge cases
But all adapters implement the same interface.
The Interface
export interface BrokerAdapter {
  canParse(file: File): boolean;
  parse(file: File): Promise<StandardTrade[]>;
}
That is it.
No shared assumptions.
No "almost compatible" logic.
If an adapter cannot confidently parse a file, it refuses to.
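Dispatching a file then becomes a short loop over the registered adapters. A minimal sketch, where the adapter list and the `importFile` helper are illustrative rather than the package's public API:

// Hypothetical registry: each broker ships one adapter, all behind
// the same interface. The first adapter that recognizes the file wins.
const adapters: BrokerAdapter[] = [robinhoodAdapter, fidelityAdapter, etoroAdapter];

async function importFile(file: File): Promise<StandardTrade[]> {
  const adapter = adapters.find((a) => a.canParse(file));
  if (!adapter) {
    throw new Error(`No adapter recognizes ${file.name}`);
  }
  return adapter.parse(file);
}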
From Raw CSV to Structured Data
Here is what a real raw row might look like coming out of a CSV parser:
{
  "Instrument": "Apple Inc.",
  "Quantity": "10",
  "Price": "$172.34",
  "Trade Date": "01/12/2024",
  "Fees": ""
}
Every adapter’s job is to:
- Map column names
- Normalize numbers
- Fix dates
- Infer missing fields
- Validate the output
Example (simplified):
function normalizeRobinhoodRow(row: any): StandardTrade {
  return {
    id: crypto.randomUUID(),
    symbol: mapInstrumentToSymbol(row["Instrument"]),
    assetType: "equity",
    side: Number(row["Quantity"]) > 0 ? "buy" : "sell",
    quantity: Math.abs(Number(row["Quantity"])),
    price: parseCurrency(row["Price"]),
    currency: "USD",
    fees: parseCurrency(row["Fees"] || "0"),
    executedAt: parseDate(row["Trade Date"]),
    broker: "robinhood",
    raw: row
  };
}
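The helpers referenced above do the character-level dirty work. Minimal sketches, assuming US-style formats; the real versions need per-broker locale rules, e.g. eToro's DD/MM/YYYY dates and comma decimal separators:

// Strip currency symbols and thousands separators: "$1,172.34" -> 1172.34.
function parseCurrency(value: string): number {
  const cleaned = value.replace(/[^0-9.,-]/g, "").replace(/,/g, "");
  const n = Number(cleaned);
  if (Number.isNaN(n)) throw new Error(`Unparseable amount: "${value}"`);
  return n;
}

// Turn a US-style "MM/DD/YYYY" date into an ISO 8601 UTC timestamp.
// DD/MM/YYYY brokers like eToro need a per-adapter flag; omitted here.
function parseDate(value: string): string {
  const [month, day, year] = value.split("/").map(Number);
  return new Date(Date.UTC(year, month - 1, day)).toISOString();
}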
Every adapter does ugly work so the rest of the app never has to.
The Aha Moment: Do It in the Browser
Originally, I planned to do this on the backend.
That was a mistake.
Why Local-First Wins
- CSV files can be large
- Financial data is sensitive
- Users do not want to upload raw trade history to a server
- Latency kills onboarding
So I moved everything into the browser.
- PapaParse for streaming CSV parsing
- Web Workers to avoid blocking the UI
- Zod-style validation for schema safety
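Glued together, the in-browser pipeline looks roughly like this. A sketch, not the package internals: `StandardTradeSchema` is an assumed Zod schema mirroring the shape above (abridged), and `normalizeRow` stands in for the per-adapter mapping shown earlier:

import Papa from "papaparse";
import { z } from "zod";

// Assumed Zod schema mirroring StandardTrade (abridged field list).
const StandardTradeSchema = z.object({
  symbol: z.string().min(1),
  side: z.enum(["buy", "sell"]),
  quantity: z.number().positive(),
  price: z.number().nonnegative(),
  executedAt: z.string().datetime(),
});

// Stream-parse a CSV entirely in the browser, off the main thread.
function parseCsvLocally(file: File): Promise<StandardTrade[]> {
  return new Promise((resolve, reject) => {
    const trades: StandardTrade[] = [];
    Papa.parse<Record<string, string>>(file, {
      header: true,        // first row becomes object keys
      skipEmptyLines: true,
      worker: true,        // PapaParse spawns its own Web Worker
      step: (result) => {
        // normalizeRow stands in for the per-adapter mapping above.
        const trade = normalizeRow(result.data);
        StandardTradeSchema.parse(trade); // throws if the shape is wrong
        trades.push(trade);
      },
      complete: () => resolve(trades),
      error: reject,
    });
  });
}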
The result:
- Drag and drop CSV import
- Zero backend processing
- No credentials, no accounts, no risk
The user keeps their data.
The browser does the work.
At some point it became obvious:
This logic should not be locked inside an app.
So I extracted it into a standalone package:
@pocket-portfolio/importer
https://www.npmjs.com/package/@pocket-portfolio/importer
Install it:
npm install @pocket-portfolio/importer
Use it in your own project:
import { importTrades } from "@pocket-portfolio/importer";
const trades = await importTrades(file);
That is it.
No signup.
No API keys.
No backend.
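For a concrete feel, wiring it to a plain file input takes a few lines of browser glue (the `#csv-input` element is hypothetical):

// Browser glue: feed a user-selected CSV straight into the importer.
import { importTrades } from "@pocket-portfolio/importer";

const input = document.querySelector<HTMLInputElement>("#csv-input")!;
input.addEventListener("change", async () => {
  const file = input.files?.[0];
  if (!file) return;
  const trades = await importTrades(file); // parsing happens locally, in the browser
  console.log(`Imported ${trades.length} trades from ${file.name}`);
});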
Why This Is the Gateway
Most developers do not wake up wanting a portfolio tracker.
They wake up angry that:
- Their broker export is unusable
- Their script broke again
- Their data is inconsistent
This library solves that problem.
And once you have clean, normalized trade data, the next question is obvious:
"Okay, now what do I do with it?"
That is where Pocket Portfolio comes in: a free, local-first dashboard that already understands this schema.
You can use the library alone.
Or you can see it working end to end.
https://www.pocketportfolio.app
Lessons Learned
- CSV parsing is not a simple task
- Normalization is a product feature
- Local-first is not a buzzword, it is UX
- The best open-source tools start by solving boring pain
What Is Next
- More brokers
- Better locale handling
- Schema versioning
- Community adapters via PRs
If you have ever screamed at a CSV file, this is for you.
Links
This is part of the Rolling Thunder launch for Pocket Portfolio.
Every post ships something real.