CSV Hell: How I Normalized 15 Broker Exports into a Single Local-First Schema

CSV Hell: How I Normalized 15 Broker Exports into a Single Local-First Schema

Backer posted Originally published at pocketportfolio.app 4 min read

CSV Hell: How I Normalized 15 Broker Exports into a Single Local-First Schema

If you have ever tried to build anything on top of retail investing data, you already know the truth:

Every broker CSV export is garbage.

Not a bit inconsistent.
Not mostly usable.
Garbage.

This post is about how I spent three months normalizing 15 different broker CSV formats into a single, predictable, local-first schema, and why I extracted the whole thing into a standalone npm package so no one else has to suffer through this again.

This is not a marketing post.
This is an engineer-to-engineer write-up about pain, tradeoffs, and why CSV parsing is a harder problem than most people admit.


The Problem: Broker CSVs Are a Nightmare

Let us start with the reality.

You download a CSV from a broker thinking:
"Great, I will just parse this and move on."

Then you open the file.

Here are real examples I dealt with.

Robinhood

  • Ticker is in column B
  • Column name is "Instrument" instead of "Symbol"
  • Buy and sell are represented as positive and negative quantities, but fees are embedded in price

Fidelity

  • Adds three blank rows at the top of every export
  • Column headers do not start until row 4
  • Dates are formatted inconsistently across exports

eToro

  • Dates are formatted as DD/MM/YYYY even for US users
  • Numbers use commas as decimal separators in some locales
  • Asset names include emojis and marketing strings

And that is before we even talk about:

  • Missing headers
  • Renamed columns between years
  • Silent format changes
  • CSVs that are technically valid but semantically useless

I looked for an existing library to handle this.

There was not one.

So I built it.


The Goal: One Schema to Rule Them All

Before writing a single line of parsing logic, I defined a rule:

Every broker must normalize into the same internal shape, no exceptions.

If the schema is not strict, everything downstream breaks:

  • P and L calculations
  • Portfolio timelines
  • Tax logic
  • Visualizations

So I designed a Standard Trade Schema that every adapter must output.

At a high level, every row becomes something like this:

{
  "id": "uuid",
  "symbol": "AAPL",
  "assetType": "equity",
  "side": "buy",
  "quantity": 10,
  "price": 172.34,
  "currency": "USD",
  "fees": 1.25,
  "executedAt": "2024-01-12T15:31:00Z",
  "broker": "robinhood",
  "raw": {}
}

Once everything looks like this, the rest of the system becomes boring, which is exactly what you want.


The Architecture: Adapter Pattern or Nothing

The only sane way to solve this was the Adapter Pattern.

Each broker gets:

  • Its own adapter
  • Its own parsing rules
  • Its own ugly edge cases

But all adapters implement the same interface.

The Interface

export interface BrokerAdapter {
  canParse(file: File): boolean;
  parse(file: File): Promise<StandardTrade[]>;
}

That is it.

No shared assumptions.
No almost compatible logic.

If an adapter cannot confidently parse a file, it refuses to.


From Raw CSV to Structured Data

Here is what a real raw row might look like coming out of a CSV parser:

{
  "Instrument": "Apple Inc.",
  "Quantity": "10",
  "Price": "$172.34",
  "Trade Date": "01/12/2024",
  "Fees": ""
}

Every adapter’s job is to:

  1. Map column names
  2. Normalize numbers
  3. Fix dates
  4. Infer missing fields
  5. Validate the output

Example (simplified):

function normalizeRobinhoodRow(row: any): StandardTrade {
  return {
    id: crypto.randomUUID(),
    symbol: mapInstrumentToSymbol(row["Instrument"]),
    assetType: "equity",
    side: Number(row["Quantity"]) > 0 ? "buy" : "sell",
    quantity: Math.abs(Number(row["Quantity"])),
    price: parseCurrency(row["Price"]),
    currency: "USD",
    fees: parseCurrency(row["Fees"] || "0"),
    executedAt: parseDate(row["Trade Date"]),
    broker: "robinhood",
    raw: row
  };
}

Every adapter does ugly work so the rest of the app never has to.


The Aha Moment: Do It in the Browser

Originally, I planned to do this on the backend.

That was a mistake.

Why Local-First Wins

  • CSV files can be large
  • Financial data is sensitive
  • Users do not want to upload raw trade history to a server
  • Latency kills onboarding

So I moved everything into the browser.

Tools Used

  • PapaParse for streaming CSV parsing
  • Web Workers to avoid blocking the UI
  • Zod-style validation for schema safety

The result:

  • Drag and drop CSV import
  • Zero backend processing
  • No credentials, no accounts, no risk

The user keeps their data.
The browser does the work.


The Trojan Horse: Turning Pain into a Tool

At some point it became obvious:

This logic should not be locked inside an app.

So I extracted it into a standalone package:

@pocket-portfolio/importer
https://www.npmjs.com/package/@pocket-portfolio/importer

Install it:

npm install @pocket-portfolio/importer

Use it in your own project:

import { importTrades } from "@pocket-portfolio/importer";

const trades = await importTrades(file);

That is it.

No signup.
No API keys.
No backend.


Why This Is the Gateway

Most developers do not wake up wanting a portfolio tracker.

They wake up angry that:

  • Their broker export is unusable
  • Their script broke again
  • Their data is inconsistent

This library solves that problem.

And once you have clean, normalized trade data, the next question is obvious:

"Okay, now what do I do with it?"

That is where Pocket Portfolio comes in, a free, local-first dashboard that already understands this schema.

You can use the library alone.
Or you can see it working end to end.

https://www.pocketportfolio.app


Lessons Learned

  • CSV parsing is not a simple task
  • Normalization is a product feature
  • Local-first is not a buzzword, it is UX
  • The best open-source tools start by solving boring pain

What Is Next

  • More brokers
  • Better locale handling
  • Schema versioning
  • Community adapters via PRs

If you have ever screamed at a CSV file, this is for you.



This is part of the Rolling Thunder launch for Pocket Portfolio.
Every post ships something real.

2 Comments

2 votes
1

More Posts

Local-First: The Browser as the Vault

Pocket Portfolioverified - Apr 20

Architecting a Local-First Hybrid RAG for Finance

Pocket Portfolioverified - Feb 25

How I Built a React Portfolio in 7 Days That Landed ₹1.2L in Freelance Work

Dharanidharan - Feb 9

Comparison: Universal Import vs. Plaid/Yodlee

Pocket Portfolioverified - Mar 12

Merancang Backend Bisnis ISP: API Pelanggan, Paket Internet, Invoice, dan Tiket Support

Masbadar - Mar 13
chevron_left

Commenters (This Week)

1 comment
1 comment

Contribute meaningful comments to climb the leaderboard and earn badges!