I've been using Claude Code heavily and kept wondering how much I'm actually spending. There's no built-in way to see total token usage or cost history.

The Problem
I was a happy user of ccusage, a Node.js tool for tracking Claude Code token usage. It worked great — until it didn't.
As my Claude Code usage grew, ccusage started taking 43 seconds to scan my session files. What changed wasn't the tool — it was my data:
```shell
$ du -sh ~/.claude/projects
3.4G
$ find ~/.claude/projects -name "*.jsonl" | wc -l
2772
```
Claude Code auto-deletes sessions older than 30 days, but heavy usage still leaves thousands of JSONL files totaling gigabytes.
An Issue With No Quick Fix
A quick look at GitHub issues confirmed this was a widespread problem:
- #718 — Suddenly takes minutes
- #804 — CPU 300%+, memory 2.4GB
- #821 — 750 files / 4GB → timeout (30s+)
I tried contributing a cache optimization PR, ran benchmarks — and got no meaningful improvement. Here's why.
Why Node.js Couldn't Fix This
ccusage is written in TypeScript. Let's look at what it does:
```typescript
// Simplified ccusage flow
import { readFile } from 'node:fs/promises';
import { glob } from 'tinyglobby';

const files = await glob(['**/*.jsonl']);
for (const file of files) {
  const content = await readFile(file, 'utf-8');
  for (const line of content.split('\n')) {
    if (!line.trim()) continue; // skip blank lines (e.g. the trailing newline)
    const parsed = JSON.parse(line);
    // process...
  }
}
```
Glob files → read each one entirely into memory → split lines → JSON.parse each line. Four compounding bottlenecks:
1. JSON.parse Is Sequential
V8 has steadily improved JSON performance:
- V8 v7.6 (2019) — Optimized JSON.parse memory allocation
- V8 v13.8 (Chrome 138, 2025) — SIMD-optimized JSON.stringify
But JSON.parse still processes one byte at a time. It scans input character-by-character, classifies tokens via lookup tables, and builds objects incrementally. The fundamental architecture is sequential.
Compare this to simdjson, which uses SIMD instructions to process 32–64 bytes in parallel:
```
JSON.parse:  [a][b][c][d]…[e][f][g][h]  → 8 operations (sequential)
simdjson:    [a,b,c,d,e,f,g,h]          → 1 operation  (SIMD)
```
simdjson works in two stages:
- Stage 1 (Structural Discovery): SIMD scans 64 bytes at once and extracts structural character positions ({, [, :, ,) as bitmasks — branchless
- Stage 2 (Value Materialization): builds actual values from the discovered structure
This separation lets Stage 1 run without branches, maximizing CPU pipeline utilization — achieving gigabytes per second of JSON throughput.
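Stage 1 can be sketched in scalar Rust as a stand-in for the real thing. The function name `structural_mask` is mine, and where simdjson compares 32–64 bytes in one SIMD instruction, this loop does it byte-by-byte — but the output is the same idea: one u64 bitmask whose set bits mark structural characters in a 64-byte block.

```rust
/// Scalar sketch of simdjson's Stage 1: mark the positions of structural
/// JSON characters ({ } [ ] : ,) in up to 64 bytes as a bitmask.
/// The real implementation produces this mask with SIMD compares instead
/// of a per-byte loop.
fn structural_mask(block: &[u8]) -> u64 {
    let mut mask = 0u64;
    for (i, &b) in block.iter().take(64).enumerate() {
        let is_structural = matches!(b, b'{' | b'}' | b'[' | b']' | b':' | b',');
        // Branchless update: the boolean becomes 0 or 1 and is OR-ed into place.
        mask |= (is_structural as u64) << i;
    }
    mask
}

fn main() {
    let json = br#"{"a":1,"b":[2,3]}"#;
    let mask = structural_mask(json);
    // Each set bit is the offset of a structural character.
    for i in 0..64 {
        if mask & (1u64 << i) != 0 {
            print!("{} ", json[i] as char);
        }
    }
    println!();
}
```

Stage 2 then walks only the set bits instead of re-scanning every byte, which is where the gigabytes-per-second throughput comes from.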
A standalone Node.js binding for simdjson exists, but it hasn't been actively maintained since 2021. While Node.js internally uses simdjson for some operations, it's not exposed as a replacement for JSON.parse. And even if JSON parsing got faster, the single-threaded and GC problems below would remain.
2. Memory Pressure From readFile
```typescript
const content = await readFile(filePath, 'utf-8');
```
readFile loads entire files into memory. For 2,772 files, each one gets:
- Loaded into a Buffer
- UTF-8 decoded into a String
- Split by split('\n') into yet another array of strings
You could use createReadStream for streaming, but JSON.parse needs complete strings anyway — so it doesn't fundamentally help.
3. libuv Thread Pool Ceiling
Node.js file I/O runs on libuv's thread pool, not the event loop:
```
Node.js I/O Architecture:
┌──────────────────┐
│ Event Loop       │ ← single-threaded (JS execution + callbacks)
└────────┬─────────┘
         │
┌────────▼─────────┐
│ libuv threadpool │ ← default: 4 threads (fs operations)
│  [1] [2] [3] [4] │
└──────────────────┘
```
The default is 4 threads. You can increase it with UV_THREADPOOL_SIZE, so I tested exactly that:

```
UV_THREADPOOL_SIZE=4 (default) → 43.4s,  76% CPU
UV_THREADPOOL_SIZE=64          → 42.2s,  80% CPU (~3% improvement)
UV_THREADPOOL_SIZE=128         → 33.6s, 100% CPU (~22% improvement)
```
32x more threads → only 22% faster. CPU hit 100% at 33 seconds. The bottleneck isn't I/O — it's JSON.parse running on the single main thread.
You could use Worker Threads for parallel parsing, but transferring data between threads requires serialization — you'd pay the parsing cost twice.
4. GC Overhead
Parsing 3GB of JSONL creates millions of temporary objects. V8's GC has to clean them up, causing stop-the-world pauses:
```
GC trace results (3GB / 2,227 files):
  Total GC events:        504
  Peak heap memory:       378MB
  Longest Major GC pause: 135ms
```

```
# Major GC (Mark-Compact) log excerpt
 605 ms: Mark-Compact 195.3 → 163.4 MB, 135.79ms pause
1051 ms: Mark-Compact 273.0 → 239.9 MB
1378 ms: Mark-Compact 378.9 → 225.5 MB
1650 ms: Mark-Compact 390.2 → 210.9 MB
1886 ms: Mark-Compact 375.3 → 161.7 MB
```
V8's Orinoco GC uses incremental marking and concurrent sweeping to minimize pauses, but with this volume of data, 504 GC events in a single run — roughly one every 100ms — adds up.
Summary
Every fix I tried hit a wall:
- Sequential JSON.parse → simdjson Node.js binding? Unmaintained since 2021.
- libuv 4 threads → UV_THREADPOOL_SIZE=128? 22% improvement ceiling.
- Single-threaded parsing → Worker Threads? Serialization overhead.
- GC overhead → 504 GC events, 135ms max pause. No fix available.
- Combined: cache optimization PR? No meaningful improvement.
This wasn't a single bottleneck — it was multiple structural limitations compounding.
Why Rust?
Just rewrite it in another language — sure, but which one?
- Go — Concurrent GC with low pause times, but no zero-copy SIMD JSON ecosystem comparable to Rust's simd-json
- C++ — No GC, full SIMD and threading support. But no memory safety for handling thousands of files, and cross-compilation and npm distribution are both painful.
- Rust — No GC + memory safety + simd-json/rayon ecosystem + cargo cross for 5 platforms + npm distribution via binary wrapper. Everything I needed.
And honestly — I wanted to learn Rust properly, and this was the perfect project for it.
The Rewrite
I rewrote the tool on the stack below, using ccusage (which I knew inside out as a user) as the reference for behavior:
- JSON parsing: JSON.parse (sequential) → simd-json (SIMD)
- File discovery: tinyglobby → glob
- Parallelism: libuv 4 threads → rayon (all cores)
- Memory: V8 GC → ownership + zero-copy
- UI: terminal output → ratatui TUI dashboard
simd-json: Zero-Copy Parsing
```rust
// Zero-copy JSON parsing - no heap allocation for strings
let data: ClaudeJsonLine = simd_json::from_slice(&mut line_bytes)?;
```
Rust's simd-json is a port of simdjson. Pass &mut [u8] to from_slice and it parses in-place — borrowed string references, no allocations. This is zero-copy parsing.
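The zero-copy idea itself can be shown with the standard library alone: the "parsed" value borrows its strings directly out of the input buffer instead of allocating copies. The tab-separated record format below is purely hypothetical, chosen only to keep the sketch self-contained — simd-json does the same borrowing, but for full JSON.

```rust
/// Std-only illustration of zero-copy parsing: the output borrows from the
/// input buffer (lifetime 'a), so no String is ever allocated.
struct Record<'a> {
    kind: &'a str, // points into the caller's buffer
    body: &'a str, // same - no copy made
}

/// Split one hypothetical `kind<TAB>body` record in place.
fn parse_record(buf: &str) -> Option<Record<'_>> {
    let (kind, body) = buf.split_once('\t')?;
    Some(Record { kind, body })
}

fn main() {
    let buf = String::from("assistant\t{\"tokens\":42}");
    let rec = parse_record(&buf).unwrap();
    // Both fields are views into `buf`; nothing was heap-allocated here.
    println!("{} / {}", rec.kind, rec.body);
}
```

The borrow checker guarantees the buffer outlives every `Record` that points into it — the safety property a GC would otherwise have to provide.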
rayon: Embarrassingly Simple Parallelism
```rust
// Parallel file processing: just change .iter() to .par_iter()
let entries: Vec<UsageEntry> = files
    .par_iter()
    .flat_map(|f| parse_file(f))
    .collect();
```
rayon makes data parallelism trivial. Change .iter() to .par_iter() and it automatically distributes work across all CPU cores using work-stealing.
The key difference from Node.js: each Rust thread independently reads and parses files, then collects results. No serialization overhead between threads. The "parallel parsing requires serialization" dilemma simply doesn't exist.
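The shared-nothing model can be sketched with std threads alone (fixed chunking here, where rayon would add work-stealing): each thread owns its chunk of lines, parses it, and returns owned results that are simply moved back to the parent. `parse` is a stand-in stub for real JSONL parsing.

```rust
use std::thread;

/// Stand-in for real JSONL parsing - returns something cheap to check.
fn parse(line: &str) -> usize {
    line.len()
}

/// Each spawned thread owns its input chunk and its output Vec; join() moves
/// the results back. No serialization/deserialization between threads,
/// unlike Node.js Worker Threads.
fn parallel_parse(lines: Vec<String>, threads: usize) -> Vec<usize> {
    let chunk = ((lines.len() + threads - 1) / threads).max(1);
    let handles: Vec<_> = lines
        .chunks(chunk)
        .map(|c| {
            let c = c.to_vec(); // this thread now owns its slice of the work
            thread::spawn(move || c.iter().map(|l| parse(l)).collect::<Vec<_>>())
        })
        .collect();
    // Order is preserved: chunks are joined in spawn order.
    handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
}

fn main() {
    let lines: Vec<String> = (0..8).map(|i| "x".repeat(i)).collect();
    println!("{:?}", parallel_parse(lines, 4));
}
```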
Multi-CLI Support
Toktrack parses session data from three AI coding CLIs:
- Claude Code — JSONL format, ~/.claude/projects/
- Codex CLI — JSONL format, ~/.codex/sessions/
- Gemini CLI — JSON format, ~/.gemini/tmp/*/chats/
Each parser implements a shared CLIParser trait, and all files are processed in a single parallel pass. Support for OpenCode and GLM CLI is planned.
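A minimal sketch of that design follows. The trait name CLIParser comes from the text above; the method names, UsageEntry fields, and the stub body are my illustrative assumptions, not toktrack's actual API.

```rust
/// Illustrative usage record - field names are assumptions, not toktrack's.
struct UsageEntry {
    date: String,
    tokens: u64,
}

/// Shared interface each CLI-specific parser implements (sketch).
trait CLIParser {
    /// Glob pattern for where this CLI keeps its session files.
    fn session_glob(&self) -> &'static str;
    /// Turn one raw record (a JSONL line, or a JSON chat file) into entries.
    fn parse_record(&self, raw: &str) -> Vec<UsageEntry>;
}

struct ClaudeCodeParser;

impl CLIParser for ClaudeCodeParser {
    fn session_glob(&self) -> &'static str {
        "~/.claude/projects/**/*.jsonl"
    }
    fn parse_record(&self, raw: &str) -> Vec<UsageEntry> {
        // Stub: the real parser would SIMD-parse the JSONL line; here we
        // emit one placeholder entry so the sketch runs end to end.
        vec![UsageEntry { date: "1970-01-01".to_string(), tokens: raw.len() as u64 }]
    }
}

fn main() {
    // Dynamic dispatch lets all CLIs share one parallel processing pass.
    let parsers: Vec<Box<dyn CLIParser>> = vec![Box::new(ClaudeCodeParser)];
    for p in &parsers {
        let entries = p.parse_record("{\"tokens\":42}");
        println!("{} -> {} entries", p.session_glob(), entries.len());
    }
}
```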
Architecture

Cold path (first run): Full glob scan → parallel SIMD parsing → build cache → aggregate.
Warm path (cached): Load cached summaries → recompute only today (past dates served from cache) → merge → aggregate.
Data Preservation
The cache isn't just for performance. AI CLIs silently delete session data — Claude Code defaults to cleanupPeriodDays: 30, wiping JSONL files after 30 days. When those files are gone, your token usage and cost history goes with them.
toktrack's daily cache solves this. Past dates are immutable — once a day is summarized, the result is never modified. Even if Claude Code deletes the original session files a month later, your cost history remains intact in ~/.toktrack/cache/.
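The warm path described above reduces to a small merge. This is an illustrative sketch under my own assumptions (a date-to-total map, tokens only), not toktrack's actual cache code:

```rust
use std::collections::HashMap;

/// Warm-path sketch: past days come straight from the immutable cache and
/// are never recomputed; only today's total is rebuilt from live session
/// files. Cached days survive even after the CLI deletes the originals.
fn warm_totals(
    cache: &HashMap<String, u64>, // date -> cached token total (immutable)
    today: &str,
    recompute_today: impl Fn() -> u64,
) -> HashMap<String, u64> {
    let mut totals = cache.clone();
    // Today is the only day whose entry may change between runs.
    totals.insert(today.to_string(), recompute_today());
    totals
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert("2025-01-01".to_string(), 1000);
    let totals = warm_totals(&cache, "2025-01-02", || 250);
    println!("{:?} {:?}", totals.get("2025-01-01"), totals.get("2025-01-02"));
}
```

Because past entries are write-once, the warm path only ever touches today's files — which is why it runs in milliseconds regardless of how much history has accumulated.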
Results
- ccusage (Node.js): ~43s
- toktrack (Rust, warm cache): ~0.04s — roughly 1000x faster

| Mode | Time |
|------|------|
| Cold start (no cache) | ~1.0s |
| Warm start (cached) | ~0.04s |
| Throughput | ~3 GiB/s with rayon + simd-json |
Measured on Apple Silicon (M-series), 2,772 JSONL files, 3.4 GB total.
What I Learned
Node.js is excellent for many things, but parsing gigabytes of JSON across thousands of files exposes structural limits that no amount of optimization can fix:
- JSON.parse is fundamentally sequential
- Scaling libuv threads hits diminishing returns fast
- Worker Thread parallelism comes with serialization tax
- GC pressure grows with data volume
Rust's combination of SIMD JSON parsing, zero-cost parallelism, and no GC made the 40x cold-start speedup (and ~1000x with a warm cache) possible — not by clever algorithms, but by removing the bottlenecks entirely.
toktrack is open source. Feedback and contributions welcome!