I Rewrote a Node.js CLI in Rust — 1000x Faster


I've been using Claude Code heavily and kept wondering how much I'm actually spending. There's no built-in way to see total token usage or cost history.

The Problem

I was a happy user of ccusage, a Node.js tool for tracking Claude Code token usage. It worked great — until it didn't.

As my Claude Code usage grew, ccusage started taking 43 seconds to scan my session files. What changed wasn't the tool — it was my data:

$ du -sh ~/.claude/projects
3.4G

$ find ~/.claude/projects -name "*.jsonl" | wc -l
2772

Claude Code auto-deletes sessions older than 30 days, but heavy usage still leaves thousands of JSONL files totaling gigabytes.

An Issue With No Quick Fix

A quick look at GitHub issues confirmed this was a widespread problem:

  • #718 — Suddenly takes minutes
  • #804 — CPU 300%+, memory 2.4GB
  • #821 — 750 files / 4GB → timeout (30s+)

I tried contributing a cache optimization PR, ran benchmarks — and got no meaningful improvement. Here's why.

Why Node.js Couldn't Fix This

ccusage is written in TypeScript. Let's look at what it does:

// Simplified ccusage flow
import { readFile } from 'node:fs/promises';
import { glob } from 'tinyglobby';

const files = await glob(['**/*.jsonl']);

for (const file of files) {
  // Entire file is buffered in memory before any parsing starts
  const content = await readFile(file, 'utf-8');
  for (const line of content.split('\n')) {
    if (!line.trim()) continue; // skip blank lines: JSON.parse('') throws
    const parsed = JSON.parse(line);
    // process...
  }
}

Glob files → read each one entirely into memory → split lines → JSON.parse each line. Four compounding bottlenecks:

1. JSON.parse Is Sequential

V8 has steadily improved JSON performance:

  • V8 v7.6 (2019) — Optimized JSON.parse memory allocation
  • V8 v13.8 (Chrome 138, 2025) — SIMD-optimized JSON.stringify

But JSON.parse still processes one byte at a time. It scans input character-by-character, classifies tokens via lookup tables, and builds objects incrementally. The fundamental architecture is sequential.

Compare this to simdjson, which uses SIMD instructions to process 32–64 bytes in parallel:

JSON.parse:  [a][b][c][d]…[e][f][g][h] → 8 operations (sequential)
simdjson:    [a,b,c,d,e,f,g,h]         → 1 operation (SIMD)

simdjson works in two stages:

  • Stage 1 (Structural Discovery): SIMD scans 64 bytes at once, extracting the positions of structural characters ({, }, [, ], :, ,) and string quotes as bitmasks — branchless
  • Stage 2 (Value Materialization): Builds actual values from discovered structure

This separation lets Stage 1 run without branches, maximizing CPU pipeline utilization — achieving gigabytes per second of JSON throughput.

A standalone Node.js binding for simdjson exists, but it hasn't been actively maintained since 2021. While Node.js internally uses simdjson for some operations, it's not exposed as a replacement for JSON.parse. And even if JSON parsing got faster, the single-threaded and GC problems below would remain.

2. Memory Pressure From readFile

const content = await readFile(filePath, 'utf-8');

readFile loads entire files into memory. For 2,772 files, each one gets:

  1. Loaded into a Buffer
  2. UTF-8 decoded into a String
  3. Split on '\n' into yet another array of strings

You could stream with createReadStream plus readline to lower peak memory, but each line still materializes as a heap-allocated string and JSON.parse still runs sequentially on the main thread, so it doesn't fundamentally help.

3. libuv Thread Pool Ceiling

Node.js file I/O runs on libuv's thread pool, not the event loop:

Node.js I/O Architecture:
┌──────────────────┐
│ Event Loop       │ ← single-threaded (JS execution + callbacks)
└────────┬─────────┘
         │
┌────────▼─────────┐
│ libuv threadpool │ ← default: 4 threads (fs operations)
│ [1] [2] [3] [4]  │
└──────────────────┘

The default is 4 threads. You can increase it with UV_THREADPOOL_SIZE, so I tested it:

UV_THREADPOOL_SIZE=4 (default)  → 43.4s, 76% CPU
UV_THREADPOOL_SIZE=64           → 42.2s, 80% CPU  (~3% improvement)
UV_THREADPOOL_SIZE=128          → 33.6s, 100% CPU (~22% improvement)

32x more threads → only 22% faster. Even with CPU pegged at 100%, the run still took 33 seconds. The bottleneck isn't I/O — it's JSON.parse running on the single main thread.

You could use Worker Threads for parallel parsing, but transferring data between threads requires serialization — you'd pay the parsing cost twice.

4. GC Overhead

Parsing 3GB of JSONL creates millions of temporary objects. V8's GC has to clean them up, causing stop-the-world pauses:

GC trace results (3GB / 2,227 files):
Total GC events: 504
Peak heap memory: 378MB
Longest Major GC pause: 135ms

# Major GC (Mark-Compact) log excerpt
605 ms:  Mark-Compact 195.3 → 163.4 MB, 135.79ms pause
1051 ms: Mark-Compact 273.0 → 239.9 MB
1378 ms: Mark-Compact 378.9 → 225.5 MB
1650 ms: Mark-Compact 390.2 → 210.9 MB
1886 ms: Mark-Compact 375.3 → 161.7 MB

V8's Orinoco GC uses incremental marking and concurrent sweeping to minimize pauses, but with this volume of data, 504 GC events in a single run — roughly one every 100ms — adds up.

Summary

Every fix I tried hit a wall:

  • Sequential JSON.parse → simdjson Node.js binding? Unmaintained since 2021.
  • libuv 4 threads → UV_THREADPOOL_SIZE=128? 22% improvement ceiling.
  • Single-threaded parsing → Worker Threads? Serialization overhead.
  • GC overhead → 504 GC events, 135ms max pause. No fix available.
  • Combined: cache optimization PR? No meaningful improvement.

This wasn't a single bottleneck — it was multiple structural limitations compounding.

Why Rust?

Just rewrite it in another language — sure, but which one?

  • Go — Concurrent GC with low pause times, but no zero-copy SIMD JSON ecosystem comparable to Rust's simd-json
  • C++ — No GC, full SIMD and threading support. But no memory safety for handling thousands of files, and cross-compilation and npm distribution are both painful.
  • Rust — No GC + memory safety + simd-json/rayon ecosystem + cargo cross for 5 platforms + npm distribution via binary wrapper. Everything I needed.

And honestly — I wanted to learn Rust properly, and this was the perfect project for it.

The Rewrite

I rewrote the tool with the stack below. Since I had used ccusage extensively, I treated it as the reference implementation:

  • JSON parsing: JSON.parse (sequential) → simd-json (SIMD)
  • File discovery: tinyglobby → glob
  • Parallelism: libuv 4 threads → rayon (all cores)
  • Memory: V8 GC → Ownership + zero-copy
  • UI: Terminal output → ratatui TUI dashboard

simd-json: Zero-Copy Parsing

// Zero-copy JSON parsing - no heap allocation for strings
let data: ClaudeJsonLine = simd_json::from_slice(&mut line_bytes)?;

Rust's simd-json is a port of simdjson. Pass a &mut [u8] to from_slice and it parses in place, returning string slices that borrow from the original buffer instead of allocating new Strings. This is zero-copy parsing.
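In practice that means the deserialization target borrows from the input buffer. A minimal sketch, with illustrative field names (not toktrack's actual schema):

use serde::Deserialize;

// Borrowed fields point into the original byte buffer: no string copies.
// Field names here are illustrative, not the real session schema.
#[derive(Deserialize)]
struct ClaudeJsonLine<'a> {
    #[serde(borrow)]
    model: Option<&'a str>,
    input_tokens: Option<u64>,
    output_tokens: Option<u64>,
}

fn parse_line(buf: &mut [u8]) -> Result<ClaudeJsonLine<'_>, simd_json::Error> {
    // simd-json mutates the buffer in place while parsing, hence &mut
    simd_json::from_slice(buf)
}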

rayon: Embarrassingly Simple Parallelism

// Parallel file processing: just change .iter() to .par_iter()
use rayon::prelude::*;

let entries: Vec<UsageEntry> = files
    .par_iter()
    .flat_map(|f| parse_file(f)) // parse_file returns a Vec<UsageEntry>
    .collect();

rayon makes data parallelism trivial. Change .iter() to .par_iter() and it automatically distributes work across all CPU cores using work-stealing.

The key difference from Node.js: each Rust thread independently reads and parses files, then collects results. No serialization overhead between threads. The "parallel parsing requires serialization" dilemma simply doesn't exist.
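For illustration, here is what such a per-file worker might look like — a minimal sketch assuming a flat schema with illustrative field names (toktrack's real parsers are richer):

use serde::Deserialize;
use std::path::Path;

// Hypothetical per-line record; real session lines are more nested.
#[derive(Deserialize)]
struct UsageEntry {
    #[serde(default)]
    input_tokens: u64,
    #[serde(default)]
    output_tokens: u64,
}

// Runs entirely on one rayon worker thread: read, split lines, parse.
// Nothing crosses a thread boundary until the final collect().
fn parse_file(path: &Path) -> Vec<UsageEntry> {
    let Ok(content) = std::fs::read_to_string(path) else {
        return Vec::new(); // unreadable file contributes nothing
    };
    content
        .lines()
        .filter(|line| !line.trim().is_empty())
        .filter_map(|line| {
            // simd-json parses in place, so hand it a mutable copy of the line
            let mut buf = line.as_bytes().to_vec();
            simd_json::from_slice::<UsageEntry>(&mut buf).ok()
        })
        .collect()
}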

Multi-CLI Support

Toktrack parses session data from three AI coding CLIs:

  • Claude Code — JSONL format, ~/.claude/projects/
  • Codex CLI — JSONL format, ~/.codex/sessions/
  • Gemini CLI — JSON format, ~/.gemini/tmp/*/chats/

Each parser implements a shared CLIParser trait, and all files are processed in a single parallel pass. Support for OpenCode and GLM CLI is planned.
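As a rough sketch of the shape (illustrative only; toktrack's actual trait and method names may differ):

use std::path::{Path, PathBuf};

// Shared, CLI-agnostic usage record (fields elided for brevity).
struct UsageEntry;

// Illustrative trait shape. Sync lets parsers be shared across rayon threads.
trait CLIParser: Sync {
    // Root directory to scan, e.g. ~/.claude/projects
    fn session_root(&self) -> PathBuf;
    // Parse one session file into zero or more usage entries.
    fn parse_file(&self, path: &Path) -> Vec<UsageEntry>;
}

struct ClaudeParser;

impl CLIParser for ClaudeParser {
    fn session_root(&self) -> PathBuf {
        PathBuf::from(std::env::var("HOME").unwrap_or_default())
            .join(".claude/projects")
    }
    fn parse_file(&self, _path: &Path) -> Vec<UsageEntry> {
        Vec::new() // JSONL parsing as in the sketch above
    }
}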

Architecture

Cold path (first run): Full glob scan → parallel SIMD parsing → build cache → aggregate.

Warm path (cached): Load cached summaries → recompute only today (past dates served from cache) → merge → aggregate.
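In code, the dispatch between the two paths might look roughly like this (hypothetical; all types and helpers here are stand-ins):

use std::collections::BTreeMap;

// Stand-in types: per-day summaries keyed by "YYYY-MM-DD".
#[derive(Clone, Default)]
struct DailySummary { input_tokens: u64, output_tokens: u64 }
type Summaries = BTreeMap<String, DailySummary>;

// Hypothetical helpers, stubbed for this sketch.
fn load_cache() -> Option<Summaries> { None }     // warm if Some
fn full_scan() -> Summaries { Summaries::new() }  // cold: glob + parallel parse
fn scan_day(_day: &str) -> DailySummary { DailySummary::default() }

fn collect(today: &str) -> Summaries {
    match load_cache() {
        // Warm path: past days come straight from cache; only today is recomputed
        Some(mut cached) => {
            cached.insert(today.to_string(), scan_day(today));
            cached
        }
        // Cold path: no cache yet, so parse everything and build it
        None => full_scan(),
    }
}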

Data Preservation

The cache isn't just for performance. AI CLIs silently delete session data — Claude Code defaults to cleanupPeriodDays: 30, wiping JSONL files after 30 days. When those files are gone, your token usage and cost history go with them.

toktrack's daily cache solves this. Past dates are immutable — once a day is summarized, the result is never modified. Even if Claude Code deletes the original session files a month later, your cost history remains intact in ~/.toktrack/cache/.
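A sketch of the write-once rule (the per-day file layout under ~/.toktrack/cache/ is an assumption based on the description above):

use std::path::PathBuf;

// Hypothetical per-day cache file, e.g. ~/.toktrack/cache/2025-06-01.json
fn cache_path(day: &str) -> PathBuf {
    PathBuf::from(std::env::var("HOME").unwrap_or_default())
        .join(".toktrack/cache")
        .join(format!("{day}.json"))
}

// Past days are write-once; callers only ever rewrite today's entry.
fn persist_day(day: &str, summary_json: &str) -> std::io::Result<()> {
    let path = cache_path(day);
    if path.exists() {
        return Ok(()); // immutable: a summarized past day is never modified
    }
    std::fs::create_dir_all(path.parent().unwrap())?;
    std::fs::write(path, summary_json)
}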

Results

  • ccusage (Node.js): ~43s
  • toktrack (Rust): ~0.04s — ~1000x faster
Mode                    Time
Cold start (no cache)   ~1.0s
Warm start (cached)     ~0.04s

Throughput: ~3 GiB/s with rayon + simd-json

Measured on Apple Silicon (M-series), 2,772 JSONL files, 3.4 GB total.

What I Learned

Node.js is excellent for many things, but parsing gigabytes of JSON across thousands of files exposes structural limits that no amount of optimization can fix:

  • JSON.parse is fundamentally sequential
  • Scaling libuv threads hits diminishing returns fast
  • Worker Thread parallelism comes with serialization tax
  • GC pressure grows with data volume

Rust's combination of SIMD JSON parsing, zero-cost parallelism, and no GC made the speedup possible (roughly 40x on a cold start, ~1000x with the warm cache) — not by clever algorithms, but by removing the bottlenecks entirely.


toktrack is open source. Feedback and contributions welcome!
