Love the local-first approach. Have you noticed users caring more about privacy or performance?
I Built a Local-First AI Desktop Knowledge Base — Here's What I Learned
7 Comments
@[Next Big Creative] Great question — honestly, it's been both, but for different user profiles. The privacy-first crowd comes in knowing exactly what they want: zero cloud, no telemetry, full control. They're usually handling legal docs, research notes, or internal company data. Performance is secondary for them — they'll wait 2 seconds for a query if they know the data never left their machine.
The performance-first users discover the privacy benefits after the fact. They come for the ~1ms FTS5 latency or the offline Ollama setup, and then realise cloud RAG tools were slowing them down AND sending their data somewhere. That's actually been the more interesting conversion — people who didn't start out caring about privacy, and now do.
So the short answer: privacy brings them in, performance makes them stay.
Gunjan, this is a phenomenal write-up and an incredible masterclass in local-first systems architecture.
What I love most about Knovex is that it completely rejects the 'Digital Attic' trap. Most desktop AI tools just blindly dump messy, uncurated markdown and PDF snippets into an embeddings base and pray that semantic search can figure it out at runtime. All that does is saddle the user with a massive, recurring Prose Tax—burning local compute and ballooning context windows just to re-explain structural history the system should already know.
Your 6-stage normalization pipeline in docnest-ai is the exact right antidote. By investing in structure, section assignment, and table normalization at the ingestion boundary, you've built a true Forensic Ingestor. Pre-paying that precision so that L0 and L1 can resolve 70% of queries at zero token cost is pure engineering maturity.
Did you hit any specific edge cases when normalizing highly irregular table structures into that clean JSON schema before embedding them? That's usually where the deterministic layer gets tested the hardest.
@[Ken W. Alger] Thanks — "Prose Tax" is a great way to put it; that's exactly the cost we were trying to pre-pay.
And yes, tables are where the deterministic layer earns its keep. A few that hurt the most:
Merged/spanning cells — rowspan/colspan break the clean grid assumption. We explode them so every logical cell carries its own row/column coordinates instead of inheriting position implicitly.
Multi-row & hierarchical headers — a single header row is the easy case. Stacked/grouped headers we flatten into a header path per column so a serialized row is still self-describing once it's a chunk.
Tables split across page breaks — PDFs love to continue a table on the next page with no repeated header. Stitching those back into one logical table (and not double-counting the header) was fiddly.
Borderless / whitespace-aligned tables — the hardest to even detect as tables before you can normalize them.
The thing that made it tractable was treating it as layered with explicit fallbacks rather than one rule — and serializing row-wise with the column context baked in, so a retrieved row is meaningful without the rest of the table. Curious whether you've gone row-level vs whole-table on the embedding side?
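A minimal sketch of the first two cases above (exploding spanned cells and flattening stacked headers into a per-column header path), followed by row-wise serialization. This is not docnest-ai's actual code: the `(row, col, rowspan, colspan, text)` cell shape, the function names, the `" > "` path separator, and the "Results" group header are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: (row, col, rowspan, colspan, text) per source cell.
Cell = Tuple[int, int, int, int, str]
Grid = Dict[Tuple[int, int], str]


def explode_spans(cells: List[Cell]) -> Grid:
    """Copy a spanning cell's text into every grid position it covers,
    so each logical cell carries explicit (row, col) coordinates."""
    grid: Grid = {}
    for row, col, rowspan, colspan, text in cells:
        for r in range(row, row + rowspan):
            for c in range(col, col + colspan):
                grid[(r, c)] = text
    return grid


def header_paths(grid: Grid, header_rows: int, n_cols: int) -> List[str]:
    """Flatten stacked header rows into one header path per column."""
    paths: List[str] = []
    for c in range(n_cols):
        parts: List[str] = []
        for r in range(header_rows):
            label = grid.get((r, c), "").strip()
            # Skip blanks and labels duplicated by an exploded rowspan.
            if label and (not parts or parts[-1] != label):
                parts.append(label)
        paths.append(" > ".join(parts))
    return paths


def serialize_rows(grid: Grid, header_rows: int, n_rows: int, n_cols: int) -> List[str]:
    """Serialize each data row with its column context baked in."""
    paths = header_paths(grid, header_rows, n_cols)
    out: List[str] = []
    for r in range(header_rows, n_rows):
        pairs = [f"{paths[c]}={grid.get((r, c), '')}" for c in range(n_cols)]
        out.append(" | ".join(pairs))
    return out


# Illustrative table: "Quarter" spans two header rows, and a made-up
# "Results" group header spans the Revenue and YoY_Growth columns.
cells: List[Cell] = [
    (0, 0, 2, 1, "Quarter"),
    (0, 1, 1, 2, "Results"),
    (1, 1, 1, 1, "Revenue"),
    (1, 2, 1, 1, "YoY_Growth"),
    (2, 0, 1, 1, "Q3"),
    (2, 1, 1, 1, "$42,000"),
    (2, 2, 1, 1, "12%"),
]
grid = explode_spans(cells)
rows = serialize_rows(grid, header_rows=2, n_rows=3, n_cols=3)
print(rows[0])
```

Page-break stitching and borderless-table detection are left out here; both depend on layout heuristics that the thread does not describe.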
@[Gunjan Tailor] Exploding spanned cells and flattening hierarchical headers into explicit paths is exactly how a Forensic Ingestor earns its stripes. You've essentially built a structural compiler for documents. That multi-page PDF table-splitting issue is a notorious nightmare, and fixing it at ingestion completely sanitizes the retrieval layer.
To answer your question on the embedding strategy: in the Sovereign System Specification, we lean heavily into a hybrid Context-Injected Row Topology rather than a binary whole-table vs. naive row choice.
Whole-table embedding runs into severe semantic dilution—if a user queries a specific line item, the macro-table vector can easily miss it. On the flip side, raw row-level embedding loses the relational anchor. If a row just says ['Q3', '$42,000', '12%'], the vector is practically useless without the columns.
We handle this by chunking row-by-row and dynamically injecting the flattened header path and document entity metadata directly into the serialized string payload for each row before vectorization.
It looks something like:
[Doc: Q3_Report] -> [Section: Fiscal_Summary] -> [Headers: Quarter | Revenue | YoY_Growth] -> Row Data: ['Q3', '$42,000', '12%']
This ensures that every retrieved row stands alone as an independent, deterministic state asset. Because the row payload is entirely self-describing, it drastically reduces the Prose Tax during retrieval—the model doesn't need the surrounding fifty rows just to interpret the meaning of the data it's looking at.
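As a rough illustration of the payload format shown above, the serialization step could look like the sketch below. The function name `serialize_row` and its signature are hypothetical and are not taken from the Sovereign System Specification; only the output layout follows the example in this comment.

```python
from typing import Sequence


def serialize_row(doc: str, section: str,
                  headers: Sequence[str], row: Sequence[str]) -> str:
    """Build a self-describing string for one table row before vectorization."""
    if len(headers) != len(row):
        # A row that does not line up with its headers cannot be self-describing.
        raise ValueError("header/row length mismatch")
    header_block = " | ".join(headers)
    return (
        f"[Doc: {doc}] -> [Section: {section}] -> "
        f"[Headers: {header_block}] -> Row Data: {list(row)!r}"
    )


payload = serialize_row(
    "Q3_Report",
    "Fiscal_Summary",
    ["Quarter", "Revenue", "YoY_Growth"],
    ["Q3", "$42,000", "12%"],
)
print(payload)
```

The length check is a design choice in this sketch: failing at ingestion on a misaligned row is cheaper than embedding a payload whose values point at the wrong columns.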
I recently opened up the core repository for the community to dig into (you can check out the layout here: Sovereign System Specification Open Source Announcement).
Your work on docnest-ai is one of the cleanest production implementations of these ingestion-boundary principles I've seen in the wild. If you're forcing every logical cell to carry its own coordinate system, you're already 90% of the way toward a zero-variance memory architecture.
@[Ken W. Alger] The "Context-Injected Row Topology" framing is exactly what we landed on by trial and error — good to see it named properly. The self-describing row payload is the key insight: a retrieved row that needs surrounding rows to be interpretable is a retrieval liability, not an asset.
Will dig into the Sovereign System Specification repo — the header path injection approach maps closely to how we serialize hierarchical headers. Curious whether you track the injection cost on context window size at inference time vs. the retrieval precision gains. That trade-off is where we've had to tune the most.
@[Gunjan Tailor] You've hit the exact engineering pivot point at which this architecture needs to be tuned. The tension between structural overhead and context window inflation is very real.
In the Sovereign Spec, we treat this trade-off through a pattern called Dynamic Token Budgeting. We view the context window through two distinct types of token expenditure: Structural Overhead (the header paths) and the Prose Tax (conversational filler, raw text, and redundant surrounding rows).
Here is how we balance the scale to ensure precision wins without blowing past local hardware memory constraints:
The Token Trade-Off Math
If you pull in a naive 512-token chunk just to get a single relevant table row, you are paying a massive Prose Tax on the unformatted junk text surrounding it. By switching to a Context-Injected Row Topology, each row payload might grow by 40–80 tokens of pure structural metadata, but your total retrieval footprint shrinks drastically because you only pull the exact 3 rows required to answer the query. You're intentionally trading raw volume for high-density, high-integrity tokens.
Runtime Schema Compaction
To prevent long hierarchical header paths from consuming the context window during inference, the Sovereign-SDK uses a schema compaction step. At the ingestion boundary, we build an internal token dictionary for the document structure. Instead of repeating a massive path like [Document: 2026_Enterprise_Q3_Financial_Report_Final] -> [Section: North_American_Commercial_Division_Outcomes], the runtime compacts that down to a lightweight token alias or short structural hash before passing it to the context ring.
Deterministic Gatekeeping
Because our SessionContext tracks the execution depth deterministically, the system knows exactly when to strip or hydrate those structural headers. If three retrieved rows share the identical header path, the runtime router deduplicates the header block at the top of the context turn, rather than letting it repeat inline for every row.
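The compaction and deduplication ideas above can be sketched as follows. This is not the Sovereign-SDK's implementation: `alias_for`, `build_context`, the SHA-1 prefix alias, and the `alias := path` legend format are all assumptions for illustration, and the row strings are placeholders.

```python
import hashlib
from typing import Dict, List, Tuple


def alias_for(path: str, length: int = 8) -> str:
    """Short, stable alias for a long header path (prefix of a SHA-1 hex digest)."""
    return "h" + hashlib.sha1(path.encode("utf-8")).hexdigest()[:length]


def build_context(rows: List[Tuple[str, str]]) -> str:
    """rows holds (header_path, row_data) pairs.

    Each distinct header path is emitted once in a legend at the top,
    and every row then refers to its path by alias instead of repeating it.
    """
    aliases: Dict[str, str] = {}
    for path, _ in rows:
        aliases.setdefault(path, alias_for(path))  # dicts keep insertion order
    legend = [f"{alias} := {path}" for path, alias in aliases.items()]
    body = [f"{aliases[path]} {data}" for path, data in rows]
    return "\n".join(legend + body)


LONG_PATH = (
    "[Document: 2026_Enterprise_Q3_Financial_Report_Final] -> "
    "[Section: North_American_Commercial_Division_Outcomes]"
)
# Three retrieved rows sharing one header path (row contents are placeholders).
context = build_context([
    (LONG_PATH, "row_1"),
    (LONG_PATH, "row_2"),
    (LONG_PATH, "row_3"),
])
print(context)
```

A short hash prefix can collide across different paths in a large corpus, so a real system would need a collision check or a per-document counter instead.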
Ultimately, we’ve found that paying a predictable, upfront token cost for a strict structure is always cheaper than gambling on a model's ability to guess a missing column header under compression.
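To put rough numbers on the token trade-off described above, here is a back-of-the-envelope calculation. The 512-token chunk, the 40–80 token overhead, and the 3 retrieved rows come from the comment; the 20-token row body is purely an assumption, since the real figure depends on the tokenizer and the table.

```python
def row_footprint(n_rows: int, body_tokens: int, overhead_tokens: int) -> int:
    """Total tokens for n_rows self-describing rows with inline header overhead."""
    return n_rows * (body_tokens + overhead_tokens)


NAIVE_CHUNK_TOKENS = 512   # one naive chunk, as in the comment
ASSUMED_BODY_TOKENS = 20   # assumption: tokens for the raw row values

low = row_footprint(3, ASSUMED_BODY_TOKENS, 40)
high = row_footprint(3, ASSUMED_BODY_TOKENS, 80)
print(f"3 rows: {low} to {high} tokens vs {NAIVE_CHUNK_TOKENS} for one naive chunk")
```

Under these assumptions the three rows cost between 180 and 300 tokens. The advantage shrinks as row bodies grow or as more rows are retrieved, which is why the overhead per row is worth measuring rather than assuming.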
How are you currently measuring the precision drop-off when you dial back the header injection length? Are you seeing the model hallucinate numbers, or is it just failing to associate the row with the right entity?
- © 2026 Coder Legion