FLAMEHAVEN FileSearch: A Self-Hosted RAG Engine Built for Production, Not Just Demos
When people talk about RAG, they usually show a clean architecture diagram:
- a parser
- a vector store
- an LLM
- a framework wrapper
- and a polished demo query
What usually gets skipped is the operational burden behind it:
- document parsing
- chunking
- embeddings
- source attribution
- auth and permissions
- storage decisions
- caching
- metrics
- deployment
That is where many RAG projects become harder to run than they first appear.
FLAMEHAVEN FileSearch is interesting because it is shaped like a deployable system from the beginning.
It combines:
- self-hosted deployment
- hybrid retrieval
- 34-file-format parsing
- multi-LLM support
- source attribution
- admin controls
- SDK/API access
in one stack.
Why this repo stands out
A lot of RAG tooling is powerful.
But in practice, many teams still end up stitching together:
- one parser
- one embedding workflow
- one vector database
- one answer layer
- one access-control layer
- one monitoring story
- and several hidden dependencies
That approach can work.
It also pushes a lot of infrastructure burden onto the user.
FLAMEHAVEN FileSearch takes the opposite route.
It compresses more of the operational surface area into a single deployable engine.
This is what makes it more interesting as a practical internal document search foundation than as just another retrieval demo.
Core differentiators
1) Self-hosted first
This project starts from a clear architectural position:
- keep sensitive documents inside your own environment
- avoid unnecessary dependence on hosted document workflows
- support fully local execution through Ollama when needed
That matters for teams dealing with:
- internal knowledge bases
- legal documents
- research material
- compliance-sensitive data
- healthcare-adjacent content
This is not only about cost.
It is about owning the boundary around your data.
2) Deployment is treated as a feature
Many RAG repos help you experiment.
Fewer help you stand something up quickly in a way that already looks operational.
FLAMEHAVEN FileSearch exposes the same system through:
- Docker
- Python SDK
- REST API
That is a meaningful product decision.
It reduces the distance between "I tested retrieval" and "I can actually deploy this."
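To make that concrete, here is a hypothetical sketch of what calling the engine over its REST API could look like from Python. The port, the `/search` endpoint path, and the payload fields are assumptions for illustration, not the project's documented API; check the repository for the actual schema.

```python
# Hypothetical sketch of a REST search call. The endpoint path, port,
# and payload fields are assumptions, not the documented API.
import json
import urllib.request

def build_search_request(base_url, query, mode="hybrid", top_k=5):
    """Construct (but do not send) a search request against the engine."""
    payload = json.dumps({"query": query, "mode": mode, "top_k": top_k})
    return urllib.request.Request(
        url=f"{base_url}/search",  # assumed endpoint path
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
        },
        method="POST",
    )

req = build_search_request("http://localhost:8000", "data retention policy")
# Sending it would be: urllib.request.urlopen(req)
```

The point is less the specific call shape and more that a plain HTTP interface keeps the engine scriptable from any language, not only the Python SDK.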
3) Hybrid retrieval instead of vector-only thinking
The search stack is grounded in a more realistic view of production search.
It supports:
- keyword search
- semantic search
- hybrid search via BM25 + RRF (Reciprocal Rank Fusion)
- typo correction
That matters because real document search is rarely solved by embeddings alone.
In production, users still search with:
- exact policy names
- filenames
- product codes
- acronyms
- version labels
- mixed Korean/English terminology
Hybrid retrieval is often the more practical answer.
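The fusion step is worth seeing in miniature. Below is a minimal sketch of Reciprocal Rank Fusion, the standard technique the BM25 + RRF line refers to: each list ranks documents independently, and a document's fused score sums `1 / (k + rank)` across lists. The document names and rankings are illustrative, not project output.

```python
# Minimal Reciprocal Rank Fusion (RRF) sketch: merge a keyword (BM25)
# ranking and a semantic ranking into one list. Doc IDs are illustrative.

def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking.

    Each document scores sum(1 / (k + rank)) over the lists containing
    it; k=60 is the constant commonly used with RRF.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["policy-v2.pdf", "faq.md", "handbook.docx"]
vector_hits = ["handbook.docx", "policy-v2.pdf", "notes.txt"]

fused = rrf_fuse([bm25_hits, vector_hits])
# A document ranked well by BOTH retrievers rises to the top.
```

This is why hybrid retrieval handles exact policy names and acronyms gracefully: a strong BM25 rank can lift a document even when its embedding similarity is middling, and vice versa.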
4) Lower operational weight
One of the most interesting aspects of this project is that it does not equate “more RAG” with “more dependency sprawl.”
The engine emphasizes:
- ultra-fast vector generation
- a lower dependency footprint
- a packaging shape that is easier to deploy than many stitched-together stacks
That matters because many RAG systems quietly accumulate drag through:
- heavy embedding dependencies
- tokenizer mismatches
- GPU assumptions
- fragile parsing chains
- hidden services outside the architecture diagram
Reducing that burden is not flashy.
It is what often makes the difference between a nice prototype and a system teams will actually keep running.
5) Source attribution is built in
This is one of the strongest practical choices in the project.
Every answer is designed to link back to the originating document and chunk.
That is one of the key differences between "chatting with documents" and "a document system people can actually trust."
If answers cannot be traced, they are hard to audit, hard to debug, and easy to over-trust.
Source attribution is not a bonus feature.
In real workflows, it is part of the credibility model.
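As a sketch of what chunk-level attribution buys you: when each retrieved chunk carries its source document and location, the final answer can cite exactly what it drew from. The field names below are illustrative, not the project's actual response schema.

```python
# Illustrative sketch of chunk-level attribution. The "doc", "chunk_id",
# and "score" field names are assumptions, not the project's real schema.

def format_answer_with_sources(answer, chunks):
    """Append a traceable source list to a generated answer."""
    lines = [answer, "", "Sources:"]
    for i, c in enumerate(chunks, start=1):
        lines.append(
            f"[{i}] {c['doc']} (chunk {c['chunk_id']}, score {c['score']:.2f})"
        )
    return "\n".join(lines)

chunks = [
    {"doc": "retention-policy.pdf", "chunk_id": 12, "score": 0.91},
    {"doc": "it-handbook.docx", "chunk_id": 3, "score": 0.74},
]
out = format_answer_with_sources("Backups are retained for 90 days.", chunks)
```

With this shape, a reviewer can open `retention-policy.pdf` at chunk 12 and confirm or reject the claim, which is the audit path the section above is arguing for.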
6) Broad ingestion surface
The parser supports a wide range of document types, including:
- PDF
- DOCX / DOC
- XLSX
- PPTX
- RTF
- HTML
- CSV
- LaTeX
- WebVTT
- images
- plain text
That matters because enterprise knowledge is never stored in one clean format.
Real search systems need to survive messy document reality.
Benchmark snapshot
Performance and system profile
- vector generation under 1ms
- cold start around 3 seconds
- 476 passing tests
- reduced memory footprint through int8 quantization
- reduced metadata size through compression
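The int8 line deserves a quick illustration. A generic symmetric-quantization sketch is below: store each embedding as int8 values plus one per-vector scale instead of float32, cutting vector storage roughly 4x at a small precision cost. This is the general technique, not necessarily the project's exact scheme.

```python
# Generic symmetric int8 quantization sketch: float32 embeddings become
# int8 values plus one per-vector scale (~4x smaller storage). This
# illustrates the technique, not the project's exact implementation.

def quantize_int8(vec):
    """Map floats in vec onto int8 range [-127, 127] with a scale."""
    scale = max(abs(x) for x in vec) / 127.0 or 1.0  # avoid zero scale
    q = [round(x / scale) for x in vec]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 values and the scale."""
    return [x * scale for x in q]

vec = [0.12, -0.5, 0.33, 0.05]
q, scale = quantize_int8(vec)
approx = dequantize(q, scale)
# Each element is recovered to within about half a quantization step.
```

Cosine-style similarity survives this transformation well in practice, which is why it is a popular way to shrink vector stores without changing retrieval architecture.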
Example benchmark environment
- Docker on Apple M1 Mac, 16GB RAM
- 500 PDFs, ~2GB total
- health check: 8ms
- search (cache hit): 9ms
- search (cache miss): 1,250ms
- batch search (10): 2,500ms
- upload (50MB file): 3,200ms
The important part is not only the numbers.
It is that the project already presents a performance profile, a test footprint, and a deployment shape that look like a system intended for real use.
Where it fits against common alternatives
| Approach | Strength | Trade-off | Where FLAMEHAVEN FileSearch differs |
| --- | --- | --- | --- |
| Framework-only stack | Flexible and composable | You still assemble parsing, retrieval, auth, storage, attribution, and deployment yourself | FLAMEHAVEN packages more of the operational stack into one deployable engine |
| Hosted RAG / SaaS search | Fastest onboarding | External data boundary, vendor dependence, recurring cost model | FLAMEHAVEN emphasizes self-hosted control and optional fully local execution |
| Vector-first DIY pipeline | Good for experimentation | Often weak on lexical precision, source traceability, and operational polish | FLAMEHAVEN combines semantic + keyword + hybrid retrieval with attribution |
| FLAMEHAVEN FileSearch | Deployment-oriented, self-hosted, broad file support, API/SDK/Docker entry points | Less of a blank canvas than a fully DIY stack | Best fit for teams that want a production-shaped document search base quickly |
This is a comparison of deployment model and operational shape, not a claim that one framework universally outperforms every other option on every workload.
Feature highlights
Search and retrieval
- keyword, semantic, and hybrid search
- BM25 + RRF
- typo correction
- structure-aware chunking
- KnowledgeAtom 2-level indexing
- sliding-window context enrichment
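A minimal sketch of the sliding-window idea behind that last bullet: neighboring chunks share a margin of text, so a retrieved chunk keeps some surrounding context. The window and overlap sizes are illustrative, and the project's actual chunker is also structure-aware, which this sketch is not.

```python
# Sliding-window chunking with overlap. Window/overlap sizes are
# illustrative; the project's real chunker is also structure-aware.

def sliding_chunks(words, window=100, overlap=20):
    """Split a word list into overlapping chunks of `window` words."""
    step = window - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # last window already reached the end of the text
    return chunks

text = "word " * 250
chunks = sliding_chunks(text.split(), window=100, overlap=20)
# 250 words -> 3 chunks; adjacent chunks share a 20-word margin.
```

The overlap is what prevents an answer-bearing sentence from being split uselessly across a chunk boundary.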
Deployment and integration
- Docker-first setup
- Python SDK
- REST API
- LangChain integration
- LlamaIndex integration
- Haystack integration
- CrewAI integration
Storage and infrastructure
- SQLite by default
- PostgreSQL + pgvector
- optional Redis cache
Security and operations
- API key hashing with salt
- rate limiting
- fine-grained permissions
- audit logging
- OWASP-recommended security headers
- input validation
- admin dashboard with metrics and quota controls
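The "API key hashing with salt" line follows a standard pattern worth sketching: the server stores only a salt and a derived digest, so a leaked database does not reveal usable keys. The sketch below uses PBKDF2 from the Python standard library; the project's actual KDF, parameters, and key format are not documented here, and the key string is a placeholder.

```python
# Generic salted API-key hashing sketch (PBKDF2-HMAC-SHA256). The KDF
# choice, iteration count, and key format are assumptions for
# illustration, not the project's documented scheme.
import hashlib
import hmac
import os

def hash_api_key(api_key, salt=None):
    """Return (salt, digest) for storage; never store the raw key."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", api_key.encode(), salt, 100_000)
    return salt, digest

def verify_api_key(api_key, salt, digest):
    """Compare in constant time against the stored digest."""
    _, candidate = hash_api_key(api_key, salt)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_api_key("fh_live_abc123")  # placeholder key
```

The constant-time comparison matters: a naive `==` on digests can leak timing information to an attacker probing key prefixes.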
Why this matters in practice
The interesting part of this repo is not just that it does retrieval.
It seems to understand that many RAG systems fail at the boundaries between:
- retrieval quality
- deployment complexity
- privacy constraints
- integration burden
- operational maintainability
That is a more serious problem than “can we get a relevant chunk back?”
And that is why this project feels worth watching.
Final take
FLAMEHAVEN FileSearch looks less like a notebook experiment and more like a production-shaped document search engine.
That is the real differentiator.
Not just:
- another retriever
- another wrapper
- another vector demo
But a system trying to reduce the distance between:
- local documents
- trustworthy retrieval
- self-hosted deployment
- and real operational use
If your team wants document search that is:
- private
- attributable
- deployable
- and less painful to assemble
this repo is worth a close look.
Repository
GitHub: https://github.com/flamehaven01/Flamehaven-Filesearch