Three retrieval architectures. Same LLM. Same 7,928 queries across 45 domains. Different structure going in.
Here are the results:

| System | F1 Score | Tokens/query | Cost/query |
|---|---|---|---|
| RAG FAISS + Claude | 0.123 | 2,982 | ~$0.009 |
...