Your RAG isn't broken. Your chunks are.

May 14 • 1 min read

Everyone wants to blame the model. "GPT-4 is hallucinating." "Claude isn't following instructions." "The embeddings are bad."

Nine times out of ten, it's the chunks.

I've been auditing RAG pipelines lately and the same patterns keep showing up:

Fixed-size chunking that splits sentences in half. The model gets a fragment that starts with "...and therefore the policy applies only when" and you wonder why the answer is confidently wrong.

No overlap between chunks. Context that spans a paragraph boundary is gone. The model sees the question but not the answer, because the answer lives in the chunk after the one that matched.

Chunking before cleaning. Headers, nav menus, footer junk, and "Click here to subscribe" all get embedded as if they're content. Then they show up in retrieval and pollute the context window.

Embedding the wrong thing. People embed the raw chunk but the user query is phrased completely differently. No wonder cosine similarity is low: they're not semantically close, they just happen to be about the same topic.

Before you blame the model, look at what you're actually retrieving. Log the top 5 chunks for 10 real user queries and read them like a human. If you can't answer the question from those chunks, the model can't either.

The model is a mirror. It reflects the quality of what you feed it.
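To make the first three patterns concrete, here is a minimal sketch of cleaning before chunking, packing whole sentences instead of fixed-size slices, and carrying one sentence of overlap into the next chunk. It also includes a small helper for the "read your top 5 chunks" audit. Everything named here is an assumption for illustration: the boilerplate pattern list, the regex sentence splitter (naive, it will trip on abbreviations), the `max_chars` and `overlap_sentences` parameters, and the `retrieve` callable, which stands in for whatever your own retriever exposes.

```python
import re

# Hypothetical boilerplate markers; tune these for your own corpus.
BOILERPLATE_PATTERNS = [
    re.compile(r"click here to subscribe", re.IGNORECASE),
]


def clean_lines(text):
    """Drop empty lines and lines matching known boilerplate, before chunking."""
    kept = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not any(p.search(line) for p in BOILERPLATE_PATTERNS)
    ]
    return " ".join(kept)


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(sentences, max_chars=500, overlap_sentences=1):
    """Pack whole sentences into chunks of roughly max_chars.

    The last `overlap_sentences` sentences of each chunk are repeated at the
    start of the next one, so context spanning a boundary is not lost.
    A single sentence longer than max_chars still becomes its own chunk.
    """
    chunks = []
    current = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def audit_retrieval(queries, retrieve, k=5):
    """Print the top-k chunks per query so a human can read them.

    `retrieve` is a hypothetical callable: retrieve(query, k) -> list of str.
    """
    for query in queries:
        print(f"QUERY: {query}")
        for rank, chunk in enumerate(retrieve(query, k), start=1):
            print(f"  [{rank}] {chunk}")
```

The overlap is counted in sentences rather than characters so that the repeated context is always a complete thought. If your documents have reliable paragraph or heading structure, splitting on that first and falling back to sentences is probably a better starting point than this regex.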
