Every time an LLM pipeline misbehaves in production, the post-mortem lands in the same place: not the model, not the prompt, not the inference infrastructure. The data going in was a mess.
We talk about this as though it is a new insight. It is not. Supply chain engineers have been fighting the same battle for decades, with the same failure modes, the same human cost, and eventually the same solution. The lesson did not transfer to software because we were not paying attention.
Here is what they figured out and what it means for the way we build AI systems today.
The First Mile Problem
Supply chain AI failed long before LLMs existed, and it failed for one reason: data quality at the ingestion layer.
Industry research estimates that 80 to 90% of enterprise data is unstructured, and in supply chain operations this is not the exception but the baseline. Every trading partner sends data differently. Purchase orders arrive as PDFs, waybills as scanned images, shipping notices in a dozen incompatible EDI dialects. When AI systems were introduced to automate logistics workflows, they choked immediately -- not on hard problems, but on superficial ones. Uncalibrated inputs. Inconsistent field formats. Free-text descriptions where structured identifiers should have been.
The result was that highly paid supply chain specialists spent their days manually correcting algorithm exceptions. The AI was not saving time. It was redirecting the same time to a more frustrating version of the same task.
The engineering term for this layer is the "first mile" -- the ingestion point where external data enters your systems. As one analysis puts it: clean API-to-API integrations rarely fail; messy data ingestion fails constantly. The first mile is the hardest and most neglected layer, and it is where everything breaks.
Sound familiar?
How They Fixed It: Standards at the Source
The supply chain world did not solve this problem by building better exception handlers. They solved it by imposing structure at the origin.
GS1 started with the barcode in 1973. The idea was simple: if every product in every supply chain uses the same identifier format, every system downstream can parse it without negotiation. No translation layer. No guessing. The data contract is enforced at the label, not at the receiving system.
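One concrete piece of that label-level contract is the check digit that closes every GTIN. The sketch below assumes the standard GS1 mod-10 scheme, with weights alternating 3 and 1 from the right; the function names are illustrative and not taken from any GS1 library.

```python
def gtin_check_digit(payload: str) -> int:
    """Compute the check digit for the digits that precede it."""
    if not (payload.isascii() and payload.isdigit()):
        raise ValueError("GTIN payload must contain ASCII digits only")
    total = 0
    # Walk right to left: the digit next to the check digit has weight 3,
    # then the weights alternate 1, 3, 1, ...
    for position, char in enumerate(reversed(payload)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """True if code is a GTIN-8/12/13/14 whose last digit is the right check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])
```

A receiving system can reject a mistyped or misread code on arithmetic alone, before it looks anything up, which is what enforcing the contract at the label means in practice.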
RFID extended this to machine-readable events. RAIN RFID with GS1 EPC encoding means a tag does not just carry a product identifier -- it carries it in a format that every compliant reader in the world understands the same way. Getting that encoding right is not a hardware concern or a procurement concern. It is a data schema concern, enforced at the physical label before a single byte enters a system.
The result was measurable. Before RFID, retail inventory accuracy using manual barcode scanning sat between 60% and 80%. Research from the RFID Research Center at the University of Arkansas showed RFID-enabled inventory systems improved accuracy by 13 percentage points versus control stores. More tellingly, a 2025 report found that while 91% of supply chain professionals believed their organizations had adequate visibility, only 33% consistently achieved real-time inventory insight. The gap between confidence and reality closed only when data quality was enforced at the point of origin -- not cleaned up downstream.
The key finding from RFID Journal's retail analysis is unambiguous: item-level inventory accuracy in stores remains 55-80% at businesses that rely on legacy systems, and AI cannot improve on inaccurate source data. Accurate encoding is a prerequisite, not a nice-to-have.
The Same Architecture, Different Stack
Now look at what we are building.
A RAG pipeline ingests documents, chunks them, embeds them, retrieves relevant chunks at query time, and passes them to an LLM. The model's output quality is bounded by what arrives in context. Corrupted, noisy, or ambiguously structured source data produces corrupted, noisy retrievals, and the LLM can hallucinate on top of them even when the prompt and model are both excellent.
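To make the bound visible, here is a toy sketch in which plain token overlap stands in for embedding similarity. Real embeddings are more forgiving than this, but the direction of the effect is the same. The sample strings, including the OCR-style misreads, are invented for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, chunk: str) -> float:
    """Jaccard similarity between the query tokens and the chunk tokens."""
    q, c = tokens(query), tokens(chunk)
    if not q or not c:
        return 0.0
    return len(q & c) / len(q | c)


query = "return policy for damaged items"
clean_chunk = "Return policy: damaged items may be returned within 30 days."
# Same sentence after a bad scan: "rn" read for "m", "1" read for "l".
ocr_chunk = "Retum po1icy: darnaged iterns rnay be retumed within 30 days."

clean_score = overlap_score(query, clean_chunk)
ocr_score = overlap_score(query, ocr_chunk)
```

Both chunks carry the same information for a human reader, yet only the clean one shares any tokens with the query. Nothing downstream of retrieval can recover a chunk that was never returned.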
The parallel to supply chain is close:
| Supply Chain | LLM Pipeline |
| --- | --- |
| Product identifier (barcode / EPC) | Document metadata / entity ID |
| EDI message format | Input schema / prompt structure |
| RFID encoding standard | Chunking and embedding strategy |
| First-mile ingestion | RAG ingestion layer |
| Downstream inventory AI | Retrieval model + generation |
In both cases, the intelligence layer is only as reliable as the structure imposed at ingestion. In both cases, teams tried to compensate with smarter downstream processing rather than fixing the source. In both cases, that approach failed.
The structural mismatch in LLM agent systems is formally recognized in recent research. A 2025 paper on agentic AI pipelines puts it this way: LLMs are probabilistic generators that produce unstructured or semi-structured text, while the infrastructure they must control is deterministic and schema-bound. A single schema violation in an agent-generated command causes downstream failure. These systems do not accept ambiguous or malformed input.
This is not a model problem. It is a data contract problem, and supply chain engineers learned this decades ago.
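A minimal sketch of what such a contract can look like at the boundary between a model and a deterministic system: parse the model's output, check it against a fixed shape, and refuse anything else. The `Command` fields and the `ALLOWED_ACTIONS` set are hypothetical, chosen only to illustrate the pattern.

```python
import json
from dataclasses import dataclass

ALLOWED_ACTIONS = {"reorder", "hold", "release"}  # hypothetical action set


@dataclass(frozen=True)
class Command:
    action: str
    sku: str
    quantity: int


def parse_command(raw: str) -> Command:
    """Parse model output into a Command, rejecting anything off-contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("command must be a JSON object")
    if set(data) != {"action", "sku", "quantity"}:
        raise ValueError(f"unexpected or missing fields: {sorted(data)}")
    action, sku, qty = data["action"], data["sku"], data["quantity"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not isinstance(sku, str) or not sku:
        raise ValueError("sku must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(qty, bool) or not isinstance(qty, int) or qty <= 0:
        raise ValueError("quantity must be a positive integer")
    return Command(action, sku, qty)
```

The point of failing loudly here is that a rejected command can be retried or escalated, while a malformed one that slips through fails somewhere harder to trace.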
Where It Shows Up in Practice
The concrete failure modes are similar enough to be instructive.
Inconsistent identifiers across sources. In supply chain, the same product arrives with different descriptions from different suppliers. In LLM pipelines, the same entity appears with different names, acronyms, and formats across ingested documents. Retrieval breaks because embedding similarity cannot compensate for naming chaos. The fix in both cases is normalization at ingestion, not at query time.
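Normalization at ingestion can be as simple as a lookup from every known surface form to one canonical ID, with unknown names treated as errors rather than guessed at. This is a sketch under that assumption; the alias table and the company names in it are made up.

```python
import re

# Hypothetical alias table: every known surface form maps to one canonical ID.
ALIASES = {
    "acme": "ORG-000001",
    "acme corp": "ORG-000001",
    "acme corporation": "ORG-000001",
    "globex": "ORG-000002",
    "globex inc": "ORG-000002",
}


def normalize_key(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def canonical_id(name: str) -> str:
    """Resolve a surface form to its canonical ID, or fail loudly."""
    key = normalize_key(name)
    if key not in ALIASES:
        raise KeyError(f"no canonical ID for {name!r}; add it to the alias table")
    return ALIASES[key]
```

Raising on an unknown name is deliberate: a silent fallback would let a new spelling create a second identity for the same entity, which is the failure this step exists to prevent.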
Free text where structured fields should be. Handwritten notes and non-standard waybills disrupted supply chain AI the same way unstructured PDFs and email threads disrupt RAG pipelines today. The content is there; the machine cannot parse it reliably without additional processing. The supply chain answer was EDI -- enforced structured interchange formats. The LLM answer is pre-ingestion extraction and schema validation.
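As a rough illustration of pre-ingestion extraction, the sketch below pulls three fields out of a free-text shipping note and refuses the note if any of them is missing. The note layout, the field names, and the six-digit PO format are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumed conventions for this example: six-digit PO numbers, ISO dates,
# and quantities introduced by "qty".
PO_RE = re.compile(r"\bPO\s*#?\s*(\d{6})\b", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
QTY_RE = re.compile(r"\bqty\s*:?\s*(\d+)\b", re.IGNORECASE)


@dataclass(frozen=True)
class ShipmentNote:
    po_number: str
    shipped_on: date
    quantity: int


def extract_shipment(text: str) -> ShipmentNote:
    """Extract the structured fields from a note, or raise if any is missing."""
    po = PO_RE.search(text)
    shipped = DATE_RE.search(text)
    qty = QTY_RE.search(text)
    found = {"po_number": po, "shipped_on": shipped, "quantity": qty}
    missing = [name for name, match in found.items() if match is None]
    if missing:
        raise ValueError(f"could not extract: {', '.join(missing)}")
    # date() itself raises ValueError for impossible dates such as 2024-13-40.
    shipped_on = date(*(int(part) for part in shipped.groups()))
    return ShipmentNote(po.group(1), shipped_on, int(qty.group(1)))
```

Documents that fail extraction go to a review queue instead of the index, so the retrieval layer only ever sees records that passed the schema.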
Confidence in data that has not been verified. Supply chain teams believed they had accurate inventory before RFID proved otherwise. The gap between assumed and actual accuracy was enormous. AI teams today routinely overestimate the quality of their training and retrieval data. A 2026 analysis on LLM data quality makes the same point: AI agents querying a data warehouse see tables and columns, but they do not see quality scores, freshness indicators, or certification status. They have no way to know whether the data they are reading is trustworthy for the current use case.
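One way to close that gap is to carry quality signals alongside each chunk and filter on them before anything reaches the model. In this sketch the `certified` flag, the `updated_at` field, and the 90-day default are assumptions, not a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetrievedChunk:
    content: str
    updated_at: datetime
    certified: bool  # hypothetical flag set by whoever owns the source data


def trusted_chunks(
    chunks: list[RetrievedChunk],
    now: datetime,
    max_age: timedelta = timedelta(days=90),
) -> list[RetrievedChunk]:
    """Keep only certified chunks that were updated within max_age of now."""
    return [c for c in chunks if c.certified and now - c.updated_at <= max_age]
```

Passing `now` in explicitly keeps the filter deterministic and easy to test. The threshold belongs to the use case: a pricing answer and a policy answer will tolerate very different ages.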
What Good Looks Like
Supply chain's answer was layered: a universal identifier standard at the product level, an encoding standard at the data carrier level, an interchange standard at the message level, and compliance requirements enforced by trading partners downstream. Each layer reduced the surface area for ambiguity before the data reached any intelligence system.
For LLM pipelines, the equivalent pattern is:
    # Note: extract_text, split_by_token_count, extract_and_validate_metadata,
    # split_by_semantic_boundary, Chunk, and IngestionError are not defined in
    # this article; treat them as placeholders for your own pipeline code.

    # Bad: ingest whatever arrives, clean it up later
    def ingest_document(path: str) -> list[str]:
        text = extract_text(path)
        chunks = split_by_token_count(text, max_tokens=512)
        return chunks

    # Better: enforce structure at ingestion, validate before embedding
    def ingest_document(path: str, schema: DocumentSchema) -> list[Chunk]:
        text = extract_text(path)
        metadata = extract_and_validate_metadata(text, schema)
        if not metadata.is_valid:
            # Reject the document here instead of letting bad metadata reach the index
            raise IngestionError(f"Schema validation failed: {metadata.errors}")
        chunks = split_by_semantic_boundary(text, metadata)
        return [
            Chunk(
                content=chunk,
                entity_id=metadata.canonical_id,  # normalized identifier
                source=metadata.source,
                freshness=metadata.timestamp,
                schema_version=schema.version,
            )
            for chunk in chunks
        ]
The difference is not the chunking strategy or the embedding model. It is the enforcement of a data contract before anything touches the pipeline. Canonical identifiers. Freshness tracking. Schema versioning. The canonical identifier is the same property GS1 built into the barcode in 1973; freshness and versioning extend that discipline to content that changes over time.
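    # Validate at the boundary, not after the fact
    # (Pydantic v2 syntax; on Pydantic v1, use @validator instead of @field_validator)
    import re
    from datetime import datetime
    from typing import Literal

    from pydantic import BaseModel, field_validator

    class DocumentSchema(BaseModel):
        entity_id: str
        source_system: Literal["erp", "crm", "docs", "external"]
        content_type: Literal["policy", "product", "transaction", "reference"]
        timestamp: datetime
        version: str

        @field_validator("entity_id")
        @classmethod
        def must_be_canonical(cls, v: str) -> str:
            # Canonical IDs look like "ABC-123456". fullmatch rejects a trailing
            # newline, which re.match with "$" would let through.
            if not re.fullmatch(r"[A-Z]{3}-\d{6}", v):
                raise ValueError("entity_id must follow canonical format")
            return v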
This feels like overhead until the first time a downstream agent acts on stale or ambiguous data at scale. Then it feels like the only thing you wish you had done earlier.
The Organizational Parallel
There is one more thing supply chain got right that most AI teams have not internalized yet.
Data quality was not treated as a technical problem owned by the engineering team. It was treated as a shared contract across the entire supply chain, enforced at every node. Suppliers had compliance requirements. Retailers had receiving standards. The GS1 standards existed precisely because one node in the chain fixing their own data was insufficient. Every participant had to encode correctly for any participant to benefit.
LLM pipelines are the same. The model team can build the best retrieval architecture in the world, but if the team uploading source documents does not follow naming conventions, does not tag entities consistently, and does not update stale content, the retrieval will fail. Data quality is a shared organizational contract, not an engineering problem to be solved at the model layer.
Supply chain engineers fought that battle for thirty years before it became common knowledge. We are running the same playbook now, just with different vocabulary.
The principle that held: Intelligence at the top of the stack is bounded by the structure at the bottom. Fix the encoding. Everything else follows.