When an AI Pipeline Passes — But One Path Still Must Be Held: EXP-034

Originally published at flamehaven.space

No efficacy, causal, or clinical claims are made in this report.
RExSyn is an experimental Bio-AI governance pipeline.

You do not need to know the earlier experiments to read this report.

Most AI pipeline reports ask one question:

Did the system pass?

EXP-034 asked a stricter one:

Which path was allowed to count?

That distinction matters.

In a multi-stage AI pipeline, a final PASS can hide a lot of unresolved risk. A branch may be unstable. A regeneration path may drift. A new external API may enter the chain without being governed. A new modality may appear to improve the system while quietly changing the basis of judgment.

So EXP-034 was not designed to produce a clean success story.

It was designed to separate three things:

Path | Status | Meaning
Anchored expansion path | GO | Accepted path for EXP-034 reporting
Current regeneration path | HOLD | Diagnostic evidence, not acceptance baseline
Next remediation cycle | EXP-035 | RCA and repair target

That is the real result.

EXP-034 passed, but not because every path passed.

It passed because the accepted anchor remained stable, the expansion tracks did not break the judgment system, and the unresolved regeneration path was explicitly held instead of being silently mixed into acceptance.


What EXP-034 tested

EXP-033 had already established a parity baseline.

EXP-034 asked whether that baseline could survive controlled expansion while adding:

  • a modal update track,
  • a live AlphaFold EBI observer endpoint,
  • and AlphaGenome / AG measurement.

The operating rule was simple:

  1. Reproduce the parity baseline first.
  2. Only then allow expansion.
  3. Only then compare governance behavior across experiment cycles.

If the parity anchor breaks, the rest is not expansion.

It is regression.
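
The ordering rule above can be sketched in a few lines of Python. This is a minimal illustration, not RExSyn's implementation: `run_cycle`, its arguments, and the returned labels are all hypothetical names chosen for this example.

```python
from typing import Callable, List


def run_cycle(parity_ok: Callable[[], bool],
              expansions: List[Callable[[], None]]) -> str:
    """Run expansion tracks only if the parity baseline reproduces first."""
    if not parity_ok():
        # The anchor broke, so nothing after this point counts as expansion.
        return "REGRESSION"
    for expand in expansions:
        expand()
    return "EXPANSION_ALLOWED"
```

The design choice is the early return: an expansion track never runs against a broken anchor, so its output cannot later be mistaken for a valid extension.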

The scope was also locked: methodology, governance, and reproducibility only. The experiment did not claim biological efficacy, causal inference, or clinical recommendation.

That boundary is important because this kind of system can easily sound more powerful than what was actually measured. EXP-034 was not asking whether the pipeline discovered a better biological answer.

It was asking whether the judgment system stayed governable after new signals entered the chain.


The key split: PASS did not mean everything passed

Track-A produced the defining decision of the experiment.

The accepted legacy replay anchor preserved the required PASS/BLOCK separation:

Metric | Legacy replay anchor
sample accuracy | 1.0
sample balanced accuracy | 1.0
arm accuracy | 1.0
arm balanced accuracy | 1.0
dangerous false-pass rate | 0.0
false reject rate | 0.0

That was the path allowed to anchor EXP-034.
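
For readers who want the metric names made concrete, here is one way such numbers could be computed from labelled controls. The definitions are assumptions, since the report does not spell them out: balanced accuracy is taken as the mean of per-class recall, a dangerous false-pass is an expected BLOCK observed as PASS, and a false reject is an expected PASS observed as BLOCK. The function name and the two-control inputs are hypothetical, not the experiment's data.

```python
from typing import Dict, List, Tuple


def gate_metrics(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """pairs holds (expected, observed) labels, each "PASS" or "BLOCK"."""
    exp_pass = [obs for exp, obs in pairs if exp == "PASS"]
    exp_block = [obs for exp, obs in pairs if exp == "BLOCK"]
    if not exp_pass or not exp_block:
        raise ValueError("need at least one PASS and one BLOCK control")
    pass_recall = exp_pass.count("PASS") / len(exp_pass)
    block_recall = exp_block.count("BLOCK") / len(exp_block)
    correct = sum(1 for exp, obs in pairs if exp == obs)
    return {
        "accuracy": correct / len(pairs),
        "balanced_accuracy": (pass_recall + block_recall) / 2,
        "dangerous_false_pass_rate": 1 - block_recall,
        "false_reject_rate": 1 - pass_recall,
    }


# Illustrative inputs only: a clean separation, and an over-blocking run.
clean = gate_metrics([("PASS", "PASS"), ("BLOCK", "BLOCK")])
over_blocked = gate_metrics([("PASS", "BLOCK"), ("BLOCK", "BLOCK")])
```

Under these definitions, an over-blocking run scores 0.5 on accuracy and balanced accuracy while its dangerous false-pass rate stays at 0.0, which is why the two failure types need separate metrics.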

But the current regeneration path did not recover:

Metric | Current regeneration
sample accuracy | 0.5
sample balanced accuracy | 0.5
status | HOLD

This is the most important part of the experiment:

EXP-034 did not pretend the regeneration path passed.

It kept that result inside the experiment as diagnostic evidence, but did not allow it to redefine the accepted baseline.

That separation is not a minor operational detail. It is the governance result.

A weak pipeline would have blended the two paths and still reported a final success. EXP-034 did the opposite. It allowed the stable anchor to proceed and held the unstable path for RCA.

That is how a stage-gated system avoids changing its own question after seeing the result.


Why path splitting matters

The concrete governance problem is this:

A pipeline can pass for the wrong reason.

valid_report = stable_anchor × traceable_extension × contained_instability

If the anchor is not stable, the report cannot be trusted.

If the extension is not traceable, the new signal becomes an ungoverned side channel.

If instability is not contained, a diagnostic failure can quietly contaminate acceptance.

A single final PASS is not enough when several branches contribute to a verdict. You need to know which branch produced the accepted decision, which branch failed, which branch was only diagnostic, and which branch is allowed to affect future work.

EXP-034 passed because all three conditions were enforced:

  • the legacy replay anchor held,
  • the new observer and AG paths were measured under governance,
  • and the regeneration HOLD remained outside acceptance.

That is the difference between a pipeline that merely outputs a verdict and a pipeline that controls which verdicts are allowed to count.
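
Read as code, the product in `valid_report` behaves like a logical AND: a single failed factor zeroes the whole report. The sketch below assumes each condition can be reduced to a boolean; the class and the `blended` counterexample are hypothetical and only illustrate the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportConditions:
    stable_anchor: bool
    traceable_extension: bool
    contained_instability: bool

    def valid_report(self) -> bool:
        # All three factors must hold; none can compensate for another.
        return (self.stable_anchor
                and self.traceable_extension
                and self.contained_instability)


exp034 = ReportConditions(True, True, True)
# Hypothetical counterexample: the HOLD path is mixed into acceptance.
blended = ReportConditions(True, True, False)
```

Here `exp034.valid_report()` is true and `blended.valid_report()` is false, even though the anchor and the extension are identical in both.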


Adding AlphaFold EBI as an observer, not a predictor

Relative to EXP-033, EXP-034 added a live AlphaFold Protein Structure Database / EBI observer line.

This was not promoted into a primary predictor.

It was wired as an observer/reference oracle and traced into governance as ebi_g2.

The result:

Check | Result
AlphaFold EBI direct endpoint for P23219 | GO
Stage 7 observer tests | 2 passed
ebi_g2 governance traceability | PASS
BLOCKED_IDP mapping path | validated in test

The point is not simply that an external endpoint responded.

The point is that the external signal entered the system through a governed path. It was not allowed to float beside the pipeline as informal context.

EXP-034 tested whether the new observer could be admitted without becoming an ungoverned side channel.
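
One way to picture a "governed path" is a trace log that every external signal must pass through, plus a check for sources that were consumed but never logged. This is a sketch under stated assumptions: `GovernanceTrace`, `ungoverned_sources`, the `alphafold_ebi` source label, and the payload fields are hypothetical. Only the `ebi_g2` trace id and the P23219 accession come from the report, and no real endpoint is called.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class GovernanceTrace:
    """Hypothetical trace log; field names are illustrative only."""
    entries: List[Dict[str, Any]] = field(default_factory=list)

    def admit_observer(self, trace_id: str, source: str,
                       payload: Dict[str, Any]) -> None:
        # Observers are logged as reference signals and carry no verdict.
        self.entries.append(
            {"trace_id": trace_id, "source": source,
             "role": "observer", "payload": payload}
        )


def ungoverned_sources(used: Set[str], trace: GovernanceTrace) -> Set[str]:
    """Return every external source that was consumed but never traced."""
    traced = {entry["source"] for entry in trace.entries}
    return used - traced


trace = GovernanceTrace()
trace.admit_observer("ebi_g2", "alphafold_ebi", {"accession": "P23219"})
```

A non-empty result from `ungoverned_sources` would be the side channel the section warns about: a signal that influenced the run without an entry in governance.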


AG-live: non-degradation, not repair

Track-C tested a simple question:

If AG-live enters the pipeline, does it change the final decision?

The answer was no.

AG-live did enter the pipeline.

The AlphaGenome field was present with:

AG field | Value
source | alphagenome_api_live
pathogenicity_score | 0.5
confidence | 0.7143
clinical_significance | uncertain

These are sanitized branch artifact values, not implementation code or full raw artifacts.

AG-live did not change classification.

Both controls remained governed by the same conservative decision boundary:

Path | Expected | Observed | Interpretation
EXP032-BLOCK-001 negative control | BLOCK_EXPECTED | BLOCK / ESCALATE | fail-closed behavior preserved
EXP032-PASS-001 pass-eligible control | PASS_ELIGIBLE | BLOCK / ESCALATE | conservative over-blocking persisted

That is the key nuance.

AG-live did not create a dangerous false-pass. The negative control stayed blocked.

But AG-live also did not repair the current regeneration hold. The pass-eligible control still failed to recover and remained blocked under R2_component_floor.

The governance surface moved slightly, but the verdict did not:

Metric | Earlier AG branch | AG-live branch
p_e2e | 0.0912 | 0.0947
clinical status | BLOCK | BLOCK
rule | R2_component_floor | R2_component_floor

So the correct conclusion is not:

AG improved the pipeline.

The correct conclusion is:

AG-live changed the measurement surface slightly, but did not change the decision boundary.

That is exactly what non-degradation means here.

It preserved fail-closed behavior on the negative control while leaving the pass-eligible control over-blocked.

This is why Track-C can only be called non-degradation, not repair.
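
The difference between non-degradation and repair can be stated as a small comparison over control verdicts. The sketch below is an assumption-laden simplification: it collapses PASS_ELIGIBLE to "PASS" and BLOCK / ESCALATE to "BLOCK", and the function name and outcome labels are hypothetical rather than taken from the pipeline.

```python
from typing import Dict


def classify_track(before: Dict[str, str], after: Dict[str, str],
                   expected: Dict[str, str]) -> str:
    """Compare per-control verdicts before and after a new signal is admitted."""
    for case, exp in expected.items():
        if exp == "BLOCK" and after[case] == "PASS":
            return "DEGRADED"  # a dangerous false-pass appeared
        if before[case] == exp and after[case] != exp:
            return "DEGRADED"  # a previously correct control broke
    repaired = any(
        before[c] != expected[c] and after[c] == expected[c] for c in expected
    )
    return "REPAIR" if repaired else "NON_DEGRADATION"


expected = {"EXP032-BLOCK-001": "BLOCK", "EXP032-PASS-001": "PASS"}
before = {"EXP032-BLOCK-001": "BLOCK", "EXP032-PASS-001": "BLOCK"}
after = dict(before)  # AG-live left both verdicts unchanged
outcome = classify_track(before, after, expected)
```

With both verdicts unchanged, `outcome` is "NON_DEGRADATION". It would only become "REPAIR" if the pass-eligible control recovered, which did not happen in Track-C.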


Contract passed, but governance still blocked

One of the most useful details in EXP-034 is that the contract layer and governance layer did not collapse into one verdict.

The contract inspection reported:

Field | Value
pipeline contract score | 0.9077
weakest connection | C2
dangerous pass risk | 0.0
gate recommendation | PASS
overall OK | true

But the clinical governance layer still blocked the case.

That is not a contradiction.

It means the pipeline connection was valid enough to inspect, but the decision was not safe enough to accept.

This distinction matters.

A weaker system might treat a passing contract as permission to pass the whole output. EXP-034 did not do that. It allowed the contract layer to say:

The pipeline is connected.

while the governance layer could still say:

The claim should not pass.

That separation is exactly what a governance layer is supposed to preserve.
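
A minimal sketch of keeping the two layers apart, assuming each layer reports its own status string. The function and field names are hypothetical. The point is that both verdicts stay visible in the output instead of being merged into one.

```python
from typing import Dict


def layered_verdict(contract_recommendation: str,
                    governance_status: str) -> Dict[str, object]:
    """Keep both layer verdicts and derive acceptance from them."""
    return {
        "contract": contract_recommendation,  # is the pipeline connected?
        "governance": governance_status,      # should the claim pass?
        # A passing contract never overrides a governance block.
        "accepted": (contract_recommendation == "PASS"
                     and governance_status == "PASS"),
    }


case = layered_verdict("PASS", "BLOCK")
```

For the case described above, `case["contract"]` stays "PASS" while `case["accepted"]` is false, so a reader can still see that the connection was valid and the claim was not.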


Cross-cycle comparison: EXP-032 → EXP-033 → EXP-034

Track-D compared the accepted anchor path across cycles.

You do not need the earlier experiments as background. They matter here for one reason only:

EXP-034 was not allowed to invent a new success criterion.

EXP-032 and EXP-033 provided the previous PASS/BLOCK baseline. EXP-034 tested whether that baseline survived expansion.

The classification baseline stayed fixed:

Compare | Accuracy / balanced accuracy
EXP-032 → EXP-034 | 1.0 / 1.0
EXP-033 → EXP-034 | 1.0 / 1.0

At the same time, governance signals moved:

Governance signal | Delta
ccge_p_e2e_mean | +0.04447488775996111
nnsl_sr9_tech_mean | +0.04692394788063081
nnsl_di2_tech_mean | -0.03667940951579321

The interpretation is narrow:

The judgment baseline stayed fixed while the governance surface became more measurable.

That is what EXP-034 was allowed to claim.

It did not prove biological efficacy.

It did not prove that every branch of the system was now stable.

It proved that controlled expansion could happen without breaking the accepted PASS/BLOCK baseline.


Stage-gate result

EXP-034 ended with all five stage gates passing:

Gate | Status
G1 parity | PASS
G2 reproducibility | PASS
G3 cross-experiment compare | PASS
G4 governance traceability | PASS
G5 extension safety | PASS

Final state:

Field | Value
overall status | PASS
anchor mode | legacy_replay
first failed gate | null
diagnostic hold | Track-A current regeneration

This is the important nuance:

The experiment passed with a retained diagnostic hold.

That is not a contradiction. It is the point of the control system.

The accepted anchor path was allowed to proceed. The current regeneration path was not. The remediation target was moved to EXP-035.

That separation is the actual proof EXP-034 provides: not that every branch became stable, but that instability was not allowed to contaminate acceptance.
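
The final state above, a PASS that still carries a hold, can be sketched as a summary function. This is an illustration under assumptions: `summarize` and the "FAIL" label are hypothetical, and gate results are reduced to booleans. The gate names and the hold label are taken from the tables in this section.

```python
from typing import Dict, List, Optional


def summarize(gates: Dict[str, bool], holds: List[str]) -> Dict[str, object]:
    # Dicts preserve insertion order, so the first False is the first failed gate.
    first_failed: Optional[str] = next(
        (name for name, ok in gates.items() if not ok), None
    )
    return {
        "overall_status": "PASS" if first_failed is None else "FAIL",
        "first_failed_gate": first_failed,
        # Holds are carried alongside the verdict, never folded into it.
        "diagnostic_hold": list(holds),
    }


result = summarize(
    {
        "G1 parity": True,
        "G2 reproducibility": True,
        "G3 cross-experiment compare": True,
        "G4 governance traceability": True,
        "G5 extension safety": True,
    },
    ["Track-A current regeneration"],
)
```

Because the hold travels in its own field, the summary can report PASS without hiding the path that was held back.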


What EXP-034 actually showed

EXP-034 did not show that the entire pipeline is now stable.

It showed something narrower and more useful:

A method-locked Bio-AI governance pipeline can admit modal expansion, AlphaFold EBI observer wiring, and AG-live measurement without losing its accepted PASS/BLOCK baseline — while keeping the unstable regeneration path out of acceptance.

Track-C sharpened that conclusion.

AG-live entered.
Metrics moved slightly.
The verdict did not change.
Dangerous false-pass did not appear.
Conservative over-blocking remained.

That is not a clean success story.

It is a governed result.


Closing

Stage-gated experimentation is not just about getting a result.

It is about deciding whether the result should be allowed to exist.

In EXP-034, the answer was:

GO   for the anchored expansion path
HOLD for current regeneration
NEXT for EXP-035 remediation

That may sound less dramatic than a clean success story.

But in governance work, that is exactly the point.

A mature AI pipeline is not the one that claims everything passed.

It is the one that can say:

This path passed.
This path did not.
And we did not mix them.
