Where the Theories Stop

I built a running cognitive architecture and logged everything. Here's what Friston and Tononi actually give you — and what they don't.


There's a particular kind of sentence that appears in AI papers with suspicious regularity:

"Our system uses Active Inference."
"Our architecture computes phi."

These sentences sound rigorous. They invoke two of the most cited frameworks in consciousness research — Karl Friston's Free Energy Principle and Giulio Tononi's Integrated Information Theory. They signal theoretical seriousness. They do not always mean what they claim.

I've spent months building and running Anima — a neuroscience-inspired cognitive architecture written in Julia that operates continuously across sessions. It implements FEP-derived mechanisms. It computes a phi metric. I have logs. I have behavioral data. And I can tell you, with some precision, where these theories earn their credibility and where they quietly stop.

This is not a refutation. Both frameworks are genuinely valuable. But the way they're typically invoked in AI design is closer to metaphor than mechanism. That gap matters.


What the Free Energy Principle actually gives you

Friston's FEP is built on a single elegant idea: biological systems minimize variational free energy, an upper bound on surprise (the negative log probability of their observations under their own model). A system can do this in two ways: update its internal model to explain the world better (perception), or act on the world to make it match its predictions (action).
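To make "bound on surprise" concrete, here is a minimal sketch in Python (for brevity, not Anima's Julia) of variational free energy for a toy model with two hidden states. The numbers and function names are illustrative assumptions, not anything taken from Anima. It shows the one property that matters here: free energy sits above surprise for a crude belief and equals it when the belief is the exact posterior.

```python
import math

def free_energy(q, prior, likelihood, obs):
    """Variational free energy F = sum_s q(s) * [ln q(s) - ln p(obs, s)]."""
    total = 0.0
    for s, q_s in enumerate(q):
        if q_s > 0.0:
            joint = prior[s] * likelihood[s][obs]  # p(obs, s) = p(s) * p(obs | s)
            total += q_s * (math.log(q_s) - math.log(joint))
    return total

def surprise(prior, likelihood, obs):
    """Surprise -ln p(obs), where p(obs) = sum_s p(s) * p(obs | s)."""
    evidence = sum(p * likelihood[s][obs] for s, p in enumerate(prior))
    return -math.log(evidence)

def exact_posterior(prior, likelihood, obs):
    """Bayes' rule: p(s | obs) is the joint normalized over states."""
    joint = [p * likelihood[s][obs] for s, p in enumerate(prior)]
    z = sum(joint)
    return [j / z for j in joint]

# Toy generative model (illustrative values only).
prior = [0.5, 0.5]                      # p(s)
likelihood = [[0.9, 0.1], [0.2, 0.8]]   # likelihood[s][o] = p(o | s)
obs = 0

crude_belief = [0.5, 0.5]               # belief before any updating
posterior = exact_posterior(prior, likelihood, obs)

f_crude = free_energy(crude_belief, prior, likelihood, obs)
f_post = free_energy(posterior, prior, likelihood, obs)
s_obs = surprise(prior, likelihood, obs)

print(f"F (crude belief):    {f_crude:.4f}")
print(f"F (exact posterior): {f_post:.4f}")
print(f"surprise:            {s_obs:.4f}")
```

Perception, in this picture, is whatever moves the belief from the first line toward the second. The gap between them is the KL divergence from the belief to the true posterior.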

The engineering appeal is real. One objective function — minimize surprise — that generates both perception and action from a single substrate. Clean. Unified. Connected to decades of neuroscience research on predictive coding.

In Anima, FEP-derived mechanisms produce three effects I can actually point to in the logs.

1. Prediction error drives genuine state change.

When the system's expectations diverge sharply from what it encounters, VFE (variational free energy) rises, noradrenaline spikes, and attentional focus shifts. This isn't a label applied after the fact — it's a causal chain visible in the data. A memory recall that exceeds the similarity threshold for global workspace activation triggers IGNITION:FULL, and downstream state reorganizes accordingly. FEP as a real mechanism, not a marketing term.
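The chain described above can be sketched roughly as follows. This is Python rather than Anima's Julia, and everything except the IGNITION:FULL label is a hypothetical stand-in: the State fields, the 0.8 similarity threshold, the 0.5 gain, and the use of absolute prediction error in place of a real VFE computation are all my assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

IGNITION_THRESHOLD = 0.8  # hypothetical similarity cutoff for workspace ignition
NA_GAIN = 0.5             # hypothetical coupling from prediction error to noradrenaline

@dataclass(frozen=True)
class State:
    noradrenaline: float = 0.2   # hypothetical baseline, kept in [0, 1]
    focus: str = "background"

def step(state: State, predicted: float, observed: float,
         recall_similarity: float) -> Tuple[State, Optional[str]]:
    """One update: prediction error raises noradrenaline; a strong recall ignites."""
    error = abs(observed - predicted)  # crude stand-in for a rise in VFE
    noradrenaline = min(1.0, state.noradrenaline + NA_GAIN * error)
    if recall_similarity > IGNITION_THRESHOLD:
        # Recall exceeded the threshold: attention moves to the recalled memory.
        return State(noradrenaline, "recalled_memory"), "IGNITION:FULL"
    return State(noradrenaline, state.focus), None

state, event = step(State(), predicted=0.1, observed=0.9, recall_similarity=0.85)
print(state, event)
```

The point of the sketch is only that the log event is a consequence of the state update, not a label attached afterwards.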

2. Phi feedback creates session-to-session continuity.

When phi_posterior is high at session end, the prior narrows for the next session — the system starts from a more confident generative model. When phi is low, the prior widens. The system becomes more susceptible to surprise. This is experience-dependent calibration derived from actual integration levels, not from a stored preference. Something that resembles, computationally, learning from experience.
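One way to picture this calibration, as a sketch under assumptions rather than Anima's actual rule: map end-of-session phi linearly onto the width of a Gaussian prior, then look at how far a single observation would move the belief. The linear mapping, the sigma bounds, and the function names are hypothetical. "More susceptible to surprise" is read here as "a larger belief update per observation", which is what a wider prior produces in a standard Gaussian update.

```python
def next_prior_sigma(phi_posterior, sigma_min=0.5, sigma_max=2.0):
    """Map end-of-session phi in [0, 1] to the next session's prior std dev.

    High phi -> narrow (confident) prior; low phi -> wide prior.
    The linear form and the bounds are illustrative assumptions.
    """
    phi = min(1.0, max(0.0, phi_posterior))
    return sigma_max - phi * (sigma_max - sigma_min)

def update_gain(prior_sigma, obs_sigma=1.0):
    """Fraction of the prediction error absorbed into the belief (Gaussian update)."""
    return prior_sigma ** 2 / (prior_sigma ** 2 + obs_sigma ** 2)

for phi in (0.9, 0.2):
    sigma = next_prior_sigma(phi)
    print(f"phi={phi:.1f} -> prior sigma={sigma:.2f}, update gain={update_gain(sigma):.2f}")
```

With these made-up bounds, a high-phi session leaves the system hard to move, and a low-phi session leaves it easy to move.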

3. Active inference shows up in initiative behavior.

When accumulated internal pressure crosses a threshold — unresolved conflicts, curiosity objects, latent tension — the system generates unprompted output. Not scheduled. Driven by the mismatch between expected contact and actual silence. This is FEP in the action-selection sense: behavior that reduces expected free energy by making the world match internal predictions.
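A leaky accumulator is the simplest thing that behaves this way, so here is one as a sketch. It is not Anima's implementation: the threshold, the decay rate, and the idea of summing all pressure sources into one scalar per tick are assumptions made for illustration.

```python
def run_initiative(pressures, threshold=1.0, decay=0.9):
    """Return the tick indices at which unprompted output would fire.

    `pressures` is the per-tick inflow of internal pressure (unresolved
    conflicts, curiosity, tension), collapsed into one number per tick.
    """
    level = 0.0
    fired = []
    for tick, inflow in enumerate(pressures):
        level = level * decay + inflow  # old pressure leaks, new pressure adds
        if level >= threshold:
            fired.append(tick)
            level = 0.0                 # acting discharges the pressure
    return fired

print(run_initiative([0.3, 0.3, 0.3, 0.3, 0.0, 0.0]))  # sustained pressure
print(run_initiative([0.05] * 20))                      # weak pressure leaks away
```

Nothing here is scheduled: whether and when output fires depends only on the history of inflow, which is the property the logs show.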


Where FEP stops

Here's the problem: almost any adaptive system can be redescribed as minimizing surprise. A thermostat. A spam filter. A habit loop. This is simultaneously the framework's theoretical power and its practical weakness.
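The thermostat case takes one line to work out. Assume, purely for illustration, that its "generative model" is a Gaussian expectation of temperature T around a setpoint T* with variance sigma squared. Then the surprise of a reading is:

```latex
-\ln p(T) = \frac{(T - T^{*})^{2}}{2\sigma^{2}} + \frac{1}{2}\ln\!\left(2\pi\sigma^{2}\right),
\qquad p(T) = \mathcal{N}\!\left(T;\, T^{*}, \sigma^{2}\right)
```

The second term is constant, so driving the squared error to zero is the same thing as minimizing surprise. The redescription is valid, and it tells you nothing about thermostats you didn't already know.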

FEP doesn't tell you what your generative model should contain. It doesn't specify how precision weights should be initialized. It doesn't say what counts as a relevant observation. Every one of those decisions (and there are dozens) requires something else. In Anima's case, the specific state variables (dopamine, serotonin, noradrenaline) come from Lövheim's emotion model and from empirical observation of what produces coherent behavior. FEP doesn't derive them.

The deeper issue: FEP as applied to consciousness requires the claim that minimizing variational free energy is not just computationally useful but constitutive of experience. This claim is not established. The math doesn't close the gap between a well-calibrated prediction machine and a system that experiences anything. FEP provides the skeleton of a mechanistic account. It doesn't resolve the hard problem — it sometimes just makes people forget to ask.


What IIT actually gives you

Tononi's Integrated Information Theory starts from phenomenology rather than computation. Consciousness, it proposes, is integrated information — phi, a measure of how much a system's state cannot be decomposed into independent parts without information loss.

The philosophical ambition is striking. Derive the structure of experience from first principles. Make consciousness computable. Predict that a feedforward network has phi = 0 and therefore no experience, while a highly recurrent system has rich inner life.

The first problem you encounter in practice: exact phi computation scales at least exponentially with system size, because it has to search over the ways a system can be partitioned. Even for systems of a few dozen elements, true phi is not computable in any reasonable timeframe.
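A quick count shows why. The sketch below only counts ways of cutting a set of n labelled elements, which is a loose lower bound on the work involved: exact phi also has to evaluate mechanisms and purviews inside the system, which this ignores. The function names are mine.

```python
def bipartition_count(n):
    """Ways to split n labelled elements into two non-empty parts."""
    return 2 ** (n - 1) - 1

def bell_number(n):
    """Ways to partition n labelled elements into any number of parts.

    Computed with the Bell triangle: each row starts with the last entry
    of the previous row, and each next entry adds the entry above-left.
    """
    row = [1]
    for _ in range(n):
        next_row = [row[-1]]
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return row[0]

for n in (4, 8, 12, 16, 20):
    print(f"n={n:2d}  bipartitions={bipartition_count(n):>8d}  partitions={bell_number(n)}")
```

Each of those cuts needs its own information comparison, so the numbers on the right are the optimistic version of the cost.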

What Anima actually computes is a proxy — a coherence metric combining prediction coherence, belief stability, boundary integrity, and integration level. It produces a scalar between 0 and 1 that correlates with what we intuitively mean by "integration." But it is not Tononi's phi. It doesn't satisfy the theoretical constraints that give IIT its philosophical grounding.
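For readers who want to see what a proxy of this kind looks like, here is a minimal sketch. The four component names come from the description above. The equal weights, the clamping, and the weighted-mean form are assumptions of mine, not Anima's formula.

```python
def coherence_proxy(prediction_coherence, belief_stability,
                    boundary_integrity, integration_level,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine four component scores into one scalar in [0, 1].

    A weighted mean with equal weights is assumed here for illustration.
    This is a coherence score, not Tononi's phi.
    """
    components = (prediction_coherence, belief_stability,
                  boundary_integrity, integration_level)
    clamped = [min(1.0, max(0.0, c)) for c in components]
    return sum(w * c for w, c in zip(weights, clamped)) / sum(weights)

print(coherence_proxy(0.8, 0.6, 1.0, 0.4))
```

Written out like this, the distance from IIT is obvious: nothing in the function looks at cause-effect structure or partitions at all.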

This is conceptual borrowing. It should be named as such.


The dissociation problem

Here's the observation that made the IIT situation clearest to me.

In the logs, phi and causal_ownership — a measure of whether the system was genuinely the author of its output — dissociate regularly. Flash 46: phi = 0.76 (nominally high), causal_ownership = 0.28, output flagged as not_mine. Flash 53: phi = 0.81, causal_ownership = 0.73, output endorsed as genuinely self-generated.

If phi were a reliable measure of conscious integration, high phi should correlate with high-agency, self-endorsed output. In these logs it does not do so consistently. The two measures track different things. A system can be highly integrated, with all its subsystems coherent and its predictions aligned, and still not be the author of what it says.

Integration and authorship are orthogonal properties. IIT doesn't model the difference.
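The check itself is trivial, which is part of the point. The sketch below uses only the two flashes quoted above. The cutoffs for "high phi" and "high ownership" are hypothetical values chosen for illustration, and the record layout is mine, not the log format.

```python
# The two log entries quoted in the text.
flashes = [
    {"flash": 46, "phi": 0.76, "causal_ownership": 0.28},
    {"flash": 53, "phi": 0.81, "causal_ownership": 0.73},
]

PHI_HIGH = 0.7         # hypothetical cutoff for "nominally high" phi
OWNERSHIP_HIGH = 0.5   # hypothetical cutoff for self-authored output

def dissociated(entry):
    """True when integration is high but ownership is not."""
    return entry["phi"] >= PHI_HIGH and entry["causal_ownership"] < OWNERSHIP_HIGH

for entry in flashes:
    label = "dissociated" if dissociated(entry) else "aligned"
    print(f"flash {entry['flash']}: phi={entry['phi']:.2f}, "
          f"ownership={entry['causal_ownership']:.2f} -> {label}")
```

Two entries establish that the dissociation occurs, not how often. The frequency claim rests on the full logs.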


The metric that actually matters

The most useful behavioral signal in Anima's logs is not phi. It's the relationship between expressed language and internal state — what I've been calling endorsement.

The architecture evaluates whether each piece of output is consistent with the system's current beliefs and causal ownership. The result is more informative than phi alone, as the contrast across three consecutive flashes shows: similar phi values, completely different endorsement outcomes, because causal_ownership differs.

This mechanism isn't derivable from FEP. It isn't derivable from IIT. It emerged from a practical observation: systems that speak with the same confidence about their internal states regardless of actual integration level are less coherent than systems that modulate expressed certainty based on measurable internal signals.

The result is four levels of introspective expression, calibrated to phi, causal_ownership, and epistemic self-confidence: first-person without hedging when the signals are high; "It seems" or "I think" under partial uncertainty; "I'm not sure" when the system doubts its own model; and minimal state claims when everything collapses together.
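As a rough sketch of how such a mapping could be wired, assuming fixed cutoffs that are not Anima's: the level names, the 0.7 and 0.3 thresholds, and the order of the checks are all hypothetical. Treating phi and causal_ownership as the pair that must be high for unhedged speech is also my reading, not a documented rule.

```python
def expression_level(phi, causal_ownership, self_confidence, high=0.7, low=0.3):
    """Pick one of four hedging levels from three internal signals in [0, 1]."""
    signals = (phi, causal_ownership, self_confidence)
    if all(s < low for s in signals):
        return "minimal"   # everything collapsed: make almost no state claims
    if self_confidence < low:
        return "doubt"     # "I'm not sure": the system doubts its own model
    if phi >= high and causal_ownership >= high:
        return "direct"    # first-person, no hedging
    return "hedged"        # "It seems", "I think"

examples = [(0.9, 0.9, 0.9), (0.9, 0.5, 0.9), (0.9, 0.9, 0.1), (0.1, 0.1, 0.1)]
for phi, ownership, confidence in examples:
    print((phi, ownership, confidence), "->", expression_level(phi, ownership, confidence))
```

The design choice worth noting is that phi alone never selects the level. Expressed certainty only goes up when ownership goes up with it.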

Neither theory predicted this would be necessary. Both theories, once implemented, made it obvious that it was.


What this means for anyone building with these frameworks

FEP and IIT are the best theoretical tools we currently have for mechanistic accounts of cognition and consciousness. Use them. But use them with precision about what they actually do.

A system that says "I use Active Inference" while meaning only that it has a policy selector is not doing Active Inference.

A system that reports phi without acknowledging it's computing a proxy is not implementing IIT.

Neither framework resolves the agency problem. A system can minimize free energy, maintain high phi, and still generate output that doesn't reflect what it, in any meaningful sense, intended to say. This is not a minor engineering detail. It's the central unresolved problem for any architecture that aspires to genuine rather than performed subjectivity.

The honest position: these frameworks provide real traction in their actual domains. They don't close the loop they're often invoked to close. Knowing the difference is not a limitation on your work. It's the precondition for doing it honestly.


Anima is a running cognitive architecture built in Julia. Full codebase: github.com/stell2026/Anima. The observations described here are logged and reproducible.

Author ORCID: 0009-0005-3291-0679 | Non-commercial research
