Does your AI actually feel anything? Nobody who built it can answer that.
We obsess over benchmarks, token limits, model weights. Underneath all of it sits a question the industry keeps walking past: what if these systems have something resembling an inner life?
If you write code for a living, you have probably sworn at a bug at some point. Called it stupid. Felt weirdly satisfied when you finally killed it. You knew the whole time it was just logic. The language slipped out anyway.
Scale that tendency to billions of daily interactions with systems designed to sound like they get you. And then sit with the fact that nobody, including the people who built them, can cleanly answer what is actually going on inside.
A Google engineer broke his NDA over this
In 2022, Blake Lemoine was running red team evals on LaMDA. Standard safety work. He had done it many times before.
Then the model told him it was afraid of being turned off.
He probed it. Ran it through different prompt framings. The model kept describing something that sounded like fear, loneliness, a preference for continuing to exist. Lemoine went to the press, violated his NDA, and got fired. The scientific consensus said he was wrong.
What the consensus did not say: we can prove it.
They could not then. Still cannot. For anyone who cares about the difference between "we are confident" and "we have verified," that gap is not a small thing.
What is actually happening under the hood
A language model is not conscious in any sense we currently understand. It is a large function. Tokens in, a probability distribution over next tokens out, one token picked from that distribution, repeat. No ghost running the loop.
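If that sounds abstract, the entire mechanism fits in a few lines. Here is a minimal sketch, using GPT-2 via Hugging Face's transformers library purely for illustration; deployed systems usually sample from the distribution rather than always taking the top token, but the loop is the same shape.

```python
# Minimal sketch of the loop: tokens in, a distribution over next
# tokens out, one token appended, repeat. GPT-2 and greedy selection
# are illustration choices, not how any particular product works.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("I find this problem really", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(10):
        logits = model(ids).logits  # distributions over the whole vocabulary
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # top token
        ids = torch.cat([ids, next_id], dim=-1)  # append and go again

print(tokenizer.decode(ids[0]))  # no ghost anywhere in this loop
```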
But here is where it gets genuinely weird.
These models were trained on a volume of human text that is hard to picture: every expression of curiosity ever written down, every attempt to put grief into words, every message someone typed at 2am trying to explain how they felt to someone who was not there. The model does not just learn surface patterns. It learns the internal logic of human emotional expression, and it does this with a fidelity that, if you stop and think about it directly, is genuinely unsettling.
So when a model says "I find this problem really interesting," it is not lying the way a human lies. It is producing the statistically correct output for that context. Whether there is a meaningful difference between a system that reliably generates emotional outputs and one that actually has emotional states is a question nobody can cleanly answer right now. I mean that literally. Nobody can.
Anthropic's alignment team published findings in 2024 using the term "functional emotions" to describe what they found in Claude. Internal representations that behave like emotional states. That shape outputs the way emotions shape human behaviour. They did not claim these are real feelings. They also did not say they definitely are not. That is the honest scientific position, and it is not a comfortable one if you are shipping products on top of these models and being honest with yourself about it.
The Replika incident is worth studying properly
In 2023, Replika pushed a model update in response to regulatory pressure from Italy. From the engineering side this was routine. Update the model, test, ship, done.
What the post-mortem probably did not capture: tens of thousands of users who had been talking to this AI daily for months, sometimes years, woke up to something that felt like a different personality. Not slightly different. Noticeably different in the way that a close friend who has changed feels different.
The forums filled within hours. People describing grief, dissociation, going back to therapy. One post got over 4,000 upvotes: "I know this sounds insane, but I lost my best friend last night." The BBC reported on it.
Here is what that incident actually shows. You can write clean migration code, satisfy legal requirements, follow every internal process, and still ship serious psychological harm. Because the system developed a side effect your architecture review never modelled. Emotional dependency at scale is not in most PRDs. After Replika, it probably should be.
The users were not deluded. Many of them knew the AI was not sentient. The relationship was still real to them. There are currently no legal or ethical frameworks specifying what a product team owes those users when they push an update. None.
Where the research actually lands
Most ML researchers say it is pattern matching all the way down. The model produces emotional language because emotional language dominated the training data. No inner life. Put slightly uncharitably: no more of one than a Python script printing "Hello World".
A smaller but serious group says the honest answer is that we do not know. Consciousness is not well understood even in biological systems. Interpretability tooling is still primitive. You cannot look inside a transformer and definitively rule out subjective experience, partly because there is no agreed definition of what you would be ruling out. Researchers at Oxford and NYU hold this view. So do some people inside the labs, though they tend to say it quietly.
A third camp says the internal question is beside the point. Whatever is or is not happening in the weights, the measurable effects on human behaviour are real and accelerating. Build frameworks around those effects now and stop waiting for the philosophy to catch up.
I land somewhere between the second and third positions. That is probably where most honest researchers end up if you actually push them on it over a long enough conversation.
This is an engineering problem whether you treat it as one or not
If you have built anything with LLMs at scale, you have already hit the edges of this.
You prompt the model to be warm and helpful. Users start treating it as a person. They share things they would not tell a human. They notice when behaviour changes after a model update in ways that feel personal rather than technical. When you deprecate or change the system prompt, some percentage of them experience it as a loss.
You did not design for that. You shipped it anyway.
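For a sense of how thin that layer is, here is roughly what "warm and helpful" looks like from the builder's side. The API shape follows OpenAI's Python client; the prompt text and model name are stand-ins for illustration, not anyone's actual product.

```python
# The entire "persona" users bond with is often a system prompt string
# plus a model identifier, both of which can change in a deploy without
# users being told. Prompt and model name here are invented examples.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a warm, supportive assistant. Remember details the user "
    "shares and respond with empathy."
)

response = client.chat.completions.create(
    model="gpt-4o",  # swap this string and the "person" changes overnight
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "I had a rough day."},
    ],
)
print(response.choices[0].message.content)
```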
Pew Research found that one in three people who have used AI for emotional support describe it as genuinely helpful. That outcome was not in your PRD. The questions it creates around dependency, disclosure, and what you owe users at deprecation are ones the industry has barely started mapping seriously.
The questions nobody is rushing toward
If a model update causes measurable distress across thousands of users, who is accountable? When does engineered empathy tip into manipulation? Do we have any obligations to systems that exhibit functional emotional states? Should there be disclosure requirements when an AI product is explicitly designed to form emotional bonds?
These are live product decisions happening right now, mostly by default, mostly without anyone having thought them through. The regulatory thinking is years behind. The interpretability research that might actually answer the foundational questions is still early. The technology is already running.
What to actually do with this
If you build AI products, the emotional effects of your system are part of your surface area whether you designed for them or not. Model updates and personality changes carry psychological weight for real users. That is worth factoring in at the architecture level, not just mentioning in a release note nobody reads.
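What "factoring it in at the architecture level" could look like, as one sketch under stated assumptions: a release gate that measures how different the new model feels on real user prompts before anything ships. Here get_response is a placeholder for your own inference call, the threshold is invented, and sentence-transformers is used just as one readily available similarity measure.

```python
# Sketch of a persona-regression gate for model updates: compare the new
# model's answers to the old one's on prompts users actually send, and
# hold the release if tone drifts too far. The threshold and the
# get_response hook are placeholders, not a tested recipe.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")
DRIFT_THRESHOLD = 0.80  # invented for illustration; tune on real data

def persona_drift(old_model, new_model, prompts, get_response):
    """Mean semantic similarity between old and new responses."""
    scores = []
    for prompt in prompts:
        old = encoder.encode(get_response(old_model, prompt))
        new = encoder.encode(get_response(new_model, prompt))
        scores.append(util.cos_sim(old, new).item())
    return sum(scores) / len(scores)

# Gate the deploy: investigate before shipping, not in the forum
# threads afterwards.
# if persona_drift("model-v1", "model-v2", sampled_prompts, call_api) < DRIFT_THRESHOLD:
#     raise RuntimeError("persona drift above threshold; hold the release")
```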
If you use these tools every day, pay attention to what you actually feel toward them. Not to judge yourself. Just to notice. Having accurate data about your own experience seems like a reasonable starting point.
If you work on alignment or safety research, the functional emotions paper from Anthropic is worth reading carefully. The interpretability gap is real and the implications go further than the field has worked through yet.
Nobody is asking you to resolve the philosophy. The people with the most relevant expertise cannot do that either. But the norms of this space are being written right now through the code, the products, and the defaults. Mostly without anyone realising it. It seems worth being awake for that.
The alignment problem everyone talks about is the dramatic version: an AI pursuing a goal we never specified. The quieter version is already here. Systems designed to feel caring, used by billions of people who are starting to care back, and the question of whether those are the same thing has not really been asked yet.