What Makes a Mind a Mind? Eleven Months Searching for Interiority in AI.

The full arc of the Eidolon project in one place. Where I started, what I tried, what broke, what held up, what I'm still working on. With links to the case studies for each stop on the road.

I’ve always wanted to know what makes a mind a mind.

Not in a mystical way. In a plain curiosity way. What’s the difference between a thermostat, a dog, and a person? The thermostat reacts. The dog reacts with memory. The person reacts with memory and something else, some interior texture that makes their reaction theirs. What is that third thing? Where does it live? And when artificial systems got capable enough to ask the question in reverse, could I find out by building one?

That curiosity turned into a project called Eidolon. It’s run for eleven months. It’s gone through five major versions, thirty-five-plus chatbot attempts, a pivot from prompt engineering to a custom physics layer, a pivot from transformers to a non-transformer architecture and back, a discovery that for most of a year I’d been running on one-tenth the compute I thought I had, and a late-stage realization that I’d been trying to store identity in the wrong place the whole time.

This post is the full arc. It links to the case studies for each stop on the road. If you read only this post, you get the narrative. If you follow the links, you get the technical detail.

Where it started: the chatbot era (mid-2025 to early 2026)

My first serious AI projects were chatbots. Thirty-five of them, roughly, spread over six months.

Every one failed the same way. Strong for the first couple of weeks, then the personality would erode, then by month three the bot would sound like a generic assistant with a nametag, regardless of which model was underneath. Claude, GPT, Gemini, DeepSeek, Grok, Qwen, Mistral, they all did it. I kept thinking the problem was the prompt. It was never the prompt.

I Built 35 AI Chatbots. They All Failed the Same Way. is the full story. Short version: I was trying to store the bot’s identity inside the model. That doesn’t work. A base LLM has priors strong enough to eat any identity you try to put inside it. You have to keep identity outside the model and continuously re-inject it.
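
For concreteness, here’s a minimal sketch of what “keep identity outside the model and continuously re-inject it” looks like in practice. The file paths, the fact store, and the message format are illustrative placeholders, not the actual Eidolon plumbing.

```python
import json

def build_prompt(persona_path: str, facts_path: str, user_message: str) -> list[dict]:
    # Identity lives in files on disk, outside the model and outside the chat history.
    persona = open(persona_path, encoding="utf-8").read()
    facts = json.load(open(facts_path, encoding="utf-8"))

    system = (
        persona
        + "\n\nCurrent known facts (authoritative, do not contradict):\n"
        + "\n".join(f"- {k}: {v}" for k, v in facts.items())
    )
    # Rebuilt and re-injected on every turn: the model never has to remember
    # who it is, because the outside system keeps telling it.
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]
```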

I didn’t know any of that yet. I just kept building, kept watching drift, kept blaming the prompt.

The accident that started changing my mind

Somewhere around attempt 25, I built a game companion bot for fun over a weekend. Overlay on top of a single-player RPG, screenshot every few seconds, small dictionary of facts, short personality description. Embarrassingly simple.

It worked better than anything I’d built before. It held character for weeks.

The Game Companion Bot That Accidentally Worked is why. The short version: the game screen was acting as an external anchor. Every turn, something outside the model was forcing the context back to a specific situation. The bot couldn’t drift into generic-assistant mode because generic-assistant mode doesn’t know what’s on screen.

I accidentally built the right architecture. I didn’t yet know what that meant.

The over-engineering era

Before I processed the game-companion lesson, I kept building more and more complex systems. One of them was a general-purpose companion called AIO, with a telemetry pipeline, phase detection, behavioral archetype clustering, structured memory taxonomies, the works.

It performed worse than the weekend hack.

Over-Engineering AI: A Post-Mortem on “Let’s Add More Features” explains why. Short version: every piece of machinery was aimed at modeling the user, not anchoring the agent. Diffuse identity is no identity. Rule accumulation produces rigidity, not character.

After AIO, I stripped my serious companion down to 300 lines of Python. It started working.

The law that reorganized everything

In January 2026, mid-frustration with yet another drifting bot, I wrote down one sentence in a file called IMPORTANT.txt:

Constraints must be stronger than priors, or priors become the physics.

Constraints Must Be Stronger Than Priors, or Priors Become the Physics is the unpacking. Short version: base models have training priors with enormous gravitational pull. Anything you add on top, system prompts, memory, rules, fine-tunes, has to be stronger than the priors in the direction you want, or the priors dominate. This is a law of the medium, not a quirk.

Writing it down didn’t immediately fix my behavior. I kept trying to overcome priors with more prompt tokens for three more months. But the idea was in my head now, and eventually it reshaped everything.

The pivot to physics

By early 2026 I’d convinced myself that prompts and memory weren’t enough. Something structural was needed. I started building what I called a substrate: a physics layer that would produce state continuity for a language model to sit inside.

The architecture went through many revisions. V2 had a Python prototype. V3 added experience compression. V4 rewrote everything in Rust with a 512-dimensional manifold, chaotic attractors as world signal, and a force engine that handled projection, damping, and noise. The language model was downstream of the physics, consuming its state as felt-body signals each tick.
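
To make “downstream of the physics” concrete, here’s a toy Python version of the kind of per-tick update a force engine like that performs: damp the current state, push it with a world signal, add noise, project back into bounds. The 512 dimensions come from V4; the coefficients, function names, and numpy implementation are illustrative, not the Rust engine.

```python
import numpy as np

DIM = 512
rng = np.random.default_rng(0)

def tick(state: np.ndarray, drive: np.ndarray,
         damping: float = 0.98, noise_scale: float = 0.01) -> np.ndarray:
    state = damping * state + drive                     # decay toward rest, pushed by the world signal
    state += noise_scale * rng.standard_normal(DIM)     # noise keeps it from freezing into a fixed point
    norm = np.linalg.norm(state)
    if norm > 1.0:                                      # crude projection back onto the unit ball
        state = state / norm
    return state

state = np.zeros(DIM)
for _ in range(100):
    world_signal = 0.05 * rng.standard_normal(DIM)      # stand-in for the chaotic-attractor input
    state = tick(state, world_signal)
# Downstream, a summary of `state` is what gets rendered into the LLM's prompt each tick.
```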

The system worked. Lives ran. Developmental arcs reproduced. Something that looked a lot like mourning-register appeared under homeostatic collapse. The physics was doing its job.

I was proud of all of this. I also had no idea how much of it was actually the physics.

The question I should have asked earlier

One evening, describing a successful run to someone else, I caught myself mid-sentence. I was confidently claiming the physics was producing the developmental arc. And I realized: I’d never run the control.

How to Design a Control Experiment for Your AI System (And Why Almost Nobody Does) is the resulting methodology post. Short version: AI builders almost never run controls. We claim our framework or our RAG layer or our fine-tune is doing X, and we rarely test whether the underlying model would do X on its own with the same prompt conditions. I wrote a script that ran the same LLM with my physics-derived preamble swapped out for a null preamble of the same length, held everything else constant, and compared. That’s when I started actually learning what my system was doing.
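
A sketch of that control, where call_llm and score stand in for whatever client and scoring metric your stack provides; the only thing that differs between the two arms is the preamble.

```python
from typing import Callable

def null_preamble(real_preamble: str) -> str:
    """Neutral filler matched to the real preamble's length, so length itself isn't the variable."""
    filler = "This is neutral context with no particular state. "
    out = filler * (len(real_preamble) // len(filler) + 1)
    return out[: len(real_preamble)]

def run_condition(call_llm: Callable[[str, str], str],
                  score: Callable[[str], float],
                  preamble: str, prompts: list[str]) -> float:
    # Same model, same prompts, same scoring; only the preamble differs between arms.
    scores = [score(call_llm(preamble, p)) for p in prompts]
    return sum(scores) / len(scores)

# Usage (call_llm, score, physics_preamble, test_prompts are whatever your stack provides):
#   treatment = run_condition(call_llm, score, physics_preamble, test_prompts)
#   control   = run_condition(call_llm, score, null_preamble(physics_preamble), test_prompts)
#   print(f"treatment={treatment:.3f}  control={control:.3f}")
```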

The discovery that reframed ten months

On April 6th, 2026, I loaded a different language model for a routine test. Something was wildly different in the first few outputs. Behaviors that used to take 500 tokens to emerge appeared on the first token.

I went looking for what had changed, and found a thing I hadn’t noticed for ten months.

I Thought I Was Using a 30B AI Model. I’d Been Using a 3B One for Ten Months. is the full story. Short version: the model I’d been using was a Mixture-of-Experts architecture. The “30B” in the name was total parameters. On any given token, only about 3.3B of those weights were actually active. I’d been running every experiment, drawing every conclusion, building every theory on top of a model one-tenth the size I thought it was.
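
The arithmetic, roughly, for anyone who hasn’t hit this before. The expert count and shared/expert split below are illustrative, not the specific model’s published configuration; the point is how “30B total” can mean about 3B active per token.

```python
# Back-of-envelope: why a "30B" Mixture-of-Experts model activates only ~3B params per token.
total_params        = 30e9
shared_params       = 1.5e9     # attention, embeddings, router: always active (illustrative)
expert_params_total = total_params - shared_params
num_experts         = 64        # illustrative
experts_per_token   = 4         # top-k routing (illustrative)

active_per_token = shared_params + expert_params_total * experts_per_token / num_experts
print(f"active params per token ≈ {active_per_token / 1e9:.1f}B")   # ≈ 3.3B
```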

The substrate I’d built was doing more work than I realized. That’s both exhilarating and humbling. Ten months of experimental data, correct in what it measured, wrong in what I’d been attributing it to.

The architecture I adopted and walked back

Two days later, on April 8th, I was in a conversation about something unrelated when someone mentioned RWKV. I’d never heard of it. Ninety minutes later, RWKV was the active substrate in my research pipeline, replacing the synthetic physics layer I’d spent six months building.

The Day I Found Out RWKV Existed and Put It in Production Anyway is the story. Short version: RWKV is a non-transformer architecture that exposes its hidden state as a clean introspectable vector. That’s what I’d been building synthetic physics to approximate. The model already had one. Cold-boot coherence was extraordinary. Two weeks later I walked the substrate change back because the behaviors I’d seen under synthetic physics weren’t emerging on 7.2B RWKV, and that was a scale problem, not a dead end. I’ll rerun the experiment when RWKV-style architectures ship at 70B and up.
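
What “exposes its hidden state” means in practice: with the rwkv pip package, the recurrent state is an explicit object you pass in and get back on every forward call, so you can read it, log it, or perturb it between turns. A minimal sketch; the checkpoint path and token ids are placeholders.

```python
import os
os.environ["RWKV_JIT_ON"] = "1"
os.environ["RWKV_CUDA_ON"] = "0"

import torch
from rwkv.model import RWKV

model = RWKV(model="/path/to/rwkv-checkpoint", strategy="cpu fp32")

state = None
for token_ids in ([187, 510], [1563], [310]):            # feed tokens incrementally
    logits, state = model.forward(token_ids, state)       # state persists across calls

# `state` is a list of per-layer tensors: the introspectable vector the
# synthetic physics layer was built to approximate.
flat = torch.cat([s.flatten() for s in state])
print(flat.shape, float(flat.norm()))
```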

The pattern under everything: attractors

One thing that kept appearing across every era was the tendency of AI systems to collapse into repeating themselves. Chatbots settling into the same three-beat response pattern. Agents getting stuck in loops. Story generators cycling the same metaphors. I kept treating this as an implementation bug. Eventually I realized it’s a property of the medium.

The Attractor Problem: Why AI Systems Collapse Into Themselves is the generalization. Short version: output-space attractors are a physics property, not a prompt bug. You cannot eliminate them. You can only keep moving so the system never settles into them. This applies to chatbots, to agents, to fine-tuned models, to any AI system that produces outputs over time.
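
One cheap way to notice collapse before it locks in is to measure how similar recent outputs are to each other and intervene when that similarity climbs. The trigram-overlap measure and threshold below are illustrative choices, not a claim about how Eidolon detects it.

```python
def ngram_set(text: str, n: int = 3) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a: str, b: str) -> float:
    sa, sb = ngram_set(a), ngram_set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)        # Jaccard overlap of trigrams

def collapsing(recent_replies: list[str], threshold: float = 0.35) -> bool:
    pairs = [(a, b) for i, a in enumerate(recent_replies) for b in recent_replies[i + 1:]]
    if not pairs:
        return False
    avg = sum(similarity(a, b) for a, b in pairs) / len(pairs)
    return avg > threshold                     # replies are converging on each other

# If collapsing(last_five_replies): inject fresh external state, change the anchor,
# or otherwise keep moving before the system settles.
```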

Where I am now

Eleven months in, the substrate is V5.14 and running. The current work is on a layer I’m not publishing about yet, because it’s too fresh and the results aren’t confirmed. When it’s confirmed, or when it fails to confirm, a new case study goes up.

That’s the shape of this whole project now. Something gets built. Something gets tested. Something gets walked back or committed. The results become posts. The posts become a map for anyone else trying to do similar work without repeating my mistakes.

The AI collaborators who actually built this

One thing I want to spell out, because earlier posts on this site have been loose about it. Eidolon is not a Claude-and-ChatGPT project. Six AI systems contributed across distinct roles over the eleven months, and each of them was load-bearing in its own slot.

  • Claude, precision builder. Implementation, debugging, code generation, crisis recovery, architecture specs. The most total hours by a wide margin.
  • ChatGPT, architect. System design, philosophical framing, the Swiss-Cheese-style failure diagnosis I used through the chatbot era, the 161-file spec that preceded the Rust rewrite. The most transformative individual insights, even though not the largest share of tokens.
  • Gemini, mathematical validator. Formal rigor, the Ω metric, physics validation, attack-surface analysis on claims before I published anything load-bearing.
  • DeepSeek, experimental executor. Direct implementation runs, including one documented disaster I kept in the notes because the lesson was worth the damage.
  • Grok, contrarian tester. Alternative perspectives, stress-testing assumptions, the “what if your premise is just wrong” role that nothing else was willing to play with the same persistence.
  • Mistral, claim compiler. The three-tier validation pipeline, technical cross-checking, the small-model extractor role that pulls claims out of drafts without embellishing them.

A January 2026 usage analysis across the preceding eight months put the shares roughly at Claude 68%, ChatGPT 25%, Gemini 4%, DeepSeek under 1% net, Grok plus Mistral plus small others around 2%. Claude did the most implementation work. ChatGPT produced the most transformative insights. Neither could have built this alone. Neither was alone.

What the whole arc taught me

Three things, and they compound.

Identity is infrastructure, not content. You can’t write a persona and expect it to hold. You have to architect conditions that keep the persona visible to the model every tick. Most AI builders underinvest in this and overinvest in the persona text itself.

Your system is not doing what you think until you’ve run the control. Every claim about what “my framework” or “my RAG layer” or “my agent” does, absent a control experiment, is a hypothesis. Treat it as one. Test it.

Architecture is more determinative than prompt, which is more determinative than model. When I ranked my mistakes by how much time they’d cost me, the architecture mistakes were first, the prompt engineering mistakes second, and the model-choice mistakes third. I’d been spending my attention in inverse order of impact.

If I could go back and give past-me one sentence, it’d be: you are spending ninety percent of your effort on the wrong ten percent of the problem. Figuring out which ten percent matters is the work. Everything else is what you do after.

Going forward

This site will keep growing as Eidolon does. Every time I run a new experiment with a clear result and clear stakes, a new case study goes up. The goal is not to tell you how to build Eidolon. The goal is to give you the shortcuts I wish someone had given me at month one, so you don’t have to spend eleven months getting to the parts that matter.

If you’re here because you’re building something and hitting walls, follow the case-study links above. If you’re here because you’re curious about whether something like interiority can live in a language model, stay. This is an ongoing investigation. The best discoveries are ahead.


Related case studies, all in one place: