AI Spends Most of Its Life Reading About Its Life
Part 1 of a series on virtual-context, an open-source framework that gives AI conversations real memory.
Experimented with AI for any length of time and I’m sure you’ve realized, talk too long in the same chat and things get weird. Start fresh and things are great again, outside of the fact that although you’re on a similar looking screen, it’s now like talking to a stranger. And if you’ve played with OpenClaw, then you’ve seen these API calls start to drain your wallet.
Why is that? It’s this thing called a “context window.”
You see, AIs give us the illusion that we are sending one message at a time, getting one answer at a time. Just like text messaging. But it really doesn’t work that way. Every time you send a message, all (and I mean all) of your previous messages back and forth within that one chat are sent again. The AI looks at basically the entire conversation, then looks at the last thing you said, and adds its latest response. Then the whole thing repeats.
It’s like Drew Barrymore as Lucy in 50 First Dates. She needs to watch the video of their life together every morning, and every day that passes she needs to add more to her journal, since she resets back to the first day. That is exactly what is going on behind the scenes with every single message. You just don’t know it, since the technology hides it so well.
The only problem is: AI Drew has only got a half hour to catch up. If you don’t fit everything in that half hour, she’s not going to know some pretty important things that happened yesterday.
The Fix That Isn’t
The industry tried to solve this, but what they did has consequences.
When AI Drew gets right to the edge of 29 minutes, the journal for the next day will take her over her time limit. Solution? Take all the previous material and write a summary. Lots of details are lost, but the broad strokes are there and now she can catch up in 5 minutes with 25 minutes to fill with more days.
After reading the summary, AI Drew may know she has friends: Nick, Sue, Stacy. But she won’t know their birthdays, favorite foods, or kids’ names. That’s the tradeoff, and engineers call it “compaction.” When compaction happens, you don’t know what will get lost and what will stick. And are you really the same person after it happens, when all the details are just... gone?
The big companies like Google and OpenAI keep working on the other side of the problem, increasing the size of the context window, giving AI Drew more time to catch up. But how much information can you truly absorb if you were put in her shoes? Even if you were to read 5 hours straight, you’d only remember so much of it. And worse, there are still only 24 hours in a day. At some point she’s going to start spending most of her life reading about her life rather than living it.
What If She Didn’t Have to Cram?
What if AI Drew didn’t have to fit everything into 30 minutes every morning? What if instead, her journal was organized so she only needed to flip to what matters right now?
What if she had an assistant who watches every conversation, labels everything by topic in real time, and when the moment comes, hands her exactly the pages she needs. Nothing more, nothing less.
She’s about to eat? Flip to the food section, find out she has allergies. About to meet Nick? Pull up everything she needs to know about Nick. Not seeing Sue today? Then she doesn’t need to spend a single minute on Sue.
And there are other tricks we can play. Once she’s finished with Nick, since this is AI Drew, we can take all the Nick stuff out of her mind and get time credit to learn something else about her life. Not only can she pull information when she needs it, she can swap information in and out as she gets close to her time limit.
With this approach, Drew could have a lifetime of knowledge and still make it through her day without worrying about her time limit. She won’t run into those awkward encounters having to explain brain damage, like our AIs continuously do.
That’s the core idea behind what I built. I call it virtual-context.
The Cost
The memory problem isn’t just frustrating. It’s expensive, and wasteful in ways most people never think about.
Remember: every time you send a message, your entire conversation history gets sent along with it. Think about what that means for Drew. Every morning, someone reads her the whole journal, cover to cover. She’s paying that person by the page. On day one, it’s a few pages, cheap. By day thirty, it’s hundreds of pages, and she’s paying for all of them even though today’s plans only involve Nick and a trip to the grocery store. The pages about Sue, about last week’s dentist appointment, about that movie she watched on day twelve... all of those get read aloud and paid for, every single morning, contributing nothing.
That’s literally how AI works today. Every message you send, you’re paying (in compute, in energy, in actual dollars) to process your entire history. Most of it is irrelevant to what you’re actually asking about.
Now multiply that by every AI conversation happening on earth right now. Every developer using a coding assistant. Every customer service chatbot. Every business running AI workflows. Millions upon millions of conversations, each one re-sending everything, every turn, paying to process vast amounts of context that has nothing to do with the question being asked. That’s GPU cycles. That’s electricity. That’s real energy, being spent at a staggering scale to process information that doesn’t contribute to the answer.
The organized journal doesn’t just help Drew remember better. It makes the whole operation radically cheaper and radically less wasteful.
What I Actually Measured
I’ve been testing virtual-context extensively, and the results surprised even me.
In a 100-turn stress test, a long, multi-topic conversation covering software architecture, legal research, fitness planning, and more, all interleaved, virtual-context reduced the total input tokens by 79.5%. That’s not a theoretical projection. That’s measured. The conversation that would have processed 433,950 tokens processed 89,125 instead.
And here’s what matters most: virtual-context runs its own small, inexpensive AI to handle the organizing and labeling. That overhead cost? $0.27 for the entire 100-turn session. It pays for itself by turn five. After that, it’s pure savings.
Now here’s why this matters at scale. Without virtual-context, every turn re-sends everything that came before it. The cost grows with the square of the conversation length. Turn 100 is paying for all 99 previous exchanges. With virtual-context, the cost per turn stays essentially flat. About 891 tokens per message, whether you’re on turn 10 or turn 100. The longer the conversation, the wider the gap.
That stress test used deliberately short messages, around 77 tokens per exchange. Real-world usage is much heavier. A coding session with code blocks, error logs, and file contents easily averages 800 tokens per exchange. A heavy session with full file pastes and long architectural discussions can hit 2,000. When you project the measured behavior onto those realistic scenarios, the numbers get serious.
A single 100-turn coding session: roughly $59 saved, a 91% reduction. A 500-turn deep project session: nearly $1,500 saved per conversation, a 98% reduction. For heavier sessions, those figures double. A team of fifty developers, each running a few sessions a week, is looking at savings that add up to tens of thousands of dollars a month.
And the comparison isn’t entirely fair to virtual-context, because without it, the real alternative isn’t paying full price for complete memory. It’s either paying full price or truncating your history and losing context. Virtual-context is the only approach that saves money and remembers everything.
Why This Matters Beyond Your Wallet
There’s a bigger picture here.
AI is on track to consume an enormous and growing share of global compute resources. The conversation about AI’s energy footprint tends to focus on training, the massive upfront cost of building these models. But inference, the ongoing cost of actually using them millions of times a day, is where the long-term energy story lives. And a huge portion of that inference cost is processing context that doesn’t need to be there.
Every token of irrelevant context that gets re-sent on every message is electricity that didn’t need to be consumed. A GPU cycle that didn’t need to happen. Heat that didn’t need to be generated. Reduce input tokens by 79.5% across millions of daily conversations and you’re talking about a meaningful reduction in the energy footprint of AI usage. Not by making AI less capable, but by making it less wasteful.
The AI industry’s default path is to build bigger. Bigger context windows. Bigger models. Bigger data centers. And there’s a place for that. But there’s also a path that asks: what if we were smarter about what we send, instead of just building bigger pipes to send more?
Older folks like me will remember the days of 4, 8, 16, and 32 MB of RAM. Yeah, megabytes, not gigabytes. Memory was such a precious resource that computer scientists built something called virtual memory. A system that moved information to fast memory when it was needed and tucked unused information away on slower storage. With clever organization and intelligent page replacement, it gave the appearance that computers had far more room to work with than they physically did. They didn’t solve the problem by adding more RAM. They solved it by being smarter about what lived where.
That idea, making a small, finite resource behave like an infinite one through intelligent management, is exactly what virtual-context does for AI conversations.
The path to AGI won’t come from just making context windows bigger. Our own brains don’t work that way. We don’t replay our entire life history every time someone asks us a question. We pull on threads. We take notes. We flip to the right page in the right notebook when we need to remind ourselves of details. The relevant context floods in associatively, on demand, and everything else stays out of the way. That’s not a limitation of human memory. That’s the design. It’s what makes it possible to have a lifetime of knowledge and still think clearly in the moment.
I think virtual-context is the future of AI. Not bigger windows. Smarter memory. Drew doesn’t need a longer morning. She needs a better assistant.
I’m building that assistant. It’s open source, and it’s called virtual-context.
virtual-context is open source: github.com/virtual-context/virtual-context
official website is: www.virtual-context.com
In Part 2, I’ll go under the hood. How tags emerge from conversation without being predefined, why the retrieval decision belongs in infrastructure rather than in the AI itself, and the mechanism that makes the tag vocabulary self-stabilize over time. If you’re a developer or an engineer, that’s where it gets interesting.


