Agent Memory Engineer Is About to Be a Real Job Title
Why AI agents forget, and why remembering is about to become an entire engineering discipline
Iâve become a little obsessed with agent memory. And the more I learn about it, the more convinced I am that this field is about to explode, with dedicated Agent Memory Engineers, maybe even Agent Memory Architects, as real titles on real job boards.
I know how that sounds. We already have enough hyphenated AI roles to fill a LinkedIn bingo card. But I didnât get here from a trend report. I got here from watching my own agent fail in the dumbest, most human way possible: it forgot a customer.
Let me explain.
First, why agents forget at all
If youâve ever talked to an AI system, a chatbot, a coding agent, whatever, and wondered why it didnât remember what you were just talking about, hereâs the answer. LLMs are stateless by default. They donât come with a built-in memory system. Each prompt you send is treated as a completely isolated event.
So when youâre deep in a chat and it feels like the agent remembers the last ten messages, the interface is faking it. Behind the scenes itâs taking the whole conversation history and re-sending all of it to the model as one giant combined prompt, every single turn.
It is the ability for AI agents to remember anythingâthe last thing you asked it to code, your favorite sweatshirt, the customer that called about their brake pad last week, the chat conversation you had 6 months agoâanything. But since LLMs are stateless by default, meaning they do not have a built-in memory system, figuring out how to give it to them is one of the most interesting unsolved problems in the space right now.
Memory vs. the context window (they are not the same thing)
Ok, but isnât the memory stored in context windows now? Not really. A context window is the information an agent has access to right now. If you think about working with your AI coding agent, thatâs your prompt, your instruction files, your skills, your MCP servers, anything the agent can see in this exact moment. That is the context window.
Memory is different. The key difference is that memory is data stored for retrieval later. The way I think about it: agent memory is like a database, a place where data lives so you can pull it back whenever you need it. The context window is more like a single login session. While youâre in that session you can grab whatever you need, but once the session closes, all of it is gone.
Make sense?
The agent that couldnât remember a brake pad
And speaking of memory as a database: I built an AI receptionist named Axel for my brotherâs auto shop a few months ago. The job was simple on paper. Answer the phone, talk to customers like a person, handle the calls a busy shop owner doesnât have time to pick up.
I stitched it together the way most of us build agents right now. A voice layer, an LLM, embeddings for search, a backend to hold it together. It demoed beautifully and first-time callers got a smooth, smart experience.
Then the same people started calling back.
One customer rang to ask if Axel remembered their conversation from last week about moving their appointment. It didnât. Another called to remind my brother they were swinging by to pick up their brake pad, the one theyâd already discussed on an earlier call. Axel treated them like a total stranger because every call started from zero.
Thatâs the stateless problem made physical. And itâs exactly what makes agents painful in the real world:
- Youâre constantly repeating yourself
- Youâre reminding the agent of things that never change
- Itâs a bad user experience
- It doesnât scale
This is fine for a demo but it is not fine for a business. If weâre building agents for real production environments, used by real customers, we have to give them memory. And to do that, you first have to understand that âremember the customerâ is not one featureâitâs stacked.
One phone call, four kinds of memory
Ok, so hereâs what I didnât understand when I started building Axel. When that repeat caller dials in, Axel actually needs four different kinds of memory working at once, and I was trying to fake all four by stuffing the growing conversation into a single prompt. Much like humans, agents have different memory systems, and they break down into short-term and long-term memory.
Short-term: working memory
Working memory is the scratchpad for the current task. Itâs whatâs being said right now, on this call, in this moment. This one maps directly onto the LLM context window and the live session. Itâs where your day-to-day conversational interaction with an agent actually happens, and itâs finite by definition, because itâs bound entirely by the context window. This is useful during the task, but once the session closes, itâs gone.
Thatâs the only kind of memory Axel originally had. Which is why it could hold a single call together and nothing beyond that.
Long-term: the stuff that has to survive the session
The other three memory types are long-term memories. Theyâre the memories that need to outlive the exact conversation where they were learnedâthey have to move beyond conversational memory.
Semantic memory is durable facts and external knowledge the agent needs to do its job.
This customer prefers text reminders over calls. The shop closes at 5 on Saturdays.
The exact wording doesnât matter here, the meaning does. This is the kind of thing vector search is great at: you embed the fact, and you retrieve it later by similarity even if nobody says those exact words again.
Episodic memory is events stored over time, the âwhat happenedâ layer.
Mary called Tuesday to reschedule. The brake part came in on Thursday.
This is less a similarity-search problem and more a structured one. âWhat were this customerâs last three interactions, in order?â is a database query.
Procedural memory is how to do things. This is the workflow and tool-calling layer.
When someone asks about a pickup, check the work order status, confirm the part arrived, then offer a time window.
Itâs the agentâs muscle memory for the actual job.
When you stuff all four memory types into one ever-growing prompt, you donât get a memory system. You get a giant blob where some details matter, some are stale, and some are completely irrelevant, and the model has no principled way to tell them apart. The agent technically has the information. But that doesnât mean it can use it.
That was my light bulb moment. Giving an agent memory isnât only a storage problem. Itâs also a judgment problem.
What deserves to be remembered? When does an old fact get corrected instead of duplicated? How much do you pull back into the prompt before youâre just adding noise? And how do you guarantee one customerâs history never bleeds into another customerâs call?
These answers evolve over time. They need to be monitored, updated and re-updated as the system grows. Which is exactly how something becomes a job (going back to my original point there).
How most of us fake agent memory (and why it breaks)
Right now, the most common approach to memory is markdown files. You tell the agent to update a file whenever it makes a mistake or learns something new, and you load that file back into context next time.
Think about Claude Code. It has two memory systems: claude.md files, the instructions and context you write, and auto memory, the notes Claude writes for itself. Both get loaded into the context window every time a new conversation starts.
This genuinely works, and I use it daily. But notice the catch: you have to maintain it. You are the memory system. And the markdown-file approach has the same ceiling Axel hitâitâs still just text getting replayed into a prompt. It has no real notion of whose memory it is, no semantic retrieval, no way to query events in order. Itâs fine for a solo developerâs coding context. But the second you give it to thousands of customers, who each expect to be remembered, it fails.
Which got me wondering: what if I didnât have to manage the agentâs memory at all? What if, instead of being ephemeral, it was persistent across sessions, devices, and projects, and just worked?
What Iâd do differently: let the database be the memory
I havenât rebuilt Axel yet. But I know how Iâd do it now, and itâs a genuinely different way of thinking about building with agents.
I got to spend real time on this at the Oracle Developer Summit, where I was one of a small group of developers invited to sit down with the team building this stuff. I talked through the memory problem with Richmond Alake and Casius Sibanda Lee, the people actually designing how agents remember, and those conversations reframed the whole thing for me.
Itâs one thing to read the docs. Itâs another to hear the people building it explain why they made the choices they did, and what theyâre still wrestling with. The honest takeaway I left with: this is hard precisely because memory is as much a judgment problem, as it is a storage one, and theyâre building the infrastructure to make it more manageable.
The premise, built on Oracle AI Database (a freaking AI DATABASE!), reframed the problem for me. The idea is to stop stitching together a separate vector store, a relational database, a document store, and your own custom thread management, and instead let one AI database be the memory core.
An AI database, here, means one that natively handles the data shapes AI apps actually need: embeddings, JSON, text search, and regular SQL, all living together so the agent doesnât bounce between systems just to assemble context.
That matters because, as my four-way Axel problem showed, agent memory is not only a vector search problem. Semantic memory wants similarity search. Episodic memory wants ordered reads and exact SQL filters. A repeat customerâs history is a SELECT; their stated preferences are an embedding lookup. You want both in the same governed place.
The foundation
Hereâs roughly what the foundation looks like. You stand up a client against the database:
from oracleagentmemory.core import OracleAgentMemory
client = OracleAgentMemory( connection=connection, embedder="text-embedding-3-small", extract_memories=False, schema_policy="create_if_necessary",)Then you register who owns what. This looked like boilerplate setup to me at first. It isnât, itâs the entire fix for Axel mixing up customers. Agent Memory needs ownership, so one callerâs context never shows up in another callerâs session:
client.add_user("shop:cust-4821", "Repeat customer. Vehicle: 2018 Civic.")client.add_agent("axel", "AI receptionist for the auto shop.")Now a durable fact gets stored against that specific customerâs ID, not dumped into a shared pile:
client.add_memory( "Customer 4821 came in about front brake pads. Part ordered Thursday, pickup pending.", user_id="shop:cust-4821", agent_id="axel",)And when they call back, Axel retrieves by meaning, scoped tightly to that one customer and no one else:
results = client.search( "what was this customer's last visit about?", user_id="shop:cust-4821", agent_id="axel", max_results=3,)The query doesnât have to match the stored words. I saved âfront brake pads, part ordered.â The agent can search âwhat was the last visit aboutâ and still pull it back, because semantic similarity does the work. That right there is the difference between an agent that recites and an agent that recalls.
The live call gets its own structure too. Instead of replaying every message forever, you put the conversation in a thread and ask the database for a compact, prompt-ready summary of what actually matters:
thread = client.create_thread(user_id="shop:cust-4821", agent_id="axel")thread.add_messages([...]) # the live callcard = thread.get_context_card() # compact memory for the next promptThat context card is the answer to the context-window problem from earlier. Even with a huge context window, you canât append foreverâat some point you have to compress. The card hands the model a tight block of relevant memory instead of 80 raw turns of transcript.
One of the most interesting things is that you can let the system extract durable memories on its own, so youâre not hand-writing add_memory() for every useful fact a customer mentions on a call:
smart_thread = client.create_thread( user_id="shop:cust-4821", agent_id="axel", memory_extraction_frequency=2, memory_extraction_window=4, enable_context_summary=True,)This is where memory stops being a data structure you maintain and starts being a living part of the agentâs architecture. The system periodically inspects recent messages, pulls out the durable facts, and keeps a running summary, on its own.
If you want to play with it yourself, the demos live at github.com/oracle-devrel/oracle-ai-developer-hub.
Why this becomes a job
Step back and look at what that code is really doing. Itâs making decisions about ownership, scope, retrieval, extraction, and forgetting. Those decisions are the entire difference between Axel delighting a repeat customer and Axel leaking one personâs history into someone elseâs call. Bad memory is worse than no memory.
That is not a thing you configure once and walk away from. As the agent handles more customers, more edge cases, more âwait, they changed their mind last month,â somebody has to own what the system remembers and how. Somebody has to tune retrieval so the prompt stays sharp instead of noisy. Somebody has to audit it, correct stale facts, and keep memories from crossing wires between users.
We already accept that tool calling lets agents act and planning lets agents decide. Memory is what lets them build continuity, and continuity is what turns a one-off prompt responder into something that feels like a real collaborator. The deeper I get into this, the more Iâm convinced memory isnât a feature you bolt on at the end. Itâs a discipline.
And disciplines get job titles.
So when âAgent Memory Engineerâ shows up in a posting near you, just remember you heard it first from the developer whose AI receptionist couldnât remember a brake pad. That one failure taught me more about where this field is headed than any spec sheet ever could.
-
5 FREE AI Courses You Can Finish This Weekend
-
How I Built an AI Receptionist for a Luxury Mechanic Shop - Part 1
-
Vibe Coding Security 101: 31 Tips to Keep Your AI-Coded Apps Safe
Related Posts:
Written by
Kedasha Kerr
Software Developer
in Chicago
I write about building with AI.
Let's stay connected! đ
Get the next post delivered to your inbox and follow me on Instagram for daily AI tips and coding content.
See you on Instagram!