Agent Memory Engineer Is About to Be a Real Job Title

I’ve become a little obsessed with agent memory. And the more I learn about it, the more convinced I am that this field is about to explode, with dedicated Agent Memory Engineers, maybe even Agent Memory Architects, as real titles on real job boards.

I know how that sounds. We already have enough hyphenated AI roles to fill a LinkedIn bingo card. But I didn’t get here from a trend report. I got here from watching my own agent fail in the dumbest, most human way possible: it forgot a customer.

Let me explain.

First, why agents forget at all

If you’ve ever talked to an AI system, a chatbot, a coding agent, whatever, and wondered why it didn’t remember what you were just talking about, here’s the answer. LLMs are stateless by default. They don’t come with a built-in memory system. Each prompt you send is treated as a completely isolated event.

So when you’re deep in a chat and it feels like the agent remembers the last ten messages, the interface is faking it. Behind the scenes it’s taking the whole conversation history and re-sending all of it to the model as one giant combined prompt, every single turn.

It is the ability for AI agents to remember anything—the last thing you asked it to code, your favorite sweatshirt, the customer that called about their brake pad last week, the chat conversation you had 6 months ago—anything. But since LLMs are stateless by default, meaning they do not have a built-in memory system, figuring out how to give it to them is one of the most interesting unsolved problems in the space right now.

Memory vs. the context window (they are not the same thing)

Ok, but isn’t the memory stored in context windows now? Not really. A context window is the information an agent has access to right now. If you think about working with your AI coding agent, that’s your prompt, your instruction files, your skills, your MCP servers, anything the agent can see in this exact moment. That is the context window.

Memory is different. The key difference is that memory is data stored for retrieval later. The way I think about it: agent memory is like a database, a place where data lives so you can pull it back whenever you need it. The context window is more like a single login session. While you’re in that session you can grab whatever you need, but once the session closes, all of it is gone.

Make sense?

The agent that couldn’t remember a brake pad

And speaking of memory as a database: I built an AI receptionist named Axel for my brother’s auto shop a few months ago. The job was simple on paper. Answer the phone, talk to customers like a person, handle the calls a busy shop owner doesn’t have time to pick up.

I stitched it together the way most of us build agents right now. A voice layer, an LLM, embeddings for search, a backend to hold it together. It demoed beautifully and first-time callers got a smooth, smart experience.

Then the same people started calling back.

One customer rang to ask if Axel remembered their conversation from last week about moving their appointment. It didn’t. Another called to remind my brother they were swinging by to pick up their brake pad, the one they’d already discussed on an earlier call. Axel treated them like a total stranger because every call started from zero.

That’s the stateless problem made physical. And it’s exactly what makes agents painful in the real world:

You’re constantly repeating yourself
You’re reminding the agent of things that never change
It’s a bad user experience
It doesn’t scale

This is fine for a demo but it is not fine for a business. If we’re building agents for real production environments, used by real customers, we have to give them memory. And to do that, you first have to understand that “remember the customer” is not one feature—it’s stacked.

One phone call, four kinds of memory

Ok, so here’s what I didn’t understand when I started building Axel. When that repeat caller dials in, Axel actually needs four different kinds of memory working at once, and I was trying to fake all four by stuffing the growing conversation into a single prompt. Much like humans, agents have different memory systems, and they break down into short-term and long-term memory.

Short-term: working memory

Working memory is the scratchpad for the current task. It’s what’s being said right now, on this call, in this moment. This one maps directly onto the LLM context window and the live session. It’s where your day-to-day conversational interaction with an agent actually happens, and it’s finite by definition, because it’s bound entirely by the context window. This is useful during the task, but once the session closes, it’s gone.

That’s the only kind of memory Axel originally had. Which is why it could hold a single call together and nothing beyond that.

Long-term: the stuff that has to survive the session

The other three memory types are long-term memories. They’re the memories that need to outlive the exact conversation where they were learned—they have to move beyond conversational memory.

Semantic memory is durable facts and external knowledge the agent needs to do its job.

This customer prefers text reminders over calls. The shop closes at 5 on Saturdays.

The exact wording doesn’t matter here, the meaning does. This is the kind of thing vector search is great at: you embed the fact, and you retrieve it later by similarity even if nobody says those exact words again.

Episodic memory is events stored over time, the “what happened” layer.

Mary called Tuesday to reschedule. The brake part came in on Thursday.

This is less a similarity-search problem and more a structured one. “What were this customer’s last three interactions, in order?” is a database query.

Procedural memory is how to do things. This is the workflow and tool-calling layer.

When someone asks about a pickup, check the work order status, confirm the part arrived, then offer a time window.

It’s the agent’s muscle memory for the actual job.

When you stuff all four memory types into one ever-growing prompt, you don’t get a memory system. You get a giant blob where some details matter, some are stale, and some are completely irrelevant, and the model has no principled way to tell them apart. The agent technically has the information. But that doesn’t mean it can use it.

That was my light bulb moment. Giving an agent memory isn’t only a storage problem. It’s also a judgment problem.

What deserves to be remembered? When does an old fact get corrected instead of duplicated? How much do you pull back into the prompt before you’re just adding noise? And how do you guarantee one customer’s history never bleeds into another customer’s call?

These answers evolve over time. They need to be monitored, updated and re-updated as the system grows. Which is exactly how something becomes a job (going back to my original point there).

How most of us fake agent memory (and why it breaks)

Right now, the most common approach to memory is markdown files. You tell the agent to update a file whenever it makes a mistake or learns something new, and you load that file back into context next time.

Think about Claude Code. It has two memory systems: claude.md files, the instructions and context you write, and auto memory, the notes Claude writes for itself. Both get loaded into the context window every time a new conversation starts.

This genuinely works, and I use it daily. But notice the catch: you have to maintain it. You are the memory system. And the markdown-file approach has the same ceiling Axel hit—it’s still just text getting replayed into a prompt. It has no real notion of whose memory it is, no semantic retrieval, no way to query events in order. It’s fine for a solo developer’s coding context. But the second you give it to thousands of customers, who each expect to be remembered, it fails.

Which got me wondering: what if I didn’t have to manage the agent’s memory at all? What if, instead of being ephemeral, it was persistent across sessions, devices, and projects, and just worked?

What I’d do differently: let the database be the memory

I haven’t rebuilt Axel yet. But I know how I’d do it now, and it’s a genuinely different way of thinking about building with agents.

I got to spend real time on this at the Oracle Developer Summit, where I was one of a small group of developers invited to sit down with the team building this stuff. I talked through the memory problem with Richmond Alake and Casius Sibanda Lee, the people actually designing how agents remember, and those conversations reframed the whole thing for me.

It’s one thing to read the docs. It’s another to hear the people building it explain why they made the choices they did, and what they’re still wrestling with. The honest takeaway I left with: this is hard precisely because memory is as much a judgment problem, as it is a storage one, and they’re building the infrastructure to make it more manageable.

The premise, built on Oracle AI Database (a freaking AI DATABASE!), reframed the problem for me. The idea is to stop stitching together a separate vector store, a relational database, a document store, and your own custom thread management, and instead let one AI database be the memory core.

An AI database, here, means one that natively handles the data shapes AI apps actually need: embeddings, JSON, text search, and regular SQL, all living together so the agent doesn’t bounce between systems just to assemble context.

That matters because, as my four-way Axel problem showed, agent memory is not only a vector search problem. Semantic memory wants similarity search. Episodic memory wants ordered reads and exact SQL filters. A repeat customer’s history is a SELECT; their stated preferences are an embedding lookup. You want both in the same governed place.

The foundation

Here’s roughly what the foundation looks like. You stand up a client against the database:

from oracleagentmemory.core import OracleAgentMemory

client = OracleAgentMemory(
    connection=connection,
    embedder="text-embedding-3-small",
    extract_memories=False,
    schema_policy="create_if_necessary",
)

Then you register who owns what. This looked like boilerplate setup to me at first. It isn’t, it’s the entire fix for Axel mixing up customers. Agent Memory needs ownership, so one caller’s context never shows up in another caller’s session:

client.add_user("shop:cust-4821", "Repeat customer. Vehicle: 2018 Civic.")
client.add_agent("axel", "AI receptionist for the auto shop.")

Now a durable fact gets stored against that specific customer’s ID, not dumped into a shared pile:

client.add_memory(
    "Customer 4821 came in about front brake pads. Part ordered Thursday, pickup pending.",
    user_id="shop:cust-4821",
    agent_id="axel",
)

And when they call back, Axel retrieves by meaning, scoped tightly to that one customer and no one else:

results = client.search(
    "what was this customer's last visit about?",
    user_id="shop:cust-4821",
    agent_id="axel",
    max_results=3,
)

The query doesn’t have to match the stored words. I saved “front brake pads, part ordered.” The agent can search “what was the last visit about” and still pull it back, because semantic similarity does the work. That right there is the difference between an agent that recites and an agent that recalls.

The live call gets its own structure too. Instead of replaying every message forever, you put the conversation in a thread and ask the database for a compact, prompt-ready summary of what actually matters:

thread = client.create_thread(user_id="shop:cust-4821", agent_id="axel")
thread.add_messages([...])         # the live call
card = thread.get_context_card()   # compact memory for the next prompt

That context card is the answer to the context-window problem from earlier. Even with a huge context window, you can’t append forever—at some point you have to compress. The card hands the model a tight block of relevant memory instead of 80 raw turns of transcript.

One of the most interesting things is that you can let the system extract durable memories on its own, so you’re not hand-writing add_memory() for every useful fact a customer mentions on a call:

smart_thread = client.create_thread(
    user_id="shop:cust-4821",
    agent_id="axel",
    memory_extraction_frequency=2,
    memory_extraction_window=4,
    enable_context_summary=True,
)

This is where memory stops being a data structure you maintain and starts being a living part of the agent’s architecture. The system periodically inspects recent messages, pulls out the durable facts, and keeps a running summary, on its own.

If you want to play with it yourself, the demos live at github.com/oracle-devrel/oracle-ai-developer-hub.

Why this becomes a job

Step back and look at what that code is really doing. It’s making decisions about ownership, scope, retrieval, extraction, and forgetting. Those decisions are the entire difference between Axel delighting a repeat customer and Axel leaking one person’s history into someone else’s call. Bad memory is worse than no memory.

That is not a thing you configure once and walk away from. As the agent handles more customers, more edge cases, more “wait, they changed their mind last month,” somebody has to own what the system remembers and how. Somebody has to tune retrieval so the prompt stays sharp instead of noisy. Somebody has to audit it, correct stale facts, and keep memories from crossing wires between users.

We already accept that tool calling lets agents act and planning lets agents decide. Memory is what lets them build continuity, and continuity is what turns a one-off prompt responder into something that feels like a real collaborator. The deeper I get into this, the more I’m convinced memory isn’t a feature you bolt on at the end. It’s a discipline.

And disciplines get job titles.

So when “Agent Memory Engineer” shows up in a posting near you, just remember you heard it first from the developer whose AI receptionist couldn’t remember a brake pad. That one failure taught me more about where this field is headed than any spec sheet ever could.

Agent Memory Engineer Is About to Be a Real Job Title

First, why agents forget at all

Memory vs. the context window (they are not the same thing)

The agent that couldn’t remember a brake pad

One phone call, four kinds of memory

Short-term: working memory

Long-term: the stuff that has to survive the session

How most of us fake agent memory (and why it breaks)

What I’d do differently: let the database be the memory

The foundation

Why this becomes a job

Why Claude Made My Landing Page Ugly (and How I Fixed It)

How to Code with Claude from Your Phone

A Beginner's Guide to Running AI Models Locally

Kedasha Kerr

I write about building with AI.
Let's stay connected! 💕

Agent Memory Engineer Is About to Be a Real Job Title

First, why agents forget at all

Memory vs. the context window (they are not the same thing)

The agent that couldn’t remember a brake pad

One phone call, four kinds of memory

Short-term: working memory

Long-term: the stuff that has to survive the session

How most of us fake agent memory (and why it breaks)

What I’d do differently: let the database be the memory

The foundation

Why this becomes a job

Why Claude Made My Landing Page Ugly (and How I Fixed It)

How to Code with Claude from Your Phone

A Beginner's Guide to Running AI Models Locally

Kedasha Kerr

I write about building with AI.Let's stay connected! 💕

I write about building with AI.
Let's stay connected! 💕