OpenAI Agents SDK Sessions: Persistent, Until You Need Actual Memory

SQLiteSession, the SQLAlchemy backend, and the compaction wrapper: which session fits which deployment, what Runner.run(session=...) automates, and the four things a message log cannot do (extract facts, cross sessions, scope per user, explain a recall). The two-layer pattern production agents converge on.

memnode•June 11, 2026•9 min read

openai agents sdksessionssqliteagent memoryintegration

The OpenAI Agents SDK ships with a clean answer to "my agent forgets everything between runs": Sessions. Pass one to the Runner and conversation history is fetched, merged, and persisted for you. It is genuinely good, well-designed, thread-safe, with pluggable backends. It is also a message log, and the distance between a message log and memory is where most teams discover they have a second project. Here is what Sessions do, which backend to pick, and where the line is.

The TL;DR: SQLiteSession with a file path is the right default for a single-process agent, switch to the SQLAlchemy-backed session for real databases and async fan-out, add the compaction session when conversations run long, and recognize that none of these extract facts, scope knowledge per user beyond an ID string, or let the agent know anything it was not literally told in that conversation. That last part is the actual memory problem, and the SDK, sensibly, does not pretend to solve it.

Sessions in one example

from agents import Agent, Runner, SQLiteSession

agent = Agent(name="Assistant", instructions="Reply concisely.")

# In-memory by default: history dies with the process
session = SQLiteSession("conversation_123")

# File-backed: history survives restarts
session = SQLiteSession("conversation_123", "conversations.db")

result = await Runner.run(
    agent,
    "What framework did we pick yesterday?",
    session=session,
)

With a session attached, each run does four things automatically: fetch stored items (session.get_items()), prepend them to the new input, execute the turn, and append the new items back to storage. No manual to_input_list() threading between turns. The session ID is your conversation key; the storage backend is whatever class you chose.

Choosing a backend

SQLiteSession("id"), no path: in-memory, vanishes on exit. Tests and demos.
SQLiteSession("id", "file.db"): persistent, thread-safe, zero infrastructure. The right default for a bot on one box.
SQLAlchemy-backed session: Postgres or MySQL behind the same protocol, for when conversations must survive the host and serve multiple workers. The async variant rides the same engine for non-blocking I/O.
Compaction session (Responses API): wraps another session and compacts old turns once a trigger fires, the SDK's answer to marathon conversations blowing the context window. Use it once long sessions get slow or expensive, and read what compaction loses before trusting it with detail you will need later.

Because Session is a small protocol (get, add, pop, clear), writing a custom backend is an afternoon, which is the SDK's quiet invitation to plug in something smarter than a log.

What a message log cannot do

Sessions persist what was said. Memory is knowing what is true. The gap shows up in four concrete ways:

No extraction. "My deploy target is Hetzner, eu-central" sits in row 47 of a transcript. Next week's conversation, under a different session ID, knows nothing. The log never becomes a fact.
Linear growth. Every turn appends. Long-lived sessions mean ever-larger prompts, rising latency and token cost per turn until compaction trims it, lossily.
Sessions do not cross. One session ID, one history. Per-user knowledge that should span dozens of conversations has no home; most teams fake it by stuffing summaries into instructions, which is a memory layer built by accident.
No provenance. When the agent asserts something stale from turn 12 of an old thread, the log cannot tell you why it surfaced or mark it superseded.

None of this is a flaw in the SDK. It drew the line at conversation persistence and drew it cleanly. The mistake is assuming the line is further out than it is.

The two-layer pattern that works

Production agents on the Agents SDK converge on the same architecture: Sessions for the transcript, a memory layer for the facts. The session keeps the current conversation coherent. A separate store holds extracted, user-scoped knowledge, "prefers TypeScript," "deploy target Hetzner," "decided against Pinecone in March", written either by tool calls the agent makes explicitly or by a post-run extraction pass over new session items. At the start of each run, relevant facts load into the agent's instructions; the session handles the rest. We walk the general version of this pattern in the long-term memory guide.

Patterns worth stealing

Corrections via pop_item

Because the protocol includes pop_item, you can implement "wait, I meant X" properly: pop the agent's last response and the user's last message off the session, then run the corrected input. The transcript stays clean instead of accumulating a correction dialogue the model has to reread forever:

await session.pop_item()   # drop the assistant's answer
await session.pop_item()   # drop the mistaken user turn
result = await Runner.run(agent, corrected_input, session=session)

One session per user-context, not per user

Session IDs are free. A support agent serving one customer across three distinct issues works better as three sessions ("acme-billing", "acme-onboarding", "acme-incident-42") than one ever-growing thread, shorter prompts, less cross-contamination between topics, cheaper turns. The thing that should span all three, what the agent knows about Acme, is fact-layer material, not transcript material.

Budget the transcript like the meter it is

A file-backed session that grows to 200 turns is silently re-sending most of those tokens every run. Before reaching for compaction, check whether the conversation should have been multiple sessions plus extracted facts; compaction spends an LLM call to lossily compress what a better split would have avoided storing in the prompt at all.

What it costs

SQLiteSession: free, a file on disk. The cost is the tokens of replayed history per turn, which grows with the session, the real bill is in the model meter, not the storage.
SQLAlchemy/Postgres sessions: whatever your database costs; marginal chat-log storage is noise.
Compaction session: an extra Responses API call when it triggers, paid to make subsequent turns cheaper. Worth it for marathon threads, wasted on well-scoped ones.
A fact layer: an extraction pass per conversation plus a recall per run, against which you save the ever-growing transcript replay. For agents with returning users, the two-layer shape is usually cheaper at the meter than a long session, as well as smarter.

The decision map

CLI tool, demo, test suite: in-memory SQLiteSession.
Single-host assistant that should survive restarts: file-backed SQLiteSession.
Multi-worker or async deployment: SQLAlchemy session on your existing Postgres.
Long-running conversations: add the compaction wrapper, knowing it is a token-budget tool, not memory.
Returning users the agent should know things about: two layers, sessions plus a fact store. No session backend substitutes.

Sessions and handoffs: one transcript, many agents

One subtlety worth knowing before you build a multi-agent flow: the session belongs to the conversation, not to an agent. When a run hands off, triage agent to billing agent to refunds agent, every agent in the chain reads and writes the same session items. That is usually what you want (the refunds agent should see what the user told triage), but it has two consequences teams discover late. First, an agent with a narrow system prompt still inherits the whole accumulated transcript, including material irrelevant to its specialty, which costs tokens and occasionally focus. Second, there is no per-agent privacy inside a conversation: anything one agent elicits, the next can see. If two agents genuinely must not share context, that is two sessions and your own orchestration, not a handoff. The fact layer has the same property in reverse, and it is a feature there: facts extracted once are deliberately visible to every agent serving that user.

The shape memnode optimizes for

Memnode is built to be that second layer. The agent writes facts over HTTP (or through MCP, the same server that backs Claude Code persistent memory), recall is namespace-scoped per user so one customer's facts never leak into another's context, and every fact carries lineage, when the agent says something wrong, you can see which conversation taught it and correct the source rather than the symptom. The data plane runs local-first on your infrastructure, which keeps transcripts and extracted knowledge out of third-party clouds. Use Sessions exactly as the SDK intends; add the fact layer the day "remember this conversation" stops being the same thing as "know this user."

The questions that come up every time

Is there a Redis session? The protocol is deliberately small, so if the built-ins do not fit, a Redis-backed implementation is a screenful of code, and community ones exist. Before writing it, ask whether Postgres-via-SQLAlchemy (already shipped, already durable) is actually insufficient, or just less fashionable.

Do sessions work with streaming and tool calls? Yes. Tool calls and their outputs are session items like everything else, which is exactly why transcripts grow faster than people expect: a five-turn conversation with a busy agent can be fifty items. The growth math above is about items, not user messages.

Can I share one session between two different agents? Mechanically yes, it is just a key into storage. Whether you should depends on the handoff discussion above: shared transcript, shared everything. The cleaner pattern for "two agents, one user" is separate sessions over a shared fact layer.