Cloudflare Agent Memory Beta: Teardown, Limits, and When Self-Hosted Wins
Cloudflare's Agents SDK ships a memory primitive in beta. We tested what it stores, what it doesn't, the per-tenant limits, the cold-start tax, and the four use cases where self-hosted memory still wins.
Cloudflare shipped a memory primitive inside the Agents SDK in 2026 and the search interest curve looks like a hockey stick. "Cloudflare agent memory beta" is rising +6,350% on Trends. The product is clearly going to capture a chunk of the agent-memory market for teams already on Workers. What follows is what the beta actually stores, what it does not, the three limits that are not loudly documented, and the use cases where self-hosted memory still wins.
This is a teardown, not a hit piece. Cloudflare shipped a real thing and parts of it are good. The point is to be precise about which workloads it fits and which ones it does not.
What Cloudflare Shipped
The Agents SDK exposes an Agent class backed by a Durable Object. Each agent instance has its own DO. Inside the DO, a memory primitive gives you a typed key-value store with set, get, list, and delete operations. The API surface is small on purpose:
```typescript
// inside an Agent handler
await this.memory.set("user.preference.theme", "dark")        // write one key
const theme = await this.memory.get("user.preference.theme")  // point read
const all = await this.memory.list("user.preference.")        // prefix scan
```

Under the hood this is the Durable Object storage API with a typed wrapper. Reads are point queries against the DO's local SQLite-backed storage layer. Writes go through the DO's single-threaded mutation loop and persist on commit. Latency is excellent (sub-millisecond reads from the same worker, low single-digit milliseconds from another colo).
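To make "a typed wrapper over DO storage" concrete, here is a minimal sketch of what such a wrapper could look like. It is not Cloudflare's implementation. It assumes a hypothetical `KVBackend` interface whose `get`/`put`/`delete`/`list({ prefix })` methods are loosely modeled on the Durable Object storage API, and it uses an in-memory `Map` stand-in so it runs outside Workers. The hypothetical `MemorySchema` type is where the typing comes from: each allowed key maps to the value type it may hold.

```typescript
// Hypothetical backend shape, loosely modeled on Durable Object storage.
interface KVBackend {
  get(key: string): Promise<unknown>;
  put(key: string, value: unknown): Promise<void>;
  delete(key: string): Promise<boolean>;
  list(opts: { prefix: string }): Promise<Map<string, unknown>>;
}

// In-memory stand-in so the sketch runs outside Workers.
class MapBackend implements KVBackend {
  private data = new Map<string, unknown>();
  async get(key: string): Promise<unknown> {
    return this.data.get(key);
  }
  async put(key: string, value: unknown): Promise<void> {
    this.data.set(key, value);
  }
  async delete(key: string): Promise<boolean> {
    return this.data.delete(key);
  }
  async list(opts: { prefix: string }): Promise<Map<string, unknown>> {
    const out = new Map<string, unknown>();
    // Return matches in ascending key order, like a typical prefix scan.
    for (const key of Array.from(this.data.keys()).sort()) {
      if (key.startsWith(opts.prefix)) out.set(key, this.data.get(key));
    }
    return out;
  }
}

// Hypothetical schema: every allowed key and the type of its value.
type MemorySchema = {
  "user.preference.theme": "light" | "dark";
  "user.preference.timezone": string;
  "session.lastToolCall": { name: string; at: number };
};

class TypedMemory<S extends Record<string, unknown>> {
  constructor(private backend: KVBackend) {}

  set<K extends keyof S & string>(key: K, value: S[K]): Promise<void> {
    return this.backend.put(key, value);
  }

  async get<K extends keyof S & string>(key: K): Promise<S[K] | undefined> {
    return (await this.backend.get(key)) as S[K] | undefined;
  }

  delete<K extends keyof S & string>(key: K): Promise<boolean> {
    return this.backend.delete(key);
  }

  // Prefix scans return untyped values; callers narrow per key.
  list(prefix: string): Promise<Map<string, unknown>> {
    return this.backend.list({ prefix });
  }
}
```

With `new TypedMemory<MemorySchema>(new MapBackend())`, a call like `memory.set("user.preference.theme", "blue")` is rejected at compile time, which is most of what a typed wrapper buys over raw storage. Note that `list` cannot stay fully typed, since a prefix can match keys with different value types.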
The model is "memory tied to an agent instance." Each DO has one agent. The memory belongs to that agent. Cross-agent shared memory is not a first-class concept in the current beta.
What It Stores Well
Three workloads fit cleanly:
- Ephemeral session state. The agent is mid-conversation, the user is mid-task, the worker needs to remember what was just said. DO storage handles this perfectly. Latency is low, durability is good, the DO sticks around as long as the session is active.
- Per-instance configuration. The agent learned the user's preferred response format, the timezone, the language. A few dozen key-value pairs that need to survive across requests within the same agent. DO storage is the right size for this.
- Single-tenant scratch memory. The agent is doing a multi-step task and needs to remember intermediate results. The DO is the natural place. This is closer to working memory than long-term memory, but the primitive handles it.
The latency story is genuinely strong. For agents already running on Workers, getting memory in the same colo as the compute is hard to beat with any external service. Round-trip to a hosted memory API will always be slower.
What It Doesn't
Five things the current beta does not give you, in roughly the order they matter:
- No lineage or provenance. Memories are opaque key-value pairs. There is no first-class concept of where a memory came from, what it supersedes, or which session wrote it. You can encode this in your application layer by prefixing keys or storing metadata as part of the value, but the primitive does not help you. When the agent recalls something wrong, you cannot trace it.
- No cross-agent shared memory. Each DO is isolated. Two agents that need to share a fact have to coordinate through a separate KV namespace or a database, which is slower and less typed. Multi-agent systems built on this primitive end up with their own ad-hoc shared-memory layer.
- No typed eviction. The DO storage will grow until you delete things. There is no notion of session memory vs learned memory vs ephemeral memory; everything is one bucket. Long-running agents accumulate cruft and you have to manage cleanup yourself.
- No fact deduplication. Write "user prefers dark mode" three times in three sessions, get three entries. There is no normalization or merging at the storage layer. Retrieval quality degrades because the same fact shows up multiple times with different keys.
- Retrieval degrades past ~30 facts. Because list() returns matching keys with no ranking, and because there is no semantic recall built in, the agent has to pull everything and let the LLM sort it out. Past about 30 facts the prompt budget burn gets uncomfortable and the agent starts missing things.
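Until the primitive grows these features, the workarounds live in the application layer. Below is a minimal sketch of one way to cover lineage, deduplication, and typed eviction together. Everything in it is hypothetical (`FactStore`, `FactRecord`, and the `kind` values are illustrative names, not Cloudflare APIs). It assumes the application assigns a canonical subject key to each fact, which is what makes deduplication possible; it does nothing for semantic duplicates written under different keys.

```typescript
type MemoryKind = "session" | "learned" | "ephemeral";

interface FactRecord {
  subject: string;     // canonical key chosen by the application
  value: string;
  kind: MemoryKind;
  source: string;      // which session wrote it
  writtenAt: number;   // epoch millis
  supersedes?: string; // previous value for this subject, if any
  seenCount: number;   // how many times this exact fact was re-asserted
}

// Case- and whitespace-insensitive comparison for "same fact" detection.
const normalize = (s: string): string => s.trim().toLowerCase().replace(/\s+/g, " ");

class FactStore {
  private current = new Map<string, FactRecord>();
  private history: FactRecord[] = []; // superseded records, oldest first

  write(subject: string, value: string, kind: MemoryKind, source: string, now: number = Date.now()): FactRecord {
    const existing = this.current.get(subject);
    if (existing && normalize(existing.value) === normalize(value)) {
      existing.seenCount += 1; // dedup: bump the count instead of adding an entry
      return existing;
    }
    const record: FactRecord = {
      subject,
      value,
      kind,
      source,
      writtenAt: now,
      supersedes: existing?.value,
      seenCount: 1,
    };
    if (existing) this.history.push(existing);
    this.current.set(subject, record);
    return record;
  }

  get(subject: string): FactRecord | undefined {
    return this.current.get(subject);
  }

  // Every value this subject has held, oldest first, ending with the current one.
  lineage(subject: string): FactRecord[] {
    const past = this.history.filter((r) => r.subject === subject);
    const latest = this.current.get(subject);
    return latest ? past.concat([latest]) : past;
  }

  // Typed eviction: drop current facts of one kind written before a cutoff.
  evict(kind: MemoryKind, olderThan: number): number {
    const stale: string[] = [];
    this.current.forEach((r, subject) => {
      if (r.kind === kind && r.writtenAt < olderThan) stale.push(subject);
    });
    stale.forEach((subject) => this.current.delete(subject));
    return stale.length;
  }

  size(): number {
    return this.current.size;
  }
}
```

Writing "dark" for `user.preference.theme` in three sessions leaves one entry with `seenCount` 3, and a later "light" write keeps the old record in `lineage()` with `supersedes: "dark"`. In a Durable Object these records would be persisted through storage rather than held in a `Map`; the sketch keeps them in memory to stay self-contained.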
The Three Hidden Limits
Three limits that are real and not loudly documented:
Concurrent-write throttling per DO. Durable Objects have a single-threaded mutation loop. A burst of writes to the same agent serializes. For a chatty multi-step agent that writes after every tool call, you can hit visible latency spikes if writes queue up. The fix is batching writes at logical checkpoints rather than writing on every event.
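A minimal sketch of checkpoint batching, assuming a hypothetical `BatchBackend` whose `putMany` persists several entries in one call (Durable Object storage's `put` also accepts an object of key-value pairs, which is the shape this mirrors). Writes between checkpoints are coalesced in memory with last-write-wins per key, and `checkpoint()` hands them to storage as one batch.

```typescript
// Hypothetical backend: one call persists many entries.
interface BatchBackend {
  putMany(entries: Record<string, unknown>): Promise<void>;
}

class WriteBuffer {
  private pending = new Map<string, unknown>();

  constructor(private backend: BatchBackend) {}

  // Buffer a write; a later write to the same key replaces the earlier one.
  set(key: string, value: unknown): void {
    this.pending.set(key, value);
  }

  pendingCount(): number {
    return this.pending.size;
  }

  // Flush everything buffered so far as a single batch; returns the batch size.
  async checkpoint(): Promise<number> {
    if (this.pending.size === 0) return 0;
    const batch: Record<string, unknown> = {};
    this.pending.forEach((value, key) => {
      batch[key] = value;
    });
    this.pending.clear(); // writes arriving during the await go into the next batch
    try {
      await this.backend.putMany(batch);
    } catch (err) {
      // Re-queue failed entries unless a newer write for that key already arrived.
      Object.keys(batch).forEach((key) => {
        if (!this.pending.has(key)) this.pending.set(key, batch[key]);
      });
      throw err;
    }
    return Object.keys(batch).length;
  }
}
```

The trade-off is durability: anything still buffered is lost if the object is evicted or crashes before the next checkpoint, so checkpoints belong at points the agent can safely resume from, such as the end of a tool call sequence.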
The 128MB DO memory ceiling. Durable Objects have a working-memory cap. The storage layer can hold more than 128MB on disk, but the in-process working set is bounded. For agents that materialize a large chunk of memory on each request, this is a real wall. You will hit it before you expect to if you are not careful about lazy loading.
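Lazy loading in practice means never materializing a whole prefix at once. A minimal sketch, assuming a hypothetical `PagedStore` whose `list` takes `prefix`, `startAfter`, and `limit` options (named after options Durable Object storage's `list` accepts, but treat the exact shape here as an assumption). The async generator pulls one bounded page at a time, so the working set stays near `pageSize` entries as long as the caller processes and discards each entry instead of collecting them all.

```typescript
interface ListOptions {
  prefix: string;
  startAfter?: string; // resume strictly after this key
  limit: number;
}

// Hypothetical paged store; entries come back in ascending key order.
interface PagedStore {
  list(opts: ListOptions): Promise<Array<[string, unknown]>>;
}

// In-memory stand-in for the sketch.
class SortedMapStore implements PagedStore {
  private data = new Map<string, unknown>();
  put(key: string, value: unknown): void {
    this.data.set(key, value);
  }
  async list(opts: ListOptions): Promise<Array<[string, unknown]>> {
    const keys = Array.from(this.data.keys())
      .filter((k) => k.startsWith(opts.prefix))
      .filter((k) => opts.startAfter === undefined || k > opts.startAfter)
      .sort()
      .slice(0, opts.limit);
    return keys.map((k): [string, unknown] => [k, this.data.get(k)]);
  }
}

// Yield matching entries one page at a time instead of loading them all.
async function* scan(store: PagedStore, prefix: string, pageSize: number): AsyncGenerator<[string, unknown]> {
  if (pageSize < 1) throw new RangeError("pageSize must be at least 1");
  let startAfter: string | undefined;
  while (true) {
    const page = await store.list({ prefix, startAfter, limit: pageSize });
    for (const entry of page) yield entry;
    if (page.length < pageSize) return; // a short page means we reached the end
    startAfter = page[page.length - 1][0];
  }
}
```

Usage is `for await (const [key, value] of scan(store, "learned.", 50)) { ... }`. The page size is the knob: smaller pages mean more storage round-trips but a smaller peak footprint inside the object.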
The per-account namespace limit nobody talks about. Cloudflare enforces a cap on the number of DO namespaces per account. For a multi-tenant SaaS that spins up a DO per customer, this can become a constraint long before storage does. The limit is high enough that small teams will not hit it, but it is the kind of thing you want to know in advance.
When Cloudflare Agent Memory Is the Right Call
- Small POCs. You want a memory primitive that exists, costs nothing to start, and lives where your compute lives. CF Agent Memory is the lowest-friction option that fits this brief.
- Single-tenant ephemeral session memory. Per-session scratch memory that does not need to survive the session. The DO lifecycle matches the agent lifecycle.
- Latency-sensitive worker-adjacent workloads. The agent is already a Worker, the memory is in the same colo, you save the network hop. For chatty agents this matters.
- Workloads under 30 facts per agent. Below this scale, the lack of dedup and ranking does not bite. The flat KV model is enough.
When Self-Hosted Wins
Four workloads where the trade-off goes the other way:
- Multi-tenant SaaS. You need namespace isolation, audit per tenant, and the ability to migrate one tenant's data without touching another's. CF Agent Memory's per-DO model gets unwieldy at thousands of tenants. Structured memory layers with first-class namespacing fit better.
- Persistent cross-session memory above 1k facts. Long-running agents accumulating learned facts over weeks or months need typed storage with deduplication, ranking, and eviction. The CF primitive does not have these. Memnode does.
- Queryable lineage for debugging. When the agent acts on a stored memory and you need to know why, lineage matters. CF Agent Memory has no notion of provenance. Self-hosted memory layers that record source, author, and supersession give you the audit trail.
- EU data residency without cluster-region juggling. CF Workers can run in EU regions, and DO storage has location hints, but tying memory placement to legal jurisdiction is fiddly. Self-hosted memory in an EU-based deployment makes the residency story trivial.
Memnode is what we build for these workloads. The data plane is a Rust binary you can run locally or on a VPS. Memory is namespaced by default. Lineage is the load-bearing primitive: every entry knows where it came from, what it supersedes, and what depends on it. There is a hosted option if you want one, but the default deployment shape keeps the data on your infrastructure.
If you are building agent-driven live operations (game backends, ops automation, support agents), memory durability and inspectability matter even more than they do for a general-purpose chat agent. The game backend agent memory pattern walks through why the workload is different from chat-style memory and what that means for storage choices.
Migration Path From Beta
Cloudflare beta APIs have historically had at least one breaking change before GA. The current memory primitive has rough edges that will probably be addressed: no batch reads, no transactional updates across keys, no namespacing within a DO. Some of those changes will be additive. Some will not.
How to architect now so you survive GA:
- Wrap memory access in a thin interface (memory.set, memory.get, memory.list). Do not let DO-specific calls leak into agent logic.
- Treat keys as a domain concern, not a CF-specific concern. Prefix with intent ("pref.", "session.", "learned.") so you can migrate selectively.
- Build a small audit log alongside the DO storage. If lineage matters to your product, do not wait for CF to add it.
- Test against an MCP memory server in parallel. If the CF API changes, you have a fallback that already works against your application code.
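The first three steps can be sketched together. Everything here is hypothetical: `MemoryStore` is the thin interface agent logic depends on, `memKey` enforces intent prefixes, and `AuditedMemory` is a decorator that appends an audit entry for every successful write, so you keep your own lineage regardless of what the backend offers. A Durable Object adapter, a self-hosted client, or an MCP-backed store would each implement `MemoryStore`; only an in-memory one is shown, and in production the audit log would be persisted rather than held in an array.

```typescript
type Intent = "pref" | "session" | "learned";

// Keys carry their intent so one class of memory can be migrated or purged alone.
function memKey(intent: Intent, name: string): string {
  return `${intent}.${name}`;
}

// The only memory surface agent logic is allowed to see.
interface MemoryStore {
  set(key: string, value: unknown): Promise<void>;
  get(key: string): Promise<unknown>;
  list(prefix: string): Promise<Map<string, unknown>>;
}

class InMemoryStore implements MemoryStore {
  private data = new Map<string, unknown>();
  async set(key: string, value: unknown): Promise<void> {
    this.data.set(key, value);
  }
  async get(key: string): Promise<unknown> {
    return this.data.get(key);
  }
  async list(prefix: string): Promise<Map<string, unknown>> {
    const out = new Map<string, unknown>();
    this.data.forEach((v, k) => {
      if (k.startsWith(prefix)) out.set(k, v);
    });
    return out;
  }
}

interface AuditEntry {
  op: "set";
  key: string;
  actor: string; // the session or agent that wrote
  at: number;
}

// Wraps any MemoryStore and records who wrote what, and when.
class AuditedMemory implements MemoryStore {
  readonly log: AuditEntry[] = [];
  constructor(
    private inner: MemoryStore,
    private actor: string,
    private clock: () => number = Date.now,
  ) {}
  async set(key: string, value: unknown): Promise<void> {
    await this.inner.set(key, value); // only log writes that succeeded
    this.log.push({ op: "set", key, actor: this.actor, at: this.clock() });
  }
  get(key: string): Promise<unknown> {
    return this.inner.get(key);
  }
  list(prefix: string): Promise<Map<string, unknown>> {
    return this.inner.list(prefix);
  }
}
```

If the API changes at GA, the migration is one new class implementing `MemoryStore`; agent code, key conventions, and the audit log stay untouched.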
The cost of doing this during beta is small. The cost of doing it after a breaking change is the same migration twice.
FAQ
What is Cloudflare Agent Memory and how does it work?
A beta feature inside the Cloudflare Agents SDK that ships a key-value memory primitive backed by Durable Objects. Each agent instance gets its own DO with up to 128MB of working memory, accessed through set/get/list calls. Designed for memory tied to a single agent session.
What are the limits on Cloudflare Agent Memory?
Three real limits: the 128MB Durable Object memory ceiling per instance, concurrent-write throttling per DO (single-threaded write loop), and a per-account namespace limit that is not loudly documented. Retrieval quality degrades past about 30 facts because the primitive does not deduplicate or rank.
When should I use Cloudflare Agent Memory vs self-hosted?
Use CF Agent Memory for small POCs, single-tenant ephemeral session memory, and latency-sensitive workloads on Cloudflare Workers. Use self-hosted memory for multi-tenant SaaS, persistent cross-session memory above 1k facts, queryable lineage, or EU data residency.
Does Cloudflare Agent Memory support lineage or provenance?
No. The current beta is a flat key-value store with no first-class concept of where a memory came from, what it supersedes, or which agent wrote it.
Will the Cloudflare Agent Memory API change at GA?
Almost certainly. Cloudflare beta APIs have historically had at least one breaking change before GA, and the current memory primitive has several rough edges that will probably be addressed. Architect now so the memory access layer is a thin wrapper you can swap.