CrewAI Memory: Four Stores Behind One Flag (and the Three Walls You Hit in Production)
memory=True turns on ChromaDB short-term + entity memory, SQLite long-term learnings, and contextual assembly. Where each store lives on disk, why containers wipe it (CREWAI_STORAGE_DIR), the recency/semantic scoring knobs, and the three production walls: no user scoping, no shared memory, no lineage.
CrewAI memory looks like one boolean. Set memory=True on the Crew and your agents "remember." Behind that flag sit four distinct stores with different backends, different lifetimes, and different failure modes, and most production CrewAI memory problems trace back to treating the flag as the whole story. Here is what the switch actually turns on, where each piece lives on disk, and when to replace the built-ins with an external provider.
The TL;DR: memory=True gives you solid single-user, single-machine memory for free. Short-term and entity memory live in a local ChromaDB, long-term task learnings live in SQLite, and the pieces are stitched together automatically before every task. It does not give you multi-user scoping, shared memory across machines, or any way to audit why an agent recalled what it recalled. Those are the three walls teams hit in production, and all three are external-memory problems, not configuration problems.
What memory=True actually enables
from crewai import Crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    memory=True,
)
Four stores come online:
- Short-term memory: a ChromaDB collection queried by RAG. Holds what happened during the current execution (intermediate findings, tool outputs) so a later task can build on an earlier one without re-deriving it.
- Entity memory: also RAG over ChromaDB, but organized around the people, companies, and concepts the crew encountered. Ask the crew about a company it researched last task and this is the store that answers.
- Long-term memory: a SQLite database of task learnings that persists across executions. Run the same crew tomorrow and it carries lessons forward.
- Contextual memory: not a store, an orchestrator. Before each task runs, CrewAI queries the other stores, assembles the relevant slice, and injects it into the agent's context. You do not configure it; it is the thing the flag turns on.
Recent CrewAI releases present this through a unified Memory interface rather than four separately configured classes, but the underlying anatomy (a vector store for recall, SQLite for cross-run learnings, an assembly step before each task) is unchanged, and so are the operational consequences below.
Where your memory actually lives (and why it vanished)
The built-in stores write to a local app-data directory resolved per project. Three practical consequences account for a large share of "CrewAI forgot everything" reports:
- Containers wipe it. The default path is inside the container filesystem. Every fresh deploy starts the crew amnesiac unless you mount a volume or set CREWAI_STORAGE_DIR to a persistent path.
- Two machines, two memories. ChromaDB and SQLite are local files. Scale your crew horizontally and each instance learns alone.
- The embedder must stay consistent. Short-term and entity recall are embedding lookups. Change the embedding provider between runs and old memories become unsearchable: not deleted, just unreachable, which is worse because nothing errors.
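Because neither a missing volume nor a changed embedder raises an error on its own, one option is a small preflight check that runs before the crew is constructed. The sketch below is an assumption, not something CrewAI ships: the `preflight` function and the `embedder_fingerprint.json` marker file are hypothetical names, and the only CrewAI-specific detail it relies on is the `CREWAI_STORAGE_DIR` environment variable.

```python
import json
import os
from pathlib import Path

# Hypothetical marker file; CrewAI itself does not write or read this.
FINGERPRINT_FILE = "embedder_fingerprint.json"


def preflight(storage_dir: str, embedder_model: str) -> Path:
    """Pin the storage dir and refuse to start if the embedder changed."""
    root = Path(storage_dir)
    root.mkdir(parents=True, exist_ok=True)
    if not os.access(root, os.W_OK):
        raise RuntimeError(f"{root} is not writable; memory would not persist")

    # Set this before the crew is constructed so the stores resolve to this path.
    os.environ["CREWAI_STORAGE_DIR"] = str(root)

    marker = root / FINGERPRINT_FILE
    if marker.exists():
        recorded = json.loads(marker.read_text())["model"]
        if recorded != embedder_model:
            raise RuntimeError(
                f"embedder changed from {recorded!r} to {embedder_model!r}; "
                "existing memories would be unreachable"
            )
    else:
        # First run at this path: record which embedder wrote the memories.
        marker.write_text(json.dumps({"model": embedder_model}))
    return root
```

Calling `preflight("/var/lib/crewai", "text-embedding-3-small")` at startup turns a silent recall failure into a loud one; the model name here is only an example value.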
Tuning recall: the scoring knobs exist
CrewAI's memory scoring weighs semantic similarity against recency and importance, and the weights are configurable: recency_weight, semantic_weight, importance_weight, and recency_half_life_days. The defaults are sane for short-lived crews. For long-running crews where last month's decision still matters, lengthen the half-life; for news-style workloads where stale context misleads, shorten it. Most teams never touch these, and then attribute "it recalled something irrelevant from three weeks ago" to the model rather than to a recency weight they could have set.
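To see why the half-life matters, here is a minimal sketch of how such weights could combine. It assumes a linear blend of the three signals and an exponential recency decay; that formula is an assumption for illustration, and the weight values are placeholders rather than CrewAI's defaults. Only the four parameter names come from the text above.

```python
from dataclasses import dataclass


@dataclass
class ScoringWeights:
    # Placeholder values for illustration, not CrewAI's defaults.
    recency_weight: float = 0.3
    semantic_weight: float = 0.5
    importance_weight: float = 0.2
    recency_half_life_days: float = 30.0


def recency_decay(age_days: float, half_life_days: float) -> float:
    """Exponential decay: 1.0 when fresh, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def composite_score(similarity: float, importance: float, age_days: float,
                    w: ScoringWeights) -> float:
    """Assumed linear blend of three signals, each expected in [0, 1]."""
    return (
        w.semantic_weight * similarity
        + w.recency_weight * recency_decay(age_days, w.recency_half_life_days)
        + w.importance_weight * importance
    )


# A 21-day-old memory that matches strongly vs. a fresh memory that matches less well.
old_strong = dict(similarity=0.9, importance=0.5, age_days=21)
fresh_weak = dict(similarity=0.7, importance=0.5, age_days=0)

for half_life in (7.0, 90.0):
    w = ScoringWeights(recency_half_life_days=half_life)
    print(half_life,
          round(composite_score(w=w, **old_strong), 3),
          round(composite_score(w=w, **fresh_weak), 3))
```

Under these assumptions, the fresh memory outranks the older one at a 7-day half-life, and the order flips at 90 days. That flip is the behavior the knob controls.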
The three production walls
1. No user scoping
The built-in stores have no concept of whose memory a fact belongs to. One crew serving ten customers pools everything, which ranges from confusing (user B gets user A's context) to disqualifying (regulated data crossing tenant lines). External providers fix this with explicit scopes: Mem0's integration, for example, attaches every write to a user_id, agent_id, run_id, or org-level scope. Whatever provider you choose, per-user namespacing is the feature to verify first.
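The property to look for in a provider can be shown with a toy store. This sketch is purely illustrative: `ScopedMemory` is a hypothetical class, and substring matching stands in for vector search. The point is the shape of the interface, where no read or write is possible without a `user_id`.

```python
from collections import defaultdict


class ScopedMemory:
    """Toy in-process store where every read and write requires a user_id."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = defaultdict(list)

    def add(self, user_id: str, fact: str) -> None:
        self._facts[user_id].append(fact)

    def search(self, user_id: str, term: str) -> list[str]:
        # Only this user's partition is scanned; there is no unscoped search.
        return [f for f in self._facts.get(user_id, []) if term.lower() in f.lower()]


memory = ScopedMemory()
memory.add("user-a", "Prefers quarterly billing")
memory.add("user-b", "Billing contact is finance@example.com")

print(memory.search("user-a", "billing"))  # user A's fact only
print(memory.search("user-b", "billing"))  # user B's fact only
```

A provider passes the test if the equivalent of `search` cannot be called without a scope, rather than merely accepting one as an optional filter.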
2. No shared memory across processes
Local files mean the crew on machine A cannot know what the crew on machine B learned. Any external memory provider with an API solves this by being a service instead of a file.
3. No lineage
When a crew confidently recalls something wrong, the built-ins give you no way to ask where that "fact" came from: which task wrote it, from which source, superseded by what. You can wipe the store and start over; you cannot audit it. This is the gap we keep writing about: recall quality is not just hit-rate, it is the ability to explain and correct what was recalled.
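What lineage means in practice can be sketched as a record shape. Everything below is hypothetical (the `MemoryRecord` fields and the `lineage` helper are illustrative names, not any provider's schema); it simply encodes the three questions above as fields and shows how a supersedes chain is walked.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    record_id: str
    fact: str
    written_by_task: str              # which task wrote it
    source: str                       # where the task got it (URL, tool, document)
    supersedes: Optional[str] = None  # record_id this fact replaces, if any
    written_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def lineage(records: dict[str, MemoryRecord], record_id: str) -> list[MemoryRecord]:
    """Walk the supersedes chain from the given record back to the original."""
    chain: list[MemoryRecord] = []
    seen: set[str] = set()
    current: Optional[str] = record_id
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        rec = records[current]
        chain.append(rec)
        current = rec.supersedes
    return chain


records = {
    "m1": MemoryRecord("m1", "Acme has 200 employees", "research_task",
                       "https://example.com/acme-2023"),
    "m2": MemoryRecord("m2", "Acme has 350 employees", "research_task",
                       "https://example.com/acme-2024", supersedes="m1"),
}
for rec in lineage(records, "m2"):
    print(rec.record_id, rec.written_by_task, rec.source)
```

With records like these, a wrong recall becomes a question you can answer (which task, which source, what it replaced) instead of a reason to wipe the store.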
Operational habits that prevent the common failures
The configuration that survives production is short, and almost all of it addresses the disk-and-embedder facts above:
# Pin where memory lives BEFORE the first run
export CREWAI_STORAGE_DIR=/var/lib/crewai # mounted volume in containers
# Reset stores deliberately, not by deleting directories
crewai reset-memories --long # long-term SQLite only
crewai reset-memories --short # short-term ChromaDB
crewai reset-memories --all # everything
- Set the storage dir on day one. Moving it later strands the memories accumulated at the old path; the crew does not error, it just gets quietly dumber.
- Use the reset commands, not rm -rf. The CLI resets stores coherently; deleting files by hand can leave ChromaDB collections and SQLite learnings out of sync with each other.
- Pin the embedder in config, not by default. An innocent provider upgrade that changes embedding dimensions orphans every existing memory. Treat the embedding model like a schema version, because that is what it is.
- Back up the storage dir if learnings matter. The long-term store is the crew's accumulated judgment about its own tasks. Teams snapshot Postgres nightly and never think about the SQLite file that holds weeks of task learnings.
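For the backup habit, copying a SQLite file while a crew may be writing to it is safer through SQLite's online backup API than through a plain file copy. The sketch below uses only Python's standard `sqlite3` module; the `snapshot_sqlite` helper is a hypothetical name, and the database path is passed in because the exact file name inside your storage dir should be checked rather than assumed.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def snapshot_sqlite(db_path: str, backup_dir: str) -> Path:
    """Copy a SQLite file with the online backup API and return the copy's path."""
    src_path = Path(db_path)
    if not src_path.is_file():
        # Fail instead of letting sqlite3.connect create an empty database.
        raise FileNotFoundError(src_path)

    out_dir = Path(backup_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest_path = out_dir / f"{src_path.stem}-{stamp}{src_path.suffix}"

    src = sqlite3.connect(str(src_path))
    dest = sqlite3.connect(str(dest_path))
    try:
        src.backup(dest)  # produces a consistent copy of the source database
    finally:
        dest.close()
        src.close()
    return dest_path
```

Run it from the same scheduler that snapshots your other databases. The ChromaDB directory needs its own copy step, which this sketch does not cover.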
What the built-ins cost (and what external memory adds)
- memory=True: free at the infrastructure level, but not at the token level. Contextual assembly injects recalled material into every task prompt, and entity/short-term writes run embedding calls. A busy crew's memory overhead is real model spend; if your per-run cost jumped when you flipped the flag, that is where it went.
- Mem0 hosted: per-operation pricing on writes and recalls; generous to start, meaningful at fleet scale.
- Self-hosted external memory (memnode or similar): a small VPS or a process next to the crew; the marginal cost of recall is disk, not API fees.
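A back-of-the-envelope estimate makes the token cost of the built-ins concrete. The function below is a hypothetical helper and every input is a placeholder: measure how many recalled tokens your own prompts carry and substitute your provider's actual price. It counts only injected prompt tokens, not embedding calls.

```python
def memory_token_overhead(tasks_per_run: int, injected_tokens_per_task: int,
                          runs_per_day: int,
                          usd_per_million_input_tokens: float) -> float:
    """Daily spend attributable to recalled context injected into task prompts."""
    tokens_per_day = tasks_per_run * injected_tokens_per_task * runs_per_day
    return tokens_per_day / 1_000_000 * usd_per_million_input_tokens


# Placeholder inputs, not measurements.
daily = memory_token_overhead(tasks_per_run=5, injected_tokens_per_task=800,
                              runs_per_day=200, usd_per_million_input_tokens=3.0)
print(f"${daily:.2f} per day")
```

With these placeholder numbers the injected context alone comes to roughly $2.40 a day, which is the kind of line item that explains a per-run cost jump after flipping the flag.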
The decision map
- Single-user crew, one machine, non-sensitive data: memory=True, set CREWAI_STORAGE_DIR to a persistent volume, pin your embedder. Done.
- Long-running crews with drifting recall: same setup, tune the recency/semantic weights before blaming the model.
- Multiple users or tenants: external memory with per-user scoping (Mem0 integration or an HTTP-reachable memory service). The built-ins are the wrong shape, and no configuration fixes that.
- Fleet of crews sharing knowledge: external memory as a service, treated as infrastructure with backups and access control.
Memory is not Knowledge (CrewAI has both, and people swap them)
CrewAI also ships a separate Knowledge feature (sources you attach so agents can consult documents), and it gets confused with memory constantly because both end up as embeddings the agent queries. The distinction that matters: Knowledge is what you tell the crew; memory is what the crew learns. Product docs, policy PDFs, and reference tables belong in Knowledge, where you control versions and updates. Task outcomes, discovered entities, and execution learnings belong in memory, where the crew writes them itself. The common mistake runs both directions: teams stuff reference material into conversations hoping memory retains it (it does, but badly, as fragments), or treat the memory stores as a document base and wonder why retrieval feels arbitrary. If a human would look it up, make it Knowledge; if a human would remember it from experience, that is memory's job.
The shape memnode optimizes for
Memnode slots into CrewAI the same way any external provider does, over HTTP from a tool or a task callback, and the differences that matter are the three walls above: namespaces scope memory per user or per crew out of the box, every stored fact carries lineage you can audit when recall goes wrong, and the data plane runs local-first, on your VPS or your laptop, so crew memory containing customer data never leaves your infrastructure. The honest default for a solo crew remains the built-in flag; reach for an external layer when the second user, the second machine, or the first "why did it say that?" arrives.
The questions that come up every time
Does memory work with kickoff_for_each or flows? Yes, the stores do not care how the crew was invoked, but parallel kickoffs against local SQLite and ChromaDB share one set of files, so concurrent writes contend and per-run isolation does not exist. That is another doorway to external memory, not a bug to fix in config.
Can I read what the crew remembered? The stores are inspectable in the bluntest sense (open the SQLite file, query the Chroma collection), but there is no first-class "show me this crew's beliefs" surface. If auditability is a requirement rather than a curiosity, that is the lineage wall again.
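For the blunt kind of inspection, a generic look at the SQLite file needs nothing beyond the standard library. The `describe_sqlite` helper below is a hypothetical name and assumes nothing about CrewAI's table layout: it opens the file read-only and reports whatever tables it finds.

```python
import sqlite3
from pathlib import Path


def describe_sqlite(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for each user table in a SQLite file."""
    # Open read-only so inspection can never modify the crew's store.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
    finally:
        conn.close()
```

Point it at the database file in your storage dir to see which tables exist and how much has accumulated, then query the interesting table directly.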
Should agents or the crew own memory? The flag is crew-level and the stores are shared by design: the researcher's findings are supposed to reach the writer. Genuinely private per-agent memory means separate external namespaces, one per agent, which external providers make a one-line scoping decision.