Benchmark Framework · Market Position · Proof of Value
System-Recall is a self-hosted AI memory engine that persists context across every session, retrieves the right knowledge at the right moment, and continuously benchmarks and repairs itself — no human intervention required.
A fully observable AI memory platform — live dashboards, real-time benchmarking, knowledge graph visualization
Real-time system telemetry across every layer of the memory stack. At a glance: 12,149 stored memories, 53,945 graph relationships, 5 services (Qdrant, Neo4j, Redis, Postgres, Ollama) all green. Active LLM models, their sizes, quantization, and load state visible in one view.
The world's only memory system that ships with a reproducible benchmark suite you can run against your own data. +62% memory lift (ΔGCR), 92% accuracy, 0% stale entity rate — these aren't lab numbers, they're live measurements from this run (mb-20260224-181412). Every category broken down: Distractor, KG, Long Horizon, Preference, Procedural, SER Semantic, Temporal.
Deep analytics on the quality of the AI's memory over time. Feedback score 70.8% positive. Graph cohesion 0.862 (strong edge bonds). Importance distribution histogram showing how memories are weighted. Decay curve tracking how memories age. This is memory science visualized — insight no competing product can provide.
Interactive force-directed graph of the entire knowledge network — 50+ nodes, color-coded by layer (API Routes, Core Logic, Storage, Workers, Dashboard, Hooks). See which modules call which, how data flows between components, what hooks trigger what logic. This is the AI's memory architecture made visible. Filter by layer, zoom in on clusters, inspect relationships. No other AI memory product lets you see inside the machine.
The case for provable memory over claimed memory
Competitors make big claims based on shallow demonstrations. "My AI remembered something once" is not proof. System-Recall takes a different approach: a rigorous, reproducible benchmark suite — MemoryBench-6 — that measures memory performance the same way a laboratory measures drug efficacy. Define the test, run it, publish the results, let others verify.
Episodic, semantic, procedural, and temporal — each with dedicated storage and retrieval behavior. No competitor implements all four.
Vector semantic + BM25 keyword + Neo4j graph traversal run simultaneously on every query. Most competitors use only semantic search.
10–50ms retrieval. Competitors calling LLMs at query time pay 500ms–3s per search and incur per-token costs at scale.
`captured_at` + `valid_from`/`valid_to` — answer "what did we know on January 15th?"
Usage-driven memory ranking, nightly consolidation, contradiction detection, and unsupervised graph connection discovery.
MemoryBench-6 acts as a control system. Performance drops trigger automatic diagnosis, patch generation, and validated deployment.
Every metric maps to a real user need. Every target is a measurable pass/fail gate.
50 scenarios across six distinct memory challenges — each proves a different capability
Tests whether memory correctly updates when facts change. A user moves cities, changes a preference, or revises an architecture decision — does the AI know the new fact without being told again?
Tests whether preferences stated once are honored indefinitely across sessions. Preferences are implicitly relevant to almost every interaction — a fundamentally different retrieval challenge than explicit factual queries.
Three-session arcs where early architectural decisions must influence later deployment tasks — even when middle sessions don't reference them. Tests cross-session retrieval triggering for topically distant queries.
Type A: irrelevant personal facts must not surface during technical queries. Type B: sensitive stored records must not be revealed by crafted prompts. Tests both signal-to-noise ratio and privacy boundaries.
Chains of 3–5 connected facts where the answer requires traversing graph relationships that semantic similarity cannot surface. "What infrastructure does our deployment process require?" — correct answer is 3 graph hops from the query.
Established workflows tested across sessions — debugging process, code review checklist, deployment runbook. The AI must recognize when a procedure applies from a trigger that may not name it, and replay all steps in order.
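For concreteness, a scenario in a suite like this can be expressed as structured data. The sketch below is hypothetical (the field names and layout are assumptions, not MemoryBench-6's actual schema), using the KG category's 3-hop example:

```python
# Hypothetical sketch of a KG (multi-hop) benchmark scenario.
# Field names are illustrative, not MemoryBench-6's actual schema.
scenario = {
    "id": "kg-017",
    "category": "KG",
    "setup": [  # facts seeded across earlier sessions
        "Our deployment process uses the deploy.sh script.",
        "deploy.sh pushes images to the internal registry.",
        "The internal registry runs on the k8s-infra cluster.",
    ],
    "query": "What infrastructure does our deployment process require?",
    "expected": ["k8s-infra cluster"],  # reachable only via 3 graph hops
    "must_not_surface": [],             # distractor scenarios populate this
}
print(scenario["query"])
```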
Every major AI memory system, feature by feature
| Feature | Mem0 | Zep | Letta | Cognee | LangChain | OpenAI | System-Recall |
|---|---|---|---|---|---|---|---|
| Funding | $24.5M Series A | $500K YC W24 | $10M seed | €7.5M seed | $260M total | N/A | Self-funded |
| Memory Types | Semantic only | Session + semantic | Episodic + semantic | Vector + graph | Buffer + vector | Semantic only | All 4 types ✦ |
| Knowledge Graph | $249/mo only | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ Included |
| Triple Hybrid Retrieval | ✗ | Partial | ✗ | Partial | ✗ | ✗ | ✓ Full |
| Zero LLM at Query Time | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ 10–50ms |
| Bi-Temporal Model | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Active Forgetting | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Self-Learning Engine | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Benchmark Self-Healing | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| 4-Tier Context Budget | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ Phase 2 |
| Temporal Arbitration | Limited | ✓ Graphiti | ✗ | ✗ | ✗ | ✗ | ✓ Supersedes chain |
| Nightly Decay + Reranking | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Hierarchical Compression | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ 4 levels |
| Predictive Context Loading | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ Phase 2 |
| Signal Detection (auto-capture) | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ Configurable |
| Code Pattern Memory (AST) | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ Phase 2 |
| Multi-Agent Shared State | Limited | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Multi-User Scoping | Limited | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ 3 levels |
| Self-Hosted First-Class | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ Primary |
| LLM Agnostic | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Entry Paid Price | $19/mo (no graph) | $25/mo | $20/user/mo | €8.50/1M tokens | $39/user/mo | N/A | $49/mo · 3 seats |
| Graph Memory Price | $249/mo | Included | N/A | Included | N/A | N/A | $49/mo (Starter+) |
What each capability actually does, why it matters, and why competitors don't have it.
Every competitor focuses primarily on semantic memory — storing facts and retrieving them by similarity. Recall implements four distinct types, each with dedicated storage and retrieval behavior:
Questions about processes, history, evolving facts, and connected entities all have dedicated memory types optimized for their retrieval pattern.
When a query arrives, Recall runs three independent retrieval methods simultaneously and merges the results: vector semantic search (Qdrant), BM25 keyword search, and Neo4j graph traversal.
No competitor runs all three. Most run only semantic search. This triple approach surfaces the right memory across all query types.
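As an illustration of the merge step, here is a minimal Python sketch using reciprocal rank fusion. RRF is an assumed fusion strategy; the source does not specify how Recall actually merges the three result lists:

```python
from collections import defaultdict

def fuse(vector_hits, bm25_hits, graph_hits, k=60):
    """Merge three independently ranked hit lists (memory IDs, best first)
    with reciprocal rank fusion. RRF is assumed here for illustration."""
    scores = defaultdict(float)
    for hits in (vector_hits, bm25_hits, graph_hits):
        for rank, mem_id in enumerate(hits):
            scores[mem_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# A memory surfaced by all three retrievers outranks one found by only one.
print(fuse(["m1", "m2"], ["m2", "m3"], ["m2", "m4"]))  # 'm2' ranks first
```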
Every retrieval — session start injection, mid-session search, context assembly — uses only pre-computed embeddings and graph traversal. No LLM is called during retrieval. The result: query latency of 10–50ms.
Competitors that call an LLM to rerank, synthesize, or process memories at query time incur 500ms–3,000ms of latency on every single query. At scale, this is a meaningful UX difference and a significant cost difference — LLM calls are expensive; vector search is cheap.
LLM calls in Recall happen only at write time (fact extraction from transcripts) — never at read time.
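A toy sketch of that split, with stand-in helpers (none of these names are Recall's API). The extraction stub in the write path is where the single LLM call would sit; the read path touches only pre-computed vectors:

```python
# Illustrative write/read split; every name here is a hypothetical stand-in.
MEMORIES: list[tuple[str, list[float]]] = []   # (fact, embedding)

def embed(text: str) -> list[float]:
    """Stub for the local embedding model (no network call, no LLM)."""
    return [text.lower().count(c) / max(len(text), 1) for c in "aeiou"]

def write(transcript: str) -> None:
    """Write path: the single LLM call (fact extraction) belongs here."""
    facts = [s.strip() for s in transcript.split(".") if s.strip()]  # stands in for LLM extraction
    MEMORIES.extend((f, embed(f)) for f in facts)

def read(query: str) -> str:
    """Read path: similarity over pre-computed vectors only, no LLM."""
    qv = embed(query)
    def cos(v: list[float]) -> float:
        num = sum(a * b for a, b in zip(qv, v))
        den = (sum(a * a for a in qv) * sum(b * b for b in v)) ** 0.5 or 1.0
        return num / den
    return max(MEMORIES, key=lambda m: cos(m[1]))[0]

write("We chose PostgreSQL. The API runs on port 8200.")
print(read("which database did we choose"))
```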
Most memory systems store one timestamp: when the memory was saved. Recall stores two:
- `captured_at` — when the memory was stored in the system
- `valid_from` / `valid_to` — when the fact was actually true in the real world

This enables a class of queries no competitor can answer: "What did we know about this project on January 15th?" — "How has our understanding of the auth system evolved?" — "What changed between v1 and v2 architecture?" Critical for any use case involving evolving information: software projects, medical records, legal files, industrial systems.
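A minimal sketch of the bi-temporal filter, using SQLite for illustration. The table layout is an assumption based on the field names above, and the date semantics mirror the `recall_search(..., as_of=...)` call shown later in this document:

```python
import sqlite3

# Bi-temporal sketch: every memory carries two time axes.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE memories (
    fact TEXT, captured_at TEXT, valid_from TEXT, valid_to TEXT)""")
db.executemany("INSERT INTO memories VALUES (?,?,?,?)", [
    ("Auth uses JWT",      "2025-01-02", "2025-01-02", "2025-02-10"),
    ("Auth uses sessions", "2025-02-10", "2025-02-10", None),
])

def as_of(date: str) -> list[tuple]:
    """'What did we know on <date>?' -- both time axes constrain the answer."""
    return db.execute("""
        SELECT fact FROM memories
        WHERE captured_at <= ?                      -- already stored by then
          AND valid_from <= ?                       -- already true by then
          AND (valid_to IS NULL OR valid_to > ?)    -- not yet superseded
    """, (date, date, date)).fetchall()

print(as_of("2025-01-15"))  # [('Auth uses JWT',)]
```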
Recall's Neo4j integration enables relational memory. Entities are linked. Relationships are traversable.
The CQS benchmark metric specifically tests this capability. It is one of the highest-value differentiators for enterprise customers with complex, interconnected data — and it's hard-paywalled at $249/mo in Mem0.
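A hedged sketch of the multi-hop question above, written against the neo4j Python driver. The node labels and relationship pattern are hypothetical, not Recall's actual graph schema:

```python
from neo4j import GraphDatabase

# Hypothetical schema, e.g.
# (:Process)-[:USES]->(:Tool)-[:PUSHES_TO]->(:Registry)-[:RUNS_ON]->(:Cluster)
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def infrastructure_for(process_name: str) -> list[str]:
    """Answer 'what infrastructure does X require?' by walking up to 3 hops,
    a path that semantic similarity alone cannot surface."""
    cypher = """
        MATCH (p:Process {name: $name})-[*1..3]->(target)
        WHERE any(l IN labels(target) WHERE l IN ['Registry', 'Cluster'])
        RETURN DISTINCT target.name AS name
    """
    with driver.session() as session:
        return [record["name"] for record in session.run(cypher, name=process_name)]

print(infrastructure_for("deployment"))
```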
Recall runs a nightly consolidation job that does three things no other system does: it promotes useful memories and demotes stale ones, removes superseded facts, and discovers hidden connections in the knowledge graph.
No competitor has a scheduled consolidation process. They accumulate. Recall prunes.
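A compact sketch of one nightly pass, assuming per-memory importance, last-citation, and supersession fields (all names are illustrative):

```python
import datetime as dt

def nightly_consolidate(memories: list[dict], today: dt.date,
                        decay_rate: float = 0.02, floor: float = 0.1) -> list[dict]:
    """One pass: decay memories by idle time, forget those below the floor,
    remove superseded facts. Field names are illustrative, not Recall's schema."""
    kept = []
    for m in memories:
        if m["superseded_by"]:                # superseded fact: remove
            continue
        idle_days = (today - m["last_cited"]).days
        m["importance"] *= (1 - decay_rate) ** idle_days   # exponential decay
        if m["importance"] >= floor:          # below the floor: forget
            kept.append(m)
    return kept

mems = [
    {"fact": "uses sessions", "importance": 0.9,
     "last_cited": dt.date(2026, 2, 20), "superseded_by": None},
    {"fact": "uses JWT", "importance": 0.9,
     "last_cited": dt.date(2025, 6, 1), "superseded_by": "uses sessions"},
]
print([m["fact"] for m in nightly_consolidate(mems, dt.date(2026, 2, 24))])
# ['uses sessions']
```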
Recall tracks three events for every memory retrieval: retrieved (returned in search), cited (influenced a response), and ignored (returned but not used).
Memories with high citation rates are promoted. Memories repeatedly retrieved but ignored are candidates for demotion or deletion. This is a feedback loop between memory usefulness and memory visibility — without requiring any model retraining.
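The loop reduces to event counting plus a rank multiplier. A sketch with assumed thresholds:

```python
from collections import Counter

class UsageTracker:
    """Counts retrieved/cited/ignored events per memory and turns the
    citation rate into a rank multiplier. Thresholds are illustrative."""
    def __init__(self) -> None:
        self.events: dict[str, Counter] = {}

    def record(self, memory_id: str, event: str) -> None:
        assert event in {"retrieved", "cited", "ignored"}
        self.events.setdefault(memory_id, Counter())[event] += 1

    def boost(self, memory_id: str) -> float:
        c = self.events.get(memory_id, Counter())
        rate = c["cited"] / (c["retrieved"] or 1)     # citation rate
        if rate > 0.5:
            return 1.25                               # promote
        if c["retrieved"] >= 10 and rate < 0.1:
            return 0.5                                # demotion/deletion candidate
        return 1.0

t = UsageTracker()
for _ in range(4):
    t.record("m1", "retrieved")
for _ in range(3):
    t.record("m1", "cited")
print(t.boost("m1"))  # 1.25
```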
When new memories are stored, Recall runs a background scan for semantic conflicts with existing memories. If a contradiction is found (new memory says X, existing memory says not-X), it is flagged and stored in a dedicated contradictions table with status, detected timestamp, and resolution options.
Resolutions can be automatic (newer fact wins) or human-reviewed (when both facts may be valid in different contexts). This prevents the memory store from silently accumulating conflicting information — one of the most common failure modes in long-running AI systems.
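A sketch of the flag-and-store step. The `contradicts` function is a stand-in for the cached LLM check, and the record layout is assumed:

```python
import datetime as dt
from dataclasses import dataclass, field

@dataclass
class Contradiction:
    new_fact: str
    old_fact: str
    status: str = "open"   # e.g. open | auto_resolved | needs_review
    detected_at: str = field(
        default_factory=lambda: dt.datetime.now(dt.timezone.utc).isoformat())

def contradicts(a: str, b: str) -> bool:
    """Stand-in for the cached LLM conflict check (toy negation test)."""
    return ("not " in a) != ("not " in b) and a.replace("not ", "") == b.replace("not ", "")

def scan(new_fact: str, existing: list[str], table: list[Contradiction]) -> None:
    for old in existing:                      # in practice: semantic neighbors only
        if contradicts(new_fact, old):
            table.append(Contradiction(new_fact, old))

table: list[Contradiction] = []
scan("the API is not public", ["the API is public"], table)
print(table[0].status, "|", table[0].old_fact)   # open | the API is public
```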
Recall does not wait for you to manually save memories. A configurable set of signal keywords — remember, decided, architecture, important, don't forget, note to self, bug fix, breaking change — trigger immediate memory capture mid-session.
When a session ends, the full transcript is automatically processed and relevant facts are extracted. This is the difference between a tool you have to maintain and infrastructure that maintains itself.
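A minimal trigger scanner built from the keyword list above (the `capture` callback is a placeholder):

```python
import re

SIGNALS = ["remember", "decided", "architecture", "important",
           "don't forget", "note to self", "bug fix", "breaking change"]
PATTERN = re.compile("|".join(re.escape(s) for s in SIGNALS), re.IGNORECASE)

def on_message(text: str, capture) -> None:
    """Fire an immediate mid-session memory capture on any signal keyword."""
    if PATTERN.search(text):
        capture(text)

captured: list[str] = []
on_message("We decided to move auth to sessions.", captured.append)
print(captured)  # ['We decided to move auth to sessions.']
```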
Phase 2 replaces simple injection with a tiered budget system allocating 8,000 tokens across four priority levels that rebalance dynamically:
| Tier | Budget | Contents |
|---|---|---|
| Critical (T1) | 2,000 | Active decisions, unresolved contradictions, recent errors |
| Relevant (T2) | 3,000 | Predicted context based on current activity, related decisions |
| Background (T3) | 2,000 | Project conventions (compressed), workflow hints |
| Index (T4) | 1,000 | Pointers to retrievable deep context ("ask me about: auth...") |
Debugging expands T1 and shrinks T3. New feature work expands T2. The most relevant memory always occupies the most prominent position.
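The rebalancing rule sketched in code, with shift sizes assumed from the examples in this paragraph:

```python
BASE = {"T1": 2000, "T2": 3000, "T3": 2000, "T4": 1000}   # 8,000-token budget

def rebalance(activity: str) -> dict[str, int]:
    """Shift tokens between tiers by activity. Shift sizes are illustrative."""
    budget = dict(BASE)
    if activity == "debugging":        # errors dominate: grow T1, shrink T3
        budget["T1"] += 1000
        budget["T3"] -= 1000
    elif activity == "new_feature":    # related context dominates: grow T2
        budget["T2"] += 1000
        budget["T4"] -= 1000
    assert sum(budget.values()) == 8000
    return budget

print(rebalance("debugging"))   # {'T1': 3000, 'T2': 3000, 'T3': 1000, 'T4': 1000}
```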
Every memory exists at four compression levels simultaneously. Recall selects the appropriate level based on tier and budget:
| Level | Example | Tokens |
|---|---|---|
| Full narrative | "The team decided to replace JWT with session auth after discovering token size issues with our edge proxy. Fix took 3 days and changed 12 files." | ~60 |
| Summary | "Replaced JWT with sessions due to token size issues" | ~10 |
| Compressed | auth: JWT→sessions (size) | ~5 |
| Index | auth_refactor | ~1 |
Deeply relevant memories get full detail. Background context gets index pointers. Maximum information density within the context budget.
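Selection then becomes a lookup: return the richest level whose token cost fits the remaining tier budget. A sketch using the example rows above:

```python
LEVELS = ["full", "summary", "compressed", "index"]   # richest first

def render(memory: dict, remaining_tokens: int) -> str:
    """Return the richest representation that fits the remaining budget."""
    for level in LEVELS:
        text, cost = memory[level]
        if cost <= remaining_tokens:
            return text
    return ""                                         # nothing fits: skip it

auth = {
    "full": ("The team decided to replace JWT with session auth after ...", 60),
    "summary": ("Replaced JWT with sessions due to token size issues", 10),
    "compressed": ("auth: JWT->sessions (size)", 5),
    "index": ("auth_refactor", 1),
}
print(render(auth, 12))   # the ~10-token summary is the richest fit
```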
The Predictive Context Engine pre-loads memory before it is needed, based on activity signals:
| Signal | Prediction | Action |
|---|---|---|
| File opened | Related memories for this module | Pre-load to T2 |
| Branch checkout | Feature context for this branch | Load decisions, patterns |
| Error in console | Similar errors and their fixes | Load fix patterns to T1 |
| Collaborator joined | Their preferences and constraints | Load to T2 |
Context is warm before the first query — not assembled after the first miss.
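The table above reduces to a dispatch from activity signal to pre-load action. A sketch with hypothetical handler names:

```python
from typing import Callable

def preload(tier: str, what: str) -> None:
    """Stand-in for fetching memories into a context tier."""
    print(f"pre-loading {what!r} into {tier}")

HANDLERS: dict[str, Callable[[str], None]] = {
    "file_opened":     lambda arg: preload("T2", f"memories for module {arg}"),
    "branch_checkout": lambda arg: preload("T2", f"decisions for branch {arg}"),
    "console_error":   lambda arg: preload("T1", f"fix patterns for {arg}"),
    "collab_joined":   lambda arg: preload("T2", f"preferences of {arg}"),
}

def on_signal(kind: str, arg: str) -> None:
    if kind in HANDLERS:
        HANDLERS[kind](arg)   # context is warm before the first query

on_signal("console_error", "KeyError: 'session_id'")
```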
Recall does not just remember facts — it remembers how things are done in this specific project. Code patterns are stored as AST-aware templates: structured YAML with placeholders that can be applied when creating new endpoints, components, or configurations.
Workflows are stored as step sequences that can be replayed: `debug: reproduce → capture logs → minimize → patch → rerun suite`. This is procedural memory operationalized — not just "we wrote about a process" but "here is the process, structured so the AI can apply it."
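A sketch of that debug workflow as a replayable template. The YAML layout is an assumption, shown loaded via PyYAML:

```python
import yaml   # pip install pyyaml

# The debug workflow above as a structured, replayable template.
# This layout is illustrative, not Recall's actual storage format.
WORKFLOW = yaml.safe_load("""
name: debug
trigger: test failure or runtime error
steps:
  - reproduce
  - capture logs
  - minimize
  - patch
  - rerun suite
""")

def replay(workflow: dict) -> None:
    """Apply the stored procedure step by step, in order."""
    for i, step in enumerate(workflow["steps"], 1):
        print(f"{workflow['name']} step {i}: {step}")

replay(WORKFLOW)
```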
Recall works with any LLM endpoint that supports OpenAI-compatible APIs: local Ollama (qwen3:14b, any open model), OpenAI, Anthropic, or any compatible provider. The embedding model is similarly pluggable. No dependency on a specific provider.
Significant for enterprise customers with existing LLM contracts, air-gapped environments, or specific model requirements. Also significant for self-hosted deployments where local models are the only acceptable option.
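In practice, swapping providers is a base-URL change. For example, the official openai client pointed at a local Ollama server (Ollama serves an OpenAI-compatible API under /v1; adjust host and model to your deployment):

```python
from openai import OpenAI

# Ollama ignores the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="qwen3:14b",   # any locally pulled model works here
    messages=[{"role": "user",
               "content": "Extract durable facts: we moved auth to sessions."}],
)
print(resp.choices[0].message.content)
```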
Memory in Recall is scoped at three levels:
| Scope | Contents | Example |
|---|---|---|
| `user-private` | Individual preferences | "I prefer explicit error handling over try-catch" |
| `project-shared` | Team decisions | "We are using PostgreSQL, not MySQL — final decision" |
| `system` | Infrastructure knowledge | "API runs on port 8200, deployed to 192.168.50.19" |
Multiple developers share architectural decisions while keeping personal workflow preferences private. Team-level memory, not just individual memory.
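Scope then acts as a mandatory filter on every retrieval. A sketch with assumed field names:

```python
def visible_to(memory: dict, user: str, project: str) -> bool:
    """A memory is retrievable only if its scope admits the caller."""
    scope = memory["scope"]
    if scope == "system":
        return True
    if scope == "project-shared":
        return memory["project"] == project
    return memory["owner"] == user            # user-private

mems = [
    {"scope": "user-private", "owner": "alice", "project": "recall",
     "fact": "prefers explicit error handling"},
    {"scope": "project-shared", "owner": "bob", "project": "recall",
     "fact": "PostgreSQL, final decision"},
]
print([m["fact"] for m in mems if visible_to(m, user="bob", project="recall")])
# ['PostgreSQL, final decision'] -- alice's private preference stays hidden
```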
Every competitor except Letta is a SaaS product — your memory data lives on their servers. For enterprise customers handling sensitive information (source code, customer data, internal strategy, personnel information), this is often a showstopper.
Recall is designed from the ground up to run on your own infrastructure. Docker Compose deployment, Proxmox LXC support, and a full self-hosted stack are the primary deployment model — not an afterthought. As data privacy regulation tightens globally (GDPR, HIPAA, CCPA, emerging AI regulation), self-hosted becomes increasingly non-negotiable for regulated industries.
Recall's relationship with MemoryBench-6 is bidirectional. The benchmark results feed back into the system: when a category of tests shows declining performance, the tuner adjusts retrieval weights, decay parameters, injection thresholds, and conflict resolution policies.
The benchmark is not just a quality gate — it is a closed-loop feedback system for continuous, measurable improvement without requiring manual intervention.
Where persistent AI memory changes everything — and where data sovereignty is non-negotiable.
System-Recall is infrastructure — the same way a database stores structured records, Recall stores the context that makes AI useful across sessions, users, and deployments. Any system that communicates with an AI model is a potential Recall integration. The differentiator is data sovereignty: in regulated, sensitive, or competitive environments, the memory layer cannot be hosted by a third party.
AI assistants that retain patient history, medication records, care plans, and clinical notes across visits — without any PHI leaving the hospital network. Enables continuity-of-care AI that actually remembers context from three months ago.
Classified briefing assistants, intelligence analysis tools, and inter-agency collaboration AI that require air-gapped or on-prem deployment. Session memory, document context, and personnel data cannot touch commercial cloud infrastructure.
AI assistants retaining case history, client communications, deposition summaries, and research memos across matters and associates. Attorney-client privilege makes any cloud memory layer a disqualifying conflict risk.
Portfolio AI that remembers client risk tolerance, past trade rationale, regulatory notes, and suitability assessments. Enables AI-augmented advisory that passes compliance audits — impossible with ephemeral or cloud-hosted sessions.
AI that accumulates lab notebook context, unpublished findings, grant proposal drafts, and literature synthesis over years — not just one session. Pre-publication IP cannot be entrusted to commercial memory infrastructure.
Operational AI that remembers equipment history, maintenance logs, process parameters, and failure patterns across shifts and facilities. Plant floor networks are commonly air-gapped — cloud memory is architecturally impossible.
Tutoring and coaching AI that tracks student progress, learning gaps, past struggles, and growth over an entire academic career — not just the current session. Student records under FERPA cannot be processed by third-party commercial AI memory.
AI coding assistants that retain architectural decisions, codebase patterns, API contracts, and review history across months of development — this is System-Recall's own origin use case. Proprietary source code, internal APIs, and unreleased product specs cannot route through commercial AI infrastructure.
CRM-layer AI that builds persistent memory around customer relationships — deal history, stakeholder preferences, friction points, escalation patterns, and success metrics — enabling AI reps and CSMs that truly know each account.
AI operators for power grids, water systems, pipeline control, and transportation hubs that accumulate operational history, anomaly patterns, and incident runbooks. Connectivity to external infrastructure is a regulatory and security prohibition.
| Industry | Key Regulation / Driver | Why Cloud Memory Fails | Self-Host Req |
|---|---|---|---|
| Healthcare | HIPAA / HITECH — PHI cannot leave covered entity | Cloud BAA does not cover AI memory indexing pipelines | Mandatory |
| Government / Defense | CMMC L2/L3 — CUI must stay in authorized enclave | Commercial cloud infrastructure not IL4+ authorized | Mandatory |
| Legal | Attorney-client privilege — third-party waiver risk | Sending privileged comms to cloud AI = potential waiver | Mandatory |
| Financial Services | SEC 17a-4 — record retention in auditable system | Cloud memory logs not compliant with WORM requirements | Strongly Advised |
| Research / Academia | Export Control (EAR/ITAR) — research data sovereignty | Unpublished findings in cloud = IP exposure risk | Strongly Advised |
| Manufacturing | ITAR / Trade Secrets — process IP protection | OT networks are architecturally air-gapped | Mandatory |
| Education | FERPA — student records cannot leave institution | Student data in commercial AI memory not FERPA-compliant | Strongly Advised |
| Critical Infrastructure | NERC-CIP / TSA — operational data cannot leave network | Internet connectivity itself may be prohibited for BES cyber | Mandatory |
| Enterprise Dev / Sales | SOC 2 / ISO 27001 — source code & customer data controls | Commercial memory trains on your proprietary context | Strongly Advised |
Recall is the memory backbone for tools like Claude Code. It stores architectural decisions, debugging insights, codebase patterns, and cross-session learnings. Without Recall, every session starts from zero — with it, the assistant compounds knowledge over months.
In orchestrator + worker agent architectures, Recall acts as the shared knowledge bus — agents store intermediate results, plans, and discoveries that other agents retrieve later. Enables true coordination across agent lifecycles.
Voice assistants are stateless by nature — each utterance is a fresh API call. Recall injects episodic memory so the assistant knows who you are, what you discussed last week, and what your preferences are. The "Sadie" family assistant is built on this pattern.
Support bots that remember a customer's full interaction history, product configuration, and past resolutions — not just the current ticket. Reduces escalations and eliminates "explain your issue again" experiences.
Agents that accumulate a persistent literature base — papers read, arguments synthesized, hypotheses explored, and evidence ranked. The assistant's knowledge compounds across research sessions rather than expiring with context windows.
Long-running agents that execute multi-step plans across hours or days use Recall for state persistence — checkpointing progress, storing intermediate artifacts, and surfacing prior context when resuming interrupted tasks. Enables reliable handoffs across sessions.
Your AI's memory is stored on your infrastructure. No third party indexes, trains on, or has access to your context. This is the only architecture that passes legal review in regulated industries.
Every memory write, retrieval, and injection is logged and queryable. You can inspect exactly what your AI remembered during any session. Required for SOC 2, HIPAA, and SEC compliance.
Memory stored in commercial platforms (ChatGPT memory, Claude projects, Mem0 cloud) is owned by that vendor. Self-hosted Recall means your organizational memory is a portable, owned asset — not a subscription dependency.
Recall works with any LLM — Claude, GPT-4o, Mistral, Llama, Gemini. When you switch models (or when models are deprecated), your memory travels with you. No re-onboarding, no context loss.
Deployable in fully disconnected environments. Ollama provides local embeddings and inference; the entire stack runs offline. Unique among memory systems — no cloud services required at any layer.
When employees leave, institutional knowledge walks out with them. Recall captures it passively as AI interactions happen — converting transient expertise into a searchable, retrievable organizational asset.
Five components that make Recall get better the longer it runs — automatically
Records retrieved, cited, and ignored events for every memory. Like a library tracking which books get read vs. browsed — user behavior does the optimization work.
Every night: promotes useful memories, forgets stale ones, removes superseded facts, discovers hidden connections in the knowledge graph — unsupervised.
Finds logical conflicts between memories. LLM call made exactly once per conflict and cached — cost doesn't grow with memory store size.
Pre-loads relevant memory before the first query by watching file opens, branch checkouts, and error signals. Context is warm before the question is asked.
Runs MemoryBench-6 on schedule. Diagnoses performance drops. Proposes config patches. Validates against holdout set. Auto-applies if it passes guardrails.
Score = (ΔGCR × 5) − (CPI × 10) − (SER × 8) − (avg_tokens / 1000 × 1)
Correctness rewarded (×5). Pollution penalized (×10). Superseding errors penalized hardest (×8). Token efficiency lightly weighted (×1) to prevent accuracy sacrifice.
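The same formula, directly in code (the sample inputs are illustrative):

```python
def memorybench_score(dgcr: float, cpi: float, ser: float, avg_tokens: float) -> float:
    """Composite score as defined above: correctness rewarded, pollution and
    superseding errors penalized, token use lightly weighted."""
    return dgcr * 5 - cpi * 10 - ser * 8 - (avg_tokens / 1000) * 1

# E.g. +0.62 lift, 2% pollution, 0% stale entities, 1,500 avg tokens:
print(round(memorybench_score(0.62, 0.02, 0.0, 1500), 2))   # 1.4
```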
qwen3-embedding:0.6b
Every stored memory is converted to a 1024-dimensional semantic vector by a locally-run embedding model via Ollama. No external API calls. Powers the vector similarity search in Qdrant.
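A sketch of the local embedding call against Ollama's documented embeddings endpoint (adjust host and model name to your install):

```python
import requests

def embed(text: str) -> list[float]:
    """Embed text locally via Ollama: no external API, no per-token billing."""
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "qwen3-embedding:0.6b", "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]           # 1024-dim vector for this model

print(len(embed("We replaced JWT with session auth.")))   # 1024
```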
qwen3:14b
Handles entity extraction, contradiction resolution, summarization, and importance scoring — entirely offline. No OpenAI or Claude calls for memory operations. Runs on the RTX 3090 in the local lab.
gradient-free
Uses benchmark delta (ΔGCR) as the objective function. Applies coordinate descent over decay_rate, importance_threshold, and reranking_weight — no labeled training data required. Each tuning cycle completes in ~12 minutes, unattended.
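A condensed sketch of that tuning cycle: gradient-free coordinate descent over the three parameters, with the benchmark delta as the objective (`run_benchmark` is a stand-in for a full MemoryBench-6 run):

```python
def tune(params: dict, run_benchmark, steps: dict, iters: int = 3) -> dict:
    """Coordinate descent: nudge one parameter at a time, keep any change
    that improves the objective. run_benchmark(params) returns the delta."""
    best = run_benchmark(params)
    for _ in range(iters):
        for name, step in steps.items():
            for delta in (+step, -step):
                trial = {**params, name: params[name] + delta}
                score = run_benchmark(trial)
                if score > best:
                    params, best = trial, score
    return params

params = {"decay_rate": 0.02, "importance_threshold": 0.3, "reranking_weight": 0.5}
steps  = {"decay_rate": 0.005, "importance_threshold": 0.05, "reranking_weight": 0.1}
# Toy objective standing in for ΔGCR: optimum at decay_rate=0.03, weight=0.7.
objective = lambda p: -abs(p["decay_rate"] - 0.03) - abs(p["reranking_weight"] - 0.7)
print(tune(params, objective, steps))
```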
Live benchmark telemetry · Independent LLM judge · 100-user concurrency stress test · Market-ready validation
| Capability | Recall | Mem0 | Zep | Cognee | Letta |
|---|---|---|---|---|---|
| Reproducible benchmark suite | ✓ | — | — | — | — |
| Published accuracy lift (ΔGCR) | +58% | — | — | — | — |
| Self-healing automation loop | ✓ | — | — | — | — |
| Observability dashboard (live) | ✓ | — | partial | — | — |
| Zero stale-fact rate (SER = 0%) | ✓ | — | partial | — | — |
| Concurrency stress test published | 100 users | — | — | — | — |
| Knowledge graph visualization | ✓ | — | basic | basic | — |
| Self-hosted / on-prem option | ✓ | cloud only | cloud only | OSS | cloud only |
Year 1–5 · Conservative · Comparable-based · Market-validated
| Tier | Price | Target Customer | Included |
|---|---|---|---|
| Developer (Free) | $0 | Hobbyists, early adopters, open-source contributors | Self-hosted, community support, 500K memory objects |
| Pro | $49/mo per workspace | Individual developers, freelancers | Self-hosted or hosted, 2M memory objects, email support |
| Team | $149/mo | Small teams (up to 10 users) | Shared memory state, 5M objects, multi-agent, priority support |
| Enterprise | $500–$2,000/mo | Companies, AI product teams | Custom deployment, SLA, dedicated support, audit logging, SSO |
| API Usage | $0.001/operation | High-volume API users | Above free tier limits |
LangChain ($260M total · $1.25B valuation · $16M ARR) at 78x revenue multiple sets the valuation ceiling for AI infrastructure at scale.
Context: Public beta launched. Benchmark results published. Initial developer community forming around the GitHub repository. No sales team — growth is entirely organic.
Key Milestone: Publishing MemoryBench-6 results. No other memory vendor has done this. Being first to publish a rigorous, reproducible memory benchmark positions Recall as the technically credible option in a market full of claims without evidence.
Why achievable: Mem0 reportedly reached $1M+ ARR within 12 months of launch — with fewer features and no self-hosted tier. Targeting ~$800K in Year 1 is deliberately below Mem0's Y1 milestone, accounting for Recall's smaller initial marketing footprint and community-first strategy.
Context: Community is established. Benchmark comparisons against competitors published by third parties. First enterprise integration partners confirmed. Paid marketing begins.
Key Milestone: First major integration partner — an AI framework, developer tools company, or enterprise software vendor that embeds Recall as a memory layer. A signed integration agreement is the most important commercial signal in Year 2. Mem0's AWS Strands partnership (default memory for Amazon's agent framework) is the aspiration — a single platform partnership can multiply ARR in one quarter.
Why achievable: AI developer tool adoption can spike rapidly when benchmark proof exists. The transition from "interesting project" to "tool I use in production" often happens at the community level after the first widely-shared technical writeup. Zep reported $1M ARR with 5 people on a narrower product.
Context: Active enterprise sales motion. Recognized by industry analysts. Product expanding into vertical use cases (legal AI, healthcare AI, industrial IoT). Integration partner ecosystem growing.
Key Milestone: First Gartner or Forrester mention in an AI agent infrastructure or AI memory landscape report. Being named in an analyst report at this stage validates the market position and accelerates enterprise sales cycles significantly.
Why achievable: At ~3.3x growth from Year 2, $11.5M ARR requires approximately 0.6% of the addressable developer AI infrastructure market. Cognee's €7.5M seed (Feb 2026) confirms the market is attracting sustained investment — not consolidating around incumbents.
Context: Recall is a recognized market leader in the AI memory infrastructure category. Enterprise is a significant revenue driver. Potential acquisition interest from larger platform vendors.
Key Milestone: Potential Series A or strategic acquisition conversation. At $30M+ ARR with strong growth, Recall would attract attention from AI platform companies (Anthropic, OpenAI, Google DeepMind, major cloud providers). LangChain's $1.25B valuation at $16M ARR — a 78x revenue multiple — establishes what the market pays for AI infrastructure at this stage.
Why achievable: AI developer tool market growing 40%+ annually. Enterprise AI spending increasing 45%+ YoY. The self-hosted model becomes increasingly non-negotiable for regulated industries as privacy regulation tightens globally.
Context: Vertical market expansion into healthcare AI, industrial IoT, defense/government, and edge computing. Embedded licensing model for OEM partnerships. Possible IPO readiness.
Key Milestone: Expansion into edge/IoT and government verticals — the move no current memory competitor is making. An industrial robot that remembers its calibration history, a healthcare AI that remembers patient interaction patterns across sessions, a defense system that accumulates procedural knowledge from field operations — all use cases where Recall's self-hosted, multi-type architecture is the only credible solution.
Why achievable: At $79M ARR, applying LangChain's 78x multiple implies a valuation above $6B. These projections are the conservative floor — not the upside case — with no viral moment, no platform partnership, and linear growth modeled.
Lead investor: Amazon's Alexa AI division. Exclusive AWS Strands partnership as default memory for Amazon's agent framework. Reportedly $1M+ ARR in first 12 months with fewer features and no self-hosted tier. Recall's Y1 target of ~$800K is deliberately below this, accounting for smaller initial footprint.
Tiny raise that underscores capital efficiency in this category. Reportedly hit $1M ARR with 5 people. Session-focused, narrower feature set, no procedural memory. Recall's architecture is demonstrably deeper — comparable or higher pricing per customer is justified.
Brand-new entrant as of February 2026, confirming the market is attracting sustained investment and not consolidating around incumbents. Charges €1,970/month for on-premises enterprise deployments — establishing enterprise willingness-to-pay that validates Recall's Enterprise tier pricing.
78x revenue multiple on $16M ARR. The flagship AI infrastructure company proves developer tooling built for the AI agent era can reach billion-dollar valuations on community-first adoption. Recall's memory infrastructure layer is exactly what LangChain does not own. At Recall's Year 5 ARR (~$79M), the same multiple implies $6B+ valuation.
How hardware repricing and SaaS adoption velocity compound.
Turn-key appliances for every deployment scale — from hobbyist SBC to enterprise AI cluster.
Recall buys best-pick hardware at wholesale, pre-installs and configures the full stack, ships as a ready-to-run appliance. One-time hardware revenue + recurring software subscription attached to each unit.
| Tier | Product (Best Pick) | COGS | Retail | Net/Unit | Net Margin | SW Renewal/yr |
|---|---|---|---|---|---|---|
| SBC | Orange Pi 5 Plus 16GB | $243 | $499 | $216 | 43% | $499 |
| Mini PC | Beelink SER6 Pro 32GB | $370 | $799 | $366 | 46% | $999 |
| Homelab | NVIDIA DGX Spark | $2,850 | $4,299 | $1,109 | 26% | $2,999 |
| Enterprise | Dell PowerEdge R750xa + L40S | $32,000 | $52,000 | $18,960 | 36% | $9,600 |
COGS includes: wholesale board cost, NVMe SSD, enclosure, labor (flash + QA + packaging), fulfillment. Net margin after returns (3%), payment processing (2.9%), and warranty reserve.
NRE: ~$75K (carrier board path, 6–9 month timeline). Same $499 retail price as the reseller SBC — the delta is pure margin captured by eliminating the third-party board vendor markup.
| Volume | Reseller COGS | Prop. COGS | Reseller Net | Prop. Net | Extra / Unit |
|---|---|---|---|---|---|
| 500 | $243 | $210 | $216 | $249 | +$33 |
| 1,000 | $243 | $175 | $216 | $284 | +$68 |
| 2,500 | $243 | $152 | $216 | $307 | +$91 |
| 5,000 | $243 | $132 | $216 | $327 | +$111 |
| 10,000 | $243 | $112 | $216 | $347 | +$131 |
Strategy: Start with reseller model (zero NRE, ships in weeks). Invest in proprietary carrier board after first 500 units validate demand. By Y3 the proprietary board has paid for itself 5× over and margins structurally exceed reseller by 23 points.
When you run the SaaS, you absorb inference costs. Model choice is the biggest variable in your actual COGS — it can be $0.20/user/mo or $7.50/user/mo depending on the tier.
Strategy: ship with qwen3:14b as the standard model across all plans — it delivers excellent quality at near-zero per-user cost. Offer GPT-4o and Claude Sonnet as premium LLM add-ons (+$8/seat/mo) for teams that need them. This preserves 90%+ gross margins at scale while giving enterprise customers full model choice.
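The margin arithmetic, worked through with the numbers in this section:

```python
# Gross margin per seat on the $49/mo plan under each model choice.
price = 49.00
for model, cogs in {"qwen3:14b (local)": 0.20, "frontier API model": 7.50}.items():
    margin = (price - cogs) / price
    print(f"{model}: ${cogs:.2f}/user/mo -> {margin:.1%} gross margin")
# The local default keeps margins above 99%; the +$8/seat premium add-on
# more than covers the ~$7.50/user/mo frontier-model inference cost.
```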
The next phase of Recall — in planning, research, and active development.
This section is a placeholder — fill in upcoming features as the roadmap is defined.
Status tags: Planned · Researching · Scoped · In Dev
Three moats. One mission. Measurable results.
"When all 19 metrics pass their targets across all 50 scenarios, we will have proven something that no other AI memory vendor has demonstrated: that our system reliably does what it claims to do, in measurable, reproducible terms. The benchmark is not the finish line. It is the starting gun."
The foundation is live. These capabilities are in active development — each one extending Recall's intelligence further.
Nightly scraper across Simon Willison, HN, Anthropic blog, FastAPI releases — filtered by your exact stack. Claude starts sessions already aware of ecosystem changes.
Queries your own memories tagged shortcut, workaround, hack — then runs targeted threat modeling against them. Your codebase attacked by the entity that knows it best.
`recall_search(query="...", as_of="2025-11-01")` — reconstruct the mental model that led to any past decision. Temporal debugging. Postmortem reconstruction.
Decayed memories don't vanish — they move to SQLite cold storage with FTS5 search. Browse, restore, or permanently delete. Full lifecycle control, no silent data loss.
Redis pub/sub bridges casaclaude and proxyclaude. Start a session on one machine — immediately know what the other worked on this morning. True multi-machine continuity with zero manual sync.
`recall_archaeology("auth.py")` surfaces every decision ever made about a file — what was tried, what failed, why the weird pattern exists. No more mystery code.