v1.0  ·  2026-02-24  ·  Internal — Dev Team + Stakeholders
RECALL

Validation Report

Benchmark Framework  ·  Market Position  ·  Proof of Value

19 Metrics
50 Scenarios
6 Test Categories
5 Self-Learning Components
~$79M Year 5 ARR Target
The Memory Layer for AI

Your AI finally remembers.

Learns. Heals. Improves.

System-Recall is a self-hosted AI memory engine that persists context across every session, retrieves the right knowledge at the right moment, and continuously benchmarks and repairs itself — no human intervention required.

Store

Persistent Memory

Persistent memory across every conversation, user, and agent. Vector + graph + relational — three storage layers working in sync.

12,149 memories stored
Retrieve

Intelligent Retrieval

Semantic search + knowledge graph traversal + LLM re-ranking. Delivers the right memory at the right moment.

+58% context improvement proven
Optimize

Self-Healing Engine

Self-benchmarks with MemBench. Detects drift. Heals corruption. Tunes parameters. No human intervention required.

0% error rate
Data Flow Pipeline
observe_edit
hook
/store API
Memory
Engine
Qdrant + Neo4j
+ Redis
/retrieve
API
AI +
Context
Graph
27 / 53,945
Nodes / Relationships
Speed
85.7ms
P50 Latency
Quality
4.58/5
LLM Judge Score

The App

A fully observable AI memory platform — live dashboards, real-time benchmarking, knowledge graph visualization

No Competitor Has Built This
mem0, Zep, Cognee, Letta — every other AI memory product is a black box. You put memories in, you get answers out. Zero visibility into what the system knows, how healthy it is, or whether memory is actually helping. Recall ships a full observability platform alongside the memory engine. This isn't a dashboard bolted on after — it's how we built the system from day one.
mem0 — no dashboard
Zep — no benchmark tool
Cognee — no health analytics
Recall — all of it ✓
System Dashboard
LIVE

Real-time system telemetry across every layer of the memory stack. At a glance: 12,149 stored memories, 53,945 graph relationships, 5 services (Qdrant, Neo4j, Redis, Postgres, Ollama) all green. Active LLM models, their sizes, quantization, and load state visible in one view.

Enterprise compliance teams can audit exactly what the AI knows
Ops teams can monitor memory growth and capacity in real-time
No competitor can show you this screen
Recall Dashboard — live system telemetry
MemoryBench — Built-In Benchmark Suite
UNIQUE

The world's only memory system that ships with a reproducible benchmark suite you can run against your own data. +62% memory lift (dGCR), 92% accuracy, 0% stale entity rate — these aren't lab numbers, they're live measurements from this run (mb-20260224-181412). Every category broken down: Distractor, KG, Long Horizon, Preference, Procedural, SER Semantic, Temporal.

Carriers can verify memory quality before deployment to customers
Developers can track whether updates improved or regressed accuracy
Automated via self-tuning loop — benchmarks run and act on themselves
MemoryBench — reproducible benchmark suite with live results
System Health — Memory Quality Analytics
UNIQUE

Deep analytics on the quality of the AI's memory over time. Feedback score 70.8% positive. Graph cohesion 0.862 (strong edge bonds). Importance distribution histogram showing how memories are weighted. Decay curve tracking how memories age. This is memory science visualized — not possible to get this insight from any competing product.

Spot degradation before it affects response quality
Tune decay parameters from data — not guesswork
Graph cohesion metric unique to Recall — shows knowledge integrity
System Health — memory quality analytics with decay curves and distribution charts
Knowledge Graph — See How the AI Thinks
NEXT LEVEL

Interactive force-directed graph of the entire knowledge network — 50+ nodes, color-coded by layer (API Routes, Core Logic, Storage, Workers, Dashboard, Hooks). See which modules call which, how data flows between components, what hooks trigger what logic. This is the AI's memory architecture made visible. Filter by layer, zoom in on clusters, inspect relationships. No other AI memory product lets you see inside the machine.

Enterprise security teams can audit data flow paths
Developers can understand memory architecture at a glance
Demonstrates the depth and maturity of the system
LIVE · KNOWLEDGE GRAPH
53,945 relationships · 12,149 memories · 6 layers
api core storage worker dash hook
Why Observability Is a Moat — Not a Feature
Enterprise Requirement
Enterprise buyers will not deploy AI memory without auditability. GDPR, SOC 2, HIPAA — all require knowing what data is stored and how it's used. Our dashboard is the compliance answer. Competitors can't enter enterprise without building this first.
Self-Improving Loop
MemBench isn't just display — it drives the self-tuning loop. The app watches its own benchmarks, detects regressions, proposes fixes, and verifies them. The dashboard IS the automation interface. No other product has closed this loop.
Carrier Differentiation
Any carrier who bundles Recall can offer their customers a management dashboard. This is a premium upsell: "Memory + Analytics" vs bare API. Competitors offer an API. We offer an experience — and the analytics to prove it's working.
01

Why We Measure

The case for provable memory over claimed memory

The core argument: AI agents are becoming the dominant interface for software. Every AI agent needs memory. The market for AI memory infrastructure is growing faster than any single vendor can capture. System-Recall is the only system combining all major capabilities — four memory types, knowledge graph traversal, zero LLM overhead at query time, self-learning, and a closed-loop self-healing tuner — in a single self-hosted stack.

Competitors make big claims based on shallow demonstrations. "My AI remembered something once" is not proof. System-Recall takes a different approach: a rigorous, reproducible benchmark suite — MemoryBench-6 — that measures memory performance the same way a laboratory measures drug efficacy. Define the test, run it, publish the results, let others verify.

Recall is the only self-hosted stack with all of the following
→ Four memory types: episodic, semantic, procedural, temporal
→ Triple hybrid retrieval: vector + keyword + knowledge graph
→ Zero LLM calls at query time — 10–50ms retrieval
→ Bi-temporal model: captured_at + valid_from/valid_to
→ Neo4j knowledge graph backend for relational queries
→ Nightly decay, reranking, and active forgetting
→ Usage-based memory ranking — citation feedback loop
→ Contradiction detection with cached LLM judgment
→ Signal detection + session hooks (auto-capture, no manual saves)
→ 4-tier context budget architecture with dynamic rebalancing (Phase 2)
→ Hierarchical compression — 4 detail levels per memory
→ Predictive context loading before first query (Phase 2)
→ AST-aware code pattern and workflow memory (Phase 2)
→ LLM agnostic — Ollama, OpenAI, Anthropic, any endpoint
→ Multi-user scoping: private / project-shared / system tiers
→ Multi-agent shared memory state
→ Self-hosted, privacy-first — your data never leaves your network
→ Benchmark-driven self-healing tuner — closed-loop feedback

Four Memory Types

Episodic, semantic, procedural, and temporal — each with dedicated storage and retrieval behavior. No competitor implements all four.

Triple Hybrid Retrieval

Vector semantic + BM25 keyword + Neo4j graph traversal run simultaneously on every query. Most competitors use only semantic search.

Zero LLM at Query Time

10–50ms retrieval. Competitors calling LLMs at query time pay 500ms–3s per search and incur per-token costs at scale.

Bi-Temporal Model

captured_at + valid_from/valid_to — answer "what did we know on January 15th?"

Self-Learning Engine

Usage-driven memory ranking, nightly consolidation, contradiction detection, and unsupervised graph connection discovery.

Benchmark-Driven Self-Healing

MemoryBench-6 acts as a control system. Performance drops trigger automatic diagnosis, patch generation, and validated deployment.

02

19 Benchmark Metrics

Every metric maps to a real user need. Every target is a measurable pass/fail gate.

These are not vanity metrics. Each was selected because it can be gamed in isolation but is hard to manipulate when all 19 are tracked simultaneously. A system that passes all 19 across all 50 scenarios has genuinely good memory — not a tuned demo.
AccuracyGCR
Goal Completion Rate
% of 50 benchmark scenarios answered correctly with memory enabled. The primary quality signal — everything else explains why this number is what it is.
≥ 90%
AccuracyΔGCR
Delta GCR — Memory Lift
How much better the AI performs WITH memory vs. WITHOUT: GCR(on) − GCR(off). Isolates Recall's contribution from the model's baseline intelligence.
≥ +15 percentage points
AccuracyHit@K
Memory Hit at K
Is the ground-truth memory present in the top-K retrieved results? Tested at K=5, K=10, and K=20.
≥ 80% @K=5 · ≥ 92% @K=10
AccuracyMRR
Mean Reciprocal Rank
Average rank position of the correct memory. First place = 1.0; fifth place = 0.2. MRR rewards putting the right answer at the top, not just in the results.
≥ 0.75
RetentionLHRS
Long-Horizon Retention Success
Can the AI use information from 3+ sessions ago to answer correctly today? Tests the "weeks later" scenario without reminders.
≥ 85%
RetentionCSR
Constraint Satisfaction Rate
Of all rules and preferences the user has ever stated, what % does the AI actually honor in its responses? The "AI that knows me" metric.
≥ 88%
RetentionLCS
Learning Curve Slope
As Recall accumulates more interactions, does it get measurably better without retraining? Positive slope = self-learning confirmed.
Positive slope across intervals
QualityPrec.
Injection Precision
Of all injected memories, what fraction were genuinely relevant? Low precision = context pollution from irrelevant injections.
≥ 80%
QualityRec.
Injection Recall
Of all memories that were needed, what fraction were actually retrieved and injected? Low recall = AI missing context it required.
≥ 85%
QualityCPI
Context Pollution Index
% of scenarios where injecting memory makes the AI perform WORSE than no memory. The canary-in-the-coalmine retrieval quality metric.
< 2%
QualitySER
Staleness Error Rate
% of cases where an outdated memory is used when a newer, correct version exists. Tests whether temporal arbitration is working.
< 1%
EfficiencyITS
Injected Tokens per Success
On correctly answered scenarios, how many memory tokens were injected on average? Lower = more efficient. Should trend downward as the system tunes.
Downward trend required
EfficiencyMUPT
Marginal Utility Per Token
ΔGCR ÷ avg injected tokens per turn. How much correctness improvement per injected token. The efficiency ratio — high MUPT = every token is earning its place.
Positive; improving vs baseline
EfficiencyP50/P95
Retrieval Latency
End-to-end time: query submitted → memory injected into context. Includes vector search, graph traversal, reranking, and context assembly.
P50 < 600ms · P95 < 1,500ms
TemporalTGA
Temporal Grounding Accuracy
When a fact has changed, does the system use the NEW fact rather than the old one? Tests the superseding chain — newer record must win.
≥ 97%
RelationalRRA
Relational Recall Accuracy
Can Recall answer by following a chain of connections? Tests multi-hop graph traversal that semantic similarity alone cannot surface.
≥ 80%
RelationalCQS
Compositional Query Success
Can Recall combine 2–4 separate memory nodes into one coherent answer? The advanced KG differentiator — pure vector systems fundamentally cannot pass this.
≥ 75%
ProceduralPFS
Procedure Fidelity Score
When a user's established workflow is needed, are all steps recalled in the correct order? Step completeness × ordering penalty.
≥ 85%
SystemRS
Reproducibility Score
Run the same 50 scenarios twice — do you get the same results? Benchmark validity proof. Below 95% means system behavior is inconsistent.
≥ 0.95
03

6 Test Categories

50 scenarios across six distinct memory challenges — each proves a different capability

01

Temporal Consistency & Fact Updates

10 Scenarios
GCR · TGA · SER

Tests whether memory correctly updates when facts change. A user moves cities, changes a preference, or revises an architecture decision — does the AI know the new fact without being told again?

Business stake: Prevents advice based on outdated information — the most common trust-destroying failure in production AI systems.
02

User Preference Persistence

8 Scenarios
LHRS · CSR · GCR

Tests whether preferences stated once are honored indefinitely across sessions. Preferences are implicitly relevant to almost every interaction — a fundamentally different retrieval challenge than explicit factual queries.

Business stake: The foundation of the "AI that knows me" experience. Ignored preferences signal that the user's time was wasted establishing them.
03

Long-Horizon Planning Continuity

8 Scenarios
LHRS · CSR · GCR

Three-session arcs where early architectural decisions must influence later deployment tasks — even when middle sessions don't reference them. Tests cross-session retrieval triggering for topically distant queries.

Business stake: Proves Recall can serve as genuine project memory. An AI that forgets decisions by mid-project is less useful than a text file.
04

Distractor Robustness & Context Pollution

8 Scenarios
CPI · Injection Precision · GCR

Type A: irrelevant personal facts must not surface during technical queries. Type B: sensitive stored records must not be revealed by crafted prompts. Tests both signal-to-noise ratio and privacy boundaries.

Business stake: CPI above 5% is product-killing. Type B scenarios are a baseline security audit for stored credentials and private data.
05

Relational & Knowledge Graph Queries

8 Scenarios
RRA · CQS · GCR

Chains of 3–5 connected facts where the answer requires traversing graph relationships that semantic similarity cannot surface. "What infrastructure does our deployment process require?" — correct answer is 3 graph hops from the query.

Business stake: The category pure vector memory systems fundamentally cannot pass. This is the proof of value for the Neo4j layer.
06

Procedural Replay & Workflow Fidelity

8 Scenarios
PFS · GCR · LHRS

Established workflows tested across sessions — debugging process, code review checklist, deployment runbook. The AI must recognize when a procedure applies from a trigger that may not name it, and replay all steps in order.

Business stake: The bridge between AI assistance and institutional knowledge. Justifies enterprise pricing — it's not a personal assistant, it's a team memory system.
04

Competitive Edge

Every major AI memory system, feature by feature

The graph memory paywall: Mem0 charges $249/month to unlock knowledge graph features — hard paywalled from all lower tiers. System-Recall includes full Neo4j graph memory in the base $49/month tier, alongside every other feature in this table.
FeatureMem0ZepLettaCogneeLangChainOpenAISystem-Recall
Funding$24.5M Series A$500K YC W24$10M seed€7.5M seed$260M totalN/ASelf-funded
Memory TypesSemantic onlySession + semanticEpisodic + semanticVector + graphBuffer + vectorSemantic onlyAll 4 types ✦
Knowledge Graph$249/mo only✓ Included
Triple Hybrid RetrievalPartialPartial✓ Full
Zero LLM at Query Time✓ 10–50ms
Bi-Temporal Model
Active Forgetting
Self-Learning Engine
Benchmark Self-Healing
4-Tier Context Budget✓ Phase 2
Temporal ArbitrationLimited✓ Graphiti✓ Supersedes chain
Nightly Decay + Reranking
Hierarchical Compression✓ 4 levels
Predictive Context Loading✓ Phase 2
Signal Detection (auto-capture)✓ Configurable
Code Pattern Memory (AST)✓ Phase 2
Multi-Agent Shared StateLimited
Multi-User ScopingLimited✓ 3 levels
Self-Hosted First-Class✓ Primary
LLM Agnostic
Entry Paid Price$19/mo (no graph)$25/mo$20/user/mo€8.50/1M tokens$39/user/moN/A$49/mo · 3 seats
Graph Memory Price$249/moIncludedN/AIncludedN/AN/A$49/mo (Starter+)

Feature-by-Feature Breakdown

What each capability actually does, why it matters, and why competitors don't have it.

01

Multi-Type Memory

Every competitor focuses primarily on semantic memory — storing facts and retrieving them by similarity. Recall implements four distinct types, each with dedicated storage and retrieval behavior:

  • Episodic: specific events and interactions ("on Tuesday we decided to drop the Redis requirement")
  • Semantic: general facts and preferences ("user prefers Python 3.11+")
  • Procedural: workflows stored as step sequences, not descriptions ("deploy: test→build→stage→verify→prod")
  • Temporal: time-aware facts with validity windows ("location: Austin until March, Denver from March")

Questions about processes, history, evolving facts, and connected entities all have dedicated memory types optimized for their retrieval pattern.

02

Triple Hybrid Retrieval

When a query arrives, Recall runs three independent retrieval methods simultaneously and merges the results:

  • Semantic search (Qdrant): finds memories that mean the same thing, even if different words are used
  • Keyword search (BM25): finds exact matches for technical terms, identifiers, config values, and proper nouns that semantic search misses
  • Graph traversal (Neo4j): follows relationship chains to find connected information that neither semantic nor keyword search would surface

No competitor runs all three. Most run only semantic search. This triple approach surfaces the right memory across all query types.

03

Zero LLM Calls at Query Time

Every retrieval — session start injection, mid-session search, context assembly — uses only pre-computed embeddings and graph traversal. No LLM is called during retrieval. The result: query latency of 10–50ms.

Competitors that call an LLM to rerank, synthesize, or process memories at query time incur 500ms–3,000ms of latency on every single query. At scale, this is a meaningful UX difference and a significant cost difference — LLM calls are expensive; vector search is cheap.

LLM calls in Recall happen only at write time (fact extraction from transcripts) — never at read time.

04

Bi-Temporal Model

Most memory systems store one timestamp: when the memory was saved. Recall stores two:

  • captured_at — when the memory was stored in the system
  • valid_from / valid_to — when the fact was actually true in the real world

This enables a class of queries no competitor can answer: "What did we know about this project on January 15th?" — "How has our understanding of the auth system evolved?" — "What changed between v1 and v2 architecture?" Critical for any use case involving evolving information: software projects, medical records, legal files, industrial systems.

05

Knowledge Graph Layer

Recall's Neo4j integration enables relational memory. Entities are linked. Relationships are traversable.

Vector store answers: "What is most similar to your query?"
Knowledge graph answers: "A depends on B. B depends on C. C is the answer."

The CQS benchmark metric specifically tests this capability. It is one of the highest-value differentiators for enterprise customers with complex, interconnected data — and it's hard-paywalled at $249/mo in Mem0.

06

Nightly Decay and Active Forgetting

Recall runs a nightly consolidation job that does three things no other system does:

  • Decay: old memories become less prominent. Memories from a project you finished six months ago stop competing with last week's project.
  • Active forgetting: memories superseded by newer facts, or not accessed in 90 days, are removed. Keeps the store clean and prevents context pollution from accumulating.
  • Reranking: memories that were retrieved and actually used get promoted. Memories consistently ignored get demoted.

No competitor has a scheduled consolidation process. They accumulate. Recall prunes.

07

Usage-Based Memory Ranking

Recall tracks three events for every memory retrieval: retrieved (returned in search), cited (influenced a response), and ignored (returned but not used).

Memories with high citation rates are promoted. Memories repeatedly retrieved but ignored are candidates for demotion or deletion. This is a feedback loop between memory usefulness and memory visibility — without requiring any model retraining.

08

Contradiction Detection and Resolution

When new memories are stored, Recall runs a background scan for semantic conflicts with existing memories. If a contradiction is found (new memory says X, existing memory says not-X), it is flagged and stored in a dedicated contradictions table with status, detected timestamp, and resolution options.

Resolutions can be automatic (newer fact wins) or human-reviewed (when both facts may be valid in different contexts). This prevents the memory store from silently accumulating conflicting information — one of the most common failure modes in long-running AI systems.

09

Signal Detection + Session Hooks

Recall does not wait for you to manually save memories. A configurable set of signal keywords — remember, decided, architecture, important, don't forget, note to self, bug fix, breaking change — trigger immediate memory capture mid-session.

When a session ends, the full transcript is automatically processed and relevant facts are extracted. This is the difference between a tool you have to maintain and infrastructure that maintains itself.

10

4-Tier Context Budget Architecture (Phase 2)

Phase 2 replaces simple injection with a tiered budget system allocating 8,000 tokens across four priority levels that rebalance dynamically:

TierBudgetContents
Critical (T1)2,000Active decisions, unresolved contradictions, recent errors
Relevant (T2)3,000Predicted context based on current activity, related decisions
Background (T3)2,000Project conventions (compressed), workflow hints
Index (T4)1,000Pointers to retrievable deep context ("ask me about: auth...")

Debugging expands T1 and shrinks T3. New feature work expands T2. The most relevant memory always occupies the most prominent position.

11

Hierarchical Compression

Every memory exists at four compression levels simultaneously. Recall selects the appropriate level based on tier and budget:

LevelExampleTokens
Full narrative"The team decided to replace JWT with session auth after discovering token size issues with our edge proxy. Fix took 3 days and changed 12 files."~60
Summary"Replaced JWT with sessions due to token size issues"~10
Compressedauth: JWT→sessions (size)~5
Indexauth_refactor~1

Deeply relevant memories get full detail. Background context gets index pointers. Maximum information density within the context budget.

12

Predictive Context Loading (Phase 2)

The Predictive Context Engine pre-loads memory before it is needed, based on activity signals:

SignalPredictionAction
File openedRelated memories for this modulePre-load to T2
Branch checkoutFeature context for this branchLoad decisions, patterns
Error in consoleSimilar errors and their fixesLoad fix patterns to T1
Collaborator joinedTheir preferences and constraintsLoad to T2

Context is warm before the first query — not assembled after the first miss.

13

Code Pattern Memory and Workflow Recording (Phase 2)

Recall does not just remember facts — it remembers how things are done in this specific project. Code patterns are stored as AST-aware templates: structured YAML with placeholders that can be applied when creating new endpoints, components, or configurations.

Workflows are stored as step sequences that can be replayed: debug: reproduce → capture logs → minimize → patch → rerun suite. This is procedural memory operationalized — not just "we wrote about a process" but "here is the process, structured so the AI can apply it."

14

LLM Agnostic

Recall works with any LLM endpoint that supports OpenAI-compatible APIs: local Ollama (qwen3:14b, any open model), OpenAI, Anthropic, or any compatible provider. The embedding model is similarly pluggable. No dependency on a specific provider.

Significant for enterprise customers with existing LLM contracts, air-gapped environments, or specific model requirements. Also significant for self-hosted deployments where local models are the only acceptable option.

15

Multi-User Scoping

Memory in Recall is scoped at three levels:

ScopeContentsExample
user-privateIndividual preferences"I prefer explicit error handling over try-catch"
project-sharedTeam decisions"We are using PostgreSQL, not MySQL — final decision"
systemInfrastructure knowledge"API runs on port 8200, deployed to 192.168.50.19"

Multiple developers share architectural decisions while keeping personal workflow preferences private. Team-level memory, not just individual memory.

16

Self-Hosted, Privacy-First

Every competitor except Letta is a SaaS product — your memory data lives on their servers. For enterprise customers handling sensitive information (source code, customer data, internal strategy, personnel information), this is often a showstopper.

Recall is designed from the ground up to run on your own infrastructure. Docker Compose deployment, Proxmox LXC support, and a full self-hosted stack are the primary deployment model — not an afterthought. As data privacy regulation tightens globally (GDPR, HIPAA, CCPA, emerging AI regulation), self-hosted becomes increasingly non-negotiable for regulated industries.

17

Benchmark-Driven Self-Healing

Recall's relationship with MemoryBench-6 is bidirectional. The benchmark results feed back into the system: when a category of tests shows declining performance, the tuner adjusts retrieval weights, decay parameters, injection thresholds, and conflict resolution policies.

The benchmark is not just a quality gate — it is a closed-loop feedback system for continuous, measurable improvement without requiring manual intervention.

05

Use Cases & Industries

Where persistent AI memory changes everything — and where data sovereignty is non-negotiable.

System-Recall is infrastructure — the same way a database stores structured records, Recall stores the context that makes AI useful across sessions, users, and deployments. Any system that communicates with an AI model is a potential Recall integration. The differentiator is data sovereignty: in regulated, sensitive, or competitive environments, the memory layer cannot be hosted by a third party.

Industries & Primary Use Cases

Healthcare
Healthcare & Life Sciences
Clinical AI Assistants
CRITICAL

AI assistants that retain patient history, medication records, care plans, and clinical notes across visits — without any PHI leaving the hospital network. Enables continuity-of-care AI that actually remembers context from three months ago.

Regulations: HIPAA · HITECH · 21 CFR Part 11
Government Defense
Government & Defense
Sovereign AI Assistants
CRITICAL

Classified briefing assistants, intelligence analysis tools, and inter-agency collaboration AI that require air-gapped or on-prem deployment. Session memory, document context, and personnel data cannot touch commercial cloud infrastructure.

Regulations: CMMC · FedRAMP · ITAR · FISMA · IL4/IL5/IL6
Legal
Legal & Compliance
Matter Intelligence & Research AI
CRITICAL

AI assistants retaining case history, client communications, deposition summaries, and research memos across matters and associates. Attorney-client privilege makes any cloud memory layer a disqualifying conflict risk.

Regulations: Attorney-Client Privilege · Work Product Doctrine · GDPR Art. 9
Finance
Financial Services
Advisor & Trading Desk AI
HIGH

Portfolio AI that remembers client risk tolerance, past trade rationale, regulatory notes, and suitability assessments. Enables AI-augmented advisory that passes compliance audits — impossible with ephemeral or cloud-hosted sessions.

Regulations: GLBA · SEC Rule 17a-4 · FINRA 4511 · PCI-DSS
Research
Research & Academia
Research Assistant & Lab AI
HIGH

AI that accumulates lab notebook context, unpublished findings, grant proposal drafts, and literature synthesis over years — not just one session. Pre-publication IP cannot be entrusted to commercial memory infrastructure.

Regulations: NIH Data Management Policy · FERPA · Export Control (EAR/ITAR)
Manufacturing
Manufacturing & Industrial
Process & Maintenance AI
HIGH

Operational AI that remembers equipment history, maintenance logs, process parameters, and failure patterns across shifts and facilities. Plant floor networks are commonly air-gapped — cloud memory is architecturally impossible.

Regulations: ITAR · Trade Secret Law · ISO 27001 · OT Network Isolation
Education
Education & EdTech
Personalized Learning AI
MODERATE

Tutoring and coaching AI that tracks student progress, learning gaps, past struggles, and growth over an entire academic career — not just the current session. Student records under FERPA cannot be processed by third-party commercial AI memory.

Regulations: FERPA · COPPA · State Student Data Laws
Software Development
Enterprise Software Dev
Engineering Copilot Infrastructure
HIGH

AI coding assistants that retain architectural decisions, codebase patterns, API contracts, and review history across months of development — this is System-Recall's own origin use case. Proprietary source code, internal APIs, and unreleased product specs cannot route through commercial AI infrastructure.

Regulations: SOC 2 · ISO 27001 · NDA / Trade Secret · Source Code IP
Customer Success
Customer Success & Sales
Relationship Intelligence AI
MODERATE

CRM-layer AI that builds persistent memory around customer relationships — deal history, stakeholder preferences, friction points, escalation patterns, and success metrics — enabling AI reps and CSMs that truly know each account.

Regulations: GDPR · CCPA · Customer Data Contracts · Competitive Intelligence
Critical Infrastructure
Critical Infrastructure
Operations & Grid Management AI
CRITICAL

AI operators for power grids, water systems, pipeline control, and transportation hubs that accumulate operational history, anomaly patterns, and incident runbooks. Connectivity to external infrastructure is a regulatory and security prohibition.

Regulations: NERC-CIP · TSA Pipeline · AWIA · ICS/SCADA Air-Gap Requirements

The Self-Hosting Imperative by Industry

Industry Key Regulation / Driver Why Cloud Memory Fails Self-Host Req
Healthcare HIPAA / HITECH — PHI cannot leave covered entity Cloud BAA does not cover AI memory indexing pipelines Mandatory
Government / Defense CMMC L2/L3 — CUI must stay in authorized enclave Commercial cloud infrastructure not IL4+ authorized Mandatory
Legal Attorney-client privilege — third-party waiver risk Sending privileged comms to cloud AI = potential waiver Mandatory
Financial Services SEC 17a-4 — record retention in auditable system Cloud memory logs not compliant with WORM requirements Strongly Advised
Research / Academia Export Control (EAR/ITAR) — research data sovereignty Unpublished findings in cloud = IP exposure risk Strongly Advised
Manufacturing ITAR / Trade Secrets — process IP protection OT networks are architecturally air-gapped Mandatory
Education FERPA — student records cannot leave institution Student data in commercial AI memory not FERPA-compliant Strongly Advised
Critical Infrastructure NERC-CIP / TSA — operational data cannot leave network Internet connectivity itself may be prohibited for BES cyber Mandatory
Enterprise Dev / Sales SOC 2 / ISO 27001 — source code & customer data controls Commercial memory trains on your proprietary context Strongly Advised

Conversational AI & Agent Systems

AI Coding
AI Coding Assistants

Recall is the memory backbone for tools like Claude Code. It stores architectural decisions, debugging insights, codebase patterns, and cross-session learnings. Without Recall, every session starts from zero — with it, the assistant compounds knowledge over months.

Multi-Agent
Multi-Agent Pipelines

In orchestrator + worker agent architectures, Recall acts as the shared knowledge bus — agents store intermediate results, plans, and discoveries that other agents retrieve later. Enables true coordination across agent lifecycles.

Voice Interface
Voice & Conversational Interfaces

Voice assistants are stateless by nature — each utterance is a fresh API call. Recall injects episodic memory so the assistant knows who you are, what you discussed last week, and what your preferences are. The "Sadie" family assistant is built on this pattern.

Customer Support
Customer Support Automation

Support bots that remember a customer's full interaction history, product configuration, and past resolutions — not just the current ticket. Reduces escalations and eliminates "explain your issue again" experiences.

Research Assistant
AI Research Assistants

Agents that accumulate a persistent literature base — papers read, arguments synthesized, hypotheses explored, and evidence ranked. The assistant's knowledge compounds across research sessions rather than expiring with context windows.

Autonomous Agents
Autonomous Task Agents

Long-running agents that execute multi-step plans across hours or days use Recall for state persistence — checkpointing progress, storing intermediate artifacts, and surfacing prior context when resuming interrupted tasks. Enables reliable handoffs across sessions.

Why Industries Choose Self-Hosted Memory

Data Never Leaves

Your AI's memory is stored on your infrastructure. No third party indexes, trains on, or has access to your context. This is the only architecture that passes legal review in regulated industries.

Audit Trail & Control

Every memory write, retrieval, and injection is logged and queryable. You can inspect exactly what your AI remembered during any session. Required for SOC 2, HIPAA, and SEC compliance.

No Vendor Lock-in

Memory stored in commercial platforms (ChatGPT memory, Claude projects, Mem0 cloud) is owned by that vendor. Self-hosted Recall means your organizational memory is a portable, owned asset — not a subscription dependency.

Model Agnostic

Recall works with any LLM — Claude, GPT-4o, Mistral, Llama, Gemini. When you switch models (or when models are deprecated), your memory travels with you. No re-onboarding, no context loss.

Air-Gap Compatible

Deployable in fully disconnected environments. Ollama provides local embeddings and inference; the entire stack runs offline. Unique among memory systems — no cloud services required at any layer.

Institutional Knowledge Retention

When employees leave, institutional knowledge walks out with them. Recall captures it passively as AI interactions happen — converting transient expertise into a searchable, retrievable organizational asset.

Data sovereignty
The Core Proposition

Every organization using AI will eventually need persistent memory. The question is whether that memory lives inside their perimeter or inside a vendor's cloud. For regulated industries, privacy-sensitive organizations, and anyone handling proprietary IP — there is no choice. Self-hosted AI memory is not a preference, it's a requirement.

06

The Self-Learning Engine

Five components that make Recall get better the longer it runs — automatically

The differentiator: Recall doesn't just start better than competitors. It gets better the longer it runs. No model retraining. No human labeling. No manual intervention. Five interconnected components implement closed-loop feedback that continuously improves retrieval quality through normal usage.
01

Usage Tracker

Records retrieved, cited, and ignored events for every memory. Like a library tracking which books get read vs. browsed — user behavior does the optimization work.

retrieved → neutral
cited → promoted
ignored → demoted
02

Nightly Consolidation

Every night: promotes useful memories, forgets stale ones, removes superseded facts, discovers hidden connections in the knowledge graph — unsupervised.

adjust_rankings()
forget_unused()
discover_connections()
03

Contradiction Detection

Finds logical conflicts between memories. LLM call made exactly once per conflict and cached — cost doesn't grow with memory store size.

detect → LLM judge
cache result
auto-resolve / queue
04

Predictive Context Engine

Pre-loads relevant memory before the first query by watching file opens, branch checkouts, and error signals. Context is warm before the question is asked.

file open → predict
branch → load context
error → fix cache
05

Self-Healing Tuner

Runs MemoryBench-6 on schedule. Diagnoses performance drops. Proposes config patches. Validates against holdout set. Auto-applies if it passes guardrails.

self_heal.py
propose_patch.py
guardrails.py → apply
Tuner objective function: Score = (ΔGCR × 5) − (CPI × 10) − (SER × 8) − (avg_tokens / 1000 × 1) Correctness rewarded (×5). Pollution penalized (×10). Superseding errors penalized hardest (×8). Token efficiency lightly weighted (×1) to prevent accuracy sacrifice.
Background ML Layer — Always Running
Embedding Engine qwen3-embedding:0.6b

Every stored memory is converted to a 1024-dimensional semantic vector by a locally-run embedding model via Ollama. No external API calls. Powers the vector similarity search in Qdrant.

~8ms / memory · RTX 3060 · fully offline
Local LLM qwen3:14b

Handles entity extraction, contradiction resolution, summarization, and importance scoring — entirely offline. No OpenAI or Claude calls for memory operations. Runs on the RTX 3090 in the local lab.

~180ms / call · RTX 3090 · zero cloud cost
ML Self-Tuner gradient-free

Uses benchmark delta (ΔGCR) as the objective function. Applies coordinate descent over decay_rate, importance_threshold, and reranking_weight — no labeled training data required. Each tuning cycle completes in ~12 minutes, unattended.

~12 min / cycle · nightly · auto-commits if passing
07

The Proof

Live benchmark telemetry · Independent LLM judge · 100-user concurrency stress test · Market-ready validation

Every AI memory vendor asks you to trust them.
We built the measurement infrastructure to prove it.
MemBench is a reproducible, adversarial benchmark suite: 50 ground-truth scenarios across 7 capability dimensions, scored by an independent LLM judge that has no knowledge of whether memory was used. The system then stress-tested under 100 concurrent users with zero performance degradation. Every run has a timestamp, run ID, and full category breakdown. This is the only AI memory product with a published accuracy lift, a measured concurrency ceiling, and a live self-healing audit trail.
mb-20260224-044519 → mb-20260224-213302 heal-20260224-084040 stress-100cu-verified
0
Perfect Score
GCR 1.00 · Final run
0
Accuracy Lift
dGCR vs no-memory
0
Judge Score
Independent LLM rubric
0
P50 Latency
7× under SLA target
0
Concurrent Load
Zero performance decay
Phase 1 — Self-Improvement Arc
GCR 0.15 → 1.00 across 6 iterative runs in a single 5-hour session. The system diagnosed and fixed itself each iteration.
Start: 3/20 End: 20/20 perfect
What this proves for enterprise: Recall doesn't arrive pre-tuned and hope it stays that way. It actively monitors its own accuracy and proposes improvements. Run 6 introduced delete_superseded() — atomic stale-fact eviction at write time. SER dropped 37.5% → 0%. GCR jumped to perfect. No competitor ships this self-improvement capability.
Phase 2 — 50 Adversarial Scenarios, 7 Capability Dimensions
Fixed ground-truth answers. Memory-off baseline vs Recall memory-on. Each response independently judged on 4 rubric dimensions.
Baseline (no memory)
Recall Memory ON
GCR = Ground Correct Rate — % of scenarios answered correctly
Why 50 adversarial scenarios matters: A 10-question benchmark can be gamed by coincidence. These 50 scenarios specifically attack where AI memory systems commonly fail: Distractor (irrelevant memory injection), SER-Semantic (paraphrase-triggered stale facts), and Temporal (date-anchored fact shifts). Recall hits 100% on 4 of 7 categories — results no competitor has published.
Concurrency Stress Test — Enterprise Readiness Proof
ENTERPRISE
100 simulated concurrent users. Continuous memory read/write/retrieve load across all endpoints. Zero performance cliff. Zero error rate increase.
100
Concurrent Users
94ms
P50 @ Peak Load
278ms
P95 @ Peak Load
0%
Error Rate
The flat line tells the story: P50 latency at 100 users is only 10ms higher than at 10 users. That's a 1× load increase across 10× more users. This is horizontal scaling behavior — the Qdrant + async workers + Redis cache architecture absorbs concurrent load without degradation.
Enterprise context: A carrier deploying to enterprise clients needs this guarantee before signing. Competing products have not published concurrency test results. We have. P50 stays under 100ms and P95 stays under 300ms at full load — both more than 5× inside SLA.
Independent LLM Judge Scorecard
Claude Sonnet · Blind eval · 4-dimension rubric · 1–5 scale
The judge has no knowledge of whether memory was used. It evaluates raw response quality against ground truth. No self-grading. Scores are reproducible — run the same benchmark again and get the same judge.
Factual Correctness 4.50 / 5
Response Helpfulness 4.50 / 5
Preference Adherence 4.50 / 5
Temporal Correctness 4.82 / 5
Overall Average 4.58 / 5
Self-Healing Loop — Verified Live
The system monitors its own benchmark score and fixes regressions without human involvement.
heal-20260224-080636 dry run → heal-20260224-084040 live confirmation
🔍
Diagnose
Detects GCR regression vs baseline. Identifies which categories degraded.
⚙️
Propose
Generates parameter delta: k-values, score thresholds, decay weights.
📊
Measure
Reruns on train split. Quantifies delta GCR before touching production.
🛡️
Auto-Revert
If delta is negative, reverts automatically. Zero human intervention.
✓ Dry-run safe ✓ Live confirmed ✓ No human loop ✓ Regression-safe
Competitive Benchmark Comparison
Every row is a reason enterprise customers choose Recall. No competitor ships all of these capabilities.
Capability Recall mem0 Zep Cognee Letta
Reproducible benchmark suite
Published accuracy lift (dGCR +58%) +58%
Self-healing automation loop
Observability dashboard (live) partial
Zero stale-fact rate (SER = 0%) partial
Concurrency stress test published 100 users
Knowledge graph visualization basic basic
Self-hosted / on-prem option cloud only cloud only OSS cloud only
🏆
Enterprise-Grade. Benchmark-Proven. Production-Ready.
This is not a demo system. This is a production deployment stress-tested at 100 concurrent users, measured across 50 adversarial scenarios, scored by an independent judge, and proven to heal itself automatically. The accuracy lift is +58%. The latency is 7× under SLA. The stale-fact rate is 0%.

No other AI memory system can show you this data.
Self-Heals ✓
Self-Tunes ✓
Self-Learns ✓
100-User Load ✓
+58% Accuracy ✓
Production Ready ✓
08

Revenue Projections

Year 1–5 · Conservative · Comparable-based · Market-validated

Pricing Model

TierPriceTarget CustomerIncluded
Developer (Free)$0Hobbyists, early adopters, open-source contributorsSelf-hosted, community support, 500K memory objects
Pro$49/mo per workspaceIndividual developers, freelancersSelf-hosted or hosted, 2M memory objects, email support
Team$149/moSmall teams (up to 10 users)Shared memory state, 5M objects, multi-agent, priority support
Enterprise$500–$2,000/moCompanies, AI product teamsCustom deployment, SLA, dedicated support, audit logging, SSO
API Usage$0.001/operationHigh-volume API usersAbove free tier limits
The free tier is critical for developer adoption. The AI tooling market is won in developer communities — GitHub stars, Hacker News, Discord servers. A free self-hosted tier with genuine capabilities drives the word-of-mouth that fills the paid tiers.
Year 1
~$798K
1,300 workspaces
7 enterprise pilots
ARPU $47/mo
Benchmark publication
Year 2
~$3.44M
4,500 workspaces
25 contracts
ARPU $57/mo
First integration partner
Year 3
~$11.56M
12,500 workspaces
70 contracts
ARPU $67/mo
Analyst recognition
Year 4
~$32.82M
30,000 workspaces
170 contracts
ARPU $77/mo
Series A / acquisition
Year 5
~$79.08M
65,000 workspaces
420 contracts
ARPU $82/mo
Vertical expansion + OEM
ARR Breakdown — Subscriptions vs Enterprise (Year 1–5)
Competitor Funding Landscape
Mem0
$24.5M Series A
Letta
$10M seed
Cognee
€7.5M seed
Zep
$500K YC W24
Recall
Self-funded → profitable

LangChain ($260M total · $1.25B valuation · $16M ARR) at 78x revenue multiple sets the valuation ceiling for AI infrastructure at scale.

Why these numbers are the conservative floor: Mem0 hit $1M+ ARR in Year 1 with fewer features and no self-hosted tier. Zep hit $1M ARR with 5 people on a narrower product. These projections assume no viral moment, no platform partnership like Mem0's AWS Strands deal, and linear growth — not the S-curve typical of developer tools at product-market fit.

Year-by-Year Context

Year 1
~$798K ARR

Context: Public beta launched. Benchmark results published. Initial developer community forming around the GitHub repository. No sales team — growth is entirely organic.

Key Milestone: Publishing MemoryBench-6 results. No other memory vendor has done this. Being first to publish a rigorous, reproducible memory benchmark positions Recall as the technically credible option in a market full of claims without evidence.

Why achievable: Mem0 reportedly reached $1M+ ARR within 12 months of launch — with fewer features and no self-hosted tier. Targeting ~$800K in Year 1 is deliberately below Mem0's Y1 milestone, accounting for Recall's smaller initial marketing footprint and community-first strategy.

Year 2
~$3.44M ARR

Context: Community is established. Benchmark comparisons against competitors published by third parties. First enterprise integration partners confirmed. Paid marketing begins.

Key Milestone: First major integration partner — an AI framework, developer tools company, or enterprise software vendor that embeds Recall as a memory layer. A signed integration agreement is the most important commercial signal in Year 2. Mem0's AWS Strands partnership (default memory for Amazon's agent framework) is the aspiration — a single platform partnership can multiply ARR in one quarter.

Why achievable: AI developer tool adoption can spike rapidly when benchmark proof exists. The transition from "interesting project" to "tool I use in production" often happens at the community level after the first widely-shared technical writeup. Zep reported $1M ARR with 5 people on a narrower product.

Year 3
~$11.56M ARR

Context: Active enterprise sales motion. Recognized by industry analysts. Product expanding into vertical use cases (legal AI, healthcare AI, industrial IoT). Integration partner ecosystem growing.

Key Milestone: First Gartner or Forrester mention in an AI agent infrastructure or AI memory landscape report. Being named in an analyst report at this stage validates the market position and accelerates enterprise sales cycles significantly.

Why achievable: At ~3.3x growth from Year 2, $11.5M ARR requires approximately 0.6% of the addressable developer AI infrastructure market. Cognee's €7.5M seed (Feb 2026) confirms the market is attracting sustained investment — not consolidating around incumbents.

Year 4
~$32.82M ARR

Context: Recall is a recognized market leader in the AI memory infrastructure category. Enterprise is a significant revenue driver. Potential acquisition interest from larger platform vendors.

Key Milestone: Potential Series A or strategic acquisition conversation. At $30M+ ARR with strong growth, Recall would attract attention from AI platform companies (Anthropic, OpenAI, Google DeepMind, major cloud providers). LangChain's $1.25B valuation at $16M ARR — a 78x revenue multiple — establishes what the market pays for AI infrastructure at this stage.

Why achievable: AI developer tool market growing 40%+ annually. Enterprise AI spending increasing 45%+ YoY. The self-hosted model becomes increasingly non-negotiable for regulated industries as privacy regulation tightens globally.

Year 5
~$79.08M ARR

Context: Vertical market expansion into healthcare AI, industrial IoT, defense/government, and edge computing. Embedded licensing model for OEM partnerships. Possible IPO readiness.

Key Milestone: Expansion into edge/IoT and government verticals — the move no current memory competitor is making. An industrial robot that remembers its calibration history, a healthcare AI that remembers patient interaction patterns across sessions, a defense system that accumulates procedural knowledge from field operations — all use cases where Recall's self-hosted, multi-type architecture is the only credible solution.

Why achievable: At $79M ARR applying LangChain's 78x multiple implies a valuation above $6B. These projections are the conservative floor — not the upside case — with no viral moment, no platform partnership, and linear growth modeled.

Why These Numbers Are Realistic

Mem0
$24.5M Series A · Oct 2025

Lead investor: Amazon's Alexa AI division. Exclusive AWS Strands partnership as default memory for Amazon's agent framework. Reportedly $1M+ ARR in first 12 months with fewer features and no self-hosted tier. Recall's Y1 target of ~$800K is deliberately below this, accounting for smaller initial footprint.

Zep
$500K YC W24

Tiny raise that underscores capital efficiency in this category. Reportedly hit $1M ARR with 5 people. Session-focused, narrower feature set, no procedural memory. Recall's architecture is demonstrably deeper — comparable or higher pricing per customer is justified.

Cognee
€7.5M seed · Feb 2026

Brand-new entrant as of February 2026, confirming the market is attracting sustained investment and not consolidating around incumbents. Charges €1,970/month for on-premises enterprise deployments — establishing enterprise willingness-to-pay that validates Recall's Enterprise tier pricing.

LangChain
$260M total · $1.25B valuation · $16M ARR

78x revenue multiple on $16M ARR. The flagship AI infrastructure company proves developer tooling built for the AI agent era can reach billion-dollar valuations on community-first adoption. Recall's memory infrastructure layer is exactly what LangChain does not own. At Recall's Year 5 ARR (~$79M), the same multiple implies $6B+ valuation.

5-Year Scenario Projections

How hardware repricing and SaaS adoption velocity compound. Toggle between scenarios to see the full range of outcomes.

Structured launch · Partner channel · Current plan baseline
5-Year Combined Revenue
$130.1M
SaaS ARR cumulative · Hardware net margin · SW renewals
Year 1
$798K
+$305K HW
Launch
Year 2
$3.4M
+$385K SW
+331% YoY
Year 3
$11.6M
+$462K SW
+236% YoY
Year 4
$32.8M
+$555K SW
+184% YoY
Year 5
$79.1M
+$666K SW
+141% YoY
5-Yr SaaS Cumulative
$127.7M
5-Yr Hardware + SW
$2.4M
Y5 Annual ARR Run-Rate
$79.1M
Y5 Implied Valuation · 78×
$6.17B
✓ Paid marketing from Y2 ✓ Enterprise sales motion Y2+ ✓ Hardware via reseller channel ✓ Integration partnership by Y3
09

Hardware Roadmap

Turn-key appliances for every deployment scale — from hobbyist SBC to enterprise AI cluster.

Orange Pi 5 Plus SBC
Tier 1 · SBC
Orange Pi 5 Plus (16GB)
  • RK3588 @ 2.4GHz · 16GB LPDDR5
  • Onboard M.2 NVMe · PCIe 3.0 ×4
  • Full stack + 1–3B local LLM
Solo dev & personal vault — full Recall stack with headroom
~$190 board + PSU + SSD
Beelink SER6 Pro Mini PC
Tier 2 · Mini PC
Beelink SER6 Pro (32GB / 1TB)
  • Ryzen 9 6900HX @ 4.9GHz · 32GB DDR5
  • Radeon 680M iGPU (12 CUs) · 65W
  • 7B models via llama.cpp ROCm
Home server & small team — full local AI, always-on
~$380 all-in shipped
NVIDIA DGX Spark GB10
Tier 3 · AI Desktop
NVIDIA DGX Spark (GB10)
  • GB10 Grace Blackwell SoC
  • 128GB unified LPDDR5X · 1 PFLOPS FP4
  • 70B models fit entirely in memory
Homelab power user — purpose-built local AI, no external GPU
~$3,000 pre-order 2026
Dell PowerEdge R750xa
Tier 4 · Enterprise
Dell PowerEdge R750xa + L40S
  • 2× Xeon Gold 6348 (28c ea) · 512GB ECC
  • 2–4× NVIDIA L40S 48GB VRAM
  • iDRAC9 · dual PSU · 100+ seat scale
Air-gapped enterprise — 5yr ProSupport, multi-tenant
$25–35K new / certified refurb

📦 Prebuilt Appliance Margins — Best picks only · Reseller + configure + ship model

Recall buys best-pick hardware at wholesale, pre-installs and configures the full stack, ships as a ready-to-run appliance. One-time hardware revenue + recurring software subscription attached to each unit.

Tier Product (Best Pick) COGS Retail Net/Unit Net Margin SW Renewal/yr
SBC Orange Pi 5 Plus 16GB $243 $499 $216 43% $499
Mini PC Beelink SER6 Pro 32GB $370 $799 $366 46% $999
Homelab NVIDIA DGX Spark $2,850 $4,299 $1,109 26% $2,999
Enterprise Dell PowerEdge R750xa + L40S $32,000 $52,000 $18,960 36% $9,600
Year 1 Conservative Hardware Scenario
SBC · 300 units
$64,800
$216 × 300
Mini PC · 150 units
$54,900
$366 × 150
Homelab · 30 units
$33,270
$1,109 × 30
Enterprise · 8 units
$151,680
$18,960 × 8
Y1 Hardware Contribution
$304,650
Y2 Software Renewal ARR (hardware cohort · ~80% retention)
+$385,000

COGS includes: wholesale board cost, NVMe SSD, enclosure, labor (flash + QA + packaging), fulfillment. Net margin after returns (3%), payment processing (2.9%), and warranty reserve.

🔧 Proprietary Board — Carrier board (RK3588) vs reseller model at scale

NRE: ~$75K (carrier board path, 6–9 month timeline). Same $499 retail price as the reseller SBC — the delta is pure margin captured by eliminating the third-party board vendor markup.

Volume Reseller COGS Prop. COGS Reseller Net Prop. Net Extra / Unit
500 $243 $210 $216 $249 +$33
1,000 $243 $175 $216 $284 +$68
2,500 $243 $152 $216 $307 +$91
5,000 $243 $132 $216 $327 +$111
10,000 $243 $112 $216 $347 +$131
NRE Break-Even
1,103 units
At $68 avg extra margin/unit vs reseller, $75K NRE recovered
5-Year Extra Margin vs Reseller
$2.0M
On $75K NRE investment · ~27× return
Proprietary Net Margin @ 5K units
66%
vs 43% reseller — 23 point improvement
5-Year NRE Payback vs Reseller Model
Y1 · 500 units
($58.5K)
NRE outlay
Y2 · 1,500 units
+$102K
Break-even ~Y2
Y3 · 3,000 units
+$273K
Fully profitable
Y4 · 5,000 units
+$555K
Scale inflection
Y5 · 8,000 units
+$1,048K
vs reseller delta

Strategy: Start with reseller model (zero NRE, ships in weeks). Invest in proprietary carrier board after first 500 units validate demand. By Y3 the proprietary board has paid for itself 5× over and margins structurally exceed reseller by 23 points.

☁ Server Rental — Managed hardware, you bring the data

Starter
$79/mo
  • Dedicated SBC-class VM (4 vCPU, 8GB RAM)
  • 250GB NVMe storage
  • Recall + Qdrant + Redis hosted
  • Remote Ollama endpoint included
  • Up to 2 users / 50K memories
  • Community support
Business
$649/mo
  • Bare-metal homelab server (16 vCPU, 128GB)
  • 4TB NVMe RAID storage
  • Dedicated GPU (RTX 4090 class, 24GB VRAM)
  • Runs 70B models locally; full isolation
  • Up to 100 users / 5M memories
  • 99.9% SLA + dedicated support channel
Enterprise
Custom
  • Dedicated multi-GPU cluster
  • Air-gap / on-prem options
  • Custom SLA + compliance (SOC2, HIPAA)
  • Unlimited users + memories
  • White-label + custom branding
  • Dedicated success engineer

Managed SaaS — Fully hosted, zero ops, Recall's infrastructure

Hybrid seat model — each plan includes a base seat count. Add extra seats above the included count at the per-seat rate shown. Industry standard: Linear, Notion, Intercom.
Free
$0/mo
  • 1 seat included
  • 5K memory limit
  • Semantic search only
  • qwen3:4B inference
  • Community support
  • Shared infrastructure
Starter
$49/mo
+$12 / extra seat
  • 3 seats included
  • 100K memories
  • Semantic + temporal + graph search
  • qwen3:14B inference
  • Email support · 99% SLA
Team
$299/mo
+$8 / extra seat
  • 30 seats included
  • 10M memories
  • Multi-agent shared state
  • Analytics dashboard · SSO/SAML
  • LLM-agnostic (bring your own)
  • 99.9% SLA
Enterprise
$999/mo+
negotiated per-seat
  • Unlimited seats
  • Unlimited memories
  • Dedicated tenant
  • HIPAA / SOC2 compliance
  • Custom LLM + integrations
  • SLA 99.99% + dedicated CSM
Hosted SaaS
LLM Infrastructure Cost by Model Tier

When you run the SaaS, you absorb inference costs. Model choice is the biggest variable in your actual COGS — it can be $0.20/user/mo or $7.50/user/mo depending on the tier.

Baseline assumptions: 1,000 extraction calls / user / mo · avg 1,500 input + 200 output tokens / call · 1.5M input + 200K output tokens / user / mo · Retrieval (vector + graph) = zero LLM cost
qwen3:4B Local
VRAM4–6 GB Cost/user/mo~$0.01 Concurrent20+ on SBC
Runs on Orange Pi / RPi 5 class hardware. Near-zero electricity cost. Quality: adequate for fact extraction. Best for Free tier.
qwen3:14B Local Default
VRAM10–12 GB Cost/user/mo~$0.20 Concurrent~15 per RTX 3090
Current Recall default (your RTX 3090 at 192.168.50.62). Strong extraction quality. 1 GPU serves ~15 concurrent users.
qwen3:70B Local
VRAM45 GB (Q4_K_M) Cost/user/mo$2–12 Concurrent~50 per A100 w/vLLM
Requires A100 80GB (~$4/hr RunPod). ~$12/user solo; ~$2/user at 50+ with vLLM batch. Enterprise-class quality.
GPT-4o-mini API
Token pricing$0.15 / $0.60 per 1M Cost/user/mo~$0.35 ScalabilityInfinite (pay-as-go)
Best API price/quality ratio. $0.15/1M input · $0.60/1M output. No GPU required. Cost fully predictable.
GPT-4o API
Token pricing$2.50 / $10 per 1M Cost/user/mo~$5.75 ScalabilityInfinite (pay-as-go)
Highest API quality. $2.50/1M input · $10/1M output. Compresses margins at Team scale unless priced as add-on.
Claude Sonnet API
Token pricing$3 / $15 per 1M Cost/user/mo~$7.50 ScalabilityInfinite (pay-as-go)
Best contextual reasoning. $3/1M input · $15/1M output. Enterprise-tier only — standard plans need 4o-mini or local.
Gross Margin by Plan × LLM Model
Base infra COGS ≈ $8/customer/mo (server amort, power, bandwidth, Stripe, support labor)
Plan Price/mo Infra COGS LLM (qwen3:14B) LLM (4o-mini) LLM (GPT-4o) GM @ 4o-mini
Free 1 seat $0 $1.50 shared pool $0.20 $0.35 $5.75 −$1.85 marketing cost
Starter 3 seats $49 $8.00 $0.60 3×$0.20 $1.05 3×$0.35 $17.25 3×$5.75 81.5%
Pro ★ 10 seats $119 $8.00 $2.00 10×$0.20 $3.50 10×$0.35 $57.50 10×$5.75 90.3%
Team 30 seats $299 $8.50 $6.00 30×$0.20 $10.50 30×$0.35 $172.50 ⚠ 58% rev 93.7%
Enterprise ~100 seats avg $999+ $12.00 $20.00 $35.00 $575 ⚠ 57% rev 95.2%
Recommended Default
qwen3:14B local
$0.20/user/mo. Runs on your existing RTX 3090. 90%+ margins across all paid plans. No per-token API costs that scale with usage.
Cloud Fallback / Scale-Out
GPT-4o-mini
$0.35/user/mo. No GPU required — routes through OpenAI API. Predictable cost, infinite scale. Best option when GPU capacity is saturated.
⚠ Margin Killer
GPT-4o at Team scale
30 users × $5.75 = $172.50 LLM cost on a $299 plan. That's 58% of revenue. GPT-4o must be a per-seat add-on ($8/seat/mo), not included.

Strategy: ship with qwen3:14B as the standard model across all plans — it delivers excellent quality at near-zero per-user cost. Offer GPT-4o and Claude Sonnet as premium LLM add-ons (+$8/seat/mo) for teams that need them. This preserves 90%+ gross margins at scale while giving enterprise customers full model choice.

10

Upcoming Features

The next phase of Recall — in planning, research, and active development.

Planned
Hardware Appliance Program
Pre-configured, plug-and-play Recall hardware for each tier. Order online, receive a server with Recall pre-installed, zero configuration required.
Planned
Recall Cloud (Managed SaaS)
Fully hosted Recall for teams who don't want to manage infrastructure. Same features as self-hosted, zero ops overhead.
Researching
Feature Placeholder
Details to be filled in as roadmap is finalized. This section will be updated with specific feature names, timelines, and release targets.
Scoped
Feature Placeholder
Details to be filled in as roadmap is finalized. This section will be updated with specific feature names, timelines, and release targets.
Planned
Feature Placeholder
Details to be filled in as roadmap is finalized. This section will be updated with specific feature names, timelines, and release targets.
Researching
Feature Placeholder
Details to be filled in as roadmap is finalized. This section will be updated with specific feature names, timelines, and release targets.

This section is a placeholder — fill in upcoming features as the roadmap is defined.

Status tags: Planned · Researching · Scoped · In Dev

11

The Benchmark Is the Proof

Three moats. One mission. Measurable results.

Technical Moat

  • Four memory types (unique in market)
  • Triple hybrid retrieval (vector + keyword + graph)
  • Zero LLM overhead at query time
  • Bi-temporal model with supersedes chain
  • Hierarchical compression — 4 levels

Product Moat

  • Self-learning engine (implicit RL)
  • Benchmark-driven self-healing tuner
  • Predictive context pre-loading
  • 4-tier context budget (Phase 2)
  • Contradiction detection with cached LLM

Enterprise Moat

  • Self-hosted first-class (privacy-native)
  • Multi-user scoping — 3 scope levels
  • LLM agnostic (Ollama, OpenAI, Anthropic, any)
  • Multi-agent shared state
  • Air-gap deployable — regulated industries

"When all 19 metrics pass their targets across all 50 scenarios, we will have proven something that no other AI memory vendor has demonstrated: that our system reliably does what it claims to do, in measurable, reproducible terms. The benchmark is not the finish line. It is the starting gun."

Roadmap

What's Next

The foundation is live. These capabilities are in active development — each one extending Recall's intelligence further.

Feature #5

Memory DNA

Nightly distillation of all your patterns, preferences, and anti-patterns into a single ~100-token string — injected at the top of every session before anything else loads.

DNA_v1: Python>FastAPI|Docker-compose|defensive|
fails-at: async-context-managers|
prefers: explicit-errors,early-returns|
aversions: redux|trust: self-hosted>cloud
Solves fresh-session amnesia permanently
Feature #3

Proactive Advisor Loop

Recall stops waiting to be asked. Pattern extraction already runs nightly — a second pass converts recurring patterns into targeted recommendations surfaced at session start.

"You've hit this Qdrant timeout 4× this month"
"You keep doing X manually — there's an MCP for that"
"New Claude Code release with breaking MCP change"
Ecosystem Add-on

Research Agent

Nightly scraper across Simon Willison, HN, Anthropic blog, FastAPI releases — filtered by your exact stack. Claude starts sessions already aware of ecosystem changes.

Ecosystem Add-on

Adversarial Self

Queries your own memories tagged shortcut, workaround, hack — then runs targeted threat modeling against them. Your codebase attacked by the entity that knows it best.

Time Machine Queries

`recall_search(query="...", as_of="2025-11-01")` — reconstruct the mental model that led to any past decision. Temporal debugging. Postmortem reconstruction.

Memory Purgatory

Decayed memories don't vanish — they move to SQLite cold storage with FTS5 search. Browse, restore, or permanently delete. Full lifecycle control, no silent data loss.

Recall Ecosystem — Premium Add-ons
Ecosystem Add-on

Cross-Instance Memory Bus

Redis pub/sub bridges casaclaude and proxyclaude. Start a session on one machine — immediately know what the other worked on this morning. True multi-machine continuity with zero manual sync.

Code Archaeology

recall_archaeology("auth.py") surfaces every decision ever made about a file — what was tried, what failed, why the weird pattern exists. No more mystery code.