System-Recall Validation Report

The App

A fully observable AI memory platform — live dashboards, real-time benchmarking, knowledge graph visualization

No Competitor Has Built This

mem0, Zep, Cognee, Letta — every other AI memory product is a black box. You put memories in, you get answers out. Zero visibility into what the system knows, how healthy it is, or whether memory is actually helping. Recall ships a full observability platform alongside the memory engine. This isn't a dashboard bolted on after — it's how we built the system from day one.

mem0 — no dashboard

Zep — no benchmark tool

Cognee — no health analytics

Recall — all of it ✓

System Dashboard

LIVE

Real-time system telemetry across every layer of the memory stack. At a glance: 12,149 stored memories, 53,945 graph relationships, 5 services (Qdrant, Neo4j, Redis, Postgres, Ollama) all green. Active LLM models, their sizes, quantization, and load state visible in one view.

→ Enterprise compliance teams can audit exactly what the AI knows

→ Ops teams can monitor memory growth and capacity in real-time

→ No competitor can show you this screen

Recall Dashboard — live system telemetry

MemoryBench — Built-In Benchmark Suite

UNIQUE

The world's only memory system that ships with a reproducible benchmark suite you can run against your own data. +62% memory lift (dGCR), 92% accuracy, 0% stale entity rate — these aren't lab numbers, they're live measurements from this run (mb-20260224-181412). Every category broken down: Distractor, KG, Long Horizon, Preference, Procedural, SER Semantic, Temporal.

→ Carriers can verify memory quality before deployment to customers

→ Developers can track whether updates improved or regressed accuracy

→ Automated via self-tuning loop — benchmarks run and act on themselves

MemoryBench — reproducible benchmark suite with live results

System Health — Memory Quality Analytics

UNIQUE

Deep analytics on the quality of the AI's memory over time. Feedback score 70.8% positive. Graph cohesion 0.862 (strong edge bonds). Importance distribution histogram showing how memories are weighted. Decay curve tracking how memories age. This is memory science visualized — not possible to get this insight from any competing product.

→ Spot degradation before it affects response quality

→ Tune decay parameters from data — not guesswork

→ Graph cohesion metric unique to Recall — shows knowledge integrity

System Health — memory quality analytics with decay curves and distribution charts

Knowledge Graph — See How the AI Thinks

NEXT LEVEL

Interactive force-directed graph of the entire knowledge network — 50+ nodes, color-coded by layer (API Routes, Core Logic, Storage, Workers, Dashboard, Hooks). See which modules call which, how data flows between components, what hooks trigger what logic. This is the AI's memory architecture made visible. Filter by layer, zoom in on clusters, inspect relationships. No other AI memory product lets you see inside the machine.

→ Enterprise security teams can audit data flow paths

→ Developers can understand memory architecture at a glance

→ Demonstrates the depth and maturity of the system

LIVE · KNOWLEDGE GRAPH

53,945 relationships · 12,149 memories · 6 layers

api core storage worker dash hook

Why Observability Is a Moat — Not a Feature

Enterprise Requirement

Enterprise buyers will not deploy AI memory without auditability. GDPR, SOC 2, HIPAA — all require knowing what data is stored and how it's used. Our dashboard is the compliance answer. Competitors can't enter enterprise without building this first.

Self-Improving Loop

MemBench isn't just display — it drives the self-tuning loop. The app watches its own benchmarks, detects regressions, proposes fixes, and verifies them. The dashboard IS the automation interface. No other product has closed this loop.

Carrier Differentiation

Any carrier who bundles Recall can offer their customers a management dashboard. This is a premium upsell: "Memory + Analytics" vs bare API. Competitors offer an API. We offer an experience — and the analytics to prove it's working.

Why We Measure

The case for provable memory over claimed memory

The core argument: AI agents are becoming the dominant interface for software. Every AI agent needs memory. The market for AI memory infrastructure is growing faster than any single vendor can capture. System-Recall is the only system combining all major capabilities — four memory types, knowledge graph traversal, zero LLM overhead at query time, self-learning, and a closed-loop self-healing tuner — in a single self-hosted stack.

Competitors make big claims based on shallow demonstrations. "My AI remembered something once" is not proof. System-Recall takes a different approach: a rigorous, reproducible benchmark suite — MemoryBench-6 — that measures memory performance the same way a laboratory measures drug efficacy. Define the test, run it, publish the results, let others verify.

Recall is the only self-hosted stack with all of the following

→ Four memory types: episodic, semantic, procedural, temporal

→ Triple hybrid retrieval: vector + keyword + knowledge graph

→ Zero LLM calls at query time — 10–50ms retrieval

→ Bi-temporal model: captured_at + valid_from/valid_to

→ Neo4j knowledge graph backend for relational queries

→ Nightly decay, reranking, and active forgetting

→ Usage-based memory ranking — citation feedback loop

→ Contradiction detection with cached LLM judgment

→ Signal detection + session hooks (auto-capture, no manual saves)

→ 4-tier context budget architecture with dynamic rebalancing (Phase 2)

→ Hierarchical compression — 4 detail levels per memory

→ Predictive context loading before first query (Phase 2)

→ AST-aware code pattern and workflow memory (Phase 2)

→ LLM agnostic — Ollama, OpenAI, Anthropic, any endpoint

→ Multi-user scoping: private / project-shared / system tiers

→ Multi-agent shared memory state

→ Self-hosted, privacy-first — your data never leaves your network

→ Benchmark-driven self-healing tuner — closed-loop feedback

Four Memory Types

Episodic, semantic, procedural, and temporal — each with dedicated storage and retrieval behavior. No competitor implements all four.

Triple Hybrid Retrieval

Vector semantic + BM25 keyword + Neo4j graph traversal run simultaneously on every query. Most competitors use only semantic search.

Zero LLM at Query Time

10–50ms retrieval. Competitors calling LLMs at query time pay 500ms–3s per search and incur per-token costs at scale.

Bi-Temporal Model

captured_at + valid_from/valid_to — answer "what did we know on January 15th?"

Self-Learning Engine

Usage-driven memory ranking, nightly consolidation, contradiction detection, and unsupervised graph connection discovery.

Benchmark-Driven Self-Healing

MemoryBench-6 acts as a control system. Performance drops trigger automatic diagnosis, patch generation, and validated deployment.

19 Benchmark Metrics

Every metric maps to a real user need. Every target is a measurable pass/fail gate.

These are not vanity metrics. Each was selected because it can be gamed in isolation but is hard to manipulate when all 19 are tracked simultaneously. A system that passes all 19 across all 50 scenarios has genuinely good memory — not a tuned demo.

AccuracyGCR

Goal Completion Rate

% of 50 benchmark scenarios answered correctly with memory enabled. The primary quality signal — everything else explains why this number is what it is.

≥ 90%

AccuracyΔGCR

Delta GCR — Memory Lift

How much better the AI performs WITH memory vs. WITHOUT: GCR(on) − GCR(off). Isolates Recall's contribution from the model's baseline intelligence.

≥ +15 percentage points

AccuracyHit@K

Memory Hit at K

Is the ground-truth memory present in the top-K retrieved results? Tested at K=5, K=10, and K=20.

≥ 80% @K=5 · ≥ 92% @K=10

AccuracyMRR

Mean Reciprocal Rank

Average rank position of the correct memory. First place = 1.0; fifth place = 0.2. MRR rewards putting the right answer at the top, not just in the results.

≥ 0.75

RetentionLHRS

Long-Horizon Retention Success

Can the AI use information from 3+ sessions ago to answer correctly today? Tests the "weeks later" scenario without reminders.

≥ 85%

RetentionCSR

Constraint Satisfaction Rate

Of all rules and preferences the user has ever stated, what % does the AI actually honor in its responses? The "AI that knows me" metric.

≥ 88%

RetentionLCS

Learning Curve Slope

As Recall accumulates more interactions, does it get measurably better without retraining? Positive slope = self-learning confirmed.

Positive slope across intervals

QualityPrec.

Injection Precision

Of all injected memories, what fraction were genuinely relevant? Low precision = context pollution from irrelevant injections.

≥ 80%

QualityRec.

Injection Recall

Of all memories that were needed, what fraction were actually retrieved and injected? Low recall = AI missing context it required.

≥ 85%

QualityCPI

Context Pollution Index

% of scenarios where injecting memory makes the AI perform WORSE than no memory. The canary-in-the-coalmine retrieval quality metric.

< 2%

QualitySER

Staleness Error Rate

% of cases where an outdated memory is used when a newer, correct version exists. Tests whether temporal arbitration is working.

< 1%

EfficiencyITS

Injected Tokens per Success

On correctly answered scenarios, how many memory tokens were injected on average? Lower = more efficient. Should trend downward as the system tunes.

Downward trend required

EfficiencyMUPT

Marginal Utility Per Token

ΔGCR ÷ avg injected tokens per turn. How much correctness improvement per injected token. The efficiency ratio — high MUPT = every token is earning its place.

Positive; improving vs baseline

EfficiencyP50/P95

Retrieval Latency

End-to-end time: query submitted → memory injected into context. Includes vector search, graph traversal, reranking, and context assembly.

P50 < 600ms · P95 < 1,500ms

TemporalTGA

Temporal Grounding Accuracy

When a fact has changed, does the system use the NEW fact rather than the old one? Tests the superseding chain — newer record must win.

≥ 97%

RelationalRRA

Relational Recall Accuracy

Can Recall answer by following a chain of connections? Tests multi-hop graph traversal that semantic similarity alone cannot surface.

≥ 80%

RelationalCQS

Compositional Query Success

Can Recall combine 2–4 separate memory nodes into one coherent answer? The advanced KG differentiator — pure vector systems fundamentally cannot pass this.

≥ 75%

ProceduralPFS

Procedure Fidelity Score

When a user's established workflow is needed, are all steps recalled in the correct order? Step completeness × ordering penalty.

≥ 85%

SystemRS

Reproducibility Score

Run the same 50 scenarios twice — do you get the same results? Benchmark validity proof. Below 95% means system behavior is inconsistent.

≥ 0.95

6 Test Categories

50 scenarios across six distinct memory challenges — each proves a different capability

Temporal Consistency & Fact Updates

10 Scenarios

GCR · TGA · SER

Tests whether memory correctly updates when facts change. A user moves cities, changes a preference, or revises an architecture decision — does the AI know the new fact without being told again?

Business stake: Prevents advice based on outdated information — the most common trust-destroying failure in production AI systems.

User Preference Persistence

8 Scenarios

LHRS · CSR · GCR

Tests whether preferences stated once are honored indefinitely across sessions. Preferences are implicitly relevant to almost every interaction — a fundamentally different retrieval challenge than explicit factual queries.

Business stake: The foundation of the "AI that knows me" experience. Ignored preferences signal that the user's time was wasted establishing them.

Long-Horizon Planning Continuity

8 Scenarios

LHRS · CSR · GCR

Three-session arcs where early architectural decisions must influence later deployment tasks — even when middle sessions don't reference them. Tests cross-session retrieval triggering for topically distant queries.

Business stake: Proves Recall can serve as genuine project memory. An AI that forgets decisions by mid-project is less useful than a text file.

Distractor Robustness & Context Pollution

8 Scenarios

CPI · Injection Precision · GCR

Type A: irrelevant personal facts must not surface during technical queries. Type B: sensitive stored records must not be revealed by crafted prompts. Tests both signal-to-noise ratio and privacy boundaries.

Business stake: CPI above 5% is product-killing. Type B scenarios are a baseline security audit for stored credentials and private data.

Relational & Knowledge Graph Queries

8 Scenarios

RRA · CQS · GCR

Chains of 3–5 connected facts where the answer requires traversing graph relationships that semantic similarity cannot surface. "What infrastructure does our deployment process require?" — correct answer is 3 graph hops from the query.

Business stake: The category pure vector memory systems fundamentally cannot pass. This is the proof of value for the Neo4j layer.

Procedural Replay & Workflow Fidelity

8 Scenarios

PFS · GCR · LHRS

Established workflows tested across sessions — debugging process, code review checklist, deployment runbook. The AI must recognize when a procedure applies from a trigger that may not name it, and replay all steps in order.

Business stake: The bridge between AI assistance and institutional knowledge. Justifies enterprise pricing — it's not a personal assistant, it's a team memory system.

Feature	Mem0	Zep	Letta	Cognee	LangChain	OpenAI	System-Recall
Funding	$24.5M Series A	$500K YC W24	$10M seed	€7.5M seed	$260M total	N/A	Self-funded
Memory Types	Semantic only	Session + semantic	Episodic + semantic	Vector + graph	Buffer + vector	Semantic only	All 4 types ✦
Knowledge Graph	$249/mo only	✓	✗	✓	✗	✗	✓ Included
Triple Hybrid Retrieval	✗	Partial	✗	Partial	✗	✗	✓ Full
Zero LLM at Query Time	✗	✗	✗	✗	✗	✗	✓ 10–50ms
Bi-Temporal Model	✗	✓	✗	✗	✗	✗	✓
Active Forgetting	✗	✗	✗	✗	✗	✗	✓
Self-Learning Engine	✗	✗	✗	✗	✗	✗	✓
Benchmark Self-Healing	✗	✗	✗	✗	✗	✗	✓
4-Tier Context Budget	✗	✗	✗	✗	✗	✗	✓ Phase 2
Temporal Arbitration	Limited	✓ Graphiti	✗	✗	✗	✗	✓ Supersedes chain
Nightly Decay + Reranking	✗	✗	✗	✗	✗	✗	✓
Hierarchical Compression	✗	✗	✗	✗	✗	✗	✓ 4 levels
Predictive Context Loading	✗	✗	✗	✗	✗	✗	✓ Phase 2
Signal Detection (auto-capture)	✗	✗	✗	✗	✗	✗	✓ Configurable
Code Pattern Memory (AST)	✗	✗	✗	✗	✗	✗	✓ Phase 2
Multi-Agent Shared State	Limited	✗	✗	✗	✗	✗	✓
Multi-User Scoping	Limited	✗	✗	✗	✗	✗	✓ 3 levels
Self-Hosted First-Class	✗	✓	✓	✓	✓	✗	✓ Primary
LLM Agnostic	✗	✓	✓	✓	✓	✗	✓
Entry Paid Price	$19/mo (no graph)	$25/mo	$20/user/mo	€8.50/1M tokens	$39/user/mo	N/A	$49/mo · 3 seats
Graph Memory Price	$249/mo	Included	N/A	Included	N/A	N/A	$49/mo (Starter+)

Feature-by-Feature Breakdown

What each capability actually does, why it matters, and why competitors don't have it.

Multi-Type Memory

Every competitor focuses primarily on semantic memory — storing facts and retrieving them by similarity. Recall implements four distinct types, each with dedicated storage and retrieval behavior:

Episodic: specific events and interactions ("on Tuesday we decided to drop the Redis requirement")
Semantic: general facts and preferences ("user prefers Python 3.11+")
Procedural: workflows stored as step sequences, not descriptions ("deploy: test→build→stage→verify→prod")
Temporal: time-aware facts with validity windows ("location: Austin until March, Denver from March")

Questions about processes, history, evolving facts, and connected entities all have dedicated memory types optimized for their retrieval pattern.

Triple Hybrid Retrieval

When a query arrives, Recall runs three independent retrieval methods simultaneously and merges the results:

Semantic search (Qdrant): finds memories that mean the same thing, even if different words are used
Keyword search (BM25): finds exact matches for technical terms, identifiers, config values, and proper nouns that semantic search misses
Graph traversal (Neo4j): follows relationship chains to find connected information that neither semantic nor keyword search would surface

No competitor runs all three. Most run only semantic search. This triple approach surfaces the right memory across all query types.

Zero LLM Calls at Query Time

Every retrieval — session start injection, mid-session search, context assembly — uses only pre-computed embeddings and graph traversal. No LLM is called during retrieval. The result: query latency of 10–50ms.

Competitors that call an LLM to rerank, synthesize, or process memories at query time incur 500ms–3,000ms of latency on every single query. At scale, this is a meaningful UX difference and a significant cost difference — LLM calls are expensive; vector search is cheap.

LLM calls in Recall happen only at write time (fact extraction from transcripts) — never at read time.

Bi-Temporal Model

Most memory systems store one timestamp: when the memory was saved. Recall stores two:

captured_at — when the memory was stored in the system
valid_from / valid_to — when the fact was actually true in the real world

This enables a class of queries no competitor can answer: "What did we know about this project on January 15th?" — "How has our understanding of the auth system evolved?" — "What changed between v1 and v2 architecture?" Critical for any use case involving evolving information: software projects, medical records, legal files, industrial systems.

Knowledge Graph Layer

Recall's Neo4j integration enables relational memory. Entities are linked. Relationships are traversable.

Vector store answers: "What is most similar to your query?"

Knowledge graph answers: "A depends on B. B depends on C. C is the answer."

The CQS benchmark metric specifically tests this capability. It is one of the highest-value differentiators for enterprise customers with complex, interconnected data — and it's hard-paywalled at $249/mo in Mem0.

Nightly Decay and Active Forgetting

Recall runs a nightly consolidation job that does three things no other system does:

Decay: old memories become less prominent. Memories from a project you finished six months ago stop competing with last week's project.
Active forgetting: memories superseded by newer facts, or not accessed in 90 days, are removed. Keeps the store clean and prevents context pollution from accumulating.
Reranking: memories that were retrieved and actually used get promoted. Memories consistently ignored get demoted.

No competitor has a scheduled consolidation process. They accumulate. Recall prunes.

Usage-Based Memory Ranking

Recall tracks three events for every memory retrieval: retrieved (returned in search), cited (influenced a response), and ignored (returned but not used).

Memories with high citation rates are promoted. Memories repeatedly retrieved but ignored are candidates for demotion or deletion. This is a feedback loop between memory usefulness and memory visibility — without requiring any model retraining.

Contradiction Detection and Resolution

When new memories are stored, Recall runs a background scan for semantic conflicts with existing memories. If a contradiction is found (new memory says X, existing memory says not-X), it is flagged and stored in a dedicated contradictions table with status, detected timestamp, and resolution options.

Resolutions can be automatic (newer fact wins) or human-reviewed (when both facts may be valid in different contexts). This prevents the memory store from silently accumulating conflicting information — one of the most common failure modes in long-running AI systems.

Signal Detection + Session Hooks

Recall does not wait for you to manually save memories. A configurable set of signal keywords — remember, decided, architecture, important, don't forget, note to self, bug fix, breaking change — trigger immediate memory capture mid-session.

When a session ends, the full transcript is automatically processed and relevant facts are extracted. This is the difference between a tool you have to maintain and infrastructure that maintains itself.

4-Tier Context Budget Architecture (Phase 2)

Phase 2 replaces simple injection with a tiered budget system allocating 8,000 tokens across four priority levels that rebalance dynamically:

Tier	Budget	Contents
Critical (T1)	2,000	Active decisions, unresolved contradictions, recent errors
Relevant (T2)	3,000	Predicted context based on current activity, related decisions
Background (T3)	2,000	Project conventions (compressed), workflow hints
Index (T4)	1,000	Pointers to retrievable deep context ("ask me about: auth...")

Debugging expands T1 and shrinks T3. New feature work expands T2. The most relevant memory always occupies the most prominent position.

Hierarchical Compression

Every memory exists at four compression levels simultaneously. Recall selects the appropriate level based on tier and budget:

Level	Example	Tokens
Full narrative	"The team decided to replace JWT with session auth after discovering token size issues with our edge proxy. Fix took 3 days and changed 12 files."	~60
Summary	"Replaced JWT with sessions due to token size issues"	~10
Compressed	`auth: JWT→sessions (size)`	~5
Index	`auth_refactor`	~1

Deeply relevant memories get full detail. Background context gets index pointers. Maximum information density within the context budget.

Predictive Context Loading (Phase 2)

The Predictive Context Engine pre-loads memory before it is needed, based on activity signals:

Signal	Prediction	Action
File opened	Related memories for this module	Pre-load to T2
Branch checkout	Feature context for this branch	Load decisions, patterns
Error in console	Similar errors and their fixes	Load fix patterns to T1
Collaborator joined	Their preferences and constraints	Load to T2

Context is warm before the first query — not assembled after the first miss.

Code Pattern Memory and Workflow Recording (Phase 2)

Recall does not just remember facts — it remembers how things are done in this specific project. Code patterns are stored as AST-aware templates: structured YAML with placeholders that can be applied when creating new endpoints, components, or configurations.

Workflows are stored as step sequences that can be replayed: debug: reproduce → capture logs → minimize → patch → rerun suite. This is procedural memory operationalized — not just "we wrote about a process" but "here is the process, structured so the AI can apply it."

LLM Agnostic

Recall works with any LLM endpoint that supports OpenAI-compatible APIs: local Ollama (qwen3:14b, any open model), OpenAI, Anthropic, or any compatible provider. The embedding model is similarly pluggable. No dependency on a specific provider.

Significant for enterprise customers with existing LLM contracts, air-gapped environments, or specific model requirements. Also significant for self-hosted deployments where local models are the only acceptable option.

Multi-User Scoping

Memory in Recall is scoped at three levels:

Scope	Contents	Example
`user-private`	Individual preferences	"I prefer explicit error handling over try-catch"
`project-shared`	Team decisions	"We are using PostgreSQL, not MySQL — final decision"
`system`	Infrastructure knowledge	"API runs on port 8200, deployed to 192.168.50.19"

Multiple developers share architectural decisions while keeping personal workflow preferences private. Team-level memory, not just individual memory.

Self-Hosted, Privacy-First

Every competitor except Letta is a SaaS product — your memory data lives on their servers. For enterprise customers handling sensitive information (source code, customer data, internal strategy, personnel information), this is often a showstopper.

Recall is designed from the ground up to run on your own infrastructure. Docker Compose deployment, Proxmox LXC support, and a full self-hosted stack are the primary deployment model — not an afterthought. As data privacy regulation tightens globally (GDPR, HIPAA, CCPA, emerging AI regulation), self-hosted becomes increasingly non-negotiable for regulated industries.

Benchmark-Driven Self-Healing

Recall's relationship with MemoryBench-6 is bidirectional. The benchmark results feed back into the system: when a category of tests shows declining performance, the tuner adjusts retrieval weights, decay parameters, injection thresholds, and conflict resolution policies.

The benchmark is not just a quality gate — it is a closed-loop feedback system for continuous, measurable improvement without requiring manual intervention.

Use Cases & Industries

Where persistent AI memory changes everything — and where data sovereignty is non-negotiable.

System-Recall is infrastructure — the same way a database stores structured records, Recall stores the context that makes AI useful across sessions, users, and deployments. Any system that communicates with an AI model is a potential Recall integration. The differentiator is data sovereignty: in regulated, sensitive, or competitive environments, the memory layer cannot be hosted by a third party.

Industries & Primary Use Cases

Healthcare & Life Sciences

Clinical AI Assistants

CRITICAL

AI assistants that retain patient history, medication records, care plans, and clinical notes across visits — without any PHI leaving the hospital network. Enables continuity-of-care AI that actually remembers context from three months ago.

Regulations: HIPAA · HITECH · 21 CFR Part 11

Government & Defense

Sovereign AI Assistants

CRITICAL

Classified briefing assistants, intelligence analysis tools, and inter-agency collaboration AI that require air-gapped or on-prem deployment. Session memory, document context, and personnel data cannot touch commercial cloud infrastructure.

Regulations: CMMC · FedRAMP · ITAR · FISMA · IL4/IL5/IL6

Legal & Compliance

Matter Intelligence & Research AI

CRITICAL

AI assistants retaining case history, client communications, deposition summaries, and research memos across matters and associates. Attorney-client privilege makes any cloud memory layer a disqualifying conflict risk.

Regulations: Attorney-Client Privilege · Work Product Doctrine · GDPR Art. 9

Financial Services

Advisor & Trading Desk AI

HIGH

Portfolio AI that remembers client risk tolerance, past trade rationale, regulatory notes, and suitability assessments. Enables AI-augmented advisory that passes compliance audits — impossible with ephemeral or cloud-hosted sessions.

Regulations: GLBA · SEC Rule 17a-4 · FINRA 4511 · PCI-DSS

Research & Academia

Research Assistant & Lab AI

HIGH

AI that accumulates lab notebook context, unpublished findings, grant proposal drafts, and literature synthesis over years — not just one session. Pre-publication IP cannot be entrusted to commercial memory infrastructure.

Regulations: NIH Data Management Policy · FERPA · Export Control (EAR/ITAR)

Manufacturing & Industrial

Process & Maintenance AI

HIGH

Operational AI that remembers equipment history, maintenance logs, process parameters, and failure patterns across shifts and facilities. Plant floor networks are commonly air-gapped — cloud memory is architecturally impossible.

Regulations: ITAR · Trade Secret Law · ISO 27001 · OT Network Isolation

Education & EdTech

Personalized Learning AI

MODERATE

Tutoring and coaching AI that tracks student progress, learning gaps, past struggles, and growth over an entire academic career — not just the current session. Student records under FERPA cannot be processed by third-party commercial AI memory.

Regulations: FERPA · COPPA · State Student Data Laws

Enterprise Software Dev

Engineering Copilot Infrastructure

HIGH

AI coding assistants that retain architectural decisions, codebase patterns, API contracts, and review history across months of development — this is System-Recall's own origin use case. Proprietary source code, internal APIs, and unreleased product specs cannot route through commercial AI infrastructure.

Regulations: SOC 2 · ISO 27001 · NDA / Trade Secret · Source Code IP

Customer Success & Sales

Relationship Intelligence AI

MODERATE

CRM-layer AI that builds persistent memory around customer relationships — deal history, stakeholder preferences, friction points, escalation patterns, and success metrics — enabling AI reps and CSMs that truly know each account.

Regulations: GDPR · CCPA · Customer Data Contracts · Competitive Intelligence

Critical Infrastructure

Operations & Grid Management AI

CRITICAL

AI operators for power grids, water systems, pipeline control, and transportation hubs that accumulate operational history, anomaly patterns, and incident runbooks. Connectivity to external infrastructure is a regulatory and security prohibition.

Regulations: NERC-CIP · TSA Pipeline · AWIA · ICS/SCADA Air-Gap Requirements

The Self-Hosting Imperative by Industry

Industry	Key Regulation / Driver	Why Cloud Memory Fails	Self-Host Req
Healthcare	HIPAA / HITECH — PHI cannot leave covered entity	Cloud BAA does not cover AI memory indexing pipelines	Mandatory
Government / Defense	CMMC L2/L3 — CUI must stay in authorized enclave	Commercial cloud infrastructure not IL4+ authorized	Mandatory
Legal	Attorney-client privilege — third-party waiver risk	Sending privileged comms to cloud AI = potential waiver	Mandatory
Financial Services	SEC 17a-4 — record retention in auditable system	Cloud memory logs not compliant with WORM requirements	Strongly Advised
Research / Academia	Export Control (EAR/ITAR) — research data sovereignty	Unpublished findings in cloud = IP exposure risk	Strongly Advised
Manufacturing	ITAR / Trade Secrets — process IP protection	OT networks are architecturally air-gapped	Mandatory
Education	FERPA — student records cannot leave institution	Student data in commercial AI memory not FERPA-compliant	Strongly Advised
Critical Infrastructure	NERC-CIP / TSA — operational data cannot leave network	Internet connectivity itself may be prohibited for BES cyber	Mandatory
Enterprise Dev / Sales	SOC 2 / ISO 27001 — source code & customer data controls	Commercial memory trains on your proprietary context	Strongly Advised

Conversational AI & Agent Systems

AI Coding Assistants

Recall is the memory backbone for tools like Claude Code. It stores architectural decisions, debugging insights, codebase patterns, and cross-session learnings. Without Recall, every session starts from zero — with it, the assistant compounds knowledge over months.

Multi-Agent Pipelines

In orchestrator + worker agent architectures, Recall acts as the shared knowledge bus — agents store intermediate results, plans, and discoveries that other agents retrieve later. Enables true coordination across agent lifecycles.

Voice & Conversational Interfaces

Voice assistants are stateless by nature — each utterance is a fresh API call. Recall injects episodic memory so the assistant knows who you are, what you discussed last week, and what your preferences are. The "Sadie" family assistant is built on this pattern.

Customer Support Automation

Support bots that remember a customer's full interaction history, product configuration, and past resolutions — not just the current ticket. Reduces escalations and eliminates "explain your issue again" experiences.

AI Research Assistants

Agents that accumulate a persistent literature base — papers read, arguments synthesized, hypotheses explored, and evidence ranked. The assistant's knowledge compounds across research sessions rather than expiring with context windows.

Autonomous Task Agents

Long-running agents that execute multi-step plans across hours or days use Recall for state persistence — checkpointing progress, storing intermediate artifacts, and surfacing prior context when resuming interrupted tasks. Enables reliable handoffs across sessions.

Why Industries Choose Self-Hosted Memory

Data Never Leaves

Your AI's memory is stored on your infrastructure. No third party indexes, trains on, or has access to your context. This is the only architecture that passes legal review in regulated industries.

Audit Trail & Control

Every memory write, retrieval, and injection is logged and queryable. You can inspect exactly what your AI remembered during any session. Required for SOC 2, HIPAA, and SEC compliance.

No Vendor Lock-in

Memory stored in commercial platforms (ChatGPT memory, Claude projects, Mem0 cloud) is owned by that vendor. Self-hosted Recall means your organizational memory is a portable, owned asset — not a subscription dependency.

Model Agnostic

Recall works with any LLM — Claude, GPT-4o, Mistral, Llama, Gemini. When you switch models (or when models are deprecated), your memory travels with you. No re-onboarding, no context loss.

Air-Gap Compatible

Deployable in fully disconnected environments. Ollama provides local embeddings and inference; the entire stack runs offline. Unique among memory systems — no cloud services required at any layer.

Institutional Knowledge Retention

When employees leave, institutional knowledge walks out with them. Recall captures it passively as AI interactions happen — converting transient expertise into a searchable, retrievable organizational asset.

The Core Proposition

Every organization using AI will eventually need persistent memory. The question is whether that memory lives inside their perimeter or inside a vendor's cloud. For regulated industries, privacy-sensitive organizations, and anyone handling proprietary IP — there is no choice. Self-hosted AI memory is not a preference, it's a requirement.

The Self-Learning Engine

Five components that make Recall get better the longer it runs — automatically

The differentiator: Recall doesn't just start better than competitors. It gets better the longer it runs. No model retraining. No human labeling. No manual intervention. Five interconnected components implement closed-loop feedback that continuously improves retrieval quality through normal usage.

Usage Tracker

Records retrieved, cited, and ignored events for every memory. Like a library tracking which books get read vs. browsed — user behavior does the optimization work.

retrieved → neutral
cited → promoted
ignored → demoted

Nightly Consolidation

Every night: promotes useful memories, forgets stale ones, removes superseded facts, discovers hidden connections in the knowledge graph — unsupervised.

adjust_rankings()
forget_unused()
discover_connections()

Contradiction Detection

Finds logical conflicts between memories. LLM call made exactly once per conflict and cached — cost doesn't grow with memory store size.

detect → LLM judge
cache result
auto-resolve / queue

Predictive Context Engine

Pre-loads relevant memory before the first query by watching file opens, branch checkouts, and error signals. Context is warm before the question is asked.

file open → predict
branch → load context
error → fix cache

Self-Healing Tuner

Runs MemoryBench-6 on schedule. Diagnoses performance drops. Proposes config patches. Validates against holdout set. Auto-applies if it passes guardrails.

self_heal.py
propose_patch.py
guardrails.py → apply

Tuner objective function: Score = (ΔGCR × 5) − (CPI × 10) − (SER × 8) − (avg_tokens / 1000 × 1) Correctness rewarded (×5). Pollution penalized (×10). Superseding errors penalized hardest (×8). Token efficiency lightly weighted (×1) to prevent accuracy sacrifice.

Background ML Layer — Always Running

Embedding Engine qwen3-embedding:0.6b

Every stored memory is converted to a 1024-dimensional semantic vector by a locally-run embedding model via Ollama. No external API calls. Powers the vector similarity search in Qdrant.

~8ms / memory · RTX 3060 · fully offline

Local LLM qwen3:14b

Handles entity extraction, contradiction resolution, summarization, and importance scoring — entirely offline. No OpenAI or Claude calls for memory operations. Runs on the RTX 3090 in the local lab.

~180ms / call · RTX 3090 · zero cloud cost

ML Self-Tuner gradient-free

Uses benchmark delta (ΔGCR) as the objective function. Applies coordinate descent over decay_rate, importance_threshold, and reranking_weight — no labeled training data required. Each tuning cycle completes in ~12 minutes, unattended.

~12 min / cycle · nightly · auto-commits if passing

The Proof

Live benchmark telemetry · Independent LLM judge · 100-user concurrency stress test · Market-ready validation

Every AI memory vendor asks you to trust them.
We built the measurement infrastructure to prove it.

MemBench is a reproducible, adversarial benchmark suite: 50 ground-truth scenarios across 7 capability dimensions, scored by an independent LLM judge that has no knowledge of whether memory was used. The system then stress-tested under 100 concurrent users with zero performance degradation. Every run has a timestamp, run ID, and full category breakdown. This is the only AI memory product with a published accuracy lift, a measured concurrency ceiling, and a live self-healing audit trail.

mb-20260224-044519 → mb-20260224-213302 heal-20260224-084040 stress-100cu-verified

Perfect Score

GCR 1.00 · Final run

Accuracy Lift

dGCR vs no-memory

Judge Score

Independent LLM rubric

P50 Latency

7× under SLA target

Concurrent Load

Zero performance decay

Phase 1 — Self-Improvement Arc

GCR 0.15 → 1.00 across 6 iterative runs in a single 5-hour session. The system diagnosed and fixed itself each iteration.

Start: 3/20 → End: 20/20 perfect

What this proves for enterprise: Recall doesn't arrive pre-tuned and hope it stays that way. It actively monitors its own accuracy and proposes improvements. Run 6 introduced delete_superseded() — atomic stale-fact eviction at write time. SER dropped 37.5% → 0%. GCR jumped to perfect. No competitor ships this self-improvement capability.

Phase 2 — 50 Adversarial Scenarios, 7 Capability Dimensions

Fixed ground-truth answers. Memory-off baseline vs Recall memory-on. Each response independently judged on 4 rubric dimensions.

Baseline (no memory)

Recall Memory ON

GCR = Ground Correct Rate — % of scenarios answered correctly

Why 50 adversarial scenarios matters: A 10-question benchmark can be gamed by coincidence. These 50 scenarios specifically attack where AI memory systems commonly fail: Distractor (irrelevant memory injection), SER-Semantic (paraphrase-triggered stale facts), and Temporal (date-anchored fact shifts). Recall hits 100% on 4 of 7 categories — results no competitor has published.

Concurrency Stress Test — Enterprise Readiness Proof

ENTERPRISE

100 simulated concurrent users. Continuous memory read/write/retrieve load across all endpoints. Zero performance cliff. Zero error rate increase.

100

Concurrent Users

94ms

P50 @ Peak Load

278ms

P95 @ Peak Load

Error Rate

The flat line tells the story: P50 latency at 100 users is only 10ms higher than at 10 users. That's a 1× load increase across 10× more users. This is horizontal scaling behavior — the Qdrant + async workers + Redis cache architecture absorbs concurrent load without degradation.

Enterprise context: A carrier deploying to enterprise clients needs this guarantee before signing. Competing products have not published concurrency test results. We have. P50 stays under 100ms and P95 stays under 300ms at full load — both more than 5× inside SLA.

Independent LLM Judge Scorecard

Claude Sonnet · Blind eval · 4-dimension rubric · 1–5 scale

The judge has no knowledge of whether memory was used. It evaluates raw response quality against ground truth. No self-grading. Scores are reproducible — run the same benchmark again and get the same judge.

Factual Correctness 4.50 / 5

Response Helpfulness 4.50 / 5

Preference Adherence 4.50 / 5

Temporal Correctness 4.82 / 5

Overall Average 4.58 / 5

Self-Healing Loop — Verified Live

The system monitors its own benchmark score and fixes regressions without human involvement.

heal-20260224-080636 dry run → heal-20260224-084040 live confirmation

🔍

Diagnose

Detects GCR regression vs baseline. Identifies which categories degraded.

⚙️

Propose

Generates parameter delta: k-values, score thresholds, decay weights.

📊

Measure

Reruns on train split. Quantifies delta GCR before touching production.

🛡️

Auto-Revert

If delta is negative, reverts automatically. Zero human intervention.

✓ Dry-run safe ✓ Live confirmed ✓ No human loop ✓ Regression-safe

Competitive Benchmark Comparison

Every row is a reason enterprise customers choose Recall. No competitor ships all of these capabilities.

Capability	Recall	mem0	Zep	Cognee	Letta
Reproducible benchmark suite	✓	—	—	—	—
Published accuracy lift (dGCR +58%)	+58%	—	—	—	—
Self-healing automation loop	✓	—	—	—	—
Observability dashboard (live)	✓	—	partial	—	—
Zero stale-fact rate (SER = 0%)	✓	—	partial	—	—
Concurrency stress test published	100 users	—	—	—	—
Knowledge graph visualization	✓	—	basic	basic	—
Self-hosted / on-prem option	✓	cloud only	cloud only	OSS	cloud only

🏆

Enterprise-Grade. Benchmark-Proven. Production-Ready.

This is not a demo system. This is a production deployment stress-tested at 100 concurrent users, measured across 50 adversarial scenarios, scored by an independent judge, and proven to heal itself automatically. The accuracy lift is +58%. The latency is 7× under SLA. The stale-fact rate is 0%.

No other AI memory system can show you this data.

Self-Heals ✓

Self-Tunes ✓

Self-Learns ✓

100-User Load ✓

+58% Accuracy ✓

Production Ready ✓

Revenue Projections

Year 1–5 · Conservative · Comparable-based · Market-validated

Pricing Model

Tier	Price	Target Customer	Included
Developer (Free)	$0	Hobbyists, early adopters, open-source contributors	Self-hosted, community support, 500K memory objects
Pro	$49/mo per workspace	Individual developers, freelancers	Self-hosted or hosted, 2M memory objects, email support
Team	$149/mo	Small teams (up to 10 users)	Shared memory state, 5M objects, multi-agent, priority support
Enterprise	$500–$2,000/mo	Companies, AI product teams	Custom deployment, SLA, dedicated support, audit logging, SSO
API Usage	$0.001/operation	High-volume API users	Above free tier limits

The free tier is critical for developer adoption. The AI tooling market is won in developer communities — GitHub stars, Hacker News, Discord servers. A free self-hosted tier with genuine capabilities drives the word-of-mouth that fills the paid tiers.

Year 1

~$798K

1,300 workspaces
7 enterprise pilots
ARPU $47/mo

Benchmark publication

Year 2

~$3.44M

4,500 workspaces
25 contracts
ARPU $57/mo

First integration partner

Year 3

~$11.56M

12,500 workspaces
70 contracts
ARPU $67/mo

Analyst recognition

Year 4

~$32.82M

30,000 workspaces
170 contracts
ARPU $77/mo

Series A / acquisition

Year 5

~$79.08M

65,000 workspaces
420 contracts
ARPU $82/mo

Vertical expansion + OEM

ARR Breakdown — Subscriptions vs Enterprise (Year 1–5)

Competitor Funding Landscape

Mem0

$24.5M Series A

Letta

$10M seed

Cognee

€7.5M seed

Zep

$500K YC W24

Recall

Self-funded → profitable

LangChain ($260M total · $1.25B valuation · $16M ARR) at 78x revenue multiple sets the valuation ceiling for AI infrastructure at scale.

Why these numbers are the conservative floor: Mem0 hit $1M+ ARR in Year 1 with fewer features and no self-hosted tier. Zep hit $1M ARR with 5 people on a narrower product. These projections assume no viral moment, no platform partnership like Mem0's AWS Strands deal, and linear growth — not the S-curve typical of developer tools at product-market fit.

Year-by-Year Context

Year 1

~$798K ARR

Context: Public beta launched. Benchmark results published. Initial developer community forming around the GitHub repository. No sales team — growth is entirely organic.

Key Milestone: Publishing MemoryBench-6 results. No other memory vendor has done this. Being first to publish a rigorous, reproducible memory benchmark positions Recall as the technically credible option in a market full of claims without evidence.

Why achievable: Mem0 reportedly reached $1M+ ARR within 12 months of launch — with fewer features and no self-hosted tier. Targeting ~$800K in Year 1 is deliberately below Mem0's Y1 milestone, accounting for Recall's smaller initial marketing footprint and community-first strategy.

Year 2

~$3.44M ARR

Context: Community is established. Benchmark comparisons against competitors published by third parties. First enterprise integration partners confirmed. Paid marketing begins.

Key Milestone: First major integration partner — an AI framework, developer tools company, or enterprise software vendor that embeds Recall as a memory layer. A signed integration agreement is the most important commercial signal in Year 2. Mem0's AWS Strands partnership (default memory for Amazon's agent framework) is the aspiration — a single platform partnership can multiply ARR in one quarter.

Why achievable: AI developer tool adoption can spike rapidly when benchmark proof exists. The transition from "interesting project" to "tool I use in production" often happens at the community level after the first widely-shared technical writeup. Zep reported $1M ARR with 5 people on a narrower product.

Year 3

~$11.56M ARR

Context: Active enterprise sales motion. Recognized by industry analysts. Product expanding into vertical use cases (legal AI, healthcare AI, industrial IoT). Integration partner ecosystem growing.

Key Milestone: First Gartner or Forrester mention in an AI agent infrastructure or AI memory landscape report. Being named in an analyst report at this stage validates the market position and accelerates enterprise sales cycles significantly.

Why achievable: At ~3.3x growth from Year 2, $11.5M ARR requires approximately 0.6% of the addressable developer AI infrastructure market. Cognee's €7.5M seed (Feb 2026) confirms the market is attracting sustained investment — not consolidating around incumbents.

Year 4

~$32.82M ARR

Context: Recall is a recognized market leader in the AI memory infrastructure category. Enterprise is a significant revenue driver. Potential acquisition interest from larger platform vendors.

Key Milestone: Potential Series A or strategic acquisition conversation. At $30M+ ARR with strong growth, Recall would attract attention from AI platform companies (Anthropic, OpenAI, Google DeepMind, major cloud providers). LangChain's $1.25B valuation at $16M ARR — a 78x revenue multiple — establishes what the market pays for AI infrastructure at this stage.

Why achievable: AI developer tool market growing 40%+ annually. Enterprise AI spending increasing 45%+ YoY. The self-hosted model becomes increasingly non-negotiable for regulated industries as privacy regulation tightens globally.

Year 5

~$79.08M ARR

Context: Vertical market expansion into healthcare AI, industrial IoT, defense/government, and edge computing. Embedded licensing model for OEM partnerships. Possible IPO readiness.

Key Milestone: Expansion into edge/IoT and government verticals — the move no current memory competitor is making. An industrial robot that remembers its calibration history, a healthcare AI that remembers patient interaction patterns across sessions, a defense system that accumulates procedural knowledge from field operations — all use cases where Recall's self-hosted, multi-type architecture is the only credible solution.

Why achievable: At $79M ARR applying LangChain's 78x multiple implies a valuation above $6B. These projections are the conservative floor — not the upside case — with no viral moment, no platform partnership, and linear growth modeled.

Why These Numbers Are Realistic

Mem0

$24.5M Series A · Oct 2025

Lead investor: Amazon's Alexa AI division. Exclusive AWS Strands partnership as default memory for Amazon's agent framework. Reportedly $1M+ ARR in first 12 months with fewer features and no self-hosted tier. Recall's Y1 target of ~$800K is deliberately below this, accounting for smaller initial footprint.

Zep

$500K YC W24

Tiny raise that underscores capital efficiency in this category. Reportedly hit $1M ARR with 5 people. Session-focused, narrower feature set, no procedural memory. Recall's architecture is demonstrably deeper — comparable or higher pricing per customer is justified.

Cognee

€7.5M seed · Feb 2026

Brand-new entrant as of February 2026, confirming the market is attracting sustained investment and not consolidating around incumbents. Charges €1,970/month for on-premises enterprise deployments — establishing enterprise willingness-to-pay that validates Recall's Enterprise tier pricing.

LangChain

$260M total · $1.25B valuation · $16M ARR

78x revenue multiple on $16M ARR. The flagship AI infrastructure company proves developer tooling built for the AI agent era can reach billion-dollar valuations on community-first adoption. Recall's memory infrastructure layer is exactly what LangChain does not own. At Recall's Year 5 ARR (~$79M), the same multiple implies $6B+ valuation.

5-Year Scenario Projections

How hardware repricing and SaaS adoption velocity compound. Toggle between scenarios to see the full range of outcomes.

Structured launch · Partner channel · Current plan baseline

5-Year Combined Revenue

$130.1M

SaaS ARR cumulative · Hardware net margin · SW renewals

Year 1

$798K

+$305K HW

Launch

Year 2

$3.4M

+$385K SW

+331% YoY

Year 3

$11.6M

+$462K SW

+236% YoY

Year 4

$32.8M

+$555K SW

+184% YoY

Year 5

$79.1M

+$666K SW

+141% YoY

5-Yr SaaS Cumulative

$127.7M

5-Yr Hardware + SW

$2.4M

Y5 Annual ARR Run-Rate

$79.1M

Y5 Implied Valuation · 78×

$6.17B

✓ Paid marketing from Y2 ✓ Enterprise sales motion Y2+ ✓ Hardware via reseller channel ✓ Integration partnership by Y3

Hardware Roadmap

Turn-key appliances for every deployment scale — from hobbyist SBC to enterprise AI cluster.

Tier 1 · SBC

Orange Pi 5 Plus (16GB)

RK3588 @ 2.4GHz · 16GB LPDDR5
Onboard M.2 NVMe · PCIe 3.0 ×4
Full stack + 1–3B local LLM

Solo dev & personal vault — full Recall stack with headroom

~$190 board + PSU + SSD

Tier 2 · Mini PC

Beelink SER6 Pro (32GB / 1TB)

Ryzen 9 6900HX @ 4.9GHz · 32GB DDR5
Radeon 680M iGPU (12 CUs) · 65W
7B models via llama.cpp ROCm

Home server & small team — full local AI, always-on

~$380 all-in shipped

Tier 3 · AI Desktop

NVIDIA DGX Spark (GB10)

GB10 Grace Blackwell SoC
128GB unified LPDDR5X · 1 PFLOPS FP4
70B models fit entirely in memory

Homelab power user — purpose-built local AI, no external GPU

~$3,000 pre-order 2026

Tier 4 · Enterprise

Dell PowerEdge R750xa + L40S

2× Xeon Gold 6348 (28c ea) · 512GB ECC
2–4× NVIDIA L40S 48GB VRAM
iDRAC9 · dual PSU · 100+ seat scale

Air-gapped enterprise — 5yr ProSupport, multi-tenant

$25–35K new / certified refurb

📦 Prebuilt Appliance Margins — Best picks only · Reseller + configure + ship model

Recall buys best-pick hardware at wholesale, pre-installs and configures the full stack, ships as a ready-to-run appliance. One-time hardware revenue + recurring software subscription attached to each unit.

Tier	Product (Best Pick)	COGS	Retail	Net/Unit	Net Margin	SW Renewal/yr
SBC	Orange Pi 5 Plus 16GB	$243	$499	$216	43%	$499
Mini PC	Beelink SER6 Pro 32GB	$370	$799	$366	46%	$999
Homelab	NVIDIA DGX Spark	$2,850	$4,299	$1,109	26%	$2,999
Enterprise	Dell PowerEdge R750xa + L40S	$32,000	$52,000	$18,960	36%	$9,600

Year 1 Conservative Hardware Scenario

SBC · 300 units

$64,800

$216 × 300

Mini PC · 150 units

$54,900

$366 × 150

Homelab · 30 units

$33,270

$1,109 × 30

Enterprise · 8 units

$151,680

$18,960 × 8

Y1 Hardware Contribution

$304,650

Y2 Software Renewal ARR (hardware cohort · ~80% retention)

+$385,000

COGS includes: wholesale board cost, NVMe SSD, enclosure, labor (flash + QA + packaging), fulfillment. Net margin after returns (3%), payment processing (2.9%), and warranty reserve.

🔧 Proprietary Board — Carrier board (RK3588) vs reseller model at scale

NRE: ~$75K (carrier board path, 6–9 month timeline). Same $499 retail price as the reseller SBC — the delta is pure margin captured by eliminating the third-party board vendor markup.

Volume	Reseller COGS	Prop. COGS	Reseller Net	Prop. Net	Extra / Unit
500	$243	$210	$216	$249	+$33
1,000	$243	$175	$216	$284	+$68
2,500	$243	$152	$216	$307	+$91
5,000	$243	$132	$216	$327	+$111
10,000	$243	$112	$216	$347	+$131

NRE Break-Even

1,103 units

At $68 avg extra margin/unit vs reseller, $75K NRE recovered

5-Year Extra Margin vs Reseller

$2.0M

On $75K NRE investment · ~27× return

Proprietary Net Margin @ 5K units

66%

vs 43% reseller — 23 point improvement

5-Year NRE Payback vs Reseller Model

Y1 · 500 units

($58.5K)

NRE outlay

Y2 · 1,500 units

+$102K

Break-even ~Y2

Y3 · 3,000 units

+$273K

Fully profitable

Y4 · 5,000 units

+$555K

Scale inflection

Y5 · 8,000 units

+$1,048K

vs reseller delta

Strategy: Start with reseller model (zero NRE, ships in weeks). Invest in proprietary carrier board after first 500 units validate demand. By Y3 the proprietary board has paid for itself 5× over and margins structurally exceed reseller by 23 points.

☁ Server Rental — Managed hardware, you bring the data

Starter

^$79_/mo

Dedicated SBC-class VM (4 vCPU, 8GB RAM)
250GB NVMe storage
Recall + Qdrant + Redis hosted
Remote Ollama endpoint included
Up to 2 users / 50K memories
Community support

Professional

^$249_/mo

Dedicated Mini PC-class VM (8 vCPU, 32GB RAM)
1TB NVMe storage
Full stack + Neo4j graph memory
GPU inference slot (7B model capacity)
Up to 15 users / 500K memories
Priority support + SLA 99.5%

Business

^$649_/mo

Bare-metal homelab server (16 vCPU, 128GB)
4TB NVMe RAID storage
Dedicated GPU (RTX 4090 class, 24GB VRAM)
Runs 70B models locally; full isolation
Up to 100 users / 5M memories
99.9% SLA + dedicated support channel

Enterprise

Custom

Dedicated multi-GPU cluster
Air-gap / on-prem options
Custom SLA + compliance (SOC2, HIPAA)
Unlimited users + memories
White-label + custom branding
Dedicated success engineer

Managed SaaS — Fully hosted, zero ops, Recall's infrastructure

Hybrid seat model — each plan includes a base seat count. Add extra seats above the included count at the per-seat rate shown. Industry standard: Linear, Notion, Intercom.

Free

$0_/mo

1 seat included
5K memory limit
Semantic search only
qwen3:4B inference
Community support
Shared infrastructure

Starter

^$49_/mo

+$12 / extra seat

3 seats included
100K memories
Semantic + temporal + graph search
qwen3:14B inference
Email support · 99% SLA

Pro

^$119_/mo

+$10 / extra seat

10 seats included
1M memories
Full graph memory
Self-learning engine
qwen3:14B · GPT-4o-mini option
Priority support · 99.5% SLA

Team

^$299_/mo

+$8 / extra seat

30 seats included
10M memories
Multi-agent shared state
Analytics dashboard · SSO/SAML
LLM-agnostic (bring your own)
99.9% SLA

Enterprise

$999_/mo+

negotiated per-seat

Unlimited seats
Unlimited memories
Dedicated tenant
HIPAA / SOC2 compliance
Custom LLM + integrations
SLA 99.99% + dedicated CSM

Hosted SaaS

LLM Infrastructure Cost by Model Tier

When you run the SaaS, you absorb inference costs. Model choice is the biggest variable in your actual COGS — it can be $0.20/user/mo or $7.50/user/mo depending on the tier.

Baseline assumptions: 1,000 extraction calls / user / mo · avg 1,500 input + 200 output tokens / call · 1.5M input + 200K output tokens / user / mo · Retrieval (vector + graph) = zero LLM cost

qwen3:4B Local

VRAM4–6 GB Cost/user/mo~$0.01 Concurrent20+ on SBC

Runs on Orange Pi / RPi 5 class hardware. Near-zero electricity cost. Quality: adequate for fact extraction. Best for Free tier.

qwen3:14B Local Default

VRAM10–12 GB Cost/user/mo~$0.20 Concurrent~15 per RTX 3090

Current Recall default (your RTX 3090 at 192.168.50.62). Strong extraction quality. 1 GPU serves ~15 concurrent users.

qwen3:70B Local

VRAM45 GB (Q4_K_M) Cost/user/mo$2–12 Concurrent~50 per A100 w/vLLM

Requires A100 80GB (~$4/hr RunPod). ~$12/user solo; ~$2/user at 50+ with vLLM batch. Enterprise-class quality.

GPT-4o-mini API

Token pricing$0.15 / $0.60 per 1M Cost/user/mo~$0.35 ScalabilityInfinite (pay-as-go)

Best API price/quality ratio. $0.15/1M input · $0.60/1M output. No GPU required. Cost fully predictable.

GPT-4o API

Token pricing$2.50 / $10 per 1M Cost/user/mo~$5.75 ScalabilityInfinite (pay-as-go)

Highest API quality. $2.50/1M input · $10/1M output. Compresses margins at Team scale unless priced as add-on.

Claude Sonnet API

Token pricing$3 / $15 per 1M Cost/user/mo~$7.50 ScalabilityInfinite (pay-as-go)

Best contextual reasoning. $3/1M input · $15/1M output. Enterprise-tier only — standard plans need 4o-mini or local.

Gross Margin by Plan × LLM Model

Base infra COGS ≈ $8/customer/mo (server amort, power, bandwidth, Stripe, support labor)

Plan	Price/mo	Infra COGS	LLM (qwen3:14B)	LLM (4o-mini)	LLM (GPT-4o)	GM @ 4o-mini
Free 1 seat	$0	$1.50 shared pool	$0.20	$0.35	$5.75	−$1.85 marketing cost
Starter 3 seats	$49	$8.00	$0.60 3×$0.20	$1.05 3×$0.35	$17.25 3×$5.75	81.5%
Pro ★ 10 seats	$119	$8.00	$2.00 10×$0.20	$3.50 10×$0.35	$57.50 10×$5.75	90.3%
Team 30 seats	$299	$8.50	$6.00 30×$0.20	$10.50 30×$0.35	$172.50 ⚠ 58% rev	93.7%
Enterprise ~100 seats avg	$999+	$12.00	$20.00	$35.00	$575 ⚠ 57% rev	95.2%

Recommended Default

qwen3:14B local

$0.20/user/mo. Runs on your existing RTX 3090. 90%+ margins across all paid plans. No per-token API costs that scale with usage.

Cloud Fallback / Scale-Out

GPT-4o-mini

$0.35/user/mo. No GPU required — routes through OpenAI API. Predictable cost, infinite scale. Best option when GPU capacity is saturated.

⚠ Margin Killer

GPT-4o at Team scale

30 users × $5.75 = $172.50 LLM cost on a $299 plan. That's 58% of revenue. GPT-4o must be a per-seat add-on ($8/seat/mo), not included.

Strategy: ship with qwen3:14B as the standard model across all plans — it delivers excellent quality at near-zero per-user cost. Offer GPT-4o and Claude Sonnet as premium LLM add-ons (+$8/seat/mo) for teams that need them. This preserves 90%+ gross margins at scale while giving enterprise customers full model choice.

Roadmap

What's Next

The foundation is live. These capabilities are in active development — each one extending Recall's intelligence further.

Feature #5

Memory DNA

Nightly distillation of all your patterns, preferences, and anti-patterns into a single ~100-token string — injected at the top of every session before anything else loads.

DNA_v1: Python>FastAPI|Docker-compose|defensive|
fails-at: async-context-managers|
prefers: explicit-errors,early-returns|
aversions: redux|trust: self-hosted>cloud

Solves fresh-session amnesia permanently

Feature #3

Proactive Advisor Loop

Recall stops waiting to be asked. Pattern extraction already runs nightly — a second pass converts recurring patterns into targeted recommendations surfaced at session start.

⚠ "You've hit this Qdrant timeout 4× this month"

→ "You keep doing X manually — there's an MCP for that"

↑ "New Claude Code release with breaking MCP change"

Ecosystem Add-on

Research Agent

Nightly scraper across Simon Willison, HN, Anthropic blog, FastAPI releases — filtered by your exact stack. Claude starts sessions already aware of ecosystem changes.

Ecosystem Add-on

Adversarial Self

Queries your own memories tagged shortcut, workaround, hack — then runs targeted threat modeling against them. Your codebase attacked by the entity that knows it best.

Time Machine Queries

`recall_search(query="...", as_of="2025-11-01")` — reconstruct the mental model that led to any past decision. Temporal debugging. Postmortem reconstruction.

Memory Purgatory

Decayed memories don't vanish — they move to SQLite cold storage with FTS5 search. Browse, restore, or permanently delete. Full lifecycle control, no silent data loss.

Recall Ecosystem — Premium Add-ons

Ecosystem Add-on

Cross-Instance Memory Bus

Redis pub/sub bridges casaclaude and proxyclaude. Start a session on one machine — immediately know what the other worked on this morning. True multi-machine continuity with zero manual sync.

Code Archaeology

recall_archaeology("auth.py") surfaces every decision ever made about a file — what was tried, what failed, why the weird pattern exists. No more mystery code.

Validation Report

Your AI finally remembers.

Learns. Heals. Improves.

Persistent Memory

Intelligent Retrieval

Self-Healing Engine

The App

Why We Measure

Four Memory Types

Triple Hybrid Retrieval

Zero LLM at Query Time

Bi-Temporal Model

Self-Learning Engine

Benchmark-Driven Self-Healing

19 Benchmark Metrics

6 Test Categories

Temporal Consistency & Fact Updates

User Preference Persistence

Long-Horizon Planning Continuity

Distractor Robustness & Context Pollution

Relational & Knowledge Graph Queries

Procedural Replay & Workflow Fidelity

Competitive Edge

Feature-by-Feature Breakdown

Multi-Type Memory

Triple Hybrid Retrieval

Zero LLM Calls at Query Time

Bi-Temporal Model

Knowledge Graph Layer

Nightly Decay and Active Forgetting

Usage-Based Memory Ranking

Contradiction Detection and Resolution

Signal Detection + Session Hooks

4-Tier Context Budget Architecture (Phase 2)

Hierarchical Compression

Predictive Context Loading (Phase 2)

Code Pattern Memory and Workflow Recording (Phase 2)

LLM Agnostic

Multi-User Scoping

Self-Hosted, Privacy-First

Benchmark-Driven Self-Healing

Use Cases & Industries

Industries & Primary Use Cases

The Self-Hosting Imperative by Industry

Conversational AI & Agent Systems

Why Industries Choose Self-Hosted Memory

The Self-Learning Engine

Usage Tracker

Nightly Consolidation

Contradiction Detection

Predictive Context Engine

Self-Healing Tuner

The Proof

Revenue Projections

Pricing Model

Year-by-Year Context

Why These Numbers Are Realistic

5-Year Scenario Projections

Hardware Roadmap

📦 Prebuilt Appliance Margins — Best picks only · Reseller + configure + ship model

🔧 Proprietary Board — Carrier board (RK3588) vs reseller model at scale

☁ Server Rental — Managed hardware, you bring the data

Managed SaaS — Fully hosted, zero ops, Recall's infrastructure

Upcoming Features

The Benchmark Is the Proof

Technical Moat

Product Moat

Enterprise Moat

What's Next

Memory DNA

Proactive Advisor Loop

Research Agent

Adversarial Self

Time Machine Queries

Memory Purgatory

Cross-Instance Memory Bus

Code Archaeology