api.neruva.io · Typed agentic memory

Agent memory at the speed of a function call.

Records -- typed events with kind, tags, ts, and meta as first-class filters. Auto-embedded on ingest. GDPR-native forget by predicate. Portable as .neruva -- a single file you can export, diff, and own.

Recall cost
~3,125x cheaper
One records_query at $2 per 1M queries vs ~1.25k input tokens of context-stuffing on Opus 4.7 ($0.00625/turn). The recall slice you stop paying frontier-model rates for, every turn, forever.
Latency
<1ms
Server-side substrate compute is microseconds. End-to-end p95 from a warm caller is ~100 ms -- almost all of it network. No embedder LLM, no rerank model, no async reindex queue between you and the agent's next turn.
Forget
GDPR
One predicate, every matching record gone. Forget by kind, by tag, by user, by ts window, by id. Free and never metered. Tombstones flush on the next compaction.
Ownership
.neruva
One file per namespace. Atomic, point-in-time consistent, versioned. Records + reserved slots for KG, causal, and analogy sections. Export, diff, migrate, replay -- read it offline with the same client.
One substrate, four layers

Records is one of four layers. All four share a wallet, a namespace, and a portable file.

Every layer runs on the same D=8192 HD bipolar substrate, deterministic from a seed, sub-millisecond, exportable as part of the same .neruva container. A sketch of one call into each layer follows the list below.

Records
Typed memory
kind / tags / ts / meta as first-class filters. Semantic + typed query in one call. Auto-embedded. GDPR-native forget. The primary substrate.
KG
Mutable structured state
(subject, relation, object) triples bound into ~32KB per relation. kg.query returns the answer with calibrated confidence -- or null below the floor. The thing your LLM stops hallucinating about.
Causal
Pearl's do-operator
P(Y|X=1) is what you observed. P(Y|do(X=1)) is what would happen if you forced it. Same logged worlds, two arithmetically distinct queries. No other vector store exposes this.
Analogy
HD pattern transfer
A:B::C:? Parallelogram completion over factored items. Sub-millisecond, deterministic, no LLM in the loop.
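
To make the four layers concrete, here is a minimal sketch of one call into each. The Python client, package name, and signatures are illustrative assumptions, not a published SDK -- the page names records_query, kg.query, the do-operator, and A:B::C:? completion, but not a client shape.

    # Hypothetical Python client -- package, methods, and parameters are
    # illustrative assumptions, not a published SDK.
    from neruva import Client  # assumed package name

    nv = Client(namespace="agent-42")

    # Records: semantic + typed query in one call; auto-embedded on ingest.
    hits = nv.records_query(
        text="checkout deploy failures",
        kind=["incident"],
        tags_any=["service:checkout"],
        limit=20,
    )

    # KG: the answer with calibrated confidence, or None below the floor.
    status = nv.kg.query(subject="order-991", relation="status")

    # Causal: observational vs interventional -- two distinct queries
    # over the same logged worlds.
    p_obs = nv.causal.query(y="churn", given={"discount": 1})  # P(Y|X=1)
    p_do = nv.causal.query(y="churn", do={"discount": 1})      # P(Y|do(X=1))

    # Analogy: A:B::C:? parallelogram completion, no LLM in the loop.
    d = nv.analogy.complete(a="Paris", b="France", c="Tokyo")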
What we solve

Every agent team hits the same wall.

We've heard it from teams shipping agents into production. The shape of pain is always the same.

Your agents forget between sessions.

Today: Context resets the moment the loop ends. Users restart and the agent has no idea who they are.

With Neruva: Typed records keyed by namespace. Recall in a single call -- semantic + typed filters. No LLM round-trip.

Context-stuffing burns the same tokens every turn.

Today: Most agents 'remember' by re-prepending recall text to every LLM call. ~1.25k input tokens at frontier rates, every turn, forever.

With Neruva: One records_query returns the same payload at $0.000002/turn. ~3,125x cheaper per recall slice.

LLM-as-memory hallucinates -- and bills you for it.

Today: Calling a model to summarize, extract, and rerank on every write adds seconds of latency and per-token cost that compounds.

With Neruva: No model in the retrieval path. Deterministic, auditable, cheap by construction.

Filter dicts are gymnastics.

Today: Vector stores treat structured attributes as metadata, then ask you to express filters as $eq/$in dict-of-dicts. kind and tags should be query parameters, not metadata.

With Neruva: Records carry kind, tags, ts as first-class fields. Query by tag-any / tag-all / kind / time window directly.
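
A hedged side-by-side of the two shapes. The first call mimics the $eq/$in dict-of-dicts convention of typical vector stores (store and embed stand in for your current stack); the second is an assumed Neruva client call built from the first-class field names above, reusing the nv client from the earlier sketch.

    import time
    week_ago = int(time.time()) - 7 * 86400

    # Metadata-dict gymnastics (typical vector-store shape; `store` and
    # `embed` stand in for your current stack):
    store.query(
        vector=embed("billing complaints"),   # you run the embedder
        filter={"$and": [
            {"kind": {"$eq": "ticket"}},
            {"tags": {"$in": ["billing"]}},
        ]},
    )

    # First-class filters (assumed Neruva client shape, nv as above):
    nv.records_query(
        text="billing complaints",  # embedded server-side
        kind=["ticket"],
        tags_any=["billing"],
        ts_gte=week_ago,            # time window is a parameter, not metadata
    )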

Nobody can audit what your agent remembers.

Today: Compliance asks 'show me what this agent knows about user X' and the answer is a black-box embedding.

With Neruva: Every record is a first-class row with kind, tags, ts, meta, and a content-addressable id. Inspect, export, delete.

Surgical forget is an afterthought.

Today: A user revokes consent. Now you have to find their fingerprint scattered across embeddings, rebuilds, and caches.

With Neruva: One predicate -- by id, by tag, by user, by ts window. Free and never metered. Tombstones flush on next compaction.
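
A sketch of what that predicate looks like from the client. The method name records_forget is an assumption; the predicates themselves (tag, user, kind, ts window, id) are the ones listed above.

    # Hypothetical forget calls, reusing nv from the earlier sketch.
    nv.records_forget(tags_any=["user:abc-123"])           # consent revoked: by user tag
    nv.records_forget(kind=["scratch"], ts_lt=1735689600)  # by kind + ts window (unix s)
    nv.records_forget(ids=[h["id"] for h in hits])         # surgical, by id (field assumed)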

Real-world use cases

Where teams reach for Neruva memory.

Multi-user customer-support agents

Scenario: A SaaS support assistant serves 50,000 customers. Each customer needs an isolated namespace: past tickets, preferences, sentiment.

With Neruva: Open a Neruva namespace per customer. Marginal cost rounds to zero. Recall the last 20 tickets in under a millisecond per turn -- typed by kind=ticket, filtered by tagsAny=[customer:abc-123].
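
As a sketch, with the same assumed client shape as earlier:

    # One isolated namespace per customer -- namespaces cost nothing to open.
    ns = Client(namespace=f"cust-{customer_id}")  # customer_id from your auth layer
    recent = ns.records_query(
        kind=["ticket"],
        tags_any=["customer:abc-123"],
        limit=20,   # the last 20 tickets for this customer
    )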

Coding-agent persistent context

Scenario: A Claude Code session ends, and the next one starts cold. The agent has no idea what was decided, what broke, or where the project left off.

With Neruva: neruva-record-install captures every prompt, tool call, decision, and mistake as a typed record. Next session opens with records_query(kind=['decision','mistake','session_end']).
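
Fleshed out as a runnable sketch -- the record's text field and the tagging convention are assumptions:

    # Warm-start the next session from typed records (nv as above).
    context = nv.records_query(
        kind=["decision", "mistake", "session_end"],
        tags_any=["project:myrepo"],   # illustrative tagging convention
        limit=50,
    )
    preamble = "\n".join(r["text"] for r in context)  # record field assumed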

Compliance-grade agent audit trail

Scenario: A financial-services chatbot must prove which memory drove which response, retain records for 7 years, and support 'right to be forgotten' requests.

With Neruva: Every record is an inspectable row with kind, tags, ts, meta, and a content hash. Surgical delete by predicate. Export as .neruva to cold storage.

Write-heavy ingest pipelines

Scenario: Real-time observability or log analysis where thousands of events per minute become memory. HNSW indexes choke; tail latency explodes.

With Neruva: Append-only WAL with async indexing. Writes don't block on the index. Throughput stays flat as the corpus grows.

On-prem / edge agent deployments

Scenario: An agent runs in a customer datacenter or on a developer's laptop. Managed indexes are off the table.

With Neruva: Export the namespace as .neruva and ship it. The file is the deployment. Read it offline with the same client.
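
What that could look like, with assumed export and open method names:

    # Export the namespace to a single portable file...
    nv.export_file("support-bot.neruva")              # method name assumed

    # ...and read it offline with the same client, no server in the loop.
    offline = Client.open_file("support-bot.neruva")  # method name assumed
    offline.records_query(kind=["ticket"], limit=5)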

Cross-agent handoff

Scenario: Claude Code finishes a refactor, Cursor picks it up, Codex reviews. Each tool today has its own memory store -- nothing carries across.

With Neruva: One namespace, one wallet, one MCP. Every harness reads the same typed records. Adopt the kind=session_end / handoff convention and the next agent picks up where the last one left off.
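
A sketch of the convention -- the write method records_add is an assumption:

    # The finishing agent writes a typed handoff record...
    nv.records_add(                     # write method name assumed
        kind="session_end",
        text="Refactored payments; tests green; docs still need updating.",
        tags=["project:myrepo", "agent:claude-code"],
    )

    # ...and the next harness, pointed at the same namespace, picks it up.
    handoff = nv.records_query(kind=["session_end", "handoff"], limit=1)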

The cost wedge

Stop paying frontier-model rates for the recall slice.

Most agent stacks "remember" by re-prepending recall text to every LLM call. That recall slice gets billed at frontier-model input rates, every turn, forever. Replace it with one records_query.

Context-stuffing
Stuff 5 KB of recall context into every Opus 4.7 turn.
~1.25k input tokens × $5/M = $0.00625 / turn
records_query
One typed semantic query. Returns the same 5 KB.
$2 / 1M queries = $0.000002 / turn
Net savings
Per turn: ~3,125x cheaper.
A 1k-turn-per-day agent saves ~$190/mo on the recall slice alone.

Numbers use Claude Opus 4.7 list pricing ($5/M input). Other models vary -- Sonnet is cheaper, Haiku cheaper still. The shape of the savings is what matters: every recall via the substrate is a recall you don't do via a context-stuffing prompt.
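
The arithmetic behind the table, spelled out:

    tokens_per_turn = 1250                        # ~5 KB of recall context
    stuffing = tokens_per_turn * 5 / 1_000_000    # $0.00625/turn at $5/M input
    recall = 2 / 1_000_000                        # $0.000002/turn at $2/1M queries
    print(stuffing / recall)                      # 3125.0
    print((stuffing - recall) * 1_000 * 30)       # ~$187/mo saved at 1k turns/day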

How it feels in production

Designed for the workload you actually run.

Writes
Hot-path safe.
Agents write records inline without blocking. The index catches up in the background.
Tenancy
Namespace per agent, per user, per anything.
Spin up a million namespaces. We don't charge for them individually.
Filters
First-class on the record.
kind, tagsAny, tagsAll, tsGte, tsLt, ids. Pass them directly. No metadata-dict gymnastics.
Recency
Time-aware retrieval, first-class.
Time-window filters built into the record, not bolted on with metadata. Timeline endpoint streams most-recent-first with cursor pagination -- see the sketch after this list.
Compliance
Surgical forget. Full audit.
Forget by predicate. Every record is inspectable and removable. Free and never metered.
Billing
Wallet model. No surprises.
Top up via PayPal. Operations deduct in real time. No subscription, no minimum, no overage trap.
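
The timeline sketch referenced above -- method and field names are assumptions:

    # Stream a namespace most-recent-first with cursor pagination.
    cursor = None
    while True:
        page = nv.records_timeline(limit=100, cursor=cursor)  # method name assumed
        for rec in page["records"]:                           # field names assumed
            print(rec["id"], rec["ts"])
        cursor = page.get("next_cursor")
        if not cursor:
            break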
Migrating?
A legacy vector index API is available at /v1/indexes/* for codebases that already speak the upsert / query / delete shape. New users should start with Records.
Migration notes →

Stop renting search infra.
Start owning agent memory.

$5 in credits on signup. No card. No subscription. No demo call. Wire it in and decide.