paperclip/doc/plans/2026-03-17-memory-service-surface-api.md

# Paperclip Memory Service Plan

## Goal

Define a Paperclip memory service and surface API that can sit above multiple memory backends, while preserving Paperclip's control-plane requirements:

- company scoping
- auditability
- provenance back to Paperclip work objects
- budget and cost visibility
- plugin-first extensibility

This plan is based on the external landscape summarized in `doc/memory-landscape.md`, the AWS AgentCore comparison captured in [PAP-1274](/PAP/issues/PAP-1274), and the current Paperclip architecture in:

- `doc/SPEC-implementation.md`
- `doc/plugins/PLUGIN_SPEC.md`
- `doc/plugins/PLUGIN_AUTHORING_GUIDE.md`
- `packages/plugins/sdk/src/types.ts`

## Recommendation In One Sentence

Paperclip should add a company-scoped memory control plane with company default plus agent override resolution, shared hook delivery, and full operation attribution, while leaving extraction and storage semantics to built-ins and plugins.

## Product Decisions

### 1. Memory resolution is company default plus agent override

Every memory binding belongs to exactly one company.

Resolution order in V1:

- company default binding
- optional per-agent override

There is no per-project override in V1.

Project context can still appear in scope and provenance so providers can use it for retrieval and partitioning, but projects do not participate in binding selection.

No cross-company memory sharing in the initial design.

### 2. Providers are selected by stable binding key

Each configured memory provider gets a stable key inside a company, for example:

- `default`
- `mem0-prod`
- `local-markdown`
- `research-kb`

Agents, tools, and background hooks resolve the active provider by key, not by hard-coded vendor logic.

### 3. Plugins are the primary provider path

Built-ins are useful for a zero-config local path, but most providers should arrive through the existing Paperclip plugin runtime.

That keeps the core small and matches the broader Paperclip direction that specialized knowledge systems live at the edges.

### 4. Paperclip owns routing, provenance, and policy

Providers should not decide how Paperclip entities map to governance.

Paperclip core should own:

- binding resolution
- who is allowed to call a memory operation
- which company, agent, issue, project, run, and subject scope is active
- what source object the operation belongs to
- how usage and costs are attributed
- how operators inspect what happened

### 5. Paperclip exposes shared hooks, providers own extraction

Paperclip should emit a common set of memory hooks that built-ins, third-party adapters, and plugins can all use.

Those hooks should pass structured Paperclip source objects plus normalized metadata. The provider then decides how to extract from those objects.

Paperclip should not force one extraction pipeline or one canonical "memory text" transform before the provider sees the input.

### 6. Automatic memory should start narrow, but the hook surface should be general

Automatic capture is useful, but broad silent capture is dangerous.

Initial built-in automatic hooks should be:

- pre-run hydrate for agent context recall
- post-run capture from agent runs
- optional issue comment capture
- optional issue document capture

The hook registry itself should be general enough that other providers can subscribe to the same events without core changes.

### 7. No approval gate for binding changes in the open-source product

For the open-source version, changing memory bindings should not require approvals.

Paperclip should still log those changes in activity and preserve full auditability. Approval-gated memory governance can remain an enterprise or future policy layer.

## Proposed Concepts

### Memory provider

A built-in or plugin-supplied implementation that stores and retrieves memory.

Examples:

- local markdown plus semantic index
- mem0 adapter
- supermemory adapter
- MemOS adapter

### Memory binding

A company-scoped configuration record that points to a provider and carries provider-specific config.

This is the object selected by key.

### Memory binding target

A mapping from a Paperclip target to a binding.

V1 targets:

- `company`
- `agent`

### Memory scope

The normalized Paperclip scope passed into a provider request.

At minimum:

- `companyId`
- optional `agentId`
- optional `projectId`
- optional `issueId`
- optional `runId`
- optional `subjectId` for external or user identity
- optional `sessionKey` for providers that organize memory around sessions
- optional `namespace` for providers that need an explicit partition hint

### Memory source reference

The provenance handle that explains where a memory came from.

Supported source kinds should include:

- `issue_comment`
- `issue_document`
- `issue`
- `run`
- `activity`
- `manual_note`
- `external_document`

### Memory hook

A normalized trigger emitted by Paperclip when something memory-relevant happens.

Initial hook kinds:

- `pre_run_hydrate`
- `post_run_capture`
- `issue_comment_capture`
- `issue_document_capture`
- `manual_capture`

### Memory operation

A normalized capture, record-write, query, browse, get, correction, or delete action performed through Paperclip.

Paperclip should log every memory operation whether the provider is local, plugin-backed, or external.

## Required Adapter Contract

The required core should be small enough to fit `memsearch`, `mem0`, `Memori`, `MemOS`, or `OpenViking`, but strong enough to satisfy Paperclip's attribution and inspectability requirements.

```ts
export interface MemoryAdapterCapabilities {
  profile?: boolean;
  correction?: boolean;
  multimodal?: boolean;
  providerManagedExtraction?: boolean;
  asyncExtraction?: boolean;
  providerNativeBrowse?: boolean;
}

export interface MemoryScope {
  companyId: string;
  agentId?: string;
  projectId?: string;
  issueId?: string;
  runId?: string;
  subjectId?: string;
  sessionKey?: string;
  namespace?: string;
}

export interface MemorySourceRef {
  kind:
    | "issue_comment"
    | "issue_document"
    | "issue"
    | "run"
    | "activity"
    | "manual_note"
    | "external_document";
  companyId: string;
  issueId?: string;
  commentId?: string;
  documentKey?: string;
  runId?: string;
  activityId?: string;
  externalRef?: string;
}

export interface MemoryHookContext {
  hookKind:
    | "pre_run_hydrate"
    | "post_run_capture"
    | "issue_comment_capture"
    | "issue_document_capture"
    | "manual_capture";
  hookId: string;
  triggeredAt: string;
  actorAgentId?: string;
  heartbeatRunId?: string;
}

export interface MemorySourcePayload {
  text?: string;
  mimeType?: string;
  metadata?: Record<string, unknown>;
  object?: Record<string, unknown>;
}

export interface MemoryUsage {
  provider: string;
  biller?: string;
  model?: string;
  billingType?: "metered_api" | "subscription_included" | "subscription_overage" | "unknown";
  attributionMode?: "billed_directly" | "included_in_run" | "external_invoice" | "untracked";
  inputTokens?: number;
  cachedInputTokens?: number;
  outputTokens?: number;
  embeddingTokens?: number;
  costCents?: number;
  latencyMs?: number;
  details?: Record<string, unknown>;
}

export interface MemoryRecordHandle {
  providerKey: string;
  providerRecordId: string;
}

export interface MemoryCaptureRequest {
  bindingKey: string;
  scope: MemoryScope;
  source: MemorySourceRef;
  payload: MemorySourcePayload;
  hook?: MemoryHookContext;
  mode?: "capture_residue" | "capture_record";
  metadata?: Record<string, unknown>;
}

export interface MemoryRecordWriteRequest {
  bindingKey: string;
  scope: MemoryScope;
  source?: MemorySourceRef;
  records: Array<{
    text: string;
    summary?: string;
    metadata?: Record<string, unknown>;
  }>;
}

export interface MemoryQueryRequest {
  bindingKey: string;
  scope: MemoryScope;
  query: string;
  topK?: number;
  intent?: "agent_preamble" | "answer" | "browse";
  metadataFilter?: Record<string, unknown>;
}

export interface MemoryListRequest {
  bindingKey: string;
  scope: MemoryScope;
  cursor?: string;
  limit?: number;
  metadataFilter?: Record<string, unknown>;
}

export interface MemorySnippet {
  handle: MemoryRecordHandle;
  text: string;
  score?: number;
  summary?: string;
  source?: MemorySourceRef;
  metadata?: Record<string, unknown>;
}

export interface MemoryContextBundle {
  snippets: MemorySnippet[];
  profileSummary?: string;
  usage?: MemoryUsage[];
}

export interface MemoryListPage {
  items: MemorySnippet[];
  nextCursor?: string;
  usage?: MemoryUsage[];
}

export interface MemoryExtractionJob {
  providerJobId: string;
  status: "queued" | "running" | "succeeded" | "failed" | "cancelled";
  hookKind?: MemoryHookContext["hookKind"];
  source?: MemorySourceRef;
  error?: string;
  submittedAt?: string;
  startedAt?: string;
  finishedAt?: string;
}

export interface MemoryAdapter {
  key: string;
  capabilities: MemoryAdapterCapabilities;
  capture(req: MemoryCaptureRequest): Promise<{
    records?: MemoryRecordHandle[];
    jobs?: MemoryExtractionJob[];
    usage?: MemoryUsage[];
  }>;
  upsertRecords(req: MemoryRecordWriteRequest): Promise<{
    records?: MemoryRecordHandle[];
    usage?: MemoryUsage[];
  }>;
  query(req: MemoryQueryRequest): Promise<MemoryContextBundle>;
  list(req: MemoryListRequest): Promise<MemoryListPage>;
  get(handle: MemoryRecordHandle, scope: MemoryScope): Promise<MemorySnippet | null>;
  forget(handles: MemoryRecordHandle[], scope: MemoryScope): Promise<{ usage?: MemoryUsage[] }>;
}
```

This contract intentionally does not force a provider to expose its internal graph, file tree, or ontology. It does require enough structure for Paperclip to browse, attribute, and audit what happened.

## Optional Adapter Surfaces

These should be capability-gated, not required:

- `correct(handle, patch)` for natural-language correction flows
- `profile(scope)` when the provider can synthesize stable preferences or summaries
- `listExtractionJobs(scope, cursor)` when async extraction needs richer operator visibility
- `retryExtractionJob(jobId)` when a provider supports re-drive
- `explain(queryResult)` for providers that can expose retrieval traces
- provider-native browse or graph surfaces exposed through plugin UI

## Lessons From AWS AgentCore Memory API

AWS AgentCore Memory is a useful check on whether this plan is too abstract or missing important operational surfaces.

The broad direction still looks right:

- AWS splits memory into a control plane (`CreateMemory`, `UpdateMemory`, `ListMemories`) and a data plane (`CreateEvent`, `RetrieveMemoryRecords`, `GetMemoryRecord`, `ListMemoryRecords`)
- AWS separates raw interaction capture from curated long-term memory records
- AWS supports both provider-managed extraction and self-managed pipelines
- AWS treats browse and list operations as first-class APIs, not ad hoc debugging helpers
- AWS exposes extraction jobs instead of hiding asynchronous maintenance completely

That lines up with the Paperclip plan at a high level: provider configuration, scoped writes, scoped retrieval, provider-managed extraction as a capability, and a browse and inspect surface.

The concrete changes Paperclip should take from AWS are:

### 1. Keep config APIs separate from runtime traffic

The rollout should preserve a clean separation between:

- control-plane APIs for binding CRUD, defaults, overrides, and capability metadata
- runtime APIs and tools for capture, record writes, query, list, get, forget, and extraction status

This keeps governance changes distinct from high-volume memory traffic.

### 2. Distinguish capture from curated record writes

AWS does not flatten everything into one write primitive. It distinguishes captured events from durable memory records.

Paperclip should do the same:

- `capture(...)` for raw run, comment, document, or activity residue
- `upsertRecords(...)` for curated durable facts and notes

That is a better fit for provider-managed extraction and for manual curation flows.

### 3. Make list and browse first-class

AWS exposes list and retrieve surfaces directly. Paperclip should not make browse optional at the portable layer.

The minimum portable surface should include:

- `query`
- `list`
- `get`

Provider-native graph or file browsing can remain optional beyond that.

### 4. Add pagination and cursors for operator inspection

AWS consistently uses pagination on browse-heavy APIs.

Paperclip should add cursor-based pagination to:

- record listing
- extraction job listing
- memory operation explorer APIs

Prompt hydration can continue to use `topK`, but operator surfaces need cursors.

### 5. Add explicit session and namespace hints

AWS uses `actorId`, `sessionId`, `namespace`, and `memoryStrategyId` heavily.

Paperclip should keep its own control-plane-centric model, but the adapter contract needs obvious places to map those concepts:

- `sessionKey`
- `namespace`

The provider adapter can map them to AWS or other vendor-specific identifiers without leaking those identifiers into core.

### 6. Treat asynchronous extraction as a real operational surface

AWS exposes extraction jobs explicitly. Paperclip should too.

Operators should be able to see:

- pending extraction work
- failed extraction work
- which hook or source caused the work
- whether a retry is available

### 7. Keep Paperclip provenance primary

Paperclip should continue to center:

- `companyId`
- `agentId`
- `projectId`
- `issueId`
- `runId`
- issue comments, documents, and activity as sources

The lesson from AWS is to support clean mapping into provider-specific models, not to let provider identifiers take over the core product model.

## What Paperclip Should Persist

Paperclip should not mirror the full provider memory corpus into Postgres unless the provider is a Paperclip-managed local provider.

Paperclip core should persist:

- memory bindings
- company default and agent override resolution targets
- provider keys and capability metadata
- normalized memory operation logs
- source references back to issue comments, documents, runs, and activity
- provider record handles returned by operations when available
- hook delivery records and extraction job state
- usage and cost attribution

For external providers, the actual memory payload can remain in the provider.

## Hook Model

### Shared hook surface

Paperclip should expose one shared hook system for memory.

That same system must be available to:

- built-in memory providers
- plugin-based memory providers
- third-party adapter integrations that want to use memory hooks

### What a hook delivers

Each hook delivery should include:

- resolved binding key
- normalized `MemoryScope`
- `MemorySourceRef`
- structured source payload
- hook metadata such as hook kind, trigger time, and related run id

The payload should include structured objects where possible so the provider can decide how to extract and chunk.

### Initial automatic hooks

These should be low-risk and easy to reason about:

1. `pre_run_hydrate`
   Before an agent run starts, Paperclip may call `query(... intent = "agent_preamble")` using the active binding.

2. `post_run_capture`
   After a run finishes, Paperclip may call `capture(...)` with structured run output, excerpts, and provenance.

3. `issue_comment_capture`
   When enabled on the binding, Paperclip may call `capture(...)` for selected issue comments.

4. `issue_document_capture`
   When enabled on the binding, Paperclip may call `capture(...)` for selected issue documents.

### Explicit tools and APIs

These should be tool-driven or UI-driven first:

- `memory.search`
- `memory.note`
- `memory.forget`
- `memory.correct`
- memory record list and get
- extraction-job inspection

### Not automatic in the first version

- broad web crawling
- silent import of arbitrary repo files
- cross-company memory sharing
- automatic destructive deletion
- provider migration between bindings

## Agent UX Rules

Paperclip should give agents both automatic recall and explicit tools, with simple guidance:

- use `memory.search` when the task depends on prior decisions, people, projects, or long-running context that is not in the current issue thread
- use `memory.note` when a durable fact, preference, or decision should survive this run
- use `memory.correct` when the user explicitly says prior context is wrong
- rely on post-run auto-capture for ordinary session residue so agents do not have to write memory notes for every trivial exchange

This keeps memory available without forcing every agent prompt to become a memory-management protocol.

## Browse And Inspect Surface

Paperclip needs a first-class UI for memory, otherwise providers become black boxes.

The initial browse surface should support:

- active binding by company and agent
- recent memory operations
- recent write and capture sources
- record list and record detail with source backlinks
- query results with source backlinks
- extraction job status
- filters by agent, issue, project, run, source kind, and date
- provider usage, cost, and latency summaries

When a provider supports richer browsing, the plugin can add deeper views through the existing plugin UI surfaces.

## Cost And Evaluation

Paperclip should treat memory accounting as two related but distinct concerns:

### 1. `memory_operations` is the authoritative audit trail

Every memory action should create a normalized operation record that captures:

- binding
- scope
- source provenance
- operation type
- success or failure
- latency
- usage details reported by the provider
- attribution mode
- related run, issue, and agent when available

This is where operators answer "what memory work happened and why?"

### 2. `cost_events` remains the canonical spend ledger for billable metered usage

The current `cost_events` model is already the canonical cost ledger for token and model spend, and `agent_runtime_state` plus `heartbeat_runs.usageJson` already roll up and summarize run usage.

The recommendation is:

- if a memory operation runs inside a normal Paperclip agent heartbeat and the model usage is already counted on that run, do not create a duplicate `cost_event`
- instead, store the memory operation with `attributionMode = "included_in_run"` and link it to the related `heartbeatRunId`
- if a memory provider makes a direct metered model call outside the agent run accounting path, the provider must report usage and Paperclip should create a `cost_event`
- that direct `cost_event` should still link back to the memory operation, agent, company, and issue or run context when possible

### 3. `finance_events` should carry flat subscription or invoice-style costs

If a memory service incurs:

- monthly subscription cost
- storage invoices
- provider platform charges not tied to one request

those should be represented as `finance_events`, not as synthetic per-query memory operations.

That keeps usage telemetry separate from accounting entries like invoices and flat fees.

### 4. Evaluation metrics still matter

Paperclip should record evaluation-oriented metrics where possible:

- recall hit rate
- empty query rate
- manual correction count
- extraction failure count
- per-binding success and failure counts

This is important because a memory system that "works" but silently burns budget or silently fails extraction is not acceptable in Paperclip.

## Suggested Data Model Additions

At the control-plane level, the likely new core tables are:

- `memory_bindings`
  - company-scoped key
  - provider id or plugin id
  - config blob
  - enabled status

- `memory_binding_targets`
  - target type (`company`, `agent`)
  - target id
  - binding id

- `memory_operations`
  - company id
  - binding id
  - operation type (`capture`, `record_upsert`, `query`, `list`, `get`, `forget`, `correct`)
  - scope fields
  - source refs
  - usage, latency, and attribution mode
  - related heartbeat run id
  - related cost event id
  - success or error

- `memory_extraction_jobs`
  - company id
  - binding id
  - operation id
  - provider job id
  - hook kind
  - status
  - source refs
  - error
  - submitted, started, and finished timestamps

Provider-specific long-form state should stay in plugin state or the provider itself unless a built-in local provider needs its own schema.

## Recommended First Built-In

The best zero-config built-in is a local markdown-first provider with optional semantic indexing.

Why:

- it matches Paperclip's local-first posture
- it is inspectable
- it is easy to back up and debug
- it gives the system a baseline even without external API keys

The design should still treat that built-in as just another provider behind the same control-plane contract.

## Rollout Phases

### Phase 1: Control-plane contract

- add memory binding models and API types
- add company default plus agent override resolution
- add plugin capability and registration surface for memory providers

### Phase 2: Hook delivery and operation audit

- add shared memory hook emission in core
- add operation logging, extraction job state, and usage attribution
- add direct-provider cost and finance-event linkage rules

### Phase 3: One built-in plus one plugin example

- ship a local markdown-first provider
- ship one hosted adapter example to validate the external-provider path

### Phase 4: UI inspection

- add company and agent memory settings
- add a memory operation explorer
- add record list and detail surfaces
- add source backlinks to issues and runs

### Phase 5: Rich capabilities

- correction flows
- provider-native browse or graph views
- evaluation dashboards
- retention and quota controls

## Remaining Open Questions

- Which built-in local provider should ship first: pure markdown, markdown plus embeddings, or a lightweight local vector store?
- How much source payload should Paperclip snapshot inside `memory_operations` for debugging without duplicating large transcripts?
- Should correction flows mutate provider state directly, create superseding records, or both depending on provider capability?
- What default retention and size limits should the local built-in enforce?

## Bottom Line

The right abstraction is:

- Paperclip owns bindings, resolution, hooks, provenance, policy, and attribution.
- Providers own extraction, ranking, storage, and provider-native memory semantics.

That gives Paperclip a stable memory service without locking the product to one memory philosophy or one vendor, and it integrates the AWS lessons without importing AWS's model into core.