Dotta 0096b56a1c [codex] Add LLM Wiki plugin host support (#5597)

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The plugin system needs host contracts and runtime support before
large plugins can integrate cleanly.
> - The source branch mixed the LLM Wiki package with supporting
host/runtime work, managed plugin skills, root-level storage spaces, and
a bookmarks reference plugin.
> - [PAP-9173](/PAP/issues/PAP-9173) asked for the current branch to be
split by file boundary: plugin package separately from everything else.
> - [PAP-9188](/PAP/issues/PAP-9188) clarified that LLM Wiki may have
plugin-local spaces, but Paperclip core should not reorganize top-level
local storage into spaces.
> - Follow-up review clarified that the bookmarks example should not
ship in this PR either.
> - This pull request contains the
non-`packages/plugins/plugin-llm-wiki/` host/runtime work, keeps runtime
state under the selected Paperclip instance root, and no longer includes
the bookmarks example.

## What Changed

- Added/updated plugin host contracts, SDK types, worker RPC plumbing,
managed plugin skill support, and related server tests.
- Removed the bookmarks example plugin package and its
bundled-example/workspace references.
- Removed the root-level local spaces CLI/migration surface and restored
instance-root runtime defaults for config, db, logs, storage, secrets,
workspaces, projects, and adapter homes.
- Replaced shared root `space-paths` helpers with `home-paths` helpers
for core runtime storage.
- Tightened stranded recovery unique-conflict detection so concurrent
recovery scans reuse the raced recovery issue when Postgres errors are
wrapped.
- Kept `packages/plugins/plugin-llm-wiki/` out of this PR diff;
plugin-local spaces remain in the stacked plugin-only PR.

## Verification

- `pnpm exec vitest run cli/src/__tests__/data-dir.test.ts
cli/src/__tests__/home-paths.test.ts cli/src/__tests__/onboard.test.ts
packages/shared/src/home-paths.test.ts
packages/db/src/runtime-config.test.ts
server/src/__tests__/agent-instructions-service.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/codex-local-execute.test.ts`
- `pnpm exec vitest run packages/db/src/runtime-config.test.ts`
- `pnpm exec vitest run
server/src/__tests__/plugin-routes-authz.test.ts`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm exec vitest run
server/src/__tests__/heartbeat-process-recovery.test.ts -t "reuses the
raced stranded recovery issue"` skipped locally because embedded
Postgres did not initialize on this macOS temp host; the code path was
typechecked and is covered by Linux CI.
- Boundary check: no core references remain for `PAPERCLIP_SPACE_ID`,
`spaces migrate-default`, `@paperclipai/shared/space-paths`,
`registerSpacesCommands`, or the removed bookmarks example.
- Previous PR head `4f23e034` had green GitHub checks: `verify`, all
four serialized server shards, `e2e`, `Canary Dry Run`, `policy`, Snyk,
and `Greptile Review`. Current head `582f466d` is re-running checks
after the bookmarks deletion.

## Risks

- Plugin host changes touch shared runtime paths, so regressions would
most likely appear in adapter startup, plugin loading, or local dev path
defaults.
- Removing the bookmarks example also removes one demonstration of
plugin database namespaces plus local-folder persistence; remaining
plugin examples still cover bundled example discovery and plugin host
flows.
- The plugin package itself is intentionally deferred to the stacked
plugin-only PR, where LLM Wiki plugin-local spaces live.
- Existing installs that tested the transient root-level spaces CLI
should stop using it; this PR intentionally removes that unsupported
migration surface before merge.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected. See `CONTRIBUTING.md`.

## Model Used

- OpenAI GPT-5 Codex via Codex CLI, tool use and local code execution
enabled; context window not exposed.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass, except where noted above
for host-specific embedded Postgres initialization
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Stacked follow-up: PR #5592 contains only
`packages/plugins/plugin-llm-wiki/` and targets this branch.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>

2026-05-10 07:34:12 -05:00

5.9 KiB


LLM Wiki Paperclip Asset And Work-Product Security Gate

Status: accepted Phase 5 policy
Date: 2026-05-06
Owner: Security engineering
Scope: Paperclip-derived ingestion into the LLM Wiki before any asset or work-product content indexing ships

Decision

Phase 5 remains fail-closed for Paperclip assets and work products.

- Paperclip-derived text extraction is allowed only for issue titles/descriptions, issue comments, and issue documents.
- Paperclip assets/attachments and issue work products are metadata-only in Phase 5.
- Linked summaries and content extraction for assets/work products are not approved in Phase 5.
- No implementation may fetch `/api/assets/:id/content`, dereference a work-product `url`, scrape preview pages, or embed binary/blob content into source bundles or source snapshots.

This keeps the secure path easier than the insecure one and avoids broadening the wiki into a second content-distribution channel.

Allowed Source Kinds

These source kinds may contribute body text to Paperclip-derived source bundles:

| Source kind | Allowed body fields | Reason |
| --- | --- | --- |
| Issue | `title`, `description`, identifier/status metadata | First-party Paperclip text under company ACL |
| Comment | `body` | First-party Paperclip text under company ACL |
| Document | `body`, `title`, `key`, revision metadata | First-party Paperclip text under company ACL |
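
A minimal sketch of how an ingestion path might apply this split while staying fail-closed. The function name, the mode labels, and the kind strings (`issue`, `comment`, `document`, `asset`, `work_product`) are assumptions for illustration; the policy does not prescribe an API.

```typescript
// Hypothetical names throughout; only the three body-text source kinds
// and the metadata-only rule for assets/work products come from the policy.
type ExtractionMode = "body_text" | "metadata_only" | "refuse";

// Source kinds approved to contribute body text in Phase 5.
const BODY_TEXT_KINDS: ReadonlySet<string> = new Set(["issue", "comment", "document"]);

// Kinds that may appear only as metadata-only references.
const METADATA_ONLY_KINDS: ReadonlySet<string> = new Set(["asset", "work_product"]);

// Fail closed: any kind that is not explicitly listed is refused.
function extractionModeFor(kind: string): ExtractionMode {
  if (BODY_TEXT_KINDS.has(kind)) return "body_text";
  if (METADATA_ONLY_KINDS.has(kind)) return "metadata_only";
  return "refuse";
}
```

Defaulting to `refuse` means a newly introduced source kind contributes nothing until someone adds it to one of the two sets.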

Assets And Work Products

Assets / attachments

Allowed in Phase 5:

- metadata-only references built from allowlisted structured fields already stored in Paperclip
- recommended fields: `issueId`, `issueCommentId`, `attachmentId`, `assetId`, `originalFilename`, `contentType`, `byteSize`, `sha256`, `createdAt`, `createdByAgentId`, `createdByUserId`

Disallowed in Phase 5:

- fetching asset bytes from `/api/assets/:id/content`
- parsing any blob body, including `text/plain`, `text/markdown`, `application/json`, images, SVG, PDFs, archives, or office formats
- storing `contentPath` in wiki source bundles or source snapshots
- model summarization of attachment bodies
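
One way to build such a reference is to copy only the allowlisted fields, so that `contentPath` and any body-like field are dropped simply because they are never read. This is a sketch under assumptions: the function and type names are hypothetical, and the input is treated as an untyped record rather than Paperclip's actual attachment type.

```typescript
// Recommended attachment fields from the policy; nothing else is copied.
const ATTACHMENT_FIELDS = [
  "issueId", "issueCommentId", "attachmentId", "assetId", "originalFilename",
  "contentType", "byteSize", "sha256", "createdAt", "createdByAgentId", "createdByUserId",
] as const;

type AttachmentField = (typeof ATTACHMENT_FIELDS)[number];
// Hypothetical reference shape: scalar values only.
type AttachmentReference = Partial<Record<AttachmentField, string | number>>;

// Projects a stored attachment record onto the allowlist. Fields outside
// the allowlist (contentPath, bodies, previews) never reach the result.
function toAttachmentReference(record: Record<string, unknown>): AttachmentReference {
  const ref: AttachmentReference = {};
  for (const field of ATTACHMENT_FIELDS) {
    const value = record[field];
    if (typeof value === "string" || typeof value === "number") {
      ref[field] = value;
    }
  }
  return ref;
}
```

An allowlist projection is preferred here over deleting known-bad keys, because a field added to the stored record later stays excluded by default.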

Work products

Allowed in Phase 5:

- metadata-only references built from allowlisted structured fields already stored in Paperclip
- recommended fields: `issueId`, `workProductId`, `type`, `provider`, `title`, `status`, `reviewState`, `healthStatus`, `externalId`, `isPrimary`, `createdAt`, `updatedAt`
- optional boolean/derived metadata such as `hasUrl: true`

Disallowed in Phase 5:

- fetching or crawling the work-product `url`
- scraping preview pages, artifacts, pull requests, branches, commits, or custom provider targets through the wiki ingestion path
- storing raw `url` values in wiki source bundles or source snapshots
- model-authored linked summaries derived from off-record content
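
The same projection idea can be sketched for work products, with the raw `url` reduced to a derived `hasUrl` boolean. The names below are hypothetical, and the input is again treated as an untyped record; the `url` value is only tested for presence, never fetched or stored.

```typescript
// Recommended work-product fields from the policy.
const WORK_PRODUCT_FIELDS = [
  "issueId", "workProductId", "type", "provider", "title", "status",
  "reviewState", "healthStatus", "externalId", "isPrimary", "createdAt", "updatedAt",
] as const;

type WorkProductField = (typeof WORK_PRODUCT_FIELDS)[number];
// Hypothetical reference shape: allowlisted scalars plus the derived flag.
type WorkProductReference =
  Partial<Record<WorkProductField, string | boolean>> & { hasUrl: boolean };

function toWorkProductReference(record: Record<string, unknown>): WorkProductReference {
  // Record only whether a url exists; the value itself is discarded.
  const url = record["url"];
  const ref: WorkProductReference = {
    hasUrl: typeof url === "string" && url.length > 0,
  };
  for (const field of WORK_PRODUCT_FIELDS) {
    const value = record[field];
    if (typeof value === "string" || typeof value === "boolean") {
      ref[field] = value;
    }
  }
  return ref;
}
```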

MIME Allowlists And Size Caps

No MIME allowlist is approved for asset content extraction in Phase 5 because no asset body extraction is approved at all.

- Every asset MIME type is treated as opaque for Paperclip-derived indexing.
- Existing upload limits remain storage concerns, not ingestion approvals.
- Work-product destinations are also opaque regardless of MIME type or size.

Any future issue that wants blob parsing must define:

- a positive MIME allowlist
- per-type parser strategy
- per-source size caps
- sandbox/isolation requirements
- prompt-injection handling
- regression tests for refusal paths

Redaction Rules

Metadata-only means structured facts only, not capability-bearing links.

- Do not persist `contentPath` for assets.
- Do not persist raw work-product `url` values.
- Do not persist query strings, fragments, signed URL tokens, or userinfo.
- Prefer stable identifiers (`assetId`, `workProductId`, `externalId`) over links.

This addresses Sensitive Information Disclosure, Unsafe Consumption of APIs, and Insecure Output Handling risks.
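
As a secondary guard on top of dropping link fields entirely, a value-level check could flag URL-shaped strings that carry userinfo, a query string, or a fragment. The sketch below is an assumption about how that might look: the function name is hypothetical, and the pattern is a deliberately simple match rather than a full URL parser.

```typescript
// Simplified shape: scheme://authority path [?query] [#fragment].
// Assumption: good enough to flag obvious cases, not RFC-complete.
const URL_PATTERN = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#]*)[^?#]*(\?[^#]*)?(#.*)?$/i;

// Returns true when a string looks like a URL that carries userinfo,
// a query string, or a fragment (where signed tokens typically live).
function isCapabilityBearing(value: string): boolean {
  const match = URL_PATTERN.exec(value.trim());
  if (match === null) return false; // not URL-shaped, e.g. a plain identifier
  const authority = match[1] ?? "";
  const hasUserinfo = authority.indexOf("@") !== -1;
  const hasQuery = match[2] !== undefined;
  const hasFragment = match[3] !== undefined;
  return hasUserinfo || hasQuery || hasFragment;
}
```

A check like this does not make a raw `url` acceptable to store; it only helps catch capability-bearing strings that slip into otherwise allowlisted fields.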

Provenance Rules

Every metadata-only reference must preserve enough provenance to explain where it came from without reading the underlying content:

- `companyId`
- `issueId`
- attachment/work-product id
- producer identity when available
- timestamps
- an explicit `metadata_only` marker in any future reference/snapshot schema
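
A possible shape for such a reference, with a small validator that reports which required provenance fields are missing. No schema exists yet, so every name here other than `companyId`, `issueId`, and `metadata_only` is an assumption (`refKind`, `refId`, `producer`, and the function name are hypothetical).

```typescript
// Hypothetical schema sketch; the policy only lists the facts to preserve.
interface MetadataOnlyReference {
  metadata_only: true;                      // explicit marker required by the policy
  companyId: string;
  issueId: string;
  refKind: "attachment" | "work_product";   // assumed discriminator
  refId: string;                            // attachment or work-product id
  producer?: { agentId?: string; userId?: string }; // when available
  createdAt: string;
  updatedAt?: string;
}

// Lists the required provenance fields that are absent or empty.
function missingProvenance(ref: Partial<MetadataOnlyReference>): string[] {
  const missing: string[] = [];
  if (ref.metadata_only !== true) missing.push("metadata_only");
  for (const key of ["companyId", "issueId", "refId", "createdAt"] as const) {
    const value = ref[key];
    if (typeof value !== "string" || value.length === 0) missing.push(key);
  }
  if (ref.refKind !== "attachment" && ref.refKind !== "work_product") {
    missing.push("refKind");
  }
  return missing;
}
```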

Review-Required Behavior

Human review is not required for plain metadata-only references that stay inside the allowlisted fields above.

Human review is required, with a separate security sign-off issue, before enabling any of the following:

- asset body extraction
- work-product URL fetching
- linked summaries generated from asset/work-product content
- storing raw blob links or raw remote URLs in wiki source material
- non-default-space routing for Paperclip-derived asset/work-product references

Security Rationale

This gate exists because the current host surfaces have different trust properties:

- issue/comment/document text is first-party Paperclip content already exposed through company-scoped issue/document APIs
- asset content is a blob download surface (`/api/assets/:id/content`) and can carry prompt-injection or parser-risk payloads
- work products can point at arbitrary destinations through `url`, which reintroduces SSRF, token leakage, and prompt-injection risk if dereferenced automatically

Relevant threat classes:

- OWASP LLM Top 10: Prompt Injection, Sensitive Information Disclosure, Insecure Output Handling, Excessive Agency
- OWASP API Top 10: SSRF, Unsafe Consumption of APIs, Broken Object Property Level Authorization
- Saltzer & Schroeder: Least Privilege, Fail Securely, Complete Mediation, Secure Defaults

Follow-Up Implementation Scope

A follow-up implementation issue is justified only for metadata-only references.

That implementation must:

- keep assets/work products out of source-bundle body text
- never fetch blob bytes or remote URLs
- redact capability-bearing link fields
- mark references as `metadata_only`
- ship tests proving source bundles/snapshots never contain `contentPath` or raw work-product `url` fields
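
The last requirement could be backed by a small recursive scan that a test runs over every generated bundle or snapshot. This is a sketch, not the project's test helper: the function name is hypothetical, and the bundle is treated as arbitrary JSON-like data.

```typescript
// Keys that must never appear anywhere in a source bundle or snapshot.
const FORBIDDEN_KEYS: ReadonlySet<string> = new Set(["contentPath", "url"]);

// Walks arrays and plain objects, collecting the path of every forbidden key.
function findForbiddenKeys(value: unknown, path = "$", hits: string[] = []): string[] {
  if (Array.isArray(value)) {
    value.forEach((item, index) => findForbiddenKeys(item, `${path}[${index}]`, hits));
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      const here = `${path}.${key}`;
      if (FORBIDDEN_KEYS.has(key)) hits.push(here);
      findForbiddenKeys((value as Record<string, unknown>)[key], here, hits);
    }
  }
  return hits;
}
```

A test would then assert that `findForbiddenKeys(bundle)` returns an empty array, and the returned paths make a failure easy to trace back to the offending field.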
