Commit Graph

606 Commits

Author SHA1 Message Date
Devin Foley 4b1e92a588 feat(plugins): add Modal sandbox provider plugin (#6245)
## Thinking Path

> - Paperclip orchestrates AI agents through company-scoped
control-plane workflows and extensible runtime integrations.
> - Sandbox providers are part of that extension surface because they
let agents execute isolated work without baking each provider into the
core server.
> - Modal already offers managed sandboxes with filesystem, process,
timeout, and networking controls that map onto Paperclip's sandbox
provider contract.
> - The repo did not have a Modal provider plugin, so teams wanting
Modal-backed sandboxes had no first-party integration path.
> - This pull request adds a standalone
`packages/plugins/sandbox-providers/modal` plugin that implements the
provider contract, worker entrypoint, docs, and tests.
> - The benefit is that Modal can now be installed as a provider plugin
without expanding the core control-plane surface area.

## What Changed

- Added a new `packages/plugins/sandbox-providers/modal` package with
the plugin manifest, worker entrypoint, and exported plugin surface.
- Implemented Modal-backed sandbox lifecycle support for creation,
command execution, file operations, networking options, termination, and
metadata translation.
- Added focused Vitest coverage for config validation, env handling,
lifecycle flows, networking behavior, and error mapping.
- Documented installation, configuration, and usage requirements in the
plugin README.
- Removed misleading `MODAL_TOKEN_*` fallback behavior so authentication
relies on supported Modal credentials only.
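
The PR describes the provider contract only in prose. To illustrate the lifecycle it names (create, execute, terminate, with error mapping), here is a minimal TypeScript sketch. Every name in it (`SandboxProvider`, `SandboxError`, `FakeSandboxProvider`, `runOnce`) is hypothetical and is not taken from the plugin or from Modal's SDK. It shows the create / try / finally-terminate pattern against an in-memory fake of the kind lifecycle tests might use.

```typescript
// Hypothetical contract; the real Paperclip sandbox provider interface may differ.
interface ExecResult {
  exitCode: number;
  stdout: string;
}

interface SandboxHandle {
  id: string;
}

interface SandboxProvider {
  create(opts: { timeoutMs: number }): Promise<SandboxHandle>;
  exec(sandbox: SandboxHandle, argv: string[]): Promise<ExecResult>;
  terminate(sandbox: SandboxHandle): Promise<void>;
}

// Error mapping: provider failures are normalized to a small set of codes.
class SandboxError extends Error {
  readonly code: "terminated" | "unsupported";
  constructor(code: "terminated" | "unsupported", message: string) {
    super(message);
    this.name = "SandboxError";
    this.code = code;
  }
}

// In-memory stand-in for a Modal-backed provider (hypothetical, for tests only).
class FakeSandboxProvider implements SandboxProvider {
  private readonly live = new Set<string>();
  private counter = 0;

  async create(_opts: { timeoutMs: number }): Promise<SandboxHandle> {
    const id = `sb-${++this.counter}`;
    this.live.add(id);
    return { id };
  }

  async exec(sandbox: SandboxHandle, argv: string[]): Promise<ExecResult> {
    if (!this.live.has(sandbox.id)) {
      throw new SandboxError("terminated", `sandbox ${sandbox.id} is not running`);
    }
    if (argv[0] !== "echo") {
      throw new SandboxError("unsupported", "the fake only supports echo");
    }
    return { exitCode: 0, stdout: argv.slice(1).join(" ") };
  }

  async terminate(sandbox: SandboxHandle): Promise<void> {
    this.live.delete(sandbox.id);
  }

  isLive(id: string): boolean {
    return this.live.has(id);
  }
}

// Create a sandbox, run one command, and always terminate, even if exec throws.
async function runOnce(
  provider: SandboxProvider,
  argv: string[],
): Promise<{ id: string; result: ExecResult }> {
  const sandbox = await provider.create({ timeoutMs: 60_000 });
  try {
    return { id: sandbox.id, result: await provider.exec(sandbox, argv) };
  } finally {
    await provider.terminate(sandbox);
  }
}
```

Putting `terminate` in `finally` ensures a failed command never leaks a billable sandbox. The caller still sees the mapped `SandboxError`.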

## Verification

- `pnpm -r typecheck`
- `pnpm test:run`
- `pnpm build`
- `cd packages/plugins/sandbox-providers/modal && pnpm test`

## Risks

- Low to medium risk: this is isolated to a new plugin package, but
runtime behavior still depends on live Modal account credentials and
service-side sandbox semantics.
- Modal's current docs target a newer Node baseline than the repo
default, so the first live install should confirm credential loading and
sandbox startup behavior in a real Modal workspace.
- No UI or schema changes are included in this PR.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex via Paperclip `codex_local` agent (GPT-5-class Codex
coding model; exact backend model ID is not exposed by the runtime),
with tool use, shell execution, and code-editing capabilities enabled.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-18 08:36:34 -07:00
Dotta 5071c4c776 [codex] Add workspace diff viewer plugin (#6071)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Operators need to inspect what agents changed inside execution and
project workspaces.
> - The existing workspace detail views did not provide a first-party
rich diff surface for staged, unstaged, head, renamed, binary,
oversized, and untracked changes.
> - The plugin system is the intended extension point for optional rich
UI surfaces.
> - This pull request adds a workspace diff plugin plus host services
and shared contracts so Changes tabs can render workspace diffs through
plugin slots.
> - The diff-renderer dependency should stay owned by the plugin package
rather than the core UI app.
> - The dependency surface must stay aligned with repository PR policy,
including intentionally omitting `pnpm-lock.yaml` from the PR.
> - The benefit is a more reviewable workspace surface without
hard-coding the renderer into every page.

## What Changed

- Added `@paperclipai/plugin-workspace-diff`, including diff
normalization, plugin manifest/worker/UI entrypoints, and focused plugin
tests.
- Kept `@pierre/diffs` scoped to `@paperclipai/plugin-workspace-diff`;
removed the core UI lab diff-renderer surface and direct UI package
dependency.
- Added shared workspace diff types and validators, plus plugin SDK
surface for workspace diff host services.
- Added server workspace diff service support and route coverage for
execution/project workspace diff flows.
- Wired Execution Workspace and Project Workspace Changes tabs to load
the diff plugin, including loading/error fallback behavior.
- Added UI tests and fixtures for the Changes tabs and plugin bridge
behavior.
- Added the new plugin package manifest to the Docker deps stage so PR
policy can validate dependency coverage.
- Addressed review hardening around empty untracked patches, workspace
path exposure, project workspace read capability checks, and default
base refs.
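
Diff normalization is easiest to picture against Git's own status output. Below is a minimal sketch that assumes the host reads `git status --porcelain` (v1) output without `-z`. The `NormalizedChange` shape and the `parsePorcelainV1` name are hypothetical, not the plugin's actual types. The sketch covers the staged, unstaged, renamed, and untracked cases. Binary and oversized detection would need `git diff` data and is not modeled. In this mode Git quotes paths that contain unusual characters, and the sketch does not unquote them.

```typescript
type DiffSide = "staged" | "unstaged" | "untracked";

// Hypothetical normalized record for one changed path.
interface NormalizedChange {
  path: string;
  oldPath?: string; // set for renames/copies ("R  old -> new")
  sides: DiffSide[]; // a file can be both staged and unstaged ("MM")
  status: string; // raw two-character XY code from porcelain v1
}

function parsePorcelainV1(output: string): NormalizedChange[] {
  const changes: NormalizedChange[] = [];
  for (const line of output.split("\n")) {
    if (line.length < 4) continue; // skip blank or truncated lines
    const x = line.charAt(0); // index (staged) status
    const y = line.charAt(1); // worktree (unstaged) status
    const rest = line.slice(3);
    if (x === "?" && y === "?") {
      changes.push({ path: rest, sides: ["untracked"], status: "??" });
      continue;
    }
    const sides: DiffSide[] = [];
    if (x !== " ") sides.push("staged");
    if (y !== " ") sides.push("unstaged");
    const arrow = rest.indexOf(" -> ");
    changes.push(
      arrow >= 0
        ? { path: rest.slice(arrow + 4), oldPath: rest.slice(0, arrow), sides, status: x + y }
        : { path: rest, sides, status: x + y },
    );
  }
  return changes;
}
```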

## Verification

- `pnpm --filter @paperclipai/plugin-workspace-diff test`
- `pnpm exec vitest run
packages/shared/src/validators/workspace-diff.test.ts
server/src/__tests__/workspace-diff-service.test.ts
ui/src/pages/ProjectWorkspaceDetail.test.tsx
ui/src/pages/ExecutionWorkspaceDetail.test.tsx`
- `pnpm exec vitest run ui/src/plugins/bridge.test.ts
server/src/__tests__/workspace-runtime-routes-authz.test.ts`
- `pnpm --filter @paperclipai/shared typecheck`
- `pnpm --filter @paperclipai/plugin-workspace-diff typecheck`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm --filter @paperclipai/ui typecheck`
- `node ./scripts/check-docker-deps-stage.mjs`
- Browser screenshot captured from the local worktree dev server:
https://files.catbox.moe/ofdpsp.png
- Confirmed branch is rebased onto `public-gh/master`,
`.github/workflows/pr.yml` is not included in the PR diff,
`ui/package.json` is not included in the PR diff, and `pnpm-lock.yaml`
is not included in the PR diff.

## Risks

- Medium UI integration risk: the Changes tab depends on the plugin slot
and host diff service path.
- Medium dependency risk: this adds `@pierre/diffs` in the plugin
package, but `pnpm-lock.yaml` is intentionally omitted per packaging
instructions because repository automation manages lockfile updates.
- Current CI blocker: downstream frozen installs fail until the
repository policy path for new plugin package dependencies is chosen.
- Diff rendering edge cases are covered for common working-tree and head
diff states, but very large repositories may still expose performance
limits.
- No migrations are included.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 class coding model, tool-enabled local execution
environment. Exact context window was not exposed by the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-18 08:50:06 -05:00
Dotta d734bd43d1 [codex] Roll up May 17 branch changes (#6210)
## Thinking Path

> - Paperclip is the control plane for autonomous AI companies, so agent
work needs visible ownership, recovery, and operator controls.
> - This local branch had accumulated several related control-plane
reliability and operator-experience fixes across recovery actions,
watchdog folding, model-profile defaults, mentions, markdown editing,
plugin launchers, and small UI polish.
> - The branch needed to be converted into a PR against the current
`origin/master` without losing dirty work or including lockfile/workflow
churn.
> - The safest standalone shape is a single rollup PR because the
recovery/server/UI files overlap heavily across the local commits and
splitting would create avoidable conflicts.
> - This pull request replays the local branch onto latest
`origin/master`, preserves the uncommitted work as logical commits, and
adds a Zod 4 validator compatibility fix found during verification.
> - The benefit is that the May 17 local branch can be reviewed and
merged as one coherent, conflict-free branch under the 100-file Greptile
limit.

## What Changed

- Rebased the local May 17 branch work onto current `origin/master` in a
dedicated worktree.
- Preserved and committed previously dirty changes for recovery retry
handling, plugin/sidebar launcher polish, and `.herenow` ignores.
- Added recovery-action behavior for returning source issues to `todo`
when retrying source-scoped recovery.
- Included the existing local recovery/liveness/watchdog fold, Codex
cheap-profile, markdown/mention, duplicate-agent, and UI polish commits
from the branch.
- Normalized shared validator `z.record(...)` schemas to explicit
string-key records for Zod 4 compatibility.
- Confirmed the PR has no `pnpm-lock.yaml` or `.github/workflows/*`
changes and stays below the 100-file Greptile limit.

## Verification

- `pnpm install --frozen-lockfile --ignore-scripts`
- `npm run install` in
`node_modules/.pnpm/sqlite3@5.1.7/node_modules/sqlite3` to build the
local native sqlite3 binding after installing with scripts disabled
- `pnpm exec vitest run packages/shared/src/validators/issue.test.ts
packages/shared/src/project-mentions.test.ts
packages/adapter-utils/src/server-utils.test.ts
server/src/__tests__/heartbeat-model-profile.test.ts
server/src/__tests__/issue-recovery-actions.test.ts
server/src/__tests__/issue-agent-mutation-ownership-routes.test.ts
server/src/__tests__/heartbeat-active-run-output-watchdog.test.ts
server/src/__tests__/plugin-local-folders.test.ts
ui/src/components/IssueRecoveryActionCard.test.tsx
ui/src/components/Sidebar.test.tsx
ui/src/components/SidebarAccountMenu.test.tsx
ui/src/components/IssueProperties.test.tsx
ui/src/components/MarkdownEditor.test.tsx
ui/src/components/MarkdownBody.test.tsx
ui/src/lib/duplicate-agent-payload.test.ts
ui/src/pages/Routines.test.tsx`
- First pass: 13 files passed with 201 passing tests; 3 server files
failed before the sqlite3 native binding was built.
- After rebuilding sqlite3:
`server/src/__tests__/heartbeat-model-profile.test.ts`,
`server/src/__tests__/issue-recovery-actions.test.ts`, and
`server/src/__tests__/heartbeat-active-run-output-watchdog.test.ts`
passed/loaded; embedded Postgres tests were skipped by the local host
guard.
- `pnpm --filter @paperclipai/shared typecheck`
- `pnpm --filter @paperclipai/adapter-utils typecheck`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm --filter @paperclipai/ui typecheck`

## Risks

- Medium risk: this is a broad rollup PR across recovery semantics,
server tests, shared validators, and UI surfaces.
- Some embedded Postgres tests were skipped locally due to the host
guard, so CI should provide the stronger database-backed signal.
- UI changes were covered by component tests, but no browser screenshot
was captured in this PR creation pass.
- This branch may overlap with existing recovery/liveness PR work; merge
this PR independently or restack/close overlapping branches rather than
merging duplicate implementations together.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5-based coding agent, tool-enabled local repository
and GitHub workflow, medium reasoning effort.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-17 17:15:06 -05:00
Dotta 705c1b8d81 [codex] Add routine env secrets support (#6212)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Scheduled routines are the control-plane path for recurring agent
work.
> - Routines already had dispatch/history, but their runtime environment
did not carry routine-owned secret bindings through execution.
> - Operators need routine-specific secrets that can override
project/agent env without exposing secret values in history, logs, or
access events.
> - This pull request adds the routine env runtime contract, wires it
into execution, and makes the routine UI/history surfaces show safe
secret metadata.
> - The benefit is that routine executions can use scoped secret refs
predictably while preserving company boundaries and auditability.

## What Changed

- Added routine env persistence/runtime support, including
`routines.env`, `routine_runs.routine_revision_id`, revision snapshots,
and idempotent migration `0086_routine_env_runtime_contract`.
- Resolved routine env during heartbeat adapter config assembly with
precedence `agent < project < routine` and secret access events recorded
against the routine consumer.
- Added secret binding synchronization for routine create/update/restore
flows and guarded cross-company, missing, disabled, and deleted secret
cases.
- Added a Secrets tab to routine detail, env/secret history diff
rendering, and Storybook coverage for the new UI states.
- Added server/UI regression tests, including an embedded-Postgres QA
path for routine secret execution and restore behavior.
- Updated implementation/database docs for routine env and
secret-binding behavior.
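
The `agent < project < routine` precedence amounts to a layered merge in which later layers win, while anything logged carries only metadata. Here is a minimal sketch under those assumptions. `EnvBinding`, `resolveRoutineEnv`, and the `readSecret` callback are hypothetical names. The real service also records secret access events against the routine consumer and enforces company boundaries, and both are omitted here.

```typescript
// Hypothetical binding shapes: a plain value or a reference to a stored secret.
type EnvBinding =
  | { kind: "plain"; value: string }
  | { kind: "secret"; secretId: string };

type EnvLayer = Record<string, EnvBinding>;
type Source = "agent" | "project" | "routine";

interface ResolvedEnv {
  env: Record<string, string>; // values handed to the runtime only
  manifest: Record<string, { source: Source; kind: EnvBinding["kind"]; secretId?: string }>; // value-free, safe to log
}

function resolveRoutineEnv(
  layers: Record<Source, EnvLayer>,
  readSecret: (secretId: string) => string,
): ResolvedEnv {
  const merged: Record<string, { source: Source; binding: EnvBinding }> = {};
  // Later layers overwrite earlier ones: agent < project < routine.
  for (const source of ["agent", "project", "routine"] as const) {
    for (const [key, binding] of Object.entries(layers[source])) {
      merged[key] = { source, binding };
    }
  }
  const env: Record<string, string> = {};
  const manifest: ResolvedEnv["manifest"] = {};
  for (const [key, { source, binding }] of Object.entries(merged)) {
    if (binding.kind === "plain") {
      env[key] = binding.value;
      manifest[key] = { source, kind: "plain" };
    } else {
      env[key] = readSecret(binding.secretId); // may throw for missing/disabled secrets
      manifest[key] = { source, kind: "secret", secretId: binding.secretId };
    }
  }
  return { env, manifest };
}
```

Resolving secrets only after the merge means an overridden agent or project secret is never read.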

## Verification

- `pnpm install --frozen-lockfile` after rebasing onto
`public-gh/master` to refresh workspace links for the newly-added
upstream Grok adapter package.
- `pnpm exec vitest run
server/src/__tests__/heartbeat-project-env.test.ts
server/src/__tests__/routines-service.test.ts
server/src/__tests__/secrets-service.test.ts
server/src/__tests__/qa-routine-secrets-e2e.test.ts
ui/src/components/RoutineHistoryTab.test.tsx` passed: 5 files, 92 tests.
- `pnpm -r typecheck` passed across the workspace.
- `pnpm build` passed. Vite emitted the existing
large-chunk/dynamic-import warnings.
- UI screenshots were captured locally during QA in
`artifacts/pap-9521/` and `artifacts/pap-9522/`; generated screenshots
are not committed to avoid adding binary artifacts to the repo.

## Risks

- Migration risk is limited by `IF NOT EXISTS` guards for the new
columns, FK, and index, and the migration is ordered as `0086`
immediately after upstream `0085`.
- Runtime behavior changes env precedence for routine executions by
adding routine env as the highest-precedence layer; tests cover
agent/project/routine precedence.
- Secret handling is security-sensitive; tests cover value-free
manifests/events/errors, disabled/missing/deleted secrets, and
cross-company rejection.
- UI history now renders routine env/secret diffs; tests and Storybook
stories cover the main rendering paths.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex coding agent based on GPT-5, with shell/tool use and
medium reasoning effort.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-17 16:30:34 -05:00
Devin Foley 573e9ec909 fix(grok-local): restore turn boundaries in streaming reasoning text (#6142)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The `grok-local` adapter streams reasoning text to the issue
"Working..." panel as the grok CLI runs
> - The `grok` CLI's `--output-format streaming-json` mode silently
drops the `\n` separator between reasoning turns around tool calls
> - Consecutive `thought` chunks (e.g. `` "`" `` followed by `"The"`)
arrive with no intervening whitespace event, so the UI's `delta: true`
concatenator merged them into run-on text like `"…planningGreat, now I
have the issue descriptionThe only co"`
> - This PR adds a small turn-boundary helper that detects sentence
boundaries in the upstream `thought` stream and inserts a single `\n`
only when the previous chunk ended with sentence punctuation (or a
balanced closing backtick) AND the next chunk begins a new uppercase
sentence
> - The benefit is readable streaming reasoning in the UI without
changing how completed messages are stored

## What Changed

- Added `packages/adapters/grok-local/src/shared/turn-boundary.ts` with
per-stream state (last chunk + backtick parity) and a
`restoreTurnBoundary()` helper that inserts `\n` only between balanced,
sentence-terminated `thought` chunks
- Wired the helper into `parseGrokJsonl` (server) and added a new
`createGrokStdoutParser` factory used by `grokLocalUIAdapter` for the
live "Working..." panel
- Added focused tests in `shared/turn-boundary.test.ts`, plus regression
assertions in `server/parse.test.ts` and `ui/parse-stdout.test.ts`
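
The insertion rule can be sketched as a small stateful function. This sketch follows the conditions listed under Risks below: the previous chunk ends in `.`, `!`, `?`, or a balancing backtick; the next chunk starts an uppercase word of at least two letters; there is no whitespace at the join; and nothing is inserted inside an open code span. The state fields and function bodies here are assumptions, not the code in `turn-boundary.ts`.

```typescript
// Hypothetical per-stream state: last chunk seen plus backtick parity.
interface TurnBoundaryState {
  lastChunk: string;
  backtickParity: 0 | 1; // 1 while inside an unclosed `code` span
}

function createTurnBoundaryState(): TurnBoundaryState {
  return { lastChunk: "", backtickParity: 0 };
}

const SENTENCE_END = /[.!?]$/;
const NEW_SENTENCE = /^[A-Z][A-Za-z]/; // uppercase word of at least two letters

function restoreTurnBoundary(state: TurnBoundaryState, chunk: string): string {
  const prev = state.lastChunk;
  const balanced = state.backtickParity === 0;
  // A trailing backtick only ends a turn if it closed a span (parity back to 0).
  const prevEndsTurn = SENTENCE_END.test(prev) || (prev.endsWith("`") && balanced);
  const noWhitespace = !/\s$/.test(prev) && !/^\s/.test(chunk);
  const out =
    balanced && prevEndsTurn && noWhitespace && NEW_SENTENCE.test(chunk) ? `\n${chunk}` : chunk;

  // Track parity across every backtick in this chunk for the next call.
  const ticks = (chunk.match(/`/g) ?? []).length;
  state.backtickParity = ((state.backtickParity + ticks) % 2) as 0 | 1;
  state.lastChunk = chunk;
  return out;
}
```

Because only a single `\n` is prepended to the incoming chunk, the UI's delta concatenation still works unchanged.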

## Verification

- `pnpm --filter @paperclip/grok-local test` — 23/23 adapter tests pass
- `pnpm --filter @paperclip/grok-local typecheck` and UI typecheck —
clean
- Replayed an actual broken `grok 0.1.210` stream from the report;
previously-merged boundaries (`` `ls`The ``, `returned:Confirmed`) now
render with a separating newline; chunks inside un-closed backtick spans
are left alone

## Risks

- Low risk. Boundary insertion only fires when the previous chunk ends
with `.`/`!`/`?`/balanced `` ` `` and the next chunk begins with an
uppercase word of at least two characters, with no whitespace on either
side of the join. Worst case: a rare missed split or a misplaced newline
inside reasoning; both are purely cosmetic and confined to the live
streaming panel.

## Model Used

- Claude Opus 4.7 (claude-opus-4-7), Anthropic, extended thinking + tool
use via Claude Code

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-16 11:48:51 -07:00
Devin Foley ab8b471685 Add built-in grok_local adapter (#6087)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies, so
adapter quality directly affects what runtimes the control plane can
supervise.
> - Local CLI adapters are one of the core execution surfaces because
they turn real coding tools into Paperclip-managed employees with
heartbeats, transcripts, and reviewability.
> - Grok Build was installed on the Paperclip host, but Paperclip had no
built-in `grok_local` adapter, so the runtime could not be configured
through the normal server/UI/CLI adapter path.
> - That gap needed to be closed with the same built-in registry,
environment diagnostics, transcript parsing, and skill/instructions
behavior that the other local adapters already rely on.
> - After the initial adapter landed, a real follow-up run showed that
Grok streaming text was being rendered one fragment per line, which made
transcripts harder to read even though the runtime itself was working.
> - This pull request adds the built-in `grok_local` adapter end-to-end
and then fixes the transcript parser so streamed Grok output is
coalesced into readable assistant/thinking blocks.
> - The benefit is that Grok Build becomes a first-class Paperclip
runtime with a usable operator experience instead of a partially wired
runtime with noisy transcript output.

## What Changed

- Added a new built-in `@paperclipai/adapter-grok-local` package with
server, UI, and CLI entrypoints.
- Implemented Grok execution, session handling, environment diagnostics,
config building, skill syncing, and parser coverage inside the new
adapter package.
- Registered `grok_local` across the built-in adapter inventories and
capability/display metadata in server, UI, CLI, and shared constants.
- Added adapter route coverage for the new built-in type.
- Fixed Grok transcript readability by emitting streamed `text` and
`thought` fragments as deltas so the shared transcript builder coalesces
them into readable message blocks.
- Added regression tests for the Grok parser and transcript coalescing
behavior.
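
Coalescing deltas can be sketched as a fold: a fragment flagged as a delta is appended to the previous block when that block has the same kind. The `StreamEntry` and `TranscriptBlock` shapes and the `coalesce` function are hypothetical and simplified relative to the shared transcript builder.

```typescript
// Hypothetical transcript entry shapes, not the actual Paperclip types.
interface StreamEntry {
  kind: "assistant" | "thinking" | "tool";
  text: string;
  delta: boolean; // true when this fragment continues the previous block
}

interface TranscriptBlock {
  kind: StreamEntry["kind"];
  text: string;
}

function coalesce(entries: StreamEntry[]): TranscriptBlock[] {
  const blocks: TranscriptBlock[] = [];
  for (const entry of entries) {
    const last = blocks[blocks.length - 1];
    if (entry.delta && last !== undefined && last.kind === entry.kind) {
      last.text += entry.text; // extend the open block instead of adding a line
    } else {
      blocks.push({ kind: entry.kind, text: entry.text });
    }
  }
  return blocks;
}
```

Without the `delta` flag, each fragment would become its own block, which matches the one-fragment-per-line rendering described above.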

## Verification

- `pnpm vitest run
packages/adapters/grok-local/src/ui/parse-stdout.test.ts
ui/src/adapters/transcript.test.ts`
- `pnpm --filter @paperclipai/adapter-grok-local build`
- Manual runtime verification on the Paperclip host during
implementation and follow-up review:
  - confirmed the Grok CLI was installed and authenticated
  - confirmed the worktree dev server could be restarted cleanly and
health-checked after the parser follow-up
- No screenshots attached. This change is primarily adapter plumbing
plus transcript formatting behavior; reviewers can verify via the
Grok-backed run surfaces directly.

## Risks

- This adds a new built-in adapter, so any missed registration surface
could create inconsistencies between server, UI, and CLI behavior.
- The adapter depends on Grok Build's current event/output shape; if
upstream Grok streaming JSON changes, transcript parsing or session
extraction may need follow-up updates.
- The transcript readability fix intentionally changes how Grok
fragments are grouped, so any downstream code that implicitly expected
one entry per fragment would behave differently.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex via Paperclip `codex_local` agent runtime.
- GPT-5-class coding model with tool use, shell execution, file editing,
and repo inspection enabled.
- Exact backend model ID/context window were not surfaced to the agent
in this Paperclip session.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-16 09:51:09 -07:00
Dotta 4c47eb46c3 [codex] Add multilingual issue preservation coverage (#6069)
## Thinking Path

> - Paperclip orchestrates AI agents for autonomous companies.
> - Agents and board operators coordinate through company-scoped issues,
comments, documents, and heartbeat wake payloads.
> - Chinese, Japanese, and Hindi text needs to survive the full issue
lifecycle without normalization or prompt serialization damage.
> - The riskiest paths are board issue creation, server
issue/comment/document round-tripping, and scoped wake prompt rendering.
> - This pull request adds focused regression coverage across those
surfaces.
> - The benefit is higher confidence that multilingual operators and
agents can create, search, comment on, complete, and wake on issues
using non-Latin text.

## What Changed

- Added adapter-utils wake payload and prompt rendering coverage for
Chinese, Japanese, and Hindi issue/comment text.
- Added UI New Issue dialog coverage proving multilingual title and
description text is submitted unchanged.
- Added server route coverage that round-trips multilingual issue text
through create, search, comments, documents, completion comments, and
heartbeat context.
- Addressed Greptile feedback by using a typed storage mock and
splitting the server route integration path into smaller ordered
assertions.
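
"Without normalization" matters because Unicode normalization is not a no-op for these scripts. The snippet below is a minimal illustration, not one of the PR's tests, and its sample strings are made up. Japanese `が` decomposes under NFD. Devanagari `क़` (U+0958) is rewritten even by NFC because it is a composition exclusion. A lossless JSON round trip, by contrast, returns the exact code points.

```typescript
// Illustrative samples only; not the fixtures used in the PR.
const samples: Record<"zh" | "ja" | "hi", string> = {
  zh: "修复登录页面的问题",
  ja: "ログインが失敗する", // contains が (U+304C)
  hi: "\u0958\u093F\u0932\u093E", // क़िला, starting with U+0958
};

// Serialize the way a wake payload or API body would, then parse it back.
function roundTrips(text: string): boolean {
  const decoded = JSON.parse(JSON.stringify({ title: text })) as { title: string };
  return decoded.title === text;
}

// True when normalizing to `form` would change the stored code points.
function alteredBy(text: string, form: "NFC" | "NFD"): boolean {
  return text.normalize(form) !== text;
}
```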

## Verification

- `pnpm exec vitest run packages/adapter-utils/src/server-utils.test.ts
ui/src/components/NewIssueDialog.test.tsx
server/src/__tests__/multilingual-issues-routes.test.ts`
- Result: 3 test files passed, 51 tests passed.

## Risks

- Low risk: this PR adds regression coverage only and does not change
runtime behavior.
- The new server test uses embedded Postgres support and skips on
unsupported hosts using the existing helper pattern.
- No migrations are included.
- No `pnpm-lock.yaml` changes are included.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected; check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 based coding agent, with shell, git, Vitest, and
GitHub connector/CLI tool use.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-15 12:49:57 -05:00
Dotta eb38b226c2 Fix LLM Wiki package and migration validation (#6010)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Plugins extend the control plane with optional capabilities such as
LLM Wiki.
> - LLM Wiki needs its package assets and plugin-owned database
migrations to work when installed from the packaged plugin.
> - The bundled spaces migration used validation-hostile dynamic SQL,
and the packaged plugin could omit non-dist runtime assets.
> - This pull request makes the LLM Wiki package include its required
assets and cuts the spaces migration over to explicit, idempotent SQL
that passes the production plugin database validator.
> - The benefit is a simpler plugin install path that validates and
applies the bundled LLM Wiki migrations without adding plugin-specific
legacy handling to Paperclip core.

## What Changed

- Added the LLM Wiki package asset allowlist so agents, migrations,
skills, templates, dist output, and README are included when packaged.
- Renamed the bootstrap `.gitignore` template to `gitignore.template`
and updated the runtime lookup so package tooling does not drop the
hidden template file.
- Relaxed plugin migration validation to allow namespace-scoped
`INSERT`/`UPDATE` backfills and `CREATE INDEX` statements while
continuing to reject destructive or cross-namespace SQL.
- Replaced the LLM Wiki spaces migration's dynamic constraint-drop DO
block with explicit `DROP CONSTRAINT IF EXISTS` statements.
- Replaced fragile regex-source dispatch in SQL reference extraction
with explicit capture-group descriptors.
- Added regression coverage that applies the bundled LLM Wiki migrations
through the production validator and checks the expected constraints.

## Verification

- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/plugin-database.test.ts --pool=forks
--poolOptions.forks.isolate=true`
- `pnpm --filter @paperclipai/plugin-llm-wiki build`
- `git diff --check`
- Confirmed `pnpm-lock.yaml` is not included in the branch diff.

## Risks

- Low migration risk for current users: LLM Wiki spaces are new, so this
intentionally cuts over the plugin migration instead of adding legacy
handling in core.
- Validator behavior is broader than before, but still requires fully
qualified plugin namespace targets, blocks deletes/destructive DDL, and
keeps public table access read-only and allowlisted.

> Checked [`ROADMAP.md`](ROADMAP.md); this is a targeted plugin
packaging/migration fix and does not duplicate planned core feature
work. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 based coding agent, tool-enabled local repo
access, reasoning mode managed by the Paperclip/Codex runtime. Exact
context window was not surfaced in this session.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-15 10:20:02 -05:00
Dotta 03ad5c5bea [codex] Add issue document locking (#6009)
## Thinking Path

> - Paperclip orchestrates AI-agent companies through company-scoped
issues, comments, and issue documents.
> - Issue documents are the durable place where plans, handoffs, and
other work artifacts are revised over time.
> - Some documents need to be preserved as operator-approved snapshots
while agents continue working on the same issue.
> - Without document locking, a later board or agent write can overwrite
the document key that reviewers expected to remain stable.
> - This pull request adds board-managed issue document locks and makes
agent writes to locked keys create a derived document instead of
mutating the locked document.
> - The benefit is safer document handoffs: approved or frozen issue
documents stay immutable until the board explicitly unlocks them.

## What Changed

- Added `locked_at`, `locked_by_agent_id`, and `locked_by_user_id`
document fields plus migration `0085_tranquil_the_executioner.sql`.
- Added document lock/unlock service behavior, route endpoints, activity
events, and locked-document write protections.
- Made agent document writes to locked keys create a new derived key
such as `plan-2` rather than overwriting the locked document.
- Surfaced lock state through shared issue document types, UI API
methods, document header lock controls, and activity formatting.
- Added server and UI tests for lock/unlock behavior, locked document
immutability, and UI action visibility.
- Updated `doc/SPEC-implementation.md` with the V1 document lock
contract and endpoints.
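
A minimal sketch of the write rule described above. It assumes simplified `IssueDocument` and actor shapes, and `resolveDocumentWrite` and `nextDerivedKey` are hypothetical names rather than the service's actual functions. It also assumes derived keys count up from `-2` and skip keys that already exist on the issue.

```typescript
// Hypothetical, simplified shapes; `lockedAt` mirrors the new `locked_at` column.
interface IssueDocument {
  key: string;
  lockedAt: Date | null;
}

type Actor = { kind: "board"; userId: string } | { kind: "agent"; agentId: string };

type WriteDecision =
  | { kind: "write"; key: string }
  | { kind: "reject"; reason: string };

// Picks the first unused "<base>-<n>" key, starting at 2 (plan -> plan-2 -> plan-3).
function nextDerivedKey(baseKey: string, existingKeys: ReadonlySet<string>): string {
  for (let n = 2; ; n++) {
    const candidate = `${baseKey}-${n}`;
    if (!existingKeys.has(candidate)) return candidate;
  }
}

function resolveDocumentWrite(
  requestedKey: string,
  existing: IssueDocument | undefined,
  actor: Actor,
  existingKeys: ReadonlySet<string>,
): WriteDecision {
  // New or unlocked documents are written in place.
  if (!existing || existing.lockedAt === null) return { kind: "write", key: requestedKey };
  // Board writes to a locked document are rejected until the board unlocks it.
  if (actor.kind === "board") return { kind: "reject", reason: "document is locked" };
  // Agent writes are redirected to a derived key instead of mutating the locked document.
  return { kind: "write", key: nextDerivedKey(requestedKey, existingKeys) };
}
```

Retried agent writes each land on a fresh derived key in this sketch, which matches the "extra issue documents" risk noted below.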

## Verification

- `git rebase public-gh/master` completed cleanly after committing the
branch changes.
- `git diff --check` passed before commit.
- `pnpm run preflight:workspace-links && pnpm exec vitest run
server/src/__tests__/documents-service.test.ts
server/src/__tests__/issue-agent-mutation-ownership-routes.test.ts
ui/src/components/IssueDocumentsSection.test.tsx
ui/src/components/IssueContinuationHandoff.test.tsx
ui/src/lib/document-revisions.test.ts` passed: 5 files, 32 tests.

## Risks

- Medium risk because this changes the document persistence contract and
adds a migration.
- The migration uses `ADD COLUMN IF NOT EXISTS` and guarded foreign-key
creation so it remains safe for users who may have already applied an
earlier copy of the migration.
- Locked documents intentionally reject board edits/deletes/restores
until unlocked; any existing workflows that expected direct overwrite
need to unlock first.
- Agent writes to locked keys now create derived documents, which may
create extra issue documents when agents retry locked writes.

## Model Used

- OpenAI Codex coding agent based on GPT-5, with tool use and local code
execution in the Paperclip worktree.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-15 08:54:55 -05:00
Devin Foley 1bd44c8a0d Harden Cloudflare sandbox execution (#5967)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Remote-managed adapters need sandbox/environment execution to behave
like real agent runs, not just local host probes.
> - The Cloudflare sandbox path was the weakest leg in the SSH +
Cloudflare QA matrix because bridge execution could truncate output,
time out long-running installs, and under-provision the worker instance.
> - That made several adapters fail for reasons unrelated to their
actual business logic, which blocks confidence in Paperclip's non-local
environment model.
> - This pull request hardens the Cloudflare bridge/runtime path and
adjusts sandbox probe budgets so adapter verification matches the
measured behavior of the fixed environment.
> - It also corrects the Pi sandbox install command so the QA matrix
exercises a real, supported install path.
> - The benefit is a materially more reliable SSH + Cloudflare adapter
matrix with fewer false negatives and clearer failure boundaries.

## What Changed

- Switched the Cloudflare bridge worker instance type to `standard-2`
for the QA-matrix execution path.
- Raised Cloudflare bridge/plugin-worker timeout budgets and added SSE
keepalives so long-running install/exec calls can complete instead of
dying at the transport layer.
- Fixed Cloudflare bridge-channel command handling to avoid dropped
final stdout chunks on short-lived execs.
- Made Claude, OpenCode, and Cursor sandbox probe timeouts
configurable/sandbox-aware, then tightened the defaults to the measured
post-fix range.
- Updated the Pi sandbox install command to use the package currently
installed by the official `pi.dev` installer, pinned to a specific npm
version.
- Added/updated tests around Cloudflare bridge behavior and adapter
sandbox probe paths.
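
The description does not say exactly what dropped the final stdout chunk. One common cause in streaming bridges is never flushing the trailing partial buffer when the stream closes. The sketch below shows that pattern with a hypothetical `LineBuffer` class; it is not the bridge's actual code.

```typescript
// Accumulates streamed stdout bytes and emits complete lines.
class LineBuffer {
  private pending = "";
  private readonly decoder = new TextDecoder();
  private readonly onLine: (line: string) => void;

  constructor(onLine: (line: string) => void) {
    this.onLine = onLine;
  }

  push(chunk: Uint8Array): void {
    // `stream: true` keeps multi-byte characters split across chunks intact.
    this.pending += this.decoder.decode(chunk, { stream: true });
    let newline = this.pending.indexOf("\n");
    while (newline !== -1) {
      this.onLine(this.pending.slice(0, newline));
      this.pending = this.pending.slice(newline + 1);
      newline = this.pending.indexOf("\n");
    }
  }

  // Must run when the stream closes. Without it, a final line that lacks a
  // trailing newline (typical of short-lived commands) is silently lost.
  end(): void {
    this.pending += this.decoder.decode();
    if (this.pending.length > 0) this.onLine(this.pending);
    this.pending = "";
  }
}
```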

## Verification

- `pnpm --filter @paperclipai/adapter-claude-local typecheck`
- `pnpm --filter @paperclipai/adapter-opencode-local typecheck`
- `pnpm --filter @paperclipai/adapter-cursor-local typecheck`
- `pnpm vitest run packages/adapters/cursor-local
packages/adapters/claude-local packages/adapters/opencode-local
packages/adapters/pi-local packages/plugins/sandbox-providers/cloudflare
server/src/services/__tests__/plugin-worker-manager.test.ts`
- Manual QA on the dedicated dev instance using the SSH + Cloudflare
environment matrix (`ENV-29` through `ENV-40`). Clean end-to-end passes:
SSH `claude_local`, `codex_local`, `cursor`, `gemini_local`; Cloudflare
`claude_local`, `codex_local`, `cursor`, `gemini_local`.

## Risks

- Cloudflare sandbox cost increases because the bridge worker now runs
on `standard-2` instead of `lite`.
- Higher timeout ceilings can delay surfacing truly hung Cloudflare
bridge calls, even though they remove transport-level false negatives.
- The manual heartbeat matrix still exposed follow-on
execution/sync/disposition bugs in `opencode_local` and `pi_local`;
those are not fixed by this PR.

## Model Used

- OpenAI `gpt-5.4` via Paperclip `codex_local`, reasoning effort `high`,
tool use enabled, repo search enabled.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots (not applicable)
- [x] I have updated relevant documentation to reflect my changes (not
applicable)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-13 22:00:10 -07:00
Dotta 4142559c37 [codex] Add blocked inbox attention view (#5603)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies through
company-scoped issues, comments, approvals, and execution workspaces.
> - Operators need the Inbox to show not only active work, but also
blocked work that may need human or agent attention.
> - The existing inbox experience did not have a dedicated blocked-work
surface, so blocked tasks were harder to triage and resume deliberately.
> - Backend consumers also needed a compact attention signal that
distinguishes actionable blockers from covered or waiting blocker
states.
> - This pull request adds a Blocked Inbox tab backed by issue
blocker-attention metadata, shared validators, and UI helpers.
> - The benefit is a clearer triage path for stalled or blocked
Paperclip work without exposing external wait internals in the
operator-facing UI.

## What Changed

- Added shared issue blocker-attention types, validators, and exports
for the API/UI contract.
- Added backend blocker-attention computation and issue route support
for blocked inbox data.
- Added the Blocked Inbox tab, blocked reason chips, filtering/search
UI, responsive layouts, and Storybook stories.
- Updated inbox helpers and page behavior so toolbar controls only
appear where they apply.
- Added coverage for shared validators, server blocker-attention
behavior, blocked inbox UI helpers/components, and the Inbox page.
- Added a screenshot helper script for the blocked inbox Storybook
stories.
- Addressed Greptile feedback by making urgency sorting deterministic
for null stop times, avoiding full blocked-inbox list enrichment for
counts, and hardening the screenshot helper.
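
A sketch of one way to make urgency sorting deterministic when stop times can be null. The `BlockedInboxRow` shape, the `stoppedAt` field, and the oldest-first ordering are assumptions; the point is the explicit placement of nulls plus a final `id` tie-break, so the result never depends on input order.

```typescript
// Hypothetical row shape; `stoppedAt` stands in for the "stop time" mentioned above.
interface BlockedInboxRow {
  id: string;
  stoppedAt: Date | null;
}

// Oldest stop time first, rows without a stop time last, ties broken by id.
function compareUrgency(a: BlockedInboxRow, b: BlockedInboxRow): number {
  if (a.stoppedAt === null && b.stoppedAt !== null) return 1;
  if (a.stoppedAt !== null && b.stoppedAt === null) return -1;
  if (a.stoppedAt !== null && b.stoppedAt !== null) {
    const diff = a.stoppedAt.getTime() - b.stoppedAt.getTime();
    if (diff !== 0) return diff;
  }
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}
```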

## Verification

- Rebased the branch cleanly onto `public-gh/master`.
- Confirmed the diff does not include `pnpm-lock.yaml`.
- Confirmed the diff does not include database migration files.
- Ran `pnpm exec vitest run packages/shared/src/validators/issue.test.ts
server/src/__tests__/issue-blocker-attention.test.ts
ui/src/components/BlockedInboxView.test.tsx
ui/src/components/BlockedReasonChip.test.tsx
ui/src/lib/blockedInbox.test.ts ui/src/lib/inbox.test.ts
ui/src/pages/Inbox.test.tsx`.
- Ran `pnpm --filter @paperclipai/shared typecheck && pnpm --filter
@paperclipai/server typecheck && pnpm --filter @paperclipai/ui
typecheck`.
- Checked `ROADMAP.md`; this is scoped inbox/operator triage work and
does not duplicate a listed roadmap feature.
- Greptile Review is green on the latest head and all four Greptile
review threads are resolved.
- GitHub PR checks are green on the latest head: policy, security/snyk,
e2e, verify, Canary Dry Run, Greptile Review, and serialized server
suites 1/4 through 4/4.

## Risks

- Medium review surface because this touches the shared issue contract,
server issue services, and the Inbox UI together.
- Blocker-attention classification may need product tuning after
operators use it on real blocked queues.
- UI screenshots were not attached in this PR-opening pass; the branch
includes `scripts/screenshot-blocked-inbox.mjs` and Storybook stories
for visual capture.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

OpenAI Codex, GPT-5-based coding agent with shell, git, GitHub CLI,
GitHub connector, and Paperclip API tool use. Reasoning mode: medium.
Context window: not exposed by the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-13 16:41:36 -05:00
Dotta d1a8c873b2 fix(remote-sandbox): harden host workspace resumes (#5922)
## Thinking Path

> - Paperclip orchestrates AI agents through a control plane while
adapters execute work in local, remote, or sandboxed runtimes.
> - Remote sandbox execution depends on a strict host-versus-remote
workspace boundary: the host prepares/restores files, while the adapter
command runs inside the sandbox cwd.
> - Jannes' PR #5823 identified host-side failure modes that were not
covered by replacement PR #5822.
> - Persisting a remote pod cwd in session params could poison the next
host heartbeat resume and make Paperclip inspect or upload system temp
roots.
> - Plugin sandbox providers also need a narrow way to receive
model-provider API keys without exposing the full server environment to
every plugin worker.
> - This pull request ports the host-side fixes from #5823 in the
current codebase style, with focused regression coverage.
> - The benefit is safer remote sandbox resumes and plugin worker
environment handling without broadening core plugin privileges.

## What Changed

- Persisted the host workspace cwd, not the remote sandbox cwd, in
`claude_local` session params while retaining remote execution identity
metadata.
- Rejected saved session cwds that point at system roots, so heartbeat
falls back to the agent home workspace instead.
- Skipped sockets, FIFOs, devices, and other non-file entries during
workspace restore snapshot capture/comparison.
- Passed a small model-provider API-key allowlist only to plugins
declaring `environment.drivers.register`.
- Added focused regression tests for remote Claude session params,
unsafe session cwd detection, plugin worker env filtering, and non-file
snapshot entries.

Credits: ports host-side fixes from Jannes' #5823.
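
A rough sketch of the two host-side guards. The PR does not enumerate the allowlisted key names or the exact set of system roots, so `MODEL_PROVIDER_KEY_ALLOWLIST`, the roots checked in `isUnsafeSessionCwd`, and both function names are placeholders. Only the model-provider portion of a plugin worker's env is shown.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Placeholder allowlist; the real key names are not listed in this PR.
const MODEL_PROVIDER_KEY_ALLOWLIST: readonly string[] = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"];

// Only plugins declaring environment.drivers.register receive allowlisted keys;
// every other server env var (database URLs, secrets, ...) is withheld.
function modelProviderEnvForPlugin(
  serverEnv: Record<string, string | undefined>,
  capabilities: readonly string[],
): Record<string, string> {
  const env: Record<string, string> = {};
  if (!capabilities.includes("environment.drivers.register")) return env;
  for (const key of MODEL_PROVIDER_KEY_ALLOWLIST) {
    const value = serverEnv[key];
    if (value !== undefined) env[key] = value;
  }
  return env;
}

// Placeholder root list: a saved cwd that resolves to one of these is not resumed.
function isUnsafeSessionCwd(cwd: string): boolean {
  const resolved = path.resolve(cwd);
  const unsafeRoots = new Set(
    [path.parse(resolved).root, os.tmpdir(), "/tmp", "/var/tmp"].map((p) => path.resolve(p)),
  );
  return unsafeRoots.has(resolved);
}
```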

## Verification

- `pnpm vitest run
packages/adapter-utils/src/workspace-restore-merge.test.ts
server/src/services/session-workspace-cwd.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/plugin-database.test.ts` (25 passed, 7 skipped by
existing embedded-Postgres host guard)
- `pnpm --filter @paperclipai/adapter-utils typecheck`
- `pnpm --filter @paperclipai/adapter-claude-local typecheck`
- `pnpm --filter @paperclipai/server typecheck`

## Risks

- Low risk: changes are scoped to remote sandbox/session metadata,
workspace snapshot filtering, and plugin worker env setup.
- Sandbox-provider plugins now receive only the explicit model-provider
key allowlist; any provider needing another key name will need a
deliberate allowlist update.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5-based coding agent, tool-enabled local code
execution and repository editing.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-13 16:23:04 -05:00
Dotta b947a7d76c [codex] Improve local plugin development workflow (#5821)
## Thinking Path

> - Paperclip is the control plane for autonomous AI-agent companies.
> - Plugins are the extension point for adding capabilities without
expanding the core product surface.
> - Local plugin development needed a tighter CLI-first loop so plugin
authors can scaffold, run, install, inspect, and reload plugins without
reaching into internal package paths.
> - The server plugin install path also needed local-path handling that
keeps plugin identity, dashboard routes, and development watchers
coherent.
> - This pull request adds the CLI scaffold/install workflow, fixes the
server and SDK edge cases that blocked that loop, and updates the
agent-facing plugin creation skill and docs.
> - The benefit is that contributors can develop plugins from local
folders with a documented, repeatable happy path.

## What Changed

- Added `paperclipai plugin init` coverage and CLI wiring for local
plugin scaffolding.
- Improved local plugin install handling, plugin key route resolution,
dashboard capability behavior, and dev watcher startup/reload behavior.
- Fixed plugin SDK worker entrypoint validation for symlinked package
layouts.
- Added targeted tests for plugin init, server plugin authz/watcher
behavior, SDK worker host validation, and the authoring smoke example.
- Added a short local plugin development guide and refreshed the plugin
authoring guide plus `paperclip-create-plugin` skill instructions.
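
Symlinked layouts such as pnpm-style `node_modules` links are a typical reason a containment check rejects a valid entrypoint: one side is a symlink path while the other is a real path. A minimal sketch of a check that canonicalizes both sides first; `isEntrypointInsidePackage` is a hypothetical name, and the injectable `realpath` parameter exists only so the check can be exercised without touching the filesystem.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// True when `entrypoint` resolves to a file inside `packageDir`, after resolving
// symlinks on both sides so a linked package and its real location compare equal.
function isEntrypointInsidePackage(
  packageDir: string,
  entrypoint: string,
  realpath: (p: string) => string = (p) => fs.realpathSync(p),
): boolean {
  const realPackage = realpath(packageDir);
  const realEntry = realpath(path.resolve(packageDir, entrypoint));
  const rel = path.relative(realPackage, realEntry);
  // Inside means a non-empty relative path that does not climb out of the package.
  return rel !== "" && rel !== ".." && !rel.startsWith(`..${path.sep}`) && !path.isAbsolute(rel);
}
```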

## Verification

- `pnpm run preflight:workspace-links && pnpm --filter
@paperclipai/plugin-sdk build && pnpm --filter
@paperclipai/create-paperclip-plugin typecheck && pnpm --filter
paperclipai typecheck && pnpm --filter @paperclipai/plugin-sdk typecheck
&& pnpm --filter @paperclipai/server typecheck`
- `pnpm exec vitest run --project paperclipai
cli/src/__tests__/plugin-init.test.ts`
- `pnpm exec vitest run --project @paperclipai/plugin-sdk
packages/plugins/sdk/tests/worker-rpc-host.test.ts`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/plugin-dev-watcher.test.ts --pool=forks
--poolOptions.forks.isolate=true`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/plugin-routes-authz.test.ts --pool=forks
--poolOptions.forks.isolate=true`
- `pnpm --dir packages/plugins/examples/plugin-authoring-smoke-example
test`
- Confirmed `pnpm-lock.yaml` is not included in the PR diff.

## Risks

- Medium risk: this touches plugin install routing, CLI command
behavior, and the local development watcher.
- Local path plugin installs execute trusted local code by design; the
new docs call out that trust boundary.
- No database migrations are included.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool-enabled local shell and git
workflow, medium reasoning effort. Context window details were not
exposed in this runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

UI screenshots: not applicable; this PR changes CLI/server/plugin docs
and tests, not board UI rendering.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-12 17:38:24 -05:00
Dotta 0808b388ee [codex] Add source-scoped recovery actions (#5599)
## Thinking Path

> - Paperclip is a control plane for autonomous AI companies, where work
must end with a clear disposition rather than ambiguous agent liveness.
> - Recovery currently detects stalled or missing-next-step issues, but
source issue recovery can become split across child recovery issues,
blockers, and comments.
> - That makes it harder for operators and agents to see who owns
recovery and what exact action is needed on the original issue.
> - Source-scoped recovery actions give the original issue a first-class
active recovery state with owner, evidence, wake policy, and resolution
outcome.
> - This pull request adds the recovery-action data model, backend
reconciliation and resolution APIs, and board UI indicators/actions.
> - The benefit is clearer stalled-work recovery without losing source
issue context or relying on comments as the liveness path.

## What Changed

- Added the `issue_recovery_actions` schema, shared
types/constants/validators, and an idempotent
`0084_issue_recovery_actions` migration ordered after current `master`
migrations.
- Updated stranded/missing-disposition recovery to create source-scoped
recovery actions, wake the recovery owner on the source issue, and avoid
locking the source issue for recovery-action wakes.
- Added API support for reading active recovery actions on issue
detail/list surfaces and resolving them with restored, blocked,
cancelled, or false-positive outcomes.
- Required blocked recovery resolutions to have an unresolved first-class
blocker, and removed the UI shortcut that could mark recovery blocked
without a blocker selection path.
- Surfaced recovery indicators/actions in the issue UI, blocker notices,
active run panels, issue rows, and Storybook coverage.
- Updated docs and focused tests for recovery semantics, ownership,
races, stale comments, and UI behavior.
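
The blocked-resolution guard can be sketched as a small validator. The outcome strings, the `Blocker` shape, and `validateRecoveryResolution` are assumptions for illustration, not the shared validator's actual API.

```typescript
// Outcome names follow the PR text; the exact enum values are assumptions.
type RecoveryOutcome = "restored" | "blocked" | "cancelled" | "false_positive";

interface Blocker {
  issueId: string;
  resolved: boolean;
}

interface ResolveRecoveryRequest {
  outcome: RecoveryOutcome;
  blockerIssueId?: string;
}

// Returns an error message, or null when the resolution is acceptable.
function validateRecoveryResolution(
  req: ResolveRecoveryRequest,
  sourceIssueBlockers: readonly Blocker[],
): string | null {
  // Only the "blocked" outcome needs extra evidence.
  if (req.outcome !== "blocked") return null;
  if (!req.blockerIssueId) return "blocked resolutions must name a blocker issue";
  const blocker = sourceIssueBlockers.find((b) => b.issueId === req.blockerIssueId);
  if (!blocker) return "blocker is not a first-class blocker of the source issue";
  if (blocker.resolved) return "blocker is already resolved";
  return null;
}
```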

## Verification

- `pnpm exec vitest run
server/src/__tests__/issue-recovery-actions.test.ts
server/src/__tests__/heartbeat-process-recovery.test.ts
ui/src/components/IssueRecoveryActionCard.test.tsx
ui/src/components/IssueBlockedNotice.test.tsx ui/src/api/issues.test.ts`
— 5 files, 72 tests passed.
- `pnpm --filter @paperclipai/shared typecheck` — passed.
- `pnpm --filter @paperclipai/db typecheck` — passed, including
migration numbering check.
- `pnpm --filter @paperclipai/server typecheck` — passed.
- `pnpm --filter @paperclipai/ui typecheck` — passed.
- Follow-up verification after blocker-resolution guard: `pnpm exec
vitest run server/src/__tests__/issue-recovery-actions.test.ts
ui/src/components/IssueRecoveryActionCard.test.tsx
ui/src/api/issues.test.ts` — 3 files, 27 tests passed.
- Follow-up `pnpm --filter @paperclipai/server typecheck` — passed.
- Follow-up `pnpm --filter @paperclipai/ui typecheck` — passed.
- UI states are available in
`ui/storybook/stories/source-issue-recovery.stories.tsx`; screenshot
capture helper is `scripts/screenshot-recovery-card.cjs`.

## Risks

- Medium: recovery behavior changes from child recovery issue ownership
toward source-scoped actions, so operators may see stalled-work state in
new places.
- Migration risk is mitigated by using the next migration slot after
`master` and making the table/constraints/index creation idempotent for
anyone who previously applied the old branch-local
`0082_dizzy_master_mold` migration.
- Existing child recovery issue paths are still guarded for
already-created recovery issues, but new source-scoped flows should be
watched in CI and Greptile review.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool use enabled for shell, Git,
GitHub, and local test execution. Context window not exposed by the
runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-12 09:37:15 -05:00
Devin Foley c445e59256 fix(ui): fix message attribution for agent-posted comments with user author IDs (#5780)
## Thinking Path

> - Paperclip’s issue chat is an audit surface: reviewers need to trust
who actually authored a message.
> - Some historical agent comments were persisted with `authorUserId`
and no surviving `createdByRunId`, so the UI rendered real agent output
as if it came from the board user.
> - A pure timestamp-window fallback is too risky because human
reviewers can comment while agents are running.
> - The safe recovery path is to derive attribution only when the server
can prove it from same-issue run logs that include the exact posted
comment id, then let the chat renderer prefer that recovered agent
attribution.
> - This keeps historical threads trustworthy without mutating old
database rows or guessing in ambiguous cases.

## What Changed

- Added shared `IssueComment` fields for derived attribution so server
and UI can carry recovered `derivedAuthorAgentId`,
`derivedCreatedByRunId`, and `derivedAuthorSource` consistently.
- Added server-side attribution recovery in
`server/src/services/issues.ts` that reads same-issue run logs and only
derives agent authorship when a run log contains the exact `comment id:
...` emitted during posting.
- Updated issue chat rendering in `ui/src/lib/issue-chat-messages.ts` to
prefer direct agent authorship, then activity-log `runAgentId`, then the
server-derived attribution.
- Removed the unsafe UI-only run-window fallback from
`ui/src/pages/IssueDetail.tsx` so human comments posted during an active
run are not silently relabeled as agent output.
- Added regression coverage for both the run-log derivation path and the
chat-rendering fallback behavior.
- Bounded server-side run-log enrichment to 8 concurrent reads per
request and removed the unused `issueCommentSchema` declaration during
PR cleanup.
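
The rendering precedence described above, sketched as a pure function. Apart from `authorUserId` and `derivedAuthorAgentId`, the field and function names are hypothetical, and the real renderer in `ui/src/lib/issue-chat-messages.ts` may be structured differently.

```typescript
// Simplified comment shape; `authorAgentId` is a hypothetical field name.
interface CommentForChat {
  authorAgentId: string | null;
  authorUserId: string | null;
  derivedAuthorAgentId?: string | null;
}

type ChatAuthor = { kind: "agent"; agentId: string } | { kind: "user"; userId: string | null };

function resolveChatAuthor(comment: CommentForChat, activityRunAgentId: string | null): ChatAuthor {
  // 1. Direct agent authorship recorded on the comment itself.
  if (comment.authorAgentId) return { kind: "agent", agentId: comment.authorAgentId };
  // 2. The run agent recorded in the activity log for this comment.
  if (activityRunAgentId) return { kind: "agent", agentId: activityRunAgentId };
  // 3. Server-derived attribution, present only when a run log proved it.
  if (comment.derivedAuthorAgentId) return { kind: "agent", agentId: comment.derivedAuthorAgentId };
  // No proof: keep the stored user authorship rather than guessing.
  return { kind: "user", userId: comment.authorUserId };
}
```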

## Verification

- `pnpm exec vitest run ui/src/lib/issue-chat-messages.test.ts
server/src/__tests__/issues-service.test.ts`
- `pnpm test:run:general`
- Live validation on May 12, 2026 in `PAPA-322`: confirmed the
previously misattributed historical comments on `PAPA-316` now render as
Claude-authored on `http://goldie.gerbil-company.ts.net:3100`.
- Reviewer check: open `PAPA-316` in the running instance and confirm
historical comments such as `## Investigation: exe.dev 422 + codex
re-test` render under Claude instead of the board user.

## Risks

- Low risk. The change is scoped to comment attribution recovery and
rendering.
- Derived attribution is intentionally conservative: if there is no
exact run-log proof, the comment remains user-authored instead of
guessing.
- Run-log recovery depends on retained same-issue logs, so older
comments without that evidence remain unchanged.

## Model Used

- OpenAI Codex via the Paperclip `codex_local` adapter (GPT-5-class
coding agent with tool use in the local Paperclip runtime; the exact
deployment/model ID is not surfaced by this workspace).

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-12 01:20:49 -07:00
Dotta 563413ecd4 Fix LLM wiki type contracts (#5758)
## Thinking Path

> - Paperclip is the control plane for autonomous AI companies, and
plugins extend that control plane without bloating core.
> - The LLM Wiki plugin adds a knowledge surface through the plugin
runtime and shared plugin UI components.
> - After the LLM Wiki work merged to `master`, CI exposed TypeScript
contract drift between plugin code, SDK component types, and update
settings types.
> - The ingestion settings update path intentionally accepts partial
source toggles, but its type intersected with the full settings shape
and required every source key.
> - The LLM Wiki UI also passes managed routine default-drift metadata
through the shared routine list item shape, but that metadata was
missing from the public item type.
> - This pull request narrows those type contracts to match the existing
runtime behavior.
> - The benefit is restoring typecheck on `master` with a small,
non-behavioral follow-up.

## What Changed

- Added a `WikiEventIngestionSettingsUpdate` type that permits partial
source updates without weakening normalized stored settings.
- Added managed routine default-drift metadata to the plugin SDK
`ManagedRoutinesListItem` type.
- Mirrored that managed routine default-drift type in the host UI
component item type.
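
A simplified sketch of the shape of the fix, with hypothetical source keys and a hypothetical `applyIngestionUpdate` helper. Intersecting an update type with the full settings type re-imposes every required `sources` key; a separate update type with optional toggles avoids that, while the stored settings remain fully normalized.

```typescript
// Hypothetical source keys; the plugin's real keys are not listed in this PR.
type WikiSourceKey = "issues" | "comments" | "documents";

// Stored settings: every source toggle is always present.
interface WikiEventIngestionSettings {
  enabled: boolean;
  sources: Record<WikiSourceKey, boolean>;
}

// Update payloads: every field and every individual source toggle is optional.
type WikiEventIngestionSettingsUpdate = {
  enabled?: boolean;
  sources?: Partial<Record<WikiSourceKey, boolean>>;
};

// Omitted toggles keep their current value, so the result is fully normalized again.
function applyIngestionUpdate(
  current: WikiEventIngestionSettings,
  update: WikiEventIngestionSettingsUpdate,
): WikiEventIngestionSettings {
  const sources: Record<WikiSourceKey, boolean> = { ...current.sources };
  const requested: Partial<Record<WikiSourceKey, boolean>> = update.sources ?? {};
  for (const key of Object.keys(requested) as WikiSourceKey[]) {
    const value = requested[key];
    if (value !== undefined) sources[key] = value;
  }
  return { enabled: update.enabled ?? current.enabled, sources };
}
```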

## Verification

- `pnpm --filter @paperclipai/plugin-llm-wiki typecheck`
- `pnpm --filter @paperclipai/plugin-sdk typecheck`
- `pnpm --filter @paperclipai/ui typecheck`
- `git diff --check`

## Risks

- Low risk. This is a TypeScript type-contract fix only; no runtime
behavior or database schema changes.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5-based coding agent, tool-enabled local repository
editing and command execution.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Notes on checklist applicability: no screenshots are included because
the UI change is a shared type-only contract update with no visual
behavior change; no docs were required because no behavior or commands
changed.

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 21:07:06 -05:00
Dotta 508355b8fc [codex] Add LLM Wiki plugin package to master (#5716)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The plugin system is the extension surface for optional product
capabilities without baking every workflow into core.
> - The LLM Wiki plugin package was reviewed in stacked PR #5592, which
targeted `pap-9173-llm-wiki-rest`.
> - The stack base PR #5597 merged to `master` before #5592 was merged
into that branch, so the plugin package never reached `master`.
> - A direct PR from `pap-9173-llm-wiki-rest` back to `master` would be
noisy because that branch has diverged from current `master`.
> - This pull request reapplies the reviewed
`packages/plugins/plugin-llm-wiki/` package onto current `master` and
updates Docker deps-stage manifest coverage.
> - The branch intentionally no longer changes `pnpm-workspace.yaml`
after maintainer feedback; because the new package is now a root
workspace importer, the remaining integration question is how
maintainers want the root lockfile handled under the current PR policy.

## What Changed

- Added the LLM Wiki plugin package under
`packages/plugins/plugin-llm-wiki/` from the merged PR #5592 head.
- Preserved the post-review cleanup from #5592: generated
design/screenshot artifacts are not committed, and `src/ui/index.tsx` /
`src/wiki.ts` are small public entrypoints.
- Added the new plugin package manifest to the Docker deps stage so
policy can validate package manifest coverage.
- Removed the earlier `pnpm-workspace.yaml` exclusion per maintainer
request, so the plugin is included by the existing `packages/plugins/*`
workspace glob.

## Verification

Current head:
- PGlite migration harness: ran migrations 001-003, verified old
non-space distillation unique constraints were removed, inserted
duplicate cursor and work-item keys in a second space, then reran
migration 003 successfully
- `node ./scripts/check-docker-deps-stage.mjs`
- `git diff --check`

Known current-head install result after removing the workspace
exclusion:
- `pnpm install --frozen-lockfile` fails because `pnpm-lock.yaml` has no
importer for `packages/plugins/plugin-llm-wiki/package.json`.

Previously verified on the same plugin source before the
workspace-exclusion removal:
- `pnpm --filter @paperclipai/plugin-sdk build`
- `cd packages/plugins/plugin-llm-wiki && pnpm install --lockfile=false
&& pnpm test`

## Risks

- The branch now includes `packages/plugins/plugin-llm-wiki` in the root
workspace but does not update `pnpm-lock.yaml`. Root frozen install will
fail until maintainers choose a lockfile path that fits repo policy.
- Committing `pnpm-lock.yaml` directly on this PR conflicts with the
current PR policy check, while excluding the package from
`pnpm-workspace.yaml` was rejected in maintainer feedback.
- The package includes UI code already reviewed in #5592; generated
screenshot/design artifacts were intentionally removed per maintainer
request, so visual review should regenerate screenshots locally if
needed.
- The package depends on plugin host support from #5597, which is
already merged to `master`.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI GPT-5 Codex via Codex CLI, tool use and local code execution
enabled; context window not exposed.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run the targeted checks listed above
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Stack context: #5592 was merged into `pap-9173-llm-wiki-rest` after
#5597 had already merged that branch to `master`, so this follow-up PR
is needed to carry the plugin package itself into `master`.

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 20:45:41 -05:00
Devin Foley ad0bb57350 Fix exe.dev sandbox installs for gemini/opencode local adapters (#5737)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies, including
running adapter CLIs inside remote sandboxes
> - The QA matrix in PAPA-316 spins up local-runtime adapters
(claude/gemini/opencode) against both SSH and the new exe.dev sandbox
provider, and "Test" exercises the same install + probe path the real
runtime uses
> - On exe.dev the QA matrix failed at three different points:
SSH/sandbox secret refs would not resolve, gemini-local could not find
npm, and opencode-local installed a binary that was not on the
probe-shell PATH
> - These are all environment-shape issues the runtime should handle,
not regressions in any individual adapter, so they need to be fixed in
the shared install/resolve layer before the matrix can pass
> - This pull request wires the environment id through to secret-ref
resolution, bootstraps npm from a portable Node tarball when the sandbox
image lacks Node, and symlinks the opencode binary into a directory that
non-login shells see
> - The benefit is that the QA matrix passes end-to-end on exe.dev, and
any future sandbox provider that ships without Node or relies on rc-file
PATH wiring gets the same fixes for free

## What Changed

- `server/src/services/environment-execution-target.ts`: pass the
environment `id` into `resolveEnvironmentDriverConfigForRuntime` for
both the sandbox and SSH branches, so `privateKeySecretRef` /
sandbox-provider secret refs (e.g. exe.dev `apiKey`) can resolve against
the secret store at runtime instead of throwing `Runtime secret
resolution requires an environment id`.
- `packages/adapter-utils/src/sandbox-install-command.ts`: extend
`buildSandboxNpmInstallCommand` with an `ENSURE_NPM_PREAMBLE` that, when
`npm` is missing, downloads a portable Node v22 tarball into
`$HOME/.local` and sets `PAPERCLIP_NPM_BOOTSTRAPPED=1` so the install
step skips sudo (sudo's `secure_path` would lose the freshly-installed
`npm` in `$HOME/.local/bin`). Distro-packaged Node from apt-get is
intentionally avoided because it tends to be too old to parse modern JS
syntax used by `@google/gemini-cli`.
- `packages/adapters/gemini-local/src/index.ts`: switch the hardcoded
`npm install -g @google/gemini-cli` to `buildSandboxNpmInstallCommand`,
so gemini-local picks up the same sudo-aware + npm-bootstrap behavior as
the other local adapters.
- `packages/adapters/opencode-local/src/index.ts`: append a step to the
install command that symlinks `$HOME/.opencode/bin/opencode` into
`$HOME/.local/bin`. The upstream installer only adds `~/.opencode/bin`
to PATH via `~/.bashrc`, which non-login `sh -c` probe invocations do
not source.
- `packages/adapter-utils/src/sandbox-install-command.test.ts`: cover
the new preamble plus the unchanged root/sudo/user-prefix branches.
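
A minimal sketch of the preamble-plus-install shape described above, written as a hypothetical TypeScript command builder. It is not the real `buildSandboxNpmInstallCommand`: the function names, the `x64`/`arm64` arch mapping, the `.tar.gz` tarball choice, and the exact branch order are assumptions. It only illustrates why the bootstrap flag matters: once Node lands in `$HOME/.local`, the install step has to stay out of `sudo`.

```typescript
// Hypothetical sketch of an npm-bootstrap preamble plus install step.
// Not the real buildSandboxNpmInstallCommand; names and branches are illustrative.
const BOOTSTRAP_NODE_VERSION = "v22.11.0";

function ensureNpmPreamble(nodeVersion: string = BOOTSTRAP_NODE_VERSION): string {
  // $NODE_ARCH is left for the shell to expand (no "${" so JS does not interpolate it).
  const tarball = `https://nodejs.org/dist/${nodeVersion}/node-${nodeVersion}-linux-$NODE_ARCH.tar.gz`;
  return [
    // Only bootstrap when npm is genuinely missing.
    "if ! command -v npm >/dev/null 2>&1; then",
    '  case "$(uname -m)" in aarch64|arm64) NODE_ARCH=arm64 ;; *) NODE_ARCH=x64 ;; esac',
    '  mkdir -p "$HOME/.local"',
    // Unpack the portable Node tree (bin/, lib/, ...) directly into $HOME/.local.
    `  curl -fsSL "${tarball}" | tar -xz --strip-components=1 -C "$HOME/.local"`,
    '  export PATH="$HOME/.local/bin:$PATH"',
    // Tell the install step to skip sudo, whose secure_path would hide $HOME/.local/bin.
    "  export PAPERCLIP_NPM_BOOTSTRAPPED=1",
    "fi",
  ].join("\n");
}

function buildNpmInstallSketch(pkg: string): string {
  // Single-quote the package name for the shell, escaping embedded quotes.
  const quoted = `'${pkg.replace(/'/g, "'\\''")}'`;
  const install = [
    'if [ "${PAPERCLIP_NPM_BOOTSTRAPPED:-0}" = "1" ]; then',
    `  npm install -g --prefix "$HOME/.local" ${quoted}`,
    'elif [ "$(id -u)" = "0" ]; then',
    `  npm install -g ${quoted}`,
    "elif command -v sudo >/dev/null 2>&1 && sudo -n true 2>/dev/null; then",
    `  sudo npm install -g ${quoted}`,
    "else",
    `  npm install -g --prefix "$HOME/.local" ${quoted}`,
    "fi",
  ].join("\n");
  return `${ensureNpmPreamble()}\n${install}`;
}
```

Running `buildNpmInstallSketch("@google/gemini-cli")` yields one `sh` script whose first branch is taken only on images without Node, so images that already ship npm go straight to the root/sudo/user-prefix branches.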

## Verification

- `cd packages/adapter-utils && npm test -- sandbox-install-command`
(passes; new "bootstraps npm from a portable Node tarball when missing"
case is included).
- Manual: ran the in-app `Test` action against the QA matrix dev
instance for `QA exe.dev Claude`, `QA exe.dev Gemini`, and `QA exe.dev
OpenCode` — all three now report `status=pass` including the hello
probe. `QA SSH Claude` also passes; without the environment-id fix, SSH
resolution threw before the wrapper / install fixes could run.
- Suggested reviewer check: re-run the matrix on a fresh exe.dev
environment and confirm the install step no longer hits `npm: command
not found` for gemini and the opencode probe no longer hits `opencode:
command not found`.

## Risks

- Low/medium. The npm bootstrap pins Node `v22.11.0` from
`nodejs.org/dist`; if that URL becomes unreachable the install will fail
with a clear `curl` error rather than corrupting state. The bootstrap
path is only taken when `npm` is genuinely missing, so existing sandbox
images that ship with Node are unaffected.
- The opencode symlink uses `ln -sf` into `$HOME/.local/bin`, which is
created with `mkdir -p`; idempotent on re-install.
- The `id` change is strictly additive: callers previously got
`undefined` and only the secret-ref code paths actually read it. No
behavior change for environments without secret refs.

## Model Used

- Claude (Anthropic), `claude-opus-4-7`, with extended thinking and tool
use enabled. Iterated through the Paperclip QA matrix harness; no other
model assisted.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots (n/a — runtime/install path only)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 14:28:22 -07:00
Devin Foley 5a64cf52a1 Add exe.dev sandbox provider plugin (#5688)
> _Stacked on top of #5685, #5686, and #5687. Diff against master includes
commits from earlier PRs in the stack — review focuses on the two new
commits (`Add long-secret textarea variant to JsonSchemaForm
SecretField` + `Add exe.dev sandbox provider plugin`)._

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent runs in a sandbox environment, and operators choose the
provider — today E2B, Daytona, and (in this stack) Cloudflare
> - exe.dev offers per-VM sandboxes via a small CLI / HTTP API — useful
for operators who want full Linux VMs (vs container/runtime-only
sandboxes)
> - The plugin shape mirrors the e2b plugin: lifecycle hooks (`new`,
`ls`, `rm`) drive exe.dev's CLI; SSH plumbing handles direct VM access
for adapters that need it
> - exe.dev VMs come up bare — `node` is not preinstalled, so the
Paperclip sandbox callback bridge (a Node script) needs Node 20
installed at VM init via `--setup-script`. The plugin defaults the setup
script to a Nodesource install
> - The auth field accepts long SSH private keys, which need a textarea
variant of the existing `SecretField` in `JsonSchemaForm` — added behind
a `maxLength > THRESHOLD` opt-in so other secret fields are unaffected
> - The benefit is that operators get exe.dev as a fully working sandbox
provider out of the box, with no manual VM provisioning required

## What Changed

**Shared UI support (`Add long-secret textarea variant to JsonSchemaForm
SecretField`):**

- `ui/src/components/JsonSchemaForm.tsx` + new
`JsonSchemaForm.test.tsx`: when a secret-formatted field declares
`maxLength` larger than the existing single-line threshold, render a
monospace textarea instead of the masked input. Short secrets (API keys,
tokens) keep the existing masked-input + show/hide toggle behavior.
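
The opt-in rule above can be sketched as a pure decision function. This is a hypothetical illustration, not the code in `JsonSchemaForm.tsx`: the real threshold constant is not stated in this PR, so `256` below is an assumed value, and the schema type is trimmed to the one field that matters.

```typescript
// Hypothetical sketch of the widget choice for secret-formatted string fields.
// The threshold value is an assumption; the real constant lives in JsonSchemaForm.tsx.
interface SecretFieldSchema {
  type: "string";
  maxLength?: number;
}

type SecretWidget = "masked-input" | "monospace-textarea";

const SINGLE_LINE_SECRET_THRESHOLD = 256; // assumed value, for illustration only

function chooseSecretWidget(schema: SecretFieldSchema): SecretWidget {
  // Fields that never declare maxLength keep the existing masked input + show/hide toggle.
  if (schema.maxLength === undefined) return "masked-input";
  // Only an explicit opt-in above the threshold switches to the textarea.
  return schema.maxLength > SINGLE_LINE_SECRET_THRESHOLD ? "monospace-textarea" : "masked-input";
}
```

With this shape, the exe.dev SSH key field (`maxLength: 4096`) gets the textarea, while API-key fields with no `maxLength` render exactly as before.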

**The exe.dev plugin (`Add exe.dev sandbox provider plugin`):**

- `packages/plugins/sandbox-providers/exe-dev/`: plugin entry, manifest,
plugin runtime, README, and 19-test Vitest suite.
- Manifest fields: API token (with `secret-ref` + `/exec` permission
notes — needs `new`, `ls`, `rm`), API URL override, optional SSH
username, optional SSH private key (uses the new `JsonSchemaForm`
textarea variant via `maxLength: 4096`), optional SSH identity-file
path, optional setup script.
- Default `--setup-script` is a Nodesource Node 20 install. exe.dev VMs
come up bare and the Paperclip sandbox callback bridge is a Node script,
so without Node preinstalled the bridge can't start. Operators can
override by supplying their own setup script.
- `runLifecycleCommand` redacts env values from the executed command
before surfacing it in error messages, so secrets passed via
`--env=KEY=VALUE` don't leak into operator-visible failures.
- The plugin distinguishes exe.dev's SSH onboarding failures (`Please
complete registration by running: ssh exe.dev`) from general SSH
failures and surfaces a clear remediation message.
- `scripts/release-package-manifest.json`: register the new plugin for
CI publish alongside the existing daytona / e2b providers.
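
The redaction and onboarding-detection bullets above can be sketched together. These helpers are hypothetical (the plugin's real `runLifecycleCommand` internals are not shown in this PR); they assume arguments arrive as an argv array and only handle the `--env=KEY=VALUE` shape the PR mentions. The onboarding marker string is taken from the PR text.

```typescript
// Hypothetical helpers; not the plugin's actual runLifecycleCommand internals.
const ENV_FLAG = "--env=";
const ONBOARDING_MARKER = "Please complete registration by running: ssh exe.dev";

// Replace the VALUE in every --env=KEY=VALUE argument, keeping KEY for debuggability.
function redactEnvArgs(args: readonly string[]): string[] {
  return args.map((arg) => {
    if (!arg.startsWith(ENV_FLAG)) return arg;
    const assignment = arg.slice(ENV_FLAG.length);
    const eq = assignment.indexOf("=");
    if (eq === -1) return arg; // --env=KEY without a value: nothing to hide
    return `${ENV_FLAG}${assignment.slice(0, eq)}=[REDACTED]`;
  });
}

// Build the operator-visible error, special-casing exe.dev's onboarding failure.
function describeLifecycleFailure(args: readonly string[], stderr: string): string {
  const shown = redactEnvArgs(args).join(" ");
  if (stderr.includes(ONBOARDING_MARKER)) {
    return `exe.dev registration is incomplete: run \`ssh exe.dev\` once, then retry (command: ${shown})`;
  }
  return `exe.dev command failed (command: ${shown}): ${stderr.trim()}`;
}
```

Splitting on the first `=` only means values that themselves contain `=` (base64 tokens, for example) are still fully hidden.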

## Verification

- `pnpm typecheck`
- `pnpm exec vitest run --no-coverage
ui/src/components/JsonSchemaForm.test.tsx`
- `(cd packages/plugins/sandbox-providers/exe-dev && pnpm test)` — 19
passing

For an operator-side smoke test:

1. Get an exe.dev API token with `/exec` permission for `new`, `ls`,
`rm`.
2. Register the plugin in your Paperclip instance, configure an
environment with the token.
3. Create a sandbox env whose provider is `exe-dev`, then run a Codex or
Claude job against it. The default Node 20 setup script should bring the
VM up automatically.

## Risks

- Adds a new sandbox provider plugin that follows the existing daytona /
e2b shape; behavior on existing providers is unchanged.
- The `JsonSchemaForm` textarea variant only engages for fields that opt
in via `maxLength` larger than the existing threshold. All existing
secret fields (which don't declare a `maxLength`) keep their current
rendering. Test coverage pins both paths.
- The redaction in `runLifecycleCommand` is a defense-in-depth measure;
the test suite exercises the redaction path. If the redaction misses a
future env-arg shape, the worst case is a return to the previous behavior
(secrets appearing in error messages), which is what the existing daytona /
e2b plugins also do today.
- Default setup script downloads from `deb.nodesource.com` over HTTPS at
VM init. Operators on air-gapped networks or with a different package
strategy can override the setup script.

## Model Used

- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — UI change is a textarea variant of an existing secret
field; will attach screenshots before requesting merge
- [x] I have updated relevant documentation to reflect my changes
(plugin README, manifest descriptions)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 07:42:18 -07:00
Devin Foley 486fb88a15 Add Cloudflare sandbox provider plugin (#5687)
> _Stacked on top of #5685 and #5686. Diff against master includes commits
from earlier PRs in the stack — review focuses on the two new commits
(`Extend sandbox callback bridge for Worker-hosted plugins` + `Add
Cloudflare sandbox provider plugin`)._

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent runs in a sandbox environment, and operators choose which
provider backs that sandbox — today E2B and Daytona are bundled with the
platform
> - Cloudflare Workers + Durable Objects + the Sandbox SDK offer a
credible new option: globally distributed, cheap when idle, and
operator-deployable as a single Worker
> - To plug it in, Paperclip needs (a) a provider plugin that speaks the
`PaperclipPluginManifestV1` lifecycle and (b) a small operator-deployed
Worker — the **bridge** — that adapts Paperclip's runtime RPCs to the
Cloudflare Sandbox SDK
> - The plugin extends the existing sandbox-callback-bridge with a
`bridge.transport: "worker"` discriminator so the platform routes
runtime RPCs through the Worker bridge instead of the in-process runner
> - This pull request adds the plugin, the bridge Worker template, and
the supporting adapter-utils + server hooks the new transport needs
> - The benefit is that operators can run sandboxes on Cloudflare's edge
with no new platform code beyond installing the plugin and deploying the
Worker

## What Changed

**Shared support (`Extend sandbox callback bridge for Worker-hosted
plugins`):**

- `packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`:
expose `expectedHostHeader` so plugin-side bridge clients can verify the
canonical request envelope before forwarding.
- `packages/adapter-utils/src/command-managed-runtime.{ts,test.ts}`:
relax the always-fresh runner construction so callers can re-use a
runner across exec calls (Worker-hosted bridges hold the runner inside a
Durable Object).
- `server/src/services/environment-runtime.ts` +
`environment-runtime.test.ts`: route Worker-hosted bridges through the
same env-shaping path as E2B and pin the `requestEnv` contract.
- `server/src/services/plugin-environment-driver.ts`: thread an optional
`issueId` through the runtime descriptor so bridges can scope leases to
the originating issue (used by Cloudflare to map a sandbox to the
issue/workflow for billing and audit).
- `packages/plugins/sdk/src/protocol.ts`: add `issueId?` to
`PluginEnvironmentDriverBaseParams` and the new `bridge.transport:
"worker"` discriminator that the new plugin declares.
- `server/src/__tests__/heartbeat-plugin-environment.test.ts`: pin the
heartbeat path against the new runtime descriptor.
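
A minimal sketch of how a `bridge.transport` discriminator plus an optional `issueId` might drive routing. All shapes here are hypothetical stand-ins for the real types in `protocol.ts`: the `"in-process"` variant name, the `environmentId` field, and falling back to the environment id as the lease scope are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real definitions are in packages/plugins/sdk/src/protocol.ts.
interface DriverBaseParams {
  environmentId: string;
  issueId?: string; // optional: plugins that never send it keep working
}

type BridgeConfig =
  | { transport: "in-process" }
  | { transport: "worker"; url: string };

type RuntimeRoute =
  | { kind: "in-process-runner" }
  | { kind: "worker-bridge"; url: string; leaseScope: string };

function routeRuntimeRpc(bridge: BridgeConfig, params: DriverBaseParams): RuntimeRoute {
  switch (bridge.transport) {
    case "worker":
      // Scope the lease to the originating issue when one is supplied.
      return { kind: "worker-bridge", url: bridge.url, leaseScope: params.issueId ?? params.environmentId };
    case "in-process":
      return { kind: "in-process-runner" };
    default: {
      // Compile-time exhaustiveness check for future transports.
      const unreachable: never = bridge;
      throw new Error(`unknown bridge transport: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Because the union is discriminated on `transport`, adding a third transport later fails to compile until the `switch` handles it, which is one way to keep the existing E2B / Daytona paths from silently changing.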

**The Cloudflare plugin itself (`Add Cloudflare sandbox provider
plugin`):**

- `packages/plugins/sandbox-providers/cloudflare/`: plugin entry,
manifest, plugin runtime (lifecycle + bridge client), config parsing,
and Vitest coverage. Manifest declares `bridge.transport: "worker"` so
the platform routes runtime RPCs through the bridge client.
- `bridge-template/`: a Worker template the operator deploys with
`wrangler`. Owns Durable Object-backed sessions (`sessions.ts`),
exec/stream routes (`exec.ts`, `routes.ts`), and an HMAC auth layer
(`auth.ts`) that pins the `Host` header surface. Includes the
SDK-contract-correct exec implementation, lease recovery, and chunked
stdout/stderr streaming.
- Tests cover lease/session handoff (`bridge-template/src/exec.test.ts`,
`routes.test.ts`), bridge client request shaping
(`src/bridge-client.test.ts`), and end-to-end plugin behavior
(`src/plugin.test.ts`) including streamed exec output. 27 tests in
total.
- `README.md` walks the operator through deploying the bridge Worker,
registering the plugin, and configuring the runtime.
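
The HMAC layer with a pinned `Host` can be sketched in Node as below. This is a hypothetical illustration, not the bridge's `auth.ts`: the canonical-string layout, hex encoding, and function names are assumptions, a Worker would use Web Crypto (`crypto.subtle`) instead of `node:crypto`, and a production check should also reject stale timestamps, which is omitted here.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical request envelope; field names are illustrative.
interface SignedRequest {
  method: string;
  host: string;
  path: string;
  timestamp: string;
  body: string;
}

// Assumed canonical layout: METHOD \n host \n path \n timestamp \n sha256(body).
function canonicalString(req: SignedRequest): string {
  const bodyHash = createHash("sha256").update(req.body).digest("hex");
  return [req.method.toUpperCase(), req.host.toLowerCase(), req.path, req.timestamp, bodyHash].join("\n");
}

function signRequest(req: SignedRequest, secret: string): string {
  return createHmac("sha256", secret).update(canonicalString(req)).digest("hex");
}

function verifyRequest(req: SignedRequest, signature: string, secret: string, expectedHost: string): boolean {
  // Pin the Host first so a signature minted for another host is rejected outright.
  if (req.host.toLowerCase() !== expectedHost.toLowerCase()) return false;
  const expected = Buffer.from(signRequest(req, secret), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Including the host in the signed string means a request replayed against a different Worker hostname fails both the explicit pin and the MAC comparison.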

## Verification

- `pnpm typecheck`
- `pnpm exec vitest run --no-coverage
packages/adapter-utils/src/sandbox-callback-bridge.test.ts
packages/adapter-utils/src/command-managed-runtime.test.ts
server/src/__tests__/environment-runtime.test.ts
server/src/__tests__/heartbeat-plugin-environment.test.ts`
- `(cd packages/plugins/sandbox-providers/cloudflare && pnpm test)` — 27
passing

For an operator-side smoke test:

1. Deploy the bridge: `cd
packages/plugins/sandbox-providers/cloudflare/bridge-template &&
wrangler deploy`
2. Register the plugin in your Paperclip instance, point its bridge URL
at the deployed Worker, set the HMAC shared secret.
3. Create a sandbox environment whose provider is `cloudflare`, then run
a Codex or Claude job against it.

## Risks

- Adds a new `bridge.transport: "worker"` code path, but the existing
E2B / Daytona transports go through the same shaped helpers and have
explicit test coverage that pins their behavior unchanged.
- The Worker bridge stores session state in a Durable Object, so operators
should budget for the corresponding Cloudflare costs (DO requests and
storage). This is documented in the README.
- The `issueId` plumbing is optional throughout — existing plugins that
don't supply it continue to work.

## Model Used

- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A, no UI change
- [x] I have updated relevant documentation to reflect my changes
(plugin README, bridge-template README)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 07:33:13 -07:00
Devin Foley 0fe39a2d5c fix(cursor-local): resolve sandbox agent installs from cursor bin (#5686)
> _Stacked on top of #5685 (Harden remote sandbox runtime). Diff against
master includes commits from earlier PRs in the stack — review focuses
on the new commit only._

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The cursor-local adapter wraps the Cursor Agent CLI so a Paperclip
workflow can drive it inside a sandbox
> - When the adapter runs in a remote sandbox, the Cursor Agent CLI
installs under `$HOME/.local/bin/cursor-agent` (or wherever
`$XDG_BIN_HOME` points), not on the global PATH
> - The existing post-install resolution assumed `cursor-agent` would
resolve via the sandbox's login shell PATH after `npm install -g`, which
fails on sandboxes where the install lands in a user-prefixed directory
that isn't on PATH at probe time
> - This pull request resolves the agent CLI from the cursor binary's
own directory (`dirname "$(command -v cursor)"`) so the install probe
and execute path agree on a real binary location
> - The benefit is that cursor-local works correctly on any sandbox
provider where `npm install` lands in a user-prefixed directory

## What Changed

- `packages/adapters/cursor-local/src/server/remote-command.ts`: resolve
the cursor-agent binary from the cursor bin directory after install,
instead of relying on PATH.
- `packages/adapters/cursor-local/src/server/test.ts`: corresponding
probe tweak.
- `packages/adapters/cursor-local/src/server/test.test.ts` (new) +
`remote-command.test.ts`: focused coverage that exercises the install +
resolve path against a sandbox runner that places the binary in a
user-prefixed directory.
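
The sibling-of-`cursor` lookup can be sketched as a generated `sh` snippet. This is a hypothetical builder, not the code in `remote-command.ts`; in particular the final fallback to a plain `PATH` lookup and the `exit 127` convention are assumptions added for illustration. The probe and execute paths would both run the same snippet and use the printed path, so they agree on one binary.

```typescript
// Hypothetical sketch: build a POSIX sh snippet that prints the cursor-agent path.
// The PATH fallback branch is an assumption added for illustration.
function buildCursorAgentResolveScript(): string {
  return [
    'CURSOR_BIN="$(command -v cursor 2>/dev/null || true)"',
    // Prefer the sibling of the cursor binary, where user-prefixed installs land.
    'if [ -n "$CURSOR_BIN" ] && [ -x "$(dirname "$CURSOR_BIN")/cursor-agent" ]; then',
    '  printf "%s\\n" "$(dirname "$CURSOR_BIN")/cursor-agent"',
    "elif command -v cursor-agent >/dev/null 2>&1; then",
    "  command -v cursor-agent",
    "else",
    '  echo "cursor-agent not found next to cursor or on PATH" >&2',
    "  exit 127",
    "fi",
  ].join("\n");
}
```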

## Verification

- `pnpm exec vitest run --no-coverage
packages/adapters/cursor-local/src/server/test.test.ts
packages/adapters/cursor-local/src/server/remote-command.test.ts
packages/adapters/cursor-local/src/server/execute.test.ts`

All passing locally.

## Risks

- Local cursor-local runs are unaffected — the resolution change only
kicks in for the sandbox install path.
- Low risk; isolated to one adapter.

## Model Used

- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: tool use (Read/Edit/Bash), no code execution beyond
local repo commands

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A, no UI change
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 00:41:20 -07:00
Devin Foley b24c6909e8 Harden remote sandbox runtime probes, timeouts, and installs (#5685)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent runs inside a sandbox environment so its CLI is isolated
from the host
> - Sandbox-backed adapter runs go through a small set of shared helpers
— `ensureAdapterExecutionTargetCommandResolvable`, the sandbox callback
bridge runner, and per-adapter `SANDBOX_INSTALL_COMMAND` strings
> - When standing up new sandbox provider plugins, the existing helpers
timed out, missed install fallbacks, or leaned on assumptions that only
held for E2B
> - Local adapters (`claude-local`, `codex-local`, `gemini-local`,
`opencode-local`) needed slightly hardened probes so they could install
themselves and validate inside *any* remote sandbox transport, not just
E2B
> - This pull request bundles those runtime fixes so future sandbox
provider plugins inherit a working baseline
> - The benefit is that adding a new sandbox provider plugin no longer
requires touching adapter-utils or each local-adapter probe — the
supporting infra is already correct

## What Changed

- `packages/adapter-utils/src/execution-target.ts`: introduce
`DEFAULT_REMOTE_SANDBOX_ADAPTER_TIMEOUT_SEC = 1800` and
`resolveAdapterExecutionTargetTimeoutSec(...)`. Local and SSH adapters
keep the historical "0 means no adapter timeout" behavior;
sandbox-backed runs without an explicit `timeoutSec` get an explicit
30-minute default so remote installs and warm-up don't time out at the
per-RPC default. Plumbed `timeoutSec` through
`ensureAdapterExecutionTargetCommandResolvable` so install probes inside
a sandbox honor adapter-level overrides instead of the bridge's 5-minute
default.
- `packages/adapters/opencode-local/src/index.ts`: switch
`SANDBOX_INSTALL_COMMAND` from `npm install -g opencode-ai` to `curl
-fsSL https://opencode.ai/install | bash`. The npm package reifies four
large prebuilt-binary subpackages in parallel even though only one
matches the host arch; on bandwidth-constrained sandboxes that blew
through the 240s install budget. The official installer fetches one
arch-specific binary and adds `$HOME/.opencode/bin` to PATH via
`~/.bashrc`, which the sandbox-callback-bridge login-shell script
already sources.
- `packages/adapters/{claude,codex,gemini,opencode}-local/`: harden
remote-target probes — pass `--skip-git-repo-check` for Codex when
probing outside a repo, normalize permission flags for Claude, and add
`*.remote.test.ts` coverage that exercises the remote-sandbox path
explicitly for each adapter.
- `packages/adapter-utils/src/sandbox-install-command.{ts,test.ts}`
(new): add the `buildSandboxNpmInstallCommand` helper.
- `server/src/adapters/registry.ts` + new
`server/src/__tests__/adapter-registry.test.ts`: wire adapter install
commands so they fall back to a writable `$HOME/.local` prefix when a
global install isn't available.
- `server/src/__tests__/plugin-worker-manager.test.ts` + new
`server/src/__tests__/fixtures/plugin-worker-delayed.cjs`: pin per-call
timeout overrides so plugin worker exec calls honor the caller's timeout
instead of the worker's default.
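
The timeout rule from the first bullet can be sketched as a small resolver. Only the `1800` constant comes from this PR; the target type, the function name, and the choice to pass through an explicit `0` unchanged are assumptions, since the real `resolveAdapterExecutionTargetTimeoutSec(...)` signature isn't shown here.

```typescript
// Constant value taken from the PR description; everything else is a hypothetical sketch,
// not the real resolveAdapterExecutionTargetTimeoutSec signature.
const DEFAULT_REMOTE_SANDBOX_ADAPTER_TIMEOUT_SEC = 1800; // 30 minutes

type ExecutionTarget =
  | { kind: "local" }
  | { kind: "remote"; transport: "ssh" | "sandbox" };

function resolveTimeoutSecSketch(target: ExecutionTarget, configuredTimeoutSec?: number): number {
  // An explicitly configured value always wins (assumption: an explicit 0 is respected too).
  if (configuredTimeoutSec !== undefined) return configuredTimeoutSec;
  // Only remote+sandbox runs get the 30-minute default.
  if (target.kind === "remote" && target.transport === "sandbox") {
    return DEFAULT_REMOTE_SANDBOX_ADAPTER_TIMEOUT_SEC;
  }
  // Local and SSH keep the historical "0 means no adapter timeout" behavior.
  return 0;
}
```

The resolved value would then be threaded into install probes, which is what keeps them from falling back to the bridge's 5-minute per-RPC default.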

## Verification

- `pnpm typecheck`
- `pnpm exec vitest run --no-coverage
packages/adapter-utils/src/execution-target-sandbox.test.ts
packages/adapter-utils/src/sandbox-install-command.test.ts`
- `pnpm exec vitest run --no-coverage
server/src/__tests__/plugin-worker-manager.test.ts
server/src/__tests__/adapter-registry.test.ts
server/src/__tests__/claude-local-adapter-environment.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/gemini-local-adapter-environment.test.ts`
- `pnpm exec vitest run --no-coverage
packages/adapters/codex-local/src/server/test.remote.test.ts
packages/adapters/opencode-local/src/server/test.remote.test.ts
packages/adapters/codex-local/src/server/codex-args.test.ts
packages/adapters/codex-local/src/server/execute.remote.test.ts
packages/adapters/gemini-local/src/server/execute.remote.test.ts`

All passing locally.

## Risks

- Touches shared `adapter-utils` and several `*-local` adapters. The
30-minute default applies only when both (a) the target is
`remote+sandbox` and (b) no `timeoutSec` is configured — local + SSH
paths are unchanged. New test coverage was added alongside each behavior
change to pin the contracts.
- Switching OpenCode's install command to the official installer is a
behavior change for any operator running OpenCode inside a remote
sandbox. Local installs are unaffected (the `SANDBOX_INSTALL_COMMAND`
only runs when an adapter is being installed inside a sandbox).
- Low risk overall — no migrations, no API surface change.

## Model Used

- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep),
no code execution beyond local repo commands

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A, no UI change
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 00:31:54 -07:00
Devin Foley 534aee66ae Add cursor_cloud adapter for Cursor SDK + Cloud Agents API v1 (#5664)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - There are many adapter types, one per agent-runtime product (Claude,
Codex, OpenCode, Cursor local CLI, etc.)
> - Cursor shipped a public TypeScript SDK on 2026-04-29 that exposes
Cursor's full hosted-agent platform (cloud VMs, harness, MCP, skills,
hooks)
> - Paperclip had no first-class adapter for this — agents that wanted
to use Cursor's managed cloud runtime had to fall back to the local CLI
adapter, which loses the cloud session, streaming, and durable run model
> - This PR adds a new `cursor_cloud` adapter built directly on
`@cursor/sdk`, with Paperclip's heartbeat mapped to Cursor's
durable-agent + per-run model
> - The benefit is that any Paperclip agent can now drive a Cursor cloud
agent across heartbeats with native session reuse, streaming, and
cancellation, while Paperclip remains the source of truth for issue/task
state

## What Changed

- New built-in adapter package `packages/adapters/cursor-cloud` (15
files, ~1.7k LOC) backed by `@cursor/sdk` ^1.0.12
- `src/server/execute.ts` — SDK-first lifecycle: `Agent.create` /
`Agent.resume` / `Agent.getRun` / `agent.send` / `run.stream` /
`run.wait`, with session reuse keyed on the (runtime env type, env name,
repo set) tuple
- `src/server/session.ts` — codec for `cursorAgentId` + `latestRunId` +
repo metadata, persisted in `runtime.sessionParams`
- `src/server/test.ts` — environment probe via `Cursor.me()` and
optional model validation via `Cursor.models.list()`
- `src/ui/parse-stdout.ts` + `src/cli/format-event.ts` — normalize
Cursor SDK message types (`status`, `thinking`, `assistant`, `user`,
`tool_call`, `tool_result`, `result`) into Paperclip transcript events
for the UI and CLI
- Registrations: `packages/shared/src/constants.ts`,
`packages/adapter-utils/src/session-compaction.ts`,
`server/src/adapters/{registry,builtin-adapter-types}.ts`,
`ui/src/adapters/{registry,adapter-display-registry}.ts` +
`ui/src/adapters/cursor-cloud/index.ts`, `cli/src/adapters/registry.ts`,
plus workspace deps in `cli`/`server`/`ui` `package.json`
- `ui/src/components/AgentConfigForm.tsx` — hide local-Cursor
`mode`/thinking-effort field for `cursor_cloud` (different config
surface)
- 11 vitest tests covering execute paths (fresh create, matching-resume,
active-run reattach, non-finished result), session codec round-trip,
transcript parsing, and config building
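
The session-reuse rule ("keyed on the (runtime env type, env name, repo set) tuple") can be sketched as a key function. The shapes below are hypothetical, not the codec in `src/server/session.ts`; treating the repo list as an unordered, de-duplicated set of URLs is an assumption about what "repo set" means. When the key matches, the stored `cursorAgentId` would be resumed; otherwise a fresh agent is created.

```typescript
// Hypothetical shapes; the adapter's real session params live in src/server/session.ts.
interface CursorSessionShape {
  envType: string; // e.g. "cloud"
  envName?: string;
  repoUrls: string[];
}

// Treat repos as a set: order and duplicates don't change the key.
function sessionReuseKey(s: CursorSessionShape): string {
  const repos = Array.from(new Set(s.repoUrls)).sort();
  return JSON.stringify([s.envType, s.envName ?? null, repos]);
}

// Resume the stored Cursor agent only when the tuple matches exactly.
function canReuseSession(stored: CursorSessionShape | undefined, current: CursorSessionShape): boolean {
  return stored !== undefined && sessionReuseKey(stored) === sessionReuseKey(current);
}
```

Serializing the tuple with `JSON.stringify` avoids delimiter collisions that a hand-rolled `a|b|c` key could hit if an env name contained the separator.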

## Verification

Reviewer steps:

```bash
pnpm install
pnpm --filter @paperclipai/adapter-cursor-cloud typecheck   # → clean
pnpm vitest run packages/adapters/cursor-cloud              # → 11/11 passing
```

End-to-end check against a real Cursor cloud agent (requires
`CURSOR_API_KEY` and Cursor GitHub-app install on the target repo):

1. Create a `cursor_cloud` agent in Paperclip with `repoUrl` set to the
test repo, `repoStartingRef: main`, and `env.CURSOR_API_KEY` set
2. Trigger a heartbeat → adapter calls `Agent.create({ cloud: { env: {
type: "cloud" }, repos: [...] } })`, streams events, terminates on
`finished`
3. Trigger a second heartbeat → adapter calls `Agent.resume` or
`agent.send` follow-up depending on prior-run state, reusing
`cursorAgentId`
4. The Paperclip UI/CLI transcript reflects Cursor `status` / `thinking`
/ `assistant` events as they stream
5. Cancellation from Paperclip maps to `run.cancel()` or Cloud API v1
`cancelRun` for cross-heartbeat cancellation

A direct-SDK smoke run against a real repo (devinfoley/my_test_project @
main) confirmed: `Cursor.me()` ok → `Agent.create` → `agent.send` →
`run.stream()` (30 events) → terminal status `finished` in ~11s.

## Risks

- **New adapter, additive only.** No existing adapter or registry is
replaced; current `cursor` local-CLI adapter is untouched. Default
behavior of any existing agent is unchanged.
- **External dependency on `@cursor/sdk`.** Cursor's SDK is v1.0.x and
may evolve. Mocked unit tests cover the public surface used here; if the
SDK breaks compatibility we update the adapter independently.
- **Cost/budget.** `cursor_cloud` runs on Cursor's billed cloud VMs;
operators must understand they are spending money outside Paperclip's
budget controls when they enable this adapter. Same shape as other
API-billed adapters.
- **No webhook support in V1.** The SDK already provides
stream/wait/cancel/reattach, so V1 does not require a public callback
URL. If a future use case needs out-of-band wakes, we add a Cloud API v1
webhook bridge as a separate change. This is called out in the issue
plan document.
- **Lockfile.** Per repo policy, `pnpm-lock.yaml` is intentionally not
in this PR — CI's lockfile workflow will update it on merge given the
manifest changes.

## Model Used

- Provider: Anthropic Claude (via Claude Code / Paperclip `claude_local`
adapter)
- Model: `claude-opus-4-7` (Claude Opus 4.7), knowledge cutoff January
2026
- Mode: standard tool-use with extended reasoning
- Context: ~200k token window
- Capabilities used: code generation, multi-file edits, shell/test
execution, GitHub PR workflow

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass (11/11 in
`packages/adapters/cursor-cloud`)
- [x] I have added or updated tests where applicable (4 new test files,
11 cases)
- [ ] If this change affects the UI, I have included before/after
screenshots (the only UI change is hiding the local-Cursor mode field on
the `cursor_cloud` adapter — happy to attach a screenshot if the
reviewer wants one)
- [x] I have updated relevant documentation to reflect my changes (issue
plan document supersedes the pre-SDK design; tracked in PAPA-203)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-10 17:21:04 -07:00
Dotta 0096b56a1c [codex] Add LLM Wiki plugin host support (#5597)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The plugin system needs host contracts and runtime support before
large plugins can integrate cleanly.
> - The source branch mixed the LLM Wiki package with supporting
host/runtime work, managed plugin skills, root-level storage spaces, and
a bookmarks reference plugin.
> - [PAP-9173](/PAP/issues/PAP-9173) asked for the current branch to be
split by file boundary: plugin package separately from everything else.
> - [PAP-9188](/PAP/issues/PAP-9188) clarified that LLM Wiki may have
plugin-local spaces, but Paperclip core should not reorganize top-level
local storage into spaces.
> - Follow-up review clarified that the bookmarks example should not
ship in this PR either.
> - This pull request contains the
non-`packages/plugins/plugin-llm-wiki/` host/runtime work, keeps runtime
state under the selected Paperclip instance root, and no longer includes
the bookmarks example.

## What Changed

- Added/updated plugin host contracts, SDK types, worker RPC plumbing,
managed plugin skill support, and related server tests.
- Removed the bookmarks example plugin package and its
bundled-example/workspace references.
- Removed the root-level local spaces CLI/migration surface and restored
instance-root runtime defaults for config, db, logs, storage, secrets,
workspaces, projects, and adapter homes.
- Replaced shared root `space-paths` helpers with `home-paths` helpers
for core runtime storage.
- Tightened stranded recovery unique-conflict detection so concurrent
recovery scans reuse the raced recovery issue when Postgres errors are
wrapped.
- Kept `packages/plugins/plugin-llm-wiki/` out of this PR diff;
plugin-local spaces remain in the stacked plugin-only PR.
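
The unique-conflict fix concerns Postgres errors that arrive wrapped by another layer. A minimal sketch of the general technique is to walk the wrapping chain and look for SQLSTATE `23505` (`unique_violation`). It assumes two things:

- the driver exposes the SQLSTATE as a `code` property;
- wrappers use the standard `Error.cause` link.

`isUniqueViolation` is a hypothetical name, not the PR's helper.

```typescript
// SQLSTATE for unique_violation in PostgreSQL.
const PG_UNIQUE_VIOLATION = "23505";

function isUniqueViolation(err: unknown, maxDepth = 10): boolean {
  let current: unknown = err;
  for (let depth = 0; depth < maxDepth && current != null; depth++) {
    if (
      typeof current === "object" &&
      (current as { code?: unknown }).code === PG_UNIQUE_VIOLATION
    ) {
      return true;
    }
    // Follow the Error.cause link to the wrapped error, if there is one.
    current =
      typeof current === "object" ? (current as { cause?: unknown }).cause : undefined;
  }
  return false;
}
```

The depth cap keeps a cyclic `cause` chain from looping forever. A caller that sees `true` can re-read the recovery issue that won the race instead of failing the scan.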

## Verification

- `pnpm exec vitest run cli/src/__tests__/data-dir.test.ts
cli/src/__tests__/home-paths.test.ts cli/src/__tests__/onboard.test.ts
packages/shared/src/home-paths.test.ts
packages/db/src/runtime-config.test.ts
server/src/__tests__/agent-instructions-service.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/codex-local-execute.test.ts`
- `pnpm exec vitest run packages/db/src/runtime-config.test.ts`
- `pnpm exec vitest run
server/src/__tests__/plugin-routes-authz.test.ts`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm exec vitest run
server/src/__tests__/heartbeat-process-recovery.test.ts -t "reuses the
raced stranded recovery issue"` skipped locally because embedded
Postgres did not initialize on this macOS temp host; the code path was
typechecked and is covered by Linux CI.
- Boundary check: no core references remain for `PAPERCLIP_SPACE_ID`,
`spaces migrate-default`, `@paperclipai/shared/space-paths`,
`registerSpacesCommands`, or the removed bookmarks example.
- Previous PR head `4f23e034` had green GitHub checks: `verify`, all
four serialized server shards, `e2e`, `Canary Dry Run`, `policy`, Snyk,
and `Greptile Review`. Current head `582f466d` is re-running checks
after the bookmarks deletion.

## Risks

- Plugin host changes touch shared runtime paths, so regressions would
most likely appear in adapter startup, plugin loading, or local dev path
defaults.
- Removing the bookmarks example also removes one demonstration of
plugin database namespaces plus local-folder persistence; remaining
plugin examples still cover bundled example discovery and plugin host
flows.
- The plugin package itself is intentionally deferred to the stacked
plugin-only PR, where LLM Wiki plugin-local spaces live.
- Existing installs that tested the transient root-level spaces CLI
should stop using it; this PR intentionally removes that unsupported
migration surface before merge.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI GPT-5 Codex via Codex CLI, tool use and local code execution
enabled; context window not exposed.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass, except where noted above
for host-specific embedded Postgres initialization
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Stacked follow-up: PR #5592 contains only
`packages/plugins/plugin-llm-wiki/` and targets this branch.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-10 07:34:12 -05:00
Devin Foley 2f72cb29ea chore: update drizzle-orm to 0.45.2 (#5589)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The server, DB package, and CLI all rely on the shared Drizzle ORM
dependency for core persistence flows.
> - A published install was still resolving nested `drizzle-orm@0.38.4`,
which left the production package graph behind the intended security
update.
> - The repo’s documented dependency policy says GitHub Actions owns
`pnpm-lock.yaml`, so the correct maintainer workflow is to update
dependency manifests in the feature PR and let the lockfile refresh
happen separately after merge.
> - This pull request therefore keeps the Drizzle upgrade to the package
manifests only and leaves lockfile regeneration to the existing `Refresh
Lockfile` automation.

## What Changed

- Updated `drizzle-orm` dependency declarations in `cli/package.json`,
`packages/db/package.json`, and `server/package.json` from `0.38.4` /
`^0.38.4` to `0.45.2` / `^0.45.2`.
- Re-verified the packed `@paperclipai/db` and `@paperclipai/server`
publish payloads to confirm their generated `package.json` files
advertise `drizzle-orm ^0.45.2`.
- Removed the temporary lockfile/CI follow-up commits so the branch now
matches the intended manifest-only protocol.

## Verification

- `pnpm list drizzle-orm -r --depth 0`
- `pnpm exec vitest run packages/db/src/client.test.ts
server/src/__tests__/issues-service.test.ts`
- `pnpm run test:release-registry`
- Packed `@paperclipai/db` and `@paperclipai/server` locally and
inspected the tarball `package.json` files to confirm they advertise
`drizzle-orm ^0.45.2`.
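
The tarball inspection in the last step can be approximated with a short script. A sketch, under these assumptions:

- The packed tarball has already been extracted; npm-style tarballs unpack into a `package/` directory.
- `EXPECTED`, `findDrizzleRange`, and `checkManifest` are hypothetical names.
- The check is an exact string match, not semver range evaluation.

```typescript
import { readFileSync } from "node:fs";

// The specifier the published manifests are expected to advertise.
const EXPECTED = "^0.45.2";

interface PackageJson {
  name?: string;
  dependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
}

// Return the drizzle-orm specifier a manifest declares, if any.
function findDrizzleRange(pkg: PackageJson): string | undefined {
  return pkg.dependencies?.["drizzle-orm"] ?? pkg.peerDependencies?.["drizzle-orm"];
}

// Read an extracted package.json and report whether it matches EXPECTED.
function checkManifest(path: string): boolean {
  const pkg = JSON.parse(readFileSync(path, "utf8")) as PackageJson;
  const range = findDrizzleRange(pkg);
  const ok = range === EXPECTED;
  console.log(`${pkg.name ?? path}: drizzle-orm ${range ?? "(missing)"} ${ok ? "ok" : "MISMATCH"}`);
  return ok;
}
```

For example, after extracting the `@paperclipai/db` tarball, `checkManifest("package/package.json")` should print `ok` if the published manifest carries the intended bump.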

## Risks

- Low to moderate risk: the runtime code paths are unchanged, but
downstream lockfile refresh now depends on the existing post-merge
GitHub automation working as documented.
- A separate packaging/versioning issue around unpublished
`@paperclipai/plugin-sdk@1.0.0` showed up during a raw local tarball
install experiment; that is called out for reviewers but is not part of
this Drizzle bump.

## Model Used

- OpenAI Codex via the `codex_local` adapter, using a GPT-5-based coding
agent with terminal tool use and code execution. The adapter does not
expose a public exact model ID or context-window value in this
environment.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-09 21:31:57 -07:00
Dotta 778e775c35 Add secrets provider vaults and remote import (#5429)
## Thinking Path

> - Paperclip orchestrates AI-agent companies and needs secrets handling
to work across local development, hosted operators, and governed agent
execution.
> - The affected subsystem is the company-scoped secrets control plane:
database schema, server services/routes, CLI workflows, and the Secrets
settings UI.
> - The gap was that secrets were local-only and operators could not
manage provider vaults or import existing remote references without
exposing plaintext.
> - This branch adds provider vault configuration plus an AWS Secrets
Manager remote-import path while preserving company boundaries, binding
context, and audit trails.
> - I kept this to a single PR for the branch, removed unrelated
lockfile/package drift, rebased the full branch onto the current
`public-gh/master`, and addressed fresh Greptile findings.
> - The benefit is a reviewable implementation of provider-backed
secrets with focused tests covering provider selection, import
conflicts, deleted secret reuse, rotation guards, and AWS signing
behavior.

## What Changed

- Added provider vault support for company secrets, including provider
config storage, default vault handling, health checks, binding usage,
access events, and remote import preview/commit.
- Added an AWS Secrets Manager provider using SigV4 request signing,
bounded request timeouts, namespace guardrails, cached runtime
credential resolution, and external-reference linking without plaintext
reads.
- Added Secrets UI surfaces for vault management and remote import, plus
CLI/API documentation for setup and operations.
- Stabilized routine webhook secret binding paths and SSH
environment-driver fixture bindings discovered during verification.
- Addressed Greptile and CI findings: no lockfile/package drift,
monotonic migration metadata, disabled-vault default races, soft-deleted
secret hiding/recreate behavior, remove behavior with disabled vaults,
soft-deleted external-reference re-import, non-active rotation guards,
managed-secret soft deletion through PATCH, and per-call AWS SDK
credential client churn.
- Rebased this branch onto `public-gh/master` at `0e1a5828` and
force-pushed with lease to keep this as the single PR for the branch.

## Verification

- `git fetch public-gh master`
- `git rebase public-gh/master`
- `git diff --name-only public-gh/master...HEAD | grep
'^pnpm-lock\.yaml$' || true` confirmed `pnpm-lock.yaml` is not in the PR
diff.
- Confirmed migration ordering: master ends at `0081_optimal_dormammu`;
this PR adds `0082_dry_vision` and
`0083_company_secret_provider_configs`.
- Inspected migrations for repeat safety: new tables/indexes use `IF NOT
EXISTS`; foreign keys are guarded by `DO $$ ... IF NOT EXISTS`; column
additions use `ADD COLUMN IF NOT EXISTS`.
- `pnpm -r typecheck` passed before the Greptile follow-up commits.
- `pnpm test:run` ran the full stable Vitest path before the Greptile
follow-up commits; it completed with 3 timing-related failures under
parallel load: `codex-local-execute.test.ts`,
`cursor-local-execute.test.ts`, and `environment-service.test.ts`.
- `pnpm --filter @paperclipai/server exec vitest run
src/__tests__/codex-local-execute.test.ts
src/__tests__/cursor-local-execute.test.ts
src/__tests__/environment-service.test.ts` passed on targeted rerun
(`24/24`).
- `pnpm build` passed before the Greptile follow-up commits. Vite
reported existing chunk-size/dynamic-import warnings.
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server
exec vitest run src/__tests__/secrets-service.test.ts` passed (`26/26`).
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server
exec vitest run src/__tests__/aws-secrets-manager-provider.test.ts
src/__tests__/secrets-service.test.ts` passed (`39/39`).
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server
typecheck` passed.
- Captured Storybook screenshots from `ui/storybook-static` for visual
review.
- Latest PR checks on `5ca3a5cf`: `policy`, serialized server suites
1/4-4/4, `Canary Dry Run`, `e2e`, `security/snyk`, and `Greptile Review`
pass; aggregate `verify` is still registering the completed child
checks.
- Greptile review loop continued through the latest requested pass; all
Greptile review threads are resolved and the latest `Greptile Review`
check on `5ca3a5cf` passed with 0 comments added.
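
The repeat-safety pattern for foreign keys described above (`DO $$ ... IF NOT EXISTS`) can be generated by a small helper. A sketch only, with these caveats:

- `guardedForeignKey` and the example table, column, and constraint names are hypothetical.
- The guard checks `pg_constraint` by constraint name alone.
- Identifiers are assumed to be trusted migration constants, not user input.

```typescript
interface ForeignKeySpec {
  table: string;
  constraint: string;
  column: string;
  refTable: string;
  refColumn: string;
}

// Build an idempotent ADD CONSTRAINT for PostgreSQL: re-running the
// generated SQL is a no-op once a constraint with that name exists.
function guardedForeignKey(spec: ForeignKeySpec): string {
  const { table, constraint, column, refTable, refColumn } = spec;
  return [
    "DO $$ BEGIN",
    `  IF NOT EXISTS (SELECT 1 FROM pg_constraint WHERE conname = '${constraint}') THEN`,
    `    ALTER TABLE "${table}" ADD CONSTRAINT "${constraint}"`,
    `      FOREIGN KEY ("${column}") REFERENCES "${refTable}"("${refColumn}");`,
    "  END IF;",
    "END $$;",
  ].join("\n");
}

// Example with hypothetical names.
console.log(
  guardedForeignKey({
    table: "company_secrets",
    constraint: "company_secrets_provider_config_id_fk",
    column: "provider_config_id",
    refTable: "company_secret_provider_configs",
    refColumn: "id",
  }),
);
```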

## Screenshots

Before: the provider-vault and remote-import surfaces did not exist on
`master`; these are after-state screenshots from the Storybook fixtures.

![Secrets
inventory](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-2339-secrets-make-a-plan/doc/pr/5429/secrets-inventory.png)

![Secret binding
picker](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-2339-secrets-make-a-plan/doc/pr/5429/secret-binding-picker.png)

![Environment editor with
secrets](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-2339-secrets-make-a-plan/doc/pr/5429/env-editor-with-secrets.png)

## Risks

- Migration risk: this adds new secret provider tables and extends
existing secret rows. The migrations were checked for monotonic ordering
and idempotent guards, but reviewers should still inspect upgrade
behavior carefully.
- Provider risk: AWS support uses direct SigV4 requests. Automated tests
cover signing, request timeouts, vault-config selection, namespace
guardrails, pending-version archival, sanitized provider errors, and
service-level cleanup paths. A real-vault AWS smoke test remains
deployment validation for an operator with AWS credentials rather than
an unverified merge blocker in this local branch.
- UI risk: the Secrets page and import dialog are large new surfaces;
screenshots are included above for reviewer inspection.
- Verification risk: the full local stable test command hit
parallel-load timing failures, although the exact failed files passed
when rerun directly.
- Operational risk: remote import intentionally avoids plaintext reads;
operators must understand that imported external references resolve at
runtime and may fail if AWS permissions change.

## Model Used

- OpenAI Codex, GPT-5 coding agent with local shell/tool use in the
Paperclip worktree. Exact context-window size was not exposed by the
runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [ ] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 18:22:17 -05:00
Devin Foley 06e6ee25cd Add Daytona sandbox provider plugin (#5580)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents need isolated sandbox environments to execute work safely;
Paperclip already supports E2B as a sandbox provider plugin
> - Users want to use Daytona (https://www.daytona.io/) as an
alternative sandbox backend, but no plugin existed for it
> - Without a Daytona plugin, teams that prefer Daytona's
pricing/regions/runtime can't run Paperclip agents on it
> - This pull request adds a `@paperclip/sandbox-provider-daytona`
plugin that mirrors the existing E2B plugin shape and wires up Daytona's
`@daytonaio/sdk` for sandbox lifecycle, command execution, and shell
detection
> - The benefit is that operators can pick Daytona as a first-class
sandbox provider without touching core code, broadening Paperclip's
runtime options

## What Changed

- New plugin package `packages/plugins/sandbox-providers/daytona` with
manifest, worker entry, and provider implementation backed by
`@daytonaio/sdk`
- Implements sandbox create/destroy/exec/upload/download lifecycle,
shell command detection, and config/env wiring consistent with the E2B
plugin
- Adds unit tests under `src/plugin.test.ts` and a README documenting
setup and the `DAYTONA_API_KEY` requirement
- Minor adjustments in `scripts/paperclip-issue-update.sh`,
`packages/shared/src/issue-thread-interactions.test.ts`, and
`packages/shared/src/validators/issue.ts` to support the integration

## Verification

- Re-ran the full sandbox provider matrix on the QA Paperclip instance
using Daytona as the runtime — all 6 adapters executed inside the
Daytona sandbox with zero `environmentExecute` timeouts
- 5/6 adapters pass cleanly (or with informational warnings); the only
failure is `codex_local`, which is an OpenAI quota/billing issue
unrelated to Daytona
- `pnpm --filter @paperclip/sandbox-provider-daytona test` runs the
plugin unit tests

## Risks

- New optional plugin; no behavior change for users who don't enable it
- Requires `DAYTONA_API_KEY` for runtime use — documented in the plugin
README
- Daytona SDK is a new external dependency; tracked in the plugin's own
package.json so it doesn't affect the core install footprint

## Model Used

- Claude Opus 4.7 (`claude-opus-4-7`), extended thinking, tool use
enabled

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots (N/A — backend plugin)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-09 11:50:12 -07:00
Devin Foley 0e1a582831 Revert "Add experimental newest-first issue thread" (#5460)
This is actually bad. Glad it was under experiments.
2026-05-07 16:50:31 -07:00
Devin Foley a904effb96 Add experimental newest-first issue thread (#5455)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies, so issue
threads are a core operator surface for reviewing work.
> - The issue detail page is the place where humans read agent messages,
user comments, and execution context together.
> - That thread originally rendered oldest-first, which made recent
activity harder to see during active review.
> - Reversing the thread order changes navigation expectations,
timestamp placement, and the "Jump to latest" affordance, so the UI
behavior needed to move as a coherent set.
> - Because this is a visible core-product behavior shift, it also
needed a safe rollout path instead of becoming the default immediately.
> - This pull request adds the newest-first issue thread behavior behind
an Experimental setting, updates the thread UI to match that mode, and
keeps the legacy oldest-first experience unchanged by default.
> - The benefit is that reviewers can opt into a more recent-first issue
workflow without forcing a global behavior change on every Paperclip
instance.

## What Changed

- Reversed issue thread rendering so the newest comments and messages
appear first when the experiment is enabled.
- Moved the plain comment timestamp into the card header in newest-first
mode and kept the legacy timestamp placement for oldest-first mode.
- Moved the `Jump to latest` control to the bottom of the thread in
newest-first mode while leaving the existing top placement for the
legacy mode.
- Added the `Enable Newest-First Issue Thread` experimental instance
setting and wired issue detail to read that toggle.
- Added regression coverage for thread order, timestamp placement,
jump-button placement, and the issue-detail experiment toggle behavior.
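
The ordering and placement switch can be sketched as two small helpers. Assumptions:

- `ThreadItem` with an ISO-8601 `createdAt`, `orderThread`, and `jumpToLatestPlacement` are hypothetical.
- The sketch does not model the move of the comment timestamp into the card header, which the actual UI also does.

```typescript
interface ThreadItem {
  id: string;
  createdAt: string; // ISO-8601 timestamp
}

// Sort a copy oldest-first (ties broken by id), then reverse when the
// experiment is on. Copying leaves the caller's array (e.g. query data) intact.
function orderThread<T extends ThreadItem>(items: readonly T[], newestFirst: boolean): T[] {
  const sorted = [...items].sort(
    (a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt) || a.id.localeCompare(b.id),
  );
  return newestFirst ? sorted.reverse() : sorted;
}

// "Jump to latest" sits at the end where the latest item is not: top in the
// legacy oldest-first mode, bottom in newest-first mode.
function jumpToLatestPlacement(newestFirst: boolean): "top" | "bottom" {
  return newestFirst ? "bottom" : "top";
}
```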

## Verification

- `pnpm -r typecheck`
- `pnpm test:run`
- `pnpm build`
- Focused checks that also passed during issue review:
  - `pnpm vitest run src/components/IssueChatThread.test.tsx
    src/pages/IssueDetail.test.tsx` in `ui/`
  - `pnpm vitest run src/__tests__/instance-settings-routes.test.ts` in
    `server/`
- Manual review path:
  - Enable `Instance Settings > Experimental > Enable Newest-First Issue
    Thread`
  - Open an issue with comments/messages and confirm newest activity
    renders first, timestamps move into the header, and `Jump to latest`
    sits below the thread
  - Disable the experiment and confirm the legacy oldest-first behavior
    returns

## Risks

- Low risk: the behavioral change is gated behind an instance-level
experimental toggle and defaults off.
- The main regression risk is thread navigation drift between the two
modes, especially around anchor scrolling and the `Jump to latest`
affordance.
- There is some UI coupling between issue-detail query state and
experimental settings fetches, so future changes in that area should
keep both modes covered.
- Screenshots are not attached in this PR body; verification is
described with automated coverage and manual steps instead.

> I checked [`ROADMAP.md`](ROADMAP.md). This is a scoped issue-thread UX
improvement and rollout gate, not a duplicate of a roadmap-level planned
core feature.

## Model Used

- OpenAI Codex via the local `codex_local` Paperclip adapter,
GPT-5-based coding agent with terminal tool use and local code execution
in this repository worktree.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-07 16:45:12 -07:00
Devin Foley 4269545b19 Stabilize Cursor sandbox runtime resolution (#5446)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The Cursor adapter spawns the Cursor CLI against local, SSH, and
sandbox execution targets; on a fresh sandbox lease, it has to resolve
where Cursor was installed
> - The previous resolver only looked for `~/.local/bin/cursor-agent`
even though the official installer (and the adapter's own
`SANDBOX_INSTALL_COMMAND`) sometimes lays the binary down as
`~/.local/bin/agent`, so a sandbox where the install ran successfully
would still fail to find the CLI
> - This pull request lets the resolver accept either basename and lets
the caller pass an optional `remoteSystemHomeDirHint` so a probe doesn't
pay the cost of a remote `printf $HOME` round-trip when the home
directory is already known
> - The benefit is sandboxed Cursor runs find the binary that the
install actually produced, and runtime probes are cheaper when the home
dir is already resolved

## What Changed

- `packages/adapters/cursor-local/src/server/remote-command.ts`: accept
either `agent` or `cursor-agent` as the preferred basename; new optional
`remoteSystemHomeDirHint` short-circuits the home-dir probe
- `packages/adapters/cursor-local/src/server/execute.ts`: thread the
home-dir hint through, prefer the resolved binary path, and shift the
effective execution cwd to the per-run managed subdirectory once the
runtime is prepared
- New `remote-command.test.ts` and `execute.test.ts` cover both
basenames, the hint short-circuit, and the cwd shift
- `packages/adapters/cursor-local/src/index.ts`: update doc string to
reflect the broader resolution
- `execute.remote.test.ts` updated to expect the managed-subdirectory
cwd shape introduced by the cwd shift
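
The broadened lookup can be sketched as a candidate list plus a hint short-circuit. The PR states only two things: either basename is accepted, and `remoteSystemHomeDirHint` skips the home-dir probe. Everything else here is hypothetical:

- the function and type names;
- the candidate order (`agent` before `cursor-agent`);
- the injected `probeHome`/`exists` callbacks.

```typescript
// Both basenames the installer may produce (order assumed).
const CURSOR_BASENAMES = ["agent", "cursor-agent"] as const;

interface ResolveDeps {
  // Runs something like `printf $HOME` on the remote; costs a round-trip.
  probeHome: () => Promise<string>;
  // Reports whether an executable exists at the given remote path.
  exists: (path: string) => Promise<boolean>;
}

async function resolveCursorBinary(
  deps: ResolveDeps,
  remoteSystemHomeDirHint?: string,
): Promise<string | null> {
  // Use the caller's hint when present; otherwise pay for the probe.
  const home = remoteSystemHomeDirHint ?? (await deps.probeHome());
  for (const name of CURSOR_BASENAMES) {
    const candidate = `${home.replace(/\/+$/, "")}/.local/bin/${name}`;
    if (await deps.exists(candidate)) return candidate;
  }
  return null;
}
```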

## Verification

- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-cursor-local` — 6/6 passing
- `pnpm typecheck` clean
- Manual: a fresh sandbox lease with `npm install -g …`-installed Cursor
(binary lands as `~/.local/bin/agent`) now runs cleanly through the
adapter

## Risks

Low. Resolver is strictly broader (matches a superset of paths);
existing setups with `~/.local/bin/cursor-agent` continue to work. The
home-dir hint is opt-in; callers that don't pass it get the existing
probe behavior. Cursor's effective execution cwd now matches the rest of
the adapters (per-run managed subdirectory) — sessions previously rooted
at the workspace root will land in the new subdirectory.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests cover
both basenames + hint short-circuit + cwd shift
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---

> **Stacked PR.** Sits on top of #5445 (which sits on #5444). Cumulative
diff against `master` includes both of those PRs' content; the files
touched by *this* PR's commit are listed under "What Changed" above.
Will rebase onto `master` and force-push once the prerequisite PRs
merge.
2026-05-07 15:00:28 -07:00
Devin Foley fe3904f434 Stabilize runtime probes and Codex env tests (#5445)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Adapters expose a Test action that probes the configured runtime —
install, resolvability, hello — to give operators a fast yes/no on
whether an environment is healthy
> - The Codex test path was running its hello probe directly without
going through the managed-runtime preparation that production runs use,
so a healthy production setup could still report a probe failure
> - The plugin worker manager wasn't surfacing terminated workers
cleanly, leaving the runtime probe waiting on a dead worker until the
request timed out
> - This pull request routes the Codex test probe through
`prepareAdapterExecutionTargetRuntime` (so it sees the same managed
Codex home production sees), exposes `commandCwd` on
`createCommandManagedRuntimeClient` so callers can target a per-probe
directory without leaking the workspace `remoteCwd`, and propagates
plugin-worker termination as a usable error instead of a hang
> - The benefit is the Codex Test action mirrors production behavior
end-to-end, and probes against a terminated plugin worker fail fast
instead of timing out

## What Changed

- `packages/adapter-utils/src/command-managed-runtime.ts`: rename the
`remoteCwd` knob to `commandCwd` so callers can target a per-probe
directory without inheriting the workspace cwd; matching test coverage
in `command-managed-runtime.test.ts`
- `packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`:
small fixes to keep callback bridge stop semantics deterministic
- `packages/adapters/codex-local/src/server/test.ts`: thread the Codex
hello probe through `prepareAdapterExecutionTargetRuntime` +
`prepareManagedCodexHome` so the probe sees the same managed home
production sees; new `test.remote.test.ts` covers the remote probe path
- `packages/adapters/cursor-local/src/server/execute.ts`: small
probe-side cleanup that aligns with the new commandCwd contract
- `server/src/services/plugin-worker-manager.ts`: surface plugin-worker
termination as a structured error so callers fail fast; new
`plugin-worker-terminated.cjs` fixture and
`plugin-worker-manager.test.ts` cases pin the behavior
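
The fail-fast behavior amounts to racing each pending request against the worker's exit. A minimal sketch, assuming a `ChildProcess`-like emitter that fires `exit` with `(code, signal)`. `WorkerTerminatedError` and `callWorker` are hypothetical names, not the manager's API.

```typescript
import { EventEmitter } from "node:events";

class WorkerTerminatedError extends Error {
  readonly exitCode: number | null;
  readonly signal: string | null;

  constructor(exitCode: number | null, signal: string | null) {
    super(`plugin worker terminated (code=${exitCode}, signal=${signal})`);
    this.name = "WorkerTerminatedError";
    this.exitCode = exitCode;
    this.signal = signal;
  }
}

// Settle with the RPC outcome, or reject as soon as the worker exits,
// instead of leaving the caller waiting for its own timeout.
function callWorker<T>(worker: EventEmitter, request: Promise<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const onExit = (code: number | null, signal: string | null) =>
      reject(new WorkerTerminatedError(code, signal));
    worker.once("exit", onExit);
    request.then(
      (value) => {
        worker.off("exit", onExit); // avoid leaking one listener per call
        resolve(value);
      },
      (err) => {
        worker.off("exit", onExit);
        reject(err);
      },
    );
  });
}
```

A real manager would also need to reject immediately when the worker has already exited before the call, because a late `once("exit")` listener never fires.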

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils
--project @paperclipai/adapter-codex-local --project
@paperclipai/adapter-cursor-local --project @paperclipai/server` —
1749/1750 passing (1 unrelated skip)
- `pnpm typecheck` clean

## Risks

Low–medium. The `remoteCwd → commandCwd` rename is a parameter rename
on an internal helper used only by adapter test/execute paths in this
repo. The plugin-worker-terminated path was previously a hang; failing
fast may surface explicit termination errors in callers that previously
saw a timeout.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests cover
commandCwd, plugin-worker termination, and Codex remote test path
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---

> **Stacked PR.** Sits on top of #5444 which adds the per-run runtime
API surface this PR builds on. Cumulative diff against `master` includes
that PR's content; the files touched by *this* PR's commit are listed
under "What Changed" above. Will rebase onto `master` and force-push
once #5444 merges.
2026-05-07 14:52:31 -07:00
Devin Foley 12cb7b40fd Harden remote workspace sync and restore flows (#5444)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - When an agent runs against a remote target, Paperclip syncs the
workspace out to the remote at run start and restores changes back to
the local workspace at run end
> - The previous restore flow naïvely overwrote local files with
whatever the remote returned, so files that the remote run never touched
but had timestamp/mode drift could be needlessly rewritten — and a
single static `refs/paperclip/ssh-sync/imported` ref made concurrent SSH
workspace exports race on the same git ref
> - This pull request adds a `workspace-restore-merge` module that diffs
a pre-run snapshot against the post-run remote state and only writes
back files the remote actually changed; SSH workspace exports now use a
per-import unique ref so concurrent runs can't trample each other
> - Every adapter's execute path threads the snapshot through
`prepareAdapterExecutionTargetRuntime` so the merge has the baseline it
needs
> - The benefit is workspace restores no longer churn untouched files,
and concurrent SSH runs no longer collide on the import ref
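
The per-import ref can be sketched in a few lines. This assumes the unique suffix comes from Node's `crypto.randomUUID()`. The prefix matches the ref named under "What Changed"; `importRefFor` is a hypothetical helper name.

```typescript
import { randomUUID } from "node:crypto";

const IMPORT_REF_PREFIX = "refs/paperclip/ssh-sync/imported";

// Each export gets its own ref name, so two concurrent runs never update the
// same ref. The earlier design used one static ref for every import.
export function importRefFor(importId: string = randomUUID()): string {
  return `${IMPORT_REF_PREFIX}/${importId}`;
}
```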

## What Changed

- `packages/adapter-utils/src/workspace-restore-merge.{ts,test.ts}`: new
module — directory snapshot (kind/mode/sha256/symlink target) plus
snapshot-aware merge that writes only the files the remote changed
- `packages/adapter-utils/src/ssh.ts`: SSH workspace export uses a
per-import unique ref (`refs/paperclip/ssh-sync/imported/<uuid>`);
restore goes through the new merge helper; `ssh-fixture.test.ts` covers
the unique-ref + merge paths
- `packages/adapter-utils/src/sandbox-managed-runtime.ts` +
`remote-managed-runtime.ts`: thread the snapshot/merge through the
sandbox and SSH paths
- `packages/adapter-utils/src/server-utils.{ts,test.ts}` +
`execution-target.ts`: helpers for capturing the pre-run snapshot;
`prepareAdapterExecutionTargetRuntime` gains required `runId` and
optional `workspaceRemoteDir`, and returns the realized
`workspaceRemoteDir`
- Each adapter's `execute.ts` (acpx, claude, codex, cursor, gemini,
opencode, pi) takes the snapshot at run start and passes it through to
the runtime restore
- Remote execute test mocks updated to match the new
`prepareWorkspaceForSshExecution` return shape and the per-run
`${managedRemoteWorkspace}` cwd subdirectory
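
A minimal sketch of the planning step behind a snapshot-aware merge follows. It assumes both snapshots map workspace-relative paths to entries with the kind/mode/sha256/symlink-target fields listed above. Two behaviors here are assumptions rather than documented facts:

- A mode-only difference is not treated as a remote change, in line with the timestamp/mode-drift note in the thinking path.
- A path that the remote removed is planned as a delete.

`planRestore` and `RestoreAction` are hypothetical names.

```typescript
export interface SnapshotEntry {
  kind: "file" | "directory" | "symlink";
  mode: number; // carried so a written file can get the right permissions
  sha256?: string; // files only
  symlinkTarget?: string; // symlinks only
}
export type Snapshot = Map<string, SnapshotEntry>;

export type RestoreAction =
  | { op: "write"; path: string }
  | { op: "delete"; path: string };

// Content identity only: mode is deliberately ignored (assumption, see above).
function sameContent(a: SnapshotEntry, b: SnapshotEntry): boolean {
  return a.kind === b.kind && a.sha256 === b.sha256 && a.symlinkTarget === b.symlinkTarget;
}

// Plans a restore that touches only paths the remote run actually changed
// relative to the pre-run baseline; untouched paths produce no action.
export function planRestore(baseline: Snapshot, remote: Snapshot): RestoreAction[] {
  const actions: RestoreAction[] = [];
  for (const [path, after] of remote) {
    const before = baseline.get(path);
    if (!before || !sameContent(before, after)) actions.push({ op: "write", path });
  }
  for (const path of baseline.keys()) {
    if (!remote.has(path)) actions.push({ op: "delete", path }); // assumed delete semantics
  }
  return actions.sort((a, b) => a.path.localeCompare(b.path));
}
```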

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils
--project @paperclipai/adapter-acpx-local --project
@paperclipai/adapter-claude-local --project
@paperclipai/adapter-codex-local --project
@paperclipai/adapter-cursor-local --project
@paperclipai/adapter-gemini-local --project
@paperclipai/adapter-opencode-local --project
@paperclipai/adapter-pi-local` — 196/196 passing
- `pnpm typecheck` clean across the workspace

## Risks

Medium. The restore path now writes a strict subset of what it
previously did — files the remote did not touch are no longer rewritten.
If any flow was relying on a touch-without-content-change being copied
back (timestamp or permission propagation only), that behavior is now
skipped. Snapshot capture adds an O(N-files-in-workspace) hash pass at
run start; the cost is bounded by the existing exclude list. The `runId`
parameter on `prepareAdapterExecutionTargetRuntime` is now required —
every in-tree caller is updated; out-of-tree adapter authors need to
pass it.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new module +
every adapter execute path covered
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-07 14:44:45 -07:00
Dotta e400315cbf Guard assigned backlog liveness (#5428)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The issue graph and liveness recovery system decide whether assigned
work is executable or parked
> - Assigned issues created without an explicit status could silently
land in backlog, making parents look blocked with no productive wake
path
> - The server, shared validators, recovery analysis, and UI all need to
agree on that execution semantic
> - This pull request makes assigned issue creation default to `todo`,
flags assigned backlog blockers, and surfaces the state in the board
> - The benefit is that parked assigned work becomes intentional and
visible instead of creating silent liveness stalls

## What Changed

- Adds contract tests for assigned issue creation defaults.
- Defaults assigned issue creation to `todo` when status is omitted
while preserving explicit `backlog` parking.
- Exposes `resolveCreateIssueStatusDefault` through shared validators.
- Teaches liveness/blocker attention paths to distinguish assigned
backlog blockers.
- Adds UI notices, row/header badges, and issue detail safeguards for
assigned backlog blockers.
- Adds Storybook fixtures and execution-semantics documentation for the
assigned-backlog behavior.
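
The defaulting rule and the blocker check can be sketched as two pure functions. This is an illustration under assumptions, not the shared-validator source:

- The status union is a partial, illustrative list.
- The unassigned default of `backlog` is inferred from the thinking path rather than stated.
- `defaultStatusForCreate` and `isAssignedBacklogBlocker` are hypothetical stand-ins. Their signatures may differ from `resolveCreateIssueStatusDefault`.

```typescript
// Partial, illustrative status list; the real shared type may include more values.
type IssueStatus = "backlog" | "todo" | "in_progress" | "in_review" | "done";

interface CreateIssueInput {
  status?: IssueStatus;
  assigneeAgentId?: string | null;
}

// An explicit status always wins, so callers can still park assigned work in
// backlog. Only an omitted status is defaulted: assigned issues become "todo".
export function defaultStatusForCreate(input: CreateIssueInput): IssueStatus {
  if (input.status !== undefined) return input.status;
  return input.assigneeAgentId ? "todo" : "backlog"; // unassigned default is assumed
}

// An assigned issue parked in backlog has no productive wake path, so as a
// blocker it stalls its parent; liveness views can flag it explicitly.
export function isAssignedBacklogBlocker(issue: {
  status: IssueStatus;
  assigneeAgentId?: string | null;
}): boolean {
  return issue.status === "backlog" && Boolean(issue.assigneeAgentId);
}
```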

## Verification

- `pnpm run preflight:workspace-links && pnpm exec vitest run
packages/shared/src/validators/issue.test.ts
server/src/__tests__/issue-assigned-backlog-contract-routes.test.ts
server/src/__tests__/issue-blocker-attention.test.ts
server/src/__tests__/issue-liveness.test.ts
server/src/__tests__/heartbeat-issue-liveness-escalation.test.ts
ui/src/components/IssueAssignedBacklogNotice.test.tsx
ui/src/components/IssueRow.test.tsx` — 50 passed, 23 skipped.
- The skipped tests were embedded Postgres suites, which this host skipped
with the repo's skip message: `Postgres init script exited with code null.
Please check the logs for extra info. The data directory might already exist.`
- Pairwise merge check against the issue-controls PR branch completed
without conflicts via `git merge --no-commit --no-ff` in a temporary
worktree.
- Screenshots for assigned-backlog UI states:
[light](docs/pr-screenshots/pr-5428/assigned-backlog-light.png),
[dark](docs/pr-screenshots/pr-5428/assigned-backlog-dark.png).
- Follow-up checks: `pnpm --filter /ui typecheck`; `pnpm --filter
/mcp-server build`; `pnpm --filter /mcp-server test`; `pnpm exec vitest
run packages/shared/src/validators/issue.test.ts`; focused UI component
tests.
- Remote PR checks on head `6300b3c`: policy, verify, serialized server
shards 1/4-4/4, Canary Dry Run, e2e, Greptile Review, and Snyk all
passed.

## Risks

- Medium: changes status defaulting for assigned issue creation when the
caller omits status. Explicit `backlog` remains supported, and
server/shared tests cover both paths.
- Medium: liveness classification changes can affect blocker attention
labels; focused service and UI tests cover the new assigned-backlog
state.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex coding agent, GPT-5 model family (`gpt-5`), tool-enabled
Paperclip heartbeat environment. Context window and internal reasoning
mode are not exposed by the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-07 12:25:26 -05:00
Dotta 772fc92619 Add issue controls and retry-now recovery (#5426)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Issue operators need clear controls for execution settings, model
overrides, and recovery retries
> - Existing issue properties hid useful adapter override state and did
not expose a board-triggered retry for scheduled heartbeat recovery
> - Scheduled retries also need to respect the same safety gates as
normal execution instead of bypassing budget, review, pause, dependency,
or terminal-state checks
> - This pull request adds the issue property controls and retry-now
surfaces together because they share the issue details/properties UI
> - The benefit is that operators can inspect and adjust issue execution
settings and safely trigger pending scheduled recovery without hidden
control-plane behavior

## What Changed

- Adds editable issue assignee model override controls in
`IssueProperties`, with focused coverage.
- Removes the stale workspace tasks link from issue properties.
- Adds a scheduled retry `retry-now` backend path and shared response
types.
- Adds main-pane and properties-pane scheduled retry UI, backed by a
shared `useRetryNowMutation` hook.
- Adds suppression coverage for budget hard stops, review participant
changes, subtree pause holds, unresolved blockers, terminal issues, and
company scoping.
- Updates the `IssueProperties` test harness with toast actions required
by the retry-now hook.
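
The suppression cases above amount to a gate that runs before a retry-now request is honored. The sketch below assumes the relevant state has already been loaded into one object. The field names, reason codes, and check order are hypothetical and not taken from the actual route.

```typescript
// Hypothetical pre-loaded state for one retry-now request.
interface RetryGateInput {
  requestCompanyId: string;
  issueCompanyId: string;
  budgetHardStopped: boolean;
  reviewParticipantsChanged: boolean;
  subtreePaused: boolean;
  unresolvedBlockerCount: number;
  issueIsTerminal: boolean;
}

type SuppressionReason =
  | "company_mismatch"
  | "terminal_issue"
  | "budget_hard_stop"
  | "review_participants_changed"
  | "subtree_paused"
  | "unresolved_blockers";

// Returns the first reason to suppress the retry, or null if it may proceed.
// Company scoping is checked first so cross-company requests learn nothing else.
export function retryNowSuppression(input: RetryGateInput): SuppressionReason | null {
  if (input.requestCompanyId !== input.issueCompanyId) return "company_mismatch";
  if (input.issueIsTerminal) return "terminal_issue";
  if (input.budgetHardStopped) return "budget_hard_stop";
  if (input.reviewParticipantsChanged) return "review_participants_changed";
  if (input.subtreePaused) return "subtree_paused";
  if (input.unresolvedBlockerCount > 0) return "unresolved_blockers";
  return null;
}
```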

## Verification

- `pnpm exec vitest run ui/src/components/IssueProperties.test.tsx
ui/src/components/IssueScheduledRetryCard.test.tsx` — 31 passed.
- `pnpm exec vitest run
server/src/__tests__/issue-scheduled-retry-routes.test.ts` — exited 0,
but this host skipped the embedded Postgres route tests with: `Postgres
init script exited with code null. Please check the logs for extra info.
The data directory might already exist.`
- Pairwise merge check against the assigned-backlog PR branch completed
without conflicts via `git merge --no-commit --no-ff` in a temporary
worktree.

### Visual verification screenshots

Storybook story: `Product/Issue Scheduled retry surfaces /
ScheduledRetrySurfaces`.

![Scheduled retry card and issue properties rows -
desktop](https://raw.githubusercontent.com/paperclipai/paperclip/62fb566f357312b43b9162af02252d0175530a8f/docs/assets/pr-5426/scheduled-retry-story-desktop.png)

![Scheduled retry card and issue properties rows -
mobile](https://raw.githubusercontent.com/paperclipai/paperclip/62fb566f357312b43b9162af02252d0175530a8f/docs/assets/pr-5426/scheduled-retry-story-mobile.png)

## Risks

- Medium: this touches issue execution/retry behavior, so CI should run
the embedded Postgres route tests on a host that can initialize
Postgres.
- Low-to-medium UI risk around duplicated retry-now entry points; both
surfaces share one mutation hook to keep behavior consistent.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex coding agent, GPT-5 model family (`gpt-5`), tool-enabled
Paperclip heartbeat environment. Context window and internal reasoning
mode are not exposed by the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-07 12:23:13 -05:00
Dotta d0e9cc76f2 Show workspace changes and stale notices in issue threads (#5356)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The issue thread is the operator's durable audit trail for what
changed and why
> - Workspace changes and stale disposition notices need to be visible
in that same timeline without noisy or misleading rendering
> - The local branch already contained backend activity details,
timeline conversion, and UI rendering work for those events
> - This pull request isolates the issue-thread activity work into a
standalone branch against `origin/master`
> - The benefit is a focused audit-trail PR that can merge independently
of the sidebar/operator UI polish branch

## What Changed

- Adds readable workspace-change activity details to issue update
activity events.
- Surfaces workspace-change events in issue chat/timeline rendering.
- Makes the existing issue comment migration idempotent.
- Folds and renders stale disposition notices inline so they match
activity-log styling and spacing.
- Adds focused route, timeline, and issue-thread system notice coverage.

## Verification

- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run
server/src/__tests__/issue-activity-events-routes.test.ts
ui/src/lib/issue-timeline-events.test.ts
ui/src/components/IssueChatThreadSystemNotice.test.tsx` — 3 files
passed, 22 tests passed.
- Confirmed the PR changes 9 files and does not include `pnpm-lock.yaml`
or `.github/workflows/*`.
- `pnpm exec vitest run
server/src/__tests__/issue-closed-workspace-routes.test.ts` — 1 file
passed, 4 tests passed.
- `pnpm exec vitest run
server/src/__tests__/issue-activity-events-routes.test.ts
ui/src/lib/issue-timeline-events.test.ts
ui/src/components/IssueChatThreadSystemNotice.test.tsx
server/src/services/recovery/successful-run-handoff.test.ts
packages/shared/src/validators/issue.test.ts` — 5 files passed, 54 tests
passed.
- `pnpm --filter @paperclipai/shared typecheck && pnpm --filter
@paperclipai/server typecheck && pnpm --filter @paperclipai/ui
typecheck`.
- `pnpm --filter @paperclipai/ui typecheck` after adding the Storybook
screenshot fixture.
- Captured Storybook screenshots for the new UI rendering paths:
  - Collapsed stale notice + workspace-change row:
    `docs/pr-screenshots/pr-5356/issue-thread-notices-collapsed.png`
  - Expanded stale notice details:
    `docs/pr-screenshots/pr-5356/issue-thread-notices-expanded.png`


### Screenshots

Collapsed stale notice with workspace-change row:

![Collapsed stale notice with workspace-change
row](docs/pr-screenshots/pr-5356/issue-thread-notices-collapsed.png)

Expanded stale notice details:

![Expanded stale notice
details](docs/pr-screenshots/pr-5356/issue-thread-notices-expanded.png)

## Risks

- Moderate risk: this touches issue activity serialization and
issue-thread rendering, both of which are central operator surfaces.
- Migration risk is low: the only migration change makes an existing
migration idempotent.
- No new migrations are introduced, so there is no cross-PR migration
ordering requirement.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, shell/tool-use enabled, used to
split the existing branch, verify the isolated PR branch, and create
this PR.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-06 09:00:54 -05:00
Dotta 68f69975a4 Harden control-plane safety and issue identifiers (#5292)
## Thinking Path

> - Paperclip relies on issue identifiers, execution policies, and agent
heartbeat rules to keep autonomous work auditable.
> - Safety checks need to reject ambiguous agent handoffs, and
identifier parsing needs to support Cloud tenant prefixes.
> - Agent instructions also need to make final-disposition rules
explicit so work does not stall in vague states.
> - This pull request isolates backend correctness and governance
hardening from the UI and recovery-system-notice branches.
> - The benefit is safer in-review transitions, better identifier
compatibility, and clearer agent operating contracts.

## What Changed

- Fixed run-aware confirmation ordering and interrupted-run state
cleanup.
- Added Cloud tenant identity bootstrap and alphanumeric issue
identifier support across shared parsing and server routes.
- Rejected agent-authored `in_review` updates unless a real review path
exists.
- Tightened heartbeat disposition instructions in adapter
utilities/default AGENTS/Paperclip skill.
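
A parser that accepts alphanumeric prefixes might look like the following. The `PREFIX-NUMBER` shape matches identifiers such as `PAP-3368`. The accepted character set, the case normalization, and the `parseIssueIdentifier` name are assumptions, not the shared `issue-references` implementation.

```typescript
// Hypothetical parsed form of an identifier such as "PAP-3368".
export interface IssueIdentifier {
  prefix: string;
  number: number;
}

// Prefix: a letter followed by letters or digits (alphanumeric, e.g. a
// tenant-derived prefix), then "-" and a positive integer without leading zeros.
const IDENTIFIER_RE = /^([A-Za-z][A-Za-z0-9]*)-([1-9][0-9]*)$/;

export function parseIssueIdentifier(raw: string): IssueIdentifier | null {
  const match = IDENTIFIER_RE.exec(raw.trim());
  if (!match) return null;
  return { prefix: match[1].toUpperCase(), number: Number(match[2]) };
}
```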

## Verification

- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run packages/shared/src/issue-references.test.ts
server/src/__tests__/issue-identifier-routes.test.ts
server/src/__tests__/issue-execution-policy-routes.test.ts
packages/adapter-utils/src/server-utils.test.ts`: on the initial run, the
first execution-policy test hit Vitest's default 5s timeout under the
parallel bundle; all other tests passed.
- `pnpm exec vitest run
server/src/__tests__/issue-execution-policy-routes.test.ts
--testTimeout=20000` passed with 10/10 tests.

- Follow-up: `pnpm run typecheck:build-gaps` passed.
- Follow-up: `pnpm --filter @paperclipai/ui typecheck` passed.
- Follow-up: `pnpm vitest run
server/src/__tests__/issue-comment-reopen-routes.test.ts
server/src/__tests__/company-portability.test.ts
server/src/__tests__/costs-service.test.ts` passed.
- Follow-up: `pnpm vitest run ui/src/context/LiveUpdatesProvider.test.ts
ui/src/lib/issue-chat-messages.test.ts
ui/src/lib/issue-reference.test.ts
ui/src/lib/issue-timeline-events.test.ts` passed.

## Risks

- Medium control-plane risk: in-review update validation changes agent
behavior. The error message is explicit and tests cover allowed review
paths.

## Model Used

- OpenAI GPT-5 Codex via Paperclip `codex_local` adapter, with
shell/git/GitHub CLI tool use.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-06 07:49:47 -05:00
Dotta a1b30c9f35 Add planning mode for issue work (#5353)
## Thinking Path

> - Paperclip is a control plane for autonomous AI companies.
> - Issues are the core unit of work, and issue comments are how board
users and agents coordinate execution.
> - Some issue conversations need to produce plans and approvals instead
of immediate implementation work.
> - The existing issue contract did not distinguish standard execution
comments from planning-oriented issue work.
> - This pull request adds an issue work-mode contract and board UI
affordances for standard vs planning mode.
> - The benefit is that planning-mode issues can be created, displayed,
discussed, and carried through agent heartbeat context without losing
the normal issue workflow.

## What Changed

- Added `standard` / `planning` issue work-mode contracts across DB,
shared validators/types, server issue flows, plugin protocol, and
adapter heartbeat payloads.
- Added an idempotent `0081_optimal_dormammu` migration for
`issues.work_mode`, ordered after current `public-gh/master` migrations.
- Updated heartbeat/context summaries and issue-thread interaction
behavior so planning work mode is preserved when creating suggested
follow-up issues.
- Added UI support for planning-mode issue creation, issue rows, detail
composer styling, and composer work-mode toggles.
- Added focused server/shared/UI tests plus a Playwright visual
verification spec for planning-mode surfaces.
- Rebased the branch onto current `public-gh/master` and added durable
planning-mode screenshots under `doc/assets/pap-3368/`.

## Verification

- `pnpm --filter @paperclipai/db run check:migrations`
- `pnpm exec vitest run --project @paperclipai/shared
packages/shared/src/validators/issue.test.ts`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/heartbeat-context-summary.test.ts
server/src/__tests__/issue-thread-interactions-service.test.ts
server/src/__tests__/issues-goal-context-routes.test.ts --pool=forks
--poolOptions.forks.isolate=true`
- `pnpm exec vitest run --project @paperclipai/ui
ui/src/components/IssueChatThread.test.tsx
ui/src/components/NewIssueDialog.test.tsx
ui/src/components/IssueRow.test.tsx ui/src/pages/IssueDetail.test.tsx`
- `pnpm exec vitest run --project @paperclipai/adapter-utils
packages/adapter-utils/src/server-utils.test.ts`
- `PAPERCLIP_E2E_SKIP_LLM=true npx playwright test --config
tests/e2e/playwright.config.ts
tests/e2e/planning-mode-visual-verification.spec.ts`

## Screenshots

Desktop planning detail:

![Desktop planning
detail](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-3368-plan-a-planning-mode-for-issues/doc/assets/pap-3368/desktop-planning-detail.png)

Desktop planning row:

![Desktop planning
row](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-3368-plan-a-planning-mode-for-issues/doc/assets/pap-3368/desktop-planning-row.png)

Desktop staged standard toggle:

![Desktop staged standard
toggle](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-3368-plan-a-planning-mode-for-issues/doc/assets/pap-3368/desktop-standard-toggle.png)

Mobile planning detail:

![Mobile planning
detail](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-3368-plan-a-planning-mode-for-issues/doc/assets/pap-3368/mobile-planning-detail.png)

Mobile planning row:

![Mobile planning
row](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-3368-plan-a-planning-mode-for-issues/doc/assets/pap-3368/mobile-planning-row.png)

## Risks

- Medium migration risk: this adds a non-null issue column. The
migration uses `ADD COLUMN IF NOT EXISTS` so installations that applied
an older branch-local migration number can still apply the final
numbered migration safely.
- Medium contract risk: issue payloads, plugin payloads, and adapter
heartbeat payloads now include work mode; compatibility is handled by
defaulting missing values to `standard`.
- UI risk is moderate because composer controls changed; focused
component tests and visual e2e coverage exercise standard vs planning
display and toggle behavior.
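
The compatibility rule of defaulting missing values to `standard` can be sketched as a small normalizer applied wherever a payload is read. The sketch also falls back to `standard` for unrecognized values. That fallback and the `normalizeWorkMode` name are assumptions.

```typescript
export const ISSUE_WORK_MODES = ["standard", "planning"] as const;
export type IssueWorkMode = (typeof ISSUE_WORK_MODES)[number];

// Older payloads omit work mode entirely; treat missing (and, by assumption,
// unknown) values as "standard" so existing issues keep their current behavior.
export function normalizeWorkMode(value: unknown): IssueWorkMode {
  return (ISSUE_WORK_MODES as readonly unknown[]).includes(value)
    ? (value as IssueWorkMode)
    : "standard";
}
```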

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent in a local Paperclip worktree, with
shell/tool use. Exact context-window size is not exposed in this
runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-06 07:01:28 -05:00
Dotta 320fd5d23b Add full company search page (#5293)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Operators need to find work, documents, agents, projects, comments,
and activity across a company without jumping through separate surfaces.
> - The existing Command-K flow was useful for fast navigation but not
enough for deeper company-wide discovery.
> - Search also needs company-scoped backend contracts, query cost
controls, and indexed document matching so it stays safe as company data
grows.
> - This pull request adds a full company search API and a dedicated
board search page that Command-K can hand off to.
> - The benefit is a single searchable control-plane surface with richer
result context, recents, highlights, and test coverage across server and
UI behavior.

## What Changed

- Added a company-scoped search endpoint/service with query validation,
rate limiting, text matching, fuzzy title matching, and result typing
shared through `@paperclipai/shared`.
- Added idempotent search migrations for document search indexes and
fuzzy matching support.
- Added the full `/companies/:companyKey/search` UI, search result row
components, highlighted snippets, recent searches, and sidebar/Command-K
handoff.
- Added Storybook coverage for search surfaces and Vitest coverage for
server search behavior, rate limiting, route generation, Command-K
behavior, and the search page.
- Addressed Greptile findings by renaming the no-match SQL helper,
applying search pagination after cross-type merge sorting, and
lazy-initializing the default search service so unrelated route-test
mocks do not need to know about it.
- Merged current `public-gh/master` and renumbered the search migrations
behind upstream `0078_white_darwin`: search indexes are now
`0079_company_search_document_indexes` and fuzzy matching is
`0080_company_search_fuzzystrmatch`.
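
The pagination fix can be illustrated with a minimal sketch. Per-type result lists are merged and sorted globally, and only then is the requested page sliced. Slicing each list first would return the wrong items once `offset > 0`. `SearchHit` and `paginateMergedHits` are hypothetical names, and the tie-break order is an assumption.

```typescript
// Minimal hit shape; the real shared search types carry more fields.
interface SearchHit {
  type: string;
  id: string;
  score: number;
}

// Merge all result types, sort once by score (ties broken deterministically),
// then apply offset/limit to the merged list.
export function paginateMergedHits(perType: SearchHit[][], offset: number, limit: number): SearchHit[] {
  const merged = perType.flat();
  merged.sort((a, b) => b.score - a.score || a.type.localeCompare(b.type) || a.id.localeCompare(b.id));
  return merged.slice(offset, offset + limit);
}
```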

## Verification

- `git fetch public-gh master`
- `git diff --check public-gh/master...HEAD`
- `git diff --name-only public-gh/master...HEAD | rg '^pnpm-lock\.yaml$'
|| true` produced no output before opening the PR.
- `pnpm run preflight:workspace-links && pnpm exec vitest run
server/src/__tests__/company-search-service.test.ts
server/src/__tests__/company-search-rate-limit-routes.test.ts
ui/src/pages/Search.test.tsx ui/src/components/CommandPalette.test.tsx
ui/src/lib/company-routes.test.ts` passed: 5 files, 25 tests.
- `pnpm --filter @paperclipai/shared typecheck && pnpm --filter
@paperclipai/db typecheck && pnpm --filter @paperclipai/server typecheck
&& pnpm --filter @paperclipai/ui typecheck` passed.
- `pnpm exec vitest run
server/src/__tests__/company-search-service.test.ts
server/src/__tests__/company-search-rate-limit-routes.test.ts && pnpm
--filter @paperclipai/server typecheck` passed after Greptile pagination
fixes.
- `pnpm exec vitest run
server/src/__tests__/issue-agent-mutation-ownership-routes.test.ts
server/src/__tests__/company-search-rate-limit-routes.test.ts
server/src/__tests__/company-search-service.test.ts && pnpm --filter
@paperclipai/server typecheck` passed after the CI mock fix.
- After resolving the migration conflict with current
`public-gh/master`: `pnpm --filter @paperclipai/db typecheck && pnpm
exec vitest run server/src/__tests__/company-search-service.test.ts
server/src/__tests__/company-search-rate-limit-routes.test.ts && pnpm
--filter @paperclipai/server typecheck` passed.
- DB migration numbering check passed as part of `@paperclipai/db`
typecheck.
- UI states are covered by the added Storybook stories in
`ui/storybook/stories/search.stories.tsx`.
- GitHub reports the PR merge state as `CLEAN` on head `18e54fa8`.
- GitHub PR checks are green on head `18e54fa8`: policy, verify,
serialized server shards 1/4 through 4/4, e2e, canary dry run, Snyk, and
Greptile Review.

## Risks

- Search ranking and snippets are new user-facing behavior, so reviewers
should check whether result ordering feels right on real company data.
- Search touches broad company data, so company scoping and query
cost/rate-limit behavior should be reviewed carefully.
- The migrations add search indexes/extensions; they are idempotent with
`IF NOT EXISTS` for users who may have applied an earlier branch
migration number.
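
When reviewing the rate-limit behavior, a fixed-window limiter keyed by company is one simple model to compare against. The sketch below is that model only. Keying by company, the limit and window values, and the `CompanySearchRateLimiter` name are hypothetical, not the route's real configuration.

```typescript
// Fixed-window limiter: each company gets `limit` requests per `windowMs`.
// The clock is injectable so behavior can be tested deterministically.
export class CompanySearchRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the request if it fits in the current window.
  tryAcquire(companyId: string): boolean {
    const t = this.now();
    const w = this.windows.get(companyId);
    if (!w || t - w.start >= this.windowMs) {
      this.windows.set(companyId, { start: t, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}
```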

> ROADMAP.md checked. This PR adds a focused board search surface and
does not duplicate an open roadmap item.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool-enabled shell/git/GitHub CLI
session with medium reasoning effort. Existing branch commits were
produced across prior agent sessions; this packaging pass verified,
opened the PR, addressed Greptile findings, resolved migration conflicts
after upstream PRs landed, and got PR checks green.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 06:32:37 -05:00
Dotta 424e81d087 Improve operator workflow QoL (#5291)
## Thinking Path

> - Paperclip is a control plane operators use repeatedly to supervise
agent companies.
> - Common operator workflows depend on fast scanning of inboxes, issue
sidebars, workspaces, cost totals, and runtime services.
> - Several small UI and service gaps made those workflows slower or
less clear.
> - This pull request groups the operator-facing QoL changes that can
stand alone from recovery and adapter work.
> - The benefit is a denser, clearer board experience for issue triage
and workspace operation.

## What Changed

- Added inbox assignee/project grouping and issue list token/runtime
totals.
- Improved issue properties with removable blocker chips and workspace
task links.
- Improved execution workspace layout, runtime controls, issues tab
default, and stopped-port reuse behavior.
- Added mobile markdown/routine dialog fixes, page title company names,
sidebar polish, and dashboard run task label cleanup.
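
Inbox grouping and list totals reduce to a group-by and a sum. This is a sketch only. The `InboxIssue` shape, the fallback bucket labels, and the function names are assumptions rather than the code in `ui/src/lib/inbox.ts`.

```typescript
interface InboxIssue {
  id: string;
  assigneeName: string | null;
  projectName: string | null;
  tokens: number;
  runtimeMs: number;
}

type GroupKey = "assignee" | "project";

// Groups issues by assignee or project, preserving input order inside each
// group; issues without a value go to a fallback bucket (labels assumed).
export function groupInbox(issues: InboxIssue[], by: GroupKey): Map<string, InboxIssue[]> {
  const groups = new Map<string, InboxIssue[]>();
  for (const issue of issues) {
    const key = by === "assignee" ? (issue.assigneeName ?? "Unassigned") : (issue.projectName ?? "No project");
    const bucket = groups.get(key);
    if (bucket) bucket.push(issue);
    else groups.set(key, [issue]);
  }
  return groups;
}

// Sums token and runtime totals for a list of issues.
export function listTotals(issues: InboxIssue[]): { tokens: number; runtimeMs: number } {
  return issues.reduce(
    (acc, i) => ({ tokens: acc.tokens + i.tokens, runtimeMs: acc.runtimeMs + i.runtimeMs }),
    { tokens: 0, runtimeMs: 0 },
  );
}
```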

## Verification

- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run ui/src/lib/inbox.test.ts
ui/src/components/IssueProperties.test.tsx
ui/src/components/WorkspaceRuntimeControls.test.tsx
server/src/__tests__/workspace-runtime.test.ts
server/src/__tests__/costs-service.test.ts`

## Risks

- Medium UI risk because this touches several operator surfaces. The
branch is intentionally grouped around workflow/QoL files and keeps the
file count below the Greptile limit.

## Model Used

- OpenAI GPT-5 Codex via Paperclip `codex_local` adapter, with
shell/git/GitHub CLI tool use.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-06 06:30:44 -05:00
Dotta 11ffd6f2c5 Improve ACPX adapter configuration (#5290)
## Thinking Path

> - Paperclip orchestrates AI agents across several adapter
implementations.
> - ACPX is a local adapter path that can proxy Claude and Codex-style
execution.
> - Its configuration needed stronger schema defaults, provider-aware
model handling, and better UI support.
> - Plugin authors also need clear docs for managed resources.
> - This pull request improves ACPX adapter configuration and documents
plugin-managed resources.
> - The benefit is a more predictable adapter setup path without
changing unrelated control-plane behavior.

## What Changed

- Improved ACPX config schema, execution config handling, UI build
config, and route coverage.
- Added ACPX model filtering support and tests.
- Updated the agent config form and Storybook coverage for ACPX
model/provider behavior.
- Expanded plugin authoring documentation for managed resources.

## Verification

- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run server/src/__tests__/acpx-local-execute.test.ts
server/src/__tests__/adapter-routes.test.ts
ui/src/lib/acpx-model-filter.test.ts`

## Risks

- Low-to-medium risk: adapter configuration behavior changes can affect
ACPX users, but the change is isolated to ACPX/plugin-doc surfaces and
covered by targeted adapter tests.

## Model Used

- OpenAI GPT-5 Codex via Paperclip `codex_local` adapter, with
shell/git/GitHub CLI tool use.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-06 06:06:47 -05:00
Dotta 454edfe81e Add recovery handoff system notices (#5289)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Agent runs can end productively while the source issue still lacks a
durable final disposition.
> - That leaves the control plane unsure whether to resume, escalate, or
close the work.
> - Issue comments also need a presentation contract so system-authored
recovery notices can render as first-class thread messages without
overloading normal comments.
> - This pull request adds successful-run handoff recovery, comment
presentation metadata, and system notice rendering.
> - The benefit is stricter task liveness with clearer operator-facing
recovery state.

## What Changed

- Added successful-run handoff decisions, wake payloads, escalation
behavior, and recovery tests.
- Added issue comment presentation metadata with migration
`0078_white_darwin.sql` and shared/server/company portability support.
- Rendered recovery/system notices in issue chat with dedicated UI
components, fixtures, tests, and Storybook/lab coverage.
- Included the current recovery model-profile hint patch so automatic
recovery follow-ups use the cheap profile.

## Verification

- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run
server/src/services/recovery/successful-run-handoff.test.ts
ui/src/components/SystemNotice.test.tsx
ui/src/lib/system-notice-comment.test.ts
ui/src/components/IssueChatThreadSystemNotice.test.tsx`

## Risks

- Migration-bearing PR: merge this before any other branch that might
later add a migration.
- The branch touches both recovery services and issue-thread rendering,
so review should pay attention to recovery wake idempotency and comment
metadata compatibility.

## Model Used

- OpenAI GPT-5 Codex via Paperclip `codex_local` adapter, with
shell/git/GitHub CLI tool use.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-06 06:05:58 -05:00
Devin Foley 50db8c01d2 Serialize sandbox callback bridge against concurrent heartbeats (#5326)
> **Stacked PR.** This PR's branch carries cumulative content from #5324
(bridge allowlist expand) and #5325 (env sanitization) — the
mutex/sha256 logic in this PR sits on top of both. Reviewers should
focus on the files this PR's commit touches:
`packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`,
`packages/adapter-utils/src/ssh.ts`, and
`packages/adapter-utils/src/ssh-fixture.test.ts`. Will rebase onto
`master` and force-push once both prerequisite PRs are merged.

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent that runs in a sandbox or via SSH talks back to the
Paperclip server through a per-lease callback bridge whose entrypoint
script is uploaded to the remote
> - When two heartbeats target the same agent on the same machine
concurrently, both upload the bridge entrypoint and both write to the
same response files — producing torn-write races: `SyntaxError:
Identifier 'randomUUID' has already been declared` from a concatenated
upload, `mv: cannot stat …` from colliding `.json.tmp` writes, and
0-byte commits from a truncated stdin
> - This pull request serializes those operations with a POSIX
`mkdir`-mutex (PID liveness check + atomic rename) at the bridge
entrypoint upload, applies the same lock to the bridge response writer,
forwards stdin into remote ssh commands so the entrypoint payload
arrives intact, and verifies a sha256 of the upload before promoting it
> - The benefit is concurrent heartbeats no longer corrupt each other's
bridge state
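
The locking technique named above can be illustrated locally. The sketch below is a hypothetical Node version of a `mkdir`-mutex with a PID liveness check, not the bridge's actual code (which runs in the remote shell): `mkdir` either creates the lock directory or fails with `EEXIST`, so exactly one contender wins; the winner records its PID, and a contender that finds a lock owned by a dead PID reclaims it. The names `tryAcquireLock`, `releaseLock`, and the `owner.pid` file are assumptions.

```typescript
import { mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// True if a process with this PID appears to exist. Signal 0 delivers nothing;
// it only performs the existence and permission check.
export function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but is owned by another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Hypothetical helper. mkdir is atomic, so at most one caller creates lockDir.
export function tryAcquireLock(
  lockDir: string,
  pid: number = process.pid,
  alive: (pid: number) => boolean = isPidAlive,
): boolean {
  try {
    mkdirSync(lockDir);
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
    let owner = NaN;
    try {
      owner = Number(readFileSync(join(lockDir, "owner.pid"), "utf8"));
    } catch {
      return false; // holder has not recorded its PID yet; treat the lock as held
    }
    if (Number.isInteger(owner) && alive(owner)) return false;
    // Owner is dead (or the PID file is garbage): reclaim the stale lock.
    // Best-effort only; two contenders reclaiming at the same instant can race.
    rmSync(lockDir, { recursive: true, force: true });
    try {
      mkdirSync(lockDir);
    } catch {
      return false; // someone else reclaimed it first
    }
  }
  writeFileSync(join(lockDir, "owner.pid"), String(pid));
  return true;
}

export function releaseLock(lockDir: string): void {
  rmSync(lockDir, { recursive: true, force: true });
}
```

A caller would typically retry `tryAcquireLock` with a short sleep until it returns `true`, do its upload, and call `releaseLock` in a `finally`. The liveness check is what keeps a crashed heartbeat from wedging the lock forever.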

## What Changed

- `packages/adapter-utils/src/sandbox-callback-bridge.ts`: serialize
entrypoint upload and response writes via POSIX `mkdir`-mutex with PID
liveness; sha256 the upload before promoting via `mv`; content-skip when
the existing entrypoint already matches
- `packages/adapter-utils/src/ssh.ts`: forward stdin into remote ssh
commands through the SSH managed runtime so `cat > "$remote_upload"`
actually receives the base64-encoded entrypoint
- `packages/adapter-utils/src/ssh-fixture.test.ts`: cover the
stdin-forwarded SSH path
- `packages/adapter-utils/src/sandbox-callback-bridge.test.ts`: cover
the mutex, content-skip, sha256-verify, and atomic-rename paths
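
The content-skip, sha256-verify, and rename steps listed above can be sketched on a local filesystem as follows. `promoteVerifiedUpload` and `sha256Hex` are hypothetical names; the real bridge performs the equivalent steps on the remote target, so treat this as an illustration of the ordering rather than the implementation.

```typescript
import { createHash } from "node:crypto";
import { existsSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";

export function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

export type PromoteResult = "skipped" | "promoted";

// Hypothetical helper: install `payload` at `target` unless an identical copy
// is already there, verifying the staged bytes before making them visible.
export function promoteVerifiedUpload(target: string, payload: string): PromoteResult {
  const expected = sha256Hex(payload);

  // Content-skip: the existing entrypoint already matches.
  if (existsSync(target) && sha256Hex(readFileSync(target)) === expected) {
    return "skipped";
  }

  const staged = `${target}.${process.pid}.tmp`;
  try {
    writeFileSync(staged, payload);
    // Hash what actually landed on disk; a torn or truncated write fails here.
    const actual = sha256Hex(readFileSync(staged));
    if (actual !== expected) {
      throw new Error(`torn upload: expected sha256 ${expected}, got ${actual}`);
    }
    // rename(2) on the same filesystem replaces the target atomically, so
    // readers see either the old file or the complete new one.
    renameSync(staged, target);
    return "promoted";
  } finally {
    rmSync(staged, { force: true }); // no-op after a successful rename
  }
}
```

Verifying the staged copy before the rename means a truncated payload fails with an explicit error instead of being promoted as a 0-byte entrypoint.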

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils`
- `pnpm typecheck` clean
- Manual: two parallel heartbeats targeting the same SSH agent no longer
race on the bridge entrypoint or response files

## Risks

Medium. Serializing previously-parallel operations adds latency on the
contended path (one heartbeat waits on another), bounded by the
entrypoint upload time. The mutex includes PID liveness so a crashed
heartbeat doesn't deadlock subsequent ones. Sha256-verify gives a clear
"torn upload" failure mode instead of silent 0-byte commits.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — tests cover mutex
+ sha256-verify + stdin-forwarded ssh
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 20:01:04 -07:00
Devin Foley f6bad8f6bf Sanitize remote execution envs at the boundary (#5325)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Adapters spawn CLIs against local, SSH, and sandbox targets,
threading a runtime env through `runAdapterExecutionTargetProcess` and
the SSH/sandbox runners
> - Host identity vars (HOME, TMPDIR, XDG_*, NVM_DIR, PATH) routinely
leak into the env we send to remote targets — sometimes via test probes,
sometimes via runtime config — and break sandboxed/SSH'd CLIs whose own
profiles set those values correctly
> - The sanitization logic existed but lived alongside other helpers in
`server-utils.ts` and was applied piecemeal at adapter callsites, so it
was easy to bypass
> - This pull request lifts the sanitization into a standalone
`remote-execution-env.ts`, applies it at the SSH and sandbox runtime
boundary so every remote spawn goes through it, and removes the
duplicated callsite-level filtering
> - The benefit is identity-bound host env stops leaking across
SSH/sandbox transports regardless of which adapter calls in

## What Changed

- `packages/adapter-utils/src/remote-execution-env.ts`: new module —
single source of truth for which env keys are identity-bound and how to
strip them when the value matches the host's value
- `packages/adapter-utils/src/server-utils.ts`: remove the inline
sanitization (now in `remote-execution-env.ts`)
- `packages/adapter-utils/src/execution-target.ts`: apply sanitization
at the sandbox runtime boundary
- `packages/adapter-utils/src/ssh.ts`: apply sanitization at the SSH
spawn boundary
- `packages/adapters/opencode-local/src/server/test.ts`: drop
now-redundant callsite filtering
- `packages/adapters/pi-local/src/server/test.ts`: drop now-redundant
callsite filtering
- New tests `execution-target.test.ts` and
`execution-target-sandbox.test.ts` cover the sanitizer flow at both
transports, including positive cases (host-shaped path stripped) and
explicit-override preservation
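
The rule described above (strip an identity-bound key only when its value equals the host's) fits in a few lines. This is a hypothetical sketch: `sanitizeRemoteEnv` is an illustrative name, and the key list only mirrors the examples named in this PR (`HOME`, `TMPDIR`, `XDG_*`, `NVM_DIR`, `PATH`); the real `remote-execution-env.ts` may use a different list or signature.

```typescript
// Keys that describe the *host* identity. Mirrors the PR's examples only.
const IDENTITY_KEYS = new Set(["HOME", "TMPDIR", "NVM_DIR", "PATH"]);

function isIdentityBound(key: string): boolean {
  return IDENTITY_KEYS.has(key) || key.startsWith("XDG_");
}

// Hypothetical sanitizer: drop an identity-bound key when its value is the
// host's own value (it leaked in), but keep it when it was deliberately set
// to something else (an explicit override).
export function sanitizeRemoteEnv(
  env: Record<string, string | undefined>,
  hostEnv: Record<string, string | undefined> = process.env,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue;
    if (isIdentityBound(key) && hostEnv[key] === value) continue; // leaked host value
    out[key] = value;
  }
  return out;
}
```

Comparing against the host value, rather than dropping these keys unconditionally, is what lets a deliberately configured `PATH` survive while a leaked one is removed.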

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils
--project @paperclipai/adapter-opencode-local --project
@paperclipai/adapter-pi-local`
- `pnpm typecheck` clean

## Risks

Low–medium. The sanitization is now applied at one layer (the boundary)
instead of N (the callsites), so behavior is more consistent. Any adapter
that previously relied on a leaked host var reaching the remote shell
will now see it stripped, but those dependencies are exactly what this
change exists to fix.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests at both
transports
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 19:30:14 -07:00
Devin Foley 36eaf9778f Expand sandbox callback bridge allowlist to cover the documented heartbeat surface (#5324)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - When an agent runs in an e2b sandbox or other non-managed
environment, it talks back to the Paperclip server through a per-lease
callback bridge that proxies HTTP requests
> - The bridge has an allowlist of method/path patterns it will forward;
anything outside the list is rejected to keep the bridge tight
> - The allowlist had drifted behind what the heartbeat documentation
describes as the supported callback surface — several documented
endpoints (issue updates, agent-side log emit, work-status writes) were
being rejected at the bridge
> - This pull request expands the allowlist to cover the documented
heartbeat surface and adds tests that pin every newly-allowed pattern,
so the doc and the bridge stay in sync
> - The benefit is sandboxed runs no longer hit "method not allowed" /
"path not allowed" rejections on the documented set of callbacks

## What Changed

- `packages/adapter-utils/src/sandbox-callback-bridge.ts`: expand the
method/path allowlist to match the documented heartbeat callback surface
- `packages/adapter-utils/src/sandbox-callback-bridge.test.ts`: add
coverage for every newly-allowed pattern, plus negative cases for
patterns that should still be rejected
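
A method/path allowlist of this kind can be sketched as a list of patterns compiled to anchored regular expressions. The routes below are placeholders invented for illustration, not the bridge's real allowlist, and `isAllowed` is a hypothetical name.

```typescript
type Method = "GET" | "POST" | "PUT" | "PATCH" | "DELETE";

interface AllowRule {
  method: Method;
  pattern: string; // `:name` segments match exactly one path segment
}

// Placeholder routes for illustration only; not the bridge's real allowlist.
const ALLOWLIST: AllowRule[] = [
  { method: "PATCH", pattern: "/api/issues/:issueId" },
  { method: "POST", pattern: "/api/runs/:runId/log" },
];

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function compile(pattern: string): RegExp {
  const body = pattern
    .split("/")
    .map((seg) => (seg.startsWith(":") ? "[^/]+" : escapeRegExp(seg)))
    .join("/");
  return new RegExp(`^${body}$`); // anchored: no prefix or suffix matches
}

const COMPILED = ALLOWLIST.map((rule) => ({ method: rule.method, re: compile(rule.pattern) }));

export function isAllowed(method: string, rawPath: string): boolean {
  const path = rawPath.split("?")[0];
  // Reject dot segments so "/api/issues/.." cannot satisfy a `:param`.
  if (path.split("/").some((seg) => seg === "." || seg === "..")) return false;
  const upper = method.toUpperCase();
  return COMPILED.some((rule) => rule.method === upper && rule.re.test(path));
}
```

Anchoring each pattern and letting `:param` match a single segment keeps an allowed route from accidentally admitting deeper paths, which is the kind of behavior negative test cases can pin down.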

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils`
- `pnpm typecheck` clean
- Manual: previously-rejected callbacks from sandboxed runs now succeed
end-to-end

## Risks

Low. The allowlist only grows; nothing previously allowed is now
blocked. Tests pin both the new allowed patterns and that out-of-doc
patterns stay rejected.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests cover
added patterns + still-rejected negatives
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 19:30:11 -07:00
Devin Foley 9fb0c73e0a Raise gemini-local hello probe timeout to 60s for SSH and E2B targets (#5322)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The Gemini adapter's environment Test surfaces a hello probe so
operators can confirm the CLI runs end-to-end on the configured target
> - On SSH and E2B sandbox targets the round-trip cost (login-shell
sourcing, network, model warm-up) routinely exceeds the existing 10s
probe timeout, so the probe spuriously fails on environments that are
actually healthy
> - This pull request raises the gemini-local hello probe timeout to
60s, matching the timeout we use for slower-bootstrapping adapters
> - The benefit is the Gemini Test action no longer reports false
negatives on remote targets that need a longer first-run window

## What Changed

- `packages/adapters/gemini-local/src/server/test.ts`: hello probe
timeout raised from 10s to 60s
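
A probe timeout like this one is commonly implemented as a race between the probe and a timer. The helper below is a generic sketch, not the Gemini adapter's code; `withTimeout`, `ProbeTimeoutError`, and `HELLO_PROBE_TIMEOUT_MS` are hypothetical names.

```typescript
// The 60s ceiling described in this PR; the constant name is hypothetical.
export const HELLO_PROBE_TIMEOUT_MS = 60_000;

export class ProbeTimeoutError extends Error {
  constructor(ms: number) {
    super(`hello probe timed out after ${ms}ms`);
    this.name = "ProbeTimeoutError";
  }
}

// Resolve with the probe's result, or reject once `ms` elapses. The timer is
// cleared either way so it cannot keep the process alive.
export function withTimeout<T>(probe: Promise<T>, ms: number = HELLO_PROBE_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new ProbeTimeoutError(ms)), ms);
  });
  return Promise.race([probe, timeout]).finally(() => clearTimeout(timer));
}
```

The race only stops waiting; it does not kill the probe, so a real implementation would also terminate the underlying process when the timer fires.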

## Verification

- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-gemini-local`
- Manual: SSH and E2B Gemini hello probes now complete cleanly without
spurious timeouts

## Risks

Low. A 60s ceiling on a non-blocking probe is consistent with sibling
adapters; the only behavior change is a longer worst-case wait when the
probe genuinely hangs.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — N/A (one-line
timeout change)
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 19:30:04 -07:00
Dotta d6d7a7cea6 Add routine revision history and restore flow (#5285)
## Thinking Path

> - Paperclip is the control plane for autonomous AI companies.
> - Routines are the scheduled/recurring work surface that keeps a
company operating without manual kicks.
> - Operators need routine edits to be auditable and recoverable,
especially when routines control assignments, prompts, triggers, and
webhook secrets.
> - Documents already have revision-style safety, but routines did not
have equivalent history or restore semantics.
> - This pull request adds append-only routine revisions across the
database, shared contracts, server routes, and board UI.
> - The benefit is safer routine iteration: users can inspect history,
compare changes, restore older definitions, and avoid overwriting newer
edits.

## What Changed

- Added `routine_revisions` storage, latest revision pointers on
routines, shared types, validators, and API docs for routine revision
history.
- Added server service/route support for listing routine revisions,
conflict-aware routine saves, and append-only restore operations.
- Added a History tab on routine detail with revision preview,
structured change summaries, description line diffs, dirty-edit
blocking, restore confirmation, and restored webhook secret surfacing.
- Extracted the line diff helper from `DocumentDiffModal` into
`ui/src/lib/line-diff.ts` for reuse.
- Rebased the branch onto current `public-gh/master` and renumbered the
routine revision migration to `0077_unusual_karnak` after upstream
`0076_useful_elektra`.
- Made the `0077` routine revision migration idempotent so installs that
already applied the branch-local `0076_unusual_karnak` can safely
advance.
- Updated the plugin SDK test harness routine fixture with the new
revision fields required by the shared `Routine` contract.
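
For readers unfamiliar with line diffs, the classic approach is a longest-common-subsequence table over lines. This is a generic sketch; `diffLines` and the `DiffOp` shape are hypothetical and are not taken from `ui/src/lib/line-diff.ts`.

```typescript
export type DiffOp = { kind: "equal" | "add" | "remove"; line: string };

// LCS-based line diff, O(n*m) time and memory. Fine for routine descriptions;
// Myers' algorithm is the usual choice for large inputs.
export function diffLines(before: string, after: string): DiffOp[] {
  const a = before.split("\n");
  const b = after.split("\n");
  const n = a.length;
  const m = b.length;

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk forward: take matches, otherwise follow the larger subproblem
  // (preferring removals on ties so deletions print before insertions).
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      ops.push({ kind: "equal", line: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "remove", line: a[i++] });
    } else {
      ops.push({ kind: "add", line: b[j++] });
    }
  }
  while (i < n) ops.push({ kind: "remove", line: a[i++] });
  while (j < m) ops.push({ kind: "add", line: b[j++] });
  return ops;
}
```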

## Verification

- `pnpm --filter @paperclipai/db run check:migrations` passed.
- `pnpm exec vitest run --project @paperclipai/shared
packages/shared/src/validators/routine.test.ts` passed.
- `pnpm exec vitest run --project @paperclipai/ui
ui/src/lib/line-diff.test.ts
ui/src/components/RoutineHistoryTab.test.tsx
ui/src/lib/workspace-routines.test.ts ui/src/pages/Routines.test.tsx`
passed.
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/routines-service.test.ts --pool=forks
--poolOptions.forks.isolate=true` passed.
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/routines-routes.test.ts --pool=forks
--poolOptions.forks.isolate=true` passed.
- `pnpm --filter @paperclipai/plugin-sdk typecheck` passed after
updating the SDK test harness fixture.
- `pnpm --filter @paperclipai/plugin-sdk build` passed; this refreshed
local generated SDK output needed by plugin example typechecks.
- `pnpm -r typecheck` passed.

## Risks

- Medium migration risk: this adds routine revision storage and
backfills existing routines. The migration is ordered after upstream
`0076` and uses `IF NOT EXISTS` / duplicate-object guards to tolerate
earlier branch-local migration application.
- Restore behavior intentionally appends a new revision instead of
mutating history; callers expecting an in-place rollback need to follow
the new latest revision pointer.
- Restoring webhook triggers recreates webhook secret material, so users
must copy newly surfaced secrets after restore.
- Conflict-aware saves now reject stale routine edits when the client
sends an older `baseRevisionId`.
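
The restore and conflict semantics described in these risks can be modeled with a small in-memory store. This is a hypothetical sketch (`RoutineHistory`, `StaleRevisionError`, and the `restoredFrom` field are illustrative names), not the server's routine service: saves must name the latest revision as their `baseRevisionId`, and restore appends a copy of an older definition instead of rewinding history.

```typescript
export interface RoutineDefinition {
  title: string;
  prompt: string;
}

export interface RoutineRevision {
  id: number;
  definition: RoutineDefinition;
  restoredFrom?: number;
}

export class StaleRevisionError extends Error {
  readonly latestId: number;
  readonly baseRevisionId: number;
  constructor(latestId: number, baseRevisionId: number) {
    super(`stale edit: based on revision ${baseRevisionId}, latest is ${latestId}`);
    this.name = "StaleRevisionError";
    this.latestId = latestId;
    this.baseRevisionId = baseRevisionId;
  }
}

// Append-only history: revisions are never mutated or deleted.
export class RoutineHistory {
  private readonly revisions: RoutineRevision[] = [];

  constructor(initial: RoutineDefinition) {
    this.revisions.push({ id: 1, definition: { ...initial } });
  }

  get latest(): RoutineRevision {
    return this.revisions[this.revisions.length - 1];
  }

  list(): readonly RoutineRevision[] {
    return this.revisions;
  }

  // Conflict-aware save: reject edits made against an older revision.
  save(baseRevisionId: number, definition: RoutineDefinition): RoutineRevision {
    if (baseRevisionId !== this.latest.id) {
      throw new StaleRevisionError(this.latest.id, baseRevisionId);
    }
    return this.append({ definition: { ...definition } });
  }

  // Restore copies an old definition forward as a brand-new latest revision.
  restore(revisionId: number): RoutineRevision {
    const source = this.revisions.find((r) => r.id === revisionId);
    if (!source) throw new Error(`unknown revision ${revisionId}`);
    return this.append({ definition: { ...source.definition }, restoredFrom: revisionId });
  }

  private append(rev: Omit<RoutineRevision, "id">): RoutineRevision {
    const next: RoutineRevision = { id: this.latest.id + 1, ...rev };
    this.revisions.push(next);
    return next;
  }
}
```

Because restore appends, the restored definition gets a new id, which is why callers that expected an in-place rollback need to follow the latest revision pointer.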


## Model Used

- OpenAI Codex, GPT-5-based coding agent, with shell/tool use in a local
git worktree. Exact context-window size is not exposed in this runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Screenshots: not attached in this draft PR; the new UI flow is covered
by component tests listed above.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-05 11:54:52 -05:00
Devin Foley 9578dc3da7 Wire per-adapter sandbox install commands through test and execute paths (#5280)
> **Stacked PR.** Sits on top of the e2b sandbox chain — #5278 (stdin
staging) and #5279 (honest-resolvability + login-profiles). The
cumulative diff against `master` includes both of those PRs' content;
the files touched by *this* PR's commit are the new
`maybeRunSandboxInstallCommand` helper in
`packages/adapter-utils/src/execution-target.ts` and the per-adapter
`index.ts`/`server/test.ts`/`server/execute.ts` wiring under
`packages/adapters/{claude,codex,cursor,gemini,opencode,pi}-local/`. The
honest resolvability check from #5279 is what gives this PR's install
command a meaningful "did it actually land on PATH" follow-up.

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Sandbox execution targets are ephemeral — each fresh lease starts
from a template image that may or may not have the agent CLIs
preinstalled
> - When a CLI isn't preinstalled, the resolvability probe fails at
`command -v` and the hello probe never runs
> - There's no shared mechanism for "before you probe or provision,
install the CLI on this sandbox"
> - This pull request adds a `SANDBOX_INSTALL_COMMAND` constant per
adapter and a `maybeRunSandboxInstallCommand` helper that runs it via
the existing sandbox login shell, captures structured output, and never
throws (so the resolvability + hello probe still run after); each
adapter's `test()` and `execute()` share the constant so the two
callsites can't drift
> - The benefit is a fresh sandbox lease without a preinstalled CLI now
installs it once via `sh -lc` before the resolvability probe and before
managed-runtime provisioning, with a uniform
`<adapter>_install_command_run` check on the test report

## What Changed

- `packages/adapter-utils/src/execution-target.ts`: add
`AdapterSandboxInstallCommandCheck` and `maybeRunSandboxInstallCommand`
(runs the install via existing sandbox shell, captures
exit/stdout/stderr, returns a structured info/warn check, never throws)
- Add `SANDBOX_INSTALL_COMMAND` to each adapter's `index.ts` so `test()`
and `execute()` share a single source of truth
- Wire each of the 6 affected adapter `testEnvironment()`s to call
`maybeRunSandboxInstallCommand` before
`ensureAdapterExecutionTargetCommandResolvable`
- Pass `installCommand: SANDBOX_INSTALL_COMMAND` through
`prepareAdapterExecutionTargetRuntime` in each adapter's `execute()`
- Per-adapter install commands use npm globals where possible so
binaries land on a PATH segment the template already exports:
  - claude → `npm install -g @anthropic-ai/claude-code`
  - codex → `npm install -g @openai/codex`
  - cursor → `curl https://cursor.com/install -fsS | bash`
  - gemini → `npm install -g @google/gemini-cli`
  - opencode → `npm install -g opencode-ai`
  - pi → `npm install -g @mariozechner/pi-coding-agent`

SSH and local targets ignore `installCommand` (SSH runtime takes no such
param; local short-circuits before runtime prep), so this is a no-op for
non-sandbox environments.
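
The contract described for `maybeRunSandboxInstallCommand` (run through a login shell, capture exit code and output, return an info/warn check, never throw, no-op off-sandbox) can be sketched as below. Everything here is hypothetical: `runSandboxInstallStep`, the runner type, and the check shape are stand-ins, not the exported API in `execution-target.ts`.

```typescript
// Hypothetical stand-ins for the sandbox runner and the test-report check.
export interface CommandResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}
export type RunInSandbox = (command: string) => Promise<CommandResult>;

export interface InstallCheck {
  code: string;
  level: "info" | "warn";
  message: string;
  detail?: string;
}

// POSIX single-quote escaping: close the quote, emit \', reopen.
function shellQuote(s: string): string {
  return `'${s.replace(/'/g, `'\\''`)}'`;
}

// Runs the install command through `sh -lc` and reports the outcome as a
// check. It never throws, so resolvability and hello probes still run
// afterwards. Non-sandbox targets are a no-op (returns null).
export async function runSandboxInstallStep(opts: {
  adapter: string;
  targetKind: "local" | "ssh" | "sandbox";
  installCommand?: string;
  run: RunInSandbox;
}): Promise<InstallCheck | null> {
  if (opts.targetKind !== "sandbox" || !opts.installCommand) return null;
  const code = `${opts.adapter}_install_command_run`;
  try {
    const res = await opts.run(`sh -lc ${shellQuote(opts.installCommand)}`);
    if (res.exitCode === 0) {
      return { code, level: "info", message: `Ran install command: ${opts.installCommand}` };
    }
    return {
      code,
      level: "warn",
      message: `Install command exited with code ${res.exitCode}`,
      // Keep the tail of the output; installers usually print the error last.
      detail: (res.stderr || res.stdout).trim().slice(-2000),
    };
  } catch (err) {
    return { code, level: "warn", message: `Install command could not run: ${String(err)}` };
  }
}
```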

## Verification

- `pnpm typecheck` clean
- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils`
and per-adapter projects pass
- Manual sandbox matrix (claude, codex, cursor, gemini, opencode, pi) —
each goes `install_command_run → resolvable → hello_probe_passed` (Codex
and Pi land on `hello_probe_auth_required`, which is the
configured-credentials problem, not an install issue)
- SSH no-regression: SSH Claude still passes; the helper short-circuits
on non-sandbox targets

## Risks

Medium — adds a network/CPU cost (npm install / curl) on every fresh
sandbox lease. Cost is bounded (one-time per lease, typically tens of
seconds for npm globals), and the helper never throws so a failing
install still lets the report run resolvability and hello probes. If a
sandbox image already has the CLI, the install is an idempotent
reinstall.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 08:29:28 -07:00
Devin Foley af9386f879 Run a real command-v probe and source login profiles before exec in e2b sandboxes (#5279)
> **Stacked PR.** Sits on top of #5278 (`e2b/stage-stdin-to-temp-file`)
which ships the stdin-staging fix this builds on. The cumulative diff
against `master` includes that PR's content; the files touched by *this*
PR's commit are `packages/adapter-utils/src/execution-target.ts`,
`packages/plugins/sandbox-providers/e2b/src/plugin.ts`, and
`packages/plugins/sandbox-providers/e2b/src/plugin.test.ts`.

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The adapter Test flow does an "is the command resolvable?" probe
before running the hello probe so the report distinguishes "binary not
installed" from "binary errored"
> - For sandbox targets, that resolvability check was a no-op
early-return — every sandboxed adapter test reported "Command is
executable" regardless of whether the binary existed
> - That made the resolvability check disagree with the hello probe in a
way that looked like a PATH bug, when it was actually a missing CLI
> - Separately, the e2b spawn used `sandbox.commands.run` with a
non-login, non-interactive shell whose PATH did not include npm globals,
nvm shims, or anything else the template installs via
`.profile`/`.bashrc`
> - This pull request makes the resolvability check honest by running a
real `command -v` invocation through the sandbox runner, and aligns the
e2b spawn with SSH by sourcing login profiles before `exec env KEY=val
<cmd>`
> - The benefit is the e2b sandbox spawn agrees with the hello probe and
finds CLIs at template-installed paths
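
The honest resolvability check amounts to asking the sandbox shell to resolve the name with `command -v`. A hypothetical sketch follows; `probeCommandResolvable` and the `SandboxShell` runner type are assumptions, and in practice the probe presumably needs to go through the same shell wrapper as the real spawn so both see the same PATH.

```typescript
export interface ShellResult {
  exitCode: number;
  stdout: string;
}
export type SandboxShell = (script: string) => Promise<ShellResult>;

export type Resolvability = { ok: true; path: string } | { ok: false; reason: string };

// Conservative character set so the name can be interpolated unquoted.
const SAFE_NAME = /^[A-Za-z0-9._+\/-]+$/;

export async function probeCommandResolvable(cli: string, run: SandboxShell): Promise<Resolvability> {
  if (!SAFE_NAME.test(cli)) throw new Error(`refusing unsafe command name: ${cli}`);
  // `command -v` prints how the shell would resolve the name and exits
  // non-zero when it cannot, so a missing binary is reported, not assumed.
  const res = await run(`command -v ${cli}`);
  const path = res.stdout.trim().split("\n")[0] ?? "";
  if (res.exitCode === 0 && path !== "") return { ok: true, path };
  return { ok: false, reason: `${cli} not found on the sandbox PATH (command -v exited ${res.exitCode})` };
}
```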

## What Changed

- `packages/adapter-utils/src/execution-target.ts`: add
`ensureSandboxCommandResolvable` that runs `command -v <cli>` through
the sandbox runner; replace the early-return in
`ensureAdapterExecutionTargetCommandResolvable` for sandbox targets
- `packages/plugins/sandbox-providers/e2b/src/plugin.ts`: replace
`buildCommandLine` with `buildLoginShellScript` (sources `/etc/profile`,
`~/.profile`, `~/.bash_profile`, `~/.bashrc`, `~/.zprofile`, and nvm.sh
before `exec env KEY=val <cmd>`); env vars are interpolated inline so
user-configured adapter env always wins over profile-exported values;
drop the now-unused `envs:` SDK option
- `plugin.test.ts` updated for the login-shell wrapping
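
The wrapper described above might look roughly like this. `buildLoginShellScriptSketch` is a hypothetical stand-in for `buildLoginShellScript`, the nvm location (`$NVM_DIR`, defaulting to `~/.nvm`) is an assumption, and the output assumes it is run with `bash -c`, since `.bashrc` may contain bash-only syntax that a plain POSIX `sh` would reject.

```typescript
// POSIX single-quote escaping: close the quote, emit \', reopen.
export function shellQuote(s: string): string {
  return `'${s.replace(/'/g, `'\\''`)}'`;
}

// Profiles named in the PR. Each is sourced only if it exists, with output
// silenced so chatty profiles do not pollute the command's stdout.
const PROFILE_FILES = ["/etc/profile", "$HOME/.profile", "$HOME/.bash_profile", "$HOME/.bashrc", "$HOME/.zprofile"];

const ENV_KEY = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical sketch: source login profiles, then `exec env KEY=val <cmd> <args>`.
export function buildLoginShellScriptSketch(command: string, args: string[], env: Record<string, string>): string {
  const lines = PROFILE_FILES.map((f) => `[ -f "${f}" ] && . "${f}" >/dev/null 2>&1`);
  lines.push(`[ -s "\${NVM_DIR:-$HOME/.nvm}/nvm.sh" ] && . "\${NVM_DIR:-$HOME/.nvm}/nvm.sh" >/dev/null 2>&1`);
  const assignments = Object.entries(env).map(([key, value]) => {
    if (!ENV_KEY.test(key)) throw new Error(`invalid env key: ${key}`);
    return shellQuote(`${key}=${value}`);
  });
  // `exec` replaces the shell, so the CLI inherits its PID, stdin, and signals.
  lines.push(["exec", "env", ...assignments, ...[command, ...args].map(shellQuote)].join(" "));
  return lines.join("\n");
}
```

Placing the assignments on the `env` line, after every profile has been sourced, is what makes configured adapter env take precedence over values a profile exports.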

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/sandbox-e2b` —
17/17 plugin tests pass
- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils`
clean
- `pnpm typecheck` clean
- Manual: previously every sandboxed adapter said "Command is
executable" then the hello probe failed with "exec: not found". After
this change, missing CLIs surface honestly at the resolvability step.
SSH no-regression: SSH Claude probe still passes.

## Risks

Medium — sandbox adapter Test reports will start failing at the
resolvability step for environments where the CLI was never actually
installed. This was always the real state; the previous "Command is
executable" message was incorrect. Operators should expect
previously-green-but-broken sandbox environments to report accurately.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — `plugin.test.ts`
updated for the login-shell wrapping
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 08:21:37 -07:00
Devin Foley cb6af7c2cc Stage stdin to a temp file so the e2b sandbox executor delivers it reliably (#5278)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The e2b sandbox provider implements `onEnvironmentExecute` so
adapters can spawn CLIs in an e2b sandbox
> - For commands that need stdin (e.g. piping a hello prompt to a CLI),
the previous implementation awaited a foreground `commands.run({ stdin:
true, ... })` and then tried to call `sendStdin(pid)` on the now-dead
PID
> - The awaited `commands.run` call resolves only after the process
exits, so by the time `sendStdin` ran the PID was gone: stdin was never
delivered and e2b raised "process not found"
> - This pull request stages stdin to `/tmp/paperclip-stdin-<uuid>`
inside the sandbox and shell-redirects it (`exec '<cmd>' '<args>' <
'<file>'`), making the command synchronous regardless of whether stdin
is supplied
> - The benefit is adapter Test probes that pipe a hello prompt to a CLI
inside an e2b sandbox now actually deliver the prompt

## What Changed

- `packages/plugins/sandbox-providers/e2b/src/plugin.ts`: replace the
broken async `commands.run` + `sendStdin` flow with stdin-staging to a
sandbox temp file and shell-redirection
- Staged file is removed in a `finally` block; write failures propagate
after best-effort cleanup
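
The staging flow can be sketched against a minimal sandbox interface. `SandboxFs` and `runWithStagedStdin` are hypothetical; they are not the e2b SDK's API. The point is the shape: write the stdin file, run one synchronous command that redirects from it, and remove the file in `finally`.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical subset of a sandbox client; not the e2b SDK's actual interface.
export interface SandboxFs {
  writeFile(path: string, data: string): Promise<void>;
  remove(path: string): Promise<void>;
  run(script: string): Promise<{ exitCode: number; stdout: string; stderr: string }>;
}

// POSIX single-quote escaping: close the quote, emit \', reopen.
function shellQuote(s: string): string {
  return `'${s.replace(/'/g, `'\\''`)}'`;
}

export async function runWithStagedStdin(
  sb: SandboxFs,
  command: string,
  args: string[],
  stdin?: string,
): Promise<{ exitCode: number; stdout: string; stderr: string }> {
  const argv = [command, ...args].map(shellQuote).join(" ");
  if (stdin === undefined) return sb.run(`exec ${argv}`);

  const stdinPath = `/tmp/paperclip-stdin-${randomUUID()}`;
  try {
    // Write failures propagate to the caller; cleanup still runs below.
    await sb.writeFile(stdinPath, stdin);
    // One synchronous run: the redirect feeds the file as stdin, so there is
    // no PID to chase after the process has already exited.
    return await sb.run(`exec ${argv} < ${shellQuote(stdinPath)}`);
  } finally {
    // Best-effort cleanup; a failed removal must not mask the real outcome.
    await sb.remove(stdinPath).catch(() => undefined);
  }
}
```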

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/sandbox-e2b` —
all 17 unit tests pass
- `pnpm typecheck` clean
- Manual: a sandboxed adapter Test probe that pipes a hello prompt now
receives the prompt

## Risks

Low risk — `plugin.test.ts` already encodes the temp-file design; the
change brings the implementation in line with the test.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — existing tests
already encode the new design
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 08:00:49 -07:00
Devin Foley 9042b8d042 Write apikey-mode auth.json so Codex CLI 0.122+ can authenticate via OPENAI_API_KEY (#5276)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The Codex adapter spawns the OpenAI Codex CLI to drive the model
> - Codex CLI 0.122 changed how it reads credentials: it ignores
`OPENAI_API_KEY` from the environment and reads only
`$CODEX_HOME/auth.json`
> - Without auth.json, Codex 0.122+ returns 401 "Missing bearer or basic
authentication" on `/v1/responses` even when `OPENAI_API_KEY` is
forwarded into the sandbox or remote shell
> - This pull request materializes an apikey-mode `auth.json` in the
managed Codex home (or per-run for the test probe) when an
`OPENAI_API_KEY` is configured
> - The benefit is that configured Codex API keys authenticate correctly
with Codex CLI 0.122+ across local, SSH, and sandbox targets
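
A minimal sketch of materializing an apikey-mode file with Node's `fs/promises`. The function name and the JSON shape (a single `OPENAI_API_KEY` field) are assumptions for illustration, not the contents of `codex-home.ts`; verify the exact format against the Codex CLI version you target. The existing entry is unlinked first because the managed home may hold a symlink to the host's `auth.json`, and writing through that link would overwrite the host credentials.

```typescript
import { mkdir, unlink, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical sketch of an apikey-mode auth.json writer.
// ASSUMPTION: the file needs only an OPENAI_API_KEY field; check the upstream Codex format.
async function writeApiKeyAuthJsonSketch(codexHome: string, apiKey: string): Promise<string> {
  const authPath = join(codexHome, "auth.json");
  await mkdir(codexHome, { recursive: true });

  // Unlink rather than overwrite: if auth.json is a symlink to the host's file,
  // writeFile would follow the link and clobber the host credentials.
  await unlink(authPath).catch((err: unknown) => {
    if ((err as { code?: string }).code !== "ENOENT") throw err;
  });

  const body = `${JSON.stringify({ OPENAI_API_KEY: apiKey }, null, 2)}\n`;
  // The file holds a secret, so create it readable and writable by the owner only.
  await writeFile(authPath, body, { mode: 0o600 });
  return authPath;
}
```

When no key is configured, a caller would simply skip this step, which preserves the existing chatgpt-mode flow described under Risks.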

## What Changed

- `codex-home.ts`: add `writeApiKeyAuthJson()` and let
`prepareManagedCodexHome` accept an `apiKey` override that replaces the
symlinked host auth.json with an apikey-mode file
- `execute.ts`: pass `envConfig.OPENAI_API_KEY` into
`prepareManagedCodexHome` so the managed (and synced-to-remote) Codex
home authenticates via the configured key
- `test.ts`: when `OPENAI_API_KEY` is available, wrap the hello probe
with a small shell that materializes a per-run `$CODEX_HOME/auth.json`
before exec'ing codex; key content rides through env to avoid leaking
into process listings
- Update the `codex_hello_probe_auth_required` hint to explain that
Codex CLI 0.122+ does not read `OPENAI_API_KEY` from the environment
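
The env-only handoff in `test.ts` can be sketched as a static `sh -c` script that reads the auth JSON from an environment variable, so the key never appears in argv and therefore not in process listings. The variable name `PAPERCLIP_CODEX_AUTH_JSON`, the helper `wrapHelloProbeSketch`, and the single-field JSON shape are hypothetical; the real wrapper may differ.

```typescript
// Hypothetical env var used only in this sketch to carry the auth.json contents.
const AUTH_ENV = "PAPERCLIP_CODEX_AUTH_JSON";

// Static script: the secret is referenced only as an env var, never spliced into argv.
const WRAPPER_SCRIPT = [
  "umask 077", // files created below are owner-only
  'mkdir -p "$CODEX_HOME"',
  `printf '%s' "$${AUTH_ENV}" > "$CODEX_HOME/auth.json"`,
  `unset ${AUTH_ENV}`, // keep the JSON out of the codex process environment
  'exec codex "$@"',
].join("\n");

interface ProbeSpawn {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function wrapHelloProbeSketch(codexArgs: string[], codexHome: string, apiKey: string): ProbeSpawn {
  return {
    command: "sh",
    // With `sh -c SCRIPT NAME ARGS...`, NAME becomes $0 and ARGS become "$@".
    args: ["-c", WRAPPER_SCRIPT, "codex-probe", ...codexArgs],
    env: {
      CODEX_HOME: codexHome, // assumed to be a per-run directory
      [AUTH_ENV]: JSON.stringify({ OPENAI_API_KEY: apiKey }),
    },
  };
}
```

A caller would merge `env` into the inherited environment (for example `{ ...process.env, ...spawn.env }`) so that `codex` stays on `PATH`. Building the JSON in TypeScript and writing it with `printf '%s'` avoids any JSON escaping inside the shell.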

## Verification

- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-codex-local`
- `pnpm typecheck` clean
- Manual: Codex 0.122.0 with empty `CODEX_HOME` returns 401 with
env-only auth; with this change it authenticates cleanly

## Risks

Low risk — when no API key is configured, behavior is unchanged (no
auth.json written, existing chatgpt-mode flow preserved). Apikey-mode
`auth.json` is the upstream-supported format.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 08:00:27 -07:00