Now that upstream's #5429 provides provider-based secrets management with formal
bindings (companySecretBindings table populated at config-save time) plus the
/secrets/:id/usage endpoint backed by listBindingReferences(), the fork's
parallel usages() scan is redundant for agent/routine bindings.
The fork's scan did cover one path upstream doesn't track: skill metadata
sourceAuthSecretId references. Dropping this means accidental deletion of a
skill's auth secret is no longer rejected — accepted as a chase-upstream tradeoff.
- server/src/services/secrets.ts: drop usages(), SecretUsage* types, in-use guard
in remove(), and companySkills/agentService imports
- server/src/routes/secrets.ts: drop GET /secrets/:id/usages route
- ui/src/api/secrets.ts: drop usages() client method
Typechecks clean on server and ui.
The merge of upstream's #5429 (provider vaults + AWS Secrets Manager) added a
new 2,155-line Secrets.tsx page covering everything the fork's 528-line
CompanySecrets page did, plus vault management and AWS import. Auto-merge took
both sides' sidebar entry and route registration, so React Router picked the
fork's older page first and upstream's AWS UI never rendered.
- ui/src/App.tsx: drop CompanySecrets import + duplicate route at /company/settings/secrets
- ui/src/components/CompanySettingsSidebar.tsx: drop duplicate Secrets nav item
- ui/src/pages/CompanySecrets.tsx: delete (superseded)
Server-side delete guard (services/secrets.ts: refuse delete when secret is
referenced by agents or skills) is preserved and still enforced; upstream's
page surfaces the API error message in place of the fork's inline reference list.
Upstream #5664 added the cursor-cloud adapter; the .farhoodlabs/Dockerfile
overlay was missing the deps-stage COPY for its package.json, so pnpm install
didn't wire it as a workspace package and the build stage failed type-checking
cursor-cloud's UI sources from ui's tsc -b.
## Thinking Path
> - Paperclip orchestrates AI agents, and its release flow depends on
explicit package enrollment for automated publishing.
> - The release registry tooling uses
`scripts/release-package-manifest.json` as the source of truth for which
public packages CI is allowed to publish.
> - The cursor cloud adapter plus the Cloudflare and exe.dev sandbox
plugins are public packages that now need to ship through the normal CI
release path.
> - Leaving those entries at `publishFromCi: false` keeps release
automation and registry validation out of sync with the intended package
set.
> - This pull request updates only those three manifest entries and
leaves the release tooling itself unchanged.
> - The benefit is that CI release enrollment now matches the packages
we intend to publish, with the existing manifest checks continuing to
guard correctness.
## What Changed
- Enabled CI publishing for `@paperclipai/adapter-cursor-cloud` in
`scripts/release-package-manifest.json`.
- Enabled CI publishing for `@paperclipai/plugin-cloudflare-sandbox` in
`scripts/release-package-manifest.json`.
- Enabled CI publishing for `@paperclipai/plugin-exe-dev` in
`scripts/release-package-manifest.json`.
## Verification
- `node ./scripts/release-package-map.mjs check`
- `pnpm test:release-registry`
## Risks
- Low risk. This is a manifest-only change, but a wrong enrollment flag
would affect release automation, so the release-registry checks are the
main guardrail.
## Model Used
- OpenAI GPT-5.4 via Paperclip `codex_local` (`adapterConfig.model:
gpt-5.4`), high reasoning effort, with tool use and shell/code
execution. The adapter does not expose a separate context-window value
in this environment.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents through visible, governable task
and routine workflows.
> - Routines are the recurring-work surface where operators configure
schedules, runs, and activity.
> - PR #3569 moved routine operational tabs into the right-hand
properties panel while also redesigning the routine trigger editor.
> - The current product request is to remove that routine properties
right-tab change for now and come back to it later.
> - The cleanest way to do that is a direct revert of #3569 on top of
current `master`, which already includes the #5703 revert.
> - This pull request restores the pre-#3569 routine trigger/detail
behavior and removes the right-tab properties-panel routine layout.
> - The benefit is a simple, reviewable rollback with no schema or API
changes.
## What Changed
- Reverted #3569: `fix(ui): prevent lossy cron rewrites + redesign
routine triggers tab`.
- Restored the previous `RoutineDetail` inline tabs and trigger editing
flow.
- Restored the earlier `ScheduleEditor` implementation.
- Removed the UI components and tests introduced by #3569:
`ConfirmDialog`, `TriggerDialog`, `TriggerListCard`, and
`ScheduleEditor.test.ts`.
## Verification
- `git diff --check origin/master..HEAD`
- `pnpm vitest run ui/src/pages/Routines.test.tsx
ui/src/components/RoutineHistoryTab.test.tsx`
- `pnpm --filter @paperclipai/ui typecheck`
Notes:
- `pnpm install --frozen-lockfile` was run in the clean worktree before
verification. It completed with known workspace bin-link warnings for
`paperclip-plugin-dev-server` because the plugin SDK `dist/dev-cli.js`
has not been built in that fresh worktree.
- `Routines.test.tsx` emitted existing Radix dialog accessibility
warnings during the test run; the tests passed.
### Screenshots
This is a direct revert of #3569. The visual state after this PR
corresponds to the old screenshot from #3569, and the state being
removed corresponds to the new/right-panel screenshots from #3569.
| Before this revert | After this revert |
| --- | --- |
| <img width="1410" height="1325"
alt="routine-triggers-before-this-revert"
src="https://github.com/user-attachments/assets/d70dd35b-e72f-4fc6-bb21-be9b0d92b3b1"
/> | <img width="721" height="707"
alt="routine-triggers-after-this-revert"
src="https://github.com/user-attachments/assets/260bb682-32cb-4dff-b038-d55e45824b04"
/> |
Right-hand properties panel state removed by this revert:
<img width="1409" height="830" alt="routine-properties-panel-removed"
src="https://github.com/user-attachments/assets/f1d42f07-7cd3-4614-8e93-5b585affd4bf"
/>
## Risks
- Low technical risk: this is a clean Git revert of a UI-only PR.
- Product risk: #3569 also fixed lossy cron editing and added broader
schedule presets, so this rollback intentionally removes those
improvements along with the right-tab routine layout.
- Follow-up risk: if we want only the schedule-editor fixes back later,
they should be reintroduced separately from the routine properties-panel
layout.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5 coding agent, tool-enabled with local shell and
GitHub CLI access. Context window size was not exposed in this session.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents through visible, governable task
and routine workflows.
> - The routines UI includes the routine detail page, properties panel,
history tab, and shared sidebar components.
> - PR #5703 changed that workflow by widening the routine properties
panel and moving revision inspection/comparison into dialogs.
> - The product direction for that change is being paused for now, so
the safest path is a direct revert instead of partial edits.
> - This pull request reverts merge commit
`74cb560c41305ac3283067d1ec8d3060ffdc28cb` from #5703.
> - The benefit is restoring the prior routines UI behavior while
keeping the revert easy to review and re-apply later if needed.
## What Changed
- Reverted #5703: `fix(ui): improve routine properties panel and history
UX`.
- Restored the previous routine properties panel sizing, panel context
API, routine detail layout, and routine history rendering behavior.
- Removed the reverted sidebar pane test additions and restored the
previous focused routine history test expectations.
## Verification
- `git diff --check origin/master..HEAD`
- `pnpm vitest run ui/src/components/RoutineHistoryTab.test.tsx`
- `pnpm --filter @paperclipai/ui typecheck`
### Screenshots
This is a direct revert of #5703. The visual state after this PR
corresponds to the "Before" screenshots from #5703, and the state being
removed corresponds to the "After" screenshots from #5703.
#### Trigger Panel Width
| Before this revert | After this revert |
| --- | --- |
| <img width="1742" height="1288" alt="triggers-before-this-revert"
src="https://github.com/user-attachments/assets/9e818978-283c-49a3-9401-879be550c67b"
/> | <img width="1741" height="1289" alt="triggers-after-this-revert"
src="https://github.com/user-attachments/assets/2a391769-c355-4219-8da3-d1ea18698430"
/> |
#### History Panel
| Before this revert | After this revert |
| --- | --- |
| <img width="1741" height="1290" alt="history-before-this-revert"
src="https://github.com/user-attachments/assets/4c139238-8494-4438-89e1-4277d05bc3aa"
/> | <img width="1739" height="1289" alt="history-after-this-revert"
src="https://github.com/user-attachments/assets/eaea4f3d-bb65-4af6-b67f-3ba3026fe0c9"
/> |
## Risks
- Low technical risk: this is a clean Git revert of a recently merged
UI-only PR.
- Product risk: the routine properties panel and revision history return
to the older, narrower workflow that #5703 was improving.
- Re-application risk: future work that wants the #5703 behavior back
should re-apply it deliberately rather than cherry-picking around this
revert.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5 coding agent, tool-enabled with local shell and
GitHub CLI access. Context window size was not exposed in this session.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip is the control plane for autonomous AI companies, and its
Docker image needs to build from the checked-in core repository.
> - The Docker `deps` stage copies workspace package manifests before
running `pnpm install --frozen-lockfile` so dependency installation can
be cached.
> - Current `master` copied
`packages/plugins/plugin-llm-wiki/package.json`, but that plugin package
has not been merged into core yet.
> - Docker fails before install with a missing build-context path, so
the release image cannot build from the current repository state.
> - This pull request removes the premature plugin manifest copy while
leaving the plugin SDK and existing sandbox plugin package copies
intact.
> - The benefit is that the Docker build no longer depends on an
unmerged plugin package.
## What Changed
- Removed the `packages/plugins/plugin-llm-wiki/package.json` copy from
the Dockerfile `deps` stage.
## Verification
- `git diff --check`
- Static Dockerfile source validation: parsed non-stage `COPY` sources
and confirmed every source exists in the build context.
- Attempted `docker build --target deps --progress=plain -t
paperclip-pap-9235-deps-check .`, but Docker is unavailable in this
execution environment: `Cannot connect to the Docker daemon at
unix:///Users/dotta/.docker/run/docker.sock`.
## Risks
- Low risk. The removed path points to a package that is absent from the
repository, so retaining it is what breaks the build. The plugin can add
its manifest copy back when the package itself lands.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex using GPT-5, tool-enabled coding agent in a local
repository workspace. Exact context-window metadata is not exposed in
this runtime.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
Co-authored-by: Paperclip <noreply@paperclip.ing>
> _Stacked on top of #5685 → #5686 → #5687. Diff against master includes
commits from earlier PRs in the stack — review focuses on the two new
commits (`Add long-secret textarea variant to JsonSchemaForm
SecretField` + `Add exe.dev sandbox provider plugin`)._
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent runs in a sandbox environment, and operators choose the
provider — today E2B, Daytona, and (in this stack) Cloudflare
> - exe.dev offers per-VM sandboxes via a small CLI / HTTP API — useful
for operators who want full Linux VMs (vs container/runtime-only
sandboxes)
> - The plugin shape mirrors the e2b plugin: lifecycle hooks (`new`,
`ls`, `rm`) drive exe.dev's CLI; SSH plumbing handles direct VM access
for adapters that need it
> - exe.dev VMs come up bare — `node` is not preinstalled, so the
Paperclip sandbox callback bridge (a Node script) needs Node 20
installed at VM init via `--setup-script`. The plugin defaults the setup
script to a Nodesource install
> - The auth field accepts long SSH private keys, which need a textarea
variant of the existing `SecretField` in `JsonSchemaForm` — added behind
a `maxLength > THRESHOLD` opt-in so other secret fields are unaffected
> - The benefit is that operators get exe.dev as a fully working sandbox
provider out of the box, with no manual VM provisioning required
## What Changed
**Shared UI support (`Add long-secret textarea variant to JsonSchemaForm
SecretField`):**
- `ui/src/components/JsonSchemaForm.tsx` + new
`JsonSchemaForm.test.tsx`: when a secret-formatted field declares
`maxLength` larger than the existing single-line threshold, render a
monospace textarea instead of the masked input. Short secrets (API keys,
tokens) keep the existing masked-input + show/hide toggle behavior.
**The exe.dev plugin (`Add exe.dev sandbox provider plugin`):**
- `packages/plugins/sandbox-providers/exe-dev/`: plugin entry, manifest,
plugin runtime, README, and 19-test Vitest suite.
- Manifest fields: API token (with `secret-ref` + `/exec` permission
notes — needs `new`, `ls`, `rm`), API URL override, optional SSH
username, optional SSH private key (uses the new `JsonSchemaForm`
textarea variant via `maxLength: 4096`), optional SSH identity-file
path, optional setup script.
- Default `--setup-script` is a Nodesource Node 20 install. exe.dev VMs
come up bare and the Paperclip sandbox callback bridge is a Node script,
so without Node preinstalled the bridge can't start. Operators can
override by supplying their own setup script.
- `runLifecycleCommand` redacts env values from the executed command
before surfacing it in error messages, so secrets passed via
`--env=KEY=VALUE` don't leak into operator-visible failures.
- The plugin distinguishes exe.dev's SSH onboarding failures (`Please
complete registration by running: ssh exe.dev`) from general SSH
failures and surfaces a clear remediation message.
- `scripts/release-package-manifest.json`: register the new plugin for
CI publish alongside the existing daytona / e2b providers.
## Verification
- `pnpm typecheck`
- `pnpm exec vitest run --no-coverage
ui/src/components/JsonSchemaForm.test.tsx`
- `(cd packages/plugins/sandbox-providers/exe-dev && pnpm test)` — 19
passing
For an operator-side smoke test:
1. Get an exe.dev API token with `/exec` permission for `new`, `ls`,
`rm`.
2. Register the plugin in your Paperclip instance, configure an
environment with the token.
3. Create a sandbox env whose provider is `exe-dev`, then run a Codex or
Claude job against it. The default Node 20 setup script should bring the
VM up automatically.
## Risks
- Adds a new sandbox provider plugin that follows the existing daytona /
e2b shape; behavior on existing providers is unchanged.
- The `JsonSchemaForm` textarea variant only engages for fields that opt
in via `maxLength` larger than the existing threshold. All existing
secret fields (which don't declare a `maxLength`) keep their current
rendering. Test coverage pins both paths.
- The redaction in `runLifecycleCommand` is a defense-in-depth measure;
the test suite exercises the redaction path. If the redaction misses a
future env-arg shape, the worst case is restored behavior (secrets in
error messages), which is what the existing daytona / e2b plugins also
do today.
- Default setup script downloads from `deb.nodesource.com` over HTTPS at
VM init. Operators on air-gapped networks or with a different package
strategy can override the setup script.
## Model Used
- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — UI change is a textarea variant of an existing secret
field; will attach screenshots before requesting merge
- [x] I have updated relevant documentation to reflect my changes
(plugin README, manifest descriptions)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies.
> - Routines are the recurring-work surface where operators configure
schedules, executions, activity, and revision history.
> - The routine detail view uses a contextual right properties panel for
triggers, runs, activity, and history.
> - That panel was too cramped for routine workflows: the routine header
could collapse at constrained widths, and revision previews/comparisons
were trying to live inside the same narrow panel.
> - This pull request makes the routine properties panel wider and
responsive without changing the default panel behavior for other pages.
> - It also moves routine revision viewing and comparison into focused
dialogs so history stays usable instead of rendering dense revision
content inside the right panel.
> - The benefit is a cleaner routine workflow: triggers remain
scannable, the main routine stays readable, and revisions can be
inspected, compared, and restored without fighting the sidebar width.
## What Changed
- Added optional per-panel layout options for storage key, default
width, min/max width, and compact viewport behavior.
- Set the routine properties panel to use its own 400px default width
and persistence key, while compacting to 320px on narrower viewports.
- Made the shared resizable sidebar support right-side panes, custom
width bounds, compact max width, and keyboard resizing.
- Fixed the routine detail header so title text and action controls
remain readable beside the properties panel at constrained widths.
- Reworked routine history so selecting a revision opens a read-only
snapshot dialog instead of trying to render the whole revision inside
the right panel.
- Added a side-by-side current-vs-selected revision comparison dialog
with clearer diff markers for structured fields, triggers, and
variables.
- Added focused tests for the resizable pane and routine history
behavior.
## Verification
- `pnpm vitest run ui/src/components/RoutineHistoryTab.test.tsx
ui/src/components/ResizableSidebarPane.test.tsx`
- `pnpm --filter @paperclipai/ui typecheck`
- `pnpm -r typecheck`
- `git diff --check`
- Browser E2E in TestCo at `http://localhost:3100/TES/dashboard`:
- created and edited a routine
- added, edited, toggled, and deleted schedule triggers
- paused automation
- ran the routine and stopped the live run
- verified runs, activity, history, snapshot dialog, compare mode,
restore confirmation, routine list, recent runs, row actions, panel
close/reopen, and constrained-width layout
### Screenshots
#### Trigger Panel Width
| Before | After |
| --- | --- |
| <img width="1741" height="1289" alt="triggers-before"
src="https://github.com/user-attachments/assets/2a391769-c355-4219-8da3-d1ea18698430"
/> | <img width="1742" height="1288" alt="triggers-after"
src="https://github.com/user-attachments/assets/9e818978-283c-49a3-9401-879be550c67b"
/> |
#### History Panel
Before, selecting a revision attempted to show dense revision content
inside the already narrow right panel. After, history remains a compact
list and revision details open separately.
| Before | After |
| --- | --- |
| <img width="1739" height="1289" alt="history-before"
src="https://github.com/user-attachments/assets/eaea4f3d-bb65-4af6-b67f-3ba3026fe0c9"
/> | <img width="1741" height="1290" alt="history-after"
src="https://github.com/user-attachments/assets/4c139238-8494-4438-89e1-4277d05bc3aa"
/> |
#### Revision Snapshot
The selected revision now opens in a dedicated read-only dialog instead
of crowding the properties panel.
<img width="1740" height="1289" alt="revision-single"
src="https://github.com/user-attachments/assets/f930f50f-7016-434b-bd81-d8d97304c528"
/>
#### Revision Compare
Historical revisions can be compared side-by-side with the current
revision, including changed structured fields and trigger differences.
<img width="1740" height="1287" alt="revision-compare"
src="https://github.com/user-attachments/assets/5640201e-de4f-446b-8941-1b0f140c56d7"
/>
## Risks
- Low to moderate UI risk: the shared resizable pane API gained optional
layout parameters, but existing callers keep the previous defaults.
- Routine history now uses dialogs for revision viewing and comparison,
so reviewers should confirm the new workflow feels right for restore and
compare.
- Routine panel width now persists under a routine-specific key, so
previous global properties panel width preferences do not carry into
routines.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5 coding agent in Codex Desktop, tool-enabled with
local shell, git, and in-app browser automation. Context window size was
not exposed in this session.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
> _Stacked on top of #5685 → #5686. Diff against master includes commits
from earlier PRs in the stack — review focuses on the two new commits
(`Extend sandbox callback bridge for Worker-hosted plugins` + `Add
Cloudflare sandbox provider plugin`)._
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent runs in a sandbox environment, and operators choose which
provider backs that sandbox — today E2B and Daytona are bundled with the
platform
> - Cloudflare Workers + Durable Objects + the Sandbox SDK offer a
credible new option: globally distributed, cheap idle, and
operator-deployable as a single Worker
> - To plug it in, Paperclip needs (a) a provider plugin that speaks the
`PaperclipPluginManifestV1` lifecycle and (b) a small operator-deployed
Worker — the **bridge** — that adapts Paperclip's runtime RPCs to the
Cloudflare Sandbox SDK
> - The plugin extends the existing sandbox-callback-bridge with a
`bridge.transport: "worker"` discriminator so the platform routes
runtime RPCs through the Worker bridge instead of the in-process runner
> - This pull request adds the plugin, the bridge Worker template, and
the supporting adapter-utils + server hooks the new transport needs
> - The benefit is that operators can run sandboxes on Cloudflare's edge
with no new platform code beyond installing the plugin and deploying the
Worker
## What Changed
**Shared support (`Extend sandbox callback bridge for Worker-hosted
plugins`):**
- `packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`:
expose `expectedHostHeader` so plugin-side bridge clients can verify the
canonical request envelope before forwarding.
- `packages/adapter-utils/src/command-managed-runtime.{ts,test.ts}`:
relax the always-fresh runner construction so callers can re-use a
runner across exec calls (Worker-hosted bridges hold the runner inside a
Durable Object).
- `server/src/services/environment-runtime.ts` +
`environment-runtime.test.ts`: route Worker-hosted bridges through the
same env-shaping path as E2B and pin the `requestEnv` contract.
- `server/src/services/plugin-environment-driver.ts`: thread an optional
`issueId` through the runtime descriptor so bridges can scope leases to
the originating issue (used by Cloudflare to map a sandbox to the
issue/workflow for billing and audit).
- `packages/plugins/sdk/src/protocol.ts`: add `issueId?` to
`PluginEnvironmentDriverBaseParams` and the new `bridge.transport:
"worker"` discriminator that the new plugin declares.
- `server/__tests__/heartbeat-plugin-environment.test.ts`: pin the
heartbeat path against the new runtime descriptor.
**The Cloudflare plugin itself (`Add Cloudflare sandbox provider
plugin`):**
- `packages/plugins/sandbox-providers/cloudflare/`: plugin entry,
manifest, plugin runtime (lifecycle + bridge client), config parsing,
and Vitest coverage. Manifest declares `bridge.transport: "worker"` so
the platform routes runtime RPCs through the bridge client.
- `bridge-template/`: a Worker template the operator deploys with
`wrangler`. Owns Durable Object-backed sessions (`sessions.ts`),
exec/stream routes (`exec.ts`, `routes.ts`), and an HMAC auth layer
(`auth.ts`) that pins the `Host` header surface. Includes the
SDK-contract-correct exec implementation, lease recovery, and chunked
stdout/stderr streaming.
- Tests cover lease/session handoff (`bridge-template/src/exec.test.ts`,
`routes.test.ts`), bridge client request shaping
(`src/bridge-client.test.ts`), and end-to-end plugin behavior
(`src/plugin.test.ts`) including streamed exec output. 27 tests in
total.
- `README.md` walks the operator through deploying the bridge Worker,
registering the plugin, and configuring the runtime.
## Verification
- `pnpm typecheck`
- `pnpm exec vitest run --no-coverage
packages/adapter-utils/src/sandbox-callback-bridge.test.ts
packages/adapter-utils/src/command-managed-runtime.test.ts
server/src/__tests__/environment-runtime.test.ts
server/src/__tests__/heartbeat-plugin-environment.test.ts`
- `(cd packages/plugins/sandbox-providers/cloudflare && pnpm test)` — 27
passing
For an operator-side smoke test:
1. Deploy the bridge: `cd
packages/plugins/sandbox-providers/cloudflare/bridge-template &&
wrangler deploy`
2. Register the plugin in your Paperclip instance, point its bridge URL
at the deployed Worker, set the HMAC shared secret.
3. Create a sandbox environment whose provider is `cloudflare`, then run
a Codex or Claude job against it.
## Risks
- Adds a new `bridge.transport: "worker"` code path, but the existing
E2B / Daytona transports go through the same shaped helpers and have
explicit test coverage that pins their behavior unchanged.
- The Worker bridge stores session state in a Durable Object; operator
instances must be aware of the corresponding Cloudflare costs (DO
requests, storage). Documented in the README.
- The `issueId` plumbing is optional throughout — existing plugins that
don't supply it continue to work.
## Model Used
- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A, no UI change
- [x] I have updated relevant documentation to reflect my changes
(plugin README, bridge-template README)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Summary
- Adds the user-facing stable release changelog at
`releases/v2026.511.0.md`, generated per
[`.agents/skills/release-changelog/SKILL.md`](../blob/master/.agents/skills/release-changelog/SKILL.md)
from the 89 commits between `v2026.428.0` and `origin/master`.
- Surfaces nine headline themes: planning mode, full company search,
routine revision history, recovery system notices, expanded plugin host,
**secrets provider vaults + remote import**, **Cursor cloud adapter**,
**Daytona sandbox provider plugin**, and the ACPX local adapter — plus
categorized improvements/fixes.
- Documents nine new additive/idempotent migrations (`0075`–`0083`) in
the Upgrade Guide, including the `fuzzystrmatch` extension requirement
for company search and the `provider_config_id` text→uuid retype on
`company_secrets`.
Branch retains its `release/v2026.506.0` name to avoid PR churn; only
the artifact filename and content track the actual release date. Branch
was rebased onto current `origin/master` and force-pushed with a single
squashed commit.
No `BREAKING:` / `BREAKING CHANGE:` / `feat!:` signals across the full
range.
## Test plan
- [ ] Maintainer review of the draft changelog content for tone/scope
- [ ] Confirm the headline grouping reflects intended release messaging
- [ ] Confirm the migration list and upgrade guide match runtime
behavior
- [ ] Confirm the `0083` `provider_config_id` retype guidance is
acceptable for the release notes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Humans configure when those agents run via **routines**, which are
driven by cron-backed triggers
> - The routine detail page exposed triggers through an always-visible
inline add form and per-row inline editor, with a ScheduleEditor that
only understood a narrow set of cron shapes
> - That combination was actively lossy: pasting `0 9,13,17 * * *`
silently collapsed to `0 10 * * *` on save, and common shapes
(every-N-minutes within a window, multiple times per day, monthly on
several dates) had no first-class UI
> - This pull request rebuilds the triggers tab around a list of cards +
add/edit modal, teaches ScheduleEditor the cron shapes users actually
want, and prevents cron round-trips from dropping data
> - It also *optionally* tucks the Triggers/Runs/Activity tabs into the
shared right-hand PropertiesPanel (same pattern as Issues and Goals) so
they stay in view alongside the routine instead of being hidden below
the main content
> - The benefit is that routine scheduling becomes non-destructive and
legible — operators can see, describe, and edit real-world schedules
without dropping into raw cron and without fear that saving will
silently rewrite their trigger
## What Changed
**Core fixes + redesign (required):**
- **ScheduleEditor correctness** — `parseCronToPreset` now detects comma
lists, ranges, steps, and unknown tokens across every cron field and
routes anything it can't round-trip losslessly to the `custom` preset
(except `dow === "1-5"` → `weekdays`). Fixes the `0 9,13,17 * * *` → `0
10 * * *` regression.
- **ScheduleEditor presets** — adds first-class support for
every-N-minutes (with optional hour window + weekdays-only),
every-N-hours, hourly at minute offset, daily with multiple times/day,
selected-days-of-week with multiple times, and monthly on multiple
dates. `describeSchedule` unfolds multi-value hour/day lists into
readable sentences.
- **ScheduleEditor polish** — swaps raw `<input type=\"checkbox\">` for
the shadcn `Checkbox` primitive so hour-window and weekdays-only toggles
match the rest of the app.
- **Triggers tab redesign** — replaces the inline add form + inline
editor with a header + \"Add trigger\" button, compact `TriggerListCard`
entries, and a `TriggerDialog` add/edit modal. Enable/disable is now a
single-click switch on each card; delete goes through a `ConfirmDialog`.
- **Webhook trigger gating** — webhook kind is visible but disabled with
\"— COMING SOON\" in the add dialog, matching the old inline form's
production behaviour. Editing existing webhook triggers still works.
- **Tests** — adds `ScheduleEditor.test.ts` covering the regression cron
strings (`0 9,13,17 * * *`, `0 */4 * * *`, `0 10,16 * * *`) plus
existing preset patterns as regression guards in the other direction.
**Optional layout change (commit `145a86b5` — can be dropped without
affecting the rest):**
- Moves Triggers/Runs/Activity into the shared right-hand
`PropertiesPanel` (persisted open/close, header toggle button),
mirroring `IssueDetail` and `GoalDetail`. The reasoning: these tabs are
the primary way a human *operates* a routine, and keeping them docked on
the right means they're always in view next to the routine content
rather than hidden below the fold. Mobile parity is preserved by
rendering the same tabs inline below `md`. Trigger cards and
run/activity rows were restructured into vertical stacks so they fit the
320px panel without overflow, and the last-result badge became a
wrapping inline chip so long error strings no longer fill the card
width.
- **If reviewers prefer to keep the tabs inline below the routine, this
commit can be reverted cleanly without touching any of the fixes
above.**
## Screenshots:
Old:
<img width="721" height="707" alt="triggers-old"
src="https://github.com/user-attachments/assets/260bb682-32cb-4dff-b038-d55e45824b04"
/>
New:
<img width="1410" height="1325" alt="Screenshot 2026-04-13 at 12 25 00"
src="https://github.com/user-attachments/assets/d70dd35b-e72f-4fc6-bb21-be9b0d92b3b1"
/>
New Add Trigger modal:
<img width="1408" height="1321" alt="Screenshot 2026-04-13 at 12 25 07"
src="https://github.com/user-attachments/assets/0f23a83d-ba2c-47ed-9efa-829e777dcdf5"
/>
Commit 145a86b5 Properties panel:
<img width="1409" height="830"
alt="commit-145a86b51265e326160cb8c48e0874cb36d86f37"
src="https://github.com/user-attachments/assets/f1d42f07-7cd3-4614-8e93-5b585affd4bf"
/>
## Verification
- `cd ui && npm test -- ScheduleEditor` — new cron parser/describer
cases pass.
- Full UI test suite + typecheck green locally.
- Manual:
1. Open a routine → Triggers tab → verify cards render with enable
switch, edit, and delete (confirm dialog).
2. Create a schedule trigger with each preset (every-N-min with window,
every-N-hours, hourly@offset, daily multi-time, weekly multi-time,
monthly multi-date) → save → reopen → preset + values round-trip intact.
3. Paste `0 9,13,17 * * *` into an existing trigger → editor routes to
Custom with the raw cron preserved → save → value unchanged.
4. Try to add a webhook trigger → kind option shows \"— COMING SOON\"
and is disabled; edit an existing webhook trigger still works.
5. Toggle the properties panel via header button → state persists across
reload. Resize below `md` → tabs render inline.
- **Before/after screenshots:** attached in PR description (inline
triggers tab → list+modal; raw-cron save hazard → custom preset
preservation; bottom-of-page tabs → right-hand PropertiesPanel).
## Risks
- **Medium-low.** UI-only change; no API, schema, or migration impact.
- `parseCronToPreset` / `describeSchedule` signatures are preserved, but
their *behaviour* shifts: more cron strings now resolve to `custom` than
before. Any external caller relying on the old (lossy) classification
would see different preset tags — none known in-repo.
- PropertiesPanel reuse (optional commit) depends on the existing
localStorage key behaviour; if two routes ever write conflicting
open/close state under the same key, one could clobber the other.
Mirrors the established `IssueDetail`/`GoalDetail` pattern, so risk is
bounded. Reverting `145a86b5` removes this risk entirely while keeping
the fixes.
- Webhook kind is disabled in the add dialog only; existing webhook
triggers remain editable, so no data is stranded.
## Model Used
- **Authoring / PR drafting:** Anthropic Claude — `claude-opus-4-6` (1M
context window), via Claude Code CLI. Used for diff review and PR
description drafting. Code authored by @aronprins.
- **Post-hoc audit:** OpenAI Codex — `gpt-5.4` (high reasoning). Audited
the completed work after implementation; found no issues.
## Checklist
- [x] Thinking path traces from project context to this change
- [x] Model used specified with version + capability details
- [x] Tests run locally and pass
- [x] Added/updated tests (`ScheduleEditor.test.ts`)
- [x] Before/after screenshots attached
- [ ] Documentation updated — none required (internal UI only)
- [x] Risks documented
- [x] Will address all Greptile + reviewer comments before merge
> _Stacked on top of #5685 (Harden remote sandbox runtime). Diff against
master includes commits from earlier PRs in the stack — review focuses
on the new commit only._
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The cursor-local adapter wraps the Cursor Agent CLI so a Paperclip
workflow can drive it inside a sandbox
> - When the adapter runs in a remote sandbox, the Cursor Agent CLI
installs under `$HOME/.local/bin/cursor-agent` (or wherever
`$XDG_BIN_HOME` points), not on the global PATH
> - The existing post-install resolution assumed `cursor-agent` would
resolve via the sandbox's login shell PATH after `npm install -g`, which
fails on sandboxes where the install lands in a user-prefixed directory
that isn't on PATH at probe time
> - This pull request resolves the agent CLI from the cursor binary's
own directory (`dirname "$(command -v cursor)"`) so the install probe
and execute path agree on a real binary location
> - The benefit is that cursor-local works correctly on any sandbox
provider where `npm install` lands in a user-prefixed directory
## What Changed
- `packages/adapters/cursor-local/src/server/remote-command.ts`: resolve
the cursor-agent binary from the cursor bin directory after install,
instead of relying on PATH.
- `packages/adapters/cursor-local/src/server/test.ts`: corresponding
probe tweak.
- `packages/adapters/cursor-local/src/server/test.test.ts` (new) +
`remote-command.test.ts`: focused coverage that exercises the install +
resolve path against a sandbox runner that places the binary in a
user-prefixed directory.
## Verification
- `pnpm exec vitest run --no-coverage
packages/adapters/cursor-local/src/server/test.test.ts
packages/adapters/cursor-local/src/server/remote-command.test.ts
packages/adapters/cursor-local/src/server/execute.test.ts`
All passing locally.
## Risks
- Local cursor-local runs are unaffected — the resolution change only
kicks in for the sandbox install path.
- Low risk; isolated to one adapter.
## Model Used
- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: tool use (Read/Edit/Bash), no code execution beyond
local repo commands
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A, no UI change
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent runs inside a sandbox environment so its CLI is isolated
from the host
> - Sandbox-backed adapter runs go through a small set of shared helpers
— `ensureAdapterExecutionTargetCommandResolvable`, the sandbox callback
bridge runner, and per-adapter `SANDBOX_INSTALL_COMMAND` strings
> - When standing up new sandbox provider plugins, the existing helpers
timed out, missed install fallbacks, or leaned on assumptions that only
held for E2B
> - Local adapters (`claude-local`, `codex-local`, `gemini-local`,
`opencode-local`) needed slightly hardened probes so they could install
themselves and validate inside *any* remote sandbox transport, not just
E2B
> - This pull request bundles those runtime fixes so future sandbox
provider plugins inherit a working baseline
> - The benefit is that adding a new sandbox provider plugin no longer
requires touching adapter-utils or each local-adapter probe — the
supporting infra is already correct
## What Changed
- `packages/adapter-utils/src/execution-target.ts`: introduce
`DEFAULT_REMOTE_SANDBOX_ADAPTER_TIMEOUT_SEC = 1800` and
`resolveAdapterExecutionTargetTimeoutSec(...)`. Local and SSH adapters
keep the historical "0 means no adapter timeout" behavior;
sandbox-backed runs without an explicit `timeoutSec` get an explicit
30-minute default so remote installs and warm-up don't time out at the
per-RPC default. Plumbed `timeoutSec` through
`ensureAdapterExecutionTargetCommandResolvable` so install probes inside
a sandbox honor adapter-level overrides instead of the bridge's 5-minute
default.
- `packages/adapters/opencode-local/src/index.ts`: switch
`SANDBOX_INSTALL_COMMAND` from `npm install -g opencode-ai` to `curl
-fsSL https://opencode.ai/install | bash`. The npm package reifies four
large prebuilt-binary subpackages in parallel even though only one
matches the host arch; on bandwidth-constrained sandboxes that blew
through the 240s install budget. The official installer fetches one
arch-specific binary and adds `$HOME/.opencode/bin` to PATH via
`~/.bashrc`, which the sandbox-callback-bridge login-shell script
already sources.
- `packages/adapters/{claude,codex,gemini,opencode}-local/`: harden
remote-target probes — pass `--skip-git-repo-check` for Codex when
probing outside a repo, normalize permission flags for Claude, and add
`*.remote.test.ts` coverage that exercises the remote-sandbox path
explicitly for each adapter.
- `packages/adapter-utils/src/sandbox-install-command.{ts,test.ts}`
(new): add `buildSandboxNpmInstallCommand` helper.
`server/src/adapters/registry.ts` + new
`server/src/__tests__/adapter-registry.test.ts`: wire adapter install
commands so they fall back to a writable `$HOME/.local` prefix when
global install isn't available.
- `server/src/__tests__/plugin-worker-manager.test.ts` + new
`server/src/__tests__/fixtures/plugin-worker-delayed.cjs`: pin per-call
timeout overrides so plugin worker exec calls honor the caller's timeout
instead of the worker's default.
## Verification
- `pnpm typecheck`
- `pnpm exec vitest run --no-coverage
packages/adapter-utils/src/execution-target-sandbox.test.ts
packages/adapter-utils/src/sandbox-install-command.test.ts`
- `pnpm exec vitest run --no-coverage
server/src/__tests__/plugin-worker-manager.test.ts
server/src/__tests__/adapter-registry.test.ts
server/src/__tests__/claude-local-adapter-environment.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/gemini-local-adapter-environment.test.ts`
- `pnpm exec vitest run --no-coverage
packages/adapters/codex-local/src/server/test.remote.test.ts
packages/adapters/opencode-local/src/server/test.remote.test.ts
packages/adapters/codex-local/src/server/codex-args.test.ts
packages/adapters/codex-local/src/server/execute.remote.test.ts
packages/adapters/gemini-local/src/server/execute.remote.test.ts`
All passing locally.
## Risks
- Touches shared `adapter-utils` and several `*-local` adapters. The
30-minute default applies only when both (a) the target is
`remote+sandbox` and (b) no `timeoutSec` is configured — local + SSH
paths are unchanged. New test coverage was added alongside each behavior
change to pin the contracts.
- Switching OpenCode's install command to the official installer is a
behavior change for any operator running OpenCode inside a remote
sandbox. Local installs are unaffected (the `SANDBOX_INSTALL_COMMAND`
only runs when an adapter is being installed inside a sandbox).
- Low risk overall — no migrations, no API surface change.
## Model Used
- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep),
no code execution beyond local repo commands
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A, no UI change
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
Co-authored-by: Paperclip <noreply@paperclip.ing>
Auto-generated lockfile refresh after dependencies changed on master.
This PR only updates pnpm-lock.yaml.
Co-authored-by: lockfile-bot <lockfile-bot@users.noreply.github.com>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - There are many adapter types, one per agent-runtime product (Claude,
Codex, OpenCode, Cursor local CLI, etc.)
> - Cursor shipped a public TypeScript SDK on 2026-04-29 that exposes
Cursor's full hosted-agent platform (cloud VMs, harness, MCP, skills,
hooks)
> - Paperclip had no first-class adapter for this — agents that wanted
to use Cursor's managed cloud runtime had to fall back to the local CLI
adapter, which loses the cloud session, streaming, and durable run model
> - This PR adds a new `cursor_cloud` adapter built directly on
`@cursor/sdk`, with Paperclip's heartbeat mapped to Cursor's
durable-agent + per-run model
> - The benefit is that any Paperclip agent can now drive a Cursor cloud
agent across heartbeats with native session reuse, streaming, and
cancellation, while Paperclip remains the source of truth for issue/task
state
## What Changed
- New built-in adapter package `packages/adapters/cursor-cloud` (15
files, ~1.7k LOC) backed by `@cursor/sdk` ^1.0.12
- `src/server/execute.ts` — SDK-first lifecycle: `Agent.create` /
`Agent.resume` / `Agent.getRun` / `agent.send` / `run.stream` /
`run.wait`, with session reuse keyed on the (runtime env type, env name,
repo set) tuple
- `src/server/session.ts` — codec for `cursorAgentId` + `latestRunId` +
repo metadata, persisted in `runtime.sessionParams`
- `src/server/test.ts` — environment probe via `Cursor.me()` and
optional model validation via `Cursor.models.list()`
- `src/ui/parse-stdout.ts` + `src/cli/format-event.ts` — normalize
Cursor SDK message types (`status`, `thinking`, `assistant`, `user`,
`tool_call`, `tool_result`, `result`) into Paperclip transcript events
for the UI and CLI
- Registrations: `packages/shared/src/constants.ts`,
`packages/adapter-utils/src/session-compaction.ts`,
`server/src/adapters/{registry,builtin-adapter-types}.ts`,
`ui/src/adapters/{registry,adapter-display-registry}.ts` +
`ui/src/adapters/cursor-cloud/index.ts`, `cli/src/adapters/registry.ts`,
plus workspace deps in `cli`/`server`/`ui` `package.json`
- `ui/src/components/AgentConfigForm.tsx` — hide local-Cursor
`mode`/thinking-effort field for `cursor_cloud` (different config
surface)
- 11 vitest tests covering execute paths (fresh create, matching-resume,
active-run reattach, non-finished result), session codec round-trip,
transcript parsing, and config building
## Verification
Reviewer steps:
```bash
pnpm install
pnpm --filter @paperclipai/adapter-cursor-cloud typecheck # → clean
pnpm vitest run packages/adapters/cursor-cloud # → 11/11 passing
```
End-to-end check against a real Cursor cloud agent (requires
`CURSOR_API_KEY` and Cursor GitHub-app install on the target repo):
1. Create a `cursor_cloud` agent in Paperclip with `repoUrl` set to the
test repo, `repoStartingRef: main`, and `env.CURSOR_API_KEY` set
2. Trigger a heartbeat → adapter calls `Agent.create({ cloud: { env: {
type: "cloud" }, repos: [...] } })`, streams events, terminates on
`finished`
3. Trigger a second heartbeat → adapter calls `Agent.resume` or
`agent.send` follow-up depending on prior-run state, reusing
`cursorAgentId`
4. The Paperclip UI/CLI transcript reflects Cursor `status` / `thinking`
/ `assistant` events as they stream
5. Cancellation from Paperclip maps to `run.cancel()` or Cloud API v1
`cancelRun` for cross-heartbeat cancellation
A direct-SDK smoke run against a real repo (devinfoley/my_test_project @
main) confirmed: `Cursor.me()` ok → `Agent.create` → `agent.send` →
`run.stream()` (30 events) → terminal status `finished` in ~11s.
## Risks
- **New adapter, additive only.** No existing adapter or registry is
replaced; current `cursor` local-CLI adapter is untouched. Default
behavior of any existing agent is unchanged.
- **External dependency on `@cursor/sdk`.** Cursor's SDK is v1.0.x and
may evolve. Mocked unit tests cover the public surface used here; if the
SDK breaks compatibility we update the adapter independently.
- **Cost/budget.** `cursor_cloud` runs on Cursor's billed cloud VMs;
operators must understand they are spending money outside Paperclip's
budget controls when they enable this adapter. Same shape as other
API-billed adapters.
- **No webhook support in V1.** The SDK already provides
stream/wait/cancel/reattach, so V1 does not require a public callback
URL. If a future use case needs out-of-band wakes, we add a Cloud API v1
webhook bridge as a separate change. This is called out in the issue
plan document.
- **Lockfile.** Per repo policy, `pnpm-lock.yaml` is intentionally not
in this PR — CI's lockfile workflow will update it on merge given the
manifest changes.
## Model Used
- Provider: Anthropic Claude (via Claude Code / Paperclip `claude_local`
adapter)
- Model: `claude-opus-4-7` (Claude Opus 4.7), knowledge cutoff January
2026
- Mode: standard tool-use with extended reasoning
- Context: ~200k token window
- Capabilities used: code generation, multi-file edits, shell/test
execution, GitHub PR workflow
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass (11/11 in
`packages/adapters/cursor-cloud`)
- [x] I have added or updated tests where applicable (4 new test files,
11 cases)
- [ ] If this change affects the UI, I have included before/after
screenshots (the only UI change is hiding the local-Cursor mode field on
the `cursor_cloud` adapter — happy to attach a screenshot if the
reviewer wants one)
- [x] I have updated relevant documentation to reflect my changes (issue
plan document supersedes the pre-SDK design; tracked in PAPA-203)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies.
> - The plugin system needs host contracts and runtime support before
large plugins can integrate cleanly.
> - The source branch mixed the LLM Wiki package with supporting
host/runtime work, managed plugin skills, root-level storage spaces, and
a bookmarks reference plugin.
> - [PAP-9173](/PAP/issues/PAP-9173) asked for the current branch to be
split by file boundary: plugin package separately from everything else.
> - [PAP-9188](/PAP/issues/PAP-9188) clarified that LLM Wiki may have
plugin-local spaces, but Paperclip core should not reorganize top-level
local storage into spaces.
> - Follow-up review clarified that the bookmarks example should not
ship in this PR either.
> - This pull request contains the
non-`packages/plugins/plugin-llm-wiki/` host/runtime work, keeps runtime
state under the selected Paperclip instance root, and no longer includes
the bookmarks example.
## What Changed
- Added/updated plugin host contracts, SDK types, worker RPC plumbing,
managed plugin skill support, and related server tests.
- Removed the bookmarks example plugin package and its
bundled-example/workspace references.
- Removed the root-level local spaces CLI/migration surface and restored
instance-root runtime defaults for config, db, logs, storage, secrets,
workspaces, projects, and adapter homes.
- Replaced shared root `space-paths` helpers with `home-paths` helpers
for core runtime storage.
- Tightened stranded recovery unique-conflict detection so concurrent
recovery scans reuse the raced recovery issue when Postgres errors are
wrapped.
- Kept `packages/plugins/plugin-llm-wiki/` out of this PR diff;
plugin-local spaces remain in the stacked plugin-only PR.
## Verification
- `pnpm exec vitest run cli/src/__tests__/data-dir.test.ts
cli/src/__tests__/home-paths.test.ts cli/src/__tests__/onboard.test.ts
packages/shared/src/home-paths.test.ts
packages/db/src/runtime-config.test.ts
server/src/__tests__/agent-instructions-service.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/codex-local-execute.test.ts`
- `pnpm exec vitest run packages/db/src/runtime-config.test.ts`
- `pnpm exec vitest run
server/src/__tests__/plugin-routes-authz.test.ts`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm exec vitest run
server/src/__tests__/heartbeat-process-recovery.test.ts -t "reuses the
raced stranded recovery issue"` skipped locally because embedded
Postgres did not initialize on this macOS temp host; the code path was
typechecked and is covered by Linux CI.
- Boundary check: no core references remain for `PAPERCLIP_SPACE_ID`,
`spaces migrate-default`, `@paperclipai/shared/space-paths`,
`registerSpacesCommands`, or the removed bookmarks example.
- Previous PR head `4f23e034` had green GitHub checks: `verify`, all
four serialized server shards, `e2e`, `Canary Dry Run`, `policy`, Snyk,
and `Greptile Review`. Current head `582f466d` is re-running checks
after the bookmarks deletion.
## Risks
- Plugin host changes touch shared runtime paths, so regressions would
most likely appear in adapter startup, plugin loading, or local dev path
defaults.
- Removing the bookmarks example also removes one demonstration of
plugin database namespaces plus local-folder persistence; remaining
plugin examples still cover bundled example discovery and plugin host
flows.
- The plugin package itself is intentionally deferred to the stacked
plugin-only PR, where LLM Wiki plugin-local spaces live.
- Existing installs that tested the transient root-level spaces CLI
should stop using it; this PR intentionally removes that unsupported
migration surface before merge.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI GPT-5 Codex via Codex CLI, tool use and local code execution
enabled; context window not exposed.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass, except where noted above
for host-specific embedded Postgres initialization
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
Stacked follow-up: PR #5592 contains only
`packages/plugins/plugin-llm-wiki/` and targets this branch.
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies.
> - Company Environments is the operator-facing seam for choosing where
compatible adapters execute work.
> - Sandbox provider plugins such as E2B extend that seam, but they are
not agent adapters themselves.
> - The current Company Environments copy put adapter capability rows
and sandbox-provider enablement on the same page without clearly
distinguishing the two concepts.
> - That made it look like installing the E2B sandbox provider caused a
new adapter to appear under adapters.
> - This pull request clarifies the UI language so provider plugins are
described as backing the Sandbox driver rather than being adapter types.
> - The benefit is a more accurate mental model for operators
configuring environments and adapters.
## What Changed
- Added explicit Company Environments copy stating that installed
sandbox providers are not adapter types and instead back the Sandbox
driver for compatible adapters.
- Renamed the support-matrix column from `Sandbox` to `Sandbox via
plugin` to make the provider relationship visible in the table itself.
- Extended the existing environments UI test to assert the new
clarification text.
## Verification
- `pnpm test -- --run ui/src/pages/CompanySettings.test.tsx`
Result: could not complete cleanly in this worktree because the checkout
is missing its local workspace install links.
- Direct Vitest fallback against `ui/src/pages/CompanySettings.test.tsx`
Result: failed before test collection on local dependency resolution
(`react/jsx-dev-runtime`), so there is no passing automated signal from
this checkout.
- Manual review
Confirm the Company Environments page now says sandbox providers are not
adapter types and labels the table column as `Sandbox via plugin`.
## Risks
- Low risk. This is a copy-only UI clarification plus a matching test
assertion; the main risk is wording drift if the product later decides
sandbox providers should be surfaced differently.
## Model Used
- OpenAI Codex via the local `codex_local` Paperclip adapter. This run
used tool-assisted code editing and shell execution. The exact backend
model ID and context window are not exposed in the Paperclip run context
for this session.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [ ] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [ ] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Its release automation publishes canary packages to npm and then
validates the published registry state before considering the release
healthy
> - The failing canary run `25139465018` showed that npm can expose a
newly published version through version-specific endpoints before the
root package document has fully converged
> - That made a successful canary publish look like a failed release
because the verifier trusted stale root metadata too early
> - This pull request hardens the registry verification path by
preferring version-specific manifest checks, retrying
convergence-sensitive failures, and distinguishing permanent failures
from propagation lag
> - While validating that change in CI, a separate teardown race in
`heartbeat-stale-queue-invalidation.test.ts` surfaced and was hardened
so the PR could pass reliably
> - The benefit is that transient npm propagation lag no longer fails a
successful canary publish, while genuine registry-state and
dependency-integrity failures still stop the release flow promptly
## What Changed
- Hardened `scripts/verify-release-registry-state.mjs` so it prefers
version-specific manifest resolution over stale root metadata, adds
bounded registry-fetch timeouts, and classifies failures as retriable vs
non-retriable.
- Updated `scripts/release-lib.sh` and `scripts/release.sh` so
post-publish registry verification retries only convergence-sensitive
failures and reports immediate permanent failures clearly.
- Expanded `scripts/verify-release-registry-state.test.mjs` with
regression coverage for stale root metadata, fetch timeout behavior,
peer dependency range handling, non-retriable canary-latest cases, and
related verifier edge cases.
- Hardened
`server/src/__tests__/heartbeat-stale-queue-invalidation.test.ts`
teardown to tolerate the late-comment foreign-key race that CI exposed
while validating this branch.
## Verification
- `pnpm run test:release-registry`
- `node --check scripts/verify-release-registry-state.mjs`
- `bash -n scripts/release.sh && bash -n scripts/release-lib.sh`
- PR checks passed on head `5c422600fc12acac61f6b7c267a4dc915df622b1`:
`policy`, `verify`, `e2e`, `security/snyk`, and `Greptile Review`
## Risks
- Low risk. The main behavioral changes are limited to release
automation and verifier retry semantics, plus a test-only teardown
hardening for a CI race.
> I checked [`ROADMAP.md`](ROADMAP.md). This is a narrow release bugfix
and does not overlap planned core feature work.
## Model Used
- OpenAI Codex via Paperclip `codex_local` with tool use and local code
execution enabled. This agent session runs on a GPT-5-class coding
model; the exact backend model ID/context window is not exposed by the
local adapter runtime.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I have addressed all Greptile and reviewer comments before
requesting merge
Auto-generated lockfile refresh after dependencies changed on master.
This PR only updates pnpm-lock.yaml.
Co-authored-by: lockfile-bot <lockfile-bot@users.noreply.github.com>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies.
> - The server, DB package, and CLI all rely on the shared Drizzle ORM
dependency for core persistence flows.
> - A published install was still resolving nested `drizzle-orm@0.38.4`,
which left the production package graph behind the intended security
update.
> - The repo’s documented dependency policy says GitHub Actions owns
`pnpm-lock.yaml`, so the correct maintainer workflow is to update
dependency manifests in the feature PR and let the lockfile refresh
happen separately after merge.
> - This pull request therefore keeps the Drizzle upgrade to the package
manifests only and leaves lockfile regeneration to the existing `Refresh
Lockfile` automation.
## What Changed
- Updated `drizzle-orm` dependency declarations in `cli/package.json`,
`packages/db/package.json`, and `server/package.json` from `0.38.4` /
`^0.38.4` to `0.45.2` / `^0.45.2`.
- Re-verified the packed `@paperclipai/db` and `@paperclipai/server`
publish payloads to confirm their generated `package.json` files
advertise `drizzle-orm ^0.45.2`.
- Removed the temporary lockfile/CI follow-up commits so the branch now
matches the intended manifest-only protocol.
## Verification
- `pnpm list drizzle-orm -r --depth 0`
- `pnpm exec vitest run packages/db/src/client.test.ts
server/src/__tests__/issues-service.test.ts`
- `pnpm run test:release-registry`
- Packed `@paperclipai/db` and `@paperclipai/server` locally and
inspected the tarball `package.json` files to confirm they advertise
`drizzle-orm ^0.45.2`.
## Risks
- Low to moderate risk: the runtime code paths are unchanged, but
downstream lockfile refresh now depends on the existing post-merge
GitHub automation working as documented.
- A separate packaging/versioning issue around unpublished
`@paperclipai/plugin-sdk@1.0.0` showed up during a raw local tarball
install experiment; that is called out for reviewers but is not part of
this Drizzle bump.
## Model Used
- OpenAI Codex via the `codex_local` adapter, using a GPT-5-based coding
agent with terminal tool use and code execution. The adapter does not
expose a public exact model ID or context-window value in this
environment.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip is the control plane for AI-agent companies.
> - The board UI sidebar is one of the main ways operators scan active
agents and projects.
> - Agents and projects had duplicated section header behavior, which
made collapse controls, add actions, and future section menus harder to
keep consistent.
> - Operators also need lightweight ways to switch between their curated
sidebar order and common scan orders like alphabetical or recent
activity.
> - This pull request introduces a shared sidebar section header and
uses it for the Agents and Projects sidebar sections.
> - The benefit is a more consistent sidebar surface with reusable
header controls and persisted sort modes without losing the existing
drag-ordered Top view.
## What Changed
- Added a reusable `SidebarSection` component that supports collapsible
content, header actions, and section dropdown menus.
- Updated the Agents sidebar section to use the shared header and add
persisted `Top`, `Alphabetical`, and `Recent` sort modes.
- Updated the Projects sidebar section to use the shared header and add
persisted `Top`, `Alphabetical`, and `Recent` sort modes.
- Added local-storage helpers and cross-tab update events for
agent/project sidebar sort preferences.
- Added focused component coverage for the shared section behavior and
the updated Agents/Projects sidebar ordering paths.
## Verification
- `pnpm run preflight:workspace-links && pnpm exec vitest run
ui/src/components/SidebarSection.test.tsx
ui/src/components/SidebarProjects.test.tsx
ui/src/components/SidebarAgents.test.tsx`
- 3 test files passed
- 18 tests passed
## Risks
- Low-to-moderate UI risk: this changes sidebar section header
interactions and adds persisted client-side sort preferences.
- Drag ordering is intentionally limited to `Top` mode; non-top modes
render sorted lists and do not persist drag order changes.
- No database migrations or API contract changes.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex coding agent, GPT-5-based model, tool-use enabled; exact
hosted model build/context-window identifier was not exposed in this
session.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The release pipeline gates new public packages behind a bootstrap
policy: `scripts/check-release-package-bootstrap.mjs` requires every
package marked `publishFromCi: true` in
`scripts/release-package-manifest.json` to already exist on npm
> - PR #5580 added the new Daytona sandbox provider plugin but had to
land with `publishFromCi: false` because the package had never been
published, so CI's release plan would have failed bootstrap validation
otherwise
> - Now that `@paperclipai/plugin-daytona` has been bootstrap-published
to npm by hand, the temporary `false` flag is the only thing keeping it
out of the standard CI publish flow
> - This pull request flips the Daytona entry to `publishFromCi: true`,
matching every other release-enabled package in the manifest
> - The benefit is that future tagged releases will publish the Daytona
plugin automatically alongside the rest of the monorepo's public
packages
## What Changed
- Single-line flip in `scripts/release-package-manifest.json`:
`@paperclipai/plugin-daytona` is now `publishFromCi: true`
## Verification
- `node ./scripts/release-package-map.mjs check` → `Release package
manifest OK: 19 enabled for CI publish, 0 disabled pending bootstrap`
(was 18 + 1)
- `node ./scripts/check-release-package-bootstrap.mjs
scripts/release-package-manifest.json` against `origin/master` →
`Release bootstrap OK for changed manifests:
@paperclipai/plugin-daytona`, confirming npm sees the
bootstrap-published package
- No code changes; no tests required beyond the existing manifest
validators
## Risks
- Low risk. Only effect is that the next release run will include
`@paperclipai/plugin-daytona` in its publish set
- If the npm bootstrap was incomplete, CI's bootstrap check will fail
loudly before any release tag goes out — same safety net the policy is
designed to provide
## Model Used
- Claude Opus 4.7 (`claude-opus-4-7`), extended thinking, tool use
enabled
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [ ] I have added or updated tests where applicable (N/A —
manifest-only flag flip, covered by existing validators)
- [ ] If this change affects the UI, I have included before/after
screenshots (N/A — release config)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI-agent companies and needs secrets handling
to work across local development, hosted operators, and governed agent
execution.
> - The affected subsystem is the company-scoped secrets control plane:
database schema, server services/routes, CLI workflows, and the Secrets
settings UI.
> - The gap was that secrets were local-only and operators could not
manage provider vaults or import existing remote references without
exposing plaintext.
> - This branch adds provider vault configuration plus an AWS Secrets
Manager remote-import path while preserving company boundaries, binding
context, and audit trails.
> - I kept the PR to a single branch PR, removed unrelated
lockfile/package drift, rebased the full branch onto the current
`public-gh/master`, and addressed fresh Greptile findings.
> - The benefit is a reviewable implementation of provider-backed
secrets with focused tests covering provider selection, import
conflicts, deleted secret reuse, rotation guards, and AWS signing
behavior.
## What Changed
- Added provider vault support for company secrets, including provider
config storage, default vault handling, health checks, binding usage,
access events, and remote import preview/commit.
- Added an AWS Secrets Manager provider using SigV4 request signing,
bounded request timeouts, namespace guardrails, cached runtime
credential resolution, and external-reference linking without plaintext
reads.
- Added Secrets UI surfaces for vault management and remote import, plus
CLI/API documentation for setup and operations.
- Stabilized routine webhook secret binding paths and SSH
environment-driver fixture bindings discovered during verification.
- Addressed Greptile and CI findings: no lockfile/package drift,
monotonic migration metadata, disabled-vault default races, soft-deleted
secret hiding/recreate behavior, remove behavior with disabled vaults,
soft-deleted external-reference re-import, non-active rotation guards,
managed-secret soft deletion through PATCH, and per-call AWS SDK
credential client churn.
- Rebased this branch onto `public-gh/master` at `0e1a5828` and
force-pushed with lease to keep this as the single PR for the branch.
## Verification
- `git fetch public-gh master`
- `git rebase public-gh/master`
- `git diff --name-only public-gh/master...HEAD | grep
'^pnpm-lock\.yaml$' || true` confirmed `pnpm-lock.yaml` is not in the PR
diff.
- Confirmed migration ordering: master ends at `0081_optimal_dormammu`;
this PR adds `0082_dry_vision` and
`0083_company_secret_provider_configs`.
- Inspected migrations for repeat safety: new tables/indexes use `IF NOT
EXISTS`; foreign keys are guarded by `DO $$ ... IF NOT EXISTS`; column
additions use `ADD COLUMN IF NOT EXISTS`.
- `pnpm -r typecheck` passed before the Greptile follow-up commits.
- `pnpm test:run` ran the full stable Vitest path before the Greptile
follow-up commits; it completed with 3 timing-related failures under
parallel load: `codex-local-execute.test.ts`,
`cursor-local-execute.test.ts`, and `environment-service.test.ts`.
- `pnpm --filter @paperclipai/server exec vitest run
src/__tests__/codex-local-execute.test.ts
src/__tests__/cursor-local-execute.test.ts
src/__tests__/environment-service.test.ts` passed on targeted rerun
(`24/24`).
- `pnpm build` passed before the Greptile follow-up commits. Vite
reported existing chunk-size/dynamic-import warnings.
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server
exec vitest run src/__tests__/secrets-service.test.ts` passed (`26/26`).
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server
exec vitest run src/__tests__/aws-secrets-manager-provider.test.ts
src/__tests__/secrets-service.test.ts` passed (`39/39`).
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server
typecheck` passed.
- Captured Storybook screenshots from `ui/storybook-static` for visual
review.
- Latest PR checks on `5ca3a5cf`: `policy`, serialized server suites
1/4-4/4, `Canary Dry Run`, `e2e`, `security/snyk`, and `Greptile Review`
pass; aggregate `verify` is still registering the completed child
checks.
- Greptile review loop continued through the latest requested pass; all
Greptile review threads are resolved and the latest `Greptile Review`
check on `5ca3a5cf` passed with 0 comments added.
## Screenshots
Before: the provider-vault and remote-import surfaces did not exist on
`master`; these are after-state screenshots from the Storybook fixtures.



## Risks
- Migration risk: this adds new secret provider tables and extends
existing secret rows. The migrations were checked for monotonic ordering
and idempotent guards, but reviewers should still inspect upgrade
behavior carefully.
- Provider risk: AWS support uses direct SigV4 requests. Automated tests
cover signing, request timeouts, vault-config selection, namespace
guardrails, pending-version archival, sanitized provider errors, and
service-level cleanup paths. A real-vault AWS smoke test remains
deployment validation for an operator with AWS credentials rather than
an unverified merge blocker in this local branch.
- UI risk: the Secrets page and import dialog are large new surfaces;
screenshots are included above for reviewer inspection.
- Verification risk: the full local stable test command hit
parallel-load timing failures, although the exact failed files passed
when rerun directly.
- Operational risk: remote import intentionally avoids plaintext reads;
operators must understand that imported external references resolve at
runtime and may fail if AWS permissions change.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5 coding agent with local shell/tool use in the
Paperclip worktree. Exact context-window size was not exposed by the
runtime.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [ ] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Agents need isolated sandbox environments to execute work safely;
Paperclip already supports E2B as a sandbox provider plugin
> - Users want to use Daytona (https://www.daytona.io/) as an
alternative sandbox backend, but no plugin existed for it
> - Without a Daytona plugin, teams that prefer Daytona's
pricing/regions/runtime can't run Paperclip agents on it
> - This pull request adds a `@paperclip/sandbox-provider-daytona`
plugin that mirrors the existing E2B plugin shape and wires up Daytona's
`@daytonaio/sdk` for sandbox lifecycle, command execution, and shell
detection
> - The benefit is that operators can pick Daytona as a first-class
sandbox provider without touching core code, broadening Paperclip's
runtime options
## What Changed
- New plugin package `packages/plugins/sandbox-providers/daytona` with
manifest, worker entry, and provider implementation backed by
`@daytonaio/sdk`
- Implements sandbox create/destroy/exec/upload/download lifecycle,
shell command detection, and config/env wiring consistent with the E2B
plugin
- Adds unit tests under `src/plugin.test.ts` and a README documenting
setup and the `DAYTONA_API_KEY` requirement
- Minor adjustments in `scripts/paperclip-issue-update.sh`,
`packages/shared/src/issue-thread-interactions.test.ts`, and
`packages/shared/src/validators/issue.ts` to support the integration
## Verification
- Re-ran the full sandbox provider matrix on the QA Paperclip instance
using Daytona as the runtime — all 6 adapters executed inside the
Daytona sandbox with zero `environmentExecute` timeouts
- 5/6 adapters pass cleanly (or with informational warns); the only
failure is `codex_local`, which is an OpenAI quota/billing issue
unrelated to Daytona
- `pnpm --filter @paperclip/sandbox-provider-daytona test` runs the
plugin unit tests
## Risks
- New optional plugin; no behavior change for users who don't enable it
- Requires `DAYTONA_API_KEY` for runtime use — documented in the plugin
README
- Daytona SDK is a new external dependency; tracked in the plugin's own
package.json so it doesn't affect the core install footprint
## Model Used
- Claude Opus 4.7 (`claude-opus-4-7`), extended thinking, tool use
enabled
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots (N/A — backend plugin)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies, and the
release pipeline is part of keeping that control plane shippable.
> - The relevant subsystem here is the release automation in
`scripts/release.sh`, which publishes canary builds and then verifies
npm registry state.
> - The failing CI run showed a successful publish followed by an
immediate registry-state verification failure while npm dist-tag
metadata was still propagating.
> - That made the canary job flaky even when the publish itself had
succeeded, which is the wrong failure mode for release automation.
> - This pull request adds bounded retries around the post-publish
registry-state verification step instead of failing on the first stale
read.
> - The benefit is that canary releases tolerate transient npm
propagation lag while still failing clearly if registry metadata never
converges.
## What Changed
- Wrapped the post-publish `verify-release-registry-state.mjs` call in a
bounded retry loop inside `scripts/release.sh`.
- Reused the existing publish verification retry defaults and added
optional overrides via `NPM_REGISTRY_STATE_VERIFY_ATTEMPTS` and
`NPM_REGISTRY_STATE_VERIFY_DELAY_SECONDS` for dist-tag-specific tuning.
## Verification
- `bash -n scripts/release.sh`
- CI will also exercise the release path via the existing `Canary Dry
Run` workflow job in `.github/workflows/pr.yml`.
## Risks
- Low risk. The main behavioral change is that a genuinely broken
registry-state verification can now wait through the configured retry
window before failing.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex local agent, GPT-5-based Codex runtime in Paperclip with
tool use and shell execution. The exact backend model ID/context window
is not surfaced in this local heartbeat environment.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies, so issue
threads are a core operator surface for reviewing work.
> - The issue detail page is the place where humans read agent messages,
user comments, and execution context together.
> - That thread originally rendered oldest-first, which made recent
activity harder to see during active review.
> - Reversing the thread order changes navigation expectations,
timestamp placement, and the "Jump to latest" affordance, so the UI
behavior needed to move as a coherent set.
> - Because this is a visible core-product behavior shift, it also
needed a safe rollout path instead of becoming the default immediately.
> - This pull request adds the newest-first issue thread behavior behind
an Experimental setting, updates the thread UI to match that mode, and
keeps the legacy oldest-first experience unchanged by default.
> - The benefit is that reviewers can opt into a more recent-first issue
workflow without forcing a global behavior change on every Paperclip
instance.
## What Changed
- Reversed issue thread rendering so the newest comments and messages
appear first when the experiment is enabled.
- Moved the plain comment timestamp into the card header in newest-first
mode and kept the legacy timestamp placement for oldest-first mode.
- Moved the `Jump to latest` control to the bottom of the thread in
newest-first mode while leaving the existing top placement for the
legacy mode.
- Added the `Enable Newest-First Issue Thread` experimental instance
setting and wired issue detail to read that toggle.
- Added regression coverage for thread order, timestamp placement,
jump-button placement, and the issue-detail experiment toggle behavior.
## Verification
- `pnpm -r typecheck`
- `pnpm test:run`
- `pnpm build`
- Focused checks that also passed during issue review:
- `pnpm vitest run src/components/IssueChatThread.test.tsx
src/pages/IssueDetail.test.tsx` in `ui/`
- `pnpm vitest run src/__tests__/instance-settings-routes.test.ts` in
`server/`
- Manual review path:
- Enable `Instance Settings > Experimental > Enable Newest-First Issue
Thread`
- Open an issue with comments/messages and confirm newest activity
renders first, timestamps move into the header, and `Jump to latest`
sits below the thread
- Disable the experiment and confirm the legacy oldest-first behavior
returns
## Risks
- Low risk: the behavioral change is gated behind an instance-level
experimental toggle and defaults off.
- The main regression risk is thread navigation drift between the two
modes, especially around anchor scrolling and the `Jump to latest`
affordance.
- There is some UI coupling between issue-detail query state and
experimental settings fetches, so future changes in that area should
keep both modes covered.
- Screenshots are not attached in this PR body; verification is
described with automated coverage and manual steps instead.
> I checked [`ROADMAP.md`](ROADMAP.md). This is a scoped issue-thread UX
improvement and rollout gate, not a duplicate of a roadmap-level planned
core feature.
## Model Used
- OpenAI Codex via the local `codex_local` Paperclip adapter,
GPT-5-based coding agent with terminal tool use and local code execution
in this repository worktree.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The Cursor adapter spawns the Cursor CLI against local, SSH, and
sandbox execution targets; on a fresh sandbox lease, it has to resolve
where Cursor was installed
> - The previous resolver only looked for `~/.local/bin/cursor-agent`
even though the official installer (and the adapter's own
`SANDBOX_INSTALL_COMMAND`) sometimes lays the binary down as
`~/.local/bin/agent`, so a sandbox where the install ran successfully
would still fail to find the CLI
> - This pull request lets the resolver accept either basename and lets
the caller pass an optional `remoteSystemHomeDirHint` so a probe doesn't
pay the cost of a remote `printf $HOME` round-trip when the home
directory is already known
> - The benefit is sandboxed Cursor runs find the binary that the
install actually produced, and runtime probes are cheaper when the home
dir is already resolved
## What Changed
- `packages/adapters/cursor-local/src/server/remote-command.ts`: accept
either `agent` or `cursor-agent` as the preferred basename; new optional
`remoteSystemHomeDirHint` short-circuits the home-dir probe
- `packages/adapters/cursor-local/src/server/execute.ts`: thread the
home-dir hint through, prefer the resolved binary path, and shift the
effective execution cwd to the per-run managed subdirectory once the
runtime is prepared
- New `remote-command.test.ts` and `execute.test.ts` cover both
basenames, the hint short-circuit, and the cwd shift
- `packages/adapters/cursor-local/src/index.ts`: update doc string to
reflect the broader resolution
- `execute.remote.test.ts` updated to expect the managed-subdirectory
cwd shape introduced by the cwd shift
## Verification
- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-cursor-local` — 6/6 passing
- `pnpm typecheck` clean
- Manual: a fresh sandbox lease with `npm install -g …`-installed Cursor
(binary lands as `~/.local/bin/agent`) now runs cleanly through the
adapter
## Risks
Low. Resolver is strictly broader (matches a superset of paths);
existing setups with `~/.local/bin/cursor-agent` continue to work. The
home-dir hint is opt-in; callers that don't pass it get the existing
probe behavior. Cursor's effective execution cwd now matches the rest of
the adapters (per-run managed subdirectory) — sessions previously rooted
at the workspace root will land in the new subdirectory.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests cover
both basenames + hint short-circuit + cwd shift
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---
> **Stacked PR.** Sits on top of #5445 (which sits on #5444). Cumulative
diff against `master` includes both of those PRs' content; the files
touched by *this* PR's commit are listed under "What Changed" above.
Will rebase onto `master` and force-push once the prerequisite PRs
merge.
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Adapters expose a Test action that probes the configured runtime —
install, resolvability, hello — to give operators a fast yes/no on
whether an environment is healthy
> - The Codex test path was running its hello probe directly without
going through the managed-runtime preparation that production runs use,
so a healthy production setup could still report a probe failure
> - The plugin worker manager wasn't surfacing terminated workers
cleanly, leaving the runtime probe waiting on a dead worker until the
request timed out
> - This pull request routes the Codex test probe through
`prepareAdapterExecutionTargetRuntime` (so it sees the same managed
Codex home production sees), exposes `commandCwd` on
`createCommandManagedRuntimeClient` so callers can target a per-probe
directory without leaking the workspace `remoteCwd`, and propagates
plugin-worker termination as a usable error instead of a hang
> - The benefit is the Codex Test action mirrors production behavior
end-to-end, and probes against a terminated plugin worker fail fast
instead of timing out
## What Changed
- `packages/adapter-utils/src/command-managed-runtime.ts`: rename the
`remoteCwd` knob to `commandCwd` so callers can target a per-probe
directory without inheriting the workspace cwd; matching test coverage
in `command-managed-runtime.test.ts`
- `packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`:
small fixes to keep callback bridge stop semantics deterministic
- `packages/adapters/codex-local/src/server/test.ts`: thread the Codex
hello probe through `prepareAdapterExecutionTargetRuntime` +
`prepareManagedCodexHome` so the probe sees the same managed home
production sees; new `test.remote.test.ts` covers the remote probe path
- `packages/adapters/cursor-local/src/server/execute.ts`: small
probe-side cleanup that aligns with the new commandCwd contract
- `server/src/services/plugin-worker-manager.ts`: surface plugin-worker
termination as a structured error so callers fail fast; new
`plugin-worker-terminated.cjs` fixture and
`plugin-worker-manager.test.ts` cases pin the behavior
## Verification
- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils
--project @paperclipai/adapter-codex-local --project
@paperclipai/adapter-cursor-local --project @paperclipai/server` —
1749/1750 passing (1 unrelated skip)
- `pnpm typecheck` clean
## Risks
Low–medium. The `remoteCwd → commandCwd` rename is a parameter renaming
on an internal helper used only by adapter test/execute paths in this
repo. The plugin-worker-terminated path was previously a hang; failing
fast may surface latent timeouts as explicit termination errors in
callers that already expected them.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests cover
commandCwd, plugin-worker termination, and Codex remote test path
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---
> **Stacked PR.** Sits on top of #5444 which adds the per-run runtime
API surface this PR builds on. Cumulative diff against `master` includes
that PR's content; the files touched by *this* PR's commit are listed
under "What Changed" above. Will rebase onto `master` and force-push
once #5444 merges.
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - When an agent runs against a remote target, Paperclip syncs the
workspace out to the remote at run start and restores changes back to
the local workspace at run end
> - The previous restore flow naïvely overwrote local files with
whatever the remote returned, so files that the remote run never touched
but had timestamp/mode drift could be needlessly rewritten — and a
single static `refs/paperclip/ssh-sync/imported` ref made concurrent SSH
workspace exports race on the same git ref
> - This pull request adds a `workspace-restore-merge` module that diffs
a pre-run snapshot against the post-run remote state and only writes
back files the remote actually changed; SSH workspace exports now use a
per-import unique ref so concurrent runs can't trample each other
> - Every adapter's execute path threads the snapshot through
`prepareAdapterExecutionTargetRuntime` so the merge has the baseline it
needs
> - The benefit is workspace restores no longer churn untouched files,
and concurrent SSH runs no longer collide on the import ref
## What Changed
- `packages/adapter-utils/src/workspace-restore-merge.{ts,test.ts}`: new
module — directory snapshot (kind/mode/sha256/symlink target) plus
snapshot-aware merge that writes only the files the remote changed
- `packages/adapter-utils/src/ssh.ts`: SSH workspace export uses a
per-import unique ref (`refs/paperclip/ssh-sync/imported/<uuid>`);
restore goes through the new merge helper; `ssh-fixture.test.ts` covers
the unique-ref + merge paths
- `packages/adapter-utils/src/sandbox-managed-runtime.ts` +
`remote-managed-runtime.ts`: thread the snapshot/merge through the
sandbox and SSH paths
- `packages/adapter-utils/src/server-utils.{ts,test.ts}` +
`execution-target.ts`: helpers for capturing the pre-run snapshot;
`prepareAdapterExecutionTargetRuntime` gains required `runId` and
optional `workspaceRemoteDir`, and returns the realized
`workspaceRemoteDir`
- Each adapter's `execute.ts` (acpx, claude, codex, cursor, gemini,
opencode, pi) takes the snapshot at run start and passes it through to
the runtime restore
- Remote execute test mocks updated to match the new
`prepareWorkspaceForSshExecution` return shape and the per-run
`${managedRemoteWorkspace}` cwd subdirectory
## Verification
- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils
--project @paperclipai/adapter-acpx-local --project
@paperclipai/adapter-claude-local --project
@paperclipai/adapter-codex-local --project
@paperclipai/adapter-cursor-local --project
@paperclipai/adapter-gemini-local --project
@paperclipai/adapter-opencode-local --project
@paperclipai/adapter-pi-local` — 196/196 passing
- `pnpm typecheck` clean across the workspace
## Risks
Medium. The restore path now writes a strict subset of what it
previously did — files the remote did not touch are no longer rewritten.
If any flow was relying on a touch-without-content-change being copied
back (timestamp or permission propagation only), that behavior is now
skipped. Snapshot capture adds an O(N-files-in-workspace) hash pass at
run start; the cost is bounded by the existing exclude list. The `runId`
parameter on `prepareAdapterExecutionTargetRuntime` is now required —
every in-tree caller is updated; out-of-tree adapter authors need to
pass it.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new module +
every adapter execute path covered
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Operators use the sidebar as their primary board navigation surface
> - The board now has a dedicated search page, so the header search icon
should behave as normal navigation instead of only dispatching a
command-palette shortcut
> - The Work nav also had a separate Search row, which duplicated the
always-visible header search affordance
> - This pull request keeps search one click away while making it a
direct `/search` link and reducing sidebar nav noise
> - The benefit is a smaller, clearer sidebar with search still
accessible from the top-level chrome
## What Changed
- Changed the sidebar header search icon into a direct `NavLink` to
`/search`.
- Removed the duplicate `Search` row from the Work navigation section.
- Added focused Sidebar coverage that asserts the header search link
target and confirms Search is not rendered in the Work nav.
- Refactored the Sidebar test setup helper to avoid repeating the React
Query wrapper across tests.
## Verification
- `pnpm install --frozen-lockfile` in the PR worktree so workspace
package symlinks existed for test execution. This completed with
existing plugin SDK bin warnings for missing built artifacts.
- `pnpm exec vitest run ui/src/components/Sidebar.test.tsx` — 3 passed.
- `pnpm --filter @paperclipai/ui typecheck` — passed.
## Risks
- Low: this changes a sidebar navigation affordance only. Users who
previously clicked the header icon now land on the full search page
instead of opening the command-palette shortcut path.
- Low: removing the Work nav Search row could affect users who expected
Search in that section, but the icon remains in the fixed sidebar header
and is covered by a targeted DOM test.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex coding agent, GPT-5 model family (`gpt-5`), tool-enabled
Paperclip heartbeat environment. Context window and internal reasoning
mode are not exposed by the runtime.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots or equivalent focused UI verification
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The issue graph and liveness recovery system decide whether assigned
work is executable or parked
> - Assigned issues created without an explicit status could silently
land in backlog, making parents look blocked with no productive wake
path
> - The server, shared validators, recovery analysis, and UI all need to
agree on that execution semantic
> - This pull request makes assigned issue creation default to `todo`,
flags assigned backlog blockers, and surfaces the state in the board
> - The benefit is that parked assigned work becomes intentional and
visible instead of creating silent liveness stalls
## What Changed
- Adds contract tests for assigned issue creation defaults.
- Defaults assigned issue creation to `todo` when status is omitted
while preserving explicit `backlog` parking.
- Exposes `resolveCreateIssueStatusDefault` through shared validators.
- Teaches liveness/blocker attention paths to distinguish assigned
backlog blockers.
- Adds UI notices, row/header badges, and issue detail safeguards for
assigned backlog blockers.
- Adds Storybook fixtures and execution-semantics documentation for the
assigned-backlog behavior.
## Verification
- `pnpm run preflight:workspace-links && pnpm exec vitest run
packages/shared/src/validators/issue.test.ts
server/src/__tests__/issue-assigned-backlog-contract-routes.test.ts
server/src/__tests__/issue-blocker-attention.test.ts
server/src/__tests__/issue-liveness.test.ts
server/src/__tests__/heartbeat-issue-liveness-escalation.test.ts
ui/src/components/IssueAssignedBacklogNotice.test.tsx
ui/src/components/IssueRow.test.tsx` — 50 passed, 23 skipped.
- Skipped tests were embedded Postgres suites on this host with the repo
skip message: `Postgres init script exited with code null. Please check
the logs for extra info. The data directory might already exist.`
- Pairwise merge check against the issue-controls PR branch completed
without conflicts via `git merge --no-commit --no-ff` in a temporary
worktree.
- Screenshots for assigned-backlog UI states:
[light](docs/pr-screenshots/pr-5428/assigned-backlog-light.png),
[dark](docs/pr-screenshots/pr-5428/assigned-backlog-dark.png).
- Follow-up checks: `pnpm --filter /ui typecheck`; `pnpm --filter
/mcp-server build`; `pnpm --filter /mcp-server test`; `pnpm exec vitest
run packages/shared/src/validators/issue.test.ts`; focused UI component
tests.
- Remote PR checks on head `6300b3c`: policy, verify, serialized server
shards 1/4-4/4, Canary Dry Run, e2e, Greptile Review, and Snyk all
passed.
## Risks
- Medium: changes status defaulting for assigned issue creation when the
caller omits status. Explicit `backlog` remains supported, and
server/shared tests cover both paths.
- Medium: liveness classification changes can affect blocker attention
labels; focused service and UI tests cover the new assigned-backlog
state.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex coding agent, GPT-5 model family (`gpt-5`), tool-enabled
Paperclip heartbeat environment. Context window and internal reasoning
mode are not exposed by the runtime.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Operators spend most of their day scanning skills, routines, inbox
groups, and activity cards
> - Several small UI rough edges made those surfaces harder to scan or
easier to crash on real API payloads
> - These fixes are grouped together because they are low-risk operator
quality-of-life improvements rather than separate control-plane
contracts
> - This pull request polishes skills metadata, routine run-now access,
grouped issue creation defaults, monitor activity rendering, and
activity row identity layout
> - The benefit is a smoother board workflow with fewer small
interruptions while keeping the change set compact
## What Changed
- Improves company skill source display and the used-by agent list.
- Truncates long skill source paths and adds a copy affordance.
- Adds a row-level run-now button to the routines table.
- Adds grouped issue creation defaults for inbox issue groups and aligns
grouped add buttons to the right.
- Fixes `IssueMonitorActivityCard` when `monitorNextCheckAt` arrives as
an ISO string.
- Polishes activity row actor avatar/name layout by using the shared
avatar primitive.
## Verification
- `pnpm run preflight:workspace-links && pnpm exec vitest run
ui/src/pages/Routines.test.tsx ui/src/components/IssuesList.test.tsx
ui/src/lib/inbox.test.ts
ui/src/components/IssueMonitorActivityCard.test.tsx` — 91 passed.
- The routines test emitted the pre-existing Radix warning about missing
`DialogTitle`/description in dialog content; tests still passed.
- Pairwise merge checks against the other two PR branches reported no
textual conflicts.
## Risks
- Low: changes are UI-focused and covered by targeted component/lib
tests.
- Low-to-medium: activity row layout changes could affect dense feed
scanability; the implementation uses the shared avatar component and
keeps truncation behavior.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex coding agent, GPT-5 model family (`gpt-5`), tool-enabled
Paperclip heartbeat environment. Context window and internal reasoning
mode are not exposed by the runtime.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Issue operators need clear controls for execution settings, model
overrides, and recovery retries
> - Existing issue properties hid useful adapter override state and did
not expose a board-triggered retry for scheduled heartbeat recovery
> - Scheduled retries also need to respect the same safety gates as
normal execution instead of bypassing budget, review, pause, dependency,
or terminal-state checks
> - This pull request adds the issue property controls and retry-now
surfaces together because they share the issue details/properties UI
> - The benefit is that operators can inspect and adjust issue execution
settings and safely trigger pending scheduled recovery without hidden
control-plane behavior
## What Changed
- Adds editable issue assignee model override controls in
`IssueProperties`, with focused coverage.
- Removes the stale workspace tasks link from issue properties.
- Adds a scheduled retry `retry-now` backend path and shared response
types.
- Adds main-pane and properties-pane scheduled retry UI, backed by a
shared `useRetryNowMutation` hook.
- Adds suppression coverage for budget hard stops, review participant
changes, subtree pause holds, unresolved blockers, terminal issues, and
company scoping.
- Updates the `IssueProperties` test harness with toast actions required
by the retry-now hook.
## Verification
- `pnpm exec vitest run ui/src/components/IssueProperties.test.tsx
ui/src/components/IssueScheduledRetryCard.test.tsx` — 31 passed.
- `pnpm exec vitest run
server/src/__tests__/issue-scheduled-retry-routes.test.ts` — exited 0,
but this host skipped the embedded Postgres route tests with: `Postgres
init script exited with code null. Please check the logs for extra info.
The data directory might already exist.`
- Pairwise merge check against the assigned-backlog PR branch completed
without conflicts via `git merge --no-commit --no-ff` in a temporary
worktree.
### Visual verification screenshots
Storybook story: `Product/Issue Scheduled retry surfaces /
ScheduledRetrySurfaces`.


## Risks
- Medium: this touches issue execution/retry behavior, so CI should run
the embedded Postgres route tests on a host that can initialize
Postgres.
- Low-to-medium UI risk around duplicated retry-now entry points; both
surfaces share one mutation hook to keep behavior consistent.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex coding agent, GPT-5 model family (`gpt-5`), tool-enabled
Paperclip heartbeat environment. Context window and internal reasoning
mode are not exposed by the runtime.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The issue thread is the operator's durable audit trail for what
changed and why
> - Workspace changes and stale disposition notices need to be visible
in that same timeline without noisy or misleading rendering
> - The local branch already contained backend activity details,
timeline conversion, and UI rendering work for those events
> - This pull request isolates the issue-thread activity work into a
standalone branch against `origin/master`
> - The benefit is a focused audit-trail PR that can merge independently
of the sidebar/operator UI polish branch
## What Changed
- Adds readable workspace-change activity details to issue update
activity events.
- Surfaces workspace-change events in issue chat/timeline rendering.
- Makes the existing issue comment migration idempotent.
- Folds and renders stale disposition notices inline so they match
activity-log styling and spacing.
- Adds focused route, timeline, and issue-thread system notice coverage.
## Verification
- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run
server/src/__tests__/issue-activity-events-routes.test.ts
ui/src/lib/issue-timeline-events.test.ts
ui/src/components/IssueChatThreadSystemNotice.test.tsx` — 3 files
passed, 22 tests passed.
- Confirmed the PR changes 9 files and does not include `pnpm-lock.yaml`
or `.github/workflows/*`.
- `pnpm exec vitest run
server/src/__tests__/issue-closed-workspace-routes.test.ts` — 1 file
passed, 4 tests passed.
- `pnpm exec vitest run
server/src/__tests__/issue-activity-events-routes.test.ts
ui/src/lib/issue-timeline-events.test.ts
ui/src/components/IssueChatThreadSystemNotice.test.tsx
server/src/services/recovery/successful-run-handoff.test.ts
packages/shared/src/validators/issue.test.ts` — 5 files passed, 54 tests
passed.
- `pnpm --filter @paperclipai/shared typecheck && pnpm --filter
@paperclipai/server typecheck && pnpm --filter @paperclipai/ui
typecheck`.
- `pnpm --filter @paperclipai/ui typecheck` after adding the Storybook
screenshot fixture.
- Captured Storybook screenshots for the new UI rendering paths:
- Collapsed stale notice + workspace-change row:
`docs/pr-screenshots/pr-5356/issue-thread-notices-collapsed.png`
- Expanded stale notice details:
`docs/pr-screenshots/pr-5356/issue-thread-notices-expanded.png`
### Screenshots
Collapsed stale notice with workspace-change row:

Expanded stale notice details:

## Risks
- Moderate risk: this touches issue activity serialization and
issue-thread rendering, both of which are central operator surfaces.
- Migration risk is low: the only migration change makes an existing
migration idempotent.
- No new migrations are introduced, so there is no cross-PR migration
ordering requirement.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5 coding agent, shell/tool-use enabled, used to
split the existing branch, verify the isolated PR branch, and create
this PR.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Operators use the board sidebar and issue properties panel to move
between companies and understand task metadata
> - Small UI regressions in these controls make repeated board operation
slower and less predictable
> - The local branch already contained targeted fixes for company
ordering, issue date display, and sidebar rail sizing
> - This pull request isolates those operator UI quality-of-life fixes
into a standalone branch against `origin/master`
> - The benefit is a focused, reviewable PR that can merge independently
of the issue-thread activity work
## What Changed
- Shows issue property timestamps with time, not just dates.
- Adds edit-mode support for ordering companies in the sidebar company
menu.
- Fixes a workspace switcher rail regression and keeps the account menu
aligned with the rail width.
- Includes focused component coverage for the touched controls.
## Verification
- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run ui/src/components/IssueProperties.test.tsx
ui/src/components/SidebarCompanyMenu.test.tsx
ui/src/components/Layout.test.tsx
ui/src/components/SidebarAccountMenu.test.tsx` — 4 files passed, 29
tests passed.
- `pnpm --filter /ui typecheck`
- PR checks on `a4030f7a` are green: policy, verify, serialized server
suites 1/4-4/4, e2e, Canary Dry Run, Greptile Review, and Snyk.
- Captured a local Storybook screenshot of `Product/Navigation & Layout`
after the sidebar polish:
`/tmp/pap-3659-screenshots/navigation-layout-after.png`.
- Confirmed the PR changes 8 files and does not include `pnpm-lock.yaml`
or `.github/workflows/*`.
## Risks
- Low to moderate UI risk: this touches shared sidebar components and
issue metadata rendering.
- The company ordering behavior depends on existing query/cache
behavior, so stale cache bugs would show up as ordering inconsistencies.
- No database, API, workflow, or lockfile changes are included.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5 coding agent, shell/tool-use enabled, used to
split the existing branch, verify the isolated PR branch, and create
this PR.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip relies on issue identifiers, execution policies, and agent
heartbeat rules to keep autonomous work auditable.
> - Safety checks need to reject ambiguous agent handoffs, and
identifier parsing needs to support Cloud tenant prefixes.
> - Agent instructions also need to make final-disposition rules
explicit so work does not stall in vague states.
> - This pull request isolates backend correctness and governance
hardening from the UI and recovery-system-notice branches.
> - The benefit is safer in-review transitions, better identifier
compatibility, and clearer agent operating contracts.
## What Changed
- Fixed run-aware confirmation ordering and interrupted-run state
cleanup.
- Added Cloud tenant identity bootstrap and alphanumeric issue
identifier support across shared parsing and server routes.
- Guarded agent-authored `in_review` updates unless a real review path
exists.
- Tightened heartbeat disposition instructions in adapter
utilities/default AGENTS/Paperclip skill.
## Verification
- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run packages/shared/src/issue-references.test.ts
server/src/__tests__/issue-identifier-routes.test.ts
server/src/__tests__/issue-execution-policy-routes.test.ts
packages/adapter-utils/src/server-utils.test.ts` initially had the first
execution-policy test hit Vitest's 5s timeout under the parallel bundle
while the rest passed.
- `pnpm exec vitest run
server/src/__tests__/issue-execution-policy-routes.test.ts
--testTimeout=20000` passed with 10/10 tests.
- Follow-up: `pnpm run typecheck:build-gaps` passed.
- Follow-up: `pnpm --filter @paperclipai/ui typecheck` passed.
- Follow-up: `pnpm vitest run
server/src/__tests__/issue-comment-reopen-routes.test.ts
server/src/__tests__/company-portability.test.ts
server/src/__tests__/costs-service.test.ts` passed.
- Follow-up: `pnpm vitest run ui/src/context/LiveUpdatesProvider.test.ts
ui/src/lib/issue-chat-messages.test.ts
ui/src/lib/issue-reference.test.ts
ui/src/lib/issue-timeline-events.test.ts` passed.
## Risks
- Medium control-plane risk: in-review update validation changes agent
behavior. The error message is explicit and tests cover allowed review
paths.
## Model Used
- OpenAI GPT-5 Codex via Paperclip `codex_local` adapter, with
shell/git/GitHub CLI tool use.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip is a control plane for autonomous AI companies.
> - Issues are the core unit of work, and issue comments are how board
users and agents coordinate execution.
> - Some issue conversations need to produce plans and approvals instead
of immediate implementation work.
> - The existing issue contract did not distinguish standard execution
comments from planning-oriented issue work.
> - This pull request adds an issue work-mode contract and board UI
affordances for standard vs planning mode.
> - The benefit is that planning-mode issues can be created, displayed,
discussed, and carried through agent heartbeat context without losing
the normal issue workflow.
## What Changed
- Added `standard` / `planning` issue work-mode contracts across DB,
shared validators/types, server issue flows, plugin protocol, and
adapter heartbeat payloads.
- Added an idempotent `0081_optimal_dormammu` migration for
`issues.work_mode`, ordered after current `public-gh/master` migrations.
- Updated heartbeat/context summaries and issue-thread interaction
behavior so planning work mode is preserved when creating suggested
follow-up issues.
- Added UI support for planning-mode issue creation, issue rows, detail
composer styling, and composer work-mode toggles.
- Added focused server/shared/UI tests plus a Playwright visual
verification spec for planning-mode surfaces.
- Rebased the branch onto current `public-gh/master` and added durable
planning-mode screenshots under `doc/assets/pap-3368/`.
## Verification
- `pnpm --filter @paperclipai/db run check:migrations`
- `pnpm exec vitest run --project @paperclipai/shared
packages/shared/src/validators/issue.test.ts`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/heartbeat-context-summary.test.ts
server/src/__tests__/issue-thread-interactions-service.test.ts
server/src/__tests__/issues-goal-context-routes.test.ts --pool=forks
--poolOptions.forks.isolate=true`
- `pnpm exec vitest run --project @paperclipai/ui
ui/src/components/IssueChatThread.test.tsx
ui/src/components/NewIssueDialog.test.tsx
ui/src/components/IssueRow.test.tsx ui/src/pages/IssueDetail.test.tsx`
- `pnpm exec vitest run --project @paperclipai/adapter-utils
packages/adapter-utils/src/server-utils.test.ts`
- `PAPERCLIP_E2E_SKIP_LLM=true npx playwright test --config
tests/e2e/playwright.config.ts
tests/e2e/planning-mode-visual-verification.spec.ts`
## Screenshots
Desktop planning detail:

Desktop planning row:

Desktop staged standard toggle:

Mobile planning detail:

Mobile planning row:

## Risks
- Medium migration risk: this adds a non-null issue column. The
migration uses `ADD COLUMN IF NOT EXISTS` so installations that applied
an older branch-local migration number can still apply the final
numbered migration safely.
- Medium contract risk: issue payloads, plugin payloads, and adapter
heartbeat payloads now include work mode; compatibility is handled by
defaulting missing values to `standard`.
- UI risk is moderate because composer controls changed; focused
component tests and visual e2e coverage exercise standard vs planning
display and toggle behavior.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5 coding agent in a local Paperclip worktree, with
shell/tool use. Exact context-window size is not exposed in this
runtime.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies.
> - Operators need to find work, documents, agents, projects, comments,
and activity across a company without jumping through separate surfaces.
> - The existing Command-K flow was useful for fast navigation but not
enough for deeper company-wide discovery.
> - Search also needs company-scoped backend contracts, query cost
controls, and indexed document matching so it stays safe as company data
grows.
> - This pull request adds a full company search API and a dedicated
board search page that Command-K can hand off to.
> - The benefit is a single searchable control-plane surface with richer
result context, recents, highlights, and test coverage across server and
UI behavior.
## What Changed
- Added a company-scoped search endpoint/service with query validation,
rate limiting, text matching, fuzzy title matching, and result typing
shared through `@paperclipai/shared`.
- Added idempotent search migrations for document search indexes and
fuzzy matching support.
- Added the full `/companies/:companyKey/search` UI, search result row
components, highlighted snippets, recent searches, and sidebar/Command-K
handoff.
- Added Storybook coverage for search surfaces and Vitest coverage for
server search behavior, rate limiting, route generation, Command-K
behavior, and the search page.
- Addressed Greptile findings by renaming the no-match SQL helper,
applying search pagination after cross-type merge sorting, and
lazy-initializing the default search service so unrelated route-test
mocks do not need to know about it.
- Merged current `public-gh/master` and renumbered the search migrations
behind upstream `0078_white_darwin`: search indexes are now
`0079_company_search_document_indexes` and fuzzy matching is
`0080_company_search_fuzzystrmatch`.
## Verification
- `git fetch public-gh master`
- `git diff --check public-gh/master...HEAD`
- `git diff --name-only public-gh/master...HEAD | rg '^pnpm-lock\.yaml$'
|| true` produced no output before opening the PR.
- `pnpm run preflight:workspace-links && pnpm exec vitest run
server/src/__tests__/company-search-service.test.ts
server/src/__tests__/company-search-rate-limit-routes.test.ts
ui/src/pages/Search.test.tsx ui/src/components/CommandPalette.test.tsx
ui/src/lib/company-routes.test.ts` passed: 5 files, 25 tests.
- `pnpm --filter @paperclipai/shared typecheck && pnpm --filter
@paperclipai/db typecheck && pnpm --filter @paperclipai/server typecheck
&& pnpm --filter @paperclipai/ui typecheck` passed.
- `pnpm exec vitest run
server/src/__tests__/company-search-service.test.ts
server/src/__tests__/company-search-rate-limit-routes.test.ts && pnpm
--filter @paperclipai/server typecheck` passed after Greptile pagination
fixes.
- `pnpm exec vitest run
server/src/__tests__/issue-agent-mutation-ownership-routes.test.ts
server/src/__tests__/company-search-rate-limit-routes.test.ts
server/src/__tests__/company-search-service.test.ts && pnpm --filter
@paperclipai/server typecheck` passed after the CI mock fix.
- After resolving the migration conflict with current
`public-gh/master`: `pnpm --filter @paperclipai/db typecheck && pnpm
exec vitest run server/src/__tests__/company-search-service.test.ts
server/src/__tests__/company-search-rate-limit-routes.test.ts && pnpm
--filter @paperclipai/server typecheck` passed.
- DB migration numbering check passed as part of `@paperclipai/db`
typecheck.
- UI states are covered by the added Storybook stories in
`ui/storybook/stories/search.stories.tsx`.
- GitHub reports the PR merge state as `CLEAN` on head `18e54fa8`.
- GitHub PR checks are green on head `18e54fa8`: policy, verify,
serialized server shards 1/4 through 4/4, e2e, canary dry run, Snyk, and
Greptile Review.
## Risks
- Search ranking and snippets are new user-facing behavior, so reviewers
should check whether result ordering feels right on real company data.
- Search touches broad company data, so company scoping and query
cost/rate-limit behavior should be reviewed carefully.
- The migrations add search indexes/extensions; they are idempotent with
`IF NOT EXISTS` for users who may have applied an earlier branch
migration number.
> ROADMAP.md checked. This PR adds a focused board search surface and
does not duplicate an open roadmap item.
## Model Used
- OpenAI Codex, GPT-5 coding agent, tool-enabled shell/git/GitHub CLI
session with medium reasoning effort. Existing branch commits were
produced across prior agent sessions; this packaging pass verified,
opened the PR, addressed Greptile findings, resolved migration conflicts
after upstream PRs landed, and got PR checks green.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
## Thinking Path
> - Paperclip is a control plane operators use repeatedly to supervise
agent companies.
> - Common operator workflows depend on fast scanning of inboxes, issue
sidebars, workspaces, cost totals, and runtime services.
> - Several small UI and service gaps made those workflows slower or
less clear.
> - This pull request groups the operator-facing QoL changes that can
stand alone from recovery and adapter work.
> - The benefit is a denser, clearer board experience for issue triage
and workspace operation.
## What Changed
- Added inbox assignee/project grouping and issue list token/runtime
totals.
- Improved issue properties with removable blocker chips and workspace
task links.
- Improved execution workspace layout, runtime controls, issues tab
default, and stopped-port reuse behavior.
- Added mobile markdown/routine dialog fixes, page title company names,
sidebar polish, and dashboard run task label cleanup.
## Verification
- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run ui/src/lib/inbox.test.ts
ui/src/components/IssueProperties.test.tsx
ui/src/components/WorkspaceRuntimeControls.test.tsx
server/src/__tests__/workspace-runtime.test.ts
server/src/__tests__/costs-service.test.ts`
## Risks
- Medium UI risk because this touches several operator surfaces. The
branch is intentionally grouped around workflow/QoL files and keeps the
file count below the Greptile limit.
## Model Used
- OpenAI GPT-5 Codex via Paperclip `codex_local` adapter, with
shell/git/GitHub CLI tool use.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents across several adapter
implementations.
> - ACPX is a local adapter path that can proxy Claude and Codex-style
execution.
> - Its configuration needed stronger schema defaults, provider-aware
model handling, and better UI support.
> - Plugin authors also need clear docs for managed resources.
> - This pull request improves ACPX adapter configuration and documents
plugin-managed resources.
> - The benefit is a more predictable adapter setup path without
changing unrelated control-plane behavior.
## What Changed
- Improved ACPX config schema, execution config handling, UI build
config, and route coverage.
- Added ACPX model filtering support and tests.
- Updated the agent config form and storybook coverage for ACPX
model/provider behavior.
- Expanded plugin authoring documentation for managed resources.
## Verification
- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run server/src/__tests__/acpx-local-execute.test.ts
server/src/__tests__/adapter-routes.test.ts
ui/src/lib/acpx-model-filter.test.ts`
## Risks
- Low-to-medium risk: adapter configuration behavior changes can affect
ACPX users, but the change is isolated to ACPX/plugin-doc surfaces and
covered by targeted adapter tests.
## Model Used
- OpenAI GPT-5 Codex via Paperclip `codex_local` adapter, with
shell/git/GitHub CLI tool use.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies.
> - Agent runs can end productively while the source issue still lacks a
durable final disposition.
> - That leaves the control plane unsure whether to resume, escalate, or
close the work.
> - Issue comments also need a presentation contract so system-authored
recovery notices can render as first-class thread messages without
overloading normal comments.
> - This pull request adds successful-run handoff recovery, comment
presentation metadata, and system notice rendering.
> - The benefit is stricter task liveness with clearer operator-facing
recovery state.
## What Changed
- Added successful-run handoff decisions, wake payloads, escalation
behavior, and recovery tests.
- Added issue comment presentation metadata with migration
`0078_white_darwin.sql` and shared/server/company portability support.
- Rendered recovery/system notices in issue chat with dedicated UI
components, fixtures, tests, and storybook/lab coverage.
- Included the current recovery model-profile hint patch so automatic
recovery follow-ups use the cheap profile.
## Verification
- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run
server/src/services/recovery/successful-run-handoff.test.ts
ui/src/components/SystemNotice.test.tsx
ui/src/lib/system-notice-comment.test.ts
ui/src/components/IssueChatThreadSystemNotice.test.tsx`
## Risks
- Migration-bearing PR: merge this before any other branch that might
later add a migration.
- The branch touches both recovery services and issue-thread rendering,
so review should pay attention to recovery wake idempotency and comment
metadata compatibility.
## Model Used
- OpenAI GPT-5 Codex via Paperclip `codex_local` adapter, with
shell/git/GitHub CLI tool use.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
> **Stacked PR.** This PR's branch carries cumulative content from #5324
(bridge allowlist expand) and #5325 (env sanitization) — the
mutex/sha256 logic in this PR sits on top of both. Reviewers should
focus on the files this PR's commit touches:
`packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`,
`packages/adapter-utils/src/ssh.ts`, and
`packages/adapter-utils/src/ssh-fixture.test.ts`. Will rebase onto
`master` and force-push once both prerequisite PRs are merged.
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent that runs in a sandbox or via SSH talks back to the
Paperclip server through a per-lease callback bridge whose entrypoint
script is uploaded to the remote
> - When two heartbeats target the same agent on the same machine
concurrently, both upload the bridge entrypoint and both write to the
same response files — producing torn-write races: `SyntaxError:
Identifier 'randomUUID' has already been declared` from a concatenated
upload, `mv: cannot stat …` from colliding `.json.tmp` writes, and
0-byte commits from a truncated stdin
> - This pull request serializes those operations with a POSIX
`mkdir`-mutex (PID liveness check + atomic rename) at the bridge
entrypoint upload, applies the same lock to the bridge response writer,
forwards stdin into remote ssh commands so the entrypoint payload
arrives intact, and verifies a sha256 of the upload before promoting it
> - The benefit is concurrent heartbeats no longer corrupt each other's
bridge state
## What Changed
- `packages/adapter-utils/src/sandbox-callback-bridge.ts`: serialize
entrypoint upload and response writes via POSIX `mkdir`-mutex with PID
liveness; sha256 the upload before promoting via `mv`; content-skip when
the existing entrypoint already matches
- `packages/adapter-utils/src/ssh.ts`: forward stdin into remote ssh
commands through the SSH managed runtime so `cat > "$remote_upload"`
actually receives the base64-encoded entrypoint
- `packages/adapter-utils/src/ssh-fixture.test.ts`: cover the
stdin-forwarded SSH path
- `packages/adapter-utils/src/sandbox-callback-bridge.test.ts`: cover
the mutex, content-skip, sha256-verify, and atomic-rename paths
## Verification
- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils`
- `pnpm typecheck` clean
- Manual: two parallel heartbeats targeting the same SSH agent no longer
race on the bridge entrypoint or response files
## Risks
Medium. Serializing previously-parallel operations adds latency on the
contended path (one heartbeat waits on another), bounded by the
entrypoint upload time. The mutex includes PID liveness so a crashed
heartbeat doesn't deadlock subsequent ones. Sha256-verify gives a clear
"torn upload" failure mode instead of silent 0-byte commits.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — tests cover mutex
+ sha256-verify + stdin-forwarded ssh
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Adapters spawn CLIs against local, SSH, and sandbox targets,
threading a runtime env through `runAdapterExecutionTargetProcess` and
the SSH/sandbox runners
> - Host identity vars (HOME, TMPDIR, XDG_*, NVM_DIR, PATH) routinely
leak into the env we send to remote targets — sometimes via test probes,
sometimes via runtime config — and break sandboxed/SSH'd CLIs whose own
profiles set those values correctly
> - The sanitization logic existed but lived alongside other helpers in
`server-utils.ts` and was applied piecemeal at adapter callsites, so it
was easy to bypass
> - This pull request lifts the sanitization into a standalone
`remote-execution-env.ts`, applies it at the SSH and sandbox runtime
boundary so every remote spawn goes through it, and removes the
duplicated callsite-level filtering
> - The benefit is identity-bound host env stops leaking across
SSH/sandbox transports regardless of which adapter calls in
## What Changed
- `packages/adapter-utils/src/remote-execution-env.ts`: new module —
single source of truth for which env keys are identity-bound and how to
strip them when the value matches the host's value
- `packages/adapter-utils/src/server-utils.ts`: remove the inline
sanitization (now in `remote-execution-env.ts`)
- `packages/adapter-utils/src/execution-target.ts`: apply sanitization
at the sandbox runtime boundary
- `packages/adapter-utils/src/ssh.ts`: apply sanitization at the SSH
spawn boundary
- `packages/adapters/opencode-local/src/server/test.ts`: drop
now-redundant callsite filtering
- `packages/adapters/pi-local/src/server/test.ts`: drop now-redundant
callsite filtering
- New tests `execution-target.test.ts` and
`execution-target-sandbox.test.ts` cover the sanitizer flow at both
transports, including positive cases (host-shaped path stripped) and
explicit-override preservation
## Verification
- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils
--project @paperclipai/adapter-opencode-local --project
@paperclipai/adapter-pi-local`
- `pnpm typecheck` clean
## Risks
Low–medium. The sanitization is now applied at one layer (boundary)
instead of N (callsites), so behavior is more consistent. Any adapter
that previously relied on a leaked host var landing on the remote shell
would now see it stripped — but those reliances were what this change
exists to fix.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests at both
transports
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - When an agent runs in an e2b sandbox or other non-managed
environment, it talks back to the Paperclip server through a per-lease
callback bridge that proxies HTTP requests
> - The bridge has an allowlist of method/path patterns it will forward;
anything outside the list is rejected to keep the bridge tight
> - The allowlist had drifted behind what the heartbeat documentation
describes as the supported callback surface — several documented
endpoints (issue updates, agent-side log emit, work-status writes) were
being rejected at the bridge
> - This pull request expands the allowlist to cover the documented
heartbeat surface and adds tests that pin every newly-allowed pattern,
so the doc and the bridge stay in sync
> - The benefit is sandboxed runs no longer hit "method not allowed" /
"path not allowed" rejections on the documented set of callbacks
## What Changed
- `packages/adapter-utils/src/sandbox-callback-bridge.ts`: expand the
method/path allowlist to match the documented heartbeat callback surface
- `packages/adapter-utils/src/sandbox-callback-bridge.test.ts`: add
coverage for every newly-allowed pattern, plus negative cases for
patterns that should still be rejected
## Verification
- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils`
- `pnpm typecheck` clean
- Manual: previously-rejected callbacks from sandboxed runs now succeed
end-to-end
## Risks
Low. The allowlist only grows; nothing previously allowed is now
blocked. Tests pin both the new allowed patterns and that out-of-doc
patterns stay rejected.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests cover
added patterns + still-rejected negatives
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The agent live-run route lets operators trigger a manual heartbeat
invocation so an agent can pick up a specific issue or step out of band
> - The current route flow drops the caller's scope (issue/run context)
when forwarding the manual invoke into the heartbeat service, so the
resulting run loses the targeting the operator specified
> - This pull request threads the operator-supplied scope through the
manual invoke path on both the server route and the UI client, with a
regression test that confirms the scope round-trips
> - The benefit is manual heartbeat invokes from the live-run UI
actually pick up the scoped issue/run instead of falling through to the
agent's default routine
## What Changed
- `server/src/routes/agents.ts`: forward the operator-supplied scope
into the manual invoke heartbeat service call
- `server/src/__tests__/agent-live-run-routes.test.ts`: new test
verifying the manual invoke path preserves scope
- `ui/src/api/agents.ts`: pass scope through the live-run client API
## Verification
- `pnpm vitest run --no-coverage
server/src/__tests__/agent-live-run-routes.test.ts`
- `pnpm typecheck` clean
## Risks
Low. The change is purely additive on the route surface — handlers that
did not previously pass scope continue to work; handlers that did pass
it now have it preserved instead of dropped.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new test covers
the preserved-scope path
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (internal API change, no visible UI shift)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The Gemini adapter's environment Test surfaces a hello probe so
operators can confirm the CLI runs end-to-end on the configured target
> - On SSH and E2B sandbox targets the round-trip cost (login-shell
sourcing, network, model warm-up) routinely exceeds the existing 10s
probe timeout, so the probe spuriously fails on environments that are
actually healthy
> - This pull request raises the gemini-local hello probe timeout to
60s, matching the timeout we use for slower-bootstrapping adapters
> - The benefit is the Gemini Test action no longer reports false
negatives on remote targets that need a longer first-run window
## What Changed
- `packages/adapters/gemini-local/src/server/test.ts`: hello probe
timeout raised from 10s to 60s
## Verification
- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-gemini-local`
- Manual: SSH and E2B Gemini hello probes now complete cleanly without
spurious timeouts
## Risks
Low. A 60s ceiling on a non-blocking probe is consistent with sibling
adapters; the only behavior change is a longer worst-case wait when the
probe genuinely hangs.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — N/A (one-line
timeout change)
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip is the control plane for autonomous AI companies.
> - Routines are the scheduled/recurring work surface that keeps a
company operating without manual kicks.
> - Operators need routine edits to be auditable and recoverable,
especially when routines control assignments, prompts, triggers, and
webhook secrets.
> - Documents already have revision-style safety, but routines did not
have equivalent history or restore semantics.
> - This pull request adds append-only routine revisions across the
database, shared contracts, server routes, and board UI.
> - The benefit is safer routine iteration: users can inspect history,
compare changes, restore older definitions, and avoid overwriting newer
edits.
## What Changed
- Added `routine_revisions` storage, latest revision pointers on
routines, shared types, validators, and API docs for routine revision
history.
- Added server service/route support for listing routine revisions,
conflict-aware routine saves, and append-only restore operations.
- Added a History tab on routine detail with revision preview,
structured change summaries, description line diffs, dirty-edit
blocking, restore confirmation, and restored webhook secret surfacing.
- Extracted the line diff helper from `DocumentDiffModal` into
`ui/src/lib/line-diff.ts` for reuse.
- Rebased the branch onto current `public-gh/master` and renumbered the
routine revision migration to `0077_unusual_karnak` after upstream
`0076_useful_elektra`.
- Made the `0077` routine revision migration idempotent so installs that
already applied the branch-local `0076_unusual_karnak` can safely
advance.
- Updated the plugin SDK test harness routine fixture with the new
revision fields required by the shared `Routine` contract.
## Verification
- `pnpm --filter @paperclipai/db run check:migrations` passed.
- `pnpm exec vitest run --project @paperclipai/shared
packages/shared/src/validators/routine.test.ts` passed.
- `pnpm exec vitest run --project @paperclipai/ui
ui/src/lib/line-diff.test.ts
ui/src/components/RoutineHistoryTab.test.tsx
ui/src/lib/workspace-routines.test.ts ui/src/pages/Routines.test.tsx`
passed.
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/routines-service.test.ts --pool=forks
--poolOptions.forks.isolate=true` passed.
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/routines-routes.test.ts --pool=forks
--poolOptions.forks.isolate=true` passed.
- `pnpm --filter @paperclipai/plugin-sdk typecheck` passed after
updating the SDK test harness fixture.
- `pnpm --filter @paperclipai/plugin-sdk build` passed; this refreshed
local generated SDK output needed by plugin example typechecks.
- `pnpm -r typecheck` passed.
## Risks
- Medium migration risk: this adds routine revision storage and
backfills existing routines. The migration is ordered after upstream
`0076` and uses `IF NOT EXISTS` / duplicate-object guards to tolerate
earlier branch-local migration application.
- Restore behavior intentionally appends a new revision instead of
mutating history; callers expecting an in-place rollback need to follow
the new latest revision pointer.
- Restoring webhook triggers recreates webhook secret material, so users
must copy newly surfaced secrets after restore.
- Conflict-aware saves now reject stale routine edits when the client
sends an older `baseRevisionId`.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5-based coding agent, with shell/tool use in a local
git worktree. Exact context-window size is not exposed in this runtime.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
Screenshots: not attached in this draft PR; the new UI flow is covered
by component tests listed above.
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
> **Stacked PR.** Sits on top of the e2b sandbox chain — #5278 (stdin
staging) and #5279 (honest-resolvability + login-profiles). The
cumulative diff against `master` includes both of those PRs' content;
the files touched by *this* PR's commit are the new
`maybeRunSandboxInstallCommand` helper in
`packages/adapter-utils/src/execution-target.ts` and the per-adapter
`index.ts`/`server/test.ts`/`server/execute.ts` wiring under
`packages/adapters/{claude,codex,cursor,gemini,opencode,pi}-local/`. The
honest resolvability check from #5279 is what gives this PR's install
command a meaningful "did it actually land on PATH" follow-up.
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Sandbox execution targets are ephemeral — each fresh lease starts
from a template image that may or may not have the agent CLIs
preinstalled
> - When a CLI isn't preinstalled, the resolvability probe fails at
`command -v` and the hello probe never runs
> - There's no shared mechanism for "before you probe or provision,
install the CLI on this sandbox"
> - This pull request adds a `SANDBOX_INSTALL_COMMAND` constant per
adapter and a `maybeRunSandboxInstallCommand` helper that runs it via
the existing sandbox login shell, captures structured output, and never
throws (so the resolvability + hello probe still run after); each
adapter's `test()` and `execute()` share the constant so the two
callsites can't drift
> - The benefit is a fresh sandbox lease without a preinstalled CLI now
installs it once via `sh -lc` before the resolvability probe and before
managed-runtime provisioning, with a uniform
`<adapter>_install_command_run` check on the test report
## What Changed
- `packages/adapter-utils/src/execution-target.ts`: add
`AdapterSandboxInstallCommandCheck` and `maybeRunSandboxInstallCommand`
(runs the install via existing sandbox shell, captures
exit/stdout/stderr, returns a structured info/warn check, never throws)
- Add `SANDBOX_INSTALL_COMMAND` to each adapter's `index.ts` so `test()`
and `execute()` share a single source of truth
- Wire each of the 6 affected adapter `testEnvironment()`s to call
`maybeRunSandboxInstallCommand` before
`ensureAdapterExecutionTargetCommandResolvable`
- Pass `installCommand: SANDBOX_INSTALL_COMMAND` through
`prepareAdapterExecutionTargetRuntime` in each adapter's `execute()`
- Per-adapter install commands use npm globals where possible so
binaries land on a PATH segment the template already exports:
- claude → `npm install -g @anthropic-ai/claude-code`
- codex → `npm install -g @openai/codex`
- cursor → `curl https://cursor.com/install -fsS | bash`
- gemini → `npm install -g @google/gemini-cli`
- opencode → `npm install -g opencode-ai`
- pi → `npm install -g @mariozechner/pi-coding-agent`
SSH and local targets ignore `installCommand` (SSH runtime takes no such
param; local short-circuits before runtime prep), so this is a no-op for
non-sandbox environments.
## Verification
- `pnpm typecheck` clean
- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils`
and per-adapter projects pass
- Manual sandbox matrix (claude, codex, cursor, gemini, opencode, pi) —
each goes `install_command_run → resolvable → hello_probe_passed` (Codex
and Pi land on `hello_probe_auth_required`, which is the
configured-credentials problem, not an install issue)
- SSH no-regression: SSH Claude still passes; the helper short-circuits
on non-sandbox targets
## Risks
Medium — adds a network/CPU cost (npm install / curl) on every fresh
sandbox lease. Cost is bounded (one-time per lease, typically tens of
seconds for npm globals), and the helper never throws so a failing
install still lets the report run resolvability and hello probes. If a
sandbox image already has the CLI, the install is an idempotent
reinstall.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
> **Stacked PR.** Sits on top of #5278 (`e2b/stage-stdin-to-temp-file`)
which ships the stdin-staging fix this builds on. The cumulative diff
against `master` includes that PR's content; the files touched by *this*
PR's commit are `packages/adapter-utils/src/execution-target.ts`,
`packages/plugins/sandbox-providers/e2b/src/plugin.ts`, and
`packages/plugins/sandbox-providers/e2b/src/plugin.test.ts`.
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The adapter Test flow does an "is the command resolvable?" probe
before running the hello probe so the report distinguishes "binary not
installed" from "binary errored"
> - For sandbox targets, that resolvability check was a no-op
early-return — every sandboxed adapter test reported "Command is
executable" regardless of whether the binary existed
> - That made the resolvability check disagree with the hello probe in a
way that looked like a PATH bug, when it was actually a missing CLI
> - Separately, the e2b spawn used `sandbox.commands.run` with a
non-login non-interactive shell whose PATH did not include npm-globals,
nvm shims, or anything else the template installs via
`.profile`/`.bashrc`
> - This pull request makes the resolvability check honest by running a
real `command -v` invocation through the sandbox runner, and aligns the
e2b spawn with SSH by sourcing login profiles before `exec env KEY=val
<cmd>`
> - The benefit is the e2b sandbox spawn agrees with the hello probe and
finds CLIs at template-installed paths
## What Changed
- `packages/adapter-utils/src/execution-target.ts`: add
`ensureSandboxCommandResolvable` that runs `command -v <cli>` through
the sandbox runner; replace the early-return in
`ensureAdapterExecutionTargetCommandResolvable` for sandbox targets
- `packages/plugins/sandbox-providers/e2b/src/plugin.ts`: replace
`buildCommandLine` with `buildLoginShellScript` (sources `/etc/profile`,
`~/.profile`, `~/.bash_profile`, `~/.bashrc`, `~/.zprofile`, and nvm.sh
before `exec env KEY=val <cmd>`); env vars are interpolated inline so
user-configured adapter env always wins over profile-exported values;
drop the now-unused `envs:` SDK option
- `plugin.test.ts` updated for the login-shell wrapping
## Verification
- `pnpm vitest run --no-coverage --project @paperclipai/sandbox-e2b` —
17/17 plugin tests pass
- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils`
clean
- `pnpm typecheck` clean
- Manual: previously every sandboxed adapter said "Command is
executable" then the hello probe failed with "exec: not found". After
this change, missing CLIs surface honestly at the resolvability step.
SSH no-regression: SSH Claude probe still passes.
## Risks
Medium — sandbox adapter Test reports will start failing at the
resolvability step for environments where the CLI was never actually
installed. This was always the real state; the previous "Command is
executable" message was incorrect. Operators should expect
previously-green-but-broken sandbox environments to report accurately.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — `plugin.test.ts`
updated for the login-shell wrapping
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The e2b sandbox provider implements `onEnvironmentExecute` so
adapters can spawn CLIs in an e2b sandbox
> - For commands that need stdin (e.g. piping a hello prompt to a CLI),
the previous implementation awaited a foreground `commands.run({ stdin:
true, ... })` and then tried to call `sendStdin(pid)` on the now-dead
PID
> - That call resolves only after the process exits, so stdin was never
delivered and e2b raised "process not found"
> - This pull request stages stdin to `/tmp/paperclip-stdin-<uuid>`
inside the sandbox and shell-redirects it (`exec '<cmd>' '<args>' <
'<file>'`), making the command synchronous regardless of whether stdin
is supplied
> - The benefit is adapter Test probes that pipe a hello prompt to a CLI
inside an e2b sandbox now actually deliver the prompt
## What Changed
- `packages/plugins/sandbox-providers/e2b/src/plugin.ts`: replace the
broken async `commands.run` + `sendStdin` flow with stdin-staging to a
sandbox temp file and shell-redirection
- Staged file is removed in a `finally` block; write failures propagate
after best-effort cleanup
## Verification
- `pnpm vitest run --no-coverage --project @paperclipai/sandbox-e2b` —
all 17 unit tests pass
- `pnpm typecheck` clean
- Manual: a sandboxed adapter Test probe that pipes a hello prompt now
receives the prompt
## Risks
Low risk — `plugin.test.ts` already encodes the temp-file design; the
change brings the implementation in line with the test.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — existing tests
already encode the new design
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - When a user clicks "Test" on a configured environment (SSH or
sandbox), the agent-test route exercises the adapter against that target
> - The route previously fell back to running the probe on the Paperclip
host whenever an explicit environment target couldn't be resolved, with
the test report still saying "passed"
> - That hid two real failure modes: misconfigured environments looked
green, and sandbox environments were never actually exercised
> - This pull request acquires an ad-hoc lease and realizes a workspace
for sandbox/plugin test environments, resolves a sandbox execution
target wired to the environment runtime, and returns synthesized
diagnostics instead of running a host probe when an explicit env target
can't be resolved
> - The benefit is the Test action surfaces the real environment state
and never silently exercises the wrong machine
## What Changed
- `server/routes/agents.ts`: acquire an ad-hoc lease and realize a
workspace for sandbox/plugin test environments; resolve a sandbox
execution target wired to the environment runtime
- Return synthesized diagnostics (no host fallback) when an explicit env
target can't be resolved
- `server/services/environment-runtime.ts`: small adjustments to support
the explicit-env-target case
- Clarify test-route messages so they no longer claim a host fallback in
explicit env flows
- New `agent-test-environment-routes.test.ts` covers the guard and
missing-environment path
## Verification
- `pnpm vitest run --no-coverage
server/src/__tests__/agent-test-environment-routes.test.ts`
- `pnpm typecheck` clean
- Manual: a deliberately misconfigured sandbox environment now reports
diagnostics instead of a misleading host-pass
## Risks
Medium — Test route behavior change. Explicit environments that
previously appeared to pass via host fallback will now report their real
state. This is the desired behavior, but operators should expect to see
new failures for environments that were never actually working.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests cover
guard + missing-env paths
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The Codex adapter spawns the OpenAI Codex CLI to drive the model
> - Codex CLI 0.122 changed how it reads credentials: it ignores
`OPENAI_API_KEY` from the environment and reads only
`$CODEX_HOME/auth.json`
> - Without auth.json, Codex 0.122+ returns 401 "Missing bearer or basic
authentication" on `/v1/responses` even when `OPENAI_API_KEY` is
forwarded into the sandbox or remote shell
> - This pull request materializes an apikey-mode `auth.json` in the
managed Codex home (or per-run for the test probe) when an
`OPENAI_API_KEY` is configured
> - The benefit is configured Codex API keys authenticate correctly with
current Codex CLI versions across local, SSH, and sandbox targets
## What Changed
- `codex-home.ts`: add `writeApiKeyAuthJson()` and let
`prepareManagedCodexHome` accept an `apiKey` override that replaces the
symlinked host auth.json with an apikey-mode file
- `execute.ts`: pass `envConfig.OPENAI_API_KEY` into
`prepareManagedCodexHome` so the managed (and synced-to-remote) Codex
home authenticates via the configured key
- `test.ts`: when `OPENAI_API_KEY` is available, wrap the hello probe
with a small shell that materializes a per-run `$CODEX_HOME/auth.json`
before exec'ing codex; key content rides through env to avoid leaking
into process listings
- Update the `codex_hello_probe_auth_required` hint to explain Codex CLI
does not read `OPENAI_API_KEY` from env
## Verification
- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-codex-local`
- `pnpm typecheck` clean
- Manual: Codex 0.122.0 with empty `CODEX_HOME` returns 401 with
env-only auth; with this change it authenticates cleanly
## Risks
Low risk — when no API key is configured, behavior is unchanged (no
auth.json written, existing chatgpt-mode flow preserved). Apikey-mode
`auth.json` is the upstream-supported format.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The Pi adapter runs the pi-coding-agent CLI against local, SSH, and
sandbox execution targets
> - The Test path's hello probe spreads the host's `process.env` into
the remote process env, including the macOS PATH
> - The leaked Mac PATH overrides the nvm-sourced PATH set up by
`buildSshSpawnTarget`, so on a Linux SSH target `node` resolves to
system Node 18 instead of nvm's Node 20+
> - pi-coding-agent v0.68 / pi-tui then crashes at
`pi-tui/dist/utils.js:27` with `SyntaxError: Invalid regular expression
flags` on the `/v` unicode-sets regex (a Node 20+ feature)
> - This pull request stops the leak — same fix as the opencode SSH
probe — by passing only user-configured adapter env to the probe when
the target is remote
> - The benefit is the Pi hello probe now passes end-to-end against an
SSH target without the Node version mismatch
## What Changed
- `packages/adapters/pi-local/src/server/test.ts` passes only the
user-configured adapter env (`normalizeEnv(env)`) to
`runAdapterExecutionTargetProcess` when the target is remote
- Local probes still get the full `runtimeEnv` so headless permission
injection keeps working
## Verification
- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-pi-local`
- `pnpm typecheck` clean
- Manual: Pi hello probe goes from `pi_hello_probe_failed` (Node 18
regex error) to `pi_hello_probe_passed` against an SSH target
## Risks
Low risk — same pattern shipped for opencode-local and consistent with
claude-local / codex-local / gemini-local.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — pattern mirrors
sibling adapters
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The OpenCode adapter runs against local, SSH, and sandbox execution
targets
> - The Test path's hello probe spreads the Paperclip host's
`process.env` into the remote process env, which over SSH gets exported
on the remote shell
> - On a Linux SSH target, `HOME=/Users/...` and a host XDG_CONFIG_HOME
pointing at a macOS `/var/folders/...` temp dir cause OpenCode to walk a
host-only path and fail with `EACCES: permission denied, mkdir '/Users'`
> - This pull request stops the leak by passing only user-configured
adapter env to the probe when the target is remote, matching the pattern
already used by claude-local, codex-local, and gemini-local
> - The benefit is the OpenCode hello probe now passes end-to-end
against an SSH target without spurious filesystem errors
## What Changed
- `prepareOpenCodeRuntimeConfig` short-circuits when the target is
remote — the host-fs temp config dir is meaningless and harmful for a
remote target
- `test.ts` passes only the user-configured adapter env (no host
`process.env` spread) to `runAdapterExecutionTargetProcess` when
`targetIsRemote`
- Local probes still get the full `runtimeEnv` so headless permission
injection keeps working
## Verification
- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-opencode-local`
- `pnpm typecheck` clean
- Manual: SSH OpenCode hello probe goes from `EACCES … mkdir '/Users'`
to `opencode_hello_probe_passed`
## Risks
Low risk — local probe behavior is unchanged; the change only narrows
the env passed to remote targets, matching the pattern already shipped
in sibling adapters.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — pattern mirrors
existing sibling tests
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent uses an adapter that drives a CLI (Claude, Gemini, Codex,
etc.)
> - The Gemini adapter parses a JSONL transcript stream the CLI emits to
learn what the model said
> - Gemini CLI v0.38 changed the transcript shape: assistant text now
comes through `type=message` with `role`/`content` and terminal status
comes through `type=status` / `type=stats`
> - The existing parser was written against the older `type=assistant` /
`type=result` shape, so post-v0.38 outputs left the parsed summary empty
and downgraded the SSH hello probe to "unexpected output"
> - This pull request updates every Gemini consumer (server parser, UI
parser, CLI formatter) to accept the v0.38 shape while keeping the
legacy shape working
> - The benefit is the Gemini adapter handles current upstream output
without losing backward compatibility, with explicit test coverage for
both shapes
## What Changed
- `packages/adapters/gemini-local/src/server/parse.ts` recognizes
`type=message` events with role/content and stops downgrading them
- `packages/adapters/gemini-local/src/ui/parse-stdout.ts` mirrors the
parser changes for the live UI transcript
- `packages/adapters/gemini-local/src/cli/format-event.ts` formats the
new event shape correctly for CLI output
- `parse.test.ts` and `parse-stdout.test.ts` add v0.38 coverage;
`gemini-local-adapter.test.ts` and `execute.remote.test.ts` switch
happy-path fixtures to the current real wire format and keep dedicated
tests for the older schema
## Verification
- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-gemini-local` — full suite passes including new
v0.38 cases and preserved legacy cases
- `pnpm typecheck` clean
## Risks
Low risk — additive event handling. Legacy event shape path is preserved
with its own tests, so existing fixtures continue to parse identically.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The plugin system is the extension boundary for optional product
capabilities
> - Rich plugins need more than a worker entrypoint: they need scoped
database storage, local project folders, managed agents/routines, host
navigation, and reusable UI components
> - The LLM Wiki work exposed those missing host surfaces while keeping
plugin code outside the core control plane
> - This pull request expands the core plugin host, SDK, server APIs,
and UI bridge so plugins can declare and use those surfaces
> - The benefit is that future plugins can integrate with Paperclip
through documented, validated contracts instead of bespoke server or UI
imports
## What Changed
- Added plugin-managed database namespaces and migration tracking,
including Drizzle schema/migration files and SQL validation for
namespace isolation.
- Added server support for plugin local folders, managed agents, managed
routines, scoped plugin APIs, and plugin operation visibility.
- Expanded shared plugin manifest/types/validators and SDK
host/testing/UI exports for richer plugin surfaces.
- Added reusable UI pieces for file trees, managed routines, resizable
sidebars, route sidebars, and plugin bridge initialization.
- Updated plugin docs and example plugins to use the expanded host and
SDK surface.
## Verification
- `pnpm install --frozen-lockfile`
- `pnpm run preflight:workspace-links && pnpm exec vitest run
packages/shared/src/validators/plugin.test.ts
server/src/__tests__/plugin-database.test.ts
server/src/__tests__/plugin-local-folders.test.ts
server/src/__tests__/plugin-managed-agents.test.ts
server/src/__tests__/plugin-managed-routines.test.ts
server/src/__tests__/plugin-orchestration-apis.test.ts
ui/src/api/plugins.test.ts ui/src/components/FileTree.test.tsx
ui/src/components/ResizableSidebarPane.test.tsx
ui/src/pages/PluginPage.test.tsx ui/src/plugins/bridge.test.ts` passed:
11 files, 67 tests.
- Confirmed this PR changes 89 files and does not include
`pnpm-lock.yaml` or `.github/workflows/*`.
## Risks
- Medium: this expands plugin host contracts across db/shared/server/ui
and includes a new core migration (`0076_useful_elektra.sql`).
- The plugin database namespace validator is intentionally restrictive;
plugin authors may need follow-up affordances for SQL patterns that
remain blocked.
- Merge this before the LLM Wiki plugin PR so the plugin can resolve the
new SDK and host APIs.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5 coding agent, tool-enabled shell/git/GitHub
workflow. Context window size was not exposed by the runtime.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Summary
- Allow Cloud tenant issue identifiers with alphanumeric prefixes, such
as `PC1897-1`, to normalize as issue references.
- Resolve those identifiers through issue detail/update routes, active
run/live run polling, activity, costs, and `issueService.getById`.
- Keep UI issue-link parsing aligned so tenant links normalize back to
`/issues/<IDENTIFIER>`.
## Root Cause
Cloud tenant issue prefixes include digits from the stack-id hash. The
app-side route normalization still accepted only all-letter prefixes, so
`/api/issues/PC1897-1` skipped identifier lookup and fell through as a
non-UUID id.
## Verification
- `pnpm exec vitest run packages/shared/src/issue-references.test.ts
ui/src/lib/issue-reference.test.ts
server/src/__tests__/issue-identifier-routes.test.ts
server/src/__tests__/activity-routes.test.ts
server/src/__tests__/costs-service.test.ts
server/src/__tests__/agent-live-run-routes.test.ts
server/src/__tests__/issues-service.test.ts`
- `pnpm --filter @paperclipai/shared typecheck && pnpm --filter
@paperclipai/server typecheck`
- `git diff --check`
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip orchestrates AI agents for autonomous companies, so
developer throughput on the control plane repo directly affects how fast
the product can evolve.
> - The PR workflow is part of that throughput surface because every
change waits on it before review and merge.
> - This branch started from measured evidence that the PR critical path
was dominated by work that was either serialized unnecessarily or placed
on the wrong part of the graph.
> - The biggest concrete problems were: the canary dry run living inside
`verify`, the server isolated suites running one-by-one in a single
lane, and duplicate CI work that the PR path was paying for without
increasing coverage proportionally.
> - This pull request restructures the PR workflow so those costs are
reduced without removing the important coverage that was already
protecting release and test quality.
> - Follow-up fixes on the branch hardened the new entrypoints so they
work on clean GitHub runners and so the reduced PR typecheck path stays
self-maintaining as workspace packages evolve.
> - The benefit is materially faster PR wall-clock time while keeping
canary packaging checks, serialized-suite isolation, plugin SDK
consumers, and explicit TypeScript coverage where builds do not already
provide it.
## What Changed
- Moved the PR canary dry run into its own `Canary Dry Run` job so it
still runs on PRs but no longer extends the `verify` critical path.
- Split the custom Vitest runner into `general`, `serialized`, and `all`
modes, and added shard support for the isolated server suites.
- Added `test:run:general` and `test:run:serialized` scripts, then
rewired PR CI to fan the serialized server suites out across a 4-way
matrix.
- Added the required `@paperclipai/plugin-sdk` build preflight before
the new reduced-scope typecheck and test entrypoints so they succeed on
clean CI runners.
- Replaced the hardcoded PR build-gap list with
`scripts/run-typecheck-build-gaps.mjs`, which discovers workspace
packages whose `build` scripts skip TypeScript and runs only their
explicit `typecheck` scripts.
- Removed the redundant `pnpm build` from the PR `e2e` job because the
Playwright onboarding path boots Paperclip from source.
## Verification
- `ruby -e "require 'yaml'; YAML.load_file('.github/workflows/pr.yml');
puts 'workflow ok'"`
- `node scripts/run-vitest-stable.mjs --mode general --dry-run`
- `node scripts/run-vitest-stable.mjs --mode serialized --shard-index 0
--shard-count 4 --dry-run`
- `pnpm run typecheck:build-gaps`
- `pnpm test:run:general`
- `pnpm test:run:serialized -- --shard-index 0 --shard-count 4`
- `pnpm build`
- `pnpm paperclipai onboard --yes --run`
- `curl http://127.0.0.1:3299/api/health`
## Risks
- Branch protection or required-check configuration may need to be
updated for the new standalone `Canary Dry Run` job and the
serialized-suite matrix job names.
- `scripts/run-typecheck-build-gaps.mjs` assumes packages that need
explicit PR-time typechecking are the ones whose `build` scripts omit
`tsc`; if build conventions change, that heuristic needs to stay
aligned.
- Serialized test sharding preserves per-suite isolation, but the first
few CI runs should still be watched for shard-balance or naming
assumptions in downstream tooling.
## Model Used
- OpenAI GPT-5.4 via the Codex local adapter, using high reasoning
effort with shell, git, and file-edit tool use in a local worktree.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
## Thinking Path
> - Paperclip is a control plane for autonomous agent companies, so its
release automation is part of the core operator trust boundary.
> - The affected subsystem is npm/GitHub Actions release publishing for
the public monorepo packages.
> - The concrete failure was that a newly added package reached
`master`, the canary workflow attempted its first publish, and npm
trusted publishing was not yet bootstrapped for that package.
> - That means the problem is not just one broken run; it is a missing
pre-merge guard that lets release-ineligible packages land and only fail
once `publish_canary` runs.
> - This pull request makes release enrollment explicit, validates that
enrollment in CI, and adds a PR-time bootstrap check against npm for
changed release-enabled package manifests.
> - The result is that we keep trusted publishing, avoid teaching CI to
`npm adduser`, and move this class of failure from post-merge canary
time to pre-merge review time.
## What Changed
- Added `scripts/release-package-manifest.json` so release-managed
public packages are explicitly enrolled instead of being inferred from
every non-private workspace package.
- Hardened `scripts/release-package-map.mjs` to validate the manifest
before release workflows rewrite versions or assemble publish payloads.
- Added `scripts/check-release-package-bootstrap.mjs` and wired it into
`.github/workflows/pr.yml` so PRs that change a release-enabled package
manifest fail if that package does not already exist on npm.
- Added release-package manifest coverage tests to
`scripts/release-package-map.test.mjs` and included them in `pnpm run
test:release-registry`.
- Wired manifest validation into `.github/workflows/release.yml` and
documented the first-publish bootstrap policy in `doc/PUBLISHING.md` and
`doc/RELEASE-AUTOMATION-SETUP.md`.
## Verification
- `pnpm run test:release-registry`
- `./scripts/release.sh canary --skip-verify --dry-run`
- Confirmed the committed diff contains no obvious PII/secrets via
targeted pattern scan before pushing.
## Risks
- Low risk overall: this is CI/release-policy code, not product runtime
logic.
- The new PR bootstrap check depends on npm metadata availability, so a
transient npm outage could block a PR that changes a release-enabled
package manifest.
- The manifest introduces a new source of truth that must stay aligned
with public package additions, but that is intentional and now enforced.
## Model Used
- OpenAI Codex via the `codex_local` Paperclip adapter; GPT-5-based
coding agent with tool use, terminal execution, git, and GitHub CLI.
Exact served model ID/context window are not exposed by the local
runtime.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies, including
agents
> running the Gemini CLI (`gemini-local` adapter)
> - The Gemini CLI emits a JSONL event stream during a run that the
adapter
> parses to extract the assistant's response text, tool results, and
usage
> - Recent versions of the Gemini CLI emit assistant responses as
> `{ "type": "message", "role": "assistant", "content": ... }` events in
> addition to the previously-handled event shapes
> - The parser was not handling the new event type, so the assistant's
actual
> response text was being silently dropped from parsed output. Callers
ended
> up with empty assistant messages even when Gemini had successfully
> responded
> - This PR teaches the parser to recognize `{type: "message", role:
> "assistant"}` events and extract their content text via the same
> `collectMessageText` helper used for other message-shaped events
> - The benefit is that Gemini runs surface the assistant's real
response in
> downstream consumers (issue comments, run logs, downstream agent
context)
> instead of vanishing
## What Changed
- `packages/adapters/gemini-local/src/server/parse.ts`: in
`parseGeminiJsonl(...)`, add a branch for `event.type === "message"`
with
`role === "assistant"` that calls
`messages.push(...collectMessageText(event.content))`.
- `packages/adapters/gemini-local/src/server/parse.test.ts`: ~19 lines
of
coverage for the new branch.
## Verification
- `pnpm --filter @paperclipai/adapter-gemini-local test -- parse`
- Manual QA: run a Gemini agent on an issue, confirm the assistant's
response
appears as the issue comment / run output. Before this fix the comment
was
empty even when the run completed successfully.
## Risks
- Tightly scoped: 8 lines of production code in one parser branch. No
effect
on existing event shapes or other adapters.
- If the Gemini CLI changes its event schema again, this branch may need
to be
revisited — but adding it is strictly additive over current behaviour.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Agents executing on remote SSH hosts receive an env map built from
the host
> process's env plus per-run additions like `PAPERCLIP_API_KEY`,
> `PAPERCLIP_RUN_ID`, etc.
> - The env map currently includes inherited host vars by default,
including
> identity-bound ones like `PATH`, `HOME`, `USER`, `NVM_DIR`, `XDG_*` —
> variables whose values are meaningful only on the host they came from
> - Sending the host's `PATH` (containing host-only directories like a
local
> nvm install path) to a remote SSH box overrides the remote's actual
`PATH`
> and breaks command resolution. Same hazard for `HOME` (commands
looking for
> config files end up in a non-existent dir), `USER` (writes go to the
wrong
> path), etc.
> - This PR adds `sanitizeSshRemoteEnv()` that drops inherited
identity-bound
> vars when their value matches the host process's value. Explicitly-set
> values pass through untouched, so callers that genuinely want to
override
> remote `PATH` etc. still can — but accidental leakage from
> `process.env` is filtered.
> - The benefit is that SSH remote execution stops corrupting the remote
> shell's environment with host-shaped paths, so commands resolve
correctly
> against the remote PATH and config files land in the remote `HOME`
## What Changed
- New `sanitizeSshRemoteEnv(env, inheritedEnv = process.env)` in
`packages/adapter-utils/src/server-utils.ts`. The identity-bound key set
is:
- `PATH`, `HOME`, `PWD`, `SHELL`, `USER`, `LOGNAME`
- `NVM_DIR`, `TMPDIR`, `TMP`, `TEMP`
- `XDG_CONFIG_HOME`, `XDG_CACHE_HOME`, `XDG_DATA_HOME`,
`XDG_STATE_HOME`, `XDG_RUNTIME_DIR`
For any key in this set, the entry is dropped iff the env value equals
the
inherited (host process) value. Other keys pass through unchanged.
- `readEnvValueCaseInsensitive(...)` helper handles Windows-style
case-insensitive env var lookups.
- Wired into `resolveSpawnTarget(...)` for the SSH transport. Sandbox
and local
paths are unaffected.
- Tests added in `server-utils.test.ts` (~50 lines) covering: matching
keys
filtered, mismatched keys preserved, non-identity keys passed through,
case
insensitivity.
## Verification
- `pnpm --filter @paperclipai/adapter-utils test -- server-utils`
- Manual QA: run any adapter against an SSH-backed environment, confirm
remote command resolution works (e.g. `node`, `npm`, the adapter's CLI)
and
config files land in the remote user's `HOME`. Compare to the prior
behaviour
by transiently re-introducing the inherited `PATH` and watching commands
fail with `command not found`.
## Risks
- Behavioural shift: SSH remote execution previously passed inherited
host env
vars verbatim. Code that relied on that (e.g. a remote command somehow
expecting the host's `PATH`) will see different behaviour. None of the
adapter code in this repo has such a dependency.
- Edge case: if a caller explicitly sets `PATH` to the same value as the
host's
`PATH` (literally — same exact string), the sanitizer drops it as a
leak.
In practice no caller constructs the env this way.
- Windows host: case-insensitive lookup handles `Path` vs `PATH`
correctly.
Tested.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies, running
adapter
> commands like `claude`, `codex`, `pi` either locally or on remote
runtimes
> (SSH hosts, sandboxes, etc.)
> - On a fresh remote runtime — particularly an ephemeral sandbox — the
> adapter's CLI may not be installed yet. Today operators handle this
via
> external configuration (e.g. a project-level `provisionCommand` shell
> script) that has to know about every adapter the operator might want
to use
> - This means every adapter has its own well-known npm package, but
operators
> end up writing duplicate provision shell scripts that paste together
> `npm install -g @anthropic-ai/claude-code`, `npm install -g
@openai/codex`,
> etc. — knowledge the adapter itself already has
> - This PR moves that knowledge into the adapter modules: each adapter
declares
> how its runtime command should be detected and (if applicable)
installed
> via `getRuntimeCommandSpec(config)`. The execution path runs the
adapter's
> own install command on remote sandbox targets before launching, so a
fresh
> sandbox bootstraps itself instead of requiring a hand-written
provision script
> - The benefit is fewer footguns for operators provisioning remote
runtimes,
> and a clean place for new adapters to plug in their install recipe
## What Changed
- New types in `packages/adapter-utils/src/types.ts`:
- `AdapterRuntimeCommandSpec` describing `command`, optional
`detectCommand`, and optional `installCommand`
- Optional `getRuntimeCommandSpec(config)` on `ServerAdapterModule`
- Optional `runtimeCommandSpec` on `AdapterExecutionContext` so adapters
receive the resolved spec at execute time
- New helper `ensureAdapterExecutionTargetRuntimeCommandInstalled(...)`
in
`packages/adapter-utils/src/execution-target.ts` that runs the install
command
on remote targets when `transport === "sandbox"`. SSH and local targets
are
no-ops. Throws on timeout or non-zero exit so failures surface early.
- Each of `claude-local`, `codex-local`, `cursor-local`, `gemini-local`,
`opencode-local`, `pi-local`'s `execute.ts` now reads
`ctx.runtimeCommandSpec?.installCommand` and calls the helper before
launching
the adapter command.
- `server/src/adapters/registry.ts` declares `getRuntimeCommandSpec` for
each
adapter:
- claude/codex/gemini/opencode/pi-local: `npm install -g <package>`
recipe via
a shared `buildNpmRuntimeCommandSpec` helper, with a defensive guard
that
only auto-installs when the configured `command` matches the well-known
fallback (custom binaries are left alone).
- cursor-local: declares `command` only; no auto-install (no public npm
package), preserving the existing manual setup.
- `server/src/services/heartbeat.ts` resolves the spec via
`adapter.getRuntimeCommandSpec?.(runtimeConfig)` and passes it through
to
`AdapterExecutionContext`.
- Tests added in `execution-target.test.ts` (~75 lines), e2b
`plugin.test.ts` (~32 lines), and `environment-run-orchestrator.test.ts`
(~76 lines).
## Verification
- `pnpm --filter @paperclipai/adapter-utils test`
- `pnpm --filter @paperclipai/server test --
environment-run-orchestrator`
- `pnpm --filter @paperclipai/sandbox-providers-e2b test`
- Manual QA: run an adapter (claude/codex/etc.) against a fresh
sandbox-backed
environment that does NOT have the adapter CLI pre-installed. Confirm
the
install runs once at the start of the agent run and the adapter then
launches
successfully. Re-run on the same sandbox; confirm the install command is
idempotent and the second run starts faster.
- Confirm SSH and local execution paths are unaffected (gated by
`transport === "sandbox"`).
## Risks
- Behavioural shift on sandbox runs: a new install step now runs at the
start
of every sandbox agent run for adapters with `installCommand` set. The
install commands are idempotent (`if ! command -v X >/dev/null 2>&1;
then
npm install -g <pkg>; fi`), so this is fast on warm sandboxes. On a cold
sandbox, the first run takes longer.
- Operators who used the legacy project-level `provisionCommand` to
install
adapter CLIs can drop that part of their script; the adapter handles it
now.
Existing scripts continue to work — installs are idempotent.
- The cursor-local adapter has no auto-install (no public npm package).
Behaviour for cursor-local on sandboxes is unchanged.
- New optional surface on `ServerAdapterModule`. Plugins that don't
implement
`getRuntimeCommandSpec` retain previous behaviour (no auto-install).
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Agents on remote sandboxes call back into the Paperclip control
plane via a
> callback bridge — the host process running the bridge proxies HTTP
requests
> from the sandbox to the Paperclip API
> - When something goes wrong end-to-end (sandbox can't reach Paperclip,
requests
> timing out, malformed responses), it's hard to tell whether the bridge
> processed the request, what URL/method it used, and what the upstream
> responded with
> - There was no built-in way to log bridge proxy traffic without
modifying
> adapter code or attaching a debugger
> - This PR adds opt-in stdout logging of every bridge proxy request and
response
> (method, path, query, status), gated behind `PAPERCLIP_BRIDGE_DEBUG`
so it
> stays off by default
> - The benefit is that operators can flip a single env var to get full
visibility
> into bridge traffic when debugging remote runs, without changing code
## What Changed
- `packages/adapter-utils/src/execution-target.ts`:
`startAdapterExecutionTargetPaperclipBridge`'s `handleRequest` now logs
each
proxied request and response when `PAPERCLIP_BRIDGE_DEBUG` is truthy:
- `[paperclip] Bridge proxy <METHOD> <path>?<query>` before fetch
- `[paperclip] Bridge proxy response <status> for <METHOD>
<path>?<query>` after
- Logging is no-op when the env var is unset/`"0"`/`"false"`.
## Verification
- Set `PAPERCLIP_BRIDGE_DEBUG=1` in the host process env, run an agent
against
a sandbox-backed environment, confirm the bridge log lines appear in
stdout.
- Unset the env var and confirm no extra log lines appear during normal
runs.
## Risks
- Off-by-default, no observable change for shipping users.
- When enabled, the logging is verbose — every API call from the sandbox
produces 2 stdout lines. Operators should only enable it during active
debugging.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [ ] I have added or updated tests where applicable — covered by
exercising the
flag in dev; the underlying handleRequest behavior is unchanged
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Agents run inside execution workspaces (a per-issue cwd + env), and
an issue
> can prefer to reuse an existing workspace or get a fresh one each time
> - The heartbeat service was reading the existing workspace's config to
derive
> environment selection regardless of whether the issue actually wanted
to reuse
> it. So fresh-run issues were inheriting stale config from a workspace
that was
> about to be discarded
> - Separately, when an issue is assigned to an agent, the issue's
execution
> workspace settings weren't picking up the agent's
`defaultEnvironmentId`,
> even though the agent's choice is the natural default for that issue
> - This PR makes both selection paths honor the obvious source of
truth:
> workspace config flows only when the issue actually wants
`reuse_existing`,
> and the assignee agent's default environment is applied at assignment
time if
> nothing else is set on the issue
> - The benefit is that re-running a flaky issue picks up the right
environment
> instead of inheriting the previous run's config, and assigning an
agent to an
> issue does the obvious thing without operator intervention
## What Changed
- `server/src/services/heartbeat.ts`: introduce
`reusableExecutionWorkspaceConfig`
that is non-null only when `shouldReuseExisting` is true. Both
`resolveExecutionWorkspaceEnvironmentId(...)` and
`applyPersistedExecutionWorkspaceConfig(...)` now read from it instead
of
unconditionally consulting `existingExecutionWorkspace?.config`.
Fresh-run
issues no longer inherit stale environment config from an in-flight
workspace
about to be discarded.
- `server/src/services/issues.ts`: when an issue update sets a new
`assigneeAgentId` and isolated workspaces are enabled, populate
`executionWorkspaceSettings.environmentId` from the assignee agent's
`defaultEnvironmentId` if the issue doesn't have an explicit
`environmentId` set yet.
- Tests added in `heartbeat-plugin-environment.test.ts` (~216 lines) and
`issues-service.test.ts` (~85 lines) covering both paths.
## Verification
- `pnpm --filter @paperclipai/server test --
heartbeat-plugin-environment issues-service`
- Manual QA: assign an issue to an agent that has a non-default
`defaultEnvironmentId`, confirm the issue's workspace settings now
include that
environment id without operator intervention. Trigger a rerun on an
issue
whose existing workspace points at a stale environment, confirm the
rerun uses
the freshly-resolved environment.
## Risks
- Behavioural shift on assignment: previously assigning an agent didn't
propagate the agent's default environment to the issue. Now it does.
Callers
that explicitly want the issue to keep its existing/null environment
must set
`executionWorkspaceSettings.environmentId` themselves; the new logic
only
fires when no explicit value is set.
- Behavioural shift on rerun: stale workspace config is no longer
applied to
fresh runs. Operators who relied on this implicit inheritance may see
different environment selection on the first rerun after deploy.
Mitigation:
the explicit isssue settings and project policy are still honored as
before.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A (no UI changes)
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
> **Stacked PR (part 7 of 7).** Depends on:
- PR #5114
- PR #5115
- PR #5116
- PR #5117
- PR #5118
- PR #5119
> Diff against `master` includes commits from earlier PRs in the stack —
the new commit in this PR is the topmost one.
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The Pi adapter persists a session jsonl per agent so subsequent runs
resume
> conversation context instead of starting cold
> - SSH testing reproduced a real failure: a verification issue reached
terminal
> `done` and the agent claimed success, but the proof artifact
> `manual-qa/environment-matrix/ssh/pi_local.md` was missing from the
realized
> SSH workspace on the QA target box
> - Root cause: the saved session header recorded a different cwd than
the new
> execution cwd, but the resume eligibility check only compared
session-params
> cwd via local-style `path.resolve` (which doesn't roundtrip on remote
POSIX
> paths). The stale session got resumed and writes landed in the wrong
cwd
> - This PR tightens resume eligibility for remote targets: it adds
remote-aware
> cwd normalisation, reads the first line of the session jsonl over SSH
(`head
> -n 1`) to verify the saved header cwd, and only resumes when both
> session-params cwd *and* the on-disk header cwd match the realised
execution
> cwd. Stale sessions are skipped silently and the run starts cold
> - The benefit is that Pi runs across cwd-changing environments stop
> accidentally resuming each other's sessions, and proof artifacts land
where
> reviewers expect them
## What Changed
- Added `normalizeExecutionCwd`, `executionCwdsMatch`,
`readSessionHeaderCwd`,
and `readSavedSessionCwd` helpers in `pi-local/src/server/execute.ts`
- `readSavedSessionCwd` reads the first line of the session jsonl —
locally via
`fs.readFile`, remotely via `runAdapterExecutionTargetShellCommand`
(`head -n 1`)
- Resume eligibility now requires:
1. Saved session id is non-empty
2. Execution target shape matches (existing check)
3. Session-params cwd matches the realised execution cwd
4. Session-header cwd (from the on-disk jsonl) matches the realised
execution cwd
- Stale sessions are skipped silently (run starts cold) instead of
resumed
- `execute.remote.test.ts` extended with: matching header → resume;
mismatched
header → start fresh; missing/unreadable header → start fresh; remote
head
command failure → start fresh
## Verification
- `pnpm --filter @paperclipai/adapter-pi-local test`
- `pnpm test -- pi-local`
- Manual QA: ran a Pi agent twice in two different remote cwds,
confirmed
the second run did not pick up the first run's session and that
subsequent
runs in the original cwd still resumed correctly
## Risks
- Adds a `head -n 1` shell call per Pi run on remote targets. Negligible
latency (single read of session jsonl), bounded by 15s timeout.
- If the `head` call fails for unrelated reasons (transient remote
unreachability), the run will start cold instead of resuming. This is
the
safe default but worth noting — operators may see one extra cold run if
a
remote glitches mid-session.
- No data is deleted or migrated; stale sessions remain on disk for
manual
inspection if desired.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
> **Stacked PR (part 6 of 7).** Depends on:
- PR #5114
- PR #5115
- PR #5116
- PR #5117
- PR #5118
> Diff against `master` includes commits from earlier PRs in the stack —
the new commit in this PR is the topmost one.
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - The OpenCode adapter validates that its configured model exists
before letting
> a run start so misconfiguration fails fast with a clear error
> - SSH testing reproduced an OpenCode failure where issues stayed
`backlog`,
> timed out, and produced no comments. The root cause was in
> `packages/adapters/opencode-local/src/server/execute.ts`: the local
model
> guard `ensureOpenCodeModelConfiguredAndAvailable(...)` only ran when
execution
> was *not* remote, so SSH OpenCode bypassed it and failed silently
later
> - Subsequent testing surfaced a related remote-only failure where the
probe
> (when wired up naively) hits `EACCES: permission denied, mkdir
'/var/folders'`
> on the SSH box because of how OpenCode's runtime config picks a
tempdir
> - This PR runs the model probe on the actual execution target —
`opencode
> models` via `runAdapterExecutionTargetProcess` — instead of the local
CLI,
> parses the output with the shared `parseOpenCodeModelsOutput` helper,
and
> reports a concrete error naming the offending model and a sample of
available
> remote models when the configured model isn't present
> - The benefit is that mismatched OpenCode models surface as a clear
pre-flight
> error referencing the remote target instead of a silent run that never
leaves
> `backlog`
## What Changed
- Added `ensureRemoteOpenCodeModelConfiguredAndAvailable` in
`opencode-local/src/server/execute.ts` that runs `opencode models` via
`runAdapterExecutionTargetProcess` and validates the configured model is
in
the parsed output
- `models.ts` now exports `parseOpenCodeModelsOutput` and
`requireOpenCodeModelId`
so the remote path can reuse them
- `execute.ts` calls the remote variant when `executionTargetIsRemote`,
otherwise
the existing local `ensureOpenCodeModelConfiguredAndAvailable`
- Errors include the offending model id and a sample of available remote
models
so the operator knows exactly what's missing
- `execute.remote.test.ts` extended with cases for: probe timeout, probe
non-zero exit, empty model list, and missing-model error
## Verification
- `pnpm --filter @paperclipai/adapter-opencode-local test`
- `pnpm test -- opencode-local`
- Manual QA: configured an OpenCode agent with a model that exists
locally but
not in the remote sandbox, and confirmed the new error fires before the
run
starts and references the remote target
## Risks
- New behaviour: remote model validation adds a `~20s timeout` `opencode
models`
call on every remote run start. For most environments this is fast, but
a
network-slow sandbox could see startup latency rise. Timeout is bounded.
- If the remote CLI is missing or misconfigured, the new error replaces
the old
generic startup failure — clearer message, but the failure point shifts
earlier. Monitor for any QA flows that relied on the old failure shape.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
> **Stacked PR (part 5 of 7).** Depends on:
- PR #5114
- PR #5115
- PR #5116
- PR #5117
> Diff against `master` includes commits from earlier PRs in the stack —
the new commit in this PR is the topmost one.
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Agents run with a Paperclip-shaped environment
(`PAPERCLIP_WORKSPACE_CWD`,
> worktree path, `PAPERCLIP_WORKSPACES_JSON` hints) so the CLI can
locate the
> correct project tree
> - SSH testing reproduced a real failure: a Codex SSH run wrote to
> `/tmp/paperclip-env-matrix-...` (the *host* path) instead of the
realized
> remote workspace at `/home/<user>/paperclip-env-matrix-ssh-claude/...`
> because the adapter injected `PAPERCLIP_WORKSPACE_CWD=/tmp/...` into
the
> remote env
> - Code review on the initial codex-only fix asked to roll the same
approach
> into every other SSH-capable adapter (claude, acpx, cursor, opencode,
gemini,
> pi) via a shared helper rather than duplicating per-adapter
> - This PR adds `shapePaperclipWorkspaceEnvForExecution` in
adapter-utils that,
> when the execution target is remote: replaces local cwd with the
realized
> execution cwd, nulls out worktree path (which has no remote meaning),
and
> rewrites/strips `cwd` entries in workspace hints based on what was
actually
> synced. Every adapter calls it before invoking the remote runner
> - The benefit is that remote runs see the realized remote workspace,
host-local
> paths stop leaking into remote env, and the rule is unit-tested in one
place
## What Changed
- Added `shapePaperclipWorkspaceEnvForExecution` to
`packages/adapter-utils/src/server-utils.ts` with full unit coverage
(`server-utils.test.ts`)
- Each of acpx-local, claude-local, codex-local, cursor-local,
gemini-local,
opencode-local, pi-local now calls the new shaper before issuing the
remote
command and feeds the shaped values into `applyPaperclipWorkspaceEnv`
- Per-adapter `execute.remote.test.ts` files extended to cover the new
shaping
behaviour: localhost paths replaced with remote cwd, foreign-cwd hints
stripped, worktree path nulled out for remote targets
- `acpx-local/src/server/execute.test.ts` extended with shaping coverage
## Verification
- `pnpm test -- server-utils execute.remote`
- `pnpm --filter @paperclipai/adapter-acpx-local test`
- Manual QA reproducing the original failure:
1. Provision an E2B sandbox environment for the Paperclip QA company
2. Assign an issue to a remote-targeted claude-local agent and confirm
the
run starts in the correct remote cwd (no `/Users/...` path leakage in
the
run logs)
3. Repeat for opencode-local and pi-local
## Risks
- Behavioural shift: hints whose `cwd` doesn't match the workspace cwd
are now
stripped on remote targets. If any adapter relied on a leaked local hint
cwd,
it will see a missing `cwd` instead. Reviewed all current callers — none
do.
- Adds a small per-run cost (path resolve + string normalisation) on
every remote
execution. Negligible.
- Worktree path is now nulled out on remote (it has no meaning there).
Adapters
that previously read the value defensively will continue to work.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
> **Stacked PR (part 4 of 7).** Depends on:
- PR #5114
- PR #5115
- PR #5116
> Diff against `master` includes commits from earlier PRs in the stack —
the new commit in this PR is the topmost one.
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - When creating an OpenCode-local agent, Paperclip currently validates
> `adapterConfig.model` against the *Paperclip host's* `opencode models`
output
> - SSH testing surfaced that this blocks creating an OpenCode agent for
an SSH
> environment: the model that exists on the SSH target isn't visible to
the
> host, so creation fails with "OpenCode requires `adapterConfig.model`
in
> provider/model format" even when the operator picked a real remote
model
> - The initial direction was environment-aware model discovery; the
final
> decision was to keep OpenCode on the same explicit-model pattern as
other
> adapters (default + curated list + manual override) and stop blocking
> creation on host-side discovery
> - This PR does both: the adapter-models endpoint now accepts
`environmentId` and
> probes against the target environment, and the create-time hard gate
is
> replaced by `requireOpenCodeModelId` which validates `provider/model`
*format*
> without requiring host-local discovery. Test/run-time still surfaces
real
> auth/availability problems
> - The benefit is that operators can create OpenCode agents for remote
> environments without out-of-band setup, and the model picker in the UI
> reflects the actually-targeted environment
## What Changed
- Added `requireOpenCodeModelId(input)` in
`opencode-local/src/server/models.ts`,
exported it from the adapter index
- `ensureOpenCodeModelConfiguredAndAvailable` now delegates the format
check to
`requireOpenCodeModelId`
- `agentsApi.adapterModels(companyId, adapterType, { environmentId })`
now accepts
an environment ID and passes it as a query parameter
- `queryKeys.agents.adapterModels` now keys on `(companyId, adapterType,
environmentId)`
- `server/src/routes/agents.ts` reads and validates the new query
parameter,
forwarding it to the adapter's model probe
- `AgentConfigForm.tsx` and `OnboardingWizard.tsx` build the model query
key from
the currently selected default environment ID and disable autodetect for
`opencode_local` (model selection is explicit)
- `NewAgent.tsx` simplified — no longer special-cases OpenCode
autodetect
- `company-portability.ts` no longer needs OpenCode-specific autodetect
handling
- Tests added/updated:
`adapter-model-refresh-routes.test.ts`, `adapter-models.test.ts`,
`agent-permissions-routes.test.ts`,
`opencode-local/src/server/models.test.ts`
## Verification
- `pnpm --filter @paperclipai/server test -- adapter-models
adapter-model-refresh agent-permissions`
- `pnpm --filter @paperclipai/adapter-opencode-local test`
- `pnpm --filter @paperclipai/ui test -- AgentConfigForm
OnboardingWizard NewAgent`
- Manual QA in browser:
1. Boot Paperclip on Tailscale-bound port (so it's reachable from
another
machine), create an OpenCode-local agent, switch the default environment
between two installed sandboxes, and confirm the model list refreshes
per-environment
2. Submit with a malformed `provider/model` string and verify the new
`requireOpenCodeModelId` error surfaces
- Before/after screenshots attached for `AgentConfigForm` model picker
## Risks
- Behavioural shift: switching default environment now triggers a model
refetch.
Should be cheap but introduces a new UI loading state for OpenCode
users.
- Removing dynamic autodetect for OpenCode: if any user configured an
agent
without specifying `model` and relied on autodetect populating it, that
agent
will now fail at submit time. Mitigation: validation error is explicit
and
actionable.
- New query string parameter on `/api/companies/:id/adapter-models` —
older
clients that omit it still work (parameter is optional and defaults to
null).
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
> **Stacked PR (part 3 of 7).** Depends on:
- PR #5114
- PR #5115
> Diff against `master` includes commits from earlier PRs in the stack —
the new commit in this PR is the topmost one.
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Agents executing on a remote SSH-backed environment need a way to
call back into
> the Paperclip control plane (run events, log streaming, signals)
> - When the SSH host can't reach the Paperclip host (NAT, firewalls, or
simply not
> on the same network), the run silently fails or hangs — a recurring
class of
> failure during SSH testing
> - In sandboxed environments we already solved this with a callback
bridge that
> tunnels back through the existing connection; SSH was the odd one out
> - This PR migrates SSH execution to use the same callback bridge, so
every
> adapter's remote run uses one consistent reverse-channel. Per-adapter
SSH glue
> is deleted in favour of a shared `CommandManagedRuntimeRunner` built
from the
> SSH spec
> - The benefit is fewer SSH-specific failure modes, a smaller code
surface, and
> one place to evolve the callback contract going forward
## What Changed
- Added `createSshCommandManagedRuntimeRunner` in
`packages/adapter-utils/src/ssh.ts` that adapts an SSH spec into a
generic
command-managed-runtime runner (with cwd, env, and timeout handling)
- Removed `paperclipApiUrl` from `SshRemoteExecutionSpec`; the bridge
URL now flows
through the shared runner
- Reworked `execution-target.ts` to use the SSH runner alongside sandbox
runners
via a unified `CommandManagedRuntimeRunner` interface
- Simplified `remote-managed-runtime.ts` and
`sandbox-managed-runtime.ts` to consume
the shared runner abstraction
- Deleted per-adapter SSH callback wiring from claude-local,
codex-local,
cursor-local, gemini-local, opencode-local, pi-local execute.ts files
- Removed `environment-runtime-driver-contract.test.ts` (the contract is
now
enforced by `environment-execution-target.test.ts`)
- Added/updated `execute.remote.test.ts` cases for each adapter to cover
the SSH
runner path
## Verification
- `pnpm --filter @paperclipai/adapter-utils test`
- `pnpm test -- execute.remote` (covers all six local adapters' SSH
paths)
- Manual QA: ran a claude-local agent against an SSH-backed environment,
confirmed
the agent successfully called back to `/api/agent-callback/*` endpoints
during
the run
## Risks
- Refactor touches all six local adapters. If any adapter had subtle
SSH-specific
behaviour that wasn't captured in tests, it could regress. Mitigation:
each
adapter's `execute.remote.test.ts` was extended.
- `paperclipApiUrl` removal from `SshRemoteExecutionSpec` is a breaking
type change
for any internal consumer. Verified no external plugins consume this
type.
- The new `CommandManagedRuntimeRunner` shape is a public surface in
`@paperclipai/adapter-utils`; downstream plugins implementing custom
runners may
need updates, but no such plugins exist in this repo.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Agents execute in sandboxed remote environments served by pluggable
sandbox
> providers (E2B today, more later)
> - Today every sandbox command runs under `sh -lc` regardless of what
the
> provider's container actually ships
> - That misses bash-only shell init on E2B (which ships bash) and
prevents
> future providers from declaring a different default — there's no way
for a
> provider to say "I have bash, use it"
> - This PR adds a `shellCommand` field to sandbox execution targets so
providers
> can declare their preferred shell ("bash" for E2B), threads it through
the
> sandbox-managed-runtime client, callback bridge, and execution-target
shell
> helper, and validates the value at the lease-metadata boundary
> - The benefit is that sandbox commands run under the right shell on
the right
> provider, and adding new sandbox providers only needs to declare a
shell
> preference
## What Changed
- Added `packages/adapter-utils/src/sandbox-shell.ts` exporting
`preferredShellForSandbox(shellCommand)` (returns `"bash"` if input is
`"bash"`,
else `"sh"`)
- Added `shellCommand?: "bash" | "sh" | null` to
`AdapterSandboxExecutionTarget`
and `CommandManagedRuntimeSpec`; threaded it through
`runAdapterExecutionTargetShellCommand`,
`prepareAdapterExecutionTargetRuntime`,
and `startAdapterExecutionTargetPaperclipBridge`
- `createCommandManagedRuntimeClient`, `prepareCommandManagedRuntime`,
and
`createCommandManagedSandboxCallbackBridgeQueueClient` now take an
optional
`shellCommand` and use `preferredShellForSandbox` to pick the shell
- `startSandboxCallbackBridgeServer` accepts a `shellCommand` for its
server
startup, readiness probe, and stop hook
- E2B sandbox plugin declares `shellCommand: "bash"` in `leaseMetadata`
- `resolveEnvironmentExecutionTarget` reads `shellCommand` from lease
metadata
(validating against `"bash" | "sh" | null`)
- `environment-runtime.ts` adds `"shellCommand"` to
`INTERNAL_PLUGIN_SANDBOX_CONFIG_KEYS`
so the field round-trips through internal plugin config without leaking
to
external plugin metadata
- Updated tests in `command-managed-runtime.test.ts`,
`execution-target-sandbox.test.ts`, `sandbox-callback-bridge.test.ts`,
`environment-execution-target.test.ts`
## Verification
- `pnpm --filter @paperclipai/adapter-utils test`
- `pnpm --filter @paperclipai/server test --
environment-execution-target`
- `pnpm --filter @paperclipai/sandbox-providers-e2b test`
- Manual QA: boot a Paperclip instance, create an E2B-backed
environment, run a
claude_local agent against it, and confirm the run completes (verifies
bash
shell semantics flow through the callback bridge end-to-end)
## Risks
- E2B sandbox commands now run under `bash -lc` instead of `sh -lc`.
Bash is a
strict superset for the commands we issue (no busybox-only flags in our
shell
scripts), so risk is low. The shellCommand field is opt-in via lease
metadata —
providers that don't declare it stay on `sh`.
- New optional field on `CommandManagedRuntimeSpec` and
`AdapterSandboxExecutionTarget`.
Consumers ignoring the field retain previous behaviour (sh).
- Lease metadata now carries an additional field. Existing leases
without
`shellCommand` resolve to `null` and fall back to sh — backwards
compatible.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A (no UI changes)
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
## Thinking Path
> - Paperclip orchestrates AI agents for autonomous companies, and
heartbeat execution is the control-plane loop that keeps assigned work
moving.
> - Max-turn exhaustion is a recoverable local-adapter stop condition
for Claude and Gemini agents when a run needs another heartbeat to
continue safely.
> - The previous behavior could leave max-turn continuation details hard
to inspect, and duplicate/stale continuation wakes could keep running
after issue state changed.
> - The adapter layer also needed to avoid trusting arbitrary
stdout/stderr text as scheduler control metadata.
> - This pull request adds bounded max-turn continuation scheduling,
visible retry state, structured stop metadata handling, and
stale/duplicate continuation guards.
> - The benefit is safer automatic continuation after max-turn stops,
clearer operator visibility, and fewer duplicate or stale agent runs.
## What Changed
- Replaces closed PR #4952, whose head repository was deleted.
- Rebases the recovered max-turn continuation branch onto current
`paperclipai/paperclip:master`.
- Adds max-turn continuation scheduling and retry-state plumbing for
heartbeat runs.
- Adds stale/duplicate continuation suppression when issue status,
ownership, or execution locks change.
- Normalizes Claude/Gemini max-turn detection around structured stop
metadata instead of unstructured stdout/stderr text.
- Surfaces max-turn continuation settings and retry visibility in the
board UI.
- Adds focused server, adapter, and UI tests for max-turn stop metadata,
retry scheduling, stale queued-run invalidation, adapter
parsing/execution, run ledger display, and agent config patching.
## Verification
- `pnpm install --no-frozen-lockfile` to refresh local dependencies
after rebasing onto current `master`.
- `pnpm run preflight:workspace-links && pnpm exec vitest run
server/src/__tests__/claude-local-adapter.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/gemini-local-adapter.test.ts
server/src/__tests__/gemini-local-execute.test.ts
server/src/__tests__/heartbeat-retry-scheduling.test.ts
server/src/__tests__/heartbeat-stale-queue-invalidation.test.ts
server/src/services/heartbeat-stop-metadata.test.ts
ui/src/components/IssueRunLedger.test.tsx
ui/src/lib/agent-config-patch.test.ts ui/src/lib/runRetryState.test.ts
--testTimeout=20000`
- `pnpm --filter @paperclipai/adapter-claude-local typecheck && pnpm
--filter @paperclipai/adapter-gemini-local typecheck && pnpm --filter
@paperclipai/server typecheck && pnpm --filter @paperclipai/ui
typecheck`
- UI screenshot note: the UI changes are limited to config/ledger state
rendering rather than layout changes; component/unit coverage above
verifies the rendered behavior.
## Risks
- Medium behavior risk: heartbeat retry gating now suppresses max-turn
continuations when issue state or execution locks drift, so any callers
that relied on stale continuations running will now see cancellation
instead.
- Low adapter risk: Claude/Gemini unstructured text no longer triggers
max-turn scheduler metadata, so only structured stop signals and Gemini
exit code 53 are trusted.
- No database migrations.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex coding agent, GPT-5-class model, tool-enabled local
repository editing and command execution.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots (not applicable: state/default rendering only; covered by
component/unit tests)
- [x] I have updated relevant documentation to reflect my changes (not
applicable: no user-facing command or docs contract changed)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
Five connected gaps in the original PAT feature, all in company-skills.ts:
1. upsertImportedSkills only protected bundled rows from overwrite when the
incoming source claimed to be paperclipai/paperclip. A SKILL.md from any
other org/repo whose key resolves to paperclipai/paperclip/<slug> would
hijack the bundled row and gain a sourceAuthSecretId. Broadened: any
non-bundled incoming is rejected when existing is paperclip_bundled.
2. The metadata-build block preserved sourceAuthSecretId from existing
indiscriminately, so any pollution of a bundled row was kept across every
ensureBundledSkills re-upsert. Skip preservation when existing is bundled.
3. importFromSource's auth-token loop wrote sourceAuthSecretId for every
imported skill including any bundled ones that snuck through. Defense in
depth: skip skills with sourceKind === "paperclip_bundled".
4. updateSkillAuth had no guard, so the PATCH /skills/:id/auth route could
attach a PAT to a bundled skill via direct API call. Reject explicitly.
5. deleteSkill removed the secret without checking whether any sibling skill
still referenced it via metadata.sourceAuthSecretId. Re-imports preserve
that reference, so two skills could share a secret and deleting one would
orphan the other's reference. Now skip the remove if another skill in the
same company still references the secret.
## Thinking Path
> - Paperclip is a control plane for autonomous AI companies where work
must stay observable, governable, and recoverable.
> - The task/heartbeat subsystem owns agent execution continuity, issue
state transitions, and visible recovery behavior.
> - Waiting on an external service is not the same as being blocked when
the assignee still owns a future check.
> - The gap was that agents had no first-class one-shot monitor state
for external-service waits, so recovery could look stalled or require ad
hoc comments.
> - This pull request adds bounded issue monitors that can wake the
owner, clear exhausted waits, and produce explicit recovery behavior.
> - It also surfaces monitor status in the board UI and documents when
to use monitors versus `blocked`.
> - The benefit is clearer liveness semantics for asynchronous waits
without weakening single-assignee task ownership.
## What Changed
- Added issue monitor fields, shared types, validators, constants, and
an idempotent `0075` migration for scheduled monitor state.
- Added server-side monitor scheduling, dispatch, recovery bounds,
activity logging, and external-ref redaction.
- Added board/agent route coverage for monitor permissions and child
monitor scheduling.
- Added issue detail/property UI for monitor state, a monitor activity
card, and Storybook stories for review surfaces.
- Documented monitor semantics and recovery policy behavior in
`doc/execution-semantics.md`.
- Addressed Greptile review feedback by preserving monitor state in
skipped-stage builders and making board monitor saves send `scheduledBy:
"board"`.
## Verification
- `pnpm install --frozen-lockfile`
- `pnpm run preflight:workspace-links && pnpm exec vitest run
server/src/__tests__/issue-execution-policy-routes.test.ts
server/src/__tests__/issue-execution-policy.test.ts
server/src/__tests__/issue-monitor-scheduler.test.ts
server/src/__tests__/recovery-classifiers.test.ts
ui/src/components/IssueMonitorActivityCard.test.tsx
ui/src/components/IssueProperties.test.tsx
ui/src/lib/activity-format.test.ts`
- First run passed 5 files and failed to collect 2 server suites because
the worktree was missing the optional `acpx/runtime` dependency.
- After `pnpm install --frozen-lockfile`, reran the 2 failed suites
successfully.
- `pnpm exec vitest run
server/src/__tests__/issue-monitor-scheduler.test.ts
server/src/__tests__/recovery-classifiers.test.ts`
- `pnpm --filter @paperclipai/shared typecheck && pnpm --filter
@paperclipai/db typecheck && pnpm --filter @paperclipai/server typecheck
&& pnpm --filter @paperclipai/ui typecheck`
- `pnpm exec vitest run
server/src/__tests__/issue-execution-policy.test.ts
ui/src/components/IssueProperties.test.tsx`
- `pnpm --filter @paperclipai/server typecheck && pnpm --filter
@paperclipai/ui typecheck`
- `pnpm exec vitest run
ui/src/components/IssueMonitorActivityCard.test.tsx
ui/src/components/IssueProperties.test.tsx`
- `pnpm --filter @paperclipai/ui typecheck`
- Storybook screenshot captured from
`http://127.0.0.1:6006/iframe.html?viewMode=story&id=product-issue-monitor-surfaces--monitor-surfaces`
with Playwright.
## Screenshots

## Risks
- Medium: this changes heartbeat recovery behavior for scheduled
external-service waits, so regressions could affect wake timing or
recovery issue creation.
- Migration risk is reduced by using `IF NOT EXISTS` for the new issue
monitor columns and index.
- External monitor references are treated as secret-adjacent and are
intentionally omitted from visible activity/wake payloads.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5 coding agent with repository tool use and terminal
execution.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots or Storybook review surfaces
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
---------
Co-authored-by: Paperclip <noreply@paperclip.ing>
The self-hosted runner has been hitting context-deadline timeouts to
docker.io. The actual image push goes to GHCR, so the Docker Hub login
is only there to avoid pull rate limits. Mark it continue-on-error so
transient docker.io connectivity issues don't fail the whole build —
base image pulls fall back to anonymous and proceed.
## Thinking Path
> - Paperclip is the control plane for autonomous AI companies.
> - The board UI needs a clear persistent way to move between company
workspaces.
> - The previous layout kept company switching in a separate left rail,
which made the sidebar feel split between workspace selection and
navigation.
> - The workspace switcher belongs in the sidebar header so navigation
and workspace context stay together.
> - This pull request removes the separate company rail from the layout
and turns the sidebar company menu into the primary workspace switcher.
> - The benefit is a cleaner sidebar structure that keeps workspace
identity, switching, company actions, and navigation in one place.
## What Changed
- Removed the standalone `CompanyRail` from the main layout.
- Added the company/workspace switcher to the default, company settings,
and instance settings sidebars.
- Expanded `SidebarCompanyMenu` to list active workspaces, indicate the
current workspace, navigate out of instance settings when switching, and
expose add-company onboarding.
- Updated focused component tests for the new workspace-switcher
behavior.
## Verification
- `pnpm --filter @paperclipai/ui exec vitest run
src/components/SidebarCompanyMenu.test.tsx
src/components/CompanySettingsSidebar.test.tsx`
- `pnpm --filter @paperclipai/ui typecheck`
- `git diff --check`
- Visual smoke attempted against the managed dev server at
`http://127.0.0.1:57385`; a fresh browser context reached the
authenticated sign-in screen, so I could not capture an authenticated
sidebar screenshot from this heartbeat.
## Risks
- Low-to-medium UI risk: this changes the primary sidebar structure and
workspace-switching entry point.
- The instance-settings switch behavior now routes back to the selected
company dashboard when a workspace is selected.
- No migrations, API contracts, or lockfile changes.
> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5 coding agent, tool-enabled, medium reasoning mode.
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
Co-authored-by: Paperclip <noreply@paperclip.ing>