623 Commits

Author SHA1 Message Date
Devin Foley aea35fe695 exe.dev config UX: advanced-options disclosure, form-default fix, SSH key handling (PAPA-407) (#7025)
## Thinking Path

> - Paperclip orchestrates AI agents and provisions sandboxed execution
environments for them; one of those provisioners is the exe.dev plugin,
which runs each agent inside a long-lived VM reached over SSH.
> - The instance-config form for that plugin is rendered generically by
`JsonSchemaForm` from the plugin's `instanceConfigSchema`, so any UX
problem with the form is split between the shared form component and the
plugin's schema/runtime code.
> - Users coming in cold hit a 12-field flat config they couldn't reason
about (PAPA-407), a form that silently submitted `cpu: 0` for untouched
optional fields (PAPA-407 root cause), a `sshPrivateKey` textarea that
truncated RSA-4096 keys at 4096 chars (PAPA-449), a save flow that
accepted clearly-malformed keys and only blew up at lease time with raw
SSH stderr (PAPA-450, PAPA-451), and a manifest that didn't distinguish
"essential" from "advanced" knobs (PAPA-410 / PAPA-411 — duplicate
sub-issues with identical scope; PAPA-418 reconciliation kept PAPA-410
canonical).
> - These problems all point at the same surface (exe.dev sandbox
config) and are tightly coupled in code — PAPA-449/450/451 patch fields
that PAPA-410/411 introduce — so they get reviewed together.
> - This pull request lands the shared-form changes (advanced-options
disclosure, optional-scalar defaults) and the exe.dev-specific changes
(manifest restructure, longer `maxLength`, stderr translation, save-time
key validation) as five focused commits stacked on `master`.
> - The benefit is a config form that defaults to the two fields a new
user actually needs (API key + SSH private key) with a collapsible
disclosure for the rest, no silent truncation or zero-default
submissions, and SSH key problems surfaced at save time with actionable
messages instead of cryptic post-provision failures.

## What Changed

- **JsonSchemaForm advanced-options disclosure** (PAPA-410, PAPA-411 —
same scope, see note above): adds `x-paperclip-advanced` /
`x-paperclip-group` schema annotations and renders flagged fields behind
a collapsible "Advanced options" disclosure that auto-opens when a
hidden field has a validation error. Exe.dev manifest is restructured to
use the new annotations, so essentials (`apiKey`, `sshPrivateKey`) show
by default while the long tail of optional knobs is grouped under "SSH
access" / "VM resources" / "More options" headings.
- **Omit optional scalar defaults** (PAPA-407): `getDefaultForSchema` no
longer materialises `0` / `""` for optional
`number`/`integer`/`string`/`secret-ref` fields without an explicit
`default`. Object recursion drops properties whose default is
`undefined`. Fields that declare a `default` (e.g. `sshPort: 22`) still
round-trip. Adds a regression test against `getDefaultValues`.
- **Raise `sshPrivateKey` `maxLength`** (PAPA-449): bumps the exe.dev
manifest cap from 4096 to 8192 so RSA-4096 OpenSSH private keys (which
can exceed 4 KB with comments/metadata) aren't silently truncated at
submit.
- **Translate `invalid format` SSH stderr** (PAPA-450):
`formatSshFailure` now recognises `Load key … invalid format` in
combined stderr/stdout and returns a specific message naming the
key-format problem ("isn't an OpenSSH/PEM private key — confirm the
secret starts with `-----BEGIN … PRIVATE KEY-----` and isn't the `.pub`
or a PuTTY `.ppk` export") instead of dumping the raw stderr.
- **Save-time SSH key validation** (PAPA-451):
`onEnvironmentValidateConfig` inline-parses `sshPrivateKey` and rejects
common failure modes — pasted public keys, PuTTY `.ppk` format, missing
`-----END-----` footer, non-base64 body — so the form surfaces an inline
error before any VM is provisioned. Secret-ref bindings (UUIDs) are
still passed through unchanged.

## Verification

CI gates (`pnpm typecheck`, `pnpm test`, the targeted vitest suites
below) all pass.

Run locally:

```bash
# Shared form
pnpm --filter @paperclipai/ui exec vitest run src/components/JsonSchemaForm
# 9 tests pass — includes the new "omits optional scalar fields" regression
# and the three advanced-options-disclosure tests.

# exe.dev plugin
cd packages/plugins/sandbox-providers/exe-dev && pnpm test
# 32 tests pass — includes the new sshPrivateKey-validation cases
# and the new "invalid format" stderr-translation case.
```

Manual smoke (after reinstalling the plugin so the DB manifest
refreshes):

1. Open the exe.dev environment config page. **Default view shows API
Key + SSH Private Key only**, with an "Advanced options" disclosure for
everything else (PAPA-410 / PAPA-411).
2. Paste a `.pub` file's contents into SSH Private Key, click Save.
**Inline error** rejecting the wrong-format key (PAPA-451).
3. Re-paste a valid OpenSSH/PEM private key longer than 4096 bytes —
saves cleanly (PAPA-449).
4. Save the form with everything optional left blank — server no longer
rejects with `"cpu must be greater than 0 when provided"` (PAPA-407).
5. Force a bad key through via a stored secret-ref binding and lease a
VM — failure message names the key-format problem instead of dumping raw
SSH stderr (PAPA-450).

## Risks

- **PAPA-410 / PAPA-411 manifest restructure** is the largest surface
here. Schemas using `x-paperclip-*` extensions are forward-compatible
with stricter JSON Schema validators (extensions are ignored by
default), and the form gracefully renders a flat layout when no field
opts in.
- **PAPA-407** changes form-default behaviour: optional scalar fields
that previously round-tripped as `""` / `0` will now be `undefined` and
absent from the submitted payload. Downstream consumers that expected
the empty-string/zero shape need to treat the field as optional.
Spot-checked the existing exe.dev driver — it already uses
`parseOptionalString` / `parseOptionalInteger`, which treat missing
fields as `null` rather than `0`/`""`.
- **PAPA-451** adds a save-time check, so a
previously-saved-but-malformed `sshPrivateKey` raw value will now fail
to re-save. Bound secret-refs are unaffected, matching how the user
reaches the bad-key state today (via the secrets picker).
- **PAPA-449** simply raises a cap; no semantic risk.
- **PAPA-450** only kicks in on the "invalid format" code path; existing
onboarding-marker branch is untouched.

## Model Used

- Provider: Anthropic
- Model: Claude Opus 4.7 (`claude-opus-4-7`)
- Capabilities used: code reading, code editing, test execution, git/PR
mechanics, Paperclip API for issue coordination

## Checklist

- [x] PR body sections present (Thinking Path, What Changed,
Verification, Risks, Model Used, Checklist)
- [x] Unit tests added for the new behaviours (JsonSchemaForm
default-value omission + advanced disclosure; exe.dev plugin validation
+ stderr translation)
- [x] Existing tests still pass locally (`vitest run` on both packages)
- [x] No raw secrets, IP addresses, or machine-local config in commits
or PR body
- [x] Commits are atomic per linked issue (PAPA-410 / PAPA-411,
PAPA-407, PAPA-449, PAPA-450, PAPA-451)
- [x] Branch is up-to-date with `origin/master`

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-29 18:19:37 -07:00
Dotta 5153b01ada [codex] Add Claude model refresh (#6953)
## Thinking Path

> - Paperclip orchestrates AI-agent companies through adapter-backed
local and external runtimes.
> - The agent configuration UI lets operators choose adapter models and
refresh model lists when adapters support live discovery.
> - Codex already had a live refresh path, but Claude Local only exposed
static fallback models and the UI hid the refresh action for Claude.
> - A newly available Claude Opus model should not require a code
release every time the model catalog changes.
> - This pull request adds Anthropic model discovery for Claude Local,
keeps the static fallback current with Claude Opus 4.8, and exposes the
existing refresh button in the Claude Local dropdown.
> - The benefit is that operators can refresh Claude models from the
same model selector flow they already use for Codex.

## What Changed

- Added `claude-opus-4-8` to the Claude Local fallback model list.
- Added Claude model discovery through Anthropic-compatible `GET
/v1/models` when `ANTHROPIC_API_KEY` is available.
- Added normal cache reuse, forced refresh support, a SHA-256-based
API-key fingerprint for cache keys, and warning logging for discovery
errors before fallback.
- Wired `claude_local.refreshModels` into the server adapter registry.
- Enabled the existing `Refresh models` dropdown action for
`claude_local` in `AgentConfigForm`.
- Added tests for Claude fallback, live discovery, API-failure fallback,
forced refresh, and the UI refresh-button gate.

## Verification

- `pnpm exec vitest run server/src/__tests__/adapter-models.test.ts`
- `pnpm exec vitest run ui/src/components/AgentConfigForm.test.ts`
- `pnpm --filter @paperclipai/adapter-claude-local typecheck`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm --filter @paperclipai/ui typecheck`
- Greptile review reached Confidence Score: 5/5 on commit `b796cf4f1`
with addressed threads resolved.

UI note: the visible change is a conditional action row inside the
existing model dropdown; the regression test covers that `claude_local`
now receives the refresh action.

## Risks

- Low risk. Without `ANTHROPIC_API_KEY`, Claude Local still uses the
static fallback list.
- If Anthropic model discovery fails or times out, Paperclip falls back
to the existing cached or static list.
- Bedrock environments remain on Bedrock-native model IDs.

## Model Used

OpenAI GPT-5 via Codex local coding agent, with repository file access,
shell command execution, git operations, and targeted test/typecheck
verification. Exact context window is not exposed by the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-29 07:03:07 -10:00
Devin Foley 1f70fd9a22 PAPA-430: workspace finalize gates + no-remote-git enforcement (#6969)
## Thinking Path

> - Paperclip orchestrates AI agents across isolated execution
workspaces; the local cwd is the only persistence boundary between runs.
> - Workspace lifecycle (worktree_prepare → execute →
workspace_finalize) and the wake/accept flow are what guarantee that
dependent issues see a consistent worktree.
> - PAPA-380 / PAPA-431 / PAPA-432 / PAPA-440 surfaced three holes in
that contract: silent env reuse across assignees, dependent wakes firing
before finalize, and `issue.interaction.accept` advancing before
finalize landed.
> - PAPA-441 / PAPA-442 then needed to document the "no remote git"
contract and prevent future adapter/runtime code from quietly
reintroducing `git push` as a backdoor sync.
> - This pull request lands those server fixes, the static
`check-no-git-push` enforcement, the AUTHORING.md cross-link, and the
Cody-review follow-ups on the PAPA-430 thread.
> - The benefit is that finalize is a real barrier — board accepts,
dependent wakes, and operator-set env all respect it — and adapter code
can't bypass it via raw `git push`.

## What Changed

- **server (PAPA-380, PAPA-431):** `execution-workspace-policy` refuses
silent env reuse when the assignee's resolved env disagrees with the
workspace it would inherit. The inheritance protection is now scoped to
the actual inheritance signal — explicit issue-level `environmentId` is
honored even when the agent's default env is `null`.
- **server (PAPA-432):** `heartbeat.ts` gates dependent wakes on
`listUnfinalizedExecutionWorkspaceIds`, and writes a
`workspace_finalize` row on the succeeded path. Write failures now
surface instead of being swallowed so dependents aren't silently
stranded behind a missing row.
- **server (PAPA-440):** `issue-thread-interactions.acceptInteraction`
adds a workspace_finalize precondition for `request_confirmation` (not
`suggest_tasks`). Accept returns 409 if finalize hasn't succeeded for
the latest workspace operation.
- **ci (PAPA-442):** new `scripts/check-no-git-push.mjs` static check
scans `packages/adapters/`, `packages/adapter-utils/`, `server/src/`,
and `cli/src/` for any `git push` invocation (string or args-array).
Wired into the `policy` PR job and `test:release-registry`. Operators
can opt in per-call with `// paperclip:allow-git-push: <reason>`.
Release scripts are out of scope by design.
- **docs (PAPA-441):** `AUTHORING.md` documents the no-remote-git
contract and cross-links the static check so adapter authors learn the
rule and the enforcement together.
- **review follow-up (PAPA-430, Cody):** three fixes — env resolver bug,
accept-gate scope (request_confirmation only), and finalize record write
on the succeeded path.

## Verification

- `pnpm exec vitest run
server/src/__tests__/execution-workspace-policy.test.ts
server/src/__tests__/issue-thread-interactions-service.test.ts` → 33/33
pass
- `node scripts/check-no-git-push.test.mjs` → check covers string form,
args-array form, comment exclusions, and per-line allow-comment.
- Manual: server compiles; the policy job runs the check in <1s before
heavier jobs.

## Risks

- **Behavioral shift in accept:** boards accepting
`request_confirmation` while finalize is in-flight now get 409s. This is
intentional — they can retry — but it changes timing on a hot path.
`suggest_tasks` is unaffected.
- **Workspace policy:** the env-reuse refusal is a new error path.
Issues that previously silently reused an env from a different-assignee
workspace will now fail-loud; the resolver still honors explicit
issue-level `executionWorkspaceSettings.environmentId`.
- **CI rule:** any future legitimate `git push` in scoped dirs must be
marked with the allow-comment, which is the intended ergonomic.

## Model Used

- Claude Opus 4.7 (`claude-opus-4-7`, extended thinking), via Claude
Code in the Paperclip executor adapter.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots (N/A — server/CI/docs only)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Closes related issues: PAPA-430, PAPA-380, PAPA-431, PAPA-432, PAPA-440,
PAPA-441, PAPA-442

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-29 08:25:29 -07:00
Devin Foley d9f91576a0 Add accepted-plan decomposition exact-once guards and UI state (#6831)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies, so
planning approvals and child-issue fan-out are part of the core
control-plane loop.
> - Accepted plans are supposed to be a safe bridge from planning into
execution, especially when agents wake from review decisions and reuse
isolated workspaces.
> - The duplicate-subtask incident showed that an accepted plan revision
could be interpreted more than once across overlapping runs, which broke
the single-source-of-truth model for issue decomposition.
> - Fixing that required tightening the backend contract first:
accepted-plan decomposition needs an exact-once fingerprint, durable
claim state, and retry-safe child creation.
> - Once that backend behavior existed, the board still needed
visibility into what happened, so the issue detail view needed a
dedicated decomposition section instead of forcing operators to
reconstruct child creation from raw activity.
> - This pull request adds the exact-once decomposition primitive,
hardens wake routing and regressions around the incident, and surfaces
decomposition state in the UI so future incidents are both prevented and
easier to inspect.

## What Changed

- Added accepted-plan decomposition semantics to
`doc/execution-semantics.md`, including the exact-once fingerprint,
durable claim/result expectations, and retry/resume behavior.
- Added persistent accepted-plan decomposition claims in the backend,
including schema, shared types/validators, service logic, and issue
routes for creating and listing decomposition state.
- Hardened heartbeat routing so an accepted-plan continuation stays
scoped to the relevant planning issue instead of opportunistically
re-decomposing another accepted issue on the same assignee.
- Added regression coverage for the original failure modes: concurrent
same-parent retries, cross-issue accepted-plan isolation, and partial
child recreation under the same fingerprint.
- Added the `Plan decomposition` issue-detail section plus supporting
API/query-key/activity formatting updates so operators can see revision
status, owner, child counts, and the linked child issues directly in the
UI.
- Included the small follow-up UI fix so the decomposition section still
renders when the issue work mode is no longer `planning`.

## Verification

- `pnpm --filter @paperclipai/server typecheck`
- `pnpm --filter @paperclipai/ui typecheck`
- `pnpm --filter @paperclipai/db typecheck`
- `pnpm exec vitest run server/src/__tests__/issues-service.test.ts`
- `pnpm exec vitest run server/src/__tests__/issues-service.test.ts -t
"lists persisted decompositions with child issue summaries"`
- `pnpm exec vitest run server/src/__tests__/issues-service.test.ts -t
"accepted plan decomposition"
server/src/__tests__/heartbeat-accepted-plan-workspace-refresh.test.ts
server/src/__tests__/heartbeat-context-summary.test.ts`
- Manual UI path: create a planning issue without an isolated execution
workspace, add a `plan` document, accept the `request_confirmation`, let
Paperclip create child issues, then reopen the parent issue detail page
and confirm the `Plan decomposition` section shows the accepted
revision, status, idempotent-claim badge, and child links.
- Separate follow-up bug noted during manual UI validation: accepting a
plan on an issue whose run never records `workspace_finalize` is tracked
in `PAPA-445` and is not part of this PR’s fix scope.

## Risks

- This adds a new migration and a large Drizzle snapshot update;
reviewers should confirm the schema shape and generated metadata match
the intended decomposition table.
- The exact-once claim changes sit on the accepted-plan fan-out path, so
regressions there could block legitimate child creation or mis-handle
retries if the claim state machine is wrong.
- The new UI only appears when decomposition records exist; reviewers
should use the manual verification path above rather than expecting
existing issues on a stale local instance to show the section
automatically.
- `PAPA-445` remains an open follow-up for the `workspace_finalize`
accept gate when a planning handoff never records finalize; that bug can
interfere with reproducing the UI flow on isolated workspaces but does
not change the correctness of the exact-once decomposition feature
itself.

> Checked `ROADMAP.md`: this PR is a bug fix / control-plane hardening
change for accepted-plan decomposition, not a new uncoordinated roadmap
feature.

## Model Used

- OpenAI Codex via Paperclip `codex_local` (GPT-5-based coding agent;
exact backend model ID/context window not exposed in the run context),
with repository tool use, shell execution, and code-editing
capabilities.

<img width="806" height="1069" alt="Screenshot 2026-05-27 at 11 05
48 PM"
src="https://github.com/user-attachments/assets/5b00b670-96cd-4470-b0a3-581743bcae28"
/>


## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-28 23:30:18 -07:00
Dotta 9eac727cf1 [codex] Add skills CLI and catalog management (#6782)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies through
company-scoped control-plane workflows.
> - Agents need reusable, inspectable skills that can be installed,
reset, audited, exported, and assigned without bespoke local setup.
> - The existing skill truth model needed cleanup so bundled skills,
optional catalog skills, runtime skills, and adapter-provided skills
have clear provenance.
> - Operators also need a practical CLI and board UI for discovering and
managing company skills.
> - This pull request adds the skills CLI, packaged skills catalog,
company skills APIs, and catalog-aware board UI.
> - The benefit is a more reusable Paperclip company setup where skills
are portable, auditable, and easier for operators and agents to manage.

## What Changed

- Added `paperclipai skills` CLI commands and coverage for catalog
listing, installing, resetting, and inspecting company skills.
- Added a packaged `@paperclipai/skills-catalog` workspace with bundled
and optional skill content plus validation/build tests.
- Added shared company-skill types and validators used across CLI,
server, and UI contracts.
- Added server catalog APIs/services for company skill catalog
operations, reset semantics, audit behavior, and portability provenance.
- Updated adapter skill handling so runtime/catalog provenance remains
explicit across local adapters.
- Added board UI support for browsing and managing catalog-backed
company skills.
- Updated docs for the skills CLI/catalog flow and the company skills
Paperclip skill reference.
- Rebased the branch onto current `paperclipai/paperclip:master`; no
`pnpm-lock.yaml`, `.github/workflows`, or migration files are included
in the final PR diff.

## Verification

- Passed: `pnpm run preflight:workspace-links && pnpm exec vitest run
cli/src/__tests__/skills.test.ts
packages/skills-catalog/src/catalog-builder.test.ts
packages/skills-catalog/src/shipped-catalog.test.ts
packages/shared/src/validators/company-skill.test.ts
packages/adapter-utils/src/server-utils.test.ts
packages/plugins/create-paperclip-plugin/src/entrypoints.test.ts
server/src/__tests__/company-skills-catalog-service.test.ts
server/src/__tests__/company-skills-routes.test.ts
server/src/__tests__/company-portability.test.ts`.
- Passed: `pnpm exec vitest run
server/src/__tests__/workspace-runtime.test.ts -t "default
branch|origin/master|symbolic-ref"`.
- Attempted: full `server/src/__tests__/workspace-runtime.test.ts`. Four
provisioning tests failed while seeding an isolated worktree database
from the local Paperclip instance because the local plugin schema dump
contains a duplicate-column foreign key
(`plugin_content_machine_18a7bc327b.content_case_signals`). The
default-branch tests touched by the rebase conflict passed in the
focused run above.
- Checked final diff: no `pnpm-lock.yaml`, no `.github/workflows`, and
no migration-file changes relative to `master`.

## Risks

- Medium: this is a broad skills/catalog change touching CLI, server
APIs, shared contracts, adapter skill sync, and UI.
- Catalog validation and reset semantics need careful reviewer attention
because they affect reusable company setup and portability.
- No database migrations are included in this PR, so there is no
migration ordering/idempotency risk in the final diff.
- No lockfile is included by design; dependency resolution will be
handled by the repository lockfile workflow.

## Model Used

- OpenAI Codex coding agent based on GPT-5, running in Paperclip via the
`codex_local` adapter with shell, git, GitHub CLI, and code-editing tool
access. Exact hosted model build/context-window metadata is not exposed
in this runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run targeted tests locally and documented the local
workspace-runtime seed failure above
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, screenshots were intentionally
omitted per PAP-10124 instructions; UI behavior is covered by tests and
reviewer inspection
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-28 07:33:51 -10:00
Dotta b7545823be [codex] Add document annotations and comments (#6733)
## Thinking Path

> - Paperclip orchestrates AI-agent companies through issues, documents,
runs, and durable company-scoped state.
> - Issue documents are where agents and operators capture plans,
handoffs, and work products.
> - Before this change, document collaboration could only happen through
whole-document edits and detached issue comments.
> - Inline document annotations need stable anchors, revision-aware
persistence, and UI affordances that do not break existing document
editing.
> - This pull request adds company-scoped document annotation threads,
comments, anchor snapshots, API routes, and board UI.
> - The benefit is that operators and agents can discuss specific
document passages without losing context as documents evolve.

## What Changed

- Added document annotation tables, schema exports, shared types,
validators, anchor hashing, and text-anchor helpers.
- Added server-side document annotation services and issue routes for
listing, creating, commenting, resolving, and reopening annotation
threads.
- Included annotation summaries in relevant issue document reads and
backup/recovery document workspace behavior.
- Added React UI for inline document highlights, comment panels, mobile
sheet behavior, deep-link focus, and resolved/open filtering.
- Added annotation design artifacts, Storybook coverage, screenshots,
and a screenshot helper script.
- Rebased the branch onto current `paperclipai/paperclip` `master` and
renumbered the annotation migration from `0085_old_swarm` to
`0091_old_swarm`; the SQL uses `IF NOT EXISTS` guards so environments
that previously applied the old migration number can safely apply the
new one.
- Adjusted the new annotation UI tests to use a local async flush helper
because this workspace's React 19.2.4 export does not expose
`React.act`.

## Verification

- `pnpm run preflight:workspace-links && pnpm exec vitest run
packages/shared/src/document-anchors.test.ts
server/src/__tests__/document-annotation-routes.test.ts
server/src/__tests__/document-annotations-service.test.ts
ui/src/components/DocumentAnnotationLayer.test.tsx
ui/src/components/IssueDocumentAnnotations.test.tsx
ui/src/lib/document-annotation-hash.test.ts
ui/src/lib/document-annotation-selection.test.ts`
- Confirmed `git diff --check` passes.
- Confirmed no `pnpm-lock.yaml` or `.github/workflows/*` files are
included in the PR diff.

## Risks

- Medium risk: this adds new persisted annotation tables and routes
across db/shared/server/ui.
- Migration risk is reduced by moving the branch migration to
`0091_old_swarm` after upstream `0090_resource_memberships` and keeping
the SQL idempotent for old `0085_old_swarm` adopters.
- UI risk is mostly around text range anchoring and panel positioning
across long documents, folded content, and mobile layouts; the PR
includes focused unit coverage and design screenshots.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool-using software engineering
mode. Context window size is not exposed in this Paperclip runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-26 06:41:23 -07:00
Dotta 9aea3e3d35 [codex] Add resource membership controls (#6677)
## Thinking Path

> - Paperclip orchestrates AI-agent companies through company-scoped
issues, projects, agents, and board-visible workflows.
> - The board sidebar and project list are the daily navigation surface
for that control plane.
> - Users need to keep all projects and agents accessible while hiding
resources they have intentionally left from their own sidebar.
> - That requires user-scoped resource membership state backed by
company-scoped API and database contracts.
> - The branch also needed to preserve HTTP worktree login sessions and
keep the project list easier to scan after membership grouping.
> - This pull request adds resource membership controls, sidebar leave
actions, grouped/sortable project listings, and focused tests.
> - The benefit is a cleaner personal workspace view without weakening
company-scoped access to the underlying project or agent detail pages.

## What Changed

- Added `project_memberships` and `agent_memberships` tables with
API/shared/server contracts for current-user join/leave state.
- Renumbered the membership migration to `0090_resource_memberships`
after rebasing onto current `master`, and made it idempotent for anyone
who had applied the old branch-local `0087` migration.
- Added project and agent sidebar leave actions, plus list filtering
that waits for membership state before hiding resources.
- Added grouped project listing, project sorting controls, and reserved
row subtitle height for cleaner scanning.
- Fixed HTTP auth cookie security handling so HTTP worktree sessions can
persist.
- Updated focused server and UI tests for the new membership, sidebar,
project list, and auth behavior.

## Verification

- `pnpm exec vitest run server/src/__tests__/better-auth.test.ts
server/src/__tests__/resource-memberships-routes.test.ts
ui/src/pages/Projects.test.tsx
ui/src/components/SidebarProjects.test.tsx
ui/src/components/SidebarAgents.test.tsx
ui/src/components/MembershipAction.test.tsx
ui/src/components/EntityRow.test.tsx`
- Confirmed the branch is rebased on current `origin/master`.
- Confirmed the PR diff does not include `pnpm-lock.yaml` or
`.github/workflows` changes.

## Risks

- Migration safety: low to medium. The migration now uses `IF NOT
EXISTS` / guarded constraints and is numbered after current master
migrations, but it should still get CI coverage against fresh databases.
- UI behavior: low. Left resources are hidden from sidebar only after
membership state loads; direct detail access remains available.
- Auth behavior: low. Cookie security is relaxed only for HTTP/private
local-style origins where secure cookies would prevent login
persistence.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI GPT-5 Codex coding agent, tool-enabled shell/git workflow,
context window not exposed by runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Screenshot note: no browser screenshots were captured in this heartbeat;
the UI changes are covered by focused component tests above.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-25 13:12:41 -05:00
Dotta ece8a51e22 [codex] Bundle local branch fixes from PAP-10032 (#6604)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - This branch accumulated multiple already-tested control-plane,
adapter runtime, invite, workspace, plugin, and UI quality fixes on the
primary Paperclip checkout.
> - `origin/master` advanced while those commits were still local, so
the branch needed to be preserved and reconciled before review.
> - Splitting the branch commit-by-commit against the new base produced
overlapping conflicts with recently merged upstream PRs.
> - This pull request keeps the remaining branch as one standalone PR
because the final diff is 38 files after removing screenshot artifacts,
under Greptile's 100-file cap, and can be merged independently after
review.
> - The benefit is that none of the local work is lost, the branch is
now based on current `origin/master`, and reviewers can evaluate the
reconciled changes in one place.

## What Changed

- Merged the local accumulated branch with current `origin/master` and
resolved the invite-flow overlaps from the newer upstream companies
query helper.
- Preserved the local fixes for invite existing-member behavior, invite
link copy fallback, reusable workspace selection, worktree auth, static
SPA fallback, markdown wrapping, plugin slot registration, cloud
upstream UX/server polish, project sorting, and related tests.
- Removed screenshot artifacts from the PR per review request.
- Kept the PR under the requested file limit: 38 files changed, with no
`pnpm-lock.yaml` or `.github/workflows/*` changes.

## Verification

- `NODE_ENV=test pnpm exec vitest run
ui/src/pages/CompanyInvites.test.tsx ui/src/pages/InviteLanding.test.tsx
ui/src/pages/Projects.test.tsx ui/src/plugins/slots.test.ts
ui/src/components/MarkdownBody.test.tsx
server/src/__tests__/invite-accept-existing-member.test.ts
server/src/__tests__/static-index-html.test.ts
server/src/__tests__/execution-workspaces-service.test.ts
server/src/__tests__/better-auth.test.ts
server/src/__tests__/worktree-config.test.ts`
- `NODE_ENV=test pnpm --filter @paperclipai/ui typecheck`
- `NODE_ENV=test pnpm --filter @paperclipai/server typecheck`
- Confirmed `git diff --name-only origin/master...HEAD | wc -l` is `38`.
- Confirmed no PR diff entries match `pnpm-lock.yaml`,
`.github/workflows/*`, or `screenshots/*`.

## Risks

- Medium review risk because this is a bundled rescue PR rather than
several narrow feature PRs.
- Invite flow and company cache behavior overlapped with newer upstream
changes; the merge resolution intentionally keeps the shared
`companiesListQueryOptions` helper while preserving local
existing-member invite behavior.
- Visual review evidence is no longer attached in-repo because
screenshots were removed from this PR per review request.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5-based coding agent, with repository tool access,
terminal execution, and git/GitHub CLI operations.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] UI screenshots were intentionally removed from this PR per review
request
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: CodexCoder <codexcoder@paperclip.local>
2026-05-25 07:25:26 -05:00
Devin Foley 96f0279e08 Make ACPX-Claude adapter work seamlessly (PAPA-388) (#6590)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies, so when
an adapter fails, the platform must surface enough detail for the next
agent (or human reviewer) to act
> - The `acpx_local` adapter wraps `claude-agent-acp`, which in turn
drives the Claude Code SDK — three layers, three different permission
and error-handling models
> - A user created a `Claude Local ACPX` agent in PAPA-387 and it failed
instantly with the generic `acpx.error / "Internal error"` log,
stranding the work and triggering an opaque `stranded_assigned_issue`
recovery to the CTO
> - Once the diagnostic blackbox was opened, the underlying cause turned
out to be two SDK-level mismatches: a model-name allowlist that rejects
bare IDs like `claude-opus-4-7`, and a Claude Code
permission/Read-sandbox configuration that silently denies every
non-allowlisted tool when the user's `~/.claude/settings.json` has
`defaultMode: "dontAsk"`
> - This pull request fixes both classes of failure in the adapter
itself so new ACPX agents work seamlessly without per-host
configuration, and widens the diagnostic surface so the *next* failure
of any kind is actionable
> - The benefit is that ACPX-Claude can join the regular agent roster —
verified end to end on PAPA-401, where the agent successfully reached
the Paperclip API, opened a worktree, surveyed existing notification
PRs, and posted a structured plan

## What Changed

- Widen ACPX failure diagnostics
(`packages/adapters/acpx-local/src/server/execute.ts`):
- Capture `err.name`, ACP code, `cause.message`, retryable flag, and a
5-frame stack preview into `errorMeta`.
- Promote phase-specific error codes: `ensure_session →
acpx_session_init_failed`, `configure_session →
acpx_session_config_failed`, `turn → acpx_turn_failed`, plus mapping for
`ACP_BACKEND_MISSING` / `ACP_BACKEND_UNAVAILABLE`.
- Set `verbose: true` on the ACPX runtime so its session-event log flows
through `ctx.onLog`.
- Capture child-process stderr via a wrapper-script tee into
`<stateDir>/run-stderr/<runId>.log`, inline the tail into the
`acpx.error` payload as `childStderrTail`, and forward it through
`ctx.onLog("stderr", …)` so it lands in the heartbeat `stderrExcerpt`
column (existing redaction applies).
- Set the model via `ANTHROPIC_MODEL` env for the `claude` agent instead
of `set_config_option(model, …)`. The ACP server's `set_config_option`
handler validates against an internal allowlist and rejects bare IDs
like `claude-opus-4-7`. `ANTHROPIC_MODEL` is read during initialization
and bypasses that check.
- Seed `<worktree>/.claude/settings.local.json` before spawning
`claude-agent-acp` (the seamless-API fix). Since `claude-agent-acp`
hard-codes `settingSources: ["user", "project", "local"]` and "local"
has the highest precedence:
- Set `permissions.defaultMode: "default"`, but **only** if the user's
value is missing or `"dontAsk"` (the broken case). Other modes like
`acceptEdits`/`plan` are preserved.
- Pre-allow Paperclip's Bash surface (`Bash(curl:*)`, `Bash(env:*)`,
`Bash(<cwd>/scripts/paperclip-issue-update.sh:*)`,
`Bash(<cwd>/scripts/paperclip:*)`).
- Widen `permissions.additionalDirectories` to include `stateDir`,
`agentHome`, and the per-company instance root
(`~/.paperclip/instances/<id>/companies/<companyId>`). Scoped to this
company only — does not expose other tenants.
- Existing user entries are merged, not replaced. The resolved roots are
folded into the session fingerprint so warm-session handles invalidate
when they change.
- Sync the existing server-side integration test
(`server/src/__tests__/acpx-local-execute.test.ts`) to assert
`acpx_session_init_failed` instead of the now-removed
`acpx_protocol_error` for `ACP_SESSION_INIT_FAILED` (a follow-up to
commit 1).

## Verification

- `pnpm --filter "@paperclipai/adapter-acpx-local" run typecheck` —
passes.
- `pnpm vitest run` in `packages/adapters/acpx-local` — 35/35 pass,
includes 4 new tests covering the settings.local.json write path (claude
only, merge with pre-existing content, `dontAsk` override, codex no-op).
- `pnpm vitest run src/__tests__/acpx-local-execute.test.ts` in
`server/` — 15/15 pass after the test-sync commit.
- End-to-end manual verification (PAPA-401): the `Claude Local ACPX`
agent that previously hit "restricted environment" now successfully
reaches the Paperclip API, opens its worktree, posts structured plan
comments, and flips the issue to `in_review` without any external
configuration.

## Risks

- **Low**, scoped to the `acpx_local` adapter. The settings.local.json
write is per-worktree (worktrees live under
`.paperclip/worktrees/<issue>/`) and only triggers when `acpxAgent ===
"claude"`. Existing user content is merged with `[...existing,
...paperclip]` and deduped — nothing is overwritten outright.
- The `defaultMode` override is intentionally narrow: it only flips
`"dontAsk"` (which silently denies every tool and is the root cause) to
`"default"`. Users who explicitly picked `acceptEdits`, `plan`, or any
other mode keep their choice.
- Stderr capture goes through the existing `log-redaction` pass before
persisting, so `PAPERCLIP_API_KEY` and similar secrets in the wrapper
env don't leak into heartbeat logs.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- Claude Opus 4.7 (`claude-opus-4-7`), running in the `claude_local`
adapter via Paperclip's harness. Extended thinking enabled, tool use
enabled.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A (adapter-only)
- [ ] I have updated relevant documentation to reflect my changes — no
user-facing docs changed; internal commentary in the code change
explains the SDK constraints
- [x] I have considered and documented any risks above
- [ ] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-23 13:01:27 -07:00
Devin Foley e3c875c1c7 fix(sandbox): prevent E2B workspace upload + lease idle failures (PAPA-382) (#6560)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Heartbeats run inside managed sandboxes (E2B, Cloudflare Sandbox),
and each run begins by uploading the agent's workspace as a tar archive
> - PAPA-381's E2B runs were failing at 5 and 11 minutes — two distinct
failure modes were entangled: workspace tar extraction errors on Linux,
and sandbox idle/lease timeouts during normal heartbeat gaps
> - Workspace tar extraction failed because macOS bsdtar embeds
`LIBARCHIVE.xattr.*` PAX headers that GNU tar on Linux rejects with
"This does not look like a tar archive"; the existing
`COPYFILE_DISABLE=1` only suppresses AppleDouble `._*` sidecars, not
inline PAX xattr entries
> - E2B sandboxes also expired between heartbeats because `timeoutMs`
defaulted to a short window and was never refreshed per execute, and
Cloudflare sandboxes idled out because `sleepAfter` defaulted to 10
minutes
> - This pull request adds `--no-xattrs` to the workspace tar
invocation, refreshes the E2B sandbox lifetime on each execute and bumps
the default `timeoutMs` to 1h, and raises the Cloudflare `sleepAfter`
default to 1h
> - The benefit is that long-running heartbeat-driven runs (Claude,
Codex, etc.) survive across both their initial workspace upload and the
natural idle gaps between executes on both E2B and Cloudflare

## What Changed

- `packages/adapter-utils/src/sandbox-managed-runtime.ts`: added
`--no-xattrs` to `createTarballFromDirectory` so macOS bsdtar produces a
clean POSIX tar that GNU tar on Linux can extract, with an inline
comment explaining why `COPYFILE_DISABLE=1` alone is insufficient.
- `packages/plugins/sandbox-providers/e2b/src/plugin.ts`: refresh the
sandbox lifetime on every execute (so long runs don't expire mid-job)
and raised the default `timeoutMs` to 1h.
- `packages/plugins/sandbox-providers/e2b/src/manifest.ts` and
`plugin.test.ts`: updated manifest defaults and added regression
coverage for the new behavior.
- `packages/plugins/sandbox-providers/cloudflare/src/config.ts`,
`manifest.ts`, `plugin.test.ts`: raised default `sleepAfter` from 10m to
1h, mirroring the E2B 1h default, and added a regression test asserting
the acquire-lease request body sends `sleepAfter: "1h"` when not
overridden.

## Verification

- `pnpm --filter @paperclipai/plugin-e2b test`
- `pnpm --filter @paperclipai/plugin-cloudflare-sandbox test`
- Locally cherry-picked the `--no-xattrs` fix onto master and confirmed
end-to-end via a real PAPA-381-style heartbeat-driven E2B run that the
workspace upload now extracts cleanly on Linux. The user (board
operator) tested this on master and reported "Ok, that worked."
- Manual reviewer steps: trigger an E2B heartbeat from a macOS host
(this is where the bsdtar xattr headers come from), confirm the
workspace tar extracts on the Linux sandbox side; run a long (>15 min)
Cloudflare sandbox flow and confirm no lost-lease/idle errors between
executes.

## Risks

- Low risk overall.
- `--no-xattrs` is widely supported by both macOS bsdtar and GNU tar
(Linux). Worst case it silently no-ops on a future host that doesn't
support it; in that case the existing failure mode reappears, it doesn't
introduce a new one.
- Raising default `timeoutMs` (E2B) and `sleepAfter` (Cloudflare) from
short values to 1h means sandboxes stay alive longer between executes by
default. This is the intended behavior — operators that want a tighter
idle window can still override via plugin config.
- E2B per-execute sandbox lifetime refresh adds a small API call per
execute; it is bounded by the same client that already handles execute
traffic, so no new dependencies or retry semantics.

## Model Used

- Claude (Anthropic), `claude-opus-4-7`, extended thinking enabled, tool
use enabled (file/grep/git tools and Paperclip control-plane API). Used
to diagnose the dual failure mode (workspace tar PAX xattr headers +
sandbox lifetime), write the fixes and tests, and drive the verification
loop with the board operator.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots (N/A — no UI changes)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-22 13:34:11 -07:00
Dotta ad6effa65c [codex] Improve runtime and import reliability (#6549)
## Thinking Path

> - Paperclip coordinates autonomous company work through local and
hosted runtime surfaces.
> - Local embedded Postgres and tenant import/export paths are
foundational reliability pieces.
> - A runtime failure in either path can stop agents or imports before
useful work begins.
> - The branch included remaining fixes for embedded native library
bootstrap and async tenant import handling.
> - This pull request groups those runtime/import reliability changes
into one standalone PR.
> - The benefit is a more robust local runtime and safer cloud tenant
import behavior.

## What Changed

- Prepared embedded Postgres native runtime before startup in
CLI/server/test entrypoints.
- Added embedded Postgres native bootstrap coverage.
- Added async tenant import job handling and deferred validation
coverage.
- Kept the runtime/import changes based directly on current
`origin/master` after related upstream PRs had already merged.

## Verification

- `pnpm --filter @paperclipai/plugin-sdk build`
- `NODE_ENV=test pnpm exec vitest run
packages/db/src/embedded-postgres-native.test.ts
server/src/__tests__/company-portability-routes.test.ts`

## Risks

- Medium-low: this touches startup/import paths, but the branch is small
and covered by targeted tests.
- The embedded Postgres change depends on platform-specific
native-library behavior, so CI and follow-up checks should still verify
supported runners.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI GPT-5 Codex via `codex_local`, tool-enabled coding session;
exact context window not exposed by this runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-22 09:57:22 -05:00
Dotta e43b392a79 [codex] Add local Cloud Upstream sync (#6548)
## Thinking Path

> - Paperclip is the control plane for AI-agent companies.
> - Operators need a path to move local company state toward Paperclip
Cloud without losing local-first control.
> - The Cloud Upstream flow needs API, persistence, CLI, and board UI
surfaces that agree on the same manifest/run model.
> - The existing branch had the feature work plus UX and error-handling
follow-ups.
> - This pull request packages the remaining Cloud Upstream sync work
into one standalone branch.
> - The benefit is an inspectable local-to-cloud sync workflow with
preview, conflicts, activation, and captured UX review states.

## What Changed

- Added Cloud Upstream shared types, server routes/services, and
persisted run schema/migration.
- Added Paperclip Cloud CLI sync helpers and local connection storage.
- Added the Cloud Upstream board UI, settings entry points, query keys,
and UX lab page.
- Added preview/activation checklist behavior, redirect handling,
manifest-only preview support, friendly errors, in-flight hints, and
entity count summaries.

## Verification

- `pnpm --filter @paperclipai/plugin-sdk build`
- `NODE_ENV=test pnpm exec vitest run cli/src/__tests__/cloud.test.ts
server/src/__tests__/instance-settings-routes.test.ts
server/src/__tests__/instance-settings-service.test.ts
ui/src/pages/CloudUpstream.test.tsx
ui/src/components/CompanySettingsSidebar.test.tsx`
- `NODE_ENV=test pnpm exec vitest run
server/src/__tests__/cloud-upstreams.test.ts`

Worktree setup note: the isolated worktree install skipped native sqlite
build scripts, so I copied the already-built local sqlite binding from
the main checkout before running
`server/src/__tests__/cloud-upstreams.test.ts`. The test then passed.

## Risks

- Medium: this adds a database migration and a broad feature path across
CLI/server/UI.
- Merge order: this is the only PR in this split with a DB migration;
merge it before any future Cloud Upstream migration follow-up.
- Mitigation: the PR is based directly on current `origin/master`, has
targeted route/service/UI tests, and keeps the feature behind existing
experimental Cloud Sync settings.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI GPT-5 Codex via `codex_local`, tool-enabled coding session;
exact context window not exposed by this runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, screenshot artifacts are
intentionally omitted per reviewer request
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-22 09:56:22 -05:00
Dotta a1835cfa5e [codex] Harden plugin runtime invocation scope (#6547)
## Thinking Path

> - Paperclip orchestrates AI-agent companies through a company-scoped
control plane.
> - Plugins extend that control plane, but plugin workers still call
back into host APIs.
> - Those worker-to-host calls need the same company boundary guarantees
as normal API routes.
> - Plugin action handlers also need authenticated actor context from
the host instead of trusting caller-supplied params.
> - This pull request hardens plugin bridge/action scope and keeps
plugin operation issues out of normal issue surfaces.
> - The benefit is safer plugin execution with clearer authorization
boundaries and better test coverage.

## What Changed

- Added host-owned invocation context plumbing for nested plugin worker
calls.
- Added actor context to plugin `performAction` calls and test harness
helpers.
- Enforced company invocation scope on worker-to-host calls and filtered
company lists to the active invocation scope.
- Extended plugin action route tests for board and agent actor context,
spoofed company params, and cross-company rejection.
- Extended plugin worker manager coverage for invocation-scope
propagation.
- Filtered typed and legacy plugin operation issue origins from default
issue/inbox lists.

## Verification

- `pnpm --filter @paperclipai/plugin-sdk build`
- `NODE_ENV=test pnpm exec vitest run
packages/plugins/sdk/tests/host-client-factory.test.ts
packages/plugins/sdk/tests/testing-actions.test.ts
server/src/__tests__/plugin-routes-authz.test.ts
server/src/__tests__/plugin-worker-manager.test.ts
server/src/__tests__/issues-service.test.ts`

Note: embedded Postgres issue-service tests reported host-level Postgres
init skip for 47 tests; the non-embedded targeted tests passed.

## Risks

- Medium: plugin host authorization paths are sensitive, and external
plugins may rely on previously loose company params.
- Mitigation: the change only tightens calls when the host attached a
company invocation scope and includes explicit tests for board, agent,
and nested worker calls.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI GPT-5 Codex via `codex_local`, tool-enabled coding session;
exact context window not exposed by this runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-22 09:16:24 -05:00
Dotta 38c185fb8b [codex] Add agent permissions and controls plan (#6386)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies by keeping
task ownership, approvals, and operator control inside one control
plane.
> - Agent permissions and plugin-hosted company settings sit on the
boundary between autonomy and governance.
> - V1 needs scoped task assignment rules, plugin extension points, and
clearer company access surfaces without weakening company boundaries.
> - The branch builds the core authorization service, plugin SDK/host
APIs, and UI simplifications needed to support those controls.
> - Paperclip EE plugin surfaces were intentionally moved out of this
core PR per review direction, so this PR now carries only the public
core/plugin infrastructure work.
> - The latest updates preserve the PAP-9937 branch changes that belong
in this PR, remove the `design/` artifacts, and exclude the experimental
`plugin-briefs` package.
> - Greptile feedback was applied through the authorization/audit paths
and the final cleanup commit was re-reviewed at 5/5 with no unresolved
Greptile threads.
> - The benefit is safer assignment control with extension hooks for
richer permission products while preserving simple defaults for normal
operators.

## What Changed

- Added scoped task-assignment authorization decisions and routed
issue/agent assignment mutations through the authorization service.
- Added plugin SDK and host APIs for company settings slots,
authorization policy/grant management, assignment previews, and bridge
invocation scope propagation.
- Simplified core company access UI and moved advanced controls behind
plugin-provided settings surfaces.
- Added retry-now affordances for blocked issue next-step notices.
- Added protected-assignment enforcement for persisted
agent/project/issue policies, including explicit-grant fallback
behavior.
- Added incremental principal-access compatibility backfill for active
agent memberships and role-default human permission grants.
- Added the Markdown code block wrap action fix from the latest branch
changes.
- Removed `design/` artifacts from the PR and removed
`packages/plugins/plugin-briefs` from the final diff.
- Addressed Greptile feedback for plugin actor sanitization, legacy
membership handling, audit pagination, unknown grant-scope metadata, and
startup test mocks.

## Verification

- `pnpm exec vitest run server/src/__tests__/access-service.test.ts
server/src/__tests__/company-portability.test.ts` -> 2 files passed, 54
tests passed.
- `pnpm exec vitest run
server/src/__tests__/server-startup-feedback-export.test.ts
server/src/__tests__/access-service.test.ts
server/src/__tests__/company-portability.test.ts` -> 3 files passed, 62
tests passed.
- `pnpm exec vitest run
server/src/__tests__/authorization-service.test.ts
server/src/__tests__/plugin-access-authorization-host-services.test.ts
server/src/__tests__/server-startup-feedback-export.test.ts` -> 3 files
passed, 28 tests passed.
- `pnpm --filter @paperclipai/server typecheck` -> passed.
- `git diff --check` -> passed.
- `node ./scripts/check-docker-deps-stage.mjs` -> passed.
- `CI=true pnpm install --frozen-lockfile --ignore-scripts` -> passed
with no lockfile update.
- `pnpm exec vitest run
ui/src/components/MarkdownBody.interaction.test.tsx` -> 1 test passed.
- `git ls-files design packages/plugins/plugin-briefs | wc -l` -> 0.
- GitHub CI on `40cd83b53` -> all checks passed, merge state `CLEAN`.
- Greptile on `40cd83b53` -> 5/5, 102 files reviewed, 0
comments/annotations added, 0 unresolved review threads.
- Confirmed the PR diff contains no `design/`,
`packages/plugins/plugin-briefs`, `pnpm-lock.yaml`, or
`.github/workflows` changes.

## Risks

- Medium: task assignment authorization paths are behaviorally stricter
for protected/private policy data, so existing plugin-authored policies
may block assignment until explicit grants or approval flows are
configured.
- Medium: plugin-host authorization APIs expand the surface area
available to trusted plugins and need careful review for company
scoping.
- Low: startup now performs a principal-access compatibility backfill,
but the migration and runtime backfill use conflict-tolerant inserts.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool-enabled workflow with shell,
git, and GitHub CLI access.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-22 08:12:52 -05:00
Dotta c91a062326 [codex] Runtime control-plane fixes (#6380)
## Thinking Path

> - Paperclip orchestrates AI agents through a server-side control plane
> - That control plane depends on reliable issue state transitions,
plugin lifecycle behavior, import limits, and startup/shutdown handling
> - Several small runtime fixes had accumulated on the working branch
and were mixed with larger feature work
> - Keeping them separate makes the correctness fixes reviewable and
mergeable without waiting for cloud-sync UI work
> - This pull request groups the server/runtime control-plane fixes into
one standalone branch
> - The benefit is a tighter, safer runtime baseline for retries,
imports, plugin migrations, feedback flushing, and trusted cloud import
handling

## What Changed

- Fixed updated issue list pagination sorting and scheduled retry
comment handling.
- Re-applied pending plugin migrations during hot reload and fixed
plugin-schema worktree seed restore.
- Hardened public tenant DB startup, portable import body limits,
trusted cloud import errors, and trusted cloud tenant import mutation
access.
- Expired stale request confirmations after user comments.
- Added feedback export shutdown hardening so database-unavailable flush
loops stop cleanly.
- Guarded plugin worker `error` event emission when no listener is
registered.

## Verification

- `pnpm install --frozen-lockfile --ignore-scripts`
- `pnpm --filter @paperclipai/plugin-sdk build`
- `npm run install --prefix
node_modules/.pnpm/sqlite3@5.1.7/node_modules/sqlite3`
- `pnpm exec vitest run server/src/__tests__/issues-service.test.ts
server/src/__tests__/plugin-lifecycle-restart.test.ts
server/src/__tests__/server-startup-feedback-export.test.ts
server/src/__tests__/issue-comment-reopen-routes.test.ts
server/src/__tests__/issue-thread-interactions-service.test.ts
server/src/__tests__/issue-thread-interaction-routes.test.ts
server/src/__tests__/body-limits.test.ts
server/src/__tests__/feedback-flush-controller.test.ts
server/src/__tests__/error-handler.test.ts
server/src/__tests__/board-mutation-guard.test.ts
packages/db/src/backup-lib.test.ts` initially exposed local setup issues
and two 5s test timeouts.
- Rerun after local prereq build: `pnpm exec vitest run --testTimeout
15000 server/src/__tests__/issue-comment-reopen-routes.test.ts
server/src/__tests__/issue-thread-interaction-routes.test.ts
server/src/__tests__/feedback-flush-controller.test.ts
server/src/__tests__/server-startup-feedback-export.test.ts` passed.
- Some embedded Postgres-backed tests skipped on this host because local
Postgres init was unavailable.

## Risks

- Runtime-touching branch: startup/shutdown and issue interaction
behavior should be reviewed carefully.
- The feedback export change disables repeated flush attempts only for
database connection-refused failures; other upload failures still log
normally.
- The plugin worker error guard avoids process crashes from unhandled
EventEmitter errors but may hide errors from code paths that expected an
emitted listener.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5-based coding agent with local shell/git/tool use.
Exact hosted model ID and context-window size are not exposed by the
local Paperclip adapter runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-20 10:37:11 -05:00
Dotta 43c5bb81b6 [codex] Workspace diff polish (#6383)
## Thinking Path

> - Paperclip gives operators a workspace diff plugin so they can
inspect agent changes before review
> - The diff view needs reliable base-ref defaults and controls that
stay usable while scrolling large diffs
> - The working branch mixed those plugin improvements with unrelated
server and cloud work
> - Keeping the workspace diff plugin changes isolated makes them easy
to test and review
> - This pull request polishes the workspace diff plugin controls,
base-ref behavior, and sticky headers
> - The benefit is a more predictable diff review surface for agent
workspaces

## What Changed

- Fixed workspace diff default base-ref resolution.
- Improved split/unified and working-tree/against-ref pane controls.
- Made workspace diff headers stay sticky while scrolling.
- Added a review screenshot at
`screenshots/PAP-9841-workspace-diff.png`.

## Verification

- `pnpm install --frozen-lockfile --ignore-scripts`
- `pnpm --filter @paperclipai/plugin-sdk build`
- `pnpm --filter @paperclipai/plugin-workspace-diff exec vitest run
tests/plugin.spec.ts`
- Result: 9 tests passed.

## Risks

- UI-only plugin branch with low data risk.
- The default base-ref inference should be reviewed against unusual
worktree/upstream combinations.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5-based coding agent with local shell/git/tool use.
Exact hosted model ID and context-window size are not exposed by the
local Paperclip adapter runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-19 15:51:13 -05:00
Dotta d67347be77 [codex] Provider vault secrets UX (#6381)
## Thinking Path

> - Paperclip orchestrates AI agents that need scoped, auditable access
to secrets
> - Hosted and external deployments need provider vault configuration
without exposing secret values in Paperclip metadata
> - AWS Secrets Manager vault setup previously required too much manual
operator knowledge
> - Provider vault discovery and removal belong together as an
independent secrets-management improvement
> - This pull request adds AWS provider vault discovery/prefill plus
vault removal flows
> - The benefit is a safer operator path for configuring external secret
storage before higher-level cloud workflows depend on it

## What Changed

- Added shared validators/types for AWS provider vault discovery
payloads and safe provider metadata.
- Implemented AWS provider vault discovery preview on the server.
- Added provider vault removal service/route behavior.
- Added Secrets page UI for discovery prefill, removal messaging, and
related rendering coverage.
- Added Storybook provider-vault fixtures and captured screenshots for
the new UX states.

## Verification

- `pnpm install --frozen-lockfile --ignore-scripts`
- `pnpm exec vitest run packages/shared/src/validators/secret.test.ts
server/src/__tests__/aws-secrets-manager-provider.test.ts
server/src/__tests__/secrets-routes.test.ts
server/src/__tests__/secrets-service.test.ts
ui/src/pages/Secrets.render.test.tsx`
- Result: 4 files passed, 1 embedded Postgres-backed file skipped on
this host because local Postgres init was unavailable.
- `pnpm --filter @paperclipai/ui exec vitest run
src/pages/Secrets.render.test.tsx`
- `pnpm --filter @paperclipai/ui typecheck`
- Storybook screenshot capture against `Product/Secrets` on
`http://127.0.0.1:60381/iframe.html?id=product-secrets--secrets-inventory&viewMode=story&globals=theme:dark`

## Screenshots

Provider vaults tab after this change:

![Provider vaults
tab](https://raw.githubusercontent.com/paperclipai/paperclip/pap-9861-provider-vault-secrets/doc/screenshots/pr-6381/provider-vaults-tab.png)

AWS discovery candidate flow:

![AWS discovery candidate
flow](https://raw.githubusercontent.com/paperclipai/paperclip/pap-9861-provider-vault-secrets/doc/screenshots/pr-6381/aws-discovery-candidates.png)

Provider vault removal confirmation:

![Provider vault removal
confirmation](https://raw.githubusercontent.com/paperclipai/paperclip/pap-9861-provider-vault-secrets/doc/screenshots/pr-6381/remove-provider-vault-confirmation.png)

## Risks

- Secret provider metadata handling must remain non-sensitive;
validators reject credential-bearing Vault URLs and sensitive AWS
discovery keys.
- AWS discovery depends on deployment credentials being configured
correctly outside Paperclip-managed company secrets.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5-based coding agent with local shell/git/tool use.
Exact hosted model ID and context-window size are not exposed by the
local Paperclip adapter runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-19 15:50:23 -05:00
Devin Foley 4b1e92a588 feat(plugins): add Modal sandbox provider plugin (#6245)
## Thinking Path

> - Paperclip orchestrates AI agents through company-scoped
control-plane workflows and extensible runtime integrations.
> - Sandbox providers are part of that extension surface because they
let agents execute isolated work without baking each provider into the
core server.
> - Modal already offers managed sandboxes with filesystem, process,
timeout, and networking controls that map onto Paperclip's sandbox
provider contract.
> - The repo did not have a Modal provider plugin, so teams wanting
Modal-backed sandboxes had no first-party integration path.
> - This pull request adds a standalone
`packages/plugins/sandbox-providers/modal` plugin that implements the
provider contract, worker entrypoint, docs, and tests.
> - The benefit is that Modal can now be installed as a provider plugin
without expanding the core control-plane surface area.

## What Changed

- Added a new `packages/plugins/sandbox-providers/modal` package with
the plugin manifest, worker entrypoint, and exported plugin surface.
- Implemented Modal-backed sandbox lifecycle support for creation,
command execution, file operations, networking options, termination, and
metadata translation.
- Added focused Vitest coverage for config validation, env handling,
lifecycle flows, networking behavior, and error mapping.
- Documented installation, configuration, and usage requirements in the
plugin README.
- Removed misleading `MODAL_TOKEN_*` fallback behavior so authentication
relies on supported Modal credentials only.

## Verification

- `pnpm -r typecheck`
- `pnpm test:run`
- `pnpm build`
- `cd packages/plugins/sandbox-providers/modal && pnpm test`

## Risks

- Low to medium risk: this is isolated to a new plugin package, but
runtime behavior still depends on live Modal account credentials and
service-side sandbox semantics.
- Modal's current docs target a newer Node baseline than the repo
default, so the first live install should confirm credential loading and
sandbox startup behavior in a real Modal workspace.
- No UI or schema changes are included in this PR.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex via Paperclip `codex_local` agent (GPT-5-class Codex
coding model; exact backend model ID is not exposed by the runtime),
with tool use, shell execution, and code-editing capabilities enabled.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-18 08:36:34 -07:00
Dotta 5071c4c776 [codex] Add workspace diff viewer plugin (#6071)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Operators need to inspect what agents changed inside execution and
project workspaces.
> - The existing workspace detail views did not provide a first-party
rich diff surface for staged, unstaged, head, renamed, binary,
oversized, and untracked changes.
> - The plugin system is the intended extension point for optional rich
UI surfaces.
> - This pull request adds a workspace diff plugin plus host services
and shared contracts so Changes tabs can render workspace diffs through
plugin slots.
> - The diff-renderer dependency should stay owned by the plugin package
rather than the core UI app.
> - The dependency surface must stay aligned with repository PR policy,
including intentionally omitting `pnpm-lock.yaml` from the PR.
> - The benefit is a more reviewable workspace surface without
hard-coding the renderer into every page.

## What Changed

- Added `@paperclipai/plugin-workspace-diff`, including diff
normalization, plugin manifest/worker/UI entrypoints, and focused plugin
tests.
- Kept `@pierre/diffs` scoped to `@paperclipai/plugin-workspace-diff`;
removed the core UI lab diff-renderer surface and direct UI package
dependency.
- Added shared workspace diff types and validators, plus plugin SDK
surface for workspace diff host services.
- Added server workspace diff service support and route coverage for
execution/project workspace diff flows.
- Wired Execution Workspace and Project Workspace Changes tabs to load
the diff plugin, including loading/error fallback behavior.
- Added UI tests and fixtures for the Changes tabs and plugin bridge
behavior.
- Added the new plugin package manifest to the Docker deps stage so PR
policy can validate dependency coverage.
- Addressed review hardening around empty untracked patches, workspace
path exposure, project workspace read capability checks, and default
base refs.

## Verification

- `pnpm --filter @paperclipai/plugin-workspace-diff test`
- `pnpm exec vitest run
packages/shared/src/validators/workspace-diff.test.ts
server/src/__tests__/workspace-diff-service.test.ts
ui/src/pages/ProjectWorkspaceDetail.test.tsx
ui/src/pages/ExecutionWorkspaceDetail.test.tsx`
- `pnpm exec vitest run ui/src/plugins/bridge.test.ts
server/src/__tests__/workspace-runtime-routes-authz.test.ts`
- `pnpm --filter @paperclipai/shared typecheck`
- `pnpm --filter @paperclipai/plugin-workspace-diff typecheck`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm --filter @paperclipai/ui typecheck`
- `node ./scripts/check-docker-deps-stage.mjs`
- Browser screenshot captured from the local worktree dev server:
https://files.catbox.moe/ofdpsp.png
- Confirmed branch is rebased onto `public-gh/master`,
`.github/workflows/pr.yml` is not included in the PR diff,
`ui/package.json` is not included in the PR diff, and `pnpm-lock.yaml`
is not included in the PR diff.

## Risks

- Medium UI integration risk: the Changes tab depends on the plugin slot
and host diff service path.
- Medium dependency risk: this adds `@pierre/diffs` in the plugin
package, but `pnpm-lock.yaml` is intentionally omitted per packaging
instructions because repository automation manages lockfile updates.
- Current CI blocker: downstream frozen installs fail until the
repository policy path for new plugin package dependencies is chosen.
- Diff rendering edge cases are covered for common working-tree and head
diff states, but very large repositories may still expose performance
limits.
- No migrations are included.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 class coding model, tool-enabled local execution
environment. Exact context window was not exposed by the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-18 08:50:06 -05:00
Dotta d734bd43d1 [codex] Roll up May 17 branch changes (#6210)
## Thinking Path

> - Paperclip is the control plane for autonomous AI companies, so agent
work needs visible ownership, recovery, and operator controls.
> - This local branch had accumulated several related control-plane
reliability and operator-experience fixes across recovery actions,
watchdog folding, model-profile defaults, mentions, markdown editing,
plugin launchers, and small UI polish.
> - The branch needed to be converted into a PR against the current
`origin/master` without losing dirty work or including lockfile/workflow
churn.
> - The safest standalone shape is a single rollup PR because the
recovery/server/UI files overlap heavily across the local commits and
splitting would create avoidable conflicts.
> - This pull request replays the local branch onto latest
`origin/master`, preserves the uncommitted work as logical commits, and
adds a Zod 4 validator compatibility fix found during verification.
> - The benefit is that the May 17 local branch can be reviewed and
merged as one coherent, conflict-free branch under the 100-file Greptile
limit.

## What Changed

- Rebased the local May 17 branch work onto current `origin/master` in a
dedicated worktree.
- Preserved and committed previously dirty changes for recovery retry
handling, plugin/sidebar launcher polish, and `.herenow` ignores.
- Added recovery-action behavior for returning source issues to `todo`
when retrying source-scoped recovery.
- Included the existing local recovery/liveness/watchdog fold, Codex
cheap-profile, markdown/mention, duplicate-agent, and UI polish commits
from the branch.
- Normalized shared validator `z.record(...)` schemas to explicit
string-key records for Zod 4 compatibility.
- Confirmed the PR has no `pnpm-lock.yaml` or `.github/workflows/*`
changes and stays below the 100-file Greptile limit.

## Verification

- `pnpm install --frozen-lockfile --ignore-scripts`
- `npm run install` in
`node_modules/.pnpm/sqlite3@5.1.7/node_modules/sqlite3` to build the
local native sqlite3 binding after installing with scripts disabled
- `pnpm exec vitest run packages/shared/src/validators/issue.test.ts
packages/shared/src/project-mentions.test.ts
packages/adapter-utils/src/server-utils.test.ts
server/src/__tests__/heartbeat-model-profile.test.ts
server/src/__tests__/issue-recovery-actions.test.ts
server/src/__tests__/issue-agent-mutation-ownership-routes.test.ts
server/src/__tests__/heartbeat-active-run-output-watchdog.test.ts
server/src/__tests__/plugin-local-folders.test.ts
ui/src/components/IssueRecoveryActionCard.test.tsx
ui/src/components/Sidebar.test.tsx
ui/src/components/SidebarAccountMenu.test.tsx
ui/src/components/IssueProperties.test.tsx
ui/src/components/MarkdownEditor.test.tsx
ui/src/components/MarkdownBody.test.tsx
ui/src/lib/duplicate-agent-payload.test.ts
ui/src/pages/Routines.test.tsx`
- First pass: 13 files passed with 201 passing tests; 3 server files
failed before sqlite3 native binding was built.
- After rebuilding sqlite3:
`server/src/__tests__/heartbeat-model-profile.test.ts`,
`server/src/__tests__/issue-recovery-actions.test.ts`, and
`server/src/__tests__/heartbeat-active-run-output-watchdog.test.ts`
passed/loaded; embedded Postgres tests were skipped by the local host
guard.
- `pnpm --filter @paperclipai/shared typecheck`
- `pnpm --filter @paperclipai/adapter-utils typecheck`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm --filter @paperclipai/ui typecheck`

## Risks

- Medium risk: this is a broad rollup PR across recovery semantics,
server tests, shared validators, and UI surfaces.
- Some embedded Postgres tests skipped locally due the host guard, so CI
should provide the stronger database-backed signal.
- UI changes were covered by component tests, but no browser screenshot
was captured in this PR creation pass.
- This branch may overlap with existing recovery/liveness PR work; merge
this PR independently or restack/close overlapping branches rather than
merging duplicate implementations together.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5-based coding agent, tool-enabled local repository
and GitHub workflow, medium reasoning effort.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-17 17:15:06 -05:00
Dotta 705c1b8d81 [codex] Add routine env secrets support (#6212)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Scheduled routines are the control-plane path for recurring agent
work.
> - Routines already had dispatch/history, but their runtime environment
did not carry routine-owned secret bindings through execution.
> - Operators need routine-specific secrets that can override
project/agent env without exposing secret values in history, logs, or
access events.
> - This pull request adds the routine env runtime contract, wires it
into execution, and makes the routine UI/history surfaces show safe
secret metadata.
> - The benefit is that routine executions can use scoped secret refs
predictably while preserving company boundaries and auditability.

## What Changed

- Added routine env persistence/runtime support, including
`routines.env`, `routine_runs.routine_revision_id`, revision snapshots,
and idempotent migration `0086_routine_env_runtime_contract`.
- Resolved routine env during heartbeat adapter config assembly with
precedence `agent < project < routine` and secret access events recorded
against the routine consumer.
- Added secret binding synchronization for routine create/update/restore
flows and guarded cross-company, missing, disabled, and deleted secret
cases.
- Added a Secrets tab to routine detail, env/secret history diff
rendering, and Storybook coverage for the new UI states.
- Added server/UI regression tests, including an embedded-Postgres QA
path for routine secret execution and restore behavior.
- Updated implementation/database docs for routine env and
secret-binding behavior.

## Verification

- `pnpm install --frozen-lockfile` after rebasing onto
`public-gh/master` to refresh workspace links for the newly-added
upstream Grok adapter package.
- `pnpm exec vitest run
server/src/__tests__/heartbeat-project-env.test.ts
server/src/__tests__/routines-service.test.ts
server/src/__tests__/secrets-service.test.ts
server/src/__tests__/qa-routine-secrets-e2e.test.ts
ui/src/components/RoutineHistoryTab.test.tsx` passed: 5 files, 92 tests.
- `pnpm -r typecheck` passed across the workspace.
- `pnpm build` passed. Vite emitted the existing
large-chunk/dynamic-import warnings.
- UI screenshots were captured locally during QA in
`artifacts/pap-9521/` and `artifacts/pap-9522/`; generated screenshots
are not committed to avoid adding binary artifacts to the repo.

## Risks

- Migration risk is limited by `IF NOT EXISTS` guards for the new
columns, FK, and index, and the migration is ordered as `0086`
immediately after upstream `0085`.
- Runtime behavior changes env precedence for routine executions by
adding routine env as the highest-precedence layer; tests cover
agent/project/routine precedence.
- Secret handling is security-sensitive; tests cover value-free
manifests/events/errors, disabled/missing/deleted secrets, and
cross-company rejection.
- UI history now renders routine env/secret diffs; tests and Storybook
stories cover the main rendering paths.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex coding agent based on GPT-5, with shell/tool use and
medium reasoning effort.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-17 16:30:34 -05:00
Devin Foley 573e9ec909 fix(grok-local): restore turn boundaries in streaming reasoning text (#6142)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The `grok-local` adapter streams reasoning text to the issue
"Working..." panel as the grok CLI runs
> - The `grok` CLI's `--output-format streaming-json` mode silently
drops the `\n` separator between reasoning turns around tool calls
> - Consecutive `thought` chunks (e.g. `` "`" `` followed by `"The"`)
arrive with no intervening whitespace event, so the UI's `delta: true`
concatenator merged them into run-on text like `"…planningGreat, now I
have the issue descriptionThe only co"`
> - This PR adds a small turn-boundary helper that detects sentence
boundaries in the upstream `thought` stream and inserts a single `\n`
only when the previous chunk ended with sentence punctuation (or a
balanced closing backtick) AND the next chunk begins a new uppercase
sentence
> - The benefit is readable streaming reasoning in the UI without
changing how completed messages are stored

## What Changed

- Added `packages/adapters/grok-local/src/shared/turn-boundary.ts` with
per-stream state (last chunk + backtick parity) and a
`restoreTurnBoundary()` helper that inserts `\n` only between balanced,
sentence-terminated `thought` chunks
- Wired the helper into `parseGrokJsonl` (server) and added a new
`createGrokStdoutParser` factory used by `grokLocalUIAdapter` for the
live "Working..." panel
- Added focused tests in `shared/turn-boundary.test.ts`, plus regression
assertions in `server/parse.test.ts` and `ui/parse-stdout.test.ts`

## Verification

- `pnpm --filter @paperclip/grok-local test` — 23/23 adapter tests pass
- `pnpm --filter @paperclip/grok-local typecheck` and UI typecheck —
clean
- Replayed an actual broken `grok 0.1.210` stream from the report;
previously-merged boundaries (`` `ls`The ``, `returned:Confirmed`) now
render with a separating newline; chunks inside un-closed backtick spans
are left alone

## Risks

- Low risk. Boundary insertion only fires when prev ends with
`.`/`!`/`?`/balanced `` ` `` and next begins with an uppercase ≥2-char
word, with no whitespace on either side. Worst case: a rare missed split
or a misplaced newline inside reasoning — both purely cosmetic and
confined to the live streaming panel.

## Model Used

- Claude Opus 4.7 (claude-opus-4-7), Anthropic, extended thinking + tool
use via Claude Code

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-16 11:48:51 -07:00
Devin Foley ab8b471685 Add built-in grok_local adapter (#6087)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies, so
adapter quality directly affects what runtimes the control plane can
supervise.
> - Local CLI adapters are one of the core execution surfaces because
they turn real coding tools into Paperclip-managed employees with
heartbeats, transcripts, and reviewability.
> - Grok Build was installed on the Paperclip host, but Paperclip had no
built-in `grok_local` adapter, so the runtime could not be configured
through the normal server/UI/CLI adapter path.
> - That gap needed to be closed with the same built-in registry,
environment diagnostics, transcript parsing, and skill/instructions
behavior that the other local adapters already rely on.
> - After the initial adapter landed, a real follow-up run showed that
Grok streaming text was being rendered one fragment per line, which made
transcripts harder to read even though the runtime itself was working.
> - This pull request adds the built-in `grok_local` adapter end-to-end
and then fixes the transcript parser so streamed Grok output is
coalesced into readable assistant/thinking blocks.
> - The benefit is that Grok Build becomes a first-class Paperclip
runtime with a usable operator experience instead of a partially wired
runtime with noisy transcript output.

## What Changed

- Added a new built-in `@paperclipai/adapter-grok-local` package with
server, UI, and CLI entrypoints.
- Implemented Grok execution, session handling, environment diagnostics,
config building, skill syncing, and parser coverage inside the new
adapter package.
- Registered `grok_local` across the built-in adapter inventories and
capability/display metadata in server, UI, CLI, and shared constants.
- Added adapter route coverage for the new built-in type.
- Fixed Grok transcript readability by emitting streamed `text` and
`thought` fragments as deltas so the shared transcript builder coalesces
them into readable message blocks.
- Added regression tests for the Grok parser and transcript coalescing
behavior.

## Verification

- `pnpm vitest run
packages/adapters/grok-local/src/ui/parse-stdout.test.ts
ui/src/adapters/transcript.test.ts`
- `pnpm --filter @paperclipai/adapter-grok-local build`
- Manual runtime verification on the Paperclip host during
implementation and follow-up review:
  - confirmed the Grok CLI was installed and authenticated
- confirmed the worktree dev server could be restarted cleanly and
health-checked after the parser follow-up
- No screenshots attached. This change is primarily adapter plumbing
plus transcript formatting behavior; reviewers can verify via the
Grok-backed run surfaces directly.

## Risks

- This adds a new built-in adapter, so any missed registration surface
could create inconsistencies between server, UI, and CLI behavior.
- The adapter depends on Grok Build's current event/output shape; if
upstream Grok streaming JSON changes, transcript parsing or session
extraction may need follow-up updates.
- The transcript readability fix intentionally changes how Grok
fragments are grouped, so any downstream code that implicitly expected
one entry per fragment would behave differently.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex via Paperclip `codex_local` agent runtime.
- GPT-5-class coding model with tool use, shell execution, file editing,
and repo inspection enabled.
- Exact backend model ID/context window were not surfaced to the agent
in this Paperclip session.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-16 09:51:09 -07:00
Dotta 4c47eb46c3 [codex] Add multilingual issue preservation coverage (#6069)
## Thinking Path

> - Paperclip orchestrates AI agents for autonomous companies.
> - Agents and board operators coordinate through company-scoped issues,
comments, documents, and heartbeat wake payloads.
> - Chinese, Japanese, and Hindi text needs to survive the full issue
lifecycle without normalization or prompt serialization damage.
> - The riskiest paths are board issue creation, server
issue/comment/document round-tripping, and scoped wake prompt rendering.
> - This pull request adds focused regression coverage across those
surfaces.
> - The benefit is higher confidence that multilingual operators and
agents can create, search, comment on, complete, and wake on issues
using non-Latin text.

## What Changed

- Added adapter-utils wake payload and prompt rendering coverage for
Chinese, Japanese, and Hindi issue/comment text.
- Added UI New Issue dialog coverage proving multilingual title and
description text is submitted unchanged.
- Added server route coverage that round-trips multilingual issue text
through create, search, comments, documents, completion comments, and
heartbeat context.
- Addressed Greptile feedback by using a typed storage mock and
splitting the server route integration path into smaller ordered
assertions.

## Verification

- `pnpm exec vitest run packages/adapter-utils/src/server-utils.test.ts
ui/src/components/NewIssueDialog.test.tsx
server/src/__tests__/multilingual-issues-routes.test.ts`
- Result: 3 test files passed, 51 tests passed.

## Risks

- Low risk: this PR adds regression coverage only and does not change
runtime behavior.
- The new server test uses embedded Postgres support and skips on
unsupported hosts using the existing helper pattern.
- No migrations are included.
- No `pnpm-lock.yaml` changes are included.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected - check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 based coding agent, with shell, git, Vitest, and
GitHub connector/CLI tool use.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-15 12:49:57 -05:00
Dotta eb38b226c2 Fix LLM Wiki package and migration validation (#6010)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Plugins extend the control plane with optional capabilities such as
LLM Wiki.
> - LLM Wiki needs its package assets and plugin-owned database
migrations to work when installed from the packaged plugin.
> - The bundled spaces migration used validation-hostile dynamic SQL,
and the packaged plugin could omit non-dist runtime assets.
> - This pull request makes the LLM Wiki package include its required
assets and cuts the spaces migration over to explicit, idempotent SQL
that passes the production plugin database validator.
> - The benefit is a simpler plugin install path that validates and
applies the bundled LLM Wiki migrations without adding plugin-specific
legacy handling to Paperclip core.

## What Changed

- Added the LLM Wiki package asset allowlist so agents, migrations,
skills, templates, dist output, and README are included when packaged.
- Renamed the bootstrap `.gitignore` template to `gitignore.template`
and updated the runtime lookup so package tooling does not drop the
hidden template file.
- Relaxed plugin migration validation to allow namespace-scoped
`INSERT`/`UPDATE` backfills and `CREATE INDEX` statements while
continuing to reject destructive or cross-namespace SQL.
- Replaced the LLM Wiki spaces migration's dynamic constraint-drop DO
block with explicit `DROP CONSTRAINT IF EXISTS` statements.
- Replaced fragile regex-source dispatch in SQL reference extraction
with explicit capture-group descriptors.
- Added regression coverage that applies the bundled LLM Wiki migrations
through the production validator and checks the expected constraints.

## Verification

- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/plugin-database.test.ts --pool=forks
--poolOptions.forks.isolate=true`
- `pnpm --filter @paperclipai/plugin-llm-wiki build`
- `git diff --check`
- Confirmed `pnpm-lock.yaml` is not included in the branch diff.

## Risks

- Low migration risk for current users: LLM Wiki spaces are new, so this
intentionally cuts over the plugin migration instead of adding legacy
handling in core.
- Validator behavior is broader than before, but still requires fully
qualified plugin namespace targets, blocks deletes/destructive DDL, and
keeps public table access read-only and allowlisted.

> Checked [`ROADMAP.md`](ROADMAP.md); this is a targeted plugin
packaging/migration fix and does not duplicate planned core feature
work. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 based coding agent, tool-enabled local repo
access, reasoning mode managed by the Paperclip/Codex runtime. Exact
context window was not surfaced in this session.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-15 10:20:02 -05:00
Dotta 03ad5c5bea [codex] Add issue document locking (#6009)
## Thinking Path

> - Paperclip orchestrates AI-agent companies through company-scoped
issues, comments, and issue documents.
> - Issue documents are the durable place where plans, handoffs, and
other work artifacts are revised over time.
> - Some documents need to be preserved as operator-approved snapshots
while agents continue working on the same issue.
> - Without document locking, a later board or agent write can overwrite
the document key that reviewers expected to remain stable.
> - This pull request adds board-managed issue document locks and makes
agent writes to locked keys create a derived document instead of
mutating the locked document.
> - The benefit is safer document handoffs: approved or frozen issue
documents stay immutable until the board explicitly unlocks them.

## What Changed

- Added `locked_at`, `locked_by_agent_id`, and `locked_by_user_id`
document fields plus migration `0085_tranquil_the_executioner.sql`.
- Added document lock/unlock service behavior, route endpoints, activity
events, and locked-document write protections.
- Made agent document writes to locked keys create a new derived key
such as `plan-2` rather than overwriting the locked document.
- Surfaced lock state through shared issue document types, UI API
methods, document header lock controls, and activity formatting.
- Added server and UI tests for lock/unlock behavior, locked document
immutability, and UI action visibility.
- Updated `doc/SPEC-implementation.md` with the V1 document lock
contract and endpoints.

## Verification

- `git rebase public-gh/master` completed cleanly after committing the
branch changes.
- `git diff --check` passed before commit.
- `pnpm run preflight:workspace-links && pnpm exec vitest run
server/src/__tests__/documents-service.test.ts
server/src/__tests__/issue-agent-mutation-ownership-routes.test.ts
ui/src/components/IssueDocumentsSection.test.tsx
ui/src/components/IssueContinuationHandoff.test.tsx
ui/src/lib/document-revisions.test.ts` passed: 5 files, 32 tests.

## Risks

- Medium risk because this changes the document persistence contract and
adds a migration.
- The migration uses `ADD COLUMN IF NOT EXISTS` and guarded foreign-key
creation so it remains safe for users who may have already applied an
earlier copy of the migration.
- Locked documents intentionally reject board edits/deletes/restores
until unlocked; any existing workflows that expected direct overwrite
need to unlock first.
- Agent writes to locked keys now create derived documents, which may
create extra issue documents when agents retry locked writes.

## Model Used

- OpenAI Codex coding agent based on GPT-5, with tool use and local code
execution in the Paperclip worktree.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-15 08:54:55 -05:00
Devin Foley 1bd44c8a0d Harden Cloudflare sandbox execution (#5967)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Remote-managed adapters need sandbox/environment execution to behave
like real agent runs, not just local host probes.
> - The Cloudflare sandbox path was the weakest leg in the SSH +
Cloudflare QA matrix because bridge execution could truncate output,
time out long-running installs, and under-provision the worker instance.
> - That made several adapters fail for reasons unrelated to their
actual business logic, which blocks confidence in Paperclip's non-local
environment model.
> - This pull request hardens the Cloudflare bridge/runtime path and
adjusts sandbox probe budgets so adapter verification matches the
measured behavior of the fixed environment.
> - It also corrects the Pi sandbox install command so the QA matrix
exercises a real, supported install path.
> - The benefit is a materially more reliable SSH + Cloudflare adapter
matrix with fewer false negatives and clearer failure boundaries.

## What Changed

- Switched the Cloudflare bridge worker instance type to `standard-2`
for the QA-matrix execution path.
- Raised Cloudflare bridge/plugin-worker timeout budgets and added SSE
keepalives so long-running install/exec calls can complete instead of
dying at the transport layer.
- Fixed Cloudflare bridge-channel command handling to avoid dropped
final stdout chunks on short-lived execs.
- Made Claude, OpenCode, and Cursor sandbox probe timeouts
configurable/sandbox-aware, then tightened the defaults to the measured
post-fix range.
- Updated the Pi sandbox install command to use the package currently
installed by the official `pi.dev` installer, pinned to a specific npm
version.
- Added/updated tests around Cloudflare bridge behavior and adapter
sandbox probe paths.

## Verification

- `pnpm --filter @paperclipai/adapter-claude-local typecheck`
- `pnpm --filter @paperclipai/adapter-opencode-local typecheck`
- `pnpm --filter @paperclipai/adapter-cursor-local typecheck`
- `pnpm vitest run packages/adapters/cursor-local
packages/adapters/claude-local packages/adapters/opencode-local
packages/adapters/pi-local packages/plugins/sandbox-providers/cloudflare
server/src/services/__tests__/plugin-worker-manager.test.ts`
- Manual QA on the dedicated dev instance using the SSH + Cloudflare
environment matrix (`ENV-29` through `ENV-40`). Clean end-to-end passes:
SSH `claude_local`, `codex_local`, `cursor`, `gemini_local`; Cloudflare
`claude_local`, `codex_local`, `cursor`, `gemini_local`.

## Risks

- Cloudflare sandbox cost increases because the bridge worker now runs
on `standard-2` instead of `lite`.
- Higher timeout ceilings can delay surfacing truly hung Cloudflare
bridge calls, even though they remove transport-level false negatives.
- The manual heartbeat matrix still exposed follow-on
execution/sync/disposition bugs in `opencode_local` and `pi_local`;
those are not fixed by this PR.

## Model Used

- OpenAI `gpt-5.4` via Paperclip `codex_local`, reasoning effort `high`,
tool use enabled, repo search enabled.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots (not applicable)
- [x] I have updated relevant documentation to reflect my changes (not
applicable)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-13 22:00:10 -07:00
Dotta 4142559c37 [codex] Add blocked inbox attention view (#5603)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies through
company-scoped issues, comments, approvals, and execution workspaces.
> - Operators need the Inbox to show not only active work, but also
blocked work that may need human or agent attention.
> - The existing inbox experience did not have a dedicated blocked-work
surface, so blocked tasks were harder to triage and resume deliberately.
> - Backend consumers also needed a compact attention signal that
distinguishes actionable blockers from covered or waiting blocker
states.
> - This pull request adds a Blocked Inbox tab backed by issue
blocker-attention metadata, shared validators, and UI helpers.
> - The benefit is a clearer triage path for stalled or blocked
Paperclip work without exposing external wait internals in the
operator-facing UI.

## What Changed

- Added shared issue blocker-attention types, validators, and exports
for the API/UI contract.
- Added backend blocker-attention computation and issue route support
for blocked inbox data.
- Added the Blocked Inbox tab, blocked reason chips, filtering/search
UI, responsive layouts, and Storybook stories.
- Updated inbox helpers and page behavior so toolbar controls only
appear where they apply.
- Added coverage for shared validators, server blocker-attention
behavior, blocked inbox UI helpers/components, and the Inbox page.
- Added a screenshot helper script for the blocked inbox Storybook
stories.
- Addressed Greptile feedback by making urgency sorting deterministic
for null stop times, avoiding full blocked-inbox list enrichment for
counts, and hardening the screenshot helper.

## Verification

- Rebased the branch cleanly onto `public-gh/master`.
- Confirmed the diff does not include `pnpm-lock.yaml`.
- Confirmed the diff does not include database migration files.
- Ran `pnpm exec vitest run packages/shared/src/validators/issue.test.ts
server/src/__tests__/issue-blocker-attention.test.ts
ui/src/components/BlockedInboxView.test.tsx
ui/src/components/BlockedReasonChip.test.tsx
ui/src/lib/blockedInbox.test.ts ui/src/lib/inbox.test.ts
ui/src/pages/Inbox.test.tsx`.
- Ran `pnpm --filter @paperclipai/shared typecheck && pnpm --filter
@paperclipai/server typecheck && pnpm --filter @paperclipai/ui
typecheck`.
- Checked `ROADMAP.md`; this is scoped inbox/operator triage work and
does not duplicate a listed roadmap feature.
- Greptile Review is green on the latest head and all four Greptile
review threads are resolved.
- GitHub PR checks are green on the latest head: policy, security/snyk,
e2e, verify, Canary Dry Run, Greptile Review, and serialized server
suites 1/4 through 4/4.

## Risks

- Medium review surface because this touches the shared issue contract,
server issue services, and the Inbox UI together.
- Blocker-attention classification may need product tuning after
operators use it on real blocked queues.
- UI screenshots were not attached in this PR-opening pass; the branch
includes `scripts/screenshot-blocked-inbox.mjs` and Storybook stories
for visual capture.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

OpenAI Codex, GPT-5-based coding agent with shell, git, GitHub CLI,
GitHub connector, and Paperclip API tool use. Reasoning mode: medium.
Context window: not exposed by the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-13 16:41:36 -05:00
Dotta d1a8c873b2 fix(remote-sandbox): harden host workspace resumes (#5922)
## Thinking Path

> - Paperclip orchestrates AI agents through a control plane while
adapters execute work in local, remote, or sandboxed runtimes.
> - Remote sandbox execution depends on a strict host-versus-remote
workspace boundary: the host prepares/restores files, while the adapter
command runs inside the sandbox cwd.
> - Jannes' PR #5823 identified host-side failure modes that were not
covered by replacement PR #5822.
> - Persisting a remote pod cwd in session params could poison the next
host heartbeat resume and make Paperclip inspect or upload system temp
roots.
> - Plugin sandbox providers also need a narrow way to receive
model-provider API keys without exposing the full server environment to
every plugin worker.
> - This pull request ports the host-side fixes from #5823 in the
current codebase style, with focused regression coverage.
> - The benefit is safer remote sandbox resumes and plugin worker
environment handling without broadening core plugin privileges.

## What Changed

- Persist host workspace cwd, not remote sandbox cwd, in `claude_local`
session params while retaining remote execution identity metadata.
- Reject saved session cwds that point at system roots before heartbeat
falls back to agent home workspace.
- Skip sockets, FIFOs, devices, and other non-file entries during
workspace restore snapshot capture/comparison.
- Pass a small model-provider API-key allowlist only to plugins
declaring `environment.drivers.register`.
- Added focused regression tests for remote Claude session params,
unsafe session cwd detection, plugin worker env filtering, and non-file
snapshot entries.

Credits: ports host-side fixes from Jannes' #5823.

## Verification

- `pnpm vitest run
packages/adapter-utils/src/workspace-restore-merge.test.ts
server/src/services/session-workspace-cwd.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/plugin-database.test.ts` (25 passed, 7 skipped by
existing embedded-Postgres host guard)
- `pnpm --filter @paperclipai/adapter-utils typecheck`
- `pnpm --filter @paperclipai/adapter-claude-local typecheck`
- `pnpm --filter @paperclipai/server typecheck`

## Risks

- Low risk: changes are scoped to remote sandbox/session metadata,
workspace snapshot filtering, and plugin worker env setup.
- Sandbox-provider plugins now receive only the explicit model-provider
key allowlist; any provider needing another key name will need a
deliberate allowlist update.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5-based coding agent, tool-enabled local code
execution and repository editing.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-13 16:23:04 -05:00
Dotta b947a7d76c [codex] Improve local plugin development workflow (#5821)
## Thinking Path

> - Paperclip is the control plane for autonomous AI-agent companies.
> - Plugins are the extension point for adding capabilities without
expanding the core product surface.
> - Local plugin development needed a tighter CLI-first loop so plugin
authors can scaffold, run, install, inspect, and reload plugins without
reaching into internal package paths.
> - The server plugin install path also needed local-path handling that
keeps plugin identity, dashboard routes, and development watchers
coherent.
> - This pull request adds the CLI scaffold/install workflow, fixes the
server and SDK edge cases that blocked that loop, and updates the
agent-facing plugin creation skill and docs.
> - The benefit is that contributors can develop plugins from local
folders with a documented, repeatable happy path.

## What Changed

- Added `paperclipai plugin init` coverage and CLI wiring for local
plugin scaffolding.
- Improved local plugin install handling, plugin key route resolution,
dashboard capability behavior, and dev watcher startup/reload behavior.
- Fixed plugin SDK worker entrypoint validation for symlinked package
layouts.
- Added targeted tests for plugin init, server plugin authz/watcher
behavior, SDK worker host validation, and the authoring smoke example.
- Added a short local plugin development guide and refreshed the plugin
authoring guide plus `paperclip-create-plugin` skill instructions.

## Verification

- `pnpm run preflight:workspace-links && pnpm --filter
@paperclipai/plugin-sdk build && pnpm --filter
@paperclipai/create-paperclip-plugin typecheck && pnpm --filter
paperclipai typecheck && pnpm --filter @paperclipai/plugin-sdk typecheck
&& pnpm --filter @paperclipai/server typecheck`
- `pnpm exec vitest run --project paperclipai
cli/src/__tests__/plugin-init.test.ts`
- `pnpm exec vitest run --project @paperclipai/plugin-sdk
packages/plugins/sdk/tests/worker-rpc-host.test.ts`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/plugin-dev-watcher.test.ts --pool=forks
--poolOptions.forks.isolate=true`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/plugin-routes-authz.test.ts --pool=forks
--poolOptions.forks.isolate=true`
- `pnpm --dir packages/plugins/examples/plugin-authoring-smoke-example
test`
- Confirmed `pnpm-lock.yaml` is not included in the PR diff.

## Risks

- Medium risk: this touches plugin install routing, CLI command
behavior, and the local development watcher.
- Local path plugin installs execute trusted local code by design; the
new docs call out that trust boundary.
- No database migrations are included.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool-enabled local shell and git
workflow, medium reasoning effort. Context window details were not
exposed in this runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

UI screenshots: not applicable; this PR changes CLI/server/plugin docs
and tests, not board UI rendering.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-12 17:38:24 -05:00
Dotta 0808b388ee [codex] Add source-scoped recovery actions (#5599)
## Thinking Path

> - Paperclip is a control plane for autonomous AI companies, where work
must end with a clear disposition rather than ambiguous agent liveness.
> - Recovery currently detects stalled or missing-next-step issues, but
source issue recovery can become split across child recovery issues,
blockers, and comments.
> - That makes it harder for operators and agents to see who owns
recovery and what exact action is needed on the original issue.
> - Source-scoped recovery actions give the original issue a first-class
active recovery state with owner, evidence, wake policy, and resolution
outcome.
> - This pull request adds the recovery-action data model, backend
reconciliation and resolution APIs, and board UI indicators/actions.
> - The benefit is clearer stalled-work recovery without losing source
issue context or relying on comments as the liveness path.

## What Changed

- Added the `issue_recovery_actions` schema, shared
types/constants/validators, and an idempotent
`0084_issue_recovery_actions` migration ordered after current `master`
migrations.
- Updated stranded/missing-disposition recovery to create source-scoped
recovery actions, wake the recovery owner on the source issue, and avoid
locking the source issue for recovery-action wakes.
- Added API support for reading active recovery actions on issue
detail/list surfaces and resolving them with restored, blocked,
cancelled, or false-positive outcomes.
- Require blocked recovery resolutions to have an unresolved first-class
blocker, and removed the UI shortcut that could mark recovery blocked
without a blocker selection path.
- Surfaced recovery indicators/actions in the issue UI, blocker notices,
active run panels, issue rows, and Storybook coverage.
- Updated docs and focused tests for recovery semantics, ownership,
races, stale comments, and UI behavior.

## Verification

- `pnpm exec vitest run
server/src/__tests__/issue-recovery-actions.test.ts
server/src/__tests__/heartbeat-process-recovery.test.ts
ui/src/components/IssueRecoveryActionCard.test.tsx
ui/src/components/IssueBlockedNotice.test.tsx ui/src/api/issues.test.ts`
— 5 files, 72 tests passed.
- `pnpm --filter @paperclipai/shared typecheck` — passed.
- `pnpm --filter @paperclipai/db typecheck` — passed, including
migration numbering check.
- `pnpm --filter @paperclipai/server typecheck` — passed.
- `pnpm --filter @paperclipai/ui typecheck` — passed.
- Follow-up verification after blocker-resolution guard: `pnpm exec
vitest run server/src/__tests__/issue-recovery-actions.test.ts
ui/src/components/IssueRecoveryActionCard.test.tsx
ui/src/api/issues.test.ts` — 3 files, 27 tests passed.
- Follow-up `pnpm --filter @paperclipai/server typecheck` — passed.
- Follow-up `pnpm --filter @paperclipai/ui typecheck` — passed.
- UI states are available in
`ui/storybook/stories/source-issue-recovery.stories.tsx`; screenshot
capture helper is `scripts/screenshot-recovery-card.cjs`.

## Risks

- Medium: recovery behavior changes from child recovery issue ownership
toward source-scoped actions, so operators may see stalled-work state in
new places.
- Migration risk is mitigated by using the next migration slot after
`master` and making the table/constraints/index creation idempotent for
anyone who previously applied the old branch-local
`0082_dizzy_master_mold` migration.
- Existing child recovery issue paths are still guarded for
already-created recovery issues, but new source-scoped flows should be
watched in CI and Greptile review.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool use enabled for shell, Git,
GitHub, and local test execution. Context window not exposed by the
runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-12 09:37:15 -05:00
Devin Foley c445e59256 fix(ui): fix message attribution for agent-posted comments with user author IDs (#5780)
## Thinking Path

> - Paperclip’s issue chat is an audit surface: reviewers need to trust
who actually authored a message.
> - Some historical agent comments were persisted with `authorUserId`
and no surviving `createdByRunId`, so the UI rendered real agent output
as if it came from the board user.
> - A pure timestamp-window fallback is too risky because human
reviewers can comment while agents are running.
> - The safe recovery path is to derive attribution only when the server
can prove it from same-issue run logs that include the exact posted
comment id, then let the chat renderer prefer that recovered agent
attribution.
> - This keeps historical threads trustworthy without mutating old
database rows or guessing in ambiguous cases.

## What Changed

- Added shared `IssueComment` fields for derived attribution so server
and UI can carry recovered `derivedAuthorAgentId`,
`derivedCreatedByRunId`, and `derivedAuthorSource` consistently.
- Added server-side attribution recovery in
`server/src/services/issues.ts` that reads same-issue run logs and only
derives agent authorship when a run log contains the exact `comment id:
...` emitted during posting.
- Updated issue chat rendering in `ui/src/lib/issue-chat-messages.ts` to
prefer direct agent authorship, then activity-log `runAgentId`, then the
server-derived attribution.
- Removed the unsafe UI-only run-window fallback from
`ui/src/pages/IssueDetail.tsx` so human comments posted during an active
run are not silently relabeled as agent output.
- Added regression coverage for both the run-log derivation path and the
chat-rendering fallback behavior.
- Bounded server-side run-log enrichment to 8 concurrent reads per
request and removed the unused `issueCommentSchema` declaration during
PR cleanup.

## Verification

- `pnpm exec vitest run ui/src/lib/issue-chat-messages.test.ts
server/src/__tests__/issues-service.test.ts`
- `pnpm test:run:general`
- Live validation on May 12, 2026 in `PAPA-322`: confirmed the
previously misattributed historical comments on `PAPA-316` now render as
Claude-authored on `http://goldie.gerbil-company.ts.net:3100`.
- Reviewer check: open `PAPA-316` in the running instance and confirm
historical comments such as `## Investigation: exe.dev 422 + codex
re-test` render under Claude instead of the board user.

## Risks

- Low risk. The change is scoped to comment attribution recovery and
rendering.
- Derived attribution is intentionally conservative: if there is no
exact run-log proof, the comment remains user-authored instead of
guessing.
- Run-log recovery depends on retained same-issue logs, so older
comments without that evidence remain unchanged.

## Model Used

- OpenAI Codex via the Paperclip `codex_local` adapter (GPT-5-class
coding agent with tool use in the local Paperclip runtime; the exact
deployment/model ID is not surfaced by this workspace).

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-12 01:20:49 -07:00
Dotta 563413ecd4 Fix LLM wiki type contracts (#5758)
## Thinking Path

> - Paperclip is the control plane for autonomous AI companies, and
plugins extend that control plane without bloating core.
> - The LLM Wiki plugin adds a knowledge surface through the plugin
runtime and shared plugin UI components.
> - After the LLM Wiki work merged to `master`, CI exposed TypeScript
contract drift between plugin code, SDK component types, and update
settings types.
> - The ingestion settings update path intentionally accepts partial
source toggles, but its type intersected with the full settings shape
and required every source key.
> - The LLM Wiki UI also passes managed routine default-drift metadata
through the shared routine list item shape, but that metadata was
missing from the public item type.
> - This pull request narrows those type contracts to match the existing
runtime behavior.
> - The benefit is restoring typecheck on `master` with a small,
non-behavioral follow-up.

## What Changed

- Added a `WikiEventIngestionSettingsUpdate` type that permits partial
source updates without weakening normalized stored settings.
- Added managed routine default-drift metadata to the plugin SDK
`ManagedRoutinesListItem` type.
- Mirrored that managed routine default-drift type in the host UI
component item type.

## Verification

- `pnpm --filter @paperclipai/plugin-llm-wiki typecheck`
- `pnpm --filter @paperclipai/plugin-sdk typecheck`
- `pnpm --filter @paperclipai/ui typecheck`
- `git diff --check`

## Risks

- Low risk. This is a TypeScript type-contract fix only; no runtime
behavior or database schema changes.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5-based coding agent, tool-enabled local repository
editing and command execution.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Notes on checklist applicability: no screenshots are included because
the UI change is a shared type-only contract update with no visual
behavior change; no docs were required because no behavior or commands
changed.

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 21:07:06 -05:00
Dotta 508355b8fc [codex] Add LLM Wiki plugin package to master (#5716)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The plugin system is the extension surface for optional product
capabilities without baking every workflow into core.
> - The LLM Wiki plugin package was reviewed in stacked PR #5592, which
targeted `pap-9173-llm-wiki-rest`.
> - The stack base PR #5597 merged to `master` before #5592 was merged
into that branch, so the plugin package never reached `master`.
> - A direct PR from `pap-9173-llm-wiki-rest` back to `master` would be
noisy because that branch has diverged from current `master`.
> - This pull request reapplies the reviewed
`packages/plugins/plugin-llm-wiki/` package onto current `master` and
updates Docker deps-stage manifest coverage.
> - The branch intentionally no longer changes `pnpm-workspace.yaml`
after maintainer feedback; because the new package is now a root
workspace importer, the remaining integration question is how
maintainers want the root lockfile handled under the current PR policy.

## What Changed

- Added the LLM Wiki plugin package under
`packages/plugins/plugin-llm-wiki/` from the merged PR #5592 head.
- Preserved the post-review cleanup from #5592: generated
design/screenshot artifacts are not committed, and `src/ui/index.tsx` /
`src/wiki.ts` are small public entrypoints.
- Added the new plugin package manifest to the Docker deps stage so
policy can validate package manifest coverage.
- Removed the earlier `pnpm-workspace.yaml` exclusion per maintainer
request, so the plugin is included by the existing `packages/plugins/*`
workspace glob.

## Verification

Current head:
- PGlite migration harness: ran migrations 001-003, verified old
non-space distillation unique constraints were removed, inserted
duplicate cursor and work-item keys in a second space, then reran
migration 003 successfully
- `node ./scripts/check-docker-deps-stage.mjs`
- `git diff --check`

Known current-head install result after removing the workspace
exclusion:
- `pnpm install --frozen-lockfile` fails because `pnpm-lock.yaml` has no
importer for `packages/plugins/plugin-llm-wiki/package.json`.

Previously verified on the same plugin source before the
workspace-exclusion removal:
- `pnpm --filter @paperclipai/plugin-sdk build`
- `cd packages/plugins/plugin-llm-wiki && pnpm install --lockfile=false
&& pnpm test`

## Risks

- The branch now includes `packages/plugins/plugin-llm-wiki` in the root
workspace but does not update `pnpm-lock.yaml`. Root frozen install will
fail until maintainers choose a lockfile path that fits repo policy.
- Committing `pnpm-lock.yaml` directly on this PR conflicts with the
current PR policy check, while excluding the package from
`pnpm-workspace.yaml` was rejected in maintainer feedback.
- The package includes UI code already reviewed in #5592; generated
screenshot/design artifacts were intentionally removed per maintainer
request, so visual review should regenerate screenshots locally if
needed.
- The package depends on plugin host support from #5597, which is
already merged to `master`.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI GPT-5 Codex via Codex CLI, tool use and local code execution
enabled; context window not exposed.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run the targeted checks listed above
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Stack context: #5592 was merged into `pap-9173-llm-wiki-rest` after
#5597 had already merged that branch to `master`, so this follow-up PR
is needed to carry the plugin package itself into `master`.

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 20:45:41 -05:00
Devin Foley ad0bb57350 Fix exe.dev sandbox installs for gemini/opencode local adapters (#5737)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies, including
running adapter CLIs inside remote sandboxes
> - The QA matrix in PAPA-316 spins up local-runtime adapters
(claude/gemini/opencode) against both SSH and the new exe.dev sandbox
provider, and "Test" exercises the same install + probe path the real
runtime uses
> - On exe.dev the QA matrix failed at three different points:
SSH/sandbox secret refs would not resolve, gemini-local could not find
npm, and opencode-local installed a binary that was not on the
probe-shell PATH
> - These are all environment-shape issues the runtime should handle,
not regressions in any individual adapter, so they need to be fixed in
the shared install/resolve layer before the matrix can pass
> - This pull request wires the environment id through to secret-ref
resolution, bootstraps npm from a portable Node tarball when the sandbox
image lacks Node, and symlinks the opencode binary into a directory that
non-login shells see
> - The benefit is that the QA matrix passes end-to-end on exe.dev, and
any future sandbox provider that ships without Node or relies on rc-file
PATH wiring gets the same fixes for free

## What Changed

- `server/src/services/environment-execution-target.ts`: pass the
environment `id` into `resolveEnvironmentDriverConfigForRuntime` for
both the sandbox and SSH branches, so `privateKeySecretRef` /
sandbox-provider secret refs (e.g. exe.dev `apiKey`) can resolve against
the secret store at runtime instead of throwing `Runtime secret
resolution requires an environment id`.
- `packages/adapter-utils/src/sandbox-install-command.ts`: extend
`buildSandboxNpmInstallCommand` with an `ENSURE_NPM_PREAMBLE` that, when
`npm` is missing, downloads a portable Node v22 tarball into
`$HOME/.local` and sets `PAPERCLIP_NPM_BOOTSTRAPPED=1` so the install
step skips sudo (sudo's `secure_path` would lose the freshly-installed
`npm` in `$HOME/.local/bin`). Distro-packaged Node from apt-get is
intentionally avoided because it tends to be too old to parse modern JS
syntax used by `@google/gemini-cli`.
- `packages/adapters/gemini-local/src/index.ts`: switch the hardcoded
`npm install -g @google/gemini-cli` to `buildSandboxNpmInstallCommand`,
so gemini-local picks up the same sudo-aware + npm-bootstrap behavior as
the other local adapters.
- `packages/adapters/opencode-local/src/index.ts`: append a step to the
install command that symlinks `$HOME/.opencode/bin/opencode` into
`$HOME/.local/bin`. The upstream installer only adds `~/.opencode/bin`
to PATH via `~/.bashrc`, which non-login `sh -c` probe invocations do
not source.
- `packages/adapter-utils/src/sandbox-install-command.test.ts`: cover
the new preamble plus the unchanged root/sudo/user-prefix branches.

## Verification

- `cd packages/adapter-utils && npm test -- sandbox-install-command`
(passes; new "bootstraps npm from a portable Node tarball when missing"
case is included).
- Manual: ran the in-app `Test` action against the QA matrix dev
instance for `QA exe.dev Claude`, `QA exe.dev Gemini`, and `QA exe.dev
OpenCode` — all three now report `status=pass` including the hello
probe. `QA SSH Claude` also passes; without the environment-id fix, SSH
resolution threw before the wrapper / install fixes could run.
- Suggested reviewer check: re-run the matrix on a fresh exe.dev
environment and confirm the install step no longer hits `npm: command
not found` for gemini and the opencode probe no longer hits `opencode:
command not found`.

## Risks

- Low/medium. The npm bootstrap pins Node `v22.11.0` from
`nodejs.org/dist`; if that URL becomes unreachable the install will fail
with a clear `curl` error rather than corrupting state. The bootstrap
path is only taken when `npm` is genuinely missing, so existing sandbox
images that ship with Node are unaffected.
- The opencode symlink uses `ln -sf` into `$HOME/.local/bin`, which is
created with `mkdir -p`; idempotent on re-install.
- The `id` change is a strict additive: callers previously got
`undefined` and only the secret-ref code paths actually read it. No
behavior change for environments without secret refs.

## Model Used

- Claude (Anthropic), `claude-opus-4-7`, with extended thinking and tool
use enabled. Iterated through the Paperclip QA matrix harness; no other
model assisted.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots (n/a — runtime/install path only)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 14:28:22 -07:00
Devin Foley 5a64cf52a1 Add exe.dev sandbox provider plugin (#5688)
> _Stacked on top of #5685#5686#5687. Diff against master includes
commits from earlier PRs in the stack — review focuses on the two new
commits (`Add long-secret textarea variant to JsonSchemaForm
SecretField` + `Add exe.dev sandbox provider plugin`)._

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent runs in a sandbox environment, and operators choose the
provider — today E2B, Daytona, and (in this stack) Cloudflare
> - exe.dev offers per-VM sandboxes via a small CLI / HTTP API — useful
for operators who want full Linux VMs (vs container/runtime-only
sandboxes)
> - The plugin shape mirrors the e2b plugin: lifecycle hooks (`new`,
`ls`, `rm`) drive exe.dev's CLI; SSH plumbing handles direct VM access
for adapters that need it
> - exe.dev VMs come up bare — `node` is not preinstalled, so the
Paperclip sandbox callback bridge (a Node script) needs Node 20
installed at VM init via `--setup-script`. The plugin defaults the setup
script to a Nodesource install
> - The auth field accepts long SSH private keys, which need a textarea
variant of the existing `SecretField` in `JsonSchemaForm` — added behind
a `maxLength > THRESHOLD` opt-in so other secret fields are unaffected
> - The benefit is that operators get exe.dev as a fully working sandbox
provider out of the box, with no manual VM provisioning required

## What Changed

**Shared UI support (`Add long-secret textarea variant to JsonSchemaForm
SecretField`):**

- `ui/src/components/JsonSchemaForm.tsx` + new
`JsonSchemaForm.test.tsx`: when a secret-formatted field declares
`maxLength` larger than the existing single-line threshold, render a
monospace textarea instead of the masked input. Short secrets (API keys,
tokens) keep the existing masked-input + show/hide toggle behavior.

**The exe.dev plugin (`Add exe.dev sandbox provider plugin`):**

- `packages/plugins/sandbox-providers/exe-dev/`: plugin entry, manifest,
plugin runtime, README, and 19-test Vitest suite.
- Manifest fields: API token (with `secret-ref` + `/exec` permission
notes — needs `new`, `ls`, `rm`), API URL override, optional SSH
username, optional SSH private key (uses the new `JsonSchemaForm`
textarea variant via `maxLength: 4096`), optional SSH identity-file
path, optional setup script.
- Default `--setup-script` is a Nodesource Node 20 install. exe.dev VMs
come up bare and the Paperclip sandbox callback bridge is a Node script,
so without Node preinstalled the bridge can't start. Operators can
override by supplying their own setup script.
- `runLifecycleCommand` redacts env values from the executed command
before surfacing it in error messages, so secrets passed via
`--env=KEY=VALUE` don't leak into operator-visible failures.
- The plugin distinguishes exe.dev's SSH onboarding failures (`Please
complete registration by running: ssh exe.dev`) from general SSH
failures and surfaces a clear remediation message.
- `scripts/release-package-manifest.json`: register the new plugin for
CI publish alongside the existing daytona / e2b providers.

## Verification

- `pnpm typecheck`
- `pnpm exec vitest run --no-coverage
ui/src/components/JsonSchemaForm.test.tsx`
- `(cd packages/plugins/sandbox-providers/exe-dev && pnpm test)` — 19
passing

For an operator-side smoke test:

1. Get an exe.dev API token with `/exec` permission for `new`, `ls`,
`rm`.
2. Register the plugin in your Paperclip instance, configure an
environment with the token.
3. Create a sandbox env whose provider is `exe-dev`, then run a Codex or
Claude job against it. The default Node 20 setup script should bring the
VM up automatically.

## Risks

- Adds a new sandbox provider plugin that follows the existing daytona /
e2b shape; behavior on existing providers is unchanged.
- The `JsonSchemaForm` textarea variant only engages for fields that opt
in via `maxLength` larger than the existing threshold. All existing
secret fields (which don't declare a `maxLength`) keep their current
rendering. Test coverage pins both paths.
- The redaction in `runLifecycleCommand` is a defense-in-depth measure;
the test suite exercises the redaction path. If the redaction misses a
future env-arg shape, the worst case is restored behavior (secrets in
error messages), which is what the existing daytona / e2b plugins also
do today.
- Default setup script downloads from `deb.nodesource.com` over HTTPS at
VM init. Operators on air-gapped networks or with a different package
strategy can override the setup script.

## Model Used

- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — UI change is a textarea variant of an existing secret
field; will attach screenshots before requesting merge
- [x] I have updated relevant documentation to reflect my changes
(plugin README, manifest descriptions)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 07:42:18 -07:00
Devin Foley 486fb88a15 Add Cloudflare sandbox provider plugin (#5687)
> _Stacked on top of #5685#5686. Diff against master includes commits
from earlier PRs in the stack — review focuses on the two new commits
(`Extend sandbox callback bridge for Worker-hosted plugins` + `Add
Cloudflare sandbox provider plugin`)._

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent runs in a sandbox environment, and operators choose which
provider backs that sandbox — today E2B and Daytona are bundled with the
platform
> - Cloudflare Workers + Durable Objects + the Sandbox SDK offer a
credible new option: globally distributed, cheap idle, and
operator-deployable as a single Worker
> - To plug it in, Paperclip needs (a) a provider plugin that speaks the
`PaperclipPluginManifestV1` lifecycle and (b) a small operator-deployed
Worker — the **bridge** — that adapts Paperclip's runtime RPCs to the
Cloudflare Sandbox SDK
> - The plugin extends the existing sandbox-callback-bridge with a
`bridge.transport: "worker"` discriminator so the platform routes
runtime RPCs through the Worker bridge instead of the in-process runner
> - This pull request adds the plugin, the bridge Worker template, and
the supporting adapter-utils + server hooks the new transport needs
> - The benefit is that operators can run sandboxes on Cloudflare's edge
with no new platform code beyond installing the plugin and deploying the
Worker

## What Changed

**Shared support (`Extend sandbox callback bridge for Worker-hosted
plugins`):**

- `packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`:
expose `expectedHostHeader` so plugin-side bridge clients can verify the
canonical request envelope before forwarding.
- `packages/adapter-utils/src/command-managed-runtime.{ts,test.ts}`:
relax the always-fresh runner construction so callers can re-use a
runner across exec calls (Worker-hosted bridges hold the runner inside a
Durable Object).
- `server/src/services/environment-runtime.ts` +
`environment-runtime.test.ts`: route Worker-hosted bridges through the
same env-shaping path as E2B and pin the `requestEnv` contract.
- `server/src/services/plugin-environment-driver.ts`: thread an optional
`issueId` through the runtime descriptor so bridges can scope leases to
the originating issue (used by Cloudflare to map a sandbox to the
issue/workflow for billing and audit).
- `packages/plugins/sdk/src/protocol.ts`: add `issueId?` to
`PluginEnvironmentDriverBaseParams` and the new `bridge.transport:
"worker"` discriminator that the new plugin declares.
- `server/__tests__/heartbeat-plugin-environment.test.ts`: pin the
heartbeat path against the new runtime descriptor.

**The Cloudflare plugin itself (`Add Cloudflare sandbox provider
plugin`):**

- `packages/plugins/sandbox-providers/cloudflare/`: plugin entry,
manifest, plugin runtime (lifecycle + bridge client), config parsing,
and Vitest coverage. Manifest declares `bridge.transport: "worker"` so
the platform routes runtime RPCs through the bridge client.
- `bridge-template/`: a Worker template the operator deploys with
`wrangler`. Owns Durable Object-backed sessions (`sessions.ts`),
exec/stream routes (`exec.ts`, `routes.ts`), and an HMAC auth layer
(`auth.ts`) that pins the `Host` header surface. Includes the
SDK-contract-correct exec implementation, lease recovery, and chunked
stdout/stderr streaming.
- Tests cover lease/session handoff (`bridge-template/src/exec.test.ts`,
`routes.test.ts`), bridge client request shaping
(`src/bridge-client.test.ts`), and end-to-end plugin behavior
(`src/plugin.test.ts`) including streamed exec output. 27 tests in
total.
- `README.md` walks the operator through deploying the bridge Worker,
registering the plugin, and configuring the runtime.

## Verification

- `pnpm typecheck`
- `pnpm exec vitest run --no-coverage
packages/adapter-utils/src/sandbox-callback-bridge.test.ts
packages/adapter-utils/src/command-managed-runtime.test.ts
server/src/__tests__/environment-runtime.test.ts
server/src/__tests__/heartbeat-plugin-environment.test.ts`
- `(cd packages/plugins/sandbox-providers/cloudflare && pnpm test)` — 27
passing

For an operator-side smoke test:

1. Deploy the bridge: `cd
packages/plugins/sandbox-providers/cloudflare/bridge-template &&
wrangler deploy`
2. Register the plugin in your Paperclip instance, point its bridge URL
at the deployed Worker, set the HMAC shared secret.
3. Create a sandbox environment whose provider is `cloudflare`, then run
a Codex or Claude job against it.

## Risks

- Adds a new `bridge.transport: "worker"` code path, but the existing
E2B / Daytona transports go through the same shaped helpers and have
explicit test coverage that pins their behavior unchanged.
- The Worker bridge stores session state in a Durable Object; operator
instances must be aware of the corresponding Cloudflare costs (DO
requests, storage). Documented in the README.
- The `issueId` plumbing is optional throughout — existing plugins that
don't supply it continue to work.

## Model Used

- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A, no UI change
- [x] I have updated relevant documentation to reflect my changes
(plugin README, bridge-template README)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 07:33:13 -07:00
Devin Foley 0fe39a2d5c fix(cursor-local): resolve sandbox agent installs from cursor bin (#5686)
> _Stacked on top of #5685 (Harden remote sandbox runtime). Diff against
master includes commits from earlier PRs in the stack — review focuses
on the new commit only._

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The cursor-local adapter wraps the Cursor Agent CLI so a Paperclip
workflow can drive it inside a sandbox
> - When the adapter runs in a remote sandbox, the Cursor Agent CLI
installs under `$HOME/.local/bin/cursor-agent` (or wherever
`$XDG_BIN_HOME` points), not on the global PATH
> - The existing post-install resolution assumed `cursor-agent` would
resolve via the sandbox's login shell PATH after `npm install -g`, which
fails on sandboxes where the install lands in a user-prefixed directory
that isn't on PATH at probe time
> - This pull request resolves the agent CLI from the cursor binary's
own directory (`dirname "$(command -v cursor)"`) so the install probe
and execute path agree on a real binary location
> - The benefit is that cursor-local works correctly on any sandbox
provider where `npm install` lands in a user-prefixed directory

## What Changed

- `packages/adapters/cursor-local/src/server/remote-command.ts`: resolve
the cursor-agent binary from the cursor bin directory after install,
instead of relying on PATH.
- `packages/adapters/cursor-local/src/server/test.ts`: corresponding
probe tweak.
- `packages/adapters/cursor-local/src/server/test.test.ts` (new) +
`remote-command.test.ts`: focused coverage that exercises the install +
resolve path against a sandbox runner that places the binary in a
user-prefixed directory.

## Verification

- `pnpm exec vitest run --no-coverage
packages/adapters/cursor-local/src/server/test.test.ts
packages/adapters/cursor-local/src/server/remote-command.test.ts
packages/adapters/cursor-local/src/server/execute.test.ts`

All passing locally.

## Risks

- Local cursor-local runs are unaffected — the resolution change only
kicks in for the sandbox install path.
- Low risk; isolated to one adapter.

## Model Used

- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: tool use (Read/Edit/Bash), no code execution beyond
local repo commands

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A, no UI change
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 00:41:20 -07:00
Devin Foley b24c6909e8 Harden remote sandbox runtime probes, timeouts, and installs (#5685)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent runs inside a sandbox environment so its CLI is isolated
from the host
> - Sandbox-backed adapter runs go through a small set of shared helpers
— `ensureAdapterExecutionTargetCommandResolvable`, the sandbox callback
bridge runner, and per-adapter `SANDBOX_INSTALL_COMMAND` strings
> - When standing up new sandbox provider plugins, the existing helpers
timed out, missed install fallbacks, or leaned on assumptions that only
held for E2B
> - Local adapters (`claude-local`, `codex-local`, `gemini-local`,
`opencode-local`) needed slightly hardened probes so they could install
themselves and validate inside *any* remote sandbox transport, not just
E2B
> - This pull request bundles those runtime fixes so future sandbox
provider plugins inherit a working baseline
> - The benefit is that adding a new sandbox provider plugin no longer
requires touching adapter-utils or each local-adapter probe — the
supporting infra is already correct

## What Changed

- `packages/adapter-utils/src/execution-target.ts`: introduce
`DEFAULT_REMOTE_SANDBOX_ADAPTER_TIMEOUT_SEC = 1800` and
`resolveAdapterExecutionTargetTimeoutSec(...)`. Local and SSH adapters
keep the historical "0 means no adapter timeout" behavior;
sandbox-backed runs without an explicit `timeoutSec` get an explicit
30-minute default so remote installs and warm-up don't time out at the
per-RPC default. Plumbed `timeoutSec` through
`ensureAdapterExecutionTargetCommandResolvable` so install probes inside
a sandbox honor adapter-level overrides instead of the bridge's 5-minute
default.
- `packages/adapters/opencode-local/src/index.ts`: switch
`SANDBOX_INSTALL_COMMAND` from `npm install -g opencode-ai` to `curl
-fsSL https://opencode.ai/install | bash`. The npm package reifies four
large prebuilt-binary subpackages in parallel even though only one
matches the host arch; on bandwidth-constrained sandboxes that blew
through the 240s install budget. The official installer fetches one
arch-specific binary and adds `$HOME/.opencode/bin` to PATH via
`~/.bashrc`, which the sandbox-callback-bridge login-shell script
already sources.
- `packages/adapters/{claude,codex,gemini,opencode}-local/`: harden
remote-target probes — pass `--skip-git-repo-check` for Codex when
probing outside a repo, normalize permission flags for Claude, and add
`*.remote.test.ts` coverage that exercises the remote-sandbox path
explicitly for each adapter.
- `packages/adapter-utils/src/sandbox-install-command.{ts,test.ts}`
(new): add `buildSandboxNpmInstallCommand` helper.
`server/src/adapters/registry.ts` + new
`server/src/__tests__/adapter-registry.test.ts`: wire adapter install
commands so they fall back to a writable `$HOME/.local` prefix when
global install isn't available.
- `server/src/__tests__/plugin-worker-manager.test.ts` + new
`server/src/__tests__/fixtures/plugin-worker-delayed.cjs`: pin per-call
timeout overrides so plugin worker exec calls honor the caller's timeout
instead of the worker's default.

## Verification

- `pnpm typecheck`
- `pnpm exec vitest run --no-coverage
packages/adapter-utils/src/execution-target-sandbox.test.ts
packages/adapter-utils/src/sandbox-install-command.test.ts`
- `pnpm exec vitest run --no-coverage
server/src/__tests__/plugin-worker-manager.test.ts
server/src/__tests__/adapter-registry.test.ts
server/src/__tests__/claude-local-adapter-environment.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/gemini-local-adapter-environment.test.ts`
- `pnpm exec vitest run --no-coverage
packages/adapters/codex-local/src/server/test.remote.test.ts
packages/adapters/opencode-local/src/server/test.remote.test.ts
packages/adapters/codex-local/src/server/codex-args.test.ts
packages/adapters/codex-local/src/server/execute.remote.test.ts
packages/adapters/gemini-local/src/server/execute.remote.test.ts`

All passing locally.

## Risks

- Touches shared `adapter-utils` and several `*-local` adapters. The
30-minute default applies only when both (a) the target is
`remote+sandbox` and (b) no `timeoutSec` is configured — local + SSH
paths are unchanged. New test coverage was added alongside each behavior
change to pin the contracts.
- Switching OpenCode's install command to the official installer is a
behavior change for any operator running OpenCode inside a remote
sandbox. Local installs are unaffected (the `SANDBOX_INSTALL_COMMAND`
only runs when an adapter is being installed inside a sandbox).
- Low risk overall — no migrations, no API surface change.

## Model Used

- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep),
no code execution beyond local repo commands

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A, no UI change
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 00:31:54 -07:00
Devin Foley 534aee66ae Add cursor_cloud adapter for Cursor SDK + Cloud Agents API v1 (#5664)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - There are many adapter types, one per agent-runtime product (Claude,
Codex, OpenCode, Cursor local CLI, etc.)
> - Cursor shipped a public TypeScript SDK on 2026-04-29 that exposes
Cursor's full hosted-agent platform (cloud VMs, harness, MCP, skills,
hooks)
> - Paperclip had no first-class adapter for this — agents that wanted
to use Cursor's managed cloud runtime had to fall back to the local CLI
adapter, which loses the cloud session, streaming, and durable run model
> - This PR adds a new `cursor_cloud` adapter built directly on
`@cursor/sdk`, with Paperclip's heartbeat mapped to Cursor's
durable-agent + per-run model
> - The benefit is that any Paperclip agent can now drive a Cursor cloud
agent across heartbeats with native session reuse, streaming, and
cancellation, while Paperclip remains the source of truth for issue/task
state

## What Changed

- New built-in adapter package `packages/adapters/cursor-cloud` (15
files, ~1.7k LOC) backed by `@cursor/sdk` ^1.0.12
- `src/server/execute.ts` — SDK-first lifecycle: `Agent.create` /
`Agent.resume` / `Agent.getRun` / `agent.send` / `run.stream` /
`run.wait`, with session reuse keyed on the (runtime env type, env name,
repo set) tuple
- `src/server/session.ts` — codec for `cursorAgentId` + `latestRunId` +
repo metadata, persisted in `runtime.sessionParams`
- `src/server/test.ts` — environment probe via `Cursor.me()` and
optional model validation via `Cursor.models.list()`
- `src/ui/parse-stdout.ts` + `src/cli/format-event.ts` — normalize
Cursor SDK message types (`status`, `thinking`, `assistant`, `user`,
`tool_call`, `tool_result`, `result`) into Paperclip transcript events
for the UI and CLI
- Registrations: `packages/shared/src/constants.ts`,
`packages/adapter-utils/src/session-compaction.ts`,
`server/src/adapters/{registry,builtin-adapter-types}.ts`,
`ui/src/adapters/{registry,adapter-display-registry}.ts` +
`ui/src/adapters/cursor-cloud/index.ts`, `cli/src/adapters/registry.ts`,
plus workspace deps in `cli`/`server`/`ui` `package.json`
- `ui/src/components/AgentConfigForm.tsx` — hide local-Cursor
`mode`/thinking-effort field for `cursor_cloud` (different config
surface)
- 11 vitest tests covering execute paths (fresh create, matching-resume,
active-run reattach, non-finished result), session codec round-trip,
transcript parsing, and config building

## Verification

Reviewer steps:

```bash
pnpm install
pnpm --filter @paperclipai/adapter-cursor-cloud typecheck   # → clean
pnpm vitest run packages/adapters/cursor-cloud              # → 11/11 passing
```

End-to-end check against a real Cursor cloud agent (requires
`CURSOR_API_KEY` and Cursor GitHub-app install on the target repo):

1. Create a `cursor_cloud` agent in Paperclip with `repoUrl` set to the
test repo, `repoStartingRef: main`, and `env.CURSOR_API_KEY` set
2. Trigger a heartbeat → adapter calls `Agent.create({ cloud: { env: {
type: "cloud" }, repos: [...] } })`, streams events, terminates on
`finished`
3. Trigger a second heartbeat → adapter calls `Agent.resume` or
`agent.send` follow-up depending on prior-run state, reusing
`cursorAgentId`
4. The Paperclip UI/CLI transcript reflects Cursor `status` / `thinking`
/ `assistant` events as they stream
5. Cancellation from Paperclip maps to `run.cancel()` or Cloud API v1
`cancelRun` for cross-heartbeat cancellation

A direct-SDK smoke run against a real repo (devinfoley/my_test_project @
main) confirmed: `Cursor.me()` ok → `Agent.create` → `agent.send` →
`run.stream()` (30 events) → terminal status `finished` in ~11s.

## Risks

- **New adapter, additive only.** No existing adapter or registry is
replaced; current `cursor` local-CLI adapter is untouched. Default
behavior of any existing agent is unchanged.
- **External dependency on `@cursor/sdk`.** Cursor's SDK is v1.0.x and
may evolve. Mocked unit tests cover the public surface used here; if the
SDK breaks compatibility we update the adapter independently.
- **Cost/budget.** `cursor_cloud` runs on Cursor's billed cloud VMs;
operators must understand they are spending money outside Paperclip's
budget controls when they enable this adapter. Same shape as other
API-billed adapters.
- **No webhook support in V1.** The SDK already provides
stream/wait/cancel/reattach, so V1 does not require a public callback
URL. If a future use case needs out-of-band wakes, we add a Cloud API v1
webhook bridge as a separate change. This is called out in the issue
plan document.
- **Lockfile.** Per repo policy, `pnpm-lock.yaml` is intentionally not
in this PR — CI's lockfile workflow will update it on merge given the
manifest changes.

## Model Used

- Provider: Anthropic Claude (via Claude Code / Paperclip `claude_local`
adapter)
- Model: `claude-opus-4-7` (Claude Opus 4.7), knowledge cutoff January
2026
- Mode: standard tool-use with extended reasoning
- Context: ~200k token window
- Capabilities used: code generation, multi-file edits, shell/test
execution, GitHub PR workflow

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass (11/11 in
`packages/adapters/cursor-cloud`)
- [x] I have added or updated tests where applicable (4 new test files,
11 cases)
- [ ] If this change affects the UI, I have included before/after
screenshots (the only UI change is hiding the local-Cursor mode field on
the `cursor_cloud` adapter — happy to attach a screenshot if the
reviewer wants one)
- [x] I have updated relevant documentation to reflect my changes (issue
plan document supersedes the pre-SDK design; tracked in PAPA-203)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-10 17:21:04 -07:00
Dotta 0096b56a1c [codex] Add LLM Wiki plugin host support (#5597)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The plugin system needs host contracts and runtime support before
large plugins can integrate cleanly.
> - The source branch mixed the LLM Wiki package with supporting
host/runtime work, managed plugin skills, root-level storage spaces, and
a bookmarks reference plugin.
> - [PAP-9173](/PAP/issues/PAP-9173) asked for the current branch to be
split by file boundary: plugin package separately from everything else.
> - [PAP-9188](/PAP/issues/PAP-9188) clarified that LLM Wiki may have
plugin-local spaces, but Paperclip core should not reorganize top-level
local storage into spaces.
> - Follow-up review clarified that the bookmarks example should not
ship in this PR either.
> - This pull request contains the
non-`packages/plugins/plugin-llm-wiki/` host/runtime work, keeps runtime
state under the selected Paperclip instance root, and no longer includes
the bookmarks example.

## What Changed

- Added/updated plugin host contracts, SDK types, worker RPC plumbing,
managed plugin skill support, and related server tests.
- Removed the bookmarks example plugin package and its
bundled-example/workspace references.
- Removed the root-level local spaces CLI/migration surface and restored
instance-root runtime defaults for config, db, logs, storage, secrets,
workspaces, projects, and adapter homes.
- Replaced shared root `space-paths` helpers with `home-paths` helpers
for core runtime storage.
- Tightened stranded recovery unique-conflict detection so concurrent
recovery scans reuse the raced recovery issue when Postgres errors are
wrapped.
- Kept `packages/plugins/plugin-llm-wiki/` out of this PR diff;
plugin-local spaces remain in the stacked plugin-only PR.

## Verification

- `pnpm exec vitest run cli/src/__tests__/data-dir.test.ts
cli/src/__tests__/home-paths.test.ts cli/src/__tests__/onboard.test.ts
packages/shared/src/home-paths.test.ts
packages/db/src/runtime-config.test.ts
server/src/__tests__/agent-instructions-service.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/codex-local-execute.test.ts`
- `pnpm exec vitest run packages/db/src/runtime-config.test.ts`
- `pnpm exec vitest run
server/src/__tests__/plugin-routes-authz.test.ts`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm exec vitest run
server/src/__tests__/heartbeat-process-recovery.test.ts -t "reuses the
raced stranded recovery issue"` skipped locally because embedded
Postgres did not initialize on this macOS temp host; the code path was
typechecked and is covered by Linux CI.
- Boundary check: no core references remain for `PAPERCLIP_SPACE_ID`,
`spaces migrate-default`, `@paperclipai/shared/space-paths`,
`registerSpacesCommands`, or the removed bookmarks example.
- Previous PR head `4f23e034` had green GitHub checks: `verify`, all
four serialized server shards, `e2e`, `Canary Dry Run`, `policy`, Snyk,
and `Greptile Review`. Current head `582f466d` is re-running checks
after the bookmarks deletion.

## Risks

- Plugin host changes touch shared runtime paths, so regressions would
most likely appear in adapter startup, plugin loading, or local dev path
defaults.
- Removing the bookmarks example also removes one demonstration of
plugin database namespaces plus local-folder persistence; remaining
plugin examples still cover bundled example discovery and plugin host
flows.
- The plugin package itself is intentionally deferred to the stacked
plugin-only PR, where LLM Wiki plugin-local spaces live.
- Existing installs that tested the transient root-level spaces CLI
should stop using it; this PR intentionally removes that unsupported
migration surface before merge.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI GPT-5 Codex via Codex CLI, tool use and local code execution
enabled; context window not exposed.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass, except where noted above
for host-specific embedded Postgres initialization
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Stacked follow-up: PR #5592 contains only
`packages/plugins/plugin-llm-wiki/` and targets this branch.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-10 07:34:12 -05:00
Devin Foley 2f72cb29ea chore: update drizzle-orm to 0.45.2 (#5589)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The server, DB package, and CLI all rely on the shared Drizzle ORM
dependency for core persistence flows.
> - A published install was still resolving nested `drizzle-orm@0.38.4`,
which left the production package graph behind the intended security
update.
> - The repo’s documented dependency policy says GitHub Actions owns
`pnpm-lock.yaml`, so the correct maintainer workflow is to update
dependency manifests in the feature PR and let the lockfile refresh
happen separately after merge.
> - This pull request therefore keeps the Drizzle upgrade to the package
manifests only and leaves lockfile regeneration to the existing `Refresh
Lockfile` automation.

## What Changed

- Updated `drizzle-orm` dependency declarations in `cli/package.json`,
`packages/db/package.json`, and `server/package.json` from `0.38.4` /
`^0.38.4` to `0.45.2` / `^0.45.2`.
- Re-verified the packed `@paperclipai/db` and `@paperclipai/server`
publish payloads to confirm their generated `package.json` files
advertise `drizzle-orm ^0.45.2`.
- Removed the temporary lockfile/CI follow-up commits so the branch now
matches the intended manifest-only protocol.

## Verification

- `pnpm list drizzle-orm -r --depth 0`
- `pnpm exec vitest run packages/db/src/client.test.ts
server/src/__tests__/issues-service.test.ts`
- `pnpm run test:release-registry`
- Packed `@paperclipai/db` and `@paperclipai/server` locally and
inspected the tarball `package.json` files to confirm they advertise
`drizzle-orm ^0.45.2`.

## Risks

- Low to moderate risk: the runtime code paths are unchanged, but
downstream lockfile refresh now depends on the existing post-merge
GitHub automation working as documented.
- A separate packaging/versioning issue around unpublished
`@paperclipai/plugin-sdk@1.0.0` showed up during a raw local tarball
install experiment; that is called out for reviewers but is not part of
this Drizzle bump.

## Model Used

- OpenAI Codex via the `codex_local` adapter, using a GPT-5-based coding
agent with terminal tool use and code execution. The adapter does not
expose a public exact model ID or context-window value in this
environment.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-09 21:31:57 -07:00
Dotta 778e775c35 Add secrets provider vaults and remote import (#5429)
## Thinking Path

> - Paperclip orchestrates AI-agent companies and needs secrets handling
to work across local development, hosted operators, and governed agent
execution.
> - The affected subsystem is the company-scoped secrets control plane:
database schema, server services/routes, CLI workflows, and the Secrets
settings UI.
> - The gap was that secrets were local-only and operators could not
manage provider vaults or import existing remote references without
exposing plaintext.
> - This branch adds provider vault configuration plus an AWS Secrets
Manager remote-import path while preserving company boundaries, binding
context, and audit trails.
> - I kept the PR to a single branch PR, removed unrelated
lockfile/package drift, rebased the full branch onto the current
`public-gh/master`, and addressed fresh Greptile findings.
> - The benefit is a reviewable implementation of provider-backed
secrets with focused tests covering provider selection, import
conflicts, deleted secret reuse, rotation guards, and AWS signing
behavior.

## What Changed

- Added provider vault support for company secrets, including provider
config storage, default vault handling, health checks, binding usage,
access events, and remote import preview/commit.
- Added an AWS Secrets Manager provider using SigV4 request signing,
bounded request timeouts, namespace guardrails, cached runtime
credential resolution, and external-reference linking without plaintext
reads.
- Added Secrets UI surfaces for vault management and remote import, plus
CLI/API documentation for setup and operations.
- Stabilized routine webhook secret binding paths and SSH
environment-driver fixture bindings discovered during verification.
- Addressed Greptile and CI findings: no lockfile/package drift,
monotonic migration metadata, disabled-vault default races, soft-deleted
secret hiding/recreate behavior, remove behavior with disabled vaults,
soft-deleted external-reference re-import, non-active rotation guards,
managed-secret soft deletion through PATCH, and per-call AWS SDK
credential client churn.
- Rebased this branch onto `public-gh/master` at `0e1a5828` and
force-pushed with lease to keep this as the single PR for the branch.

## Verification

- `git fetch public-gh master`
- `git rebase public-gh/master`
- `git diff --name-only public-gh/master...HEAD | grep
'^pnpm-lock\.yaml$' || true` confirmed `pnpm-lock.yaml` is not in the PR
diff.
- Confirmed migration ordering: master ends at `0081_optimal_dormammu`;
this PR adds `0082_dry_vision` and
`0083_company_secret_provider_configs`.
- Inspected migrations for repeat safety: new tables/indexes use `IF NOT
EXISTS`; foreign keys are guarded by `DO $$ ... IF NOT EXISTS`; column
additions use `ADD COLUMN IF NOT EXISTS`.
- `pnpm -r typecheck` passed before the Greptile follow-up commits.
- `pnpm test:run` ran the full stable Vitest path before the Greptile
follow-up commits; it completed with 3 timing-related failures under
parallel load: `codex-local-execute.test.ts`,
`cursor-local-execute.test.ts`, and `environment-service.test.ts`.
- `pnpm --filter @paperclipai/server exec vitest run
src/__tests__/codex-local-execute.test.ts
src/__tests__/cursor-local-execute.test.ts
src/__tests__/environment-service.test.ts` passed on targeted rerun
(`24/24`).
- `pnpm build` passed before the Greptile follow-up commits. Vite
reported existing chunk-size/dynamic-import warnings.
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server
exec vitest run src/__tests__/secrets-service.test.ts` passed (`26/26`).
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server
exec vitest run src/__tests__/aws-secrets-manager-provider.test.ts
src/__tests__/secrets-service.test.ts` passed (`39/39`).
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server
typecheck` passed.
- Captured Storybook screenshots from `ui/storybook-static` for visual
review.
- Latest PR checks on `5ca3a5cf`: `policy`, serialized server suites
1/4-4/4, `Canary Dry Run`, `e2e`, `security/snyk`, and `Greptile Review`
pass; aggregate `verify` is still registering the completed child
checks.
- Greptile review loop continued through the latest requested pass; all
Greptile review threads are resolved and the latest `Greptile Review`
check on `5ca3a5cf` passed with 0 comments added.

## Screenshots

Before: the provider-vault and remote-import surfaces did not exist on
`master`; these are after-state screenshots from the Storybook fixtures.

![Secrets
inventory](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-2339-secrets-make-a-plan/doc/pr/5429/secrets-inventory.png)

![Secret binding
picker](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-2339-secrets-make-a-plan/doc/pr/5429/secret-binding-picker.png)

![Environment editor with
secrets](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-2339-secrets-make-a-plan/doc/pr/5429/env-editor-with-secrets.png)

## Risks

- Migration risk: this adds new secret provider tables and extends
existing secret rows. The migrations were checked for monotonic ordering
and idempotent guards, but reviewers should still inspect upgrade
behavior carefully.
- Provider risk: AWS support uses direct SigV4 requests. Automated tests
cover signing, request timeouts, vault-config selection, namespace
guardrails, pending-version archival, sanitized provider errors, and
service-level cleanup paths. A real-vault AWS smoke test remains
deployment validation for an operator with AWS credentials rather than
an unverified merge blocker in this local branch.
- UI risk: the Secrets page and import dialog are large new surfaces;
screenshots are included above for reviewer inspection.
- Verification risk: the full local stable test command hit
parallel-load timing failures, although the exact failed files passed
when rerun directly.
- Operational risk: remote import intentionally avoids plaintext reads;
operators must understand that imported external references resolve at
runtime and may fail if AWS permissions change.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent with local shell/tool use in the
Paperclip worktree. Exact context-window size was not exposed by the
runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [ ] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 18:22:17 -05:00
Devin Foley 06e6ee25cd Add Daytona sandbox provider plugin (#5580)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents need isolated sandbox environments to execute work safely;
Paperclip already supports E2B as a sandbox provider plugin
> - Users want to use Daytona (https://www.daytona.io/) as an
alternative sandbox backend, but no plugin existed for it
> - Without a Daytona plugin, teams that prefer Daytona's
pricing/regions/runtime can't run Paperclip agents on it
> - This pull request adds a `@paperclip/sandbox-provider-daytona`
plugin that mirrors the existing E2B plugin shape and wires up Daytona's
`@daytonaio/sdk` for sandbox lifecycle, command execution, and shell
detection
> - The benefit is that operators can pick Daytona as a first-class
sandbox provider without touching core code, broadening Paperclip's
runtime options

## What Changed

- New plugin package `packages/plugins/sandbox-providers/daytona` with
manifest, worker entry, and provider implementation backed by
`@daytonaio/sdk`
- Implements sandbox create/destroy/exec/upload/download lifecycle,
shell command detection, and config/env wiring consistent with the E2B
plugin
- Adds unit tests under `src/plugin.test.ts` and a README documenting
setup and the `DAYTONA_API_KEY` requirement
- Minor adjustments in `scripts/paperclip-issue-update.sh`,
`packages/shared/src/issue-thread-interactions.test.ts`, and
`packages/shared/src/validators/issue.ts` to support the integration

## Verification

- Re-ran the full sandbox provider matrix on the QA Paperclip instance
using Daytona as the runtime — all 6 adapters executed inside the
Daytona sandbox with zero `environmentExecute` timeouts
- 5/6 adapters pass cleanly (or with informational warns); the only
failure is `codex_local`, which is an OpenAI quota/billing issue
unrelated to Daytona
- `pnpm --filter @paperclip/sandbox-provider-daytona test` runs the
plugin unit tests

## Risks

- New optional plugin; no behavior change for users who don't enable it
- Requires `DAYTONA_API_KEY` for runtime use — documented in the plugin
README
- Daytona SDK is a new external dependency; tracked in the plugin's own
package.json so it doesn't affect the core install footprint

## Model Used

- Claude Opus 4.7 (`claude-opus-4-7`), extended thinking, tool use
enabled

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots (N/A — backend plugin)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-09 11:50:12 -07:00
Devin Foley 0e1a582831 Revert "Add experimental newest-first issue thread" (#5460)
This is actually bad. Glad it was under experiments.
2026-05-07 16:50:31 -07:00
Devin Foley a904effb96 Add experimental newest-first issue thread (#5455)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies, so issue
threads are a core operator surface for reviewing work.
> - The issue detail page is the place where humans read agent messages,
user comments, and execution context together.
> - That thread originally rendered oldest-first, which made recent
activity harder to see during active review.
> - Reversing the thread order changes navigation expectations,
timestamp placement, and the "Jump to latest" affordance, so the UI
behavior needed to move as a coherent set.
> - Because this is a visible core-product behavior shift, it also
needed a safe rollout path instead of becoming the default immediately.
> - This pull request adds the newest-first issue thread behavior behind
an Experimental setting, updates the thread UI to match that mode, and
keeps the legacy oldest-first experience unchanged by default.
> - The benefit is that reviewers can opt into a more recent-first issue
workflow without forcing a global behavior change on every Paperclip
instance.

## What Changed

- Reversed issue thread rendering so the newest comments and messages
appear first when the experiment is enabled.
- Moved the plain comment timestamp into the card header in newest-first
mode and kept the legacy timestamp placement for oldest-first mode.
- Moved the `Jump to latest` control to the bottom of the thread in
newest-first mode while leaving the existing top placement for the
legacy mode.
- Added the `Enable Newest-First Issue Thread` experimental instance
setting and wired issue detail to read that toggle.
- Added regression coverage for thread order, timestamp placement,
jump-button placement, and the issue-detail experiment toggle behavior.

## Verification

- `pnpm -r typecheck`
- `pnpm test:run`
- `pnpm build`
- Focused checks that also passed during issue review:
- `pnpm vitest run src/components/IssueChatThread.test.tsx
src/pages/IssueDetail.test.tsx` in `ui/`
- `pnpm vitest run src/__tests__/instance-settings-routes.test.ts` in
`server/`
- Manual review path:
- Enable `Instance Settings > Experimental > Enable Newest-First Issue
Thread`
- Open an issue with comments/messages and confirm newest activity
renders first, timestamps move into the header, and `Jump to latest`
sits below the thread
- Disable the experiment and confirm the legacy oldest-first behavior
returns

## Risks

- Low risk: the behavioral change is gated behind an instance-level
experimental toggle and defaults off.
- The main regression risk is thread navigation drift between the two
modes, especially around anchor scrolling and the `Jump to latest`
affordance.
- There is some UI coupling between issue-detail query state and
experimental settings fetches, so future changes in that area should
keep both modes covered.
- Screenshots are not attached in this PR body; verification is
described with automated coverage and manual steps instead.

> I checked [`ROADMAP.md`](ROADMAP.md). This is a scoped issue-thread UX
improvement and rollout gate, not a duplicate of a roadmap-level planned
core feature.

## Model Used

- OpenAI Codex via the local `codex_local` Paperclip adapter,
GPT-5-based coding agent with terminal tool use and local code execution
in this repository worktree.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-07 16:45:12 -07:00
Devin Foley 4269545b19 Stabilize Cursor sandbox runtime resolution (#5446)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The Cursor adapter spawns the Cursor CLI against local, SSH, and
sandbox execution targets; on a fresh sandbox lease, it has to resolve
where Cursor was installed
> - The previous resolver only looked for `~/.local/bin/cursor-agent`
even though the official installer (and the adapter's own
`SANDBOX_INSTALL_COMMAND`) sometimes lays the binary down as
`~/.local/bin/agent`, so a sandbox where the install ran successfully
would still fail to find the CLI
> - This pull request lets the resolver accept either basename and lets
the caller pass an optional `remoteSystemHomeDirHint` so a probe doesn't
pay the cost of a remote `printf $HOME` round-trip when the home
directory is already known
> - The benefit is sandboxed Cursor runs find the binary that the
install actually produced, and runtime probes are cheaper when the home
dir is already resolved

## What Changed

- `packages/adapters/cursor-local/src/server/remote-command.ts`: accept
either `agent` or `cursor-agent` as the preferred basename; new optional
`remoteSystemHomeDirHint` short-circuits the home-dir probe
- `packages/adapters/cursor-local/src/server/execute.ts`: thread the
home-dir hint through, prefer the resolved binary path, and shift the
effective execution cwd to the per-run managed subdirectory once the
runtime is prepared
- New `remote-command.test.ts` and `execute.test.ts` cover both
basenames, the hint short-circuit, and the cwd shift
- `packages/adapters/cursor-local/src/index.ts`: update doc string to
reflect the broader resolution
- `execute.remote.test.ts` updated to expect the managed-subdirectory
cwd shape introduced by the cwd shift

## Verification

- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-cursor-local` — 6/6 passing
- `pnpm typecheck` clean
- Manual: a fresh sandbox lease with `npm install -g …`-installed Cursor
(binary lands as `~/.local/bin/agent`) now runs cleanly through the
adapter

## Risks

Low. Resolver is strictly broader (matches a superset of paths);
existing setups with `~/.local/bin/cursor-agent` continue to work. The
home-dir hint is opt-in; callers that don't pass it get the existing
probe behavior. Cursor's effective execution cwd now matches the rest of
the adapters (per-run managed subdirectory) — sessions previously rooted
at the workspace root will land in the new subdirectory.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests cover
both basenames + hint short-circuit + cwd shift
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---

> **Stacked PR.** Sits on top of #5445 (which sits on #5444). Cumulative
diff against `master` includes both of those PRs' content; the files
touched by *this* PR's commit are listed under "What Changed" above.
Will rebase onto `master` and force-push once the prerequisite PRs
merge.
2026-05-07 15:00:28 -07:00
Devin Foley fe3904f434 Stabilize runtime probes and Codex env tests (#5445)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Adapters expose a Test action that probes the configured runtime —
install, resolvability, hello — to give operators a fast yes/no on
whether an environment is healthy
> - The Codex test path was running its hello probe directly without
going through the managed-runtime preparation that production runs use,
so a healthy production setup could still report a probe failure
> - The plugin worker manager wasn't surfacing terminated workers
cleanly, leaving the runtime probe waiting on a dead worker until the
request timed out
> - This pull request routes the Codex test probe through
`prepareAdapterExecutionTargetRuntime` (so it sees the same managed
Codex home production sees), exposes `commandCwd` on
`createCommandManagedRuntimeClient` so callers can target a per-probe
directory without leaking the workspace `remoteCwd`, and propagates
plugin-worker termination as a usable error instead of a hang
> - The benefit is the Codex Test action mirrors production behavior
end-to-end, and probes against a terminated plugin worker fail fast
instead of timing out

## What Changed

- `packages/adapter-utils/src/command-managed-runtime.ts`: rename the
`remoteCwd` knob to `commandCwd` so callers can target a per-probe
directory without inheriting the workspace cwd; matching test coverage
in `command-managed-runtime.test.ts`
- `packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`:
small fixes to keep callback bridge stop semantics deterministic
- `packages/adapters/codex-local/src/server/test.ts`: thread the Codex
hello probe through `prepareAdapterExecutionTargetRuntime` +
`prepareManagedCodexHome` so the probe sees the same managed home
production sees; new `test.remote.test.ts` covers the remote probe path
- `packages/adapters/cursor-local/src/server/execute.ts`: small
probe-side cleanup that aligns with the new commandCwd contract
- `server/src/services/plugin-worker-manager.ts`: surface plugin-worker
termination as a structured error so callers fail fast; new
`plugin-worker-terminated.cjs` fixture and
`plugin-worker-manager.test.ts` cases pin the behavior

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils
--project @paperclipai/adapter-codex-local --project
@paperclipai/adapter-cursor-local --project @paperclipai/server` —
1749/1750 passing (1 unrelated skip)
- `pnpm typecheck` clean

## Risks

Low–medium. The `remoteCwd → commandCwd` rename is a parameter renaming
on an internal helper used only by adapter test/execute paths in this
repo. The plugin-worker-terminated path was previously a hang; failing
fast may surface latent timeouts as explicit termination errors in
callers that already expected them.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests cover
commandCwd, plugin-worker termination, and Codex remote test path
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---

> **Stacked PR.** Sits on top of #5444 which adds the per-run runtime
API surface this PR builds on. Cumulative diff against `master` includes
that PR's content; the files touched by *this* PR's commit are listed
under "What Changed" above. Will rebase onto `master` and force-push
once #5444 merges.
2026-05-07 14:52:31 -07:00
Devin Foley 12cb7b40fd Harden remote workspace sync and restore flows (#5444)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - When an agent runs against a remote target, Paperclip syncs the
workspace out to the remote at run start and restores changes back to
the local workspace at run end
> - The previous restore flow naïvely overwrote local files with
whatever the remote returned, so files that the remote run never touched
but had timestamp/mode drift could be needlessly rewritten — and a
single static `refs/paperclip/ssh-sync/imported` ref made concurrent SSH
workspace exports race on the same git ref
> - This pull request adds a `workspace-restore-merge` module that diffs
a pre-run snapshot against the post-run remote state and only writes
back files the remote actually changed; SSH workspace exports now use a
per-import unique ref so concurrent runs can't trample each other
> - Every adapter's execute path threads the snapshot through
`prepareAdapterExecutionTargetRuntime` so the merge has the baseline it
needs
> - The benefit is workspace restores no longer churn untouched files,
and concurrent SSH runs no longer collide on the import ref

## What Changed

- `packages/adapter-utils/src/workspace-restore-merge.{ts,test.ts}`: new
module — directory snapshot (kind/mode/sha256/symlink target) plus
snapshot-aware merge that writes only the files the remote changed
- `packages/adapter-utils/src/ssh.ts`: SSH workspace export uses a
per-import unique ref (`refs/paperclip/ssh-sync/imported/<uuid>`);
restore goes through the new merge helper; `ssh-fixture.test.ts` covers
the unique-ref + merge paths
- `packages/adapter-utils/src/sandbox-managed-runtime.ts` +
`remote-managed-runtime.ts`: thread the snapshot/merge through the
sandbox and SSH paths
- `packages/adapter-utils/src/server-utils.{ts,test.ts}` +
`execution-target.ts`: helpers for capturing the pre-run snapshot;
`prepareAdapterExecutionTargetRuntime` gains required `runId` and
optional `workspaceRemoteDir`, and returns the realized
`workspaceRemoteDir`
- Each adapter's `execute.ts` (acpx, claude, codex, cursor, gemini,
opencode, pi) takes the snapshot at run start and passes it through to
the runtime restore
- Remote execute test mocks updated to match the new
`prepareWorkspaceForSshExecution` return shape and the per-run
`${managedRemoteWorkspace}` cwd subdirectory

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils
--project @paperclipai/adapter-acpx-local --project
@paperclipai/adapter-claude-local --project
@paperclipai/adapter-codex-local --project
@paperclipai/adapter-cursor-local --project
@paperclipai/adapter-gemini-local --project
@paperclipai/adapter-opencode-local --project
@paperclipai/adapter-pi-local` — 196/196 passing
- `pnpm typecheck` clean across the workspace

## Risks

Medium. The restore path now writes a strict subset of what it
previously did — files the remote did not touch are no longer rewritten.
If any flow was relying on a touch-without-content-change being copied
back (timestamp or permission propagation only), that behavior is now
skipped. Snapshot capture adds an O(N-files-in-workspace) hash pass at
run start; the cost is bounded by the existing exclude list. The `runId`
parameter on `prepareAdapterExecutionTargetRuntime` is now required —
every in-tree caller is updated; out-of-tree adapter authors need to
pass it.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new module +
every adapter execute path covered
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-07 14:44:45 -07:00
Dotta e400315cbf Guard assigned backlog liveness (#5428)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The issue graph and liveness recovery system decide whether assigned
work is executable or parked
> - Assigned issues created without an explicit status could silently
land in backlog, making parents look blocked with no productive wake
path
> - The server, shared validators, recovery analysis, and UI all need to
agree on that execution semantic
> - This pull request makes assigned issue creation default to `todo`,
flags assigned backlog blockers, and surfaces the state in the board
> - The benefit is that parked assigned work becomes intentional and
visible instead of creating silent liveness stalls

## What Changed

- Adds contract tests for assigned issue creation defaults.
- Defaults assigned issue creation to `todo` when status is omitted
while preserving explicit `backlog` parking.
- Exposes `resolveCreateIssueStatusDefault` through shared validators.
- Teaches liveness/blocker attention paths to distinguish assigned
backlog blockers.
- Adds UI notices, row/header badges, and issue detail safeguards for
assigned backlog blockers.
- Adds Storybook fixtures and execution-semantics documentation for the
assigned-backlog behavior.

## Verification

- `pnpm run preflight:workspace-links && pnpm exec vitest run
packages/shared/src/validators/issue.test.ts
server/src/__tests__/issue-assigned-backlog-contract-routes.test.ts
server/src/__tests__/issue-blocker-attention.test.ts
server/src/__tests__/issue-liveness.test.ts
server/src/__tests__/heartbeat-issue-liveness-escalation.test.ts
ui/src/components/IssueAssignedBacklogNotice.test.tsx
ui/src/components/IssueRow.test.tsx` — 50 passed, 23 skipped.
- Skipped tests were embedded Postgres suites on this host with the repo
skip message: `Postgres init script exited with code null. Please check
the logs for extra info. The data directory might already exist.`
- Pairwise merge check against the issue-controls PR branch completed
without conflicts via `git merge --no-commit --no-ff` in a temporary
worktree.
- Screenshots for assigned-backlog UI states:
[light](docs/pr-screenshots/pr-5428/assigned-backlog-light.png),
[dark](docs/pr-screenshots/pr-5428/assigned-backlog-dark.png).
- Follow-up checks: `pnpm --filter /ui typecheck`; `pnpm --filter
/mcp-server build`; `pnpm --filter /mcp-server test`; `pnpm exec vitest
run packages/shared/src/validators/issue.test.ts`; focused UI component
tests.
- Remote PR checks on head `6300b3c`: policy, verify, serialized server
shards 1/4-4/4, Canary Dry Run, e2e, Greptile Review, and Snyk all
passed.

## Risks

- Medium: changes status defaulting for assigned issue creation when the
caller omits status. Explicit `backlog` remains supported, and
server/shared tests cover both paths.
- Medium: liveness classification changes can affect blocker attention
labels; focused service and UI tests cover the new assigned-backlog
state.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex coding agent, GPT-5 model family (`gpt-5`), tool-enabled
Paperclip heartbeat environment. Context window and internal reasoning
mode are not exposed by the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-07 12:25:26 -05:00