Commit Graph

555 Commits

Author SHA1 Message Date
Devin Foley 028c5aa00a Stop leaking host process.env into the remote OpenCode SSH probe (#5274)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The OpenCode adapter runs against local, SSH, and sandbox execution
targets
> - The Test path's hello probe spreads the Paperclip host's
`process.env` into the remote process env, which over SSH gets exported
on the remote shell
> - On a Linux SSH target, `HOME=/Users/...` and a host XDG_CONFIG_HOME
pointing at a macOS `/var/folders/...` temp dir cause OpenCode to walk a
host-only path and fail with `EACCES: permission denied, mkdir '/Users'`
> - This pull request stops the leak by passing only user-configured
adapter env to the probe when the target is remote, matching the pattern
already used by claude-local, codex-local, and gemini-local
> - The benefit is the OpenCode hello probe now passes end-to-end
against an SSH target without spurious filesystem errors

## What Changed

- `prepareOpenCodeRuntimeConfig` short-circuits when the target is
remote — the host-fs temp config dir is meaningless and harmful for a
remote target
- `test.ts` passes only the user-configured adapter env (no host
`process.env` spread) to `runAdapterExecutionTargetProcess` when
`targetIsRemote`
- Local probes still get the full `runtimeEnv` so headless permission
injection keeps working

## Verification

- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-opencode-local`
- `pnpm typecheck` clean
- Manual: SSH OpenCode hello probe goes from `EACCES … mkdir '/Users'`
to `opencode_hello_probe_passed`

## Risks

Low risk — local probe behavior is unchanged; the change only narrows
the env passed to remote targets, matching the pattern already shipped
in sibling adapters.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — pattern mirrors
existing sibling tests
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 08:00:19 -07:00
Devin Foley ea7f53fd7d Handle Gemini CLI v0.38 stream-json wire format across parser, UI, and CLI formatter (#5273)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent uses an adapter that drives a CLI (Claude, Gemini, Codex,
etc.)
> - The Gemini adapter parses a JSONL transcript stream the CLI emits to
learn what the model said
> - Gemini CLI v0.38 changed the transcript shape: assistant text now
comes through `type=message` with `role`/`content` and terminal status
comes through `type=status` / `type=stats`
> - The existing parser was written against the older `type=assistant` /
`type=result` shape, so post-v0.38 outputs left the parsed summary empty
and downgraded the SSH hello probe to "unexpected output"
> - This pull request updates every Gemini consumer (server parser, UI
parser, CLI formatter) to accept the v0.38 shape while keeping the
legacy shape working
> - The benefit is the Gemini adapter handles current upstream output
without losing backward compatibility, with explicit test coverage for
both shapes

## What Changed

- `packages/adapters/gemini-local/src/server/parse.ts` recognizes
`type=message` events with role/content and stops downgrading them
- `packages/adapters/gemini-local/src/ui/parse-stdout.ts` mirrors the
parser changes for the live UI transcript
- `packages/adapters/gemini-local/src/cli/format-event.ts` formats the
new event shape correctly for CLI output
- `parse.test.ts` and `parse-stdout.test.ts` add v0.38 coverage;
`gemini-local-adapter.test.ts` and `execute.remote.test.ts` switch
happy-path fixtures to the current real wire format and keep dedicated
tests for the older schema

## Verification

- `pnpm vitest run --no-coverage --project
@paperclipai/adapter-gemini-local` — full suite passes including new
v0.38 cases and preserved legacy cases
- `pnpm typecheck` clean

## Risks

Low risk — additive event handling. Legacy event shape path is preserved
with its own tests, so existing fixtures continue to parse identically.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 08:00:14 -07:00
Dotta 3c73ed26b5 Expand plugin host surface (#5205)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The plugin system is the extension boundary for optional product
capabilities
> - Rich plugins need more than a worker entrypoint: they need scoped
database storage, local project folders, managed agents/routines, host
navigation, and reusable UI components
> - The LLM Wiki work exposed those missing host surfaces while keeping
plugin code outside the core control plane
> - This pull request expands the core plugin host, SDK, server APIs,
and UI bridge so plugins can declare and use those surfaces
> - The benefit is that future plugins can integrate with Paperclip
through documented, validated contracts instead of bespoke server or UI
imports

## What Changed

- Added plugin-managed database namespaces and migration tracking,
including Drizzle schema/migration files and SQL validation for
namespace isolation.
- Added server support for plugin local folders, managed agents, managed
routines, scoped plugin APIs, and plugin operation visibility.
- Expanded shared plugin manifest/types/validators and SDK
host/testing/UI exports for richer plugin surfaces.
- Added reusable UI pieces for file trees, managed routines, resizable
sidebars, route sidebars, and plugin bridge initialization.
- Updated plugin docs and example plugins to use the expanded host and
SDK surface.

## Verification

- `pnpm install --frozen-lockfile`
- `pnpm run preflight:workspace-links && pnpm exec vitest run
packages/shared/src/validators/plugin.test.ts
server/src/__tests__/plugin-database.test.ts
server/src/__tests__/plugin-local-folders.test.ts
server/src/__tests__/plugin-managed-agents.test.ts
server/src/__tests__/plugin-managed-routines.test.ts
server/src/__tests__/plugin-orchestration-apis.test.ts
ui/src/api/plugins.test.ts ui/src/components/FileTree.test.tsx
ui/src/components/ResizableSidebarPane.test.tsx
ui/src/pages/PluginPage.test.tsx ui/src/plugins/bridge.test.ts` passed:
11 files, 67 tests.
- Confirmed this PR changes 89 files and does not include
`pnpm-lock.yaml` or `.github/workflows/*`.

## Risks

- Medium: this expands plugin host contracts across db/shared/server/ui
and includes a new core migration (`0076_useful_elektra.sql`).
- The plugin database namespace validator is intentionally restrictive;
plugin authors may need follow-up affordances for SQL patterns that
remain blocked.
- Merge this before the LLM Wiki plugin PR so the plugin can resolve the
new SDK and host APIs.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool-enabled shell/git/GitHub
workflow. Context window size was not exposed by the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-05 07:42:57 -05:00
Dotta d6bee62f02 Fix Cloud tenant issue identifier routes (#5196)
## Summary

- Allow Cloud tenant issue identifiers with alphanumeric prefixes, such
as `PC1897-1`, to normalize as issue references.
- Resolve those identifiers through issue detail/update routes, active
run/live run polling, activity, costs, and `issueService.getById`.
- Keep UI issue-link parsing aligned so tenant links normalize back to
`/issues/<IDENTIFIER>`.

## Root Cause

Cloud tenant issue prefixes include digits from the stack-id hash. The
app-side route normalization still accepted only all-letter prefixes, so
`/api/issues/PC1897-1` skipped identifier lookup and fell through as a
non-UUID id.

## Verification

- `pnpm exec vitest run packages/shared/src/issue-references.test.ts
ui/src/lib/issue-reference.test.ts
server/src/__tests__/issue-identifier-routes.test.ts
server/src/__tests__/activity-routes.test.ts
server/src/__tests__/costs-service.test.ts
server/src/__tests__/agent-live-run-routes.test.ts
server/src/__tests__/issues-service.test.ts`
- `pnpm --filter @paperclipai/shared typecheck && pnpm --filter
@paperclipai/server typecheck`
- `git diff --check`

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-04 13:20:58 -05:00
Devin Foley a5430f010d Handle Gemini assistant message events in JSONL parser (#5143)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies, including
agents
>   running the Gemini CLI (`gemini-local` adapter)
> - The Gemini CLI emits a JSONL event stream during a run that the
adapter
> parses to extract the assistant's response text, tool results, and
usage
> - Recent versions of the Gemini CLI emit assistant responses as
> `{ "type": "message", "role": "assistant", "content": ... }` events in
>   addition to the previously-handled event shapes
> - The parser was not handling the new event type, so the assistant's
actual
> response text was being silently dropped from parsed output. Callers
ended
>   up with empty assistant messages even when Gemini had successfully
>   responded
> - This PR teaches the parser to recognize `{type: "message", role:
>   "assistant"}` events and extract their content text via the same
>   `collectMessageText` helper used for other message-shaped events
> - The benefit is that Gemini runs surface the assistant's real
response in
> downstream consumers (issue comments, run logs, downstream agent
context)
>   instead of vanishing

## What Changed

- `packages/adapters/gemini-local/src/server/parse.ts`: in
`parseGeminiJsonl(...)`, add a branch for `event.type === "message"`
with
  `role === "assistant"` that calls
  `messages.push(...collectMessageText(event.content))`.
- `packages/adapters/gemini-local/src/server/parse.test.ts`: ~19 lines
of
  coverage for the new branch.

## Verification

- `pnpm --filter @paperclipai/adapter-gemini-local test -- parse`
- Manual QA: run a Gemini agent on an issue, confirm the assistant's
response
appears as the issue comment / run output. Before this fix the comment
was
  empty even when the run completed successfully.

## Risks

- Tightly scoped: 8 lines of production code in one parser branch. No
effect
  on existing event shapes or other adapters.
- If the Gemini CLI changes its event schema again, this branch may need
to be
  revisited — but adding it is strictly additive over current behaviour.

## Model Used

- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-03 18:36:50 -07:00
Devin Foley 6c090f84a9 Strip inherited host shell env from SSH remote execution (#5142)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents executing on remote SSH hosts receive an env map built from
the host
>   process's env plus per-run additions like `PAPERCLIP_API_KEY`,
>   `PAPERCLIP_RUN_ID`, etc.
> - The env map currently includes inherited host vars by default,
including
> identity-bound ones like `PATH`, `HOME`, `USER`, `NVM_DIR`, `XDG_*` —
> variables whose values are meaningful only on the host they came from
> - Sending the host's `PATH` (containing host-only directories like a
local
> nvm install path) to a remote SSH box overrides the remote's actual
`PATH`
> and breaks command resolution. Same hazard for `HOME` (commands
looking for
> config files end up in a non-existent dir), `USER` (writes go to the
wrong
>   path), etc.
> - This PR adds `sanitizeSshRemoteEnv()` that drops inherited
identity-bound
> vars when their value matches the host process's value. Explicitly-set
> values pass through untouched, so callers that genuinely want to
override
>   remote `PATH` etc. still can — but accidental leakage from
>   `process.env` is filtered.
> - The benefit is that SSH remote execution stops corrupting the remote
> shell's environment with host-shaped paths, so commands resolve
correctly
>   against the remote PATH and config files land in the remote `HOME`

## What Changed

- New `sanitizeSshRemoteEnv(env, inheritedEnv = process.env)` in
`packages/adapter-utils/src/server-utils.ts`. The identity-bound key set
is:
    - `PATH`, `HOME`, `PWD`, `SHELL`, `USER`, `LOGNAME`
    - `NVM_DIR`, `TMPDIR`, `TMP`, `TEMP`
    - `XDG_CONFIG_HOME`, `XDG_CACHE_HOME`, `XDG_DATA_HOME`,
      `XDG_STATE_HOME`, `XDG_RUNTIME_DIR`
For any key in this set, the entry is dropped iff the env value equals
the
  inherited (host process) value. Other keys pass through unchanged.
- `readEnvValueCaseInsensitive(...)` helper handles Windows-style
  case-insensitive env var lookups.
- Wired into `resolveSpawnTarget(...)` for the SSH transport. Sandbox
and local
  paths are unaffected.
- Tests added in `server-utils.test.ts` (~50 lines) covering: matching
keys
filtered, mismatched keys preserved, non-identity keys passed through,
case
  insensitivity.

## Verification

- `pnpm --filter @paperclipai/adapter-utils test -- server-utils`
- Manual QA: run any adapter against an SSH-backed environment, confirm
remote command resolution works (e.g. `node`, `npm`, the adapter's CLI)
and
config files land in the remote user's `HOME`. Compare to the prior
behaviour
by transiently re-introducing the inherited `PATH` and watching commands
  fail with `command not found`.

## Risks

- Behavioural shift: SSH remote execution previously passed inherited
host env
  vars verbatim. Code that relied on that (e.g. a remote command somehow
  expecting the host's `PATH`) will see different behaviour. None of the
  adapter code in this repo has such a dependency.
- Edge case: if a caller explicitly sets `PATH` to the same value as the
host's
`PATH` (literally — same exact string), the sanitizer drops it as a
leak.
  In practice no caller constructs the env this way.
- Windows host: case-insensitive lookup handles `Path` vs `PATH`
correctly.
  Tested.

## Model Used

- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-03 18:36:13 -07:00
Devin Foley 90631b09b3 Let adapters declare runtime command spec for remote provisioning (#5141)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies, running
adapter
> commands like `claude`, `codex`, `pi` either locally or on remote
runtimes
>   (SSH hosts, sandboxes, etc.)
> - On a fresh remote runtime — particularly an ephemeral sandbox — the
> adapter's CLI may not be installed yet. Today operators handle this
via
> external configuration (e.g. a project-level `provisionCommand` shell
> script) that has to know about every adapter the operator might want
to use
> - This means every adapter has its own well-known npm package, but
operators
>   end up writing duplicate provision shell scripts that paste together
> `npm install -g @anthropic-ai/claude-code`, `npm install -g
@openai/codex`,
>   etc. — knowledge the adapter itself already has
> - This PR moves that knowledge into the adapter modules: each adapter
declares
> how its runtime command should be detected and (if applicable)
installed
> via `getRuntimeCommandSpec(config)`. The execution path runs the
adapter's
> own install command on remote sandbox targets before launching, so a
fresh
> sandbox bootstraps itself instead of requiring a hand-written
provision script
> - The benefit is fewer footguns for operators provisioning remote
runtimes,
>   and a clean place for new adapters to plug in their install recipe

## What Changed

- New types in `packages/adapter-utils/src/types.ts`:
    - `AdapterRuntimeCommandSpec` describing `command`, optional
      `detectCommand`, and optional `installCommand`
    - Optional `getRuntimeCommandSpec(config)` on `ServerAdapterModule`
- Optional `runtimeCommandSpec` on `AdapterExecutionContext` so adapters
      receive the resolved spec at execute time
- New helper `ensureAdapterExecutionTargetRuntimeCommandInstalled(...)`
in
`packages/adapter-utils/src/execution-target.ts` that runs the install
command
on remote targets when `transport === "sandbox"`. SSH and local targets
are
  no-ops. Throws on timeout or non-zero exit so failures surface early.
- Each of `claude-local`, `codex-local`, `cursor-local`, `gemini-local`,
  `opencode-local`, `pi-local`'s `execute.ts` now reads
`ctx.runtimeCommandSpec?.installCommand` and calls the helper before
launching
  the adapter command.
- `server/src/adapters/registry.ts` declares `getRuntimeCommandSpec` for
each
  adapter:
- claude/codex/gemini/opencode/pi-local: `npm install -g <package>`
recipe via
a shared `buildNpmRuntimeCommandSpec` helper, with a defensive guard
that
only auto-installs when the configured `command` matches the well-known
      fallback (custom binaries are left alone).
- cursor-local: declares `command` only; no auto-install (no public npm
      package), preserving the existing manual setup.
- `server/src/services/heartbeat.ts` resolves the spec via
`adapter.getRuntimeCommandSpec?.(runtimeConfig)` and passes it through
to
  `AdapterExecutionContext`.
- Tests added in `execution-target.test.ts` (~75 lines), e2b
`plugin.test.ts` (~32 lines), and `environment-run-orchestrator.test.ts`
  (~76 lines).

## Verification

- `pnpm --filter @paperclipai/adapter-utils test`
- `pnpm --filter @paperclipai/server test --
environment-run-orchestrator`
- `pnpm --filter @paperclipai/sandbox-providers-e2b test`
- Manual QA: run an adapter (claude/codex/etc.) against a fresh
sandbox-backed
environment that does NOT have the adapter CLI pre-installed. Confirm
the
install runs once at the start of the agent run and the adapter then
launches
successfully. Re-run on the same sandbox; confirm the install command is
  idempotent and the second run starts faster.
- Confirm SSH and local execution paths are unaffected (gated by
  `transport === "sandbox"`).

## Risks

- Behavioural shift on sandbox runs: a new install step now runs at the
start
  of every sandbox agent run for adapters with `installCommand` set. The
install commands are idempotent (`if ! command -v X >/dev/null 2>&1;
then
npm install -g <pkg>; fi`), so this is fast on warm sandboxes. On a cold
  sandbox, the first run takes longer.
- Operators who used the legacy project-level `provisionCommand` to
install
adapter CLIs can drop that part of their script; the adapter handles it
now.
  Existing scripts continue to work — installs are idempotent.
- The cursor-local adapter has no auto-install (no public npm package).
  Behaviour for cursor-local on sandboxes is unchanged.
- New optional surface on `ServerAdapterModule`. Plugins that don't
implement
  `getRuntimeCommandSpec` retain previous behaviour (no auto-install).

## Model Used

- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-03 18:35:36 -07:00
Devin Foley 2dce81fbf6 Add optional bridge proxy request logging via PAPERCLIP_BRIDGE_DEBUG (#5140)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents on remote sandboxes call back into the Paperclip control
plane via a
> callback bridge — the host process running the bridge proxies HTTP
requests
>   from the sandbox to the Paperclip API
> - When something goes wrong end-to-end (sandbox can't reach Paperclip,
requests
> timing out, malformed responses), it's hard to tell whether the bridge
> processed the request, what URL/method it used, and what the upstream
>   responded with
> - There was no built-in way to log bridge proxy traffic without
modifying
>   adapter code or attaching a debugger
> - This PR adds opt-in stdout logging of every bridge proxy request and
response
> (method, path, query, status), gated behind `PAPERCLIP_BRIDGE_DEBUG`
so it
>   stays off by default
> - The benefit is that operators can flip a single env var to get full
visibility
> into bridge traffic when debugging remote runs, without changing code

## What Changed

- `packages/adapter-utils/src/execution-target.ts`:
`startAdapterExecutionTargetPaperclipBridge`'s `handleRequest` now logs
each
  proxied request and response when `PAPERCLIP_BRIDGE_DEBUG` is truthy:
    - `[paperclip] Bridge proxy <METHOD> <path>?<query>` before fetch
- `[paperclip] Bridge proxy response <status> for <METHOD>
<path>?<query>` after
- Logging is no-op when the env var is unset/`"0"`/`"false"`.

## Verification

- Set `PAPERCLIP_BRIDGE_DEBUG=1` in the host process env, run an agent
against
a sandbox-backed environment, confirm the bridge log lines appear in
stdout.
- Unset the env var and confirm no extra log lines appear during normal
runs.

## Risks

- Off-by-default, no observable change for shipping users.
- When enabled, the logging is verbose — every API call from the sandbox
  produces 2 stdout lines. Operators should only enable it during active
  debugging.

## Model Used

- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [ ] I have added or updated tests where applicable — covered by
exercising the
      flag in dev; the underlying handleRequest behavior is unchanged
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-03 18:34:48 -07:00
Devin Foley 09eceb952a Avoid resuming stale remote sessions (Pi adapter) (#5120)
> **Stacked PR (part 7 of 7).** Depends on:
  - PR #5114
  - PR #5115
  - PR #5116
  - PR #5117
  - PR #5118
  - PR #5119
> Diff against `master` includes commits from earlier PRs in the stack —
the new commit in this PR is the topmost one.

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The Pi adapter persists a session jsonl per agent so subsequent runs
resume
>   conversation context instead of starting cold
> - SSH testing reproduced a real failure: a verification issue reached
terminal
>   `done` and the agent claimed success, but the proof artifact
> `manual-qa/environment-matrix/ssh/pi_local.md` was missing from the
realized
>   SSH workspace on the QA target box
> - Root cause: the saved session header recorded a different cwd than
the new
> execution cwd, but the resume eligibility check only compared
session-params
> cwd via local-style `path.resolve` (which doesn't roundtrip on remote
POSIX
> paths). The stale session got resumed and writes landed in the wrong
cwd
> - This PR tightens resume eligibility for remote targets: it adds
remote-aware
> cwd normalisation, reads the first line of the session jsonl over SSH
(`head
>   -n 1`) to verify the saved header cwd, and only resumes when both
> session-params cwd *and* the on-disk header cwd match the realised
execution
>   cwd. Stale sessions are skipped silently and the run starts cold
> - The benefit is that Pi runs across cwd-changing environments stop
> accidentally resuming each other's sessions, and proof artifacts land
where
>   reviewers expect them

## What Changed

- Added `normalizeExecutionCwd`, `executionCwdsMatch`,
`readSessionHeaderCwd`,
  and `readSavedSessionCwd` helpers in `pi-local/src/server/execute.ts`
- `readSavedSessionCwd` reads the first line of the session jsonl —
locally via
`fs.readFile`, remotely via `runAdapterExecutionTargetShellCommand`
(`head -n 1`)
- Resume eligibility now requires:
  1. Saved session id is non-empty
  2. Execution target shape matches (existing check)
  3. Session-params cwd matches the realised execution cwd
4. Session-header cwd (from the on-disk jsonl) matches the realised
execution cwd
- Stale sessions are skipped silently (run starts cold) instead of
resumed
- `execute.remote.test.ts` extended with: matching header → resume;
mismatched
header → start fresh; missing/unreadable header → start fresh; remote
head
  command failure → start fresh

## Verification

- `pnpm --filter @paperclipai/adapter-pi-local test`
- `pnpm test -- pi-local`
- Manual QA: ran a Pi agent twice in two different remote cwds,
confirmed
the second run did not pick up the first run's session and that
subsequent
  runs in the original cwd still resumed correctly

## Risks

- Adds a `head -n 1` shell call per Pi run on remote targets. Negligible
  latency (single read of session jsonl), bounded by 15s timeout.
- If the `head` call fails for unrelated reasons (transient remote
unreachability), the run will start cold instead of resuming. This is
the
safe default but worth noting — operators may see one extra cold run if
a
  remote glitches mid-session.
- No data is deleted or migrated; stale sessions remain on disk for
manual
  inspection if desired.

## Model Used

- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-03 13:51:38 -07:00
Devin Foley d22e790bd4 Validate remote model probes on execution target (OpenCode) (#5119)
> **Stacked PR (part 6 of 7).** Depends on:
  - PR #5114
  - PR #5115
  - PR #5116
  - PR #5117
  - PR #5118
> Diff against `master` includes commits from earlier PRs in the stack —
the new commit in this PR is the topmost one.

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The OpenCode adapter validates that its configured model exists
before letting
>   a run start so misconfiguration fails fast with a clear error
> - SSH testing reproduced an OpenCode failure where issues stayed
`backlog`,
>   timed out, and produced no comments. The root cause was in
> `packages/adapters/opencode-local/src/server/execute.ts`: the local
model
> guard `ensureOpenCodeModelConfiguredAndAvailable(...)` only ran when
execution
> was *not* remote, so SSH OpenCode bypassed it and failed silently
later
> - Subsequent testing surfaced a related remote-only failure where the
probe
> (when wired up naively) hits `EACCES: permission denied, mkdir
'/var/folders'`
> on the SSH box because of how OpenCode's runtime config picks a
tempdir
> - This PR runs the model probe on the actual execution target —
`opencode
> models` via `runAdapterExecutionTargetProcess` — instead of the local
CLI,
> parses the output with the shared `parseOpenCodeModelsOutput` helper,
and
> reports a concrete error naming the offending model and a sample of
available
>   remote models when the configured model isn't present
> - The benefit is that mismatched OpenCode models surface as a clear
pre-flight
> error referencing the remote target instead of a silent run that never
leaves
>   `backlog`

## What Changed

- Added `ensureRemoteOpenCodeModelConfiguredAndAvailable` in
  `opencode-local/src/server/execute.ts` that runs `opencode models` via
`runAdapterExecutionTargetProcess` and validates the configured model is
in
  the parsed output
- `models.ts` now exports `parseOpenCodeModelsOutput` and
`requireOpenCodeModelId`
  so the remote path can reuse them
- `execute.ts` calls the remote variant when `executionTargetIsRemote`,
otherwise
  the existing local `ensureOpenCodeModelConfiguredAndAvailable`
- Errors include the offending model id and a sample of available remote
models
  so the operator knows exactly what's missing
- `execute.remote.test.ts` extended with cases for: probe timeout, probe
  non-zero exit, empty model list, and missing-model error

## Verification

- `pnpm --filter @paperclipai/adapter-opencode-local test`
- `pnpm test -- opencode-local`
- Manual QA: configured an OpenCode agent with a model that exists
locally but
not in the remote sandbox, and confirmed the new error fires before the
run
  starts and references the remote target

## Risks

- New behaviour: remote model validation adds a `~20s timeout` `opencode
models`
call on every remote run start. For most environments this is fast, but
a
network-slow sandbox could see startup latency rise. Timeout is bounded.
- If the remote CLI is missing or misconfigured, the new error replaces
the old
generic startup failure — clearer message, but the failure point shifts
earlier. Monitor for any QA flows that relied on the old failure shape.

## Model Used

- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-03 13:34:09 -07:00
Devin Foley 856c6cb192 Fix remote workspace environment shaping (#5118)
> **Stacked PR (part 5 of 7).** Depends on:
  - PR #5114
  - PR #5115
  - PR #5116
  - PR #5117
> Diff against `master` includes commits from earlier PRs in the stack —
the new commit in this PR is the topmost one.

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents run with a Paperclip-shaped environment
(`PAPERCLIP_WORKSPACE_CWD`,
> worktree path, `PAPERCLIP_WORKSPACES_JSON` hints) so the CLI can
locate the
>   correct project tree
> - SSH testing reproduced a real failure: a Codex SSH run wrote to
> `/tmp/paperclip-env-matrix-...` (the *host* path) instead of the
realized
> remote workspace at `/home/<user>/paperclip-env-matrix-ssh-claude/...`
> because the adapter injected `PAPERCLIP_WORKSPACE_CWD=/tmp/...` into
the
>   remote env
> - Code review on the initial codex-only fix asked to roll the same
approach
> into every other SSH-capable adapter (claude, acpx, cursor, opencode,
gemini,
>   pi) via a shared helper rather than duplicating per-adapter
> - This PR adds `shapePaperclipWorkspaceEnvForExecution` in
adapter-utils that,
> when the execution target is remote: replaces local cwd with the
realized
> execution cwd, nulls out worktree path (which has no remote meaning),
and
> rewrites/strips `cwd` entries in workspace hints based on what was
actually
>   synced. Every adapter calls it before invoking the remote runner
> - The benefit is that remote runs see the realized remote workspace,
host-local
> paths stop leaking into remote env, and the rule is unit-tested in one
place

## What Changed

- Added `shapePaperclipWorkspaceEnvForExecution` to
  `packages/adapter-utils/src/server-utils.ts` with full unit coverage
  (`server-utils.test.ts`)
- Each of acpx-local, claude-local, codex-local, cursor-local,
gemini-local,
opencode-local, pi-local now calls the new shaper before issuing the
remote
  command and feeds the shaped values into `applyPaperclipWorkspaceEnv`
- Per-adapter `execute.remote.test.ts` files extended to cover the new
shaping
  behaviour: localhost paths replaced with remote cwd, foreign-cwd hints
  stripped, worktree path nulled out for remote targets
- `acpx-local/src/server/execute.test.ts` extended with shaping coverage

## Verification

- `pnpm test -- server-utils execute.remote`
- `pnpm --filter @paperclipai/adapter-acpx-local test`
- Manual QA reproducing the original failure:
  1. Provision an E2B sandbox environment for the Paperclip QA company
2. Assign an issue to a remote-targeted claude-local agent and confirm
the
run starts in the correct remote cwd (no `/Users/...` path leakage in
the
     run logs)
  3. Repeat for opencode-local and pi-local

## Risks

- Behavioural shift: hints whose `cwd` doesn't match the workspace cwd
are now
stripped on remote targets. If any adapter relied on a leaked local hint
cwd,
it will see a missing `cwd` instead. Reviewed all current callers — none
do.
- Adds a small per-run cost (path resolve + string normalisation) on
every remote
  execution. Negligible.
- Worktree path is now nulled out on remote (it has no meaning there).
Adapters
  that previously read the value defensively will continue to work.

## Model Used

- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-03 13:17:52 -07:00
Devin Foley bb7d040894 Switch OpenCode to explicit static/local-aware model selection (#5117)
> **Stacked PR (part 4 of 7).** Depends on:
  - PR #5114
  - PR #5115
  - PR #5116
> Diff against `master` includes commits from earlier PRs in the stack —
the new commit in this PR is the topmost one.

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - When creating an OpenCode-local agent, Paperclip currently validates
> `adapterConfig.model` against the *Paperclip host's* `opencode models`
output
> - SSH testing surfaced that this blocks creating an OpenCode agent for
an SSH
> environment: the model that exists on the SSH target isn't visible to
the
> host, so creation fails with "OpenCode requires `adapterConfig.model`
in
> provider/model format" even when the operator picked a real remote
model
> - The initial direction was environment-aware model discovery; the
final
> decision was to keep OpenCode on the same explicit-model pattern as
other
> adapters (default + curated list + manual override) and stop blocking
>   creation on host-side discovery
> - This PR does both: the adapter-models endpoint now accepts
`environmentId` and
> probes against the target environment, and the create-time hard gate
is
> replaced by `requireOpenCodeModelId` which validates `provider/model`
*format*
> without requiring host-local discovery. Test/run-time still surfaces
real
>   auth/availability problems
> - The benefit is that operators can create OpenCode agents for remote
> environments without out-of-band setup, and the model picker in the UI
>   reflects the actually-targeted environment

## What Changed

- Added `requireOpenCodeModelId(input)` in
`opencode-local/src/server/models.ts`,
  exported it from the adapter index
- `ensureOpenCodeModelConfiguredAndAvailable` now delegates the format
check to
  `requireOpenCodeModelId`
- `agentsApi.adapterModels(companyId, adapterType, { environmentId })`
now accepts
  an environment ID and passes it as a query parameter
- `queryKeys.agents.adapterModels` now keys on `(companyId, adapterType,
environmentId)`
- `server/src/routes/agents.ts` reads and validates the new query
parameter,
  forwarding it to the adapter's model probe
- `AgentConfigForm.tsx` and `OnboardingWizard.tsx` build the model query
key from
the currently selected default environment ID and disable autodetect for
  `opencode_local` (model selection is explicit)
- `NewAgent.tsx` simplified — no longer special-cases OpenCode
autodetect
- `company-portability.ts` no longer needs OpenCode-specific autodetect
handling
- Tests added/updated:
  `adapter-model-refresh-routes.test.ts`, `adapter-models.test.ts`,
`agent-permissions-routes.test.ts`,
`opencode-local/src/server/models.test.ts`

## Verification

- `pnpm --filter @paperclipai/server test -- adapter-models
adapter-model-refresh agent-permissions`
- `pnpm --filter @paperclipai/adapter-opencode-local test`
- `pnpm --filter @paperclipai/ui test -- AgentConfigForm
OnboardingWizard NewAgent`
- Manual QA in browser:
1. Boot Paperclip on Tailscale-bound port (so it's reachable from
another
machine), create an OpenCode-local agent, switch the default environment
between two installed sandboxes, and confirm the model list refreshes
     per-environment
  2. Submit with a malformed `provider/model` string and verify the new
     `requireOpenCodeModelId` error surfaces
- Before/after screenshots attached for `AgentConfigForm` model picker

## Risks

- Behavioural shift: switching default environment now triggers a model
refetch.
Should be cheap but introduces a new UI loading state for OpenCode
users.
- Removing dynamic autodetect for OpenCode: if any user configured an
agent
without specifying `model` and relied on autodetect populating it, that
agent
will now fail at submit time. Mitigation: validation error is explicit
and
  actionable.
- New query string parameter on `/api/companies/:id/adapter-models` —
older
clients that omit it still work (parameter is optional and defaults to
null).

## Model Used

- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-03 13:01:34 -07:00
Devin Foley 076067865f Migrate SSH environment callback to bridge (#5116)
> **Stacked PR (part 3 of 7).** Depends on:
  - PR #5114
  - PR #5115
> Diff against `master` includes commits from earlier PRs in the stack —
the new commit in this PR is the topmost one.

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents executing on a remote SSH-backed environment need a way to
call back into
>   the Paperclip control plane (run events, log streaming, signals)
> - When the SSH host can't reach the Paperclip host (NAT, firewalls, or
simply not
> on the same network), the run silently fails or hangs — a recurring
class of
>   failure during SSH testing
> - In sandboxed environments we already solved this with a callback
bridge that
> tunnels back through the existing connection; SSH was the odd one out
> - This PR migrates SSH execution to use the same callback bridge, so
every
> adapter's remote run uses one consistent reverse-channel. Per-adapter
SSH glue
> is deleted in favour of a shared `CommandManagedRuntimeRunner` built
from the
>   SSH spec
> - The benefit is fewer SSH-specific failure modes, a smaller code
surface, and
>   one place to evolve the callback contract going forward

## What Changed

- Added `createSshCommandManagedRuntimeRunner` in
`packages/adapter-utils/src/ssh.ts` that adapts an SSH spec into a
generic
  command-managed-runtime runner (with cwd, env, and timeout handling)
- Removed `paperclipApiUrl` from `SshRemoteExecutionSpec`; the bridge
URL now flows
  through the shared runner
- Reworked `execution-target.ts` to use the SSH runner alongside sandbox
runners
  via a unified `CommandManagedRuntimeRunner` interface
- Simplified `remote-managed-runtime.ts` and
`sandbox-managed-runtime.ts` to consume
  the shared runner abstraction
- Deleted per-adapter SSH callback wiring from claude-local,
codex-local,
  cursor-local, gemini-local, opencode-local, pi-local execute.ts files
- Removed `environment-runtime-driver-contract.test.ts` (the contract is
now
  enforced by `environment-execution-target.test.ts`)
- Added/updated `execute.remote.test.ts` cases for each adapter to cover
the SSH
  runner path

## Verification

- `pnpm --filter @paperclipai/adapter-utils test`
- `pnpm test -- execute.remote` (covers all six local adapters' SSH
paths)
- Manual QA: ran a claude-local agent against an SSH-backed environment,
confirmed
the agent successfully called back to `/api/agent-callback/*` endpoints
during
  the run

## Risks

- Refactor touches all six local adapters. If any adapter had subtle
SSH-specific
behaviour that wasn't captured in tests, it could regress. Mitigation:
each
  adapter's `execute.remote.test.ts` was extended.
- `paperclipApiUrl` removal from `SshRemoteExecutionSpec` is a breaking
type change
for any internal consumer. Verified no external plugins consume this
type.
- The new `CommandManagedRuntimeRunner` shape is a public surface in
`@paperclipai/adapter-utils`; downstream plugins implementing custom
runners may
  need updates, but no such plugins exist in this repo.

## Model Used

- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-03 12:43:52 -07:00
Devin Foley a7b45938b7 Let sandbox providers declare shell defaults (#5114)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents execute in sandboxed remote environments served by pluggable
sandbox
>   providers (E2B today, more later)
> - Today every sandbox command runs under `sh -lc` regardless of what
the
>   provider's container actually ships
> - That misses bash-only shell init on E2B (which ships bash) and
prevents
> future providers from declaring a different default — there's no way
for a
>   provider to say "I have bash, use it"
> - This PR adds a `shellCommand` field to sandbox execution targets so
providers
> can declare their preferred shell ("bash" for E2B), threads it through
the
> sandbox-managed-runtime client, callback bridge, and execution-target
shell
>   helper, and validates the value at the lease-metadata boundary
> - The benefit is that sandbox commands run under the right shell on
the right
> provider, and adding new sandbox providers only needs to declare a
shell
>   preference

## What Changed

- Added `packages/adapter-utils/src/sandbox-shell.ts` exporting
`preferredShellForSandbox(shellCommand)` (returns `"bash"` if input is
`"bash"`,
  else `"sh"`)
- Added `shellCommand?: "bash" | "sh" | null` to
`AdapterSandboxExecutionTarget`
  and `CommandManagedRuntimeSpec`; threaded it through
`runAdapterExecutionTargetShellCommand`,
`prepareAdapterExecutionTargetRuntime`,
  and `startAdapterExecutionTargetPaperclipBridge`
- `createCommandManagedRuntimeClient`, `prepareCommandManagedRuntime`,
and
`createCommandManagedSandboxCallbackBridgeQueueClient` now take an
optional
  `shellCommand` and use `preferredShellForSandbox` to pick the shell
- `startSandboxCallbackBridgeServer` accepts a `shellCommand` for its
server
  startup, readiness probe, and stop hook
- E2B sandbox plugin declares `shellCommand: "bash"` in `leaseMetadata`
- `resolveEnvironmentExecutionTarget` reads `shellCommand` from lease
metadata
  (validating against `"bash" | "sh" | null`)
- `environment-runtime.ts` adds `"shellCommand"` to
`INTERNAL_PLUGIN_SANDBOX_CONFIG_KEYS`
so the field round-trips through internal plugin config without leaking
to
  external plugin metadata
- Updated tests in `command-managed-runtime.test.ts`,
  `execution-target-sandbox.test.ts`, `sandbox-callback-bridge.test.ts`,
  `environment-execution-target.test.ts`

## Verification

- `pnpm --filter @paperclipai/adapter-utils test`
- `pnpm --filter @paperclipai/server test --
environment-execution-target`
- `pnpm --filter @paperclipai/sandbox-providers-e2b test`
- Manual QA: boot a Paperclip instance, create an E2B-backed
environment, run a
claude_local agent against it, and confirm the run completes (verifies
bash
  shell semantics flow through the callback bridge end-to-end)

## Risks

- E2B sandbox commands now run under `bash -lc` instead of `sh -lc`.
Bash is a
strict superset for the commands we issue (no busybox-only flags in our
shell
scripts), so risk is low. The shellCommand field is opt-in via lease
metadata —
  providers that don't declare it stay on `sh`.
- New optional field on `CommandManagedRuntimeSpec` and
`AdapterSandboxExecutionTarget`.
  Consumers ignoring the field retain previous behaviour (sh).
- Lease metadata now carries an additional field. Existing leases
without
`shellCommand` resolve to `null` and fall back to sh — backwards
compatible.

## Model Used

- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A (no UI changes)
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-03 12:19:35 -07:00
Dotta 15eac43b43 [codex] Retry max-turn exhausted heartbeats (#5096)
## Thinking Path

> - Paperclip orchestrates AI agents for autonomous companies, and
heartbeat execution is the control-plane loop that keeps assigned work
moving.
> - Max-turn exhaustion is a recoverable local-adapter stop condition
for Claude and Gemini agents when a run needs another heartbeat to
continue safely.
> - The previous behavior could leave max-turn continuation details hard
to inspect, and duplicate/stale continuation wakes could keep running
after issue state changed.
> - The adapter layer also needed to avoid trusting arbitrary
stdout/stderr text as scheduler control metadata.
> - This pull request adds bounded max-turn continuation scheduling,
visible retry state, structured stop metadata handling, and
stale/duplicate continuation guards.
> - The benefit is safer automatic continuation after max-turn stops,
clearer operator visibility, and fewer duplicate or stale agent runs.

## What Changed

- Replaces closed PR #4952, whose head repository was deleted.
- Rebases the recovered max-turn continuation branch onto current
`paperclipai/paperclip:master`.
- Adds max-turn continuation scheduling and retry-state plumbing for
heartbeat runs.
- Adds stale/duplicate continuation suppression when issue status,
ownership, or execution locks change.
- Normalizes Claude/Gemini max-turn detection around structured stop
metadata instead of unstructured stdout/stderr text.
- Surfaces max-turn continuation settings and retry visibility in the
board UI.
- Adds focused server, adapter, and UI tests for max-turn stop metadata,
retry scheduling, stale queued-run invalidation, adapter
parsing/execution, run ledger display, and agent config patching.

## Verification

- `pnpm install --no-frozen-lockfile` to refresh local dependencies
after rebasing onto current `master`.
- `pnpm run preflight:workspace-links && pnpm exec vitest run
server/src/__tests__/claude-local-adapter.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/gemini-local-adapter.test.ts
server/src/__tests__/gemini-local-execute.test.ts
server/src/__tests__/heartbeat-retry-scheduling.test.ts
server/src/__tests__/heartbeat-stale-queue-invalidation.test.ts
server/src/services/heartbeat-stop-metadata.test.ts
ui/src/components/IssueRunLedger.test.tsx
ui/src/lib/agent-config-patch.test.ts ui/src/lib/runRetryState.test.ts
--testTimeout=20000`
- `pnpm --filter @paperclipai/adapter-claude-local typecheck && pnpm
--filter @paperclipai/adapter-gemini-local typecheck && pnpm --filter
@paperclipai/server typecheck && pnpm --filter @paperclipai/ui
typecheck`
- UI screenshot note: the UI changes are limited to config/ledger state
rendering rather than layout changes; component/unit coverage above
verifies the rendered behavior.

## Risks

- Medium behavior risk: heartbeat retry gating now suppresses max-turn
continuations when issue state or execution locks drift, so any callers
that relied on stale continuations running will now see cancellation
instead.
- Low adapter risk: Claude/Gemini unstructured text no longer triggers
max-turn scheduler metadata, so only structured stop signals and Gemini
exit code 53 are trusted.
- No database migrations.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex coding agent, GPT-5-class model, tool-enabled local
repository editing and command execution.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots (not applicable: state/default rendering only; covered by
component/unit tests)
- [x] I have updated relevant documentation to reflect my changes (not
applicable: no user-facing command or docs contract changed)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-03 11:30:48 -05:00
Dotta 57229d0f24 [codex] Add issue monitor liveness controls (#4988)
## Thinking Path

> - Paperclip is a control plane for autonomous AI companies where work
must stay observable, governable, and recoverable.
> - The task/heartbeat subsystem owns agent execution continuity, issue
state transitions, and visible recovery behavior.
> - Waiting on an external service is not the same as being blocked when
the assignee still owns a future check.
> - The gap was that agents had no first-class one-shot monitor state
for external-service waits, so recovery could look stalled or require ad
hoc comments.
> - This pull request adds bounded issue monitors that can wake the
owner, clear exhausted waits, and produce explicit recovery behavior.
> - It also surfaces monitor status in the board UI and documents when
to use monitors versus `blocked`.
> - The benefit is clearer liveness semantics for asynchronous waits
without weakening single-assignee task ownership.

## What Changed

- Added issue monitor fields, shared types, validators, constants, and
an idempotent `0075` migration for scheduled monitor state.
- Added server-side monitor scheduling, dispatch, recovery bounds,
activity logging, and external-ref redaction.
- Added board/agent route coverage for monitor permissions and child
monitor scheduling.
- Added issue detail/property UI for monitor state, a monitor activity
card, and Storybook stories for review surfaces.
- Documented monitor semantics and recovery policy behavior in
`doc/execution-semantics.md`.
- Addressed Greptile review feedback by preserving monitor state in
skipped-stage builders and making board monitor saves send `scheduledBy:
"board"`.

## Verification

- `pnpm install --frozen-lockfile`
- `pnpm run preflight:workspace-links && pnpm exec vitest run
server/src/__tests__/issue-execution-policy-routes.test.ts
server/src/__tests__/issue-execution-policy.test.ts
server/src/__tests__/issue-monitor-scheduler.test.ts
server/src/__tests__/recovery-classifiers.test.ts
ui/src/components/IssueMonitorActivityCard.test.tsx
ui/src/components/IssueProperties.test.tsx
ui/src/lib/activity-format.test.ts`
- First run passed 5 files and failed to collect 2 server suites because
the worktree was missing the optional `acpx/runtime` dependency.
- After `pnpm install --frozen-lockfile`, reran the 2 failed suites
successfully.
- `pnpm exec vitest run
server/src/__tests__/issue-monitor-scheduler.test.ts
server/src/__tests__/recovery-classifiers.test.ts`
- `pnpm --filter @paperclipai/shared typecheck && pnpm --filter
@paperclipai/db typecheck && pnpm --filter @paperclipai/server typecheck
&& pnpm --filter @paperclipai/ui typecheck`
- `pnpm exec vitest run
server/src/__tests__/issue-execution-policy.test.ts
ui/src/components/IssueProperties.test.tsx`
- `pnpm --filter @paperclipai/server typecheck && pnpm --filter
@paperclipai/ui typecheck`
- `pnpm exec vitest run
ui/src/components/IssueMonitorActivityCard.test.tsx
ui/src/components/IssueProperties.test.tsx`
- `pnpm --filter @paperclipai/ui typecheck`
- Storybook screenshot captured from
`http://127.0.0.1:6006/iframe.html?viewMode=story&id=product-issue-monitor-surfaces--monitor-surfaces`
with Playwright.

## Screenshots

![Issue monitor Storybook
surfaces](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-2945-when-a-task-is-waiting-for-an-_external-service_-what-state-should-it-be-in-and-what-recovery-method-could-it-h/docs/pr-screenshots/pap-2945/monitor-surfaces.png)

## Risks

- Medium: this changes heartbeat recovery behavior for scheduled
external-service waits, so regressions could affect wake timing or
recovery issue creation.
- Migration risk is reduced by using `IF NOT EXISTS` for the new issue
monitor columns and index.
- External monitor references are treated as secret-adjacent and are
intentionally omitted from visible activity/wake payloads.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent with repository tool use and terminal
execution.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots or Storybook review surfaces
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-03 08:58:53 -05:00
Dotta d7719423e9 [codex] Harden non-system database backup schemas (#4960)
## Thinking Path

> - Paperclip is a control plane whose database is the durable audit and
work record
> - Database backup needs to include operator/plugin schemas while
excluding PostgreSQL-owned internals
> - PostgreSQL reserves the `pg_` schema prefix for system schemas,
including temp and toast variants
> - A single escaped `pg_` prefix predicate is less brittle than
enumerating individual `pg_toast` and `pg_temp` forms
> - This pull request tightens non-system schema discovery for logical
backups without changing the normal user/plugin schema path

## What Changed

- Replaced narrow `pg_toast` and `pg_temp` schema exclusions with an
escaped `pg_` reserved-prefix exclusion.
- Kept `information_schema` excluded from logical backup metadata
discovery.
- Addressed Greptile feedback by removing redundant no-op additions from
the prior iteration.

## Verification

- `pnpm exec vitest run packages/db/src/backup-lib.test.ts`
- PR checks on the latest pushed head: policy, verify, e2e, Greptile
Review, and Snyk

## Risks

- Low risk: PostgreSQL reserves `pg_` schema names for system use, so
this should only exclude database-owned internals that should not be
restored from Paperclip logical backups.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected - check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool use and local command
execution. Exact context window was not exposed in the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-01 11:59:53 -05:00
Dotta e8275318ba [codex] Raise agent heartbeat concurrency default (#4954)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agent heartbeat settings control how much parallel work one employee
can run
> - The previous default of 5 concurrent runs was too restrictive for
active local agent teams
> - The shared default, heartbeat clamp, docs, and route/import/UI
expectations need to agree
> - This pull request raises the default heartbeat concurrency to 20
while keeping explicit headroom up to 50 for power users
> - The benefit is higher throughput for agent teams without each new
agent needing manual runtime config edits

## What Changed

- Raised `AGENT_DEFAULT_MAX_CONCURRENT_RUNS` from 5 to 20.
- Raised the heartbeat service max clamp from 10 to 50, keeping the new
default below the ceiling.
- Updated V1 implementation docs and tests that assert default
imported/exported runtime config.
- Updated the new-agent UI runtime config test to assert the shared
default constant instead of duplicating the numeric value.

## Verification

- `pnpm exec vitest run
server/src/__tests__/agent-permissions-routes.test.ts
server/src/__tests__/company-portability.test.ts
ui/src/lib/new-agent-runtime-config.test.ts`

## Risks

- Medium risk: new agents can consume more local execution capacity by
default. The heartbeat scheduler still respects configured max
concurrency and budget/pause controls, and operators can lower or raise
the per-agent cap within the `1..50` clamp.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool use and local command
execution. Exact context window was not exposed in the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-01 10:42:56 -05:00
Devin Foley d2dd759caa plugins: make e2b template default explicit (#4901)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Remote execution environments are part of that control plane,
including sandbox-provider plugins like E2B
> - The E2B provider already normalizes config and runtime behavior
around a `base` template default
> - But the manifest still presented `template` as required, which
forces redundant operator input and makes the UI contract stricter than
runtime behavior
> - That mismatch showed up while building a repeatable QA workflow for
sandbox testing
> - This pull request makes the manifest and validation contract line up
with the existing `base` default
> - The benefit is a simpler and more accurate E2B environment setup
experience

## What Changed

- Removed the E2B manifest's `required: ["template"]` requirement so the
config schema matches runtime behavior
- Clarified the manifest description to say the template defaults to
`base` when omitted
- Added a focused unit test proving that validation normalizes a missing
template to `base`

## Verification

- Ran the focused E2B plugin test for the new behavior:
- `cd packages/plugins/sandbox-providers/e2b && pnpm test --
--testNamePattern "defaults a missing template to base"`

## Risks

- Low risk. This only loosens the schema to match the plugin's existing
runtime normalization and adds a test for that path.
- The broader E2B plugin suite currently has unrelated existing failures
outside this change; this PR does not modify those paths.

## Model Used

- OpenAI Codex, GPT-5 Codex via Codex CLI agent tooling, large-context
coding workflow with terminal tool use and local test execution.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [ ] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [ ] I will address all Greptile and reviewer comments before
requesting merge
2026-04-30 22:43:24 -07:00
Devin Foley b02e67cea5 fix(ci): diff PR workflow paths from merge base (#4903)
## Thinking Path

> - Paperclip’s PR workflow is part of the control-plane safety surface
because it decides whether a branch is allowed to merge.
> - This issue started in that workflow: the lockfile and manifest
policy checks were diffing `base.sha..head.sha`, which incorrectly
treated unrelated `master` commits as if they belonged to the PR branch.
> - The right fix there is to diff from the PR merge base
(`base...head`) so policy checks only evaluate files introduced by the
branch itself.
> - Once that workflow fix was in place, `/checkpr` exposed a second
blocker on the PR merge ref: `verify` was failing in newer `master`-side
tests that were not part of the original branch diff.
> - The actionable repeated failure came from the ACPX local adapter
test suite, where a test hard-coded the managed Codex home under
`instances/default` even though the stable Vitest runner sets a
non-default `PAPERCLIP_INSTANCE_ID`.
> - This pull request now includes both the original CI diff-scope fix
and the targeted ACPX test fix so the PR’s actual checks align with
current base-branch execution.
> - The benefit is that the original false-positive lockfile failure is
removed, and the merge-ref verify path is hardened against the
instance-id isolation used in CI.

## What Changed

- Updated `.github/workflows/pr.yml` so the lockfile policy and manifest
policy steps diff `pull_request.base.sha...pull_request.head.sha` from
the merge base instead of using a two-dot base/head diff.
- Added an inline workflow comment explaining why the three-dot diff is
required for PR-scoped file detection.
- Updated `packages/adapters/acpx-local/src/server/execute.test.ts` so
the managed Codex home assertion uses a test-specific
`PAPERCLIP_INSTANCE_ID` instead of hard-coding `default`.
- Restored `PAPERCLIP_INSTANCE_ID` after that ACPX test finishes so the
test remains isolated and does not leak process env changes.

## Verification

- Reproduced the original false positive locally by comparing PR heads
`#4901` and `#4902` with the old `base..head` logic; both incorrectly
included `pnpm-lock.yaml` from unrelated `master` commits.
- Verified the new `base...head` logic reduces those PRs to only their
actual changed files and excludes `pnpm-lock.yaml`.
- Verified a real manifest-changing PR (`#4893`) still reports
`package.json` changes under the new logic.
- Ran `pnpm -r typecheck` successfully.
- Ran `pnpm vitest run
packages/adapters/acpx-local/src/server/execute.test.ts` successfully
after the ACPX test fix.
- Ran `pnpm vitest run packages/db/src/backup-lib.test.ts` successfully
against the merge-ref-related DB failure path observed during
`/checkpr`.
- Pushed commit `9520a976` and allowed PR `#4903` checks to rerun on the
updated branch.

## Risks

- Low risk: the workflow change only affects how PR policy checks
determine the changed file set.
- Low risk: the ACPX change is test-only and aligns the test with the
instance-isolation behavior already used by
`scripts/run-vitest-stable.mjs` in CI.
- The remaining operational risk is limited to other unrelated
merge-ref-only failures that were not reproduced in the targeted local
verification above.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, `gpt-5-codex`, via the Codex local adapter in Paperclip.
- Tool-using coding model with shell execution, git, GitHub CLI, and
repository inspection in a local worktree.
- Context included the current repo, the Paperclip task thread, PR check
output, and the isolated execution workspace.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-30 21:22:40 -07:00
Dotta 4272c1604d Add ACPX local adapter runtime (#4893)
## Thinking Path

> - Paperclip orchestrates AI-agent companies through a control plane
that can start, supervise, and recover agent runs.
> - Local adapters are the bridge between Paperclip issues and concrete
agent runtimes such as Claude, Codex, and other ACP-compatible tools.
> - The roadmap calls out broader “bring your own agent” and claw-style
agent support, and ACPX gives Paperclip one path to normalize multiple
ACP agents behind a single adapter.
> - The branch needed to become one reviewable PR against current
`paperclipai/paperclip:master`, without carrying stale base conflicts or
generated lockfile churn.
> - This pull request adds an experimental built-in `acpx_local`
adapter, integrates it through the server/CLI/UI adapter surfaces, and
adds regression coverage for runtime execution, skill sync, stream
parsing, diagnostics, and log redaction.
> - The benefit is that Paperclip can run Claude/Codex/custom ACP agents
through ACPX while keeping operator configuration, skills, logging, and
transcript rendering inside the existing adapter model.

## What Changed

- Added `@paperclipai/adapter-acpx-local` with server execution, config
schema, ACPX session handling, CLI formatting, UI config helpers, and
stdout parsing.
- Registered `acpx_local` across CLI, server, shared constants, UI
adapter metadata, adapter capabilities, and agent creation/editing
surfaces.
- Added ACPX runtime execution support with persistent sessions,
local-agent JWT environment handling, skill snapshots, runtime skill
materialization, and isolation/security regressions.
- Added ACPX adapter diagnostics and marked the adapter experimental in
the UI.
- Added command/env secret redaction for resolved command metadata in
adapter-utils, server event storage, and the Agent Detail invocation UI.
- Added Storybook coverage for ACPX config, transcript rendering, and
skill states, plus PR screenshots under `docs/pr-screenshots/pap-2944/`.
- Rebased the branch onto current `public-gh/master`; `pnpm-lock.yaml`
is intentionally not included and there are no migration/schema changes.

## Verification

- `pnpm exec vitest run
packages/adapters/acpx-local/src/server/execute.test.ts
packages/adapters/acpx-local/src/server/test.test.ts
packages/adapters/acpx-local/src/cli/format-event.test.ts
packages/adapters/acpx-local/src/ui/parse-stdout.test.ts
packages/adapter-utils/src/server-utils.test.ts
server/src/__tests__/redaction.test.ts
server/src/__tests__/acpx-local-execute.test.ts
server/src/__tests__/acpx-local-skill-sync.test.ts
server/src/__tests__/acpx-local-adapter-environment.test.ts
server/src/__tests__/adapter-routes.test.ts
server/src/__tests__/agent-skills-routes.test.ts
ui/src/adapters/metadata.test.ts` — 12 files, 87 tests passed.
- `pnpm --filter @paperclipai/adapter-acpx-local typecheck` — passed.
- `pnpm --filter @paperclipai/server typecheck` — passed.
- `pnpm --filter @paperclipai/ui typecheck` — passed.
- Confirmed PR diff does not include `pnpm-lock.yaml`, database schema
files, or migrations.

Screenshots:

![ACPX Claude skills
light](https://github.com/cryppadotta/paperclip-1/blob/PAP-2944-acpx-make-a-claude_local-adapter-that-uses-acpx-instead/docs/pr-screenshots/pap-2944/skills-claude-light.png?raw=true)
![ACPX Claude skills
dark](https://github.com/cryppadotta/paperclip-1/blob/PAP-2944-acpx-make-a-claude_local-adapter-that-uses-acpx-instead/docs/pr-screenshots/pap-2944/skills-claude-dark.png?raw=true)
![ACPX custom skills
light](https://github.com/cryppadotta/paperclip-1/blob/PAP-2944-acpx-make-a-claude_local-adapter-that-uses-acpx-instead/docs/pr-screenshots/pap-2944/skills-custom-light.png?raw=true)

## Risks

- Medium risk: this introduces a new built-in adapter package and
touches runtime execution, adapter registration, agent config, skills,
and transcript rendering.
- ACPX and ACP agent behavior can vary by installed tool versions; the
adapter is marked experimental to set operator expectations.
- `pnpm-lock.yaml` is excluded per repository PR policy, so dependency
lock refresh must be handled by the repo’s automation or maintainers.
- No database migration risk: no schema or migration files changed.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex coding agent based on GPT-5, with repository tool use,
shell execution, git operations, and local verification. Exact hosted
context window was not exposed in this environment.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-30 19:57:05 -05:00
Dotta a3de1d764d Add cheap model profiles for local adapters (#4881)
## Thinking Path

> - Paperclip is a control plane for autonomous AI companies, where
adapters are the boundary between the board, agents, and execution
runtimes.
> - Local adapters currently expose a primary runtime configuration, but
operators often need a cheaper model lane for routine or low-risk work.
> - That cheap lane has to stay adapter-owned: runtime profile settings
should not mutate the primary adapter config or bypass existing
auth/secret mediation.
> - Issue creation also needs an ergonomic way to request primary,
cheap, or custom model behavior for a selected assignee.
> - This pull request adds a first-class `cheap` model profile contract
across adapter capabilities, heartbeat config resolution, agent
configuration, and issue creation.
> - The benefit is cheaper task execution can be configured and
requested explicitly while preserving adapter boundaries, secret
handling, and audit visibility.

## What Changed

- Added adapter model-profile capability metadata and a `cheap` profile
contract for supported local adapters.
- Applied `runtimeConfig.modelProfiles.cheap.adapterConfig` during
heartbeat config resolution, including requested/applied/fallback run
metadata.
- Added agent configuration UI for cheap model profile settings without
writing those settings into primary `adapterConfig`.
- Added New Issue assignee model lane controls for Primary / Cheap /
Custom and request payload handling.
- Added run ledger profile badges and Storybook stories for the new
cheap-lane UI states.
- Added tests for validators, heartbeat model profile application,
permission/secret mediation, UI payload helpers, and run ledger
rendering.
- Added committed UI verification screenshots under
`docs/pr-screenshots/pap-2837/`.
- Addressed Greptile review feedback around cheap-profile defaults,
shared profile types, and fallback test data.

## Verification

Local:

- `pnpm exec vitest run packages/shared/src/validators/issue.test.ts
server/src/__tests__/adapter-registry.test.ts
server/src/__tests__/agent-permissions-routes.test.ts
server/src/__tests__/heartbeat-model-profile.test.ts
ui/src/components/IssueRunLedger.test.tsx
ui/src/lib/agent-config-patch.test.ts
ui/src/lib/issue-assignee-overrides.test.ts
ui/src/lib/new-agent-runtime-config.test.ts` — passed, 8 files / 103
tests.
- `pnpm exec vitest run ui/src/lib/new-agent-runtime-config.test.ts
ui/src/components/IssueRunLedger.test.tsx` — passed after
Greptile/rebase follow-up, 2 files / 17 tests.
- `pnpm --filter @paperclipai/ui typecheck` — passed after
Greptile/rebase follow-up.
- `pnpm -r typecheck` — passed.
- `pnpm build` — passed.
- `pnpm test:run` — did not complete successfully in this local
worktree: it stopped in pre-existing `@paperclipai/adapter-utils`
sandbox/SSH fixture suites outside this PR diff. Failures were 5s local
timeouts plus `git init -b` unsupported by this machine's Git 2.21.0.
The branch-specific targeted suites above passed.
- Branch was fetched/rebased onto `public-gh/master`; `git rev-list
--left-right --count public-gh/master...HEAD` reports `0 9`.

Remote PR checks on latest head
`e30bf399146451c86cee98ed528d51d33fa5af5a`:

- `policy` — passed.
- `verify` — passed.
- `e2e` — passed.
- `Greptile Review` — passed, confidence score 5/5; Greptile review
threads resolved.
- `security/snyk (cryppadotta)` — passed.

Screenshots:

- [New issue cheap lane
desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/newissue-cheap-desktop.png)
- [New issue custom lane
desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/newissue-custom-desktop.png)
- [New issue unsupported adapter
desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/newissue-unsupported-desktop.png)
- [Run ledger model profile badges
desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/runledger-profile-badges-desktop.png)
- Mobile variants are also in `docs/pr-screenshots/pap-2837/`.

## Risks

- Medium: heartbeat config mediation now merges runtime model profiles
into adapter configs, so adapter secret normalization and host-command
restrictions must keep covering nested config paths.
- Medium: the UI adds another issue creation choice; unsupported
adapters must keep hiding the cheap lane and preserve primary behavior.
- Low migration risk: no database migration is included.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

OpenAI Codex coding agent using GPT-5-class reasoning with repo tool use
and command execution. Exact served model/context window was not exposed
by the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [ ] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 15:32:04 -05:00
Dotta 1fe1067361 Polish board settings and skills workflow (#4863)
## Thinking Path

> - Paperclip's board UI and bundled skills are the operator layer for
configuring agents, routines, issue workflows, and local troubleshooting
loops.
> - The prior rollup mixed this operator polish with database backups,
backend reliability, thread scale, and cost/workflow primitives.
> - This pull request isolates the remaining board QoL, settings,
issue-detail integration, adapter config cleanup, and skills smoke
tooling.
> - It includes some integration-level overlap with the thread and
workflow slices so this branch can run from `origin/master` while still
preserving the full original work.
> - Preferred merge order is the narrower primitives first, then this
integration PR last.
> - The benefit is that reviewers can inspect the user-facing
board/settings/skills layer separately from backend infrastructure
changes.

## What Changed

- Added board/settings polish for agents, routines, company settings,
project workspace detail, and issue detail controls.
- Added agent/routine UI regression tests and New Issue dialog coverage.
- Integrated issue-detail activity/cost/interaction surfaces and leaf
work pause/resume controls.
- Cleaned bundled adapter UI config defaults and onboarding copy.
- Added terminal-bench loop and work-stoppage diagnosis skills plus a
smoke test script.
- Updated attachment type handling and Paperclip skill/API guidance.

## Verification

- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run ui/src/pages/Agents.test.tsx
ui/src/pages/Routines.test.tsx ui/src/components/NewIssueDialog.test.tsx
ui/src/pages/IssueDetail.test.tsx
server/src/__tests__/costs-service.test.ts
server/src/__tests__/issue-thread-interaction-routes.test.ts
server/src/__tests__/issue-thread-interactions-service.test.ts`
- Result: 7 test files passed, 54 tests passed.
- `pnpm run smoke:terminal-bench-loop-skill`
- Result: JSON output included `"ok": true` and `"cleanup": true`.
- UI screenshots not included because verification is focused
component/page coverage for the changed board surfaces.

## Risks

- This is the integration-heavy PR in the split and intentionally
overlaps some component/API primitives with the issue-thread and
workflow PRs so it can run from `origin/master`.
- Preferred merge order: #4859, #4860, #4861, #4862, then this PR last.
If earlier branches merge first, this PR may need a straightforward
conflict refresh in shared UI files.
- The terminal-bench smoke script creates temporary mock issues and
relies on cleanup; the verified run returned `cleanup: true`.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5.5, code execution and GitHub CLI tool use, medium
reasoning effort.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-30 15:28:11 -05:00
Dotta c4269bab59 Add workflow interaction cancellation and issue cost summaries (#4862)
## Thinking Path

> - Paperclip coordinates work through issue-thread interactions, run
history, and cost telemetry.
> - Operators need workflow prompts to be cancellable and costs to be
visible at the issue level.
> - The earlier rollup mixed this workflow/cost work with database
backups, reliability recovery, thread scaling, and settings polish.
> - This pull request isolates the interaction and cost surfaces into a
reviewable slice.
> - The backend now supports cancelling pending question interactions
and summarizing issue-tree costs.
> - The UI component layer can render cancelled questions and interleave
activity with run ledger rows.

## What Changed

- Added `cancelled` as an issue-thread interaction status and result
shape for question interactions.
- Added the board-only `POST
/issues/:id/interactions/:interactionId/cancel` route and service
implementation.
- Added issue-tree cost summary support in the cost service and
`/issues/:id/cost-summary` API route.
- Extended shared cost exports and UI API/query keys for issue cost
summaries.
- Updated `IssueThreadInteractionCard` and `IssueRunLedger` components
for cancelled questions, issue cost surfaces, and activity/run
interleaving.
- Added focused server and component regression coverage.

## Verification

- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run server/src/__tests__/costs-service.test.ts
server/src/__tests__/issue-thread-interaction-routes.test.ts
server/src/__tests__/issue-thread-interactions-service.test.ts
ui/src/components/IssueRunLedger.test.tsx`
- Result: 4 test files passed, 45 tests passed.
- UI screenshots not included because this PR updates reusable
components and API surfaces without wiring a new page-level layout.

## Risks

- Adds a new interaction terminal status; clients that switch
exhaustively on interaction status may need to handle `cancelled`.
- Issue-tree cost summaries use recursive issue traversal and should be
watched on unusually large issue trees.
- Page-level issue detail wiring is intentionally left to the board
QoL/issue-detail branch to keep this PR narrow.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5.5, code execution and GitHub CLI tool use, medium
reasoning effort.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-30 13:57:25 -05:00
Dotta cd606563f6 Expand database backups to non-system schemas (#4859)
## Thinking Path

> - Paperclip is the control plane for autonomous AI companies.
> - Reliable backups are part of operating that control plane safely.
> - The previous backup path was public-schema oriented and did not
clearly cover plugin-owned schemas or migration history.
> - Paperclip now has plugin database namespaces and Drizzle migration
state that must survive backup/restore.
> - This pull request expands logical database backups to non-system
schemas and documents the backup boundary.
> - The benefit is safer restore behavior for core and plugin-owned
database state without implying full filesystem disaster recovery.

## What Changed

- Include non-system database schemas in JavaScript and pg_dump backup
paths.
- Preserve enum, table, sequence, index, constraint, migration, and
plugin-schema objects across backup/restore.
- Add restore coverage for plugin-owned schemas and Drizzle migration
history.
- Clarify docs that DB backups are logical database backups, not full
instance filesystem backups.

## Verification

- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run packages/db/src/backup-lib.test.ts`
- Result: 1 test file passed, 4 tests passed.
- Confirmed this PR does not include `pnpm-lock.yaml` or
`.github/workflows/*` changes.

## Risks

- Medium: backup generation touches schema discovery and restore
ordering, so unusual database objects may need additional coverage
later.
- No migrations are included.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool use enabled, medium reasoning
effort. Exact hosted context-window details are not exposed in this
runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Note: no UI changes are included in this PR, so screenshots are not
applicable.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-30 12:54:35 -05:00
Devin Foley c0ce35d1fb Improve E2B plugin configuration UX and fix execution timeouts (#4802)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - E2B is a sandbox provider plugin that runs agent code in isolated
cloud environments
> - Operators configure E2B through the plugin settings page
> - But the E2B API key configuration was unclear — the settings field
description didn't explain that pasted keys are auto-saved as company
secrets, and the fallback to the host `E2B_API_KEY` variable wasn't
documented
> - Additionally, long-running E2B sandbox commands were timing out
because the plugin environment RPC driver used a fixed timeout, and
environment commands competed for the single foreground command slot
> - This PR clarifies the E2B configuration UX, fixes RPC timeouts for
plugin environment execution, and runs E2B environment commands in
background mode to avoid blocking the foreground slot
> - The benefit is clearer E2B setup for operators and more reliable
sandbox command execution

## What Changed

- Updated E2B plugin manifest and settings UI to clarify API key
configuration — field description now explains that pasted keys are
saved as company secrets and documents the `E2B_API_KEY` host fallback
- Added test coverage for the plugin settings page rendering
- Fixed `plugin-environment-driver.ts` to pass the configured timeout
through to RPC calls instead of using a hardcoded default
- Updated `environment-runtime.ts` to propagate timeout from the
environment lease to the plugin driver
- Changed E2B sandbox command execution to use background handles so
long-running agent commands don't block the foreground slot needed by
the callback bridge

## Verification

- `pnpm test` — all existing and new tests pass
- `pnpm typecheck` — clean
- Manual: navigate to plugin settings, verify E2B API key field shows
the updated description text
- Manual: run an E2B-backed agent task with a long-running command,
verify it completes without RPC timeout

## Risks

- Low risk. Configuration UX change is cosmetic. The timeout fix passes
an existing value through instead of dropping it. Background command
execution is a behavioral change but only affects E2B sandbox commands —
the foreground slot is still available for bridge health checks.

## Model Used

Codex GPT 5.4 high via Paperclip.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-29 17:12:30 -07:00
Devin Foley a4ac6ff133 Add sandbox callback bridge for remote environment API access (#4801)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents can run inside sandboxed environments like E2B, which are
isolated from the host network
> - Sandboxed agents need to call back to the Paperclip API to report
progress, post comments, and update issue status
> - But sandbox environments cannot reach the Paperclip server directly
because they run in isolated network namespaces
> - This PR adds a callback bridge that proxies API requests from the
sandbox to the Paperclip server, running as a local HTTP server on the
host that forwards authenticated requests
> - The bridge is started automatically when an adapter launches a
sandbox execution, and torn down when the run completes
> - The benefit is sandboxed agents can interact with the Paperclip API
without requiring network-level access to the host, enabling E2B and
similar providers to work end-to-end

## What Changed

- Added `sandbox-callback-bridge.ts` in `packages/adapter-utils/` — a
lightweight HTTP bridge server that accepts requests from sandbox
environments and proxies them to the Paperclip API with authentication
- Added request validation and security policy: the bridge only forwards
requests to the configured API URL, validates content types, enforces
size limits, and rejects non-API paths
- Wired the bridge into all remote adapter execute paths (claude, codex,
cursor, gemini, pi) — the bridge starts before the agent process and the
bridge URL is passed via environment variables
- Updated `environment-execution-target.ts` to prefer the explicit API
URL from environment lease metadata for sandbox callback routing
- Fixed Claude sandbox runtime setup to work with the bridge
configuration
- Added comprehensive test coverage for bridge request handling, policy
enforcement, and sandbox execution integration
- Fixed browser bundling — the bridge module is excluded from the
frontend bundle via the adapter-utils index export

## Verification

- `pnpm test` — all existing and new tests pass, including bridge unit
tests and sandbox execution integration tests
- `pnpm typecheck` — clean
- Manual: configure an E2B environment, run an agent task, verify the
agent can post comments and update issue status through the bridge

## Risks

- Medium. This is a new network-facing component (HTTP server on
localhost). The security policy restricts forwarding to the configured
API URL only and validates all requests, but any proxy introduces attack
surface. The bridge binds to localhost only and is scoped to the
lifetime of a single agent run.

## Model Used

Codex GPT 5.4 high via Paperclip.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-29 16:37:34 -07:00
Devin Foley f9cf1d2f6a Add cursor sandbox support and fix SSH workspace sync (#4803)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents can run inside sandboxed environments like E2B, or on remote
hosts via SSH
> - The cursor adapter needs to resolve `cursor-agent` inside sandbox
environments where it's installed in `~/.local/bin`
> - But when using the default `agent` command on a sandbox target, the
adapter didn't know to look in `~/.local/bin/cursor-agent`, causing
"command not found" failures
> - Additionally, repeated SSH runs failed because `git checkout` during
workspace sync conflicted with leftover `.paperclip-runtime` files from
previous runs
> - This PR adds sandbox-aware command resolution for cursor and fixes
the SSH workspace sync conflict
> - The benefit is cursor works in E2B sandboxes out of the box, and
repeated SSH runs don't fail on workspace sync

## What Changed

- `cursor-local`: Added `prepareCursorSandboxCommand` — on sandbox
targets, reads the remote `$HOME`, prepends `~/.local/bin` to PATH, and
prefers `~/.local/bin/cursor-agent` when the default command is
requested; tightened the sandbox command probe to validate the binary
exists before launching; preserves explicit custom command overrides
- `adapter-utils/ssh.ts`: Added `--force` to git checkout in SSH
workspace sync to handle `.paperclip-runtime` untracked file conflicts
from previous runs

## Verification

- `pnpm test` — all existing and new tests pass, including cursor
sandbox probe, sandbox execution, and custom command override tests
- `pnpm typecheck` — clean
- Manual: configure an E2B environment, run a cursor-local task, verify
it resolves cursor-agent from the sandbox install path

## Risks

- Low-medium. The `--force` flag on git checkout could discard
uncommitted changes in the remote workspace, but the workspace is
managed by Paperclip and should not contain user edits.

## Model Used

Codex GPT 5.4 high via Paperclip.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-29 16:12:06 -07:00
Devin Foley 9b99d30330 Add dedicated environment settings page and test-in-environment (#4798)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents run inside environments (local, SSH, E2B sandbox)
> - Operators need to configure and manage these environments
> - But environment settings were buried inside the general company
settings page, making them hard to find
> - Additionally, when testing an agent from the configuration form, the
test always ran locally regardless of which environment was selected
> - This PR moves environments into a dedicated top-level company
settings section and wires the "Test Environment" button to run inside
the selected environment
> - The benefit is operators can find and manage environments more
easily, and the test button now validates the actual environment the
agent will use

## What Changed

- Added a dedicated `CompanyEnvironments` settings page with its own
route and sidebar entry
- Updated `CompanySettingsSidebar` and `CompanySettingsNav` to include
the new environments section
- Modified the agent test route (`POST /agents/:id/test`) to accept an
optional `environmentId` parameter
- Updated all adapter `test.ts` handlers to resolve and use the
specified execution target environment
- Added `resolveTestExecutionTarget` to `execution-target.ts` for remote
environment test resolution with cwd fallback
- Moved the "Test Environment" button and its feedback display into the
`NewAgent` page footer for better UX flow

## Verification

- `pnpm test` — all existing and new tests pass
- `pnpm typecheck` — clean
- Manual: navigate to Company Settings, confirm "Environments" appears
as a top-level section
- Manual: configure an agent with a non-local environment, click "Test
Environment", confirm the test runs inside that environment

## Risks

- Low risk. UI-only routing change for the settings page. The
test-in-environment change adds an optional parameter with a local
fallback, so existing behavior is preserved when no environment is
specified.

## Model Used

Codex GPT 5.4 high via Paperclip.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-29 15:56:13 -07:00
Dotta 1991ec9d6f [codex] Split backend control-plane QoL slice (#4700)
## Thinking Path

> - Paperclip is the control plane for autonomous AI companies, so
backend task ownership, recovery, review visibility, and company-scoped
limits need to stay enforceable without UI-only coupling.
> - Closed PR #4692 bundled those backend changes with UI workflow,
docs, skills, workflow, and lockfile churn.
> - PAP-2694 asks for a clean backend/control-plane slice from that
closed branch.
> - This branch starts from current `master` and mines only the `cli`,
`packages/db`, `packages/shared`, and `server` contracts/tests needed
for the backend behavior.
> - It explicitly excludes UI workflow/performance work,
`.github/workflows/pr.yml`, `pnpm-lock.yaml`, docs, skills,
package-script, adapter UI build-config, and perf fixture script
changes; the only UI files are fixture/test updates required by the
tightened shared `Company` contract.
> - The benefit is a smaller reviewable PR that preserves the
control-plane fixes while staying under Greptile s 100-file review
limit.

## What Changed

- Added company-scoped attachment-size limits through DB
schema/migrations, shared company portability contracts, CLI
import/export coverage, and server attachment upload enforcement.
- Added productivity review service/API behavior for no-comment streak,
long-active, and high-churn review issues, including request-depth
clamping and issue summary exposure.
- Hardened issue ownership and recovery/control-plane paths: peer-agent
mutation denial, issue tree pause/resume behavior, stranded recovery
origins, and related activity/test coverage.
- Preserved related backend contract updates for routine timestamp
variables and managed agent instruction bundles because they live in
shared/server contracts from the source branch.
- Addressed Greptile feedback by making `Company.attachmentMaxBytes`
non-optional, simplifying review request-depth clamping, fixing the
migration final newline, and enforcing the process-level attachment cap
as the final ceiling for uploads.
- Added minimal company fixtures needed for repo-wide typecheck/build
and kept the PR to 66 changed files with forbidden/non-slice paths
excluded.

## Verification

- `pnpm install --frozen-lockfile`
- `git diff --check origin/master..HEAD`
- `git diff --name-only origin/master..HEAD | wc -l` -> 66 files
- `git diff --name-only origin/master..HEAD -- .github/workflows/pr.yml
pnpm-lock.yaml package.json doc skills .agents scripts
packages/adapters` -> no output
- `pnpm exec vitest run --config vitest.config.ts
packages/shared/src/validators/issue.test.ts
packages/shared/src/routine-variables.test.ts
packages/shared/src/adapter-types.test.ts
cli/src/__tests__/company-import-export-e2e.test.ts
cli/src/__tests__/company.test.ts
server/src/__tests__/productivity-review-service.test.ts
server/src/__tests__/issue-tree-control-service.test.ts
server/src/__tests__/issue-tree-control-routes.test.ts
server/src/__tests__/issue-agent-mutation-ownership-routes.test.ts
server/src/__tests__/issue-attachment-routes.test.ts
server/src/__tests__/heartbeat-process-recovery.test.ts
server/src/__tests__/issues-service.test.ts` -> 12 files, 147 tests
passed
- `pnpm exec vitest run --config vitest.config.ts
cli/src/__tests__/company-delete.test.ts
cli/src/__tests__/company-import-export-e2e.test.ts
server/src/__tests__/productivity-review-service.test.ts` -> 3 files, 18
tests passed
- `pnpm exec vitest run --config vitest.config.ts
server/src/__tests__/issue-attachment-routes.test.ts` -> 1 file, 6 tests
passed
- `pnpm --filter @paperclipai/db typecheck && pnpm --filter
@paperclipai/shared typecheck && pnpm --filter @paperclipai/server
typecheck && pnpm --filter paperclipai typecheck`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm --filter @paperclipai/ui typecheck && pnpm --filter
@paperclipai/ui build`

## Risks

- Includes migrations `0073_shiny_salo.sql` and
`0074_striped_genesis.sql`; merge ordering matters if another PR adds
migrations first.
- This is intentionally backend-only apart from fixture/test updates
forced by shared type correctness; UI affordances from PR #4692 are not
present here and should land in separate UI slices.
- The worktree install emitted plugin SDK bin-link warnings for unbuilt
plugin packages, but the targeted tests and package typechecks completed
successfully.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected; check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool-enabled terminal/GitHub
workflow. Exact runtime context window was not exposed by the harness.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-28 16:46:45 -05:00
Dotta 7a9b3a6037 [codex] Harden recovery issue handling (#4600)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The control plane must recover stranded agent work without creating
new operational loops
> - Stranded recovery issues can themselves fail, and exposing raw retry
errors in comments can leak sensitive adapter details
> - New local companies also should not force a hire-approval gate
unless operators enable that policy
> - This pull request hardens recovery issue handling, redacts retry
failure details in issue copy, preserves `maxConcurrentRuns: 1`, and
flips new-hire approval to an opt-in default
> - The benefit is safer automatic recovery and smoother default company
setup without hidden migration conflicts

## What Changed

- Added migration `0071_default_hire_approval_off` and updated company
schema/import/export/docs so hire approvals default off and serialize
only when enabled.
- Added migration `0072_large_sandman` with a partial unique index
preventing duplicate active stranded recovery issues for the same source
issue.
- Blocked failed `stranded_issue_recovery` issues in place instead of
creating nested recovery issues.
- Redacted latest retry failure details from recovery issue comments
while still linking reviewers to run evidence.
- Allowed `maxConcurrentRuns: 1` to be honored by heartbeat concurrency
normalization.
- Added focused regression coverage for recovery recursion, redaction,
migration ordering, and concurrency behavior.

## Verification

- `pnpm --filter @paperclipai/db run check:migrations`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/recovery-classifiers.test.ts`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/company-portability.test.ts --pool=forks
--poolOptions.forks.isolate=true`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/agent-permissions-routes.test.ts --pool=forks
--poolOptions.forks.isolate=true`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/heartbeat-process-recovery.test.ts --pool=forks
--poolOptions.forks.isolate=true` exits 0, but this host skipped the
embedded Postgres tests with the existing init guard.
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/heartbeat-dependency-scheduling.test.ts
--pool=forks --poolOptions.forks.isolate=true` exits 0, but this host
skipped the embedded Postgres tests with the existing init guard.

## Risks

- Migration risk is low but this PR intentionally owns both new
migrations to avoid separate PR migration-journal conflicts.
- Recovery comments now require operators to inspect linked run evidence
for details instead of reading raw errors inline.
- The hire approval default changes behavior for newly created/imported
companies only; existing persisted company settings are not changed
except by the SQL default for future rows.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool-enabled terminal/GitHub
workflow, reasoning mode active. Context window not exposed in this
environment.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-27 15:02:47 -05:00
Dotta fda296ee4f [codex] Add configurable liveness auto-recovery controls (#4587)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Heartbeat liveness recovery decides when stalled issue trees need
manager-visible follow-up.
> - Automatic recovery issue creation is useful, but operators need
instance-level controls for how aggressive it is.
> - Without controls, recovery behavior is harder to tune for local
development, production operations, and noisy edge cases.
> - This pull request adds configurable liveness auto-recovery settings
across shared contracts, API routes, services, and the instance
experimental settings UI.
> - The benefit is that operators can keep liveness findings advisory or
enable bounded recovery automation with explicit intervals and lookback
windows.

## What Changed

- Added shared types and validators for liveness auto-recovery settings.
- Extended instance settings routes and services to persist and validate
the new controls.
- Wired heartbeat/recovery services to honor enablement, minimum
interval, and lookback settings.
- Added UI controls for liveness recovery under instance experimental
settings.
- Covered the new server behavior with instance settings and liveness
escalation tests.

## Verification

- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/heartbeat-issue-liveness-escalation.test.ts
server/src/__tests__/instance-settings-routes.test.ts --pool=forks
--poolOptions.forks.isolate=true`
- `pnpm --filter @paperclipai/shared typecheck`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm --filter @paperclipai/ui typecheck`

## Risks

- Moderate behavioral risk because recovery automation timing changes
when enabled; defaults keep existing advisory behavior unless the
setting is turned on.
- No database migration in this PR; settings are stored through the
existing instance settings path.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, `gpt-5`, coding model with tool use and local command
execution; context window not exposed by the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-27 08:46:44 -05:00
Dotta 82e257c7ba Cancel stale queued heartbeats when issue graph changes (PAP-2314) (#4534)
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-26 21:17:38 -05:00
Devin Foley 868d08903e test: isolate CLI company import e2e state (#4560)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies, and its
CLI import/export path is part of how operators move company state
safely between environments.
> - The `paperclipai company import/export` e2e test is supposed to
validate that portability flow inside a hermetic harness, not against a
developer's live Paperclip home.
> - This regression showed nested CLI subprocesses could silently fall
back to ambient `PAPERCLIP_*` state and mutate a real local instance by
creating extra companies such as `CLI-1-Roundtrip-Test`.
> - The first job was to pin the test subprocesses to isolated config,
home, instance, auth, and context paths, and to add a regression
assertion that proves the nested CLI writes stay inside the test-owned
state.
> - Once the PR was up, CI and Greptile exposed two follow-on issues
that were blocking merge: plugin SDK typecheck bootstrap was racing
across packages in fresh CI, and the new lock helper needed one more fix
to release its lock on failure.
> - This pull request therefore ends up doing two tightly related
things: fixing the original CLI isolation leak, and hardening the
supporting typecheck/bootstrap path enough for the fix to verify cleanly
in CI.
> - The benefit is that the portability e2e test is now actually
isolated, and the PR verification path is stable enough to catch
regressions instead of introducing its own nondeterministic failures.

## What Changed

- Hardened `cli/src/__tests__/company-import-export-e2e.test.ts` so
nested CLI subprocesses re-seed isolated `PAPERCLIP_CONFIG`,
`PAPERCLIP_HOME`, `PAPERCLIP_INSTANCE_ID`, `PAPERCLIP_CONTEXT`,
`PAPERCLIP_AUTH_STORE`, and throwaway `HOME` values instead of falling
back to ambient machine state.
- Added a regression assertion around `paperclipai context set --json`,
then cleared the temporary `context.json` so the isolation check and the
later export/import flow stay independent.
- Passed the same isolated `HOME` into the server subprocess so both
sides of the e2e harness are symmetric.
- Introduced locking in `scripts/ensure-plugin-build-deps.mjs` and
switched the server/plugin example `typecheck` scripts to use that
helper instead of launching concurrent raw `@paperclipai/plugin-sdk`
builds.
- Fixed the helper failure path so it releases the lock before exiting
non-zero, which prevents stale-lock timeouts during parallel typecheck
runs.

## Verification

- `pnpm vitest run cli/src/__tests__/company-import-export-e2e.test.ts
--project paperclipai`
- `pnpm --filter paperclipai typecheck`
- `pnpm -r typecheck`
- PR checks now pass on the current head, including `policy`, `verify`,
`e2e`, `security/snyk`, and `Greptile Review`.

## Risks

- Low risk. The product-facing behavior change is scoped to test harness
code in the CLI e2e suite.
- The CI stabilization changes only affect bootstrap/typecheck helper
paths for the server and plugin/example packages, but they do touch
shared verification plumbing; the main risk is changing how fresh build
artifacts are prepared in local/CI typecheck runs.

## Model Used

- Anthropic Claude via Paperclip `claude_local`, model
`claude-opus-4-7`, high-effort local coding agent, used for the initial
implementation and first peer-reviewed verification.
- OpenAI Codex via Paperclip `codex_local`, model `gpt-5.4`, high
reasoning-effort local coding agent with tool use, used for CI triage,
Greptile follow-up fixes, verification, and PR maintenance.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-26 19:10:01 -07:00
Devin Foley d47ffa87f0 Fix CEO AGENT_HOME paths and centralize workspace env propagation (#4551)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The local adapter layer is responsible for turning Paperclip runtime
context into the environment seen by the child agent process.
> - The CEO onboarding bundle tells the agent where to read and write
its persistent memory and fact files.
> - That bundle was using `./memory/...` and `./life/...`, which only
works when the process cwd happens to equal the agent home directory.
> - At the same time, six local adapters each duplicated the same
workspace-env propagation logic, including `AGENT_HOME`, which makes
this contract easy to drift.
> - This pull request fixes the CEO instructions to use
`$AGENT_HOME/...` and centralizes workspace-env propagation in one
shared helper with shared tests.
> - The benefit is a real bug fix for agent memory paths plus a single
tested contract that makes future built-in adapter work less likely to
forget `AGENT_HOME`.

## What Changed

- Updated `server/src/onboarding-assets/ceo/HEARTBEAT.md` to use
`$AGENT_HOME/memory/...` and `$AGENT_HOME/life/...` instead of
cwd-relative `./memory/...` and `./life/...`.
- Added `applyPaperclipWorkspaceEnv(...)` in
`packages/adapter-utils/src/server-utils.ts` to centralize
`PAPERCLIP_WORKSPACE_*` and `AGENT_HOME` propagation.
- Added shared helper coverage in
`packages/adapter-utils/src/server-utils.test.ts` for both populated and
skip-empty cases.
- Switched the built-in local adapters (`claude_local`, `codex_local`,
`cursor_local`, `gemini_local`, `opencode_local`, `pi_local`) over to
the shared helper instead of inline env assignment blocks.

## Verification

- `pnpm install`
- `pnpm exec vitest run packages/adapter-utils/src/server-utils.test.ts
packages/adapters/claude-local/src/server/execute.remote.test.ts
packages/adapters/codex-local/src/server/execute.remote.test.ts
packages/adapters/cursor-local/src/server/execute.remote.test.ts
packages/adapters/gemini-local/src/server/execute.remote.test.ts
packages/adapters/opencode-local/src/server/execute.remote.test.ts
packages/adapters/pi-local/src/server/execute.remote.test.ts`
- Result: 7 test files passed, 31 tests passed, 0 failures.

## Risks

- Low risk.
- The only behavioral surface is the shared env propagation refactor
across six adapters; if the helper diverged from prior semantics, an
adapter could miss a workspace env var.
- The shared helper test plus the affected adapter execute tests reduce
that risk, and the helper preserves the prior "set only non-empty
strings" behavior.

## Model Used

- OpenAI Codex via Paperclip `codex_local` agent runtime; tool-assisted
coding workflow with shell execution, file patching, git operations, and
API interaction. The exact backend model identifier and context window
are not surfaced by this local runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-26 13:57:35 -07:00
Devin Foley 91333ec86f feat: add paperclip-dev skill with optional bundled skill support (#3854)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents working on the Paperclip codebase itself need guidance on dev
workflows: server lifecycle, worktrees, builds, database ops,
diagnostics
> - There was no bundled skill covering these workflows — agents had to
figure it out from scratch each time
> - Additionally, not every skill should be force-installed on every
agent — a dev-focused skill should be opt-in
> - This PR adds a `paperclip-dev` skill with `required: false`
frontmatter so it ships with Paperclip but isn't auto-installed
> - The skill's PR section references canonical files
(`.github/PULL_REQUEST_TEMPLATE.md`, `CONTRIBUTING.md`) instead of
duplicating their content, with gated instructions that force agents to
read those files before creating any PR
> - The benefit is that developers (human or agent) can opt in to
structured dev guidance without polluting the default agent skill set or
creating drift between duplicated docs

## What Changed

- Added `skills/paperclip-dev/SKILL.md` covering server management,
worktree lifecycle, builds, database ops, diagnostics, agent operations,
and common mistakes
- The Pull Requests section uses gated, reference-based instructions —
agents MUST read `.github/PULL_REQUEST_TEMPLATE.md` and
`CONTRIBUTING.md` before running `gh pr create`, with a brief checklist
of required section names (no content duplication)
- Updated `packages/adapter-utils/src/server-utils.ts` to respect
`required: false` frontmatter — optional skills are bundled but not
auto-installed on agents
- Added test in `server/src/__tests__/paperclip-skill-utils.test.ts`
verifying that optional skills are excluded from the default install set

## Verification

```bash
# Run tests
pnpm test

# Manual verification: create a fresh worktree without seeding
npx paperclipai worktree:make test-optional-skill --no-seed
cd ~/paperclip-test-optional-skill
eval "$(npx paperclipai worktree env)"
npx paperclipai run

# Verify paperclip-dev appears in company skill library but is NOT auto-assigned
# Call listPaperclipSkillEntries() — paperclip-dev should show required: false
# Call resolvePaperclipDesiredSkillNames() — paperclip-dev should NOT be in the default set

# Cleanup
npx paperclipai worktree:cleanup test-optional-skill
```

## Risks

- Low risk. The `required` field defaults to `true` when absent, so all
existing skills behave identically. Only the new `paperclip-dev` skill
sets `required: false`.

## Model Used

Claude Opus 4.6 (`claude-opus-4-6`) via Claude Code, with tool use and
extended context.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-26 11:06:13 -07:00
Dotta c036bbfa98 Add first-class security agent role to taxonomy (#4532)
## Thinking Path

> - Paperclip is the control plane for AI-agent companies, so agent
metadata is part of the platform's governance and audit surface.
> - The shared agent taxonomy in `@paperclipai/shared` is the source of
truth for allowed agent roles and their UI labels.
> - The current taxonomy lacks a `security` role, which causes Security
Engineer hires to collapse into `engineer`.
> - That breaks separation-of-duties evidence in telemetry and weakens
role-level audit fidelity even though it does not directly change
permissions.
> - This pull request adds `security` as a first-class shared role and
covers the prior rejection path with a regression test.
> - The benefit is that Security Engineer agents can now be persisted
and rendered under the correct role without schema or permission churn.

## What Changed

- Added `security` to `AGENT_ROLES` in
`packages/shared/src/constants.ts`.
- Added the `Security` display label to `AGENT_ROLE_LABELS` so existing
UI consumers render the new role automatically.
- Added a shared validator regression test proving `createAgentSchema`
accepts `role: "security"` and that the label stays stable.

## Verification

- `pnpm --filter @paperclipai/shared typecheck`
- `pnpm --filter @paperclipai/shared exec vitest run
src/adapter-types.test.ts`

## Risks

- Low risk. This is a shared enum expansion with no database migration
and no permission-model change.
- Residual risk: this PR does not backfill existing agents already
persisted as `engineer`; it only fixes new validations and labels going
forward.

> I checked `ROADMAP.md`/`doc` for overlap and did not find an existing
planned item covering this taxonomy fix.

## Model Used

- OpenAI GPT-5.4 via the Codex local adapter, with tool use and local
code execution enabled. The runtime did not surface a separate
context-window identifier in agent metadata.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-26 07:52:05 -05:00
Devin Foley 4ef969f084 Add E2B sandbox provider plugin (#4452)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Sandbox environments are part of that execution layer, and the
recent core refactor moved provider-specific behavior to a generic
plugin seam
> - This pull request adds a dedicated `@paperclipai/plugin-e2b` package
so E2B can live entirely outside core host code
> - Because the feature is still unreleased, the plugin should model
third-party packaging directly instead of carrying extra
backward-compatibility complexity in core or the workspace lockfile
> - This branch therefore makes the E2B provider a standalone
publishable package, documents the package-local dev flow, and keeps the
publish manifest/runtime dependency story correct
> - The benefit is that E2B becomes a true plugin reference
implementation that can be installed by package name without reopening
core Paperclip code

## What Changed

- Added `packages/plugins/paperclip-plugin-e2b` as the E2B sandbox
provider plugin package
- Implemented config validation, lease acquire/resume/release/destroy
handlers, workspace realization, and command execution for E2B sandboxes
- Excluded the E2B plugin package from the root workspace so the repo no
longer needs `pnpm-lock.yaml` churn for its third-party dependency graph
- Added package-local development/install support plus a prepack
manifest generator so the published tarball still declares
`@paperclipai/plugin-sdk` and `e2b` runtime dependencies
- Addressed review feedback by fixing sandbox cleanup on acquire
failures, rejecting blank templates, normalizing fractional `timeoutMs`,
and always passing the configured template name to the E2B SDK
- Updated focused Vitest coverage for config normalization, validation,
acquire cleanup, command execution, and lease release behavior
- Updated the Dockerfile deps stage to copy the E2B package manifest so
the policy check stays in sync

## Verification

- `cd packages/plugins/paperclip-plugin-e2b && pnpm install
--ignore-workspace --no-lockfile`
- `cd packages/plugins/paperclip-plugin-e2b && pnpm build`
- `cd packages/plugins/paperclip-plugin-e2b && pnpm --ignore-workspace
test`
- `cd packages/plugins/paperclip-plugin-e2b && pnpm --ignore-workspace
typecheck`
- `cd packages/plugins/paperclip-plugin-e2b && npm pack --dry-run`

## Risks

- The package now relies on a prepack manifest rewrite so the
publish-time dependency list stays correct while the repo-local dev
manifest stays workspace-light
- The current repo snapshot is still unreleased, so the generated
publish manifest points at the repo SDK version until the normal release
flow rewrites versions before publish
- Real-world E2B environments may still expose edge cases around
lifecycle timing or sandbox metadata beyond the mocked unit coverage

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex via `codex_local`
- Model ID: `gpt-5.4`
- Reasoning effort: `high`
- Context window observed in runtime session metadata: `258400` tokens
- Capabilities used: terminal tool execution, git, GitHub CLI, and local
build/test inspection

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-25 11:01:11 -07:00
Devin Foley 5bd0f578fd Generalize sandbox provider core for plugin-only providers (#4449)
## Thinking Path

> - Paperclip is a control plane, so optional execution providers should
sit at the plugin edge instead of hardcoding provider-specific behavior
into core shared/server/ui layers.
> - Sandbox environments are already first-class, and the fake provider
proves the built-in path; the remaining gap was that real providers
still leaked provider-specific config and runtime assumptions into core.
> - That coupling showed up in config normalization, secret persistence,
capabilities reporting, lease reconstruction, and the board UI form
fields.
> - As long as core knew about those provider-shaped details, shipping a
provider as a pure third-party plugin meant every new provider would
still require host changes.
> - This pull request generalizes the sandbox provider seam around
schema-driven plugin metadata and generic secret-ref handling.
> - The runtime and UI now consume provider metadata generically, so
core only special-cases the built-in fake provider while third-party
providers can live entirely in plugins.

## What Changed

- Added generic sandbox-provider capability metadata so plugin-backed
providers can expose `configSchema` through shared environment support
and the environments capabilities API.
- Reworked sandbox config normalization/persistence/runtime resolution
to handle schema-declared secret-ref fields generically, storing them as
Paperclip secrets and resolving them for probe/execute/release flows.
- Generalized plugin sandbox runtime handling so provider validation,
reusable-lease matching, lease reconstruction, and plugin worker calls
all operate on provider-agnostic config instead of provider-shaped
branches.
- Replaced hardcoded sandbox provider form fields in Company Settings
with schema-driven rendering and blocked agent environment selection
from the built-in fake provider.
- Added regression coverage for the generic seam across shared support
helpers plus environment config, probe, routes, runtime, and
sandbox-provider runtime tests.

## Verification

- `pnpm vitest --run packages/shared/src/environment-support.test.ts
server/src/__tests__/environment-config.test.ts
server/src/__tests__/environment-probe.test.ts
server/src/__tests__/environment-routes.test.ts
server/src/__tests__/environment-runtime.test.ts
server/src/__tests__/sandbox-provider-runtime.test.ts`
- `pnpm -r typecheck`

## Risks

- Plugin sandbox providers now depend more heavily on accurate
`configSchema` declarations; incorrect schemas can misclassify
secret-bearing fields or omit required config.
- Reusable lease matching is now metadata-driven for plugin-backed
providers, so providers that fail to persist stable metadata may
reprovision instead of resuming an existing lease.
- The UI form is now fully schema-driven for plugin-backed sandbox
providers; provider manifests without good defaults or descriptions may
produce a rougher operator experience.

## Model Used

- OpenAI Codex via `codex_local`
- Model ID: `gpt-5.4`
- Reasoning effort: `high`
- Context window observed in runtime session metadata: `258400` tokens
- Capabilities used: terminal tool execution, git, and local code/test
inspection

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-24 18:03:41 -07:00
Dotta 73fbdf36db Gate stale-run watchdog decisions by board access (#4446)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The run ledger surfaces stale-run watchdog evaluation issues and
recovery actions
> - Viewer-level board users should be able to inspect status without
getting controls that the server will reject
> - The UI also needs enough board-access context to know when to hide
those decision actions
> - This pull request exposes board memberships in the current board
access snapshot and gates watchdog action controls for known viewer
contexts
> - The benefit is clearer least-privilege UI behavior around recovery
controls

## What Changed

- Included memberships in `/api/cli-auth/me` so the board UI can
distinguish active viewer memberships from operator/admin access.
- Added the stale-run evaluation issue assignee to output silence
summaries.
- Hid stale-run watchdog decision buttons for known non-owner viewer
contexts.
- Surfaced watchdog decision failures through toast and inline error
text.
- Threaded `companyId` through the issue activity run ledger so access
checks are company-scoped.
- Added IssueRunLedger coverage for non-owner viewers.

## Verification

- `pnpm exec vitest run --project @paperclipai/ui
ui/src/components/IssueRunLedger.test.tsx`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm --filter @paperclipai/ui typecheck`

## Risks

- Medium-low risk. This is a UI gating change backed by existing server
authorization.
- Local implicit and instance-admin board contexts continue to show
watchdog decision controls.
- No migrations.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool-enabled with
shell/GitHub/Paperclip API access. Context window was not reported by
the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-24 19:25:23 -05:00
Dotta 0c6961a03e Normalize escaped multiline issue and approval text (#4444)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The board and agent APIs accept multiline issue, approval,
interaction, and document text
> - Some callers accidentally send literal escaped newline sequences
like `\n` instead of JSON-decoded line breaks
> - That makes comments, descriptions, documents, and approval notes
render as flattened text instead of readable markdown
> - This pull request centralizes multiline text normalization in shared
validators
> - The benefit is newline-preserving API behavior across issue and
approval workflows without route-specific fixes

## What Changed

- Added a shared `multilineTextSchema` helper that normalizes escaped
`\n`, `\r\n`, and `\r` sequences to real line breaks.
- Applied the helper to issue descriptions, issue update comments, issue
comment bodies, suggested task descriptions, interaction summaries,
issue documents, approval comments, and approval decision notes.
- Added shared validator regressions for issue and approval multiline
inputs.

## Verification

- `pnpm exec vitest run --project @paperclipai/shared
packages/shared/src/validators/approval.test.ts
packages/shared/src/validators/issue.test.ts`
- `pnpm --filter @paperclipai/shared typecheck`

## Risks

- Low risk. This only changes text fields that are explicitly multiline
user/operator content.
- If a caller intentionally wanted literal backslash-n text in these
fields, it will now render as a real line break.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool-enabled with
shell/GitHub/Paperclip API access. Context window was not reported by
the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-24 18:02:45 -05:00
Dotta 5a0c1979cf [codex] Add runtime lifecycle recovery and live issue visibility (#4419) 2026-04-24 15:50:32 -05:00
Dotta 9a8d219949 [codex] Stabilize tests and local maintenance assets (#4423)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - A fast-moving control plane needs stable local tests and repeatable
local maintenance tools so contributors can safely split and review work
> - Several route suites needed stronger isolation, Codex manual model
selection needed a faster-mode option, and local browser cleanup missed
Playwright's headless shell binary
> - Storybook static output also needed to be preserved as a generated
review artifact from the working branch
> - This pull request groups the test/local-dev maintenance pieces so
they can be reviewed separately from product runtime changes
> - The benefit is more predictable contributor verification and cleaner
local maintenance without mixing these changes into feature PRs

## What Changed

- Added stable Vitest runner support and serialized route/authz test
isolation.
- Fixed workspace runtime authz route mocks and stabilized
Claude/company-import related assertions.
- Allowed Codex fast mode for manually selected models.
- Broadened the agent browser cleanup script to detect
`chrome-headless-shell` as well as Chrome for Testing.
- Preserved generated Storybook static output from the source branch.

## Verification

- `pnpm exec vitest run
src/__tests__/workspace-runtime-routes-authz.test.ts
src/__tests__/claude-local-execute.test.ts --config vitest.config.ts`
from `server/` passed: 2 files, 19 tests.
- `pnpm exec vitest run src/server/codex-args.test.ts --config
vitest.config.ts` from `packages/adapters/codex-local/` passed: 1 file,
3 tests.
- `bash -n scripts/kill-agent-browsers.sh &&
scripts/kill-agent-browsers.sh --dry` passed; dry-run detected
`chrome-headless-shell` processes without killing them.
- `test -f ui/storybook-static/index.html && test -f
ui/storybook-static/assets/forms-editors.stories-Dry7qwx2.js` passed.
- `git diff --check public-gh/master..pap-2228-test-local-maintenance --
. ':(exclude)ui/storybook-static'` passed.
- `pnpm exec vitest run
cli/src/__tests__/company-import-export-e2e.test.ts --config
cli/vitest.config.ts` did not complete in the isolated split worktree
because `paperclipai run` exited during build prep with `TS2688: Cannot
find type definition file for 'react'`; this appears to be caused by the
worktree dependency symlink setup, not the code under test.
- Confirmed this PR does not include `pnpm-lock.yaml`.

## Risks

- Medium risk: the stable Vitest runner changes how route/authz tests
are scheduled.
- Generated `ui/storybook-static` files are large and contain minified
third-party output; `git diff --check` reports whitespace inside those
generated assets, so reviewers may choose to drop or regenerate that
artifact before merge.
- No database migrations.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex coding agent based on GPT-5, with shell, git, Paperclip
API, and GitHub CLI tool use in the local Paperclip workspace.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Note: screenshot checklist item is not applicable to source UI behavior;
the included Storybook static output is generated artifact preservation
from the source branch.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-24 15:11:42 -05:00
Devin Foley 70679a3321 Add sandbox environment support (#4415)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The environment/runtime layer decides where agent work executes and
how the control plane reaches those runtimes.
> - Today Paperclip can run locally and over SSH, but sandboxed
execution needs a first-class environment model instead of one-off
adapter behavior.
> - We also want sandbox providers to be pluggable so the core does not
hardcode every provider implementation.
> - This branch adds the Sandbox environment path, the provider
contract, and a deterministic fake provider plugin.
> - That required synchronized changes across shared contracts, plugin
SDK surfaces, server runtime orchestration, and the UI
environment/workspace flows.
> - The result is that sandbox execution becomes a core control-plane
capability while keeping provider implementations extensible and
testable.

## What Changed

- Added sandbox runtime support to the environment execution path,
including runtime URL discovery, sandbox execution targeting,
orchestration, and heartbeat integration.
- Added plugin-provider support for sandbox environments so providers
can be supplied via plugins instead of hardcoded server logic.
- Added the fake sandbox provider plugin with deterministic behavior
suitable for local and automated testing.
- Updated shared types, validators, plugin protocol definitions, and SDK
helpers to carry sandbox provider and workspace-runtime contracts across
package boundaries.
- Updated server routes and services so companies can create sandbox
environments, select them for work, and execute work through the sandbox
runtime path.
- Updated the UI environment and workspace surfaces to expose sandbox
environment configuration and selection.
- Added test coverage for sandbox runtime behavior, provider seams,
environment route guards, orchestration, and the fake provider plugin.

## Verification

- Ran locally before the final fixture-only scrub:
  - `pnpm -r typecheck`
  - `pnpm test:run`
  - `pnpm build`
- Ran locally after the final scrub amend:
  - `pnpm vitest run server/src/__tests__/runtime-api.test.ts`
- Reviewer spot checks:
  - create a sandbox environment backed by the fake provider plugin
  - run work through that environment
- confirm sandbox provider execution does not inherit host secrets
implicitly

## Risks

- This touches shared contracts, plugin SDK plumbing, server runtime
orchestration, and UI environment/workspace flows, so regressions would
likely show up as cross-layer mismatches rather than isolated type
errors.
- Runtime URL discovery and sandbox callback selection are sensitive to
host/bind configuration; if that logic is wrong, sandbox-backed
callbacks may fail even when execution succeeds.
- The fake provider plugin is intentionally deterministic and
test-oriented; future providers may expose capability gaps that this
branch does not yet cover.

## Model Used

- OpenAI Codex coding agent on a GPT-5-class backend in the
Paperclip/Codex harness. Exact backend model ID is not exposed
in-session. Tool-assisted workflow with shell execution, file editing,
git history inspection, and local test execution.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-24 12:15:53 -07:00
Dotta 8f1cd0474f [codex] Improve transient recovery and Codex model refresh (#4383)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Adapter execution and retry classification decide whether agent work
pauses, retries, or recovers automatically
> - Transient provider failures need to be classified precisely so
Paperclip does not convert retryable upstream conditions into false hard
failures
> - At the same time, operators need an up-to-date model list for
Codex-backed agents and prompts should nudge agents toward targeted
verification instead of repo-wide sweeps
> - This pull request tightens transient recovery classification for
Claude and Codex, updates the agent prompt guidance, and adds Codex
model refresh support end-to-end
> - The benefit is better automatic retry behavior plus fresher
operator-facing model configuration

## What Changed

- added Codex usage-limit retry-window parsing and Claude extra-usage
transient classification
- normalized the heartbeat transient-recovery contract across adapter
executions and heartbeat scheduling
- documented that deferred comment wakes only reopen completed issues
for human/comment-reopen interactions, while system follow-ups leave
closed work closed
- updated adapter-utils prompt guidance to prefer targeted verification
- added Codex model refresh support in the server route, registry,
shared types, and agent config form
- added adapter/server tests covering the new parsing, retry scheduling,
and model-refresh behavior

## Verification

- `pnpm exec vitest run --project @paperclipai/adapter-utils
packages/adapter-utils/src/server-utils.test.ts`
- `pnpm exec vitest run --project @paperclipai/adapter-claude-local
packages/adapters/claude-local/src/server/parse.test.ts`
- `pnpm exec vitest run --project @paperclipai/adapter-codex-local
packages/adapters/codex-local/src/server/parse.test.ts`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/adapter-model-refresh-routes.test.ts
server/src/__tests__/adapter-models.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/codex-local-execute.test.ts
server/src/__tests__/heartbeat-process-recovery.test.ts
server/src/__tests__/heartbeat-retry-scheduling.test.ts`

## Risks

- Moderate behavior risk: retry classification affects whether runs
auto-recover or block, so mistakes here could either suppress needed
retries or over-retry real failures
- Low workflow risk: deferred comment wake reopening is intentionally
scoped to human/comment-reopen interactions so system follow-ups do not
revive completed issues unexpectedly

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex GPT-5-based coding agent with tool use and code execution
in the Codex CLI environment

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-24 09:40:40 -05:00
Dotta 7ad225a198 [codex] Improve issue thread review flow (#4381)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Issue detail is where operators coordinate review, approvals, and
follow-up work with active runs
> - That thread UI needs to surface blockers, descendants, review
handoffs, and reply ergonomics clearly enough for humans to guide agent
work
> - Several small gaps in the issue-thread flow were making review and
navigation clunkier than necessary
> - This pull request improves the reply composer, descendant/blocker
presentation, interaction folding, and review-request handoff plumbing
together as one cohesive issue-thread workflow slice
> - The benefit is a cleaner operator review loop without changing the
broader task model

## What Changed

- restored and refined the floating reply composer behavior in the issue
thread
- folded expired confirmation interactions and improved post-submit
thread scrolling behavior
- surfaced descendant issue context and inline blocker/paused-assignee
notices on the issue detail view
- tightened large-board first paint behavior in `IssuesList`
- added loose review-request handoffs through the issue
execution-policy/update path and covered them with tests

## Verification

- `pnpm vitest run ui/src/pages/IssueDetail.test.tsx`
- `pnpm vitest run server/src/__tests__/issues-service.test.ts
server/src/__tests__/issue-execution-policy.test.ts`
- `pnpm exec vitest run --project @paperclipai/ui
ui/src/components/IssueChatThread.test.tsx
ui/src/components/IssueProperties.test.tsx
ui/src/components/IssuesList.test.tsx ui/src/lib/issue-tree.test.ts
ui/src/api/issues.test.ts`
- `pnpm exec vitest run --project @paperclipai/adapter-utils
packages/adapter-utils/src/server-utils.test.ts`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/issue-comment-reopen-routes.test.ts -t "coerces
executor handoff patches into workflow-controlled review wakes|wakes the
return assignee with execution_changes_requested"`
- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/issue-execution-policy.test.ts
server/src/__tests__/issues-service.test.ts`

## Visual Evidence

- UI layout changes are covered by the focused issue-thread component
and issue-detail tests listed above. Browser screenshots were not
attachable from this automated greploop environment, so reviewers should
use the running preview for final visual confirmation.

## Risks

- Moderate UI-flow risk: these changes touch the issue detail experience
in multiple spots, so regressions would most likely show up as
thread-layout quirks or incorrect review-handoff behavior

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex GPT-5-based coding agent with tool use and code execution
in the Codex CLI environment

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots or documented the visual verification path
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-24 08:02:45 -05:00
Dotta 35a9dc37b0 [codex] Speed up company skill detail loading (#4380)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Company skills are part of the control plane for distributing
reusable capabilities
> - Board flows that inspect company skill detail should stay responsive
because they are operator-facing control-plane reads
> - The existing detail path was doing broader work than needed for the
specific detail screen
> - This pull request narrows that company-skill detail loading path and
adds a regression test around it
> - The benefit is faster company skill detail reads without changing
the external API contract

## What Changed

- tightened the company-skill detail loading path in
`server/src/services/company-skills.ts`
- added `server/src/__tests__/company-skills-detail.test.ts` to verify
the detail route only pulls the required data

## Verification

- `pnpm exec vitest run --project @paperclipai/server
server/src/__tests__/company-skills-detail.test.ts`

## Risks

- Low risk: this only changes the company-skill detail query path, but
any missed assumption in the detail consumer would surface when loading
that screen

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex GPT-5-based coding agent with tool use and code execution
in the Codex CLI environment

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-24 07:37:13 -05:00
Devin Foley e4995bbb1c Add SSH environment support (#4358)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - The environments subsystem already models execution environments,
but before this branch there was no end-to-end SSH-backed runtime path
for agents to actually run work against a remote box
> - That meant agents could be configured around environment concepts
without a reliable way to execute adapter sessions remotely, sync
workspace state, and preserve run context across supported adapters
> - We also need environment selection to participate in normal
Paperclip control-plane behavior: agent defaults, project/issue
selection, route validation, and environment probing
> - Because this capability is still experimental, the UI surface should
be easy to hide and easy to remove later without undoing the underlying
implementation
> - This pull request adds SSH environment execution support across the
runtime, adapters, routes, schema, and tests, then puts the visible
environment-management UI behind an experimental flag
> - The benefit is that we can validate real SSH-backed agent execution
now while keeping the user-facing controls safely gated until the
feature is ready to come out of experimentation

## What Changed

- Added SSH-backed execution target support in the shared adapter
runtime, including remote workspace preparation, skill/runtime asset
sync, remote session handling, and workspace restore behavior after
runs.
- Added SSH execution coverage for supported local adapters, plus remote
execution tests across Claude, Codex, Cursor, Gemini, OpenCode, and Pi.
- Added environment selection and environment-management backend support
needed for SSH execution, including route/service work, validation,
probing, and agent default environment persistence.
- Added CLI support for SSH environment lab verification and updated
related docs/tests.
- Added the `enableEnvironments` experimental flag and gated the
environment UI behind it on company settings, agent configuration, and
project configuration surfaces.

## Verification

- `pnpm exec vitest run
packages/adapters/claude-local/src/server/execute.remote.test.ts
packages/adapters/cursor-local/src/server/execute.remote.test.ts
packages/adapters/gemini-local/src/server/execute.remote.test.ts
packages/adapters/opencode-local/src/server/execute.remote.test.ts
packages/adapters/pi-local/src/server/execute.remote.test.ts`
- `pnpm exec vitest run server/src/__tests__/environment-routes.test.ts`
- `pnpm exec vitest run
server/src/__tests__/instance-settings-routes.test.ts`
- `pnpm exec vitest run ui/src/lib/new-agent-hire-payload.test.ts
ui/src/lib/new-agent-runtime-config.test.ts`
- `pnpm -r typecheck`
- `pnpm build`
- Manual verification on a branch-local dev server:
  - enabled the experimental flag
  - created an SSH environment
  - created a Linux Claude agent using that environment
- confirmed a run executed on the Linux box and synced workspace changes
back

## Risks

- Medium: this touches runtime execution flow across multiple adapters,
so regressions would likely show up in remote session setup, workspace
sync, or environment selection precedence.
- The UI flag reduces exposure, but the underlying runtime and route
changes are still substantial and rely on migration correctness.
- The change set is broad across adapters, control-plane services,
migrations, and UI gating, so review should pay close attention to
environment-selection precedence and remote workspace lifecycle
behavior.

## Model Used

- OpenAI Codex via Paperclip's local Codex adapter, GPT-5-class coding
model with tool use and code execution in the local repo workspace. The
local adapter does not surface a more specific public model version
string in this branch workflow.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-23 19:15:22 -07:00
Dotta f98c348e2b [codex] Add issue subtree pause, cancel, and restore controls (#4332)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - This branch extends the issue control-plane so board operators can
pause, cancel, and later restore whole issue subtrees while keeping
descendant execution and wake behavior coherent.
> - That required new hold state in the database, shared contracts,
server routes/services, and issue detail UI controls so subtree actions
are durable and auditable instead of ad hoc.
> - While this branch was in flight, `master` advanced with new
environment lifecycle work, including a new `0065_environments`
migration.
> - Before opening the PR, this branch had to be rebased onto
`paperclipai/paperclip:master` without losing the existing
subtree-control work or leaving conflicting migration numbering behind.
> - This pull request rebases the subtree pause/cancel/restore feature
cleanly onto current `master`, renumbers the hold migration to
`0066_issue_tree_holds`, and preserves the full branch diff in a single
PR.
> - The benefit is that reviewers get one clean, mergeable PR for the
subtree-control feature instead of stale branch history with migration
conflicts.

## What Changed

- Added durable issue subtree hold data structures, shared
API/types/validators, server routes/services, and UI flows for subtree
pause, cancel, and restore operations.
- Added server and UI coverage for subtree previewing, hold
creation/release, dependency-aware scheduling under holds, and issue
detail subtree controls.
- Rebased the branch onto current `paperclipai/paperclip:master` and
renumbered the branch migration from `0065_issue_tree_holds` to
`0066_issue_tree_holds` so it no longer conflicts with upstream
`0065_environments`.
- Added a small follow-up commit that makes restore requests return `200
OK` explicitly while keeping pause/cancel hold creation at `201
Created`, and updated the route test to match that contract.

## Verification

- `pnpm --filter @paperclipai/db typecheck`
- `pnpm --filter @paperclipai/shared typecheck`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm --filter @paperclipai/ui typecheck`
- `cd server && pnpm exec vitest run
src/__tests__/issue-tree-control-routes.test.ts
src/__tests__/issue-tree-control-service.test.ts
src/__tests__/issue-tree-control-service-unit.test.ts
src/__tests__/heartbeat-dependency-scheduling.test.ts`
- `cd ui && pnpm exec vitest run src/components/IssueChatThread.test.tsx
src/pages/IssueDetail.test.tsx`

## Risks

- This is a broad cross-layer change touching DB/schema, shared
contracts, server orchestration, and UI; regressions are most likely
around subtree status restoration or wake suppression/resume edge cases.
- The migration was renumbered during PR prep to avoid the new upstream
`0065_environments` conflict. Reviewers should confirm the final
`0066_issue_tree_holds` ordering is the only hold-related migration that
lands.
- The issue-tree restore endpoint now responds with `200` instead of
relying on implicit behavior, which is semantically better for a restore
operation but still changes an API detail that clients or tests could
have assumed.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex coding agent in the Paperclip Codex runtime (GPT-5-class
tool-using coding model; exact deployment ID/context window is not
exposed inside this session).

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [ ] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-04-23 14:51:46 -05:00
Russell Dempsey 854fa81757 fix(pi-local): prepend installed skill bin/ dirs to child PATH (#4331)
## Thinking Path

> - Paperclip orchestrates AI agents; each agent runs under an adapter
that spawns a model CLI as a child process.
> - The pi-local adapter (`packages/adapters/pi-local`) spawns `pi` and
inherits the child's shell environment — including `PATH`, which
determines what the child's bash tool can execute by name.
> - Paperclip skills ship executable helpers under `<skill>/bin/` (e.g.
`paperclip-get-issue`) and Reviewer/QA-style `AGENTS.md` files invoke
them by name via the agent's bash tool.
> - Pi-local builds its runtime env with `ensurePathInEnv({
...process.env, ...env })` only — it never adds the installed skills'
`bin/` dirs to PATH. The pi CLI's `--skill` arg loads each skill's
SKILL.md but does not augment PATH.
> - Consequence: every bash invocation of a skill helper fails with
`exit 127: command not found`. The agent then spends its heartbeat
guessing (re-reading SKILL.md, trying `find`, inventing command paths)
and either times out or gives up.
> - This PR prepends each injected skill's `bin/` directory to the child
PATH immediately before runtimeEnv is constructed.
> - The benefit: pi_local agents whose AGENTS.md uses any `paperclip-*`
skill helper can actually run those helpers.

## What Changed

- `packages/adapters/pi-local/src/server/execute.ts`: compute
`skillBinDirs` from the already-resolved `piSkillEntries`, dedupe
against the existing PATH, prepend them to whichever of `PATH` / `Path`
the merged env uses, then build `runtimeEnv`. No new helpers, no
adapter-utils changes.

## Verification

Manual repro before the fix:

1. Create a pi_local agent wired to a paperclip skill (e.g.
paperclip-control).
2. Wake the agent on an in_review issue with an AGENTS.md that starts
with `paperclip-get-issue "$PAPERCLIP_TASK_ID"`.
3. Session file: `{ "role": "toolResult", "isError": true, "content": [{
"text": "/bin/bash: paperclip-get-issue: command not found\n\nCommand
exited with code 127" }] }`.

After the fix: same wake; `paperclip-get-issue` resolves and returns the
issue JSON; agent proceeds.

Local commands:

```
pnpm --filter @paperclipai/adapter-pi-local typecheck   # clean
pnpm --filter @paperclipai/adapter-pi-local build       # clean
pnpm --filter @paperclipai/server exec vitest run \
  src/__tests__/pi-local-execute.test.ts \
  src/__tests__/pi-local-adapter-environment.test.ts \
  src/__tests__/pi-local-skill-sync.test.ts
# 5/5 passing
```

No new tests: the existing `pi-local-skill-sync.test.ts` covers skill
symlink injection (upstream of the PATH step), and
`pi-local-execute.test.ts` covers the spawn path; this change only
augments env on the same spawn path.

## Risks

Low. Pure PATH augmentation on the child env. Edge cases:

- Zero skills installed → no PATH change (guarded by
`skillBinDirs.length > 0`).
- Duplicate bin dirs already on PATH → deduped; no pollution on re-runs.
- Windows `Path` casing → falls back correctly when merged env uses
`Path` instead of `PATH`.
- Skill dir without `bin/` subdir → joined path simply won't resolve;
harmless.

No behavioral change for pi_local agents that don't use skill-provided
commands.

## Model Used

- Claude, `claude-opus-4-7` (1M context), extended thinking enabled,
tool use enabled. Walked pi-local/cursor-local/claude-local and
adapter-utils to isolate the gap, wrote the inlined fix, and ran
typecheck/build/test locally.

## Checklist

- [x] Thinking path from project context to this change
- [x] Model used specified
- [x] Checked ROADMAP.md — no overlap
- [x] Tests run locally, passing
- [x] Tests added — new case in
`server/src/__tests__/pi-local-execute.test.ts`; verified it fails when
the fix is reverted
- [ ] UI screenshots — N/A (backend adapter change)
- [x] Docs updated — N/A (internal adapter, no user-facing docs)
- [x] Risks documented
- [x] Will address reviewer comments before merge
2026-04-23 10:15:10 -05:00