forked from farhoodlabs/paperclip
8014445b238531a00a6b91e79e94de6be9981f59
61 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1bd44c8a0d |
Harden Cloudflare sandbox execution (#5967)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies. > - Remote-managed adapters need sandbox/environment execution to behave like real agent runs, not just local host probes. > - The Cloudflare sandbox path was the weakest leg in the SSH + Cloudflare QA matrix because bridge execution could truncate output, time out long-running installs, and under-provision the worker instance. > - That made several adapters fail for reasons unrelated to their actual business logic, which blocks confidence in Paperclip's non-local environment model. > - This pull request hardens the Cloudflare bridge/runtime path and adjusts sandbox probe budgets so adapter verification matches the measured behavior of the fixed environment. > - It also corrects the Pi sandbox install command so the QA matrix exercises a real, supported install path. > - The benefit is a materially more reliable SSH + Cloudflare adapter matrix with fewer false negatives and clearer failure boundaries. ## What Changed - Switched the Cloudflare bridge worker instance type to `standard-2` for the QA-matrix execution path. - Raised Cloudflare bridge/plugin-worker timeout budgets and added SSE keepalives so long-running install/exec calls can complete instead of dying at the transport layer. - Fixed Cloudflare bridge-channel command handling to avoid dropped final stdout chunks on short-lived execs. - Made Claude, OpenCode, and Cursor sandbox probe timeouts configurable/sandbox-aware, then tightened the defaults to the measured post-fix range. - Updated the Pi sandbox install command to use the package currently installed by the official `pi.dev` installer, pinned to a specific npm version. - Added/updated tests around Cloudflare bridge behavior and adapter sandbox probe paths. ## Verification - `pnpm --filter @paperclipai/adapter-claude-local typecheck` - `pnpm --filter @paperclipai/adapter-opencode-local typecheck` - `pnpm --filter @paperclipai/adapter-cursor-local typecheck` - `pnpm vitest run packages/adapters/cursor-local packages/adapters/claude-local packages/adapters/opencode-local packages/adapters/pi-local packages/plugins/sandbox-providers/cloudflare server/src/services/__tests__/plugin-worker-manager.test.ts` - Manual QA on the dedicated dev instance using the SSH + Cloudflare environment matrix (`ENV-29` through `ENV-40`). Clean end-to-end passes: SSH `claude_local`, `codex_local`, `cursor`, `gemini_local`; Cloudflare `claude_local`, `codex_local`, `cursor`, `gemini_local`. ## Risks - Cloudflare sandbox cost increases because the bridge worker now runs on `standard-2` instead of `lite`. - Higher timeout ceilings can delay surfacing truly hung Cloudflare bridge calls, even though they remove transport-level false negatives. - The manual heartbeat matrix still exposed follow-on execution/sync/disposition bugs in `opencode_local` and `pi_local`; those are not fixed by this PR. ## Model Used - OpenAI `gpt-5.4` via Paperclip `codex_local`, reasoning effort `high`, tool use enabled, repo search enabled. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots (not applicable) - [x] I have updated relevant documentation to reflect my changes (not applicable) - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing> |
||
|
|
0fe39a2d5c |
fix(cursor-local): resolve sandbox agent installs from cursor bin (#5686)
> _Stacked on top of #5685 (Harden remote sandbox runtime). Diff against master includes commits from earlier PRs in the stack — review focuses on the new commit only._ ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - The cursor-local adapter wraps the Cursor Agent CLI so a Paperclip workflow can drive it inside a sandbox > - When the adapter runs in a remote sandbox, the Cursor Agent CLI installs under `$HOME/.local/bin/cursor-agent` (or wherever `$XDG_BIN_HOME` points), not on the global PATH > - The existing post-install resolution assumed `cursor-agent` would resolve via the sandbox's login shell PATH after `npm install -g`, which fails on sandboxes where the install lands in a user-prefixed directory that isn't on PATH at probe time > - This pull request resolves the agent CLI from the cursor binary's own directory (`dirname "$(command -v cursor)"`) so the install probe and execute path agree on a real binary location > - The benefit is that cursor-local works correctly on any sandbox provider where `npm install` lands in a user-prefixed directory ## What Changed - `packages/adapters/cursor-local/src/server/remote-command.ts`: resolve the cursor-agent binary from the cursor bin directory after install, instead of relying on PATH. - `packages/adapters/cursor-local/src/server/test.ts`: corresponding probe tweak. - `packages/adapters/cursor-local/src/server/test.test.ts` (new) + `remote-command.test.ts`: focused coverage that exercises the install + resolve path against a sandbox runner that places the binary in a user-prefixed directory. ## Verification - `pnpm exec vitest run --no-coverage packages/adapters/cursor-local/src/server/test.test.ts packages/adapters/cursor-local/src/server/remote-command.test.ts packages/adapters/cursor-local/src/server/execute.test.ts` All passing locally. ## Risks - Local cursor-local runs are unaffected — the resolution change only kicks in for the sandbox install path. - Low risk; isolated to one adapter. ## Model Used - Provider: Anthropic - Model: Claude Opus 4.7 (1M context) - Capabilities used: tool use (Read/Edit/Bash), no code execution beyond local repo commands ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots — N/A, no UI change - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge Co-authored-by: Paperclip <noreply@paperclip.ing> |
||
|
|
b24c6909e8 |
Harden remote sandbox runtime probes, timeouts, and installs (#5685)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Each agent runs inside a sandbox environment so its CLI is isolated from the host > - Sandbox-backed adapter runs go through a small set of shared helpers — `ensureAdapterExecutionTargetCommandResolvable`, the sandbox callback bridge runner, and per-adapter `SANDBOX_INSTALL_COMMAND` strings > - When standing up new sandbox provider plugins, the existing helpers timed out, missed install fallbacks, or leaned on assumptions that only held for E2B > - Local adapters (`claude-local`, `codex-local`, `gemini-local`, `opencode-local`) needed slightly hardened probes so they could install themselves and validate inside *any* remote sandbox transport, not just E2B > - This pull request bundles those runtime fixes so future sandbox provider plugins inherit a working baseline > - The benefit is that adding a new sandbox provider plugin no longer requires touching adapter-utils or each local-adapter probe — the supporting infra is already correct ## What Changed - `packages/adapter-utils/src/execution-target.ts`: introduce `DEFAULT_REMOTE_SANDBOX_ADAPTER_TIMEOUT_SEC = 1800` and `resolveAdapterExecutionTargetTimeoutSec(...)`. Local and SSH adapters keep the historical "0 means no adapter timeout" behavior; sandbox-backed runs without an explicit `timeoutSec` get an explicit 30-minute default so remote installs and warm-up don't time out at the per-RPC default. Plumbed `timeoutSec` through `ensureAdapterExecutionTargetCommandResolvable` so install probes inside a sandbox honor adapter-level overrides instead of the bridge's 5-minute default. - `packages/adapters/opencode-local/src/index.ts`: switch `SANDBOX_INSTALL_COMMAND` from `npm install -g opencode-ai` to `curl -fsSL https://opencode.ai/install | bash`. The npm package reifies four large prebuilt-binary subpackages in parallel even though only one matches the host arch; on bandwidth-constrained sandboxes that blew through the 240s install budget. The official installer fetches one arch-specific binary and adds `$HOME/.opencode/bin` to PATH via `~/.bashrc`, which the sandbox-callback-bridge login-shell script already sources. - `packages/adapters/{claude,codex,gemini,opencode}-local/`: harden remote-target probes — pass `--skip-git-repo-check` for Codex when probing outside a repo, normalize permission flags for Claude, and add `*.remote.test.ts` coverage that exercises the remote-sandbox path explicitly for each adapter. - `packages/adapter-utils/src/sandbox-install-command.{ts,test.ts}` (new): add `buildSandboxNpmInstallCommand` helper. `server/src/adapters/registry.ts` + new `server/src/__tests__/adapter-registry.test.ts`: wire adapter install commands so they fall back to a writable `$HOME/.local` prefix when global install isn't available. - `server/src/__tests__/plugin-worker-manager.test.ts` + new `server/src/__tests__/fixtures/plugin-worker-delayed.cjs`: pin per-call timeout overrides so plugin worker exec calls honor the caller's timeout instead of the worker's default. ## Verification - `pnpm typecheck` - `pnpm exec vitest run --no-coverage packages/adapter-utils/src/execution-target-sandbox.test.ts packages/adapter-utils/src/sandbox-install-command.test.ts` - `pnpm exec vitest run --no-coverage server/src/__tests__/plugin-worker-manager.test.ts server/src/__tests__/adapter-registry.test.ts server/src/__tests__/claude-local-adapter-environment.test.ts server/src/__tests__/claude-local-execute.test.ts server/src/__tests__/gemini-local-adapter-environment.test.ts` - `pnpm exec vitest run --no-coverage packages/adapters/codex-local/src/server/test.remote.test.ts packages/adapters/opencode-local/src/server/test.remote.test.ts packages/adapters/codex-local/src/server/codex-args.test.ts packages/adapters/codex-local/src/server/execute.remote.test.ts packages/adapters/gemini-local/src/server/execute.remote.test.ts` All passing locally. ## Risks - Touches shared `adapter-utils` and several `*-local` adapters. The 30-minute default applies only when both (a) the target is `remote+sandbox` and (b) no `timeoutSec` is configured — local + SSH paths are unchanged. New test coverage was added alongside each behavior change to pin the contracts. - Switching OpenCode's install command to the official installer is a behavior change for any operator running OpenCode inside a remote sandbox. Local installs are unaffected (the `SANDBOX_INSTALL_COMMAND` only runs when an adapter is being installed inside a sandbox). - Low risk overall — no migrations, no API surface change. ## Model Used - Provider: Anthropic - Model: Claude Opus 4.7 (1M context) - Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep), no code execution beyond local repo commands ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots — N/A, no UI change - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge Co-authored-by: Paperclip <noreply@paperclip.ing> |
||
|
|
4269545b19 |
Stabilize Cursor sandbox runtime resolution (#5446)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - The Cursor adapter spawns the Cursor CLI against local, SSH, and sandbox execution targets; on a fresh sandbox lease, it has to resolve where Cursor was installed > - The previous resolver only looked for `~/.local/bin/cursor-agent` even though the official installer (and the adapter's own `SANDBOX_INSTALL_COMMAND`) sometimes lays the binary down as `~/.local/bin/agent`, so a sandbox where the install ran successfully would still fail to find the CLI > - This pull request lets the resolver accept either basename and lets the caller pass an optional `remoteSystemHomeDirHint` so a probe doesn't pay the cost of a remote `printf $HOME` round-trip when the home directory is already known > - The benefit is sandboxed Cursor runs find the binary that the install actually produced, and runtime probes are cheaper when the home dir is already resolved ## What Changed - `packages/adapters/cursor-local/src/server/remote-command.ts`: accept either `agent` or `cursor-agent` as the preferred basename; new optional `remoteSystemHomeDirHint` short-circuits the home-dir probe - `packages/adapters/cursor-local/src/server/execute.ts`: thread the home-dir hint through, prefer the resolved binary path, and shift the effective execution cwd to the per-run managed subdirectory once the runtime is prepared - New `remote-command.test.ts` and `execute.test.ts` cover both basenames, the hint short-circuit, and the cwd shift - `packages/adapters/cursor-local/src/index.ts`: update doc string to reflect the broader resolution - `execute.remote.test.ts` updated to expect the managed-subdirectory cwd shape introduced by the cwd shift ## Verification - `pnpm vitest run --no-coverage --project @paperclipai/adapter-cursor-local` — 6/6 passing - `pnpm typecheck` clean - Manual: a fresh sandbox lease with `npm install -g …`-installed Cursor (binary lands as `~/.local/bin/agent`) now runs cleanly through the adapter ## Risks Low. Resolver is strictly broader (matches a superset of paths); existing setups with `~/.local/bin/cursor-agent` continue to work. The home-dir hint is opt-in; callers that don't pass it get the existing probe behavior. Cursor's effective execution cwd now matches the rest of the adapters (per-run managed subdirectory) — sessions previously rooted at the workspace root will land in the new subdirectory. ## Model Used Claude Opus 4.7 (1M context) ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable — new tests cover both basenames + hint short-circuit + cwd shift - [x] If this change affects the UI, I have included before/after screenshots — N/A (no UI) - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --- > **Stacked PR.** Sits on top of #5445 (which sits on #5444). Cumulative diff against `master` includes both of those PRs' content; the files touched by *this* PR's commit are listed under "What Changed" above. Will rebase onto `master` and force-push once the prerequisite PRs merge. |
||
|
|
12cb7b40fd |
Harden remote workspace sync and restore flows (#5444)
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - When an agent runs against a remote target, Paperclip syncs the
workspace out to the remote at run start and restores changes back to
the local workspace at run end
> - The previous restore flow naïvely overwrote local files with
whatever the remote returned, so files that the remote run never touched
but had timestamp/mode drift could be needlessly rewritten — and a
single static `refs/paperclip/ssh-sync/imported` ref made concurrent SSH
workspace exports race on the same git ref
> - This pull request adds a `workspace-restore-merge` module that diffs
a pre-run snapshot against the post-run remote state and only writes
back files the remote actually changed; SSH workspace exports now use a
per-import unique ref so concurrent runs can't trample each other
> - Every adapter's execute path threads the snapshot through
`prepareAdapterExecutionTargetRuntime` so the merge has the baseline it
needs
> - The benefit is workspace restores no longer churn untouched files,
and concurrent SSH runs no longer collide on the import ref
## What Changed
- `packages/adapter-utils/src/workspace-restore-merge.{ts,test.ts}`: new
module — directory snapshot (kind/mode/sha256/symlink target) plus
snapshot-aware merge that writes only the files the remote changed
- `packages/adapter-utils/src/ssh.ts`: SSH workspace export uses a
per-import unique ref (`refs/paperclip/ssh-sync/imported/<uuid>`);
restore goes through the new merge helper; `ssh-fixture.test.ts` covers
the unique-ref + merge paths
- `packages/adapter-utils/src/sandbox-managed-runtime.ts` +
`remote-managed-runtime.ts`: thread the snapshot/merge through the
sandbox and SSH paths
- `packages/adapter-utils/src/server-utils.{ts,test.ts}` +
`execution-target.ts`: helpers for capturing the pre-run snapshot;
`prepareAdapterExecutionTargetRuntime` gains required `runId` and
optional `workspaceRemoteDir`, and returns the realized
`workspaceRemoteDir`
- Each adapter's `execute.ts` (acpx, claude, codex, cursor, gemini,
opencode, pi) takes the snapshot at run start and passes it through to
the runtime restore
- Remote execute test mocks updated to match the new
`prepareWorkspaceForSshExecution` return shape and the per-run
`${managedRemoteWorkspace}` cwd subdirectory
## Verification
- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils
--project @paperclipai/adapter-acpx-local --project
@paperclipai/adapter-claude-local --project
@paperclipai/adapter-codex-local --project
@paperclipai/adapter-cursor-local --project
@paperclipai/adapter-gemini-local --project
@paperclipai/adapter-opencode-local --project
@paperclipai/adapter-pi-local` — 196/196 passing
- `pnpm typecheck` clean across the workspace
## Risks
Medium. The restore path now writes a strict subset of what it
previously did — files the remote did not touch are no longer rewritten.
If any flow was relying on a touch-without-content-change being copied
back (timestamp or permission propagation only), that behavior is now
skipped. Snapshot capture adds an O(N-files-in-workspace) hash pass at
run start; the cost is bounded by the existing exclude list. The `runId`
parameter on `prepareAdapterExecutionTargetRuntime` is now required —
every in-tree caller is updated; out-of-tree adapter authors need to
pass it.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new module +
every adapter execute path covered
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
|
||
|
|
a1b30c9f35 |
Add planning mode for issue work (#5353)
## Thinking Path > - Paperclip is a control plane for autonomous AI companies. > - Issues are the core unit of work, and issue comments are how board users and agents coordinate execution. > - Some issue conversations need to produce plans and approvals instead of immediate implementation work. > - The existing issue contract did not distinguish standard execution comments from planning-oriented issue work. > - This pull request adds an issue work-mode contract and board UI affordances for standard vs planning mode. > - The benefit is that planning-mode issues can be created, displayed, discussed, and carried through agent heartbeat context without losing the normal issue workflow. ## What Changed - Added `standard` / `planning` issue work-mode contracts across DB, shared validators/types, server issue flows, plugin protocol, and adapter heartbeat payloads. - Added an idempotent `0081_optimal_dormammu` migration for `issues.work_mode`, ordered after current `public-gh/master` migrations. - Updated heartbeat/context summaries and issue-thread interaction behavior so planning work mode is preserved when creating suggested follow-up issues. - Added UI support for planning-mode issue creation, issue rows, detail composer styling, and composer work-mode toggles. - Added focused server/shared/UI tests plus a Playwright visual verification spec for planning-mode surfaces. - Rebased the branch onto current `public-gh/master` and added durable planning-mode screenshots under `doc/assets/pap-3368/`. ## Verification - `pnpm --filter @paperclipai/db run check:migrations` - `pnpm exec vitest run --project @paperclipai/shared packages/shared/src/validators/issue.test.ts` - `pnpm exec vitest run --project @paperclipai/server server/src/__tests__/heartbeat-context-summary.test.ts server/src/__tests__/issue-thread-interactions-service.test.ts server/src/__tests__/issues-goal-context-routes.test.ts --pool=forks --poolOptions.forks.isolate=true` - `pnpm exec vitest run --project @paperclipai/ui ui/src/components/IssueChatThread.test.tsx ui/src/components/NewIssueDialog.test.tsx ui/src/components/IssueRow.test.tsx ui/src/pages/IssueDetail.test.tsx` - `pnpm exec vitest run --project @paperclipai/adapter-utils packages/adapter-utils/src/server-utils.test.ts` - `PAPERCLIP_E2E_SKIP_LLM=true npx playwright test --config tests/e2e/playwright.config.ts tests/e2e/planning-mode-visual-verification.spec.ts` ## Screenshots Desktop planning detail:  Desktop planning row:  Desktop staged standard toggle:  Mobile planning detail:  Mobile planning row:  ## Risks - Medium migration risk: this adds a non-null issue column. The migration uses `ADD COLUMN IF NOT EXISTS` so installations that applied an older branch-local migration number can still apply the final numbered migration safely. - Medium contract risk: issue payloads, plugin payloads, and adapter heartbeat payloads now include work mode; compatibility is handled by defaulting missing values to `standard`. - UI risk is moderate because composer controls changed; focused component tests and visual e2e coverage exercise standard vs planning display and toggle behavior. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent in a local Paperclip worktree, with shell/tool use. Exact context-window size is not exposed in this runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing> |
||
|
|
9578dc3da7 |
Wire per-adapter sandbox install commands through test and execute paths (#5280)
> **Stacked PR.** Sits on top of the e2b sandbox chain — #5278 (stdin staging) and #5279 (honest-resolvability + login-profiles). The cumulative diff against `master` includes both of those PRs' content; the files touched by *this* PR's commit are the new `maybeRunSandboxInstallCommand` helper in `packages/adapter-utils/src/execution-target.ts` and the per-adapter `index.ts`/`server/test.ts`/`server/execute.ts` wiring under `packages/adapters/{claude,codex,cursor,gemini,opencode,pi}-local/`. The honest resolvability check from #5279 is what gives this PR's install command a meaningful "did it actually land on PATH" follow-up. ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Sandbox execution targets are ephemeral — each fresh lease starts from a template image that may or may not have the agent CLIs preinstalled > - When a CLI isn't preinstalled, the resolvability probe fails at `command -v` and the hello probe never runs > - There's no shared mechanism for "before you probe or provision, install the CLI on this sandbox" > - This pull request adds a `SANDBOX_INSTALL_COMMAND` constant per adapter and a `maybeRunSandboxInstallCommand` helper that runs it via the existing sandbox login shell, captures structured output, and never throws (so the resolvability + hello probe still run after); each adapter's `test()` and `execute()` share the constant so the two callsites can't drift > - The benefit is a fresh sandbox lease without a preinstalled CLI now installs it once via `sh -lc` before the resolvability probe and before managed-runtime provisioning, with a uniform `<adapter>_install_command_run` check on the test report ## What Changed - `packages/adapter-utils/src/execution-target.ts`: add `AdapterSandboxInstallCommandCheck` and `maybeRunSandboxInstallCommand` (runs the install via existing sandbox shell, captures exit/stdout/stderr, returns a structured info/warn check, never throws) - Add `SANDBOX_INSTALL_COMMAND` to each adapter's `index.ts` so `test()` and `execute()` share a single source of truth - Wire each of the 6 affected adapter `testEnvironment()`s to call `maybeRunSandboxInstallCommand` before `ensureAdapterExecutionTargetCommandResolvable` - Pass `installCommand: SANDBOX_INSTALL_COMMAND` through `prepareAdapterExecutionTargetRuntime` in each adapter's `execute()` - Per-adapter install commands use npm globals where possible so binaries land on a PATH segment the template already exports: - claude → `npm install -g @anthropic-ai/claude-code` - codex → `npm install -g @openai/codex` - cursor → `curl https://cursor.com/install -fsS | bash` - gemini → `npm install -g @google/gemini-cli` - opencode → `npm install -g opencode-ai` - pi → `npm install -g @mariozechner/pi-coding-agent` SSH and local targets ignore `installCommand` (SSH runtime takes no such param; local short-circuits before runtime prep), so this is a no-op for non-sandbox environments. ## Verification - `pnpm typecheck` clean - `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils` and per-adapter projects pass - Manual sandbox matrix (claude, codex, cursor, gemini, opencode, pi) — each goes `install_command_run → resolvable → hello_probe_passed` (Codex and Pi land on `hello_probe_auth_required`, which is the configured-credentials problem, not an install issue) - SSH no-regression: SSH Claude still passes; the helper short-circuits on non-sandbox targets ## Risks Medium — adds a network/CPU cost (npm install / curl) on every fresh sandbox lease. Cost is bounded (one-time per lease, typically tens of seconds for npm globals), and the helper never throws so a failing install still lets the report run resolvability and hello probes. If a sandbox image already has the CLI, the install is an idempotent reinstall. ## Model Used Claude Opus 4.7 (1M context) ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots — N/A (no UI) - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
90631b09b3 |
Let adapters declare runtime command spec for remote provisioning (#5141)
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies, running
adapter
> commands like `claude`, `codex`, `pi` either locally or on remote
runtimes
> (SSH hosts, sandboxes, etc.)
> - On a fresh remote runtime — particularly an ephemeral sandbox — the
> adapter's CLI may not be installed yet. Today operators handle this
via
> external configuration (e.g. a project-level `provisionCommand` shell
> script) that has to know about every adapter the operator might want
to use
> - This means every adapter has its own well-known npm package, but
operators
> end up writing duplicate provision shell scripts that paste together
> `npm install -g @anthropic-ai/claude-code`, `npm install -g
@openai/codex`,
> etc. — knowledge the adapter itself already has
> - This PR moves that knowledge into the adapter modules: each adapter
declares
> how its runtime command should be detected and (if applicable)
installed
> via `getRuntimeCommandSpec(config)`. The execution path runs the
adapter's
> own install command on remote sandbox targets before launching, so a
fresh
> sandbox bootstraps itself instead of requiring a hand-written
provision script
> - The benefit is fewer footguns for operators provisioning remote
runtimes,
> and a clean place for new adapters to plug in their install recipe
## What Changed
- New types in `packages/adapter-utils/src/types.ts`:
- `AdapterRuntimeCommandSpec` describing `command`, optional
`detectCommand`, and optional `installCommand`
- Optional `getRuntimeCommandSpec(config)` on `ServerAdapterModule`
- Optional `runtimeCommandSpec` on `AdapterExecutionContext` so adapters
receive the resolved spec at execute time
- New helper `ensureAdapterExecutionTargetRuntimeCommandInstalled(...)`
in
`packages/adapter-utils/src/execution-target.ts` that runs the install
command
on remote targets when `transport === "sandbox"`. SSH and local targets
are
no-ops. Throws on timeout or non-zero exit so failures surface early.
- Each of `claude-local`, `codex-local`, `cursor-local`, `gemini-local`,
`opencode-local`, `pi-local`'s `execute.ts` now reads
`ctx.runtimeCommandSpec?.installCommand` and calls the helper before
launching
the adapter command.
- `server/src/adapters/registry.ts` declares `getRuntimeCommandSpec` for
each
adapter:
- claude/codex/gemini/opencode/pi-local: `npm install -g <package>`
recipe via
a shared `buildNpmRuntimeCommandSpec` helper, with a defensive guard
that
only auto-installs when the configured `command` matches the well-known
fallback (custom binaries are left alone).
- cursor-local: declares `command` only; no auto-install (no public npm
package), preserving the existing manual setup.
- `server/src/services/heartbeat.ts` resolves the spec via
`adapter.getRuntimeCommandSpec?.(runtimeConfig)` and passes it through
to
`AdapterExecutionContext`.
- Tests added in `execution-target.test.ts` (~75 lines), e2b
`plugin.test.ts` (~32 lines), and `environment-run-orchestrator.test.ts`
(~76 lines).
## Verification
- `pnpm --filter @paperclipai/adapter-utils test`
- `pnpm --filter @paperclipai/server test --
environment-run-orchestrator`
- `pnpm --filter @paperclipai/sandbox-providers-e2b test`
- Manual QA: run an adapter (claude/codex/etc.) against a fresh
sandbox-backed
environment that does NOT have the adapter CLI pre-installed. Confirm
the
install runs once at the start of the agent run and the adapter then
launches
successfully. Re-run on the same sandbox; confirm the install command is
idempotent and the second run starts faster.
- Confirm SSH and local execution paths are unaffected (gated by
`transport === "sandbox"`).
## Risks
- Behavioural shift on sandbox runs: a new install step now runs at the
start
of every sandbox agent run for adapters with `installCommand` set. The
install commands are idempotent (`if ! command -v X >/dev/null 2>&1;
then
npm install -g <pkg>; fi`), so this is fast on warm sandboxes. On a cold
sandbox, the first run takes longer.
- Operators who used the legacy project-level `provisionCommand` to
install
adapter CLIs can drop that part of their script; the adapter handles it
now.
Existing scripts continue to work — installs are idempotent.
- The cursor-local adapter has no auto-install (no public npm package).
Behaviour for cursor-local on sandboxes is unchanged.
- New optional surface on `ServerAdapterModule`. Plugins that don't
implement
`getRuntimeCommandSpec` retain previous behaviour (no auto-install).
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
|
||
|
|
856c6cb192 |
Fix remote workspace environment shaping (#5118)
> **Stacked PR (part 5 of 7).** Depends on: - PR #5114 - PR #5115 - PR #5116 - PR #5117 > Diff against `master` includes commits from earlier PRs in the stack — the new commit in this PR is the topmost one. ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents run with a Paperclip-shaped environment (`PAPERCLIP_WORKSPACE_CWD`, > worktree path, `PAPERCLIP_WORKSPACES_JSON` hints) so the CLI can locate the > correct project tree > - SSH testing reproduced a real failure: a Codex SSH run wrote to > `/tmp/paperclip-env-matrix-...` (the *host* path) instead of the realized > remote workspace at `/home/<user>/paperclip-env-matrix-ssh-claude/...` > because the adapter injected `PAPERCLIP_WORKSPACE_CWD=/tmp/...` into the > remote env > - Code review on the initial codex-only fix asked to roll the same approach > into every other SSH-capable adapter (claude, acpx, cursor, opencode, gemini, > pi) via a shared helper rather than duplicating per-adapter > - This PR adds `shapePaperclipWorkspaceEnvForExecution` in adapter-utils that, > when the execution target is remote: replaces local cwd with the realized > execution cwd, nulls out worktree path (which has no remote meaning), and > rewrites/strips `cwd` entries in workspace hints based on what was actually > synced. Every adapter calls it before invoking the remote runner > - The benefit is that remote runs see the realized remote workspace, host-local > paths stop leaking into remote env, and the rule is unit-tested in one place ## What Changed - Added `shapePaperclipWorkspaceEnvForExecution` to `packages/adapter-utils/src/server-utils.ts` with full unit coverage (`server-utils.test.ts`) - Each of acpx-local, claude-local, codex-local, cursor-local, gemini-local, opencode-local, pi-local now calls the new shaper before issuing the remote command and feeds the shaped values into `applyPaperclipWorkspaceEnv` - Per-adapter `execute.remote.test.ts` files extended to cover the new shaping behaviour: localhost paths replaced with remote cwd, foreign-cwd hints stripped, worktree path nulled out for remote targets - `acpx-local/src/server/execute.test.ts` extended with shaping coverage ## Verification - `pnpm test -- server-utils execute.remote` - `pnpm --filter @paperclipai/adapter-acpx-local test` - Manual QA reproducing the original failure: 1. Provision an E2B sandbox environment for the Paperclip QA company 2. Assign an issue to a remote-targeted claude-local agent and confirm the run starts in the correct remote cwd (no `/Users/...` path leakage in the run logs) 3. Repeat for opencode-local and pi-local ## Risks - Behavioural shift: hints whose `cwd` doesn't match the workspace cwd are now stripped on remote targets. If any adapter relied on a leaked local hint cwd, it will see a missing `cwd` instead. Reviewed all current callers — none do. - Adds a small per-run cost (path resolve + string normalisation) on every remote execution. Negligible. - Worktree path is now nulled out on remote (it has no meaning there). Adapters that previously read the value defensively will continue to work. ## Model Used - OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI - Provider: OpenAI - Used to author the code changes in this PR ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots — N/A - [ ] I have updated relevant documentation to reflect my changes — N/A - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
076067865f |
Migrate SSH environment callback to bridge (#5116)
> **Stacked PR (part 3 of 7).** Depends on: - PR #5114 - PR #5115 > Diff against `master` includes commits from earlier PRs in the stack — the new commit in this PR is the topmost one. ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents executing on a remote SSH-backed environment need a way to call back into > the Paperclip control plane (run events, log streaming, signals) > - When the SSH host can't reach the Paperclip host (NAT, firewalls, or simply not > on the same network), the run silently fails or hangs — a recurring class of > failure during SSH testing > - In sandboxed environments we already solved this with a callback bridge that > tunnels back through the existing connection; SSH was the odd one out > - This PR migrates SSH execution to use the same callback bridge, so every > adapter's remote run uses one consistent reverse-channel. Per-adapter SSH glue > is deleted in favour of a shared `CommandManagedRuntimeRunner` built from the > SSH spec > - The benefit is fewer SSH-specific failure modes, a smaller code surface, and > one place to evolve the callback contract going forward ## What Changed - Added `createSshCommandManagedRuntimeRunner` in `packages/adapter-utils/src/ssh.ts` that adapts an SSH spec into a generic command-managed-runtime runner (with cwd, env, and timeout handling) - Removed `paperclipApiUrl` from `SshRemoteExecutionSpec`; the bridge URL now flows through the shared runner - Reworked `execution-target.ts` to use the SSH runner alongside sandbox runners via a unified `CommandManagedRuntimeRunner` interface - Simplified `remote-managed-runtime.ts` and `sandbox-managed-runtime.ts` to consume the shared runner abstraction - Deleted per-adapter SSH callback wiring from claude-local, codex-local, cursor-local, gemini-local, opencode-local, pi-local execute.ts files - Removed `environment-runtime-driver-contract.test.ts` (the contract is now enforced by `environment-execution-target.test.ts`) - Added/updated `execute.remote.test.ts` cases for each adapter to cover the SSH runner path ## Verification - `pnpm --filter @paperclipai/adapter-utils test` - `pnpm test -- execute.remote` (covers all six local adapters' SSH paths) - Manual QA: ran a claude-local agent against an SSH-backed environment, confirmed the agent successfully called back to `/api/agent-callback/*` endpoints during the run ## Risks - Refactor touches all six local adapters. If any adapter had subtle SSH-specific behaviour that wasn't captured in tests, it could regress. Mitigation: each adapter's `execute.remote.test.ts` was extended. - `paperclipApiUrl` removal from `SshRemoteExecutionSpec` is a breaking type change for any internal consumer. Verified no external plugins consume this type. - The new `CommandManagedRuntimeRunner` shape is a public surface in `@paperclipai/adapter-utils`; downstream plugins implementing custom runners may need updates, but no such plugins exist in this repo. ## Model Used - OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI - Provider: OpenAI - Used to author the code changes in this PR ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots — N/A - [ ] I have updated relevant documentation to reflect my changes — N/A - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
a3de1d764d |
Add cheap model profiles for local adapters (#4881)
## Thinking Path > - Paperclip is a control plane for autonomous AI companies, where adapters are the boundary between the board, agents, and execution runtimes. > - Local adapters currently expose a primary runtime configuration, but operators often need a cheaper model lane for routine or low-risk work. > - That cheap lane has to stay adapter-owned: runtime profile settings should not mutate the primary adapter config or bypass existing auth/secret mediation. > - Issue creation also needs an ergonomic way to request primary, cheap, or custom model behavior for a selected assignee. > - This pull request adds a first-class `cheap` model profile contract across adapter capabilities, heartbeat config resolution, agent configuration, and issue creation. > - The benefit is cheaper task execution can be configured and requested explicitly while preserving adapter boundaries, secret handling, and audit visibility. ## What Changed - Added adapter model-profile capability metadata and a `cheap` profile contract for supported local adapters. - Applied `runtimeConfig.modelProfiles.cheap.adapterConfig` during heartbeat config resolution, including requested/applied/fallback run metadata. - Added agent configuration UI for cheap model profile settings without writing those settings into primary `adapterConfig`. - Added New Issue assignee model lane controls for Primary / Cheap / Custom and request payload handling. - Added run ledger profile badges and Storybook stories for the new cheap-lane UI states. - Added tests for validators, heartbeat model profile application, permission/secret mediation, UI payload helpers, and run ledger rendering. - Added committed UI verification screenshots under `docs/pr-screenshots/pap-2837/`. - Addressed Greptile review feedback around cheap-profile defaults, shared profile types, and fallback test data. ## Verification Local: - `pnpm exec vitest run packages/shared/src/validators/issue.test.ts server/src/__tests__/adapter-registry.test.ts server/src/__tests__/agent-permissions-routes.test.ts server/src/__tests__/heartbeat-model-profile.test.ts ui/src/components/IssueRunLedger.test.tsx ui/src/lib/agent-config-patch.test.ts ui/src/lib/issue-assignee-overrides.test.ts ui/src/lib/new-agent-runtime-config.test.ts` — passed, 8 files / 103 tests. - `pnpm exec vitest run ui/src/lib/new-agent-runtime-config.test.ts ui/src/components/IssueRunLedger.test.tsx` — passed after Greptile/rebase follow-up, 2 files / 17 tests. - `pnpm --filter @paperclipai/ui typecheck` — passed after Greptile/rebase follow-up. - `pnpm -r typecheck` — passed. - `pnpm build` — passed. - `pnpm test:run` — did not complete successfully in this local worktree: it stopped in pre-existing `@paperclipai/adapter-utils` sandbox/SSH fixture suites outside this PR diff. Failures were 5s local timeouts plus `git init -b` unsupported by this machine's Git 2.21.0. The branch-specific targeted suites above passed. - Branch was fetched/rebased onto `public-gh/master`; `git rev-list --left-right --count public-gh/master...HEAD` reports `0 9`. Remote PR checks on latest head `e30bf399146451c86cee98ed528d51d33fa5af5a`: - `policy` — passed. - `verify` — passed. - `e2e` — passed. - `Greptile Review` — passed, confidence score 5/5; Greptile review threads resolved. - `security/snyk (cryppadotta)` — passed. Screenshots: - [New issue cheap lane desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/newissue-cheap-desktop.png) - [New issue custom lane desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/newissue-custom-desktop.png) - [New issue unsupported adapter desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/newissue-unsupported-desktop.png) - [Run ledger model profile badges desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/runledger-profile-badges-desktop.png) - Mobile variants are also in `docs/pr-screenshots/pap-2837/`. ## Risks - Medium: heartbeat config mediation now merges runtime model profiles into adapter configs, so adapter secret normalization and host-command restrictions must keep covering nested config paths. - Medium: the UI adds another issue creation choice; unsupported adapters must keep hiding the cheap lane and preserve primary behavior. - Low migration risk: no database migration is included. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used OpenAI Codex coding agent using GPT-5-class reasoning with repo tool use and command execution. Exact served model/context window was not exposed by the runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [ ] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
1fe1067361 |
Polish board settings and skills workflow (#4863)
## Thinking Path > - Paperclip's board UI and bundled skills are the operator layer for configuring agents, routines, issue workflows, and local troubleshooting loops. > - The prior rollup mixed this operator polish with database backups, backend reliability, thread scale, and cost/workflow primitives. > - This pull request isolates the remaining board QoL, settings, issue-detail integration, adapter config cleanup, and skills smoke tooling. > - It includes some integration-level overlap with the thread and workflow slices so this branch can run from `origin/master` while still preserving the full original work. > - Preferred merge order is the narrower primitives first, then this integration PR last. > - The benefit is that reviewers can inspect the user-facing board/settings/skills layer separately from backend infrastructure changes. ## What Changed - Added board/settings polish for agents, routines, company settings, project workspace detail, and issue detail controls. - Added agent/routine UI regression tests and New Issue dialog coverage. - Integrated issue-detail activity/cost/interaction surfaces and leaf work pause/resume controls. - Cleaned bundled adapter UI config defaults and onboarding copy. - Added terminal-bench loop and work-stoppage diagnosis skills plus a smoke test script. - Updated attachment type handling and Paperclip skill/API guidance. ## Verification - `pnpm install --frozen-lockfile` - `pnpm exec vitest run ui/src/pages/Agents.test.tsx ui/src/pages/Routines.test.tsx ui/src/components/NewIssueDialog.test.tsx ui/src/pages/IssueDetail.test.tsx server/src/__tests__/costs-service.test.ts server/src/__tests__/issue-thread-interaction-routes.test.ts server/src/__tests__/issue-thread-interactions-service.test.ts` - Result: 7 test files passed, 54 tests passed. - `pnpm run smoke:terminal-bench-loop-skill` - Result: JSON output included `"ok": true` and `"cleanup": true`. - UI screenshots not included because verification is focused component/page coverage for the changed board surfaces. ## Risks - This is the integration-heavy PR in the split and intentionally overlaps some component/API primitives with the issue-thread and workflow PRs so it can run from `origin/master`. - Preferred merge order: #4859, #4860, #4861, #4862, then this PR last. If earlier branches merge first, this PR may need a straightforward conflict refresh in shared UI files. - The terminal-bench smoke script creates temporary mock issues and relies on cleanup; the verified run returned `cleanup: true`. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5.5, code execution and GitHub CLI tool use, medium reasoning effort. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing> |
||
|
|
a4ac6ff133 |
Add sandbox callback bridge for remote environment API access (#4801)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents can run inside sandboxed environments like E2B, which are isolated from the host network > - Sandboxed agents need to call back to the Paperclip API to report progress, post comments, and update issue status > - But sandbox environments cannot reach the Paperclip server directly because they run in isolated network namespaces > - This PR adds a callback bridge that proxies API requests from the sandbox to the Paperclip server, running as a local HTTP server on the host that forwards authenticated requests > - The bridge is started automatically when an adapter launches a sandbox execution, and torn down when the run completes > - The benefit is sandboxed agents can interact with the Paperclip API without requiring network-level access to the host, enabling E2B and similar providers to work end-to-end ## What Changed - Added `sandbox-callback-bridge.ts` in `packages/adapter-utils/` — a lightweight HTTP bridge server that accepts requests from sandbox environments and proxies them to the Paperclip API with authentication - Added request validation and security policy: the bridge only forwards requests to the configured API URL, validates content types, enforces size limits, and rejects non-API paths - Wired the bridge into all remote adapter execute paths (claude, codex, cursor, gemini, pi) — the bridge starts before the agent process and the bridge URL is passed via environment variables - Updated `environment-execution-target.ts` to prefer the explicit API URL from environment lease metadata for sandbox callback routing - Fixed Claude sandbox runtime setup to work with the bridge configuration - Added comprehensive test coverage for bridge request handling, policy enforcement, and sandbox execution integration - Fixed browser bundling — the bridge module is excluded from the frontend bundle via the adapter-utils index export ## Verification - `pnpm test` — all existing and new tests pass, including bridge unit tests and sandbox execution integration tests - `pnpm typecheck` — clean - Manual: configure an E2B environment, run an agent task, verify the agent can post comments and update issue status through the bridge ## Risks - Medium. This is a new network-facing component (HTTP server on localhost). The security policy restricts forwarding to the configured API URL only and validates all requests, but any proxy introduces attack surface. The bridge binds to localhost only and is scoped to the lifetime of a single agent run. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
f9cf1d2f6a |
Add cursor sandbox support and fix SSH workspace sync (#4803)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents can run inside sandboxed environments like E2B, or on remote hosts via SSH > - The cursor adapter needs to resolve `cursor-agent` inside sandbox environments where it's installed in `~/.local/bin` > - But when using the default `agent` command on a sandbox target, the adapter didn't know to look in `~/.local/bin/cursor-agent`, causing "command not found" failures > - Additionally, repeated SSH runs failed because `git checkout` during workspace sync conflicted with leftover `.paperclip-runtime` files from previous runs > - This PR adds sandbox-aware command resolution for cursor and fixes the SSH workspace sync conflict > - The benefit is cursor works in E2B sandboxes out of the box, and repeated SSH runs don't fail on workspace sync ## What Changed - `cursor-local`: Added `prepareCursorSandboxCommand` — on sandbox targets, reads the remote `$HOME`, prepends `~/.local/bin` to PATH, and prefers `~/.local/bin/cursor-agent` when the default command is requested; tightened the sandbox command probe to validate the binary exists before launching; preserves explicit custom command overrides - `adapter-utils/ssh.ts`: Added `--force` to git checkout in SSH workspace sync to handle `.paperclip-runtime` untracked file conflicts from previous runs ## Verification - `pnpm test` — all existing and new tests pass, including cursor sandbox probe, sandbox execution, and custom command override tests - `pnpm typecheck` — clean - Manual: configure an E2B environment, run a cursor-local task, verify it resolves cursor-agent from the sandbox install path ## Risks - Low-medium. The `--force` flag on git checkout could discard uncommitted changes in the remote workspace, but the workspace is managed by Paperclip and should not contain user edits. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
9b99d30330 |
Add dedicated environment settings page and test-in-environment (#4798)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents run inside environments (local, SSH, E2B sandbox) > - Operators need to configure and manage these environments > - But environment settings were buried inside the general company settings page, making them hard to find > - Additionally, when testing an agent from the configuration form, the test always ran locally regardless of which environment was selected > - This PR moves environments into a dedicated top-level company settings section and wires the "Test Environment" button to run inside the selected environment > - The benefit is operators can find and manage environments more easily, and the test button now validates the actual environment the agent will use ## What Changed - Added a dedicated `CompanyEnvironments` settings page with its own route and sidebar entry - Updated `CompanySettingsSidebar` and `CompanySettingsNav` to include the new environments section - Modified the agent test route (`POST /agents/:id/test`) to accept an optional `environmentId` parameter - Updated all adapter `test.ts` handlers to resolve and use the specified execution target environment - Added `resolveTestExecutionTarget` to `execution-target.ts` for remote environment test resolution with cwd fallback - Moved the "Test Environment" button and its feedback display into the `NewAgent` page footer for better UX flow ## Verification - `pnpm test` — all existing and new tests pass - `pnpm typecheck` — clean - Manual: navigate to Company Settings, confirm "Environments" appears as a top-level section - Manual: configure an agent with a non-local environment, click "Test Environment", confirm the test runs inside that environment ## Risks - Low risk. UI-only routing change for the settings page. The test-in-environment change adds an optional parameter with a local fallback, so existing behavior is preserved when no environment is specified. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
d47ffa87f0 |
Fix CEO AGENT_HOME paths and centralize workspace env propagation (#4551)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies. > - The local adapter layer is responsible for turning Paperclip runtime context into the environment seen by the child agent process. > - The CEO onboarding bundle tells the agent where to read and write its persistent memory and fact files. > - That bundle was using `./memory/...` and `./life/...`, which only works when the process cwd happens to equal the agent home directory. > - At the same time, six local adapters each duplicated the same workspace-env propagation logic, including `AGENT_HOME`, which makes this contract easy to drift. > - This pull request fixes the CEO instructions to use `$AGENT_HOME/...` and centralizes workspace-env propagation in one shared helper with shared tests. > - The benefit is a real bug fix for agent memory paths plus a single tested contract that makes future built-in adapter work less likely to forget `AGENT_HOME`. ## What Changed - Updated `server/src/onboarding-assets/ceo/HEARTBEAT.md` to use `$AGENT_HOME/memory/...` and `$AGENT_HOME/life/...` instead of cwd-relative `./memory/...` and `./life/...`. - Added `applyPaperclipWorkspaceEnv(...)` in `packages/adapter-utils/src/server-utils.ts` to centralize `PAPERCLIP_WORKSPACE_*` and `AGENT_HOME` propagation. - Added shared helper coverage in `packages/adapter-utils/src/server-utils.test.ts` for both populated and skip-empty cases. - Switched the built-in local adapters (`claude_local`, `codex_local`, `cursor_local`, `gemini_local`, `opencode_local`, `pi_local`) over to the shared helper instead of inline env assignment blocks. ## Verification - `pnpm install` - `pnpm exec vitest run packages/adapter-utils/src/server-utils.test.ts packages/adapters/claude-local/src/server/execute.remote.test.ts packages/adapters/codex-local/src/server/execute.remote.test.ts packages/adapters/cursor-local/src/server/execute.remote.test.ts packages/adapters/gemini-local/src/server/execute.remote.test.ts packages/adapters/opencode-local/src/server/execute.remote.test.ts packages/adapters/pi-local/src/server/execute.remote.test.ts` - Result: 7 test files passed, 31 tests passed, 0 failures. ## Risks - Low risk. - The only behavioral surface is the shared env propagation refactor across six adapters; if the helper diverged from prior semantics, an adapter could miss a workspace env var. - The shared helper test plus the affected adapter execute tests reduce that risk, and the helper preserves the prior "set only non-empty strings" behavior. ## Model Used - OpenAI Codex via Paperclip `codex_local` agent runtime; tool-assisted coding workflow with shell execution, file patching, git operations, and API interaction. The exact backend model identifier and context window are not surfaced by this local runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
e4995bbb1c |
Add SSH environment support (#4358)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - The environments subsystem already models execution environments, but before this branch there was no end-to-end SSH-backed runtime path for agents to actually run work against a remote box > - That meant agents could be configured around environment concepts without a reliable way to execute adapter sessions remotely, sync workspace state, and preserve run context across supported adapters > - We also need environment selection to participate in normal Paperclip control-plane behavior: agent defaults, project/issue selection, route validation, and environment probing > - Because this capability is still experimental, the UI surface should be easy to hide and easy to remove later without undoing the underlying implementation > - This pull request adds SSH environment execution support across the runtime, adapters, routes, schema, and tests, then puts the visible environment-management UI behind an experimental flag > - The benefit is that we can validate real SSH-backed agent execution now while keeping the user-facing controls safely gated until the feature is ready to come out of experimentation ## What Changed - Added SSH-backed execution target support in the shared adapter runtime, including remote workspace preparation, skill/runtime asset sync, remote session handling, and workspace restore behavior after runs. - Added SSH execution coverage for supported local adapters, plus remote execution tests across Claude, Codex, Cursor, Gemini, OpenCode, and Pi. - Added environment selection and environment-management backend support needed for SSH execution, including route/service work, validation, probing, and agent default environment persistence. - Added CLI support for SSH environment lab verification and updated related docs/tests. - Added the `enableEnvironments` experimental flag and gated the environment UI behind it on company settings, agent configuration, and project configuration surfaces. ## Verification - `pnpm exec vitest run packages/adapters/claude-local/src/server/execute.remote.test.ts packages/adapters/cursor-local/src/server/execute.remote.test.ts packages/adapters/gemini-local/src/server/execute.remote.test.ts packages/adapters/opencode-local/src/server/execute.remote.test.ts packages/adapters/pi-local/src/server/execute.remote.test.ts` - `pnpm exec vitest run server/src/__tests__/environment-routes.test.ts` - `pnpm exec vitest run server/src/__tests__/instance-settings-routes.test.ts` - `pnpm exec vitest run ui/src/lib/new-agent-hire-payload.test.ts ui/src/lib/new-agent-runtime-config.test.ts` - `pnpm -r typecheck` - `pnpm build` - Manual verification on a branch-local dev server: - enabled the experimental flag - created an SSH environment - created a Linux Claude agent using that environment - confirmed a run executed on the Linux box and synced workspace changes back ## Risks - Medium: this touches runtime execution flow across multiple adapters, so regressions would likely show up in remote session setup, workspace sync, or environment selection precedence. - The UI flag reduces exposure, but the underlying runtime and route changes are still substantial and rely on migration correctness. - The change set is broad across adapters, control-plane services, migrations, and UI gating, so review should pay close attention to environment-selection precedence and remote workspace lifecycle behavior. ## Model Used - OpenAI Codex via Paperclip's local Codex adapter, GPT-5-class coding model with tool use and code execution in the local repo workspace. The local adapter does not surface a more specific public model version string in this branch workflow. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
236d11d36f |
[codex] Add run liveness continuations (#4083)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies. > - Heartbeat runs are the control-plane record of each agent execution window. > - Long-running local agents can exhaust context or stop while still holding useful next-step state. > - Operators need that stop reason, next action, and continuation path to be durable and visible. > - This pull request adds run liveness metadata, continuation summaries, and UI surfaces for issue run ledgers. > - The benefit is that interrupted or long-running work can resume with clearer context instead of losing the agent's last useful handoff. ## What Changed - Added heartbeat-run liveness fields, continuation attempt tracking, and an idempotent `0058` migration. - Added server services and tests for run liveness, continuation summaries, stop metadata, and activity backfill. - Wired local and HTTP adapters to surface continuation/liveness context through shared adapter utilities. - Added shared constants, validators, and heartbeat types for liveness continuation state. - Added issue-detail UI surfaces for continuation handoffs and the run ledger, with component tests. - Updated agent runtime docs, heartbeat protocol docs, prompt guidance, onboarding assets, and skills instructions to explain continuation behavior. - Addressed Greptile feedback by scoping document evidence by run, excluding system continuation-summary documents from liveness evidence, importing shared liveness types, surfacing hidden ledger run counts, documenting bounded retry behavior, and moving run-ledger liveness backfill off the request path. ## Verification - `pnpm exec vitest run packages/adapter-utils/src/server-utils.test.ts server/src/__tests__/run-continuations.test.ts server/src/__tests__/run-liveness.test.ts server/src/__tests__/activity-service.test.ts server/src/__tests__/documents-service.test.ts server/src/__tests__/issue-continuation-summary.test.ts server/src/services/heartbeat-stop-metadata.test.ts ui/src/components/IssueRunLedger.test.tsx ui/src/components/IssueContinuationHandoff.test.tsx ui/src/components/IssueDocumentsSection.test.tsx` - `pnpm --filter @paperclipai/db build` - `pnpm exec vitest run server/src/__tests__/activity-service.test.ts ui/src/components/IssueRunLedger.test.tsx` - `pnpm --filter @paperclipai/ui typecheck` - `pnpm --filter @paperclipai/server typecheck` - `pnpm exec vitest run server/src/__tests__/activity-service.test.ts server/src/__tests__/run-continuations.test.ts ui/src/components/IssueRunLedger.test.tsx` - `pnpm exec vitest run server/src/__tests__/heartbeat-process-recovery.test.ts -t "treats a plan document update"` - `pnpm exec vitest run server/src/__tests__/activity-service.test.ts server/src/__tests__/heartbeat-process-recovery.test.ts -t "activity service|treats a plan document update"` - Remote PR checks on head `e53b1a1d`: `verify`, `e2e`, `policy`, and Snyk all passed. - Confirmed `public-gh/master` is an ancestor of this branch after fetching `public-gh master`. - Confirmed `pnpm-lock.yaml` is not included in the branch diff. - Confirmed migration `0058_wealthy_starbolt.sql` is ordered after `0057` and uses `IF NOT EXISTS` guards for repeat application. - Greptile inline review threads are resolved. ## Risks - Medium risk: this touches heartbeat execution, liveness recovery, activity rendering, issue routes, shared contracts, docs, and UI. - Migration risk is mitigated by additive columns/indexes and idempotent guards. - Run-ledger liveness backfill is now asynchronous, so the first ledger response can briefly show historical missing liveness until the background backfill completes. - UI screenshot coverage is not included in this packaging pass; validation is currently through focused component tests. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5.4, local tool-use coding agent with terminal, git, GitHub connector, GitHub CLI, and Paperclip API access. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge Screenshot note: no before/after screenshots were captured in this PR packaging pass; the UI changes are covered by focused component tests listed above. --------- Co-authored-by: Paperclip <noreply@paperclip.ing> |
||
|
|
b9b2bf3b5b | Trim resumed comment wake prompts | ||
|
|
91e040a696 |
Batch inline comment wake payloads
Co-Authored-By: Paperclip <noreply@paperclip.ing> |
||
|
|
cadfcd1bc6 |
Log resolved adapter command in run metadata
Co-Authored-By: Paperclip <noreply@paperclip.ing> |
||
|
|
bd60ea4909 |
refactor: use async fs.readFile in readCursorAuthInfo for consistency
Match the async pattern used by readCodexAuthInfo in the Codex adapter. |
||
|
|
083d7c9ac4 |
fix(cursor): check native auth before warning about missing API key
When CURSOR_API_KEY is not set, check ~/.cursor/cli-config.json for authInfo from `agent login` before emitting the missing key warning. Users authenticated via native login no longer see a false warning. |
||
|
|
67841a0c6d |
Remove noisy "Loaded agent instructions file" log from all adapters
Loading an instructions file is normal, expected behavior — not worth logging to stdout/stderr on every run. Warning logs for failed reads are preserved. Co-Authored-By: Paperclip <noreply@paperclip.ing> |
||
|
|
d07d86f778 | Merge public-gh/master into paperclip-company-import-export | ||
|
|
c844ca1a40 |
Improve orphaned local heartbeat recovery
Persist child-process metadata for local adapter runs, keep detached runs alive when their pid still exists, queue a single automatic retry when the pid is confirmed dead, and clear detached warnings when the original run reports activity again. Co-Authored-By: Paperclip <noreply@paperclip.ing> |
||
|
|
58a3cbd654 |
Route non-fatal adapter notices to stdout
Co-Authored-By: Paperclip <noreply@paperclip.ing> |
||
|
|
cfc53bf96b |
Add unmanaged skill provenance to agent skills
Expose adapter-discovered user-installed skills with provenance metadata, share persistent skill snapshot classification across local adapters, and render unmanaged skills as a read-only section in the agent skills UI. Co-Authored-By: Paperclip <noreply@paperclip.ing> |
||
|
|
0036f0e9a1 | fix: add npm provenance package metadata | ||
|
|
6ba9aea8ba | Add publish metadata for npm provenance | ||
|
|
5890b318c4 |
Namespace company skill identities
Persist canonical namespaced skill keys, split adapter runtime names from skill keys, and update portability/import flows to carry the canonical identity end-to-end. Co-Authored-By: Paperclip <noreply@paperclip.ing> |
||
|
|
cca086b863 | Merge public-gh/master into paperclip-company-import-export | ||
|
|
76e6cc08a6 | feat(costs): add billing, quota, and budget control plane | ||
|
|
7675fd0856 | Fix runtime skill injection across adapters | ||
|
|
2c35be0212 | Merge public-gh/master into paperclip-company-import-export | ||
|
|
e619e64433 | Add skill sync for remaining local adapters | ||
|
|
325fcf8505 |
Merge pull request #864 from paperclipai/fix/agent-home-env
fix: set AGENT_HOME env var for agent processes |
||
|
|
dcd8a47d4f |
Merge pull request #713 from paperclipai/release/0.3.1
Release/0.3.1 |
||
|
|
d671a59306 |
fix: set AGENT_HOME env var for agent processes
The $AGENT_HOME environment variable was referenced by skills (e.g. para-memory-files) but never actually set, causing runtime errors like "/HEARTBEAT.md: No such file or directory" when agents tried to resolve paths relative to their home directory. Add agentHome to the paperclipWorkspace context in the heartbeat service and propagate it as the AGENT_HOME env var in all local adapters. Co-Authored-By: Paperclip <noreply@paperclip.ing> |
||
|
|
7d1748b3a7 |
feat: optimize heartbeat token usage
Co-Authored-By: Paperclip <noreply@paperclip.ing> |
||
|
|
29a743cb9e |
fix: keep runtime skills scoped to ./skills
Co-Authored-By: Paperclip <noreply@paperclip.ing> |
||
|
|
4e759da070 | fix: prefer .agents skills and repair codex symlink targets\n\nCo-Authored-By: Paperclip <noreply@paperclip.ing> | ||
|
|
63c62e3ada | chore: release v0.3.1 | ||
|
|
87b8e21701 |
Humanize run transcripts across run detail and live surfaces
Co-Authored-By: Paperclip <noreply@paperclip.ing> |
||
|
|
7e8908afa2 | chore: release v0.3.0 | ||
|
|
5dd1e6335a | Fix root TypeScript solution config | ||
|
|
048e2b1bfe | Remove legacy OpenClaw adapter and keep gateway-only flow | ||
|
|
5fae7d4de7 | Fix CI typecheck and default OpenClaw sessions to issue scope | ||
|
|
732ae4e46c |
feat(cursor): compact shell tool calls and format results in run log
Show only the command for shellToolCall/shell inputs instead of the full payload. Format shell results with exit code + truncated stdout/stderr sections for readability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
69b2875060 |
cursor adapter: use --yolo instead of --trust
The --yolo flag bypasses interactive prompts more broadly than --trust. Updated execute, test probe, docs, and test expectations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |