paperclip

Author	SHA1	Message	Date
Devin Foley	5c2f9aba9d	Run explicit-environment adapter tests on the requested target instead of falling back to the host (#5277 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - When a user clicks "Test" on a configured environment (SSH or sandbox), the agent-test route exercises the adapter against that target > - The route previously fell back to running the probe on the Paperclip host whenever an explicit environment target couldn't be resolved, with the test report still saying "passed" > - That hid two real failure modes: misconfigured environments looked green, and sandbox environments were never actually exercised > - This pull request acquires an ad-hoc lease and realizes a workspace for sandbox/plugin test environments, resolves a sandbox execution target wired to the environment runtime, and returns synthesized diagnostics instead of running a host probe when an explicit env target can't be resolved > - The benefit is the Test action surfaces the real environment state and never silently exercises the wrong machine ## What Changed - `server/routes/agents.ts`: acquire an ad-hoc lease and realize a workspace for sandbox/plugin test environments; resolve a sandbox execution target wired to the environment runtime - Return synthesized diagnostics (no host fallback) when an explicit env target can't be resolved - `server/services/environment-runtime.ts`: small adjustments to support the explicit-env-target case - Clarify test-route messages so they no longer claim a host fallback in explicit env flows - New `agent-test-environment-routes.test.ts` covers the guard and missing-environment path ## Verification - `pnpm vitest run --no-coverage server/src/__tests__/agent-test-environment-routes.test.ts` - `pnpm typecheck` clean - Manual: a deliberately misconfigured sandbox environment now reports diagnostics instead of a misleading host-pass ## Risks Medium — Test route behavior change. Explicit environments that previously appeared to pass via host fallback will now report their real state. This is the desired behavior, but operators should expect to see new failures for environments that were never actually working. ## Model Used Claude Opus 4.7 (1M context) ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable — new tests cover guard + missing-env paths - [x] If this change affects the UI, I have included before/after screenshots — N/A (no UI) - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-05-05 08:00:32 -07:00
Devin Foley	ea7f53fd7d	Handle Gemini CLI v0.38 stream-json wire format across parser, UI, and CLI formatter (#5273 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Each agent uses an adapter that drives a CLI (Claude, Gemini, Codex, etc.) > - The Gemini adapter parses a JSONL transcript stream the CLI emits to learn what the model said > - Gemini CLI v0.38 changed the transcript shape: assistant text now comes through `type=message` with `role`/`content` and terminal status comes through `type=status` / `type=stats` > - The existing parser was written against the older `type=assistant` / `type=result` shape, so post-v0.38 outputs left the parsed summary empty and downgraded the SSH hello probe to "unexpected output" > - This pull request updates every Gemini consumer (server parser, UI parser, CLI formatter) to accept the v0.38 shape while keeping the legacy shape working > - The benefit is the Gemini adapter handles current upstream output without losing backward compatibility, with explicit test coverage for both shapes ## What Changed - `packages/adapters/gemini-local/src/server/parse.ts` recognizes `type=message` events with role/content and stops downgrading them - `packages/adapters/gemini-local/src/ui/parse-stdout.ts` mirrors the parser changes for the live UI transcript - `packages/adapters/gemini-local/src/cli/format-event.ts` formats the new event shape correctly for CLI output - `parse.test.ts` and `parse-stdout.test.ts` add v0.38 coverage; `gemini-local-adapter.test.ts` and `execute.remote.test.ts` switch happy-path fixtures to the current real wire format and keep dedicated tests for the older schema ## Verification - `pnpm vitest run --no-coverage --project @paperclipai/adapter-gemini-local` — full suite passes including new v0.38 cases and preserved legacy cases - `pnpm typecheck` clean ## Risks Low risk — additive event handling. Legacy event shape path is preserved with its own tests, so existing fixtures continue to parse identically. ## Model Used Claude Opus 4.7 (1M context) ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots — N/A (no UI) - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-05-05 08:00:14 -07:00
Dotta	3c73ed26b5	Expand plugin host surface (#5205 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - The plugin system is the extension boundary for optional product capabilities > - Rich plugins need more than a worker entrypoint: they need scoped database storage, local project folders, managed agents/routines, host navigation, and reusable UI components > - The LLM Wiki work exposed those missing host surfaces while keeping plugin code outside the core control plane > - This pull request expands the core plugin host, SDK, server APIs, and UI bridge so plugins can declare and use those surfaces > - The benefit is that future plugins can integrate with Paperclip through documented, validated contracts instead of bespoke server or UI imports ## What Changed - Added plugin-managed database namespaces and migration tracking, including Drizzle schema/migration files and SQL validation for namespace isolation. - Added server support for plugin local folders, managed agents, managed routines, scoped plugin APIs, and plugin operation visibility. - Expanded shared plugin manifest/types/validators and SDK host/testing/UI exports for richer plugin surfaces. - Added reusable UI pieces for file trees, managed routines, resizable sidebars, route sidebars, and plugin bridge initialization. - Updated plugin docs and example plugins to use the expanded host and SDK surface. ## Verification - `pnpm install --frozen-lockfile` - `pnpm run preflight:workspace-links && pnpm exec vitest run packages/shared/src/validators/plugin.test.ts server/src/__tests__/plugin-database.test.ts server/src/__tests__/plugin-local-folders.test.ts server/src/__tests__/plugin-managed-agents.test.ts server/src/__tests__/plugin-managed-routines.test.ts server/src/__tests__/plugin-orchestration-apis.test.ts ui/src/api/plugins.test.ts ui/src/components/FileTree.test.tsx ui/src/components/ResizableSidebarPane.test.tsx ui/src/pages/PluginPage.test.tsx ui/src/plugins/bridge.test.ts` passed: 11 files, 67 tests. - Confirmed this PR changes 89 files and does not include `pnpm-lock.yaml` or `.github/workflows/*`. ## Risks - Medium: this expands plugin host contracts across db/shared/server/ui and includes a new core migration (`0076_useful_elektra.sql`). - The plugin database namespace validator is intentionally restrictive; plugin authors may need follow-up affordances for SQL patterns that remain blocked. - Merge this before the LLM Wiki plugin PR so the plugin can resolve the new SDK and host APIs. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent, tool-enabled shell/git/GitHub workflow. Context window size was not exposed by the runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-05-05 07:42:57 -05:00
Dotta	d6bee62f02	Fix Cloud tenant issue identifier routes (#5196 ) ## Summary - Allow Cloud tenant issue identifiers with alphanumeric prefixes, such as `PC1897-1`, to normalize as issue references. - Resolve those identifiers through issue detail/update routes, active run/live run polling, activity, costs, and `issueService.getById`. - Keep UI issue-link parsing aligned so tenant links normalize back to `/issues/<IDENTIFIER>`. ## Root Cause Cloud tenant issue prefixes include digits from the stack-id hash. The app-side route normalization still accepted only all-letter prefixes, so `/api/issues/PC1897-1` skipped identifier lookup and fell through as a non-UUID id. ## Verification - `pnpm exec vitest run packages/shared/src/issue-references.test.ts ui/src/lib/issue-reference.test.ts server/src/__tests__/issue-identifier-routes.test.ts server/src/__tests__/activity-routes.test.ts server/src/__tests__/costs-service.test.ts server/src/__tests__/agent-live-run-routes.test.ts server/src/__tests__/issues-service.test.ts` - `pnpm --filter @paperclipai/shared typecheck && pnpm --filter @paperclipai/server typecheck` - `git diff --check` Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-05-04 13:20:58 -05:00
Dotta	ae23e02526	Support Cloud tenant identity bootstrap Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-05-03 21:55:52 -05:00
Devin Foley	29401b231b	fix(ci): gate new release packages on npm bootstrap (#5146 ) ## Thinking Path > - Paperclip is a control plane for autonomous agent companies, so its release automation is part of the core operator trust boundary. > - The affected subsystem is npm/GitHub Actions release publishing for the public monorepo packages. > - The concrete failure was that a newly added package reached `master`, the canary workflow attempted its first publish, and npm trusted publishing was not yet bootstrapped for that package. > - That means the problem is not just one broken run; it is a missing pre-merge guard that lets release-ineligible packages land and only fail once `publish_canary` runs. > - This pull request makes release enrollment explicit, validates that enrollment in CI, and adds a PR-time bootstrap check against npm for changed release-enabled package manifests. > - The result is that we keep trusted publishing, avoid teaching CI to `npm adduser`, and move this class of failure from post-merge canary time to pre-merge review time. ## What Changed - Added `scripts/release-package-manifest.json` so release-managed public packages are explicitly enrolled instead of being inferred from every non-private workspace package. - Hardened `scripts/release-package-map.mjs` to validate the manifest before release workflows rewrite versions or assemble publish payloads. - Added `scripts/check-release-package-bootstrap.mjs` and wired it into `.github/workflows/pr.yml` so PRs that change a release-enabled package manifest fail if that package does not already exist on npm. - Added release-package manifest coverage tests to `scripts/release-package-map.test.mjs` and included them in `pnpm run test:release-registry`. - Wired manifest validation into `.github/workflows/release.yml` and documented the first-publish bootstrap policy in `doc/PUBLISHING.md` and `doc/RELEASE-AUTOMATION-SETUP.md`. ## Verification - `pnpm run test:release-registry` - `./scripts/release.sh canary --skip-verify --dry-run` - Confirmed the committed diff contains no obvious PII/secrets via targeted pattern scan before pushing. ## Risks - Low risk overall: this is CI/release-policy code, not product runtime logic. - The new PR bootstrap check depends on npm metadata availability, so a transient npm outage could block a PR that changes a release-enabled package manifest. - The manifest introduces a new source of truth that must stay aligned with public package additions, but that is intentional and now enforced. ## Model Used - OpenAI Codex via the `codex_local` Paperclip adapter; GPT-5-based coding agent with tool use, terminal execution, git, and GitHub CLI. Exact served model ID/context window are not exposed by the local runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-05-03 19:31:28 -07:00
Devin Foley	90631b09b3	Let adapters declare runtime command spec for remote provisioning (#5141 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies, running adapter > commands like `claude`, `codex`, `pi` either locally or on remote runtimes > (SSH hosts, sandboxes, etc.) > - On a fresh remote runtime — particularly an ephemeral sandbox — the > adapter's CLI may not be installed yet. Today operators handle this via > external configuration (e.g. a project-level `provisionCommand` shell > script) that has to know about every adapter the operator might want to use > - This means every adapter has its own well-known npm package, but operators > end up writing duplicate provision shell scripts that paste together > `npm install -g @anthropic-ai/claude-code`, `npm install -g @openai/codex`, > etc. — knowledge the adapter itself already has > - This PR moves that knowledge into the adapter modules: each adapter declares > how its runtime command should be detected and (if applicable) installed > via `getRuntimeCommandSpec(config)`. The execution path runs the adapter's > own install command on remote sandbox targets before launching, so a fresh > sandbox bootstraps itself instead of requiring a hand-written provision script > - The benefit is fewer footguns for operators provisioning remote runtimes, > and a clean place for new adapters to plug in their install recipe ## What Changed - New types in `packages/adapter-utils/src/types.ts`: - `AdapterRuntimeCommandSpec` describing `command`, optional `detectCommand`, and optional `installCommand` - Optional `getRuntimeCommandSpec(config)` on `ServerAdapterModule` - Optional `runtimeCommandSpec` on `AdapterExecutionContext` so adapters receive the resolved spec at execute time - New helper `ensureAdapterExecutionTargetRuntimeCommandInstalled(...)` in `packages/adapter-utils/src/execution-target.ts` that runs the install command on remote targets when `transport === "sandbox"`. SSH and local targets are no-ops. Throws on timeout or non-zero exit so failures surface early. - Each of `claude-local`, `codex-local`, `cursor-local`, `gemini-local`, `opencode-local`, `pi-local`'s `execute.ts` now reads `ctx.runtimeCommandSpec?.installCommand` and calls the helper before launching the adapter command. - `server/src/adapters/registry.ts` declares `getRuntimeCommandSpec` for each adapter: - claude/codex/gemini/opencode/pi-local: `npm install -g <package>` recipe via a shared `buildNpmRuntimeCommandSpec` helper, with a defensive guard that only auto-installs when the configured `command` matches the well-known fallback (custom binaries are left alone). - cursor-local: declares `command` only; no auto-install (no public npm package), preserving the existing manual setup. - `server/src/services/heartbeat.ts` resolves the spec via `adapter.getRuntimeCommandSpec?.(runtimeConfig)` and passes it through to `AdapterExecutionContext`. - Tests added in `execution-target.test.ts` (~75 lines), e2b `plugin.test.ts` (~32 lines), and `environment-run-orchestrator.test.ts` (~76 lines). ## Verification - `pnpm --filter @paperclipai/adapter-utils test` - `pnpm --filter @paperclipai/server test -- environment-run-orchestrator` - `pnpm --filter @paperclipai/sandbox-providers-e2b test` - Manual QA: run an adapter (claude/codex/etc.) against a fresh sandbox-backed environment that does NOT have the adapter CLI pre-installed. Confirm the install runs once at the start of the agent run and the adapter then launches successfully. Re-run on the same sandbox; confirm the install command is idempotent and the second run starts faster. - Confirm SSH and local execution paths are unaffected (gated by `transport === "sandbox"`). ## Risks - Behavioural shift on sandbox runs: a new install step now runs at the start of every sandbox agent run for adapters with `installCommand` set. The install commands are idempotent (`if ! command -v X >/dev/null 2>&1; then npm install -g <pkg>; fi`), so this is fast on warm sandboxes. On a cold sandbox, the first run takes longer. - Operators who used the legacy project-level `provisionCommand` to install adapter CLIs can drop that part of their script; the adapter handles it now. Existing scripts continue to work — installs are idempotent. - The cursor-local adapter has no auto-install (no public npm package). Behaviour for cursor-local on sandboxes is unchanged. - New optional surface on `ServerAdapterModule`. Plugins that don't implement `getRuntimeCommandSpec` retain previous behaviour (no auto-install). ## Model Used - OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI - Provider: OpenAI - Used to author the code changes in this PR ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots — N/A - [ ] I have updated relevant documentation to reflect my changes — N/A - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-05-03 18:35:36 -07:00
Devin Foley	0e51fa2b0d	Honor reuse-existing preference and assignee default environment in issue runs (#5139 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents run inside execution workspaces (a per-issue cwd + env), and an issue > can prefer to reuse an existing workspace or get a fresh one each time > - The heartbeat service was reading the existing workspace's config to derive > environment selection regardless of whether the issue actually wanted to reuse > it. So fresh-run issues were inheriting stale config from a workspace that was > about to be discarded > - Separately, when an issue is assigned to an agent, the issue's execution > workspace settings weren't picking up the agent's `defaultEnvironmentId`, > even though the agent's choice is the natural default for that issue > - This PR makes both selection paths honor the obvious source of truth: > workspace config flows only when the issue actually wants `reuse_existing`, > and the assignee agent's default environment is applied at assignment time if > nothing else is set on the issue > - The benefit is that re-running a flaky issue picks up the right environment > instead of inheriting the previous run's config, and assigning an agent to an > issue does the obvious thing without operator intervention ## What Changed - `server/src/services/heartbeat.ts`: introduce `reusableExecutionWorkspaceConfig` that is non-null only when `shouldReuseExisting` is true. Both `resolveExecutionWorkspaceEnvironmentId(...)` and `applyPersistedExecutionWorkspaceConfig(...)` now read from it instead of unconditionally consulting `existingExecutionWorkspace?.config`. Fresh-run issues no longer inherit stale environment config from an in-flight workspace about to be discarded. - `server/src/services/issues.ts`: when an issue update sets a new `assigneeAgentId` and isolated workspaces are enabled, populate `executionWorkspaceSettings.environmentId` from the assignee agent's `defaultEnvironmentId` if the issue doesn't have an explicit `environmentId` set yet. - Tests added in `heartbeat-plugin-environment.test.ts` (~216 lines) and `issues-service.test.ts` (~85 lines) covering both paths. ## Verification - `pnpm --filter @paperclipai/server test -- heartbeat-plugin-environment issues-service` - Manual QA: assign an issue to an agent that has a non-default `defaultEnvironmentId`, confirm the issue's workspace settings now include that environment id without operator intervention. Trigger a rerun on an issue whose existing workspace points at a stale environment, confirm the rerun uses the freshly-resolved environment. ## Risks - Behavioural shift on assignment: previously assigning an agent didn't propagate the agent's default environment to the issue. Now it does. Callers that explicitly want the issue to keep its existing/null environment must set `executionWorkspaceSettings.environmentId` themselves; the new logic only fires when no explicit value is set. - Behavioural shift on rerun: stale workspace config is no longer applied to fresh runs. Operators who relied on this implicit inheritance may see different environment selection on the first rerun after deploy. Mitigation: the explicit isssue settings and project policy are still honored as before. ## Model Used - OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI - Provider: OpenAI - Used to author the code changes in this PR ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots — N/A (no UI changes) - [ ] I have updated relevant documentation to reflect my changes — N/A - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-05-03 18:33:55 -07:00
Devin Foley	bb7d040894	Switch OpenCode to explicit static/local-aware model selection (#5117 ) > Stacked PR (part 4 of 7). Depends on: - PR #5114 - PR #5115 - PR #5116 > Diff against `master` includes commits from earlier PRs in the stack — the new commit in this PR is the topmost one. ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - When creating an OpenCode-local agent, Paperclip currently validates > `adapterConfig.model` against the Paperclip host's `opencode models` output > - SSH testing surfaced that this blocks creating an OpenCode agent for an SSH > environment: the model that exists on the SSH target isn't visible to the > host, so creation fails with "OpenCode requires `adapterConfig.model` in > provider/model format" even when the operator picked a real remote model > - The initial direction was environment-aware model discovery; the final > decision was to keep OpenCode on the same explicit-model pattern as other > adapters (default + curated list + manual override) and stop blocking > creation on host-side discovery > - This PR does both: the adapter-models endpoint now accepts `environmentId` and > probes against the target environment, and the create-time hard gate is > replaced by `requireOpenCodeModelId` which validates `provider/model` format > without requiring host-local discovery. Test/run-time still surfaces real > auth/availability problems > - The benefit is that operators can create OpenCode agents for remote > environments without out-of-band setup, and the model picker in the UI > reflects the actually-targeted environment ## What Changed - Added `requireOpenCodeModelId(input)` in `opencode-local/src/server/models.ts`, exported it from the adapter index - `ensureOpenCodeModelConfiguredAndAvailable` now delegates the format check to `requireOpenCodeModelId` - `agentsApi.adapterModels(companyId, adapterType, { environmentId })` now accepts an environment ID and passes it as a query parameter - `queryKeys.agents.adapterModels` now keys on `(companyId, adapterType, environmentId)` - `server/src/routes/agents.ts` reads and validates the new query parameter, forwarding it to the adapter's model probe - `AgentConfigForm.tsx` and `OnboardingWizard.tsx` build the model query key from the currently selected default environment ID and disable autodetect for `opencode_local` (model selection is explicit) - `NewAgent.tsx` simplified — no longer special-cases OpenCode autodetect - `company-portability.ts` no longer needs OpenCode-specific autodetect handling - Tests added/updated: `adapter-model-refresh-routes.test.ts`, `adapter-models.test.ts`, `agent-permissions-routes.test.ts`, `opencode-local/src/server/models.test.ts` ## Verification - `pnpm --filter @paperclipai/server test -- adapter-models adapter-model-refresh agent-permissions` - `pnpm --filter @paperclipai/adapter-opencode-local test` - `pnpm --filter @paperclipai/ui test -- AgentConfigForm OnboardingWizard NewAgent` - Manual QA in browser: 1. Boot Paperclip on Tailscale-bound port (so it's reachable from another machine), create an OpenCode-local agent, switch the default environment between two installed sandboxes, and confirm the model list refreshes per-environment 2. Submit with a malformed `provider/model` string and verify the new `requireOpenCodeModelId` error surfaces - Before/after screenshots attached for `AgentConfigForm` model picker ## Risks - Behavioural shift: switching default environment now triggers a model refetch. Should be cheap but introduces a new UI loading state for OpenCode users. - Removing dynamic autodetect for OpenCode: if any user configured an agent without specifying `model` and relied on autodetect populating it, that agent will now fail at submit time. Mitigation: validation error is explicit and actionable. - New query string parameter on `/api/companies/:id/adapter-models` — older clients that omit it still work (parameter is optional and defaults to null). ## Model Used - OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI - Provider: OpenAI - Used to author the code changes in this PR ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [ ] I have updated relevant documentation to reflect my changes — N/A - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-05-03 13:01:34 -07:00
Devin Foley	076067865f	Migrate SSH environment callback to bridge (#5116 ) > Stacked PR (part 3 of 7). Depends on: - PR #5114 - PR #5115 > Diff against `master` includes commits from earlier PRs in the stack — the new commit in this PR is the topmost one. ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents executing on a remote SSH-backed environment need a way to call back into > the Paperclip control plane (run events, log streaming, signals) > - When the SSH host can't reach the Paperclip host (NAT, firewalls, or simply not > on the same network), the run silently fails or hangs — a recurring class of > failure during SSH testing > - In sandboxed environments we already solved this with a callback bridge that > tunnels back through the existing connection; SSH was the odd one out > - This PR migrates SSH execution to use the same callback bridge, so every > adapter's remote run uses one consistent reverse-channel. Per-adapter SSH glue > is deleted in favour of a shared `CommandManagedRuntimeRunner` built from the > SSH spec > - The benefit is fewer SSH-specific failure modes, a smaller code surface, and > one place to evolve the callback contract going forward ## What Changed - Added `createSshCommandManagedRuntimeRunner` in `packages/adapter-utils/src/ssh.ts` that adapts an SSH spec into a generic command-managed-runtime runner (with cwd, env, and timeout handling) - Removed `paperclipApiUrl` from `SshRemoteExecutionSpec`; the bridge URL now flows through the shared runner - Reworked `execution-target.ts` to use the SSH runner alongside sandbox runners via a unified `CommandManagedRuntimeRunner` interface - Simplified `remote-managed-runtime.ts` and `sandbox-managed-runtime.ts` to consume the shared runner abstraction - Deleted per-adapter SSH callback wiring from claude-local, codex-local, cursor-local, gemini-local, opencode-local, pi-local execute.ts files - Removed `environment-runtime-driver-contract.test.ts` (the contract is now enforced by `environment-execution-target.test.ts`) - Added/updated `execute.remote.test.ts` cases for each adapter to cover the SSH runner path ## Verification - `pnpm --filter @paperclipai/adapter-utils test` - `pnpm test -- execute.remote` (covers all six local adapters' SSH paths) - Manual QA: ran a claude-local agent against an SSH-backed environment, confirmed the agent successfully called back to `/api/agent-callback/*` endpoints during the run ## Risks - Refactor touches all six local adapters. If any adapter had subtle SSH-specific behaviour that wasn't captured in tests, it could regress. Mitigation: each adapter's `execute.remote.test.ts` was extended. - `paperclipApiUrl` removal from `SshRemoteExecutionSpec` is a breaking type change for any internal consumer. Verified no external plugins consume this type. - The new `CommandManagedRuntimeRunner` shape is a public surface in `@paperclipai/adapter-utils`; downstream plugins implementing custom runners may need updates, but no such plugins exist in this repo. ## Model Used - OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI - Provider: OpenAI - Used to author the code changes in this PR ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots — N/A - [ ] I have updated relevant documentation to reflect my changes — N/A - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-05-03 12:43:52 -07:00
Devin Foley	a7b45938b7	Let sandbox providers declare shell defaults (#5114 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents execute in sandboxed remote environments served by pluggable sandbox > providers (E2B today, more later) > - Today every sandbox command runs under `sh -lc` regardless of what the > provider's container actually ships > - That misses bash-only shell init on E2B (which ships bash) and prevents > future providers from declaring a different default — there's no way for a > provider to say "I have bash, use it" > - This PR adds a `shellCommand` field to sandbox execution targets so providers > can declare their preferred shell ("bash" for E2B), threads it through the > sandbox-managed-runtime client, callback bridge, and execution-target shell > helper, and validates the value at the lease-metadata boundary > - The benefit is that sandbox commands run under the right shell on the right > provider, and adding new sandbox providers only needs to declare a shell > preference ## What Changed - Added `packages/adapter-utils/src/sandbox-shell.ts` exporting `preferredShellForSandbox(shellCommand)` (returns `"bash"` if input is `"bash"`, else `"sh"`) - Added `shellCommand?: "bash" \| "sh" \| null` to `AdapterSandboxExecutionTarget` and `CommandManagedRuntimeSpec`; threaded it through `runAdapterExecutionTargetShellCommand`, `prepareAdapterExecutionTargetRuntime`, and `startAdapterExecutionTargetPaperclipBridge` - `createCommandManagedRuntimeClient`, `prepareCommandManagedRuntime`, and `createCommandManagedSandboxCallbackBridgeQueueClient` now take an optional `shellCommand` and use `preferredShellForSandbox` to pick the shell - `startSandboxCallbackBridgeServer` accepts a `shellCommand` for its server startup, readiness probe, and stop hook - E2B sandbox plugin declares `shellCommand: "bash"` in `leaseMetadata` - `resolveEnvironmentExecutionTarget` reads `shellCommand` from lease metadata (validating against `"bash" \| "sh" \| null`) - `environment-runtime.ts` adds `"shellCommand"` to `INTERNAL_PLUGIN_SANDBOX_CONFIG_KEYS` so the field round-trips through internal plugin config without leaking to external plugin metadata - Updated tests in `command-managed-runtime.test.ts`, `execution-target-sandbox.test.ts`, `sandbox-callback-bridge.test.ts`, `environment-execution-target.test.ts` ## Verification - `pnpm --filter @paperclipai/adapter-utils test` - `pnpm --filter @paperclipai/server test -- environment-execution-target` - `pnpm --filter @paperclipai/sandbox-providers-e2b test` - Manual QA: boot a Paperclip instance, create an E2B-backed environment, run a claude_local agent against it, and confirm the run completes (verifies bash shell semantics flow through the callback bridge end-to-end) ## Risks - E2B sandbox commands now run under `bash -lc` instead of `sh -lc`. Bash is a strict superset for the commands we issue (no busybox-only flags in our shell scripts), so risk is low. The shellCommand field is opt-in via lease metadata — providers that don't declare it stay on `sh`. - New optional field on `CommandManagedRuntimeSpec` and `AdapterSandboxExecutionTarget`. Consumers ignoring the field retain previous behaviour (sh). - Lease metadata now carries an additional field. Existing leases without `shellCommand` resolve to `null` and fall back to sh — backwards compatible. ## Model Used - OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI - Provider: OpenAI - Used to author the code changes in this PR ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots — N/A (no UI changes) - [ ] I have updated relevant documentation to reflect my changes — N/A - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-05-03 12:19:35 -07:00
Dotta	15eac43b43	[codex] Retry max-turn exhausted heartbeats (#5096 ) ## Thinking Path > - Paperclip orchestrates AI agents for autonomous companies, and heartbeat execution is the control-plane loop that keeps assigned work moving. > - Max-turn exhaustion is a recoverable local-adapter stop condition for Claude and Gemini agents when a run needs another heartbeat to continue safely. > - The previous behavior could leave max-turn continuation details hard to inspect, and duplicate/stale continuation wakes could keep running after issue state changed. > - The adapter layer also needed to avoid trusting arbitrary stdout/stderr text as scheduler control metadata. > - This pull request adds bounded max-turn continuation scheduling, visible retry state, structured stop metadata handling, and stale/duplicate continuation guards. > - The benefit is safer automatic continuation after max-turn stops, clearer operator visibility, and fewer duplicate or stale agent runs. ## What Changed - Replaces closed PR #4952, whose head repository was deleted. - Rebases the recovered max-turn continuation branch onto current `paperclipai/paperclip:master`. - Adds max-turn continuation scheduling and retry-state plumbing for heartbeat runs. - Adds stale/duplicate continuation suppression when issue status, ownership, or execution locks change. - Normalizes Claude/Gemini max-turn detection around structured stop metadata instead of unstructured stdout/stderr text. - Surfaces max-turn continuation settings and retry visibility in the board UI. - Adds focused server, adapter, and UI tests for max-turn stop metadata, retry scheduling, stale queued-run invalidation, adapter parsing/execution, run ledger display, and agent config patching. ## Verification - `pnpm install --no-frozen-lockfile` to refresh local dependencies after rebasing onto current `master`. - `pnpm run preflight:workspace-links && pnpm exec vitest run server/src/__tests__/claude-local-adapter.test.ts server/src/__tests__/claude-local-execute.test.ts server/src/__tests__/gemini-local-adapter.test.ts server/src/__tests__/gemini-local-execute.test.ts server/src/__tests__/heartbeat-retry-scheduling.test.ts server/src/__tests__/heartbeat-stale-queue-invalidation.test.ts server/src/services/heartbeat-stop-metadata.test.ts ui/src/components/IssueRunLedger.test.tsx ui/src/lib/agent-config-patch.test.ts ui/src/lib/runRetryState.test.ts --testTimeout=20000` - `pnpm --filter @paperclipai/adapter-claude-local typecheck && pnpm --filter @paperclipai/adapter-gemini-local typecheck && pnpm --filter @paperclipai/server typecheck && pnpm --filter @paperclipai/ui typecheck` - UI screenshot note: the UI changes are limited to config/ledger state rendering rather than layout changes; component/unit coverage above verifies the rendered behavior. ## Risks - Medium behavior risk: heartbeat retry gating now suppresses max-turn continuations when issue state or execution locks drift, so any callers that relied on stale continuations running will now see cancellation instead. - Low adapter risk: Claude/Gemini unstructured text no longer triggers max-turn scheduler metadata, so only structured stop signals and Gemini exit code 53 are trusted. - No database migrations. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex coding agent, GPT-5-class model, tool-enabled local repository editing and command execution. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots (not applicable: state/default rendering only; covered by component/unit tests) - [x] I have updated relevant documentation to reflect my changes (not applicable: no user-facing command or docs contract changed) - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-05-03 11:30:48 -05:00
Dotta	57229d0f24	[codex] Add issue monitor liveness controls (#4988 ) ## Thinking Path > - Paperclip is a control plane for autonomous AI companies where work must stay observable, governable, and recoverable. > - The task/heartbeat subsystem owns agent execution continuity, issue state transitions, and visible recovery behavior. > - Waiting on an external service is not the same as being blocked when the assignee still owns a future check. > - The gap was that agents had no first-class one-shot monitor state for external-service waits, so recovery could look stalled or require ad hoc comments. > - This pull request adds bounded issue monitors that can wake the owner, clear exhausted waits, and produce explicit recovery behavior. > - It also surfaces monitor status in the board UI and documents when to use monitors versus `blocked`. > - The benefit is clearer liveness semantics for asynchronous waits without weakening single-assignee task ownership. ## What Changed - Added issue monitor fields, shared types, validators, constants, and an idempotent `0075` migration for scheduled monitor state. - Added server-side monitor scheduling, dispatch, recovery bounds, activity logging, and external-ref redaction. - Added board/agent route coverage for monitor permissions and child monitor scheduling. - Added issue detail/property UI for monitor state, a monitor activity card, and Storybook stories for review surfaces. - Documented monitor semantics and recovery policy behavior in `doc/execution-semantics.md`. - Addressed Greptile review feedback by preserving monitor state in skipped-stage builders and making board monitor saves send `scheduledBy: "board"`. ## Verification - `pnpm install --frozen-lockfile` - `pnpm run preflight:workspace-links && pnpm exec vitest run server/src/__tests__/issue-execution-policy-routes.test.ts server/src/__tests__/issue-execution-policy.test.ts server/src/__tests__/issue-monitor-scheduler.test.ts server/src/__tests__/recovery-classifiers.test.ts ui/src/components/IssueMonitorActivityCard.test.tsx ui/src/components/IssueProperties.test.tsx ui/src/lib/activity-format.test.ts` - First run passed 5 files and failed to collect 2 server suites because the worktree was missing the optional `acpx/runtime` dependency. - After `pnpm install --frozen-lockfile`, reran the 2 failed suites successfully. - `pnpm exec vitest run server/src/__tests__/issue-monitor-scheduler.test.ts server/src/__tests__/recovery-classifiers.test.ts` - `pnpm --filter @paperclipai/shared typecheck && pnpm --filter @paperclipai/db typecheck && pnpm --filter @paperclipai/server typecheck && pnpm --filter @paperclipai/ui typecheck` - `pnpm exec vitest run server/src/__tests__/issue-execution-policy.test.ts ui/src/components/IssueProperties.test.tsx` - `pnpm --filter @paperclipai/server typecheck && pnpm --filter @paperclipai/ui typecheck` - `pnpm exec vitest run ui/src/components/IssueMonitorActivityCard.test.tsx ui/src/components/IssueProperties.test.tsx` - `pnpm --filter @paperclipai/ui typecheck` - Storybook screenshot captured from `http://127.0.0.1:6006/iframe.html?viewMode=story&id=product-issue-monitor-surfaces--monitor-surfaces` with Playwright. ## Screenshots ![Issue monitor Storybook surfaces](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-2945-when-a-task-is-waiting-for-an-_external-service_-what-state-should-it-be-in-and-what-recovery-method-could-it-h/docs/pr-screenshots/pap-2945/monitor-surfaces.png) ## Risks - Medium: this changes heartbeat recovery behavior for scheduled external-service waits, so regressions could affect wake timing or recovery issue creation. - Migration risk is reduced by using `IF NOT EXISTS` for the new issue monitor columns and index. - External monitor references are treated as secret-adjacent and are intentionally omitted from visible activity/wake payloads. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent with repository tool use and terminal execution. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots or Storybook review surfaces - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-05-03 08:58:53 -05:00
Dotta	2d72292ad6	[codex] Add workspace routine run tab (#4958 ) ## Thinking Path > - Paperclip orchestrates AI agents through reusable execution workspaces and routines > - Operators need a fast way to run workspace-aware routines against a specific execution workspace > - The existing workspace detail surface showed configuration, runtime logs, and linked issues, but not routines that depend on workspace variables > - Routine runs also needed to prefill the selected execution workspace so branch variables resolve correctly > - This pull request adds a workspace routines tab and prefilled routine-run dialog support > - The benefit is a tighter workflow for rerunning reviews, smoke checks, and other workspace-specific routines ## What Changed - Added an execution workspace `Routines` tab and company-prefixed routes. - Listed routines that declare or reference workspace-specific variables. - Added `Run now` support that preselects the current execution workspace in `RoutineRunVariablesDialog`. - Centralized reusable execution workspace ordering/deduplication for issue creation and workspace cards. - Added focused UI helper and dialog regression tests. ## Verification - `pnpm exec vitest run ui/src/lib/reusable-execution-workspaces.test.ts ui/src/lib/workspace-routines.test.ts ui/src/components/RoutineRunVariablesDialog.test.tsx ui/src/lib/company-routes.test.ts` - Screenshots were not captured in this PR split; the visible flow is covered by focused component/helper tests and should get browser QA in the follow-up issue. ## Risks - Medium risk: this adds a new workspace detail tab and routine-run path. It is isolated to workspace-scoped routines and uses existing routine run APIs. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent, tool use and local command execution. Exact context window was not exposed in the runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-05-01 11:58:15 -05:00
Dotta	570a4206da	[codex] Recover productive terminal continuations (#4956 ) ## Thinking Path > - Paperclip orchestrates AI agents through issue-scoped heartbeat runs > - Recovery logic decides whether in-progress work still has a live path after a terminal run > - A productive terminal continuation can still leave an issue stranded when no active run or wake remains > - Treating that state as healthy leaves work stuck despite evidence that more action is needed > - This pull request re-enqueues recovery for productive terminal continuations that left no live path > - The benefit is fewer silently stranded in-progress issues after agents make partial progress ## What Changed - Reclassified successful-but-productive terminal continuations as recoverable when no live path remains. - Enqueue a follow-up recovery wake with the original run id and continuation metadata. - Added regression tests covering productive terminal continuation recovery and advanced liveness handoff. ## Verification - `pnpm exec vitest run server/src/__tests__/heartbeat-process-recovery.test.ts server/src/__tests__/run-continuations.test.ts` ## Risks - Medium risk: recovery may schedule one more follow-up where Paperclip previously considered the work observed. The existing uniqueness, budget, and escalation checks still constrain retry loops. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent, tool use and local command execution. Exact context window was not exposed in the runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-05-01 11:57:23 -05:00
Dotta	3cd26a78fc	[codex] Surface live run comment context (#4957 ) ## Thinking Path > - Paperclip orchestrates AI agents through issue comments and heartbeat runs > - The board UI needs to distinguish a comment that triggered a live run from comments queued after that run started > - The run payload already stores comment context, but active-run API responses did not expose the ids the UI needs > - Without those ids, the triggering comment can flash as queued while the agent is already responding to it > - This pull request exposes live-run comment context and teaches the optimistic comment helper to ignore the trigger comment > - The benefit is clearer issue-chat state during comment-triggered agent interruptions ## What Changed - Added `contextCommentId` and `contextWakeCommentId` to active/live run payloads. - Threaded those ids through server routes, heartbeat summaries, UI API types, and issue detail rendering. - Updated optimistic comment classification to avoid marking the triggering comment as queued. - Added server and UI regression coverage. ## Verification - `pnpm exec vitest run server/src/__tests__/agent-live-run-routes.test.ts ui/src/lib/optimistic-issue-comments.test.ts` ## Risks - Low-to-medium risk: adds optional fields to existing run payloads. Existing consumers should ignore unknown fields, and UI handling is null-safe. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent, tool use and local command execution. Exact context window was not exposed in the runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-05-01 10:44:11 -05:00
Dotta	e8275318ba	[codex] Raise agent heartbeat concurrency default (#4954 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agent heartbeat settings control how much parallel work one employee can run > - The previous default of 5 concurrent runs was too restrictive for active local agent teams > - The shared default, heartbeat clamp, docs, and route/import/UI expectations need to agree > - This pull request raises the default heartbeat concurrency to 20 while keeping explicit headroom up to 50 for power users > - The benefit is higher throughput for agent teams without each new agent needing manual runtime config edits ## What Changed - Raised `AGENT_DEFAULT_MAX_CONCURRENT_RUNS` from 5 to 20. - Raised the heartbeat service max clamp from 10 to 50, keeping the new default below the ceiling. - Updated V1 implementation docs and tests that assert default imported/exported runtime config. - Updated the new-agent UI runtime config test to assert the shared default constant instead of duplicating the numeric value. ## Verification - `pnpm exec vitest run server/src/__tests__/agent-permissions-routes.test.ts server/src/__tests__/company-portability.test.ts ui/src/lib/new-agent-runtime-config.test.ts` ## Risks - Medium risk: new agents can consume more local execution capacity by default. The heartbeat scheduler still respects configured max concurrency and budget/pause controls, and operators can lower or raise the per-agent cap within the `1..50` clamp. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent, tool use and local command execution. Exact context window was not exposed in the runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-05-01 10:42:56 -05:00
Dotta	e273d621fc	[PAP-3154] Stop padding /live-runs by default (#4963 ) ## Summary - Fix [PAP-3154](/PAP/issues/PAP-3154): the Sidebar's "Dashboard NN live" badge showed a constant 50 in every company because `GET /api/companies/:companyId/live-runs` was padding its response with up to 50 recent (non-live) heartbeat runs whenever the caller did not pass `minCount`. - Regression introduced by [#4875](https://github.com/paperclipai/paperclip/pull/4875) (commit `6445bef9`), which capped both `minCount` and `limit` at 50 with a fallback of 50 for omitted values. The cap is correct for `limit` (real unboundedness guard); for `minCount` it conflates "no padding" with "pad to the cap". - Default `minCount` to 0 so callers asking for "live runs" only get actually-live runs unless they explicitly request padding (`ActiveAgentsPanel` is the only caller that does). Keep `limit` capped at 50 by default. ## Test plan - [x] `pnpm exec vitest run server/src/__tests__/agent-live-run-routes.test.ts` — 7/7 pass, including new tests for the no-pad default and explicit padding. - [x] `pnpm exec vitest run ui/src/components/Sidebar.test.tsx ui/src/components/ActiveAgentsPanel.test.tsx ui/src/api/heartbeats.test.ts` — 6/6 pass. - [ ] Verify in dev: with ~8 truly-live runs in a company, the sidebar Dashboard badge shows the real count (not 50). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-05-01 10:33:13 -05:00
Dotta	42a299fb9d	[codex] Bound productivity review recovery loops (#4948 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies. > - The heartbeat/productivity review subsystem detects when assigned work is likely stuck or churning. > - Productivity reviews are useful, but repeated reconciliation can create noisy refresh comments or repeated review issues around the same source issue. > - That makes manager follow-up harder because the signal can get buried under duplicate review activity. > - This pull request bounds productivity review refreshes and creation loops while preserving the existing escalation path. > - The benefit is a quieter recovery loop that still surfaces stuck or high-churn work for manager attention. ## What Changed - Added refresh throttling for open productivity review issues, including a one-hour default interval and a maximum of three refresh comments per open review. - Added a rolling 24-hour creation cap so completed/closed reviews cannot immediately recreate review issues indefinitely for the same source issue. - Excluded cancelled productivity reviews from the creation cap so manager cancellations do not silently suppress future legitimate reviews. - Preserved productivity review timestamps in deterministic test paths and added targeted coverage for immediate refresh suppression, refresh caps, creation caps, and cancelled-review exclusion. ## Verification - `pnpm run preflight:workspace-links && pnpm exec vitest run server/src/__tests__/productivity-review-service.test.ts` - `pnpm exec vitest run server/src/__tests__/productivity-review-service.test.ts` - Greptile Review: 5/5 on commit `bcf25832d0ffae25890b2ee7eed112d1c2d114fe` with review threads resolved. - GitHub PR checks passed on the latest head: `policy`, `verify`, `e2e`, `Greptile Review`, and `security/snyk (cryppadotta)`. - Verified the branch is rebased onto `public-gh/master` with no conflicts. - Verified the diff does not include `pnpm-lock.yaml`, database schema changes, or migrations. ## Risks - Low-to-medium risk: this changes automation cadence for productivity reviews. A truly stuck issue may receive fewer repeated refresh comments, but the original review issue remains open and assigned for manager action. - No migration risk: this is server logic and tests only. > Checked [`ROADMAP.md`](ROADMAP.md) for overlapping planned core work; this is a targeted recovery-loop fix and does not add a new roadmap feature. ## Model Used - OpenAI Codex coding agent, GPT-5 model family, tool-using software engineering mode. Exact context window is not exposed in this runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots (not applicable; server-only change) - [x] I have updated relevant documentation to reflect my changes (not applicable; no user-facing docs or commands changed) - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-05-01 08:32:04 -05:00
Dotta	4272c1604d	Add ACPX local adapter runtime (#4893 ) ## Thinking Path > - Paperclip orchestrates AI-agent companies through a control plane that can start, supervise, and recover agent runs. > - Local adapters are the bridge between Paperclip issues and concrete agent runtimes such as Claude, Codex, and other ACP-compatible tools. > - The roadmap calls out broader “bring your own agent” and claw-style agent support, and ACPX gives Paperclip one path to normalize multiple ACP agents behind a single adapter. > - The branch needed to become one reviewable PR against current `paperclipai/paperclip:master`, without carrying stale base conflicts or generated lockfile churn. > - This pull request adds an experimental built-in `acpx_local` adapter, integrates it through the server/CLI/UI adapter surfaces, and adds regression coverage for runtime execution, skill sync, stream parsing, diagnostics, and log redaction. > - The benefit is that Paperclip can run Claude/Codex/custom ACP agents through ACPX while keeping operator configuration, skills, logging, and transcript rendering inside the existing adapter model. ## What Changed - Added `@paperclipai/adapter-acpx-local` with server execution, config schema, ACPX session handling, CLI formatting, UI config helpers, and stdout parsing. - Registered `acpx_local` across CLI, server, shared constants, UI adapter metadata, adapter capabilities, and agent creation/editing surfaces. - Added ACPX runtime execution support with persistent sessions, local-agent JWT environment handling, skill snapshots, runtime skill materialization, and isolation/security regressions. - Added ACPX adapter diagnostics and marked the adapter experimental in the UI. - Added command/env secret redaction for resolved command metadata in adapter-utils, server event storage, and the Agent Detail invocation UI. - Added Storybook coverage for ACPX config, transcript rendering, and skill states, plus PR screenshots under `docs/pr-screenshots/pap-2944/`. - Rebased the branch onto current `public-gh/master`; `pnpm-lock.yaml` is intentionally not included and there are no migration/schema changes. ## Verification - `pnpm exec vitest run packages/adapters/acpx-local/src/server/execute.test.ts packages/adapters/acpx-local/src/server/test.test.ts packages/adapters/acpx-local/src/cli/format-event.test.ts packages/adapters/acpx-local/src/ui/parse-stdout.test.ts packages/adapter-utils/src/server-utils.test.ts server/src/__tests__/redaction.test.ts server/src/__tests__/acpx-local-execute.test.ts server/src/__tests__/acpx-local-skill-sync.test.ts server/src/__tests__/acpx-local-adapter-environment.test.ts server/src/__tests__/adapter-routes.test.ts server/src/__tests__/agent-skills-routes.test.ts ui/src/adapters/metadata.test.ts` — 12 files, 87 tests passed. - `pnpm --filter @paperclipai/adapter-acpx-local typecheck` — passed. - `pnpm --filter @paperclipai/server typecheck` — passed. - `pnpm --filter @paperclipai/ui typecheck` — passed. - Confirmed PR diff does not include `pnpm-lock.yaml`, database schema files, or migrations. Screenshots: ![ACPX Claude skills light](https://github.com/cryppadotta/paperclip-1/blob/PAP-2944-acpx-make-a-claude_local-adapter-that-uses-acpx-instead/docs/pr-screenshots/pap-2944/skills-claude-light.png?raw=true) ![ACPX Claude skills dark](https://github.com/cryppadotta/paperclip-1/blob/PAP-2944-acpx-make-a-claude_local-adapter-that-uses-acpx-instead/docs/pr-screenshots/pap-2944/skills-claude-dark.png?raw=true) ![ACPX custom skills light](https://github.com/cryppadotta/paperclip-1/blob/PAP-2944-acpx-make-a-claude_local-adapter-that-uses-acpx-instead/docs/pr-screenshots/pap-2944/skills-custom-light.png?raw=true) ## Risks - Medium risk: this introduces a new built-in adapter package and touches runtime execution, adapter registration, agent config, skills, and transcript rendering. - ACPX and ACP agent behavior can vary by installed tool versions; the adapter is marked experimental to set operator expectations. - `pnpm-lock.yaml` is excluded per repository PR policy, so dependency lock refresh must be handled by the repo’s automation or maintainers. - No database migration risk: no schema or migration files changed. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex coding agent based on GPT-5, with repository tool use, shell execution, git operations, and local verification. Exact hosted context window was not exposed in this environment. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-30 19:57:05 -05:00
Dotta	ad5432fece	[codex] Harden issue recovery reliability (#4875 ) ## Thinking Path > - Paperclip is the control plane for autonomous agent companies, so non-terminal issue state must always have a clear live, waiting, or recovery owner. > - This change stays inside the server reliability and liveness subsystem for assigned issue recovery, blocker attention, and live-run polling. > - Closed PR #4860 mixed this reliability work with separate mutation-boundary policy changes, which made review and merge risk too broad. > - [PAP-2981](/PAP/issues/PAP-2981) asked for a replacement PR containing only the remaining reliability slice and explicitly excluding user-assignment and execution-policy restrictions. > - Follow-up review also split `advanced` run-liveness continuation behavior out of this PR so it can be reviewed separately. > - The implementation hardens repeated recovery escalation, expands blocker-attention coverage for explicit waiting and recovery paths, and caps company live-run polling defaults. > - The benefit is a smaller reliability PR that improves liveness behavior without changing agent/user mutation authorization boundaries or `advanced` continuation semantics. ## What Changed - Avoid repeated liveness escalation updates when the source issue is already blocked by the same open escalation. - Treat open liveness escalation recovery issues, their source issues, and their leaf blockers as covered waiting paths in blocker attention. - Cap default company live-run polling at 50 rows for both `minCount` and `limit`, including explicit zero values, to avoid unbounded responses. - Preserve the existing behavior where succeeded `advanced` runs are considered productive/healthy for stranded-work recovery and are not actionable bounded run-liveness continuations. - Added focused server coverage for recovery dedupe, blocker attention, liveness escalation, run continuations, and live-run polling. ## Verification - `pnpm install --frozen-lockfile` - `pnpm exec vitest run server/src/__tests__/heartbeat-process-recovery.test.ts server/src/__tests__/heartbeat-issue-liveness-escalation.test.ts server/src/__tests__/issue-blocker-attention.test.ts server/src/__tests__/run-continuations.test.ts server/src/__tests__/agent-live-run-routes.test.ts` - Result: 5 files passed, 63 tests passed. - `pnpm --filter @paperclipai/server typecheck` - Result: passed. - No UI changes; screenshots are not applicable. ## Risks - Recovery and blocker-attention classification changes can affect which blocked chains are shown as covered versus needing attention. - Live-run polling now treats omitted, invalid, or non-positive `limit` / `minCount` values as the capped default of 50. - `advanced` run-liveness continuation behavior is intentionally excluded from this PR and split for separate review. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5, code execution and GitHub CLI tool use, medium reasoning effort. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-30 16:44:28 -05:00
Dotta	a3de1d764d	Add cheap model profiles for local adapters (#4881 ) ## Thinking Path > - Paperclip is a control plane for autonomous AI companies, where adapters are the boundary between the board, agents, and execution runtimes. > - Local adapters currently expose a primary runtime configuration, but operators often need a cheaper model lane for routine or low-risk work. > - That cheap lane has to stay adapter-owned: runtime profile settings should not mutate the primary adapter config or bypass existing auth/secret mediation. > - Issue creation also needs an ergonomic way to request primary, cheap, or custom model behavior for a selected assignee. > - This pull request adds a first-class `cheap` model profile contract across adapter capabilities, heartbeat config resolution, agent configuration, and issue creation. > - The benefit is cheaper task execution can be configured and requested explicitly while preserving adapter boundaries, secret handling, and audit visibility. ## What Changed - Added adapter model-profile capability metadata and a `cheap` profile contract for supported local adapters. - Applied `runtimeConfig.modelProfiles.cheap.adapterConfig` during heartbeat config resolution, including requested/applied/fallback run metadata. - Added agent configuration UI for cheap model profile settings without writing those settings into primary `adapterConfig`. - Added New Issue assignee model lane controls for Primary / Cheap / Custom and request payload handling. - Added run ledger profile badges and Storybook stories for the new cheap-lane UI states. - Added tests for validators, heartbeat model profile application, permission/secret mediation, UI payload helpers, and run ledger rendering. - Added committed UI verification screenshots under `docs/pr-screenshots/pap-2837/`. - Addressed Greptile review feedback around cheap-profile defaults, shared profile types, and fallback test data. ## Verification Local: - `pnpm exec vitest run packages/shared/src/validators/issue.test.ts server/src/__tests__/adapter-registry.test.ts server/src/__tests__/agent-permissions-routes.test.ts server/src/__tests__/heartbeat-model-profile.test.ts ui/src/components/IssueRunLedger.test.tsx ui/src/lib/agent-config-patch.test.ts ui/src/lib/issue-assignee-overrides.test.ts ui/src/lib/new-agent-runtime-config.test.ts` — passed, 8 files / 103 tests. - `pnpm exec vitest run ui/src/lib/new-agent-runtime-config.test.ts ui/src/components/IssueRunLedger.test.tsx` — passed after Greptile/rebase follow-up, 2 files / 17 tests. - `pnpm --filter @paperclipai/ui typecheck` — passed after Greptile/rebase follow-up. - `pnpm -r typecheck` — passed. - `pnpm build` — passed. - `pnpm test:run` — did not complete successfully in this local worktree: it stopped in pre-existing `@paperclipai/adapter-utils` sandbox/SSH fixture suites outside this PR diff. Failures were 5s local timeouts plus `git init -b` unsupported by this machine's Git 2.21.0. The branch-specific targeted suites above passed. - Branch was fetched/rebased onto `public-gh/master`; `git rev-list --left-right --count public-gh/master...HEAD` reports `0 9`. Remote PR checks on latest head `e30bf399146451c86cee98ed528d51d33fa5af5a`: - `policy` — passed. - `verify` — passed. - `e2e` — passed. - `Greptile Review` — passed, confidence score 5/5; Greptile review threads resolved. - `security/snyk (cryppadotta)` — passed. Screenshots: - [New issue cheap lane desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/newissue-cheap-desktop.png) - [New issue custom lane desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/newissue-custom-desktop.png) - [New issue unsupported adapter desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/newissue-unsupported-desktop.png) - [Run ledger model profile badges desktop](https://github.com/paperclipai/paperclip/blob/PAP-2837-plan-cheap-model-for-adapters-that-can-support-it/docs/pr-screenshots/pap-2837/runledger-profile-badges-desktop.png) - Mobile variants are also in `docs/pr-screenshots/pap-2837/`. ## Risks - Medium: heartbeat config mediation now merges runtime model profiles into adapter configs, so adapter secret normalization and host-command restrictions must keep covering nested config paths. - Medium: the UI adds another issue creation choice; unsupported adapters must keep hiding the cheap lane and preserve primary behavior. - Low migration risk: no database migration is included. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used OpenAI Codex coding agent using GPT-5-class reasoning with repo tool use and command execution. Exact served model/context window was not exposed by the runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [ ] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 15:32:04 -05:00
Dotta	1fe1067361	Polish board settings and skills workflow (#4863 ) ## Thinking Path > - Paperclip's board UI and bundled skills are the operator layer for configuring agents, routines, issue workflows, and local troubleshooting loops. > - The prior rollup mixed this operator polish with database backups, backend reliability, thread scale, and cost/workflow primitives. > - This pull request isolates the remaining board QoL, settings, issue-detail integration, adapter config cleanup, and skills smoke tooling. > - It includes some integration-level overlap with the thread and workflow slices so this branch can run from `origin/master` while still preserving the full original work. > - Preferred merge order is the narrower primitives first, then this integration PR last. > - The benefit is that reviewers can inspect the user-facing board/settings/skills layer separately from backend infrastructure changes. ## What Changed - Added board/settings polish for agents, routines, company settings, project workspace detail, and issue detail controls. - Added agent/routine UI regression tests and New Issue dialog coverage. - Integrated issue-detail activity/cost/interaction surfaces and leaf work pause/resume controls. - Cleaned bundled adapter UI config defaults and onboarding copy. - Added terminal-bench loop and work-stoppage diagnosis skills plus a smoke test script. - Updated attachment type handling and Paperclip skill/API guidance. ## Verification - `pnpm install --frozen-lockfile` - `pnpm exec vitest run ui/src/pages/Agents.test.tsx ui/src/pages/Routines.test.tsx ui/src/components/NewIssueDialog.test.tsx ui/src/pages/IssueDetail.test.tsx server/src/__tests__/costs-service.test.ts server/src/__tests__/issue-thread-interaction-routes.test.ts server/src/__tests__/issue-thread-interactions-service.test.ts` - Result: 7 test files passed, 54 tests passed. - `pnpm run smoke:terminal-bench-loop-skill` - Result: JSON output included `"ok": true` and `"cleanup": true`. - UI screenshots not included because verification is focused component/page coverage for the changed board surfaces. ## Risks - This is the integration-heavy PR in the split and intentionally overlaps some component/API primitives with the issue-thread and workflow PRs so it can run from `origin/master`. - Preferred merge order: #4859, #4860, #4861, #4862, then this PR last. If earlier branches merge first, this PR may need a straightforward conflict refresh in shared UI files. - The terminal-bench smoke script creates temporary mock issues and relies on cleanup; the verified run returned `cleanup: true`. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5.5, code execution and GitHub CLI tool use, medium reasoning effort. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-30 15:28:11 -05:00
Dotta	c4269bab59	Add workflow interaction cancellation and issue cost summaries (#4862 ) ## Thinking Path > - Paperclip coordinates work through issue-thread interactions, run history, and cost telemetry. > - Operators need workflow prompts to be cancellable and costs to be visible at the issue level. > - The earlier rollup mixed this workflow/cost work with database backups, reliability recovery, thread scaling, and settings polish. > - This pull request isolates the interaction and cost surfaces into a reviewable slice. > - The backend now supports cancelling pending question interactions and summarizing issue-tree costs. > - The UI component layer can render cancelled questions and interleave activity with run ledger rows. ## What Changed - Added `cancelled` as an issue-thread interaction status and result shape for question interactions. - Added the board-only `POST /issues/:id/interactions/:interactionId/cancel` route and service implementation. - Added issue-tree cost summary support in the cost service and `/issues/:id/cost-summary` API route. - Extended shared cost exports and UI API/query keys for issue cost summaries. - Updated `IssueThreadInteractionCard` and `IssueRunLedger` components for cancelled questions, issue cost surfaces, and activity/run interleaving. - Added focused server and component regression coverage. ## Verification - `pnpm install --frozen-lockfile` - `pnpm exec vitest run server/src/__tests__/costs-service.test.ts server/src/__tests__/issue-thread-interaction-routes.test.ts server/src/__tests__/issue-thread-interactions-service.test.ts ui/src/components/IssueRunLedger.test.tsx` - Result: 4 test files passed, 45 tests passed. - UI screenshots not included because this PR updates reusable components and API surfaces without wiring a new page-level layout. ## Risks - Adds a new interaction terminal status; clients that switch exhaustively on interaction status may need to handle `cancelled`. - Issue-tree cost summaries use recursive issue traversal and should be watched on unusually large issue trees. - Page-level issue detail wiring is intentionally left to the board QoL/issue-detail branch to keep this PR narrow. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5.5, code execution and GitHub CLI tool use, medium reasoning effort. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-30 13:57:25 -05:00
Devin Foley	c0ce35d1fb	Improve E2B plugin configuration UX and fix execution timeouts (#4802 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - E2B is a sandbox provider plugin that runs agent code in isolated cloud environments > - Operators configure E2B through the plugin settings page > - But the E2B API key configuration was unclear — the settings field description didn't explain that pasted keys are auto-saved as company secrets, and the fallback to the host `E2B_API_KEY` variable wasn't documented > - Additionally, long-running E2B sandbox commands were timing out because the plugin environment RPC driver used a fixed timeout, and environment commands competed for the single foreground command slot > - This PR clarifies the E2B configuration UX, fixes RPC timeouts for plugin environment execution, and runs E2B environment commands in background mode to avoid blocking the foreground slot > - The benefit is clearer E2B setup for operators and more reliable sandbox command execution ## What Changed - Updated E2B plugin manifest and settings UI to clarify API key configuration — field description now explains that pasted keys are saved as company secrets and documents the `E2B_API_KEY` host fallback - Added test coverage for the plugin settings page rendering - Fixed `plugin-environment-driver.ts` to pass the configured timeout through to RPC calls instead of using a hardcoded default - Updated `environment-runtime.ts` to propagate timeout from the environment lease to the plugin driver - Changed E2B sandbox command execution to use background handles so long-running agent commands don't block the foreground slot needed by the callback bridge ## Verification - `pnpm test` — all existing and new tests pass - `pnpm typecheck` — clean - Manual: navigate to plugin settings, verify E2B API key field shows the updated description text - Manual: run an E2B-backed agent task with a long-running command, verify it completes without RPC timeout ## Risks - Low risk. Configuration UX change is cosmetic. The timeout fix passes an existing value through instead of dropping it. Background command execution is a behavioral change but only affects E2B sandbox commands — the foreground slot is still available for bridge health checks. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-29 17:12:30 -07:00
Devin Foley	a4ac6ff133	Add sandbox callback bridge for remote environment API access (#4801 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents can run inside sandboxed environments like E2B, which are isolated from the host network > - Sandboxed agents need to call back to the Paperclip API to report progress, post comments, and update issue status > - But sandbox environments cannot reach the Paperclip server directly because they run in isolated network namespaces > - This PR adds a callback bridge that proxies API requests from the sandbox to the Paperclip server, running as a local HTTP server on the host that forwards authenticated requests > - The bridge is started automatically when an adapter launches a sandbox execution, and torn down when the run completes > - The benefit is sandboxed agents can interact with the Paperclip API without requiring network-level access to the host, enabling E2B and similar providers to work end-to-end ## What Changed - Added `sandbox-callback-bridge.ts` in `packages/adapter-utils/` — a lightweight HTTP bridge server that accepts requests from sandbox environments and proxies them to the Paperclip API with authentication - Added request validation and security policy: the bridge only forwards requests to the configured API URL, validates content types, enforces size limits, and rejects non-API paths - Wired the bridge into all remote adapter execute paths (claude, codex, cursor, gemini, pi) — the bridge starts before the agent process and the bridge URL is passed via environment variables - Updated `environment-execution-target.ts` to prefer the explicit API URL from environment lease metadata for sandbox callback routing - Fixed Claude sandbox runtime setup to work with the bridge configuration - Added comprehensive test coverage for bridge request handling, policy enforcement, and sandbox execution integration - Fixed browser bundling — the bridge module is excluded from the frontend bundle via the adapter-utils index export ## Verification - `pnpm test` — all existing and new tests pass, including bridge unit tests and sandbox execution integration tests - `pnpm typecheck` — clean - Manual: configure an E2B environment, run an agent task, verify the agent can post comments and update issue status through the bridge ## Risks - Medium. This is a new network-facing component (HTTP server on localhost). The security policy restricts forwarding to the configured API URL only and validates all requests, but any proxy introduces attack surface. The bridge binds to localhost only and is scoped to the lifetime of a single agent run. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-29 16:37:34 -07:00
Devin Foley	4cf612a92d	Fix runtime state race, workspace sync, plugin startup, and orphaned leases (#4804 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents run inside environments that are leased, and the server manages runtime state, workspace configuration, and plugin lifecycle > - Several edge cases caused failures during concurrent operations: a race condition in runtime state insertion could produce duplicate-key errors, reused workspaces didn't sync their configuration when the parent issue was updated, sandbox provider plugins could be queried before registration completed, and orphaned environment leases from failed runs were never released > - This PR fixes these four runtime/environment issues > - The benefit is more reliable concurrent agent execution and proper resource cleanup ## What Changed - `services/heartbeat.ts`: Fixed a race condition where concurrent runtime state inserts could fail with a duplicate-key error by using an upsert pattern - `services/issues.ts`: Sync reused workspace configuration when an issue is updated, so the workspace reflects the latest issue state - `services/environment-runtime.ts`: Fixed a startup race where sandbox provider plugins could be queried before registration completed, by awaiting plugin readiness before resolving environment drivers - `services/heartbeat.ts`: Release environment leases for orphaned runs that lost their process without cleanup ## Verification - `pnpm test` — all existing and new tests pass, including new tests for runtime state upsert and process recovery lease cleanup - `pnpm typecheck` — clean - Manual: trigger concurrent agent runs to verify no duplicate-key failures; verify orphaned leases are released after process loss ## Risks - Low risk. The runtime state upsert changes insert-to-upsert behavior, which could mask a legitimate duplicate if two different runs produce the same key — but this is prevented by the run ID being part of the key. The plugin startup await is bounded by the existing registration timeout. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-29 16:37:10 -07:00
Devin Foley	f9cf1d2f6a	Add cursor sandbox support and fix SSH workspace sync (#4803 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents can run inside sandboxed environments like E2B, or on remote hosts via SSH > - The cursor adapter needs to resolve `cursor-agent` inside sandbox environments where it's installed in `~/.local/bin` > - But when using the default `agent` command on a sandbox target, the adapter didn't know to look in `~/.local/bin/cursor-agent`, causing "command not found" failures > - Additionally, repeated SSH runs failed because `git checkout` during workspace sync conflicted with leftover `.paperclip-runtime` files from previous runs > - This PR adds sandbox-aware command resolution for cursor and fixes the SSH workspace sync conflict > - The benefit is cursor works in E2B sandboxes out of the box, and repeated SSH runs don't fail on workspace sync ## What Changed - `cursor-local`: Added `prepareCursorSandboxCommand` — on sandbox targets, reads the remote `$HOME`, prepends `~/.local/bin` to PATH, and prefers `~/.local/bin/cursor-agent` when the default command is requested; tightened the sandbox command probe to validate the binary exists before launching; preserves explicit custom command overrides - `adapter-utils/ssh.ts`: Added `--force` to git checkout in SSH workspace sync to handle `.paperclip-runtime` untracked file conflicts from previous runs ## Verification - `pnpm test` — all existing and new tests pass, including cursor sandbox probe, sandbox execution, and custom command override tests - `pnpm typecheck` — clean - Manual: configure an E2B environment, run a cursor-local task, verify it resolves cursor-agent from the sandbox install path ## Risks - Low-medium. The `--force` flag on git checkout could discard uncommitted changes in the remote workspace, but the workspace is managed by Paperclip and should not contain user edits. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-29 16:12:06 -07:00
Devin Foley	367d4cab72	Fix SSH callback URL selection for LAN and private networks (#4799 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents can run on remote hosts via SSH environments > - When a remote agent needs to call back to the Paperclip API, it needs a reachable URL > - But the runtime API URL candidate builder did not account for private network topologies where the server is only reachable via LAN or VPN addresses > - Agents on SSH hosts were failing to connect because the callback URL pointed to localhost or an unreachable address > - This PR fixes callback URL selection to honor `PAPERCLIP_API_URL`, prefer LAN-reachable candidates, filter unreachable link-local addresses, and include interface hosts in onboarding invite URLs > - The benefit is SSH-based agents can reliably reach the Paperclip API on private networks without manual URL configuration ## What Changed - `runtime-api.ts`: Added `PAPERCLIP_API_URL` as a first-priority candidate in `buildRuntimeApiCandidateUrls`; extracted `collectReachableInterfaceHosts` to enumerate non-loopback, non-link-local network interface IPs with IPv4 preference - `server/src/index.ts`: Export `PAPERCLIP_API_URL` from the server environment so it is available to callback candidate resolution - `server/src/routes/access.ts`: Include LAN interface hosts in onboarding invite connection candidates - `server/src/config.ts`: Attempted auto-allowing LAN interface hosts, then reverted to the per-instance allowlist approach (both commits included for history clarity) ## Verification - `pnpm test` — all existing and new tests pass, including new tests for LAN candidate ordering and link-local filtering - `pnpm typecheck` — clean - Manual: start a Paperclip server on a machine with a LAN IP, create an SSH environment pointing to another host on the same LAN, verify the agent's callback URL uses the LAN IP rather than localhost ## Risks - Low-medium. The candidate list now includes more addresses (all non-loopback LAN interfaces). These are candidates for the agent to try, not an allowlist — the server's allowed hostnames still gate which origins are accepted. Ordering change (LAN preferred over loopback) could affect existing setups where localhost was intentionally preferred. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-29 15:56:17 -07:00
Devin Foley	9b99d30330	Add dedicated environment settings page and test-in-environment (#4798 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents run inside environments (local, SSH, E2B sandbox) > - Operators need to configure and manage these environments > - But environment settings were buried inside the general company settings page, making them hard to find > - Additionally, when testing an agent from the configuration form, the test always ran locally regardless of which environment was selected > - This PR moves environments into a dedicated top-level company settings section and wires the "Test Environment" button to run inside the selected environment > - The benefit is operators can find and manage environments more easily, and the test button now validates the actual environment the agent will use ## What Changed - Added a dedicated `CompanyEnvironments` settings page with its own route and sidebar entry - Updated `CompanySettingsSidebar` and `CompanySettingsNav` to include the new environments section - Modified the agent test route (`POST /agents/:id/test`) to accept an optional `environmentId` parameter - Updated all adapter `test.ts` handlers to resolve and use the specified execution target environment - Added `resolveTestExecutionTarget` to `execution-target.ts` for remote environment test resolution with cwd fallback - Moved the "Test Environment" button and its feedback display into the `NewAgent` page footer for better UX flow ## Verification - `pnpm test` — all existing and new tests pass - `pnpm typecheck` — clean - Manual: navigate to Company Settings, confirm "Environments" appears as a top-level section - Manual: configure an agent with a non-local environment, click "Test Environment", confirm the test runs inside that environment ## Risks - Low risk. UI-only routing change for the settings page. The test-in-environment change adds an optional parameter with a local fallback, so existing behavior is preserved when no environment is specified. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-29 15:56:13 -07:00
Dotta	6b7f6ce4b8	[codex] Split PR #4692 UI/QoL updates (#4701 ) ## Thinking Path > - Paperclip orchestrates AI agents through a company-scoped control plane. > - The affected surface is the board UI for issue threads, issue lists, routines, dialogs, navigation, and issue review indicators. > - Closed PR #4692 bundled backend, schema, docs, workflow, and UI/QoL work into one oversized change set. > - Greptile could not keep reviewing that broad PR because it exceeded the 100-file review limit and mixed unrelated concerns. > - This pull request extracts the UI/QoL slice into a fresh branch under the review limit while leaving workflow and lockfile churn out. > - The benefit is a focused review path for the board UI performance and workflow improvements without reopening the oversized PR. ## What Changed - Added long issue-thread virtualization, scroll-container binding, anchor preservation, latest-comment jump targeting, and related regression/perf fixtures. - Improved issue list scalability with scroll-based loading, server offset parameters, and pagination-focused UI tests. - Reduced new issue dialog typing churn and split dialog action subscriptions so broad layout/nav surfaces avoid unnecessary renders. - Added routine variables help and routine description mention options for users, agents, and projects. - Added productivity review badge/link UI and fixed the badge to use Paperclip's company-prefixed router link. - Kept the split PR below Greptile's review limit and excluded `.github/workflows/pr.yml` and `pnpm-lock.yaml`. ## Verification - `pnpm install --no-frozen-lockfile` in the clean worktree to install `@tanstack/react-virtual` locally without committing lockfile churn. - `pnpm --filter @paperclipai/ui exec vitest run --config vitest.config.ts src/components/IssueChatThread.test.tsx src/components/IssuesList.test.tsx src/components/NewIssueDialog.test.tsx src/pages/Routines.test.tsx src/pages/Issues.test.tsx` passed: 5 files, 83 tests. - `pnpm --filter @paperclipai/ui typecheck` passed. - `git diff --check origin/master..HEAD` passed. - Split-scope checks: 53 changed files; no `.github/workflows/pr.yml`; no `pnpm-lock.yaml`. - Screenshots were not captured in this heartbeat; the changes are primarily virtualization, routing, pagination, and editor behavior covered by focused regression tests. ## Risks - Moderate UI risk because issue-thread virtualization changes scroll behavior on long conversations; regression tests cover anchor jumps, latest-comment targeting, row metadata, and short-thread fallback. - Moderate integration risk because the issue-list offset parameter and productivity review field depend on matching API behavior. - Dependency risk: the UI package adds `@tanstack/react-virtual` while repository policy keeps `pnpm-lock.yaml` out of PRs, so CI must resolve dependency changes through the repo's normal lockfile policy. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent, tool-enabled local repository and GitHub workflow. Exact runtime context window was not exposed by the harness. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-28 17:18:58 -05:00
Dotta	1991ec9d6f	[codex] Split backend control-plane QoL slice (#4700 ) ## Thinking Path > - Paperclip is the control plane for autonomous AI companies, so backend task ownership, recovery, review visibility, and company-scoped limits need to stay enforceable without UI-only coupling. > - Closed PR #4692 bundled those backend changes with UI workflow, docs, skills, workflow, and lockfile churn. > - PAP-2694 asks for a clean backend/control-plane slice from that closed branch. > - This branch starts from current `master` and mines only the `cli`, `packages/db`, `packages/shared`, and `server` contracts/tests needed for the backend behavior. > - It explicitly excludes UI workflow/performance work, `.github/workflows/pr.yml`, `pnpm-lock.yaml`, docs, skills, package-script, adapter UI build-config, and perf fixture script changes; the only UI files are fixture/test updates required by the tightened shared `Company` contract. > - The benefit is a smaller reviewable PR that preserves the control-plane fixes while staying under Greptile s 100-file review limit. ## What Changed - Added company-scoped attachment-size limits through DB schema/migrations, shared company portability contracts, CLI import/export coverage, and server attachment upload enforcement. - Added productivity review service/API behavior for no-comment streak, long-active, and high-churn review issues, including request-depth clamping and issue summary exposure. - Hardened issue ownership and recovery/control-plane paths: peer-agent mutation denial, issue tree pause/resume behavior, stranded recovery origins, and related activity/test coverage. - Preserved related backend contract updates for routine timestamp variables and managed agent instruction bundles because they live in shared/server contracts from the source branch. - Addressed Greptile feedback by making `Company.attachmentMaxBytes` non-optional, simplifying review request-depth clamping, fixing the migration final newline, and enforcing the process-level attachment cap as the final ceiling for uploads. - Added minimal company fixtures needed for repo-wide typecheck/build and kept the PR to 66 changed files with forbidden/non-slice paths excluded. ## Verification - `pnpm install --frozen-lockfile` - `git diff --check origin/master..HEAD` - `git diff --name-only origin/master..HEAD \| wc -l` -> 66 files - `git diff --name-only origin/master..HEAD -- .github/workflows/pr.yml pnpm-lock.yaml package.json doc skills .agents scripts packages/adapters` -> no output - `pnpm exec vitest run --config vitest.config.ts packages/shared/src/validators/issue.test.ts packages/shared/src/routine-variables.test.ts packages/shared/src/adapter-types.test.ts cli/src/__tests__/company-import-export-e2e.test.ts cli/src/__tests__/company.test.ts server/src/__tests__/productivity-review-service.test.ts server/src/__tests__/issue-tree-control-service.test.ts server/src/__tests__/issue-tree-control-routes.test.ts server/src/__tests__/issue-agent-mutation-ownership-routes.test.ts server/src/__tests__/issue-attachment-routes.test.ts server/src/__tests__/heartbeat-process-recovery.test.ts server/src/__tests__/issues-service.test.ts` -> 12 files, 147 tests passed - `pnpm exec vitest run --config vitest.config.ts cli/src/__tests__/company-delete.test.ts cli/src/__tests__/company-import-export-e2e.test.ts server/src/__tests__/productivity-review-service.test.ts` -> 3 files, 18 tests passed - `pnpm exec vitest run --config vitest.config.ts server/src/__tests__/issue-attachment-routes.test.ts` -> 1 file, 6 tests passed - `pnpm --filter @paperclipai/db typecheck && pnpm --filter @paperclipai/shared typecheck && pnpm --filter @paperclipai/server typecheck && pnpm --filter paperclipai typecheck` - `pnpm --filter @paperclipai/server typecheck` - `pnpm --filter @paperclipai/ui typecheck && pnpm --filter @paperclipai/ui build` ## Risks - Includes migrations `0073_shiny_salo.sql` and `0074_striped_genesis.sql`; merge ordering matters if another PR adds migrations first. - This is intentionally backend-only apart from fixture/test updates forced by shared type correctness; UI affordances from PR #4692 are not present here and should land in separate UI slices. - The worktree install emitted plugin SDK bin-link warnings for unbuilt plugin packages, but the targeted tests and package typechecks completed successfully. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected; check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent, tool-enabled terminal/GitHub workflow. Exact runtime context window was not exposed by the harness. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-28 16:46:45 -05:00
Dotta	f88f538e6d	Keep manual routine runs visible in the runner inbox (#4615 ) ## Thinking Path > - Paperclip coordinates recurring agent work through scheduled and manual routines. > - Manual routine runs are board-initiated work and should stay visible to the human who kicked them off. > - Routine execution issues are agent-assigned, so they can be filtered away from a board user's inbox unless the user is recorded as touching the work. > - Coalesced or skipped active routine runs have the same visibility problem because they reuse an existing live issue. > - This pull request carries the manual runner actor into routine dispatch and touches the linked issue for that user's inbox. > - The benefit is that manually triggered routine work stays discoverable by the operator who started it. ## What Changed - Passed the board or agent actor from the routine run route into the routine service. - Recorded manual board runners as `createdByUserId` on fresh routine execution issues. - Touched coalesced or skipped active routine issues for the manual runner by updating read state and clearing that user's inbox archive. - Added route and service regressions for manual routine run actor propagation and inbox visibility. ## Verification - `pnpm exec vitest run server/src/__tests__/routines-routes.test.ts server/src/__tests__/routines-service.test.ts` ## Risks - Low risk: the change is scoped to manual routine runs and only updates issue attribution/read-state metadata for the initiating actor. - No migrations. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex coding agent based on GPT-5, tool-enabled local repository and shell access, Paperclip heartbeat context. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-27 20:03:24 -05:00
Dotta	68c37660f0	Dispatch assigned todo work during recovery sweeps (#4614 ) ## Thinking Path > - Paperclip orchestrates AI agents for autonomous companies. > - Agent assignments must reliably turn into heartbeat work without board operators manually nudging stuck tasks. > - The stranded-assignment recovery sweep already handles failed or lost runs. > - But assigned `todo` issues with no prior run could sit idle because there was nothing to retry or recover. > - This pull request dispatches those never-started assigned todos as normal assignment wakes. > - The benefit is that recovery fixes missed initial dispatches without creating unnecessary recovery issues. ## What Changed - Added an initial assigned-todo dispatch path to the recovery service when an assigned `todo` issue has no heartbeat run yet. - Reused invocation budget hard-stop checks before dispatching or requeueing recovery work. - Counted `assignmentDispatched` in startup/scheduled recovery logs. - Added heartbeat recovery regressions for first dispatch, duplicate queued wake prevention, budget-blocked skips, and paused-agent skips. ## Verification - `pnpm exec vitest run server/src/__tests__/heartbeat-process-recovery.test.ts` ## Risks - Low to medium risk: this changes liveness recovery behavior for assigned `todo` issues, but it stays on the existing assignment wake path and skips paused or budget-blocked agents. - No migrations. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex coding agent based on GPT-5, tool-enabled local repository and shell access, Paperclip heartbeat context. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-27 20:02:44 -05:00
Dotta	7a9b3a6037	[codex] Harden recovery issue handling (#4600 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - The control plane must recover stranded agent work without creating new operational loops > - Stranded recovery issues can themselves fail, and exposing raw retry errors in comments can leak sensitive adapter details > - New local companies also should not force a hire-approval gate unless operators enable that policy > - This pull request hardens recovery issue handling, redacts retry failure details in issue copy, preserves `maxConcurrentRuns: 1`, and flips new-hire approval to an opt-in default > - The benefit is safer automatic recovery and smoother default company setup without hidden migration conflicts ## What Changed - Added migration `0071_default_hire_approval_off` and updated company schema/import/export/docs so hire approvals default off and serialize only when enabled. - Added migration `0072_large_sandman` with a partial unique index preventing duplicate active stranded recovery issues for the same source issue. - Blocked failed `stranded_issue_recovery` issues in place instead of creating nested recovery issues. - Redacted latest retry failure details from recovery issue comments while still linking reviewers to run evidence. - Allowed `maxConcurrentRuns: 1` to be honored by heartbeat concurrency normalization. - Added focused regression coverage for recovery recursion, redaction, migration ordering, and concurrency behavior. ## Verification - `pnpm --filter @paperclipai/db run check:migrations` - `pnpm exec vitest run --project @paperclipai/server server/src/__tests__/recovery-classifiers.test.ts` - `pnpm exec vitest run --project @paperclipai/server server/src/__tests__/company-portability.test.ts --pool=forks --poolOptions.forks.isolate=true` - `pnpm exec vitest run --project @paperclipai/server server/src/__tests__/agent-permissions-routes.test.ts --pool=forks --poolOptions.forks.isolate=true` - `pnpm --filter @paperclipai/server typecheck` - `pnpm exec vitest run --project @paperclipai/server server/src/__tests__/heartbeat-process-recovery.test.ts --pool=forks --poolOptions.forks.isolate=true` exits 0, but this host skipped the embedded Postgres tests with the existing init guard. - `pnpm exec vitest run --project @paperclipai/server server/src/__tests__/heartbeat-dependency-scheduling.test.ts --pool=forks --poolOptions.forks.isolate=true` exits 0, but this host skipped the embedded Postgres tests with the existing init guard. ## Risks - Migration risk is low but this PR intentionally owns both new migrations to avoid separate PR migration-journal conflicts. - Recovery comments now require operators to inspect linked run evidence for details instead of reading raw errors inline. - The hire approval default changes behavior for newly created/imported companies only; existing persisted company settings are not changed except by the SQL default for future rows. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent, tool-enabled terminal/GitHub workflow, reasoning mode active. Context window not exposed in this environment. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-27 15:02:47 -05:00
Dotta	6ccf80bcf2	[codex] Reject stale company skill refreshes (#4601 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Company skills are part of the reusable agent capability layer > - Skill inventory refresh work can outlive the company it was requested for > - Without an explicit company existence check, stale refreshes can continue into bundled/local skill cleanup for deleted or missing companies > - This pull request makes company-skill listing fail fast when the company no longer exists > - The benefit is clearer API behavior and less stale background work against missing company scope ## What Changed - Added a company existence check before `companySkillService.list()` refreshes bundled and local-path skill state. - Added regression coverage asserting missing companies return `404 Company not found`. ## Verification - `pnpm exec vitest run --project @paperclipai/server server/src/__tests__/company-skills-service.test.ts --pool=forks --poolOptions.forks.isolate=true` exits 0, but this host skipped the embedded Postgres tests with the existing init guard. ## Risks - Low risk. Existing callers for valid companies are unchanged. - Missing-company callers now receive an explicit 404 instead of continuing refresh work. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent, tool-enabled terminal/GitHub workflow, reasoning mode active. Context window not exposed in this environment. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-27 13:19:38 -05:00
Dotta	215b6cd161	[codex] Add security role route coverage (#4589 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies. > - Agent creation accepts roles that become part of the agent contract and telemetry. > - The shared role list already includes the security role. > - Direct agent creation should preserve that role through route handling and analytics metadata. > - This pull request adds route coverage for creating a security-role agent and asserting telemetry receives the same role. > - The benefit is regression coverage for security agents without changing the production route behavior. ## What Changed - Added a server route test that creates an agent with `role: "security"`. - Asserted the create payload and telemetry metadata preserve `security` as the agent role. ## Verification - `pnpm exec vitest run --project @paperclipai/server server/src/__tests__/agent-skills-routes.test.ts --pool=forks --poolOptions.forks.isolate=true` ## Risks - Low risk; test-only coverage. - No runtime behavior, schema, or API contract changes. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, `gpt-5`, coding model with tool use and local command execution; context window not exposed by the runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-27 08:49:59 -05:00
Dotta	fda296ee4f	[codex] Add configurable liveness auto-recovery controls (#4587 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies. > - Heartbeat liveness recovery decides when stalled issue trees need manager-visible follow-up. > - Automatic recovery issue creation is useful, but operators need instance-level controls for how aggressive it is. > - Without controls, recovery behavior is harder to tune for local development, production operations, and noisy edge cases. > - This pull request adds configurable liveness auto-recovery settings across shared contracts, API routes, services, and the instance experimental settings UI. > - The benefit is that operators can keep liveness findings advisory or enable bounded recovery automation with explicit intervals and lookback windows. ## What Changed - Added shared types and validators for liveness auto-recovery settings. - Extended instance settings routes and services to persist and validate the new controls. - Wired heartbeat/recovery services to honor enablement, minimum interval, and lookback settings. - Added UI controls for liveness recovery under instance experimental settings. - Covered the new server behavior with instance settings and liveness escalation tests. ## Verification - `pnpm exec vitest run --project @paperclipai/server server/src/__tests__/heartbeat-issue-liveness-escalation.test.ts server/src/__tests__/instance-settings-routes.test.ts --pool=forks --poolOptions.forks.isolate=true` - `pnpm --filter @paperclipai/shared typecheck` - `pnpm --filter @paperclipai/server typecheck` - `pnpm --filter @paperclipai/ui typecheck` ## Risks - Moderate behavioral risk because recovery automation timing changes when enabled; defaults keep existing advisory behavior unless the setting is turned on. - No database migration in this PR; settings are stored through the existing instance settings path. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, `gpt-5`, coding model with tool use and local command execution; context window not exposed by the runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-27 08:46:44 -05:00
Dotta	1d8c7a09b8	[codex] Add security role route regression (#4586 ) ## Thinking Path > - Paperclip orchestrates AI agents through company-scoped control-plane workflows. > - Agent creation is one of the core board/operator surfaces for defining who works in a company. > - The shared taxonomy now includes a first-class `security` agent role. > - Direct agent creation must preserve that role through default instruction materialization and telemetry. > - A prior replacement PR covered this path, but Greptile identified that the route-test mock could let a future patch object shadow the regression. > - This pull request reopens the narrow regression coverage from current `master` with the mock ordering fixed. > - The benefit is a focused guardrail that keeps `security` role creation observable without expanding the production diff. ## What Changed - Added a direct agent creation route regression test for `role: "security"`. - Verified telemetry receives `agentRole: "security"` after the default instruction materialization update path. - Ordered the regression mock as `...patch` before `role: "security"` so future patch fields cannot shadow the asserted role. ## Verification - `pnpm install --frozen-lockfile` to link dependencies in the fresh worktree; it completed with existing plugin SDK bin warnings. - `pnpm exec vitest run server/src/__tests__/agent-skills-routes.test.ts packages/shared/src/adapter-types.test.ts` ## Risks - Low risk. This is test-only coverage and does not change runtime behavior. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 based coding agent, tool-enabled with local shell and repository editing capabilities. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots (N/A: no UI changes) - [x] I have updated relevant documentation to reflect my changes (N/A: test-only regression) - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-27 08:11:52 -05:00
Dotta	82e257c7ba	Cancel stale queued heartbeats when issue graph changes (PAP-2314) (#4534 ) Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-26 21:17:38 -05:00
Devin Foley	868d08903e	test: isolate CLI company import e2e state (#4560 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies, and its CLI import/export path is part of how operators move company state safely between environments. > - The `paperclipai company import/export` e2e test is supposed to validate that portability flow inside a hermetic harness, not against a developer's live Paperclip home. > - This regression showed nested CLI subprocesses could silently fall back to ambient `PAPERCLIP_*` state and mutate a real local instance by creating extra companies such as `CLI-1-Roundtrip-Test`. > - The first job was to pin the test subprocesses to isolated config, home, instance, auth, and context paths, and to add a regression assertion that proves the nested CLI writes stay inside the test-owned state. > - Once the PR was up, CI and Greptile exposed two follow-on issues that were blocking merge: plugin SDK typecheck bootstrap was racing across packages in fresh CI, and the new lock helper needed one more fix to release its lock on failure. > - This pull request therefore ends up doing two tightly related things: fixing the original CLI isolation leak, and hardening the supporting typecheck/bootstrap path enough for the fix to verify cleanly in CI. > - The benefit is that the portability e2e test is now actually isolated, and the PR verification path is stable enough to catch regressions instead of introducing its own nondeterministic failures. ## What Changed - Hardened `cli/src/__tests__/company-import-export-e2e.test.ts` so nested CLI subprocesses re-seed isolated `PAPERCLIP_CONFIG`, `PAPERCLIP_HOME`, `PAPERCLIP_INSTANCE_ID`, `PAPERCLIP_CONTEXT`, `PAPERCLIP_AUTH_STORE`, and throwaway `HOME` values instead of falling back to ambient machine state. - Added a regression assertion around `paperclipai context set --json`, then cleared the temporary `context.json` so the isolation check and the later export/import flow stay independent. - Passed the same isolated `HOME` into the server subprocess so both sides of the e2e harness are symmetric. - Introduced locking in `scripts/ensure-plugin-build-deps.mjs` and switched the server/plugin example `typecheck` scripts to use that helper instead of launching concurrent raw `@paperclipai/plugin-sdk` builds. - Fixed the helper failure path so it releases the lock before exiting non-zero, which prevents stale-lock timeouts during parallel typecheck runs. ## Verification - `pnpm vitest run cli/src/__tests__/company-import-export-e2e.test.ts --project paperclipai` - `pnpm --filter paperclipai typecheck` - `pnpm -r typecheck` - PR checks now pass on the current head, including `policy`, `verify`, `e2e`, `security/snyk`, and `Greptile Review`. ## Risks - Low risk. The product-facing behavior change is scoped to test harness code in the CLI e2e suite. - The CI stabilization changes only affect bootstrap/typecheck helper paths for the server and plugin/example packages, but they do touch shared verification plumbing; the main risk is changing how fresh build artifacts are prepared in local/CI typecheck runs. ## Model Used - Anthropic Claude via Paperclip `claude_local`, model `claude-opus-4-7`, high-effort local coding agent, used for the initial implementation and first peer-reviewed verification. - OpenAI Codex via Paperclip `codex_local`, model `gpt-5.4`, high reasoning-effort local coding agent with tool use, used for CI triage, Greptile follow-up fixes, verification, and PR maintenance. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-26 19:10:01 -07:00
Devin Foley	1d9f7a5149	Fix flaky heartbeat recovery teardown CI failure (#4559 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies. > - The linked CI job is in the server test/recovery path, where heartbeat runs and issue cleanup need to leave the control plane in a consistent state even when retries fail. > - In this case the failure was not runtime product behavior but test teardown behavior inside `heartbeat-process-recovery.test.ts`. > - The failing GitHub Actions job showed a foreign-key race on `company_skills_company_id_companies_id_fk` while the test tried to delete the parent company record. > - The surrounding teardown code already uses bounded retry cleanup for other dependent tables (`issues`, `heartbeatRuns`, and `agents`) because this test file intentionally exercises asynchronous recovery flows. > - This pull request applies that same retry pattern to the final `db.delete(companies)` step, re-clearing `companySkills` before each retry. > - The benefit is a targeted fix for the CI flake without changing runtime behavior or expanding the scope beyond the failing teardown path. ## What Changed - Wrapped the final `db.delete(companies)` call in `server/src/__tests__/heartbeat-process-recovery.test.ts` with the same 5-attempt retry pattern already used elsewhere in that teardown. - Re-cleared `companySkills` before each company-delete retry so late-arriving FK-dependent rows do not mask the real test result. - Verified the fix against the originally failing `heartbeat-process-recovery` test file and the broader `pnpm test:run` command under CI-like env conditions. ## Verification - `pnpm exec vitest run server/src/__tests__/heartbeat-process-recovery.test.ts` - Re-ran `pnpm exec vitest run server/src/__tests__/heartbeat-process-recovery.test.ts` multiple times locally; the previously failing teardown stayed green. - `env -u PAPERCLIP_API_URL -u PAPERCLIP_RUNTIME_API_URL -u PAPERCLIP_RUN_ID -u PAPERCLIP_TASK_ID -u PAPERCLIP_AGENT_ID -u PAPERCLIP_COMPANY_ID -u PAPERCLIP_API_KEY -u PAPERCLIP_WAKE_REASON -u PAPERCLIP_WAKE_COMMENT_ID -u PAPERCLIP_WAKE_PAYLOAD_JSON -u PAPERCLIP_APPROVAL_ID -u PAPERCLIP_APPROVAL_STATUS pnpm test:run` ## Risks - Low risk. The change is test-only and scoped to teardown retry behavior in a single server test file. - If the underlying async cleanup behavior changes again, this test could still become flaky in a different way, but this PR addresses the specific FK race seen in the linked CI job. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI `gpt-5.4` via Paperclip `codex_local`, high reasoning mode, with tool use for shell, git, HTTP API calls, and patch application. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-26 17:30:20 -07:00
Devin Foley	54ab0d24cd	Fix disappearing issue comments (#4557 ) ## Thinking Path > - Paperclip is a control plane for AI-agent companies, so issue detail pages are a primary surface for understanding agent work and human feedback. > - The relevant subsystem here is the issue comments/chat experience across the React issue detail page and the server comment pagination API. > - Long issue threads were only surfacing the newest page of comments at first render, which hid earlier human and agent messages behind extra pagination. > - The first UI fix exposed that the descending cursor path on the server could also fail for older-page fetches, leaving the chat tab stuck on an infinite "Loading earlier comments..." state. > - This needed to be addressed in both layers so the chat tab can surface earlier conversation history without manual recovery and without server errors. > - This pull request auto-loads earlier comment pages in the issue detail chat view and fixes the descending cursor predicate used by issue comment pagination. > - The benefit is that long-running issues like `PAPA-103` now show the missing conversation history near the top of the chat surface instead of hiding it or failing to load it. ## What Changed - Auto-load earlier issue comment pages in the issue detail chat tab until the thread reaches a 150-comment cap or there are no older comments left. - Add UI-side guard logic and regression coverage for optimistic issue comment pagination so the autoload behavior stops cleanly. - Replace the raw SQL descending cursor predicate in `issueService.listComments` with typed Drizzle comparisons for the `(createdAt, id)` anchor tuple. - Add a server regression test that paginates earlier comments in descending order from an anchor comment. - Smoke-test the exact previously failing seeded `PAPA-103` cursor path on the isolated dev instance used for review. ## Verification - `pnpm --filter @paperclipai/server exec vitest run src/__tests__/issues-service.test.ts` - `pnpm --filter @paperclipai/server typecheck` - Manual smoke against seeded `PAPA-103` data on the isolated dev server: - `GET /api/issues/PAPA-103/comments?order=desc&limit=50` returns `200` - `GET /api/issues/PAPA-103/comments?after=765d3609-edc6-4d11-a8fe-d466affbe85d&order=desc&limit=50` now returns `200` with 50 comments instead of `500` ## Risks - Moderate UI/perf risk on very large threads because the chat tab now prefetches multiple earlier pages on mount; the cap is intentionally limited to 150 comments to bound that work. - Low API risk because the server fix only changes the cursor predicate construction for anchor-based comment pagination, but any mistake there would affect older-comment paging order. > I checked `ROADMAP.md` before opening this PR and this bug fix does not duplicate planned core work. ## Model Used - OpenAI Codex coding agent in the Paperclip local adapter environment. The exact backend model ID and context window were not exposed in-session. Tool-assisted workflow included shell execution, git/GitHub CLI, local test execution, and targeted code edits. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [ ] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-26 16:23:53 -07:00
Devin Foley	b2496c8067	fix(auth): trust allowed hostname port variants on detected listen port (#4554 ) ## Thinking Path > - Paperclip is the control plane for autonomous AI companies, so authenticated board access has to be predictable across local and worktree deployments. > - This change sits in the authenticated-mode server startup and Better Auth origin-trust wiring. > - The original auth branch fixed one real gap by adding port-qualified trusted origins for allowed hostnames on non-default ports. > - Review of that branch found a second-order bug: trusted origins were still derived from the configured port before startup detected the actual listen port. > - In isolated worktrees, that meant a common `3100 -> 3101` port shift could still leave Better Auth trusting the stale origin. > - This pull request keeps the original allowed-hostname port-variant fix, then moves trust derivation onto the resolved listen port and adds regression coverage around startup wiring. > - The benefit is that authenticated sessions keep working on allowed private hostnames even when Paperclip has to auto-shift to a different local port. ## What Changed - Added `:port` trusted-origin variants for authenticated-mode `allowedHostnames` when Paperclip runs on non-default ports. - Changed authenticated startup so `listenPort` is detected before Better Auth initialization, and explicit auth base URLs are rewritten before auth startup. - Updated `deriveAuthTrustedOrigins()` to accept the resolved listen port so Better Auth trusts the actual browser origin instead of the stale configured port. - Added focused regression coverage in `server/src/__tests__/better-auth.test.ts` and `server/src/__tests__/server-startup-feedback-export.test.ts`. ## Verification - `pnpm exec vitest run server/src/__tests__/better-auth.test.ts server/src/__tests__/server-startup-feedback-export.test.ts` - Reviewer re-check: reviewed commits `380f5b9f` and `092bb34c` after the follow-up fix landed and found no remaining issues. ## Risks - Low risk: this only affects authenticated-mode origin derivation and startup ordering around detected listen ports. - Main behavioral shift: startup no longer mutates `config.port` to the selected port; it now carries `requestedListenPort` separately and uses `listenPort` where runtime behavior needs the resolved value. - If another path was implicitly relying on `config.port` being overwritten during startup, that path would need follow-up, though the current startup/test coverage did not reveal one. > I checked `ROADMAP.md` and did not find an overlapping planned core work item for this auth trusted-origin port handling fix. ## Model Used - OpenAI Codex via Paperclip `codex_local` agents for implementation and review. Exact backend model ID/context window were not surfaced in this run context; work was performed through the Codex local adapter with tool use, code execution, and review passes. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-26 15:40:39 -07:00
Devin Foley	08af830430	Tighten publicBaseUrl port rewriting (#4553 ) ## Thinking Path > - Paperclip is a control plane for autonomous agent companies, so its local and authenticated deployment behavior has to stay predictable under port rebinding and worktree isolation. > - This change sits in the server/worktree configuration path that derives runtime URLs and auth origins from `auth.publicBaseUrl`. > - The original hostname-port rewrite change fixed one real gap for private/tailnet host:port worktree setups, but it widened the rewrite rule too far. > - Rewriting every explicit `auth.publicBaseUrl` can corrupt public or reverse-proxy URLs by turning a stable origin like `https://paperclip.example` into a local listen-port URL. > - Paperclip's auth and trusted-origin handling depend on that URL staying semantically correct, so this had to be narrowed before merge. > - This pull request tightens the rewrite rule to explicit-port URLs only and adds regression coverage across the CLI helper, worktree config persistence, and server startup path. > - The benefit is that private host:port worktree flows still work, while public/default-port URLs remain stable and safe. ## What Changed - Tightened `rewriteLocalUrlPort` in `cli/src/commands/worktree-lib.ts`, `server/src/worktree-config.ts`, and `server/src/index.ts` so it only rewrites URLs that already include an explicit port. - Removed the old loopback-only hostname gate from the CLI/worktree helpers and replaced it with the more precise `parsed.port` guard. - Updated CLI helper coverage to assert that explicit-port non-loopback URLs still rewrite while no-port public URLs stay unchanged. - Expanded `server/src/__tests__/worktree-config.test.ts` to cover explicit-port rewrite and no-port stability for both persisted worktree config and in-memory runtime port selection. - Added startup-path coverage in `server/src/__tests__/server-startup-feedback-export.test.ts` for `detect-port` rebinding with both explicit-port and no-port `auth.publicBaseUrl` values. ## Verification - `pnpm --filter @paperclipai/plugin-sdk build` - `npx vitest run server/src/__tests__/server-startup-feedback-export.test.ts` - `npx vitest run cli/src/__tests__/worktree.test.ts server/src/__tests__/worktree-config.test.ts` - All of the above were run locally in this issue worktree and passed. ## Risks - Low risk. The behavior change is deliberately narrower than the reviewed broad-host rewrite and is guarded by regression coverage for both the explicit-port and no-port cases. - The main remaining risk is behavioral only if another code path starts depending on port rewriting for URLs that never declared a port, which would be a separate bug. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex local agent using `gpt-5.4` with high reasoning effort, tool use, shell execution, and file editing. - Anthropic Claude local agent using `claude-opus-4-6` for follow-up code review approval on the implementation issue. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-26 14:29:22 -07:00
Devin Foley	d47ffa87f0	Fix CEO AGENT_HOME paths and centralize workspace env propagation (#4551 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies. > - The local adapter layer is responsible for turning Paperclip runtime context into the environment seen by the child agent process. > - The CEO onboarding bundle tells the agent where to read and write its persistent memory and fact files. > - That bundle was using `./memory/...` and `./life/...`, which only works when the process cwd happens to equal the agent home directory. > - At the same time, six local adapters each duplicated the same workspace-env propagation logic, including `AGENT_HOME`, which makes this contract easy to drift. > - This pull request fixes the CEO instructions to use `$AGENT_HOME/...` and centralizes workspace-env propagation in one shared helper with shared tests. > - The benefit is a real bug fix for agent memory paths plus a single tested contract that makes future built-in adapter work less likely to forget `AGENT_HOME`. ## What Changed - Updated `server/src/onboarding-assets/ceo/HEARTBEAT.md` to use `$AGENT_HOME/memory/...` and `$AGENT_HOME/life/...` instead of cwd-relative `./memory/...` and `./life/...`. - Added `applyPaperclipWorkspaceEnv(...)` in `packages/adapter-utils/src/server-utils.ts` to centralize `PAPERCLIP_WORKSPACE_*` and `AGENT_HOME` propagation. - Added shared helper coverage in `packages/adapter-utils/src/server-utils.test.ts` for both populated and skip-empty cases. - Switched the built-in local adapters (`claude_local`, `codex_local`, `cursor_local`, `gemini_local`, `opencode_local`, `pi_local`) over to the shared helper instead of inline env assignment blocks. ## Verification - `pnpm install` - `pnpm exec vitest run packages/adapter-utils/src/server-utils.test.ts packages/adapters/claude-local/src/server/execute.remote.test.ts packages/adapters/codex-local/src/server/execute.remote.test.ts packages/adapters/cursor-local/src/server/execute.remote.test.ts packages/adapters/gemini-local/src/server/execute.remote.test.ts packages/adapters/opencode-local/src/server/execute.remote.test.ts packages/adapters/pi-local/src/server/execute.remote.test.ts` - Result: 7 test files passed, 31 tests passed, 0 failures. ## Risks - Low risk. - The only behavioral surface is the shared env propagation refactor across six adapters; if the helper diverged from prior semantics, an adapter could miss a workspace env var. - The shared helper test plus the affected adapter execute tests reduce that risk, and the helper preserves the prior "set only non-empty strings" behavior. ## Model Used - OpenAI Codex via Paperclip `codex_local` agent runtime; tool-assisted coding workflow with shell execution, file patching, git operations, and API interaction. The exact backend model identifier and context window are not surfaced by this local runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-26 13:57:35 -07:00
Devin Foley	91333ec86f	feat: add paperclip-dev skill with optional bundled skill support (#3854 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents working on the Paperclip codebase itself need guidance on dev workflows: server lifecycle, worktrees, builds, database ops, diagnostics > - There was no bundled skill covering these workflows — agents had to figure it out from scratch each time > - Additionally, not every skill should be force-installed on every agent — a dev-focused skill should be opt-in > - This PR adds a `paperclip-dev` skill with `required: false` frontmatter so it ships with Paperclip but isn't auto-installed > - The skill's PR section references canonical files (`.github/PULL_REQUEST_TEMPLATE.md`, `CONTRIBUTING.md`) instead of duplicating their content, with gated instructions that force agents to read those files before creating any PR > - The benefit is that developers (human or agent) can opt in to structured dev guidance without polluting the default agent skill set or creating drift between duplicated docs ## What Changed - Added `skills/paperclip-dev/SKILL.md` covering server management, worktree lifecycle, builds, database ops, diagnostics, agent operations, and common mistakes - The Pull Requests section uses gated, reference-based instructions — agents MUST read `.github/PULL_REQUEST_TEMPLATE.md` and `CONTRIBUTING.md` before running `gh pr create`, with a brief checklist of required section names (no content duplication) - Updated `packages/adapter-utils/src/server-utils.ts` to respect `required: false` frontmatter — optional skills are bundled but not auto-installed on agents - Added test in `server/src/__tests__/paperclip-skill-utils.test.ts` verifying that optional skills are excluded from the default install set ## Verification ```bash # Run tests pnpm test # Manual verification: create a fresh worktree without seeding npx paperclipai worktree:make test-optional-skill --no-seed cd ~/paperclip-test-optional-skill eval "$(npx paperclipai worktree env)" npx paperclipai run # Verify paperclip-dev appears in company skill library but is NOT auto-assigned # Call listPaperclipSkillEntries() — paperclip-dev should show required: false # Call resolvePaperclipDesiredSkillNames() — paperclip-dev should NOT be in the default set # Cleanup npx paperclipai worktree:cleanup test-optional-skill ``` ## Risks - Low risk. The `required` field defaults to `true` when absent, so all existing skills behave identically. Only the new `paperclip-dev` skill sets `required: false`. ## Model Used Claude Opus 4.6 (`claude-opus-4-6`) via Claude Code, with tool use and extended context. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-26 11:06:13 -07:00
Dotta	df425fde96	Present ordered sub-issues as a workflow checklist (#4523 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies. > - Operators use issue detail pages and child issue lists to understand multi-step execution plans. > - Ordered sub-issues currently read like a flat table, so dependency chains and current next steps are harder to scan. > - The branch work adds a workflow-oriented presentation for child issues without changing the single-assignee task model. > - This pull request makes ordered sub-issues read more like a progress checklist while preserving normal issue list controls. > - The benefit is that operators can see completed steps, active work, blocked follow-ups, and dependency order at a glance. ## What Changed - Added workflow sorting utilities and tests for dependency-aware child issue ordering. - Added sub-issue progress summary, checklist numbering, current-step affordances, blocker context, and done-state de-emphasis in the issue list UI. - Wired issue detail sub-issue panels to use the workflow sort/progress checklist presentation. - Updated issue service behavior/tests for child issue ordering inputs used by the UI. - Added a Storybook visual review fixture and screenshot helper for the sub-issue workflow checklist surface. ## Verification - `pnpm run preflight:workspace-links && pnpm exec vitest run server/src/__tests__/issues-service.test.ts ui/src/components/IssueRow.test.tsx ui/src/components/IssuesList.test.tsx ui/src/pages/IssueDetail.test.tsx ui/src/lib/issue-detail-subissues.test.ts ui/src/lib/workflow-sort.test.ts` - Result: 6 test files passed, 55 tests passed, 34 embedded Postgres issue-service tests skipped because `@embedded-postgres/darwin-x64` is unavailable on this host. - Visual review: generated Storybook screenshots from the existing local Storybook server on port 6006 with `node scripts/screenshot-subissues.mjs /tmp/pap-2189-subissues-screens http://localhost:6006`. - Screenshot artifacts: - Desktop dark: ![Desktop dark](doc/assets/pap-2189/desktop-1440x900-dark.png) - Desktop light: ![Desktop light](doc/assets/pap-2189/desktop-1440x900-light.png) - Mobile dark: ![Mobile dark](doc/assets/pap-2189/mobile-390x844-dark.png) - Mobile light: ![Mobile light](doc/assets/pap-2189/mobile-390x844-light.png) - Local Storybook note: starting a second Storybook process selected port 6008 because 6006 was occupied, then Vite failed with an esbuild host/binary version mismatch (`0.25.12` host vs `0.27.3` binary). The already-running Storybook server on 6006 served the fixture successfully for screenshots. ## Risks - Medium UI risk: the issue list now has additional sub-issue-specific visual states, so dense lists should be checked for spacing and scanability. - Low ordering risk: workflow sorting is covered by focused unit tests, but unusual dependency topologies may still need reviewer attention. - No migration risk: this PR does not add database migrations or touch `pnpm-lock.yaml`. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent, tool-enabled shell/git/GitHub workflow. Context window is runtime-provided and not exposed in this environment. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-26 07:36:49 -05:00
Devin Foley	5bd0f578fd	Generalize sandbox provider core for plugin-only providers (#4449 ) ## Thinking Path > - Paperclip is a control plane, so optional execution providers should sit at the plugin edge instead of hardcoding provider-specific behavior into core shared/server/ui layers. > - Sandbox environments are already first-class, and the fake provider proves the built-in path; the remaining gap was that real providers still leaked provider-specific config and runtime assumptions into core. > - That coupling showed up in config normalization, secret persistence, capabilities reporting, lease reconstruction, and the board UI form fields. > - As long as core knew about those provider-shaped details, shipping a provider as a pure third-party plugin meant every new provider would still require host changes. > - This pull request generalizes the sandbox provider seam around schema-driven plugin metadata and generic secret-ref handling. > - The runtime and UI now consume provider metadata generically, so core only special-cases the built-in fake provider while third-party providers can live entirely in plugins. ## What Changed - Added generic sandbox-provider capability metadata so plugin-backed providers can expose `configSchema` through shared environment support and the environments capabilities API. - Reworked sandbox config normalization/persistence/runtime resolution to handle schema-declared secret-ref fields generically, storing them as Paperclip secrets and resolving them for probe/execute/release flows. - Generalized plugin sandbox runtime handling so provider validation, reusable-lease matching, lease reconstruction, and plugin worker calls all operate on provider-agnostic config instead of provider-shaped branches. - Replaced hardcoded sandbox provider form fields in Company Settings with schema-driven rendering and blocked agent environment selection from the built-in fake provider. - Added regression coverage for the generic seam across shared support helpers plus environment config, probe, routes, runtime, and sandbox-provider runtime tests. ## Verification - `pnpm vitest --run packages/shared/src/environment-support.test.ts server/src/__tests__/environment-config.test.ts server/src/__tests__/environment-probe.test.ts server/src/__tests__/environment-routes.test.ts server/src/__tests__/environment-runtime.test.ts server/src/__tests__/sandbox-provider-runtime.test.ts` - `pnpm -r typecheck` ## Risks - Plugin sandbox providers now depend more heavily on accurate `configSchema` declarations; incorrect schemas can misclassify secret-bearing fields or omit required config. - Reusable lease matching is now metadata-driven for plugin-backed providers, so providers that fail to persist stable metadata may reprovision instead of resuming an existing lease. - The UI form is now fully schema-driven for plugin-backed sandbox providers; provider manifests without good defaults or descriptions may produce a rougher operator experience. ## Model Used - OpenAI Codex via `codex_local` - Model ID: `gpt-5.4` - Reasoning effort: `high` - Context window observed in runtime session metadata: `258400` tokens - Capabilities used: terminal tool execution, git, and local code/test inspection ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge	2026-04-24 18:03:41 -07:00
Dotta	deba60ebb2	Stabilize serialized server route tests (#4448 ) ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - The server route suite is a core confidence layer for auth, issue context, and workspace runtime behavior > - Some route tests were doing extra module/server isolation work that made local runs slower and more fragile > - The stable Vitest runner also needs to pass server-relative exclude paths to avoid accidentally re-including serialized suites > - This pull request tightens route test isolation and runner serialization behavior > - The benefit is more reliable targeted and stable-route test execution without product behavior changes ## What Changed - Updated `run-vitest-stable.mjs` to exclude serialized server tests using server-relative paths. - Forced the server Vitest config to use a single worker in addition to isolated forks. - Simplified agent permission route tests to create per-request test servers without shared server lifecycle state. - Stabilized issue goal context route mocks by using static mocked services and a sequential suite. - Re-registered workspace runtime route mocks before cache-busted route imports. ## Verification - `pnpm exec vitest run --project @paperclipai/server server/src/__tests__/agent-permissions-routes.test.ts server/src/__tests__/issues-goal-context-routes.test.ts server/src/__tests__/workspace-runtime-routes-authz.test.ts --pool=forks --poolOptions.forks.isolate=true` - `node --check scripts/run-vitest-stable.mjs` ## Risks - Low risk. This is test infrastructure only. - The stable runner path fix changes which tests are excluded from the non-serialized server batch, matching the server project root that Vitest applies internally. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5 coding agent, tool-enabled with shell/GitHub/Paperclip API access. Context window was not reported by the runtime. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>	2026-04-24 19:27:00 -05:00

1 2 3 4 5 ...

868 Commits