7bbdfb69dfb8b073d0aebfb4798102914871e895
7 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
b24c6909e8 |
Harden remote sandbox runtime probes, timeouts, and installs (#5685)
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent runs inside a sandbox environment so its CLI is isolated from the host
> - Sandbox-backed adapter runs go through a small set of shared helpers: `ensureAdapterExecutionTargetCommandResolvable`, the sandbox callback bridge runner, and per-adapter `SANDBOX_INSTALL_COMMAND` strings
> - When standing up new sandbox provider plugins, the existing helpers timed out, missed install fallbacks, or leaned on assumptions that only held for E2B
> - Local adapters (`claude-local`, `codex-local`, `gemini-local`, `opencode-local`) needed slightly hardened probes so they could install themselves and validate inside *any* remote sandbox transport, not just E2B
> - This pull request bundles those runtime fixes so future sandbox provider plugins inherit a working baseline
> - The benefit is that adding a new sandbox provider plugin no longer requires touching adapter-utils or each local-adapter probe, because the supporting infra is already correct
## What Changed
- `packages/adapter-utils/src/execution-target.ts`: introduce `DEFAULT_REMOTE_SANDBOX_ADAPTER_TIMEOUT_SEC = 1800` and `resolveAdapterExecutionTargetTimeoutSec(...)`. Local and SSH adapters keep the historical "0 means no adapter timeout" behavior; sandbox-backed runs without an explicit `timeoutSec` get an explicit 30-minute default so remote installs and warm-up don't time out at the per-RPC default. Plumbed `timeoutSec` through `ensureAdapterExecutionTargetCommandResolvable` so install probes inside a sandbox honor adapter-level overrides instead of the bridge's 5-minute default.
- `packages/adapters/opencode-local/src/index.ts`: switch `SANDBOX_INSTALL_COMMAND` from `npm install -g opencode-ai` to `curl -fsSL https://opencode.ai/install | bash`. The npm package reifies four large prebuilt-binary subpackages in parallel even though only one matches the host arch; on bandwidth-constrained sandboxes that blew through the 240s install budget. The official installer fetches one arch-specific binary and adds `$HOME/.opencode/bin` to PATH via `~/.bashrc`, which the sandbox-callback-bridge login-shell script already sources.
- `packages/adapters/{claude,codex,gemini,opencode}-local/`: harden remote-target probes by passing `--skip-git-repo-check` for Codex when probing outside a repo, normalizing permission flags for Claude, and adding `*.remote.test.ts` coverage that exercises the remote-sandbox path explicitly for each adapter.
- `packages/adapter-utils/src/sandbox-install-command.{ts,test.ts}` (new): add `buildSandboxNpmInstallCommand` helper. `server/src/adapters/registry.ts` + new `server/src/__tests__/adapter-registry.test.ts`: wire adapter install commands so they fall back to a writable `$HOME/.local` prefix when global install isn't available.
- `server/src/__tests__/plugin-worker-manager.test.ts` + new `server/src/__tests__/fixtures/plugin-worker-delayed.cjs`: pin per-call timeout overrides so plugin worker exec calls honor the caller's timeout instead of the worker's default.
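The timeout rule described under What Changed can be sketched as below. This is a minimal illustration, not the code in `execution-target.ts`: the `ExecutionTargetSketch` type and the `resolveTimeoutSecSketch` name are hypothetical, and treating only a positive number as an explicit override is an assumption (the PR text does not say how an explicit `0` on a sandbox target is handled). Only the constant name and its value of 1800 come from the description.

```typescript
// Hypothetical target shape; the real adapter execution target type is not shown in the PR text.
type ExecutionTargetSketch =
  | { kind: "local" }
  | { kind: "remote"; transport: "ssh" | "sandbox" };

// Name and value are taken from the PR description: 1800 seconds is 30 minutes.
const DEFAULT_REMOTE_SANDBOX_ADAPTER_TIMEOUT_SEC = 1800;

// Returns the adapter-level timeout in seconds; 0 means "no adapter timeout".
function resolveTimeoutSecSketch(
  target: ExecutionTargetSketch,
  configuredTimeoutSec?: number | null,
): number {
  // Assumption: only a positive number counts as an explicit override.
  if (typeof configuredTimeoutSec === "number" && configuredTimeoutSec > 0) {
    return configuredTimeoutSec;
  }
  // Sandbox-backed runs without an override get the 30-minute default.
  if (target.kind === "remote" && target.transport === "sandbox") {
    return DEFAULT_REMOTE_SANDBOX_ADAPTER_TIMEOUT_SEC;
  }
  // Local and SSH targets keep the historical "0 means no timeout" behavior.
  return 0;
}

const sandboxTarget: ExecutionTargetSketch = { kind: "remote", transport: "sandbox" };
const sshTarget: ExecutionTargetSketch = { kind: "remote", transport: "ssh" };
const sandboxDefault = resolveTimeoutSecSketch(sandboxTarget);
const sandboxOverride = resolveTimeoutSecSketch(sandboxTarget, 600);
const sshDefault = resolveTimeoutSecSketch(sshTarget);
```

Keeping the default in the resolver rather than in each adapter means a new sandbox provider gets the longer budget without any adapter-side change, which matches the stated goal of the PR.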
## Verification
- `pnpm typecheck`
- `pnpm exec vitest run --no-coverage packages/adapter-utils/src/execution-target-sandbox.test.ts packages/adapter-utils/src/sandbox-install-command.test.ts`
- `pnpm exec vitest run --no-coverage server/src/__tests__/plugin-worker-manager.test.ts server/src/__tests__/adapter-registry.test.ts server/src/__tests__/claude-local-adapter-environment.test.ts server/src/__tests__/claude-local-execute.test.ts server/src/__tests__/gemini-local-adapter-environment.test.ts`
- `pnpm exec vitest run --no-coverage packages/adapters/codex-local/src/server/test.remote.test.ts packages/adapters/opencode-local/src/server/test.remote.test.ts packages/adapters/codex-local/src/server/codex-args.test.ts packages/adapters/codex-local/src/server/execute.remote.test.ts packages/adapters/gemini-local/src/server/execute.remote.test.ts`

All passing locally.
## Risks
- Touches shared `adapter-utils` and several `*-local` adapters. The 30-minute default applies only when both (a) the target is `remote+sandbox` and (b) no `timeoutSec` is configured; local + SSH paths are unchanged. New test coverage was added alongside each behavior change to pin the contracts.
- Switching OpenCode's install command to the official installer is a behavior change for any operator running OpenCode inside a remote sandbox. Local installs are unaffected (the `SANDBOX_INSTALL_COMMAND` only runs when an adapter is being installed inside a sandbox).
- Low risk overall: no migrations, no API surface change.
## Model Used
- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep), no code execution beyond local repo commands
## Checklist
- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after screenshots (N/A, no UI change)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge

Co-authored-by: Paperclip <noreply@paperclip.ing>
|
||
|
|
778e775c35 |
Add secrets provider vaults and remote import (#5429)
## Thinking Path
> - Paperclip orchestrates AI-agent companies and needs secrets handling to work across local development, hosted operators, and governed agent execution.
> - The affected subsystem is the company-scoped secrets control plane: database schema, server services/routes, CLI workflows, and the Secrets settings UI.
> - The gap was that secrets were local-only and operators could not manage provider vaults or import existing remote references without exposing plaintext.
> - This branch adds provider vault configuration plus an AWS Secrets Manager remote-import path while preserving company boundaries, binding context, and audit trails.
> - I kept the PR to a single branch PR, removed unrelated lockfile/package drift, rebased the full branch onto the current `public-gh/master`, and addressed fresh Greptile findings.
> - The benefit is a reviewable implementation of provider-backed secrets with focused tests covering provider selection, import conflicts, deleted secret reuse, rotation guards, and AWS signing behavior.
## What Changed
- Added provider vault support for company secrets, including provider config storage, default vault handling, health checks, binding usage, access events, and remote import preview/commit.
- Added an AWS Secrets Manager provider using SigV4 request signing, bounded request timeouts, namespace guardrails, cached runtime credential resolution, and external-reference linking without plaintext reads.
- Added Secrets UI surfaces for vault management and remote import, plus CLI/API documentation for setup and operations.
- Stabilized routine webhook secret binding paths and SSH environment-driver fixture bindings discovered during verification.
- Addressed Greptile and CI findings: no lockfile/package drift, monotonic migration metadata, disabled-vault default races, soft-deleted secret hiding/recreate behavior, remove behavior with disabled vaults, soft-deleted external-reference re-import, non-active rotation guards, managed-secret soft deletion through PATCH, and per-call AWS SDK credential client churn.
- Rebased this branch onto `public-gh/master` at `0e1a5828` and force-pushed with lease to keep this as the single PR for the branch.
## Verification
- `git fetch public-gh master`
- `git rebase public-gh/master`
- `git diff --name-only public-gh/master...HEAD | grep '^pnpm-lock\.yaml$' || true` confirmed `pnpm-lock.yaml` is not in the PR diff.
- Confirmed migration ordering: master ends at `0081_optimal_dormammu`; this PR adds `0082_dry_vision` and `0083_company_secret_provider_configs`.
- Inspected migrations for repeat safety: new tables/indexes use `IF NOT EXISTS`; foreign keys are guarded by `DO $$ ... IF NOT EXISTS`; column additions use `ADD COLUMN IF NOT EXISTS`.
- `pnpm -r typecheck` passed before the Greptile follow-up commits.
- `pnpm test:run` ran the full stable Vitest path before the Greptile follow-up commits; it completed with 3 timing-related failures under parallel load: `codex-local-execute.test.ts`, `cursor-local-execute.test.ts`, and `environment-service.test.ts`.
- `pnpm --filter @paperclipai/server exec vitest run src/__tests__/codex-local-execute.test.ts src/__tests__/cursor-local-execute.test.ts src/__tests__/environment-service.test.ts` passed on targeted rerun (`24/24`).
- `pnpm build` passed before the Greptile follow-up commits. Vite reported existing chunk-size/dynamic-import warnings.
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server exec vitest run src/__tests__/secrets-service.test.ts` passed (`26/26`).
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server exec vitest run src/__tests__/aws-secrets-manager-provider.test.ts src/__tests__/secrets-service.test.ts` passed (`39/39`).
- After Greptile follow-up commits: `pnpm --filter @paperclipai/server typecheck` passed.
- Captured Storybook screenshots from `ui/storybook-static` for visual review.
- Latest PR checks on `5ca3a5cf`: `policy`, serialized server suites 1/4-4/4, `Canary Dry Run`, `e2e`, `security/snyk`, and `Greptile Review` pass; aggregate `verify` is still registering the completed child checks.
- Greptile review loop continued through the latest requested pass; all Greptile review threads are resolved and the latest `Greptile Review` check on `5ca3a5cf` passed with 0 comments added.
## Screenshots
Before: the provider-vault and remote-import surfaces did not exist on `master`; these are after-state screenshots from the Storybook fixtures.
## Risks
- Migration risk: this adds new secret provider tables and extends existing secret rows. The migrations were checked for monotonic ordering and idempotent guards, but reviewers should still inspect upgrade behavior carefully.
- Provider risk: AWS support uses direct SigV4 requests. Automated tests cover signing, request timeouts, vault-config selection, namespace guardrails, pending-version archival, sanitized provider errors, and service-level cleanup paths. A real-vault AWS smoke test remains deployment validation for an operator with AWS credentials rather than an unverified merge blocker in this local branch.
- UI risk: the Secrets page and import dialog are large new surfaces; screenshots are included above for reviewer inspection.
- Verification risk: the full local stable test command hit parallel-load timing failures, although the exact failed files passed when rerun directly.
- Operational risk: remote import intentionally avoids plaintext reads; operators must understand that imported external references resolve at runtime and may fail if AWS permissions change.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected, so check the roadmap first. See `CONTRIBUTING.md`.
## Model Used
- OpenAI Codex, GPT-5 coding agent with local shell/tool use in the Paperclip worktree. Exact context-window size was not exposed by the runtime.
## Checklist
- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [ ] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
f6bad8f6bf |
Sanitize remote execution envs at the boundary (#5325)
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Adapters spawn CLIs against local, SSH, and sandbox targets, threading a runtime env through `runAdapterExecutionTargetProcess` and the SSH/sandbox runners
> - Host identity vars (HOME, TMPDIR, XDG_*, NVM_DIR, PATH) routinely leak into the env we send to remote targets (sometimes via test probes, sometimes via runtime config) and break sandboxed/SSH'd CLIs whose own profiles set those values correctly
> - The sanitization logic existed but lived alongside other helpers in `server-utils.ts` and was applied piecemeal at adapter callsites, so it was easy to bypass
> - This pull request lifts the sanitization into a standalone `remote-execution-env.ts`, applies it at the SSH and sandbox runtime boundary so every remote spawn goes through it, and removes the duplicated callsite-level filtering
> - The benefit is that identity-bound host env stops leaking across SSH/sandbox transports regardless of which adapter calls in
## What Changed
- `packages/adapter-utils/src/remote-execution-env.ts`: new module, the single source of truth for which env keys are identity-bound and how to strip them when the value matches the host's value
- `packages/adapter-utils/src/server-utils.ts`: remove the inline sanitization (now in `remote-execution-env.ts`)
- `packages/adapter-utils/src/execution-target.ts`: apply sanitization at the sandbox runtime boundary
- `packages/adapter-utils/src/ssh.ts`: apply sanitization at the SSH spawn boundary
- `packages/adapters/opencode-local/src/server/test.ts`: drop now-redundant callsite filtering
- `packages/adapters/pi-local/src/server/test.ts`: drop now-redundant callsite filtering
- New tests `execution-target.test.ts` and `execution-target-sandbox.test.ts` cover the sanitizer flow at both transports, including positive cases (host-shaped path stripped) and explicit-override preservation
## Verification
- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils --project @paperclipai/adapter-opencode-local --project @paperclipai/adapter-pi-local`
- `pnpm typecheck` clean
## Risks
Low to medium. The sanitization is now applied at one layer (boundary) instead of N (callsites), so behavior is more consistent. Any adapter that previously relied on a leaked host var landing on the remote shell would now see it stripped, but those reliances were what this change exists to fix.
## Model Used
Claude Opus 4.7 (1M context)
## Checklist
- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable (new tests at both transports)
- [x] If this change affects the UI, I have included before/after screenshots: N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge
|
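The stripping rule from #5325 (drop an identity-bound key only when its value equals the host's value, keep explicit overrides) can be sketched roughly as follows. This is an assumption-laden illustration rather than the contents of `remote-execution-env.ts`: the function and type names are hypothetical, the key list is just the set named in the Thinking Path, and the host env is passed in as a plain object so the sketch stays self-contained.

```typescript
type EnvSketch = { [key: string]: string | undefined };

// Keys named in the PR description; the real module may cover more.
const IDENTITY_BOUND_KEYS = ["HOME", "TMPDIR", "NVM_DIR", "PATH"];

function isIdentityBoundKey(key: string): boolean {
  // XDG_* is matched by prefix, the others by exact name.
  return IDENTITY_BOUND_KEYS.indexOf(key) !== -1 || key.indexOf("XDG_") === 0;
}

// Hypothetical helper: returns a copy of `env` that is safe to send to a remote target.
function sanitizeRemoteEnvSketch(env: EnvSketch, hostEnv: EnvSketch): { [key: string]: string } {
  const out: { [key: string]: string } = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (value === undefined) continue;
    // A value identical to the host's is treated as a leak and dropped.
    if (isIdentityBoundKey(key) && hostEnv[key] === value) continue;
    // Anything else, including an explicit override of HOME or PATH, is preserved.
    out[key] = value;
  }
  return out;
}

const hostEnvExample: EnvSketch = {
  HOME: "/Users/alice",
  PATH: "/usr/bin:/bin",
  XDG_CONFIG_HOME: "/Users/alice/.config",
};
const sanitizedExample = sanitizeRemoteEnvSketch(
  {
    HOME: "/Users/alice", // same as host: stripped
    PATH: "/opt/tool/bin", // differs from host: kept as an explicit override
    XDG_CONFIG_HOME: "/Users/alice/.config", // same as host: stripped
    API_TOKEN: "example", // not identity-bound: kept
  },
  hostEnvExample,
);
```

Comparing against the host value, instead of dropping the keys unconditionally, is what lets an operator still set `PATH` or `HOME` deliberately for the remote side.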
||
|
|
076067865f |
Migrate SSH environment callback to bridge (#5116)
> **Stacked PR (part 3 of 7).** Depends on:
> - PR #5114
> - PR #5115
>
> Diff against `master` includes commits from earlier PRs in the stack; the new commit in this PR is the topmost one.
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Agents executing on a remote SSH-backed environment need a way to call back into the Paperclip control plane (run events, log streaming, signals)
> - When the SSH host can't reach the Paperclip host (NAT, firewalls, or simply not on the same network), the run silently fails or hangs, a recurring class of failure during SSH testing
> - In sandboxed environments we already solved this with a callback bridge that tunnels back through the existing connection; SSH was the odd one out
> - This PR migrates SSH execution to use the same callback bridge, so every adapter's remote run uses one consistent reverse-channel. Per-adapter SSH glue is deleted in favour of a shared `CommandManagedRuntimeRunner` built from the SSH spec
> - The benefit is fewer SSH-specific failure modes, a smaller code surface, and one place to evolve the callback contract going forward
## What Changed
- Added `createSshCommandManagedRuntimeRunner` in `packages/adapter-utils/src/ssh.ts` that adapts an SSH spec into a generic command-managed-runtime runner (with cwd, env, and timeout handling)
- Removed `paperclipApiUrl` from `SshRemoteExecutionSpec`; the bridge URL now flows through the shared runner
- Reworked `execution-target.ts` to use the SSH runner alongside sandbox runners via a unified `CommandManagedRuntimeRunner` interface
- Simplified `remote-managed-runtime.ts` and `sandbox-managed-runtime.ts` to consume the shared runner abstraction
- Deleted per-adapter SSH callback wiring from the claude-local, codex-local, cursor-local, gemini-local, opencode-local, and pi-local `execute.ts` files
- Removed `environment-runtime-driver-contract.test.ts` (the contract is now enforced by `environment-execution-target.test.ts`)
- Added/updated `execute.remote.test.ts` cases for each adapter to cover the SSH runner path
## Verification
- `pnpm --filter @paperclipai/adapter-utils test`
- `pnpm test -- execute.remote` (covers all six local adapters' SSH paths)
- Manual QA: ran a claude-local agent against an SSH-backed environment, confirmed the agent successfully called back to `/api/agent-callback/*` endpoints during the run
## Risks
- Refactor touches all six local adapters. If any adapter had subtle SSH-specific behaviour that wasn't captured in tests, it could regress. Mitigation: each adapter's `execute.remote.test.ts` was extended.
- `paperclipApiUrl` removal from `SshRemoteExecutionSpec` is a breaking type change for any internal consumer. Verified no external plugins consume this type.
- The new `CommandManagedRuntimeRunner` shape is a public surface in `@paperclipai/adapter-utils`; downstream plugins implementing custom runners may need updates, but no such plugins exist in this repo.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after screenshots: N/A
- [ ] I have updated relevant documentation to reflect my changes: N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge
|
||
|
|
a7b45938b7 |
Let sandbox providers declare shell defaults (#5114)
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Agents execute in sandboxed remote environments served by pluggable sandbox providers (E2B today, more later)
> - Today every sandbox command runs under `sh -lc` regardless of what the provider's container actually ships
> - That misses bash-only shell init on E2B (which ships bash) and prevents future providers from declaring a different default; there's no way for a provider to say "I have bash, use it"
> - This PR adds a `shellCommand` field to sandbox execution targets so providers can declare their preferred shell ("bash" for E2B), threads it through the sandbox-managed-runtime client, callback bridge, and execution-target shell helper, and validates the value at the lease-metadata boundary
> - The benefit is that sandbox commands run under the right shell on the right provider, and adding new sandbox providers only needs to declare a shell preference
## What Changed
- Added `packages/adapter-utils/src/sandbox-shell.ts` exporting `preferredShellForSandbox(shellCommand)` (returns `"bash"` if input is `"bash"`, else `"sh"`)
- Added `shellCommand?: "bash" | "sh" | null` to `AdapterSandboxExecutionTarget` and `CommandManagedRuntimeSpec`; threaded it through `runAdapterExecutionTargetShellCommand`, `prepareAdapterExecutionTargetRuntime`, and `startAdapterExecutionTargetPaperclipBridge`
- `createCommandManagedRuntimeClient`, `prepareCommandManagedRuntime`, and `createCommandManagedSandboxCallbackBridgeQueueClient` now take an optional `shellCommand` and use `preferredShellForSandbox` to pick the shell
- `startSandboxCallbackBridgeServer` accepts a `shellCommand` for its server startup, readiness probe, and stop hook
- E2B sandbox plugin declares `shellCommand: "bash"` in `leaseMetadata`
- `resolveEnvironmentExecutionTarget` reads `shellCommand` from lease metadata (validating against `"bash" | "sh" | null`)
- `environment-runtime.ts` adds `"shellCommand"` to `INTERNAL_PLUGIN_SANDBOX_CONFIG_KEYS` so the field round-trips through internal plugin config without leaking to external plugin metadata
- Updated tests in `command-managed-runtime.test.ts`, `execution-target-sandbox.test.ts`, `sandbox-callback-bridge.test.ts`, `environment-execution-target.test.ts`
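
The shell selection described in the list above can be sketched as below. `preferredShellForSandbox` follows the behavior stated in the first bullet; the other two helpers are hypothetical illustrations, and falling back to `null` for an unrecognized lease-metadata value (rather than throwing) is an assumption, since the PR text only says the value is validated.

```typescript
type SandboxShell = "bash" | "sh";

// Behavior as stated in the PR: "bash" stays "bash", everything else resolves to "sh".
function preferredShellForSandbox(shellCommand?: string | null): SandboxShell {
  return shellCommand === "bash" ? "bash" : "sh";
}

// Hypothetical validator for the lease-metadata boundary.
// Assumption: unknown values are dropped to null instead of raising an error.
function readShellCommandSketch(leaseMetadata: { [key: string]: unknown }): SandboxShell | null {
  const value = leaseMetadata["shellCommand"];
  if (value === "bash") return "bash";
  if (value === "sh") return "sh";
  return null;
}

// Hypothetical helper: builds the argv for running a script under a login shell,
// matching the `sh -lc` / `bash -lc` forms mentioned in the PR.
function buildLoginShellArgvSketch(shellCommand: string | null | undefined, script: string): string[] {
  return [preferredShellForSandbox(shellCommand), "-lc", script];
}

const e2bArgv = buildLoginShellArgvSketch(readShellCommandSketch({ shellCommand: "bash" }), "echo ready");
const legacyArgv = buildLoginShellArgvSketch(readShellCommandSketch({}), "echo ready");
```

With this shape, a lease that predates the field reads as `null` and still runs under `sh`, which is the backwards-compatible path the Risks section relies on.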
## Verification
- `pnpm --filter @paperclipai/adapter-utils test`
- `pnpm --filter @paperclipai/server test -- environment-execution-target`
- `pnpm --filter @paperclipai/sandbox-providers-e2b test`
- Manual QA: boot a Paperclip instance, create an E2B-backed environment, run a claude_local agent against it, and confirm the run completes (verifies bash shell semantics flow through the callback bridge end-to-end)
## Risks
- E2B sandbox commands now run under `bash -lc` instead of `sh -lc`. Bash is a strict superset for the commands we issue (no busybox-only flags in our shell scripts), so risk is low. The `shellCommand` field is opt-in via lease metadata; providers that don't declare it stay on `sh`.
- New optional field on `CommandManagedRuntimeSpec` and `AdapterSandboxExecutionTarget`. Consumers ignoring the field retain previous behaviour (`sh`).
- Lease metadata now carries an additional field. Existing leases without `shellCommand` resolve to `null` and fall back to `sh`, which is backwards compatible.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after screenshots: N/A (no UI changes)
- [ ] I have updated relevant documentation to reflect my changes: N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge
|
||
|
|
a4ac6ff133 |
Add sandbox callback bridge for remote environment API access (#4801)
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Agents can run inside sandboxed environments like E2B, which are isolated from the host network
> - Sandboxed agents need to call back to the Paperclip API to report progress, post comments, and update issue status
> - But sandbox environments cannot reach the Paperclip server directly because they run in isolated network namespaces
> - This PR adds a callback bridge that proxies API requests from the sandbox to the Paperclip server, running as a local HTTP server on the host that forwards authenticated requests
> - The bridge is started automatically when an adapter launches a sandbox execution, and torn down when the run completes
> - The benefit is sandboxed agents can interact with the Paperclip API without requiring network-level access to the host, enabling E2B and similar providers to work end-to-end
## What Changed
- Added `sandbox-callback-bridge.ts` in `packages/adapter-utils/`, a lightweight HTTP bridge server that accepts requests from sandbox environments and proxies them to the Paperclip API with authentication
- Added request validation and security policy: the bridge only forwards requests to the configured API URL, validates content types, enforces size limits, and rejects non-API paths
- Wired the bridge into all remote adapter execute paths (claude, codex, cursor, gemini, pi); the bridge starts before the agent process and the bridge URL is passed via environment variables
- Updated `environment-execution-target.ts` to prefer the explicit API URL from environment lease metadata for sandbox callback routing
- Fixed Claude sandbox runtime setup to work with the bridge configuration
- Added comprehensive test coverage for bridge request handling, policy enforcement, and sandbox execution integration
- Fixed browser bundling: the bridge module is excluded from the frontend bundle via the adapter-utils index export
## Verification
- `pnpm test`: all existing and new tests pass, including bridge unit tests and sandbox execution integration tests
- `pnpm typecheck`: clean
- Manual: configure an E2B environment, run an agent task, verify the agent can post comments and update issue status through the bridge
## Risks
- Medium. This is a new network-facing component (HTTP server on localhost). The security policy restricts forwarding to the configured API URL only and validates all requests, but any proxy introduces attack surface. The bridge binds to localhost only and is scoped to the lifetime of a single agent run.
## Model Used
Codex GPT 5.4 high via Paperclip.
## Checklist
- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge
|
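The request policy listed for #4801 (API paths only, content-type validation, size limits) could look roughly like the check below. Everything here is a sketch under stated assumptions: the type names, the `/api/` prefix, the allowed content types, the 1 MiB limit, and the status codes are all hypothetical and are not taken from `sandbox-callback-bridge.ts`.

```typescript
// Hypothetical shapes; the real bridge types are not shown in the PR text.
interface BridgeRequestSketch {
  path: string;
  contentType?: string;
  bodyBytes: number;
}

interface BridgePolicySketch {
  apiPathPrefix: string;
  allowedContentTypes: string[];
  maxBodyBytes: number;
}

interface BridgeDecisionSketch {
  ok: boolean;
  status: number;
  reason: string;
}

function checkBridgeRequestSketch(req: BridgeRequestSketch, policy: BridgePolicySketch): BridgeDecisionSketch {
  // Reject anything outside the API prefix, including traversal attempts.
  const hasTraversal = req.path.split("/").indexOf("..") !== -1;
  if (req.path.indexOf(policy.apiPathPrefix) !== 0 || hasTraversal) {
    return { ok: false, status: 404, reason: "non-API path" };
  }
  // Enforce the body size limit before looking at the payload.
  if (req.bodyBytes > policy.maxBodyBytes) {
    return { ok: false, status: 413, reason: "body too large" };
  }
  // Only requests that carry a body need a recognized content type.
  if (req.bodyBytes > 0) {
    const baseType = (req.contentType || "").split(";")[0].trim().toLowerCase();
    if (policy.allowedContentTypes.indexOf(baseType) === -1) {
      return { ok: false, status: 415, reason: "unsupported content type" };
    }
  }
  return { ok: true, status: 200, reason: "forward" };
}

// Illustrative policy values only.
const examplePolicy: BridgePolicySketch = {
  apiPathPrefix: "/api/",
  allowedContentTypes: ["application/json"],
  maxBodyBytes: 1024 * 1024,
};
const allowedDecision = checkBridgeRequestSketch(
  { path: "/api/issues/1/comments", contentType: "application/json; charset=utf-8", bodyBytes: 128 },
  examplePolicy,
);
const rejectedDecision = checkBridgeRequestSketch({ path: "/admin", bodyBytes: 0 }, examplePolicy);
```

Keeping the policy as a pure function separate from the HTTP server makes it straightforward to unit test each rejection path without opening a socket.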
||
|
|
70679a3321 |
Add sandbox environment support (#4415)
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies.
> - The environment/runtime layer decides where agent work executes and how the control plane reaches those runtimes.
> - Today Paperclip can run locally and over SSH, but sandboxed execution needs a first-class environment model instead of one-off adapter behavior.
> - We also want sandbox providers to be pluggable so the core does not hardcode every provider implementation.
> - This branch adds the Sandbox environment path, the provider contract, and a deterministic fake provider plugin.
> - That required synchronized changes across shared contracts, plugin SDK surfaces, server runtime orchestration, and the UI environment/workspace flows.
> - The result is that sandbox execution becomes a core control-plane capability while keeping provider implementations extensible and testable.
## What Changed
- Added sandbox runtime support to the environment execution path, including runtime URL discovery, sandbox execution targeting, orchestration, and heartbeat integration.
- Added plugin-provider support for sandbox environments so providers can be supplied via plugins instead of hardcoded server logic.
- Added the fake sandbox provider plugin with deterministic behavior suitable for local and automated testing.
- Updated shared types, validators, plugin protocol definitions, and SDK helpers to carry sandbox provider and workspace-runtime contracts across package boundaries.
- Updated server routes and services so companies can create sandbox environments, select them for work, and execute work through the sandbox runtime path.
- Updated the UI environment and workspace surfaces to expose sandbox environment configuration and selection.
- Added test coverage for sandbox runtime behavior, provider seams, environment route guards, orchestration, and the fake provider plugin.
## Verification
- Ran locally before the final fixture-only scrub:
  - `pnpm -r typecheck`
  - `pnpm test:run`
  - `pnpm build`
- Ran locally after the final scrub amend:
  - `pnpm vitest run server/src/__tests__/runtime-api.test.ts`
- Reviewer spot checks:
  - create a sandbox environment backed by the fake provider plugin
  - run work through that environment
  - confirm sandbox provider execution does not inherit host secrets implicitly
## Risks
- This touches shared contracts, plugin SDK plumbing, server runtime orchestration, and UI environment/workspace flows, so regressions would likely show up as cross-layer mismatches rather than isolated type errors.
- Runtime URL discovery and sandbox callback selection are sensitive to host/bind configuration; if that logic is wrong, sandbox-backed callbacks may fail even when execution succeeds.
- The fake provider plugin is intentionally deterministic and test-oriented; future providers may expose capability gaps that this branch does not yet cover.
## Model Used
- OpenAI Codex coding agent on a GPT-5-class backend in the Paperclip/Codex harness. Exact backend model ID is not exposed in-session. Tool-assisted workflow with shell execution, file editing, git history inspection, and local test execution.
## Checklist
- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge
|