forked from farhoodlabs/paperclip
8014445b238531a00a6b91e79e94de6be9981f59
3 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e3c875c1c7 |
fix(sandbox): prevent E2B workspace upload + lease idle failures (PAPA-382) (#6560)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Heartbeats run inside managed sandboxes (E2B, Cloudflare Sandbox), and each run begins by uploading the agent's workspace as a tar archive > - PAPA-381's E2B runs were failing at 5 and 11 minutes — two distinct failure modes were entangled: workspace tar extraction errors on Linux, and sandbox idle/lease timeouts during normal heartbeat gaps > - Workspace tar extraction failed because macOS bsdtar embeds `LIBARCHIVE.xattr.*` PAX headers that GNU tar on Linux rejects with "This does not look like a tar archive"; the existing `COPYFILE_DISABLE=1` only suppresses AppleDouble `._*` sidecars, not inline PAX xattr entries > - E2B sandboxes also expired between heartbeats because `timeoutMs` defaulted to a short window and was never refreshed per execute, and Cloudflare sandboxes idled out because `sleepAfter` defaulted to 10 minutes > - This pull request adds `--no-xattrs` to the workspace tar invocation, refreshes the E2B sandbox lifetime on each execute and bumps the default `timeoutMs` to 1h, and raises the Cloudflare `sleepAfter` default to 1h > - The benefit is that long-running heartbeat-driven runs (Claude, Codex, etc.) survive across both their initial workspace upload and the natural idle gaps between executes on both E2B and Cloudflare ## What Changed - `packages/adapter-utils/src/sandbox-managed-runtime.ts`: added `--no-xattrs` to `createTarballFromDirectory` so macOS bsdtar produces a clean POSIX tar that GNU tar on Linux can extract, with an inline comment explaining why `COPYFILE_DISABLE=1` alone is insufficient. - `packages/plugins/sandbox-providers/e2b/src/plugin.ts`: refresh the sandbox lifetime on every execute (so long runs don't expire mid-job) and raised the default `timeoutMs` to 1h. - `packages/plugins/sandbox-providers/e2b/src/manifest.ts` and `plugin.test.ts`: updated manifest defaults and added regression coverage for the new behavior. - `packages/plugins/sandbox-providers/cloudflare/src/config.ts`, `manifest.ts`, `plugin.test.ts`: raised default `sleepAfter` from 10m to 1h, mirroring the E2B 1h default, and added a regression test asserting the acquire-lease request body sends `sleepAfter: "1h"` when not overridden. ## Verification - `pnpm --filter @paperclipai/plugin-e2b test` - `pnpm --filter @paperclipai/plugin-cloudflare-sandbox test` - Locally cherry-picked the `--no-xattrs` fix onto master and confirmed end-to-end via a real PAPA-381-style heartbeat-driven E2B run that the workspace upload now extracts cleanly on Linux. The user (board operator) tested this on master and reported "Ok, that worked." - Manual reviewer steps: trigger an E2B heartbeat from a macOS host (this is where the bsdtar xattr headers come from), confirm the workspace tar extracts on the Linux sandbox side; run a long (>15 min) Cloudflare sandbox flow and confirm no lost-lease/idle errors between executes. ## Risks - Low risk overall. - `--no-xattrs` is widely supported by both macOS bsdtar and GNU tar (Linux). Worst case it silently no-ops on a future host that doesn't support it; in that case the existing failure mode reappears, it doesn't introduce a new one. - Raising default `timeoutMs` (E2B) and `sleepAfter` (Cloudflare) from short values to 1h means sandboxes stay alive longer between executes by default. This is the intended behavior — operators that want a tighter idle window can still override via plugin config. - E2B per-execute sandbox lifetime refresh adds a small API call per execute; it is bounded by the same client that already handles execute traffic, so no new dependencies or retry semantics. ## Model Used - Claude (Anthropic), `claude-opus-4-7`, extended thinking enabled, tool use enabled (file/grep/git tools and Paperclip control-plane API). Used to diagnose the dual failure mode (workspace tar PAX xattr headers + sandbox lifetime), write the fixes and tests, and drive the verification loop with the board operator. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots (N/A — no UI changes) - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing> |
||
|
|
1bd44c8a0d |
Harden Cloudflare sandbox execution (#5967)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies. > - Remote-managed adapters need sandbox/environment execution to behave like real agent runs, not just local host probes. > - The Cloudflare sandbox path was the weakest leg in the SSH + Cloudflare QA matrix because bridge execution could truncate output, time out long-running installs, and under-provision the worker instance. > - That made several adapters fail for reasons unrelated to their actual business logic, which blocks confidence in Paperclip's non-local environment model. > - This pull request hardens the Cloudflare bridge/runtime path and adjusts sandbox probe budgets so adapter verification matches the measured behavior of the fixed environment. > - It also corrects the Pi sandbox install command so the QA matrix exercises a real, supported install path. > - The benefit is a materially more reliable SSH + Cloudflare adapter matrix with fewer false negatives and clearer failure boundaries. ## What Changed - Switched the Cloudflare bridge worker instance type to `standard-2` for the QA-matrix execution path. - Raised Cloudflare bridge/plugin-worker timeout budgets and added SSE keepalives so long-running install/exec calls can complete instead of dying at the transport layer. - Fixed Cloudflare bridge-channel command handling to avoid dropped final stdout chunks on short-lived execs. - Made Claude, OpenCode, and Cursor sandbox probe timeouts configurable/sandbox-aware, then tightened the defaults to the measured post-fix range. - Updated the Pi sandbox install command to use the package currently installed by the official `pi.dev` installer, pinned to a specific npm version. - Added/updated tests around Cloudflare bridge behavior and adapter sandbox probe paths. ## Verification - `pnpm --filter @paperclipai/adapter-claude-local typecheck` - `pnpm --filter @paperclipai/adapter-opencode-local typecheck` - `pnpm --filter @paperclipai/adapter-cursor-local typecheck` - `pnpm vitest run packages/adapters/cursor-local packages/adapters/claude-local packages/adapters/opencode-local packages/adapters/pi-local packages/plugins/sandbox-providers/cloudflare server/src/services/__tests__/plugin-worker-manager.test.ts` - Manual QA on the dedicated dev instance using the SSH + Cloudflare environment matrix (`ENV-29` through `ENV-40`). Clean end-to-end passes: SSH `claude_local`, `codex_local`, `cursor`, `gemini_local`; Cloudflare `claude_local`, `codex_local`, `cursor`, `gemini_local`. ## Risks - Cloudflare sandbox cost increases because the bridge worker now runs on `standard-2` instead of `lite`. - Higher timeout ceilings can delay surfacing truly hung Cloudflare bridge calls, even though they remove transport-level false negatives. - The manual heartbeat matrix still exposed follow-on execution/sync/disposition bugs in `opencode_local` and `pi_local`; those are not fixed by this PR. ## Model Used - OpenAI `gpt-5.4` via Paperclip `codex_local`, reasoning effort `high`, tool use enabled, repo search enabled. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots (not applicable) - [x] I have updated relevant documentation to reflect my changes (not applicable) - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing> |
||
|
|
486fb88a15 |
Add Cloudflare sandbox provider plugin (#5687)
> _Stacked on top of #5685 → #5686. Diff against master includes commits from earlier PRs in the stack — review focuses on the two new commits (`Extend sandbox callback bridge for Worker-hosted plugins` + `Add Cloudflare sandbox provider plugin`)._ ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Each agent runs in a sandbox environment, and operators choose which provider backs that sandbox — today E2B and Daytona are bundled with the platform > - Cloudflare Workers + Durable Objects + the Sandbox SDK offer a credible new option: globally distributed, cheap idle, and operator-deployable as a single Worker > - To plug it in, Paperclip needs (a) a provider plugin that speaks the `PaperclipPluginManifestV1` lifecycle and (b) a small operator-deployed Worker — the **bridge** — that adapts Paperclip's runtime RPCs to the Cloudflare Sandbox SDK > - The plugin extends the existing sandbox-callback-bridge with a `bridge.transport: "worker"` discriminator so the platform routes runtime RPCs through the Worker bridge instead of the in-process runner > - This pull request adds the plugin, the bridge Worker template, and the supporting adapter-utils + server hooks the new transport needs > - The benefit is that operators can run sandboxes on Cloudflare's edge with no new platform code beyond installing the plugin and deploying the Worker ## What Changed **Shared support (`Extend sandbox callback bridge for Worker-hosted plugins`):** - `packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`: expose `expectedHostHeader` so plugin-side bridge clients can verify the canonical request envelope before forwarding. - `packages/adapter-utils/src/command-managed-runtime.{ts,test.ts}`: relax the always-fresh runner construction so callers can re-use a runner across exec calls (Worker-hosted bridges hold the runner inside a Durable Object). - `server/src/services/environment-runtime.ts` + `environment-runtime.test.ts`: route Worker-hosted bridges through the same env-shaping path as E2B and pin the `requestEnv` contract. - `server/src/services/plugin-environment-driver.ts`: thread an optional `issueId` through the runtime descriptor so bridges can scope leases to the originating issue (used by Cloudflare to map a sandbox to the issue/workflow for billing and audit). - `packages/plugins/sdk/src/protocol.ts`: add `issueId?` to `PluginEnvironmentDriverBaseParams` and the new `bridge.transport: "worker"` discriminator that the new plugin declares. - `server/__tests__/heartbeat-plugin-environment.test.ts`: pin the heartbeat path against the new runtime descriptor. **The Cloudflare plugin itself (`Add Cloudflare sandbox provider plugin`):** - `packages/plugins/sandbox-providers/cloudflare/`: plugin entry, manifest, plugin runtime (lifecycle + bridge client), config parsing, and Vitest coverage. Manifest declares `bridge.transport: "worker"` so the platform routes runtime RPCs through the bridge client. - `bridge-template/`: a Worker template the operator deploys with `wrangler`. Owns Durable Object-backed sessions (`sessions.ts`), exec/stream routes (`exec.ts`, `routes.ts`), and an HMAC auth layer (`auth.ts`) that pins the `Host` header surface. Includes the SDK-contract-correct exec implementation, lease recovery, and chunked stdout/stderr streaming. - Tests cover lease/session handoff (`bridge-template/src/exec.test.ts`, `routes.test.ts`), bridge client request shaping (`src/bridge-client.test.ts`), and end-to-end plugin behavior (`src/plugin.test.ts`) including streamed exec output. 27 tests in total. - `README.md` walks the operator through deploying the bridge Worker, registering the plugin, and configuring the runtime. ## Verification - `pnpm typecheck` - `pnpm exec vitest run --no-coverage packages/adapter-utils/src/sandbox-callback-bridge.test.ts packages/adapter-utils/src/command-managed-runtime.test.ts server/src/__tests__/environment-runtime.test.ts server/src/__tests__/heartbeat-plugin-environment.test.ts` - `(cd packages/plugins/sandbox-providers/cloudflare && pnpm test)` — 27 passing For an operator-side smoke test: 1. Deploy the bridge: `cd packages/plugins/sandbox-providers/cloudflare/bridge-template && wrangler deploy` 2. Register the plugin in your Paperclip instance, point its bridge URL at the deployed Worker, set the HMAC shared secret. 3. Create a sandbox environment whose provider is `cloudflare`, then run a Codex or Claude job against it. ## Risks - Adds a new `bridge.transport: "worker"` code path, but the existing E2B / Daytona transports go through the same shaped helpers and have explicit test coverage that pins their behavior unchanged. - The Worker bridge stores session state in a Durable Object; operator instances must be aware of the corresponding Cloudflare costs (DO requests, storage). Documented in the README. - The `issueId` plumbing is optional throughout — existing plugins that don't supply it continue to work. ## Model Used - Provider: Anthropic - Model: Claude Opus 4.7 (1M context) - Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep) ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots — N/A, no UI change - [x] I have updated relevant documentation to reflect my changes (plugin README, bridge-template README) - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing> |