paperclip/packages/adapters/cursor-local/src/server/test.ts
Devin Foley 1bd44c8a0d Harden Cloudflare sandbox execution (#5967)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Remote-managed adapters need sandbox/environment execution to behave
like real agent runs, not just local host probes.
> - The Cloudflare sandbox path was the weakest leg in the SSH +
Cloudflare QA matrix: bridge execution could truncate output and time
out long-running installs, and the worker instance was under-provisioned.
> - That made several adapters fail for reasons unrelated to their
actual business logic, which blocks confidence in Paperclip's non-local
environment model.
> - This pull request hardens the Cloudflare bridge/runtime path and
adjusts sandbox probe budgets so adapter verification matches the
measured behavior of the fixed environment.
> - It also corrects the Pi sandbox install command so the QA matrix
exercises a real, supported install path.
> - The benefit is a materially more reliable SSH + Cloudflare adapter
matrix with fewer false negatives and clearer failure boundaries.

## What Changed

- Switched the Cloudflare bridge worker instance type to `standard-2`
for the QA-matrix execution path.
- Raised Cloudflare bridge/plugin-worker timeout budgets and added SSE
keepalives so long-running install/exec calls can complete instead of
dying at the transport layer.
- Fixed Cloudflare bridge-channel command handling to avoid dropped
final stdout chunks on short-lived execs.
- Made Claude, OpenCode, and Cursor sandbox probe timeouts
configurable/sandbox-aware, then tightened the defaults to the measured
post-fix range.
- Updated the Pi sandbox install command to use the package currently
installed by the official `pi.dev` installer, pinned to a specific npm
version.
- Added/updated tests around Cloudflare bridge behavior and adapter
sandbox probe paths.
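The sandbox-aware probe timeouts follow one pattern in the Cursor adapter's `test.ts`. Each budget is resolved as `Math.max(1, asNumber(config.<key>, targetIsSandbox ? sandboxDefault : nonSandboxDefault))`, using 60s/45s for the version probe and 90s/45s for the hello probe. The minimal sketch below reproduces that pattern in isolation. `resolveProbeTimeoutSec`, `toNumberOr`, and the defaults objects are hypothetical names. `toNumberOr` is only an assumed stand-in for `asNumber`: it treats anything other than a finite number, including numeric strings, as missing.

```typescript
// Hypothetical stand-in for `asNumber` from @paperclipai/adapter-utils/server-utils.
// Assumption: only finite numbers are accepted; everything else falls back.
function toNumberOr(value: unknown, fallback: number): number {
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}

// Per-probe defaults: one budget for sandbox targets, one for all other targets.
interface ProbeTimeoutDefaults {
  sandboxSec: number;
  nonSandboxSec: number;
}

// Default values taken from the Cursor adapter's test.ts.
const VERSION_PROBE_DEFAULTS: ProbeTimeoutDefaults = { sandboxSec: 60, nonSandboxSec: 45 };
const HELLO_PROBE_DEFAULTS: ProbeTimeoutDefaults = { sandboxSec: 90, nonSandboxSec: 45 };

// An explicit config value wins; otherwise the sandbox or non-sandbox default
// is used. The result is clamped to at least one second.
function resolveProbeTimeoutSec(
  configured: unknown,
  targetIsSandbox: boolean,
  defaults: ProbeTimeoutDefaults,
): number {
  const fallback = targetIsSandbox ? defaults.sandboxSec : defaults.nonSandboxSec;
  return Math.max(1, toNumberOr(configured, fallback));
}

console.log(resolveProbeTimeoutSec(undefined, true, HELLO_PROBE_DEFAULTS)); // 90
console.log(resolveProbeTimeoutSec(undefined, false, VERSION_PROBE_DEFAULTS)); // 45
console.log(resolveProbeTimeoutSec(120, true, HELLO_PROBE_DEFAULTS)); // 120
console.log(resolveProbeTimeoutSec(0, true, HELLO_PROBE_DEFAULTS)); // 1
```

Under these assumptions, an explicit `helloProbeTimeoutSec: 120` in adapter config overrides the 90s sandbox default, and a value of `0` is clamped to the 1s floor instead of disabling the probe.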

## Verification

- `pnpm --filter @paperclipai/adapter-claude-local typecheck`
- `pnpm --filter @paperclipai/adapter-opencode-local typecheck`
- `pnpm --filter @paperclipai/adapter-cursor-local typecheck`
- `pnpm vitest run packages/adapters/cursor-local
packages/adapters/claude-local packages/adapters/opencode-local
packages/adapters/pi-local packages/plugins/sandbox-providers/cloudflare
server/src/services/__tests__/plugin-worker-manager.test.ts`
- Manual QA on the dedicated dev instance using the SSH + Cloudflare
environment matrix (`ENV-29` through `ENV-40`). Clean end-to-end passes:
SSH `claude_local`, `codex_local`, `cursor`, `gemini_local`; Cloudflare
`claude_local`, `codex_local`, `cursor`, `gemini_local`.

## Risks

- Cloudflare sandbox cost increases because the bridge worker now runs
on `standard-2` instead of `lite`.
- Higher timeout ceilings can delay surfacing truly hung Cloudflare
bridge calls, even though they remove transport-level false negatives.
- The manual heartbeat matrix still exposed follow-on
execution/sync/disposition bugs in `opencode_local` and `pi_local`;
those are not fixed by this PR.
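The timeout risk can be estimated for the Cursor adapter. The sketch below computes how long a hung Cursor CLI could stall the environment test before a failure is reported, using the probe budgets and 5s grace periods in `test.ts`. It assumes a hung process is abandoned after `timeoutSec + graceSec`. That is a guess about `runAdapterExecutionTargetProcess`, not documented behavior. The sketch also ignores the install and command-resolution steps. A timed-out version probe skips the hello probe, so the worst case is a version probe that succeeds just under its budget and is followed by a hung hello probe. All function names here are hypothetical.

```typescript
// Timeout and grace budget for one probe, in seconds.
interface ProbeBudget {
  timeoutSec: number;
  graceSec: number;
}

// Assumption: a hung probe is abandoned after its timeout plus its grace period.
function hungProbeSec(probe: ProbeBudget): number {
  return probe.timeoutSec + probe.graceSec;
}

// Worst case for the version -> hello sequence. A timed-out version probe
// short-circuits the hello probe, so the two candidates are:
//   1. the version probe hangs;
//   2. the version probe succeeds just under its timeout, then hello hangs.
function worstCaseProbeSequenceSec(version: ProbeBudget, hello: ProbeBudget): number {
  return Math.max(hungProbeSec(version), version.timeoutSec + hungProbeSec(hello));
}

// Budgets from test.ts: sandbox targets vs. all other targets.
const sandboxWorstCaseSec = worstCaseProbeSequenceSec(
  { timeoutSec: 60, graceSec: 5 },
  { timeoutSec: 90, graceSec: 5 },
);
const nonSandboxWorstCaseSec = worstCaseProbeSequenceSec(
  { timeoutSec: 45, graceSec: 5 },
  { timeoutSec: 45, graceSec: 5 },
);

console.log(sandboxWorstCaseSec, nonSandboxWorstCaseSec); // 155 95
```

Under these assumptions, a sandbox target can take up to about 155s to surface a hung Cursor CLI. Other targets take up to about 95s.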

## Model Used

- OpenAI `gpt-5.4` via Paperclip `codex_local`, reasoning effort `high`,
tool use enabled, repo search enabled.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots (not applicable)
- [x] I have updated relevant documentation to reflect my changes (not
applicable)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-13 22:00:10 -07:00


import type {
  AdapterEnvironmentCheck,
  AdapterEnvironmentTestContext,
  AdapterEnvironmentTestResult,
} from "@paperclipai/adapter-utils";
import {
  asNumber,
  asString,
  asStringArray,
  parseObject,
  ensurePathInEnv,
} from "@paperclipai/adapter-utils/server-utils";
import {
  ensureAdapterExecutionTargetCommandResolvable,
  ensureAdapterExecutionTargetDirectory,
  maybeRunSandboxInstallCommand,
  runAdapterExecutionTargetProcess,
  describeAdapterExecutionTarget,
  resolveAdapterExecutionTargetCwd,
} from "@paperclipai/adapter-utils/execution-target";
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
import { DEFAULT_CURSOR_LOCAL_MODEL, SANDBOX_INSTALL_COMMAND } from "../index.js";
import { parseCursorJsonl } from "./parse.js";
import { isDefaultCursorCommand, prepareCursorSandboxCommand } from "./remote-command.js";
import { hasCursorTrustBypassArg } from "../shared/trust.js";

function summarizeStatus(checks: AdapterEnvironmentCheck[]): AdapterEnvironmentTestResult["status"] {
  if (checks.some((check) => check.level === "error")) return "fail";
  if (checks.some((check) => check.level === "warn")) return "warn";
  return "pass";
}

function isNonEmpty(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

function firstNonEmptyLine(text: string): string {
  return (
    text
      .split(/\r?\n/)
      .map((line) => line.trim())
      .find(Boolean) ?? ""
  );
}

function summarizeProbeDetail(stdout: string, stderr: string, parsedError: string | null): string | null {
  const raw = parsedError?.trim() || firstNonEmptyLine(stderr) || firstNonEmptyLine(stdout);
  if (!raw) return null;
  const clean = raw.replace(/\s+/g, " ").trim();
  const max = 240;
  return clean.length > max ? `${clean.slice(0, max - 1)}…` : clean;
}

export interface CursorAuthInfo {
  email: string | null;
  displayName: string | null;
  userId: number | null;
}

export function cursorConfigPath(cursorHome?: string): string {
  return path.join(cursorHome ?? path.join(os.homedir(), ".cursor"), "cli-config.json");
}

export async function readCursorAuthInfo(cursorHome?: string): Promise<CursorAuthInfo | null> {
  let raw: string;
  try {
    raw = await fs.readFile(cursorConfigPath(cursorHome), "utf8");
  } catch {
    return null;
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }

  if (typeof parsed !== "object" || parsed === null) return null;
  const obj = parsed as Record<string, unknown>;
  const authInfo = obj.authInfo;
  if (typeof authInfo !== "object" || authInfo === null) return null;
  const info = authInfo as Record<string, unknown>;

  const email = typeof info.email === "string" && info.email.trim().length > 0 ? info.email.trim() : null;
  const displayName =
    typeof info.displayName === "string" && info.displayName.trim().length > 0 ? info.displayName.trim() : null;
  const userId = typeof info.userId === "number" ? info.userId : null;

  if (!email && !displayName && userId == null) return null;
  return { email, displayName, userId };
}

const CURSOR_AUTH_REQUIRED_RE =
  /(?:authentication\s+required|not\s+authenticated|not\s+logged\s+in|unauthorized|invalid(?:\s+or\s+missing)?\s+api(?:[_\s-]?key)?|cursor[_\s-]?api[_\s-]?key|run\s+'?agent\s+login'?\s+first|api(?:[_\s-]?key)?(?:\s+is)?\s+required)/i;

export async function testEnvironment(
  ctx: AdapterEnvironmentTestContext,
): Promise<AdapterEnvironmentTestResult> {
  const checks: AdapterEnvironmentCheck[] = [];
  const config = parseObject(ctx.config);
  let command = asString(config.command, "agent");
  const target = ctx.executionTarget ?? null;
  const targetIsRemote = target?.kind === "remote";
  const targetIsSandbox = target?.kind === "remote" && target.transport === "sandbox";
  const cwd = resolveAdapterExecutionTargetCwd(target, asString(config.cwd, ""), process.cwd());
  const targetLabel = targetIsRemote
    ? ctx.environmentName ?? describeAdapterExecutionTarget(target)
    : null;
  const runId = `cursor-envtest-${Date.now()}-${Math.random().toString(16).slice(2)}`;

  if (targetLabel) {
    checks.push({
      code: "cursor_environment_target",
      level: "info",
      message: `Probing inside environment: ${targetLabel}`,
    });
  }

  try {
    await ensureAdapterExecutionTargetDirectory(runId, target, cwd, {
      cwd,
      env: {},
      createIfMissing: true,
    });
    checks.push({
      code: "cursor_cwd_valid",
      level: "info",
      message: `Working directory is valid: ${cwd}`,
    });
  } catch (err) {
    checks.push({
      code: "cursor_cwd_invalid",
      level: "error",
      message: err instanceof Error ? err.message : "Invalid working directory",
      detail: cwd,
    });
  }

  const envConfig = parseObject(config.env);
  let env: Record<string, string> = {};
  for (const [key, value] of Object.entries(envConfig)) {
    if (typeof value === "string") env[key] = value;
  }

  const sandboxCommand = await prepareCursorSandboxCommand({
    runId,
    target,
    command,
    cwd,
    env,
    timeoutSec: 45,
    graceSec: 5,
  });
  command = sandboxCommand.command;
  env = sandboxCommand.env;

  const installCheck = await maybeRunSandboxInstallCommand({
    runId,
    target,
    adapterKey: "cursor",
    installCommand: SANDBOX_INSTALL_COMMAND,
    detectCommand: command,
    env,
  });
  if (installCheck) checks.push(installCheck);

  const finalSandboxCommand = await prepareCursorSandboxCommand({
    runId,
    target,
    command,
    cwd,
    env,
    remoteSystemHomeDirHint: sandboxCommand.remoteSystemHomeDir,
    timeoutSec: 45,
    graceSec: 5,
  });
  command = finalSandboxCommand.command;
  env = finalSandboxCommand.env;

  const runtimeEnv = ensurePathInEnv({ ...process.env, ...env });
  try {
    await ensureAdapterExecutionTargetCommandResolvable(command, target, cwd, runtimeEnv);
    checks.push({
      code: "cursor_command_resolvable",
      level: "info",
      message: `Command is executable: ${command}`,
    });
  } catch (err) {
    checks.push({
      code: "cursor_command_unresolvable",
      level: "error",
      message: err instanceof Error ? err.message : "Command is not executable",
      detail: command,
    });
  }

  const configCursorApiKey = env.CURSOR_API_KEY;
  const hostCursorApiKey = targetIsRemote ? undefined : process.env.CURSOR_API_KEY;
  if (isNonEmpty(configCursorApiKey) || isNonEmpty(hostCursorApiKey)) {
    const source = isNonEmpty(configCursorApiKey) ? "adapter config env" : "server environment";
    checks.push({
      code: "cursor_api_key_present",
      level: "info",
      message: "CURSOR_API_KEY is set for Cursor authentication.",
      detail: `Detected in ${source}.`,
    });
  } else if (!targetIsRemote) {
    const cursorHome = isNonEmpty(env.CURSOR_HOME) ? env.CURSOR_HOME : undefined;
    const cursorAuth = await readCursorAuthInfo(cursorHome).catch(() => null);
    if (cursorAuth) {
      checks.push({
        code: "cursor_native_auth_present",
        level: "info",
        message: "Cursor is authenticated via `agent login`.",
        detail: cursorAuth.email
          ? `Logged in as ${cursorAuth.email}.`
          : `Credentials found in ${cursorConfigPath(cursorHome)}.`,
      });
    } else {
      checks.push({
        code: "cursor_api_key_missing",
        level: "warn",
        message: "CURSOR_API_KEY is not set. Cursor runs may fail until authentication is configured.",
        hint: "Set CURSOR_API_KEY in adapter env or run `agent login`.",
      });
    }
  }

  const canRunProbe =
    checks.every((check) => check.code !== "cursor_cwd_invalid" && check.code !== "cursor_command_unresolvable");
  if (canRunProbe) {
    if (!isDefaultCursorCommand(command)) {
      checks.push({
        code: "cursor_hello_probe_skipped_custom_command",
        level: "info",
        message: "Skipped hello probe because command is not a default Cursor CLI entrypoint.",
        detail: command,
        hint: "Use `agent` or `cursor-agent` to run the automatic installation and auth probe.",
      });
    } else {
      // Cursor's `agent` binary still pays cold-start overhead in container
      // sandboxes, but standard-2 probes no longer need a 120s version budget.
      const versionProbeTimeoutSec = Math.max(
        1,
        asNumber(config.versionProbeTimeoutSec, targetIsSandbox ? 60 : 45),
      );
      const versionProbe = await runAdapterExecutionTargetProcess(
        runId,
        target,
        command,
        ["--version"],
        {
          cwd,
          env,
          timeoutSec: versionProbeTimeoutSec,
          graceSec: 5,
          onLog: async () => {},
        },
      );
      const versionDetail = summarizeProbeDetail(versionProbe.stdout, versionProbe.stderr, null);
      if (versionProbe.timedOut) {
        checks.push({
          code: "cursor_version_probe_timed_out",
          level: "error",
          message: "Cursor version probe timed out.",
          hint: "Run `agent --version` manually in this working directory to confirm the installed CLI is reachable non-interactively.",
        });
      } else if ((versionProbe.exitCode ?? 1) === 0) {
        checks.push({
          code: "cursor_version_probe_passed",
          level: "info",
          message: "Cursor version probe succeeded.",
          ...(versionDetail ? { detail: versionDetail } : {}),
        });
      } else {
        checks.push({
          code: "cursor_version_probe_failed",
          level: "error",
          message: "Cursor version probe failed.",
          ...(versionDetail ? { detail: versionDetail } : {}),
          hint: "Run `agent --version` manually in this working directory to confirm the installed CLI is reachable non-interactively.",
        });
      }

      const canRunHelloProbe = checks.every(
        (check) =>
          check.code !== "cursor_version_probe_failed" &&
          check.code !== "cursor_version_probe_timed_out",
      );
      if (!canRunHelloProbe) {
        return {
          adapterType: ctx.adapterType,
          status: summarizeStatus(checks),
          checks,
          testedAt: new Date().toISOString(),
        };
      }

      const model = asString(config.model, DEFAULT_CURSOR_LOCAL_MODEL).trim();
      const extraArgs = (() => {
        const fromExtraArgs = asStringArray(config.extraArgs);
        if (fromExtraArgs.length > 0) return fromExtraArgs;
        return asStringArray(config.args);
      })();
      const autoTrustEnabled = !hasCursorTrustBypassArg(extraArgs);
      const args = ["-p", "--mode", "ask", "--output-format", "json", "--workspace", cwd];
      if (model) args.push("--model", model);
      if (autoTrustEnabled) args.push("--yolo");
      if (extraArgs.length > 0) args.push(...extraArgs);
      args.push("Respond with hello.");

      // Sandbox bridges still add cursor CLI cold-start overhead, but the
      // standard-2 tier now completes probes fast enough that 90s is ample.
      const helloProbeTimeoutSec = Math.max(
        1,
        asNumber(config.helloProbeTimeoutSec, targetIsSandbox ? 90 : 45),
      );
      const probe = await runAdapterExecutionTargetProcess(
        runId,
        target,
        command,
        args,
        {
          cwd,
          env,
          timeoutSec: helloProbeTimeoutSec,
          graceSec: 5,
          onLog: async () => {},
        },
      );

      const parsed = parseCursorJsonl(probe.stdout);
      const detail = summarizeProbeDetail(probe.stdout, probe.stderr, parsed.errorMessage);
      const authEvidence = `${parsed.errorMessage ?? ""}\n${probe.stdout}\n${probe.stderr}`.trim();

      if (probe.timedOut) {
        checks.push({
          code: "cursor_hello_probe_timed_out",
          level: "warn",
          message: "Cursor hello probe timed out.",
          hint: "Retry the probe. If this persists, verify `agent -p --mode ask --output-format json \"Respond with hello.\"` manually.",
        });
      } else if ((probe.exitCode ?? 1) === 0) {
        const summary = parsed.summary.trim();
        const hasHello = /\bhello\b/i.test(summary);
        checks.push({
          code: hasHello ? "cursor_hello_probe_passed" : "cursor_hello_probe_unexpected_output",
          level: hasHello ? "info" : "warn",
          message: hasHello
            ? "Cursor hello probe succeeded."
            : "Cursor probe ran but did not return `hello` as expected.",
          ...(summary ? { detail: summary.replace(/\s+/g, " ").trim().slice(0, 240) } : {}),
          ...(hasHello
            ? {}
            : {
                hint: "Try `agent -p --mode ask --output-format json \"Respond with hello.\"` manually to inspect full output.",
              }),
        });
      } else if (CURSOR_AUTH_REQUIRED_RE.test(authEvidence)) {
        checks.push({
          code: "cursor_hello_probe_auth_required",
          level: "warn",
          message: "Cursor CLI is installed, but authentication is not ready.",
          ...(detail ? { detail } : {}),
          hint: "Run `agent login` or configure CURSOR_API_KEY in adapter env/shell, then retry the probe.",
        });
      } else {
        checks.push({
          code: "cursor_hello_probe_failed",
          level: "error",
          message: "Cursor hello probe failed.",
          ...(detail ? { detail } : {}),
          hint: "Run `agent -p --mode ask --output-format json \"Respond with hello.\"` manually in this working directory to debug.",
        });
      }
    }
  }

  return {
    adapterType: ctx.adapterType,
    status: summarizeStatus(checks),
    checks,
    testedAt: new Date().toISOString(),
  };
}