Files
paperclip/packages/adapters/claude-local/src/server/test.ts
T
Devin Foley b24c6909e8 Harden remote sandbox runtime probes, timeouts, and installs (#5685)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent runs inside a sandbox environment so its CLI is isolated
from the host
> - Sandbox-backed adapter runs go through a small set of shared helpers
— `ensureAdapterExecutionTargetCommandResolvable`, the sandbox callback
bridge runner, and per-adapter `SANDBOX_INSTALL_COMMAND` strings
> - When standing up new sandbox provider plugins, the existing helpers
timed out, missed install fallbacks, or leaned on assumptions that only
held for E2B
> - Local adapters (`claude-local`, `codex-local`, `gemini-local`,
`opencode-local`) needed slightly hardened probes so they could install
themselves and validate inside *any* remote sandbox transport, not just
E2B
> - This pull request bundles those runtime fixes so future sandbox
provider plugins inherit a working baseline
> - The benefit is that adding a new sandbox provider plugin no longer
requires touching adapter-utils or each local-adapter probe — the
supporting infra is already correct

## What Changed

- `packages/adapter-utils/src/execution-target.ts`: introduce
`DEFAULT_REMOTE_SANDBOX_ADAPTER_TIMEOUT_SEC = 1800` and
`resolveAdapterExecutionTargetTimeoutSec(...)`. Local and SSH adapters
keep the historical "0 means no adapter timeout" behavior;
sandbox-backed runs without an explicit `timeoutSec` get an explicit
30-minute default so remote installs and warm-up don't time out at the
per-RPC default. Plumbed `timeoutSec` through
`ensureAdapterExecutionTargetCommandResolvable` so install probes inside
a sandbox honor adapter-level overrides instead of the bridge's 5-minute
default.
- `packages/adapters/opencode-local/src/index.ts`: switch
`SANDBOX_INSTALL_COMMAND` from `npm install -g opencode-ai` to `curl
-fsSL https://opencode.ai/install | bash`. The npm package reifies four
large prebuilt-binary subpackages in parallel even though only one
matches the host arch; on bandwidth-constrained sandboxes that blew
through the 240s install budget. The official installer fetches one
arch-specific binary and adds `$HOME/.opencode/bin` to PATH via
`~/.bashrc`, which the sandbox-callback-bridge login-shell script
already sources.
- `packages/adapters/{claude,codex,gemini,opencode}-local/`: harden
remote-target probes — pass `--skip-git-repo-check` for Codex when
probing outside a repo, normalize permission flags for Claude, and add
`*.remote.test.ts` coverage that exercises the remote-sandbox path
explicitly for each adapter.
- `packages/adapter-utils/src/sandbox-install-command.{ts,test.ts}`
(new): add `buildSandboxNpmInstallCommand` helper.
`server/src/adapters/registry.ts` + new
`server/src/__tests__/adapter-registry.test.ts`: wire adapter install
commands so they fall back to a writable `$HOME/.local` prefix when
global install isn't available.
- `server/src/__tests__/plugin-worker-manager.test.ts` + new
`server/src/__tests__/fixtures/plugin-worker-delayed.cjs`: pin per-call
timeout overrides so plugin worker exec calls honor the caller's timeout
instead of the worker's default.

## Verification

- `pnpm typecheck`
- `pnpm exec vitest run --no-coverage
packages/adapter-utils/src/execution-target-sandbox.test.ts
packages/adapter-utils/src/sandbox-install-command.test.ts`
- `pnpm exec vitest run --no-coverage
server/src/__tests__/plugin-worker-manager.test.ts
server/src/__tests__/adapter-registry.test.ts
server/src/__tests__/claude-local-adapter-environment.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/gemini-local-adapter-environment.test.ts`
- `pnpm exec vitest run --no-coverage
packages/adapters/codex-local/src/server/test.remote.test.ts
packages/adapters/opencode-local/src/server/test.remote.test.ts
packages/adapters/codex-local/src/server/codex-args.test.ts
packages/adapters/codex-local/src/server/execute.remote.test.ts
packages/adapters/gemini-local/src/server/execute.remote.test.ts`

All passing locally.

## Risks

- Touches shared `adapter-utils` and several `*-local` adapters. The
30-minute default applies only when both (a) the target is
`remote+sandbox` and (b) no `timeoutSec` is configured — local + SSH
paths are unchanged. New test coverage was added alongside each behavior
change to pin the contracts.
- Switching OpenCode's install command to the official installer is a
behavior change for any operator running OpenCode inside a remote
sandbox. Local installs are unaffected (the `SANDBOX_INSTALL_COMMAND`
only runs when an adapter is being installed inside a sandbox).
- Low risk overall — no migrations, no API surface change.

## Model Used

- Provider: Anthropic
- Model: Claude Opus 4.7 (1M context)
- Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep),
no code execution beyond local repo commands

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A, no UI change
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-11 00:31:54 -07:00

291 lines
11 KiB
TypeScript

import type {
AdapterEnvironmentCheck,
AdapterEnvironmentTestContext,
AdapterEnvironmentTestResult,
} from "@paperclipai/adapter-utils";
import {
asString,
asBoolean,
asNumber,
asStringArray,
parseObject,
ensurePathInEnv,
} from "@paperclipai/adapter-utils/server-utils";
import {
ensureAdapterExecutionTargetCommandResolvable,
ensureAdapterExecutionTargetDirectory,
maybeRunSandboxInstallCommand,
runAdapterExecutionTargetProcess,
describeAdapterExecutionTarget,
resolveAdapterExecutionTargetCwd,
} from "@paperclipai/adapter-utils/execution-target";
import path from "node:path";
import { detectClaudeLoginRequired, parseClaudeStreamJson } from "./parse.js";
import { isBedrockModelId } from "./models.js";
import { buildClaudeProbePermissionArgs } from "./permissions.js";
import { SANDBOX_INSTALL_COMMAND } from "../index.js";
function summarizeStatus(checks: AdapterEnvironmentCheck[]): AdapterEnvironmentTestResult["status"] {
if (checks.some((check) => check.level === "error")) return "fail";
if (checks.some((check) => check.level === "warn")) return "warn";
return "pass";
}
function isNonEmpty(value: unknown): value is string {
return typeof value === "string" && value.trim().length > 0;
}
function firstNonEmptyLine(text: string): string {
return (
text
.split(/\r?\n/)
.map((line) => line.trim())
.find(Boolean) ?? ""
);
}
function commandLooksLike(command: string, expected: string): boolean {
const base = path.basename(command).toLowerCase();
return base === expected || base === `${expected}.cmd` || base === `${expected}.exe`;
}
function summarizeProbeDetail(stdout: string, stderr: string): string | null {
const raw = firstNonEmptyLine(stderr) || firstNonEmptyLine(stdout);
if (!raw) return null;
const clean = raw.replace(/\s+/g, " ").trim();
const max = 240;
return clean.length > max ? `${clean.slice(0, max - 1)}` : clean;
}
export async function testEnvironment(
ctx: AdapterEnvironmentTestContext,
): Promise<AdapterEnvironmentTestResult> {
const checks: AdapterEnvironmentCheck[] = [];
const config = parseObject(ctx.config);
const command = asString(config.command, "claude");
const target = ctx.executionTarget ?? null;
const targetIsRemote = target?.kind === "remote";
const targetIsSandbox = target?.kind === "remote" && target.transport === "sandbox";
const cwd = resolveAdapterExecutionTargetCwd(target, asString(config.cwd, ""), process.cwd());
const targetLabel = targetIsRemote
? ctx.environmentName ?? describeAdapterExecutionTarget(target)
: null;
const runId = `claude-envtest-${Date.now()}-${Math.random().toString(16).slice(2)}`;
if (targetLabel) {
checks.push({
code: "claude_environment_target",
level: "info",
message: `Probing inside environment: ${targetLabel}`,
});
}
try {
await ensureAdapterExecutionTargetDirectory(runId, target, cwd, {
cwd,
env: {},
createIfMissing: true,
});
checks.push({
code: "claude_cwd_valid",
level: "info",
message: `Working directory is valid: ${cwd}`,
});
} catch (err) {
checks.push({
code: "claude_cwd_invalid",
level: "error",
message: err instanceof Error ? err.message : "Invalid working directory",
detail: cwd,
});
}
const envConfig = parseObject(config.env);
const env: Record<string, string> = {};
for (const [key, value] of Object.entries(envConfig)) {
if (typeof value === "string") env[key] = value;
}
const runtimeEnv = ensurePathInEnv({ ...process.env, ...env });
const installCheck = await maybeRunSandboxInstallCommand({
runId,
target,
adapterKey: "claude",
installCommand: SANDBOX_INSTALL_COMMAND,
detectCommand: command,
env,
});
if (installCheck) checks.push(installCheck);
try {
await ensureAdapterExecutionTargetCommandResolvable(command, target, cwd, runtimeEnv);
checks.push({
code: "claude_command_resolvable",
level: "info",
message: `Command is executable: ${command}`,
});
} catch (err) {
checks.push({
code: "claude_command_unresolvable",
level: "error",
message: err instanceof Error ? err.message : "Command is not executable",
detail: command,
});
}
// When probing a remote target, the Paperclip host's process.env does not
// reflect what the agent will actually see at runtime. Only consider env
// vars from the adapter config in that case; the probe itself will surface
// any auth issues on the remote box.
const considerHostEnv = !targetIsRemote;
const hasBedrock =
env.CLAUDE_CODE_USE_BEDROCK === "1" ||
env.CLAUDE_CODE_USE_BEDROCK === "true" ||
(considerHostEnv && process.env.CLAUDE_CODE_USE_BEDROCK === "1") ||
(considerHostEnv && process.env.CLAUDE_CODE_USE_BEDROCK === "true") ||
isNonEmpty(env.ANTHROPIC_BEDROCK_BASE_URL) ||
(considerHostEnv && isNonEmpty(process.env.ANTHROPIC_BEDROCK_BASE_URL));
const configApiKey = env.ANTHROPIC_API_KEY;
const hostApiKey = considerHostEnv ? process.env.ANTHROPIC_API_KEY : undefined;
if (hasBedrock) {
const source =
env.CLAUDE_CODE_USE_BEDROCK === "1" ||
env.CLAUDE_CODE_USE_BEDROCK === "true" ||
isNonEmpty(env.ANTHROPIC_BEDROCK_BASE_URL)
? "adapter config env"
: "server environment";
checks.push({
code: "claude_bedrock_auth",
level: "info",
message: "AWS Bedrock auth detected. Claude will use Bedrock for inference.",
detail: `Detected in ${source}.`,
hint: "Ensure AWS credentials (AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY or AWS_PROFILE) and AWS_REGION are configured.",
});
} else if (isNonEmpty(configApiKey) || isNonEmpty(hostApiKey)) {
const source = isNonEmpty(configApiKey) ? "adapter config env" : "server environment";
checks.push({
code: "claude_anthropic_api_key_overrides_subscription",
level: "warn",
message:
"ANTHROPIC_API_KEY is set. Claude will use API-key auth instead of subscription credentials.",
detail: `Detected in ${source}.`,
hint: "Unset ANTHROPIC_API_KEY if you want subscription-based Claude login behavior.",
});
} else if (!targetIsRemote) {
checks.push({
code: "claude_subscription_mode_possible",
level: "info",
message: "ANTHROPIC_API_KEY is not set; subscription-based auth can be used if Claude is logged in.",
});
}
const canRunProbe =
checks.every((check) => check.code !== "claude_cwd_invalid" && check.code !== "claude_command_unresolvable");
if (canRunProbe) {
if (!commandLooksLike(command, "claude")) {
checks.push({
code: "claude_hello_probe_skipped_custom_command",
level: "info",
message: "Skipped hello probe because command is not `claude`.",
detail: command,
hint: "Use the `claude` CLI command to run the automatic login and installation probe.",
});
} else {
const model = asString(config.model, "").trim();
const effort = asString(config.effort, "").trim();
const chrome = asBoolean(config.chrome, false);
const maxTurns = asNumber(config.maxTurnsPerRun, 0);
const dangerouslySkipPermissions = asBoolean(config.dangerouslySkipPermissions, true);
const extraArgs = (() => {
const fromExtraArgs = asStringArray(config.extraArgs);
if (fromExtraArgs.length > 0) return fromExtraArgs;
return asStringArray(config.args);
})();
const args = ["--print", "-", "--output-format", "stream-json", "--verbose"];
args.push(...buildClaudeProbePermissionArgs({ dangerouslySkipPermissions, targetIsSandbox }));
if (chrome) args.push("--chrome");
// For Bedrock: only pass --model when the ID is a Bedrock-native identifier.
if (model && (!hasBedrock || isBedrockModelId(model))) {
args.push("--model", model);
}
if (effort) args.push("--effort", effort);
if (maxTurns > 0) args.push("--max-turns", String(maxTurns));
if (extraArgs.length > 0) args.push(...extraArgs);
const probe = await runAdapterExecutionTargetProcess(
runId,
target,
command,
args,
{
cwd,
env,
timeoutSec: 45,
graceSec: 5,
stdin: "Respond with hello.",
onLog: async () => {},
},
);
const parsedStream = parseClaudeStreamJson(probe.stdout);
const parsed = parsedStream.resultJson;
const loginMeta = detectClaudeLoginRequired({
parsed,
stdout: probe.stdout,
stderr: probe.stderr,
});
const detail = summarizeProbeDetail(probe.stdout, probe.stderr);
if (probe.timedOut) {
checks.push({
code: "claude_hello_probe_timed_out",
level: "warn",
message: "Claude hello probe timed out.",
hint: "Retry the probe. If this persists, verify Claude can run `Respond with hello` from this directory manually.",
});
} else if (loginMeta.requiresLogin) {
checks.push({
code: "claude_hello_probe_auth_required",
level: "warn",
message: "Claude CLI is installed, but login is required.",
...(detail ? { detail } : {}),
hint: loginMeta.loginUrl
? `Run \`claude login\` and complete sign-in at ${loginMeta.loginUrl}, then retry.`
: "Run `claude login` in this environment, then retry the probe.",
});
} else if ((probe.exitCode ?? 1) === 0) {
const summary = parsedStream.summary.trim();
const hasHello = /\bhello\b/i.test(summary);
checks.push({
code: hasHello ? "claude_hello_probe_passed" : "claude_hello_probe_unexpected_output",
level: hasHello ? "info" : "warn",
message: hasHello
? "Claude hello probe succeeded."
: "Claude probe ran but did not return `hello` as expected.",
...(summary ? { detail: summary.replace(/\s+/g, " ").trim().slice(0, 240) } : {}),
...(hasHello
? {}
: {
hint: "Try the probe manually (`claude --print - --output-format stream-json --verbose`) and prompt `Respond with hello`.",
}),
});
} else {
checks.push({
code: "claude_hello_probe_failed",
level: "error",
message: "Claude hello probe failed.",
...(detail ? { detail } : {}),
hint: "Run `claude --print - --output-format stream-json --verbose` manually in this directory and prompt `Respond with hello` to debug.",
});
}
}
}
return {
adapterType: ctx.adapterType,
status: summarizeStatus(checks),
checks,
testedAt: new Date().toISOString(),
};
}