Locks in the existing override behavior so a future regression that
reverts to a hardcoded image is caught immediately. Closes the
investigation on FAR-90.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The cancel-poll test sets PAPERCLIP_API_KEY='test-key', but the real
PAPERCLIP_DEV_API_KEY was leaking in from the harness environment.
Since execute.ts prefers PAPERCLIP_DEV_API_KEY over PAPERCLIP_API_KEY,
the poll was sending the real dev key instead of 'test-key'.
Fix: add a beforeEach that sets PAPERCLIP_DEV_API_KEY='test-key' and an
afterEach that clears both env vars.
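A minimal sketch of the isolation, written as plain functions instead of vitest hooks. `setUpPollKeys` and `tearDownPollKeys` are hypothetical names, and the env object is passed in so the effect can be checked without touching process.env.

```typescript
type Env = Record<string, string | undefined>;

// Both keys the cancel poll may read; execute.ts prefers the dev key.
const POLL_KEYS = ["PAPERCLIP_API_KEY", "PAPERCLIP_DEV_API_KEY"] as const;

// beforeEach equivalent: pin both keys so nothing leaks in from the harness.
function setUpPollKeys(env: Env): void {
  for (const key of POLL_KEYS) env[key] = "test-key";
}

// afterEach equivalent: remove both keys so later tests start clean.
function tearDownPollKeys(env: Env): void {
  for (const key of POLL_KEYS) delete env[key];
}

// Simulate a harness that already exports the real dev key.
const env: Env = { PAPERCLIP_DEV_API_KEY: "real-dev-key" };
setUpPollKeys(env);
const sent = env.PAPERCLIP_DEV_API_KEY ?? env.PAPERCLIP_API_KEY;
console.log(sent); // test-key
tearDownPollKeys(env);
```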
Co-Authored-By: Paperclip <noreply@paperclip.ing>
OPENCODE_DB was set to /opencode-db (the mount point directory), but
SQLite requires a file path. Changed it to /opencode-db/opencode.db.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
So we can answer "what's coverage?" without re-installing each time.
Run with: `npx vitest run --coverage --coverage.provider=v8 --coverage.reporter=text-summary`
Co-Authored-By: Paperclip <noreply@paperclip.ing>
There was no test file for k8s-client.ts. Existing pvc.test.ts mocked
`getPvc` directly and never exercised the underlying isNotFound predicate,
so the v1.x ApiException `code` vs `statusCode` regression had nothing to
catch it.
Add k8s-client.test.ts that mocks @kubernetes/client-node, throws errors
shaped exactly like the real ApiException (status under `code`), and
verifies:
- getPvc returns null on code=404 (the FAR-85 case)
- getPvc still handles legacy statusCode=404 and response.statusCode=404
- getPvc re-throws non-404 errors (500, 403)
- deletePvc swallows 404, re-throws others
- createPvc forwards spec to the SDK
Confirmed the new tests fail when k8s-client.ts is reverted to the
pre-fix predicate (2 failures), and pass with the fix in place.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The v1.x ApiException exposes the HTTP status as `code`, not `statusCode`.
Both `isNotFound` (k8s-client) and `isK8s404` (execute) only checked
`statusCode`/`response.statusCode`, so 404s were never recognized:
- `getPvc` re-threw the 404 instead of returning null, which bubbled up
through `ensureAgentDbPvc` as `k8s_job_create_failed` with the raw
"persistentvolumeclaims X not found" body — the symptom in FAR-85.
- The PVC was never actually created, because the existence check threw
before reaching `createPvc`.
Add `code === 404` to both predicates and a regression test for `isK8s404`.
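For reference, the shape of the fixed predicate as a standalone sketch. `isNotFound` matches the name in k8s-client.ts, but this body and the `readOrNull` wrapper are illustrative, not copied from the source.

```typescript
// Error shapes the predicate accepts: the v1.x ApiException puts the HTTP
// status on `code`; `statusCode` and `response.statusCode` are legacy shapes.
interface K8sErrorLike {
  code?: unknown;
  statusCode?: unknown;
  response?: { statusCode?: unknown };
}

function isNotFound(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as K8sErrorLike;
  return e.code === 404 || e.statusCode === 404 || e.response?.statusCode === 404;
}

// getPvc-style wrapper: a 404 becomes null, every other error is re-thrown.
async function readOrNull<T>(read: () => Promise<T>): Promise<T | null> {
  try {
    return await read();
  } catch (err) {
    if (isNotFound(err)) return null;
    throw err;
  }
}
```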
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The previous workflow ran npm publish on every push to master and
gated it via npm view on a stale scoped package name. The check always
reported the version as unpublished, so the publish step 403'd
whenever the registry already had that version.
Switch the publish job to fire only on push of a v* tag, verify
the tag matches package.json, and use the standard
NODE_AUTH_TOKEN flow via setup-node's registry-url. Tests still
run on master push and PRs.
Release flow: bump version, commit, push master, then
git tag v<version> && git push origin v<version>.
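The workflow step itself is not reproduced here; this hypothetical helper only illustrates the rule it enforces, assuming the tag arrives as a full Git ref in the GITHUB_REF format (`refs/tags/v0.1.25`).

```typescript
// Publish only when the pushed tag is exactly v<package.json version>.
function tagMatchesPackageVersion(ref: string, packageVersion: string): boolean {
  const prefix = "refs/tags/v";
  return ref.startsWith(prefix) && ref.slice(prefix.length) === packageVersion;
}

console.log(tagMatchesPackageVersion("refs/tags/v0.1.25", "0.1.25")); // true
console.log(tagMatchesPackageVersion("refs/tags/v0.1.24", "0.1.25")); // false
```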
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Adapter releases are distributed via the Paperclip adapter plugin
system, not tarballs in git. Removes legacy 0.1.22/0.1.23/0.1.26
tarballs and a stray screenshot, and adds *.tgz to .gitignore so
future npm pack output is not committed.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
ensureAgentDbPvc checks whether the PVC exists and creates it if not.
However, the Kubernetes API may return a Success response without
actually creating the resource. This commit adds a verification step
after createPvc to confirm the PVC exists before returning.
Fixes FAR-84.
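A sketch of the get, create, read-back sequence, assuming a small injected interface instead of the real CoreV1Api calls. `PvcOps` and `ensurePvcVerified` are hypothetical names.

```typescript
interface PvcOps<P> {
  get(name: string): Promise<P | null>; // null when the PVC does not exist
  create(name: string): Promise<void>;
}

async function ensurePvcVerified<P>(ops: PvcOps<P>, name: string): Promise<P> {
  const existing = await ops.get(name);
  if (existing) return existing;
  await ops.create(name);
  // Do not trust the create response alone: read the PVC back.
  const created = await ops.get(name);
  if (!created) throw new Error(`PVC ${name} not found after create`);
  return created;
}
```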
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The exponential backoff sleep in streamPodLogs used a single setTimeout
for the full delay (3s, 6s, 12s...). When logStopSignal.stopped was set
mid-backoff (e.g. by external cancel), the loop body could not check the
signal until the timer expired — causing the cancel test to time out when
the 12s backoff overlapped with the 15s cancel window.
Sleep in 200ms chunks so a stop signal can exit the backoff immediately.
Fixes the pre-existing CI timeout in execute.test.ts.
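A minimal sketch of the chunked sleep. The `{ stopped }` shape follows logStopSignal.stopped; the function name and the boolean return value are assumptions.

```typescript
interface StopSignal { stopped: boolean }

// Sleeps up to totalMs in chunkMs slices, checking the stop signal between
// slices. Resolves true if the full delay elapsed, false if it was cut short.
async function interruptibleSleep(totalMs: number, signal: StopSignal, chunkMs = 200): Promise<boolean> {
  let remaining = totalMs;
  while (remaining > 0) {
    if (signal.stopped) return false;
    const step = Math.min(chunkMs, remaining);
    await new Promise<void>((resolve) => setTimeout(resolve, step));
    remaining -= step;
  }
  return !signal.stopped;
}
```

With a 12s backoff and a stop requested after 300ms, the loop notices on the next 200ms boundary and exits about 400ms in.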
Co-Authored-By: Paperclip <noreply@paperclip.ing>
A prior commit on the remote and this branch both added the
serviceAccountName field; deduplicate, keeping the entry at the top of
the Kubernetes group.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds serviceAccountName field to the Kubernetes group in getConfigSchema()
so operators can specify a dedicated SA (e.g. paperclip-developer) for Job
pods that need k8s API access. The field was already consumed in job-manifest.ts;
this makes it visible in the UI. Bumps to 0.1.25.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Surfaces the serviceAccountName field in the adapter UI under the
Kubernetes group. The job manifest builder already reads this field;
this change makes it configurable via the UI.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
External cancel polling in execute.ts used PAPERCLIP_API_KEY, which is
a short-lived run JWT for the main Paperclip instance. In multi-instance
setups (dev vs main), the agent runs on the dev instance but the run JWT
is only valid on the main instance, causing a 401 on every poll.
Now polls with PAPERCLIP_DEV_API_KEY if set, falling back to
PAPERCLIP_API_KEY. The dev key reaches the Job through job-manifest.ts,
which forwards it from the pod's inherited env.
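The key preference as a hypothetical helper. Treating an empty PAPERCLIP_DEV_API_KEY as unset (via `||`) is an assumption of this sketch.

```typescript
// Dev-instance key wins when present; otherwise fall back to the run JWT.
function cancelPollApiKey(env: Record<string, string | undefined>): string | undefined {
  return env.PAPERCLIP_DEV_API_KEY || env.PAPERCLIP_API_KEY || undefined;
}
```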
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Replace fixed 3s reconnect delay with exponential backoff (3s → 6s → 12s → 24s → capped at 30s) to avoid hammering the K8s API server during prolonged network blips while remaining responsive during brief disconnects.
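The schedule as a one-line function. The constant names and the 0-based attempt index are assumptions of this sketch.

```typescript
const BASE_RECONNECT_DELAY_MS = 3_000;
const MAX_RECONNECT_DELAY_MS = 30_000;

// attempt 0 -> 3s, 1 -> 6s, 2 -> 12s, 3 -> 24s, 4 and later -> 30s cap.
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_RECONNECT_DELAY_MS * 2 ** attempt, MAX_RECONNECT_DELAY_MS);
}
```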
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Seven direct unit tests for ensureAgentDbPvc covering ephemeral mode,
existing PVC (no create), PVC creation with storage class/capacity,
missing storage class error, default mode, and agent ID slug derivation.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Replaces the Option A shared-PVC path implementation with a long-lived
dedicated PVC per agent, mounted at /opencode-db with OPENCODE_DB=/opencode-db.
Changes:
- k8s-client.ts: add getPvc/createPvc/deletePvc CoreV1Api helpers
- execute.ts: add ensureAgentDbPvc() that gets-or-creates a PVC named
opencode-db-<agentId> before Job creation; pass agentDbClaimName through
to buildJobManifest; return null for ephemeral mode (emptyDir used instead)
- job-manifest.ts: accept agentDbClaimName on JobBuildInput; mount dedicated
PVC or emptyDir at /opencode-db; set OPENCODE_DB=/opencode-db; revert init
container to simple form (no mkdir, no PVC mount)
- config-schema.ts: replace opencodeDbMode/opencodeDbPath with agentDbMode
(dedicated_pvc|ephemeral, default dedicated_pvc), agentDbStorageClass
(required for dedicated_pvc), agentDbStorageCapacity (default 1Gi)
- test.ts: add create/delete RBAC checks for persistentvolumeclaims
- pvc.test.ts: unit tests for ensureAgentDbPvc (7 cases incl. error paths)
- 289/289 tests pass; typecheck clean
- No agent-delete hook exists; opencode-db PVC janitor routine is a deferred
follow-up task
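A sketch of the mode handling that decides what backs /opencode-db, using the config field names and defaults listed above. `planAgentDbPvc` is hypothetical, and the real code derives a slug from the agent ID; that step is omitted here.

```typescript
type AgentDbMode = "dedicated_pvc" | "ephemeral";

interface AgentDbConfig {
  agentDbMode?: AgentDbMode;
  agentDbStorageClass?: string;
  agentDbStorageCapacity?: string;
}

interface PvcPlan { name: string; storageClassName: string; capacity: string }

// null => ephemeral mode, the caller mounts an emptyDir instead.
function planAgentDbPvc(agentId: string, config: AgentDbConfig): PvcPlan | null {
  const mode = config.agentDbMode ?? "dedicated_pvc";
  if (mode === "ephemeral") return null;
  if (!config.agentDbStorageClass) {
    throw new Error("agentDbStorageClass is required when agentDbMode is dedicated_pvc");
  }
  return {
    name: `opencode-db-${agentId}`, // slug derivation omitted in this sketch
    storageClassName: config.agentDbStorageClass,
    capacity: config.agentDbStorageCapacity ?? "1Gi",
  };
}
```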
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- shared_pvc mode (default): sets OPENCODE_DB to /paperclip/.opencode/db/<agentId>
and prepends mkdir -p to the busybox init container when a PVC is present
- ephemeral mode: mounts an emptyDir at /opencode-db and points OPENCODE_DB there
- config-schema: adds opencodeDbMode (select, default shared_pvc) and
opencodeDbPath (optional text override for shared_pvc path)
- No agent-delete hook exists in this adapter; per-agent DB dir cleanup is
deferred to a janitor routine (follow-up work)
- 284/284 tests pass; typecheck clean
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The cancel poller was calling GET /api/heartbeat-runs/{runId} which
returned 401 because the adapter key lacks access to the internal
heartbeat-runs endpoint. Switch to GET /api/issues/{issueId}, which
the adapter key can read. Also tighten the trigger condition from
status !== "running" to status === "cancelled" so that other terminal
states (done, blocked, etc.) do not abort the K8s job.
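The new endpoint and trigger condition as two small hypothetical helpers. The base URL would come from PAPERCLIP_API_URL; that the issue response carries a top-level `status` string is an assumption here.

```typescript
interface IssueSnapshot { status?: string }

// Poll target; trailing slashes on the base URL are tolerated.
function issuePollUrl(apiUrl: string, issueId: string): string {
  return `${apiUrl.replace(/\/+$/, "")}/api/issues/${encodeURIComponent(issueId)}`;
}

// Only an explicit cancellation aborts the Job; done, blocked, etc. do not.
function shouldAbortJob(issue: IssueSnapshot): boolean {
  return issue.status === "cancelled";
}
```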
Co-Authored-By: Paperclip <noreply@paperclip.ing>
When multiple tasks are assigned simultaneously, only one K8s job can run
at a time (shared PVC/session guard). Previously, all other tasks received
k8s_concurrent_run_blocked immediately and stayed blocked forever.
Now the guard retries once: wait for all blocking jobs to complete via
waitForJobCompletion, then re-check before proceeding to create a new job.
If the re-check still shows a running job, the error is returned as before.
The agentCreationMutex already serializes guard-check + job-create, so
tasks naturally queue up and execute one at a time without concurrent jobs.
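A sketch of the retry-once guard with the Kubernetes calls injected. `listRunningJobs` is a hypothetical stand-in for the listNamespacedJob check, and the `waitForJobCompletion` signature is assumed.

```typescript
interface GuardDeps {
  listRunningJobs(): Promise<string[]>; // names of Jobs still running for this agent
  waitForJobCompletion(name: string): Promise<void>;
}

type GuardResult =
  | { ok: true }
  | { ok: false; errorCode: "k8s_concurrent_run_blocked"; blocking: string[] };

async function concurrencyGuard(deps: GuardDeps): Promise<GuardResult> {
  let running = await deps.listRunningJobs();
  if (running.length === 0) return { ok: true };
  // Single retry: wait for every blocking Job, then check again.
  await Promise.all(running.map((name) => deps.waitForJobCompletion(name)));
  running = await deps.listRunningJobs();
  if (running.length === 0) return { ok: true };
  return { ok: false, errorCode: "k8s_concurrent_run_blocked", blocking: running };
}
```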
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Two bugs prevented skill content from reaching K8s Job prompts, and
resumeLastSession: false was silently ignored.
Skills fix (execute.ts, FAR-57):
- Add /paperclip/.claude/skills as additional candidate to
readPaperclipRuntimeSkillEntries — the relative candidates in
adapter-utils don't resolve to the PVC-mounted skills home
- Read entry.source/SKILL.md instead of entry.source (which is a
directory path); fall back to source directly for file-based entries
- Mock readPaperclipRuntimeSkillEntries in execute.test.ts to prevent
real SKILL.md reads from delaying fake-timer registration
Session fix (job-manifest.ts, FAR-56):
- Gate --session flag on asBoolean(config.resumeLastSession, true)
so setting resumeLastSession: false actually stops session resumption
- Default true preserves existing behaviour for agents without config
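A sketch of both fixes. `skillMarkdownPath` and `sessionArgs` are hypothetical names, and `asBooleanOr` is a simplified stand-in for adapter-utils' asBoolean (the real helper may also parse strings).

```typescript
import { statSync } from "node:fs";
import { posix } from "node:path";

// Skill entries may point at a skill directory (holding SKILL.md) or at a
// markdown file directly; only the directory case needs the suffix.
function skillMarkdownPath(
  source: string,
  isDirectory: (p: string) => boolean = isDirectoryOnDisk,
): string {
  return isDirectory(source) ? posix.join(source, "SKILL.md") : source;
}

function isDirectoryOnDisk(p: string): boolean {
  try {
    return statSync(p).isDirectory();
  } catch {
    return false; // missing path: treat as a file and let the later read fail
  }
}

// Stand-in for asBoolean: real booleans win, anything else uses the fallback.
function asBooleanOr(value: unknown, fallback: boolean): boolean {
  return typeof value === "boolean" ? value : fallback;
}

// --session is passed only when resumption is enabled (default true) and a
// previous session id exists.
function sessionArgs(config: { resumeLastSession?: unknown }, sessionId: string | null): string[] {
  if (!sessionId || !asBooleanOr(config.resumeLastSession, true)) return [];
  return ["--session", sessionId];
}
```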
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The K8s log client v1.x closes the follow-stream prematurely due to a
known upstream bug, so the grace timer fires 30 s after the log stream
exits even when the container is still running. The old fallback
(`waitForPodTermination` with a hardcoded 120 s timeout) gave up too soon
for agents whose opencode runs take several minutes, leading to premature
failures and issues stuck in `blocked`.
Fix: the grace poller now calls `readNamespacedPod` before resolving the
completion promise. If the pod is still Running/Pending, it resets
`logExitTime` to defer the grace deadline. A `graceCheckPending` guard
prevents concurrent checks. A `graceMaxWaitMs` cap (= completionTimeoutMs
when set, 20 min otherwise) ensures we never wait forever for unlimited
jobs. Version bumped to 0.1.21.
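One grace-poller tick reduced to a pure function, as a sketch. `graceTick` and `GraceState` are hypothetical, the pod phase is passed in rather than read via readNamespacedPod, and the graceCheckPending guard is left out. Treating an undefined or non-positive completionTimeoutMs as "unset" is an assumption.

```typescript
type PodPhase = "Pending" | "Running" | "Succeeded" | "Failed" | "Unknown";

interface GraceState {
  firstLogExitTime: number; // when the log stream first ended (ms)
  logExitTime: number; // moving anchor for the grace deadline
}

const DEFAULT_GRACE_MAX_WAIT_MS = 20 * 60 * 1000; // 20 min

function graceMaxWaitMs(completionTimeoutMs?: number): number {
  return completionTimeoutMs !== undefined && completionTimeoutMs > 0
    ? completionTimeoutMs
    : DEFAULT_GRACE_MAX_WAIT_MS;
}

// "resolve" settles the completion promise; "wait" checks again later.
function graceTick(
  state: GraceState,
  now: number,
  phase: PodPhase,
  graceMs: number,
  maxWaitMs: number,
): "wait" | "resolve" {
  if (now - state.logExitTime < graceMs) return "wait"; // grace not yet expired
  const podAlive = phase === "Running" || phase === "Pending";
  if (podAlive && now - state.firstLogExitTime < maxWaitMs) {
    state.logExitTime = now; // container still running: push the deadline out
    return "wait";
  }
  return "resolve";
}
```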
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The @kubernetes/client-node v1.x Log.follow stream closes prematurely
(known upstream TODO). Combined with Node.js buffering stdout to pipes,
the live log stream always returns empty. When the 30s grace timer fires
and the stream is empty, the container may still be running.
Add waitForPodTermination() to block in the empty-stdout fallback path
until the container actually exits (up to 120s), then read its complete
output with readNamespacedPodLog. This makes runs complete successfully
instead of looping indefinitely in in_progress.
Bump version to 0.1.20.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
TypeScript's control-flow analysis (CFA) does not trace the assignment
inside the vi.spyOn mockImplementation callback, so it narrows
capturedHandler to null at the if-check, making the body unreachable
(never). A cast at the call site breaks the false narrowing without
changing runtime behaviour.
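A self-contained reproduction of the narrowing problem, with a plain callback standing in for the vi.spyOn(...).mockImplementation callback; the names here are illustrative.

```typescript
type SignalHandler = (signal: string) => void;

const handled: string[] = [];

// Stand-in for the mocked registration: the code under test passes its
// handler to a callback, which stores it.
function registerHandler(store: (h: SignalHandler) => void): void {
  store((signal) => handled.push(signal));
}

let capturedHandler: SignalHandler | null = null;
registerHandler((h) => {
  capturedHandler = h; // CFA does not account for this assignment
});

// Written as `if (capturedHandler) capturedHandler("SIGTERM")`, TS narrows
// capturedHandler to `null` at the check and `never` inside the branch, so
// the call fails to type-check. The cast widens it back to the declared type.
const handler = capturedHandler as SignalHandler | null;
if (handler) handler("SIGTERM");
console.log(handled); // [ 'SIGTERM' ]
```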
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Poll PAPERCLIP_API_URL/api/heartbeat-runs/{runId} at keepalive cadence
during log streaming. When status != "running", delete the Job with
propagationPolicy=Background and return errorCode="cancelled" as a
distinct result, matching the claude_k8s reference implementation.
Also includes: reattachOrphanedJobs config field that lets the adapter
reattach to a same-task Job left over from a prior server restart;
task-id and session-id K8s labels on Job manifests for observability.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Add agentCreationMutex (Map<agentId, Promise>) that serializes
guard-check + job-create per agent, eliminating the TOCTOU race where
two concurrent execute() calls both pass the list-then-create check.
- Change catch {} on listNamespacedJob errors to return
errorCode: "k8s_concurrency_guard_unreachable" (fail-closed) instead
of silently bypassing the concurrency guard.
- Add ensureSigtermHandler() which tracks active Jobs in activeJobs Map
and deletes all of them (plus prompt Secrets) on SIGTERM before exit.
- Track orphaned-job reattaches in activeJobs for consistent cleanup.
- Update execute.test.ts: change "proceeds on list error" test to assert
k8s_concurrency_guard_unreachable; add mutex serialization test and
SIGTERM handler registration tests.
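A minimal per-agent promise-chain lock in the spirit of agentCreationMutex (`Map<agentId, Promise>`). The map name comes from the commit; `withAgentLock` and its cleanup strategy are a hypothetical sketch.

```typescript
const agentCreationMutex = new Map<string, Promise<void>>();

// Runs fn only after every earlier call for the same agentId has settled,
// so guard-check + job-create cannot interleave for one agent.
async function withAgentLock<T>(agentId: string, fn: () => Promise<T>): Promise<T> {
  const previous = agentCreationMutex.get(agentId) ?? Promise.resolve();
  let release!: () => void;
  const current = new Promise<void>((resolve) => {
    release = () => resolve();
  });
  const tail = previous.then(() => current);
  agentCreationMutex.set(agentId, tail);
  try {
    await previous;
    return await fn();
  } finally {
    release();
    // Drop the entry only if no later caller has queued behind us.
    if (agentCreationMutex.get(agentId) === tail) agentCreationMutex.delete(agentId);
  }
}
```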
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Add `sanitizeLabelValue()` export: strips [^a-z0-9._-], lowercases, truncates to 63 chars, warns on drop
- Apply sanitizer to all paperclip.io/* label values (agent-id, run-id, company-id, extra labels)
- Job name now includes 6-char sha256 hash over raw agent.id+runId for collision resistance
- Trailing hyphens stripped from final job name
- Slugs extended from 8 to 16 chars to match claude_k8s reference
- 32 unit tests covering sanitizeLabelValue, job name format, determinism, and collision avoidance
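A sketch of the label and name helpers. `sanitizeLabelValue` matches the exported name; this body, `jobNameHash`, and `stripTrailingHyphens` are illustrative. Warning only on dropped characters (not on truncation) is an assumption.

```typescript
import { createHash } from "node:crypto";

const MAX_LABEL_LENGTH = 63;

// Lowercase, drop anything outside [a-z0-9._-], cap at 63 characters.
function sanitizeLabelValue(raw: string): string {
  const lowered = raw.toLowerCase();
  const cleaned = lowered.replace(/[^a-z0-9._-]/g, "");
  if (cleaned.length !== lowered.length) {
    console.warn(`label value "${raw}": dropped ${lowered.length - cleaned.length} invalid character(s)`);
  }
  return cleaned.slice(0, MAX_LABEL_LENGTH);
}

// First 6 hex chars of sha256 over the raw (unsanitized) agent id + run id.
function jobNameHash(agentId: string, runId: string): string {
  return createHash("sha256").update(agentId + runId).digest("hex").slice(0, 6);
}

function stripTrailingHyphens(name: string): string {
  return name.replace(/-+$/, "");
}
```

Kubernetes also requires non-empty label values to start and end with an alphanumeric character, which this sketch does not enforce.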
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Add describe block "execute — large-prompt Secret path" with 5 cases:
buildJobManifest called twice (promptSecretName on second call),
Secret created before Job, ownerReference patched after Job creation,
Secret deleted in finally block, Secret cleaned up on Job create failure
- Update vi.mock for job-manifest to export LARGE_PROMPT_THRESHOLD_BYTES
- Add createNamespacedSecret/deleteNamespacedSecret/patchNamespacedSecret
to makeCoreApi for completeness
- Update makeBatchApi to return { metadata: { uid } } so ownerRef tests work
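The ordering these tests pin down, as a sketch over a hypothetical `PromptSecretOps` seam (the real code uses createNamespacedSecret, patchNamespacedSecret and deleteNamespacedSecret). Setting the Job as the Secret's owner means Kubernetes garbage-collects the Secret with the Job even if the finally block never runs; ignoring delete failures is an assumption.

```typescript
interface PromptSecretOps {
  createSecret(name: string, prompt: string): Promise<void>;
  createJob(promptSecretName: string): Promise<{ name: string; uid: string }>;
  setSecretOwner(secretName: string, jobName: string, jobUid: string): Promise<void>;
  deleteSecret(name: string): Promise<void>;
}

async function runWithPromptSecret(
  ops: PromptSecretOps,
  secretName: string,
  prompt: string,
  waitForJob: (jobName: string) => Promise<void>,
): Promise<void> {
  await ops.createSecret(secretName, prompt); // 1. Secret must exist before the Job
  try {
    const job = await ops.createJob(secretName); // 2. Job reads the prompt from it
    await ops.setSecretOwner(secretName, job.name, job.uid); // 3. owner = Job (needs uid)
    await waitForJob(job.name);
  } finally {
    // 4. Runs on success and on Job-create failure; a failed delete is ignored.
    await ops.deleteSecret(secretName).catch(() => undefined);
  }
}
```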
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Split streamPodLogs into streamPodLogsOnce (with bail timer + stopSignal)
and streamPodLogs (reconnect loop, up to MAX_LOG_RECONNECT_ATTEMPTS=50)
- LogLineDedupFilter suppresses replayed JSONL events on reconnect, keyed
by type+sessionID+part.id (OpenCode shape)
- Bail timer (LOG_STREAM_BAIL_TIMEOUT_MS=3s) forces writable.destroy() +
promise resolution when stopSignal fires and logApi.log hangs
- Keepalive: emits '[paperclip] keepalive — job X running (Ns since last output)'
every 15s during silent phases, with 2-consecutive-reading latch to avoid
false-positive terminal detections
- completionGraced uses logExitTime + a grace poller so the log-stream stop
signal is set as soon as the job condition resolves
- All 235 tests pass, tsc clean
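A sketch of the replay filter keyed by type + sessionID + part.id. The class name matches the commit; letting non-JSON lines and events without a string part.id through untouched is an assumption.

```typescript
interface EventLike { type?: unknown; sessionID?: unknown; part?: { id?: unknown } }

class LogLineDedupFilter {
  private readonly seen = new Set<string>();

  // true = emit the line; false = already seen on an earlier connection.
  accept(line: string): boolean {
    const event = LogLineDedupFilter.parse(line);
    const partId = event?.part?.id;
    if (typeof partId !== "string") return true; // nothing to key on
    const key = `${String(event?.type)}|${String(event?.sessionID)}|${partId}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }

  private static parse(line: string): EventLike | null {
    try {
      return JSON.parse(line) as EventLike;
    } catch {
      return null; // not JSON: never deduplicated
    }
  }
}
```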
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- SelfPodInfo gains inheritedEnvValueFrom (V1EnvVar[]) and inheritedEnvFrom (V1EnvFromSource[])
- Container selection now prefers the container named "paperclip", falls back to first
- buildJobManifest appends valueFrom env vars (skipping names already overridden)
and sets envFrom on the opencode container when present
- Tests updated: mock updated, 5 new cases covering secretKeyRef forwarding,
dedup, envFrom passthrough, and empty-envFrom omission
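A sketch of the selection and merge rules, using simplified stand-ins for client-node's V1Container, V1EnvVar and V1EnvFromSource. The helper names are hypothetical.

```typescript
interface EnvVar { name: string; value?: string; valueFrom?: unknown }
interface ContainerLike { name: string; env?: EnvVar[]; envFrom?: unknown[] }

// Prefer the container named "paperclip", else the first one.
function pickSelfContainer<C extends { name: string }>(containers: C[]): C | undefined {
  return containers.find((c) => c.name === "paperclip") ?? containers[0];
}

// Explicit Job env wins; inherited valueFrom vars fill in the remaining names.
function mergeInheritedEnv(explicit: EnvVar[], inheritedValueFrom: EnvVar[]): EnvVar[] {
  const taken = new Set(explicit.map((e) => e.name));
  return [...explicit, ...inheritedValueFrom.filter((e) => !taken.has(e.name))];
}

// envFrom is set only when the parent pod had any, so the field is otherwise omitted.
function withInheritedEnvFrom(container: ContainerLike, envFrom: unknown[]): ContainerLike {
  return envFrom.length > 0 ? { ...container, envFrom } : container;
}
```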
Co-Authored-By: Paperclip <noreply@paperclip.ing>
When both a JSONL error (e.g. "killed") and a pod terminated reason (e.g. "OOMKilled")
are present, join them with "; " so the richer pod classification is never silently
dropped by the parsedError short-circuit.
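The join rule as a hypothetical helper; the real code's variable names and null handling may differ.

```typescript
// Keep both signals instead of letting the JSONL error hide the pod reason.
function combineErrorMessage(parsedError: string | null, podReason: string | null): string | null {
  const parts = [parsedError, podReason].filter((p): p is string => Boolean(p));
  return parts.length > 0 ? parts.join("; ") : null;
}
```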
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Replace getPodExitCode with getPodTerminatedInfo to capture exit code
and reason (OOMKilled, Error, etc.) from terminated container state;
pod failure description now surfaces in returned errorMessage
- Add partial-stdout fallback: readPodLogs is triggered when stdout is
non-empty but contains no sessionId (missing session result), not just
when stdout is fully empty
- Detect empty LLM response: when a session ran but produced 0 output
tokens and no messages, return errorCode "llm_api_error"
- Add 13 new unit tests covering all three new paths
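A sketch of the three checks as pure functions over a minimal slice of client-node's V1ContainerStatus. The names are hypothetical (the commit's helper is getPodTerminatedInfo), and the session id, token and message counts are assumed to be parsed from stdout elsewhere.

```typescript
interface ContainerStatusLike {
  name: string;
  state?: { terminated?: { exitCode: number; reason?: string } };
}

interface TerminatedInfo { exitCode: number; reason: string | null }

function getTerminatedInfo(statuses: ContainerStatusLike[], container: string): TerminatedInfo | null {
  const terminated = statuses.find((s) => s.name === container)?.state?.terminated;
  return terminated ? { exitCode: terminated.exitCode, reason: terminated.reason ?? null } : null;
}

// Fall back to reading pod logs when stdout is empty OR lacks a sessionId.
function needsLogFallback(stdout: string, sessionId: string | null): boolean {
  return stdout.trim().length === 0 || !sessionId;
}

// A session that ran but produced no output tokens and no messages.
function isEmptyLlmResponse(sessionId: string | null, outputTokens: number, messageCount: number): boolean {
  return sessionId !== null && outputTokens === 0 && messageCount === 0;
}
```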
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- config-schema: add instructionsFilePath UI field (Core group, text type)
- server/index.ts: set supportsInstructionsBundle=true, instructionsPathKey="instructionsFilePath"
- execute.ts: read instructionsFilePath file + desired skill markdown files from PVC; pass to buildJobManifest as instructionsContent / skillsBundleContent
- job-manifest.ts: accept instructionsContent + skillsBundleContent in JobBuildInput; prepend both to prompt via joinPromptSections; add instructionsChars + skillsBundleChars to promptMetrics
- index.ts: document instructionsFilePath and skills injection in agentConfigurationDoc
- CLAUDE.md: document skill materialization (ephemeral mode) and instructionsFilePath field
- Bump version to 0.1.18
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Implement skill sync handlers that were missing, matching the approach
used in the claude_k8s adapter. The adapter now surfaces available,
configured, and external skills from /paperclip/.claude/skills in K8s
pods, resolving desired skills from config and reporting missing ones.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upgrade from ^2026.411.0-canary.8 to 2026.415.0-canary.7 to get
ServerAdapterModule capability flag fields (supportsInstructionsBundle,
instructionsPathKey, requiresMaterializedRuntimeSkills).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Declare supportsInstructionsBundle, instructionsPathKey, and
requiresMaterializedRuntimeSkills on ServerAdapterModule. opencode_k8s
does not support instructions bundles (instructions are piped via init
container) and does not require materialized runtime skills (bundled in
container image).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace joinPromptSections, stringifyPaperclipWakePayload, and
renderPaperclipWakePrompt with imports from adapter-utils/server-utils
(the fork's renderPaperclipWakePrompt adds execution stage routing,
resume delta sections, and full comment batch rendering)
- Replace local inferOpenAiCompatibleBiller with import from adapter-utils
- Declare sessionManagement using getAdapterSessionManagement("opencode_local")
with fallback defaults for proper session compaction policy
- Add log redaction via redactHomePathUserSegments in streamPodLogs
- Bump peerDependency to >=0.3.1 and version to 0.1.14
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Lock file was stale at 0.1.11 with an outdated peerDependency constraint;
bring it in line with package.json (0.1.13, >=0.3.0).
Co-Authored-By: Paperclip <noreply@paperclip.ing>