- Run tailPodLogFile and waitForJobCompletion in parallel via Promise.allSettled;
completion sets stopSignal.stopped so the tail loop drains and exits. Without
this, tailPodLogFile loops forever — the only natural exit was fh.stat()
throwing on file removal, which never happened during normal job completion.
- Restructure tail loop to read-then-sleep, with a final drain after stopSignal
is set to capture bytes written between the last poll and terminal state.
- Port the c8429cf fix from paperclip-adapter-claude-k8s:
* buildPodLogPath now writes to /paperclip/instances/default/data/run-logs/...
to match the server PVC layout (the /data/ segment was missing).
* Drop the mkdir -p ... && from both init container command variants — the
PVC isn't mounted in the init container, so the mkdir was failing with
exit code 1 and the && short-circuit prevented the prompt copy.
- Test infrastructure:
* Hoisted fs/promises mock now uses importOriginal so readFile (used for
skill bundle loading) hits the real implementation.
* setMockJsonl() lets individual tests inject specific JSONL into the tail's
read buffer (previously dead constants in the test file).
* fh.read mock now writes into the caller's buffer instead of returning a
separate one.
- Add src/server/test.test.ts covering testEnvironment (was 0% → 98.5% stmts).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The vi.mock("node:fs/promises") factory previously used a closure variable
that accumulated across tests despite vi.clearAllMocks(). Switched to
vi.hoisted() with an explicit resetFsMocks() called in beforeEach() so
the read offset counter is properly reset between tests.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Replaced streamPodLogs / streamPodLogsOnce / readPodLogs / waitForPodTermination
with tailPodLogFile() that polls a shared PVC file path with adaptive cadence
(250ms active, 1000ms idle after 5 consecutive empty polls)
- Added buildPodLogPath() export and podLogPath to JobBuildResult
- Added assertSafePathComponent with [a-zA-Z0-9-:] allowance for UUIDs
- Updated Job manifest to tee stdout to /paperclip/instances/default/run-logs/<companyId>/<agentId>/<runId>.pod.ndjson
- Added hasOutOfProcessLiveness: true to createServerAdapter (cast required)
- Deleted log-dedup.ts and log-dedup.test.ts entirely
- Removed all LogLineDedupFilter, Writable, and LOG_STREAM_* constants
- Removed completionResult.status workaround (completionWithGrace returns directly)
- Test infrastructure: mocked node:fs/promises to prevent unmocked fs.stat hangs
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cancel poll now uses ctx.authToken exclusively. Remove forwarding of
PAPERCLIP_DEV_API_KEY into job pods and all associated tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The cancel poll was sending empty Authorization headers because
PAPERCLIP_API_KEY is not set on the Paperclip server pod. Use the
per-run authToken from ctx instead, which is the JWT issued by Paperclip
for this execution. PAPERCLIP_DEV_API_KEY still overrides for dev instances.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The cancel-poll test sets PAPERCLIP_API_KEY='test-key' but the actual
PAPERCLIP_DEV_API_KEY was leaking through from the harness environment.
Since execute.ts prefers PAPERCLIP_DEV_API_KEY over PAPERCLIP_API_KEY,
the poll was sending the real dev key instead of 'test-key'.
Fix: add beforeEach to set PAPERCLIP_DEV_API_KEY='test-key', and afterEach
to clean both env vars.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The v1.x ApiException exposes the HTTP status as `code`, not `statusCode`.
Both `isNotFound` (k8s-client) and `isK8s404` (execute) only checked
`statusCode`/`response.statusCode`, so 404s were never recognized:
- `getPvc` re-threw the 404 instead of returning null, which bubbled up
through `ensureAgentDbPvc` as `k8s_job_create_failed` with the raw
"persistentvolumeclaims X not found" body — the symptom in FAR-85.
- The PVC was never actually created, because the existence check threw
before reaching `createPvc`.
Add `code === 404` to both predicates and a regression test for `isK8s404`.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Seven direct unit tests for ensureAgentDbPvc covering ephemeral mode,
existing PVC (no create), PVC creation with storage class/capacity,
missing storage class error, default mode, and agent ID slug derivation.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Replaces the Option A shared-PVC path implementation with a long-lived
dedicated PVC per agent, mounted at /opencode-db with OPENCODE_DB=/opencode-db.
Changes:
- k8s-client.ts: add getPvc/createPvc/deletePvc CoreV1Api helpers
- execute.ts: add ensureAgentDbPvc() that gets-or-creates a PVC named
opencode-db-<agentId> before Job creation; pass agentDbClaimName through
to buildJobManifest; return null for ephemeral mode (emptyDir used instead)
- job-manifest.ts: accept agentDbClaimName on JobBuildInput; mount dedicated
PVC or emptyDir at /opencode-db; set OPENCODE_DB=/opencode-db; revert init
container to simple form (no mkdir, no PVC mount)
- config-schema.ts: replace opencodeDbMode/opencodeDbPath with agentDbMode
(dedicated_pvc|ephemeral, default dedicated_pvc), agentDbStorageClass
(required for dedicated_pvc), agentDbStorageCapacity (default 1Gi)
- test.ts: add create/delete RBAC checks for persistentvolumeclaims
- pvc.test.ts: unit tests for ensureAgentDbPvc (7 cases incl. error paths)
- 289/289 tests pass; typecheck clean
- No agent-delete hook exists; opencode-db PVC janitor routine is a deferred
follow-up task
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The cancel poller was calling GET /api/heartbeat-runs/{runId} which
returned 401 because the adapter key lacks access to the internal
heartbeat-runs endpoint. Switch to GET /api/issues/{issueId}, which
the adapter key can read. Also tighten the trigger condition from
status !== "running" to status === "cancelled" so that other terminal
states (done, blocked, etc.) do not abort the K8s job.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
When multiple tasks are assigned simultaneously, only one K8s job can run
at a time (shared PVC/session guard). Previously, all other tasks received
k8s_concurrent_run_blocked immediately and stayed blocked forever.
Now the guard retries once: wait for all blocking jobs to complete via
waitForJobCompletion, then re-check before proceeding to create a new job.
If the re-check still shows a running job, the error is returned as before.
The agentCreationMutex already serializes guard-check + job-create, so
tasks naturally queue up and execute one at a time without concurrent jobs.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Two bugs prevented skill content from reaching K8s Job prompts, and
resumeLastSession: false was silently ignored.
Skills fix (execute.ts, FAR-57):
- Add /paperclip/.claude/skills as additional candidate to
readPaperclipRuntimeSkillEntries — the relative candidates in
adapter-utils don't resolve to the PVC-mounted skills home
- Read entry.source/SKILL.md instead of entry.source (which is a
directory path); fall back to source directly for file-based entries
- Mock readPaperclipRuntimeSkillEntries in execute.test.ts to prevent
real SKILL.md reads from delaying fake-timer registration
Session fix (job-manifest.ts, FAR-56):
- Gate --session flag on asBoolean(config.resumeLastSession, true)
so setting resumeLastSession: false actually stops session resumption
- Default true preserves existing behaviour for agents without config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TypeScript CFA does not trace the assignment inside the vi.spyOn
mockImplementation callback, so it narrows capturedHandler to null at
the if-check, making the body unreachable (never). Cast at the call
site breaks the false narrowing without changing runtime behaviour.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Add agentCreationMutex (Map<agentId, Promise>) that serializes
guard-check + job-create per agent, eliminating the TOCTOU race where
two concurrent execute() calls both pass the list-then-create check.
- Change catch {} on listNamespacedJob errors to return
errorCode: "k8s_concurrency_guard_unreachable" (fail-closed) instead
of silently bypassing the concurrency guard.
- Add ensureSigtermHandler() which tracks active Jobs in activeJobs Map
and deletes all of them (plus prompt Secrets) on SIGTERM before exit.
- Track orphaned-job reattaches in activeJobs for consistent cleanup.
- Update execute.test.ts: change "proceeds on list error" test to assert
k8s_concurrency_guard_unreachable; add mutex serialization test and
SIGTERM handler registration tests.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Add describe block "execute — large-prompt Secret path" with 5 cases:
buildJobManifest called twice (promptSecretName on second call),
Secret created before Job, ownerReference patched after Job creation,
Secret deleted in finally block, Secret cleaned up on Job create failure
- Update vi.mock for job-manifest to export LARGE_PROMPT_THRESHOLD_BYTES
- Add createNamespacedSecret/deleteNamespacedSecret/patchNamespacedSecret
to makeCoreApi for completeness
- Update makeBatchApi to return { metadata: { uid } } so ownerRef tests work
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- SelfPodInfo gains inheritedEnvValueFrom (V1EnvVar[]) and inheritedEnvFrom (V1EnvFromSource[])
- Container selection now prefers the container named "paperclip", falls back to first
- buildJobManifest appends valueFrom env vars (skipping names already overridden)
and sets envFrom on the opencode container when present
- Tests updated: mock updated, 5 new cases covering secretKeyRef forwarding,
dedup, envFrom passthrough, and empty-envFrom omission
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Replace getPodExitCode with getPodTerminatedInfo to capture exit code
and reason (OOMKilled, Error, etc.) from terminated container state;
pod failure description now surfaces in returned errorMessage
- Add partial-stdout fallback: readPodLogs is triggered when stdout is
non-empty but contains no sessionId (missing session result), not just
when stdout is fully empty
- Detect empty LLM response: when a session ran but produced 0 output
tokens and no messages, return errorCode "llm_api_error"
- Add 13 new unit tests covering all three new paths
Co-Authored-By: Paperclip <noreply@paperclip.ing>