The cancel poller was calling GET /api/heartbeat-runs/{runId}, which
returned 401 because the adapter key lacks access to the internal
heartbeat-runs endpoint. Switch to GET /api/issues/{issueId}, which
the adapter key can read. Also tighten the trigger condition from
status !== "running" to status === "cancelled" so that other terminal
states (done, blocked, etc.) do not abort the K8s job.
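
A minimal sketch of the tightened trigger, assuming the issue payload
exposes a string `status` field. `shouldAbortJob` and `shouldAbortJobOld`
are hypothetical names for illustration, not the adapter's functions:

```typescript
// Hypothetical helper: decides whether the cancel poller aborts the K8s job.
// Only an explicit "cancelled" status triggers the abort; other terminal
// states such as "done" or "blocked" leave the job alone.
function shouldAbortJob(issueStatus: string): boolean {
  return issueStatus === "cancelled";
}

// The previous condition, for comparison: any non-running status aborted.
function shouldAbortJobOld(issueStatus: string): boolean {
  return issueStatus !== "running";
}
```
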
Co-Authored-By: Paperclip <noreply@paperclip.ing>
When multiple tasks are assigned simultaneously, only one K8s job can run
at a time (shared PVC/session guard). Previously, all other tasks received
k8s_concurrent_run_blocked immediately and stayed blocked forever.
Now the guard retries once: wait for all blocking jobs to complete via
waitForJobCompletion, then re-check before proceeding to create a new job.
If the re-check still shows a running job, the error is returned as before.
The agentCreationMutex already serializes guard-check + job-create, so
tasks naturally queue up and execute one at a time without concurrent jobs.
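
The retry-once flow can be sketched roughly as below. The dependency
names in `GuardDeps` are assumptions injected for illustration, and
waiting on the blocking jobs in parallel is a choice made here, not
necessarily what the adapter does:

```typescript
type GuardResult =
  | { ok: true }
  | { ok: false; errorCode: "k8s_concurrent_run_blocked" };

// Assumed dependencies, injected so the sketch stays self-contained.
interface GuardDeps {
  listRunningJobs: () => Promise<string[]>; // names of jobs still running
  waitForJobCompletion: (jobName: string) => Promise<void>;
}

async function concurrencyGuard(deps: GuardDeps): Promise<GuardResult> {
  let running = await deps.listRunningJobs();
  if (running.length === 0) return { ok: true };

  // Retry once: wait for every blocking job, then re-check.
  await Promise.all(running.map((name) => deps.waitForJobCompletion(name)));
  running = await deps.listRunningJobs();

  // Still blocked after the single retry: return the error as before.
  return running.length === 0
    ? { ok: true }
    : { ok: false, errorCode: "k8s_concurrent_run_blocked" };
}
```
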
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Two bugs prevented skill content from reaching K8s Job prompts, and
resumeLastSession: false was silently ignored.
Skills fix (execute.ts, FAR-57):
- Add /paperclip/.claude/skills as an additional candidate to
readPaperclipRuntimeSkillEntries, because the relative candidates in
adapter-utils don't resolve to the PVC-mounted skills home
- Read entry.source/SKILL.md instead of entry.source (which is a
directory path); fall back to source directly for file-based entries
- Mock readPaperclipRuntimeSkillEntries in execute.test.ts to prevent
real SKILL.md reads from delaying fake-timer registration
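
A rough sketch of the directory-vs-file fallback. `resolveSkillFile` is
a hypothetical helper, and `isDirectory` is injected (it could be backed
by fs.stat) so the example performs no I/O; the real code may instead
attempt the read and fall back on failure:

```typescript
// Hypothetical helper: picks the file to read for a skill entry.
function resolveSkillFile(
  source: string,
  isDirectory: (path: string) => boolean,
): string {
  if (isDirectory(source)) {
    // Directory-based entry: the content lives in SKILL.md inside it.
    return source.replace(/\/+$/, "") + "/SKILL.md";
  }
  // File-based entry: read the source path directly.
  return source;
}
```
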
Session fix (job-manifest.ts, FAR-56):
- Gate --session flag on asBoolean(config.resumeLastSession, true)
so setting resumeLastSession: false actually stops session resumption
- Default true preserves existing behaviour for agents without config
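
A sketch of the gating logic. The `asBoolean` below is a local stand-in
whose behaviour is assumed (the adapter's helper may coerce more input
types), and `sessionArgs` is a hypothetical name:

```typescript
// Stand-in for the adapter's asBoolean helper; real behaviour may differ.
function asBoolean(value: unknown, fallback: boolean): boolean {
  return typeof value === "boolean" ? value : fallback;
}

// Hypothetical: returns the CLI args used for session resumption.
function sessionArgs(
  config: { resumeLastSession?: unknown },
  sessionId: string | null,
): string[] {
  // Default true keeps existing behaviour for agents without the setting.
  const resume = asBoolean(config.resumeLastSession, true);
  return resume && sessionId ? ["--session", sessionId] : [];
}
```
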
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The K8s log client v1.x closes the follow-stream prematurely due to a
known upstream bug, so the grace timer fires 30 s after the log stream
exits even when the container is still running. The previous fallback
(`waitForPodTermination` with a hardcoded 120 s timeout) was too short
for agents whose opencode runs take several minutes, leading to premature
failures and issues stuck in `blocked`.
Fix: the grace poller now calls `readNamespacedPod` before resolving the
completion promise. If the pod is still Running/Pending, it resets
`logExitTime` to defer the grace deadline. A `graceCheckPending` guard
prevents concurrent checks. A `graceMaxWaitMs` cap (= completionTimeoutMs
when set, 20 min otherwise) ensures we never wait forever for unlimited
jobs. Version bumped to 0.1.21.
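
The deferral decision can be sketched as a pure function. This is an
illustration under assumptions: `graceTick`, `GraceState` and `GRACE_MS`
are hypothetical names, the pod phase is passed in rather than fetched
with `readNamespacedPod`, and the `graceCheckPending` guard is left out:

```typescript
type PodPhase = "Pending" | "Running" | "Succeeded" | "Failed" | "Unknown";

interface GraceState {
  firstLogExitTime: number; // when the log stream first exited (ms)
  logExitTime: number; // current base for the grace deadline (ms)
}

const GRACE_MS = 30000;

// "wait": grace window still open. "defer": pod still alive, deadline
// pushed out. "resolve": complete the promise.
function graceTick(
  state: GraceState,
  now: number,
  phase: PodPhase,
  graceMaxWaitMs: number,
): "wait" | "defer" | "resolve" {
  if (now - state.logExitTime < GRACE_MS) return "wait";
  const capReached = now - state.firstLogExitTime >= graceMaxWaitMs;
  if (!capReached && (phase === "Running" || phase === "Pending")) {
    state.logExitTime = now; // reset to defer the grace deadline
    return "defer";
  }
  return "resolve";
}
```
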
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The @kubernetes/client-node v1.x Log.follow stream closes prematurely
(known upstream TODO). Combined with Node.js buffering stdout to pipes,
the live log stream always returns empty. When the 30s grace timer fires
and the stream is empty, the container may still be running.
Add waitForPodTermination() to block in the empty-stdout fallback path
until the container actually exits (up to 120s), then read its complete
output with readNamespacedPodLog. This makes runs complete successfully
instead of looping indefinitely in in_progress.
Bump version to 0.1.20.
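
A sketch of the bounded wait, with the K8s calls replaced by injected
functions. `waitForPodTerminationSketch`, `isTerminated`, `sleep` and the
2 s poll interval are assumptions for illustration; only the 120 s bound
comes from this change:

```typescript
// Polls until the container has exited or the time budget is spent.
// Elapsed time is counted from the poll interval to keep the sketch
// deterministic; a real implementation would likely use a clock.
async function waitForPodTerminationSketch(
  isTerminated: () => Promise<boolean>,
  sleep: (ms: number) => Promise<void>,
  timeoutMs = 120000,
  pollMs = 2000,
): Promise<boolean> {
  let waited = 0;
  while (waited < timeoutMs) {
    if (await isTerminated()) return true;
    await sleep(pollMs);
    waited += pollMs;
  }
  return false; // timed out; the caller decides how to proceed
}
```
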
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Add agentCreationMutex (Map<agentId, Promise>) that serializes
guard-check + job-create per agent, eliminating the TOCTOU race where
two concurrent execute() calls both pass the list-then-create check.
- Change catch {} on listNamespacedJob errors to return
errorCode: "k8s_concurrency_guard_unreachable" (fail-closed) instead
of silently bypassing the concurrency guard.
- Add ensureSigtermHandler() which tracks active Jobs in activeJobs Map
and deletes all of them (plus prompt Secrets) on SIGTERM before exit.
- Track orphaned-job reattaches in activeJobs for consistent cleanup.
- Update execute.test.ts: change "proceeds on list error" test to assert
k8s_concurrency_guard_unreachable; add mutex serialization test and
SIGTERM handler registration tests.
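
One common way to build a per-key mutex from a Map of promises is a
promise chain, sketched below. `withAgentLock` is a hypothetical wrapper
name, and the actual adapter code may be structured differently:

```typescript
// Per-agent promise chain: each caller waits for the previous caller
// with the same agentId before running its critical section.
const agentCreationMutex = new Map<string, Promise<void>>();

async function withAgentLock<T>(
  agentId: string,
  fn: () => Promise<T>,
): Promise<T> {
  const previous = agentCreationMutex.get(agentId) ?? Promise.resolve();
  let release!: () => void;
  const current = new Promise<void>((resolve) => {
    release = resolve;
  });
  const tail = previous.then(() => current);
  agentCreationMutex.set(agentId, tail);

  await previous; // guard-check + job-create run only after this point
  try {
    return await fn();
  } finally {
    release();
    // Drop the entry if no later caller has queued behind this one.
    if (agentCreationMutex.get(agentId) === tail) {
      agentCreationMutex.delete(agentId);
    }
  }
}
```

Because the list-then-create sequence runs inside `fn`, a second
execute() call for the same agent cannot list jobs until the first has
finished creating its job.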
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Split streamPodLogs into streamPodLogsOnce (with bail timer + stopSignal)
and streamPodLogs (reconnect loop, up to MAX_LOG_RECONNECT_ATTEMPTS=50)
- LogLineDedupFilter suppresses replayed JSONL events on reconnect, keyed
by type+sessionID+part.id (OpenCode shape)
- Bail timer (LOG_STREAM_BAIL_TIMEOUT_MS=3s) forces writable.destroy() +
promise resolution when stopSignal fires and logApi.log hangs
- Keepalive: emits '[paperclip] keepalive — job X running (Ns since last output)'
every 15s during silent phases, with 2-consecutive-reading latch to avoid
false-positive terminal detections
- completionGraced uses logExitTime plus the grace poller, so the log
stream stop signal is set as soon as the job condition resolves
- All 235 tests pass, tsc clean
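
A sketch of the dedup idea. The key fields (type, sessionID, part.id)
come from the description above; the class name `LogLineDedupSketch`,
the pass-through rule for lines without a stable key, and the key
separator are assumptions:

```typescript
class LogLineDedupSketch {
  private seen = new Set<string>();

  // Returns true when the line should be emitted, false for a replay.
  accept(line: string): boolean {
    let event: { type?: unknown; sessionID?: unknown; part?: { id?: unknown } };
    try {
      event = JSON.parse(line);
    } catch {
      return true; // not JSON: pass through untouched
    }
    const partId = event?.part?.id;
    if (typeof event?.type !== "string" || typeof partId !== "string") {
      return true; // no stable key, so never suppress
    }
    const key = `${event.type}|${String(event.sessionID)}|${partId}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```
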
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- SelfPodInfo gains inheritedEnvValueFrom (V1EnvVar[]) and inheritedEnvFrom (V1EnvFromSource[])
- Container selection now prefers the container named "paperclip" and falls back to the first container
- buildJobManifest appends valueFrom env vars (skipping names already overridden)
and sets envFrom on the opencode container when present
- Tests updated: mock updated, 5 new cases covering secretKeyRef forwarding,
dedup, envFrom passthrough, and empty-envFrom omission
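
The merge rules above might look roughly like this. `EnvVar` is a
minimal local stand-in for V1EnvVar, and `mergeInheritedEnv` and
`containerEnvFrom` are hypothetical helpers, not the code inside
buildJobManifest:

```typescript
// Minimal local shape standing in for V1EnvVar.
interface EnvVar {
  name: string;
  value?: string;
  valueFrom?: object;
}

// Appends inherited valueFrom vars, skipping names already present.
function mergeInheritedEnv(
  explicit: EnvVar[],
  inheritedValueFrom: EnvVar[],
): EnvVar[] {
  const taken = new Set(explicit.map((e) => e.name));
  const merged = [...explicit];
  for (const e of inheritedValueFrom) {
    if (taken.has(e.name)) continue; // overridden explicitly or duplicated
    taken.add(e.name);
    merged.push(e);
  }
  return merged;
}

// envFrom is only set when there is something to forward.
function containerEnvFrom(inheritedEnvFrom: object[]): { envFrom?: object[] } {
  return inheritedEnvFrom.length > 0 ? { envFrom: inheritedEnvFrom } : {};
}
```
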
Co-Authored-By: Paperclip <noreply@paperclip.ing>
When both a JSONL error (e.g. "killed") and a pod terminated reason (e.g. "OOMKilled")
are present, join them with "; " so the richer pod classification is never silently
dropped by the parsedError short-circuit.
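
A minimal sketch of the joining rule; `combineErrors` is a hypothetical
name and the null handling is an assumption:

```typescript
// Joins the JSONL error and the pod terminated reason with "; " so
// neither is dropped when both are present.
function combineErrors(
  parsedError: string | null,
  podReason: string | null,
): string | null {
  const parts = [parsedError, podReason].filter((p): p is string => !!p);
  return parts.length > 0 ? parts.join("; ") : null;
}
```
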
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Replace getPodExitCode with getPodTerminatedInfo to capture exit code
and reason (OOMKilled, Error, etc.) from terminated container state;
pod failure description now surfaces in returned errorMessage
- Add partial-stdout fallback: readPodLogs is triggered when stdout is
non-empty but contains no sessionId (missing session result), not just
when stdout is fully empty
- Detect empty LLM response: when a session ran but produced 0 output
tokens and no messages, return errorCode "llm_api_error"
- Add 13 new unit tests covering all three new paths
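
The two new conditions can be sketched as predicates. `RunSummary`,
`needsPodLogFallback` and `classifyEmptyResponse` are hypothetical names,
and the field names are assumptions about how the parsed result is
represented:

```typescript
interface RunSummary {
  sessionId: string | null;
  outputTokens: number;
  messageCount: number;
}

// readPodLogs fallback: stdout is empty, or has content but no session result.
function needsPodLogFallback(stdout: string, sessionId: string | null): boolean {
  return stdout.trim() === "" || sessionId === null;
}

// Empty LLM response: a session ran but produced no tokens and no messages.
function classifyEmptyResponse(run: RunSummary): "llm_api_error" | null {
  const sessionRan = run.sessionId !== null;
  return sessionRan && run.outputTokens === 0 && run.messageCount === 0
    ? "llm_api_error"
    : null;
}
```
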
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Replace joinPromptSections, stringifyPaperclipWakePayload, and
renderPaperclipWakePrompt with imports from adapter-utils/server-utils
(the fork's renderPaperclipWakePrompt adds execution stage routing,
resume delta sections, and full comment batch rendering)
- Replace local inferOpenAiCompatibleBiller with import from adapter-utils
- Declare sessionManagement using getAdapterSessionManagement("opencode_local")
with fallback defaults for proper session compaction policy
- Add log redaction via redactHomePathUserSegments in streamPodLogs
- Bump peerDependency to >=0.3.1 and version to 0.1.14
Co-Authored-By: Paperclip <noreply@paperclip.ing>