paperclip-adapter-opencode-k8s

farhoodlabs/paperclip-adapter-opencode-k8s

Author	SHA1	Message	Date
Chris Farhood	5fa9e1396e	fix: poll issue status instead of heartbeat-run for cancel detection (FAR-60) The cancel poller was calling GET /api/heartbeat-runs/{runId}, which returned 401 because the adapter key lacks access to the internal heartbeat-runs endpoint. Switch to GET /api/issues/{issueId}, which the adapter key can read. Also tighten the trigger condition from status !== "running" to status === "cancelled" so that other terminal states (done, blocked, etc.) do not abort the K8s job. Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-25 12:25:41 +00:00
Chris Farhood	80d18005f9	fix: wait for concurrent job to finish instead of returning permanent blocked error (FAR-61) When multiple tasks are assigned simultaneously, only one K8s job can run at a time (shared PVC/session guard). Previously, all other tasks received k8s_concurrent_run_blocked immediately and stayed blocked forever. Now the guard retries once: wait for all blocking jobs to complete via waitForJobCompletion, then re-check before proceeding to create a new job. If the re-check still shows a running job, the error is returned as before. The agentCreationMutex already serializes guard-check + job-create, so tasks naturally queue up and execute one at a time without concurrent jobs. Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-25 11:10:27 +00:00
Chris Farhood	2bd8107f1d	fix: skills not bundled and resumeLastSession ignored (FAR-56, FAR-57) Fixes two bugs: skill content never reached K8s Job prompts, and resumeLastSession: false was silently ignored. Skills fix (execute.ts, FAR-57): - Add /paperclip/.claude/skills as an additional candidate to readPaperclipRuntimeSkillEntries, since the relative candidates in adapter-utils don't resolve to the PVC-mounted skills home - Read entry.source/SKILL.md instead of entry.source (which is a directory path); fall back to source directly for file-based entries - Mock readPaperclipRuntimeSkillEntries in execute.test.ts to prevent real SKILL.md reads from delaying fake-timer registration Session fix (job-manifest.ts, FAR-56): - Gate the --session flag on asBoolean(config.resumeLastSession, true) so setting resumeLastSession: false actually stops session resumption - Defaulting to true preserves the existing behaviour for agents without this config Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 10:11:47 +00:00
Chris Farhood	0e94e84e2c	fix: grace poller checks pod liveness before giving up on log stream (FAR-52) The K8s log client v1.x closes the follow-stream prematurely due to a known upstream bug, causing the grace timer to fire 30 s after the log stream exits even when the container is still running. The old fallback (`waitForPodTermination` with a hardcoded 120 s timeout) gave up too early for agents whose opencode runs take several minutes, leading to premature failure and issues stuck in `blocked`. Fix: the grace poller now calls `readNamespacedPod` before resolving the completion promise. If the pod is still Running/Pending, it resets `logExitTime` to defer the grace deadline. A `graceCheckPending` guard prevents concurrent checks. A `graceMaxWaitMs` cap (completionTimeoutMs when set, 20 min otherwise) ensures we never wait forever on jobs without a completion timeout. Version bumped to 0.1.21. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 01:40:21 +00:00
Chris Farhood	2625b8ffb3	fix: wait for pod termination before readPodLogs fallback (FAR-52) The @kubernetes/client-node v1.x Log.follow stream closes prematurely (known upstream TODO). Combined with Node.js buffering stdout to pipes, the live log stream always returns empty. When the 30s grace timer fires and the stream is empty, the container may still be running. Add waitForPodTermination() to block in the empty-stdout fallback path until the container actually exits (up to 120s), then read its complete output with readNamespacedPodLog. This makes runs complete successfully instead of looping indefinitely in in_progress. Bump version to 0.1.20. Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-25 01:21:40 +00:00
Chris Farhood	2b4049464c	feat: per-agent mutex, fail-closed guard, SIGTERM cleanup (FAR-40) - Add agentCreationMutex (Map<agentId, Promise>) that serializes guard-check + job-create per agent, eliminating the TOCTOU race where two concurrent execute() calls both pass the list-then-create check. - Change catch {} on listNamespacedJob errors to return errorCode: "k8s_concurrency_guard_unreachable" (fail-closed) instead of silently bypassing the concurrency guard. - Add ensureSigtermHandler() which tracks active Jobs in activeJobs Map and deletes all of them (plus prompt Secrets) on SIGTERM before exit. - Track orphaned-job reattaches in activeJobs for consistent cleanup. - Update execute.test.ts: change "proceeds on list error" test to assert k8s_concurrency_guard_unreachable; add mutex serialization test and SIGTERM handler registration tests. Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-25 00:22:17 +00:00
Chris Farhood	c05d1d7515	feat: log stream reconnect, dedup, bail, keepalive (FAR-38) - Split streamPodLogs into streamPodLogsOnce (with bail timer + stopSignal) and streamPodLogs (reconnect loop, up to MAX_LOG_RECONNECT_ATTEMPTS=50) - LogLineDedupFilter suppresses replayed JSONL events on reconnect, keyed by type+sessionID+part.id (OpenCode shape) - Bail timer (LOG_STREAM_BAIL_TIMEOUT_MS=3s) forces writable.destroy() + promise resolution when stopSignal fires and logApi.log hangs - Keepalive: emits '[paperclip] keepalive — job X running (Ns since last output)' every 15s during silent phases, with a 2-consecutive-reading latch to avoid false-positive terminal detections - completionGraced uses logExitTime + the grace poller so the log stream stop signal is set as soon as the job condition resolves - All 235 tests pass, tsc clean Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-24 22:14:36 +00:00
Chris Farhood	61d2a42a66	feat: inherit valueFrom/envFrom env from Deployment; prefer paperclip container - SelfPodInfo gains inheritedEnvValueFrom (V1EnvVar[]) and inheritedEnvFrom (V1EnvFromSource[]) - Container selection now prefers the container named "paperclip", falls back to first - buildJobManifest appends valueFrom env vars (skipping names already overridden) and sets envFrom on the opencode container when present - Tests updated: mock updated, 5 new cases covering secretKeyRef forwarding, dedup, envFrom passthrough, and empty-envFrom omission Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-24 22:12:31 +00:00
Chris Farhood	84dc0f5930	fix: merge parsedError + podFailureDescription so OOMKilled surfaces in errorMessage When both a JSONL error (e.g. "killed") and a pod terminated reason (e.g. "OOMKilled") are present, join them with "; " so the richer pod classification is never silently dropped by the parsedError short-circuit. Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-24 22:10:42 +00:00
Chris Farhood	d60afaebcd	feat: pod-failure classification, partial stdout fallback, llm_api_error - Replace getPodExitCode with getPodTerminatedInfo to capture exit code and reason (OOMKilled, Error, etc.) from terminated container state; pod failure description now surfaces in returned errorMessage - Add partial-stdout fallback: readPodLogs is triggered when stdout is non-empty but contains no sessionId (missing session result), not just when stdout is fully empty - Detect empty LLM response: when a session ran but produced 0 output tokens and no messages, return errorCode "llm_api_error" - Add 13 new unit tests covering all three new paths Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-24 22:09:33 +00:00
Chris Farhood	13c2a3032b	feat: UI parser kinds, nodeSelector textarea, step-limit session clear, per-line path redaction - ui-parser: add thinking kind + handler for standalone thinking events, thinking blocks in assistant content arrays, and user-turn tool_result blocks - job-manifest: parseKeyValueOrObject helper so nodeSelector (and labels) accept key=value textarea lines in addition to JSON objects - parse: isOpenCodeStepLimitResult detects step_finish with max_turns / max_steps / step_limit reason - execute: return clearSession:true when step limit reached so next run starts fresh; redactHomePathUserSegments moved to per-line to prevent paths split across chunks - tests: ui-parser.test.ts (new), extended parse.test.ts and job-manifest.test.ts Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-24 22:01:35 +00:00
Pawla Abdul	e53bcf2501	Replace local utility stubs with fork's adapter-utils imports - Replace joinPromptSections, stringifyPaperclipWakePayload, and renderPaperclipWakePrompt with imports from adapter-utils/server-utils (the fork's renderPaperclipWakePrompt adds execution stage routing, resume delta sections, and full comment batch rendering) - Replace local inferOpenAiCompatibleBiller with import from adapter-utils - Declare sessionManagement using getAdapterSessionManagement("opencode_local") with fallback defaults for proper session compaction policy - Add log redaction via redactHomePathUserSegments in streamPodLogs - Bump peerDependency to >=0.3.1 and version to 0.1.14 Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-14 11:00:33 +00:00
Chris Farhood	be7c525063	Initial commit Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 23:08:05 -04:00

13 Commits
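The cancel-detection rule from 5fa9e1396e (FAR-60) can be sketched as a small poller. This is a minimal illustration, not the adapter's code: `fetchIssueStatus` (assumed to wrap GET /api/issues/{issueId}), `onCancel` (assumed to delete the Job) and the polling interval are hypothetical. The point it shows is that only `cancelled` aborts the Job, and a failed poll is never read as a cancellation.

```typescript
// Hypothetical callback types; the real adapter's client may differ.
type IssueStatusFetcher = () => Promise<string>;

// Only an explicit cancellation aborts the K8s Job; other terminal
// states such as "done" or "blocked" leave it alone.
function shouldAbortJob(issueStatus: string): boolean {
  return issueStatus === "cancelled";
}

// Polls every `intervalMs` and calls `onCancel` at most once.
// Returns a function that stops the poller.
function startCancelPoller(
  fetchIssueStatus: IssueStatusFetcher, // assumed wrapper around GET /api/issues/{issueId}
  onCancel: () => void, // assumed: deletes the Job
  intervalMs: number,
): () => void {
  let stopped = false;
  const tick = async (): Promise<void> => {
    if (stopped) return;
    let issueStatus: string | null = null;
    try {
      issueStatus = await fetchIssueStatus();
    } catch {
      // A failed poll (network error, 401, ...) is not a cancellation.
    }
    if (stopped) return;
    if (issueStatus !== null && shouldAbortJob(issueStatus)) {
      stopped = true;
      onCancel();
      return;
    }
    setTimeout(() => void tick(), intervalMs);
  };
  setTimeout(() => void tick(), intervalMs);
  return () => {
    stopped = true;
  };
}
```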
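The "wait once, then re-check" guard from 80d18005f9 (FAR-61), combined with the fail-closed behaviour from 2b4049464c (FAR-40), might look roughly like this. Only the two error codes and the name `waitForJobCompletion` come from the commit messages; `listRunningJobs` and both callback signatures are assumptions made for illustration.

```typescript
type GuardResult =
  | { ok: true }
  | { ok: false; errorCode: "k8s_concurrent_run_blocked" | "k8s_concurrency_guard_unreachable" };

async function checkConcurrencyGuard(
  listRunningJobs: () => Promise<string[]>, // assumed: names of running Jobs on the shared PVC
  waitForJobCompletion: (jobName: string) => Promise<void>, // assumed signature
): Promise<GuardResult> {
  // Fail closed: if the Job list cannot be read, do not create a new Job.
  const list = async (): Promise<string[] | null> => {
    try {
      return await listRunningJobs();
    } catch {
      return null;
    }
  };

  const running = await list();
  if (running === null) return { ok: false, errorCode: "k8s_concurrency_guard_unreachable" };
  if (running.length === 0) return { ok: true };

  // Retry once: wait for every blocking Job (failed waits fall through), then re-check.
  await Promise.allSettled(running.map((jobName) => waitForJobCompletion(jobName)));
  const stillRunning = await list();
  if (stillRunning === null) return { ok: false, errorCode: "k8s_concurrency_guard_unreachable" };
  return stillRunning.length === 0
    ? { ok: true }
    : { ok: false, errorCode: "k8s_concurrent_run_blocked" };
}
```

In the adapter this check runs inside the per-agent mutex, which is what makes queued tasks execute one at a time.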
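The session gate from 2bd8107f1d (FAR-56) reduces to "emit `--session` only when resumption is enabled, defaulting to true". The `asBoolean` below is a hypothetical stand-in for the adapter's helper (which may accept other spellings), and `sessionArgs` only builds the session-related arguments, not the full opencode command line.

```typescript
// Hypothetical coercion helper: booleans pass through, "true"/"false"
// strings are recognised, anything else falls back to the default.
function asBoolean(value: unknown, fallback: boolean): boolean {
  if (typeof value === "boolean") return value;
  if (value === "true") return true;
  if (value === "false") return false;
  return fallback;
}

// Returns only the session-related CLI arguments.
function sessionArgs(config: Record<string, unknown>, lastSessionId: string | null): string[] {
  // Default true keeps the old behaviour for agents without this setting.
  const resume = asBoolean(config["resumeLastSession"], true);
  return resume && lastSessionId ? ["--session", lastSessionId] : [];
}
```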
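The grace-poller fix in 0e94e84e2c (FAR-52) can be modelled as a small state machine: after the log stream exits, wait out the grace period, then check the pod, and defer again while it is Running or Pending, up to a hard cap. This sketch reuses the names `logExitTime`, `graceCheckPending` and `graceMaxWaitMs` from the commit, but `readPodPhase` (a stand-in for a `readNamespacedPod` call), the injectable clock, and measuring the cap from the first log exit are assumptions.

```typescript
const DEFAULT_GRACE_MAX_WAIT_MS = 20 * 60 * 1000; // 20 min when no completion timeout is set

function graceMaxWait(completionTimeoutMs?: number): number {
  return completionTimeoutMs !== undefined && completionTimeoutMs > 0
    ? completionTimeoutMs
    : DEFAULT_GRACE_MAX_WAIT_MS;
}

class GracePoller {
  private logExitTime: number | null = null; // reset while the pod is still alive
  private firstLogExitTime: number | null = null; // start of the capped wait (assumption)
  private graceCheckPending = false;
  private readonly readPodPhase: () => Promise<string>;
  private readonly graceMs: number;
  private readonly graceMaxWaitMs: number;
  private readonly now: () => number;

  constructor(
    readPodPhase: () => Promise<string>,
    graceMs: number,
    graceMaxWaitMs: number,
    now: () => number = Date.now,
  ) {
    this.readPodPhase = readPodPhase;
    this.graceMs = graceMs;
    this.graceMaxWaitMs = graceMaxWaitMs;
    this.now = now;
  }

  markLogExit(): void {
    const t = this.now();
    this.logExitTime = t;
    if (this.firstLogExitTime === null) this.firstLogExitTime = t;
  }

  // Resolves true when the run should be treated as finished.
  async tick(): Promise<boolean> {
    if (this.logExitTime === null || this.graceCheckPending) return false;
    const t = this.now();
    if (t - this.logExitTime < this.graceMs) return false;
    this.graceCheckPending = true; // prevents overlapping pod reads
    try {
      const phase = await this.readPodPhase();
      const alive = phase === "Running" || phase === "Pending";
      const capped = this.firstLogExitTime !== null && t - this.firstLogExitTime >= this.graceMaxWaitMs;
      if (alive && !capped) {
        this.logExitTime = t; // defer the grace deadline
        return false;
      }
      return true;
    } finally {
      this.graceCheckPending = false;
    }
  }
}
```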
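The per-agent mutex from 2b4049464c (FAR-40) is described as a `Map<agentId, Promise>`. One common way to build such a lock is to chain each caller onto the previous caller's promise, as sketched below. Only the name `agentCreationMutex` and its shape come from the commit; `withAgentLock` and the cleanup rule are illustrative.

```typescript
const agentCreationMutex = new Map<string, Promise<void>>();

async function withAgentLock<T>(agentId: string, critical: () => Promise<T>): Promise<T> {
  const previous = agentCreationMutex.get(agentId) ?? Promise.resolve();
  let release: () => void = () => {};
  const current = new Promise<void>((resolve) => {
    release = () => resolve();
  });
  // The next caller for this agent waits on `tail`, which settles only after we release.
  const tail = previous.then(() => current);
  agentCreationMutex.set(agentId, tail);

  await previous;
  try {
    return await critical(); // e.g. guard check + Job creation
  } finally {
    release();
    // Drop the entry if nobody queued behind us, so the Map does not grow forever.
    if (agentCreationMutex.get(agentId) === tail) agentCreationMutex.delete(agentId);
  }
}
```

Because the list-then-create sequence runs entirely inside `critical`, two concurrent calls for the same agent can no longer both pass the check before either creates a Job.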
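Two pieces of c05d1d7515 (FAR-38) are easy to isolate: the replay filter keyed by `type+sessionID+part.id`, and the 2-consecutive-reading latch. The sketch follows the keying stated in the commit; letting non-JSON lines and events without a `part.id` pass through untouched, and the `TerminalLatch` name, are assumptions.

```typescript
class LogLineDedupFilter {
  private readonly seen = new Set<string>();

  // Returns true if the line should be forwarded, false if it is a replay.
  accept(line: string): boolean {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      return true; // non-JSON output is never deduplicated
    }
    if (typeof parsed !== "object" || parsed === null) return true;
    const event = parsed as { type?: unknown; sessionID?: unknown; part?: { id?: unknown } | null };
    const partId = event.part?.id;
    if (typeof event.type !== "string" || typeof partId !== "string") return true;
    const key = `${event.type}|${String(event.sessionID ?? "")}|${partId}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

// Reports "terminal" only after two consecutive terminal readings, so a
// single stale or transient Job status does not end the run early.
class TerminalLatch {
  private streak = 0;

  observe(looksTerminal: boolean): boolean {
    this.streak = looksTerminal ? this.streak + 1 : 0;
    return this.streak >= 2;
  }
}
```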
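The env inheritance in 61d2a42a66 boils down to two rules: prefer the container named `paperclip`, and append inherited `valueFrom` variables only when the name is not already set. To stay self-contained, this sketch uses minimal local interfaces instead of the real `V1EnvVar`/`V1EnvFromSource` types from `@kubernetes/client-node`; `mergeInheritedEnv` and `pickContainer` are hypothetical names.

```typescript
// Minimal local stand-ins for the client-node V1EnvVar / V1EnvFromSource shapes.
interface EnvVar {
  name: string;
  value?: string;
  valueFrom?: unknown;
}
interface EnvFromSource {
  configMapRef?: { name: string };
  secretRef?: { name: string };
  prefix?: string;
}
interface ContainerEnv {
  env: EnvVar[];
  envFrom?: EnvFromSource[];
}

// Prefer the container named "paperclip", otherwise fall back to the first one.
function pickContainer<T extends { name: string }>(containers: T[]): T | undefined {
  return containers.find((c) => c.name === "paperclip") ?? containers[0];
}

function mergeInheritedEnv(
  overrides: EnvVar[],
  inheritedValueFrom: EnvVar[],
  inheritedEnvFrom: EnvFromSource[],
): ContainerEnv {
  const names = new Set(overrides.map((v) => v.name));
  const env = [...overrides];
  for (const v of inheritedValueFrom) {
    if (names.has(v.name)) continue; // an explicit override wins
    env.push(v);
    names.add(v.name);
  }
  const result: ContainerEnv = { env };
  if (inheritedEnvFrom.length > 0) result.envFrom = inheritedEnvFrom; // omit when empty
  return result;
}
```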
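Commits d60afaebcd and 84dc0f5930 describe how a failed run is classified and reported. The sketch below captures the decision rules as pure functions. The message formats, and the idea of passing the terminated state in as a plain object, are hypothetical; only the "; " join, the partial-stdout fallback condition and the zero-output `llm_api_error` condition come from the commit messages.

```typescript
interface PodTerminatedInfo {
  exitCode: number;
  reason?: string; // e.g. "OOMKilled", "Error"
}

function describePodFailure(info: PodTerminatedInfo | null): string | null {
  if (info === null || info.exitCode === 0) return null;
  return info.reason
    ? `pod terminated: ${info.reason} (exit code ${info.exitCode})`
    : `pod terminated with exit code ${info.exitCode}`;
}

// Join both sources with "; " so the pod classification is never dropped.
function buildErrorMessage(parsedError: string | null, podFailureDescription: string | null): string | null {
  const parts = [parsedError, podFailureDescription].filter((p): p is string => Boolean(p));
  return parts.length > 0 ? parts.join("; ") : null;
}

// Fall back to reading pod logs when stdout is empty OR lacks a session result.
function needsPodLogFallback(stdout: string, sessionId: string | null): boolean {
  return stdout.trim() === "" || sessionId === null;
}

// A session that ran but produced no output tokens and no messages -> "llm_api_error".
function isEmptyLlmResponse(sessionId: string | null, outputTokens: number, messageCount: number): boolean {
  return sessionId !== null && outputTokens === 0 && messageCount === 0;
}
```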
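Two helpers named in 13c2a3032b can be sketched from their descriptions: `parseKeyValueOrObject` accepts either a JSON object or `key=value` textarea lines, and `isOpenCodeStepLimitResult` flags a `step_finish` whose reason is a step limit. These are illustrations, not the adapter's code; skipping malformed lines and looking for `reason` both at the top level and under `part` are assumptions about the OpenCode event shape.

```typescript
function parseKeyValueOrObject(input: unknown): Record<string, string> {
  // Already an object (e.g. from JSON config): stringify the values.
  if (typeof input === "object" && input !== null && !Array.isArray(input)) {
    const result: Record<string, string> = {};
    for (const [key, value] of Object.entries(input)) result[key] = String(value);
    return result;
  }
  if (typeof input !== "string") return {};
  const text = input.trim();
  if (text.startsWith("{")) {
    try {
      return parseKeyValueOrObject(JSON.parse(text) as unknown);
    } catch {
      // Not valid JSON: fall through to key=value parsing.
    }
  }
  // One key=value pair per line; blank or malformed lines are skipped.
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}

const STEP_LIMIT_REASONS = new Set(["max_turns", "max_steps", "step_limit"]);

function isOpenCodeStepLimitResult(event: unknown): boolean {
  if (typeof event !== "object" || event === null) return false;
  const e = event as { type?: unknown; reason?: unknown; part?: { reason?: unknown } | null };
  if (e.type !== "step_finish") return false;
  const reason = e.reason ?? e.part?.reason; // assumed locations of the reason field
  return typeof reason === "string" && STEP_LIMIT_REASONS.has(reason);
}
```

When the second helper returns true, the commit has `execute` return `clearSession: true` so the next run starts fresh.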
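The reason 13c2a3032b moves `redactHomePathUserSegments` to per-line is that a home path split across two stream chunks escapes a regex applied chunk by chunk. Buffering to complete lines fixes that. In the sketch below, `redactHomePaths` is a hypothetical stand-in (the real helper lives in adapter-utils and its output format may differ), and `LineRedactor` is an illustrative name.

```typescript
// Hypothetical redactor: replaces the user segment of /home/<user> and /Users/<user>.
function redactHomePaths(line: string): string {
  return line.replace(/\/(home|Users)\/[^/\s]+/g, "/$1/[user]");
}

// Buffers chunks so redaction only ever sees complete lines.
class LineRedactor {
  private buffer = "";

  // Returns the redacted complete lines contained in the buffer so far.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line for later
    return lines.map(redactHomePaths);
  }

  // Emits whatever is left when the stream ends.
  flush(): string[] {
    const rest = this.buffer;
    this.buffer = "";
    return rest ? [redactHomePaths(rest)] : [];
  }
}
```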