fix: prevent process_lost when K8s Job completes (FAR-10) #9
Branch: `fix/far-10-process-lost-after-job-complete`
## Summary
Four stacked bugs in `src/server/execute.ts` caused the `claude_k8s` adapter to hang after a K8s Job completed, which let the 5-minute reaper mark the heartbeat run `failed`/`process_lost` even when the Job had succeeded.

**Bug 1: Log follow outlives the Job.**
`streamPodLogsOnce` had no way to abort an in-flight `logApi.log(..., follow: true)` call when the job-completion branch fired, so the follow stream kept running after the Job finished. Added a 200 ms polling loop that destroys the `Writable` once `stopSignal.stopped` is set, aborting the hung HTTP stream. `stopSignal` is now also threaded through `streamPodLogs` → `streamPodLogsOnce`.

**Bug 2: `waitForPod` entered the log-stream path on `phase=Failed`.** The original code treated `phase === "Failed"` the same as `Running`, returning the pod name and continuing into log streaming against a dead pod. It now throws a structured error built by `describePodTerminatedError` (a new exported helper), so callers get a real error that carries the exit code and reason.

**Bug 3: Terminated container state not surfaced.** The per-tick status detail loop omitted `cs.state?.terminated`, so operators saw only `phase=Failed` with no indication of why the `claude` container exited. Added the terminated case, including exit code and reason.

**Bug 4: Keepalive stopped refreshing `updatedAt` as soon as the Job went terminal.** Once `keepaliveJobTerminal = true`, `onSpawn` stopped firing, so any cleanup work after the Job went terminal (Job deletion, log parsing, K8s API calls) ran with a stale `updatedAt` and could trip the 5-minute reaper. Added a `POST_TERMINAL_KEEPALIVE_MS` (90 s) window during which `onSpawn` keeps refreshing at a lower cadence.

## Test plan
- `describePodTerminatedError` unit tests: `phase=Failed` with `claude` container (exit code + reason), fallback to `message` field, empty container list, non-claude containers, null `exitCode`
- … (`npm test`)

🤖 Generated with Claude Code
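For reviewers, a minimal sketch of the Bug 1 pattern (poll a stop flag, destroy the sink). It assumes the follow-mode log call writes into a Node `Writable` that the caller owns. `abortOnStop` and `StopSignal` are hypothetical names, not the code in this PR. Whether destroying the sink alone tears down the underlying HTTP request depends on how the Kubernetes client pipes the response, so treat that as an assumption as well. The sketch calls `destroy()` without an error argument so that no `'error'` listener is required.

```typescript
import { Writable } from "node:stream";

// Hypothetical stop-signal shape: the PR only says it exposes a boolean
// `stopped` flag that the job-completion branch sets.
interface StopSignal {
  stopped: boolean;
}

// Poll `signal.stopped` every `intervalMs` (200 ms in the PR) and destroy
// `sink` once it flips. Returns a cancel function that stops polling
// without touching the stream.
function abortOnStop(
  sink: Writable,
  signal: StopSignal,
  intervalMs = 200,
): () => void {
  const timer = setInterval(() => {
    if (signal.stopped && !sink.destroyed) {
      // Stand-in for aborting the follow-mode log request that writes here.
      sink.destroy();
    }
  }, intervalMs);

  // Stop polling as soon as the stream closes, whether it was destroyed
  // above or the log stream ended normally.
  const onClose = (): void => clearInterval(timer);
  sink.once("close", onClose);

  return () => {
    clearInterval(timer);
    sink.off("close", onClose);
  };
}
```

Polling a plain boolean keeps the job-completion branch decoupled from the stream. An `AbortController` passed down the same path would avoid the 200 ms latency, but that is a different design from what the PR describes.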
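For the Bug 2 fix, here is a hedged sketch of what a helper like `describePodTerminatedError` might do, written against the cases listed in the test plan. The real helper's signature is not shown in this PR. `describeTerminatedPod`, `podForLogStream`, and the `*Like` types are hypothetical; the field names (`phase`, `containerStatuses`, `state.terminated.exitCode`/`reason`/`message`) mirror the Kubernetes `PodStatus` and `ContainerStateTerminated` objects. Treating every phase other than `Running` and `Failed` as "keep waiting" is also an assumption.

```typescript
// Minimal local shapes mirroring the Kubernetes fields used here.
interface TerminatedStateLike {
  exitCode?: number | null;
  reason?: string;
  message?: string;
}

interface ContainerStatusLike {
  name: string;
  state?: { terminated?: TerminatedStateLike };
}

interface PodStatusLike {
  phase?: string;
  containerStatuses?: ContainerStatusLike[];
}

// Hypothetical stand-in for describePodTerminatedError: build an Error that
// names the phase plus the target container's exit code and reason.
function describeTerminatedPod(
  podName: string,
  status: PodStatusLike,
  containerName = "claude",
): Error {
  const cs = (status.containerStatuses ?? []).find((c) => c.name === containerName);
  const terminated = cs?.state?.terminated;
  const parts = [`pod ${podName} phase=${status.phase ?? "Unknown"}`];
  if (terminated) {
    // exitCode may be missing or null; report it as "unknown".
    parts.push(`${containerName} exitCode=${terminated.exitCode ?? "unknown"}`);
    // Prefer the short `reason`, fall back to the longer `message`.
    const why = terminated.reason ?? terminated.message;
    if (why) parts.push(`reason=${why}`);
  } else {
    // Empty list or only non-claude containers: say so explicitly.
    parts.push(`no terminated state for container "${containerName}"`);
  }
  return new Error(parts.join(", "));
}

// Phase handling in the spirit of the waitForPod fix: Running -> stream logs,
// Failed -> throw, anything else -> undefined so the caller keeps polling.
function podForLogStream(podName: string, status: PodStatusLike): string | undefined {
  if (status.phase === "Running") return podName;
  if (status.phase === "Failed") throw describeTerminatedPod(podName, status);
  return undefined;
}
```

For example, a `Failed` pod whose `claude` container was OOM-killed yields `pod job-1 phase=Failed, claude exitCode=137, reason=OOMKilled`, which is far more actionable than a log stream against a dead pod.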
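For Bug 3, a sketch of a per-tick status line that covers all three Kubernetes container states, including the `terminated` case that was missing. `containerStateDetail` and the local types are hypothetical and are not the loop in `execute.ts`. The output format is illustrative only.

```typescript
// Minimal local shape mirroring Kubernetes ContainerState: at most one of
// running / waiting / terminated is set at a time.
interface ContainerStateLike {
  running?: Record<string, unknown>;
  waiting?: { reason?: string };
  terminated?: { exitCode?: number | null; reason?: string; message?: string };
}

interface ContainerStatusLike {
  name: string;
  state?: ContainerStateLike;
}

// One line of per-tick status detail for a container. Terminated is checked
// first because it carries the exit code and reason operators need.
function containerStateDetail(cs: ContainerStatusLike): string {
  const state = cs.state;
  if (state?.terminated) {
    const t = state.terminated;
    const reason = t.reason ?? t.message ?? "unknown";
    return `${cs.name}: terminated exitCode=${t.exitCode ?? "unknown"} reason=${reason}`;
  }
  if (state?.waiting) {
    return `${cs.name}: waiting reason=${state.waiting.reason ?? "unknown"}`;
  }
  if (state?.running) return `${cs.name}: running`;
  return `${cs.name}: no state reported`;
}
```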
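For Bug 4, a sketch of the keepalive decision as a pure function, so the post-terminal window is easy to reason about. Only the 90 s value of `POST_TERMINAL_KEEPALIVE_MS` comes from the PR. `KeepaliveState`, `shouldRefresh`, the use of a terminal timestamp instead of the boolean `keepaliveJobTerminal`, and both interval parameters are hypothetical, since the PR says only that the post-terminal cadence is lower.

```typescript
// 90 s, as stated in the PR.
const POST_TERMINAL_KEEPALIVE_MS = 90_000;

// Hypothetical bookkeeping: a timestamp instead of a boolean, so the
// window can be measured.
interface KeepaliveState {
  jobTerminalAt: number | null; // ms when the Job went terminal; null while it runs
  lastRefreshAt: number; // ms of the last onSpawn refresh
}

// Decide on each keepalive tick whether to call onSpawn (refresh updatedAt).
function shouldRefresh(
  state: KeepaliveState,
  now: number,
  normalIntervalMs: number, // hypothetical cadence while the Job runs
  postTerminalIntervalMs: number, // hypothetical, longer cadence after terminal
): boolean {
  const elapsed = now - state.lastRefreshAt;
  if (state.jobTerminalAt === null) {
    return elapsed >= normalIntervalMs;
  }
  // Past the window, stop refreshing (presumably so a run stuck in cleanup
  // can still be reaped).
  if (now - state.jobTerminalAt > POST_TERMINAL_KEEPALIVE_MS) {
    return false;
  }
  return elapsed >= postTerminalIntervalMs;
}
```

Assuming the reaper fires once `updatedAt` is more than 5 minutes old, this shape gives post-terminal cleanup roughly 90 s plus the 5-minute reaper threshold before the run is marked `process_lost`, while still bounding how long a genuinely stuck run can stay alive.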