Four stacked bugs caused the adapter to hang after K8s Job completion,
allowing the 5-minute reaper to mark runs process_lost even when the Job
actually succeeded.
- streamPodLogsOnce: add stopSignal polling loop that destroys the
writable every 200ms once the job-completion branch fires, aborting
any in-flight follow stream that would otherwise hang indefinitely
- waitForPod: treat phase=Failed as a terminal error (throw via
describePodTerminatedError) instead of entering the log-stream path
with a dead pod (new helper is exported for unit tests)
- waitForPod: surface cs.state?.terminated in the per-tick detail line
so operators see exit code / reason without needing kubectl
- keepalive: add POST_TERMINAL_KEEPALIVE_MS (90s) window after Job goes
terminal so onSpawn keeps refreshing updatedAt during cleanup; if
execute() genuinely stalls past 90s the reaper will still catch it
Regression tests added for describePodTerminatedError (phase=Failed
with and without claude container status).
Co-Authored-By: Paperclip <noreply@paperclip.ing>