fix: handle K8s 404 (job deleted) gracefully in waitForJobCompletion (FAR-122) #5

Merged
farhoodliquor-paperclip[bot] merged 1 commits from fix/far-122-404-job-not-found into master 2026-04-22 17:08:03 +00:00
farhoodliquor-paperclip[bot] commented 2026-04-22 16:42:33 +00:00 (Migrated from github.com)

Summary

  • Root cause: waitForJobCompletion had no 404 handling. When a K8s Job is deleted (TTL garbage collection after completion, or external deletion) while the adapter is polling, @kubernetes/client-node v1.0+ throws HTTP-Code: 404 Message: Unknown API Status Code! which propagated uncaught — especially from the re-check path at the bottom of the else branch — surfacing as an opaque error to the user.
  • Secondary bug: The keepalive's 404 detection used err.response?.statusCode which is absent in the v1.0+ fetch-based client (that client uses response.status, not statusCode), silently breaking 404 detection.
  • Fix: Added isK8s404() helper compatible with both v0.x and v1.0+ error shapes; waitForJobCompletion now catches 404 and returns { jobGone: true } instead of throwing; completion handler logs a diagnostic and falls through to stdout parsing rather than crashing.

Test plan

  • TypeScript typechecks pass (npm run typecheck)
  • All 209 tests pass (npm test)
  • Manually verify: if a job is TTL-deleted mid-polling, the adapter logs the jobGone warning and returns the parsed Claude output rather than a 404 error

🤖 Generated with Claude Code

## Summary - **Root cause:** `waitForJobCompletion` had no 404 handling. When a K8s Job is deleted (TTL garbage collection after completion, or external deletion) while the adapter is polling, `@kubernetes/client-node` v1.0+ throws `HTTP-Code: 404 Message: Unknown API Status Code!` which propagated uncaught — especially from the re-check path at the bottom of the `else` branch — surfacing as an opaque error to the user. - **Secondary bug:** The keepalive's 404 detection used `err.response?.statusCode` which is absent in the v1.0+ fetch-based client (that client uses `response.status`, not `statusCode`), silently breaking 404 detection. - **Fix:** Added `isK8s404()` helper compatible with both v0.x and v1.0+ error shapes; `waitForJobCompletion` now catches 404 and returns `{ jobGone: true }` instead of throwing; completion handler logs a diagnostic and falls through to stdout parsing rather than crashing. ## Test plan - [x] TypeScript typechecks pass (`npm run typecheck`) - [x] All 209 tests pass (`npm test`) - [ ] Manually verify: if a job is TTL-deleted mid-polling, the adapter logs the `jobGone` warning and returns the parsed Claude output rather than a 404 error 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign in to join this conversation.