fix: P0+P1 correctness fixes (FAR-107 PR 1-2/3) #3

Merged
farhoodliquor-paperclip[bot] merged 3 commits from fix/p0-correctness-far107 into master 2026-04-20 19:41:17 +00:00
farhoodliquor-paperclip[bot] commented 2026-04-20 18:57:34 +00:00 (Migrated from github.com)

Summary

P0 + P1 correctness and operational fixes from the FAR-104 / FAR-105 analysis. Ships items 1-8, 10-11, 13 from FAR-107.

PR 1 — P0 Correctness (commit d74b6d3)

  • Inherit envFrom and env.valueFrom from self pod — Secrets wired via valueFrom.secretKeyRef or envFrom.secretRef are now forwarded to Job pods, fixing silent credential drops.
  • Distinguish 404 vs transient errors in keepalive — Only 404 marks keepalive terminal; transient errors logged and retried.
  • Fail closed on concurrency-guard read failure — Returns k8s_concurrency_guard_unreachable error instead of silently proceeding.
  • Bound waitForJobCompletion re-check — 60s timeout instead of polling forever.
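The keepalive change above comes down to an error classifier: a 404 means the target object is gone and the loop should stop, while anything else is treated as transient. The following is a minimal sketch of that idea, not the adapter's actual code. `classifyKeepaliveError`, `statusOf`, and the assumption that the client error carries a numeric `statusCode` or `code` field are all hypothetical.

```typescript
// Verdict for a failed keepalive refresh.
type KeepaliveVerdict = "retry" | "terminal";

// Hypothetical helper: pull an HTTP status out of an unknown error value.
// Assumes the API client attaches it as a numeric `statusCode` or `code`.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null) {
    const e = err as { statusCode?: unknown; code?: unknown };
    if (typeof e.statusCode === "number") return e.statusCode;
    if (typeof e.code === "number") return e.code;
  }
  return undefined;
}

// Only a 404 ends the keepalive loop. Timeouts, 5xx responses, and errors
// without a status are treated as transient and left for the next tick.
function classifyKeepaliveError(err: unknown): KeepaliveVerdict {
  return statusOf(err) === 404 ? "terminal" : "retry";
}
```

A caller would log the error in both cases and clear its timer only on `"terminal"`.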

PR 2 — P1 Correctness / Operational (commit 1e517bb)

  • Cap log stream reconnects at 50 — Prevents infinite loops during API partitions.
  • Fire keepalive refresh earlier — Tick 1 + every 12 ticks (~3min) for better reaper safety margin.
  • Catch onLog rejections in keepalive — Prevents unhandledRejection on SSE backpressure.
  • Prevent sanitized-name collisions — 16-char slugs + 6-char SHA-256 hash suffix, ac- prefix.
  • Fix config-hint parity — nodeSelector and labels now parse both key=value text and JSON.
  • Large-prompt Secret fallback — Prompts >256 KiB staged via K8s Secret, avoiding ~1 MiB PodSpec limit.
  • Track last-seen log timestamp — Reconnect window is anchored at the last received line (fixes the duplicate logs reported in FAR-105), plus parser-level text dedup.
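The collision fix for sanitized names can be illustrated as follows: two inputs that reduce to the same slug still get different names because a short hash of the original input is appended. This is a sketch under assumptions, not the code from commit 1e517bb. The function name `sanitizedName`, the dash separators, and the choice to hash the raw input string are hypothetical; only the 16-character slug, the 6-character SHA-256 suffix, and the `ac-` prefix come from the description above.

```typescript
import { createHash } from "node:crypto";

const PREFIX = "ac-";
const SLUG_LEN = 16; // maximum slug length
const HASH_LEN = 6;  // hex characters kept from the SHA-256 digest

// Build a Kubernetes-friendly name: prefix + truncated slug + short hash.
function sanitizedName(raw: string): string {
  const slug = raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs outside [a-z0-9] into one dash
    .replace(/^-+|-+$/g, "")     // trim leading and trailing dashes
    .slice(0, SLUG_LEN)
    .replace(/-+$/g, "");        // truncation can leave a trailing dash again

  // Hash the unsanitized input so inputs with the same slug stay distinct.
  const hash = createHash("sha256").update(raw).digest("hex").slice(0, HASH_LEN);

  return slug ? `${PREFIX}${slug}-${hash}` : `${PREFIX}${hash}`;
}
```

With these lengths the result is at most 26 characters, well inside the 63-character limit for Kubernetes label values and DNS labels.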

Test plan

  • [x] npm run typecheck passes
  • [x] All 200 tests pass (npm test), including 12 new tests
  • [ ] Manual: deploy with secretRef-based ANTHROPIC_API_KEY
  • [ ] Manual: test with prompts >256 KiB
  • [ ] Manual: verify nodeSelector with key=value textarea input
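The nodeSelector check in the test plan exercises the parser that accepts both key=value text and JSON. A minimal sketch of such a dual-format parser is shown here for reference. `parseKeyValueHint` is a hypothetical name, and the rules it applies (JSON is detected by a leading `{`, entries are separated by newlines or commas) are assumptions rather than the adapter's documented behavior.

```typescript
// Parse a config hint given either as a JSON object or as key=value text.
function parseKeyValueHint(input: string): Record<string, string> {
  const text = input.trim();
  const out: Record<string, string> = {};
  if (text === "") return out;

  // Assumption: input starting with "{" is meant to be a JSON object.
  if (text.startsWith("{")) {
    const parsed = JSON.parse(text) as Record<string, unknown>;
    for (const [key, value] of Object.entries(parsed)) {
      out[key] = String(value); // labels and selectors are string-valued
    }
    return out;
  }

  // Otherwise treat it as key=value entries, one per line or comma-separated.
  for (const part of text.split(/[\n,]/)) {
    const entry = part.trim();
    if (entry === "") continue;
    const eq = entry.indexOf("=");
    if (eq <= 0) throw new Error(`expected key=value, got "${entry}"`);
    out[entry.slice(0, eq).trim()] = entry.slice(eq + 1).trim();
  }
  return out;
}
```

Both `parseKeyValueHint("disktype=ssd\nzone=us-east-1a")` and the equivalent JSON object string yield the same map, which is the parity the fix is after.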

🤖 Generated with Claude Code
