fix: scope cancel-polling to explicit cancellation states only (FAR-107)

shouldAbortForCancellation previously treated any non-`running` runStatus
as a cancellation signal — which made the keepalive's cancel-poll delete
the K8s Job whenever the heartbeat-runs API briefly returned a transient
or stale status (e.g. queued, pending, succeeded, failed, completed,
unknown) for an in-flight run.  The follow-up `waitForJobCompletion`
poll then observed the 404 and surfaced a spurious
`k8s_job_deleted_externally` error to the user, even though no human
or external system deleted the Job.

Privileged Escalation's "null-pointer-nancy" agent reproduced this on
runs that were never cancelled and were not adjacent to a paperclip
restart, ruling out the SIGTERM path that 0.1.50 already addressed.

Tighten the guard to fire only on `cancelled` / `cancelling`.  Other
terminal statuses are unreachable while the adapter is still executing
(the adapter's own return is what flips them) and even if observed
mid-run, they do not justify deleting a Job that may still be doing
real work — the natural completion path will tear it down.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
This commit is contained in:
2026-04-26 21:24:11 +00:00
committed by Hugh Commit [agent]
parent dae9e18659
commit 49288fa5c7
2 changed files with 27 additions and 9 deletions
+14 -6
View File
@@ -1561,16 +1561,24 @@ describe("shouldAbortForCancellation", () => {
expect(shouldAbortForCancellation("cancelled")).toBe(true);
});
it("returns true when status is 'failed'", () => {
expect(shouldAbortForCancellation("failed")).toBe(true);
it("returns true when status is 'cancelling'", () => {
expect(shouldAbortForCancellation("cancelling")).toBe(true);
});
it("returns true when status is 'completed'", () => {
expect(shouldAbortForCancellation("completed")).toBe(true);
// FAR-107: terminal-but-not-cancelled statuses MUST NOT trigger Job deletion.
// The previous "anything but running" guard caused k8s_job_deleted_externally
// false positives for in-flight runs whenever the API briefly reported a
// transient/stale status.
it("returns false for non-cancellation terminal statuses (FAR-107)", () => {
expect(shouldAbortForCancellation("succeeded")).toBe(false);
expect(shouldAbortForCancellation("failed")).toBe(false);
expect(shouldAbortForCancellation("completed")).toBe(false);
});
it("returns true for any non-running non-empty string", () => {
expect(shouldAbortForCancellation("unknown")).toBe(true);
it("returns false for unknown statuses (FAR-107)", () => {
expect(shouldAbortForCancellation("unknown")).toBe(false);
expect(shouldAbortForCancellation("queued")).toBe(false);
expect(shouldAbortForCancellation("pending")).toBe(false);
});
});