be84428226
The error previously fired with no diagnostic context, so without cluster shell access it was impossible to distinguish (a) self-delete by our SIGTERM/cancel path, (b) TTL expiry after a missed Complete condition, or (c) actual external deletion.

Two changes:

1. Grace-period verification: when the log stream exits and the 30s grace timer fires, do a one-shot readNamespacedJob before declaring the Job gone. If the Job is still there, settle as gracePeriodFired (not jobGone) so we don't misclassify K8s condition propagation lag as deletion.

2. Forensic capture: track which of the three detection paths (completion-poll-404, grace-period-verify-404, recheck-poll-404) first observed the 404, along with the last successfully read Job conditions, the poll count, the elapsed time since pod-running, and the stdout size. Append all of it to the errorMessage so the next occurrence is self-diagnosing.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
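
For reviewers, a minimal sketch of the two changes, not the actual diff. Only the outcome names (gracePeriodFired, jobGone), the three detection-path labels, and the captured fields come from the message above. The function names, the Forensics shape, the message layout, the `statusCode` error field, and the choice to treat non-404 read failures as "not proven deleted" are all assumptions made for illustration.

```typescript
// Hypothetical sketch; every identifier except the path labels and the
// gracePeriodFired / jobGone outcomes is illustrative.
type DetectionPath =
  | "completion-poll-404"
  | "grace-period-verify-404"
  | "recheck-poll-404";

type Settlement = "gracePeriodFired" | "jobGone";

interface Forensics {
  detectionPath: DetectionPath;
  lastConditions: string[]; // last successfully read conditions; empty if none
  pollCount: number;
  elapsedSincePodRunningMs: number;
  stdoutBytes: number;
}

// Outcome of the one-shot Job read, reduced to what the classifier needs.
type JobRead = { found: true } | { found: false; statusCode: number };

// Decide how to settle once the grace timer has fired and the Job was re-read.
function settleAfterGrace(read: JobRead): Settlement {
  if (read.found) return "gracePeriodFired"; // Job exists: propagation lag
  if (read.statusCode === 404) return "jobGone"; // confirmed missing
  // Assumption: other failures (timeouts, 5xx) do not prove deletion.
  return "gracePeriodFired";
}

// One-shot verification. `readJob` stands in for a readNamespacedJob call;
// the error is assumed to carry a numeric `statusCode` field.
async function verifyOnGrace(
  readJob: () => Promise<unknown>,
): Promise<Settlement> {
  try {
    await readJob();
    return settleAfterGrace({ found: true });
  } catch (err) {
    const statusCode = (err as { statusCode?: number }).statusCode ?? 0;
    return settleAfterGrace({ found: false, statusCode });
  }
}

// Append the captured context to the error message.
function appendForensics(errorMessage: string, f: Forensics): string {
  const conditions =
    f.lastConditions.length > 0 ? f.lastConditions.join(",") : "none";
  return (
    `${errorMessage} [detectionPath=${f.detectionPath}` +
    ` lastConditions=${conditions}` +
    ` pollCount=${f.pollCount}` +
    ` elapsedSincePodRunningMs=${f.elapsedSincePodRunningMs}` +
    ` stdoutBytes=${f.stdoutBytes}]`
  );
}
```

Keeping the classification in a pure function (settleAfterGrace) separate from the API call makes the 404-versus-still-present decision testable without a cluster.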