When waitForJobCompletion threw and the job was still not terminal, we
were returning an error but still deleting the job in the finally block.
This left the UI holding an error while the job (still alive) would be
cleaned up by Kubernetes, causing the next heartbeat to find nothing and
think it was safe to retry — spawning a concurrent pod.
Now we set skipCleanup=true when returning the mismatch error, so the
job is retained and the heartbeat can still find and wait on it.
Also removes a duplicate empty-stdout fallback block.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When waitForJobCompletion threw a transient error (API disconnect, etc.),
the code fell through with jobTimedOut=true and returned a result even
though the job was still running. This caused the UI to think the run
was complete while the job kept running, resulting in concurrency errors.
Now when completion throws, we re-check the job's actual state. If still
not terminal, we return a k8s_job_state_mismatch error so the UI knows
the run is not done.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds supportsInstructionsBundle, instructionsPathKey, and
requiresMaterializedRuntimeSkills flags so the UI renders the
bundle editor for claude_k8s agents. Bumps adapter-utils peer
dep to the canary that includes the capability type fields.
Co-Authored-By: Paperclip <noreply@paperclip.ing>