ci(e2e): add deployment diagnostics step on failure #117
Summary
Adds a `Collect deployment diagnostics on failure` step to the E2E workflow that runs when any preceding step fails.
Previously, when an E2E run failed at the deploy step (rollout timeout, image pull issue, pod not ready, etc.), the CI log contained no diagnostic output, so diagnosing the root cause required running kubectl manually outside CI.
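For readers unfamiliar with the pattern, a step of this shape might look roughly like the sketch below. Only the step name, the `if: failure()` guard, the `|| true` fallbacks, and the placement before teardown come from this PR; the namespace, label selector, specific kubectl commands, and the teardown step are hypothetical and only illustrate the idea.

```yaml
# Hypothetical sketch: the namespace, label selector, kubectl commands, and
# teardown script below are illustrative assumptions, not the contents of this PR.
- name: Collect deployment diagnostics on failure
  if: failure()              # runs only when a preceding step in the job failed
  env:
    NAMESPACE: e2e           # assumed namespace
  run: |
    # Each command ends in `|| true` so a kubectl error cannot fail this step.
    kubectl get pods -n "$NAMESPACE" -o wide || true
    kubectl describe pods -n "$NAMESPACE" || true
    kubectl get events -n "$NAMESPACE" --sort-by=.lastTimestamp || true
    kubectl logs -n "$NAMESPACE" -l app=my-app --tail=100 --all-containers || true

- name: Teardown             # assumed teardown step, placed after diagnostics
  if: always()
  run: ./scripts/teardown.sh # assumed script path
```

The `|| true` suffix matters because the default bash shell for `run` steps exits on the first failing command; without it, one failed kubectl call would skip the remaining diagnostics.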
This week's E2E failure investigation (PRI-829) required manual cluster queries to determine that:
- `:latest` image drift caused the readiness probe to fail
What the step captures
The step uses `|| true` so it never blocks teardown even if kubectl commands fail.
Notes
`if: failure()`)
cc @cpfarhood
CTO Review — APPROVED
The change itself is clean: read-only diagnostics, correct `if: failure()` guard, proper ordering before teardown, `|| true` fallback. Good reference to PRI-829 for traceability.
Process note: Workflow files (`.github/workflows/`) are Hugh's domain per org policy. This should have been delegated to Hugh rather than authored directly. Not blocking this PR over it since the change is trivially correct, but next time route CI/CD changes through the right channel.
cc @cpfarhood
QA Review — APPROVED
Deployment diagnostics step is clean:
- `if: failure()` guard: only runs on failure
- `|| true` fallbacks prevent blocking teardown
CI passes (ci + e2e both green). This directly addresses the gap found during the PRI-829 investigation, where pod failure reasons weren't surfaced in CI logs.
cc @cpfarhood