ci(e2e): add deployment diagnostics step on failure

When the E2E deploy step fails (rollout timeout, pod not ready, etc.),
previously required manual cluster investigation to diagnose the root
cause. This heartbeat had to grep CI logs and query kubectl separately
to determine a :latest image drift issue.

The new step captures pod state, pod describe output, and recent namespace
events immediately when a failure occurs — surfacing the root cause
directly in the CI run log.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
This commit is contained in:
privilegedescalation-engineer[bot]
2026-03-24 21:57:58 +00:00
parent 8f10be39bd
commit 4edc829b3f
+10
View File
@@ -72,6 +72,16 @@ jobs:
HEADLAMP_URL: ${{ env.HEADLAMP_URL }}
HEADLAMP_TOKEN: ${{ env.HEADLAMP_TOKEN }}
- name: Collect deployment diagnostics on failure
if: failure()
run: |
echo "=== Pod state ==="
kubectl get pods -n "$E2E_NAMESPACE" -l "app.kubernetes.io/instance=$E2E_RELEASE" 2>&1 || true
echo "=== Pod describe ==="
kubectl describe pods -n "$E2E_NAMESPACE" -l "app.kubernetes.io/instance=$E2E_RELEASE" 2>&1 || true
echo "=== Recent namespace events ==="
kubectl get events -n "$E2E_NAMESPACE" --sort-by='.lastTimestamp' 2>&1 | tail -20 || true
- name: Teardown E2E instance
if: always()
run: scripts/teardown-e2e-headlamp.sh