ci(e2e): add deployment diagnostics step on failure

When the E2E deploy step fails (rollout timeout, pod not ready, etc.), previously required manual cluster investigation to diagnose the root cause. This heartbeat had to grep CI logs and query kubectl separately to determine a :latest image drift issue. The new step captures pod state, pod describe output, and recent namespace events immediately when a failure occurs — surfacing the root cause directly in the CI run log. Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-24 21:57:58 +00:00
parent 8f10be39bd
commit 4edc829b3f
1 changed files with 10 additions and 0 deletions
@@ -72,6 +72,16 @@ jobs:
          HEADLAMP_URL: ${{ env.HEADLAMP_URL }}
          HEADLAMP_TOKEN: ${{ env.HEADLAMP_TOKEN }}

+      - name: Collect deployment diagnostics on failure
+        if: failure()
+        run: |
+          echo "=== Pod state ==="
+          kubectl get pods -n "$E2E_NAMESPACE" -l "app.kubernetes.io/instance=$E2E_RELEASE" 2>&1 || true
+          echo "=== Pod describe ==="
+          kubectl describe pods -n "$E2E_NAMESPACE" -l "app.kubernetes.io/instance=$E2E_RELEASE" 2>&1 || true
+          echo "=== Recent namespace events ==="
+          kubectl get events -n "$E2E_NAMESPACE" --sort-by='.lastTimestamp' 2>&1 | tail -20 || true
+
      - name: Teardown E2E instance
        if: always()
        run: scripts/teardown-e2e-headlamp.sh