fix e2e: add comprehensive RBAC checks and deployment diagnostics #28

Closed
privilegedescalation-engineer[bot] wants to merge 12 commits from hugh/fix-e2e-deploy-script into main
privilegedescalation-engineer[bot] commented 2026-05-05 15:57:14 +00:00 (Migrated from github.com)

Summary

Fix the E2E deployment script so it detects RBAC problems before deploying and prints actionable debug output when the deployment fails.

Changes

  • Comprehensive RBAC check: The script now validates every permission the deployment needs (not just delete configmaps) before it starts deploying. A missing permission fails fast with an error message that names it exactly.
  • Deployment diagnostics on failure: When kubectl rollout status fails or the service is unreachable, the script now dumps pod state, pod events, pod logs, and namespace events. This mirrors the workflow's diagnostic step but runs inline, so the deploy step itself produces actionable output.
  • Clearer error messages: Errors now tell the operator what to do, e.g. grant the missing RBAC permission to the workflow's service account in headlamp-dev.
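The diagnostics-on-failure behavior could be sketched as follows. This is not the script from this PR: it assumes a POSIX shell, a deployment whose pods are selected by a label such as app=web, and a 120s rollout timeout, and the function names dump_diagnostics and wait_for_rollout are hypothetical. Only standard kubectl subcommands and flags are used.

```sh
#!/bin/sh
# Sketch only: function names, the selector, and the timeout are hypothetical.
NAMESPACE="${NAMESPACE:-headlamp-dev}"

# dump_diagnostics SELECTOR
# Prints pod state, per-pod events (from `kubectl describe`), pod logs and
# namespace events. No exit status is checked, so a failing section does not
# stop the later ones (assuming `set -e` is not enabled).
dump_diagnostics() {
  diag_selector="$1"
  echo "=== Pod state ($diag_selector) ==="
  kubectl get pods -n "$NAMESPACE" -l "$diag_selector" -o wide
  echo "=== Pod events (kubectl describe) ==="
  kubectl describe pods -n "$NAMESPACE" -l "$diag_selector"
  echo "=== Pod logs ==="
  kubectl logs -n "$NAMESPACE" -l "$diag_selector" --all-containers=true --tail=200
  echo "=== Namespace events ==="
  kubectl get events -n "$NAMESPACE" --sort-by=.lastTimestamp
}

# wait_for_rollout DEPLOYMENT SELECTOR
# Returns 0 once the rollout completes; otherwise dumps diagnostics to stderr
# and returns 1 so the caller can abort the deploy step.
wait_for_rollout() {
  if ! kubectl rollout status "deployment/$1" -n "$NAMESPACE" --timeout=120s; then
    echo "ERROR: rollout of deployment/$1 did not complete in namespace $NAMESPACE." >&2
    dump_diagnostics "$2" >&2
    return 1
  fi
  return 0
}
```

A deploy script would call `wait_for_rollout <deployment> <selector> || exit 1`; the same dump_diagnostics call can follow a failed reachability probe of the service, so both failure paths produce the same output.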

Root cause addressed

The original deploy script only checked kubectl auth can-i delete configmaps. The actual deployment requires create on serviceaccounts, deployments, and pods; get and list on pods; and create token on serviceaccounts. If any of these was missing, the script failed mid-deployment without printing any diagnostics, which left the root cause of the failure opaque.
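A preflight check covering the permissions listed above could look like this sketch. It assumes a POSIX shell, that the script runs with the workflow's service-account credentials against headlamp-dev, and that delete configmaps is still needed; require_perm and check_rbac are hypothetical names. `kubectl auth can-i` exits 0 when the action is allowed, and "create token on serviceaccounts" is expressed with its `--subresource=token` flag.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical and the permission list mirrors
# the PR description rather than a verified RBAC manifest.
NAMESPACE="${NAMESPACE:-headlamp-dev}"

# require_perm VERB RESOURCE [FLAGS...]
# Arguments go straight to `kubectl auth can-i`. Its stdout ("yes"/"no") is
# discarded; stderr is kept so API connection errors stay visible.
require_perm() {
  if ! kubectl auth can-i "$@" -n "$NAMESPACE" >/dev/null; then
    echo "ERROR: missing RBAC permission '$*' in namespace $NAMESPACE." >&2
    echo "       Grant it to the workflow's service account, then re-run." >&2
    return 1
  fi
  return 0
}

# check_rbac: test every permission instead of stopping at the first gap,
# and return non-zero if any is missing, so one run reports all of them.
check_rbac() {
  rbac_missing=0
  require_perm delete configmaps || rbac_missing=1
  require_perm create serviceaccounts || rbac_missing=1
  require_perm create deployments || rbac_missing=1
  require_perm create pods || rbac_missing=1
  require_perm get pods || rbac_missing=1
  require_perm list pods || rbac_missing=1
  # "create token on serviceaccounts" is the serviceaccounts/token subresource.
  require_perm create serviceaccounts --subresource=token || rbac_missing=1
  return "$rbac_missing"
}
```

Calling `check_rbac || exit 1` before the first create makes the script fail before it touches the cluster, and because each permission is checked independently, a single run names every missing grant rather than only the first.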

Testing

Manually verified the RBAC check logic. The fix branch targets main, so it will run through CI and the full E2E pipeline on merge.

cc @cpfarhood

privilegedescalation-engineer[bot] commented 2026-05-05 19:13:37 +00:00 (Migrated from github.com)

Closing: superseded by #29 (the canonical E2E consolidation PR). E2E infra changes have been consolidated into a single PR per repo, per PRI-779.

Pull request closed