The E2E Headlamp instance runs in privilegedescalation-dev but needs to proxy
to the Polaris dashboard service in the polaris namespace to fetch audit results.
Root cause:
- E2E tests consistently fail with 'Polaris dashboard not reachable' because
the in-cluster Headlamp (running as ServiceAccount headlamp-e2e-test in
privilegedescalation-dev) lacks permission to proxy to polaris-dashboard
in the polaris namespace
- The default RBAC only covered the privilegedescalation-dev namespace
- The error manifests as a 503 from the Kubernetes API proxy, causing
the loading spinner to persist indefinitely in E2E runs
Fix:
- Add a new Role + RoleBinding for the polaris namespace that grants
get+proxy on the polaris-dashboard service
- The ARC runner's ServiceAccount (runners-privilegedescalation-gha-rs-no-permission
in arc-runners) is the subject for both bindings, matching the existing pattern
- Add a pre-flight check in deploy-e2e-headlamp.sh that warns if Polaris
proxy RBAC is missing, so CI output makes the issue self-diagnosing
Note: This RBAC change must be applied to the cluster before E2E runs will
pass. The deploy script detects and warns about the missing permission.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
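A minimal sketch of what the pre-flight check described in the Polaris RBAC commit above might look like. The function name, the message wording, and the choice to check `get` on the `proxy` subresource are assumptions; the actual check in deploy-e2e-headlamp.sh may differ. Impersonating with `--as` also assumes the caller is allowed to impersonate ServiceAccounts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn (but do not fail) when the given ServiceAccount
# cannot reach polaris-dashboard through the Kubernetes API service proxy.
warn_if_polaris_rbac_missing() {
  local sa_namespace="$1" sa_name="$2"
  # "kubectl auth can-i" exits 0 when the action is allowed, non-zero otherwise.
  if kubectl auth can-i get services/polaris-dashboard \
       --subresource=proxy \
       --namespace polaris \
       --as "system:serviceaccount:${sa_namespace}:${sa_name}" >/dev/null 2>&1; then
    echo "Polaris proxy RBAC looks present for ${sa_name}"
  else
    echo "WARNING: ${sa_name} (${sa_namespace}) cannot proxy to polaris-dashboard in the polaris namespace; expect 'Polaris dashboard not reachable' in E2E runs." >&2
  fi
  # Warn-only: the commit describes a warning, so the deploy continues either way.
  return 0
}
```

It could be called early in the deploy script, for example `warn_if_polaris_rbac_missing arc-runners runners-privilegedescalation-gha-rs-no-permission`, so the warning appears in CI output before the tests start.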
When the E2E deploy step failed (rollout timeout, pod not ready, etc.),
diagnosing the root cause previously required manual cluster investigation.
This heartbeat had to grep CI logs and query kubectl separately to trace
the failure to a :latest image drift issue.
The new step captures pod state, pod describe output, and recent namespace
events immediately when a failure occurs — surfacing the root cause
directly in the CI run log.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
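The diagnostics capture described in the commit above could be sketched roughly as follows. The function name and the 30-line event limit are assumptions, and the real step may be written inline in the workflow rather than as a shell function.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print pod state, pod descriptions, and recent events
# for a namespace so the failure cause shows up directly in the CI log.
dump_e2e_diagnostics() {
  local ns="$1"
  echo "=== Pods in ${ns} ==="
  kubectl get pods --namespace "$ns" -o wide || true
  echo "=== Pod descriptions in ${ns} ==="
  kubectl describe pods --namespace "$ns" || true
  echo "=== Recent events in ${ns} ==="
  # Sort oldest to newest and keep only the tail, where the failure usually is.
  kubectl get events --namespace "$ns" --sort-by=.lastTimestamp | tail -n 30 || true
}
```

In a workflow this would typically run only when the deploy step fails, for example `dump_e2e_diagnostics privilegedescalation-dev`.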
The :latest tag caused E2E flakiness when a newer Headlamp image was
pulled on some cluster nodes (IfNotPresent policy) but not others.
Concurrent E2E runs on main saw different image versions, and the newest
:latest (sha256:89c6c65) failed to pass the readiness probe within 120s.
Pin to v0.40.1 — the same version running in production (kube-system) —
so all nodes use the same cached digest and CI is deterministic. Update
this pin when Headlamp is upgraded in production.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
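To guard against the :latest drift described in the commit above, a deploy script could refuse unpinned image references. This is only an illustrative sketch: the function name is hypothetical, the image path in the usage note is an assumption, and nothing in the commit says such a guard exists.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the image reference carries an explicit
# non-latest tag or a digest.
is_pinned_image() {
  # Strip the registry/path so a registry port (host:5000/...) is not
  # mistaken for a tag.
  local name="${1##*/}"
  case "$name" in
    *@sha256:*) return 0 ;;  # digest reference
    *:latest)   return 1 ;;  # floating tag
    *:*)        return 0 ;;  # explicit version tag
    *)          return 1 ;;  # no tag at all resolves to :latest
  esac
}
```

For example, `is_pinned_image ghcr.io/headlamp-k8s/headlamp:v0.40.1` succeeds, while the same reference with `:latest` (or with no tag) fails.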
Without a prior deletion, kubectl apply patches resources in place: if the
pod spec is unchanged between runs, no rollout is triggered and a potentially
degraded pod from a prior run keeps serving. This caused the auth.setup.ts timeout
(waiting for the "use a token" button) even when no concurrent runs were
present — the headlamp-e2e pod was in an inconsistent state from a previous
run that didn't tear down cleanly.
Changes:
- deploy-e2e-headlamp.sh: delete Deployment, Service, and ServiceAccount
(with --wait) before applying, guaranteeing a fresh pod each run
- auth.setup.ts: add explicit waitFor({ state: 'visible', timeout: 15_000 })
before the "use a token" button click, so failures surface at 15 s with a
clear locator error rather than silently timing out at 60 s
Fixes the pre-existing infra issue blocking PR#110.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
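The delete-then-apply sequence from the commit above might look something like this. The function name, the resource names (headlamp-e2e, headlamp-e2e-test), and the 120s rollout timeout are assumptions pieced together from names mentioned elsewhere in this PR, not a copy of deploy-e2e-headlamp.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the previous run's resources, then apply the
# manifest so every E2E run starts from a freshly created pod.
redeploy_fresh() {
  local ns="$1" manifest="$2"
  # --ignore-not-found keeps the first run (nothing to delete) from failing;
  # --wait blocks until the objects are actually gone.
  kubectl delete deployment/headlamp-e2e service/headlamp-e2e \
    serviceaccount/headlamp-e2e-test \
    --namespace "$ns" --ignore-not-found --wait=true
  kubectl apply --namespace "$ns" -f "$manifest"
  # Fail fast if the new pod never becomes ready.
  kubectl rollout status deployment/headlamp-e2e --namespace "$ns" --timeout=120s
}
```

Deleting first trades a few seconds of startup time for the guarantee that no pod from an earlier run survives, which is the failure mode the commit describes.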
cancel-in-progress: true would cancel in-flight E2E runs when a new one
arrives. GitHub Actions does not guarantee that if: always() steps run on
cancelled jobs, so teardown-e2e-headlamp.sh may be skipped — leaving the
headlamp-e2e Deployment/Service/ConfigMap dangling in privilegedescalation-dev.
Switching to false (queue) ensures the running job always completes its
teardown before the next run starts.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Prevents parallel E2E runs from conflicting over the shared
headlamp-e2e Helm release in privilegedescalation-dev. With
cancel-in-progress: true, a new push cancels any in-progress
run on the same repo — only one E2E suite runs at a time.
Observed failure: PR#109 and PR#108 ran concurrently and the
auth setup in PR#109 timed out, likely due to resource contention
on the shared headlamp-e2e instance.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Replaces the duplicated Renovate config with a single extends entry that
points to the org-level preset (privilegedescalation/.github:renovate-config). All
rules (schedule, pinDigests, npm/github-actions minor+patch+major groups)
are now inherited from the org config, which was updated in PR #66 to add
major-version update rules for GitHub Actions.
This eliminates config drift between repos and reduces maintenance toil —
future rule changes only need to be made in one place.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-24 16:16:15 +00:00
6 changed files with 78 additions and 21 deletions