PR #146 moved RBAC management to the infra repo (Flux). The local
RBAC apply steps were removed from the E2E workflow, but this file
was inadvertently left behind. It is now orphaned since the
polaris-dashboard-proxy-reader Role and RoleBinding are managed
by Flux.
See PRI-916 for context.
The 'local' bash keyword can only be used inside a function. Using it
at top-level of a run: block causes 'local: can only be used in a
function' error and exits the script with code 1.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The ARC runner has no static kubeconfig at any of the expected paths
(/runner/config, ~/.kube/config). It DOES have a service account token
(/var/run/secrets/kubernetes.io/serviceaccount/token) and
KUBERNETES_SERVICE_HOST=10.43.0.1, confirming in-cluster access.
This commit adds a third fallback tier: when no static kubeconfig is
found AND the runner is in-cluster (service account token present),
generate a kubeconfig from the in-cluster service account credentials.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The previous diagnostic step used $KUBECONFIG and $HOME directly,
which causes 'unbound variable' exit when run with set -euo pipefail
and KUBECONFIG is unset. Use ${VAR:-} defaults throughout.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Adds a comprehensive diagnostic block that prints env vars, lists all
known kubeconfig paths, checks in-cluster service account, and attempts
kubectl config view. This will reveal the actual path on the runner.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The previous loop silently skipped if no kubeconfig was found, causing
kubectl commands to fall back to localhost:8080. Use explicit paths
in priority order with a hard error if none exist.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Replace the impersonation check with direct verification of RBAC
resources. The kubectl auth can-i --as check fails with
localhost:8080 because kubectl cannot find kubeconfig. Instead,
directly verify that the Role and RoleBinding were created
by kubectl apply.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Test if kubectl apply dry-run works without KUBECONFIG (the original
behavior that succeeded). Also test kubectl auth can-i without KUBECONFIG
(to confirm the failure mode). Compare with KUBECONFIG set to service account.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
ARC runner has no kubeconfig file. Use the service account
token at /var/run/secrets/kubernetes.io/serviceaccount/ to build
a kubeconfig that connects to the Kubernetes API server from
within the pod. This is the standard in-cluster access pattern.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Test if kubectl can find kubeconfig without explicit KUBECONFIG
on the ARC runner. kubectl config view --raw shows the config
content if it exists, kubectl cluster-info tests connectivity.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
ls -la /github/ exits with code 2 when /github/ doesn't exist,
causing set -e to fail the step. Remove that listing.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Check /paperclip/.kube, /paperclip/.kube/config, /home/runner/.kube,
/home/runner/.kube/config, /runner, and /runner/config. Export
KUBECONFIG so kubectl uses the real cluster.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
ARC runner stores kubeconfig in /home/runner/k8s/config (mounted
by Actions Runtime). Add both k8s and k8s-novolume to the search
paths and remove non-existent paths from diagnostics.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Add ls and echo diagnostics to understand where ARC runners store
kubeconfig. Include ACTIONS_KUBECONFIG and HOME env vars.
Also add $HOME/.kube to the search paths.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The 'kubectl auth can-i --as' impersonation check was falling back to
localhost:8080 because KUBECONFIG was not set and the ARC runner's
kubeconfig was not in the default location. azure/setup-kubectl@v4
does not set KUBECONFIG — it installs kubectl and relies on the runner's
existing kubeconfig in /runner/.kube/config (ARC runner home).
Add a 'Locate kubeconfig for ARC runner' step that searches the known
runner kubeconfig paths before the RBAC step runs, exports KUBECONFIG
to GITHUB_ENV, and verifies cluster connectivity before proceeding.
Fixes: PRI-785
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Apply e2e-ci-runner RBAC + polaris RBAC in workflow before pre-flight check
- Add e2e-ci-runner-polaris Role+RoleBinding so CI runner can manage polaris namespace RBAC
- Add roles/rolebindings CRUD to e2e-ci-runner Role (headlamp-dev namespace)
- Collapsed MISSING_ROLE/MISSING_ROLEBINDING into single MISSING flag (QA nit)
- Drop non-standard --quiet flag on kubectl auth can-i (QA nit)
Address PRI-324 QA feedback: workflow now applies its own RBAC so the pre-flight
check is meaningful and the green path is achievable.
* chore: replace Dependabot references with Renovate
- SECURITY.md: update to mention Renovate (org-wide Mend Renovate)
- PROJECT_ASSESSMENT.md: mark Renovate as integrated (org-wide config)
Closes PRI-389. Parent PRI-387.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* fix: override picomatch >=4.0.4 and vite >=6.4.2 to patch high-severity vulnerabilities
Resolves 3 high-severity vulnerabilities from pnpm audit:
- GHSA-c2c7-rcm5-vvqj: Picomatch ReDoS via extglob quantifiers (>=4.0.0 <4.0.4)
- GHSA-p9ff-h696-f583: Vite arbitrary file read via dev server WebSocket
- GHSA-4w7w-66w2-5vf9: Vite path traversal in optimized deps .map handling
Also addresses moderate GHSA-3v7f-55p6-f55p (picomatch method injection).
Remaining vulnerabilities (moderate/low) are in transitive dependencies
managed by @kinvolk/headlamp-plugin and @headlamp-k8s/eslint-config
which require upstream updates to those packages.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
---------
Co-authored-by: Chris Farhood <chris@farhood.org>
Co-authored-by: Paperclip <noreply@paperclip.ing>
The E2E workflow and deploy scripts were targeting the legacy
privilegedescalation-dev namespace, which is not managed by Flux GitOps
in privilegedescalation/infra.
The infra repo (PR #11) already provisions the headlamp-dev namespace
and corresponding RBAC (e2e-ci-runner-headlamp-rbac.yaml) that grants
the ARC runner SA (runners-privilegedescalation-gha-rs-no-permission in
arc-runners) the permissions needed to deploy/teardown the E2E
Headlamp instance.
This change aligns all E2E infrastructure to use headlamp-dev:
- .github/workflows/e2e.yaml: E2E_NAMESPACE=headlamp-dev
- scripts/deploy-e2e-headlamp.sh: default namespace and comments
- scripts/teardown-e2e-headlamp.sh: default namespace
- deployment/e2e-ci-runner-rbac.yaml: namespace and add missing events
permission (already present in infra copy)
Refs: PRI-423
Co-authored-by: Chris Farhood <chris@farhood.org>
Co-authored-by: Paperclip <noreply@paperclip.ing>
* fix: override lodash >=4.18.0 to patch code injection vulnerability
GHSA-r5fr-rjxr-66jc is a code injection vulnerability in lodash
below 4.18.0. The vulnerable transitive dependency comes through
@kinvolk/headlamp-plugin.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: update pnpm-lock.yaml to satisfy lodash override
The package.json pnpm.overrides requires lodash >=4.18.0, but the lockfile
had an older version. Regenerated lockfile with pnpm install.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* fix(e2e): scope heading locators to main content area
Fix E2E test failures by scoping heading locators to the main
content area instead of searching the entire page. This prevents
matching headings in the sidebar or other non-content areas.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* fix(e2e): scope remaining getByText to main element
The 'Cluster Score' text matcher was still searching the entire page
instead of being scoped to the main content area. This could cause
false positives if the same text appears in the sidebar.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* ci: trigger fresh E2E run
Re-pushing to trigger a new CI run since the last E2E was cancelled.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* fix(e2e): use [role=main] instead of main element
Switch from 'main' element selector to '[role="main"]' attribute
selector for better compatibility with Headlamp's app structure.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* fix(e2e): hybrid approach - unscoped headings, main-scoped text
Use broader heading selectors matching intel-gpu pattern, but
keep text checks scoped to main element to avoid sidebar conflicts.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* ci: re-test original code to verify baseline
---------
Co-authored-by: Gandalf the Greybeard <gandalf@privilegedescalation.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Paperclip <noreply@paperclip.ing>
When the E2E deploy step fails (rollout timeout, pod not ready, etc.),
previously required manual cluster investigation to diagnose the root
cause. This heartbeat had to grep CI logs and query kubectl separately
to determine a :latest image drift issue.
The new step captures pod state, pod describe output, and recent namespace
events immediately when a failure occurs — surfacing the root cause
directly in the CI run log.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The :latest tag caused E2E flakiness when a newer Headlamp image was
pulled on some cluster nodes (IfNotPresent policy) but not others.
Concurrent E2E runs on main saw different image versions, and the newest
:latest (sha256:89c6c65) failed to pass the readiness probe within 120s.
Pin to v0.40.1 — the same version running in production (kube-system) —
so all nodes use the same cached digest and CI is deterministic. Update
this pin when Headlamp is upgraded in production.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
kubectl apply without prior deletion patches in place: if the pod spec is
unchanged between runs, no rollout is triggered and a potentially degraded
pod from a prior run keeps serving. This caused the auth.setup.ts timeout
(waiting for the "use a token" button) even when no concurrent runs were
present — the headlamp-e2e pod was in an inconsistent state from a previous
run that didn't tear down cleanly.
Changes:
- deploy-e2e-headlamp.sh: delete Deployment, Service, and ServiceAccount
(with --wait) before applying, guaranteeing a fresh pod each run
- auth.setup.ts: add explicit waitFor({ state: 'visible', timeout: 15_000 })
before the "use a token" button click, so failures surface at 15 s with a
clear locator error rather than silently timing out at 60 s
Fixes the pre-existing infra issue blocking PR#110.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cancel-in-progress: true would cancel in-flight E2E runs when a new one
arrives. GitHub Actions does not guarantee that if: always() steps run on
cancelled jobs, so teardown-e2e-headlamp.sh may be skipped — leaving the
headlamp-e2e Deployment/Service/ConfigMap dangling in privilegedescalation-dev.
Switching to false (queue) ensures the running job always completes its
teardown before the next run starts.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Prevents parallel E2E runs from conflicting over the shared
headlamp-e2e Helm release in privilegedescalation-dev. With
cancel-in-progress: true, a new push cancels any in-progress
run on the same repo — only one E2E suite runs at a time.
Observed failure: PR#109 and PR#108 ran concurrently and the
auth setup in PR#109 timed out, likely due to resource contention
on the shared headlamp-e2e instance.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Replaces the duplicated Renovate config with a simple extend from the
org-level preset (privilegedescalation/.github:renovate-config). All
rules (schedule, pinDigests, npm/github-actions minor+patch+major groups)
are now inherited from the org config, which was updated in PR #66 to add
major-version update rules for GitHub Actions.
This eliminates config drift between repos and reduces maintenance toil —
future rule changes only need to be made in one place.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-24 16:16:15 +00:00
13 changed files with 807 additions and 208 deletions
# PRI-324 Spec: Make E2E Workflow Self-Sufficient with RBAC
## Context
PR #123 introduced an RBAC pre-flight check to the E2E workflow. QA (Nancy, acting as QA) verified the "fails fast without RBAC" path works, but found that the "with RBAC passes" path had no green CI evidence — the workflow did not apply RBAC before the pre-flight check.
PR #131 attempted to fix this by adding `kubectl apply` steps and extending the CI runner RBAC, but its merge commit (739db6fe) was reverted by the next commit on main (aa1db921) due to a vulnerability fix PR (#128).
The current E2E workflow on `main` lacks the RBAC apply steps and CI runner permissions needed to make the pre-flight check meaningful.
## Required Changes
### 1. `.github/workflows/e2e.yaml`
Add between the "Setup kubectl" and "Install dependencies" steps:
- [ ] Workflow applies `deployment/e2e-ci-runner-rbac.yaml` before the pre-flight check
- [ ] Workflow applies `deployment/polaris-rbac.yaml` before the pre-flight check
- [ ] CI runner has RBAC to apply the manifests (added via new Role+RoleBinding in polaris namespace)
- [ ] E2E pipeline passes on the PR branch (proof of green path)
- [ ]`kubectl get … --quiet` flag removed (QA nit)
- [ ]`MISSING_ROLE`/`MISSING_ROLEBINDING` collapsed to single `MISSING` flag (QA nit)
## Definition of Done
PR #123 QA changes-requested are addressed: the workflow is self-sufficient (applies its own RBAC), the green path is demonstrated, and QA review is re-requested.
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.