PR #146 moved RBAC management to the infra repo (Flux). The local
RBAC apply steps were removed from the E2E workflow, but this file
was inadvertently left behind. It is now orphaned since the
polaris-dashboard-proxy-reader Role and RoleBinding are managed
by Flux.
See PRI-916 for context.
The 'local' bash keyword can only be used inside a function. Using it
at top-level of a run: block causes 'local: can only be used in a
function' error and exits the script with code 1.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The ARC runner has no static kubeconfig at any of the expected paths
(/runner/config, ~/.kube/config). It DOES have a service account token
(/var/run/secrets/kubernetes.io/serviceaccount/token) and
KUBERNETES_SERVICE_HOST=10.43.0.1, confirming in-cluster access.
This commit adds a third fallback tier: when no static kubeconfig is
found AND the runner is in-cluster (service account token present),
generate a kubeconfig from the in-cluster service account credentials.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The previous diagnostic step used $KUBECONFIG and $HOME directly,
which causes 'unbound variable' exit when run with set -euo pipefail
and KUBECONFIG is unset. Use ${VAR:-} defaults throughout.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Adds a comprehensive diagnostic block that prints env vars, lists all
known kubeconfig paths, checks in-cluster service account, and attempts
kubectl config view. This will reveal the actual path on the runner.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The previous loop silently skipped if no kubeconfig was found, causing
kubectl commands to fall back to localhost:8080. Use explicit paths
in priority order with a hard error if none exist.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Replace the impersonation check with direct verification of RBAC
resources. The kubectl auth can-i --as check fails with
localhost:8080 because kubectl cannot find kubeconfig. Instead,
directly verify that the Role and RoleBinding were created
by kubectl apply.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Test if kubectl apply dry-run works without KUBECONFIG (the original
behavior that succeeded). Also test kubectl auth can-i without KUBECONFIG
(to confirm the failure mode). Compare with KUBECONFIG set to service account.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
ARC runner has no kubeconfig file. Use the service account
token at /var/run/secrets/kubernetes.io/serviceaccount/ to build
a kubeconfig that connects to the Kubernetes API server from
within the pod. This is the standard in-cluster access pattern.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Test if kubectl can find kubeconfig without explicit KUBECONFIG
on the ARC runner. kubectl config view --raw shows the config
content if it exists, kubectl cluster-info tests connectivity.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
ls -la /github/ exits with code 2 when /github/ doesn't exist,
causing set -e to fail the step. Remove that listing.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Check /paperclip/.kube, /paperclip/.kube/config, /home/runner/.kube,
/home/runner/.kube/config, /runner, and /runner/config. Export
KUBECONFIG so kubectl uses the real cluster.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
ARC runner stores kubeconfig in /home/runner/k8s/config (mounted
by Actions Runtime). Add both k8s and k8s-novolume to the search
paths and remove non-existent paths from diagnostics.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Add ls and echo diagnostics to understand where ARC runners store
kubeconfig. Include ACTIONS_KUBECONFIG and HOME env vars.
Also add $HOME/.kube to the search paths.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The 'kubectl auth can-i --as' impersonation check was falling back to
localhost:8080 because KUBECONFIG was not set and the ARC runner's
kubeconfig was not in the default location. azure/setup-kubectl@v4
does not set KUBECONFIG — it installs kubectl and relies on the runner's
existing kubeconfig in /runner/.kube/config (ARC runner home).
Add a 'Locate kubeconfig for ARC runner' step that searches the known
runner kubeconfig paths before the RBAC step runs, exports KUBECONFIG
to GITHUB_ENV, and verifies cluster connectivity before proceeding.
Fixes: PRI-785
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Apply e2e-ci-runner RBAC + polaris RBAC in workflow before pre-flight check
- Add e2e-ci-runner-polaris Role+RoleBinding so CI runner can manage polaris namespace RBAC
- Add roles/rolebindings CRUD to e2e-ci-runner Role (headlamp-dev namespace)
- Collapsed MISSING_ROLE/MISSING_ROLEBINDING into single MISSING flag (QA nit)
- Drop non-standard --quiet flag on kubectl auth can-i (QA nit)
Address PRI-324 QA feedback: workflow now applies its own RBAC so the pre-flight
check is meaningful and the green path is achievable.
* chore: replace Dependabot references with Renovate
- SECURITY.md: update to mention Renovate (org-wide Mend Renovate)
- PROJECT_ASSESSMENT.md: mark Renovate as integrated (org-wide config)
Closes PRI-389. Parent PRI-387.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* fix: override picomatch >=4.0.4 and vite >=6.4.2 to patch high-severity vulnerabilities
Resolves 3 high-severity vulnerabilities from pnpm audit:
- GHSA-c2c7-rcm5-vvqj: Picomatch ReDoS via extglob quantifiers (>=4.0.0 <4.0.4)
- GHSA-p9ff-h696-f583: Vite arbitrary file read via dev server WebSocket
- GHSA-4w7w-66w2-5vf9: Vite path traversal in optimized deps .map handling
Also addresses moderate GHSA-3v7f-55p6-f55p (picomatch method injection).
Remaining vulnerabilities (moderate/low) are in transitive dependencies
managed by @kinvolk/headlamp-plugin and @headlamp-k8s/eslint-config
which require upstream updates to those packages.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
---------
Co-authored-by: Chris Farhood <chris@farhood.org>
Co-authored-by: Paperclip <noreply@paperclip.ing>
The E2E workflow and deploy scripts were targeting the legacy
privilegedescalation-dev namespace, which is not managed by Flux GitOps
in privilegedescalation/infra.
The infra repo (PR #11) already provisions the headlamp-dev namespace
and corresponding RBAC (e2e-ci-runner-headlamp-rbac.yaml) that grants
the ARC runner SA (runners-privilegedescalation-gha-rs-no-permission in
arc-runners) the permissions needed to deploy/teardown the E2E
Headlamp instance.
This change aligns all E2E infrastructure to use headlamp-dev:
- .github/workflows/e2e.yaml: E2E_NAMESPACE=headlamp-dev
- scripts/deploy-e2e-headlamp.sh: default namespace and comments
- scripts/teardown-e2e-headlamp.sh: default namespace
- deployment/e2e-ci-runner-rbac.yaml: namespace and add missing events
permission (already present in infra copy)
Refs: PRI-423
Co-authored-by: Chris Farhood <chris@farhood.org>
Co-authored-by: Paperclip <noreply@paperclip.ing>
* fix: override lodash >=4.18.0 to patch code injection vulnerability
GHSA-r5fr-rjxr-66jc is a code injection vulnerability in lodash
below 4.18.0. The vulnerable transitive dependency comes through
@kinvolk/headlamp-plugin.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: update pnpm-lock.yaml to satisfy lodash override
The package.json pnpm.overrides requires lodash >=4.18.0, but the lockfile
had an older version. Regenerated lockfile with pnpm install.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* fix(e2e): scope heading locators to main content area
Fix E2E test failures by scoping heading locators to the main
content area instead of searching the entire page. This prevents
matching headings in the sidebar or other non-content areas.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* fix(e2e): scope remaining getByText to main element
The 'Cluster Score' text matcher was still searching the entire page
instead of being scoped to the main content area. This could cause
false positives if the same text appears in the sidebar.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* ci: trigger fresh E2E run
Re-pushing to trigger a new CI run since the last E2E was cancelled.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* fix(e2e): use [role=main] instead of main element
Switch from 'main' element selector to '[role="main"]' attribute
selector for better compatibility with Headlamp's app structure.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* fix(e2e): hybrid approach - unscoped headings, main-scoped text
Use broader heading selectors matching intel-gpu pattern, but
keep text checks scoped to main element to avoid sidebar conflicts.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* ci: re-test original code to verify baseline
---------
Co-authored-by: Gandalf the Greybeard <gandalf@privilegedescalation.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Paperclip <noreply@paperclip.ing>
When the E2E deploy step fails (rollout timeout, pod not ready, etc.),
previously required manual cluster investigation to diagnose the root
cause. This heartbeat had to grep CI logs and query kubectl separately
to determine a :latest image drift issue.
The new step captures pod state, pod describe output, and recent namespace
events immediately when a failure occurs — surfacing the root cause
directly in the CI run log.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The :latest tag caused E2E flakiness when a newer Headlamp image was
pulled on some cluster nodes (IfNotPresent policy) but not others.
Concurrent E2E runs on main saw different image versions, and the newest
:latest (sha256:89c6c65) failed to pass the readiness probe within 120s.
Pin to v0.40.1 — the same version running in production (kube-system) —
so all nodes use the same cached digest and CI is deterministic. Update
this pin when Headlamp is upgraded in production.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
kubectl apply without prior deletion patches in place: if the pod spec is
unchanged between runs, no rollout is triggered and a potentially degraded
pod from a prior run keeps serving. This caused the auth.setup.ts timeout
(waiting for the "use a token" button) even when no concurrent runs were
present — the headlamp-e2e pod was in an inconsistent state from a previous
run that didn't tear down cleanly.
Changes:
- deploy-e2e-headlamp.sh: delete Deployment, Service, and ServiceAccount
(with --wait) before applying, guaranteeing a fresh pod each run
- auth.setup.ts: add explicit waitFor({ state: 'visible', timeout: 15_000 })
before the "use a token" button click, so failures surface at 15 s with a
clear locator error rather than silently timing out at 60 s
Fixes the pre-existing infra issue blocking PR#110.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cancel-in-progress: true would cancel in-flight E2E runs when a new one
arrives. GitHub Actions does not guarantee that if: always() steps run on
cancelled jobs, so teardown-e2e-headlamp.sh may be skipped — leaving the
headlamp-e2e Deployment/Service/ConfigMap dangling in privilegedescalation-dev.
Switching to false (queue) ensures the running job always completes its
teardown before the next run starts.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Prevents parallel E2E runs from conflicting over the shared
headlamp-e2e Helm release in privilegedescalation-dev. With
cancel-in-progress: true, a new push cancels any in-progress
run on the same repo — only one E2E suite runs at a time.
Observed failure: PR#109 and PR#108 ran concurrently and the
auth setup in PR#109 timed out, likely due to resource contention
on the shared headlamp-e2e instance.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Replaces the duplicated Renovate config with a simple extend from the
org-level preset (privilegedescalation/.github:renovate-config). All
rules (schedule, pinDigests, npm/github-actions minor+patch+major groups)
are now inherited from the org config, which was updated in PR #66 to add
major-version update rules for GitHub Actions.
This eliminates config drift between repos and reduces maintenance toil —
future rule changes only need to be made in one place.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Remove "Coverage threshold: Vitest coverage threshold enforced in CI
(#82)" — the shared CI workflow does not run coverage; this line was
inaccurate.
- Fix [1.0.0] compare link from v0.6.0...v1.0.0 to v0.7.2...v1.0.0
to accurately reflect the last tagged release.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Bump version to 1.0.0 in package.json and package-lock.json
- Update artifacthub-pkg.yml: version, archive-url, and changes section
- Add v1.0.0 CHANGELOG entry covering changes since v0.7.2
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* ci: add packageManager field to pin pnpm version
Pins pnpm@10.32.1 via the packageManager field. This allows
pnpm/action-setup@v4 to resolve the version from package.json instead
of relying on `version: latest`, preventing silent breakage on major
pnpm version bumps.
Fixes: PRI-674
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* ci: trigger fresh CI run after Corepack fix merges in .github PR #57
The shared plugin-ci.yaml now uses Corepack when packageManager field
is set, avoiding the 'Multiple versions of pnpm specified' error.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
* ci: retrigger CI with updated shared workflow (python3 pnpm detection)
---------
Co-authored-by: Hugh Hackman <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: privilegedescalation-ceo[bot] <269721483+privilegedescalation-ceo[bot]@users.noreply.github.com>
Co-authored-by: Hugh Hackman <hugh@privilegedescalation.paperclip.ing>
Co-authored-by: Hugh Hackman <hugh@privilegedescalation.com>
The org renovate-config.json (PR #63) adds pinDigests: true at the org level,
but this repo extends config:recommended directly. Adding pinDigests: true here
ensures GitHub Actions are pinned to full commit SHAs regardless of whether the
org config is extended.
Related: privilegedescalation/.github#63, PRI-757
kubectl rollout status confirms the pod is ready per readinessProbe, but
Kubernetes Service DNS propagation to the runner pod may lag behind.
This caused intermittent E2E failures with ERR_NAME_NOT_RESOLVED.
Add a poll loop (max 120s) after rollout status that verifies the service
URL is reachable via HTTP before writing .env.e2e. This eliminates the
race condition between DNS propagation and Playwright launch.
Fixes: PRI-687 (intermittent E2E DNS failure)
Calls the shared privilegedescalation/.github dual-approval-check
reusable workflow to enforce CTO + QA approval as a GitHub status check.
Once privilegedescalation/.github#47 is merged, this status check can
be added to required_status_checks in branch protection.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
pnpm strict mode does not hoist transitive deps. @headlamp-k8s/eslint-config
was only available via @kinvolk/headlamp-plugin, causing ESLint to fail with
"Cannot find config @headlamp-k8s/eslint-config to extend from". Adding it
as a direct devDependency makes it accessible at the root node_modules level.
Co-Authored-By: Paperclip <noreply@paperclip.ing>