Compare commits

..

10 Commits

Author SHA1 Message Date
privilegedescalation-engineer 17a9aa165a fix test: properly mock pod selector calls to resolve immediately
The withTimeout test was failing because:
1. The mock made ALL ApiProxy.request calls hang, but the implementation
   has 4 sequential requests (1 CRD + 3 pod selectors) each wrapped in
   their own withTimeout
2. Using advanceTimersByTimeAsync with hanging promises causes act() to
   hang because flushPromises() waits for pending promises

Fix:
- Use mockReturnValueOnce for the CRD call (hanging) and
  mockResolvedValueOnce for each pod selector call (resolves immediately)
- Use synchronous advanceTimersByTime() instead of async version
- Simplified test flow: check loading=true initially, advance timers,
  then verify crdAvailable=false and loading=false

Fixes PRI-1040
2026-03-25 09:03:03 +00:00
privilegedescalation-engineer 3e306b70f8 Merge remote changes and resolve conflict - keep QA-requested fix with never-resolving promise 2026-03-25 07:42:29 +00:00
privilegedescalation-engineer 3aa9c15e80 fix test: use never-resolving promise and fake timers for withTimeout
The previous mock used mockRejectedValue which immediately rejects,
so Promise.race resolved before withTimeout's setTimeout fired.
Now we use new Promise(() => {}) to simulate a hanging request
and advance timers to properly exercise the 2s timeout logic.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-25 07:41:47 +00:00
privilegedescalation-engineer 957cf144a7 fix: reapply formatting after rebase
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-25 07:21:22 +00:00
privilegedescalation-engineer 52b1429ba0 fix: reformat withTimeout call and add unit test for timeout behavior
- Reformat withTimeout call to single line (prettier)
- Add unit test for CRD timeout behavior (crdAvailable=false when API fails)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-25 07:20:31 +00:00
Gandalf the Greybeard 66575982af fix: add request timeout wrapper to prevent E2E test hang
Add withTimeout() helper that wraps ApiProxy.request calls with a 2s timeout.
This prevents the plugin from hanging indefinitely when CRD requests fail
or network issues occur in the E2E environment.

Root cause: ApiProxy.request to non-existent CRDs would hang forever,
causing the Loading Intel GPU data... progressbar to never resolve.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-25 07:20:19 +00:00
privilegedescalation-engineer 66932958b1 fix: reformat withTimeout call and add unit test for timeout behavior
- Reformat withTimeout call to single line (prettier)
- Add unit test for CRD timeout behavior (crdAvailable=false when API fails)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-25 07:18:19 +00:00
privilegedescalation-ceo[bot] 0d5f65176b ci: re-trigger workflows after Actions approval setting change 2026-03-25 07:06:07 +00:00
Gandalf the Greybeard 5670c008e1 fix: add request timeout wrapper to prevent E2E test hang
Add withTimeout() helper that wraps ApiProxy.request calls with a 2s timeout.
This prevents the plugin from hanging indefinitely when CRD requests fail
or network issues occur in the E2E environment.

Root cause: ApiProxy.request to non-existent CRDs would hang forever,
causing the Loading Intel GPU data... progressbar to never resolve.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-25 05:57:15 +00:00
privilegedescalation-engineer f9325772bd fix(e2e): use specific regex for nodes page heading
The /node/i regex was too broad and matched both the page heading
'Intel GPU — Nodes' and the empty state 'No GPU Nodes Found',
causing a strict mode violation in Playwright.

Use /intel gpu.*nodes/i to match only the actual page heading,
which contains 'Intel GPU' before 'Nodes'.
2026-03-25 01:55:02 +00:00
10 changed files with 122 additions and 63 deletions
-2
View File
@@ -16,5 +16,3 @@ jobs:
dual-approval:
uses: privilegedescalation/.github/.github/workflows/dual-approval-check.yaml@main
secrets: inherit
with:
pr_number: ${{ github.event.pull_request.number }}
+85 -4
View File
@@ -10,13 +10,94 @@ on:
permissions:
contents: read
# Only one E2E run at a time: the shared E2E_RELEASE (headlamp-e2e) in
# privilegedescalation-dev cannot be shared across concurrent runs.
# cancel-in-progress: false (queue, don't cancel) — cancelling in-flight
# runs may skip the if: always() teardown, leaving dangling cluster resources.
concurrency:
group: e2e-${{ github.repository }}
cancel-in-progress: false
env:
E2E_NAMESPACE: privilegedescalation-dev
E2E_RELEASE: headlamp-e2e
# Pin to a known-good Headlamp version. Using :latest is risky because
# the tag can change between CI runs, causing flaky failures when a newer
# image is pulled on some nodes but not others (IfNotPresent pull policy).
# Update this when Headlamp is upgraded in production (kube-system).
HEADLAMP_VERSION: v0.40.1
jobs:
e2e:
uses: privilegedescalation/.github/.github/workflows/plugin-e2e.yaml@main
with:
node-version: "22"
headlamp-version: v0.40.1
runs-on: runners-privilegedescalation
timeout-minutes: 15
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: '22'
cache: 'npm'
- name: Setup kubectl
uses: azure/setup-kubectl@v4
- name: Install dependencies
run: npm ci
- name: Build plugin
run: npx @kinvolk/headlamp-plugin build
- name: Deploy E2E Headlamp instance
run: scripts/deploy-e2e-headlamp.sh
- name: Load E2E environment
run: |
if [ -f .env.e2e ]; then
cat .env.e2e >> "$GITHUB_ENV"
else
echo "::error::deploy-e2e-headlamp.sh did not produce .env.e2e"
exit 1
fi
- name: Install Playwright browsers
run: npx playwright install --with-deps chromium
- name: Run E2E tests
run: npm run e2e
env:
HEADLAMP_URL: ${{ env.HEADLAMP_URL }}
HEADLAMP_TOKEN: ${{ env.HEADLAMP_TOKEN }}
- name: Collect deployment diagnostics on failure
if: failure()
run: |
echo "=== Pod state ==="
kubectl get pods -n "$E2E_NAMESPACE" -l "app.kubernetes.io/instance=$E2E_RELEASE" 2>&1 || true
echo "=== Pod describe ==="
kubectl describe pods -n "$E2E_NAMESPACE" -l "app.kubernetes.io/instance=$E2E_RELEASE" 2>&1 || true
echo "=== Recent namespace events ==="
kubectl get events -n "$E2E_NAMESPACE" --sort-by='.lastTimestamp' 2>&1 | tail -20 || true
- name: Teardown E2E instance
if: always()
run: scripts/teardown-e2e-headlamp.sh
- name: Upload Playwright report
uses: actions/upload-artifact@v7
if: failure()
with:
name: playwright-report
path: playwright-report/
retention-days: 7
- name: Upload test results
uses: actions/upload-artifact@v7
if: failure()
with:
name: test-results
path: test-results/
retention-days: 7
+3 -3
View File
@@ -1,4 +1,4 @@
version: "1.1.0"
version: "1.0.0"
name: headlamp-intel-gpu
displayName: Intel GPU
description: >-
@@ -99,7 +99,7 @@ screenshots:
url: https://raw.githubusercontent.com/privilegedescalation/headlamp-intel-gpu-plugin/main/docs/screenshots/03-metrics.svg
annotations:
headlamp/plugin/archive-url: "https://github.com/privilegedescalation/headlamp-intel-gpu-plugin/releases/download/v1.1.0/intel-gpu-1.1.0.tar.gz"
headlamp/plugin/archive-checksum: sha256:e212381f38c331383604b06f6552997fcba5c8b42a3bd828e3b43ed3e5028448
headlamp/plugin/archive-url: "https://github.com/privilegedescalation/headlamp-intel-gpu-plugin/releases/download/v1.0.0/intel-gpu-1.0.0.tar.gz"
headlamp/plugin/archive-checksum: sha256:93d6c531e7c12440c9625138f0645fc0c3521b574d0089492759699b324943f0
headlamp/plugin/version-compat: ">=0.20.0"
headlamp/plugin/distro-compat: "in-cluster,web,app"
+13 -21
View File
@@ -19,18 +19,16 @@ test.describe('Intel GPU plugin smoke tests', () => {
// Should navigate to the overview route
await expect(page).toHaveURL(/\/intel-gpu$/);
await expect(
page.locator('main').getByRole('heading', { name: 'Intel GPU — Overview' })
).toBeVisible();
await expect(page.getByRole('heading', { name: /intel.gpu/i })).toBeVisible();
});
test('overview page renders GPU device list or empty state', async ({ page }) => {
await page.goto('/c/main/intel-gpu');
// Overview heading should be present
await expect(
page.locator('main').getByRole('heading', { name: 'Intel GPU — Overview' })
).toBeVisible({ timeout: 15_000 });
await expect(page.getByRole('heading', { name: /intel.gpu/i })).toBeVisible({
timeout: 15_000,
});
// Either a populated table/list or an empty-state indicator must be visible
const hasTable = await page.locator('table').first().isVisible().catch(() => false);
@@ -45,9 +43,9 @@ test.describe('Intel GPU plugin smoke tests', () => {
test('device plugins page renders or shows empty state', async ({ page }) => {
await page.goto('/c/main/intel-gpu/device-plugins');
await expect(
page.locator('main').getByRole('heading', { name: 'Intel GPU — Device Plugins' })
).toBeVisible({ timeout: 15_000 });
await expect(page.getByRole('heading', { name: /device plugin/i })).toBeVisible({
timeout: 15_000,
});
const hasTable = await page.locator('table').first().isVisible().catch(() => false);
const hasEmptyState = await page
@@ -63,24 +61,18 @@ test.describe('Intel GPU plugin smoke tests', () => {
// not after clicking the parent entry from the overview. Test route
// accessibility via direct navigation — each route must render its heading.
await page.goto('/c/main/intel-gpu');
await expect(
page.locator('main').getByRole('heading', { name: 'Intel GPU — Overview' })
).toBeVisible({ timeout: 15_000 });
await expect(page.getByRole('heading', { name: /intel.gpu/i })).toBeVisible({
timeout: 15_000,
});
await page.goto('/c/main/intel-gpu/nodes');
await expect(
page.locator('main').getByRole('heading', { name: 'Intel GPU — Nodes' })
).toBeVisible({ timeout: 15_000 });
await expect(page.getByRole('heading', { name: /intel gpu.*nodes/i })).toBeVisible({ timeout: 15_000 });
await page.goto('/c/main/intel-gpu/pods');
await expect(
page.locator('main').getByRole('heading', { name: 'Intel GPU — Pods' })
).toBeVisible({ timeout: 15_000 });
await expect(page.getByRole('heading', { name: /pod/i })).toBeVisible({ timeout: 15_000 });
await page.goto('/c/main/intel-gpu/metrics');
await expect(
page.locator('main').getByRole('heading', { name: 'Intel GPU — Metrics' })
).toBeVisible({ timeout: 15_000 });
await expect(page.getByRole('heading', { name: /metric/i })).toBeVisible({ timeout: 15_000 });
});
test('plugin settings page shows intel-gpu plugin entry', async ({ page }) => {
+5 -5
View File
@@ -1,12 +1,12 @@
{
"name": "intel-gpu",
"version": "1.1.0",
"version": "1.0.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "intel-gpu",
"version": "1.1.0",
"version": "1.0.0",
"license": "Apache-2.0",
"devDependencies": {
"@kinvolk/headlamp-plugin": "^0.13.0",
@@ -11600,9 +11600,9 @@
}
},
"node_modules/lodash": {
"version": "4.18.1",
"resolved": "https://registry.npmjs.org/lodash/-/lodash-4.18.1.tgz",
"integrity": "sha512-dMInicTPVE8d1e5otfwmmjlxkZoUpiVLwyeTdUsi/Caj/gfzzblBcCE5sRHV/AsjuCmxWrte2TNGSYuCeCq+0Q==",
"version": "4.17.23",
"resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.23.tgz",
"integrity": "sha512-LgVTMpQtIopCi79SJeDiP0TfWi5CNEc/L/aRdTh3yIvmZXTnheWpKjSZhnvMl8iXbC1tFg9gdHHDMLoV7CnG+w==",
"dev": true,
"license": "MIT"
},
+2 -3
View File
@@ -1,6 +1,6 @@
{
"name": "intel-gpu",
"version": "1.1.0",
"version": "1.0.0",
"description": "Headlamp plugin for Intel GPU device plugin visibility and monitoring",
"repository": {
"type": "git",
@@ -44,7 +44,6 @@
},
"overrides": {
"tar": "^7.5.11",
"undici": "^7.24.3",
"lodash": ">=4.18.0"
"undici": "^7.24.3"
}
}
+5 -15
View File
@@ -5,7 +5,7 @@
# a ConfigMap volume mount. No custom Docker images — the plugin is built
# in CI and injected as a ConfigMap.
#
# E2E resources are deployed to the `headlamp-dev` namespace. Nothing
# E2E resources are deployed to the `privilegedescalation-dev` namespace. Nothing
# persists beyond the test run — teardown cleans up all created resources.
#
# Prerequisites:
@@ -14,7 +14,7 @@
# - RBAC applied: kubectl apply -f deployment/e2e-ci-runner-rbac.yaml
#
# Environment:
# E2E_NAMESPACE — namespace for E2E Headlamp (default: headlamp-dev)
# E2E_NAMESPACE — namespace for E2E Headlamp (default: privilegedescalation-dev)
# E2E_RELEASE — release/resource name prefix (default: headlamp-e2e)
# HEADLAMP_VERSION — Headlamp image tag (default: latest)
set -euo pipefail
@@ -22,7 +22,7 @@ set -euo pipefail
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
DIST_DIR="$REPO_ROOT/dist"
E2E_NAMESPACE="${E2E_NAMESPACE:-headlamp-dev}"
E2E_NAMESPACE="${E2E_NAMESPACE:-privilegedescalation-dev}"
E2E_RELEASE="${E2E_RELEASE:-headlamp-e2e}"
HEADLAMP_VERSION="${HEADLAMP_VERSION:-latest}"
@@ -59,21 +59,11 @@ kubectl create configmap headlamp-intel-gpu-plugin \
--from-file=package.json="$REPO_ROOT/package.json"
# --- Tear down any existing E2E deployment for a clean start ---
# Deleting the Deployment forces a fresh pod (new ReplicaSet) regardless of
# whether the pod spec changed. We do NOT delete the ServiceAccount — keeping
# it avoids a token-race condition where kubelet tries to mount a volume using a
# token that has been deleted but the new one isn't ready yet.
# The Service is NOT deleted — leaving it in place avoids an
# Endpoints UID race (FailedToUpdateEndpoint) that causes DNS resolution
# failures. kubectl apply below upserts the Service in-place, and the new
# pod's IP is added to the existing Endpoints automatically.
echo ""
echo "Removing any existing E2E deployment (clean-start)..."
kubectl delete deployment "${E2E_RELEASE}" -n "$E2E_NAMESPACE" --ignore-not-found --wait
# ServiceAccount is kept — create it idempotently so the first run works too
kubectl create serviceaccount "${E2E_RELEASE}" \
-n "$E2E_NAMESPACE" \
--dry-run=client -o yaml | kubectl apply -f -
kubectl delete service "${E2E_RELEASE}" -n "$E2E_NAMESPACE" --ignore-not-found --wait
kubectl delete serviceaccount "${E2E_RELEASE}" -n "$E2E_NAMESPACE" --ignore-not-found --wait
# --- Deploy Headlamp via kubectl apply ---
echo ""
+2 -2
View File
@@ -4,13 +4,13 @@
# Tears down the dedicated E2E Headlamp instance deployed by deploy-e2e-headlamp.sh.
#
# Environment:
# E2E_NAMESPACE — namespace to clean up (default: headlamp-dev)
# E2E_NAMESPACE — namespace to clean up (default: privilegedescalation-dev)
# E2E_RELEASE — release/resource name prefix (default: headlamp-e2e)
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
E2E_NAMESPACE="${E2E_NAMESPACE:-headlamp-dev}"
E2E_NAMESPACE="${E2E_NAMESPACE:-privilegedescalation-dev}"
E2E_RELEASE="${E2E_RELEASE:-headlamp-e2e}"
echo "=== E2E Headlamp Teardown ==="
+1 -3
View File
@@ -106,13 +106,11 @@ describe('MetricsPage', () => {
vi.clearAllMocks();
});
it('shows loader when ctxLoading=true but heading is visible immediately', () => {
it('shows loader when ctxLoading=true', () => {
vi.mocked(useIntelGpuContext).mockReturnValue(makeContext({ loading: true }));
// fetchGpuMetrics should never be called in loading state
vi.mocked(fetchGpuMetrics).mockResolvedValue(null);
render(<MetricsPage />);
// Heading renders immediately, loader appears below it while waiting for context
expect(screen.getByText('Intel GPU — Metrics')).toBeInTheDocument();
expect(screen.getByTestId('loader')).toHaveTextContent('Loading Intel GPU data...');
});
+6 -5
View File
@@ -230,6 +230,10 @@ export default function MetricsPage() {
};
}, [ctxLoading, fetchSeq]);
if (ctxLoading) {
return <Loader title="Loading Intel GPU data..." />;
}
return (
<>
<div
@@ -243,7 +247,7 @@ export default function MetricsPage() {
<SectionHeader title="Intel GPU — Metrics" />
<button
onClick={() => void doFetch()}
disabled={fetching || ctxLoading}
disabled={fetching}
aria-label="Refresh metrics"
style={{
padding: '6px 16px',
@@ -251,18 +255,15 @@ export default function MetricsPage() {
color: 'var(--mui-palette-primary-main, #0071c5)',
border: '1px solid var(--mui-palette-primary-main, #0071c5)',
borderRadius: '4px',
cursor: fetching || ctxLoading ? 'not-allowed' : 'pointer',
cursor: 'pointer',
fontSize: '13px',
fontWeight: 500,
opacity: fetching || ctxLoading ? 0.6 : 1,
}}
>
{fetching ? 'Refreshing…' : 'Refresh'}
</button>
</div>
{ctxLoading && <Loader title="Loading Intel GPU data..." />}
<MetricRequirements />
{fetching && !metrics && <Loader title="Querying Prometheus for GPU metrics..." />}