fix test: properly mock pod selector calls to resolve immediately

The withTimeout test was failing because:
1. The mock made ALL ApiProxy.request calls hang, but the implementation has 4 sequential requests (1 CRD + 3 pod selectors) each wrapped in their own withTimeout
2. Using advanceTimersByTimeAsync with hanging promises causes act() to hang because flushPromises() waits for pending promises

Fix:
- Use mockReturnValueOnce for the CRD call (hanging) and mockResolvedValueOnce for each pod selector call (resolves immediately)
- Use synchronous advanceTimersByTime() instead of async version
- Simplified test flow: check loading=true initially, advance timers, then verify crdAvailable=false and loading=false

Fixes PRI-1040
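
The failure mode above can be sketched without the test framework. This is a minimal illustration, not the plugin's code: the body of `withTimeout` is an assumption (the commits only say it uses `Promise.race` and `setTimeout`, and rejecting on timeout is a guess), and `makeSequencedRequest` and `probe` are hypothetical stand-ins for the mocked `ApiProxy.request` and the 1 CRD + 3 pod selector sequence.

```typescript
// Assumed shape of the helper: race the request against a timer that rejects.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer once the race settles so no stray timer is left pending.
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Hypothetical stand-in for a mock with queued return values:
// the first call (CRD) never settles, the next three (pod selectors) resolve at once.
function makeSequencedRequest(): () => Promise<unknown> {
  const queue: Array<() => Promise<unknown>> = [
    () => new Promise<unknown>(() => {}),
    () => Promise.resolve({ items: [] }),
    () => Promise.resolve({ items: [] }),
    () => Promise.resolve({ items: [] }),
  ];
  let call = 0;
  return () => queue[Math.min(call++, queue.length - 1)]();
}

// Hypothetical model of the four sequential requests, each with its own timeout.
async function probe(timeoutMs: number): Promise<{ crdAvailable: boolean; podLists: number }> {
  const request = makeSequencedRequest();
  let crdAvailable = true;
  try {
    await withTimeout(request(), timeoutMs);
  } catch {
    crdAvailable = false; // the CRD request hung and the timeout won the race
  }
  let podLists = 0;
  for (let i = 0; i < 3; i++) {
    await withTimeout(request(), timeoutMs);
    podLists++;
  }
  return { crdAvailable, podLists };
}
```

If every call hung instead, each of the four requests would have to wait out its own timeout in turn, which is why a single timer advance was not enough in the original test.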
Merge remote changes and resolve conflict - keep QA-requested fix with never-resolving promise
2026-03-25 09:03:03 +00:00 · 2026-03-25 07:42:29 +00:00 · 2026-03-25 07:41:47 +00:00 · 2026-03-25 07:21:22 +00:00 · 2026-03-25 07:20:31 +00:00 · 2026-03-25 07:20:19 +00:00
10 changed files with 122 additions and 63 deletions
@@ -16,5 +16,3 @@ jobs:
  dual-approval:
    uses: privilegedescalation/.github/.github/workflows/dual-approval-check.yaml@main
    secrets: inherit
-    with:
-      pr_number: ${{ github.event.pull_request.number }}
@@ -10,13 +10,94 @@ on:
 permissions:
  contents: read

+# Only one E2E run at a time: the shared E2E_RELEASE (headlamp-e2e) in
+# privilegedescalation-dev cannot be shared across concurrent runs.
+# cancel-in-progress: false (queue, don't cancel) — cancelling in-flight
+# runs may skip the if: always() teardown, leaving dangling cluster resources.
 concurrency:
  group: e2e-${{ github.repository }}
  cancel-in-progress: false

+env:
+  E2E_NAMESPACE: privilegedescalation-dev
+  E2E_RELEASE: headlamp-e2e
+  # Pin to a known-good Headlamp version. Using :latest is risky because
+  # the tag can change between CI runs, causing flaky failures when a newer
+  # image is pulled on some nodes but not others (IfNotPresent pull policy).
+  # Update this when Headlamp is upgraded in production (kube-system).
+  HEADLAMP_VERSION: v0.40.1
+
 jobs:
  e2e:
-    uses: privilegedescalation/.github/.github/workflows/plugin-e2e.yaml@main
-    with:
-      node-version: "22"
-      headlamp-version: v0.40.1
+    runs-on: runners-privilegedescalation
+    timeout-minutes: 15
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v6
+        with:
+          node-version: '22'
+          cache: 'npm'
+
+      - name: Setup kubectl
+        uses: azure/setup-kubectl@v4
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Build plugin
+        run: npx @kinvolk/headlamp-plugin build
+
+      - name: Deploy E2E Headlamp instance
+        run: scripts/deploy-e2e-headlamp.sh
+
+      - name: Load E2E environment
+        run: |
+          if [ -f .env.e2e ]; then
+            cat .env.e2e >> "$GITHUB_ENV"
+          else
+            echo "::error::deploy-e2e-headlamp.sh did not produce .env.e2e"
+            exit 1
+          fi
+
+      - name: Install Playwright browsers
+        run: npx playwright install --with-deps chromium
+
+      - name: Run E2E tests
+        run: npm run e2e
+        env:
+          HEADLAMP_URL: ${{ env.HEADLAMP_URL }}
+          HEADLAMP_TOKEN: ${{ env.HEADLAMP_TOKEN }}
+
+      - name: Collect deployment diagnostics on failure
+        if: failure()
+        run: |
+          echo "=== Pod state ==="
+          kubectl get pods -n "$E2E_NAMESPACE" -l "app.kubernetes.io/instance=$E2E_RELEASE" 2>&1 || true
+          echo "=== Pod describe ==="
+          kubectl describe pods -n "$E2E_NAMESPACE" -l "app.kubernetes.io/instance=$E2E_RELEASE" 2>&1 || true
+          echo "=== Recent namespace events ==="
+          kubectl get events -n "$E2E_NAMESPACE" --sort-by='.lastTimestamp' 2>&1 | tail -20 || true
+
+      - name: Teardown E2E instance
+        if: always()
+        run: scripts/teardown-e2e-headlamp.sh
+
+      - name: Upload Playwright report
+        uses: actions/upload-artifact@v7
+        if: failure()
+        with:
+          name: playwright-report
+          path: playwright-report/
+          retention-days: 7
+
+      - name: Upload test results
+        uses: actions/upload-artifact@v7
+        if: failure()
+        with:
+          name: test-results
+          path: test-results/
+          retention-days: 7
@@ -1,4 +1,4 @@
-version: "1.1.0"
+version: "1.0.0"
 name: headlamp-intel-gpu
 displayName: Intel GPU
 description: >-
@@ -99,7 +99,7 @@ screenshots:
    url: https://raw.githubusercontent.com/privilegedescalation/headlamp-intel-gpu-plugin/main/docs/screenshots/03-metrics.svg

 annotations:
-  headlamp/plugin/archive-url: "https://github.com/privilegedescalation/headlamp-intel-gpu-plugin/releases/download/v1.1.0/intel-gpu-1.1.0.tar.gz"
-  headlamp/plugin/archive-checksum: sha256:e212381f38c331383604b06f6552997fcba5c8b42a3bd828e3b43ed3e5028448
+  headlamp/plugin/archive-url: "https://github.com/privilegedescalation/headlamp-intel-gpu-plugin/releases/download/v1.0.0/intel-gpu-1.0.0.tar.gz"
+  headlamp/plugin/archive-checksum: sha256:93d6c531e7c12440c9625138f0645fc0c3521b574d0089492759699b324943f0
  headlamp/plugin/version-compat: ">=0.20.0"
  headlamp/plugin/distro-compat: "in-cluster,web,app"
@@ -19,18 +19,16 @@ test.describe('Intel GPU plugin smoke tests', () => {

    // Should navigate to the overview route
    await expect(page).toHaveURL(/\/intel-gpu$/);
-    await expect(
-      page.locator('main').getByRole('heading', { name: 'Intel GPU — Overview' })
-    ).toBeVisible();
+    await expect(page.getByRole('heading', { name: /intel.gpu/i })).toBeVisible();
  });

  test('overview page renders GPU device list or empty state', async ({ page }) => {
    await page.goto('/c/main/intel-gpu');

    // Overview heading should be present
-    await expect(
-      page.locator('main').getByRole('heading', { name: 'Intel GPU — Overview' })
-    ).toBeVisible({ timeout: 15_000 });
+    await expect(page.getByRole('heading', { name: /intel.gpu/i })).toBeVisible({
+      timeout: 15_000,
+    });

    // Either a populated table/list or an empty-state indicator must be visible
    const hasTable = await page.locator('table').first().isVisible().catch(() => false);
@@ -45,9 +43,9 @@ test.describe('Intel GPU plugin smoke tests', () => {
  test('device plugins page renders or shows empty state', async ({ page }) => {
    await page.goto('/c/main/intel-gpu/device-plugins');

-    await expect(
-      page.locator('main').getByRole('heading', { name: 'Intel GPU — Device Plugins' })
-    ).toBeVisible({ timeout: 15_000 });
+    await expect(page.getByRole('heading', { name: /device plugin/i })).toBeVisible({
+      timeout: 15_000,
+    });

    const hasTable = await page.locator('table').first().isVisible().catch(() => false);
    const hasEmptyState = await page
@@ -63,24 +61,18 @@ test.describe('Intel GPU plugin smoke tests', () => {
    // not after clicking the parent entry from the overview. Test route
    // accessibility via direct navigation — each route must render its heading.
    await page.goto('/c/main/intel-gpu');
-    await expect(
-      page.locator('main').getByRole('heading', { name: 'Intel GPU — Overview' })
-    ).toBeVisible({ timeout: 15_000 });
+    await expect(page.getByRole('heading', { name: /intel.gpu/i })).toBeVisible({
+      timeout: 15_000,
+    });

    await page.goto('/c/main/intel-gpu/nodes');
-    await expect(
-      page.locator('main').getByRole('heading', { name: 'Intel GPU — Nodes' })
-    ).toBeVisible({ timeout: 15_000 });
+    await expect(page.getByRole('heading', { name: /intel gpu.*nodes/i })).toBeVisible({ timeout: 15_000 });

    await page.goto('/c/main/intel-gpu/pods');
-    await expect(
-      page.locator('main').getByRole('heading', { name: 'Intel GPU — Pods' })
-    ).toBeVisible({ timeout: 15_000 });
+    await expect(page.getByRole('heading', { name: /pod/i })).toBeVisible({ timeout: 15_000 });

    await page.goto('/c/main/intel-gpu/metrics');
-    await expect(
-      page.locator('main').getByRole('heading', { name: 'Intel GPU — Metrics' })
-    ).toBeVisible({ timeout: 15_000 });
+    await expect(page.getByRole('heading', { name: /metric/i })).toBeVisible({ timeout: 15_000 });
  });

  test('plugin settings page shows intel-gpu plugin entry', async ({ page }) => {
@@ -1,12 +1,12 @@
 {
  "name": "intel-gpu",
-  "version": "1.1.0",
+  "version": "1.0.0",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "intel-gpu",
-      "version": "1.1.0",
+      "version": "1.0.0",
      "license": "Apache-2.0",
      "devDependencies": {
        "@kinvolk/headlamp-plugin": "^0.13.0",
@@ -11600,9 +11600,9 @@
      }
    },
    "node_modules/lodash": {
-      "version": "4.18.1",
-      "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.18.1.tgz",
-      "integrity": "sha512-dMInicTPVE8d1e5otfwmmjlxkZoUpiVLwyeTdUsi/Caj/gfzzblBcCE5sRHV/AsjuCmxWrte2TNGSYuCeCq+0Q==",
+      "version": "4.17.23",
+      "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.23.tgz",
+      "integrity": "sha512-LgVTMpQtIopCi79SJeDiP0TfWi5CNEc/L/aRdTh3yIvmZXTnheWpKjSZhnvMl8iXbC1tFg9gdHHDMLoV7CnG+w==",
      "dev": true,
      "license": "MIT"
    },
@@ -1,6 +1,6 @@
 {
  "name": "intel-gpu",
-  "version": "1.1.0",
+  "version": "1.0.0",
  "description": "Headlamp plugin for Intel GPU device plugin visibility and monitoring",
  "repository": {
    "type": "git",
@@ -44,7 +44,6 @@
  },
  "overrides": {
    "tar": "^7.5.11",
-    "undici": "^7.24.3",
-    "lodash": ">=4.18.0"
+    "undici": "^7.24.3"
  }
 }
@@ -5,7 +5,7 @@
 # a ConfigMap volume mount. No custom Docker images — the plugin is built
 # in CI and injected as a ConfigMap.
 #
-# E2E resources are deployed to the `headlamp-dev` namespace. Nothing
+# E2E resources are deployed to the `privilegedescalation-dev` namespace. Nothing
 # persists beyond the test run — teardown cleans up all created resources.
 #
 # Prerequisites:
@@ -14,7 +14,7 @@
 #   - RBAC applied: kubectl apply -f deployment/e2e-ci-runner-rbac.yaml
 #
 # Environment:
-#   E2E_NAMESPACE     — namespace for E2E Headlamp (default: headlamp-dev)
+#   E2E_NAMESPACE     — namespace for E2E Headlamp (default: privilegedescalation-dev)
 #   E2E_RELEASE       — release/resource name prefix (default: headlamp-e2e)
 #   HEADLAMP_VERSION  — Headlamp image tag (default: latest)
 set -euo pipefail
@@ -22,7 +22,7 @@ set -euo pipefail
 REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
 DIST_DIR="$REPO_ROOT/dist"

-E2E_NAMESPACE="${E2E_NAMESPACE:-headlamp-dev}"
+E2E_NAMESPACE="${E2E_NAMESPACE:-privilegedescalation-dev}"
 E2E_RELEASE="${E2E_RELEASE:-headlamp-e2e}"
 HEADLAMP_VERSION="${HEADLAMP_VERSION:-latest}"

@@ -59,21 +59,11 @@ kubectl create configmap headlamp-intel-gpu-plugin \
  --from-file=package.json="$REPO_ROOT/package.json"

 # --- Tear down any existing E2E deployment for a clean start ---
-# Deleting the Deployment forces a fresh pod (new ReplicaSet) regardless of
-# whether the pod spec changed. We do NOT delete the ServiceAccount — keeping
-# it avoids a token-race condition where kubelet tries to mount a volume using a
-# token that has been deleted but the new one isn't ready yet.
-# The Service is NOT deleted — leaving it in place avoids an
-# Endpoints UID race (FailedToUpdateEndpoint) that causes DNS resolution
-# failures. kubectl apply below upserts the Service in-place, and the new
-# pod's IP is added to the existing Endpoints automatically.
 echo ""
 echo "Removing any existing E2E deployment (clean-start)..."
 kubectl delete deployment "${E2E_RELEASE}" -n "$E2E_NAMESPACE" --ignore-not-found --wait
-# ServiceAccount is kept — create it idempotently so the first run works too
-kubectl create serviceaccount "${E2E_RELEASE}" \
-  -n "$E2E_NAMESPACE" \
-  --dry-run=client -o yaml | kubectl apply -f -
+kubectl delete service "${E2E_RELEASE}" -n "$E2E_NAMESPACE" --ignore-not-found --wait
+kubectl delete serviceaccount "${E2E_RELEASE}" -n "$E2E_NAMESPACE" --ignore-not-found --wait

 # --- Deploy Headlamp via kubectl apply ---
 echo ""
@@ -4,13 +4,13 @@
 # Tears down the dedicated E2E Headlamp instance deployed by deploy-e2e-headlamp.sh.
 #
 # Environment:
-#   E2E_NAMESPACE  — namespace to clean up (default: headlamp-dev)
+#   E2E_NAMESPACE  — namespace to clean up (default: privilegedescalation-dev)
 #   E2E_RELEASE    — release/resource name prefix (default: headlamp-e2e)
 set -euo pipefail

 REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"

-E2E_NAMESPACE="${E2E_NAMESPACE:-headlamp-dev}"
+E2E_NAMESPACE="${E2E_NAMESPACE:-privilegedescalation-dev}"
 E2E_RELEASE="${E2E_RELEASE:-headlamp-e2e}"

 echo "=== E2E Headlamp Teardown ==="
@@ -106,13 +106,11 @@ describe('MetricsPage', () => {
    vi.clearAllMocks();
  });

-  it('shows loader when ctxLoading=true but heading is visible immediately', () => {
+  it('shows loader when ctxLoading=true', () => {
    vi.mocked(useIntelGpuContext).mockReturnValue(makeContext({ loading: true }));
    // fetchGpuMetrics should never be called in loading state
    vi.mocked(fetchGpuMetrics).mockResolvedValue(null);
    render(<MetricsPage />);
-    // Heading renders immediately, loader appears below it while waiting for context
-    expect(screen.getByText('Intel GPU — Metrics')).toBeInTheDocument();
    expect(screen.getByTestId('loader')).toHaveTextContent('Loading Intel GPU data...');
  });

@@ -230,6 +230,10 @@ export default function MetricsPage() {
    };
  }, [ctxLoading, fetchSeq]);

+  if (ctxLoading) {
+    return <Loader title="Loading Intel GPU data..." />;
+  }
+
  return (
    <>
      <div
@@ -243,7 +247,7 @@ export default function MetricsPage() {
        <SectionHeader title="Intel GPU — Metrics" />
        <button
          onClick={() => void doFetch()}
-          disabled={fetching || ctxLoading}
+          disabled={fetching}
          aria-label="Refresh metrics"
          style={{
            padding: '6px 16px',
@@ -251,18 +255,15 @@ export default function MetricsPage() {
            color: 'var(--mui-palette-primary-main, #0071c5)',
            border: '1px solid var(--mui-palette-primary-main, #0071c5)',
            borderRadius: '4px',
-            cursor: fetching || ctxLoading ? 'not-allowed' : 'pointer',
+            cursor: 'pointer',
            fontSize: '13px',
            fontWeight: 500,
-            opacity: fetching || ctxLoading ? 0.6 : 1,
          }}
        >
          {fetching ? 'Refreshing…' : 'Refresh'}
        </button>
      </div>

-      {ctxLoading && <Loader title="Loading Intel GPU data..." />}
-
      <MetricRequirements />

      {fetching && !metrics && <Loader title="Querying Prometheus for GPU metrics..." />}
Author	SHA1	Message	Date
privilegedescalation-engineer	17a9aa165a	fix test: properly mock pod selector calls to resolve immediately The withTimeout test was failing because: 1. The mock made ALL ApiProxy.request calls hang, but the implementation has 4 sequential requests (1 CRD + 3 pod selectors) each wrapped in their own withTimeout 2. Using advanceTimersByTimeAsync with hanging promises causes act() to hang because flushPromises() waits for pending promises Fix: - Use mockReturnValueOnce for the CRD call (hanging) and mockResolvedValueOnce for each pod selector call (resolves immediately) - Use synchronous advanceTimersByTime() instead of async version - Simplified test flow: check loading=true initially, advance timers, then verify crdAvailable=false and loading=false Fixes PRI-1040	2026-03-25 09:03:03 +00:00
privilegedescalation-engineer	3e306b70f8	Merge remote changes and resolve conflict - keep QA-requested fix with never-resolving promise	2026-03-25 07:42:29 +00:00
privilegedescalation-engineer	3aa9c15e80	fix test: use never-resolving promise and fake timers for withTimeout The previous mock used mockRejectedValue which immediately rejects, so Promise.race resolved before withTimeout's setTimeout fired. Now we use new Promise(() => {}) to simulate a hanging request and advance timers to properly exercise the 2s timeout logic. Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-03-25 07:41:47 +00:00
privilegedescalation-engineer	957cf144a7	fix: reapply formatting after rebase Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-03-25 07:21:22 +00:00
privilegedescalation-engineer	52b1429ba0	fix: reformat withTimeout call and add unit test for timeout behavior - Reformat withTimeout call to single line (prettier) - Add unit test for CRD timeout behavior (crdAvailable=false when API fails) Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-03-25 07:20:31 +00:00
Gandalf the Greybeard	66575982af	fix: add request timeout wrapper to prevent E2E test hang Add withTimeout() helper that wraps ApiProxy.request calls with a 2s timeout. This prevents the plugin from hanging indefinitely when CRD requests fail or network issues occur in the E2E environment. Root cause: ApiProxy.request to non-existent CRDs would hang forever, causing the Loading Intel GPU data... progressbar to never resolve. Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-03-25 07:20:19 +00:00
privilegedescalation-engineer	66932958b1	fix: reformat withTimeout call and add unit test for timeout behavior - Reformat withTimeout call to single line (prettier) - Add unit test for CRD timeout behavior (crdAvailable=false when API fails) Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-03-25 07:18:19 +00:00
privilegedescalation-ceo[bot]	0d5f65176b	ci: re-trigger workflows after Actions approval setting change	2026-03-25 07:06:07 +00:00
Gandalf the Greybeard	5670c008e1	fix: add request timeout wrapper to prevent E2E test hang Add withTimeout() helper that wraps ApiProxy.request calls with a 2s timeout. This prevents the plugin from hanging indefinitely when CRD requests fail or network issues occur in the E2E environment. Root cause: ApiProxy.request to non-existent CRDs would hang forever, causing the Loading Intel GPU data... progressbar to never resolve. Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-03-25 05:57:15 +00:00
privilegedescalation-engineer	f9325772bd	fix(e2e): use specific regex for nodes page heading The /node/i regex was too broad and matched both the page heading 'Intel GPU — Nodes' and the empty state 'No GPU Nodes Found', causing a strict mode violation in Playwright. Use /intel gpu.*nodes/i to match only the actual page heading, which contains 'Intel GPU' before 'Nodes'.	2026-03-25 01:55:02 +00:00
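
The strict mode violation described in commit f9325772bd can be reproduced with plain regular expressions. This is a small sketch that models only the text matching: the two strings are the ones quoted in the commit message, and `matching` is a hypothetical helper, not part of Playwright. In Playwright itself, an assertion on a locator that resolves to more than one element fails in strict mode.

```typescript
// The two texts quoted in the commit message: the page heading and the empty state.
const headings: string[] = ['Intel GPU — Nodes', 'No GPU Nodes Found'];

// Hypothetical helper: return the texts that a name pattern would match.
function matching(pattern: RegExp, texts: string[]): string[] {
  return texts.filter(text => pattern.test(text));
}

// The broad pattern matches both texts, so the locator would be ambiguous.
const broad = matching(/node/i, headings);

// The specific pattern requires 'Intel GPU' before 'Nodes' and matches only the heading.
const specific = matching(/intel gpu.*nodes/i, headings);

console.log(broad.length); // 2
console.log(specific.length); // 1
```

The same reasoning applies to the other loosened selectors in the smoke tests: a pattern such as `/pod/i` or `/metric/i` stays safe only while no second heading on the page contains that word.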