Commit Graph

11 Commits

Author SHA1 Message Date
privilegedescalation-engineer 15ddba4f79 fix: add request timeout wrapper to prevent E2E test hang
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-25 11:26:01 +00:00
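
The commit above does not show its wrapper, so what follows is only a minimal sketch of one common way to bound a request, racing it against a timer. The name `withTimeout`, its parameters, and the error text are hypothetical and not taken from the repository.

```typescript
// Hypothetical helper: the commit does not show its wrapper, so the name
// `withTimeout` and its signature are assumptions made for illustration.
function withTimeout<T>(work: Promise<T>, ms: number, label = 'request'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // A promise that never resolves and rejects once the deadline passes.
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });

  // Whichever settles first wins. The timer is always cleared afterwards so a
  // finished request does not keep the event loop (or a test runner) alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Note that racing only stops the caller from waiting; it does not cancel the underlying request. If cancellation matters, an `AbortController` passed to the request would be the usual complement.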
Gandalf the Greybeard ff4a2810a5 fix: render heading immediately in MetricsPage, before ctxLoading resolves
The heading 'Intel GPU — Metrics' was blocked behind the ctxLoading check,
causing the E2E navigation test to time out when navigating directly to
/c/main/intel-gpu/metrics. The K8s.ResourceClasses.useList() hooks
in IntelGpuDataContext can take time to resolve when navigating directly
to the metrics route (as opposed to via sidebar), causing ctxLoading to
remain true beyond the 15s test timeout.

Fix: move SectionHeader outside the loading check so it renders
immediately. The Loader now appears below the heading while waiting
for context to load. Also disable the Refresh button during ctxLoading.

Updated unit test to verify heading is visible even when ctxLoading=true.

Fixes: headlamp-intel-gpu-plugin#42

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-25 06:18:45 +00:00
privilegedescalation-engineer[bot] 6cd159b5a4 test: add component test coverage for all untested files (#17)
* test: add component test coverage for all untested files

Adds 60 new tests (108 total) covering every untested module:
- IntelGpuDataContext: provider renders, loading/loaded states, CRD
  available/unavailable paths, refresh, useIntelGpuContext throws outside
  provider
- OverviewPage: loading, plugin-not-detected, error, populated, refresh
  button, CRD notice, device plugin table, plugin daemon pods, active pods
- NodesPage: loading, empty state, GPU node summary table, detail cards
- PodsPage: loading, empty state, summary counts, pending pod attention,
  all-pods table
- DevicePluginsPage: loading, CRD unavailable, no-plugins, plugin detail,
  daemon pod table
- NodeDetailSection: null for non-GPU nodes, GPU capacity/allocatable rows,
  pod list, loading state
- PodDetailSection: null for non-GPU pods, GPU resource rows, phase status,
  limits-only containers
- MetricsPage: context loading gate, Prometheus unreachable, empty chips,
  chip cards with power values, MetricRequirements always rendered, refresh

Also fixes vitest.config.mts to pin NODE_ENV=test so tests run correctly
without requiring callers to set it explicitly.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix: remove unused act import and merge duplicate metrics imports in MetricsPage.test.tsx

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix: cast useList mock return values to any in IntelGpuDataContext.test.tsx

The Headlamp useList() return type is an intersection of a tuple and
QueryListResponse, which plain array literals like [[], null] and
[null, null] do not satisfy. Cast all useList mockReturnValue arguments
to any so tsc passes without requiring full KubeObject stub objects.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: run Prettier formatting and ESLint lint:fix on test files

Addresses CI format:check failures and import-sort warning in
MetricsPage.test.tsx flagged by QA on PR #17.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Hugh Hackman <hugh@privilegedescalation.com>
Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Gandalf the Greybeard <gandalf@privilegedescalation.com>
Co-authored-by: Gandalf the Greybeard <gandalf@privilegedescalation.dev>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Gandalf the Greybeard <gandalf-the-greybeard[bot]@users.noreply.github.com>
2026-03-21 12:53:04 +00:00
gandalf-the-greybeard[bot] e5e681b415 fix: rename plugin from headlamp-intel-gpu to intel-gpu (#6)
Aligns naming convention across all plugins. Renames package, sidebar entries, routes, and documentation references.
2026-03-10 23:49:08 +00:00
gandalf-the-greybeard[bot] 231cb41d06 Rename plugin from intel-gpu to headlamp-intel-gpu
Artifact Hub listing was renamed with new repository ID
3c97f78a-26e3-4e8a-89e7-29884602e3d7. Updates package name,
sidebar entries, routes, archive URL, and documentation.

Refs: PRI-26

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 12:14:20 +00:00
DevContainer User 1ae6e2d355 release: v0.4.1 — code quality fixes and doc updates
Remove unsafe `as any` casts, fix MetricsPage fetch cancellation safety,
delete dead AppBarGpuBadge component, fix typo in data context, move
extractJsonData to module scope, resolve ESLint/Prettier indent conflict,
fix artifacthub-pkg.yml version mismatch and inaccurate description.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 13:05:58 +00:00
DevContainer User 488bf90abc fix: resolve eslint errors and apply formatting to match shared config
Auto-fix import ordering, quote style, and indentation via eslint --fix
and prettier --write. Remove unused variable in NodesPage and PodsPage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 11:50:29 +00:00
Chris Farhood cc0ad5b286 docs: document metric availability and requirements in MetricsPage
Add a file-level comment and in-page requirements section explaining
exactly what is and isn't available for each metric type:

  Power (W)       -- available on discrete GPU nodes via node-exporter
                     hwmon collector + i915 driver (no extra config)
  Frequency (MHz) -- NOT available; node-exporter --collector.drm is
                     AMD-only and does not read i915 gt_freq sysfs
  Utilization (%) -- NOT available; no standard Prometheus collector
                     supports i915 engine busy metrics
  iGPU nodes      -- no metrics at all (iGPU driver has no hwmon)

The in-page MetricRequirements component surfaces this information
directly in the UI so operators know what to expect and why.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-02-18 22:07:19 -05:00
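
The availability matrix in the commit above could be captured as data, which is roughly what an in-page requirements component would need to render. This is a sketch only: the type names, the `DISCRETE_GPU_METRICS` constant, and `availableMetrics` are hypothetical, while the values are transcribed from the commit message.

```typescript
type MetricKind = 'power' | 'frequency' | 'utilization';

interface MetricAvailability {
  available: boolean;
  unit: string;
  reason: string;
}

// Values transcribed from the commit message; the data shape is an assumption.
const DISCRETE_GPU_METRICS: Record<MetricKind, MetricAvailability> = {
  power: {
    available: true,
    unit: 'W',
    reason: 'node-exporter hwmon collector + i915 driver (no extra config)',
  },
  frequency: {
    available: false,
    unit: 'MHz',
    reason: 'node-exporter --collector.drm is AMD-only and does not read i915 gt_freq sysfs',
  },
  utilization: {
    available: false,
    unit: '%',
    reason: 'no standard Prometheus collector supports i915 engine busy metrics',
  },
};

// Returns the metric kinds an operator can expect on a given node type.
function availableMetrics(isDiscrete: boolean): MetricKind[] {
  // iGPU nodes have no hwmon interface, so nothing is available there.
  if (!isDiscrete) return [];
  const kinds = Object.keys(DISCRETE_GPU_METRICS) as MetricKind[];
  return kinds.filter(kind => DISCRETE_GPU_METRICS[kind].available);
}
```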
Chris Farhood 4b4e565a1a fix: switch Metrics page to Prometheus/node-exporter i915 hwmon source
The Intel GPU device plugin -enable-monitoring flag registers a monitoring
K8s resource type (not a Prometheus endpoint). Real GPU power metrics come
from node-exporter's hwmon collector which scrapes the i915 kernel driver.

- Rewrite src/api/metrics.ts: query kube-prometheus-stack Prometheus for
  node_hwmon_energy_joule_total (rate → watts), node_hwmon_power_max_watt
  (TDP), joined with node_hwmon_chip_names{chip_name="i915"} to identify
  GPU chips. Instance → node name resolved via node_uname_info.

- Rewrite src/components/MetricsPage.tsx: shows per-chip current power (W)
  with bar vs TDP, total fleet power summary, last-fetched timestamp.
  Auto-discovers Prometheus service in monitoring namespace.

- Update artifacthub-pkg.yml checksum for repackaged v0.2.0 tarball.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-02-18 21:37:16 -05:00
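
The queries described in the commit above might look roughly like the following. This is a sketch under assumptions, not the contents of src/api/metrics.ts: the function names, the default 5m window, and the choice of `instance` and `chip` as join labels are guesses that should be checked against the series in your own Prometheus.

```typescript
// Assumed label names: the hwmon series are taken to carry `instance` and
// `chip` labels, with node_hwmon_chip_names adding `chip_name`. Verify these
// against your cluster before relying on the join.
const I915_CHIPS = 'node_hwmon_chip_names{chip_name="i915"}';

// Current power draw: joules per second is watts, so rate() over the energy
// counter yields watts. Joining on the chip-name series keeps only i915 chips.
function powerWattsQuery(rangeWindow = '5m'): string {
  return (
    `rate(node_hwmon_energy_joule_total[${rangeWindow}])` +
    ` * on (instance, chip) group_left(chip_name) ${I915_CHIPS}`
  );
}

// Maximum power (used as TDP in the commit), filtered the same way.
function tdpWattsQuery(): string {
  return `node_hwmon_power_max_watt * on (instance, chip) group_left(chip_name) ${I915_CHIPS}`;
}

// Hypothetical helper showing the arithmetic behind "rate -> watts". The real
// rate() also handles counter resets and extrapolation, which this ignores.
function averageWatts(joulesStart: number, joulesEnd: number, seconds: number): number {
  if (seconds <= 0) throw new RangeError('seconds must be positive');
  return (joulesEnd - joulesStart) / seconds;
}
```

For example, a counter that moves from 100 J to 400 J over 60 s corresponds to an average of 5 W.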
Chris Farhood a226f0191c feat: add Metrics page, remove app bar badge, fix sidebar label
- Add src/api/metrics.ts: Prometheus text parser + fetchGpuPluginMetrics()
  fetching from Intel GPU device plugin pods (port 9090). Extracts engine
  utilization (active/total ticks → %), boost frequency (MHz), VRAM and
  system memory usage, cumulative energy (µJ).

- Add src/components/MetricsPage.tsx: per-card metrics display with inline
  utilization bars, graceful fallback when enableMonitoring is not set.

- Register Metrics sidebar entry (mdi:chart-line) and route /intel-gpu/metrics.

- Remove registerAppBarAction and AppBarGpuBadge (colored info bubble).

- Fix sidebar parent label: 'Intel GPU' → 'intel-gpu'.

- Bump to v0.2.0; update artifacthub-pkg.yml with new archive URL and checksum.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-02-18 21:23:36 -05:00
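
As an illustration of the "Prometheus text parser" and the ticks-to-percentage conversion mentioned in the commit above, here is a deliberately small sketch. It is not the repository's parser: `parsePrometheusText`, `utilizationPercent`, and the `Sample` shape are hypothetical, and the parser ignores timestamps, escaped characters in label values, and special values such as `+Inf`.

```typescript
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Minimal parser for the Prometheus text exposition format. Comment lines
// (# HELP, # TYPE) and blank lines are skipped; anything after the value
// (such as a timestamp) is ignored.
function parsePrometheusText(text: string): Sample[] {
  const samples: Sample[] = [];
  const lineRe = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/;
  for (const raw of text.split('\n')) {
    const line = raw.trim();
    if (line === '' || line.charAt(0) === '#') continue;
    const m = lineRe.exec(line);
    if (!m) continue;

    // Collect name="value" pairs from the label body, if there is one.
    const labels: Record<string, string> = {};
    const labelRe = /([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g;
    let lm: RegExpExecArray | null;
    while ((lm = labelRe.exec(m[2] || '')) !== null) {
      labels[lm[1]] = lm[2];
    }
    samples.push({ name: m[1], labels, value: Number(m[3]) });
  }
  return samples;
}

// Active ticks over total ticks, expressed as a percentage. A zero total is
// reported as 0% rather than dividing by zero.
function utilizationPercent(activeTicks: number, totalTicks: number): number {
  return totalTicks > 0 ? (activeTicks / totalTicks) * 100 : 0;
}
```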
Chris Farhood 41bf2aead4 feat: initial release of headlamp-intel-gpu-plugin v0.1.0
Adds a Headlamp plugin for Intel GPU device plugin visibility:

- Dedicated sidebar section: Overview, Device Plugins, GPU Nodes, GPU Pods
- Native Node detail page injection: GPU capacity, allocatable, utilization, active pods
- Native Pod detail page injection: per-container GPU resource requests/limits
- Native Nodes table: GPU Type and GPU Devices columns
- App bar health badge (hidden when plugin not installed)
- GpuDevicePlugin CRD monitoring (deviceplugin.intel.com/v1) with graceful
  degradation when CRD is not present
- Supports discrete (i915), Xe, and integrated GPU nodes via node labels
- 48 unit tests, TypeScript clean, 28 kB production bundle

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-02-18 17:58:49 -05:00
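
The per-container GPU request and limit rows mentioned in the initial release could be derived along these lines. This is a sketch, not the plugin's code: `gpuResourceRows` and the interfaces are hypothetical, and the `gpu.intel.com/` prefix is assumed to be how the Intel device plugin names its extended resources (for example `gpu.intel.com/i915`).

```typescript
// Assumed resource-name prefix for Intel GPU extended resources.
const GPU_RESOURCE_PREFIX = 'gpu.intel.com/';

// Simplified slice of a Kubernetes container spec; only the fields used here.
interface ContainerSpec {
  name: string;
  resources?: {
    requests?: Record<string, string>;
    limits?: Record<string, string>;
  };
}

interface GpuResourceRow {
  container: string;
  resource: string;
  request?: string;
  limit?: string;
}

// Produces one row per (container, GPU resource) pair. Containers that set
// only limits still produce a row, with `request` left undefined.
function gpuResourceRows(containers: ContainerSpec[]): GpuResourceRow[] {
  const rows: GpuResourceRow[] = [];
  for (const c of containers) {
    const requests = (c.resources && c.resources.requests) || {};
    const limits = (c.resources && c.resources.limits) || {};
    // Union of resource names from both maps, without duplicates.
    const names = Object.keys(limits).concat(
      Object.keys(requests).filter(name => !(name in limits))
    );
    for (const resource of names) {
      if (resource.indexOf(GPU_RESOURCE_PREFIX) !== 0) continue;
      rows.push({
        container: c.name,
        resource,
        request: requests[resource],
        limit: limits[resource],
      });
    }
  }
  return rows;
}
```

A pod with no matching rows can be treated as a non-GPU pod, in which case a detail section would render nothing.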