Commit Graph

8 Commits

Author SHA1 Message Date
privilegedescalation-engineer[bot] b81f25ad74 fix: combine E2E infrastructure fixes (selectors + metrics heading + timeout) (#45)
QA + CTO approved. CI + E2E passing. E2E test fix PR — UAT via automated suite. Merged by CEO.
2026-04-11 14:05:48 +00:00
privilegedescalation-engineer[bot] 6cd159b5a4 test: add component test coverage for all untested files (#17)
* test: add component test coverage for all untested files

Adds 60 new tests (108 total) covering every untested module:
- IntelGpuDataContext: provider renders, loading/loaded states, CRD
  available/unavailable paths, refresh, useIntelGpuContext throws outside
  provider
- OverviewPage: loading, plugin-not-detected, error, populated, refresh
  button, CRD notice, device plugin table, plugin daemon pods, active pods
- NodesPage: loading, empty state, GPU node summary table, detail cards
- PodsPage: loading, empty state, summary counts, pending pod attention,
  all-pods table
- DevicePluginsPage: loading, CRD unavailable, no-plugins, plugin detail,
  daemon pod table
- NodeDetailSection: null for non-GPU nodes, GPU capacity/allocatable rows,
  pod list, loading state
- PodDetailSection: null for non-GPU pods, GPU resource rows, phase status,
  limits-only containers
- MetricsPage: context loading gate, Prometheus unreachable, empty chips,
  chip cards with power values, MetricRequirements always rendered, refresh

Also fixes vitest.config.mts to pin NODE_ENV=test so tests run correctly
without requiring callers to set it explicitly.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix: remove unused act import and merge duplicate metrics imports in MetricsPage.test.tsx

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix: cast useList mock return values to any in IntelGpuDataContext.test.tsx

The Headlamp useList() return type is an intersection of a tuple and
QueryListResponse, which plain array literals like [[], null] and
[null, null] do not satisfy. Cast all useList mockReturnValue arguments
to any so tsc passes without requiring full KubeObject stub objects.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: run Prettier formatting and ESLint lint:fix on test files

Addresses CI format:check failures and import-sort warning in
MetricsPage.test.tsx flagged by QA on PR #17.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Hugh Hackman <hugh@privilegedescalation.com>
Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Gandalf the Greybeard <gandalf@privilegedescalation.com>
Co-authored-by: Gandalf the Greybeard <gandalf@privilegedescalation.dev>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Gandalf the Greybeard <gandalf-the-greybeard[bot]@users.noreply.github.com>
2026-03-21 12:53:04 +00:00
DevContainer User 1ae6e2d355 release: v0.4.1 — code quality fixes and doc updates
Remove unsafe `as any` casts, fix MetricsPage fetch cancellation safety,
delete dead AppBarGpuBadge component, fix typo in data context, move
extractJsonData to module scope, resolve ESLint/Prettier indent conflict,
fix artifacthub-pkg.yml version mismatch and inaccurate description.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 13:05:58 +00:00
DevContainer User 488bf90abc fix: resolve eslint errors and apply formatting to match shared config
Auto-fix import ordering, quote style, and indentation via eslint --fix
and prettier --write. Remove unused variable in NodesPage and PodsPage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 11:50:29 +00:00
Chris Farhood cc0ad5b286 docs: document metric availability and requirements in MetricsPage
Add a file-level comment and in-page requirements section explaining
exactly what is and isn't available for each metric type:

  Power (W)       -- available on discrete GPU nodes via node-exporter
                     hwmon collector + i915 driver (no extra config)
  Frequency (MHz) -- NOT available; node-exporter --collector.drm is
                     AMD-only and does not read i915 gt_freq sysfs
  Utilization (%) -- NOT available; no standard Prometheus collector
                     supports i915 engine busy metrics
  iGPU nodes      -- no metrics at all (iGPU driver has no hwmon)

The in-page MetricRequirements component surfaces this information
directly in the UI so operators know what to expect and why.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-02-18 22:07:19 -05:00
Chris Farhood 4b4e565a1a fix: switch Metrics page to Prometheus/node-exporter i915 hwmon source
The Intel GPU device plugin -enable-monitoring flag registers a monitoring
K8s resource type (not a Prometheus endpoint). Real GPU power metrics come
from node-exporter's hwmon collector which scrapes the i915 kernel driver.

- Rewrite src/api/metrics.ts: query kube-prometheus-stack Prometheus for
  node_hwmon_energy_joule_total (rate → watts), node_hwmon_power_max_watt
  (TDP), joined with node_hwmon_chip_names{chip_name="i915"} to identify
  GPU chips. Instance → node name resolved via node_uname_info.

- Rewrite src/components/MetricsPage.tsx: shows per-chip current power (W)
  with bar vs TDP, total fleet power summary, last-fetched timestamp.
  Auto-discovers Prometheus service in monitoring namespace.

- Update artifacthub-pkg.yml checksum for repackaged v0.2.0 tarball.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-02-18 21:37:16 -05:00
Chris Farhood a226f0191c feat: add Metrics page, remove app bar badge, fix sidebar label
- Add src/api/metrics.ts: Prometheus text parser + fetchGpuPluginMetrics()
  fetching from Intel GPU device plugin pods (port 9090). Extracts engine
  utilization (active/total ticks → %), boost frequency (MHz), VRAM and
  system memory usage, cumulative energy (µJ).

- Add src/components/MetricsPage.tsx: per-card metrics display with inline
  utilization bars, graceful fallback when enableMonitoring is not set.

- Register Metrics sidebar entry (mdi:chart-line) and route /intel-gpu/metrics.

- Remove registerAppBarAction and AppBarGpuBadge (colored info bubble).

- Fix sidebar parent label: 'Intel GPU' → 'intel-gpu'.

- Bump to v0.2.0; update artifacthub-pkg.yml with new archive URL and checksum.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-02-18 21:23:36 -05:00
Chris Farhood 41bf2aead4 feat: initial release of headlamp-intel-gpu-plugin v0.1.0
Adds a Headlamp plugin for Intel GPU device plugin visibility:

- Dedicated sidebar section: Overview, Device Plugins, GPU Nodes, GPU Pods
- Native Node detail page injection: GPU capacity, allocatable, utilization, active pods
- Native Pod detail page injection: per-container GPU resource requests/limits
- Native Nodes table: GPU Type and GPU Devices columns
- App bar health badge (hidden when plugin not installed)
- GpuDevicePlugin CRD monitoring (deviceplugin.intel.com/v1) with graceful
  degradation when CRD is not present
- Supports discrete (i915), Xe, and integrated GPU nodes via node labels
- 48 unit tests, TypeScript clean, 28 kB production bundle

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-02-18 17:58:49 -05:00