headlamp-intel-gpu-plugin

privilegedescalation/headlamp-intel-gpu-plugin

Archived

Author	SHA1	Message	Date
privilegedescalation-engineer	15ddba4f79	fix: add request timeout wrapper to prevent E2E test hang Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-03-25 11:26:01 +00:00
Gandalf the Greybeard	ff4a2810a5	fix: render heading immediately in MetricsPage, before ctxLoading resolves The heading 'Intel GPU — Metrics' was blocked behind the ctxLoading check, causing the E2E navigation test to timeout when navigating directly to /c/main/intel-gpu/metrics. The K8s.ResourceClasses.useList() hooks in IntelGpuDataContext can take time to resolve when navigating directly to the metrics route (as opposed to via sidebar), causing ctxLoading to remain true beyond the 15s test timeout. Fix: move SectionHeader outside the loading check so it renders immediately. The Loader now appears below the heading while waiting for context to load. Also disable the Refresh button during ctxLoading. Updated unit test to verify heading is visible even when ctxLoading=true. Fixes: headlamp-intel-gpu-plugin#42 Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-03-25 06:18:45 +00:00
privilegedescalation-engineer[bot]	6cd159b5a4	test: add component test coverage for all untested files (#17) * test: add component test coverage for all untested files Adds 60 new tests (108 total) covering every untested module: - IntelGpuDataContext: provider renders, loading/loaded states, CRD available/unavailable paths, refresh, useIntelGpuContext throws outside provider - OverviewPage: loading, plugin-not-detected, error, populated, refresh button, CRD notice, device plugin table, plugin daemon pods, active pods - NodesPage: loading, empty state, GPU node summary table, detail cards - PodsPage: loading, empty state, summary counts, pending pod attention, all-pods table - DevicePluginsPage: loading, CRD unavailable, no-plugins, plugin detail, daemon pod table - NodeDetailSection: null for non-GPU nodes, GPU capacity/allocatable rows, pod list, loading state - PodDetailSection: null for non-GPU pods, GPU resource rows, phase status, limits-only containers - MetricsPage: context loading gate, Prometheus unreachable, empty chips, chip cards with power values, MetricRequirements always rendered, refresh Also fixes vitest.config.mts to pin NODE_ENV=test so tests run correctly without requiring callers to set it explicitly. Co-Authored-By: Paperclip <noreply@paperclip.ing> * fix: remove unused act import and merge duplicate metrics imports in MetricsPage.test.tsx Co-Authored-By: Paperclip <noreply@paperclip.ing> * fix: cast useList mock return values to any in IntelGpuDataContext.test.tsx The Headlamp useList() return type is an intersection of a tuple and QueryListResponse, which plain array literals like [[], null] and [null, null] do not satisfy. Cast all useList mockReturnValue arguments to any so tsc passes without requiring full KubeObject stub objects. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * style: run Prettier formatting and ESLint lint:fix on test files Addresses CI format:check failures and import-sort warning in MetricsPage.test.tsx flagged by QA on PR #17. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Hugh Hackman <hugh@privilegedescalation.com> Co-authored-by: Paperclip <noreply@paperclip.ing> Co-authored-by: Gandalf the Greybeard <gandalf@privilegedescalation.com> Co-authored-by: Gandalf the Greybeard <gandalf@privilegedescalation.dev> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Gandalf the Greybeard <gandalf-the-greybeard[bot]@users.noreply.github.com>	2026-03-21 12:53:04 +00:00
gandalf-the-greybeard[bot]	e5e681b415	fix: rename plugin from headlamp-intel-gpu to intel-gpu (#6) Aligns naming convention across all plugins. Renames package, sidebar entries, routes, and documentation references.	2026-03-10 23:49:08 +00:00
gandalf-the-greybeard[bot]	231cb41d06	Rename plugin from intel-gpu to headlamp-intel-gpu Artifact Hub listing was renamed with new repository ID 3c97f78a-26e3-4e8a-89e7-29884602e3d7. Updates package name, sidebar entries, routes, archive URL, and documentation. Refs: PRI-26 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 12:14:20 +00:00
DevContainer User	1ae6e2d355	release: v0.4.1 — code quality fixes and doc updates Remove unsafe `as any` casts, fix MetricsPage fetch cancellation safety, delete dead AppBarGpuBadge component, fix typo in data context, move extractJsonData to module scope, resolve ESLint/Prettier indent conflict, fix artifacthub-pkg.yml version mismatch and inaccurate description. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 13:05:58 +00:00
DevContainer User	488bf90abc	fix: resolve eslint errors and apply formatting to match shared config Auto-fix import ordering, quote style, and indentation via eslint --fix and prettier --write. Remove unused variable in NodesPage and PodsPage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 11:50:29 +00:00
Chris Farhood	cc0ad5b286	docs: document metric availability and requirements in MetricsPage Add a file-level comment and in-page requirements section explaining exactly what is and isn't available for each metric type: Power (W) -- available on discrete GPU nodes via node-exporter hwmon collector + i915 driver (no extra config) Frequency (MHz) -- NOT available; node-exporter --collector.drm is AMD-only and does not read i915 gt_freq sysfs Utilization (%) -- NOT available; no standard Prometheus collector supports i915 engine busy metrics iGPU nodes -- no metrics at all (iGPU driver has no hwmon) The in-page MetricRequirements component surfaces this information directly in the UI so operators know what to expect and why. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>	2026-02-18 22:07:19 -05:00
Chris Farhood	4b4e565a1a	fix: switch Metrics page to Prometheus/node-exporter i915 hwmon source The Intel GPU device plugin -enable-monitoring flag registers a monitoring K8s resource type (not a Prometheus endpoint). Real GPU power metrics come from node-exporter's hwmon collector which scrapes the i915 kernel driver. - Rewrite src/api/metrics.ts: query kube-prometheus-stack Prometheus for node_hwmon_energy_joule_total (rate → watts), node_hwmon_power_max_watt (TDP), joined with node_hwmon_chip_names{chip_name="i915"} to identify GPU chips. Instance → node name resolved via node_uname_info. - Rewrite src/components/MetricsPage.tsx: shows per-chip current power (W) with bar vs TDP, total fleet power summary, last-fetched timestamp. Auto-discovers Prometheus service in monitoring namespace. - Update artifacthub-pkg.yml checksum for repackaged v0.2.0 tarball. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>	2026-02-18 21:37:16 -05:00
Chris Farhood	a226f0191c	feat: add Metrics page, remove app bar badge, fix sidebar label - Add src/api/metrics.ts: Prometheus text parser + fetchGpuPluginMetrics() fetching from Intel GPU device plugin pods (port 9090). Extracts engine utilization (active/total ticks → %), boost frequency (MHz), VRAM and system memory usage, cumulative energy (µJ). - Add src/components/MetricsPage.tsx: per-card metrics display with inline utilization bars, graceful fallback when enableMonitoring is not set. - Register Metrics sidebar entry (mdi:chart-line) and route /intel-gpu/metrics. - Remove registerAppBarAction and AppBarGpuBadge (colored info bubble). - Fix sidebar parent label: 'Intel GPU' → 'intel-gpu'. - Bump to v0.2.0; update artifacthub-pkg.yml with new archive URL and checksum. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>	2026-02-18 21:23:36 -05:00
Chris Farhood	41bf2aead4	feat: initial release of headlamp-intel-gpu-plugin v0.1.0 Adds a Headlamp plugin for Intel GPU device plugin visibility: - Dedicated sidebar section: Overview, Device Plugins, GPU Nodes, GPU Pods - Native Node detail page injection: GPU capacity, allocatable, utilization, active pods - Native Pod detail page injection: per-container GPU resource requests/limits - Native Nodes table: GPU Type and GPU Devices columns - App bar health badge (hidden when plugin not installed) - GpuDevicePlugin CRD monitoring (deviceplugin.intel.com/v1) with graceful degradation when CRD is not present - Supports discrete (i915), Xe, and integrated GPU nodes via node labels - 48 unit tests, TypeScript clean, 28 kB production bundle Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>	2026-02-18 17:58:49 -05:00
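
The Metrics commits in the log above describe deriving current power in watts from the rate of node_hwmon_energy_joule_total and showing it against node_hwmon_power_max_watt (TDP). The arithmetic behind that can be sketched as follows. This is only an illustration, assuming two counter samples are already available; the names EnergySample, averageWatts, and percentOfTdp are hypothetical and are not taken from src/api/metrics.ts.

```typescript
// Hypothetical types and helpers for illustration only; not the plugin's actual code.
interface EnergySample {
  joules: number; // cumulative energy counter value, in joules
  timestampSec: number; // sample time, in seconds
}

// Average power between two samples of a cumulative energy counter.
// Power (W) = delta energy (J) / delta time (s).
// Returns null when the samples are out of order or the counter went backwards (a reset).
function averageWatts(earlier: EnergySample, later: EnergySample): number | null {
  const dt = later.timestampSec - earlier.timestampSec;
  const dj = later.joules - earlier.joules;
  if (dt <= 0 || dj < 0) {
    return null;
  }
  return dj / dt;
}

// Share of the reported maximum power (TDP) currently drawn, capped at 100
// so that a bar display never overflows.
function percentOfTdp(watts: number, tdpWatts: number): number | null {
  if (tdpWatts <= 0) {
    return null;
  }
  return Math.min(100, (watts / tdpWatts) * 100);
}

// Usage: 600 J consumed over 60 s, against an assumed 40 W TDP.
const watts = averageWatts({ joules: 1000, timestampSec: 0 }, { joules: 1600, timestampSec: 60 });
const share = watts === null ? null : percentOfTdp(watts, 40);
console.log(watts, share);
```

In a Prometheus setup the division is normally done server-side by rate(); the sketch only makes the unit conversion explicit.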

11 Commits
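
The most recent commit (15ddba4f79) mentions a request timeout wrapper added to keep the E2E test from hanging. The log does not include its code, so the following is a generic sketch of that pattern rather than the repository's implementation; the name withTimeout and the timeout values are assumptions.

```typescript
// Hypothetical helper for illustration only; the commit log does not show the real wrapper.
// Rejects if the wrapped promise has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string = 'request'): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms} ms`));
    }, ms);
    promise.then(
      value => {
        clearTimeout(timer); // settled in time: cancel the pending timeout
        resolve(value);
      },
      error => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}

// Usage: a request that never settles is turned into a rejection instead of a hang.
const neverSettles = new Promise<string>(() => {});
withTimeout(neverSettles, 50, 'metrics query').catch((err: Error) => {
  console.log(err.message);
});
```

Note that this only stops the caller from waiting; it does not cancel the underlying request, which would need something like an AbortController passed to the fetch call.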