headlamp-intel-gpu-plugin

5 Commits 28 Branches 9 Tags

Author	SHA1	Message	Date
Chris Farhood	cc0ad5b286	docs: document metric availability and requirements in MetricsPage Add a file-level comment and in-page requirements section explaining exactly what is and isn't available for each metric type: Power (W) -- available on discrete GPU nodes via node-exporter hwmon collector + i915 driver (no extra config) Frequency (MHz) -- NOT available; node-exporter --collector.drm is AMD-only and does not read i915 gt_freq sysfs Utilization (%) -- NOT available; no standard Prometheus collector supports i915 engine busy metrics iGPU nodes -- no metrics at all (iGPU driver has no hwmon) The in-page MetricRequirements component surfaces this information directly in the UI so operators know what to expect and why. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>	2026-02-18 22:07:19 -05:00
Chris Farhood	4b4e565a1a	fix: switch Metrics page to Prometheus/node-exporter i915 hwmon source The Intel GPU device plugin -enable-monitoring flag registers a monitoring K8s resource type (not a Prometheus endpoint). Real GPU power metrics come from node-exporter's hwmon collector which scrapes the i915 kernel driver. - Rewrite src/api/metrics.ts: query kube-prometheus-stack Prometheus for node_hwmon_energy_joule_total (rate → watts), node_hwmon_power_max_watt (TDP), joined with node_hwmon_chip_names{chip_name="i915"} to identify GPU chips. Instance → node name resolved via node_uname_info. - Rewrite src/components/MetricsPage.tsx: shows per-chip current power (W) with bar vs TDP, total fleet power summary, last-fetched timestamp. Auto-discovers Prometheus service in monitoring namespace. - Update artifacthub-pkg.yml checksum for repackaged v0.2.0 tarball. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>	2026-02-18 21:37:16 -05:00
Chris Farhood	a226f0191c	feat: add Metrics page, remove app bar badge, fix sidebar label - Add src/api/metrics.ts: Prometheus text parser + fetchGpuPluginMetrics() fetching from Intel GPU device plugin pods (port 9090). Extracts engine utilization (active/total ticks → %), boost frequency (MHz), VRAM and system memory usage, cumulative energy (µJ). - Add src/components/MetricsPage.tsx: per-card metrics display with inline utilization bars, graceful fallback when enableMonitoring is not set. - Register Metrics sidebar entry (mdi:chart-line) and route /intel-gpu/metrics. - Remove registerAppBarAction and AppBarGpuBadge (colored info bubble). - Fix sidebar parent label: 'Intel GPU' → 'intel-gpu'. - Bump to v0.2.0; update artifacthub-pkg.yml with new archive URL and checksum. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> v0.2.0	2026-02-18 21:23:36 -05:00
Chris Farhood	3c045e54be	chore: add Artifact Hub metadata for v0.1.0 Adds artifacthub-pkg.yml and artifacthub-repo.yml required for Artifact Hub indexing. Includes archive URL and sha256 checksum pointing to the v0.1.0 GitHub release tarball. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> v0.1.0	2026-02-18 19:36:15 -05:00
Chris Farhood	41bf2aead4	feat: initial release of headlamp-intel-gpu-plugin v0.1.0 Adds a Headlamp plugin for Intel GPU device plugin visibility: - Dedicated sidebar section: Overview, Device Plugins, GPU Nodes, GPU Pods - Native Node detail page injection: GPU capacity, allocatable, utilization, active pods - Native Pod detail page injection: per-container GPU resource requests/limits - Native Nodes table: GPU Type and GPU Devices columns - App bar health badge (hidden when plugin not installed) - GpuDevicePlugin CRD monitoring (deviceplugin.intel.com/v1) with graceful degradation when CRD is not present - Supports discrete (i915), Xe, and integrated GPU nodes via node labels - 48 unit tests, TypeScript clean, 28 kB production bundle Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>	2026-02-18 17:58:49 -05:00