Add a file-level comment and in-page requirements section explaining
exactly what is and isn't available for each metric type:
Power (W) -- available on discrete GPU nodes via node-exporter
hwmon collector + i915 driver (no extra config)
Frequency (MHz) -- NOT available; node-exporter --collector.drm is
AMD-only and does not read i915 gt_freq sysfs
Utilization (%) -- NOT available; no standard Prometheus collector
supports i915 engine busy metrics
iGPU nodes -- no metrics at all (iGPU driver has no hwmon)
The in-page MetricRequirements component surfaces this information
directly in the UI so operators know what to expect and why.
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
The Intel GPU device plugin -enable-monitoring flag registers a monitoring
K8s resource type (not a Prometheus endpoint). Real GPU power metrics come
from node-exporter's hwmon collector which scrapes the i915 kernel driver.
- Rewrite src/api/metrics.ts: query kube-prometheus-stack Prometheus for
node_hwmon_energy_joule_total (rate → watts), node_hwmon_power_max_watt
(TDP), joined with node_hwmon_chip_names{chip_name="i915"} to identify
GPU chips. Instance → node name resolved via node_uname_info.
- Rewrite src/components/MetricsPage.tsx: shows per-chip current power (W)
with bar vs TDP, total fleet power summary, last-fetched timestamp.
Auto-discovers Prometheus service in monitoring namespace.
- Update artifacthub-pkg.yml checksum for repackaged v0.2.0 tarball.
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>