feat: native Headlamp integration, TrueNAS API, docs, and CI for v0.2.0

Native Headlamp integrations:
- registerResourceTableColumnsProcessor: add Protocol/Pool/Server columns to native StorageClass table and Protocol/Volume Handle to PV table
- registerDetailsViewSection: inject TNS-CSI section into PV detail pages
- registerDetailsViewSection: inject driver role/status into tns-csi Pod pages
- registerDetailsViewHeaderAction: Benchmark shortcut on StorageClass detail
- registerAppBarAction: driver health badge (N/Nc M/Mn, color-coded)
- Trim sidebar from 6 → 4 entries (Overview, Snapshots, Metrics, Benchmark)

TrueNAS API integration:
- src/api/truenas.ts: ConfigStore-backed settings, WebSocket JSON-RPC client for pool.query (auth.login_with_api_key + pool.query)
- src/components/TnsCsiSettings.tsx: API key + server override settings UI with connection test button
- TnsCsiDataContext: fetch real pool stats (size/allocated/free/status)
- OverviewPage: three-tier pool capacity display (real data → error → metrics fallback)

Documentation:
- README, CHANGELOG, CONTRIBUTING, SECURITY
- docs/: architecture, deployment (Helm), getting-started, user-guide, troubleshooting

CI:
- .github/workflows/ci.yaml: lint + type-check + test on PR/push
- .github/workflows/release.yaml: workflow_dispatch versioned release

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-02-18 16:37:56 -05:00
parent f2f3c3a87e
commit f1feb5c2f7
30 changed files with 3540 additions and 44 deletions
@@ -0,0 +1,112 @@
+# Troubleshooting
+
+## Quick Diagnosis
+
+| Symptom | Likely Cause | Fix |
+| ------- | ------------ | --- |
+| **Plugin not in sidebar** | Not installed or browser cache | Confirm the plugin is installed, then hard refresh (Cmd+Shift+R / Ctrl+Shift+R) |
+| **"TrueNAS (tns-csi)" missing from sidebar** | Plugin not loaded | Check Headlamp plugin manager or restart Headlamp pod |
+| **No StorageClasses listed** | Wrong provisioner or driver not installed | See [Driver Detection](#driver-detection) |
+| **Driver status "Not installed"** | CSIDriver object missing | `kubectl get csidriver tns.csi.io` |
+| **Protocol/Pool/Server showing "—"** | StorageClass missing parameters | `kubectl get sc <name> -o yaml` to inspect |
+| **403 on any page** | Missing RBAC | See [RBAC Issues](rbac.md) |
+| **Metrics page empty** | Controller pod unreachable or no metrics | See [Metrics Issues](metrics.md) |
+| **Snapshots tab: "CRD not available"** | Snapshot CRD not installed | Install `snapshot.storage.k8s.io` CRDs |
+| **Snapshots tab empty (no message)** | No snapshots or wrong snapshot class | Check VolumeSnapshotClass driver field |
+| **Benchmark fails immediately** | Missing RBAC for Jobs/PVCs | See [Benchmark Issues](benchmark.md) |
+| **Benchmark stuck in "Running"** | kbench pod not starting | `kubectl get pods -n <ns> -l app.kubernetes.io/managed-by=headlamp-tns-csi-plugin` |
+| **Page loads but data is stale** | Watch connection dropped | Click the Refresh button or reload the page |
+
+## Driver Detection
+
+The plugin detects the tns-csi driver by querying:
+
+```
+GET /apis/storage.k8s.io/v1/csidrivers/tns.csi.io
+```
+
+If this returns 404, the driver shows as "Not installed".
+
+**Check:**
+
+```bash
+kubectl get csidriver tns.csi.io
+```
+
+If missing, verify the tns-csi driver is deployed. The driver registers its CSIDriver object on startup.
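The detection described above can be reproduced from a terminal. The following is a minimal sketch, not the plugin's own code: `driver_status` is a hypothetical helper, and the `KUBECTL` variable is an assumption added so the command can be swapped out or stubbed. It uses `kubectl get --raw` to request the same API path and treats a `NotFound` error as "Not installed".

```bash
# Hypothetical helper: report the tns-csi driver status from the CSIDriver lookup.
# KUBECTL is an assumed override hook and defaults to the real kubectl binary.
driver_status() {
  local kubectl="${KUBECTL:-kubectl}"
  local err
  # --raw issues a plain GET against the API path; only stderr is captured.
  if err=$("$kubectl" get --raw /apis/storage.k8s.io/v1/csidrivers/tns.csi.io 2>&1 >/dev/null); then
    echo "Installed"
  else
    case "$err" in
      *NotFound*) echo "Not installed" ;;   # the 404 case described above
      *)          echo "Unknown: $err" ;;   # e.g. 403 or a connection problem
    esac
  fi
}
```

Calling `driver_status` with no arguments runs against the current kubectl context. An "Unknown" result usually points at RBAC or connectivity rather than a missing driver.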
+
+## StorageClass Parameters Showing "—"
+
+StorageClass Protocol, Pool, and Server come from the StorageClass `parameters` field.
+
+**Check:**
+
+```bash
+# kubectl prints top-level keys alphabetically, so `parameters` appears before `provisioner`
+kubectl get sc -o yaml | grep -B5 "provisioner: tns.csi.io"
+```
+
+Expected output includes:
+
+```yaml
+parameters:
+  protocol: nfs
+  pool: tank/k8s
+  server: 192.168.1.1
+```
+
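+If `parameters` is absent, the StorageClass was created without them. The CSI driver documentation lists the parameters each protocol requires. Because StorageClass `parameters` cannot be changed after creation, delete and recreate the StorageClass to add them.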
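To check a single StorageClass for one parameter at a time, a small filter over the YAML is enough. This is a sketch under stated assumptions: `sc_param` is a hypothetical helper, it expects the YAML of one StorageClass on stdin (as produced by `kubectl get sc <name> -o yaml`), and it prints the same "—" placeholder the plugin displays when the key is absent. It does not strip quotes from quoted values.

```bash
# Hypothetical helper: print one StorageClass parameter from YAML on stdin,
# or the placeholder "—" when the key is missing.
sc_param() {
  awk -v key="$1" '
    /^parameters:/       { in_params = 1; next }   # enter the parameters block
    in_params && /^[^ ]/ { in_params = 0 }         # next top-level key ends the block
    in_params && $1 == (key ":") { print $2; found = 1 }
    END { if (!found) print "—" }
  '
}
```

For example, `kubectl get sc <name> -o yaml | sc_param pool` prints the pool name, or "—" if the StorageClass has no `pool` parameter.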
+
+## Controller Pods Not Showing
+
+The Overview page shows controller and node pod counts using label selectors:
+
+- Controller: `app.kubernetes.io/name=tns-csi-driver,app.kubernetes.io/component=controller`
+- Node: `app.kubernetes.io/name=tns-csi-driver,app.kubernetes.io/component=node`
+
+**Check:**
+
+```bash
+kubectl get pods -n kube-system -l app.kubernetes.io/name=tns-csi-driver
+```
+
+If pods exist but aren't showing, add `--show-labels` to the command above and verify that `app.kubernetes.io/component` is set to `controller` or `node`.
+
+## Infinite Loading Spinner
+
+If a page shows a loading spinner indefinitely:
+
+1. **Check browser console** for errors (F12 → Console)
+2. **Check network tab** for failed API requests (look for 403, 404, 500)
+3. **Check Headlamp pod logs**: `kubectl logs -n kube-system -l app.kubernetes.io/name=headlamp`
+4. **Try refreshing** — the watch connection may have been interrupted
+
+## Common API Errors
+
+| HTTP Status | Meaning | Action |
+| ----------- | ------- | ------ |
+| `401 Unauthorized` | Token expired or invalid | Re-authenticate in Headlamp |
+| `403 Forbidden` | Missing RBAC permission | See [RBAC Issues](rbac.md) |
+| `404 Not Found` | Resource doesn't exist | Expected for optional resources (CSIDriver, snapshot CRD) |
+| `503 Service Unavailable` | API server overloaded | Wait and retry |
+
+## Getting More Information
+
+**Browser console:**
+
+```
+F12 → Console tab
+```
+
+Look for errors related to `tns-csi`, `headlamp-plugin`, or Kubernetes API paths.
+
+**Headlamp pod logs:**
+
+```bash
+kubectl logs -n kube-system -l app.kubernetes.io/name=headlamp --tail=100
+```
+
+**tns-csi controller logs:**
+
+```bash
+kubectl logs -n kube-system -l app.kubernetes.io/name=tns-csi-driver,app.kubernetes.io/component=controller --tail=100
+```
@@ -0,0 +1,93 @@
+# Benchmark Issues
+
+## Benchmark Fails to Start
+
+### Check RBAC
+
+The Benchmark page requires permissions to create and delete Jobs and PVCs:
+
+```bash
+kubectl auth can-i create jobs -n <benchmark-namespace> \
+  --as=system:serviceaccount:kube-system:headlamp
+
+kubectl auth can-i create persistentvolumeclaims -n <benchmark-namespace> \
+  --as=system:serviceaccount:kube-system:headlamp
+```
+
+Apply the additional permissions if missing — see [RBAC Issues](rbac.md) or [SECURITY.md](../../SECURITY.md).
+
+### Check the Target Namespace Exists
+
+The namespace you select in the Benchmark form must exist. Create it if needed:
+
+```bash
+kubectl create namespace <benchmark-namespace>
+```
+
+## Benchmark Stuck in "Running"
+
+### Check the kbench Pod
+
+```bash
+kubectl get pods -n <benchmark-namespace> \
+  -l app.kubernetes.io/managed-by=headlamp-tns-csi-plugin
+```
+
+Common states:
+
+| Pod State | Cause | Action |
+| --------- | ----- | ------ |
+| `Pending` | PVC not provisioned or scheduler issue | Check PVC status and StorageClass |
+| `ErrImagePull` / `ImagePullBackOff` | kbench image pull failure | Check image pull policy and network |
+| `Running` | Benchmark in progress | Wait for completion |
+| `Completed` | Finished — results should appear | Check FIO log section |
+| `Error` / `OOMKilled` | kbench exited with an error or ran out of memory | Check the pod logs; for `OOMKilled`, reduce test size or capacity |
+
+### Check the PVC
+
+```bash
+kubectl get pvc -n <benchmark-namespace> \
+  -l app.kubernetes.io/managed-by=headlamp-tns-csi-plugin
+```
+
+If the PVC is stuck in `Pending`, the StorageClass provisioner may not be able to create the volume:
+
+```bash
+kubectl describe pvc -n <benchmark-namespace> <pvc-name>
+```
+
+Look for events at the bottom of the describe output.
+
+### View kbench Logs Directly
+
+```bash
+kubectl logs -n <benchmark-namespace> \
+  -l app.kubernetes.io/managed-by=headlamp-tns-csi-plugin \
+  --tail=100
+```
+
+## Leftover Resources After Failed Benchmark
+
+If the benchmark was stopped or the plugin page was closed during a run, the Job and PVC may not have been cleaned up:
+
+```bash
+# List leftover resources
+kubectl get jobs,pvc -n <benchmark-namespace> \
+  -l app.kubernetes.io/managed-by=headlamp-tns-csi-plugin
+
+# Clean up manually
+kubectl delete jobs,pvc -n <benchmark-namespace> \
+  -l app.kubernetes.io/managed-by=headlamp-tns-csi-plugin
+```
+
+The plugin adds the `app.kubernetes.io/managed-by=headlamp-tns-csi-plugin` label to all benchmark resources precisely to enable safe cleanup with this label selector.
+
+## No Results Shown After Benchmark Completes
+
+The plugin parses the FIO log output from the kbench pod. If results don't appear:
+
+1. Check the pod completed successfully (status `Completed`, exit code 0)
+2. View the raw log: `kubectl logs -n <ns> <kbench-pod>`
+3. Look for the FIO result section — it should contain lines like `READ: bw=...` or `WRITE: bw=...`
+
+If the kbench version produces output in a different format, the FIO log parser may not recognize it. Open a [GitHub Issue](https://github.com/privilegedescalation/headlamp-tns-csi-plugin/issues) with a sample of the log output.
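Before filing an issue, it can help to confirm whether the log contains summary lines of the shape quoted above. The following is a minimal sketch and not the plugin's parser: `fio_bandwidth` is a hypothetical helper that reads a log on stdin and prints the direction and the first `bw=` value of each `READ:` or `WRITE:` line.

```bash
# Hypothetical helper: print "<DIRECTION> <bandwidth>" for each FIO summary
# line (READ: bw=... / WRITE: bw=...) found on stdin. Prints nothing if none match.
fio_bandwidth() {
  sed -nE 's/^ *(READ|WRITE): bw=([^ ,]*).*/\1 \2/p'
}
```

Usage: `kubectl logs -n <ns> <kbench-pod> | fio_bandwidth`. Empty output suggests the log format differs from what this check expects, which is useful detail to include in the issue.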
@@ -0,0 +1,68 @@
+# Metrics Issues
+
+## Metrics Page Shows No Data
+
+### 1. Check the Controller Pod Is Running
+
+```bash
+kubectl get pods -n kube-system \
+  -l app.kubernetes.io/name=tns-csi-driver,app.kubernetes.io/component=controller
+```
+
+The controller pod must be in `Running` state with all containers ready.
+
+### 2. Verify Port 8080 Is Exposed
+
+```bash
+# Check the pod spec for port 8080
+kubectl get pod -n kube-system <controller-pod-name> -o yaml | grep -A5 "ports:"
+```
+
+If port 8080 is not declared, the tns-csi driver version you're running may not expose Prometheus metrics. Check the driver documentation.
+
+### 3. Test the Metrics Endpoint Directly
+
+```bash
+# Port-forward the controller pod
+kubectl port-forward -n kube-system \
+  $(kubectl get pods -n kube-system -l app.kubernetes.io/name=tns-csi-driver,app.kubernetes.io/component=controller -o name | head -1) \
+  8080:8080
+
+# In another terminal
+curl http://localhost:8080/metrics | head -20
+```
+
+If this returns Prometheus text format output, the endpoint is working. If it returns 404 or connection refused, the controller isn't exposing metrics.
+
+### 4. Check RBAC for Pod Proxy
+
+The plugin accesses metrics via the Kubernetes pod proxy sub-resource:
+
+```
+GET /api/v1/namespaces/kube-system/pods/<pod>/proxy/metrics
+```
+
+This requires `get` on `pods/proxy` in `kube-system`:
+
+```bash
+kubectl auth can-i get pods/proxy \
+  -n kube-system \
+  --as=system:serviceaccount:kube-system:headlamp
+```
+
+### 5. Network Policies
+
+If `kube-system` has NetworkPolicies, ensure the Kubernetes API server can reach the controller pod on port 8080. The pod proxy hop is performed by the API server, not by Headlamp directly.
+
+## Metrics Show Stale Values
+
+The Metrics page fetches data on-demand when the page loads. Click **Refresh** to re-fetch the latest metrics from the controller pod.
+
+## Some Metric Cards Show "—"
+
+Not all tns-csi driver versions expose all metrics. The plugin shows placeholder "—" values for metrics that are absent from the Prometheus output. This is expected behavior.
+
+The plugin specifically looks for:
+- `kubelet_volume_stats_*` metrics (volume capacity and usage)
+- `csi_operations_seconds_*` metrics (CSI operation latency)
+- Any tns-csi specific metrics on port 8080
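To see which of these families the controller actually exposes, the Prometheus text output can be summarized with a short filter. This is a sketch, not the plugin's logic: `metric_family_counts` is a hypothetical helper that reads text-format metrics on stdin and counts sample lines for the two named prefixes.

```bash
# Hypothetical helper: count sample lines per metric family of interest in
# Prometheus text-format input on stdin.
metric_family_counts() {
  awk '
    /^#/ || NF == 0            { next }        # skip HELP/TYPE comments and blank lines
    /^kubelet_volume_stats_/   { kubelet++ }
    /^csi_operations_seconds_/ { csi++ }
    END {
      printf "kubelet_volume_stats %d\n", kubelet
      printf "csi_operations_seconds %d\n", csi
    }
  '
}
```

With the port-forward from step 3 running, `curl -s http://localhost:8080/metrics | metric_family_counts` prints one count per family. A count of 0 would be consistent with the corresponding cards showing "—".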
@@ -0,0 +1,64 @@
+# RBAC Issues
+
+## 403 Forbidden Errors
+
+A 403 error means the identity making the API request (Headlamp's service account or the logged-in user's token) lacks the required permission.
+
+### Diagnosing Which Permission Is Missing
+
+Use `kubectl auth can-i` to check specific permissions:
+
+```bash
+# Check if the Headlamp service account can list StorageClasses
+kubectl auth can-i list storageclasses \
+  --as=system:serviceaccount:kube-system:headlamp
+
+# Check pod proxy access (for metrics)
+kubectl auth can-i get pods/proxy \
+  -n kube-system \
+  --as=system:serviceaccount:kube-system:headlamp
+
+# Check snapshot access
+kubectl auth can-i list volumesnapshots \
+  --as=system:serviceaccount:kube-system:headlamp
+```
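The individual checks above can be batched. The sketch below is an illustration under stated assumptions: `check_permissions` is a hypothetical helper, `KUBECTL` is an assumed override hook, and the service account name is the one used in the commands above. It reads `verb resource [namespace]` lines on stdin and prints the `kubectl auth can-i` answer for each.

```bash
# Hypothetical helper: run `kubectl auth can-i` for each "verb resource [namespace]"
# line on stdin and print "<yes|no><TAB><verb> <resource>".
check_permissions() {
  local kubectl="${KUBECTL:-kubectl}"
  local sa="system:serviceaccount:kube-system:headlamp"
  local verb resource ns answer
  while read -r verb resource ns; do
    [ -n "$verb" ] || continue                      # ignore blank lines
    if [ -n "$ns" ]; then
      answer=$("$kubectl" auth can-i "$verb" "$resource" -n "$ns" --as="$sa" 2>/dev/null </dev/null) || true
    else
      answer=$("$kubectl" auth can-i "$verb" "$resource" --as="$sa" 2>/dev/null </dev/null) || true
    fi
    printf '%s\t%s %s\n' "${answer:-no}" "$verb" "$resource"
  done
}
```

For example, `printf 'list storageclasses\nget pods/proxy kube-system\nlist volumesnapshots\n' | check_permissions` reproduces the three checks in one pass. Any line reporting `no` identifies a rule to add.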
+
+### Applying the Required RBAC
+
+See [RBAC Permissions](../user-guide/rbac.md) for the complete ClusterRole manifest.
+
+Quick apply:
+
+```bash
+kubectl apply -f https://raw.githubusercontent.com/privilegedescalation/headlamp-tns-csi-plugin/main/docs/user-guide/rbac-manifest.yaml
+```
+
+Or manually apply the ClusterRole and ClusterRoleBinding from [SECURITY.md](../../SECURITY.md).
+
+### OIDC Token Mode
+
+If Headlamp is configured for OIDC authentication, each user's own token is used for API requests. The RoleBinding or ClusterRoleBinding must then name the user's identity (for example, an email or group) rather than the service account:
+
+```yaml
+subjects:
+  - kind: Group
+    name: "engineering"
+    apiGroup: rbac.authorization.k8s.io
+```
+
+Users not in the group will see 403 errors in the plugin.
+
+### Benchmark 403
+
+The Benchmark page requires additional write permissions:
+
+```yaml
+- apiGroups: ["batch"]
+  resources: ["jobs"]
+  verbs: ["get", "list", "watch", "create", "delete"]
+- apiGroups: [""]
+  resources: ["persistentvolumeclaims"]
+  verbs: ["create", "delete"]
+```
+
+If only the Benchmark page shows 403, add these rules to your ClusterRole (or a separate Role scoped to the benchmark namespace).