headlamp-polaris-plugin/docs/deployment/production.md
Chris Farhood 56d10a1d40 docs: update Headlamp install namespace from kube-system to headlamp
Updates documentation to reflect that Headlamp is installed in the
'headlamp' namespace (not 'kube-system'). Only documentation files
that reference the Headlamp install namespace are changed.

Changed files:
- docs/deployment/production.md: NetworkPolicy namespaceSelector
- docs/troubleshooting/network-problems.md: NetworkPolicy namespaceSelector
- docs/user-guide/rbac-permissions.md: NetworkPolicy namespaceSelector
- e2e/README.md: kubectl commands for local E2E testing

Files NOT changed (upstream workload namespace - out of scope per PRI-340):
- Source files, tests, or configs referencing where Polaris runs

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-05-08 11:07:50 +00:00

Production Deployment

Production deployment checklist, best practices, and security considerations for the Headlamp Polaris Plugin.

Pre-Deployment Checklist

Before deploying to production:

Infrastructure

  • Kubernetes cluster v1.24+ running
  • Polaris deployed in polaris namespace
  • Polaris dashboard service (polaris-dashboard:80) accessible
  • Headlamp v0.26+ deployed (v0.39+ recommended)
  • Ingress controller configured (if exposing externally)
  • TLS certificates provisioned (cert-manager recommended)
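
The minimum versions in this checklist can be compared mechanically. The following is a minimal sketch, not part of the plugin: `version_field` and `version_ge` are hypothetical helper names, and the comparison only considers the numeric major, minor, and patch fields (suffixes such as `+k3s1` are ignored).

```shell
# version_field VERSION N: print the Nth dotted field of VERSION as digits only.
# A leading "v" is dropped and missing fields default to 0.
version_field() {
  local padded field
  padded="${1#v}.0.0"
  field=$(printf '%s\n' "$padded" | cut -d. -f"$2" | sed 's/[^0-9].*$//')
  printf '%s\n' "${field:-0}"
}

# version_ge ACTUAL MINIMUM: succeed when ACTUAL is at least MINIMUM.
version_ge() {
  local i actual minimum
  for i in 1 2 3; do
    actual=$(version_field "$1" "$i")
    minimum=$(version_field "$2" "$i")
    if [ "$actual" -gt "$minimum" ]; then return 0; fi
    if [ "$actual" -lt "$minimum" ]; then return 1; fi
  done
  return 0
}

# Example use against a live cluster (requires kubectl and jq):
#   server=$(kubectl version -o json | jq -r .serverVersion.gitVersion)
#   version_ge "$server" 1.24 || echo "Kubernetes $server is older than v1.24"
```

The functions have no side effects, so they can be sourced into a larger pre-flight script.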

Verification Commands

# Verify Polaris
kubectl -n polaris get pods
kubectl -n polaris get svc polaris-dashboard

# Test Polaris API
kubectl get --raw /api/v1/namespaces/polaris/services/polaris-dashboard:80/proxy/results.json | jq .PolarisOutputVersion

# Verify Headlamp
kubectl -n headlamp get deployment headlamp
kubectl -n headlamp get svc headlamp
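
For repeatable checks, the commands above can be wrapped so that each one reports a clear pass or fail. This is a sketch under the namespace and resource names used in this guide; `run_check` and `preflight` are hypothetical helper names, not something the plugin ships.

```shell
# run_check LABEL CMD...: run CMD quietly and report PASS or FAIL for LABEL.
run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    return 1
  fi
}

# preflight: run every check, then return non-zero if any of them failed.
preflight() {
  local failed=0
  run_check "polaris-dashboard service exists" \
    kubectl -n polaris get svc polaris-dashboard || failed=1
  run_check "results.json reachable via service proxy" \
    kubectl get --raw /api/v1/namespaces/polaris/services/polaris-dashboard:80/proxy/results.json || failed=1
  run_check "headlamp deployment exists" \
    kubectl -n headlamp get deployment headlamp || failed=1
  run_check "headlamp service exists" \
    kubectl -n headlamp get svc headlamp || failed=1
  return "$failed"
}

# Usage: source this file, then run `preflight` with a kubeconfig for the target cluster.
```

Running all checks before returning, rather than stopping at the first failure, gives a complete picture in a single pass.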

Production Checklist

Deployment

  • Plugin installed via Plugin Manager or sidecar init container
  • RBAC Role and RoleBinding applied
  • NetworkPolicies configured (if using strict network policies)
  • Headlamp pods running with 2+ replicas (high availability)
  • Resource limits and requests configured

Post-Deployment Verification

# 1. Verify Polaris API is accessible via service proxy
kubectl get --raw /api/v1/namespaces/polaris/services/polaris-dashboard:80/proxy/results.json | jq .PolarisOutputVersion
# Expected: "1.0" or similar

# 2. Verify RBAC permissions
kubectl auth can-i get services/polaris-dashboard \
  --subresource=proxy \
  --as=system:serviceaccount:headlamp:headlamp \
  -n polaris
# Expected: yes

# 3. Check Headlamp logs for plugin loading
kubectl -n headlamp logs deployment/headlamp | grep -i polaris
# Expected: No errors related to plugin loading

# 4. Verify plugin files exist
kubectl -n headlamp exec deployment/headlamp -c headlamp -- ls -la /headlamp/plugins/headlamp-polaris-plugin/
# Expected: dist/, package.json present

UI Verification

  • Navigate to Settings → Plugins
  • Verify "headlamp-polaris-plugin" is listed with correct version
  • Sidebar shows "Polaris" entry
  • Click Polaris → Overview - page loads successfully
  • Cluster score gauge displays
  • Namespaces table loads with data
  • App bar shows Polaris score badge
  • Click namespace - detail drawer opens
  • Test inline audit section on a Deployment/StatefulSet

Security Best Practices

RBAC

Principle of Least Privilege:

# ✅ GOOD: Scoped to specific service
rules:
  - apiGroups: [""]
    resources: ["services/proxy"]
    resourceNames: ["polaris-dashboard"]
    verbs: ["get"]

# ❌ BAD: Too broad
rules:
  - apiGroups: [""]
    resources: ["services/proxy"]
    verbs: ["get"]  # Allows proxy to ALL services
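
The fragments above show only the `rules` section. As a minimal sketch, the scoped rule can be wrapped in a complete Role manifest; the Role name `polaris-proxy-reader` is the one referenced by the RoleBinding in this guide, while `render_polaris_role` is a hypothetical helper name.

```shell
# render_polaris_role [NAMESPACE] [SERVICE]: print a Role that allows
# "get" on the service proxy subresource of one named service only.
render_polaris_role() {
  local namespace="${1:-polaris}" service="${2:-polaris-dashboard}"
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: polaris-proxy-reader
  namespace: ${namespace}
rules:
  - apiGroups: [""]
    resources: ["services/proxy"]
    resourceNames: ["${service}"]
    verbs: ["get"]
EOF
}

# Usage (review the output before applying):
#   render_polaris_role | kubectl apply -f -
```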

Token-Auth Mode:

When Headlamp authenticates with user-supplied tokens (for example via OIDC), the API server authorizes each proxy request as that user, so every user must be bound to the Role:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: authenticated-users-polaris-proxy
  namespace: polaris
subjects:
  - kind: Group
    name: system:authenticated # All authenticated users
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: polaris-proxy-reader
  apiGroup: rbac.authorization.k8s.io

For fine-grained control, bind specific users or groups:

subjects:
  - kind: Group
    name: sre-team # Only SRE team
    apiGroup: rbac.authorization.k8s.io
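
When several teams need access, one RoleBinding per group keeps the grants explicit. The sketch below generates such a binding for a given group name; `render_polaris_binding` and the `<group>-polaris-proxy` naming scheme are assumptions made for illustration.

```shell
# render_polaris_binding GROUP: print a RoleBinding in the polaris namespace
# that binds GROUP to the polaris-proxy-reader Role.
render_polaris_binding() {
  local group="${1:-}"
  if [ -z "$group" ]; then
    echo "usage: render_polaris_binding GROUP" >&2
    return 2
  fi
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${group}-polaris-proxy
  namespace: polaris
subjects:
  - kind: Group
    name: ${group}
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: polaris-proxy-reader
  apiGroup: rbac.authorization.k8s.io
EOF
}

# Usage (review the output before applying):
#   render_polaris_binding sre-team | kubectl apply -f -
```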

Network Policies

If using strict NetworkPolicies:

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-apiserver-to-polaris
  namespace: polaris
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: polaris
      app.kubernetes.io/component: dashboard
  policyTypes:
    - Ingress
  ingress:
    # Allow from API server (performs the proxy hop)
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: headlamp
        - podSelector:
            matchLabels:
              component: kube-apiserver
      ports:
        - protocol: TCP
          port: 80

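Note: The API server performs the proxy hop, so traffic reaches Polaris from the API server, not from the Headlamp pod. The two entries under from are ORed, and a podSelector without a namespaceSelector only matches pods in the policy's own namespace (polaris). Adjust the selectors to match where your API server runs; if it runs on the host network, an ipBlock rule may be needed instead, depending on the CNI plugin.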

Audit Logging

Kubernetes audit logs record every service proxy request:

  • What's logged: User/service account, timestamp, response code
  • Volume: Auto-refresh interval affects audit log volume
  • Recommendation: Configure the audit policy level if log volume is a concern

# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata # Log metadata only (not full request/response)
    verbs: ['get']
    resources:
      - group: ''
        resources: ['services/proxy']
    namespaces: ['polaris']
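
To see how the auto-refresh interval drives audit log volume, a rough estimate helps. The sketch below assumes one proxied GET per refresh per open browser session and one audit event per request; the real count depends on which audit stages the policy records. `proxy_requests_per_day` is a hypothetical helper name.

```shell
# proxy_requests_per_day SESSIONS INTERVAL_SECONDS: estimate how many
# service proxy requests the plugin generates in 24 hours.
proxy_requests_per_day() {
  local sessions="$1" interval_seconds="$2"
  if [ "$interval_seconds" -le 0 ]; then
    echo "interval must be a positive number of seconds" >&2
    return 2
  fi
  # 86400 seconds per day; integer division rounds down.
  echo $(( sessions * 86400 / interval_seconds ))
}

# 10 open sessions at the default 5-minute interval:
#   proxy_requests_per_day 10 300    # prints 2880
# The same sessions at a 30-minute interval:
#   proxy_requests_per_day 10 1800   # prints 480
```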

Data Sensitivity

Polaris audit data may contain:

  • Resource names and namespaces
  • Configuration details
  • Potential security vulnerabilities

Recommendation: Restrict plugin access to authorized users only (not system:authenticated unless appropriate).

High Availability

Headlamp Replicas

Deploy Headlamp with 2+ replicas for high availability:

# helm-values.yaml
replicaCount: 2

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: headlamp
          topologyKey: kubernetes.io/hostname

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

Pod Disruption Budget

Ensure at least one replica is always available during node maintenance:

---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: headlamp-pdb
  namespace: headlamp
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: headlamp
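
With `minAvailable`, the number of pods that may be evicted at once is the count of healthy pods minus `minAvailable`. The sketch below works through that arithmetic to show why a single replica combined with `minAvailable: 1` blocks node drains; `allowed_disruptions` is a hypothetical helper name.

```shell
# allowed_disruptions HEALTHY_PODS MIN_AVAILABLE: print how many pods
# can be voluntarily evicted without violating the budget.
allowed_disruptions() {
  local healthy="$1" min_available="$2"
  local allowed=$(( healthy - min_available ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

# Two healthy replicas, minAvailable 1: one pod can be evicted at a time.
#   allowed_disruptions 2 1   # prints 1
# One replica, minAvailable 1: no eviction is allowed, so drains wait.
#   allowed_disruptions 1 1   # prints 0
```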

Health Checks

Configure liveness and readiness probes:

livenessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 10
  periodSeconds: 5

Monitoring and Observability

Metrics to Monitor

Application Metrics:

  • Headlamp pod CPU/memory usage
  • HTTP request latency and error rates
  • Plugin load time

Polaris Metrics:

  • Polaris dashboard API response time
  • Service proxy request latency
  • RBAC denial rate (403 errors)

Prometheus Integration

Example ServiceMonitor for Headlamp:

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: headlamp
  namespace: headlamp
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: headlamp
  endpoints:
    - port: http
      interval: 30s
      path: /metrics

Logging

Headlamp Logs:

# View logs
kubectl -n headlamp logs deployment/headlamp -f

# Filter for plugin-related logs
kubectl -n headlamp logs deployment/headlamp | grep -i polaris

Polaris Dashboard Logs:

kubectl -n polaris logs deployment/polaris-dashboard -f

Alerts

Recommended alerts:

  • Headlamp pod not ready
  • High error rate (4xx/5xx)
  • Polaris dashboard unavailable
  • RBAC denials (403 errors)

Example PrometheusRule:

---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: headlamp-alerts
  namespace: headlamp
spec:
  groups:
    - name: headlamp
      interval: 30s
      rules:
        - alert: HeadlampPodNotReady
          expr: kube_pod_status_ready{namespace="headlamp", pod=~"headlamp-.*", condition="true"} == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: 'Headlamp pod not ready'
            description: 'Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has been not ready for 5 minutes.'

Performance Tuning

Plugin Refresh Interval

The plugin auto-refreshes Polaris data at a configurable interval (default: 5 minutes).

Recommendations:

  • High-traffic clusters: 10-30 minutes (reduces API server load)
  • Low-traffic clusters: 1-5 minutes (more real-time data)

Configure via Settings → Plugins → Polaris in Headlamp UI.

Browser Caching

The plugin stores its settings in the browser's localStorage. A cached copy of the plugin in the browser can keep an old version loading after an update.

Best Practice: Instruct users to hard refresh after plugin updates (Cmd+Shift+R / Ctrl+Shift+R).

Resource Limits

Recommended resource limits for Headlamp with plugin:

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

Adjust based on cluster size and user count.

Disaster Recovery

Backup Considerations

What to back up:

  • Headlamp Helm values or Kubernetes manifests
  • RBAC manifests (Role, RoleBinding)
  • Plugin configuration (ConfigMap if using sidecar method)

What NOT to back up:

  • Plugin tarball (available on GitHub releases)
  • Polaris audit data (regenerated by Polaris)
  • Browser localStorage (user-specific settings)

Recovery Procedure

If Headlamp or plugin becomes unavailable:

  1. Verify Polaris is running:

    kubectl -n polaris get pods
    kubectl -n polaris get svc polaris-dashboard
    
  2. Redeploy Headlamp:

    helm upgrade --install headlamp headlamp/headlamp \
      --namespace headlamp \
      --values headlamp-values.yaml

  3. Reapply RBAC:

    kubectl apply -f polaris-plugin-rbac.yaml

  4. Verify plugin files:

    kubectl -n headlamp exec deployment/headlamp -- \
      ls /headlamp/plugins/headlamp-polaris-plugin/

  5. Hard refresh browser: Cmd+Shift+R / Ctrl+Shift+R

Known Issues

Skipped Count Limitation

Symptom: "Skipped" count in UI is lower than native Polaris dashboard

Cause: Plugin only counts checks with Severity: "ignore" from API response

Explanation:

Polaris omits annotation-based exemptions (e.g., polaris.fairwinds.com/*-exempt) from the results.json endpoint. The native Polaris dashboard computes skipped count by querying raw Kubernetes resources and parsing annotations.

Workaround: Use "View in Polaris Dashboard" link for accurate exemption count.

Future Enhancement: Would require cluster-wide read access to all workload types (significant RBAC expansion).

ArtifactHub Sync Delay

Symptom: New plugin version not appearing in Headlamp catalog

Cause: ArtifactHub syncs from GitHub every 30 minutes (no webhook/push mechanism)

Solution: Wait up to 30 minutes after the GitHub release for the new version to appear in the catalog.

Troubleshooting

For production issues, see:

Next Steps

References