Production Deployment
Production deployment checklist, best practices, and security considerations for the Headlamp Polaris Plugin.
Table of Contents
- Pre-Deployment Checklist
- Production Checklist
- Security Best Practices
- High Availability
- Monitoring and Observability
- Performance Tuning
- Disaster Recovery
- Known Issues
Pre-Deployment Checklist
Before deploying to production:
Infrastructure
- Kubernetes cluster v1.24+ running
- Polaris deployed in the polaris namespace
- Polaris dashboard service (polaris-dashboard:80) accessible
- Headlamp v0.26+ deployed (v0.39+ recommended)
- Ingress controller configured (if exposing externally)
- TLS certificates provisioned (cert-manager recommended)
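The version floors in this list can be checked mechanically. The following is a minimal sketch, not part of the plugin: `version_at_least` is a hypothetical helper that compares only the major and minor components of a version string such as the `gitVersion` value reported by `kubectl version -o json`.

```shell
# Hypothetical helper: succeed when a version string such as "v1.27.3"
# is at least MAJOR.MINOR. The body runs in a subshell so its variables
# do not leak into the caller.
version_at_least() (
  v=${1#v}                  # drop a leading "v"
  major=${v%%.*}            # text before the first dot
  rest=${v#*.}              # text after the first dot
  minor=${rest%%.*}         # text before the next dot
  minor=${minor%%[!0-9]*}   # drop suffixes such as "+" in "27+"
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
)

# Example (requires cluster access and jq):
#   server=$(kubectl version -o json | jq -r .serverVersion.gitVersion)
#   version_at_least "$server" 1 24 || echo "Kubernetes $server is older than v1.24"
```

The same helper can be pointed at the Headlamp image tag (for example `version_at_least v0.39.0 0 26`), assuming the tag follows the `vMAJOR.MINOR.PATCH` pattern.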
Verification Commands
# Verify Polaris
kubectl -n polaris get pods
kubectl -n polaris get svc polaris-dashboard
# Test Polaris API
kubectl get --raw /api/v1/namespaces/polaris/services/polaris-dashboard:80/proxy/results.json | jq .PolarisOutputVersion
# Verify Headlamp
kubectl -n headlamp get deployment headlamp
kubectl -n headlamp get svc headlamp
Production Checklist
Deployment
- Plugin installed via Plugin Manager or sidecar init container
- RBAC Role and RoleBinding applied
- NetworkPolicies configured (if using strict network policies)
- Headlamp pods running with 2+ replicas (high availability)
- Resource limits and requests configured
Post-Deployment Verification
# 1. Verify Polaris API is accessible via service proxy
kubectl get --raw /api/v1/namespaces/polaris/services/polaris-dashboard:80/proxy/results.json | jq .PolarisOutputVersion
# Expected: "1.0" or similar
# 2. Verify RBAC permissions
kubectl auth can-i get services/polaris-dashboard \
  --subresource=proxy \
  --as=system:serviceaccount:headlamp:headlamp \
  -n polaris
# Expected: yes
# 3. Check Headlamp logs for plugin loading
kubectl -n headlamp logs deployment/headlamp | grep -i polaris
# Expected: No errors related to plugin loading
# 4. Verify plugin files exist
kubectl -n headlamp exec deployment/headlamp -c headlamp -- ls -la /headlamp/plugins/headlamp-polaris-plugin/
# Expected: dist/, package.json present
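For automated pipelines, the RBAC check can be wrapped so that it fails loudly instead of relying on someone reading the output. This is a sketch under the same assumptions as the commands above (service account `headlamp` in namespace `headlamp`, service `polaris-dashboard` in namespace `polaris`); the function name is hypothetical, and `--subresource` is the flag `kubectl auth can-i` provides for subresources such as `proxy`.

```shell
# Sketch: succeed only when the Headlamp service account may "get" the
# proxy subresource of the polaris-dashboard service.
can_proxy_polaris() {
  # kubectl exits non-zero when the answer is "no", so the exit status
  # is ignored here and only the printed answer is compared.
  answer=$(kubectl auth can-i get services/polaris-dashboard \
    --subresource=proxy \
    --as=system:serviceaccount:headlamp:headlamp \
    -n polaris 2>/dev/null) || true
  [ "$answer" = "yes" ]
}

# Example:
#   can_proxy_polaris || { echo "RBAC check failed" >&2; exit 1; }
```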
UI Verification
- Navigate to Settings → Plugins
- Verify "headlamp-polaris-plugin" is listed with correct version
- Sidebar shows "Polaris" entry
- Click Polaris → Overview - page loads successfully
- Cluster score gauge displays
- Namespaces table loads with data
- App bar shows Polaris score badge
- Click namespace - detail drawer opens
- Test inline audit section on a Deployment/StatefulSet
Security Best Practices
RBAC
Principle of Least Privilege:
# ✅ GOOD: Scoped to specific service
rules:
  - apiGroups: [""]
    resources: ["services/proxy"]
    resourceNames: ["polaris-dashboard"]
    verbs: ["get"]
# ❌ BAD: Too broad
rules:
  - apiGroups: [""]
    resources: ["services/proxy"]
    verbs: ["get"] # Allows proxy to ALL services
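For reference, the scoped rule fits into a complete Role roughly as follows. This is a sketch rather than the plugin's shipped manifest: it assumes the Role is named `polaris-proxy-reader` (the name the RoleBindings in this guide refer to) and lives in the `polaris` namespace. The function name is hypothetical.

```shell
# Sketch: print a Role granting "get" on the proxy subresource of the
# polaris-dashboard service only.
render_polaris_role() {
  cat <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: polaris-proxy-reader
  namespace: polaris
rules:
  - apiGroups: [""]
    resources: ["services/proxy"]
    resourceNames: ["polaris-dashboard"]
    verbs: ["get"]
EOF
}

# Example: render_polaris_role | kubectl apply -f -
```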
Token-Auth Mode:
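When Headlamp uses user-supplied tokens (for example, OIDC), requests to the API server are made as the user rather than as the Headlamp service account, so each user needs a RoleBinding to the Role, either directly or through a group: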
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: authenticated-users-polaris-proxy
  namespace: polaris
subjects:
  - kind: Group
    name: system:authenticated # All authenticated users
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: polaris-proxy-reader
  apiGroup: rbac.authorization.k8s.io
For fine-grained control, bind specific users or groups:
subjects:
  - kind: Group
    name: sre-team # Only SRE team
    apiGroup: rbac.authorization.k8s.io
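When several groups need access, one RoleBinding per group keeps grants easy to revoke individually. The sketch below generates such a binding; the function name and the `<group>-polaris-proxy` naming scheme are hypothetical, and the group must match whatever group claim your identity provider places in the user's token.

```shell
# Sketch: print a RoleBinding that grants the polaris-proxy-reader Role
# to a single group. The binding name is an illustrative convention.
render_group_binding() {
  group=$1
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${group}-polaris-proxy
  namespace: polaris
subjects:
  - kind: Group
    name: ${group}
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: polaris-proxy-reader
  apiGroup: rbac.authorization.k8s.io
EOF
}

# Example: render_group_binding sre-team | kubectl apply -f -
```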
Network Policies
If using strict NetworkPolicies:
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-apiserver-to-polaris
  namespace: polaris
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: polaris
      app.kubernetes.io/component: dashboard
  policyTypes:
    - Ingress
  ingress:
    # Allow from API server (performs the proxy hop)
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: headlamp
        - podSelector:
            matchLabels:
              component: kube-apiserver
      ports:
        - protocol: TCP
          port: 80
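Note: The API server proxies the request, so traffic reaches the Polaris dashboard from the API server, not from the Headlamp pod directly. On clusters where the API server runs on the host network or outside the cluster, pod and namespace selectors may not match it, and an ipBlock rule covering the control-plane addresses may be needed instead.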
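A NetworkPolicy only takes effect on pods its `podSelector` matches, so it is worth confirming that the dashboard pods actually carry the two labels used in the policy. The following sketch does that with a label query; the function name is hypothetical, and the labels are assumed to match your Polaris installation.

```shell
# Sketch: succeed when at least one pod in the polaris namespace carries
# both labels that the NetworkPolicy's podSelector expects.
dashboard_pods_selected() {
  count=$(kubectl -n polaris get pods \
    -l app.kubernetes.io/name=polaris,app.kubernetes.io/component=dashboard \
    --no-headers 2>/dev/null | wc -l | tr -d ' ')
  [ "$count" -gt 0 ]
}

# Example:
#   dashboard_pods_selected || echo "No pods match the policy's podSelector" >&2
```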
Audit Logging
Kubernetes audit logs record every service proxy request:
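- What's logged: User or service account, timestamp, and response code for each request
- Volume: Each auto-refresh generates proxy requests, so a shorter refresh interval produces more audit entries
- Recommendation: If log volume is a concern, log these requests at the Metadata level, as in the policy below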
# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata # Log metadata only (not full request/response)
    verbs: ['get']
    resources:
      - group: ''
        resources: ['services/proxy']
    namespaces: ['polaris']
Data Sensitivity
Polaris audit data may contain:
- Resource names and namespaces
- Configuration details
- Potential security vulnerabilities
Recommendation: Restrict plugin access to authorized users only (not system:authenticated unless appropriate).
High Availability
Headlamp Replicas
Deploy Headlamp with 2+ replicas for high availability:
# helm-values.yaml
replicaCount: 2
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: headlamp
          topologyKey: kubernetes.io/hostname
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi
Pod Disruption Budget
Ensure at least one replica is always available during node maintenance:
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: headlamp-pdb
  namespace: headlamp
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: headlamp
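A PodDisruptionBudget with `minAvailable: 1` only helps when there are at least two replicas; with a single replica it permits no voluntary evictions and a node drain will wait on that pod. The arithmetic can be sketched as below, assuming every replica is healthy (the function name is hypothetical).

```shell
# Sketch: print how many pods may be voluntarily evicted at once, given
# the replica count and the PDB's minAvailable, assuming all replicas
# are currently healthy.
allowed_disruptions() {
  n=$(( $1 - $2 ))
  if [ "$n" -lt 0 ]; then
    n=0
  fi
  echo "$n"
}

# allowed_disruptions 2 1   -> 1 (one pod can be evicted during a drain)
# allowed_disruptions 1 1   -> 0 (evictions are blocked)
```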
Health Checks
Configure liveness and readiness probes:
livenessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 10
  periodSeconds: 5
Monitoring and Observability
Metrics to Monitor
Application Metrics:
- Headlamp pod CPU/memory usage
- HTTP request latency and error rates
- Plugin load time
Polaris Metrics:
- Polaris dashboard API response time
- Service proxy request latency
- RBAC denial rate (403 errors)
Prometheus Integration
Example ServiceMonitor for Headlamp:
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: headlamp
  namespace: headlamp
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: headlamp
  endpoints:
    - port: http
      interval: 30s
      path: /metrics
Logging
Headlamp Logs:
# View logs
kubectl -n headlamp logs deployment/headlamp -f
# Filter for plugin-related logs
kubectl -n headlamp logs deployment/headlamp | grep -i polaris
Polaris Dashboard Logs:
kubectl -n polaris logs deployment/polaris-dashboard -f
Alerts
Recommended alerts:
- Headlamp pod not ready
- High error rate (4xx/5xx)
- Polaris dashboard unavailable
- RBAC denials (403 errors)
Example PrometheusRule:
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: headlamp-alerts
  namespace: headlamp
spec:
  groups:
    - name: headlamp
      interval: 30s
      rules:
        - alert: HeadlampPodNotReady
          # condition="true" selects the "ready" series; without it, the
          # condition="false" series (0 for healthy pods) would also match.
          expr: kube_pod_status_ready{namespace="headlamp", pod=~"headlamp-.*", condition="true"} == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: 'Headlamp pod not ready'
            description: 'Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has not been ready for 5 minutes.'
Performance Tuning
Plugin Refresh Interval
The plugin auto-refreshes Polaris data at a configurable interval (default: 5 minutes).
Recommendations:
- High-traffic clusters: 10-30 minutes (reduces API server load)
- Low-traffic clusters: 1-5 minutes (more real-time data)
Configure via Settings → Plugins → Polaris in Headlamp UI.
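To see what the interval means for API server and audit-log load, a rough estimate helps. The sketch below assumes each open browser session issues one proxy request per refresh; that is an assumption for illustration, not a measured property of the plugin, and the function name is hypothetical.

```shell
# Sketch: estimate proxy requests per day.
# Assumption: one request per open session per refresh.
# Usage: proxy_requests_per_day SESSIONS INTERVAL_MINUTES
proxy_requests_per_day() {
  sessions=$1
  interval_minutes=$2
  # 1440 minutes per day; integer division is precise enough here.
  echo $(( sessions * 1440 / interval_minutes ))
}

# proxy_requests_per_day 20 5    -> 5760
# proxy_requests_per_day 20 30   -> 960
```

Under that assumption, moving 20 concurrent sessions from the 5-minute default to 30 minutes cuts the request count, and the matching audit entries, by a factor of six.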
Browser Caching
The plugin stores its settings in the browser's localStorage. A stale browser cache can cause an old plugin bundle to keep loading after an update.
Best Practice: Instruct users to hard refresh after plugin updates (Cmd+Shift+R / Ctrl+Shift+R).
Resource Limits
Recommended resource limits for Headlamp with plugin:
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi
Adjust based on cluster size and user count.
Disaster Recovery
Backup Considerations
What to back up:
- Headlamp Helm values or Kubernetes manifests
- RBAC manifests (Role, RoleBinding)
- Plugin configuration (ConfigMap if using sidecar method)
What NOT to back up:
- Plugin tarball (available on GitHub releases)
- Polaris audit data (regenerated by Polaris)
- Browser localStorage (user-specific settings)
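If the original RBAC manifests are not kept in version control, the live objects can be exported as a fallback. This is a sketch only: the function and output file names are hypothetical, it exports every Role and RoleBinding in the `polaris` namespace rather than just the plugin's, and the exported YAML includes server-populated fields (such as `resourceVersion` and `uid`) that are usually stripped before reapplying.

```shell
# Sketch: export Roles and RoleBindings from the polaris namespace into
# DIR/polaris-rbac.yaml. Returns non-zero if the directory cannot be
# created or kubectl fails.
backup_plugin_rbac() {
  dir=$1
  mkdir -p "$dir" || return 1
  kubectl -n polaris get role,rolebinding -o yaml > "$dir/polaris-rbac.yaml"
}

# Example: backup_plugin_rbac ./backup
```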
Recovery Procedure
If Headlamp or the plugin becomes unavailable:
1. Verify Polaris is running:
kubectl -n polaris get pods
kubectl -n polaris get svc polaris-dashboard
2. Redeploy Headlamp:
helm upgrade --install headlamp headlamp/headlamp \
  --namespace headlamp \
  --values headlamp-values.yaml
3. Reapply RBAC:
kubectl apply -f polaris-plugin-rbac.yaml
4. Verify plugin files:
kubectl -n headlamp exec deployment/headlamp -- \
  ls /headlamp/plugins/headlamp-polaris-plugin/
5. Hard refresh browser: Cmd+Shift+R / Ctrl+Shift+R
Known Issues
Skipped Count Limitation
Symptom: "Skipped" count in UI is lower than native Polaris dashboard
Cause: Plugin only counts checks with Severity: "ignore" from API response
Explanation:
Polaris omits annotation-based exemptions (e.g., polaris.fairwinds.com/*-exempt) from the results.json endpoint. The native Polaris dashboard computes skipped count by querying raw Kubernetes resources and parsing annotations.
Workaround: Use "View in Polaris Dashboard" link for accurate exemption count.
Future Enhancement: Counting annotation-based exemptions in the plugin would require cluster-wide read access to all workload types (a significant RBAC expansion).
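To compare against what the plugin can see, the number of checks marked `"Severity": "ignore"` in `results.json` can be counted directly. The sketch below is deliberately text-based and makes no assumption about the JSON structure beyond that key/value pair appearing as written; it is an illustration, not the plugin's own counting logic, and the function name is hypothetical.

```shell
# Sketch: read results.json on stdin and print how many times a
# "Severity": "ignore" pair occurs, tolerating whitespace around the colon.
count_ignored_checks() {
  grep -o '"Severity"[[:space:]]*:[[:space:]]*"ignore"' | wc -l | tr -d ' '
}

# Example:
#   kubectl get --raw /api/v1/namespaces/polaris/services/polaris-dashboard:80/proxy/results.json \
#     | count_ignored_checks
```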
ArtifactHub Sync Delay
Symptom: New plugin version not appearing in Headlamp catalog
Cause: ArtifactHub syncs from GitHub every 30 minutes (there is no webhook or push mechanism)
Solution: Wait up to 30 minutes after the GitHub release for the new version to appear in the catalog.
Troubleshooting
For production issues, see:
- Troubleshooting Guide - Comprehensive troubleshooting
- RBAC Issues - Permission debugging
- Network Problems - Connectivity issues
Next Steps
- Kubernetes Deployment - Raw manifest deployment
- Helm Deployment - Helm chart deployment
- Troubleshooting - Issue resolution