# Production Deployment

Production deployment checklist, best practices, and security considerations for the Headlamp Polaris Plugin.

## Table of Contents

- [Pre-Deployment Checklist](#pre-deployment-checklist)
- [Production Checklist](#production-checklist)
- [Security Best Practices](#security-best-practices)
- [High Availability](#high-availability)
- [Monitoring and Observability](#monitoring-and-observability)
- [Performance Tuning](#performance-tuning)
- [Disaster Recovery](#disaster-recovery)
- [Known Issues](#known-issues)

## Pre-Deployment Checklist

Before deploying to production:

### Infrastructure

- [ ] Kubernetes cluster v1.24+ running
- [ ] Polaris deployed in the `polaris` namespace
- [ ] Polaris dashboard service (`polaris-dashboard:80`) accessible
- [ ] Headlamp v0.26+ deployed (v0.39+ recommended)
- [ ] Ingress controller configured (if exposing externally)
- [ ] TLS certificates provisioned (cert-manager recommended)

### Verification Commands

```bash
# Verify Polaris
kubectl -n polaris get pods
kubectl -n polaris get svc polaris-dashboard

# Test Polaris API
kubectl get --raw /api/v1/namespaces/polaris/services/polaris-dashboard:80/proxy/results.json | jq .PolarisOutputVersion

# Verify Headlamp
kubectl -n <your-namespace> get deployment headlamp
kubectl -n <your-namespace> get svc headlamp
```

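The version floors in the checklist (Kubernetes v1.24+, Headlamp v0.26+) can also be checked mechanically. The sketch below is one possible way to do that. `version_ge` is a hypothetical helper, not part of kubectl or Headlamp, and it assumes `sort -V` (GNU coreutils version sort) is available.

```bash
#!/usr/bin/env bash
# version_ge ACTUAL MINIMUM
# Hypothetical helper: succeeds when ACTUAL >= MINIMUM.
# A leading "v" is ignored; ordering comes from `sort -V` (version sort).
version_ge() {
  local actual="${1#v}" minimum="${2#v}"
  # The lower version sorts first. If that is MINIMUM, ACTUAL is high enough.
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

# Example (commented out because it needs cluster access):
# server_version="$(kubectl version -o json | jq -r '.serverVersion.gitVersion')"
# version_ge "$server_version" "1.24" && echo "Kubernetes version OK"
```

Vendor suffixes in the reported version string may sort differently than expected, so treat the result as a convenience check rather than a guarantee.
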
## Production Checklist

### Deployment

- [ ] Plugin installed via Plugin Manager or sidecar init container
- [ ] RBAC Role and RoleBinding applied
- [ ] NetworkPolicies configured (if using strict network policies)
- [ ] Headlamp pods running with 2+ replicas (high availability)
- [ ] Resource limits and requests configured

### Post-Deployment Verification

```bash
# 1. Verify Polaris API is accessible via service proxy
kubectl get --raw /api/v1/namespaces/polaris/services/polaris-dashboard:80/proxy/results.json | jq .PolarisOutputVersion
# Expected: "1.0" or similar

# 2. Verify RBAC permissions
# (can-i takes TYPE/NAME; the subresource is passed with --subresource)
kubectl auth can-i get services/polaris-dashboard \
  --subresource=proxy \
  --as=system:serviceaccount:<your-namespace>:headlamp \
  -n polaris
# Expected: yes

# 3. Check Headlamp logs for plugin loading
kubectl -n <your-namespace> logs deployment/headlamp | grep -i polaris
# Expected: No errors related to plugin loading

# 4. Verify plugin files exist
kubectl -n <your-namespace> exec deployment/headlamp -c headlamp -- ls -la /headlamp/plugins/headlamp-polaris-plugin/
# Expected: dist/, package.json present
```

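For repeated use, such as a smoke test after each rollout, the checks above can be wrapped so that any failure is counted and reported. This is a minimal sketch. `run_check` and the `failures` counter are hypothetical names introduced here for illustration.

```bash
#!/usr/bin/env bash
# run_check DESCRIPTION COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with its output discarded, prints
# PASS or FAIL, and returns 0 on success or 1 on failure.
failures=0
run_check() {
  local description="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $description"
  else
    echo "FAIL: $description"
    failures=$((failures + 1))
    return 1
  fi
}

# Example against a cluster (commented out; replace <your-namespace>):
# run_check "Polaris API reachable" \
#   kubectl get --raw /api/v1/namespaces/polaris/services/polaris-dashboard:80/proxy/results.json
# run_check "Headlamp deployment exists" \
#   kubectl -n "<your-namespace>" get deployment headlamp
# exit "$failures"
```

Exiting with the failure count keeps the script usable in pipelines that only look at the exit status.
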
### UI Verification

- [ ] Navigate to **Settings → Plugins**
- [ ] Verify "headlamp-polaris-plugin" is listed with correct version
- [ ] Sidebar shows "Polaris" entry
- [ ] Click **Polaris → Overview** - page loads successfully
- [ ] Cluster score gauge displays
- [ ] Namespaces table loads with data
- [ ] App bar shows Polaris score badge
- [ ] Click namespace - detail drawer opens
- [ ] Test inline audit section on a Deployment/StatefulSet

## Security Best Practices

### RBAC

**Principle of Least Privilege:**

```yaml
# ✅ GOOD: Scoped to specific service
rules:
  - apiGroups: [""]
    resources: ["services/proxy"]
    resourceNames: ["polaris-dashboard"]
    verbs: ["get"]

# ❌ BAD: Too broad
rules:
  - apiGroups: [""]
    resources: ["services/proxy"]
    verbs: ["get"] # Allows proxy to ALL services
```

**Token-Auth Mode:**

When Headlamp uses user-supplied tokens (for example, OIDC), proxy requests run under each user's own identity, so every user needs a RoleBinding to the proxy Role. The example below grants it to all authenticated users:

```yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: authenticated-users-polaris-proxy
  namespace: polaris
subjects:
  - kind: Group
    name: system:authenticated # All authenticated users
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: polaris-proxy-reader
  apiGroup: rbac.authorization.k8s.io
```

For fine-grained control, bind specific users or groups instead:

```yaml
subjects:
  - kind: Group
    name: sre-team # Only SRE team
    apiGroup: rbac.authorization.k8s.io
```

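A `subjects` fragment on its own is not a complete manifest. As an illustration, the function below prints a full RoleBinding for a single group, reusing the `polaris-proxy-reader` Role and the `polaris` namespace from the token-auth example. The function name `render_group_binding` and the generated binding name (`<group>-polaris-proxy`) are assumptions made for this sketch.

```bash
#!/usr/bin/env bash
# render_group_binding GROUP
# Hypothetical helper: prints a RoleBinding that grants GROUP the
# polaris-proxy-reader Role in the polaris namespace.
render_group_binding() {
  local group="$1"
  cat <<EOF
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${group}-polaris-proxy
  namespace: polaris
subjects:
  - kind: Group
    name: ${group}
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: polaris-proxy-reader
  apiGroup: rbac.authorization.k8s.io
EOF
}

# Example (commented out; needs cluster access):
# render_group_binding sre-team | kubectl apply -f -
```

The group name is reused in `metadata.name`, so this only works as written for group names that are also valid Kubernetes object names (lowercase alphanumerics, `-`, and `.`).
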
### Network Policies

If using strict NetworkPolicies:

```yaml
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-apiserver-to-polaris
  namespace: polaris
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: polaris
      app.kubernetes.io/component: dashboard
  policyTypes:
    - Ingress
  ingress:
    # Allow from API server (performs the proxy hop)
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
        - podSelector:
            matchLabels:
              component: kube-apiserver
      ports:
        - protocol: TCP
          port: 80
```

**Note:** The connection to the Polaris dashboard is made by the API server, which proxies the request, not by the Headlamp pod. The policy therefore has to admit traffic from the API server rather than from Headlamp.

### Audit Logging

Kubernetes audit logs record every service proxy request:

- **What's logged:** User/service account, timestamp, response code
- **Volume:** Each auto-refresh issues a proxy request, so shorter refresh intervals produce more audit log entries
- **Recommendation:** If log volume is a concern, set the audit policy level for these requests to `Metadata`

```yaml
# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata # Log metadata only (not full request/response)
    verbs: ['get']
    resources:
      - group: ''
        resources: ['services/proxy']
    namespaces: ['polaris']
```

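To gauge the audit volume, a rough daily estimate can be derived from the refresh interval (5 minutes by default, see [Performance Tuning](#performance-tuning)) and the number of open plugin sessions. The sketch below assumes one proxy request per refresh per session and one audit event per request, and it ignores manual refreshes. `audit_events_per_day` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# audit_events_per_day INTERVAL_SECONDS SESSIONS
# Hypothetical estimate: assumes each open session sends one proxy request per
# refresh interval and that each request yields one audit event.
audit_events_per_day() {
  local interval_seconds="$1" sessions="$2"
  # 86400 seconds per day; integer arithmetic is precise enough for an estimate.
  echo $(( 86400 / interval_seconds * sessions ))
}

audit_events_per_day 300 10    # 5-minute refresh, 10 sessions: prints 2880
audit_events_per_day 1800 10   # 30-minute refresh, 10 sessions: prints 480
```

Under these assumptions, moving from a 5-minute to a 30-minute refresh cuts the estimated event count by a factor of six.
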
### Data Sensitivity

Polaris audit data may contain:

- Resource names and namespaces
- Configuration details
- Potential security vulnerabilities

**Recommendation:** Restrict plugin access to authorized users only (not `system:authenticated` unless appropriate).

## High Availability

### Headlamp Replicas

Deploy Headlamp with 2+ replicas for high availability:

```yaml
# helm-values.yaml
replicaCount: 2

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: headlamp
          topologyKey: kubernetes.io/hostname

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi
```

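Because the anti-affinity rule is `preferred` rather than required, the scheduler may still place both replicas on the same node. One way to check the outcome is to count the distinct nodes that host Headlamp pods. `count_distinct_nodes` is a hypothetical helper that reads one node name per line.

```bash
#!/usr/bin/env bash
# count_distinct_nodes
# Hypothetical helper: reads node names (one per line) on stdin and prints
# how many distinct, non-empty names it saw.
count_distinct_nodes() {
  grep -v '^[[:space:]]*$' | sort -u | wc -l | tr -d '[:space:]'
}

# Example (commented out; needs cluster access, replace <your-namespace>):
# kubectl -n "<your-namespace>" get pods -l app.kubernetes.io/name=headlamp \
#   -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | count_distinct_nodes
```

A result of 1 with two or more replicas means a single node failure would take Headlamp down, despite the replica count.
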
### Pod Disruption Budget

Ensure at least one replica is always available during node maintenance:

```yaml
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: headlamp-pdb
  namespace: <your-namespace>
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: headlamp
```

### Health Checks

Configure liveness and readiness probes:

```yaml
livenessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 10
  periodSeconds: 5
```

## Monitoring and Observability

### Metrics to Monitor

**Application Metrics:**

- Headlamp pod CPU/memory usage
- HTTP request latency and error rates
- Plugin load time

**Polaris Metrics:**

- Polaris dashboard API response time
- Service proxy request latency
- RBAC denial rate (403 errors)

### Prometheus Integration

Example ServiceMonitor for Headlamp:

```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: headlamp
  namespace: <your-namespace>
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: headlamp
  endpoints:
    - port: http
      interval: 30s
      path: /metrics
```

### Logging

**Headlamp Logs:**

```bash
# View logs
kubectl -n <your-namespace> logs deployment/headlamp -f

# Filter for plugin-related logs
kubectl -n <your-namespace> logs deployment/headlamp | grep -i polaris
```

**Polaris Dashboard Logs:**

```bash
kubectl -n polaris logs deployment/polaris-dashboard -f
```

### Alerts

Recommended alerts:

- Headlamp pod not ready
- High error rate (4xx/5xx)
- Polaris dashboard unavailable
- RBAC denials (403 errors)

Example PrometheusRule:

```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: headlamp-alerts
  namespace: <your-namespace>
spec:
  groups:
    - name: headlamp
      interval: 30s
      rules:
        - alert: HeadlampPodNotReady
          # condition="true" selects the "ready" series; without it, the
          # condition="false" series of a healthy pod (value 0) would also match
          expr: kube_pod_status_ready{condition="true", namespace="<your-namespace>", pod=~"headlamp-.*"} == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: 'Headlamp pod not ready'
            description: 'Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has been not ready for 5 minutes.'
```

## Performance Tuning

### Plugin Refresh Interval

The plugin auto-refreshes Polaris data at a configurable interval (default: 5 minutes).

**Recommendations:**

- **High-traffic clusters:** 10-30 minutes (reduces API server load)
- **Low-traffic clusters:** 1-5 minutes (more real-time data)

Configure via **Settings → Plugins → Polaris** in the Headlamp UI.

### Browser Caching

The plugin stores its settings in the browser's localStorage. Separately, the browser cache can affect plugin loading, for example by serving a stale copy after an update.

**Best Practice:** Instruct users to hard refresh after plugin updates (**Cmd+Shift+R** / **Ctrl+Shift+R**).

### Resource Limits

Recommended resource limits for Headlamp with the plugin:

```yaml
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi
```

Adjust based on cluster size and user count.

## Disaster Recovery

### Backup Considerations

**What to back up:**

- Headlamp Helm values or Kubernetes manifests
- RBAC manifests (Role, RoleBinding)
- Plugin configuration (ConfigMap if using sidecar method)

**What NOT to back up:**

- Plugin tarball (available on GitHub releases)
- Polaris audit data (regenerated by Polaris)
- Browser localStorage (user-specific settings)

### Recovery Procedure

If Headlamp or the plugin becomes unavailable:

1. **Verify Polaris is running:**

   ```bash
   kubectl -n polaris get pods
   kubectl -n polaris get svc polaris-dashboard
   ```

2. **Redeploy Headlamp:**

   ```bash
   helm upgrade --install headlamp headlamp/headlamp \
     --namespace <your-namespace> \
     --values headlamp-values.yaml
   ```

3. **Reapply RBAC:**

   ```bash
   kubectl apply -f polaris-plugin-rbac.yaml
   ```

4. **Verify plugin files:**

   ```bash
   kubectl -n <your-namespace> exec deployment/headlamp -- \
     ls /headlamp/plugins/headlamp-polaris-plugin/
   ```

5. **Hard refresh browser:**
   **Cmd+Shift+R** / **Ctrl+Shift+R**

## Known Issues

### Skipped Count Limitation

**Symptom:** The "Skipped" count in the UI is lower than in the native Polaris dashboard.

**Cause:** The plugin counts only checks that the API response marks with `Severity: "ignore"`.

**Explanation:**

Polaris omits annotation-based exemptions (e.g., `polaris.fairwinds.com/*-exempt`) from the `results.json` endpoint. The native Polaris dashboard computes the skipped count by querying raw Kubernetes resources and parsing their annotations.

**Workaround:** Use the "View in Polaris Dashboard" link for an accurate exemption count.

**Future Enhancement:** Matching the native count would require cluster-wide read access to all workload types (a significant RBAC expansion).

### ArtifactHub Sync Delay

**Symptom:** A new plugin version does not appear in the Headlamp catalog.

**Cause:** ArtifactHub syncs from GitHub every 30 minutes (no webhook/push mechanism).

**Solution:** Wait up to 30 minutes after the GitHub release for the new version to appear in the catalog.

## Troubleshooting

For production issues, see:

- **[Troubleshooting Guide](../troubleshooting/README.md)** - Comprehensive troubleshooting
- **[RBAC Issues](../troubleshooting/rbac-issues.md)** - Permission debugging
- **[Network Problems](../troubleshooting/network-problems.md)** - Connectivity issues

## Next Steps

- **[Kubernetes Deployment](kubernetes.md)** - Raw manifest deployment
- **[Helm Deployment](helm.md)** - Helm chart deployment
- **[Troubleshooting](../troubleshooting/README.md)** - Issue resolution

## References

- [Kubernetes Production Best Practices](https://kubernetes.io/docs/setup/best-practices/)
- [Headlamp Security](https://headlamp.dev/docs/latest/installation/in-cluster/#security)
- [Polaris Configuration](https://polaris.docs.fairwinds.com/customization/checks/)