enhancement: Add comprehensive validation and health monitoring system #37

Open
opened 2026-02-22 13:13:25 +00:00 by cpfarhood · 0 comments
cpfarhood commented 2026-02-22 13:13:25 +00:00 (Migrated from github.com)

Summary

Implement comprehensive validation and health monitoring to prevent common deployment issues and provide proactive alerting for problems.

Proposed Components

1. Template Validation

Pre-deployment validation in Helm templates:

{{/* values.yaml validation */}}
{{- define "antigravity.validateValues" -}}

{{/* Required fields validation */}}
{{- if not .Values.name }}
{{- fail "name is required" }}
{{- end }}

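{{/* Naming convention validation (RFC 1123 label: at most 63 chars, lowercase alphanumerics and '-', alphanumeric at both ends; a single character is valid) */}}
{{- if or (gt (len .Values.name) 63) (not (regexMatch "^[a-z0-9]([-a-z0-9]*[a-z0-9])?$" .Values.name)) }}
{{- fail "name must be a valid Kubernetes resource name (lowercase alphanumerics and '-', at most 63 characters)" }}
{{- end }}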

{{/* Resource conflicts */}}
{{- if and (eq .Values.ide.type "none") .Values.display }}
{{- fail "display settings are not applicable when ide.type=none" }}
{{- end }}

{{/* MCP sidecar dependencies */}}
{{- if and .Values.mcp.sidecars.kubernetes.enabled (eq .Values.clusterAccess "none") }}
{{- fail "kubernetes MCP sidecar requires clusterAccess != none" }}
{{- end }}

{{- end }}

Usage: include the helper at the top of each main template, so a failed check aborts helm install or helm upgrade before anything is applied:

# deployment.yaml
{{- include "antigravity.validateValues" . }}
apiVersion: apps/v1
kind: Deployment

2. Runtime Health Checks

Enhanced health monitoring beyond basic liveness/readiness:

# templates/healthcheck-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "antigravity.fullname" . }}-healthcheck
data:
  healthcheck.sh: |
    #!/bin/bash
    # Comprehensive health check script
    
    # Check VNC server (--max-time stops a hung server from stalling the check)
    if ! curl -sf --max-time 5 http://localhost:5800/vnc.html >/dev/null 2>&1; then
        echo "❌ VNC server not responding"
        exit 1
    fi
    
    # Check MCP sidecars
    {{- range $name, $config := .Values.mcp.sidecars }}
    {{- if $config.enabled }}
    if ! curl -sf --max-time 5 http://localhost:{{ $config.port }}/health >/dev/null 2>&1; then
        echo "❌ MCP sidecar {{ $name }} not healthy"
        exit 1
    fi
    {{- end }}
    {{- end }}
    
    # Check workspace mount
    if [ ! -d "/workspace" ]; then
        echo "❌ Workspace not mounted"
        exit 1
    fi
    
    # Check git configuration
    if [ ! -f "/config/userdata/.gitconfig" ]; then
        echo "⚠️  Git not configured"
    fi
    
    echo "✅ All health checks passed"
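The script above stops at the first failed check, so one broken sidecar hides every later problem. A minimal POSIX sh sketch of an alternative that runs all checks first and only then decides the exit status; run_check and finish_checks are hypothetical names, and the commented wiring simply reuses the checks above.

```shell
#!/bin/sh
# Collect every failing check instead of stopping at the first one.
# "fatal" failures make finish_checks return non-zero; "warn" failures only print.
failures=0

# run_check LEVEL DESCRIPTION CMD [ARGS...]
run_check() {
    level=$1
    desc=$2
    shift 2
    if "$@" >/dev/null 2>&1; then
        echo "ok    $desc"
    elif [ "$level" = "fatal" ]; then
        echo "FAIL  $desc"
        failures=$((failures + 1))
    else
        echo "WARN  $desc"
    fi
}

# Overall result: 0 only if no fatal check failed.
finish_checks() {
    [ "$failures" -eq 0 ]
}

# Example wiring (commented out so the sketch runs anywhere):
#   run_check fatal "VNC server" curl -sf --max-time 5 -o /dev/null http://localhost:5800/vnc.html
#   run_check fatal "workspace mounted" test -d /workspace
#   run_check warn  "git configured" test -f /config/userdata/.gitconfig
#   finish_checks
```

If the script backs an exec probe, the main container's readiness probe is usually a safer home than liveness: a failing liveness probe restarts the main container even when the fault is in a sidecar.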

3. Deployment Validation Tests

Helm test hooks for post-deployment validation:

# templates/tests/connection-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "antigravity.fullname" . }}-test-connection
  annotations:
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
  - name: test-connection
    image: curlimages/curl:latest
    command: ['sh', '-c']
    args:
    - |
      set -e
      echo "Testing VNC connection..."
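      curl -fsS --max-time 10 -o /dev/null http://{{ include "antigravity.fullname" . }}:5800/vnc.html
      
      echo "Testing MCP endpoints..."
      {{- range $name, $config := .Values.mcp.sidecars }}
      {{- if $config.enabled }}
      {{- /* Inside range "." is the sidecar config, so pass the root context "$" to include. */}}
      # /sse streams without end, so cap it with --max-time and judge success by the HTTP status.
      code=$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' http://{{ include "antigravity.fullname" $ }}:{{ $config.port }}/sse || true)
      [ "$code" = "200" ] || echo "Warning: {{ $name }} MCP not ready (HTTP $code)"
      {{- end }}
      {{- end }}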
      
      echo "All connection tests passed!"

Usage:

helm test mydev  # Run all validation tests

4. Configuration Warnings System

Smart warnings for potential issues:

{{/* Configuration warnings */}}
{{- define "antigravity.configWarnings" -}}

{{- /* Resource warnings; antigravity.memoryToBytes and antigravity.storageToBytes must be defined as helpers that render a plain byte count */}}
{{- $memRequest := .Values.resources.requests.memory | default "2Gi" }}
{{- if lt (include "antigravity.memoryToBytes" $memRequest | int64) 2147483648 }}
{{- printf "WARNING: Memory request %s is below 2Gi and may be insufficient for enabled features\n" $memRequest }}
{{- end }}

{{- /* Storage warnings */}}
{{- $storageSize := .Values.storage.size | default "32Gi" }}
{{- if and .Values.mcp.sidecars.playwright.enabled (lt (include "antigravity.storageToBytes" $storageSize | int64) 21474836480) }}
{{- printf "WARNING: Storage size %s is below 20Gi and may be insufficient for Playwright browser downloads\n" $storageSize }}
{{- end }}

{{- /* Security warnings */}}
{{- if and .Values.ssh.enabled (not .Values.envSecretName) }}
{{- print "WARNING: SSH is enabled but envSecretName is not set, so no SSH_AUTHORIZED_KEYS secret is supplied\n" }}
{{- end }}

{{- /* Performance warnings */}}
{{- $enabledSidecars := 0 }}
{{- range $name, $config := .Values.mcp.sidecars }}
{{- if $config.enabled }}{{- $enabledSidecars = add $enabledSidecars 1 }}{{- end }}
{{- end }}
{{- if gt $enabledSidecars 4 }}
{{- printf "WARNING: %d MCP sidecars are enabled, which may impact performance\n" $enabledSidecars }}
{{- end }}

{{- end }}
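antigravity.memoryToBytes and antigravity.storageToBytes are used above but not defined in this issue. Whatever form they take, they must turn a Kubernetes quantity string into an integer byte count. The POSIX sh sketch below spells out that conversion for the common integer forms (it could also serve scripts such as the init container); quantity_to_bytes is a hypothetical name, and it deliberately rejects forms it does not handle instead of guessing.

```shell
#!/bin/sh
# quantity_to_bytes QUANTITY: print the byte count for an integer Kubernetes
# quantity such as 512Mi, 2Gi, 20G or 1048576. Binary suffixes (Ki, Mi, Gi, Ti)
# are powers of 1024; decimal suffixes (k, M, G, T) are powers of 1000.
# Fractions (1.5Gi), exponents (1e9), milli (500m), P/E suffixes and leading
# zeros are rejected with a non-zero status.
quantity_to_bytes() {
    q=$1
    case $q in
        *Ki) n=${q%Ki}; m=1024 ;;
        *Mi) n=${q%Mi}; m=1048576 ;;
        *Gi) n=${q%Gi}; m=1073741824 ;;
        *Ti) n=${q%Ti}; m=1099511627776 ;;
        *k)  n=${q%k};  m=1000 ;;
        *M)  n=${q%M};  m=1000000 ;;
        *G)  n=${q%G};  m=1000000000 ;;
        *T)  n=${q%T};  m=1000000000000 ;;
        *)   n=$q;      m=1 ;;
    esac
    # What is left must be plain digits; "0?*" blocks leading zeros, which
    # shell arithmetic could otherwise read as octal.
    case $n in
        ''|*[!0123456789]*|0?*) return 1 ;;
    esac
    echo $((n * m))
}

# Example (MEMORY_REQUEST is a hypothetical variable):
#   bytes=$(quantity_to_bytes "$MEMORY_REQUEST") && [ "$bytes" -lt 2147483648 ] &&
#       echo "WARNING: memory request $MEMORY_REQUEST is below 2Gi"
```

Rejecting unsupported forms keeps a warning from silently comparing against a wrong number; a Helm helper would need the same suffix table.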

5. Monitoring and Alerting

Built-in monitoring for common issues:

# templates/monitoring/servicemonitor.yaml (optional)
{{- if .Values.monitoring.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "antigravity.fullname" . }}
spec:
  selector:
    matchLabels:
      {{- include "antigravity.labels" . | nindent 6 }}
  endpoints:
  - port: metrics
    path: /metrics
    interval: 30s
{{- end }}

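---
# templates/monitoring/metrics-configmap.yaml
# Script that prints metrics in the Prometheus text format. Something in the pod
# (e.g. a small HTTP server) still has to serve its output at /metrics on a
# container port named "metrics" for the ServiceMonitor above to scrape it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "antigravity.fullname" . }}-metrics
data:
  metrics.sh: |
    #!/bin/bash
    # Export basic metrics for monitoring

    # Container health ("instance" is reserved for Prometheus target labels, so use "name")
    echo "devcontainer_up{name=\"{{ .Values.name }}\"} 1"

    # MCP sidecar status
    {{- range $name, $config := .Values.mcp.sidecars }}
    {{- if $config.enabled }}
    if curl -sf --max-time 5 http://localhost:{{ $config.port }}/health >/dev/null 2>&1; then
        echo "devcontainer_mcp_sidecar_up{sidecar=\"{{ $name }}\"} 1"
    else
        echo "devcontainer_mcp_sidecar_up{sidecar=\"{{ $name }}\"} 0"
    fi
    {{- end }}
    {{- end }}

    # Resource usage of this container's cgroup (cgroup v2 files first, then v1).
    # /proc/stat is host-wide and cumulative since boot, so it cannot give container CPU usage.
    if [ -r /sys/fs/cgroup/memory.current ]; then
        echo "devcontainer_memory_usage_bytes $(cat /sys/fs/cgroup/memory.current)"
        # usage_usec is cumulative CPU time; Prometheus derives utilisation with rate()
        echo "devcontainer_cpu_usage_seconds_total $(awk '$1 == "usage_usec" {printf "%.6f", $2 / 1000000}' /sys/fs/cgroup/cpu.stat)"
    else
        echo "devcontainer_memory_usage_bytes $(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)"
        # cpuacct.usage is cumulative CPU time in nanoseconds
        echo "devcontainer_cpu_usage_seconds_total $(awk '{printf "%.6f", $1 / 1000000000}' /sys/fs/cgroup/cpuacct/cpuacct.usage)"
    fi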
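metrics.sh prints bare samples. The Prometheus text exposition format also allows one optional HELP and one TYPE line per metric family, which lets Prometheus record whether a value is a gauge or a counter; TYPE must precede that family's first sample. A small POSIX sh sketch of two hypothetical helpers that respect that rule (one header per family, then any number of samples):

```shell
#!/bin/sh
# metric_header NAME TYPE HELP: print the metadata lines for one metric family.
# Call it once per metric name, before that family's first sample.
metric_header() {
    printf '# HELP %s %s\n' "$1" "$3"
    printf '# TYPE %s %s\n' "$1" "$2"
}

# metric_sample NAME LABELS VALUE: print one sample. LABELS is a ready-made
# 'key="value",...' string (empty for none); label values containing \, " or
# a newline must already be escaped by the caller.
metric_sample() {
    if [ -n "$2" ]; then
        printf '%s{%s} %s\n' "$1" "$2" "$3"
    else
        printf '%s %s\n' "$1" "$3"
    fi
}

# Example: one header, then one sample per sidecar.
#   metric_header devcontainer_mcp_sidecar_up gauge "1 if the MCP sidecar answered its health check."
#   metric_sample devcontainer_mcp_sidecar_up 'sidecar="playwright"' 1
#   metric_sample devcontainer_mcp_sidecar_up 'sidecar="kubernetes"' 0
```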

6. Startup Validation

Init container for pre-startup validation:

# In deployment.yaml
initContainers:
- name: validate-environment
  image: bitnami/kubectl:latest
  command: ['sh', '-c']
  args:
  - |
    set -e
    echo "🔍 Validating deployment environment..."
    
    # Check storage class exists (needs RBAC to get the cluster-scoped storageclasses resource).
    # If the workspace PVC uses this class and it is missing, the PVC stays Pending and the pod,
    # including this init container, never starts.
    {{- if .Values.storage.className }}
    kubectl get storageclass {{ .Values.storage.className }}
    {{- end }}
    
    # Validate secret if specified (needs RBAC to get secrets in the release namespace)
    {{- if .Values.envSecretName }}
    kubectl get secret {{ .Values.envSecretName }}
    {{- end }}
    
    # Check cluster access permissions. kubectl runs as the pod's ServiceAccount,
    # so it checks its own rights directly; --as impersonation would need extra RBAC.
    {{- if ne .Values.clusterAccess "none" }}
    kubectl auth can-i get pods
    {{- end }}
    
    echo "✅ Environment validation passed"
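kubectl get secret only works if the pod's ServiceAccount may read Secrets, which is broad access for a dev container. Also, when envFrom references a Secret that does not exist and is not marked optional, Kubernetes already refuses to start the container (CreateContainerConfigError). A sketch of a check that needs no extra RBAC, assuming the init container is given the same envFrom as the main container: verify that the variables the Secret should provide are set. require_env is a hypothetical name; SSH_AUTHORIZED_KEYS is the variable named in the warnings section.

```shell
#!/bin/sh
# require_env VAR...: report every listed variable that is unset or empty,
# then return non-zero if any were missing.
require_env() {
    missing=0
    for var in "$@"; do
        # Accept only plain ASCII shell identifiers, so the eval below cannot run code.
        case $var in
            ''|[0123456789]*|*[!ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_]*)
                echo "invalid variable name: $var" >&2
                missing=$((missing + 1))
                continue
                ;;
        esac
        # Indirect lookup: value=${NAME:-}
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing required environment variable: $var" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example inside the init container when ssh.enabled is true:
#   require_env SSH_AUTHORIZED_KEYS
```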

Alert Categories

Error Level (Deployment Fails)

  • Required values missing
  • Invalid configuration combinations
  • Resource conflicts
  • Permission issues

Warning Level (Shows in NOTES.txt)

  • Suboptimal resource sizing
  • Missing optional configuration
  • Security recommendations
  • Performance concerns

Info Level (Logged)

  • Configuration recommendations
  • Usage tips
  • Feature suggestions

Implementation Examples

NOTES.txt Enhancement

# templates/NOTES.txt
{{- include "antigravity.configWarnings" . }}

✅ Dev Container deployed successfully!

🌐 Access your environment:
{{- if ne .Values.ide.type "none" }}
   VNC: kubectl port-forward deployment/{{ include "antigravity.fullname" . }} 5800:5800
   URL: http://localhost:5800
{{- end }}

{{- if .Values.ssh.enabled }}
   SSH: kubectl port-forward deployment/{{ include "antigravity.fullname" . }} 2222:22
   Connect: ssh -p 2222 user@localhost
{{- end }}

🔧 Validate deployment:
   helm test {{ .Release.Name }}

📊 Monitor health:
   kubectl logs deployment/{{ include "antigravity.fullname" . }} -f

Benefits

  • Prevent Issues: Catch problems before deployment
  • Faster Resolution: Clear error messages and diagnostics
  • Proactive Monitoring: Detect issues before they impact users
  • Better UX: Helpful warnings and recommendations
  • Operational Excellence: Built-in best practices

Implementation Plan

  • Add template validation helpers
  • Implement runtime health checks
  • Create Helm test suite
  • Add configuration warning system
  • Build monitoring and metrics
  • Create startup validation init container
  • Enhance NOTES.txt with warnings
  • Test validation across different scenarios
  • Document validation features
Reference: farhoodlabs/devcontainer#37