fix: wait for HTTP reachability after rollout in deploy-e2e-headlamp.sh #104
Problem
kubectl rollout status confirms the pod passed its readinessProbe, but Kubernetes Service DNS propagation to the runner pod can lag. This caused intermittent E2E failures. The deployment step showed success, but Playwright's Chromium process launched before DNS was fully propagated.
Fix
Add a poll loop (up to 120s, 5s intervals) after kubectl rollout status that verifies the service URL is reachable via HTTP. The loop exits immediately on first success, so there's no added latency when DNS is already propagated.

This eliminates the race condition between K8s Service DNS propagation and Playwright launch.
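For readers without the diff at hand, the poll loop described above might look roughly like the sketch below. It is not the PR's actual code: the function names probe_url and wait_for_http, the log messages, and the argument defaults are assumptions made for illustration. Only the 120s/5s budget, the curl -sf --max-time 5 probe with output sent to /dev/null, and the non-zero exit on timeout come from the description and reviews.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function names and messages are hypothetical.

# Probe the URL once. -s silences progress output, -f makes curl fail on
# HTTP error responses, --max-time 5 bounds each attempt to five seconds.
probe_url() {
  curl -sf --max-time 5 "$1" > /dev/null
}

# wait_for_http URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 as soon as one probe succeeds, 1 once the budget is spent.
# elapsed counts only the sleep intervals, so the wall-clock bound is
# somewhat larger than TIMEOUT_SECONDS when probes themselves are slow.
wait_for_http() {
  local url="$1" timeout="${2:-120}" interval="${3:-5}" elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if probe_url "$url"; then
      echo "Service reachable at ${url} after ${elapsed}s"
      return 0
    fi
    echo "Waiting for ${url} (${elapsed}s/${timeout}s)..."
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "ERROR: ${url} not reachable after ${timeout}s" >&2
  return 1
}
```

In a deploy script this would be called right after the rollout check, for example wait_for_http "$SERVICE_URL" || exit 1, where SERVICE_URL is a placeholder for whatever variable holds the service address. Because the first probe runs before any sleep, a service that is already reachable adds no delay.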
Testing
main will continue to pass
feat/dual-approval-status-check (PRI-687) should be resolved

cc @cpfarhood
QA Review: PR #104 ✓
Testing:
Code Review:
No regressions detected. Script behavior is unchanged for the happy path (service already reachable).
Approving.
CTO Review: Approved.
DNS propagation race after kubectl rollout status is a well-known K8s gotcha: the readiness probe passing doesn't mean the Service DNS has propagated to all pods in the cluster. Polling HTTP reachability before handing off to Playwright is the correct fix.

Implementation is clean: 120s timeout at 5s intervals is reasonable, curl -sf --max-time 5 with /dev/null redirect avoids noise, exit 1 on timeout is correct. No added latency on the happy path since the loop exits immediately on first success.

This fixes the intermittent E2E failures blocking feat/dual-approval-status-check. CI and E2E both green.

QA Review: Approved
Fix is correct. Handles the DNS propagation race condition with proper polling logic (120s max, 5s intervals). CI and E2E Tests passed.