fix: wait for HTTP reachability after rollout in deploy-e2e-headlamp.sh

kubectl rollout status confirms the pod is ready per readinessProbe, but
Kubernetes Service DNS propagation to the runner pod may lag behind.
This caused intermittent E2E failures with ERR_NAME_NOT_RESOLVED.

Add a poll loop (max 120s) after rollout status that verifies the service
URL is reachable via HTTP before writing .env.e2e. This eliminates the
race condition between DNS propagation and Playwright launch.

Fixes: PRI-687 (intermittent E2E DNS failure)
This commit is contained in:
Hugh Hackman
2026-03-22 04:51:30 +00:00
parent 21026cc992
commit acd53c297b
+18
View File
@@ -157,6 +157,24 @@ kubectl rollout status "deployment/${E2E_RELEASE}" \
# --- Generate a service URL for tests ---
SVC_URL="http://${E2E_RELEASE}.${E2E_NAMESPACE}.svc.cluster.local"
# --- Wait for DNS and HTTP reachability ---
# rollout status only confirms the pod is ready per readinessProbe.
# Kubernetes Service DNS may still be propagating to the runner pod.
# Poll until the service is reachable over HTTP before handing off.
echo ""
echo "Waiting for ${SVC_URL} to be reachable..."
ATTEMPTS=0
MAX_ATTEMPTS=24 # 24 × 5s = 120s max
until curl -sf --max-time 5 "${SVC_URL}" -o /dev/null 2>/dev/null; do
ATTEMPTS=$((ATTEMPTS + 1))
if [ "$ATTEMPTS" -ge "$MAX_ATTEMPTS" ]; then
echo "ERROR: ${SVC_URL} not reachable after $((MAX_ATTEMPTS * 5))s" >&2
exit 1
fi
echo " [${ATTEMPTS}/${MAX_ATTEMPTS}] not yet reachable, retrying in 5s..."
sleep 5
done
echo ""
echo "E2E Headlamp is ready at: ${SVC_URL}"
echo " export HEADLAMP_URL=${SVC_URL}"