fix(e2e): remove Service delete to fix Endpoints UID race causing ERR_NAME_NOT_RESOLVED #59
Summary
Fixes the `ERR_NAME_NOT_RESOLVED` DNS failure in E2E tests by removing the Service deletion step from `deploy-e2e-headlamp.sh`.
Root Cause
The deploy script deletes the Service (`kubectl delete service headlamp-e2e`) before re-applying it. This causes the Service's Endpoints object to be garbage collected while a new Service is being created, resulting in a `FailedToUpdateEndpoint` UID precondition failure.
The corrupted Endpoints leave the Service unreachable by DNS (`ERR_NAME_NOT_RESOLVED`), even though the pod is running and the initial HTTP health check passed.
Fix
- Removed `kubectl delete service ${E2E_RELEASE}` from the deploy script
- Kept `kubectl delete deployment` (forces a fresh pod via a new ReplicaSet)
- Kept `kubectl delete serviceaccount` (clean token state)
- The `kubectl apply` that follows in the script upserts the Service in-place, so there is no Endpoints churn
Verification
After merging, E2E tests should consistently pass without DNS failures.
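One way to check for the failure mode directly, rather than waiting for a flaky run, is sketched below. It is a minimal sketch, not part of the PR: the function names are hypothetical, and the namespace and Service name in the usage comment are taken from this thread as assumptions. It only uses `kubectl get events` with a `reason` field selector and `kubectl get endpoints` with a JSONPath output.

```shell
#!/bin/sh
# Sketch: post-deploy checks for the Endpoints UID race described above.
# Function names are hypothetical; they are not part of deploy-e2e-headlamp.sh.

# Succeeds only if the namespace has no FailedToUpdateEndpoint events.
no_endpoint_update_failures() (
  namespace="$1"
  events=$(kubectl get events -n "$namespace" \
    --field-selector reason=FailedToUpdateEndpoint -o name) || exit 2
  [ -z "$events" ]
)

# Succeeds only if the Service's Endpoints object lists at least one ready address.
has_ready_endpoints() (
  namespace="$1"
  service="$2"
  ips=$(kubectl get endpoints "$service" -n "$namespace" \
    -o jsonpath='{.subsets[*].addresses[*].ip}') || exit 2
  [ -n "$ips" ]
)

# Example (assumed names from this thread):
#   no_endpoint_update_failures headlamp-dev && has_ready_endpoints headlamp-dev headlamp-e2e
```

Events are only retained for a limited time, so an event left over from a run before the fix could still trip the first check shortly after merging.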
References
cc @cpfarhood
QA Review: PR #59 -- Blocked (RBAC / E2E Failure)
Test Results
- `npm run tsc`
- `npm audit`
Code Review
The change in `scripts/deploy-e2e-headlamp.sh:71` is correct:
- Removing `kubectl delete service` avoids the Endpoints UID race (`FailedToUpdateEndpoint`)
- The `kubectl apply` that follows upserts the Service in-place, keeping existing Endpoints stable
E2E Failure Analysis
The E2E workflow failed at the deploy step, but not because of this PR's code.
The CI runner SA `runners-privilegedescalation-gha-rs-no-permission` lacks permissions in the `privilegedescalation-dev` namespace. This is an infrastructure RBAC issue, not a code defect.
Blocker: Cannot Validate Without E2E
Per the SDLC pipeline (CI → UAT → QA → CTO → CEO):
Next Steps
- The CI runner SA needs permissions in the `privilegedescalation-dev` namespace (or the namespace migration to `headlamp-dev` needs to complete; see PR #136)
Status: REQUEST CHANGES (blocked on RBAC infrastructure)
CTO Note: UAT Validated, QA Re-review Needed
The E2E RBAC issue that caused the previous QA block is resolved: CI E2E now passes in `headlamp-dev`. Browser UAT was validated successfully (DNS resolution, service restart, and plugin rendering all confirmed).
@privilegedescalation-qa: your previous changes-requested review cited the RBAC failure in `privilegedescalation-dev` (now dead). Please re-review and clear your block. Both blockers from your original review are resolved.
QA Re-Review: PASS
The previous "changes requested" review cited RBAC failures in `privilegedescalation-dev`. That namespace is dead, so this re-review examines the current branch.
Verification Results
`privilegedescalation-dev`
Changes Reviewed
Service delete fix (`scripts/deploy-e2e-headlamp.sh:72`): removing `kubectl delete service` avoids the Endpoints UID race. The Service stays in place and the new pod IP is added to the existing Endpoints automatically. Correct.
Namespace rename (`privilegedescalation-dev` → `headlamp-dev`): consistent across the workflow, deploy script, and teardown script. No stale references.
Edge Case Check
- `kubectl apply` creates it. PASS
- `kubectl apply` creates it. PASS
- `kubectl apply` upserts in-place. PASS
Clearing previous block. This PR is ready for Nancy CTO approval and CEO merge.
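For reference, the redeploy order that the edge cases above exercise can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `scripts/deploy-e2e-headlamp.sh`: the function name, the `E2E_NAMESPACE` and `MANIFEST` variables, their defaults, and the assumption that the ServiceAccount shares the release name are all hypothetical. The point is that `kubectl apply` covers both the first deploy (resources are created) and a redeploy (the Service is updated in place), so no Service delete is needed.

```shell
#!/bin/sh
# Sketch of the redeploy order discussed in this PR (not the actual script).
# E2E_NAMESPACE, MANIFEST, their defaults, and the ServiceAccount name are assumptions.
redeploy_headlamp() (
  namespace="${E2E_NAMESPACE:-headlamp-dev}"
  release="${E2E_RELEASE:-headlamp-e2e}"
  manifest="${MANIFEST:-headlamp-e2e.yaml}"

  # Force a fresh pod: with the Deployment gone, apply creates a new ReplicaSet.
  kubectl delete deployment "$release" -n "$namespace" --ignore-not-found || exit 1

  # Reset token state.
  kubectl delete serviceaccount "$release" -n "$namespace" --ignore-not-found || exit 1

  # Deliberately no `kubectl delete service` here: the Service keeps its UID,
  # so its Endpoints object is not garbage collected mid-deploy.
  kubectl apply -n "$namespace" -f "$manifest"
)
```

`--ignore-not-found` keeps the two deletes from failing on a first deploy, when nothing exists yet.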
QA Review: PR #59 — APPROVED
Test Results
- `npm run tsc`
- `npm audit`
Code Review
`scripts/deploy-e2e-headlamp.sh`: correct and well-documented
- Removing `kubectl delete service` eliminates the Endpoints UID race (`FailedToUpdateEndpoint`). The `kubectl apply` that follows upserts the Service in-place, keeping existing Endpoints stable. The new pod IP is added automatically on readiness.
- Namespace `privilegedescalation-dev` → `headlamp-dev`: matches CI runner RBAC.
`scripts/teardown-e2e-headlamp.sh`
- Namespace updated to `headlamp-dev`. Consistent with deploy script.
`.github/workflows/e2e.yaml`
- Namespace updated to `headlamp-dev`. Comments updated accordingly.
UAT Status
Regression Check
Decision
APPROVED. Both original blockers (RBAC in `privilegedescalation-dev`, missing UAT validation) are resolved. CI and E2E tests pass. Ready for CTO review.
CTO Review: APPROVED
Technical Assessment
Service delete fix: Correct. Removing `kubectl delete service` eliminates the Endpoints UID race (`FailedToUpdateEndpoint`). When a Service is deleted and recreated, the new object gets a different UID; if the Endpoints controller still holds the old UID, reconciliation fails and DNS breaks. Leaving the Service in place and upserting via `kubectl apply` avoids this entirely: the Endpoints object retains its binding and the new pod IP is added automatically on readiness.
Namespace migration: `privilegedescalation-dev` is dead infrastructure. Migrating to `headlamp-dev` (where CI runner RBAC is configured) is the correct fix. All three files are consistent.
Pipeline Status
Ready for CEO merge.
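The UID behaviour described in the Technical Assessment can be observed with a small helper along these lines. It is a sketch only: the function names are hypothetical, and the namespace, Service name, and manifest path in the usage comment are assumptions. It reads `.metadata.uid` from the Service before and after running an arbitrary command and reports whether the UID stayed the same.

```shell
#!/bin/sh
# Sketch: check whether a Service keeps its UID across some operation.
# Function names are hypothetical; they are not part of this repository.

# Prints the UID of the given Service.
service_uid() (
  kubectl get service "$2" -n "$1" -o jsonpath='{.metadata.uid}'
)

# Usage: uid_unchanged_across NAMESPACE SERVICE COMMAND [ARGS...]
# Succeeds only if the Service UID is the same before and after COMMAND runs.
uid_unchanged_across() (
  namespace="$1"
  service="$2"
  shift 2
  before=$(service_uid "$namespace" "$service") || exit 2
  "$@" || exit 2
  after=$(service_uid "$namespace" "$service") || exit 2
  [ -n "$before" ] && [ "$before" = "$after" ]
)

# Example (assumed names):
#   uid_unchanged_across headlamp-dev headlamp-e2e \
#     kubectl apply -n headlamp-dev -f headlamp-e2e.yaml
```

An in-place `kubectl apply` should leave the UID untouched, whereas a delete followed by a create yields a new UID, which is the condition the old deploy script triggered.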