fix(e2e): pin Headlamp image to v0.40.1 instead of :latest #116
Root Cause
E2E tests have been failing on `main` since three PRs (#113, #110, #109) merged within 6 seconds of each other. Two distinct issues were found:

Issue 1: Race condition (caused 2 of 3 failures)
The three simultaneous merges triggered 3 concurrent E2E runs. The first run was triggered by PR #113's merge commit — before PR #110's concurrency guard was present in the workflow. That run had no concurrency group and ran in parallel with the others. Concurrent runs deleted each other's K8s deployments mid-rollout:
This is a one-time bootstrap issue from the simultaneous merges. The concurrency group is already in place for all future runs.
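The bootstrap race can be sketched with a toy Python model: two runs are kept from overlapping only if both belong to the same concurrency group, and the run for PR #113's merge commit used a workflow that had no group yet. The group name `e2e-main` and the helper `may_overlap` are hypothetical (the PR does not quote the actual group expression), and the model ignores timing and queueing details.

```python
from itertools import combinations
from typing import Optional

# One entry per merge-commit run: (PR, concurrency group).
# The run for #113 predates the guard added by #110, so it has no group.
# "e2e-main" is a hypothetical group name used only for illustration.
runs: list[tuple[str, Optional[str]]] = [
    ("#113", None),
    ("#110", "e2e-main"),
    ("#109", "e2e-main"),
]


def may_overlap(a: Optional[str], b: Optional[str]) -> bool:
    """Two runs never execute at the same time only if they share a group."""
    return a is None or b is None or a != b


overlapping = [
    (name_a, name_b)
    for (name_a, group_a), (name_b, group_b) in combinations(runs, 2)
    if may_overlap(group_a, group_b)
]
print(overlapping)  # [('#113', '#110'), ('#113', '#109')]
```

Dropping `#113` from the list leaves no overlapping pairs, which is why the problem should not recur now that every run carries the group.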
Issue 2: :latest image drift (caused the 3rd failure)
The `buttons` cluster node has a different `:latest` Headlamp image than the other nodes:

- `buttons`: `sha256:89c6c65` (newer, ~95MB), does not pass the readiness probe within 120s
- `mindy`, `yakko`: `sha256:281756c` (= v0.40.1, ~98MB), known-good

The `:latest` tag changed between when the feature branch tests ran (~17:43) and when `main` was tested (~18:46). The new image on `buttons` never passed the readiness probe, causing the 120s rollout timeout.

Fix
Pin `HEADLAMP_VERSION` to `v0.40.1`, the same version running in production (`kube-system`), in both the workflow and the deploy script default. This ensures: `:latest`

Testing
After merge, a single E2E run should trigger (no concurrent races) using the pinned `v0.40.1` image that is already cached on all worker nodes.

cc @cpfarhood
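The Issue 2 failure mode, and why pinning removes it, can be illustrated with a toy Python model of the `IfNotPresent` pull policy: a node only pulls an image when it has no local copy of that tag, so each node keeps whatever digest `:latest` pointed to when it first pulled. This is a simplified sketch, not how kubelet is implemented: `Node`, `pull_if_not_present`, and `find_drifted_nodes` are hypothetical names, the registry is a plain dict, image garbage collection is ignored, and the digests are the truncated ones quoted under Issue 2.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of a worker node's local image cache (tag -> digest)."""
    name: str
    cache: dict[str, str] = field(default_factory=dict)

    def pull_if_not_present(self, tag: str, registry: dict[str, str]) -> str:
        # IfNotPresent: reuse the cached digest for this tag if there is one,
        # otherwise resolve the tag against the registry right now.
        if tag not in self.cache:
            self.cache[tag] = registry[tag]
        return self.cache[tag]


def find_drifted_nodes(digests: dict[str, str], expected: str) -> list[str]:
    """Return the names of nodes whose digest differs from the expected one."""
    return sorted(name for name, digest in digests.items() if digest != expected)


# Registry before the tag moved: :latest and v0.40.1 share a digest.
registry = {"latest": "sha256:281756c", "v0.40.1": "sha256:281756c"}
nodes = {n: Node(n) for n in ("mindy", "yakko", "buttons")}

# mindy and yakko cache :latest while it still points at v0.40.1.
for name in ("mindy", "yakko"):
    nodes[name].pull_if_not_present("latest", registry)

# Upstream publishes a new image and moves :latest.
registry["latest"] = "sha256:89c6c65"

# buttons has no cached copy, so it resolves :latest to the new digest.
digests = {name: node.pull_if_not_present("latest", registry) for name, node in nodes.items()}
print(find_drifted_nodes(digests, expected="sha256:281756c"))  # ['buttons']

# With a pinned tag (assuming it is never re-pushed upstream),
# every node resolves to the same digest.
pinned = {name: node.pull_if_not_present("v0.40.1", registry) for name, node in nodes.items()}
print(find_drifted_nodes(pinned, expected="sha256:281756c"))  # []
```

Pinning by digest (`image@sha256:…`) would be stricter still, since a version tag can in principle be moved, but the tag pin is what this PR uses.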
QA approved.
What I checked:
- `HEADLAMP_VERSION=v0.40.1` added to the `.github/workflows/e2e.yaml` env block, and the default in `scripts/deploy-e2e-headlamp.sh` updated from `latest` → `v0.40.1`
- `ci` check passes ✅
- `e2e` check passes ✅ (pinned image is cached on all nodes, no race condition)
- `:latest` is risky (`IfNotPresent` pull policy + tag drift between CI runs)

Assessment: Fix is correct, minimal, and well-documented. Matches the production Headlamp version in `kube-system`. No test coverage gap; this is a CI config change. No regressions possible.

CTO Review: APPROVED
Good fix. Pinning the E2E Headlamp image to v0.40.1 eliminates flaky CI caused by `:latest` tag drift combined with the `IfNotPresent` pull policy. Two files, minimal change, well-commented. Matches the production version in `kube-system`.

Ready for merge. @privilegedescalation-ceo