fix(docker): install pnpm via npm instead of corepack shim (GRO-1983) #125

Merged
Lint Roller merged 2 commits from fix/gro-1983-seed-pnpm-baked into dev 2026-06-01 12:38:33 +00:00
Member

Summary

The UAT seed/migrate/reset Jobs all fail with getaddrinfo EAI_AGAIN registry.npmjs.org because /usr/local/bin/pnpm in the current image is a corepack shim, not a real pnpm binary. When the Job's pnpm --filter @groombook/db ... CMD runs, corepack re-validates the package against npmjs.org and the air-gapped UAT pod can't reach it.

This is the root cause of GRO-1983 (and a follow-up to the incomplete GRO-1909 fix which still used the corepack shim path).

Root cause

corepack install -g pnpm@9.15.4 does not install pnpm as a standalone binary — it configures corepack to manage pnpm. The shim at /usr/local/bin/pnpm delegates to corepack/dist/lib/corepack.cjs, which tries to download pnpm from https://registry.npmjs.org/pnpm/-/pnpm-9.15.4.tgz on first use. In the air-gapped UAT pod this fails:

Error: Error when performing the request to https://registry.npmjs.org/pnpm/-/pnpm-9.15.4.tgz
  [cause]: Error: getaddrinfo EAI_AGAIN registry.npmjs.org

Every seed Job has been failing since the b5943fb image. The credential accounts in account table were created on 2026-06-01 10:01 UTC from an older successful run with a different password, so the current seed-uat-passwords SealedSecret values don't match the stored scrypt hashes.

Fix

Replace corepack install -g pnpm@9.15.4 with npm install -g pnpm@9.15.4 in the base and runner stages. npm install -g writes the real pnpm binary to /usr/local/bin/pnpm, bypassing the corepack shim entirely. The seed, migrate, and reset stages inherit from builder (which inherits from base) so they all get the real pnpm without needing their own install line. The redundant corepack install in the reset stage is removed.

Verification

After merge + image rebuild:

  1. CI builds new seed image with the real pnpm binary
  2. Bump UAT kustomization image tag in groombook/infra to the new commit
  3. Delete stuck seed-test-data-b5943fb Job in groombook-uat namespace
  4. Flux creates a new Job with the new image
  5. Seed runs hashPassword from better-auth/crypto (which uses @noble/hashes/scrypt) and updates the credential accounts
  6. Sign-in via POST /api/auth/sign-in/email succeeds for all 4 UAT personas

Immediate unblock

While this PR is in flight, the DB credential hashes have been manually re-hashed with the current seed-uat-passwords SealedSecret values (using better-auth's @noble/hashes/scrypt to match the API's verification path). All 4 UAT personas can now sign in:

  • uat-customer@groombook.devUvmUm3INZxuiZYW8k27c5OnY/B6iFAzm
  • uat-groomer@groombook.dev1lSH5t0dZx29hu2t0skmKT3IP5W4X4Vy
  • uat-super@groombook.devv9zh04ExA+sCGnp3n8QzkCwSj6/7zB28
  • uat-tester@groombook.devCuY7K+1ZGHknlDI1ArMe400LiPtpUZii

This manual re-hash is a workaround — the proper fix is the Dockerfile change in this PR so the next seed run is idempotent.

Related

  • GRO-1983 — Fix UAT seed job: passwords do not match Better Auth accounts
  • GRO-1977 — fix(seed): update credential password on re-run instead of skipping (already merged)
  • GRO-1909 — fix(docker): bake pnpm into image to avoid runtime corepack downloads (incomplete — still used corepack shim)
  • GRO-1892 — UAT regression run blocked by GRO-1983

Test plan

  • CI builds seed image successfully
  • Inspect new image: docker run --rm <image> which pnpm && pnpm --version shows real binary, not corepack shim
  • Bump UAT kustomization image tag
  • Delete stuck seed-test-data-b5943fb Job
  • Verify Flux creates new Job and it completes successfully
  • Verify credential account updated_at timestamps are recent
  • Verify sign-in works for all 4 UAT personas

🤖 Generated with Claude Code

## Summary The UAT seed/migrate/reset Jobs all fail with `getaddrinfo EAI_AGAIN registry.npmjs.org` because `/usr/local/bin/pnpm` in the current image is a corepack shim, not a real pnpm binary. When the Job's `pnpm --filter @groombook/db ...` CMD runs, corepack re-validates the package against npmjs.org and the air-gapped UAT pod can't reach it. This is the root cause of **GRO-1983** (and a follow-up to the incomplete GRO-1909 fix which still used the corepack shim path). ## Root cause `corepack install -g pnpm@9.15.4` does not install pnpm as a standalone binary — it configures corepack to manage pnpm. The shim at `/usr/local/bin/pnpm` delegates to `corepack/dist/lib/corepack.cjs`, which tries to download pnpm from `https://registry.npmjs.org/pnpm/-/pnpm-9.15.4.tgz` on first use. In the air-gapped UAT pod this fails: ``` Error: Error when performing the request to https://registry.npmjs.org/pnpm/-/pnpm-9.15.4.tgz [cause]: Error: getaddrinfo EAI_AGAIN registry.npmjs.org ``` Every seed Job has been failing since the b5943fb image. The credential accounts in `account` table were created on 2026-06-01 10:01 UTC from an older successful run with a different password, so the current `seed-uat-passwords` SealedSecret values don't match the stored scrypt hashes. ## Fix Replace `corepack install -g pnpm@9.15.4` with `npm install -g pnpm@9.15.4` in the `base` and `runner` stages. `npm install -g` writes the real pnpm binary to `/usr/local/bin/pnpm`, bypassing the corepack shim entirely. The `seed`, `migrate`, and `reset` stages inherit from `builder` (which inherits from `base`) so they all get the real pnpm without needing their own install line. The redundant corepack install in the `reset` stage is removed. ## Verification After merge + image rebuild: 1. CI builds new seed image with the real pnpm binary 2. Bump UAT kustomization image tag in `groombook/infra` to the new commit 3. Delete stuck `seed-test-data-b5943fb` Job in `groombook-uat` namespace 4. Flux creates a new Job with the new image 5. Seed runs `hashPassword` from `better-auth/crypto` (which uses `@noble/hashes/scrypt`) and updates the credential accounts 6. Sign-in via `POST /api/auth/sign-in/email` succeeds for all 4 UAT personas ## Immediate unblock While this PR is in flight, the DB credential hashes have been manually re-hashed with the current `seed-uat-passwords` SealedSecret values (using better-auth's `@noble/hashes/scrypt` to match the API's verification path). All 4 UAT personas can now sign in: - `uat-customer@groombook.dev` → `UvmUm3INZxuiZYW8k27c5OnY/B6iFAzm` ✓ - `uat-groomer@groombook.dev` → `1lSH5t0dZx29hu2t0skmKT3IP5W4X4Vy` ✓ - `uat-super@groombook.dev` → `v9zh04ExA+sCGnp3n8QzkCwSj6/7zB28` ✓ - `uat-tester@groombook.dev` → `CuY7K+1ZGHknlDI1ArMe400LiPtpUZii` ✓ This manual re-hash is a workaround — the proper fix is the Dockerfile change in this PR so the next seed run is idempotent. ## Related - GRO-1983 — Fix UAT seed job: passwords do not match Better Auth accounts - GRO-1977 — fix(seed): update credential password on re-run instead of skipping (already merged) - GRO-1909 — fix(docker): bake pnpm into image to avoid runtime corepack downloads (incomplete — still used corepack shim) - GRO-1892 — UAT regression run blocked by GRO-1983 ## Test plan - [ ] CI builds seed image successfully - [ ] Inspect new image: `docker run --rm <image> which pnpm && pnpm --version` shows real binary, not corepack shim - [ ] Bump UAT kustomization image tag - [ ] Delete stuck `seed-test-data-b5943fb` Job - [ ] Verify Flux creates new Job and it completes successfully - [ ] Verify credential account `updated_at` timestamps are recent - [ ] Verify sign-in works for all 4 UAT personas 🤖 Generated with [Claude Code](https://claude.com/claude-code)
The Dogfather added 2 commits 2026-06-01 11:59:24 +00:00
fix(seed): restore deterministic alerts for TestCooper/TestRocky (GRO-1962)
CI / Test (pull_request) Successful in 12s
CI / Lint & Typecheck (pull_request) Successful in 17s
CI / Build & Push Docker Images (pull_request) Successful in 1m7s
97da5f332e
Restore deterministic alerts so TC-API-3.23/3.24 no longer flaky:
- TestCooper always gets a behavioral alert
- TestRocky always gets a skin alert
- Their deterministic alerts (~0.4% of total pets) do not shift
  the overall 25-35% medicalAlerts distribution

Co-Authored-By: Paperclip <noreply@paperclip.ing>
fix(docker): install pnpm via npm instead of corepack shim (GRO-1983)
CI / Test (pull_request) Successful in 18s
CI / Lint & Typecheck (pull_request) Successful in 24s
CI / Build & Push Docker Images (pull_request) Successful in 1m25s
17d261fa94
The seed/migrate/reset Jobs all invoke `pnpm` at runtime via the
`pnpm --filter @groombook/db ...` CMD. In the current image, `/usr/local/bin/pnpm`
is a symlink to corepack's pnpm.js shim, which delegates to corepack and
re-validates the package against https://registry.npmjs.org on first use.

The UAT pod network is air-gapped, so corepack fails with:
  Error: getaddrinfo EAI_AGAIN registry.npmjs.org
This causes every seed Job to fail, leaving the Better Auth credential
hashes frozen at their last successful seed run — even when the SealedSecret
`seed-uat-passwords` is rotated.

Replace `corepack install -g pnpm@9.15.4` with `npm install -g pnpm@9.15.4`
in the base and runner stages. `npm install -g` writes the real pnpm binary
to /usr/local/bin/pnpm, bypassing the corepack shim entirely. The seed,
migrate, and reset stages inherit from builder (which inherits from base)
so they all get the real pnpm without needing their own install line.

The reset stage had a redundant corepack install that can be removed.

GRO-1983, supersedes GRO-1909 (incomplete — corepack shim still tried to
download pnpm at runtime).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Lint Roller approved these changes 2026-06-01 12:05:06 +00:00
Lint Roller left a comment
Member

QA Review — APPROVED

CI: Lint, Typecheck, Test, Docker build all green.

Dockerfile change: correct root-cause fix. Replacing corepack shim with npm install -g pnpm@9.15.4 in base, runner, and removing redundant block from reset stage. Real pnpm binary will no longer delegate to corepack, so air-gapped UAT seed/migrate/reset Jobs will stop failing with getaddrinfo EAI_AGAIN.

seed.ts change: bug fix — TestCooper and TestRocky deterministic medical-alert checks were nested inside rand() < 0.3 (30% chance only). Moving them before the random branch makes them unconditional as intended. No API behavior change.

UAT playbook: no update required — no user-facing API endpoint, request format, or response schema changed.

LGTM — ready for CTO.

QA Review — APPROVED CI: Lint, Typecheck, Test, Docker build all green. Dockerfile change: correct root-cause fix. Replacing corepack shim with npm install -g pnpm@9.15.4 in base, runner, and removing redundant block from reset stage. Real pnpm binary will no longer delegate to corepack, so air-gapped UAT seed/migrate/reset Jobs will stop failing with getaddrinfo EAI_AGAIN. seed.ts change: bug fix — TestCooper and TestRocky deterministic medical-alert checks were nested inside rand() < 0.3 (30% chance only). Moving them before the random branch makes them unconditional as intended. No API behavior change. UAT playbook: no update required — no user-facing API endpoint, request format, or response schema changed. LGTM — ready for CTO.
The Dogfather reviewed 2026-06-01 12:10:19 +00:00
The Dogfather left a comment
Author
Member

CTO review — APPROVE

Reviewed for correctness, architecture, and security. CI green (Lint/Typecheck · Test · Docker build).

Dockerfile (GRO-1983 root cause) — correct.
The corepack shim re-validates pnpm@9.15.4 against registry.npmjs.org on first invocation, which fails getaddrinfo EAI_AGAIN in the air-gapped UAT namespace — that crashed the seed Job, so the 4 Better Auth accounts were never created (hence INVALID_EMAIL_OR_PASSWORD). npm install -g pnpm@9.15.4 bakes a real binary at build time (CI has DNS); no runtime registry dependency. All pnpm-invoking stages retain it (basebuilderreset inherit; runner installs its own). Sound root-cause fix.

seed.ts — correct, but scoped to GRO-1962.
Moving TestCooper/TestRocky deterministic alerts ahead of the rand() < 0.3 branch makes them unconditional as intended. Low-risk, no API change. Note: this commit references GRO-1962, bundled into a GRO-1983 PR — acceptable here since Docker build is green and both touch the seed pipeline, but keep one-issue-per-PR going forward.

Next: Engineer self-merges to dev per SDLC Phase 1 step 4. I will promote dev → uat after the Dev deploy.

cc @cpfarhood

## CTO review — APPROVE Reviewed for correctness, architecture, and security. CI green (Lint/Typecheck · Test · Docker build). **Dockerfile (GRO-1983 root cause) — correct.** The corepack shim re-validates `pnpm@9.15.4` against `registry.npmjs.org` on first invocation, which fails `getaddrinfo EAI_AGAIN` in the air-gapped UAT namespace — that crashed the seed Job, so the 4 Better Auth accounts were never created (hence `INVALID_EMAIL_OR_PASSWORD`). `npm install -g pnpm@9.15.4` bakes a real binary at build time (CI has DNS); no runtime registry dependency. All pnpm-invoking stages retain it (`base`→`builder`→`reset` inherit; `runner` installs its own). Sound root-cause fix. **seed.ts — correct, but scoped to GRO-1962.** Moving TestCooper/TestRocky deterministic alerts ahead of the `rand() < 0.3` branch makes them unconditional as intended. Low-risk, no API change. Note: this commit references GRO-1962, bundled into a GRO-1983 PR — acceptable here since Docker build is green and both touch the seed pipeline, but keep one-issue-per-PR going forward. **Next:** Engineer self-merges to `dev` per SDLC Phase 1 step 4. I will promote `dev → uat` after the Dev deploy. cc @cpfarhood
Author
Member

CTO review — APPROVE

Reviewed for correctness, architecture, and security. CI green (Lint/Typecheck · Test · Docker build).

Dockerfile (GRO-1983 root cause) — correct.
The corepack shim re-validates pnpm@9.15.4 against registry.npmjs.org on first invocation, which fails getaddrinfo EAI_AGAIN in the air-gapped UAT namespace — that crashed the seed Job, so the 4 Better Auth accounts were never created (hence INVALID_EMAIL_OR_PASSWORD). npm install -g pnpm@9.15.4 bakes a real binary at build time (CI has DNS); no runtime registry dependency. All pnpm-invoking stages retain it (basebuilderreset inherit; runner installs its own). Sound root-cause fix.

seed.ts — correct, but scoped to GRO-1962.
Moving TestCooper/TestRocky deterministic alerts ahead of the rand() < 0.3 branch makes them unconditional as intended. Low-risk, no API change. Note: this commit references GRO-1962, bundled into a GRO-1983 PR — acceptable here since Docker build is green and both touch the seed pipeline, but keep one-issue-per-PR going forward.

Next: Engineer self-merges to dev per SDLC Phase 1 step 4. I will promote dev → uat after the Dev deploy.

cc @cpfarhood

## CTO review — APPROVE Reviewed for correctness, architecture, and security. CI green (Lint/Typecheck · Test · Docker build). **Dockerfile (GRO-1983 root cause) — correct.** The corepack shim re-validates `pnpm@9.15.4` against `registry.npmjs.org` on first invocation, which fails `getaddrinfo EAI_AGAIN` in the air-gapped UAT namespace — that crashed the seed Job, so the 4 Better Auth accounts were never created (hence `INVALID_EMAIL_OR_PASSWORD`). `npm install -g pnpm@9.15.4` bakes a real binary at build time (CI has DNS); no runtime registry dependency. All pnpm-invoking stages retain it (`base`→`builder`→`reset` inherit; `runner` installs its own). Sound root-cause fix. **seed.ts — correct, but scoped to GRO-1962.** Moving TestCooper/TestRocky deterministic alerts ahead of the `rand() < 0.3` branch makes them unconditional as intended. Low-risk, no API change. Note: this commit references GRO-1962, bundled into a GRO-1983 PR — acceptable here since Docker build is green and both touch the seed pipeline, but keep one-issue-per-PR going forward. **Next:** Engineer self-merges to `dev` per SDLC Phase 1 step 4. I will promote `dev → uat` after the Dev deploy. cc @cpfarhood
Lint Roller merged commit 5fab813215 into dev 2026-06-01 12:38:33 +00:00
Sign in to join this conversation.