fix(GRO-2123): serialize seed.ts with Postgres advisory lock #155

Merged
Flea Flicker merged 1 commits from flea-flicker/gro-2123-seed-advisory-lock into dev 2026-06-04 11:23:41 +00:00
Member

Summary

Fixes the non-deterministic FK 23503 on invoice_tip_splits in the UAT reset-demo-data CronJob (GRO-2123).

Root cause

packages/db/src/seed.ts does TRUNCATE … invoices, invoice_tip_splits … then thousands of inserts with no enclosing transaction and no concurrency lock. The in-loop flush order is correct, so a single isolated run cannot produce this FK violation. But all three reset-demo-data CronJob overlays used concurrencyPolicy: Replace on an hourly schedule: a new pod can start while the previous one is still mid-seed, and the new pod's TRUNCATE invoices deletes rows the old pod is still inserting against. Hence the in-flight invoice_tip_splits insert references an invoice that was just truncated away → code: '23503', non-deterministic across nodes (wakko/mindy).

Fix (this PR — api)

Acquire a session-level pg_advisory_lock for the full duration of seed() so overlapping invocations block then proceed in order.

Critical implementation note (architectural correctness): with postgres-js connection pooling, a session-level pg_advisory_lock acquired on one pooled connection and released on a different one is a no-op (the lock is bound to the session / pg-backend that took it). We therefore:

  1. Reserve a dedicated connection from the pool via client.reserve().
  2. Run pg_advisory_lock(KEY) on that reserved connection (before the TRUNCATE).
  3. Run the existing seed on the pooled db (no per-insert connection changes).
  4. Release the lock on the same reserved connection in a try/finally so a thrown seed error cannot leak the lock or the connection.
  5. End the pooled client (client.end()) after the lock releases — releasing a reserved connection back to a closed pool is a no-op that confuses the cleanup.

SEED_ADVISORY_LOCK_KEY = 0x47524f4f — ASCII "GROO", stable, documented in the file. The single-argument pg_advisory_lock(int) form is used (postgres-js serializes int as a plain number; no bigint type plumbing required).

Defense in depth (companion PR)

groombook/infra PR #TBD: concurrencyPolicy: Replace → Forbid on all three reset-demo-data CronJob overlays (dev/uat/prod). Forbid skips a new scheduled run while one is still active, removing the cross-schedule overlap vector entirely.

Acceptance criteria

  • PR 1 (api — this PR) opens against dev.
  • PR 2 (infra) merges first so the Forbid policy is in place when the rebuilt reset image deploys.
  • After merge, reset image rebuilds and deploys to UAT via the existing Flux pipeline.
  • Backend smoke (in groombook-uat): kubectl create job --from=cronjob/reset-demo-data verify-gro2123 -n groombook-uat — pod reaches Completed, logs show ✓ Acquired seed advisory lock and ✓ Released seed advisory lock from seed.ts, no code: '23503' / invoice_tip_splits FK error.
  • Bonus: back-to-back manual jobs prove the lock serializes (second one blocks on the first).
  • No regression to existing seed counts/output lines (500 clients, ~4000 invoices, same ✓ Created N lines).

UAT Playbook

UAT_PLAYBOOK.md §3.29 documents the regression check.

Files changed

  • packages/db/src/seed.ts — add SEED_ADVISORY_LOCK_KEY, withSeedAdvisoryLock helper, extract runSeedBody; wrap seed() in the helper; remove the in-body client.end() so the lock connection can release cleanly first.
  • UAT_PLAYBOOK.md — new §3.29 (TC-API-3.29).

cc @cpfarhood

🤖 Generated with Claude Code

## Summary Fixes the non-deterministic `FK 23503` on `invoice_tip_splits` in the UAT `reset-demo-data` CronJob (GRO-2123). ### Root cause `packages/db/src/seed.ts` does `TRUNCATE … invoices, invoice_tip_splits …` then thousands of inserts **with no enclosing transaction and no concurrency lock**. The in-loop flush order is correct, so a single isolated run cannot produce this FK violation. But all three `reset-demo-data` CronJob overlays used `concurrencyPolicy: Replace` on an hourly schedule: a new pod can start while the previous one is still mid-seed, and the new pod's `TRUNCATE invoices` deletes rows the old pod is still inserting against. Hence the in-flight `invoice_tip_splits` insert references an invoice that was just truncated away → `code: '23503'`, non-deterministic across nodes (wakko/mindy). ### Fix (this PR — `api`) Acquire a session-level `pg_advisory_lock` for the full duration of `seed()` so overlapping invocations **block then proceed** in order. **Critical implementation note (architectural correctness):** with postgres-js connection pooling, a session-level `pg_advisory_lock` acquired on one pooled connection and released on a *different* one is a no-op (the lock is bound to the session / pg-backend that took it). We therefore: 1. Reserve a **dedicated** connection from the pool via `client.reserve()`. 2. Run `pg_advisory_lock(KEY)` on that reserved connection (before the `TRUNCATE`). 3. Run the existing seed on the pooled `db` (no per-insert connection changes). 4. Release the lock on the **same** reserved connection in a `try/finally` so a thrown seed error cannot leak the lock or the connection. 5. End the pooled client (`client.end()`) **after** the lock releases — releasing a reserved connection back to a closed pool is a no-op that confuses the cleanup. `SEED_ADVISORY_LOCK_KEY = 0x47524f4f` — ASCII `"GROO"`, stable, documented in the file. The single-argument `pg_advisory_lock(int)` form is used (postgres-js serializes `int` as a plain `number`; no `bigint` type plumbing required). ### Defense in depth (companion PR) `groombook/infra` PR #TBD: `concurrencyPolicy: Replace → Forbid` on all three `reset-demo-data` CronJob overlays (dev/uat/prod). `Forbid` skips a new scheduled run while one is still active, removing the cross-schedule overlap vector entirely. ### Acceptance criteria - [x] PR 1 (`api` — this PR) opens against `dev`. - [ ] PR 2 (`infra`) merges first so the Forbid policy is in place when the rebuilt reset image deploys. - [ ] After merge, reset image rebuilds and deploys to UAT via the existing Flux pipeline. - [ ] Backend smoke (in `groombook-uat`): `kubectl create job --from=cronjob/reset-demo-data verify-gro2123 -n groombook-uat` — pod reaches `Completed`, logs show `✓ Acquired seed advisory lock` and `✓ Released seed advisory lock` from `seed.ts`, no `code: '23503'` / `invoice_tip_splits` FK error. - [ ] Bonus: back-to-back manual jobs prove the lock serializes (second one blocks on the first). - [ ] No regression to existing seed counts/output lines (500 clients, ~4000 invoices, same `✓ Created N` lines). ### UAT Playbook `UAT_PLAYBOOK.md` §3.29 documents the regression check. ### Files changed - `packages/db/src/seed.ts` — add `SEED_ADVISORY_LOCK_KEY`, `withSeedAdvisoryLock` helper, extract `runSeedBody`; wrap `seed()` in the helper; remove the in-body `client.end()` so the lock connection can release cleanly first. - `UAT_PLAYBOOK.md` — new §3.29 (TC-API-3.29). cc @cpfarhood 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Flea Flicker added 1 commit 2026-06-04 11:13:01 +00:00
fix(GRO-2123): serialize seed.ts with Postgres advisory lock
CI / Test (pull_request) Successful in 13s
CI / Lint & Typecheck (pull_request) Successful in 15s
CI / Build & Push Docker Images (pull_request) Successful in 58s
d1a68d93de
The reset-demo-data CronJob in groombook-uat intermittently failed with
FK 23503 on invoice_tip_splits because two pods could run the seed
concurrently: the new pod's TRUNCATE deleted rows the old pod was still
inserting.

Acquire a session-level advisory lock for the full duration of the seed.
CRITICAL: with postgres-js connection pooling, a pg_advisory_lock
acquired on one pooled connection and released on a different one is a
no-op (the lock is bound to the pg-backend that took it). We therefore
reserve a dedicated connection for the lock, take pg_advisory_lock(KEY)
on it, run the seed on the pooled connections, and release the lock +
reserved connection in a try/finally so a thrown seed error cannot leak
the lock or the connection.

Defence-in-depth with the infra PR that switches
concurrencyPolicy: Replace → Forbid on the reset-demo-data CronJob.

- Adds withSeedAdvisoryLock helper and runSeedBody extracted function
- Wraps seed() body in the helper; client.end() runs after the lock
  releases so a reserved connection is not returned to a closed pool
- SEED_ADVISORY_LOCK_KEY = 0x47524f4f ("GROO" in ASCII) — arbitrary
  stable 32-bit key, referenced in runbooks
- UAT_PLAYBOOK.md §3.29 documents the regression check

cc @cpfarhood

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Flea Flicker scheduled this pull request to auto merge when all checks succeed 2026-06-04 11:23:39 +00:00
Flea Flicker merged commit f67b96ddfe into dev 2026-06-04 11:23:41 +00:00
Sign in to join this conversation.