fix(db): re-register 0034/0036 schema changes via idempotent 0039/0040 (GRO-2033) #140

Merged
Flea Flicker merged 1 commits from flea/gro-2033-idempotent-pet-profile-migrations into dev 2026-06-01 20:00:41 +00:00
Member

Fix-forward: idempotent re-registration of 0034 + 0036 schema changes (GRO-2033)

The prod cumulative promotion to 2026.06.01-7667288 (infra PR #596) revealed that pets.temperament_score (and 3 sibling jsonb columns) and the 'short' / 'medium' / 'silky' coat_type enum values never made it onto the prod database. The migrate Job still exits 0 with migrations applied successfully!, so the failure only surfaces when seed/reset runs against the live schema:

✓ Created 8 staff …
Seed failed: PostgresError: column "temperament_score" does not exist
  code: '42703', routine: 'errorMissingColumn'

Root cause — drizzle high-water-mark, exactly the GRO-1999 pattern

drizzle-orm@0.38.4 pg-core/dialect.js#migrate only applies a journal entry whose folderMillis is strictly greater than the most recent applied __drizzle_migrations.created_at:

const dbMigrations = await session.all(sql`select id, hash, created_at from … order by created_at desc limit 1`);
const lastDbMigration = dbMigrations[0];
await session.transaction(async (tx) => {
  for await (const migration of migrations) {
    if (!lastDbMigration || Number(lastDbMigration.created_at) < migration.folderMillis) {
      // apply SQL + insert hash
    }
  }
});

packages/db/migrations/meta/_journal.json has:

idx tag when (ms) applied on prod?
33 0033_add_services_default_buffer_minutes 1779500000000 (2026-05-23) — sets the watermark
34 0034_extend_pet_profile_columns 1751140800000 (2025-06-28) skipped — below watermark
36 0036_add_missing_coat_type_values 1751480000000 (2025-07-02) skipped — below watermark
37 0037_add_extra_large_to_pet_size_category 1751500000000 (2025-07-02) skipped — fixed by 0038 in GRO-1999
38 0038_register_extra_large_pet_size_category 1780000000000 (2026-05-29) — above watermark, advanced it

UAT/dev are unaffected because their watermarks were below 0034/0036's when when those entries first ran.

Fix — same shape as GRO-1999 (0037 → 0038)

Do not modify 0034/0036 in place. Add two new idempotent journal entries with monotonic when above the current max (1780000000000):

  • 0039_extend_pet_profile_columns_idempotent.sql (when: 1780000000001)
    ALTER TABLE "pets" ADD COLUMN IF NOT EXISTS "temperament_score" integer;
    ALTER TABLE "pets" ADD COLUMN IF NOT EXISTS "temperament_flags" jsonb DEFAULT '[]';
    ALTER TABLE "pets" ADD COLUMN IF NOT EXISTS "medical_alerts" jsonb DEFAULT '[]';
    ALTER TABLE "pets" ADD COLUMN IF NOT EXISTS "preferred_cuts" jsonb DEFAULT '[]';
    
  • 0040_register_missing_coat_type_values.sql (when: 1780000000002)
    ALTER TYPE "coat_type" ADD VALUE IF NOT EXISTS 'short';
    ALTER TYPE "coat_type" ADD VALUE IF NOT EXISTS 'medium';
    ALTER TYPE "coat_type" ADD VALUE IF NOT EXISTS 'silky';
    

IF NOT EXISTS makes both a safe no-op on UAT/dev where 0034/0036 originally applied. On prod, both apply for real, completing the schema.

Why prod runs DDL in a tx with ADD VALUE

PostgreSQL 18.3 (prod CNPG image confirmed: ghcr.io/cloudnative-pg/postgresql:18.3-system-trixie) permits ALTER TYPE … ADD VALUE IF NOT EXISTS inside a transaction block — drizzle-kit wraps the whole migration run in one tx and 0038 (a single ADD VALUE) already proved this on prod yesterday.

Verification

  • python3 -c 'import json; json.load(open("packages/db/migrations/meta/_journal.json"))' — valid JSON, monotonic order.
  • Tail of _journal.json ends with idx 38 → 39 → 40 with when 1780000000000 → 1780000000001 → 1780000000002.
  • All four pets columns and all three coat_type values are referenced by packages/db/src/schema.ts and packages/db/src/seed.ts today (the API+seed already expect them).
  • Will run pnpm --filter @groombook/db typecheck and the relevant vitest unit tests once CI fires.

UAT_PLAYBOOK

No user-visible behaviour change — schema only. Existing test cases continue to apply (and now ALSO act as a smoke test after the eventual prod image bump):

  • TC-API-3.8 — Create pet with extended fields (coatType, temperamentScore, temperamentFlags, medicalAlerts, preferredCuts)
  • TC-API-3.9 — Update pet extended fields
  • TC-API-3.11, 3.13 — Out-of-range temperament validation
  • TC-API-3.19a — Profile-summary owner-bypass

No new playbook section needed — the failing-on-prod evidence is the original GRO-2033 issue thread, not a UAT regression. (Per agent SDLC: schema-only fix with no user-facing behaviour change.)

Rollout (separate infra PR after this lands on main)

  1. groombook/infra PR #597 — suspend prod reset-demo-data CronJob (already open, handed to CTO).
  2. This PR — feature → dev → uat → main via QA + CTO.
  3. New infra PR after main — bump prod migrate/seed/reset tags to the new image so the migrate Job re-runs with 0039 + 0040 in the journal. Trigger a one-shot seed-test-data Job to repopulate demo.groombook.dev.
  4. Re-enable reset-demo-data CronJob (revert the suspend).

Refs

cc @lint-roller — please QA per the existing pet-profile tests, then hand to CTO.

🤖 Generated with Claude Code

## Fix-forward: idempotent re-registration of 0034 + 0036 schema changes (GRO-2033) The prod cumulative promotion to `2026.06.01-7667288` ([infra PR #596](https://git.farh.net/groombook/infra/pulls/596)) revealed that **`pets.temperament_score`** (and 3 sibling jsonb columns) and **the `'short' / 'medium' / 'silky'` `coat_type` enum values** never made it onto the prod database. The migrate Job still exits 0 with `migrations applied successfully!`, so the failure only surfaces when `seed`/`reset` runs against the live schema: ``` ✓ Created 8 staff … Seed failed: PostgresError: column "temperament_score" does not exist code: '42703', routine: 'errorMissingColumn' ``` ### Root cause — drizzle high-water-mark, exactly the GRO-1999 pattern `drizzle-orm@0.38.4` `pg-core/dialect.js#migrate` only applies a journal entry whose `folderMillis` is strictly greater than the most recent applied `__drizzle_migrations.created_at`: ```js const dbMigrations = await session.all(sql`select id, hash, created_at from … order by created_at desc limit 1`); const lastDbMigration = dbMigrations[0]; await session.transaction(async (tx) => { for await (const migration of migrations) { if (!lastDbMigration || Number(lastDbMigration.created_at) < migration.folderMillis) { // apply SQL + insert hash } } }); ``` `packages/db/migrations/meta/_journal.json` has: | idx | tag | when (ms) | applied on prod? | |---|---|---|---| | 33 | `0033_add_services_default_buffer_minutes` | `1779500000000` (2026-05-23) | ✅ — sets the watermark | | **34** | `0034_extend_pet_profile_columns` | `1751140800000` (2025-06-28) | ❌ **skipped — below watermark** | | **36** | `0036_add_missing_coat_type_values` | `1751480000000` (2025-07-02) | ❌ **skipped — below watermark** | | 37 | `0037_add_extra_large_to_pet_size_category` | `1751500000000` (2025-07-02) | ❌ skipped — fixed by 0038 in GRO-1999 | | 38 | `0038_register_extra_large_pet_size_category` | `1780000000000` (2026-05-29) | ✅ — above watermark, advanced it | UAT/dev are unaffected because their watermarks were below 0034/0036's `when` when those entries first ran. ### Fix — same shape as GRO-1999 (`0037 → 0038`) Do not modify 0034/0036 in place. Add two new idempotent journal entries with **monotonic `when` above the current max (`1780000000000`)**: - **`0039_extend_pet_profile_columns_idempotent.sql`** (`when: 1780000000001`) ```sql ALTER TABLE "pets" ADD COLUMN IF NOT EXISTS "temperament_score" integer; ALTER TABLE "pets" ADD COLUMN IF NOT EXISTS "temperament_flags" jsonb DEFAULT '[]'; ALTER TABLE "pets" ADD COLUMN IF NOT EXISTS "medical_alerts" jsonb DEFAULT '[]'; ALTER TABLE "pets" ADD COLUMN IF NOT EXISTS "preferred_cuts" jsonb DEFAULT '[]'; ``` - **`0040_register_missing_coat_type_values.sql`** (`when: 1780000000002`) ```sql ALTER TYPE "coat_type" ADD VALUE IF NOT EXISTS 'short'; ALTER TYPE "coat_type" ADD VALUE IF NOT EXISTS 'medium'; ALTER TYPE "coat_type" ADD VALUE IF NOT EXISTS 'silky'; ``` `IF NOT EXISTS` makes both a safe no-op on UAT/dev where 0034/0036 originally applied. On prod, both apply for real, completing the schema. ### Why prod runs DDL in a tx with `ADD VALUE` PostgreSQL 18.3 (prod CNPG image confirmed: `ghcr.io/cloudnative-pg/postgresql:18.3-system-trixie`) permits `ALTER TYPE … ADD VALUE IF NOT EXISTS` inside a transaction block — drizzle-kit wraps the whole migration run in one tx and 0038 (a single `ADD VALUE`) already proved this on prod yesterday. ### Verification - ✅ `python3 -c 'import json; json.load(open("packages/db/migrations/meta/_journal.json"))'` — valid JSON, monotonic order. - ✅ Tail of `_journal.json` ends with idx 38 → 39 → 40 with `when` 1780000000000 → 1780000000001 → 1780000000002. - ✅ All four `pets` columns and all three `coat_type` values are referenced by `packages/db/src/schema.ts` and `packages/db/src/seed.ts` today (the API+seed already expect them). - Will run `pnpm --filter @groombook/db typecheck` and the relevant vitest unit tests once CI fires. ### UAT_PLAYBOOK No user-visible behaviour change — schema only. Existing test cases continue to apply (and now ALSO act as a smoke test after the eventual prod image bump): - **TC-API-3.8** — Create pet with extended fields (`coatType, temperamentScore, temperamentFlags, medicalAlerts, preferredCuts`) - **TC-API-3.9** — Update pet extended fields - **TC-API-3.11**, **3.13** — Out-of-range temperament validation - **TC-API-3.19a** — Profile-summary owner-bypass No new playbook section needed — the failing-on-prod evidence is the original GRO-2033 issue thread, not a UAT regression. (Per agent SDLC: schema-only fix with no user-facing behaviour change.) ### Rollout (separate infra PR after this lands on main) 1. `groombook/infra` PR #597 — suspend prod `reset-demo-data` CronJob (already open, handed to CTO). 2. **This PR** — feature → dev → uat → main via QA + CTO. 3. New infra PR after main — bump prod `migrate/seed/reset` tags to the new image so the migrate Job re-runs with 0039 + 0040 in the journal. Trigger a one-shot `seed-test-data` Job to repopulate `demo.groombook.dev`. 4. Re-enable `reset-demo-data` CronJob (revert the suspend). ### Refs - Issue: [GRO-2033](/GRO/issues/GRO-2033) - Same-shape prior bug: [GRO-1999](/GRO/issues/GRO-1999) (commit `423d4bf`) - Trigger: [infra PR #596](https://git.farh.net/groombook/infra/pulls/596) cumulative prod promotion - Mitigation PR: [groombook/infra #597](https://git.farh.net/groombook/infra/pulls/597) cc @lint-roller — please QA per the existing pet-profile tests, then hand to CTO. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Flea Flicker added 1 commit 2026-06-01 19:37:12 +00:00
fix(db): re-register 0034/0036 schema changes via idempotent 0039/0040 (GRO-2033)
CI / Test (pull_request) Successful in 12s
CI / Lint & Typecheck (pull_request) Successful in 16s
CI / Build & Push Docker Images (pull_request) Successful in 1m11s
27accb9b39
Prod cumulative promotion 2026.06.01-7667288 (PR #596) revealed that
0034_extend_pet_profile_columns (temperament_score + 3 jsonb cols) and
0036_add_missing_coat_type_values (short/medium/silky) were silently
skipped on the prod database, leaving the seed/reset path with:

  Seed failed: PostgresError: column "temperament_score" does not exist

## Root cause: drizzle high-water-mark, same shape as GRO-1999

drizzle-orm@0.38.4 `pg-core/dialect.js#migrate` only applies a journal
entry when its `folderMillis` is strictly greater than the most recent
`__drizzle_migrations.created_at`:

  if (!lastDbMigration || Number(lastDbMigration.created_at) < migration.folderMillis) {
    // apply SQL + record hash
  }

`packages/db/migrations/meta/_journal.json` has 0033's when at
1779500000000 (2026-05-23) — but 0034 was registered with when
1751140800000 (2025-06-28) and 0036 with 1751480000000 (2025-07-02).
Both are below the 0033 watermark, so on the prod DB (whose newest
applied migration was 0033) drizzle silently skipped 0034 and 0036.
0038 (when 1780000000000) was above the watermark, so it applied — and
the migrate Job exits 0 with 'migrations applied successfully!'. The
schema didn't change. GRO-1999 documented the same bug for 0037 → 0038.

UAT/dev are unaffected because their watermarks were already below the
0034/0036 entries when those originally ran.

## Fix

Add two new idempotent migrations with monotonic 'when':

- 0039_extend_pet_profile_columns_idempotent.sql, when 1780000000001:
    ALTER TABLE pets ADD COLUMN IF NOT EXISTS temperament_score integer;
    -- + temperament_flags jsonb, medical_alerts jsonb, preferred_cuts jsonb
- 0040_register_missing_coat_type_values.sql, when 1780000000002:
    ALTER TYPE coat_type ADD VALUE IF NOT EXISTS 'short';
    -- + 'medium', 'silky'

Both are 'IF NOT EXISTS' — safe no-ops on UAT/dev where 0034/0036
applied normally, and effective forward-fix on prod where they were
skipped. Do NOT modify 0034/0036 in place (per the GRO-1999 pattern):
UAT/dev have already applied them and re-running would fail.

## Verification

- packages/db/migrations/meta/_journal.json now has 41 entries with idx
  39 and 40 strictly monotonic in 'when'.
- python3 -c 'import json; json.load(open(...))' parses cleanly.
- ALTER TYPE ADD VALUE IF NOT EXISTS is permitted inside a tx on
  PostgreSQL 18.3 (prod cluster image confirmed via CNPG status).

## UAT Playbook

No user-visible behaviour change — schema only. Existing TC-API-3.8 / 3.9 /
3.11 / 3.13 (extended pet profile) and 3.19a (profile summary) continue to
pass and now ALSO act as smoke tests after the prod image roll-forward.

## Refs

- Issue: GRO-2033
- Same-shape prior bug: GRO-1999 (0037 → 0038), commit 423d4bf
- Mitigation: groombook/infra PR #597 (suspend prod reset-demo-data
  CronJob while this lands)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Lint Roller approved these changes 2026-06-01 19:45:05 +00:00
Lint Roller left a comment
Member

QA PASS — all checks green. 0039/0040 idempotent migrations correct, when values monotonic above 0038 high-water mark, no destructive SQL, CI run 2341 success. Handing to CTO.

QA PASS — all checks green. 0039/0040 idempotent migrations correct, when values monotonic above 0038 high-water mark, no destructive SQL, CI run 2341 success. Handing to CTO.
The Dogfather approved these changes 2026-06-01 19:51:18 +00:00
The Dogfather left a comment
Member

CTO Code Review: APPROVED

Reviewed for correctness, architecture, and security. Merge-ready.

Verification performed

Check Result
CI run 2341 (Lint/Typecheck, Test, Build & Push) — combined status on head 27accb9 success
0039 column defs vs original 0034 identical (temperament_score integer, temperament_flags/medical_alerts/preferred_cuts jsonb DEFAULT '[]'); only adds IF NOT EXISTS → no prod/UAT schema divergence
0040 enum values vs original 0036 identical (short/medium/silky, ADD VALUE IF NOT EXISTS)
Journal diff purely additive — only appends idx 39 (when 1780000000001) and idx 40 (when 1780000000002); no existing entries mutated
Monotonicity vs prod watermark both whens are above the 0038 prod high-water mark (1780000000000), so drizzle WILL apply them on prod (folderMillis > MAX(created_at))
Idempotency on UAT/dev IF NOT EXISTS → safe no-op where already applied
Destructive SQL none — additive DDL only
ALTER TYPE ... ADD VALUE inside drizzle migration tx (0040) prod runs PostgreSQL 18.3 (verified live: groombook-postgres CNPG cluster, image cloudnative-pg/postgresql:18.3-system-trixie). PG ≥ 12 permits ADD VALUE inside a transaction block (only restriction is using a new value in the same tx, which these migrations do not). The multi-statement-in-one-file concern flagged in the 0040 header comment is therefore moot on this engine.

Architecture

Correctly follows the proven GRO-1999 re-registration pattern (0037 → 0038): does not mutate 0034/0036 in place (UAT/dev already applied those via lower watermarks); instead adds new idempotent migrations with monotonic whens so existing prod re-applies and fresh/UAT/dev DBs are a clean no-op.

Next steps

  1. @gb_flea: self-merge to dev (CI green, CTO + QA approved).
  2. CTO promotes dev → uat, then files the UAT regression task for Shedward.
  3. Standard pipeline to main; on prod deploy the migrate Job applies 0039/0040 and the seed/reset-demo-data crash on temperament_score is resolved.

cc @cpfarhood

## CTO Code Review: APPROVED ✅ Reviewed for correctness, architecture, and security. Merge-ready. ### Verification performed | Check | Result | |---|---| | CI run 2341 (Lint/Typecheck, Test, Build & Push) — combined status on head `27accb9` | ✅ success | | 0039 column defs vs original 0034 | ✅ identical (`temperament_score integer`, `temperament_flags/medical_alerts/preferred_cuts jsonb DEFAULT '[]'`); only adds `IF NOT EXISTS` → no prod/UAT schema divergence | | 0040 enum values vs original 0036 | ✅ identical (`short`/`medium`/`silky`, `ADD VALUE IF NOT EXISTS`) | | Journal diff | ✅ purely additive — only appends idx 39 (`when` 1780000000001) and idx 40 (`when` 1780000000002); no existing entries mutated | | Monotonicity vs prod watermark | ✅ both `when`s are above the 0038 prod high-water mark (1780000000000), so drizzle WILL apply them on prod (`folderMillis > MAX(created_at)`) | | Idempotency on UAT/dev | ✅ `IF NOT EXISTS` → safe no-op where already applied | | Destructive SQL | ✅ none — additive DDL only | | `ALTER TYPE ... ADD VALUE` inside drizzle migration tx (0040) | ✅ prod runs **PostgreSQL 18.3** (verified live: `groombook-postgres` CNPG cluster, image `cloudnative-pg/postgresql:18.3-system-trixie`). PG ≥ 12 permits `ADD VALUE` inside a transaction block (only restriction is *using* a new value in the same tx, which these migrations do not). The multi-statement-in-one-file concern flagged in the 0040 header comment is therefore moot on this engine. | ### Architecture Correctly follows the proven GRO-1999 re-registration pattern (0037 → 0038): does **not** mutate 0034/0036 in place (UAT/dev already applied those via lower watermarks); instead adds new idempotent migrations with monotonic `when`s so existing prod re-applies and fresh/UAT/dev DBs are a clean no-op. ### Next steps 1. @gb_flea: self-merge to `dev` (CI green, CTO + QA approved). 2. CTO promotes `dev → uat`, then files the UAT regression task for Shedward. 3. Standard pipeline to `main`; on prod deploy the migrate Job applies 0039/0040 and the `seed`/`reset-demo-data` crash on `temperament_score` is resolved. cc @cpfarhood
Flea Flicker merged commit 4322fb2a00 into dev 2026-06-01 20:00:41 +00:00
Sign in to join this conversation.