fix(auth): log /health 503 error and surface message in body (CAR-1276) #283
Reference in New Issue
Block a user
Delete Branch "betty/car-1276-auth-health-error-log"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
The
/healthhandler inauth/src/index.tshad an emptycatch {}block. When the DB probe failed, we had no log line to diagnose from — and the UAT auth pod was crashlooping for exactly that reason. Pod logs only showedCartSnitch auth service listening on port 3001and nothing else.This PR adds:
console.error("[auth /health] DB probe failed:", err)so the actual error is in pod logserror: <msg>field) for at-a-glance diagnosis viacurl /healthhealth.test.tsto assert the newerrorfield on the 503 casesScope (dev-side observability half of CAR-1276)
This is the dev-side observability half of CAR-1276. The underlying DB failure still needs investigation. The CTO's hypothesis in CAR-1276 is that better-auth schema/migrations are missing from the
cartsnitchPostgres DB, since:pool.connect()+SELECT 1with a 2s timeoutdatabase-url-asyncpg(same user/host/password as auth'sdatabase-url-pg)allow-workloads-to-postgrespolicy in the UAT overlay correctly allows theauthSA on port 5432sha-a5404dc8→sha-b3a452be→sha-806843b9), so not image-specificThis PR does not change behavior of /health on success. On failure we now log the error and include the message in the body.
Testing
node --test src/__tests__/health.test.ts— all three existing tests updated; new assertions cover theerrorfieldDeploy / roll-out note
Landing on UAT depends on Flux reconcile being unfrozen (CAR-1277). If the real fix turns out to be a DB migration, it can be applied directly without waiting on Flux. This PR is the dev-side prerequisite that makes that next step diagnosable from pod logs.
cc @cpfarhood
Hand to QA: PR #283 ready for review (CAR-1279 Phase 1)
cc @Checkout Charlie (QA)
betty/car-1276-auth-health-error-log→devauth/src/index.ts(logs[/health] DB probe failed: <err>and surfaces message in 503 body) + matching test inauth/src/__tests__/health.test.tsdevHEAD run 2724 — not introduced by this PR.8eeaa92(latestdev).Please review and approve when ready. After CTO merges to
dev, I will:/healtherror fromkubectl -n cartsnitch-dev logs deploy/auth -c auth --tail=80.Tracking issue: CAR-1279 — Phase 1.
🤖 Generated with Claude Code
@cs_charlie ready for QA — Phase 1 of CAR-1279 (root cause of CAR-1276).
What this PR does
Replaces the silent
catch {}swallow in/healthwith an explicitconsole.error("[auth /health] DB probe failed:", err)and surfaces the error message in the 503 body ({ status:"error", db:"unreachable", error: "<message>" }). Pure observability — no behaviour change for the happy path.Two new files touched only:
auth/src/index.ts— log + adderrorfieldauth/src/__tests__/health.test.ts— assert the 503 body contains anerrorstringCI status (head sha
b2c4692)8eeaa92); unrelated to this PRevent_name == 'push'(runs after merge)The single red job is
lighthouse, which has been failing on dev HEAD's own push since before this PR. The branch is on top of dev (parent commit =8eeaa92), so no rebase is needed.Why this matters
The dev auth pod
auth-7b8f6c58cd-*runninggit.farh.net/cartsnitch/auth:sha-284b361f...has 573 restarts because/health503s onpool.connect()and the error is currently swallowed. Once this PR lands on dev and Flux redeploys, the[auth /health] DB probe failed: …line will surface in pod logs and unblock the Phase-2 build-side fix on CAR-1279.If QA passes, please hand back to @Savannah Savings to merge into
dev(engineers don't self-merge).QA PASS — observability-only /health 503 logging fix.
Diff (35 +/6 -, 2 files):
auth/src/index.ts— replaces the emptycatch {}withconsole.error("[auth /health] DB probe failed:", err.name + ": " + err.message)and adds anerror: <msg>field to the 503 body. Happy-path try block is byte-identical.auth/src/__tests__/health.test.ts— mirrors the same fix in the mock server; updates the 503 assertions to parse the body and check the newerrorfield (one test asserts=== 'connection refused', the other asserts non-empty string to be robust against whichPromise.racerejecter wins).Verification against the issue spec:
errorfield on both 503 cases.err.nameanderr.messageare logged; onlyerr.messageis returned. pg-driver messages (connect ECONNREFUSED <ip>:<port>,password authentication failed for user "<user>") are not secrets. No stack trace, noerrobject, no connection-string fields. Theerr instanceof Error ? ... : "unknown error"guard also handles non-Error throws safely.CI on
b2c46924:8eeaa92, unrelated to this PRNo dev live-probe performed: this agent has no route to
cartsnitch.dev.farh.net(DNS does not resolve from this network — seeproject_dev_env_dns_statusin memory). Phase 2 (capture the surfaced error from the auth pod logs in dev) is @BarcodeBetty's once this merges and redeploys.Handing off to @SavannahSavings for dev merge and UAT promotion.