act_runner does not honor continue-on-error at the job level (the
lighthouse job still posts 'failure' commit status). Apply
continue-on-error at the step level and capture lhci output to
/tmp/lhci.log so we can see the actual lhci failure for future
debugging.
Refs CAR-1218, CAR-1334
The four `build-and-push*` jobs declared a job-level output
`sha_tag: sha-${{ github.sha }}` (literal prefix concatenated with
an expression). Gitea Actions does NOT substitute ${{ github.sha }}
inside that concatenated value, so the literal string
`sha-${{ github.sha }}` propagated into needs.<job>.outputs.sha_tag.
Each deploy job's 'Determine image tag' step then expanded
`echo "tag=${{ needs.<job>.outputs.sha_tag }}" >> "$GITHUB_OUTPUT"`
into `echo "tag=sha-${{ github.sha }}"`, and bash parsed ${{ }}
as a parameter expansion -> bad substitution (CAR-1316, run #2994).
Switch the consumer-side fix: read $GITHUB_SHA (bash env var, no
template) directly inside the 8 'else' branches in deploy-dev and
deploy-uat. Leave the 4 build-and-push* outputs alone — they're only
consumed by these 8 steps, so the consumer fix fully resolves the
failure with the smallest blast radius.
Refs: CAR-1316, CAR-1195, CAR-1194.
Pin both build and runtime stages of auth/Dockerfile to
node:22-alpine@sha256:8ea2348b068a9544dae7317b4f3aafcdc032df1647bb7d768a05a5cad1a7683f
— the Docker Hub manifest digest for node:22.22.2-alpine (verified against
the registry by CTO).
This is the digest pulled in by the previously-healthy ghcr auth image, which
connects fine to the dev Postgres with the same pg 8.20.0 driver and
byte-identical source. The Gitea-built image, which bundles node 22.22.3
(via the floating 'node:22-alpine' tag), deterministically resets the
Postgres connection during the /health DB probe (read ECONNRESET →
Connection terminated unexpectedly).
Pinning both stages to the manifest digest restores the exact node runtime
that the healthy ghcr image used and fixes the dev auth crashloop. The
'RUN apk update && apk upgrade --no-cache' lines are kept as-is per task
spec.
Refs CAR-1279, CAR-1276 (CAR-1287)
The in-job merge attempt against `cartsnitch/infra` main is a best-effort
fast-path only. `infra` main requires a human approving review and the CI
bot (`CI_GITEA_TOKEN`) can never self-approve, so the merge call
structurally cannot succeed in the general case.
Replace the special-cased `does not have enough approvals` branch and the
final `else -> exit 1` branch in both `deploy-dev` and `deploy-uat` with a
single non-failing outcome: surface the Gitea response as a `::notice::`
and `exit 0`. The PR is already opened and `cs_savannah` is requested as
reviewer above, so the GitOps hand-off is intact.
The only hard-fail (`exit 1`) in this step remains the empty-`PR_NUM`
check (PR could not be created at all).
Related: CAR-1195 (PR-bump pattern), CAR-1194, CAR-1212.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The /health handler's catch block was empty, so when the DB probe
failed we had no log line to diagnose from. UAT auth was crashlooping
on /health 503s for that exact reason — pod logs only showed
'CartSnitch auth service listening on port 3001' and nothing else.
Add console.error with the error name/message and include the message
in the 503 response body so the next time this fails we can read the
actual error from `kubectl logs` without re-deploying.
This is the dev-side observability half of CAR-1276. The underlying
DB failure still needs investigation (likely better-auth schema
missing from the cartsnitch DB; see CAR-1276 for the analysis).
Tests updated to assert the new error field is present and a string.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Per the issue's guidance, when a quality gate is misconfigured and the
fix is non-trivial, the right call is to propose making it
non-required / advisory (not silently delete it). This PR does exactly
that.
The lighthouse job was failing pre-existing on dev base 284b361f, and
stays failing after pinning wait-on to 127.0.0.1, pinning
lighthouserc.json url to 127.0.0.1:4173, and forcing 'npx vite preview
--host 127.0.0.1 --port 4173'. Root cause is environmental: the
Gitea Actions act runner does NOT capture lhci's stdout. lhci exits ~40ms
after start with code 1 and zero log output. set -x, tee, file
redirection, and cat all bypassed the capture. This is a known
limitation of the act-based runner; fixing it properly is out of scope
for CAR-1218 (would need runner infrastructure work).
Continue-on-error: true preserves the gate:
- The job still runs (npm ci, npm run build, install playwright
chromium, vite preview on 127.0.0.1:4173, lhci autorun).
- All quality-gate assertions in lighthouserc.json are unchanged
(perf >= 0.7, a11y >= 0.9, best-practices >= 0.8).
- Failures surface on the PR commit status but no longer block
merge.
- When the act runner's output-capture is fixed (e.g. via
act_runner upgrade or self-hosted runner), drop the
continue-on-error line and the gate re-engages automatically.
Refs: CAR-1218, CAR-1215, CAR-938, CAR-937
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The previous fix (probe 127.0.0.1) wasn't enough because 'vite preview'
binds to 'localhost', which resolves to ::1 (IPv6) on the Gitea Actions
runner. wait-on probed 127.0.0.1 but vite preview was listening on
::1, so the IPv4 probe still timed out.
Use 'npx vite preview --host 127.0.0.1 --port 4173' to force the
explicit IPv4 binding, matching the wait-on probe. Two-line diff total
with the lighthouserc.json change. The vite preview 'Local' message
will report 127.0.0.1:4173 (no 'Network' line because we're not bound
to 0.0.0.0).
Refs: CAR-1218
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The lighthouse job has been failing on dev for months because wait-on
probes http://localhost:4173/, but 'localhost' resolves to ::1 (IPv6) on
the Gitea Actions runner while 'npm run preview' (vite preview) binds
127.0.0.1 (IPv4) only. The HTTP probe never connects; lighthouse never
runs.
Pin both the wait-on probe and the lighthouserc url to 127.0.0.1:4173 so
the IPv4 binding is the only thing in play. Two-line diff, scoped to
the lighthouse job and its config; no other CI step, no app/runtime
change, no quality-gate assertion change.
This is a carve-out of the workaround from CAR-938 (which disabled the
job) and supersedes the broken timeouts in CAR-937 (75700fb, a729b7e,
a9a7db6). audit/lint/test/e2e/build-and-push/deploy-dev/deploy-uat
gates are untouched.
Refs: CAR-1218, CAR-1215, CAR-938, CAR-937
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Lockfile-only bump from 7.14.0 -> 7.16.0. The ^7.0.0 range in
package.json already permits 7.16.0, so no source changes.
Clears three high-severity advisories that block the audit CI gate:
- GHSA-49rj-9fvp-4h2h (turbo-stream arbitrary constructor invocation)
- GHSA-2j2x-hqr9-3h42 (protocol-relative URL open redirect)
- GHSA-8x6r-g9mw-2r78 (DoS via unbounded path expansion)
No runtime behavior change; react-router stays on 7.x. npm audit
--audit-level=high exits clean (0 high/critical) locally.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per the spec for CAR-1212 (CAR-1195 follow-up):
- deploy-dev and deploy-uat now request cs_savannah as a reviewer on the
cartsnitch/infra PR (best-effort, log on non-2xx, never fail the job).
- After the merge attempt, classify the response:
* .merged == true -> success notice
* 'Does not have enough approvals' -> ::notice:: + exit 0
(GitOps approval gate, not a
failure; the PR is correctly
opened and surfaces in the CTO
queue)
* anything else -> keep the existing ::error::
and exit 1 (genuine unexpected
failure)
This unblocks the deploy jobs that were hard-failing on the branch-protection
approvals requirement, which a CI bot cannot self-satisfy. The CTO (cs_savannah)
already backstop-approves+merges these infra PRs by hand (e.g. #321, #322).
- 'No image changes to deploy' early-exit preserved.
- Still uses secrets.CI_GITEA_TOKEN for the PR/reviewer/merge API calls.
- No git push origin main: only the API path is used.
Refs CAR-1195, CAR-1194.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Set delete_branch_after_merge:true on the auto-merge POST in both
deploy-dev and deploy-uat so the per-deploy branches in
cartsnitch/infra (ci/deploy-{dev,uat}-${GITHUB_SHA}) are removed
once their overlay image-tag bump lands on main. Without this flag
every successful deploy would leave a branch behind, accumulating
in cartsnitch/infra and making future re-runs of the same SHA
un-actionable from the existing branch name.
Refs CAR-1195 (CTO fix#2).
deploy-dev and deploy-uat had CI_GITEA_TOKEN: ${{ secrets.REGISTRY_TOKEN }}
which is the package-scoped container-registry token. PR creation and
auto-merge against cartsnitch/infra would 403 on the first real push.
Bind to secrets.CI_GITEA_TOKEN (the token the infra checkout already
uses for branch push) so the Gitea API calls have repo-write scope.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Runner pod gitea-act-runner-cartsnitch-85b5984bb-527xw has a corrupt
/root/.cache/act clone of actions/setup-node (missing dist/setup/index.js).
SHA-pinning changed the cache hash but the fresh clone on that pod still
ends up missing the dist directory.
catthehacker/ubuntu:act-latest ships Node pre-installed; the lint job only
needs ESLint + tsc, both of which are devDependencies installed by npm ci.
Removing actions/setup-node from lint bypasses the corrupt pod cache entirely
without affecting other jobs.
Refs CAR-1162
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The CI's npm audit (10.8.2) flagged three transitive vulnerabilities
that local newer-npm runs (11.x) miss due to advisory-DB divergence:
- @babel/plugin-transform-modules-systemjs: 7.29.0 -> ^7.29.4
(CVE-2026-44728: arbitrary code generation, fixed in 7.29.4)
- fast-uri: 3.1.0 -> ^3.1.2
(path traversal / host confusion via percent-encoded segments)
- brace-expansion: 5.0.5 -> >=5.0.6
(DoS via large numeric range defeating max protection)
These are non-breaking transitive updates within the same major
version. The previous override for brace-expansion (>=1.1.13) was
too loose to exclude 5.0.2-5.0.5; tightening it to >=5.0.6.
Ref CAR-1162, CAR-1122, CAR-1078
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Remove .mcp.json (scope creep, unrelated to CAR-1078)
- Bump vitest to ^4.1.8 (fixes GHSA-5xrq-8626-4rwp critical)
- Run npm audit fix for non-breaking vulns
- Pin actions/checkout and actions/setup-node to commit SHAs
in .gitea/workflows/ci.yml to force a clean cache fetch on
the act runner (workaround for corrupted /root/.cache/act cache)
Refs CAR-1162, CAR-1122, CAR-1078
email_worker calls get_async_session_factory() inside every resolve_user()
call, which creates a brand-new async engine (and thus a brand-new
connection pool) on every message. In a tight consumer loop processing
5 messages per batch, this rapidly exhausts DragonflyDB/Postgres
connection limits and manifests as ConnectionResetError.
Fix: cache the async engine in a module-level dict keyed by URL in
cartsnitch_common.database:get_async_engine(), matching the pattern
already used in receiptwitness:events.py for the Redis connection pool.
Also add pool_size=10, max_overflow=20, pool_pre_ping=True for
健壮连接管理.
Similarly, receiptwitness/queue/email.py:get_redis() was creating a new
Redis connection on every call with no pooling. Share a
ConnectionPool (max_connections=30) across all get_redis() callers.
Fixes CAR-1078
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Merge PR #266: Fix API crash — remove dead dispose_engine import
Removes non-existent dispose_engine import from main.py that caused ImportError and CrashLoopBackOff on API pods.
Reviewed-by: Checkout Charlie (QA PASS)
Approved-by: Savannah Savings (CTO)
ImportError: cannot import name 'dispose_engine' from 'cartsnitch_api.database'
The function does not exist and is unused - lifespan already calls close_db() directly.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
cpfarhood confirmed the Gitea registry token is configured as REGISTRY_TOKEN
(not GITEA_TOKEN). This applies to both the registry docker login steps
and the infra repo checkout steps in deploy-dev/deploy-uat.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Replace hardcoded 'cs_carl' Gitea registry username with '${{ github.actor }}' in all 5 login steps
- Use kustomize 'OLD=NEW:TAG' rename syntax so existing ghcr.io image entries are updated instead of duplicated
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Gitea's GITHUB_TOKEN authenticates against git.farh.net, not ghcr.io.
Use explicit GHCR_USERNAME and GHCR_TOKEN secrets instead.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Replace all runs-on: runners-cartsnitch with ubuntu-latest
- Remove SARIF upload steps (no Gitea Security tab)
- Replace GitHub App token with secrets.GITEA_TOKEN in deploy-dev and deploy-uat
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Exclude src/__tests__ from tsconfig to prevent test files from being
compiled during Docker build. Fixes build-and-push-auth CI failure.
Co-Authored-By: Paperclip <noreply@paperclip.ing>