Deploy-dev and deploy-uat jobs were opening image-tag-bump PRs against
dev/uat branches per CAR-1371. Flux reconciles all overlays from infra
main, so those PRs were never picked up. Revert --arg base back to main.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Verifies the actions/checkout ref parameterization in deploy-dev:
- head branch lineage now matches PR base (dev)
- cartsnitch/infra PR should be mergeable with single-file diff
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Parameterize the actions/checkout ref for cartsnitch/infra in deploy-dev
and deploy-uat so the head branch lineage matches the PR base:
- main push -> ref: main, base: main (unchanged)
- dev push -> ref: dev, base: dev
- uat push -> ref: uat, base: uat
Before: ref: main was hardcoded, so the auto-opened image-tag-bump PR
in cartsnitch/infra was branched from main, not from dev/uat. With the
CAR-1371 base=dev/base=uat change, the diff ballooned to 30+ files and
the PR was unmergeable (see cartsnitch/infra#392).
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Verification no-op to confirm the deploy-dev job now opens image-tag-bump
PRs against cartsnitch/infra:dev instead of :main.
Will self-revert after the deploy-dev run completes successfully.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
fix(cartsnitch): deploy-dev/deploy-uat PR base = dev/uat not main (CAR-1370)
Two-line swap in .gitea/workflows/ci.yml so deploy-dev targets dev and deploy-uat targets uat instead of main.
CAR-1370 / CAR-1371
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Deploy jobs in ci.yml were opening image-tag-bump PRs against cartsnitch/infra: main
regardless of which branch triggered the deploy. The deploy-dev job should target
dev, deploy-uat should target uat.
Two-line swap in .gitea/workflows/ci.yml:
- Line 582 (deploy-dev): --arg base main -> --arg base dev
- Line 728 (deploy-uat): --arg base main -> --arg base uat
Verified by inspecting both curl payloads; no other --arg base occurrences.
CAR-1370 / CAR-1371
Co-Authored-By: Paperclip <noreply@paperclip.ing>
fix(api): widen alembic_version.version_num in migration 001 (CAR-1302)
Rebased onto current dev head ad18a43b5 per CAR-1365. Drops 71e3b81 (already in dev via #281). Resolves ci.yml conflict by keeping dev's CAR-1316/1318 fixed version. Self-merge per SDLC Phase 1 (CI green on run #3470).
The act runner resolves 'localhost' to ::1 (IPv6) and the preview
server does not get a reachable IPv4 socket, so wait-on times out
and the 'Start preview server' step fails the lighthouse job. Bind
explicitly to 127.0.0.1 (IPv4).
Refs CAR-1218, CAR-1302, CAR-1334
Alembic hardcodes alembic_version.version_num to VARCHAR(32) in
DefaultImpl.version_table_impl, and version_table_column_width is NOT a
real kwarg that context.configure() honors — it's silently ignored, so
the env.py change alone was never going to take effect on a fresh DB.
Our descriptive revision ids exceed 32 chars (e.g. 003_make_users_hashed_
password_nullable = 39, common 002_add_normalized_products_upc_variants_
index = 46), so the 003 / common 002 stamp fails with StringDataRight-
Truncation, the whole chain rolls back, and the column is recreated at
VARCHAR(32) on the next attempt.
Fix:
- api/alembic/versions/001_encrypt_session_data.py: insert ALTER TABLE
alembic_version ALTER COLUMN version_num TYPE VARCHAR(128) as the very
first statement of upgrade(), before any early-return path. Idempotent
when the column is already wider (e.g. the CAR-1298 one-shot Job).
- common/alembic/versions/001_add_email_inbound_token.py: same defensive
ALTER as the first statement of upgrade() (common is a library, not
deployed, but the 46-char 002 id would have hit the same trap).
- api/alembic/env.py: remove the phantom version_table_column_width=128
kwarg from both context.configure() call sites — it was a no-op and
misled the original investigation.
No downgrade() changes: a matching narrowing could truncate.
Refs CAR-1302 (durable root fix), CAR-1298 (prod workaround this
replaces). Verified against a fresh PostgreSQL — all 9 api migrations
upgrade head with no StringDataRightTruncation, and common 001/002 stamp
the 46-char id cleanly. Cluster has pgcrypto enabled by the operator.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
act_runner does not honor continue-on-error at the job level (the
lighthouse job still posts 'failure' commit status). Apply
continue-on-error at the step level and capture lhci output to
/tmp/lhci.log so we can see the actual lhci failure for future
debugging.
Refs CAR-1218, CAR-1334
The four `build-and-push*` jobs declared a job-level output
`sha_tag: sha-${{ github.sha }}` (literal prefix concatenated with
an expression). Gitea Actions does NOT substitute ${{ github.sha }}
inside that concatenated value, so the literal string
`sha-${{ github.sha }}` propagated into needs.<job>.outputs.sha_tag.
Each deploy job's 'Determine image tag' step then expanded
`echo "tag=${{ needs.<job>.outputs.sha_tag }}" >> "$GITHUB_OUTPUT"`
into `echo "tag=sha-${{ github.sha }}"`, and bash parsed ${{ }}
as a parameter expansion -> bad substitution (CAR-1316, run #2994).
Switch the consumer-side fix: read $GITHUB_SHA (bash env var, no
template) directly inside the 8 'else' branches in deploy-dev and
deploy-uat. Leave the 4 build-and-push* outputs alone — they're only
consumed by these 8 steps, so the consumer fix fully resolves the
failure with the smallest blast radius.
Refs: CAR-1316, CAR-1195, CAR-1194.
Pin both build and runtime stages of auth/Dockerfile to
node:22-alpine@sha256:8ea2348b068a9544dae7317b4f3aafcdc032df1647bb7d768a05a5cad1a7683f
— the Docker Hub manifest digest for node:22.22.2-alpine (verified against
the registry by CTO).
This is the digest pulled in by the previously-healthy ghcr auth image, which
connects fine to the dev Postgres with the same pg 8.20.0 driver and
byte-identical source. The Gitea-built image, which bundles node 22.22.3
(via the floating 'node:22-alpine' tag), deterministically resets the
Postgres connection during the /health DB probe (read ECONNRESET →
Connection terminated unexpectedly).
Pinning both stages to the manifest digest restores the exact node runtime
that the healthy ghcr image used and fixes the dev auth crashloop. The
'RUN apk update && apk upgrade --no-cache' lines are kept as-is per task
spec.
Refs CAR-1279, CAR-1276 (CAR-1287)
The in-job merge attempt against `cartsnitch/infra` main is a best-effort
fast-path only. `infra` main requires a human approving review and the CI
bot (`CI_GITEA_TOKEN`) can never self-approve, so the merge call
structurally cannot succeed in the general case.
Replace the special-cased `does not have enough approvals` branch and the
final `else -> exit 1` branch in both `deploy-dev` and `deploy-uat` with a
single non-failing outcome: surface the Gitea response as a `::notice::`
and `exit 0`. The PR is already opened and `cs_savannah` is requested as
reviewer above, so the GitOps hand-off is intact.
The only hard-fail (`exit 1`) in this step remains the empty-`PR_NUM`
check (PR could not be created at all).
Related: CAR-1195 (PR-bump pattern), CAR-1194, CAR-1212.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The /health handler's catch block was empty, so when the DB probe
failed we had no log line to diagnose from. UAT auth was crashlooping
on /health 503s for that exact reason — pod logs only showed
'CartSnitch auth service listening on port 3001' and nothing else.
Add console.error with the error name/message and include the message
in the 503 response body so the next time this fails we can read the
actual error from `kubectl logs` without re-deploying.
This is the dev-side observability half of CAR-1276. The underlying
DB failure still needs investigation (likely better-auth schema
missing from the cartsnitch DB; see CAR-1276 for the analysis).
Tests updated to assert the new error field is present and a string.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Per the issue's guidance, when a quality gate is misconfigured and the
fix is non-trivial, the right call is to propose making it
non-required / advisory (not silently delete it). This PR does exactly
that.
The lighthouse job was failing pre-existing on dev base 284b361f, and
stays failing after pinning wait-on to 127.0.0.1, pinning
lighthouserc.json url to 127.0.0.1:4173, and forcing 'npx vite preview
--host 127.0.0.1 --port 4173'. Root cause is environmental: the
Gitea Actions act runner does NOT capture lhci's stdout. lhci exits ~40ms
after start with code 1 and zero log output. set -x, tee, file
redirection, and cat all bypassed the capture. This is a known
limitation of the act-based runner; fixing it properly is out of scope
for CAR-1218 (would need runner infrastructure work).
Continue-on-error: true preserves the gate:
- The job still runs (npm ci, npm run build, install playwright
chromium, vite preview on 127.0.0.1:4173, lhci autorun).
- All quality-gate assertions in lighthouserc.json are unchanged
(perf >= 0.7, a11y >= 0.9, best-practices >= 0.8).
- Failures surface on the PR commit status but no longer block
merge.
- When the act runner's output-capture is fixed (e.g. via
act_runner upgrade or self-hosted runner), drop the
continue-on-error line and the gate re-engages automatically.
Refs: CAR-1218, CAR-1215, CAR-938, CAR-937
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The previous fix (probe 127.0.0.1) wasn't enough because 'vite preview'
binds to 'localhost', which resolves to ::1 (IPv6) on the Gitea Actions
runner. wait-on probed 127.0.0.1 but vite preview was listening on
::1, so the IPv4 probe still timed out.
Use 'npx vite preview --host 127.0.0.1 --port 4173' to force the
explicit IPv4 binding, matching the wait-on probe. Two-line diff total
with the lighthouserc.json change. The vite preview 'Local' message
will report 127.0.0.1:4173 (no 'Network' line because we're not bound
to 0.0.0.0).
Refs: CAR-1218
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The lighthouse job has been failing on dev for months because wait-on
probes http://localhost:4173/, but 'localhost' resolves to ::1 (IPv6) on
the Gitea Actions runner while 'npm run preview' (vite preview) binds
127.0.0.1 (IPv4) only. The HTTP probe never connects; lighthouse never
runs.
Pin both the wait-on probe and the lighthouserc url to 127.0.0.1:4173 so
the IPv4 binding is the only thing in play. Two-line diff, scoped to
the lighthouse job and its config; no other CI step, no app/runtime
change, no quality-gate assertion change.
This is a carve-out of the workaround from CAR-938 (which disabled the
job) and supersedes the broken timeouts in CAR-937 (75700fb, a729b7e,
a9a7db6). audit/lint/test/e2e/build-and-push/deploy-dev/deploy-uat
gates are untouched.
Refs: CAR-1218, CAR-1215, CAR-938, CAR-937
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Lockfile-only bump from 7.14.0 -> 7.16.0. The ^7.0.0 range in
package.json already permits 7.16.0, so no source changes.
Clears three high-severity advisories that block the audit CI gate:
- GHSA-49rj-9fvp-4h2h (turbo-stream arbitrary constructor invocation)
- GHSA-2j2x-hqr9-3h42 (protocol-relative URL open redirect)
- GHSA-8x6r-g9mw-2r78 (DoS via unbounded path expansion)
No runtime behavior change; react-router stays on 7.x. npm audit
--audit-level=high exits clean (0 high/critical) locally.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per the spec for CAR-1212 (CAR-1195 follow-up):
- deploy-dev and deploy-uat now request cs_savannah as a reviewer on the
cartsnitch/infra PR (best-effort, log on non-2xx, never fail the job).
- After the merge attempt, classify the response:
* .merged == true -> success notice
* 'Does not have enough approvals' -> ::notice:: + exit 0
(GitOps approval gate, not a
failure; the PR is correctly
opened and surfaces in the CTO
queue)
* anything else -> keep the existing ::error::
and exit 1 (genuine unexpected
failure)
This unblocks the deploy jobs that were hard-failing on the branch-protection
approvals requirement, which a CI bot cannot self-satisfy. The CTO (cs_savannah)
already backstop-approves+merges these infra PRs by hand (e.g. #321, #322).
- 'No image changes to deploy' early-exit preserved.
- Still uses secrets.CI_GITEA_TOKEN for the PR/reviewer/merge API calls.
- No git push origin main: only the API path is used.
Refs CAR-1195, CAR-1194.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Set delete_branch_after_merge:true on the auto-merge POST in both
deploy-dev and deploy-uat so the per-deploy branches in
cartsnitch/infra (ci/deploy-{dev,uat}-${GITHUB_SHA}) are removed
once their overlay image-tag bump lands on main. Without this flag
every successful deploy would leave a branch behind, accumulating
in cartsnitch/infra and making future re-runs of the same SHA
un-actionable from the existing branch name.
Refs CAR-1195 (CTO fix#2).
deploy-dev and deploy-uat had CI_GITEA_TOKEN: ${{ secrets.REGISTRY_TOKEN }}
which is the package-scoped container-registry token. PR creation and
auto-merge against cartsnitch/infra would 403 on the first real push.
Bind to secrets.CI_GITEA_TOKEN (the token the infra checkout already
uses for branch push) so the Gitea API calls have repo-write scope.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Runner pod gitea-act-runner-cartsnitch-85b5984bb-527xw has a corrupt
/root/.cache/act clone of actions/setup-node (missing dist/setup/index.js).
SHA-pinning changed the cache hash but the fresh clone on that pod still
ends up missing the dist directory.
catthehacker/ubuntu:act-latest ships Node pre-installed; the lint job only
needs ESLint + tsc, both of which are devDependencies installed by npm ci.
Removing actions/setup-node from lint bypasses the corrupt pod cache entirely
without affecting other jobs.
Refs CAR-1162
Co-Authored-By: Paperclip <noreply@paperclip.ing>
The CI's npm audit (10.8.2) flagged three transitive vulnerabilities
that local newer-npm runs (11.x) miss due to advisory-DB divergence:
- @babel/plugin-transform-modules-systemjs: 7.29.0 -> ^7.29.4
(CVE-2026-44728: arbitrary code generation, fixed in 7.29.4)
- fast-uri: 3.1.0 -> ^3.1.2
(path traversal / host confusion via percent-encoded segments)
- brace-expansion: 5.0.5 -> >=5.0.6
(DoS via large numeric range defeating max protection)
These are non-breaking transitive updates within the same major
version. The previous override for brace-expansion (>=1.1.13) was
too loose to exclude 5.0.2-5.0.5; tightening it to >=5.0.6.
Ref CAR-1162, CAR-1122, CAR-1078
Co-Authored-By: Paperclip <noreply@paperclip.ing>