diff --git a/docs/uat-receipt-submission.md b/docs/uat-receipt-submission.md
new file mode 100644
index 0000000..4b1ca3d
--- /dev/null
+++ b/docs/uat-receipt-submission.md
@@ -0,0 +1,244 @@
+# UAT Receipt Submission Path
+
+**Issue:** [CAR-812](/CAR/issues/CAR-812)
+**Author:** Barcode Betty
+**Date:** 2026-05-04
+
+---
+
+## Overview
+
+The UAT environment supports receipt submission via **inbound email**. This is the only supported submission method in UAT — there is no public REST API surface for receipt ingestion.
+
+---
+
+## How It Works
+
+### Architecture
+
+```
+User composes email
+  ↓
+Email sent to @cartsnitch..farh.net
+  ↓
+Mailgun webhook receives the email
+  ↓
+Email job enqueued to DragonflyDB stream: email:receipts
+  ↓
+email-worker (ReceiptWitness) consumes the job
+  ↓
+Worker resolves user via email_inbound_token lookup in DB
+  ↓
+Retailer detected from email content (meijer / kroger / target)
+  ↓
+Email parsed into Purchase + PurchaseItem records
+  ↓
+receipt.ingested event published to Redis
+  ↓
+MatchResult created with method=upc, confidence=1.0 for known UPCs
+```
+
+### Key Components
+
+| Component | Location | Role |
+|-----------|----------|------|
+| `users.email_inbound_token` | DB (migration `001_add_email_inbound_token`) | 22-char unique token per user; used as email routing identifier |
+| `email:receipts` stream | DragonflyDB | Queue holding pending email jobs |
+| `email-worker` | `receiptwitness/src/receiptwitness/worker/email_worker.py` | Async worker consuming the stream |
+| `BaseEmailParser` | `receiptwitness/src/receiptwitness/parsers/email/base.py` | Abstract parser; subclasses for meijer/kroger/target |
+| Retailer detectors | `receiptwitness/src/receiptwitness/parsers/email/detector.py` | Sifts sender/subject to pick the right parser |
+
+### Email Address Format
+
+Each user is assigned a unique inbound token.
The receipt submission email address is shown in **Settings → Receipt Email** in the UI:
+
+**Address:** `receipts+@receipts.cartsnitch.com`
+
+To find a user's token in the UAT database (requires `kubectl` access to `cartsnitch-uat`):
+
+```bash
+kubectl exec -n cartsnitch-uat deployment/cartsnitch-api -- \
+  python -c "from cartsnitch_common.database import get_sync_session; \
+  from cartsnitch_common.models.user import User; \
+  from sqlalchemy import select; \
+  s = get_sync_session('postgresql://cartsnitch:cartsnitch@cartsnitch-pg-rw:5432/cartsnitch'); \
+  u = s.execute(select(User).where(User.email=='dottie@example.com')).scalar_one(); \
+  print(u.email_inbound_token)"
+```
+
+---
+
+## Submitting a Test Receipt (Step-by-Step)
+
+### Prerequisites
+
+- A test user account in UAT with a known `email_inbound_token`
+- A sample receipt email with a **known UPC** from the seeded `normalized_products` table
+
+### Steps
+
+1. **Obtain the test user's inbound token.**
+   Use the UAT Settings → Receipt Email page in the UI to see the full address `receipts+@receipts.cartsnitch.com`, or query the DB directly (see above).
+
+2. **Compose the email.**
+   Send to: the address shown in Settings → Receipt Email
+   Subject: anything
+   Body: plain-text or HTML receipt content
+
+3. **Expected behavior after email is processed:**
+   - A `Purchase` row is created in `purchases`
+   - `PurchaseItem` rows are created with `upc` matching the seeded product UPC
+   - A `MatchResult` is created with `method='upc'` and `confidence=1.0`
+
+---
+
+## Known UPC for Dottie (from UAT seed)
+
+> **NOTE:** `kubectl` is not available in this execution environment. The UAT seed and DB query could not be executed. The sample receipt below uses a plausible placeholder UPC. Before Dottie runs the regression:
+> 1. Run `bash scripts/seed-env.sh uat` from a machine with UAT kubecontext
+> 2.
Query: `SELECT id, canonical_name, upc_variants->0->>'upc' AS sample_upc FROM normalized_products WHERE jsonb_array_length(upc_variants) > 0 LIMIT 1;`
+> 3. Replace the placeholder values below with the real captured row
+
+- `id`: **TBD — run seed and query UAT DB**
+- `name`: **TBD — run seed and query UAT DB**
+- `sample UPC`: **TBD — run seed and query UAT DB**
+
+### Meijer Sample Receipt (plain text)
+
+```
+Meijer
+===================================
+Purchase Date: 03/15/2026
+Store: Meijer #127 - Ann Arbor, MI
+-----------------------------------
+ 1 x Organic Whole Milk 1gal $4.99
+ 1 x Whole Wheat Bread $3.29
+ 1 x Bananas (2 lb) $0.67
+ 1 x Chicken Breast (3 lb) $12.47
+ 1 x Cheddar Cheese Block 8oz $5.99
+-----------------------------------
+Subtotal: $27.41
+Tax: $1.93
+Total: $29.34
+===================================
+THANK YOU FOR SHOPPING MEIJER
+===================================
+```
+
+> **Note:** The `email-worker` parses the email body and extracts line items by retailer. The exact format and field mapping depend on the retailer parser. For Meijer, the parser looks for item lines matching `(\d+) x (.+?)\s+\$([\d.]+)`. UPCs in the `upc_variants` JSONB of seeded products will be matched during the normalization step.
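
As a rough illustration of the item-line pattern quoted in the note above, the sketch below applies that regular expression to a shortened copy of the Meijer sample. It is not the actual `meijer.py` parser: the function name `parse_item_lines`, the `SAMPLE` text, and the choice of `Decimal` for prices are assumptions made for this example only.

```python
import re
from decimal import Decimal

# Pattern quoted in the note above: quantity, " x ", item name, then "$price".
ITEM_LINE = re.compile(r"(\d+) x (.+?)\s+\$([\d.]+)")


def parse_item_lines(body: str) -> list[tuple[int, str, Decimal]]:
    """Return (quantity, name, price) for each line that matches the pattern.

    Hypothetical helper for illustration; the real parser may differ.
    """
    items = []
    for line in body.splitlines():
        match = ITEM_LINE.search(line)
        if match:
            qty, name, price = match.groups()
            items.append((int(qty), name.strip(), Decimal(price)))
    return items


# Shortened copy of the Meijer sample receipt (assumption: plain-text body).
SAMPLE = """\
Meijer
Purchase Date: 03/15/2026
 1 x Organic Whole Milk 1gal $4.99
 1 x Chicken Breast (3 lb) $12.47
Subtotal: $27.41
"""

for item in parse_item_lines(SAMPLE):
    print(item)
```

Header and total lines such as `Subtotal: $27.41` are skipped because they lack the `<qty> x` prefix. Note that the Kroger and Target samples omit the ` x ` separator, so this pattern would not match their item lines.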
+
+### Kroger Sample Receipt (plain text)
+
+```
+KROGER
+===================================
+Purchase Date: 03/15/2026
+Store: KROGER #412 - Ann Arbor MI
+-----------------------------------
+ 1 Organic Whole Milk 1gal $5.29
+ 1 Whole Wheat Bread $3.49
+ 1 Bananas (2 lb) $0.69
+ 1 Chicken Breast (3 lb) $11.99
+ 1 Sharp Cheddar Cheese 8oz $4.99
+-----------------------------------
+Subtotal: $26.45
+Tax: $1.85
+Total: $28.30
+===================================
+```
+
+### Target Sample Receipt (plain text)
+
+```
+TARGET
+===================================
+03/15/2026 14:32
+Store: 0874 Ann Arbor, MI
+===================================
+ 1 Organic Whole Milk 1G $5.49
+ 1 Whole Wheat Bread $3.29
+ 1 Bananas LB 2 $0.68
+ 1 Chicken Breast 3# $12.99
+ 1 Cheddar Cheese 8OZ $5.79
+-----------------------------------
+Subtotal: $28.24
+Tax (6%): $1.69
+Total: $29.93
+===================================
+```
+
+---
+
+## Troubleshooting
+
+### Email not processed
+
+1. Check that the `email:receipts` stream has messages:
+   ```bash
+   # A multi-line script is used because `async def` cannot follow `;` on one line.
+   kubectl exec -n cartsnitch-uat deploy/email-worker -- python -c "
+   import asyncio
+   from receiptwitness.queue.email import get_redis
+
+   async def chk():
+       c = await get_redis()
+       info = await c.xinfo_stream('email:receipts')
+       print(info)
+
+   asyncio.run(chk())
+   "
+   ```
+
+2. Check `email-worker` logs for retailer detection failures:
+   ```bash
+   kubectl logs -n cartsnitch-uat deploy/email-worker -f
+   ```
+
+3.
Verify the token resolves to a user in the DB:
+   ```bash
+   kubectl exec -n cartsnitch-uat deploy/cartsnitch-api -- \
+     python -c "from cartsnitch_common.database import get_sync_session; \
+     from cartsnitch_common.models.user import User; \
+     from sqlalchemy import select; \
+     s = get_sync_session('postgresql://...'); \
+     r = s.execute(select(User.email_inbound_token).limit(5)).all(); \
+     print(r)"
+   ```
+
+### No MatchResult created
+
+The normalization pipeline requires a `normalized_products` row with the submitted UPC in `upc_variants`. If the seed was run, the product should be found. Check the `match_results` table after submission:
+
+```sql
+SELECT mr.*, np.canonical_name
+FROM match_results mr
+JOIN normalized_products np ON np.id = mr.normalized_product_id
+WHERE mr.match_method = 'upc'
+ORDER BY mr.created_at DESC
+LIMIT 10;
+```
+
+---
+
+## Related Files
+
+| File | Role |
+|------|------|
+| `common/alembic/versions/001_add_email_inbound_token.py` | Adds `email_inbound_token` column |
+| `receiptwitness/src/receiptwitness/worker/email_worker.py` | Consumes email jobs from stream |
+| `receiptwitness/src/receiptwitness/queue/email.py` | DragonflyDB stream consumer group |
+| `receiptwitness/src/receiptwitness/parsers/email/detector.py` | Retailer detection |
+| `receiptwitness/src/receiptwitness/parsers/email/meijer.py` | Meijer email parser |
+| `receiptwitness/src/receiptwitness/parsers/email/kroger.py` | Kroger email parser |
+| `receiptwitness/src/receiptwitness/parsers/email/target.py` | Target email parser |
+| `docs/uat-runbook.md` | UAT runbook (defect classification, entry/exit criteria) |
\ No newline at end of file
diff --git a/scripts/apply-seed-job.sh b/scripts/apply-seed-job.sh
new file mode 100755
index 0000000..abba425
--- /dev/null
+++ b/scripts/apply-seed-job.sh
@@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+# =============================================================================
+# apply-seed-job.sh — Apply the seed Job manifest
for a given environment.
+#
+# Usage:
+# ./apply-seed-job.sh
+#
+# Example:
+# ./apply-seed-job.sh uat
+# ./apply-seed-job.sh dev
+# =============================================================================
+
+set -euo pipefail
+
+ENV="${1:-}"
+HELP_FLAG=""
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --help) HELP_FLAG="1"; shift ;;
+    *) ENV="$1"; shift ;;
+  esac
+done
+
+if [[ -n "$HELP_FLAG" ]] || [[ -z "$ENV" ]]; then
+  echo "Usage: $0 "
+  echo " env dev or uat"
+  exit 0
+fi
+
+if [[ "$ENV" != "dev" && "$ENV" != "uat" ]]; then
+  echo "ERROR: Invalid environment: $ENV (must be 'dev' or 'uat')" >&2
+  exit 1
+fi
+
+SCRIPT_DIR="$(dirname "$0")"
+sed "s/__ENV__/${ENV}/g" "${SCRIPT_DIR}/seed-env-job.yaml" | kubectl apply -f -
+echo "Seed job applied for environment: $ENV"
\ No newline at end of file
diff --git a/scripts/seed-dev-job.yaml b/scripts/seed-dev-job.yaml
index 2d5cc86..76a1c34 100644
--- a/scripts/seed-dev-job.yaml
+++ b/scripts/seed-dev-job.yaml
@@ -58,4 +58,4 @@ spec:
               memory: 256Mi
             limits:
               cpu: 500m
-              memory: 512Mi
+              memory: 512Mi
\ No newline at end of file
diff --git a/scripts/seed-dev.sh b/scripts/seed-dev.sh
index a478015..b68f67b 100755
--- a/scripts/seed-dev.sh
+++ b/scripts/seed-dev.sh
@@ -1,104 +1,3 @@
 #!/usr/bin/env bash
-# =============================================================================
-# seed-dev.sh — Run the CartSnitch seed runner against the dev database.
-#
-# Usage:
-# ./seed-dev.sh Run full seed against dev
-# ./seed-dev.sh --dry-run Show planned record counts without writing
-# ./seed-dev.sh --help Show this help
-#
-# Prerequisites:
-# - kubectl configured for the cartsnitch-dev cluster
-# - Namespace cartsnitch-dev exists (CNPG Postgres must be running)
-#
-# What it does:
-# 1. Starts a background port-forward to cartsnitch-pg-rw:5432
-# 2. Waits for the tunnel to be ready
-# 3. Runs python -m cartsnitch_common.seed with --database-url pointing
-# to localhost:/cartsnitch
-# 4.
Cleans up the port-forward on exit (normal, interrupt, or error)
-# =============================================================================
-
-set -euo pipefail
-
-# --- Config -------------------------------------------------------------------
-readonly NAMESPACE="cartsnitch-dev"
-readonly SVC_NAME="cartsnitch-pg-rw"
-readonly LOCAL_PORT="5433" # use a non-privileged port to avoid conflicts
-readonly DB_NAME="cartsnitch"
-readonly PG_USER="cartsnitch"
-# Retrieve password from the CNPG credentials secret
-readonly PG_PASSWORD="$(
-  kubectl get secret cartsnitch-pg-credentials \
-    -n "$NAMESPACE" \
-    -o jsonpath='{.data.password}' \
-    | base64 -d
-)"
-readonly DB_URL="postgresql://${PG_USER}:${PG_PASSWORD}@localhost:${LOCAL_PORT}/${DB_NAME}"
-
-# --- Helpers ------------------------------------------------------------------
-log() { echo "[seed-dev] $*"; }
-fail() { log "ERROR: $*" >&2; exit 1; }
-
-# Cleanup port-forward and exit.
-cleanup() {
-  if [[ -n "${PF_PID:-}" ]]; then
-    log "Stopping port-forward (PID $PF_PID)..."
-    kill "$PF_PID" 2>/dev/null || true
-    wait "$PF_PID" 2>/dev/null || true
-  fi
-}
-trap cleanup EXIT
-
-# --- Args ---------------------------------------------------------------------
-DRY_RUN=""
-HELP_FLAG=""
-
-while [[ $# -gt 0 ]]; do
-  case "$1" in
-    --dry-run) DRY_RUN="--dry-run"; shift ;;
-    --help) HELP_FLAG="1"; shift ;;
-    *) fail "Unknown argument: $1";;
-  esac
-done
-
-if [[ -n "$HELP_FLAG" ]]; then
-  sed -n '3,/^# ---/p' "$0" | head -n -1 | sed 's/^# //'
-  echo ""
-  echo "Additional arguments are passed through to the seed runner."
-  echo "Common seed-runner options:"
-  echo " --dry-run Show planned record counts without writing"
-  echo " --seed N Set random seed (default: 42)"
-  exit 0
-fi
-
-# --- Prerequisites ------------------------------------------------------------
-if ! command -v kubectl &>/dev/null; then
-  fail "kubectl not found — must be installed and configured."
-fi
-
-# --- Port-forward -------------------------------------------------------------
-log "Starting port-forward ${SVC_NAME}:5432 -> localhost:${LOCAL_PORT} ..."
-kubectl port-forward \
-  -n "$NAMESPACE" \
-  svc/"$SVC_NAME" \
-  "${LOCAL_PORT}:5432" \
-  &>/dev/null &
-PF_PID=$!
-
-# Give the tunnel a moment to establish
-sleep 2
-
-# Verify the tunnel is up
-if ! kill -0 "$PF_PID" 2>/dev/null; then
-  fail "Port-forward failed to start."
-fi
-log "Port-forward active (PID $PF_PID) on localhost:${LOCAL_PORT}"
-
-# --- Seed --------------------------------------------------------------------
-log "Running seed against dev database..."
-set -x
-python -m cartsnitch_common.seed --database-url "$DB_URL" $DRY_RUN
-set +x
-
-log "Done."
+# Backward-compat wrapper — delegates to seed-env.sh dev
+exec "$(dirname "$0")/seed-env.sh" dev "$@"
\ No newline at end of file
diff --git a/scripts/seed-env-job.yaml b/scripts/seed-env-job.yaml
new file mode 100644
index 0000000..dbd83f9
--- /dev/null
+++ b/scripts/seed-env-job.yaml
@@ -0,0 +1,58 @@
+# seed-env-job.yaml
+# K8s Job to run the CartSnitch seed runner against any CartSnitch database.
+#
+# Usage (via apply-seed-job.sh):
+# bash scripts/apply-seed-job.sh dev
+# bash scripts/apply-seed-job.sh uat
+#
+# To view logs:
+# kubectl logs -n cartsnitch- job/seed-env -f
+#
+# To re-run after fixing issues:
+# kubectl delete -f - -n cartsnitch- && bash scripts/apply-seed-job.sh
+#
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: seed-env
+  namespace: cartsnitch-__ENV__
+  labels:
+    app: cartsnitch
+    component: seed
+    environment: __ENV__
+  annotations:
+    description: "Runs cartsnitch-common seed runner to populate __ENV__ database with realistic test data."
+spec:
+  backoffLimit: 0
+  concurrencyPolicy: Forbid
+  template:
+    metadata:
+      labels:
+        app: cartsnitch
+        component: seed
+        environment: __ENV__
+    spec:
+      restartPolicy: Never
+      containers:
+        - name: seed
+          image: python:3.12-slim
+          command:
+            - sh
+            - -c
+            - |
+              pip install --no-cache-dir "cartsnitch-common @ git+https://github.com/cartsnitch/common.git@main" && \
+              python -m cartsnitch_common.seed --database-url "$${DATABASE_URL}"
+          env:
+            - name: DATABASE_URL
+              valueFrom:
+                secretKeyRef:
+                  name: cartsnitch-secrets
+                  key: database-url-pg
+                  optional: false
+          resources:
+            requests:
+              cpu: 100m
+              memory: 256Mi
+            limits:
+              cpu: 500m
+              memory: 512Mi
\ No newline at end of file
diff --git a/scripts/seed-env.sh b/scripts/seed-env.sh
new file mode 100755
index 0000000..93c083d
--- /dev/null
+++ b/scripts/seed-env.sh
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+# =============================================================================
+# seed-env.sh — Run the CartSnitch seed runner against any CartSnitch database.
+#
+# Usage:
+# ./seed-env.sh [--env dev|uat] [--dry-run] [--help]
+# ./seed-env.sh uat --dry-run Run dry-run against UAT
+# ./seed-env.sh dev Run full seed against dev (default)
+#
+# Prerequisites:
+# - kubectl configured for the target cluster
+# - Namespace cartsnitch- exists (CNPG Postgres must be running)
+#
+# What it does:
+# 1. Starts a background port-forward to cartsnitch-pg-rw:5432
+# 2. Waits for the tunnel to be ready
+# 3. Runs python -m cartsnitch_common.seed with --database-url pointing
+# to localhost:/cartsnitch
+# 4.
Cleans up the port-forward on exit (normal, interrupt, or error)
+# =============================================================================
+
+set -euo pipefail
+
+# --- Config -------------------------------------------------------------------
+ENV="dev"
+if [[ "${1:-}" == "dev" || "${1:-}" == "uat" ]]; then
+  ENV="$1"; shift
+fi
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --env) ENV="$2"; shift 2 ;;
+    --dry-run|--help) break ;;
+    *) break ;;
+  esac
+done
+
+NAMESPACE="cartsnitch-${ENV}"
+SVC_NAME="cartsnitch-pg-rw"
+LOCAL_PORT="5433"
+DB_NAME="cartsnitch"
+PG_USER="cartsnitch"
+PG_PASSWORD="$(
+  kubectl get secret cartsnitch-pg-credentials \
+    -n "$NAMESPACE" \
+    -o jsonpath='{.data.password}' \
+    | base64 -d
+)"
+DB_URL="postgresql://${PG_USER}:${PG_PASSWORD}@localhost:${LOCAL_PORT}/${DB_NAME}"
+
+# --- Helpers ------------------------------------------------------------------
+log() { echo "[seed-env] [$ENV] $*"; }
+fail() { log "ERROR: $*" >&2; exit 1; }
+
+cleanup() {
+  if [[ -n "${PF_PID:-}" ]]; then
+    log "Stopping port-forward (PID $PF_PID)..."
+    kill "$PF_PID" 2>/dev/null || true
+    wait "$PF_PID" 2>/dev/null || true
+  fi
+}
+trap cleanup EXIT
+
+# --- Args ---------------------------------------------------------------------
+DRY_RUN=""
+HELP_FLAG=""
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --dry-run) DRY_RUN="--dry-run"; shift ;;
+    --help) HELP_FLAG="1"; shift ;;
+    *) fail "Unknown argument: $1";;
+  esac
+done
+
+if [[ -n "$HELP_FLAG" ]]; then
+  echo "Usage: $0 [--env dev|uat] [--dry-run] [--help]"
+  echo ""
+  echo "Positional / keyword arguments:"
+  echo " --env dev|uat Target environment (default: dev)"
+  echo " --dry-run Show planned record counts without writing"
+  echo " --help Show this help"
+  echo ""
+  echo "Additional arguments are passed through to the seed runner."
+  echo "Common seed-runner options:"
+  echo " --seed N Set random seed (default: 42)"
+  exit 0
+fi
+
+# --- Validate env --------------------------------------------------------------
+if [[ "$ENV" != "dev" && "$ENV" != "uat" ]]; then
+  fail "Invalid environment: $ENV (must be 'dev' or 'uat')"
+fi
+
+# --- Prerequisites ------------------------------------------------------------
+if ! command -v kubectl &>/dev/null; then
+  fail "kubectl not found — must be installed and configured."
+fi
+
+# --- Port-forward -------------------------------------------------------------
+log "Starting port-forward ${SVC_NAME}:5432 -> localhost:${LOCAL_PORT} ..."
+kubectl port-forward \
+  -n "$NAMESPACE" \
+  svc/"$SVC_NAME" \
+  "${LOCAL_PORT}:5432" \
+  &>/dev/null &
+PF_PID=$!
+
+sleep 2
+
+if ! kill -0 "$PF_PID" 2>/dev/null; then
+  fail "Port-forward failed to start."
+fi
+log "Port-forward active (PID $PF_PID) on localhost:${LOCAL_PORT}"
+
+# --- Seed --------------------------------------------------------------------
+log "Running seed against ${ENV} database..."
+set -x
+python -m cartsnitch_common.seed --database-url "$DB_URL" $DRY_RUN
+set +x
+
+log "Done."
\ No newline at end of file
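
Review note: `seed-env.sh` waits a fixed `sleep 2` and then only checks that the `kubectl port-forward` process is still alive, which does not confirm that the local port is accepting connections. One possible alternative, sketched here in Python because the seed runner is Python, is to poll the port until a TCP connection succeeds. The helper name `wait_for_port` and its default timeout and interval are assumptions for illustration and are not part of this change.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses.

    Hypothetical helper; returns True once the port accepts a connection,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused) on failure.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A caller could run `wait_for_port("localhost", 5433)` before starting the seed and abort if it returns `False`. This only confirms that the tunnel's local listener is up, not that Postgres behind it is ready to serve queries.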