feat: parameterize seed tooling for UAT + document UAT receipt-submission path (#243)

savannah-savings-cto[bot]
2026-05-04 21:43:56 +00:00
committed by GitHub
6 changed files with 465 additions and 104 deletions
+244
@@ -0,0 +1,244 @@
# UAT Receipt Submission Path
**Issue:** [CAR-812](/CAR/issues/CAR-812)
**Author:** Barcode Betty
**Date:** 2026-05-04
---
## Overview
The UAT environment supports receipt submission via **inbound email**. This is the only supported submission method in UAT — there is no public REST API surface for receipt ingestion.
---
## How It Works
### Architecture
```
User composes email
Email sent to <user_token>@cartsnitch.<env>.farh.net
Mailgun webhook receives the email
Email job enqueued to DragonflyDB stream: email:receipts
email-worker (ReceiptWitness) consumes the job
Worker resolves user via email_inbound_token lookup in DB
Retailer detected from email content (meijer / kroger / target)
Email parsed into Purchase + PurchaseItem records
receipt.ingested event published to Redis
MatchResult created with method=upc, confidence=1.0 for known UPCs
```
### Key Components
| Component | Location | Role |
|-----------|----------|------|
| `users.email_inbound_token` | DB (migration `001_add_email_inbound_token`) | 22-char unique token per user; used as email routing identifier |
| `email:receipts` stream | DragonflyDB | Queue holding pending email jobs |
| `email-worker` | `receiptwitness/src/receiptwitness/worker/email_worker.py` | Async worker consuming the stream |
| `BaseEmailParser` | `receiptwitness/src/receiptwitness/parsers/email/base.py` | Abstract parser; subclasses for meijer/kroger/target |
| Retailer detectors | `receiptwitness/src/receiptwitness/parsers/email/detector.py` | Inspects the sender and subject to pick the right parser |
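
The detector's role in the table above can be illustrated with a minimal sketch. The keyword table, the function name `detect_retailer`, and the sample sender addresses are all hypothetical; the real rules live in `detector.py` and may differ.

```python
from typing import Optional

# Hypothetical keyword table; the real detection rules live in detector.py.
RETAILER_KEYWORDS = {
    "meijer": ("meijer",),
    "kroger": ("kroger",),
    "target": ("target",),
}


def detect_retailer(sender: str, subject: str) -> Optional[str]:
    """Return the first retailer whose keyword appears in the sender or subject."""
    haystack = f"{sender} {subject}".lower()
    for retailer, keywords in RETAILER_KEYWORDS.items():
        if any(keyword in haystack for keyword in keywords):
            return retailer
    return None  # no parser can be selected for this email


if __name__ == "__main__":
    # Placeholder sender addresses, not real retailer mailboxes.
    print(detect_retailer("receipts@meijer.example", "Your receipt"))
    print(detect_retailer("someone@example.com", "Hello"))
```

A `None` result corresponds to the "retailer detection failure" case that the troubleshooting steps look for in the worker logs.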
### Email Address Format
Each user is assigned a unique inbound token. The receipt submission email address is shown in **Settings → Receipt Email** in the UI:
**Address:** `receipts+<email_inbound_token>@receipts.cartsnitch.com`
To find a user's token in the UAT database (requires `kubectl` access to `cartsnitch-uat`):
```bash
kubectl exec -n cartsnitch-uat deployment/cartsnitch-api -- \
python -c "from cartsnitch_common.database import get_sync_session; \
from cartsnitch_common.models.user import User; \
from sqlalchemy import select; \
s = get_sync_session('postgresql://cartsnitch:cartsnitch@cartsnitch-pg-rw:5432/cartsnitch'); \
u = s.execute(select(User).where(User.email=='dottie@example.com')).scalar_one(); \
print(u.email_inbound_token)"
```
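
The `receipts+<email_inbound_token>@receipts.cartsnitch.com` format can be built and parsed with a few lines of Python. This is a sketch only: the helper names are hypothetical, and generating the token with `secrets.token_urlsafe(16)` is an assumption that happens to produce the documented 22 characters, not a statement of how the migration generates it.

```python
import secrets
from typing import Optional

RECEIPT_DOMAIN = "receipts.cartsnitch.com"  # from the address format above


def new_token() -> str:
    # Assumption: 16 random bytes, base64url-encoded, gives a 22-character token.
    return secrets.token_urlsafe(16)


def build_address(token: str) -> str:
    return f"receipts+{token}@{RECEIPT_DOMAIN}"


def extract_token(address: str) -> Optional[str]:
    """Return the inbound token, or None if the address has another shape."""
    local, _, domain = address.rpartition("@")
    prefix, _, token = local.partition("+")
    if domain.lower() != RECEIPT_DOMAIN or prefix != "receipts" or not token:
        return None
    return token


if __name__ == "__main__":
    token = new_token()
    address = build_address(token)
    print(address, extract_token(address) == token)
```

Returning `None` for a malformed address mirrors the case where the worker cannot resolve a user from the recipient.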
---
## Submitting a Test Receipt (Step-by-Step)
### Prerequisites
- A test user account in UAT with a known `email_inbound_token`
- A sample receipt email with a **known UPC** from the seeded `normalized_products` table
### Steps
1. **Obtain the test user's inbound token.**
Use the UAT Settings → Receipt Email page in the UI to see the full address `receipts+<token>@receipts.cartsnitch.com`, or query the DB directly (see above).
2. **Compose the email.**
Send to: the address shown in Settings → Receipt Email
Subject: anything
Body: plain-text or HTML receipt content
3. **Expected behavior after email is processed:**
- A `Purchase` row is created in `purchases`
- `PurchaseItem` rows are created with `upc` matching the seeded product UPC
- A `MatchResult` is created with `method='upc'` and `confidence=1.0`
---
## Known UPC for Dottie (from UAT seed)
> **NOTE:** `kubectl` was not available in the environment where this document was written, so the UAT seed and DB query could not be executed. The product details below are therefore placeholders, and the sample receipts do not yet carry a real seeded UPC. Before Dottie runs the regression:
> 1. Run `bash scripts/seed-env.sh uat` from a machine with UAT kubecontext
> 2. Query: `SELECT id, canonical_name, upc_variants->0->>'upc' AS sample_upc FROM normalized_products WHERE jsonb_array_length(upc_variants) > 0 LIMIT 1;`
> 3. Replace the placeholder values below with the real captured row
- `id`: **TBD — run seed and query UAT DB**
- `name`: **TBD — run seed and query UAT DB**
- `sample UPC`: **TBD — run seed and query UAT DB**
### Meijer Sample Receipt (plain text)
```
Meijer
===================================
Purchase Date: 03/15/2026
Store: Meijer #127 - Ann Arbor, MI
-----------------------------------
1 x Organic Whole Milk 1gal $4.99
1 x Whole Wheat Bread $3.29
1 x Bananas (2 lb) $0.67
1 x Chicken Breast (3 lb) $12.47
1 x Cheddar Cheese Block 8oz $5.99
-----------------------------------
Subtotal: $27.41
Tax: $1.93
Total: $29.34
===================================
THANK YOU FOR SHOPPING MEIJER
===================================
```
> **Note:** The `email-worker` parses the email body and extracts line items by retailer. The exact format and field mapping depend on the retailer parser. For Meijer, the parser looks for item lines matching `(\d+) x (.+?)\s+\$([\d.]+)`. UPCs in the `upc_variants` JSONB of seeded products will be matched during the normalization step.
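
To see what the Meijer item-line pattern quoted above captures, the sketch below applies it to the item lines of the sample receipt. Only the regular expression comes from this document; `parse_items` is a hypothetical helper, and treating the captured amount as the line price is an assumption.

```python
import re
from decimal import Decimal

# Pattern quoted in the note above: quantity, item name, price.
ITEM_LINE = re.compile(r"(\d+) x (.+?)\s+\$([\d.]+)")

SAMPLE = """\
1 x Organic Whole Milk 1gal $4.99
1 x Whole Wheat Bread $3.29
1 x Bananas (2 lb) $0.67
1 x Chicken Breast (3 lb) $12.47
1 x Cheddar Cheese Block 8oz $5.99
Subtotal: $27.41
"""


def parse_items(body: str):
    """Return (quantity, name, price) tuples for lines that match the pattern."""
    items = []
    for line in body.splitlines():
        match = ITEM_LINE.match(line.strip())
        if match:
            qty, name, price = match.groups()
            items.append((int(qty), name, Decimal(price)))
    return items


if __name__ == "__main__":
    items = parse_items(SAMPLE)
    # Assumption: the captured amount is the line price, so the prices sum to the subtotal.
    print(len(items), sum(price for _, _, price in items))
```

The `Subtotal` line is skipped because it does not start with a quantity followed by ` x `, so only the five item lines are returned.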
### Kroger Sample Receipt (plain text)
```
KROGER
===================================
Purchase Date: 03/15/2026
Store: KROGER #412 - Ann Arbor MI
-----------------------------------
1 Organic Whole Milk 1gal $5.29
1 Whole Wheat Bread $3.49
1 Bananas (2 lb) $0.69
1 Chicken Breast (3 lb) $11.99
1 Sharp Cheddar Cheese 8oz $4.99
-----------------------------------
Subtotal: $26.45
Tax: $1.85
Total: $28.30
===================================
```
### Target Sample Receipt (plain text)
```
TARGET
===================================
03/15/2026 14:32
Store: 0874 Ann Arbor, MI
===================================
1 Organic Whole Milk 1G $5.49
1 Whole Wheat Bread $3.29
1 Bananas LB 2 $0.68
1 Chicken Breast 3# $12.99
1 Cheddar Cheese 8OZ $5.79
-----------------------------------
Subtotal: $28.24
Tax (6%): $1.69
Total: $29.93
===================================
```
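
Before sending a sample receipt, it can help to confirm that its footer is internally consistent, since a typo in a hand-edited sample is easy to mistake for a parser defect. The sketch below checks that subtotal plus tax equals the total for the Kroger and Target footers above. The helper names are hypothetical and this check is not part of `receiptwitness`.

```python
import re
from decimal import Decimal

# Matches "Subtotal: $x", "Tax: $x", "Tax (6%): $x" and "Total: $x".
AMOUNT = re.compile(r"^(Subtotal|Tax|Total)[^:]*:\s*\$([\d.]+)$")

KROGER_FOOTER = """\
Subtotal: $26.45
Tax: $1.85
Total: $28.30
"""

TARGET_FOOTER = """\
Subtotal: $28.24
Tax (6%): $1.69
Total: $29.93
"""


def footer_totals(body: str) -> dict:
    """Collect the subtotal, tax and total amounts from a receipt body."""
    totals = {}
    for line in body.splitlines():
        match = AMOUNT.match(line.strip())
        if match:
            totals[match.group(1).lower()] = Decimal(match.group(2))
    return totals


def is_consistent(body: str) -> bool:
    totals = footer_totals(body)
    return totals["subtotal"] + totals["tax"] == totals["total"]


if __name__ == "__main__":
    print(is_consistent(KROGER_FOOTER), is_consistent(TARGET_FOOTER))
```

`Decimal` is used instead of `float` so that cent amounts compare exactly.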
---
## Troubleshooting
### Email not processed
1. Check the `email:receipts` stream has messages:
```bash
kubectl exec -n cartsnitch-uat deploy/email-worker -- python -c '
import asyncio
from receiptwitness.queue.email import get_redis
async def chk():
    c = await get_redis()
    print(await c.xinfo_stream("email:receipts"))
asyncio.run(chk())'
```
2. Check `email-worker` logs for retailer detection failures:
```bash
kubectl logs -n cartsnitch-uat deploy/email-worker -f
```
3. Verify the token resolves to a user in the DB:
```bash
kubectl exec -n cartsnitch-uat deploy/cartsnitch-api -- \
python -c "from cartsnitch_common.database import get_sync_session; \
from cartsnitch_common.models.user import User; \
from sqlalchemy import select; \
s = get_sync_session('postgresql://...'); \
r = s.execute(select(User.email_inbound_token).limit(5)).all(); \
print(r)"
```
### No MatchResult created
The normalization pipeline requires a `normalized_product` row with the submitted UPC in `upc_variants`. If the seed was run, the product should be found. Check the `match_results` table after submission:
```sql
SELECT mr.*, np.canonical_name
FROM match_results mr
JOIN normalized_products np ON np.id = mr.normalized_product_id
WHERE mr.match_method = 'upc'
ORDER BY mr.created_at DESC
LIMIT 10;
```
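
The lookup that the normalization step depends on can be pictured as follows. This is a sketch under stated assumptions: `upc_variants` is taken to be a list of objects with a `upc` key (as the seed query's `upc_variants->0->>'upc'` suggests), `match_by_upc` is a hypothetical helper, and the UPC and product shown are placeholders rather than seeded values.

```python
from typing import Optional

# Placeholder product; real ids, names and UPCs come from the UAT seed.
PRODUCTS = [
    {
        "id": 1,
        "canonical_name": "Placeholder Product",
        "upc_variants": [{"upc": "000000000000"}],
    },
]


def match_by_upc(upc: str, products: list) -> Optional[dict]:
    """Return a match record when the UPC appears in a product's upc_variants."""
    for product in products:
        variants = product.get("upc_variants") or []
        if any(variant.get("upc") == upc for variant in variants):
            return {
                "normalized_product_id": product["id"],
                "method": "upc",
                "confidence": 1.0,
            }
    return None  # unknown UPC: no UPC-based match is produced


if __name__ == "__main__":
    print(match_by_upc("000000000000", PRODUCTS))
    print(match_by_upc("999999999999", PRODUCTS))
```

If the submitted UPC is absent from every `upc_variants` array, the lookup finds nothing, which is the situation to rule out first when no `MatchResult` appears.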
---
## Related Files
| File | Role |
|------|------|
| `common/alembic/versions/001_add_email_inbound_token.py` | Adds `email_inbound_token` column |
| `receiptwitness/src/receiptwitness/worker/email_worker.py` | Consumes email jobs from stream |
| `receiptwitness/src/receiptwitness/queue/email.py` | DragonflyDB stream consumer group |
| `receiptwitness/src/receiptwitness/parsers/email/detector.py` | Retailer detection |
| `receiptwitness/src/receiptwitness/parsers/email/meijer.py` | Meijer email parser |
| `receiptwitness/src/receiptwitness/parsers/email/kroger.py` | Kroger email parser |
| `receiptwitness/src/receiptwitness/parsers/email/target.py` | Target email parser |
| `docs/uat-runbook.md` | UAT runbook (defect classification, entry/exit criteria) |
+38
@@ -0,0 +1,38 @@
#!/usr/bin/env bash
# =============================================================================
# apply-seed-job.sh — Apply the seed Job manifest for a given environment.
#
# Usage:
# ./apply-seed-job.sh <env>
#
# Example:
# ./apply-seed-job.sh uat
# ./apply-seed-job.sh dev
# =============================================================================
set -euo pipefail
ENV=""
HELP_FLAG=""
while [[ $# -gt 0 ]]; do
  case "$1" in
    --help) HELP_FLAG="1"; shift ;;
    *) ENV="$1"; shift ;;
  esac
done
usage() {
  echo "Usage: $0 <env>"
  echo " env dev or uat"
}
if [[ -n "$HELP_FLAG" ]]; then
  usage
  exit 0
fi
# A missing environment is an error: print usage to stderr and exit non-zero.
if [[ -z "$ENV" ]]; then
  usage >&2
  exit 1
fi
if [[ "$ENV" != "dev" && "$ENV" != "uat" ]]; then
echo "ERROR: Invalid environment: $ENV (must be 'dev' or 'uat')" >&2
exit 1
fi
SCRIPT_DIR="$(dirname "$0")"
sed "s/__ENV__/${ENV}/g" "${SCRIPT_DIR}/seed-env-job.yaml" | kubectl apply -f -
echo "Seed job applied for environment: $ENV"
+2 -103
@@ -1,104 +1,3 @@
#!/usr/bin/env bash
# =============================================================================
# seed-dev.sh — Run the CartSnitch seed runner against the dev database.
#
# Usage:
# ./seed-dev.sh Run full seed against dev
# ./seed-dev.sh --dry-run Show planned record counts without writing
# ./seed-dev.sh --help Show this help
#
# Prerequisites:
# - kubectl configured for the cartsnitch-dev cluster
# - Namespace cartsnitch-dev exists (CNPG Postgres must be running)
#
# What it does:
# 1. Starts a background port-forward to cartsnitch-pg-rw:5432
# 2. Waits for the tunnel to be ready
# 3. Runs python -m cartsnitch_common.seed with --database-url pointing
# to localhost:<forwarded-port>/cartsnitch
# 4. Cleans up the port-forward on exit (normal, interrupt, or error)
# =============================================================================
set -euo pipefail
# --- Config -------------------------------------------------------------------
readonly NAMESPACE="cartsnitch-dev"
readonly SVC_NAME="cartsnitch-pg-rw"
readonly LOCAL_PORT="5433" # use a non-privileged port to avoid conflicts
readonly DB_NAME="cartsnitch"
readonly PG_USER="cartsnitch"
# Retrieve password from the CNPG credentials secret
readonly PG_PASSWORD="$(
kubectl get secret cartsnitch-pg-credentials \
-n "$NAMESPACE" \
-o jsonpath='{.data.password}' \
| base64 -d
)"
readonly DB_URL="postgresql://${PG_USER}:${PG_PASSWORD}@localhost:${LOCAL_PORT}/${DB_NAME}"
# --- Helpers ------------------------------------------------------------------
log() { echo "[seed-dev] $*"; }
fail() { log "ERROR: $*" >&2; exit 1; }
# Cleanup port-forward and exit.
cleanup() {
if [[ -n "${PF_PID:-}" ]]; then
log "Stopping port-forward (PID $PF_PID)..."
kill "$PF_PID" 2>/dev/null || true
wait "$PF_PID" 2>/dev/null || true
fi
}
trap cleanup EXIT
# --- Args ---------------------------------------------------------------------
DRY_RUN=""
HELP_FLAG=""
while [[ $# -gt 0 ]]; do
case "$1" in
--dry-run) DRY_RUN="--dry-run"; shift ;;
--help) HELP_FLAG="1"; shift ;;
*) fail "Unknown argument: $1";;
esac
done
if [[ -n "$HELP_FLAG" ]]; then
sed -n '3,/^# ---/p' "$0" | head -n -1 | sed 's/^# //'
echo ""
echo "Additional arguments are passed through to the seed runner."
echo "Common seed-runner options:"
echo " --dry-run Show planned record counts without writing"
echo " --seed N Set random seed (default: 42)"
exit 0
fi
# --- Prerequisites ------------------------------------------------------------
if ! command -v kubectl &>/dev/null; then
fail "kubectl not found — must be installed and configured."
fi
# --- Port-forward -------------------------------------------------------------
log "Starting port-forward ${SVC_NAME}:5432 -> localhost:${LOCAL_PORT} ..."
kubectl port-forward \
-n "$NAMESPACE" \
svc/"$SVC_NAME" \
"${LOCAL_PORT}:5432" \
&>/dev/null &
PF_PID=$!
# Give the tunnel a moment to establish
sleep 2
# Verify the tunnel is up
if ! kill -0 "$PF_PID" 2>/dev/null; then
fail "Port-forward failed to start."
fi
log "Port-forward active (PID $PF_PID) on localhost:${LOCAL_PORT}"
# --- Seed --------------------------------------------------------------------
log "Running seed against dev database..."
set -x
python -m cartsnitch_common.seed --database-url "$DB_URL" $DRY_RUN
set +x
log "Done."
# Backward-compat wrapper — delegates to seed-env.sh dev
exec "$(dirname "$0")/seed-env.sh" dev "$@"
+58
@@ -0,0 +1,58 @@
# seed-env-job.yaml
# K8s Job to run the CartSnitch seed runner against any CartSnitch database.
#
# Usage (via apply-seed-job.sh):
# bash scripts/apply-seed-job.sh dev
# bash scripts/apply-seed-job.sh uat
#
# To view logs:
# kubectl logs -n cartsnitch-<env> job/seed-env -f
#
# To re-run after fixing issues:
# kubectl delete job seed-env -n cartsnitch-<env> && bash scripts/apply-seed-job.sh <env>
#
apiVersion: batch/v1
kind: Job
metadata:
  name: seed-env
  namespace: cartsnitch-__ENV__
  labels:
    app: cartsnitch
    component: seed
    environment: __ENV__
  annotations:
    description: "Runs cartsnitch-common seed runner to populate __ENV__ database with realistic test data."
spec:
  backoffLimit: 0
  template:
    metadata:
      labels:
        app: cartsnitch
        component: seed
        environment: __ENV__
    spec:
      restartPolicy: Never
      containers:
        - name: seed
          image: python:3.12-slim
          command:
            - sh
            - -c
            - |
              pip install --no-cache-dir "cartsnitch-common @ git+https://github.com/cartsnitch/common.git@main" && \
              python -m cartsnitch_common.seed --database-url "${DATABASE_URL}"
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: cartsnitch-secrets
                  key: database-url-pg
                  optional: false
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
+122
@@ -0,0 +1,122 @@
#!/usr/bin/env bash
# =============================================================================
# seed-env.sh — Run the CartSnitch seed runner against any CartSnitch database.
#
# Usage:
# ./seed-env.sh [--env dev|uat] [--dry-run] [--help]
# ./seed-env.sh uat --dry-run Run dry-run against UAT
# ./seed-env.sh dev Run full seed against dev (default)
#
# Prerequisites:
# - kubectl configured for the target cluster
# - Namespace cartsnitch-<env> exists (CNPG Postgres must be running)
#
# What it does:
# 1. Starts a background port-forward to cartsnitch-pg-rw:5432
# 2. Waits for the tunnel to be ready
# 3. Runs python -m cartsnitch_common.seed with --database-url pointing
# to localhost:<forwarded-port>/cartsnitch
# 4. Cleans up the port-forward on exit (normal, interrupt, or error)
# =============================================================================
set -euo pipefail
# --- Helpers ------------------------------------------------------------------
ENV="dev"
log() { echo "[seed-env] [$ENV] $*"; }
fail() { log "ERROR: $*" >&2; exit 1; }
cleanup() {
  if [[ -n "${PF_PID:-}" ]]; then
    log "Stopping port-forward (PID $PF_PID)..."
    kill "$PF_PID" 2>/dev/null || true
    wait "$PF_PID" 2>/dev/null || true
  fi
}
trap cleanup EXIT
# --- Args ---------------------------------------------------------------------
DRY_RUN=""
HELP_FLAG=""
# Accept the environment as a leading positional argument (e.g. "seed-env.sh uat").
if [[ "${1:-}" == "dev" || "${1:-}" == "uat" ]]; then
  ENV="$1"; shift
fi
while [[ $# -gt 0 ]]; do
  case "$1" in
    --env)
      [[ $# -ge 2 ]] || fail "--env requires a value (dev or uat)"
      ENV="$2"; shift 2 ;;
    --dry-run) DRY_RUN="--dry-run"; shift ;;
    --help) HELP_FLAG="1"; shift ;;
    *) fail "Unknown argument: $1";;
  esac
done
if [[ -n "$HELP_FLAG" ]]; then
  echo "Usage: $0 [--env dev|uat] [--dry-run] [--help]"
  echo ""
  echo "Positional / keyword arguments:"
  echo " --env dev|uat Target environment (default: dev)"
  echo " --dry-run Show planned record counts without writing"
  echo " --help Show this help"
  echo ""
  echo "Additional arguments are passed through to the seed runner."
  echo "Common seed-runner options:"
  echo " --seed N Set random seed (default: 42)"
  exit 0
fi
# --- Validate env --------------------------------------------------------------
if [[ "$ENV" != "dev" && "$ENV" != "uat" ]]; then
  fail "Invalid environment: $ENV (must be 'dev' or 'uat')"
fi
# --- Prerequisites ------------------------------------------------------------
if ! command -v kubectl &>/dev/null; then
  fail "kubectl not found; it must be installed and configured."
fi
# --- Config -------------------------------------------------------------------
# Resolved only after validation, so --help and invalid input never query the cluster.
NAMESPACE="cartsnitch-${ENV}"
SVC_NAME="cartsnitch-pg-rw"
LOCAL_PORT="5433"
DB_NAME="cartsnitch"
PG_USER="cartsnitch"
PG_PASSWORD="$(
  kubectl get secret cartsnitch-pg-credentials \
    -n "$NAMESPACE" \
    -o jsonpath='{.data.password}' \
    | base64 -d
)"
DB_URL="postgresql://${PG_USER}:${PG_PASSWORD}@localhost:${LOCAL_PORT}/${DB_NAME}"
# --- Port-forward -------------------------------------------------------------
log "Starting port-forward ${SVC_NAME}:5432 -> localhost:${LOCAL_PORT} ..."
kubectl port-forward \
-n "$NAMESPACE" \
svc/"$SVC_NAME" \
"${LOCAL_PORT}:5432" \
&>/dev/null &
PF_PID=$!
sleep 2
if ! kill -0 "$PF_PID" 2>/dev/null; then
fail "Port-forward failed to start."
fi
log "Port-forward active (PID $PF_PID) on localhost:${LOCAL_PORT}"
# --- Seed --------------------------------------------------------------------
log "Running seed against ${ENV} database..."
set -x
python -m cartsnitch_common.seed --database-url "$DB_URL" $DRY_RUN
set +x
log "Done."