---
name: devops
description: >
  Infrastructure lifecycle for GroomBook. Governs work on the groombook/infra repo:
  single-branch main strategy, the infra PR review pipeline, Flux GitOps
  reconciliation, OpenTofu controller workflow, cluster topology, and the Flux
  image-automation policy. For application code, see the sdlc skill.
---

# DevOps Practices

This skill governs work on **`groombook/infra`**. For application code lifecycle, see the `sdlc` skill. For PR/test discipline and the `cc @cpfarhood` visibility rule, see `coding-standards`. For non-negotiable safety rules (no direct `tofu`, no `kubectl apply` to production, SealedSecrets), see `safety`.

## Gitea authentication

Use the `GITEA_TOKEN` environment variable for all Gitea operations — it is already set in the agent environment. Use the **`tea`** CLI for all Gitea/Git operations (e.g., `tea issue list`, `tea pr create`). Gitea is the primary source of truth.

## Branch strategy

`groombook/infra` uses a single long-lived branch: **`main`**. Engineers target `main` directly via feature branches named `/`.

## Pipeline

1. **Engineer** branches from `main`, writes code.
2. **Engineer** opens a PR against `main`.
3. **CI** fail → back to **Engineer**.
4. **CI** pass → **QA** performs code review.
5. **QA** rejected → back to **Engineer**.
6. **QA** approved → **CTO** performs code review.
7. **CTO** rejected → back to **Engineer**.
8. **CTO** approved → **Engineer** merges PR → **Flux** reconciles automatically.

```bash
tea pr create --base main --title "..." --body "... cc @cpfarhood"
```

Gitea branch protection requires CI checks to pass. See `coding-standards` for the no-self-merge contract and the `cc @cpfarhood` rule.

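
The `cc @cpfarhood` mention is easy to drop when a PR body is composed by a script. A minimal sketch of a wrapper follows, assuming a hypothetical helper named `pr_body` (it is not part of `tea` or of `groombook/infra`) that appends the mention only when it is missing:

```bash
# pr_body is a hypothetical helper for illustration; it is not provided by
# tea or by groombook/infra.
# Prints the given PR body, appending "cc @cpfarhood" if it is absent.
pr_body() {
  local body="$1"
  case "$body" in
    *"cc @cpfarhood"*) printf '%s\n' "$body" ;;
    *) printf '%s\n\ncc @cpfarhood\n' "$body" ;;
  esac
}

# Usage with the command shown in the Pipeline section (left commented so
# the sketch has no side effects):
# tea pr create --base main --title "..." --body "$(pr_body "...")"
```

The helper only formats text, so it can be sourced into any script that builds the PR body before calling `tea`.
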
## Infrastructure topology

* **Production:** namespace `groombook`, FQDN `demo.groombook.dev`
* **UAT:** namespace `groombook-uat`, FQDN `uat.groombook.dev`
* **Dev:** namespace `groombook-dev`, FQDN `dev.groombook.dev`
* **Cluster:** Kubernetes — cluster-wide read; read/write on `groombook-dev` and `groombook-uat`; read-only on `groombook` (production).
* **Gateways:** `istio-external` (public) and `istio-internal` (internal) in `gateway-system`.
* **Container registry:** `git.farh.net/groombook/` only.

## GitOps (Flux)

Flux watches `groombook/infra` as the **target** GitRepository — it is **not** a Flux bootstrap/cluster repo and must never be treated as one. Reconciles Kustomize overlays:

- `apps/overlays/dev` → `groombook-dev`
- `apps/overlays/uat` → `groombook-uat`
- `apps/overlays/prod` → `groombook`

Images currently use `:latest` with `imagePullPolicy: Always`; pin to a CalVer tag in the infra overlay when stabilizing a release.

**Policy — Flux Image Tag Automation is DENIED.** Do NOT use `ImageRepository`, `ImagePolicy`, or `ImageUpdateAutomation` Flux resources. Image tag updates must be made intentionally via a PR to `groombook/infra` — typically as the final step of the `sdlc` application pipeline (Phase 5).

## When a cluster is broken: fix forward in git — never escalate a manual action

The cluster is reconciled by controllers (Flux, the OpenTofu Controller, the Sealed Secrets controller). **Any change one of these controllers can reconcile MUST be delivered as a PR to `groombook/infra`** — it is never a board approval and never a hand-run `kubectl` / `kubeseal` / `tofu` command. This is the corollary of the read-only-prod and "no `kubectl apply` to production" rules in `safety`: agents are read-only on `groombook` **by design**, precisely because the write path is git. "I lack cluster-admin" therefore resolves to **"open a PR,"** not **"ask a human to run the command."**

Contract:

- **Do NOT** file an issue, board approval, or escalation that asks a human to run an imperative cluster command (`kubectl delete/apply/patch`, `kubeseal`, `flux reconcile`, `tofu apply`) that a controller would otherwise reconcile from git. That request is unfillable and wrong on a GitOps cluster — fix the desired state in the repo and let the controller converge.
  - SealedSecret won't unseal / wrong scope → re-seal the `SealedSecret` and commit it.
  - Missing or not-ready Flux `Receiver`, `Kustomization`, `Terraform`, RBAC, etc. → commit/correct the manifest in the overlay.
  - Stale or wrong `sourceRef`, annotations, ownership → fix them declaratively in the overlay.
- **A reconcile blocked on a pre-existing in-cluster object** (e.g. a `SealedSecret` the controller won't adopt because an unmanaged or Reflector-mirrored `Secret` already exists) is still solved declaratively: correct ownership/annotations in git so the controller adopts it. Only if **no controller can adopt the object** is a one-time imperative step justified — and then it is a single, specifically-scoped, reviewed exception stating the exact reason, **not** a multi-day approval queue standing in for missing engineering.
- **Board approval is reserved** for genuinely irreversible or out-of-band actions no controller reconciles — destroying stateful data, rotating the cluster bootstrap, bootstrapping a brand-new cluster. Routine reconcilable breakage never qualifies. (See `safety` for destructive-action approval.)
- The Flux bootstrap/cluster repo is **not** `groombook/infra` (see GitOps above). A genuinely missing `GitRepository` or other bootstrap object is a PR to that externally-managed cluster-config repo — still a PR, still not a hand-run apply.

If you are about to write "escalated to board — a human must run …" for a reconcilable change, stop: that is the failure mode, not the fix.
Open the PR.

## Infrastructure as Code

Terraform (OpenTofu) is deployed via the **Flux OpenTofu Controller** in a GitOps fashion. Submit Terraform configurations via a PR to `groombook/infra` — the tofu controller reconciles them on merge. See `safety` for the prohibition on running `tofu` directly and on `kubectl apply` against production.

## Infra-only tools

These are the operators and controllers the infra repo installs and manages. Alternatives are policy violations:

* **GitOps:** Flux CD (managed externally; reconciles `groombook/infra`).
* **IaC:** Flux OpenTofu Controller.
* **Secret management:** Bitnami Sealed Secrets Controller — encrypt with `kubeseal`, commit `SealedSecret` resources to `groombook/infra`. No plain Kubernetes secrets.
* **Database operator:** CloudNativePG (Postgres).
* **Cache / pub-sub operator:** DragonflyDB.

For application-level tool policy (Renovate, Playwright, registry, CalVer) see `coding-standards` and `sdlc`.
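
Two of the prohibitions in this skill (no Flux image-automation resources, no plain Kubernetes secrets) can be checked mechanically before a PR is opened. The sketch below is one possible way to do that, not an existing part of `groombook/infra`: the function name `check_infra_policy` is hypothetical, and it only inspects top-level `kind:` lines in `*.yaml` / `*.yml` files, so it will miss resources nested inside lists or generated at build time.

```bash
# check_infra_policy is a hypothetical helper for illustration; it is not
# part of groombook/infra, Flux, or Sealed Secrets.
#
# Scans a directory for manifests whose top-level kind is one of the denied
# Flux image-automation resources or a plain Secret. Prints each offending
# "file:line:match" and returns 1 if any are found, 0 otherwise.
# "kind: SealedSecret" is not flagged, because the pattern requires the
# kind to be exactly one of the listed names.
check_infra_policy() {
  local dir="${1:-.}"
  local pattern='^kind:[[:space:]]*(ImageRepository|ImagePolicy|ImageUpdateAutomation|Secret)[[:space:]]*$'
  local hits
  # grep exits non-zero when nothing matches; "|| true" keeps that from
  # aborting callers that run with "set -e".
  hits="$(grep -rnE --include='*.yaml' --include='*.yml' "$pattern" "$dir" || true)"
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
  return 0
}
```

Run against an overlay directory, for example `check_infra_policy apps/overlays/dev`, a non-zero exit status indicates that something in the tree needs to be replaced with a `SealedSecret` or removed before review.
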