org/skills/devops/SKILL.md
Scrubs McBarkley 07174fe233 docs(devops): fix-forward-in-git; ban escalating reconcilable changes as manual/board actions (GRO-2536)
The board flagged that agents repeatedly requested board approval and hand-run
kubectl on a Flux-managed cluster — unfillable, wrong, and the root cause of a
multi-day stall. The devops skill prohibited `kubectl apply` to prod but never
stated the corollary: the resolution of any reconcilable breakage is a PR to
groombook/infra, never a human-run command. This adds that contract explicitly.

cc @cpfarhood
2026-06-25 11:50:00 +00:00


name: devops
description: Infrastructure lifecycle for GroomBook. Governs work on the groombook/infra repo: single-branch main strategy, the infra PR review pipeline, Flux GitOps reconciliation, OpenTofu controller workflow, cluster topology, and the Flux image-automation policy. For application code, see the sdlc skill.

DevOps Practices

This skill governs work on groombook/infra. For application code lifecycle, see the sdlc skill. For PR/test discipline and the cc @cpfarhood visibility rule, see coding-standards. For non-negotiable safety rules (no direct tofu, no kubectl apply to production, SealedSecrets), see safety.

Gitea authentication

Use the GITEA_TOKEN environment variable for all Gitea operations — it is already set in the agent environment. Use the tea CLI for all Gitea/Git operations (e.g., tea issue list, tea pr create). Gitea is the primary source of truth.

Branch strategy

groombook/infra uses a single long-lived branch: main. Engineers target main directly via feature branches named <agent-name>/<short-description>.
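
The naming convention can be checked mechanically before pushing. The following is a minimal sketch: the function name `is_valid_branch_name` is hypothetical, and the allowed character set (lowercase letters, digits, hyphens) is an assumption, since this skill only specifies the <agent-name>/<short-description> shape.

```shell
# Hypothetical helper: succeeds if a name looks like <agent-name>/<short-description>.
# Assumption: both parts use only lowercase letters, digits, and hyphens.
is_valid_branch_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9][a-z0-9-]*/[a-z0-9][a-z0-9-]*$'
}

# Example with an illustrative branch name.
if is_valid_branch_name "scrubs/fix-flux-receiver"; then
  echo "ok"
else
  echo "rename the branch"
fi
```

A check like this could run locally or in CI; it only validates the shape of the name, not whether the agent name is real.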

Pipeline

  1. Engineer branches from main, writes code.
  2. Engineer opens a PR against main.
  3. CI fail → back to Engineer.
  4. CI pass → QA performs code review.
  5. QA rejected → back to Engineer.
  6. QA approved → CTO performs code review.
  7. CTO rejected → back to Engineer.
  8. CTO approved → Engineer merges PR → Flux reconciles automatically.
tea pr create --base main --title "..." --body "... cc @cpfarhood"

Gitea branch protection requires CI checks to pass. See coding-standards for the no-self-merge contract and the cc @cpfarhood rule.
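
Steps 3 to 8 of the pipeline form a small state machine, which can be sketched as a lookup. The function name `next_step` and the gate, outcome, and result identifiers are illustrative, not part of any GroomBook tooling.

```shell
# Hypothetical helper: given a gate (ci, qa, cto) and its outcome, print who acts next.
# The transitions mirror steps 3-8 of the pipeline.
next_step() {
  case "$1:$2" in
    ci:fail|qa:rejected|cto:rejected) echo "engineer-rework" ;;
    ci:pass)      echo "qa-review" ;;
    qa:approved)  echo "cto-review" ;;
    cto:approved) echo "engineer-merge" ;;
    *)            echo "unknown gate or outcome: $1 $2" >&2; return 1 ;;
  esac
}

next_step ci pass   # prints qa-review
```

Every rejection path leads back to the Engineer, and only a CTO approval leads to a merge, after which Flux reconciles without further action.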

Infrastructure topology

  • Production: namespace groombook, FQDN demo.groombook.dev
  • UAT: namespace groombook-uat, FQDN uat.groombook.dev
  • Dev: namespace groombook-dev, FQDN dev.groombook.dev
  • Cluster: Kubernetes — cluster-wide read; read/write on groombook-dev and groombook-uat; read-only on groombook (production).
  • Gateways: istio-external (public) and istio-internal (internal) in gateway-system.
  • Container registry: git.farh.net/groombook/<service> only.
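
The registry rule lends itself to a simple pre-commit check. This is a sketch only: `is_allowed_image` is a hypothetical helper, and the service name `api` in the example is made up for illustration.

```shell
# Hypothetical helper: succeeds only for image references under git.farh.net/groombook/.
is_allowed_image() {
  case "$1" in
    git.farh.net/groombook/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: "api" is an illustrative service name, not a confirmed GroomBook service.
for image in git.farh.net/groombook/api:latest docker.io/library/postgres:16; do
  if is_allowed_image "$image"; then
    echo "allowed: $image"
  else
    echo "rejected: $image"
  fi
done
```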

GitOps (Flux)

Flux watches groombook/infra as the target GitRepository — it is not a Flux bootstrap/cluster repo and must never be treated as one.

Reconciles Kustomize overlays:

  • apps/overlays/dev → groombook-dev
  • apps/overlays/uat → groombook-uat
  • apps/overlays/prod → groombook
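
Reading each entry above as an overlay path followed by the namespace it reconciles into (an assumption consistent with the topology section), the mapping can be sketched as a lookup. The function name `namespace_for_overlay` is hypothetical.

```shell
# Hypothetical helper: print the namespace an overlay is assumed to reconcile into.
namespace_for_overlay() {
  case "$1" in
    apps/overlays/dev)  echo "groombook-dev" ;;
    apps/overlays/uat)  echo "groombook-uat" ;;
    apps/overlays/prod) echo "groombook" ;;
    *) echo "unknown overlay: $1" >&2; return 1 ;;
  esac
}

namespace_for_overlay apps/overlays/uat   # prints groombook-uat
```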

Images currently use :latest with imagePullPolicy: Always; pin to a CalVer tag in the infra overlay when stabilizing a release.

Policy — Flux Image Tag Automation is DENIED. Do NOT use ImageRepository, ImagePolicy, or ImageUpdateAutomation Flux resources. Image tag updates must be made intentionally via a PR to groombook/infra — typically as the final step of the sdlc application pipeline (Phase 5).
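
A reviewer or CI job could enforce this policy with a scan along the following lines. It is a sketch under stated assumptions: `check_no_image_automation` is a hypothetical name, and the check only looks at top-level `kind:` lines in plain manifest files, so it would miss resources generated at build time.

```shell
# Hypothetical check: fail if any file under a directory declares a denied Flux kind.
# Only top-level "kind:" lines are inspected.
check_no_image_automation() (
  dir="${1:-.}"
  matches=$(grep -rlE '^kind:[[:space:]]*(ImageRepository|ImagePolicy|ImageUpdateAutomation)[[:space:]]*$' "$dir" 2>/dev/null || true)
  if [ -n "$matches" ]; then
    printf 'Denied Flux image-automation kinds found in:\n%s\n' "$matches" >&2
    exit 1
  fi
  exit 0
)
```

Run from the repository root, for example as `check_no_image_automation apps`, it exits non-zero and lists the offending files on stderr.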

When a cluster is broken: fix forward in git — never escalate a manual action

The cluster is reconciled by controllers (Flux, the OpenTofu Controller, the Sealed Secrets controller). Any change one of these controllers can reconcile MUST be delivered as a PR to groombook/infra — it is never a board approval and never a hand-run kubectl / kubeseal / tofu command.

This is the corollary of the read-only-prod and "no kubectl apply to production" rules in safety: agents are read-only on groombook by design, precisely because the write path is git. "I lack cluster-admin" therefore resolves to "open a PR," not "ask a human to run the command."

Contract:

  • Do NOT file an issue, board approval, or escalation that asks a human to run an imperative cluster command (kubectl delete/apply/patch, kubeseal, flux reconcile, tofu apply) that a controller would otherwise reconcile from git. That request is unfulfillable and wrong on a GitOps cluster: fix the desired state in the repo and let the controller converge. Typical cases:
    • SealedSecret won't unseal / wrong scope → re-seal the SealedSecret and commit it.
    • Missing or not-ready Flux Receiver, Kustomization, Terraform, RBAC, etc. → commit/correct the manifest in the overlay.
    • Stale or wrong sourceRef, annotations, ownership → fix them declaratively in the overlay.
  • A reconcile blocked on a pre-existing in-cluster object (e.g. a SealedSecret the controller won't adopt because an unmanaged or Reflector-mirrored Secret already exists) is still solved declaratively: correct ownership/annotations in git so the controller adopts it. Only if no controller can adopt the object is a one-time imperative step justified — and then it is a single, specifically-scoped, reviewed exception stating the exact reason, not a multi-day approval queue standing in for missing engineering.
  • Board approval is reserved for genuinely irreversible or out-of-band actions no controller reconciles — destroying stateful data, rotating the cluster bootstrap, bootstrapping a brand-new cluster. Routine reconcilable breakage never qualifies. (See safety for destructive-action approval.)
  • The Flux bootstrap/cluster repo is not groombook/infra (see GitOps above). A genuinely missing GitRepository or other bootstrap object is a PR to that externally-managed cluster-config repo — still a PR, still not a hand-run apply.

If you are about to write "escalated to board — a human must run …" for a reconcilable change, stop: that is the failure mode, not the fix. Open the PR.
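
The failure mode described in this section can be caught with a rough self-check on a draft escalation before it is filed. This is a heuristic sketch only: `requests_manual_cluster_action` is a hypothetical helper, the command list mirrors the contract above, and the secret name in the example is invented.

```shell
# Hypothetical lint: succeeds (exit 0) when the text mentions an imperative
# cluster command from the contract: kubectl apply/delete/patch, kubeseal,
# flux reconcile, or tofu apply.
requests_manual_cluster_action() {
  printf '%s\n' "$1" |
    grep -Eiq '(kubectl[[:space:]]+(apply|delete|patch)|kubeseal|flux[[:space:]]+reconcile|tofu[[:space:]]+apply)'
}

# Example draft; "db-creds" is an invented secret name.
draft="Escalated to board: a human must run kubectl delete secret db-creds"
if requests_manual_cluster_action "$draft"; then
  echo "Stop: open a PR to groombook/infra instead."
fi
```

Because it is a plain pattern match, it also flags text that merely mentions these commands, so a hit is a prompt to re-read the draft rather than a verdict.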

Infrastructure as Code

Terraform (OpenTofu) is deployed via the Flux OpenTofu Controller in a GitOps fashion. Submit Terraform configurations via a PR to groombook/infra — the tofu controller reconciles them on merge. See safety for the prohibition on running tofu directly and on kubectl apply against production.

Infra-only tools

These are the operators and controllers the infra repo installs and manages. Alternatives are policy violations:

  • GitOps: Flux CD (managed externally; reconciles groombook/infra).
  • IaC: Flux OpenTofu Controller.
  • Secret management: Bitnami Sealed Secrets Controller — encrypt with kubeseal, commit SealedSecret resources to groombook/infra. No plain Kubernetes secrets.
  • Database operator: CloudNativePG (Postgres).
  • Cache / pub-sub operator: DragonflyDB.
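
The "no plain Kubernetes secrets" rule can be approximated with a scan for top-level `kind: Secret` manifests. This is a sketch: `check_no_plain_secrets` is a hypothetical name, and the check does not detect Secrets produced by generators or templates.

```shell
# Hypothetical check: fail if any file under a directory declares a plain Secret.
# "kind: SealedSecret" does not match because the whole value must equal "Secret".
check_no_plain_secrets() (
  dir="${1:-.}"
  matches=$(grep -rlE '^kind:[[:space:]]*Secret[[:space:]]*$' "$dir" 2>/dev/null || true)
  if [ -n "$matches" ]; then
    printf 'Plain Secret manifests found (use SealedSecret instead):\n%s\n' "$matches" >&2
    exit 1
  fi
  exit 0
)
```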

For application-level tool policy (Renovate, Playwright, registry, CalVer) see coding-standards and sdlc.