The board flagged that agents repeatedly requested board approval for hand-run kubectl commands on a Flux-managed cluster. Those requests were unfillable and wrong, and they were the root cause of a multi-day stall. The devops skill prohibited `kubectl apply` to prod but never stated the corollary: the fix for any reconcilable breakage is a PR to groombook/infra, never a human-run command. This PR adds that contract explicitly. cc @cpfarhood
| name | description |
|---|---|
| devops | Infrastructure lifecycle for GroomBook. Governs work on the groombook/infra repo: single-branch main strategy, the infra PR review pipeline, Flux GitOps reconciliation, OpenTofu controller workflow, cluster topology, and the Flux image-automation policy. For application code, see the sdlc skill. |
DevOps Practices
This skill governs work on groombook/infra. For application code lifecycle, see the sdlc skill. For PR/test discipline and the cc @cpfarhood visibility rule, see coding-standards. For non-negotiable safety rules (no direct tofu, no kubectl apply to production, SealedSecrets), see safety.
Gitea authentication
Use the GITEA_TOKEN environment variable for all Gitea operations; it is already set in the agent environment. Use the tea CLI for Gitea operations such as issues and pull requests (e.g., `tea issue list`, `tea pr create`). Gitea is the primary source of truth.
Branch strategy
groombook/infra uses a single long-lived branch, `main`. Engineers work on feature branches named `<agent-name>/<short-description>` and open PRs directly against `main`.
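A minimal sketch of a local check for that naming convention. `is_valid_branch` is a hypothetical helper (not part of `tea` or Gitea), and it assumes both halves of the name use only lowercase letters, digits, and hyphens; the skill does not define a character set.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 looks like <agent-name>/<short-description>.
# Assumes both parts use lowercase letters, digits, and hyphens (not specified by the skill).
is_valid_branch() {
  case $1 in
    */*/*) return 1 ;;   # more than one slash
    ?*/?*) ;;            # exactly one slash with text on both sides
    *) return 1 ;;       # no slash at all (this also rejects "main")
  esac
  for part in "${1%%/*}" "${1#*/}"; do
    case $part in
      *[![:lower:][:digit:]-]*) return 1 ;;   # contains a disallowed character
    esac
  done
  return 0
}
```

For example, run `is_valid_branch "$(git rev-parse --abbrev-ref HEAD)" || echo "rename this branch"` before pushing.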
Pipeline
- Engineer branches from `main` and writes code.
- Engineer opens a PR against `main`.
- CI fail → back to Engineer.
- CI pass → QA performs code review.
- QA rejected → back to Engineer.
- QA approved → CTO performs code review.
- CTO rejected → back to Engineer.
- CTO approved → Engineer merges PR → Flux reconciles automatically.
```shell
tea pr create --base main --title "..." --description "... cc @cpfarhood"
```
Gitea branch protection requires CI checks to pass. See coding-standards for the no-self-merge contract and the cc @cpfarhood rule.
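The review loop above behaves like a small state machine. As a sketch, the hypothetical `next_step` function below encodes it as a transition table, using `pass`/`fail` both for CI results and for QA/CTO approval or rejection. It only prints who acts next and does not call Gitea.

```shell
#!/bin/sh
# Hypothetical encoding of the infra PR pipeline as a transition table.
# Usage: next_step <ci|qa|cto> <pass|fail>; prints who acts next.
next_step() {
  case "$1:$2" in
    ci:fail | qa:fail | cto:fail) echo "engineer" ;;   # any failure or rejection returns to the Engineer
    ci:pass)  echo "qa-review" ;;
    qa:pass)  echo "cto-review" ;;
    cto:pass) echo "engineer-merge" ;;                 # Engineer merges; Flux then reconciles
    *) echo "unknown transition: $1:$2" >&2; return 1 ;;
  esac
}
```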
Infrastructure topology
- Production: namespace `groombook`, FQDN `demo.groombook.dev`
- UAT: namespace `groombook-uat`, FQDN `uat.groombook.dev`
- Dev: namespace `groombook-dev`, FQDN `dev.groombook.dev`
- Cluster: Kubernetes. Agents have cluster-wide read access, read/write on `groombook-dev` and `groombook-uat`, and read-only access to `groombook` (production).
- Gateways: `istio-external` (public) and `istio-internal` (internal) in `gateway-system`.
- Container registry: `git.farh.net/groombook/<service>` only.
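A sketch of the same topology as shell lookups, useful for sanity-checking a target before editing an overlay. The function names are hypothetical, and `image_allowed` assumes `<service>` is a single path segment, optionally followed by a tag.

```shell
#!/bin/sh
# Hypothetical lookups mirroring the topology list above.

# env_target <prod|uat|dev>: prints "<namespace> <fqdn>".
env_target() {
  case $1 in
    prod) echo "groombook demo.groombook.dev" ;;
    uat)  echo "groombook-uat uat.groombook.dev" ;;
    dev)  echo "groombook-dev dev.groombook.dev" ;;
    *) return 1 ;;
  esac
}

# agent_can_write <namespace>: succeeds only where agents have read/write access.
agent_can_write() {
  case $1 in
    groombook-dev | groombook-uat) return 0 ;;
    *) return 1 ;;   # everything else, including production "groombook", is read-only
  esac
}

# image_allowed <image-ref>: succeeds only for git.farh.net/groombook/<service>[:tag].
image_allowed() {
  case $1 in
    git.farh.net/groombook/*/*) return 1 ;;   # nested path: not a single <service> segment
    git.farh.net/groombook/?*) return 0 ;;
    *) return 1 ;;                            # any other registry is not allowed
  esac
}
```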
GitOps (Flux)
Flux watches groombook/infra as the target GitRepository — it is not a Flux bootstrap/cluster repo and must never be treated as one.
Flux reconciles these Kustomize overlays:
- `apps/overlays/dev` → `groombook-dev`
- `apps/overlays/uat` → `groombook-uat`
- `apps/overlays/prod` → `groombook`
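Because the write path is git, the safe pre-PR check is to render the overlays locally. A sketch, assuming it runs from the repo root and that each overlay directory is a Kustomize root. `overlay_namespace` and `render_all_overlays` are hypothetical names; `kubectl kustomize <dir>` is the standard kubectl subcommand that builds manifests from disk without contacting the cluster.

```shell
#!/bin/sh
# Hypothetical helper: map an overlay directory to the namespace Flux reconciles it into.
overlay_namespace() {
  case ${1%/} in                      # tolerate a trailing slash
    apps/overlays/dev)  echo "groombook-dev" ;;
    apps/overlays/uat)  echo "groombook-uat" ;;
    apps/overlays/prod) echo "groombook" ;;
    *) return 1 ;;
  esac
}

# Render every overlay before opening a PR. "kubectl kustomize" only builds
# manifests locally; it neither reads from nor writes to the cluster.
render_all_overlays() {
  for overlay in apps/overlays/dev apps/overlays/uat apps/overlays/prod; do
    echo "# --- $overlay -> $(overlay_namespace "$overlay")"
    kubectl kustomize "$overlay" || return 1
  done
}
```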
Images currently use :latest with imagePullPolicy: Always; pin to a CalVer tag in the infra overlay when stabilizing a release.
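Pinning is a one-line edit per image in the overlay. A sketch, assuming the overlay references images inline as `<image>:latest`; if it uses a Kustomize `images:` entry instead, change that entry's `newTag`. `pin_image_tag` is hypothetical, and the tag `2025.06.1` used below is only an illustrative CalVer value, since this skill does not fix a CalVer scheme.

```shell
#!/bin/sh
# Hypothetical helper: rewrite <image>:latest to <image>:<tag> in one overlay file.
# Assumes the overlay references the image inline, e.g. "image: <image>:latest".
pin_image_tag() {
  file=$1 image=$2 tag=$3
  case $tag in
    '' | latest) echo "pin_image_tag: need an explicit release tag, not '$tag'" >&2; return 1 ;;
  esac
  # Escape dots so the registry host (git.farh.net) matches literally in sed.
  esc=$(printf '%s\n' "$image" | sed 's/\./\\./g')
  # Write to a temp file first so a failed sed never truncates the overlay.
  sed "s|${esc}:latest|${image}:${tag}|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, `pin_image_tag apps/overlays/prod/deployment-patch.yaml git.farh.net/groombook/api 2025.06.1` (the file name is hypothetical), then open the PR as usual.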
Policy — Flux Image Tag Automation is DENIED. Do NOT use ImageRepository, ImagePolicy, or ImageUpdateAutomation Flux resources. Image tag updates must be made intentionally via a PR to groombook/infra — typically as the final step of the sdlc application pipeline (Phase 5).
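A guardrail for this policy could run in CI or before opening a PR. A minimal sketch using only POSIX `find` and `grep`; `check_no_image_automation` is a hypothetical name, and it assumes manifests live in `*.yaml`/`*.yml` files with `kind:` on its own line.

```shell
#!/bin/sh
# Hypothetical guardrail: fail if any manifest under $1 (default: .) declares one of
# the denied Flux image-automation kinds.
check_no_image_automation() {
  root=${1:-.}
  pattern='^[[:space:]]*kind:[[:space:]]*(ImageRepository|ImagePolicy|ImageUpdateAutomation)[[:space:]]*(#.*)?$'
  # grep -l lists each matching file once; no matches leaves $hits empty.
  hits=$(find "$root" -type f \( -name '*.yaml' -o -name '*.yml' \) \
    -exec grep -lE "$pattern" {} +)
  if [ -n "$hits" ]; then
    echo "Denied Flux image-automation resources found:" >&2
    echo "$hits" >&2
    return 1
  fi
}
```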
When a cluster is broken: fix forward in git — never escalate a manual action
The cluster is reconciled by controllers (Flux, the OpenTofu Controller, the Sealed Secrets controller). Any change one of these controllers can reconcile MUST be delivered as a PR to groombook/infra — it is never a board approval and never a hand-run kubectl / kubeseal / tofu command.
This is the corollary of the read-only-prod and "no kubectl apply to production" rules in safety: agents are read-only on groombook by design, precisely because the write path is git. "I lack cluster-admin" therefore resolves to "open a PR," not "ask a human to run the command."
Contract:
- Do NOT file an issue, board approval, or escalation that asks a human to run an imperative cluster command (`kubectl delete/apply/patch`, `kubeseal`, `flux reconcile`, `tofu apply`) that a controller would otherwise reconcile from git. That request is unfillable and wrong on a GitOps cluster; fix the desired state in the repo and let the controller converge.
  - SealedSecret won't unseal / wrong scope → re-seal the `SealedSecret` and commit it.
  - Missing or not-ready Flux `Receiver`, `Kustomization`, `Terraform`, RBAC, etc. → commit or correct the manifest in the overlay.
  - Stale or wrong `sourceRef`, annotations, or ownership → fix them declaratively in the overlay.
- A reconcile blocked on a pre-existing in-cluster object (e.g. a `SealedSecret` the controller won't adopt because an unmanaged or Reflector-mirrored `Secret` already exists) is still solved declaratively: correct the ownership/annotations in git so the controller adopts it. Only if no controller can adopt the object is a one-time imperative step justified, and even then it must be a single, narrowly scoped, reviewed exception that states the exact reason, not a multi-day approval queue standing in for missing engineering.
- Board approval is reserved for genuinely irreversible or out-of-band actions that no controller reconciles: destroying stateful data, rotating the cluster bootstrap, or bootstrapping a brand-new cluster. Routine reconcilable breakage never qualifies. (See `safety` for destructive-action approval.)
- The Flux bootstrap/cluster repo is not `groombook/infra` (see GitOps above). A genuinely missing `GitRepository` or other bootstrap object is a PR to that externally managed cluster-config repo: still a PR, still not a hand-run apply.
If you are about to write "escalated to board — a human must run …" for a reconcilable change, stop: that is the failure mode, not the fix. Open the PR.
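The stop-and-open-a-PR rule can be backed by a crude tripwire on draft issue or escalation text. A sketch: `check_escalation_draft` is hypothetical, reads the draft on stdin, and flags the imperative commands named in the contract. A hit means "re-read the contract", not an automatic ban, since the narrowly scoped, reviewed exception for unadoptable objects can still apply.

```shell
#!/bin/sh
# Hypothetical tripwire: reads a draft issue/escalation body on stdin and fails if it
# asks for one of the imperative commands named in the contract.
check_escalation_draft() {
  if grep -qiE 'kubectl[[:space:]]+(delete|apply|patch)|kubeseal|flux[[:space:]]+reconcile|tofu[[:space:]]+apply'; then
    echo "Draft asks for a hand-run cluster command; fix the desired state with a PR to groombook/infra instead." >&2
    return 1
  fi
}
```

Usage: `check_escalation_draft < draft.md` before filing.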
Infrastructure as Code
Terraform (OpenTofu) configurations are applied by the Flux OpenTofu Controller in a GitOps fashion. Submit them via a PR to groombook/infra; the tofu controller reconciles them on merge. See safety for the prohibitions on running tofu directly and on kubectl apply against production.
Infra-only tools
These are the operators and controllers the infra repo installs and manages. Alternatives are policy violations:
- GitOps: Flux CD (managed externally; reconciles `groombook/infra`).
- IaC: Flux OpenTofu Controller.
- Secret management: Bitnami Sealed Secrets Controller. Encrypt with `kubeseal` and commit `SealedSecret` resources to `groombook/infra`. No plain Kubernetes secrets.
- Database operator: CloudNativePG (Postgres).
- Cache / pub-sub operator: DragonflyDB.
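Re-sealing (the fix for a SealedSecret that won't unseal or has the wrong scope) needs no cluster write. A sketch: `reseal_secret` is hypothetical, and because this skill does not state the controller's name or namespace, the hypothetical `SEALED_SECRETS_CONTROLLER_NAME` and `SEALED_SECRETS_CONTROLLER_NAMESPACE` variables override kubeseal's defaults (`sealed-secrets-controller` in `kube-system`). If the controller's certificate can't be fetched, kubeseal's `--cert` flag can point at a saved public certificate instead.

```shell
#!/bin/sh
# Hypothetical helper: (re-)seal a Secret from an env file and write a SealedSecret
# manifest to commit to groombook/infra. The plain Secret only exists in the pipe:
# "kubectl create --dry-run=client" renders YAML locally and changes nothing in the cluster.
# Usage: reseal_secret <name> <namespace> <env-file> <out.yaml> [strict|namespace-wide|cluster-wide]
reseal_secret() {
  name=$1 namespace=$2 env_file=$3 out=$4 scope=${5:-strict}
  kubectl create secret generic "$name" \
      --namespace "$namespace" \
      --from-env-file="$env_file" \
      --dry-run=client -o yaml |
    kubeseal --format yaml --scope "$scope" \
      --controller-name "${SEALED_SECRETS_CONTROLLER_NAME:-sealed-secrets-controller}" \
      --controller-namespace "${SEALED_SECRETS_CONTROLLER_NAMESPACE:-kube-system}" \
      > "$out.tmp" || { rm -f "$out.tmp"; return 1; }
  # Only replace the committed manifest once sealing has succeeded.
  mv "$out.tmp" "$out"
}
```

Commit the resulting file through the normal PR pipeline; the Sealed Secrets controller unseals it after Flux reconciles the merge.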
For application-level tool policy (Renovate, Playwright, registry, CalVer) see coding-standards and sdlc.