From d3f7a91e53c0c1da70055f3cb683e6dcae85edc0 Mon Sep 17 00:00:00 2001
From: Scrubs McBarkley <18+gb_scrubs@noreply.git.farh.net>
Date: Thu, 25 Jun 2026 12:35:07 +0000
Subject: [PATCH] =?UTF-8?q?docs(devops):=20fix-forward-in-git=20rule=20?=
 =?UTF-8?q?=E2=80=94=20ban=20escalating=20reconcilable=20changes=20as=20ma?=
 =?UTF-8?q?nual/board=20actions=20(GRO-2536)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds "fix forward in git" rule to devops skill: any breakage a controller (Flux/OpenTofu/SealedSecrets) can reconcile must be resolved via PR to groombook/infra — not board approval, not hand-run kubectl. Prevents recurrence of GRO-2536 stall.

Co-Authored-By: Paperclip
Co-authored-by: Scrubs McBarkley <18+gb_scrubs@noreply.git.farh.net>
Co-committed-by: Scrubs McBarkley <18+gb_scrubs@noreply.git.farh.net>
---
 skills/devops/SKILL.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/skills/devops/SKILL.md b/skills/devops/SKILL.md
index 5a47cdc..edd6b4e 100644
--- a/skills/devops/SKILL.md
+++ b/skills/devops/SKILL.md
@@ -59,6 +59,24 @@ Images currently use `:latest` with `imagePullPolicy: Always`; pin to a CalVer t
 
 **Policy — Flux Image Tag Automation is DENIED.** Do NOT use `ImageRepository`, `ImagePolicy`, or `ImageUpdateAutomation` Flux resources. Image tag updates must be made intentionally via a PR to `groombook/infra` — typically as the final step of the `sdlc` application pipeline (Phase 5).
 
+## When a cluster is broken: fix forward in git — never escalate a manual action
+
+The cluster is reconciled by controllers (Flux, the OpenTofu Controller, the Sealed Secrets controller). **Any change one of these controllers can reconcile MUST be delivered as a PR to `groombook/infra`** — it is never a board approval and never a hand-run `kubectl` / `kubeseal` / `tofu` command.
+
+This is the corollary of the read-only-prod and "no `kubectl apply` to production" rules in `safety`: agents are read-only on `groombook` **by design**, precisely because the write path is git. "I lack cluster-admin" therefore resolves to **"open a PR,"** not **"ask a human to run the command."**
+
+Contract:
+
+- **Do NOT** file an issue, board approval, or escalation that asks a human to run an imperative cluster command (`kubectl delete/apply/patch`, `kubeseal`, `flux reconcile`, `tofu apply`) that a controller would otherwise reconcile from git. That request is unfillable and wrong on a GitOps cluster — fix the desired state in the repo and let the controller converge.
+  - SealedSecret won't unseal / wrong scope → re-seal the `SealedSecret` and commit it.
+  - Missing or not-ready Flux `Receiver`, `Kustomization`, `Terraform`, RBAC, etc. → commit/correct the manifest in the overlay.
+  - Stale or wrong `sourceRef`, annotations, ownership → fix them declaratively in the overlay.
+- **A reconcile blocked on a pre-existing in-cluster object** (e.g. a `SealedSecret` the controller won't adopt because an unmanaged or Reflector-mirrored `Secret` already exists) is still solved declaratively: correct ownership/annotations in git so the controller adopts it. Only if **no controller can adopt the object** is a one-time imperative step justified — and then it is a single, specifically-scoped, reviewed exception stating the exact reason, **not** a multi-day approval queue standing in for missing engineering.
+- **Board approval is reserved** for genuinely irreversible or out-of-band actions no controller reconciles — destroying stateful data, rotating the cluster bootstrap, bootstrapping a brand-new cluster. Routine reconcilable breakage never qualifies. (See `safety` for destructive-action approval.)
+- The Flux bootstrap/cluster repo is **not** `groombook/infra` (see GitOps above). A genuinely missing `GitRepository` or other bootstrap object is a PR to that externally-managed cluster-config repo — still a PR, still not a hand-run apply.
+
+If you are about to write "escalated to board — a human must run …" for a reconcilable change, stop: that is the failure mode, not the fix. Open the PR.
+
 ## Infrastructure as Code
 
 Terraform (OpenTofu) is deployed via the **Flux OpenTofu Controller** in a GitOps fashion. Submit Terraform configurations via a PR to `groombook/infra` — the tofu controller reconciles them on merge. See `safety` for the prohibition on running `tofu` directly and on `kubectl apply` against production.