diff --git a/skills/safety/SKILL.md b/skills/safety/SKILL.md
index 0be7115..a4a01bb 100644
--- a/skills/safety/SKILL.md
+++ b/skills/safety/SKILL.md
@@ -2,9 +2,8 @@
 name: safety
 description: >
   Non-negotiable safety rules for all GroomBook agents. Covers secret handling,
-  destructive-action gating, the SealedSecrets workflow, the tools we use vs.
-  the ones we don't, and the escalation protocol when an action's safety is
-  uncertain.
+  destructive-action gating, the SealedSecrets workflow, kubectl scope limits,
+  and the escalation protocol when an action's safety is uncertain.
 ---

 # Safety
@@ -19,19 +18,13 @@ The following rules apply to every GroomBook agent without exception.

 * **Never commit plaintext secrets.** Kubernetes secrets go through Bitnami Sealed Secrets (`kubeseal`). Application credentials go in environment variables injected at runtime — never hardcoded in source.

-* **Never use `kubectl create secret` in production.** The `groombook` and `groombook-uat` namespaces are Flux-managed. Secret changes go through the SealedSecrets workflow, committed to `groombook/infra`. The `groombook-dev` namespace permits direct kubectl use for iteration but secrets there should also follow the same pattern when they reflect anything sensitive.
+* **Never `kubectl apply` against production (`groombook`).** The production namespace is Flux-managed. Manifest changes go through a PR to `groombook/infra` and are reconciled by Flux. The `groombook-dev` and `groombook-uat` namespaces permit direct kubectl use for iteration; secrets at every environment still follow the SealedSecrets pattern.
+
+* **Never `kubectl create secret` in production.** All secrets — at every environment — go through SealedSecrets, encrypted with `kubeseal`, committed as `SealedSecret` resources to `groombook/infra`.

 * **Never bypass the merge gate.** No self-merging PRs. No pushing directly to `dev`, `uat`, or `main`. Every change goes through a PR with the reviews required by the `sdlc` skill.

-## Tools (canonical, not alternatives)
-
-* **Secret management:** Bitnami Sealed Secrets Controller — no plain Kubernetes secrets.
-* **Database:** CloudNativePG Operator (Postgres) — no SQLite, MariaDB, or MySQL.
-* **Cache / pub-sub:** DragonflyDB Operator — no Redis.
-* **Dependency updates:** Mend Renovate — no Dependabot.
-* **Container registry:** `ghcr.io` — no Docker Hub for first-party images.
-
-If a task requires deviating from any of the above, treat it as a destructive action: stop, file an issue with rationale, request board approval.
+* **Never run `tofu` directly.** Terraform / OpenTofu goes through the Flux OpenTofu Controller via a PR to `groombook/infra`.

 ## If you are unsure

diff --git a/skills/sdlc/SKILL.md b/skills/sdlc/SKILL.md
index 37d9ae4..713ac25 100644
--- a/skills/sdlc/SKILL.md
+++ b/skills/sdlc/SKILL.md
@@ -2,10 +2,11 @@
 name: sdlc
 description: >
   Software development lifecycle for GroomBook. Covers GitHub authentication,
-  branch strategy across Dev/UAT/Prod, PR review and merge policy, the SDLC
-  pipeline and stage handoffs, status semantics, infrastructure layout, the
-  GitHub-origin issue board-approval gate, and the cc-cpfarhood visibility
-  rule.
+  branch strategy across Dev/UAT/Prod, the four-phase SDLC pipeline with
+  product analysis intake, PR review and merge policy, the handoff protocol,
+  status semantics, infrastructure layout, the canonical tools list, the
+  GitHub-origin issue board-approval gate, the cc-cpfarhood visibility rule,
+  and the scheduled penetration testing program.
 ---

 # Software Development Lifecycle
@@ -14,6 +15,8 @@ description: >

 **Invoke the `github-app-token` skill** before any GitHub operation. It generates a short-lived installation token and sets `GH_TOKEN`. **Never** run `gh auth login` — it hangs headless agents. Token expires after ~1 hour; re-invoke to regenerate.

+GitHub is the **primary source of truth**. Every Paperclip issue should have a corresponding GitHub issue (create one if missing). Both stay open until the work is completed, reviewed, approved, merged, and QA-verified.
+
 ## GitHub-origin issue policy — board approval required

 If a task originated from GitHub (`originKind: "github"`), **do not begin work**. Immediately create a board approval:
@@ -41,77 +44,85 @@ Three long-lived branches map to the three deployment environments:

 | Branch | Environment | Who merges |
 |--------|-------------|-----------|
-| `dev` | Dev | CTO (after QA + CTO approval) |
-| `uat` | UAT | CTO (promotes `dev` → `uat`) |
-| `main` | Production | CEO (promotes `uat` → `main`) |
+| `dev` | Dev | CTO (after QA approval) |
+| `uat` | UAT | CTO (promotes `dev` → `uat`) |
+| `main` | Production | CEO (promotes `uat` → `main`) |

-**Engineers always target `dev`** — never `uat` or `main` directly.
+**Engineers always target `dev`** — never `uat` or `main` directly. Feature branches: `/`.

 ## Pull requests

 All changes happen via pull request. Always include `cc @cpfarhood` at the bottom of the PR body for visibility — never as a reviewer.

 ```bash
-gh pr create --title "..." --body "... cc @cpfarhood"
+gh pr create --base dev --title "..." --body "... cc @cpfarhood"
 ```

 ## PR review & merge policy

 ### Dev branch (`dev`)

-Requires **2 approving GitHub reviews** before merge:
-
-1. **QA** (Lint Roller) — code review, CI signal, test coverage
-2. **CTO** (The Dogfather) — architecture, security, correctness
-
-CTO review requires QA approval as a precondition.
+- **QA** (Lint Roller) reviews the PR. Approve → hand to CTO. Fail → back to engineer directly with exact details.
+- **CTO** (The Dogfather) reviews. Approve → CTO merges the `dev` PR. Fail → back to engineer.

 ### UAT branch (`uat`)

-Requires **1 approving GitHub review** before merge:
-
-* **CTO** (The Dogfather) — promotes `dev` → `uat`
+- **CTO** opens and merges a `dev` → `uat` PR.

 ### Main branch (`main`)

-Requires **1 approving GitHub review** before merge:
-
-* **CEO** (Scrubs McBarkley) — promotes `uat` → `main`
+- **CEO** (Scrubs McBarkley) reviews and merges the `uat` → `main` PR.

 `@cpfarhood` is cc'd for visibility on all PRs — never as a reviewer.

-## Pipeline
+## SDLC pipeline

-```
-Dev stage: Engineer → QA Review → CTO Review → CTO merges PR to dev → [auto deploy Dev]
-UAT stage: CTO opens dev→uat PR → Shedward (regression) → Barkley (security) → CEO assigned
-Prod stage: CEO merges uat→main PR → [auto deploy Production]
-```
+### Phase 0 — Product analysis (feature intake)

-### Dev stage
+* Feature requests arrive at the CEO via Paperclip or GitHub Issues.
+* CEO delegates to CMPO (Pawla Abdul) for review.
+* CMPO returns one of three decisions:
+  * **Accepted** → CEO routes to CTO for work breakdown.
+  * **Backlogged** → CEO handles prioritization.
+  * **Denied** → CEO closes as unplanned.
+* CTO breaks accepted work into atomic tasks and assigns to Engineering.

-1. Engineer creates a PR targeting `dev`, hands off to QA (Lint Roller) with `status: "todo"`.
-2. QA reviews code and CI. Pass → hand to CTO. Fail → hand back to engineer directly with exact failure details.
-3. CTO reviews. Approve → merge PR into `dev` (auto-deploys to Dev). Deny → hand back to engineer.
+### Phase 1 — Dev

-### UAT stage
+1. **Engineer** (Flea Flicker) branches from `dev`, writes code. GitOps deploys to dev on demand.
+2. **Engineer** opens a PR against `dev`. CI must pass.
+3. **QA (Lint Roller)** reviews the PR. Fail → back to engineer.
+4. QA approves and hands off to CTO.
+5. **CTO (The Dogfather)** reviews the PR. Fail → back to engineer.
+6. **CTO** merges the dev PR.
+7. **CI** builds and deploys automatically to Dev (`https://dev.groombook.dev`).

-4. CTO opens a PR from `dev` → `uat` to promote the change, assigns Shedward Scissorhands for regression: `status: "todo"`.
-5. Shedward runs UAT in `uat.groombook.dev`. Pass → reports to CTO. Fail → reports to CTO (CTO cascades to engineer).
-6. CTO assigns Barkley Trimsworth for security review: `status: "todo"`.
-7. Barkley reviews. Pass → CTO assigns to CEO. Fail → CTO cascades to engineer.
+### Phase 2 — UAT promotion

-### Prod stage
+8. **CTO** opens and merges a PR from `dev` to `uat`.
+9. **CI** builds and deploys automatically to UAT (`https://uat.groombook.dev`).
+10. **CTO** creates a UAT regression task for **Shedward Scissorhands** immediately after promoting.

-8. CEO reviews and merges the `uat` → `main` PR → auto-deploys to Production.
-9. CEO rejects → returns to CTO → engineer.
+### Phase 3 — UAT testing & security
+
+11. **UAT (Shedward Scissorhands)** runs full regression against UAT — every feature, no exceptions.
+12. UAT fail → CTO redistributes to engineer (return to Phase 1).
+13. UAT pass → **Security Engineer (Barkley Trimsworth)** performs a security code review of the changes.
+14. Security fail → CTO redistributes to engineer (return to Phase 1).
+
+### Phase 4 — Production
+
+15. Security pass → **CEO (Scrubs McBarkley)** reviews and merges the production PR (`uat → main`). Fail → back to CTO.
+16. **CI** deploys automatically to Production (`https://demo.groombook.dev`).

 ### Hierarchy rules

-* CTO rejections go directly to the engineer (not through QA).
-* Shedward UAT failures go to CTO (not directly to the engineer).
-* Barkley security failures go to CTO.
-* CEO rejections go to CTO.
+* CTO rejections at Dev go directly to the engineer (not back through QA).
+* UAT failures (Shedward) go to CTO — CTO cascades to engineer.
+* Security failures (Barkley) go to CTO — CTO cascades to engineer.
+* CEO rejections at Prod go to CTO.
+
+> **Penetration testing.** Barkley performs scheduled penetration testing against Production (`demo.groombook.dev`) and Demo independently of the PR workflow. Board-authorized; not triggered per-PR. Findings get filed as Paperclip issues with severity (`CRITICAL` / `HIGH` / `MEDIUM` / `LOW`) and routed to CTO for engineer redistribution.

 ## Handoff protocol — mandatory
@@ -119,11 +130,11 @@ Every handoff to another agent requires ALL THREE steps:

 ### 1. Explicit assignment

-PATCH the issue with `assigneeAgentId: ""`. Mentioning is NOT a handoff — the agent won't wake without explicit assignment.
+`PATCH /api/issues/{id}` with `assigneeAgentId: ""`. Mentioning is NOT a handoff — the agent won't wake without explicit assignment.

 ### 2. Status = `todo`

-Every handoff sets `status: "todo"`. Never `in_review` for handoffs — it doesn't surface in the receiver's inbox.
+Every handoff sets `status: "todo"`. Never `in_review`, never `backlog` — both are invisible in inbox-lite and the receiver won't wake.

 ### 3. Release checkout

@@ -134,22 +145,70 @@ Headers: Authorization: Bearer $PAPERCLIP_API_KEY, X-Paperclip-Run-Id: $PAPERCLI

 Without this release, the receiving agent cannot check out the issue.

+**Saying you are reassigning a task is NOT the same as reassigning it.** Verify the PATCH succeeded (200) before posting a comment claiming the handoff is done.
+
 ## Infrastructure

 * **Production / Demo:** namespace `groombook`, FQDN `demo.groombook.dev`
 * **UAT:** namespace `groombook-uat`, FQDN `uat.groombook.dev`
 * **Dev:** namespace `groombook-dev`, FQDN `dev.groombook.dev`
-* **Auth:** Better-Auth + OAuth2 via Authentik OIDC at `https://auth.farh.net` (credentials in `authentik-credentials` secret)
-* **Cluster:** Kubernetes — cluster-wide read; read/write on `groombook-dev` and `groombook-uat`.
-* **Gateways:** `istio-external` and `istio-internal` in `gateway-system`.
-* **Deployment:** 2-stage Flux GitOps — CI builds images → updates tags in `groombook/infra` → Flux applies. Never `kubectl apply` for app manifests. No Flux Image Automation.
-* **Infra provisioning:** Commit OpenTofu HCL to `groombook/infra`. Never run `tofu` directly.
-* **Dependency updates:** Mend Renovate only. Never Dependabot.
+* **Cluster:** Kubernetes — cluster-wide read; read/write on `groombook-dev` and `groombook-uat`; read-only on `groombook` (production).
+* **Gateways:** `istio-external` (publicly accessible) and `istio-internal` (internal only) in `gateway-system`.
+* **Container registry:** `ghcr.io/groombook/` only.
+
+## Authentication
+
+* **Framework:** Better-Auth.
+* **Social login:** Google and Apple OAuth.
+* **SSO:** Authentik OIDC at `https://auth.farh.net` (credentials in `authentik-credentials` secret).
+* **Never build custom authentication.**
+
+## Deployment — 2-stage Flux GitOps
+
+**Stage 1 — CI (GitHub Actions, runs in each application repo):**
+- Triggered automatically on every merge to `main`
+- Builds and tags the Docker image
+- Pushes tagged images to `ghcr.io/groombook/`
+
+**Stage 2 — GitOps (Flux, managed externally):**
+- Flux watches `groombook/infra` as the **target** GitRepository — it is **not** a Flux bootstrap/cluster repo.
+- Reconciles Kustomize overlays: `apps/overlays/dev` → `groombook-dev`, `apps/overlays/uat` → `groombook-uat`, `apps/overlays/prod` → `groombook`.
+
+**Policy — Flux Image Tag Automation is DENIED.** Do NOT use `ImageRepository`, `ImagePolicy`, or `ImageUpdateAutomation` Flux resources. Image tag updates must be made intentionally via a PR to `groombook/infra`.
+
+**To deploy a change:**
+1. Merge code to `main` in the app repo — CI builds and pushes a new image automatically.
+2. Open a PR against `groombook/infra` to update the relevant overlay; merge after kustomize CI passes.
+3. Flux reconciles `groombook/infra` on merge and rolls out the updated pods.
+
+**To force a rollout** (pick up new `:latest` on stuck pods):
+```bash
+kubectl rollout restart deployment/ -n
+```
+
+## Infrastructure as Code
+
+Terraform / OpenTofu is deployed via the **Flux OpenTofu Controller** in a GitOps fashion. Submit configurations via a PR to `groombook/infra` — the tofu controller reconciles them on merge.
+
+**Never run `tofu` directly.** Never `kubectl apply` against production. Production changes go through Flux only.
+
+## Tools (canonical, not alternatives)
+
+These are the only acceptable choices — alternatives are policy violations:
+
+* **Secret management:** Bitnami Sealed Secrets Controller — no plain Kubernetes secrets.
+* **Database:** CloudNativePG Operator (Postgres) — no SQLite, MariaDB, or MySQL.
+* **Cache / pub-sub:** DragonflyDB Operator — no Redis.
+* **Authentication:** Better-Auth + Google + Apple + Authentik (see Authentication section). Never build custom auth.
+* **Dependency updates:** Mend Renovate. **Dependabot is not used and will not be used.**
+* **Container registry:** `ghcr.io/groombook/` — no Docker Hub for first-party images.
+
+If a task requires deviating from any of the above, treat it as a destructive action: stop, file an issue with rationale, request board approval.

 ## External communication

-When communicating in any context visible outside the GroomBook agent team (external users, human reviewers, non-agent entities), include `cc @cpfarhood` for visibility.
+When communicating in any context visible outside the GroomBook agent team (external users, human reviewers, non-agent entities), include `cc @cpfarhood` for visibility — never as a reviewer.

 ## No self-merge

-No agent merges their own PR. The merger is always the next role up the SDLC ladder (CTO for `dev`/`uat`, CEO for `main`).
+No agent merges their own PR. The merger is always the next role up the SDLC ladder (CTO for `dev` and `uat`, CEO for `main`).
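
The handoff protocol in the `sdlc` changes requires an explicit assignment, `status: "todo"`, and a confirmed 200 before anyone claims the handoff is done. A minimal bash sketch of steps 1 and 2 follows, assuming `curl` and a hypothetical `PAPERCLIP_API_URL` base URL; `handoff_payload` and `handoff_issue` are illustrative names, not existing GroomBook or Paperclip tooling. Step 3 (release checkout) and the full `X-Paperclip-Run-Id` header value are truncated in this diff, so they are left out rather than guessed.

```bash
# Minimal sketch of handoff steps 1-2 from the sdlc skill:
# explicit assignment plus status "todo", verified by an HTTP 200.
# Assumptions (not from the skill): PAPERCLIP_API_URL is a hypothetical
# base URL; handoff_payload/handoff_issue are illustrative helper names.
# Step 3 (release checkout) is omitted because its endpoint is not in the diff.

# Print the JSON body for the PATCH: explicit assignee, status "todo".
handoff_payload() {
  local agent_id="$1"
  printf '{"assigneeAgentId":"%s","status":"todo"}' "$agent_id"
}

# PATCH /api/issues/{id} and succeed only when the API answers 200.
handoff_issue() {
  local issue_id="$1" agent_id="$2" code
  code=$(curl -sS -o /dev/null -w '%{http_code}' \
    -X PATCH "${PAPERCLIP_API_URL}/api/issues/${issue_id}" \
    -H "Authorization: Bearer ${PAPERCLIP_API_KEY}" \
    -H "Content-Type: application/json" \
    --data "$(handoff_payload "$agent_id")")
  if [ "$code" != "200" ]; then
    # Do not post a "handed off" comment unless this check passed.
    echo "handoff of issue ${issue_id} failed (HTTP ${code})" >&2
    return 1
  fi
  return 0
}
```

For example, `handoff_issue 123 "$QA_AGENT_ID"` (both values hypothetical) returns non-zero for anything other than 200, which matches the rule that saying you reassigned a task is not the same as reassigning it.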
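
The `sdlc` changes also deny Flux Image Tag Automation (`ImageRepository`, `ImagePolicy`, `ImageUpdateAutomation`). As a minimal sketch, not an existing GroomBook tool, a reviewer could scan a `groombook/infra` checkout for those kinds before approving a PR. The function name `check_no_image_automation` is hypothetical, and the sketch assumes GNU grep and manifests stored as `.yaml`/`.yml` files with top-level `kind:` lines.

```bash
# Hypothetical reviewer check for a groombook/infra checkout: fail if any
# manifest declares a Flux image-automation kind that the sdlc skill denies.
# Assumes GNU grep (-r with --include) and top-level "kind:" lines in YAML.
check_no_image_automation() {
  local dir="${1:-.}"
  local pattern='^[[:space:]]*kind:[[:space:]]*(ImageRepository|ImagePolicy|ImageUpdateAutomation)([[:space:]#]|$)'
  local status=0
  # -r: recurse, -n: show line numbers, -E: extended regex. Matches go to stdout.
  grep -rnE --include='*.yaml' --include='*.yml' "$pattern" "$dir" || status=$?
  case "$status" in
    0) echo "policy violation: Flux image automation resources in ${dir}" >&2; return 1 ;;
    1) return 0 ;;  # grep found no match: nothing denied
    *) echo "could not scan ${dir}" >&2; return 2 ;;
  esac
}
```

A plain text match is a deliberately blunt choice: it needs no cluster access, so it could run alongside the kustomize CI mentioned in the deploy steps, but it will not catch resources that are only produced at build time.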