Feat/temporal (#46)

* refactor: modularize claude-executor and extract shared utilities

- Extract message handling into src/ai/message-handlers.ts with pure functions
- Extract output formatting into src/ai/output-formatters.ts
- Extract progress management into src/ai/progress-manager.ts
- Add audit-logger.ts with Null Object pattern for optional logging
- Add shared utilities: formatting.ts, file-io.ts, functional.ts
- Consolidate getPromptNameForAgent into src/types/agents.ts
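
A minimal sketch of the Null Object pattern mentioned for audit-logger.ts, so callers never need to check whether logging is enabled. The `AuditLogger` interface, both classes, `createAuditLogger`, and `runAgent` are hypothetical names for illustration; the actual API in the commit is not shown here.

```typescript
// Hypothetical interface: the real audit-logger.ts API is not shown in this commit.
interface AuditLogger {
  logEvent(event: string, data?: Record<string, unknown>): void;
}

// "Real" logger: collects entries in memory as a stand-in for writing to disk.
class MemoryAuditLogger implements AuditLogger {
  readonly entries: string[] = [];

  logEvent(event: string, data: Record<string, unknown> = {}): void {
    this.entries.push(`${event} ${JSON.stringify(data)}`);
  }
}

// Null Object: same interface, deliberately does nothing.
class NullAuditLogger implements AuditLogger {
  logEvent(): void {
    // intentionally empty
  }
}

// Callers receive a logger either way and never branch on "is logging on?".
function createAuditLogger(enabled: boolean): AuditLogger {
  return enabled ? new MemoryAuditLogger() : new NullAuditLogger();
}

// Example caller: identical code path with or without logging.
function runAgent(name: string, logger: AuditLogger): string {
  logger.logEvent("agent_start", { name });
  const result = `${name}: done`;
  logger.logEvent("agent_end", { name });
  return result;
}
```

The design choice is that optional logging becomes a construction-time decision rather than a null check at every call site.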

* feat: add Claude Code custom commands for debug and review

* feat: add Temporal integration foundation (phase 1-2)

- Add Temporal SDK dependencies (@temporalio/client, worker, workflow, activity)
- Add shared types for pipeline state, metrics, and progress queries
- Add classifyErrorForTemporal() for retry behavior classification
- Add docker-compose for Temporal server with SQLite persistence

* feat: add Temporal activities for agent execution (phase 3)

- Add activities.ts with heartbeat loop, git checkpoint/rollback, and error classification
- Export runClaudePrompt, validateAgentOutput, ClaudePromptResult for Temporal use
- Track attempt number via Temporal Context for accurate audit logging
- Rollback git workspace before retry to ensure clean state

* feat: add Temporal workflow for 5-phase pipeline orchestration (phase 4)

* feat: add Temporal worker, client, and query tools (phase 5)

- Add worker.ts with workflow bundling and graceful shutdown
- Add client.ts CLI to start pipelines with progress polling
- Add query.ts CLI to inspect running workflow state
- Fix buffer overflow by truncating error messages and stack traces
- Skip git operations gracefully on non-git repositories
- Add kill.sh/start.sh dev scripts and Dockerfile.worker
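
One way the truncation fix above could look, as a sketch only. The limits (`MAX_MESSAGE_CHARS`, `MAX_STACK_LINES`) and the helper names `truncate` and `summarizeError` are assumptions for illustration, not values or functions taken from the commit.

```typescript
// Assumed limits for illustration; the commit does not state the real values.
const MAX_MESSAGE_CHARS = 2000;
const MAX_STACK_LINES = 20;
const TRUNCATION_MARKER = "... [truncated]";

// Cap a string at maxChars, appending a marker when something was cut.
function truncate(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  if (maxChars <= TRUNCATION_MARKER.length) return text.slice(0, maxChars);
  return text.slice(0, maxChars - TRUNCATION_MARKER.length) + TRUNCATION_MARKER;
}

// Reduce any thrown value to a bounded message and a bounded stack trace.
function summarizeError(err: unknown): { message: string; stack: string } {
  const e = err instanceof Error ? err : new Error(String(err));
  const stackLines = (e.stack ?? "").split("\n").slice(0, MAX_STACK_LINES);
  return {
    message: truncate(e.message, MAX_MESSAGE_CHARS),
    stack: stackLines.join("\n"),
  };
}
```

Bounding both fields before they are serialized keeps a single oversized error from exceeding whatever buffer or payload limit sits downstream.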

* feat: fix Docker worker container setup

- Install uv instead of deprecated uvx package
- Add mcp-server and configs directories to container
- Mount target repo dynamically via TARGET_REPO env variable

* fix: add report assembly step to Temporal workflow

- Add assembleReportActivity to concatenate exploitation evidence files before report agent runs
- Call assembleFinalReport in workflow Phase 5 before runReportAgent
- Ensure deliverables directory exists before writing final report
- Simplify pipeline-testing report prompt to just prepend header

* refactor: consolidate Docker setup to root docker-compose.yml

* feat: improve Temporal client UX and env handling

- Change default to fire-and-forget (--wait flag to opt-in)
- Add splash screen and improve console output formatting
- Add .env to gitignore, remove from dockerignore for container access
- Add Taskfile for common development commands

* refactor: simplify session ID handling and improve Taskfile options

- Include hostname in workflow ID for better audit log organization
- Extract sanitizeHostname utility to audit/utils.ts for reuse
- Remove unused generateSessionLogPath and buildLogFilePath functions
- Simplify Taskfile with CONFIG/OUTPUT/CLEAN named parameters

* chore: add .env.example and simplify .gitignore

* docs: update README and CLAUDE.md for Temporal workflow usage

- Replace Docker CLI instructions with Task-based commands
- Add monitoring/stopping sections and workflow examples
- Document Temporal orchestration layer and troubleshooting
- Simplify file structure to key files overview

* refactor: replace Taskfile with bash CLI script

- Add shannon bash script with start/logs/query/stop/help commands
- Remove Taskfile.yml dependency (no longer requires Task installation)
- Update README.md and CLAUDE.md to use ./shannon commands
- Update client.ts output to show ./shannon commands

* docs: fix deliverable filename in README

* refactor: remove direct CLI and .shannon-store.json in favor of Temporal

- Delete src/shannon.ts direct CLI entry point (Temporal is now the only mode)
- Remove .shannon-store.json session lock (Temporal handles workflow deduplication)
- Remove broken scripts/export-metrics.js (imported non-existent function)
- Update package.json to remove main, start script, and bin entry
- Clean up CLAUDE.md and debug.md to remove obsolete references

* chore: remove licensing comments from prompt files to prevent leaking into actual prompts

* fix: resolve parallel workflow race conditions and retry logic bugs

- Fix save_deliverable race condition using closure pattern instead of global variable
- Fix error classification order so OutputValidationError matches before generic validation
- Fix ApplicationFailure re-classification bug by checking instanceof before re-throwing
- Add per-error-type retry limits (3 for output validation, 50 for billing)
- Add fast retry intervals for pipeline testing mode (10s vs 5min)
- Increase worker concurrent activities to 25 for parallel workflows
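
The closure fix for the save_deliverable race can be illustrated with a small sketch. All names and paths below (`sharedTargetDir`, `createSaveDeliverable`, `/repos/app-a`) are hypothetical, and the synchronous interleaving only stands in for what concurrent activities on one worker would do.

```typescript
// Racy shape: one module-level variable shared by every workflow on the worker.
let sharedTargetDir = "";

function setTargetDir(dir: string): void {
  sharedTargetDir = dir;
}

function saveDeliverableGlobal(filename: string): string {
  return `${sharedTargetDir}/deliverables/${filename}`;
}

// Closure shape: each workflow gets its own handler with targetDir captured,
// so no other workflow can change it afterwards.
function createSaveDeliverable(targetDir: string): (filename: string) => string {
  return (filename) => `${targetDir}/deliverables/${filename}`;
}

// Simulated interleaving: workflow B sets the global before workflow A saves.
setTargetDir("/repos/app-a");
setTargetDir("/repos/app-b");
const racyPath = saveDeliverableGlobal("a.md"); // "/repos/app-b/deliverables/a.md" (wrong repo)

// With closures, the same interleaving is harmless.
const saveForA = createSaveDeliverable("/repos/app-a");
const saveForB = createSaveDeliverable("/repos/app-b");
const safePathA = saveForA("a.md"); // "/repos/app-a/deliverables/a.md"
const safePathB = saveForB("b.md"); // "/repos/app-b/deliverables/b.md"
```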

* refactor: pipeline vuln→exploit workflow for parallel execution

- Replace sync barrier between vuln/exploit phases with independent pipelines
- Each vuln type runs: vuln agent → queue check → conditional exploit
- Add checkExploitationQueue activity to skip exploits when no vulns found
- Use Promise.allSettled for graceful failure handling across pipelines
- Add PipelineSummary type for aggregated cost/duration/turns metrics
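
The independent-pipeline flow described above can be sketched as follows. `checkExploitationQueue` is named in the commit, but its signature here is an assumption, and `PipelineDeps`, `runVulnAgent`, `runExploitAgent`, `runPipeline`, and `runAllPipelines` are hypothetical names for illustration.

```typescript
interface PipelineResult {
  vulnType: string;
  exploited: boolean;
}

// Hypothetical dependencies; in the real workflow these would be activities.
interface PipelineDeps {
  runVulnAgent(vulnType: string): Promise<void>;
  checkExploitationQueue(vulnType: string): Promise<boolean>; // true when findings are queued
  runExploitAgent(vulnType: string): Promise<void>;
}

// One pipeline per vuln type: vuln agent -> queue check -> conditional exploit.
async function runPipeline(vulnType: string, deps: PipelineDeps): Promise<PipelineResult> {
  await deps.runVulnAgent(vulnType);
  const hasFindings = await deps.checkExploitationQueue(vulnType);
  if (!hasFindings) return { vulnType, exploited: false };
  await deps.runExploitAgent(vulnType);
  return { vulnType, exploited: true };
}

// Pipelines run concurrently; one failure does not cancel the others.
async function runAllPipelines(
  vulnTypes: string[],
  deps: PipelineDeps,
): Promise<{ completed: PipelineResult[]; failed: string[] }> {
  const settled = await Promise.allSettled(vulnTypes.map((v) => runPipeline(v, deps)));
  const completed: PipelineResult[] = [];
  const failed: string[] = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") completed.push(outcome.value);
    else failed.push(vulnTypes[i] ?? "unknown");
  });
  return { completed, failed };
}
```

Compared with a sync barrier, a slow vuln agent no longer delays exploitation for the other vuln types, and `Promise.allSettled` reports each pipeline's outcome instead of rejecting on the first failure.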

* fix: re-throw retryable errors in checkExploitationQueue

* fix: detect and retry on Claude Code spending cap errors

- Add spending cap pattern detection in detectApiError() with retryable error
- Add matching patterns to classifyErrorForTemporal() for proper Temporal retry
- Add defense-in-depth safeguard in runClaudePrompt() for $0 cost / low turn detection
- Add final sanity check in activities before declaring success
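
A rough sketch of the "$0 cost / low turn" safeguard described above. The `PromptResult` shape, the `MIN_PLAUSIBLE_TURNS` threshold, and the `RetryableError` class are assumptions for illustration; the commit does not state the real threshold or types.

```typescript
// Hypothetical result shape; the real ClaudePromptResult fields are not shown here.
interface PromptResult {
  success: boolean;
  costUsd: number;
  turns: number;
}

// Assumed threshold: a genuine agent run is expected to take more turns than this.
const MIN_PLAUSIBLE_TURNS = 2;

// Hypothetical error type that a retry classifier could treat as retryable.
class RetryableError extends Error {
  readonly retryable = true;
}

// A "successful" run that cost nothing and barely ran is suspicious:
// it may be a spending-cap message returned as ordinary output.
function looksLikeSpendingCap(result: PromptResult): boolean {
  return result.success && result.costUsd === 0 && result.turns <= MIN_PLAUSIBLE_TURNS;
}

// Final check before declaring success: pass through or throw for retry.
function assertPlausibleRun(result: PromptResult): PromptResult {
  if (looksLikeSpendingCap(result)) {
    throw new RetryableError("Run reported success with $0 cost and very few turns");
  }
  return result;
}
```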

* fix: increase heartbeat timeout to prevent false worker-dead detection

The original 30s timeout came from the POC spec, which assumed activities
shorter than 5min. With hour-long activities and multiple concurrent
workflows sharing one worker, resource contention causes event loop stalls
exceeding 30s, which trigger false heartbeat timeouts. Increased the
timeout to 10min (prod) and 5min (testing).
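
To make the reasoning concrete, here is a small sketch of the timeout arithmetic. The 45s stall is a hypothetical figure and `heartbeatTimedOut` is an illustrative helper, not part of the Temporal SDK.

```typescript
const SECOND = 1_000;
const MINUTE = 60 * SECOND;

// True when the gap since the last heartbeat exceeds the timeout,
// i.e. when the activity would be treated as dead.
function heartbeatTimedOut(lastHeartbeatMs: number, nowMs: number, timeoutMs: number): boolean {
  return nowMs - lastHeartbeatMs > timeoutMs;
}

// Hypothetical stall: the worker's event loop is blocked for 45s,
// so no heartbeat is sent during that window even though the activity is healthy.
const stallMs = 45 * SECOND;

const withOldTimeout = heartbeatTimedOut(0, stallMs, 30 * SECOND); // true: false positive
const withNewTimeout = heartbeatTimedOut(0, stallMs, 10 * MINUTE); // false: stall tolerated
```

The trade-off is that a genuinely dead worker now takes longer to detect, which is acceptable when individual activities already run for up to an hour.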

* fix: temporal db init

* fix: persist home dir

* feat: add per-workflow unified logging with ./shannon logs ID=<workflow-id>

- Add WorkflowLogger class for human-readable, per-workflow log files
- Create workflow.log in audit-logs/{workflowId}/ with phase, agent, tool, and LLM events
- Update ./shannon logs to require ID param and tail specific workflow log
- Add phase transition logging at workflow boundaries
- Include workflow completion summary with agent breakdown (duration, cost)
- Mount audit-logs volume in docker-compose for host access

---------

Co-authored-by: ezl-keygraph <ezhil@keygraph.io>
Arjun Malleswaran
2026-01-15 10:36:11 -08:00
committed by GitHub
parent 45acb16711
commit 51e621d0d5
77 changed files with 6117 additions and 2417 deletions
+82 -111
@@ -79,10 +79,11 @@ Shannon is available in two editions:
- [Product Line](#-product-line)
- [Setup & Usage Instructions](#-setup--usage-instructions)
- [Prerequisites](#prerequisites)
- [Authentication Setup](#authentication-setup)
- [Quick Start with Docker](#quick-start-with-docker)
- [Quick Start](#quick-start)
- [Monitoring Progress](#monitoring-progress)
- [Stopping Shannon](#stopping-shannon)
- [Usage Examples](#usage-examples)
- [Configuration (Optional)](#configuration-optional)
- [Usage Patterns](#usage-patterns)
- [Output and Results](#output-and-results)
- [Sample Reports & Benchmarks](#-sample-reports--benchmarks)
- [Architecture](#-architecture)
@@ -98,36 +99,71 @@ Shannon is available in two editions:
### Prerequisites
- **Claude Console account with credits** - Required for AI-powered analysis
- **Docker installed** - Primary deployment method
- **Docker** - Container runtime ([Install Docker](https://docs.docker.com/get-docker/))
- **Anthropic API key or Claude Code OAuth token** - Get from [Anthropic Console](https://console.anthropic.com)
### Authentication Setup
You need either a **Claude Code OAuth token** or an **Anthropic API key** to run Shannon. Get your token from the [Anthropic Console](https://console.anthropic.com) and pass it to Docker via the `-e` flag.
### Environment Configuration (Recommended)
To prevent Claude Code from hitting token limits during long report generation, set the max output tokens environment variable:
**For local runs:**
```bash
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
```
**For Docker runs:**
```bash
-e CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
```
### Quick Start with Docker
#### Build the Container
### Quick Start
```bash
docker build -t shannon:latest .
# 1. Clone Shannon
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon
# 2. Configure credentials (choose one method)
# Option A: Export environment variables
export ANTHROPIC_API_KEY="your-api-key" # or CLAUDE_CODE_OAUTH_TOKEN
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 # recommended
# Option B: Create a .env file
cat > .env << 'EOF'
ANTHROPIC_API_KEY=your-api-key
CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
EOF
# 3. Run a pentest
./shannon start URL=https://your-app.com REPO=/path/to/your/repo
```
#### Prepare Your Repository
Shannon will build the containers, start the workflow, and return a workflow ID. The pentest runs in the background.
### Monitoring Progress
```bash
# View real-time worker logs
./shannon logs
# Query a specific workflow's progress
./shannon query ID=shannon-1234567890
# Open the Temporal Web UI for detailed monitoring
open http://localhost:8233
```
### Stopping Shannon
```bash
# Stop all containers (preserves workflow data)
./shannon stop
# Full cleanup (removes all data)
./shannon stop CLEAN=true
```
### Usage Examples
```bash
# Basic pentest
./shannon start URL=https://example.com REPO=/path/to/repo
# With a configuration file
./shannon start URL=https://example.com REPO=/path/to/repo CONFIG=./configs/my-config.yaml
# Custom output directory
./shannon start URL=https://example.com REPO=/path/to/repo OUTPUT=./my-reports
```
### Prepare Your Repository
Shannon is designed for **web application security testing** and expects all application code to be available in a single directory structure. This works well for:
@@ -137,105 +173,35 @@ Shannon is designed for **web application security testing** and expects all app
**For monorepos:**
```bash
git clone https://github.com/your-org/your-monorepo.git repos/your-app
git clone https://github.com/your-org/your-monorepo.git /path/to/your-app
```
**For multi-repository applications** (e.g., separate frontend/backend):
```bash
mkdir repos/your-app
cd repos/your-app
mkdir /path/to/your-app
cd /path/to/your-app
git clone https://github.com/your-org/frontend.git
git clone https://github.com/your-org/backend.git
git clone https://github.com/your-org/api.git
```
**For existing local repositories:**
```bash
cp -r /path/to/your-existing-repo repos/your-app
```
#### Run Your First Pentest
**With Claude Console OAuth Token:**
```bash
docker run --rm -it \
--network host \
--cap-add=NET_RAW \
--cap-add=NET_ADMIN \
-e CLAUDE_CODE_OAUTH_TOKEN="$CLAUDE_CODE_OAUTH_TOKEN" \
-e CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 \
-v "$(pwd)/repos:/app/repos" \
-v "$(pwd)/configs:/app/configs" \
# Comment below line if using custom output directory
-v "$(pwd)/audit-logs:/app/audit-logs" \
shannon:latest \
"https://your-app.com/" \
"/app/repos/your-app" \
--config /app/configs/example-config.yaml
# Optional: uncomment below for custom output directory
# -v "$(pwd)/reports:/app/reports" \
# --output /app/reports
```
**With Anthropic API Key:**
```bash
docker run --rm -it \
--network host \
--cap-add=NET_RAW \
--cap-add=NET_ADMIN \
-e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
-e CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 \
-v "$(pwd)/repos:/app/repos" \
-v "$(pwd)/configs:/app/configs" \
# Comment below line if using custom output directory
-v "$(pwd)/audit-logs:/app/audit-logs" \
shannon:latest \
"https://your-app.com/" \
"/app/repos/your-app" \
--config /app/configs/example-config.yaml
# Optional: uncomment below for custom output directory
# -v "$(pwd)/reports:/app/reports" \
# --output /app/reports
```
#### Platform-Specific Instructions
### Platform-Specific Instructions
**For Linux (Native Docker):**
Add the `--user $(id -u):$(id -g)` flag to the Docker commands above to avoid permission issues with volume mounts. Docker Desktop on macOS and Windows handles this automatically, but native Linux Docker requires explicit user mapping.
You may need to run commands with `sudo` depending on your Docker setup. If you encounter permission issues with output files, ensure your user has access to the Docker socket.
**Network Capabilities:**
**For macOS:**
- `--cap-add=NET_RAW` - Enables advanced port scanning with nmap
- `--cap-add=NET_ADMIN` - Allows network administration for security tools
- `--network host` - Provides access to target network interfaces
Works out of the box with Docker Desktop installed.
**Testing Local Applications:**
Docker containers cannot reach `localhost` on your host machine. Use `host.docker.internal` in place of `localhost`:
```bash
docker run --rm -it \
--add-host=host.docker.internal:host-gateway \
--cap-add=NET_RAW \
--cap-add=NET_ADMIN \
-e CLAUDE_CODE_OAUTH_TOKEN="$CLAUDE_CODE_OAUTH_TOKEN" \
-e CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 \
-v "$(pwd)/repos:/app/repos" \
-v "$(pwd)/configs:/app/configs" \
# Comment below line if using custom output directory
-v "$(pwd)/audit-logs:/app/audit-logs" \
shannon:latest \
"http://host.docker.internal:3000" \
"/app/repos/your-app" \
--config /app/configs/example-config.yaml
# Optional: uncomment below for custom output directory
# -v "$(pwd)/reports:/app/reports" \
# --output /app/reports
./shannon start URL=http://host.docker.internal:3000 REPO=/path/to/repo
```
### Configuration (Optional)
@@ -288,12 +254,17 @@ If your application uses two-factor authentication, simply add the TOTP secret t
### Output and Results
All results are saved to `./audit-logs/` by default. Use `--output <path>` to specify a custom directory. If using `--output`, ensure that path is mounted to an accessible host directory (e.g., `-v "$(pwd)/custom-directory:/app/reports"`).
All results are saved to `./audit-logs/{hostname}_{sessionId}/` by default. Use `--output <path>` to specify a custom directory.
- **Pre-reconnaissance reports** - External scan results
- **Vulnerability assessments** - Potential vulnerabilities from thorough code analysis and network mapping
- **Exploitation results** - Proof-of-concept attempts
- **Executive reports** - Business-focused security summaries
Output structure:
```
audit-logs/{hostname}_{sessionId}/
├── session.json # Metrics and session data
├── agents/ # Per-agent execution logs
├── prompts/ # Prompt snapshots for reproducibility
└── deliverables/
└── comprehensive_security_assessment_report.md # Final comprehensive security report
```
---