Feat/temporal (#46)

* refactor: modularize claude-executor and extract shared utilities
- Extract message handling into src/ai/message-handlers.ts with pure functions
- Extract output formatting into src/ai/output-formatters.ts
- Extract progress management into src/ai/progress-manager.ts
- Add audit-logger.ts with Null Object pattern for optional logging
- Add shared utilities: formatting.ts, file-io.ts, functional.ts
- Consolidate getPromptNameForAgent into src/types/agents.ts

* feat: add Claude Code custom commands for debug and review

* feat: add Temporal integration foundation (phase 1-2)
- Add Temporal SDK dependencies (@temporalio/client, worker, workflow, activity)
- Add shared types for pipeline state, metrics, and progress queries
- Add classifyErrorForTemporal() for retry behavior classification
- Add docker-compose for Temporal server with SQLite persistence

* feat: add Temporal activities for agent execution (phase 3)
- Add activities.ts with heartbeat loop, git checkpoint/rollback, and error classification
- Export runClaudePrompt, validateAgentOutput, ClaudePromptResult for Temporal use
- Track attempt number via Temporal Context for accurate audit logging
- Rollback git workspace before retry to ensure clean state

* feat: add Temporal workflow for 5-phase pipeline orchestration (phase 4)

* feat: add Temporal worker, client, and query tools (phase 5)
- Add worker.ts with workflow bundling and graceful shutdown
- Add client.ts CLI to start pipelines with progress polling
- Add query.ts CLI to inspect running workflow state
- Fix buffer overflow by truncating error messages and stack traces
- Skip git operations gracefully on non-git repositories
- Add kill.sh/start.sh dev scripts and Dockerfile.worker

* feat: fix Docker worker container setup
- Install uv instead of deprecated uvx package
- Add mcp-server and configs directories to container
- Mount target repo dynamically via TARGET_REPO env variable

* fix: add report assembly step to Temporal workflow
- Add assembleReportActivity to concatenate exploitation evidence files before report agent runs
- Call assembleFinalReport in workflow Phase 5 before runReportAgent
- Ensure deliverables directory exists before writing final report
- Simplify pipeline-testing report prompt to just prepend header

* refactor: consolidate Docker setup to root docker-compose.yml

* feat: improve Temporal client UX and env handling
- Change default to fire-and-forget (--wait flag to opt-in)
- Add splash screen and improve console output formatting
- Add .env to gitignore, remove from dockerignore for container access
- Add Taskfile for common development commands

* refactor: simplify session ID handling and improve Taskfile options
- Include hostname in workflow ID for better audit log organization
- Extract sanitizeHostname utility to audit/utils.ts for reuse
- Remove unused generateSessionLogPath and buildLogFilePath functions
- Simplify Taskfile with CONFIG/OUTPUT/CLEAN named parameters

* chore: add .env.example and simplify .gitignore

* docs: update README and CLAUDE.md for Temporal workflow usage
- Replace Docker CLI instructions with Task-based commands
- Add monitoring/stopping sections and workflow examples
- Document Temporal orchestration layer and troubleshooting
- Simplify file structure to key files overview

* refactor: replace Taskfile with bash CLI script
- Add shannon bash script with start/logs/query/stop/help commands
- Remove Taskfile.yml dependency (no longer requires Task installation)
- Update README.md and CLAUDE.md to use ./shannon commands
- Update client.ts output to show ./shannon commands

* docs: fix deliverable filename in README

* refactor: remove direct CLI and .shannon-store.json in favor of Temporal
- Delete src/shannon.ts direct CLI entry point (Temporal is now the only mode)
- Remove .shannon-store.json session lock (Temporal handles workflow deduplication)
- Remove broken scripts/export-metrics.js (imported non-existent function)
- Update package.json to remove main, start script, and bin entry
- Clean up CLAUDE.md and debug.md to remove obsolete references

* chore: remove licensing comments from prompt files to prevent leaking into actual prompts

* fix: resolve parallel workflow race conditions and retry logic bugs
- Fix save_deliverable race condition using closure pattern instead of global variable
- Fix error classification order so OutputValidationError matches before generic validation
- Fix ApplicationFailure re-classification bug by checking instanceof before re-throwing
- Add per-error-type retry limits (3 for output validation, 50 for billing)
- Add fast retry intervals for pipeline testing mode (10s vs 5min)
- Increase worker concurrent activities to 25 for parallel workflows

* refactor: pipeline vuln→exploit workflow for parallel execution
- Replace sync barrier between vuln/exploit phases with independent pipelines
- Each vuln type runs: vuln agent → queue check → conditional exploit
- Add checkExploitationQueue activity to skip exploits when no vulns found
- Use Promise.allSettled for graceful failure handling across pipelines
- Add PipelineSummary type for aggregated cost/duration/turns metrics

* fix: re-throw retryable errors in checkExploitationQueue

* fix: detect and retry on Claude Code spending cap errors
- Add spending cap pattern detection in detectApiError() with retryable error
- Add matching patterns to classifyErrorForTemporal() for proper Temporal retry
- Add defense-in-depth safeguard in runClaudePrompt() for $0 cost / low turn detection
- Add final sanity check in activities before declaring success

* fix: increase heartbeat timeout to prevent false worker-dead detection

Original 30s timeout was from POC spec assuming <5min activities. With hour-long activities and multiple concurrent workflows sharing one worker, resource contention causes event loop stalls exceeding 30s, triggering false heartbeat timeouts. Increased to 10min (prod) and 5min (testing).

* fix: temporal db init

* fix: persist home dir

* feat: add per-workflow unified logging with ./shannon logs ID=<workflow-id>
- Add WorkflowLogger class for human-readable, per-workflow log files
- Create workflow.log in audit-logs/{workflowId}/ with phase, agent, tool, and LLM events
- Update ./shannon logs to require ID param and tail specific workflow log
- Add phase transition logging at workflow boundaries
- Include workflow completion summary with agent breakdown (duration, cost)
- Mount audit-logs volume in docker-compose for host access

---------

Co-authored-by: ezl-keygraph <ezhil@keygraph.io>
2026-01-15 10:36:11 -08:00
parent 45acb16711
commit 51e621d0d5
77 changed files with 6117 additions and 2417 deletions
@@ -0,0 +1,213 @@
+#!/bin/bash
+# Shannon CLI - AI Penetration Testing Framework
+
+set -e
+
+COMPOSE_FILE="docker-compose.yml"
+
+# Load .env if present
+if [ -f .env ]; then
+  set -a
+  source .env
+  set +a
+fi
+
+show_help() {
+  cat << 'EOF'
+Shannon - AI Penetration Testing Framework
+
+Usage:
+  ./shannon start URL=<url> REPO=<path>   Start a pentest workflow
+  ./shannon logs ID=<workflow-id>         Tail logs for a specific workflow
+  ./shannon query ID=<workflow-id>        Query workflow progress
+  ./shannon stop                          Stop all containers
+  ./shannon help                          Show this help message
+
+Options for 'start':
+  CONFIG=<path>          Configuration file (YAML)
+  OUTPUT=<path>          Output directory for reports
+  PIPELINE_TESTING=true  Use minimal prompts for fast testing
+
+Options for 'stop':
+  CLEAN=true      Remove all data including volumes
+
+Examples:
+  ./shannon start URL=https://example.com REPO=/path/to/repo
+  ./shannon start URL=https://example.com REPO=/path/to/repo CONFIG=./config.yaml
+  ./shannon logs ID=example.com_shannon-1234567890
+  ./shannon query ID=example.com_shannon-1234567890
+  ./shannon stop CLEAN=true
+
+Monitor workflows at http://localhost:8233
+EOF
+}
+
+# Parse KEY=value arguments into variables
+parse_args() {
+  for arg in "$@"; do
+    case "$arg" in
+      URL=*) URL="${arg#URL=}" ;;
+      REPO=*) REPO="${arg#REPO=}" ;;
+      CONFIG=*) CONFIG="${arg#CONFIG=}" ;;
+      OUTPUT=*) OUTPUT="${arg#OUTPUT=}" ;;
+      ID=*) ID="${arg#ID=}" ;;
+      CLEAN=*) CLEAN="${arg#CLEAN=}" ;;
+      PIPELINE_TESTING=*) PIPELINE_TESTING="${arg#PIPELINE_TESTING=}" ;;
+      REBUILD=*) REBUILD="${arg#REBUILD=}" ;;
+    esac
+  done
+}
+
+# Check if Temporal is running and healthy
+is_temporal_ready() {
+  docker compose -f "$COMPOSE_FILE" exec -T temporal \
+    temporal operator cluster health --address localhost:7233 2>/dev/null | grep -q "SERVING"
+}
+
+# Ensure containers are running
+ensure_containers() {
+  # Quick check: if Temporal is already healthy, we're good
+  if is_temporal_ready; then
+    return 0
+  fi
+
+  # Need to start containers
+  echo "Starting Shannon containers..."
+  if [ "$REBUILD" = "true" ]; then
+    # Force rebuild without cache (use when code changes aren't being picked up)
+    echo "Rebuilding with --no-cache..."
+    docker compose -f "$COMPOSE_FILE" build --no-cache worker
+  fi
+  docker compose -f "$COMPOSE_FILE" up -d --build
+
+  # Wait for Temporal to be ready
+  echo "Waiting for Temporal to be ready..."
+  for i in $(seq 1 30); do
+    if is_temporal_ready; then
+      echo "Temporal is ready!"
+      return 0
+    fi
+    if [ "$i" -eq 30 ]; then
+      echo "Timeout waiting for Temporal"
+      exit 1
+    fi
+    sleep 2
+  done
+}
+
+cmd_start() {
+  parse_args "$@"
+
+  # Validate required vars
+  if [ -z "$URL" ] || [ -z "$REPO" ]; then
+    echo "ERROR: URL and REPO are required"
+    echo "Usage: ./shannon start URL=<url> REPO=<path>"
+    exit 1
+  fi
+
+  # Check for API key
+  if [ -z "$ANTHROPIC_API_KEY" ] && [ -z "$CLAUDE_CODE_OAUTH_TOKEN" ]; then
+    echo "ERROR: Set ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN in .env"
+    exit 1
+  fi
+
+  # Determine container path for REPO
+  # - If REPO is already a container path (/benchmarks/*, /target-repo), use as-is
+  # - Otherwise, it's a host path - mount to /target-repo and use that
+  case "$REPO" in
+    /benchmarks/*|/target-repo|/target-repo/*)
+      CONTAINER_REPO="$REPO"
+      ;;
+    *)
+      # Host path - export for docker-compose mount
+      export TARGET_REPO="$REPO"
+      CONTAINER_REPO="/target-repo"
+      ;;
+  esac
+
+  # Ensure containers are running (starts them if needed)
+  ensure_containers
+
+  # Build optional args as an array so paths containing spaces survive word splitting
+  ARGS=()
+  [ -n "$CONFIG" ] && ARGS+=(--config "$CONFIG")
+  [ -n "$OUTPUT" ] && ARGS+=(--output "$OUTPUT")
+  [ "$PIPELINE_TESTING" = "true" ] && ARGS+=(--pipeline-testing)
+
+  # Run the client to submit workflow
+  docker compose -f "$COMPOSE_FILE" exec -T worker \
+    node dist/temporal/client.js "$URL" "$CONTAINER_REPO" "${ARGS[@]}"
+}
+
+cmd_logs() {
+  parse_args "$@"
+
+  if [ -z "$ID" ]; then
+    echo "ERROR: ID is required"
+    echo "Usage: ./shannon logs ID=<workflow-id>"
+    exit 1
+  fi
+
+  WORKFLOW_LOG="./audit-logs/${ID}/workflow.log"
+
+  if [ -f "$WORKFLOW_LOG" ]; then
+    echo "Tailing workflow log: $WORKFLOW_LOG"
+    tail -f "$WORKFLOW_LOG"
+  else
+    echo "ERROR: Workflow log not found: $WORKFLOW_LOG"
+    echo ""
+    echo "Possible causes:"
+    echo "  - Workflow hasn't started yet"
+    echo "  - Workflow ID is incorrect"
+    echo "  - Workflow is using a custom OUTPUT path"
+    echo ""
+    echo "Check: ./shannon query ID=$ID for workflow details"
+    exit 1
+  fi
+}
+
+cmd_query() {
+  parse_args "$@"
+
+  if [ -z "$ID" ]; then
+    echo "ERROR: ID is required"
+    echo "Usage: ./shannon query ID=<workflow-id>"
+    exit 1
+  fi
+
+  docker compose -f "$COMPOSE_FILE" exec -T worker \
+    node dist/temporal/query.js "$ID"
+}
+
+cmd_stop() {
+  parse_args "$@"
+
+  if [ "$CLEAN" = "true" ]; then
+    docker compose -f "$COMPOSE_FILE" down -v
+  else
+    docker compose -f "$COMPOSE_FILE" down
+  fi
+}
+
+# Main command dispatch
+case "${1:-help}" in
+  start)
+    shift
+    cmd_start "$@"
+    ;;
+  logs)
+    shift
+    cmd_logs "$@"
+    ;;
+  query)
+    shift
+    cmd_query "$@"
+    ;;
+  stop)
+    shift
+    cmd_stop "$@"
+    ;;
+  help|--help|-h|*)
+    show_help
+    ;;
+esac
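
The readiness loop in ensure_containers polls is_temporal_ready up to 30 times and exits the script on timeout. One possible way to factor that pattern into a reusable helper is sketched below. The names wait_for and fake_ready are hypothetical and are not part of the shannon script; fake_ready only stands in for a real health check such as is_temporal_ready so the sketch runs without Docker.

```shell
#!/bin/bash
# Sketch only: wait_for is a hypothetical helper, not part of the shannon script.
# Usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_for() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the final one
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Stand-in for a real health check (assumption): succeeds on the third call.
CALLS=0
fake_ready() {
  CALLS=$((CALLS + 1))
  [ "$CALLS" -ge 3 ]
}

if wait_for 5 0 fake_ready; then
  echo "ready after $CALLS attempts"
else
  echo "timed out after $CALLS attempts" >&2
fi
```

Returning a status instead of calling exit leaves the timeout policy to the caller. Under that assumption, ensure_containers could call something like `wait_for 30 2 is_temporal_ready || exit 1`.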