Feat/temporal (#46)

* refactor: modularize claude-executor and extract shared utilities
- Extract message handling into src/ai/message-handlers.ts with pure functions
- Extract output formatting into src/ai/output-formatters.ts
- Extract progress management into src/ai/progress-manager.ts
- Add audit-logger.ts with Null Object pattern for optional logging
- Add shared utilities: formatting.ts, file-io.ts, functional.ts
- Consolidate getPromptNameForAgent into src/types/agents.ts

* feat: add Claude Code custom commands for debug and review

* feat: add Temporal integration foundation (phase 1-2)
- Add Temporal SDK dependencies (@temporalio/client, worker, workflow, activity)
- Add shared types for pipeline state, metrics, and progress queries
- Add classifyErrorForTemporal() for retry behavior classification
- Add docker-compose for Temporal server with SQLite persistence

* feat: add Temporal activities for agent execution (phase 3)
- Add activities.ts with heartbeat loop, git checkpoint/rollback, and error classification
- Export runClaudePrompt, validateAgentOutput, ClaudePromptResult for Temporal use
- Track attempt number via Temporal Context for accurate audit logging
- Rollback git workspace before retry to ensure clean state

* feat: add Temporal workflow for 5-phase pipeline orchestration (phase 4)

* feat: add Temporal worker, client, and query tools (phase 5)
- Add worker.ts with workflow bundling and graceful shutdown
- Add client.ts CLI to start pipelines with progress polling
- Add query.ts CLI to inspect running workflow state
- Fix buffer overflow by truncating error messages and stack traces
- Skip git operations gracefully on non-git repositories
- Add kill.sh/start.sh dev scripts and Dockerfile.worker

* feat: fix Docker worker container setup
- Install uv instead of deprecated uvx package
- Add mcp-server and configs directories to container
- Mount target repo dynamically via TARGET_REPO env variable

* fix: add report assembly step to Temporal workflow
- Add assembleReportActivity to concatenate exploitation evidence files before report agent runs
- Call assembleFinalReport in workflow Phase 5 before runReportAgent
- Ensure deliverables directory exists before writing final report
- Simplify pipeline-testing report prompt to just prepend header

* refactor: consolidate Docker setup to root docker-compose.yml

* feat: improve Temporal client UX and env handling
- Change default to fire-and-forget (--wait flag to opt-in)
- Add splash screen and improve console output formatting
- Add .env to gitignore, remove from dockerignore for container access
- Add Taskfile for common development commands

* refactor: simplify session ID handling and improve Taskfile options
- Include hostname in workflow ID for better audit log organization
- Extract sanitizeHostname utility to audit/utils.ts for reuse
- Remove unused generateSessionLogPath and buildLogFilePath functions
- Simplify Taskfile with CONFIG/OUTPUT/CLEAN named parameters

* chore: add .env.example and simplify .gitignore

* docs: update README and CLAUDE.md for Temporal workflow usage
- Replace Docker CLI instructions with Task-based commands
- Add monitoring/stopping sections and workflow examples
- Document Temporal orchestration layer and troubleshooting
- Simplify file structure to key files overview

* refactor: replace Taskfile with bash CLI script
- Add shannon bash script with start/logs/query/stop/help commands
- Remove Taskfile.yml dependency (no longer requires Task installation)
- Update README.md and CLAUDE.md to use ./shannon commands
- Update client.ts output to show ./shannon commands

* docs: fix deliverable filename in README

* refactor: remove direct CLI and .shannon-store.json in favor of Temporal
- Delete src/shannon.ts direct CLI entry point (Temporal is now the only mode)
- Remove .shannon-store.json session lock (Temporal handles workflow deduplication)
- Remove broken scripts/export-metrics.js (imported non-existent function)
- Update package.json to remove main, start script, and bin entry
- Clean up CLAUDE.md and debug.md to remove obsolete references

* chore: remove licensing comments from prompt files to prevent leaking into actual prompts

* fix: resolve parallel workflow race conditions and retry logic bugs
- Fix save_deliverable race condition using closure pattern instead of global variable
- Fix error classification order so OutputValidationError matches before generic validation
- Fix ApplicationFailure re-classification bug by checking instanceof before re-throwing
- Add per-error-type retry limits (3 for output validation, 50 for billing)
- Add fast retry intervals for pipeline testing mode (10s vs 5min)
- Increase worker concurrent activities to 25 for parallel workflows

* refactor: pipeline vuln→exploit workflow for parallel execution
- Replace sync barrier between vuln/exploit phases with independent pipelines
- Each vuln type runs: vuln agent → queue check → conditional exploit
- Add checkExploitationQueue activity to skip exploits when no vulns found
- Use Promise.allSettled for graceful failure handling across pipelines
- Add PipelineSummary type for aggregated cost/duration/turns metrics

* fix: re-throw retryable errors in checkExploitationQueue

* fix: detect and retry on Claude Code spending cap errors
- Add spending cap pattern detection in detectApiError() with retryable error
- Add matching patterns to classifyErrorForTemporal() for proper Temporal retry
- Add defense-in-depth safeguard in runClaudePrompt() for $0 cost / low turn detection
- Add final sanity check in activities before declaring success

* fix: increase heartbeat timeout to prevent false worker-dead detection

Original 30s timeout was from POC spec assuming <5min activities. With hour-long activities and multiple concurrent workflows sharing one worker, resource contention causes event loop stalls exceeding 30s, triggering false heartbeat timeouts. Increased to 10min (prod) and 5min (testing).

* fix: temporal db init

* fix: persist home dir

* feat: add per-workflow unified logging with ./shannon logs ID=<workflow-id>
- Add WorkflowLogger class for human-readable, per-workflow log files
- Create workflow.log in audit-logs/{workflowId}/ with phase, agent, tool, and LLM events
- Update ./shannon logs to require ID param and tail specific workflow log
- Add phase transition logging at workflow boundaries
- Include workflow completion summary with agent breakdown (duration, cost)
- Mount audit-logs volume in docker-compose for host access

---------

Co-authored-by: ezl-keygraph <ezhil@keygraph.io>
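
The "independent pipelines" entry in the message above (vuln agent → queue check → conditional exploit, joined with Promise.allSettled) can be illustrated with a minimal sketch. This is not the workflow code from this commit: `runVulnAgent`, `checkQueue`, `runExploitAgent`, and the `VulnType` values are hypothetical stand-ins, and the real Temporal activities are not shown here.

```typescript
// Hypothetical stand-ins for the real activities; names and behavior are assumptions.
type VulnType = 'injection' | 'xss' | 'auth';

interface PipelineOutcome {
  vulnType: VulnType;
  exploited: boolean;
}

// Simulated vuln agent: fails for 'auth' to demonstrate failure isolation.
async function runVulnAgent(vulnType: VulnType): Promise<void> {
  if (vulnType === 'auth') throw new Error('vuln agent failed for auth');
}

// Simulated queue check: only 'injection' has findings worth exploiting.
async function checkQueue(vulnType: VulnType): Promise<boolean> {
  return vulnType === 'injection';
}

// Simulated exploit agent: no-op in this sketch.
async function runExploitAgent(_vulnType: VulnType): Promise<void> {}

// One pipeline per vuln type: vuln agent, then queue check, then a conditional exploit.
async function runPipeline(vulnType: VulnType): Promise<PipelineOutcome> {
  await runVulnAgent(vulnType);
  const shouldExploit = await checkQueue(vulnType);
  if (shouldExploit) await runExploitAgent(vulnType);
  return { vulnType, exploited: shouldExploit };
}

// Run all pipelines concurrently; a rejection in one does not cancel the others.
async function runAllPipelines(
  vulnTypes: VulnType[]
): Promise<{ completed: PipelineOutcome[]; failed: string[] }> {
  const settled = await Promise.allSettled(vulnTypes.map((v) => runPipeline(v)));
  const completed: PipelineOutcome[] = [];
  const failed: string[] = [];
  settled.forEach((s, i) => {
    if (s.status === 'fulfilled') {
      completed.push(s.value);
    } else {
      const reason = s.reason instanceof Error ? s.reason.message : String(s.reason);
      failed.push(`${vulnTypes[i]}: ${reason}`);
    }
  });
  return { completed, failed };
}
```

Unlike Promise.all, which rejects as soon as any pipeline fails, Promise.allSettled waits for every pipeline and reports each outcome, so one failing vuln type does not discard the results of the others.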
2026-01-15 10:36:11 -08:00
parent 45acb16711
commit 51e621d0d5
77 changed files with 6117 additions and 2417 deletions
@@ -7,7 +7,7 @@
 import { $, fs, path } from 'zx';
 import chalk from 'chalk';
 import { Timer } from '../utils/metrics.js';
-import { formatDuration } from '../audit/utils.js';
+import { formatDuration } from '../utils/formatting.js';
 import { handleToolError, PentestError } from '../error-handling.js';
 import { AGENTS } from '../session-manager.js';
 import { runClaudePromptWithRetry } from '../ai/claude-executor.js';
@@ -40,11 +40,17 @@ interface PromptVariables {
  repoPath: string;
 }

+// Discriminated union for Wave1 tool results - clearer than loose union types
+type Wave1ToolResult =
+  | { kind: 'scan'; result: TerminalScanResult }
+  | { kind: 'skipped'; message: string }
+  | { kind: 'agent'; result: AgentResult };
+
 interface Wave1Results {
-  nmap: TerminalScanResult | string | AgentResult;
-  subfinder: TerminalScanResult | string | AgentResult;
-  whatweb: TerminalScanResult | string | AgentResult;
-  naabu?: TerminalScanResult | string | AgentResult;
+  nmap: Wave1ToolResult;
+  subfinder: Wave1ToolResult;
+  whatweb: Wave1ToolResult;
+  naabu?: Wave1ToolResult;
  codeAnalysis: AgentResult;
 }

@@ -57,7 +63,7 @@ interface PreReconResult {
  report: string;
 }

-// Pure function: Run terminal scanning tools
+// Runs external security tools (nmap, whatweb, etc). Schemathesis requires schemas from code analysis.
 async function runTerminalScan(tool: ToolName, target: string, sourceDir: string | null = null): Promise<TerminalScanResult> {
  const timer = new Timer(`command-${tool}`);
  try {
@@ -89,7 +95,7 @@ async function runTerminalScan(tool: ToolName, target: string, sourceDir: string
        return { tool: 'whatweb', output: result.stdout, status: 'success', duration: whatwebDuration };
      }
      case 'schemathesis': {
-        // Only run if API schemas found
+        // Schemathesis depends on code analysis output - skip if no schemas found
        const schemasDir = path.join(sourceDir || '.', 'outputs', 'schemas');
        if (await fs.pathExists(schemasDir)) {
          const schemaFiles = await fs.readdir(schemasDir) as string[];
@@ -146,6 +152,8 @@ async function runPreReconWave1(

  const operations: Promise<TerminalScanResult | AgentResult>[] = [];

+  const skippedResult = (message: string): Wave1ToolResult => ({ kind: 'skipped', message });
+
  // Skip external commands in pipeline testing mode
  if (pipelineTestingMode) {
    console.log(chalk.gray('    ⏭️ Skipping external tools (pipeline testing mode)'));
@@ -163,9 +171,9 @@ async function runPreReconWave1(
    );
    const [codeAnalysis] = await Promise.all(operations);
    return {
-      nmap: 'Skipped (pipeline testing mode)',
-      subfinder: 'Skipped (pipeline testing mode)',
-      whatweb: 'Skipped (pipeline testing mode)',
+      nmap: skippedResult('Skipped (pipeline testing mode)'),
+      subfinder: skippedResult('Skipped (pipeline testing mode)'),
+      whatweb: skippedResult('Skipped (pipeline testing mode)'),
      codeAnalysis: codeAnalysis as AgentResult
    };
  } else {
@@ -192,9 +200,9 @@ async function runPreReconWave1(
  const [nmap, subfinder, whatweb, codeAnalysis] = await Promise.all(operations);

  return {
-    nmap: nmap as TerminalScanResult,
-    subfinder: subfinder as TerminalScanResult,
-    whatweb: whatweb as TerminalScanResult,
+    nmap: { kind: 'scan', result: nmap as TerminalScanResult },
+    subfinder: { kind: 'scan', result: subfinder as TerminalScanResult },
+    whatweb: { kind: 'scan', result: whatweb as TerminalScanResult },
    codeAnalysis: codeAnalysis as AgentResult
  };
 }
@@ -250,17 +258,21 @@ async function runPreReconWave2(
  return response;
 }

-// Helper type for stitching results
-interface StitchableResult {
-  status?: string;
-  output?: string;
-  tool?: string;
+// Extracts status and output from a Wave1 tool result
+function extractResult(r: Wave1ToolResult | undefined): { status: string; output: string } {
+  if (!r) return { status: 'Skipped', output: 'No output' };
+  switch (r.kind) {
+    case 'scan':
+      return { status: r.result.status || 'Skipped', output: r.result.output || 'No output' };
+    case 'skipped':
+      return { status: 'Skipped', output: r.message };
+    case 'agent':
+      return { status: r.result.success ? 'success' : 'error', output: 'See agent output' };
+  }
 }

-// Pure function: Stitch together pre-recon outputs and save to file
-async function stitchPreReconOutputs(outputs: (StitchableResult | string | undefined)[], sourceDir: string): Promise<string> {
-  const [nmap, subfinder, whatweb, naabu, codeAnalysis, ...additionalScans] = outputs;
-
+// Combines tool outputs into single deliverable. Falls back to reference if file missing.
+async function stitchPreReconOutputs(wave1: Wave1Results, additionalScans: TerminalScanResult[], sourceDir: string): Promise<string> {
  // Try to read the code analysis deliverable file
  let codeAnalysisContent = 'No analysis available';
  try {
@@ -269,62 +281,45 @@ async function stitchPreReconOutputs(outputs: (StitchableResult | string | undef
  } catch (error) {
    const err = error as Error;
    console.log(chalk.yellow(`⚠️ Could not read code analysis deliverable: ${err.message}`));
-    // Fallback message if file doesn't exist
    codeAnalysisContent = 'Analysis located in deliverables/code_analysis_deliverable.md';
  }

-
  // Build additional scans section
  let additionalSection = '';
-  if (additionalScans && additionalScans.length > 0) {
+  if (additionalScans.length > 0) {
    additionalSection = '\n## Authenticated Scans\n';
-    additionalScans.forEach(scan => {
-      const s = scan as StitchableResult;
-      if (s && s.tool) {
-        additionalSection += `
-### ${s.tool.toUpperCase()}
-Status: ${s.status}
-${s.output}
+    for (const scan of additionalScans) {
+      additionalSection += `
+### ${scan.tool.toUpperCase()}
+Status: ${scan.status}
+${scan.output}
 `;
-      }
-    });
+    }
  }

-  const nmapResult = nmap as StitchableResult | string | undefined;
-  const subfinderResult = subfinder as StitchableResult | string | undefined;
-  const whatwebResult = whatweb as StitchableResult | string | undefined;
-  const naabuResult = naabu as StitchableResult | string | undefined;
-
-  const getStatus = (r: StitchableResult | string | undefined): string => {
-    if (!r) return 'Skipped';
-    if (typeof r === 'string') return 'Skipped';
-    return r.status || 'Skipped';
-  };
-
-  const getOutput = (r: StitchableResult | string | undefined): string => {
-    if (!r) return 'No output';
-    if (typeof r === 'string') return r;
-    return r.output || 'No output';
-  };
+  const nmap = extractResult(wave1.nmap);
+  const subfinder = extractResult(wave1.subfinder);
+  const whatweb = extractResult(wave1.whatweb);
+  const naabu = extractResult(wave1.naabu);

  const report = `
 # Pre-Reconnaissance Report

 ## Port Discovery (naabu)
-Status: ${getStatus(naabuResult)}
-${getOutput(naabuResult)}
+Status: ${naabu.status}
+${naabu.output}

 ## Network Scanning (nmap)
-Status: ${getStatus(nmapResult)}
-${getOutput(nmapResult)}
+Status: ${nmap.status}
+${nmap.output}

 ## Subdomain Discovery (subfinder)
-Status: ${getStatus(subfinderResult)}
-${getOutput(subfinderResult)}
+Status: ${subfinder.status}
+${subfinder.output}

 ## Technology Detection (whatweb)
-Status: ${getStatus(whatwebResult)}
-${getOutput(whatwebResult)}
+Status: ${whatweb.status}
+${whatweb.output}
 ## Code Analysis
 ${codeAnalysisContent}
 ${additionalSection}
@@ -375,16 +370,8 @@ export async function executePreReconPhase(
  console.log(chalk.green('  ✅ Wave 2 operations completed'));

  console.log(chalk.blue('📝 Stitching pre-recon outputs...'));
-  // Combine wave 1 and wave 2 results for stitching
-  const allResults: (StitchableResult | string | undefined)[] = [
-    wave1Results.nmap as StitchableResult | string,
-    wave1Results.subfinder as StitchableResult | string,
-    wave1Results.whatweb as StitchableResult | string,
-    wave1Results.naabu as StitchableResult | string | undefined,
-    wave1Results.codeAnalysis as unknown as StitchableResult,
-    ...(wave2Results.schemathesis ? [wave2Results.schemathesis as StitchableResult] : [])
-  ];
-  const preReconReport = await stitchPreReconOutputs(allResults, sourceDir);
+  const additionalScans = wave2Results.schemathesis ? [wave2Results.schemathesis] : [];
+  const preReconReport = await stitchPreReconOutputs(wave1Results, additionalScans, sourceDir);
  const duration = timer.stop();

  console.log(chalk.green(`✅ Pre-reconnaissance complete in ${formatDuration(duration)}`));
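
The `Wave1ToolResult` change in this diff replaces a loose `TerminalScanResult | string | AgentResult` union with a discriminated union keyed on `kind`. A minimal standalone sketch of the same pattern follows. The `ScanResult` and `AgentOutcome` shapes are simplified assumptions rather than the real `TerminalScanResult` and `AgentResult` types, and the `assertNever` helper is an optional addition that is not part of this commit.

```typescript
// Simplified stand-ins for TerminalScanResult and AgentResult (assumed shapes).
interface ScanResult {
  tool: string;
  status: string;
  output: string;
}

interface AgentOutcome {
  success: boolean;
}

// Each variant carries a literal `kind` tag that the compiler can narrow on.
type ToolResult =
  | { kind: 'scan'; result: ScanResult }
  | { kind: 'skipped'; message: string }
  | { kind: 'agent'; result: AgentOutcome };

// Hypothetical helper: fails to compile if a new variant is added but not handled.
function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

// Mirrors the shape of extractResult in the diff: map every variant to status/output.
function summarize(r: ToolResult | undefined): { status: string; output: string } {
  if (!r) return { status: 'Skipped', output: 'No output' };
  switch (r.kind) {
    case 'scan':
      return { status: r.result.status || 'Skipped', output: r.result.output || 'No output' };
    case 'skipped':
      return { status: 'Skipped', output: r.message };
    case 'agent':
      return { status: r.result.success ? 'success' : 'error', output: 'See agent output' };
    default:
      return assertNever(r);
  }
}
```

Compared with the removed `typeof r === 'string'` checks, the tag makes each case explicit, and the `never` check turns a forgotten variant into a compile-time error rather than a silent fallback.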