Feat/temporal (#46)

* refactor: modularize claude-executor and extract shared utilities
- Extract message handling into src/ai/message-handlers.ts with pure functions
- Extract output formatting into src/ai/output-formatters.ts
- Extract progress management into src/ai/progress-manager.ts
- Add audit-logger.ts with Null Object pattern for optional logging
- Add shared utilities: formatting.ts, file-io.ts, functional.ts
- Consolidate getPromptNameForAgent into src/types/agents.ts

* feat: add Claude Code custom commands for debug and review

* feat: add Temporal integration foundation (phase 1-2)
- Add Temporal SDK dependencies (@temporalio/client, worker, workflow, activity)
- Add shared types for pipeline state, metrics, and progress queries
- Add classifyErrorForTemporal() for retry behavior classification
- Add docker-compose for Temporal server with SQLite persistence

* feat: add Temporal activities for agent execution (phase 3)
- Add activities.ts with heartbeat loop, git checkpoint/rollback, and error classification
- Export runClaudePrompt, validateAgentOutput, ClaudePromptResult for Temporal use
- Track attempt number via Temporal Context for accurate audit logging
- Rollback git workspace before retry to ensure clean state

* feat: add Temporal workflow for 5-phase pipeline orchestration (phase 4)

* feat: add Temporal worker, client, and query tools (phase 5)
- Add worker.ts with workflow bundling and graceful shutdown
- Add client.ts CLI to start pipelines with progress polling
- Add query.ts CLI to inspect running workflow state
- Fix buffer overflow by truncating error messages and stack traces
- Skip git operations gracefully on non-git repositories
- Add kill.sh/start.sh dev scripts and Dockerfile.worker

* feat: fix Docker worker container setup
- Install uv instead of deprecated uvx package
- Add mcp-server and configs directories to container
- Mount target repo dynamically via TARGET_REPO env variable

* fix: add report assembly step to Temporal workflow
- Add assembleReportActivity to concatenate exploitation evidence files before report agent runs
- Call assembleFinalReport in workflow Phase 5 before runReportAgent
- Ensure deliverables directory exists before writing final report
- Simplify pipeline-testing report prompt to just prepend header

* refactor: consolidate Docker setup to root docker-compose.yml

* feat: improve Temporal client UX and env handling
- Change default to fire-and-forget (--wait flag to opt-in)
- Add splash screen and improve console output formatting
- Add .env to gitignore, remove from dockerignore for container access
- Add Taskfile for common development commands

* refactor: simplify session ID handling and improve Taskfile options
- Include hostname in workflow ID for better audit log organization
- Extract sanitizeHostname utility to audit/utils.ts for reuse
- Remove unused generateSessionLogPath and buildLogFilePath functions
- Simplify Taskfile with CONFIG/OUTPUT/CLEAN named parameters

* chore: add .env.example and simplify .gitignore

* docs: update README and CLAUDE.md for Temporal workflow usage
- Replace Docker CLI instructions with Task-based commands
- Add monitoring/stopping sections and workflow examples
- Document Temporal orchestration layer and troubleshooting
- Simplify file structure to key files overview

* refactor: replace Taskfile with bash CLI script
- Add shannon bash script with start/logs/query/stop/help commands
- Remove Taskfile.yml dependency (no longer requires Task installation)
- Update README.md and CLAUDE.md to use ./shannon commands
- Update client.ts output to show ./shannon commands

* docs: fix deliverable filename in README

* refactor: remove direct CLI and .shannon-store.json in favor of Temporal
- Delete src/shannon.ts direct CLI entry point (Temporal is now the only mode)
- Remove .shannon-store.json session lock (Temporal handles workflow deduplication)
- Remove broken scripts/export-metrics.js (imported non-existent function)
- Update package.json to remove main, start script, and bin entry
- Clean up CLAUDE.md and debug.md to remove obsolete references

* chore: remove licensing comments from prompt files to prevent leaking into actual prompts

* fix: resolve parallel workflow race conditions and retry logic bugs
- Fix save_deliverable race condition using closure pattern instead of global variable
- Fix error classification order so OutputValidationError matches before generic validation
- Fix ApplicationFailure re-classification bug by checking instanceof before re-throwing
- Add per-error-type retry limits (3 for output validation, 50 for billing)
- Add fast retry intervals for pipeline testing mode (10s vs 5min)
- Increase worker concurrent activities to 25 for parallel workflows

* refactor: pipeline vuln→exploit workflow for parallel execution
- Replace sync barrier between vuln/exploit phases with independent pipelines
- Each vuln type runs: vuln agent → queue check → conditional exploit
- Add checkExploitationQueue activity to skip exploits when no vulns found
- Use Promise.allSettled for graceful failure handling across pipelines
- Add PipelineSummary type for aggregated cost/duration/turns metrics

* fix: re-throw retryable errors in checkExploitationQueue

* fix: detect and retry on Claude Code spending cap errors
- Add spending cap pattern detection in detectApiError() with retryable error
- Add matching patterns to classifyErrorForTemporal() for proper Temporal retry
- Add defense-in-depth safeguard in runClaudePrompt() for $0 cost / low turn detection
- Add final sanity check in activities before declaring success

* fix: increase heartbeat timeout to prevent false worker-dead detection

Original 30s timeout was from POC spec assuming <5min activities. With hour-long activities and multiple concurrent workflows sharing one worker, resource contention causes event loop stalls exceeding 30s, triggering false heartbeat timeouts. Increased to 10min (prod) and 5min (testing).

* fix: temporal db init

* fix: persist home dir

* feat: add per-workflow unified logging with ./shannon logs ID=<workflow-id>
- Add WorkflowLogger class for human-readable, per-workflow log files
- Create workflow.log in audit-logs/{workflowId}/ with phase, agent, tool, and LLM events
- Update ./shannon logs to require ID param and tail specific workflow log
- Add phase transition logging at workflow boundaries
- Include workflow completion summary with agent breakdown (duration, cost)
- Mount audit-logs volume in docker-compose for host access

---------

Co-authored-by: ezl-keygraph <ezhil@keygraph.io>
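
Review note: the "independent pipelines" shape from the vuln→exploit refactor above can be sketched roughly as follows. This is a minimal illustration, not the workflow code in this commit. The `runVulnAgent`, `shouldExploit`, and `runExploitAgent` callbacks are hypothetical stand-ins for the real Temporal activities, and `RunSummary` is an illustrative type rather than the `PipelineSummary` added here. Each vulnerability type proceeds on its own, and `Promise.allSettled` keeps one failing pipeline from rejecting the others.

```typescript
// Hypothetical names throughout; the callbacks stand in for Temporal activities.
type VulnType = 'injection' | 'xss' | 'auth' | 'ssrf' | 'authz';

interface PipelineSteps {
  runVulnAgent(vulnType: VulnType): Promise<void>;
  shouldExploit(vulnType: VulnType): Promise<boolean>;
  runExploitAgent(vulnType: VulnType): Promise<void>;
}

interface PipelineOutcome {
  vulnType: VulnType;
  exploited: boolean;
}

interface RunSummary {
  completed: PipelineOutcome[];
  failed: { vulnType: VulnType; reason: string }[];
}

// One pipeline: vuln agent -> queue check -> conditional exploit.
async function runPipeline(vulnType: VulnType, steps: PipelineSteps): Promise<PipelineOutcome> {
  await steps.runVulnAgent(vulnType);
  if (!(await steps.shouldExploit(vulnType))) {
    return { vulnType, exploited: false };
  }
  await steps.runExploitAgent(vulnType);
  return { vulnType, exploited: true };
}

// Run every pipeline concurrently; a rejection in one does not cancel the rest.
async function runAllPipelines(vulnTypes: VulnType[], steps: PipelineSteps): Promise<RunSummary> {
  const settled = await Promise.allSettled(vulnTypes.map((v) => runPipeline(v, steps)));
  const summary: RunSummary = { completed: [], failed: [] };
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      summary.completed.push(result.value);
    } else {
      const reason = result.reason instanceof Error ? result.reason.message : String(result.reason);
      // allSettled preserves input order, so index i maps back to vulnTypes[i].
      summary.failed.push({ vulnType: vulnTypes[i] as VulnType, reason });
    }
  });
  return summary;
}
```

Compared with a sync barrier, a slow or failing vuln agent only delays its own exploit step; the other pipelines run to completion and are reported in `completed`.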
2026-01-15 10:36:11 -08:00
parent 45acb16711
commit 51e621d0d5
77 changed files with 6117 additions and 2417 deletions
@@ -0,0 +1,469 @@
+// Copyright (C) 2025 Keygraph, Inc.
+//
+// This program is free software: you can redistribute it and/or modify
+// it under the terms of the GNU Affero General Public License version 3
+// as published by the Free Software Foundation.
+
+/**
+ * Temporal activities for Shannon agent execution.
+ *
+ * Each activity wraps a single agent execution with:
+ * - Heartbeat loop (2s interval) to signal worker liveness
+ * - Git checkpoint/rollback/commit per attempt
+ * - Error classification for Temporal retry behavior
+ * - Audit session logging
+ *
+ * Temporal handles retries based on error classification:
+ * - Retryable: BillingError, TransientError (429, 5xx, network)
+ * - Non-retryable: AuthenticationError, PermissionError, ConfigurationError, etc.
+ */
+
+import { heartbeat, ApplicationFailure, Context } from '@temporalio/activity';
+import chalk from 'chalk';
+
+// Max lengths to prevent Temporal protobuf buffer overflow
+const MAX_ERROR_MESSAGE_LENGTH = 2000;
+const MAX_STACK_TRACE_LENGTH = 1000;
+
+// Max attempts for output validation errors (agent didn't save deliverables); compared against Temporal's 1-based attempt number
+// Lower than default 50 since this is unlikely to self-heal
+const MAX_OUTPUT_VALIDATION_RETRIES = 3;
+
+/**
+ * Truncate error message to prevent buffer overflow in Temporal serialization.
+ */
+function truncateErrorMessage(message: string): string {
+  if (message.length <= MAX_ERROR_MESSAGE_LENGTH) {
+    return message;
+  }
+  return message.slice(0, MAX_ERROR_MESSAGE_LENGTH - 20) + '\n[truncated]';
+}
+
+/**
+ * Truncate stack trace on an ApplicationFailure to prevent buffer overflow.
+ */
+function truncateStackTrace(failure: ApplicationFailure): void {
+  if (failure.stack && failure.stack.length > MAX_STACK_TRACE_LENGTH) {
+    failure.stack = failure.stack.slice(0, MAX_STACK_TRACE_LENGTH) + '\n[stack truncated]';
+  }
+}
+
+import {
+  runClaudePrompt,
+  validateAgentOutput,
+  type ClaudePromptResult,
+} from '../ai/claude-executor.js';
+import { loadPrompt } from '../prompts/prompt-manager.js';
+import { parseConfig, distributeConfig } from '../config-parser.js';
+import { classifyErrorForTemporal } from '../error-handling.js';
+import {
+  safeValidateQueueAndDeliverable,
+  type VulnType,
+  type ExploitationDecision,
+} from '../queue-validation.js';
+import {
+  createGitCheckpoint,
+  commitGitSuccess,
+  rollbackGitWorkspace,
+  getGitCommitHash,
+} from '../utils/git-manager.js';
+import { assembleFinalReport } from '../phases/reporting.js';
+import { getPromptNameForAgent } from '../types/agents.js';
+import { AuditSession } from '../audit/index.js';
+import type { WorkflowSummary } from '../audit/workflow-logger.js';
+import type { AgentName } from '../types/agents.js';
+import type { AgentMetrics } from './shared.js';
+import type { DistributedConfig } from '../types/config.js';
+import type { SessionMetadata } from '../audit/utils.js';
+
+const HEARTBEAT_INTERVAL_MS = 2000; // Must be < heartbeatTimeout (10min production, 5min testing)
+
+/**
+ * Input for all agent activities.
+ * Matches PipelineInput but with required workflowId for audit correlation.
+ */
+export interface ActivityInput {
+  webUrl: string;
+  repoPath: string;
+  configPath?: string;
+  outputPath?: string;
+  pipelineTestingMode?: boolean;
+  workflowId: string;
+}
+
+/**
+ * Core activity implementation.
+ *
+ * Executes a single agent with:
+ * 1. Heartbeat loop for worker liveness
+ * 2. Config loading (if configPath provided)
+ * 3. Audit session initialization
+ * 4. Prompt loading
+ * 5. Git checkpoint before execution
+ * 6. Agent execution (single attempt)
+ * 7. Output validation
+ * 8. Git commit on success, rollback on failure
+ * 9. Error classification for Temporal retry
+ */
+async function runAgentActivity(
+  agentName: AgentName,
+  input: ActivityInput
+): Promise<AgentMetrics> {
+  const {
+    webUrl,
+    repoPath,
+    configPath,
+    outputPath,
+    pipelineTestingMode = false,
+    workflowId,
+  } = input;
+
+  const startTime = Date.now();
+
+  // Get attempt number from Temporal context (tracks retries automatically)
+  const attemptNumber = Context.current().info.attempt;
+
+  // Heartbeat loop - signals worker is alive to Temporal server
+  const heartbeatInterval = setInterval(() => {
+    const elapsed = Math.floor((Date.now() - startTime) / 1000);
+    heartbeat({ agent: agentName, elapsedSeconds: elapsed, attempt: attemptNumber });
+  }, HEARTBEAT_INTERVAL_MS);
+
+  try {
+    // 1. Load config (if provided)
+    let distributedConfig: DistributedConfig | null = null;
+    if (configPath) {
+      try {
+        const config = await parseConfig(configPath);
+        distributedConfig = distributeConfig(config);
+      } catch (err) {
+        throw new Error(`Failed to load config ${configPath}: ${err instanceof Error ? err.message : String(err)}`);
+      }
+    }
+
+    // 2. Build session metadata for audit
+    const sessionMetadata: SessionMetadata = {
+      id: workflowId,
+      webUrl,
+      repoPath,
+      ...(outputPath && { outputPath }),
+    };
+
+    // 3. Initialize audit session (idempotent, safe across retries)
+    const auditSession = new AuditSession(sessionMetadata);
+    await auditSession.initialize();
+
+    // 4. Load prompt
+    const promptName = getPromptNameForAgent(agentName);
+    const prompt = await loadPrompt(
+      promptName,
+      { webUrl, repoPath },
+      distributedConfig,
+      pipelineTestingMode
+    );
+
+    // 5. Create git checkpoint before execution
+    await createGitCheckpoint(repoPath, agentName, attemptNumber);
+    await auditSession.startAgent(agentName, prompt, attemptNumber);
+
+    // 6. Execute agent (single attempt - Temporal handles retries)
+    const result: ClaudePromptResult = await runClaudePrompt(
+      prompt,
+      repoPath,
+      '', // context
+      agentName, // description
+      agentName,
+      chalk.cyan,
+      sessionMetadata,
+      auditSession,
+      attemptNumber
+    );
+
+    // 6.5. Sanity check: Detect spending cap that slipped through all detection layers
+    // Defense-in-depth: A successful agent execution should never have ≤2 turns with $0 cost
+    if (result.success && (result.turns ?? 0) <= 2 && (result.cost || 0) === 0) {
+      const resultText = result.result || '';
+      const looksLikeBillingError = /spending|cap|limit|budget|resets/i.test(resultText);
+
+      if (looksLikeBillingError) {
+        await rollbackGitWorkspace(repoPath, 'spending cap detected');
+        await auditSession.endAgent(agentName, {
+          attemptNumber,
+          duration_ms: result.duration,
+          cost_usd: 0,
+          success: false,
+          error: `Spending cap likely reached: ${resultText.slice(0, 100)}`,
+        });
+        // Thrown as a plain Error: the catch block below relies on classifyErrorForTemporal() to treat it as a retryable billing error
+        throw new Error(`Spending cap likely reached: ${resultText.slice(0, 100)}`);
+      }
+    }
+
+    // 7. Handle execution failure
+    if (!result.success) {
+      await rollbackGitWorkspace(repoPath, 'execution failure');
+      await auditSession.endAgent(agentName, {
+        attemptNumber,
+        duration_ms: result.duration,
+        cost_usd: result.cost || 0,
+        success: false,
+        error: result.error || 'Execution failed',
+      });
+      throw new Error(result.error || 'Agent execution failed');
+    }
+
+    // 8. Validate output
+    const validationPassed = await validateAgentOutput(result, agentName, repoPath);
+    if (!validationPassed) {
+      await rollbackGitWorkspace(repoPath, 'validation failure');
+      await auditSession.endAgent(agentName, {
+        attemptNumber,
+        duration_ms: result.duration,
+        cost_usd: result.cost || 0,
+        success: false,
+        error: 'Output validation failed',
+      });
+
+      // Limit output validation retries (unlikely to self-heal)
+      if (attemptNumber >= MAX_OUTPUT_VALIDATION_RETRIES) {
+        throw ApplicationFailure.nonRetryable(
+          `Agent ${agentName} failed output validation after ${attemptNumber} attempts`,
+          'OutputValidationError',
+          [{ agentName, attemptNumber, elapsed: Date.now() - startTime }]
+        );
+      }
+      // Let Temporal retry (will be classified as OutputValidationError)
+      throw new Error(`Agent ${agentName} failed output validation`);
+    }
+
+    // 9. Success - commit and log
+    const commitHash = await getGitCommitHash(repoPath);
+    await auditSession.endAgent(agentName, {
+      attemptNumber,
+      duration_ms: result.duration,
+      cost_usd: result.cost || 0,
+      success: true,
+      ...(commitHash && { checkpoint: commitHash }),
+    });
+    await commitGitSuccess(repoPath, agentName);
+
+    // 10. Return metrics
+    return {
+      durationMs: Date.now() - startTime,
+      inputTokens: null, // Not currently exposed by SDK wrapper
+      outputTokens: null,
+      costUsd: result.cost ?? null,
+      numTurns: result.turns ?? null,
+    };
+  } catch (error) {
+    // Rollback git workspace before Temporal retry to ensure clean state
+    try {
+      await rollbackGitWorkspace(repoPath, 'error recovery');
+    } catch (rollbackErr) {
+      // Log but don't fail - rollback is best-effort
+      console.error(`Failed to rollback git workspace for ${agentName}:`, rollbackErr);
+    }
+
+    // If error is already an ApplicationFailure (e.g., from our retry limit logic),
+    // re-throw it directly without re-classifying
+    if (error instanceof ApplicationFailure) {
+      throw error;
+    }
+
+    // Classify error for Temporal retry behavior
+    const classified = classifyErrorForTemporal(error);
+    // Truncate message to prevent protobuf buffer overflow
+    const rawMessage = error instanceof Error ? error.message : String(error);
+    const message = truncateErrorMessage(rawMessage);
+
+    if (classified.retryable) {
+      // Temporal will retry with configured backoff
+      const failure = ApplicationFailure.create({
+        message,
+        type: classified.type,
+        details: [{ agentName, attemptNumber, elapsed: Date.now() - startTime }],
+      });
+      truncateStackTrace(failure);
+      throw failure;
+    } else {
+      // Fail immediately - no retry
+      const failure = ApplicationFailure.nonRetryable(message, classified.type, [
+        { agentName, attemptNumber, elapsed: Date.now() - startTime },
+      ]);
+      truncateStackTrace(failure);
+      throw failure;
+    }
+  } finally {
+    clearInterval(heartbeatInterval);
+  }
+}
+
+// === Individual Agent Activity Exports ===
+// Each function is a thin wrapper around runAgentActivity with the agent name.
+
+export async function runPreReconAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('pre-recon', input);
+}
+
+export async function runReconAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('recon', input);
+}
+
+export async function runInjectionVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('injection-vuln', input);
+}
+
+export async function runXssVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('xss-vuln', input);
+}
+
+export async function runAuthVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('auth-vuln', input);
+}
+
+export async function runSsrfVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('ssrf-vuln', input);
+}
+
+export async function runAuthzVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('authz-vuln', input);
+}
+
+export async function runInjectionExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('injection-exploit', input);
+}
+
+export async function runXssExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('xss-exploit', input);
+}
+
+export async function runAuthExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('auth-exploit', input);
+}
+
+export async function runSsrfExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('ssrf-exploit', input);
+}
+
+export async function runAuthzExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('authz-exploit', input);
+}
+
+export async function runReportAgent(input: ActivityInput): Promise<AgentMetrics> {
+  return runAgentActivity('report', input);
+}
+
+/**
+ * Assemble the final report by concatenating exploitation evidence files.
+ * This must be called BEFORE runReportAgent to create the file that the report agent will modify.
+ */
+export async function assembleReportActivity(input: ActivityInput): Promise<void> {
+  const { repoPath } = input;
+  console.log(chalk.blue('📝 Assembling deliverables from specialist agents...'));
+  try {
+    await assembleFinalReport(repoPath);
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    console.log(chalk.yellow(`⚠️ Error assembling final report: ${message}`));
+    // Don't throw - the report agent can still create content even if no exploitation files exist
+  }
+}
+
+/**
+ * Check if exploitation should run for a given vulnerability type.
+ * Reads the vulnerability queue file and returns the decision.
+ *
+ * This activity allows the workflow to skip exploit agents entirely
+ * when no vulnerabilities were found, saving API calls and time.
+ *
+ * Error handling:
+ * - Retryable errors (missing files, invalid JSON): re-throw for Temporal retry
+ * - Non-retryable errors: skip exploitation gracefully
+ */
+export async function checkExploitationQueue(
+  input: ActivityInput,
+  vulnType: VulnType
+): Promise<ExploitationDecision> {
+  const { repoPath } = input;
+
+  const result = await safeValidateQueueAndDeliverable(vulnType, repoPath);
+
+  if (result.success && result.data) {
+    const { shouldExploit, vulnerabilityCount } = result.data;
+    console.log(
+      chalk.blue(
+        `🔍 ${vulnType}: ${shouldExploit ? `${vulnerabilityCount} vulnerabilities found` : 'no vulnerabilities, skipping exploitation'}`
+      )
+    );
+    return result.data;
+  }
+
+  // Validation failed - check if we should retry or skip
+  const error = result.error;
+  if (error?.retryable) {
+    // Re-throw retryable errors so Temporal can retry the vuln agent
+    console.log(chalk.yellow(`⚠️ ${vulnType}: ${error.message} (retrying)`));
+    throw error;
+  }
+
+  // Non-retryable error - skip exploitation gracefully
+  console.log(
+    chalk.yellow(`⚠️ ${vulnType}: ${error?.message ?? 'Unknown error'}, skipping exploitation`)
+  );
+  return {
+    shouldExploit: false,
+    shouldRetry: false,
+    vulnerabilityCount: 0,
+    vulnType,
+  };
+}
+
+/**
+ * Log phase transition to the unified workflow log.
+ * Called at phase boundaries for per-workflow logging.
+ */
+export async function logPhaseTransition(
+  input: ActivityInput,
+  phase: string,
+  event: 'start' | 'complete'
+): Promise<void> {
+  const { webUrl, repoPath, outputPath, workflowId } = input;
+
+  const sessionMetadata: SessionMetadata = {
+    id: workflowId,
+    webUrl,
+    repoPath,
+    ...(outputPath && { outputPath }),
+  };
+
+  const auditSession = new AuditSession(sessionMetadata);
+  await auditSession.initialize();
+
+  if (event === 'start') {
+    await auditSession.logPhaseStart(phase);
+  } else {
+    await auditSession.logPhaseComplete(phase);
+  }
+}
+
+/**
+ * Log workflow completion with full summary to the unified workflow log.
+ * Called at the end of the workflow to write a summary breakdown.
+ */
+export async function logWorkflowComplete(
+  input: ActivityInput,
+  summary: WorkflowSummary
+): Promise<void> {
+  const { webUrl, repoPath, outputPath, workflowId } = input;
+
+  const sessionMetadata: SessionMetadata = {
+    id: workflowId,
+    webUrl,
+    repoPath,
+    ...(outputPath && { outputPath }),
+  };
+
+  const auditSession = new AuditSession(sessionMetadata);
+  await auditSession.initialize();
+  await auditSession.logWorkflowComplete(summary);
+}
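
Review note: the retry handling in `runAgentActivity` comes down to a small decision. Output validation failures are retried only while the attempt count is below a cap, billing and transient errors stay retryable, and authentication-style errors fail immediately. Below is a minimal sketch of that decision as a pure function, under stated assumptions: `decideRetry`, `FailureKind`, and `MAX_OUTPUT_VALIDATION_ATTEMPTS` are hypothetical names, the four kinds are only a subset of the error types listed in the file header, and overall caps (such as the 50 mentioned for billing) are assumed to live in the workflow's retry policy rather than here.

```typescript
// Assumption: attempts are 1-based, matching Context.current().info.attempt.
const MAX_OUTPUT_VALIDATION_ATTEMPTS = 3;

// Hypothetical subset of the error types named in the activity's header comment.
type FailureKind =
  | 'OutputValidationError'
  | 'BillingError'
  | 'TransientError'
  | 'AuthenticationError';

interface RetryDecision {
  type: FailureKind;
  retryable: boolean;
}

// Decide whether a failed attempt should be handed back for another retry.
function decideRetry(kind: FailureKind, attempt: number): RetryDecision {
  switch (kind) {
    case 'OutputValidationError':
      // Unlikely to self-heal, so stop once the cap is reached.
      return { type: kind, retryable: attempt < MAX_OUTPUT_VALIDATION_ATTEMPTS };
    case 'BillingError':
    case 'TransientError':
      // Expected to clear with time (spending cap reset, 429, 5xx, network).
      return { type: kind, retryable: true };
    case 'AuthenticationError':
      // Retrying cannot fix bad credentials.
      return { type: kind, retryable: false };
  }
}
```

With a cap of 3, attempts 1 and 2 are retried and attempt 3 is final, which matches the `attemptNumber >= MAX_OUTPUT_VALIDATION_RETRIES` check in the activity.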