Feat/temporal (#46)

* refactor: modularize claude-executor and extract shared utilities

- Extract message handling into src/ai/message-handlers.ts with pure functions
- Extract output formatting into src/ai/output-formatters.ts
- Extract progress management into src/ai/progress-manager.ts
- Add audit-logger.ts with Null Object pattern for optional logging
- Add shared utilities: formatting.ts, file-io.ts, functional.ts
- Consolidate getPromptNameForAgent into src/types/agents.ts
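
The Null Object pattern named above can be sketched as follows. This is a minimal illustration, not the contents of audit-logger.ts: the `AuditLogger` interface, both logger classes, and `runAgent` are hypothetical names made up for this example.

```typescript
// Hypothetical interface; the real audit-logger.ts API is not shown in this commit.
interface AuditLogger {
  log(event: string): void;
}

// A real logger that records events (kept in memory here for simplicity).
class MemoryAuditLogger implements AuditLogger {
  readonly events: string[] = [];
  log(event: string): void {
    this.events.push(event);
  }
}

// Null Object: same interface, deliberately does nothing.
class NullAuditLogger implements AuditLogger {
  log(_event: string): void {
    // intentionally empty
  }
}

// Callers never check "is logging enabled?"; they always call logger.log().
function runAgent(name: string, logger: AuditLogger = new NullAuditLogger()): string {
  logger.log(`agent:start:${name}`);
  const result = `${name} finished`;
  logger.log(`agent:end:${name}`);
  return result;
}

const memoryLogger = new MemoryAuditLogger();
runAgent('recon', memoryLogger); // events are recorded
runAgent('recon');               // logging silently skipped
```

The point of the design is that optional logging costs the call sites nothing: there are no `if (logger)` branches to forget.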

* feat: add Claude Code custom commands for debug and review

* feat: add Temporal integration foundation (phase 1-2)

- Add Temporal SDK dependencies (@temporalio/client, worker, workflow, activity)
- Add shared types for pipeline state, metrics, and progress queries
- Add classifyErrorForTemporal() for retry behavior classification
- Add docker-compose for Temporal server with SQLite persistence

* feat: add Temporal activities for agent execution (phase 3)

- Add activities.ts with heartbeat loop, git checkpoint/rollback, and error classification
- Export runClaudePrompt, validateAgentOutput, ClaudePromptResult for Temporal use
- Track attempt number via Temporal Context for accurate audit logging
- Rollback git workspace before retry to ensure clean state

* feat: add Temporal workflow for 5-phase pipeline orchestration (phase 4)

* feat: add Temporal worker, client, and query tools (phase 5)

- Add worker.ts with workflow bundling and graceful shutdown
- Add client.ts CLI to start pipelines with progress polling
- Add query.ts CLI to inspect running workflow state
- Fix buffer overflow by truncating error messages and stack traces
- Skip git operations gracefully on non-git repositories
- Add kill.sh/start.sh dev scripts and Dockerfile.worker

* feat: fix Docker worker container setup

- Install uv instead of deprecated uvx package
- Add mcp-server and configs directories to container
- Mount target repo dynamically via TARGET_REPO env variable

* fix: add report assembly step to Temporal workflow

- Add assembleReportActivity to concatenate exploitation evidence files before report agent runs
- Call assembleFinalReport in workflow Phase 5 before runReportAgent
- Ensure deliverables directory exists before writing final report
- Simplify pipeline-testing report prompt to just prepend header

* refactor: consolidate Docker setup to root docker-compose.yml

* feat: improve Temporal client UX and env handling

- Change default to fire-and-forget (--wait flag to opt-in)
- Add splash screen and improve console output formatting
- Add .env to gitignore, remove from dockerignore for container access
- Add Taskfile for common development commands

* refactor: simplify session ID handling and improve Taskfile options

- Include hostname in workflow ID for better audit log organization
- Extract sanitizeHostname utility to audit/utils.ts for reuse
- Remove unused generateSessionLogPath and buildLogFilePath functions
- Simplify Taskfile with CONFIG/OUTPUT/CLEAN named parameters

* chore: add .env.example and simplify .gitignore

* docs: update README and CLAUDE.md for Temporal workflow usage

- Replace Docker CLI instructions with Task-based commands
- Add monitoring/stopping sections and workflow examples
- Document Temporal orchestration layer and troubleshooting
- Simplify file structure to key files overview

* refactor: replace Taskfile with bash CLI script

- Add shannon bash script with start/logs/query/stop/help commands
- Remove Taskfile.yml dependency (no longer requires Task installation)
- Update README.md and CLAUDE.md to use ./shannon commands
- Update client.ts output to show ./shannon commands

* docs: fix deliverable filename in README

* refactor: remove direct CLI and .shannon-store.json in favor of Temporal

- Delete src/shannon.ts direct CLI entry point (Temporal is now the only mode)
- Remove .shannon-store.json session lock (Temporal handles workflow deduplication)
- Remove broken scripts/export-metrics.js (imported non-existent function)
- Update package.json to remove main, start script, and bin entry
- Clean up CLAUDE.md and debug.md to remove obsolete references

* chore: remove licensing comments from prompt files to prevent leaking into actual prompts

* fix: resolve parallel workflow race conditions and retry logic bugs

- Fix save_deliverable race condition using closure pattern instead of global variable
- Fix error classification order so OutputValidationError matches before generic validation
- Fix ApplicationFailure re-classification bug by checking instanceof before re-throwing
- Add per-error-type retry limits (3 for output validation, 50 for billing)
- Add fast retry intervals for pipeline testing mode (10s vs 5min)
- Increase worker concurrent activities to 25 for parallel workflows
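
Why classification order matters can be shown with a small sketch. It assumes errors are classified by matching message patterns; the rule table, the patterns, and `classifyMessage` are hypothetical and are not the project's classifyErrorForTemporal(). Only the retry limits (3 for output validation, 50 for billing) come from the list above; the treatment of generic validation and unknown errors is an assumption.

```typescript
interface RetryRule {
  type: string;
  pattern: RegExp;
  retryable: boolean;
  maxAttempts: number;
}

// Order matters: the specific "output validation" rule must come before the
// generic "validation" rule, otherwise the generic rule would match first.
const RULES: RetryRule[] = [
  { type: 'OutputValidationError', pattern: /output validation/i, retryable: true, maxAttempts: 3 },
  { type: 'BillingError', pattern: /billing|spending cap/i, retryable: true, maxAttempts: 50 },
  // Assumption for this sketch: generic validation failures are not retried.
  { type: 'ValidationError', pattern: /validation/i, retryable: false, maxAttempts: 1 },
];

// Assumption for this sketch: unknown errors get a small retry budget.
const DEFAULT_RULE: RetryRule = { type: 'UnknownError', pattern: /.*/, retryable: true, maxAttempts: 3 };

function classifyMessage(message: string): RetryRule {
  for (const rule of RULES) {
    if (rule.pattern.test(message)) return rule;
  }
  return DEFAULT_RULE;
}
```

With the first and third rules swapped, `classifyMessage('Output validation failed')` would be classified as a generic, non-retryable validation error, which is the kind of bug the ordering fix addresses.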

* refactor: pipeline vuln→exploit workflow for parallel execution

- Replace sync barrier between vuln/exploit phases with independent pipelines
- Each vuln type runs: vuln agent → queue check → conditional exploit
- Add checkExploitationQueue activity to skip exploits when no vulns found
- Use Promise.allSettled for graceful failure handling across pipelines
- Add PipelineSummary type for aggregated cost/duration/turns metrics
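
The independent-pipeline shape described above might look roughly like this. It is a sketch under stated assumptions: `PipelineDeps`, `runPipeline`, `runAllPipelines`, `RunSummary`, and the stubbed `demoDeps` are hypothetical names, and the real workflow calls Temporal activities rather than plain async functions.

```typescript
// Hypothetical dependencies standing in for the real activities.
interface PipelineDeps {
  runVulnAgent(vulnType: string): Promise<void>;
  hasQueuedVulns(vulnType: string): Promise<boolean>;
  runExploitAgent(vulnType: string): Promise<void>;
}

interface PipelineOutcome {
  vulnType: string;
  exploited: boolean;
}

interface RunSummary {
  completed: PipelineOutcome[];
  failed: string[];
}

// One pipeline: vuln agent, then queue check, then conditional exploit.
async function runPipeline(vulnType: string, deps: PipelineDeps): Promise<PipelineOutcome> {
  await deps.runVulnAgent(vulnType);
  if (!(await deps.hasQueuedVulns(vulnType))) {
    return { vulnType, exploited: false }; // nothing queued, skip the exploit
  }
  await deps.runExploitAgent(vulnType);
  return { vulnType, exploited: true };
}

// All pipelines run independently; one failure does not cancel the others.
async function runAllPipelines(vulnTypes: string[], deps: PipelineDeps): Promise<RunSummary> {
  const settled = await Promise.allSettled(vulnTypes.map((t) => runPipeline(t, deps)));
  const summary: RunSummary = { completed: [], failed: [] };
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') summary.completed.push(result.value);
    else summary.failed.push(vulnTypes[i]);
  });
  return summary;
}

// Stubs for illustration: 'xss' crashes, only 'injection' has queued vulns.
const demoDeps: PipelineDeps = {
  runVulnAgent: async (t) => {
    if (t === 'xss') throw new Error('agent crashed');
  },
  hasQueuedVulns: async (t) => t === 'injection',
  runExploitAgent: async () => {},
};
```

Compared with a sync barrier, a slow or failing vuln type no longer delays the exploit step of the others, and `Promise.allSettled` reports every outcome instead of rejecting on the first failure.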

* fix: re-throw retryable errors in checkExploitationQueue

* fix: detect and retry on Claude Code spending cap errors

- Add spending cap pattern detection in detectApiError() with retryable error
- Add matching patterns to classifyErrorForTemporal() for proper Temporal retry
- Add defense-in-depth safeguard in runClaudePrompt() for $0 cost / low turn detection
- Add final sanity check in activities before declaring success

* fix: increase heartbeat timeout to prevent false worker-dead detection

Original 30s timeout was from POC spec assuming <5min activities. With
hour-long activities and multiple concurrent workflows sharing one worker,
resource contention causes event loop stalls exceeding 30s, triggering
false heartbeat timeouts. Increased to 10min (prod) and 5min (testing).

* fix: temporal db init

* fix: persist home dir

* feat: add per-workflow unified logging with ./shannon logs ID=<workflow-id>

- Add WorkflowLogger class for human-readable, per-workflow log files
- Create workflow.log in audit-logs/{workflowId}/ with phase, agent, tool, and LLM events
- Update ./shannon logs to require ID param and tail specific workflow log
- Add phase transition logging at workflow boundaries
- Include workflow completion summary with agent breakdown (duration, cost)
- Mount audit-logs volume in docker-compose for host access

---------

Co-authored-by: ezl-keygraph <ezhil@keygraph.io>
Arjun Malleswaran
2026-01-15 10:36:11 -08:00
committed by GitHub
parent 45acb16711
commit 51e621d0d5
77 changed files with 6117 additions and 2417 deletions
+55 -68
@@ -7,7 +7,7 @@
 import { $, fs, path } from 'zx';
 import chalk from 'chalk';
 import { Timer } from '../utils/metrics.js';
-import { formatDuration } from '../audit/utils.js';
+import { formatDuration } from '../utils/formatting.js';
 import { handleToolError, PentestError } from '../error-handling.js';
 import { AGENTS } from '../session-manager.js';
 import { runClaudePromptWithRetry } from '../ai/claude-executor.js';
@@ -40,11 +40,17 @@ interface PromptVariables {
   repoPath: string;
 }
+// Discriminated union for Wave1 tool results - clearer than loose union types
+type Wave1ToolResult =
+  | { kind: 'scan'; result: TerminalScanResult }
+  | { kind: 'skipped'; message: string }
+  | { kind: 'agent'; result: AgentResult };
 interface Wave1Results {
-  nmap: TerminalScanResult | string | AgentResult;
-  subfinder: TerminalScanResult | string | AgentResult;
-  whatweb: TerminalScanResult | string | AgentResult;
-  naabu?: TerminalScanResult | string | AgentResult;
+  nmap: Wave1ToolResult;
+  subfinder: Wave1ToolResult;
+  whatweb: Wave1ToolResult;
+  naabu?: Wave1ToolResult;
   codeAnalysis: AgentResult;
 }
@@ -57,7 +63,7 @@ interface PreReconResult {
   report: string;
 }
-// Pure function: Run terminal scanning tools
+// Runs external security tools (nmap, whatweb, etc). Schemathesis requires schemas from code analysis.
 async function runTerminalScan(tool: ToolName, target: string, sourceDir: string | null = null): Promise<TerminalScanResult> {
   const timer = new Timer(`command-${tool}`);
   try {
@@ -89,7 +95,7 @@ async function runTerminalScan(tool: ToolName, target: string, sourceDir: string
         return { tool: 'whatweb', output: result.stdout, status: 'success', duration: whatwebDuration };
       }
       case 'schemathesis': {
-        // Only run if API schemas found
+        // Schemathesis depends on code analysis output - skip if no schemas found
         const schemasDir = path.join(sourceDir || '.', 'outputs', 'schemas');
         if (await fs.pathExists(schemasDir)) {
           const schemaFiles = await fs.readdir(schemasDir) as string[];
@@ -146,6 +152,8 @@ async function runPreReconWave1(
   const operations: Promise<TerminalScanResult | AgentResult>[] = [];
+  const skippedResult = (message: string): Wave1ToolResult => ({ kind: 'skipped', message });
   // Skip external commands in pipeline testing mode
   if (pipelineTestingMode) {
     console.log(chalk.gray(' ⏭️ Skipping external tools (pipeline testing mode)'));
@@ -163,9 +171,9 @@ async function runPreReconWave1(
     );
     const [codeAnalysis] = await Promise.all(operations);
     return {
-      nmap: 'Skipped (pipeline testing mode)',
-      subfinder: 'Skipped (pipeline testing mode)',
-      whatweb: 'Skipped (pipeline testing mode)',
+      nmap: skippedResult('Skipped (pipeline testing mode)'),
+      subfinder: skippedResult('Skipped (pipeline testing mode)'),
+      whatweb: skippedResult('Skipped (pipeline testing mode)'),
       codeAnalysis: codeAnalysis as AgentResult
     };
   } else {
@@ -192,9 +200,9 @@ async function runPreReconWave1(
     const [nmap, subfinder, whatweb, codeAnalysis] = await Promise.all(operations);
     return {
-      nmap: nmap as TerminalScanResult,
-      subfinder: subfinder as TerminalScanResult,
-      whatweb: whatweb as TerminalScanResult,
+      nmap: { kind: 'scan', result: nmap as TerminalScanResult },
+      subfinder: { kind: 'scan', result: subfinder as TerminalScanResult },
+      whatweb: { kind: 'scan', result: whatweb as TerminalScanResult },
       codeAnalysis: codeAnalysis as AgentResult
     };
   }
@@ -250,17 +258,21 @@ async function runPreReconWave2(
   return response;
 }
-// Helper type for stitching results
-interface StitchableResult {
-  status?: string;
-  output?: string;
-  tool?: string;
+// Extracts status and output from a Wave1 tool result
+function extractResult(r: Wave1ToolResult | undefined): { status: string; output: string } {
+  if (!r) return { status: 'Skipped', output: 'No output' };
+  switch (r.kind) {
+    case 'scan':
+      return { status: r.result.status || 'Skipped', output: r.result.output || 'No output' };
+    case 'skipped':
+      return { status: 'Skipped', output: r.message };
+    case 'agent':
+      return { status: r.result.success ? 'success' : 'error', output: 'See agent output' };
+  }
 }
-// Pure function: Stitch together pre-recon outputs and save to file
-async function stitchPreReconOutputs(outputs: (StitchableResult | string | undefined)[], sourceDir: string): Promise<string> {
-  const [nmap, subfinder, whatweb, naabu, codeAnalysis, ...additionalScans] = outputs;
+// Combines tool outputs into single deliverable. Falls back to reference if file missing.
+async function stitchPreReconOutputs(wave1: Wave1Results, additionalScans: TerminalScanResult[], sourceDir: string): Promise<string> {
   // Try to read the code analysis deliverable file
   let codeAnalysisContent = 'No analysis available';
   try {
@@ -269,62 +281,45 @@ async function stitchPreReconOutputs(outputs: (StitchableResult | string | undef
   } catch (error) {
     const err = error as Error;
     console.log(chalk.yellow(`⚠️ Could not read code analysis deliverable: ${err.message}`));
-    // Fallback message if file doesn't exist
     codeAnalysisContent = 'Analysis located in deliverables/code_analysis_deliverable.md';
   }
   // Build additional scans section
   let additionalSection = '';
-  if (additionalScans && additionalScans.length > 0) {
+  if (additionalScans.length > 0) {
     additionalSection = '\n## Authenticated Scans\n';
-    additionalScans.forEach(scan => {
-      const s = scan as StitchableResult;
-      if (s && s.tool) {
-        additionalSection += `
-### ${s.tool.toUpperCase()}
-Status: ${s.status}
-${s.output}
+    for (const scan of additionalScans) {
+      additionalSection += `
+### ${scan.tool.toUpperCase()}
+Status: ${scan.status}
+${scan.output}
 `;
-      }
-    });
+    }
   }
-  const nmapResult = nmap as StitchableResult | string | undefined;
-  const subfinderResult = subfinder as StitchableResult | string | undefined;
-  const whatwebResult = whatweb as StitchableResult | string | undefined;
-  const naabuResult = naabu as StitchableResult | string | undefined;
-  const getStatus = (r: StitchableResult | string | undefined): string => {
-    if (!r) return 'Skipped';
-    if (typeof r === 'string') return 'Skipped';
-    return r.status || 'Skipped';
-  };
-  const getOutput = (r: StitchableResult | string | undefined): string => {
-    if (!r) return 'No output';
-    if (typeof r === 'string') return r;
-    return r.output || 'No output';
-  };
+  const nmap = extractResult(wave1.nmap);
+  const subfinder = extractResult(wave1.subfinder);
+  const whatweb = extractResult(wave1.whatweb);
+  const naabu = extractResult(wave1.naabu);
   const report = `
 # Pre-Reconnaissance Report
 ## Port Discovery (naabu)
-Status: ${getStatus(naabuResult)}
-${getOutput(naabuResult)}
+Status: ${naabu.status}
+${naabu.output}
 ## Network Scanning (nmap)
-Status: ${getStatus(nmapResult)}
-${getOutput(nmapResult)}
+Status: ${nmap.status}
+${nmap.output}
 ## Subdomain Discovery (subfinder)
-Status: ${getStatus(subfinderResult)}
-${getOutput(subfinderResult)}
+Status: ${subfinder.status}
+${subfinder.output}
 ## Technology Detection (whatweb)
-Status: ${getStatus(whatwebResult)}
-${getOutput(whatwebResult)}
+Status: ${whatweb.status}
+${whatweb.output}
 ## Code Analysis
 ${codeAnalysisContent}
 ${additionalSection}
@@ -375,16 +370,8 @@ export async function executePreReconPhase(
     console.log(chalk.green(' ✅ Wave 2 operations completed'));
     console.log(chalk.blue('📝 Stitching pre-recon outputs...'));
-    // Combine wave 1 and wave 2 results for stitching
-    const allResults: (StitchableResult | string | undefined)[] = [
-      wave1Results.nmap as StitchableResult | string,
-      wave1Results.subfinder as StitchableResult | string,
-      wave1Results.whatweb as StitchableResult | string,
-      wave1Results.naabu as StitchableResult | string | undefined,
-      wave1Results.codeAnalysis as unknown as StitchableResult,
-      ...(wave2Results.schemathesis ? [wave2Results.schemathesis as StitchableResult] : [])
-    ];
-    const preReconReport = await stitchPreReconOutputs(allResults, sourceDir);
+    const additionalScans = wave2Results.schemathesis ? [wave2Results.schemathesis] : [];
+    const preReconReport = await stitchPreReconOutputs(wave1Results, additionalScans, sourceDir);
     const duration = timer.stop();
     console.log(chalk.green(`✅ Pre-reconnaissance complete in ${formatDuration(duration)}`));