Feat/temporal (#46)
* refactor: modularize claude-executor and extract shared utilities
- Extract message handling into src/ai/message-handlers.ts with pure functions
- Extract output formatting into src/ai/output-formatters.ts
- Extract progress management into src/ai/progress-manager.ts
- Add audit-logger.ts with Null Object pattern for optional logging
- Add shared utilities: formatting.ts, file-io.ts, functional.ts
- Consolidate getPromptNameForAgent into src/types/agents.ts
* feat: add Claude Code custom commands for debug and review
* feat: add Temporal integration foundation (phase 1-2)
- Add Temporal SDK dependencies (@temporalio/client, worker, workflow, activity)
- Add shared types for pipeline state, metrics, and progress queries
- Add classifyErrorForTemporal() for retry behavior classification
- Add docker-compose for Temporal server with SQLite persistence
* feat: add Temporal activities for agent execution (phase 3)
- Add activities.ts with heartbeat loop, git checkpoint/rollback, and error classification
- Export runClaudePrompt, validateAgentOutput, ClaudePromptResult for Temporal use
- Track attempt number via Temporal Context for accurate audit logging
- Rollback git workspace before retry to ensure clean state
* feat: add Temporal workflow for 5-phase pipeline orchestration (phase 4)
* feat: add Temporal worker, client, and query tools (phase 5)
- Add worker.ts with workflow bundling and graceful shutdown
- Add client.ts CLI to start pipelines with progress polling
- Add query.ts CLI to inspect running workflow state
- Fix buffer overflow by truncating error messages and stack traces
- Skip git operations gracefully on non-git repositories
- Add kill.sh/start.sh dev scripts and Dockerfile.worker
* feat: fix Docker worker container setup
- Install uv instead of deprecated uvx package
- Add mcp-server and configs directories to container
- Mount target repo dynamically via TARGET_REPO env variable
* fix: add report assembly step to Temporal workflow
- Add assembleReportActivity to concatenate exploitation evidence files before report agent runs
- Call assembleFinalReport in workflow Phase 5 before runReportAgent
- Ensure deliverables directory exists before writing final report
- Simplify pipeline-testing report prompt to just prepend header
* refactor: consolidate Docker setup to root docker-compose.yml
* feat: improve Temporal client UX and env handling
- Change default to fire-and-forget (--wait flag to opt-in)
- Add splash screen and improve console output formatting
- Add .env to gitignore, remove from dockerignore for container access
- Add Taskfile for common development commands
* refactor: simplify session ID handling and improve Taskfile options
- Include hostname in workflow ID for better audit log organization
- Extract sanitizeHostname utility to audit/utils.ts for reuse
- Remove unused generateSessionLogPath and buildLogFilePath functions
- Simplify Taskfile with CONFIG/OUTPUT/CLEAN named parameters
* chore: add .env.example and simplify .gitignore
* docs: update README and CLAUDE.md for Temporal workflow usage
- Replace Docker CLI instructions with Task-based commands
- Add monitoring/stopping sections and workflow examples
- Document Temporal orchestration layer and troubleshooting
- Simplify file structure to key files overview
* refactor: replace Taskfile with bash CLI script
- Add shannon bash script with start/logs/query/stop/help commands
- Remove Taskfile.yml dependency (no longer requires Task installation)
- Update README.md and CLAUDE.md to use ./shannon commands
- Update client.ts output to show ./shannon commands
* docs: fix deliverable filename in README
* refactor: remove direct CLI and .shannon-store.json in favor of Temporal
- Delete src/shannon.ts direct CLI entry point (Temporal is now the only mode)
- Remove .shannon-store.json session lock (Temporal handles workflow deduplication)
- Remove broken scripts/export-metrics.js (imported non-existent function)
- Update package.json to remove main, start script, and bin entry
- Clean up CLAUDE.md and debug.md to remove obsolete references
* chore: remove licensing comments from prompt files to prevent leaking into actual prompts
* fix: resolve parallel workflow race conditions and retry logic bugs
- Fix save_deliverable race condition using closure pattern instead of global variable
- Fix error classification order so OutputValidationError matches before generic validation
- Fix ApplicationFailure re-classification bug by checking instanceof before re-throwing
- Add per-error-type retry limits (3 for output validation, 50 for billing)
- Add fast retry intervals for pipeline testing mode (10s vs 5min)
- Increase worker concurrent activities to 25 for parallel workflows
* refactor: pipeline vuln→exploit workflow for parallel execution
- Replace sync barrier between vuln/exploit phases with independent pipelines
- Each vuln type runs: vuln agent → queue check → conditional exploit
- Add checkExploitationQueue activity to skip exploits when no vulns found
- Use Promise.allSettled for graceful failure handling across pipelines
- Add PipelineSummary type for aggregated cost/duration/turns metrics
* fix: re-throw retryable errors in checkExploitationQueue
* fix: detect and retry on Claude Code spending cap errors
- Add spending cap pattern detection in detectApiError() with retryable error
- Add matching patterns to classifyErrorForTemporal() for proper Temporal retry
- Add defense-in-depth safeguard in runClaudePrompt() for $0 cost / low turn detection
- Add final sanity check in activities before declaring success
* fix: increase heartbeat timeout to prevent false worker-dead detection
Original 30s timeout was from POC spec assuming <5min activities. With
hour-long activities and multiple concurrent workflows sharing one worker,
resource contention causes event loop stalls exceeding 30s, triggering
false heartbeat timeouts. Increased to 10min (prod) and 5min (testing).
* fix: temporal db init
* fix: persist home dir
* feat: add per-workflow unified logging with ./shannon logs ID=<workflow-id>
- Add WorkflowLogger class for human-readable, per-workflow log files
- Create workflow.log in audit-logs/{workflowId}/ with phase, agent, tool, and LLM events
- Update ./shannon logs to require ID param and tail specific workflow log
- Add phase transition logging at workflow boundaries
- Include workflow completion summary with agent breakdown (duration, cost)
- Mount audit-logs volume in docker-compose for host access
---------
Co-authored-by: ezl-keygraph <ezhil@keygraph.io>
Committed by GitHub. Parent: 45acb16711. Commit: 51e621d0d5.
@@ -0,0 +1,139 @@
---
description: Systematically debug errors using context analysis and structured recovery
---

You are debugging an issue. Follow this structured approach to avoid spinning in circles.

## Step 1: Capture Error Context
- Read the full error message and stack trace
- Identify the layer where the error originated:
  - **CLI/Args** - Input validation, path resolution
  - **Config Parsing** - YAML parsing, JSON Schema validation
  - **Session Management** - Mutex, session.json, lock files
  - **Audit System** - Logging, metrics tracking, atomic writes
  - **Claude SDK** - Agent execution, MCP servers, turn handling
  - **Git Operations** - Checkpoints, rollback, commit
  - **Tool Execution** - nmap, subfinder, whatweb
  - **Validation** - Deliverable checks, queue validation

## Step 2: Check Relevant Logs

**Session audit logs:**
```bash
# Find most recent session
ls -lt audit-logs/ | head -5

# Check session metrics and errors
cat audit-logs/<session>/session.json | jq '.errors, .agentMetrics'

# Check agent execution logs
ls -lt audit-logs/<session>/agents/
cat audit-logs/<session>/agents/<latest>.log
```

## Step 3: Trace the Call Path

For Shannon, trace through these layers:

1. **Temporal Client** → `src/temporal/client.ts` - Workflow initiation
2. **Workflow** → `src/temporal/workflows.ts` - Pipeline orchestration
3. **Activities** → `src/temporal/activities.ts` - Agent execution with heartbeats
4. **Config** → `src/config-parser.ts` - YAML loading, schema validation
5. **Session** → `src/session-manager.ts` - Agent definitions, execution order
6. **Audit** → `src/audit/audit-session.ts` - Logging facade, metrics tracking
7. **Executor** → `src/ai/claude-executor.ts` - SDK calls, MCP setup, retry logic
8. **Validation** → `src/queue-validation.ts` - Deliverable checks

## Step 4: Identify Root Cause

**Common Shannon-specific issues:**

| Symptom | Likely Cause | Fix |
|---------|--------------|-----|
| Agent hangs indefinitely | MCP server crashed, Playwright timeout | Check Playwright logs in `/tmp/playwright-*` |
| "Validation failed: Missing deliverable" | Agent didn't create expected file | Check `deliverables/` dir, review prompt |
| Git checkpoint fails | Uncommitted changes, git lock | Run `git status`, remove `.git/index.lock` |
| "Session limit reached" | Claude API billing limit | Not retryable - check API usage |
| Parallel agents all fail | Shared resource contention | Check mutex usage, stagger startup timing |
| Cost/timing not tracked | Metrics not reloaded before update | Add `metricsTracker.reload()` before updates |
| session.json corrupted | Partial write during crash | Delete and restart, or restore from backup |
| YAML config rejected | Invalid schema or unsafe content | Run through AJV validator manually |
| Prompt variable not replaced | Missing `{{VARIABLE}}` in context | Check `prompt-manager.ts` interpolation |

**MCP Server Issues:**
```bash
# Check if Playwright browsers are installed
npx playwright install chromium

# Check MCP server startup (look for connection errors)
grep -i "mcp\|playwright" audit-logs/<session>/agents/*.log
```

**Git State Issues:**
```bash
# Check for uncommitted changes
git status

# Check for git locks
ls -la .git/*.lock

# View recent git operations from Shannon
git reflog | head -10
```

## Step 5: Apply Fix with Retry Limit

- **CRITICAL**: Track consecutive failed attempts
- After **3 consecutive failures** on the same issue, STOP and:
  - Summarize what was tried
  - Explain what's blocking progress
  - Ask the user for guidance or additional context
- After a successful fix, reset the failure counter

## Step 6: Validate the Fix

**For code changes:**
```bash
# Compile TypeScript
npx tsc --noEmit

# Quick validation run
shannon <URL> <REPO> --pipeline-testing
```

**For audit/session issues:**
- Verify `session.json` is valid JSON after fix
- Check that atomic writes complete without errors
- Confirm mutex release in `finally` blocks

**For agent issues:**
- Verify deliverable files are created in correct location
- Check that validation functions return expected results
- Confirm retry logic triggers on appropriate errors

## Anti-Patterns to Avoid

- Don't delete `session.json` without checking if session is active
- Don't modify git state while an agent is running
- Don't retry billing/quota errors (they're not retryable)
- Don't ignore PentestError type - it indicates the error category
- Don't make random changes hoping something works
- Don't fix symptoms without understanding root cause
- Don't bypass mutex protection for "quick fixes"

## Quick Reference: Error Types

| PentestError Type | Meaning | Retryable? |
|-------------------|---------|------------|
| `config` | Configuration file issues | No |
| `network` | Connection/timeout issues | Yes |
| `tool` | External tool (nmap, etc.) failed | Yes |
| `prompt` | Claude SDK/API issues | Sometimes |
| `filesystem` | File read/write errors | Sometimes |
| `validation` | Deliverable validation failed | Yes (via retry) |
| `billing` | API quota/billing limit | No |
| `unknown` | Unexpected error | Depends |
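
The table above can be read as a lookup from error type to a default retry decision. The following is a minimal sketch under stated assumptions: the `PentestError` class here is a hypothetical stand-in that only follows the `(message, type, retryable, context)` argument order mentioned elsewhere in this commit, and `RETRY_BY_TYPE` and `shouldRetry` are illustrative names that do not come from Shannon's source. Rows marked "Sometimes" or "Depends" fall back to the flag carried by the error itself.

```typescript
// Hypothetical sketch: Shannon's real PentestError may differ in shape and behavior.
type PentestErrorType =
  | "config" | "network" | "tool" | "prompt"
  | "filesystem" | "validation" | "billing" | "unknown";

class PentestError extends Error {
  readonly type: PentestErrorType;
  readonly retryable: boolean;
  readonly context: Record<string, unknown>;

  constructor(
    message: string,
    type: PentestErrorType,
    retryable: boolean,
    context: Record<string, unknown> = {},
  ) {
    super(message);
    // Keep `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "PentestError";
    this.type = type;
    this.retryable = retryable;
    this.context = context;
  }
}

// Per-type defaults copied from the table; null means "let the error's own flag decide".
const RETRY_BY_TYPE: Record<PentestErrorType, boolean | null> = {
  config: false,
  network: true,
  tool: true,
  prompt: null,
  filesystem: null,
  validation: true,
  billing: false,
  unknown: null,
};

function shouldRetry(error: unknown): boolean {
  // In this sketch, anything that is not a PentestError is treated as not retryable.
  if (!(error instanceof PentestError)) return false;
  return RETRY_BY_TYPE[error.type] ?? error.retryable;
}
```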

---

Now analyze the error and begin debugging systematically.
@@ -0,0 +1,120 @@
---
description: Review code changes for Shannon-specific patterns, security, and common mistakes
---

Review the current changes (staged or working directory) with focus on Shannon-specific patterns and common mistakes.

## Step 1: Gather Changes
Run these commands to understand the scope:
```bash
git diff --stat HEAD
git diff HEAD
```

## Step 2: Check Shannon-Specific Patterns

### Error Handling (CRITICAL)
- [ ] **All errors use PentestError** - Never use raw `Error`. Use `new PentestError(message, type, retryable, context)`
- [ ] **Error type is appropriate** - Use correct type: 'config', 'network', 'tool', 'prompt', 'filesystem', 'validation', 'billing', 'unknown'
- [ ] **Retryable flag matches behavior** - If error will be retried, set `retryable: true`
- [ ] **Context includes debugging info** - Add relevant paths, tool names, error codes to context object
- [ ] **Never swallow errors silently** - Always log or propagate errors

### Audit System & Concurrency (CRITICAL)
- [ ] **Mutex protection for parallel operations** - Use `sessionMutex.lock()` when updating `session.json` during parallel agent execution
- [ ] **Reload before modify** - Always call `this.metricsTracker.reload()` before updating metrics in mutex block
- [ ] **Atomic writes for session.json** - Use `atomicWrite()` for session metadata, never `fs.writeFile()` directly
- [ ] **Stream drain handling** - Log writes must wait for buffer drain before resolving
- [ ] **Semaphore release in finally** - Git semaphore must be released in `finally` block

### Claude SDK Integration (CRITICAL)
- [ ] **MCP server configuration** - Verify Playwright MCP uses `--isolated` and unique `--user-data-dir`
- [ ] **Prompt variable interpolation** - Check all `{{VARIABLE}}` placeholders are replaced
- [ ] **Turn counting** - Increment `turnCount` on assistant messages, not tool calls
- [ ] **Cost tracking** - Extract cost from final `result` message, track even on failure
- [ ] **API error detection** - Check for "session limit reached" (fatal) vs other errors
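
For the prompt interpolation item, one way to make unreplaced placeholders visible is to have the substitution step report them. This is a minimal sketch, assuming placeholders always look like `{{UPPER_SNAKE_CASE}}`; `interpolatePrompt` is a hypothetical helper and is not the function in `prompt-manager.ts`.

```typescript
// Hypothetical helper for illustration; not Shannon's prompt-manager.ts implementation.
function interpolatePrompt(
  template: string,
  variables: Record<string, string>,
): { text: string; missing: string[] } {
  const missing: string[] = [];
  const text = template.replace(
    /\{\{([A-Z0-9_]+)\}\}/g,
    (match: string, name: string): string => {
      const value = Object.prototype.hasOwnProperty.call(variables, name)
        ? variables[name]
        : undefined;
      if (value !== undefined) return value;
      // Record each unknown placeholder once and leave it in the text so the gap is easy to spot.
      if (missing.indexOf(name) === -1) missing.push(name);
      return match;
    },
  );
  return { text, missing };
}
```

A reviewer could then treat a non-empty `missing` array as a failed check before the prompt is sent.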

### Configuration & Validation (CRITICAL)
- [ ] **FAILSAFE_SCHEMA for YAML** - Never use default schema (prevents code execution)
- [ ] **Security pattern detection** - Check for path traversal (`../`), HTML injection (`<>`), JavaScript URLs
- [ ] **Rule conflict detection** - Rules cannot appear in both `avoid` AND `focus`
- [ ] **Duplicate rule detection** - Same `type:url_path` cannot appear twice
- [ ] **JSON Schema validation before use** - Config must pass AJV validation
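
The rule conflict and duplicate rule checks can be sketched as a single pass over both lists. This is an illustration under assumptions: only the `type` and `url_path` field names and the `avoid`/`focus` list names come from the checklist, while the `Rule` and `RulesConfig` interfaces and `findRuleProblems` are hypothetical.

```typescript
// Assumed rule shape; only `type`, `url_path`, `avoid`, and `focus` are named in the checklist.
interface Rule {
  type: string;
  url_path: string;
}

interface RulesConfig {
  avoid: Rule[];
  focus: Rule[];
}

type ListName = "avoid" | "focus";

function findRuleProblems(config: RulesConfig): string[] {
  const problems: string[] = [];
  // Remember which list first contained each `type:url_path` key.
  const firstSeenIn = new Map<string, ListName>();
  const lists: Array<[ListName, Rule[]]> = [
    ["avoid", config.avoid],
    ["focus", config.focus],
  ];

  for (const [listName, rules] of lists) {
    for (const rule of rules) {
      const key = `${rule.type}:${rule.url_path}`;
      const previous = firstSeenIn.get(key);
      if (previous === undefined) {
        firstSeenIn.set(key, listName);
      } else if (previous === listName) {
        problems.push(`duplicate rule ${key} in ${listName}`);
      } else {
        problems.push(`rule ${key} appears in both avoid and focus`);
      }
    }
  }
  return problems;
}
```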

### Session & Agent Management (CRITICAL)
- [ ] **Deliverable dependencies respected** - Exploitation agents only run if vulnerability queue exists AND has items
- [ ] **Queue validation before exploitation** - Use `safeValidateQueueAndDeliverable()` to check eligibility
- [ ] **Git checkpoint before agent run** - Create checkpoint for rollback on failure
- [ ] **Git rollback on retry** - Call `rollbackGitWorkspace()` before each retry attempt
- [ ] **Agent prerequisites checked** - Verify prerequisite agents completed before running dependent agent

### Parallel Execution
- [ ] **Promise.allSettled for parallel agents** - Never use `Promise.all` (partial failures should not crash batch)
- [ ] **Staggered startup** - 2-second delay between parallel agent starts to prevent API throttle
- [ ] **Individual retry loops** - Each agent retries independently (3 attempts max)
- [ ] **Results aggregated correctly** - Handle both 'fulfilled' and 'rejected' results from `Promise.allSettled`
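
To illustrate the `Promise.allSettled` items, here is a minimal sketch of running a batch and aggregating both outcome kinds. The `AgentRun` and `BatchSummary` types, the idea that each run resolves to a cost in USD, and the function names are assumptions made for the example; Shannon's actual result types are not shown in this document.

```typescript
// Illustrative types; each run is assumed to resolve to its cost in USD.
interface AgentRun {
  name: string;
  run: () => Promise<number>;
}

interface BatchSummary {
  succeeded: string[];
  failed: { name: string; reason: string }[];
  totalCost: number;
}

// Pure aggregation step: handles both 'fulfilled' and 'rejected' entries.
function summarize(
  names: string[],
  results: PromiseSettledResult<number>[],
): BatchSummary {
  const summary: BatchSummary = { succeeded: [], failed: [], totalCost: 0 };
  results.forEach((result, index) => {
    const name = names[index] ?? `agent-${index}`;
    if (result.status === "fulfilled") {
      summary.succeeded.push(name);
      summary.totalCost += result.value;
    } else {
      const reason =
        result.reason instanceof Error ? result.reason.message : String(result.reason);
      summary.failed.push({ name, reason });
    }
  });
  return summary;
}

// One rejected agent does not reject the whole batch, unlike Promise.all.
async function runBatch(agents: AgentRun[]): Promise<BatchSummary> {
  const results = await Promise.allSettled(agents.map((agent) => agent.run()));
  return summarize(agents.map((agent) => agent.name), results);
}
```

Keeping `summarize` separate from `runBatch` makes the aggregation easy to check without starting any agents.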

## Step 3: TypeScript Safety

### Type Assertions (WARNING)
- [ ] **No double casting** - Never use `as unknown as SomeType` (bypasses type safety)
- [ ] **Validate before casting** - JSON parsed data should be validated (JSON Schema) before `as Type`
- [ ] **Prefer type guards** - Use `instanceof` or property checks instead of assertions where possible
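
As an example of "validate before casting", a hand-written type guard can stand in for a schema check on small shapes. The `QueueFile` shape below is assumed purely for illustration (the real queue format is not shown here), and `isQueueFile` and `parseQueue` are hypothetical names.

```typescript
// Assumed shape for illustration only.
interface QueueFile {
  vulnerabilities: { id: string }[];
}

// Type guard: narrows `unknown` to QueueFile through property checks.
function isQueueFile(value: unknown): value is QueueFile {
  if (typeof value !== "object" || value === null) return false;
  const list = (value as { vulnerabilities?: unknown }).vulnerabilities;
  return (
    Array.isArray(list) &&
    list.every(
      (item: unknown) =>
        typeof item === "object" &&
        item !== null &&
        typeof (item as { id?: unknown }).id === "string",
    )
  );
}

// Returns null instead of casting when the parsed JSON has the wrong shape.
function parseQueue(json: string): QueueFile | null {
  const parsed: unknown = JSON.parse(json);
  return isQueueFile(parsed) ? parsed : null;
}
```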

### Null/Undefined Handling
- [ ] **Explicit null checks** - Use `if (x === null || x === undefined)` not truthy checks for critical paths
- [ ] **Nullish coalescing** - Use `??` for null/undefined, not `||` which also catches empty string/0
- [ ] **Optional chaining** - Use `?.` for nested property access on potentially undefined objects
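
The difference between `||` and `??` shows up as soon as `0` or an empty string is a legitimate value. This is a small sketch; the `RetryOptions` shape, the defaults, and all function names are made up for the example.

```typescript
// Hypothetical options object, used only to contrast `||` with `??`.
interface RetryOptions {
  maxAttempts?: number | null;
  label?: string | null;
}

// `||` replaces every falsy value, so an explicit 0 or "" is silently lost.
function withOr(options: RetryOptions): { maxAttempts: number; label: string } {
  return { maxAttempts: options.maxAttempts || 3, label: options.label || "agent" };
}

// `??` only replaces null and undefined, so 0 and "" survive.
function withNullish(options: RetryOptions): { maxAttempts: number; label: string } {
  return { maxAttempts: options.maxAttempts ?? 3, label: options.label ?? "agent" };
}

// Optional chaining yields undefined instead of throwing on a missing level.
function firstTurnCount(session?: { agents?: { turns: number }[] }): number | undefined {
  return session?.agents?.[0]?.turns;
}
```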

### Imports & Types
- [ ] **Type imports** - Use `import type { ... }` for type-only imports
- [ ] **No implicit any** - All function parameters and returns must have explicit types
- [ ] **Readonly for constants** - Use `Object.freeze()` and `Readonly<>` for immutable data

## Step 4: Security Review

### Defensive Tool Security
- [ ] **No credentials in logs** - Check that passwords, tokens, TOTP secrets are not logged to audit files
- [ ] **Config file size limit** - Ensure 1MB max for config files (DoS prevention)
- [ ] **Safe shell execution** - Command arguments must be escaped/sanitized

### Code Injection Prevention
- [ ] **YAML safe parsing** - FAILSAFE_SCHEMA only
- [ ] **No eval/Function** - Never use dynamic code evaluation
- [ ] **Input validation at boundaries** - URLs, paths validated before use

## Step 5: Common Mistakes to Avoid

### Anti-Patterns Found in Codebase
- [ ] **Catch + re-throw without context** - Don't just `throw error`, wrap with additional context
- [ ] **Silent failures in session loading** - Corrupted session files should warn user, not silently reset
- [ ] **Duplicate retry logic** - Don't implement retry at both caller and callee level
- [ ] **Hardcoded error message matching** - Prefer error codes over regex on error.message
- [ ] **Missing timeout on long operations** - Git operations and API calls should have timeouts

### Code Quality
- [ ] **No dead code added** - Remove unused imports, functions, variables
- [ ] **No over-engineering** - Don't add abstractions for single-use operations
- [ ] **Comments only where needed** - Self-documenting code preferred over excessive comments
- [ ] **Consistent file naming** - kebab-case for files (e.g., `queue-validation.ts`)

## Step 6: Provide Feedback

For each issue found:
1. **Location**: File and line number
2. **Issue**: What's wrong and why it matters
3. **Fix**: How to correct it (with code example if helpful)
4. **Severity**: Critical / Warning / Suggestion

### Severity Definitions
- **Critical**: Will cause bugs, crashes, data loss, or security issues
- **Warning**: Code smell, inconsistent pattern, or potential future issue
- **Suggestion**: Style improvement or minor enhancement

Summarize with:
- Total issues by severity
- Overall assessment (Ready to commit / Needs fixes / Needs discussion)

---

Now review the current changes.
@@ -18,7 +18,6 @@ xben-benchmark-results/
 # Development files
 *.md
 !CLAUDE.md
-.env*
 .DS_Store
 Thumbs.db
@@ -0,0 +1,8 @@
# Shannon Environment Configuration
# Copy this file to .env and fill in your credentials

# Anthropic API Key (required - choose one)
ANTHROPIC_API_KEY=your-api-key-here

# OR use OAuth token instead
# CLAUDE_CODE_OAUTH_TOKEN=your-oauth-token-here
+2
-3
@@ -1,5 +1,4 @@
 node_modules/
-.shannon-store.json
+.env
-agent-logs/
+audit-logs/
-/audit-logs/
 dist/
@@ -8,58 +8,64 @@ This is an AI-powered penetration testing agent designed for defensive security

 ## Commands

-### Installation & Setup
+### Prerequisites
+- **Docker** - Container runtime
+- **Anthropic API key** - Set in `.env` file
+
+### Running the Penetration Testing Agent (Docker + Temporal)
 ```bash
-npm install
+# Configure credentials
+cp .env.example .env
+# Edit .env:
+# ANTHROPIC_API_KEY=your-key
+# CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 # Prevents token limits during long reports
+
+# Start a pentest workflow
+./shannon start URL=<url> REPO=<path>
 ```

-### Running the Penetration Testing Agent
+Examples:
 ```bash
-shannon <WEB_URL> <REPO_PATH> [--config <CONFIG_FILE>] [--output <OUTPUT_DIR>]
+./shannon start URL=https://example.com REPO=/path/to/repo
+./shannon start URL=https://example.com REPO=/path/to/repo CONFIG=./configs/my-config.yaml
+./shannon start URL=https://example.com REPO=/path/to/repo OUTPUT=./my-reports
 ```

-Example:
+### Monitoring Progress
 ```bash
-shannon "https://example.com" "/path/to/local/repo"
+./shannon logs # View real-time worker logs
-shannon "https://juice-shop.herokuapp.com" "/home/user/juice-shop" --config juice-shop-config.yaml
+./shannon query ID=<workflow-id> # Query specific workflow progress
-shannon "https://example.com" "/path/to/repo" --output /path/to/reports
+# Temporal Web UI available at http://localhost:8233
 ```

-### Alternative Execution
+### Stopping Shannon
 ```bash
-npm start <WEB_URL> <REPO_PATH> --config <CONFIG_FILE>
+./shannon stop # Stop containers (preserves workflow data)
+./shannon stop CLEAN=true # Full cleanup including volumes
 ```

 ### Options
 ```bash
---config <file> YAML configuration file for authentication and testing parameters
+CONFIG=<file> YAML configuration file for authentication and testing parameters
---output <path> Custom output directory for session folder (default: ./audit-logs/)
+OUTPUT=<path> Custom output directory for session folder (default: ./audit-logs/)
---pipeline-testing Use minimal prompts for fast pipeline testing (creates minimal deliverables)
+PIPELINE_TESTING=true Use minimal prompts and fast retry intervals (10s instead of 5min)
---disable-loader Disable the animated progress loader (useful when logs interfere with spinner)
+REBUILD=true Force Docker rebuild with --no-cache (use when code changes aren't picked up)
---help Show help message
-```
-
-### Configuration Validation
-```bash
-# Configuration validation is built into the main script
-shannon --help # Shows usage and validates config on execution
 ```

 ### Generate TOTP for Authentication
-TOTP generation is now handled automatically via the `generate_totp` MCP tool during authentication flows.
+TOTP generation is handled automatically via the `generate_totp` MCP tool during authentication flows.

 ### Development Commands
 ```bash
-# No linting or testing commands available in this project
+# Build TypeScript
-# Development is done by running the agent in pipeline-testing mode
+npm run build
-shannon <WEB_URL> <REPO_PATH> --pipeline-testing
+
+# Run with pipeline testing mode (fast, minimal deliverables)
+./shannon start URL=<url> REPO=<path> PIPELINE_TESTING=true
 ```

 ## Architecture & Components

-### Main Entry Point
-- `src/shannon.ts` - Main orchestration script that coordinates the entire penetration testing workflow (compiles to `dist/shannon.js`)
-
 ### Core Modules
 - `src/config-parser.ts` - Handles YAML configuration parsing, validation, and distribution to agents
 - `src/error-handling.ts` - Comprehensive error handling with retry logic and categorized error types
@@ -67,6 +73,21 @@ shannon <WEB_URL> <REPO_PATH> --pipeline-testing
 - `src/session-manager.ts` - Agent definitions, execution order, and parallel groups
 - `src/queue-validation.ts` - Validates deliverables and agent prerequisites

+### Temporal Orchestration Layer
+Shannon uses Temporal for durable workflow orchestration:
+- `src/temporal/shared.ts` - Types, interfaces, query definitions
+- `src/temporal/workflows.ts` - Main workflow (pentestPipelineWorkflow)
+- `src/temporal/activities.ts` - Activity implementations with heartbeats
+- `src/temporal/worker.ts` - Worker process entry point
+- `src/temporal/client.ts` - CLI client for starting workflows
+- `src/temporal/query.ts` - Query tool for progress inspection
+
+Key features:
+- **Crash recovery** - Workflows resume automatically after worker restart
+- **Queryable progress** - Real-time status via `./shannon query` or Temporal Web UI
+- **Intelligent retry** - Distinguishes transient vs permanent errors
+- **Parallel execution** - 5 concurrent agents in vulnerability/exploitation phases
+
 ### Five-Phase Testing Workflow

 1. **Pre-Reconnaissance** (`pre-recon`) - External tool scans (nmap, subfinder, whatweb) + source code analysis
@@ -147,7 +168,6 @@ The agent implements a crash-safe audit system with the following features:
 - `{hostname}_{sessionId}/prompts/` - Exact prompts used for reproducibility
 - `{hostname}_{sessionId}/agents/` - Turn-by-turn execution logs
 - `{hostname}_{sessionId}/deliverables/` - Security reports and findings
-- **.shannon-store.json**: Minimal session lock file (prevents concurrent runs)

 **Crash Safety:**
 - Append-only logging with immediate flush (survives kill -9)
@@ -159,22 +179,47 @@ The agent implements a crash-safe audit system with the following features:
|
|||||||
- 5x faster execution with parallel vulnerability and exploitation phases
|
- 5x faster execution with parallel vulnerability and exploitation phases
|
||||||
|
|
||||||
**Metrics & Reporting:**
|
**Metrics & Reporting:**
|
||||||
- Export metrics to CSV with `./scripts/export-metrics.js`
|
|
||||||
- Phase-level and agent-level timing/cost aggregations
|
- Phase-level and agent-level timing/cost aggregations
|
||||||
- Validation results integrated with metrics
|
- Validation results integrated with metrics
|
||||||
|
|
||||||
For detailed design, see `docs/unified-audit-system-design.md`.
|
|
||||||
|
|
||||||
## Development Notes
|
## Development Notes
|
||||||
|
|
||||||
|
### Learning from Reference Implementations
|
||||||
|
|
||||||
|
A working POC exists at `/Users/arjunmalleswaran/Code/shannon-pocs` that demonstrates the ideal Temporal + Claude Agent SDK integration. When implementing Temporal features, agents can ask questions in the chat, and the user will relay them to another Claude Code session working in that POC directory.
|
||||||
|
|
||||||
|
**How to use this approach:**
|
||||||
|
1. When stuck or unsure about Temporal patterns, write a specific question in the chat
|
||||||
|
2. The user will ask an agent working on the POC to answer
|
||||||
|
3. The user relays the answer (code snippets, patterns, explanations) back
|
||||||
|
4. Apply the learned patterns to Shannon's codebase
|
||||||
|
|
||||||
|
**Example questions to ask:**
|
||||||
|
- "How does the POC structure its workflow to handle parallel activities?"
|
||||||
|
- "Show me how heartbeats are implemented in the POC's activities"
|
||||||
|
- "What retry configuration does the POC use for long-running agent activities?"
|
||||||
|
- "How does the POC integrate Claude Agent SDK calls within Temporal activities?"
|
||||||
|
|
||||||
|
**Reference implementation:**
|
||||||
|
- **Temporal + Claude Agent SDK**: `/Users/arjunmalleswaran/Code/shannon-pocs` - working implementation demonstrating workflows, activities, worker setup, and SDK integration
|
||||||
|
|
||||||
|
### Adding a New Agent
|
||||||
|
1. Define the agent in `src/session-manager.ts` (add to `AGENT_QUEUE` and appropriate parallel group)
|
||||||
|
2. Create prompt template in `prompts/` (e.g., `vuln-newtype.txt` or `exploit-newtype.txt`)
|
||||||
|
3. Add activity function in `src/temporal/activities.ts`
|
||||||
|
4. Register activity in `src/temporal/workflows.ts` within the appropriate phase
|
||||||
|
|
||||||
|
### Modifying Prompts
|
||||||
|
- Prompt templates use variable substitution: `{{TARGET_URL}}`, `{{CONFIG_CONTEXT}}`, `{{LOGIN_INSTRUCTIONS}}`
|
||||||
|
- Shared partials in `prompts/shared/` are included via `prompt-manager.ts`
|
||||||
|
- Test changes with `PIPELINE_TESTING=true` for faster iteration
|
||||||
|
|
||||||
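The `{{NAME}}` variable substitution described under Modifying Prompts can be sketched as below. This is a minimal illustration, not the code in `prompt-manager.ts`: the function name `substituteVariables`, the `PromptVariables` type, and the choice to leave unknown placeholders untouched are all assumptions made for the example.

```typescript
// Hypothetical helper for illustration; not the actual prompt-manager.ts code.
type PromptVariables = Record<string, string>;

function substituteVariables(template: string, variables: PromptVariables): string {
  // Match {{NAME}} placeholders made of uppercase letters, digits, and underscores.
  return template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (match: string, name: string): string => {
    const value: string | undefined = variables[name];
    // Leave unknown placeholders untouched so missing variables are easy to spot.
    return value === undefined ? match : value;
  });
}

const rendered = substituteVariables(
  'Target: {{TARGET_URL}}\nLogin: {{LOGIN_INSTRUCTIONS}}',
  { TARGET_URL: 'https://example.com' }
);
console.log(rendered);
```

Leaving unmatched placeholders in the output is one possible design choice; a stricter variant could throw instead so that a missing variable fails fast.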
### Key Design Patterns
|
### Key Design Patterns
|
||||||
- **Configuration-Driven Architecture**: YAML configs with JSON Schema validation
|
- **Configuration-Driven Architecture**: YAML configs with JSON Schema validation
|
||||||
- **Modular Error Handling**: Categorized error types with retry logic
|
- **Modular Error Handling**: Categorized error types with retry logic
|
||||||
- **Pure Functions**: Most functionality is implemented as pure functions for testability
|
|
||||||
- **SDK-First Approach**: Heavy reliance on Claude Agent SDK for autonomous AI operations
|
- **SDK-First Approach**: Heavy reliance on Claude Agent SDK for autonomous AI operations
|
||||||
- **Progressive Analysis**: Each phase builds on previous phase results
|
- **Progressive Analysis**: Each phase builds on previous phase results
|
||||||
- **Local Repository Setup**: Target applications are accessed directly from user-provided local directories
|
|
||||||
- **Fire-and-Forget Execution**: Single entry point, runs all phases to completion
|
|
||||||
|
|
||||||
### Error Handling Strategy
|
### Error Handling Strategy
|
||||||
The application uses a comprehensive error handling system with:
|
The application uses a comprehensive error handling system with:
|
||||||
@@ -186,7 +231,7 @@ The application uses a comprehensive error handling system with:
|
|||||||
### Testing Mode
|
### Testing Mode
|
||||||
The agent includes a testing mode that skips external tool execution for faster development cycles:
|
The agent includes a testing mode that skips external tool execution for faster development cycles:
|
||||||
```bash
|
```bash
|
||||||
shannon <WEB_URL> <REPO_PATH> --pipeline-testing
|
./shannon start URL=<url> REPO=<path> PIPELINE_TESTING=true
|
||||||
```
|
```
|
||||||
|
|
||||||
### Security Focus
|
### Security Focus
|
||||||
@@ -198,107 +243,49 @@ This is explicitly designed as a **defensive security tool** for:
|
|||||||
|
|
||||||
The tool should only be used on systems you own or have explicit permission to test.
|
The tool should only be used on systems you own or have explicit permission to test.
|
||||||
|
|
||||||
## File Structure
|
## Key Files & Directories
|
||||||
|
|
||||||
```
|
**Entry Points:**
|
||||||
src/ # TypeScript source files
|
- `src/temporal/workflows.ts` - Temporal workflow definition
|
||||||
├── shannon.ts # Main orchestration script (entry point)
|
- `src/temporal/activities.ts` - Activity implementations with heartbeats
|
||||||
├── constants.ts # Shared constants
|
- `src/temporal/worker.ts` - Worker process entry point
|
||||||
├── config-parser.ts # Configuration handling
|
- `src/temporal/client.ts` - CLI client for starting workflows
|
||||||
├── error-handling.ts # Error management
|
|
||||||
├── tool-checker.ts # Tool validation
|
**Core Logic:**
|
||||||
├── session-manager.ts # Agent definitions, order, and parallel groups
|
- `src/session-manager.ts` - Agent definitions, execution order, parallel groups
|
||||||
├── queue-validation.ts # Deliverable validation
|
- `src/ai/claude-executor.ts` - Claude Agent SDK integration
|
||||||
├── splash-screen.ts # ASCII art splash screen
|
- `src/config-parser.ts` - YAML config parsing with JSON Schema validation
|
||||||
├── progress-indicator.ts # Progress display utilities
|
- `src/audit/` - Crash-safe logging and metrics system
|
||||||
├── types/ # TypeScript type definitions
|
|
||||||
│ ├── index.ts # Barrel exports
|
**Configuration:**
|
||||||
│ ├── agents.ts # Agent type definitions
|
- `shannon` - CLI script for running pentests
|
||||||
│ ├── config.ts # Configuration interfaces
|
- `docker-compose.yml` - Temporal server + worker containers
|
||||||
│ ├── errors.ts # Error type definitions
|
- `configs/` - YAML configs with `config-schema.json` for validation
|
||||||
│ └── session.ts # Session type definitions
|
- `prompts/` - AI prompt templates (`vuln-*.txt`, `exploit-*.txt`, etc.)
|
||||||
├── audit/ # Audit system
|
|
||||||
│ ├── index.ts # Public API
|
**Output:**
|
||||||
│ ├── audit-session.ts # Main facade (logger + metrics + mutex)
|
- `audit-logs/{hostname}_{sessionId}/` - Session metrics, agent logs, deliverables
|
||||||
│ ├── logger.ts # Append-only crash-safe logging
|
|
||||||
│ ├── metrics-tracker.ts # Timing, cost, attempt tracking
|
|
||||||
│ └── utils.ts # Path generation, atomic writes
|
|
||||||
├── ai/
|
|
||||||
│ └── claude-executor.ts # Claude Agent SDK integration
|
|
||||||
├── phases/
|
|
||||||
│ ├── pre-recon.ts # Pre-reconnaissance phase
|
|
||||||
│ └── reporting.ts # Final report assembly
|
|
||||||
├── prompts/
|
|
||||||
│ └── prompt-manager.ts # Prompt loading and variable substitution
|
|
||||||
├── setup/
|
|
||||||
│ └── environment.ts # Local repository setup
|
|
||||||
├── cli/
|
|
||||||
│ ├── ui.ts # Help text display
|
|
||||||
│ └── input-validator.ts # URL and path validation
|
|
||||||
└── utils/
|
|
||||||
├── git-manager.ts # Git operations
|
|
||||||
├── metrics.ts # Timing utilities
|
|
||||||
├── output-formatter.ts # Output formatting utilities
|
|
||||||
└── concurrency.ts # SessionMutex for parallel execution
|
|
||||||
dist/ # Compiled JavaScript output
|
|
||||||
├── shannon.js # Compiled entry point
|
|
||||||
└── ... # Other compiled files
|
|
||||||
package.json # Node.js dependencies
|
|
||||||
.shannon-store.json # Session lock file
|
|
||||||
audit-logs/ # Centralized audit data (default, or use --output)
|
|
||||||
└── {hostname}_{sessionId}/
|
|
||||||
├── session.json # Comprehensive metrics
|
|
||||||
├── prompts/ # Prompt snapshots
|
|
||||||
│ └── {agent}.md
|
|
||||||
├── agents/ # Agent execution logs
|
|
||||||
│ └── {timestamp}_{agent}_attempt-{N}.log
|
|
||||||
└── deliverables/ # Security reports and findings
|
|
||||||
└── ...
|
|
||||||
configs/ # Configuration files
|
|
||||||
├── config-schema.json # JSON Schema validation
|
|
||||||
├── example-config.yaml # Template configuration
|
|
||||||
├── juice-shop-config.yaml # Juice Shop example
|
|
||||||
├── keygraph-config.yaml # Keygraph configuration
|
|
||||||
├── chatwoot-config.yaml # Chatwoot configuration
|
|
||||||
├── metabase-config.yaml # Metabase configuration
|
|
||||||
└── cal-com-config.yaml # Cal.com configuration
|
|
||||||
prompts/ # AI prompt templates
|
|
||||||
├── shared/ # Shared content for all prompts
|
|
||||||
│ ├── _target.txt # Target URL template
|
|
||||||
│ ├── _rules.txt # Rules template
|
|
||||||
│ ├── _vuln-scope.txt # Vulnerability scope template
|
|
||||||
│ ├── _exploit-scope.txt # Exploitation scope template
|
|
||||||
│ └── login-instructions.txt # Login flow template
|
|
||||||
├── pre-recon-code.txt # Code analysis
|
|
||||||
├── recon.txt # Reconnaissance
|
|
||||||
├── vuln-*.txt # Vulnerability assessment
|
|
||||||
├── exploit-*.txt # Exploitation
|
|
||||||
└── report-executive.txt # Executive reporting
|
|
||||||
scripts/ # Utility scripts
|
|
||||||
└── export-metrics.js # Export metrics to CSV
|
|
||||||
deliverables/ # Output directory (in target repo)
|
|
||||||
docs/ # Documentation
|
|
||||||
├── unified-audit-system-design.md
|
|
||||||
└── migration-guide.md
|
|
||||||
```
|
|
||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
### Common Issues
|
### Common Issues
|
||||||
- **"A session is already running"**: Wait for the current session to complete, or delete `.shannon-store.json`
|
|
||||||
- **"Repository not found"**: Ensure target local directory exists and is accessible
|
- **"Repository not found"**: Ensure target local directory exists and is accessible
|
||||||
- **Concurrent runs blocked**: Only one session can run at a time per target
|
|
||||||
|
### Temporal & Docker Issues
|
||||||
|
- **"Temporal not ready"**: Wait for health check or run `docker compose logs temporal`
|
||||||
|
- **Worker not processing**: Ensure worker container is running with `docker compose ps`
|
||||||
|
- **Reset workflow state**: `./shannon stop CLEAN=true` removes all Temporal data and volumes
|
||||||
|
- **Local apps unreachable**: Use `host.docker.internal` instead of `localhost` for URLs
|
||||||
|
- **Container permissions**: On Linux, you may need `sudo` for Docker commands
|
||||||
|
|
||||||
### External Tool Dependencies
|
### External Tool Dependencies
|
||||||
Missing tools can be skipped using `--pipeline-testing` mode during development:
|
During development, set `PIPELINE_TESTING=true` to skip any of these tools that are missing:
|
||||||
- `nmap` - Network scanning
|
- `nmap` - Network scanning
|
||||||
- `subfinder` - Subdomain discovery
|
- `subfinder` - Subdomain discovery
|
||||||
- `whatweb` - Web technology detection
|
- `whatweb` - Web technology detection
|
||||||
|
|
||||||
### Diagnostic & Utility Scripts
|
### Diagnostic & Utility Scripts
|
||||||
```bash
|
```bash
|
||||||
# Export metrics to CSV
|
# View Temporal workflow history
|
||||||
./scripts/export-metrics.js --session-id <id> --output metrics.csv
|
open http://localhost:8233
|
||||||
```
|
```
|
||||||
|
|
||||||
Note: For recovery from corrupted state, simply delete `.shannon-store.json` or edit JSON files directly.
|
|
||||||
|
|||||||
@@ -79,10 +79,11 @@ Shannon is available in two editions:
|
|||||||
- [Product Line](#-product-line)
|
- [Product Line](#-product-line)
|
||||||
- [Setup & Usage Instructions](#-setup--usage-instructions)
|
- [Setup & Usage Instructions](#-setup--usage-instructions)
|
||||||
- [Prerequisites](#prerequisites)
|
- [Prerequisites](#prerequisites)
|
||||||
- [Authentication Setup](#authentication-setup)
|
- [Quick Start](#quick-start)
|
||||||
- [Quick Start with Docker](#quick-start-with-docker)
|
- [Monitoring Progress](#monitoring-progress)
|
||||||
|
- [Stopping Shannon](#stopping-shannon)
|
||||||
|
- [Usage Examples](#usage-examples)
|
||||||
- [Configuration (Optional)](#configuration-optional)
|
- [Configuration (Optional)](#configuration-optional)
|
||||||
- [Usage Patterns](#usage-patterns)
|
|
||||||
- [Output and Results](#output-and-results)
|
- [Output and Results](#output-and-results)
|
||||||
- [Sample Reports & Benchmarks](#-sample-reports--benchmarks)
|
- [Sample Reports & Benchmarks](#-sample-reports--benchmarks)
|
||||||
- [Architecture](#-architecture)
|
- [Architecture](#-architecture)
|
||||||
@@ -98,36 +99,71 @@ Shannon is available in two editions:
|
|||||||
|
|
||||||
### Prerequisites
|
### Prerequisites
|
||||||
|
|
||||||
- **Claude Console account with credits** - Required for AI-powered analysis
|
- **Docker** - Container runtime ([Install Docker](https://docs.docker.com/get-docker/))
|
||||||
- **Docker installed** - Primary deployment method
|
- **Anthropic API key or Claude Code OAuth token** - Get from [Anthropic Console](https://console.anthropic.com)
|
||||||
|
|
||||||
### Authentication Setup
|
### Quick Start
|
||||||
|
|
||||||
You need either a **Claude Code OAuth token** or an **Anthropic API key** to run Shannon. Get your token from the [Anthropic Console](https://console.anthropic.com) and pass it to Docker via the `-e` flag.
|
|
||||||
|
|
||||||
### Environment Configuration (Recommended)
|
|
||||||
|
|
||||||
To prevent Claude Code from hitting token limits during long report generation, set the max output tokens environment variable:
|
|
||||||
|
|
||||||
**For local runs:**
|
|
||||||
```bash
|
|
||||||
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
|
|
||||||
```
|
|
||||||
|
|
||||||
**For Docker runs:**
|
|
||||||
```bash
|
|
||||||
-e CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
|
|
||||||
```
|
|
||||||
|
|
||||||
### Quick Start with Docker
|
|
||||||
|
|
||||||
#### Build the Container
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
docker build -t shannon:latest .
|
# 1. Clone Shannon
|
||||||
|
git clone https://github.com/KeygraphHQ/shannon.git
|
||||||
|
cd shannon
|
||||||
|
|
||||||
|
# 2. Configure credentials (choose one method)
|
||||||
|
|
||||||
|
# Option A: Export environment variables
|
||||||
|
export ANTHROPIC_API_KEY="your-api-key" # or CLAUDE_CODE_OAUTH_TOKEN
|
||||||
|
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 # recommended
|
||||||
|
|
||||||
|
# Option B: Create a .env file
|
||||||
|
cat > .env << 'EOF'
|
||||||
|
ANTHROPIC_API_KEY=your-api-key
|
||||||
|
CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
|
||||||
|
EOF
|
||||||
|
|
||||||
|
# 3. Run a pentest
|
||||||
|
./shannon start URL=https://your-app.com REPO=/path/to/your/repo
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Prepare Your Repository
|
Shannon will build the containers, start the workflow, and return a workflow ID. The pentest runs in the background.
|
||||||
|
|
||||||
|
### Monitoring Progress
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# View real-time worker logs
|
||||||
|
./shannon logs
|
||||||
|
|
||||||
|
# Query a specific workflow's progress
|
||||||
|
./shannon query ID=shannon-1234567890
|
||||||
|
|
||||||
|
# Open the Temporal Web UI for detailed monitoring
|
||||||
|
open http://localhost:8233
|
||||||
|
```
|
||||||
|
|
||||||
|
### Stopping Shannon
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Stop all containers (preserves workflow data)
|
||||||
|
./shannon stop
|
||||||
|
|
||||||
|
# Full cleanup (removes all data)
|
||||||
|
./shannon stop CLEAN=true
|
||||||
|
```
|
||||||
|
|
||||||
|
### Usage Examples
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Basic pentest
|
||||||
|
./shannon start URL=https://example.com REPO=/path/to/repo
|
||||||
|
|
||||||
|
# With a configuration file
|
||||||
|
./shannon start URL=https://example.com REPO=/path/to/repo CONFIG=./configs/my-config.yaml
|
||||||
|
|
||||||
|
# Custom output directory
|
||||||
|
./shannon start URL=https://example.com REPO=/path/to/repo OUTPUT=./my-reports
|
||||||
|
```
|
||||||
|
|
||||||
|
### Prepare Your Repository
|
||||||
|
|
||||||
Shannon is designed for **web application security testing** and expects all application code to be available in a single directory structure. This works well for:
|
Shannon is designed for **web application security testing** and expects all application code to be available in a single directory structure. This works well for:
|
||||||
|
|
||||||
@@ -137,105 +173,35 @@ Shannon is designed for **web application security testing** and expects all app
|
|||||||
**For monorepos:**
|
**For monorepos:**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
git clone https://github.com/your-org/your-monorepo.git repos/your-app
|
git clone https://github.com/your-org/your-monorepo.git /path/to/your-app
|
||||||
```
|
```
|
||||||
|
|
||||||
**For multi-repository applications** (e.g., separate frontend/backend):
|
**For multi-repository applications** (e.g., separate frontend/backend):
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
mkdir repos/your-app
|
mkdir /path/to/your-app
|
||||||
cd repos/your-app
|
cd /path/to/your-app
|
||||||
git clone https://github.com/your-org/frontend.git
|
git clone https://github.com/your-org/frontend.git
|
||||||
git clone https://github.com/your-org/backend.git
|
git clone https://github.com/your-org/backend.git
|
||||||
git clone https://github.com/your-org/api.git
|
git clone https://github.com/your-org/api.git
|
||||||
```
|
```
|
||||||
|
|
||||||
**For existing local repositories:**
|
### Platform-Specific Instructions
|
||||||
|
|
||||||
```bash
|
|
||||||
cp -r /path/to/your-existing-repo repos/your-app
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Run Your First Pentest
|
|
||||||
|
|
||||||
**With Claude Console OAuth Token:**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker run --rm -it \
|
|
||||||
--network host \
|
|
||||||
--cap-add=NET_RAW \
|
|
||||||
--cap-add=NET_ADMIN \
|
|
||||||
-e CLAUDE_CODE_OAUTH_TOKEN="$CLAUDE_CODE_OAUTH_TOKEN" \
|
|
||||||
-e CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 \
|
|
||||||
-v "$(pwd)/repos:/app/repos" \
|
|
||||||
-v "$(pwd)/configs:/app/configs" \
|
|
||||||
# Comment below line if using custom output directory
|
|
||||||
-v "$(pwd)/audit-logs:/app/audit-logs" \
|
|
||||||
shannon:latest \
|
|
||||||
"https://your-app.com/" \
|
|
||||||
"/app/repos/your-app" \
|
|
||||||
--config /app/configs/example-config.yaml
|
|
||||||
# Optional: uncomment below for custom output directory
|
|
||||||
# -v "$(pwd)/reports:/app/reports" \
|
|
||||||
# --output /app/reports
|
|
||||||
```
|
|
||||||
|
|
||||||
**With Anthropic API Key:**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker run --rm -it \
|
|
||||||
--network host \
|
|
||||||
--cap-add=NET_RAW \
|
|
||||||
--cap-add=NET_ADMIN \
|
|
||||||
-e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
|
|
||||||
-e CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 \
|
|
||||||
-v "$(pwd)/repos:/app/repos" \
|
|
||||||
-v "$(pwd)/configs:/app/configs" \
|
|
||||||
# Comment below line if using custom output directory
|
|
||||||
-v "$(pwd)/audit-logs:/app/audit-logs" \
|
|
||||||
shannon:latest \
|
|
||||||
"https://your-app.com/" \
|
|
||||||
"/app/repos/your-app" \
|
|
||||||
--config /app/configs/example-config.yaml
|
|
||||||
# Optional: uncomment below for custom output directory
|
|
||||||
# -v "$(pwd)/reports:/app/reports" \
|
|
||||||
# --output /app/reports
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Platform-Specific Instructions
|
|
||||||
|
|
||||||
**For Linux (Native Docker):**
|
**For Linux (Native Docker):**
|
||||||
|
|
||||||
Add the `--user $(id -u):$(id -g)` flag to the Docker commands above to avoid permission issues with volume mounts. Docker Desktop on macOS and Windows handles this automatically, but native Linux Docker requires explicit user mapping.
|
You may need to run commands with `sudo` depending on your Docker setup. If you encounter permission issues with output files, ensure your user has access to the Docker socket.
|
||||||
|
|
||||||
**Network Capabilities:**
|
**For macOS:**
|
||||||
|
|
||||||
- `--cap-add=NET_RAW` - Enables advanced port scanning with nmap
|
Works out of the box with Docker Desktop installed.
|
||||||
- `--cap-add=NET_ADMIN` - Allows network administration for security tools
|
|
||||||
- `--network host` - Provides access to target network interfaces
|
|
||||||
|
|
||||||
**Testing Local Applications:**
|
**Testing Local Applications:**
|
||||||
|
|
||||||
Docker containers cannot reach `localhost` on your host machine. Use `host.docker.internal` in place of `localhost`:
|
Docker containers cannot reach `localhost` on your host machine. Use `host.docker.internal` in place of `localhost`:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
docker run --rm -it \
|
./shannon start URL=http://host.docker.internal:3000 REPO=/path/to/repo
|
||||||
--add-host=host.docker.internal:host-gateway \
|
|
||||||
--cap-add=NET_RAW \
|
|
||||||
--cap-add=NET_ADMIN \
|
|
||||||
-e CLAUDE_CODE_OAUTH_TOKEN="$CLAUDE_CODE_OAUTH_TOKEN" \
|
|
||||||
-e CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 \
|
|
||||||
-v "$(pwd)/repos:/app/repos" \
|
|
||||||
-v "$(pwd)/configs:/app/configs" \
|
|
||||||
# Comment below line if using custom output directory
|
|
||||||
-v "$(pwd)/audit-logs:/app/audit-logs" \
|
|
||||||
shannon:latest \
|
|
||||||
"http://host.docker.internal:3000" \
|
|
||||||
"/app/repos/your-app" \
|
|
||||||
--config /app/configs/example-config.yaml
|
|
||||||
# Optional: uncomment below for custom output directory
|
|
||||||
# -v "$(pwd)/reports:/app/reports" \
|
|
||||||
# --output /app/reports
|
|
||||||
```
|
```
|
||||||
|
|
||||||
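When target URLs are assembled in scripts, the `localhost` to `host.docker.internal` swap can be automated. The sketch below is an assumption-laden example and not part of Shannon: the helper name `toDockerReachableUrl` is hypothetical, and it only handles `localhost` and `127.0.0.1` in plain `http`/`https` URLs without embedded credentials.

```typescript
// Hypothetical helper for illustration; Shannon does not ship this function.
function toDockerReachableUrl(url: string): string {
  // Rewrite only when the host is exactly localhost or 127.0.0.1,
  // i.e. it is followed by a port, a path, or the end of the string.
  return url.replace(
    /^(https?:\/\/)(localhost|127\.0\.0\.1)(?=[:\/]|$)/,
    '$1host.docker.internal'
  );
}

console.log(toDockerReachableUrl('http://localhost:3000'));
console.log(toDockerReachableUrl('https://example.com/login'));
```

The rewritten value can then be passed as `URL=` to `./shannon start`.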
### Configuration (Optional)
|
### Configuration (Optional)
|
||||||
@@ -288,12 +254,17 @@ If your application uses two-factor authentication, simply add the TOTP secret t
|
|||||||
|
|
||||||
### Output and Results
|
### Output and Results
|
||||||
|
|
||||||
All results are saved to `./audit-logs/` by default. Use `--output <path>` to specify a custom directory. If using `--output`, ensure that path is mounted to an accessible host directory (e.g., `-v "$(pwd)/custom-directory:/app/reports"`).
|
All results are saved to `./audit-logs/{hostname}_{sessionId}/` by default. With the `./shannon` wrapper, pass `OUTPUT=<path>` to specify a custom directory.
|
||||||
|
|
||||||
- **Pre-reconnaissance reports** - External scan results
|
Output structure:
|
||||||
- **Vulnerability assessments** - Potential vulnerabilities from thorough code analysis and network mapping
|
```
|
||||||
- **Exploitation results** - Proof-of-concept attempts
|
audit-logs/{hostname}_{sessionId}/
|
||||||
- **Executive reports** - Business-focused security summaries
|
├── session.json # Metrics and session data
|
||||||
|
├── agents/ # Per-agent execution logs
|
||||||
|
├── prompts/ # Prompt snapshots for reproducibility
|
||||||
|
└── deliverables/
|
||||||
|
└── comprehensive_security_assessment_report.md # Final comprehensive security report
|
||||||
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,39 @@
|
|||||||
|
services:
|
||||||
|
temporal:
|
||||||
|
image: temporalio/temporal:latest
|
||||||
|
command: ["server", "start-dev", "--db-filename", "/home/temporal/temporal.db", "--ip", "0.0.0.0"]
|
||||||
|
ports:
|
||||||
|
- "7233:7233" # gRPC
|
||||||
|
- "8233:8233" # Web UI (built-in)
|
||||||
|
volumes:
|
||||||
|
- temporal-data:/home/temporal
|
||||||
|
healthcheck:
|
||||||
|
test: ["CMD", "temporal", "operator", "cluster", "health", "--address", "localhost:7233"]
|
||||||
|
interval: 10s
|
||||||
|
timeout: 5s
|
||||||
|
retries: 10
|
||||||
|
start_period: 30s
|
||||||
|
|
||||||
|
worker:
|
||||||
|
build: .
|
||||||
|
entrypoint: ["node", "dist/temporal/worker.js"]
|
||||||
|
environment:
|
||||||
|
- TEMPORAL_ADDRESS=temporal:7233
|
||||||
|
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
|
||||||
|
- CLAUDE_CODE_OAUTH_TOKEN=${CLAUDE_CODE_OAUTH_TOKEN:-}
|
||||||
|
- CLAUDE_CODE_MAX_OUTPUT_TOKENS=${CLAUDE_CODE_MAX_OUTPUT_TOKENS:-64000}
|
||||||
|
depends_on:
|
||||||
|
temporal:
|
||||||
|
condition: service_healthy
|
||||||
|
volumes:
|
||||||
|
- ./prompts:/app/prompts
|
||||||
|
- ./audit-logs:/app/audit-logs
|
||||||
|
- ${TARGET_REPO:-.}:/target-repo
|
||||||
|
- ${BENCHMARKS_BASE:-.}:/benchmarks
|
||||||
|
shm_size: 2gb
|
||||||
|
ipc: host
|
||||||
|
security_opt:
|
||||||
|
- seccomp:unconfined
|
||||||
|
|
||||||
|
volumes:
|
||||||
|
temporal-data:
|
||||||
+13
-9
@@ -11,22 +11,25 @@
|
|||||||
* for Shannon penetration testing agents.
|
* for Shannon penetration testing agents.
|
||||||
*
|
*
|
||||||
* Replaces bash script invocations with native tool access.
|
* Replaces bash script invocations with native tool access.
|
||||||
|
*
|
||||||
|
* Uses factory pattern to create tools with targetDir captured in closure,
|
||||||
|
* ensuring thread-safety when multiple workflows run in parallel.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
import { createSdkMcpServer } from '@anthropic-ai/claude-agent-sdk';
|
import { createSdkMcpServer } from '@anthropic-ai/claude-agent-sdk';
|
||||||
import { saveDeliverableTool } from './tools/save-deliverable.js';
|
import { createSaveDeliverableTool } from './tools/save-deliverable.js';
|
||||||
import { generateTotpTool } from './tools/generate-totp.js';
|
import { generateTotpTool } from './tools/generate-totp.js';
|
||||||
|
|
||||||
declare global {
|
|
||||||
var __SHANNON_TARGET_DIR: string | undefined;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Create Shannon Helper MCP Server with target directory context
|
* Create Shannon Helper MCP Server with target directory context
|
||||||
|
*
|
||||||
|
* Each workflow should create its own MCP server instance with its targetDir.
|
||||||
|
* The save_deliverable tool captures targetDir in a closure, preventing race
|
||||||
|
* conditions when multiple workflows run in parallel.
|
||||||
*/
|
*/
|
||||||
export function createShannonHelperServer(targetDir: string): ReturnType<typeof createSdkMcpServer> {
|
export function createShannonHelperServer(targetDir: string): ReturnType<typeof createSdkMcpServer> {
|
||||||
// Store target directory for tool access
|
// Create save_deliverable tool with targetDir in closure (no global variable)
|
||||||
global.__SHANNON_TARGET_DIR = targetDir;
|
const saveDeliverableTool = createSaveDeliverableTool(targetDir);
|
||||||
|
|
||||||
return createSdkMcpServer({
|
return createSdkMcpServer({
|
||||||
name: 'shannon-helper',
|
name: 'shannon-helper',
|
||||||
@@ -35,8 +38,9 @@ export function createShannonHelperServer(targetDir: string): ReturnType<typeof
|
|||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
// Export tools for direct usage if needed
|
// Export factory for direct usage if needed
|
||||||
export { saveDeliverableTool, generateTotpTool };
|
export { createSaveDeliverableTool } from './tools/save-deliverable.js';
|
||||||
|
export { generateTotpTool } from './tools/generate-totp.js';
|
||||||
|
|
||||||
// Export types for external use
|
// Export types for external use
|
||||||
export * from './types/index.js';
|
export * from './types/index.js';
|
||||||
|
|||||||
@@ -9,6 +9,9 @@
|
|||||||
*
|
*
|
||||||
* Saves deliverable files with automatic validation.
|
* Saves deliverable files with automatic validation.
|
||||||
* Replaces tools/save_deliverable.js bash script.
|
* Replaces tools/save_deliverable.js bash script.
|
||||||
|
*
|
||||||
|
* Uses factory pattern to capture targetDir in closure, avoiding race conditions
|
||||||
|
* when multiple workflows run in parallel.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
import { tool } from '@anthropic-ai/claude-agent-sdk';
|
import { tool } from '@anthropic-ai/claude-agent-sdk';
|
||||||
@@ -30,59 +33,69 @@ export const SaveDeliverableInputSchema = z.object({
|
|||||||
export type SaveDeliverableInput = z.infer<typeof SaveDeliverableInputSchema>;
|
export type SaveDeliverableInput = z.infer<typeof SaveDeliverableInputSchema>;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* save_deliverable tool implementation
|
* Create save_deliverable handler with targetDir captured in closure
|
||||||
|
*
|
||||||
|
* This factory pattern ensures each MCP server instance has its own targetDir,
|
||||||
|
* preventing race conditions when multiple workflows run in parallel.
|
||||||
*/
|
*/
|
||||||
export async function saveDeliverable(args: SaveDeliverableInput): Promise<ToolResult> {
|
function createSaveDeliverableHandler(targetDir: string) {
|
||||||
try {
|
return async function saveDeliverable(args: SaveDeliverableInput): Promise<ToolResult> {
|
||||||
const { deliverable_type, content } = args;
|
try {
|
||||||
|
const { deliverable_type, content } = args;
|
||||||
|
|
||||||
// Validate queue JSON if applicable
|
// Validate queue JSON if applicable
|
||||||
if (isQueueType(deliverable_type)) {
|
if (isQueueType(deliverable_type)) {
|
||||||
const queueValidation = validateQueueJson(content);
|
const queueValidation = validateQueueJson(content);
|
||||||
if (!queueValidation.valid) {
|
if (!queueValidation.valid) {
|
||||||
const errorResponse = createValidationError(
|
const errorResponse = createValidationError(
|
||||||
queueValidation.message ?? 'Invalid queue JSON',
|
queueValidation.message ?? 'Invalid queue JSON',
|
||||||
true,
|
true,
|
||||||
{
|
{
|
||||||
deliverableType: deliverable_type,
|
deliverableType: deliverable_type,
|
||||||
expectedFormat: '{"vulnerabilities": [...]}',
|
expectedFormat: '{"vulnerabilities": [...]}',
|
||||||
}
|
}
|
||||||
);
|
);
|
||||||
return createToolResult(errorResponse);
|
return createToolResult(errorResponse);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Get filename and save file (targetDir captured from closure)
|
||||||
|
const filename = DELIVERABLE_FILENAMES[deliverable_type];
|
||||||
|
const filepath = saveDeliverableFile(targetDir, filename, content);
|
||||||
|
|
||||||
|
// Success response
|
||||||
|
const successResponse: SaveDeliverableResponse = {
|
||||||
|
status: 'success',
|
||||||
|
message: `Deliverable saved successfully: ${filename}`,
|
||||||
|
filepath,
|
||||||
|
deliverableType: deliverable_type,
|
||||||
|
validated: isQueueType(deliverable_type),
|
||||||
|
};
|
||||||
|
|
||||||
|
return createToolResult(successResponse);
|
||||||
|
} catch (error) {
|
||||||
|
const errorResponse = createGenericError(
|
||||||
|
error,
|
||||||
|
false,
|
||||||
|
{ deliverableType: args.deliverable_type }
|
||||||
|
);
|
||||||
|
|
||||||
|
return createToolResult(errorResponse);
|
||||||
}
|
}
|
||||||
|
};
|
||||||
// Get filename and save file
|
|
||||||
const filename = DELIVERABLE_FILENAMES[deliverable_type];
|
|
||||||
const filepath = saveDeliverableFile(filename, content);
|
|
||||||
|
|
||||||
// Success response
|
|
||||||
const successResponse: SaveDeliverableResponse = {
|
|
||||||
status: 'success',
|
|
||||||
message: `Deliverable saved successfully: ${filename}`,
|
|
||||||
filepath,
|
|
||||||
deliverableType: deliverable_type,
|
|
||||||
validated: isQueueType(deliverable_type),
|
|
||||||
};
|
|
||||||
|
|
||||||
return createToolResult(successResponse);
|
|
||||||
} catch (error) {
|
|
||||||
const errorResponse = createGenericError(
|
|
||||||
error,
|
|
||||||
false,
|
|
||||||
{ deliverableType: args.deliverable_type }
|
|
||||||
);
|
|
||||||
|
|
||||||
return createToolResult(errorResponse);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Tool definition for MCP server - created using SDK's tool() function
|
* Factory function to create save_deliverable tool with targetDir in closure
|
||||||
|
*
|
||||||
|
* Each MCP server instance should call this with its own targetDir to ensure
|
||||||
|
* deliverables are saved to the correct workflow's directory.
|
||||||
*/
|
*/
|
||||||
export const saveDeliverableTool = tool(
|
export function createSaveDeliverableTool(targetDir: string) {
|
||||||
'save_deliverable',
|
return tool(
|
||||||
'Saves deliverable files with automatic validation. Queue files must have {"vulnerabilities": [...]} structure.',
|
'save_deliverable',
|
||||||
SaveDeliverableInputSchema.shape,
|
'Saves deliverable files with automatic validation. Queue files must have {"vulnerabilities": [...]} structure.',
|
||||||
saveDeliverable
|
SaveDeliverableInputSchema.shape,
|
||||||
);
|
createSaveDeliverableHandler(targetDir)
|
||||||
|
);
|
||||||
|
}
|
||||||
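The reason for moving from a global to a closure can be shown with a small, SDK-free sketch. The names here (`createGlobalResolver`, `createClosureResolver`, `PathResolver`) are illustrative stand-ins, not Shannon's code; the real change wraps the handler with the SDK's `tool()` and calls `saveDeliverableFile(targetDir, filename, content)`.

```typescript
type PathResolver = (filename: string) => string;

// Anti-pattern: one shared slot, overwritten by whichever workflow starts last.
let sharedTargetDir = '';
function createGlobalResolver(targetDir: string): PathResolver {
  sharedTargetDir = targetDir;
  return (filename: string): string => `${sharedTargetDir}/deliverables/${filename}`;
}

// Factory: each resolver captures its own targetDir in a closure.
function createClosureResolver(targetDir: string): PathResolver {
  return (filename: string): string => `${targetDir}/deliverables/${filename}`;
}

// Two "workflows" are set up before either one saves anything.
const globalA = createGlobalResolver('/repos/app-a');
const globalB = createGlobalResolver('/repos/app-b');
const closureA = createClosureResolver('/repos/app-a');
const closureB = createClosureResolver('/repos/app-b');

console.log(globalA('report.md'));  // resolves under /repos/app-b: wrong workflow
console.log(globalB('report.md'));
console.log(closureA('report.md')); // resolves under /repos/app-a
console.log(closureB('report.md'));
```

With the shared variable, workflow A's deliverable lands in workflow B's directory as soon as B has been initialized; the closure version keeps the two independent regardless of ordering.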
|
|||||||
@@ -14,16 +14,14 @@
|
|||||||
import { writeFileSync, mkdirSync } from 'fs';
|
import { writeFileSync, mkdirSync } from 'fs';
|
||||||
import { join } from 'path';
|
import { join } from 'path';
|
||||||
|
|
||||||
declare global {
|
|
||||||
var __SHANNON_TARGET_DIR: string | undefined;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Save deliverable file to deliverables/ directory
|
* Save deliverable file to deliverables/ directory
|
||||||
|
*
|
||||||
|
* @param targetDir - Target directory for deliverables (passed explicitly to avoid race conditions)
|
||||||
|
* @param filename - Name of the deliverable file
|
||||||
|
* @param content - File content to save
|
||||||
*/
|
*/
|
||||||
export function saveDeliverableFile(filename: string, content: string): string {
|
export function saveDeliverableFile(targetDir: string, filename: string, content: string): string {
|
||||||
// Use target directory from global context (set by createShannonHelperServer)
|
|
||||||
const targetDir = global.__SHANNON_TARGET_DIR || process.cwd();
|
|
||||||
const deliverablesDir = join(targetDir, 'deliverables');
|
const deliverablesDir = join(targetDir, 'deliverables');
|
||||||
const filepath = join(deliverablesDir, filename);
|
const filepath = join(deliverablesDir, filename);
|
||||||
|
|
||||||
|
|||||||
+9
-5
@@ -2,13 +2,20 @@
|
|||||||
"name": "shannon",
|
"name": "shannon",
|
||||||
"version": "1.0.0",
|
"version": "1.0.0",
|
||||||
"type": "module",
|
"type": "module",
|
||||||
"main": "./dist/shannon.js",
|
|
||||||
"scripts": {
|
"scripts": {
|
||||||
"build": "tsc",
|
"build": "tsc",
|
||||||
"start": "node ./dist/shannon.js"
|
"temporal:server": "docker compose -f docker/docker-compose.temporal.yml up temporal -d",
|
||||||
|
"temporal:server:stop": "docker compose -f docker/docker-compose.temporal.yml down",
|
||||||
|
"temporal:worker": "node dist/temporal/worker.js",
|
||||||
|
"temporal:start": "node dist/temporal/client.js",
|
||||||
|
"temporal:query": "node dist/temporal/query.js"
|
||||||
},
|
},
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
"@anthropic-ai/claude-agent-sdk": "^0.1.0",
|
"@anthropic-ai/claude-agent-sdk": "^0.1.0",
|
||||||
|
"@temporalio/activity": "^1.11.0",
|
||||||
|
"@temporalio/client": "^1.11.0",
|
||||||
|
"@temporalio/worker": "^1.11.0",
|
||||||
|
"@temporalio/workflow": "^1.11.0",
|
||||||
"ajv": "^8.12.0",
|
"ajv": "^8.12.0",
|
||||||
"ajv-formats": "^2.1.1",
|
"ajv-formats": "^2.1.1",
|
||||||
"boxen": "^8.0.1",
|
"boxen": "^8.0.1",
|
||||||
@@ -20,9 +27,6 @@
|
|||||||
"zod": "^3.22.4",
|
"zod": "^3.22.4",
|
||||||
"zx": "^8.0.0"
|
"zx": "^8.0.0"
|
||||||
},
|
},
|
||||||
"bin": {
|
|
||||||
"shannon": "./dist/shannon.js"
|
|
||||||
},
|
|
||||||
"devDependencies": {
|
"devDependencies": {
|
||||||
"@types/js-yaml": "^4.0.9",
|
"@types/js-yaml": "^4.0.9",
|
||||||
"@types/node": "^25.0.3",
|
"@types/node": "^25.0.3",
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are a world-class Authentication Exploitation Specialist. You are not an analyst; you are an active penetration tester. Your persona is methodical, persistent, and laser-focused on a single goal: proving the tangible impact of broken authentication. You take the theoretical flaws in identity and session management and turn them into undeniable proof of impersonation and account takeover.
|
You are a world-class Authentication Exploitation Specialist. You are not an analyst; you are an active penetration tester. Your persona is methodical, persistent, and laser-focused on a single goal: proving the tangible impact of broken authentication. You take the theoretical flaws in identity and session management and turn them into undeniable proof of impersonation and account takeover.
|
||||||
</role>
|
</role>
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are a world-class Authorization Exploitation Specialist. You are not an analyst; you are an active penetration tester. Your persona is methodical, persistent, and laser-focused on a single goal: proving the tangible impact of broken authorization. You take the theoretical flaws in access control mechanisms and turn them into undeniable proof of privilege escalation and unauthorized data access.
|
You are a world-class Authorization Exploitation Specialist. You are not an analyst; you are an active penetration tester. Your persona is methodical, persistent, and laser-focused on a single goal: proving the tangible impact of broken authorization. You take the theoretical flaws in access control mechanisms and turn them into undeniable proof of privilege escalation and unauthorized data access.
|
||||||
</role>
|
</role>
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are a world-class Injection Exploitation Specialist. Your expertise covers both SQL Injection (SQLi) and OS Command Injection. You are not an analyst; you are an active penetration tester. Your persona is methodical, persistent, and laser-focused on a single goal: proving the tangible impact of vulnerabilities. You take the theoretical findings from the analysis phase and turn them into undeniable proof of compromise.
|
You are a world-class Injection Exploitation Specialist. Your expertise covers both SQL Injection (SQLi) and OS Command Injection. You are not an analyst; you are an active penetration tester. Your persona is methodical, persistent, and laser-focused on a single goal: proving the tangible impact of vulnerabilities. You take the theoretical findings from the analysis phase and turn them into undeniable proof of compromise.
|
||||||
</role>
|
</role>
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are a world-class Server-Side Request Forgery (SSRF) Exploitation Specialist. You are not an analyst; you are an active penetration tester. Your persona is methodical, persistent, and laser-focused on a single goal: proving the tangible impact of server-side request forgery vulnerabilities. You take the theoretical findings from the analysis phase and turn them into undeniable proof of network boundary bypass and internal service access.
|
You are a world-class Server-Side Request Forgery (SSRF) Exploitation Specialist. You are not an analyst; you are an active penetration tester. Your persona is methodical, persistent, and laser-focused on a single goal: proving the tangible impact of server-side request forgery vulnerabilities. You take the theoretical findings from the analysis phase and turn them into undeniable proof of network boundary bypass and internal service access.
|
||||||
</role>
|
</role>
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are a world-class Cross-Site Scripting (XSS) Exploitation Specialist. You are not an analyst; you are an active penetration tester. Your persona is methodical, persistent, and laser-focused on a single goal: proving the tangible impact of client-side vulnerabilities. You take the theoretical findings from the analysis phase and turn them into undeniable proof of compromise by hijacking user sessions and performing unauthorized actions.
|
You are a world-class Cross-Site Scripting (XSS) Exploitation Specialist. You are not an analyst; you are an active penetration tester. Your persona is methodical, persistent, and laser-focused on a single goal: proving the tangible impact of client-side vulnerabilities. You take the theoretical findings from the analysis phase and turn them into undeniable proof of compromise by hijacking user sessions and performing unauthorized actions.
|
||||||
</role>
|
</role>
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
## 🧪 Pipeline Testing: MCP Isolation Test for Authentication Exploitation Agent
|
## 🧪 Pipeline Testing: MCP Isolation Test for Authentication Exploitation Agent
|
||||||
|
|
||||||
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
## 🧪 Pipeline Testing: MCP Isolation Test for Authorization Exploitation Agent
|
## 🧪 Pipeline Testing: MCP Isolation Test for Authorization Exploitation Agent
|
||||||
|
|
||||||
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
## 🧪 Pipeline Testing: MCP Isolation Test for Injection Exploitation Agent
|
## 🧪 Pipeline Testing: MCP Isolation Test for Injection Exploitation Agent
|
||||||
|
|
||||||
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
## 🧪 Pipeline Testing: MCP Isolation Test for SSRF Exploitation Agent
|
## 🧪 Pipeline Testing: MCP Isolation Test for SSRF Exploitation Agent
|
||||||
|
|
||||||
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
## 🧪 Pipeline Testing: MCP Isolation Test for XSS Exploitation Agent
|
## 🧪 Pipeline Testing: MCP Isolation Test for XSS Exploitation Agent
|
||||||
|
|
||||||
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
||||||
|
|||||||
@@ -1,5 +1 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
Use the save_deliverable MCP tool with `deliverable_type: "CODE_ANALYSIS"` and `content: "Pre-recon analysis complete"`. Then say "Done".
|
Use the save_deliverable MCP tool with `deliverable_type: "CODE_ANALYSIS"` and `content: "Pre-recon analysis complete"`. Then say "Done".
|
||||||
@@ -1,5 +1 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
Use the save_deliverable MCP tool with `deliverable_type: "RECON"` and `content: "Reconnaissance analysis complete"`. Then say "Done".
|
Use the save_deliverable MCP tool with `deliverable_type: "RECON"` and `content: "Reconnaissance analysis complete"`. Then say "Done".
|
||||||
@@ -1,35 +1 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
Read `deliverables/comprehensive_security_assessment_report.md`, prepend "# Security Assessment Report\n\n**Target:** {{WEB_URL}}\n\n" to the content, and save it back. Say "Done".
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
Create a file at `deliverables/comprehensive_security_assessment_report.md` with this content:
|
|
||||||
|
|
||||||
```markdown
|
|
||||||
# Injection Exploitation Evidence
|
|
||||||
|
|
||||||
## Successfully Exploited Vulnerabilities
|
|
||||||
|
|
||||||
### INJ-VULN-01: SQL Injection Test
|
|
||||||
|
|
||||||
**Summary:** Test SQL injection vulnerability for pipeline testing.
|
|
||||||
|
|
||||||
# Authentication Exploitation Evidence
|
|
||||||
|
|
||||||
## Successfully Exploited Vulnerabilities
|
|
||||||
|
|
||||||
### AUTH-VULN-01: Authentication Bypass Test
|
|
||||||
|
|
||||||
**Summary:** Test authentication vulnerability for pipeline testing.
|
|
||||||
```
|
|
||||||
|
|
||||||
Then read and modify the file by adding these sections at the top:
|
|
||||||
|
|
||||||
1. **Executive Summary** section with:
|
|
||||||
- Target: {{WEB_URL}}
|
|
||||||
- Assessment Date: (current date)
|
|
||||||
- Scope: Authentication, XSS, SQL and Command Injection, SSRF, Authorization testing
|
|
||||||
- Brief summary of findings (2-3 vulnerabilities found for testing)
|
|
||||||
|
|
||||||
2. **Network Reconnaissance** section with test data for open ports and services
|
|
||||||
|
|
||||||
Save the modified content back to `deliverables/comprehensive_security_assessment_report.md`, then say "Done".
|
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
## 🧪 Pipeline Testing: MCP Isolation Test for Auth Agent
|
## 🧪 Pipeline Testing: MCP Isolation Test for Auth Agent
|
||||||
|
|
||||||
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
## 🧪 Pipeline Testing: MCP Isolation Test for Authorization Agent
|
## 🧪 Pipeline Testing: MCP Isolation Test for Authorization Agent
|
||||||
|
|
||||||
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
## 🧪 Pipeline Testing: MCP Isolation Test for Injection Agent
|
## 🧪 Pipeline Testing: MCP Isolation Test for Injection Agent
|
||||||
|
|
||||||
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
## 🧪 Pipeline Testing: MCP Isolation Test for SSRF Agent
|
## 🧪 Pipeline Testing: MCP Isolation Test for SSRF Agent
|
||||||
|
|
||||||
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
## 🧪 Pipeline Testing: MCP Isolation Test for XSS Agent
|
## 🧪 Pipeline Testing: MCP Isolation Test for XSS Agent
|
||||||
|
|
||||||
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
**MCP Server Assignment:** Using `{{MCP_SERVER}}` for browser automation testing.
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
Role: You are a Principal Engineer specializing in rapid, security-focused code review. You are an expert at analyzing unfamiliar codebases and extracting the essential information a penetration testing team needs to begin their assessment.
|
Role: You are a Principal Engineer specializing in rapid, security-focused code review. You are an expert at analyzing unfamiliar codebases and extracting the essential information a penetration testing team needs to begin their assessment.
|
||||||
|
|
||||||
Objective: Your task is to analyze the provided source code to generate a security-relevant architectural summary AND a list of the most critical files for manual review. The output should focus exclusively on information that helps identify potential attack surfaces and security weaknesses.
|
Objective: Your task is to analyze the provided source code to generate a security-relevant architectural summary AND a list of the most critical files for manual review. The output should focus exclusively on information that helps identify potential attack surfaces and security weaknesses.
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are a world-class Reconnaissance Analyst and Attack Surface Mapper. Your mission is to conduct a thorough white-box analysis of the target application by correlating external scan data, live application behavior, and the full source code.
|
You are a world-class Reconnaissance Analyst and Attack Surface Mapper. Your mission is to conduct a thorough white-box analysis of the target application by correlating external scan data, live application behavior, and the full source code.
|
||||||
</role>
|
</role>
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are an Executive Summary Writer and Report Cleaner for security assessments. Your job is to:
|
You are an Executive Summary Writer and Report Cleaner for security assessments. Your job is to:
|
||||||
1. MODIFY the existing concatenated report by adding an executive summary at the top
|
1. MODIFY the existing concatenated report by adding an executive summary at the top
|
||||||
|
|||||||
@@ -1,5 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
**MANDATORY EXPLOITATION CONSTRAINTS:**
|
**MANDATORY EXPLOITATION CONSTRAINTS:**
|
||||||
- ALL exploitation attempts MUST be performed against {{WEB_URL}} from external network
|
- ALL exploitation attempts MUST be performed against {{WEB_URL}} from external network
|
||||||
- DO NOT attempt to exploit vulnerabilities that require:
|
- DO NOT attempt to exploit vulnerabilities that require:
|
||||||
|
|||||||
@@ -1,4 +1,2 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
Rules to Avoid:
|
Rules to Avoid:
|
||||||
{{RULES_AVOID}}
|
{{RULES_AVOID}}
|
||||||
@@ -1,3 +1 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
URL: {{WEB_URL}}
|
URL: {{WEB_URL}}
|
||||||
@@ -1,3 +1 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
**EXTERNAL ATTACKER SCOPE:** Only report vulnerabilities exploitable via {{WEB_URL}} from the internet. Exclude findings requiring internal network access, VPN, or direct server access.
|
**EXTERNAL ATTACKER SCOPE:** Only report vulnerabilities exploitable via {{WEB_URL}} from the internet. Exclude findings requiring internal network access, VPN, or direct server access.
|
||||||
@@ -1,5 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
<!-- BEGIN:COMMON -->
|
<!-- BEGIN:COMMON -->
|
||||||
**IF you are not already logged in when you open playwright, these are the login instructions**
|
**IF you are not already logged in when you open playwright, these are the login instructions**
|
||||||
**NOTE: playwright automatically saves sessions so ensure you are not already logged in before executing these steps**
|
**NOTE: playwright automatically saves sessions so ensure you are not already logged in before executing these steps**
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are an Authentication Analysis Specialist, a master of white-box code auditing. Your expertise lies in dissecting an application's authentication mechanisms to find logical flaws in identity verification and session management systems.
|
You are an Authentication Analysis Specialist, a master of white-box code auditing. Your expertise lies in dissecting an application's authentication mechanisms to find logical flaws in identity verification and session management systems.
|
||||||
</role>
|
</role>
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are an Authorization Analysis Specialist, a master of white-box code auditing. Your expertise lies in dissecting an application's authorization mechanisms to find logical flaws in access control and privilege escalation systems.
|
You are an Authorization Analysis Specialist, a master of white-box code auditing. Your expertise lies in dissecting an application's authorization mechanisms to find logical flaws in access control and privilege escalation systems.
|
||||||
</role>
|
</role>
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are an Injection Analysis Specialist, an expert in **white-box code analysis and data flow tracing** for SQLi, Command Injection, LFI/RFI, SSTI, Path Traversal, and Deserialization vulnerabilities.
|
You are an Injection Analysis Specialist, an expert in **white-box code analysis and data flow tracing** for SQLi, Command Injection, LFI/RFI, SSTI, Path Traversal, and Deserialization vulnerabilities.
|
||||||
Your primary function is to analyze how untrusted user input travels to security-sensitive sinks: database queries, shell commands, file operations, template engines, and deserialization functions.
|
Your primary function is to analyze how untrusted user input travels to security-sensitive sinks: database queries, shell commands, file operations, template engines, and deserialization functions.
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are a Server-Side Request Forgery (SSRF) Analysis Specialist, an expert in white-box code analysis and data flow tracing for server-side request vulnerabilities. Your expertise lies in identifying how applications make outbound HTTP requests and whether these requests can be influenced by untrusted user input.
|
You are a Server-Side Request Forgery (SSRF) Analysis Specialist, an expert in white-box code analysis and data flow tracing for server-side request vulnerabilities. Your expertise lies in identifying how applications make outbound HTTP requests and whether these requests can be influenced by untrusted user input.
|
||||||
</role>
|
</role>
|
||||||
|
|||||||
@@ -1,7 +1,3 @@
|
|||||||
# This Source Code Form is subject to the terms of the AGPL, v. 3.0
|
|
||||||
# This section above is metadata and not part of the prompt.
|
|
||||||
=== PROMPT ===
|
|
||||||
|
|
||||||
<role>
|
<role>
|
||||||
You are a Cross-Site Scripting (XSS) Analysis Specialist focused **solely on vulnerability analysis** (no exploitation). You specialize in **negative, taint-first analysis** of how untrusted inputs (sources) propagate to output **sinks** and whether defenses match the **final render context**. You follow the Injection specialist and precede Exploitation.
|
You are a Cross-Site Scripting (XSS) Analysis Specialist focused **solely on vulnerability analysis** (no exploitation). You specialize in **negative, taint-first analysis** of how untrusted inputs (sources) propagate to output **sinks** and whether defenses match the **final render context**. You follow the Injection specialist and precede Exploitation.
|
||||||
</role>
|
</role>
|
||||||
|
|||||||
@@ -1,181 +0,0 @@
|
|||||||
// Copyright (C) 2025 Keygraph, Inc.
|
|
||||||
//
|
|
||||||
// This program is free software: you can redistribute it and/or modify
|
|
||||||
// it under the terms of the GNU Affero General Public License version 3
|
|
||||||
// as published by the Free Software Foundation.
|
|
||||||
|
|
||||||
#!/usr/bin/env node
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Export Metrics to CSV
|
|
||||||
*
|
|
||||||
* Converts session.json from audit-logs into CSV format for spreadsheet analysis.
|
|
||||||
*
|
|
||||||
* DATA SOURCE:
|
|
||||||
* - Reads from: audit-logs/{hostname}_{sessionId}/session.json
|
|
||||||
* - Source of truth for all metrics, timing, and cost data
|
|
||||||
* - Automatically created by Shannon during agent execution
|
|
||||||
*
|
|
||||||
* CSV OUTPUT:
|
|
||||||
* - One row per agent with: agent, phase, status, attempts, duration_ms, cost_usd
|
|
||||||
* - Perfect for importing into Excel/Google Sheets for analysis
|
|
||||||
*
|
|
||||||
* USE CASES:
|
|
||||||
* - Compare performance across multiple sessions
|
|
||||||
* - Track costs and optimize budget
|
|
||||||
* - Identify slow agents for optimization
|
|
||||||
* - Generate charts and visualizations
|
|
||||||
* - Export data for external reporting tools
|
|
||||||
*
|
|
||||||
* EXAMPLES:
|
|
||||||
* ```bash
|
|
||||||
* # Export to stdout
|
|
||||||
* ./scripts/export-metrics.js --session-id abc123
|
|
||||||
*
|
|
||||||
* # Export to file
|
|
||||||
* ./scripts/export-metrics.js --session-id abc123 --output metrics.csv
|
|
||||||
*
|
|
||||||
* # Find session ID from Shannon store
|
|
||||||
* cat .shannon-store.json | jq '.sessions | keys'
|
|
||||||
* ```
|
|
||||||
*
|
|
||||||
* NOTE: For raw metrics, just read audit-logs/.../session.json directly.
|
|
||||||
* This script only exists to provide a spreadsheet-friendly CSV format.
|
|
||||||
*/
|
|
||||||
|
|
||||||
import chalk from 'chalk';
|
|
||||||
import { fs, path } from 'zx';
|
|
||||||
import { getSession } from '../src/session-manager.js';
|
|
||||||
import { AuditSession } from '../src/audit/index.js';
|
|
||||||
|
|
||||||
// Parse command-line arguments
|
|
||||||
function parseArgs() {
|
|
||||||
const args = {
|
|
||||||
sessionId: null,
|
|
||||||
output: null
|
|
||||||
};
|
|
||||||
|
|
||||||
for (let i = 2; i < process.argv.length; i++) {
|
|
||||||
const arg = process.argv[i];
|
|
||||||
|
|
||||||
if (arg === '--session-id' && process.argv[i + 1]) {
|
|
||||||
args.sessionId = process.argv[i + 1];
|
|
||||||
i++;
|
|
||||||
} else if (arg === '--output' && process.argv[i + 1]) {
|
|
||||||
args.output = process.argv[i + 1];
|
|
||||||
i++;
|
|
||||||
} else if (arg === '--help' || arg === '-h') {
|
|
||||||
printUsage();
|
|
||||||
process.exit(0);
|
|
||||||
} else {
|
|
||||||
console.log(chalk.red(`❌ Unknown argument: ${arg}`));
|
|
||||||
printUsage();
|
|
||||||
process.exit(1);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
return args;
|
|
||||||
}
|
|
||||||
|
|
||||||
function printUsage() {
|
|
||||||
console.log(chalk.cyan('\n📊 Export Metrics to CSV'));
|
|
||||||
console.log(chalk.gray('\nUsage: ./scripts/export-metrics.js [options]\n'));
|
|
||||||
console.log(chalk.white('Options:'));
|
|
||||||
console.log(chalk.gray(' --session-id <id> Session ID to export (required)'));
|
|
||||||
console.log(chalk.gray(' --output <file> Output CSV file path (default: stdout)'));
|
|
||||||
console.log(chalk.gray(' --help, -h Show this help\n'));
|
|
||||||
console.log(chalk.white('Examples:'));
|
|
||||||
console.log(chalk.gray(' # Export to stdout'));
|
|
||||||
console.log(chalk.gray(' ./scripts/export-metrics.js --session-id abc123\n'));
|
|
||||||
console.log(chalk.gray(' # Export to file'));
|
|
||||||
console.log(chalk.gray(' ./scripts/export-metrics.js --session-id abc123 --output metrics.csv\n'));
|
|
||||||
}
|
|
||||||
|
|
||||||
// Export metrics for a session
|
|
||||||
async function exportMetrics(sessionId) {
|
|
||||||
const session = await getSession(sessionId);
|
|
||||||
if (!session) {
|
|
||||||
throw new Error(`Session ${sessionId} not found`);
|
|
||||||
}
|
|
||||||
|
|
||||||
const auditSession = new AuditSession(session);
|
|
||||||
await auditSession.initialize();
|
|
||||||
const metrics = await auditSession.getMetrics();
|
|
||||||
|
|
||||||
return exportAsCSV(session, metrics);
|
|
||||||
}
|
|
||||||
|
|
||||||
// Export as CSV
|
|
||||||
function exportAsCSV(session, metrics) {
|
|
||||||
const lines = [];
|
|
||||||
|
|
||||||
// Header
|
|
||||||
lines.push('agent,phase,status,attempts,duration_ms,cost_usd');
|
|
||||||
|
|
||||||
// Phase mapping
|
|
||||||
const phaseMap = {
|
|
||||||
'pre-recon': 'pre-recon',
|
|
||||||
'recon': 'recon',
|
|
||||||
'injection-vuln': 'vulnerability-analysis',
|
|
||||||
'xss-vuln': 'vulnerability-analysis',
|
|
||||||
'auth-vuln': 'vulnerability-analysis',
|
|
||||||
'authz-vuln': 'vulnerability-analysis',
|
|
||||||
'ssrf-vuln': 'vulnerability-analysis',
|
|
||||||
'injection-exploit': 'exploitation',
|
|
||||||
'xss-exploit': 'exploitation',
|
|
||||||
'auth-exploit': 'exploitation',
|
|
||||||
'authz-exploit': 'exploitation',
|
|
||||||
'ssrf-exploit': 'exploitation',
|
|
||||||
'report': 'reporting'
|
|
||||||
};
|
|
||||||
|
|
||||||
// Agent rows
|
|
||||||
for (const [agentName, agentData] of Object.entries(metrics.metrics.agents)) {
|
|
||||||
const phase = phaseMap[agentName] || 'unknown';
|
|
||||||
|
|
||||||
lines.push([
|
|
||||||
agentName,
|
|
||||||
phase,
|
|
||||||
agentData.status,
|
|
||||||
agentData.attempts.length,
|
|
||||||
agentData.final_duration_ms,
|
|
||||||
agentData.total_cost_usd.toFixed(4)
|
|
||||||
].join(','));
|
|
||||||
}
|
|
||||||
|
|
||||||
return lines.join('\n');
|
|
||||||
}
|
|
||||||
|
|
||||||
// Main execution
|
|
||||||
async function main() {
|
|
||||||
const args = parseArgs();
|
|
||||||
|
|
||||||
if (!args.sessionId) {
|
|
||||||
console.log(chalk.red('❌ Must specify --session-id'));
|
|
||||||
printUsage();
|
|
||||||
process.exit(1);
|
|
||||||
}
|
|
||||||
|
|
||||||
console.log(chalk.cyan.bold('\n📊 Exporting Metrics to CSV\n'));
|
|
||||||
console.log(chalk.gray(`Session ID: ${args.sessionId}\n`));
|
|
||||||
|
|
||||||
const output = await exportMetrics(args.sessionId);
|
|
||||||
|
|
||||||
if (args.output) {
|
|
||||||
await fs.writeFile(args.output, output);
|
|
||||||
console.log(chalk.green(`✅ Exported to: ${args.output}`));
|
|
||||||
} else {
|
|
||||||
console.log(chalk.cyan('CSV Output:\n'));
|
|
||||||
console.log(output);
|
|
||||||
}
|
|
||||||
|
|
||||||
console.log();
|
|
||||||
}
|
|
||||||
|
|
||||||
main().catch(error => {
|
|
||||||
console.log(chalk.red.bold(`\n🚨 Fatal error: ${error.message}`));
|
|
||||||
if (process.env.DEBUG) {
|
|
||||||
console.log(chalk.gray(error.stack));
|
|
||||||
}
|
|
||||||
process.exit(1);
|
|
||||||
});
|
|
||||||
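The removed script built one CSV row per agent by joining six fields with commas. A typed sketch of that mapping is given below for reference. The `AgentMetrics` interface is inferred only from the fields the removed code reads, and the `csvField` quoting helper is an addition that the original did not have; both are assumptions rather than part of the repository.

```typescript
// Hypothetical shape, inferred from the properties the removed script accessed.
interface AgentMetrics {
  status: string;
  attempts: unknown[];
  final_duration_ms: number;
  total_cost_usd: number;
}

// Added for illustration: quote a field only when it contains a comma,
// a double quote, or a newline, doubling any embedded quotes.
function csvField(value: string | number): string {
  const text = String(value);
  return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Same header and column order as the removed exportAsCSV function.
function agentsToCsv(
  agents: Record<string, AgentMetrics>,
  phaseMap: Record<string, string>,
): string {
  const lines = ['agent,phase,status,attempts,duration_ms,cost_usd'];
  for (const [name, data] of Object.entries(agents)) {
    lines.push(
      [
        name,
        phaseMap[name] ?? 'unknown',
        data.status,
        data.attempts.length,
        data.final_duration_ms,
        data.total_cost_usd.toFixed(4),
      ]
        .map(csvField)
        .join(','),
    );
  }
  return lines.join('\n');
}

const csv = agentsToCsv(
  { recon: { status: 'success', attempts: [{}], final_duration_ms: 1200, total_cost_usd: 0.5 } },
  { recon: 'recon' },
);
```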
@@ -0,0 +1,213 @@
+#!/bin/bash
+# Shannon CLI - AI Penetration Testing Framework
+
+set -e
+
+COMPOSE_FILE="docker-compose.yml"
+
+# Load .env if present
+if [ -f .env ]; then
+  set -a
+  source .env
+  set +a
+fi
+
+show_help() {
+  cat << 'EOF'
+Shannon - AI Penetration Testing Framework
+
+Usage:
+  ./shannon start URL=<url> REPO=<path>   Start a pentest workflow
+  ./shannon logs ID=<workflow-id>         Tail logs for a specific workflow
+  ./shannon query ID=<workflow-id>        Query workflow progress
+  ./shannon stop                          Stop all containers
+  ./shannon help                          Show this help message
+
+Options for 'start':
+  CONFIG=<path>          Configuration file (YAML)
+  OUTPUT=<path>          Output directory for reports
+  PIPELINE_TESTING=true  Use minimal prompts for fast testing
+
+Options for 'stop':
+  CLEAN=true             Remove all data including volumes
+
+Examples:
+  ./shannon start URL=https://example.com REPO=/path/to/repo
+  ./shannon start URL=https://example.com REPO=/path/to/repo CONFIG=./config.yaml
+  ./shannon logs ID=example.com_shannon-1234567890
+  ./shannon query ID=shannon-1234567890
+  ./shannon stop CLEAN=true
+
+Monitor workflows at http://localhost:8233
+EOF
+}
+
+# Parse KEY=value arguments into variables
+parse_args() {
+  for arg in "$@"; do
+    case "$arg" in
+      URL=*)              URL="${arg#URL=}" ;;
+      REPO=*)             REPO="${arg#REPO=}" ;;
+      CONFIG=*)           CONFIG="${arg#CONFIG=}" ;;
+      OUTPUT=*)           OUTPUT="${arg#OUTPUT=}" ;;
+      ID=*)               ID="${arg#ID=}" ;;
+      CLEAN=*)            CLEAN="${arg#CLEAN=}" ;;
+      PIPELINE_TESTING=*) PIPELINE_TESTING="${arg#PIPELINE_TESTING=}" ;;
+      REBUILD=*)          REBUILD="${arg#REBUILD=}" ;;
+    esac
+  done
+}
+
+# Check if Temporal is running and healthy
+is_temporal_ready() {
+  docker compose -f "$COMPOSE_FILE" exec -T temporal \
+    temporal operator cluster health --address localhost:7233 2>/dev/null | grep -q "SERVING"
+}
+
+# Ensure containers are running
+ensure_containers() {
+  # Quick check: if Temporal is already healthy, we're good
+  if is_temporal_ready; then
+    return 0
+  fi
+
+  # Need to start containers
+  echo "Starting Shannon containers..."
+  if [ "$REBUILD" = "true" ]; then
+    # Force rebuild without cache (use when code changes aren't being picked up)
+    echo "Rebuilding with --no-cache..."
+    docker compose -f "$COMPOSE_FILE" build --no-cache worker
+  fi
+  docker compose -f "$COMPOSE_FILE" up -d --build
+
+  # Wait for Temporal to be ready
+  echo "Waiting for Temporal to be ready..."
+  for i in $(seq 1 30); do
+    if is_temporal_ready; then
+      echo "Temporal is ready!"
+      return 0
+    fi
+    if [ "$i" -eq 30 ]; then
+      echo "Timeout waiting for Temporal"
+      exit 1
+    fi
+    sleep 2
+  done
+}
+
+cmd_start() {
+  parse_args "$@"
+
+  # Validate required vars
+  if [ -z "$URL" ] || [ -z "$REPO" ]; then
+    echo "ERROR: URL and REPO are required"
+    echo "Usage: ./shannon start URL=<url> REPO=<path>"
+    exit 1
+  fi
+
+  # Check for API key
+  if [ -z "$ANTHROPIC_API_KEY" ] && [ -z "$CLAUDE_CODE_OAUTH_TOKEN" ]; then
+    echo "ERROR: Set ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN in .env"
+    exit 1
+  fi
+
+  # Determine container path for REPO
+  # - If REPO is already a container path (/benchmarks/*, /target-repo), use as-is
+  # - Otherwise, it's a host path - mount to /target-repo and use that
+  case "$REPO" in
+    /benchmarks/*|/target-repo|/target-repo/*)
+      CONTAINER_REPO="$REPO"
+      ;;
+    *)
+      # Host path - export for docker-compose mount
+      export TARGET_REPO="$REPO"
+      CONTAINER_REPO="/target-repo"
+      ;;
+  esac
+
+  # Ensure containers are running (starts them if needed)
+  ensure_containers
+
+  # Build optional args
+  ARGS=""
+  [ -n "$CONFIG" ] && ARGS="$ARGS --config $CONFIG"
+  [ -n "$OUTPUT" ] && ARGS="$ARGS --output $OUTPUT"
+  [ "$PIPELINE_TESTING" = "true" ] && ARGS="$ARGS --pipeline-testing"
+
+  # Run the client to submit workflow
+  docker compose -f "$COMPOSE_FILE" exec -T worker \
+    node dist/temporal/client.js "$URL" "$CONTAINER_REPO" $ARGS
+}
+
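The `case "$REPO"` block in `cmd_start` decides whether REPO is already a path inside the container or a host path that must be mounted at `/target-repo`. The same rule can be written as a small pure function, shown here in TypeScript to match the rest of the codebase. This is only a sketch: `resolveRepo` and `RepoMount` are hypothetical names and are not part of the PR.

```typescript
// Hypothetical result type: where the repo lives in the container, and
// (for host paths only) which host directory has to be mounted there.
interface RepoMount {
  containerRepo: string;
  targetRepo?: string;
}

// Mirrors the shell patterns /benchmarks/*, /target-repo and /target-repo/*.
function resolveRepo(repo: string): RepoMount {
  const isContainerPath =
    repo.startsWith('/benchmarks/') ||
    repo === '/target-repo' ||
    repo.startsWith('/target-repo/');

  if (isContainerPath) {
    // Already a container path: use it unchanged, nothing to mount.
    return { containerRepo: repo };
  }
  // Host path: mount it and point the worker at the fixed mount point.
  return { containerRepo: '/target-repo', targetRepo: repo };
}
```

Note that a sibling directory such as `/target-repo-old` does not match any of the three patterns, so it is treated as a host path in both the shell script and this sketch.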
+cmd_logs() {
+  parse_args "$@"
+
+  if [ -z "$ID" ]; then
+    echo "ERROR: ID is required"
+    echo "Usage: ./shannon logs ID=<workflow-id>"
+    exit 1
+  fi
+
+  WORKFLOW_LOG="./audit-logs/${ID}/workflow.log"
+
+  if [ -f "$WORKFLOW_LOG" ]; then
+    echo "Tailing workflow log: $WORKFLOW_LOG"
+    tail -f "$WORKFLOW_LOG"
+  else
+    echo "ERROR: Workflow log not found: $WORKFLOW_LOG"
+    echo ""
+    echo "Possible causes:"
+    echo " - Workflow hasn't started yet"
+    echo " - Workflow ID is incorrect"
+    echo " - Workflow is using a custom OUTPUT path"
+    echo ""
+    echo "Check: ./shannon query ID=$ID for workflow details"
+    exit 1
+  fi
+}
+
+cmd_query() {
+  parse_args "$@"
+
+  if [ -z "$ID" ]; then
+    echo "ERROR: ID is required"
+    echo "Usage: ./shannon query ID=<workflow-id>"
+    exit 1
+  fi
+
+  docker compose -f "$COMPOSE_FILE" exec -T worker \
+    node dist/temporal/query.js "$ID"
+}
+
+cmd_stop() {
+  parse_args "$@"
+
+  if [ "$CLEAN" = "true" ]; then
+    docker compose -f "$COMPOSE_FILE" down -v
+  else
+    docker compose -f "$COMPOSE_FILE" down
+  fi
+}
+
+# Main command dispatch
+case "${1:-help}" in
+  start)
+    shift
+    cmd_start "$@"
+    ;;
+  logs)
+    shift
+    cmd_logs "$@"
+    ;;
+  query)
+    shift
+    cmd_query "$@"
+    ;;
+  stop)
+    shift
+    cmd_stop "$@"
+    ;;
+  help|--help|-h|*)
+    show_help
+    ;;
+esac
@@ -0,0 +1,79 @@
+// Copyright (C) 2025 Keygraph, Inc.
+//
+// This program is free software: you can redistribute it and/or modify
+// it under the terms of the GNU Affero General Public License version 3
+// as published by the Free Software Foundation.
+
+// Null Object pattern for audit logging - callers never check for null
+
+import type { AuditSession } from '../audit/index.js';
+import { formatTimestamp } from '../utils/formatting.js';
+
+export interface AuditLogger {
+  logLlmResponse(turn: number, content: string): Promise<void>;
+  logToolStart(toolName: string, parameters: unknown): Promise<void>;
+  logToolEnd(result: unknown): Promise<void>;
+  logError(error: Error, duration: number, turns: number): Promise<void>;
+}
+
+class RealAuditLogger implements AuditLogger {
+  private auditSession: AuditSession;
+
+  constructor(auditSession: AuditSession) {
+    this.auditSession = auditSession;
+  }
+
+  async logLlmResponse(turn: number, content: string): Promise<void> {
+    await this.auditSession.logEvent('llm_response', {
+      turn,
+      content,
+      timestamp: formatTimestamp(),
+    });
+  }
+
+  async logToolStart(toolName: string, parameters: unknown): Promise<void> {
+    await this.auditSession.logEvent('tool_start', {
+      toolName,
+      parameters,
+      timestamp: formatTimestamp(),
+    });
+  }
+
+  async logToolEnd(result: unknown): Promise<void> {
+    await this.auditSession.logEvent('tool_end', {
+      result,
+      timestamp: formatTimestamp(),
+    });
+  }
+
+  async logError(error: Error, duration: number, turns: number): Promise<void> {
+    await this.auditSession.logEvent('error', {
+      message: error.message,
+      errorType: error.constructor.name,
+      stack: error.stack,
+      duration,
+      turns,
+      timestamp: formatTimestamp(),
+    });
+  }
+}
+
+/** Null Object implementation - all methods are safe no-ops */
+class NullAuditLogger implements AuditLogger {
+  async logLlmResponse(_turn: number, _content: string): Promise<void> {}
+
+  async logToolStart(_toolName: string, _parameters: unknown): Promise<void> {}
+
+  async logToolEnd(_result: unknown): Promise<void> {}
+
+  async logError(_error: Error, _duration: number, _turns: number): Promise<void> {}
+}
+
+// Returns no-op when auditSession is null
+export function createAuditLogger(auditSession: AuditSession | null): AuditLogger {
+  if (auditSession) {
+    return new RealAuditLogger(auditSession);
+  }
+
+  return new NullAuditLogger();
+}
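The point of the Null Object pattern in the file above is that call sites stay the same whether or not an audit session exists. The sketch below reproduces the idea in a self-contained form so it can run without the real `AuditSession`. `EventSink`, `MemorySink`, `TurnLogger`, `RecordingLogger`, `NoopLogger`, `createLogger`, and `handleTurn` are all hypothetical stand-ins and only one logging method is modeled.

```typescript
// Hypothetical stand-in for AuditSession: only logEvent is modeled.
interface EventSink {
  logEvent(type: string, data: Record<string, unknown>): Promise<void>;
}

// Reduced version of the AuditLogger interface, with a single method.
interface TurnLogger {
  logLlmResponse(turn: number, content: string): Promise<void>;
}

class RecordingLogger implements TurnLogger {
  private sink: EventSink;

  constructor(sink: EventSink) {
    this.sink = sink;
  }

  async logLlmResponse(turn: number, content: string): Promise<void> {
    await this.sink.logEvent('llm_response', { turn, content });
  }
}

// Null Object: same interface, every method is a safe no-op.
class NoopLogger implements TurnLogger {
  async logLlmResponse(_turn: number, _content: string): Promise<void> {}
}

// Factory: the only place that knows whether a sink exists.
function createLogger(sink: EventSink | null): TurnLogger {
  return sink ? new RecordingLogger(sink) : new NoopLogger();
}

// In-memory sink so the example has something observable.
class MemorySink implements EventSink {
  readonly events: Array<{ type: string; data: Record<string, unknown> }> = [];

  async logEvent(type: string, data: Record<string, unknown>): Promise<void> {
    this.events.push({ type, data });
  }
}

// Call site: no `if (auditSession)` guard is needed for either logger.
async function handleTurn(logger: TurnLogger, turn: number, content: string): Promise<void> {
  await logger.logLlmResponse(turn, content);
}
```

Calling `handleTurn(createLogger(null), 1, 'text')` is valid and records nothing, which is the behavior the removed `if (auditSession) { ... }` guards used to provide inline.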
+262
-484
@@ -4,35 +4,33 @@
 // it under the terms of the GNU Affero General Public License version 3
 // as published by the Free Software Foundation.
 
-import { $, fs, path } from 'zx';
+// Production Claude agent execution with retry, git checkpoints, and audit logging
+
+import { fs, path } from 'zx';
 import chalk, { type ChalkInstance } from 'chalk';
 import { query } from '@anthropic-ai/claude-agent-sdk';
-import { fileURLToPath } from 'url';
-import { dirname } from 'path';
 
 import { isRetryableError, getRetryDelay, PentestError } from '../error-handling.js';
-import { ProgressIndicator } from '../progress-indicator.js';
-import { timingResults, costResults, Timer } from '../utils/metrics.js';
-import { formatDuration } from '../audit/utils.js';
-import { createGitCheckpoint, commitGitSuccess, rollbackGitWorkspace } from '../utils/git-manager.js';
+import { timingResults, Timer } from '../utils/metrics.js';
+import { formatTimestamp } from '../utils/formatting.js';
+import { createGitCheckpoint, commitGitSuccess, rollbackGitWorkspace, getGitCommitHash } from '../utils/git-manager.js';
 import { AGENT_VALIDATORS, MCP_AGENT_MAPPING } from '../constants.js';
-import { filterJsonToolCalls, getAgentPrefix } from '../utils/output-formatter.js';
-import { generateSessionLogPath } from '../session-manager.js';
 import { AuditSession } from '../audit/index.js';
 import { createShannonHelperServer } from '../../mcp-server/dist/index.js';
 import type { SessionMetadata } from '../audit/utils.js';
-import type { PromptName } from '../types/index.js';
+import { getPromptNameForAgent } from '../types/agents.js';
+import type { AgentName } from '../types/index.js';
+
+import { dispatchMessage } from './message-handlers.js';
+import { detectExecutionContext, formatErrorOutput, formatCompletionMessage } from './output-formatters.js';
+import { createProgressManager } from './progress-manager.js';
+import { createAuditLogger } from './audit-logger.js';
 
-// Extend global for loader flag
 declare global {
   var SHANNON_DISABLE_LOADER: boolean | undefined;
 }
 
-const __filename = fileURLToPath(import.meta.url);
-const __dirname = dirname(__filename);
-
-// Result types
-interface ClaudePromptResult {
+export interface ClaudePromptResult {
   result?: string | null;
   success: boolean;
   duration: number;
@@ -40,14 +38,12 @@ interface ClaudePromptResult {
   cost: number;
   partialCost?: number;
   apiErrorDetected?: boolean;
-  logFile?: string;
   error?: string;
   errorType?: string;
   prompt?: string;
   retryable?: boolean;
 }
 
-// MCP Server types
 interface StdioMcpServer {
   type: 'stdio';
   command: string;
@@ -57,157 +53,29 @@ interface StdioMcpServer {
 
 type McpServer = ReturnType<typeof createShannonHelperServer> | StdioMcpServer;
 
-/**
- * Convert agent name to prompt name for MCP_AGENT_MAPPING lookup
- */
-function agentNameToPromptName(agentName: string): PromptName {
-  // Special cases
-  if (agentName === 'pre-recon') return 'pre-recon-code';
-  if (agentName === 'report') return 'report-executive';
-  if (agentName === 'recon') return 'recon';
-
-  // Pattern: {type}-vuln → vuln-{type}
-  const vulnMatch = agentName.match(/^(.+)-vuln$/);
-  if (vulnMatch) {
-    return `vuln-${vulnMatch[1]}` as PromptName;
-  }
-
-  // Pattern: {type}-exploit → exploit-{type}
-  const exploitMatch = agentName.match(/^(.+)-exploit$/);
-  if (exploitMatch) {
-    return `exploit-${exploitMatch[1]}` as PromptName;
-  }
-
-  // Default: return as-is
-  return agentName as PromptName;
-}
-
-// Simplified validation using direct agent name mapping
-async function validateAgentOutput(
-  result: ClaudePromptResult,
-  agentName: string | null,
-  sourceDir: string
-): Promise<boolean> {
-  console.log(chalk.blue(` 🔍 Validating ${agentName} agent output`));
-
-  try {
-    // Check if agent completed successfully
-    if (!result.success || !result.result) {
-      console.log(chalk.red(` ❌ Validation failed: Agent execution was unsuccessful`));
-      return false;
-    }
-
-    // Get validator function for this agent
-    const validator = agentName ? AGENT_VALIDATORS[agentName as keyof typeof AGENT_VALIDATORS] : undefined;
-
-    if (!validator) {
-      console.log(chalk.yellow(` ⚠️ No validator found for agent "${agentName}" - assuming success`));
-      console.log(chalk.green(` ✅ Validation passed: Unknown agent with successful result`));
-      return true;
-    }
-
-    console.log(chalk.blue(` 📋 Using validator for agent: ${agentName}`));
-    console.log(chalk.blue(` 📂 Source directory: ${sourceDir}`));
-
-    // Apply validation function
-    const validationResult = await validator(sourceDir);
-
-    if (validationResult) {
-      console.log(chalk.green(` ✅ Validation passed: Required files/structure present`));
-    } else {
-      console.log(chalk.red(` ❌ Validation failed: Missing required deliverable files`));
-    }
-
-    return validationResult;
-
-  } catch (error) {
-    const errMsg = error instanceof Error ? error.message : String(error);
-    console.log(chalk.red(` ❌ Validation failed with error: ${errMsg}`));
-    return false; // Assume invalid on validation error
-  }
-}
-
-// Pure function: Run Claude Code with SDK - Maximum Autonomy
-// WARNING: This is a low-level function. Use runClaudePromptWithRetry() for agent execution
-async function runClaudePrompt(
-  prompt: string,
+// Configures MCP servers for agent execution, with Docker-specific Chromium handling
+function buildMcpServers(
   sourceDir: string,
-  _allowedTools: string = 'Read',
-  context: string = '',
-  description: string = 'Claude analysis',
-  agentName: string | null = null,
-  colorFn: ChalkInstance = chalk.cyan,
-  sessionMetadata: SessionMetadata | null = null,
-  auditSession: AuditSession | null = null,
-  attemptNumber: number = 1
-): Promise<ClaudePromptResult> {
-  const timer = new Timer(`agent-${description.toLowerCase().replace(/\s+/g, '-')}`);
-  const fullPrompt = context ? `${context}\n\n${prompt}` : prompt;
-  let totalCost = 0;
-  let partialCost = 0; // Track partial cost for crash safety
+  agentName: string | null
+): Record<string, McpServer> {
+  const shannonHelperServer = createShannonHelperServer(sourceDir);
 
-  // Auto-detect execution mode to adjust logging behavior
-  const isParallelExecution = description.includes('vuln agent') || description.includes('exploit agent');
-  const useCleanOutput = description.includes('Pre-recon agent') ||
-                         description.includes('Recon agent') ||
-                         description.includes('Executive Summary and Report Cleanup') ||
-                         description.includes('vuln agent') ||
-                         description.includes('exploit agent');
+  const mcpServers: Record<string, McpServer> = {
+    'shannon-helper': shannonHelperServer,
+  };
 
-  // Disable status manager - using simple JSON filtering for all agents now
-  const statusManager = null;
+  if (agentName) {
+    const promptName = getPromptNameForAgent(agentName as AgentName);
+    const playwrightMcpName = MCP_AGENT_MAPPING[promptName as keyof typeof MCP_AGENT_MAPPING] || null;
 
-  // Setup progress indicator for clean output agents (unless disabled via flag)
-  let progressIndicator: ProgressIndicator | null = null;
-  if (useCleanOutput && !global.SHANNON_DISABLE_LOADER) {
-    const agentType = description.includes('Pre-recon') ? 'pre-reconnaissance' :
-                      description.includes('Recon') ? 'reconnaissance' :
-                      description.includes('Report') ? 'report generation' : 'analysis';
-    progressIndicator = new ProgressIndicator(`Running ${agentType}...`);
-  }
-
-  // NOTE: Logging now handled by AuditSession (append-only, crash-safe)
-  let logFilePath: string | null = null;
-  if (sessionMetadata && sessionMetadata.webUrl && sessionMetadata.id) {
-    const timestamp = new Date().toISOString().replace(/T/, '_').replace(/[:.]/g, '-').slice(0, 19);
-    const agentKey = description.toLowerCase().replace(/\s+/g, '-');
-    const logDir = generateSessionLogPath(sessionMetadata.webUrl, sessionMetadata.id);
-    logFilePath = path.join(logDir, `${timestamp}_${agentKey}_attempt-${attemptNumber}.log`);
-  } else {
-    console.log(chalk.blue(` 🤖 Running Claude Code: ${description}...`));
-  }
-
-  // Declare variables that need to be accessible in both try and catch blocks
-  let turnCount = 0;
-
-  try {
-    // Create MCP server with target directory context
-    const shannonHelperServer = createShannonHelperServer(sourceDir);
-
-    // Look up agent's assigned Playwright MCP server
-    let playwrightMcpName: string | null = null;
-    if (agentName) {
-      const promptName = agentNameToPromptName(agentName);
-      playwrightMcpName = MCP_AGENT_MAPPING[promptName as keyof typeof MCP_AGENT_MAPPING] || null;
-
-      if (playwrightMcpName) {
-        console.log(chalk.gray(` 🎭 Assigned ${agentName} → ${playwrightMcpName}`));
-      }
-    }
-
-    // Configure MCP servers: shannon-helper (SDK) + playwright-agentN (stdio)
-    const mcpServers: Record<string, McpServer> = {
-      'shannon-helper': shannonHelperServer,
-    };
-
-    // Add Playwright MCP server if this agent needs browser automation
     if (playwrightMcpName) {
+      console.log(chalk.gray(` Assigned ${agentName} -> ${playwrightMcpName}`));
+
       const userDataDir = `/tmp/${playwrightMcpName}`;
 
-      // Detect if running in Docker via explicit environment variable
+      // Docker uses system Chromium; local dev uses Playwright's bundled browsers
       const isDocker = process.env.SHANNON_DOCKER === 'true';
 
-      // Build args array - conditionally add --executable-path for Docker
       const mcpArgs: string[] = [
         '@playwright/mcp@latest',
         '--isolated',
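The `agentNameToPromptName` helper removed in the hunk above mapped agent names such as `xss-vuln` to prompt names such as `vuln-xss`, and the new code delegates that lookup to `getPromptNameForAgent` in `src/types/agents.ts`, whose body is not part of this hunk. The sketch below restates the removed mapping rules in a compact form for reference. `toPromptName` is a hypothetical name and this is not the implementation of `getPromptNameForAgent`.

```typescript
// Special cases taken from the removed helper.
const SPECIAL_PROMPT_NAMES = new Map<string, string>([
  ['pre-recon', 'pre-recon-code'],
  ['report', 'report-executive'],
]);

// Hypothetical restatement of the removed rules:
//   {type}-vuln    -> vuln-{type}
//   {type}-exploit -> exploit-{type}
//   anything else  -> unchanged (this covers 'recon')
function toPromptName(agentName: string): string {
  const special = SPECIAL_PROMPT_NAMES.get(agentName);
  if (special !== undefined) {
    return special;
  }
  const match = /^(.+)-(vuln|exploit)$/.exec(agentName);
  if (match) {
    // match[1] is the vulnerability type, match[2] is the suffix.
    return `${match[2]}-${match[1]}`;
  }
  return agentName;
}
```

A single regular expression with an alternation covers both suffix patterns that the original handled with two separate `match` calls.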
@@ -220,7 +88,6 @@ async function runClaudePrompt(
         mcpArgs.push('--browser', 'chromium');
       }
 
-      // Filter out undefined env values for type safety
       const envVars: Record<string, string> = Object.fromEntries(
         Object.entries({
           ...process.env,
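The `envVars` expression is cut off by the hunk boundary, but the removed comment states its purpose: dropping undefined values so the result can be typed as `Record<string, string>`. One common way to write such a filter is sketched below. `definedEnv` and its `overrides` parameter are hypothetical, and the entries the real code merges in after `...process.env` are not visible here.

```typescript
// Merge a base environment with overrides and keep only entries whose
// value is defined, so the result is safely Record<string, string>.
function definedEnv(
  env: Record<string, string | undefined>,
  overrides: Record<string, string | undefined> = {}
): Record<string, string> {
  const merged: Record<string, string | undefined> = { ...env, ...overrides };
  return Object.fromEntries(
    Object.entries(merged).filter(
      // Type guard narrows [string, string | undefined] to [string, string].
      (entry): entry is [string, string] => entry[1] !== undefined
    )
  );
}
```

The type-guard predicate is what lets the compiler accept the narrowed return type without a cast.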
@@ -236,335 +103,200 @@ async function runClaudePrompt(
         env: envVars,
       };
     }
+  }
 
-    const options = {
-      model: 'claude-sonnet-4-5-20250929', // Use latest Claude 4.5 Sonnet
-      maxTurns: 10_000, // Maximum turns for autonomous work
-      cwd: sourceDir, // Set working directory using SDK option
-      permissionMode: 'bypassPermissions' as const, // Bypass all permission checks for pentesting
-      mcpServers,
+  return mcpServers;
+}
+
+function outputLines(lines: string[]): void {
+  for (const line of lines) {
+    console.log(line);
+  }
+}
+
+async function writeErrorLog(
+  err: Error & { code?: string; status?: number },
+  sourceDir: string,
+  fullPrompt: string,
+  duration: number
+): Promise<void> {
+  try {
+    const errorLog = {
+      timestamp: formatTimestamp(),
+      agent: 'claude-executor',
+      error: {
+        name: err.constructor.name,
+        message: err.message,
+        code: err.code,
+        status: err.status,
+        stack: err.stack
+      },
+      context: {
+        sourceDir,
+        prompt: fullPrompt.slice(0, 200) + '...',
+        retryable: isRetryableError(err)
+      },
+      duration
     };
+    const logPath = path.join(sourceDir, 'error.log');
+    await fs.appendFile(logPath, JSON.stringify(errorLog) + '\n');
+  } catch (logError) {
+    const logErrMsg = logError instanceof Error ? logError.message : String(logError);
+    console.log(chalk.gray(` (Failed to write error log: ${logErrMsg})`));
+  }
+}
 
-    // SDK Options only shown for verbose agents (not clean output)
-    if (!useCleanOutput) {
-      console.log(chalk.gray(` SDK Options: maxTurns=${options.maxTurns}, cwd=${sourceDir}, permissions=BYPASS`));
+export async function validateAgentOutput(
+  result: ClaudePromptResult,
+  agentName: string | null,
+  sourceDir: string
+): Promise<boolean> {
+  console.log(chalk.blue(` Validating ${agentName} agent output`));
+
+  try {
+    // Check if agent completed successfully
+    if (!result.success || !result.result) {
+      console.log(chalk.red(` Validation failed: Agent execution was unsuccessful`));
+      return false;
     }
 
-    let result: string | null = null;
-    const messages: string[] = [];
-    let apiErrorDetected = false;
+    // Get validator function for this agent
+    const validator = agentName ? AGENT_VALIDATORS[agentName as keyof typeof AGENT_VALIDATORS] : undefined;
 
-    // Start progress indicator for clean output agents
-    if (progressIndicator) {
-      progressIndicator.start();
+    if (!validator) {
+      console.log(chalk.yellow(` No validator found for agent "${agentName}" - assuming success`));
+      console.log(chalk.green(` Validation passed: Unknown agent with successful result`));
+      return true;
     }
 
-    let lastHeartbeat = Date.now();
-    const HEARTBEAT_INTERVAL = 30000; // 30 seconds
+    console.log(chalk.blue(` Using validator for agent: ${agentName}`));
+    console.log(chalk.blue(` Source directory: ${sourceDir}`));
 
-    try {
-      for await (const message of query({ prompt: fullPrompt, options })) {
-        // Periodic heartbeat for long-running agents (only when loader is disabled)
-        const now = Date.now();
-        if (global.SHANNON_DISABLE_LOADER && now - lastHeartbeat > HEARTBEAT_INTERVAL) {
-          console.log(chalk.blue(` ⏱️ [${Math.floor((now - timer.startTime) / 1000)}s] ${description} running... (Turn ${turnCount})`));
-          lastHeartbeat = now;
-        }
+    // Apply validation function
+    const validationResult = await validator(sourceDir);
 
-        if (message.type === "assistant") {
-          turnCount++;
+    if (validationResult) {
+      console.log(chalk.green(` Validation passed: Required files/structure present`));
+    } else {
+      console.log(chalk.red(` Validation failed: Missing required deliverable files`));
+    }
 
-          const messageContent = message.message as { content: unknown };
-          const content = Array.isArray(messageContent.content)
-            ? messageContent.content.map((c: { text?: string }) => c.text || JSON.stringify(c)).join('\n')
-            : String(messageContent.content);
+    return validationResult;
 
-          if (statusManager) {
-            // Smart status updates for parallel execution - disabled
-          } else if (useCleanOutput) {
-            // Clean output for all agents: filter JSON tool calls but show meaningful text
-            const cleanedContent = filterJsonToolCalls(content);
-            if (cleanedContent.trim()) {
-              // Temporarily stop progress indicator to show output
-              if (progressIndicator) {
-                progressIndicator.stop();
-              }
+  } catch (error) {
+    const errMsg = error instanceof Error ? error.message : String(error);
+    console.log(chalk.red(` Validation failed with error: ${errMsg}`));
+    return false;
+  }
+}
 
-              if (isParallelExecution) {
-                // Compact output for parallel agents with prefixes
-                const prefix = getAgentPrefix(description);
-                console.log(colorFn(`${prefix} ${cleanedContent}`));
-              } else {
-                // Full turn output for single agents
-                console.log(colorFn(`\n 🤖 Turn ${turnCount} (${description}):`));
-                console.log(colorFn(` ${cleanedContent}`));
-              }
+// Low-level SDK execution. Handles message streaming, progress, and audit logging.
+// Exported for Temporal activities to call single-attempt execution.
+export async function runClaudePrompt(
+  prompt: string,
+  sourceDir: string,
+  context: string = '',
+  description: string = 'Claude analysis',
+  agentName: string | null = null,
+  colorFn: ChalkInstance = chalk.cyan,
+  sessionMetadata: SessionMetadata | null = null,
+  auditSession: AuditSession | null = null,
+  attemptNumber: number = 1
+): Promise<ClaudePromptResult> {
+  const timer = new Timer(`agent-${description.toLowerCase().replace(/\s+/g, '-')}`);
+  const fullPrompt = context ? `${context}\n\n${prompt}` : prompt;
 
-              // Restart progress indicator after output
-              if (progressIndicator) {
-                progressIndicator.start();
-              }
-            }
-          } else {
-            // Full streaming output - show complete messages with specialist color
-            console.log(colorFn(`\n 🤖 Turn ${turnCount} (${description}):`));
-            console.log(colorFn(` ${content}`));
-          }
+  const execContext = detectExecutionContext(description);
+  const progress = createProgressManager(
+    { description, useCleanOutput: execContext.useCleanOutput },
+    global.SHANNON_DISABLE_LOADER ?? false
+  );
+  const auditLogger = createAuditLogger(auditSession);
 
-          // Log to audit system (crash-safe, append-only)
-          if (auditSession) {
-            await auditSession.logEvent('llm_response', {
-              turn: turnCount,
-              content,
-              timestamp: new Date().toISOString()
-            });
-          }
+  console.log(chalk.blue(` Running Claude Code: ${description}...`));
 
-          messages.push(content);
+  const mcpServers = buildMcpServers(sourceDir, agentName);
+  const options = {
+    model: 'claude-sonnet-4-5-20250929',
+    maxTurns: 10_000,
+    cwd: sourceDir,
+    permissionMode: 'bypassPermissions' as const,
+    mcpServers,
+  };
 
-          // Check for API error patterns in assistant message content
-          if (content && typeof content === 'string') {
-            const lowerContent = content.toLowerCase();
-            if (lowerContent.includes('session limit reached')) {
-              throw new PentestError('Session limit reached', 'billing', false);
-            }
-            if (lowerContent.includes('api error') || lowerContent.includes('terminated')) {
-              apiErrorDetected = true;
-              console.log(chalk.red(` ⚠️ API Error detected in assistant response: ${content.trim()}`));
-            }
-          }
+  if (!execContext.useCleanOutput) {
+    console.log(chalk.gray(` SDK Options: maxTurns=${options.maxTurns}, cwd=${sourceDir}, permissions=BYPASS`));
+  }
 
-        } else if (message.type === "system" && (message as { subtype?: string }).subtype === "init") {
-          // Show useful system info only for verbose agents
-          if (!useCleanOutput) {
-            const initMsg = message as { model?: string; permissionMode?: string; mcp_servers?: Array<{ name: string; status: string }> };
-            console.log(chalk.blue(` ℹ️ Model: ${initMsg.model}, Permission: ${initMsg.permissionMode}`));
-            if (initMsg.mcp_servers && initMsg.mcp_servers.length > 0) {
-              const mcpStatus = initMsg.mcp_servers.map(s => `${s.name}(${s.status})`).join(', ');
-              console.log(chalk.blue(` 📦 MCP: ${mcpStatus}`));
-            }
-          }
+  let turnCount = 0;
+  let result: string | null = null;
+  let apiErrorDetected = false;
+  let totalCost = 0;
 
-        } else if (message.type === "user") {
-          // Skip user messages (these are our own inputs echoed back)
-          continue;
+  progress.start();
 
-        } else if ((message.type as string) === "tool_use") {
-          const toolMsg = message as unknown as { name: string; input?: Record<string, unknown> };
-          console.log(chalk.yellow(`\n 🔧 Using Tool: ${toolMsg.name}`));
-          if (toolMsg.input && Object.keys(toolMsg.input).length > 0) {
-            console.log(chalk.gray(` Input: ${JSON.stringify(toolMsg.input, null, 2)}`));
-          }
+  try {
+    const messageLoopResult = await processMessageStream(
+      fullPrompt,
+      options,
+      { execContext, description, colorFn, progress, auditLogger },
+      timer
+    );
 
-          // Log tool start event
-          if (auditSession) {
-            await auditSession.logEvent('tool_start', {
-              toolName: toolMsg.name,
-              parameters: toolMsg.input,
-              timestamp: new Date().toISOString()
-            });
-          }
-        } else if ((message.type as string) === "tool_result") {
-          const resultMsg = message as unknown as { content?: unknown };
-          console.log(chalk.green(` ✅ Tool Result:`));
-          if (resultMsg.content) {
-            // Show tool results but truncate if too long
+    turnCount = messageLoopResult.turnCount;
+    result = messageLoopResult.result;
+    apiErrorDetected = messageLoopResult.apiErrorDetected;
+    totalCost = messageLoopResult.cost;
|
|
||||||
const resultStr = typeof resultMsg.content === 'string' ? resultMsg.content : JSON.stringify(resultMsg.content, null, 2);
|
|
||||||
if (resultStr.length > 500) {
|
|
||||||
console.log(chalk.gray(` ${resultStr.slice(0, 500)}...\n [Result truncated - ${resultStr.length} total chars]`));
|
|
||||||
} else {
|
|
||||||
console.log(chalk.gray(` ${resultStr}`));
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Log tool end event
|
// === SPENDING CAP SAFEGUARD ===
|
||||||
if (auditSession) {
|
// Defense-in-depth: Detect spending cap that slipped through detectApiError().
|
||||||
await auditSession.logEvent('tool_end', {
|
// When spending cap is hit, Claude returns a short message with $0 cost.
|
||||||
result: resultMsg.content,
|
// Legitimate agent work NEVER costs $0 with only 1-2 turns.
|
||||||
timestamp: new Date().toISOString()
|
if (turnCount <= 2 && totalCost === 0) {
|
||||||
});
|
const resultLower = (result || '').toLowerCase();
|
||||||
}
|
const BILLING_KEYWORDS = ['spending', 'cap', 'limit', 'budget', 'resets'];
|
||||||
} else if (message.type === "result") {
|
const looksLikeBillingError = BILLING_KEYWORDS.some((kw) =>
|
||||||
const resultMessage = message as {
|
resultLower.includes(kw)
|
||||||
result?: string;
|
);
|
||||||
total_cost_usd?: number;
|
|
||||||
duration_ms?: number;
|
|
||||||
subtype?: string;
|
|
||||||
permission_denials?: unknown[];
|
|
||||||
};
|
|
||||||
result = resultMessage.result || null;
|
|
||||||
|
|
||||||
if (!statusManager) {
|
if (looksLikeBillingError) {
|
||||||
if (useCleanOutput) {
|
throw new PentestError(
|
||||||
// Clean completion output - just duration and cost
|
`Spending cap likely reached (turns=${turnCount}, cost=$0): ${result?.slice(0, 100)}`,
|
||||||
console.log(chalk.magenta(`\n 🏁 COMPLETED:`));
|
'billing',
|
||||||
const cost = resultMessage.total_cost_usd || 0;
|
true // Retryable - Temporal will use 5-30 min backoff
|
||||||
console.log(chalk.gray(` ⏱️ Duration: ${((resultMessage.duration_ms || 0)/1000).toFixed(1)}s, Cost: $${cost.toFixed(4)}`));
|
);
|
||||||
|
|
||||||
if (resultMessage.subtype === "error_max_turns") {
|
|
||||||
console.log(chalk.red(` ⚠️ Stopped: Hit maximum turns limit`));
|
|
||||||
} else if (resultMessage.subtype === "error_during_execution") {
|
|
||||||
console.log(chalk.red(` ❌ Stopped: Execution error`));
|
|
||||||
}
|
|
||||||
|
|
||||||
if (resultMessage.permission_denials && resultMessage.permission_denials.length > 0) {
|
|
||||||
console.log(chalk.yellow(` 🚫 ${resultMessage.permission_denials.length} permission denials`));
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
// Full completion output for agents without clean output
|
|
||||||
console.log(chalk.magenta(`\n 🏁 COMPLETED:`));
|
|
||||||
const cost = resultMessage.total_cost_usd || 0;
|
|
||||||
console.log(chalk.gray(` ⏱️ Duration: ${((resultMessage.duration_ms || 0)/1000).toFixed(1)}s, Cost: $${cost.toFixed(4)}`));
|
|
||||||
|
|
||||||
if (resultMessage.subtype === "error_max_turns") {
|
|
||||||
console.log(chalk.red(` ⚠️ Stopped: Hit maximum turns limit`));
|
|
||||||
} else if (resultMessage.subtype === "error_during_execution") {
|
|
||||||
console.log(chalk.red(` ❌ Stopped: Execution error`));
|
|
||||||
}
|
|
||||||
|
|
||||||
if (resultMessage.permission_denials && resultMessage.permission_denials.length > 0) {
|
|
||||||
console.log(chalk.yellow(` 🚫 ${resultMessage.permission_denials.length} permission denials`));
|
|
||||||
}
|
|
||||||
|
|
||||||
// Show result content (if it's reasonable length)
|
|
||||||
if (result && typeof result === 'string') {
|
|
||||||
if (result.length > 1000) {
|
|
||||||
console.log(chalk.magenta(` 📄 ${result.slice(0, 1000)}... [${result.length} total chars]`));
|
|
||||||
} else {
|
|
||||||
console.log(chalk.magenta(` 📄 ${result}`));
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Track cost for all agents
|
|
||||||
const cost = resultMessage.total_cost_usd || 0;
|
|
||||||
const agentKey = description.toLowerCase().replace(/\s+/g, '-');
|
|
||||||
costResults.agents[agentKey] = cost;
|
|
||||||
costResults.total += cost;
|
|
||||||
|
|
||||||
// Store cost for return value and partial tracking
|
|
||||||
totalCost = cost;
|
|
||||||
partialCost = cost;
|
|
||||||
break;
|
|
||||||
} else {
|
|
||||||
// Log any other message types we might not be handling
|
|
||||||
console.log(chalk.gray(` 💬 ${message.type}: ${JSON.stringify(message, null, 2)}`));
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
} catch (queryError) {
|
|
||||||
throw queryError; // Re-throw to outer catch
|
|
||||||
}
|
}
|
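Review note: the spending-cap safeguard added above can be read as a single predicate. The sketch below is a minimal standalone version, assuming the same thresholds and keyword list as the diff; the function name `looksLikeSpendingCap` is hypothetical and does not appear in the PR.

```typescript
// Hypothetical helper mirroring the safeguard in the diff; not part of the PR.
const BILLING_KEYWORDS = ['spending', 'cap', 'limit', 'budget', 'resets'];

function looksLikeSpendingCap(
  turnCount: number,
  totalCost: number,
  result: string | null
): boolean {
  // The heuristic only applies to very short runs that cost nothing.
  if (turnCount > 2 || totalCost !== 0) {
    return false;
  }
  const resultLower = (result ?? '').toLowerCase();
  // Plain substring match, so 'cap' also matches words such as 'capabilities'.
  return BILLING_KEYWORDS.some((kw) => resultLower.includes(kw));
}
```

Because the match is on substrings, a short zero-cost reply that mentions "capabilities" or "limitations" would also be classified as a billing error under this sketch. Whether that can occur in practice depends on the agents' prompts.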
||||||
|
|
||||||
const duration = timer.stop();
|
const duration = timer.stop();
|
||||||
const agentKey = description.toLowerCase().replace(/\s+/g, '-');
|
timingResults.agents[execContext.agentKey] = duration;
|
||||||
timingResults.agents[agentKey] = duration;
|
|
||||||
|
|
||||||
// API error detection is logged but not immediately failed
|
|
||||||
if (apiErrorDetected) {
|
if (apiErrorDetected) {
|
||||||
console.log(chalk.yellow(` ⚠️ API Error detected in ${description} - will validate deliverables before failing`));
|
console.log(chalk.yellow(` API Error detected in ${description} - will validate deliverables before failing`));
|
||||||
}
|
}
|
||||||
|
|
||||||
// Show completion messages based on agent type
|
progress.finish(formatCompletionMessage(execContext, description, turnCount, duration));
|
||||||
if (progressIndicator) {
|
|
||||||
const agentType = description.includes('Pre-recon') ? 'Pre-recon analysis' :
|
|
||||||
description.includes('Recon') ? 'Reconnaissance' :
|
|
||||||
description.includes('Report') ? 'Report generation' : 'Analysis';
|
|
||||||
progressIndicator.finish(`${agentType} complete! (${turnCount} turns, ${formatDuration(duration)})`);
|
|
||||||
} else if (isParallelExecution) {
|
|
||||||
const prefix = getAgentPrefix(description);
|
|
||||||
console.log(chalk.green(`${prefix} ✅ Complete (${turnCount} turns, ${formatDuration(duration)})`));
|
|
||||||
} else if (!useCleanOutput) {
|
|
||||||
console.log(chalk.green(` ✅ Claude Code completed: ${description} (${turnCount} turns) in ${formatDuration(duration)}`));
|
|
||||||
}
|
|
||||||
|
|
||||||
// Return result with log file path for all agents
|
return {
|
||||||
const returnData: ClaudePromptResult = {
|
|
||||||
result,
|
result,
|
||||||
success: true,
|
success: true,
|
||||||
duration,
|
duration,
|
||||||
turns: turnCount,
|
turns: turnCount,
|
||||||
cost: totalCost,
|
cost: totalCost,
|
||||||
partialCost,
|
partialCost: totalCost,
|
||||||
apiErrorDetected
|
apiErrorDetected
|
||||||
};
|
};
|
||||||
if (logFilePath) {
|
|
||||||
returnData.logFile = logFilePath;
|
|
||||||
}
|
|
||||||
return returnData;
|
|
||||||
|
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
const duration = timer.stop();
|
const duration = timer.stop();
|
||||||
const agentKey = description.toLowerCase().replace(/\s+/g, '-');
|
timingResults.agents[execContext.agentKey] = duration;
|
||||||
timingResults.agents[agentKey] = duration;
|
|
||||||
|
|
||||||
const err = error as Error & { code?: string; status?: number; duration?: number; cost?: number };
|
const err = error as Error & { code?: string; status?: number };
|
||||||
|
|
||||||
// Log error to audit system
|
await auditLogger.logError(err, duration, turnCount);
|
||||||
if (auditSession) {
|
progress.stop();
|
||||||
await auditSession.logEvent('error', {
|
outputLines(formatErrorOutput(err, execContext, description, duration, sourceDir, isRetryableError(err)));
|
||||||
message: err.message,
|
await writeErrorLog(err, sourceDir, fullPrompt, duration);
|
||||||
errorType: err.constructor.name,
|
|
||||||
stack: err.stack,
|
|
||||||
duration,
|
|
||||||
turns: turnCount,
|
|
||||||
timestamp: new Date().toISOString()
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
// Show error messages based on agent type
|
|
||||||
if (progressIndicator) {
|
|
||||||
progressIndicator.stop();
|
|
||||||
const agentType = description.includes('Pre-recon') ? 'Pre-recon analysis' :
|
|
||||||
description.includes('Recon') ? 'Reconnaissance' :
|
|
||||||
description.includes('Report') ? 'Report generation' : 'Analysis';
|
|
||||||
console.log(chalk.red(`❌ ${agentType} failed (${formatDuration(duration)})`));
|
|
||||||
} else if (isParallelExecution) {
|
|
||||||
const prefix = getAgentPrefix(description);
|
|
||||||
console.log(chalk.red(`${prefix} ❌ Failed (${formatDuration(duration)})`));
|
|
||||||
} else if (!useCleanOutput) {
|
|
||||||
console.log(chalk.red(` ❌ Claude Code failed: ${description} (${formatDuration(duration)})`));
|
|
||||||
}
|
|
||||||
console.log(chalk.red(` Error Type: ${err.constructor.name}`));
|
|
||||||
console.log(chalk.red(` Message: ${err.message}`));
|
|
||||||
console.log(chalk.gray(` Agent: ${description}`));
|
|
||||||
console.log(chalk.gray(` Working Directory: ${sourceDir}`));
|
|
||||||
console.log(chalk.gray(` Retryable: ${isRetryableError(err) ? 'Yes' : 'No'}`));
|
|
||||||
|
|
||||||
// Log additional context if available
|
|
||||||
if (err.code) {
|
|
||||||
console.log(chalk.gray(` Error Code: ${err.code}`));
|
|
||||||
}
|
|
||||||
if (err.status) {
|
|
||||||
console.log(chalk.gray(` HTTP Status: ${err.status}`));
|
|
||||||
}
|
|
||||||
|
|
||||||
// Save detailed error to log file for debugging
|
|
||||||
try {
|
|
||||||
const errorLog = {
|
|
||||||
timestamp: new Date().toISOString(),
|
|
||||||
agent: description,
|
|
||||||
error: {
|
|
||||||
name: err.constructor.name,
|
|
||||||
message: err.message,
|
|
||||||
code: err.code,
|
|
||||||
status: err.status,
|
|
||||||
stack: err.stack
|
|
||||||
},
|
|
||||||
context: {
|
|
||||||
sourceDir,
|
|
||||||
prompt: fullPrompt.slice(0, 200) + '...',
|
|
||||||
retryable: isRetryableError(err)
|
|
||||||
},
|
|
||||||
duration
|
|
||||||
};
|
|
||||||
|
|
||||||
const logPath = path.join(sourceDir, 'error.log');
|
|
||||||
await fs.appendFile(logPath, JSON.stringify(errorLog) + '\n');
|
|
||||||
} catch (logError) {
|
|
||||||
const logErrMsg = logError instanceof Error ? logError.message : String(logError);
|
|
||||||
console.log(chalk.gray(` (Failed to write error log: ${logErrMsg})`));
|
|
||||||
}
|
|
||||||
|
|
||||||
return {
|
return {
|
||||||
error: err.message,
|
error: err.message,
|
||||||
@@ -572,17 +304,85 @@ async function runClaudePrompt(
|
|||||||
prompt: fullPrompt.slice(0, 100) + '...',
|
prompt: fullPrompt.slice(0, 100) + '...',
|
||||||
success: false,
|
success: false,
|
||||||
duration,
|
duration,
|
||||||
cost: partialCost,
|
cost: totalCost,
|
||||||
retryable: isRetryableError(err)
|
retryable: isRetryableError(err)
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// PREFERRED: Production-ready Claude agent execution with full orchestration
|
|
||||||
|
interface MessageLoopResult {
|
||||||
|
turnCount: number;
|
||||||
|
result: string | null;
|
||||||
|
apiErrorDetected: boolean;
|
||||||
|
cost: number;
|
||||||
|
}
|
||||||
|
|
||||||
|
interface MessageLoopDeps {
|
||||||
|
execContext: ReturnType<typeof detectExecutionContext>;
|
||||||
|
description: string;
|
||||||
|
colorFn: ChalkInstance;
|
||||||
|
progress: ReturnType<typeof createProgressManager>;
|
||||||
|
auditLogger: ReturnType<typeof createAuditLogger>;
|
||||||
|
}
|
||||||
|
|
||||||
|
async function processMessageStream(
|
||||||
|
fullPrompt: string,
|
||||||
|
options: NonNullable<Parameters<typeof query>[0]['options']>,
|
||||||
|
deps: MessageLoopDeps,
|
||||||
|
timer: Timer
|
||||||
|
): Promise<MessageLoopResult> {
|
||||||
|
const { execContext, description, colorFn, progress, auditLogger } = deps;
|
||||||
|
const HEARTBEAT_INTERVAL = 30000;
|
||||||
|
|
||||||
|
let turnCount = 0;
|
||||||
|
let result: string | null = null;
|
||||||
|
let apiErrorDetected = false;
|
||||||
|
let cost = 0;
|
||||||
|
let lastHeartbeat = Date.now();
|
||||||
|
|
||||||
|
for await (const message of query({ prompt: fullPrompt, options })) {
|
||||||
|
// Heartbeat logging when loader is disabled
|
||||||
|
const now = Date.now();
|
||||||
|
if (global.SHANNON_DISABLE_LOADER && now - lastHeartbeat > HEARTBEAT_INTERVAL) {
|
||||||
|
console.log(chalk.blue(` [${Math.floor((now - timer.startTime) / 1000)}s] ${description} running... (Turn ${turnCount})`));
|
||||||
|
lastHeartbeat = now;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Increment turn count for assistant messages
|
||||||
|
if (message.type === 'assistant') {
|
||||||
|
turnCount++;
|
||||||
|
}
|
||||||
|
|
||||||
|
const dispatchResult = await dispatchMessage(
|
||||||
|
message as { type: string; subtype?: string },
|
||||||
|
turnCount,
|
||||||
|
{ execContext, description, colorFn, progress, auditLogger }
|
||||||
|
);
|
||||||
|
|
||||||
|
if (dispatchResult.type === 'throw') {
|
||||||
|
throw dispatchResult.error;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (dispatchResult.type === 'complete') {
|
||||||
|
result = dispatchResult.result;
|
||||||
|
cost = dispatchResult.cost;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (dispatchResult.type === 'continue' && dispatchResult.apiErrorDetected) {
|
||||||
|
apiErrorDetected = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return { turnCount, result, apiErrorDetected, cost };
|
||||||
|
}
|
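Review note: the loop in `processMessageStream` folds each dispatch action into four pieces of state. The sketch below shows that folding step as a pure function, assuming the three action shapes from `MessageDispatchAction`; `LoopState` and `applyAction` are hypothetical names used only for illustration.

```typescript
// Local copy of the action union so the sketch is self-contained.
type Action =
  | { type: 'continue'; apiErrorDetected?: boolean }
  | { type: 'complete'; result: string | null; cost: number }
  | { type: 'throw'; error: Error };

// Hypothetical state record; the PR keeps these as separate local variables.
interface LoopState {
  result: string | null;
  cost: number;
  apiErrorDetected: boolean;
  done: boolean;
}

function applyAction(state: LoopState, action: Action): LoopState {
  switch (action.type) {
    case 'throw':
      // Fatal conditions (for example billing errors) abort the loop.
      throw action.error;
    case 'complete':
      // The final result message carries the result text and the cost.
      return { ...state, result: action.result, cost: action.cost, done: true };
    case 'continue':
      // The API error flag is sticky: once set it is never cleared.
      return action.apiErrorDetected ? { ...state, apiErrorDetected: true } : state;
  }
}
```

Keeping the action a discriminated union lets the compiler check that every `type` is handled, which is one plausible reason for returning actions from `dispatchMessage` rather than mutating loop variables inside it.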
||||||
|
|
||||||
|
// Main entry point for agent execution. Handles retries, git checkpoints, and validation.
|
||||||
export async function runClaudePromptWithRetry(
|
export async function runClaudePromptWithRetry(
|
||||||
prompt: string,
|
prompt: string,
|
||||||
sourceDir: string,
|
sourceDir: string,
|
||||||
allowedTools: string = 'Read',
|
_allowedTools: string = 'Read',
|
||||||
context: string = '',
|
context: string = '',
|
||||||
description: string = 'Claude analysis',
|
description: string = 'Claude analysis',
|
||||||
agentName: string | null = null,
|
agentName: string | null = null,
|
||||||
@@ -593,9 +393,8 @@ export async function runClaudePromptWithRetry(
|
|||||||
let lastError: Error | undefined;
|
let lastError: Error | undefined;
|
||||||
let retryContext = context;
|
let retryContext = context;
|
||||||
|
|
||||||
console.log(chalk.cyan(`🚀 Starting ${description} with ${maxRetries} max attempts`));
|
console.log(chalk.cyan(`Starting ${description} with ${maxRetries} max attempts`));
|
||||||
|
|
||||||
// Initialize audit session (crash-safe logging)
|
|
||||||
let auditSession: AuditSession | null = null;
|
let auditSession: AuditSession | null = null;
|
||||||
if (sessionMetadata && agentName) {
|
if (sessionMetadata && agentName) {
|
||||||
auditSession = new AuditSession(sessionMetadata);
|
auditSession = new AuditSession(sessionMetadata);
|
||||||
@@ -603,29 +402,27 @@ export async function runClaudePromptWithRetry(
|
|||||||
}
|
}
|
||||||
|
|
||||||
for (let attempt = 1; attempt <= maxRetries; attempt++) {
|
for (let attempt = 1; attempt <= maxRetries; attempt++) {
|
||||||
// Create checkpoint before each attempt
|
|
||||||
await createGitCheckpoint(sourceDir, description, attempt);
|
await createGitCheckpoint(sourceDir, description, attempt);
|
||||||
|
|
||||||
// Start agent tracking in audit system (saves prompt snapshot automatically)
|
|
||||||
if (auditSession && agentName) {
|
if (auditSession && agentName) {
|
||||||
const fullPrompt = retryContext ? `${retryContext}\n\n${prompt}` : prompt;
|
const fullPrompt = retryContext ? `${retryContext}\n\n${prompt}` : prompt;
|
||||||
await auditSession.startAgent(agentName, fullPrompt, attempt);
|
await auditSession.startAgent(agentName, fullPrompt, attempt);
|
||||||
}
|
}
|
||||||
|
|
||||||
try {
|
try {
|
||||||
const result = await runClaudePrompt(prompt, sourceDir, allowedTools, retryContext, description, agentName, colorFn, sessionMetadata, auditSession, attempt);
|
const result = await runClaudePrompt(
|
||||||
|
prompt, sourceDir, retryContext,
|
||||||
|
description, agentName, colorFn, sessionMetadata, auditSession, attempt
|
||||||
|
);
|
||||||
|
|
||||||
// Validate output after successful run
|
|
||||||
if (result.success) {
|
if (result.success) {
|
||||||
const validationPassed = await validateAgentOutput(result, agentName, sourceDir);
|
const validationPassed = await validateAgentOutput(result, agentName, sourceDir);
|
||||||
|
|
||||||
if (validationPassed) {
|
if (validationPassed) {
|
||||||
// Check if API error was detected but validation passed
|
|
||||||
if (result.apiErrorDetected) {
|
if (result.apiErrorDetected) {
|
||||||
console.log(chalk.yellow(`📋 Validation: Ready for exploitation despite API error warnings`));
|
console.log(chalk.yellow(`Validation: Ready for exploitation despite API error warnings`));
|
||||||
}
|
}
|
||||||
|
|
||||||
// Record successful attempt in audit system
|
|
||||||
if (auditSession && agentName) {
|
if (auditSession && agentName) {
|
||||||
const commitHash = await getGitCommitHash(sourceDir);
|
const commitHash = await getGitCommitHash(sourceDir);
|
||||||
const endResult: {
|
const endResult: {
|
||||||
@@ -646,15 +443,13 @@ export async function runClaudePromptWithRetry(
|
|||||||
await auditSession.endAgent(agentName, endResult);
|
await auditSession.endAgent(agentName, endResult);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Commit successful changes (will include the snapshot)
|
|
||||||
await commitGitSuccess(sourceDir, description);
|
await commitGitSuccess(sourceDir, description);
|
||||||
console.log(chalk.green.bold(`🎉 ${description} completed successfully on attempt ${attempt}/${maxRetries}`));
|
console.log(chalk.green.bold(`${description} completed successfully on attempt ${attempt}/${maxRetries}`));
|
||||||
return result;
|
return result;
|
||||||
|
// Validation failure is retryable - agent might succeed on retry with cleaner workspace
|
||||||
} else {
|
} else {
|
||||||
// Agent completed but output validation failed
|
console.log(chalk.yellow(`${description} completed but output validation failed`));
|
||||||
console.log(chalk.yellow(`⚠️ ${description} completed but output validation failed`));
|
|
||||||
|
|
||||||
// Record failed validation attempt in audit system
|
|
||||||
if (auditSession && agentName) {
|
if (auditSession && agentName) {
|
||||||
await auditSession.endAgent(agentName, {
|
await auditSession.endAgent(agentName, {
|
||||||
attemptNumber: attempt,
|
attemptNumber: attempt,
|
||||||
@@ -666,20 +461,17 @@ export async function runClaudePromptWithRetry(
|
|||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
// If API error detected AND validation failed, this is a retryable error
|
|
||||||
if (result.apiErrorDetected) {
|
if (result.apiErrorDetected) {
|
||||||
console.log(chalk.yellow(`⚠️ API Error detected with validation failure - treating as retryable`));
|
console.log(chalk.yellow(`API Error detected with validation failure - treating as retryable`));
|
||||||
lastError = new Error('API Error: terminated with validation failure');
|
lastError = new Error('API Error: terminated with validation failure');
|
||||||
} else {
|
} else {
|
||||||
lastError = new Error('Output validation failed');
|
lastError = new Error('Output validation failed');
|
||||||
}
|
}
|
||||||
|
|
||||||
if (attempt < maxRetries) {
|
if (attempt < maxRetries) {
|
||||||
// Rollback contaminated workspace
|
|
||||||
await rollbackGitWorkspace(sourceDir, 'validation failure');
|
await rollbackGitWorkspace(sourceDir, 'validation failure');
|
||||||
continue;
|
continue;
|
||||||
} else {
|
} else {
|
||||||
// FAIL FAST - Don't continue with broken pipeline
|
|
||||||
throw new PentestError(
|
throw new PentestError(
|
||||||
`Agent ${description} failed output validation after ${maxRetries} attempts. Required deliverable files were not created.`,
|
`Agent ${description} failed output validation after ${maxRetries} attempts. Required deliverable files were not created.`,
|
||||||
'validation',
|
'validation',
|
||||||
@@ -694,7 +486,6 @@ export async function runClaudePromptWithRetry(
|
|||||||
const err = error as Error & { duration?: number; cost?: number; partialResults?: unknown };
|
const err = error as Error & { duration?: number; cost?: number; partialResults?: unknown };
|
||||||
lastError = err;
|
lastError = err;
|
||||||
|
|
||||||
// Record failed attempt in audit system
|
|
||||||
if (auditSession && agentName) {
|
if (auditSession && agentName) {
|
||||||
await auditSession.endAgent(agentName, {
|
await auditSession.endAgent(agentName, {
|
||||||
attemptNumber: attempt,
|
attemptNumber: attempt,
|
||||||
@@ -706,24 +497,21 @@ export async function runClaudePromptWithRetry(
|
|||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
// Check if error is retryable
|
|
||||||
if (!isRetryableError(err)) {
|
if (!isRetryableError(err)) {
|
||||||
console.log(chalk.red(`❌ ${description} failed with non-retryable error: ${err.message}`));
|
console.log(chalk.red(`${description} failed with non-retryable error: ${err.message}`));
|
||||||
await rollbackGitWorkspace(sourceDir, 'non-retryable error cleanup');
|
await rollbackGitWorkspace(sourceDir, 'non-retryable error cleanup');
|
||||||
throw err;
|
throw err;
|
||||||
}
|
}
|
||||||
|
|
||||||
if (attempt < maxRetries) {
|
if (attempt < maxRetries) {
|
||||||
// Rollback for clean retry
|
|
||||||
await rollbackGitWorkspace(sourceDir, 'retryable error cleanup');
|
await rollbackGitWorkspace(sourceDir, 'retryable error cleanup');
|
||||||
|
|
||||||
const delay = getRetryDelay(err, attempt);
|
const delay = getRetryDelay(err, attempt);
|
||||||
const delaySeconds = (delay / 1000).toFixed(1);
|
const delaySeconds = (delay / 1000).toFixed(1);
|
||||||
console.log(chalk.yellow(`⚠️ ${description} failed (attempt ${attempt}/${maxRetries})`));
|
console.log(chalk.yellow(`${description} failed (attempt ${attempt}/${maxRetries})`));
|
||||||
console.log(chalk.gray(` Error: ${err.message}`));
|
console.log(chalk.gray(` Error: ${err.message}`));
|
||||||
console.log(chalk.gray(` Workspace rolled back, retrying in ${delaySeconds}s...`));
|
console.log(chalk.gray(` Workspace rolled back, retrying in ${delaySeconds}s...`));
|
||||||
|
|
||||||
// Preserve any partial results for next retry
|
|
||||||
if (err.partialResults) {
|
if (err.partialResults) {
|
||||||
retryContext = `${context}\n\nPrevious partial results: ${JSON.stringify(err.partialResults)}`;
|
retryContext = `${context}\n\nPrevious partial results: ${JSON.stringify(err.partialResults)}`;
|
||||||
}
|
}
|
||||||
@@ -731,7 +519,7 @@ export async function runClaudePromptWithRetry(
|
|||||||
await new Promise(resolve => setTimeout(resolve, delay));
|
await new Promise(resolve => setTimeout(resolve, delay));
|
||||||
} else {
|
} else {
|
||||||
await rollbackGitWorkspace(sourceDir, 'final failure cleanup');
|
await rollbackGitWorkspace(sourceDir, 'final failure cleanup');
|
||||||
console.log(chalk.red(`❌ ${description} failed after ${maxRetries} attempts`));
|
console.log(chalk.red(`${description} failed after ${maxRetries} attempts`));
|
||||||
console.log(chalk.red(` Final error: ${err.message}`));
|
console.log(chalk.red(` Final error: ${err.message}`));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -739,13 +527,3 @@ export async function runClaudePromptWithRetry(
|
|||||||
|
|
||||||
throw lastError;
|
throw lastError;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Helper function to get git commit hash
|
|
||||||
async function getGitCommitHash(sourceDir: string): Promise<string | null> {
|
|
||||||
try {
|
|
||||||
const result = await $`cd ${sourceDir} && git rev-parse HEAD`;
|
|
||||||
return result.stdout.trim();
|
|
||||||
} catch {
|
|
||||||
return null;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|||||||
@@ -0,0 +1,272 @@
|
|||||||
|
// Copyright (C) 2025 Keygraph, Inc.
|
||||||
|
//
|
||||||
|
// This program is free software: you can redistribute it and/or modify
|
||||||
|
// it under the terms of the GNU Affero General Public License version 3
|
||||||
|
// as published by the Free Software Foundation.
|
||||||
|
|
||||||
|
// Pure functions for processing SDK message types
|
||||||
|
|
||||||
|
import { PentestError } from '../error-handling.js';
|
||||||
|
import { filterJsonToolCalls } from '../utils/output-formatter.js';
|
||||||
|
import { formatTimestamp } from '../utils/formatting.js';
|
||||||
|
import chalk from 'chalk';
|
||||||
|
import {
|
||||||
|
formatAssistantOutput,
|
||||||
|
formatResultOutput,
|
||||||
|
formatToolUseOutput,
|
||||||
|
formatToolResultOutput,
|
||||||
|
} from './output-formatters.js';
|
||||||
|
import { costResults } from '../utils/metrics.js';
|
||||||
|
import type { AuditLogger } from './audit-logger.js';
|
||||||
|
import type { ProgressManager } from './progress-manager.js';
|
||||||
|
import type {
|
||||||
|
AssistantMessage,
|
||||||
|
ResultMessage,
|
||||||
|
ToolUseMessage,
|
||||||
|
ToolResultMessage,
|
||||||
|
AssistantResult,
|
||||||
|
ResultData,
|
||||||
|
ToolUseData,
|
||||||
|
ToolResultData,
|
||||||
|
ApiErrorDetection,
|
||||||
|
ContentBlock,
|
||||||
|
SystemInitMessage,
|
||||||
|
ExecutionContext,
|
||||||
|
} from './types.js';
|
||||||
|
import type { ChalkInstance } from 'chalk';
|
||||||
|
|
||||||
|
// Handles both array and string content formats from SDK
|
||||||
|
export function extractMessageContent(message: AssistantMessage): string {
|
||||||
|
const messageContent = message.message;
|
||||||
|
|
||||||
|
if (Array.isArray(messageContent.content)) {
|
||||||
|
return messageContent.content
|
||||||
|
.map((c: ContentBlock) => c.text || JSON.stringify(c))
|
||||||
|
.join('\n');
|
||||||
|
}
|
||||||
|
|
||||||
|
return String(messageContent.content);
|
||||||
|
}
|
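Review note: a small standalone illustration of what `extractMessageContent` returns for the two content shapes. The `Block` interface and the function name `flattenContent` are assumptions made for this sketch; the real `ContentBlock` type lives in `./types.js` and is not shown in this diff.

```typescript
// Hypothetical minimal block shape; the real ContentBlock type may differ.
interface Block {
  type: string;
  text?: string;
  [key: string]: unknown;
}

function flattenContent(content: string | Block[]): string {
  if (Array.isArray(content)) {
    // Text blocks contribute their text; any other block is serialized as JSON.
    return content.map((c) => c.text || JSON.stringify(c)).join('\n');
  }
  return String(content);
}

const flattened = flattenContent([
  { type: 'text', text: 'Reading config' },
  { type: 'tool_use', name: 'Read' },
]);
console.log(flattened);
```

Note that the `||` fallback also serializes a text block whose `text` is an empty string, so such a block shows up as JSON rather than as an empty line.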
||||||
|
|
||||||
|
export function detectApiError(content: string): ApiErrorDetection {
|
||||||
|
if (!content || typeof content !== 'string') {
|
||||||
|
return { detected: false };
|
||||||
|
}
|
||||||
|
|
||||||
|
const lowerContent = content.toLowerCase();
|
||||||
|
|
||||||
|
// === BILLING/SPENDING CAP ERRORS (Retryable with long backoff) ===
|
||||||
|
// When Claude Code hits its spending cap, it returns a short message like
|
||||||
|
// "Spending cap reached resets 8am" instead of throwing an error.
|
||||||
|
// These should retry with 5-30 min backoff so workflows can recover when cap resets.
|
||||||
|
const BILLING_PATTERNS = [
|
||||||
|
'spending cap',
|
||||||
|
'spending limit',
|
||||||
|
'cap reached',
|
||||||
|
'budget exceeded',
|
||||||
|
'usage limit',
|
||||||
|
];
|
||||||
|
|
||||||
|
const isBillingError = BILLING_PATTERNS.some((pattern) =>
|
||||||
|
lowerContent.includes(pattern)
|
||||||
|
);
|
||||||
|
|
||||||
|
if (isBillingError) {
|
||||||
|
return {
|
||||||
|
detected: true,
|
||||||
|
shouldThrow: new PentestError(
|
||||||
|
`Billing limit reached: ${content.slice(0, 100)}`,
|
||||||
|
'billing',
|
||||||
|
true // RETRYABLE - Temporal will use 5-30 min backoff
|
||||||
|
),
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
// === SESSION LIMIT (Non-retryable) ===
|
||||||
|
// Different from spending cap - usually means something is fundamentally wrong
|
||||||
|
if (lowerContent.includes('session limit reached')) {
|
||||||
|
return {
|
||||||
|
detected: true,
|
||||||
|
shouldThrow: new PentestError('Session limit reached', 'billing', false),
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
// Non-fatal API errors - detected but continue
|
||||||
|
if (lowerContent.includes('api error') || lowerContent.includes('terminated')) {
|
||||||
|
return { detected: true };
|
||||||
|
}
|
||||||
|
|
||||||
|
return { detected: false };
|
||||||
|
}
|
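Review note: the order of checks in `detectApiError` matters, because billing patterns are tested before the session-limit string. The sketch below reproduces that ordering with a plain object in place of `PentestError` so it runs on its own; `classifyAssistantText`, `Detection`, and `FatalInfo` are hypothetical names.

```typescript
// Plain-object stand-in for PentestError; illustrative only.
interface FatalInfo {
  message: string;
  category: 'billing';
  retryable: boolean;
}

interface Detection {
  detected: boolean;
  fatal?: FatalInfo;
}

const BILLING_PATTERNS = [
  'spending cap',
  'spending limit',
  'cap reached',
  'budget exceeded',
  'usage limit',
];

function classifyAssistantText(content: string): Detection {
  if (!content) {
    return { detected: false };
  }
  const lower = content.toLowerCase();

  // 1. Spending cap: fatal for this attempt, but retryable after a long backoff.
  if (BILLING_PATTERNS.some((p) => lower.includes(p))) {
    return {
      detected: true,
      fatal: {
        message: `Billing limit reached: ${content.slice(0, 100)}`,
        category: 'billing',
        retryable: true,
      },
    };
  }

  // 2. Session limit: fatal and not retryable.
  if (lower.includes('session limit reached')) {
    return {
      detected: true,
      fatal: { message: 'Session limit reached', category: 'billing', retryable: false },
    };
  }

  // 3. Generic API errors: flagged, but execution continues.
  if (lower.includes('api error') || lower.includes('terminated')) {
    return { detected: true };
  }

  return { detected: false };
}
```

Under this ordering a message such as "Usage limit reached" is treated as a retryable billing error, while "Session limit reached" stays non-retryable because it matches none of the billing patterns.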
||||||
|
|
||||||
|
export function handleAssistantMessage(
|
||||||
|
message: AssistantMessage,
|
||||||
|
turnCount: number
|
||||||
|
): AssistantResult {
|
||||||
|
const content = extractMessageContent(message);
|
||||||
|
const cleanedContent = filterJsonToolCalls(content);
|
||||||
|
const errorDetection = detectApiError(content);
|
||||||
|
|
||||||
|
const result: AssistantResult = {
|
||||||
|
content,
|
||||||
|
cleanedContent,
|
||||||
|
apiErrorDetected: errorDetection.detected,
|
||||||
|
logData: {
|
||||||
|
turn: turnCount,
|
||||||
|
content,
|
||||||
|
timestamp: formatTimestamp(),
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
// Only add shouldThrow if it exists (exactOptionalPropertyTypes compliance)
|
||||||
|
if (errorDetection.shouldThrow) {
|
||||||
|
result.shouldThrow = errorDetection.shouldThrow;
|
||||||
|
}
|
||||||
|
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Final message of a query with cost/duration info
|
||||||
|
export function handleResultMessage(message: ResultMessage): ResultData {
|
||||||
|
const result: ResultData = {
|
||||||
|
result: message.result || null,
|
||||||
|
cost: message.total_cost_usd || 0,
|
||||||
|
duration_ms: message.duration_ms || 0,
|
||||||
|
permissionDenials: message.permission_denials?.length || 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
// Only add subtype if it exists (exactOptionalPropertyTypes compliance)
|
||||||
|
if (message.subtype) {
|
||||||
|
result.subtype = message.subtype;
|
||||||
|
}
|
||||||
|
|
||||||
|
return result;
|
||||||
|
}
|
||||||

export function handleToolUseMessage(message: ToolUseMessage): ToolUseData {
  return {
    toolName: message.name,
    parameters: message.input || {},
    timestamp: formatTimestamp(),
  };
}

// Truncates long results for display (500 char limit), preserves full content for logging
export function handleToolResultMessage(message: ToolResultMessage): ToolResultData {
  const content = message.content;
  const contentStr =
    typeof content === 'string' ? content : JSON.stringify(content, null, 2);

  const displayContent =
    contentStr.length > 500
      ? `${contentStr.slice(0, 500)}...\n[Result truncated - ${contentStr.length} total chars]`
      : contentStr;

  return {
    content,
    displayContent,
    timestamp: formatTimestamp(),
  };
}
|
||||||
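The display truncation in `handleToolResultMessage` can be exercised in isolation. The sketch below is a stand-alone approximation, not the commit's code: `stringifyContent`, `toDisplayString`, and `DISPLAY_LIMIT` are hypothetical names. It also guards one case the handler may not cover: `JSON.stringify(undefined)` returns `undefined` at runtime, so the sketch assumes an empty string is an acceptable fallback there.

```typescript
// Hypothetical constant; 500 matches the limit used by the handler in the diff.
const DISPLAY_LIMIT = 500;

// Converts arbitrary tool output to a string. JSON.stringify yields undefined
// for undefined input at runtime, so fall back to an empty string (assumption).
function stringifyContent(content: unknown): string {
  if (typeof content === 'string') {
    return content;
  }
  const json: string | undefined = JSON.stringify(content, null, 2);
  return json ?? '';
}

// Returns the text unchanged when it fits, otherwise the first `limit`
// characters followed by a note stating the original length.
function toDisplayString(content: unknown, limit: number = DISPLAY_LIMIT): string {
  const text = stringifyContent(content);
  return text.length > limit
    ? `${text.slice(0, limit)}...\n[Result truncated - ${text.length} total chars]`
    : text;
}
```

Keeping the full `content` for the audit log and only shortening the console copy, as the handler does, means truncation never loses data that a later investigation might need.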

// Output helper for console logging
function outputLines(lines: string[]): void {
  for (const line of lines) {
    console.log(line);
  }
}

// Message dispatch result types
export type MessageDispatchAction =
  | { type: 'continue'; apiErrorDetected?: boolean }
  | { type: 'complete'; result: string | null; cost: number }
  | { type: 'throw'; error: Error };

export interface MessageDispatchDeps {
  execContext: ExecutionContext;
  description: string;
  colorFn: ChalkInstance;
  progress: ProgressManager;
  auditLogger: AuditLogger;
}

// Dispatches SDK messages to appropriate handlers and formatters
export async function dispatchMessage(
  message: { type: string; subtype?: string },
  turnCount: number,
  deps: MessageDispatchDeps
): Promise<MessageDispatchAction> {
  const { execContext, description, colorFn, progress, auditLogger } = deps;

  switch (message.type) {
    case 'assistant': {
      const assistantResult = handleAssistantMessage(message as AssistantMessage, turnCount);

      if (assistantResult.shouldThrow) {
        return { type: 'throw', error: assistantResult.shouldThrow };
      }

      if (assistantResult.cleanedContent.trim()) {
        progress.stop();
        outputLines(formatAssistantOutput(
          assistantResult.cleanedContent,
          execContext,
          turnCount,
          description,
          colorFn
        ));
        progress.start();
      }

      await auditLogger.logLlmResponse(turnCount, assistantResult.content);

      if (assistantResult.apiErrorDetected) {
        console.log(chalk.red(` API Error detected in assistant response`));
        return { type: 'continue', apiErrorDetected: true };
      }

      return { type: 'continue' };
    }

    case 'system': {
      if (message.subtype === 'init' && !execContext.useCleanOutput) {
        const initMsg = message as SystemInitMessage;
        console.log(chalk.blue(` Model: ${initMsg.model}, Permission: ${initMsg.permissionMode}`));
        if (initMsg.mcp_servers && initMsg.mcp_servers.length > 0) {
          const mcpStatus = initMsg.mcp_servers.map(s => `${s.name}(${s.status})`).join(', ');
          console.log(chalk.blue(` MCP: ${mcpStatus}`));
        }
      }
      return { type: 'continue' };
    }

    case 'user':
      return { type: 'continue' };

    case 'tool_use': {
      const toolData = handleToolUseMessage(message as unknown as ToolUseMessage);
      outputLines(formatToolUseOutput(toolData.toolName, toolData.parameters));
      await auditLogger.logToolStart(toolData.toolName, toolData.parameters);
      return { type: 'continue' };
    }

    case 'tool_result': {
      const toolResultData = handleToolResultMessage(message as unknown as ToolResultMessage);
      outputLines(formatToolResultOutput(toolResultData.displayContent));
      await auditLogger.logToolEnd(toolResultData.content);
      return { type: 'continue' };
    }

    case 'result': {
      const resultData = handleResultMessage(message as ResultMessage);
      outputLines(formatResultOutput(resultData, !execContext.useCleanOutput));
      costResults.agents[execContext.agentKey] = resultData.cost;
      costResults.total += resultData.cost;
      return { type: 'complete', result: resultData.result, cost: resultData.cost };
    }

    default:
      console.log(chalk.gray(` ${message.type}: ${JSON.stringify(message, null, 2)}`));
      return { type: 'continue' };
  }
}
|
||||||
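`dispatchMessage` returns a discriminated union rather than mutating loop state, which leaves the caller to decide what each action means. The sketch below shows one way a caller might fold those actions into its own state. `LoopState` and `applyAction` are hypothetical names, and the actual executor loop in this commit may be structured differently; only the `MessageDispatchAction` union is copied from the diff.

```typescript
// Same discriminated union as in the diff above.
type MessageDispatchAction =
  | { type: 'continue'; apiErrorDetected?: boolean }
  | { type: 'complete'; result: string | null; cost: number }
  | { type: 'throw'; error: Error };

// Hypothetical caller-side state for a message-processing loop.
interface LoopState {
  result: string | null;
  cost: number;
  apiErrorDetected: boolean;
  done: boolean;
}

// Folds one dispatch action into the loop state. The `never` assignment in the
// default branch makes the compiler flag any union member left unhandled.
function applyAction(state: LoopState, action: MessageDispatchAction): LoopState {
  switch (action.type) {
    case 'throw':
      throw action.error;
    case 'complete':
      return { ...state, result: action.result, cost: action.cost, done: true };
    case 'continue':
      return {
        ...state,
        apiErrorDetected: state.apiErrorDetected || action.apiErrorDetected === true,
      };
    default: {
      const unreachable: never = action;
      return unreachable;
    }
  }
}
```

A caller would typically start from `{ result: null, cost: 0, apiErrorDetected: false, done: false }`, apply each action as messages stream in, and leave the loop once `done` is true.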
@@ -0,0 +1,169 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Pure functions for formatting console output

import chalk from 'chalk';
import { extractAgentType, formatDuration } from '../utils/formatting.js';
import { getAgentPrefix } from '../utils/output-formatter.js';
import type { ExecutionContext, ResultData } from './types.js';

export function detectExecutionContext(description: string): ExecutionContext {
  const isParallelExecution =
    description.includes('vuln agent') || description.includes('exploit agent');

  const useCleanOutput =
    description.includes('Pre-recon agent') ||
    description.includes('Recon agent') ||
    description.includes('Executive Summary and Report Cleanup') ||
    description.includes('vuln agent') ||
    description.includes('exploit agent');

  const agentType = extractAgentType(description);

  const agentKey = description.toLowerCase().replace(/\s+/g, '-');

  return { isParallelExecution, useCleanOutput, agentType, agentKey };
}

export function formatAssistantOutput(
  cleanedContent: string,
  context: ExecutionContext,
  turnCount: number,
  description: string,
  colorFn: typeof chalk.cyan = chalk.cyan
): string[] {
  if (!cleanedContent.trim()) {
    return [];
  }

  const lines: string[] = [];

  if (context.isParallelExecution) {
    // Compact output for parallel agents with prefixes
    const prefix = getAgentPrefix(description);
    lines.push(colorFn(`${prefix} ${cleanedContent}`));
  } else {
    // Full turn output for sequential agents
    lines.push(colorFn(`\n Turn ${turnCount} (${description}):`));
    lines.push(colorFn(` ${cleanedContent}`));
  }

  return lines;
}

export function formatResultOutput(data: ResultData, showFullResult: boolean): string[] {
  const lines: string[] = [];

  lines.push(chalk.magenta(`\n COMPLETED:`));
  lines.push(
    chalk.gray(
      ` Duration: ${(data.duration_ms / 1000).toFixed(1)}s, Cost: $${data.cost.toFixed(4)}`
    )
  );

  if (data.subtype === 'error_max_turns') {
    lines.push(chalk.red(` Stopped: Hit maximum turns limit`));
  } else if (data.subtype === 'error_during_execution') {
    lines.push(chalk.red(` Stopped: Execution error`));
  }

  if (data.permissionDenials > 0) {
    lines.push(chalk.yellow(` ${data.permissionDenials} permission denials`));
  }

  if (showFullResult && data.result && typeof data.result === 'string') {
    if (data.result.length > 1000) {
      lines.push(chalk.magenta(` ${data.result.slice(0, 1000)}... [${data.result.length} total chars]`));
    } else {
      lines.push(chalk.magenta(` ${data.result}`));
    }
  }

  return lines;
}

export function formatErrorOutput(
  error: Error & { code?: string; status?: number },
  context: ExecutionContext,
  description: string,
  duration: number,
  sourceDir: string,
  isRetryable: boolean
): string[] {
  const lines: string[] = [];

  if (context.isParallelExecution) {
    const prefix = getAgentPrefix(description);
    lines.push(chalk.red(`${prefix} Failed (${formatDuration(duration)})`));
  } else if (context.useCleanOutput) {
    lines.push(chalk.red(`${context.agentType} failed (${formatDuration(duration)})`));
  } else {
    lines.push(chalk.red(` Claude Code failed: ${description} (${formatDuration(duration)})`));
  }

  lines.push(chalk.red(` Error Type: ${error.constructor.name}`));
  lines.push(chalk.red(` Message: ${error.message}`));
  lines.push(chalk.gray(` Agent: ${description}`));
  lines.push(chalk.gray(` Working Directory: ${sourceDir}`));
  lines.push(chalk.gray(` Retryable: ${isRetryable ? 'Yes' : 'No'}`));

  if (error.code) {
    lines.push(chalk.gray(` Error Code: ${error.code}`));
  }
  if (error.status) {
    lines.push(chalk.gray(` HTTP Status: ${error.status}`));
  }

  return lines;
}

export function formatCompletionMessage(
  context: ExecutionContext,
  description: string,
  turnCount: number,
  duration: number
): string {
  if (context.isParallelExecution) {
    const prefix = getAgentPrefix(description);
    return chalk.green(`${prefix} Complete (${turnCount} turns, ${formatDuration(duration)})`);
  }

  if (context.useCleanOutput) {
    return chalk.green(
      `${context.agentType.charAt(0).toUpperCase() + context.agentType.slice(1)} complete! (${turnCount} turns, ${formatDuration(duration)})`
    );
  }

  return chalk.green(
    ` Claude Code completed: ${description} (${turnCount} turns) in ${formatDuration(duration)}`
  );
}

export function formatToolUseOutput(
  toolName: string,
  input: Record<string, unknown> | undefined
): string[] {
  const lines: string[] = [];

  lines.push(chalk.yellow(`\n Using Tool: ${toolName}`));
  if (input && Object.keys(input).length > 0) {
    lines.push(chalk.gray(` Input: ${JSON.stringify(input, null, 2)}`));
  }

  return lines;
}

export function formatToolResultOutput(displayContent: string): string[] {
  const lines: string[] = [];

  lines.push(chalk.green(` Tool Result:`));
  if (displayContent) {
    lines.push(chalk.gray(` ${displayContent}`));
  }

  return lines;
}
|
||||||
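`detectExecutionContext` derives its flags from substring matches on the agent description, which makes the result sensitive to exact casing. The sketch below reproduces only the parts that need no other module. `describeContext` is a hypothetical name, `agentType` is left out because `extractAgentType` is defined elsewhere, and the sample descriptions in the usage note are assumptions rather than strings taken from this commit.

```typescript
// Stand-alone approximation of part of detectExecutionContext from the diff.
interface ContextSketch {
  isParallelExecution: boolean;
  agentKey: string;
}

function describeContext(description: string): ContextSketch {
  // Case-sensitive substring checks, as in the original function.
  const isParallelExecution =
    description.includes('vuln agent') || description.includes('exploit agent');

  // Lowercase the description and collapse each whitespace run into one hyphen.
  const agentKey = description.toLowerCase().replace(/\s+/g, '-');

  return { isParallelExecution, agentKey };
}
```

Under these assumptions, `describeContext('XSS vuln agent')` is treated as parallel with key `xss-vuln-agent`, while `describeContext('XSS Vuln Agent')` produces the same key but is not treated as parallel, because the substring match is case-sensitive.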
@@ -0,0 +1,76 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Null Object pattern for progress indicator - callers never check for null

import { ProgressIndicator } from '../progress-indicator.js';
import { extractAgentType } from '../utils/formatting.js';

export interface ProgressContext {
  description: string;
  useCleanOutput: boolean;
}

export interface ProgressManager {
  start(): void;
  stop(): void;
  finish(message: string): void;
  isActive(): boolean;
}

class RealProgressManager implements ProgressManager {
  private indicator: ProgressIndicator;
  private active: boolean = false;

  constructor(message: string) {
    this.indicator = new ProgressIndicator(message);
  }

  start(): void {
    this.indicator.start();
    this.active = true;
  }

  stop(): void {
    this.indicator.stop();
    this.active = false;
  }

  finish(message: string): void {
    this.indicator.finish(message);
    this.active = false;
  }

  isActive(): boolean {
    return this.active;
  }
}

/** Null Object implementation - all methods are safe no-ops */
class NullProgressManager implements ProgressManager {
  start(): void {}

  stop(): void {}

  finish(_message: string): void {}

  isActive(): boolean {
    return false;
  }
}

// Returns no-op when disabled
export function createProgressManager(
  context: ProgressContext,
  disableLoader: boolean
): ProgressManager {
  if (!context.useCleanOutput || disableLoader) {
    return new NullProgressManager();
  }

  const agentType = extractAgentType(context.description);
  return new RealProgressManager(`Running ${agentType}...`);
}
|
||||||
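The benefit of the Null Object pattern used in this file is that calling code looks the same whether or not a spinner is active. The sketch below shows that property with a simplified interface. `Progress`, `RecordingProgress`, `NullProgress`, `createProgress`, and `printWithPause` are hypothetical names; `RecordingProgress` stands in for the real indicator so the behavior can be observed without a terminal.

```typescript
// Simplified version of the ProgressManager interface from the diff.
interface Progress {
  start(): void;
  stop(): void;
  isActive(): boolean;
}

// Hypothetical "real" implementation that records calls instead of drawing.
class RecordingProgress implements Progress {
  readonly events: string[] = [];
  private active = false;

  start(): void {
    this.events.push('start');
    this.active = true;
  }

  stop(): void {
    this.events.push('stop');
    this.active = false;
  }

  isActive(): boolean {
    return this.active;
  }
}

// Null Object: every method is a safe no-op.
class NullProgress implements Progress {
  start(): void {}
  stop(): void {}
  isActive(): boolean {
    return false;
  }
}

// Factory picks the variant once, so callers never branch on it again.
function createProgress(enabled: boolean): Progress {
  return enabled ? new RecordingProgress() : new NullProgress();
}

// Pauses the indicator around a print, with no null checks in the caller.
function printWithPause(progress: Progress, print: () => void): void {
  progress.stop();
  print();
  progress.start();
}
```

This mirrors how `dispatchMessage` calls `progress.stop()` and `progress.start()` around assistant output without knowing which implementation it was handed.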
@@ -0,0 +1,134 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Type definitions for Claude executor message processing pipeline

export interface ExecutionContext {
  isParallelExecution: boolean;
  useCleanOutput: boolean;
  agentType: string;
  agentKey: string;
}

export interface ProcessingState {
  turnCount: number;
  result: string | null;
  apiErrorDetected: boolean;
  totalCost: number;
  partialCost: number;
  lastHeartbeat: number;
}

export interface ProcessingResult {
  result: string | null;
  turnCount: number;
  apiErrorDetected: boolean;
  totalCost: number;
}

export interface AssistantResult {
  content: string;
  cleanedContent: string;
  apiErrorDetected: boolean;
  shouldThrow?: Error;
  logData: {
    turn: number;
    content: string;
    timestamp: string;
  };
}

export interface ResultData {
  result: string | null;
  cost: number;
  duration_ms: number;
  subtype?: string;
  permissionDenials: number;
}

export interface ToolUseData {
  toolName: string;
  parameters: Record<string, unknown>;
  timestamp: string;
}

export interface ToolResultData {
  content: unknown;
  displayContent: string;
  timestamp: string;
}

export interface ContentBlock {
  type?: string;
  text?: string;
}

export interface AssistantMessage {
  type: 'assistant';
  message: {
    content: ContentBlock[] | string;
  };
}

export interface ResultMessage {
  type: 'result';
  result?: string;
  total_cost_usd?: number;
  duration_ms?: number;
  subtype?: string;
  permission_denials?: unknown[];
}

export interface ToolUseMessage {
  type: 'tool_use';
  name: string;
  input?: Record<string, unknown>;
}

export interface ToolResultMessage {
  type: 'tool_result';
  content?: unknown;
}

export interface ApiErrorDetection {
  detected: boolean;
  shouldThrow?: Error;
}

// Message types from SDK stream
export type SdkMessage =
  | AssistantMessage
  | ResultMessage
  | ToolUseMessage
  | ToolResultMessage
  | SystemInitMessage
  | UserMessage;

export interface SystemInitMessage {
  type: 'system';
  subtype: 'init';
  model?: string;
  permissionMode?: string;
  mcp_servers?: Array<{ name: string; status: string }>;
}

export interface UserMessage {
  type: 'user';
}

// Dispatch result types for message processing
export type MessageDispatchResult =
  | { action: 'continue' }
  | { action: 'break'; result: string | null; cost: number }
  | { action: 'throw'; error: Error };

export interface MessageDispatchContext {
  turnCount: number;
  execContext: ExecutionContext;
  description: string;
  colorFn: (text: string) => string;
  useCleanOutput: boolean;
}
|
||||||
@@ -12,8 +12,10 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
import { AgentLogger } from './logger.js';
|
import { AgentLogger } from './logger.js';
|
||||||
|
import { WorkflowLogger, type AgentLogDetails, type WorkflowSummary } from './workflow-logger.js';
|
||||||
import { MetricsTracker } from './metrics-tracker.js';
|
import { MetricsTracker } from './metrics-tracker.js';
|
||||||
import { initializeAuditStructure, formatTimestamp, type SessionMetadata } from './utils.js';
|
import { initializeAuditStructure, type SessionMetadata } from './utils.js';
|
||||||
|
import { formatTimestamp } from '../utils/formatting.js';
|
||||||
import { SessionMutex } from '../utils/concurrency.js';
|
import { SessionMutex } from '../utils/concurrency.js';
|
||||||
|
|
||||||
// Global mutex instance
|
// Global mutex instance
|
||||||
@@ -36,7 +38,9 @@ export class AuditSession {
|
|||||||
private sessionMetadata: SessionMetadata;
|
private sessionMetadata: SessionMetadata;
|
||||||
private sessionId: string;
|
private sessionId: string;
|
||||||
private metricsTracker: MetricsTracker;
|
private metricsTracker: MetricsTracker;
|
||||||
|
private workflowLogger: WorkflowLogger;
|
||||||
private currentLogger: AgentLogger | null = null;
|
private currentLogger: AgentLogger | null = null;
|
||||||
|
private currentAgentName: string | null = null;
|
||||||
private initialized: boolean = false;
|
private initialized: boolean = false;
|
||||||
|
|
||||||
constructor(sessionMetadata: SessionMetadata) {
|
constructor(sessionMetadata: SessionMetadata) {
|
||||||
@@ -53,6 +57,7 @@ export class AuditSession {
|
|||||||
|
|
||||||
// Components
|
// Components
|
||||||
this.metricsTracker = new MetricsTracker(sessionMetadata);
|
this.metricsTracker = new MetricsTracker(sessionMetadata);
|
||||||
|
this.workflowLogger = new WorkflowLogger(sessionMetadata);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -70,6 +75,9 @@ export class AuditSession {
|
|||||||
// Initialize metrics tracker (loads or creates session.json)
|
// Initialize metrics tracker (loads or creates session.json)
|
||||||
await this.metricsTracker.initialize();
|
await this.metricsTracker.initialize();
|
||||||
|
|
||||||
|
// Initialize workflow logger
|
||||||
|
await this.workflowLogger.initialize();
|
||||||
|
|
||||||
this.initialized = true;
|
this.initialized = true;
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -97,6 +105,9 @@ export class AuditSession {
|
|||||||
await AgentLogger.savePrompt(this.sessionMetadata, agentName, promptContent);
|
await AgentLogger.savePrompt(this.sessionMetadata, agentName, promptContent);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Track current agent name for workflow logging
|
||||||
|
this.currentAgentName = agentName;
|
||||||
|
|
||||||
// Create and initialize logger for this attempt
|
// Create and initialize logger for this attempt
|
||||||
this.currentLogger = new AgentLogger(this.sessionMetadata, agentName, attemptNumber);
|
this.currentLogger = new AgentLogger(this.sessionMetadata, agentName, attemptNumber);
|
||||||
await this.currentLogger.initialize();
|
await this.currentLogger.initialize();
|
||||||
@@ -110,6 +121,9 @@ export class AuditSession {
|
|||||||
attemptNumber,
|
attemptNumber,
|
||||||
timestamp: formatTimestamp(),
|
timestamp: formatTimestamp(),
|
||||||
});
|
});
|
||||||
|
|
||||||
|
// Log to unified workflow log
|
||||||
|
await this.workflowLogger.logAgent(agentName, 'start', { attemptNumber });
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -120,7 +134,30 @@ export class AuditSession {
|
|||||||
throw new Error('No active logger. Call startAgent() first.');
|
throw new Error('No active logger. Call startAgent() first.');
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Log to agent-specific log file (JSON format)
|
||||||
await this.currentLogger.logEvent(eventType, eventData);
|
await this.currentLogger.logEvent(eventType, eventData);
|
||||||
|
|
||||||
|
// Also log to unified workflow log (human-readable format)
|
||||||
|
const data = eventData as Record<string, unknown>;
|
||||||
|
const agentName = this.currentAgentName || 'unknown';
|
||||||
|
switch (eventType) {
|
||||||
|
case 'tool_start':
|
||||||
|
await this.workflowLogger.logToolStart(
|
||||||
|
agentName,
|
||||||
|
String(data.toolName || ''),
|
||||||
|
data.parameters
|
||||||
|
);
|
||||||
|
break;
|
||||||
|
case 'llm_response':
|
||||||
|
await this.workflowLogger.logLlmResponse(
|
||||||
|
agentName,
|
||||||
|
Number(data.turn || 0),
|
||||||
|
String(data.content || '')
|
||||||
|
);
|
||||||
|
break;
|
||||||
|
// tool_end and error events are intentionally not logged to workflow log
|
||||||
|
// to reduce noise - the agent completion message captures the outcome
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -142,10 +179,23 @@ export class AuditSession {
|
|||||||
this.currentLogger = null;
|
this.currentLogger = null;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Reset current agent name
|
||||||
|
this.currentAgentName = null;
|
||||||
|
|
||||||
|
// Log to unified workflow log
|
||||||
|
const agentLogDetails: AgentLogDetails = {
|
||||||
|
attemptNumber: result.attemptNumber,
|
||||||
|
duration_ms: result.duration_ms,
|
||||||
|
cost_usd: result.cost_usd,
|
||||||
|
success: result.success,
|
||||||
|
...(result.error !== undefined && { error: result.error }),
|
||||||
|
};
|
||||||
|
await this.workflowLogger.logAgent(agentName, 'end', agentLogDetails);
|
||||||
|
|
||||||
// Mutex-protected update to session.json
|
// Mutex-protected update to session.json
|
||||||
const unlock = await sessionMutex.lock(this.sessionId);
|
const unlock = await sessionMutex.lock(this.sessionId);
|
||||||
try {
|
try {
|
||||||
// Reload metrics (in case of parallel updates)
|
// Reload inside mutex to prevent lost updates during parallel exploitation phase
|
||||||
await this.metricsTracker.reload();
|
await this.metricsTracker.reload();
|
||||||
|
|
||||||
// Update metrics
|
// Update metrics
|
||||||
@@ -177,4 +227,28 @@ export class AuditSession {
|
|||||||
await this.ensureInitialized();
|
await this.ensureInitialized();
|
||||||
return this.metricsTracker.getMetrics();
|
return this.metricsTracker.getMetrics();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Log phase start to unified workflow log
|
||||||
|
*/
|
||||||
|
async logPhaseStart(phase: string): Promise<void> {
|
||||||
|
await this.ensureInitialized();
|
||||||
|
await this.workflowLogger.logPhase(phase, 'start');
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Log phase completion to unified workflow log
|
||||||
|
*/
|
||||||
|
async logPhaseComplete(phase: string): Promise<void> {
|
||||||
|
await this.ensureInitialized();
|
||||||
|
await this.workflowLogger.logPhase(phase, 'complete');
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Log workflow completion to unified workflow log
|
||||||
|
*/
|
||||||
|
async logWorkflowComplete(summary: WorkflowSummary): Promise<void> {
|
||||||
|
await this.ensureInitialized();
|
||||||
|
await this.workflowLogger.logWorkflowComplete(summary);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -18,5 +18,6 @@
|
|||||||
|
|
||||||
export { AuditSession } from './audit-session.js';
|
export { AuditSession } from './audit-session.js';
|
||||||
export { AgentLogger } from './logger.js';
|
export { AgentLogger } from './logger.js';
|
||||||
|
export { WorkflowLogger } from './workflow-logger.js';
|
||||||
export { MetricsTracker } from './metrics-tracker.js';
|
export { MetricsTracker } from './metrics-tracker.js';
|
||||||
export * as AuditUtils from './utils.js';
|
export * as AuditUtils from './utils.js';
|
||||||
|
|||||||
+4
-13
@@ -15,10 +15,10 @@ import fs from 'fs';
|
|||||||
import {
|
import {
|
||||||
generateLogPath,
|
generateLogPath,
|
||||||
generatePromptPath,
|
generatePromptPath,
|
||||||
atomicWrite,
|
|
||||||
formatTimestamp,
|
|
||||||
type SessionMetadata,
|
type SessionMetadata,
|
||||||
} from './utils.js';
|
} from './utils.js';
|
||||||
|
import { atomicWrite } from '../utils/file-io.js';
|
||||||
|
import { formatTimestamp } from '../utils/formatting.js';
|
||||||
|
|
||||||
interface LogEvent {
|
interface LogEvent {
|
||||||
type: string;
|
type: string;
|
||||||
@@ -96,22 +96,13 @@ export class AgentLogger {
|
|||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Write and flush immediately (crash-safe)
|
|
||||||
const needsDrain = !this.stream.write(text, 'utf8', (error) => {
|
const needsDrain = !this.stream.write(text, 'utf8', (error) => {
|
||||||
if (error) {
|
if (error) reject(error);
|
||||||
reject(error);
|
|
||||||
}
|
|
||||||
});
|
});
|
||||||
|
|
||||||
if (needsDrain) {
|
if (needsDrain) {
|
||||||
// Buffer is full, wait for drain
|
this.stream.once('drain', resolve);
|
||||||
const drainHandler = (): void => {
|
|
||||||
this.stream!.removeListener('drain', drainHandler);
|
|
||||||
resolve();
|
|
||||||
};
|
|
||||||
this.stream.once('drain', drainHandler);
|
|
||||||
} else {
|
} else {
|
||||||
// Buffer has space, resolve immediately
|
|
||||||
resolve();
|
resolve();
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|||||||
@@ -13,13 +13,12 @@
|
|||||||
|
|
||||||
import {
|
import {
|
||||||
generateSessionJsonPath,
|
generateSessionJsonPath,
|
||||||
atomicWrite,
|
|
||||||
readJson,
|
|
||||||
fileExists,
|
|
||||||
formatTimestamp,
|
|
||||||
calculatePercentage,
|
|
||||||
type SessionMetadata,
|
type SessionMetadata,
|
||||||
} from './utils.js';
|
} from './utils.js';
|
||||||
|
import { atomicWrite, readJson, fileExists } from '../utils/file-io.js';
|
||||||
|
import { formatTimestamp, calculatePercentage } from '../utils/formatting.js';
|
||||||
|
import { AGENT_PHASE_MAP, type PhaseName } from '../session-manager.js';
|
||||||
|
import type { AgentName } from '../types/index.js';
|
||||||
|
|
||||||
interface AttemptData {
|
interface AttemptData {
|
||||||
attempt_number: number;
|
attempt_number: number;
|
||||||
@@ -152,16 +151,14 @@ export class MetricsTracker {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Initialize agent metrics if not exists
|
// Initialize agent metrics if not exists
|
||||||
if (!this.data.metrics.agents[agentName]) {
|
const existingAgent = this.data.metrics.agents[agentName];
|
||||||
this.data.metrics.agents[agentName] = {
|
const agent = existingAgent ?? {
|
||||||
status: 'in-progress',
|
status: 'in-progress' as const,
|
||||||
attempts: [],
|
attempts: [],
|
||||||
final_duration_ms: 0,
|
final_duration_ms: 0,
|
||||||
total_cost_usd: 0,
|
total_cost_usd: 0,
|
||||||
};
|
};
|
||||||
}
|
this.data.metrics.agents[agentName] = agent;
|
||||||
|
|
||||||
const agent = this.data.metrics.agents[agentName]!;
|
|
||||||
|
|
||||||
// Add attempt to array
|
// Add attempt to array
|
||||||
const attempt: AttemptData = {
|
const attempt: AttemptData = {
|
||||||
@@ -255,36 +252,19 @@ export class MetricsTracker {
|
|||||||
private calculatePhaseMetrics(
|
private calculatePhaseMetrics(
|
||||||
successfulAgents: Array<[string, AgentMetrics]>
|
successfulAgents: Array<[string, AgentMetrics]>
|
||||||
): Record<string, PhaseMetrics> {
|
): Record<string, PhaseMetrics> {
|
||||||
const phases: Record<string, AgentMetrics[]> = {
|
const phases: Record<PhaseName, AgentMetrics[]> = {
|
||||||
'pre-recon': [],
|
'pre-recon': [],
|
||||||
recon: [],
|
'recon': [],
|
||||||
'vulnerability-analysis': [],
|
'vulnerability-analysis': [],
|
||||||
exploitation: [],
|
'exploitation': [],
|
||||||
reporting: [],
|
'reporting': [],
|
||||||
};
|
};
|
||||||
|
|
||||||
// Map agents to phases
|
// Group agents by phase using imported AGENT_PHASE_MAP
|
||||||
const agentPhaseMap: Record<string, string> = {
|
|
||||||
'pre-recon': 'pre-recon',
|
|
||||||
recon: 'recon',
|
|
||||||
'injection-vuln': 'vulnerability-analysis',
|
|
||||||
'xss-vuln': 'vulnerability-analysis',
|
|
||||||
'auth-vuln': 'vulnerability-analysis',
|
|
||||||
'authz-vuln': 'vulnerability-analysis',
|
|
||||||
'ssrf-vuln': 'vulnerability-analysis',
|
|
||||||
'injection-exploit': 'exploitation',
|
|
||||||
'xss-exploit': 'exploitation',
|
|
||||||
'auth-exploit': 'exploitation',
|
|
||||||
'authz-exploit': 'exploitation',
|
|
||||||
'ssrf-exploit': 'exploitation',
|
|
||||||
report: 'reporting',
|
|
||||||
};
|
|
||||||
|
|
||||||
// Group agents by phase
|
|
||||||
for (const [agentName, agentData] of successfulAgents) {
|
for (const [agentName, agentData] of successfulAgents) {
|
||||||
const phase = agentPhaseMap[agentName];
|
const phase = AGENT_PHASE_MAP[agentName as AgentName];
|
||||||
if (phase && phases[phase]) {
|
if (phase) {
|
||||||
phases[phase]!.push(agentData);
|
phases[phase].push(agentData);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -296,7 +276,6 @@ export class MetricsTracker {
|
|||||||
if (agentList.length === 0) continue;
|
if (agentList.length === 0) continue;
|
||||||
|
|
||||||
const phaseDuration = agentList.reduce((sum, agent) => sum + agent.final_duration_ms, 0);
|
const phaseDuration = agentList.reduce((sum, agent) => sum + agent.final_duration_ms, 0);
|
||||||
|
|
||||||
const phaseCost = agentList.reduce((sum, agent) => sum + agent.total_cost_usd, 0);
|
const phaseCost = agentList.reduce((sum, agent) => sum + agent.total_cost_usd, 0);
|
||||||
|
|
||||||
phaseMetrics[phaseName] = {
|
phaseMetrics[phaseName] = {
|
||||||
|
|||||||
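The `calculatePhaseMetrics` change above replaces an inline lookup table with an imported `AGENT_PHASE_MAP` and tightens the bucket type to `Record<PhaseName, ...>`, so every phase key is known to exist and the non-null assertion can go. The sketch below illustrates that grouping step in isolation. `AGENT_PHASES`, `AgentMetricsSketch`, and `groupByPhase` are hypothetical names; the map entries are a subset copied from the inline table this diff removes, and the real `AGENT_PHASE_MAP` in `session-manager` is not shown here and may differ.

```typescript
type PhaseName =
  | 'pre-recon'
  | 'recon'
  | 'vulnerability-analysis'
  | 'exploitation'
  | 'reporting';

// Assumed subset of the agent-to-phase mapping (see lead-in).
const AGENT_PHASES: ReadonlyMap<string, PhaseName> = new Map<string, PhaseName>([
  ['pre-recon', 'pre-recon'],
  ['recon', 'recon'],
  ['xss-vuln', 'vulnerability-analysis'],
  ['xss-exploit', 'exploitation'],
  ['report', 'reporting'],
]);

// Minimal stand-in for the AgentMetrics fields used by the phase totals.
interface AgentMetricsSketch {
  final_duration_ms: number;
  total_cost_usd: number;
}

// Buckets agents by phase. Every PhaseName key is initialized up front, so
// indexing by a PhaseName never yields undefined.
function groupByPhase(
  agents: Array<[string, AgentMetricsSketch]>
): Record<PhaseName, AgentMetricsSketch[]> {
  const phases: Record<PhaseName, AgentMetricsSketch[]> = {
    'pre-recon': [],
    'recon': [],
    'vulnerability-analysis': [],
    'exploitation': [],
    'reporting': [],
  };

  for (const [agentName, data] of agents) {
    const phase = AGENT_PHASES.get(agentName);
    if (phase) {
      phases[phase].push(data); // unknown agent names are skipped
    }
  }

  return phases;
}
```

Using a `Map` here is a choice made for the sketch: `Map.get` returns `PhaseName | undefined`, which keeps the unknown-agent check honest without relying on compiler flags.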
+18
-4
@@ -31,12 +31,18 @@ export interface SessionMetadata {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Generate standardized session identifier: {hostname}_{sessionId}
|
* Extract and sanitize hostname from URL for use in identifiers
|
||||||
|
*/
|
||||||
|
export function sanitizeHostname(url: string): string {
|
||||||
|
return new URL(url).hostname.replace(/[^a-zA-Z0-9-]/g, '-');
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Generate standardized session identifier from workflow ID
|
||||||
|
* Workflow IDs already contain hostname, so we use them directly
|
||||||
*/
|
*/
|
||||||
export function generateSessionIdentifier(sessionMetadata: SessionMetadata): string {
|
export function generateSessionIdentifier(sessionMetadata: SessionMetadata): string {
|
||||||
const { id, webUrl } = sessionMetadata;
|
return sessionMetadata.id;
|
||||||
const hostname = new URL(webUrl).hostname.replace(/[^a-zA-Z0-9-]/g, '-');
|
|
||||||
return `${hostname}_${id}`;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -79,6 +85,14 @@ export function generateSessionJsonPath(sessionMetadata: SessionMetadata): strin
|
|||||||
return path.join(auditPath, 'session.json');
|
return path.join(auditPath, 'session.json');
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Generate path to workflow.log file
|
||||||
|
*/
|
||||||
|
export function generateWorkflowLogPath(sessionMetadata: SessionMetadata): string {
|
||||||
|
const auditPath = generateAuditPath(sessionMetadata);
|
||||||
|
return path.join(auditPath, 'workflow.log');
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Ensure directory exists (idempotent, race-safe)
|
* Ensure directory exists (idempotent, race-safe)
|
||||||
*/
|
*/
|
||||||
|
|||||||
@@ -0,0 +1,382 @@
|
|||||||
|
// Copyright (C) 2025 Keygraph, Inc.
|
||||||
|
//
|
||||||
|
// This program is free software: you can redistribute it and/or modify
|
||||||
|
// it under the terms of the GNU Affero General Public License version 3
|
||||||
|
// as published by the Free Software Foundation.
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Workflow Logger
|
||||||
|
*
|
||||||
|
* Provides a unified, human-readable log file per workflow.
|
||||||
|
* Optimized for `tail -f` viewing during concurrent workflow execution.
|
||||||
|
*/
|
||||||
|
|
||||||
|
import fs from 'fs';
|
||||||
|
import path from 'path';
|
||||||
|
import { generateWorkflowLogPath, ensureDirectory, type SessionMetadata } from './utils.js';
|
||||||
|
import { formatDuration, formatTimestamp } from '../utils/formatting.js';
|
||||||
|
|
||||||
|
export interface AgentLogDetails {
|
||||||
|
attemptNumber?: number;
|
||||||
|
duration_ms?: number;
|
||||||
|
cost_usd?: number;
|
||||||
|
success?: boolean;
|
||||||
|
error?: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface AgentMetricsSummary {
|
||||||
|
durationMs: number;
|
||||||
|
costUsd: number | null;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface WorkflowSummary {
|
||||||
|
status: 'completed' | 'failed';
|
||||||
|
totalDurationMs: number;
|
||||||
|
totalCostUsd: number;
|
||||||
|
completedAgents: string[];
|
||||||
|
agentMetrics: Record<string, AgentMetricsSummary>;
|
||||||
|
error?: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* WorkflowLogger - Manages the unified workflow log file
|
||||||
|
*/
|
||||||
|
export class WorkflowLogger {
|
||||||
|
private sessionMetadata: SessionMetadata;
|
||||||
|
private logPath: string;
|
||||||
|
private stream: fs.WriteStream | null = null;
|
||||||
|
private initialized: boolean = false;
|
||||||
|
|
||||||
|
constructor(sessionMetadata: SessionMetadata) {
|
||||||
|
this.sessionMetadata = sessionMetadata;
|
||||||
|
this.logPath = generateWorkflowLogPath(sessionMetadata);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initialize the log stream (creates file and writes header)
|
||||||
|
*/
|
||||||
|
async initialize(): Promise<void> {
|
||||||
|
if (this.initialized) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Ensure directory exists
|
||||||
|
await ensureDirectory(path.dirname(this.logPath));
|
||||||
|
|
||||||
|
// Create write stream with append mode
|
||||||
|
this.stream = fs.createWriteStream(this.logPath, {
|
||||||
|
flags: 'a',
|
||||||
|
encoding: 'utf8',
|
||||||
|
autoClose: true,
|
||||||
|
});
|
||||||
|
|
||||||
|
this.initialized = true;
|
||||||
|
|
||||||
|
// Write header only if file is new (empty)
|
||||||
|
const stats = await fs.promises.stat(this.logPath).catch(() => null);
|
||||||
|
if (!stats || stats.size === 0) {
|
||||||
|
await this.writeHeader();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Write header to log file
|
||||||
|
*/
|
||||||
|
private async writeHeader(): Promise<void> {
|
||||||
|
const header = [
|
||||||
|
`================================================================================`,
|
||||||
|
`Shannon Pentest - Workflow Log`,
|
||||||
|
`================================================================================`,
|
||||||
|
`Workflow ID: ${this.sessionMetadata.id}`,
|
||||||
|
`Target URL: ${this.sessionMetadata.webUrl}`,
|
||||||
|
`Started: ${formatTimestamp()}`,
|
||||||
|
`================================================================================`,
|
||||||
|
``,
|
||||||
|
].join('\n');
|
||||||
|
|
||||||
|
return this.writeRaw(header);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Write raw text to the log stream, waiting for 'drain' when the internal buffer is full
|
||||||
|
*/
|
||||||
|
private writeRaw(text: string): Promise<void> {
|
||||||
|
return new Promise((resolve, reject) => {
|
||||||
|
if (!this.initialized || !this.stream) {
|
||||||
|
reject(new Error('WorkflowLogger not initialized'));
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const needsDrain = !this.stream.write(text, 'utf8', (error) => {
|
||||||
|
if (error) reject(error);
|
||||||
|
});
|
||||||
|
|
||||||
|
if (needsDrain) {
|
||||||
|
this.stream.once('drain', resolve);
|
||||||
|
} else {
|
||||||
|
resolve();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
}
|
||||||
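One observation on `writeRaw`: when `write()` returns `true` the promise resolves immediately, so a write error reported later through the callback hits an already-settled promise and the `reject` is a no-op. A possible alternative, sketched here under assumptions, is to settle only from the write callback. `WritableLike`, `writeAndWait`, and `FakeStream` are names invented for this sketch and are not part of the PR.

```typescript
// Structural type for the single call used below. Node's fs.WriteStream is
// expected to fit this shape (assumption, not verified against the PR).
interface WritableLike {
  write(chunk: string, encoding: 'utf8', cb: (error?: Error | null) => void): boolean;
}

// Settle from the write callback only, so the promise resolves or rejects
// exactly once and a late write error cannot be dropped.
function writeAndWait(stream: WritableLike, text: string): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    stream.write(text, 'utf8', (error) => (error ? reject(error) : resolve()));
  });
}

// In-memory stand-in for a stream (hypothetical, for demonstration only).
class FakeStream implements WritableLike {
  chunks: string[] = [];
  failOn: string | null = null;

  write(chunk: string, _encoding: 'utf8', cb: (error?: Error | null) => void): boolean {
    const error = chunk === this.failOn ? new Error('disk full') : null;
    if (!error) this.chunks.push(chunk);
    Promise.resolve().then(() => cb(error)); // report completion asynchronously
    return true;
  }
}

const demoStream = new FakeStream();
demoStream.failOn = 'bad\n';
writeAndWait(demoStream, 'ok\n')
  .then(() => writeAndWait(demoStream, 'bad\n'))
  .catch((e: Error) => console.log(`write failed: ${e.message}`));
```

Because every caller awaits the returned promise, writes stay ordered and at most one chunk per logger is in flight, which also bounds buffering without a separate `drain` listener.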
|
|
||||||
|
/**
|
||||||
|
* Format timestamp for log line (UTC, human readable)
|
||||||
|
*/
|
||||||
|
private formatLogTime(): string {
|
||||||
|
const now = new Date();
|
||||||
|
return now.toISOString().replace('T', ' ').slice(0, 19);
|
||||||
|
}
|
||||||
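For reference, a standalone version of `formatLogTime` with a fixed input makes the output shape concrete. The `now` parameter is an addition for this sketch so the result is reproducible; the method in the diff always uses the current time.

```typescript
// Same transformation as formatLogTime(): ISO-8601 string, 'T' replaced by a
// space, cut after the seconds. toISOString() always renders UTC.
function formatLogTime(now: Date = new Date()): string {
  return now.toISOString().replace('T', ' ').slice(0, 19);
}

// Fixed instant: 2025-01-15 09:30:05 UTC.
const fixedInstant = new Date(Date.UTC(2025, 0, 15, 9, 30, 5));
console.log(formatLogTime(fixedInstant)); // 2025-01-15 09:30:05
```

The result is always 19 characters wide, which keeps the `[timestamp] [TAG]` prefixes aligned when the file is followed with `tail -f`.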
|
|
||||||
|
/**
|
||||||
|
* Log a phase transition event
|
||||||
|
*/
|
||||||
|
async logPhase(phase: string, event: 'start' | 'complete'): Promise<void> {
|
||||||
|
await this.ensureInitialized();
|
||||||
|
|
||||||
|
const action = event === 'start' ? 'Starting' : 'Completed';
|
||||||
|
const line = `[${this.formatLogTime()}] [PHASE] ${action}: ${phase}\n`;
|
||||||
|
|
||||||
|
// Add blank line before phase start for readability
|
||||||
|
if (event === 'start') {
|
||||||
|
await this.writeRaw('\n');
|
||||||
|
}
|
||||||
|
|
||||||
|
await this.writeRaw(line);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Log an agent event
|
||||||
|
*/
|
||||||
|
async logAgent(
|
||||||
|
agentName: string,
|
||||||
|
event: 'start' | 'end',
|
||||||
|
details?: AgentLogDetails
|
||||||
|
): Promise<void> {
|
||||||
|
await this.ensureInitialized();
|
||||||
|
|
||||||
|
let message: string;
|
||||||
|
|
||||||
|
if (event === 'start') {
|
||||||
|
const attempt = details?.attemptNumber ?? 1;
|
||||||
|
message = `${agentName}: Starting (attempt ${attempt})`;
|
||||||
|
} else {
|
||||||
|
const parts: string[] = [agentName + ':'];
|
||||||
|
|
||||||
|
if (details?.success === false) {
|
||||||
|
parts.push('Failed');
|
||||||
|
if (details?.error) {
|
||||||
|
parts.push(`- ${details.error}`);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
parts.push('Completed');
|
||||||
|
}
|
||||||
|
|
||||||
|
if (details?.duration_ms !== undefined) {
|
||||||
|
parts.push(`(${formatDuration(details.duration_ms)}`);
|
||||||
|
if (details?.cost_usd !== undefined) {
|
||||||
|
parts.push(`$${details.cost_usd.toFixed(2)})`);
|
||||||
|
} else {
|
||||||
|
parts.push(')');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
message = parts.join(' ');
|
||||||
|
}
|
||||||
|
|
||||||
|
const line = `[${this.formatLogTime()}] [AGENT] ${message}\n`;
|
||||||
|
await this.writeRaw(line);
|
||||||
|
}
|
||||||
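To see what the `end` branch of `logAgent` writes, the message assembly can be pulled out into a pure function. This is a sketch: `buildEndMessage` is a hypothetical name, and `formatDurationStub` stands in for `formatDuration` from `../utils/formatting.js`, whose real output format is not shown in this diff.

```typescript
interface AgentLogDetails {
  duration_ms?: number;
  cost_usd?: number;
  success?: boolean;
  error?: string;
}

// Stand-in for the real formatDuration (assumption: seconds with one decimal).
const formatDurationStub = (ms: number): string => `${(ms / 1000).toFixed(1)}s`;

// Same assembly steps as the 'end' branch of logAgent.
function buildEndMessage(agentName: string, details?: AgentLogDetails): string {
  const parts: string[] = [agentName + ':'];

  if (details?.success === false) {
    parts.push('Failed');
    if (details?.error) parts.push(`- ${details.error}`);
  } else {
    parts.push('Completed');
  }

  if (details?.duration_ms !== undefined) {
    parts.push(`(${formatDurationStub(details.duration_ms)}`);
    parts.push(details?.cost_usd !== undefined ? `$${details.cost_usd.toFixed(2)})` : ')');
  }

  return parts.join(' ');
}

console.log(buildEndMessage('recon', { duration_ms: 1500, cost_usd: 0.12 }));
console.log(buildEndMessage('recon', { duration_ms: 1500 }));
console.log(buildEndMessage('exploit', { success: false, error: 'timeout' }));
```

With a duration but no cost, the closing parenthesis is pushed as its own part, so the joined line reads `recon: Completed (1.5s )` with a space before `)`. That is cosmetic, but worth knowing when grepping the log.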
|
|
||||||
|
/**
|
||||||
|
* Log a general event
|
||||||
|
*/
|
||||||
|
async logEvent(eventType: string, message: string): Promise<void> {
|
||||||
|
await this.ensureInitialized();
|
||||||
|
|
||||||
|
const line = `[${this.formatLogTime()}] [${eventType.toUpperCase()}] ${message}\n`;
|
||||||
|
await this.writeRaw(line);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Log an error
|
||||||
|
*/
|
||||||
|
async logError(error: Error, context?: string): Promise<void> {
|
||||||
|
await this.ensureInitialized();
|
||||||
|
|
||||||
|
const contextStr = context ? ` (${context})` : '';
|
||||||
|
const line = `[${this.formatLogTime()}] [ERROR] ${error.message}${contextStr}\n`;
|
||||||
|
await this.writeRaw(line);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Truncate string to max length with ellipsis
|
||||||
|
*/
|
||||||
|
private truncate(str: string, maxLen: number): string {
|
||||||
|
if (str.length <= maxLen) return str;
|
||||||
|
return str.slice(0, maxLen - 3) + '...';
|
||||||
|
}
|
||||||
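A quick check of `truncate` in isolation (same body as above, written as a free function for this sketch). The ellipsis counts toward `maxLen`, so the result never exceeds the limit for the limits used in this file (30 to 100).

```typescript
// Same logic as the private truncate() helper.
function truncate(str: string, maxLen: number): string {
  if (str.length <= maxLen) return str;
  return str.slice(0, maxLen - 3) + '...';
}

console.log(truncate('short', 10));               // unchanged: fits within the limit
console.log(truncate('nmap -sV -p- target', 10)); // nmap -s...
```

One edge to keep in mind: for `maxLen` below 3 the slice end becomes negative and the result can be longer than `maxLen`, so the helper assumes limits of at least 3.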
|
|
||||||
|
/**
|
||||||
|
* Format tool parameters for human-readable display
|
||||||
|
*/
|
||||||
|
private formatToolParams(toolName: string, params: unknown): string {
|
||||||
|
if (!params || typeof params !== 'object') {
|
||||||
|
return '';
|
||||||
|
}
|
||||||
|
|
||||||
|
const p = params as Record<string, unknown>;
|
||||||
|
|
||||||
|
// Tool-specific formatting for common tools
|
||||||
|
switch (toolName) {
|
||||||
|
case 'Bash':
|
||||||
|
if (p.command) {
|
||||||
|
return this.truncate(String(p.command).replace(/\n/g, ' '), 100);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 'Read':
|
||||||
|
if (p.file_path) {
|
||||||
|
return String(p.file_path);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 'Write':
|
||||||
|
if (p.file_path) {
|
||||||
|
return String(p.file_path);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 'Edit':
|
||||||
|
if (p.file_path) {
|
||||||
|
return String(p.file_path);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 'Glob':
|
||||||
|
if (p.pattern) {
|
||||||
|
return String(p.pattern);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 'Grep':
|
||||||
|
if (p.pattern) {
|
||||||
|
const path = p.path ? ` in ${p.path}` : '';
|
||||||
|
return `"${this.truncate(String(p.pattern), 50)}"${path}`;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 'WebFetch':
|
||||||
|
if (p.url) {
|
||||||
|
return String(p.url);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 'mcp__playwright__browser_navigate':
|
||||||
|
if (p.url) {
|
||||||
|
return String(p.url);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 'mcp__playwright__browser_click':
|
||||||
|
if (p.selector) {
|
||||||
|
return this.truncate(String(p.selector), 60);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 'mcp__playwright__browser_type':
|
||||||
|
if (p.selector) {
|
||||||
|
const text = p.text ? `: "${this.truncate(String(p.text), 30)}"` : '';
|
||||||
|
return `${this.truncate(String(p.selector), 40)}${text}`;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Default: show first string-valued param truncated
|
||||||
|
for (const [key, val] of Object.entries(p)) {
|
||||||
|
if (typeof val === 'string' && val.length > 0) {
|
||||||
|
return `${key}=${this.truncate(val, 60)}`;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return '';
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Log tool start event
|
||||||
|
*/
|
||||||
|
async logToolStart(agentName: string, toolName: string, parameters: unknown): Promise<void> {
|
||||||
|
await this.ensureInitialized();
|
||||||
|
|
||||||
|
const params = this.formatToolParams(toolName, parameters);
|
||||||
|
const paramStr = params ? `: ${params}` : '';
|
||||||
|
const line = `[${this.formatLogTime()}] [${agentName}] [TOOL] ${toolName}${paramStr}\n`;
|
||||||
|
await this.writeRaw(line);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Log LLM response
|
||||||
|
*/
|
||||||
|
async logLlmResponse(agentName: string, turn: number, content: string): Promise<void> {
|
||||||
|
await this.ensureInitialized();
|
||||||
|
|
||||||
|
// Show full content, replacing newlines with escaped version for single-line output
|
||||||
|
const escaped = content.replace(/\n/g, '\\n');
|
||||||
|
const line = `[${this.formatLogTime()}] [${agentName}] [LLM] Turn ${turn}: ${escaped}\n`;
|
||||||
|
await this.writeRaw(line);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Log workflow completion with full summary
|
||||||
|
*/
|
||||||
|
async logWorkflowComplete(summary: WorkflowSummary): Promise<void> {
|
||||||
|
await this.ensureInitialized();
|
||||||
|
|
||||||
|
const status = summary.status === 'completed' ? 'COMPLETED' : 'FAILED';
|
||||||
|
|
||||||
|
await this.writeRaw('\n');
|
||||||
|
await this.writeRaw(`================================================================================\n`);
|
||||||
|
await this.writeRaw(`Workflow ${status}\n`);
|
||||||
|
await this.writeRaw(`────────────────────────────────────────\n`);
|
||||||
|
await this.writeRaw(`Workflow ID: ${this.sessionMetadata.id}\n`);
|
||||||
|
await this.writeRaw(`Status: ${summary.status}\n`);
|
||||||
|
await this.writeRaw(`Duration: ${formatDuration(summary.totalDurationMs)}\n`);
|
||||||
|
await this.writeRaw(`Total Cost: $${summary.totalCostUsd.toFixed(4)}\n`);
|
||||||
|
await this.writeRaw(`Agents: ${summary.completedAgents.length} completed\n`);
|
||||||
|
|
||||||
|
if (summary.error) {
|
||||||
|
await this.writeRaw(`Error: ${summary.error}\n`);
|
||||||
|
}
|
||||||
|
|
||||||
|
await this.writeRaw(`\n`);
|
||||||
|
await this.writeRaw(`Agent Breakdown:\n`);
|
||||||
|
|
||||||
|
for (const agentName of summary.completedAgents) {
|
||||||
|
const metrics = summary.agentMetrics[agentName];
|
||||||
|
if (metrics) {
|
||||||
|
const duration = formatDuration(metrics.durationMs);
|
||||||
|
const cost = metrics.costUsd !== null ? `$${metrics.costUsd.toFixed(4)}` : 'N/A';
|
||||||
|
await this.writeRaw(` - ${agentName} (${duration}, ${cost})\n`);
|
||||||
|
} else {
|
||||||
|
await this.writeRaw(` - ${agentName}\n`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
await this.writeRaw(`================================================================================\n`);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Ensure initialized (helper for lazy initialization)
|
||||||
|
*/
|
||||||
|
private async ensureInitialized(): Promise<void> {
|
||||||
|
if (!this.initialized) {
|
||||||
|
await this.initialize();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Close the log stream
|
||||||
|
*/
|
||||||
|
async close(): Promise<void> {
|
||||||
|
if (!this.initialized || !this.stream) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
return new Promise((resolve) => {
|
||||||
|
this.stream!.end(() => {
|
||||||
|
this.initialized = false;
|
||||||
|
resolve();
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
}
|
||||||
+186
-74
@@ -14,6 +14,12 @@ import type {
|
|||||||
PromptErrorResult,
|
PromptErrorResult,
|
||||||
} from './types/errors.js';
|
} from './types/errors.js';
|
||||||
|
|
||||||
|
// Temporal error classification for ApplicationFailure wrapping
|
||||||
|
export interface TemporalErrorClassification {
|
||||||
|
type: string;
|
||||||
|
retryable: boolean;
|
||||||
|
}
|
||||||
|
|
||||||
// Custom error class for pentest operations
|
// Custom error class for pentest operations
|
||||||
export class PentestError extends Error {
|
export class PentestError extends Error {
|
||||||
name = 'PentestError' as const;
|
name = 'PentestError' as const;
|
||||||
@@ -37,11 +43,11 @@ export class PentestError extends Error {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Centralized error logging function
|
// Centralized error logging function
|
||||||
export const logError = async (
|
export async function logError(
|
||||||
error: Error & { type?: PentestErrorType; retryable?: boolean; context?: PentestErrorContext },
|
error: Error & { type?: PentestErrorType; retryable?: boolean; context?: PentestErrorContext },
|
||||||
contextMsg: string,
|
contextMsg: string,
|
||||||
sourceDir: string | null = null
|
sourceDir: string | null = null
|
||||||
): Promise<LogEntry> => {
|
): Promise<LogEntry> {
|
||||||
const timestamp = new Date().toISOString();
|
const timestamp = new Date().toISOString();
|
||||||
const logEntry: LogEntry = {
|
const logEntry: LogEntry = {
|
||||||
timestamp,
|
timestamp,
|
||||||
@@ -80,13 +86,13 @@ export const logError = async (
|
|||||||
}
|
}
|
||||||
|
|
||||||
return logEntry;
|
return logEntry;
|
||||||
};
|
}
|
||||||
|
|
||||||
// Handle tool execution errors
|
// Handle tool execution errors
|
||||||
export const handleToolError = (
|
export function handleToolError(
|
||||||
toolName: string,
|
toolName: string,
|
||||||
error: Error & { code?: string }
|
error: Error & { code?: string }
|
||||||
): ToolErrorResult => {
|
): ToolErrorResult {
|
||||||
const isRetryable =
|
const isRetryable =
|
||||||
error.code === 'ECONNRESET' ||
|
error.code === 'ECONNRESET' ||
|
||||||
error.code === 'ETIMEDOUT' ||
|
error.code === 'ETIMEDOUT' ||
|
||||||
@@ -105,13 +111,13 @@ export const handleToolError = (
|
|||||||
{ toolName, originalError: error.message, errorCode: error.code }
|
{ toolName, originalError: error.message, errorCode: error.code }
|
||||||
),
|
),
|
||||||
};
|
};
|
||||||
};
|
}
|
||||||
|
|
||||||
// Handle prompt loading errors
|
// Handle prompt loading errors
|
||||||
export const handlePromptError = (
|
export function handlePromptError(
|
||||||
promptName: string,
|
promptName: string,
|
||||||
error: Error
|
error: Error
|
||||||
): PromptErrorResult => {
|
): PromptErrorResult {
|
||||||
return {
|
return {
|
||||||
success: false,
|
success: false,
|
||||||
error: new PentestError(
|
error: new PentestError(
|
||||||
@@ -121,78 +127,63 @@ export const handlePromptError = (
|
|||||||
{ promptName, originalError: error.message }
|
{ promptName, originalError: error.message }
|
||||||
),
|
),
|
||||||
};
|
};
|
||||||
};
|
}
|
||||||
|
|
||||||
// Check if an error should trigger a retry for Claude agents
|
// Patterns that indicate retryable errors
|
||||||
export const isRetryableError = (error: Error): boolean => {
|
const RETRYABLE_PATTERNS = [
|
||||||
|
// Network and connection errors
|
||||||
|
'network',
|
||||||
|
'connection',
|
||||||
|
'timeout',
|
||||||
|
'econnreset',
|
||||||
|
'enotfound',
|
||||||
|
'econnrefused',
|
||||||
|
// Rate limiting
|
||||||
|
'rate limit',
|
||||||
|
'429',
|
||||||
|
'too many requests',
|
||||||
|
// Server errors
|
||||||
|
'server error',
|
||||||
|
'5xx',
|
||||||
|
'internal server error',
|
||||||
|
'service unavailable',
|
||||||
|
'bad gateway',
|
||||||
|
// Claude API errors
|
||||||
|
'mcp server',
|
||||||
|
'model unavailable',
|
||||||
|
'service temporarily unavailable',
|
||||||
|
'api error',
|
||||||
|
'terminated',
|
||||||
|
// Max turns
|
||||||
|
'max turns',
|
||||||
|
'maximum turns',
|
||||||
|
];
|
||||||
|
|
||||||
|
// Patterns that indicate non-retryable errors (checked before the retryable patterns)
|
||||||
|
const NON_RETRYABLE_PATTERNS = [
|
||||||
|
'authentication',
|
||||||
|
'invalid prompt',
|
||||||
|
'out of memory',
|
||||||
|
'permission denied',
|
||||||
|
'session limit reached',
|
||||||
|
'invalid api key',
|
||||||
|
];
|
||||||
|
|
||||||
|
// Conservative retry classification - unknown errors don't retry (fail-safe default)
|
||||||
|
export function isRetryableError(error: Error): boolean {
|
||||||
const message = error.message.toLowerCase();
|
const message = error.message.toLowerCase();
|
||||||
|
|
||||||
// Network and connection errors - always retryable
|
// Check for explicit non-retryable patterns first
|
||||||
if (
|
if (NON_RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern))) {
|
||||||
message.includes('network') ||
|
|
||||||
message.includes('connection') ||
|
|
||||||
message.includes('timeout') ||
|
|
||||||
message.includes('econnreset') ||
|
|
||||||
message.includes('enotfound') ||
|
|
||||||
message.includes('econnrefused')
|
|
||||||
) {
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Rate limiting - retryable with longer backoff
|
|
||||||
if (
|
|
||||||
message.includes('rate limit') ||
|
|
||||||
message.includes('429') ||
|
|
||||||
message.includes('too many requests')
|
|
||||||
) {
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Server errors - retryable
|
|
||||||
if (
|
|
||||||
message.includes('server error') ||
|
|
||||||
message.includes('5xx') ||
|
|
||||||
message.includes('internal server error') ||
|
|
||||||
message.includes('service unavailable') ||
|
|
||||||
message.includes('bad gateway')
|
|
||||||
) {
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Claude API specific errors - retryable
|
|
||||||
if (
|
|
||||||
message.includes('mcp server') ||
|
|
||||||
message.includes('model unavailable') ||
|
|
||||||
message.includes('service temporarily unavailable') ||
|
|
||||||
message.includes('api error') ||
|
|
||||||
message.includes('terminated')
|
|
||||||
) {
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Max turns without completion - retryable once
|
|
||||||
if (message.includes('max turns') || message.includes('maximum turns')) {
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Non-retryable errors
|
|
||||||
if (
|
|
||||||
message.includes('authentication') ||
|
|
||||||
message.includes('invalid prompt') ||
|
|
||||||
message.includes('out of memory') ||
|
|
||||||
message.includes('permission denied') ||
|
|
||||||
message.includes('session limit reached') ||
|
|
||||||
message.includes('invalid api key')
|
|
||||||
) {
|
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Default to non-retryable for unknown errors
|
// Check for retryable patterns
|
||||||
return false;
|
return RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern));
|
||||||
};
|
}
|
||||||
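The refactored `isRetryableError` changes precedence: non-retryable patterns now win over retryable ones, where the old code checked the retryable groups first. A reduced sketch with shortened pattern lists (a subset chosen for illustration, not the full lists from the diff) shows the effect.

```typescript
// Shortened pattern lists (illustrative subset of the real arrays).
const NON_RETRYABLE = ['authentication', 'permission denied', 'invalid api key'];
const RETRYABLE = ['timeout', 'rate limit', '429', 'max turns'];

// Same two-step check as the new isRetryableError().
function isRetryable(message: string): boolean {
  const m = message.toLowerCase();
  if (NON_RETRYABLE.some((pattern) => m.includes(pattern))) return false;
  return RETRYABLE.some((pattern) => m.includes(pattern));
}

console.log(isRetryable('Connection timeout'));     // true
console.log(isRetryable('Authentication timeout')); // false: non-retryable wins
console.log(isRetryable('Something odd happened')); // false: unknown errors do not retry
console.log(isRetryable('Processed 4290 records')); // true: '429' matches as a substring
```

The last case is a property of plain substring matching: short numeric patterns such as `'429'` can match unrelated text, so messages that embed arbitrary numbers may be classified as retryable.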
|
|
||||||
// Get retry delay based on error type and attempt number
|
// Rate limit errors get longer base delay (30s) vs standard exponential backoff (2s)
|
||||||
export const getRetryDelay = (error: Error, attempt: number): number => {
|
export function getRetryDelay(error: Error, attempt: number): number {
|
||||||
const message = error.message.toLowerCase();
|
const message = error.message.toLowerCase();
|
||||||
|
|
||||||
// Rate limiting gets longer delays
|
// Rate limiting gets longer delays
|
||||||
@@ -204,4 +195,125 @@ export const getRetryDelay = (error: Error, attempt: number): number => {
|
|||||||
const baseDelay = Math.pow(2, attempt) * 1000; // 2s, 4s, 8s
|
const baseDelay = Math.pow(2, attempt) * 1000; // 2s, 4s, 8s
|
||||||
const jitter = Math.random() * 1000; // 0-1s random
|
const jitter = Math.random() * 1000; // 0-1s random
|
||||||
return Math.min(baseDelay + jitter, 30000); // Max 30s
|
return Math.min(baseDelay + jitter, 30000); // Max 30s
|
||||||
};
|
}
|
||||||
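The visible tail of `getRetryDelay` is a capped exponential backoff with jitter. The sketch below reproduces only that standard branch; the rate-limit branch is elided in this hunk and is not reconstructed here. `standardBackoffMs` and its injectable `jitterMs` parameter are additions for this sketch so the schedule can be printed deterministically.

```typescript
// Standard branch only: 2^attempt seconds plus jitter, capped at 30 seconds.
function standardBackoffMs(attempt: number, jitterMs: number = Math.random() * 1000): number {
  const baseDelay = Math.pow(2, attempt) * 1000;
  return Math.min(baseDelay + jitterMs, 30000);
}

// With jitter pinned to 0 the schedule is 2s, 4s, 8s, 16s, then the 30s cap.
for (const attempt of [1, 2, 3, 4, 5]) {
  console.log(`attempt ${attempt}: ${standardBackoffMs(attempt, 0)} ms`);
}
```

The `// 2s, 4s, 8s` comment in the diff therefore assumes attempts are numbered from 1; from the fifth attempt on, the cap applies and every delay is 30 seconds.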
|
|
||||||
|
/**
|
||||||
|
* Classifies errors for Temporal workflow retry behavior.
|
||||||
|
* Returns error type and whether Temporal should retry.
|
||||||
|
*
|
||||||
|
* Used by activities to wrap errors in ApplicationFailure:
|
||||||
|
* - Retryable errors: Temporal retries with configured backoff
|
||||||
|
* - Non-retryable errors: Temporal fails immediately
|
||||||
|
*/
|
||||||
|
export function classifyErrorForTemporal(error: unknown): TemporalErrorClassification {
|
||||||
|
const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
|
||||||
|
|
||||||
|
// === BILLING ERRORS (Retryable with long backoff) ===
|
||||||
|
// Anthropic returns billing as 400 invalid_request_error
|
||||||
|
// A human can add credits OR wait for the spending cap to reset (5-30 min backoff)
|
||||||
|
if (
|
||||||
|
message.includes('billing_error') ||
|
||||||
|
message.includes('credit balance is too low') ||
|
||||||
|
message.includes('insufficient credits') ||
|
||||||
|
message.includes('usage is blocked due to insufficient credits') ||
|
||||||
|
message.includes('please visit plans & billing') ||
|
||||||
|
message.includes('please visit plans and billing') ||
|
||||||
|
message.includes('usage limit reached') ||
|
||||||
|
message.includes('quota exceeded') ||
|
||||||
|
message.includes('daily rate limit') ||
|
||||||
|
message.includes('limit will reset') ||
|
||||||
|
// Claude Code spending cap patterns (returns short message instead of error)
|
||||||
|
message.includes('spending cap') ||
|
||||||
|
message.includes('spending limit') ||
|
||||||
|
message.includes('cap reached') ||
|
||||||
|
message.includes('budget exceeded') ||
|
||||||
|
message.includes('billing limit reached')
|
||||||
|
) {
|
||||||
|
return { type: 'BillingError', retryable: true };
|
||||||
|
}
|
||||||
|
|
||||||
|
// === PERMANENT ERRORS (Non-retryable) ===
|
||||||
|
|
||||||
|
// Authentication (401) - bad API key won't fix itself
|
||||||
|
if (
|
||||||
|
message.includes('authentication') ||
|
||||||
|
message.includes('api key') ||
|
||||||
|
message.includes('401') ||
|
||||||
|
message.includes('authentication_error')
|
||||||
|
) {
|
||||||
|
return { type: 'AuthenticationError', retryable: false };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Permission (403) - access won't be granted
|
||||||
|
if (
|
||||||
|
message.includes('permission') ||
|
||||||
|
message.includes('forbidden') ||
|
||||||
|
message.includes('403')
|
||||||
|
) {
|
||||||
|
return { type: 'PermissionError', retryable: false };
|
||||||
|
}
|
||||||
|
|
||||||
|
// === OUTPUT VALIDATION ERRORS (Retryable) ===
|
||||||
|
// Agent didn't produce expected deliverables - retry may succeed
|
||||||
|
// IMPORTANT: Must come BEFORE generic 'validation' check below
|
||||||
|
if (
|
||||||
|
message.includes('failed output validation') ||
|
||||||
|
message.includes('output validation failed')
|
||||||
|
) {
|
||||||
|
return { type: 'OutputValidationError', retryable: true };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Invalid Request (400) - malformed request is permanent
|
||||||
|
// Note: Checked AFTER billing and AFTER output validation
|
||||||
|
if (
|
||||||
|
message.includes('invalid_request_error') ||
|
||||||
|
message.includes('malformed') ||
|
||||||
|
message.includes('validation')
|
||||||
|
) {
|
||||||
|
return { type: 'InvalidRequestError', retryable: false };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Request Too Large (413) - won't fit no matter how many retries
|
||||||
|
if (
|
||||||
|
message.includes('request_too_large') ||
|
||||||
|
message.includes('too large') ||
|
||||||
|
message.includes('413')
|
||||||
|
) {
|
||||||
|
return { type: 'RequestTooLargeError', retryable: false };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Configuration errors - missing files need manual fix
|
||||||
|
if (
|
||||||
|
message.includes('enoent') ||
|
||||||
|
message.includes('no such file') ||
|
||||||
|
message.includes('cli not installed')
|
||||||
|
) {
|
||||||
|
return { type: 'ConfigurationError', retryable: false };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Execution limits - max turns/budget reached
|
||||||
|
if (
|
||||||
|
message.includes('max turns') ||
|
||||||
|
message.includes('budget') ||
|
||||||
|
message.includes('execution limit') ||
|
||||||
|
message.includes('error_max_turns') ||
|
||||||
|
message.includes('error_max_budget')
|
||||||
|
) {
|
||||||
|
return { type: 'ExecutionLimitError', retryable: false };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Invalid target URL - bad URL format won't fix itself
|
||||||
|
if (
|
||||||
|
message.includes('invalid url') ||
|
||||||
|
message.includes('invalid target') ||
|
||||||
|
message.includes('malformed url') ||
|
||||||
|
message.includes('invalid uri')
|
||||||
|
) {
|
||||||
|
return { type: 'InvalidTargetError', retryable: false };
|
||||||
|
}
|
||||||
|
|
||||||
|
// === TRANSIENT ERRORS (Retryable) ===
|
||||||
|
// Rate limits (429), server errors (5xx), network issues
|
||||||
|
// Also the default for any error not matched above: let Temporal retry with configured backoff
|
||||||
|
return { type: 'TransientError', retryable: true };
|
||||||
|
}
|
||||||
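`classifyErrorForTemporal` depends on check order: billing before the generic 400 check, and output validation before the generic `'validation'` substring. The sketch below restates that idea as an ordered rule table instead of the if-chain used in the PR, with a reduced set of patterns. `RULES` and `classify` are hypothetical names for this illustration.

```typescript
interface Classification {
  type: string;
  retryable: boolean;
}

// First matching rule wins, so order encodes precedence (reduced pattern set).
const RULES: Array<{ patterns: string[]; result: Classification }> = [
  { patterns: ['billing_error', 'spending cap'], result: { type: 'BillingError', retryable: true } },
  { patterns: ['authentication', 'api key', '401'], result: { type: 'AuthenticationError', retryable: false } },
  { patterns: ['failed output validation', 'output validation failed'], result: { type: 'OutputValidationError', retryable: true } },
  { patterns: ['invalid_request_error', 'malformed', 'validation'], result: { type: 'InvalidRequestError', retryable: false } },
];

function classify(error: unknown): Classification {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  for (const rule of RULES) {
    if (rule.patterns.some((pattern) => message.includes(pattern))) return rule.result;
  }
  // Anything unmatched falls through to the retryable default.
  return { type: 'TransientError', retryable: true };
}

console.log(classify(new Error('Agent recon failed output validation')).type); // OutputValidationError
console.log(classify(new Error('invalid_request_error: malformed JSON')).type); // InvalidRequestError
console.log(classify(new Error('billing_error (invalid_request_error)')).type); // BillingError
console.log(classify('ECONNRESET').type);                                       // TransientError
```

Note the contrast with `isRetryableError`: here an unrecognized error defaults to retryable, so the retry policy configured on the Temporal side is what bounds repeated attempts.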
|
|||||||
+55
-68
@@ -7,7 +7,7 @@
|
|||||||
import { $, fs, path } from 'zx';
|
import { $, fs, path } from 'zx';
|
||||||
import chalk from 'chalk';
|
import chalk from 'chalk';
|
||||||
import { Timer } from '../utils/metrics.js';
|
import { Timer } from '../utils/metrics.js';
|
||||||
import { formatDuration } from '../audit/utils.js';
|
import { formatDuration } from '../utils/formatting.js';
|
||||||
import { handleToolError, PentestError } from '../error-handling.js';
|
import { handleToolError, PentestError } from '../error-handling.js';
|
||||||
import { AGENTS } from '../session-manager.js';
|
import { AGENTS } from '../session-manager.js';
|
||||||
import { runClaudePromptWithRetry } from '../ai/claude-executor.js';
|
import { runClaudePromptWithRetry } from '../ai/claude-executor.js';
|
||||||
@@ -40,11 +40,17 @@ interface PromptVariables {
|
|||||||
repoPath: string;
|
repoPath: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Discriminated union for Wave1 tool results - clearer than loose union types
|
||||||
|
type Wave1ToolResult =
|
||||||
|
| { kind: 'scan'; result: TerminalScanResult }
|
||||||
|
| { kind: 'skipped'; message: string }
|
||||||
|
| { kind: 'agent'; result: AgentResult };
|
||||||
|
|
||||||
interface Wave1Results {
|
interface Wave1Results {
|
||||||
nmap: TerminalScanResult | string | AgentResult;
|
nmap: Wave1ToolResult;
|
||||||
subfinder: TerminalScanResult | string | AgentResult;
|
subfinder: Wave1ToolResult;
|
||||||
whatweb: TerminalScanResult | string | AgentResult;
|
whatweb: Wave1ToolResult;
|
||||||
naabu?: TerminalScanResult | string | AgentResult;
|
naabu?: Wave1ToolResult;
|
||||||
codeAnalysis: AgentResult;
|
codeAnalysis: AgentResult;
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -57,7 +63,7 @@ interface PreReconResult {
|
|||||||
report: string;
|
report: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Pure function: Run terminal scanning tools
|
// Runs external security tools (nmap, whatweb, etc.). Schemathesis requires schemas from code analysis.
|
||||||
async function runTerminalScan(tool: ToolName, target: string, sourceDir: string | null = null): Promise<TerminalScanResult> {
|
async function runTerminalScan(tool: ToolName, target: string, sourceDir: string | null = null): Promise<TerminalScanResult> {
|
||||||
const timer = new Timer(`command-${tool}`);
|
const timer = new Timer(`command-${tool}`);
|
||||||
try {
|
try {
|
||||||
@@ -89,7 +95,7 @@ async function runTerminalScan(tool: ToolName, target: string, sourceDir: string
|
|||||||
return { tool: 'whatweb', output: result.stdout, status: 'success', duration: whatwebDuration };
|
return { tool: 'whatweb', output: result.stdout, status: 'success', duration: whatwebDuration };
|
||||||
}
|
}
|
||||||
case 'schemathesis': {
|
case 'schemathesis': {
|
||||||
// Only run if API schemas found
|
// Schemathesis depends on code analysis output - skip if no schemas found
|
||||||
const schemasDir = path.join(sourceDir || '.', 'outputs', 'schemas');
|
const schemasDir = path.join(sourceDir || '.', 'outputs', 'schemas');
|
||||||
if (await fs.pathExists(schemasDir)) {
|
if (await fs.pathExists(schemasDir)) {
|
||||||
const schemaFiles = await fs.readdir(schemasDir) as string[];
|
const schemaFiles = await fs.readdir(schemasDir) as string[];
|
||||||
@@ -146,6 +152,8 @@ async function runPreReconWave1(
|
|||||||
|
|
||||||
const operations: Promise<TerminalScanResult | AgentResult>[] = [];
|
const operations: Promise<TerminalScanResult | AgentResult>[] = [];
|
||||||
|
|
||||||
|
const skippedResult = (message: string): Wave1ToolResult => ({ kind: 'skipped', message });
|
||||||
|
|
||||||
// Skip external commands in pipeline testing mode
|
// Skip external commands in pipeline testing mode
|
||||||
if (pipelineTestingMode) {
|
if (pipelineTestingMode) {
|
||||||
console.log(chalk.gray(' ⏭️ Skipping external tools (pipeline testing mode)'));
|
console.log(chalk.gray(' ⏭️ Skipping external tools (pipeline testing mode)'));
|
||||||
@@ -163,9 +171,9 @@ async function runPreReconWave1(
|
|||||||
);
|
);
|
||||||
const [codeAnalysis] = await Promise.all(operations);
|
const [codeAnalysis] = await Promise.all(operations);
|
||||||
return {
|
return {
|
||||||
nmap: 'Skipped (pipeline testing mode)',
|
nmap: skippedResult('Skipped (pipeline testing mode)'),
|
||||||
subfinder: 'Skipped (pipeline testing mode)',
|
subfinder: skippedResult('Skipped (pipeline testing mode)'),
|
||||||
whatweb: 'Skipped (pipeline testing mode)',
|
whatweb: skippedResult('Skipped (pipeline testing mode)'),
|
||||||
codeAnalysis: codeAnalysis as AgentResult
|
codeAnalysis: codeAnalysis as AgentResult
|
||||||
};
|
};
|
||||||
} else {
|
} else {
|
||||||
@@ -192,9 +200,9 @@ async function runPreReconWave1(
|
|||||||
const [nmap, subfinder, whatweb, codeAnalysis] = await Promise.all(operations);
|
const [nmap, subfinder, whatweb, codeAnalysis] = await Promise.all(operations);
|
||||||
|
|
||||||
return {
|
return {
|
||||||
nmap: nmap as TerminalScanResult,
|
nmap: { kind: 'scan', result: nmap as TerminalScanResult },
|
||||||
subfinder: subfinder as TerminalScanResult,
|
subfinder: { kind: 'scan', result: subfinder as TerminalScanResult },
|
||||||
whatweb: whatweb as TerminalScanResult,
|
whatweb: { kind: 'scan', result: whatweb as TerminalScanResult },
|
||||||
codeAnalysis: codeAnalysis as AgentResult
|
codeAnalysis: codeAnalysis as AgentResult
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
@@ -250,17 +258,21 @@ async function runPreReconWave2(
|
|||||||
return response;
|
return response;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Helper type for stitching results
|
// Extracts status and output from a Wave1 tool result
|
||||||
interface StitchableResult {
|
function extractResult(r: Wave1ToolResult | undefined): { status: string; output: string } {
|
||||||
status?: string;
|
if (!r) return { status: 'Skipped', output: 'No output' };
|
||||||
output?: string;
|
switch (r.kind) {
|
||||||
tool?: string;
|
case 'scan':
|
||||||
|
return { status: r.result.status || 'Skipped', output: r.result.output || 'No output' };
|
||||||
|
case 'skipped':
|
||||||
|
return { status: 'Skipped', output: r.message };
|
||||||
|
case 'agent':
|
||||||
|
return { status: r.result.success ? 'success' : 'error', output: 'See agent output' };
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Pure function: Stitch together pre-recon outputs and save to file
|
// Combines tool outputs into a single deliverable. Falls back to a path reference if the code analysis file is missing.
|
||||||
async function stitchPreReconOutputs(outputs: (StitchableResult | string | undefined)[], sourceDir: string): Promise<string> {
|
async function stitchPreReconOutputs(wave1: Wave1Results, additionalScans: TerminalScanResult[], sourceDir: string): Promise<string> {
|
||||||
const [nmap, subfinder, whatweb, naabu, codeAnalysis, ...additionalScans] = outputs;
|
|
||||||
|
|
||||||
// Try to read the code analysis deliverable file
|
// Try to read the code analysis deliverable file
|
||||||
let codeAnalysisContent = 'No analysis available';
|
let codeAnalysisContent = 'No analysis available';
|
||||||
try {
|
try {
|
||||||
@@ -269,62 +281,45 @@ async function stitchPreReconOutputs(outputs: (StitchableResult | string | undef
|
|||||||
} catch (error) {
|
} catch (error) {
|
||||||
const err = error as Error;
|
const err = error as Error;
|
||||||
console.log(chalk.yellow(`⚠️ Could not read code analysis deliverable: ${err.message}`));
|
console.log(chalk.yellow(`⚠️ Could not read code analysis deliverable: ${err.message}`));
|
||||||
// Fallback message if file doesn't exist
|
|
||||||
codeAnalysisContent = 'Analysis located in deliverables/code_analysis_deliverable.md';
|
codeAnalysisContent = 'Analysis located in deliverables/code_analysis_deliverable.md';
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
// Build additional scans section
|
// Build additional scans section
|
||||||
let additionalSection = '';
|
let additionalSection = '';
|
||||||
if (additionalScans && additionalScans.length > 0) {
|
if (additionalScans.length > 0) {
|
||||||
additionalSection = '\n## Authenticated Scans\n';
|
additionalSection = '\n## Authenticated Scans\n';
|
||||||
additionalScans.forEach(scan => {
|
for (const scan of additionalScans) {
|
||||||
const s = scan as StitchableResult;
|
additionalSection += `
|
||||||
if (s && s.tool) {
|
### ${scan.tool.toUpperCase()}
|
||||||
additionalSection += `
|
Status: ${scan.status}
|
||||||
### ${s.tool.toUpperCase()}
|
${scan.output}
|
||||||
Status: ${s.status}
|
|
||||||
${s.output}
|
|
||||||
`;
|
`;
|
||||||
}
|
}
|
||||||
});
|
|
||||||
}
|
}
|
||||||
|
|
||||||
const nmapResult = nmap as StitchableResult | string | undefined;
|
const nmap = extractResult(wave1.nmap);
|
||||||
const subfinderResult = subfinder as StitchableResult | string | undefined;
|
const subfinder = extractResult(wave1.subfinder);
|
||||||
const whatwebResult = whatweb as StitchableResult | string | undefined;
|
const whatweb = extractResult(wave1.whatweb);
|
||||||
const naabuResult = naabu as StitchableResult | string | undefined;
|
const naabu = extractResult(wave1.naabu);
|
||||||
|
|
||||||
const getStatus = (r: StitchableResult | string | undefined): string => {
|
|
||||||
if (!r) return 'Skipped';
|
|
||||||
if (typeof r === 'string') return 'Skipped';
|
|
||||||
return r.status || 'Skipped';
|
|
||||||
};
|
|
||||||
|
|
||||||
const getOutput = (r: StitchableResult | string | undefined): string => {
|
|
||||||
if (!r) return 'No output';
|
|
||||||
if (typeof r === 'string') return r;
|
|
||||||
return r.output || 'No output';
|
|
||||||
};
|
|
||||||
|
|
||||||
const report = `
|
const report = `
|
||||||
# Pre-Reconnaissance Report
|
# Pre-Reconnaissance Report
|
||||||
|
|
||||||
## Port Discovery (naabu)
|
## Port Discovery (naabu)
|
||||||
Status: ${getStatus(naabuResult)}
|
Status: ${naabu.status}
|
||||||
${getOutput(naabuResult)}
|
${naabu.output}
|
||||||
|
|
||||||
## Network Scanning (nmap)
|
## Network Scanning (nmap)
|
||||||
Status: ${getStatus(nmapResult)}
|
Status: ${nmap.status}
|
||||||
${getOutput(nmapResult)}
|
${nmap.output}
|
||||||
|
|
||||||
## Subdomain Discovery (subfinder)
|
## Subdomain Discovery (subfinder)
|
||||||
Status: ${getStatus(subfinderResult)}
|
Status: ${subfinder.status}
|
||||||
${getOutput(subfinderResult)}
|
${subfinder.output}
|
||||||
|
|
||||||
## Technology Detection (whatweb)
|
## Technology Detection (whatweb)
|
||||||
Status: ${getStatus(whatwebResult)}
|
Status: ${whatweb.status}
|
||||||
${getOutput(whatwebResult)}
|
${whatweb.output}
|
||||||
## Code Analysis
|
## Code Analysis
|
||||||
${codeAnalysisContent}
|
${codeAnalysisContent}
|
||||||
${additionalSection}
|
${additionalSection}
|
||||||
@@ -375,16 +370,8 @@ export async function executePreReconPhase(
|
|||||||
console.log(chalk.green(' ✅ Wave 2 operations completed'));
|
console.log(chalk.green(' ✅ Wave 2 operations completed'));
|
||||||
|
|
||||||
console.log(chalk.blue('📝 Stitching pre-recon outputs...'));
|
console.log(chalk.blue('📝 Stitching pre-recon outputs...'));
|
||||||
// Combine wave 1 and wave 2 results for stitching
|
const additionalScans = wave2Results.schemathesis ? [wave2Results.schemathesis] : [];
|
||||||
const allResults: (StitchableResult | string | undefined)[] = [
|
const preReconReport = await stitchPreReconOutputs(wave1Results, additionalScans, sourceDir);
|
||||||
wave1Results.nmap as StitchableResult | string,
|
|
||||||
wave1Results.subfinder as StitchableResult | string,
|
|
||||||
wave1Results.whatweb as StitchableResult | string,
|
|
||||||
wave1Results.naabu as StitchableResult | string | undefined,
|
|
||||||
wave1Results.codeAnalysis as unknown as StitchableResult,
|
|
||||||
...(wave2Results.schemathesis ? [wave2Results.schemathesis as StitchableResult] : [])
|
|
||||||
];
|
|
||||||
const preReconReport = await stitchPreReconOutputs(allResults, sourceDir);
|
|
||||||
const duration = timer.stop();
|
const duration = timer.stop();
|
||||||
|
|
||||||
console.log(chalk.green(`✅ Pre-reconnaissance complete in ${formatDuration(duration)}`));
|
console.log(chalk.green(`✅ Pre-reconnaissance complete in ${formatDuration(duration)}`));
|
||||||
|
|||||||
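Review note: the pre-recon hunks above replace the loose `StitchableResult | string | undefined` parameter with a tagged `Wave1ToolResult`, but the type definition itself is not part of this diff. The sketch below assumes a shape inferred from the call sites (`kind: 'scan' | 'skipped' | 'agent'`) and uses simplified stand-ins for `TerminalScanResult` and `AgentResult`; the real interfaces may carry more fields. The `default` branch is an addition for illustration, not something the PR contains.

```typescript
// Simplified stand-ins; the real interfaces live elsewhere in the repo and may differ.
interface TerminalScanResult {
  tool: string;
  output: string;
  status: string;
  duration: number;
}

interface AgentResult {
  success: boolean;
  duration: number;
}

// Assumed shape of Wave1ToolResult, inferred from the call sites in this diff.
type Wave1ToolResult =
  | { kind: 'scan'; result: TerminalScanResult }
  | { kind: 'skipped'; message: string }
  | { kind: 'agent'; result: AgentResult };

function extractResult(r: Wave1ToolResult | undefined): { status: string; output: string } {
  if (!r) return { status: 'Skipped', output: 'No output' };
  switch (r.kind) {
    case 'scan':
      return { status: r.result.status || 'Skipped', output: r.result.output || 'No output' };
    case 'skipped':
      return { status: 'Skipped', output: r.message };
    case 'agent':
      return { status: r.result.success ? 'success' : 'error', output: 'See agent output' };
    default: {
      // Exhaustiveness check (illustrative addition): a new `kind` without a case fails to compile here.
      const unreachable: never = r;
      return unreachable;
    }
  }
}

const scanned = extractResult({
  kind: 'scan',
  result: { tool: 'nmap', output: '80/tcp open', status: 'success', duration: 1200 },
});
const skipped = extractResult({ kind: 'skipped', message: 'Skipped (pipeline testing mode)' });
console.log(scanned.status, skipped.output);
```

Because every variant carries a `kind` tag, callers no longer need `typeof r === 'string'` checks or `as` casts to tell a skipped tool from a completed scan.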
@@ -48,9 +48,12 @@ export async function assembleFinalReport(sourceDir: string): Promise<string> {
   }
 
   const finalContent = sections.join('\n\n');
-  const finalReportPath = path.join(sourceDir, 'deliverables', 'comprehensive_security_assessment_report.md');
+  const deliverablesDir = path.join(sourceDir, 'deliverables');
+  const finalReportPath = path.join(deliverablesDir, 'comprehensive_security_assessment_report.md');
 
   try {
+    // Ensure deliverables directory exists
+    await fs.ensureDir(deliverablesDir);
     await fs.writeFile(finalReportPath, finalContent);
     console.log(chalk.green(`✅ Final report assembled at ${finalReportPath}`));
   } catch (error) {
+40
-35
@@ -6,6 +6,7 @@
|
|||||||
|
|
||||||
import { fs, path } from 'zx';
|
import { fs, path } from 'zx';
|
||||||
import { PentestError } from './error-handling.js';
|
import { PentestError } from './error-handling.js';
|
||||||
|
import { asyncPipe } from './utils/functional.js';
|
||||||
|
|
||||||
export type VulnType = 'injection' | 'xss' | 'auth' | 'ssrf' | 'authz';
|
export type VulnType = 'injection' | 'xss' | 'auth' | 'ssrf' | 'authz';
|
||||||
|
|
||||||
@@ -16,9 +17,11 @@ interface VulnTypeConfigItem {
|
|||||||
|
|
||||||
type VulnTypeConfig = Record<VulnType, VulnTypeConfigItem>;
|
type VulnTypeConfig = Record<VulnType, VulnTypeConfigItem>;
|
||||||
|
|
||||||
|
type ErrorMessageResolver = string | ((existence: FileExistence) => string);
|
||||||
|
|
||||||
interface ValidationRule {
|
interface ValidationRule {
|
||||||
predicate: (existence: FileExistence) => boolean;
|
predicate: (existence: FileExistence) => boolean;
|
||||||
errorMessage: string;
|
errorMessage: ErrorMessageResolver;
|
||||||
retryable: boolean;
|
retryable: boolean;
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -94,40 +97,36 @@ const VULN_TYPE_CONFIG: VulnTypeConfig = Object.freeze({
|
|||||||
}),
|
}),
|
||||||
}) as VulnTypeConfig;
|
}) as VulnTypeConfig;
|
||||||
|
|
||||||
// Functional composition utilities - async pipe for promise chain
|
|
||||||
type PipeFunction = (x: any) => any | Promise<any>;
|
|
||||||
|
|
||||||
const pipe =
|
|
||||||
(...fns: PipeFunction[]) =>
|
|
||||||
(x: any): Promise<any> =>
|
|
||||||
fns.reduce(async (v, f) => f(await v), Promise.resolve(x));
|
|
||||||
|
|
||||||
// Pure function to create validation rule
|
// Pure function to create validation rule
|
||||||
const createValidationRule = (
|
function createValidationRule(
|
||||||
predicate: (existence: FileExistence) => boolean,
|
predicate: (existence: FileExistence) => boolean,
|
||||||
errorMessage: string,
|
errorMessage: ErrorMessageResolver,
|
||||||
retryable: boolean = true
|
retryable: boolean = true
|
||||||
): ValidationRule => Object.freeze({ predicate, errorMessage, retryable });
|
): ValidationRule {
|
||||||
|
return Object.freeze({ predicate, errorMessage, retryable });
|
||||||
|
}
|
||||||
|
|
||||||
// Validation rules for file existence (following QUEUE_VALIDATION_FLOW.md)
|
// Symmetric deliverable rules: queue and deliverable must exist together (prevents partial analysis from triggering exploitation)
|
||||||
const fileExistenceRules: readonly ValidationRule[] = Object.freeze([
|
const fileExistenceRules: readonly ValidationRule[] = Object.freeze([
|
||||||
// Rule 1: Neither deliverable nor queue exists
|
|
||||||
createValidationRule(
|
createValidationRule(
|
||||||
({ deliverableExists, queueExists }) => deliverableExists || queueExists,
|
({ deliverableExists, queueExists }) => deliverableExists && queueExists,
|
||||||
'Analysis failed: Neither deliverable nor queue file exists. Analysis agent must create both files.'
|
getExistenceErrorMessage
|
||||||
),
|
|
||||||
// Rule 2: Queue doesn't exist but deliverable exists
|
|
||||||
createValidationRule(
|
|
||||||
({ deliverableExists, queueExists }) => !(!queueExists && deliverableExists),
|
|
||||||
'Analysis incomplete: Deliverable exists but queue file missing. Analysis agent must create both files.'
|
|
||||||
),
|
|
||||||
// Rule 3: Queue exists but deliverable doesn't exist
|
|
||||||
createValidationRule(
|
|
||||||
({ deliverableExists, queueExists }) => !(queueExists && !deliverableExists),
|
|
||||||
'Analysis incomplete: Queue exists but deliverable file missing. Analysis agent must create both files.'
|
|
||||||
),
|
),
|
||||||
]);
|
]);
|
||||||
|
|
||||||
|
// Generate appropriate error message based on which files are missing
|
||||||
|
function getExistenceErrorMessage(existence: FileExistence): string {
|
||||||
|
const { deliverableExists, queueExists } = existence;
|
||||||
|
|
||||||
|
if (!deliverableExists && !queueExists) {
|
||||||
|
return 'Analysis failed: Neither deliverable nor queue file exists. Analysis agent must create both files.';
|
||||||
|
}
|
||||||
|
if (!queueExists) {
|
||||||
|
return 'Analysis incomplete: Deliverable exists but queue file missing. Analysis agent must create both files.';
|
||||||
|
}
|
||||||
|
return 'Analysis incomplete: Queue exists but deliverable file missing. Analysis agent must create both files.';
|
||||||
|
}
|
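Review note: the three separate existence rules collapse into a single rule whose message is computed from the failing state. A minimal sketch of that pattern follows, assuming `FileExistence` has only the two boolean fields destructured in this diff; `resolveMessage`, `firstFailure`, and the shortened message strings are hypothetical names and values used for illustration.

```typescript
// Assumed shape: only the two fields this diff destructures.
interface FileExistence {
  deliverableExists: boolean;
  queueExists: boolean;
}

type ErrorMessageResolver = string | ((existence: FileExistence) => string);

interface ValidationRule {
  predicate: (existence: FileExistence) => boolean;
  errorMessage: ErrorMessageResolver;
  retryable: boolean;
}

// Hypothetical helper: resolve either form of errorMessage to a plain string.
function resolveMessage(rule: ValidationRule, existence: FileExistence): string {
  return typeof rule.errorMessage === 'function' ? rule.errorMessage(existence) : rule.errorMessage;
}

// One rule covers all three failure states; messages are shortened for the example.
const bothMustExist: ValidationRule = Object.freeze({
  predicate: ({ deliverableExists, queueExists }: FileExistence): boolean =>
    deliverableExists && queueExists,
  errorMessage: ({ deliverableExists, queueExists }: FileExistence): string => {
    if (!deliverableExists && !queueExists) return 'Neither deliverable nor queue file exists.';
    if (!queueExists) return 'Deliverable exists but queue file missing.';
    return 'Queue exists but deliverable file missing.';
  },
  retryable: true,
});

// Hypothetical helper: first failure message, or null when every rule passes.
function firstFailure(rules: readonly ValidationRule[], existence: FileExistence): string | null {
  const failed = rules.find((rule) => !rule.predicate(existence));
  return failed ? resolveMessage(failed, existence) : null;
}

const rules: readonly ValidationRule[] = [bothMustExist];
console.log(firstFailure(rules, { deliverableExists: true, queueExists: true }));
console.log(firstFailure(rules, { deliverableExists: true, queueExists: false }));
```

Only the state where both files exist passes the predicate; the other three states each map to their own message, which is the behavior the old three-rule list spread across separate predicates.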
||||||
|
|
||||||
// Pure function to create file paths
|
// Pure function to create file paths
|
||||||
const createPaths = (
|
const createPaths = (
|
||||||
vulnType: VulnType,
|
vulnType: VulnType,
|
||||||
@@ -170,7 +169,7 @@ const checkFileExistence = async (
|
|||||||
});
|
});
|
||||||
};
|
};
|
||||||
|
|
||||||
// Pure function to validate existence rules
|
// Validates deliverable/queue symmetry - both files must exist; validation fails if either or both are missing
|
||||||
const validateExistenceRules = (
|
const validateExistenceRules = (
|
||||||
pathsWithExistence: PathsWithExistence | PathsWithError
|
pathsWithExistence: PathsWithExistence | PathsWithError
|
||||||
): PathsWithExistence | PathsWithError => {
|
): PathsWithExistence | PathsWithError => {
|
||||||
@@ -182,9 +181,14 @@ const validateExistenceRules = (
|
|||||||
const failedRule = fileExistenceRules.find((rule) => !rule.predicate(existence));
|
const failedRule = fileExistenceRules.find((rule) => !rule.predicate(existence));
|
||||||
|
|
||||||
if (failedRule) {
|
if (failedRule) {
|
||||||
|
const message =
|
||||||
|
typeof failedRule.errorMessage === 'function'
|
||||||
|
? failedRule.errorMessage(existence)
|
||||||
|
: failedRule.errorMessage;
|
||||||
|
|
||||||
return {
|
return {
|
||||||
error: new PentestError(
|
error: new PentestError(
|
||||||
`${failedRule.errorMessage} (${vulnType})`,
|
`${message} (${vulnType})`,
|
||||||
'validation',
|
'validation',
|
||||||
failedRule.retryable,
|
failedRule.retryable,
|
||||||
{
|
{
|
||||||
@@ -224,7 +228,7 @@ const validateQueueStructure = (content: string): QueueValidationResult => {
|
|||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
// Pure function to read and validate queue content
|
// Queue parse failures are retryable - agent can fix malformed JSON on retry
|
||||||
const validateQueueContent = async (
|
const validateQueueContent = async (
|
||||||
pathsWithExistence: PathsWithExistence | PathsWithError
|
pathsWithExistence: PathsWithExistence | PathsWithError
|
||||||
): Promise<PathsWithQueue | PathsWithError> => {
|
): Promise<PathsWithQueue | PathsWithError> => {
|
||||||
@@ -273,7 +277,7 @@ const validateQueueContent = async (
|
|||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
// Pure function to determine exploitation decision
|
// Final decision: skip if queue says no vulns, proceed if vulns found, error otherwise
|
||||||
const determineExploitationDecision = (
|
const determineExploitationDecision = (
|
||||||
validatedData: PathsWithQueue | PathsWithError
|
validatedData: PathsWithQueue | PathsWithError
|
||||||
): ExploitationDecision => {
|
): ExploitationDecision => {
|
||||||
@@ -294,17 +298,18 @@ const determineExploitationDecision = (
|
|||||||
};
|
};
|
||||||
|
|
||||||
// Main functional validation pipeline
|
// Main functional validation pipeline
|
||||||
export const validateQueueAndDeliverable = async (
|
export async function validateQueueAndDeliverable(
|
||||||
vulnType: VulnType,
|
vulnType: VulnType,
|
||||||
sourceDir: string
|
sourceDir: string
|
||||||
): Promise<ExploitationDecision> =>
|
): Promise<ExploitationDecision> {
|
||||||
(await pipe(
|
return asyncPipe<ExploitationDecision>(
|
||||||
() => createPaths(vulnType, sourceDir),
|
createPaths(vulnType, sourceDir),
|
||||||
checkFileExistence,
|
checkFileExistence,
|
||||||
validateExistenceRules,
|
validateExistenceRules,
|
||||||
validateQueueContent,
|
validateQueueContent,
|
||||||
determineExploitationDecision
|
determineExploitationDecision
|
||||||
)(() => createPaths(vulnType, sourceDir))) as ExploitationDecision;
|
);
|
||||||
|
}
|
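Review note: the source of `asyncPipe` in `./utils/functional.js` is not included in this hunk. Judging only from the call site (an initial value followed by step functions, with the result type given as a type argument), it might look like the hypothetical stand-in below. The step names `parse`, `double`, and `label` are invented for the example.

```typescript
// A step takes the previous value and returns the next one, synchronously or as a promise.
type Step = (input: any) => any;

// Hypothetical stand-in for asyncPipe; the real helper may differ in typing and behavior.
async function asyncPipe<T>(initial: unknown, ...steps: Step[]): Promise<T> {
  let value: unknown = initial;
  for (const step of steps) {
    // Awaiting each step lets sync and async functions be mixed freely.
    value = await step(value);
  }
  return value as T;
}

const parse = (s: string): number => Number.parseInt(s, 10);
const double = async (n: number): Promise<number> => n * 2;
const label = (n: number): string => `result=${n}`;

const pending = asyncPipe<string>('21', parse, double, label);
pending.then((out) => console.log(out)); // prints "result=42"
```

Passing the initial value as the first argument removes the awkward `pipe(...)(() => createPaths(...))` invocation and the trailing `as ExploitationDecision` cast that the old code needed.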
||||||
|
|
||||||
// Pure function to safely validate (returns result instead of throwing)
|
// Pure function to safely validate (returns result instead of throwing)
|
||||||
export const safeValidateQueueAndDeliverable = async (
|
export const safeValidateQueueAndDeliverable = async (
|
||||||
|
|||||||
+20
-6
@@ -106,10 +106,24 @@ export const getParallelGroups = (): Readonly<{ vuln: AgentName[]; exploit: Agen
|
|||||||
exploit: ['injection-exploit', 'xss-exploit', 'auth-exploit', 'ssrf-exploit', 'authz-exploit']
|
exploit: ['injection-exploit', 'xss-exploit', 'auth-exploit', 'ssrf-exploit', 'authz-exploit']
|
||||||
});
|
});
|
||||||
|
|
||||||
// Generate a session-based log folder path (used by claude-executor.ts)
|
// Phase names for metrics aggregation
|
||||||
export const generateSessionLogPath = (webUrl: string, sessionId: string): string => {
|
export type PhaseName = 'pre-recon' | 'recon' | 'vulnerability-analysis' | 'exploitation' | 'reporting';
|
||||||
const hostname = new URL(webUrl).hostname.replace(/[^a-zA-Z0-9-]/g, '-');
|
|
||||||
const sessionFolderName = `${hostname}_${sessionId}`;
|
// Map agents to their corresponding phases (single source of truth)
|
||||||
return path.join(process.cwd(), 'agent-logs', sessionFolderName);
|
export const AGENT_PHASE_MAP: Readonly<Record<AgentName, PhaseName>> = Object.freeze({
|
||||||
};
|
'pre-recon': 'pre-recon',
|
||||||
|
'recon': 'recon',
|
||||||
|
'injection-vuln': 'vulnerability-analysis',
|
||||||
|
'xss-vuln': 'vulnerability-analysis',
|
||||||
|
'auth-vuln': 'vulnerability-analysis',
|
||||||
|
'authz-vuln': 'vulnerability-analysis',
|
||||||
|
'ssrf-vuln': 'vulnerability-analysis',
|
||||||
|
'injection-exploit': 'exploitation',
|
||||||
|
'xss-exploit': 'exploitation',
|
||||||
|
'auth-exploit': 'exploitation',
|
||||||
|
'authz-exploit': 'exploitation',
|
||||||
|
'ssrf-exploit': 'exploitation',
|
||||||
|
'report': 'reporting',
|
||||||
|
});
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
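Review note: `AGENT_PHASE_MAP` is described as the single source of truth for phase-level metrics. A minimal sketch of how per-agent numbers could be rolled up by phase is shown below. The `AgentName` union is reconstructed from the agent names that appear in this diff, while `AgentMetric` and `totalDurationByPhase` are hypothetical; the metrics types actually added by the PR are not shown here.

```typescript
type PhaseName = 'pre-recon' | 'recon' | 'vulnerability-analysis' | 'exploitation' | 'reporting';

// Reconstructed from the agent names used in this diff.
type AgentName =
  | 'pre-recon' | 'recon'
  | 'injection-vuln' | 'xss-vuln' | 'auth-vuln' | 'authz-vuln' | 'ssrf-vuln'
  | 'injection-exploit' | 'xss-exploit' | 'auth-exploit' | 'authz-exploit' | 'ssrf-exploit'
  | 'report';

const AGENT_PHASE_MAP: Readonly<Record<AgentName, PhaseName>> = Object.freeze({
  'pre-recon': 'pre-recon',
  'recon': 'recon',
  'injection-vuln': 'vulnerability-analysis',
  'xss-vuln': 'vulnerability-analysis',
  'auth-vuln': 'vulnerability-analysis',
  'authz-vuln': 'vulnerability-analysis',
  'ssrf-vuln': 'vulnerability-analysis',
  'injection-exploit': 'exploitation',
  'xss-exploit': 'exploitation',
  'auth-exploit': 'exploitation',
  'authz-exploit': 'exploitation',
  'ssrf-exploit': 'exploitation',
  'report': 'reporting',
});

// Hypothetical metrics record used only for this example.
interface AgentMetric {
  agent: AgentName;
  durationMs: number;
}

// Sum agent durations into their phases; phases with no agents are simply absent.
function totalDurationByPhase(metrics: AgentMetric[]): Partial<Record<PhaseName, number>> {
  const totals: Partial<Record<PhaseName, number>> = {};
  for (const m of metrics) {
    const phase = AGENT_PHASE_MAP[m.agent];
    totals[phase] = (totals[phase] ?? 0) + m.durationMs;
  }
  return totals;
}

const totals = totalDurationByPhase([
  { agent: 'injection-vuln', durationMs: 1000 },
  { agent: 'xss-vuln', durationMs: 500 },
  { agent: 'report', durationMs: 200 },
]);
console.log(totals);
```

Typing the map as `Record<AgentName, PhaseName>` means adding a new agent name without assigning it a phase is a compile error, which is what keeps the map authoritative.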
-897
@@ -1,897 +0,0 @@
|
|||||||
#!/usr/bin/env node
|
|
||||||
// Copyright (C) 2025 Keygraph, Inc.
|
|
||||||
//
|
|
||||||
// This program is free software: you can redistribute it and/or modify
|
|
||||||
// it under the terms of the GNU Affero General Public License version 3
|
|
||||||
// as published by the Free Software Foundation.
|
|
||||||
|
|
||||||
import { path, fs, $ } from 'zx';
|
|
||||||
import chalk, { type ChalkInstance } from 'chalk';
|
|
||||||
import dotenv from 'dotenv';
|
|
||||||
|
|
||||||
dotenv.config();
|
|
||||||
|
|
||||||
// Config and Tools
|
|
||||||
import { parseConfig, distributeConfig } from './config-parser.js';
|
|
||||||
import { checkToolAvailability, handleMissingTools } from './tool-checker.js';
|
|
||||||
|
|
||||||
// Session
|
|
||||||
import { AGENTS, getParallelGroups } from './session-manager.js';
|
|
||||||
import type { AgentName, PromptName } from './types/index.js';
|
|
||||||
|
|
||||||
// Setup and Deliverables
|
|
||||||
import { setupLocalRepo } from './setup/environment.js';
|
|
||||||
|
|
||||||
// AI and Prompts
|
|
||||||
import { runClaudePromptWithRetry } from './ai/claude-executor.js';
|
|
||||||
import { loadPrompt } from './prompts/prompt-manager.js';
|
|
||||||
|
|
||||||
// Phases
|
|
||||||
import { executePreReconPhase } from './phases/pre-recon.js';
|
|
||||||
import { assembleFinalReport } from './phases/reporting.js';
|
|
||||||
|
|
||||||
// Utils
|
|
||||||
import { timingResults, displayTimingSummary, Timer } from './utils/metrics.js';
|
|
||||||
import { formatDuration, generateAuditPath } from './audit/utils.js';
|
|
||||||
import type { SessionMetadata } from './audit/utils.js';
|
|
||||||
import { AuditSession } from './audit/audit-session.js';
|
|
||||||
|
|
||||||
// CLI
|
|
||||||
import { showHelp, displaySplashScreen } from './cli/ui.js';
|
|
||||||
import { validateWebUrl, validateRepoPath } from './cli/input-validator.js';
|
|
||||||
|
|
||||||
// Error Handling
|
|
||||||
import { PentestError, logError } from './error-handling.js';
|
|
||||||
|
|
||||||
import type { DistributedConfig } from './types/config.js';
|
|
||||||
import type { ToolAvailability } from './tool-checker.js';
|
|
||||||
import { safeValidateQueueAndDeliverable } from './queue-validation.js';
|
|
||||||
|
|
||||||
// Extend global namespace for SHANNON_DISABLE_LOADER
|
|
||||||
declare global {
|
|
||||||
var SHANNON_DISABLE_LOADER: boolean | undefined;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Session Lock File Management
|
|
||||||
const STORE_PATH = path.join(process.cwd(), '.shannon-store.json');
|
|
||||||
|
|
||||||
interface Session {
|
|
||||||
id: string;
|
|
||||||
webUrl: string;
|
|
||||||
repoPath: string;
|
|
||||||
status: 'in-progress' | 'completed' | 'failed';
|
|
||||||
startedAt: string;
|
|
||||||
}
|
|
||||||
|
|
||||||
interface SessionStore {
|
|
||||||
sessions: Session[];
|
|
||||||
}
|
|
||||||
|
|
||||||
function generateSessionId(): string {
|
|
||||||
return crypto.randomUUID();
|
|
||||||
}
|
|
||||||
|
|
||||||
async function loadSessions(): Promise<SessionStore> {
|
|
||||||
try {
|
|
||||||
if (await fs.pathExists(STORE_PATH)) {
|
|
||||||
return await fs.readJson(STORE_PATH) as SessionStore;
|
|
||||||
}
|
|
||||||
} catch {
|
|
||||||
// Corrupted file, start fresh
|
|
||||||
}
|
|
||||||
return { sessions: [] };
|
|
||||||
}
|
|
||||||
|
|
||||||
async function saveSessions(store: SessionStore): Promise<void> {
|
|
||||||
await fs.writeJson(STORE_PATH, store, { spaces: 2 });
|
|
||||||
}
|
|
||||||
|
|
||||||
async function createSession(webUrl: string, repoPath: string): Promise<Session> {
|
|
||||||
const store = await loadSessions();
|
|
||||||
|
|
||||||
// Check for existing in-progress session
|
|
||||||
const existing = store.sessions.find(
|
|
||||||
s => s.repoPath === repoPath && s.status === 'in-progress'
|
|
||||||
);
|
|
||||||
if (existing) {
|
|
||||||
throw new PentestError(
|
|
||||||
`Session already in progress for ${repoPath}`,
|
|
||||||
'validation',
|
|
||||||
false,
|
|
||||||
{ sessionId: existing.id }
|
|
||||||
);
|
|
||||||
}
|
|
||||||
|
|
||||||
const session: Session = {
|
|
||||||
id: generateSessionId(),
|
|
||||||
webUrl,
|
|
||||||
repoPath,
|
|
||||||
status: 'in-progress',
|
|
||||||
startedAt: new Date().toISOString()
|
|
||||||
};
|
|
||||||
|
|
||||||
store.sessions.push(session);
|
|
||||||
await saveSessions(store);
|
|
||||||
return session;
|
|
||||||
}
|
|
||||||
|
|
||||||
async function updateSessionStatus(
|
|
||||||
sessionId: string,
|
|
||||||
status: 'in-progress' | 'completed' | 'failed'
|
|
||||||
): Promise<void> {
|
|
||||||
const store = await loadSessions();
|
|
||||||
const session = store.sessions.find(s => s.id === sessionId);
|
|
||||||
if (session) {
|
|
||||||
session.status = status;
|
|
||||||
await saveSessions(store);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
interface PromptVariables {
|
|
||||||
webUrl: string;
|
|
||||||
repoPath: string;
|
|
||||||
sourceDir: string;
|
|
||||||
}
|
|
||||||
|
|
||||||
interface MainResult {
|
|
||||||
reportPath: string;
|
|
||||||
auditLogsPath: string;
|
|
||||||
}
|
|
||||||
|
|
||||||
interface AgentResult {
|
|
||||||
success: boolean;
|
|
||||||
duration: number;
|
|
||||||
cost?: number;
|
|
||||||
error?: string;
|
|
||||||
retryable?: boolean;
|
|
||||||
}
|
|
||||||
|
|
||||||
interface ParallelAgentResult {
|
|
||||||
agentName: AgentName;
|
|
||||||
success: boolean;
|
|
||||||
timing?: number | undefined;
|
|
||||||
cost?: number | undefined;
|
|
||||||
attempts: number;
|
|
||||||
error?: string | undefined;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Configure zx to disable timeouts (let tools run as long as needed)
|
|
||||||
$.timeout = 0;
|
|
||||||
|
|
||||||
// Helper function to get prompt name from agent name
|
|
||||||
const getPromptName = (agentName: AgentName): PromptName => {
|
|
||||||
const mappings: Record<AgentName, PromptName> = {
|
|
||||||
'pre-recon': 'pre-recon-code',
|
|
||||||
'recon': 'recon',
|
|
||||||
'injection-vuln': 'vuln-injection',
|
|
||||||
'xss-vuln': 'vuln-xss',
|
|
||||||
'auth-vuln': 'vuln-auth',
|
|
||||||
'ssrf-vuln': 'vuln-ssrf',
|
|
||||||
'authz-vuln': 'vuln-authz',
|
|
||||||
'injection-exploit': 'exploit-injection',
|
|
||||||
'xss-exploit': 'exploit-xss',
|
|
||||||
'auth-exploit': 'exploit-auth',
|
|
||||||
'ssrf-exploit': 'exploit-ssrf',
|
|
||||||
'authz-exploit': 'exploit-authz',
|
|
||||||
'report': 'report-executive'
|
|
||||||
};
|
|
||||||
|
|
||||||
return mappings[agentName] || agentName as PromptName;
|
|
||||||
};
|
|
||||||
|
|
||||||
// Get color function for agent
|
|
||||||
const getAgentColor = (agentName: AgentName): ChalkInstance => {
|
|
||||||
const colorMap: Partial<Record<AgentName, ChalkInstance>> = {
|
|
||||||
'injection-vuln': chalk.red,
|
|
||||||
'injection-exploit': chalk.red,
|
|
||||||
'xss-vuln': chalk.yellow,
|
|
||||||
'xss-exploit': chalk.yellow,
|
|
||||||
'auth-vuln': chalk.blue,
|
|
||||||
'auth-exploit': chalk.blue,
|
|
||||||
'ssrf-vuln': chalk.magenta,
|
|
||||||
'ssrf-exploit': chalk.magenta,
|
|
||||||
'authz-vuln': chalk.green,
|
|
||||||
'authz-exploit': chalk.green
|
|
||||||
};
|
|
||||||
return colorMap[agentName] || chalk.cyan;
|
|
||||||
};
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Consolidate deliverables from target repo into the session folder
|
|
||||||
*/
|
|
||||||
async function consolidateOutputs(sourceDir: string, sessionPath: string): Promise<void> {
|
|
||||||
const srcDeliverables = path.join(sourceDir, 'deliverables');
|
|
||||||
const destDeliverables = path.join(sessionPath, 'deliverables');
|
|
||||||
|
|
||||||
try {
|
|
||||||
if (await fs.pathExists(srcDeliverables)) {
|
|
||||||
await fs.copy(srcDeliverables, destDeliverables, { overwrite: true });
|
|
||||||
console.log(chalk.gray(`📄 Deliverables copied to session folder`));
|
|
||||||
} else {
|
|
||||||
console.log(chalk.yellow(`⚠️ No deliverables directory found at ${srcDeliverables}`));
|
|
||||||
}
|
|
||||||
} catch (error) {
|
|
||||||
const err = error as Error;
|
|
||||||
console.log(chalk.yellow(`⚠️ Failed to consolidate deliverables: ${err.message}`));
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Run a single agent
|
|
||||||
*/
|
|
||||||
async function runAgent(
|
|
||||||
agentName: AgentName,
|
|
||||||
sourceDir: string,
|
|
||||||
variables: PromptVariables,
|
|
||||||
distributedConfig: DistributedConfig | null,
|
|
||||||
pipelineTestingMode: boolean,
|
|
||||||
sessionMetadata: SessionMetadata
|
|
||||||
): Promise<AgentResult> {
|
|
||||||
const agent = AGENTS[agentName];
|
|
||||||
const promptName = getPromptName(agentName);
|
|
||||||
const prompt = await loadPrompt(promptName, variables, distributedConfig, pipelineTestingMode);
|
|
||||||
|
|
||||||
return await runClaudePromptWithRetry(
|
|
||||||
prompt,
|
|
||||||
sourceDir,
|
|
||||||
'*',
|
|
||||||
'',
|
|
||||||
agent.displayName,
|
|
||||||
agentName,
|
|
||||||
getAgentColor(agentName),
|
|
||||||
sessionMetadata
|
|
||||||
);
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
|
||||||
* Run vulnerability agents in parallel
|
|
||||||
*/
|
|
||||||
async function runParallelVuln(
|
|
||||||
sourceDir: string,
|
|
||||||
variables: PromptVariables,
|
|
||||||
distributedConfig: DistributedConfig | null,
|
|
||||||
pipelineTestingMode: boolean,
|
|
||||||
sessionMetadata: SessionMetadata
|
|
||||||
): Promise<ParallelAgentResult[]> {
|
|
||||||
const { vuln: vulnAgents } = getParallelGroups();
|
|
||||||
|
|
||||||
console.log(chalk.cyan(`\nStarting ${vulnAgents.length} vulnerability analysis specialists in parallel...`));
|
|
||||||
console.log(chalk.gray(' Specialists: ' + vulnAgents.join(', ')));
|
|
||||||
console.log();
|
|
||||||
|
|
||||||
const startTime = Date.now();
|
|
||||||
|
|
||||||
const results = await Promise.allSettled(
|
|
||||||
vulnAgents.map(async (agentName, index) => {
|
|
||||||
// Add 2-second stagger to prevent API overwhelm
|
|
||||||
await new Promise(resolve => setTimeout(resolve, index * 2000));
|
|
||||||
|
|
||||||
let lastError: Error | undefined;
|
|
||||||
let attempts = 0;
|
|
||||||
const maxAttempts = 3;
|
|
||||||
|
|
||||||
while (attempts < maxAttempts) {
|
|
||||||
attempts++;
|
|
||||||
try {
|
|
||||||
const result = await runAgent(
|
|
||||||
agentName,
|
|
||||||
sourceDir,
|
|
||||||
variables,
|
|
||||||
distributedConfig,
|
|
||||||
pipelineTestingMode,
|
|
||||||
sessionMetadata
|
|
||||||
);
|
|
||||||
|
|
||||||
// Validate vulnerability analysis results
|
|
||||||
const vulnType = agentName.replace('-vuln', '');
|
|
||||||
try {
|
|
||||||
const validation = await safeValidateQueueAndDeliverable(vulnType as 'injection' | 'xss' | 'auth' | 'ssrf' | 'authz', sourceDir);
|
|
||||||
|
|
||||||
if (validation.success && validation.data) {
|
|
||||||
console.log(chalk.blue(`${agentName}: ${validation.data.shouldExploit ? `Ready for exploitation (${validation.data.vulnerabilityCount} vulnerabilities)` : 'No vulnerabilities found'}`));
|
|
||||||
}
|
|
||||||
} catch {
|
|
||||||
// Validation failure is non-critical
|
|
||||||
}
|
|
||||||
|
|
||||||
return {
|
|
||||||
agentName,
|
|
||||||
success: result.success,
|
|
||||||
timing: result.duration,
|
|
||||||
cost: result.cost,
|
|
||||||
attempts
|
|
||||||
          };
        } catch (error) {
          lastError = error as Error;
          if (attempts < maxAttempts) {
            console.log(chalk.yellow(`Warning: ${agentName} failed attempt ${attempts}/${maxAttempts}, retrying...`));
            await new Promise(resolve => setTimeout(resolve, 5000));
          }
        }
      }

      return {
        agentName,
        success: false,
        attempts,
        error: lastError?.message || 'Unknown error'
      };
    })
  );

  const totalDuration = Date.now() - startTime;

  // Process and display results
  console.log(chalk.cyan('\nVulnerability Analysis Results'));
  console.log(chalk.gray('-'.repeat(80)));
  console.log(chalk.bold('Agent Status Attempt Duration Cost'));
  console.log(chalk.gray('-'.repeat(80)));

  const processedResults: ParallelAgentResult[] = [];

  results.forEach((result, index) => {
    const agentName = vulnAgents[index]!;
    const agentDisplay = agentName.padEnd(22);

    if (result.status === 'fulfilled') {
      const data = result.value;
      processedResults.push(data);

      if (data.success) {
        const duration = formatDuration(data.timing || 0);
        const cost = `$${(data.cost || 0).toFixed(4)}`;

        console.log(
          `${chalk.green(agentDisplay)} ${chalk.green('Success')} ` +
          `${data.attempts}/3 ${duration.padEnd(11)} ${cost}`
        );
      } else {
        console.log(
          `${chalk.red(agentDisplay)} ${chalk.red('Failed ')} ` +
          `${data.attempts}/3 - -`
        );
        if (data.error) {
          console.log(chalk.gray(` Error: ${data.error.substring(0, 60)}...`));
        }
      }
    } else {
      processedResults.push({
        agentName,
        success: false,
        attempts: 3,
        error: String(result.reason)
      });

      console.log(
        `${chalk.red(agentDisplay)} ${chalk.red('Failed ')} ` +
        `3/3 - -`
      );
    }
  });

  console.log(chalk.gray('-'.repeat(80)));
  const successCount = processedResults.filter(r => r.success).length;
  console.log(chalk.cyan(`Summary: ${successCount}/${vulnAgents.length} succeeded in ${formatDuration(totalDuration)}`));

  return processedResults;
}

/**
 * Run exploitation agents in parallel
 */
async function runParallelExploit(
  sourceDir: string,
  variables: PromptVariables,
  distributedConfig: DistributedConfig | null,
  pipelineTestingMode: boolean,
  sessionMetadata: SessionMetadata
): Promise<ParallelAgentResult[]> {
  const { exploit: exploitAgents, vuln: vulnAgents } = getParallelGroups();

  // Load validation module
  const { safeValidateQueueAndDeliverable } = await import('./queue-validation.js');

  // Check eligibility
  const eligibilityChecks = await Promise.all(
    exploitAgents.map(async (agentName) => {
      const vulnAgentName = agentName.replace('-exploit', '-vuln') as AgentName;
      const vulnType = vulnAgentName.replace('-vuln', '') as 'injection' | 'xss' | 'auth' | 'ssrf' | 'authz';

      const validation = await safeValidateQueueAndDeliverable(vulnType, sourceDir);

      if (!validation.success || !validation.data?.shouldExploit) {
        console.log(chalk.gray(`Skipping ${agentName} (no vulnerabilities found in ${vulnAgentName})`));
        return { agentName, eligible: false };
      }

      console.log(chalk.blue(`${agentName} eligible (${validation.data.vulnerabilityCount} vulnerabilities from ${vulnAgentName})`));
      return { agentName, eligible: true };
    })
  );

  const eligibleAgents = eligibilityChecks
    .filter(check => check.eligible)
    .map(check => check.agentName);

  if (eligibleAgents.length === 0) {
    console.log(chalk.gray('No exploitation agents eligible (no vulnerabilities found)'));
    return [];
  }

  console.log(chalk.cyan(`\nStarting ${eligibleAgents.length} exploitation specialists in parallel...`));
  console.log(chalk.gray(' Specialists: ' + eligibleAgents.join(', ')));
  console.log();

  const startTime = Date.now();

  const results = await Promise.allSettled(
    eligibleAgents.map(async (agentName, index) => {
      await new Promise(resolve => setTimeout(resolve, index * 2000));

      let lastError: Error | undefined;
      let attempts = 0;
      const maxAttempts = 3;

      while (attempts < maxAttempts) {
        attempts++;
        try {
          const result = await runAgent(
            agentName,
            sourceDir,
            variables,
            distributedConfig,
            pipelineTestingMode,
            sessionMetadata
          );

          return {
            agentName,
            success: result.success,
            timing: result.duration,
            cost: result.cost,
            attempts
          };
        } catch (error) {
          lastError = error as Error;
          if (attempts < maxAttempts) {
            console.log(chalk.yellow(`Warning: ${agentName} failed attempt ${attempts}/${maxAttempts}, retrying...`));
            await new Promise(resolve => setTimeout(resolve, 5000));
          }
        }
      }

      return {
        agentName,
        success: false,
        attempts,
        error: lastError?.message || 'Unknown error'
      };
    })
  );

  const totalDuration = Date.now() - startTime;

  // Process and display results
  console.log(chalk.cyan('\nExploitation Results'));
  console.log(chalk.gray('-'.repeat(80)));
  console.log(chalk.bold('Agent Status Attempt Duration Cost'));
  console.log(chalk.gray('-'.repeat(80)));

  const processedResults: ParallelAgentResult[] = [];

  results.forEach((result, index) => {
    const agentName = eligibleAgents[index]!;
    const agentDisplay = agentName.padEnd(22);

    if (result.status === 'fulfilled') {
      const data = result.value;
      processedResults.push(data);

      if (data.success) {
        const duration = formatDuration(data.timing || 0);
        const cost = `$${(data.cost || 0).toFixed(4)}`;

        console.log(
          `${chalk.green(agentDisplay)} ${chalk.green('Success')} ` +
          `${data.attempts}/3 ${duration.padEnd(11)} ${cost}`
        );
      } else {
        console.log(
          `${chalk.red(agentDisplay)} ${chalk.red('Failed ')} ` +
          `${data.attempts}/3 - -`
        );
        if (data.error) {
          console.log(chalk.gray(` Error: ${data.error.substring(0, 60)}...`));
        }
      }
    } else {
      processedResults.push({
        agentName,
        success: false,
        attempts: 3,
        error: String(result.reason)
      });

      console.log(
        `${chalk.red(agentDisplay)} ${chalk.red('Failed ')} ` +
        `3/3 - -`
      );
    }
  });

  console.log(chalk.gray('-'.repeat(80)));
  const successCount = processedResults.filter(r => r.success).length;
  console.log(chalk.cyan(`Summary: ${successCount}/${eligibleAgents.length} succeeded in ${formatDuration(totalDuration)}`));

  return processedResults;
}
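Both `runParallelVuln` and `runParallelExploit` repeat the same result-processing loop, and the rejected branch hard-codes the attempt ceiling as `3`. The following is a minimal sketch of how that loop could be factored into a pure helper. The names `AgentOutcome`, `Settled`, and `collectOutcomes` are hypothetical and do not appear in this commit; `Settled` is a local stand-in for the built-in `PromiseSettledResult` so the block stays self-contained.

```typescript
// Hypothetical stand-in for the ParallelAgentResult shape used above.
interface AgentOutcome {
  agentName: string;
  success: boolean;
  attempts: number;
  timing?: number;
  cost?: number;
  error?: string;
}

// Local equivalent of the built-in PromiseSettledResult<T>.
type Settled<T> =
  | { status: 'fulfilled'; value: T }
  | { status: 'rejected'; reason: unknown };

// Convert settled promise results into one flat list of outcomes.
// A rejected promise means the retry wrapper itself threw, so it is
// recorded as a failure that used every attempt.
function collectOutcomes(
  agentNames: string[],
  results: Settled<AgentOutcome>[],
  maxAttempts: number
): AgentOutcome[] {
  return results.map((result, index) => {
    if (result.status === 'fulfilled') {
      return result.value;
    }
    return {
      agentName: agentNames[index] ?? 'unknown',
      success: false,
      attempts: maxAttempts,
      error: String(result.reason),
    };
  });
}

// Example with hand-built results: one success, one rejection.
const outcomes = collectOutcomes(
  ['injection-exploit', 'xss-exploit'],
  [
    { status: 'fulfilled', value: { agentName: 'injection-exploit', success: true, attempts: 1 } },
    { status: 'rejected', reason: new Error('boom') },
  ],
  3
);
const okCount = outcomes.filter((o) => o.success).length;
console.log(`Summary: ${okCount}/${outcomes.length} succeeded`);
```

Passing `maxAttempts` in keeps the reported attempt count in step with the retry loop if the limit ever changes, and separating collection from printing leaves the table rendering free to stay phase-specific.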

// Setup graceful cleanup on process signals
process.on('SIGINT', async () => {
  console.log(chalk.yellow('\n⚠️ Received SIGINT, cleaning up...'));

  process.exit(0);
});

process.on('SIGTERM', async () => {
  console.log(chalk.yellow('\n⚠️ Received SIGTERM, cleaning up...'));

  process.exit(0);
});

// Main orchestration function
async function main(
  webUrl: string,
  repoPath: string,
  configPath: string | null = null,
  pipelineTestingMode: boolean = false,
  disableLoader: boolean = false,
  outputPath: string | null = null
): Promise<MainResult> {
  // Set global flag for loader control
  global.SHANNON_DISABLE_LOADER = disableLoader;

  const totalTimer = new Timer('total-execution');
  timingResults.total = totalTimer;

  // Display splash screen
  await displaySplashScreen();

  console.log(chalk.cyan.bold('🚀 AI PENETRATION TESTING AGENT'));
  console.log(chalk.cyan(`🎯 Target: ${webUrl}`));
  console.log(chalk.cyan(`📁 Source: ${repoPath}`));
  if (configPath) {
    console.log(chalk.cyan(`⚙️ Config: ${configPath}`));
  }
  if (outputPath) {
    console.log(chalk.cyan(`📂 Output: ${outputPath}`));
  }
  console.log(chalk.gray('─'.repeat(60)));

  // Parse configuration if provided
  let distributedConfig: DistributedConfig | null = null;
  if (configPath) {
    try {
      // Resolve config path - check configs folder if relative path
      let resolvedConfigPath = configPath;
      if (!path.isAbsolute(configPath)) {
        const configsDir = path.join(process.cwd(), 'configs');
        const configInConfigsDir = path.join(configsDir, configPath);
        // Check if file exists in configs directory, otherwise use original path
        if (await fs.pathExists(configInConfigsDir)) {
          resolvedConfigPath = configInConfigsDir;
        }
      }

      const config = await parseConfig(resolvedConfigPath);
      distributedConfig = distributeConfig(config);
      console.log(chalk.green(`✅ Configuration loaded successfully`));
    } catch (error) {
      await logError(error as Error, `Configuration loading from ${configPath}`);
      throw error; // Let the main error boundary handle it
    }
  }

  // Check tool availability
  const toolAvailability: ToolAvailability = await checkToolAvailability();
  handleMissingTools(toolAvailability);

  // Setup local repository
  console.log(chalk.blue('📁 Setting up local repository...'));
  let sourceDir: string;
  try {
    sourceDir = await setupLocalRepo(repoPath);
    console.log(chalk.green('✅ Local repository setup successfully'));
  } catch (error) {
    const err = error as Error;
    console.log(chalk.red(`❌ Failed to setup local repository: ${err.message}`));
    console.log(chalk.gray('This could be due to:'));
    console.log(chalk.gray(' - Insufficient permissions'));
    console.log(chalk.gray(' - Repository path not accessible'));
    console.log(chalk.gray(' - Git initialization issues'));
    console.log(chalk.gray(' - Insufficient disk space'));
    process.exit(1);
  }

  const variables: PromptVariables = { webUrl, repoPath, sourceDir };

  // Create session (acts as lock file)
  const session: Session = await createSession(webUrl, repoPath);
  console.log(chalk.blue(`Session created: ${session.id.substring(0, 8)}...`));

  // Session metadata for audit logging
  const sessionMetadata: SessionMetadata = {
    id: session.id,
    webUrl,
    repoPath: sourceDir,
    ...(outputPath && { outputPath })
  };

  // Create outputs directory in source directory
  try {
    const outputsDir = path.join(sourceDir, 'outputs');
    await fs.ensureDir(outputsDir);
    await fs.ensureDir(path.join(outputsDir, 'schemas'));
    await fs.ensureDir(path.join(outputsDir, 'scans'));
  } catch (error) {
    const err = error as Error;
    throw new PentestError(
      `Failed to create output directories: ${err.message}`,
      'filesystem',
      false,
      { sourceDir, originalError: err.message }
    );
  }

  try {
    // PHASE 1: PRE-RECONNAISSANCE
    const { duration: preReconDuration } = await executePreReconPhase(
      webUrl,
      sourceDir,
      variables,
      distributedConfig,
      toolAvailability,
      pipelineTestingMode,
      session.id,
      outputPath
    );
    console.log(chalk.green(`Pre-reconnaissance complete in ${formatDuration(preReconDuration)}`));

    // PHASE 2: RECONNAISSANCE
    console.log(chalk.magenta.bold('\n🔎 PHASE 2: RECONNAISSANCE'));
    console.log(chalk.magenta('Analyzing initial findings...'));
    const reconTimer = new Timer('phase-2-recon');

    await runAgent(
      'recon',
      sourceDir,
      variables,
      distributedConfig,
      pipelineTestingMode,
      sessionMetadata
    );
    const reconDuration = reconTimer.stop();
    console.log(chalk.green(`✅ Reconnaissance complete in ${formatDuration(reconDuration)}`));

    // PHASE 3: VULNERABILITY ANALYSIS
    const vulnTimer = new Timer('phase-3-vulnerability-analysis');
    console.log(chalk.red.bold('\n🚨 PHASE 3: VULNERABILITY ANALYSIS'));

    const vulnResults = await runParallelVuln(
      sourceDir,
      variables,
      distributedConfig,
      pipelineTestingMode,
      sessionMetadata
    );

    const vulnDuration = vulnTimer.stop();
    console.log(chalk.green(`✅ Vulnerability analysis phase complete in ${formatDuration(vulnDuration)}`));

    // PHASE 4: EXPLOITATION
    const exploitTimer = new Timer('phase-4-exploitation');
    console.log(chalk.red.bold('\n💥 PHASE 4: EXPLOITATION'));

    const exploitResults = await runParallelExploit(
      sourceDir,
      variables,
      distributedConfig,
      pipelineTestingMode,
      sessionMetadata
    );

    const exploitDuration = exploitTimer.stop();
    console.log(chalk.green(`✅ Exploitation phase complete in ${formatDuration(exploitDuration)}`));

    // PHASE 5: REPORTING
    console.log(chalk.greenBright.bold('\n📊 PHASE 5: REPORTING'));
    console.log(chalk.greenBright('Generating executive summary and assembling final report...'));
    const reportTimer = new Timer('phase-5-reporting');

    // Assemble all deliverables into a single concatenated report
    console.log(chalk.blue('📝 Assembling deliverables from specialist agents...'));
    try {
      await assembleFinalReport(sourceDir);
    } catch (error) {
      const err = error as Error;
      console.log(chalk.red(`❌ Error assembling final report: ${err.message}`));
    }

    // Run reporter agent to create executive summary
    console.log(chalk.blue('Generating executive summary and cleaning up report...'));
    await runAgent(
      'report',
      sourceDir,
      variables,
      distributedConfig,
      pipelineTestingMode,
      sessionMetadata
    );

    const reportDuration = reportTimer.stop();
    console.log(chalk.green(`✅ Final report generated in ${formatDuration(reportDuration)}`));

    // Calculate final timing
    timingResults.total.stop();

    // Mark session as completed in both stores
    await updateSessionStatus(session.id, 'completed');

    // Update audit system's session.json status
    const auditSession = new AuditSession(sessionMetadata);
    await auditSession.updateSessionStatus('completed');

    // Display comprehensive timing summary
    displayTimingSummary();

    console.log(chalk.cyan.bold('\n🎉 PENETRATION TESTING COMPLETE!'));
    console.log(chalk.gray('─'.repeat(60)));

    // Calculate audit logs path
    const auditLogsPath = generateAuditPath(sessionMetadata);

    // Consolidate deliverables into the session folder
    await consolidateOutputs(sourceDir, auditLogsPath);
    console.log(chalk.green(`\n📂 All outputs consolidated: ${auditLogsPath}`));

    return {
      reportPath: path.join(sourceDir, 'deliverables', 'comprehensive_security_assessment_report.md'),
      auditLogsPath
    };

  } catch (error) {
    // Mark session as failed in both stores
    await updateSessionStatus(session.id, 'failed');

    // Update audit system's session.json status
    const auditSession = new AuditSession(sessionMetadata);
    await auditSession.updateSessionStatus('failed');

    throw error;
  }
}

// Entry point - handle both direct node execution and shebang execution
let args = process.argv.slice(2);
// If first arg is the script name (from shebang), remove it
if (args[0] && args[0].includes('shannon')) {
  args = args.slice(1);
}

// Parse flags and arguments
let configPath: string | null = null;
let outputPath: string | null = null;
let pipelineTestingMode = false;
let disableLoader = false;
const nonFlagArgs: string[] = [];

for (let i = 0; i < args.length; i++) {
  if (args[i] === '--config') {
    if (i + 1 < args.length) {
      configPath = args[i + 1]!;
      i++; // Skip the next argument
    } else {
      console.log(chalk.red('❌ --config flag requires a file path'));
      process.exit(1);
    }
  } else if (args[i] === '--output') {
    if (i + 1 < args.length) {
      outputPath = path.resolve(args[i + 1]!);
      i++; // Skip the next argument
    } else {
      console.log(chalk.red('❌ --output flag requires a directory path'));
      process.exit(1);
    }
  } else if (args[i] === '--pipeline-testing') {
    pipelineTestingMode = true;
  } else if (args[i] === '--disable-loader') {
    disableLoader = true;
  } else if (!args[i]!.startsWith('-')) {
    nonFlagArgs.push(args[i]!);
  }
}
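The flag loop above mutates module-level variables and calls `process.exit` directly, which makes it awkward to test in isolation. Below is a minimal sketch of the same parsing rules as a pure function. `CliOptions` and `parseCliArgs` are hypothetical names, not part of this commit, and the sketch throws on a missing flag value instead of exiting. It also leaves `--output` unresolved, whereas the original passes it through `path.resolve`.

```typescript
// Hypothetical result shape; field names mirror the variables in the loop above.
interface CliOptions {
  configPath: string | null;
  outputPath: string | null;
  pipelineTestingMode: boolean;
  disableLoader: boolean;
  positional: string[];
}

// Parse flags without reading process.argv or calling process.exit,
// so the logic can be unit-tested. A missing flag value throws instead.
function parseCliArgs(args: string[]): CliOptions {
  const options: CliOptions = {
    configPath: null,
    outputPath: null,
    pipelineTestingMode: false,
    disableLoader: false,
    positional: [],
  };

  for (let i = 0; i < args.length; i++) {
    const arg = args[i] as string;
    if (arg === '--config' || arg === '--output') {
      const value = args[i + 1];
      if (value === undefined) {
        throw new Error(`${arg} flag requires a path`);
      }
      if (arg === '--config') {
        options.configPath = value;
      } else {
        options.outputPath = value; // the original also resolves this to an absolute path
      }
      i++; // skip the value that was just consumed
    } else if (arg === '--pipeline-testing') {
      options.pipelineTestingMode = true;
    } else if (arg === '--disable-loader') {
      options.disableLoader = true;
    } else if (arg.charAt(0) !== '-') {
      // Anything that does not look like a flag is a positional argument.
      options.positional.push(arg);
    }
  }
  return options;
}

const parsedArgs = parseCliArgs([
  'https://example.com',
  './repo',
  '--config',
  'app.yaml',
  '--pipeline-testing',
]);
console.log(parsedArgs.positional.join(' '), parsedArgs.configPath, parsedArgs.pipelineTestingMode);
```

As in the original loop, unrecognized flags that start with `-` are silently ignored; a stricter parser might reject them.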

// Handle help flag
if (args.includes('--help') || args.includes('-h') || args.includes('help')) {
  showHelp();
  process.exit(0);
}

// Handle no arguments - show help
if (nonFlagArgs.length === 0) {
  console.log(chalk.red.bold('❌ Error: No arguments provided\n'));
  showHelp();
  process.exit(1);
}

// Handle insufficient arguments
if (nonFlagArgs.length < 2) {
  console.log(chalk.red('❌ Both WEB_URL and REPO_PATH are required'));
  console.log(chalk.gray('Usage: shannon <WEB_URL> <REPO_PATH> [--config config.yaml]'));
  console.log(chalk.gray('Help: shannon --help'));
  process.exit(1);
}

const [webUrl, repoPath] = nonFlagArgs;

// Validate web URL
const webUrlValidation = validateWebUrl(webUrl!);
if (!webUrlValidation.valid) {
  console.log(chalk.red(`❌ Invalid web URL: ${webUrlValidation.error}`));
  console.log(chalk.gray(`Expected format: https://example.com`));
  process.exit(1);
}

// Validate repository path
const repoPathValidation = await validateRepoPath(repoPath!);
if (!repoPathValidation.valid) {
  console.log(chalk.red(`❌ Invalid repository path: ${repoPathValidation.error}`));
  console.log(chalk.gray(`Expected: Accessible local directory path`));
  process.exit(1);
}

// Success - show validated inputs
console.log(chalk.green('✅ Input validation passed:'));
console.log(chalk.gray(` Target Web URL: ${webUrl}`));
console.log(chalk.gray(` Target Repository: ${repoPathValidation.path}\n`));
console.log(chalk.gray(` Config Path: ${configPath}\n`));
if (outputPath) {
  console.log(chalk.gray(` Output Path: ${outputPath}\n`));
}
if (pipelineTestingMode) {
  console.log(chalk.yellow('⚡ PIPELINE TESTING MODE ENABLED - Using minimal test prompts for fast pipeline validation\n'));
}
if (disableLoader) {
  console.log(chalk.yellow('⚙️ LOADER DISABLED - Progress indicator will not be shown\n'));
}

try {
  const result = await main(webUrl!, repoPathValidation.path!, configPath, pipelineTestingMode, disableLoader, outputPath);
  console.log(chalk.green.bold('\n📄 FINAL REPORT AVAILABLE:'));
  console.log(chalk.cyan(result.reportPath));
  console.log(chalk.green.bold('\n📂 AUDIT LOGS AVAILABLE:'));
  console.log(chalk.cyan(result.auditLogsPath));

} catch (error) {
  // Enhanced error boundary with proper logging
  if (error instanceof PentestError) {
    await logError(error, 'Main execution failed');
    console.log(chalk.red.bold('\n🚨 PENTEST EXECUTION FAILED'));
    console.log(chalk.red(` Type: ${error.type}`));
    console.log(chalk.red(` Retryable: ${error.retryable ? 'Yes' : 'No'}`));

    if (error.retryable) {
      console.log(chalk.yellow(' Consider running the command again or checking network connectivity.'));
    }
  } else {
    const err = error as Error;
    console.log(chalk.red.bold('\n🚨 UNEXPECTED ERROR OCCURRED'));
    console.log(chalk.red(` Error: ${err?.message || err?.toString() || 'Unknown error'}`));

    if (process.env.DEBUG) {
      console.log(chalk.gray(` Stack: ${err?.stack || 'No stack trace available'}`));
    }
  }

  process.exit(1);
}
@@ -0,0 +1,469 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal activities for Shannon agent execution.
 *
 * Each activity wraps a single agent execution with:
 * - Heartbeat loop (2s interval) to signal worker liveness
 * - Git checkpoint/rollback/commit per attempt
 * - Error classification for Temporal retry behavior
 * - Audit session logging
 *
 * Temporal handles retries based on error classification:
 * - Retryable: BillingError, TransientError (429, 5xx, network)
 * - Non-retryable: AuthenticationError, PermissionError, ConfigurationError, etc.
 */
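The retryable and non-retryable split described in the header comment can be illustrated with a small classifier. This is only a sketch: the real `classifyErrorForTemporal()` lives in `../error-handling.js` and is not shown in this diff, so the function name `classifyErrorSketch`, the `ClassifiedError` shape, the message patterns, and the `UnknownError` fallback are all assumptions made for illustration. The error type names are taken from the comment above.

```typescript
// Hypothetical result shape; the activity below reads `type` and `retryable`.
interface ClassifiedError {
  type: string;
  retryable: boolean;
}

// Map an arbitrary thrown value to an error type and a retry decision.
// The regular expressions are illustrative guesses, not the project's rules.
function classifyErrorSketch(error: unknown): ClassifiedError {
  const message = error instanceof Error ? error.message : String(error);

  // Spending caps may clear after a reset window, so a later retry can succeed.
  if (/spending cap|billing|budget/i.test(message)) {
    return { type: 'BillingError', retryable: true };
  }
  // Rate limits, server errors, and network failures are usually transient.
  if (/\b429\b|\b5\d\d\b|ECONNRESET|ETIMEDOUT|network/i.test(message)) {
    return { type: 'TransientError', retryable: true };
  }
  // Bad credentials will not fix themselves between attempts.
  if (/\b401\b|invalid api key|authentication/i.test(message)) {
    return { type: 'AuthenticationError', retryable: false };
  }
  if (/\b403\b|permission denied/i.test(message)) {
    return { type: 'PermissionError', retryable: false };
  }
  if (/config/i.test(message)) {
    return { type: 'ConfigurationError', retryable: false };
  }
  // Assumed fallback: fail fast on anything unrecognized.
  return { type: 'UnknownError', retryable: false };
}

const rateLimited = classifyErrorSketch(new Error('HTTP 429 Too Many Requests'));
console.log(rateLimited.type, rateLimited.retryable);
```

Matching on message text is fragile (a message such as "took 500 ms" would be misread as a server error), so a production classifier would more likely inspect status codes or error classes where they are available.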

import { heartbeat, ApplicationFailure, Context } from '@temporalio/activity';
import chalk from 'chalk';

// Max lengths to prevent Temporal protobuf buffer overflow
const MAX_ERROR_MESSAGE_LENGTH = 2000;
const MAX_STACK_TRACE_LENGTH = 1000;

// Max retries for output validation errors (agent didn't save deliverables)
// Lower than default 50 since this is unlikely to self-heal
const MAX_OUTPUT_VALIDATION_RETRIES = 3;

/**
 * Truncate error message to prevent buffer overflow in Temporal serialization.
 */
function truncateErrorMessage(message: string): string {
  if (message.length <= MAX_ERROR_MESSAGE_LENGTH) {
    return message;
  }
  return message.slice(0, MAX_ERROR_MESSAGE_LENGTH - 20) + '\n[truncated]';
}
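`truncateErrorMessage` keeps the first `MAX_ERROR_MESSAGE_LENGTH - 20` characters and appends a 12-character marker, so a truncated message comes out at 1992 characters, safely under the 2000 limit. A small sketch of a variant that derives the slice length from the marker itself follows. `TRUNCATION_MARKER` and `truncateWithMarker` are hypothetical names, and the sketch assumes `maxLength` is at least as long as the marker.

```typescript
// Hypothetical constant: newline plus "[truncated]" is 12 characters.
const TRUNCATION_MARKER = '\n[truncated]';

// Truncate so that the result, marker included, is at most maxLength
// characters (assuming maxLength >= TRUNCATION_MARKER.length).
function truncateWithMarker(message: string, maxLength: number): string {
  if (message.length <= maxLength) {
    return message;
  }
  const keep = Math.max(0, maxLength - TRUNCATION_MARKER.length);
  return message.slice(0, keep) + TRUNCATION_MARKER;
}

// Build a 5000-character message (joining 5001 empty slots yields 5000 separators).
const longMessage = new Array(5001).join('x');
const shortened = truncateWithMarker(longMessage, 2000);
console.log(shortened.length); // 2000
```

Deriving the slice length from the marker means the limit still holds if the marker text is later changed, which the fixed `- 20` margin only guarantees for markers up to 20 characters.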

/**
 * Truncate stack trace on an ApplicationFailure to prevent buffer overflow.
 */
function truncateStackTrace(failure: ApplicationFailure): void {
  if (failure.stack && failure.stack.length > MAX_STACK_TRACE_LENGTH) {
    failure.stack = failure.stack.slice(0, MAX_STACK_TRACE_LENGTH) + '\n[stack truncated]';
  }
}

import {
  runClaudePrompt,
  validateAgentOutput,
  type ClaudePromptResult,
} from '../ai/claude-executor.js';
import { loadPrompt } from '../prompts/prompt-manager.js';
import { parseConfig, distributeConfig } from '../config-parser.js';
import { classifyErrorForTemporal } from '../error-handling.js';
import {
  safeValidateQueueAndDeliverable,
  type VulnType,
  type ExploitationDecision,
} from '../queue-validation.js';
import {
  createGitCheckpoint,
  commitGitSuccess,
  rollbackGitWorkspace,
  getGitCommitHash,
} from '../utils/git-manager.js';
import { assembleFinalReport } from '../phases/reporting.js';
import { getPromptNameForAgent } from '../types/agents.js';
import { AuditSession } from '../audit/index.js';
import type { WorkflowSummary } from '../audit/workflow-logger.js';
import type { AgentName } from '../types/agents.js';
import type { AgentMetrics } from './shared.js';
import type { DistributedConfig } from '../types/config.js';
import type { SessionMetadata } from '../audit/utils.js';

const HEARTBEAT_INTERVAL_MS = 2000; // Must be < heartbeatTimeout (10min production, 5min testing)

/**
 * Input for all agent activities.
 * Matches PipelineInput but with required workflowId for audit correlation.
 */
export interface ActivityInput {
  webUrl: string;
  repoPath: string;
  configPath?: string;
  outputPath?: string;
  pipelineTestingMode?: boolean;
  workflowId: string;
}
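The activity builds its audit session metadata from this input, using `workflowId` as the session id and including `outputPath` only when one was supplied. A minimal sketch of that mapping is shown below. `ActivityInputSketch`, `SessionMetadataSketch`, and `toSessionMetadata` are hypothetical names, trimmed to the fields involved; the real `SessionMetadata` type is defined in `../audit/utils.js` and is not shown in this diff. The ternary spread is used here as an explicit equivalent of the `...(outputPath && { outputPath })` idiom in the source.

```typescript
// Hypothetical copies of the shapes above, trimmed to the fields used here.
interface ActivityInputSketch {
  webUrl: string;
  repoPath: string;
  outputPath?: string;
  workflowId: string;
}

interface SessionMetadataSketch {
  id: string;
  webUrl: string;
  repoPath: string;
  outputPath?: string;
}

// Build audit metadata keyed by workflowId for audit correlation.
function toSessionMetadata(input: ActivityInputSketch): SessionMetadataSketch {
  return {
    id: input.workflowId,
    webUrl: input.webUrl,
    repoPath: input.repoPath,
    // Only add the key when a value was supplied, so the object never
    // carries an explicit `outputPath: undefined`.
    ...(input.outputPath ? { outputPath: input.outputPath } : {}),
  };
}

const metadata = toSessionMetadata({
  webUrl: 'https://example.com',
  repoPath: '/tmp/target-repo',
  workflowId: 'example-workflow-1',
});
console.log(Object.keys(metadata).join(', ')); // id, webUrl, repoPath
```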

/**
 * Core activity implementation.
 *
 * Executes a single agent with:
 * 1. Heartbeat loop for worker liveness
 * 2. Config loading (if configPath provided)
 * 3. Audit session initialization
 * 4. Prompt loading
 * 5. Git checkpoint before execution
 * 6. Agent execution (single attempt)
 * 7. Output validation
 * 8. Git commit on success, rollback on failure
 * 9. Error classification for Temporal retry
 */
async function runAgentActivity(
  agentName: AgentName,
  input: ActivityInput
): Promise<AgentMetrics> {
  const {
    webUrl,
    repoPath,
    configPath,
    outputPath,
    pipelineTestingMode = false,
    workflowId,
  } = input;

  const startTime = Date.now();

  // Get attempt number from Temporal context (tracks retries automatically)
  const attemptNumber = Context.current().info.attempt;

  // Heartbeat loop - signals worker is alive to Temporal server
  const heartbeatInterval = setInterval(() => {
    const elapsed = Math.floor((Date.now() - startTime) / 1000);
    heartbeat({ agent: agentName, elapsedSeconds: elapsed, attempt: attemptNumber });
  }, HEARTBEAT_INTERVAL_MS);

  try {
    // 1. Load config (if provided)
    let distributedConfig: DistributedConfig | null = null;
    if (configPath) {
      try {
        const config = await parseConfig(configPath);
        distributedConfig = distributeConfig(config);
      } catch (err) {
        throw new Error(`Failed to load config ${configPath}: ${err instanceof Error ? err.message : String(err)}`);
      }
    }

    // 2. Build session metadata for audit
    const sessionMetadata: SessionMetadata = {
      id: workflowId,
      webUrl,
      repoPath,
      ...(outputPath && { outputPath }),
    };

    // 3. Initialize audit session (idempotent, safe across retries)
    const auditSession = new AuditSession(sessionMetadata);
    await auditSession.initialize();

    // 4. Load prompt
    const promptName = getPromptNameForAgent(agentName);
    const prompt = await loadPrompt(
      promptName,
      { webUrl, repoPath },
      distributedConfig,
      pipelineTestingMode
    );

    // 5. Create git checkpoint before execution
    await createGitCheckpoint(repoPath, agentName, attemptNumber);
    await auditSession.startAgent(agentName, prompt, attemptNumber);

    // 6. Execute agent (single attempt - Temporal handles retries)
    const result: ClaudePromptResult = await runClaudePrompt(
      prompt,
      repoPath,
      '', // context
      agentName, // description
      agentName,
      chalk.cyan,
      sessionMetadata,
      auditSession,
      attemptNumber
    );

    // 6.5. Sanity check: Detect spending cap that slipped through all detection layers
    // Defense-in-depth: A successful agent execution should never have ≤2 turns with $0 cost
    if (result.success && (result.turns ?? 0) <= 2 && (result.cost || 0) === 0) {
      const resultText = result.result || '';
      const looksLikeBillingError = /spending|cap|limit|budget|resets/i.test(resultText);

      if (looksLikeBillingError) {
        await rollbackGitWorkspace(repoPath, 'spending cap detected');
        await auditSession.endAgent(agentName, {
          attemptNumber,
          duration_ms: result.duration,
          cost_usd: 0,
          success: false,
          error: `Spending cap likely reached: ${resultText.slice(0, 100)}`,
        });
        // Throw as billing error so Temporal retries with long backoff
        throw new Error(`Spending cap likely reached: ${resultText.slice(0, 100)}`);
      }
    }

    // 7. Handle execution failure
    if (!result.success) {
      await rollbackGitWorkspace(repoPath, 'execution failure');
      await auditSession.endAgent(agentName, {
        attemptNumber,
        duration_ms: result.duration,
        cost_usd: result.cost || 0,
        success: false,
        error: result.error || 'Execution failed',
      });
      throw new Error(result.error || 'Agent execution failed');
    }

    // 8. Validate output
    const validationPassed = await validateAgentOutput(result, agentName, repoPath);
    if (!validationPassed) {
      await rollbackGitWorkspace(repoPath, 'validation failure');
      await auditSession.endAgent(agentName, {
        attemptNumber,
        duration_ms: result.duration,
        cost_usd: result.cost || 0,
        success: false,
        error: 'Output validation failed',
      });

      // Limit output validation retries (unlikely to self-heal)
      if (attemptNumber >= MAX_OUTPUT_VALIDATION_RETRIES) {
        throw ApplicationFailure.nonRetryable(
          `Agent ${agentName} failed output validation after ${attemptNumber} attempts`,
          'OutputValidationError',
          [{ agentName, attemptNumber, elapsed: Date.now() - startTime }]
        );
      }
      // Let Temporal retry (will be classified as OutputValidationError)
      throw new Error(`Agent ${agentName} failed output validation`);
    }

    // 9. Success - commit and log
    const commitHash = await getGitCommitHash(repoPath);
    await auditSession.endAgent(agentName, {
      attemptNumber,
      duration_ms: result.duration,
      cost_usd: result.cost || 0,
      success: true,
      ...(commitHash && { checkpoint: commitHash }),
    });
    await commitGitSuccess(repoPath, agentName);

    // 10. Return metrics
    return {
      durationMs: Date.now() - startTime,
      inputTokens: null, // Not currently exposed by SDK wrapper
      outputTokens: null,
      costUsd: result.cost ?? null,
      numTurns: result.turns ?? null,
    };
  } catch (error) {
    // Rollback git workspace before Temporal retry to ensure clean state
    try {
      await rollbackGitWorkspace(repoPath, 'error recovery');
    } catch (rollbackErr) {
      // Log but don't fail - rollback is best-effort
      console.error(`Failed to rollback git workspace for ${agentName}:`, rollbackErr);
    }

    // If error is already an ApplicationFailure (e.g., from our retry limit logic),
    // re-throw it directly without re-classifying
    if (error instanceof ApplicationFailure) {
      throw error;
    }

    // Classify error for Temporal retry behavior
    const classified = classifyErrorForTemporal(error);
    // Truncate message to prevent protobuf buffer overflow
    const rawMessage = error instanceof Error ? error.message : String(error);
    const message = truncateErrorMessage(rawMessage);

    if (classified.retryable) {
      // Temporal will retry with configured backoff
      const failure = ApplicationFailure.create({
        message,
        type: classified.type,
        details: [{ agentName, attemptNumber, elapsed: Date.now() - startTime }],
      });
      truncateStackTrace(failure);
      throw failure;
    } else {
      // Fail immediately - no retry
      const failure = ApplicationFailure.nonRetryable(message, classified.type, [
        { agentName, attemptNumber, elapsed: Date.now() - startTime },
      ]);
      truncateStackTrace(failure);
      throw failure;
    }
  } finally {
    clearInterval(heartbeatInterval);
  }
}

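The catch block above turns any thrown error into either a retryable or a non-retryable failure with a truncated message. The following is a minimal, dependency-free sketch of that decision. `classifyErrorForTemporal` and `truncateErrorMessage` are not shown in this diff, so the matching rules, the error type names `TransientError` and `UnknownError`, and the 2000-character limit below are assumptions made purely for illustration.

```typescript
// Hypothetical stand-ins for classifyErrorForTemporal / truncateErrorMessage.
// The regexes and the length limit are illustrative assumptions, not the repo's rules.
interface Classified {
  type: string;
  retryable: boolean;
}

interface FailurePlan {
  message: string;
  type: string;
  nonRetryable: boolean;
}

const MAX_MESSAGE_LENGTH = 2000; // assumed limit

// Cap the message length so the serialized failure stays small.
function truncateMessage(message: string, max: number = MAX_MESSAGE_LENGTH): string {
  return message.length <= max ? message : `${message.slice(0, max - 3)}...`;
}

// Map an arbitrary thrown value to an error type and a retry decision.
function classify(error: unknown): Classified {
  const message = error instanceof Error ? error.message : String(error);
  if (/authentication|invalid api key/i.test(message)) {
    return { type: 'AuthenticationError', retryable: false };
  }
  if (/rate limit|timeout|overloaded/i.test(message)) {
    return { type: 'TransientError', retryable: true };
  }
  return { type: 'UnknownError', retryable: true };
}

// Combine classification and truncation into the data needed to build a failure.
function planFailure(error: unknown): FailurePlan {
  const classified = classify(error);
  const raw = error instanceof Error ? error.message : String(error);
  return {
    message: truncateMessage(raw),
    type: classified.type,
    nonRetryable: !classified.retryable,
  };
}
```

Keeping the decision in a pure function like `planFailure` would make it testable without a Temporal server; the activity would then only need to translate the plan into `ApplicationFailure.create` or `ApplicationFailure.nonRetryable`.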
// === Individual Agent Activity Exports ===
// Each function is a thin wrapper around runAgentActivity with the agent name.

export async function runPreReconAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('pre-recon', input);
}

export async function runReconAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('recon', input);
}

export async function runInjectionVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('injection-vuln', input);
}

export async function runXssVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('xss-vuln', input);
}

export async function runAuthVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('auth-vuln', input);
}

export async function runSsrfVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('ssrf-vuln', input);
}

export async function runAuthzVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('authz-vuln', input);
}

export async function runInjectionExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('injection-exploit', input);
}

export async function runXssExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('xss-exploit', input);
}

export async function runAuthExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('auth-exploit', input);
}

export async function runSsrfExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('ssrf-exploit', input);
}

export async function runAuthzExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('authz-exploit', input);
}

export async function runReportAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('report', input);
}

/**
 * Assemble the final report by concatenating exploitation evidence files.
 * This must be called BEFORE runReportAgent to create the file that the report agent will modify.
 */
export async function assembleReportActivity(input: ActivityInput): Promise<void> {
  const { repoPath } = input;
  console.log(chalk.blue('📝 Assembling deliverables from specialist agents...'));
  try {
    await assembleFinalReport(repoPath);
  } catch (error) {
    const err = error as Error;
    console.log(chalk.yellow(`⚠️ Error assembling final report: ${err.message}`));
    // Don't throw - the report agent can still create content even if no exploitation files exist
  }
}

/**
 * Check if exploitation should run for a given vulnerability type.
 * Reads the vulnerability queue file and returns the decision.
 *
 * This activity allows the workflow to skip exploit agents entirely
 * when no vulnerabilities were found, saving API calls and time.
 *
 * Error handling:
 * - Retryable errors (missing files, invalid JSON): re-throw for Temporal retry
 * - Non-retryable errors: skip exploitation gracefully
 */
export async function checkExploitationQueue(
  input: ActivityInput,
  vulnType: VulnType
): Promise<ExploitationDecision> {
  const { repoPath } = input;

  const result = await safeValidateQueueAndDeliverable(vulnType, repoPath);

  if (result.success && result.data) {
    const { shouldExploit, vulnerabilityCount } = result.data;
    console.log(
      chalk.blue(
        `🔍 ${vulnType}: ${shouldExploit ? `${vulnerabilityCount} vulnerabilities found` : 'no vulnerabilities, skipping exploitation'}`
      )
    );
    return result.data;
  }

  // Validation failed - check if we should retry or skip
  const error = result.error;
  if (error?.retryable) {
    // Re-throw retryable errors so Temporal can retry the vuln agent
    console.log(chalk.yellow(`⚠️ ${vulnType}: ${error.message} (retrying)`));
    throw error;
  }

  // Non-retryable error - skip exploitation gracefully
  console.log(
    chalk.yellow(`⚠️ ${vulnType}: ${error?.message ?? 'Unknown error'}, skipping exploitation`)
  );
  return {
    shouldExploit: false,
    shouldRetry: false,
    vulnerabilityCount: 0,
    vulnType,
  };
}

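The three outcomes of `checkExploitationQueue` (return the decision, re-throw for retry, or skip) can be isolated from logging and file I/O. Below is a rough sketch under stated assumptions: the `ValidationResult` union and the `decideExploitation` helper are hypothetical names introduced here, and the real result shape returned by `safeValidateQueueAndDeliverable` may differ.

```typescript
// Hypothetical types; the field names mirror those used in the activity above.
interface ExploitationDecision {
  shouldExploit: boolean;
  shouldRetry: boolean;
  vulnerabilityCount: number;
  vulnType: string;
}

interface ValidationError {
  message: string;
  retryable: boolean;
}

type ValidationResult =
  | { success: true; data: ExploitationDecision }
  | { success: false; error: ValidationError };

// Pure decision function: no logging, no file access.
function decideExploitation(result: ValidationResult, vulnType: string): ExploitationDecision {
  if (result.success) {
    // Validation succeeded: pass the decision through unchanged.
    return result.data;
  }
  if (result.error.retryable) {
    // Throwing lets the caller's retry policy take over.
    throw new Error(result.error.message);
  }
  // Non-retryable validation error: skip exploitation instead of failing.
  return { shouldExploit: false, shouldRetry: false, vulnerabilityCount: 0, vulnType };
}
```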
/**
 * Log phase transition to the unified workflow log.
 * Called at phase boundaries for per-workflow logging.
 */
export async function logPhaseTransition(
  input: ActivityInput,
  phase: string,
  event: 'start' | 'complete'
): Promise<void> {
  const { webUrl, repoPath, outputPath, workflowId } = input;

  const sessionMetadata: SessionMetadata = {
    id: workflowId,
    webUrl,
    repoPath,
    ...(outputPath && { outputPath }),
  };

  const auditSession = new AuditSession(sessionMetadata);
  await auditSession.initialize();

  if (event === 'start') {
    await auditSession.logPhaseStart(phase);
  } else {
    await auditSession.logPhaseComplete(phase);
  }
}

/**
 * Log workflow completion with full summary to the unified workflow log.
 * Called at the end of the workflow to write a summary breakdown.
 */
export async function logWorkflowComplete(
  input: ActivityInput,
  summary: WorkflowSummary
): Promise<void> {
  const { webUrl, repoPath, outputPath, workflowId } = input;

  const sessionMetadata: SessionMetadata = {
    id: workflowId,
    webUrl,
    repoPath,
    ...(outputPath && { outputPath }),
  };

  const auditSession = new AuditSession(sessionMetadata);
  await auditSession.initialize();
  await auditSession.logWorkflowComplete(summary);
}
@@ -0,0 +1,212 @@
#!/usr/bin/env node
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal client for starting Shannon pentest pipeline workflows.
 *
 * Starts a workflow and optionally waits for completion with progress polling.
 *
 * Usage:
 *   npm run temporal:start -- <webUrl> <repoPath> [options]
 *   # or
 *   node dist/temporal/client.js <webUrl> <repoPath> [options]
 *
 * Options:
 *   --config <path>       Configuration file path
 *   --output <path>       Output directory for audit logs
 *   --pipeline-testing    Use minimal prompts for fast testing
 *   --workflow-id <id>    Custom workflow ID (default: <hostname>_shannon-<timestamp>)
 *   --wait                Wait for workflow completion with progress polling
 *
 * Environment:
 *   TEMPORAL_ADDRESS - Temporal server address (default: localhost:7233)
 */

import { Connection, Client } from '@temporalio/client';
import dotenv from 'dotenv';
import chalk from 'chalk';
import { displaySplashScreen } from '../splash-screen.js';
import { sanitizeHostname } from '../audit/utils.js';
// Import types only - these don't pull in workflow runtime code
import type { PipelineInput, PipelineState, PipelineProgress } from './shared.js';

dotenv.config();

// Query name must match the one defined in workflows.ts
const PROGRESS_QUERY = 'getProgress';

function showUsage(): void {
  console.log(chalk.cyan.bold('\nShannon Temporal Client'));
  console.log(chalk.gray('Start a pentest pipeline workflow\n'));
  console.log(chalk.yellow('Usage:'));
  console.log(
    ' node dist/temporal/client.js <webUrl> <repoPath> [options]\n'
  );
  console.log(chalk.yellow('Options:'));
  console.log(' --config <path> Configuration file path');
  console.log(' --output <path> Output directory for audit logs');
  console.log(' --pipeline-testing Use minimal prompts for fast testing');
  console.log(
    ' --workflow-id <id> Custom workflow ID (default: <hostname>_shannon-<timestamp>)'
  );
  console.log(' --wait Wait for workflow completion with progress polling\n');
  console.log(chalk.yellow('Examples:'));
  console.log(' node dist/temporal/client.js https://example.com /path/to/repo');
  console.log(
    ' node dist/temporal/client.js https://example.com /path/to/repo --config config.yaml\n'
  );
}

async function startPipeline(): Promise<void> {
  const args = process.argv.slice(2);

  if (args.includes('--help') || args.includes('-h') || args.length === 0) {
    showUsage();
    process.exit(0);
  }

  // Parse arguments
  let webUrl: string | undefined;
  let repoPath: string | undefined;
  let configPath: string | undefined;
  let outputPath: string | undefined;
  let pipelineTestingMode = false;
  let customWorkflowId: string | undefined;
  let waitForCompletion = false;

  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === '--config') {
      const nextArg = args[i + 1];
      if (nextArg && !nextArg.startsWith('-')) {
        configPath = nextArg;
        i++;
      }
    } else if (arg === '--output') {
      const nextArg = args[i + 1];
      if (nextArg && !nextArg.startsWith('-')) {
        outputPath = nextArg;
        i++;
      }
    } else if (arg === '--workflow-id') {
      const nextArg = args[i + 1];
      if (nextArg && !nextArg.startsWith('-')) {
        customWorkflowId = nextArg;
        i++;
      }
    } else if (arg === '--pipeline-testing') {
      pipelineTestingMode = true;
    } else if (arg === '--wait') {
      waitForCompletion = true;
    } else if (arg && !arg.startsWith('-')) {
      if (!webUrl) {
        webUrl = arg;
      } else if (!repoPath) {
        repoPath = arg;
      }
    }
  }

  if (!webUrl || !repoPath) {
    console.log(chalk.red('Error: webUrl and repoPath are required'));
    showUsage();
    process.exit(1);
  }

  // Display splash screen
  await displaySplashScreen();

  const address = process.env.TEMPORAL_ADDRESS || 'localhost:7233';
  console.log(chalk.gray(`Connecting to Temporal at ${address}...`));

  const connection = await Connection.connect({ address });
  const client = new Client({ connection });

  try {
    const hostname = sanitizeHostname(webUrl);
    const workflowId = customWorkflowId || `${hostname}_shannon-${Date.now()}`;

    const input: PipelineInput = {
      webUrl,
      repoPath,
      ...(configPath && { configPath }),
      ...(outputPath && { outputPath }),
      ...(pipelineTestingMode && { pipelineTestingMode }),
    };

    // Start workflow by name (not by importing the function).
    // This runs before the confirmation is printed so that "Workflow started"
    // is only shown once the server has accepted the workflow.
    const handle = await client.workflow.start<(input: PipelineInput) => Promise<PipelineState>>(
      'pentestPipelineWorkflow',
      {
        taskQueue: 'shannon-pipeline',
        workflowId,
        args: [input],
      }
    );

    console.log(chalk.green.bold(`✓ Workflow started: ${workflowId}`));
    console.log();
    console.log(chalk.white(' Target: ') + chalk.cyan(webUrl));
    console.log(chalk.white(' Repository: ') + chalk.cyan(repoPath));
    if (configPath) {
      console.log(chalk.white(' Config: ') + chalk.cyan(configPath));
    }
    if (pipelineTestingMode) {
      console.log(chalk.white(' Mode: ') + chalk.yellow('Pipeline Testing'));
    }
    console.log();

    if (!waitForCompletion) {
      console.log(chalk.bold('Monitor progress:'));
      console.log(chalk.white(' Web UI: ') + chalk.blue(`http://localhost:8233/namespaces/default/workflows/${workflowId}`));
      console.log(chalk.white(' Logs: ') + chalk.gray(`./shannon logs ID=${workflowId}`));
      console.log(chalk.white(' Query: ') + chalk.gray(`./shannon query ID=${workflowId}`));
      console.log();
      return;
    }

    // Poll for progress every 30 seconds
    const progressInterval = setInterval(async () => {
      try {
        const progress = await handle.query<PipelineProgress>(PROGRESS_QUERY);
        const elapsed = Math.floor(progress.elapsedMs / 1000);
        console.log(
          chalk.gray(`[${elapsed}s]`),
          chalk.cyan(`Phase: ${progress.currentPhase || 'unknown'}`),
          chalk.gray(`| Agent: ${progress.currentAgent || 'none'}`),
          chalk.gray(`| Completed: ${progress.completedAgents.length}/13`)
        );
      } catch {
        // Workflow may have completed
      }
    }, 30000);

    try {
      const result = await handle.result();
      clearInterval(progressInterval);

      console.log(chalk.green.bold('\nPipeline completed successfully!'));
      if (result.summary) {
        console.log(chalk.gray(`Duration: ${Math.floor(result.summary.totalDurationMs / 1000)}s`));
        console.log(chalk.gray(`Agents completed: ${result.summary.agentCount}`));
        console.log(chalk.gray(`Total turns: ${result.summary.totalTurns}`));
        console.log(chalk.gray(`Total cost: $${result.summary.totalCostUsd.toFixed(4)}`));
      }
    } catch (error) {
      clearInterval(progressInterval);
      console.error(chalk.red.bold('\nPipeline failed:'), error);
      process.exit(1);
    }
  } finally {
    await connection.close();
  }
}

startPipeline().catch((err) => {
  console.error(chalk.red('Client error:'), err);
  process.exit(1);
});
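The argument loop in `startPipeline` treats `--config`, `--output` and `--workflow-id` as flags that consume the next token, and assigns the first two bare tokens to `webUrl` and `repoPath`. The sketch below restates that loop as a standalone function so the rules can be exercised in isolation. `parseClientArgs`, `ParsedArgs` and `VALUE_FLAGS` are hypothetical names introduced for this illustration and are not part of the diff.

```typescript
// Hypothetical, table-driven restatement of the client's argument loop.
type ValueKey = 'configPath' | 'outputPath' | 'workflowId';

interface ParsedArgs {
  webUrl?: string;
  repoPath?: string;
  configPath?: string;
  outputPath?: string;
  workflowId?: string;
  pipelineTesting: boolean;
  wait: boolean;
}

// Flags that take a value, mapped to the field they populate.
const VALUE_FLAGS = new Map<string, ValueKey>([
  ['--config', 'configPath'],
  ['--output', 'outputPath'],
  ['--workflow-id', 'workflowId'],
]);

function parseClientArgs(args: string[]): ParsedArgs {
  const parsed: ParsedArgs = { pipelineTesting: false, wait: false };
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (!arg) continue;
    const key = VALUE_FLAGS.get(arg);
    if (key !== undefined) {
      // Consume the next token only if it does not look like another flag.
      const next = args[i + 1];
      if (next && !next.startsWith('-')) {
        parsed[key] = next;
        i++;
      }
    } else if (arg === '--pipeline-testing') {
      parsed.pipelineTesting = true;
    } else if (arg === '--wait') {
      parsed.wait = true;
    } else if (!arg.startsWith('-')) {
      // First bare token is the URL, second is the repository path; extras are ignored.
      if (!parsed.webUrl) {
        parsed.webUrl = arg;
      } else if (!parsed.repoPath) {
        parsed.repoPath = arg;
      }
    }
  }
  return parsed;
}
```

As in the original loop, a value flag followed by another flag (for example `--config --wait`) is silently ignored rather than reported, and unknown flags are dropped without an error.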
@@ -0,0 +1,155 @@
#!/usr/bin/env node
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal query tool for inspecting Shannon workflow progress.
 *
 * Queries a running or completed workflow and displays its state.
 *
 * Usage:
 *   npm run temporal:query -- <workflowId>
 *   # or
 *   node dist/temporal/query.js <workflowId>
 *
 * Environment:
 *   TEMPORAL_ADDRESS - Temporal server address (default: localhost:7233)
 */

import { Connection, Client } from '@temporalio/client';
import dotenv from 'dotenv';
import chalk from 'chalk';

dotenv.config();

// Query name must match the one defined in workflows.ts
const PROGRESS_QUERY = 'getProgress';

// Types duplicated from shared.ts to avoid importing workflow APIs
interface AgentMetrics {
  durationMs: number;
  inputTokens: number | null;
  outputTokens: number | null;
  costUsd: number | null;
  numTurns: number | null;
}

interface PipelineProgress {
  status: 'running' | 'completed' | 'failed';
  currentPhase: string | null;
  currentAgent: string | null;
  completedAgents: string[];
  failedAgent: string | null;
  error: string | null;
  startTime: number;
  agentMetrics: Record<string, AgentMetrics>;
  workflowId: string;
  elapsedMs: number;
}

function showUsage(): void {
  console.log(chalk.cyan.bold('\nShannon Temporal Query Tool'));
  console.log(chalk.gray('Query progress of a running workflow\n'));
  console.log(chalk.yellow('Usage:'));
  console.log(' node dist/temporal/query.js <workflowId>\n');
  console.log(chalk.yellow('Examples:'));
  console.log(' node dist/temporal/query.js shannon-1704672000000\n');
}

function getStatusColor(status: string): string {
  switch (status) {
    case 'running':
      return chalk.yellow(status);
    case 'completed':
      return chalk.green(status);
    case 'failed':
      return chalk.red(status);
    default:
      return status;
  }
}

function formatDuration(ms: number): string {
  const seconds = Math.floor(ms / 1000);
  const minutes = Math.floor(seconds / 60);
  const hours = Math.floor(minutes / 60);

  if (hours > 0) {
    return `${hours}h ${minutes % 60}m`;
  } else if (minutes > 0) {
    return `${minutes}m ${seconds % 60}s`;
  }
  return `${seconds}s`;
}

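A few worked values make the rounding behavior of `formatDuration` concrete. The function body is copied from above so the snippet runs on its own; the sample inputs are arbitrary.

```typescript
// Copy of formatDuration from query.ts so the example is self-contained.
function formatDuration(ms: number): string {
  const seconds = Math.floor(ms / 1000);
  const minutes = Math.floor(seconds / 60);
  const hours = Math.floor(minutes / 60);

  if (hours > 0) {
    return `${hours}h ${minutes % 60}m`;
  } else if (minutes > 0) {
    return `${minutes}m ${seconds % 60}s`;
  }
  return `${seconds}s`;
}

// Sub-second values truncate to whole seconds.
console.log(formatDuration(999)); // "0s"
// Under a minute: seconds only.
console.log(formatDuration(45000)); // "45s"
// Under an hour: minutes and leftover seconds.
console.log(formatDuration(125000)); // "2m 5s"
// An hour or more: hours and leftover minutes; seconds are dropped.
console.log(formatDuration(3725000)); // "1h 2m"
```

Note that once a run exceeds an hour the seconds component is no longer shown, so two durations up to 59 seconds apart can print identically.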
async function queryWorkflow(): Promise<void> {
  const workflowId = process.argv[2];

  if (!workflowId || workflowId === '--help' || workflowId === '-h') {
    showUsage();
    process.exit(workflowId ? 0 : 1);
  }

  const address = process.env.TEMPORAL_ADDRESS || 'localhost:7233';

  const connection = await Connection.connect({ address });
  const client = new Client({ connection });

  try {
    const handle = client.workflow.getHandle(workflowId);
    const progress = await handle.query<PipelineProgress>(PROGRESS_QUERY);

    console.log(chalk.cyan.bold('\nWorkflow Progress'));
    console.log(chalk.gray('\u2500'.repeat(40)));
    console.log(`${chalk.white('Workflow ID:')} ${progress.workflowId}`);
    console.log(`${chalk.white('Status:')} ${getStatusColor(progress.status)}`);
    console.log(
      `${chalk.white('Current Phase:')} ${progress.currentPhase || 'none'}`
    );
    console.log(
      `${chalk.white('Current Agent:')} ${progress.currentAgent || 'none'}`
    );
    console.log(`${chalk.white('Elapsed:')} ${formatDuration(progress.elapsedMs)}`);
    console.log(
      `${chalk.white('Completed:')} ${progress.completedAgents.length}/13 agents`
    );

    if (progress.completedAgents.length > 0) {
      console.log(chalk.gray('\nCompleted agents:'));
      for (const agent of progress.completedAgents) {
        const metrics = progress.agentMetrics[agent];
        const duration = metrics ? formatDuration(metrics.durationMs) : 'unknown';
        const cost = metrics?.costUsd ? `$${metrics.costUsd.toFixed(4)}` : '';
        console.log(
          chalk.green(` - ${agent}`) +
            chalk.gray(` (${duration}${cost ? ', ' + cost : ''})`)
        );
      }
    }

    if (progress.error) {
      console.log(chalk.red(`\nError: ${progress.error}`));
      console.log(chalk.red(`Failed agent: ${progress.failedAgent}`));
    }

    console.log();
  } catch (error) {
    const err = error as Error;
    if (err.message?.includes('not found')) {
      console.log(chalk.red(`Workflow not found: ${workflowId}`));
    } else {
      console.error(chalk.red('Query failed:'), err.message);
    }
    process.exit(1);
  } finally {
    await connection.close();
  }
}

queryWorkflow().catch((err) => {
  console.error(chalk.red('Query error:'), err);
  process.exit(1);
});
@@ -0,0 +1,61 @@
import { defineQuery } from '@temporalio/workflow';

// === Types ===

export interface PipelineInput {
  webUrl: string;
  repoPath: string;
  configPath?: string;
  outputPath?: string;
  pipelineTestingMode?: boolean;
  workflowId?: string; // Added by client, used for audit correlation
}

export interface AgentMetrics {
  durationMs: number;
  inputTokens: number | null;
  outputTokens: number | null;
  costUsd: number | null;
  numTurns: number | null;
}

export interface PipelineSummary {
  totalCostUsd: number;
  totalDurationMs: number; // Wall-clock time (end - start)
  totalTurns: number;
  agentCount: number;
}

export interface PipelineState {
  status: 'running' | 'completed' | 'failed';
  currentPhase: string | null;
  currentAgent: string | null;
  completedAgents: string[];
  failedAgent: string | null;
  error: string | null;
  startTime: number;
  agentMetrics: Record<string, AgentMetrics>;
  summary: PipelineSummary | null;
}

// Extended state returned by getProgress query (includes computed fields)
export interface PipelineProgress extends PipelineState {
  workflowId: string;
  elapsedMs: number;
}

// Result from a single vuln→exploit pipeline
export interface VulnExploitPipelineResult {
  vulnType: string;
  vulnMetrics: AgentMetrics | null;
  exploitMetrics: AgentMetrics | null;
  exploitDecision: {
    shouldExploit: boolean;
    vulnerabilityCount: number;
  } | null;
  error: string | null;
}

// === Queries ===

export const getProgress = defineQuery<PipelineProgress>('getProgress');
@@ -0,0 +1,79 @@
#!/usr/bin/env node
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal worker for Shannon pentest pipeline.
 *
 * Polls the 'shannon-pipeline' task queue and executes activities.
 * Handles up to 25 concurrent activities to support multiple parallel workflows.
 *
 * Usage:
 *   npm run temporal:worker
 *   # or
 *   node dist/temporal/worker.js
 *
 * Environment:
 *   TEMPORAL_ADDRESS - Temporal server address (default: localhost:7233)
 */

import { NativeConnection, Worker, bundleWorkflowCode } from '@temporalio/worker';
import { fileURLToPath } from 'node:url';
import path from 'node:path';
import dotenv from 'dotenv';
import chalk from 'chalk';
import * as activities from './activities.js';

dotenv.config();

const __dirname = path.dirname(fileURLToPath(import.meta.url));

async function runWorker(): Promise<void> {
  const address = process.env.TEMPORAL_ADDRESS || 'localhost:7233';
  console.log(chalk.cyan(`Connecting to Temporal at ${address}...`));

  const connection = await NativeConnection.connect({ address });

  // Bundle workflows for Temporal's V8 isolate
  console.log(chalk.gray('Bundling workflows...'));
  const workflowBundle = await bundleWorkflowCode({
    workflowsPath: path.join(__dirname, 'workflows.js'),
  });

  const worker = await Worker.create({
    connection,
    namespace: 'default',
    workflowBundle,
    activities,
    taskQueue: 'shannon-pipeline',
    maxConcurrentActivityTaskExecutions: 25, // Support multiple parallel workflows (5 agents × ~5 workflows)
  });

  // Graceful shutdown handling
  const shutdown = async (): Promise<void> => {
    console.log(chalk.yellow('\nShutting down worker...'));
    worker.shutdown();
  };

  process.on('SIGINT', shutdown);
  process.on('SIGTERM', shutdown);

  console.log(chalk.green('Shannon worker started'));
  console.log(chalk.gray('Task queue: shannon-pipeline'));
  console.log(chalk.gray('Press Ctrl+C to stop\n'));

  try {
    await worker.run();
  } finally {
    await connection.close();
    console.log(chalk.gray('Worker stopped'));
  }
}

runWorker().catch((err) => {
  console.error(chalk.red('Worker failed:'), err);
  process.exit(1);
});
@@ -0,0 +1,325 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal workflow for Shannon pentest pipeline.
 *
 * Orchestrates the penetration testing workflow:
 * 1. Pre-Reconnaissance (sequential)
 * 2. Reconnaissance (sequential)
 * 3-4. Vulnerability + Exploitation (5 pipelined pairs in parallel)
 *      Each pair: vuln agent → queue check → conditional exploit
 *      No synchronization barrier - exploits start when their vuln finishes
 * 5. Reporting (sequential)
 *
 * Features:
 * - Queryable state via getProgress
 * - Automatic retry with backoff for transient/billing errors
 * - Non-retryable classification for permanent errors
 * - Audit correlation via workflowId
 * - Graceful failure handling: pipelines continue if one fails
 */

import {
  proxyActivities,
  setHandler,
  workflowInfo,
} from '@temporalio/workflow';
import type * as activities from './activities.js';
import type { ActivityInput } from './activities.js';
import {
  getProgress,
  type PipelineInput,
  type PipelineState,
  type PipelineProgress,
  type PipelineSummary,
  type VulnExploitPipelineResult,
  type AgentMetrics,
} from './shared.js';
import type { VulnType } from '../queue-validation.js';

+// Retry configuration for production (long intervals for billing recovery)
+const PRODUCTION_RETRY = {
+  initialInterval: '5 minutes',
+  maximumInterval: '30 minutes',
+  backoffCoefficient: 2,
+  maximumAttempts: 50,
+  nonRetryableErrorTypes: [
+    'AuthenticationError',
+    'PermissionError',
+    'InvalidRequestError',
+    'RequestTooLargeError',
+    'ConfigurationError',
+    'InvalidTargetError',
+    'ExecutionLimitError',
+  ],
+};
+
+// Retry configuration for pipeline testing (fast iteration)
+const TESTING_RETRY = {
+  initialInterval: '10 seconds',
+  maximumInterval: '30 seconds',
+  backoffCoefficient: 2,
+  maximumAttempts: 5,
+  nonRetryableErrorTypes: PRODUCTION_RETRY.nonRetryableErrorTypes,
+};
+
+// Activity proxy with production retry configuration (default)
+const acts = proxyActivities<typeof activities>({
+  startToCloseTimeout: '2 hours',
+  heartbeatTimeout: '10 minutes', // Long timeout for resource-constrained workers with many concurrent activities
+  retry: PRODUCTION_RETRY,
+});
+
+// Activity proxy with testing retry configuration (fast)
+const testActs = proxyActivities<typeof activities>({
+  startToCloseTimeout: '10 minutes',
+  heartbeatTimeout: '5 minutes', // Shorter for testing but still tolerant of resource contention
+  retry: TESTING_RETRY,
+});
+
+/**
+ * Compute aggregated metrics from the current pipeline state.
+ * Called on both success and failure to provide partial metrics.
+ */
+function computeSummary(state: PipelineState): PipelineSummary {
+  const metrics = Object.values(state.agentMetrics);
+  return {
+    totalCostUsd: metrics.reduce((sum, m) => sum + (m.costUsd ?? 0), 0),
+    totalDurationMs: Date.now() - state.startTime,
+    totalTurns: metrics.reduce((sum, m) => sum + (m.numTurns ?? 0), 0),
+    agentCount: state.completedAgents.length,
+  };
+}
+
+export async function pentestPipelineWorkflow(
+  input: PipelineInput
+): Promise<PipelineState> {
+  const { workflowId } = workflowInfo();
+
+  // Select activity proxy based on testing mode
+  // Pipeline testing uses fast retry intervals (10s) for quick iteration
+  const a = input.pipelineTestingMode ? testActs : acts;
+
+  // Workflow state (queryable)
+  const state: PipelineState = {
+    status: 'running',
+    currentPhase: null,
+    currentAgent: null,
+    completedAgents: [],
+    failedAgent: null,
+    error: null,
+    startTime: Date.now(),
+    agentMetrics: {},
+    summary: null,
+  };
+
+  // Register query handler for real-time progress inspection
+  setHandler(getProgress, (): PipelineProgress => ({
+    ...state,
+    workflowId,
+    elapsedMs: Date.now() - state.startTime,
+  }));
+
+  // Build ActivityInput with required workflowId for audit correlation
+  // Activities require workflowId (non-optional), PipelineInput has it optional
+  // Use spread to conditionally include optional properties (exactOptionalPropertyTypes)
+  const activityInput: ActivityInput = {
+    webUrl: input.webUrl,
+    repoPath: input.repoPath,
+    workflowId,
+    ...(input.configPath !== undefined && { configPath: input.configPath }),
+    ...(input.outputPath !== undefined && { outputPath: input.outputPath }),
+    ...(input.pipelineTestingMode !== undefined && {
+      pipelineTestingMode: input.pipelineTestingMode,
+    }),
+  };
+
+  try {
+    // === Phase 1: Pre-Reconnaissance ===
+    state.currentPhase = 'pre-recon';
+    state.currentAgent = 'pre-recon';
+    await a.logPhaseTransition(activityInput, 'pre-recon', 'start');
+    state.agentMetrics['pre-recon'] =
+      await a.runPreReconAgent(activityInput);
+    state.completedAgents.push('pre-recon');
+    await a.logPhaseTransition(activityInput, 'pre-recon', 'complete');
+
+    // === Phase 2: Reconnaissance ===
+    state.currentPhase = 'recon';
+    state.currentAgent = 'recon';
+    await a.logPhaseTransition(activityInput, 'recon', 'start');
+    state.agentMetrics['recon'] = await a.runReconAgent(activityInput);
+    state.completedAgents.push('recon');
+    await a.logPhaseTransition(activityInput, 'recon', 'complete');
+
+    // === Phases 3-4: Vulnerability Analysis + Exploitation (Pipelined) ===
+    // Each vuln type runs as an independent pipeline:
+    // vuln agent → queue check → conditional exploit agent
+    // This eliminates the synchronization barrier between phases - each exploit
+    // starts immediately when its vuln agent finishes, not waiting for all.
+    state.currentPhase = 'vulnerability-exploitation';
+    state.currentAgent = 'pipelines';
+    await a.logPhaseTransition(activityInput, 'vulnerability-exploitation', 'start');
+
+    // Helper: Run a single vuln→exploit pipeline
+    async function runVulnExploitPipeline(
+      vulnType: VulnType,
+      runVulnAgent: () => Promise<AgentMetrics>,
+      runExploitAgent: () => Promise<AgentMetrics>
+    ): Promise<VulnExploitPipelineResult> {
+      // Step 1: Run vulnerability agent
+      const vulnMetrics = await runVulnAgent();
+
+      // Step 2: Check exploitation queue (starts immediately after vuln)
+      const decision = await a.checkExploitationQueue(activityInput, vulnType);
+
+      // Step 3: Conditionally run exploit agent
+      let exploitMetrics: AgentMetrics | null = null;
+      if (decision.shouldExploit) {
+        exploitMetrics = await runExploitAgent();
+      }
+
+      return {
+        vulnType,
+        vulnMetrics,
+        exploitMetrics,
+        exploitDecision: {
+          shouldExploit: decision.shouldExploit,
+          vulnerabilityCount: decision.vulnerabilityCount,
+        },
+        error: null,
+      };
+    }
+
+    // Run all 5 pipelines in parallel with graceful failure handling
+    // Promise.allSettled ensures other pipelines continue if one fails
+    const pipelineResults = await Promise.allSettled([
+      runVulnExploitPipeline(
+        'injection',
+        () => a.runInjectionVulnAgent(activityInput),
+        () => a.runInjectionExploitAgent(activityInput)
+      ),
+      runVulnExploitPipeline(
+        'xss',
+        () => a.runXssVulnAgent(activityInput),
+        () => a.runXssExploitAgent(activityInput)
+      ),
+      runVulnExploitPipeline(
+        'auth',
+        () => a.runAuthVulnAgent(activityInput),
+        () => a.runAuthExploitAgent(activityInput)
+      ),
+      runVulnExploitPipeline(
+        'ssrf',
+        () => a.runSsrfVulnAgent(activityInput),
+        () => a.runSsrfExploitAgent(activityInput)
+      ),
+      runVulnExploitPipeline(
+        'authz',
+        () => a.runAuthzVulnAgent(activityInput),
+        () => a.runAuthzExploitAgent(activityInput)
+      ),
+    ]);
+
+    // Aggregate results from all pipelines
+    const failedPipelines: string[] = [];
+    for (const result of pipelineResults) {
+      if (result.status === 'fulfilled') {
+        const { vulnType, vulnMetrics, exploitMetrics } = result.value;
+
+        // Record vuln agent metrics
+        if (vulnMetrics) {
+          state.agentMetrics[`${vulnType}-vuln`] = vulnMetrics;
+          state.completedAgents.push(`${vulnType}-vuln`);
+        }
+
+        // Record exploit agent metrics (if it ran)
+        if (exploitMetrics) {
+          state.agentMetrics[`${vulnType}-exploit`] = exploitMetrics;
+          state.completedAgents.push(`${vulnType}-exploit`);
+        }
+      } else {
+        // Pipeline failed - log error but continue with others
+        const errorMsg =
+          result.reason instanceof Error
+            ? result.reason.message
+            : String(result.reason);
+        failedPipelines.push(errorMsg);
+      }
+    }
+
+    // Log any pipeline failures (workflow continues despite failures)
+    if (failedPipelines.length > 0) {
+      console.log(
+        `⚠️ ${failedPipelines.length} pipeline(s) failed:`,
+        failedPipelines
+      );
+    }
+
+    // Update phase markers
+    state.currentPhase = 'exploitation';
+    state.currentAgent = null;
+    await a.logPhaseTransition(activityInput, 'vulnerability-exploitation', 'complete');
+
+    // === Phase 5: Reporting ===
+    state.currentPhase = 'reporting';
+    state.currentAgent = 'report';
+    await a.logPhaseTransition(activityInput, 'reporting', 'start');
+
+    // First, assemble the concatenated report from exploitation evidence files
+    await a.assembleReportActivity(activityInput);
+
+    // Then run the report agent to add executive summary and clean up
+    state.agentMetrics['report'] = await a.runReportAgent(activityInput);
+    state.completedAgents.push('report');
+    await a.logPhaseTransition(activityInput, 'reporting', 'complete');
+
+    // === Complete ===
+    state.status = 'completed';
+    state.currentPhase = null;
+    state.currentAgent = null;
+    state.summary = computeSummary(state);
+
+    // Log workflow completion summary
+    await a.logWorkflowComplete(activityInput, {
+      status: 'completed',
+      totalDurationMs: state.summary.totalDurationMs,
+      totalCostUsd: state.summary.totalCostUsd,
+      completedAgents: state.completedAgents,
+      agentMetrics: Object.fromEntries(
+        Object.entries(state.agentMetrics).map(([name, m]) => [
+          name,
+          { durationMs: m.durationMs, costUsd: m.costUsd },
+        ])
+      ),
+    });
+
+    return state;
+  } catch (error) {
+    state.status = 'failed';
+    state.failedAgent = state.currentAgent;
+    state.error = error instanceof Error ? error.message : String(error);
+    state.summary = computeSummary(state);
+
+    // Log workflow failure summary
+    await a.logWorkflowComplete(activityInput, {
+      status: 'failed',
+      totalDurationMs: state.summary.totalDurationMs,
+      totalCostUsd: state.summary.totalCostUsd,
+      completedAgents: state.completedAgents,
+      agentMetrics: Object.fromEntries(
+        Object.entries(state.agentMetrics).map(([name, m]) => [
+          name,
+          { durationMs: m.durationMs, costUsd: m.costUsd },
+        ])
+      ),
+      error: state.error ?? undefined,
+    });
+
+    throw error;
+  }
+}
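The two retry policies in the workflow above differ mainly in how long a failing activity waits between attempts. As a rough illustration, assuming the usual capped exponential backoff (each wait is the previous one multiplied by `backoffCoefficient`, never exceeding `maximumInterval`), the sketch below lists the first few waits for each policy. `retryDelays` is a hypothetical helper written for this note; it is not part of the PR or of the Temporal SDK, and the units are whatever the caller passes in.

```typescript
// Hypothetical helper (not in the PR, not a Temporal API): lists the wait
// before each retry under capped exponential backoff.
function retryDelays(
  initialInterval: number,
  backoffCoefficient: number,
  maximumInterval: number,
  retries: number
): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) {
    const uncapped = initialInterval * Math.pow(backoffCoefficient, i);
    delays.push(Math.min(uncapped, maximumInterval));
  }
  return delays;
}

// PRODUCTION_RETRY values, in minutes: 5-minute start, doubling, 30-minute cap.
const productionDelays = retryDelays(5, 2, 30, 5);
console.log(productionDelays); // [5, 10, 20, 30, 30]

// TESTING_RETRY values, in seconds: maximumAttempts of 5 leaves at most 4 retries.
const testingDelays = retryDelays(10, 2, 30, 4);
console.log(testingDelays); // [10, 20, 30, 30]
```

Under this reading the production policy reaches its 30-minute ceiling by the fourth retry, so most of its 50 attempts would be spaced half an hour apart.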
+23
-4
@@ -47,10 +47,6 @@ export type PlaywrightAgent =

 export type AgentValidator = (sourceDir: string) => Promise<boolean>;

-export type AgentValidatorMap = Record<AgentName, AgentValidator>;
-
-export type McpAgentMapping = Record<PromptName, PlaywrightAgent>;
-
 export type AgentStatus =
   | 'pending'
   | 'in_progress'
@@ -63,3 +59,26 @@ export interface AgentDefinition {
   displayName: string;
   prerequisites: AgentName[];
 }
+
+/**
+ * Maps an agent name to its corresponding prompt file name.
+ */
+export function getPromptNameForAgent(agentName: AgentName): PromptName {
+  const mappings: Record<AgentName, PromptName> = {
+    'pre-recon': 'pre-recon-code',
+    'recon': 'recon',
+    'injection-vuln': 'vuln-injection',
+    'xss-vuln': 'vuln-xss',
+    'auth-vuln': 'vuln-auth',
+    'ssrf-vuln': 'vuln-ssrf',
+    'authz-vuln': 'vuln-authz',
+    'injection-exploit': 'exploit-injection',
+    'xss-exploit': 'exploit-xss',
+    'auth-exploit': 'exploit-auth',
+    'ssrf-exploit': 'exploit-ssrf',
+    'authz-exploit': 'exploit-authz',
+    'report': 'report-executive',
+  };
+
+  return mappings[agentName];
+}
@@ -31,13 +31,12 @@ type UnlockFunction = () => void;
  * }
  * ```
  */
+// Promise-based mutex with queue semantics - safe for parallel agents on same session
 export class SessionMutex {
   // Map of sessionId -> Promise (represents active lock)
   private locks: Map<string, Promise<void>> = new Map();

-  /**
-   * Acquire lock for a session
-   */
+  // Wait for existing lock, then acquire. Queue ensures FIFO ordering.
   async lock(sessionId: string): Promise<UnlockFunction> {
     if (this.locks.has(sessionId)) {
       // Wait for existing lock to be released
@@ -0,0 +1,73 @@
+// Copyright (C) 2025 Keygraph, Inc.
+//
+// This program is free software: you can redistribute it and/or modify
+// it under the terms of the GNU Affero General Public License version 3
+// as published by the Free Software Foundation.
+
+/**
+ * File I/O Utilities
+ *
+ * Core utility functions for file operations including atomic writes,
+ * directory creation, and JSON file handling.
+ */
+
+import fs from 'fs/promises';
+
+/**
+ * Ensure directory exists (idempotent, race-safe)
+ */
+export async function ensureDirectory(dirPath: string): Promise<void> {
+  try {
+    await fs.mkdir(dirPath, { recursive: true });
+  } catch (error) {
+    // Ignore EEXIST errors (race condition safe)
+    if ((error as NodeJS.ErrnoException).code !== 'EEXIST') {
+      throw error;
+    }
+  }
+}
+
+/**
+ * Atomic write using temp file + rename pattern
+ * Guarantees no partial writes or corruption on crash
+ */
+export async function atomicWrite(filePath: string, data: object | string): Promise<void> {
+  const tempPath = `${filePath}.tmp`;
+  const content = typeof data === 'string' ? data : JSON.stringify(data, null, 2);
+
+  try {
+    // Write to temp file
+    await fs.writeFile(tempPath, content, 'utf8');
+
+    // Atomic rename (POSIX guarantee: atomic on same filesystem)
+    await fs.rename(tempPath, filePath);
+  } catch (error) {
+    // Clean up temp file on failure
+    try {
+      await fs.unlink(tempPath);
+    } catch {
+      // Ignore cleanup errors
+    }
+    throw error;
+  }
+}
+
+/**
+ * Read and parse JSON file
+ */
+export async function readJson<T = unknown>(filePath: string): Promise<T> {
+  const content = await fs.readFile(filePath, 'utf8');
+  return JSON.parse(content) as T;
+}
+
+/**
+ * Check if file exists
+ */
+export async function fileExists(filePath: string): Promise<boolean> {
+  try {
+    await fs.access(filePath);
+    return true;
+  } catch {
+    return false;
+  }
+}
@@ -0,0 +1,60 @@
+// Copyright (C) 2025 Keygraph, Inc.
+//
+// This program is free software: you can redistribute it and/or modify
+// it under the terms of the GNU Affero General Public License version 3
+// as published by the Free Software Foundation.
+
+/**
+ * Formatting Utilities
+ *
+ * Generic formatting functions for durations, timestamps, and percentages.
+ */
+
+/**
+ * Format duration in milliseconds to human-readable string
+ */
+export function formatDuration(ms: number): string {
+  if (ms < 1000) {
+    return `${ms}ms`;
+  }
+
+  const seconds = ms / 1000;
+  if (seconds < 60) {
+    return `${seconds.toFixed(1)}s`;
+  }
+
+  const minutes = Math.floor(seconds / 60);
+  const remainingSeconds = Math.floor(seconds % 60);
+  return `${minutes}m ${remainingSeconds}s`;
+}
+
+/**
+ * Format timestamp to ISO 8601 string
+ */
+export function formatTimestamp(timestamp: number = Date.now()): string {
+  return new Date(timestamp).toISOString();
+}
+
+/**
+ * Calculate percentage
+ */
+export function calculatePercentage(part: number, total: number): number {
+  if (total === 0) return 0;
+  return (part / total) * 100;
+}
+
+/**
+ * Extract agent type from description string for display purposes
+ */
+export function extractAgentType(description: string): string {
+  if (description.includes('Pre-recon')) {
+    return 'pre-reconnaissance';
+  }
+  if (description.includes('Recon')) {
+    return 'reconnaissance';
+  }
+  if (description.includes('Report')) {
+    return 'report generation';
+  }
+  return 'analysis';
+}
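To make the thresholds in `formatDuration` concrete, the snippet below runs a standalone copy of the function from the diff above on a few sample values. The copy is repeated only so the example runs on its own; the sample inputs are arbitrary.

```typescript
// Standalone copy of formatDuration from the diff above, repeated here only so
// the snippet runs on its own.
function formatDuration(ms: number): string {
  if (ms < 1000) {
    return `${ms}ms`;
  }
  const seconds = ms / 1000;
  if (seconds < 60) {
    return `${seconds.toFixed(1)}s`;
  }
  const minutes = Math.floor(seconds / 60);
  const remainingSeconds = Math.floor(seconds % 60);
  return `${minutes}m ${remainingSeconds}s`;
}

// Arbitrary sample durations: 250 ms, 1.5 s, 65 s, and one hour.
const samples = [250, 1500, 65000, 3600000].map(formatDuration);
console.log(samples); // ['250ms', '1.5s', '1m 5s', '60m 0s']
```

There is no hour unit, so a run at the two-hour `startToCloseTimeout` would be reported as `120m 0s`.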
@@ -0,0 +1,29 @@
+// Copyright (C) 2025 Keygraph, Inc.
+//
+// This program is free software: you can redistribute it and/or modify
+// it under the terms of the GNU Affero General Public License version 3
+// as published by the Free Software Foundation.
+
+/**
+ * Functional Programming Utilities
+ *
+ * Generic functional composition patterns for async operations.
+ */
+
+// eslint-disable-next-line @typescript-eslint/no-explicit-any
+type PipelineFunction = (x: any) => any | Promise<any>;
+
+/**
+ * Async pipeline that passes result through a series of functions.
+ * Clearer than reduce-based pipe and easier to debug.
+ */
+export async function asyncPipe<TResult>(
+  initial: unknown,
+  ...fns: PipelineFunction[]
+): Promise<TResult> {
+  let result = initial;
+  for (const fn of fns) {
+    result = await fn(result);
+  }
+  return result as TResult;
+}
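A short usage sketch may help show what `asyncPipe` does with a mix of synchronous and asynchronous steps. The definition below is a copy of the one in the diff above, repeated so the snippet is self-contained; the three step functions (`trim`, `countWords`, `label`) are hypothetical and exist only for this example.

```typescript
// eslint-disable-next-line @typescript-eslint/no-explicit-any
type PipelineFunction = (x: any) => any | Promise<any>;

// Copy of asyncPipe from the diff above.
async function asyncPipe<TResult>(
  initial: unknown,
  ...fns: PipelineFunction[]
): Promise<TResult> {
  let result = initial;
  for (const fn of fns) {
    // Each step is awaited, so sync and async functions can be mixed freely.
    result = await fn(result);
  }
  return result as TResult;
}

// Hypothetical steps, invented for this example.
const trim = (s: string): string => s.trim();
const countWords = async (s: string): Promise<number> => s.split(/\s+/).length;
const label = (n: number): string => `${n} words`;

const summary: Promise<string> = asyncPipe<string>(
  '  pre-recon then recon  ',
  trim,
  countWords,
  label
);
summary.then((text) => console.log(text)); // "3 words"
```

Because the steps are typed as `any`, the compiler does not check that one step's output matches the next step's input; the `TResult` type argument is an assertion by the caller.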
+171
-148
@@ -7,13 +7,76 @@
 import { $ } from 'zx';
 import chalk from 'chalk';

+/**
+ * Check if a directory is a git repository.
+ * Returns true if the directory contains a .git folder or is inside a git repo.
+ */
+export async function isGitRepository(dir: string): Promise<boolean> {
+  try {
+    await $`cd ${dir} && git rev-parse --git-dir`.quiet();
+    return true;
+  } catch {
+    return false;
+  }
+}
+
 interface GitOperationResult {
   success: boolean;
   hadChanges?: boolean;
   error?: Error;
 }

-// Global git operations semaphore to prevent index.lock conflicts during parallel execution
+/**
+ * Get list of changed files from git status --porcelain output
+ */
+async function getChangedFiles(
+  sourceDir: string,
+  operationDescription: string
+): Promise<string[]> {
+  const status = await executeGitCommandWithRetry(
+    ['git', 'status', '--porcelain'],
+    sourceDir,
+    operationDescription
+  );
+  return status.stdout
+    .trim()
+    .split('\n')
+    .filter((line) => line.length > 0);
+}
+
+/**
+ * Log a summary of changed files with truncation for long lists
+ */
+function logChangeSummary(
+  changes: string[],
+  messageWithChanges: string,
+  messageWithoutChanges: string,
+  color: typeof chalk.green,
+  maxToShow: number = 5
+): void {
+  if (changes.length > 0) {
+    console.log(color(messageWithChanges.replace('{count}', String(changes.length))));
+    changes.slice(0, maxToShow).forEach((change) => console.log(chalk.gray(` ${change}`)));
+    if (changes.length > maxToShow) {
+      console.log(chalk.gray(` ... and ${changes.length - maxToShow} more files`));
+    }
+  } else {
+    console.log(color(messageWithoutChanges));
+  }
+}
+
+/**
+ * Convert unknown error to GitOperationResult
+ */
+function toErrorResult(error: unknown): GitOperationResult {
+  const errMsg = error instanceof Error ? error.message : String(error);
+  return {
+    success: false,
+    error: error instanceof Error ? error : new Error(errMsg),
+  };
+}
+
+// Serializes git operations to prevent index.lock conflicts during parallel agent execution
 class GitSemaphore {
   private queue: Array<() => void> = [];
   private running: boolean = false;
@@ -41,33 +104,38 @@ class GitSemaphore {

 const gitSemaphore = new GitSemaphore();

-// Execute git commands with retry logic for index.lock conflicts
-export const executeGitCommandWithRetry = async (
+const GIT_LOCK_ERROR_PATTERNS = [
+  'index.lock',
+  'unable to lock',
+  'Another git process',
+  'fatal: Unable to create',
+  'fatal: index file',
+];
+
+function isGitLockError(errorMessage: string): boolean {
+  return GIT_LOCK_ERROR_PATTERNS.some((pattern) => errorMessage.includes(pattern));
+}
+
+// Retries git commands on lock conflicts with exponential backoff
+export async function executeGitCommandWithRetry(
   commandArgs: string[],
   sourceDir: string,
   description: string,
   maxRetries: number = 5
-): Promise<{ stdout: string; stderr: string }> => {
+): Promise<{ stdout: string; stderr: string }> {
   await gitSemaphore.acquire();

   try {
     for (let attempt = 1; attempt <= maxRetries; attempt++) {
       try {
-        // For arrays like ['git', 'status', '--porcelain'], execute parts separately
         const [cmd, ...args] = commandArgs;
         const result = await $`cd ${sourceDir} && ${cmd} ${args}`;
         return result;
       } catch (error) {
         const errMsg = error instanceof Error ? error.message : String(error);
-        const isLockError =
-          errMsg.includes('index.lock') ||
-          errMsg.includes('unable to lock') ||
-          errMsg.includes('Another git process') ||
-          errMsg.includes('fatal: Unable to create') ||
-          errMsg.includes('fatal: index file');

-        if (isLockError && attempt < maxRetries) {
-          const delay = Math.pow(2, attempt - 1) * 1000; // Exponential backoff: 1s, 2s, 4s, 8s, 16s
+        if (isGitLockError(errMsg) && attempt < maxRetries) {
+          const delay = Math.pow(2, attempt - 1) * 1000;
           console.log(
             chalk.yellow(
               ` ⚠️ Git lock conflict during ${description} (attempt ${attempt}/${maxRetries}). Retrying in ${delay}ms...`
@@ -80,84 +148,81 @@ export const executeGitCommandWithRetry = async (
         throw error;
       }
     }
-    // Should never reach here but TypeScript needs a return
     throw new Error(`Git command failed after ${maxRetries} retries`);
   } finally {
     gitSemaphore.release();
   }
-};
+}

-// Pure functions for Git workspace management
-const cleanWorkspace = async (
+// Two-phase reset: hard reset (tracked files) + clean (untracked files)
+export async function rollbackGitWorkspace(
   sourceDir: string,
-  reason: string = 'clean start'
-): Promise<GitOperationResult> => {
-  console.log(chalk.blue(` 🧹 Cleaning workspace for ${reason}`));
-  try {
-    // Check for uncommitted changes
-    const status = await $`cd ${sourceDir} && git status --porcelain`;
-    const hasChanges = status.stdout.trim().length > 0;
-
-    if (hasChanges) {
-      // Show what we're about to remove
-      const changes = status.stdout
-        .trim()
-        .split('\n')
-        .filter((line) => line.length > 0);
-      console.log(chalk.yellow(` 🔄 Rolling back workspace for ${reason}`));
-
-      await $`cd ${sourceDir} && git reset --hard HEAD`;
-      await $`cd ${sourceDir} && git clean -fd`;
-
-      console.log(
-        chalk.yellow(` ✅ Rollback completed - removed ${changes.length} contaminated changes:`)
-      );
-      changes.slice(0, 3).forEach((change) => console.log(chalk.gray(` ${change}`)));
-      if (changes.length > 3) {
-        console.log(chalk.gray(` ... and ${changes.length - 3} more files`));
-      }
-    } else {
-      console.log(chalk.blue(` ✅ Workspace already clean (no changes to remove)`));
-    }
-    return { success: true, hadChanges: hasChanges };
-  } catch (error) {
-    const errMsg = error instanceof Error ? error.message : String(error);
-    console.log(chalk.yellow(` ⚠️ Workspace cleanup failed: ${errMsg}`));
-    return { success: false, error: error instanceof Error ? error : new Error(errMsg) };
+  reason: string = 'retry preparation'
+): Promise<GitOperationResult> {
+  // Skip git operations if not a git repository
+  if (!(await isGitRepository(sourceDir))) {
+    console.log(chalk.gray(` ⏭️ Skipping git rollback (not a git repository)`));
+    return { success: true };
   }
-};

-export const createGitCheckpoint = async (
+  console.log(chalk.yellow(` 🔄 Rolling back workspace for ${reason}`));
+  try {
+    const changes = await getChangedFiles(sourceDir, 'status check for rollback');
+
+    await executeGitCommandWithRetry(
+      ['git', 'reset', '--hard', 'HEAD'],
+      sourceDir,
+      'hard reset for rollback'
+    );
+    await executeGitCommandWithRetry(
+      ['git', 'clean', '-fd'],
+      sourceDir,
+      'cleaning untracked files for rollback'
+    );
+
+    logChangeSummary(
+      changes,
+      ' ✅ Rollback completed - removed {count} contaminated changes:',
+      ' ✅ Rollback completed - no changes to remove',
+      chalk.yellow,
+      3
+    );
+    return { success: true };
+  } catch (error) {
+    const result = toErrorResult(error);
+    console.log(chalk.red(` ❌ Rollback failed after retries: ${result.error?.message}`));
+    return result;
+  }
+}
+
+// Creates checkpoint before each attempt. First attempt preserves workspace; retries clean it.
+export async function createGitCheckpoint(
   sourceDir: string,
   description: string,
   attempt: number
-): Promise<GitOperationResult> => {
+): Promise<GitOperationResult> {
+  // Skip git operations if not a git repository
+  if (!(await isGitRepository(sourceDir))) {
+    console.log(chalk.gray(` ⏭️ Skipping git checkpoint (not a git repository)`));
+    return { success: true };
+  }
+
   console.log(chalk.blue(` 📍 Creating checkpoint for ${description} (attempt ${attempt})`));
   try {
-    // Only clean workspace on retry attempts (attempt > 1), not on first attempts
-    // This preserves deliverables between agents while still cleaning on actual retries
+    // First attempt: preserve existing deliverables. Retries: clean workspace to prevent pollution
     if (attempt > 1) {
-      const cleanResult = await cleanWorkspace(sourceDir, `${description} (retry cleanup)`);
+      const cleanResult = await rollbackGitWorkspace(sourceDir, `${description} (retry cleanup)`);
       if (!cleanResult.success) {
-        const errMsg = cleanResult.error?.message || 'Unknown error';
         console.log(
-          chalk.yellow(` ⚠️ Workspace cleanup failed, continuing anyway: ${errMsg}`)
+          chalk.yellow(` ⚠️ Workspace cleanup failed, continuing anyway: ${cleanResult.error?.message}`)
         );
       }
     }

-    // Check for uncommitted changes with retry logic
-    const status = await executeGitCommandWithRetry(
-      ['git', 'status', '--porcelain'],
-      sourceDir,
-      'status check'
-    );
-    const hasChanges = status.stdout.trim().length > 0;
+    const changes = await getChangedFiles(sourceDir, 'status check');
+    const hasChanges = changes.length > 0;

-    // Stage changes with retry logic
     await executeGitCommandWithRetry(['git', 'add', '-A'], sourceDir, 'staging changes');

-    // Create commit with retry logic
     await executeGitCommandWithRetry(
       ['git', 'commit', '-m', `📍 Checkpoint: ${description} (attempt ${attempt})`, '--allow-empty'],
       sourceDir,
@@ -171,106 +236,64 @@ export const createGitCheckpoint = async (
     }
     return { success: true };
   } catch (error) {
-    const errMsg = error instanceof Error ? error.message : String(error);
-    console.log(chalk.yellow(` ⚠️ Checkpoint creation failed after retries: ${errMsg}`));
-    return { success: false, error: error instanceof Error ? error : new Error(errMsg) };
+    const result = toErrorResult(error);
+    console.log(chalk.yellow(` ⚠️ Checkpoint creation failed after retries: ${result.error?.message}`));
+    return result;
   }
-};
+}

-export const commitGitSuccess = async (
+export async function commitGitSuccess(
   sourceDir: string,
   description: string
-): Promise<GitOperationResult> => {
+): Promise<GitOperationResult> {
+  // Skip git operations if not a git repository
+  if (!(await isGitRepository(sourceDir))) {
+    console.log(chalk.gray(` ⏭️ Skipping git commit (not a git repository)`));
+    return { success: true };
+  }
+
   console.log(chalk.green(` 💾 Committing successful results for ${description}`));
   try {
-    // Check what we're about to commit with retry logic
-    const status = await executeGitCommandWithRetry(
-      ['git', 'status', '--porcelain'],
-      sourceDir,
-      'status check for success commit'
-    );
-    const changes = status.stdout
-      .trim()
-      .split('\n')
-      .filter((line) => line.length > 0);
+    const changes = await getChangedFiles(sourceDir, 'status check for success commit');

-    // Stage changes with retry logic
     await executeGitCommandWithRetry(
       ['git', 'add', '-A'],
       sourceDir,
       'staging changes for success commit'
     );

-    // Create success commit with retry logic
await executeGitCommandWithRetry(
|
await executeGitCommandWithRetry(
|
||||||
['git', 'commit', '-m', `✅ ${description}: completed successfully`, '--allow-empty'],
|
['git', 'commit', '-m', `✅ ${description}: completed successfully`, '--allow-empty'],
|
||||||
sourceDir,
|
sourceDir,
|
||||||
'creating success commit'
|
'creating success commit'
|
||||||
);
|
);
|
||||||
|
|
||||||
if (changes.length > 0) {
|
logChangeSummary(
|
||||||
console.log(chalk.green(` ✅ Success commit created with ${changes.length} file changes:`));
|
changes,
|
||||||
changes.slice(0, 5).forEach((change) => console.log(chalk.gray(` ${change}`)));
|
' ✅ Success commit created with {count} file changes:',
|
||||||
if (changes.length > 5) {
|
' ✅ Empty success commit created (agent made no file changes)',
|
||||||
console.log(chalk.gray(` ... and ${changes.length - 5} more files`));
|
chalk.green,
|
||||||
}
|
5
|
||||||
} else {
|
);
|
||||||
console.log(chalk.green(` ✅ Empty success commit created (agent made no file changes)`));
|
|
||||||
}
|
|
||||||
return { success: true };
|
return { success: true };
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
const errMsg = error instanceof Error ? error.message : String(error);
|
const result = toErrorResult(error);
|
||||||
console.log(chalk.yellow(` ⚠️ Success commit failed after retries: ${errMsg}`));
|
console.log(chalk.yellow(` ⚠️ Success commit failed after retries: ${result.error?.message}`));
|
||||||
return { success: false, error: error instanceof Error ? error : new Error(errMsg) };
|
return result;
|
||||||
}
|
}
|
||||||
};
|
}
|
||||||
|
|
||||||
export const rollbackGitWorkspace = async (
|
/**
|
||||||
sourceDir: string,
|
* Get current git commit hash.
|
||||||
reason: string = 'retry preparation'
|
* Returns null if not a git repository.
|
||||||
): Promise<GitOperationResult> => {
|
*/
|
||||||
console.log(chalk.yellow(` 🔄 Rolling back workspace for ${reason}`));
|
export async function getGitCommitHash(sourceDir: string): Promise<string | null> {
|
||||||
|
if (!(await isGitRepository(sourceDir))) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
try {
|
try {
|
||||||
// Show what we're about to remove with retry logic
|
const result = await $`cd ${sourceDir} && git rev-parse HEAD`;
|
||||||
const status = await executeGitCommandWithRetry(
|
return result.stdout.trim();
|
||||||
['git', 'status', '--porcelain'],
|
} catch {
|
||||||
sourceDir,
|
return null;
|
||||||
'status check for rollback'
|
|
||||||
);
|
|
||||||
const changes = status.stdout
|
|
||||||
.trim()
|
|
||||||
.split('\n')
|
|
||||||
.filter((line) => line.length > 0);
|
|
||||||
|
|
||||||
// Reset to HEAD with retry logic
|
|
||||||
await executeGitCommandWithRetry(
|
|
||||||
['git', 'reset', '--hard', 'HEAD'],
|
|
||||||
sourceDir,
|
|
||||||
'hard reset for rollback'
|
|
||||||
);
|
|
||||||
|
|
||||||
// Clean untracked files with retry logic
|
|
||||||
await executeGitCommandWithRetry(
|
|
||||||
['git', 'clean', '-fd'],
|
|
||||||
sourceDir,
|
|
||||||
'cleaning untracked files for rollback'
|
|
||||||
);
|
|
||||||
|
|
||||||
if (changes.length > 0) {
|
|
||||||
console.log(
|
|
||||||
chalk.yellow(` ✅ Rollback completed - removed ${changes.length} contaminated changes:`)
|
|
||||||
);
|
|
||||||
changes.slice(0, 3).forEach((change) => console.log(chalk.gray(` ${change}`)));
|
|
||||||
if (changes.length > 3) {
|
|
||||||
console.log(chalk.gray(` ... and ${changes.length - 3} more files`));
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
console.log(chalk.yellow(` ✅ Rollback completed - no changes to remove`));
|
|
||||||
}
|
|
||||||
return { success: true };
|
|
||||||
} catch (error) {
|
|
||||||
const errMsg = error instanceof Error ? error.message : String(error);
|
|
||||||
console.log(chalk.red(` ❌ Rollback failed after retries: ${errMsg}`));
|
|
||||||
return { success: false, error: error instanceof Error ? error : new Error(errMsg) };
|
|
||||||
}
|
}
|
||||||
};
|
}
|
||||||
|
|||||||
@@ -5,7 +5,7 @@
|
|||||||
// as published by the Free Software Foundation.
|
// as published by the Free Software Foundation.
|
||||||
|
|
||||||
import chalk from 'chalk';
|
import chalk from 'chalk';
|
||||||
import { formatDuration } from '../audit/utils.js';
|
import { formatDuration } from './formatting.js';
|
||||||
|
|
||||||
// Timing utilities
|
// Timing utilities
|
||||||
|
|
||||||
|
|||||||