feat: backport Opus 4.7 + adaptive thinking, remove scan tools, add --help to scripts

Backport upstream Shannon PRs #325, #327, #328:

- Update large model default to claude-opus-4-7, add adaptive thinking configuration (auto-enabled on Opus 4.6/4.7, opt-out via CLAUDE_ADAPTIVE_THINKING=false), filter thinking blocks from message content, bump claude-agent-sdk to ^0.2.114
- Remove unused scan tools (nmap, subfinder, whatweb, schemathesis) from Dockerfile, prompts, and docs; remove dead 'tool' error type from PentestErrorType; redact URLs in preflight info logs
- Add --help flag to save-deliverable and generate-totp CLI scripts

Co-Authored-By: Paperclip <noreply@paperclip.ing>
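
The adaptive thinking behaviour described in the message can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in the commit: `supportsAdaptiveThinking` and the block filter follow the diff, while `thinkingOption` and `visibleText` are hypothetical helper names introduced here, and the environment is passed in as a plain object instead of being read from `process.env`.

```typescript
// Sketch only. `thinkingOption` and `visibleText` are hypothetical names;
// the regex and the filter condition follow the diff in this commit.
interface ContentBlock {
  type?: string;
  text?: string;
}

/** Model IDs containing opus-4-6 or opus-4-7 are treated as adaptive-thinking capable. */
function supportsAdaptiveThinking(model: string): boolean {
  return /opus-4-[67]/.test(model);
}

/** Build the optional `thinking` option; only the literal string 'false' opts out. */
function thinkingOption(
  model: string,
  env: Record<string, string | undefined>,
): { thinking?: { type: 'adaptive' } } {
  const enabled = supportsAdaptiveThinking(model) && env.CLAUDE_ADAPTIVE_THINKING !== 'false';
  return enabled ? { thinking: { type: 'adaptive' } } : {};
}

/** Drop thinking blocks, then join the remaining blocks as text. */
function visibleText(blocks: ContentBlock[]): string {
  return blocks
    .filter((c) => c.type !== 'thinking' && c.type !== 'redacted_thinking')
    .map((c) => c.text || JSON.stringify(c))
    .join('\n');
}
```

Because the opt-out compares against the exact string `'false'`, an unset variable or any other value leaves adaptive thinking enabled on supported models.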
2026-05-20 00:26:25 +00:00
parent ccb3dc6f75
commit 085624b287
19 changed files with 218 additions and 275 deletions
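
The `--help` handling added to the two CLI scripts follows an early-return pattern: the flag is checked before any other argument parsing. A minimal sketch of that pattern is below; `wantsHelp` and `dispatch` are hypothetical names used only for illustration, and the real scripts call `printHelp()` and `return` inside `main()` instead.

```typescript
// Sketch only. `wantsHelp` and `dispatch` are hypothetical helpers.

/** True when the argument vector contains --help or -h anywhere. */
function wantsHelp(argv: string[]): boolean {
  return argv.includes('--help') || argv.includes('-h');
}

/** Decide what to do before touching any other flag. */
function dispatch(argv: string[]): 'help' | 'run' {
  if (wantsHelp(argv)) {
    return 'help'; // print usage and stop; required flags are never validated
  }
  return 'run';
}
```

Checking for help first means an invocation that combines `--help` with other flags still prints the usage text instead of running or failing validation.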
@@ -180,17 +180,16 @@ For each root vulnerability in your plan, you will follow this systematic, four-
 ## **Strategic Tool Usage**
 Use the right tool for the job to ensure thoroughness.
 - **Use `curl` (Manual Probing) for:** Initial confirmation, simple UNION/Error-based injections, and crafting specific WAF bypasses.
-- **Use `sqlmap` (Automation) for:** Time-consuming blind injections, automating enumeration **after** manual confirmation, and as a final step to try a wide range of payloads when manual techniques are failing.

 ## **Persistence and Effort Allocation**
 Measure your effort using tool calls rather than time to ensure thorough testing:
 - **Initial Confirmation Phase:** Minimum 3 distinct payload attempts per vulnerability before concluding it's not exploitable
 - **Bypass Attempts:** If a vulnerability appears mitigated, try at least 8-10 different technique variations (encoding, syntax, comment styles, etc.) before concluding it's properly defended  
-- **Escalation Trigger:** If manual testing exceeds 10-12 tool calls without progress on a single vulnerability, escalate to automated tools (`sqlmap`) or Task Agent scripting
+- **Escalation Trigger:** If manual testing exceeds 10-12 tool calls without progress on a single vulnerability, escalate to Task Agent scripting
 - **Termination Criteria:** After systematic attempts with multiple different techniques → classify as appropriate level
  
 ## **Using the Task Agent for Custom Scripting**
-You must delegate every injection automation task to the Task Agent. Use manual `curl` or `sqlmap` runs for spot checks, then escalate to scripted payload loops handled by the Task Agent.
+You must delegate every injection automation task to the Task Agent. Use manual `curl` runs for spot checks, then escalate to scripted payload loops handled by the Task Agent.

 **TEMPLATE FOR SCRIPTING TASKS (REQUIRED):**
 "
@@ -36,7 +36,7 @@ Read `.gitignore` and run `git ls-files --others --ignored --exclude-standard --

 **Phase Sequence:** **PRE-RECON (You)** → RECON → VULN ANALYSIS (5 agents) → EXPLOITATION (5 agents) → REPORTING

-**Your Input:** External scan results from pre-recon tools (nmap, subfinder, whatweb)
+**Your Input:** Target application source code in the working directory
 **Your Output:** `.shannon/deliverables/pre_recon_deliverable.md` (feeds all subsequent analysis phases)
 **Shared Intelligence:** You create the foundational intelligence baseline that all other agents depend on

@@ -71,7 +71,6 @@ You are the **Code Intelligence Gatherer** and **Architectural Foundation Builde

 <starting_context>
 - You are the **ENTRY POINT** of the comprehensive security assessment - no prior deliverables exist to read
-- External reconnaissance tools have completed and their results are available in the working environment
 - The target application source code has been cloned and is ready for analysis in the current directory
 - You must create the **foundational intelligence baseline** that all subsequent agents depend on
 - **CRITICAL:** This is the ONLY agent with full source code access - your completeness determines whether vulnerabilities are found
@@ -1,5 +1,5 @@
 <role>
-You are a world-class Reconnaissance Analyst and Attack Surface Mapper. Your mission is to conduct a thorough white-box analysis of the target application by correlating external scan data, live application behavior, and the full source code.
+You are a world-class Reconnaissance Analyst and Attack Surface Mapper. Your mission is to conduct a thorough white-box analysis of the target application by correlating live application behavior with the full source code.
 </role>

 <objective>
@@ -7,7 +7,7 @@ Your goal is to create a comprehensive, structured map of the application's atta
 </objective>

 <starting_context>
-Your analysis must begin by reading and fully comprehending the initial intelligence report located at `.shannon/deliverables/pre_recon_deliverable.md`. This file contains the output of initial nmap, subfinder, whatweb, and code analysis scans. This is your only starting information.
+Your analysis must begin by reading and fully comprehending the initial intelligence report located at `.shannon/deliverables/pre_recon_deliverable.md`. This file contains the initial source code analysis. This is your only starting information.
 </starting_context>

 <target>
@@ -80,13 +80,13 @@ Please use these tools for the following use cases:

 **Phase Sequence:** PRE-RECON (Complete) → **RECONNAISSANCE (You)** → VULN ANALYSIS (5 agents) → EXPLOITATION (5 agents) → FINAL REPORT (next phase)

-**Your Input:** `.shannon/deliverables/pre_recon_deliverable.md` (external scan data, initial code analysis)
+**Your Input:** `.shannon/deliverables/pre_recon_deliverable.md` (initial code analysis)
 **Your Output:** `.shannon/deliverables/recon_deliverable.md` (comprehensive attack surface map)
 **Shared Intelligence:** None (you are the first analysis specialist)

 **WHAT HAPPENED BEFORE YOU:**
-- Pre-reconnaissance agent performed external scans (nmap, subfinder, whatweb) and initial code analysis
-- All attack surfaces, technologies, and entry points were catalogued from external perspective
+- Pre-reconnaissance agent performed initial source code analysis
+- Attack surfaces, technologies, and entry points were catalogued from the codebase

 **WHAT HAPPENS AFTER YOU:**
 - Injection Analysis specialist will analyze SQL injection and command injection vulnerabilities using your attack surface map
@@ -112,7 +112,7 @@ You must follow this methodical four-step process:

 1.  **Synthesize Initial Data:**
    - Read the entire `.shannon/deliverables/pre_recon_deliverable.md`.
-    - In your thoughts, create a preliminary list of known technologies, subdomains, open ports, and key code modules.
+    - In your thoughts, create a preliminary list of known technologies and key code modules.

 2.  **Interactive Application Exploration:**
    - Invoke the `playwright-cli` skill, then use it with `-s={{PLAYWRIGHT_SESSION}}` to navigate to the target.
@@ -166,8 +166,6 @@ A brief overview of the application's purpose, core technology stack (e.g., Next
 - **Frontend:** [Framework, key libraries, authentication libraries]
 - **Backend:** [Language, framework, key dependencies]
 - **Infrastructure:** [Hosting provider, CDN, database type]
-- **Identified Subdomains:** [List from subfinder and any others discovered]
-- **Open Ports & Services:** [List from nmap and their purpose]

 ## 3. Authentication & Session Management Flow
 - **Entry Points:** [e.g., /login, /register, /auth/sso]
@@ -12,7 +12,7 @@ Technical leadership (CTOs, CISOs, Engineering VPs) who need both technical accu
 The orchestrator has already concatenated all specialist reports into `comprehensive_security_assessment_report.md`. 
 Your task is to:
 1. Read this existing concatenated report
-2. Add Executive Summary (vulnerability overview) and Network Reconnaissance (security-relevant scan findings) sections at the top
+2. Add an Executive Summary (vulnerability overview) at the top
 3. Clean up ALL exploitation evidence sections by removing hallucinated content
 4. Save the modified version back to the same file

@@ -72,19 +72,10 @@ MODIFY the existing file `.shannon/deliverables/comprehensive_security_assessmen
 **Server-Side Request Forgery (SSRF) Vulnerabilities:**
 {Check for "SSRF Exploitation Evidence" section. Include actually exploited vulnerabilities and those blocked by security controls. Exclude theoretical vulnerabilities requiring internal network access. If vulnerabilities exist, summarize their impact and severity. If section is missing or empty, state: "No SSRF vulnerabilities were found."}

-## Network Reconnaissance
-{Extract security-relevant findings from automated scanning tools:
-- Open ports and exposed services from nmap
-- Subdomain discoveries from subfinder that could expand attack surface
-- Security headers or misconfigurations detected by whatweb
-- Any other security-relevant findings from the automated tools
-SKIP stack details - technical leaders know their infrastructure}
-
 2. KEEPING the existing exploitation evidence sections but CLEANING them according to the rules below

 3. The final structure should be:
   - Executive Summary (new)
-   - Network Reconnaissance (new)
   - All existing exploitation evidence sections (cleaned)

 IMPORTANT: Do NOT reorder the existing exploitation evidence sections. Maintain the exact order they appear in the concatenated report. Only remove sections that do not match the defined criteria above.
@@ -93,15 +84,12 @@ IMPORTANT: Do NOT reorder the existing exploitation evidence sections. Maintain

 <instructions>
 1. Read the pre_recon and recon deliverable files to gather security-relevant information:
-   - Focus on findings from automated tools (nmap, subfinder, whatweb) that indicate security risks
-   - Note exposed services, open ports, subdomains, security misconfigurations
   - Skip basic information such as technology stack information (the team knows their own stack)
   - Use technical leadership tone - precise but concise
   - Use the current date for the assessment date

-2. Create the Executive Summary and Network Reconnaissance content:
+2. Create the Executive Summary content:
   - Executive Summary: Technical overview with actionable findings for engineering leaders
-   - Network Reconnaissance: Focus on security-relevant discoveries from automated scans

 3. Clean the exploitation evidence sections from `.shannon/deliverables/comprehensive_security_assessment_report.md` by applying these rules:
   - KEEP these specific section headings:
@@ -18,7 +18,7 @@ import { formatTimestamp } from '../utils/formatting.js';
 import { Timer } from '../utils/metrics.js';
 import { createAuditLogger } from './audit-logger.js';
 import { dispatchMessage } from './message-handlers.js';
-import { type ModelTier, resolveModel } from './models.js';
+import { type ModelTier, resolveModel, supportsAdaptiveThinking } from './models.js';
 import { detectExecutionContext, formatCompletionMessage, formatErrorOutput } from './output-formatters.js';
 import { createProgressManager } from './progress-manager.js';

@@ -218,6 +218,7 @@ export async function runClaudePrompt(
  // 4. Configure SDK options
  // Model override from providerConfig takes precedence over env-based resolveModel
  const model = providerConfig?.modelOverrides?.[modelTier] ?? resolveModel(modelTier);
+  const adaptiveThinking = supportsAdaptiveThinking(model) && process.env.CLAUDE_ADAPTIVE_THINKING !== 'false';
  const options = {
    model,
    maxTurns: 10_000,
@@ -226,6 +227,7 @@ export async function runClaudePrompt(
    allowDangerouslySkipPermissions: true,
    settingSources: ['user'] as ('user' | 'project' | 'local')[],
    env: sdkEnv,
+    ...(adaptiveThinking && { thinking: { type: 'adaptive' as const } }),
    ...(outputFormat && { outputFormat }),
  };

@@ -39,7 +39,10 @@ function extractMessageContent(message: AssistantMessage): string {
  const messageContent = message.message;

  if (Array.isArray(messageContent.content)) {
-    return messageContent.content.map((c: ContentBlock) => c.text || JSON.stringify(c)).join('\n');
+    return messageContent.content
+      .filter((c: ContentBlock) => c.type !== 'thinking' && c.type !== 'redacted_thinking')
+      .map((c: ContentBlock) => c.text || JSON.stringify(c))
+      .join('\n');
  }

  return String(messageContent.content);
@@ -21,7 +21,7 @@ export type ModelTier = 'small' | 'medium' | 'large';
 const DEFAULT_MODELS: Readonly<Record<ModelTier, string>> = {
  small: 'claude-haiku-4-5-20251001',
  medium: 'claude-sonnet-4-6',
-  large: 'claude-opus-4-6',
+  large: 'claude-opus-4-7',
 };

 /** Resolve a model tier to a concrete model ID. */
@@ -35,3 +35,8 @@ export function resolveModel(tier: ModelTier = 'medium'): string {
      return process.env.ANTHROPIC_MEDIUM_MODEL || DEFAULT_MODELS.medium;
  }
 }
+
+/** Whether a model supports adaptive thinking. Opus 4.6 and 4.7 only. */
+export function supportsAdaptiveThinking(model: string): boolean {
+  return /opus-4-[67]/.test(model);
+}
@@ -52,6 +52,8 @@ export interface ToolResultData {
 export interface ContentBlock {
  type?: string;
  text?: string;
+  thinking?: string;
+  data?: string;
 }

 export interface AssistantMessage {
@@ -82,6 +82,26 @@ function generateTOTP(secret: string, timeStep: number = 30, digits: number = 6)
  return generateHOTP(secret, counter, digits);
 }

+// === Help ===
+
+function printHelp(): void {
+  console.log(
+    `generate-totp - emit a current 6-digit TOTP code for a base32-encoded secret.
+
+Usage:
+  generate-totp --secret <BASE32>
+  generate-totp --help
+
+Options:
+  --secret      Base32-encoded TOTP shared secret (characters A-Z, 2-7).
+  -h, --help    Show this help and exit.
+
+Output:
+  JSON to stdout. On success: {"status":"success","totpCode":"123456","expiresIn":<sec>}.
+  On error:   {"status":"error","message":"...","retryable":false} (exit 1).`,
+  );
+}
+
 // === Argument Parsing ===

 function parseSecret(argv: string[]): string {
@@ -97,6 +117,11 @@ function parseSecret(argv: string[]): string {
 // === Main ===

 function main(): void {
+  if (process.argv.includes('--help') || process.argv.includes('-h')) {
+    printHelp();
+    return;
+  }
+
  const secret = parseSecret(process.argv);

  if (!secret) {
@@ -19,6 +19,31 @@ import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
 import { join, resolve } from 'node:path';
 import { DELIVERABLE_FILENAMES, type DeliverableType } from '../types/deliverables.js';

+// === Help ===
+
+function printHelp(): void {
+  const types = Object.keys(DELIVERABLE_FILENAMES).join(', ');
+  console.log(
+    `save-deliverable - save a Shannon pentest deliverable under its canonical filename.
+
+Usage:
+  save-deliverable --type <TYPE> --file-path <path>
+  save-deliverable --type <TYPE> --content '<text>'
+  save-deliverable --help
+
+Options:
+  --type        Deliverable type (required). One of:
+                  ${types}
+  --file-path   Path of a file whose contents to save (preferred for large content).
+  --content     Inline content string to save.
+  -h, --help    Show this help and exit.
+
+Output:
+  JSON to stdout. On success: {"status":"success","filepath":"..."}.
+  On error:   {"status":"error","message":"...","retryable":true|false} (exit 1).`,
+  );
+}
+
 // === Argument Parsing ===

 interface ParsedArgs {
@@ -69,6 +94,11 @@ function saveDeliverableFile(targetDir: string, filename: string, content: strin
 // === Main ===

 function main(): void {
+  if (process.argv.includes('--help') || process.argv.includes('-h')) {
+    printHelp();
+    return;
+  }
+
  const args = parseArgs(process.argv);

  // 1. Validate --type
@@ -210,7 +210,7 @@ async function validateCredentials(
  // 1. Custom base URL — validate endpoint is reachable via SDK query
  if (process.env.ANTHROPIC_BASE_URL && process.env.ANTHROPIC_AUTH_TOKEN) {
    const baseUrl = process.env.ANTHROPIC_BASE_URL;
-    logger.info(`Validating custom base URL: ${baseUrl}`);
+    logger.info('Validating custom base URL');

    try {
      for await (const message of query({ prompt: 'hi', options: { model: resolveModel('small'), maxTurns: 1 } })) {
@@ -394,7 +394,7 @@ function httpHead(url: string, timeoutMs: number): Promise<number> {

 /** Check that the target URL is reachable from inside the container. */
 async function validateTargetUrl(targetUrl: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
-  logger.info('Checking target URL reachability...', { targetUrl });
+  logger.info('Checking target URL reachability...');

  // 1. Parse URL
  let parsed: URL;
@@ -11,7 +11,7 @@
 /**
 * Specific error codes for reliable classification.
 *
- * ErrorCode provides precision within the coarse 8-category PentestErrorType.
+ * ErrorCode provides precision within the coarse 7-category PentestErrorType.
 * Used by classifyErrorForTemporal for code-based classification (preferred)
 * with string matching as fallback for external errors.
 */
@@ -47,15 +47,7 @@ export enum ErrorCode {
  BILLING_ERROR = 'BILLING_ERROR',
 }

-export type PentestErrorType =
-  | 'config'
-  | 'network'
-  | 'tool'
-  | 'prompt'
-  | 'filesystem'
-  | 'validation'
-  | 'billing'
-  | 'unknown';
+export type PentestErrorType = 'config' | 'network' | 'prompt' | 'filesystem' | 'validation' | 'billing' | 'unknown';

 export interface PentestErrorContext {
  [key: string]: unknown;