Agent System
Ghost’s AI agent is like a senior engineer you can pair with — it doesn’t just answer questions, it makes plans, executes them step by step, reflects on what it found, and decides when it’s done. You give it a goal (“find security vulnerabilities in this API” or “generate test cases for the checkout flow”), and it autonomously plans its approach, uses the right tools in the right order, adapts when it discovers something unexpected, and produces a structured report.
Under the hood, the agent runs a plan-execute-reflect-terminate loop — up to 25 iterations where it plans what to do, executes tools, reflects on results, and checks whether it should stop. It has access to ~60 tools (depending on mode and what’s connected), but each LLM call only sees 15-25 relevant tools — a dynamic router filters tools based on what the agent is currently doing.
Architecture
What this diagram shows — how the agent’s components work together:
The Agent Core is the brain — it runs a loop up to 25 times, checking termination conditions each iteration, reading any steering messages the user sent mid-run, and pausing for interactive choices when the agent needs user input via present_options. Each iteration, it calls the LLM Provider (Claude, GPT-4o, or Ollama) via streaming, which returns text and/or tool calls. The Tool Router filters the full registry (~60 tools) down to 15-25 relevant tools based on what the agent is currently doing — so the LLM isn’t overwhelmed with irrelevant options. Tool calls are executed by the Concurrent Executor, which runs read-only tools in parallel (up to 4 at once) and mutating tools one at a time. After execution, the Context Manager estimates token usage and prunes old messages if the conversation is approaching the LLM’s context window limit — preserving the system prompt, the user’s original question, and recent exchanges.
Plan-Execute-Reflect Loop
Each iteration (maximum 25) follows 8 steps:
Step 1: Check Termination
Six signals are evaluated in priority order (see Termination Signals below). If any signal fires, the agent either stops immediately or enters a “report then stop” mode where it gets 1-2 more iterations to write a summary.
Step 2: Check Steering
The agent reads from a buffered channel (capacity 5) without blocking. If the user sent a steering message via POST /api/v1/agent/steer, it’s injected as a [USER STEERING] block into the conversation — appearing before the agent’s next LLM call. This lets you redirect the agent mid-run: “focus on the authentication endpoints” or “skip performance testing and go to reporting.”
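The non-blocking read described here is a standard Go pattern — a `select` with a `default` branch. A minimal sketch (function and channel names are illustrative, not Ghost’s actual code):

```go
package main

import "fmt"

// drainSteering performs a non-blocking read of pending steering
// messages from a buffered channel (capacity 5, as described above).
// It returns immediately when the channel is empty, so the agent
// loop is never blocked waiting for user input.
func drainSteering(ch <-chan string) []string {
	var msgs []string
	for {
		select {
		case m := <-ch:
			// Wrap the message the way the agent injects it into the
			// conversation before the next LLM call.
			msgs = append(msgs, "[USER STEERING] "+m)
		default:
			return msgs // channel empty: return without blocking
		}
	}
}

func main() {
	ch := make(chan string, 5)
	ch <- "focus on the authentication endpoints"
	fmt.Println(drainSteering(ch)[0]) // [USER STEERING] focus on the authentication endpoints
}
```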
Step 3: Build Messages
The full message history is assembled: system prompt + conversation history + engagement state (an XML block showing the current plan progress, discovered endpoints, findings so far, and tool call counts). This engagement state acts as “Layer 2 memory” — it survives message pruning and gives the LLM awareness of overall progress even when older messages have been removed.
Step 4: Inject Reflections
Two types of reflection prompts can be injected:
- Step reflection — after a `complete_step` tool call, an XML block asks the agent to assess: What did we learn? Was the evidence sufficient? Any unexpected discoveries? What should the next step focus on?
- Final reflection — when all plan steps are done, a longer prompt asks: Were all goals addressed? Quality of findings? What areas were missed? Confidence level? Recommendations for follow-up?
These reflections force the agent to pause and think critically rather than blindly executing the next step.
Step 5: Get Tool Definitions
The tool router filters the full registry down to the tools relevant for the current plan step’s category. This is crucial for LLM performance — giving the LLM 60 tools at once leads to confusion and poor tool selection. Giving it 15-25 relevant tools produces much better results.
Step 6: Call LLM (Stream)
The provider’s StreamChat method is called with the message history, filtered tools, and MaxTokens: 8192. The response streams back as events — text chunks and tool call definitions arrive incrementally. SSE events are emitted to the frontend in real time so the user sees the agent “thinking.”
Step 7: Handle Response
Two paths depending on whether the LLM made tool calls:
Text-only response (the LLM just wrote text without calling any tools):
- If the plan is completed → stop (the agent is done)
- If `reportThenStop` was set by a termination signal → stop
- If no plan exists yet → inject a “planning nudge” asking the agent to create a plan (maximum 2 nudges before giving up)
- Otherwise → inject a “continuation nudge” asking the agent to proceed, or stop if it shouldn’t continue
Tool call response (the LLM wants to use tools):
- Execute the tools (parallel or sequential — see Concurrent Tool Execution)
- Compress tool output to save context tokens
- Update engagement state (endpoints discovered, findings, tool call counts, phase transitions)
- Reset the consecutive text-only counter
Step 8: Manage Context
After each iteration, the context manager estimates the total token count and prunes old messages if approaching the provider’s context window limit. This prevents the conversation from exceeding the LLM’s maximum input size.
Tool Router
The router solves a critical problem: the agent has ~60 tools available, but sending all of them to the LLM every call wastes tokens and confuses the model. Instead, the router uses a 4-layer filtering strategy to select 15-25 tools per call.
What this diagram shows — how tool filtering works:
The full registry of ~60 tools is never sent to the LLM all at once (except during initial planning). Instead, four layers contribute tools that are merged, deduplicated, and sorted alphabetically. Layer 1 provides 16 base tools that every task needs (plan management including think and present_options, traffic search, filesystem, journey export). Layer 2 adds tools specific to the current phase — if the agent is doing reconnaissance, it gets session/journey listing tools; if it’s doing active testing, it gets fuzzing and scanning tools. Layer 3 adds any tools the plan step explicitly requested. Layer 4 adds conditional tools that depend on what’s connected — browser tools only appear when the browser extension is active, inspector tools when a mobile device is connected, Frida tools when Frida is available. The alphabetical sort is deliberate — it produces deterministic tool ordering that maximizes Anthropic’s prompt cache hits.
Layer 1 — Base Tools (16, always available)
These tools are included in every LLM call because they’re fundamental to how the agent operates:
| Category | Tools | Purpose |
|---|---|---|
| Plan management | create_plan, revise_plan, complete_step, think, present_options | Creating and progressing through the execution plan, private reasoning, and presenting interactive choices to the user |
| Traffic analysis | search_traffic, get_flow, get_flow_body, find_endpoints, get_traffic_stats | Searching and reading captured HTTP traffic |
| Flow annotation | tag_flows, annotate_flow | Marking flows with tags and notes for organization |
| Journey | record_journey, journey_export | Start/stop journey recording and export journeys as deterministic test code (always available, not phase-dependent) |
| Filesystem | fs_read, fs_write, fs_list | Reading and writing files (test output, reports, configs) |
Layer 2 — Phase-Specific Tools
Different plan step categories unlock different tool sets. The category is set when the agent creates its plan — each step has a Category field that maps to one of these groups:
| Step Category | Tools Added | Count |
|---|---|---|
| recon | list_sessions, list_journeys, get_journey_steps, list_ws_messages, list_console_errors, list_navigations, list_storage_changes, replay_request, get_page_resources | 9 |
| analysis | detect_anomalies, detect_schema_drift, detect_regression, suggest_edge_cases, analyze_api_coverage, analyze_sequences, list_findings, map_page_apis, analyze_auth_pattern, analyze_jwt, find_sensitive_data, detect_idor, security_headers_audit | 13 |
| active_test | send_http_request, replay_request, fuzz_endpoint, compare_environments, test_form, request_approval, replay_journey, attack_request, list_wordlists, + 10 external scanner tools (run_nuclei, run_dalfox, run_ffuf, run_sqlmap, run_katana, run_trufflehog, run_semgrep, run_nmap, run_ssl_scan, run_hydra) | 19 (incl. 10 scanners) |
| exploit | send_http_request, replay_request, request_approval, attack_request, list_wordlists, run_sqlmap | 6 |
| report | generate_test, generate_bug_report, export_as, generate_api_docs, generate_mock_server, generate_session_report, visual_regression, list_findings | 8 |
Layer 3 — Step-Explicit Tools
When the agent creates a plan, each step can include a Tools field listing specific tools. These are always included for that step, even if they don’t match the step’s category. This lets the agent plan ahead: “In step 3, I’ll need the fuzz_endpoint tool even though this is an analysis step.”
Layer 4 — Conditional Tools
These tools only appear when their corresponding capability is available. The router checks the tool registry — if the tools are registered, they’re included:
| Capability | Condition | Tools Added |
|---|---|---|
| Browser extension | Extension WebSocket connected, browser tools registered | Up to 17 browser tools defined (6 actually registered: browser_read_page, browser_query_all, browser_click, browser_fill, browser_screenshot, browser_inject) |
| Mobile inspector | Device connected, inspector tools registered | 7 tools: get_device_screen, get_element_tree, find_elements, get_element_selectors, correlate_element_traffic, tap_device, type_device |
| Frida | Frida available, Frida tools registered | 6 tools: frida_attach, frida_detach, frida_trace, frida_inject, frida_list_methods, frida_check_bypass |
No Plan? Full Registry
When no plan exists yet (the agent hasn’t created one), the router returns the full registry — all ~60 tools. This gives the agent complete awareness of its capabilities when planning. Once a plan is created, subsequent calls get the filtered 15-25 tool set.
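The four-layer merge described above can be sketched as a simple merge/dedupe/sort over tool names (illustrative only — Ghost’s real router works over full tool definitions, not strings):

```go
package main

import (
	"fmt"
	"sort"
)

// selectTools merges the routing layers (base, phase, step-explicit,
// conditional), deduplicates, and sorts alphabetically. The sort is
// deliberate: deterministic ordering maximizes prompt-cache hits.
func selectTools(layers ...[]string) []string {
	seen := map[string]bool{}
	var out []string
	for _, layer := range layers {
		for _, t := range layer {
			if !seen[t] {
				seen[t] = true
				out = append(out, t)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	base := []string{"think", "create_plan"}
	phase := []string{"fuzz_endpoint", "send_http_request"}
	step := []string{"fuzz_endpoint"} // duplicate from Layer 3 is dropped
	fmt.Println(selectTools(base, phase, step))
	// [create_plan fuzz_endpoint send_http_request think]
}
```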
Tool Registries
Ghost maintains two separate tool registries — one for QA mode and one for Security mode. Each registers a different combination of tools:
QA Registry (~59 tools)
Includes: core tools (13), plan tools (5), QA generation tools (7: generate_test, generate_bug_report, detect_regression, export_as, generate_api_docs, generate_mock_server, generate_test_scenarios), QA advanced analysis tools (11: fuzz_endpoint, compare_environments, test_form, detect_anomalies, detect_schema_drift, suggest_edge_cases, analyze_api_coverage, analyze_sequences, map_page_apis, visual_regression, generate_session_report), QA external tools (2: run_k6, run_hey), inspector tools (7), proxy_inject_script (1), TestRail tools (4: testrail_list_projects, testrail_get_cases, testrail_push_results, testrail_suggest_cases), and browser tools (6, conditional).
Security Registry (~57 tools)
Includes: core tools (13), plan tools (5), security tools (6: list_findings, send_http_request, get_page_resources, attack_request, list_wordlists, request_approval), Frida tools (6, conditional), external scanner tools (10: run_nuclei, run_dalfox, run_ffuf, run_sqlmap, run_katana, run_trufflehog, run_semgrep, run_nmap, run_ssl_scan, run_hydra), inspector tools (7), proxy_inject_script (1), and browser tools (6, conditional).
Tool Timeouts
Different tools have different timeout durations based on their expected execution time:
| Tool Pattern | Timeout | Why |
|---|---|---|
| Default | 30 seconds | Most tools (search, get, list) complete quickly |
| browser_* | 60 seconds | Browser automation involves page loads and network waits |
| frida_* | 60 seconds | Frida operations involve device communication |
| frida_trace, frida_inject | 3 minutes | Tracing and injection may involve long-running scripts |
| attack_request | 5 minutes | Attacker engine sends many requests sequentially |
| run_* (external scanners) | 5 min + 10s | External tools like Nuclei or SQLMap can run for minutes |
Engagement Phases
The agent doesn’t just randomly use tools — it follows a structured methodology appropriate to its mode. Phases represent stages of a security assessment or QA test cycle.
Security Mode (PTES Methodology)
PTES (Penetration Testing Execution Standard) is a widely-used framework for security assessments. Ghost’s security agent follows these phases:
What this diagram shows — the security assessment progression:
The agent starts by analyzing existing traffic to understand the application’s API surface. It then moves to passive detection — looking for vulnerabilities in the captured traffic without sending any new requests (missing headers, insecure cookies, information leakage). If the scan mode allows it, the agent progresses to active scanning — actually sending crafted requests to test for SQL injection, XSS, and other vulnerabilities. Confirmed vulnerabilities may lead to exploitation — proving the vulnerability is real with a proof-of-concept. Finally, the agent generates a report summarizing all findings with severity ratings and remediation advice.
| Phase | Name | Tools Available | Auto-Advance After |
|---|---|---|---|
| 1 | Traffic Analysis | Recon tools | 3 tool calls |
| 2 | Passive Detection | Analysis tools | 4 tool calls |
| 3 | Active Scanning | Active test tools + scanners | 3 tool calls |
| 4 | Exploitation | Exploit tools | 4 tool calls |
| 5 | Reporting | Report tools | — (terminal) |
QA Mode
What this diagram shows — the QA testing progression:
The agent starts with reconnaissance — understanding the application by examining traffic patterns, sessions, and page resources. It then performs functional testing — verifying expected behavior of API endpoints and user flows. Edge case testing explores boundary conditions and unusual inputs. Error handling testing checks how the application responds to invalid requests and error conditions. Performance testing uses tools like k6 and hey to measure response times under load. Finally, QA reporting generates test cases, bug reports, and coverage summaries.
| Phase | Name | Auto-Advance After |
|---|---|---|
| 1 | QA Recon | 3 tool calls |
| 2 | Functional Testing | 3 tool calls |
| 3 | Edge Case Testing | 3 tool calls |
| 4 | Error Handling | 3 tool calls |
| 5 | Performance Testing | 2 tool calls |
| 6 | QA Reporting | — (terminal) |
Auto-Advance
Phase transitions happen automatically based on PhaseToolCalls — the number of tool calls in the current phase. When the threshold is reached, the agent advances to the next phase. The PhaseToolCalls counter resets to 0 on each transition.
This prevents the agent from spending too long in any single phase. If the agent calls 3 tools during traffic analysis, it automatically advances to passive detection — even if it hasn’t finished analyzing everything. The agent can always use tools from previous phases (they’re still available), but the phase change influences which tools appear first in the filtered set.
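A minimal sketch of the auto-advance rule for security mode, using the thresholds from the table above (field and phase names here are assumptions, not Ghost’s exact identifiers):

```go
package main

import "fmt"

// Thresholds from the security-mode phase table above.
var advanceAfter = map[string]int{
	"traffic_analysis":  3,
	"passive_detection": 4,
	"active_scanning":   3,
	"exploitation":      4,
	// "reporting" is terminal: no entry, so it never auto-advances.
}

var nextPhase = map[string]string{
	"traffic_analysis":  "passive_detection",
	"passive_detection": "active_scanning",
	"active_scanning":   "exploitation",
	"exploitation":      "reporting",
}

// recordToolCall bumps the per-phase counter; when the threshold is
// reached, it transitions to the next phase and resets the counter.
func recordToolCall(phase string, phaseToolCalls int) (string, int) {
	phaseToolCalls++
	if limit, ok := advanceAfter[phase]; ok && phaseToolCalls >= limit {
		return nextPhase[phase], 0 // transition resets PhaseToolCalls
	}
	return phase, phaseToolCalls
}

func main() {
	p, n := "traffic_analysis", 0
	for i := 0; i < 3; i++ {
		p, n = recordToolCall(p, n)
	}
	fmt.Println(p, n) // after 3 tool calls: passive_detection 0
}
```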
Engagement State
Section titled “Engagement State”The agent maintains a state object that tracks everything discovered during the run:
| Field | Type | Description |
|---|---|---|
| Mode | string | "qa" or "security" — determines which phases and prompt to use |
| Phase | string | Current engagement phase (e.g., "passive_detection", "functional_testing") |
| Endpoints | list | Discovered API endpoints (cap: 200). Deduplicated by a map[string]bool. |
| Findings | list | Vulnerabilities or bugs found (cap: 100). Each has an ID (VULN-001 for security, BUG-001 for QA), type, severity, and summary. |
| ToolsCalled | int | Total number of tool invocations across the entire run |
| PhaseToolCalls | int | Tool calls in the current phase — resets on phase transition, drives auto-advance |
| Iteration | int | Current loop iteration (0-24) |
| TargetHosts | list | Hosts the agent is testing |
| ActiveInjections | list | Proxy injection rules the agent has created (cap: 50) |
| Plan | TaskPlan | The structured execution plan (see below) |
| ReflectionCount | int | How many reflection prompts have been injected |
| FinalReflectionDone | bool | Whether the final reflection has been completed |
| QualityAssessment | string | The agent’s self-assessment of its work quality |
| MissedAreas | list | Areas the agent identified as not yet covered |
| StopAfterCurrentStep | bool | Set to true when the user requests a stop via the API |
This state is serialized to XML and injected into every LLM call, giving the agent persistent awareness of its progress even when older messages are pruned.
ShouldContinue Logic
After a text-only response, the agent checks whether it should continue looping or stop:
Returns false (stop) when:
- Phase is `"done"` — the agent has completed all phases
- Last tool was `proxy_inject_script` with action `"add"` — the agent just injected a script and should wait for results
- Phase is a reporting phase (`"reporting"` or `"qa_reporting"`) — the agent is writing its final report
- Iteration >= 23 (`maxIterations - 2`) — approaching the hard cap, time to wrap up
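Taken together, these conditions amount to a small predicate, sketched here in Go (the signature and names are illustrative, not Ghost’s actual function):

```go
package main

import "fmt"

const maxIterations = 25

// shouldContinue evaluates the stop conditions listed above after a
// text-only LLM response (no tool calls in the last turn).
func shouldContinue(phase, lastTool, lastAction string, iteration int) bool {
	switch {
	case phase == "done":
		return false // all phases complete
	case lastTool == "proxy_inject_script" && lastAction == "add":
		return false // wait for results from the injected script
	case phase == "reporting" || phase == "qa_reporting":
		return false // already writing the final report
	case iteration >= maxIterations-2:
		return false // iteration >= 23: approaching the hard cap
	default:
		return true
	}
}

func main() {
	fmt.Println(shouldContinue("active_scanning", "search_traffic", "", 10)) // true
	fmt.Println(shouldContinue("reporting", "", "", 10))                     // false
}
```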
Termination Signals
Six signals are evaluated in priority order at the start of each iteration. Higher-priority signals override lower ones:
| Priority | Signal | Condition | Action | Description |
|---|---|---|---|---|
| 1 | Plan complete + final reflection | Plan status is "completed" AND FinalReflectionDone is true | stop | The agent finished its plan and reflected on the results — a clean exit. |
| 2 | Plan complete, no tools | Plan status is "completed" AND the last iteration had zero tool calls | stop | The agent completed its plan and the LLM responded with only text (no more tools to call) — done. |
| 3 | Loop detection | Same tool name + same input hash appears 3+ times in the last 10 tool call records | report_then_stop | The agent is stuck in a loop — calling the same tool with the same arguments repeatedly. Input is hashed with FNV-1a (32-bit) for efficient comparison. |
| 4 | Diminishing returns | Iteration >= 8 AND the last 6 tool calls are all the same tool AND the current step is "in_progress" | report_then_stop | The agent isn’t making progress — it keeps using the same tool without advancing the plan. The iteration >= 8 guard prevents false positives during early exploration. |
| 5 | Budget reservation | Iteration >= 22 (maxIterations - 3) | report_then_stop | Running out of iterations — reserve the last 3 for the agent to write a summary and complete its current step. |
| 6 | User stop | StopAfterCurrentStep is true (set via POST /api/v1/agent/stop) | report_then_stop | The user explicitly asked the agent to stop. |
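The loop-detection signal (priority 3) can be sketched with Go’s standard `hash/fnv` package. Helper names are illustrative; only the FNV-1a (32-bit) hashing and the 3-in-last-10 rule come from the source:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// toolCallKey hashes a tool name plus its serialized input with
// 32-bit FNV-1a, so identical repeated calls compare cheaply.
func toolCallKey(name, inputJSON string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(name))
	h.Write([]byte(inputJSON))
	return h.Sum32()
}

// isLooping reports whether any key occurs 3+ times in the
// last 10 tool call records.
func isLooping(recent []uint32) bool {
	if len(recent) > 10 {
		recent = recent[len(recent)-10:] // only the last 10 matter
	}
	counts := map[uint32]int{}
	for _, k := range recent {
		counts[k]++
		if counts[k] >= 3 {
			return true
		}
	}
	return false
}

func main() {
	k := toolCallKey("search_traffic", `{"query":"login"}`)
	fmt.Println(isLooping([]uint32{k, k, k})) // same call three times: true
}
```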
report_then_stop Behavior
When the action is report_then_stop (signals 3-6), the agent doesn’t stop immediately. Instead:
- A termination prompt is injected as an XML block: `<termination_notice reason="...">Complete your current step with complete_step and write a final summary.</termination_notice>`
- The `reportThenStop` flag is set
- The agent gets 1-2 more iterations to finish its current step and write a summary
- On the next iteration, if the LLM responds with text only (no tool calls), the agent stops
This ensures the agent always produces useful output — even when interrupted, it summarizes what it found.
Interactive Choices (present_options)
The agent can present structured choices to the user mid-run using the present_options tool. This creates an interactive UI element — a card with labeled option buttons and a free-text input — instead of requiring the user to type a response.
How it works:
- The LLM calls `present_options` with a `question` string and an array of `options` (each with `label`, `description`, and `value`)
- The agent emits a `StreamEventOptions` event containing the question and options
- The agent sets `waitingForUserChoice = true` and exits the loop cleanly — emitting metrics with termination reason `"waiting for user choice"`
- The frontend renders an `OptionsPanel` component with a 2-column card grid of options plus a free-text input field
- When the user selects an option (or types a custom response), the frontend sends a new chat message containing the selected value
- A new agent run starts with the user’s choice in the conversation history, and the agent continues from where it left off
The Run Summary is suppressed when the termination reason is "waiting for user choice" — the agent isn’t actually done, it’s just pausing for input. The options panel shows the question with interactive buttons, and after the user responds, the answered panel fades to 50% opacity with a check icon on the selected option.
Tool Definition
Section titled “Tool Definition”| Field | Type | Description |
|---|---|---|
| question | string | The question to present to the user (required) |
| options | array | 2-6 options, each with label (short display text), description (optional context), and value (returned when selected) |
Think Tool
Section titled “Think Tool”The think tool gives the agent a private scratchpad for reasoning. When the LLM calls think with a thought string, the tool simply returns "ok" — the thought is visible to the LLM in the conversation history but is not displayed as regular text to the user.
In the frontend, the think tool renders as a ThinkIndicator — a purple-bordered card with a brain icon and bouncing dots animation while pending, then “Reasoned about the approach” when complete. It does not use the standard accordion card UI. The think tool is always available (registered as a plan tool).
TaskPlan Model
Section titled “TaskPlan Model”The agent’s execution plan is a structured object, not free-form text. This gives the system precise control over progress tracking:
| Field | Type | Constraints | Description |
|---|---|---|---|
| Goal | string | — | High-level objective (e.g., “Find authentication vulnerabilities in the checkout API”) |
| Scope | string | — | Boundaries of the assessment (e.g., “api.example.com, POST endpoints only”) |
| Steps | list | 1-15 steps | Ordered execution steps. Minimum 1 allows focused single-step tasks (like “export journey as Cypress UI”), maximum 15 prevents over-planning. |
| CurrentStep | int | — | Index of the active step |
| Revisions | int | Max 5 | How many times the plan has been revised. Cap prevents infinite replanning. |
| Status | string | active / completed / aborted | Overall plan status |
Each step has:
| Field | Type | Values | Description |
|---|---|---|---|
| ID | int | — | Step number |
| Description | string | — | What this step will do |
| Category | string | recon, analysis, active_test, exploit, report | Maps to tool router for phase-specific tool selection |
| Tools | list | — | Specific tools this step plans to use (included via Layer 3 routing) |
| Status | string | pending, in_progress, completed, skipped, failed | Step progress |
| Result | string | Max 500 chars | Summary of what the step accomplished |
| Substeps | list | — | Finer-grained breakdown (optional) |
XML Serialization
The plan is serialized to XML and injected into every LLM call so the agent always knows where it is:
```xml
<current_plan progress="3/7" phase="active_test">
Goal: Find authentication bypass vulnerabilities
Scope: api.example.com checkout endpoints
Steps:
[DONE] Step 1: Analyze traffic patterns (Found 12 endpoints...)
[DONE] Step 2: Check authentication headers (Missing auth on 3...)
[IN PROGRESS] Step 3: Fuzz authentication parameters
[PENDING] Step 4: Test session management
[PENDING] Step 5: Generate proof-of-concept
[PENDING] Step 6: Write security report
</current_plan>
```

Step results are truncated to 80 characters in the XML to save tokens.
Concurrent Tool Execution
When the LLM requests multiple tool calls in a single response, Ghost can execute them in parallel — but only if they’re safe to parallelize.
26 read-only tools (safe for parallel execution):
search_traffic, get_flow, get_flow_body, find_endpoints, get_traffic_stats, list_sessions, list_journeys, list_ws_messages, list_console_errors, list_navigations, list_storage_changes, get_page_resources, detect_anomalies, detect_schema_drift, detect_regression, suggest_edge_cases, analyze_api_coverage, analyze_sequences, list_findings, map_page_apis, fs_read, fs_list, get_device_screen, get_element_tree, find_elements, get_element_selectors
All other tools (like tag_flows, send_http_request, fuzz_endpoint, attack_request) are mutating — they change state or send requests. These always execute sequentially to avoid race conditions.
Parallel execution uses a semaphore channel with capacity 4 — at most 4 read-only tools run simultaneously. This limits resource usage while still providing significant speedup when the LLM requests many searches or listings at once.
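The semaphore pattern described here is idiomatic Go — a buffered channel of empty structs. A minimal sketch (the tool functions are stand-ins for Ghost’s real executors):

```go
package main

import (
	"fmt"
	"sync"
)

// runReadOnly executes read-only tools concurrently through a
// capacity-4 semaphore channel: at most 4 tools run at once.
// Results are written by index, so ordering stays deterministic.
func runReadOnly(tools []func() string) []string {
	sem := make(chan struct{}, 4) // at most 4 in flight
	results := make([]string, len(tools))
	var wg sync.WaitGroup
	for i, tool := range tools {
		wg.Add(1)
		go func(i int, tool func() string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release when done
			results[i] = tool()
		}(i, tool)
	}
	wg.Wait()
	return results
}

func main() {
	tools := []func() string{
		func() string { return "search_traffic: 42 flows" },
		func() string { return "list_sessions: 3 sessions" },
	}
	fmt.Println(runReadOnly(tools))
}
```

Mutating tools would skip this path entirely and run one at a time, as described above.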
Tool Output Compression
Tool results can be large — a traffic search might return thousands of flows, a scanner might produce megabytes of output. Before feeding results back to the LLM, Ghost compresses them to a maximum of 16,000 characters (~4,000 tokens):
| Tool | Compression Strategy |
|---|---|
| nuclei | Keep all critical+high findings, cap medium at 20, low at 10 |
| sqlmap | Preserve lines confirming injectable parameters |
| katana | Prioritize parameterized URLs (cap 100), plain paths (cap 50) |
| ffuf | Non-200 status codes first, cap at 50 results |
| trufflehog | Cap at 50 findings |
| semgrep | Error severity: 30, warning: 20, info: 10 |
| search_traffic | Cap JSON results at 50 flows |
| get_flow_body | Direct truncation |
| get_device_screen | Strip base64 image data (too large for context) |
| Default | UTF-8-safe truncation with summary: "...\n[Truncated — showing first N of M chars]" (150 chars reserved for suffix) |
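The default strategy’s UTF-8-safe truncation can be sketched as follows. The suffix wording follows the table above; the helper name is illustrative:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 cuts s at limit bytes without splitting a multi-byte
// rune, then appends a human-readable truncation summary.
func truncateUTF8(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	cut := limit
	// Back up until the cut point lands on a rune boundary.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return fmt.Sprintf("%s\n[Truncated — showing first %d of %d chars]", s[:cut], cut, len(s))
}

func main() {
	fmt.Println(truncateUTF8("héllo wörld, this is a long tool output", 10))
}
```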
Context Management
LLMs have a maximum input size (called the context window). As the agent’s conversation grows with tool calls and results, it can exceed this limit. The context manager prevents this by estimating token usage and pruning old messages.
Token Estimation
Ghost uses a simple heuristic: ~4 characters per token + overhead:
`tokens ≈ len(text) / 4 + 1`

Per-message overhead: +4 tokens for role/delimiter formatting. Tool calls add: name tokens + input tokens + 10 for JSON structure.
This is deliberately approximate — exact tokenization would require running the provider’s tokenizer, which is slow. The estimate errs on the side of overestimating (pruning slightly early is better than exceeding the context window).
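The heuristic above, as a sketch (function names are assumptions; the constants come from the source):

```go
package main

import "fmt"

// estimateTokens applies the ~4 chars/token heuristic.
func estimateTokens(text string) int {
	return len(text)/4 + 1
}

// estimateMessage adds the +4 per-message role/delimiter overhead.
func estimateMessage(text string) int {
	return estimateTokens(text) + 4
}

// estimateToolCall adds name + input tokens + 10 for JSON structure.
func estimateToolCall(name, inputJSON string) int {
	return estimateTokens(name) + estimateTokens(inputJSON) + 10
}

func main() {
	fmt.Println(estimateMessage("find security vulnerabilities in this API")) // 15
}
```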
Provider Context Windows
Section titled “Provider Context Windows”| Provider / Model | Context Window | Output Reserved |
|---|---|---|
| Claude (Sonnet 4.6, Opus, Haiku) | 200,000 tokens | 8,192 tokens |
| GPT-4o, GPT-4 Turbo | 128,000 tokens | 8,192 tokens |
| GPT-4 (original) | 8,192 tokens | 8,192 tokens |
| GPT-3.5 | 16,385 tokens | 8,192 tokens |
| Ollama (default) | 32,000 tokens | 8,192 tokens |
| Unknown/fallback | 32,000 tokens | 8,192 tokens |
Budget = context window - output reservation. For Claude: 200,000 - 8,192 = 191,808 tokens available for input.
Message Pruning
When the estimated token count exceeds the budget:
- Skip if too few messages — won’t prune if there are 4 or fewer messages (need minimum context)
- Try progressively aggressive pruning — starts by keeping the last 10 exchanges, reduces to 9, 8, … down to 1 until the budget is met
- Force minimum — if even keeping 1 exchange exceeds budget, force it anyway
An “exchange” is one assistant message (with tool calls) + all its subsequent tool result messages. This keeps related tool calls and their results together.
What’s preserved (never pruned):
- System message (index 0) — the agent’s identity and instructions
- Original user message (index 1) — the user’s initial question/goal
- Compact summary of pruned exchanges (synthetic)
- Last N exchanges (the most recent work)
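The progressive pruning described above can be sketched as a search over how many recent exchanges to keep. This is a simplified stand-in for Ghost’s real message structures (the too-few-messages skip rule is omitted):

```go
package main

import "fmt"

// pruneExchanges returns how many of the most recent exchanges to
// keep: it tries 10 down to 1 until fixed tokens (system prompt,
// original question, summary) plus the kept exchanges fit the budget.
func pruneExchanges(exchangeTokens []int, fixedTokens, budget int) int {
	for keep := 10; keep >= 1; keep-- {
		total := fixedTokens
		start := len(exchangeTokens) - keep
		if start < 0 {
			start = 0
		}
		for _, t := range exchangeTokens[start:] {
			total += t
		}
		if total <= budget {
			return keep
		}
	}
	return 1 // force the minimum even if it still exceeds the budget
}

func main() {
	// 12 exchanges of ~1000 tokens each, 500 fixed tokens, 6000 budget:
	// keeping the last 5 fits (500 + 5*1000 = 5500 <= 6000).
	ex := make([]int, 12)
	for i := range ex {
		ex[i] = 1000
	}
	fmt.Println(pruneExchanges(ex, 500, 6000)) // 5
}
```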
Compact Summary
The summary of pruned exchanges is mechanical (no LLM call needed):
- Lists tools used with invocation counts: `"Tools used: search_traffic(3), get_flow(1)"`
- Includes at most 2 key findings (first 200 characters each)
- Format: `"[Earlier conversation context — N tool exchanges pruned... See engagement state for full progress.]"`
Message Alternation Fix
Some providers (particularly Anthropic) require strict alternation between user and assistant messages — no two consecutive messages from the same role. The fixAlternation function inserts synthetic "[Continuing from previous step.]" user messages between consecutive assistant messages. Tool result messages are treated as user-role for this purpose.
LLM Providers
Ghost supports three LLM providers, each with its own adapter:
Anthropic (Claude)
Section titled “Anthropic (Claude)”- Default model: Claude Sonnet 4.6
- Prompt caching: System messages and the last tool definition get `CacheControl: Ephemeral` — Anthropic caches these across calls, reducing cost and latency for the agent’s iterative loop
- Message merging: Consecutive user messages are merged into a single message with multiple text blocks (Anthropic API requirement). Consecutive tool results are merged similarly. Critically, tool result messages absorb any immediately following user messages into the same Anthropic user message — because tool results become user-role messages in the Anthropic API, a real user message immediately after tool results would create invalid consecutive user messages. This merging prevents 400 errors from the Anthropic API (“tool_use blocks must have matching tool_result blocks”). OpenAI does not have this issue because it uses a separate `tool` role.
- Stream buffer: 64 events
OpenAI (GPT-4o)
Section titled “OpenAI (GPT-4o)”- Default model: GPT-4o
- Retry logic: Up to 3 retries with 2-second base wait. Rate limit detection parses “try again in Xs” from error messages using regex.
- Stream buffer: 64 events
- Uses `ChatCompletionAccumulator` for streaming assembly
Ollama (Local)
Section titled “Ollama (Local)”- Default endpoint:
http://localhost:11434 - Default model:
llama3.2 - API compatibility: Uses OpenAI-compatible API at
/v1path with dummy API key"ollama" - Retry resilience: Retries on connection refused, connection reset, HTTP 500/503, EOF, “no such host”, and “model is loading” — Ollama often needs time to load models into memory
System Prompts
The agent’s system prompt varies by mode and is assembled from multiple sections:
QA Mode Prompt Sections
Section titled “QA Mode Prompt Sections”- Identity — who the agent is (“Ghost QA Agent”)
- Session context — current session info
- Strategic directives — testing strategy guidance
- File upload directive — how to handle uploaded files
- Planning protocol — rules for creating and executing plans
- Data parsing protocol — how to parse traffic data formats
- Injection rules (conditional) — included only when browser extension is connected
- Inspector context (conditional) — included only when a mobile device is connected
- Constraints — limitations and boundaries
- Few-shot examples — example interactions showing good agent behavior
- GQL reference — GraphQL-specific guidance
Security Mode Prompt Sections
Section titled “Security Mode Prompt Sections”Two variants: web security and mobile security (which includes Frida context).
Three scan modes restrict what the agent can do:
| Scan Mode | Description | Key Restrictions |
|---|---|---|
| passive | Observation only — analyze existing traffic, never send requests | FORBIDDEN: send_http_request, fuzz_endpoint, attack_request, all run_* scanners, all frida_* tools |
| active-safe | Non-destructive testing — send requests but no exploitation | ALLOWED: send_http_request, fuzz_endpoint, most scanners. FORBIDDEN: run_sqlmap, run_hydra, attack_request (cluster_bomb/battering_ram modes) |
| active-full | Full offensive testing — all tools available | REQUIRES: request_approval before destructive actions. All tools allowed. |
The security prompt includes vulnerability taxonomies organized by severity:
- Critical: SQL injection, RCE, auth bypass, SSRF to internal networks, hardcoded secrets
- High: IDOR, stored XSS, path traversal, JWT issues, mass assignment
- Medium: CORS misconfiguration, reflected XSS, info disclosure, rate limiting, CSRF
- Low: Verbose errors, missing headers, cookie flags, version disclosure
Mobile-specific additions (when Frida is available): certificate pinning bypass, insecure local storage, biometric bypass, IPC vulnerabilities, root/jailbreak detection bypass.
Conversation Persistence
Section titled “Conversation Persistence”Conversations and messages are stored in SQLite, linked to sessions:
| Table | Key Fields | Notes |
|---|---|---|
| conversations | id, session_id, title, created_at, updated_at | CASCADE delete with session |
| messages | id, conversation_id, role, content, tool_calls (JSON), tool_call_id | CASCADE delete with conversation |
Messages store tool calls as a JSON array — each tool call includes the tool name, input arguments, and the tool call ID. This allows conversations to be resumed: the full tool call history is preserved and can be replayed.
When loading a conversation for continuation, sanitizeConversationHistory fixes common issues: it repairs orphaned tool results (those without a preceding assistant message), inserts synthetic bridging messages between consecutive same-role messages, and ensures proper role alternation for provider compatibility.
SSE Event Types
Section titled “SSE Event Types”The agent streams events to the frontend via Server-Sent Events (SSE). Each event has a type and JSON data:
| Event Type | When Emitted | Payload |
|---|---|---|
| chunk | Each text token from the LLM | Text delta string |
| tool_call | LLM requests a tool | Tool name + input JSON |
| tool_result | Tool execution completes | Tool name + output |
| assistant_message | Complete assistant message assembled | Full message content |
| plan_created | Agent creates a new plan | TaskPlan object |
| step_started | Plan step begins execution | Step ID + description |
| step_completed | Plan step finished | Step ID + result |
| plan_revised | Agent revises its plan | Updated TaskPlan |
| plan_completed | All plan steps finished | Final plan state |
| options | Agent presents interactive choices to the user | Question string + array of option objects (label, description, value) |
| steer | User steering message injected | Steering text |
| metrics | Agent run completed | RunMetrics (duration, iterations, tool calls, findings) |
| error | Error occurred | Error message |
| done | Agent run finished | — |
API Reference
Section titled “API Reference”| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/agent/chat | Start or continue an agent conversation. Body limit: 64 KB. Returns SSE stream. Creates new conversation if no conversation_id provided. |
| GET | /api/v1/agent/conversations | List all agent conversations (ordered by updated_at DESC) |
| GET | /api/v1/agent/conversations/{id} | Get a specific conversation with all messages |
| DELETE | /api/v1/agent/conversations/{id} | Delete a conversation and all its messages |
| POST | /api/v1/agent/steer | Send a steering message to the running agent. Injected at the next iteration. |
| POST | /api/v1/agent/stop | Request the agent to stop after completing its current step |
| POST | /api/v1/agent/upload | Upload a file for agent context. Body limit: 512 KB. Allowed: .txt, .md (100 KB), .json, .yaml, .yml (200 KB). |
Run Metrics
Section titled “Run Metrics”When the agent completes (regardless of how it stopped), it emits a metrics event with detailed statistics:
| Metric | Description |
|---|---|
| Duration | Total wall-clock time |
| Iterations | How many loop iterations were used (out of 25 max) |
| ToolCalls | Total number of tool invocations |
| UniqueTools | Number of distinct tools used |
| FailedTools | How many tool calls returned errors |
| PlanSteps | Total steps in the plan |
| StepsCompleted | How many steps were completed |
| PlanRevisions | How many times the plan was revised |
| Reflections | Number of reflection prompts injected |
| TerminationReason | Why the agent stopped (plan_complete, loop_detected, budget, user_stop, waiting_for_user_choice, etc.) |
| LoopsDetected | Number of loop detection triggers |
| FindingsTotal | Total findings discovered |
| FindingsBySeverity | Breakdown: {critical: 1, high: 3, medium: 5, ...} |