AI Agent
Every request that passes through Ghost is stored with full detail — headers, bodies, timing, TLS info, browser interactions. The AI agent can read all of this data, reason about it, and take action. Instead of manually scrolling through hundreds of flows looking for bugs or security issues, you describe what you want in plain English, and the agent does the work: searching traffic, analyzing patterns, generating test code, writing bug reports, or running security scans.
The agent is not a simple chatbot. It follows a structured plan → execute → reflect → terminate cycle. It creates a step-by-step plan for your request, executes each step using specialized tools (searching traffic, calling external scanners, writing files), reflects on what it found after each step, and stops when the plan is complete. You can watch the plan progress in real-time and redirect the agent mid-run if it goes off track.
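That cycle can be sketched as a simple loop over plan steps. This is an illustrative skeleton under assumed names (`runAgent`, `execute`, `reflect`), not Ghost's actual code:

```typescript
// Illustrative sketch of the plan → execute → reflect → terminate cycle.
// All names here are hypothetical; only the loop shape mirrors the docs.
type Step = { description: string; done: boolean };

function runAgent(
  plan: Step[],
  execute: (s: Step) => string,          // tool calls happen inside execute
  reflect: (s: Step, result: string) => void,
  maxIterations = 25,                     // hard cap described below
): string[] {
  const results: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const step = plan.find((s) => !s.done);
    if (!step) break;                     // terminate: plan complete
    const result = execute(step);         // execute the current step
    reflect(step, result);                // reflect before moving on
    step.done = true;
    results.push(result);
  }
  return results;
}
```

The real loop is richer (tool routing, steering, six termination signals), but every run follows this shape.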
How to Use
- Open the Chat Panel — click the chat icon in the toolbar or press Ctrl+4
- Type a question or instruction in natural language
- The agent creates a plan, then starts executing it — you’ll see tool calls and results streaming in real-time
- The Plan Panel above the chat shows live progress: which step is active, which are done, which are pending
What You Can Ask (QA Mode)
- “Find all API errors in this session and generate a bug report”
- “Generate Playwright tests for the checkout flow”
- “Compare this session against the previous one — did anything regress?”
- “Generate API documentation from the captured traffic”
- “Run a load test against the search endpoint with 50 virtual users”
- “Highlight all product prices on the page” (if the browser extension is connected)
What You Can Ask (Security Mode)
- “Scan all endpoints for SQL injection”
- “Analyze the JWT tokens in the traffic for weaknesses”
- “Check all endpoints for missing security headers”
- “Run a full penetration test against api.example.com”
- “Find any hardcoded secrets in the JavaScript bundles”
Agent Architecture
What this diagram shows — the agent’s iteration loop:
- Dynamic system prompt — Every LLM call includes a system prompt that Ghost builds dynamically. It contains the current session’s traffic statistics (total flows, error rate, top hosts, method distribution), which tools are available (different in QA vs Security mode), whether the browser extension is connected, whether a mobile device is connected, and which addons are active. This means the agent always knows the current state of your session.
- LLM call — The prompt and conversation history are sent to whichever LLM provider you’ve configured. The agent supports three providers (detailed below). The LLM responds either with text (displayed to you) or with tool calls (executed by Ghost).
- Tool routing — Ghost doesn’t send all tools to the LLM every time. Instead, it looks at what step of the plan the agent is currently on and sends only the 15-25 tools relevant to that step’s category (reconnaissance, analysis, active testing, exploitation, or reporting). This saves tokens and keeps the LLM focused. If no plan exists yet, all tools are sent so the agent can decide what to plan.
- Parallel execution — When the LLM requests multiple tool calls in one response, read-only tools (like search_traffic, get_flow, list_sessions) run concurrently — up to 4 at a time. Tools that modify state (like tag_flows, replay_request, fs_write) run one at a time to avoid conflicts.
- Output compression — Each tool’s output is capped at 16,000 characters (roughly 4,000 tokens). For tools that return large results (like run_nuclei or search_traffic), Ghost intelligently compresses the output — keeping critical/high-severity findings and truncating lower-priority content rather than blindly cutting off at a character limit.
- Context management — The full conversation (system prompt + all messages + all tool results) is tracked for token usage. When it approaches the provider’s context window limit, older messages are mechanically summarized — tool names and key findings are preserved, but raw outputs are dropped. This is a deterministic process (no LLM call for summarization), so it’s fast and predictable.
- Reflection — After completing each plan step, the agent receives a reflection prompt asking it to assess: Was the step complete? Is the finding well-evidenced? Did anything change the plan? Should the next step be adjusted? This prevents the agent from rushing through steps without thinking about what it found.
- Termination check — Six signals can stop the agent (detailed below). If none fire, the loop continues with the next LLM call.
- Mid-run steering — While the agent is running, you can type a message in the chat input (which shows an amber border during active runs). Your message is injected into the conversation between iterations. The agent has a buffer of up to 5 steering messages — if you type more than 5 before the agent processes them, the oldest are dropped.
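The parallel-execution rule above (concurrent read-only calls, serialized state-modifying calls) can be sketched as follows. `runWithLimit` and `executeToolCalls` are hypothetical helpers for illustration, not Ghost's implementation:

```typescript
// Read-only tools run concurrently (up to 4 at a time); mutating tools
// run one by one. The READ_ONLY set lists only the examples from the docs.
const READ_ONLY = new Set(["search_traffic", "get_flow", "list_sessions"]);

async function runWithLimit<T>(tasks: (() => Promise<T>)[], limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++;              // safe: JS is single-threaded between awaits
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}

async function executeToolCalls(
  calls: { name: string; run: () => Promise<string> }[],
): Promise<string[]> {
  const readOnly = calls.filter((c) => READ_ONLY.has(c.name));
  const mutating = calls.filter((c) => !READ_ONLY.has(c.name));
  const out = await runWithLimit(readOnly.map((c) => c.run), 4); // concurrent
  for (const c of mutating) out.push(await c.run());             // serial
  return out;
}
```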
LLM Providers
Ghost supports three LLM providers. You configure which one to use in Settings → AI or in ~/.ghost/config.toml. API keys are encrypted at rest with AES-256-GCM — they’re never stored in plaintext on disk.
| Provider | Default Model | Available Models | Context Window | What It’s Best For |
|---|---|---|---|---|
| Anthropic | Claude Sonnet 4.6 | Opus 4.6, Sonnet 4.6, Sonnet 4.5, Haiku 4.5 | 200,000 tokens | Primary recommended provider. Ghost uses Anthropic’s prompt caching (CacheControl: ephemeral on the system prompt and last tool definition) for significant cost savings on repeated calls. Opus 4.6 is the most capable for complex analysis. |
| OpenAI | GPT-4o | GPT-5.4, o3, o3 Pro, o4-mini, GPT-4.1/mini/nano, GPT-4o/mini | 128,000 tokens | Alternative cloud provider. Uses the openai-go SDK with streaming via ChatCompletionAccumulator for assembling tool calls from streamed chunks. o3/o3 Pro excel at reasoning-heavy tasks. |
| Ollama | llama3.2 | Llama 4, 3.3, 3.2, Qwen 3/2.5, DeepSeek R1/Coder V2, Mistral, Mixtral, Phi-4, Gemma 2 | 32,000 tokens | Fully offline/local operation. Connects to Ollama’s OpenAI-compatible API at http://localhost:11434/v1. No data leaves your machine. Any Ollama model works — these are pre-configured options. |
All providers share the same retry behavior: up to 3 retries on rate limit errors (HTTP 429), with wait time parsed from the error response. Max output tokens per LLM response: 8,192.
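A minimal sketch of that shared retry policy, assuming a hypothetical error shape with `status` and `retryAfterMs` fields (the real per-provider SDK error types differ):

```typescript
// Up to 3 retries on HTTP 429, waiting for the duration parsed from the
// error response. The error fields used here are illustrative assumptions.
async function callWithRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err: any) {
      if (err?.status !== 429 || attempt >= 3) throw err; // only 429s, max 3 retries
      const waitMs = Number(err.retryAfterMs) || 1000;    // parsed from the response
      await sleep(waitMs);
    }
  }
}
```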
The Plan System
The agent’s first action on any task must be creating a plan via the create_plan tool. This is enforced — if the agent tries to produce a text response without creating a plan first, Ghost injects a nudge: “You MUST create a plan before proceeding.” (up to 2 nudges before giving up).
Plans have 1–15 steps. Each step has a description, a category (recon, analysis, active_test, exploit, or report), and a status (pending, in_progress, completed, skipped, or failed). The plan is visible in the UI’s Plan Panel and looks like this:
```
<current_plan progress="2/5" phase="analysis">
Goal: Test authentication security on api.example.com
Scope: api.example.com — auth endpoints, JWT, IDOR

[1] Map API endpoints ............... DONE — Found 12 endpoints
[2] Analyze auth patterns ........... IN PROGRESS
[3] Test JWT security ............... PENDING
[4] Test rate limiting .............. PENDING
[5] Generate security report ........ PENDING
</current_plan>
```

The agent can revise the plan mid-execution (up to 5 revisions). If it discovers something unexpected in step 2 that requires additional testing, it calls revise_plan to add new steps before continuing. After completing each step, the agent calls complete_step with a result summary (capped at 500 characters).
Reflection
After the agent completes a plan step, Ghost injects a reflection prompt asking:
- Completeness — Did you test everything relevant to this step?
- Quality — Is your finding well-evidenced? Would a senior reviewer accept it?
- Discovery — Did you find anything that changes your plan?
- Next step — Is the next planned step still the right action?
When all steps are complete, a final reflection is injected that asks the agent to walk through the entire plan, review all findings, assess confidence (low/medium/high), and identify anything critical that was missed. If the agent finds gaps during final reflection, it can call revise_plan to add more steps.
Termination
Six signals can stop the agent, checked in this priority order every iteration:
| Signal | Condition | What Happens |
|---|---|---|
| Plan complete + reflected | All plan steps are marked completed or skipped, AND final reflection is done | Agent stops immediately — work is complete. |
| Plan completed + text-only | Plan status is “completed” and the agent produced a text response with zero tool calls | Agent stops — it’s writing its final summary. |
| Loop detected | The same tool with the same input (checked by FNV-1a hash) was called 3 or more times within the last 10 tool call records | Agent is asked to wrap up and write a report. This prevents infinite loops where the agent keeps retrying the same failing action. |
| Diminishing returns | After iteration 8: the current step is still “in progress” and the last 6 tool calls are all the same tool name | Agent is asked to wrap up. This catches cases where the agent is stuck calling the same tool repeatedly without advancing. |
| Budget reservation | Iteration reaches maxIterations - 3 (iteration 22 of 25) | Agent is asked to write its final report with the remaining 3 iterations. This ensures the agent always has room to produce a summary rather than being cut off mid-analysis. |
| User stop | User clicks the stop button in the UI | Agent wraps up the current step and writes a summary. |
The hard cap is 25 iterations (LLM calls) per run. If none of the above signals trigger and the agent reaches 25, the run ends with an error.
When an agent is asked to “wrap up” (signals 3-6), it receives a termination prompt:
“You are being asked to wrap up. Call complete_step for any in-progress step, write a final summary of your work, findings, and recommendations. Do NOT start new tool calls.”
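The loop-detection signal can be sketched with 32-bit FNV-1a hashing of each (tool, input) pair; Ghost's exact hash variant and record format are assumptions here:

```typescript
// Flag a loop when the same (tool, input) hash appears 3+ times within
// the last 10 tool-call records, as described in the termination table.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;                       // 32-bit FNV offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;     // 32-bit FNV prime
  }
  return h;
}

function isLooping(history: { tool: string; input: string }[]): boolean {
  const recent = history.slice(-10).map((r) => fnv1a(`${r.tool}:${r.input}`));
  const counts = new Map<number, number>();
  for (const h of recent) counts.set(h, (counts.get(h) ?? 0) + 1);
  return [...counts.values()].some((n) => n >= 3);
}
```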
Tool Categories
Ghost registers different tools depending on the active mode (QA or Security) and what’s connected (extension, device, Frida). Listed here are all tools that are actually registered — tools that exist in the codebase but are commented out of registration are omitted.
Core Tools (Both Modes — Always Available)
| Tool | What It Does |
|---|---|
search_traffic | Search flows using GQL queries with pagination (max 50 results per page, supports offset for paging). |
get_flow | Get full detail of a single flow — request/response headers, status, timing, tags, annotations. |
get_flow_body | Get the request or response body of a flow (up to 16 KB). Supports grep_pattern to search within large bodies instead of returning everything. |
find_endpoints | Discover all unique endpoints (method + host + path combinations) across captured traffic. |
get_traffic_stats | Aggregate statistics — total flows, error rate, top hosts, method distribution, average response time. |
tag_flows | Add or remove tags on flows (e.g., tag flows as “bug”, “suspicious”, “auth”). |
replay_request | Re-send a captured request exactly as it was originally captured, or with modifications. |
annotate_flow | Add a text annotation/note to a flow — useful for documenting findings directly on the relevant traffic. |
list_sessions | List all sessions (up to 100). |
list_journeys | List recorded browser journeys (up to 100). |
get_journey_steps | Get the recorded steps of a journey — interactions, flows, selectors, element details (up to 500 steps). |
journey_export | Export a journey as deterministic test code (Cypress UI/API, Playwright UI/API, k6, Postman, cURL, HAR). Uses the same reliable exporter as the Ghost UI — stable selectors, no hallucination. Writes to workspace and returns the full path. Preferred over generate_test for journey-based tests. |
list_ws_messages | Query captured WebSocket frames — filter by direction, message type, content. |
list_console_errors | List browser console errors captured by the extension. |
list_navigations | List page navigation events captured by the extension. |
list_storage_changes | List localStorage/sessionStorage/cookie changes captured by the extension. |
analyze_requirements | Cross-reference uploaded requirements documents against captured traffic to assess test coverage. |
create_plan | Create a structured execution plan (1-15 steps). Single-step plans are allowed for focused tasks like “export journey as Cypress UI”. |
revise_plan | Modify the current plan — add, remove, or reorder steps (max 5 revisions). |
complete_step | Mark a plan step as completed with a result summary. |
think | Private reasoning scratchpad — the agent writes its thoughts without displaying them as chat text. Renders as a purple “Thinking” indicator with bouncing dots in the UI. |
present_options | Present 2-6 interactive choices to the user with a question, labeled options (with descriptions), and a free-text input. Pauses the agent loop until the user responds. See Interactive Choices below. |
Browser Extension Tools (When Extension Connected)
| Tool | What It Does |
|---|---|
highlight_element | Highlight a DOM element in the browser with a colored outline — useful for pointing out elements to the user. |
show_toast | Show a notification toast in the browser page. |
annotate_element | Add a persistent label/badge to a DOM element in the browser. |
browser_read_page | Read the full DOM content of the current browser page. |
browser_query_all | Query the page DOM using CSS selectors — returns matching elements with their attributes and text content. |
browser_click | Click a DOM element by CSS selector. |
browser_fill | Fill a form field by CSS selector with a value. |
browser_screenshot | Take a screenshot of the page or a specific element. |
browser_inject | Inject arbitrary JavaScript into the page context. |
Proxy Injection (When Script Injector Available)
| Tool | What It Does |
|---|---|
proxy_inject_script | Inject JavaScript into HTML responses matching a URL pattern — works on ALL devices including mobile (unlike browser extension tools which only work in the browser). Scripts can use window.__ghost.observe(selector, callback) for SPA-aware DOM watching, window.__ghost.fetch(url) for cross-origin requests (active modes only), and window.__ghost.analyze(data, prompt) for AI analysis (active modes only). |
QA-Specific Tools
| Tool | What It Does |
|---|---|
generate_test | Generate test code from captured traffic. Supports Playwright, Cypress, pytest, k6, and other frameworks. |
generate_bug_report | Create a structured bug report with traffic evidence — includes reproduction steps, request/response details, and screenshots. |
detect_regression | Compare two sessions to find regressions — new errors, changed response structures, performance degradation. |
export_as | Export a flow as cURL, fetch, Python requests, Go http, or other formats. |
generate_api_docs | Generate API documentation from captured traffic — endpoint inventory, request/response schemas, example payloads. |
generate_mock_server | Generate a mock server from captured traffic — replay recorded responses for offline testing. |
generate_test_scenarios | Generate test scenario descriptions from captured traffic patterns. |
generate_session_report | Generate a comprehensive session summary report. |
fuzz_endpoint | Systematic API fuzzing — send variations of a request to discover edge cases and errors. |
test_form | Discover form fields on a page and test them with various inputs. |
record_journey | Start or stop recording a user journey with correlated browser interactions and HTTP flows. Requires extension. |
replay_journey | Replay a recorded journey’s HTTP flows and compare responses with the originals (SSE streaming). |
Performance Testing Tools (Require External Tools Installed)
| Tool | What It Does |
|---|---|
run_k6 | Run a k6 load test against an endpoint. Capped at 100 virtual users and 5-minute duration. |
run_hey | Run an HTTP benchmark with hey. Capped at 1,000 requests, 50 concurrency, 60-second duration. |
Mobile Inspector Tools (When Device Connected)
| Tool | What It Does |
|---|---|
get_device_screen | Capture the current device screen as a screenshot (resized to max 400px width, JPEG quality 60 to save tokens). |
get_element_tree | Get the UI element hierarchy (max depth 10 levels). |
find_elements | Search for elements by text or accessibility properties (max 20 results). |
get_element_selectors | Generate automation selectors for an element — Appium, Espresso, XCUITest, and Maestro formats. |
correlate_element_traffic | Find which API calls were triggered by interacting with a UI element (checks a 5-second time window, max 200 flows). |
tap_device | Tap an element on the device screen. Rate-limited to 10 taps per 5 seconds to prevent accidental rapid-fire. |
type_device | Type text into the currently focused field (max 500 characters). |
TestRail Integration Tools (When TestRail Configured)
| Tool | What It Does |
|---|---|
testrail_list_projects | List TestRail projects. |
testrail_get_cases | Get test cases from a TestRail project. |
testrail_push_results | Push test results to TestRail. |
testrail_suggest_cases | Suggest which TestRail test cases are relevant to the current traffic. |
Security-Specific Tools
| Tool | What It Does |
|---|---|
list_findings | List security findings detected by Ghost’s passive security interceptor — these are findings that were automatically detected during traffic capture, before the agent even starts. |
send_http_request | Send a custom HTTP request with full control over method, URL, headers, and body. SSRF-protected: Ghost validates that the resolved IP is not private/loopback before connecting. |
get_page_resources | Map all resources loaded by a page — JavaScript, CSS, images, fonts. Useful for finding third-party scripts that might contain secrets. |
request_approval | Request user approval before performing a destructive or data-modifying operation. Required in active-safe mode for write operations. |
attack_request | Launch an automated payload attack against a captured request (detailed below). Requires the Attacker engine. 5-minute timeout. |
list_wordlists | List available payload wordlists for use with attack_request. |
Frida Tools (When Frida Connected — Security Mode Only)
| Tool | What It Does |
|---|---|
frida_check | Verify Frida connection is alive. Must be called before other Frida tools. |
frida_list_apps | List installed apps on the connected device. |
frida_bypass_ssl | Bypass SSL certificate pinning to intercept HTTPS traffic from apps that would otherwise refuse the proxy certificate. |
frida_root_bypass | Bypass root/jailbreak detection so the app runs normally on a rooted device. |
frida_trace | Hook functions in the running app and log their arguments and return values. 60-second timeout. |
frida_inject | Inject a custom Frida script into the running app for deep runtime inspection. 3-minute timeout. |
External Scanner Tools (Security Mode — Require Installation)
| Tool | Timeout | What It Does |
|---|---|---|
run_nuclei | 5 min 10 sec | Run Nuclei vulnerability scanner templates against target endpoints. |
run_dalfox | 5 min 10 sec | Run DalFox XSS scanner. |
run_ffuf | 5 min 10 sec | Fuzz paths, parameters, or headers with ffuf. |
run_sqlmap | 5 min 10 sec | Run sqlmap SQL injection testing against a captured flow. |
run_trufflehog | 5 min 10 sec | Scan for hardcoded secrets in captured JavaScript and responses. |
run_katana | 5 min 10 sec | Crawl and discover endpoints beyond what was captured. |
run_semgrep | 5 min 10 sec | Static analysis of captured JavaScript for client-side vulnerabilities. |
run_nmap | 5 min 10 sec | Port scan and service detection. |
run_ssl_scan | 5 min 10 sec | TLS/SSL configuration analysis. |
run_hydra | 5 min 10 sec | Password brute-force testing. |
External scanner stdout is capped at 2 MB, stderr at 64 KB. If a scanner isn’t installed on the system, its run_* tool simply won’t be registered.
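Capped scanner output still goes through the severity-aware compression described under Output compression: keep critical/high findings first, truncate the rest. A sketch of that sort-then-truncate pass, where the `severity: message` line format is a hypothetical convention for illustration:

```typescript
// Keep the highest-severity lines that fit in the budget instead of
// blindly cutting off at a character limit. Line format is assumed.
function compressFindings(lines: string[], maxChars: number): string {
  const rank = (l: string) =>
    l.startsWith("critical") ? 0 : l.startsWith("high") ? 1 : 2;
  const sorted = [...lines].sort((a, b) => rank(a) - rank(b)); // severity first
  let out = "";
  for (const line of sorted) {
    if (out.length + line.length + 1 > maxChars) break;        // budget exhausted
    out += line + "\n";
  }
  return out;
}
```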
Workspace Tools (Both Modes)
| Tool | What It Does |
|---|---|
fs_write | Write a file to the agent’s workspace directory (max 1 MB per file). Used for saving evidence, PoC scripts, reports. |
fs_read | Read a file from the workspace (max 1 MB, 500 lines). |
fs_list | List files in the workspace directory. |
The Request Attacker
The attack_request tool launches Ghost’s built-in request attacker engine — a payload fuzzer inspired by Burp Suite’s Intruder. It takes a captured flow as a base request and systematically replaces parts of it with payloads from wordlists.
How It Works
- Pick a base request — The agent selects a captured flow to use as the template
- Define insertion points — Where payloads should be injected: header, query_param, body_json, body_form, body_raw, cookie, path_segment, or method
- Choose payloads — From built-in wordlists or custom values
- Send a baseline — The original request is sent first to establish a “normal” response (status code, body length, response time)
- Launch the attack — Payloads are injected and sent in parallel, each response compared against the baseline
Attack Modes
| Mode | How Payloads Are Combined | Use Case |
|---|---|---|
| Sniper | One insertion point at a time, others keep original values. If you have 2 points and 100 payloads, that’s 200 requests. | Testing each parameter individually for a specific vulnerability class. |
| Battering Ram | Same payload in all insertion points simultaneously. 100 payloads = 100 requests regardless of point count. | Testing if the same input causes issues across multiple parameters. |
| Pitchfork | Payload lists are walked in parallel — payload 1 from list A goes with payload 1 from list B. Length = shortest list. | Paired data like username/password lists. |
| Cluster Bomb | Cartesian product — every combination of every payload across all points. 100 payloads × 2 points = 10,000 requests. | Exhaustive testing of all parameter combinations. |
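The request-count arithmetic from the table, written out as a small function (one payload list per insertion point):

```typescript
// Number of requests an attack will send, per mode, given one payload
// list per insertion point. Matches the examples in the table above.
function requestCount(mode: string, payloadLists: string[][]): number {
  switch (mode) {
    case "sniper":        // each point tried alone: sum of list lengths
      return payloadLists.reduce((n, list) => n + list.length, 0);
    case "battering_ram": // same payload everywhere: one list drives all points
      return payloadLists[0].length;
    case "pitchfork":     // lists walked in parallel: limited by the shortest
      return Math.min(...payloadLists.map((l) => l.length));
    case "cluster_bomb":  // cartesian product of all lists
      return payloadLists.reduce((n, list) => n * list.length, 1);
    default:
      throw new Error(`unknown mode: ${mode}`);
  }
}
```

With two insertion points and 100 payloads each: sniper sends 200 requests, battering ram 100, cluster bomb 10,000.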
Built-In Payload Wordlists
Ghost embeds these payload files directly in the binary (no external files needed):
| Wordlist | Purpose |
|---|---|
sqli-generic | SQL injection payloads (UNION, error-based, time-based, boolean blind) |
xss-reflected | Cross-site scripting payloads for reflection testing |
command-injection | OS command injection payloads (;id, \|whoami, $(command)) |
path-traversal | Directory traversal payloads (../../../etc/passwd variants) |
ssrf | Server-side request forgery payloads (internal IPs, cloud metadata URLs) |
ssti | Server-side template injection payloads ({{7*7}}, ${7*7}) |
nosql-injection | NoSQL injection payloads for MongoDB, CouchDB etc. |
open-redirect | Open redirect payloads for testing URL redirect parameters |
auth-bypass-headers | Authentication bypass headers (X-Original-URL, X-Forwarded-For) |
http-methods | HTTP method tampering (PUT, DELETE, PATCH, TRACE, etc.) |
Additionally, api-endpoints and common-paths wordlists are available for path fuzzing with run_ffuf, and a dynamic numeric-ids list generates numbers 1–1000 for IDOR testing.
Safety Limits
| Parameter | Default | Maximum |
|---|---|---|
| Max requests per attack | 500 | 2,000 |
| Parallel threads | 5 | 10 |
| Delay between requests | 50 ms | — |
| Top interesting results kept | 20 | — |
| Response body read limit | 1 MB | — |
Interesting Results Detection
The attacker automatically flags results that differ significantly from the baseline:
- Status code changed — baseline returned 200, this payload returned 500 (could indicate injection)
- Response length changed by >20% — significantly different body could mean data leak or error
- Response time 3× longer — could indicate time-based SQL injection or resource exhaustion
You can also define custom match rules — regex patterns on the response body, specific status codes, header values, or response time thresholds.
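A sketch of the three built-in detection rules, using the thresholds documented above:

```typescript
// Compare each attack result against the baseline and collect the reasons
// it was flagged as interesting. Thresholds are the documented defaults.
type Sample = { status: number; bodyLength: number; timeMs: number };

function interestingReasons(baseline: Sample, result: Sample): string[] {
  const reasons: string[] = [];
  if (result.status !== baseline.status) reasons.push("status-changed");
  const delta = Math.abs(result.bodyLength - baseline.bodyLength);
  if (delta > baseline.bodyLength * 0.2) reasons.push("length-changed"); // >20%
  if (result.timeMs >= baseline.timeMs * 3) reasons.push("slow-response"); // 3× baseline
  return reasons;
}
```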
SSRF Protection
The attacker engine validates every outbound request at dial time. Before connecting, it resolves the hostname to an IP address and blocks connections to loopback (127.0.0.1), private (10.x.x.x, 192.168.x.x, 172.16-31.x.x), link-local, or unspecified addresses. This prevents the agent from being tricked into attacking internal services.
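For IPv4, the dial-time check amounts to a few range tests. This sketch omits IPv6 and DNS re-resolution, which a real validator must also handle:

```typescript
// Block the address ranges listed above; refuse anything unparseable.
function isBlockedIPv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => Number.isNaN(p) || p < 0 || p > 255)) {
    return true;                                      // not valid IPv4: refuse to dial
  }
  const [a, b] = parts;
  if (a === 127) return true;                         // loopback
  if (a === 10) return true;                          // private 10.0.0.0/8
  if (a === 192 && b === 168) return true;            // private 192.168.0.0/16
  if (a === 172 && b >= 16 && b <= 31) return true;   // private 172.16.0.0/12
  if (a === 169 && b === 254) return true;            // link-local
  if (ip === "0.0.0.0") return true;                  // unspecified
  return false;
}
```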
Scan Modes (Security)
In Security mode, the agent operates under one of three scan modes that control what tools it can use:
| Mode | What’s Allowed | What Requires Approval | What’s Forbidden |
|---|---|---|---|
| Passive (default) | All traffic analysis tools, fs_read/fs_write/fs_list, run_trufflehog (with --no-verification), run_semgrep, read-only browser tools, proxy_inject_script (DOM annotation only — no __ghost.fetch or __ghost.analyze) | Nothing — no approval mechanism needed | All outbound requests (replay_request, send_http_request), all active scanners (run_nuclei, run_katana, run_ffuf, run_dalfox, run_sqlmap), all interactive browser tools (browser_click, browser_fill), all Frida tools |
| Active-Safe | Everything in passive, plus: GET requests via send_http_request/replay_request, scanners with safe defaults, attack_request, frida_trace (read-only) | POST/PUT/PATCH/DELETE that create/modify/delete data, actions that could trigger lockouts, frida_bypass_ssl/frida_root_bypass/frida_inject, browser_inject | Brute force attacks, destructive payloads |
| Active-Full | Everything in passive and active-safe, plus: all scanners at full power, all Frida tools, browser_inject | Only data-modifying requests (POST/PUT/PATCH/DELETE that change state on the target) | Nothing beyond the approval gate |
The scan mode is set server-side — the agent cannot escalate its own permissions.
Engagement State (Security Mode)
In Security mode, the agent tracks an engagement state that follows the PTES (Penetration Testing Execution Standard) methodology with automatic phase progression:
Phases: traffic_analysis → passive_detection → active_scanning → exploitation → reporting → done
The agent auto-advances phases based on tool usage — for example, after 3+ reconnaissance tool calls, it advances from traffic_analysis to passive_detection. The state also tracks discovered endpoints (up to 200), confirmed findings (up to 100), active injection rules (up to 50), and evidence file paths.
In QA mode, a simpler phase progression is used: qa_recon → qa_functional → qa_edge_cases → qa_errors → qa_performance → qa_reporting.
Dynamic System Prompt
The agent’s system prompt is not static — it’s assembled dynamically on every run based on context:
QA Mode prompt includes:
- Agent identity and available tool list (conditional on extension/device connectivity)
- Current session stats (flow count, error rate, top hosts, method distribution)
- Active addons
- Strategic directives (observe first, don’t assume, use annotations)
- Planning protocol (create plan first, revise when needed)
- Data parsing protocol (locale-aware number parsing for Turkish formats like “1.250,50 TL”)
- Injection rules (if extension connected — color palette, URL patterns, isolation)
- Inspector context (if device connected — screen capture → element tree → selector → correlate strategy)
- GQL reference (search syntax for search_traffic)
- Few-shot examples (bug finding, test generation, visual annotation)
Security Mode prompt includes:
- Authorization context (pre-authorized engagement — no disclaimers, no permission-asking)
- Pentester identity (methodical, persistent, tries 5-10 approaches before moving on)
- Operator rules (execute don’t advise, chain findings, prove impact, respect scope, document evidence)
- Session context with target hosts and scan mode
- Scan mode rules (exactly what’s allowed/forbidden/needs approval)
- Tool strategy (phase-ordered tool usage with output chaining rules)
- External tool availability (only shows tools actually installed on the system)
- Vulnerability taxonomy — API vulns (SQLi, RCE, auth bypass, SSRF, IDOR, etc.) and web-specific vulns (DOM XSS, open redirect, clickjacking, etc.)
- Mobile vulnerability taxonomy (if device/Frida connected — cert pinning, insecure storage, root detection, etc.)
- Workflow protocol (evidence file naming: findings/VULN-{NNN}-{type}.md, PoC scripts: poc/{type}-exploit.py, final report: report.md)
SSE Streaming
Agent responses stream to the frontend via Server-Sent Events (SSE), not WebSocket. The endpoint is POST /api/v1/agent/chat. POST was chosen over GET (which the browser’s native EventSource API requires) because the chat message needs to be sent in the request body.
SSE event types sent to the frontend:
| Event | What It Contains |
|---|---|
chunk | A text token from the LLM — these arrive one at a time and build up the agent’s message as you watch. |
tool_call | The agent is calling a tool — includes tool name and input parameters. |
tool_result | A tool finished executing — includes the tool’s output. |
plan_created | The agent created its plan — includes all steps. |
step_started | A plan step is now in progress. |
step_completed | A plan step finished — includes the result summary. |
plan_revised | The agent revised its plan — includes updated steps. |
plan_completed | All plan steps are done. |
options | The agent is presenting interactive choices — renders an OptionsPanel with clickable option cards and a free-text input. The agent loop pauses until the user responds. |
steer | A user steering message was injected. |
metrics | Run metrics emitted at the end. |
error | An error occurred. |
done | The agent run is finished. |
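Since the endpoint is a POST, the browser's native EventSource cannot consume it; clients read the response body with fetch() and parse the SSE framing themselves. A minimal parser for the standard `event:`/`data:` framing (a simplification — real SSE parsing must also handle comments, `id:`, and partial chunks at buffer boundaries):

```typescript
// Parse a buffer of SSE text into (event, data) pairs. Events are
// separated by blank lines; multi-line data is joined with newlines.
type SseEvent = { event: string; data: string };

function parseSse(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of buffer.split("\n\n")) {
    let event = "message";                 // SSE default event type
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trim());
    }
    if (data.length) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```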
Interactive Choices
Sometimes the agent needs your input before it can proceed — for example, “Which test framework should I use?” or “Which finding should I investigate first?” Instead of requiring you to type a response, the agent uses present_options to show a structured choice panel.
How It Works
- The agent calls present_options with a question and 2-6 labeled options (each with an optional description)
- The agent loop pauses — it emits metrics with termination reason "waiting for user choice" and exits cleanly
- When you click an option (or type a custom response and press Enter), the selected value is sent as a new chat message
- A new agent run starts with your choice in the conversation history, and the agent continues from where it left off
UI Details
- Pending state: While the `present_options` tool is executing, a spinning indicator with “Preparing options…” appears
- Active state: Option cards are arranged in a 2-column grid. Each card shows a bold label and an optional description underneath. Cards have hover effects and focus rings for keyboard navigation.
- Answered state: After you select an option, the entire panel fades to 50% opacity. The selected option shows a cyan check icon and highlighted border. The free-text input disappears.
- Run Summary suppression: The Run Summary card is hidden when the termination reason is "waiting for user choice" — the agent isn’t actually done, it’s just pausing for input.
Frontend Components
- `OptionsPanel` (chat/options-panel.tsx) — Card grid with option buttons and free-text input
- `ToolCallCard` (chat/tool-call-card.tsx) — Special-cases `present_options` to show a spinner while pending and nothing when done (the OptionsPanel handles the UI)
- `ThinkIndicator` (inside tool-call-card.tsx) — Special-cases the `think` tool to show a purple card with bouncing dots animation while pending, then “Reasoned about the approach” when done
Run Metrics
After each run completes, the agent emits a `metrics` event with:
| Metric | What It Measures |
|---|---|
| Duration | Total wall-clock time from start to finish. |
| Iterations | Number of LLM calls made (max 25). |
| Tool calls | Total number of tool invocations across all iterations. |
| Reflections | Number of step reflection points where the agent paused to assess progress. |
| Termination reason | Why the agent stopped — plan complete, loop detected, budget reservation, user stop, waiting for user choice, etc. |
| Cost estimate | Estimated LLM cost based on input + output token counts and the provider’s pricing. |
These appear in the Run Summary card at the bottom of the agent’s response in the chat panel. The summary is hidden when the termination reason is "waiting for user choice" — the agent isn’t done, it’s waiting for input via `present_options`.
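The cost estimate is a straightforward tokens-times-pricing calculation. A minimal sketch, with a hypothetical pricing shape (the real figures come from the provider):

```typescript
// Hypothetical per-million-token prices in USD; actual values depend on
// the configured LLM provider and model.
type Pricing = { inputPerMTok: number; outputPerMTok: number };

// Estimate run cost from input and output token counts.
function estimateCost(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (
    (inputTokens / 1_000_000) * p.inputPerMTok +
    (outputTokens / 1_000_000) * p.outputPerMTok
  );
}
```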
Chat UI Enhancements
Tool Call Cards show contextual one-line summaries next to the tool label — for example, `create_plan` shows the first 50 characters of the goal, `send_http_request` shows `GET https://api.example.com`, and `journey_export` shows the export format. Special tool rendering:
- `think` → Purple “Thinking” card with bouncing dots animation (no accordion)
- `present_options` → Spinner while pending, hidden when done (OptionsPanel handles display)
- `fs_write` → FileCard showing the relative path, line count, and byte size
- `journey_export` → FileCard showing the export path, format, and size
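The one-line summaries can be sketched as a small dispatch on tool name. The helper and input field names below are assumptions for illustration, not Ghost’s actual component code:

```typescript
// Hypothetical summary helper for tool call cards: maps a tool name and
// its input parameters to a short contextual label.
function toolCallSummary(tool: string, input: Record<string, unknown>): string {
  switch (tool) {
    case "create_plan":
      // First 50 characters of the goal, per the behavior described above.
      return String(input.goal ?? "").slice(0, 50);
    case "send_http_request":
      return `${input.method ?? "GET"} ${input.url ?? ""}`;
    case "journey_export":
      return String(input.format ?? "");
    default:
      return "";
  }
}
```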
Conversation Management
- Conversations are stored in SQLite — each conversation belongs to a session. Messages (user, assistant, tool calls, tool results) are persisted immediately as they happen, not just at the end.
- Multiple conversations per session — start a new conversation for a different topic without losing the old one.
- Conversation list in the chat panel sidebar — click to switch between conversations.
- Message IDs are ULIDs — time-ordered, so conversations are naturally sorted chronologically.
- Delete conversations to clean up — removes the conversation and all its messages from the database.
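The chronological-sort property follows from the ULID layout: the first 10 characters encode a 48-bit millisecond timestamp in Crockford base32, so plain lexicographic string order equals time order. A minimal encoder for just that time prefix (a sketch, not Ghost’s ULID implementation):

```typescript
// Crockford base32 alphabet used by ULID (no I, L, O, U).
const B32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Encode a millisecond timestamp as the 10-character ULID time prefix.
// 48-bit values fit safely in a JS number (< 2^53).
function ulidTimePrefix(ms: number): string {
  let out = "";
  for (let i = 0; i < 10; i++) {
    out = B32[ms % 32] + out;
    ms = Math.floor(ms / 32);
  }
  return out;
}
```

Because this prefix sorts lexicographically by timestamp, a plain `ORDER BY id` in SQLite returns messages in creation order with no separate timestamp column needed for sorting.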
Tool Output Compression
When a tool returns more than 16,000 characters, Ghost compresses it intelligently based on the tool type:
| Tool | Compression Strategy |
|---|---|
| `run_nuclei` | Keep all CRITICAL and HIGH findings, cap MEDIUM at 20, LOW at 10. |
| `run_sqlmap` | Keep injection confirmations, cap databases at 20, tables at 30, data rows at 20. |
| `run_katana` | Prioritize URLs with parameters (cap 100), then plain paths (cap 50). |
| `run_ffuf` | Prioritize non-200 responses (cap 50), then 200 responses (cap 50). |
| `search_traffic` | Parse JSON, cap the results array at 50 items, re-serialize. |
| `get_flow_body` | Simple truncation at the character limit. |
| `get_device_screen` | Remove the base64 screenshot data, keep metadata. |
| All others | Truncate with a summary indicating total length and how much was kept. |
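The generic fallback strategy (“truncate with a summary”) can be sketched as follows. The function name and summary wording are hypothetical; the 16,000-character threshold comes from the text above:

```typescript
const COMPRESS_THRESHOLD = 16_000; // characters, per the docs above

// Generic fallback: keep the head of the output and append a summary
// stating the total length and how much was kept.
function compressGeneric(output: string, keep = COMPRESS_THRESHOLD): string {
  if (output.length <= keep) return output;
  const head = output.slice(0, keep);
  return `${head}\n[compressed: kept ${keep} of ${output.length} characters]`;
}
```

The tool-specific strategies in the table exist because blind truncation would discard exactly the parts the agent cares about (e.g. a CRITICAL finding at the end of a scanner’s output), so each one keeps the highest-signal items first.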