Browser Extension
Ghost’s browser extension adds three capabilities on top of proxy traffic interception: observing what users do (Capture), driving browser actions (Action), and overlaying intelligence on pages (Inject). It also powers journey recording by injecting X-Ghost-Interaction headers via Chrome’s declarativeNetRequest API for interaction-flow correlation.
Three-Layer Architecture
Capture Layer (Page → Ghost)
The content script runs 7 capture subsystems that observe DOM events and send structured data to Ghost via the service worker:
1. Click Events
- CSS selector path for the clicked element (generated automatically)
- Element text, type, and coordinates
- Page URL, page title, and timestamp
- Sent as `capture.interaction` with type `click`
2. Form Input Changes
- Fires on the `change` event (when the user leaves a field after modifying it)
- Captures: field selector, element type (input/textarea/select)
- Input metadata: field name, type, label (resolved via `<label for>`, parent `<label>`, `aria-label`, or placeholder), validation attributes (required, pattern, min, max, minlength, maxlength)
- Privacy: only field structure and metadata are captured, never the actual input value
- Sent as `capture.interaction` with type `change`
3. Form Submissions
- Fires on the `submit` event (form submission)
- Captures: form selector, form action URL
- Sent as `capture.interaction` with type `submit`
4. Page Navigation
- Detects URL changes via three mechanisms: the `popstate` event, monkey-patched `History.pushState` and `History.replaceState`, and a `MutationObserver` on the `<title>` element
- Captures: from URL → to URL, page title
- This covers both traditional page loads and SPA route changes (React Router, Next.js, etc.)
- Sent as `capture.navigation`
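The history-patching part of this detection can be sketched as follows. This is a minimal illustration, not Ghost's actual content script: `HistoryLike`, `watchHistory`, and the injected `getUrl` callback are hypothetical names chosen so the sketch runs outside a browser, and the `popstate` listener and `<title>` observer are omitted.

```typescript
// Minimal shape of the History API used here (an assumption for this sketch).
// In a browser this role would be played by window.history.
interface HistoryLike {
  pushState(data: unknown, unused: string, url?: string | null): void;
  replaceState(data: unknown, unused: string, url?: string | null): void;
}

type NavigationListener = (from: string, to: string) => void;

// Wraps pushState/replaceState so the listener fires whenever the URL changes.
// Returns a function that restores the original methods.
function watchHistory(
  history: HistoryLike,
  getUrl: () => string,
  onNavigate: NavigationListener,
): () => void {
  const originalPush = history.pushState;
  const originalReplace = history.replaceState;
  let lastUrl = getUrl();

  // Compare the current URL with the last one seen and report a change once.
  const notifyIfChanged = (): void => {
    const current = getUrl();
    if (current !== lastUrl) {
      const from = lastUrl;
      lastUrl = current;
      onNavigate(from, current);
    }
  };

  history.pushState = (data, unused, url) => {
    originalPush.call(history, data, unused, url);
    notifyIfChanged();
  };
  history.replaceState = (data, unused, url) => {
    originalReplace.call(history, data, unused, url);
    notifyIfChanged();
  };

  return () => {
    history.pushState = originalPush;
    history.replaceState = originalReplace;
  };
}
```

Comparing URLs before notifying keeps a `replaceState` call that leaves the URL unchanged from producing a spurious navigation event.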
5. Hover/Focus
- Fires on `mouseover` and `focusin` events on interactive elements
- Captures: element selector, ARIA labels
- Used for test coverage gap detection: Ghost can identify interactive elements that were never clicked
- Sent as `capture.interaction` with type `hover` or `focus`
6. Console Errors
- Captures `error` events (JavaScript exceptions) and `unhandledrejection` events (unhandled Promise rejections)
- Includes error message, stack trace, page URL, and timestamp
- Sent as `capture.console_error`
7. Storage Changes
- Monitors `localStorage` and `sessionStorage` via the `storage` event plus periodic diffing (every 5 seconds)
- Detects key additions, removals, and value changes
- Useful for tracking auth token lifecycle, session state, and feature flag changes
- Sent as `capture.storage_change`
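The periodic diffing step can be illustrated with a pure function that compares two snapshots of a storage area. This is a sketch under assumptions: the `StorageChange` shape and the `diffSnapshots` name are hypothetical, and the real extension may report changes differently.

```typescript
// A snapshot of one storage area, keyed by storage key (hypothetical representation).
type StorageSnapshot = Map<string, string>;

interface StorageChange {
  key: string;
  kind: "added" | "removed" | "changed";
  oldValue: string | null;
  newValue: string | null;
}

// Compares two snapshots and lists key additions, removals, and value changes.
function diffSnapshots(before: StorageSnapshot, after: StorageSnapshot): StorageChange[] {
  const changes: StorageChange[] = [];

  after.forEach((newValue, key) => {
    const oldValue = before.get(key);
    if (oldValue === undefined) {
      changes.push({ key, kind: "added", oldValue: null, newValue });
    } else if (oldValue !== newValue) {
      changes.push({ key, kind: "changed", oldValue, newValue });
    }
  });

  before.forEach((oldValue, key) => {
    if (!after.has(key)) {
      changes.push({ key, kind: "removed", oldValue, newValue: null });
    }
  });

  return changes;
}
```

A caller would take a fresh snapshot on each tick (every 5 seconds, per the list above), diff it against the previous one, and keep the new snapshot for the next comparison.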
Note: Screenshots are NOT a passive capture event — they’re an explicit action (see Action Layer below). The 7 subsystems above are all passive observers that run automatically.
Throttling
- Click/change events: Per-selector throttle of 200ms — repeated interactions on the same element within 200ms are deduplicated
- Hover/focus events: Per-element throttle of 500ms, plus a 50ms rate limit in the service worker to prevent flooding
- Throttle map: Capped at 1,000 entries. When the cap is reached, the entire map is cleared (prevents unbounded memory growth on pages with many interactive elements)
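A per-selector throttle with a capped map might look like the sketch below. The `SelectorThrottle` class and its method names are hypothetical, and one detail is an assumption: a deduplicated event does not refresh the stored timestamp.

```typescript
const CLICK_THROTTLE_MS = 200; // per-selector window for click/change events
const MAX_THROTTLE_ENTRIES = 1000; // cap on tracked selectors

class SelectorThrottle {
  private lastSeen = new Map<string, number>();
  private windowMs: number;
  private maxEntries: number;

  constructor(windowMs: number = CLICK_THROTTLE_MS, maxEntries: number = MAX_THROTTLE_ENTRIES) {
    this.windowMs = windowMs;
    this.maxEntries = maxEntries;
  }

  // Returns true when the event should be forwarded, false when it is deduplicated.
  shouldEmit(selector: string, nowMs: number): boolean {
    const previous = this.lastSeen.get(selector);
    if (previous !== undefined && nowMs - previous < this.windowMs) {
      return false;
    }
    // When the cap is reached, drop the whole map rather than evicting single entries.
    if (previous === undefined && this.lastSeen.size >= this.maxEntries) {
      this.lastSeen.clear();
    }
    this.lastSeen.set(selector, nowMs);
    return true;
  }

  trackedSelectors(): number {
    return this.lastSeen.size;
  }
}
```

Clearing the whole map is cheaper than tracking eviction order; the cost is that an element interacted with just before the clear may emit one extra event.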
Action Layer (Ghost → Page)
Ghost can drive browser actions via the extension. 15 action types:
| Action | Description |
|---|---|
| `click` | Click element by selector (mousedown → mouseup → click chain) |
| `fill` | Set input/textarea value + trigger input/change/blur events |
| `check` | Check/uncheck checkbox, toggle radio button |
| `select` | Select dropdown option by value or visible text |
| `scroll_to` | Scroll element into view |
| `navigate` | Navigate to URL |
| `wait` | Wait for condition (element visible/hidden, URL match, text appears, network idle) |
| `read_page` | Get page info: URL, title, all forms, all interactive elements with selectors |
| `read_form` | Get form fields: name, type, label, validation rules, current values |
| `screenshot` | Capture visible tab |
| `get_text` | Get element text content |
| `get_attribute` | Get element attribute value |
| `check_state` | Check element state (visible, disabled, checked) |
| `execute_steps` | Execute ordered step sequence with waits between steps |
| `query_all` | Query all matching elements |
All actions include visibility checks and scroll-into-view before interaction. The content script enforces a 30-second timeout per action. The Go backend uses a 60-second timeout for the full round-trip (which includes WebSocket message delivery and network latency on top of the action execution).
Agent Browser Tools
The AI agent has 6 registered browser tools that invoke extension actions:
| Agent Tool | Extension Action | What It Does |
|---|---|---|
| `browser_click` | `click` | Clicks an element by CSS selector |
| `browser_fill` | `fill` | Types text into an input field |
| `browser_read_page` | `read_page` | Gets page URL, title, all forms, and all interactive elements with selectors |
| `browser_query_all` | `query_all` | Queries all elements matching a CSS selector |
| `browser_screenshot` | `screenshot` | Captures the visible tab |
| `browser_inject` | (inject command) | Injects UI overlays onto the page (highlights, annotations, toasts) |
Several extension actions exist that do NOT have dedicated agent tools — the agent achieves those results through its registered tools instead. For example, read_page covers what read_form does, query_all covers get_text/get_attribute/check_state, and the agent calls tools individually rather than using execute_steps.
Inject Layer (Ghost → Page)
Ghost can overlay UI elements onto the web page through the content script:
| Command | Description |
|---|---|
| `highlight` | Colored border + label on an element |
| `annotate` | Label text near an element (e.g., API endpoint triggered) |
| `toast` | Notification overlay (slow response, schema change, error) |
| `test_result` | Green check / red X on tested elements |
| `panel` | Multi-section floating overlay |
| `clear` | Remove all injected overlays |
Shadow DOM Isolation
All injected UI uses Shadow DOM to prevent style leakage between Ghost’s overlays and the page’s CSS. ResizeObserver and MutationObserver keep overlays positioned correctly as the page changes.
SPA Navigation Detection
Overlays are automatically cleaned up when SPA navigation is detected (URL change without page reload).
Connection
The extension connects to Ghost via WebSocket:
`ws://localhost:5565/ws/extension?token={bearer_token}`
Configuration
- Host: `localhost` (default)
- Port: `5565` (Ghost API port)
- Token: Bearer token from Ghost (stored in the extension’s `chrome.storage`)
Configure via the extension popup or via the extension setup panel in Ghost.
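As a small illustration of how these settings combine into the connection URL, the helper below assembles it from a config object. The `ConnectionConfig` type and `buildExtensionUrl` function are hypothetical, and URL-encoding the token is an assumption rather than documented behavior.

```typescript
// Connection settings as described above (type name is hypothetical).
interface ConnectionConfig {
  host: string;
  port: number;
  token: string;
}

const DEFAULT_HOST = "localhost";
const DEFAULT_PORT = 5565; // Ghost API port

// Builds the WebSocket URL the extension connects to.
// Encoding the token is an assumption made here to keep the query string valid.
function buildExtensionUrl(config: ConnectionConfig): string {
  const token = encodeURIComponent(config.token);
  return `ws://${config.host}:${config.port}/ws/extension?token=${token}`;
}

// Example with the documented defaults and a placeholder token.
const exampleUrl = buildExtensionUrl({
  host: DEFAULT_HOST,
  port: DEFAULT_PORT,
  token: "example-token",
});
```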
Reconnection
- Exponential backoff: 1s initial, 1.5x multiplier, 30s maximum
- 24-second keepalive alarm (prevents Chrome MV3 from suspending the service worker)
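The backoff schedule follows directly from those three numbers. The sketch below computes the delay before a given reconnection attempt; the `reconnectDelayMs` name and the zero-based attempt counter are assumptions, and jitter is not shown because the source does not mention any.

```typescript
const INITIAL_DELAY_MS = 1000; // 1s initial delay
const BACKOFF_MULTIPLIER = 1.5;
const MAX_DELAY_MS = 30000; // 30s ceiling

// Delay before reconnection attempt `attempt` (0 = first retry).
function reconnectDelayMs(attempt: number): number {
  const delay = INITIAL_DELAY_MS * Math.pow(BACKOFF_MULTIPLIER, attempt);
  return Math.min(delay, MAX_DELAY_MS);
}

// The first few delays: 1000, 1500, 2250, 3375 ms, growing until the 30s cap.
const firstDelays = [0, 1, 2, 3].map(reconnectDelayMs);
```

With a 1.5x multiplier the cap is reached on the tenth attempt (index 9), after which every retry waits 30 seconds.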
Protocol
Extension → Ghost messages:
- `ext.hello`: initial handshake (announces the extension and its capabilities)
- `ext.tab_switch`: the user switched to a different browser tab
- `ext.ping`: keepalive ping (sent every ~24 seconds to prevent Chrome from suspending the service worker)
- `capture.interaction`: a DOM event was captured (click, change, submit, hover, focus)
- `capture.console_error`: a JavaScript error or unhandled rejection was observed
- `capture.navigation`: a page navigation was detected (URL change)
- `capture.storage_change`: a localStorage/sessionStorage mutation was detected
- `action.result`: the result of an executed action (success/failure + data)
Ghost → Extension messages:
- `ghost.welcome`: handshake response with session ID
- `action.request`: request to execute a browser action (one of the 15 action types)
- `inject.command`: request to inject a UI overlay (one of the 6 inject commands)
- `journey.recording_start`: start injecting `X-Ghost-Interaction` headers via `declarativeNetRequest` for journey recording
- `journey.recording_stop`: stop injecting headers and remove the dynamic rule
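The message types listed above can be captured as string-literal types so that a dispatcher can tell the two directions apart. This is only a sketch: the message type names come from the lists above, while the helper names and the `Direction` labels are hypothetical, and the wire format of the message envelope is not specified here.

```typescript
// Message types sent by the extension to Ghost.
const EXTENSION_TO_GHOST = [
  "ext.hello",
  "ext.tab_switch",
  "ext.ping",
  "capture.interaction",
  "capture.console_error",
  "capture.navigation",
  "capture.storage_change",
  "action.result",
] as const;

// Message types sent by Ghost to the extension.
const GHOST_TO_EXTENSION = [
  "ghost.welcome",
  "action.request",
  "inject.command",
  "journey.recording_start",
  "journey.recording_stop",
] as const;

type ExtensionMessageType = (typeof EXTENSION_TO_GHOST)[number];
type GhostMessageType = (typeof GHOST_TO_EXTENSION)[number];
type Direction = "to_ghost" | "to_extension" | "unknown";

function isExtensionMessageType(type: string): type is ExtensionMessageType {
  return (EXTENSION_TO_GHOST as readonly string[]).indexOf(type) !== -1;
}

function isGhostMessageType(type: string): type is GhostMessageType {
  return (GHOST_TO_EXTENSION as readonly string[]).indexOf(type) !== -1;
}

// Classifies a message type string by the direction it travels.
function directionOf(type: string): Direction {
  if (isExtensionMessageType(type)) return "to_ghost";
  if (isGhostMessageType(type)) return "to_extension";
  return "unknown";
}
```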
Extension Popup
The popup provides:
- Connection status — connected/disconnected with host:port
- Toggle capture per tab
- Quick actions: screenshot, read page, clear overlays
- Recent interactions — last 10 captured events
- Settings link — configure connection
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| `Alt+G` | Open extension popup |
| `Alt+Shift+C` | Toggle capture for current tab |
| `Alt+Shift+S` | Take screenshot |
Interaction-Flow Correlation
When the extension captures an interaction (click, change, submit), Ghost’s correlation engine matches it with proxy flows captured around the same time. The engine queries flows within a ±500ms window of the interaction timestamp (limited to 50 candidate flows per window) and scores each match using four weighted criteria:
| Criterion | Weight | How It Works |
|---|---|---|
| Host match | 0.4 | Whether the flow’s request host matches the page’s domain |
| Timing proximity | 0.3 | Linear decay — flows closer in time to the interaction score higher |
| Path similarity | 0.2 | How closely the flow’s URL path relates to the page URL |
| HTTP method heuristic | 0.1 | POST/PUT/DELETE score higher than GET (user actions more likely to mutate data) |
Flows scoring below a 0.3 minimum confidence threshold are discarded. The remaining matches power the interaction indicator in the flow inspector — showing which UI action triggered each API call.
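The weighted scoring can be sketched as below. The weights, the ±500ms window, the 50-candidate limit, and the 0.3 threshold come from the text above; everything else is an assumption made for illustration, including the `Interaction` and `Flow` shapes, the leading-segment path similarity, and the 1.0 versus 0.5 method scores. Ghost's backend is written in Go, so this is not its actual code.

```typescript
interface Interaction { timestampMs: number; pageHost: string; pagePath: string; }
interface Flow { timestampMs: number; host: string; path: string; method: string; }

const WINDOW_MS = 500;
const MAX_CANDIDATES = 50;
const MIN_CONFIDENCE = 0.3;
const WEIGHTS = { host: 0.4, timing: 0.3, path: 0.2, method: 0.1 };

// Assumed similarity measure: share of leading path segments the two paths have in common.
function pathSimilarity(a: string, b: string): number {
  const left = a.split("/").filter((s) => s.length > 0);
  const right = b.split("/").filter((s) => s.length > 0);
  const longest = Math.max(left.length, right.length);
  if (longest === 0) return 1; // both paths are the root
  let shared = 0;
  while (shared < left.length && shared < right.length && left[shared] === right[shared]) {
    shared++;
  }
  return shared / longest;
}

function scoreFlow(interaction: Interaction, flow: Flow): number {
  const hostScore = flow.host === interaction.pageHost ? 1 : 0;
  // Linear decay: 1 at zero distance, 0 at the edge of the window.
  const distance = Math.abs(flow.timestampMs - interaction.timestampMs);
  const timingScore = Math.max(0, 1 - distance / WINDOW_MS);
  const pathScore = pathSimilarity(flow.path, interaction.pagePath);
  // Assumed values: mutating methods score 1.0, everything else 0.5.
  const mutating = ["POST", "PUT", "DELETE"].indexOf(flow.method.toUpperCase()) !== -1;
  const methodScore = mutating ? 1 : 0.5;
  return (
    WEIGHTS.host * hostScore +
    WEIGHTS.timing * timingScore +
    WEIGHTS.path * pathScore +
    WEIGHTS.method * methodScore
  );
}

// Keeps flows inside the window, scores them, and drops low-confidence matches.
function correlate(interaction: Interaction, flows: Flow[]): { flow: Flow; score: number }[] {
  return flows
    .filter((f) => Math.abs(f.timestampMs - interaction.timestampMs) <= WINDOW_MS)
    .slice(0, MAX_CANDIDATES)
    .map((flow) => ({ flow, score: scoreFlow(interaction, flow) }))
    .filter((match) => match.score >= MIN_CONFIDENCE)
    .sort((a, b) => b.score - a.score);
}
```

Because host match alone is worth 0.4, a same-host flow inside the window always clears the 0.3 threshold under these assumptions, while a cross-host flow needs close timing plus path or method agreement to qualify.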
Journey Recording Correlation
During journey recording, the extension uses a more deterministic correlation mechanism alongside the heuristic scoring above. It injects an X-Ghost-Interaction header into every outgoing request via Chrome’s declarativeNetRequest API (rule ID 9999). When the user performs an interaction (click, change, submit, but not hover or focus), the header value is updated with a unique ID. The proxy reads and strips this header, storing the ID on the flow. The backend recorder uses this ID to create "action" steps (correlated interaction + flow) rather than relying purely on timing-based heuristics.
The extension requires the declarativeNetRequest permission to create dynamic header injection rules. The rule targets xmlhttprequest (covers both XHR and fetch), main_frame, and sub_frame resource types. The correlation window is 600ms — after which the header resets to an ambient value so unrelated requests aren’t misattributed. Stale rules from previous sessions or crashes are cleaned up automatically on service worker startup.
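A dynamic rule of this kind could be described with the plain objects below, in the shape accepted by `chrome.declarativeNetRequest.updateDynamicRules`. The rule ID, header name, resource types, and 600ms window come from the text above; the `"ambient"` placeholder value, the function names, and the way the reset is computed are assumptions for illustration.

```typescript
const RULE_ID = 9999;
const HEADER_NAME = "X-Ghost-Interaction";
const CORRELATION_WINDOW_MS = 600;
const AMBIENT_VALUE = "ambient"; // hypothetical placeholder; the real ambient value is not documented here

// Builds a modifyHeaders rule that sets the interaction header on matching requests.
function buildInteractionRule(headerValue: string) {
  return {
    id: RULE_ID,
    priority: 1,
    action: {
      type: "modifyHeaders",
      requestHeaders: [{ header: HEADER_NAME, operation: "set", value: headerValue }],
    },
    condition: {
      resourceTypes: ["xmlhttprequest", "main_frame", "sub_frame"],
    },
  };
}

// Options object for updateDynamicRules: replace the existing rule in one call.
function buildRuleUpdate(headerValue: string) {
  return {
    removeRuleIds: [RULE_ID],
    addRules: [buildInteractionRule(headerValue)],
  };
}

// Picks the header value for a given moment: the interaction ID while the
// correlation window is open, the ambient value afterwards.
function currentHeaderValue(
  last: { id: string; atMs: number } | null,
  nowMs: number,
): string {
  if (last !== null && nowMs - last.atMs <= CORRELATION_WINDOW_MS) {
    return last.id;
  }
  return AMBIENT_VALUE;
}
```

In a service worker, the result of `buildRuleUpdate` would be passed to `chrome.declarativeNetRequest.updateDynamicRules`; removing and re-adding the same rule ID in one call keeps a single active rule.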