Storage Layer
Every HTTP request and response Ghost captures needs to be stored somewhere — so you can search it, filter it, export it, and analyze it later. Ghost uses SQLite as its database, which is an embedded database engine that stores everything in a single file on your computer. Unlike databases like PostgreSQL or MySQL that run as separate server processes, SQLite runs inside Ghost itself — no installation, no configuration, no network connections. Your entire traffic history lives in one file: ~/.ghost/ghost.db.
Ghost uses a pure Go implementation of SQLite (modernc.org/sqlite) that requires no C compiler or external libraries. This keeps Ghost as a single binary with zero dependencies.
How SQLite Is Configured
Section titled “How SQLite Is Configured”When Ghost opens the database, it configures SQLite with specific settings designed for Ghost’s usage pattern: many goroutines reading data simultaneously (loading the traffic list, running searches, generating stats), but only one goroutine writing at a time (storing incoming flows from the proxy).
Connection Pool
Section titled “Connection Pool”db.SetMaxOpenConns(1) // Only one connection to the databasedb.SetMaxIdleConns(1) // Keep that connection alive when idledb.SetConnMaxLifetime(0) // Never close the connection due to ageWhy a single connection? SQLite doesn’t handle multiple simultaneous writers well — it would return “database is busy” errors. By limiting Go’s connection pool to exactly one connection, Ghost serializes all database operations (reads and writes go through the same connection, one at a time). This is simple and reliable: no lock contention, no BUSY errors, no retry logic needed.
Pragmas
Section titled “Pragmas”Pragmas are SQLite configuration commands that tune the database engine’s behavior. Ghost sets these when the database is first opened:
| Pragma | Value | What It Does |
|---|---|---|
journal_mode=WAL | Write-Ahead Logging | Instead of overwriting the database file directly, SQLite writes changes to a separate WAL file first. This is faster (sequential writes instead of random) and allows readers to work while a write is in progress — they simply read from the database file while the writer appends to the WAL file. |
busy_timeout=10000 | 10-second wait | If a query can’t execute because the database is locked (rare with MaxOpenConns(1), but possible during WAL checkpoints), SQLite retries for up to 10 seconds before returning an error. |
synchronous=NORMAL | Balanced durability | With WAL mode, NORMAL means SQLite flushes data to disk only at checkpoint time — not after every single write. This is much faster while still being safe: you could lose the last few writes if the power cuts out, but the database itself won’t be corrupted. |
foreign_keys=ON | Enforce relationships | When a flow belongs to a session, deleting the session automatically deletes its flows (CASCADE). This pragma makes SQLite actually enforce these relationships — by default, SQLite ignores them. |
cache_size=-64000 | 64 MB page cache | SQLite caches frequently-accessed database pages in memory. The negative value means “kibibytes” (not pages), so -64000 = ~64 MB. This keeps hot data in memory for fast reads — important when scrolling through thousands of flows. |
WAL Checkpoint
Section titled “WAL Checkpoint”The WAL file grows as Ghost captures traffic. Periodically, SQLite needs to transfer changes from the WAL file back into the main database file — this is called a checkpoint. Ghost handles this in two ways:
Periodic (every 30 minutes): A background goroutine runs PRAGMA wal_checkpoint(PASSIVE) every 30 minutes. PASSIVE means “checkpoint whatever you can without blocking — if any reads are in progress, skip those pages.” This keeps the WAL file from growing unbounded without interrupting Ghost’s operation. The goroutine is wrapped in safeGo (Ghost’s panic-recovery wrapper that catches panics and reports them to Sentry).
On shutdown: When Ghost closes the database, it runs PRAGMA wal_checkpoint(TRUNCATE), which flushes everything from the WAL into the main database file and then resets the WAL file to zero bytes. This ensures the database is in a clean state when Ghost next starts.
Database Location
Section titled “Database Location”The database file is stored at ~/.ghost/ghost.db:
| Platform | Full Path |
|---|---|
| macOS | /Users/<name>/.ghost/ghost.db |
| Windows | C:\Users\<name>\.ghost\ghost.db |
| Linux | /home/<name>/.ghost/ghost.db |
The ~/.ghost/ directory is created with 0700 permissions (owner-only access) if it doesn’t exist. The path is resolved by config.GhostHome().
Store Interfaces
Section titled “Store Interfaces”Ghost’s storage layer is organized as 15 sub-interfaces that are composed into a single Store interface. Each sub-interface groups related database operations. The concrete implementation (SQLiteStore) lives in a single file (sqlite.go, ~3,200 lines) that implements all 80 methods.
What this diagram shows — the core data relationships:
The diagram shows how Ghost’s main data tables relate to each other. Sessions are the top-level container — when Ghost starts capturing traffic, flows are stored inside the active session. Each session can have many flows (HTTP requests/responses), journeys (user interaction recordings), security findings (vulnerabilities detected), injection rules (JavaScript injected into pages), and extension events (browser activity logs). Flows can have WebSocket frames (for WebSocket connections) and interactions (correlated browser actions). Journeys contain journey steps (individual actions in a recording).
The arrows with ||--o{ mean “one-to-many” — one session has many flows, one flow has many WebSocket frames. The FK columns create these relationships, and ON DELETE CASCADE means deleting a session automatically deletes everything inside it.
Interface Summary
Section titled “Interface Summary”| Interface | Methods | What It Manages |
|---|---|---|
| FlowStore | 11 | HTTP flow CRUD, GQL-filtered queries with FTS5, stats aggregation (per-host, per-app, per-device), flow comparison summaries |
| SessionStore | 5 | Capture session CRUD with computed flow counts |
| AddonStore | 5 | JavaScript addon scripts (CRUD, priority ordering) |
| AgentStore | 7 | AI conversations and messages (with tool call storage) |
| ExtensionStore | 4 | Browser interactions and flow-interaction correlation |
| JourneyStore | 8 | User journey recording (steps, completion with subquery counts) |
| ScreenshotStore | 4 | Visual regression baselines (upsert, latest-by-URL lookup) |
| MapRuleStore | 5 | URL mapping, redirect, and rewrite rules |
| InjectionRuleStore | 6 | Script injection rules (session-scoped, with clear-all) |
| WSFrameStore | 4 | WebSocket frame persistence per flow |
| MonitorStore | 2 | Total flow count and database file size |
| ArtifactStore | 4 | Bug reports, session exports, test scenario files |
| SecurityFindingsStore | 6 | Security findings with dedup, validation, stats by severity/type/status |
| FridaScriptStore | 5 | Frida instrumentation scripts (categorized, cross-platform) |
| ExtensionEventStore | 3 | Console errors, navigations, storage changes from browser extension |
Total: 79 interface methods + Close() = 80 methods on the composed Store interface.
Full-Text Search (FTS5)
Section titled “Full-Text Search (FTS5)”When you search for text in request/response bodies or headers (using the body: or header: operators in GQL), Ghost uses SQLite’s FTS5 (Full-Text Search version 5) engine — a specialized search index that can find words inside large text fields efficiently.
FTS5 Virtual Table
Section titled “FTS5 Virtual Table”CREATE VIRTUAL TABLE flows_fts USING fts5( id UNINDEXED, req_url, req_host, req_path, req_headers_text, req_body_text, resp_headers_text, resp_body_text, notes, tags_text, content=flows, tokenize='porter unicode61');What each part means:
| Component | Purpose |
|---|---|
VIRTUAL TABLE ... USING fts5 | Creates a full-text search index, not a regular table. SQLite builds an inverted index (word → list of rows containing that word) for fast text lookup. |
id UNINDEXED | The flow ID is stored in the index for joining back to the flows table, but isn’t searchable itself — you don’t need to search for a ULID. |
req_url, req_host, req_path | The request URL, host, and path are all searchable. Searching for “api” would find flows to api.example.com/v1/users. |
req_headers_text, req_body_text | Request headers and body (converted to text) are searchable. Searching for “Bearer” finds flows with authentication tokens. |
resp_headers_text, resp_body_text | Response headers and body are searchable. Searching for “error” finds error messages in API responses. |
notes, tags_text | User-added notes and tags are also searchable. |
content=flows | This is a “content table” — the FTS index references the flows table rather than storing its own copy of the data. Saves disk space. |
tokenize='porter unicode61' | Porter stemming means searching for “running” also finds “run,” “runs,” and “ran.” Unicode61 handles international characters correctly (accents, CJK characters, etc.). |
Automatic Synchronization
Section titled “Automatic Synchronization”Three database triggers keep the FTS index in sync with the flows table:
flows_fts_insert— when a flow is inserted, its searchable text is added to the FTS indexflows_fts_delete— when a flow is deleted, its entry is removed from the FTS indexflows_fts_update— when a flow is updated, the old entry is removed and a new one is inserted
This means you never need to manually rebuild the search index — it stays current automatically.
Database Schema
Section titled “Database Schema”All Tables
Section titled “All Tables”Ghost’s schema is built across 18 migrations. Each migration adds new tables, columns, or indexes as features were developed.
| Table | Migration | Columns | Description |
|---|---|---|---|
| sessions | 1 | id, name, description, created_at | Capture session containers. Flow count is computed via LEFT JOIN at query time, not stored. |
| flows | 1, 7, 10, 11 | 37 columns total | The core table. Request method/URL/host/path/proto/headers/body/content_type/content_len, response status/text/proto/headers/body/content_type/content_len, timing (started_at, duration_ns, timings_json), metadata (tags JSON, notes, source, metadata JSON, error), interaction fields (selector, element_text, event_type, page_url, timestamp, screenshot), device fields (client_ip, source_app, device_type, device_id, device_name). |
| flows_fts | 1 | FTS5 virtual table | Full-text search index over 9 text columns with porter stemming. |
| settings | 1 | key, value | Key-value configuration store. |
| addons | 2 | id, name, code, enabled, priority, created_at, updated_at | JavaScript addon scripts with priority ordering. |
| conversations | 3 | id, session_id FK, title, created_at, updated_at | AI agent conversation containers. CASCADE delete with session. |
| messages | 3 | id, conversation_id FK, role, content, tool_calls, tool_call_id, created_at | Individual messages within conversations (user, assistant, tool roles). CASCADE delete with conversation. |
| interactions | 4 | id, session_id FK, flow_id FK, type, selector, element_text, element_type, page_url, page_title, data, screenshot, correlation_confidence, timestamp, created_at | Browser interactions captured by the extension. flow_id is SET NULL on flow deletion (interaction survives). |
| journeys | 5 | id, session_id FK, name, status, flow_count, interaction_count, started_at, completed_at, created_at | User journey recordings. Status defaults to “recording”. CASCADE delete with session. |
| journey_steps | 5 | id, journey_id FK, step_order, type, interaction_id FK, flow_id FK, data, timestamp | Individual steps within a journey. interaction_id and flow_id use SET NULL. CASCADE delete with journey. |
| screenshot_baselines | 6 | id, session_id FK, journey_id FK, url, selector, screenshot, width, height, file_size, captured_at, created_at | Visual regression baselines. journey_id uses SET NULL. |
| map_rules | 8 | id, type, enabled, priority, name, host_pattern, path_pattern, method_pattern, local_, remote_target, rewrite_, created_at, updated_at | URL mapping, redirect, and rewrite rules with pattern matching. |
| ws_frames | 9 | id, flow_id FK, direction, opcode, payload, length, timestamp, seq | WebSocket frames captured during proxy relay. CASCADE delete with flow. |
| artifacts | 12 | id, type, title, summary, format, device_id, device_name, session_id, content, size_bytes, metadata JSON, created_at | Persisted output files (bug reports, exports, test scenarios). |
| security_findings | 13, 15 | id, session_id FK, flow_id FK, type, severity, confidence, title, description, evidence, remediation, cwe_id, owasp_id, status, source, metadata JSON, dedup_key, created_at, updated_at | Security vulnerabilities detected by passive analysis or active scanning. Unique constraint on (session_id, dedup_key). |
| frida_scripts | 14 | id, name, description, code, category, platform, created_at, updated_at | Saved Frida instrumentation scripts. |
| injection_rules | 16, 17 | id, session_id FK, url_pattern, script, position, enabled, name, created_at, updated_at | JavaScript injection rules scoped to sessions. CASCADE delete with session. |
| extension_events | 18 | id, session_id FK, type, page_url, data JSON, timestamp | Browser extension events (console errors, navigations, storage changes). CASCADE delete with session. |
Foreign Key Behaviors
Section titled “Foreign Key Behaviors”Ghost uses two different FK behaviors depending on the relationship:
| Behavior | When Used | Example |
|---|---|---|
| CASCADE | Ownership — child cannot exist without parent | Deleting a session deletes all its flows, findings, conversations, events |
| SET NULL | Reference — child can exist independently | Deleting a flow sets flow_id = NULL on its interactions (the interaction still exists, just no longer linked) |
Three-level cascade example: Deleting a session → cascades to its flows → cascades to each flow’s WebSocket frames. One DELETE statement can remove thousands of rows across three tables.
Indexes
Section titled “Indexes”Ghost creates ~37 indexes across all migrations for query performance. The most important ones:
| Index | Columns | Why It Exists |
|---|---|---|
idx_flows_session_started | session_id, started_at DESC | Primary query path — the traffic list loads flows for the active session, sorted newest first. This composite index makes that query fast even with hundreds of thousands of flows. |
idx_flows_req_host | req_host | Host filtering and domain navigator grouping |
idx_flows_req_method | req_method | Method filtering (GET, POST, etc.) |
idx_flows_resp_status | resp_status_code | Status code filtering (200, 404, 500, etc.) |
idx_flows_source_app | source_app | Application grouping (Chrome, Safari, mobile app) |
idx_flows_device_type + idx_flows_device_id | device_type, device_id | Device filtering in the traffic list |
idx_ws_frames_flow | flow_id, seq ASC | Loading WebSocket frames in sequence order for a specific flow |
idx_findings_severity | severity | Grouping security findings by severity level |
idx_findings_dedup | session_id, dedup_key (UNIQUE) | Prevents duplicate security findings — same vulnerability on the same endpoint in the same session is stored only once |
idx_ext_events_session_type | session_id, type, timestamp DESC | Loading extension events filtered by type within a session |
Query Patterns
Section titled “Query Patterns”GQL Query Building
Section titled “GQL Query Building”When you type a search query in Ghost’s traffic list (like host:api.example.com status:4xx), the GQL parser converts it into SQL WHERE clauses:
| GQL Operator | SQL Translation | Details |
|---|---|---|
host:api.example.com | req_host LIKE '%api.example.com' (suffix) | If the value contains a dot, it’s treated as a domain suffix match — host:example.com matches api.example.com and cdn.example.com. Without a dot, it’s a substring match. |
-host:noise.com | req_host NOT LIKE '%noise.com' | Negation with - prefix. |
method:POST | req_method = 'POST' | Exact match. Multiple values (method:GET method:POST) become IN ('GET','POST'). |
status:404 | resp_status_code = 404 | Exact status code match. |
status:4xx | resp_status_code >= 400 AND resp_status_code < 500 | Range shorthand — 2xx, 3xx, 4xx, 5xx expand to 100-code ranges. |
status:>=400 | resp_status_code >= 400 | Comparison operators: >, <, >=, <=. |
path:/api/v1 | req_path LIKE '%/api/v1%' | Substring match on the path. |
tag:websocket | tags LIKE '%"websocket"%' | Searches inside the JSON tags array. |
content_type:json | req_content_type LIKE '%json%' OR resp_content_type LIKE '%json%' | Multiple values are OR’d. |
duration:>1000 | duration_ns > 1000000000 | Duration is in milliseconds in GQL, converted to nanoseconds for the database (multiplied by 1,000,000). |
size:>10000 | COALESCE(LENGTH(resp_body),0) > 10000 | Response body size in bytes. Uses COALESCE because resp_body can be NULL. |
has:websocket | tags LIKE '%"websocket"%' | Existence checks. Also supports has:error (error IS NOT NULL), has:response (resp_status_code IS NOT NULL), has:interaction. |
body:error | FTS5 JOIN | Full-text search across request and response bodies. Joins with flows_fts virtual table. |
header:authorization | FTS5 JOIN | Full-text search across request and response headers. |
Pagination and Limits
Section titled “Pagination and Limits”| Query Type | Default Limit | Maximum Limit |
|---|---|---|
| Flow queries | 100 | 10,000 |
| Host stats | 1,000 | — |
| App stats | 500 | — |
| Device stats | 500 | — |
| Flow stats (host counts) | 100 | — |
| Flow stats (content types) | 50 | — |
| Comparison summaries | 50,000 | — |
| Interactions | 50 (list), 100 (per-flow) | 500 |
| Security findings | — | 1,000 |
| Extension events | 100 | 500 |
| Artifacts | 100 | — |
All flow queries are ordered by started_at DESC (newest first). Comparison summaries are ordered started_at ASC (oldest first, for chronological comparison).
Session Flow Counts
Section titled “Session Flow Counts”Sessions don’t store their flow count as a column. Instead, it’s computed at query time using a LEFT JOIN:
LEFT JOIN ( SELECT session_id, COUNT(*) as cnt FROM flows GROUP BY session_id) fc ON fc.session_id = s.idThis means the count is always accurate — there’s no risk of the stored count getting out of sync with the actual number of flows.
Auto-Purge
Section titled “Auto-Purge”Ghost can automatically delete old flows to prevent the database from growing indefinitely. The auto-purger runs as a background goroutine that checks every 5 minutes.
Configuration
Section titled “Configuration”[store]# Enable automatic purging of old flowsauto_purge_enabled = false
# Delete flows older than this many hours (0 = disabled)auto_purge_max_age_hours = 0
# Keep at most this many flows (0 = unlimited)auto_purge_max_flows = 0Purge Strategies
Section titled “Purge Strategies”Two strategies run independently (both can be active simultaneously):
By age: Deletes all flows older than auto_purge_max_age_hours. For example, setting this to 72 means flows older than 3 days are automatically removed.
By count: When the total flow count exceeds auto_purge_max_flows, the oldest flows are deleted to bring the count back to the limit. The deletion uses a subquery to find the oldest excess flows:
DELETE FROM flows WHERE id IN ( SELECT id FROM flows ORDER BY started_at ASC LIMIT MAX(0, (SELECT COUNT(*) FROM flows) - ?))This deletes the oldest flows first, keeping the most recent ones.
Security Finding Deduplication
Section titled “Security Finding Deduplication”Security findings use a deduplication mechanism to prevent the same vulnerability from being stored multiple times. Each finding has a dedup_key (defaulting to type|title if not explicitly set), and a UNIQUE index on (session_id, dedup_key) prevents duplicates.
When CreateFinding is called, it uses INSERT OR IGNORE — if a duplicate already exists, the insert silently does nothing and returns false (indicating the finding was not new). This is important for passive detection, which runs on every flow and would otherwise create thousands of duplicate findings.
Finding Validation
Section titled “Finding Validation”Before storing a finding, Ghost validates:
| Field | Valid Values |
|---|---|
| severity | critical, high, medium, low, info |
| status | open, confirmed, false_positive, fixed |
| type | auth, injection, exposure, config, crypto, access, session |
| source | passive, active, ai |
| confidence | 0.0 to 1.0 (a decimal number representing percentage) |
Invalid values are rejected before the database write.
Migration System
Section titled “Migration System”Ghost’s schema evolves through numbered migrations. The current version is tracked using SQLite’s PRAGMA user_version — a built-in integer that SQLite stores in the database file header.
How migrations run:
- On startup, Ghost reads
PRAGMA user_versionto get the current schema version - It compares this to the total number of migrations (18)
- For each pending migration, it starts a database transaction, executes the migration SQL, sets
PRAGMA user_version = N, and commits - If any migration fails, the transaction is rolled back — the database stays at the previous version
Each migration runs in its own transaction, so a failure at migration 15 leaves the database at version 14 (with migrations 1-14 intact).
Migration highlights:
| Migration | What It Added |
|---|---|
| 1 | Core schema: sessions, flows (with 28 columns), flows_fts, settings, 6 indexes, 3 FTS triggers |
| 2 | Addon scripts (code storage, priority ordering) |
| 3 | AI agent conversations and messages (with tool call JSON) |
| 4 | Browser interactions (with screenshot blobs, correlation confidence) |
| 5 | Journeys and journey steps (with FK references to flows and interactions) |
| 6 | Screenshot baselines (for visual regression testing) |
| 7 | Timing breakdown column on flows (timings_json) |
| 8 | Map rules (URL mapping, redirects, rewrites with pattern matching) |
| 9 | WebSocket frames (per-flow, with sequence numbers) |
| 10 | Source identification (client_ip, source_app) |
| 11 | Device identification (device_type, device_id, device_name) |
| 12 | Artifacts (persisted bug reports, exports, test scenarios) |
| 13 | Security findings (with severity, CWE/OWASP, evidence) |
| 14 | Frida scripts (categorized, cross-platform) |
| 15 | Finding dedup key + UNIQUE index + backfill + cleanup of existing duplicates |
| 16 | Injection rules (URL pattern matching, script storage) |
| 17 | Session-scoped injection rules (table rebuild with session FK + CASCADE) |
| 18 | Extension events (console errors, navigations, storage changes) |
Monitoring
Section titled “Monitoring”GET /api/v1/monitor returns runtime and database statistics:
| Field | Source | Description |
|---|---|---|
| Database file size | PRAGMA page_count × PRAGMA page_size | Total size of the database file in bytes (computed from SQLite’s page count and page size, which is more accurate than checking the file size on disk because of WAL) |
| Total flow count | SELECT COUNT(*) FROM flows | Number of HTTP flows stored across all sessions |
| Memory usage | Go runtime (runtime.MemStats) | Ghost’s current memory consumption |
| Goroutine count | runtime.NumGoroutine() | Number of concurrent Go routines (useful for detecting leaks) |
| Uptime | Process start time | How long Ghost has been running |
Configuration Reference
Section titled “Configuration Reference”[store]# Maximum body size to store per flow (bytes).# Bodies larger than this are truncated for storage# but forwarded in full to the destination.# Default: 10485760 (10 MB)max_flow_body_size = 10485760
# Enable automatic deletion of old flows.# Default: falseauto_purge_enabled = false
# Delete flows older than this many hours.# Only applies when auto_purge_enabled = true.# 0 = disabled (no age-based purging).auto_purge_max_age_hours = 0
# Maximum number of flows to keep.# Only applies when auto_purge_enabled = true.# 0 = unlimited.auto_purge_max_flows = 0