Skip to content

Storage Layer

Every HTTP request and response Ghost captures needs to be stored somewhere — so you can search it, filter it, export it, and analyze it later. Ghost uses SQLite as its database, which is an embedded database engine that stores everything in a single file on your computer. Unlike databases like PostgreSQL or MySQL that run as separate server processes, SQLite runs inside Ghost itself — no installation, no configuration, no network connections. Your entire traffic history lives in one file: ~/.ghost/ghost.db.

Ghost uses a pure Go implementation of SQLite (modernc.org/sqlite) that requires no C compiler or external libraries. This keeps Ghost as a single binary with zero dependencies.

When Ghost opens the database, it configures SQLite with specific settings designed for Ghost’s usage pattern: many goroutines reading data simultaneously (loading the traffic list, running searches, generating stats), but only one goroutine writing at a time (storing incoming flows from the proxy).

db.SetMaxOpenConns(1) // Only one connection to the database
db.SetMaxIdleConns(1) // Keep that connection alive when idle
db.SetConnMaxLifetime(0) // Never close the connection due to age

Why a single connection? SQLite doesn’t handle multiple simultaneous writers well — it would return “database is busy” errors. By limiting Go’s connection pool to exactly one connection, Ghost serializes all database operations (reads and writes go through the same connection, one at a time). This is simple and reliable: no lock contention, no BUSY errors, no retry logic needed.

Pragmas are SQLite configuration commands that tune the database engine’s behavior. Ghost sets these when the database is first opened:

PragmaValueWhat It Does
journal_mode=WALWrite-Ahead LoggingInstead of overwriting the database file directly, SQLite writes changes to a separate WAL file first. This is faster (sequential writes instead of random) and allows readers to work while a write is in progress — they simply read from the database file while the writer appends to the WAL file.
busy_timeout=1000010-second waitIf a query can’t execute because the database is locked (rare with MaxOpenConns(1), but possible during WAL checkpoints), SQLite retries for up to 10 seconds before returning an error.
synchronous=NORMALBalanced durabilityWith WAL mode, NORMAL means SQLite flushes data to disk only at checkpoint time — not after every single write. This is much faster while still being safe: you could lose the last few writes if the power cuts out, but the database itself won’t be corrupted.
foreign_keys=ONEnforce relationshipsWhen a flow belongs to a session, deleting the session automatically deletes its flows (CASCADE). This pragma makes SQLite actually enforce these relationships — by default, SQLite ignores them.
cache_size=-6400064 MB page cacheSQLite caches frequently-accessed database pages in memory. The negative value means “kibibytes” (not pages), so -64000 = ~64 MB. This keeps hot data in memory for fast reads — important when scrolling through thousands of flows.

The WAL file grows as Ghost captures traffic. Periodically, SQLite needs to transfer changes from the WAL file back into the main database file — this is called a checkpoint. Ghost handles this in two ways:

Periodic (every 30 minutes): A background goroutine runs PRAGMA wal_checkpoint(PASSIVE) every 30 minutes. PASSIVE means “checkpoint whatever you can without blocking — if any reads are in progress, skip those pages.” This keeps the WAL file from growing unbounded without interrupting Ghost’s operation. The goroutine is wrapped in safeGo (Ghost’s panic-recovery wrapper that catches panics and reports them to Sentry).

On shutdown: When Ghost closes the database, it runs PRAGMA wal_checkpoint(TRUNCATE), which flushes everything from the WAL into the main database file and then resets the WAL file to zero bytes. This ensures the database is in a clean state when Ghost next starts.

The database file is stored at ~/.ghost/ghost.db:

PlatformFull Path
macOS/Users/<name>/.ghost/ghost.db
WindowsC:\Users\<name>\.ghost\ghost.db
Linux/home/<name>/.ghost/ghost.db

The ~/.ghost/ directory is created with 0700 permissions (owner-only access) if it doesn’t exist. The path is resolved by config.GhostHome().

Ghost’s storage layer is organized as 15 sub-interfaces that are composed into a single Store interface. Each sub-interface groups related database operations. The concrete implementation (SQLiteStore) lives in a single file (sqlite.go, ~3,200 lines) that implements all 80 methods.

What this diagram shows — the core data relationships:

The diagram shows how Ghost’s main data tables relate to each other. Sessions are the top-level container — when Ghost starts capturing traffic, flows are stored inside the active session. Each session can have many flows (HTTP requests/responses), journeys (user interaction recordings), security findings (vulnerabilities detected), injection rules (JavaScript injected into pages), and extension events (browser activity logs). Flows can have WebSocket frames (for WebSocket connections) and interactions (correlated browser actions). Journeys contain journey steps (individual actions in a recording).

The arrows with ||--o{ mean “one-to-many” — one session has many flows, one flow has many WebSocket frames. The FK columns create these relationships, and ON DELETE CASCADE means deleting a session automatically deletes everything inside it.

InterfaceMethodsWhat It Manages
FlowStore11HTTP flow CRUD, GQL-filtered queries with FTS5, stats aggregation (per-host, per-app, per-device), flow comparison summaries
SessionStore5Capture session CRUD with computed flow counts
AddonStore5JavaScript addon scripts (CRUD, priority ordering)
AgentStore7AI conversations and messages (with tool call storage)
ExtensionStore4Browser interactions and flow-interaction correlation
JourneyStore8User journey recording (steps, completion with subquery counts)
ScreenshotStore4Visual regression baselines (upsert, latest-by-URL lookup)
MapRuleStore5URL mapping, redirect, and rewrite rules
InjectionRuleStore6Script injection rules (session-scoped, with clear-all)
WSFrameStore4WebSocket frame persistence per flow
MonitorStore2Total flow count and database file size
ArtifactStore4Bug reports, session exports, test scenario files
SecurityFindingsStore6Security findings with dedup, validation, stats by severity/type/status
FridaScriptStore5Frida instrumentation scripts (categorized, cross-platform)
ExtensionEventStore3Console errors, navigations, storage changes from browser extension

Total: 79 interface methods + Close() = 80 methods on the composed Store interface.

When you search for text in request/response bodies or headers (using the body: or header: operators in GQL), Ghost uses SQLite’s FTS5 (Full-Text Search version 5) engine — a specialized search index that can find words inside large text fields efficiently.

CREATE VIRTUAL TABLE flows_fts USING fts5(
id UNINDEXED,
req_url, req_host, req_path,
req_headers_text, req_body_text,
resp_headers_text, resp_body_text,
notes, tags_text,
content=flows,
tokenize='porter unicode61'
);

What each part means:

ComponentPurpose
VIRTUAL TABLE ... USING fts5Creates a full-text search index, not a regular table. SQLite builds an inverted index (word → list of rows containing that word) for fast text lookup.
id UNINDEXEDThe flow ID is stored in the index for joining back to the flows table, but isn’t searchable itself — you don’t need to search for a ULID.
req_url, req_host, req_pathThe request URL, host, and path are all searchable. Searching for “api” would find flows to api.example.com/v1/users.
req_headers_text, req_body_textRequest headers and body (converted to text) are searchable. Searching for “Bearer” finds flows with authentication tokens.
resp_headers_text, resp_body_textResponse headers and body are searchable. Searching for “error” finds error messages in API responses.
notes, tags_textUser-added notes and tags are also searchable.
content=flowsThis is a “content table” — the FTS index references the flows table rather than storing its own copy of the data. Saves disk space.
tokenize='porter unicode61'Porter stemming means searching for “running” also finds “run,” “runs,” and “ran.” Unicode61 handles international characters correctly (accents, CJK characters, etc.).

Three database triggers keep the FTS index in sync with the flows table:

  • flows_fts_insert — when a flow is inserted, its searchable text is added to the FTS index
  • flows_fts_delete — when a flow is deleted, its entry is removed from the FTS index
  • flows_fts_update — when a flow is updated, the old entry is removed and a new one is inserted

This means you never need to manually rebuild the search index — it stays current automatically.

Ghost’s schema is built across 18 migrations. Each migration adds new tables, columns, or indexes as features were developed.

TableMigrationColumnsDescription
sessions1id, name, description, created_atCapture session containers. Flow count is computed via LEFT JOIN at query time, not stored.
flows1, 7, 10, 1137 columns totalThe core table. Request method/URL/host/path/proto/headers/body/content_type/content_len, response status/text/proto/headers/body/content_type/content_len, timing (started_at, duration_ns, timings_json), metadata (tags JSON, notes, source, metadata JSON, error), interaction fields (selector, element_text, event_type, page_url, timestamp, screenshot), device fields (client_ip, source_app, device_type, device_id, device_name).
flows_fts1FTS5 virtual tableFull-text search index over 9 text columns with porter stemming.
settings1key, valueKey-value configuration store.
addons2id, name, code, enabled, priority, created_at, updated_atJavaScript addon scripts with priority ordering.
conversations3id, session_id FK, title, created_at, updated_atAI agent conversation containers. CASCADE delete with session.
messages3id, conversation_id FK, role, content, tool_calls, tool_call_id, created_atIndividual messages within conversations (user, assistant, tool roles). CASCADE delete with conversation.
interactions4id, session_id FK, flow_id FK, type, selector, element_text, element_type, page_url, page_title, data, screenshot, correlation_confidence, timestamp, created_atBrowser interactions captured by the extension. flow_id is SET NULL on flow deletion (interaction survives).
journeys5id, session_id FK, name, status, flow_count, interaction_count, started_at, completed_at, created_atUser journey recordings. Status defaults to “recording”. CASCADE delete with session.
journey_steps5id, journey_id FK, step_order, type, interaction_id FK, flow_id FK, data, timestampIndividual steps within a journey. interaction_id and flow_id use SET NULL. CASCADE delete with journey.
screenshot_baselines6id, session_id FK, journey_id FK, url, selector, screenshot, width, height, file_size, captured_at, created_atVisual regression baselines. journey_id uses SET NULL.
map_rules8id, type, enabled, priority, name, host_pattern, path_pattern, method_pattern, local_, remote_target, rewrite_, created_at, updated_atURL mapping, redirect, and rewrite rules with pattern matching.
ws_frames9id, flow_id FK, direction, opcode, payload, length, timestamp, seqWebSocket frames captured during proxy relay. CASCADE delete with flow.
artifacts12id, type, title, summary, format, device_id, device_name, session_id, content, size_bytes, metadata JSON, created_atPersisted output files (bug reports, exports, test scenarios).
security_findings13, 15id, session_id FK, flow_id FK, type, severity, confidence, title, description, evidence, remediation, cwe_id, owasp_id, status, source, metadata JSON, dedup_key, created_at, updated_atSecurity vulnerabilities detected by passive analysis or active scanning. Unique constraint on (session_id, dedup_key).
frida_scripts14id, name, description, code, category, platform, created_at, updated_atSaved Frida instrumentation scripts.
injection_rules16, 17id, session_id FK, url_pattern, script, position, enabled, name, created_at, updated_atJavaScript injection rules scoped to sessions. CASCADE delete with session.
extension_events18id, session_id FK, type, page_url, data JSON, timestampBrowser extension events (console errors, navigations, storage changes). CASCADE delete with session.

Ghost uses two different FK behaviors depending on the relationship:

BehaviorWhen UsedExample
CASCADEOwnership — child cannot exist without parentDeleting a session deletes all its flows, findings, conversations, events
SET NULLReference — child can exist independentlyDeleting a flow sets flow_id = NULL on its interactions (the interaction still exists, just no longer linked)

Three-level cascade example: Deleting a session → cascades to its flows → cascades to each flow’s WebSocket frames. One DELETE statement can remove thousands of rows across three tables.

Ghost creates ~37 indexes across all migrations for query performance. The most important ones:

IndexColumnsWhy It Exists
idx_flows_session_startedsession_id, started_at DESCPrimary query path — the traffic list loads flows for the active session, sorted newest first. This composite index makes that query fast even with hundreds of thousands of flows.
idx_flows_req_hostreq_hostHost filtering and domain navigator grouping
idx_flows_req_methodreq_methodMethod filtering (GET, POST, etc.)
idx_flows_resp_statusresp_status_codeStatus code filtering (200, 404, 500, etc.)
idx_flows_source_appsource_appApplication grouping (Chrome, Safari, mobile app)
idx_flows_device_type + idx_flows_device_iddevice_type, device_idDevice filtering in the traffic list
idx_ws_frames_flowflow_id, seq ASCLoading WebSocket frames in sequence order for a specific flow
idx_findings_severityseverityGrouping security findings by severity level
idx_findings_dedupsession_id, dedup_key (UNIQUE)Prevents duplicate security findings — same vulnerability on the same endpoint in the same session is stored only once
idx_ext_events_session_typesession_id, type, timestamp DESCLoading extension events filtered by type within a session

When you type a search query in Ghost’s traffic list (like host:api.example.com status:4xx), the GQL parser converts it into SQL WHERE clauses:

GQL OperatorSQL TranslationDetails
host:api.example.comreq_host LIKE '%api.example.com' (suffix)If the value contains a dot, it’s treated as a domain suffix match — host:example.com matches api.example.com and cdn.example.com. Without a dot, it’s a substring match.
-host:noise.comreq_host NOT LIKE '%noise.com'Negation with - prefix.
method:POSTreq_method = 'POST'Exact match. Multiple values (method:GET method:POST) become IN ('GET','POST').
status:404resp_status_code = 404Exact status code match.
status:4xxresp_status_code >= 400 AND resp_status_code < 500Range shorthand — 2xx, 3xx, 4xx, 5xx expand to 100-code ranges.
status:>=400resp_status_code >= 400Comparison operators: >, <, >=, <=.
path:/api/v1req_path LIKE '%/api/v1%'Substring match on the path.
tag:websockettags LIKE '%"websocket"%'Searches inside the JSON tags array.
content_type:jsonreq_content_type LIKE '%json%' OR resp_content_type LIKE '%json%'Multiple values are OR’d.
duration:>1000duration_ns > 1000000000Duration is in milliseconds in GQL, converted to nanoseconds for the database (multiplied by 1,000,000).
size:>10000COALESCE(LENGTH(resp_body),0) > 10000Response body size in bytes. Uses COALESCE because resp_body can be NULL.
has:websockettags LIKE '%"websocket"%'Existence checks. Also supports has:error (error IS NOT NULL), has:response (resp_status_code IS NOT NULL), has:interaction.
body:errorFTS5 JOINFull-text search across request and response bodies. Joins with flows_fts virtual table.
header:authorizationFTS5 JOINFull-text search across request and response headers.
Query TypeDefault LimitMaximum Limit
Flow queries10010,000
Host stats1,000
App stats500
Device stats500
Flow stats (host counts)100
Flow stats (content types)50
Comparison summaries50,000
Interactions50 (list), 100 (per-flow)500
Security findings1,000
Extension events100500
Artifacts100

All flow queries are ordered by started_at DESC (newest first). Comparison summaries are ordered started_at ASC (oldest first, for chronological comparison).

Sessions don’t store their flow count as a column. Instead, it’s computed at query time using a LEFT JOIN:

LEFT JOIN (
SELECT session_id, COUNT(*) as cnt
FROM flows
GROUP BY session_id
) fc ON fc.session_id = s.id

This means the count is always accurate — there’s no risk of the stored count getting out of sync with the actual number of flows.

Ghost can automatically delete old flows to prevent the database from growing indefinitely. The auto-purger runs as a background goroutine that checks every 5 minutes.

[store]
# Enable automatic purging of old flows
auto_purge_enabled = false
# Delete flows older than this many hours (0 = disabled)
auto_purge_max_age_hours = 0
# Keep at most this many flows (0 = unlimited)
auto_purge_max_flows = 0

Two strategies run independently (both can be active simultaneously):

By age: Deletes all flows older than auto_purge_max_age_hours. For example, setting this to 72 means flows older than 3 days are automatically removed.

By count: When the total flow count exceeds auto_purge_max_flows, the oldest flows are deleted to bring the count back to the limit. The deletion uses a subquery to find the oldest excess flows:

DELETE FROM flows WHERE id IN (
SELECT id FROM flows ORDER BY started_at ASC
LIMIT MAX(0, (SELECT COUNT(*) FROM flows) - ?)
)

This deletes the oldest flows first, keeping the most recent ones.

Security findings use a deduplication mechanism to prevent the same vulnerability from being stored multiple times. Each finding has a dedup_key (defaulting to type|title if not explicitly set), and a UNIQUE index on (session_id, dedup_key) prevents duplicates.

When CreateFinding is called, it uses INSERT OR IGNORE — if a duplicate already exists, the insert silently does nothing and returns false (indicating the finding was not new). This is important for passive detection, which runs on every flow and would otherwise create thousands of duplicate findings.

Before storing a finding, Ghost validates:

FieldValid Values
severitycritical, high, medium, low, info
statusopen, confirmed, false_positive, fixed
typeauth, injection, exposure, config, crypto, access, session
sourcepassive, active, ai
confidence0.0 to 1.0 (a decimal number representing percentage)

Invalid values are rejected before the database write.

Ghost’s schema evolves through numbered migrations. The current version is tracked using SQLite’s PRAGMA user_version — a built-in integer that SQLite stores in the database file header.

How migrations run:

  1. On startup, Ghost reads PRAGMA user_version to get the current schema version
  2. It compares this to the total number of migrations (18)
  3. For each pending migration, it starts a database transaction, executes the migration SQL, sets PRAGMA user_version = N, and commits
  4. If any migration fails, the transaction is rolled back — the database stays at the previous version

Each migration runs in its own transaction, so a failure at migration 15 leaves the database at version 14 (with migrations 1-14 intact).

Migration highlights:

MigrationWhat It Added
1Core schema: sessions, flows (with 28 columns), flows_fts, settings, 6 indexes, 3 FTS triggers
2Addon scripts (code storage, priority ordering)
3AI agent conversations and messages (with tool call JSON)
4Browser interactions (with screenshot blobs, correlation confidence)
5Journeys and journey steps (with FK references to flows and interactions)
6Screenshot baselines (for visual regression testing)
7Timing breakdown column on flows (timings_json)
8Map rules (URL mapping, redirects, rewrites with pattern matching)
9WebSocket frames (per-flow, with sequence numbers)
10Source identification (client_ip, source_app)
11Device identification (device_type, device_id, device_name)
12Artifacts (persisted bug reports, exports, test scenarios)
13Security findings (with severity, CWE/OWASP, evidence)
14Frida scripts (categorized, cross-platform)
15Finding dedup key + UNIQUE index + backfill + cleanup of existing duplicates
16Injection rules (URL pattern matching, script storage)
17Session-scoped injection rules (table rebuild with session FK + CASCADE)
18Extension events (console errors, navigations, storage changes)

GET /api/v1/monitor returns runtime and database statistics:

FieldSourceDescription
Database file sizePRAGMA page_count × PRAGMA page_sizeTotal size of the database file in bytes (computed from SQLite’s page count and page size, which is more accurate than checking the file size on disk because of WAL)
Total flow countSELECT COUNT(*) FROM flowsNumber of HTTP flows stored across all sessions
Memory usageGo runtime (runtime.MemStats)Ghost’s current memory consumption
Goroutine countruntime.NumGoroutine()Number of concurrent Go routines (useful for detecting leaks)
UptimeProcess start timeHow long Ghost has been running
[store]
# Maximum body size to store per flow (bytes).
# Bodies larger than this are truncated for storage
# but forwarded in full to the destination.
# Default: 10485760 (10 MB)
max_flow_body_size = 10485760
# Enable automatic deletion of old flows.
# Default: false
auto_purge_enabled = false
# Delete flows older than this many hours.
# Only applies when auto_purge_enabled = true.
# 0 = disabled (no age-based purging).
auto_purge_max_age_hours = 0
# Maximum number of flows to keep.
# Only applies when auto_purge_enabled = true.
# 0 = unlimited.
auto_purge_max_flows = 0