Ghost in the Bit is a notebook and a workbench. Field notes on scraping, abuse, and rate-limit research — paired with a browser-based SQL console where you can practice the same investigations on realistic synthetic traffic, no install required.
Synthetic HTTP traffic with believable JA4 fingerprints, ASN distributions, partner integrations, and noise. Designed to mislead investigators who lean on a single signal.
SQLite via WebAssembly. No server, no account, no install. Load a scenario, write SQL, run it. All data stays on your device.
Each scenario ships with instructions, progressive hints, and sample queries — so you can think through the problem, get unstuck, and ground your answer in real SQL.
A web property sees unusual logged-out traffic. Determine whether scraping is occurring, who is responsible, and what they're targeting. Stacks cookie state, JA4, and ASN reputation signals.
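A minimal sketch of what stacking those three signals can look like, run here through Python's built-in sqlite3 module rather than the browser console. The table names, column names, thresholds, and rows are all assumptions made up for illustration and are not the scenario's actual schema.

```python
import sqlite3

# Hypothetical schema and rows; the real scenario tables may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requests (
    ja4 TEXT, asn INTEGER, has_session_cookie INTEGER, path TEXT
);
CREATE TABLE asn_reputation (asn INTEGER PRIMARY KEY, is_hosting INTEGER);

-- 64500 and 64501 are documentation-reserved AS numbers.
INSERT INTO asn_reputation VALUES (64500, 1), (64501, 0);
INSERT INTO requests VALUES
    ('ja4_a', 64500, 0, '/profile/1'),
    ('ja4_a', 64500, 0, '/profile/2'),
    ('ja4_a', 64500, 0, '/profile/3'),
    ('ja4_b', 64501, 1, '/home'),
    ('ja4_b', 64501, 0, '/login'),
    ('ja4_c', 64500, 1, '/home');
""")

# Stack the signals: hosting-network ASN, then per-fingerprint volume
# and the share of requests that arrived without a session cookie.
query = """
SELECT r.ja4,
       r.asn,
       COUNT(*)                      AS hits,
       COUNT(DISTINCT r.path)        AS distinct_paths,
       AVG(r.has_session_cookie = 0) AS logged_out_share
FROM requests AS r
JOIN asn_reputation AS a ON a.asn = r.asn
WHERE a.is_hosting = 1
GROUP BY r.ja4, r.asn
HAVING AVG(r.has_session_cookie = 0) >= 0.9
   AND COUNT(*) >= 3
ORDER BY hits DESC
"""
suspects = conn.execute(query).fetchall()
for row in suspects:
    print(row)
```

No single filter isolates the scraper in this toy data: the hosting ASN alone also matches a logged-in client, and logged-out traffic alone also matches an ordinary login flow. Only the combination leaves one fingerprint standing.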
A mobile API integration shows odd behavior. Identify the bad actor, attribute their infrastructure, and explain how they're enumerating partner-scoped graph endpoints across thousands of user accounts.
An employee may be exfiltrating data before leaving for a competitor. Five-table dataset. Build per-employee baselines, detect within-person deviation, and rule out two documented false-positive distractors (a role change and PTO catch-up).
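A rough sketch of the per-employee baseline idea, assuming a single hypothetical daily_activity table, an arbitrary cutoff date, and a z-score threshold of 3. None of these come from the scenario itself, and the sketch does not try to handle the role-change or PTO distractors.

```python
import math
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_activity (
    employee TEXT, day TEXT, files_downloaded INTEGER
);
INSERT INTO daily_activity VALUES
    ('alice', '2024-01-01', 100), ('alice', '2024-01-02', 120),
    ('alice', '2024-01-03', 80),  ('alice', '2024-01-04', 100),
    ('alice', '2024-01-08', 110),
    ('bob',   '2024-01-01', 10),  ('bob',   '2024-01-02', 12),
    ('bob',   '2024-01-03', 8),   ('bob',   '2024-01-04', 10),
    ('bob',   '2024-01-08', 60);
""")

# SQLite has no built-in standard deviation, so the baseline CTE
# derives population variance as E[x^2] - E[x]^2 per employee.
query = """
WITH baseline AS (
    SELECT employee,
           AVG(files_downloaded) AS mean,
           AVG(files_downloaded * files_downloaded)
             - AVG(files_downloaded) * AVG(files_downloaded) AS variance
    FROM daily_activity
    WHERE day < :cutoff
    GROUP BY employee
)
SELECT d.employee, d.day, d.files_downloaded, b.mean, b.variance
FROM daily_activity AS d
JOIN baseline AS b ON b.employee = d.employee
WHERE d.day >= :cutoff
ORDER BY d.employee, d.day
"""

flagged = []
for employee, day, count, mean, variance in conn.execute(
    query, {"cutoff": "2024-01-08"}
):
    # Compare each recent day against that person's own history.
    std = math.sqrt(variance) if variance > 0 else None
    z = (count - mean) / std if std else 0.0
    if z > 3:
        flagged.append(employee)

print(flagged)
```

In this toy data alice downloads far more files than bob in absolute terms, yet only bob deviates from his own baseline. A single global volume threshold would flag the wrong person.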
Notes from research, with identifying details changed or redacted for confidentiality.
A follow-up to the exposed vector database study. I loaded the scan data into BigQuery and measured what you can classify from scan data alone versus what needs an active probe. One signal is cheap to add. One gap cannot be closed by scanning harder.
The layer that logs what people actually say to their AI apps, the prompts and the responses, sitting on the open internet. I found 274 instances, and 71% had no authentication at all. Enumeration only; nothing was read.
A reproducible Censys-based measurement of exposed Qdrant and Weaviate vector databases. 73.5% of reachable Qdrant instances required no authentication, with medical, legal, and financial data sitting open. Enumerate-only methodology, no stored data accessed.
Why I built Phantom Feed, the gap it fills, and who it is for. The launch post.