Scenario 01 — Logged-Out Web Scrape
Scenario
Yesterday, the Phantom Feed security team noticed unusual patterns in
logged-out web traffic against the primary site
web.phantomfeed.io. To investigate, you have been given
a single day of HTTP logs (~170,000 rows) captured during the
window of interest.
Your goal
Determine whether automated scraping is occurring against
web.phantomfeed.io. If so, you should be able to answer:
- Which clients are responsible?
- What are they targeting?
- What signals make this defensibly automated activity (not human or partner)?
- What mitigation would you recommend that does not over-block legitimate users?
What you may assume
- Logs represent a mix of normal user traffic, a legitimate partner integration, and potentially automated traffic.
- Suspicious activity is not labeled. You will need to build the case.
- The dataset includes both logged-out traffic (userID = '0', authStatus = 'logged_out') and authenticated browsing (authStatus = 'logged_in'). The interesting activity for this scenario is in the logged-out portion.
- IPs are intentionally non-routable / synthetic for safety. Do not attempt to contact them.
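A reasonable first step is to check how the day splits between logged-out and authenticated rows, then restrict to the logged-out slice. The sketch below is a minimal illustration that uses Python's sqlite3 module with a three-row in-memory stand-in for the table, so it runs outside the console. Only the table name and the userID and authStatus columns come from the scenario text; the sample rows are invented for illustration. The two SQL strings can be pasted into the console as-is.

```python
import sqlite3

# Toy stand-in for the console's table. Only userID and authStatus are
# column names taken from the scenario text; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phantomfeed_logs_logged_out (userID TEXT, authStatus TEXT)"
)
conn.executemany(
    "INSERT INTO phantomfeed_logs_logged_out VALUES (?, ?)",
    [("0", "logged_out"), ("0", "logged_out"), ("4821", "logged_in")],
)

# Count rows per authentication state to see how the traffic splits.
SPLIT_SQL = """
SELECT authStatus, COUNT(*) AS requests
FROM phantomfeed_logs_logged_out
GROUP BY authStatus
ORDER BY requests DESC
"""
split = conn.execute(SPLIT_SQL).fetchall()

# Restrict to the logged-out slice that this scenario focuses on.
LOGGED_OUT_SQL = """
SELECT COUNT(*)
FROM phantomfeed_logs_logged_out
WHERE userID = '0' AND authStatus = 'logged_out'
"""
logged_out_rows = conn.execute(LOGGED_OUT_SQL).fetchone()[0]

print(split, logged_out_rows)
```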
The dataset
You will be querying a single SQLite table named
phantomfeed_logs_logged_out. The console will auto-load it when
you select this scenario. Click the Schema tab in the
sidebar to see all columns with hints, or click Sample queries
to load starter SQL.
Tips
- Combine signals. Any single signal (high volume per IP, identical JA4, etc.) will mislead you in this dataset.
- The partner integration in this data has high volume too. Make sure your detection does not flag it as a false positive.
- If you get stuck, the Hints tab in the sidebar provides progressive nudges from broad framing to specific signal stacking.
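The tips above on combining signals can be sketched as follows. This is an illustration under stated assumptions, not the scenario's answer: the column names ip, ja4, and path are hypothetical (check the Schema tab for the real ones), the rows are invented, the thresholds are arbitrary, and the idea that a partner integration concentrates on few paths while a scraper sweeps many is itself an assumption to verify against the data. It uses Python's sqlite3 module with an in-memory table so it runs outside the console.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns: ip, ja4, path. userID and authStatus are from the text.
conn.execute(
    "CREATE TABLE phantomfeed_logs_logged_out "
    "(ip TEXT, ja4 TEXT, path TEXT, userID TEXT, authStatus TEXT)"
)
rows = []
# Invented client that sweeps many distinct paths.
rows += [("10.0.0.5", "fp_a", f"/post/{i}", "0", "logged_out") for i in range(6)]
# Invented high-volume client that repeats one endpoint (partner-like).
rows += [("10.0.0.9", "fp_b", "/feed/export", "0", "logged_out")] * 6
# Invented low-volume human-like client.
rows += [("10.0.0.7", "fp_c", p, "0", "logged_out") for p in ("/", "/post/1")]
conn.executemany(
    "INSERT INTO phantomfeed_logs_logged_out VALUES (?, ?, ?, ?, ?)", rows
)

# Single signal: volume alone also catches the partner-like client.
VOLUME_ONLY_SQL = """
SELECT ip
FROM phantomfeed_logs_logged_out
WHERE authStatus = 'logged_out'
GROUP BY ip
HAVING COUNT(*) >= 5
ORDER BY ip
"""
volume_only = conn.execute(VOLUME_ONLY_SQL).fetchall()

# Stacked signals: volume AND a high share of distinct paths per request.
STACKED_SQL = """
SELECT ip, ja4, COUNT(*) AS requests, COUNT(DISTINCT path) AS distinct_paths
FROM phantomfeed_logs_logged_out
WHERE authStatus = 'logged_out'
GROUP BY ip, ja4
HAVING COUNT(*) >= 5
   AND COUNT(DISTINCT path) * 1.0 / COUNT(*) >= 0.8
ORDER BY requests DESC
"""
stacked = conn.execute(STACKED_SQL).fetchall()

print(volume_only)
print(stacked)
```

In this toy data the volume-only query returns both high-volume clients, while the stacked query keeps only the one that sweeps distinct paths. On the real table, expect to tune the thresholds and to add further signals before the case is defensible.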