Scenario 01 — Logged-Out Web Scrape

beginner | scraping | Estimated time: 45 min

Scenario

Yesterday, the Phantom Feed security team noticed unusual patterns in logged-out web traffic against the primary site web.phantomfeed.io. To investigate, you have been given a single day of HTTP logs (~170,000 rows) captured during the window of interest.

Your goal

Determine whether automated scraping is occurring against web.phantomfeed.io. If so, you should be able to answer:

Which clients are responsible?
What are they targeting?
What signals make this defensibly automated activity (not human or partner)?
What mitigation would you recommend that does not over-block legitimate users?

What you may assume

Logs represent a mix of normal user traffic, a legitimate partner integration, and potentially automated traffic.
Suspicious activity is not labeled. You will need to build the case.
The dataset includes both logged-out traffic (userID = '0', authStatus = 'logged_out') and authenticated browsing (authStatus = 'logged_in'). The interesting activity for this scenario is in the logged-out portion.
IPs are intentionally non-routable / synthetic for safety. Do not attempt to contact them.
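
As a first step, it can help to confirm how the rows split between the two authentication states and to narrow later queries to the logged-out portion. The sketch below is a minimal illustration, not the console's own starter SQL. It builds a tiny in-memory stand-in for the table using only the two columns named above (userID and authStatus); the three rows, including the logged-in userID value, are synthetic placeholders. In the console you would run only the SQL strings against the real table.

```python
import sqlite3

# Toy stand-in for the real table. Only userID and authStatus are named in the
# scenario text; the rows below are synthetic and exist purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phantomfeed_logs_logged_out (userID TEXT, authStatus TEXT)"
)
conn.executemany(
    "INSERT INTO phantomfeed_logs_logged_out VALUES (?, ?)",
    [("0", "logged_out"), ("0", "logged_out"), ("4821", "logged_in")],
)

# Count rows per authentication state to see how much traffic is in scope.
BREAKDOWN_SQL = """
SELECT authStatus, COUNT(*) AS requests
FROM phantomfeed_logs_logged_out
GROUP BY authStatus
ORDER BY requests DESC
"""
breakdown = conn.execute(BREAKDOWN_SQL).fetchall()

# Restrict later analysis to the logged-out portion only.
LOGGED_OUT_SQL = """
SELECT COUNT(*)
FROM phantomfeed_logs_logged_out
WHERE authStatus = 'logged_out' AND userID = '0'
"""
logged_out_rows = conn.execute(LOGGED_OUT_SQL).fetchone()[0]

print(breakdown)
print(logged_out_rows)
```

Keeping the WHERE clause from LOGGED_OUT_SQL in every later query avoids mixing authenticated browsing into per-client counts.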

The dataset

You will be querying a single SQLite table named phantomfeed_logs_logged_out. The console will auto-load it when you select this scenario. Click the Schema tab in the sidebar to see all columns with hints, or click Sample queries to load starter SQL.
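
If you prefer to inspect the table with SQL rather than the Schema tab, SQLite exposes column metadata through PRAGMA table_info. The sketch below assumes the console allows PRAGMA statements, which the scenario does not confirm. It runs against a toy in-memory table with only the two columns named in this brief, so the column list it prints is not the real schema.

```python
import sqlite3

# Toy stand-in table: the real schema has more columns than the two shown here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phantomfeed_logs_logged_out (userID TEXT, authStatus TEXT)"
)
conn.execute(
    "INSERT INTO phantomfeed_logs_logged_out VALUES ('0', 'logged_out')"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk). Keep the name and declared type.
columns = [
    (row[1], row[2])
    for row in conn.execute("PRAGMA table_info(phantomfeed_logs_logged_out)")
]

# Sanity-check the row count (the real table should hold roughly 170,000 rows).
row_count = conn.execute(
    "SELECT COUNT(*) FROM phantomfeed_logs_logged_out"
).fetchone()[0]

# Peek at a handful of rows before writing aggregate queries.
sample = conn.execute(
    "SELECT * FROM phantomfeed_logs_logged_out LIMIT 5"
).fetchall()

print(columns)
print(row_count)
print(sample)
```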

Tips

Combine signals. Any single signal (high volume per IP, an identical JA4 TLS fingerprint, etc.) will mislead you in this dataset.
The partner integration in this data also has high volume. Make sure your detection does not misclassify it as a scraper.
If you get stuck, the Hints tab in the sidebar provides progressive nudges from broad framing to specific signal stacking.
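
The idea of stacking signals can be sketched as a query that requires two conditions at once instead of ranking clients by volume alone. Everything specific here is an assumption made for illustration: the column names clientIP, ja4, and path are hypothetical (check the Schema tab for the real ones), the thresholds are arbitrary, and the rows are synthetic. The sketch does not identify the partner integration; working out which signals separate it from a scraper is left to your investigation.

```python
import sqlite3

# Toy table with hypothetical columns clientIP, ja4, and path. Only userID and
# authStatus come from the scenario text. All rows are synthetic.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phantomfeed_logs_logged_out ("
    "clientIP TEXT, ja4 TEXT, path TEXT, userID TEXT, authStatus TEXT)"
)
rows = []
for i in range(6):
    # Client A: many requests spread across many distinct paths.
    rows.append(("10.0.0.1", "fp_a", f"/profile/{i}", "0", "logged_out"))
    # Client B: the same request volume, but always the same path.
    rows.append(("10.0.0.2", "fp_b", "/feed", "0", "logged_out"))
# Client C: a single request.
rows.append(("10.0.0.3", "fp_c", "/home", "0", "logged_out"))
conn.executemany(
    "INSERT INTO phantomfeed_logs_logged_out VALUES (?, ?, ?, ?, ?)", rows
)

# Volume alone: flags both high-volume clients, A and B.
VOLUME_ONLY_SQL = """
SELECT clientIP, COUNT(*) AS requests
FROM phantomfeed_logs_logged_out
WHERE authStatus = 'logged_out'
GROUP BY clientIP
HAVING COUNT(*) >= :min_requests
ORDER BY clientIP
"""
volume_only = conn.execute(VOLUME_ONLY_SQL, {"min_requests": 5}).fetchall()

# Stacked: require high volume AND broad path coverage per (clientIP, ja4).
STACKED_SQL = """
SELECT clientIP, ja4,
       COUNT(*) AS requests,
       COUNT(DISTINCT path) AS distinct_paths
FROM phantomfeed_logs_logged_out
WHERE authStatus = 'logged_out'
GROUP BY clientIP, ja4
HAVING COUNT(*) >= :min_requests
   AND COUNT(DISTINCT path) >= :min_paths
ORDER BY requests DESC, clientIP
"""
candidates = conn.execute(
    STACKED_SQL, {"min_requests": 5, "min_paths": 5}
).fetchall()

print(volume_only)
print(candidates)
```

In this toy data the volume-only query returns two clients while the stacked query returns one, which is the narrowing effect the first tip describes. The real dataset will likely need different or additional signals and thresholds.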
