Scenario 03 — Insider Data Exfiltration
Scenario
An internal SOC tip came in this morning. Word from another team is that one of our data-science engineers is in the final interview round at a direct competitor. HR has not formally heard about a resignation yet, but they would like to know — before anyone walks out the door — whether there is evidence of unusual data access tied to that employee, or to anyone else who might be in the same position.
You have 60 days of authenticated traffic against internal admin and data-science endpoints, plus four related tables that catalog employees, endpoint definitions, sensitivity classifications, and sampled query metadata. This is a multi-table investigation. You will be joining.
Your goal
- Identify whether any employee's behavior in the last 15 days deviates substantially from their own first-45-day baseline.
- Quantify what was accessed, and ideally how much data left.
- Distinguish exfiltration from legitimate behavioral changes (role moves, PTO, project shifts).
- Be ready to present your findings to HR and Legal — they will need defensible signals, not vibes.
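The first goal is a comparison of each employee with themselves: their last 15 days against their own first 45. The sketch below is a crude first pass, not a prescribed method. It assumes each access_logs row has already been reduced to a hypothetical (emp_id, day_index) pair, where day_index runs from 0 to 59 across the 60-day window. Neither the field names nor the day indexing come from the actual schema.

```python
from collections import defaultdict

BASELINE_DAYS = 45  # days 0..44: each employee's own baseline
RECENT_DAYS = 15    # days 45..59: the window under investigation


def recent_vs_baseline(events):
    """events: iterable of (emp_id, day_index) pairs, day_index in 0..59.

    Returns {emp_id: ratio}, where ratio is the employee's mean daily request
    count in the recent window divided by their own mean daily count in the
    baseline window. No other employee's data is used.
    """
    baseline = defaultdict(int)
    recent = defaultdict(int)
    for emp_id, day in events:
        if 0 <= day < BASELINE_DAYS:
            baseline[emp_id] += 1
        elif BASELINE_DAYS <= day < BASELINE_DAYS + RECENT_DAYS:
            recent[emp_id] += 1

    ratios = {}
    for emp_id in set(baseline) | set(recent):
        base_rate = baseline[emp_id] / BASELINE_DAYS
        recent_rate = recent[emp_id] / RECENT_DAYS
        # No baseline activity at all yields infinity instead of a division
        # error. Read that as "no baseline", not as evidence of anything.
        ratios[emp_id] = recent_rate / base_rate if base_rate else float("inf")
    return ratios
```

An employee with a steady volume scores about 1.0. One whose daily volume quadrupled in the last 15 days scores about 4.0, however small their absolute numbers are. Request counts are only one feature. The same split applies equally to bytes, distinct endpoints, or off-hours activity.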
What you may assume
- Every employee in the dataset has legitimate access to the endpoints they hit. This is not an authorization problem. This is a behavioral problem.
- Two employees have documented status changes in the 60-day window (PTO, role change). Those changes appear in employees.status_notes.
- The suspect (if any) is one of the 50 employees. You don't know who in advance.
- IPs in 10.42.x.x are corporate office addresses. IPs in 10.99.x.x are VPN.
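The two documented address ranges correspond to the /16 networks 10.42.0.0/16 and 10.99.0.0/16, so labelling each request's source IP only needs Python's standard ipaddress module. This is a minimal sketch. The vpn_share helper, and the idea of comparing an employee's VPN share between their baseline and recent windows, are illustrative assumptions rather than part of the scenario. A shift toward VPN can just as easily reflect PTO or remote work.

```python
import ipaddress

OFFICE_NET = ipaddress.ip_network("10.42.0.0/16")  # corporate office addresses
VPN_NET = ipaddress.ip_network("10.99.0.0/16")     # VPN addresses


def ip_class(ip_str):
    """Label a source IP as 'office', 'vpn', or 'other'."""
    addr = ipaddress.ip_address(ip_str)  # raises ValueError on malformed input
    if addr in OFFICE_NET:
        return "office"
    if addr in VPN_NET:
        return "vpn"
    return "other"


def vpn_share(ips):
    """Fraction of the given source IPs that fall in the VPN range."""
    if not ips:
        return 0.0
    return sum(ip_class(ip) == "vpn" for ip in ips) / len(ips)
```

Any "other" addresses are left unlabelled on purpose. The scenario only describes the two ranges above.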
The dataset — five tables
- employees: 50 rows. emp_id, name, team, role, hired_date, status_notes.
- endpoints: 29 rows. Endpoint catalog with category, typical response size, allowed teams.
- data_classification: 29 rows. endpoint_path → sensitivity (public / internal / confidential / restricted).
- access_logs: ~220K rows. The primary investigation table.
- query_audit: ~5K rows. Sampled per-query metadata where queries or exports were run, including the target table list and rows_returned.
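A typical join attaches a sensitivity level to every request and totals what each employee pulled in the recent window. The sketch below uses an in-memory SQLite database from Python's standard library and a handful of made-up rows. The data_classification columns follow the description above. The access_logs columns (emp_id, day_index, endpoint_path, response_bytes) and the endpoint paths are hypothetical stand-ins for the real schema.

```python
import sqlite3

# Toy copies of two of the five tables; every row here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_classification (endpoint_path TEXT PRIMARY KEY, sensitivity TEXT);
CREATE TABLE access_logs (emp_id TEXT, day_index INTEGER,
                          endpoint_path TEXT, response_bytes INTEGER);
INSERT INTO data_classification VALUES
  ('/ds/export', 'restricted'), ('/ds/dashboard', 'internal');
INSERT INTO access_logs VALUES
  ('e07', 10, '/ds/dashboard', 2000),
  ('e07', 50, '/ds/export', 500000),
  ('e07', 52, '/ds/export', 750000),
  ('e12', 51, '/ds/dashboard', 1500);
""")

# Requests and bytes per employee and sensitivity level in the last 15 days
# (day_index 45..59). The LEFT JOIN keeps requests to unclassified paths
# visible as a NULL sensitivity instead of silently dropping them.
RECENT_BYTES_BY_SENSITIVITY = """
SELECT a.emp_id, c.sensitivity, COUNT(*) AS requests,
       SUM(a.response_bytes) AS total_bytes
FROM access_logs AS a
LEFT JOIN data_classification AS c ON c.endpoint_path = a.endpoint_path
WHERE a.day_index >= 45
GROUP BY a.emp_id, c.sensitivity
ORDER BY total_bytes DESC
"""
rows = conn.execute(RECENT_BYTES_BY_SENSITIVITY).fetchall()
for row in rows:
    print(row)
```

Response bytes measure only what the server sent, so at best they approximate how much data left. The sampled rows_returned field in query_audit is a separate signal for the same question. Running the same query over day_index < 45 gives each employee's baseline to compare against.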
Click the Schema tab in the sidebar to see all five tables with column-level hints.
Why this is harder than a scraping investigation
In a scraping investigation, the bad actor's traffic looks fundamentally different from normal traffic. In an insider investigation, the bad actor is normal traffic — until they aren't. Detection requires learning the per-employee normal first and then looking for deviation. Cross-employee comparisons will mislead you.
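A small made-up example illustrates why cross-employee comparison misleads. One employee is habitually heavy. The other is light, but their recent volume jumped sharply. A single global threshold flags the heavy user. Scoring each person against their own baseline flags the light one. The sketch uses a robust z-score (median and median absolute deviation, scaled by 1.4826). This is a common choice for outlier scoring, but the scenario does not prescribe it, and the numbers and threshold are invented.

```python
from statistics import median


def robust_z(baseline, value):
    """Robust z-score of value against one employee's own baseline days.

    Uses the median and the median absolute deviation (MAD). The 1.4826
    factor makes the MAD comparable to a standard deviation for normally
    distributed data.
    """
    med = median(baseline)
    mad = median(abs(x - med) for x in baseline)
    if mad == 0:
        # Perfectly flat baseline: no change scores 0, any change is extreme.
        return 0.0 if value == med else float("inf")
    return (value - med) / (1.4826 * mad)


# Made-up daily request counts for two employees.
heavy_baseline = [400, 420, 390, 410, 405]  # habitually heavy user
light_baseline = [20, 25, 18, 22, 21]       # habitually light user
recent = {"heavy": 415, "light": 70}        # one recent day for each

# Cross-employee view: one global threshold for everybody.
GLOBAL_THRESHOLD = 300
global_flags = sorted(e for e, v in recent.items() if v > GLOBAL_THRESHOLD)

# Per-employee view: each user scored against their own history.
scores = {
    "heavy": robust_z(heavy_baseline, recent["heavy"]),
    "light": robust_z(light_baseline, recent["light"]),
}
print(global_flags, scores)
```

Here the global threshold flags only the heavy user, whose recent day is ordinary for them (a score of roughly 1.3). The light user's day scores above 30 against their own history.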