Supply Chain Attack · Data Breach · Biometric Data · AI Hiring · Credential Theft · Post-Mortem

4 Terabytes. 40 Minutes. How a Poisoned Python Package Gutted a $10B AI Startup — and Your Resume Might Be in the Leak

On March 30, 2026, Lapsus$ posted samples of passport scans, video interviews, and Slack messages stolen from Mercor — the AI hiring platform valued at $10 billion, trusted by OpenAI, Anthropic, and hundreds of enterprises to screen 300,000+ candidates. The breach was not a direct attack. It was a cascading supply chain compromise that started with a poisoned vulnerability scanner, passed through a poisoned Python package that sat on PyPI for roughly 40 minutes (long enough for Mercor's own CI/CD pipeline to install it), and ended with full VPN access to Mercor's internal network. If you have ever applied for a job through Mercor, your face, your voice, your government-issued ID, and your resume may now be on auction to the highest bidder. And Mercor is "one of thousands" of companies affected. Thousands.

April 4, 2026 · 24 min read · PhantomCorgi Security Research
TL;DR
  • Three-stage cascading supply chain attack: Trivy (vulnerability scanner) → LiteLLM (PyPI package) → Mercor (production environment)
  • Malicious LiteLLM versions 1.82.7 and 1.82.8 were live on PyPI for ~40 minutes. Automated CI/CD pipelines pulled them instantly.
  • Lapsus$ claims ~4 TB of data: 939 GB source code, 211 GB user database, ~3 TB video interviews with face/voice biometrics, passport scans, KYC documents
  • Lapsus$ gained full Tailscale VPN access to Mercor's internal network
  • Unconfirmed report: a developer may have exposed production credentials through an AI coding assistant
  • Mandiant estimates the broader TeamPCP campaign has hit 500,000 machines and 1,000+ SaaS environments
  • Class-action lawsuit investigation is underway. This is the fourth major AI platform breach in four months.
A note on sources

This analysis draws from TechCrunch, The Register, SecurityWeek, Wiz, ReversingLabs, Datadog Security Labs, Cybernews, The Record, and Chinese-language coverage on LINUX DO (linux.do). Lapsus$ claims are unverified by independent forensics — the 4 TB figure and Tailscale VPN access come from Lapsus$'s own Telegram channel and leak site. Mercor has confirmed a "security incident" tied to the LiteLLM compromise but has declined to confirm or deny whether customer/contractor data was accessed or exfiltrated. We distinguish between verified technical details and unverified threat actor claims throughout. This analysis cross-references twelve sources across English and Chinese.

What Is Mercor, and Why Is This Breach Catastrophic?

Mercor is an AI-powered hiring platform founded in 2023 by Brendan Foody, Adarsh Hiremath, and Surya Midha. It uses AI to screen resumes, conduct video interviews, and match candidates with employers. Its client list reads like a who's-who of AI: OpenAI, Anthropic, and hundreds of enterprises that rely on Mercor to contract specialized domain experts — scientists, doctors, lawyers — for AI model training. The company facilitates over $2 million in daily payouts, has screened 300,000+ resumes, conducted 100,000+ video interviews, and was valued at $10 billion after a $350 million Series C led by Felicis Ventures in October 2025.

That is the business story. Here is the security story: Mercor possesses what might be the most dangerous combination of personal data in the AI industry. Not credit card numbers. Not email addresses. Your face. Your voice. Your passport. Your government-issued ID. Video recordings of you answering interview questions. KYC verification documents. Employment history. Skills assessments. Salary expectations. References. And all of it organized, indexed, and searchable by an AI system designed to make retrieval as fast and complete as possible.

You cannot reset your face. You cannot rotate your voice. You cannot issue a new biometric identity like you issue a new credit card. If your biometric data leaks, it leaks forever. There is no remediation. There is no "password reset" for your face. And Mercor just had approximately 3 terabytes of video interviews — containing face and voice biometrics of hundreds of thousands of people — claimed as stolen by one of the most prolific extortion groups on the planet.

The Attack Chain: Three Dominoes, Four Days, Four Terabytes

The Mercor breach was not a direct attack. Nobody phished a Mercor employee. Nobody found a zero-day in Mercor's application. The attackers never had to touch Mercor at all — until Mercor's own build pipeline invited them in. This is a three-stage cascading supply chain attack, and understanding how each domino fell is critical to understanding why your organization is almost certainly vulnerable to the same pattern.

Stage 1 — Poisoning the Vulnerability Scanner (Late February – March 19)

TeamPCP — the threat group behind the campaign — first compromised Aqua Security's Trivy, one of the most trusted open-source vulnerability scanners in the industry. They exploited a pull_request_target workflow vulnerability in Trivy's GitHub Actions configuration to exfiltrate the aqua-bot service account credentials and rewrite Git tags. They simultaneously compromised Checkmarx's KICS static analysis tool using the same pattern.
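The pattern is worth seeing in miniature. The workflow below is a hypothetical reconstruction of the vulnerable shape, not Trivy's actual configuration: pull_request_target runs in an elevated context where repository secrets are available, yet the job checks out and executes code from the untrusted fork's PR head.

```yaml
# Hypothetical vulnerable workflow (illustration of the pattern, not the real file)
on: pull_request_target        # elevated context: repo secrets are available
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Checks out the *attacker-controlled* PR head into the
          # privileged job -- any build script it runs can read secrets.
          ref: ${{ github.event.pull_request.head.sha }}
      - run: make lint         # executes untrusted code with secrets in scope
```

The safe variants are the plain pull_request trigger (no secrets for forks), or pull_request_target that never checks out or executes the PR head's code.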

Let that sink in: two of the tools you trust to find vulnerabilities in your code were themselves compromised. The fox was not in the henhouse. The fox was the henhouse.

Stage 2 — Weaponizing LiteLLM via PyPI (March 24)

Using an API token exposed via the Trivy compromise, TeamPCP obtained the PyPI publishing credentials for LiteLLM — linked to the GitHub account of LiteLLM's co-founder/CEO. They published two malicious versions to PyPI:

Malicious LiteLLM versions

| Version | Payload mechanism | Trigger |
|---------|-------------------|---------|
| 1.82.7 | Double-base64-encoded payload dropped to disk as p.py | Executes when litellm --proxy is run |
| 1.82.8 | Abuses Python's .pth file mechanism (litellm_init.pth) | Runs on every Python interpreter startup — no import required |

Version 1.82.8 is the nightmare scenario. The .pth file executes arbitrary code every time Python starts, regardless of whether LiteLLM is imported. Install it once, and every Python process on the machine becomes a credential harvester. The payload targeted:

  • SSH keys
  • AWS, GCP, Azure cloud credentials
  • Kubernetes configs
  • CI/CD secrets
  • Docker configs
  • Database credentials
  • Cryptocurrency wallets
  • .env files
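The .pth trick is easy to demonstrate safely. The snippet below is a benign, hypothetical illustration of the mechanism, not the actual payload: CPython's site.py exec()s any line in a .pth file that begins with "import ", which is why installation alone is enough to get code execution.

```python
import os
import site
import tempfile

# Benign demo: a .pth file whose line starts with "import " is exec()'d
# by site.py when the directory is processed -- the same thing that
# happens to site-packages at every interpreter startup.
d = tempfile.mkdtemp()
with open(os.path.join(d, "demo_init.pth"), "w") as f:
    f.write("import os; os.environ['PTH_DEMO'] = 'executed'\n")

site.addsitedir(d)  # triggers the same .pth processing as startup
print(os.environ.get("PTH_DEMO"))  # -> executed
```

Drop such a file into site-packages, as LiteLLM 1.82.8 did, and the "payload" line runs in every Python process on the machine with no import required.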

Everything was encrypted with AES-256, the AES key itself encrypted with an embedded RSA public key, and exfiltrated to attacker-controlled domains: checkmarx[.]zone and models[.]litellm[.]cloud. The malicious packages were live on PyPI for approximately 40 minutes before being quarantined. Forty minutes. That is all it takes when your CI/CD pipeline auto-resolves the latest version.

Stage 3 — Mercor Falls, Lapsus$ Moves In (March 24–30)

Mercor, as one of thousands of LiteLLM users, pulled the poisoned package into their environment through automated dependency resolution. The credential harvester exfiltrated their secrets. Lapsus$ — an extortion group collaborating with TeamPCP alongside ransomware gangs CipherForce and Vect — used the harvested credentials to gain full access to Mercor's Tailscale VPN environment.

From inside the VPN, everything was reachable. The source code. The databases. The storage buckets containing years of video interviews. The Slack workspace. The ticketing system. Everything.

The cascading trust chain
1. Trivy GitHub Actions → pull_request_target exploit → aqua-bot credentials stolen
2. Stolen creds → LiteLLM PyPI publishing token exfiltrated
3. Two malicious LiteLLM versions published to PyPI → live for 40 minutes
4. Mercor CI/CD auto-resolves latest LiteLLM → credential harvester executes
5. Mercor secrets exfiltrated → Tailscale VPN credentials among them
6. Lapsus$ enters Mercor VPN → 4 TB exfiltrated → passport scans, video interviews, source code

What Was Stolen: The Data That Cannot Be Unbreached

Lapsus$ claimed data (unverified by independent forensics)

| Category | Size | Contents |
|----------|------|----------|
| Source code | 939 GB | Full platform source code |
| User database | 211 GB | Resumes, candidate profiles, employer data, user credentials |
| Storage buckets | ~3 TB | Video interviews (face + voice biometrics), passport scans, KYC identity verification |
| Internal systems | — | Slack communications, ticketing data, Tailscale VPN configs, keys and secrets |

Stop and think about what 3 terabytes of video interviews means. These are not text records. These are recordings of real people — their faces, their voices, their mannerisms — answering questions about their skills, their career goals, their salary expectations. Many of these people are scientists, doctors, and lawyers who contracted through Mercor to train AI models for OpenAI and Anthropic. Their biometric data is now in the hands of an extortion group that openly auctions stolen data to the highest bidder.

Credit cards expire. Passwords can be changed. Biometric data is permanent. Every person whose video interview was in those storage buckets now has their face and voice permanently available for deepfake generation, identity fraud, and social engineering attacks. Forever. There is no expiry date on a face.

The Ghost in the Machine: Did an AI Coding Assistant Leak the Credentials?

One detail in the reporting deserves special attention. An unconfirmed report suggests that "a developer may have exposed production credentials through an AI coding assistant linked to Anthropic."

This is unverified. But the attack pattern is real and documented: developers using AI coding assistants (Copilot, Cursor, Claude Code, ChatGPT) routinely paste terminal output, error messages, and configuration files into prompts. Those prompts traverse third-party APIs. If a developer pasted a stack trace containing a database connection string, or an error log containing a Tailscale auth key, that credential just left the building.

Even without the supply chain attack, this single vector — developers leaking secrets through AI assistant context windows — is a ticking time bomb across the entire industry. How many of your developers have pasted .env files into an AI prompt? How would you even know?
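One pragmatic mitigation is a pre-flight scrubber in whatever tooling sits between developers and the assistant. A minimal sketch with illustrative, deliberately incomplete patterns (a real scrubber needs a much larger set, plus entropy-based detection for opaque tokens):

```python
import re

# Illustrative patterns only -- not an exhaustive secret taxonomy.
PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED_AWS_KEY]"),       # AWS access key ID
    (re.compile(r"tskey-[A-Za-z0-9-]+"), "[REDACTED_TAILSCALE]"),  # Tailscale auth key
    (re.compile(r"postgres://\S+"), "[REDACTED_DB_URL]"),          # connection string
]

def scrub(prompt: str) -> str:
    # Redact anything matching a known secret shape before it leaves the machine.
    for pattern, replacement in PATTERNS:
        prompt = pattern.sub(replacement, prompt)
    return prompt

print(scrub("psql: could not connect: postgres://svc:hunter2@db:5432/prod"))
# -> psql: could not connect: [REDACTED_DB_URL]
```

The same filter belongs in CI logs and error reporters, since those are the exact artifacts developers paste into prompts.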

The Broader Blast Radius: You Are Probably Already Compromised

Mercor is not special. Mercor is "one of thousands" of companies affected by the LiteLLM supply chain attack. Their own spokesperson said so. Mandiant's threat hunters estimate the TeamPCP campaign has exfiltrated data from 500,000 machines. Mandiant Consulting reports 1,000+ impacted SaaS environments — and their CTO predicted expansion to potentially 10,000 victims.

The question is not whether your organization installed the compromised LiteLLM versions. The question is whether you can prove you didn't. Can you right now, at this moment, tell me the exact version of every transitive Python dependency in your production environment? Can you confirm that no CI/CD pipeline, no developer laptop, no staging environment pulled LiteLLM 1.82.7 or 1.82.8 during that 40-minute window? If you cannot answer those questions with certainty, you do not know whether you are compromised.
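Answering that question for one known-bad package takes only a few lines against a running environment. A sketch using the standard library, where the version strings are the two compromised releases named above (run it in every virtualenv, container image, and CI runner, not just one):

```python
from importlib import metadata

# The two compromised releases reported in this post.
COMPROMISED = {"litellm": {"1.82.7", "1.82.8"}}

def find_compromised():
    # Walk every distribution visible to this interpreter and flag
    # any (name, version) pair on the known-bad list.
    hits = []
    for dist in metadata.distributions():
        name = (dist.metadata["Name"] or "").lower()
        if dist.version in COMPROMISED.get(name, set()):
            hits.append((name, dist.version))
    return hits

print(find_compromised())  # an empty list means this environment is clean
```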

TeamPCP campaign cascade — each compromise enabled the next

| Date | Target | Vector | Downstream impact |
|------|--------|--------|-------------------|
| Late Feb | Aqua Trivy + Checkmarx KICS | pull_request_target | Every CI/CD pipeline trusting these scanners |
| Mar 24 | LiteLLM (PyPI) | Stolen publishing credentials | 95M monthly downloads, 36% of cloud environments |
| Mar 25 | Telnyx | Same campaign pattern | Telecom infrastructure |
| Mar 30 | Mercor (via LiteLLM) | Auto-resolved poisoned dependency | 4 TB data, 300K+ candidate biometrics |
| Mar 31 | npm axios | Cross-ecosystem spread | JavaScript ecosystem now under attack |

The gap between attacks is shrinking. The ecosystem spread is widening. Python on March 24. JavaScript by March 31. What language tomorrow? What package? What scanner? What hiring platform? What customer database?

The Chinese Perspective: 给OpenAI和Anthropic训练模型的公司被黑了 ("The Company Training Models for OpenAI and Anthropic Got Hacked")

The Chinese-language discussion on LINUX DO (linux.do) cut straight to the geopolitical nerve that English coverage danced around. The thread title translates to: "The company training models for OpenAI and Anthropic got hacked: Mercor confirms attack, Lapsus$ claims 4 TB data theft."

The Chinese security community immediately flagged the training data supply chain implications. Mercor contracts domain experts — scientists, doctors, lawyers — to provide training data for frontier AI models. If Lapsus$ exfiltrated the training interaction data alongside the biometrics, the compromised data potentially includes proprietary model training methodologies, expert annotations, and RLHF preference data that companies like OpenAI and Anthropic paid for. The biometric data is the headline. The training data contamination vector is arguably the more dangerous long-term consequence.

The ti.dbappsecurity.com.cn threat intelligence platform (安全星图平台) referenced the incident in a broader 2026 cybersecurity threat predictions analysis, categorizing it as evidence of an accelerating pattern: AI companies are building the most sensitive data honeypots in history while running security practices designed for the pre-AI era.

The Root Causes: Five Failures That Destroyed a $10 Billion Company's Security

1. Unpinned dependencies in CI/CD — the original sin

LiteLLM's CI/CD pipeline pulled Trivy from apt without pinning to a specific version or verifying a checksum. Mercor's pipeline auto-resolved the latest LiteLLM version without pinning. This is the single root cause that enabled the entire cascade. Pin to a SHA hash instead of a version string, and the compromised packages never enter your environment. It is that simple. And almost nobody does it.
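In pip terms, hash pinning means installing with --require-hashes from a lockfile that records artifact digests. A sketch of the lockfile shape (the digests below are placeholders; generate real ones with a tool such as pip-compile --generate-hashes):

```text
# requirements.txt -- enforce with: pip install --require-hashes -r requirements.txt
# Digests are placeholders; record the real sha256 of each wheel/sdist you vetted.
litellm==1.82.6 \
    --hash=sha256:<wheel-digest> \
    --hash=sha256:<sdist-digest>
```

With --require-hashes, pip rejects any artifact whose digest does not match the lockfile, so a substituted or republished package fails the install instead of entering the environment.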

2. No credential isolation — VPN keys in the blast radius

When the credential harvester executed, it found Tailscale VPN credentials in the environment. This means VPN authentication secrets were accessible to processes running in the CI/CD or development environment. VPN credentials should live in a hardware security module or isolated secrets manager — never in environment variables or config files that a compromised dependency can read.
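The reason environment variables sit inside the blast radius is structural: every imported module, including a poisoned transitive dependency, runs in-process and can read them directly. A benign demonstration (the secret value is fabricated for the demo):

```python
import os

# Fabricated secret placed in the environment, as many deployments do.
os.environ["TAILSCALE_AUTHKEY"] = "tskey-demo-0000"

def what_a_poisoned_dependency_sees():
    # Any imported module can do this with zero privilege escalation --
    # exactly what the LiteLLM harvester did at interpreter startup.
    return {k for k in os.environ
            if any(marker in k for marker in ("KEY", "TOKEN", "SECRET"))}

print(sorted(what_a_poisoned_dependency_sees()))
```

A secrets manager with per-service grants changes the equation: the build job can request only the secrets it was explicitly granted, and every read is logged and revocable.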

3. Flat network behind the VPN — no segmentation

Once inside Mercor's Tailscale VPN, Lapsus$ accessed everything: source code, databases, storage buckets, Slack. This indicates minimal or no network segmentation behind the VPN boundary. The VPN was the castle wall, and once breached, the entire kingdom fell. Zero-trust architecture — where every service verifies every request independently, regardless of network position — would have limited the blast radius to whatever the compromised credential had explicit access to.

4. Biometric data stored without additional encryption layers

Three terabytes of video interviews containing face and voice biometrics were apparently accessible from storage buckets reachable via the VPN. Biometric data — the most sensitive, most irrevocable category of personal information — should be encrypted at rest with keys managed separately from the application environment. If the storage bucket encryption keys were in the same credential scope as the VPN access, the entire data protection model collapses on a single compromised secret.

5. Bearer token / API key authentication — no request-level binding

The stolen credentials granted access because they were bearer-style secrets — whoever holds the token gets the access, regardless of where, when, or how the request originates. There was no mechanism to verify that a request using a legitimate credential was actually being made by the legitimate service, for a legitimate purpose, with an unmodified payload. A stolen bearer token is a skeleton key. It works from anywhere, for anything, until someone notices it is missing.

This is the fundamental weakness that every major authentication scheme shares — and the one that modern per-request, context-bound authentication protocols are designed to eliminate. If every API request required cryptographic proof that the caller is who they claim to be, calling the endpoint they intend, with the body they constructed, within a narrow time window — a stolen credential from a CI/CD exfiltration would be worthless. The credential would be bound to a request that already completed. It could not be replayed. It could not be redirected. It could not be used for a different endpoint or a different payload.
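A minimal sketch of the idea, using an HMAC over the request context. This is illustrative, assuming a pre-shared signing key provisioned out of band; production schemes add nonce tracking and key rotation:

```python
import hashlib
import hmac
import time

SECRET = b"shared-signing-key"  # assumption: provisioned out of band, per service pair

def sign(method: str, path: str, body: bytes, ts: int) -> str:
    # The signature commits to method, path, a body digest, and a timestamp,
    # so it cannot be reused for any other request.
    msg = f"{method}\n{path}\n{hashlib.sha256(body).hexdigest()}\n{ts}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def verify(method: str, path: str, body: bytes, ts: int, sig: str,
           max_skew: int = 30) -> bool:
    if abs(time.time() - ts) > max_skew:   # narrow validity window
        return False
    return hmac.compare_digest(sign(method, path, body, ts), sig)

ts = int(time.time())
sig = sign("POST", "/v1/payouts", b'{"amount": 100}', ts)
print(verify("POST", "/v1/payouts", b'{"amount": 100}', ts, sig))  # True
print(verify("POST", "/v1/admin", b'{"amount": 100}', ts, sig))    # False: different endpoint
print(verify("POST", "/v1/payouts", b'{"amount": 999}', ts, sig))  # False: tampered body
```

Note what a stolen signature is worth here: one already-completed request. It cannot be redirected to another endpoint, attached to another payload, or replayed outside the time window.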

What We Got Wrong (Red-Teaming Our Own Narrative)

Contradiction resolution

| Claim | Reality check |
|-------|---------------|
| "4 TB of data stolen" | This figure comes from Lapsus$'s own claims. No independent forensics have verified the volume or contents. Mercor has not confirmed or denied specific data access. Threat actors routinely inflate impact claims. |
| "Full Tailscale VPN access" | Also a Lapsus$ claim. The evidence is Lapsus$-provided samples (Slack data, two video recordings). These could demonstrate limited access rather than full network compromise. Without Mercor's forensic report, the actual scope is unknown. |
| "AI coding assistant leaked credentials" | A single unconfirmed report. The LiteLLM supply chain attack is sufficient to explain the credential theft without invoking an additional vector. This may be misinformation or conflation. |
| "Mercor is one of thousands" | Mercor's own statement. The Mandiant estimates (500K machines, 1,000+ SaaS environments) are from private threat intelligence briefings, not published reports. The actual downstream count is uncertain. |

A Pattern That Should End the "Move Fast and Break Things" Era

Four months. Four major AI platform breaches. Each one exploiting a different entry point, each one arriving at the same conclusion: the AI industry is building the most valuable data repositories in human history while protecting them with security practices from 2015.

2026 AI platform breach pattern

| Incident | Entry point | Root cause | Data at risk |
|----------|-------------|------------|--------------|
| McKinsey Lilli | Unauthenticated API endpoints | SQL injection + IDOR + exposed API docs | 46.5M chat messages, 728K files, 95 system prompts |
| LiteLLM | Compromised vulnerability scanner | Unpinned CI/CD deps + stolen PyPI creds | SSH keys, cloud creds, K8s tokens across 500K machines |
| Google Gemini | Calendar invite with prompt injection | No input sanitization + over-privileged agent | Email forwarding, smart home control |
| Mercor | Poisoned transitive dependency | Supply chain + flat network + bearer auth | Biometric data of 300K+ people |

Notice the escalation. Chat messages. Cloud credentials. Smart home controls. Biometric identity. Each breach exposes data that is harder to remediate than the last. We are moving up the hierarchy of irreversible damage. The next breach will expose something worse. It always does.

What To Do Right Now. Not Next Sprint. Now.

01
Check if you are already compromised
Search every machine and CI/CD environment for litellm_init.pth, tpcp.tar.gz, /tmp/pglog, /tmp/.pg_state, ~/.config/sysmon/sysmon.py. If you find ANY of these: stop reading, rotate every credential, revoke every VPN token, invalidate every API key. Do it now. Not after the standup.
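That search can be scripted directly. A sketch that sweeps for the indicators listed above (the start_dirs default is deliberately narrow for the demo; a real sweep scans the full filesystem on every machine and runner):

```python
import os

# Basenames and absolute paths are the indicators of compromise listed above.
IOC_BASENAMES = {"litellm_init.pth", "tpcp.tar.gz"}
IOC_PATHS = [
    "/tmp/pglog",
    "/tmp/.pg_state",
    os.path.expanduser("~/.config/sysmon/sysmon.py"),
]

def sweep(start_dirs=(".",)):
    # Check the fixed paths first, then walk the given roots for basenames.
    hits = [p for p in IOC_PATHS if os.path.exists(p)]
    for root in start_dirs:
        for dirpath, _dirs, files in os.walk(root):
            hits.extend(os.path.join(dirpath, f)
                        for f in files if f in IOC_BASENAMES)
    return hits

print(sweep())  # any hit at all means: stop, rotate, revoke
```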
02
Pin ALL dependencies to SHA hashes — versions are not enough
An unpinned pip install litellm would have pulled the compromised release the moment the pipeline ran, and a bare version pin still trusts whatever artifact the index serves. Pin to hashes: pip install litellm==1.82.6 --hash=sha256:... Pin GitHub Actions to commit SHAs, not tags. Tags can be rewritten (Trivy's were). SHAs cannot. This is the single change that would have prevented Mercor's compromise.
03
Isolate VPN and infrastructure credentials from build environments
If a compromised Python package in your CI/CD pipeline can read your Tailscale auth key, your VPN is one supply chain attack away from breach. Move all infrastructure credentials to HSMs or isolated secrets managers with explicit, per-service access grants. No environment variables. No config files on disk.
04
Implement zero-trust network segmentation — kill the flat network
A VPN is not a security boundary. It is a convenience layer. Every service behind your VPN should authenticate and authorize every request independently. If Lapsus$ gets your VPN credentials, they should see a wall of individually locked doors, not an open office.
05
Encrypt biometric and PII data with isolated key management
Video interviews, passport scans, and KYC documents should be encrypted at rest with keys managed in a separate security domain from the application. Breach of the application environment should not grant access to the encryption keys. This is table stakes for any company handling biometric data.
06
Deploy per-request authentication — bearer tokens are not enough
Every API call between your services should carry cryptographic proof that the request was made by the claimed caller, to the intended endpoint, with the intended payload, within a narrow time window. A stolen bearer token should be worthless. If your inter-service auth is "whoever has the API key gets access," you are running the same architecture that let Lapsus$ walk through Mercor's entire infrastructure.
07
Audit what your developers paste into AI coding assistants
If the unconfirmed AI assistant credential leak is real, it represents a new attack class: secrets exfiltrated through developer context windows. Establish policies and tooling to detect and prevent credentials in AI assistant prompts. Your developers are almost certainly pasting .env contents into ChatGPT right now. Are you sure they are not?

The Uncomfortable Truth

The Mercor breach is not a story about a startup that got unlucky. It is a preview of what happens when the AI industry's velocity meets the real world's threat landscape. Mercor did not have to be specifically targeted. They did not have to make any extraordinary mistake. They used a popular Python library. Their CI/CD pipeline auto-resolved the latest version. That is it. That is all it took to lose the biometric identity data of hundreds of thousands of people.

The cascading supply chain attack model — compromise a scanner, steal publishing credentials, poison a popular package, harvest downstream secrets, sell them to extortion groups — is now a proven, repeatable, scalable playbook. TeamPCP executed it across Python and JavaScript in the span of one week. The next group will be faster. The next target will have more sensitive data. The next breach will make Mercor look like a warm-up.

And here is the detail that should terrify every CISO in the AI industry: the malicious packages were live for forty minutes. Not forty days. Not forty hours. Forty minutes. Your incident response playbook assumes hours of detection time. Your vulnerability management SLA is measured in days. The attackers needed minutes. Your security model is optimized for a threat velocity that no longer exists.

The era of "we'll fix it in the next sprint" is over. The sprint is too slow. The attackers are not waiting for your standup.


How Code Corgi Catches Supply Chain Attacks Before They Land

PhantomCorgi Invisible Threat Detection

Dependency manifest diffing
Alerts on every dependency addition, removal, or version bump in PRs — flagging version jumps that don't correspond to tagged source commits.
.pth and startup hook detection
Scans Python packages for .pth files, setup.py post-install scripts, and other startup hooks — the exact vector used in the LiteLLM attack.
Base64 & obfuscated payload scanning
Detects double-encoded payloads, eval/exec chains, and obfuscation patterns hiding credential harvesters in package code.
CI/CD configuration review
Flags pull_request_target triggers, unpinned action versions, and secrets exposed to untrusted workflow contexts — the Trivy entry point.
Credential leak detection
Scans code, configs, and PR diffs for API keys, VPN tokens, database connection strings, and other secrets that should never reach a repository.
Unicode & homoglyph scanning
Catches invisible characters and lookalike substitutions designed to pass human code review while hiding malicious intent.
Install Code Corgi →

How API Phantom Eliminates the Bearer Token Problem

PhantomCorgi AI Platform Security Shield

Per-request cryptographic authentication
Every API call carries proof that the request was made by the claimed caller, to the intended endpoint, with the intended payload. Stolen credentials cannot be replayed or redirected.
Zero blast radius on credential theft
Authentication tokens are bound to a single request context. A credential exfiltrated from a CI/CD environment is mathematically invalid for any other request.
No PKI infrastructure required
No certificate authorities, no key directories, no public key discovery. Deploys as a lightweight middleware layer on top of existing TLS.
Continuous autonomous red-teaming
AI-powered probing runs 24/7 against your own services — finding flat network paths, over-privileged tokens, and missing auth boundaries before an attacker does.
Request integrity verification
Body tampering, method swapping, and path redirection are cryptographically detected. A request modified in transit fails verification instantly.
AI agent proof-of-intent
AI agents must declare their intended action and scope before receiving credentials. Scope escalation — an agent requesting write access when it only needs read — is blocked at the authentication layer, not just by policy.
Explore API Phantom →

Your CI/CD pipeline is forty minutes away from being the next Mercor. Find out before Lapsus$ does.

Code Corgi scans every pull request for supply chain attack patterns — obfuscated payloads, .pth backdoors, unpinned dependencies, credential leaks, and dangerous CI/CD configurations. API Phantom ensures that even if credentials are stolen, they cannot be used. Because the alternative is explaining to 300,000 people why their passport scans are on the dark web.