BLOG by PhantomCorgi Team

Introducing Code Corgi — Invisible Threat Detection for Every Pull Request

Supply chain attacks are hiding in plain sight. Here's why we built Code Corgi, and how it catches what code review misses.

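The SolarWinds attack shipped malicious code through the vendor's own legitimate build infrastructure. The XZ Utils backdoor was planted by a contributor who had spent roughly two years earning maintainer trust, and its payload was hidden in binary test files in the compression library's repository. The Log4Shell ecosystem spawned dozens of typosquatting packages, some with homoglyph names indistinguishable from the real thing at a glance.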

These attacks share a common thread: they exploit the gap between what code looks like and what it does.

The Problem We’re Solving

Modern code review is a human process applied to an inhuman volume of changes. A single enterprise engineering org might review thousands of pull requests per week. Each one could contain:

  • A zero-width non-joiner character (U+200C) embedded inside an identifier
  • A Cyrillic а (U+0430) substituted for a Latin a in a package name
  • A base64-encoded payload tucked inside a config value
  • A dynamic require() call that resolves at runtime based on environment variables

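None of these stands out in a standard code review. The first two can render identically to legitimate code in a diff view, and the last two read as ordinary configuration and module loading. All of them have been used in real-world attacks.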
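To make the base64 case concrete, here is a minimal Python sketch of how an encoded payload in a config value might be surfaced. It is an illustration under our own assumptions, not Code Corgi's detection logic: the function name find_base64_candidates, the MIN_LENGTH threshold, and the sample config line are all hypothetical.

```python
import base64
import binascii
import re

# Hypothetical threshold: shorter runs produce too many false positives.
MIN_LENGTH = 24
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % MIN_LENGTH)


def find_base64_candidates(text: str) -> list[tuple[str, bytes]]:
    """Return (encoded, decoded) pairs for long runs that decode as base64."""
    hits = []
    for match in BASE64_RUN.finditer(text):
        encoded = match.group(0)
        if len(encoded) % 4 != 0:
            continue  # not a complete base64 string
        try:
            decoded = base64.b64decode(encoded, validate=True)
        except (binascii.Error, ValueError):
            continue
        hits.append((encoded, decoded))
    return hits


# Build a sample config line that hides a shell command in a "token" value.
hidden = base64.b64encode(b'import os; os.system("id")').decode("ascii")
config_line = f'callback_token = "{hidden}"'

for encoded, decoded in find_base64_candidates(config_line):
    print(f"{encoded[:16]}... decodes to {decoded!r}")
```

A check like this is deliberately noisy: any sufficiently long alphanumeric run whose length is a multiple of four will also decode, so a real scanner would need further filtering on what the decoded bytes contain.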

What Code Corgi Does

Code Corgi subscribes to pull request webhooks from your Git host, starting with GitHub, and analyzes every pull request through three detection layers:

Layer 1: Unicode normalization scans every changed file for non-ASCII code points in contexts where they shouldn’t appear: identifiers, string literals, and comments. Bidirectional overrides, zero-width characters, and lookalike Unicode ranges are all flagged with their exact position and code point.
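A scan of this kind can be sketched in a few lines of Python. The version below is an assumption-laden illustration rather than the production scanner: the Finding type, the scan_text function, and the two code point sets are names we made up for the example, and it flags every non-ASCII character without distinguishing identifiers from strings or comments.

```python
import unicodedata
from dataclasses import dataclass

# Explicit bidirectional embedding, override, and isolate controls.
BIDI_CONTROLS = {0x202A, 0x202B, 0x202C, 0x202D, 0x202E,
                 0x2066, 0x2067, 0x2068, 0x2069}
# Characters that occupy no visible width.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}


@dataclass
class Finding:
    line: int       # 1-based line number
    column: int     # 1-based column, counted in code points
    codepoint: str  # e.g. "U+200C"
    name: str       # Unicode character name
    kind: str       # "bidi-control", "zero-width", or "non-ascii"


def scan_text(text: str) -> list[Finding]:
    """Report every non-ASCII character with its position and code point."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            if cp < 0x80:
                continue
            if cp in BIDI_CONTROLS:
                kind = "bidi-control"
            elif cp in ZERO_WIDTH:
                kind = "zero-width"
            else:
                kind = "non-ascii"
            findings.append(Finding(line_no, col, f"U+{cp:04X}",
                                    unicodedata.name(ch, "UNKNOWN"), kind))
    return findings


# The identifier below hides a zero-width non-joiner between "user" and "name".
for finding in scan_text("user\u200cname = 1\n"):
    print(finding)
```

Reporting the code point and its Unicode name, rather than the character itself, matters here because the offending character is by definition hard or impossible to see in the alert.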

Layer 2: Homoglyph matching compares identifiers and string values against a database of visually similar character substitutions across Latin, Cyrillic, Greek, Arabic, and Han scripts. A package named аuth that imports pycrypto will be caught, because its leading а is Cyrillic (U+0430), not the Latin a it appears to be.
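One common way to implement this kind of matching is to fold each name to a "skeleton" of its Latin lookalikes and compare skeletons. The Python sketch below assumes a tiny hand-written CONFUSABLES table and hypothetical helper names (skeleton, find_impersonation); a real database would be far larger and would more plausibly be derived from the Unicode Consortium's published confusables data.

```python
from typing import Optional

# Tiny illustrative table mapping lookalike characters to Latin letters.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0445": "x",  # Cyrillic small ha
    "\u03bf": "o",  # Greek small omicron
}


def skeleton(name: str) -> str:
    """Replace each known lookalike with the Latin letter it imitates."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name)


def find_impersonation(name: str, known_names: set[str]) -> Optional[str]:
    """Return the known name that `name` imitates, or None if there is none."""
    if name in known_names:
        return None  # exact match: this is the genuine name
    folded = skeleton(name)
    return folded if folded in known_names else None


known = {"auth", "requests", "pycrypto"}
suspect = "\u0430uth"  # renders like "auth" but starts with Cyrillic U+0430
print(find_impersonation(suspect, known))
```

The exact-match check comes first so that the genuine package is never reported as an impersonation of itself.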

Layer 3: Semantic pattern detection uses tree-sitter to parse source code into an AST and identify behavioral patterns: eval(), exec(), dynamic imports, __import__, obfuscated strings, and encoded payloads. This layer catches intent, not just appearance.
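The idea of matching on syntax tree nodes rather than text can be shown with a short Python sketch. Note the substitution: to stay self-contained, this example uses Python's standard-library ast module instead of tree-sitter, so it only handles Python source. The function name find_suspicious_calls and the SUSPICIOUS_CALLS set are hypothetical.

```python
import ast

# Built-in calls worth flagging whenever they appear.
SUSPICIOUS_CALLS = {"eval", "exec", "__import__"}


def _is_string_literal(node: ast.AST) -> bool:
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


def find_suspicious_calls(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, call name) pairs for risky calls in Python source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in SUSPICIOUS_CALLS:
            findings.append((node.lineno, func.id))
        elif (isinstance(func, ast.Attribute)
              and func.attr == "import_module"
              and isinstance(func.value, ast.Name)
              and func.value.id == "importlib"):
            # A literal module name is ordinary; a computed one is dynamic.
            if not (node.args and _is_string_literal(node.args[0])):
                findings.append((node.lineno, "importlib.import_module"))
    return sorted(findings)


sample = (
    "import importlib, os\n"
    "plugin = importlib.import_module(os.environ['PLUGIN'])\n"
    "result = eval(payload)\n"
    "safe = importlib.import_module('json')\n"
)
for line, call in find_suspicious_calls(sample):
    print(f"line {line}: {call}")
```

Because the source is parsed and never executed, the scan is safe to run on untrusted code. Working on the tree also means that a string or comment containing the text eval( is not flagged, while a real call is.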

Built for Enterprise From Day One

Code Corgi is Kubernetes-native, designed to run in air-gapped environments, and built with SOC 2 compliance in mind from the first commit. Secrets are managed via HashiCorp Vault Agent sidecars, with no credentials in environment variables. Every scan, alert, and override is written to an append-only audit log through a PostgreSQL role that holds only the INSERT privilege.

We’re starting with GitHub integration and expanding to GitLab, Bitbucket, and Azure DevOps.

Early Access

Code Corgi is now open for early access. The Starter tier is free, forever, for up to 5 repositories. We’d love to hear how it fits — or doesn’t fit — your workflow.

Get started →