Best LLM Scanners
A magnifying glass over lines of text, representing prompt-injection detection tools
tools

Prompt-Injection Detectors Compared: Rebuff, Vigil, and LLM Guard

A practitioner comparison of open-source prompt-injection detectors — Rebuff, Vigil, and LLM Guard's PromptInjection scanner — including detection architecture, maintenance status, and which to actually deploy in 2026.

By Best LLM Scanners Editorial · · 8 min read

Prompt injection sits at the top of the OWASP Top 10 for LLM applications (LLM01), and a handful of open-source detectors exist specifically to catch it at runtime. Three come up most often: Rebuff, Vigil, and the PromptInjection scanner inside LLM Guard. They share a common insight — no single technique catches prompt injection reliably, so layer several — but they differ in architecture, maturity, and, critically, maintenance status. This comparison covers all three honestly, including the parts that affect whether you should deploy them today.

The shared design idea: layered detection

All three tools reject the notion of a single magic classifier. Prompt injection is adversarial; any one detector can be evaded, so the durable pattern is defense-in-depth across detection methods. Where they differ is which layers they implement and how mature each layer is.

Rebuff: the canonical four-layer design — but archived

Rebuff (Protect AI, Apache-2.0) is the tool most people name first, and its architecture is the cleanest statement of the layered idea. It combines four defenses:

  1. Heuristics — fast filters that catch obviously malicious input before it reaches the LLM.
  2. LLM-based detection — a dedicated model that reads the incoming prompt and judges whether it’s an attack.
  3. Vector database — embeddings of previously seen attacks, so the system recognizes variants of attacks it has encountered before.
  4. Canary tokens — markers injected into prompts; if a canary leaks into output, you’ve detected a prompt-leakage attack and can store the offending input for future recognition.

Rebuff ships Python (pip install rebuff) and JavaScript/TypeScript SDKs. Its authors were always candid that “Rebuff is still a prototype and cannot provide 100% protection against prompt injection attacks.”

The decisive fact for 2026: the Rebuff repository was archived by its owner on May 16, 2025, and is now read-only. It is no longer maintained. The architecture remains an excellent reference for how to think about layered injection detection, and the code still runs, but you should not build new production systems on an archived, unmaintained dependency without a plan to fork and maintain it yourself. Treat Rebuff as a design blueprint, not a supported product.

Vigil: YARA signatures plus embeddings — but alpha

Vigil (deadbits) assesses prompts against a set of scanners to detect prompt injections, jailbreaks, and other risky inputs. Its layer set overlaps with Rebuff’s but leans toward signature-based detection:

  • Vector database similarity search against known attack patterns
  • Heuristics via YARA rules — the signature engine borrowed from traditional malware detection
  • A fine-tuned transformer classifier
  • Prompt-response similarity checks and canary tokens for leakage/goal-hijacking detection

Vigil ships the detection signatures and datasets needed for self-hosting and includes a Streamlit playground UI for interactive testing. It is genuinely extensible — you can add custom scanners, write new YARA signatures, or extend the vector DB.

The honest caveat: Vigil describes itself as alpha and experimental, intended for research purposes. The YARA-signature approach is a distinctive and useful idea — it gives you human-readable, version-controllable detection rules — but the project’s stated maturity means it is better suited to research, prototyping, and as a source of detection signatures than to carrying production traffic unattended.

LLM Guard PromptInjection: the maintained option

The PromptInjection scanner inside LLM Guard (Protect AI, MIT) is the most pragmatic choice for production today, for one reason that has nothing to do with cleverness: it is actively maintained. LLM Guard ships regular releases and the PromptInjection scanner is one of 15 input scanners in a toolkit that’s under continuous development.

Architecturally it’s simpler than Rebuff’s four-layer design — it’s a transformer-based classifier that scores a prompt for injection likelihood and returns a valid/invalid verdict plus a risk score. It does not, on its own, give you Rebuff’s canary tokens or Vigil’s YARA layer. What it gives you is a maintained dependency you can pin, upgrade, and rely on, embedded in a broader scanner library so you compose it with Anonymize, Secrets, Toxicity, and the rest. For the full picture of that library, see our LLM Guard input/output scanning walkthrough.

Side-by-side

RebuffVigilLLM Guard PromptInjection
MaintainerProtect AIdeadbitsProtect AI
LicenseApache-2.0open-sourceMIT
Detection layersHeuristics, LLM judge, vector DB, canary tokensYARA, vector DB, transformer, similarity, canaryTransformer classifier
SDKsPython, JS/TSPython (self-host)Python (+ API)
Status (2026)Archived (May 2025), read-onlyAlpha / experimentalActively maintained
Best used asArchitecture blueprintResearch, signature sourceProduction runtime scanner

What none of them are

All three are detectors, not complete defenses, and all three say so. A prompt-injection detector reduces risk; it does not eliminate it, because the attack surface is adversarial and evolving. Pair any of them with:

  • Pre-deployment red teaming to find what gets through before you ship — our garak walkthrough and PyRIT explainer cover the scanning side.
  • Output-side defenses, because a detector that misses an injection on input is your last chance to catch the consequence on output — see the classifier-on-output pattern.
  • Honest measurement of detection rate against benign false-positive cost, because a detector tuned to catch everything will refuse legitimate users. aisecbench.com covers prompt-injection detector benchmarking methodology, and our false-positive cost guide covers tuning the threshold.

For the attack techniques these detectors are built to catch, aisec.blog breaks down injection mechanics in operational detail.

Practical Recommendation

For new production deployments in 2026, start with LLM Guard’s PromptInjection scanner — it is the only one of the three under active maintenance, and it lives in a toolkit you’ll want anyway. Mine Vigil for its YARA signatures and treat it as a research/prototyping tool, not unattended production infrastructure. Study Rebuff’s four-layer architecture as the reference design — canary tokens and the vector-DB-of-known-attacks pattern are worth replicating — but do not adopt an archived dependency without committing to maintain a fork. Whichever you pick, measure detection rate against benign false-positive cost on your own traffic before you trust the threshold. For broader stack guidance, guardml.io and aidefense.dev cover the surrounding guardrail and defense landscape.


Sources

  • protectai/rebuff — GitHub: Official repository. Apache-2.0; four-layer detection (heuristics, LLM, vector DB, canary tokens). Archived by owner May 16, 2025; read-only and unmaintained.
  • deadbits/vigil-llm — GitHub: Official repository. YARA-signature, vector-DB, transformer, and canary-token scanners with a Streamlit playground. Self-described alpha/experimental, research-purpose.
  • protectai/llm-guard — GitHub: The Security Toolkit for LLM Interactions. MIT-licensed, actively maintained; PromptInjection is one of its input scanners.

Sources

  1. protectai/rebuff — LLM Prompt Injection Detector (GitHub)
  2. deadbits/vigil-llm — Detect prompt injections and jailbreaks (GitHub)
  3. protectai/llm-guard — The Security Toolkit for LLM Interactions (GitHub)
Subscribe

Best LLM Scanners — in your inbox

Comparing LLM security scanners and detection tools. — delivered when there's something worth your inbox.

No spam. Unsubscribe anytime.

Related

Comments