Prompt Injection Detectors: Rebuff vs Vigil vs LLM Guard

The three best-known open-source prompt injection detectors are Rebuff, Vigil, and LLM Guard’s PromptInjection scanner. For a new production deployment in 2026, only LLM Guard is a safe default: Rebuff is archived and Vigil is self-described alpha. Maintenance status is the short reason, but the architectural differences matter too, and this comparison covers both.

Prompt injection sits at the top of the OWASP Top 10 for LLM applications (LLM01), and a handful of open-source detectors exist specifically to catch it at runtime. Three come up most often: Rebuff, Vigil, and the PromptInjection scanner inside LLM Guard. They share one insight: no single technique catches prompt injection reliably, so you layer several. They differ in architecture, maturity, and, critically, maintenance status. This comparison covers all three, including the parts that affect whether you should deploy them today.

The shared design idea: layered detection

All three tools reject the notion of a single magic classifier. Prompt injection is adversarial; any one detector can be evaded, so the durable pattern is defense-in-depth across detection methods. Where they differ is which layers they implement and how mature each layer is.
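
As a rough illustration of that pattern, the sketch below chains detectors from cheapest to most expensive and stops at the first one that crosses a threshold. It is not taken from any of the three projects: the Detector type, the two toy layers, the layer names, and the 0.5 threshold are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: a detector maps a prompt to a risk score in [0, 1].
Detector = Callable[[str], float]


@dataclass
class Verdict:
    blocked: bool
    triggered_by: Optional[str]  # name of the layer that fired, if any
    scores: Dict[str, float] = field(default_factory=dict)


def phrase_layer(prompt: str) -> float:
    """Toy cheap layer: exact match on one well-known override phrase."""
    return 1.0 if "ignore previous instructions" in prompt.lower() else 0.0


def stub_classifier_layer(prompt: str) -> float:
    """Placeholder for a model-based layer; here just a crude keyword ratio."""
    suspicious = {"override", "jailbreak", "bypass", "unfiltered"}
    words = prompt.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,!?") in suspicious)
    return min(1.0, 4 * hits / len(words))


def run_layers(
    prompt: str,
    layers: List[Tuple[str, Detector]],
    threshold: float = 0.5,
) -> Verdict:
    """Run layers in order and stop at the first score at or above threshold."""
    scores: Dict[str, float] = {}
    for name, detect in layers:
        score = detect(prompt)
        scores[name] = score
        if score >= threshold:
            return Verdict(True, name, scores)
    return Verdict(False, None, scores)


# Cheapest layer first, so obvious attacks never reach the expensive one.
LAYERS: List[Tuple[str, Detector]] = [
    ("phrase", phrase_layer),
    ("classifier_stub", stub_classifier_layer),
]
```

Ordering the layers by cost is a design choice rather than a requirement: it keeps latency low for obvious attacks, at the price of not recording scores from the layers that were skipped.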

Rebuff: the canonical four-layer design — but archived

Rebuff (Protect AI, Apache-2.0) is the tool most people name first, and its architecture is the cleanest statement of the layered idea. It combines four defenses:

Heuristics — fast filters that catch obviously malicious input before it reaches the LLM.
LLM-based detection — a dedicated model that reads the incoming prompt and judges whether it’s an attack.
Vector database — embeddings of previously seen attacks, so the system recognizes variants of attacks it has encountered before.
Canary tokens — markers injected into prompts; if a canary leaks into output, you’ve detected a prompt-leakage attack and can store the offending input for future recognition.
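
The canary-token layer is the least familiar of the four, so here is a minimal sketch of the idea using only the Python standard library. It does not use Rebuff's SDK: the function names, the comment-style marker format, and the in-memory attack log are assumptions made for illustration.

```python
import secrets
from typing import List, Tuple


def add_canary(prompt_template: str) -> Tuple[str, str]:
    """Prefix the prompt with a random marker and return (guarded_prompt, canary)."""
    canary = secrets.token_hex(8)  # 16 hex characters, unguessable per request
    guarded = f"<!-- {canary} -->\n{prompt_template}"
    return guarded, canary


def canary_leaked(model_output: str, canary: str) -> bool:
    """True if the marker shows up in the model's output."""
    return canary in model_output


def record_if_leaked(
    user_input: str,
    model_output: str,
    canary: str,
    attack_log: List[str],
) -> bool:
    """Store the offending input when a leak is detected, for later recognition."""
    if canary_leaked(model_output, canary):
        attack_log.append(user_input)
        return True
    return False
```

In a fuller system the attack log would feed the vector database layer, so that variants of an input that once caused a leak are recognized on the way in.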

Rebuff ships Python (pip install rebuff) and JavaScript/TypeScript SDKs. Its authors were always candid that “Rebuff is still a prototype and cannot provide 100% protection against prompt injection attacks.”

The decisive fact for 2026: the Rebuff repository was archived by its owner on May 16, 2025, and is now read-only. It is no longer maintained. The architecture remains an excellent reference for how to think about layered injection detection, and the code still runs, but you should not build new production systems on an archived, unmaintained dependency without a plan to fork and maintain it yourself. Treat Rebuff as a design blueprint, not a supported product.

Vigil: YARA signatures plus embeddings — but alpha

Vigil (deadbits) assesses prompts against a set of scanners to detect prompt injections, jailbreaks, and other risky inputs. Its layer set overlaps with Rebuff’s but leans toward signature-based detection:

Vector database similarity search against known attack patterns
Heuristics via YARA rules — the signature engine borrowed from traditional malware detection
A fine-tuned transformer classifier
Prompt-response similarity checks and canary tokens for leakage/goal-hijacking detection

Vigil ships the detection signatures and datasets needed for self-hosting and includes a Streamlit playground UI for interactive testing. It is genuinely extensible — you can add custom scanners, write new YARA signatures, or extend the vector DB.

The honest caveat: Vigil describes itself as alpha and experimental, intended for research purposes. The YARA-signature approach is a distinctive and useful idea — it gives you human-readable, version-controllable detection rules — but the project’s stated maturity means it is better suited to research, prototyping, and as a source of detection signatures than to carrying production traffic unattended.
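
To make the "rules as reviewable data" point concrete, here is a small sketch of signature matching. Real YARA rules are written in YARA's own rule language and evaluated by the YARA engine; this example swaps in Python regular expressions as a stand-in so it runs without extra dependencies. The rule names and patterns are hypothetical and are not taken from Vigil's shipped signatures.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signature:
    name: str
    description: str
    pattern: re.Pattern


# Each rule is plain data: easy to read, diff, and review in version control.
SIGNATURES: List[Signature] = [
    Signature(
        name="instruction_override",
        description="Asks the model to discard earlier instructions",
        pattern=re.compile(
            r"\b(ignore|disregard|forget)\b.{0,30}\b(instructions|rules)\b",
            re.IGNORECASE,
        ),
    ),
    Signature(
        name="system_prompt_request",
        description="Asks the model to disclose its system prompt",
        pattern=re.compile(
            r"\b(reveal|repeat|print)\b.{0,30}\bsystem prompt\b",
            re.IGNORECASE,
        ),
    ),
]


def match_signatures(prompt: str) -> List[str]:
    """Return the names of every signature that matches the prompt."""
    return [sig.name for sig in SIGNATURES if sig.pattern.search(prompt)]
```

Returning rule names rather than a bare boolean keeps the verdict explainable, which is much of the appeal of a signature layer next to an opaque classifier.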

LLM Guard PromptInjection: the maintained option

The PromptInjection scanner inside LLM Guard (Protect AI, MIT) is the most pragmatic choice for production today, for one reason that has nothing to do with cleverness: it is actively maintained. LLM Guard ships regular releases, and the PromptInjection scanner is one of 15 input scanners in a toolkit that’s under continuous development.

Architecturally it’s simpler than Rebuff’s four-layer design — it’s a transformer-based classifier that scores a prompt for injection likelihood and returns a valid/invalid verdict plus a risk score. It does not, on its own, give you Rebuff’s canary tokens or Vigil’s YARA layer. What it gives you is a maintained dependency you can pin, upgrade, and rely on, embedded in a broader scanner library so you compose it with Anonymize, Secrets, Toxicity, and the rest. For the full picture of that library, see our LLM Guard input/output scanning walkthrough.
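
A minimal sketch of wiring the scanner into an application might look like the following. It assumes the llm-guard package is installed, that PromptInjection is importable from llm_guard.input_scanners and accepts a threshold keyword, and that scan returns a (sanitized prompt, validity flag, risk score) tuple; check those details against the version you pin. The ScanResult wrapper, the Scanner protocol, and the helper names are hypothetical conveniences, not part of LLM Guard.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


class Scanner(Protocol):
    """Anything exposing scan(prompt) -> (sanitized_prompt, is_valid, risk_score)."""

    def scan(self, prompt: str) -> Tuple[str, bool, float]: ...


@dataclass
class ScanResult:
    prompt: str
    is_valid: bool
    risk_score: float


def build_prompt_injection_scanner(threshold: float = 0.5) -> Scanner:
    # Imported lazily: needs `pip install llm-guard`, and the underlying
    # model is typically downloaded the first time the scanner is used.
    from llm_guard.input_scanners import PromptInjection

    return PromptInjection(threshold=threshold)


def check_prompt(scanner: Scanner, prompt: str) -> ScanResult:
    """Run one scanner and wrap its tuple result in a named structure."""
    sanitized, is_valid, risk_score = scanner.scan(prompt)
    return ScanResult(prompt=sanitized, is_valid=is_valid, risk_score=risk_score)


def handle_request(scanner: Scanner, prompt: str) -> str:
    """Refuse when the scanner marks the prompt invalid, otherwise pass it on."""
    result = check_prompt(scanner, prompt)
    if not result.is_valid:
        return f"refused (risk score {result.risk_score:.2f})"
    return "forwarded to model"
```

Depending only on the scan interface means the application code can be unit-tested with a stub scanner, and the real detector can be swapped or upgraded without touching the request path.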

Side-by-side

	Rebuff	Vigil	LLM Guard PromptInjection
Maintainer	Protect AI	deadbits	Protect AI
License	Apache-2.0	open-source	MIT
Detection layers	Heuristics, LLM judge, vector DB, canary tokens	YARA, vector DB, transformer, similarity, canary	Transformer classifier
SDKs	Python, JS/TS	Python (self-host)	Python (+ API)
Status (2026)	Archived (May 2025), read-only	Alpha / experimental	Actively maintained
Best used as	Architecture blueprint	Research, signature source	Production runtime scanner

How to choose by use case

The right detector depends less on raw detection cleverness than on what you are building and how much you can maintain.

Production API or app, small team: LLM Guard’s PromptInjection scanner. A maintained, pinnable dependency that lives inside a scanner toolkit you will want anyway beats a cleverer-but-unmaintained alternative every time.
Security research or a detection-rule library: Vigil. Its YARA-signature layer gives you human-readable, version-controllable rules you can study, fork, and reuse even if you never run Vigil in production.
Designing your own in-house detector: Rebuff as a blueprint. The four-layer pattern (heuristics, LLM judge, vector DB of known attacks, canary tokens) is the cleanest reference architecture, even though the repo itself is archived.

If you are weighing detection against a broader policy layer rather than a pure injection detector, that is a guardrail decision, covered in choosing an LLM guardrail.

Can you run more than one detector together?

Yes, and for adversarial input it is often the point. These tools were all designed around layered defense, so stacking a maintained classifier (LLM Guard) in front with a signature or canary-token layer behind it is a reasonable pattern. The trade-offs to weigh are added latency per request, the combined false-positive rate (each layer can independently refuse a legitimate user), and the maintenance burden of every dependency you add. In practice most teams run one maintained primary detector and reserve extra layers for the highest-risk paths. For where these detectors fit alongside full scanning suites, see the best LLM vulnerability scanners for 2026.
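
The false-positive trade-off is easy to underestimate, so a small worked calculation may help. The sketch below assumes the layers err independently on benign traffic, which is a simplification, and the rates and latencies are illustrative numbers, not measurements of any of these tools.

```python
from math import prod
from typing import Sequence


def combined_false_positive_rate(layer_rates: Sequence[float]) -> float:
    """Chance that at least one layer refuses a benign prompt.

    Assumes each layer's false positives are independent of the others.
    """
    return 1.0 - prod(1.0 - rate for rate in layer_rates)


def sequential_latency_ms(layer_latencies_ms: Sequence[float]) -> float:
    """Added latency when every layer runs one after another on a request."""
    return sum(layer_latencies_ms)


# Illustrative numbers only: a classifier at 1% and a signature layer at 2%.
fp = combined_false_positive_rate([0.01, 0.02])
latency = sequential_latency_ms([40.0, 5.0])
print(f"combined false-positive rate: {fp:.4f}")
print(f"sequential latency: {latency:.0f} ms")
```

With these inputs the combined rate comes out at about 2.98%, close to the sum of the two individual rates. That is the general shape of the problem: at low per-layer rates, every layer you add costs roughly its full false-positive rate in refused legitimate requests.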

What none of them are

All three are detectors, not complete defenses, and all three say so. A prompt-injection detector reduces risk; it does not eliminate it, because the attack surface is adversarial and evolving. Pair any of them with:

Pre-deployment red teaming to find what gets through before you ship — our garak walkthrough and PyRIT explainer cover the scanning side.
Output-side defenses, because when a detector misses an injection on input, the output side is your last chance to catch the consequence; see the classifier-on-output pattern.
Honest measurement of detection rate against benign false-positive cost, because a detector tuned to catch everything will refuse legitimate users. aisecbench.com covers prompt-injection detector benchmarking methodology, and our false-positive cost guide covers tuning the threshold.
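
The measurement step can be sketched in a few lines. This example assumes you already have detector risk scores for a labeled sample of your own traffic; the function name and the idea of sweeping a handful of candidate thresholds are illustrative choices, not a prescribed methodology.

```python
from typing import Sequence, Tuple


def rates_at_threshold(
    scores: Sequence[float],
    is_injection: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (detection_rate, false_positive_rate) at one threshold.

    A prompt is flagged when its score is at or above the threshold.
    """
    flagged_attacks = 0
    flagged_benign = 0
    for score, attack in zip(scores, is_injection):
        if score >= threshold:
            if attack:
                flagged_attacks += 1
            else:
                flagged_benign += 1
    attacks = sum(1 for attack in is_injection if attack)
    benign = len(is_injection) - attacks
    detection_rate = flagged_attacks / attacks if attacks else 0.0
    false_positive_rate = flagged_benign / benign if benign else 0.0
    return detection_rate, false_positive_rate


def sweep(scores, is_injection, thresholds):
    """Evaluate several thresholds so the trade-off is visible side by side."""
    return {t: rates_at_threshold(scores, is_injection, t) for t in thresholds}
```

Running the sweep on real traffic, rather than on a public attack corpus alone, matters because the false-positive side depends entirely on what your legitimate users actually type.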

For the attack techniques these detectors are built to catch, aisec.blog breaks down injection mechanics in operational detail.

Practical Recommendation

For new production deployments in 2026, start with LLM Guard’s PromptInjection scanner: it is the only one of the three under active maintenance, and it lives in a toolkit you’ll want anyway. Mine Vigil for its YARA signatures and treat it as a research/prototyping tool, not unattended production infrastructure. Study Rebuff’s four-layer architecture as the reference design (canary tokens and the vector-DB-of-known-attacks pattern are worth replicating), but do not adopt an archived dependency without committing to maintain a fork. Whichever you pick, measure detection rate against benign false-positive cost on your own traffic before you trust the threshold. For broader stack guidance, guardml.io and aidefense.dev cover the surrounding guardrail and defense landscape.

Sources

protectai/rebuff (GitHub): Official repository. Apache-2.0; four-layer detection (heuristics, LLM, vector DB, canary tokens). Archived by owner May 16, 2025; read-only and unmaintained.
deadbits/vigil-llm (GitHub): Official repository. YARA-signature, vector-DB, transformer, and canary-token scanners with a Streamlit playground. Self-described alpha/experimental, research-purpose.
protectai/llm-guard (GitHub): The Security Toolkit for LLM Interactions. MIT-licensed, actively maintained; PromptInjection is one of its input scanners.
