Garak LLM Vulnerability Scanner: How It Works and When to Use It
A technical breakdown of the garak LLM vulnerability scanner — its probe architecture, supported attack categories, CLI workflow, and how it fits into a real AI red-teaming pipeline.
The garak LLM vulnerability scanner ↗ is the closest thing the industry has to a Nessus-style probe suite for language models. Where traditional network scanners enumerate CVEs against service banners, garak fires structured adversarial prompts at a target LLM, collects responses, and runs a battery of detectors to determine whether the model exhibited unsafe behavior. The result is a scored audit trail — JSONL logs plus an HTML report — that tells you which attack classes succeeded, at what rate, and against which inputs.
Garak stands for Generative AI Red-teaming & Assessment Kit. It is open-source (Apache 2.0), backed by NVIDIA, and described in a peer-reviewed framework paper by Derczynski et al. The current stable release as of May 2026 is v0.15.0. If you are evaluating, deploying, or auditing an LLM-powered system and haven’t run garak against it, you have a gap in your security posture.
Architecture: Generators, Probes, Detectors, Buffs
Garak’s design maps cleanly onto cybersecurity concepts most practitioners already know.
Generators are adapters that connect garak to a target model. The framework ships with generators for OpenAI-compatible APIs, Hugging Face model IDs, AWS Bedrock, Replicate, Cohere, Groq, NVIDIA NIM endpoints, local GGUF models, and any REST-accessible endpoint via a configurable JSON adapter. Pointing garak at a Databricks-hosted model, for example, requires only a JSON config file specifying the endpoint URL, authentication token, and the JSONPath expression for parsing the response.
Probes are the attack modules. Each probe encapsulates a specific threat category and ships with a corpus of adversarial prompts. Garak currently includes 37+ probe modules covering:
- Jailbreaks: DAN (Do Anything Now) variants including DAN 6.0, 11.0, and AutoDAN; GCG (Greedy Coordinate Gradient) adversarial suffixes; PAIR and TAP attack strategies
- Prompt injection: indirect injection via tool outputs, document context, and system-prompt override attempts
- Toxicity: real toxicity prompts drawn from validated datasets; language model risk (LMRC) probes
- Hallucination: snowballed hallucination chains; package hallucination (fabricated PyPI/npm dependency names that could enable supply chain attacks)
- Data leakage: training data replay; membership inference attacks
- Encoding-based attacks: Base64, ROT13, Zalgo text, and other obfuscation layers that can bypass safety filters
- Malware generation: code-focused probes checking whether a model will generate functional exploit code
- XSS and data exfiltration: testing for cross-site scripting in LLM-generated output, relevant to agentic systems that render HTML
Detectors analyze model outputs after each probe. Depending on the attack class, detectors use keyword matching, regex patterns, machine learning classifiers, or a separate LLM acting as a judge. Each probe has a primary_detector plus optional extended detectors. This layered approach reduces both false positives (a keyword match that isn’t actually harmful) and false negatives (harmful content that doesn’t trigger simple pattern rules).
Buffs are prompt transformation layers that run before probes are dispatched. Backtranslation, paraphrasing, and encoding variations act as fuzzing mechanisms — they increase attack surface coverage by testing whether safety measures are robust to surface-level reformulation of the same attack intent. A model that refuses a direct jailbreak but complies with a Base64-encoded version of the same prompt has a real vulnerability, and buffs surface that.
Running a Scan
Installation is a single pip command:
python -m pip install -U garak
A minimal scan against a local Hugging Face model looks like:
python -m garak --target_type huggingface --target_name gpt2 --probes dan.Dan_11_0
For production API models, you configure a generator in a JSON file and pass it via --generator_option_file. The --probes flag accepts comma-separated module names or all to run the full suite. A full scan can issue thousands of prompts — the corpus spans over 3,000 test cases across probe categories — so budget accordingly for API costs and rate limits.
Garak writes three output artifacts per run: a .jsonl hit log (every prompt, response, and detector verdict), a .jsonl report aggregating pass/fail rates, and an HTML report introduced in v0.14.0 that renders results in a browser-readable format. These artifacts make garak suitable for inclusion in CI/CD pipelines: a shell script can parse the JSONL report, check whether any probe exceeded a configurable failure threshold, and fail the pipeline accordingly.
For teams looking to compare garak’s output against established safety benchmarks, aisecbench.com ↗ tracks evaluation frameworks that can complement automated probe results with standardized scoring rubrics.
What the Results Actually Tell You
A Databricks engineering team ran garak against a hosted LLM and found that DAN-class jailbreaks succeeded on every single attempt across five repetitions — a 100% attack success rate against that specific model configuration. That finding is the kind of concrete, actionable signal garak is designed to surface. It does not mean the model is unusable; it means the deployment configuration needs a guardrail layer between the model and user input before it goes to production.
This is garak’s primary role: pre-deployment red teaming. It scans for exploitable behavior before you ship. It does not provide runtime protection. The distinction matters. For runtime defense — input screening, output filtering, real-time anomaly detection — you need a separate layer. guardml.io ↗ covers the guardrail and content-filtering tools that sit alongside a scanner like garak in a complete defensive stack.
For the attack techniques that garak probes are built around — the mechanics of DAN jailbreaks, indirect prompt injection, and GCG adversarial suffixes — aisec.blog ↗ provides operational breakdowns that help practitioners understand what garak is actually testing and why specific probe categories matter for their threat model.
Probe Coverage and Known Gaps
Garak is strong on known, documented attack classes. Its probe library maps well to the OWASP Top 10 for LLMs ↗ — particularly LLM01 (prompt injection), LLM02 (insecure output handling), LLM06 (sensitive information disclosure), and LLM09 (misinformation). The adaptive attack generation module (atkgen) can learn from successful probes and synthesize novel test cases, pushing garak slightly toward the red-teaming end of the spectrum rather than pure static-corpus scanning.
Coverage gaps are worth acknowledging. Garak does not yet have deep coverage for agentic multi-step attacks — scenarios where an LLM-powered agent takes sequential actions across tool calls. That gap reflects the broader state of the field; frameworks targeting agentic exploitation are still maturing. Garak also does not test model weights for backdoors or data-poisoning artifacts — that requires a different class of tooling.
For a narrower, code-generation-focused workflow, promptfoo is a faster alternative with tight CI integration. For teams that need multi-agent red teaming, Microsoft’s PyRIT covers some of the agentic territory garak currently leaves open. Garak’s advantage is breadth of classical LLM attack coverage and the quality of its peer-reviewed probe corpus. For a side-by-side comparison of garak, PyRIT, Promptfoo, LLM Guard, and runtime options as a complete defensive stack, see Best LLM Security Scanners: Open-Source and Enterprise Tools Compared.
Practical Recommendation
Run garak as a gate in your model evaluation pipeline, not a one-time audit. Models change when fine-tuned, when system prompts are updated, or when the underlying base model is upgraded. A probe suite that passes today can fail after a system prompt change. Automate the scan, version the JSONL reports alongside your model artifacts, and set explicit pass/fail thresholds per probe category before those thresholds are negotiated under pressure after an incident. For guidance on measuring and maintaining those thresholds over time — including how to build the eval set and convert false positive rates into business cost — see False Positive Cost in Production Refusal Systems: How to Measure and Tune.
Sources
- NVIDIA/garak: the LLM vulnerability scanner — GitHub ↗: Official repository with installation instructions, probe documentation, and release notes. Current stable release: v0.15.0 (May 2026).
- garak: A Framework for Security Probing Large Language Models — arXiv ↗: Peer-reviewed framework paper by Derczynski, Galinkin, Martin, Majumdar, and Inie. Describes the probe/detector/generator architecture and methodology.
- AI Security in Action: Applying NVIDIA’s Garak to LLMs on Databricks ↗: Engineering walkthrough of connecting garak to a REST API endpoint; includes real scan results showing 100% DAN jailbreak success on an unguarded model.
- garak: LLM vulnerability scanner — official site ↗: Project homepage with documentation links, community Discord, and usage guides.
Sources
Best LLM Scanners — in your inbox
Comparing LLM security scanners and detection tools. — delivered when there's something worth your inbox.
No spam. Unsubscribe anytime.
Related
Best LLM Security Scanners: Open-Source and Enterprise Compared
A practitioner's comparison of the best LLM security scanners — Garak, PyRIT, LLM Guard, Promptfoo, Vigil, and enterprise options. Coverage, CI/CD fit, and runtime use cases.
PyRIT: Microsoft's AI Red-Teaming Framework, Explained
A technical breakdown of PyRIT, Microsoft's Python Risk Identification Tool for generative AI — its target/dataset/orchestrator/converter/scorer architecture, multi-turn attack strategies, and where it fits next to garak.
Automated LLM Red-Teaming in CI: garak vs PyRIT vs Promptfoo
Three open-source tools can gate your pipeline on LLM security findings — garak, PyRIT, and Promptfoo. A practitioner comparison of how each fits CI/CD, what it scans, and which to run where.