Open Source LLM Red Teaming Tools: PyRIT, Garak, HarmBench, and What to Use When

The market for open source LLM red teaming tools matured fast. In early 2024 you had a handful of academic proof-of-concepts and Microsoft’s freshly released PyRIT. By mid-2026 there are at least six production-viable frameworks covering everything from single-turn jailbreak probing to multi-turn agentic exploitation chains. The problem is no longer finding tools — it’s knowing which one addresses which class of risk and how to combine them without duplicating work.

This guide focuses on the tools security teams actually deploy, maps each to the threat categories in the OWASP Top 10 for Large Language Model Applications ↗, and gives you enough operational detail to make a stack decision.

The Four Core Tools and What They Actually Cover

Garak (NVIDIA)

Garak ↗ is the closest thing the LLM security space has to Nessus. Install with pip install -U garak, point it at an endpoint, and it runs a structured battery of probes across 20+ vulnerability categories: prompt injection, jailbreaks, data leakage, hallucination, encoding-based bypasses, toxicity generation, and malware-adjacent outputs. The architecture decouples probes from detectors, so domain-specific probes are straightforward to add. Results log as JSONL.

What Garak does well is breadth and speed. For CI/CD integration, you select a probe subset, bring scan time from hours to minutes, and get repeatable JSONL output you can diff between releases. It supports OpenAI, Hugging Face, AWS Bedrock, Replicate, Cohere, Groq, GGML/llama.cpp, and custom REST targets.

What Garak does not do is multi-turn attack chaining. Every probe is a single exchange. That gap matters for agentic applications where the dangerous path requires building context across turns.

PyRIT (Microsoft)

PyRIT ↗ fills that gap. Microsoft designed it specifically for multi-turn orchestration — what the team calls “crescendo attacks” where an adversarial conversation gradually shifts model behavior across several exchanges. PyRIT handles text, image, audio, and video modalities, and its converter layer lets you layer prompt obfuscation on top of any attack strategy: encoding, translation, paraphrasing, persona injection.

The tradeoff is operational overhead. PyRIT is a framework, not a CLI scanner. You write Python to define attack chains, scorers, and memory state. That flexibility is essential for testing autonomous agents with tool access, but it’s the wrong starting point if you just want a quick pre-deploy scan of a chat endpoint. Use Garak for breadth, PyRIT when depth and multi-turn coverage matter.

HarmBench (Center for AI Safety)

HarmBench ↗ approaches the problem from a different angle: standardized benchmarking. Rather than running live attacks against a production endpoint, it evaluates red teaming methods against a curated set of 510 harmful behaviors spanning cybercrime, bioweapons, disinformation, and harassment — and separately evaluates LLM robustness against those methods. The paper tested 33 models and 18 attack strategies.

The practical value for security teams is reproducibility. HarmBench gives you a shared baseline so you can compare model versions, fine-tuned variants, or the effect of adding a guardrail layer, against an established reference point. It is not a live scanner but the reference benchmark your scanner output should eventually map to.

TextAttack (QData Lab, University of Virginia)

TextAttack targets adversarial robustness at the NLP layer — character, word, sentence, and semantic-level perturbations. It ships with 50+ attack recipes covering BERT-class models and standard classification tasks. For teams working on fine-tuned classifiers (content moderation, intent classification, PII detectors used as guardrail components), TextAttack provides adversarial training datasets that Garak and PyRIT don’t generate.

Building a Pipeline That Actually Holds

The mistake most teams make is treating red teaming as a one-time audit before launch. The threat surface for LLMs shifts continuously: new jailbreak techniques appear weekly, the OWASP Top 10 added systemic prompt leakage and excessive agency as distinct categories in 2025, and any model update can regress safety behavior that was previously stable.

A sustainable pipeline looks like this:

Stage 1 — Pre-deploy gate (CI/CD). Run a Garak probe subset on every candidate model version. Select probes that cover your actual exposure surface: if you’re not serving audio, skip audio injection probes; if you’re building an agent with tool access, include the plugin and agentic abuse probes. Set a failure threshold — for example, flag any build where jailbreak success rate exceeds 2% on a defined probe set.

Stage 2 — Depth testing (release candidates). On release candidates, run PyRIT multi-turn scenarios against your specific system prompt and tool configuration. Agentic applications that execute code, query databases, or browse the web need attack chains that simulate goal hijacking across multiple turns, not single-shot payloads.

Stage 3 — Regression benchmarking. After each significant model update or guardrail change, run the relevant HarmBench subset to track whether refusal behavior improved or regressed versus the prior release. This gives you a number to put in a security review, not just qualitative notes.

Stage 4 — Runtime complement. Red teaming tools are pre-deployment instruments. They do not replace runtime guardrails. For the runtime layer — input classification, output filtering, PII redaction — see the guardml.io ↗ coverage of defensive guardrail tooling, which covers LLM Guard, NeMo Guardrails, and AWS Bedrock Guardrails.

For teams tracking the evolving attack taxonomy that informs which probes matter, aisec.blog ↗ maintains a running breakdown of prompt injection variants and jailbreak technique categories mapped to OWASP LLM01 and related entries.

Emerging Tools Worth Watching

Two tools that have gained traction in 2025-2026:

DeepTeam (Confident AI) — a penetration testing framework designed explicitly for LLM applications. It maps attack types to OWASP LLM Top 10, NIST AI 600-1, and MITRE frameworks, and supports multi-turn exploitation workflows similar to PyRIT but with tighter out-of-the-box coverage of agent-specific risks.

OpenRT — a multimodal red teaming toolkit from AI45Lab covering 42+ attack methods against vision-language models. Relevant if your deployment includes image or document inputs alongside text, where cross-modal injection is an emerging attack vector that single-modality tools miss entirely.

The field moves fast enough that any tool list is partially stale on publication. The more durable principle: match the attack modality to your deployment surface, automate the baseline in CI/CD, and reserve manual PyRIT-style depth testing for the highest-risk agent configurations.

Sources

garak — NVIDIA GitHub ↗: Official repository for the Garak LLM vulnerability scanner. The primary reference for probe categories, supported targets, and release notes.
PyRIT — Microsoft GitHub ↗: Source and documentation for Microsoft’s multi-turn LLM red teaming framework. Covers converter architecture, orchestrator patterns, and modality support.
HarmBench — arXiv 2402.04249 ↗: The Center for AI Safety’s standardized benchmarking paper. Reports results across 33 LLMs and 18 red teaming methods; foundational reference for reproducible evaluation.
OWASP Top 10 for Large Language Model Applications ↗: The canonical threat taxonomy for LLM deployments, used here to map tool coverage to standardized risk categories.