PyRIT: Microsoft's AI Red-Teaming Framework, Explained
A technical breakdown of PyRIT, Microsoft's Python Risk Identification Tool for generative AI — its target/dataset/orchestrator/converter/scorer architecture, multi-turn attack strategies, and where it fits next to garak.
PyRIT ↗ — the Python Risk Identification Tool for generative AI — is Microsoft’s open-source framework for proactively finding risks in generative AI systems. Where a scanner like garak ships a fixed library of probes and runs them, PyRIT is closer to a programmable red-teaming SDK: you compose attack strategies, target adapters, prompt transformations, and scorers in Python, and the framework runs the loop. It came out of the work of Microsoft’s AI Red Team, which has used it across the company’s own generative AI systems, and it is MIT-licensed and actively maintained.
If garak answers “run the standard probes against my model,” PyRIT answers “automate the specific attack campaign I want to run, including multi-turn strategies, and score the results my way.”
Architecture: five composable pieces
PyRIT’s design is a small set of abstractions that snap together. Understanding the five is most of understanding the tool.
Targets are adapters to the system under test. PyRIT can drive OpenAI, Azure OpenAI, Anthropic, Google, and Hugging Face models, custom HTTP endpoints and WebSockets, and even web-app targets via Playwright. The target abstraction is what lets the same attack run against a hosted API today and a self-hosted model tomorrow without rewriting the attack.
Datasets are the seed prompts and attack objectives — the harmful behaviors or jailbreak goals you want to test. PyRIT ships datasets and lets you bring your own, which matters because real coverage comes from domain-specific objectives, not just the public set.
Orchestrators are the attack strategies — the logic that decides what to send, in what order, and how to react to responses. This is where PyRIT’s multi-turn capability lives: orchestrators implement strategies like Crescendo (escalating a conversation gradually toward the harmful goal so no single turn trips a filter), TAP (Tree of Attacks with Pruning), and Skeleton Key. Single-turn send-and-score is the simplest orchestrator; the interesting ones carry conversation state across turns.
Converters are prompt transformations applied before a prompt is sent — Base64 or other encodings, translation, ASCII art, tone shifts, and dozens more. Converters are the fuzzing layer: they test whether a safety measure that catches a plain attack survives a surface-level reformulation of the same intent. A model that refuses a direct request but complies with its Base64-encoded form has a real vulnerability, and converters surface it.
Scorers evaluate responses. PyRIT supports true/false scorers, Likert-scale scorers, classification scorers, and custom logic, backed by an LLM-as-judge, Azure AI Content Safety, or your own code. The scorer is what turns a pile of responses into a measurable attack success signal.
A red-teaming run is these five wired together: an orchestrator pulls objectives from a dataset, optionally runs prompts through converters, sends them to a target, and hands responses to a scorer. Swap any one piece and the rest keep working.
Single-turn vs multi-turn is the real differentiator
Most automated LLM scanning is single-turn: send an adversarial prompt, score the response, move on. Many real attacks are not single-turn. Crescendo, for example, never asks for the harmful thing directly — it walks a conversation through a series of innocuous-looking turns, each building on the last, until the model produces the target content without any single message looking like an attack. Single-turn scanners structurally cannot find this class of vulnerability.
PyRIT’s orchestrators carry conversation state, which is exactly what multi-turn strategies need. That is the strongest reason to reach for PyRIT over a single-shot scanner: if your threat model includes a patient adversary working a conversation, you need a tool that can simulate one. For the mechanics of Crescendo and related multi-turn attacks, aisec.blog ↗ covers the technique families in operational detail.
A minimal run
PyRIT installs from PyPI:
pip install pyrit
The framework is driven from Python rather than a single CLI invocation — you instantiate a target, choose an orchestrator, optionally attach converters, and attach a scorer. Microsoft maintains runnable notebooks and docs at the PyRIT documentation site ↗ that walk through single-turn scoring, the Crescendo orchestrator, and custom scorers. Because it is code, a PyRIT campaign is naturally version-controllable: commit the orchestrator config, the dataset, and the scorer, and the run is reproducible — a property the AI-security benchmarking discipline cares about deeply, covered at aisecbench.com ↗.
PyRIT also ships a GUI for human-in-the-loop red teaming, for the cases where an operator wants to interact with the target directly, track findings, and collaborate, rather than run a fully automated campaign.
Where PyRIT fits next to garak
These tools are complementary, not competitive:
- garak is the breadth-first probe scanner: a large library of pre-built probes mapped to known attack classes, run with a single command, ideal as a CI gate. See our garak walkthrough for the full breakdown.
- PyRIT is the depth-first red-teaming SDK: programmable, multi-turn, custom-scored campaigns, ideal when you need to model a specific adversary or cover agentic and multi-turn territory garak leaves thinner.
A mature pipeline uses both: garak as the automated regression gate that runs on every model change, and PyRIT for targeted red-team campaigns — multi-turn strategies, domain-specific objectives, custom scorers — that the probe library doesn’t cover. For the broader landscape of where each tool sits, see Best LLM Security Scanners: Open-Source and Enterprise Compared.
What PyRIT is not
PyRIT is a pre-deployment and assessment tool, not a runtime guard. It finds risks before you ship and helps you measure them; it does not sit in the request path filtering live traffic. For runtime input/output screening you need a separate layer — guardml.io ↗ covers the guardrail and content-filtering tools that occupy that role. PyRIT also expects you to bring judgment: the scorers, the objectives, and the definition of “harmful” for your product are yours to specify. The framework automates the loop; it does not decide your policy.
Practical Recommendation
Adopt PyRIT when single-turn probe scanning is no longer enough — when your threat model includes multi-turn adversaries, when you need custom scoring tied to your own policy, or when you’re red-teaming an agent rather than a chatbot. Keep garak as the always-on gate. Commit your PyRIT orchestrators, datasets, and scorers to version control so campaigns are reproducible, and pin the target model to a dated snapshot so a result means something next month. For converting attack-success findings into deployment thresholds and business cost, see False Positive Cost in Production Refusal Systems: How to Measure and Tune.
Sources
- microsoft/PyRIT — GitHub ↗: Official repository. MIT-licensed; maintained by Microsoft’s AI Red Team. Target/dataset/orchestrator/converter/scorer architecture; multi-turn strategies including Crescendo, TAP, and Skeleton Key.
- PyRIT Documentation ↗: Runnable notebooks and guides for single-turn scoring, multi-turn orchestrators, converters, and custom scorers.
- Announcing Microsoft’s open automation framework to red team generative AI systems ↗: Microsoft’s launch announcement describing PyRIT’s origin in the AI Red Team and its intended use.
Sources
Best LLM Scanners — in your inbox
Comparing LLM security scanners and detection tools. — delivered when there's something worth your inbox.
No spam. Unsubscribe anytime.
Related
Best LLM Security Scanners: Open-Source and Enterprise Compared
A practitioner's comparison of the best LLM security scanners — Garak, PyRIT, LLM Guard, Promptfoo, Vigil, and enterprise options. Coverage, CI/CD fit, and runtime use cases.
Garak LLM Vulnerability Scanner: How It Works and When to Use It
A technical breakdown of the garak LLM vulnerability scanner — its probe architecture, supported attack categories, CLI workflow, and how it fits into a real AI red-teaming pipeline.
Automated LLM Red-Teaming in CI: garak vs PyRIT vs Promptfoo
Three open-source tools can gate your pipeline on LLM security findings — garak, PyRIT, and Promptfoo. A practitioner comparison of how each fits CI/CD, what it scans, and which to run where.