Automated LLM Red-Teaming in CI: garak vs PyRIT vs Promptfoo

The point of automated LLM red-teaming is to fail a build when a model or prompt change introduces a security regression — before it ships, not after an incident. Three open-source tools can play that gate: NVIDIA’s garak, Microsoft’s PyRIT, and Promptfoo. They overlap enough to be confused and differ enough that the right answer is usually “more than one.” This guide compares how each fits a CI/CD pipeline, what it actually scans, and where each earns its place.

The CI gate, defined

A useful CI security gate has four properties: it runs unattended, it produces a machine-parseable result, it lets you set pass/fail thresholds, and it’s fast enough that engineers don’t route around it. Hold the three tools against that bar.

garak: the breadth-first probe scanner

garak ↗ (NVIDIA, Apache-2.0) is the closest thing LLM security has to a Nessus-style scanner: a large library of pre-built probes covering jailbreaks, prompt injection, toxicity, hallucination, data leakage, and encoding attacks, run with a single command. For CI, its strengths are exactly the ones that matter:

One-command invocation. python -m garak --target_type ... --probes ... runs unattended with no code.
Machine-parseable output. garak writes JSONL hit logs and reports plus an HTML report; a shell step can parse the JSONL, check whether any probe category exceeded a threshold, and fail the build.
Targeted subsets. Running all probes takes hours; in CI you run a focused subset to bring scan time down to minutes, then run the full suite nightly.

garak is the natural always-on regression gate: pin a probe subset, set per-category failure thresholds, and run it on every model or system-prompt change. Our garak walkthrough covers the probe/detector/generator architecture in depth. Its limit for CI is that it’s a fixed-corpus scanner — strong on known attack classes, thinner on multi-turn and bespoke campaigns.

PyRIT: the programmable campaign

PyRIT ↗ (Microsoft, MIT) is a red-teaming SDK, not a one-command scanner. You compose targets, datasets, orchestrators, converters, and scorers in Python. That makes it more work to wire into CI — there’s no single CLI gate out of the box — but it buys two things garak can’t:

Multi-turn attack strategies (Crescendo, TAP, Skeleton Key) that carry conversation state, catching vulnerabilities a single-shot scanner structurally can’t.
Custom scorers tied to your own policy, so the pass/fail signal reflects your definition of harm rather than a generic detector’s.

In CI, PyRIT fits as a scheduled campaign rather than a per-commit gate: a committed Python harness (orchestrators, datasets, scorers in version control) run on a schedule, emitting scores your pipeline thresholds against. Our PyRIT explainer covers the architecture. The cost is that you own the integration glue; the benefit is depth garak doesn’t reach.

Promptfoo: red-teaming built for the pipeline

Promptfoo ↗ (MIT-licensed; the project is now part of OpenAI and remains open source) was designed CI-first. It started as a prompt/RAG evaluation tool and grew a red-team mode that auto-generates adversarial prompts using a large library of attack plugins — prompt injection, jailbreaks, PII leakage, excessive agency, and many more — driven by a declarative YAML config rather than code.

For CI specifically, Promptfoo is the most turnkey of the three:

Declarative config. The whole eval/red-team run is a YAML file you commit — no harness code.
First-class CI/CD integration, including a GitHub Action for red-team scanning, so a failing finding blocks a pull request directly.
Compliance mappings. Its red-team presets map to the OWASP LLM Top 10, NIST AI RMF, and MITRE ATLAS, which is useful when the gate has to produce an auditable report, not just a pass/fail.

Promptfoo’s sweet spot is teams that want red-teaming wired into pull-request CI with minimal custom code and a compliance-flavored report at the end.

Side-by-side for CI

	garak	PyRIT	Promptfoo
Maintainer	NVIDIA	Microsoft	Promptfoo (part of OpenAI)
License	Apache-2.0	MIT	MIT
Shape	Probe scanner (CLI)	Red-team SDK (Python)	Eval + red-team (declarative + CLI)
CI integration	JSONL parse + threshold	Custom harness, scheduled	Native GitHub Action
Multi-turn attacks	Limited	Strong (Crescendo, TAP)	Plugin-based
Config style	CLI flags / option files	Python code	YAML
Compliance reporting	Manual	Manual	OWASP / NIST / MITRE presets
Best CI role	Per-commit regression gate	Scheduled deep campaign	PR-blocking red-team gate

They’re complementary, not exclusive

The mature pipeline uses more than one, mapped to where each is strong:

Promptfoo or garak as the fast per-commit / per-PR gate — declarative and CI-native (Promptfoo) or one-command with threshold parsing (garak).
garak full suite nightly, for breadth the per-commit subset skips.
PyRIT on a schedule for the deep, multi-turn, custom-scored campaigns that need conversation state and your own policy.

All three are pre-deployment tools. None provides runtime protection — for live input/output screening you need a separate layer like LLM Guard or a guardrail model, covered in our guardrail selection guide and at guardml.io ↗. And a gate is only as good as its thresholds: set per-category pass/fail bars deliberately, because they will otherwise be negotiated under pressure after an incident.

The threshold problem

A red-team CI gate fails builds, which means a badly tuned gate either lets regressions through (thresholds too loose) or blocks every release on noise (thresholds too tight). Two disciplines keep it honest:

Pin the target model to a dated snapshot. A gate that runs against a floating model alias will flip pass/fail when the provider silently updates the model, and you’ll waste days chasing a “regression” you didn’t cause. aisecbench.com ↗ covers the reproducibility discipline these gates depend on.
Set thresholds against measured false-positive cost. An attack-detection threshold that’s too aggressive blocks legitimate releases; our false-positive cost guide covers turning detection rates into a defensible bar.

For the attack techniques all three tools generate, aisec.blog ↗ breaks down the mechanics.

Practical Recommendation

If you want the fastest path to a PR-blocking gate with a compliance report, start with Promptfoo — declarative YAML and a native GitHub Action make it the lowest-friction CI option. If you’re already running garak, keep it as the per-commit regression gate (subset) plus a nightly full suite; its JSONL output parses cleanly into a threshold check. Add PyRIT as a scheduled deep campaign when your threat model needs multi-turn strategies or custom scoring tied to your policy. Pin the target to a dated snapshot in every case, and set per-category thresholds before they matter. For where these scanners sit in a complete stack, see Best LLM Security Scanners: Open-Source and Enterprise Compared, and aidefense.dev ↗ for surrounding defense strategy.

Sources

NVIDIA/garak — GitHub ↗: LLM vulnerability scanner. Apache-2.0. Single-command probe runs, JSONL/HTML output suitable for CI threshold parsing.
microsoft/PyRIT — GitHub ↗: Python Risk Identification Tool. MIT. Programmable orchestrators with multi-turn strategies (Crescendo, TAP, Skeleton Key) and custom scorers.
promptfoo/promptfoo — GitHub ↗: Test and red-team LLM apps. MIT-licensed; now part of OpenAI and remains open source. Declarative configs, CI/CD and GitHub Action integration, attack plugins with OWASP/NIST/MITRE mappings.

Automated LLM Red-Teaming in CI: garak vs PyRIT vs Promptfoo

The CI gate, defined

garak: the breadth-first probe scanner

PyRIT: the programmable campaign

Promptfoo: red-teaming built for the pipeline

Side-by-side for CI

They’re complementary, not exclusive

The threshold problem

Practical Recommendation

Sources

Sources

Best LLM Scanners — in your inbox

Related

PyRIT: Microsoft's AI Red-Teaming Framework, Explained

Garak LLM Vulnerability Scanner: How It Works and When to Use It

Best LLM Vulnerability Scanners 2026: Garak, PyRIT, Promptfoo, and Mindgard Compared

Comments