Tools
A curated directory of 17 tools we use, evaluate, and recommend across the AI security landscape — with our take on each.
Interactive tool
Scanner Picker →
Answer six questions — what you're protecting, deployment constraint, threat focus, integration point, language, budget — and get a ranked shortlist with a capability matrix, license/cost, pick-if/skip-if, and a side-by-side compare of the top three.
Guardrail Frameworks
NeMo Guardrails
Our take
The most mature open-source guardrails framework. Colang DSL is opinionated but expressive. Production-ready for LangChain-style apps.
Guardrails AI
Our take
Best fit for structured-output use cases. Validator marketplace is growing fast. Use alongside NeMo for full coverage.
Llama Guard
Our take
Self-hostable safety classifier. Good baseline for input/output classification when you can't send traffic to a vendor API.
ShieldGemma
Our take
Comparable to Llama Guard. Pick based on whichever family you already use as your generation model.
Output Classifiers & Detectors
Lakera Guard
Our take
Production-grade if you're willing to send traffic to a vendor. Good detection rates on known attack patterns; less effective against novel attacks.
Prompt Guard
Our take
Great latency story — ~10ms per check on a small GPU. Use as a first-pass filter; fall through to a heavier classifier on flagged inputs.
Microsoft Presidio
Our take
Industry standard for PII detection. Use upstream of LLM calls and downstream of LLM outputs both.
Observability & Monitoring
Langfuse
Our take
The dominant OSS LLM observability tool. Self-hostable; SaaS is reasonably priced. Pair with eval suites for full coverage.
LangSmith
Our take
Great if you're all-in on LangChain. Vendor-locked design otherwise. Langfuse is the more portable alternative.
Phoenix (Arize)
Our take
Strong for combined ML + LLM workloads. Drift detection capabilities go further than most LLM-only tools.
Helicone
Our take
Lowest-friction observability — change one base URL and you have logs. Trades depth for installation speed.
Vector DB & RAG Defense
Robust Intelligence (Cisco)
Our take
Enterprise-grade. Pricing reflects the segment. Worth evaluating if you have compliance scope.
ProtectAI Recon
Our take
Full-stack from infrastructure to runtime. Their open-source ai-exploits repo is a quality signal.
Cryptographic & Privacy
Microsoft SEAL
Our take
Heavy lift for production but real for high-stakes private inference. Pair with TenSEAL for ML-specific helpers.
Opacus
Our take
The reference DP-SGD implementation. Performance hit is real (1.5-3x slower training) but workable.
Sandboxing & Isolation
Modal Sandboxes
Our take
Best UX for letting agents run code safely. Costs add up at high call volume; cheaper than building your own sandbox infra.
E2B
Our take
Open-source backend, hosted convenience. Good middle ground between Modal and rolling your own.