Microsoft's multi-agent conversation framework. AutoGen lets you orchestrate a graph of specialized agents—planner, coder, critic—that collaborate autonomously to solve complex tasks. It is the de-facto reference for understanding agentic handoff patterns and is actively maintained with strong documentation.
Agent Framework
Build
A graph-based agent orchestration library built on top of LangChain. LangGraph models agent logic as a stateful directed graph with explicit nodes and edges, making it easy to reason about cyclic workflows, human-in-the-loop checkpoints, and rollback—capabilities that matter when building security-aware agents with controlled decision loops.
Agent Framework
Build
Role-based multi-agent collaboration platform. CrewAI assigns each agent a job title, backstory, and goal, then coordinates them with a configurable process (sequential or hierarchical). Its intuitive mental model maps naturally onto security team personas—analyst, threat-hunter, incident-responder—making it easy to prototype agentic SOC workflows.
Agent Framework
Build
An LLM-powered penetration-testing copilot for authorized engagements. PentestGPT maintains a persistent session context, suggests the next test step, and explains its reasoning—bridging the gap between a junior tester's knowledge and experienced pen-test methodology. Use only on scopes you own or have written authorization for.
Authorized Offensive
Testing
NVIDIA's open-source LLM vulnerability scanner. Garak runs hundreds of probes against a target model—checking for prompt injection, data leakage, jailbreaks, toxicity, and more—and produces a structured report. It is the closest thing the field has to a standardized security audit tool for AI models and agents, and is under active development.
Security Testing
Guardrails
A composable input/output safety layer for LLM applications. LLM Guard ships pre-built scanners for prompt injection, PII detection, toxic content, code execution attempts, and more. Drop it in front of any agentic pipeline to add policy enforcement without changing the core agent logic.
Guardrails
Security Testing
A developer-first LLM testing and red-teaming framework. Promptfoo evaluates prompts and agents across multiple providers, runs automated red-team attacks (jailbreaks, SSRF, SQL injection via prompt), and integrates into CI pipelines. Its YAML-based test format makes it practical for embedding security regression tests alongside feature tests.
Security Testing
Testing
OpenAI's framework for evaluating LLM and agent behavior against curated datasets. Beyond accuracy benchmarks, Evals is the right foundation for building custom safety and alignment evaluations—measuring how an agent responds to adversarial inputs, policy-violating requests, and edge-case security scenarios.
Testing
Security Testing
Secure cloud sandboxes for running AI-generated code and agent actions. E2B spins up ephemeral, isolated execution environments on demand, so agents can run shell commands, install packages, and interact with filesystems without exposing the host. It is a critical safety primitive for any agent that executes untrusted or AI-generated code.
Guardrails
Agent Framework
Microsoft's enterprise-grade SDK for building AI agents and copilots. Semantic Kernel emphasizes structured function-calling, memory management, and planner-driven execution with first-class support for responsible-AI filters—pre- and post-execution hooks that let security teams intercept, log, and block agent actions before they reach production systems.
Agent Framework
Guardrails