Agentic Cyber Library — The Bot Layer

AutoGen

microsoft

Microsoft's multi-agent conversation framework. AutoGen lets you orchestrate a graph of specialized agents—planner, coder, critic—that collaborate autonomously to solve complex tasks. It is the de-facto reference for understanding agentic handoff patterns and is actively maintained with strong documentation.

Agent Framework Build

LangGraph

langchain-ai

A graph-based agent orchestration library built on top of LangChain. LangGraph models agent logic as a stateful directed graph with explicit nodes and edges, making it easy to reason about cyclic workflows, human-in-the-loop checkpoints, and rollback—capabilities that matter when building security-aware agents with controlled decision loops.

Agent Framework Build

CrewAI

crewAIInc

Role-based multi-agent collaboration platform. CrewAI assigns each agent a job title, backstory, and goal, then coordinates them with a configurable process (sequential or hierarchical). Its intuitive mental model maps naturally onto security team personas—analyst, threat-hunter, incident-responder—making it easy to prototype agentic SOC workflows.

Agent Framework Build

PentestGPT

GreyDGL

An LLM-powered penetration-testing copilot for authorized engagements. PentestGPT maintains a persistent session context, suggests the next test step, and explains its reasoning—bridging the gap between a junior tester's knowledge and experienced pen-test methodology. Use only on scopes you own or have written authorization for.

Authorized Offensive Testing

Garak

NVIDIA

NVIDIA's open-source LLM vulnerability scanner. Garak runs hundreds of probes against a target model—checking for prompt injection, data leakage, jailbreaks, toxicity, and more—and produces a structured report. It is the closest thing the field has to a standardized security audit tool for AI models and agents, and is under active development.

Security Testing Guardrails

LLM Guard

protectai

A composable input/output safety layer for LLM applications. LLM Guard ships pre-built scanners for prompt injection, PII detection, toxic content, code execution attempts, and more. Drop it in front of any agentic pipeline to add policy enforcement without changing the core agent logic.

Guardrails Security Testing

Promptfoo

promptfoo

A developer-first LLM testing and red-teaming framework. Promptfoo evaluates prompts and agents across multiple providers, runs automated red-team attacks (jailbreaks, SSRF, SQL injection via prompt), and integrates into CI pipelines. Its YAML-based test format makes it practical for embedding security regression tests alongside feature tests.

Security Testing Testing

OpenAI Evals

openai

OpenAI's framework for evaluating LLM and agent behavior against curated datasets. Beyond accuracy benchmarks, Evals is the right foundation for building custom safety and alignment evaluations—measuring how an agent responds to adversarial inputs, policy-violating requests, and edge-case security scenarios.

Testing Security Testing

E2B

e2b-dev

Secure cloud sandboxes for running AI-generated code and agent actions. E2B spins up ephemeral, isolated execution environments on demand, so agents can run shell commands, install packages, and interact with filesystems without exposing the host. It is a critical safety primitive for any agent that executes untrusted or AI-generated code.

Guardrails Agent Framework

Semantic Kernel

microsoft

Microsoft's enterprise-grade SDK for building AI agents and copilots. Semantic Kernel emphasizes structured function-calling, memory management, and planner-driven execution with first-class support for responsible-AI filters—pre- and post-execution hooks that let security teams intercept, log, and block agent actions before they reach production systems.

Agent Framework Guardrails

Agentic Cyber Library New

10 Curated Entries

AutoGen

LangGraph

CrewAI

PentestGPT

Garak

LLM Guard

Promptfoo

OpenAI Evals

E2B

Semantic Kernel