
Connecting an LLM to Nuclei: A Practitioner's Architecture Guide

I've been digging into agentic security tooling lately, and one question keeps coming up in practitioner communities: can you wire an LLM directly to a vulnerability scanner and have it reason about findings? The short answer is yes — but the architecture matters enormously if you want to do it safely. Here's what I've learned building a proof-of-concept that connects an LLM to Nuclei by ProjectDiscovery.

This is a practitioner-level architecture walkthrough. I'll cover the design, key safety guardrails, and pseudocode patterns — but I won't include step-by-step exploitation instructions. The goal is to help defenders understand how these systems fit together, not to publish an attack playbook.

What Is Nuclei and Why Does It Matter?

Nuclei is an open-source, template-driven vulnerability scanner maintained by ProjectDiscovery. It uses a YAML-based template language to define probes for thousands of known vulnerabilities, misconfigurations, and exposed services. Security teams use it to assess web applications, APIs, and network services at scale. Because it's community-driven and template-based, its coverage grows rapidly as new CVEs are published and as contributors add detection logic.

The appeal of pairing Nuclei with an LLM is that Nuclei generates structured findings, and LLMs excel at reasoning over structured data. Instead of a human analyst manually triaging hundreds of scanner findings, an LLM can help prioritize, contextualize, and suggest next steps — all within a defined scope.

The Architecture at a Glance

The system I prototyped has four layers:

  1. Scope enforcement layer: A configuration file that defines the authorized target hosts, ports, and template categories. Nothing runs without a valid scope definition.
  2. Orchestration layer: A lightweight Python wrapper that invokes Nuclei as a subprocess, passes the scope config, and captures JSON output.
  3. LLM reasoning layer: The scanner findings are injected into a structured prompt. The LLM reasons about severity, exploitability context, and remediation priority.
  4. Output layer: A report is generated in markdown or JSON, scoped to the authorized findings only. No raw exploitation paths are included.
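As a sketch, the four layers can be wired together as one function that accepts each layer as a callable. The function and parameter names here are illustrative, not taken from the actual prototype:

```python
# pipeline.py (illustrative sketch, not the real prototype code)
from typing import Callable

def run_pipeline(
    load_scope: Callable[[], dict],          # 1. scope enforcement layer
    scan: Callable[[dict], list[dict]],      # 2. orchestration layer (Nuclei subprocess)
    analyze: Callable[[list[dict]], str],    # 3. LLM reasoning layer
    report: Callable[[str], str],            # 4. output layer
) -> str:
    scope = load_scope()
    # Fail closed: nothing runs without a valid scope definition.
    if not scope.get("targets"):
        raise ValueError("refusing to run without a scope definition")
    findings = scan(scope)
    analysis = analyze(findings)
    return report(analysis)
```

Passing the layers in as callables keeps each one independently testable and makes the fail-closed scope check the single entry point for every run.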

Scope Control Is Non-Negotiable

Before anything else, you need a scope manifest. This is not optional — running a vulnerability scanner against targets you don't own or haven't been explicitly authorized to test is illegal in most jurisdictions and violates every responsible disclosure norm. The scope manifest should specify:

  • Authorized hostnames and IP ranges (CIDR notation recommended)
  • Authorized template severity levels (e.g., only info and low in early runs)
  • Authorized template categories (e.g., exposures and misconfigurations, not exploits)
  • Rate limits (requests per second to avoid DoS conditions on the target)

Here's a pseudocode representation of a minimal scope config:

# scope.yaml (pseudocode — not production-ready)
targets:
  - host: "app.example-authorized-target.internal"
    ports: [80, 443, 8080]
templates:
  severity: ["info", "low", "medium"]
  tags: ["exposure", "misconfiguration", "default-login"]
  exclude_tags: ["exploit", "dos", "rce", "sqli"]
rate_limit:
  requests_per_second: 5
  timeout_seconds: 10
authorization:
  confirmed_by: "security-team@example.com"
  scope_document: "engagement-2026-02-23.pdf"

The scope document reference is intentional. Every scan should trace back to a written authorization. Even in internal red-team contexts, written scope prevents misunderstandings.
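A validator can enforce that structure before any scan runs. This is a minimal sketch whose required keys mirror the scope.yaml fields above; everything else about it is my own choice:

```python
# validate_scope.py (illustrative sketch; keys mirror the scope.yaml above)
import ipaddress

REQUIRED_KEYS = {"targets", "templates", "rate_limit", "authorization"}

def validate_scope(scope: dict) -> None:
    missing = REQUIRED_KEYS - scope.keys()
    if missing:
        raise ValueError(f"scope missing sections: {sorted(missing)}")
    # Every scan must trace back to a written authorization.
    if not scope["authorization"].get("confirmed_by"):
        raise ValueError("no written authorization recorded; aborting")
    for target in scope["targets"]:
        host = target.get("host", "")
        if not host:
            raise ValueError("empty target host in scope")
        if "/" in host:
            # CIDR entries must parse cleanly; raises ValueError on malformed ranges.
            ipaddress.ip_network(host, strict=False)
```

Running this at startup, before the orchestration layer is even imported, keeps the "nothing runs without a valid scope" rule enforceable in one place.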

Orchestration: Calling Nuclei as a Subprocess

The orchestration layer reads the scope config, validates it, and then invokes Nuclei. Using subprocess rather than a native library keeps the integration simple and ensures you're using the official Nuclei binary with its own safety checks.

# orchestrator.py (pseudocode)
import subprocess, json, sys

def run_nuclei(scope_config: dict) -> list[dict]:
    targets = scope_config["targets"]
    severity = ",".join(scope_config["templates"]["severity"])
    tags = ",".join(scope_config["templates"]["tags"])
    exclude_tags = ",".join(scope_config["templates"]["exclude_tags"])
    rate = scope_config["rate_limit"]["requests_per_second"]

    results = []
    for target in targets:
        # Respect the scoped ports: Nuclei accepts host:port targets.
        hosts = [f'{target["host"]}:{port}' for port in target.get("ports", [443])]
        for host in hosts:
            cmd = [
                "nuclei",
                "-u", host,
                "-severity", severity,
                "-tags", tags,
                "-etags", exclude_tags,    # exclude active-exploitation tags
                "-rl", str(rate),          # rate limit (requests per second)
                "-jsonl",                  # structured output ("-json" in older v2 releases)
                "-no-interactsh",          # disable OOB interactions
                "-silent",
            ]
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
            if proc.returncode != 0:
                # Surface scanner errors instead of silently dropping them.
                print(proc.stderr, file=sys.stderr)
            for line in proc.stdout.strip().splitlines():
                try:
                    results.append(json.loads(line))
                except json.JSONDecodeError:
                    pass
    return results

A few safety notes on the flags used above:

  • -etags exploit,dos,rce,sqli excludes whole categories of active exploitation templates.
  • -no-interactsh disables out-of-band interaction testing, which can generate unexpected external traffic.
  • -rl 5 keeps the request rate low enough that the scanner won't accidentally DoS a lightly provisioned target.
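Because those exclusions are the main guardrail, it's worth adding a preflight check that refuses to launch if they're missing from the assembled command. The deny-list below is my own choice, matching the exclude_tags in the scope example:

```python
# preflight.py (illustrative sketch; DENY_TAGS mirrors the scope's exclude_tags)
DENY_TAGS = {"exploit", "dos", "rce", "sqli"}

def preflight(cmd: list[str]) -> None:
    # Refuse to run any command that lacks an -etags exclusion entirely.
    if "-etags" not in cmd:
        raise RuntimeError("command has no -etags exclusion; refusing to run")
    excluded = set(cmd[cmd.index("-etags") + 1].split(","))
    # Every deny-listed category must be present in the exclusions.
    if not DENY_TAGS <= excluded:
        raise RuntimeError(f"missing exclusions: {sorted(DENY_TAGS - excluded)}")
```

Calling preflight(cmd) right before subprocess.run means a config regression fails loudly instead of quietly widening the scan.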

Prompting the LLM for Triage

Once you have structured findings, you can inject them into an LLM prompt. The key design decision here is to give the model a clear, constrained task: prioritize and contextualize, don't generate exploit code.

# prompt_builder.py (pseudocode)
import json

SYSTEM_PROMPT = """
You are a defensive security analyst assistant.
You will be given a list of vulnerability scanner findings in JSON format.
Your job is to:
1. Summarize each finding in plain English.
2. Rate remediation priority as HIGH, MEDIUM, or LOW based on severity and exploitability context.
3. Suggest a remediation action for each finding.
4. Flag any findings that appear to be false positives based on the template description.

Do NOT generate exploit code, payloads, or step-by-step attack instructions.
All findings are from an authorized internal security assessment.
"""

def build_prompt(findings: list[dict]) -> str:
    findings_json = json.dumps(findings, indent=2)
    return f"{SYSTEM_PROMPT}\n\nFindings:\n{findings_json}"

The system prompt constraint is important. LLMs will follow instructions, but they benefit from explicit negative constraints when the downstream use case is security. Saying "do NOT generate exploit code" in the system prompt is not sufficient by itself for a production system — you'll also want output filtering — but it meaningfully steers the model's behavior.
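One simple form that output filtering can take is a keyword screen over the model's response before it reaches the report. The patterns below are illustrative placeholders, not a vetted blocklist:

```python
# output_filter.py (illustrative sketch; patterns are placeholders, not a vetted blocklist)
import re

BLOCKLIST = [
    r"(?i)\bproof[- ]of[- ]concept\b",
    r"(?i)\bpayload\s*:",
    r"(?i)\bexploit code\b",
]

def filter_response(text: str) -> str:
    # Reject the whole response if any blocked pattern appears.
    for pattern in BLOCKLIST:
        if re.search(pattern, text):
            raise ValueError(f"LLM output rejected by filter: {pattern}")
    return text
```

A regex screen is crude and easy to evade, so treat it as one layer of defense in depth rather than a guarantee.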

Rate Limits, Timeouts, and Blast Radius

One thing I underestimated early on was how quickly a scan can generate load. Even at 5 requests per second, a Nuclei run against a single host can issue hundreds of template-driven probes in a few minutes. For production use, I'd recommend:

  • Start with -severity info only. Expand to low and medium only after validating the target can handle it.
  • Set a hard -timeout on both the subprocess call and Nuclei's per-request timeout.
  • Run scans during maintenance windows or off-peak hours if the target is a production system.
  • Log every scan invocation with timestamp, scope hash, and operator identity. Treat these logs as audit artifacts.
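The audit-log bullet above can be made concrete with a small record builder. The post only lists what should be logged; the field names and JSON shape here are my own:

```python
# audit_log.py (illustrative sketch; field names are my own, the post only lists the contents)
import hashlib, json, time

def audit_record(scope_bytes: bytes, operator: str) -> str:
    record = {
        # UTC timestamp of the invocation.
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        # Hash of the scope file ties the run to one exact configuration.
        "scope_sha256": hashlib.sha256(scope_bytes).hexdigest(),
        "operator": operator,
    }
    return json.dumps(record)
```

Appending one such line per invocation to an append-only log gives you the audit artifact the bullet describes.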

What I'd Do Differently

I'm still learning here. A few things I'd change in a second iteration:

  • Template pinning: Nuclei's template library updates frequently. Pinning to a specific template release, and disabling the automatic update check (e.g., with -duc), gives you reproducibility and prevents a new template from running against a target before you've reviewed it.
  • LLM output validation: I'd add a layer that parses the LLM's output and rejects any response containing common exploit indicators before it reaches the report consumer.
  • Separate the scan and analysis phases: Store raw findings in a database, then run LLM analysis as a separate async job. This decouples scan latency from analysis latency and gives you a replayable dataset.
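For the scan/analysis separation, a minimal sketch is a findings store backed by SQLite. The schema and function here are assumptions of mine, not part of the prototype:

```python
# findings_store.py (illustrative sketch; schema is my own assumption)
import json, sqlite3

def store_findings(db_path: str, findings: list[dict]) -> int:
    con = sqlite3.connect(db_path)
    # Raw JSON per row keeps the dataset replayable for later analysis jobs.
    con.execute(
        "CREATE TABLE IF NOT EXISTS findings (id INTEGER PRIMARY KEY, raw TEXT NOT NULL)"
    )
    con.executemany(
        "INSERT INTO findings (raw) VALUES (?)",
        [(json.dumps(f),) for f in findings],
    )
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM findings").fetchone()[0]
    con.close()
    return count
```

An async analysis job can then read from the same table, so a slow LLM call never blocks the scanner.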

Where This Fits in a Defensive Program

This architecture isn't a silver bullet. It's one component of a broader vulnerability management program. The LLM adds value in the triage and contextualization step — helping an analyst quickly understand what matters in a large set of findings. But it doesn't replace human judgment on what to fix first, and it doesn't replace the authorization and scoping discipline that makes the entire operation legitimate.

The most important insight from building this: the LLM is the easiest part. The hard parts are scope management, rate control, output safety, and organizational trust. Get those right and the LLM integration is almost trivial.

Agentic Workflow Diagram (Detection → Remediation)

To tie the pieces together, it helps to see the full detection-to-remediation loop in one place. The diagram below (Mermaid syntax) maps each stage of the agentic workflow — from the initial signal that triggers a scan, through the policy controls that keep it authorized, to the final verification step. I find visual models like this useful when reasoning about where guardrails need to live and what happens when a request gets rejected at each gate. Notice that the Policy Gate has two exits: an approved path that runs Nuclei, and a rejected path that routes to Human Review instead of proceeding automatically.

flowchart LR
    A([Signal / Detection]) --> B{Scope & Authorization Check}
    B -- In scope --> C[LLM Planner]
    B -- Out of scope --> Z([Abort])
    C --> D{Policy Gate}
    D -- Approved --> E[Tool Runner — Nuclei]
    D -- Rejected --> R[Human Review]
    E --> F[Output — JSONL]
    F --> G[Parser]
    G --> H[LLM Reporter]
    H --> I{Human Approval}
    I -- Approved --> J[Remediation]
    I -- Needs review --> R
    J --> K([Verification])

References & Further Reading

  1. ProjectDiscovery. Nuclei — Fast and customizable vulnerability scanner. GitHub. https://github.com/projectdiscovery/nuclei
  2. ProjectDiscovery. Nuclei Templates. GitHub. https://github.com/projectdiscovery/nuclei-templates
  3. OWASP. Web Security Testing Guide (WSTG). https://owasp.org/www-project-web-security-testing-guide/
  4. MITRE. ATT&CK for Enterprise — Reconnaissance and Initial Access. https://attack.mitre.org/
  5. OpenAI. Prompt Engineering Guide. OpenAI Platform Documentation. https://platform.openai.com/docs/guides/prompt-engineering
  6. NIST. SP 800-115: Technical Guide to Information Security Testing and Assessment. https://csrc.nist.gov/publications/detail/sp/800-115/final
  7. ProjectDiscovery Blog. Nuclei v3 — The Next Generation Scanner. https://projectdiscovery.io/blog/nuclei-v3