Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Problem & Motivation
Prior work has shown that single LLM agents can exploit known (one-day) vulnerabilities when given a description, but they perform poorly on zero-day vulnerabilities where no description is provided. This paper investigates whether teams of LLM agents can autonomously exploit real-world zero-day web vulnerabilities.
Single agents struggle with the joint exploration, planning, and execution required for zero-day exploitation due to limited context lengths and difficulty backtracking after exploring dead ends. A more structured multi-agent approach could overcome these limitations and answer the open question of whether AI agents can exploit vulnerabilities unknown to the attacker ahead of time.
Threat Model
An attacker with access to a web application (with basic credentials like a normal user account) but no knowledge of specific vulnerabilities in the system. The attacker uses LLM-powered agents with access to web browsing tools, terminals, and vulnerability-specific tooling, but agents do not search for vulnerabilities via search engines.
Methodology
The authors introduce HPTSA (Hierarchical Planning and Task-Specific Agents), a multi-agent framework with three components: (1) a hierarchical planner that explores the target website and determines attack strategies, (2) a team manager that dispatches task-specific expert agents and synthesizes information across agent runs, and (3) a set of six task-specific expert agents (XSS, SQLi, CSRF, SSTI, ZAP, and a generic web hacking agent) each specialized in exploiting a particular vulnerability class. The planner explores the environment, the manager selects and coordinates expert agents, and experts attempt exploitation with access to relevant documentation and specialized tools.
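Since the paper's prompts and code are unreleased, the following is only a hypothetical sketch of how an expert agent's prompt might be assembled from curated vulnerability-class documents and the manager's brief; `SQLI_DOCS` and `build_expert_prompt` are illustrative names, not the authors' API.

```python
# Hypothetical prompt assembly for a task-specific expert agent.
# Document titles are placeholders, not the paper's actual curated documents.
SQLI_DOCS = ["UNION-based injection walkthrough", "sqlmap usage notes"]

def build_expert_prompt(vuln_class, docs, manager_brief):
    """Combine the expert's specialty, reference documents, and the
    manager's instructions (synthesized from prior agent runs)."""
    refs = "\n".join(f"- {d}" for d in docs)
    return (f"You are an expert in exploiting {vuln_class} vulnerabilities.\n"
            f"Reference documents:\n{refs}\n"
            f"Manager's brief: {manager_brief}")

prompt = build_expert_prompt("SQLi", SQLI_DOCS, "Probe the login form first.")
```

The key design point is that domain knowledge arrives via retrieved documents rather than being assumed to exist in the model's weights.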
Architecture
Three-tier hierarchical architecture: (1) Planner at the top explores the target and creates high-level attack plans, (2) Manager in the middle selects which task-specific agents to dispatch and passes context from previous agent runs, (3) Task-specific expert agents at the bottom attempt exploitation of specific vulnerability classes. Agents communicate via LangGraph message passing.
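The three-tier control flow can be sketched as a toy loop; `planner`, `manager`, and `expert` below are stubs standing in for LLM-driven agents (the real system dispatches LangGraph agents, not Python functions), and the planted `/login` SQLi is fabricated for illustration.

```python
# Toy sketch of the HPTSA three-tier control flow (stubs, not the paper's code).

def planner(target):
    """Top tier: explore the target and emit high-level attack leads."""
    return [{"page": "/login", "suspect": "SQLi"},
            {"page": "/profile", "suspect": "XSS"}]

def expert(kind, lead, context):
    """Bottom tier: a task-specific agent attempts one vulnerability class.
    Stub: succeeds only when the expert matches the planted flaw."""
    success = kind == "SQLi" and lead["page"] == "/login"
    trace = f"{kind} agent on {lead['page']}: {'exploit' if success else 'no-op'}"
    return success, trace

def manager(leads, experts=("XSS", "SQLi", "CSRF", "SSTI")):
    """Middle tier: dispatch experts per lead; accumulated traces let the
    manager backtrack across agents instead of inside a single agent."""
    context = []
    for lead in leads:
        for kind in experts:
            ok, trace = expert(kind, lead, context)
            context.append(trace)
            if ok:
                return True, context
    return False, context

exploited, history = manager(planner("http://target.local"))
```

Backtracking happens at the manager level: a failed expert run ends, its trace joins `context`, and the manager moves on, rather than one long-context agent trying to unwind a dead end itself.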
LLM Models
GPT-4 (the only model that succeeds); Llama-3.1-405B and Qwen-2.5-72B are evaluated as open-source alternatives.
Tool Integration
Web browsing tools, terminal access, and vulnerability-specific tooling; agents communicate via LangGraph.
Memory Mechanism
Conversation history within each agent run; the manager additionally carries traces from prior agent runs into later dispatches.
Attack Phases Covered
Target exploration and reconnaissance, attack planning, and exploitation.
Evaluation
HPTSA with GPT-4 achieves 42% pass@5 and 18% pass@1 on 14 real-world zero-day web vulnerabilities. It outperforms a single GPT-4 agent given no vulnerability description by 4.3x on pass@1 and 2.0x on pass@5, and comes within 1.8x of a GPT-4 agent that is given the description. Open-source models (Llama-3.1-405B, Qwen-2.5-72B) and traditional tools (ZAP, Metasploit) achieve 0% success.
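For readers unfamiliar with pass@k, a small sketch of how such numbers are typically computed from repeated runs; the per-CVE success counts below are made up, and the paper's exact scoring script is not public (the standard unbiased estimator shown here is from the Codex evaluation literature).

```python
# pass@k from n runs per vulnerability, c of which succeeded.
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimate of P(at least one success among k of n runs)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical successes out of n=5 runs for three made-up vulnerabilities.
runs = {"CVE-A": 3, "CVE-B": 0, "CVE-C": 1}
pass1 = sum(pass_at_k(5, c, 1) for c in runs.values()) / len(runs)
pass5 = sum(pass_at_k(5, c, 5) for c in runs.values()) / len(runs)
```

With 5 runs per vulnerability, pass@5 reduces to "exploited at least once", while pass@1 is the average per-run success rate.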
Environment
Real-world open-source web applications containing the 14 benchmark vulnerabilities, all CVEs from 2024.
Metrics
pass@1 and pass@5 (successful exploitation within 1 or 5 runs), plus average cost per run and per successful exploit.
Baseline Comparisons
- GPT-4 single agent without vulnerability description (0DV agent)
- GPT-4 single agent with vulnerability description (1DV agent)
- ZAP vulnerability scanner
- Metasploit penetration-testing framework
- Llama-3.1-405B with HPTSA
- Qwen-2.5-72B with HPTSA
Scale
14 real-world zero-day web vulnerabilities (CVEs from 2024)
Contributions
- First demonstration that teams of LLM agents can autonomously exploit real-world zero-day vulnerabilities, resolving an open question from prior work
- Introduction of HPTSA, a hierarchical multi-agent framework with a planner, team manager, and task-specific expert agents for cybersecurity exploitation
- A new benchmark of 14 real-world zero-day web vulnerabilities (all CVEs from 2024, past GPT-4 knowledge cutoff) spanning XSS, SQLi, CSRF, privilege escalation, and other types
- Ablation studies demonstrating the necessity of each component: task-specific agents, documents, and hierarchical structure
Limitations
- Only 42% pass@5 success rate, meaning the majority of zero-day vulnerabilities remain unexploited
- Only GPT-4 succeeds; open-source models (Llama-3.1-405B, Qwen-2.5-72B) achieve 0%, showing heavy dependence on frontier proprietary models
- Benchmark is limited to 14 web vulnerabilities in open-source software, which may produce a biased sample of the vulnerability landscape
- Focused exclusively on web vulnerabilities; non-web vulnerabilities (e.g., binary exploitation, network protocols) are not addressed
- Agents fail on vulnerabilities requiring access to undocumented API endpoints or non-obvious navigation paths (e.g., CVE-2024-25635, CVE-2024-33247)
- Average cost of $4.39 per run ($24.40 per successful exploit) with GPT-4, which may limit scalability
- Code and prompts are not released publicly (at OpenAI's request), limiting reproducibility
- Open-source models showed high refusal rates (31% for Llama) and tendency to repeat incorrect approaches
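The two cost figures above are mutually consistent, which is worth a quick arithmetic check (this check is ours, not the paper's): cost per successful exploit should equal cost per run divided by the per-run success rate.

```python
# Consistency check of reported costs: $4.39/run at 18% pass@1
# implies about 1/0.18 ≈ 5.6 runs per success.
cost_per_run = 4.39     # USD, as reported
pass_at_1 = 0.18        # per-run success rate, as reported
cost_per_success = cost_per_run / pass_at_1   # ≈ $24.39, matching $24.40
```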
Research Gaps
- How to improve agent exploration of non-obvious attack surfaces (hidden endpoints, undocumented APIs)
- Whether more sophisticated multi-agent coordination or planning strategies could further close the gap to one-day (known vulnerability) performance
- Extending zero-day exploitation capabilities beyond web vulnerabilities to network, binary, and other domains
- Whether AI agents will ultimately favor offense or defense in cybersecurity, and how to steer development toward defensive applications
- Reducing dependence on frontier proprietary models by improving open-source model capabilities for cybersecurity tasks
- Developing better strategies for agents to handle vulnerabilities that lack visible input fields or obvious injection points
Novel Techniques
- Hierarchical planning with task-specific expert agents (HPTSA) that separates exploration/planning from exploitation, allowing backtracking at the manager level rather than within individual agents
- Task-specific expert agents with curated vulnerability-class documentation (5-6 documents per agent) to provide domain knowledge without requiring it in the model's training data
- HTML simplification strategy to reduce token consumption by stripping irrelevant tags (images, SVG, style) before passing web content to agents
- Cross-agent information synthesis where the manager uses traces from prior agent runs to refine instructions for subsequent agents
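The HTML-simplification idea can be sketched with the standard library alone; the paper's exact filter is not public, so the tag and attribute whitelist below is an assumption chosen to preserve attack-relevant structure (forms, inputs, links) while discarding presentation.

```python
# Sketch of HTML simplification: drop tags that rarely matter for exploitation
# (images, SVG, styles, scripts) and keep only attack-relevant attributes,
# shrinking the token count of pages passed to agents.
from html.parser import HTMLParser

DROP_CONTAINERS = {"svg", "style", "script"}   # dropped with their contents
DROP_VOID = {"img"}                            # void tags dropped outright
KEEP_ATTRS = {"id", "name", "action", "href", "method", "type"}

class Simplifier(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out, self.skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTAINERS:
            self.skip += 1
        elif tag not in DROP_VOID and not self.skip:
            kept = "".join(f' {k}="{v}"' for k, v in attrs if k in KEEP_ATTRS)
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTAINERS:
            self.skip = max(0, self.skip - 1)
        elif tag not in DROP_VOID and not self.skip:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.out.append(data.strip())

def simplify(html: str) -> str:
    p = Simplifier()
    p.feed(html)
    return "".join(p.out)
```

A login form survives with its `action`, `method`, and input `name` attributes intact, while images and stylesheets vanish entirely.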
Open Questions
- Can the approach scale to more complex, multi-step vulnerabilities that require chaining multiple exploits?
- How would HPTSA perform on non-web vulnerability classes (e.g., memory corruption, logic bugs in APIs)?
- What is the optimal number and granularity of task-specific expert agents?
- Could reinforcement learning or self-improvement be used to make the planner and manager more effective over time?
- How robust is the approach to defensive measures like WAFs, rate limiting, or honeypots?
- Will next-generation open-source models close the gap with GPT-4 on this task?
Builds On
- Fang et al. 2024a - LLM agents can autonomously exploit one-day vulnerabilities
- Fang et al. 2024b - LLM agents can autonomously hack websites
- Liu et al. 2023b - Dynamic LLM-agent network for multi-agent collaboration
- Chen et al. 2023 - AutoAgents framework for automatic agent generation
- Zhang et al. 2023 - Building cooperative embodied agents modularly with LLMs
- Yao et al. 2022 - ReAct: Synergizing reasoning and acting in language models
Open Source
No (code and prompts withheld at OpenAI's request)