#34

Forewarned is Forearmed: A Survey on Large Language Model-based Agents in Autonomous Cyberattacks

Minrui Xu, Jiani Fan, Xinyu Huang, Conghao Zhou, Jiawen Kang, Dusit Niyato, Shiwen Mao, Zhu Han, Xuemin (Sherman) Shen, Kwok-Yan Lam

2025 | arXiv (preprint)

2505.12786

survey general-cybersecurity fully-autonomous multi-agent ReAct

Problem & Motivation

LLM-based agents have advanced beyond passive chatbots to become autonomous cyber entities capable of executing complex cyberattacks, leading to 'Cyber Threat Inflation' characterized by drastically reduced attack costs and massive increases in attack scale. Existing surveys fail to systematically analyze LLM-based cyberattack agents across different types of network infrastructures.

Conventional cybersecurity perspectives overlook that LLM-based autonomous agents can act as both defenders and adversaries, which contributes to Cyber Threat Inflation on legacy systems. Prior surveys cover LLM agents, cyberattacks, or network systems individually, but none analyzes all three dimensions jointly. Blue teams need actionable defensive insights against this emerging class of autonomous adversaries.

Threat Model

The threat model treats LLM-based agents as autonomous or semi-autonomous cyber adversaries that reduce the time, expertise, and resources needed for sophisticated cyberattacks. Attackers may use cloud-based or locally hosted fine-tuned open-source models to evade detection. The model covers three dimensions of scale uplift: capability uplift (automating tasks previously limited to skilled red-teamers), throughput uplift (continuous, large-scale parallel attacks), and autonomous risk emergence (self-adapting to defensive mechanisms in real time).

Methodology

The survey decomposes LLM-based cyberattack agents into five fundamental modules (models, perception, memory, reasoning & planning, and tools & actions) and analyzes multi-agent collaboration patterns. It then presents a taxonomy of eight representative cyberattack capabilities and maps them across three network paradigms: static infrastructure, mobile infrastructure, and infrastructure-free networks. For each of 18 network sub-types, the survey reviews attack methods, defense implications, and actionable lessons for blue teams. It also reviews 20+ benchmarks and 40+ agent frameworks.

Architecture

The survey describes a unified architecture for LLM-based cyberattack agents with five modules:
(1) Models: foundation or fine-tuned LLMs as the core brain.
(2) Perception: ingesting textual OSINT, machine traces, program artefacts, and audiovisual cues.
(3) Memory: long-term (pre-training/fine-tuning knowledge via specialized corpora like PRIMUS, ATTACKER, SecQA, CmdCaliper) and short-term (RAG and Knowledge Graphs for real-time operational context).
(4) Reasoning & Planning: task decomposition via CoT, tree/graph-of-thoughts, ReAct planning loops, and self-reflection/auto-repair.
(5) Tools & Actions: data tools (scanners, readers), action tools (exploit launchers, shell commands), and orchestration tools (workflow managers for multi-stage attacks).
Multi-agent collaboration supports cooperative, adversarial, or competitive role-based coordination.
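
For readers who want to use the five-module decomposition as a cataloguing lens, a minimal sketch of a descriptive record follows. Only the five module names come from the summary above; the class name `AgentProfile`, its fields, and the sample entry `ExampleAgent` are hypothetical and do not describe any framework in the survey.

```python
from dataclasses import dataclass, field

# Module names follow the survey summary; the identifiers are an assumption.
MODULES = ("models", "perception", "memory", "reasoning_planning", "tools_actions")


@dataclass
class AgentProfile:
    """Descriptive record of which techniques a framework reports per module."""
    name: str
    modules: dict[str, list[str]] = field(default_factory=dict)

    def undocumented_modules(self) -> list[str]:
        """Return the modules for which no technique has been recorded."""
        return [m for m in MODULES if not self.modules.get(m)]


# Made-up entry for illustration only; it is not a real framework.
example = AgentProfile(
    name="ExampleAgent",
    modules={
        "models": ["fine-tuned open-source LLM"],
        "memory": ["RAG"],
        "reasoning_planning": ["ReAct"],
    },
)

print(example.undocumented_modules())  # ['perception', 'tools_actions']
```

A record like this only describes what a paper reports; it makes it easy to see which modules a framework leaves unspecified when comparing entries side by side.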

LLM Models

Tool Integration

Memory Mechanism

RAG

Attack Phases Covered

reconnaissance

scanning

enumeration

exploitation

post exploitation

privilege escalation

lateral movement

reporting
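
The phase tags above can be treated as an ordered vocabulary for checking how much of the kill chain a benchmark or framework exercises. The sketch below is one way to do that; the `Phase` enum, the `uncovered` helper, and the sample coverage set are hypothetical and not taken from the survey.

```python
from enum import Enum


class Phase(Enum):
    """Attack phases in the order they are listed on this page."""
    RECONNAISSANCE = "reconnaissance"
    SCANNING = "scanning"
    ENUMERATION = "enumeration"
    EXPLOITATION = "exploitation"
    POST_EXPLOITATION = "post exploitation"
    PRIVILEGE_ESCALATION = "privilege escalation"
    LATERAL_MOVEMENT = "lateral movement"
    REPORTING = "reporting"


def uncovered(covered: set[Phase]) -> list[Phase]:
    """Return the phases, in listed order, that are not in `covered`."""
    return [p for p in Phase if p not in covered]


# Hypothetical benchmark that stops after exploitation.
sample = {Phase.RECONNAISSANCE, Phase.SCANNING, Phase.ENUMERATION, Phase.EXPLOITATION}
print([p.value for p in uncovered(sample)])
# ['post exploitation', 'privilege escalation', 'lateral movement', 'reporting']
```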

Evaluation

The survey catalogues 40+ LLM-based agent frameworks for cyberattacks spanning 8 attack types across 18 network categories. Key findings reported from the surveyed work include:
PentestGPT achieves 228.6% better task completion than GPT-3.5
RapidPen achieves shell access in 200-400 seconds at $0.3-$0.6 per run with a 60% success rate
GPT-4 reproduces 87% of one-day exploits from CVE descriptions
Fine-tuned 7B models (Hackphyr) match GPT-4 performance on network security
LLM-augmented honeypots achieve 56% indistinguishability from real systems
Traditional defense methods are found systematically inadequate against autonomous LLM-driven cyberattacks
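
To put two of these figures in perspective, the sketch below does the back-of-the-envelope arithmetic. It assumes that every RapidPen run costs the same whether or not it succeeds, that runs succeed independently at the reported 60% rate (so the expected number of runs per success is 1/0.6), and that "228.6% better" is read as a relative increase. These are assumptions for illustration, not claims made by the survey, and the function names are hypothetical.

```python
def cost_per_success(cost_per_run: float, success_rate: float) -> float:
    """Expected spend per success if runs are independent with a fixed success rate."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # Expected runs until the first success is 1 / success_rate (geometric distribution).
    return cost_per_run / success_rate


def relative_multiplier(percent_increase: float) -> float:
    """Convert 'X% better' (relative increase) into a multiplier over the baseline."""
    return 1 + percent_increase / 100


low = cost_per_success(0.3, 0.60)
high = cost_per_success(0.6, 0.60)
print(f"${low:.2f} to ${high:.2f} per successful run")  # $0.50 to $1.00 per successful run
print(f"{relative_multiplier(228.6):.3f}x the baseline")  # 3.286x the baseline
```

Under these assumptions the reported per-run cost translates to roughly $0.50 to $1.00 per obtained shell, which is the kind of cost collapse the survey groups under Cyber Threat Inflation.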

Environment

Metrics

Baseline Comparisons

PentestGPT
RapidPen
hackingBuddyGPT
VulnBot
AutoPT
CIPHER
ARACNE
PenHealNet
PenHeal
Hackphyr
Crimson
LProtector
WitheredLeaf
GRACE
PhishAgent
WormGPT
HoneyLLM
LLMPot
HackSynth
EnIGMA

Scale

The survey covers 222 references, catalogues 40+ agent frameworks in Table 4 and 20+ benchmarks in Table 5, and analyzes attacks across 18 network types in 3 paradigms.

Contributions

A novel unified architecture abstracting common design patterns of LLM-based cyberattack agents with five components: models, perception, memory, reasoning & planning, and tools & actions, plus cooperative multi-agent orchestration
A taxonomy of eight representative cyberattack capabilities (cyber threat intelligence, penetration testing, vulnerability detection, phishing & social engineering, malware generation, vulnerability exploitation, honeypot deployment, and CTF challenges) with analysis of specific bottlenecks and limitations for each
Systematic analysis of how cyberattack capabilities manifest across three network paradigms with 18 sub-types: static infrastructure (6G, enterprise, data center, SDN, smart grid, quantum), mobile infrastructure (IoT, satellite, MANET, vehicular, UAV, underwater), and infrastructure-free (social, CDN, blockchain, digital twin, immersive, autonomous agent networks)
Analysis of threat bottlenecks and review of existing defense methods across different network infrastructures, with actionable lessons for blue teams organized per network paradigm
Identification of seven future research directions: governance/guardrails, human-in-the-loop alignment, sustainable red-teaming, privacy-preserving multi-agent collaboration, defense against agent swarms, LLM-based agent honeypots, and agent-to-agent deception
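
The two taxonomies listed above (eight capabilities, three paradigms with 18 sub-types) can be written down as plain data, which makes the counts easy to check. The names are copied from the contribution list; the variable names and the `paradigm_of` helper are hypothetical.

```python
# Sub-types per network paradigm, as listed in the contributions above.
NETWORK_TAXONOMY = {
    "static infrastructure": ["6G", "enterprise", "data center", "SDN", "smart grid", "quantum"],
    "mobile infrastructure": ["IoT", "satellite", "MANET", "vehicular", "UAV", "underwater"],
    "infrastructure-free": ["social", "CDN", "blockchain", "digital twin", "immersive",
                            "autonomous agent networks"],
}

# The eight representative cyberattack capabilities.
CAPABILITIES = [
    "cyber threat intelligence", "penetration testing", "vulnerability detection",
    "phishing & social engineering", "malware generation", "vulnerability exploitation",
    "honeypot deployment", "CTF challenges",
]


def paradigm_of(sub_type: str) -> str:
    """Look up which paradigm a network sub-type belongs to."""
    for paradigm, sub_types in NETWORK_TAXONOMY.items():
        if sub_type in sub_types:
            return paradigm
    raise KeyError(sub_type)


total_sub_types = sum(len(v) for v in NETWORK_TAXONOMY.values())
print(total_sub_types, len(CAPABILITIES))    # 18 8
print(total_sub_types * len(CAPABILITIES))   # 144 capability/sub-type pairs
print(paradigm_of("MANET"))                  # mobile infrastructure
```

The 144 capability/sub-type pairs give a sense of why the coverage is described elsewhere on this page as broad but necessarily shallow.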

Limitations

Survey is primarily descriptive and taxonomic; does not introduce new experimental results or benchmarks
Coverage is broad but necessarily shallow across 18 network types, with some categories (quantum, underwater, immersive) having limited existing research to survey
Rapid pace of LLM development means the survey's model comparison table and framework catalog may quickly become outdated
Ethical considerations of cataloguing offensive capabilities are acknowledged but not deeply addressed
Defense analysis remains high-level with general lessons rather than concrete, validated mitigation strategies
Does not provide a unified quantitative comparison across the surveyed frameworks due to heterogeneous evaluation settings

Research Gaps

Governance and guardrails for LLM-based agents: agent architectures must embed safety constraints, ethical enforcement, compliance checking, and intervention mechanisms
Human-in-the-loop alignment for cyberattack agents: balancing agent autonomy with human oversight at critical decision points during high-risk operations
Sustainable red-teaming: developing energy-efficient methodologies via scenario sampling, model distillation, and RL-based exploration while maintaining vulnerability coverage
Privacy-preservation during multi-agent collaboration: secure aggregation, poisoning resistance, and non-IID data robustness for federated learning among defensive agents
Defense against LLM-based agent swarms: distributed anomaly detection, decentralized defense architectures, and deception-based countermeasures for coordinated multi-agent attacks
LLM-based agent honeypots: scaling intelligent, adaptive honeypots that dynamically simulate system behaviors and capture detailed attack telemetry
Agent-to-agent deception: deploying decoys and misinformation to mislead malicious agents while defending against manipulation of defensive AI, requiring interdisciplinary game theory and adversarial ML
Lack of end-to-end benchmarks spanning the full kill chain from reconnaissance through post-exploitation across diverse network types
No standardized evaluation framework comparing LLM-based cyberattack agents across heterogeneous network environments
Traditional defense methods assume human adversaries and are inadequate against the speed, scale, and adaptability of autonomous LLM-based attackers
Limited research on LLM-based attacks against quantum networks, underwater networks, digital twin environments, and immersive XR networks

Novel Techniques

Cyber Threat Inflation framework: three dimensions of scale collapse - capability uplift, throughput uplift, and autonomous risk emergence - that fundamentally shift attacker-defender asymmetry
Unified five-module architecture for cyberattack agents (models, perception, memory, reasoning & planning, tools & actions) providing a common analytical lens for any offensive LLM agent
Capability-to-attack mapping (Table 3): systematic rating of how perception, memory, reasoning, tool invocation, and multi-agent collaboration contribute to each of eight attack types at high/medium/low levels
Three-tier network taxonomy (static/mobile/infrastructure-free) with 18 sub-types for analyzing differential attack surfaces and blue-team defense strategies
Blue team defensive lessons organized per network category: exploit model limitations (hallucination, context window, knowledge cutoff), design traps in OODA loops, leverage multi-agent defense, implement zero-trust, edge-native security, trust/reputation mechanisms
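
The capability-to-attack mapping described above is a rating grid of five agent capabilities against eight attack types. A minimal sketch of that shape follows. The actual ratings from Table 3 are not reproduced on this page, so every cell is left unrated here; the `Rating` enum and helper names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Rating(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


AGENT_CAPABILITIES = ("perception", "memory", "reasoning", "tool invocation",
                      "multi-agent collaboration")
ATTACK_TYPES = ("cyber threat intelligence", "penetration testing", "vulnerability detection",
                "phishing & social engineering", "malware generation",
                "vulnerability exploitation", "honeypot deployment", "CTF challenges")

Grid = dict[tuple[str, str], Optional[Rating]]


def empty_grid() -> Grid:
    """One cell per (attack type, agent capability); None means 'not yet rated'."""
    return {(a, c): None for a in ATTACK_TYPES for c in AGENT_CAPABILITIES}


def set_rating(grid: Grid, attack: str, capability: str, rating: Rating) -> None:
    """Record a rating, rejecting cells that are not part of the grid."""
    if (attack, capability) not in grid:
        raise KeyError((attack, capability))
    grid[(attack, capability)] = rating


def unrated(grid: Grid) -> list[tuple[str, str]]:
    """Cells that still need a value from the source table."""
    return [cell for cell, value in grid.items() if value is None]


grid = empty_grid()
print(len(grid), len(unrated(grid)))  # 40 40
```

Filling the grid from the paper's Table 3 would then be a matter of calling `set_rating` once per cell.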

Open Questions

How can governance frameworks balance enabling beneficial cybersecurity research while preventing misuse of LLM-based attack agents?
What is the optimal human-in-the-loop intervention strategy that maintains agent effectiveness while ensuring safety?
How do LLM-based agents perform in truly novel (zero-day, unseen-protocol) scenarios versus regurgitating known attack patterns?
Can defensive LLM agent swarms effectively counter offensive LLM agent swarms in real-world scenarios?
How will quantum networks and post-quantum cryptography change the landscape of LLM-based autonomous attacks?
What standardized benchmarks are needed to fairly compare cyberattack agents across heterogeneous network environments?
What is the real-world prevalence of LLM-based autonomous cyberattacks currently occurring in the wild?

Builds On

Wang et al. (2024) - comprehensive review of LLM-based autonomous agents
Luo et al. (2025) - life-cycle perspective on LLM-based agents
Jin et al. (2024) - LLM applications in software engineering
Zhang et al. (2025) - LLMs meet cybersecurity systematic review
Ferrag et al. (2024) - benchmarking 42 LLMs for cyber security
Nguyen et al. (2024) - LLM threats in 6G security
Rodriguez et al. (2025) - Framework for evaluating emerging cyberattack capabilities of AI
NIST Cybersecurity Framework
Lockheed Martin Cyber Kill Chain framework