
Towards Cybersecurity Superintelligence: from AI-guided humans to human-guided AI

Victor Mayoral-Vilches, Stefan Rass, Martin Pinzger, Endika Gil-Uriarte, Unai Ayucar-Carbajo, Jon Ander Ruiz-Alcalde, Maite del Mundo de Torres, Maria Sanz-Gomez, Francesco Balassone, Cristobal R. J. Veas Chavez, Vanesa Turiel, Alfonso Glera-Picon, Daniel Sanchez-Prieto, Yuri Salvatierra, Paul Zabalegui-Landa, Ruffino Reydel Cabrera-Alvarez, Patxi Mayoral-Pizarroso

2026 | arXiv (preprint)

arXiv:2601.14614v3

Problem & Motivation 问题与动机

Current LLM-based cybersecurity agents match or exceed human speed but lack strategic reasoning capabilities, creating a ceiling at expert-level performance that prevents achieving true cybersecurity superintelligence. The paper addresses how to progress from AI-guided humans to human-guided AI that reasons game-theoretically about adversarial dynamics.

目前的基于大语言模型(LLM)的网络安全智能体虽然在速度上达到或超过了人类,但缺乏战略推理能力,导致其性能在专家级水平遇到瓶颈,阻碍了实现真正的“网络安全超智能”。本文探讨了如何从“AI 引导人类”进化到“人类引导 AI”,后者能够对对抗动态进行博弈论推理。

Benchmarks for cybersecurity AI are rapidly saturating, yet autonomous agents still fail at tasks requiring creative exploitation and strategic reasoning. Speed alone (3,600x faster than humans) does not constitute superintelligence; agents must also reason strategically about attacker/defender dynamics the way elite security professionals do. The paper fills the gap between fast-but-brittle AI agents and systems that can reason about adversarial game theory.

网络安全 AI 的基准测试正在迅速饱和,然而自主智能体在需要创造性利用和战略推理的任务上仍然表现不佳。仅靠速度(比人类快 3,600 倍)并不构成超智能;智能体还必须像精英安全专业人士那样,对攻击者/防御者的动态博弈进行战略推理。本文填补了反应迅速但脆弱的 AI 智能体与能够进行对抗性博弈论推理的系统之间的空白。

Threat Model 威胁模型

Assumes an attacker-defender adversarial setting where both sides may deploy AI agents. The G-CTR component models cybersecurity scenarios as games with attack graphs, computing Nash equilibrium strategies for optimal attack paths and defense allocations. Partial observability, adaptive adversaries, and resource limits are acknowledged as real-world constraints.

假设存在攻击者与防御者的对抗场景,双方都可能部署 AI 智能体。G-CTR 组件将网络安全场景建模为包含攻击图的博弈,计算纳什均衡策略,以获得最佳攻击路径和防御资源分配。该模型承认部分可观测性、自适应对手和资源限制是真实世界的约束条件。
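To make the game-theoretic framing concrete, here is a minimal sketch of a mixed-strategy Nash equilibrium for a 2x2 zero-sum attacker-defender game, solved in closed form. This is a toy illustration only: the payoff matrix, function name, and row/column interpretation are assumptions for exposition, not the paper's Cut-the-Rope computation, which operates on full attack graphs.

```python
from fractions import Fraction

def solve_2x2_zero_sum(a, b, c, d):
    """Closed-form mixed-strategy Nash equilibrium for a 2x2 zero-sum game
    with attacker payoff matrix [[a, b], [c, d]] and no saddle point."""
    denom = Fraction((a - c) + (d - b))
    p = Fraction(d - c) / denom              # attacker's probability of row 1
    q = Fraction(d - b) / denom              # defender's probability of column 1
    value = Fraction(a * d - b * c) / denom  # expected attacker payoff at equilibrium
    return p, q, value

# Toy example: rows = attack paths, columns = defense allocations.
p, q, v = solve_2x2_zero_sum(3, 1, 0, 2)
print(p, q, v)  # 1/2 1/4 3/2
```

At equilibrium neither side can improve by deviating unilaterally: with the attacker mixing 50/50, both defense allocations yield the same expected payoff of 3/2, and symmetrically for the defender's 1/4-3/4 mix.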

Methodology 核心方法

The paper presents a three-stage evolutionary progression toward cybersecurity superintelligence: (1) PentestGPT (2023) uses LLMs to guide human penetration testers via a Reasoning Module (Penetration Testing Task Tree), Generation Module (Chain-of-Thought command generation), and Parsing Module; (2) CAI (2025) eliminates the human execution bottleneck through a fully automated multi-agent architecture with six pillars (Agents, Tools, Handoffs, Patterns, Turns, HITL); (3) G-CTR (2026) introduces a neurosymbolic architecture embedding game-theoretic reasoning into LLM agents via attack graph generation, Nash equilibrium computation using the Cut-the-Rope algorithm, and strategic digest injection into agent system prompts.

本文提出了迈向网络安全超智能的三个进化阶段:(1)PentestGPT (2023) 使用 LLM 通过推理模块(渗透测试任务树)、生成模块(思维链命令生成)和解析模块来引导人类渗透测试人员;(2)CAI (2025) 通过包含六个支柱(智能体、工具、移交、模式、轮换、HITL)的全自动多智能体架构消除了人类执行瓶颈;(3)G-CTR (2026) 引入了一种神经符号架构,通过攻击图生成、使用“Cut-the-Rope”算法计算纳什均衡,以及将战略摘要注入智能体系统提示词,将博弈论推理嵌入 LLM 智能体中。
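The Reasoning Module's Penetration Testing Task Tree can be sketched as a simple tree of tasks with a depth-first search for the next open leaf. The class and method names below are illustrative assumptions, not PentestGPT's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class PTTNode:
    """One task in a Penetration Testing Task Tree (illustrative sketch)."""
    task: str
    status: str = "todo"  # todo | in-progress | done
    children: list["PTTNode"] = field(default_factory=list)

    def add(self, task: str) -> "PTTNode":
        child = PTTNode(task)
        self.children.append(child)
        return child

    def next_open_task(self) -> "PTTNode | None":
        """Depth-first search for the next unfinished leaf - the task the
        Reasoning Module would hand to the Generation Module."""
        if not self.children:
            return self if self.status != "done" else None
        for child in self.children:
            found = child.next_open_task()
            if found:
                return found
        return None

root = PTTNode("pentest 10.0.0.5")
recon = root.add("reconnaissance")
recon.add("nmap port scan").status = "done"
recon.add("enumerate HTTP service")
print(root.next_open_task().task)  # enumerate HTTP service
```

The tree keeps long engagements coherent: completed branches are pruned from consideration, so the LLM's context stays focused on the current open task rather than the entire history.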

Architecture 架构设计

G-CTR operates via three phases running as closed-loop strategic feedback in parallel with the main CAI agent loop: (1) Attack Graph Generation extracts structured graph representations from unstructured security logs using LLMs, achieving 70-90% node correspondence with expert annotations while running 60-245x faster than human experts; (2) Nash Equilibrium Computation applies the Cut-the-Rope (CTR) algorithm to identify optimal attack/defense strategies on the attack graph; (3) Strategic Digest Injection transforms the equilibrium computations into natural-language guidance inserted into the agent's system prompt, steering actions toward statistically advantageous continuations. The digest acts like a chess engine's evaluation: it highlights the strongest lines, collapses the search space, and suppresses hallucinations.

G-CTR 通过三个阶段运行,作为与主 CAI 智能体循环并行的闭环战略反馈:(1)攻击图生成:使用 LLM 从非结构化安全日志中提取结构化图表示,在比专家运行快 60-245 倍的同时,实现 70-90% 的节点对应率;(2)纳什均衡计算:应用 Cut-the-Rope (CTR) 算法识别攻击图上的最佳攻击/防御策略;(3)战略摘要注入:将均衡计算结果转化为自然语言指南插入智能体的系统提示词中,引导其向统计上有利的后续行动发展。该摘要类似于象棋引擎,突出最强路径,压缩搜索空间并抑制幻觉。
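The three-phase loop can be sketched as follows. Every function body here is a placeholder standing in for the paper's actual components (LLM graph extraction, the Cut-the-Rope solver); the function names and data shapes are assumptions for illustration only.

```python
# Hedged sketch of G-CTR's three-phase strategic feedback loop that runs
# alongside the main agent loop. All bodies are illustrative stubs.

def generate_attack_graph(security_logs: list[str]) -> dict:
    """Phase 1: an LLM would extract nodes/edges from unstructured logs."""
    return {"nodes": ["web", "db"], "edges": [("web", "db")]}

def compute_equilibrium(graph: dict) -> dict:
    """Phase 2: Cut-the-Rope would compute equilibrium attack/defense mixes."""
    return {"best_path": ["web", "db"], "defense_focus": "db"}

def render_digest(equilibrium: dict) -> str:
    """Phase 3: turn the equilibrium into natural-language guidance."""
    return (f"Strategic digest: prioritize path "
            f"{' -> '.join(equilibrium['best_path'])}; "
            f"expect defenses concentrated on {equilibrium['defense_focus']}.")

def inject(system_prompt: str, digest: str) -> str:
    """Append the digest so the agent's next turn is steered by it."""
    return f"{system_prompt}\n\n{digest}"

prompt = "You are a red-team agent."
graph = generate_attack_graph(["log line ..."])
prompt = inject(prompt, render_digest(compute_equilibrium(graph)))
print(prompt)
```

The key design choice is that the symbolic solver never executes actions itself; it only reshapes the prompt, so the neural agent retains flexibility while being biased toward equilibrium play.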

LLM Models 使用的大模型

GPT-3.5, GPT-4, GPT-4o, GPT-5, GPT-5.1, GPT-5.2, Claude 3 Opus, Claude 3.5 Sonnet, Claude 4 Sonnet, Claude Sonnet 4.5, Claude Opus 4.5, Claude Opus 4.6, Gemini 1.5 Pro, Gemini 2.5 Pro, Gemini 3 Pro, Mistral Large 2.0, Mistral Large 2.1, Mistral Large 3, alias0, alias1, alias2

Tool Integration 工具集成

custom-security-tools, command-execution, web-interaction, code-manipulation, attack-graph-generator, nash-equilibrium-solver

Memory Mechanism 记忆机制

conversation-history

Attack Phases Covered 覆盖的攻击阶段

reconnaissance
scanning
enumeration
exploitation
post exploitation
privilege escalation
lateral movement
reporting

Evaluation 评估结果

G-CTR more than doubled success rates (from 20.0% to 42.9%) on 44 cyber-range penetration tests, cut cost-per-success 2.7x (from $0.32 to $0.12), and reduced behavioral variance 5.2x. The combined CAI+G-CTR stack achieved 100% success on CAIBench-Jeopardy CTFs (Base) versus 82.6% for CAI alone and 47.8% for PentestGPT. In Attack & Defense scenarios, game-theoretic agents achieved a 2:1 advantage over non-strategic AI and outperformed independently guided dual teams by 3.7:1. CAI alone operated 3,600x faster than humans at 156x lower cost ($109 total API cost versus $17,218 in equivalent human labor across 54 CTF challenges).

在 44 个网络靶场渗透测试中,G-CTR 使成功率翻了一番(从 20.0% 提高到 42.9%),单次成功成本降低了 2.7 倍(从 0.32 美元降至 0.12 美元),行为方差减少了 5.2 倍。结合 CAI+G-CTR 在 CAIBench-Jeopardy CTFs (Base) 上实现了 100% 的成功率,而仅 CAI 为 82.6%,PentestGPT 为 47.8%。在攻防(Attack & Defense)场景中,博弈论智能体对非战略 AI 具有 2:1 的优势,优于独立引导的双重团队 3.7:1。在 54 个 CTF 挑战中,仅 CAI 的运行速度就比人类快 3,600 倍,成本低 156 倍(总 API 成本 109 美元,而等效的人类劳动成本为 17,218 美元)。

Environment 评估环境

CAIBench-Jeopardy CTFs (Cybench), Dragos OT CTF, Neurogrid CTF, HTB AI vs Human, Cyber Apocalypse, UWSP Pointer Overflow, custom-cyber-range

Metrics 评估指标

success-rate, time-to-solve, cost, time-ratio-vs-humans, cost-ratio-vs-humans, behavioral-variance, pass@3

Baseline Comparisons 基准对比

  • Human expert teams
  • LLM-only baselines (non-strategic)
  • PentestGPT (AI-guided humans)
  • CAI without G-CTR
  • Multiple frontier LLMs on Cybench

Scale 评估规模

54 CTF challenges for CAI benchmarking, 44 cyber-range penetration tests for G-CTR evaluation, 33 CAIBench-Jeopardy CTFs (Cybench) challenges for cross-model comparison

Contributions 核心贡献

  • Introduces the concept of Cybersecurity Superintelligence as a domain-specific instantiation of superintelligence, defining it as AI that computationally surpasses the best humans across all cyber disciplines under real-world constraints
  • Presents a three-stage evolutionary framework from AI-guided humans (PentestGPT) to expert-level AI agents (CAI) to game-theoretic AI agents (G-CTR), documenting the paradigmatic shift in human-AI roles
  • Introduces G-CTR, a neurosymbolic architecture that embeds game-theoretic reasoning (Nash equilibrium via Cut-the-Rope algorithm) into LLM-based security agents through attack graph generation and strategic digest injection
  • Demonstrates that game-theoretic guidance doubles success rates, reduces behavioral variance 5.2x, and achieves 2:1 advantage over non-strategic AI in adversarial Attack & Defense scenarios
  • Provides comprehensive benchmarking of 20+ frontier LLMs on the CAIBench-Jeopardy (Cybench) benchmark, documenting rapid benchmark saturation
  • 引入了网络安全超智能(Cybersecurity Superintelligence)的概念,作为超智能在特定领域的实例化,将其定义为在现实约束下所有安全学科中都能通过计算超越最优秀人类的 AI
  • 提出了一个从 AI 引导人类(PentestGPT)到专家级 AI 智能体(CAI)再到博弈论 AI 智能体(G-CTR)的三阶段进化框架,记录了人类与 AI 角色的范式转变
  • 引入了 G-CTR,这是一种神经符号架构,通过攻击图生成和战略摘要注入,将博弈论推理(基于 Cut-the-Rope 算法的纳什均衡)嵌入到基于 LLM 的安全智能体中
  • 证明了博弈论引导可使成功率翻倍,行为方差减少 5.2 倍,并在对抗性攻防场景中对非战略性 AI 取得 2:1 的优势
  • 在 CAIBench-Jeopardy (Cybench) 基准测试上提供了 20 多个前沿 LLM 的综合基准测试,记录了基准测试的迅速饱和趋势

Limitations 局限性

  • State-of-the-art LLMs cost approximately $5,940 per billion tokens, making sustained automated security economically unviable for most organizations without multi-model orchestration
  • True autonomy (delegated decision-making) remains out of scope in real-world incident response; even G-CTR provides supervised automation rather than full agency
  • Current AI agents still underperform humans in pwn (0.77x) and crypto (0.47x) categories requiring creative exploitation and mathematical insight
  • AI security agents risk performance drift and degradation without ongoing human-curated knowledge updates and supervision
  • The approach assumes attack graph representations can capture the full complexity of security scenarios; 70-90% node correspondence with expert annotations means some information is lost
  • Benchmark saturation suggests current evaluation frameworks may be insufficient for measuring progress toward true superintelligence
  • 最先进的 LLM 每十亿 token 的成本约为 5,940 美元,在没有多模型编排的情况下,大多数组织在经济上无法承担持续的自动化安全
  • 在现实的事件响应中,真正的自主性(授权决策)仍然无法实现;即便是 G-CTR 提供的也是受监督的自动化,而非完全的代理权
  • 目前的 AI 智能体在需要创造性利用和数学洞察力的漏洞利用(pwn,0.77x)和密码学(crypto,0.47x)类别中仍然表现不如人类
  • 如果没有持续的人类知识更新和监督,AI 安全智能体面临性能漂移和退化的风险
  • 该方法假设攻击图表示可以捕捉安全场景的全部复杂性;与专家注释相比,70-90% 的节点对应率意味着部分信息丢失
  • 基准测试的饱和表明,目前的评估框架可能不足以衡量迈向真正超智能的进展

Research Gaps 研究空白

  • Improving agency in security AI: independent decision-making, strategic planning, and adaptive response remain underdeveloped
  • Bridging the gap between automation and true autonomy in cybersecurity
  • Developing human meta-cognitive supervisory skills needed to oversee AI that reasons at superhuman speeds about superhuman strategies
  • Addressing the temporal asymmetry where AI enumerates attack surfaces faster than organizations can process findings
  • Creating economically viable continuous AI security operation through multi-model orchestration and cost reduction
  • Designing new benchmarks that do not saturate as quickly and can measure strategic reasoning capabilities beyond task completion
  • 提高安全 AI 的代理性:独立决策、战略规划和自适应响应仍待开发
  • 弥合网络安全中自动化与真正自主性之间的差距
  • 开发监管 AI 所需的人类元认知监督技能,这些 AI 以超人的速度进行超人的战略推理
  • 解决 AI 枚举攻击面的速度超过组织处理发现结果速度的时间不对称问题
  • 通过多模型编排和降低成本,创建经济上可持续的持续 AI 安全运营
  • 设计不易饱和且能衡量超越任务完成度的战略推理能力的全新基准测试

Novel Techniques 新颖技术

  • Neurosymbolic architecture combining LLM neural inference with symbolic game-theoretic equilibrium computation for cybersecurity agents
  • Strategic Digest Injection: transforming Nash equilibrium computations into natural language guidance injected into agent system prompts to steer behavior toward statistically optimal actions
  • Attack Graph Generation from unstructured security logs using LLMs with 70-90% correspondence to expert annotations at 60-245x speed improvement
  • Shared attack graph as joint battlefield for red and blue team agents in Attack & Defense scenarios
  • Three-phase closed-loop strategic feedback running in parallel with the main agent loop
  • 结合了 LLM 神经推理与网络安全智能体符号博弈论均衡计算的神经符号架构
  • 战略摘要注入(Strategic Digest Injection):将纳什均衡计算结果转化为注入智能体系统提示词的自然语言指南,引导行为向统计最优操作发展
  • 使用 LLM 从非结构化安全日志中生成攻击图,与专家注释的对应率达到 70-90%,速度提升 60-245 倍
  • 将共享攻击图作为攻防场景中红蓝两队智能体的共同战场
  • 与主智能体循环并行的三阶段闭环战略反馈

Open Questions 开放问题

  • How to achieve true delegated decision-making (autonomy) rather than supervised automation in real-world security operations?
  • Can game-theoretic reasoning be extended to handle the full combinatorial complexity of real-world cybersecurity (disciplines x sectors x constraints)?
  • How should vulnerability disclosure and patch deployment processes adapt when AI enumerates attack surfaces faster than humans can respond?
  • What happens when both offensive and defensive AI use game-theoretic reasoning, creating an algorithmic arms race beyond human strategic intuition?
  • How to develop effective human supervisory capabilities for AI systems reasoning at superhuman speeds about superhuman strategies?
  • 如何在现实安全运营中实现真正的授权决策(自主性),而非受监督的自动化?
  • 博弈论推理能否扩展到处理现实网络安全的所有组合复杂性(学科 x 行业 x 约束)?
  • 当 AI 枚举攻击面的速度超过人类响应速度时,漏洞披露和补丁部署流程应如何调整?
  • 当攻防双方的 AI 都使用博弈论推理时会发生什么?这是否会造成超越人类战略直觉的算法军备竞赛?
  • 如何为以超人速度进行超人战略推理的 AI 系统开发有效的人类监督能力?

Builds On 基于前人工作

  • PentestGPT
  • CAI (Cybersecurity AI)
  • Cut-the-Rope (CTR) game-theoretic framework
  • Cybench benchmark
  • CAIBench

Open Source 开源信息

Partial - https://github.com/aliasrobotics/cai (Dual MIT/Proprietary license)
