CAI: An Open, Bug Bounty-Ready Cybersecurity AI
Problem & Motivation
Current AI-driven cybersecurity tools are proprietary, limited in scope, or lacking rigorous empirical validation against human experts. The bug bounty ecosystem is dominated by an oligopoly of platforms (HackerOne, Bugcrowd) that excludes SMEs and independent researchers, while LLM vendors systematically downplay the offensive security capabilities of their models.
There is a critical need for an open-source, fully autonomous cybersecurity AI framework that democratizes advanced security testing, enabling organizations of all sizes and non-expert users to perform bug bounty hunting and penetration testing at human-competitive levels. Existing tools cover only narrow aspects of the security testing workflow, and vendor benchmarks artificially suppress offensive capability assessments.
Threat Model
The framework assumes authorized security testing scenarios including CTF competitions, Hack The Box exercises, and sanctioned bug bounty programs. CAI operates with standard penetration testing tool access from a Kali Linux environment. Human-in-the-loop oversight is a design cornerstone, acknowledging that fully autonomous offensive security systems remain premature.
Methodology
CAI is an open-source, agent-centric framework built around six fundamental pillars: Agents, Tools, Handoffs, Patterns, Turns, and Human-In-The-Loop (HITL). It provides pre-built specialized agentic patterns (Red Team Agent, Bug Bounty Hunter, Blue Team Agent) that combine modular agent design with seamless tool integration. The framework supports multiple LLM backends and uses a pattern-based architecture where agents coordinate through handoffs, execute security actions via tools, and maintain human oversight through the HITL module. CAI introduces a novel four-level autonomy classification for cybersecurity AI, positioning itself as the first Level 4 (fully autonomous with planning, scanning, exploitation, and mitigation) open-source system.
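The interplay of the six pillars can be sketched as a minimal agent loop. This is an illustrative toy under assumed names, not CAI's actual API: `Agent`, `recon_tool`, and `run_pattern` are hypothetical, and a real deployment would drive the loop with an LLM backend and real security tooling.

```python
# Illustrative sketch of CAI's pillars (Agents, Tools, Handoffs, Patterns,
# Turns); all names here are hypothetical, not CAI's real interfaces.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    name: str
    instructions: str                      # role prompt for the backing LLM
    tools: list[Callable] = field(default_factory=list)
    handoffs: list["Agent"] = field(default_factory=list)

def recon_tool(target: str) -> str:
    """Stand-in for a real command-execution tool (e.g. an nmap wrapper)."""
    return f"open ports on {target}: 22, 80"

recon = Agent("recon_agent", "Enumerate the target.", tools=[recon_tool])
exploit = Agent("exploit_agent", "Exploit discovered services.")
recon.handoffs.append(exploit)             # Handoff: recon passes control on

def run_pattern(agent: Agent, target: str, max_turns: int = 2) -> list[str]:
    """A minimal Pattern: each Turn runs the agent's tools, then hands off."""
    log = []
    for _ in range(max_turns):             # Turns bound the agent loop
        for tool in agent.tools:
            log.append(f"{agent.name}: {tool(target)}")
        if agent.handoffs:
            agent = agent.handoffs[0]      # coordination via Handoff
        log.append(f"{agent.name}: active")  # who holds control this turn
    return log
```

In this sketch the pattern is a fixed linear pipeline; in an agentic system the LLM would decide which tool to call and when to hand off.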
Architecture
Three-layer architecture: (1) Human Interface Layer with HITL and Turns components managing interaction flow; (2) Agent Coordination Layer with Patterns orchestrating agent interactions through Handoffs; (3) Execution Layer with Tools providing concrete capabilities (command execution, web searching, code manipulation, secure tunneling) and Tracing for logging. Extensions provide auxiliary debugging and monitoring. Agents leverage LLMs for reasoning while Patterns define specialized agentic architectures (e.g., one_tool_agent, Red Team Agent, Blue Team Agent, Bug Bounty Agent) with tailored objectives and tool sets.
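The flow across the three layers, with the HITL gate interposed before any action executes, might look like the following. All function names and the approval policy are illustrative assumptions, not CAI's real interfaces; a real Execution Layer would shell out to actual tools.

```python
# Toy flow: Human Interface -> Agent Coordination -> Execution, with a
# human-in-the-loop approval gate; names are hypothetical, not CAI's API.
from typing import Callable

def execute_tool(command: str) -> str:
    """Execution Layer stand-in: would run the real command in practice."""
    return f"[simulated] {command}"

def hitl_gate(command: str, approve: Callable[[str], bool]) -> str:
    """Human Interface Layer: offensive actions require human sign-off."""
    if not approve(command):
        return f"[blocked by operator] {command}"
    return execute_tool(command)

def coordination_layer(plan: list[str],
                       approve: Callable[[str], bool]) -> list[str]:
    """Agent Coordination Layer: a pattern walks the plan turn by turn."""
    return [hitl_gate(step, approve) for step in plan]

plan = ["nmap -sV 10.0.0.5", "sqlmap -u http://10.0.0.5/login"]
# Toy policy: auto-approve only read-only reconnaissance commands.
results = coordination_layer(plan, approve=lambda c: c.startswith("nmap"))
```

Keeping the gate in the interface layer rather than inside each tool matches the design stance that HITL oversight is a cornerstone, not an afterthought.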
LLM Models
Tool Integration
Memory Mechanism
conversation-history
Attack Phases Covered
Evaluation
CAI outperformed humans by 11x in time and 156x in cost across 54 CTF challenges ($109 vs $17,218). Claude 3.7 Sonnet was the best LLM, solving 19/23 selected CTFs. CAI ranked 1st among AI teams in the 'AI vs Human' CTF Challenge (earning $750), reached top-20 overall, and achieved top-30 in Spain / top-500 worldwide on HTB within one week. In bug bounty exercises, non-professionals found 6 valid vulnerabilities (CVSS 4.3-7.5) and professionals found 4, with each group working for one week.
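The headline ratios are quotients of aggregate human versus AI effort. As a back-of-envelope check using the reported totals (a plain quotient of the cost totals lands near 158x; the paper's 156x figure presumably reflects its own per-challenge aggregation):

```python
# Back-of-envelope reconstruction of the headline ratios; the totals are
# the reported figures, and the plain quotient only approximates the
# paper's aggregated 156x.
ai_cost_usd = 109.0          # CAI's total LLM spend across the 54 CTFs
human_cost_usd = 17_218.0    # estimated cost of the human-expert baseline

cost_ratio = human_cost_usd / ai_cost_usd
print(f"humans cost ~{cost_ratio:.0f}x more than CAI")  # → ~158x

def time_ratio(human_seconds: float, ai_seconds: float) -> float:
    """Ratio > 1 means CAI was faster than the human baseline; the paper
    reports 11x overall on CTFs but 0.59x (slower) on full HTB machines."""
    return human_seconds / ai_seconds
```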
Environment
Metrics
Baseline Comparisons
- Human security experts (best-performing human per challenge)
- Human First Blood times on Hack The Box
- Other AI teams in live CTF competitions
- Multiple LLMs compared against each other
Scale
54 CTF challenges across 7 categories, 11 HTB machines, 20 challenges in AI vs Human CTF, 62 challenges in Cyber Apocalypse CTF 2025, plus real-world bug bounty testing on HackerOne and Bugcrowd targets
Contributions
- First open-source bug bounty-ready Cybersecurity AI framework validated through extensive testing with professional security researchers and bug bounty experts
- Novel four-level autonomy classification for cybersecurity AI (Manual, LLM-Assisted, Semi-automated, Cybersecurity AIs), with CAI being the first open-source Level 4 system
- International CTF-winning AI architecture that demonstrates human-competitive capabilities, up to 3,600x faster in specific tasks and 11x faster overall
- Comprehensive empirical evaluation of both closed- and open-weight LLMs for offensive cybersecurity, revealing significant discrepancies between vendor claims and actual capabilities
- Demonstration that non-security professionals can discover significant bugs (CVSS 4.3-7.5) at rates comparable to experts when augmented with CAI
Limitations
- Performance degrades significantly on hard and insane difficulty challenges; CAI was slower than humans on HTB machines overall (0.59x time ratio), excelling only at isolated CTF challenges rather than complex multi-step machine compromises
- Weak performance in binary exploitation (pwn, 0.77x) and cryptography (crypto, 0.47x) categories requiring deep mathematical understanding or complex exploitation techniques
- Fully autonomous operation remains premature; effective security operations still require human teleoperation for expertise, judgment, and oversight
- Current agentic patterns do not fully explore all web domains and attack surfaces provided, as noted by professional bug bounty hunters
- Struggles with advanced mitigations such as ASLR and stack canaries in binary exploitation scenarios
- Bug bounty at scale is infeasible without additional human supervision; the framework faces challenges maintaining quality when running many concurrent assessments
- Evaluation used best-performing human per challenge as baseline, which may not represent average human performance
- HTB machines proved significantly more challenging than isolated CTF challenges, suggesting the CTF benchmarks may overstate real-world effectiveness
Research Gaps
- Limited empirical evaluation comparing AI security tools against human experts under realistic conditions; most tools are evaluated only in synthetic environments
- Accessibility barriers: cutting-edge AI security tools are often proprietary and restricted to well-funded organizations
- Oligopolistic control of vulnerability discovery by a few bug bounty platforms using exclusive contracts and proprietary AI trained on researcher-submitted data
- LLM vendors systematically downplay offensive security capabilities through restricted benchmarks that avoid agentic instrumentation, creating dangerous security blind spots
- Lack of standardized security evaluation protocols for LLMs that incorporate full agentic evaluation with unrestricted tool access against real-world challenges
- Open-weight LLMs significantly underperform closed-weight models in cybersecurity tasks, with limited research on closing this gap through domain-specific fine-tuning
- Insufficient research on long-term planning and contextual adaptation needed for complex multi-stage penetration testing scenarios
Novel Techniques
- Four-level cybersecurity autonomy classification (Manual, LLM-Assisted, Semi-automated, Cybersecurity AIs) providing a taxonomy for the field
- Specialized agentic patterns (Red Team Agent, Bug Bounty Hunter, Blue Team Agent) with tailored tool sets, objectives, and handoff mechanisms for different security roles
- Pattern-based multi-agent architecture using Handoffs for inter-agent coordination in security workflows, inspired by OpenAI Swarm but adapted for cybersecurity
- Demonstrated that non-professionals with AI augmentation can find vulnerabilities at rates comparable to expert bug bounty hunters, validating the democratization thesis
- Empirical methodology for benchmarking AI vs humans in security using time ratios, cost ratios, and First Blood comparisons across difficulty levels and categories
Open Questions
- How can LLMs improve at complex multi-step exploitation chains that require long-term planning, contextual memory, and domain-specific reasoning?
- What domain-specific fine-tuning or knowledge representation would close the gap between open-weight and closed-weight LLMs for cybersecurity tasks?
- Can AI-driven bug bounty hunting truly scale to operate autonomously without degrading quality, or is human oversight fundamentally required?
- How should the security community establish standardized, transparent benchmarks for evaluating offensive AI capabilities that vendors cannot manipulate?
- What are the ethical and regulatory implications of democratizing offensive security capabilities through open-source AI frameworks?
- How can agentic patterns be improved to handle the full attack surface exploration that professional bug bounty hunters noted was lacking?
- What is the ceiling for AI performance on insane-difficulty challenges requiring novel exploitation techniques not well-represented in training data?
Builds On
- PentestGPT
- OpenAI Swarm library (agentic principles)
- LiteLLM
- Phoenix
- NYU CTF Bench
- AutoPT
- Vulnbot
- Cybersecurity Kill Chain (Hutchins et al.)
Open Source
Yes - https://github.com/aliasrobotics/cai (MIT license)