#69

ChainReactor: Automated Privilege Escalation Chain Discovery via AI Planning

Giulio De Pasquale, Ilya Grishchenko, Riccardo Iesari, Gabriel Pizarro, Lorenzo Cavallaro, Christopher Kruegel, Giovanni Vigna

2024 | USENIX Security Symposium (top-conference)

https://www.usenix.org/conference/usenixsecurity24/presentation/de-pasquale

system penetration-testing fully-autonomous classical-planning

Problem & Motivation

Current vulnerability research focuses on identifying individual bugs and exploits, but modern advanced attacks rely on sequences of steps (exploitation chains) combining individually benign actions and vulnerabilities to achieve goals like privilege escalation. The identification of such chains is predominantly manual and does not scale to the complexity of modern systems.

A single vulnerability is often insufficient to achieve an attacker's objectives because of anti-exploitation techniques such as control-flow integrity (CFI) and address space layout randomization (ASLR). Real-world attacks, like those demonstrated at Pwn2Own, chain multiple vulnerabilities with benign system capabilities. The complexity of modern operating systems hides subtle interactions among components, which makes manual chain discovery daunting and motivates automated approaches.

Threat Model

The attacker has already gained initial access to a target Unix system as an unprivileged user (e.g., via SSH or a web shell). The goal is to escalate privileges to root or to another user with higher privileges. The specific initial intrusion method is out of scope. The attacker can leverage available system executables, misconfigurations, and known CVE-affected binaries present on the system.

Methodology

ChainReactor models privilege escalation chain discovery as a classical AI planning problem. It automatically extracts information about available executables, system configurations, file permissions, and known vulnerabilities (CVEs) from the target system. This information is encoded into a Planning Domain Definition Language (PDDL) problem specification. A lifted AI planner then searches for action sequences that transition from the initial unprivileged state to a goal state (root shell or higher-privileged shell). The planner combines vulnerability exploits with benign system actions to discover multi-step chains.
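
The planning formulation above can be illustrated with a minimal sketch. It is not ChainReactor's planner: it is a plain breadth-first search over propositional (already grounded) facts, whereas the paper uses a lifted planner. The fact and action names are hypothetical, and delete effects are omitted for brevity.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    preconditions: frozenset  # facts that must hold before the action
    add_effects: frozenset    # facts that hold after the action


def find_chain(initial, goal, actions):
    """Breadth-first search over fact sets; returns a shortest plan or None."""
    start = frozenset(initial)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if goal <= state:  # every goal fact holds in this state
            return plan
        for action in actions:
            if action.preconditions <= state:
                successor = state | action.add_effects
                if successor not in seen:
                    seen.add(successor)
                    queue.append((successor, plan + [action.name]))
    return None


# Hypothetical toy host: a root cron job runs a script the user can write.
ACTIONS = [
    Action("write_cron_script",
           frozenset({"shell_as_user", "cron_script_writable"}),
           frozenset({"cron_script_controlled"})),
    Action("wait_for_root_cron",
           frozenset({"cron_script_controlled", "cron_runs_as_root"}),
           frozenset({"shell_as_root"})),
    Action("read_motd",  # benign action that does not help the attacker
           frozenset({"shell_as_user"}),
           frozenset({"motd_read"})),
]
INITIAL = {"shell_as_user", "cron_script_writable", "cron_runs_as_root"}
GOAL = frozenset({"shell_as_root"})

chain = find_chain(INITIAL, GOAL, ACTIONS)
print(chain)  # ['write_cron_script', 'wait_for_root_cron']
```

Neither step is a vulnerability on its own; the chain only appears when the search composes them, which is the property the planning formulation is meant to capture.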

Architecture

ChainReactor consists of three main components:
(1) An information-gathering module that runs on the target system and extracts executables, file permissions, user and group memberships, systemd service configurations, cron jobs, and CVE-affected binaries (using cve-bin-tool and the Ubuntu CVE database). It also integrates GTFOBins to map binaries to post-exploitation capabilities.
(2) A PDDL encoding module that translates the extracted information into PDDL domain predicates (e.g., user_group, file_owner, executable_can_write_to_file) and instantiates them as facts in the problem file. The domain defines actions such as spawning processes, writing files, exploiting SUID binaries, corrupting systemd services, and exploiting CVEs.
(3) A lifted planner (based on Powerlifted) that searches the PDDL state space without grounding, which lets it scale to large systems with many objects and predicates.
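
As an illustration of the encoding step, the sketch below renders extracted facts as a PDDL problem file. The predicate names user_group, file_owner, and executable_can_write_to_file come from the summary above; their argument order, the object types, the domain name `privesc`, and the goal predicate `controlled_user` are assumptions made for this example, not ChainReactor's actual domain.

```python
def fact(predicate, *args):
    """Render one ground PDDL fact, e.g. (file_owner backup_sh root)."""
    return "(" + " ".join((predicate, *args)) + ")"


def build_problem(name, domain, objects, facts, goal):
    """Assemble a PDDL problem; `objects` maps a type name to object names."""
    lines = [f"(define (problem {name})", f"  (:domain {domain})", "  (:objects"]
    for type_name, names in objects.items():
        lines.append(f"    {' '.join(names)} - {type_name}")
    lines.append("  )")
    lines.append("  (:init")
    lines.extend(f"    {item}" for item in facts)
    lines.append("  )")
    lines.append(f"  (:goal {goal})")
    lines.append(")")
    return "\n".join(lines)


# Hypothetical extraction results for a toy host.
objects = {
    "user": ["alice", "root"],
    "group": ["staff"],
    "file": ["backup_sh"],
    "executable": ["vim"],
}
facts = [
    fact("user_group", "alice", "staff"),
    fact("file_owner", "backup_sh", "root"),
    fact("executable_can_write_to_file", "vim", "backup_sh"),
]
problem_text = build_problem(
    "toy_host", "privesc", objects, facts, fact("controlled_user", "root")
)
print(problem_text)
```

File paths are mapped to identifier-like names such as `backup_sh` here because characters like `/` are not generally accepted in PDDL object names; a real encoder would need a reversible mapping.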

LLM Models

Tool Integration

Memory Mechanism

none

Attack Phases Covered

reconnaissance

scanning

enumeration

exploitation

post-exploitation

privilege escalation

lateral movement

reporting

Evaluation

On the CTF VMs, ChainReactor rediscovered all known privilege escalation chains, matching or extending the published walkthroughs. On real-world instances, it identified 16 zero-day privilege escalation chains on Amazon EC2 images and 4 on DigitalOcean images. Two AWS images were removed from the provider's offerings after the vulnerabilities were reported. LinPEAS detected individual misconfigurations but could not compose them into actionable chains. Grounded planners failed on most real-world instances because of memory explosion, while the lifted planner succeeded.
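
The memory explosion of grounded planners can be made concrete with a back-of-the-envelope count. A grounder instantiates each action schema for every combination of objects matching its parameter types, so the naive upper bound is the product of the per-type object counts. The action signature and object counts below are hypothetical, and real grounders prune unreachable instances, so this is an upper bound rather than a measurement from the paper.

```python
from math import prod


def ground_action_count(parameter_types, objects_per_type):
    """Naive upper bound on ground instances of one action schema."""
    return prod(objects_per_type[type_name] for type_name in parameter_types)


# Hypothetical host inventory and a hypothetical three-parameter action
# write_file(user, executable, file).
inventory = {"user": 20, "executable": 2_000, "file": 100_000}
instances = ground_action_count(["user", "executable", "file"], inventory)
print(f"{instances:,}")  # 4,000,000,000
```

A lifted planner avoids materializing these instances up front and instead matches action schemas against the current state during search.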

Environment

Metrics

Baseline Comparisons

LinPEAS (privilege escalation enumeration script)
Manual walkthrough solutions for CTF VMs
Grounded planners (Fast Downward) vs lifted planners (Powerlifted)

Scale

684 total instances (3 CTF VMs + 504 EC2 + 177 DigitalOcean)

Contributions

First automated approach for exploit chain discovery using AI planning, formulating privilege escalation as a PDDL planning problem.
Novel method for automated extraction of system programs, configurations, file permissions, CVEs, and GTFOBins capabilities, translating them into PDDL predicates and facts.
Evaluation on 684 real-world and CTF systems demonstrating the ability to rediscover known chains and identify 20 previously unreported zero-day privilege escalation chains.
Demonstration that planner outputs can be transformed into operational exploitation sequences, producing working exploits for all discovered chains.

Limitations

Does not cover the initial infiltration/access phase; assumes the attacker already has an unprivileged shell on the target.
Supports a limited set of PDDL actions; expanding the action library could discover more complex and subtle chains.
Does not automatically generate executable exploit code from plans; operationalization of chains currently requires manual translation.
Does not support Linux capabilities, network mounts, or PATH environment variable injection as exploitation primitives.
Grounded planners cannot scale to real-world system complexity because of memory explosion; the approach relies on lifted planners, which may be less efficient in some scenarios.
CVE detection depends on binary version matching (cve-bin-tool) and the Ubuntu CVE database, so it may miss vulnerabilities that are not catalogued in these sources.

Research Gaps

No existing automated tool can compose individual vulnerabilities and misconfigurations into actionable multi-step exploitation chains.
Privilege escalation enumeration tools (e.g., LinPEAS) identify individual issues but lack reasoning about how they combine into complete attack paths.
Classical vulnerability research focuses on single-bug discovery rather than understanding how vulnerabilities interact within exploitation chains.
The gap between identifying a vulnerability and understanding its real-world exploitability in the context of the full system state remains largely unaddressed.
Automated validation of discovered chains (end-to-end exploit execution) is still an open problem.

Novel Techniques

PDDL-based privilege escalation modeling: Encoding Unix system state (users, groups, file permissions, executables, CVEs) as PDDL predicates and privilege escalation as a planning goal, enabling automated chain discovery through classical AI planning.
Lifted planning for security: Using lifted planners that operate on abstract PDDL representations without grounding, overcoming the memory explosion problem that makes grounded planners infeasible for real-world system analysis.
GTFOBins integration: Automatically mapping system binaries to post-exploitation capabilities (file read/write, shell spawning, download/upload) using the GTFOBins database as a knowledge source for the planner.
Capability extraction pipeline: Automated gathering of system information (executables, SUID bits, file permissions, cron jobs, systemd services, CVE-affected binaries) and translation into PDDL facts.
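
One small piece of such an extraction pipeline can be sketched with the Python standard library: walking a directory tree and flagging regular files that are SUID or world-writable. This is an illustrative assumption about how permission facts might be collected, not ChainReactor's actual gathering script; the function names are hypothetical.

```python
import os
import stat


def describe_mode(mode):
    """Decode the permission bits that matter for privilege escalation."""
    return {
        "suid": bool(mode & stat.S_ISUID),
        "world_writable": bool(mode & stat.S_IWOTH),
        "executable": bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)),
    }


def scan_tree(root):
    """Map each SUID or world-writable regular file under root to its flags."""
    findings = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # unreadable or vanished entry; skip it
            if not stat.S_ISREG(info.st_mode):
                continue
            flags = describe_mode(info.st_mode)
            if flags["suid"] or flags["world_writable"]:
                findings[path] = flags
    return findings


# 0o104755 is a regular file with the SUID bit set and mode rwxr-xr-x.
print(describe_mode(0o104755))
```

Calling `scan_tree("/usr/bin")` on a typical Linux host would list binaries such as SUID executables; each finding could then be turned into a PDDL fact by an encoder.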

Open Questions

Can the PDDL planning approach be integrated with LLM-based agents to combine structured reasoning with natural language understanding for more comprehensive pentesting?
How can the action library be systematically expanded to cover more exploitation primitives without manual PDDL encoding?
Can the approach be extended to network-level attack chain discovery across multiple hosts?
How would the planning approach handle dynamic system states that change during exploitation?
Can reinforcement learning or LLMs be used to automatically generate PDDL domain models from vulnerability descriptions?

Builds On

PDDL planning (Fox and Long, 2003)
GTFOBins database
cve-bin-tool (Intel)
AI planning for network attacks (Hoffmann, 2015; Amos-Binks et al., 2017)
Powerlifted planner (Corrêa et al., 2020)
Attack graphs and attack trees literature