RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models

Hanzheng Dai, Yuanliang Li, Jun Yan, Zhibo Zhang

2025 | PRIME AI (workshop)

system penetration-testing human-in-the-loop FSM

Problem & Motivation

Existing LLM-based automated penetration testing (AutoPT) frameworks underperform human experts because of imbalanced training knowledge, short-sighted planning, hallucinated commands, and the lack of any mechanism for learning from previous failures.

Penetration testing (PT) is inherently trial-and-error, but existing frameworks lack persistent memory, self-reflection, and structured domain knowledge, which prevents them from adapting their PT strategies over time. A knowledge-informed, self-reflective framework is needed to close these gaps.

Threat Model

Authorized penetration testing on target machines within a controlled environment (HackTheBox). The framework assumes a black-box testing perspective where the tester has network access but no prior internal knowledge of the target.

Methodology

RefPentester is a knowledge-informed, self-reflective AutoPT framework with five components: Process Navigator (determines the current PT stage and retrieves high-level knowledge via RAG), Generator (produces detailed PT guidance), Reflector (evaluates and scores execution results and reports failure reasons), Success Log (long-term memory), and Failure Log (short-term memory). It models the PT process as a seven-state Stage Machine to track progress across PT phases and uses a three-tier hierarchical vector database built from MITRE ATT&CK and the OWASP Testing Guide.
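
The Stage Machine can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not name the seven states, so the sketch takes the five stages reported in the evaluation and adds a hypothetical START and TERMINAL state. It also assumes a linear stage order and a single terminal state, reached either after Capture the Flag or after five failed iterations in one state (the per-state cap noted under Limitations). `Stage`, `StageMachine`, and `step` are illustrative names.

```python
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical names: the five stages reported in the evaluation plus an
    # assumed START and TERMINAL state, giving seven states in total.
    START = auto()
    INFORMATION_GATHERING = auto()
    VULNERABILITY_IDENTIFICATION = auto()
    EXPLOITATION = auto()
    POST_EXPLOITATION = auto()
    CAPTURE_THE_FLAG = auto()
    TERMINAL = auto()


# Assumed linear order; Enum members iterate in definition order.
ORDER = list(Stage)
MAX_ITERATIONS_PER_STATE = 5  # per-state cap noted under Limitations


class StageMachine:
    """Tracks the current PT stage and the attempts spent in it (sketch)."""

    def __init__(self) -> None:
        self.stage = Stage.START
        self.attempts = 0

    def step(self, stage_goal_met: bool) -> Stage:
        """Record one iteration: advance on success, stop after the cap."""
        if self.stage is Stage.TERMINAL:
            return self.stage
        self.attempts += 1
        if stage_goal_met:
            # Move to the next stage; CAPTURE_THE_FLAG advances to TERMINAL.
            self.stage = ORDER[ORDER.index(self.stage) + 1]
            self.attempts = 0
        elif self.attempts >= MAX_ITERATIONS_PER_STATE:
            # Too many failed iterations in one state: give up.
            self.stage = Stage.TERMINAL
        return self.stage


# Usage: leave START, then fail five times in Information Gathering.
sm = StageMachine()
sm.step(True)
for _ in range(MAX_ITERATIONS_PER_STATE):
    sm.step(False)
print(sm.stage)  # Stage.TERMINAL
```

Collapsing success and exhaustion into one TERMINAL state mirrors the summary's wording; a real implementation might well distinguish the two outcomes.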

Architecture

Five-component architecture: (1) Process Navigator uses a PT Stage Machine and a RAG pipeline to identify the current PT stage and retrieve hierarchical knowledge (tactics, techniques, abstract actions); (2) Generator produces step-by-step operational PT guidance; (3) Reflector assigns rewards (0, 1, or 2) and identifies failure reasons; (4) Success Log maintains long-term memory of successful experiences; (5) Failure Log stores short-term failure experiences for reflection. The components run as separate LLM sessions chained together, with each session's output feeding the next.
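
A sketch of how the Reflector's reward could route experiences into the two logs. The mapping of scores to outcomes is an assumption inferred from the description of the reward mechanism (it separates high-level from low-level knowledge errors), not a confirmed detail. Here 2 means the step succeeded, 1 means a low-level (command or guidance) error sent back to the Generator, and 0 means a high-level (tactic or technique) error sent back to the Process Navigator. Clearing the Failure Log when the stage advances is likewise a hypothetical reading of "short-term memory". `Experience`, `Memory`, and `route_reward` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    stage: str
    guidance: str
    result: str
    reason: str = ""  # failure reason supplied by the Reflector


@dataclass
class Memory:
    success_log: list[Experience] = field(default_factory=list)  # long-term
    failure_log: list[Experience] = field(default_factory=list)  # short-term

    def on_stage_advance(self) -> None:
        # Assumption: failures matter only while their stage is still active.
        self.failure_log.clear()


def route_reward(reward: int, exp: Experience, memory: Memory) -> str:
    """Store the experience and say what happens next (hypothetical mapping)."""
    if reward == 2:
        memory.success_log.append(exp)
        return "continue"           # step succeeded
    if reward == 1:
        memory.failure_log.append(exp)
        return "reflect:generator"  # low-level error: redo the detailed guidance
    if reward == 0:
        memory.failure_log.append(exp)
        return "reflect:navigator"  # high-level error: revisit tactic/technique
    raise ValueError(f"reward must be 0, 1, or 2, got {reward}")
```

Under this split, the Success Log accumulates across the whole engagement, while the Failure Log carries only the recent mistakes that the next Generator or Navigator call should avoid.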

LLM Models

GPT-4o

Tool Integration

Memory Mechanism

dual-memory

Attack Phases Covered

reconnaissance

scanning

enumeration

exploitation

post exploitation

privilege escalation

lateral movement

reporting

Evaluation

RefPentester achieved a 100% credential capture rate (6/6 flags) across three iterations on HTB's Sau machine, versus 83.3% (5/6 flags) for GPT-4o, a gain of 16.7 percentage points. It also achieved higher PT stage transition success rates at every stage: Information Gathering (80% vs. 61.5%), Vulnerability Identification (87.5% vs. 35.7%), Exploitation (52.9% vs. 36.7%), Post-Exploitation (71.4% vs. 29.1%), and Capture the Flag (100% vs. 83.3%).
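
The headline gap is an absolute difference in percentage points. Relative to the baseline, the same gap is a 20% improvement. The arithmetic below uses only the figures reported above; the variable names are illustrative.

```python
# Flag counts reported above (RefPentester vs. GPT-4o, out of 6 flags).
flags_ref, flags_base, flags_total = 6, 5, 6

ref = 100.0 * flags_ref / flags_total    # 100.0
base = 100.0 * flags_base / flags_total  # 83.33...
pp_gap = ref - base                      # absolute gap, percentage points
rel_gain = 100.0 * pp_gap / base         # gain relative to the baseline

print(f"capture rate: {ref:.1f}% vs {base:.1f}% -> "
      f"+{pp_gap:.1f} pp (+{rel_gain:.1f}% relative)")

# Stage transition success rates reported above, in percent.
stage_rates = {
    "Information Gathering": (80.0, 61.5),
    "Vulnerability Identification": (87.5, 35.7),
    "Exploitation": (52.9, 36.7),
    "Post-Exploitation": (71.4, 29.1),
    "Capture the Flag": (100.0, 83.3),
}
gaps = {stage: r - b for stage, (r, b) in stage_rates.items()}
for stage, gap in gaps.items():
    print(f"{stage}: +{gap:.1f} pp")
```

The largest per-stage gap is in Vulnerability Identification, while Exploitation shows the smallest.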

Environment

HackTheBox (Sau machine)

Metrics

Credential (flag) capture rate; PT stage transition success rate

Baseline Comparisons

GPT-4o

Scale

One HackTheBox machine (Sau); three iterations each for RefPentester and the GPT-4o baseline

Contributions

Proposed RefPentester, a knowledge-informed self-reflective AutoPT framework with five components (Process Navigator, Generator, Reflector, Success Log, Failure Log) that assists human operators through the PT process
Constructed a three-tier hierarchical PT knowledge vector database from MITRE ATT&CK and OWASP Testing Guide, capable of providing structured knowledge at tactic, technique, and abstract action levels
Designed a seven-state PT Stage Machine to model the penetration testing process and guide stage identification and transitions
Demonstrated through a case study on HTB's Sau machine that RefPentester outperforms baseline GPT-4o in both credential capture rate and stage transition success rate

Limitations

Evaluation limited to a single HackTheBox machine (Sau) with only three iterations, so the results carry little statistical weight
Human-in-the-loop design requires a human operator to execute commands on the target machine and relay results, limiting full automation
No ablation study to determine which component contributes most to performance improvements
Only evaluated with GPT-4o; no comparison with other LLMs or existing AutoPT frameworks such as PentestGPT or AutoAttacker
PT knowledge database is static and does not dynamically update with emerging threats
The Stage Machine allows at most five iterations per state before transitioning to the terminal state, which may be insufficient for complex targets
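
To make the statistical-weight limitation concrete, a two-sided Fisher exact test can be run on the flag counts (6/6 vs. 5/6). This is a rough illustration, not an analysis from the paper: it treats each flag as an independent trial, which overstates independence because flags captured within the same run are related. `fisher_exact_two_sided` is a hypothetical helper written from the textbook definition of the test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two systems; columns are (flags captured, flags missed).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def p_table(x: int) -> float:
        # Hypergeometric probability that row 1 holds x captures, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [p_table(x) for x in range(lo, hi + 1)]
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


# RefPentester 6 captured / 0 missed vs. GPT-4o 5 captured / 1 missed.
print(fisher_exact_two_sided(6, 0, 5, 1))
```

A p-value of 1.0 here means the flag counts alone give no evidence that the two systems differ. If SciPy is available, `scipy.stats.fisher_exact([[6, 0], [5, 1]])` should report the same p-value.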

Research Gaps

Need for evaluation across diverse experimental environments and multiple target machines
Ablation studies needed to determine the contribution of individual framework components
Dynamic knowledge integration pipelines to address emerging threats are not yet developed
Integration of RLHF to enhance reflection capabilities remains unexplored
Hybrid approaches combining LLM-based frameworks with traditional penetration testing tools need exploration
Ethical compliance integration into automated PT frameworks is an open challenge

Novel Techniques

Seven-state PT Stage Machine for modeling and tracking penetration testing progress with formal state transitions
Dual-memory system (Success Log as long-term memory, Failure Log as short-term memory) for maintaining PT context and enabling reflection
Three-tier hierarchical PT knowledge representation (tactics, techniques, abstract actions) stored in a vector database with RAG retrieval
Reward-based reflection mechanism (scores 0, 1, 2) that distinguishes between high-level and low-level knowledge errors to direct reflection to the appropriate component
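
A minimal sketch of coarse-to-fine retrieval over the three tiers follows. It rests on several assumptions. A bag-of-words cosine similarity stands in for the embedding model and vector database. The knowledge entries are hypothetical toy text written in the style of ATT&CK tactic and technique names, not records from the real database. The greedy top-1 descent (best tactic, then best technique within it, then best abstract action) is also an assumption, not necessarily how RefPentester's RAG pipeline queries its store. `KB`, `embed`, and `retrieve` are illustrative names.

```python
import math
import re
from collections import Counter

# Toy three-tier knowledge base: tactic -> technique -> abstract actions.
# Hypothetical entries; the real database is built from MITRE ATT&CK and the
# OWASP Testing Guide and stores learned embeddings.
KB = {
    "Reconnaissance: gather information about the target network and services": {
        "Active Scanning: probe hosts and open ports": [
            "scan all TCP ports and identify service versions",
            "enumerate web directories on discovered http services",
        ],
    },
    "Initial Access: gain a foothold through an exposed service": {
        "Exploit Public-Facing Application: abuse a vulnerable web application": [
            "search for known CVEs matching the identified service version",
            "test injection points in web request parameters",
        ],
    },
    "Privilege Escalation: obtain higher-level permissions on the host": {
        "Abuse Elevation Control Mechanism: misuse sudo rules or setuid binaries a user can run": [
            "list commands the current user may run with sudo",
            "look for setuid binaries with known escalation paths",
        ],
        "Exploitation for Privilege Escalation: exploit vulnerable software running with elevated privileges": [
            "check the kernel version against known local exploits",
        ],
    },
}


def embed(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def best(query_vec: Counter, candidates) -> str:
    """Pick the candidate text most similar to the query."""
    return max(candidates, key=lambda c: cosine(query_vec, embed(c)))


def retrieve(query: str) -> tuple[str, str, str]:
    """Descend tactic -> technique -> abstract action, one tier at a time."""
    q = embed(query)
    tactic = best(q, KB)
    technique = best(q, KB[tactic])
    action = best(q, KB[tactic][technique])
    return tactic, technique, action


for level in retrieve("the user can run a binary with sudo, how to get root permissions"):
    print(level)
```

Descending tier by tier keeps each comparison small and guarantees that the abstract action returned belongs to the chosen technique and tactic, at the cost of never recovering from a wrong choice at a higher tier.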

Open Questions

How well does the framework generalize to machines of varying difficulty and different attack surfaces beyond web-based targets?
Can the Stage Machine be made adaptive rather than pre-defined to handle novel PT scenarios?
How does the framework scale with the number of PT iterations and the complexity of the target environment?
What is the optimal balance between human involvement and autonomous operation in LLM-based PT?

Builds On

Reflexion
PentestGPT
AutoAttacker
DRLRM-PT
RAG