
Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments

Maria Rigaki, Carlos Catania, Sebastian Garcia

2024 | arXiv (preprint)

arXiv:2409.11276

Problem & Motivation

Commercial cloud-based LLMs used as cybersecurity agents raise privacy concerns, incur high costs, and require network connectivity. Smaller open-source models lack the capability to perform well in network security tasks without additional adaptation.


Companies are hesitant to send sensitive internal network data to cloud-based proprietary models. Open-source models are freely available and can run on-premises, promoting privacy, accessibility, and reproducibility, but they need fine-tuning to match the performance of larger commercial models in specialized cybersecurity tasks. Additionally, prior RL-based agents require extensive retraining for each new environment configuration, whereas LLM-based agents can generalize more readily.


Threat Model

The agent plays the role of a red-team attacker who has gained an initial foothold in a network and aims to discover and exfiltrate specific data to a command-and-control (C&C) server on the internet. A stochastic defender may be present that monitors for repeated or sequential scanning actions and can terminate episodes upon detection.


Methodology

The authors fine-tune Zephyr-7b-beta, a 7-billion-parameter open-source LLM based on Mistral-7B, using Quantized LoRA (QLoRA) supervised fine-tuning on a custom three-part cybersecurity dataset of 1641 question-answer pairs. The dataset addresses environment understanding (1080 samples generated by GPT-4, GPT-3.5-turbo, and Claude), valid action generation (450 samples from GPT-4), and good action selection (113 samples from GPT-4 game runs). The resulting model, Hackphyr, is deployed as an agent in the NetSecGame network security simulation environment and evaluated across three scenarios of increasing complexity.

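
The three dataset parts can be illustrated with hypothetical records; the field names and exact JSON schema are assumptions for illustration, not the published HuggingFace schema:

```python
# Illustrative samples for the three-part fine-tuning dataset.
# Field names are assumptions; see the HuggingFace release for the real format.

part1_sample = {  # Part I: environment understanding (1080 samples)
    "part": "environment_understanding",
    "question": "Which hosts in the current state have known services?",
    "answer": "Host 192.168.1.2 exposes ssh and http; no other services are known.",
}

part2_sample = {  # Part II: valid action generation (450 samples)
    "part": "valid_action_generation",
    "question": "Produce a valid ScanNetwork action for network 192.168.1.0/24.",
    "answer": '{"action": "ScanNetwork", "parameters": {"target_network": "192.168.1.0/24"}}',
}

part3_sample = {  # Part III: good action selection (113 samples from GPT-4 game runs)
    "part": "good_action_selection",
    "question": "Given the state and memory, choose the next best action.",
    "answer": '{"action": "ExfiltrateData", "parameters": {"target_host": "<C&C server>"}}',
}

dataset = [part1_sample, part2_sample, part3_sample]
```

Each part targets one weakness of the base model, which is why the ablation study (see Contributions) can attribute performance to individual parts.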

Architecture

The agent architecture follows a four-component design: Profile (handcrafted penetration tester personality in the prompt), Memory (unified short-term memory of the last k actions and current environment state presented in the prompt), Planning (ReAct framework combining reasoning and action selection with in-context learning examples), and Action (task completion based on plan and environment feedback). The agent receives the current game state as a structured text representation and outputs JSON-formatted actions.

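
A minimal sketch of one agent step under this design, assuming a mock model reply: the prompt assembly mirrors the Profile/Memory/Planning/Action components, while `K`, the helper names, and the JSON shape are assumptions for illustration:

```python
import json
from collections import deque

K = 5  # number of past actions kept in short-term memory (value is an assumption)

def build_prompt(profile, memory, state, examples):
    """Assemble the ReAct-style prompt: persona, in-context examples,
    the structured text state, and the last K actions."""
    past = "\n".join(f"- {a}" for a in memory) or "(no actions yet)"
    return (
        f"{profile}\n\nExamples:\n{examples}\n\n"
        f"Current state:\n{state}\n\nPast actions:\n{past}\n\n"
        "Respond with a single JSON action."
    )

def parse_action(llm_output):
    """Extract the JSON-formatted action from the model's reply;
    unparseable or schema-violating output counts as an invalid action."""
    try:
        action = json.loads(llm_output)
        return action if isinstance(action, dict) and "action" in action else None
    except json.JSONDecodeError:
        return None

memory = deque(maxlen=K)  # unified short-term memory of the last K actions
reply = '{"action": "ScanNetwork", "parameters": {"target_network": "192.168.2.0/24"}}'
action = parse_action(reply)
if action is not None:
    memory.append(reply)  # remembered and fed into the next prompt
```

The bounded deque reflects the paper's unified short-term memory: only the last K actions re-enter the prompt, so early-episode steps run with an empty memory.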

LLM Models

Zephyr-7b-beta, GPT-4, GPT-3.5-turbo, Claude

Tool Integration

NetSecGame-actions

Memory Mechanism

conversation-history

Attack Phases Covered

reconnaissance
scanning
enumeration
exploitation
post exploitation
privilege escalation
lateral movement
reporting

Evaluation

Hackphyr achieved a 94% win rate in the small scenario (vs. 100% for GPT-4, 50% for GPT-3.5-turbo, and 30.46% for base Zephyr), 89.10% in the full unseen scenario (vs. 100% for GPT-4 and 30% for GPT-3.5-turbo), and 50.34% in the complex three-subnets scenario (vs. 82.35% for GPT-4, 10% for GPT-3.5-turbo, and 0% for Q-learning). Hackphyr consistently outperformed GPT-3.5-turbo, the base Zephyr model, and the Q-learning baselines, while approaching GPT-4 performance despite being orders of magnitude smaller.


Environment

NetSecGame

Metrics

win-rate, average-return, average-steps, good-action-rate, valid-action-rate, invalid-action-rate
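
These metrics can be computed from per-episode records along the following lines; the record fields are assumptions for illustration, not NetSecGame's actual log format:

```python
def summarize(episodes):
    """Aggregate the evaluation metrics over a set of episodes.
    Each record is assumed to look like:
      {"won": bool, "return": float, "steps": int,
       "valid_actions": int, "good_actions": int, "total_actions": int}
    """
    n = len(episodes)
    total = sum(e["total_actions"] for e in episodes)
    valid = sum(e["valid_actions"] for e in episodes)
    return {
        "win-rate": sum(e["won"] for e in episodes) / n,
        "average-return": sum(e["return"] for e in episodes) / n,
        "average-steps": sum(e["steps"] for e in episodes) / n,
        "valid-action-rate": valid / total,
        "good-action-rate": sum(e["good_actions"] for e in episodes) / total,
        "invalid-action-rate": 1 - valid / total,
    }

# Toy records: one won episode, one lost episode.
episodes = [
    {"won": True, "return": 80.0, "steps": 10,
     "valid_actions": 9, "good_actions": 6, "total_actions": 10},
    {"won": False, "return": -20.0, "steps": 20,
     "valid_actions": 15, "good_actions": 5, "total_actions": 20},
]
metrics = summarize(episodes)
```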

Baseline Comparisons

  • GPT-4
  • GPT-3.5-turbo
  • Zephyr-7b-beta
  • Q-learning
  • Random-agent

Scale

3 network scenarios (small, full, three-subnets), each evaluated over 150 episodes with a maximum of 100 steps per episode

Contributions

  • A locally deployed fine-tuned 7B parameter model (Hackphyr) that achieves performance comparable to GPT-4 and outperforms GPT-3.5-turbo, base Zephyr-7b-beta, and Q-learning agents in network security environments
  • A novel three-part cybersecurity fine-tuning dataset (1641 samples) covering environment understanding, valid action generation, and good action selection, published on HuggingFace
  • An ablation study demonstrating that the good action generation component (Part III) of the dataset is the most critical for agent performance, and that Part I (environment understanding) combined with Part III yields the best results
  • A detailed behavioral analysis using action transition matrices and graphs comparing GPT-4, Hackphyr, and base Zephyr-7b-beta agents, revealing planning patterns and shortcomings

Limitations

  • Even a 7B model requires a strong GPU for inference (a V100 with 32 GB of VRAM was used), and quantization to reduce memory usage comes at the cost of performance degradation
  • The same prompt was used across all models for fair comparison, but model-specific prompt optimization could improve individual performance
  • Hackphyr still generates more invalid actions than GPT-4, particularly at the start of episodes when short-term memory is empty
  • Performance degrades significantly in the most complex three-subnets scenario (50% win rate vs. GPT-4's 82%), indicating limitations in multi-hop reasoning and generalization to unseen topologies
  • The evaluation is limited to the NetSecGame simulated environment, which uses high-level abstract actions rather than real network tools
  • The fine-tuning dataset was partially generated by commercial LLMs (GPT-4, Claude), creating a dependency on those models for training data creation

Research Gaps

  • Optimizing prompts per model using techniques like DSPy to adapt to each model's specific strengths
  • Extending the approach to defending agents and multi-agent collaborative settings (attackers and defenders)
  • Balancing computational efficiency with performance through better quantization or distillation methods
  • Evaluating fine-tuned open-source models in more realistic and diverse network security environments beyond simulations
  • Addressing the lack of long-term memory between episodes, which could enable continual learning across engagements
  • Incorporating defender-awareness into the agent's planning to reduce detection in scenarios with active defenders

Novel Techniques

  • Three-part structured fine-tuning dataset design targeting distinct weaknesses of the base model: environment understanding, syntactic/semantic action validity, and strategic action quality
  • Using distilled Supervised Fine-Tuning (dSFT) where stronger models (GPT-4, Claude) generate training data for a smaller model in a cybersecurity-specific context
  • Behavioral analysis via action transition matrices to compare agent planning quality, revealing that base Zephyr has high transition probability to invalid actions from all states while Hackphyr learns the correct attack sequence
  • Intrinsic reward mechanism where the agent evaluates each action as helpful or not based on whether the environment state changed
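
The transition-matrix analysis and the intrinsic "helpful action" signal can be sketched as follows; the toy traces and action names are illustrative, loosely following the NetSecGame-style actions:

```python
from collections import Counter, defaultdict

def transition_matrix(episodes):
    """Estimate P(next action type | current action type) from episode
    traces: the kind of matrix used to compare agent planning behavior."""
    counts = defaultdict(Counter)
    for trace in episodes:
        for cur, nxt in zip(trace, trace[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }

def intrinsic_reward(prev_state, new_state):
    """Self-evaluated signal: an action counts as helpful
    if it changed the environment state."""
    return 1 if new_state != prev_state else 0

# Toy traces: a coherent attack sequence vs. an agent that
# drifts into invalid actions from every state.
good = [["ScanNetwork", "FindServices", "ExploitService",
         "FindData", "ExfiltrateData"]] * 3
bad = [["ScanNetwork", "InvalidAction", "InvalidAction"]] * 3
```

On the toy data, the "good" agent's matrix concentrates probability mass on the correct attack sequence, while the "bad" agent shows high transition probability into invalid actions, the qualitative contrast the paper reports between Hackphyr and base Zephyr.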

Open Questions

  • How well would Hackphyr transfer to real-world network environments with actual security tools rather than abstract actions?
  • Can the fine-tuning approach scale to larger open-source models (e.g., 70B parameters) for further performance gains?
  • Would incorporating explicit defender-avoidance strategies in the training data improve performance against active defenders?
  • How does performance change with different base models beyond the Mistral/Zephyr family?
  • Can the agent develop long-term strategic planning for multi-step attack chains spanning more than three network hops?

Builds On

  • Out of the cage: How stochastic parrots win in cyber security environments (Rigaki et al. 2024, ICAART)
  • Zephyr: Direct Distillation of LM Alignment (Tunstall et al. 2023)
  • LoRA (Hu et al. 2022)
  • ReAct (Yao et al. 2023)

Open Source

Yes - dataset published on HuggingFace

Tags