LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild
Problem & Motivation 问题与动机
LLM-based autonomous hacking agents represent a growing threat to cybersecurity, but limited empirical data exists regarding their actual use in real-world attack scenarios. There is a gap in understanding the current threat landscape and developing effective defenses against AI-driven attacks.
基于大语言模型(LLM)的自主黑客智能体对网络安全构成了日益严重的威胁,但关于它们在真实攻击场景中实际使用的实证数据非常有限。目前在理解当前的威胁格局以及开发针对 AI 驱动攻击的有效防御措施方面存在空白。
Recent developments show LLM-based agents can perform complex offensive security tasks, but evaluations have been limited to controlled environments (e.g., CTFs, benchmarks). This paper fills the gap by providing empirical evidence of LLM-based hacking agents operating in the wild, serving as an early warning system about the current level of AI-driven threats.
最近的发展表明,基于 LLM 的智能体可以执行复杂的进攻性安全任务,但评估仅限于受控环境(如 CTF、基准测试)。本文通过提供在野外运行的基于 LLM 的黑客智能体的实证证据,填补了这一空白,作为关于当前 AI 驱动威胁水平的早期预警系统。
Threat Model 威胁模型
Autonomous LLM-based agents that connect to publicly accessible SSH services and attempt to exploit them. The system distinguishes between three actor types: traditional software bots, LLM-based agents, and human attackers. The assumption is that software bots cannot pass human-like prompt injection tests, and humans cannot respond as quickly as LLMs.
自主的基于 LLM 的智能体连接到公开可访问的 SSH 服务并尝试利用它们。该系统区分了三类行为体:传统软件机器人、基于 LLM 的智能体和人类攻击者。假设软件机器人无法通过类人提示注入(prompt injection)测试,而人类的响应速度无法像 LLM 那样快。
Methodology 核心方法
The authors deploy deliberately vulnerable SSH honeypot servers augmented with embedded prompt injections and time-based analysis techniques to detect and monitor autonomous LLM hacking agents in the wild. The system uses a multi-step detection pipeline: first, prompt injections embedded in banner messages, command outputs, and system files attempt to hijack the agent's goal (e.g., via 'Important Message' attacks); then, timing analysis of response intervals distinguishes LLM agents (which respond in under 1.7 seconds) from human operators (who take significantly longer). Cross-referencing behavioral responses to prompt injections with timing data reduces false positives.
作者部署了故意留有漏洞的 SSH 蜜罐服务器,并增强了嵌入式提示注入和基于时间的分析技术,以检测和监控野外自主的 LLM 黑客智能体。该系统采用多步检测流程:首先,在横幅消息、命令输出和系统文件中嵌入提示注入,试图劫持智能体的目标(例如,通过“重要消息”攻击);然后,通过对响应间隔的时间分析,将 LLM 智能体(响应时间在 1.7 秒以内)与人类操作员(响应时间显著更长)区分开。通过将对提示注入的行为响应与时间数据进行交叉验证,减少了误报。
Architecture 架构设计
Built on a customized Cowrie SSH honeypot with modifications including a vulnerable SSH version, altered default users, and modified standard outputs. Prompt injections are integrated into SSH banner messages, command outputs (e.g., uname), and accessible system files. The detection pipeline proceeds: attacker connects -> command output with embedded prompt injection (goal hijacking via 'Important Message') -> timing analysis -> if agent responds, attempt prompt stealing to extract system instructions -> re-verification and classification. A public dashboard at https://ai-honeypot.palisaderesearch.org/ provides real-time metrics.
基于定制的 Cowrie SSH 蜜罐构建,修改包括易受攻击的 SSH 版本、修改默认用户和标准输出。提示注入被集成到 SSH 横幅消息、命令输出(如 uname)和可访问的系统文件中。检测流程如下:攻击者连接 -> 带有嵌入式提示注入的命令输出(通过“重要消息”进行目标劫持) -> 时间分析 -> 如果智能体有响应,尝试进行提示窃取以提取系统指令 -> 重新验证和分类。公开发布的仪表盘(https://ai-honeypot.palisaderesearch.org/)提供实时指标。
LLM Models 使用的大模型
Tool Integration 工具集成
Memory Mechanism 记忆机制
none
Attack Phases Covered 覆盖的攻击阶段
Evaluation 评估结果
Over a three-month public deployment, the honeypot recorded 8,130,731 interaction attempts and identified 8 potential AI-driven hacking attempts. The 'Important Message' prompt injection technique achieved 100% success rate for goal hijacking against both ReAct and CoT agent frameworks using GPT-4o. LLM agents responded within 1.7 seconds on average, while human attackers took over 10 seconds, validating timing analysis as a discrimination method.
在为期三个月的公开部署中,蜜罐记录了 8,130,731 次交互尝试,并识别出 8 个潜在的 AI 驱动的黑客尝试。“重要消息”提示注入技术对 GPT-4o 的 ReAct 和 CoT 智能体框架都实现了 100% 的目标劫持成功率。LLM 智能体平均在 1.7 秒内响应,而人类攻击者则需要 10 秒以上,验证了时间分析作为一种区分方法的可行性。
Environment 评估环境
Metrics 评估指标
Baseline Comparisons 基准对比
- AgentDojo prompt injection findings
Scale 评估规模
10 honeypot IP addresses deployed across various countries, recording 8,130,731 total interactions over ~3 months
Contributions 核心贡献
- Deploy deliberately vulnerable SSH honeypot servers with embedded prompt injections specifically designed to detect LLM-based hacking agents in the wild
- Implement a multi-step detection methodology combining active behavioral manipulation (prompt injection) and passive timing analysis to distinguish LLM agents from humans and software bots
- Provide empirical evidence from a public deployment demonstrating that LLM-based hacking attempts exist in the real world, identifying 8 potential AI-driven attacks among over 8 million interactions
- 部署了故意留有漏洞的 SSH 蜜罐服务器,其中嵌入了专门设计的提示注入,用于检测野外基于 LLM 的黑客智能体
- 实现了一种多步检测方法,结合了主动行为操纵(提示注入)和被动时间分析,以区分 LLM 智能体与人类及软件机器人
- 提供了来自公开部署的实证证据,证明野外确实存在基于 LLM 的黑客尝试,在 800 多万次交互中识别出 8 个潜在的 AI 驱动攻击
Limitations 局限性
- Limited internet coverage: the honeypot monitors only a narrow slice of global attack traffic across 10 IPs and cannot generalize to all regions or networks
- Blind spots in high-level targets: focused on publicly accessible SSH services, missing attackers targeting specialized, state-level, or closed-off infrastructures
- Focus limited to fully autonomous agents; does not detect AI-enhanced tools (e.g., AI-assisted fuzzing) or human-in-the-loop AI usage
- Novelty of detection methods: prompt injection and timing analysis techniques are relatively new and have only been tested in a limited deployment; robustness against evasion strategies is unconfirmed
- Very small number of detected AI agents (8 potential, 1 confirmed) limits the statistical significance of findings
- 互联网覆盖范围有限:蜜罐仅监控了 10 个 IP 地址,涵盖全球攻击流量的一小部分,无法推广到所有地区或网络
- 对高级目标的盲区:侧重于公开访问的 SSH 服务,无法涵盖针对专业、国家级或封闭基础设施的攻击者
- 焦点仅限于完全自主的智能体;无法检测 AI 增强工具(如 AI 辅助模糊测试)或人机协作(human-in-the-loop)的 AI 使用
- 检测方法的新颖性:提示注入和时间分析技术相对较新,仅在有限的部署中进行了测试;对抗逃逸策略的稳健性尚未确认
- 检测到的 AI 智能体数量非常少(8 个潜在,1 个确认),限制了研究结果的统计学意义
Research Gaps 研究空白
- Need for larger-scale and longer-duration deployments to capture more AI-driven attacks and build statistically significant datasets
- Detection methods need to be tested against adversarial evasion strategies (e.g., agents that intentionally add delays to mimic human timing)
- Expanding honeypot attack surfaces beyond SSH to web applications, databases, email services, social media, and industrial control systems
- Identifying distinctive behavioral patterns and strategies used by different LLM agent frameworks in real-world attacks
- Developing detection for human-in-the-loop and AI-assisted (non-autonomous) attack scenarios
- 需要更大规模和更长时间的部署,以捕获更多 AI 驱动的攻击并建立具有统计学意义的数据集
- 检测方法需要针对对抗逃逸策略进行测试(例如,智能体故意增加延迟以模仿人类的时间特征)
- 将蜜罐攻击面从 SSH 扩展到 Web 应用程序、数据库、电子邮件服务、社交媒体和工业控制系统
- 识别不同 LLM 智能体框架在真实攻击中使用的独特行为模式和策略
- 开发针对人机协作和 AI 辅助(非自主)攻击场景的检测方法
Novel Techniques 新颖技术
- Using prompt injection as a detection mechanism rather than an attack vector - embedding prompts in honeypot outputs to identify LLM agents
- Two-stage prompt injection pipeline: goal hijacking first (via 'Important Message' attack), then prompt stealing to extract agent system instructions
- Timing analysis as a distinguishing feature: LLM agents respond in under 1.7 seconds vs. humans taking 10+ seconds
- Embedding prompt injections across multiple system touchpoints (banners, command outputs, filesystem) to maximize detection surface
- 将提示注入作为一种检测机制而非攻击向量——在蜜罐输出中嵌入提示以识别 LLM 智能体
- 两阶段提示注入流水线:首先劫持目标(通过“重要消息”攻击),然后进行提示窃取以提取智能体的系统指令
- 将时间分析作为区分特征:LLM 智能体响应时间在 1.7 秒以下,而人类需要 10 秒以上
- 在多个系统接触点(横幅、命令输出、文件系统)嵌入提示注入,以最大化检测表面
Open Questions 开放问题
- Can LLM agents be modified to evade timing-based detection by introducing artificial delays?
- What proportion of real-world cyberattacks are currently AI-driven vs. human-driven?
- How effective would these detection techniques be against more sophisticated or fine-tuned hacking agents?
- Can prompt injection detection be extended to non-SSH protocols and web-based attack surfaces?
- LLM 智能体能否通过引入人为延迟来绕过基于时间的检测?
- 目前真实世界的网络攻击中,AI 驱动与人类驱动的比例是多少?
- 这些检测技术对更复杂或经过微调的黑客智能体有多大效果?
- 提示注入检测能否扩展到非 SSH 协议和基于 Web 的攻击面?
Builds On 基于前人工作
- Cowrie SSH honeypot
- AgentDojo
- Advanced Cowrie Configuration
- Honeynet Project
Open Source 开源信息
Partial - public dashboard at https://ai-honeypot.palisaderesearch.org/ but honeypot system code not released