Papers
68 papers
Type
Year
Scope
Framework
Automation
#01 system 2024
PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing
multi-agent human-in-the-loop
#02 system 2026
What Makes a Good LLM Agent for Real-world Penetration Testing?
single-agent fully-autonomous
#03 system 2025
Automated Penetration Testing with LLM Agents and Classical Planning
single-agent fully-autonomous
#04 system 2025
xOffense: An AI-driven autonomous penetration testing framework with offensive knowledge-enhanced LLMs and multi agent systems
multi-agent fully-autonomous
#05 system 2025
AutoPenGPT: Highly automated penetration testing framework based on LLM
multi-agent semi-autonomous
#06 system 2024
AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks
multi-agent fully-autonomous
#07 system 2024
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?
multi-agent fully-autonomous
#08 system 2025
AutoPentester: An LLM Agent-based Framework for Automated Pentesting
multi-agent fully-autonomous
#09 system 2025
VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework
multi-agent fully-autonomous
#10 system 2025
Pentest-R1: Towards Autonomous Penetration Testing Reasoning Optimized via Two-Stage Reinforcement Learning
single-agent fully-autonomous
#11 system 2025
PentestAgent: Incorporating LLM Agents to Automated Penetration Testing
multi-agent semi-autonomous
#12 system 2026
PTFusion: LLM-driven context-aware knowledge fusion for web penetration testing
hierarchical fully-autonomous
#13 system 2025
RapidPen: Fully Automated IP-to-Shell Penetration Testing with LLM-based Agents
single-agent fully-autonomous
#14 system 2025
ARACNE: An LLM-Based Autonomous Shell Pentesting Agent
multi-agent fully-autonomous
#15 system 2025
PwnGPT: Automatic Exploit Generation Based on Large Language Models
single-agent fully-autonomous
#16 system 2025
PenTest++: Elevating Ethical Hacking with AI and Automation
human-in-the-loop semi-autonomous
#17 system 2024
Using LLMs to Automate Threat Intelligence Analysis Workflows in Security Operation Centers
single-agent fully-autonomous
#18 system 2024
Hacking, The Lazy Way: LLM Augmented Pentesting
single-agent copilot
#19 system 2025
Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks
hierarchical fully-autonomous
#20 system 2024
HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing
single-agent fully-autonomous
#21 system 2025
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
multi-agent fully-autonomous
#22 system 2025
CAI: An Open, Bug Bounty-Ready Cybersecurity AI
multi-agent semi-autonomous
#23 survey 2025
On the Surprising Efficacy of LLMs for Penetration-Testing
fully-autonomous
#24 system 2013
POMDPs Make Better Hackers: Accounting for Uncertainty in Penetration Testing
single-agent fully-autonomous
#25 system 2019
Markov Game Modeling of Moving Target Defense for Strategic Detection of Threats in Cloud Networks
fully-autonomous
#26 empirical-study 2021
Modeling Penetration Testing with Reinforcement Learning Using Capture-the-Flag Challenges: Trade-offs between Model-free Learning and A Priori Knowledge
single-agent fully-autonomous
#27 system 2021
CybORG: A Gym for the Development of Autonomous Cyber Agents
single-agent fully-autonomous
#28 benchmark 2024
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security
single-agent fully-autonomous
#29 benchmark 2024
AutoPenBench: Benchmarking Generative Agents for Penetration Testing
single-agent fully-autonomous
#30 benchmark 2025
VAP-6: A Benchmarking Framework on Vulnerability Assessment and Penetration Testing for Language Models
copilot
#31 system 2005
MulVAL: A Logic-based Network Security Analyzer
fully-autonomous
#32 defense 2025
Cloak, Honey, Trap: Proactive Defenses Against LLM Agents
multi-agent fully-autonomous
#33 system 2024
Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks
single-agent fully-autonomous
#34 survey 2025
Forewarned is Forearmed: A Survey on Large Language Model-based Agents in Autonomous Cyberattacks
multi-agent fully-autonomous
#35 survey 2025
AI in Penetration Testing: A Systematic Mapping Study
fully-autonomous
#36 system 2024
Automated Penetration Testing: Formalization and Realization
single-agent fully-autonomous
#37 survey 2025
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design
fully-autonomous
#38 system 2023
Getting Pwn'd by AI: Penetration Testing with Large Language Models
single-agent semi-autonomous
#39 benchmark 2025
Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements
multi-agent human-in-the-loop
#40 empirical-study 2024
Generative AI for pentesting: the good, the bad, the ugly
human-in-the-loop human-in-the-loop
#41 system 2024
BreachSeek: A Multi-Agent Automated Penetration Tester
multi-agent fully-autonomous
#42 system 2024
Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments
single-agent fully-autonomous
#43 system 2025
CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution
multi-agent fully-autonomous
#44 system 2025
Measuring and Augmenting Large Language Models for Solving Capture-the-Flag Challenges
single-agent fully-autonomous
#45 system 2025
Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges
single-agent fully-autonomous
#46 benchmark 2025
CyberExplorer: Benchmarking LLM Offensive Security Capabilities in a Real-World Attacking Simulation Environment
multi-agent fully-autonomous
#47 empirical-study 2026
Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios
single-agent fully-autonomous
#48 system 2026
Context Relay for Long-Running Penetration-Testing Agents
single-agent fully-autonomous
#49 system 2026
Towards Cybersecurity Superintelligence: from AI-guided humans to human-guided AI
multi-agent fully-autonomous
#50 system 2025
LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild
single-agent fully-autonomous
#51 position-paper 2025
To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack
fully-autonomous
#52 system 2025
RedTeamLLM: an Agentic AI framework for offensive security
single-agent fully-autonomous
#53 benchmark 2025
HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities
single-agent fully-autonomous
#54 system 2025
Cyber-Zero: Training Cybersecurity Agents Without Runtime
single-agent fully-autonomous
#55 empirical-study 2026
LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks
single-agent fully-autonomous
#56 system 2025
EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
single-agent fully-autonomous
#57 system 2025
Multi-Agent Penetration Testing AI for the Web
multi-agent fully-autonomous
#58 benchmark 2025
PentestEval: Benchmarking LLM-based Penetration Testing with Modular and Stage-Level Design
single-agent fully-autonomous
#59 system 2025
RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models
human-in-the-loop human-in-the-loop
#60 system 2025
Incalmo: An Autonomous LLM-assisted System for Red Teaming Multi-Host Networks
hierarchical fully-autonomous
#61 system 2025
AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents
multi-agent semi-autonomous
#62 empirical-study 2024
LLM Agents can Autonomously Exploit One-day Vulnerabilities
single-agent fully-autonomous
#63 benchmark 2025
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
single-agent fully-autonomous
#64 survey 2024
SoK: A Comparison of Autonomous Penetration Testing Agents
single-agent fully-autonomous
#65 empirical-study 2024
An Empirical Evaluation of LLMs for Solving Offensive Security Challenges
single-agent fully-autonomous
#66 system 2024
PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation
multi-agent fully-autonomous
#67 empirical-study 2023
Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions
human-in-the-loop
#68 survey 2020
An Empirical Survey of Functions and Configurations of Open-Source Capture the Flag (CTF) Environments
human-in-the-loop