#25

Markov Game Modeling of Moving Target Defense for Strategic Detection of Threats in Cloud Networks Markov Game Modeling of Moving Target Defense for Strategic Detection of Threats in Cloud Networks

Ankur Chowdhary, Sailik Sengupta, Dijiang Huang, Subbarao Kambhampati

2019 | AAAI Workshop on Artificial Intelligence for Cyber Security (AICS) (workshop)

system defense fully-autonomous game-theoretic

PDF Preview 论文预览

Loading PDF... 加载 PDF 中...

Problem & Motivation 问题与动机

Deploying all possible security measures in large-scale cloud networks is prohibitively expensive in terms of computing and networking resources. The challenge is to optimally place a limited number of intrusion detection systems (IDS) across a cloud network to maximize security while minimizing performance degradation.

在大规模云网络中部署所有可能的安全措施，在计算和网络资源方面的成本高昂。挑战在于如何在云网络中优化放置有限数量的入侵检测系统 (IDS)，以在最小化性能下降的同时实现安全最大化。

Existing approaches treat multi-step attacks as single-step attacks, leading to sub-optimal placement of detection mechanisms. They also fail to account for the asymmetric performance impacts of placing multiple detection systems on the same host. Furthermore, static placement strategies are inherently insecure since attackers with reconnaissance will eventually learn and avoid them. This paper fills the gap by formulating the IDS placement problem as a zero-sum Markov Game that reasons about multi-stage attacks using attack graphs and CVSS scores.

现有方法将多步攻击视为单步攻击，导致检测机制的放置并非最优。它们也未能考虑在同一主机上放置多个检测系统所产生的非对称性能影响。此外，静态放置策略本质上是不安全的，因为拥有侦察能力的攻击者最终会学会并绕过它们。本文通过将 IDS 放置问题表述为一个零和马尔可夫博弈（Markov Game），并利用攻击图和 CVSS 评分对多阶段攻击进行推理，填补了这一空白。

Threat Model 威胁模型

The attacker is a stealthy adversary who can be in any node of the network and remain undetected until attempting to exploit a vulnerability. Both the defender and attacker have full observability of the state. A known list of attacks (CVEs) exists that cannot be fixed due to third-party restrictions. The attacker performs multi-hop monotonic attacks through the cloud network following paths in an attack graph.

攻击者是一个隐蔽的对手，可能位于网络的任何节点，并在尝试利用漏洞之前一直保持未被发现的状态。防御者和攻击者都拥有状态的完全可观测性。存在一个已知的攻击列表 (CVE)，由于第三方限制而无法修复。攻击者遵循攻击图中的路径，在云网络中执行多跳单调攻击。

Methodology 核心方法

The authors model the cloud network security problem as a two-player zero-sum Markov Game. States are derived from the attack graph of the cloud network, where sub-networks represent game states. Attacker actions correspond to exploiting known CVEs, while defender actions correspond to placing IDS (e.g., Snort, auditd) at specific network points. Rewards are computed using CVSS impact scores and performance degradation costs. The optimal defender strategy is obtained by computing the mixed min-max equilibrium of the Markov Game via value iteration.

作者将云网络安全问题建模为一个双人零和马尔可夫博弈。状态源自云网络的攻击图，其中子网代表博弈状态。攻击者动作对应于利用已知的 CVE，而防御者动作对应于在特定的网络点放置 IDS（如 Snort, auditd）。奖励是使用 CVSS 影响评分和性能下降成本计算得出的。通过值迭代计算马尔可夫博弈的混合最大最小均衡（mixed min-max equilibrium），从而获得最优防御策略。

Architecture 架构设计

The system takes as input an attack graph generated from the cloud network topology and known CVEs from the National Vulnerability Database (NVD). The attack graph is decomposed into states for the Markov Game. CVSS v2 vectors provide scoring for transitions and rewards. A value iteration algorithm computes the optimal mixed strategy (probability distribution over IDS placements) for the defender at each state.

系统输入由云网络拓扑生成的攻击图以及来自国家漏洞库 (NVD) 的已知 CVE。攻击图被分解为马尔可夫博弈的状态。CVSS v2 向量为转移和奖励提供评分。值迭代算法为防御者在每个状态计算最优混合策略（IDS 放置的概率分布）。

Tool Integration 工具集成

Memory Mechanism 记忆机制

none

Attack Phases Covered 覆盖的攻击阶段

reconnaissance

scanning

enumeration

exploitation

post exploitation

privilege escalation

lateral movement

reporting

Evaluation 评估结果

The optimal mixed strategy from the Markov Game formulation consistently outperforms both the Min-Max Pure Strategy and Uniform Random Strategy across all game states, with the improvement margin widening as the discount factor approaches 1 and as the sub-net size increases. In the real-world APT case study, the Markov Game strategy yielded better defender utility values than Uniform Random placement while respecting resource constraints of placing only two out of five possible IDS in a subnet.

马尔可夫博弈表述下的最优混合策略在所有博弈状态下的表现都一致优于最大最小纯策略和均匀随机策略，且随着折扣因子趋近于 1 以及子网规模的扩大，改进幅度进一步增大。在真实的 APT 案例研究中，马尔可夫博弈策略在遵守资源限制（在子网中仅放置 5 个可选 IDS 中的 2 个）的情况下，比均匀随机放置产生了更好的防御效用值。

Environment 评估环境

Metrics 评估指标

Baseline Comparisons 基准对比

Min-Max Pure Strategy (MMPS)
Uniform Random Strategy (URS)

Scale 评估规模

Small cloud network with 3 VMs (LDAP, FTP, Web Server) for preliminary evaluation; APT case study with 6 VMs (Firewall-IPFire, Windows 2012 R2, CentOS, Windows 7, Debian, another Debian) in a flat network

Contributions 核心贡献

An attack graph-based multi-stage attack analysis method leveraging Markov Game modeling to optimize the cost incurred and security provided by detection mechanisms in a multi-tenant cloud network
Demonstration that the Markov Game strategy for IDS placement outperforms static and randomization strategies, with the improvement margin widening as sub-net size increases and discount factor approaches one
A real-world case study showcasing the approach against Advanced Persistent Threats (APTs) using WRCCDC competition VM images

一种基于攻击图的多阶段攻击 analysis 方法，利用马尔可夫博弈建模来优化多租户云网络中检测机制产生的成本和提供的安全性。
证明了用于 IDS 放置的马尔可夫博弈策略优于静态和随机化策略，且改进幅度随着子网规模的扩大和折扣因子趋近于 1 而增大。
一个真实的案例研究，利用 WRCCDC 竞赛虚拟机镜像展示了该方法在对抗高级持续性威胁 (APT) 方面的表现。

Limitations 局限性

Assumes a zero-sum game, but in reality attacker and defender rewards may not be exactly opposite
Assumes full observability of the state by both players, which is unrealistic for many real scenarios
Limited to known vulnerabilities (CVEs) listed in NVD; does not handle zero-day attacks
The abstraction of attack graphs into Markov Game states may lose practically meaningful information
Scalability concerns: the min-max strategy computation grows exponentially with the defender action set size
Evaluation is limited to small-scale networks; large-scale cloud deployments are not tested
Transition probabilities rely on CVSS exploitability scores which may not accurately reflect real-world attack success rates

假设博弈为零和博弈，但现实中攻击者和防御者的奖励可能并不完全相反。
假设双方对状态具有完全可观测性，这在许多现实场景中是不切实际的。
局限于 NVD 中列出的已知漏洞 (CVE)；无法处理零日攻击。
将攻击图抽象为马尔可夫博弈状态可能会丢失具有实际意义的信息。
可扩展性问题：最大最小策略计算随着防御者动作集大小呈指数级增长。
评估局限于小规模网络；未在大型云部署中进行测试。
转移概率依赖于 CVSS 可利用性评分，这可能无法准确反映现实世界中的攻击成功率。

Research Gaps 研究空白

Extension to general-sum games where attacker and defender rewards are not exactly opposite
Relaxing the assumption that transition functions are accurately defined by normalized exploitability scores
Handling zero-day attacks that modify the underlying attack graph dynamically
Scaling the approach to large cloud networks with many subnets and heterogeneous services
Partial observability settings where the defender does not know the attacker's current state

扩展到攻击者和防御者奖励不完全相反的一般和博弈（general-sum games）。
放宽关于转移函数由归一化可利用性评分精确定义的假设。
处理动态修改底层攻击图的零日攻击。
将该方法扩展到具有许多子网和异构服务的大型云网络。
针对防御者不知道攻击者当前状态的部分可观测设置进行研究。

Novel Techniques 新颖技术

Using attack graph decomposition into Markov Game states where each state aggregates related attack graph nodes
Deriving game rewards from CVSS impact scores combined with IDS performance degradation costs
Mixed strategy equilibrium computation for dynamic IDS placement that prevents attacker from predicting defender moves
Modeling the defender's resource constraints by limiting the number of IDS per subnet independently

将攻击图分解为马尔可夫博弈状态，其中每个状态聚合了相关的攻击图节点。
结合 CVSS 影响评分和 IDS 性能下降成本来导出博弈奖励。
用于动态 IDS 放置的混合策略均衡计算，防止攻击者预测防御者的行动。
通过独立限制每个子网的 IDS 数量来建模防御者的资源约束。

Open Questions 开放问题

How does the approach scale to enterprise networks with hundreds or thousands of nodes?
Can the zero-sum assumption be relaxed to handle more realistic attacker-defender reward structures?
How should the model adapt when new zero-day vulnerabilities are discovered mid-game?
What is the impact of partial observability on the defender's optimal strategy?
How can this approach integrate with real-time network monitoring for online strategy adaptation?

该方法如何扩展到具有数百或数千个节点的企业网络？
能否放宽零和假设以处理更现实的攻击者-防御者奖励结构？
当博弈中途发现新的零日漏洞时，模型应如何调整？
部分可观测性对防御者的最优策略有何影响？
该方法如何与实时网络监控集成，以进行在线策略调整？

Builds On 基于前人工作

Jha, Sheyner, and Wing 2002 (formal attack graph analysis)
Chowdhary, Pisharody, and Huang 2016 (SDN-based scalable MTD)
Sengupta et al. 2018 (MTD for IDS placement in cloud)
Littman 1994 (Markov games for multi-agent RL)
Shapley 1953 (stochastic games)
Venkatesan et al. 2016 (MTD for disrupting stealthy botnets)