Paper deep dive
Distributed Equilibrium-Seeking in Target Coverage Games via Self-Configurable Networks under Limited Communication
Jayanth Bhargav, Zirui Xu, Vasileios Tzoumas, Mahsa Ghasemi, Shreyas Sundaram
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 94%
Last extracted: 3/22/2026, 5:02:26 AM
Summary
The paper presents a distributed, communication-aware framework for solving target coverage games between a team of sensing agents (defender) and an adaptive attacker. By exploiting the submodularity of the utility function and leveraging distributed bandit-submodular optimization, the authors propose a method that allows agents to self-configure communication neighborhoods under bandwidth constraints, achieving an approximate Nash equilibrium without requiring a central coordinator.
Entities (5)
Relation Signals (3)
Distributed Target Coverage Game → exhibitsproperty → Submodularity
confidence 95% · Exploiting the submodularity property of the game's utility function
Sensing Agents → playsgameagainst → Attacker
confidence 95% · We model this interaction as a zero-sum game between the sensing team (known as the defender) and the attacker.
Sensing Agents → seeks → Nash Equilibrium
confidence 90% · The sensing agents aim to maximize f, while the attacker aims to minimize f, which results in a zero-sum game.
Cypher Suggestions (2)
Find all agents involved in the target coverage game. · confidence 90% · unvalidated
MATCH (a:Agent)-[:PLAYS_GAME_AGAINST]->(b:Agent) RETURN a, b
Identify properties associated with the game model. · confidence 90% · unvalidated
MATCH (g:GameModel)-[:EXHIBITS_PROPERTY]->(p:Property) RETURN g.name, p.name
Abstract
We study a target coverage problem in which a team of sensing agents, operating under limited communication, must collaboratively monitor targets that may be adaptively repositioned by an attacker. We model this interaction as a zero-sum game between the sensing team (known as the defender) and the attacker. However, computing an exact Nash equilibrium (NE) for this game is computationally prohibitive as the action space of the defender grows exponentially with the number of sensors and their possible orientations. Exploiting the submodularity property of the game's utility function, we propose a distributed framework that enables agents to self-configure their communication neighborhoods under bandwidth constraints and collaboratively maximize the target coverage. We establish theoretical guarantees showing that the resulting sensing strategies converge to an approximate NE of the game. To our knowledge, this is the first distributed, communication-aware approach that scales effectively for games with combinatorial action spaces while explicitly incorporating communication constraints. To this end, we leverage the distributed bandit-submodular optimization framework and the notion of Value of Coordination that were introduced in [1]. Through simulations, we show that our approach attains near-optimal game value and higher target coverage compared to baselines.
Tags
Links
- Source: https://arxiv.org/abs/2603.17335v1
- Canonical: https://arxiv.org/abs/2603.17335v1
Full Text
86,258 characters extracted from source content.
Distributed Equilibrium-Seeking in Target Coverage Games via Self-Configurable Networks under Limited Communication Jayanth Bhargav1∗, Zirui Xu2∗, Vasileios Tzoumas2†, Mahsa Ghasemi1†, Shreyas Sundaram1† ∗These authors contributed equally to this work.†These authors contributed equally to this work.1Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA; jbhargav,mahsa, sundara2@purdue.edu2Department of Aerospace Engineering, University of Michigan, Ann Arbor, MI 48109 USA; ziruixu,vtzoumas@umich.eduThis work was supported in part by the National Science Foundation (NSF) CAREER Award No. 2337412, the Army Research Office (ARO) Early Career Program Award W911NF-25-1-0280, the Office of Naval Research (ONR) and Saab, Inc. under the Threat and Situational Understanding of Networked Online Machine Intelligence (TSUNOMI) program (grant no. N00014-23-C-1016) and the National Aeronautics and Space Administration (NASA) under grant 80NSSC24M0070. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF, ARO, ONR, Saab, Inc., and/or NASA. Abstract We study a target coverage problem in which a team of sensing agents, operating under limited communication, must collaboratively monitor targets that may be adaptively repositioned by an attacker. We model this interaction as a zero-sum game between the sensing team (known as the defender) and the attacker. However, computing an exact Nash equilibrium (NE) for this game is computationally prohibitive as the action space of the defender grows exponentially with the number of sensors and their possible orientations. Exploiting the submodularity property of the game’s utility function, we propose a distributed framework that enables agents to self-configure their communication neighborhoods under bandwidth constraints and collaboratively maximize the target coverage. 
We establish theoretical guarantees showing that the resulting sensing strategies converge to an approximate NE of the game. To our knowledge, this is the first distributed, communication-aware approach that scales effectively for games with combinatorial action spaces while explicitly incorporating communication constraints. To this end, we leverage the distributed bandit-submodular optimization framework and the notion of Value of Coordination that were introduced in [24]. Through simulations, we show that our approach attains near-optimal game value and higher target coverage compared to baselines. I Introduction In today’s evolving landscape of security and surveillance, effective deployment of sensors is critical to ensure comprehensive monitoring and target coverage [12]. Game-theoretic frameworks are widely used to model adversarial sensing and coverage problems. Many existing formulations model the coordination among sensing agents through a centralized decision-making process [4]. However, in practice, large surveillance areas and limited communication make it infeasible to have a central coordinator [23, 1, 6]. These challenges instead necessitate distributed coordination, in which agents selectively communicate with local neighbors and adapt their local actions to maximize overall team performance. In this work, we consider a target coverage task where a team of sensing agents (defender) must coordinate to cover targets deployed in an environment. The utility function of the defender, for a given target deployment by the attacker, is the fraction of targets the agents collectively cover. This utility function is submodular—a property that captures diminishing returns due to overlapping coverage among agents [10]. This structure enables the use of efficient approximation algorithms for utility maximization [3]. 
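The diminishing-returns structure is what makes simple greedy selection effective for coverage. As a concrete illustration (invented names and data, not code from the paper; the per-sensor constraint makes this a partition-matroid greedy sketch rather than the paper's distributed method), a one-shot greedy orientation picker:

```python
def greedy_coverage(coverage_by_orientation, k):
    """Greedily pick up to k (sensor, orientation) choices by marginal gain.

    coverage_by_orientation: dict mapping (sensor, orientation) -> set of
    covered targets. Each sensor may be assigned at most one orientation.
    Returns the chosen pairs and the set of covered targets.
    """
    covered, chosen, used_sensors = set(), [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for (s, o), targets in coverage_by_orientation.items():
            if s in used_sensors:
                continue  # one orientation per sensor
            gain = len(targets - covered)  # marginal gain shrinks as overlap grows
            if gain > best_gain:
                best, best_gain = (s, o), gain
        if best is None:
            break  # no remaining choice adds coverage
        chosen.append(best)
        used_sensors.add(best[0])
        covered |= coverage_by_orientation[best]
    return chosen, covered
```

Submodularity is exactly why `gain` can only shrink as `covered` grows, which is what the approximation guarantees cited above exploit.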
However, leveraging submodularity becomes more challenging when the utility value changes over time due to the attacker adaptively repositioning the targets. Figure 1: Distributed Target Coverage Game. The team of sensing agents chooses communication neighbors and orientations to maximize target coverage. Simultaneously, the attacker selects a target deployment strategy to achieve the least coverage for the sensing agents. The algorithms ActSel, NeiSel, and their interaction framework via the Value of Coordination follow [24]. We model this interaction as a zero-sum game between the sensing team and an adaptive attacker. Then, at Nash equilibrium (NE), neither the sensing team nor the attacker can improve their objective through unilateral deviation, yielding sensing strategies that are robust to such strategic adaptations. Solving this game for the exact NE strategies, however, can become computationally intractable due to the combinatorial growth of the defender's action space with the number of sensors and their orientations, as well as the need to propagate global information without a central coordinator. To address this, we propose a distributed, scalable, and communication-efficient framework for approximately solving this game. Our approach draws on and unifies ideas from online learning, game theory, and distributed bandit-submodular optimization. Related Work. Several recent works study large zero-sum games, including analyses of equilibrium dynamics [14] and algorithms for estimating equilibria [13, 18, 21, 17]. In [13, 18], the authors study subnetwork games with Lipschitz continuous utilities and require a strongly connected network to ensure convergence to an NE. In [21], the authors present a multilinear extension-based technique with an approximation guarantee in a centralized setting. On the other hand, the regret minimization approach of [17] does not provide guarantees on the quality of the resulting strategies.
In contrast to these works, we establish near-optimality guarantees in a distributed setting without requiring connectivity assumptions on the communication network. Related work also focuses on solving distributed submodular optimization problems in real time, which is challenging since such problems are known to be NP-hard [9]. Although existing algorithms achieve near-optimal solutions in polynomial time, they often need significant runtime to terminate in distributed settings due to limited communication speeds and bandwidths [1, 19, 15]. In distributed settings, multiple works have focused on online submodular maximization in unknown (dynamic) environments [20, 23]. They study these in a bandit feedback setting, and provide bounded suboptimality guarantees with respect to optimal time-(in)variant actions through regret minimization [2]. However, these works (i) can require extensive communication rounds as in [1, 15, 19], and (ii) require a connected network for information propagation, which is not always feasible [22]. Contributions. We propose a distributed framework for a submodular target coverage game where a set of sensing agents coordinate their orientations to maximize the target coverage, while the attacker adaptively relocates the deployed targets to minimize the coverage (Algorithm 1). Our framework enables a near-optimal coordination strategy for sensing agents that simultaneously configure communication neighborhoods under limited bandwidth budgets and coordinate orientations, assuming a no-regret attacker that can learn the best response. To this end, we leverage the distributed bandit-submodular optimization framework in [24] and extend it to the game-theoretic setting. The framework can asymptotically achieve an approximate NE of the game. During the repeated play of the game, sensing agents alternate between selecting their orientations and neighbors to adapt to the time-varying target deployments by the attacker.
In particular, our framework has the following key properties.
Performance Guarantees: It enables the distributed team of agents to asymptotically achieve an approximate NE. Despite the communication bandwidth constraints, the agents achieve near-optimal performance by each actively adapting its neighborhood over time, enabled by optimizing its Value of Coordination (VoC), a quantity introduced in [24].
Anytime Self-Configuration: Our framework enables each agent to choose its actions and neighbors individually using local information, enabling seamless adaptation to near-optimal orientations and neighborhoods as agents join or leave the network (e.g., sensor failures or additions).
Finally, through simulations, we demonstrate that the sensing agents achieve game utility arbitrarily close to the NE value and higher target coverage compared to baselines.
II Problem Formulation
In this section, we formalize the Distributed Adversarial Target Coverage Game and lay down the framework for the defender, the attacker, and the game utility. To this end, we use the following notation: 𝒜_𝒱 ≜ ∏_{i∈𝒱} 𝒜_i is the cross product of the sets 𝒜_i; [T] ≜ {1, …, T} for any positive integer T; f(a | 𝒜) ≜ f(𝒜 ∪ {a}) − f(𝒜) is the marginal gain of a set function f: 2^Ω → ℝ for adding a ∈ Ω to 𝒜 ⊆ Ω; and |𝒜| is the cardinality of a discrete set 𝒜.
Communication network. The communication network is denoted by 𝒢 = (𝒱, ℰ), where 𝒱 is the set of nodes/agents and ℰ refers to the edges between agents.
ℰ is unspecified a priori; instead, the agents must form and optimize this communication network internally to coordinate their orientation actions and maximize their joint utility.
Communication neighborhood. When a communication channel exists from agent j to agent i, i.e., (j → i) ∈ ℰ, then i can receive, store, and process information from j. The set of all agents that i receives information from is denoted by 𝒩_i, agent i's neighborhood. When we say that an agent exchanges information, we mean that it communicates the set of targets it currently covers under its chosen orientation (e.g., by encoding and sharing unique identifiers for targets).
Communication constraints. Each agent i can receive information from up to α_i other agents due to onboard bandwidth constraints; thus, it must hold that |𝒩_i| ≤ α_i. Also, we denote by ℳ_i the set of agents that have agent i within reach; not all agents may have agent i within reach because of distance or obstacles. Therefore, agent i can pick its neighbors by choosing at most α_i agents from ℳ_i. Evidently, 𝒩_i ⊆ ℳ_i.
Defender's action space. For each sensing agent i ∈ 𝒱, we denote the agent's sensing action space by the set of feasible orientations 𝒜_i. An action a_i ∈ 𝒜_i corresponds to a specific orientation of sensor i. The joint action of the sensing team is given by 𝐚 = (a_1, …, a_{|𝒱|}) ∈ 𝒜_𝒱. We denote a mixed strategy of the defender by x ∈ Δ_{|𝒳|}, where 𝒳 = 𝒜_𝒱 is the set of joint actions, and Δ denotes a probability simplex.
Attacker's action space. The attacker places a fixed number of targets according to one of finitely many deployment configurations 𝒴 = {b_1, …, b_m}, where each b_i ∈ 𝒴 is a distinct spatial arrangement/deployment of these targets in the environment. We denote the mixed strategy of the attacker by y ∈ Δ_{|𝒴|}, where Δ denotes a probability simplex.
Game utility. The game utility f(𝐚, b), for a pure-strategy pair (𝐚, b), is the fraction of targets covered by the sensing team with the joint orientation 𝐚 under the target deployment configuration b. The team of sensing agents aims to maximize f, while the attacker aims to minimize f, which results in a zero-sum game. We denote the game matrix by G ∈ ℝ^{|𝒳|×|𝒴|}, where G_{ij} = f(𝐚_i, b_j) for 𝐚_i ∈ 𝒳 and b_j ∈ 𝒴. With the notation introduced above, we now formalize the Distributed Target Coverage Game below.
Problem 1 (Distributed Target Coverage Game). The zero-sum game between the team of sensing agents and the attacker is given by the following optimization problem:
max_{x∈Δ_{|𝒳|}} min_{y∈Δ_{|𝒴|}} xᵀGy = max_{x∈Δ_{|𝒳|}} min_{y∈Δ_{|𝒴|}} 𝔼_{𝐚∼x, b∼y}[f(𝐚, b)].  (1)
The solution to the zero-sum game in Problem 1, i.e., the NE strategies, can be obtained by solving eq. (1).
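For small instances, eq. (1) can be solved exactly as a linear program, which gives a useful ground-truth baseline for the approximate methods the paper develops. A minimal sketch (the use of scipy is an assumption for illustration; the paper does not prescribe an implementation, and this is exactly the approach that stops scaling once |𝒳| grows combinatorially):

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(G):
    """Exact NE strategy for the row (max) player of a zero-sum matrix game.

    Solves max_x min_y x^T G y via the standard LP reformulation:
    maximize v subject to (G^T x)_j >= v for every column j, x in the simplex.
    Tractable only for small games; serves as a ground-truth baseline.
    """
    m, n = G.shape
    # Variables: [x_1, ..., x_m, v]; linprog minimizes, so use c = -e_v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # v <= (G^T x)_j for all j  <=>  -G^T x + v <= 0
    A_ub = np.hstack([-G.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0          # x sums to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]  # mixed strategy x, game value v
```

On a 2x2 matching-pennies matrix this recovers the uniform strategy and game value 0, as expected.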
As stated earlier, exact NE computation is impractical due to the combinatorial action space of the defender (e.g., 10 sensors with 4 orientations yield 4^10 joint actions) and the absence of a central coordinator. This motivates the design of an efficient, distributed algorithm that relies only on local coordination between the agents and achieves near-optimal performance relative to the centralized NE. In zero-sum games, near-optimality is formalized through the notion of an ε-Nash equilibrium.
Definition 1 (ε-Nash Equilibrium (ε-NE) [4]). A pair of mixed strategies (x⋆, y⋆) is an ε-Nash equilibrium of the zero-sum game with payoff matrix G if
x⋆ᵀGy⋆ ≥ xᵀGy⋆ − ε, ∀x ∈ Δ_{|𝒳|}, and x⋆ᵀGy⋆ ≤ x⋆ᵀGy + ε, ∀y ∈ Δ_{|𝒴|}.
In other words, an ε-NE ensures that neither player can improve their payoff by more than ε relative to the game value (NE value) through unilateral deviation.
III Scalable Equilibrium Approximation
In this section, we introduce our main algorithm, which estimates an ε-NE of the Distributed Target Coverage Game. We begin by characterizing key structural properties of the game's utility function.
III-A Preliminaries
Definition 2 (Normalized and Non-Decreasing Submodular Set Function [10]).
A set function g: 2^Ω → ℝ is normalized and non-decreasing submodular if and only if:
• (Normalization) g(∅) = 0;
• (Monotonicity) g(𝒜) ≤ g(ℬ), ∀𝒜 ⊆ ℬ ⊆ Ω;
• (Submodularity) g(s | 𝒜) ≥ g(s | ℬ), ∀𝒜 ⊆ ℬ ⊆ Ω, s ∈ Ω.
Definition 3 (2nd-order Submodularity [11]). A set function g: 2^Ω → ℝ is 2nd-order submodular if and only if
g(s | 𝒞) − g(s | 𝒜 ∪ 𝒞) ≥ g(s | ℬ ∪ 𝒞) − g(s | 𝒜 ∪ ℬ ∪ 𝒞),  (2)
for any disjoint 𝒜, ℬ, 𝒞 ⊆ Ω (𝒜 ∩ ℬ ∩ 𝒞 = ∅) and s ∈ Ω.
For a target deployment configuration b ∈ 𝒴, we define f(a | 𝒜, b) ≜ f(𝒜 ∪ {a}, b) − f(𝒜, b) to be the marginal gain of the game utility function for adding a to 𝒜, given b. Then, for a given b ∈ 𝒴, the game's utility function f(⋅, b) is normalized, monotone, submodular, and 2nd-order submodular over the set of sensing actions (sensor orientations), with submodularity arising from overlapping coverage areas among the sensing agents [6]. We leverage these properties to guide our algorithm design.
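The diminishing-returns condition of Definition 2 is easy to verify numerically on toy instances. A brute-force checker (illustrative only; the coverage example and all names are invented, not from the paper), applied to a small coverage utility of the same shape as the game's f(⋅, b):

```python
from itertools import combinations

def marginal(f, a, S):
    """Marginal gain f(a | S) = f(S ∪ {a}) - f(S)."""
    return f(S | {a}) - f(S)

def is_submodular(f, ground):
    """Brute-force check of the diminishing-returns condition (Definition 2).

    Enumerates all nested subset pairs A ⊆ B and checks
    f(s | A) >= f(s | B) for every s outside B. Only feasible for tiny
    ground sets (2^n subsets).
    """
    subsets = [set(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for s in ground - B:
                if marginal(f, s, A) < marginal(f, s, B) - 1e-12:
                    return False
    return True

# Toy coverage utility: fraction of targets covered, as in the paper's game.
coverage = {1: {"t1", "t2"}, 2: {"t2", "t3"}, 3: {"t3"}}
targets = {"t1", "t2", "t3"}
f = lambda S: (len(set().union(*(coverage[i] for i in S))) / len(targets)
               if S else 0.0)
```

Here `is_submodular(f, {1, 2, 3})` holds because overlapping coverage can only shrink marginal gains, whereas a supermodular function such as `len(S)**2` fails the check.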
Input: number of time steps T; neighbor candidate sets ℳ_i, neighborhood sizes α_i, and action sets 𝒜_i for every sensing agent i ∈ 𝒱; attacker's action set 𝒴; game utility function f.
Output: a pair of ε-NE strategies (x̄_T, ȳ_T).
1: 𝒩_{i,0} ← ∅, ∀i ∈ 𝒱;
2: η_b ← √(2 log|𝒴| / (|𝒴| T));
3: w_1 ← [w_{1,1}, …, w_{|𝒴|,1}]ᵀ with w_{b,1} = 1, ∀b ∈ 𝒴;
4: q_1 ← w_1 / ‖w_1‖_1;
5: for each time step t ∈ [T] do
6:   sample target deployment configuration b_t ∈ 𝒴 from q_t;
7:   (p_{i,t}, a_{i,t}) ← ActSel(b_{1:t−1}, T, 𝒜_i, f), ∀i ∈ 𝒱;
8:   𝒩_{i,t} ← NeiSel(a_{i,t}, b_{1:t−1}, T, ℳ_i, α_i, f), ∀i ∈ 𝒱;
9:   receive neighbors' actions {a_{j,t}}_{j∈𝒩_{i,t}}, ∀i ∈ 𝒱;
10:  update ActSel (per lines 6–8 of Algorithm 2) and NeiSel (per lines 6–11 of Algorithm 3);
11:  r_{b,t} ← f({a_{i,t}}_{i∈𝒱}, b_t);
12:  r̂_{b,t} ← 1 − [1(b_t = b) / p_{b,t}] (1 − r_{b,t}), ∀b ∈ 𝒴;
13:  w_{b,t+1} ← w_{b,t} exp(−η_b r̂_{b,t});
14:  q_{t+1} ← w_{t+1} / ‖w_{t+1}‖_1;
15: end for
16: estimate the joint strategy p̄_t ≜ Π_{i∈𝒱} p_{i,t}, ∀t ∈ [T];
17: x̄_T ← (1/T) Σ_{t∈[T]} p̄_t; ȳ_T ← (1/T) Σ_{t∈[T]} q_t.
Algorithm 1: Repeated Play Dynamics of the Distributed Target Coverage Game.
III-B Main Algorithm
Algorithm 1 outlines the iterative interaction between the attacker and the defender. Following standard game-theoretic modeling, we assume that the attacker is a rational agent that follows a no-regret learning strategy [8]. Such learning dynamics are known to converge to NE in zero-sum games [7]. Under this modeling, the interaction between the sensing agents and the attacker induces an iterative game-play dynamic. We show that, within this dynamic, the decision-making problems faced by the attacker and each sensing agent can be naturally formulated as adversarial multi-armed bandits (A-MAB) [2], because the agents and the attacker are unaware of each other's actions a priori. From the defender's perspective, at each round the attacker selects a target deployment configuration, and the sensing team must choose an orientation strategy to maximize target coverage. When the attacker adaptively changes the deployment over time, the sensing team must learn an orientation-selection policy that maximizes the expected coverage utility. This setting corresponds to an A-MAB problem, where each arm represents an orientation strategy and the reward at each round is a function of the game's utility based on the target deployment strategy chosen by the attacker in that round. Similarly, from the attacker's perspective, the sensing team's adaptive orientation choices induce an adversarially evolving environment. The attacker then seeks a target deployment policy that minimizes the expected coverage achieved by the sensing team. This also constitutes an A-MAB problem, where the arms correspond to target deployment configurations and the reward is defined by the negative of the game's utility function, which the attacker seeks to maximize.
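The attacker's side of this dynamic (lines 6 and 11–14 of Algorithm 1) is a standard EXP3 update. A minimal sketch of one round, where `utility_of` is a hypothetical stand-in for observing the coverage f(𝐚_t, b) of the played deployment, and `weights` maps each deployment to its current positive weight:

```python
import math
import random

def exp3_step(weights, eta, utility_of):
    """One EXP3 round for the attacker (who minimizes the coverage utility f).

    Sample a deployment b_t from the normalized weights, observe the utility
    only for the played arm (bandit feedback), form an importance-weighted
    loss-style estimate r_hat, and update the weights multiplicatively.
    """
    total = sum(weights.values())
    probs = {b: w / total for b, w in weights.items()}
    arms = list(probs)
    b_t = random.choices(arms, weights=[probs[b] for b in arms])[0]
    r = utility_of(b_t)  # observed coverage for the played arm only
    for b in arms:
        # r_hat = 1 - 1{b == b_t} * (1 - r) / p_b: low coverage (good for the
        # attacker) pushes the played arm's estimated loss down, boosting it.
        r_hat = 1.0 - (1.0 - r) / probs[b] if b == b_t else 1.0
        weights[b] *= math.exp(-eta * r_hat)
    return b_t, probs
```

Unplayed arms all receive the same exp(−η) factor, which cancels under normalization, so only the played arm's relative weight actually moves.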
The attacker follows the classical no-regret learning strategy EXP3 [2] by updating a distribution over target deployment strategies against a sequence of defender actions. The defending team operates in a fully distributed manner, with each sensing agent maintaining two A-MAB instances: (i) ActSel (Algorithm 2), for selecting sensing actions; and (ii) NeiSel (Algorithm 3), for selecting communication neighbors (see Appendix for details). We show that this decomposition is highly beneficial, since under limited information exchange each agent can only evaluate its local utility based on the subset of agents with which it communicates. Consequently, an agent's utility is inherently determined by its communication neighborhood. As a result, choosing communication neighbors 𝒩_{i,t} ⊆ ℳ_i has a direct and non-trivial impact on the optimality of the sensing actions selected by each agent. This coupling makes neighbor selection an integral component of the overall decision process rather than a secondary design choice. To capture this formally, we leverage the notion of Value of Coordination.
Definition 4 (Value of Coordination (VoC) [24]). Denote f_t(⋅) ≜ f(⋅, b_t). At any t ∈ [T] with a target deployment b_t ∈ 𝒴, for an agent i ∈ 𝒱 with an action a_{i,t} and neighbors 𝒩_{i,t}, its Value of Coordination is defined as
VoC_{f_t,t}(a_{i,t}; 𝒩_{i,t}) ≜ f_t({a_{i,t}}) − f_t(a_{i,t} | {a_{j,t}}_{j∈𝒩_{i,t}}).  (3)
VoC measures the overlap in utility between an agent's action and its neighbors' actions, and the equilibrium-approximation performance (ℛ_T^d, as in eq. (4)) is tighter when VoC is improved. Although VoC is not computable a priori, according to [24, Lemma 1] it is normalized, non-decreasing, and submodular in 𝒩_{i,t}, given that f is 2nd-order submodular (Definition 3). Therefore, maximizing VoC takes the form of bandit submodular maximization, which can be solved by NeiSel with bounded suboptimality (Algorithm 3).
IV Performance Guarantees
We now establish performance guarantees for the proposed algorithm by characterizing bounds on the near-optimality of the resulting strategies relative to the NE of the game. We first present our main theoretical result, which extends [24, Theorem 1], stated for a fixed utility function f, to a time-varying setting in which the utility value f(⋅, b_t) changes over time owing to the attacker's actions. Over t ∈ [T], each agent i ∈ 𝒱 selects actions {a_{i,t}}_{t∈[T]} and neighborhoods {𝒩_{i,t}}_{t∈[T]} against a sequence of the attacker's target deployments {b_t}_{t∈[T]} (Algorithm 1).
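Definition 4 can be made concrete on a toy utility. In this sketch actions are identified with the target sets they cover, an illustrative simplification (the names `voc`, `cover`, and the example actions are invented, not from the paper):

```python
def voc(f, a_i, neighbor_actions):
    """Value of Coordination (Definition 4), sketched for a set utility f.

    VoC(a_i; N_i) = f({a_i}) - f(a_i | neighbor actions): how much of agent
    i's stand-alone utility is already provided by its neighbors' actions.
    Larger VoC means those neighbors are more informative to coordinate with.
    """
    N = frozenset(neighbor_actions)
    gain = f(N | {a_i}) - f(N)  # marginal gain f(a_i | N)
    return f(frozenset({a_i})) - gain

# Toy coverage utility: each action covers a set of targets (illustrative).
cover = {"a1": {1, 2}, "a2": {2, 3}}
f = lambda S: len(set().union(*(cover[a] for a in S))) if S else 0
```

Here `voc(f, "a1", {"a2"})` equals 1, the size of the coverage overlap {2}, matching the "overlap of utility" interpretation, while an empty neighborhood yields VoC of 0.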
Let 𝐚* = {a*_1, …, a*_{|𝒱|}} denote the best joint action of the sensing team in hindsight. The cumulative regret of the defender against 𝐚* is:
ℛ_T^d = Σ_{t=1}^T f(𝐚*, b_t) − Σ_{t=1}^T f(𝐚_t, b_t).  (4)
Definition 5 (Curvature [5]). The curvature of a normalized monotone submodular function g: 2^Ω → ℝ is defined as
κ_g ≜ 1 − min_{s∈Ω} [g(Ω) − g(Ω∖{s})] / g({s}).  (5)
κ_g measures how far g is from modularity: if κ_g = 0, then g(Ω) − g(Ω∖{s}) = g({s}), ∀s ∈ Ω, i.e., g is modular. Using this definition, we present our theoretical result.
Theorem 1 (Defender's Average Regret Bound).
After running Algorithm 1 for T rounds, the defender achieves:
ℛ_T^d / T ≤ κ_f − β ρ(κ_I, ᾱ) Σ_{i∈𝒱} 𝔼_{t∼[T]}[VoC_{f_t,t}(a_{i,t}; 𝒩_i^⋆)] + Õ(|𝒱| √((ᾱ² |ℳ̄| + |𝒜̄|) / T)),  (6)
where κ_I ≜ max_{i∈𝒱} κ_{I,i}, ᾱ ≜ max_{i∈𝒱} α_i, |𝒜̄| ≜ max_{i∈𝒱} |𝒜_i|, |ℳ̄| ≜ max_{i∈𝒱} |ℳ_i|, κ_f, κ_I ∈ [0, 1], β ≜ κ_f (1 − κ_f), and ρ(κ_I, ᾱ) ∈ [1 − 1/e, 1]. Here κ_f is the curvature of f, κ_{I,i} is the curvature of the VoC of agent i ∈ 𝒱, and 𝒩_i^⋆ ≜ argmax_{𝒩_{i,t}⊆ℳ_i; |𝒩_{i,t}|≤α_i} Σ_{t∈[T]} VoC_{f_t,t}(a_{i,t}; 𝒩_{i,t}).
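Definition 5's curvature is directly computable on small ground sets. A sketch (the function name and examples are illustrative, not from the paper):

```python
def curvature(g, ground):
    """Curvature kappa_g of a normalized monotone set function g (Definition 5).

    kappa_g = 1 - min_{s in V} [g(V) - g(V \\ {s})] / g({s}).
    kappa_g = 0 iff g is modular; values near 1 indicate that elements'
    contributions overlap heavily.
    """
    V = frozenset(ground)
    ratios = [(g(V) - g(V - {s})) / g(frozenset({s}))
              for s in V if g(frozenset({s})) > 0]
    return 1.0 - min(ratios)
```

A modular function such as `len(S)` has curvature 0, while a coverage function in which one element's targets are entirely covered by the rest has curvature 1, the regime where the VoC term in eq. (6) matters most.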
Similarly, for a sequence of actions of the sensing agents {𝐚_t = (a_{1,t}, …, a_{|𝒱|,t})}_{t∈[T]}, let b* denote the attacker strategy that is best in hindsight. The cumulative regret of the attacker with respect to this strategy is
ℛ_T^a = Σ_{t=1}^T f(𝐚_t, b_t) − Σ_{t=1}^T f(𝐚_t, b*).  (7)
Since the attacker follows the standard EXP3 regime (lines 12–13 of Algorithm 1), we have the following [16].
Lemma 1 (Attacker's Average Regret Bound). After running Algorithm 1 for T rounds, the attacker achieves:
ℛ_T^a / T ≤ Õ(√(|𝒴| / T)).  (8)
Combining the bounds from Lemma 1 and Theorem 1 with [8, Theorem 2] yields the following near-optimality guarantee for the strategies produced by Algorithm 1.
Theorem 2 (Duality Gap). Let (x̄_T, ȳ_T) denote the strategies obtained after Algorithm 1 runs for T rounds. Then (x̄_T, ȳ_T) is an ε_T-NE of the Distributed Target Coverage Game, where
ε_T = κ_f − β ρ(κ_I, ᾱ) Σ_{i∈𝒱} 𝔼_{t∼[T]}[VoC_{f_t,t}(a_{i,t}; 𝒩_{i,t}^⋆)] + Õ(√([|𝒱|² (ᾱ² |ℳ̄| + |𝒜̄|) + |𝒴|] / T)).  (9)
As T → ∞, the limiting near-optimality parameter ε_∞ depends on κ_f and the agents' VoC. By selectively coordinating with the most informative neighbors, we maximize VoC, thereby tightening the bound on ε_∞ by compensating for the suboptimality term κ_f.
V Numerical Evaluation
Through two simulated scenarios of the target coverage game defined in Problem 1, we demonstrate that (i) Algorithm 1 converges to an approximate NE per Theorem 2 (Figure 2), and (ii) optimizing the network topology per Algorithm 3 leads to coverage performance that outperforms several baselines for neighborhood design, such as the nearest and random selection that are common in controls, and is even comparable to the best possible performance with unlimited bandwidth budgets (Figure 3). The experimental setup for both scenarios is as follows. The sensors 𝒱 have fixed locations, and each must choose its orientation a_{i,t} from the 16 cardinal directions 𝒜_i, ∀t. Each sensor i is unaware of a_{j,t} for j ∈ 𝒱∖{i}; thus, the sensors have to communicate to learn their peers' actions. The attacker has a time-varying target distribution b_t updated via EXP3 (per Algorithm 1). f({a_{i,t}}_{i∈𝒱}, b_t) is the percentage of the target area covered by the sensors when they select orientations {a_{i,t}}_{i∈𝒱} under the target distribution b_t.
Figure 2: Payoffs and duality gap.
Consider the repeated game dynamics per Algorithm 1 between three sensors, each with bandwidth constraint 1 (the defender), and an attacker with $|Y| = 20$ possible target deployments. As $T\to\infty$, the players asymptotically achieve the approximate NE, with the duality gap, i.e., the difference between the players' payoffs, decreasing to its minimum, as shown in Theorem 2. Results are averaged over 20 Monte Carlo trials, each with 15000 rounds.
Figure 3: Comparison of neighbor selection strategies. Four algorithms are compared under the same action selection strategy (ActSel) and adversarial environment (EXP3), differing only in their neighbor selection strategies. Results are averaged over 20 Monte Carlo trials, each with 10000 rounds.
Convergence to Approximate NE of the Game. We run Algorithm 1, which simulates iterative game play between the sensing agents (which alternate between ActSel and NeiSel) and a rational attacker. We run 20 Monte Carlo trials, each with 15000 rounds. In Figure 2, we observe that (i) the sensing team asymptotically approaches the game's equilibrium per Theorem 1, and (ii) the duality gap, i.e., the difference between the defender's payoff and the attacker's payoff, decreases per Theorem 2.
Impact of Network Optimization. We demonstrate the effectiveness of network optimization per Algorithm 3 (NeiSel). We consider a $30\times 30$ environment. The sensors have FOV radius $r_i = 8$, AOV $\theta_i = \pi/3$, communication range $c_i = 16$, and heterogeneous bandwidth budgets $\alpha_i$ randomly chosen from $\{1,2,3\}$.
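For intuition about the coverage utility, the fraction of targets inside at least one sensor's wedge-shaped field of view can be sketched as follows. This is an illustration only: the sensor and target positions are hypothetical, and targets are treated as discrete points rather than a continuous area, a simplification of the setup above.

```python
import math

def covers(sensor_xy, heading, target_xy, r=8.0, aov=math.pi / 3):
    """True if the target lies in the sensor's wedge: within radius r and
    within half the angle-of-view of the heading."""
    dx, dy = target_xy[0] - sensor_xy[0], target_xy[1] - sensor_xy[1]
    dist = math.hypot(dx, dy)
    if dist > r or dist == 0.0:
        return dist == 0.0  # a co-located target counts as covered
    # signed angular difference wrapped to (-pi, pi]
    diff = (math.atan2(dy, dx) - heading + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= aov / 2

def coverage(sensors, headings, targets, r=8.0, aov=math.pi / 3):
    """f: fraction of targets covered by at least one sensor."""
    covered = sum(
        any(covers(s, h, t, r, aov) for s, h in zip(sensors, headings))
        for t in targets
    )
    return covered / len(targets)

# 16 cardinal directions, as in the experiments; positions are hypothetical.
directions = [2 * math.pi * k / 16 for k in range(16)]
sensors = [(5.0, 5.0), (20.0, 20.0)]
targets = [(10.0, 5.0), (25.0, 20.0), (5.0, 25.0)]
f_val = coverage(sensors, [directions[0], directions[0]], targets)  # both face east
```

With both sensors facing east, the first two targets are covered and the third is out of range, so `f_val` is 2/3.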
We compare our framework with three benchmarks: (i) ActSel + Nearest Neighbors: action selection follows Algorithm 2, while each sensor $i$ always keeps its closest peers as neighbors; (ii) ActSel + Random Neighbors: action selection follows Algorithm 2, while each sensor $i$ uniformly samples its neighbors from $\mathcal{M}_i$ at each round; and (iii) ActSel + All Possible Neighbors: action selection follows Algorithm 2, while each sensor $i$ has no bandwidth constraint and always keeps all of $\mathcal{M}_i$ as neighbors (since the VoC is non-decreasing in $\mathcal{M}_i$, this should achieve the best possible coverage). We conduct 20 Monte Carlo trials, each with 10000 rounds. From Figure 3, we observe that (i) our framework consistently outperforms the benchmarks with the same bandwidth budgets, and (ii) despite limited bandwidth, it is comparable to the best possible coverage with unlimited bandwidth.
VI Conclusion
In this paper, we proposed a distributed, game-theoretic framework for adversarial target coverage under limited communication. By modeling the interaction between the sensing team and the attacker as a zero-sum game with a combinatorial action space, we developed a decentralized algorithm that converges to an approximate NE while respecting communication constraints. Our approach enabled each agent to adaptively configure its communication neighborhood through an information-driven mechanism, allowing the system to scale efficiently. We established theoretical guarantees on convergence and quantified the near-optimality of the resulting strategies relative to the centralized game value. Through simulations, we showed that the proposed method converges to an approximate NE of the game and provides high coverage utility compared to other neighbor selection baselines.
References
[1] N. Atanasov, J. Le Ny, K. Daniilidis, and G. J. Pappas (2015) Decentralized active information acquisition: theory and application to multi-robot SLAM.
In IEEE Inter. Conf. Rob. Auto. (ICRA), pp. 4775–4782.
[2] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire (2002) The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32(1), pp. 48–77.
[3] J. Bhargav, M. Ghasemi, and S. Sundaram (2024) Submodular information selection for hypothesis testing with misclassification penalties. In Learn. for Dyn. & Cont. (L4DC), pp. 566–577.
[4] J. Bhargav, S. Sundaram, and M. Ghasemi (2025) Sensor scheduling in intrusion detection games with uncertain payoffs. In Learn. for Dyn. & Cont. (L4DC), pp. 606–618.
[5] M. Conforti and G. Cornuéjols (1984) Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado–Edmonds theorem. Discrete Applied Mathematics 7(3), pp. 251–274.
[6] M. Corah and N. Michael (2018) Distributed submodular maximization on partition matroids for planning on large sensor networks. In IEEE Conference on Decision and Control (CDC), pp. 6792–6799.
[7] C. Daskalakis and I. Panageas (2019) Last-iterate convergence: zero-sum games and constrained min-max optimization. In Innovations in Theoretical Computer Science Conference (ITCS).
[8] G. Farina, C. Kroer, and T. Sandholm (2017) Regret minimization in behaviorally-constrained zero-sum games. In International Conference on Machine Learning, pp. 1107–1116.
[9] U. Feige (1998) A threshold of ln(n) for approximating set cover. Journal of the ACM (JACM) 45(4), pp. 634–652.
[10] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey (1978) An analysis of approximations for maximizing submodular set functions–I. In Polyhedral Combinatorics, pp. 73–87.
[11] S. Foldes and P. L. Hammer (2005) Submodularity, supermodularity, and higher-order monotonicities of pseudo-boolean functions. Mathematics of Operations Research 30(2), pp. 453–461.
[12] M. A. Guvensan and A. G. Yavuz (2011) On coverage issues in directional sensor networks: a survey. Ad Hoc Networks 9, pp. 1238–1255.
[13] S. Huang, J. Lei, Y. Hong, U. V. Shanbhag, and J. Chen (2024) No-regret distributed learning in subnetwork zero-sum games. IEEE Transactions on Automatic Control 69(10), pp. 6620–6635.
[14] R. Konda, R. Chandan, D. Grimsman, and J. R. Marden (2025) Best response sequences and tradeoffs in submodular resource allocation games. IEEE Transactions on Automatic Control.
[15] R. Konda, D. Grimsman, and J. R. Marden (2022) Execution order matters in greedy algorithms with limited information. In American Control Conference (ACC), pp. 1305–1310.
[16] T. Lattimore and C. Szepesvári (2020) Bandit Algorithms. Cambridge University Press.
[17] S. Li, Y. Zhang, X. Wang, W. Xue, and B. An (2021) CFR-MIX: solving imperfect information extensive-form games with combinatorial action space. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 3663–3669.
[18] L. Liao, D. Yuan, D. W. Ho, W. X. Zheng, B. Zhang, and Z. Yu (2025) Distributed dynamic no-regret learning in two-network zero-sum games. IEEE Transactions on Automatic Control.
[19] A. Robey, A. Adibi, B. Schlotfeldt, H. Hassani, and G. J. Pappas (2021) Optimal algorithms for submodular maximization with distributed constraints. In Learn. for Dyn. & Cont. (L4DC), pp. 150–162.
[20] M. Streeter and D. Golovin (2008) An online algorithm for maximizing submodular functions. Adv. Neu. Inf. Proc. Sys. 21.
[21] B. Wilder (2018) Equilibrium computation and robust optimization in zero sum games with submodular structure.
In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
[22] Q. Wu, J. Xu, Y. Zeng, D. W. K. Ng, N. Al-Dhahir, R. Schober, and A. L. Swindlehurst (2021) A comprehensive overview on 5G-and-beyond networks with UAVs: from communications to sensing and intelligence. IEEE Journal on Selected Areas in Communications 39, pp. 2912–2945.
[23] Z. Xu, X. Lin, and V. Tzoumas (2023) Bandit submodular maximization for multi-robot coordination in unpredictable and partially observable environments. In Robotics: Science and Systems (RSS).
[24] Z. Xu and V. Tzoumas (2026) Self-configurable mesh-networks for scalable distributed submodular bandit optimization. arXiv preprint arXiv:2602.19366.
Appendix
A Theoretical Proofs
In the interest of space, we outline only the essential steps, emphasizing the main technical ideas of the proofs.
Proof of Theorem 1. Consider a sequence $\{b_t\}_{t\in[T]}$ that is selected by the attacker over $[T]$. Then, replacing $f$ with $f_t$ in [24, eq. (14)] (this preserves the validity of the original argument), we have:
$\sum_{t=1}^{T} f_t(\mathcal{S}_t) \ge (1-\kappa_f)\sum_{t=1}^{T} f_t(\mathcal{S}^{\star}) + \kappa_f(1-\kappa_f)\,\rho(\kappa_I,\bar{\alpha}) \times \sum_{i\in\mathcal{N}}\sum_{t=1}^{T}\mathrm{VoC}_{f_t,t}\big(a_{i,t};\mathcal{N}_i^{\star}(\{a_{i,t}\}_{t\in[T]};\alpha_i,\mathcal{M}_i)\big) - \tilde{O}\big(|\mathcal{N}|\sqrt{T}\,(\bar{\alpha}^2|\bar{\mathcal{M}}|+|\bar{\mathcal{A}}|)\big),$ (10)
where $\mathcal{N}_i^{\star}(\cdot)$ is the optimizer of the VoC (3). Therefore,
$\mathcal{R}_T^d = \sum_{t=1}^{T} f_t(\mathcal{S}^{\star}) - \sum_{t=1}^{T} f_t(\mathcal{S}_t) \le \kappa_f\sum_{t=1}^{T} f_t(\mathcal{S}^{\star}) - \kappa_f(1-\kappa_f)\,\rho(\kappa_I,\bar{\alpha}) \times \sum_{i\in\mathcal{N}}\sum_{t=1}^{T}\mathrm{VoC}_{f_t,t}\big(a_{i,t};\mathcal{N}_i^{\star}(\{a_{i,t}\}_{t\in[T]};\alpha_i,\mathcal{M}_i)\big) + \tilde{O}\big(|\mathcal{N}|\sqrt{T}\,(\bar{\alpha}^2|\bar{\mathcal{M}}|+|\bar{\mathcal{A}}|)\big).$ (11)
Since $0 \le f_t(\mathcal{S}^{\star}) \le 1$, $\forall t$, Theorem 1 is proved. ∎
Proof of Lemma 1. Since the attacker adopts EXP3, according to [2, Theorem 3.1], we have $\mathcal{R}_T^a \le 2\sqrt{T\,|Y|\log|Y|}$, and thus Lemma 1 holds. ∎
Proof of Theorem 2. Consider the repeated-play dynamic induced by Algorithm 1, which generates a sequence of actions $\{(\mathcal{S}_t, b_t)\}_{t\in[T]}$. Since the payoff for a player is linear in the other player's mixed strategy, the best fixed comparator in the regret definition is attained at an extreme point of the simplex, and thus can be taken to be a pure strategy. Theorem 1 and Lemma 1 then bound the average regrets with respect to these best fixed comparators, as $\epsilon_1 \triangleq \mathcal{R}_T^d/T$ and $\epsilon_2 \triangleq \mathcal{R}_T^a/T$, respectively. Invoking [8, Theorem 2] with $\epsilon_1$ and $\epsilon_2$, we get the result. ∎
B The ActSel and NeiSel Algorithms
Below, we outline Algorithms 2 and 3 in detail.
Input: attacker's actions $(b_1,\ldots,b_{t-1})$, number of time steps $T$, agent's action set $\mathcal{A}_i$, objective function $f$.
Output: agent $i$'s distribution $p_{i,t}$ and action $a_{i,t}$.
1: $\eta_i^a \leftarrow \sqrt{2\log|\mathcal{A}_i| / (|\mathcal{A}_i|\,T)}$;
2: $\hat{w}_1 \leftarrow [\hat{w}_{1,1},\ldots,\hat{w}_{|\mathcal{A}_i|,1}]^{\top}$ with $\hat{w}_{a,1} = 1$, $\forall a \in \mathcal{A}_i$;
3: $p_{i,1} \leftarrow \hat{w}_1 / \mathbf{1}^{\top}\hat{w}_1$;
4: draw action $a_{i,t} \in \mathcal{A}_i$ from $p_{i,t}$; receive neighbors' actions $\{a_{j,t}\}_{j\in\mathcal{N}_{i,t}}$;
5: $r_{a_{i,t},t} \leftarrow f(a_{i,t} \mid \{a_{j,t}\}_{j\in\mathcal{N}_{i,t}}, b_t)$ and normalize $r_{a_{i,t},t}$ to $[0,1]$;
6: $\hat{r}_{a,t} \leftarrow 1 - \frac{\mathbb{1}(a_{i,t} = a)}{p_{a,t}}\,(1 - r_{a_{i,t},t})$, $\forall a \in \mathcal{A}_i$;
7: $\hat{w}_{a,t+1} \leftarrow \hat{w}_{a,t}\exp(\eta_i^a\,\hat{r}_{a,t})$, $\forall a \in \mathcal{A}_i$;
8: $p_{i,t+1} \leftarrow \hat{w}_{t+1} / \mathbf{1}^{\top}\hat{w}_{t+1}$;
Algorithm 2: ActSel for Agent $i$ [24]
Input: actions $a_{i,t}$, $(b_1,\ldots,b_{t-1})$, number of time steps $T$, agent $i$'s $\mathcal{M}_i$, $\alpha_i$, and objective function $f$.
Output: agent $i$'s neighbors $\mathcal{N}_{i,t}$ at each $t \in [T]$.
1: $\eta_i^n \leftarrow \sqrt{2\log|\mathcal{M}_i| / (|\mathcal{M}_i|\,T)}$;
2: $z_1^{(k)} \leftarrow [z_{1,1}^{(k)},\ldots,z_{|\mathcal{M}_i|,1}^{(k)}]^{\top}$; $z_{j,1}^{(k)} = 1$, $\forall j \in \mathcal{M}_i$, $\forall k \in [\alpha_i]$;
3: for $k = 1,\ldots,\alpha_i$ do
4: get distribution $\psi_t^{(k)} \leftarrow z_t^{(k)} / \|z_t^{(k)}\|_1$;
5: draw agent $j_t^{(k)} \in \mathcal{M}_i$ from $\psi_t^{(k)}$;
6: receive action $a_{j_t^{(k)},t}$ from $j_t^{(k)}$;
7: $r_{j_t^{(k)},t} \leftarrow \mathrm{VoC}_{f_t,t}\big(a_{i,t};\{a_{j_t^{(1)},t},\ldots,a_{j_t^{(k)},t}\}\big) - \mathrm{VoC}_{f_t,t}\big(a_{i,t};\{a_{j_t^{(1)},t},\ldots,a_{j_t^{(k-1)},t}\}\big)$ and normalize $r_{j_t^{(k)},t}$ to $[0,1]$;
8: $\hat{r}_{j,t}^{(k)} \leftarrow 1 - \frac{\mathbb{1}(j_t^{(k)} = j)}{\psi_{j,t}^{(k)}}\,(1 - r_{j_t^{(k)},t})$, $\forall j \in \mathcal{M}_i$;
9: $z_{j,t+1}^{(k)} \leftarrow z_{j,t}^{(k)}\exp(\eta_i^n\,\hat{r}_{j,t}^{(k)})$, $\forall j \in \mathcal{M}_i$;
10: end for
11: $\mathcal{N}_{i,t} \leftarrow \{j_t^{(k)}\}_{k\in[\alpha_i]}$;
Algorithm 3: NeiSel for Agent $i$ [24]
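Both routines are EXP3-style exponential-weight updates. The following is a minimal, self-contained sketch of the ActSel update only (our illustration, not the paper's implementation): the scalar reward passed to `update` stands in for the normalized coverage $f(a_{i,t}\mid\cdot)$, and the "orientation 3 is best" environment below is a hypothetical placeholder.

```python
import math
import random

class ActSel:
    """EXP3-style sketch of Algorithm 2 (action selection over a finite
    orientation set); the reward oracle abstracts the coverage function."""

    def __init__(self, n_actions, horizon):
        self.n = n_actions
        # line 1: eta_i^a = sqrt(2 log|A_i| / (|A_i| T))
        self.eta = math.sqrt(2 * math.log(n_actions) / (n_actions * horizon))
        self.w = [1.0] * n_actions  # line 2: unit initial weights

    def distribution(self):
        # lines 3 and 8: p proportional to the weights
        total = sum(self.w)
        return [wa / total for wa in self.w]

    def draw(self):
        # line 4: sample an action from the current distribution
        p = self.distribution()
        return random.choices(range(self.n), weights=p)[0], p

    def update(self, action, reward, p):
        # lines 6-7: importance-weighted reward estimate, then
        # exponential-weight update; reward must lie in [0, 1]
        for a in range(self.n):
            indicator = 1.0 if a == action else 0.0
            r_hat = 1.0 - indicator / p[a] * (1.0 - reward)
            self.w[a] *= math.exp(self.eta * r_hat)

random.seed(0)
agent = ActSel(n_actions=16, horizon=1000)
for _ in range(200):
    a, p = agent.draw()
    # hypothetical environment: orientation 3 always yields the best coverage
    agent.update(a, reward=1.0 if a == 3 else 0.2, p=p)
```

Because a reward of 1 leaves an action's weight with the same multiplicative gain as an unplayed action, the rewarding orientation's weight can only grow relative to the others, mirroring how $p_{i,t}$ concentrates on high-coverage orientations. NeiSel applies the same update per bandwidth slot $k$ over the candidate set $\mathcal{M}_i$.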