Paper deep dive
Randomise Alone, Reach as a Team
Léonard Brice, Thomas A. Henzinger, Alipasha Montaseri, Ali Shafiee, K. S. Thejaswini
Abstract
We study concurrent graph games where n players cooperate against an opponent to reach a set of target states. Unlike traditional settings, we study distributed randomisation: team players do not share a source of randomness, and their private random sources are hidden from the opponent and from each other. We show that memoryless strategies are sufficient for the threshold problem (deciding whether there is a strategy for the team that ensures winning with probability that exceeds a threshold), a result that not only places the problem in the Existential Theory of the Reals (∃ℝ) but also enables the construction of value iteration algorithms. We additionally show that the threshold problem is NP-hard. For the almost-sure reachability problem, we prove NP-completeness. We introduce Individually Randomised Alternating-time Temporal Logic (IRATL). This logic extends the standard ATL framework to reason about probability thresholds, with semantics explicitly designed for coalitions that lack a shared source of randomness. On the practical side, we implement and evaluate a solver for the threshold and almost-sure problems based on the algorithms that we develop.
Tags
Links
- Source: https://arxiv.org/abs/2603.07094v1
- Canonical: https://arxiv.org/abs/2603.07094v1
PDF not stored locally. Use the link above to view on the source site.
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 94%
Last extracted: 3/13/2026, 12:07:13 AM
Summary
The paper investigates concurrent graph games with multiple players forming a team against an opponent, specifically focusing on distributed randomization where team members lack a shared source of randomness. The authors prove that memoryless strategies are sufficient for the threshold reachability problem, placing it in the Existential Theory of the Reals (ETR) and demonstrating NP-hardness. They also establish NP-completeness for the almost-sure reachability problem and introduce Individually Randomised Alternating-time Temporal Logic (IRATL) to reason about these settings. Practical solvers are implemented and evaluated against benchmarks.
Entities (5)
Relation Signals (3)
IRATL → extends → ATL
confidence 95% · This logic extends the standard ATL framework to reason about probability thresholds
Almost-sure Reachability Problem → hascomplexity → NP-complete
confidence 95% · For the almost-sure reachability problem, we prove NP-completeness.
Threshold Problem → belongstocomplexityclass → Existential Theory of the Reals
confidence 90% · a result that not only places the problem in the Existential Theory of the Reals
Cypher Suggestions (2)
Identify the relationship between IRATL and ATL. · confidence 95% · unvalidated
MATCH (a:Logic {name: 'IRATL'})-[r:EXTENDS]->(b:Logic {name: 'ATL'}) RETURN r
Find all problems discussed in the paper and their associated complexity classes. · confidence 90% · unvalidated
MATCH (p:Problem)-[:HAS_COMPLEXITY]->(c:ComplexityClass) RETURN p.name, c.name
Full Text
136,506 characters extracted from source content.
Institute of Science and Technology Austria, Austria ({leonard.brice, tah, alipasha.montaseri, ali.shafiee}@ista.ac.at); Université libre de Bruxelles, Belgium (thejaswini.raghavan@ulb.be) Randomise Alone, Reach as a Team Léonard Brice, Thomas A. Henzinger, Alipasha Montaseri, Ali Shafiee, K. S. Thejaswini Abstract We study concurrent graph games where n players cooperate against an opponent to reach a set of target states. Unlike traditional settings, we study distributed randomisation: team players do not share a source of randomness, and their private random sources are hidden from the opponent and from each other. We show that memoryless strategies are sufficient for the threshold problem (deciding whether there is a strategy for the team that ensures winning with probability that exceeds a threshold), a result that not only places the problem in the Existential Theory of the Reals (∃ℝ) but also enables the construction of value iteration algorithms. We additionally show that the threshold problem is NP-hard. For the almost-sure reachability problem, we prove NP-completeness. We introduce Individually Randomised Alternating-time Temporal Logic (IRATL). This logic extends the standard ATL framework to reason about probability thresholds, with semantics explicitly designed for coalitions that lack a shared source of randomness. On the practical side, we implement and evaluate a solver for the threshold and almost-sure problem based on the algorithms that we develop. 1 Introduction Multi-agent systems provide a robust mathematical framework for modelling complex interactions across a diverse array of domains.
From the formal verification of reactive systems [17, 3] and cyber-physical architectures [37, 30] to the study of epidemic processes [28] and distributed probabilistic programs [1], the interplay between autonomous entities is central to ensuring system correctness. In these models, such entities are typically represented as players, repeatedly and concurrently choosing among some set of actions. While classical verification often considers a single controller against an adversarial environment, there are plenty of applications that demand multi-agent models with access to varying degrees of cooperation and information. A traditional tool to capture what coalitions of players can achieve in such settings is Alternating-time Temporal Logic (ATL) [2]. ATL captures specifications of the following form: “Can a coalition achieve a given temporal objective against all strategies of the other players?” In such a setting, one need only consider deterministic strategies, since randomising among several actions does not bring any additional power. Important extensions, however, are Randomised ATL (RATL) [14] and Probabilistic ATL (PATL) [13], both of which capture notions such as almost-sure and limit-sure winning—and the latter considers threshold winning, that is, guaranteeing that the probability of winning exceeds a given threshold t. For those types of specifications, randomised strategies are strictly more powerful than deterministic ones. The semantics of RATL and PATL allow coalitions to randomise their actions jointly, implicitly assuming that they can resort to shared sources of randomness, or equivalently, that they can communicate about their random choices privately. Such a hypothesis significantly simplifies the setting, since the coalition can then be seen as a single player; but it is not always realistic in distributed settings. Picture an object that needs to be moved to the other side of a sliding door.
Two players, named R2D2 and C3PO, can choose at each step to move the object either to the left or to the right, while an adversarial environment moves the sliding door, such that one among the left and the right side is open at each time step. If both R2D2 and C3PO choose the same direction as the environment, then the object moves to the other side and they win. If R2D2 and C3PO choose the same side, but the environment opens the other side, then the object remains stationary, and they have to try again. If they choose different sides, the object breaks due to the opposing forces. This situation is illustrated in Figure 1. With what probability can R2D2 and C3PO ensure reaching s_goal? Figure 1: The two agents choose among actions {L, R} and the environment chooses among actions {L, R}: from s_0, the profiles ((L,L),L) and ((R,R),R) lead to s_goal, the profiles ((L,R),∗) and ((R,L),∗) lead to s_fail, and the profiles ((L,L),R) and ((R,R),L) loop on s_0. Consider first the setting where R2D2 and C3PO have access to a shared source of randomness, which cannot be observed by the adversary. Such games can be viewed as standard two-player concurrent games, where the team acts as a single meta-player. By playing the joint actions (L,L) or (R,R) with probability 1/2 each, R2D2 and C3PO ensure that for any environmental choice, the probability of advancing is 1/2, and the probability of reaching the failure state is 0. By repeating this, it is almost sure that the target will eventually be reached. However, when R2D2 and C3PO share neither a source of randomness nor a private communication channel, they have to randomise their actions independently. One can show that against any such strategy profile, the environment can respond in such a way that the probability of eventually reaching the target does not exceed 1/3.
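These two values can be checked numerically. The sketch below assumes (as the paper notes for fixed memoryless team strategies) that the opponent faces a Markov decision process, so a memoryless opponent response suffices; the grid search and the closed-form reach probability are illustrative, not taken from the paper's implementation.

```python
# One-step analysis of the sliding-door game of Figure 1.
# Team players independently play L with probabilities p and q.
# Per round: "progress" if all three choices match, "break" if the
# two players disagree, otherwise the round repeats.
def reach_value(p, q):
    # The opponent opens the door side that minimises the progress
    # probability; breaking does not depend on the opponent's choice.
    progress = min(p * q, (1 - p) * (1 - q))
    broken = p * (1 - q) + (1 - p) * q
    # Probability that progress happens before the object breaks.
    return progress / (progress + broken) if progress + broken > 0 else 0.0

grid = [i / 100 for i in range(101)]
best = max(reach_value(p, q) for p in grid for q in grid)
print(best)  # ~ 1/3, attained at p = q = 1/2

# With shared randomness the team plays (L,L) or (R,R) with
# probability 1/2 each: the object never breaks, so repeating
# the round reaches the target almost surely.
```

The maximum over independent memoryless strategies lands exactly on the 1/3 bound quoted in the text.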
Observe here that the severity of this result relies on the order of the quantifiers: we assume that the team fixes a collective strategy first, and that the environment responds to it; hence the value 1/3 is what is called in the literature the max-min value. If, instead, the environment must commit to a strategy first, then R2D2 and C3PO can respond deterministically, and reach the target state with probability at least 1/2: the min-max value. We assume the burden of pessimism and focus on the max-min value. Our Results. We first start with the following problem for team games. Threshold problem: Given a game and a rational threshold t ∈ [0,1], does there exist a collective strategy for the team that ensures the target is reached with probability strictly greater than t? Our first main result establishes that when there exists such a collective strategy, there exists a memoryless one (Theorem 3.1). This implies that these games can be fully characterised by local, state-dependent strategies. Therefore, given an instance of the threshold problem, we can construct a formula in the Existential Theory of the Reals (ETR) that is satisfiable if and only if we have a positive instance. This proves that the threshold problem lies in the complexity class ∃ℝ (itself contained in PSPACE [9, 36]). We also prove that the problem is NP-hard using a reduction from the k-clique problem. This is unlike the situation for two-player concurrent games, where no such hardness results are known. In fact, for two-player concurrent games (or for games where the team can share randomness, which reduces to the latter), the best known lower bounds for the threshold problem are SQRTSUM-hardness [16] and P-hardness derived from turn-based reachability games [22], and the best known upper bound is the complexity class ∃ℝ [10]. While our encoding into ETR provides an algorithm, such large logical sentences are notoriously difficult for existing solvers.
To provide a more computationally accessible approach, we also develop a Value Iteration (VI) algorithm that converges to the optimal max-min value—even though it might never reach it. Secondly, we prove NP-completeness of almost-sure reachability. Almost-sure problem: Given a game, does there exist a collective strategy for the team that ensures that the target is reached with probability 1? While this problem lies in P in a two-player setting, we show that it is NP-hard even with three players. Membership in NP is obtained by proving that memoryless strategies also suffice in this setting—but the proof requires techniques different from those of the threshold problem. Thirdly, we implement solvers for the threshold and almost-sure problems, and evaluate them on existing benchmarks that we modify to fit our setting. For the threshold problem, we first encode the entire game as a single formula in ∃ℝ and use SMT solvers; while this offers strong theoretical guarantees, the formula's complexity often leads to timeouts. In contrast, we employ VI algorithms using various termination criteria and heuristics. These heuristics provide sound under-approximations, and terminate with results close to the actual max-min values in our benchmarks. Because VI methods query the ∃ℝ solver using smaller formulas, they terminate significantly faster than solving the whole encoding. For the almost-sure case, given its NP-completeness, we utilise SAT solvers on the succinct SAT encodings that we construct. As a baseline, we use the state-of-the-art PRISM-games [25] to solve the threshold and almost-sure versions of the strict subcase of our games in which the team shares randomness. Despite solving a computationally harder game, our solvers achieve runtimes comparable to PRISM-games. Finally, we propose a new logic called Individually Randomised ATL (IRATL) to capture the inability of a coalition to randomise collectively.
The syntax of IRATL extends ATL with operators capturing the fact that a coalition can enforce an objective without using any shared randomness. For example, considering our robot team, we would write the formula ⟨R2D2, C3PO⟩_almost^sh(◇ s_goal) to check if the team can reach s_goal almost-surely using shared (sh) randomness. In contrast, to query if the team can reach the target with a probability strictly greater than 0.3 using independent (ind) sources of randomness, we write ⟨R2D2, C3PO⟩_{>0.3}^ind(◇ s_goal). As established previously, both sentences hold true in their respective settings. We effectively solve the model-checking problem for a key fragment of this logic. 1.0.1 Related work. Team games and temporal logics. Since its inception in the seminal work of Alur et al. [2], Alternating-time Temporal Logic (ATL) has inspired a rich ecosystem of extensions. Prominent examples include Randomised ATL (RATL) [14] and Robust ATL [32], as well as probabilistic variants such as PATL [13] and rPATL [12]. By modelling the coalition as a single meta-player capable of correlating strategies, these frameworks effectively reduce the problem to a standard two-player zero-sum game. In contrast, our work addresses the setting where agents must randomise independently, preventing such a reduction. Multi-player games with teams. Several works, including those on population games [4], concurrent parameterised games [5], and synthesis of coalition strategies [6], consider settings with an unknown number of players, as opposed to the fixed number of players in our case. Fixing the number of players enables us to obtain value-iteration-like algorithms, more amenable to implementation.
A recent work on one-shot games with partially shared randomness (various sources of randomness, each of them being accessible to a given subset of players) shows that the threshold problem with a non-strict inequality (instead of strict, as in our setting) is ∃ℝ-complete for one-shot games [8]. However, neither the hardness nor the easiness results extend to our setting. Fixed environments and multi-agent learning. As a side observation, we consider settings where the environment is memoryless (modelling a fixed but unknown stochastic process). This setting reduces to multi-agent reinforcement learning, where agents must learn to coordinate in a stochastic environment [18, 40, 23]. Tools. The landscape of automated verification for multi-agent systems is rich with tools, yet, to the best of our knowledge, none natively support the independent randomisation constraint central to our work. The most prominent tool, PRISM-games [11, 25], extends probabilistic model checking to the two-player version. While PRISM-games can verify properties for a coalition of players (e.g., using the logic rPATL), it treats the coalition as a single “meta-player” capable of correlated strategies. Tools such as MCMAS [29] and EVE [19] enable strategic reasoning via ATL and Nash equilibrium synthesis. However, MCMAS focuses primarily on qualitative (Boolean) verification and epistemic properties, while EVE and related Nash solvers (e.g., PRISM-Nash) target non-zero-sum equilibria. 2 Preliminaries 2.1 Probabilities Given a finite set S, a probability distribution over S is a mapping d : S → [0,1] that satisfies the equality Σ_{x∈S} d(x) = 1. The set of probability distributions over S is denoted by Dist(S). 2.2 Games We use the word game for team concurrent stochastic reachability games.
Definition 1 (Game structure, game). A game structure is a tuple G = (Σ, S, (A_p)_{p∈Σ}, (Av_p)_{p∈Σ}, δ), consisting of: • a set Σ of players; • a set S of states; • for each player p, a set A_p of actions; • for each player p and state s, a set Av_p(s) ⊆ A_p of actions available at state s (always assumed to be nonempty); • a transition function δ : S × Π_{p∈Σ} A_p → Dist(S) that maps each tuple (s, (a_p)_p) with a_p ∈ Av_p(s) for each p to a probability distribution over S. A game is a tuple G = (G, Π, T, s_0) that consists of a game structure equipped with a team Π ⊆ Σ, a set T ⊆ S of target states, and an initial state s_0 ∈ S. When referring to a game G or a game structure G, we often use the notations S, δ, etc., without recalling them. Given a set P ⊆ Σ and a vertex s, we often write Av_P(s) (or Av(s) if P = Σ) to denote the product Av_P(s) = Π_{p∈P} Av_p(s). For technical reasons, we assume that the target set T is absorbing, i.e. that for every state t ∈ T and action profile ā we have δ(t, ā)(t) = 1. We call play (resp. history) in the game structure G an infinite (resp. finite) word over the alphabet S. We sometimes say that the team wins the play π if that play contains a target state s ∈ T, and loses it otherwise. 2.3 Strategies and strategy profiles A strategy for player p ∈ Σ is a mapping σ_p that maps each history hs to a probability distribution σ_p(hs) ∈ Dist(Av_p(s)). A strategy profile for the set P ⊆ Σ is a tuple σ̄_P = (σ_p)_{p∈P}, where each σ_p is a strategy for player p. A collective strategy is a strategy profile σ̄_Π for the team. A complete strategy profile is a strategy profile σ̄_Σ for Σ, often written σ̄, without subscript. Given two strategy profiles σ̄_P and τ̄_Q with P ∩ Q = ∅, we write (σ̄_P, τ̄_Q) for the strategy profile η̄_{P∪Q} where η_p = σ_p if p ∈ P, and τ_p if p ∈ Q.
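As a concrete rendering of Definition 1, the sliding-door game of Figure 1 can be encoded as a plain Python structure; this is only an illustrative sketch (the player, state, and action names are assumptions, not the paper's implementation).

```python
# Sketch of a game structure G = (Σ, S, (A_p), (Av_p), δ) from
# Definition 1, instantiated on the sliding-door game of Figure 1.
from itertools import product

players = ["R2D2", "C3PO", "Env"]           # Σ; "Env" is the opponent
states = ["s0", "goal", "fail"]             # S, with T = {"goal"}
actions = {p: ["L", "R"] for p in players}  # A_p = Av_p(s) at every state

def delta(s, profile):
    """Transition function δ: returns a distribution over S."""
    if s != "s0":
        return {s: 1.0}           # goal and fail are absorbing
    a, b, env = profile
    if a != b:
        return {"fail": 1.0}      # opposing forces break the object
    if a == env:
        return {"goal": 1.0}      # door open on the chosen side
    return {"s0": 1.0}            # door closed: try again

# Sanity checks: every transition is a probability distribution,
# and the target set is absorbing, as Definition 1 requires.
for s in states:
    for prof in product("LR", repeat=3):
        assert abs(sum(delta(s, prof).values()) - 1.0) < 1e-12
assert all(delta("goal", prof) == {"goal": 1.0}
           for prof in product("LR", repeat=3))
```

Here the transitions are deterministic, so each distribution is a point mass; a stochastic δ would simply return distributions with several entries.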
Given two strategy profiles denoted by σ̄_P and σ̄_{Σ∖P}, we simply write σ̄ for the complete strategy profile (σ̄_P, σ̄_{Σ∖P}). Given a strategy profile σ̄_P and a history h, we write σ̄_P(h) for the induced distribution over joint actions, i.e., the distribution σ̄_P(h) : ā_P ↦ Π_{p∈P} σ_p(h)(a_p). A complete strategy profile σ̄ defines a probability distribution ℙ_σ̄ over plays, where for each history hs and state t, we have: ℙ_σ̄(hstS^ω) = Σ_{ā∈Av(s)} σ̄(hs)(ā) · δ(s, ā)(t). A strategy σ_p is deterministic if for every history h, there exists an action a such that σ_p(h)(a) = 1. It is memoryless if for every state s and every history h, we have σ_p(hs) = σ_p(s). 2.4 Problems Throughout this paper, our goal is to design collective strategies for teams that guarantee a certain property against every strategy of the other players. We always assume that the team players fix their strategies first, and the other players (the opponents) respond to them. From the opponents' perspective, responding to a collective strategy amounts to finding optimal strategies in a Markov decision process. It is a well-known result that deterministic strategies are sufficient in such a setting. Consequently, contrary to the team players, the fact that the opponents do not share a common source of randomness is not a restriction, and the opponents can be considered as a single player. For convenience, we will therefore always assume that the set Σ∖Π is a singleton {O}, where player O is called the opponent. In such a setting, a natural question is to ask whether the team players can win the game with a probability greater than a given threshold. Problem 1 (Threshold problem). Given a game G and a threshold t ∈ ℚ ∩ [0,1], does there exist a collective strategy σ̄_Π such that for every strategy σ_O, the set T is reached with probability strictly greater than t?
A less ambitious question is to ask whether they can win with probability 1. Problem 2 (Almost-sure problem). Given a game G, does there exist a collective strategy σ̄_Π such that for every strategy σ_O, the set T is reached with probability 1? In the next two sections, we design algorithms for those two problems. 3 The threshold problem In this section, we analyse the structure of the team's strategies and establish the theoretical foundations for solving Problem 1. The main result of this section is Theorem 3.1, which establishes that memoryless strategies are sufficient. Building on this result, we provide a decision procedure by encoding the threshold problem as an ETR formula. Finally, we end the section with a result on NP-hardness for the threshold problem, which adds to the SQRTSUM-hardness result that is already known from the two-player case [16]. 3.1 Optimality of memoryless strategies In this subsection, we prove that memoryless strategies suffice to exceed a given threshold. The structure of the proof is similar to that of the same result in standard two-player zero-sum concurrent games [10]. However, the latter heavily relies on the equality of the max-min and min-max values, which no longer holds in our setting. Our proof therefore requires more technical effort. 3.1.1 Value iteration. As a step toward our result, and as an algorithmic tool, we define a value iteration sequence. A valuation is a mapping v : S → [0,1]. We define the predecessor operator based on this intuition: for a fixed valuation v and a state s, we consider a local one-shot game played in s. In this local game, the team players choose a joint distribution over their available actions, and the opponent responds with an action as well. If t is the subsequent state, then the team gets payoff v(t). The team players want to maximise the expected payoff, while the opponent wants to minimise it. To avoid confusion, strategies in this one-shot game are called selectors.
The predecessor operator is defined in three steps: given fixed selectors for both the team and the opponent, given a fixed team selector against an optimal opponent, and finally, the optimal one-step value. First, given a valuation v, a collective selector ξ̄_Π, and an opponent selector ξ_O, we define the expected payoff at state s as: Pre_ξ̄(v)(s) = Σ_{ā∈Av(s)} Σ_{t∈S} ξ̄(s)(ā) · δ(s, ā)(t) · v(t). We now define the value guaranteed by a fixed collective selector ξ̄_Π: Pre_{ξ̄_Π}(v)(s) = inf_{ξ_O ∈ Λ_O(s)} Pre_ξ̄(v)(s), where Λ_O(s) = Dist(Av_O(s)) is the set of selectors for the opponent in state s. Finally, we define the predecessor operator Pre, which represents the maximum value the team can guarantee in one step against the opponent: Pre(v)(s) = sup_{ξ̄_Π ∈ Λ_Π(s)} inf_{ξ_O ∈ Λ_O(s)} Pre_ξ̄(v)(s), where Λ_Π(s) is the set of collective selectors for the team in state s. We can then define a sequence (v_n)_n of valuations as follows: the valuation v_0 maps each target state to the value 1 and each other state to the value 0, and for each n, we have v_{n+1} = Pre(v_n). We then have the following result. Lemma 1 (App. 0.B.2). For every vertex s, the sequence (v_n(s))_n converges to the max-min value of the game, when initialised in s. As we will show in Section 5, this value iteration sequence also entails a sound under-approximation algorithm. For now, let us use this tool to prove our result. 3.1.2 Memoryless strategies are sufficient. Theorem 3.1 (App. 0.B.3). Let G be a game and let t be a threshold. If there is a collective strategy σ̄_Π that guarantees victory with probability strictly greater than t against every opponent's strategy, then there is a memoryless one. Proof (sketch). Using Lemma 1, there exists an index n such that the n-th valuation, on s_0, takes a value that is greater than t.
We assign a “rank” to each state corresponding to the iteration index where its value was determined. The memoryless strategy is then constructed to ensure that at each step, the next state has either a higher valuation or a lower rank, which guarantees convergence to the states with valuation 1 and rank 0: the target states. ∎ 3.1.3 A variant: playing against a memoryless opponent. Before proceeding to the study of the threshold problem, we make a remark about a variant of our problem: the case where the opponent is restricted to memoryless strategies. This modelling choice is well-motivated by the physical reality of autonomous systems, whose environments are often fixed (but unknown) stochastic processes rather than strategic adversaries. We show here that Theorem 3.1 does not extend to this setting: surprisingly, restricting the opponent to memoryless strategies forces the team players to use memory to play optimally. The following example illustrates that solving this variant would require significantly different techniques, from the field of multi-agent learning. Theorem 3.2 (App. 0.B.4). There exists a game G such that against an opponent restricted to memoryless strategies, the team can achieve a strictly higher probability of reaching the target using a strategy with memory than with any memoryless collective strategy. Proof (sketch). We give an example in the game in Fig. 2. Intuitively, the team players can agree to wait arbitrarily long to learn the probability distribution during the waiting stage and then play the appropriate action to maximise the probability of reaching the target state. ∎ Figure 2: An example game for an opponent with no memory, over states S, S_a, S_b, ⊤, and ⊥: from S, the wait profiles (wait,∗),a and (wait,∗),b lead to S_a and S_b, the matching profiles (a,a),a or (b,b),b reach ⊤, other committed profiles reach ⊥, and ⊤ and ⊥ are absorbing under (∗,∗),∗. 3.2 Complexity bounds After a short detour, we return to the setting where the opponent has (potentially) unbounded memory. We establish complexity bounds.
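As a concrete illustration of the value iteration sequence (v_n) from Section 3.1.1: once the team's memoryless selector is fixed, the one-step payoff is linear in the opponent's mixture, so the opponent's best response at each step is a pure action and Pre reduces to a minimum. The sketch below runs this on the sliding-door game of Figure 1 with both team players mixing uniformly and independently; it is a sound lower bound on the max-min value (as in the VI algorithms of Section 5), not the paper's implementation.

```python
# Value iteration on Figure 1's game with the team fixed to the
# independent uniform selector (each player plays L or R w.p. 1/2).
# v_0 is 1 on the target and 0 elsewhere; v_{n+1} = Pre(v_n), where
# the opponent minimises over its pure actions (sufficient against
# a fixed team selector, by linearity in the opponent's mixture).
from itertools import product

def step(s, team_profile, env):
    """Successor state in Figure 1 (transitions are deterministic)."""
    if s != "s0":
        return s                   # goal and fail are absorbing
    a, b = team_profile
    if a != b:
        return "fail"              # disagreement breaks the object
    return "goal" if a == env else "s0"

def pre_fixed(v):
    """One application of Pre under the uniform team selector."""
    new_v = dict(v)
    new_v["s0"] = min(                        # opponent's best pure response
        sum(0.25 * v[step("s0", prof, env)]   # each joint team action w.p. 1/4
            for prof in product("LR", repeat=2))
        for env in "LR"
    )
    return new_v

v = {"s0": 0.0, "goal": 1.0, "fail": 0.0}
for _ in range(60):
    v = pre_fixed(v)
print(v["s0"])  # converges to 1/3, the max-min value of this game
```

Each iterate here satisfies v(s0) ← 1/4 + v(s0)/4, whose fixed point 1/3 matches the value discussed in the introduction.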
3.2.1 Membership in the class ∃ℝ. We first show that the problem is in ∃ℝ by constructing an equivalent ETR formula. Our construction relies on Theorem 3.1. Theorem 3.3 (App. 0.B.5). The threshold problem lies in the class ∃ℝ. Proof (sketch). To help our reduction, we introduce λ-discounted games. These games share the same structure as concurrent games but employ a discounted payoff function parameterised by a discount factor λ ∈ (0,1). These games are quantitative: if the team reaches the target set T after k steps, the team gets the payoff λ^k (instead of 1). We construct a polynomial-size ETR formula Ψ(λ) which is satisfiable if and only if there exists a collective strategy such that the λ-discounted value exceeds t. Since this only captures the discounted version of the problem, we then show that it is equivalent to its qualitative counterpart, in the sense that there exists λ such that the formula Ψ(λ) is satisfiable if and only if we have a positive instance of the threshold problem. This intermediary reduction to the λ-discounted setting is necessary. A direct encoding of the reachability problem faces a hurdle: the Bellman optimality equations for reachability generally admit multiple fixed points. Consider, for example, a simple loop s → s with no path to the target. The true reachability probability is 0. However, the assignment v(s) = 1 trivially satisfies the Bellman equations one would write, making it a valid but spurious fixed point. One standard solution is to impose the restriction that it is the least fixed point. However, enforcing a “least fixed point” constraint would require the min operator, which is not permitted in the standard ∃ℝ framework. Alternatively, identifying the non-zero support set requires guessing a subset of vertices from which the target can be reached with nonzero probability, which puts the problem in the larger complexity class ∃ℝ^NP.
This makes our intermediary detour via λ-discounted settings necessary. ∎ 3.2.2 NP-hardness. We finally end with a proof of NP-hardness for this problem. Note that it is also SQRTSUM-hard, following from the two-player case [16]. For the sake of completeness, we also note that the threshold problem with non-strict inequalities was recently shown to be complete for the class ∃ℝ [8, Lemma 5.7]. Theorem 3.4 (App. 0.B.6). The threshold problem is NP-hard, even with three states and three players. Proof (sketch). We establish NP-hardness via a reduction from the clique problem. Given a graph G and an integer k, we construct a game with two team players and three states: an initial state, a target state, and a trap state. The team players select actions consisting of a tuple (u,i) ∈ V × {1,…,k}, claiming that they have a clique whose i-th vertex is u. Simultaneously, the opponent selects a pair of indices (i,j) to verify. Transitions from s_0 are defined as follows: the play reaches the target if the team plays ((u,i),(v,j)) and the opponent selects (i,j) such that the selection is consistent with a clique (u = v if i = j, and (u,v) ∈ E if i ≠ j). The game transitions to the trap state if the team matches the opponent's indices (i,j) but fails the clique consistency check (u ≠ v when i = j, or (u,v) ∉ E when i ≠ j). If the team's chosen indices do not match the opponent's query, the game loops back, repeating the round. We show that if G contains a clique of size k, the team can reach the target state with probability 1. Otherwise, the maximum probability of reaching ⊤ is bounded by 3/4. The threshold problem is therefore NP-hard, even with threshold 3/4. ∎ 4 The almost-sure problem In this section, we focus on Problem 2: deciding whether the team can win almost surely. We show that this problem is NP-complete. First, in Section 4.1, we show that memoryless strategies are also optimal for this problem.
Second, in Section 4.2, we prove the upper and the lower bound. 4.1 Optimality of memoryless strategies In this subsection, we show that if the team can guarantee reaching the target almost surely, it can do so using a memoryless collective strategy. Formally, we show the correctness of the following theorem. Note that it is not a consequence of Theorem 3.1, since the threshold problem is defined with strict inequalities. Theorem 4.1 (App. 0.C.1). Let G be a game. If there is a collective strategy σ̄_Π that guarantees victory with probability 1 against every opponent's strategy, then there is a memoryless one. Proof (sketch). We assign a rank (a value from ℕ ∪ {∞}) to each vertex, which captures “distance” from the target: the target set T has rank 0, and the states that have rank k+1 are exactly those where the team can force a transition to states with rank k with positive probability, without visiting a vertex of infinite rank. We construct a memoryless collective strategy as follows. At every state, the team plays to guarantee a positive probability of moving to a lower rank, while staying within the winning set (the states with finite rank). Since the probability of reducing the rank is always strictly positive, the game is forced to eventually progress and reach rank 0 (the target) with probability 1. ∎ 4.2 Complexity bounds 4.2.1 SAT encoding for NP membership. We show the following theorem. Theorem 4.2 (App. 0.C.2). The almost-sure problem is in NP. Proof (sketch). Based on Section 4.1, we know that if an almost-sure winning collective strategy exists, there exists a memoryless one. We then give an explicit SAT formula Φ that is satisfiable if and only if there exists a memoryless strategy that guarantees almost-sure reachability. The construction of Φ relies on the fact that almost-sure reachability is a qualitative property: it depends only on the support of the strategies, not on the precise probability values.
The formula Φ encodes the search for a winning set W and a support set of the strategy for each player. The constraints enforce the existence of the ranking function defined in the proof in Appendix 0.C.1. We also apply some optimisations to reduce the number of variables, such as using a binary encoding of the rank. ∎ We end this section with a hardness result. The following theorem is an immediate consequence of the construction presented in the proof of Theorem 3.4. Corollary 1 (of Theorem 3.4). The almost-sure problem is NP-hard. 5 Experimental results We implement two algorithms to compute the max-min value and one for the almost-sure case. The first algorithm for the value problem is a global reduction that maps the game to a single ETR formula. The second uses a value iteration (VI) scheme to compute the value. For the almost-sure case, we implement one algorithm based on its SAT encoding. To establish a baseline and contextualise our runtimes, we also compare our approach with the state-of-the-art PRISM-games [25], which solves the two-player version of concurrent games. PRISM-games is particularly relevant as it handles both the quantitative (max-min value) and qualitative (almost-sure) versions of this problem, but where the team players have access to shared randomness. PRISM-games therefore solves a strict, and computationally easier, subcase of our setting. 5.1 Algorithms We evaluate three algorithms for the max-min value. The first is a global reduction. The second is a value iteration framework, for which we provide three distinct implementations. Each implementation provides a sound under-approximation of the value. The third is a baseline based on PRISM-games [25]. Algorithm 1: ETR-Direct. This method implements the reduction from Section 3.2. It translates the entire game into a single ETR formula. It then computes the game value using binary search combined with SMT solvers. Algorithm 2: Value iteration (VI).
ETR-Direct scales poorly on games with many states. We therefore use the value iteration scheme from Section 3.1. This method updates values iteratively by solving local one-shot games. In each local game, we calculate the highest probability of progress the team can guarantee in a single step, based on the current value estimates of the states, and we update the state values accordingly. Every iterate is a lower bound on the max-min value of the game and improves at every step; the computed values therefore form a sound under-approximation. We propose three implementations for solving these local instances:

1. VI-ETR: uses an SMT solver to solve each local instance exactly. This guarantees precision but may take longer on computationally difficult instances.

2. VI-OPT: treats each one-shot game as a non-linear optimisation problem to be faster. We use Sequential Least Squares Programming (SLSQP) [24] to solve them. Because the constraints are smooth (C^∞), the solver uses analytical gradients. This method yields an under-approximation of the value because the solver may converge to a local optimum rather than the global one. However, our experiments show that the computed value remains close to the true value.

3. VI-Hybrid: combines SLSQP optimisation with exact solving. For each local game, we first generate a candidate value v* using SLSQP. We then query the SMT solver to check whether a value v ≥ v* + ε is feasible. If the query is unsatisfiable, we accept v* as the global optimum. If it is satisfiable (indicating that v* is a local optimum), we solve the instance by binary search using v* as a lower bound. Thus, VI-Hybrid reliably computes the value of the game.

Algorithm 3: Quantitative PRISM (baseline). We include the state-of-the-art algorithm from PRISM-games [25].
It assumes that team players share randomness (whereas our setting assumes that they do not). Our setting is more general, since shared randomness can be modelled by merging the team into a single player. The optimisation strategy used in VI-OPT and VI-Hybrid (SLSQP) is explained in detail in Section 0.E.1.

For the almost-sure reachability problem, we evaluate two algorithms:

Algorithm 4: SAT-Direct. This method implements the encoding described in Section 4.2.1. It translates the almost-sure problem into a SAT instance, then uses a SAT solver to determine whether an almost-surely winning collective strategy exists. This algorithm always gives a verification guarantee.

Algorithm 5: Qualitative PRISM (baseline). We use the qualitative verification engine of PRISM-games [25] as a baseline. As with Algorithm 3, this tool solves the almost-sure problem under the assumption of shared randomness, and thus serves only as a comparison point for our more general setting.

5.1.1 Implementation. We implemented a prototype solver in Python 3.11.14 to evaluate these algorithms. We conducted all experiments on a machine running Ubuntu 24.04 LTS, equipped with an Intel Core Ultra 5 225U processor and 16 GB of RAM. We set a timeout of 600 seconds for each algorithm on each instance. For the max-min problem, we perform binary search until we achieve a precision of ε = 10⁻⁴.

1. SMT solver: We use the Z3 SMT solver [31] for all ETR queries. It handles Algorithm 1 and the verification steps in the value iteration variants.
2. SLSQP optimiser: We use the SLSQP algorithm provided by SciPy [39]. This optimiser solves the local games in VI-OPT and generates the candidate values for VI-Hybrid.
3. Stopping criteria: We terminate the value iteration when the values stabilise, i.e., when the maximum change across all states drops below ε′ = 10⁻⁴, and report the final values.
4.
SAT solver: For the almost-sure reachability problem (Algorithm 4), we use the MiniSat 2.2 SAT solver [38] provided by the PySAT library.

Figure 3: Execution time of the algorithms across the three benchmarks. The plots report execution time in seconds against state-space size, both on a logarithmic scale.

5.2 Benchmarks

We evaluate our algorithms on three distinct benchmarks. The first is Pursuit-Evasion with Rendezvous, a variant of the game from [33]: a cooperative team of agents must meet at one vertex of a directed graph while avoiding a pursuer. The second is Robot Coordination, adapted from PRISM-games [26], where a team of robots navigates a grid against adversarial wind conditions to reach a target configuration. The third is Jamming Multi-Channel Radio Systems, also adapted from PRISM-games [26], which models sensors transmitting packets over frequency channels subject to adversarial jamming and collisions. We provide formal definitions, dynamics, and objective functions for all three benchmarks in Appendix 0.E.2.

Figure 4: Execution time (in seconds, logarithmic scale) of SAT-Direct under individual and shared randomness constraints, compared against the baseline qualitative solver (which assumes shared randomness).

5.3 Results

We evaluate our algorithms in two settings. The first is the general case of individual randomness. The second is the special case of shared randomness; there, we model the team as a single "meta-player" with joint actions, which allows us to compare our results against PRISM-games. Detailed numerical data is provided in Appendix 0.E.3.

5.3.1 Max-min problem. Across both settings, ETR-Direct fails to solve even small instances within the timeout. In the case of individual randomness (Figure 3, left), the exact and hybrid VI approaches (VI-ETR, VI-Hybrid) solve small instances, but their execution time increases drastically as the state space grows.
Conversely, the optimisation-based variant (VI-OPT) scales effectively, solving the largest instances across all benchmarks within the time limit. In terms of precision, VI-OPT yields tight under-approximations in both settings, consistently converging to values very close to the exact solutions. While PRISM outperforms our general-purpose solvers in the special case of shared randomness (Figure 3, right), VI-OPT stays competitive.

5.3.2 Almost-sure problem. We analyse the Pursuit-Evasion benchmark. We selected this instance because it is the only one where the team has non-trivial strategies to win with probability 1. Figure 4 presents the results. First, we look at the case of shared randomness. Here, our SAT-Direct algorithm performs well compared to PRISM-games. On smaller instances (Scenarios 1 to 4), our method is faster than PRISM because it has less overhead. On the largest instances (Scenarios 6 and 7), PRISM scales slightly better, but our solver remains competitive despite solving a more difficult problem. Second, we look at the case of individual randomness. This problem is much harder because the players cannot coordinate their actions, and, as expected, the time needed to find a solution grows much faster than in the shared case. Despite this added difficulty, SAT-Direct is successful: it solves large games with over 97,000 transitions within the time limit.

6 Towards Individually Randomised ATL

We conclude this paper by proposing a new logic that captures our concept of individual randomisation. A classical tool for defining specifications over reactive systems is Alternating-time Temporal Logic (ATL) [2], in which one can express the fact that a coalition of players can surely guarantee an objective expressible in Linear Temporal Logic (LTL). Since only sure guarantees are considered there, the semantics of ATL need not consider randomised strategies. This changed with the introduction of Randomised ATL (RATL) [14] (and PATL [13]).
There, sure guarantees are extended with almost-sure and limit-sure ones, and the semantics allows players to use randomness. For example, the formula ⟨⟨C⟩⟩_almost □¬q means that the coalition C has a "collective strategy" ensuring, with probability 1, that no state satisfying the atomic proposition q is ever reached. However, the RATL semantics defines a collective strategy for the coalition C as a mapping from histories to distributions over the whole set of tuples of actions available to the players in C, implicitly assuming that those players can randomise based on a secret common source of randomness, which then reduces to a 2-player setting. Our notion of collective strategy, where players in a coalition can design their strategy together but randomise individually, cannot be captured by this logic. In this section, we introduce Individually Randomised ATL (IRATL), whose semantics bridges this gap.

6.0.1 Syntax. Our logic is defined to extend both RATL and PATL.

Definition 2 (IRATL formula). Let Q be a set of atoms and Σ a set of players. An IRATL formula is either:
• a proposition q ∈ Q;
• a formula ¬φ or φ ∨ ψ, where φ and ψ are IRATL formulas;
• a formula ⟨⟨C⟩⟩^r_w ◯φ, ⟨⟨C⟩⟩^r_w □φ, or ⟨⟨C⟩⟩^r_w φUψ, where C ⊆ Σ is a coalition of players, r ∈ {sh, ind} is a randomisation type, w ∈ {sure, almost, limit} ∪ {>t | t ∈ [0,1]} is a winning condition, and φ and ψ are IRATL formulas.

The operators ⟨⟨·⟩⟩^r_w are called path quantifiers; the operators ◯, □, and U are called temporal operators; and subformulas of the form ◯φ, □φ, or φUψ are called path formulas.

6.0.2 Semantics. Given a game structure G, a set of atomic propositions Q, and a labelling function L : S → 2^Q, we define the semantics of IRATL by cross-induction as follows.
Given a state s ∈ S, we define:
• s ⊨ q if q ∈ L(s);
• s ⊨ ¬φ if we do not have s ⊨ φ, and s ⊨ φ ∨ ψ if we have s ⊨ φ or s ⊨ ψ;
• s ⊨ ⟨⟨C⟩⟩^ind_sure θ if, in the game structure G, from the state s, there exists a strategy profile σ̄_C such that for every strategy profile σ̄_{Σ∖C} and every play π compatible with the strategy profile σ̄, we have π ⊨ θ;
• s ⊨ ⟨⟨C⟩⟩^ind_almost θ if, in the game structure G, from the state s, there exists a strategy profile σ̄_C such that for every strategy profile σ̄_{Σ∖C}, we have ℙ_σ̄{π ∈ S^ω | π ⊨ θ} = 1;
• s ⊨ ⟨⟨C⟩⟩^ind_limit θ if, in the game structure G, from the state s, for every ε > 0, there exists a strategy profile σ̄_C such that for every strategy profile σ̄_{Σ∖C}, we have ℙ_σ̄{π ∈ S^ω | π ⊨ θ} > 1 − ε;
• s ⊨ ⟨⟨C⟩⟩^ind_{>t} θ if, in the game structure G, from the state s, there exists a strategy profile σ̄_C such that for every strategy profile σ̄_{Σ∖C}, we have ℙ_σ̄{π ∈ S^ω | π ⊨ θ} > t;
• s ⊨ ⟨⟨C⟩⟩^sh_w θ is defined analogously, but in the game structure G_C, in which all players of C have been merged into one player who can pick an action profile at each state.

On the other hand, given a play π = π₀π₁…, we define:
• π ⊨ ◯φ if we have π₁π₂… ⊨ φ;
• π ⊨ □φ if for every n ∈ ℕ, we have π_n ⊨ φ;
• π ⊨ φUψ if there exists n ∈ ℕ such that we have π_n ⊨ ψ, and π_k ⊨ φ for every k < n.

Other classical symbols can be derived: for example, the formula ⊤ is short for q ∨ ¬q for some q, and the formula ◇φ for ⊤Uφ.

6.0.3 Model checking. The algorithms presented in the previous sections enable us to define a decidable fragment of IRATL.

Theorem 6.1. Given a game structure G, a state s, and a formula φ that does not contain the symbols ⟨⟨C⟩⟩^ind_limit or ⟨⟨C⟩⟩^ind_{>t} □, deciding whether in the game G we have s ⊨ φ can be done in PSPACE, and in polynomial time if the size of the game structure is fixed.
Proof. As a preliminary remark, recall that when a coalition C seeks to achieve an objective θ against all strategies of the other players, those players can be modelled by a single opponent, as they do not need any form of randomisation to respond to the coalition's strategy. We can then apply the following recursive algorithm. Given an IRATL formula φ:
• If φ = q, ¬ψ, or ψ ∨ χ, treat it recursively as expected.
• If φ = ⟨⟨C⟩⟩^sh_w θ, apply the same recursive operations as for RATL [14] and PATL [13].
• If φ = ⟨⟨C⟩⟩^ind_sure θ: since the coalition must surely achieve the objective, randomising is useless, and the coalition can then be seen as a single player. We can therefore apply the algorithms cited in the previous point.
• If φ = ⟨⟨C⟩⟩^ind_almost □ψ: this formula is semantically equivalent to ⟨⟨C⟩⟩^ind_sure □ψ (see Appendix 0.D for a proof). We can therefore apply the previous point.
• If φ = ⟨⟨C⟩⟩^ind_almost ◯ψ (resp. ⟨⟨C⟩⟩^ind_{>t} ◯ψ): first compute the set T of states that satisfy the formula ψ. Then, consider the game in which the team C wants to reach the set T and all states except s are made absorbing, and decide whether the team can guarantee reaching the set T almost surely (resp. with probability greater than t), using Theorem 4.2 (resp. Theorem 3.3).
• If φ = ⟨⟨C⟩⟩^ind_almost ψUχ (resp. ⟨⟨C⟩⟩^ind_{>t} ψUχ): first compute the sets U and T of states that satisfy the formulas ψ and χ, respectively. Consider the game in which all states s′ ∈ S ∖ U are made absorbing. Then, decide whether in this game the team C can guarantee reaching the set T almost surely (resp. with probability greater than t), using Theorem 4.2 (resp. Theorem 3.3).

The size of the recursion stack is bounded by the size of the formula, and each recursive call runs a polynomial-space algorithm, hence the conclusion.
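As a companion to this case analysis, the recursive driver can be sketched as follows. This is a minimal illustration, not the paper's implementation: formulas are nested tuples, `reach_solver` is a caller-supplied stand-in for the decision procedures behind the quantified cases, and only the propositional and Until cases are shown.

```python
def check(s, phi, states, label, reach_solver):
    """Return True iff state s satisfies the formula phi.
    label maps each state to its set of atomic propositions;
    reach_solver(s, coalition, w, safe, target) decides team reachability
    in the game where states outside `safe` are made absorbing."""
    op = phi[0]
    if op == "atom":
        return phi[1] in label[s]
    if op == "not":
        return not check(s, phi[1], states, label, reach_solver)
    if op == "or":
        return (check(s, phi[1], states, label, reach_solver)
                or check(s, phi[2], states, label, reach_solver))
    if op == "until":  # ("until", coalition, winning_cond, psi, chi)
        _, coalition, w, psi, chi = phi
        # recursively evaluate the subformulas to obtain the two state sets
        safe = {t for t in states if check(t, psi, states, label, reach_solver)}
        target = {t for t in states if check(t, chi, states, label, reach_solver)}
        return reach_solver(s, coalition, w, safe, target)
    raise ValueError(f"unsupported operator: {op}")
```

The recursion depth is bounded by the formula size, matching the stack bound argued in the proof.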
∎

The decidability of the full logic IRATL would follow if one could handle the two following problems, which we leave as open questions.

Problem 3 (Limit-sure problem). Given a game G, does it hold that for every ε > 0, there exists a collective strategy σ̄_Π such that for any strategy σ_O, the probability of reaching the set T under the strategy profile σ̄ is greater than 1 − ε?

Problem 4 (Safety problems). Given a game G, can the team guarantee (with probability greater than a given threshold, or limit-surely) that the set T is never reached?

The reader may already be convinced that the limit-sure problem requires new techniques, but might believe that the safety threshold problem can be treated analogously to the reachability threshold problem. This is not the case, because Theorem 3.1 does not extend to safety games: as shown in the work of de Alfaro et al. [14, Theorem 8], even in a two-player setting, infinite memory may be required to guarantee that a safety objective is satisfied with positive probability. However, in that setting, the problem can be decided by taking the perspective of the opponent, i.e., by asking whether the opponent has a strategy to make the player lose with probability at least 1 − t. Such arguments no longer work in our setting, since the max-min value does not equal the min-max value. Hence, the safety threshold problem seems to be more challenging than its reachability counterpart.

7 Discussion

We introduced the logic IRATL, which extends the well-studied ATL to support cases where players randomise independently, without a common coin. We showed that this distinction is not merely semantic but requires new algorithms. IRATL describes games where players form teams but must randomise individually. The exact complexity of solving these games is left as an open problem: we show ∃ℝ-easiness and NP-hardness for the threshold problem, which is also SQRTSUM-hard, following from the two-player case [16].
It would be interesting to know the exact complexity of this problem. Second, from a practical perspective, our current value iteration method relies on asymptotic convergence without a certified stopping criterion. Even in the simpler two-player concurrent setting, it is known that a double-exponential number of iterations may be required to approximate the value to within ε [20]. A promising avenue for future work is to extend recent results on stopping criteria for concurrent games [27]: by developing techniques to approximate the value from both above and below, we could rigorously guarantee ε-optimality upon termination. We also leave as future work the integration of our algorithms into established model checkers such as PRISM-games [25] or STORM [21], enabling model-checking tools that accurately model decentralised systems.

7.0.1 Acknowledgements. This work is part of project VAMOS, which has received funding from the European Research Council (ERC), grant agreement No 101020093. Part of this work was realised when the first author was an FNRS aspirant at Université libre de Bruxelles.

References

[1] de Alfaro, L., Henzinger, T.A., Jhala, R.: Compositional methods for probabilistic systems. In: CONCUR 2001 — Concurrency Theory, pp. 351–365. Springer, Berlin, Heidelberg (2001)
[2] Alur, R., Henzinger, T.A., Kupferman, O.: Alternating-time temporal logic. J. ACM 49(5), 672–713 (2002). https://doi.org/10.1145/585265.585270
[3] Attoui, A.: Multi-agent based method for reactive systems formal specification and validation. IFAC Proceedings Volumes 29(2), 1–6 (1996). https://doi.org/10.1016/S1474-6670(17)43768-2
[4] Bertrand, N., Bouyer, P., Lapointe, L., Mascle, C.: Reach together: How populations win repeated games. CoRR abs/2510.02984 (2025).
https://doi.org/10.48550/arXiv.2510.02984
[5] Bertrand, N., Bouyer, P., Majumdar, A.: Concurrent parameterized games. In: FSTTCS 2019. LIPIcs, vol. 150, pp. 31:1–31:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019). https://doi.org/10.4230/LIPICS.FSTTCS.2019.31
[6] Bertrand, N., Bouyer, P., Majumdar, A.: Synthesizing safe coalition strategies. In: FSTTCS 2020. LIPIcs, vol. 182, pp. 39:1–39:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020). https://doi.org/10.4230/LIPICS.FSTTCS.2020.39
[7] Blackwell, D.: Discrete dynamic programming. The Annals of Mathematical Statistics 33(2), 719–726 (1962)
[8] Brice, L., Henzinger, T.A., Thejaswini, K.S.: Dicey games: Shared sources of randomness in distributed systems (2026), https://arxiv.org/abs/2601.18303
[9] Canny, J.: Some algebraic and geometric computations in PSPACE. In: Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing (STOC '88), pp. 460–467. Association for Computing Machinery (1988). https://doi.org/10.1145/62212.62257
[10] Chatterjee, K., de Alfaro, L., Henzinger, T.A.: Strategy improvement for concurrent reachability and safety games. CoRR abs/1201.2834 (2012), https://arxiv.org/abs/1201.2834, preprint version of the combined QEST/SODA results
[11] Chen, T., Forejt, V., Kwiatkowska, M.Z., Parker, D., Simaitis, A.: PRISM-games: A model checker for stochastic multi-player games. In: TACAS 2013. Lecture Notes in Computer Science, vol. 7795, pp. 185–191. Springer (2013). https://doi.org/10.1007/978-3-642-36742-7_13
[12] Chen, T., Forejt, V., Kwiatkowska, M., Parker, D., Simaitis, A.: Automatic verification of competitive stochastic systems. In: TACAS 2012. Lecture Notes in Computer Science, vol. 7214, pp. 315–330. Springer (2012).
https://doi.org/10.1007/978-3-642-28756-5_22
[13] Chen, T., Lu, J.: Probabilistic alternating-time temporal logic and model checking algorithm. In: Lei, J. (ed.) FSKD 2007, pp. 35–39. IEEE Computer Society (2007)
[14] de Alfaro, L., Henzinger, T.A., Kupferman, O.: Concurrent reachability games. Theoretical Computer Science 386(3), 188–217 (2007). https://doi.org/10.1016/j.tcs.2007.07.008
[15] Diehl, M., Ferreau, H.J., Haverbeke, N.: Efficient numerical methods for nonlinear MPC and moving horizon estimation. In: Nonlinear Model Predictive Control: Towards New Challenging Applications, pp. 391–417. Springer (2009)
[16] Etessami, K., Yannakakis, M.: Recursive Markov decision processes and recursive stochastic games. J. ACM 62(2) (2015). https://doi.org/10.1145/2699431
[17] Forejt, V., Kwiatkowska, M., Norman, G., Parker, D.: Automated Verification Techniques for Probabilistic Systems, pp. 53–113. Springer, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21455-4_3
[18] Guestrin, C., Koller, D., Parr, R.: Multiagent planning with factored MDPs. In: Advances in Neural Information Processing Systems (NIPS), vol. 14, pp. 1523–1530. MIT Press (2002)
[19] Gutierrez, J., Najib, M., Perelli, G., Wooldridge, M.J.: EVE: A tool for temporal equilibrium analysis. In: ATVA 2018. Lecture Notes in Computer Science, vol. 11138, pp. 551–557. Springer (2018). https://doi.org/10.1007/978-3-030-01090-4_35
[20] Hansen, K.A., Ibsen-Jensen, R., Miltersen, P.B.: The complexity of solving reachability games using value and strategy iteration. In: Computer Science – Theory and Applications.
Springer (2011)
[21] Hensel, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: The probabilistic model checker Storm. Int. J. Softw. Tools Technol. Transf. 24(4), 589–610 (2022). https://doi.org/10.1007/s10009-021-00633-z
[22] Immerman, N.: Number of quantifiers is better than number of tape cells. J. Comput. Syst. Sci. 22, 384–406 (1981)
[23] Kok, J.R., Vlassis, N.: Collaborative multiagent reinforcement learning by payoff propagation. Journal of Machine Learning Research 7(65), 1789–1828 (2006)
[24] Kraft, D.: A software package for sequential quadratic programming. Tech. Rep. Forschungsbericht FB 88-28, Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt (DFVLR), Köln, Germany (1988)
[25] Kwiatkowska, M., Norman, G., Parker, D., Santos, G.: PRISM-games 3.0: Stochastic game verification with concurrency, equilibria and time. In: CAV 2020. Lecture Notes in Computer Science, vol. 12225, pp. 475–487. Springer (2020). https://doi.org/10.1007/978-3-030-53291-8_25
[26] Kwiatkowska, M., Norman, G., Parker, D., Santos, G.: Automatic verification of concurrent stochastic systems. Formal Methods in System Design 58(1), 188–250 (2021)
[27] Křetínský, J., Meggendorfer, T., Weininger, M.: Stopping criteria for value iteration on stochastic games with quantitative objectives. In: LICS 2023, pp. 1–14 (2023). https://doi.org/10.1109/LICS56636.2023.10175771
[28] Lefèvre, C.: Optimal control of a birth and death epidemic process. Operations Research 29(5), 971–982 (1981), http://www.jstor.org/stable/170234
[29] Lomuscio, A., Qu, H., Raimondi, F.: MCMAS: an open-source model checker for the verification of multi-agent systems. Int. J.
Softw. Tools Technol. Transf. 19(1), 9–30 (2017). https://doi.org/10.1007/s10009-015-0378-x
[30] Lyu, G., Fazlirad, A., Brennan, R.W.: Multi-agent modeling of cyber-physical systems for IEC 61499 based distributed automation. Procedia Manufacturing 51, 1200–1206 (2020). https://doi.org/10.1016/j.promfg.2020.10.168
[31] de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: TACAS 2008. Lecture Notes in Computer Science, vol. 4963, pp. 337–340. Springer (2008). https://doi.org/10.1007/978-3-540-78800-3_24
[32] Murano, A., Neider, D., Zimmermann, M.: Robust alternating-time temporal logic. In: Logics in Artificial Intelligence, pp. 796–813. Springer Nature Switzerland, Cham (2023)
[33] Parsons, T.D.: Pursuit-evasion in a graph. In: Theory and Applications of Graphs: Proceedings, Michigan May 11–15, 1976, pp. 426–441. Springer (2006)
[34] Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons (1994)
[35] Rudin, W.: Principles of Mathematical Analysis. 3rd edn. McGraw-Hill, New York (1976)
[36] Schaefer, M., Cardinal, J., Miltzow, T.: The existential theory of the reals as a complexity class: A compendium. CoRR abs/2407.18006 (2024). https://doi.org/10.48550/arXiv.2407.18006
[37] Shi, D., Elliott, R.J., Chen, T.: On finite-state stochastic modeling and secure estimation of cyber-physical systems. IEEE Transactions on Automatic Control 62(1), 65–80 (2016)
[38] Sorensson, N.: MiniSat 2.2 and MiniSat++ 1.1. http://baldur.iti.uka.de/sat-race-2010/descriptions/solver_25+26.
pdf (2010)
[39] Virtanen, P., Gommers, R., Oliphant, T.E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S.J., Brett, M., Wilson, J., Millman, K.J., Mayorov, N., Nelson, A.R.J., Jones, E., Kern, R., Larson, E., Carey, C.J., Polat, İ., Feng, Y., Moore, E.W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E.A., Harris, C.R., Archibald, A.M., Ribeiro, A.H., Pedregosa, F., van Mulbregt, P., SciPy 1.0 Contributors: SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods 17, 261–272 (2020). https://doi.org/10.1038/s41592-019-0686-2
[40] Zhang, K., Yang, Z., Başar, T.: Multi-agent reinforcement learning: A selective overview of theories and algorithms. In: Handbook of Reinforcement Learning and Control, pp. 321–384 (2021)
[41] Zhu, E.L., Borrelli, F.: A sequential quadratic programming approach to the solution of open-loop generalized Nash equilibria. In: 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 3211–3217 (2023)

Appendix 0.A The existential theory of the reals

Definition 3 (Existential theory of the reals). A formula of the existential theory of the reals, or ETR for short, is a formula of the form

∃x₁ … ∃x_k φ(x₁, …, x_k),

where φ is a quantifier-free formula written with the symbols 0, 1, =, ≤, <, +, −, ×, ∧, ∨, ¬, and parentheses, with the expected syntactic rules and semantics. We write ∃ℝ for the complexity class of problems that can be reduced in polynomial time to the problem of deciding the validity of an ETR formula. That class is known to be included in PSPACE [9]. It is also easy to see that ∃ℝ contains the class NP.

Appendix 0.B Appendix for Section 3

0.B.1 A first lemma

We prove a simple lemma that we use in the proof of Theorem 3.3.
Lemma 2. For any valuation x and any state s, the supremum in the definition of Pre(x)(s) is attained. Formally, there exists a memoryless collective strategy σ̄*_Π ∈ Λ_Π such that:

Pre(x)(s) = inf_{σ_O ∈ Λ_O} Pre_{σ̄*_Π, σ_O}(x)(s)

Proof. First, the set Λ_Π of selectors for the team is the product of sets of probability distributions over finite action sets, one for each player; this set is compact (closed and bounded). Second, consider the function f that maps a team selector to its guaranteed value:

f : σ̄_Π ↦ inf_{σ_O} Pre_{σ̄_Π, σ_O}(x)(s)

The computation of Pre involves only sums and products of the probabilities defined by σ̄_Π and σ_O; therefore, the function f is continuous. Finally, we apply the extreme value theorem [35]: a continuous function defined on a compact set attains its maximum. Therefore, there is a specific selector σ̄*_Π that realises the supremum. ∎

0.B.2 Proof of Lemma 1

See 1.

Proof. We first prove that the value iteration sequence converges.

Proposition 1. The limit w = lim_{k→∞} u_k exists.

Proof. First, we show that the predecessor operator Pre is monotone. Let x and y be two valuations such that x ≤ y (i.e., x(s) ≤ y(s) for all s ∈ S). Consider any fixed collective selector ξ̄_Π and any selector ξ_O for the opponent, and write ξ̄ = (ξ̄_Π, ξ_O). Given a state s, the corresponding expected payoff is:

Pre_ξ̄(x)(s) = Σ_{ā ∈ Av(s)} Σ_{t ∈ S} ξ̄(s)(ā) · δ(s, ā)(t) · x(t).

This expression is a weighted average of the values x(t) with non-negative weights (since ξ̄_Π, ξ_O, and δ take non-negative values). Therefore, if x(t) ≤ y(t) for all states t, the weighted average for x cannot exceed the weighted average for y. Thus:

Pre_ξ̄(x)(s) ≤ Pre_ξ̄(y)(s).
Since taking the infimum (over opponent selectors) and the supremum (over team selectors) preserves the order, we have:

Pre(x) ≤ Pre(y).

Now we consider the sequence (u_n)_{n≥0} and show its convergence in three steps.

Step 1: u₀ ≤ u₁. If s ∈ T, then Pre(u₀)(s) = 1 = u₀(s), because the set T is assumed to be absorbing. If s ∉ T, then u₀(s) = 0 ≤ Pre(u₀)(s), where the inequality holds because Pre is defined from non-negative probabilities.

Step 2: Monotonicity of (u_n)_n. Assume u_n ≤ u_{n+1} for some n ≥ 0. Since the operator Pre is monotone, applying it to both sides gives:

u_{n+1} = Pre(u_n) ≤ Pre(u_{n+1}) = u_{n+2}.

By induction, the sequence is non-decreasing: u₀ ≤ u₁ ≤ u₂ ≤ ….

Step 3: Boundedness. All valuations in the sequence lie in [0,1]^S. Thus, for each s ∈ S, the real sequence (u_n(s))_{n≥0} is non-decreasing and bounded above by 1, hence it converges. ∎

Now, define ℙ(◇T)(s) as the maximum probability with which the team can guarantee reaching T starting from s:

ℙ(◇T)(s) = sup_{σ̄_Π} inf_{σ_O} ℙ_σ̄(◇T)(s),

where ℙ_σ̄(◇T)(s) denotes the probability of reaching T, starting from s, if the team plays with collective strategy σ̄_Π and the opponent plays with strategy σ_O. By the same approach, we define ℙ(◇^n T)(s) as the maximum probability that the team can guarantee of reaching T within at most n steps, and ℙ_σ̄(◇^n T)(s) as the same probability under the strategy profile σ̄ = (σ̄_Π, σ_O). We want to prove that for all s, we have w(s) = ℙ(◇T)(s). This will be a consequence of the two following propositions.

Proposition 2. For all ε > 0, the team players have a collective strategy σ̄_Π that guarantees reaching T from every state s with probability at least w(s) − ε.
Proof. For each n ≥ 0, we define ξ̄^n_Π(s) as the collective selector that realises the supremum in the definition of u_n(s); by Lemma 2, such a selector exists. Then, for all s, we have:

u_n(s) = inf_{ξ_O} Pre_{ξ̄^n_Π, ξ_O}(u_{n−1})(s).

For a fixed n ≥ 0, we define a collective strategy τ̄^n_Π that plays ξ̄^n_Π in the first step, then ξ̄^{n−1}_Π in the second step, and so on. More formally, for all 1 ≤ i ≤ n, in the i-th step the team plays according to ξ̄^{n−i+1}_Π; in the following steps, it behaves arbitrarily. We show by induction on n that for every state s and all strategies ξ_O of the opponent, we have:

ℙ_{τ̄^n_Π, ξ_O}(◇^n T)(s) ≥ u_n(s).

For n = 0 the claim is correct, since ◇⁰T simply means that s ∈ T, and by definition we know that u₀(s) = 1 if and only if s ∈ T. Assume the claim holds for n − 1. Starting from a state s, the first step is played using ξ̄^n_Π. Conditioning on the successor t: if t ∈ T, the objective is already satisfied, and if t ∉ T, then by the induction hypothesis the probability of reaching T within the next n − 1 steps is at least u_{n−1}(t). Therefore, we have:

ℙ_{τ̄^n_Π, ξ_O}(◇^n T)(s) ≥ Pre_{ξ̄^n_Π, ξ_O}(u_{n−1})(s).

Taking the infimum over all opponent strategies and using the definition of ξ̄^n_Π, we obtain:

∀ξ_O, ℙ_{τ̄^n_Π, ξ_O}(◇^n T)(s) ≥ u_n(s).

This completes the induction. Since u_n → w and S is finite, there exists n such that |u_n(s) − w(s)| < ε for all s. For this n, the strategy τ̄^n_Π satisfies, for all s and all σ_O:

ℙ_{τ̄^n_Π, σ_O}(◇^n T)(s) ≥ u_n(s) ≥ w(s) − ε.

This proves the proposition. ∎

Proposition 3. For every state s, the team players do not have a collective strategy σ̄_Π that guarantees reaching T with probability greater than w(s).
Proof
We show that for any collective strategy $\bar\sigma_\Pi$ chosen by the team, the opponent has a counter-strategy $\sigma_O$ that guarantees the probability of reaching the target T never exceeds $w(s)$. We prove this by showing that the opponent can keep the probability of reaching T within any number of steps n (the $\Diamond^{\le n} T$ objective) below the valuation $u_n$. We use induction on n. For $n=0$, reaching the target in zero steps is only possible if we are already there, so the probability is 1 for $s\in T$ and 0 otherwise. This matches the definition of $u_0$, since $u_0$ assigns 1 exactly to the states $s\in T$. Now, assume that the claim is true for some n. Let s be a state, and let $\bar\sigma_\Pi$ be a collective strategy. By definition of the predecessor operator Pre, we have:

$$\inf_{\xi_O\in\Lambda_O}\ \mathrm{Pre}_{\bar\xi}(u_n)(s)\le u_{n+1}(s)$$

Moreover, there is a selector $\xi_O$ which realises this infimum, and therefore such that $\mathrm{Pre}_{\bar\xi}(u_n)(s)\le u_{n+1}(s)$. Now, let us consider the following strategy for the opponent. First, she plays the selector $\xi_O$. Then, let t be the state that is reached: from t, by the induction hypothesis, the opponent can respond to the team's strategy to ensure that the probability of reaching T during the next n steps does not exceed $u_n(t)$. The opponent then plays that strategy. Then, from s, the probability of reaching T within $n+1$ steps is at most $u_{n+1}(s)$. As n grows, the values $u_n$ converge to the limit w. Since the opponent can keep the team's success probability after n steps below $u_n$ for every specific n, the opponent can also keep the total probability of eventually reaching T from exceeding the limit $w(s)$. This shows that no collective strategy can guarantee a payoff strictly greater than $w(s)$.
More formally:

$$\forall n,\forall\bar\sigma_\Pi,\exists\sigma_O:\ \mathbb{P}_{\bar\sigma}(\Diamond^{\le n} T)\le u_n \quad\text{(proven by induction)}$$
$$\implies\ \forall\bar\sigma_\Pi,\exists\sigma_O,\forall n:\ \mathbb{P}_{\bar\sigma}(\Diamond^{\le n} T)\le w \quad\text{(since } u_n\le w\text{)}$$
$$\implies\ \sup_{\bar\sigma_\Pi}\inf_{\sigma_O}\sup_n\ \mathbb{P}_{\bar\sigma}(\Diamond^{\le n} T)\le w \quad\text{(properties of sup/inf)}$$
$$\implies\ \sup_{\bar\sigma_\Pi}\inf_{\sigma_O}\ \mathbb{P}_{\bar\sigma}(\Diamond T)\le w \quad\text{(monotonicity of the reaching probability)}$$

The last line shows that for any collective strategy $\bar\sigma_\Pi$ and any state s, the opponent has a counter-strategy that prevents the team from reaching T with probability greater than $w(s)$. ∎

This proves our lemma.

0.B.3 Proof of Theorem 3.1

See 3.1.

Proof
We use the same approach as in the work of Chatterjee, de Alfaro and Henzinger [10] for two-player concurrent games, where they show that memoryless strategies suffice for two-player zero-sum concurrent games.

Proposition 4
Let n be the index such that the valuation $u_n$ satisfies $\|w-u_n\|_\infty<\varepsilon/2$ (which exists since $(u_n)_n$ converges to w). For each non-terminal state s, let the rank $\ell(s)$ be the smallest $k\in\{1,\dots,n\}$ such that $u_k(s)=u_n(s)$. There exists a collective selector $\bar\xi^s_\Pi$ such that for any opponent action $b\in A_O$, the expected value of the valuation $u_{\ell(s)-1}$ at the subsequent state t satisfies:

$$\sum_{\bar a_\Pi\in \mathrm{Av}_\Pi(s)}\Bigl(\bar\xi^s_\Pi(\bar a_\Pi)\cdot\sum_{t\in S}\delta(s,\bar a_\Pi,b)(t)\cdot u_{\ell(s)-1}(t)\Bigr)\ge u_{\ell(s)}(s).$$

Proof
By the construction of the value iteration sequence, we have $u_{\ell(s)}(s)=\mathrm{Pre}(u_{\ell(s)-1})(s)$. We recall that the Pre operator is defined by a supremum over team memoryless selectors and an infimum over opponent memoryless selectors:

$$\mathrm{Pre}(u_{\ell(s)-1})(s)=\sup_{\bar\xi_\Pi\in\Lambda_\Pi}\ \inf_{\xi_O\in\Lambda_O}\ \mathrm{Pre}_{\bar\xi}(u_{\ell(s)-1})(s)$$

From Lemma 2, there must exist a memoryless collective selector $\bar\xi^s_\Pi$ that secures the value $u_{\ell(s)}(s)$ relative to the previous valuation $u_{\ell(s)-1}$.
This choice of action ensures that the team maintains its guaranteed success probability while accounting for the step-count progress required to reach T. ∎

Lemma 3
There exists a memoryless collective strategy $\bar\sigma$ for the team that guarantees reaching T from state s with probability at least $w(s)-\varepsilon$ for every $\varepsilon>0$.

Proof
Consider the memoryless strategy $\bar\sigma$ where the team plays $\bar\xi^s_\Pi$ at each state s, as defined in Proposition 4. Recall that $\bar\xi^s_\Pi$ was chosen such that for any opponent action $b\in A_O$, the expected valuation at the next state (with a decremented rank) is at least the current valuation. Formally:

$$\sum_{\bar a_\Pi\in \mathrm{Av}_\Pi(s)}\Bigl(\bar\xi^s_\Pi(\bar a_\Pi)\cdot\sum_{t\in S}\delta(s,\bar a_\Pi,b)(t)\cdot u_{\ell(s)-1}(t)\Bigr)\ge u_{\ell(s)}(s).$$

This inequality ensures a local progress condition: at every step, the game either transitions to a state with a higher limit valuation $u_n$, or it maintains the valuation while decreasing the rank $\ell$ (the required number of steps to reach T). Under this strategy, every transition from a state s leads to a distribution over states t that satisfies a progress condition relative to the ranks. Specifically, the opponent is unable to prevent the game from either transitioning to a state with a higher valuation $u_n$, or transitioning to a state with the same valuation but a strictly lower rank $\ell$. Since the ranks are finite natural numbers and $u_0$ is the indicator function of T ($u_0(s)$ equals 1 if and only if $s\in T$, and 0 otherwise), the game cannot remain in a cycle of non-target states with probability 1. More formally, the rank $\ell(s)$ represents the number of steps left to realise the value. The collective strategy $\bar\sigma$ ensures that the team maintains its winning probability while moving to a state with a lower rank. Since the rank is a whole number, it cannot count down forever; it must eventually hit 0.
Because rank 0 corresponds exactly to the target set T (where $u_0(s)=1$), the game cannot get stuck in states outside T; it eventually arrives at the target. This local progress guarantee at each state ensures global convergence to T. The total probability of eventually reaching T is therefore at least $u_n(s)$. Given that $u_n(s)\ge w(s)-\varepsilon/2$, the memoryless collective strategy $\bar\sigma$ satisfies the requirement $\mathbb{P}^{\bar\sigma}_s(\Diamond T)\ge w(s)-\varepsilon$. ∎

By Lemma 3, for every $\varepsilon$ there exists a memoryless strategy for the team players which guarantees value $w(s)-\varepsilon$. This shows the correctness of Theorem 3.1. ∎

0.B.4 Appendix for Theorem 3.2

See 3.2.

Proof
We show this on a three-player example. There are three players: $O, P_1, P_2$. For the player $P_1$, we define the set of actions $A_{P_1}=\{a,b,\mathit{wait}\}$. For O and $P_2$, the action sets are $A_O=A_{P_2}=\{a,b\}$. There are two stages in the game: (1) Waiting stage: this stage is active while $P_1$ chooses the action wait. Based on the action of O, the game moves to $S_a$ or $S_b$. From $S_a$ and $S_b$, for every action profile, the game returns to S. (2) Payoff stage: this stage occurs once $P_1$ chooses an action other than wait. The game then terminates. If both $P_1$ and $P_2$ choose the same action as O, the game finishes at ⊤ (the target state), otherwise at ⊥.

[Figure: the game graph with states $S$, $S_a$, $S_b$, ⊤, ⊥ — transitions labelled $(\mathit{wait},*),a: S\to S_a$; $(\mathit{wait},*),b: S\to S_b$; $(*,*),*: S_a,S_b\to S$; $(a,a),a$ or $(b,b),b$: $S\to\top$; otherwise: $S\to\bot$.]

During the waiting stage, the game remains in the loop $S\to\{S_a,S_b\}\to S$. At state S, the next state depends only on the opponent's action. From $S_a$ and $S_b$, the game always returns to S. During the payoff stage, the game terminates and the reward is determined. Assume the opponent is forced to play memoryless strategies and chooses action a with probability $p_{O,a}$, and action b with probability $1-p_{O,a}$, at state S.
For computing the max-min value, the opponent fixes $p_{O,a}$ after knowing the strategies of the team players.

Case 1: The team players all play memoryless strategies. If $P_1$ chooses the action wait with probability 1, the game never leaves the waiting loop, so the probability of reaching the target will be 0. If $P_1$ chooses wait with probability $\alpha<1$, then at each visit of the play to the state S, the probability of entering the payoff stage is $1-\alpha$. As the following limit is equal to 1:

$$\mathbb{P}(\text{ever enter payoff stage})\;=\;\lim_{k\to\infty}\bigl(1-\alpha^k\bigr)\;=\;1,$$

the game reaches the payoff stage with probability 1, regardless of the value of $\alpha<1$. At the payoff stage, suppose that $P_1$ and $P_2$ choose action a with probabilities $p_{P_1,a}$ and $p_{P_2,a}$ respectively. As $P_1$ chooses wait with probability $\alpha$, we normalise $p_{P_1,a}$ to obtain the actual probability of playing action a at the payoff stage: we divide it by $1-\alpha$ (the probability of playing action a or b at the payoff stage). As a counter-strategy, the opponent will choose $p_{O,a}$ to minimise the team's chance of winning. If the team is more likely to match a than b, O will always play b (and vice versa). To maximise their worst-case guarantee, the team plays a strategy such that the probability of matching a is equal to the probability of matching b. This requires $p_{P_1,a}\cdot p_{P_2,a}=(1-p_{P_1,a})\cdot(1-p_{P_2,a})$. Thus, the best guarantee the team can have under this condition is found when they play uniformly ($p_{P_1,a}=p_{P_2,a}=1/2$). In this case, the probability of reaching ⊤ is

$$p_{O,a}\cdot\tfrac14+(1-p_{O,a})\cdot\tfrac14=\tfrac14.$$

The team players can therefore guarantee reaching ⊤ with probability 1/4, no matter what O does. If the team deviates from playing uniformly, O can switch $p_{O,a}$ to make the probability even lower.
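This max-min computation can be cross-checked numerically. A minimal sketch, assuming the team's payoff-stage mixtures are discretised to a grid (an assumption for illustration only); since the opponent's objective is linear in $p_{O,a}$, her minimum is attained at an endpoint:

```python
# Worst-case probability of matching the opponent, as a function of the
# team's (normalised) payoff-stage mixing probabilities.
grid = [i / 100 for i in range(101)]

def worst_case(p1_a, p2_a):
    """Opponent picks p_{O,a} after seeing the team's mixtures, minimising
    the probability that both team players match her action."""
    match_a = p1_a * p2_a               # both play a when O plays a
    match_b = (1 - p1_a) * (1 - p2_a)   # both play b when O plays b
    # linear in q, so checking the endpoints q = 0 and q = 1 suffices
    return min(q * match_a + (1 - q) * match_b for q in (0.0, 1.0))

best = max(worst_case(p1, p2) for p1 in grid for p2 in grid)
# uniform play attains the optimal guarantee of 1/4
assert abs(worst_case(0.5, 0.5) - 0.25) < 1e-12
assert abs(best - 0.25) < 1e-9
```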
Thus, under memoryless strategies, the team cannot guarantee a value strictly greater than 1/4.

Case 2: The team players play arbitrary strategies. If the team players use memory, they can coordinate their actions based on the history of visited states. Consider the following collective strategy $\bar\sigma_\Pi$:

• In the first round, $P_1$ plays wait. This forces the game to transition to $S_a$ or $S_b$ based on the opponent's action, and then immediately return to S.
• In the second round, the team players observe the history. If the previous state was $S_a$, both $P_1$ and $P_2$ play action a; otherwise, action b.

Since the team players always choose the same action in the second round, the probability of falling into the trap due to choosing different actions is 0. We now calculate the probability of reaching ⊤. Since O is memoryless, she plays action a with the same probability $p_{O,a}$ independently in both the first and second round. The probability of reaching ⊤ is the sum of the probabilities of two disjoint events: the history was $S_a$ and the opponent plays a again, or the history was $S_b$ and the opponent plays b again:

$$\mathbb{P}(\Diamond T)=p_{O,a}\cdot p_{O,a}+(1-p_{O,a})\cdot(1-p_{O,a})=p_{O,a}^2+(1-p_{O,a})^2$$

To find the guaranteed value, we minimise this function with respect to $p_{O,a}$. The function $f(x)=x^2+(1-x)^2$ achieves its global minimum at $x=1/2$:

$$\min_{p_{O,a}}\bigl(p_{O,a}^2+(1-p_{O,a})^2\bigr)=\bigl(\tfrac12\bigr)^2+\bigl(\tfrac12\bigr)^2=\tfrac14+\tfrac14=\tfrac12$$

Thus, the team guarantees reaching probability at least 1/2. Since $1/2>1/4$, there exists a collective strategy with memory that strictly outperforms the best possible memoryless strategy, which completes the proof. ∎

0.B.5 Proof of Theorem 3.3

See 3.3.

We first define an ETR sentence Ψ, then prove its correctness. This automatically shows the membership of the problem in $\exists\mathbb{R}$.

0.B.5.1 Definition of Ψ

Here, we define the formula Ψ. Variables.
We introduce the following existentially quantified variables:

• Strategy variables: for each team player $p_i\in\Pi$, state $s\in S$, and action $a\in \mathrm{Av}_i(s)$, let $x_{s,i,a}\in[0,1]$ denote the probability of playing action a by player $p_i$ at state s.
• Valuation variables: for each state $s\in S$, let $v_s\in[0,1]$ denote the value of the state s (which represents the expected payoff of the game, starting from state s).
• Discount variable: let $\lambda\in(0,1)$ represent the discount factor.

The formula Ψ. The formula is a conjunction of the following constraints:

1. Strategy constraints: the variables x must define a valid probability distribution at every state:

$$\Phi_{\mathrm{strategy}}:=\bigwedge_{s\in S}\bigwedge_{i\in\Pi}\Bigl(\sum_{a\in \mathrm{Av}_i(s)}x_{s,i,a}=1\ \wedge\ \bigwedge_a x_{s,i,a}\ge 0\Bigr)\tag{1}$$

2. Discounted valuation constraints: the values $v_s$ must effectively be a sub-solution to the λ-discounted Bellman equations.

• Target states: if $s\in T$, the value is fixed to 1:

$$\Phi_{\mathrm{target}}:=\bigwedge_{s\in T}v_s=1\tag{2}$$

• Non-target states: if $s\notin T$, the value $v_s$ must be supportable against any action b chosen by the opponent:

$$\Phi_{\mathrm{val}}:=\bigwedge_{s\in S\setminus T}\ \bigwedge_{b\in \mathrm{Av}_O(s)}\ v_s\le\lambda\cdot\sum_{\bar a\in A_\Pi}\Bigl(\prod_{i\in\Pi}x_{s,i,a_i}\Bigr)\cdot\sum_{s'\in S}\delta(s,\bar a,b)(s')\cdot v_{s'}\tag{3}$$

3. Threshold constraint: we require the value at the initial state $s_0$ to strictly exceed t, and λ to lie in $(0,1)$:

$$\Phi_{\mathrm{goal}}:=(0<\lambda<1)\wedge(v_{s_0}>t)\tag{4}$$

The complete formula is defined as:

$$\Psi:=\exists\bar x,\bar v,\lambda:\ \Phi_{\mathrm{strategy}}\wedge\Phi_{\mathrm{target}}\wedge\Phi_{\mathrm{val}}\wedge\Phi_{\mathrm{goal}}$$

0.B.5.2 Correctness of formula Ψ

The formula Ψ involves polynomials of degree at most $|\Pi|+1$, and the number of variables is polynomial in the size of the game G. We establish the equivalence stated in the theorem via the following two lemmas. Thus, the problem lies in the complexity class $\exists\mathbb{R}$.
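The shape of the constraints (1)-(4) can be illustrated with a small checker that validates a candidate witness $(\bar x,\bar v,\lambda)$. A minimal sketch, assuming a hypothetical one-player-team "matching" game and hand-picked witness values — none of these names or numbers come from the paper:

```python
# Hypothetical game: from s0, if the team player's action matches the
# opponent's, the play moves to the target 'top'; otherwise it stays at s0.
ACTIONS = ('a', 'b')

def delta(s, a, b):
    """Successor distribution (dict: state -> probability)."""
    return {'top': 1.0} if a == b else {'s0': 1.0}

def check_psi(x, v, lam, t):
    """Check a candidate witness against constraints (1)-(4)."""
    # (1) x is a probability distribution over the team player's actions
    if abs(sum(x.values()) - 1) > 1e-9 or any(p < 0 for p in x.values()):
        return False
    # (2) target value fixed to 1
    if v['top'] != 1.0:
        return False
    # (3) v is a sub-solution of the lam-discounted Bellman inequality,
    #     for every opponent action b
    for b in ACTIONS:
        expected = sum(x[a] * p * v[s2]
                       for a in ACTIONS
                       for s2, p in delta('s0', a, b).items())
        if v['s0'] > lam * expected + 1e-12:
            return False
    # (4) threshold and discount constraints
    return 0 < lam < 1 and v['s0'] > t

# With a uniform team strategy and lam = 0.9, the discounted value is
# 0.5*lam / (1 - 0.5*lam) ~ 0.818, so v['s0'] = 0.81 is a valid sub-solution.
assert check_psi({'a': 0.5, 'b': 0.5}, {'s0': 0.81, 'top': 1.0}, 0.9, 0.8)
# A deterministic strategy cannot support 0.81 against the mismatching action.
assert not check_psi({'a': 1.0, 'b': 0.0}, {'s0': 0.81, 'top': 1.0}, 0.9, 0.8)
```

A real solver would hand Ψ to an ∃ℝ or SMT backend rather than check fixed witnesses; this sketch only mirrors the constraint structure.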
Lemma 4 (Soundness)
If Ψ is satisfiable, then there exists a collective strategy $\bar\sigma_\Pi$ that guarantees reaching T with a probability strictly greater than t.

Proof
Let $(\bar x,\bar v,\lambda)$ be a satisfying assignment for Ψ. First, the assignment to the variables $\bar x$ defines a valid memoryless collective strategy $\bar\sigma_\Pi$ for the team: the constraints in Ψ check that for every state, the probabilities assigned to actions are non-negative and sum to 1. The variable $v_s$ represents the value of the λ-discounted game starting at state s. Since the assignment satisfies the constraint $v_{s_0}>t$, the expected discounted reward from $s_0$ is greater than t. We now argue that the collective strategy $\bar\sigma_\Pi$ guarantees reaching T with a probability strictly greater than t in the original game. Consider the payoff for any single play of the game:

• If the game reaches T after k steps, the reward in the λ-discounted game is $\lambda^k$. The reward in the original game is 1.
• If the game never reaches T, the reward is 0 in both games.

In all cases, the payoff in the original game is greater than or equal to the payoff in the λ-discounted game. Thus, the expected value (probability of winning) in the reachability game is at least as large as the expected reward in the discounted game. Since the discounted value guaranteed by the collective strategy $\bar\sigma_\Pi$ is greater than t, $\bar\sigma_\Pi$ guarantees reaching T with a probability strictly greater than t. ∎

Lemma 5 (Completeness)
If there exists a collective strategy guaranteeing a reachability probability greater than t, then Ψ is satisfiable.

Proof
Assume the team can guarantee a reachability probability $V^*>t$. By Theorem 3.1, for every $\varepsilon$ with $V^*-t>\varepsilon>0$ there exists a memoryless collective strategy $\bar\sigma^*_\Pi$ that achieves a value $V'>V^*-\varepsilon>t$. Once this strategy is fixed, the game can be seen as a Markov decision process (MDP) where the opponent minimises against the fixed memoryless team strategy.
We briefly recall here that an MDP with a reachability objective is the same as a concurrent game where the size of the team is 0. It is a standard result for finite MDPs that the optimal reachability probability is the limit of the λ-discounted values as $\lambda\to 1^-$ [7]. In other words, the reward of reaching T after k steps is $\lambda^k$, which tends to 1 as $\lambda\to 1^-$; it was shown in [7] that, as this intuition suggests, the expected discounted reward converges to the actual reachability probability as $\lambda\to 1^-$. Let $V^\lambda_{\sigma^*}$ be the value of the game under the collective memoryless strategy $\bar\sigma^*_\Pi$ with discount factor λ. We have $\lim_{\lambda\to 1^-}V^\lambda_{\sigma^*}=V'>t$. By the definition of the limit, there must exist a discount factor $\lambda_0\in(0,1)$ sufficiently close to 1 such that the discounted value $V^{\lambda_0}_{\sigma^*}$ is also strictly greater than t. The tuple $(\sigma^*,V^{\lambda_0}_{\sigma^*},\lambda_0)$ satisfies all constraints in Ψ: $\bar\sigma^*_\Pi$ satisfies $\Phi_{\mathrm{strategy}}$; $V^{\lambda_0}_{\sigma^*}$, being the unique solution of the contraction mapping for discount $\lambda_0$, satisfies $\Phi_{\mathrm{val}}$; and the value exceeds t, satisfying $\Phi_{\mathrm{goal}}$. Thus, the formula Ψ is satisfiable. ∎

0.B.6 Proof of Theorem 3.4

See 3.4.

We reduce the k-clique problem to the threshold problem. In the k-clique problem, we are given a graph G and an integer k, and we have to determine whether there is a set of vertices of size k where every pair is connected by an edge. We construct a game where the team players can win with probability greater than $\frac34$ if and only if such a clique exists.

0.B.6.1 Game definition from the clique problem

Given the graph $G=(V,E)$ and the integer k, we construct a game $G_{G,k}$ as follows.

Players. There are two team players, $P_1$ and $P_2$, and one opponent (O).

States. The game has three states: the starting state s, the target state ⊤, and the trap state ⊥.

Actions.
• For the players $P_1$ and $P_2$, the available actions at the start state are pairs $(i,v)$, where i is an index (an integer between 1 and k) and $v\in V$ is a vertex:

$$A_{P_1}=A_{P_2}=\{(i,v)\mid i\in\{1,2,\dots,k\},\ v\in V\}$$

• O chooses a pair of indices $(i^*,j^*)$ between 1 and k:

$$A_O=\{(i^*,j^*)\mid i^*,j^*\in\{1,2,\dots,k\}\}$$

Transitions. The game starts at state s. The transitions are defined as follows:

1. The index check (loop): first, we check whether the team players choose the same indices as the opponent. Formally, if $P_1$ does not choose index $i^*$ or $P_2$ does not choose index $j^*$, the game stays at state s.
2. The validation (win/lose): if the indices match ($P_1$ chooses $i^*$ and $P_2$ chooses $j^*$), we check the vertices they chose (u and v).
• Consistency: if the opponent chooses the same indices ($i^*=j^*$), the team passes if $u=v$. If $u\ne v$, the game moves to ⊥.
• Edge: if the opponent chooses different indices ($i^*\ne j^*$), the game moves to ⊤ if there is an edge between the chosen vertices ($(u,v)\in E$). If there is no edge, the game moves to ⊥.

If the team passes the check, the game moves to ⊤. Once the game reaches ⊤ or ⊥, it stays there forever. The team wins if the game reaches ⊤. We define the threshold $t=\frac34$. We show that the team can win with probability greater than t if and only if the graph has a clique of size k.

Lemma 6
If the graph G contains a clique of size k, the team has a memoryless strategy to reach ⊤ with probability 1.

Proof
Assume that the graph G has a clique of size k. Let $\{c_1,c_2,\dots,c_k\}$ be the vertices of this clique. We define a memoryless collective strategy $\bar\sigma=(\sigma_{P_1},\sigma_{P_2})$ for the team players in state s as follows:

• Both players $P_1$ and $P_2$ choose an index z from 1 to k uniformly at random.
• If a player chooses index z, they play the action $(z,c_z)$.
Consider that at any step, some action $(i^*,j^*)$ is chosen by the opponent. In each step, the probability that $P_1$ plays $(i^*,c_{i^*})$ is $1/k$, and the probability that $P_2$ plays $(j^*,c_{j^*})$ is $1/k$. Since the players randomise independently, the probability that they simultaneously match the opponent's indices is $1/k^2$. This probability is strictly positive, so the game will eventually leave the loop. When the indices match, we check the transition:

• If $i^*=j^*$, both players play the vertex $c_{i^*}$. Thus $u=v$, and the game moves to ⊤.
• If $i^*\ne j^*$, $P_1$ plays $c_{i^*}$ and $P_2$ plays $c_{j^*}$. Since $c_{i^*}$ and $c_{j^*}$ are distinct vertices in a clique, the edge $(c_{i^*},c_{j^*})$ exists. Thus, the game moves to ⊤.

In both cases, the game moves to ⊤. Since the probability of moving to ⊥ is 0, this strategy guarantees reaching ⊤ with probability 1. ∎

Lemma 7
If the team has a strategy to reach ⊤ with probability greater than 3/4, then the graph G contains a clique of size k.

Proof
We assume that the team players have a memoryless collective strategy $\bar\sigma=(\sigma_{P_1},\sigma_{P_2})$ that guarantees reaching ⊤ with probability strictly greater than 3/4. Note that by Theorem 3.1 such a strategy exists. We show that this implies the existence of a clique of size k. We define $\beta_{P_i}(j)$ as the probability that player $P_i$ chooses the index j:

$$\beta_{P_i}(j)=\sum_{v\in V}\sigma_{P_i}(s)(j,v)$$

For ease of notation, we define $\beta_{P_i}(j,v)$ as the probability of player $P_i$ choosing the pair $(j,v)$; note that $\beta_{P_i}(j,v)=\sigma_{P_i}(s)(j,v)$. First, we argue that for all players $P_i$ and indices $j\in\{1,\dots,k\}$, we must have $\beta_{P_i}(j)>0$. If there exist some $P_i$ and j such that $\beta_{P_i}(j)=0$, the opponent can choose the index pair $(j,j)$. In this case, the team never matches the opponent's chosen indices.
Consequently, the game remains in the starting state s forever. Since the target ⊤ is never reached, the winning probability would be 0, which contradicts the assumption that the winning probability is greater than 3/4. Thus, $\beta_{P_i}(j)>0$ for all i, j. Next, we analyse the conditional probabilities. We define the conditional probability of playing vertex v given index j by player $P_i$ as:

$$\mathbb{P}(v\mid j,P_i)=\frac{\beta_{P_i}(j,v)}{\beta_{P_i}(j)}$$

We claim that for every player $P_i$ and index j, there must exist a dominant vertex v such that $\mathbb{P}(v\mid j,P_i)>1/2$. Suppose, for the sake of contradiction, that for some player $P_1$ and index j no such vertex exists:

$$\max_{v\in V}\frac{\beta_{P_1}(j,v)}{\beta_{P_1}(j)}\le\frac12$$

In this case, suppose the opponent chooses the pair $(j,j)$. The game proceeds to the validation phase only if both $P_1$ and $P_2$ choose index j. The probability of passing the consistency check (where $P_1$ and $P_2$ must play the same vertex v) is:

$$\mathbb{P}(\text{Pass}\mid\text{Match})=\sum_{v\in V}\mathbb{P}(v\mid j,P_1)\cdot\mathbb{P}(v\mid j,P_2)$$

Since we assumed $\mathbb{P}(v\mid j,P_1)\le 1/2$ for every vertex v, we can bound this sum:

$$\sum_{v\in V}\mathbb{P}(v\mid j,P_1)\cdot\mathbb{P}(v\mid j,P_2)\le\sum_{v\in V}\frac12\cdot\mathbb{P}(v\mid j,P_2)=\frac12\sum_{v\in V}\mathbb{P}(v\mid j,P_2)$$

Since the sum of probabilities for $P_2$ is 1, the total probability of passing and going to ⊤ is at most 1/2. This means that whenever the loop ends, the total probability of winning the game is at most 1/2. This contradicts our assumption that the team wins with probability greater than 3/4. Therefore, for every player $P_i$ and every index j, there must be a dominant vertex v where $\mathbb{P}(v\mid j,P_i)>1/2$. Since these dominant vertices exist, we define a specific sequence of vertices for each player:

• For $P_1$, let $(v_1,\dots,v_k)$ be the sequence where $v_j$ is the dominant vertex for index j.
• For $P_2$, let $(u_1,\dots,u_k)$ be the sequence where $u_j$ is the dominant vertex for index j.
Step 1: The players agree on vertices. We argue that $v_j=u_j$ for all j. Suppose they are different ($v_j\ne u_j$). Consider the deterministic strategy of the opponent that always chooses $(j,j)$. Then, the probability that the team players play the specific pair $(v_j,u_j)$, conditioned on matching, is:

$$\mathbb{P}(P_1=v_j,\ P_2=u_j\mid\text{Match})>\frac12\cdot\frac12=\frac14$$

Since $v_j\ne u_j$, the game moves to ⊥. Thus, the probability of going to ⊤ would be less than 3/4, which is a contradiction. Thus, $v_j$ must equal $u_j$; let us call this vertex $c_j$.

Step 2: The vertices $(c_1,\dots,c_k)$ form a clique. We argue that for any two indices $i\ne j$, there is an edge between $c_i$ and $c_j$. Suppose there is no edge (note that the graph is simple and does not have self-loops; thus $c_i$ and $c_j$ may be equal). Consider the deterministic strategy of the opponent that always chooses $(i,j)$. The team then plays the pair $(c_i,c_j)$ with probability greater than 1/4 (since both are dominant vertices). Because the edge is missing, the game moves to ⊥. Again, this implies the winning probability is less than 3/4, which is a contradiction. Thus, the vertices $(c_1,\dots,c_k)$ form a clique. ∎

Combining these two lemmas, we conclude the main theorem: the threshold problem is NP-hard.

Appendix 0.C Appendix for Section 4

See 4.1.

0.C.1 Proof of Theorem 4.1

We start by defining W as the set of states from which the team players can force the game to reach T with probability 1. Our goal is to define a memoryless strategy on W that wins almost-surely. We structure the set W into the sequence $(W_0,W_1,\dots)$ based on the minimum number of steps within which the team can guarantee reaching T with a positive probability without leaving W (later seen as ranks).
In words, $W_{k+1}$ contains the states that are already in $W_k$, in addition to the states in W from which the team can choose a memoryless selector that transitions to $W_k$ with positive probability, and remains in W with probability 1, regardless of the opponent's action. More formally, we define a sequence of sets $W_0,W_1,\dots$ as follows:

• $W_0=T$.
• For $k\ge 0$:

$$W_{k+1}=W_k\cup\Bigl\{s\in W\;\Bigm|\;\exists\bar\sigma_\Pi\in\Lambda_\Pi,\ \forall b\in A_O:\ \Bigl(\sum_{t\in W_k}\sum_{\bar a_\Pi\in A_\Pi}\Bigl(\prod_{i\in\Pi}\bar\sigma_i(s)(a_i)\Bigr)\cdot\delta(s,\bar a,b)(t)>0\Bigr)\ \wedge\ \Bigl(\sum_{t\in W}\sum_{\bar a_\Pi\in A_\Pi}\Bigl(\prod_{i\in\Pi}\bar\sigma_i(s)(a_i)\Bigr)\cdot\delta(s,\bar a,b)(t)=1\Bigr)\Bigr\}$$

We now prove that this construction eventually includes the whole set W, defined as the set of states from which the team has a collective strategy to reach T almost-surely.

Lemma 8
Let W be the set of all almost-sure winning states. The sequence $W_k$ converges to W. That is, there exists m such that $W_m=W$.

Proof
Since the set of states S is finite, the sequence $W_0\subseteq W_1\subseteq\dots$ must eventually stop growing. Let $W^*$ be the limit of this sequence. By definition, $W^*\subseteq W$, because only states from W are added to the set $W_{k+1}$ at each step k. Suppose, for the sake of contradiction, that the set $W^*$ is not equal to W. This means there is a non-empty set of states $R=W\setminus W^*$. From any state in R, the team can win with probability 1. However, to reach T, the game must eventually leave R and enter $W^*$ (since $W^*$ contains T). Since $R\subseteq W$, there exists a collective strategy $\bar\sigma_\Pi$ for the team that guarantees almost-sure reachability from any state $s\in R$. However, by the definition of the limit set $W^*$, for every state $s\in R$, the team cannot force a transition to $W^*$ with positive probability. Therefore, even if the team plays $\bar\sigma_\Pi$, the opponent can force the game to remain outside $W^*$ at every step.
Since $T\subseteq W^*$, this opponent strategy guarantees the game never reaches T, contradicting the assumption that $\bar\sigma_\Pi$ is an almost-sure winning strategy. Therefore, the set R must be empty, which proves that $W^*=W$. ∎

Now, we construct the strategy η based on the sequence $W_i$, which guarantees almost-sure reachability. For every state $s\in W\setminus T$, we define $r_s$ as the smallest index such that $s\in W_{r_s}$; we call $r_s$ the rank of s. Let m be the maximum rank (i.e., $W_m=W$). By the definition of the sequence $W_i$, there exists a memoryless selector for the team at each state s of rank k such that, for any opponent action:

1. the game moves to $W_{k-1}$ with a positive probability on the next turn;
2. the game remains in W with probability 1 on the next turn.

Let η be the strategy that plays this memoryless selector at each state s. We show in the following lemma that for each state $s\in W$, the strategy η guarantees reaching T with probability 1.

Lemma 9
If the team players play the memoryless collective strategy η, then for every state $s\in W$, the probability of reaching T is 1.

Proof
Since W is a finite set and the probability of reducing the rank is strictly positive at each step, there exists a value $\gamma>0$ such that for any state $s\in W$, the probability of reaching T (rank 0) within at most m steps under η is at least γ. Now, let $p_{\neg\mathrm{reach}}(i)$ denote the probability that the strategy η fails to reach T within the first i steps. In the first m steps, the probability of reaching T is at least γ. If the target is not reached, the game remains in W, and the same argument applies to the next m steps. Therefore, for $i=k\cdot m$, this probability is bounded by:

$$p_{\neg\mathrm{reach}}(k\cdot m)\le(1-\gamma)^k$$

As $k\to\infty$, the probability $p_{\neg\mathrm{reach}}(k\cdot m)$ converges to 0. Consequently, the probability of eventually reaching T is:

$$1-\lim_{k\to\infty}p_{\neg\mathrm{reach}}(k\cdot m)=1-\lim_{k\to\infty}(1-\gamma)^k=1.$$

Thus, η is a memoryless almost-sure winning strategy.
∎

Combining Lemma 8 and Lemma 9, we establish the main theorem for this section, which shows that memoryless strategies suffice for the almost-sure problem.

0.C.2 Proof of Theorem 4.2

See 4.2.

We construct a SAT formula Φ that is satisfiable if and only if there exist a winning set W, a ranking $r_s$, and a strategy support that together guarantee almost-sure winning, as follows.

0.C.2.1 Variables

Let $N=|S|$ be the number of states. We define the following Boolean variables:

• Winning set: for each state $s\in S$, a variable $w_s$. This variable will be true if $s\in W$.
• Ranking: for each state $s\in S$ and rank $k\in\{0,\dots,N\}$, a variable $r_{s,k}$. This variable will be true if the rank of s equals k (i.e., k is the smallest index such that $s\in W_k$).
• Strategy support: for each state s, player $P_i\in\Pi$, and action $a\in A_{P_i}$, a variable $u_{s,i,a}$. This variable will be true if the action a is in the support of player $P_i$ at state s (i.e., a is played with positive probability).

0.C.2.2 Constraints

We define the following constraints:

1. Consistency of winning set and ranks. A state s is in the winning set W if and only if it has a rank; furthermore, we force each state to have at most one rank. These two conditions ensure that every state in W has exactly one rank. Target states must have rank 0. We also force $s_0$ to be in W.

$$\Phi_{\mathrm{rank}}:=\bigwedge_{s\in S}\Bigl(w_s\leftrightarrow\bigvee_{k=0}^{N}r_{s,k}\Bigr)\ \wedge\ \bigwedge_{s\in S}\bigwedge_{0\le j<k\le N}\neg(r_{s,j}\wedge r_{s,k})\ \wedge\ \bigwedge_{s\in T}r_{s,0}\ \wedge\ w_{s_0}$$

2. Non-empty strategy support. If a state is in W, every team player must have at least one valid action in their support:

$$\Phi_{\mathrm{support}}:=\bigwedge_{s\in S}\bigwedge_{P_i\in\Pi}\Bigl(w_s\to\bigvee_{a\in A_i}u_{s,i,a}\Bigr)$$

3. Safety in W. The strategy must guarantee that the game never leaves W. If the team plays actions from their supports, then for any opponent action b, all possible next states must be in W.
$$\Phi_{\mathrm{safe}}:=\bigwedge_{s\in S}\ \bigwedge_{\bar a\in\prod_i \mathrm{Av}_i(s)}\ \bigwedge_{b\in A_O}\Bigl(\Bigl(\bigwedge_{P_i\in\Pi}u_{s,i,a_i}\Bigr)\to\bigwedge_{t\in\mathrm{Supp}(\delta(s,\bar a,b))}w_t\Bigr)$$

4. Progress (rank reduction). From $s\in W$ with rank $k>0$, the strategy must have a positive probability of moving to a state with a lower rank. Formally, for every opponent action b, there must be a joint action in the team's support that leads to a state t with rank strictly less than k:

$$\Phi_{\mathrm{prog}}:=\bigwedge_{s\in S}\bigwedge_{k=1}^{N}\bigwedge_{b\in A_O}\Bigl(r_{s,k}\to\bigvee_{\bar a\in\prod_i \mathrm{Av}_i(s)}\Bigl(\Bigl(\bigwedge_{P_i\in\Pi}u_{s,i,a_i}\Bigr)\wedge\bigvee_{t\in\mathrm{Supp}(\delta(s,\bar a,b))}\bigvee_{j=0}^{k-1}r_{t,j}\Bigr)\Bigr)$$

0.C.2.3 Proof of correctness

Based on Section 4.1, we know that if an almost-sure winning strategy exists, there exists a memoryless strategy that guarantees almost-sure reachability. First, we argue in the following lemma that almost-sure reachability depends only on the support of the memoryless strategies, rather than on their precise numeric values. The support of a strategy at a state is defined as the set of actions that are played with positive probability.

Lemma 10
Let σ and σ′ be two memoryless collective strategies for the team. If σ and σ′ have the same support (i.e., for all $s\in S$ and $P_i\in\Pi$, $\{a\in A_i\mid\sigma_i(s)(a)>0\}=\{a\in A_i\mid\sigma'_i(s)(a)>0\}$), then σ guarantees almost-sure reachability of T if and only if σ′ does.

Proof
Fix a memoryless collective strategy σ for the team players. Since σ is fixed, the game reduces to a Markov decision process (MDP) in which only the opponent chooses actions. In this MDP, the opponent's goal is to minimise the probability of reaching T. The transitions are defined by the opponent's action $b\in A_O$ and σ. Specifically, a transition from s to t under action b is possible (has probability $>0$) if and only if the team plays, with positive probability under σ, some joint action $\bar a$ such that $\delta(s,\bar a,b)(t)>0$.
A standard result on MDPs is that the minimising player (the opponent in our setting) can prevent almost-sure reachability if and only if they can force the game into a set of non-target states called an end component [34]. An end component is a set of states in which the opponent can choose actions so as to keep the game inside the set forever; more formally, it is a set of states $EC$ such that for every state $s \in EC$, there exists an opponent action $b$ under which all possible successor states belong to $EC$. The existence of an end component depends only on the graph structure (the set of edges with non-zero probability), not on the exact transition probabilities [34]. Since $\sigma$ and $\sigma'$ have the same support, they make the same transitions available to the opponent, so the graph structures of their induced MDPs coincide. Therefore, the opponent can avoid reaching $T$ with probability 1 under $\sigma$ if and only if they can do so under $\sigma'$. ∎

By Lemma 10, the existence of an almost-sure winning strategy depends only on the support of the strategy. The problem therefore reduces to finding the support of a strategy that guarantees almost-sure reachability.

Lemma 11. The formula $\Phi$ is satisfiable if and only if there exists a memoryless strategy $\eta$ that guarantees reaching the target $T$ from state $s_0$ with probability 1.

Proof. If $\Phi$ is satisfiable, the variables $u_{s,i,a}$ define the support of a memoryless collective strategy $\eta$. The safety constraint ensures that the team stays in $W$, and the progress constraint ensures that from any state in $W$, the game moves to a lower rank with positive probability. As stated in Section 4.1, this guarantees reaching rank 0 (that is, $T$) with probability 1. Conversely, suppose there exists a memoryless collective strategy $\eta$ that wins almost-surely. We construct a satisfying assignment for $\Phi$ as follows:

1. Let $W$ be the set of almost-sure winning states under $\eta$, and set $w_s = \mathrm{true}$ for all $s \in W$.
2. Define the ranks from the layers $W_k$ constructed in Section 4.1: for each $s \in W$, let $k$ be the smallest index such that $s \in W_k$, set $r_{s,k} = \mathrm{true}$, and set all other rank variables for $s$ to false.
3. Set $u_{s,i,a} = \mathrm{true}$ if and only if $\eta_i(s)(a) > 0$.

This assignment satisfies all constraints by construction. The consistency constraints hold because every state in $W$ belongs to exactly one layer $W_k$. The non-empty support constraint holds because $\eta$ is a valid strategy. The safety constraint is satisfied because an almost-sure winning strategy cannot leave the winning set $W$ (otherwise there would be a non-zero probability of never reaching $T$). The progress constraint is satisfied by the definition of the sequence $(W_0, W_1, \dots)$: for any $s \in W_k$ with $k > 0$, the strategy $\eta$ guarantees a transition to $W_{k-1}$ with positive probability against any opponent action. ∎

From Lemma 11 we conclude the main theorem.

0.C.2.4 Binary rank encoding

The encoding presented above uses $O(N)$ unary variables to represent the rank of each state, leading to a total of $O(N^2)$ variables. To improve scalability in the experiments, we replace this with a binary rank encoding: we use $L = O(\log N)$ Boolean variables per state to represent its rank $r_s$ in binary form, reducing the total number of rank variables to $O(N \log N)$. We adapt the progress condition $r_t < r_s$ using a bitwise comparator circuit. Let $\bar{b}_t$ and $\bar{b}_s$ be the binary vectors for the ranks of states $t$ and $s$. We encode the strictly-less-than relation as
$$\mathrm{BSlt}(\bar{b}_t, \bar{b}_s) := \bigvee_{i=0}^{L-1} \Big(\neg b_{t,i} \wedge b_{s,i} \wedge \bigwedge_{j=i+1}^{L-1} (b_{t,j} \leftrightarrow b_{s,j})\Big)$$
This predicate states that at the most significant bit position $i$ where the vectors differ, the bit in $\bar{b}_t$ is 0 and the bit in $\bar{b}_s$ is 1. The new constraint uses $O(\log N)$ literals, providing a $\tilde{O}(N)$-factor improvement in the number of variables.
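As a sanity check on the comparator, the predicate can be evaluated directly in Python. The sketch below is our own illustration, not part of the solver; the function names are ours. With the least significant bit at index 0, $\mathrm{BSlt}$ coincides with integer strictly-less-than:

```python
def bslt(bt, bs):
    """Evaluate BSlt(bt, bs): true iff, at the most significant bit
    position i where the vectors differ, bt has 0 and bs has 1.
    Bit index 0 is the least significant; both vectors have length L."""
    L = len(bt)
    return any(
        (not bt[i]) and bs[i]                             # bits differ at i: t has 0, s has 1
        and all(bt[j] == bs[j] for j in range(i + 1, L))  # all higher bits agree
        for i in range(L)
    )

def to_bits(n, L):
    """Binary vector of n with L bits, least significant bit first."""
    return [bool((n >> j) & 1) for j in range(L)]

# The predicate agrees with the integer order r_t < r_s on all pairs:
assert all(bslt(to_bits(t, 3), to_bits(s, 3)) == (t < s)
           for t in range(8) for s in range(8))
```

In the solver, the same formula is emitted symbolically over the Boolean rank variables rather than evaluated on concrete bits.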
This formulation simplifies the progress constraint significantly. In the unary encoding, $\Phi_{\mathrm{prog}}$ required an explicit disjunction over all possible ranks $k \in \{1, \dots, N\}$; with the binary comparator, this disjunction is replaced by a single symbolic evaluation. We rewrite the progress constraint as follows:
$$\Phi_{\mathrm{prog}} := \bigwedge_{s \in S \setminus T} \bigwedge_{b \in A_O} \Big(w_s \to \bigvee_{\bar{a} \in \prod_i Av_i(s)} \Big(\Big(\bigwedge_{P_i \in \Pi} u_{s,i,a_i}\Big) \wedge \bigvee_{t \in \mathrm{Supp}(\delta(s,\bar{a},b))} \mathrm{BSlt}(\bar{b}_t, \bar{b}_s)\Big)\Big)$$
By eliminating the iteration over $k$, the size of the formula for each transition is reduced by a factor of $\tilde{O}(N)$. We use this optimisation in the experiments.

Appendix 0.D Appendix for Section 6

Theorem 0.D.1. Given a game $G$, if there is a collective strategy $\bar{\sigma}_\Pi$ that ensures that the target set $T$ is avoided with probability 1, then there is a deterministic one.

Proof. Let $S_{\mathrm{safe}}$ be the set of states in the game from which there is a collective strategy that ensures that the target $T$ is avoided with probability 1; clearly, $S_{\mathrm{safe}} \cap T = \emptyset$. We construct a deterministic memoryless collective strategy $\bar{\sigma}^{\mathrm{det}}_\Pi$ for the team on the set $S_{\mathrm{safe}}$ as follows. Let $\bar{\sigma}_\Pi$ be the collective strategy witnessing that a state $s \in S_{\mathrm{safe}}$ is safe. At any state $s \in S_{\mathrm{safe}}$, the collective strategy defines a joint distribution over actions $D_s = \prod_{p \in \Pi} \sigma_p(s)$; informally, this is the distribution proposed by the strategy assuming the play starts at the state $s$. Since this collective strategy ensures safety with probability 1, the support of this distribution must contain actions that also lead to the set $S_{\mathrm{safe}}$. We choose a joint action profile $\bar{a}^* = (a^*_p)_{p \in \Pi} \in \prod_{p \in \Pi} Av_p(s)$ such that $D_s(\bar{a}^*) > 0$. We then define the deterministic memoryless strategy $\bar{\sigma}^{\mathrm{det}}_\Pi$ such that for every player $p \in \Pi$ and state $s \in S_{\mathrm{safe}}$, $\sigma^{\mathrm{det}}_p(s)$ assigns probability 1 to the action $a^*_p$.
We now show that the strategy $\bar{\sigma}^{\mathrm{det}}_\Pi$ ensures that $T$ is avoided with probability 1. Consider any state $s \in S_{\mathrm{safe}}$ and the chosen joint action $\bar{a}^*$. Let $b \in Av_O(s)$ be any action of the opponent, and let $t$ be any state in the support of the transition function $\delta(s, (\bar{a}^*, b))$. We claim that $t \in S_{\mathrm{safe}}$. Suppose, for the sake of contradiction, that $t \notin S_{\mathrm{safe}}$. By the definition of $S_{\mathrm{safe}}$, from $t$ the opponent has a strategy to force the game into $T$ with non-zero probability. However, since $\bar{a}^*$ is played with positive probability under the original safe strategy $\bar{\sigma}_\Pi$, and $t$ is reached with positive probability given $\bar{a}^*$ and $b$, it follows that the original strategy $\bar{\sigma}_\Pi$ allows the game to reach a state ($t$) from which $T$ is reached with positive probability. This implies that the probability of avoiding $T$ from $s$ under $\bar{\sigma}_\Pi$ is strictly less than 1, contradicting the assumption that $s \in S_{\mathrm{safe}}$. Thus, all successors of $s$ under the strategy $\bar{\sigma}^{\mathrm{det}}_\Pi$ (regardless of the opponent's move) remain in $S_{\mathrm{safe}}$. Since the initial state is in $S_{\mathrm{safe}}$ and $S_{\mathrm{safe}} \cap T = \emptyset$, the team avoids $T$ forever with probability 1. Moreover, since the strategy is deterministic, no play of the game under this strategy reaches the target set $T$, not merely a set of plays of measure 1. ∎

Corollary 2. Given a game $G$, if there is a collective strategy $\bar{\sigma}_\Pi$ that ensures that the set $T$ is avoided with probability 1, then there is a deterministic strategy that ensures that all plays compatible with it avoid the set $T$.

Proof. This follows from the fact that all one-step successors of any state in $S_{\mathrm{safe}}$, chosen by the deterministic strategy constructed above, also land in the set $S_{\mathrm{safe}}$. All plays therefore stay in $S_{\mathrm{safe}}$.
∎

Appendix 0.E Appendix for Section 5

0.E.1 Optimisation strategy for local reachability games

This section explains how we heuristically solve the one-shot games arising in the value iteration algorithm. In value iteration, the main challenge is computing the $\mathrm{Pre}$ operator at each step. This operator gives the best probability of winning that the team can guarantee in a single step, assuming the opponent plays optimally. Because the team players randomise independently, we cannot simply use linear programming, the typical approach for solving a zero-sum two-player concurrent game; instead, we must solve a nonlinear optimisation problem, explained below. We use the Sequential Least Squares Programming (SLSQP) algorithm for this problem. Below, we explain the formal structure of the problem and why SLSQP works well for it.

0.E.1.1 Optimisation problem

To find the value $\mathrm{Pre}(u_k)(s)$ at a specific step $k$, we solve a single maximisation problem: we seek probability distributions $p_i$ for the team and a value $v$ such that the team wins with probability at least $v$, no matter what the opponent does. We formulate this as:
$$\begin{aligned} \text{maximise} \quad & v \\ \text{subject to} \quad & \sum_{a \in A_i} p_i(a) = 1, && \forall i \in \Pi \\ & p_i(a) \ge 0, && \forall i \in \Pi, \forall a \in A_i \\ & \sum_{\bar{a} \in A_\Pi} \Big(\prod_{i \in \Pi} p_i(a_i)\Big) \cdot V_{s,k}(\bar{a}, b) \ge v, && \forall b \in A_O \end{aligned}$$
where $V_{s,k}(\bar{a}, b) = \sum_{s' \in S} \delta(s, \bar{a}, b)(s') \cdot u_k(s')$ is the expected next-step value for joint team action $\bar{a}$ and opponent action $b$, based on the previous values of the value iteration algorithm. In this setup, the objective is simple (just maximise $v$), and the constraints on the probabilities are simple linear equations. The only hard part is the last constraint: $V_{s,k}(\bar{a}, b)$ is weighted by the product of the team's probabilities, which makes the constraint a polynomial function. Thus, we cannot solve the problem with a linear programming (LP) solver.
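As an illustration, the maximisation problem above can be handed to an off-the-shelf SLSQP implementation such as `scipy.optimize.minimize`. The sketch below uses a toy instance of our own (not one of the paper's benchmarks): a single state, two team players with two actions each, and an opponent with two actions, where the team wins exactly when both players pick the same action and the opponent picks the other one.

```python
from scipy.optimize import minimize

# V[a1][a2][b]: expected next-step value V_{s,k}(ā, b) for joint team
# action (a1, a2) against opponent action b (toy values, assumed here).
# The team wins (value 1) iff a1 == a2 and the opponent picks the other action.
V = [[[0.0, 1.0],   # (a1, a2) = (0, 0): wins against b = 1
      [0.0, 0.0]],
     [[0.0, 0.0],
      [1.0, 0.0]]]  # (a1, a2) = (1, 1): wins against b = 0

def team_value(x, b):
    # Sum over joint actions of p1(a1) * p2(a2) * V[a1][a2][b], minus v:
    # non-negative iff the team guarantees at least v against opponent action b.
    p1, p2, v = x[0:2], x[2:4], x[4]
    return sum(p1[a1] * p2[a2] * V[a1][a2][b]
               for a1 in range(2) for a2 in range(2)) - v

# Decision vector x = [p1(0), p1(1), p2(0), p2(1), v]; maximise v.
constraints = (
    [{'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1.0},   # p1 is a distribution
     {'type': 'eq', 'fun': lambda x: x[2] + x[3] - 1.0}]   # p2 is a distribution
    + [{'type': 'ineq', 'fun': lambda x, b=b: team_value(x, b)}
       for b in range(2)]                                   # value >= v for all b
)
res = minimize(lambda x: -x[4], x0=[0.5, 0.5, 0.4, 0.6, 0.0],
               method='SLSQP', bounds=[(0.0, 1.0)] * 5,
               constraints=constraints)
p_opt, v_opt = res.x[:4], res.x[4]
```

Analytically, independent uniform mixing is optimal here and the value is $1/4$, whereas a shared coin over the two matching joint actions would achieve $1/2$; the gap is precisely the cost of distributed randomisation that the $\mathrm{Pre}$ operator must account for. Since SLSQP is a local method, on general instances it returns an under-approximation of the true maximum.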
Instead, we solve the optimisation problem with a heuristic approach rather than with slow SMT solvers (the VI-ETR algorithm).

0.E.1.2 Smoothness and differentiability

To use fast optimisation algorithms, we need the problem to be smooth; in mathematical terms, the functions involved should be infinitely differentiable ($C^\infty$). Our problem satisfies this: the constraints are polynomials (sums and products of probabilities), and polynomials are smooth everywhere, so we can compute their gradients and Hessians exactly. This allows us to use advanced algorithms that exploit this curvature information to converge quickly, rather than searching blindly.

0.E.1.3 Algorithm selection: SLSQP

We use the SLSQP algorithm to solve this optimisation problem. While many nonlinear solvers are available, we chose SLSQP for the following specific reasons, motivated by our problem structure and by results from similar fields:

1. Efficient handling of linear constraints. Our basic constraints, that probabilities must sum to 1 and be non-negative, are linear. Unlike penalty-based methods that may explore infeasible regions during intermediate steps, SLSQP exploits the structure of linear constraints to satisfy them exactly at every iteration [15]. Since the geometry of our search space is defined largely by the probability simplex, this ensures that the solver never wastes computational effort evaluating undefined or invalid probability distributions (e.g., negative probabilities, or probabilities that do not sum to 1).
2. Efficiency with sparse strategies. In concurrent games, the best strategies are often simple: players frequently rely on just a small number of actions rather than mixing many options, so the probability of most actions should be exactly zero. SLSQP employs an active-set strategy that is designed to quickly identify and lock onto these zero values.
This contrasts with interior-point methods, which approach zero only gradually and can face numerical precision issues when the optimal solution requires exact zeros [15].
3. Proven performance in dynamic games. SQP-based methods are a standard choice for solving games that evolve over time. For instance, Zhu and Borrelli [41] showed that SQP is much faster than generic solvers at finding solutions to dynamic games, and that it is particularly good at handling the tight interactions between players. This performance improves further when the solver uses exact gradients instead of estimating them, which is the technique used in SLSQP.
4. Sound under-approximation. Since our optimisation problem is non-convex, SLSQP behaves as a local optimiser. This guarantees that the value $v^*$ it returns is a valid under-approximation of the true maximum ($v^* \le \mathrm{Pre}(u_k)(s)$). In the context of formal verification, this is a "safe" error: we might underestimate the team's winning chance, but we will never falsely claim a higher probability than is actually achievable. Our VI-Hybrid algorithm balances this error by using the SLSQP result as a fast, high-quality starting point to help the exact SMT solver find the global optimum.

0.E.2 Benchmarks

In this section, we provide a formal definition of each benchmark in the paper.

0.E.2.1 Pursuit-evasion with rendezvous.

We analyse a variant of the pursuit-evasion game [33] played on a directed graph $G = (V, E)$. The game pits a team of cooperative players $\Pi = \{P_1, \dots, P_k\}$ against a single opponent $O$. A global state is defined by the position vector $s = (l_1, \dots, l_k, l_O)$, and all players have perfect information.

Dynamics. Rounds are concurrent. From a current node $u$, each player independently chooses a next step from the closed neighbourhood $N(u) = \{v \mid (u, v) \in E\} \cup \{u\}$. This allows agents to either move to an adjacent node or wait at their current position.
Players may play mixed strategies over their actions.

Objectives. The team aims to meet at a single node while avoiding the opponent. The conditions are:
1. Rendezvous (win): all team members meet at the same node ($l_1 = \dots = l_k$).
2. Capture (loss): the opponent intercepts any team member ($\exists i$ s.t. $l_i = l_O$).

Capture strictly takes precedence over rendezvous: if the team meets at a node also occupied by the opponent, it counts as a loss. Consequently, the target set $T$ consists only of safe rendezvous states (where $l_1 = \dots = l_k \neq l_O$). The team maximises the probability of reaching $T$, while the opponent minimises it. We evaluated six scenarios, shown in Figure 5; the results are presented in Table 1.

Figure 5: Visualizations of the six scenarios used for evaluation. Scenarios 1–4 involve two team players; Scenarios 5 and 6 involve three team players. Green nodes indicate the team's starting positions, while the red node indicates the opponent's. The results are available in Table 1.

Figure 6: Visualizations of the seven scenarios used for the almost-sure experiment. All scenarios involve two team players. Green nodes indicate the team's starting positions, while the red node indicates the opponent's. The results are available in Table 4.

0.E.2.2 Robot coordination on a grid.

This benchmark is adapted from the robot coordination case study in [26]. The game occurs on an $H \times W$ grid. The cooperative team consists of $k$ robots $\Pi = \{r_1, \dots, r_k\}$, while the opponent $O$ controls the environment.
A state $s = (l_1, \dots, l_k)$ represents the coordinate positions of all robots.

Dynamics. In each round, every robot independently chooses a move from $A = \{N, S, E, W, \mathit{Wait}\}$. Simultaneously, the opponent sets the wind condition from $A_O = \{N, S, E, W, \mathit{Calm}\}$.

Transitions. The game includes a time penalty to motivate the team to reach the target quickly: at the end of each round, with probability $p = 0.5$, the game transitions to a sink loss state. If the game continues, transitions depend on the interaction between each robot's action and the wind. If any two robots occupy the same cell, the team collides and loses immediately. Otherwise, movement occurs as follows:
1. Calm conditions: if the wind is Calm, all robots move deterministically to their chosen target (or remain stationary if they chose Wait).
2. Waiting in wind: if a robot chooses Wait during a storm, the wind pushes it one step in the wind's direction with probability 1.
3. Moving in wind: if a robot attempts to move, we compare its direction to the wind:
• Tailwind: if the directions match, the wind assists the robot, which moves to the target cell with probability 1.
• Headwind or crosswind: if the robot fights the wind, it reaches its target with probability 0.5; with probability 0.5, it is overpowered and pushed one step in the wind's direction.

Objective. The team aims to maximise the probability of reaching a target configuration $T$ without collisions and before the random termination occurs; the opponent minimises this probability. We evaluated four scenarios, shown in Figure 7; the results are presented in Table 2.

Figure 7: The four benchmark scenarios for the robot game. Robots are shown as circles ($R_i$), and the target is marked with $T$.

0.E.2.3 Jamming multi-channel radio systems.
Adapted from [26], this benchmark models $k$ sensors $\Pi = \{x_1, \dots, x_k\}$ transmitting packets over $C$ frequency channels against a jammer $O$. A state is defined by the buffer levels $s = (b_1, \dots, b_k)$. Initially, each sensor $x_i$ holds $B_i$ packets; $b_i \in \{0, \dots, B_i\}$ tracks the remaining packets.

Dynamics. In each round, the sensors independently choose an action from $A = \{1, \dots, C, \mathit{Wait}\}$, while the jammer targets a channel from $A_O = \{1, \dots, C, \mathit{Idle}\}$. The transition rules for sensor $x_i$ on channel $c$ are:
1. Successful transmission: the buffer decrements ($b'_i = b_i - 1$) if and only if the opponent does not jam channel $c$ ($a_O \neq c$) and no other sensor selects $c$ (no collision).
2. Transmission failure: if channel $c$ is jammed by $O$ or selected by another sensor (collision), the team players lose.
3. Wait: if the sensor chooses Wait, it makes no transmission attempt and its buffer remains unchanged ($b'_i = b_i$).

Objective. The team maximises the probability of reaching the target $T = (0, \dots, 0)$, where all buffers are empty, while the opponent minimises this probability. The results for this benchmark are presented in Table 3.

0.E.3 Results

This section contains the full results for each benchmark, presented in Tables 1, 2, 3, and 4 on the following pages.

| Scenario | Team | States | Trans | Algorithm | Ind. Time | Ind. Value | Ind. Iters | Sh. Time | Sh. Value | Sh. Iters |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 27 | 512 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | | VI-ETR | 4.42 | 0.290 | 6 | 0.40 | 0.500 | 2 |
| | | | | VI-OPT | 0.54 | 0.290 | 6 | 0.02 | 0.500 | 2 |
| | | | | VI-Hybrid | 0.58 | 0.290 | 6 | 0.05 | 0.500 | 2 |
| | | | | PRISM | - | - | N/A | 1.58 | 0.500 | N/A |
| 2 | 2 | 64 | 4096 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | | VI-ETR | T.O. | 0.000 | 1 | 3.83 | 0.750 | 2 |
| | | | | VI-OPT | 0.39 | 0.300 | 9 | 0.27 | 0.750 | 2 |
| | | | | VI-Hybrid | T.O. | 0.000 | 1 | 0.56 | 0.750 | 2 |
| | | | | PRISM | - | - | N/A | 1.58 | 0.750 | N/A |
| 3 | 2 | 64 | 729 | ETR-Direct | T.O. | N/F | N/A | 0.48 | 1.000 | N/A |
| | | | | VI-ETR | 14.38 | 0.296 | 6 | 47.41 | 0.990 | 98 |
| | | | | VI-OPT | 0.23 | 0.296 | 6 | 2.27 | 0.990 | 100 |
| | | | | VI-Hybrid | 1.77 | 0.296 | 6 | 6.12 | 0.990 | 100 |
| | | | | PRISM | - | - | N/A | 1.34 | 0.999 | N/A |
| 4 | 2 | 125 | 1331 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | | VI-ETR | 35.21 | 0.348 | 4 | 76.80 | 0.993 | 70 |
| | | | | VI-OPT | 0.27 | 0.348 | 4 | 3.66 | 0.993 | 72 |
| | | | | VI-Hybrid | 2.06 | 0.348 | 4 | 10.19 | 0.993 | 72 |
| | | | | PRISM | - | - | N/A | 1.36 | 0.999 | N/A |
| 5 | 3 | 256 | 65536 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | | VI-ETR | T.O. | 0.000 | 1 | 57.89 | 0.750 | 2 |
| | | | | VI-OPT | 1.98 | 0.075 | 8 | 3.00 | 0.750 | 2 |
| | | | | VI-Hybrid | T.O. | 0.000 | 1 | 15.49 | 0.750 | 2 |
| | | | | PRISM | - | - | N/A | 7.32 | 0.750 | N/A |
| 6 | 3 | 1296 | 38416 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | | VI-ETR | T.O. | 0.125 | 2 | T.O. | 0.976 | 22 |
| | | | | VI-OPT | 4.38 | 0.173 | 3 | 138.64 | 0.993 | 72 |
| | | | | VI-Hybrid | T.O. | 0.125 | 2 | 312.18 | 0.993 | 72 |
| | | | | PRISM | - | - | N/A | 2.29 | 0.999 | N/A |

Table 1: Experimental results for the six scenarios illustrated in Figure 5. For each algorithm, the table lists the execution time in seconds, the computed value, and the number of iterations required for convergence, under individual randomness ("Ind.") and shared randomness ("Sh."). The label "T.O." denotes a timeout after 600 seconds; in cases where an algorithm timed out, we report the value obtained from the final completed iteration. "N/F" indicates Not Found.

| Scenario | Team | States | Trans | Algorithm | Ind. Time | Ind. Value | Ind. Iters | Sh. Time | Sh. Value | Sh. Iters |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 16 | 2000 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | | VI-ETR | 7.86 | 0.293 | 6 | 4.86 | 0.324 | 6 |
| | | | | VI-OPT | 0.56 | 0.274 | 6 | 0.72 | 0.323 | 6 |
| | | | | VI-Hybrid | 5.35 | 0.293 | 6 | 2.27 | 0.324 | 6 |
| | | | | PRISM | - | - | N/A | 1.47 | 0.324 | N/A |
| 2 | 2 | 36 | 4500 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | | VI-ETR | T.O. | 0.094 | 3 | 23.31 | 0.115 | 8 |
| | | | | VI-OPT | 1.33 | 0.113 | 8 | 2.81 | 0.115 | 8 |
| | | | | VI-Hybrid | T.O. | 0.000 | 2 | 13.60 | 0.115 | 8 |
| | | | | PRISM | - | - | N/A | 1.74 | 0.115 | N/A |
| 3 | 2 | 81 | 10125 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | | VI-ETR | T.O. | 0.000 | 3 | 68.28 | 0.038 | 9 |
| | | | | VI-OPT | 3.61 | 0.033 | 9 | 9.54 | 0.036 | 9 |
| | | | | VI-Hybrid | T.O. | 0.000 | 3 | 42.59 | 0.038 | 9 |
| | | | | PRISM | - | - | N/A | 2.56 | 0.038 | N/A |
| 4 | 2 | 144 | 18000 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | | VI-ETR | T.O. | 0.000 | 3 | 143.59 | 0.011 | 10 |
| | | | | VI-OPT | 6.09 | 0.004 | 9 | 17.10 | 0.010 | 10 |
| | | | | VI-Hybrid | T.O. | 0.000 | 3 | 96.53 | 0.011 | 10 |
| | | | | PRISM | - | - | N/A | 3.64 | 0.011 | N/A |

Table 2: Experimental results for the four scenarios illustrated in Figure 7.
For each algorithm, the table lists the execution time in seconds, the computed value, and the number of iterations required for convergence. The label "T.O." denotes a timeout after 600 seconds. In cases where an algorithm timed out, we report the value obtained from the final completed iteration.

| Parameters | States | Trans | Algorithm | Ind. Time | Ind. Value | Ind. Iters | Sh. Time | Sh. Value | Sh. Iters |
|---|---|---|---|---|---|---|---|---|---|
| C=2, B=[1, 1] | 5 | 135 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | VI-ETR | 0.33 | 0.250 | 3 | 0.25 | 0.250 | 3 |
| | | | VI-OPT | 0.62 | 0.246 | 46 | 0.02 | 0.250 | 3 |
| | | | VI-Hybrid | 0.08 | 0.250 | 3 | 0.03 | 0.250 | 3 |
| | | | PRISM | - | - | N/A | 1.38 | 0.250 | N/A |
| C=2, B=[2, 2] | 10 | 270 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | VI-ETR | 1.35 | 0.062 | 5 | 0.96 | 0.062 | 5 |
| | | | VI-OPT | 1.23 | 0.061 | 46 | 0.07 | 0.063 | 5 |
| | | | VI-Hybrid | 0.31 | 0.063 | 5 | 0.14 | 0.063 | 5 |
| | | | PRISM | - | - | N/A | 1.29 | 0.062 | N/A |
| C=3, B=[3, 3] | 17 | 1088 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | VI-ETR | T.O. | 0.000 | 2 | 6.18 | 0.088 | 7 |
| | | | VI-OPT | 2.67 | 0.084 | 39 | 0.50 | 0.088 | 7 |
| | | | VI-Hybrid | T.O. | 0.000 | 1 | 1.15 | 0.088 | 7 |
| | | | PRISM | - | - | N/A | 1.37 | 0.088 | N/A |
| C=3, B=[4, 4] | 26 | 1664 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | VI-ETR | T.O. | 0.000 | 2 | 11.53 | 0.039 | 9 |
| | | | VI-OPT | 4.34 | 0.039 | 39 | 0.83 | 0.039 | 9 |
| | | | VI-Hybrid | T.O. | 0.000 | 1 | 2.31 | 0.039 | 9 |
| | | | PRISM | - | - | N/A | 1.39 | 0.039 | N/A |
| C=4, B=[5, 5] | 37 | 4625 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | VI-ETR | T.O. | 0.000 | 1 | 34.32 | 0.056 | 11 |
| | | | VI-OPT | 8.24 | 0.050 | 40 | 2.62 | 0.056 | 11 |
| | | | VI-Hybrid | T.O. | 0.000 | 1 | 9.41 | 0.056 | 11 |
| | | | PRISM | - | - | N/A | 1.44 | 0.056 | N/A |
| C=4, B=[6, 6] | 50 | 6250 | ETR-Direct | T.O. | N/F | N/A | T.O. | N/F | N/A |
| | | | VI-ETR | T.O. | 0.000 | 1 | 55.06 | 0.032 | 13 |
| | | | VI-OPT | 10.24 | 0.024 | 40 | 4.59 | 0.030 | 14 |
| | | | VI-Hybrid | T.O. | 0.000 | 1 | 17.41 | 0.032 | 13 |
| | | | PRISM | - | - | N/A | 1.54 | 0.032 | N/A |

Table 3: Performance comparison of algorithms on the jamming multi-channel radio system benchmark. Each scenario is defined by the number of channels $C$ and the initial buffer sizes $[B_1, \dots, B_k]$ for $k$ players. We report the number of states, the number of transitions, the execution time in seconds, the computed game value, and the number of iterations for the value iteration algorithms. "T.O." indicates that the algorithm exceeded the 600 s time limit; for instances that timed out, we provide the value from the last completed iteration, where available.

| Scenario | States | Trans | Method | Ind. Time | Ind. Result | Sh. Time | Sh. Result |
|---|---|---|---|---|---|---|---|
| 1 | 27 | 729 | SAT-Direct | 0.02 | UNSAT | 0.02 | UNSAT |
| | | | PRISM-Qual | – | – | 1.62 | UNSAT |
| 2 | 64 | 1728 | SAT-Direct | 0.09 | SAT | 0.09 | SAT |
| | | | PRISM-Qual | – | – | 1.42 | SAT |
| 3 | 125 | 4913 | SAT-Direct | 0.29 | UNSAT | 0.33 | UNSAT |
| | | | PRISM-Qual | – | – | 1.39 | UNSAT |
| 4 | 216 | 5832 | SAT-Direct | 0.67 | SAT | 0.60 | SAT |
| | | | PRISM-Qual | – | – | 1.43 | SAT |
| 5 | 512 | 21952 | SAT-Direct | 5.26 | SAT | 8.46 | SAT |
| | | | PRISM-Qual | – | – | 2.94 | SAT |
| 6 | 1000 | 46656 | SAT-Direct | 20.54 | UNSAT | 19.93 | UNSAT |
| | | | PRISM-Qual | – | – | 5.98 | UNSAT |
| 7 | 1728 | 97336 | SAT-Direct | 93.70 | UNSAT | 58.45 | UNSAT |
| | | | PRISM-Qual | – | – | 42.27 | UNSAT |

Table 4: Experimental results for the almost-sure reachability problem on the pursuit-evasion benchmark. The scenarios are illustrated in Figure 6. For each scenario, the table lists the execution time in seconds and the outcome. We compare our SAT-Direct algorithm (under both individual and shared randomness) against the PRISM-games baseline (which supports only shared randomness). A result of SAT indicates that the target is almost-surely reachable, while UNSAT indicates that it is not.