2025-03-28 | | Total: 5
In modern resource-sharing systems, multiple agents access limited resources with unknown stochastic conditions to perform tasks. When multiple agents access the same resource (arm) simultaneously, they compete for successful usage, leading to contention and reduced rewards. This motivates our study of competitive multi-armed bandit (CMAB) games. In this paper, we study a new N-player K-arm competitive MAB game, where non-myopic players (agents) compete with each other to form diverse private estimations of unknown arms over time. Their possible collisions on same arms and time-varying nature of arm rewards make the policy analysis more involved than existing studies for myopic players. We explicitly analyze the threshold-based structures of social optimum and existing selfish policy, showing that the latter causes prolonged convergence time $\Omega(\frac{K}{\eta^2}\ln({\frac{KN}{\delta}}))$, while socially optimal policy with coordinated communication reduces it to $\mathcal{O}(\frac{K}{N\eta^2}\ln{(\frac{K}{\delta})})$. Based on the comparison, we prove that the competition among selfish players for the best arm can result in an infinite price of anarchy (PoA), indicating an arbitrarily large efficiency loss compared to social optimum. We further prove that no informational (non-monetary) mechanism (including Bayesian persuasion) can reduce the infinite PoA, as the strategic misreporting by non-myopic players undermines such approaches. To address this, we propose a Combined Informational and Side-Payment (CISP) mechanism, which provides socially optimal arm recommendations with proper informational and monetary incentives to players according to their time-varying private beliefs. Our CISP mechanism keeps ex-post budget balanced for social planner and ensures truthful reporting from players, achieving the minimum PoA=1 and same convergence time as social optimum.
Integer programming games (IPGs) are n-person games with integer strategy spaces. These games are used to model non-cooperative combinatorial decision-making and are used in domains such as cybersecurity and transportation. The prevalent solution concept for IPGs, Nash equilibrium, is difficult to compute and even showing whether such an equilibrium exists is known to be Sp2-complete. In this work, we introduce a class of relaxed solution concepts for IPGs called locally optimal integer solutions (LOIS) that are simpler to obtain than pure Nash equilibria. We demonstrate that LOIS are not only faster and more readily scalable in large-scale games but also support desirable features such as equilibrium enumeration and selection. We also show that these solutions can model a broader class of problems including Stackelberg, Stackelberg-Nash, and generalized IPGs. Finally, we provide initial comparative results in a cybersecurity game called the Critical Node game, showing the performance gains of LOIS in comparison to the existing Nash equilibrium solution concept.
Recent policy proposals aim to improve the safety of general-purpose AI, but there is little understanding of the efficacy of different regulatory approaches to AI safety. We present a strategic model that explores the interactions between the regulator, the general-purpose AI technology creators, and domain specialists--those who adapt the AI for specific applications. Our analysis examines how different regulatory measures, targeting different parts of the development chain, affect the outcome of the development process. In particular, we assume AI technology is described by two key attributes: safety and performance. The regulator first sets a minimum safety standard that applies to one or both players, with strict penalties for non-compliance. The general-purpose creator then develops the technology, establishing its initial safety and performance levels. Next, domain specialists refine the AI for their specific use cases, and the resulting revenue is distributed between the specialist and generalist through an ex-ante bargaining process. Our analysis of this game reveals two key insights: First, weak safety regulation imposed only on the domain specialists can backfire. While it might seem logical to regulate use cases (as opposed to the general-purpose technology), our analysis shows that weak regulations targeting domain specialists alone can unintentionally reduce safety. This effect persists across a wide range of settings. Second, in sharp contrast to the previous finding, we observe that stronger, well-placed regulation can in fact benefit all players subjected to it. When regulators impose appropriate safety standards on both AI creators and domain specialists, the regulation functions as a commitment mechanism, leading to safety and performance gains, surpassing what is achieved under no regulation or regulating one player only.
Wireless sensing and the internet of things (IoT) are nowadays pervasive in 5G and beyond networks, and they are expected to play a crucial role in 6G. However, a centralized optimization of a distributed system is not always possible and cost-efficient. In this paper, we analyze a setting in which two sensors collaboratively update a common server seeking to minimize the age of information (AoI) of the latest sample of a common physical process. We consider a distributed and uncoordinated setting where each sensor lacks information about whether the other decides to update the server. This strategic setting is modeled through game theory (GT) and two games are defined: i) a static game of complete information with an incentive mechanism for cooperation, and ii) a repeated game over a finite horizon where the static game is played at each stage. We perform a mathematical analysis of the static game finding three Nash Equilibria (NEs) in pure strategies and one in mixed strategies. A numerical simulation of the repeated game is also presented and novel and valuable insight into the setting is given thanks to the definition of a new metric, the price of delayed updates (PoDU), which shows that the decentralized solution provides results close to the centralized optimum.
This paper tackles the challenge of estimating a low-rank graphon from sampled network data, employing a singular value thresholding (SVT) estimator to create a piecewise-constant graphon based on the network's adjacency matrix. Under certain assumptions about the graphon's structural properties, we establish bounds on the operator norm distance between the true graphon and its estimator, as well as on the rank of the estimated graphon. In the second part of the paper, we apply our estimator to graphon games. We derive bounds on the suboptimality of interventions in the social welfare problem in graphon games when the intervention is based on the estimated graphon. These bounds are expressed in terms of the operator norm of the difference between the true and estimated graphons. We also emphasize the computational benefits of using the low-rank estimated graphon to solve these problems.