2024-10-29 | | Total: 7
Multimedia streaming accounts for the majority of traffic in today's internet. Mechanisms like adaptive bitrate streaming control the bitrate of a stream based on the estimated bandwidth, ideally resulting in smooth playback and a good Quality of Experience (QoE). However, selecting the optimal bitrate is challenging under volatile network conditions. This motivated researchers to train Reinforcement Learning (RL) agents for multimedia streaming. The considered training environments are often simplified, leading to promising results with limited applicability. Additionally, the QoE fairness across multiple streams is seldom considered by recent RL approaches. With this work, we propose a novel multi-agent environment that comprises multiple challenges of fair multimedia streaming: partial observability, multiple objectives, agent heterogeneity and asynchronicity. We provide and analyze baseline approaches across five different traffic classes to gain detailed insights into the behavior of the considered agents, and show that the commonly used Proximal Policy Optimization (PPO) algorithm is outperformed by a simple greedy heuristic. Future work includes the adaptation of multi-agent RL algorithms and further expansions of the environment.
We are interested in studying how heterogeneous agents can learn to communicate and cooperate with each other without being explicitly pre-programmed to do so. Motivated by this goal, we present and analyze a distributed solution to a two-player signaler-responder game which is defined as follows. The signaler agent has a random, exogenous need and can choose from four different strategies: never signal, always signal, signal when need, and signal when no need. The responder agent can choose to either ignore or respond to the signal. We define a reward to both agents when they cooperate to satisfy the signaler's need, and costs associated with communication, response and unmet needs. We identify pure Nash equilibria of the game and the conditions under which they occur. As a solution for this game, we propose two new distributed Bayesian learning algorithms, one for each agent, based on the classic Thompson Sampling policy for multi-armed bandits. These algorithms allow both agents to update beliefs about both the exogenous need and the behavior of the other agent and optimize their own expected reward. We show that by using these policies, the agents are able to intelligently adapt their strategies over multiple iterations to attain efficient, reward-maximizing equilibria under different settings, communicating and cooperating when it is rewarding to do so, and not communicating or cooperating when it is too expensive.
Our interest is in constructing interactive systems involving a human-expert interacting with a machine learning engine on data analysis tasks. This is of relevance when addressing complex problems arising in areas of science, the environment, medicine and so on, which are not immediately amenable to the usual methods of statistical or mathematical modelling. In such situations, it is possible that harnessing human expertise and creativity to modern machine-learning capabilities of identifying patterns by constructing new internal representations of the data may provide some insight to possible solutions. In this paper, we examine the implementation of an abstract protocol developed for interaction between agents, each capable of constructing predictions and explanations. The \PXP protocol, described in [12] is motivated by the notion of ''two-way intelligibility'' and is specified using a pair of communicating finite-state machines. While the formalisation allows the authors to prove several properties about the protocol, no implementation was presented. Here, we address this shortcoming for the case in which one of the agents acts as a ''generator'' using a large language model (LLM) and the other is an agent that acts as a ''tester'' using either a human-expert, or a proxy for a human-expert (for example, a database compiled using human-expertise). We believe these use-cases will be a widely applicable form of interaction for problems of the kind mentioned above. We present an algorithmic description of general-purpose implementation, and conduct preliminary experiments on its use in two different areas (radiology and drug-discovery). The experimental results provide early evidence in support of the protocol's capability of capturing one- and two-way intelligibility in human-LLM in the manner proposed in [12].
Traditional economic models often rely on fixed assumptions about market dynamics, limiting their ability to capture the complexities and stochastic nature of real-world scenarios. However, reality is more complex and includes noise, making traditional models assumptions not met in the market. In this paper, we explore the application of deep reinforcement learning (DRL) to obtain optimal production strategies in microeconomic market environments to overcome the limitations of traditional models. Concretely, we propose a DRL-based approach to obtain an effective policy in competitive markets with multiple producers, each optimizing their production decisions in response to fluctuating demand, supply, prices, subsidies, fixed costs, total production curve, elasticities and other effects contaminated by noise. Our framework enables agents to learn adaptive production policies to several simulations that consistently outperform static and random strategies. As the deep neural networks used by the agents are universal approximators of functions, DRL algorithms can represent in the network complex patterns of data learnt by trial and error that explain the market. Through extensive simulations, we demonstrate how DRL can capture the intricate interplay between production costs, market prices, and competitor behavior, providing insights into optimal decision-making in dynamic economic settings. The results show that agents trained with DRL can strategically adjust production levels to maximize long-term profitability, even in the face of volatile market conditions. We believe that the study bridges the gap between theoretical economic modeling and practical market simulation, illustrating the potential of DRL to revolutionize decision-making in market strategies.
Distributed optimization finds many applications in machine learning, signal processing, and control systems. In these real-world applications, the constraints of communication networks, particularly limited bandwidth, necessitate implementing quantization techniques. In this paper, we propose distributed optimization dynamics over multi-agent networks subject to logarithmically quantized data transmission. Under this condition, data exchange benefits from representing smaller values with more bits and larger values with fewer bits. As compared to uniform quantization, this allows for higher precision in representing near-optimal values and more accuracy of the distributed optimization algorithm. The proposed optimization dynamics comprise a primary state variable converging to the optimizer and an auxiliary variable tracking the objective function's gradient. Our setting accommodates dynamic network topologies, resulting in a hybrid system requiring convergence analysis using matrix perturbation theory and eigenspectrum analysis.
Participatory budgeting (PB) is a voting paradigm for distributing a divisible resource, usually called a budget, among a set of projects by aggregating the preferences of individuals over these projects. It is implemented quite extensively for purposes such as government allocating funds to public projects and funding agencies selecting research proposals to support. This PhD dissertation studies the welfare-related and fairness-related objectives for different PB models. Our contribution lies in proposing and exploring novel PB rules that maximize welfare and promote fairness, as well as, in introducing and investigating a range of novel utility notions, axiomatic properties, and fairness notions, effectively filling the gaps in the existing literature for each PB model. The thesis is divided into two main parts, the first focusing on dichotomous and the second focusing on ordinal preferences. Each part considers two cases: (i) the cost of each project is restricted to a single value and partial funding is not permitted and (ii) the cost of each project is flexible and may assume multiple values.
This paper describes a highly developed personalised recommendation system using multimodal, autonomous, multi-agent systems. The system focuses on the incorporation of futuristic AI tech and LLMs like Gemini-1.5- pro and LLaMA-70B to improve customer service experiences especially within e-commerce. Our approach uses multi agent, multimodal systems to provide best possible recommendations to its users. The system is made up of three agents as a whole. The first agent recommends products appropriate for answering the given question, while the second asks follow-up questions based on images that belong to these recommended products and is followed up with an autonomous search by the third agent. It also features a real-time data fetch, user preferences-based recommendations and is adaptive learning. During complicated queries the application processes with Symphony, and uses the Groq API to answer quickly with low response times. It uses a multimodal way to utilize text and images comprehensively, so as to optimize product recommendation and customer interaction.