Abstract: Markov games generalize Markov decision processes to a multi-agent setting, and the two-player zero-sum Markov game framework offers an effective platform for designing robust controllers for nonlinear systems. This paper presents two novel controller design algorithms that draw on ideas from the game-theory literature to address safety and consistency issues of the 'learned' control strategy, producing reliable controllers that maintain performance in the presence of noise and parameter variations. A more widely used approach to robust controller design, H∞ optimal control, suffers from high computational demand and may at times be infeasible. Our approach generates an optimal control policy for the agent (the controller) via a simple Linear Program, enabling the controller to learn about the unknown environment it faces; in our formulation, this environment corresponds to the behavior rules of the noise, modeled as the opponent. The proposed controller architectures aim to achieve 'safe-consistent' and 'safe-universally consistent' behavior by gradually mixing the 'fictitious play' and 'cautious fictitious play' approaches drawn from game theory with the 'min-max' Minimax-Q Markov game solution approach, in a reinforcement-learning framework. We empirically evaluate the proposed algorithms on a simulated Inverted Pendulum swing-up task and compare their performance against standard Q-learning.
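As a minimal sketch of the Linear Program step mentioned above: in a two-player zero-sum game, the agent's minimax (maximin) mixed strategy at a state can be found by solving a small LP over its action probabilities, as in Minimax-Q. The payoff matrix and helper name below are illustrative, not taken from the paper.

```python
# Hypothetical sketch: the agent's minimax policy for one state of a
# two-player zero-sum game, obtained via a linear program (as in Minimax-Q).
import numpy as np
from scipy.optimize import linprog

def minimax_policy(Q):
    """Return (policy, value) maximizing the agent's worst-case payoff.

    Q[a, o] is the agent's payoff when it plays action a and the
    opponent (the noise, in the paper's formulation) plays action o.
    LP variables are x = [pi_0, ..., pi_{n-1}, v]; linprog minimizes,
    so we minimize -v to maximize the guaranteed value v.
    """
    n_a, n_o = Q.shape
    c = np.zeros(n_a + 1)
    c[-1] = -1.0                                   # objective: maximize v
    # For every opponent action o:  v - sum_a pi_a * Q[a, o] <= 0
    A_ub = np.hstack([-Q.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # Probabilities sum to 1 (v has coefficient 0 in this constraint)
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]   # pi in [0,1], v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_a], res.x[-1]

# Matching pennies: the minimax policy is uniform and the game value is 0.
policy, value = minimax_policy(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```

For the matching-pennies payoff matrix, the LP returns the uniform mixed strategy with value 0, illustrating why a learned policy may need to randomize to guard against a worst-case opponent.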