Hybrid Markov Game Controller Design Algorithms for Nonlinear Systems

Markov games can be used effectively to design controllers for nonlinear systems. This paper presents two novel controller design algorithms that incorporate ideas from the game theory literature to address safety and consistency issues of the 'learned' control strategy. A more widely used approach to controller design is H∞ optimal control, which suffers from high computational demand and may at times be infeasible. We generate an optimal control policy for the agent (controller) via a simple linear program, enabling the controller to learn about the unknown environment it faces. In our formulation, this environment corresponds to the behavior rules of the noise, which is modeled as the opponent. The proposed approaches aim to achieve 'safe-consistent' and 'safe-universally consistent' controller behavior by hybridizing the 'min-max', 'fictitious play', and 'cautious fictitious play' approaches drawn from game theory. We empirically evaluate the approaches on a simulated inverted pendulum swing-up task and compare their performance against standard Q-learning.
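To make the 'fictitious play' ingredient concrete, the following is a minimal sketch of plain fictitious play on a single zero-sum matrix game, written with only the Python standard library. It is not the hybrid algorithm proposed in the paper: there is no state, no Q-function, and no min-max safeguard. The function name `fictitious_play`, the `rounds` parameter, and the matching-pennies payoff matrix are illustrative assumptions. The row player stands in for the controller (maximizer) and the column player for the noise (minimizer); each best-responds to the other's empirical action counts.

```python
def fictitious_play(payoff, rounds=5000):
    """Run simultaneous fictitious play on a zero-sum matrix game.

    payoff[i][j] is the row player's payoff (the row player maximizes,
    the column player minimizes). Returns the empirical action
    frequencies of both players after `rounds` plays.
    Hypothetical helper for illustration only.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols

    # Arbitrary first play: both players start with action 0.
    row_counts[0] += 1
    col_counts[0] += 1

    for _ in range(rounds - 1):
        # Total payoff of each row action against the column player's
        # observed action counts (proportional to the expected payoff).
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Total payoff conceded by each column action against the row
        # player's observed action counts.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Best responses; ties are broken toward the lowest index.
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = min(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    row_freq = [count / rounds for count in row_counts]
    col_freq = [count / rounds for count in col_counts]
    return row_freq, col_freq


# Matching pennies: an assumed toy game whose unique equilibrium mixes
# both actions with probability one half for each player.
MATCHING_PENNIES = [[1, -1], [-1, 1]]

if __name__ == "__main__":
    row_freq, col_freq = fictitious_play(MATCHING_PENNIES)
    print("row frequencies:", row_freq)
    print("column frequencies:", col_freq)
```

In a zero-sum game the empirical frequencies produced this way approach an equilibrium mixed strategy, but the play itself carries no worst-case guarantee in any finite round, which is the kind of gap a min-max safeguard is meant to close.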