A State Aggregation Approach to Singularly Perturbed Markov Reward Processes

In this paper, we propose a single sample path based algorithm with state aggregation to optimize the average rewards of singularly perturbed Markov reward processes (SPMRPs) with a large scale state spaces. It is assumed that such a reward process depend on a set of parameters. Differing from the other kinds of Markov chain, SPMRPs have their own hierarchical structure. Based on this special structure, our algorithm can alleviate the load in the optimization for performance. Moreover, our method can be applied on line because of its evolution with the sample path simulated. Compared with the original algorithm applied on these problems of general MRPs, a new gradient formula for average reward performance metric in SPMRPs is brought in, which will be proved in Appendix, and then based on these gradients, the schedule of the iteration algorithm is presented, which is based on a single sample path, and eventually a special case in which parameters only dominate the disturbance matrices will be analyzed, and a precise comparison with be displayed between our algorithm with the old ones which is aim to solve these problems in general Markov reward processes. When applied in SPMRPs, our method will approach a fast pace in these cases. Furthermore, to illustrate the practical value of SPMRPs, a simple example in multiple programming in computer systems will be listed and simulated. Corresponding to some practical model, physical meanings of SPMRPs in networks of queues will be clarified.




References:
[1] P. Marbach and J.N. Tsitsiklis "Simulation-based optimization of
Markov reward processes, " IEEE Transactions on Automatic Control,
Vol. 46, No.2, February, 2001, pp.191-209.
[2] G. Yin and Q. Zhang "Singularly perturbed discrete-Time Markov
chains," SIAM Journal Appl. Math. Vol.61, No.3, 2 000, pp 834-854.
[3] M. Abbad and J.A. Filar "Perturbation and Stability Theory for Markov
Control Problems," IEEE Transactions on Automatic Control, Vol.37,
No.9, September, 1992, pp 1415-1420.
[4] M. Abbad and J. A. Filar "Algorithms for singularly perturbed limiting
average Markov Control Problem," IEEE Transactions on Automatic
Control, Vol.37, No.9, September, 1992, pp 1421-1425.
[5] G. Yin Q. Zhang and G. Badowski "Discrete-time singularly perturbed
Markov chain: Aggregation, Occupation measures, and switching
diffusion limit," Adv. Appl. Prob.Vol.35, 2003, pp 449-476.
[6] F. Delebecque "A reduction process for perturbed Markov chains,"
SIAM Journal Appl. Math. Vol. 43, No.2, April 1983.
[7] R.H. Liu, Q. Zhang and G. Yin "Singularly perturbed Markov decision
processes in discrete time," Decision and Control, 2001. Proceedings of
the 40th IEEE Conference on. Vol. 3, 4-7 December. 2001, pp 2119 -
2124.
[8] M. Abbad, J.A. Filar, and T.R. Bielecki, "Singularly perturbed Markov
control problem: limiting average cost," Decision and Control, 1989.
Proceedings of the 28th IEEE Conference on. Vol. 2, 4-7 December.
1989- pp. 1263 - 1266.
[9] Xi-ren Cao, "The potential structure of sample paths and performance
sensitivities of Markov systems The potential structure of sample paths
and performance sensitivities of Markov systems," IEEE Transactions on
Automatic Control, Vol. 49, Issue,12 , December. 2004 pp. 2129 - 2142.
[10] H. Fang and Xi-ren Cao "Potential-based online policy iteration
algorithms for Markov decision processes," IEEE Transactions on
Automatic Control, Vol. 49 , Issue, 4 , April. 2004, pp. 493 - 505.
[11] H. Tang, H. Xi and B. Yin. "Performance optimization of continuous
time Markov control processes based on performance potentials,". Int.
Journal of Systems Science, 2003, Vol. 34(1), pp, 63-71.
[12] D. Zhang, H. Xi and B. Yin "Simulation-based optimization for
singularly perturbed Markov reward process with aggregated states,"
Int. Conference of Intelligence Computing 2005. Accepted.
[13] J. Ledoux "On weak lump ability of denumerable Markov chains,"
Statistics & Probability Letters, Vol. 25, 1995, pp, 329-339.
[14] J.R. Jackson. "Job shop-like queuing systems," Man. Sci. Vol. 9 October,
1963, pp, 131-142.
[15] P.J. Courtois. "On the near-complete-decomposability of networks of
queues and of stochastic models of multiprogramming computing
system". Scientific Rep. CMU-CS-72-11, Carnegie-Mellon U,
November, 1971