A State Aggregation Approach to Singularly Perturbed Markov Reward Processes
In this paper, we propose a single sample path based
algorithm with state aggregation to optimize the average rewards of
singularly perturbed Markov reward processes (SPMRPs) with a
large scale state spaces. It is assumed that such a reward process
depend on a set of parameters. Differing from the other kinds of
Markov chain, SPMRPs have their own hierarchical structure. Based
on this special structure, our algorithm can alleviate the load in the
optimization for performance. Moreover, our method can be applied
on line because of its evolution with the sample path simulated.
Compared with the original algorithm applied on these problems of
general MRPs, a new gradient formula for average reward
performance metric in SPMRPs is brought in, which will be proved
in Appendix, and then based on these gradients, the schedule of the
iteration algorithm is presented, which is based on a single sample
path, and eventually a special case in which parameters only
dominate the disturbance matrices will be analyzed, and a precise
comparison with be displayed between our algorithm with the old
ones which is aim to solve these problems in general Markov reward
processes. When applied in SPMRPs, our method will approach a fast
pace in these cases. Furthermore, to illustrate the practical value of
SPMRPs, a simple example in multiple programming in computer
systems will be listed and simulated. Corresponding to some practical
model, physical meanings of SPMRPs in networks of queues will be
clarified.
[1] P. Marbach and J.N. Tsitsiklis "Simulation-based optimization of
Markov reward processes, " IEEE Transactions on Automatic Control,
Vol. 46, No.2, February, 2001, pp.191-209.
[2] G. Yin and Q. Zhang "Singularly perturbed discrete-Time Markov
chains," SIAM Journal Appl. Math. Vol.61, No.3, 2 000, pp 834-854.
[3] M. Abbad and J.A. Filar "Perturbation and Stability Theory for Markov
Control Problems," IEEE Transactions on Automatic Control, Vol.37,
No.9, September, 1992, pp 1415-1420.
[4] M. Abbad and J. A. Filar "Algorithms for singularly perturbed limiting
average Markov Control Problem," IEEE Transactions on Automatic
Control, Vol.37, No.9, September, 1992, pp 1421-1425.
[5] G. Yin Q. Zhang and G. Badowski "Discrete-time singularly perturbed
Markov chain: Aggregation, Occupation measures, and switching
diffusion limit," Adv. Appl. Prob.Vol.35, 2003, pp 449-476.
[6] F. Delebecque "A reduction process for perturbed Markov chains,"
SIAM Journal Appl. Math. Vol. 43, No.2, April 1983.
[7] R.H. Liu, Q. Zhang and G. Yin "Singularly perturbed Markov decision
processes in discrete time," Decision and Control, 2001. Proceedings of
the 40th IEEE Conference on. Vol. 3, 4-7 December. 2001, pp 2119 -
2124.
[8] M. Abbad, J.A. Filar, and T.R. Bielecki, "Singularly perturbed Markov
control problem: limiting average cost," Decision and Control, 1989.
Proceedings of the 28th IEEE Conference on. Vol. 2, 4-7 December.
1989- pp. 1263 - 1266.
[9] Xi-ren Cao, "The potential structure of sample paths and performance
sensitivities of Markov systems The potential structure of sample paths
and performance sensitivities of Markov systems," IEEE Transactions on
Automatic Control, Vol. 49, Issue,12 , December. 2004 pp. 2129 - 2142.
[10] H. Fang and Xi-ren Cao "Potential-based online policy iteration
algorithms for Markov decision processes," IEEE Transactions on
Automatic Control, Vol. 49 , Issue, 4 , April. 2004, pp. 493 - 505.
[11] H. Tang, H. Xi and B. Yin. "Performance optimization of continuous
time Markov control processes based on performance potentials,". Int.
Journal of Systems Science, 2003, Vol. 34(1), pp, 63-71.
[12] D. Zhang, H. Xi and B. Yin "Simulation-based optimization for
singularly perturbed Markov reward process with aggregated states,"
Int. Conference of Intelligence Computing 2005. Accepted.
[13] J. Ledoux "On weak lump ability of denumerable Markov chains,"
Statistics & Probability Letters, Vol. 25, 1995, pp, 329-339.
[14] J.R. Jackson. "Job shop-like queuing systems," Man. Sci. Vol. 9 October,
1963, pp, 131-142.
[15] P.J. Courtois. "On the near-complete-decomposability of networks of
queues and of stochastic models of multiprogramming computing
system". Scientific Rep. CMU-CS-72-11, Carnegie-Mellon U,
November, 1971
[1] P. Marbach and J.N. Tsitsiklis "Simulation-based optimization of
Markov reward processes, " IEEE Transactions on Automatic Control,
Vol. 46, No.2, February, 2001, pp.191-209.
[2] G. Yin and Q. Zhang "Singularly perturbed discrete-Time Markov
chains," SIAM Journal Appl. Math. Vol.61, No.3, 2 000, pp 834-854.
[3] M. Abbad and J.A. Filar "Perturbation and Stability Theory for Markov
Control Problems," IEEE Transactions on Automatic Control, Vol.37,
No.9, September, 1992, pp 1415-1420.
[4] M. Abbad and J. A. Filar "Algorithms for singularly perturbed limiting
average Markov Control Problem," IEEE Transactions on Automatic
Control, Vol.37, No.9, September, 1992, pp 1421-1425.
[5] G. Yin Q. Zhang and G. Badowski "Discrete-time singularly perturbed
Markov chain: Aggregation, Occupation measures, and switching
diffusion limit," Adv. Appl. Prob.Vol.35, 2003, pp 449-476.
[6] F. Delebecque "A reduction process for perturbed Markov chains,"
SIAM Journal Appl. Math. Vol. 43, No.2, April 1983.
[7] R.H. Liu, Q. Zhang and G. Yin "Singularly perturbed Markov decision
processes in discrete time," Decision and Control, 2001. Proceedings of
the 40th IEEE Conference on. Vol. 3, 4-7 December. 2001, pp 2119 -
2124.
[8] M. Abbad, J.A. Filar, and T.R. Bielecki, "Singularly perturbed Markov
control problem: limiting average cost," Decision and Control, 1989.
Proceedings of the 28th IEEE Conference on. Vol. 2, 4-7 December.
1989- pp. 1263 - 1266.
[9] Xi-ren Cao, "The potential structure of sample paths and performance
sensitivities of Markov systems The potential structure of sample paths
and performance sensitivities of Markov systems," IEEE Transactions on
Automatic Control, Vol. 49, Issue,12 , December. 2004 pp. 2129 - 2142.
[10] H. Fang and Xi-ren Cao "Potential-based online policy iteration
algorithms for Markov decision processes," IEEE Transactions on
Automatic Control, Vol. 49 , Issue, 4 , April. 2004, pp. 493 - 505.
[11] H. Tang, H. Xi and B. Yin. "Performance optimization of continuous
time Markov control processes based on performance potentials,". Int.
Journal of Systems Science, 2003, Vol. 34(1), pp, 63-71.
[12] D. Zhang, H. Xi and B. Yin "Simulation-based optimization for
singularly perturbed Markov reward process with aggregated states,"
Int. Conference of Intelligence Computing 2005. Accepted.
[13] J. Ledoux "On weak lump ability of denumerable Markov chains,"
Statistics & Probability Letters, Vol. 25, 1995, pp, 329-339.
[14] J.R. Jackson. "Job shop-like queuing systems," Man. Sci. Vol. 9 October,
1963, pp, 131-142.
[15] P.J. Courtois. "On the near-complete-decomposability of networks of
queues and of stochastic models of multiprogramming computing
system". Scientific Rep. CMU-CS-72-11, Carnegie-Mellon U,
November, 1971
@article{"International Journal of Information, Control and Computer Sciences:49256", author = "Dali Zhang and Baoqun Yin and Hongsheng Xi", title = "A State Aggregation Approach to Singularly Perturbed Markov Reward Processes", abstract = "In this paper, we propose a single sample path based
algorithm with state aggregation to optimize the average rewards of
singularly perturbed Markov reward processes (SPMRPs) with a
large scale state spaces. It is assumed that such a reward process
depend on a set of parameters. Differing from the other kinds of
Markov chain, SPMRPs have their own hierarchical structure. Based
on this special structure, our algorithm can alleviate the load in the
optimization for performance. Moreover, our method can be applied
on line because of its evolution with the sample path simulated.
Compared with the original algorithm applied on these problems of
general MRPs, a new gradient formula for average reward
performance metric in SPMRPs is brought in, which will be proved
in Appendix, and then based on these gradients, the schedule of the
iteration algorithm is presented, which is based on a single sample
path, and eventually a special case in which parameters only
dominate the disturbance matrices will be analyzed, and a precise
comparison with be displayed between our algorithm with the old
ones which is aim to solve these problems in general Markov reward
processes. When applied in SPMRPs, our method will approach a fast
pace in these cases. Furthermore, to illustrate the practical value of
SPMRPs, a simple example in multiple programming in computer
systems will be listed and simulated. Corresponding to some practical
model, physical meanings of SPMRPs in networks of queues will be
clarified.", keywords = "Singularly perturbed Markov processes, Gradient of
average reward, Differential reward, State aggregation, Perturbed
close network.", volume = "2", number = "7", pages = "2295-10", }