Optimizing Dialogue Strategy Learning Using Learning Automata

Statistical modeling of dialogue management behavior is a growing research area in the design of spoken dialogue systems. This paper presents an adaptive learning approach to optimizing dialogue strategy. At the core of our system is a method that formalizes dialogue management as sequential decision making under uncertainty whose underlying probabilistic structure is a Markov chain. Research on automating the design of dialogue management with machine learning has mostly focused on model-free algorithms such as reinforcement learning, but model-free algorithms face a dilemma in balancing exploration against exploitation. We therefore present a model-based online policy learning algorithm that uses interconnected learning automata to optimize the dialogue strategy. The proposed algorithm derives an optimal policy prescribing which action to take in each state of the conversation so as to maximize the expected total reward for reaching the goal, and its updates balance exploration and exploitation to improve the naturalness of human-computer interaction. We test the proposed approach on access to a railway information system, evaluated with the PARADISE framework.
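
The abstract does not spell out the update rule, so the Python sketch below is only a minimal illustration of the interconnected-learning-automata idea: one automaton is attached to each dialogue state and adjusted with the standard linear reward-inaction (L_R-I) scheme. The env object, its reset/step/final_reward interface, and the [0, 1] reward scaling are hypothetical stand-ins for a dialogue simulator, not the paper's actual implementation.

    import random

    class LearningAutomaton:
        """One automaton per dialogue state, holding a probability
        distribution over that state's actions."""

        def __init__(self, n_actions, learning_rate=0.1):
            self.probs = [1.0 / n_actions] * n_actions
            self.lr = learning_rate

        def choose_action(self):
            # Sampling from the probability vector handles exploration
            # and exploitation jointly: no separate epsilon schedule.
            r, cum = random.random(), 0.0
            for a, p in enumerate(self.probs):
                cum += p
                if r < cum:
                    return a
            return len(self.probs) - 1

        def update(self, action, reward):
            # Linear reward-inaction (L_R-I): shift probability mass
            # toward the rewarded action. `reward` is assumed to be
            # scaled to [0, 1]; a zero reward leaves the vector
            # unchanged, and the probabilities always sum to one.
            for a in range(len(self.probs)):
                if a == action:
                    self.probs[a] += self.lr * reward * (1.0 - self.probs[a])
                else:
                    self.probs[a] -= self.lr * reward * self.probs[a]

    def run_episode(automata, env):
        """Run one simulated dialogue and reinforce every visited
        state-action pair with the episode's final reward.
        `env` is a hypothetical simulator: reset() returns the start
        state, step(state, action) returns (next_state, done), and
        final_reward() returns a task-success score in [0, 1]."""
        state = env.reset()
        trajectory, done = [], False
        while not done:
            action = automata[state].choose_action()
            next_state, done = env.step(state, action)
            trajectory.append((state, action))
            state = next_state
        reward = env.final_reward()
        for s, a in trajectory:
            automata[s].update(a, reward)

Because the whole dialogue is reinforced with a single delayed reward (for instance, a task-success score of the kind PARADISE produces), the automata at different states are coupled only through the trajectories they jointly generate, which is what makes the scheme a network of automata rather than a collection of independent bandits.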
