Evolutionary Training of Hybrid Systems of Recurrent Neural Networks and Hidden Markov Models

We present a hybrid architecture of recurrent neural networks (RNNs) inspired by hidden Markov models (HMMs), and train it with genetic algorithms to learn and represent dynamical systems. The architecture is trained on a set of strings generated by deterministic finite-state automata, and its generalization performance is then evaluated on a new set of strings that were not present in the training data. In this way, we show that the hybrid HMM/RNN system can learn and represent deterministic finite-state automata. We ran experiments with different genetic-algorithm population sizes, and further experiments to determine which weight initializations were best for training. The results show that the hybrid architecture can be trained to represent dynamical systems, with the best training and generalization performance achieved when it is initialized with random real weight values in the range -15 to 15.
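The evolutionary training scheme described above can be illustrated with a minimal sketch. The paper's exact hybrid HMM/RNN architecture, genetic operators, population sizes, and automata are not given in this excerpt, so the network size, the target automaton (an even-parity DFA), and the crossover/mutation operators below are illustrative assumptions; only the random real-valued weight initialization in [-15, 15] is taken from the abstract.

```python
import math
import random

random.seed(0)

# Assumed target DFA for illustration: accepts binary strings
# containing an even number of 1s.
def dfa_accepts(s):
    state = 0
    for ch in s:
        if ch == "1":
            state = 1 - state
    return state == 0

# Training set: all binary strings up to length 4, labeled by the DFA.
def all_strings(max_len):
    out, frontier = [""], [""]
    for _ in range(max_len):
        frontier = [s + b for s in frontier for b in "01"]
        out.extend(frontier)
    return out

data = [(s, dfa_accepts(s)) for s in all_strings(4)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, x))))

# Assumed tiny first-order RNN: H hidden units, one input, one output.
# Genome layout: input->hidden (H), hidden->hidden (H*H),
# hidden bias (H), hidden->output (H), output bias (1).
H = 2
GENOME_LEN = H + H * H + H + H + 1

def rnn_output(w, s):
    wi = w[0:H]
    wr = w[H:H + H * H]
    bh = w[H + H * H:2 * H + H * H]
    wo = w[2 * H + H * H:3 * H + H * H]
    bo = w[-1]
    h = [0.0] * H
    for ch in s:
        x = 1.0 if ch == "1" else 0.0
        h = [sigmoid(wi[i] * x
                     + sum(wr[i * H + j] * h[j] for j in range(H))
                     + bh[i])
             for i in range(H)]
    return sigmoid(sum(wo[i] * h[i] for i in range(H)) + bo)

def fitness(w):
    # Fraction of strings classified correctly (threshold 0.5).
    return sum((rnn_output(w, s) > 0.5) == label for s, label in data) / len(data)

def random_genome():
    # Random real weights in [-15, 15], the initialization the
    # abstract reports as best.
    return [random.uniform(-15.0, 15.0) for _ in range(GENOME_LEN)]

def evolve(pop_size=30, generations=40):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            cut = random.randrange(GENOME_LEN)      # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(GENOME_LEN)        # point mutation
            child[i] += random.gauss(0.0, 1.0)
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
print(f"training accuracy: {fitness(best):.2f}")
```

The key design choice mirrored from the abstract is that the genome is a flat vector of real weights and the search operates directly on it, so no gradient information is needed; population size and initialization range then become the main tuning knobs, which is exactly what the reported experiments vary.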



