On Dialogue Systems Based on Deep Learning

Dialogue systems are increasingly becoming the way humans access
many computer systems, allowing people to interact with computers
in natural language. A dialogue system consists of three parts:
understanding what humans say in natural language, managing the
dialogue, and generating responses in natural language. In this paper,
we survey deep-learning-based methods for dialogue management,
response generation, and dialogue evaluation. Specifically, these
methods are based on neural networks, long short-term memory
networks, deep reinforcement learning, pre-training, and generative
adversarial networks. We compare these methods and point out
directions for further research.



