A Sentence-to-Sentence Relation Network for Recognizing Textual Entailment

Over the past decade, there have been promising developments in Natural Language Processing (NLP), with several approaches investigated for Recognizing Textual Entailment (RTE). These include models based on lexical similarity, models based on formal reasoning, and, most recently, deep neural models. In this paper, we present a sentence encoding model that exploits sentence-to-sentence relation information for RTE. In terms of sentence modeling, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) adopt different approaches: RNNs are well suited to sequence modeling, whereas CNNs are suited to extracting n-gram features through their filters and can capture ranges of relations via the pooling mechanism. We combine these strengths of RNNs and CNNs in a unified model for the RTE task. Our model combines relation vectors computed from the phrasal representation of each sentence with relation vectors computed from the final encoded sentence representations. First, we pass each sentence through a convolutional layer to extract a sequence of higher-level phrase representations, from which the first relation vector is computed. Second, the phrasal representation of each sentence from the convolutional layer is fed into a Bidirectional Long Short-Term Memory (Bi-LSTM) network to obtain the final sentence representations, from which a second relation vector is computed. The relation vectors are combined and then used, in the same fashion as an attention mechanism, over the Bi-LSTM outputs to yield the final sentence representations for classification. Experiments on the Stanford Natural Language Inference (SNLI) corpus suggest that this is a promising technique for RTE.
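To make the pipeline concrete, the following PyTorch sketch shows one possible realization of the described architecture. It is an illustrative assumption, not the paper's implementation: the layer sizes, convolution kernel width, mean pooling, the difference/elementwise-product form of the relation vectors, and the attention formulation are all choices made here for clarity.

import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationRTE(nn.Module):
    """CNN phrase extractor + Bi-LSTM encoder with relation-vector attention (sketch)."""

    def __init__(self, vocab_size, embed_dim=300, conv_dim=300,
                 hidden_dim=300, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolutional layer: extracts n-gram (phrase-level) features.
        self.conv = nn.Conv1d(embed_dim, conv_dim, kernel_size=3, padding=1)
        # Bi-LSTM over the phrase representations.
        self.bilstm = nn.LSTM(conv_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Projects the combined relation vector to an attention query.
        self.attn = nn.Linear(2 * conv_dim + 4 * hidden_dim, 2 * hidden_dim)
        self.classifier = nn.Linear(8 * hidden_dim, num_classes)

    @staticmethod
    def relation(a, b):
        # Relation vector between two fixed-size representations:
        # difference and elementwise product, concatenated (an assumption).
        return torch.cat([a - b, a * b], dim=-1)

    def encode(self, tokens):
        x = self.embed(tokens)                                          # (B, T, E)
        phrases = F.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)  # (B, T, C)
        states, _ = self.bilstm(phrases)                                # (B, T, 2H)
        return phrases, states

    def attend(self, states, query):
        # Use the combined relation vector as an attention query over the
        # Bi-LSTM outputs to produce the final sentence representation.
        scores = torch.bmm(states, query.unsqueeze(2)).squeeze(2)       # (B, T)
        weights = F.softmax(scores, dim=1)
        return torch.bmm(weights.unsqueeze(1), states).squeeze(1)       # (B, 2H)

    def forward(self, premise, hypothesis):
        p_phr, p_states = self.encode(premise)
        h_phr, h_states = self.encode(hypothesis)

        # First relation vector from the (mean-pooled) phrase representations.
        r1 = self.relation(p_phr.mean(dim=1), h_phr.mean(dim=1))
        # Second relation vector from the (mean-pooled) Bi-LSTM outputs.
        r2 = self.relation(p_states.mean(dim=1), h_states.mean(dim=1))

        query = self.attn(torch.cat([r1, r2], dim=-1))                  # (B, 2H)
        p_final = self.attend(p_states, query)
        h_final = self.attend(h_states, query)

        features = torch.cat([p_final, h_final,
                              p_final - h_final, p_final * h_final], dim=-1)
        return self.classifier(features)                                # (B, num_classes)


# Example usage: a batch of 2 sentence pairs, each padded to length 12.
# model = RelationRTE(vocab_size=30000)
# logits = model(torch.randint(0, 30000, (2, 12)),
#                torch.randint(0, 30000, (2, 12)))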



