Personal Information Classification Based on Deep Learning in Automatic Form Filling System

The rapid development of deep learning has enabled artificial intelligence (AI) to
penetrate many fields and replace manual work there. In particular, AI systems
have become a research focus in office automation. To meet practical needs in
office automation, in this paper we develop an automatic form filling system.
Specifically, it uses two classical neural network models and several word
embedding models to classify relevant personal information collected from the
Internet. When training the neural network models, we use low-noise,
class-balanced training data. We conduct a series of experiments to evaluate our
system, and the results show that it achieves good classification performance.
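
To make the described setup concrete, the listing below is a minimal sketch, not the paper's actual implementation, of one such classical model: a TextCNN-style classifier built on top of a word embedding layer, written in PyTorch. The class name, label count, vocabulary size, and hyperparameters are illustrative assumptions; in practice the embedding layer could be initialised from pretrained word vectors such as word2vec or GloVe.

# Illustrative sketch of a TextCNN-style classifier over word embeddings for
# short personal-information fields. All names and hyperparameters here are
# assumptions for the example, not the system's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_classes,
                 kernel_sizes=(2, 3, 4), num_filters=64):
        super().__init__()
        # Embedding layer; could be initialised with pretrained word vectors.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Convolve with several window sizes and max-pool over time.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))       # (batch, num_classes) logits


if __name__ == "__main__":
    # Toy forward pass: 8 padded sequences of length 20 over a 5,000-word
    # vocabulary, classified into 6 hypothetical information categories.
    model = TextCNN(vocab_size=5000, embed_dim=100, num_classes=6)
    dummy_batch = torch.randint(1, 5000, (8, 20))
    logits = model(dummy_batch)
    print(logits.shape)  # torch.Size([8, 6])

A recurrent model or a different embedding scheme could be substituted for the convolutional blocks without changing the overall pipeline of embedding, encoding, and classification.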



