The Layout Analysis of Handwriting Characters and the Fusion of Multi-style Ancient Books’ Background

Ancient books are important carriers of cultural heritage, and their background textures convey latent historical information. However, multi-style texture recovery for ancient books has received little attention. Because ancient texture samples are scarce and the processing pipeline is complex, generating ancient textures poses new challenges. For instance, training on insufficient data often leads to overfitting or mode collapse, so some of the outputs look unrealistic. Image generation and style transfer based on deep learning have recently been widely applied in computer vision, and breakthroughs in the field make it possible to study multi-style texture recovery of ancient books. We therefore propose a layout analysis and image fusion system. First, we train Deep Convolutional Generative Adversarial Network (DCGAN) models to synthesize multi-style ancient textures. Next, we analyze page layouts with our proposed Position Rearrangement (PR) algorithm, which adjusts the layout structure of the foreground content. Finally, we fuse the rearranged foreground text with the generated background. In the experiments, diverse samples such as ancient Yi, Jurchen, and Seal script were selected as training sets. The performance of successive fine-tuned models was gradually improved by adjusting both the parameters and the structure of the DCGAN model. To evaluate the results objectively, we use the cross-entropy loss function and the Fréchet Inception Distance (FID) as assessment criteria. The resulting model, M8, achieved the lowest FID score. Compared with the DCGAN model proposed by Radford et al., the FID score of M8 improved by 19.26%, a substantial gain in the quality of the synthetic images.
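The abstract does not spell out how the FID comparison is computed, so the following is only a minimal sketch under stated assumptions. It uses the standard Fréchet distance between two Gaussians, simplified here to diagonal covariances (the full FID of Heusel et al. uses full covariance matrices of Inception features and a matrix square root). It also assumes that "improved by" means the relative reduction in FID against the baseline, since a lower FID is better. The function names and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
import math


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Squared Frechet distance between two Gaussians with diagonal covariance.

    Assumption: covariances are diagonal, so the trace term of the full FID
    formula reduces to a per-dimension sum. This is a simplification, not the
    full FID computation.
    """
    if not (len(mu1) == len(var1) == len(mu2) == len(var2)):
        raise ValueError("all inputs must have the same length")
    # Squared Euclidean distance between the two mean vectors.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    # Diagonal form of Tr(S1 + S2 - 2 * sqrt(S1 * S2)).
    cov_term = sum(
        v1 + v2 - 2.0 * math.sqrt(v1 * v2) for v1, v2 in zip(var1, var2)
    )
    return mean_term + cov_term


def relative_improvement(baseline_fid, new_fid):
    """Percentage reduction in FID relative to a baseline (lower is better)."""
    if baseline_fid <= 0:
        raise ValueError("baseline FID must be positive")
    return (baseline_fid - new_fid) / baseline_fid * 100.0
```

For example, with hypothetical scores, `relative_improvement(100.0, 80.0)` yields a 20% improvement, and two identical feature distributions give a distance of zero.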





References:
[1] Li Y, Zheng Y F, Doermann D, et al. “Script-independent text line segmentation in freestyle handwritten documents.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol 30, no. 8, pp. 1313-1329. Aug. 2008.
[2] Yin F, Liu C L. “Handwritten Chinese text line segmentation by clustering with distance metric learning.” Pattern Recognition, vol 42, no. 12, pp. 3146-3157. Dec. 2009.
[3] Li X H, Yin F, Liu C L, “Printed/Handwritten Texts and Graphics Separation in Complex Documents Using Conditional Random Fields,” in Proceedings of the IAPR International Workshop on Document Analysis Systems (DAS), Vienna, Austria, 2018, pp. 145-150.
[4] Simistira F, Bouillon M, Seuret M, et al. “ICDAR2017 Competition on Layout Analysis for Challenging Medieval Manuscripts,” in Proceedings of the IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 2017, pp. 1361-1370.
[5] Wang Y W. “Research and implementation of layout analysis and post-processing for Mongolian document images,” Ph.D. dissertation, Dept. Computer Sci., Inner Mongolia Univ., Inner Mongolia, China, 2017.
[6] Zhang X Q, Ma L L, Duan L J, Liu Y Z, Wu J. “Layout Analysis for Historical Tibetan Documents Based on Convolutional Denoising Autoencoder.” Journal of Chinese Information Processing, vol 32, no. 7, pp. 67-73. Jul. 2018.
[7] Chen X, He J J, Li H J, Wu L X. “Manchu Document Layout Analysis Based on Mask R-CNN.” Journal of Dalian Minzu University, vol 21, no. 3, pp. 240-245. Mar. 2019. DOI: 10.13744/j.cnki.cn21-1431/g4.2019.03.010.
[8] Odena A. “Open Questions about Generative Adversarial Networks.” Distill, Apr. 2019. DOI: 10.23915/distill.00018.
[9] Karras T, Aila T, Laine S, Lehtinen J. “Progressive growing of GANs for improved quality, stability, and variation.” arXiv preprint arXiv:1710.10196, 2018.
[10] Goodfellow I, Pouget-Abadie J, Mirza M, et al., “Generative adversarial nets,” in Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Montréal, Canada, 2014, pp. 2672-2680.
[11] Zhan F N, Zhu H Y, “Spatial Fusion GAN for Image Synthesis,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019, pp. 3648-3657.
[12] Radford A, Metz L, Chintala S, “Unsupervised representation learning with deep convolutional generative adversarial networks,” in Proceedings of the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2016, pp. 2234-2242.
[13] Mirza M, Osindero S. “Conditional Generative Adversarial Nets.” arXiv preprint arXiv:1411.1784, 2014.
[14] Chen X, Duan Y, Houthooft R, et al, “InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets,” in Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Barcelona, Spain, 2016, pp. 2172-2180.
[15] Denton E L, Chintala S, Fergus R, “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks” in Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Montréal, Canada, 2015, pp. 1486-1494.
[16] Chen S X, Wang X L, Han X, Liu Y, Wang M G. “A recognition method of Ancient Yi character based on deep learning.” Journal of Zhejiang University (Science Edition), vol 46, no. 3, pp. 261-269. May 2019. DOI: 10.3785/j.issn.1008-9497.2019.03.001.
[17] Ren J J, Wang N. “Research on Cost Function in Artificial Neural Network.” Journal of Gansu Normal Colleges, vol 23, no. 2, pp. 61-63. Feb. 2018. DOI: 008-9020(2018)02-061-03.
[18] Zhou F, Li Y, Fan X Y. “Improved Loss Calculation Algorithm for Convolutional Neural Networks in Image Classification Application.” Journal of Chinese Computer System, vol 40, no. 7, pp. 1532-1537. Jul. 2019. DOI: 1000-1220(2019)07-1532-06.
[19] Azadi S, Fisher M, Kim V, et al., “Multi-Content GAN for Few-Shot Font Style Transfer,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018, pp. 7564-7573.
[20] Wang Z, Simoncelli E P, Bovik A C, “Multi-scale Structural Similarity for Image Quality Assessment,” in Proceedings of the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, USA, 2003, pp. 1398-1402.
[21] Heusel M, Ramsauer H, Unterthiner T, et al., “GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium,” in Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Long Beach, USA, 2017, pp. 6626-6637.