Automatic Text Summarization

This work proposes an approach to automatic text summarization. The approach is a trainable summarizer that generates summaries by taking into account several features of each sentence: sentence position, positive keywords, negative keywords, sentence centrality, resemblance to the title, inclusion of named entities, inclusion of numerical data, relative length, the bushy path of the sentence, and aggregated similarity. First, we investigate the effect of each sentence feature on the summarization task. Then we use the score function over all features to train genetic algorithm (GA) and mathematical regression (MR) models that find a suitable combination of feature weights. The performance of the proposed approach is measured at several compression rates on a corpus of 100 English religious articles. The results are promising.
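To make the feature-weighting idea concrete, here is a minimal Python sketch of a trainable extractive summarizer, under stated assumptions. Each sentence gets a vector of feature scores in [0, 1], the sentence score is the weighted sum of those features, and the top sentences are kept according to a compression rate (assumed here to mean the fraction of sentences retained). Only five simplified features are implemented (position, title resemblance, numerical data, relative length, centrality), and their exact definitions are assumptions rather than the paper's formulas. The weights are then searched with a toy genetic algorithm whose operators (truncation selection, uniform crossover, clipped Gaussian mutation, elitism) and fitness (precision against a hypothetical reference extract) are also assumptions. All function names are hypothetical.

```python
import math
import random
import re


def tokenize(text):
    """Lowercase word/number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def feature_vectors(sentences, title):
    """Score each sentence on five simplified features, each in [0, 1].

    These definitions are illustrative assumptions, not the paper's exact ones.
    """
    toks = [tokenize(s) for s in sentences]
    title_words = set(tokenize(title))
    longest = max((len(t) for t in toks), default=1) or 1
    vectors = []
    for i, words in enumerate(toks):
        n = len(words) or 1
        others = {w for j, t in enumerate(toks) if j != i for w in t}
        position = max(0.0, (5 - i) / 5)  # first five sentences favoured
        resemblance = len(set(words) & title_words) / (len(title_words) or 1)
        numeric = sum(w.isdigit() for w in words) / n  # share of numeric tokens
        rel_length = len(words) / longest  # length relative to the longest sentence
        centrality = sum(w in others for w in words) / n  # words shared with other sentences
        vectors.append([position, resemblance, numeric, rel_length, centrality])
    return vectors


def summarize(vectors, weights, rate):
    """Return indices of the top ceil(rate * n) sentences, in document order."""
    scores = [sum(w * f for w, f in zip(weights, v)) for v in vectors]
    k = max(1, math.ceil(rate * len(vectors)))
    top = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def fitness(weights, docs, rate):
    """Mean precision of the extracts against reference sentence indices."""
    total = 0.0
    for vectors, reference in docs:
        chosen = summarize(vectors, weights, rate)
        total += len(set(chosen) & set(reference)) / len(chosen)
    return total / len(docs)


def train_weights_ga(docs, rate, n_features=5, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm searching for feature weights in [0, 1]."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: fitness(w, docs, rate), reverse=True)
        next_pop = ranked[:2]  # elitism: the two fittest survive unchanged
        while len(next_pop) < pop_size:
            a, b = rng.sample(ranked[: pop_size // 2], 2)  # parents from the fitter half
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1))) if rng.random() < 0.2 else g
                     for g in child]  # mutate ~20% of genes, clipped to [0, 1]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda w: fitness(w, docs, rate))
    return best, fitness(best, docs, rate)


title = "Genetic algorithms for text summarization"
text = ("Text summarization selects the most important sentences. "
        "Genetic algorithms can learn how to weight sentence features. "
        "The weather was pleasant that day. "
        "In 2007 several summarization studies used 100 articles. "
        "Lunch was served at noon.")
sentences = split_sentences(text)
vectors = feature_vectors(sentences, title)
print(summarize(vectors, [0, 1, 0, 0, 0], rate=0.4))  # title resemblance only -> [0, 1]
weights, score = train_weights_ga([(vectors, [0, 1])], rate=0.4)
```

In this sketch the weights are the only trained component, so the same `feature_vectors` output could instead be fed to a regression model (the MR alternative named above) that fits weights against human sentence judgments; that variant is not shown here.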