Automatic Text Summarization

This work proposes an approach to address automatic text summarization. This approach is a trainable summarizer, which takes into account several features, including sentence position, positive keyword, negative keyword, sentence centrality, sentence resemblance to the title, sentence inclusion of name entity, sentence inclusion of numerical data, sentence relative length, Bushy path of the sentence and aggregated similarity for each sentence to generate summaries. First we investigate the effect of each sentence feature on the summarization task. Then we use all features score function to train genetic algorithm (GA) and mathematical regression (MR) models to obtain a suitable combination of feature weights. The proposed approach performance is measured at several compression rates on a data corpus composed of 100 English religious articles. The results of the proposed approach are promising.

An Integrated Natural Language Processing Approach for Conversation System

The main aim of this research is to investigate a novel technique for implementing a more natural and intelligent conversation system. Conversation systems are designed to converse like a human as much as their intelligent allows. Sometimes, we can think that they are the embodiment of Turing-s vision. It usually to return a predetermined answer in a predetermined order, but conversations abound with uncertainties of various kinds. This research will focus on an integrated natural language processing approach. This approach includes an integrated knowledge-base construction module, a conversation understanding and generator module, and a state manager module. We discuss effectiveness of this approach based on an experiment.

Mining News Sites to Create Special Domain News Collections

We present a method to create special domain collections from news sites. The method only requires a single sample article as a seed. No prior corpus statistics are needed and the method is applicable to multiple languages. We examine various similarity measures and the creation of document collections for English and Japanese. The main contributions are as follows. First, the algorithm can build special domain collections from as little as one sample document. Second, unlike other algorithms it does not require a second “general" corpus to compute statistics. Third, in our testing the algorithm outperformed others in creating collections made up of highly relevant articles.