Abstract: This work proposes an approach to address automatic
text summarization. This approach is a trainable summarizer, which
takes into account several features, including sentence position,
positive keyword, negative keyword, sentence centrality, sentence
resemblance to the title, sentence inclusion of named entities, sentence
inclusion of numerical data, sentence relative length, bushy path of
the sentence and aggregated similarity for each sentence to generate
summaries. First, we investigate the effect of each sentence feature on
the summarization task. Then we use the score function over all
features to train genetic algorithm (GA) and mathematical regression
(MR) models to obtain a suitable combination of feature weights. The
performance of the proposed approach is measured at several
compression rates on a data corpus composed of 100 English religious
articles.
The results of the proposed approach are promising.
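
As a rough illustration of how a combination of feature weights can drive
sentence selection at a given compression rate, consider the minimal sketch
below. It assumes each sentence already has normalized feature scores and
that the final score is a plain weighted sum; the feature names, the
function names, and the weights are hypothetical and are not taken from
the paper, where the weights would come from the GA or MR training.

```python
from typing import Dict, List


def score_sentence(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature scores (assumed form of the score function)."""
    return sum(weights[name] * value for name, value in features.items())


def summarize(
    sentences: List[str],
    feature_rows: List[Dict[str, float]],
    weights: Dict[str, float],
    compression_rate: float,
) -> List[str]:
    """Keep the top-scoring fraction of sentences, in original order."""
    # Number of sentences to keep; always keep at least one.
    n_keep = max(1, int(len(sentences) * compression_rate))
    scores = [score_sentence(row, weights) for row in feature_rows]
    # Rank sentence indices by score, highest first (ties keep document order).
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Restore document order for the selected sentences.
    return [sentences[i] for i in sorted(ranked[:n_keep])]


# Hypothetical example data with two of the ten features.
example_sentences = ["a", "b", "c", "d"]
example_features = [
    {"position": 1.0, "title_similarity": 0.2},
    {"position": 0.5, "title_similarity": 0.1},
    {"position": 0.3, "title_similarity": 0.9},
    {"position": 0.1, "title_similarity": 0.0},
]
example_weights = {"position": 0.5, "title_similarity": 0.5}
example_summary = summarize(example_sentences, example_features, example_weights, 0.5)
```

With these made-up numbers, a compression rate of 0.5 keeps two of the four
sentences. A learned weight vector would simply replace `example_weights`.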
Abstract: The main aim of this research is to investigate a novel technique for implementing a more natural and intelligent conversation system. Conversation systems are designed to converse like a human as far as their intelligence allows. In some sense, they can be seen as the embodiment of Turing's vision. They usually return predetermined answers in a predetermined order, but conversations abound with uncertainties of various kinds. This research focuses on an integrated natural language processing approach. This approach includes an integrated knowledge-base construction module, a conversation understanding and generation module, and a state manager module. We discuss the effectiveness of this approach based on an experiment.
Abstract: We present a method to create special domain
collections from news sites. The method only requires a single
sample article as a seed. No prior corpus statistics are needed and the
method is applicable to multiple languages. We examine various
similarity measures and the creation of document collections for
English and Japanese. The main contributions are as follows. First,
the algorithm can build special domain collections from as little as
one sample document. Second, unlike other algorithms, it does not
require a second “general” corpus to compute statistics. Third, in our
testing, the algorithm outperformed others in creating collections
made up of highly relevant articles.
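
To make the idea of seeding a collection from one article concrete, here is
a minimal sketch that filters candidate articles by their similarity to the
seed. It assumes cosine similarity over raw term counts, which needs no
corpus-wide statistics; the abstract does not say which similarity measures
were examined, so this choice, the whitespace tokenizer (unsuitable for
Japanese as written), the function names, and the threshold are all
illustrative assumptions.

```python
import math
from collections import Counter
from typing import List


def term_counts(text: str) -> Counter:
    """Raw term frequencies from naive whitespace tokenization (assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(seed: str, candidates: List[str], threshold: float) -> List[str]:
    """Keep candidates whose similarity to the seed meets the threshold."""
    seed_vec = term_counts(seed)
    return [c for c in candidates if cosine(seed_vec, term_counts(c)) >= threshold]


# Hypothetical seed article and candidates.
seed_article = "election results announced"
candidate_articles = ["election results delayed", "football match tonight"]
collection = select_similar(seed_article, candidate_articles, threshold=0.5)
```

Because only the seed and each candidate are compared, no second reference
corpus is involved in this sketch.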