Mining News Sites to Create Special Domain News Collections

Year: 2008 Volume: 2 Issue: 6 Pages: 1842 - 1849

Authors:
David B. Bracewell
Fuji Ren
Shingo Kuroiwa

Abstract: We present a method to create special domain collections from news sites. The method requires only a single sample article as a seed. No prior corpus statistics are needed, and the method is applicable to multiple languages. We examine various similarity measures and the creation of document collections for English and Japanese. The main contributions are as follows. First, the algorithm can build special domain collections from as little as one sample document. Second, unlike other algorithms, it does not require a second “general” corpus to compute statistics. Third, in our testing the algorithm outperformed others in creating collections made up of highly relevant articles.

Keywords:
Information Retrieval
News
Special Domain Collections
