Abstract: Text document categorization involves a large amount
of data, or features. The high dimensionality of the feature space is
troublesome and can degrade classification performance.
Therefore, feature selection is widely considered one of the
crucial parts of text document categorization. Selecting the best
features to represent documents reduces the dimensionality of the
feature space and hence can improve performance. Many
approaches have been implemented by various researchers to
overcome this problem. This paper proposes a novel hybrid approach
for feature selection in text document categorization based on Ant
Colony Optimization (ACO) and Information Gain (IG). We also
present state-of-the-art algorithms by several other researchers.
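As an illustration of the Information Gain half of this hybrid, the following is a minimal sketch of IG-based term ranking. It assumes each document is represented as a set of terms with a single class label, and it does not reproduce the ACO search or the way the paper combines the two; the function names `entropy`, `information_gain`, and `top_k_terms` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Reduction in class entropy from knowing whether `term` occurs.

    `docs` is a list of term sets, `labels` the matching class labels
    (an assumed representation, not one taken from the paper).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k vocabulary terms with the highest information gain."""
    vocabulary = set().union(*docs)
    ranked = sorted(
        vocabulary,
        key=lambda t: information_gain(docs, labels, t),
        reverse=True,
    )
    return ranked[:k]


# Toy corpus: two classes, four documents.
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"vote", "party"},
    {"vote", "team"},
]
labels = ["sport", "sport", "politics", "politics"]
selected = top_k_terms(docs, labels, 2)
```

In this toy corpus, "goal" and "vote" each separate the two classes perfectly and so rank highest, while "team" occurs once in each class and carries no information about the label.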
Abstract: In this paper we propose a new approach for flexible document categorization according to document type, or genre, instead of topic. Our approach implements two homogeneous classifiers: a contextual classifier and a logical classifier. The contextual classifier is based on the document URL, whereas the logical classifier uses the logical structure of the document to perform the categorization. The final categorization is obtained by combining the contextual and logical categorizations. In our approach, each document is assigned to all predefined categories with different membership degrees. Our experiments demonstrate that our approach outperforms other genre categorization approaches.
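The abstract does not state how the two categorizations are combined. Purely as an illustration, the sketch below assumes each classifier returns a membership degree in [0, 1] per category and merges them with a weighted average; the function `combine_memberships`, the weight `alpha`, and the sample values are all hypothetical.

```python
def combine_memberships(contextual, logical, alpha=0.5):
    """Merge two category -> membership-degree mappings.

    Assumption: a weighted average with weight `alpha` on the contextual
    classifier. The combination rule actually used in the paper is not
    given in the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    categories = set(contextual) | set(logical)
    return {
        c: alpha * contextual.get(c, 0.0) + (1.0 - alpha) * logical.get(c, 0.0)
        for c in categories
    }


# Hypothetical membership degrees for one document.
contextual = {"thesis": 0.8, "paper": 0.2}
logical = {"thesis": 0.4, "paper": 0.6}
combined = combine_memberships(contextual, logical)
best_category = max(combined, key=combined.get)
```

Every category keeps a membership degree in the result, so the document remains assigned to all predefined categories rather than to a single winner.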
Abstract: This paper presents a new approach for automatic
document categorization. Exploiting the logical structure of the
document, our approach assigns an HTML document to one or more
categories (thesis, paper, call for papers, email, ...). Using a set of
training documents, our approach generates a set of rules used to
categorize new documents. Flexibility is achieved by associating
each rule with a weight that represents its importance in
discriminating between the possible categories. This weight is
dynamically updated each time a new document is categorized.
Experiments with the proposed approach give satisfactory
results.