A Content Vector Model for Text Classification

As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies text by their content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification and its experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and naïve Bayes algorithms.

A Formulation of the Latent Class Vector Model for Pairwise Data

In this research, a latent class vector model for pairwise data is formulated. As compared to the basic vector model, this model yields consistent estimates of the parameters since the number of parameters to be estimated does not increase with the number of subjects. The result of the analysis reveals that the model was stable and could classify each subject to the latent classes representing the typical scales used by these subjects.