Abstract: As a popular rank-reduced vector space approach,
Latent Semantic Indexing (LSI) has been used in information
retrieval and other applications. In this paper, an LSI-based content
vector model for text classification is presented, which constructs
multiple augmented category LSI spaces and classifies text by their
content. The model integrates the class discriminative information
from the training data and is equipped with several pertinent feature
selection and text classification algorithms. The proposed classifier
has been applied to email classification and its experiments on a
benchmark spam testing corpus (PU1) have shown that the approach
represents a competitive alternative to other email classifiers based
on the well-known SVM and naïve Bayes algorithms.
Abstract: In this research, a latent class vector model for pairwise data is formulated. As compared to the basic vector model, this model yields consistent estimates of the parameters since the number of parameters to be estimated does not increase with the number of subjects. The result of the analysis reveals that the model was stable and could classify each subject to the latent classes representing the typical scales used by these subjects.