Abstract: Large amounts of data on the web can provide valuable information. For example, product reviews help business owners measure customer satisfaction. Sentiment analysis classifies texts into two polarities: positive and negative. This paper examines movie reviews and tweets using a new term weighting scheme, called one-over-sigma (1/sigma), on benchmark datasets for sentiment classification. The proposed method aims to improve the performance of sentiment classification. The results show that 1/sigma is more accurate than the popular term weighting schemes. In order to verify if the entropy reflects the discriminating power of terms, we report a comparison of entropy values for different term weighting schemes.
Abstract: The growth in the volume of text data such as books
and articles in libraries for centuries has imposed to establish
effective mechanisms to locate them. Early techniques such as
abstraction, indexing and the use of classification categories have
marked the birth of a new field of research called "Information
Retrieval". Information Retrieval (IR) can be defined as the task of
defining models and systems whose purpose is to facilitate access to
a set of documents in electronic form (corpus) to allow a user to find
the relevant ones for him, that is to say, the contents which matches
with the information needs of the user. This paper presents a new
semantic indexing approach of a documentary corpus. The indexing
process starts first by a term weighting phase to determine the
importance of these terms in the documents. Then the use of a
thesaurus like Wordnet allows moving to the conceptual level.
Each candidate concept is evaluated by determining its level of
representation of the document, that is to say, the importance of the
concept in relation to other concepts of the document. Finally, the
semantic index is constructed by attaching to each concept of the
ontology, the documents of the corpus in which these concepts are
found.
Abstract: Today, Genetic Algorithm has been used to solve
wide range of optimization problems. Some researches conduct on
applying Genetic Algorithm to text classification, summarization
and information retrieval system in text mining process. This
researches show a better performance due to the nature of Genetic
Algorithm. In this paper a new algorithm for using Genetic
Algorithm in concept weighting and topic identification, based on
concept standard deviation will be explored.