Abstract: The demands of smart visual thing recognition in various devices have been increased rapidly for daily smart production, living and learning systems in recent years. This paper proposed a visual thing recognition system, which combines binary scale-invariant feature transform (SIFT), bag of words model (BoW), and support vector machine (SVM) by using color information. Since the traditional SIFT features and SVM classifiers only use the gray information, color information is still an important feature for visual thing recognition. With color-based SIFT features and SVM, we can discard unreliable matching pairs and increase the robustness of matching tasks. The experimental results show that the proposed object recognition system with color-assistant SIFT SVM classifier achieves higher recognition rate than that with the traditional gray SIFT and SVM classification in various situations.
Abstract: Traditional document representation for classification
follows Bag of Words (BoW) approach to represent the term weights.
The conventional method uses the Vector Space Model (VSM) to
exploit the statistical information of terms in the documents and they
fail to address the semantic information as well as order of the terms
present in the documents. Although, the phrase based approach
follows the order of the terms present in the documents rather than
semantics behind the word. Therefore, a semantic concept based
approach is used in this paper for enhancing the semantics by
incorporating the ontology information. In this paper a novel method
is proposed to forecast the intraday stock market price directional
movement based on the sentiments from Twitter and money control
news articles. The stock market forecasting is a very difficult and
highly complicated task because it is affected by many factors such
as economic conditions, political events and investor’s sentiment etc.
The stock market series are generally dynamic, nonparametric, noisy
and chaotic by nature. The sentiment analysis along with wisdom of
crowds can automatically compute the collective intelligence of
future performance in many areas like stock market, box office sales
and election outcomes. The proposed method utilizes collective
sentiments for stock market to predict the stock price directional
movements. The collective sentiments in the above social media have
powerful prediction on the stock price directional movements as
up/down by using Granger Causality test.
Abstract: Text similarity measurement is a fundamental issue in
many textual applications such as document clustering, classification,
summarization and question answering. However, prevailing approaches
based on Vector Space Model (VSM) more or less suffer
from the limitation of Bag of Words (BOW), which ignores the semantic
relationship among words. Enriching document representation
with background knowledge from Wikipedia is proven to be an effective
way to solve this problem, but most existing methods still
cannot avoid similar flaws of BOW in a new vector space. In this
paper, we propose a novel text similarity measurement which goes
beyond VSM and can find semantic affinity between documents.
Specifically, it is a unified graph model that exploits Wikipedia as
background knowledge and synthesizes both document representation
and similarity computation. The experimental results on two different
datasets show that our approach significantly improves VSM-based
methods in both text clustering and classification.
Abstract: Recognizing human action from videos is an active
field of research in computer vision and pattern recognition. Human
activity recognition has many potential applications such as video
surveillance, human machine interaction, sport videos retrieval and
robot navigation. Actually, local descriptors and bag of visuals words
models achieve state-of-the-art performance for human action
recognition. The main challenge in features description is how to
represent efficiently the local motion information. Most of the
previous works focus on the extension of 2D local descriptors on 3D
ones to describe local information around every interest point. In this
paper, we propose a new spatio-temporal descriptor based on a spacetime
description of moving points. Our description is focused on an
Accordion representation of video which is well-suited to recognize
human action from 2D local descriptors without the need to 3D
extensions. We use the bag of words approach to represent videos.
We quantify 2D local descriptor describing both temporal and spatial
features with a good compromise between computational complexity
and action recognition rates. We have reached impressive results on
publicly available action data set
Abstract: Naive Bayes Nearest Neighbor (NBNN) and its variants, i,e., local NBNN and the NBNN kernels, are local feature-based classifiers that have achieved impressive performance in image classification. By exploiting instance-to-class (I2C) distances (instance means image/video in image/video classification), they avoid quantization errors of local image descriptors in the bag of words (BoW) model. However, the performances of NBNN, local NBNN and the NBNN kernels have not been validated on video analysis. In this paper, we introduce these three classifiers into human action recognition and conduct comprehensive experiments on the benchmark KTH and the realistic HMDB datasets. The results shows that those I2C based classifiers consistently outperform the SVM classifier with the BoW model.
Abstract: e-mail has become an important means of electronic
communication but the viability of its usage is marred by Unsolicited
Bulk e-mail (UBE) messages. UBE consists of many types
like pornographic, virus infected and 'cry-for-help' messages as well
as fake and fraudulent offers for jobs, winnings and medicines. UBE
poses technical and socio-economic challenges to usage of e-mails.
To meet this challenge and combat this menace, we need to
understand UBE. Towards this end, the current paper presents a
content-based textual analysis of more than 2700 body enhancement
medicinal UBE. Technically, this is an application of Text Parsing
and Tokenization for an un-structured textual document and we
approach it using Bag Of Words (BOW) and Vector Space Document
Model techniques. We have attempted to identify the most
frequently occurring lexis in the UBE documents that advertise
various products for body enhancement. The analysis of such top
100 lexis is also presented. We exhibit the relationship between
occurrence of a word from the identified lexis-set in the given UBE
and the probability that the given UBE will be the one advertising for
fake medicinal product. To the best of our knowledge and survey of
related literature, this is the first formal attempt for identification of
most frequently occurring lexis in such UBE by its textual analysis.
Finally, this is a sincere attempt to bring about alertness against and
mitigate the threat of such luring but fake UBE.
Abstract: The problem of spam has been seriously troubling the Internet community during the last few years and currently reached an alarming scale. Observations made at CERN (European Organization for Nuclear Research located in Geneva, Switzerland) show that spam mails can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier based on a Bag Of Words representation of an email is widely used to stop this unwanted flood as it combines good performance with simplicity of the training and classification processes. However, facing the constantly changing patterns of spam, it is necessary to assure online adaptability of the classifier. This work proposes combining such a classifier with another NBC (naïve Bayesian classifier) based on pairs of adjacent words. Only the latter will be retrained with examples of spam reported by users. Tests are performed on considerable sets of mails both from public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without affecting the classifier precision as it happens when only the NBC based on single words is retrained.
Abstract: e-mail has become an important means of electronic
communication but the viability of its usage is marred by Unsolicited
Bulk e-mail (UBE) messages. UBE consists of many types
like pornographic, virus infected and 'cry-for-help' messages as well
as fake and fraudulent offers for jobs, winnings and medicines. UBE
poses technical and socio-economic challenges to usage of e-mails.
To meet this challenge and combat this menace, we need to
understand UBE. Towards this end, the current paper presents a
content-based textual analysis of nearly 3000 winnings-announcing
UBE. Technically, this is an application of Text Parsing and
Tokenization for an un-structured textual document and we approach
it using Bag Of Words (BOW) and Vector Space Document Model
techniques. We have attempted to identify the most frequently
occurring lexis in the winnings-announcing UBE documents. The
analysis of such top 100 lexis is also presented. We exhibit the
relationship between occurrence of a word from the identified lexisset
in the given UBE and the probability that the given UBE will be
the one announcing fake winnings. To the best of our knowledge and
survey of related literature, this is the first formal attempt for
identification of most frequently occurring lexis in winningsannouncing
UBE by its textual analysis. Finally, this is a sincere
attempt to bring about alertness against and mitigate the threat of
such luring but fake UBE.
Abstract: The state-of-the-art Bag of Words model in Content-
Based Image Retrieval has been used for years but the relevance
feedback strategies for this model are not fully investigated. Inspired
from text retrieval, the Bag of Words model has the ability to use the
wealth of knowledge and practices available in text retrieval. We
study and experiment the relevance feedback model in text retrieval
for adapting it to image retrieval. The experiments show that the
techniques from text retrieval give good results for image retrieval
and that further improvements is possible.