Abstract: Recently, online marketplaces in the e-commerce industry, such as Rakuten and Alibaba, have become some of the most popular shopping sites in Asia. On these shopping websites, consumers can select and purchase products from a large number of stores. Additionally, consumers of an e-commerce site have to register their name, age, gender, and other information in advance in order to access their account. Therefore, a method for analyzing consumer preferences from both the store and the product side is required. This study uses the Doc2Vec method, which has been studied in the field of natural language processing. Doc2Vec has been used in many cases to extract semantic relationships between documents and words in the field of document classification. This concept can be applied to the relationship between users and items by treating consumers as documents and products as words; however, the problem is that one more factor (i.e., shops) needs to be considered in Doc2Vec. More precisely, a method for analyzing the relationship between consumers, stores, and products is required. The purpose of our study is to combine the Doc2Vec analysis of users and shops with that of users and items in the same feature space. This method enables the calculation of similar shops and items for each user. In this study, we analyze real data accumulated in an online marketplace and demonstrate the efficiency of the proposed method.
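Once users, shops, and items share one feature space, finding similar shops and items for a user reduces to a nearest-neighbor lookup. The following is a minimal sketch of that lookup using cosine similarity; it does not train Doc2Vec, and the vectors and the names `user`, `shops`, `items`, and `most_similar` are hypothetical placeholders rather than anything taken from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(user_vec, candidates, top_n=2):
    """Return the names of the top_n candidates closest to user_vec."""
    ranked = sorted(
        candidates.items(),
        key=lambda kv: cosine(user_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_n]]

# Hypothetical embeddings, assumed to live in the same feature space.
user = [0.9, 0.1, 0.0]
shops = {"shop_a": [1.0, 0.0, 0.0], "shop_b": [0.0, 1.0, 0.0]}
items = {"item_x": [0.8, 0.2, 0.1], "item_y": [0.0, 0.1, 0.9]}

nearest_shops = most_similar(user, shops, top_n=1)
nearest_items = most_similar(user, items, top_n=1)
```

Because shops and items are compared against the same user vector, one function serves both lookups; that is the practical benefit of a shared space.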
Abstract: Traditional document representation for classification
follows the Bag of Words (BoW) approach to represent the term weights.
Conventional methods use the Vector Space Model (VSM) to
exploit the statistical information of terms in the documents, but they
fail to address the semantic information as well as the order of the terms
present in the documents. The phrase-based approach, in turn,
captures the order of the terms present in the documents but not the
semantics behind the words. Therefore, a semantic concept-based
approach is used in this paper to enhance the semantics by
incorporating ontology information. In this paper, a novel method
is proposed to forecast the intraday stock market price directional
movement based on the sentiments from Twitter and money control
news articles. Stock market forecasting is a very difficult and
highly complicated task because it is affected by many factors, such
as economic conditions, political events, and investor sentiment.
Stock market series are generally dynamic, nonparametric, noisy,
and chaotic by nature. Sentiment analysis, along with the wisdom of
crowds, can automatically compute the collective intelligence about
future performance in many areas, such as the stock market, box office sales,
and election outcomes. The proposed method utilizes collective
sentiments about the stock market to predict stock price directional
movements. A Granger causality test shows that the collective sentiments
in the above social media have strong predictive power for the up/down
direction of stock price movements.
Abstract: Due to the rapid growth of the Internet, web opinion
sources emerge dynamically and are useful to both potential
customers and product manufacturers for prediction and decision
purposes. These sources are user-generated contents written in natural
language as unstructured free text. Therefore, opinion
mining techniques have become popular for automatically processing customer
reviews to extract product features and the user opinions expressed
about them. Since customer reviews may contain both opinionated and
factual sentences, a supervised machine learning technique is applied
for subjectivity classification to improve the mining performance. In
this paper, we dedicate our work to the task of opinion
summarization. Product feature and opinion extraction is
critical to opinion summarization, because its effectiveness
significantly affects the identification of semantic relationships. The
polarity and numeric score of all the features are determined by the
SentiWordNet lexicon. The problem of opinion summarization
refers to how to relate the opinion words to a certain
feature. A probabilistic supervised learning model will
improve the result, making it more flexible and effective.
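The lexicon-based scoring step described above can be illustrated with a small sketch. It assumes the (feature, opinion word) pairs have already been extracted from the reviews, and it replaces SentiWordNet with a tiny hand-made dictionary; `LEXICON`, its scores, and `summarize` are hypothetical and do not reproduce the paper's method or the real SentiWordNet values.

```python
from collections import defaultdict

# Hypothetical stand-in for a sentiment lexicon: opinion word -> score in [-1, 1].
LEXICON = {"great": 0.75, "sharp": 0.5, "poor": -0.625, "slow": -0.5}

def summarize(pairs, lexicon=LEXICON):
    """Aggregate opinion scores per product feature.

    pairs: iterable of (feature, opinion_word) tuples, assumed pre-extracted.
    Returns {feature: (polarity_label, numeric_score)}.
    """
    totals = defaultdict(float)
    for feature, opinion in pairs:
        # Words missing from the lexicon contribute a neutral score of 0.
        totals[feature] += lexicon.get(opinion.lower(), 0.0)

    summary = {}
    for feature, score in totals.items():
        if score > 0:
            polarity = "positive"
        elif score < 0:
            polarity = "negative"
        else:
            polarity = "neutral"
        summary[feature] = (polarity, score)
    return summary

pairs = [
    ("screen", "sharp"),
    ("screen", "great"),
    ("battery", "poor"),
    ("camera", "okay"),  # not in the lexicon, so it stays neutral
]
result = summarize(pairs)
```

The hard part, relating each opinion word to the right feature, is exactly what this sketch takes as given; a learned model would supply the `pairs` input.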
Abstract: Due to the large amount of information in the World
Wide Web (WWW, web) and the lengthy and usually linearly
ordered result lists of web search engines that do not indicate
semantic relationships between their entries, the search for topically
similar and related documents can become a tedious task. In particular,
the process of formulating queries with proper terms representing
specific information needs requires much effort from the user. This
problem gets even bigger when the user's knowledge of a subject and
its technical terms is not sufficient to do so. This article
presents the new and interactive search application DocAnalyser that
addresses this problem by enabling users to find similar and related
web documents based on automatic query formulation and state-of-the-art
search word extraction. Additionally, this tool can be used to
track topics across semantically connected web documents.
Abstract: Search is the most obvious application of information
retrieval. The variety of widely obtainable biomedical data is
enormous and is expanding fast. As a result, the existing
techniques are no longer sufficient to extract the most interesting patterns
from the collection according to user requirements. Recent research
concentrates more on semantic-based search than on traditional
term-based search. Algorithms for semantic search are
implemented based on the relations that exist between the words of the
documents. Ontologies are used as domain knowledge for identifying
the semantic relations as well as to structure the data for effective
information retrieval. Annotation of data with concepts of ontology is
one of the wide-ranging practices for clustering documents. In
this paper, concept-based and annotation-based indexing are proposed
for clustering biomedical documents. The fuzzy c-means (FCM)
clustering algorithm is used to cluster the documents. The
performance of the proposed methods is compared with traditional
term-based clustering for PubMed articles in five different disease
communities. The experimental results show that the proposed
methods outperform term-based fuzzy clustering.
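For readers unfamiliar with fuzzy c-means, the following is a minimal sketch of the standard algorithm in plain Python. It is not the authors' implementation: the function name `fuzzy_c_means`, its parameters, and the toy 2-D `points` (standing in for concept-indexed document vectors) are all assumptions made for illustration.

```python
import math
import random

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM: returns (centers, memberships).

    m is the fuzzifier (> 1); memberships[i][j] is the degree to which
    point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([w / total for w in row])

    centers = []
    for _ in range(n_iter):
        # Update centers as membership-weighted means (weights are u^m).
        centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )

        # Update memberships from relative distances to the centers.
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # Point coincides with a center: assign it fully there.
                zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([z / sum(zeros) for z in zeros])
            else:
                new_u.append(
                    [1.0 / sum((dj / dk) ** exponent for dk in dists)
                     for dj in dists]
                )

        change = max(abs(a - b) for ra, rb in zip(u, new_u)
                     for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return centers, u

# Toy data: two well-separated groups of hypothetical document vectors.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
          [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
labels = [row.index(max(row)) for row in memberships]
```

Unlike hard clustering, every document keeps a graded membership in every cluster; taking the maximum, as `labels` does, is only one way to read the result.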
Abstract: Chinese idioms are a type of traditional Chinese idiomatic
expression with specific meanings and stereotyped structures,
which are widely used in classical Chinese and are still common in
vernacular written and spoken Chinese today. Currently, Chinese
idioms are retrieved from glossaries by key character or key word through
a morphology or pronunciation index, which cannot meet the need for
semantic search. OCIRS is proposed to search for the desired
idiom when users know only its meaning, without any key
character or key word. The user's request, given as a sentence or phrase, is
grammatically analyzed in advance by word segmentation, key
word extraction and semantic similarity computation, so that it can be
mapped to the idiom domain ontology, which is constructed to provide
ample semantic relations and to facilitate description logics-based
reasoning for idiom retrieval. The experimental evaluation shows that
OCIRS realizes the function of searching idioms via semantics, achieving
preliminary results that meet users' requests.
Abstract: Question answering (QA) aims at retrieving precise information from a large collection of documents. Most question answering systems are composed of three main modules: question processing, document processing, and answer processing. The question processing module plays an important role in QA systems by reformulating questions. Moreover, the answer processing module is an emerging topic in QA systems, where these systems are often required to rank and validate candidate answers. These techniques, which aim at finding short and precise answers, are often based on semantic relations and co-occurring keywords. This paper discusses a new model for question answering that improves two main modules, question processing and answer processing, both of which affect the evaluation of the system's operation. Two important components form the basis of question processing. The first is question classification, which specifies the types of question and answer. The second is reformulation, which converts the user's question into a question that the QA system can understand in a specific domain. The objective of the answer validation task is to judge the correctness of an answer returned by a QA system according to the text snippet given to support it. To validate answers, we apply candidate answer filtering and candidate answer ranking, followed by a final validation step based on user voting. This paper also describes a new architecture for the question and answer processing modules, covering the modeling, implementation, and evaluation of the system. The system differs from most question answering systems in its answer validation model, which makes it more suitable for finding exact answers. The evaluation of the model on a total of 50 questions shows a 92% improvement in the decisions of the system.
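The three validation stages named above (filtering, ranking, user voting) can be sketched as a short pipeline. This is an illustration under stated assumptions, not the paper's model: filtering by expected answer type, ranking by keyword overlap with the supporting snippet, and accepting the first candidate whose up-votes are not outnumbered are all simplifications chosen here, and every name in the code is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str          # the candidate answer
    answer_type: str   # e.g. "LOCATION", "DATE" (hypothetical type labels)
    snippet: str       # text snippet given to support the answer
    votes_up: int = 0
    votes_down: int = 0

def filter_candidates(candidates, expected_type):
    """Stage 1: drop candidates whose type does not match the question type."""
    return [c for c in candidates if c.answer_type == expected_type]

def rank_candidates(candidates, keywords):
    """Stage 2: order candidates by keyword overlap with their snippet."""
    wanted = {k.lower() for k in keywords}

    def overlap(c):
        return len(wanted & set(c.snippet.lower().split()))

    return sorted(candidates, key=overlap, reverse=True)

def validate(candidates, expected_type, keywords):
    """Stage 3: return the best-ranked candidate that users did not vote down."""
    ranked = rank_candidates(filter_candidates(candidates, expected_type), keywords)
    for c in ranked:
        if c.votes_up >= c.votes_down:
            return c
    return None

# Toy question: "What is the capital of France?" (expected type: LOCATION).
candidates = [
    Candidate("1889", "DATE", "The Eiffel Tower opened in 1889"),
    Candidate("Lyon", "LOCATION", "Lyon is a large city in France", 1, 4),
    Candidate("Paris", "LOCATION", "Paris is the capital of France", 5, 1),
]
best = validate(candidates, "LOCATION", ["capital", "France"])
```

Here the expected type and keywords would come from the question classification and reformulation components; the sketch simply takes them as inputs.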
Abstract: For effective collaboration, asynchronous tools, and particularly discussion forums, are the most used thanks to their flexibility in terms of time. To convey to the tutor only the messages that belong to a theme of interest, and thus help with the tutoring work, a tool for classifying these messages is indispensable. For this purpose, we have proposed a semantic classification tool for discussion forum messages that is based on LSA (Latent Semantic Analysis) and includes a thesaurus to organize the vocabulary. The benefits offered by a formal ontology can overcome the shortcomings that a thesaurus exhibits in use, which encourages us to use an ontology in our semantic classifier. In this work, we propose the use of some functionalities that an OWL ontology offers. We then explain how functionalities such as "ObjectProperty", "SubClassOf", and the "Datatype" property make our classification more intelligent by integrating new terms. The new terms are generated from the initial terms introduced by the tutor and the semantic relations described by the OWL formalism.
Abstract: Extracting thematic (semantic) roles is one of the
major steps in representing text meaning. It refers to finding the
semantic relations between a predicate and syntactic constituents in a
sentence. In this paper we present a rule-based approach to extract
semantic roles from Persian sentences. The system exploits a two-phase
architecture to (1) identify the arguments and (2) label them
for each predicate.
For the first phase, we developed a rule-based shallow parser to
chunk Persian sentences, and for the second phase, we developed a
knowledge-based system to assign 16 selected thematic roles to the
chunks. The experimental results of testing each phase are shown at
the end of the paper.
Abstract: Discourse pronominal anaphora resolution must be part of any efficient information processing system, since the reference of a pronoun is dependent on an antecedent located in the discourse. Contrary to knowledge-poor approaches, this paper shows that syntax-semantic relations are basic in pronominal anaphora resolution. The identification of quantified expressions to which pronouns can be anaphorically related provides further evidence that pronominal anaphora is based on domains of interpretation where asymmetric agreement holds.
Abstract: Phrases have a long history in information retrieval, particularly in commercial systems. Implicit semantic relationships between words in the form of BaseNP have shown significant improvements in terms of precision in many IR studies. Our research focuses on linguistic phrases, which are language dependent. Our results show that using BaseNP can improve performance, although more than 62% of word formation in the Malay language is based on derivational affixes and suffixes.
Abstract: Data mining and knowledge engineering have become tough tasks due to the large amounts of data available on the web nowadays. The validity and reliability of data have also become a major point of debate in knowledge acquisition. Besides, acquiring knowledge from different languages has become another concern. Many language translators and corpora have been developed, but the function of these translators and corpora is usually limited to certain languages and domains. Furthermore, search results from engines with the traditional 'keyword' approach are no longer satisfactory. More intelligent knowledge engineering agents are needed. To address these problems, a system known as Multilingual Word Semantic Network is proposed. The system adapts a semantic network to organize words according to concepts and relations. The system also adopts open source as its development philosophy to enable native language speakers and experts to contribute their knowledge to the system. The contributed words are then defined and linked using lexical and semantic relations. Thus, related words and derivatives can be identified and linked. The outcome of the system implementation contributes to the development of the semantic web and knowledge engineering.