Abstract: Web search engines are designed to retrieve and
extract the information in the web databases and to return dynamic
web pages. The Semantic Web is an extension of the current web in
which it includes semantic content in web pages. The main goal of
semantic web is to promote the quality of the current web by
changing its contents into machine understandable form. Therefore,
the milestone of semantic web is to have semantic level information
in the web. Nowadays, people use different keyword- based search
engines to find the relevant information they need from the web.
But many of the words are polysemous. When these words are
used to query a search engine, it displays the Search Result Records
(SRRs) with different meanings. The SRRs with similar meanings are
grouped together based on Word Sense Disambiguation (WSD). In
addition to that semantic annotation is also performed to improve the
efficiency of search result records. Semantic Annotation is the
process of adding the semantic metadata to web resources. Thus the
grouped SRRs are annotated and generate a summary which
describes the information in SRRs. But the automatic semantic
annotation is a significant challenge in the semantic web. Here
ontology and knowledge based representation are used to annotate
the web pages.
Abstract: Thailand has evolved many unique culture and knowledge, and the leading is the Thai traditional medicine (TTM). Recently, a number of researchers have tried to save this indigenous knowledge. However, the system to do so has still been scant. To preserve this ancient knowledge, we therefore invented and integrated multi-linguistic techniques to create the system of the collected all of recipes. This application extracted the medical recipes from antique scriptures then normalized antiquarian words, primitive grammar and antiquated measurement of them to the modern ones. Then, we applied ingredient-duplication-calculation, proportion-similarity-calculation and score-ranking to examine duplicate recipes. We collected the questionnaires from registrants and people to investigate the users’ satisfaction. The satisfactory results were found. This application assists not only registrants to validating the copyright violation in TTM registration process but also people to cure their illness that aids both Thai people and all mankind to fight for intractable diseases.
Abstract: With a growing number of digital libraries and other
open education repositories being made available throughout the
world, effective search and retrieval tools are necessary to access the
desired materials that surpass the effectiveness of traditional, allinclusive
search engines. This paper discusses the design and use of
Folksemantic, a platform that integrates OpenCourseWare search,
Open Educational Resource recommendations, and social network
functionality into a single open source project. The paper describes
how the system was originally envisioned, its goals for users, and
data that provides insight into how it is actually being used. Data
sources include website click-through data, query logs, web server
log files and user account data. Based on a descriptive analysis of its
current use, modifications to the platform's design are recommended
to better address goals of the system, along with recommendations
for additional phases of research.
Abstract: Due to the constant increase in the volume of information available to applications in fields varying from medical diagnosis to web search engines, accurate support of similarity becomes an important task. This is also the case of spam filtering techniques where the similarities between the known and incoming messages are the fundaments of making the spam/not spam decision. We present a novel approach to filtering based solely on layout, whose goal is not only to correctly identify spam, but also warn about major emerging threats. We propose a mathematical formulation of the email message layout and based on it we elaborate an algorithm to separate different types of emails and find the new, numerically relevant spam types.
Abstract: In this paper we would like to introduce some of the
best practices of using semantic markup and its significance in the
success of web applications. Search engines are one of the best ways
to reach potential customers and are some of the main indicators of
web sites' fruitfulness. We will introduce the most important
semantic vocabularies which are used by Google and Yahoo.
Afterwards, we will explain the process of semantic markup
implementation and its significance for search engines and other
semantic markup consumers. We will describe techniques for slow
conceiving RDFa markup to our web application for collecting Call
for papers (CFP) announcements.
Abstract: The information on the Web increases tremendously.
A number of search engines have been developed for searching Web
information and retrieving relevant documents that satisfy the
inquirers needs. Search engines provide inquirers irrelevant
documents among search results, since the search is text-based rather
than semantic-based. Information retrieval research area has
presented a number of approaches and methodologies such as
profiling, feedback, query modification, human-computer interaction,
etc for improving search results. Moreover, information retrieval has
employed artificial intelligence techniques and strategies such as
machine learning heuristics, tuning mechanisms, user and system
vocabularies, logical theory, etc for capturing user's preferences and
using them for guiding the search based on the semantic analysis
rather than syntactic analysis. Although a valuable improvement has
been recorded on search results, the survey has shown that still
search engines users are not really satisfied with their search results.
Using ontologies for semantic-based searching is likely the key
solution. Adopting profiling approach and using ontology base
characteristics, this work proposes a strategy for finding the exact
meaning of the query terms in order to retrieve relevant information
according to user needs. The evaluation of conducted experiments
has shown the effectiveness of the suggested methodology and
conclusion is presented.
Abstract: University websites are considered as one of the brand primary touch points for multiple stakeholders, but most of them did not have great designs to create favorable impressions. Some of the elements that web designers should carefully consider are the appearance, the content, the functionality, usability and search engine optimization. However, priority should be placed on website simplicity and negative space. In terms of content, previous research suggests that universities should include reputation, learning environment, graduate career prospects, image destination, cultural integration, and virtual tour on their websites. The study examines how top 200 world ranking science and technology-based universities present their brands online and whether the websites capture the content dimensions. Content analysis of the websites revealed that the top ranking universities captured these dimensions at varying degree. Besides, the UK-based university had better priority on website simplicity and negative space compared to the Malaysian-based university.
Abstract: One of the major challenges in the Information
Retrieval field is handling the massive amount of information
available to Internet users. Existing ranking techniques and strategies
that govern the retrieval process fall short of expected accuracy.
Often relevant documents are buried deep in the list of documents
returned by the search engine. In order to improve retrieval accuracy
we examine the issue of language effect on the retrieval process.
Then, we propose a solution for a more biased, user-centric relevance
for retrieved data. The results demonstrate that using indices based
on variations of the same language enhances the accuracy of search
engines for individual users.
Abstract: This paper proposes an auto-classification algorithm
of Web pages using Data mining techniques. We consider the
problem of discovering association rules between terms in a set of
Web pages belonging to a category in a search engine database, and
present an auto-classification algorithm for solving this problem that
are fundamentally based on Apriori algorithm. The proposed
technique has two phases. The first phase is a training phase where
human experts determines the categories of different Web pages, and
the supervised Data mining algorithm will combine these categories
with appropriate weighted index terms according to the highest
supported rules among the most frequent words. The second phase is
the categorization phase where a web crawler will crawl through the
World Wide Web to build a database categorized according to the
result of the data mining approach. This database contains URLs and
their categories.
Abstract: As the web continues to grow exponentially, the idea
of crawling the entire web on a regular basis becomes less and less
feasible, so the need to include information on specific domain,
domain-specific search engines was proposed. As more information
becomes available on the World Wide Web, it becomes more difficult
to provide effective search tools for information access. Today,
people access web information through two main kinds of search
interfaces: Browsers (clicking and following hyperlinks) and Query
Engines (queries in the form of a set of keywords showing the topic
of interest) [2]. Better support is needed for expressing one's
information need and returning high quality search results by web
search tools. There appears to be a need for systems that do reasoning
under uncertainty and are flexible enough to recover from the
contradictions, inconsistencies, and irregularities that such reasoning
involves. In a multi-view problem, the features of the domain can be
partitioned into disjoint subsets (views) that are sufficient to learn the
target concept. Semi-supervised, multi-view algorithms, which
reduce the amount of labeled data required for learning, rely on the
assumptions that the views are compatible and uncorrelated. This
paper describes the use of semi-structured machine learning approach
with Active learning for the “Domain Specific Search Engines". A
domain-specific search engine is “An information access system that
allows access to all the information on the web that is relevant to a
particular domain. The proposed work shows that with the help of
this approach relevant data can be extracted with the minimum
queries fired by the user. It requires small number of labeled data and
pool of unlabelled data on which the learning algorithm is applied to
extract the required data.
Abstract: Processing the data by computers and performing
reasoning tasks is an important aim in Computer Science. Semantic
Web is one step towards it. The use of ontologies to enhance the
information by semantically is the current trend. Huge amount of
domain specific, unstructured on-line data needs to be expressed in
machine understandable and semantically searchable format.
Currently users are often forced to search manually in the results
returned by the keyword-based search services. They also want to use
their native languages to express what they search. In this paper, an
ontology-based automated question answering system on software
test documents domain is presented. The system allows users to enter
a question about the domain by means of natural language and
returns exact answer of the questions. Conversion of the natural
language question into the ontology based query is the challenging
part of the system. To be able to achieve this, a new algorithm
regarding free text to ontology based search engine query conversion
is proposed. The algorithm is based on investigation of suitable
question type and parsing the words of the question sentence.
Abstract: This paper compares the search engine marketing
strategies adopted in China and the Western countries through two illustrative cases, namely, Google and Baidu. Marketers in the West
use search engine optimization (SEO) to rank their sites higher for
queries in Google. Baidu, however, offers paid search placement, or the selling of engine results for particular keywords to the higher
bidders. Whereas Google has been providing innovative services ranging from Google Map to Google Blog, Baidu remains focused on
search services – the one that it does best. The challenges and
opportunities of the Chinese Internet market offered to global entrepreneurs are also discussed in the paper
Abstract: The ability of the brain to organize information and generate the functional structures we use to act, think and communicate, is a common and easily observable natural phenomenon. In object-oriented analysis, these structures are represented by objects. Objects have been extensively studied and documented, but the process that creates them is not understood. In this work, a new class of discrete, deterministic, dissipative, host-guest dynamical systems is introduced. The new systems have extraordinary self-organizing properties. They can host information representing other physical systems and generate the same functional structures as the brain does. A simple mathematical model is proposed. The new systems are easy to simulate by computer, and measurements needed to confirm the assumptions are abundant and readily available. Experimental results presented here confirm the findings. Applications are many, but among the most immediate are object-oriented engineering, image and voice recognition, search engines, and Neuroscience.
Abstract: The multi-agent system for processing Bio-signals
will help the medical practitioners to have a standard examination
procedure stored in web server. Web Servers supporting any standard
Search Engine follow all possible combinations of the search
keywords as an input by the user to a Search Engine. As a result, a
huge number of Web-pages are shown in the Web browser. It also
helps the medical practitioner to interact with the expert in the field
his need in order to make a proper judgment in the diagnosis phase
[3].A web server uses a web server plug in to establish and
maintained the medical practitioner to make a fast analysis. If the
user uses the web server client can get a related data requesting their
search. DB agent, EEG / ECG / EMG agents- user placed with
difficult aspects for updating medical information-s in web server.
Abstract: This paper focuses on a novel method for semantic
searching and retrieval of information about learning materials.
Metametadata encapsulate metadata instances by using the properties
and attributes provided by ontologies rather than describing learning
objects. A novel metametadata taxonomy has been developed which
provides the basis for a semantic search engine to extract, match and
map queries to retrieve relevant results. The use of ontological views
is a foundation for viewing the pedagogical content of metadata
extracted from learning objects by using the pedagogical attributes
from the metametadata taxonomy. Using the ontological approach
and metametadata (based on the metametadata taxonomy) we present
a novel semantic searching mechanism.These three strands – the
taxonomy, the ontological views, and the search algorithm – are
incorporated into a novel architecture (OMESCOD) which has been
implemented.
Abstract: The internet has become an attractive avenue for
global e-business, e-learning, knowledge sharing, etc. Due to
continuous increase in the volume of web content, it is not practically
possible for a user to extract information by browsing and integrating
data from a huge amount of web sources retrieved by the existing
search engines. The semantic web technology enables advancement
in information extraction by providing a suite of tools to integrate
data from different sources. To take full advantage of semantic web,
it is necessary to annotate existing web pages into semantic web
pages. This research develops a tool, named OWIE (Ontology-based
Web Information Extraction), for semantic web annotation using
domain specific ontologies. The tool automatically extracts
information from html pages with the help of pre-defined ontologies
and gives them semantic representation. Two case studies have been
conducted to analyze the accuracy of OWIE.
Abstract: In the area of Human Resource Management, the trend is towards online exchange of information about human resources. For example, online applications for employment become standard and job offerings are posted in many job portals. However, there are too many job portals to monitor all of them if someone is interested in a new job. We developed a prototype for integrating information of different job portals into one meta-search engine. First, existing job portals were investigated and XML schema documents were derived automated from these portals. Second, translation rules for transforming each schema to a central HR-XML-conform schema were determined. The HR-XML-schema is used to build a form for searching jobs. The data supplied by a user in this form is now translated into queries for the different job portals. Each result obtained by a job portal is sent to the meta-search engine that ranks the result of all received job offers according to user's preferences.
Abstract: Nowadays, web-based technologies influence in
people-s daily life such as in education, business and others.
Therefore, many web developers are too eager to develop their web
applications with fully animation graphics and forgetting its
accessibility to its users. Their purpose is to make their web
applications look impressive. Thus, this paper would highlight on the
usability and accessibility of a voice recognition browser as a tool to
facilitate the visually impaired and blind learners in accessing virtual
learning environment. More specifically, the objectives of the study
are (i) to explore the challenges faced by the visually impaired
learners in accessing virtual learning environment (ii) to determine
the suitable guidelines for developing a voice recognition browser
that is accessible to the visually impaired. Furthermore, this study
was prepared based on an observation conducted with the Malaysian
visually impaired learners. Finally, the result of this study would
underline on the development of an accessible voice recognition
browser for the visually impaired.
Abstract: This paper gives an introduction to Web mining, then
describes Web Structure mining in detail, and explores the data
structure used by the Web. This paper also explores different Page
Rank algorithms and compare those algorithms used for Information
Retrieval. In Web Mining, the basics of Web mining and the Web
mining categories are explained. Different Page Rank based
algorithms like PageRank (PR), WPR (Weighted PageRank), HITS
(Hyperlink-Induced Topic Search), DistanceRank and DirichletRank
algorithms are discussed and compared. PageRanks are calculated for
PageRank and Weighted PageRank algorithms for a given hyperlink
structure. Simulation Program is developed for PageRank algorithm
because PageRank is the only ranking algorithm implemented in the
search engine (Google). The outputs are shown in a table and chart
format.
Abstract: With the rapid growth in business size, today's businesses orient towards electronic technologies. Amazon.com and e-bay.com are some of the major stakeholders in this regard. Unfortunately the enormous size and hugely unstructured data on the web, even for a single commodity, has become a cause of ambiguity for consumers. Extracting valuable information from such an everincreasing data is an extremely tedious task and is fast becoming critical towards the success of businesses. Web content mining can play a major role in solving these issues. It involves using efficient algorithmic techniques to search and retrieve the desired information from a seemingly impossible to search unstructured data on the Internet. Application of web content mining can be very encouraging in the areas of Customer Relations Modeling, billing records, logistics investigations, product cataloguing and quality management. In this paper we present a review of some very interesting, efficient yet implementable techniques from the field of web content mining and study their impact in the area specific to business user needs focusing both on the customer as well as the producer. The techniques we would be reviewing include, mining by developing a knowledge-base repository of the domain, iterative refinement of user queries for personalized search, using a graphbased approach for the development of a web-crawler and filtering information for personalized search using website captions. These techniques have been analyzed and compared on the basis of their execution time and relevance of the result they produced against a particular search.