Categorizing Search Result Records Using Word Sense Disambiguation

Web search engines are designed to retrieve and extract the information in the web databases and to return dynamic web pages. The Semantic Web is an extension of the current web in which it includes semantic content in web pages. The main goal of semantic web is to promote the quality of the current web by changing its contents into machine understandable form. Therefore, the milestone of semantic web is to have semantic level information in the web. Nowadays, people use different keyword- based search engines to find the relevant information they need from the web. But many of the words are polysemous. When these words are used to query a search engine, it displays the Search Result Records (SRRs) with different meanings. The SRRs with similar meanings are grouped together based on Word Sense Disambiguation (WSD). In addition to that semantic annotation is also performed to improve the efficiency of search result records. Semantic Annotation is the process of adding the semantic metadata to web resources. Thus the grouped SRRs are annotated and generate a summary which describes the information in SRRs. But the automatic semantic annotation is a significant challenge in the semantic web. Here ontology and knowledge based representation are used to annotate the web pages.

Using Multi-Linguistic Techniques for Thailand Herb and Traditional Medicine Registration Systems

Thailand has evolved many unique culture and knowledge, and the leading is the Thai traditional medicine (TTM). Recently, a number of researchers have tried to save this indigenous knowledge. However, the system to do so has still been scant. To preserve this ancient knowledge, we therefore invented and integrated multi-linguistic techniques to create the system of the collected all of recipes. This application extracted the medical recipes from antique scriptures then normalized antiquarian words, primitive grammar and antiquated measurement of them to the modern ones. Then, we applied ingredient-duplication-calculation, proportion-similarity-calculation and score-ranking to examine duplicate recipes. We collected the questionnaires from registrants and people to investigate the users’ satisfaction. The satisfactory results were found. This application assists not only registrants to validating the copyright violation in TTM registration process but also people to cure their illness that aids both Thai people and all mankind to fight for intractable diseases.

Linking OpenCourseWares and Open Education Resources: Creating an Effective Search and Recommendation System

With a growing number of digital libraries and other open education repositories being made available throughout the world, effective search and retrieval tools are necessary to access the desired materials that surpass the effectiveness of traditional, allinclusive search engines. This paper discusses the design and use of Folksemantic, a platform that integrates OpenCourseWare search, Open Educational Resource recommendations, and social network functionality into a single open source project. The paper describes how the system was originally envisioned, its goals for users, and data that provides insight into how it is actually being used. Data sources include website click-through data, query logs, web server log files and user account data. Based on a descriptive analysis of its current use, modifications to the platform's design are recommended to better address goals of the system, along with recommendations for additional phases of research.

Layout Based Spam Filtering

Due to the constant increase in the volume of information available to applications in fields varying from medical diagnosis to web search engines, accurate support of similarity becomes an important task. This is also the case of spam filtering techniques where the similarities between the known and incoming messages are the fundaments of making the spam/not spam decision. We present a novel approach to filtering based solely on layout, whose goal is not only to correctly identify spam, but also warn about major emerging threats. We propose a mathematical formulation of the email message layout and based on it we elaborate an algorithm to separate different types of emails and find the new, numerically relevant spam types.

Semantic Markup for Web Applications

In this paper we would like to introduce some of the best practices of using semantic markup and its significance in the success of web applications. Search engines are one of the best ways to reach potential customers and are some of the main indicators of web sites' fruitfulness. We will introduce the most important semantic vocabularies which are used by Google and Yahoo. Afterwards, we will explain the process of semantic markup implementation and its significance for search engines and other semantic markup consumers. We will describe techniques for slow conceiving RDFa markup to our web application for collecting Call for papers (CFP) announcements.

A Weighted-Profiling Using an Ontology Basefor Semantic-Based Search

The information on the Web increases tremendously. A number of search engines have been developed for searching Web information and retrieving relevant documents that satisfy the inquirers needs. Search engines provide inquirers irrelevant documents among search results, since the search is text-based rather than semantic-based. Information retrieval research area has presented a number of approaches and methodologies such as profiling, feedback, query modification, human-computer interaction, etc for improving search results. Moreover, information retrieval has employed artificial intelligence techniques and strategies such as machine learning heuristics, tuning mechanisms, user and system vocabularies, logical theory, etc for capturing user's preferences and using them for guiding the search based on the semantic analysis rather than syntactic analysis. Although a valuable improvement has been recorded on search results, the survey has shown that still search engines users are not really satisfied with their search results. Using ontologies for semantic-based searching is likely the key solution. Adopting profiling approach and using ontology base characteristics, this work proposes a strategy for finding the exact meaning of the query terms in order to retrieve relevant information according to user needs. The evaluation of conducted experiments has shown the effectiveness of the suggested methodology and conclusion is presented.

Online Brands: A Comparative Study of World Top Ranked Universities with Science and Technology Programs

University websites are considered as one of the brand primary touch points for multiple stakeholders, but most of them did not have great designs to create favorable impressions. Some of the elements that web designers should carefully consider are the appearance, the content, the functionality, usability and search engine optimization. However, priority should be placed on website simplicity and negative space. In terms of content, previous research suggests that universities should include reputation, learning environment, graduate career prospects, image destination, cultural integration, and virtual tour on their websites. The study examines how top 200 world ranking science and technology-based universities present their brands online and whether the websites capture the content dimensions. Content analysis of the websites revealed that the top ranking universities captured these dimensions at varying degree. Besides, the UK-based university had better priority on website simplicity and negative space compared to the Malaysian-based university.

Language and Retrieval Accuracy

One of the major challenges in the Information Retrieval field is handling the massive amount of information available to Internet users. Existing ranking techniques and strategies that govern the retrieval process fall short of expected accuracy. Often relevant documents are buried deep in the list of documents returned by the search engine. In order to improve retrieval accuracy we examine the issue of language effect on the retrieval process. Then, we propose a solution for a more biased, user-centric relevance for retrieved data. The results demonstrate that using indices based on variations of the same language enhances the accuracy of search engines for individual users.

Auto Classification for Search Intelligence

This paper proposes an auto-classification algorithm of Web pages using Data mining techniques. We consider the problem of discovering association rules between terms in a set of Web pages belonging to a category in a search engine database, and present an auto-classification algorithm for solving this problem that are fundamentally based on Apriori algorithm. The proposed technique has two phases. The first phase is a training phase where human experts determines the categories of different Web pages, and the supervised Data mining algorithm will combine these categories with appropriate weighted index terms according to the highest supported rules among the most frequent words. The second phase is the categorization phase where a web crawler will crawl through the World Wide Web to build a database categorized according to the result of the data mining approach. This database contains URLs and their categories.

Information Retrieval in Domain Specific Search Engine with Machine Learning Approaches

As the web continues to grow exponentially, the idea of crawling the entire web on a regular basis becomes less and less feasible, so the need to include information on specific domain, domain-specific search engines was proposed. As more information becomes available on the World Wide Web, it becomes more difficult to provide effective search tools for information access. Today, people access web information through two main kinds of search interfaces: Browsers (clicking and following hyperlinks) and Query Engines (queries in the form of a set of keywords showing the topic of interest) [2]. Better support is needed for expressing one's information need and returning high quality search results by web search tools. There appears to be a need for systems that do reasoning under uncertainty and are flexible enough to recover from the contradictions, inconsistencies, and irregularities that such reasoning involves. In a multi-view problem, the features of the domain can be partitioned into disjoint subsets (views) that are sufficient to learn the target concept. Semi-supervised, multi-view algorithms, which reduce the amount of labeled data required for learning, rely on the assumptions that the views are compatible and uncorrelated. This paper describes the use of semi-structured machine learning approach with Active learning for the “Domain Specific Search Engines". A domain-specific search engine is “An information access system that allows access to all the information on the web that is relevant to a particular domain. The proposed work shows that with the help of this approach relevant data can be extracted with the minimum queries fired by the user. It requires small number of labeled data and pool of unlabelled data on which the learning algorithm is applied to extract the required data.

An Ontology Based Question Answering System on Software Test Document Domain

Processing the data by computers and performing reasoning tasks is an important aim in Computer Science. Semantic Web is one step towards it. The use of ontologies to enhance the information by semantically is the current trend. Huge amount of domain specific, unstructured on-line data needs to be expressed in machine understandable and semantically searchable format. Currently users are often forced to search manually in the results returned by the keyword-based search services. They also want to use their native languages to express what they search. In this paper, an ontology-based automated question answering system on software test documents domain is presented. The system allows users to enter a question about the domain by means of natural language and returns exact answer of the questions. Conversion of the natural language question into the ontology based query is the challenging part of the system. To be able to achieve this, a new algorithm regarding free text to ontology based search engine query conversion is proposed. The algorithm is based on investigation of suitable question type and parsing the words of the question sentence.

In Search of Excellence – Google vs Baidu

This paper compares the search engine marketing strategies adopted in China and the Western countries through two illustrative cases, namely, Google and Baidu. Marketers in the West use search engine optimization (SEO) to rank their sites higher for queries in Google. Baidu, however, offers paid search placement, or the selling of engine results for particular keywords to the higher bidders. Whereas Google has been providing innovative services ranging from Google Map to Google Blog, Baidu remains focused on search services – the one that it does best. The challenges and opportunities of the Chinese Internet market offered to global entrepreneurs are also discussed in the paper

Coupled Dynamics in Host-Guest Complex Systems Duplicates Emergent Behavior in the Brain

The ability of the brain to organize information and generate the functional structures we use to act, think and communicate, is a common and easily observable natural phenomenon. In object-oriented analysis, these structures are represented by objects. Objects have been extensively studied and documented, but the process that creates them is not understood. In this work, a new class of discrete, deterministic, dissipative, host-guest dynamical systems is introduced. The new systems have extraordinary self-organizing properties. They can host information representing other physical systems and generate the same functional structures as the brain does. A simple mathematical model is proposed. The new systems are easy to simulate by computer, and measurements needed to confirm the assumptions are abundant and readily available. Experimental results presented here confirm the findings. Applications are many, but among the most immediate are object-oriented engineering, image and voice recognition, search engines, and Neuroscience.

Web Server with Multi-Agent Support for Medical Practitioners by JADE Technology

The multi-agent system for processing Bio-signals will help the medical practitioners to have a standard examination procedure stored in web server. Web Servers supporting any standard Search Engine follow all possible combinations of the search keywords as an input by the user to a Search Engine. As a result, a huge number of Web-pages are shown in the Web browser. It also helps the medical practitioner to interact with the expert in the field his need in order to make a proper judgment in the diagnosis phase [3].A web server uses a web server plug in to establish and maintained the medical practitioner to make a fast analysis. If the user uses the web server client can get a related data requesting their search. DB agent, EEG / ECG / EMG agents- user placed with difficult aspects for updating medical information-s in web server.

A Metametadata Architecture forPedagogic Data Description

This paper focuses on a novel method for semantic searching and retrieval of information about learning materials. Metametadata encapsulate metadata instances by using the properties and attributes provided by ontologies rather than describing learning objects. A novel metametadata taxonomy has been developed which provides the basis for a semantic search engine to extract, match and map queries to retrieve relevant results. The use of ontological views is a foundation for viewing the pedagogical content of metadata extracted from learning objects by using the pedagogical attributes from the metametadata taxonomy. Using the ontological approach and metametadata (based on the metametadata taxonomy) we present a novel semantic searching mechanism.These three strands – the taxonomy, the ontological views, and the search algorithm – are incorporated into a novel architecture (OMESCOD) which has been implemented.

Information Extraction from Unstructured and Ungrammatical Data Sources for Semantic Annotation

The internet has become an attractive avenue for global e-business, e-learning, knowledge sharing, etc. Due to continuous increase in the volume of web content, it is not practically possible for a user to extract information by browsing and integrating data from a huge amount of web sources retrieved by the existing search engines. The semantic web technology enables advancement in information extraction by providing a suite of tools to integrate data from different sources. To take full advantage of semantic web, it is necessary to annotate existing web pages into semantic web pages. This research develops a tool, named OWIE (Ontology-based Web Information Extraction), for semantic web annotation using domain specific ontologies. The tool automatically extracts information from html pages with the help of pre-defined ontologies and gives them semantic representation. Two case studies have been conducted to analyze the accuracy of OWIE.

Meta-Search in Human Resource Management

In the area of Human Resource Management, the trend is towards online exchange of information about human resources. For example, online applications for employment become standard and job offerings are posted in many job portals. However, there are too many job portals to monitor all of them if someone is interested in a new job. We developed a prototype for integrating information of different job portals into one meta-search engine. First, existing job portals were investigated and XML schema documents were derived automated from these portals. Second, translation rules for transforming each schema to a central HR-XML-conform schema were determined. The HR-XML-schema is used to build a form for searching jobs. The data supplied by a user in this form is now translated into queries for the different job portals. Each result obtained by a job portal is sent to the meta-search engine that ranks the result of all received job offers according to user's preferences.

Search Engine Module in Voice Recognition Browser to Facilitate the Visually Impaired in Virtual Learning (MGSYS VISI-VL)

Nowadays, web-based technologies influence in people-s daily life such as in education, business and others. Therefore, many web developers are too eager to develop their web applications with fully animation graphics and forgetting its accessibility to its users. Their purpose is to make their web applications look impressive. Thus, this paper would highlight on the usability and accessibility of a voice recognition browser as a tool to facilitate the visually impaired and blind learners in accessing virtual learning environment. More specifically, the objectives of the study are (i) to explore the challenges faced by the visually impaired learners in accessing virtual learning environment (ii) to determine the suitable guidelines for developing a voice recognition browser that is accessible to the visually impaired. Furthermore, this study was prepared based on an observation conducted with the Malaysian visually impaired learners. Finally, the result of this study would underline on the development of an accessible voice recognition browser for the visually impaired.

A Comparative Study of Page Ranking Algorithms for Information Retrieval

This paper gives an introduction to Web mining, then describes Web Structure mining in detail, and explores the data structure used by the Web. This paper also explores different Page Rank algorithms and compare those algorithms used for Information Retrieval. In Web Mining, the basics of Web mining and the Web mining categories are explained. Different Page Rank based algorithms like PageRank (PR), WPR (Weighted PageRank), HITS (Hyperlink-Induced Topic Search), DistanceRank and DirichletRank algorithms are discussed and compared. PageRanks are calculated for PageRank and Weighted PageRank algorithms for a given hyperlink structure. Simulation Program is developed for PageRank algorithm because PageRank is the only ranking algorithm implemented in the search engine (Google). The outputs are shown in a table and chart format.

Web Content Mining: A Solution to Consumer's Product Hunt

With the rapid growth in business size, today's businesses orient towards electronic technologies. Amazon.com and e-bay.com are some of the major stakeholders in this regard. Unfortunately the enormous size and hugely unstructured data on the web, even for a single commodity, has become a cause of ambiguity for consumers. Extracting valuable information from such an everincreasing data is an extremely tedious task and is fast becoming critical towards the success of businesses. Web content mining can play a major role in solving these issues. It involves using efficient algorithmic techniques to search and retrieve the desired information from a seemingly impossible to search unstructured data on the Internet. Application of web content mining can be very encouraging in the areas of Customer Relations Modeling, billing records, logistics investigations, product cataloguing and quality management. In this paper we present a review of some very interesting, efficient yet implementable techniques from the field of web content mining and study their impact in the area specific to business user needs focusing both on the customer as well as the producer. The techniques we would be reviewing include, mining by developing a knowledge-base repository of the domain, iterative refinement of user queries for personalized search, using a graphbased approach for the development of a web-crawler and filtering information for personalized search using website captions. These techniques have been analyzed and compared on the basis of their execution time and relevance of the result they produced against a particular search.