Abstract: The main purpose and focus of this paper is to determine which Interoperability Maturity Models to consider when using School Management Systems (SMS). The importance of this is to inform and help schools know which Interoperability Maturity Model is best suited for their SMS. To address this purpose, this paper applies a scoping review to ensure that all aspects are covered. The scoping review includes papers published from 2012 to 2019, and a comparison of the different types of Interoperability Maturity Models is discussed in detail, including the background information, the levels of interoperability, and the areas for consideration in each Maturity Model. The literature was obtained from the following databases: IEEE Xplore and Scopus; the following search engines were used: Harzing's and Google Scholar. The topic of the paper was used as a search term for the literature, and the term ‘Interoperability Maturity Models’ was used as a keyword. The data were analyzed in terms of the definition of interoperability, Interoperability Maturity Models, and levels of interoperability. The results provide a table that shows the focus area of concern for each Maturity Model, based on the scoping review in which only 24 papers out of the 740 publications initially identified in the field were found to be best suited for the paper. This resulted in the most discussed Interoperability Maturity Models for consideration: the Information Systems Interoperability Maturity Model (ISIMM) and the Organizational Interoperability Maturity Model for C2 (OIM).
Abstract: In recent years, the amount of document data has been
increasing with the spread of the Internet. Many methods have been
studied for extracting topics from large document collections. We
proposed Independent Topic Analysis (ITA) to extract mutually
independent topics from large document collections such as newspaper
data. ITA extracts independent topics from document data by using
Independent Component Analysis. A topic extracted by ITA is
represented as a set of words. However, such a set of words can be
quite different from the topics the user imagines. For example, the
top five words of one extracted topic are as follows:
Topic 1 = {"scor", "game", "lead", "quarter", "rebound"}. This topic
can be considered to represent "SPORTS", but the topic name "SPORTS"
has to be attached by the user; ITA itself cannot name topics.
Therefore, in this research, we propose a method that uses a web
search engine to obtain topic names that are easy for people to
understand from the word sets given by Independent Topic Analysis.
In particular, we issue the set of topic words as a search query and
take the title of the top-ranked page in the search results as the
topic name. We apply the proposed method to several datasets and
verify its effectiveness.
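The naming step described above can be sketched in a few lines. This is a hedged illustration only: the function and variable names are ours, and `search_titles` is a stand-in for whatever web search API the authors actually used.

```python
# Sketch of the proposed topic-naming step: issue the topic's word set
# as one search query and take the title of the top-ranked result page
# as the topic name. `search_titles` is a hypothetical stand-in for a
# real web search API (query string -> ranked list of page titles).

def name_topic(topic_words, search_titles):
    """Return a human-readable name for a set of topic words."""
    query = " ".join(topic_words)
    titles = search_titles(query)
    return titles[0] if titles else query  # fall back to the raw words

# Mocked search results, for illustration only.
def fake_search(query):
    if "game" in query and "rebound" in query:
        return ["NBA Basketball Scores and Highlights", "Box Scores"]
    return []

topic1 = ["scor", "game", "lead", "quarter", "rebound"]
print(name_topic(topic1, fake_search))  # a sports-related page title
```

With a real search backend, the returned title plays the role of the "SPORTS" label that the user previously had to supply by hand.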
Abstract: The development of information technology and the Internet has been transforming the healthcare industry. The Internet is continuously accessed to seek health information, and there are a variety of sources, including search engines, health websites, and social networking sites. Providing more and better information on health may empower individuals; however, ensuring high-quality and trusted health information can pose a challenge. Moreover, there is an ever-increasing amount of information available, but it is not necessarily accurate and up to date. Thus, this paper aims to provide an insight into the models and frameworks related to consumers' online health information seeking. It begins by exploring the definitions of information behavior and information seeking to provide a better understanding of the concept of information seeking. In this study, critical factors such as performance expectancy, effort expectancy, and social influence will be studied in relation to the value of seeking health information. It also aims to analyze the effect of age, gender, and health status as moderators of the factors that influence online health information seeking, i.e. trust and information quality. A preliminary survey will be carried out among health professionals to clarify the research problems which exist in the real world, at the same time producing a conceptual framework. A final survey will be distributed in five states of Malaysia to solicit feedback on the framework. Data will be analyzed using the SPSS and SmartPLS 3.0 analysis tools. It is hoped that at the end of this study, a novel framework that can improve online health information seeking will be developed. Finally, this paper concludes with some suggestions on the models and frameworks that could improve online health information seeking.
Abstract: The goal of this study is to analyze whether search queries carried out in search engines such as Google can offer emotional information about the user who performs them. Knowing the emotional state of the Internet user can be key to achieving maximum personalization of content and detecting worrying behaviors. To this end, two studies were carried out using tools with advanced natural language processing techniques. The first study determines whether a query can be classified as positive, negative or neutral, while the second study extracts emotional content from words and applies the categorical and dimensional models for the representation of emotions. In addition, we use search queries in Spanish and English to establish similarities and differences between the two languages. The results revealed that text search queries performed by users on the Internet can be classified emotionally. This allows us to better understand the emotional state of the user at the time of the search, which could enable adapting the technology and personalizing the responses to different emotional states.
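The first study's positive/negative/neutral classification can be illustrated with a minimal lexicon-based sketch. The word lists below are toy examples (with a couple of Spanish entries to echo the bilingual setting), not the advanced NLP tools the paper actually uses.

```python
# Minimal lexicon-based sketch: classify a search query as positive,
# negative, or neutral by counting sentiment-bearing words. The word
# lists are illustrative only.

POSITIVE = {"happy", "best", "love", "great", "feliz"}
NEGATIVE = {"sad", "worst", "hate", "anxiety", "triste"}

def classify_query(query):
    words = query.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_query("best happy songs"))          # positive
print(classify_query("how to deal with anxiety"))  # negative
print(classify_query("weather tomorrow"))          # neutral
```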
Abstract: This case study explores the impact of two major computer software programs, Learn to Speak English and Learn English Spelling and Pronunciation, and some Internet search engines such as Google, on mending the decoding and spelling deficiency of Simon X, a dyslexic student. The improvement in decoding and spelling may result in better reading comprehension and composition writing. Some computer programs and Internet materials can help regain the missing phonological awareness and consequently restore his self-confidence and self-esteem. In addition, this study provides a systematic plan comprising a set of activities (four computer programs and Internet materials) which address the problem from the lowest to the highest levels of phonemic and phonological awareness. Four methods of data collection (accounts, observations, published tests, and interviews) create the triangulation needed to collect data validly and reliably before, during, and after the plan. The data collected are analyzed quantitatively, qualitatively, or through a combination of both. Tables and figures are utilized to provide a clear and uncomplicated illustration of some of the data. The improvement that occurred in decoding, spelling, reading comprehension, and composition writing is demonstrated through authentic materials produced by the student under study: a comparison between two sample passages written by the learner before and after the plan, a genuine computer chat conversation, and the scores of the academic year that followed the execution of the plan. Based on these results, the researcher recommends further studies on other Lebanese dyslexic learners using the computer to mend their language problems, in order to design a more reliable software program that can address this disability more efficiently and successfully.
Abstract: Doxing is a term derived from "documents"; it consists of collecting information on an organization or individual through social media websites, search engines, password cracking methods, social engineering tools and other sources of publicly displayed information. The main purpose of doxing attacks is to threaten, embarrass, harass and humiliate the organization or individual. Various tools are used to perform doxing; tools such as Maltego visualize an organization's architecture, which helps in determining weak links within the organization. This paper discusses the limitations of Maltego Chlorine CE 3.6.0 and suggests measures as to how organizations can use such tools to protect themselves from doxing attacks.
Abstract: The Information Retrieval community is facing the problem of effectively representing web search results. When web search results are organized into clusters, it becomes easy for users to browse through them quickly. Traditional search engines organize search results into clusters for ambiguous queries, with one cluster for each meaning of the query. The clusters are obtained according to the topical similarity of the retrieved search results, but it is possible for results to be totally dissimilar and still correspond to the same meaning of the query. People search is also one of the most common tasks on the Web nowadays, but when a particular person's name is queried, search engines return web pages related to different persons who share the queried name. Rather than placing the burden of disambiguating and collecting pages relevant to a particular person on the user, in this paper we develop an approach that clusters web pages based on their association with the different people, with clusters based on generic entity search.
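The person-disambiguation idea can be sketched as a simple grouping of result pages by overlap of extracted attributes (co-occurring names, affiliations, and the like). This is a hedged illustration: the greedy Jaccard-overlap clustering and its threshold are our simplification, not the paper's actual entity-search method.

```python
# Sketch: cluster result pages for an ambiguous person-name query.
# Pages whose extracted attribute sets overlap strongly are assumed to
# describe the same person. Threshold and data are illustrative.

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_pages(pages, threshold=0.3):
    """pages: list of (page_id, attribute_set). Greedy clustering:
    attach each page to the first cluster it is similar enough to."""
    clusters = []  # each cluster: list of (page_id, attrs)
    for pid, attrs in pages:
        for cluster in clusters:
            if any(jaccard(attrs, a) >= threshold for _, a in cluster):
                cluster.append((pid, attrs))
                break
        else:
            clusters.append([(pid, attrs)])
    return [[pid for pid, _ in c] for c in clusters]

pages = [
    ("p1", {"stanford", "physics", "quantum"}),
    ("p2", {"stanford", "physics", "lasers"}),
    ("p3", {"guitar", "band", "tour"}),
]
print(cluster_pages(pages))  # [['p1', 'p2'], ['p3']]
```

Here the two "physicist" pages fall into one cluster and the "musician" page into another, even though all three would match the same name query.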
Abstract: This paper explores efficient ways to implement various
media-updating features like news aggregation, video conversion,
and bulk email handling. All of these jobs share the property
that they are periodic in nature, and they all benefit from being
handled in a distributed fashion. The data for these jobs also often
comes from a social or collaborative source. We isolate the class of
periodic, one-round map reduce jobs as a useful setting to describe
and handle media-updating tasks. As such tasks are simpler than
general map reduce jobs, programming them in a general map
reduce platform could easily become tedious. This paper presents
a MediaUpdater module of the Yioop Open Source Search Engine
Web Portal designed to handle such jobs via an extension of a
PHP class. We describe how to implement various media-updating
tasks in our system as well as experiments carried out using these
implementations on an Amazon Web Services cluster.
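The "periodic, one-round map reduce job" abstraction can be sketched as follows. The actual MediaUpdater is a PHP class in Yioop; this Python sketch with our own illustrative names (`MediaJob`, `FeedCountJob`) only shows the shape of the idea: a job declares a period, a `map` over input items, and a single `reduce` over the mapped values.

```python
# Sketch of a periodic, one-round map-reduce job expressed by
# subclassing a small base class. All names are illustrative, not
# Yioop's actual API.
import time

class MediaJob:
    period = 3600  # seconds between runs
    def __init__(self):
        self.last_run = 0.0
    def due(self, now):
        return now - self.last_run >= self.period
    def map(self, item):       # one round: map each input item...
        raise NotImplementedError
    def reduce(self, mapped):  # ...then reduce all mapped values once
        raise NotImplementedError
    def run(self, items, now=None):
        self.last_run = time.time() if now is None else now
        return self.reduce([self.map(i) for i in items])

class FeedCountJob(MediaJob):
    """Toy news-aggregation job: count items per feed."""
    period = 900
    def map(self, item):
        feed, _title = item
        return feed
    def reduce(self, mapped):
        counts = {}
        for feed in mapped:
            counts[feed] = counts.get(feed, 0) + 1
        return counts

job = FeedCountJob()
items = [("bbc", "a"), ("bbc", "b"), ("cnn", "c")]
print(job.run(items, now=1000.0))  # {'bbc': 2, 'cnn': 1}
```

A scheduler would simply poll each registered job's `due()` and invoke `run()` on the machines of a cluster, which is the distribution property the abstract highlights.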
Abstract: This study discusses a simple solution for the problem of shortage in learning resources for kindergarten teachers. Occasionally, kindergarten teachers cannot access proper resources by usual search methods as libraries or search engines. Furthermore, these methods require a long time and efforts for preparing. The study is expected to facilitate accessing learning resources. Moreover, it suggests a potential direction for using QR code inside the classroom. The present work proposes that QR code can be used for digitizing kindergarten curriculums and accessing various learning resources. It investigates using QR code for saving information related to the concepts which kindergarten teachers use in the current educational situation. The researchers have established a guide for kindergarten teachers based on the Egyptian official curriculum. The guide provides different learning resources for each scientific and mathematical concept in the curriculum, and each learning resource is represented as a QR code image that contains its URL. Therefore, kindergarten teachers can use smartphone applications for reading QR codes and displaying the related learning resources for students immediately. The guide has been provided to a group of 108 teachers for using inside their classrooms. The results showed that the teachers approved the guide, and gave a good response.
Abstract: An online advertisement system and its implementation
for the Yioop open source search engine are presented. This system
supports both selling advertisements and displaying them within
search results. The selling of advertisements is done using a system
to auction off daily impressions for keyword searches. This is an
open, ascending price auction system in which all accepted bids will
receive a fraction of the auctioned day’s impressions. New bids in
our system are required to be at least one half of the sum of all
previous bids, ensuring that the number of accepted bids is
logarithmic in the total ad spend on a keyword for a day. The mechanics of
creating an advertisement, attaching keywords to it, and adding it
to an advertisement inventory are described. The algorithm used to
go from accepted bids for a keyword to which ads are displayed at
search time is also presented. We discuss properties of our system
and compare it to existing auction systems and systems for selling
online advertisements.
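The admission rule stated above can be sketched in a few lines. This is a hedged illustration with our own function name, interpreting "previous bids" as previously accepted bids, which is what makes the logarithmic bound work: each acceptance grows the running total by a factor of at least 3/2.

```python
# Sketch of the stated admission rule: a new bid is accepted only if it
# is at least half the sum of the bids accepted so far, so the count of
# accepted bids grows logarithmically with total keyword spend.

def accept_bids(bids):
    """Process bids in arrival order; return the accepted ones."""
    accepted, total = [], 0.0
    for bid in bids:
        if bid >= total / 2.0:
            accepted.append(bid)
            total += bid   # total grows by a factor >= 3/2 per acceptance
        # otherwise the bid is rejected
    return accepted

print(accept_bids([1, 1, 1, 2, 10, 3]))  # [1, 1, 1, 2, 10]
```

In the example, the final bid of 3 is rejected because the accepted total has already reached 15, and 3 < 15/2.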
Abstract: The development of web technologies and mobile devices makes creating, accessing, using and sharing information, and communicating with each other, simpler every day. However, while the amount of information is constantly increasing, it is becoming harder to effectively organize and find quality information despite the availability of web search engines, filtering and indexing tools. Although digital technologies have an overall positive impact on students' lives, frequent use of these technologies and of digital media enriched with dynamic hypertext and hypermedia content, as well as multitasking and distractions caused by notifications, calls or messages, can decrease the attention span and make thinking, memorizing and learning more difficult, which can lead to stress and mental exhaustion. This is referred to as "information overload", "information glut" or "information anxiety". The objective of this study is to determine whether students show signs of information overload and to identify the possible predictors. The research was conducted using a questionnaire developed for the purpose of this study. The results show that students frequently use technology (computers, gadgets and digital media), while they show a moderate level of information literacy, and they have sometimes experienced symptoms of information overload. According to the statistical analysis, a higher frequency of technology use and a lower level of information literacy are correlated with greater information overload. A multiple regression analysis confirmed that the combination of these two independent variables has statistically significant predictive capacity for information overload. Therefore, information science teachers should pay attention to improving students' level of information literacy and educate them about the risks of excessive technology use.
Abstract: Web mining aims to discover and extract useful
information. Different users may have different search goals when
they submit queries to a search engine. Inferring and analyzing
user search goals can be very useful for improving the results
returned for a user search query. In this project, we propose a
novel approach to infer user search goals by analyzing search
engine query logs. First, feedback sessions are constructed from
user click-through logs; these efficiently reflect the information
needs of users. Second, we propose a preprocessing technique to
remove unnecessary data from the web log file (feedback sessions).
Third, we propose a technique to generate pseudo-documents as a
representation of feedback sessions for clustering. Finally, we
implement the k-medoids clustering algorithm to discover different
user search goals and to provide a more optimal result for a
search query based on the feedback sessions for the user.
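The final clustering step can be illustrated with a compact k-medoids sketch over pseudo-documents represented as word sets, using Jaccard distance. This is a generic illustration, not the paper's exact formulation: the deterministic initialization, the toy documents, and the distance choice are our assumptions.

```python
# Compact k-medoids sketch over pseudo-documents (word sets) with
# Jaccard distance. Initialization and data are illustrative.

def jaccard_dist(a, b):
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 1.0)

def k_medoids(docs, k, iters=10):
    medoids = list(range(k))  # deterministic start: first k docs
    for _ in range(iters):
        # assignment step: attach each doc to its nearest medoid
        clusters = {m: [] for m in medoids}
        for i, d in enumerate(docs):
            best = min(medoids, key=lambda m: jaccard_dist(d, docs[m]))
            clusters[best].append(i)
        # update step: each cluster's new medoid is the member that
        # minimizes total in-cluster distance (each medoid keeps at
        # least itself as a member here)
        new_medoids = []
        for m, members in clusters.items():
            best = min(members, key=lambda c: sum(
                jaccard_dist(docs[c], docs[o]) for o in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break  # converged
        medoids = new_medoids
    return clusters

docs = [{"java", "code", "program"}, {"java", "code", "compile"},
        {"snake", "python", "animal"}, {"snake", "python", "reptile"}]
groups = sorted(sorted(m) for m in k_medoids(docs, 2).values())
print(groups)  # [[0, 1], [2, 3]]
```

Each resulting cluster of pseudo-documents would correspond to one inferred search goal.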
Abstract: Mitigating soil erosion, especially in Mediterranean
countries such as Greece, is essential in order to maintain
environmental and agricultural sustainability. In this paper, scientific
publications related to soil erosion studies in Greece were reviewed
and categorized. To accomplish this, the online search engine of
Scopus was used. The key words were “soil”, “erosion” and
“Greece.” An analysis of the published articles was conducted at
three levels: i) type of publication, ii) chronological, and iii)
thematic. A hundred and ten publications in scientific journals were
reviewed. The results showed that awareness regarding soil erosion
in Greece has increased only in recent decades. The
publications covered a wide range of thematic categories such as the
type of studied areas, the physical phenomena that trigger and
influence the soil erosion, the negative anthropogenic impacts on
them, the assessment tools that were used in order to examine the
threat, and the proper management. The analysis of these articles was
significant and necessary in order to identify the scientific gaps in
soil erosion studies in Greece and to help enhance the sustainability
of soil management in the future.
Abstract: The web’s increased popularity has brought a huge
amount of information, due to which automated web page
classification systems are essential to improve search engines’
performance. Web pages have many features, such as HTML or XML
tags, hyperlinks, URLs and text content, which can be considered
during an automated classification process. It is known that web
page classification is enhanced by hyperlinks, as they reflect web
page linkages. The aim of this study is to reduce the number of
features used while improving the accuracy of web page
classification. In this paper, a novel feature selection method
using an improved Particle Swarm Optimization (PSO) based on
principles of evolution is proposed. The selected features were
tested on the WebKB dataset using a parallel Neural Network to
reduce the computational cost.
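The feature selection idea can be sketched with a minimal binary PSO. This is a hedged sketch of standard binary PSO, not the paper's improved variant: the toy fitness function stands in for classifier accuracy on WebKB, and all parameters are illustrative.

```python
# Minimal binary PSO sketch for feature selection: each particle is a
# bit mask over features; velocities pass through a sigmoid to give
# bit-set probabilities (standard binary PSO). Toy fitness only.
import math, random

def binary_pso(n_features, fitness, swarm=8, iters=30, seed=0):
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(swarm)]
    vel = [[0.0] * n_features for _ in range(swarm)]
    pbest = [p[:] for p in pos]               # personal bests
    gbest = max(pos, key=fitness)[:]          # global best
    for _ in range(iters):
        for i in range(swarm):
            for d in range(n_features):
                vel[i][d] += (2 * rng.random() * (pbest[i][d] - pos[i][d])
                              + 2 * rng.random() * (gbest[d] - pos[i][d]))
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))  # sigmoid
                pos[i][d] = 1 if rng.random() < prob else 0
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
            if fitness(pos[i]) > fitness(gbest):
                gbest = pos[i][:]
    return gbest

# Toy fitness: reward matching a hidden "useful feature" mask while
# penalizing mask size (fewer features preferred).
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]
def fitness(mask):
    hits = sum(m == t for m, t in zip(mask, TARGET))
    return hits - 0.1 * sum(mask)

best = binary_pso(len(TARGET), fitness)
print(best, fitness(best))
```

In the real setting, the fitness would train and evaluate the (parallel) neural network on the subset of WebKB features selected by the mask.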
Abstract: Image search engines rely on surrounding textual
keywords for the retrieval of images. It is tedious work for
search engines like Google and Bing to interpret the user’s search
intention and to provide the desired results. Recent research also
indicates that Google’s image search does not work well on all
images. Consequently, this leads to the emergence of efficient
image retrieval techniques that interpret the user’s search
intention and show the desired results. In order to accomplish
this task, an efficient image re-ranking framework is required.
Accordingly, to provide the best image retrieval, a new image
re-ranking framework is experimented with in this paper. The
implemented image re-ranking framework provides the best image
retrieval from the image dataset by re-ranking the retrieved
images based on the user’s desired images. It operates in two
sections: an offline section and an online section. In the offline
section, the re-ranking framework learns a different semantic
space (reference class) for each user query keyword. Semantic
signatures are generated by combining the textual and visual
features of the images. In the online section, images are
re-ranked by comparing the semantic signatures obtained from the
reference classes with the user-specified image query keywords.
This re-ranking methodology increases image retrieval efficiency
and yields results that are effective for the user.
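The online stage can be sketched as cosine-similarity re-ranking between semantic signatures. This is a hedged illustration: the toy three-dimensional signatures and the query-side signature are ours, standing in for the learned combination of textual and visual features.

```python
# Sketch of the online stage: re-rank retrieved images by cosine
# similarity between each image's semantic signature and the signature
# associated with the query keyword's reference class. Toy vectors.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rerank(images, query_signature):
    """images: list of (image_id, signature). Most similar first."""
    return sorted(images,
                  key=lambda im: cosine(im[1], query_signature),
                  reverse=True)

images = [("img_a", [0.9, 0.1, 0.0]),
          ("img_b", [0.1, 0.9, 0.2]),
          ("img_c", [0.8, 0.2, 0.1])]
query_sig = [1.0, 0.0, 0.0]
print([iid for iid, _ in rerank(images, query_sig)])
# ['img_a', 'img_c', 'img_b']
```

Because signatures are short vectors rather than raw features, this comparison is cheap enough to run at query time, which is the point of splitting the framework into offline and online sections.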
Abstract: Search engines play an important role on the Internet,
retrieving the relevant documents from among the huge number of
web pages. However, a search engine retrieves a large number of
documents, not all of which are relevant to the search topic. To
retrieve the most meaningful documents related to a search topic,
ranking algorithms are used in information retrieval; ranking the
retrieved documents is one of the practical problems in both data
mining and information retrieval. This paper surveys various page
ranking algorithms and page segmentation algorithms and compares
those algorithms used for information retrieval. Diverse
PageRank-based algorithms such as PageRank (PR), Weighted Page
Rank (WPR), Weighted Page Content Rank (WPCR), Hyperlink-Induced
Topic Search (HITS), Distance Rank, EigenRumor, Time Rank, Tag
Rank, Relational Based Page Rank and Query Dependent Ranking
algorithms are discussed and compared.
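The basic PageRank algorithm underlying most of these variants can be sketched as a short power iteration. The three-page example graph and the damping factor of 0.85 are the standard textbook choices, not taken from this survey.

```python
# Power-iteration sketch of basic PageRank (damping factor 0.85):
# each page's rank is the chance a random surfer lands on it.

def pagerank(links, damping=0.85, iters=50):
    """links: dict node -> list of outgoing-link targets."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u, outs in links.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:  # dangling node: spread its rank to everyone
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# A points to B and C; B points to A and C; C points back to A.
web = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['A', 'C', 'B']
```

A ranks highest because it receives links from both other pages, including all of C's rank; variants such as WPR and WPCR modify how this rank share is distributed along links.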
Abstract: Nowadays, the Web has become one of the most
pervasive platforms for information exchange and retrieval,
allowing one to collect the required information from websites.
Web mining is the application of data mining techniques to data
available on the Internet, and it relates to various research
communities such as information retrieval, database management
systems and artificial intelligence. In this paper we discuss the
concepts of web mining, focusing mainly on one of its categories,
namely web content mining, and its various tasks. Mining tools are
essential for scanning the many images, text and HTML documents on
the web, and their results are used by various search engines. We
conclude by presenting a comparative table of these tools based on
some pertinent criteria.
Abstract: Due to the large amount of information in the World
Wide Web (WWW, web) and the lengthy and usually linearly
ordered result lists of web search engines that do not indicate
semantic relationships between their entries, the search for topically
similar and related documents can become a tedious task. Especially,
the process of formulating queries with proper terms representing
specific information needs requires much effort from the user. This
problem becomes even bigger when the user's knowledge of a subject
and its technical terms is not sufficient to do so. This article
presents the new and interactive search application DocAnalyser,
which addresses this problem by enabling users to find similar and
related web documents based on automatic query formulation and
state-of-the-art search word extraction. Additionally, this tool
can be used to track topics across semantically connected web
documents.
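Automatic query formulation can be illustrated with a minimal frequency-based extractor. This is a deliberate simplification: DocAnalyser's actual search word extraction is more sophisticated, and the stop-word list and sample text below are ours.

```python
# Minimal sketch of automatic query formulation: pick the most
# characteristic words of the current document by frequency after
# stop-word filtering, and join them into a search query.
from collections import Counter

STOP = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def formulate_query(text, n_terms=4):
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOP]
    return " ".join(w for w, _ in Counter(words).most_common(n_terms))

doc = ("Soil erosion in Greece threatens agricultural sustainability. "
       "Erosion studies in Greece examine soil management and erosion risk.")
print(formulate_query(doc))
```

The resulting query is what gets sent to the search engine on the user's behalf, sparing them from formulating terms for a subject they may not know well.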
Abstract: The world wide web is a network with a complex
topology, the main properties of which are a power-law degree
distribution, a low clustering coefficient and a small average
distance. Modeling the web as a graph makes it possible to locate
information quickly and consequently helps in the construction of
search engines. Here, we present a model based on existing
probabilistic graphs that has all of the aforesaid
characteristics. This work consists in studying the web in order
to understand its structure, which enables us to model it more
easily, and in proposing a possible algorithm for its exploration.
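One standard probabilistic model producing the power-law degree distribution mentioned above is preferential attachment (Barabási–Albert-style growth). The sketch below is a generic illustration of that family of models, not necessarily the exact model proposed in this work.

```python
# Preferential-attachment sketch: grow a graph where each new node
# links to existing nodes with probability proportional to their
# degree ("rich get richer"), yielding a power-law degree distribution.
import random

def preferential_attachment(n, m=2, seed=42):
    """Grow a graph to n nodes; each new node links to m existing
    nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]   # seed graph: one edge
    stubs = [0, 1]     # each node repeated once per incident edge
    for new in range(2, n):
        targets = set()
        while len(targets) < min(m, new):
            targets.add(rng.choice(stubs))  # degree-proportional pick
        for t in targets:
            edges.append((new, t))
            stubs.extend([new, t])
    return edges

edges = preferential_attachment(100)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
# early nodes accumulate high degree, the hallmark of power-law graphs
print(max(degree, key=degree.get), max(degree.values()))
```

Sampling from the `stubs` list implements the degree-proportional choice without computing probabilities explicitly, since a node appears in the list once per edge it touches.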
Abstract: One of the major goals of Spoken Dialog Systems
(SDS) is to understand what the user utters.
In the SDS domain, the Spoken Language Understanding (SLU)
module classifies user utterances by means of predefined
conceptual knowledge. The SLU module can recognize only the
meanings previously included in its knowledge base, and due to the
vastness of that knowledge, storing the information is a very
expensive process.
Updating and managing the knowledge base are time-consuming
and error-prone processes because of the rapidly growing number of
entities such as proper nouns and domain-specific nouns. This
paper proposes a solution to the problem of Named Entity
Recognition (NER) applied to the SDS domain. The proposed solution
attempts to automatically recognize the meaning associated with an
utterance by using the PANKOW (Pattern-based Annotation through
Knowledge On the Web) method at runtime.
The proposed method extracts information from the Web to extend
the SLU module's knowledge and reduce the development effort. In
particular, the Google Search Engine is used to extract
information from the Facebook social network.
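The core PANKOW idea can be sketched as follows: instantiate Hearst-style patterns with a candidate entity and each candidate concept, and score concepts by the number of web search hits for each phrase. The patterns shown are representative of PANKOW's style but not its exact set, and the hit counts here are mocked rather than fetched from a real search engine.

```python
# PANKOW-style sketch: score candidate concepts for an entity by
# (mocked) web hit counts of pattern instantiations.

PATTERNS = ["{plural} such as {entity}",
            "{entity} is a {concept}",
            "{plural} like {entity}"]

def classify_entity(entity, concepts, hit_count):
    """concepts: dict mapping singular form to plural form.
    Returns the concept whose instantiated patterns get the most hits;
    `hit_count` stands in for a search-engine hit-count API."""
    scores = {}
    for concept, plural in concepts.items():
        scores[concept] = sum(
            hit_count(p.format(concept=concept, plural=plural, entity=entity))
            for p in PATTERNS)
    return max(scores, key=scores.get)

# Mocked hit counts, for illustration only.
FAKE_HITS = {"cities such as Paris": 120000,
             "Paris is a city": 95000,
             "cities like Paris": 80000,
             "rivers such as Paris": 20}
def fake_hit_count(phrase):
    return FAKE_HITS.get(phrase, 0)

print(classify_entity("Paris", {"city": "cities", "river": "rivers"},
                      fake_hit_count))  # city
```

Run at runtime against a real search engine, this lets the SLU module attach a meaning to an entity it has never seen, without anyone hand-editing the knowledge base.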