Information Retrieval in Domain Specific Search Engine with Machine Learning Approaches

As the web continues to grow exponentially, crawling the entire web on a regular basis becomes less and less feasible, so domain-specific search engines were proposed to cover the information in a specific domain. As more information becomes available on the World Wide Web, it becomes more difficult to provide effective search tools for information access. Today, people access web information through two main kinds of search interfaces: browsers (clicking and following hyperlinks) and query engines (queries in the form of a set of keywords describing the topic of interest) [2]. Web search tools need better support for expressing one's information need and for returning high-quality search results. There appears to be a need for systems that reason under uncertainty and are flexible enough to recover from the contradictions, inconsistencies, and irregularities that such reasoning involves. In a multi-view problem, the features of the domain can be partitioned into disjoint subsets (views) that are sufficient to learn the target concept. Semi-supervised, multi-view algorithms, which reduce the amount of labeled data required for learning, rely on the assumptions that the views are compatible and uncorrelated. This paper describes the use of a semi-structured machine learning approach with active learning for domain-specific search engines. A domain-specific search engine is "an information access system that allows access to all the information on the web that is relevant to a particular domain". The proposed work shows that, with this approach, relevant data can be extracted with a minimum number of queries issued by the user. It requires a small amount of labeled data and a pool of unlabeled data, to which the learning algorithm is applied to extract the required data.

DCBOR: A Density Clustering Based on Outlier Removal

Data clustering is an important data exploration technique with many applications in data mining. We present an enhanced version of the well-known single-link clustering algorithm, which we refer to as DCBOR. The proposed algorithm alleviates the chain effect by removing the outliers from the given dataset, so it provides outlier detection and data clustering simultaneously. The algorithm does not need to update the distance matrix, since it merges the k nearest objects in one step and lets the cluster continue to grow as long as possible under a specified condition. The algorithm consists of two phases: in the first phase, it removes the outliers from the input dataset; in the second phase, it performs the clustering process. The algorithm discovers clusters of different shapes, sizes and densities and requires only one input parameter, which represents a threshold for outlier points. The value of this parameter ranges from 0 to 1, and the algorithm supports the user in determining an appropriate value for it. We have tested the algorithm on different datasets that contain outliers and clusters connected by chains of dense points, and the algorithm discovers the correct clusters. The results of our experiments demonstrate the effectiveness and the efficiency of DCBOR.
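The two-phase idea can be illustrated with a minimal Python sketch. This is not the DCBOR algorithm itself: the outlier score (mean distance to the k nearest neighbours, rescaled to the range 0 to 1), the neighbour count k and the growth condition link_radius are all assumptions made for illustration, since the abstract does not define them.

```python
import math

def knn(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of points[i]."""
    pairs = sorted((math.dist(points[i], q), j)
                   for j, q in enumerate(points) if j != i)
    return pairs[:k]

def split_outliers(points, threshold, k=2):
    """Phase 1 (assumed scoring): drop points whose rescaled k-NN distance exceeds threshold."""
    scores = [sum(d for d, _ in knn(points, i, k)) / k for i in range(len(points))]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    kept = [p for p, s in zip(points, scores) if (s - lo) / span <= threshold]
    outliers = [p for p, s in zip(points, scores) if (s - lo) / span > threshold]
    return kept, outliers

def grow_clusters(points, k=2, link_radius=1.5):
    """Phase 2 (assumed condition): grow each cluster by absorbing every member's
    k nearest neighbours that lie within link_radius."""
    labels = [None] * len(points)
    current = 0
    for seed in range(len(points)):
        if labels[seed] is not None:
            continue
        labels[seed] = current
        frontier = [seed]
        while frontier:
            i = frontier.pop()
            for d, j in knn(points, i, k):
                if d <= link_radius and labels[j] is None:
                    labels[j] = current
                    frontier.append(j)
        current += 1
    return labels

# Two dense groups plus one sparse point lying between them.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5.5, 0.5)]
kept, outliers = split_outliers(points, threshold=0.5)
labels = grow_clusters(kept)
```

In this toy dataset the sparse point between the two groups is removed in the first phase, so the second phase cannot chain the groups together through it.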

Electronic Government in the GCC Countries

The study investigated the practices of organisations in Gulf Cooperation Council (GCC) countries with regard to G2C e-government maturity. It reveals that G2C e-government initiatives in the surveyed countries in particular, and arguably around the world in general, are progressing slowly because of the lack of a trusted and secure medium to authenticate the identities of online users. The authors conclude that national ID schemes will play a major role in helping governments reap the benefits of e-government if the three advanced technologies of smart cards, biometrics and public key infrastructure (PKI) are utilised to provide a reliable and trusted authentication medium for e-government services.

Intelligent Agent Approach to the Control of Critical Infrastructure Networks

In this paper we propose an intelligent agent approach to control the electric power grid at a smaller granularity in order to give it self-healing capabilities. We develop a method using the influence model to transform transmission substations into information-processing, analyzing and decision-making (intelligent behavior) units. We also develop a wireless communication method to deliver real-time uncorrupted information to an intelligent controller in a power system environment. A combined networking and information-theoretic approach is adopted to meet both the delay and error probability requirements. We use a mobile agent approach to optimize the achievable information rate vector and to distribute rates to users (sensors). We develop the concept and the quantitative tools required for the creation of cooperating semiautonomous subsystems, which put the electric grid on the path towards an intelligent and self-healing system.

Indonesian News Classification using Support Vector Machine

Digital news on a variety of topics is abundant on the internet. The problem is to classify news into its appropriate category so that users can find relevant news rapidly. A classifier engine is used to assign each news item automatically to its respective category. This research employs Support Vector Machine (SVM) to classify Indonesian news. SVM is a robust method for binary classification. The core of SVM is the construction of an optimal separating hyperplane between the classes. For multiclass problems, a mechanism called one-against-one is used to combine the binary classification results. Documents were taken from the Indonesian digital news site www.kompas.com. The experiment showed a promising result, with an accuracy rate of 85%. The system is feasible for Indonesian news classification.
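The one-against-one mechanism can be sketched in Python as follows. This is an illustrative sketch only: the binary classifier below is a nearest-centroid stand-in rather than a trained SVM, and the category names and term-count vectors are hypothetical. What it shows is the combination step: one binary classifier per pair of classes, with the final label chosen by majority vote.

```python
from collections import Counter
from itertools import combinations

def fit_centroid_rule(xs_a, label_a, xs_b, label_b):
    """Stand-in for a binary SVM: assign x to the class with the nearer centroid."""
    centroid_a = [sum(col) / len(xs_a) for col in zip(*xs_a)]
    centroid_b = [sum(col) / len(xs_b) for col in zip(*xs_b)]

    def classify(x):
        dist_a = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid_a))
        dist_b = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid_b))
        return label_a if dist_a <= dist_b else label_b

    return classify

def train_one_against_one(train, fit_binary=fit_centroid_rule):
    """train maps label -> list of feature vectors; one classifier per class pair."""
    return {(a, b): fit_binary(train[a], a, train[b], b)
            for a, b in combinations(sorted(train), 2)}

def predict(models, x):
    """Each pairwise classifier casts one vote; the most voted class wins."""
    votes = Counter(clf(x) for clf in models.values())
    return votes.most_common(1)[0][0]

# Hypothetical term-count vectors for three hypothetical categories.
train = {
    "economy": [[3, 0, 0], [2, 1, 0]],
    "politics": [[0, 3, 0], [0, 2, 1]],
    "sports": [[0, 0, 3], [1, 0, 2]],
}
models = train_one_against_one(train)
label = predict(models, [0, 1, 4])
```

With c classes this scheme trains c(c-1)/2 binary classifiers, three in this toy example.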

An Evaluation of the Usability of IT Faculty Educational Portal at University of Benghazi

Evaluation of educational portals is an important subject area that needs more attention from researchers. An educational portal that teachers, students or management staff find difficult to use and interact with can harm the standing and reputation of the university. Therefore, it is important to be able to evaluate the quality of the e-services the university provides in order to improve them over time. The present study evaluates the usability of the Information Technology Faculty portal at the University of Benghazi. Two evaluation methods were used: a questionnaire-based method and an online automated tool-based method. The first method was used to measure the portal's external attributes of usability (Information, Content and Organization of the portal, Navigation, Links and Accessibility, Aesthetic and Visual Appeal, Performance and Effectiveness and educational purpose) from the users' perspective, while the second method was used to measure the portal's internal attributes of usability (number and size of HTML files, number and size of images, load time, HTML check errors, browser compatibility problems, number of bad and broken links), which cannot be perceived by the users. The study showed that some usability aspects were at an acceptable level of performance and quality, while others were not. Overall, it was concluded that the usability of the IT faculty educational portal is generally acceptable. Recommendations and suggestions for addressing the weaknesses and improving the usability of the portal are presented in this study.

Learning User Keystroke Patterns for Authentication

Keystroke authentication is a new access control system that identifies legitimate users via their typing behavior. In this paper, machine learning techniques are adapted for keystroke authentication. Seven learning methods are used to build models that differentiate user keystroke patterns. The selected classification methods are Decision Tree, Naive Bayesian, Instance Based Learning, Decision Table, One Rule, Random Tree and K-star. Three of these methods are studied in more detail. The results show that machine learning is a feasible alternative for keystroke authentication. Compared to the conventional Nearest Neighbour method used in recent research, learning methods, especially Decision Tree, can be more accurate. In addition, the experimental results reveal that 3-grams are more accurate than 2-grams and 4-grams for feature extraction. Also, combinations of attributes tend to yield higher accuracy.
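To make the n-gram features concrete, here is a minimal Python sketch of one plausible extraction step. The abstract does not define the features, so the choice made here (the elapsed time between the first and last key press of each window of n consecutive keys) and the sample timings are assumptions.

```python
def ngram_latencies(events, n):
    """events is a list of (key, press_time_ms) in typing order.

    Returns one (keys, elapsed_ms) pair per sliding window of n keys, where
    elapsed_ms is the time from the first press to the last press in the window.
    """
    features = []
    for i in range(len(events) - n + 1):
        window = events[i:i + n]
        keys = "".join(key for key, _ in window)
        features.append((keys, window[-1][1] - window[0][1]))
    return features

# Hypothetical press times (milliseconds) for typing "pass".
events = [("p", 0), ("a", 120), ("s", 250), ("s", 390)]
two_grams = ngram_latencies(events, 2)    # [("pa", 120), ("as", 130), ("ss", 140)]
three_grams = ngram_latencies(events, 3)  # [("pas", 250), ("ass", 270)]
```

Feature vectors built this way could then be passed to any of the classifiers listed above.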

Revised PLWAP Tree with Non-frequent Items for Mining Sequential Pattern

Sequential pattern mining is a challenging task in data mining with a large number of applications. One of those applications is mining patterns from weblogs. Nowadays weblogs are highly dynamic, and some of their contents may become obsolete over time. In addition, users may frequently change the threshold value during the data mining process until they obtain the required output or find interesting rules. Some of the recently proposed algorithms for mining weblogs build the tree with two scans and consume considerable time and space. In this paper, we build the Revised PLWAP with Non-frequent Items tree (RePLNI-tree) with a single scan for all items. While mining sequential patterns, the links related to the non-frequent items are not considered. Hence, it is not necessary to delete or maintain node information while revising the tree for mining updated transactions. The algorithm supports both incremental and interactive mining. It is not necessary to re-compute the patterns each time the weblog is updated or the minimum support is changed. The performance of the proposed tree remains better even when the size of the incremental database is more than 50% of the existing one. For evaluation, we used a benchmark weblog dataset and found that the performance of the proposed tree is encouraging compared to some of the recently proposed approaches.

Cryptanalysis of Two-Factor Authenticated Key Exchange Protocol in Public Wireless LANs

In Public Wireless LANs (PWLANs), user anonymity is an essential issue. Recently, Juang et al. proposed an anonymous authentication and key exchange protocol using smart cards in PWLANs. They claimed that their scheme provides identity privacy, mutual authentication, and half-forward secrecy. In this paper, we point out that Juang et al.'s protocol is vulnerable to the stolen-verifier attack and does not satisfy user anonymity.

Finding an Optimized Discriminate Function for Internet Application Recognition

Internet usage increases every day, and a world of data becomes readily accessible. Network providers do not want the services they provide to be used for harmful or terrorist purposes, so they use a variety of methods to protect particular regions from harmful data. One of the most important of these methods is the firewall. A firewall stops the transfer of such packets in several ways, but in some cases firewalls are not used because of their blind packet blocking, high processing power requirements and high cost. Here we propose a method to find a discriminate function that distinguishes ordinary packets from harmful ones by statistical processing of network router logs, so that an administrator can alert the user. The method is very fast and can easily be used alongside Internet routers.

Comparative Study of Virtual Sickness between a Single-screen and Three-screen from Parallax Affect

A virtual environment induces simulator sickness in some users. The purpose of this research is to compare simulation sickness related to the parallax effect in one-screen and three-screen HoloStage™ systems, as measured by the Simulation Sickness Questionnaire (SSQ). The results show that subjects tested with three screens experienced less sickness than those tested with one screen, and that the Oculomotor (O) effect was greater than the Disorientation (D) effect, which in turn was greater than the Nausea (N) effect, i.e. O>D>N.

A NXM Version of 5X5 Playfair Cipher for any Natural Language (Urdu as Special Case)

In this paper, a modified NXM version of the traditional 5X5 Playfair cipher is introduced, which enables the user to encrypt a message in any natural language by choosing a matrix size appropriate to the size of that language's alphabet. The 5X5 matrix can store only the 26 characters of the English language and cannot store the characters of any language that has more than 26 characters. The NXM matrix is introduced to overcome this limitation. The special case of the Urdu language is discussed, where # is used for completing an odd pair and * is used for repeating letters.
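A minimal Python sketch of a Playfair cipher over a rectangular matrix is given below. It is an illustration under stated assumptions rather than the paper's construction: the matrix is filled with the key followed by the remaining alphabet, the standard Playfair row, column and rectangle rules are used, '*' is assumed to be inserted between a repeated letter pair, and '#' is assumed to pad an odd final letter. The Latin alphabet with a 4X7 matrix is used as a stand-in; any alphabet (for example Urdu) with exactly rows*cols distinct symbols, including '#' and '*', could be passed instead.

```python
def build_matrix(key, alphabet, rows, cols):
    """Fill a rows x cols matrix with the key first, then the rest of the alphabet."""
    if len(set(alphabet)) != len(alphabet) or len(alphabet) != rows * cols:
        raise ValueError("alphabet must contain exactly rows*cols distinct symbols")
    ordered = list(dict.fromkeys(key + alphabet))  # keeps first occurrences only
    if len(ordered) != rows * cols:
        raise ValueError("key contains symbols outside the alphabet")
    return [ordered[r * cols:(r + 1) * cols] for r in range(rows)]

def to_pairs(text, pad="#", repeat="*"):
    """Split text into pairs; '*' separates repeated letters, '#' completes an odd pair."""
    pairs, i = [], 0
    while i < len(text):
        if i + 1 == len(text):
            pairs.append((text[i], pad))
            i += 1
        elif text[i + 1] == text[i]:
            pairs.append((text[i], repeat))
            i += 1
        else:
            pairs.append((text[i], text[i + 1]))
            i += 2
    return pairs

def transform(pairs, matrix, step):
    """Apply the Playfair rules; step=+1 encrypts, step=-1 decrypts."""
    rows, cols = len(matrix), len(matrix[0])
    pos = {ch: (r, c) for r, row in enumerate(matrix) for c, ch in enumerate(row)}
    out = []
    for a, b in pairs:
        (ra, ca), (rb, cb) = pos[a], pos[b]
        if ra == rb:      # same row: shift along the row
            out += [matrix[ra][(ca + step) % cols], matrix[rb][(cb + step) % cols]]
        elif ca == cb:    # same column: shift along the column
            out += [matrix[(ra + step) % rows][ca], matrix[(rb + step) % rows][cb]]
        else:             # rectangle: swap the column indices
            out += [matrix[ra][cb], matrix[rb][ca]]
    return "".join(out)

def encrypt(plaintext, matrix):
    return transform(to_pairs(plaintext), matrix, +1)

def decrypt(ciphertext, matrix):
    pairs = [(ciphertext[i], ciphertext[i + 1]) for i in range(0, len(ciphertext), 2)]
    return transform(pairs, matrix, -1)

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ#*"   # 28 symbols fill a 4X7 matrix
matrix = build_matrix("KEY", ALPHABET, rows=4, cols=7)
recovered = decrypt(encrypt("HELLO", matrix), matrix)  # "HEL*LO"
```

Decryption returns the padded text, so the receiver still has to strip the '*' and '#' fillers.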

Scale Time Offset Robust Modulation (STORM) in a Code Division Multiaccess Environment

Scale Time Offset Robust Modulation (STORM) [1]–[3] is a high-bandwidth waveform design that adds time-scale to embedded reference modulations using only time-delay [4]. In an environment where each user has a specific delay and scale, identification of the user with the highest signal power and of that user's phase is facilitated by the STORM processor. Both of these parameters are required in an efficient multiuser detection algorithm. In this paper, the STORM modulation approach is evaluated with a direct sequence spread quadrature phase shift keying (DS-QPSK) system. A misconception about STORM time-scale modulation is that a fine temporal resolution is required at the receiver. STORM will be applied to a QPSK code division multiaccess (CDMA) system by modifying the spreading codes. Specifically, the in-phase code will use a typical spreading code, and the quadrature code will use a time-delayed and time-scaled version of the in-phase code. Consequently, the same temporal resolution is required in the receiver before and after the application of STORM. In this paper, the bit error performance of STORM in a synchronous CDMA system is evaluated and compared to theory, and the bit error performance of STORM incorporated in a single-user WCDMA downlink is presented to demonstrate the applicability of STORM in a modern communication system.

An Algorithm for Secure Visible Logo Embedding and Removing in Compression Domain

Digital watermarking is the process of embedding information into a digital signal, and it can be used in DRM (digital rights management) systems. A visible watermark (often called a logo), as often seen in TV programs, indicates the owner of the copyright and protects the copyright in an active way. However, most schemes do not consider the process of removing the visible watermark. To solve this problem, a visible watermarking scheme with both embedding and removal processes is proposed, under the control of a secure template. The template generates different versions of the watermark for different users that nevertheless look the same. Users with the right key can completely remove the watermark and recover the original image, while unauthorized users are prevented from removing it. Experimental results show that our watermarking algorithm achieves good visual quality and that the watermark is hard for illegal users to remove. Additionally, authorized users can completely remove the visible watermark and recover the original image with good quality.

A Watermarking Scheme for MP3 Audio Files

In this work, we present what is, to the best of our knowledge, the first efficient digital watermarking scheme for MPEG Audio Layer 3 files that operates directly in the compressed data domain while manipulating the time and subband/channel domain. In addition, it does not need the original signal to detect the watermark. Our scheme was implemented with special care for the efficient use of the two limited resources of computer systems: time and space. It offers the industrial user watermark embedding and detection in a time directly comparable to the playing time of the original audio file, which depends on the MPEG compression, while the end user/audience does not experience any artifacts or delays when listening to the watermarked audio file. Furthermore, it overcomes the vulnerability of algorithms operating in the PCM data domain to compression/recompression attacks, as it places the watermark in the scale factors domain and not in the digitized audio data. The strength of our scheme, which allows it to be used successfully for both authentication and copyright protection, relies on the fact that ownership of the audio file is established not simply by detecting the bit pattern that comprises the watermark itself, but by showing that the legal owner knows a hard-to-compute property of the watermark.

Distillation Monitoring and Control using LabVIEW and SIMULINK Tools

LabVIEW and SIMULINK are two of the most widely used graphical programming environments for designing digital signal processing and control systems. Unlike conventional text-based programming languages such as C, C++ and MATLAB, graphical programming involves block-based code development, allowing a more efficient mechanism to build and analyze control systems. In this paper, a LabVIEW environment has been employed as a graphical user interface for monitoring the operation of a controlled distillation column by visualizing both the closed-loop performance and the user-selected control conditions, while the column dynamics have been modeled in the SIMULINK environment. This tool has been applied to the PID-based decoupled control of a binary distillation column. By means of such integrated environments, the control designer is able to monitor and control the plant behavior and optimize the response when both the quality improvement of distillation products and the operation efficiency tasks are considered.

A Step-wise Zoom Technique for Exploring Image-based Virtual Reality Applications

Existing image-based virtual reality applications allow users to view image-based 3D virtual environments in a more interactive manner. Users can "walk through", look left, right, up and down, and even zoom into objects in these virtual worlds of images. However, what the user sees during a "zoom in" is just a close-up view of the same image, which was taken from a distance. Thus, it does not give the user an accurate view of the object from the actual distance. In this paper, a simple technique for zooming in on an object in a virtual scene is presented. The technique is based on the 'hotspot' concept in existing applications. Instead of navigating between two different locations, the hotspots are used to focus on an object in the scene. For each object, several hotspots are created. A different picture is taken for each hotspot. Each consecutive hotspot takes the user closer to the object. This provides the user with a correct view of the object based on his proximity to it. Implementation issues and the relevance of this technique in potential application areas are highlighted.

GIS-based Approach for Land-Use Analysis: A Case Study

Geographical Information Systems are an integral part of planning in modern technical systems. They are nowadays referred to as Spatial Decision Support Systems, as they combine database management systems and models within a single user interface, and they are important tools in spatial design for evaluating policies and programs at all levels of administration. This work describes the creation of a Geographical Information System in the context of broader research on the area of influence of a station under construction on the new metro in the Greek city of Thessaloniki. The research included statistical and multivariate data analysis and diagrammatic representation, mapping and interpretation of the results.

An Optimal Algorithm for HTML Page Building Process

Demand for web services is growing as the number of Web users increases. A Web service is delivered by a Web application. The size of a Web application is affected by its users' requirements and interests, and differences in requirements and interests lead to growth in Web application size. An efficient way to save storage space for more data and information is to implement algorithms that compress the contents of Web application documents. This paper introduces an algorithm that reduces Web application size by reducing the contents of HTML files. It removes unimportant content regardless of the HTML file size. The removal does not discard any character that is required in the HTML building process.
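As an illustration of this kind of content reduction, the following Python sketch removes material that is commonly unnecessary for building a page. It is not the paper's algorithm: the abstract does not say what counts as unimportant content, so treating HTML comments, whitespace between tags and runs of spaces as removable is an assumption. The sketch is also deliberately naive, since it would alter whitespace-sensitive content such as pre elements.

```python
import re

def shrink_html(html):
    """Remove comments and redundant whitespace from an HTML string (naive sketch)."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)  # drop HTML comments
    html = re.sub(r">\s+<", "><", html)                      # drop whitespace between tags
    html = re.sub(r"[ \t]{2,}", " ", html)                   # collapse runs of spaces/tabs
    return html.strip()

page = "<html>\n  <!-- note -->\n  <body>\n    <p>Hello   world</p>\n  </body>\n</html>\n"
small = shrink_html(page)
saved = len(page) - len(small)  # number of characters removed
```

A production version would need an HTML parser to leave script, style and pre content untouched.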

Bridge Analysis Structure under Human Induced Dynamic Load

The paper deals with the analysis of the dynamic response of footbridges under human-induced dynamic loads. This is a frequently occurring and often dominant load for footbridges, as it stems from the very purpose of a footbridge: to convey pedestrians. Due to the emergence of new materials and advanced engineering technology, slender footbridges are becoming increasingly popular to satisfy modern transportation needs and the aesthetic requirements of society. These structures, however, are always lively, with low stiffness, low mass, low damping and low natural frequencies. As a consequence, they are prone to vibration induced by human activities and can suffer severe vibration serviceability problems, particularly in the lateral direction. Pedestrian bridges are designed according to the first and second limit states, which are the criteria for the response to the static design load. However, it is also necessary to assess the dynamic response of the bridge to pedestrian loading and its impact on the comfort of users. The load is usually taken to be a single person or a small group that can be assumed to move in perfect synchronization. Even one person or a small group can excite significant vibration of the deck. In order to calculate the dynamic response to the movement of people, the designer needs an available and suitable computational model and criteria. The calculations were carried out with ANSYS, a program based on the finite element method.