Auto Classification for Search Intelligence

This paper proposes an auto-classification algorithm of Web pages using Data mining techniques. We consider the problem of discovering association rules between terms in a set of Web pages belonging to a category in a search engine database, and present an auto-classification algorithm for solving this problem that are fundamentally based on Apriori algorithm. The proposed technique has two phases. The first phase is a training phase where human experts determines the categories of different Web pages, and the supervised Data mining algorithm will combine these categories with appropriate weighted index terms according to the highest supported rules among the most frequent words. The second phase is the categorization phase where a web crawler will crawl through the World Wide Web to build a database categorized according to the result of the data mining approach. This database contains URLs and their categories.

Induced Graphoidal Covers in a Graph

An induced graphoidal cover of a graph G is a collection ψ of (not necessarily open) paths in G such that every path in ψ has at least two vertices, every vertex of G is an internal vertex of at most one path in ψ, every edge of G is in exactly one path in ψ and every member of ψ is an induced cycle or an induced path. The minimum cardinality of an induced graphoidal cover of G is called the induced graphoidal covering number of G and is denoted by ηi(G) or ηi. Here we find induced graphoidal cover for some classes of graphs.

Coded Transmission in Synthetic Transmit Aperture Ultrasound Imaging Method

The paper presents the study of synthetic transmit aperture method applying the Golay coded transmission for medical ultrasound imaging. Longer coded excitation allows to increase the total energy of the transmitted signal without increasing the peak pressure. Signal-to-noise ratio and penetration depth are improved maintaining high ultrasound image resolution. In the work the 128-element linear transducer array with 0.3 mm inter-element spacing excited by one cycle and the 8 and 16-bit Golay coded sequences at nominal frequencies 4 MHz was used. Single element transmission aperture was used to generate a spherical wave covering the full image region and all the elements received the echo signals. The comparison of 2D ultrasound images of the wire phantom as well as of the tissue mimicking phantom is presented to demonstrate the benefits of the coded transmission. The results were obtained using the synthetic aperture algorithm with transmit and receive signals correction based on a single element directivity function.

A Study on Fuzzy Adaptive Control of Enteral Feeding Pump

Recent medical studies have investigated the importance of enteral feeding and the use of feeding pumps for recovering patients unable to feed themselves or gain nourishment and nutrients by natural means. The most of enteral feeding system uses a peristaltic tube pump. A peristaltic pump is a form of positive displacement pump in which a flexible tube is progressively squeezed externally to allow the resulting enclosed pillow of fluid to progress along it. The squeezing of the tube requires a precise and robust controller of the geared motor to overcome parametric uncertainty of the pumping system which generates due to a wide variation of friction and slip between tube and roller. So, this paper proposes fuzzy adaptive controller for the robust control of the peristaltic tube pump. This new adaptive controller uses a fuzzy multi-layered architecture which has several independent fuzzy controllers in parallel, each with different robust stability area. Out of several independent fuzzy controllers, the most suited one is selected by a system identifier which observes variations in the controlled system parameter. This paper proposes a design procedure which can be carried out mathematically and systematically from the model of a controlled system. Finally, the good control performance, accurate dose rate and robust system stability, of the developed feeding pump is confirmed through experimental and clinic testing.

Evolutionary Search Techniques to Solve Set Covering Problems

Set covering problem is a classical problem in computer science and complexity theory. It has many applications, such as airline crew scheduling problem, facilities location problem, vehicle routing, assignment problem, etc. In this paper, three different techniques are applied to solve set covering problem. Firstly, a mathematical model of set covering problem is introduced and solved by using optimization solver, LINGO. Secondly, the Genetic Algorithm Toolbox available in MATLAB is used to solve set covering problem. And lastly, an ant colony optimization method is programmed in MATLAB programming language. Results obtained from these methods are presented in tables. In order to assess the performance of the techniques used in this project, the benchmark problems available in open literature are used.

ReSeT : Reverse Engineering System Requirements Tool

Reverse Engineering is a very important process in Software Engineering. It can be performed backwards from system development life cycle (SDLC) in order to get back the source data or representations of a system through analysis of its structure, function and operation. We use reverse engineering to introduce an automatic tool to generate system requirements from its program source codes. The tool is able to accept the Cµ programming source codes, scan the source codes line by line and parse the codes to parser. Then, the engine of the tool will be able to generate system requirements for that specific program to facilitate reuse and enhancement of the program. The purpose of producing the tool is to help recovering the system requirements of any system when the system requirements document (SRD) does not exist due to undocumented support of the system.

Discovering Complex Regularities: from Tree to Semi-Lattice Classifications

Data mining uses a variety of techniques each of which is useful for some particular task. It is important to have a deep understanding of each technique and be able to perform sophisticated analysis. In this article we describe a tool built to simulate a variation of the Kohonen network to perform unsupervised clustering and support the entire data mining process up to results visualization. A graphical representation helps the user to find out a strategy to optimize classification by adding, moving or delete a neuron in order to change the number of classes. The tool is able to automatically suggest a strategy to optimize the number of classes optimization, but also support both tree classifications and semi-lattice organizations of the classes to give to the users the possibility of passing from one class to the ones with which it has some aspects in common. Examples of using tree and semi-lattice classifications are given to illustrate advantages and problems. The tool is applied to classify macroeconomic data that report the most developed countries- import and export. It is possible to classify the countries based on their economic behaviour and use the tool to characterize the commercial behaviour of a country in a selected class from the analysis of positive and negative features that contribute to classes formation. Possible interrelationships between the classes and their meaning are also discussed.

Approximate Frequent Pattern Discovery Over Data Stream

Frequent pattern discovery over data stream is a hard problem because a continuously generated nature of stream does not allow a revisit on each data element. Furthermore, pattern discovery process must be fast to produce timely results. Based on these requirements, we propose an approximate approach to tackle the problem of discovering frequent patterns over continuous stream. Our approximation algorithm is intended to be applied to process a stream prior to the pattern discovery process. The results of approximate frequent pattern discovery have been reported in the paper.

Adaptive Rfid Positioning System Using Signal Level Matrix

In this paper, we present a method named Signal Level Matrix (SLM) which can improve the accuracy and stability of active RFID indoor positioning system. Considering the accuracy and cost, we use uniform distribution mode to set up and separate the overlapped signal covering areas, in order to achieve preliminary location setting. Then, based on the proposed SLM concept and the characteristic of the signal strength value that attenuates as the distance increases, this system cross-examines the distribution of adjacent signals to locate the users more accurately. The experimental results indicate that the adaptive positioning method proposed in this paper could improve the accuracy and stability of the positioning system effectively and satisfyingly.

A Study of Efficiency and Prioritize of Eurasian Logistics Network

Recently, Northeast Asia has become one of the three largest trade areas, covering approximately 30% of the total trade volume of the world. However, the distribution facilities are saturated due to the increase in the transportation volume within the area and with the European countries. In order to accommodate the increase of the transportation volume, the transportation networking with the major countries in Northeast Asia and Europe is absolutely necessary. The Eurasian Logistics Network will develop into an international passenger transportation network covering the Northeast Asian region and an international freight transportation network connecting across Eurasia Continent. This paper surveys the changes and trend of the distribution network in the Eurasian Region according to the political, economic and environmental changes of the region, analyses the distribution network according to the changes in the transportation policies of the related countries, and provides the direction of the development of composite transportation on the basis of the present conditions of transportation means. The transportation means optimal for the efficiency of transportation system are suggested to be train ferries, sea & rail or sea & rail & sea. It is suggested to develop diversified composite transportation means and routes within the boundary of international cooperation system.

Novel Anti-leukemia Calanone Compounds by Quantitative Structure-Activity Relationship AM1 Semiempirical Method

Quantitative Structure-Activity Relationship (QSAR) approach for discovering novel more active Calanone derivative as anti-leukemia compound has been conducted. There are 6 experimental activities of Calanone compounds against leukemia cell L1210 that are used as material of the research. Calculation of theoretical predictors (independent variables) was performed by AM1 semiempirical method. The QSAR equation is determined by Principle Component Regression (PCR) analysis, with Log IC50 as dependent variable and the independent variables are atomic net charges, dipole moment (μ), and coefficient partition of noctanol/ water (Log P). Three novel Calanone derivatives that obtained by this research have higher activity against leukemia cell L1210 than pure Calanone.

Covering-based Rough sets Based on the Refinement of Covering-element

Covering-based rough sets is an extension of rough sets and it is based on a covering instead of a partition of the universe. Therefore it is more powerful in describing some practical problems than rough sets. However, by extending the rough sets, covering-based rough sets can increase the roughness of each model in recognizing objects. How to obtain better approximations from the models of a covering-based rough sets is an important issue. In this paper, two concepts, determinate elements and indeterminate elements in a universe, are proposed and given precise definitions respectively. This research makes a reasonable refinement of the covering-element from a new viewpoint. And the refinement may generate better approximations of covering-based rough sets models. To prove the theory above, it is applied to eight major coveringbased rough sets models which are adapted from other literature. The result is, in all these models, the lower approximation increases effectively. Correspondingly, in all models, the upper approximation decreases with exceptions of two models in some special situations. Therefore, the roughness of recognizing objects is reduced. This research provides a new approach to the study and application of covering-based rough sets.

Concepts Extraction from Discharge Notes using Association Rule Mining

A large amount of valuable information is available in plain text clinical reports. New techniques and technologies are applied to extract information from these reports. In this study, we developed a domain based software system to transform 600 Otorhinolaryngology discharge notes to a structured form for extracting clinical data from the discharge notes. In order to decrease the system process time discharge notes were transformed into a data table after preprocessing. Several word lists were constituted to identify common section in the discharge notes, including patient history, age, problems, and diagnosis etc. N-gram method was used for discovering terms co-Occurrences within each section. Using this method a dataset of concept candidates has been generated for the validation step, and then Predictive Apriori algorithm for Association Rule Mining (ARM) was applied to validate candidate concepts.

The Direct and Indirect Effects of the Achievement Motivation on Nurturing Intellectual Giftedness

Achievement motivation is believed to promote giftedness attracting people to invest in many programs to adopt gifted students providing them with challenging activities. Intellectual giftedness is founded on the fluid intelligence and extends to more specific abilities through the growth and inputs from the achievement motivation. Acknowledging the roles played by the motivation in the development of giftedness leads to an effective nurturing of gifted individuals. However, no study has investigated the direct and indirect effects of the achievement motivation and fluid intelligence on intellectual giftedness. Thus, this study investigated the contribution of motivation factors to giftedness development by conducting tests of fluid intelligence using Cattell Culture Fair Test (CCFT) and analytical abilities using culture reduced test items covering problem solving, pattern recognition, audio-logic, audio-matrices, and artificial language, and self report questionnaire for the motivational factors. A number of 180 highscoring students were selected using CCFT from a leading university in Malaysia. Structural equation modeling was employed using Amos V.16 to determine the direct and indirect effects of achievement motivation factors (self confidence, success, perseverance, competition, autonomy, responsibility, ambition, and locus of control) on the intellectual giftedness. The findings showed that the hypothesized model fitted the data, supporting the model postulates and showed significant and strong direct and indirect effects of the motivation and fluid intelligence on the intellectual giftedness.

CFD Simulations of Flow in Capillary Flow Liquid Acquisition Device Channel

Future space vehicles will require the use of non-toxic, cryogenic propellants, because of the performance advantages over the toxic hypergolic propellants and also because of the environmental and handling concerns. A prototypical capillary flow liquid acquisition device (LAD) for cryogenic propellants was fabricated with a mesh screen, covering a rectangular flow channel with a cylindrical outlet tube, and was tested with liquid oxygen (LOX). In order to better understand the performance in various gravity environments and orientations with different submersion depths of the LAD, a series of computational fluid dynamics (CFD) simulations of LOX flow through the LAD screen channel, including horizontally and vertically submersions of the LAD channel assembly at normal gravity environment was conducted. Gravity effects on the flow field in LAD channel are inspected and analyzed through comparing the simulations.

Reliable Capacitated Facility Location Problem Considering Maximal Covering

This paper provides a framework in order to incorporate reliability issue as a sign of disruption in distribution systems and partial covering theory as a response to limitation in coverage radios and economical preferences, simultaneously into the traditional literatures of capacitated facility location problems. As a result we develop a bi-objective model based on the discrete scenarios for expected cost minimization and demands coverage maximization through a three echelon supply chain network by facilitating multi-capacity levels for provider side layers and imposing gradual coverage function for distribution centers (DCs). Additionally, in spite of objectives aggregation for solving the model through LINGO software, a branch of LP-Metric method called Min- Max approach is proposed and different aspects of corresponds model will be explored.

Dempster-Shafer Evidence Theory for Image Segmentation: Application in Cells Images

In this paper we propose a new knowledge model using the Dempster-Shafer-s evidence theory for image segmentation and fusion. The proposed method is composed essentially of two steps. First, mass distributions in Dempster-Shafer theory are obtained from the membership degrees of each pixel covering the three image components (R, G and B). Each membership-s degree is determined by applying Fuzzy C-Means (FCM) clustering to the gray levels of the three images. Second, the fusion process consists in defining three discernment frames which are associated with the three images to be fused, and then combining them to form a new frame of discernment. The strategy used to define mass distributions in the combined framework is discussed in detail. The proposed fusion method is illustrated in the context of image segmentation. Experimental investigations and comparative studies with the other previous methods are carried out showing thus the robustness and superiority of the proposed method in terms of image segmentation.

Mining of Interesting Prediction Rules with Uniform Two-Level Genetic Algorithm

The main goal of data mining is to extract accurate, comprehensible and interesting knowledge from databases that may be considered as large search spaces. In this paper, a new, efficient type of Genetic Algorithm (GA) called uniform two-level GA is proposed as a search strategy to discover truly interesting, high-level prediction rules, a difficult problem and relatively little researched, rather than discovering classification knowledge as usual in the literatures. The proposed method uses the advantage of uniform population method and addresses the task of generalized rule induction that can be regarded as a generalization of the task of classification. Although the task of generalized rule induction requires a lot of computations, which is usually not satisfied with the normal algorithms, it was demonstrated that this method increased the performance of GAs and rapidly found interesting rules.

An Ant-based Clustering System for Knowledge Discovery in DNA Chip Analysis Data

Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changes. Until recently business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding cellular functions of the DNA sequences. DNA chips are now being used to perform experiments and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with a test data of 100 to1000 genes and 24 samples and show promising results for applying this algorithm to clustering DNA chip data.

A Novel Approach for Protein Classification Using Fourier Transform

Discovering new biological knowledge from the highthroughput biological data is a major challenge to bioinformatics today. To address this challenge, we developed a new approach for protein classification. Proteins that are evolutionarily- and thereby functionally- related are said to belong to the same classification. Identifying protein classification is of fundamental importance to document the diversity of the known protein universe. It also provides a means to determine the functional roles of newly discovered protein sequences. Our goal is to predict the functional classification of novel protein sequences based on a set of features extracted from each protein sequence. The proposed technique used datasets extracted from the Structural Classification of Proteins (SCOP) database. A set of spectral domain features based on Fast Fourier Transform (FFT) is used. The proposed classifier uses multilayer back propagation (MLBP) neural network for protein classification. The maximum classification accuracy is about 91% when applying the classifier to the full four levels of the SCOP database. However, it reaches a maximum of 96% when limiting the classification to the family level. The classification results reveal that spectral domain contains information that can be used for classification with high accuracy. In addition, the results emphasize that sequence similarity measures are of great importance especially at the family level.