Auto Classification for Search Intelligence

This paper proposes an auto-classification algorithm of Web pages using Data mining techniques. We consider the problem of discovering association rules between terms in a set of Web pages belonging to a category in a search engine database, and present an auto-classification algorithm for solving this problem that are fundamentally based on Apriori algorithm. The proposed technique has two phases. The first phase is a training phase where human experts determines the categories of different Web pages, and the supervised Data mining algorithm will combine these categories with appropriate weighted index terms according to the highest supported rules among the most frequent words. The second phase is the categorization phase where a web crawler will crawl through the World Wide Web to build a database categorized according to the result of the data mining approach. This database contains URLs and their categories.

Web Log Mining by an Improved AprioriAll Algorithm

This paper sets forth the possibility and importance about applying Data Mining in Web logs mining and shows some problems in the conventional searching engines. Then it offers an improved algorithm based on the original AprioriAll algorithm which has been used in Web logs mining widely. The new algorithm adds the property of the User ID during the every step of producing the candidate set and every step of scanning the database by which to decide whether an item in the candidate set should be put into the large set which will be used to produce next candidate set. At the meantime, in order to reduce the number of the database scanning, the new algorithm, by using the property of the Apriori algorithm, limits the size of the candidate set in time whenever it is produced. Test results show the improved algorithm has a more lower complexity of time and space, better restrain noise and fit the capacity of memory.

Decision Support System Based on Data Warehouse

Typical Intelligent Decision Support System is 4-based, its design composes of Data Warehouse, Online Analytical Processing, Data Mining and Decision Supporting based on models, which is called Decision Support System Based on Data Warehouse (DSSBDW). This way takes ETL,OLAP and DM as its implementing means, and integrates traditional model-driving DSS and data-driving DSS into a whole. For this kind of problem, this paper analyzes the DSSBDW architecture and DW model, and discusses the following key issues: ETL designing and Realization; metadata managing technology using XML; SQL implementing, optimizing performance, data mapping in OLAP; lastly, it illustrates the designing principle and method of DW in DSSBDW.

Flowability and Strength Development Characteristics of Bottom Ash Based Geopolymer

Despite of the preponderant role played by cement among the construction materials, it is today considered as a material destructing the environment due to the large quantities of carbon dioxide exhausted during its manufacture. Besides, global warming is now recognized worldwide as the new threat to the humankind against which advanced countries are investigating measures to reduce the current amount of exhausted gases to the half by 2050. Accordingly, efforts to reduce green gases are exerted in all industrial fields. Especially, the cement industry strives to reduce the consumption of cement through the development of alkali-activated geopolymer mortars using industrial byproducts like bottom ash. This study intends to gather basic data on the flowability and strength development characteristics of alkali-activated geopolymer mortar by examining its FT-IT features with respect to the effects and strength of the alkali-activator in order to develop bottom ash-based alkali-activated geopolymer mortar. The results show that the 35:65 mass ratio of sodium hydroxide to sodium silicate is appropriate and that a molarity of 9M for sodium hydroxide is advantageous. The ratio of the alkali-activators to bottom ash is seen to have poor effect on the strength. Moreover, the FT-IR analysis reveals that larger improvement of the strength shifts the peak from 1060 cm–1 (T-O, T=Si or Al) toward shorter wavenumber.

Improving Air Temperature Prediction with Artificial Neural Networks

The mitigation of crop loss due to damaging freezes requires accurate air temperature prediction models. Previous work established that the Ward-style artificial neural network (ANN) is a suitable tool for developing such models. The current research focused on developing ANN models with reduced average prediction error by increasing the number of distinct observations used in training, adding additional input terms that describe the date of an observation, increasing the duration of prior weather data included in each observation, and reexamining the number of hidden nodes used in the network. Models were created to predict air temperature at hourly intervals from one to 12 hours ahead. Each ANN model, consisting of a network architecture and set of associated parameters, was evaluated by instantiating and training 30 networks and calculating the mean absolute error (MAE) of the resulting networks for some set of input patterns. The inclusion of seasonal input terms, up to 24 hours of prior weather information, and a larger number of processing nodes were some of the improvements that reduced average prediction error compared to previous research across all horizons. For example, the four-hour MAE of 1.40°C was 0.20°C, or 12.5%, less than the previous model. Prediction MAEs eight and 12 hours ahead improved by 0.17°C and 0.16°C, respectively, improvements of 7.4% and 5.9% over the existing model at these horizons. Networks instantiating the same model but with different initial random weights often led to different prediction errors. These results strongly suggest that ANN model developers should consider instantiating and training multiple networks with different initial weights to establish preferred model parameters.

The Performance Improvement of the Target Position Determining System in Laser Tracking Based on 4Q Detector using Neural Network

One of the methods for detecting the target position error in the laser tracking systems is using Four Quadrant (4Q) detectors. If the coordinates of the target center is yielded through the usual relations of the detector outputs, the results will be nonlinear, dependent on the shape, target size and its position on the detector screen. In this paper we have designed an algorithm with using neural network that coordinates of the target center in laser tracking systems is calculated by using detector outputs obtained from visual modeling. With this method, the results except from the part related to the detector intrinsic limitation, are linear and dependent from the shape and target size.

Application of a Fracture-Mechanics Approach to Gas Pipelines

This study offers a new simple method for assessing an axial part-through crack in a pipe wall. The method utilizes simple approximate expressions for determining the fracture parameters K, J, and employs these parameters to determine critical dimensions of a crack on the basis of equality between the J-integral and the J-based fracture toughness of the pipe steel. The crack tip constraint is taken into account by the so-called plastic constraint factor C, by which the uniaxial yield stress in the J-integral equation is multiplied. The results of the prediction of the fracture condition are verified by burst tests on test pipes.

Learning an Overcomplete Dictionary using a Cauchy Mixture Model for Sparse Decay

An algorithm for learning an overcomplete dictionary using a Cauchy mixture model for sparse decomposition of an underdetermined mixing system is introduced. The mixture density function is derived from a ratio sample of the observed mixture signals where 1) there are at least two but not necessarily more mixture signals observed, 2) the source signals are statistically independent and 3) the sources are sparse. The basis vectors of the dictionary are learned via the optimization of the location parameters of the Cauchy mixture components, which is shown to be more accurate and robust than the conventional data mining methods usually employed for this task. Using a well known sparse decomposition algorithm, we extract three speech signals from two mixtures based on the estimated dictionary. Further tests with additive Gaussian noise are used to demonstrate the proposed algorithm-s robustness to outliers.

Bayesian Networks for Earthquake Magnitude Classification in a Early Warning System

During last decades, worldwide researchers dedicated efforts to develop machine-based seismic Early Warning systems, aiming at reducing the huge human losses and economic damages. The elaboration time of seismic waveforms is to be reduced in order to increase the time interval available for the activation of safety measures. This paper suggests a Data Mining model able to correctly and quickly estimate dangerousness of the running seismic event. Several thousand seismic recordings of Japanese and Italian earthquakes were analyzed and a model was obtained by means of a Bayesian Network (BN), which was tested just over the first recordings of seismic events in order to reduce the decision time and the test results were very satisfactory. The model was integrated within an Early Warning System prototype able to collect and elaborate data from a seismic sensor network, estimate the dangerousness of the running earthquake and take the decision of activating the warning promptly.

Investigating Feed Mix Problem Approaches: An Overview and Potential Solution

Feed is one of the factors which play an important role in determining a successful development of an aquaculture industry. It is always critical to produce the best aquaculture diet at a minimum cost in order to trim down the operational cost and gain more profit. However, the feed mix problem becomes increasingly difficult since many issues need to be considered simultaneously. Thus, the purpose of this paper is to review the current techniques used by nutritionist and researchers to tackle the issues. Additionally, this paper introduce an enhance algorithm which is deemed suitable to deal with all the issues arise. The proposed technique refers to Hybrid Genetic Algorithm which is expected to obtain the minimum cost diet for farmed animal, while satisfying nutritional requirements. Hybrid GA technique with artificial bee algorithm is expected to reduce the penalty function and provide a better solution for the feed mix problem.

Improving Academic Performance Prediction using Voting Technique in Data Mining

In this paper we compare the accuracy of data mining methods to classifying students in order to predicting student-s class grade. These predictions are more useful for identifying weak students and assisting management to take remedial measures at early stages to produce excellent graduate that will graduate at least with second class upper. Firstly we examine single classifiers accuracy on our data set and choose the best one and then ensembles it with a weak classifier to produce simple voting method. We present results show that combining different classifiers outperformed other single classifiers for predicting student performance.

Customer Segmentation in Foreign Trade based on Clustering Algorithms Case Study: Trade Promotion Organization of Iran

The goal of this paper is to segment the countries based on the value of export from Iran during 14 years ending at 2005. To measure the dissimilarity among export baskets of different countries, we define Dissimilarity Export Basket (DEB) function and use this distance function in K-means algorithm. The DEB function is defined based on the concepts of the association rules and the value of export group-commodities. In this paper, clustering quality function and clusters intraclass inertia are defined to, respectively, calculate the optimum number of clusters and to compare the functionality of DEB versus Euclidean distance. We have also study the effects of importance weight in DEB function to improve clustering quality. Lastly when segmentation is completed, a designated RFM model is used to analyze the relative profitability of each cluster.

Determination of Regimes of the Equivalent Generator Based On Projective Geometry: The Generalized Equivalent Generator

Requirements that should be met when determining the regimes of circuits with variable elements are formulated. The interpretation of the variations in the regimes, based on projective geometry, enables adequate expressions for determining and comparing the regimes to be derived. It is proposed to use as the parameters of a generalized equivalent generator of an active two-pole with changeable resistor such load current and voltage which provide the current through this resistor equal to zero.

Scots Pine Needles as Bioindicators in Determining the Aerial Distribution Pattern of Sulphur Emissions around Industrial Plants

In this study, the Scots pine (Pinus sylvestris L.) C needles (i.e. the current-year-needles) were used as bioindicators in determining the aerial distribution pattern of sulphur emissions around industrial point sources at Kemi, Northern Finland. The average sulphur concentration in the C needles was 897 mg/kg (d.w.), with a standard deviation of 118 mg/kg (d.w.) and range 740 – 1350 mg/kg (d.w.). According to results in this study, Scots pine needles (Pinus sylvestris L.) appear to be an ideal bioindicators for identifying atmospheric sulphur pollution derived from industrial plants and can complement the information provided by plant mapping studies around industrial plants.

A Sequential Pattern Mining Method Based On Sequential Interestingness

Sequential mining methods efficiently discover all frequent sequential patterns included in sequential data. These methods use the support, which is the previous criterion that satisfies the Apriori property, to evaluate the frequency. However, the discovered patterns do not always correspond to the interests of analysts, because the patterns are common and the analysts cannot get new knowledge from the patterns. The paper proposes a new criterion, namely, the sequential interestingness, to discover sequential patterns that are more attractive for the analysts. The paper shows that the criterion satisfies the Apriori property and how the criterion is related to the support. Also, the paper proposes an efficient sequential mining method based on the proposed criterion. Lastly, the paper shows the effectiveness of the proposed method by applying the method to two kinds of sequential data.

Genetic Mining: Using Genetic Algorithm for Topic based on Concept Distribution

Today, Genetic Algorithm has been used to solve wide range of optimization problems. Some researches conduct on applying Genetic Algorithm to text classification, summarization and information retrieval system in text mining process. This researches show a better performance due to the nature of Genetic Algorithm. In this paper a new algorithm for using Genetic Algorithm in concept weighting and topic identification, based on concept standard deviation will be explored.

Data Mining Applied to the Predictive Model of Triage System in Emergency Department

The Emergency Department of a medical center in Taiwan cooperated to conduct the research. A predictive model of triage system is contracted from the contract procedure, selection of parameters to sample screening. 2,000 pieces of data needed for the patients is chosen randomly by the computer. After three categorizations of data mining (Multi-group Discriminant Analysis, Multinomial Logistic Regression, Back-propagation Neural Networks), it is found that Back-propagation Neural Networks can best distinguish the patients- extent of emergency, and the accuracy rate can reach to as high as 95.1%. The Back-propagation Neural Networks that has the highest accuracy rate is simulated into the triage acuity expert system in this research. Data mining applied to the predictive model of the triage acuity expert system can be updated regularly for both the improvement of the system and for education training, and will not be affected by subjective factors.

A Novel Approach to Optimal Cutting Tool Replacement

In metal cutting industries, mathematical/statistical models are typically used to predict tool replacement time. These off-line methods usually result in less than optimum replacement time thereby either wasting resources or causing quality problems. The few online real-time methods proposed use indirect measurement techniques and are prone to similar errors. Our idea is based on identifying the optimal replacement time using an electronic nose to detect the airborne compounds released when the tool wear reaches to a chemical substrate doped into tool material during the fabrication. The study investigates the feasibility of the idea, possible doping materials and methods along with data stream mining techniques for detection and monitoring different phases of tool wear.

Concurrency in Web Access Patterns Mining

Web usage mining is an interesting application of data mining which provides insight into customer behaviour on the Internet. An important technique to discover user access and navigation trails is based on sequential patterns mining. One of the key challenges for web access patterns mining is tackling the problem of mining richly structured patterns. This paper proposes a novel model called Web Access Patterns Graph (WAP-Graph) to represent all of the access patterns from web mining graphically. WAP-Graph also motivates the search for new structural relation patterns, i.e. Concurrent Access Patterns (CAP), to identify and predict more complex web page requests. Corresponding CAP mining and modelling methods are proposed and shown to be effective in the search for and representation of concurrency between access patterns on the web. From experiments conducted on large-scale synthetic sequence data as well as real web access data, it is demonstrated that CAP mining provides a powerful method for structural knowledge discovery, which can be visualised through the CAP-Graph model.

Automatic Extraction of Features and Opinion-Oriented Sentences from Customer Reviews

Opinion extraction about products from customer reviews is becoming an interesting area of research. Customer reviews about products are nowadays available from blogs and review sites. Also tools are being developed for extraction of opinion from these reviews to help the user as well merchants to track the most suitable choice of product. Therefore efficient method and techniques are needed to extract opinions from review and blogs. As reviews of products mostly contains discussion about the features, functions and services, therefore, efficient techniques are required to extract user comments about the desired features, functions and services. In this paper we have proposed a novel idea to find features of product from user review in an efficient way. Our focus in this paper is to get the features and opinion-oriented words about products from text through auxiliary verbs (AV) {is, was, are, were, has, have, had}. From the results of our experiments we found that 82% of features and 85% of opinion-oriented sentences include AVs. Thus these AVs are good indicators of features and opinion orientation in customer reviews.