Abstract: This paper proposes an auto-classification algorithm
for Web pages using data mining techniques. We consider the
problem of discovering association rules between terms in a set of
Web pages belonging to a category in a search engine database, and
present an auto-classification algorithm for solving this problem that
is fundamentally based on the Apriori algorithm. The proposed
technique has two phases. The first is a training phase in which
human experts determine the categories of different Web pages, and
the supervised Data mining algorithm will combine these categories
with appropriate weighted index terms according to the highest
supported rules among the most frequent words. The second phase is
the categorization phase where a web crawler will crawl through the
World Wide Web to build a database categorized according to the
result of the data mining approach. This database contains URLs and
their categories.
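The Apriori-style mining of frequent term sets that underlies this approach can be illustrated with a minimal sketch. The tokenized pages and the 0.5 support threshold below are invented toy data, not the paper's training corpus:

```python
from itertools import combinations

def frequent_itemsets(docs, min_support):
    """Apriori-style frequent term-set mining over tokenized pages."""
    transactions = [set(d) for d in docs]
    n = len(transactions)

    def support(items):
        # fraction of pages whose term set contains all of `items`
        return sum(items <= t for t in transactions) / n

    # frequent 1-itemsets
    terms = sorted({t for tr in transactions for t in tr})
    current = [frozenset([t]) for t in terms
               if support(frozenset([t])) >= min_support]
    result = {s: support(s) for s in current}
    k = 2
    while current:
        # candidate generation: join frequent (k-1)-sets into k-sets
        cands = {a | b for a in current for b in current if len(a | b) == k}
        current = [c for c in cands if support(c) >= min_support]
        result.update({c: support(c) for c in current})
        k += 1
    return result

pages = [["data", "mining", "web"], ["data", "mining", "apriori"],
         ["web", "crawler", "data"], ["data", "mining", "web", "crawler"]]
freq = frequent_itemsets(pages, min_support=0.5)
print(freq[frozenset({"data", "mining"})])   # 0.75
```

Rules such as {data, mining} -> category could then be ranked by support, as the training phase above describes.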
Abstract: This paper sets forth the possibility and importance of applying data mining to Web log mining and points out some problems with conventional search engines. It then offers an improved algorithm based on the original AprioriAll algorithm, which has been widely used in Web log mining. The new algorithm adds the User ID property to every step of producing the candidate set and every step of scanning the database, using it to decide whether an item in the candidate set should be put into the large set from which the next candidate set is produced. Meanwhile, in order to reduce the number of database scans, the new algorithm uses the Apriori property to limit the size of each candidate set as soon as it is produced. Test results show that the improved algorithm has lower time and space complexity, suppresses noise better, and fits within memory capacity.
Abstract: A typical Intelligent Decision Support System is built on four components: Data Warehouse, Online Analytical Processing, Data Mining and model-based Decision Support, a design called Decision Support System Based on Data Warehouse (DSSBDW). This approach takes ETL, OLAP and DM as its implementation means, and integrates traditional model-driven DSS and data-driven DSS into a whole. This paper analyzes the DSSBDW architecture and the DW model, and discusses the following key issues: ETL design and realization; metadata management using XML; and SQL implementation, performance optimization and data mapping in OLAP. Lastly, it illustrates the design principles and methods of the DW in DSSBDW.
Abstract: Despite the preponderant role played by cement among construction materials, it is today considered environmentally destructive owing to the large quantities of carbon dioxide released during its manufacture. Moreover, global warming is now recognized worldwide as a new threat to humankind, against which advanced countries are investigating measures to halve the current amount of exhaust gases by 2050. Accordingly, efforts to reduce greenhouse gases are being exerted in all industrial fields. In particular, the cement industry strives to reduce the consumption of cement through the development of alkali-activated geopolymer mortars using industrial byproducts like bottom ash. This study gathers basic data on the flowability and strength development characteristics of alkali-activated geopolymer mortar by examining its FT-IR features with respect to the effects and strength of the alkali-activator, in order to develop a bottom ash-based alkali-activated geopolymer mortar. The results show that a 35:65 mass ratio of sodium hydroxide to sodium silicate is appropriate and that a molarity of 9M for sodium hydroxide is advantageous. The ratio of alkali-activators to bottom ash is seen to have little effect on the strength. Moreover, the FT-IR analysis reveals that larger improvements in strength shift the peak from 1060 cm–1 (T-O, T=Si or Al) toward shorter wavenumbers.
Abstract: The mitigation of crop loss due to damaging freezes
requires accurate air temperature prediction models. Previous work
established that the Ward-style artificial neural network (ANN) is a
suitable tool for developing such models. The current research
focused on developing ANN models with reduced average prediction
error by increasing the number of distinct observations used in
training, adding additional input terms that describe the date of an
observation, increasing the duration of prior weather data included in
each observation, and reexamining the number of hidden nodes used
in the network. Models were created to predict air temperature at
hourly intervals from one to 12 hours ahead. Each ANN model,
consisting of a network architecture and set of associated parameters,
was evaluated by instantiating and training 30 networks and
calculating the mean absolute error (MAE) of the resulting networks
for some set of input patterns. The inclusion of seasonal input terms,
up to 24 hours of prior weather information, and a larger number of
processing nodes were some of the improvements that reduced
average prediction error compared to previous research across all
horizons. For example, the four-hour MAE of 1.40°C was 0.20°C, or
12.5%, less than the previous model. Prediction MAEs eight and 12
hours ahead improved by 0.17°C and 0.16°C, respectively,
improvements of 7.4% and 5.9% over the existing model at these
horizons. Networks instantiating the same model but with different
initial random weights often led to different prediction errors. These
results strongly suggest that ANN model developers should consider
instantiating and training multiple networks with different initial
weights to establish preferred model parameters.
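The evaluation protocol described above, instantiating several networks with different initial weights and comparing their MAEs, can be sketched as follows. The tiny tanh network, the synthetic lagged data, and the use of 5 seeds instead of 30 are simplifications for illustration, not the paper's Ward-style architecture:

```python
import numpy as np

def train_mlp(X, y, hidden=8, epochs=500, lr=0.05, seed=0):
    """Train a one-hidden-layer tanh regressor by batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)
        err = h @ W2 + b2 - y
        # backpropagate the squared-error gradient
        dW2 = h.T @ err / len(X); db2 = err.mean(0)
        dh = err @ W2.T * (1 - h ** 2)
        dW1 = X.T @ dh / len(X); db1 = dh.mean(0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
    return lambda Xq: np.tanh(Xq @ W1 + b1) @ W2 + b2

# synthetic stand-in for "prior weather" inputs: predict a value
# from three lagged observations
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 0.6 * X[:, [0]] + 0.3 * X[:, [1]] + 0.1 * X[:, [2]]

maes = []
for seed in range(5):   # the paper trains 30 networks; 5 keeps the demo fast
    model = train_mlp(X, y, seed=seed)
    maes.append(float(np.mean(np.abs(model(X) - y))))
print(f"MAE mean={np.mean(maes):.3f}, spread={max(maes) - min(maes):.3g}")
```

The nonzero spread across seeds is the effect the abstract highlights: identical architectures with different initial random weights yield different prediction errors.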
Abstract: One of the methods for detecting the target position
error in laser tracking systems is the use of Four Quadrant (4Q)
detectors. If the coordinates of the target center are obtained through
the usual relations of the detector outputs, the results will be
nonlinear and dependent on the shape, size and position of the target
on the detector screen. In this paper we design a neural-network-based
algorithm that calculates the coordinates of the target center in laser
tracking systems from detector outputs obtained through visual
modeling. With this method, the results, except for the part related to
the detector's intrinsic limitations, are linear and independent of the
shape and size of the target.
Abstract: This study offers a new simple method for assessing
an axial part-through crack in a pipe wall. The method utilizes simple
approximate expressions for determining the fracture parameters K,
J, and employs these parameters to determine critical dimensions of a
crack on the basis of equality between the J-integral and the J-based
fracture toughness of the pipe steel. The crack tip constraint is taken
into account by the so-called plastic constraint factor C, by which the
uniaxial yield stress in the J-integral equation is multiplied. The
results of the prediction of the fracture condition are verified by burst
tests on test pipes.
Abstract: An algorithm for learning an overcomplete dictionary
using a Cauchy mixture model for sparse decomposition of an underdetermined
mixing system is introduced. The mixture density
function is derived from a ratio sample of the observed mixture
signals where 1) there are at least two but not necessarily more
mixture signals observed, 2) the source signals are statistically
independent and 3) the sources are sparse. The basis vectors of the
dictionary are learned via the optimization of the location parameters
of the Cauchy mixture components, which is shown to be more
accurate and robust than the conventional data mining methods
usually employed for this task. Using a well known sparse
decomposition algorithm, we extract three speech signals from two
mixtures based on the estimated dictionary. Further tests with
additive Gaussian noise are used to demonstrate the proposed
algorithm's robustness to outliers.
Abstract: During the last decades, researchers worldwide have
dedicated efforts to developing machine-based seismic Early Warning
systems, aiming at reducing the huge human losses and economic
damage. The processing time of seismic waveforms must be reduced
in order to increase the time interval available for the activation of
safety measures. This paper suggests a Data Mining model able to
correctly and quickly estimate the dangerousness of an ongoing
seismic event. Several thousand seismic recordings of Japanese and
Italian earthquakes were analyzed, and a model was obtained by
means of a Bayesian Network (BN), which was tested on only the
first recordings of seismic events in order to reduce the decision
time; the test results were very satisfactory.
The model was integrated within an Early Warning System
prototype able to collect and process data from a seismic sensor
network, estimate the dangerousness of the ongoing earthquake and
decide promptly whether to activate the warning.
Abstract: Feed is one of the factors that play an important role in determining the successful development of an aquaculture industry. It is always critical to produce the best aquaculture diet at a minimum cost in order to trim down operational costs and gain more profit. However, the feed mix problem becomes increasingly difficult since many issues need to be considered simultaneously. Thus, the purpose of this paper is to review the current techniques used by nutritionists and researchers to tackle these issues. Additionally, this paper introduces an enhanced algorithm deemed suitable to deal with all the issues that arise. The proposed technique is a Hybrid Genetic Algorithm, which is expected to obtain the minimum-cost diet for farmed animals while satisfying nutritional requirements. Hybridizing the GA with an artificial bee algorithm is expected to reduce the penalty function and provide a better solution to the feed mix problem.
Abstract: In this paper we compare the accuracy of data mining
methods for classifying students in order to predict students' class
grades. These predictions are useful for identifying weak students
and assisting management in taking remedial measures at early
stages, so as to produce excellent graduates who finish with at least a
second class upper degree. We first examine the accuracy of single
classifiers on our data set, choose the best one, and then ensemble it
with a weak classifier to produce a simple voting method. Our results
show that combining different classifiers outperforms single
classifiers for predicting student performance.
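The simple voting combination described above can be sketched as majority voting over per-classifier predictions. The classifier outputs and grade labels below are hypothetical, not the paper's data:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class labels from several classifiers by simple voting.

    predictions: list of per-classifier label lists, one label per student.
    Returns the most common label for each student.
    """
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# hypothetical outputs of one strong and two weak classifiers
strong = ["first", "second_upper", "fail", "second_upper", "first"]
weak_a = ["first", "second_lower", "fail", "second_upper", "second_upper"]
weak_b = ["second_upper", "second_upper", "fail", "second_upper", "first"]
print(majority_vote([strong, weak_a, weak_b]))
# ['first', 'second_upper', 'fail', 'second_upper', 'first']
```

Disagreements between the weak classifiers are outvoted wherever the strong classifier sides with either of them.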
Abstract: The goal of this paper is to segment countries based
on the value of exports from Iran over the 14 years ending in 2005.
To measure the dissimilarity among the export baskets of different
countries, we define the Dissimilarity Export Basket (DEB) function
and use this distance function in the K-means algorithm. The DEB
function is defined based on the concepts of association rules and the
value of exported commodity groups. In this paper, a clustering
quality function and the intraclass inertia of the clusters are defined
to, respectively, calculate the optimum number of clusters and
compare the performance of DEB versus the Euclidean distance. We
also study the effect of importance weights in the DEB function on
improving clustering quality. Lastly, once segmentation is completed,
a designated RFM model is used to analyze the relative profitability
of each cluster.
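Running K-means with a custom dissimilarity, as done here with DEB, can be sketched as follows. Since the DEB definition is not reproduced in the abstract, the sketch plugs in the Euclidean distance over toy export-value vectors; any dissimilarity with the same signature could be substituted:

```python
import random

def kmeans(points, k, dist, iters=20, seed=0):
    """Lloyd-style K-means that accepts any dissimilarity function `dist`.

    Assignment uses the supplied distance; centroids are coordinate-wise
    means of each cluster.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center under `dist`
            clusters[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

# toy export "baskets": value vectors over two commodity groups
baskets = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
centers, clusters = kmeans(baskets, k=2, dist=euclid)
print(sorted(len(c) for c in clusters))   # [2, 2]
```

Swapping `euclid` for a DEB-style function changes only the assignment step, which is what makes the comparison against Euclidean distance straightforward.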
Abstract: Requirements that should be met when determining the regimes of circuits with variable elements are formulated. The interpretation of variations in the regimes, based on projective geometry, enables adequate expressions for determining and comparing the regimes to be derived. It is proposed to use, as the parameters of a generalized equivalent generator of an active two-pole with a variable resistor, the load current and voltage that make the current through this resistor equal to zero.
Abstract: In this study, the Scots pine (Pinus sylvestris L.) C
needles (i.e. the current-year-needles) were used as bioindicators in
determining the aerial distribution pattern of sulphur emissions
around industrial point sources at Kemi, Northern Finland. The
average sulphur concentration in the C needles was 897 mg/kg
(d.w.), with a standard deviation of 118 mg/kg (d.w.) and range 740 –
1350 mg/kg (d.w.). According to the results of this study, Scots pine
needles (Pinus sylvestris L.) appear to be ideal bioindicators for
identifying atmospheric sulphur pollution derived from industrial
plants and can complement the information provided by plant
mapping studies around industrial plants.
Abstract: Sequential mining methods efficiently discover all frequent sequential patterns included in sequential data. These methods evaluate frequency using the support, the conventional criterion that satisfies the Apriori property. However, the discovered patterns do not always correspond to the interests of analysts, because the patterns are commonplace and analysts cannot gain new knowledge from them. This paper proposes a new criterion, the sequential interestingness, to discover sequential patterns that are more attractive to analysts. The paper shows that the criterion satisfies the Apriori property and how it is related to the support. The paper also proposes an efficient sequential mining method based on the proposed criterion. Lastly, the paper shows the effectiveness of the proposed method by applying it to two kinds of sequential data.
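The support criterion this abstract starts from, and the Apriori property it must satisfy, can be illustrated briefly. The session data is invented; the proposed sequential interestingness itself is not reproduced here:

```python
def is_subsequence(pattern, seq):
    """True if `pattern` occurs in `seq` preserving order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)

def support(pattern, sequences):
    """Fraction of sequences containing the pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)

sessions = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["a", "b"]]
print(support(["a", "c"], sessions))   # 0.75
# Apriori property: extending a pattern can never raise its support
assert support(["a", "b", "c"], sessions) <= support(["a", "b"], sessions)
```

Any criterion satisfying this monotonicity admits the same candidate-pruning strategy as support, which is why the property matters for efficient mining.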
Abstract: Today, the Genetic Algorithm has been used to solve
a wide range of optimization problems. Some research has applied
the Genetic Algorithm to text classification, summarization and
information retrieval in the text mining process, showing better
performance due to the nature of the Genetic Algorithm. In this
paper, a new algorithm for using the Genetic Algorithm in concept
weighting and topic identification, based on concept standard
deviation, is explored.
Abstract: The Emergency Department of a medical center in
Taiwan cooperated in conducting this research. A predictive model
of the triage system is constructed, from the procedure and the
selection of parameters through to sample screening. 2,000 patient
records were chosen randomly by computer. After applying three
data mining classification methods (Multi-group Discriminant
Analysis, Multinomial Logistic Regression, Back-propagation Neural
Networks), it was found that Back-propagation Neural Networks best
distinguish the patients' degree of emergency, with an accuracy rate
as high as 95.1%. The Back-propagation Neural Network with the
highest accuracy rate is incorporated into the triage acuity expert
system in this research. With data mining applied to it, the predictive
model of the triage acuity expert system can be updated regularly,
both improving the system and supporting education and training,
without being affected by subjective factors.
Abstract: In metal cutting industries, mathematical/statistical
models are typically used to predict tool replacement time. These
off-line methods usually result in less than optimum replacement
time thereby either wasting resources or causing quality problems.
The few online real-time methods proposed use indirect measurement
techniques and are prone to similar errors. Our idea is based on
identifying the optimal replacement time using an electronic nose to
detect the airborne compounds released when tool wear reaches a
chemical substrate doped into the tool material during fabrication.
The study investigates the feasibility of the idea, possible
doping materials and methods along with data stream mining
techniques for detection and monitoring different phases of tool
wear.
Abstract: Web usage mining is an interesting application of data
mining which provides insight into customer behaviour on the Internet. An important technique to discover user access and navigation trails is based on sequential patterns mining. One of the
key challenges for web access patterns mining is tackling the problem
of mining richly structured patterns. This paper proposes a novel
model called Web Access Patterns Graph (WAP-Graph) to represent all of the access patterns from web mining graphically. WAP-Graph
also motivates the search for new structural relation patterns, i.e. Concurrent Access Patterns (CAP), to identify and predict more
complex web page requests. Corresponding CAP mining and modelling methods are proposed and shown to be effective in the
search for and representation of concurrency between access patterns
on the web. From experiments conducted on large-scale synthetic
sequence data as well as real web access data, it is demonstrated that
CAP mining provides a powerful method for structural knowledge discovery, which can be visualised through the CAP-Graph model.
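One minimal way to picture graph-based access-pattern modelling (not the paper's WAP-Graph/CAP-Graph construction, which is richer) is a weighted digraph over consecutive page requests; the sessions below are toy data:

```python
from collections import defaultdict

def build_access_graph(sessions):
    """Directed graph whose edges link consecutive page requests,
    weighted by how often each transition occurs across sessions."""
    graph = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            graph[src][dst] += 1
    return graph

sessions = [["home", "search", "item"], ["home", "item"],
            ["home", "search", "cart"]]
g = build_access_graph(sessions)
print(dict(g["home"]))   # {'search': 2, 'item': 1}
```

Edge weights already expose which requests co-occur in sequence; detecting concurrency between whole patterns, as CAP mining does, requires structure beyond this pairwise view.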
Abstract: Opinion extraction about products from customer
reviews is becoming an interesting area of research. Customer
reviews of products are nowadays available on blogs and review
sites, and tools are being developed to extract opinions from these
reviews to help users as well as merchants track the most suitable
choice of product. Efficient methods and techniques are therefore
needed to extract opinions from reviews and blogs. As product
reviews mostly contain discussion of features, functions and
services, efficient techniques are required to extract user comments
about the desired features, functions and services. In this paper we
propose a novel idea to find the features of a product from user
reviews in an efficient way. Our focus is to obtain the features and
opinion-oriented words about products from text through the
auxiliary verbs (AV) {is, was, are, were, has, have, had}. From our
experiments we found that 82% of features and 85% of
opinion-oriented sentences include AVs. Thus these AVs are good
indicators of features and opinion orientation in customer reviews.
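The auxiliary-verb heuristic can be sketched as follows: flag sentences containing an AV and take the token preceding it as a crude candidate feature. The regex-based sentence splitting and the sample reviews are illustrative assumptions, not the paper's method:

```python
import re

AUX_VERBS = {"is", "was", "are", "were", "has", "have", "had"}

def av_sentences(text):
    """Return (candidate feature, sentence) pairs for sentences that
    contain an auxiliary verb; the feature is the token before the AV."""
    hits = []
    for sent in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = re.findall(r"[A-Za-z']+", sent.lower())
        for i, tok in enumerate(tokens):
            if tok in AUX_VERBS and i > 0:
                hits.append((tokens[i - 1], sent))
                break
    return hits

reviews = ("The battery is excellent. I bought it last week. "
           "The screen has a slight glare.")
for feature, sent in av_sentences(reviews):
    print(feature, "->", sent)
# battery -> The battery is excellent.
# screen -> The screen has a slight glare.
```

The middle sentence contains no AV and is skipped, matching the abstract's finding that AV-bearing sentences are the ones likely to carry features and opinions.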