Mining Sequential Patterns Using I-PrefixSpan

In this paper, we propose an improvement of pattern growth-based PrefixSpan algorithm, called I-PrefixSpan. The general idea of I-PrefixSpan is to use sufficient data structure for Seq-Tree framework and separator database to reduce the execution time and memory usage. Thus, with I-PrefixSpan there is no in-memory database stored after index set is constructed. The experimental result shows that using Java 2, this method improves the speed of PrefixSpan up to almost two orders of magnitude as well as the memory usage to more than one order of magnitude.

Sliding Joints and Soil-Structure Interaction

Use of a sliding joint is an effective method to decrease the stress in foundation structure where there is a horizontal deformation of subsoil (areas afflicted with underground mining) or horizontal deformation of a foundation structure (pre-stressed foundations, creep, shrinkage, temperature deformation). A convenient material for a sliding joint is a bitumen asphalt belt. Experiments for different types of bitumen belts were undertaken at the Faculty of Civil Engineering - VSB Technical University of Ostrava in 2008. This year an extension of the 2008 experiments is in progress and the shear resistance of a slide joint is being tested as a function of temperature in a temperature controlled room. In this paper experimental results of temperature dependant shear resistance are presented. The result of the experiments should be the sliding joint shear resistance as a function of deformation velocity and temperature. This relationship is used for numerical analysis of stress/strain relation between foundation structure and subsoil. Using a rheological slide joint could lead to a decrease of the reinforcement amount, and contribute to higher reliability of foundation structure and thus enable design of more durable and sustainable building structures.

Application of Association Rule Mining in Supplier Selection Criteria

In this paper the application of rule mining in order to review the effective factors on supplier selection is reviewed in the following three sections 1) criteria selecting and information gathering 2) performing association rule mining 3) validation and constituting rule base. Afterwards a few of applications of rule base is explained. Then, a numerical example is presented and analyzed by Clementine software. Some of extracted rules as well as the results are presented at the end.

Risk Classification of SMEs by Early Warning Model Based on Data Mining

One of the biggest problems of SMEs is their tendencies to financial distress because of insufficient finance background. In this study, an Early Warning System (EWS) model based on data mining for financial risk detection is presented. CHAID algorithm has been used for development of the EWS. Developed EWS can be served like a tailor made financial advisor in decision making process of the firms with its automated nature to the ones who have inadequate financial background. Besides, an application of the model implemented which covered 7,853 SMEs based on Turkish Central Bank (TCB) 2007 data. By using EWS model, 31 risk profiles, 15 risk indicators, 2 early warning signals, and 4 financial road maps has been determined for financial risk mitigation.

Cumulative Learning based on Dynamic Clustering of Hierarchical Production Rules(HPRs)

An important structuring mechanism for knowledge bases is building clusters based on the content of their knowledge objects. The objects are clustered based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity. Clustering can also facilitate taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together. Hierarchical representation allows us to easily manage the complexity of knowledge, to view the knowledge at different levels of details, and to focus our attention on the interesting aspects only. One of such efficient and easy to understand systems is Hierarchical Production rule (HPRs) system. A HPR, a standard production rule augmented with generality and specificity information, is of the following form Decision If < condition> Generality Specificity . HPRs systems are capable of handling taxonomical structures inherent in the knowledge about the real world. In this paper, a set of related HPRs is called a cluster and is represented by a HPR-tree. This paper discusses an algorithm based on cumulative learning scenario for dynamic structuring of clusters. The proposed scheme incrementally incorporates new knowledge into the set of clusters from the previous episodes and also maintains summary of clusters as Synopsis to be used in the future episodes. Examples are given to demonstrate the behaviour of the proposed scheme. The suggested incremental structuring of clusters would be useful in mining data streams.

An Engineering Approach to Forecast Volatility of Financial Indices

By systematically applying different engineering methods, difficult financial problems become approachable. Using a combination of theory and techniques such as wavelet transform, time series data mining, Markov chain based discrete stochastic optimization, and evolutionary algorithms, this work formulated a strategy to characterize and forecast non-linear time series. It attempted to extract typical features from the volatility data sets of S&P100 and S&P500 indices that include abrupt drops, jumps and other non-linearity. As a result, accuracy of forecasting has reached an average of over 75% surpassing any other publicly available results on the forecast of any financial index.

Managing User Expectations in Information Systems Development

This paper provides new ways to explore the old problem of failure of information systems development in an organisation. Based on the theory of cognitive dissonance, information systems (IS) failure is defined as a gap between what the users expect from an information system and how well these expectations are met by the perceived performance of the delivered system. Bridging the expectation-perception gap requires that IS professionals make a radical change from being the proprietor of information systems and products to being service providers. In order to deliver systems and services that IS users perceive as valuable, IS people must become expert in determining and assessing users- expectations and perceptions. It is also suggested that the IS community, in general, has given relatively little attention to the front-end process of requirements specification for IS development. There is a simplistic belief that requirements are obtainable from users, they are then translatable into a formal specification. The process of information needs analysis is problematic and worthy of investigation.

Comparison between Associative Classification and Decision Tree for HCV Treatment Response Prediction

Combined therapy using Interferon and Ribavirin is the standard treatment in patients with chronic hepatitis C. However, the number of responders to this treatment is low, whereas its cost and side effects are high. Therefore, there is a clear need to predict patient’s response to the treatment based on clinical information to protect the patients from the bad drawbacks, Intolerable side effects and waste of money. Different machine learning techniques have been developed to fulfill this purpose. From these techniques are Associative Classification (AC) and Decision Tree (DT). The aim of this research is to compare the performance of these two techniques in the prediction of virological response to the standard treatment of HCV from clinical information. 200 patients treated with Interferon and Ribavirin; were analyzed using AC and DT. 150 cases had been used to train the classifiers and 50 cases had been used to test the classifiers. The experiment results showed that the two techniques had given acceptable results however the best accuracy for the AC reached 92% whereas for DT reached 80%.

A Rough Sets Approach for Relevant Internet/Web Online Searching

The internet is constantly expanding. Identifying web links of interest from web browsers requires users to visit each of the links listed, individually until a satisfactory link is found, therefore those users need to evaluate a considerable amount of links before finding their link of interest; this can be tedious and even unproductive. By incorporating web assistance, web users could be benefited from reduced time searching on relevant websites. In this paper, a rough set approach is presented, which facilitates classification of unlimited available e-vocabulary, to assist web users in reducing search times looking for relevant web sites. This approach includes two methods for identifying relevance data on web links based on the priority and percentage of relevance. As a result of these methods, a list of web sites is generated in priority sequence with an emphasis of the search criteria.

Examining the Pearlite Growth Interface in a Fe-C-Mn Alloy

A method of collecting composition data and examining structural features of pearlite lamellae and the parent austenite at the growth interface in a 13wt. % manganese steel has been demonstrated with the use of Scanning Transmission Electron Microscopy (STEM). The combination of composition data and the structural features observed at the growth interface show that available theories of pearlite growth cannot explain all the observations.

Using Data Mining Techniques for Estimating Minimum, Maximum and Average Daily Temperature Values

Estimates of temperature values at a specific time of day, from daytime and daily profiles, are needed for a number of environmental, ecological, agricultural and technical applications, ranging from natural hazards assessments, crop growth forecasting to design of solar energy systems. The scope of this research is to investigate the efficiency of data mining techniques in estimating minimum, maximum and mean temperature values. For this reason, a number of experiments have been conducted with well-known regression algorithms using temperature data from the city of Patras in Greece. The performance of these algorithms has been evaluated using standard statistical indicators, such as Correlation Coefficient, Root Mean Squared Error, etc.

Combine Duration and "Select the Priority Trip" to Improve the Number of Boats

Our goal is to effectively increase the number of boats in the river during a six month period. The main factors of determining the number of boats are duration and “select the priority trip". In the microcosmic simulation model, the best result is 4 to 24 nights with DSCF, and the number of boats is 812 with an increasing ratio of 9.0% related to the second best result. However, the number of boats is related to 31.6% less than the best one in 6 to 18 nights with FCFS. In the discrete duration model, we get from 6 to 18 nights, the numbers of boats have increased to 848 with an increase ratio of 29.7% than the best result in model I for the same time range. Moreover, from 4 to 24 nights, the numbers of boats have increase to 1194 with an increase ratio of 47.0% than the best result in model I for the same time range.

Quantification of Periodicities in Fugitive Emission of Gases from Lyari Waterway

Periodicities in the environmetric time series can be idyllically assessed by utilizing periodic models. In this communication fugitive emission of gases from open sewer channel Lyari which follows periodic behaviour are approximated by employing periodic autoregressive model of order p. The orders of periodic model for each season are selected through the examination of periodic partial autocorrelation or information criteria. The parameters for the selected order of season are estimated individually for each emitted air toxin. Subsequently, adequacies of fitted models are established by examining the properties of the residual for each season. These models are beneficial for schemer and administrative bodies for the improvement of implemented policies to surmount future environmental problems.

Biomechanical Analysis of the Basic Classical Dance Jump – The Grand Jeté

The aim of this study was to analyse the most important parameters determining the quality of the motion structure of the basic classical dance jump – grand jeté.Research sample consisted of 8 students of the Dance Conservatory in Brno. Using the system Simi motion we performed a 3D kinematic analysis of the jump. On the basis of the comparison of structure quality and measured data of the grand jeté, we defined the optimal values of the relevant parameters determining the quality of the performance. The take-off speed should achieve about 2.4 m·s-1, the optimum take-off angle is 28 - 30º. The take-off leg should swing backward at the beginning of the flight phase with the minimum speed of 3.3 m·s-1.If motor abilities of dancers achieve the level necessary for optimal performance of a classical dance jump, there is room for certain variability of the structure of the dance jump.

Plant Varieties Selection System

In the end of the day, meteorological data and environmental data becomes widely used such as plant varieties selection system. Variety plant selection for planted area is of almost importance for all crops, including varieties of sugarcane. Since sugarcane have many varieties. Variety plant non selection for planting may not be adapted to the climate or soil conditions for planted area. Poor growth, bloom drop, poor fruit, and low price are to be from varieties which were not recommended for those planted area. This paper presents plant varieties selection system for planted areas in Thailand from meteorological data and environmental data by the use of decision tree techniques. With this software developed as an environmental data analysis tool, it can analyze resulting easier and faster. Our software is a front end of WEKA that provides fundamental data mining functions such as classify, clustering, and analysis functions. It also supports pre-processing, analysis, and decision tree output with exporting result. After that, our software can export and display data result to Google maps API in order to display result and plot plant icons effectively.

A Web Designer Agent, Based On Usage Mining Online Behavior of Visitors

Website plays a significant role in success of an e-business. It is the main start point of any organization and corporation for its customers, so it's important to customize and design it according to the visitors' preferences. Also, websites are a place to introduce services of an organization and highlight new service to the visitors and audiences. In this paper, we will use web usage mining techniques, as a new field of research in data mining and knowledge discovery, in an Iranian government website. Using the results, a framework for web content layour is proposed. An agent is designed to dynamically update and improve web links locations and layout. Then, we will explain how it is used to directly enable top managers of the organization to influence on the arrangement of web contents and also to enhance customization of web site navigation due to online users' behaviors.

Feature Extraction from Aerial Photos

In Geographic Information System, one of the sources of obtaining needed geographic data is digitizing analog maps and evaluation of aerial and satellite photos. In this study, a method will be discussed which can be used to extract vectorial features and creating vectorized drawing files for aerial photos. At the same time a software developed for these purpose. Converting from raster to vector is also known as vectorization and it is the most important step when creating vectorized drawing files. In the developed algorithm, first of all preprocessing on the aerial photo is done. These are; converting to grayscale if necessary, reducing noise, applying some filters and determining the edge of the objects etc. After these steps, every pixel which constitutes the photo are followed from upper left to right bottom by examining its neighborhood relationship and one pixel wide lines or polylines obtained. The obtained lines have to be erased for preventing confusion while continuing vectorization because if not erased they can be perceived as new line, but if erased it can cause discontinuity in vector drawing so the image converted from 2 bit to 8 bit and the detected pixels are expressed as a different bit. In conclusion, the aerial photo can be converted to vector form which includes lines and polylines and can be opened in any CAD application.

Clinical Decision Support for Disease Classification based on the Tests Association

Until recently, researchers have developed various tools and methodologies for effective clinical decision-making. Among those decisions, chest pain diseases have been one of important diagnostic issues especially in an emergency department. To improve the ability of physicians in diagnosis, many researchers have developed diagnosis intelligence by using machine learning and data mining. However, most of the conventional methodologies have been generally based on a single classifier for disease classification and prediction, which shows moderate performance. This study utilizes an ensemble strategy to combine multiple different classifiers to help physicians diagnose chest pain diseases more accurately than ever. Specifically the ensemble strategy is applied by using the integration of decision trees, neural networks, and support vector machines. The ensemble models are applied to real-world emergency data. This study shows that the performance of the ensemble models is superior to each of single classifiers.

Improving Spatiotemporal Change Detection: A High Level Fusion Approach for Discovering Uncertain Knowledge from Satellite Image Database

This paper investigates the problem of tracking spa¬tiotemporal changes of a satellite image through the use of Knowledge Discovery in Database (KDD). The purpose of this study is to help a given user effectively discover interesting knowledge and then build prediction and decision models. Unfortunately, the KDD process for spatiotemporal data is always marked by several types of imperfections. In our paper, we take these imperfections into consideration in order to provide more accurate decisions. To achieve this objective, different KDD methods are used to discover knowledge in satellite image databases. Each method presents a different point of view of spatiotemporal evolution of a query model (which represents an extracted object from a satellite image). In order to combine these methods, we use the evidence fusion theory which considerably improves the spatiotemporal knowledge discovery process and increases our belief in the spatiotemporal model change. Experimental results of satellite images representing the region of Auckland in New Zealand depict the improvement in the overall change detection as compared to using classical methods.

Simulation and Workspace Analysis of a Tripod Parallel Manipulator

Industrial robots play a vital role in automation however only little effort are taken for the application of robots in machining work such as Grinding, Cutting, Milling, Drilling, Polishing etc. Robot parallel manipulators have high stiffness, rigidity and accuracy, which cannot be provided by conventional serial robot manipulators. The aim of this paper is to perform the modeling and the workspace analysis of a 3 DOF Parallel Manipulator (3 DOF PM). The 3 DOF PM was modeled and simulated using 'ADAMS'. The concept involved is based on the transformation of motion from a screw joint to a spherical joint through a connecting link. This paper work has been planned to model the Parallel Manipulator (PM) using screw joints for very accurate positioning. A workspace analysis has been done for the determination of work volume of the 3 DOF PM. The position of the spherical joints connected to the moving platform and the circumferential points of the moving platform were considered for finding the workspace. After the simulation, the position of the joints of the moving platform was noted with respect to simulation time and these points were given as input to the 'MATLAB' for getting the work envelope. Then 'AUTOCAD' is used for determining the work volume. The obtained values were compared with analytical approach by using Pappus-Guldinus Theorem. The analysis had been dealt by considering the parameters, link length and radius of the moving platform. From the results it is found that the radius of moving platform is directly proportional to the work volume for a constant link length and the link length is also directly proportional to the work volume, at a constant radius of the moving platform.