Text Mining Technique for Data Mining Application

Text mining, also termed text data mining or knowledge discovery in text (KDT), applies knowledge discovery techniques to unstructured text. The decision tree approach is most useful in classification problems. With this technique, a tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that applies rulesets, cross-validation and boosting to the original C5.0 in order to reduce the error ratio. The feasibility and the benefits of the proposed approach are demonstrated on a medical data set, hypothyroid. It is shown that the performance of a classifier on the training cases from which it was constructed gives a poor estimate of its accuracy on new cases. Accuracy can instead be estimated by sampling or by using a separate test file; either way, the classifier is evaluated on cases that were not used to build it, and the estimate is reliable only if the numbers of cases used to build and to evaluate the classifier are both large. If the cases in hypothyroid.data and hypothyroid.test were shuffled and divided into a new 2772-case training set and a 1000-case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of See5 is its ability to generate classifiers called rulesets. The ruleset has an error rate of 0.5% on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive accuracy is f-fold cross-validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.
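
The cross-validation estimate described in this abstract (total errors on the hold-out cases divided by the total number of cases) can be sketched as follows. This is only an illustration under stated assumptions: `majority_learner` is a hypothetical stand-in for C5.0, and the interleaved fold assignment is one arbitrary choice, not necessarily the one See5 uses.

```python
from collections import Counter


def fold_indices(n_cases, f):
    """Split case indices 0..n_cases-1 into f nearly equal, interleaved folds."""
    return [list(range(i, n_cases, f)) for i in range(f)]


def majority_learner(train_cases):
    """Hypothetical stand-in for C5.0: always predicts the commonest class."""
    commonest = Counter(label for _, label in train_cases).most_common(1)[0][0]
    return lambda features: commonest


def cross_validation_error(cases, f, learner=majority_learner):
    """Estimate the error rate as total hold-out errors / total cases.

    `cases` is a list of (features, label) pairs. Each fold is held out once,
    a classifier is built from the remaining cases, and its mistakes on the
    held-out fold are counted.
    """
    errors = 0
    for hold_out in fold_indices(len(cases), f):
        held = set(hold_out)
        train_cases = [c for i, c in enumerate(cases) if i not in held]
        classify = learner(train_cases)
        errors += sum(1 for i in hold_out if classify(cases[i][0]) != cases[i][1])
    return errors / len(cases)
```

Because every case is held out exactly once, the denominator is the full data set size, which matches the ratio stated in the abstract.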

Recursive Similarity Hashing of Fractal Geometry

A new technique of topological multi-scale analysis is introduced. By performing a clustering recursively to build a hierarchy, and analyzing the co-scale and intra-scale similarities, an Iterated Function System can be extracted from any data set. The study of fractals shows that this method is efficient at extracting self-similarities, and can find elegant solutions to the inverse problem of building fractals. The theoretical aspects and practical implementations are discussed, together with examples of analyses of simple fractals.

Multiwavelet and Biological Signal Processing

In this paper we seek the optimum multiwavelet for compression of electrocardiogram (ECG) signals and then select it for use with the SPIHT codec. At present, it is not well known which multiwavelet is the best choice for optimum compression of ECG. In this work, we examine different multiwavelets on 24 sets of ECG data with entirely different characteristics, selected from the MIT-BIH database. To assess the performance of the different multiwavelets in compressing ECG signals, in addition to factors well known in the compression literature, such as Compression Ratio (CR), Percent Root Difference (PRD), Distortion (D) and Root Mean Square Error (RMSE), we also employ the Cross Correlation (CC) criterion, for studying the morphological relations between the reconstructed and the original ECG signal, and the Signal to reconstruction Noise Ratio (SNR). The simulation results show that the Cardinal Balanced Multiwavelet (cardbal2) with the identity (Id) prefiltering method is the most effective transformation. After finding the most efficient multiwavelet, we apply the SPIHT coding algorithm to the signal transformed by this multiwavelet.
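
Three of the quality measures named in this abstract can be sketched as below. The abstract gives no formulas, so these are the commonly used definitions and should be read as assumptions; in particular, PRD is written here in its form without mean removal, and the function names are hypothetical.

```python
import math


def rmse(original, reconstructed):
    """Root mean square error between two equal-length signals."""
    n = len(original)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / n)


def prd(original, reconstructed):
    """Percent root difference (no mean removal), in percent."""
    num = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    den = sum(x ** 2 for x in original)
    return 100.0 * math.sqrt(num / den)


def compression_ratio(original_bits, compressed_bits):
    """Size of the original signal divided by the size of the compressed one."""
    return original_bits / compressed_bits
```

A lower PRD or RMSE means the reconstruction is closer to the original, while a higher CR means stronger compression, so the measures are usually reported together.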

Application of Nano Cutting Fluid under Minimum Quantity Lubrication (MQL) Technique to Improve Grinding of Ti-6Al-4V Alloy

The Minimum Quantity Lubrication (MQL) technique has attracted significant attention in machining processes as a way to reduce the environmental load caused by conventional cutting fluids. Recently, nanofluids have found extensive application in mechanical engineering because of their superior lubrication and heat dissipation characteristics. This paper investigates the use of a nanofluid under MQL mode to improve the grinding characteristics of Ti-6Al-4V alloy. Taguchi's experimental design technique has been used in the present investigation, and a second-order model has been established to predict grinding forces and surface roughness. Different concentrations of water-based Al2O3 nanofluids were applied in the grinding operation through an MQL setup developed in house, and the results have been compared with those of conventional coolant and pure water. Experimental results showed that grinding forces were reduced significantly when the nano cutting fluid was used, even at a low concentration of nanoparticles, and that surface finish improved with a higher concentration of nanoparticles.

Mathematical Modeling of Non-Isothermal Multi-Component Fluid Flow in Pipes Applying to Rapid Gas Decompression in Rich and Base Gases

The paper presents a one-dimensional transient mathematical model of compressible non-isothermal multi-component fluid mixture flow in a pipe. The model solves the set of mass, momentum and enthalpy conservation equations for the gas phase. Thermo-physical properties of the multi-component gas mixture are calculated by solving an Equation of State (EOS) model; the Soave-Redlich-Kwong (SRK-EOS) model is chosen. Gas mixture viscosity is calculated on the basis of the Lee-Gonzales-Eakin (LGE) correlation. Numerical analysis of the rapid gas decompression process in rich and base natural gases is carried out on the basis of the proposed mathematical model. The model is successfully validated against the experimental data [1]. The proposed mathematical model shows very good agreement with the experimental data [1] over a wide range of pressure values and predicts the decompression in rich and base gas mixtures much better than the analytical and mathematical models available in the open literature.
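
As an illustration of the SRK equation of state named in this abstract, the sketch below evaluates the pressure of a single pure component using the standard SRK constants. It is not the authors' model: a multi-component mixture additionally needs mixing rules, which are omitted here, and the function names and the methane constants in the usage note are assumptions chosen for illustration.

```python
import math

R = 8.314462618  # universal gas constant, J/(mol K)


def srk_alpha(T, Tc, omega):
    """Soave temperature correction; equals 1 at the critical temperature."""
    m = 0.480 + 1.574 * omega - 0.176 * omega ** 2
    return (1.0 + m * (1.0 - math.sqrt(T / Tc))) ** 2


def srk_pressure(T, V, Tc, Pc, omega):
    """SRK pressure in Pa for temperature T (K) and molar volume V (m^3/mol).

    Tc (K), Pc (Pa) and omega (acentric factor) describe one pure component.
    """
    a = 0.42748 * R ** 2 * Tc ** 2 / Pc   # attraction parameter
    b = 0.08664 * R * Tc / Pc             # co-volume
    return R * T / (V - b) - a * srk_alpha(T, Tc, omega) / (V * (V + b))
```

For example, with approximate methane constants (Tc = 190.56 K, Pc = 4.599e6 Pa, omega = 0.011), `srk_pressure(300.0, 1.0, 190.56, 4.599e6, 0.011)` stays very close to the ideal-gas value R*T/V, as expected at such a large molar volume.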

A Generalized Coordination Setting Method for Distribution Systems with Closed-loop

Protection issues in open-loop and closed-loop distribution systems are studied, and a generalized protection setting scheme based on traditional overcurrent protection theory is proposed to meet the new requirements. The setting method is intended to be easy to implement as a computer program, so that on-line adaptive setting for coordination in distribution systems can be realized. An automatic setting program is created and applied to several cases. The setting results are verified by the coordination curves of the protective devices, which are plotted using MATLAB.

Quasi Multi-Pulse Back-to-Back Static Synchronous Compensator Employing Line Frequency Switching 2-Level GTO Inverters

A back-to-back static synchronous compensator (BtBSTATCOM) consists of two back-to-back voltage-source converters (VSC) with a common DC link in a substation. This configuration extends the capabilities of a conventional STATCOM so that bidirectional active power transfer from one bus to another is possible. In this paper, the VSCs are designed in PSCAD/EMTDC in quasi multi-pulse form, in which the GTOs are triggered only once per cycle. The design details of the VSCs, as well as the gate switching circuits and controllers, are fully presented. The regulation modes of the BtBSTATCOM are verified and tested on a multi-machine power system through different simulation cases. The results, presented in the form of typical time responses, show that practical PI controllers are almost robust and stable in the cases of start-up, set-point change, and line faults.

Face Recognition with Image Rotation Detection, Correction and Reinforced Decision using ANN

Rotation or tilt present in an image captured by digital means can be detected and corrected using an Artificial Neural Network (ANN) for application with a Face Recognition System (FRS). Principal Component Analysis (PCA) features of faces at different angles are used to train an ANN that detects the rotation of an input image, which is then corrected using a set of operations implemented by another ANN-based system. The work also deals with the recognition of human faces using features from the forehead, eyes, nose and mouth as decision support entities of the system, configured using a Generalized Feed Forward Artificial Neural Network (GFFANN). These features are combined to provide a reinforced decision for verification of a person's identity despite illumination variations. The complete system, performing facial image rotation detection, correction and recognition using reinforced decision support, provides a success rate in the higher 90s.

The Spanning Laceability of k-ary n-cubes when k is Even

The k-ary n-cube $Q_n^k$ has been shown to be an alternative to the hypercube family. For any even integer $k \ge 4$ and any integer $n \ge 2$, $Q_n^k$ is a bipartite graph. In this paper, we prove that given any pair of vertices, w and b, from different partite sets of $Q_n^k$, there exist 2n internally disjoint paths between w and b, denoted by $\{P_i \mid 0 \le i \le 2n-1\}$, such that $\bigcup_{i=0}^{2n-1} P_i$ covers all vertices of $Q_n^k$. The result is optimal since each vertex of $Q_n^k$ has exactly 2n neighbors.
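
The two structural facts this abstract relies on, that every vertex of the k-ary n-cube has 2n neighbors and that the graph is bipartite when k is even, can be checked by brute force for small cases. The sketch below assumes the usual definition (vertices are n-tuples over 0..k-1, adjacent when they differ by 1 modulo k in exactly one coordinate); the function names are hypothetical and the code does not construct the spanning paths themselves.

```python
from itertools import product


def neighbors(vertex, k):
    """Neighbors of a vertex: change one coordinate by +1 or -1 modulo k."""
    result = set()
    for i, digit in enumerate(vertex):
        for step in (1, -1):
            result.add(vertex[:i] + ((digit + step) % k,) + vertex[i + 1:])
    return result


def check_cube(k, n):
    """True if every vertex has 2n neighbors and coordinate-sum parity
    is a valid 2-coloring (no edge joins two vertices of equal parity)."""
    for v in product(range(k), repeat=n):
        nbrs = neighbors(v, k)
        if len(nbrs) != 2 * n:
            return False
        if any(sum(u) % 2 == sum(v) % 2 for u in nbrs):
            return False
    return True
```

For even k the wrap-around edge from k-1 to 0 changes the coordinate sum by an odd amount, so parity flips along every edge; for odd k it does not, and the check fails.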

Gradual Shot Boundary Detection and Classification Based on Fractal Analysis

Shot boundary detection is a fundamental step in the organization of large video data. In this paper, we propose a new method for detecting and classifying gradual shot transitions in video, using the advantages of fractal analysis and an AIS-based classifier. The proposed features are the "vertical intercept" and "fractal dimension" of each video frame, which are computed using Fourier transform coefficients. We also use a classifier based on the Clonal Selection Algorithm. We have implemented our solution and assessed it on the TRECVID2006 benchmark dataset.

Data Preprocessing for Supervised Learning

Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data is first and foremost. If much irrelevant and redundant information is present, or the data are noisy and unreliable, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take a considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be nice if a single sequence of data pre-processing algorithms had the best performance for each data set, but this is not the case. Thus, we present the best-known algorithms for each step of data pre-processing so that one can achieve the best performance for a given data set.
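
As a small illustration of the normalization step mentioned in this abstract, the sketch below shows two widely used rescalings of a single numeric feature. The abstract does not say which variants the paper covers, so the choice of min-max and z-score scaling, and the function names, are assumptions.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on their mean and scale by the population std. deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

In practice the scaling parameters (minimum, maximum, mean, standard deviation) would be computed on the training set only and then reused for unseen data.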

A Codebook-based Redundancy Suppression Mechanism with Lifetime Prediction in Cluster-based WSN

A Wireless Sensor Network (WSN) comprises sensor nodes that are designed to sense the environment and transmit the sensed data back to the base station via multi-hop routing, so that physical phenomena can be reconstructed. Since temporal and spatial redundancy overlap significantly in physical phenomena, sensor nodes need Redundancy Suppression Algorithms (RSA) to lower energy consumption by reducing the transmission of redundant data. A conventional RSA is the threshold-based RSA, which sets a threshold to suppress redundant data. Although many temporal and spatial RSAs have been proposed, temporal-spatial RSAs are seldom proposed because it is difficult to determine when to use a temporal or a spatial RSA. In this paper, we propose a novel temporal-spatial redundancy suppression algorithm, the Codebook-based Redundancy Suppression Mechanism (CRSM). CRSM adopts vector quantization to generate a codebook, which is easily used to implement a temporal-spatial RSA. CRSM not only achieves power saving and reliability for the WSN, but also makes the network lifetime predictable. Simulation results show that the network lifetime under CRSM exceeds that of other RSAs by at least 23%.
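
The general idea of using a vector-quantization codebook to suppress redundant transmissions can be sketched as below. This is not the CRSM algorithm, whose details the abstract does not give: the codebook is assumed to be fixed and already shared by node and base station, and the rule "send the codeword index only when it changes" is a hypothetical simplification.

```python
def nearest_codeword(vector, codebook):
    """Index of the codeword closest to `vector` in squared Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))


def suppress_redundancy(readings, codebook):
    """Return the (time, index) pairs a node would transmit.

    Each reading is quantized to its nearest codeword; a transmission
    happens only when the codeword index differs from the last one sent.
    """
    sent = []
    last = None
    for t, reading in enumerate(readings):
        idx = nearest_codeword(reading, codebook)
        if idx != last:
            sent.append((t, idx))
            last = idx
    return sent
```

With a two-entry codebook of (temperature, humidity) vectors, four consecutive readings that fall near the first and then the second codeword produce only two transmissions instead of four.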

Modeling of Reinforcement in Concrete Beams Using Machine Learning Tools

The paper discusses the results obtained in predicting the reinforcement in singly reinforced beams using Neural Nets (NN), Support Vector Machines (SVMs) and Tree Based Models. A major advantage of SVMs over NN is that they minimize a bound on the generalization error of the model rather than a bound on the mean square error over the data set, as is done in NN. The Tree Based approach divides the problem into a small number of sub-problems to reach a conclusion. Data were generated for different beam parameters, with the reinforcement calculated using the limit state method, for the creation and validation of the models. The results from this study suggest a remarkably good performance of the tree based and SVM models. Further, this study found that these two techniques work well and even better than Neural Network methods. A comparison of predicted values with actual values suggests a very good correlation coefficient with all four techniques.

Bi-lingual Handwritten Character and Numeral Recognition using Multi-Dimensional Recurrent Neural Networks (MDRNN)

The continued success of ANN depends considerably on the use of hybrid structures implemented on cooperative frameworks. Hybrid architectures give the ANN the ability to validate heterogeneous learning paradigms. This work describes the implementation of a set of Distributed and Hybrid ANN models for Character Recognition applied to Anglo-Assamese scripts. The objective is to describe the effectiveness of Hybrid ANN setups as an innovative means of neural learning for an application such as multilingual handwritten character and numeral recognition.

Novelty as a Measure of Interestingness in Knowledge Discovery

Rule discovery is an important technique for mining knowledge from large databases. The use of objective measures for discovering interesting rules leads to another data mining problem, although one of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules and ultimately improve the overall efficiency of the KDD process. In this paper we study the novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach based on both objective and subjective measures to quantify the novelty of the discovered rules in terms of their deviations from the known rules (knowledge). We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to a user-specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are promising.

Effects of Different Plant Densities on the Yield and Quality of Second Crop Sesame

Sesame is one of the oldest and most important oil crops, grown both as a main crop and as a second crop. This study was carried out to determine the effects of different inter- and intra-row spacings on the yield and yield components of second crop sesame. It was set up at the Antalya West Mediterranean Agricultural Research Institute in 2009. The Muganlı 57 sesame cultivar was used as plant material. The field experiment was set up in a split plot design; row spacings (30, 40, 50, 60 and 70 cm) were assigned to the main plots and intra-row spacings (5, 10, 20 and 30 cm) were assigned to the subplots. Seed yield, oil ratio, oil yield, protein ratio and protein yield were investigated. In general, wider inter-row and intra-row spacings resulted in decreased seed yield, oil yield and protein yield. The highest seed yield, oil yield and protein yield (1115.0 kg ha⁻¹, 551.3 kg ha⁻¹ and 224.7 kg ha⁻¹, respectively) were obtained from the 30x5 cm plant density, while the lowest seed yield, oil yield and protein yield (677.0 kg ha⁻¹, 327.0 kg ha⁻¹ and 130.0 kg ha⁻¹, respectively) were recorded from the 70x30 cm plant density. As a result, in terms of oil yield for second crop sesame agriculture, a 30 cm row spacing with a 5 cm intra-row spacing is the most suitable plant density.

Simulation and Analysis of the Shift Process for an Automatic Transmission

The automatic transmission (AT) is one of the most important components of many automobile transmission systems. The shift quality has a significant influence on the ride comfort of the vehicle. During the AT shift process, joint elements such as the clutch and bands engage or disengage, linking sets of gears to create a fixed gear ratio. Since these ratios differ between gears in a fixed gear ratio transmission, the motion of the vehicle could change suddenly during the shift process if the joint elements are engaged or disengaged inappropriately, additionally impacting the entire transmission system and increasing the temperature of the connecting elements. The objective was to establish a system model for an AT powertrain using Matlab/Simulink. This paper further analyses the effect of varying hydraulic pressure and the associated impact on shift quality during both engagement and disengagement of the joint elements, showing that shift quality improvements can be achieved with appropriate hydraulic pressure control.

Solver for a Magnetic Equivalent Circuit and Modeling the Inrush Current of a 3-Phase Transformer

Knowledge about the magnetic quantities in a magnetic circuit is always of great interest. On the one hand, this information is needed for the simulation of a transformer. On the other hand, parameter studies are more reliable if the magnetic quantities are derived from a well-established model. One possibility for modeling the 3-phase transformer is to use a magnetic equivalent circuit (MEC). Though this is a well-known approach, it is often not an easy task to set up such a model for a large number of lumped elements, which additionally includes the nonlinear characteristic of the magnetic material. Here we show the setup of a solver for a MEC and compare the results of the calculation with measurements. The equations of the MEC are based on a rearranged system of the nodal analysis. Thus it is possible to achieve a minimum number of equations and a clear and simple structure. Hence, the solver is uncomplicated to handle and supports the iteration process. Additional helpful tasks are implemented within the solver to enhance the performance. The electric circuit is described by an electric equivalent circuit (EEC). Our results for the 3-phase transformer demonstrate the computational efficiency of the solver and show the benefit of applying a MEC.

A Rough-set Based Approach to Design an Expert System for Personnel Selection

Effective employee selection is a critical component of a successful organization. Many important criteria for personnel selection such as decision-making ability, adaptability, ambition, and self-organization are naturally vague and imprecise to evaluate. The rough sets theory (RST) as a new mathematical approach to vagueness and uncertainty is a very well suited tool to deal with qualitative data and various decision problems. This paper provides conceptual, descriptive, and simulation results, concentrating chiefly on human resources and personnel selection factors. The current research derives certain decision rules which are able to facilitate personnel selection and identifies several significant features based on an empirical study conducted in an IT company in Iran.

Rapid Frequency Response Measurement of Power Conversion Products with Coherence-Based Confidence Analysis

Switched-mode converters now play a significant role in modern society. Their operation is often crucial in various electrical applications affecting everyday life. Therefore, the quality of the converters needs to be reliably verified. Recent studies have shown that the converters can be fully characterized by a set of frequency responses, which can be efficiently used to validate the proper operation of the converters. Consequently, several methods have been proposed to measure the frequency responses quickly and accurately. Most often, correlation-based techniques have been applied. The presented measurement methods are highly sensitive to external errors and system nonlinearities. This fact has often been forgotten, and the necessary uncertainty analysis of the measured responses has been neglected. This paper presents a simple approach to analyzing the noise and nonlinearities in the frequency-response measurements of switched-mode converters. Coherence analysis is applied to form a confidence interval characterizing the noise and nonlinearities involved in the measurements. The presented method is verified by practical measurements from a high-frequency switched-mode converter.
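
For reference, the quantity usually meant by coherence in this setting is the magnitude-squared coherence between an input x and an output y. The abstract does not state which estimator or confidence-interval formula the authors use, so the definition below is the standard one and the symbols are assumptions: $G_{xy}$ is the cross-spectral density and $G_{xx}$, $G_{yy}$ are the auto-spectral densities.

```latex
\gamma_{xy}^{2}(f) = \frac{\lvert G_{xy}(f) \rvert^{2}}{G_{xx}(f)\, G_{yy}(f)},
\qquad 0 \le \gamma_{xy}^{2}(f) \le 1
```

A value near 1 at a given frequency indicates that the output is almost entirely a linear, noise-free function of the input there; lower values point to noise or nonlinearity in the measurement.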