A New History Based Method to Handle the Recurring Concept Shifts in Data Streams

Recent developments in storage technology and networking architectures have made it possible for a broad range of applications to rely on data streams for quick response and accurate decision making. Data streams are generated by real-world events, so it is reasonable to expect that associations among the occurrences of these events carry over to the concepts of the data streams. Extracting these hidden associations can help predict subsequent concepts in concept-shifting data streams. In this paper we present a new method for learning associations among the concepts of a data stream and predicting what the next concept will be. Knowing the next concept makes an informed update of the data model possible. The results of the conducted experiments show that the proposed method is suitable for classifying concept-shifting data streams.
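
The idea of learning associations among concepts and predicting the next one can be illustrated with a minimal sketch. The abstract does not describe the authors' algorithm, so the first-order transition counting below is only an assumed stand-in, and the names `learn_transitions` and `predict_next` and the concept labels are hypothetical.

```python
from collections import Counter, defaultdict

def learn_transitions(concept_history):
    """Count how often each concept is directly followed by each other concept."""
    transitions = defaultdict(Counter)
    for current, following in zip(concept_history, concept_history[1:]):
        transitions[current][following] += 1
    return transitions

def predict_next(transitions, current):
    """Return the most frequent successor of `current`, or None if it was never seen."""
    if current not in transitions or not transitions[current]:
        return None
    return transitions[current].most_common(1)[0][0]

# Hypothetical history of detected concepts, oldest first.
history = ["A", "B", "C", "A", "B", "A", "B", "C"]
model = learn_transitions(history)
print(predict_next(model, "A"))  # B
```

With such a prediction in hand, a classifier trained earlier for the predicted concept could be reloaded when a shift is detected, rather than relearning from scratch.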

Detecting Email Forgery using Random Forests and Naïve Bayes Classifiers

Because email communication has no consistent authentication procedure to ensure authenticity, we present an investigative analysis approach for detecting forged emails based on Random Forests and Naïve Bayes classifiers. Instead of investigating the email headers, we use the body content to extract a unique writing style for each of the possible suspects. Our approach consists of four main steps: (1) the cybercrime investigator extracts different effective features, including structural, lexical, linguistic, and syntactic evidence, from previous emails of all the possible suspects; (2) the extracted feature vectors are normalized to increase the accuracy rate; (3) the normalized features are then used to train the learning engine; (4) upon receiving the anonymous email (M), we apply the feature extraction process to produce a feature vector. Finally, using the machine learning classifiers, the email is assigned to the suspect whose writing style most closely matches M. Experimental results on real data sets show the improved performance of the proposed method and its ability to identify authors with a very limited number of features.

Investigation of Phytoextraction Coefficient Different Combination of Heavy Metals in Barley and Alfalfa

Two separate experiments, one with barley and one with alfalfa, were conducted in a 2×8 factorial completely randomised design with four replicates. The factors were inoculation with Glomus mosseae (M) or no inoculation (M0), and seven levels of contaminants (Co, Cd, Pb and combinations) plus an uncontaminated control treatment (C). Heavy metals in plant tissues and soil were quantified by Inductively Coupled Plasma Optical Emission Spectrometer (ICP-OES) (Variant- Liberty 150AX Turbo). The phytoextraction coefficient of each contaminant was calculated as the concentration of the heavy metal in the shoot (mg kg-1) divided by its concentration in the soil (mg kg-1). In barley, the highest phytoextraction coefficients of Pb, Cd and Co were in M0Pb, M0PbCoCd and MCo, respectively (P

Face Recognition Using Eigen face Coefficients and Principal Component Analysis

Face recognition is a field with multidimensional applications. Extensive work has been done on most aspects of face recognition, and face recognition using PCA is one such approach. In this paper, PCA is used for feature extraction, and the face under consideration is matched against the test image using eigenface coefficients. The crux of the work lies in optimizing the Euclidean distance and paving the way to test the same algorithm in Matlab, an efficient tool with a powerful user interface that also makes complex images simple to represent.
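
The matching step, comparing eigenface coefficients by Euclidean distance, can be sketched as follows. This is an illustration under assumptions, not the paper's Matlab implementation: the eigenfaces and mean face are taken as already computed by PCA, and the toy 4-pixel images, the labels and the helper names `project` and `closest_match` are hypothetical.

```python
import math

def project(face, mean_face, eigenfaces):
    """Project a flattened face onto each eigenface to obtain its coefficients."""
    centered = [p - m for p, m in zip(face, mean_face)]
    return [sum(c * e for c, e in zip(centered, eigenface)) for eigenface in eigenfaces]

def closest_match(probe_coeffs, gallery):
    """Return the gallery label whose coefficients are nearest in Euclidean distance."""
    return min(gallery, key=lambda label: math.dist(probe_coeffs, gallery[label]))

# Toy 4-pixel "images" and two orthonormal eigenfaces (hypothetical values).
mean_face = [0.5, 0.5, 0.5, 0.5]
eigenfaces = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, 0.5, -0.5]]
gallery = {
    "person_a": project([0.9, 0.8, 0.2, 0.1], mean_face, eigenfaces),
    "person_b": project([0.1, 0.9, 0.2, 0.8], mean_face, eigenfaces),
}
probe = project([0.8, 0.9, 0.1, 0.2], mean_face, eigenfaces)
print(closest_match(probe, gallery))  # person_a
```

Comparing in coefficient space rather than pixel space keeps the distance computation proportional to the number of retained eigenfaces instead of the image size.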

Experimental Study of the Extraction of Copper(II) from Sulphuric Acid by Means of Sodium Diethyldithiocarbamate (SDDT)

The present work studies the extraction of copper(II) from sulphuric acid solutions with sodium diethyldithiocarbamate (SDDT). Six different organic diluents were tested: dichloromethane, chloroform, carbon tetrachloride, toluene, xylene and cyclohexane. The SDDT/chloroform pair proved to be the most selective in removing the copper cations and was therefore used throughout the experimental study. The effects of operating parameters such as the initial concentration of the extracting agent, the agitation time, the agitation speed and the acid concentration were considered. For an initial Cu(II) concentration of 63 ppm in a 0.5 M sulphuric acid solution, with 20 mg of extracting agent, an extraction percentage of about 97.8% and a distribution coefficient of 44.42 were obtained, confirming the performance of the SDDT/chloroform pair.
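
The two reported figures are related by the usual liquid-liquid extraction formulas, which the short sketch below works through. It assumes equal aqueous and organic phase volumes, which the abstract does not state; the function names are hypothetical.

```python
def extraction_percentage(c_initial, c_final):
    """Percentage of solute removed from the aqueous phase."""
    return 100.0 * (c_initial - c_final) / c_initial

def distribution_coefficient(extraction_pct, v_aqueous=1.0, v_organic=1.0):
    """D = [Cu]org / [Cu]aq, expressed through the extraction percentage."""
    return extraction_pct / (100.0 - extraction_pct) * (v_aqueous / v_organic)

# Assumption: equal phase volumes, so the volume ratio is 1.
d = distribution_coefficient(97.8)
print(round(d, 2))  # 44.45
```

Under that assumption, 97.8% extraction gives D of about 44.45, which is close to the reported 44.42; the small gap is consistent with the percentage being quoted as "about 97.8%".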

A New Method for Rapid DNA Extraction from Artemia (Branchiopoda, Crustacea)

Artemia is one of the most conspicuous invertebrates associated with aquaculture. It can be considered a model organism, offering numerous advantages for comprehensive and multidisciplinary studies using morphological or molecular methods. Since DNA extraction is an important step in any molecular experiment, a new and rapid method of DNA extraction from adult Artemia is described in this study. The efficiency of this technique was also compared with two widely used alternatives, namely the Chelex® 100 resin and SDS-chloroform methods. Data analysis revealed that the new method is the easiest and most cost-effective of the three and allows quick and efficient extraction of DNA from the adult animal.

Comparison of Classical and Ultrasound-Assisted Extractions of Hyphaene thebaica Fruit and Evaluation of Its Extract as Antibacterial Activity in Reducing Severity of Erwinia carotovora

Erwinia carotovora var. carotovora is the main cause of soft rot in potatoes. Hyphaene thebaica, which inhibited the growth of E. carotovora on solid medium, was studied for biocontrol of the pathogen, together with a comparative study of classical and ultrasound-assisted extractions of Hyphaene thebaica fruit. The use of ultrasound significantly decreased the total treatment time and increased the total amount of crude extract. The in vitro activity of the crude extract was determined by a bioassay technique, which revealed that paper disks treated with the ultrasound extract of Hyphaene thebaica reduced the growth of the pathogen and produced inhibition zones up to 38 mm in diameter. The antioxidant activity of the ultrasound-ethanolic extract of Doum fruits (Hyphaene thebaica) was also determined. The data obtained showed that the extract contains secondary metabolites such as tannins, saponins, flavonoids, phenols, steroids, terpenoids, glycosides and alkaloids.

A Text Mining Technique Using Association Rules Extraction

This paper describes a text mining technique for automatically extracting association rules from collections of textual documents. The technique is called Extracting Association Rules from Text (EART). It relies on keyword features to discover association rules among the keywords labeling the documents. In this work, the EART system ignores the order in which the words occur and focuses instead on the words and their statistical distributions in documents. The main contributions of the technique are that it integrates XML technology with an Information Retrieval scheme (TF-IDF), which automatically selects the most discriminative keywords for use in association rule generation, and uses a Data Mining technique for association rule discovery. It consists of three phases: a Text Preprocessing phase (transformation, filtration, stemming and indexing of the documents), an Association Rule Mining (ARM) phase (applying our algorithm for Generating Association Rules based on a Weighting scheme, GARW) and a Visualization phase (visualization of results). Experiments were carried out on web page news documents related to the outbreak of the bird flu disease. The extracted association rules contain important features and describe the informative news included in the document collection. The performance of the EART system is compared with that of another system that uses the Apriori algorithm, in terms of execution time and an evaluation of the extracted association rules.
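
The two central ideas, TF-IDF keyword selection followed by rule discovery over the selected keywords, can be sketched in a few lines. This is not the GARW algorithm, whose details the abstract does not give: the rule generator below is a plain support/confidence filter over keyword pairs, and the function names, thresholds and toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_keywords(docs, top_k=3):
    """Keep the top_k highest-scoring TF-IDF terms of each tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    selected = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (tf[t] / len(doc)) * math.log(n_docs / doc_freq[t]) for t in tf}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected.append(set(ranked[:top_k]))
    return selected

def pair_rules(keyword_sets, min_support=0.5, min_confidence=0.6):
    """Generate rules lhs -> rhs between single keywords from co-occurrence counts."""
    n = len(keyword_sets)
    item_count = Counter(k for s in keyword_sets for k in s)
    pair_count = Counter(p for s in keyword_sets for p in combinations(sorted(s), 2))
    rules = []
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, count / n, confidence))
    return rules

# Hypothetical keyword sets, as if already selected from four news documents.
keyword_sets = [{"flu", "bird"}, {"flu", "bird", "virus"}, {"flu", "virus"}, {"bird", "flu"}]
for lhs, rhs, support, confidence in pair_rules(keyword_sets):
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Because word order is ignored, each document is reduced to a set of keywords, which is what allows the same counting machinery used for market-basket data to be applied to text.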

A New Ridge Orientation based Method of Computation for Feature Extraction from Fingerprint Images

An important step in studying the statistics of fingerprint minutia features is to reliably extract minutia features from the fingerprint images. A new reliable method of computation for minutiae feature extraction from fingerprint images is presented. A fingerprint image is treated as a textured image. An orientation flow field of the ridges is computed for the fingerprint image. To accurately locate ridges, a new ridge orientation based computation method is proposed. After ridge segmentation a new method of computation is proposed for smoothing the ridges. The ridge skeleton image is obtained and then smoothed using morphological operators to detect the features. A post processing stage eliminates a large number of false features from the detected set of minutiae features. The detected features are observed to be reliable and accurate.

Automatic 3D Reconstruction of Coronary Artery Centerlines from Monoplane X-ray Angiogram Images

We present a new method for the fully automatic 3D reconstruction of the coronary artery centerlines, using two X-ray angiogram projection images from a single rotating monoplane acquisition system. During the first stage, the input images are smoothed using curve evolution techniques. Next, a simple yet efficient multiscale method, based on the information of the Hessian matrix, is introduced for the enhancement of the vascular structure. Hysteresis thresholding using different image quantiles is used to threshold the arteries. This stage is followed by a thinning procedure to extract the centerlines. The resulting skeleton image is then pruned using morphological and pattern recognition techniques to remove non-vessel-like structures. Finally, edge-based stereo correspondence is solved using a parallel evolutionary optimization method based on f symbiosis. The detected 2D centerlines combined with disparity map information allow the reconstruction of the 3D vessel centerlines. The proposed method has been evaluated on patient data sets.

Human Verification in a Video Surveillance System Using Statistical Features

A human verification system is presented in this paper. The system consists of several steps: background subtraction, thresholding, line connection, region growing, morphology, star skeletonization, feature extraction, feature matching, and decision making. The proposed system combines the advantages of star skeletonization and simple statistical features. Correlation matching and probability voting are used for verification, followed by a logical operation in the decision-making stage. The proposed system uses a small number of features, and its reliability is convincing.

Study on Extraction of Lanthanum Oxide from Monazite Concentrate

Lanthanum oxide is to be recovered from monazite, which contains about 13.44% lanthanum oxide. The principal objective of this study is to extract lanthanum oxide from monazite of Moemeik Myitsone Area. The treatment of monazite in this study involves three main steps: extraction of lanthanum hydroxide from monazite using caustic soda; digestion with nitric acid and precipitation with ammonium hydroxide; and calcination of lanthanum oxalate to lanthanum oxide.

Spectral Analysis of Speech: A New Technique

ICA, which is generally used for the blind source separation problem, has been tested for feature extraction in a speech recognition system to replace the phoneme-based approach of MFCC. Feeding the generated cepstral coefficients to ICA, with the cepstral analysis acting as preprocessing, yields a new signal processing approach. This gives much better results than MFCC or ICA alone, both for word and speaker recognition. As expected, the mixing matrix A is different before and after MFCC, because Mel is a nonlinear scale. However, cepstral coefficients generated from Linear Predictive Coefficients, being independent, prove to be the right candidate for ICA. Matlab is the tool used for all comparisons. The database used consists of samples from ISOLET.

Determination of Penicillins Residues in Livestock and Marine Products by LC/MS/MS

A multi-residue analysis method for penicillins was developed and validated in bovine muscle, chicken, milk, and flatfish. Detection was based on liquid chromatography tandem mass spectrometry (LC/MS/MS). The developed method was validated for specificity, precision, recovery, and linearity. The analytes were extracted with 80% acetonitrile and cleaned up by a single reversed-phase solid-phase extraction step. Six penicillins presented recoveries higher than 76%, with the exception of amoxicillin (59.7%). Relative standard deviations (RSDs) were not more than 10%. LOQ values ranged from 0.1 to 4.5 µg/kg. The method was applied to 128 real samples. Benzylpenicillin was detected in 15 samples, cloxacillin in 7 samples, and oxacillin in 2 samples, but the detected levels were below the MRLs for penicillins.
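
Recovery and RSD, the two validation figures quoted above, are computed from replicate measurements of spiked samples. The sketch below shows the arithmetic only; the replicate values and spike level are invented for illustration and are not data from the study.

```python
import statistics

def recovery_percent(measured, spiked):
    """Recovery: measured concentration as a percentage of the spiked amount."""
    return 100.0 * measured / spiked

def rsd_percent(values):
    """Relative standard deviation: sample standard deviation over the mean, in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical replicates (µg/kg) measured after spiking at 10 µg/kg.
replicates = [7.4, 7.9, 7.6, 8.1, 7.5]
mean_recovery = statistics.mean(recovery_percent(m, 10.0) for m in replicates)
print(round(mean_recovery, 1), round(rsd_percent(replicates), 1))
```

A validation run would repeat this for each analyte and matrix and compare the results with the acceptance limits in use.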

The Performance Improvement of Automatic Modulation Recognition Using Simple Feature Manipulation, Analysis of the HOS, and Voted Decision

High Order Statistics (HOS) analysis is expected to provide many candidate features that can be selected for pattern recognition. More candidate features can be extracted by applying a simple manipulation through a specific mathematical function prior to the HOS analysis. A feature extraction method using HOS analysis combined with the Difference to the Nth-Power manipulation has been examined for Automatic Modulation Recognition (AMR), to recognize three digital modulation schemes, i.e. QPSK, 16QAM and 64QAM, in the AWGN transmission channel. Simulation results are reported for HOS analysis up to order 12 and the Difference to the Nth-Power manipulation up to N = 4. The accuracy rate of AMR reaches 90% at SNR > 10 dB when the classifier uses the Simple Decision method, and 96% at SNR > 2 dB when it uses the Voted Decision method.
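
To show why higher-order statistics can separate these three schemes, the sketch below computes one standard fourth-order cumulant, C42, for ideal noise-free constellations. It is a simplified illustration only: it does not implement the Difference to the Nth-Power manipulation, the order-12 analysis or either decision method, and the helper names are hypothetical.

```python
def square_qam(levels):
    """Ideal square constellation built from the given per-axis amplitude levels."""
    return [complex(i, q) for i in levels for q in levels]

def c42(symbols):
    """Fourth-order cumulant C42 of zero-mean symbols, normalised to unit power."""
    n = len(symbols)
    power = sum(abs(s) ** 2 for s in symbols) / n
    unit = [s / power ** 0.5 for s in symbols]
    m20 = sum(s * s for s in unit) / n          # E[x^2]
    m21 = sum(abs(s) ** 2 for s in unit) / n    # E[|x|^2], equals 1 after normalisation
    m42 = sum(abs(s) ** 4 for s in unit) / n    # E[|x|^4]
    return m42 - abs(m20) ** 2 - 2 * m21 ** 2

constellations = {
    "QPSK": square_qam([-1, 1]),
    "16QAM": square_qam([-3, -1, 1, 3]),
    "64QAM": square_qam([-7, -5, -3, -1, 1, 3, 5, 7]),
}
for name, points in constellations.items():
    print(name, round(c42(points), 3))  # approx. -1.0, -0.68, -0.619
```

The gap between 16QAM and 64QAM is small, which suggests why a single low-order feature may not be enough at low SNR and why additional feature candidates are attractive.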

Using Data Fusion for Biometric Verification

A wide spectrum of systems require reliable personal recognition schemes to either confirm or determine the identity of an individual person. This paper considers multimodal biometric systems and their applicability to access control, authentication and security applications. Strategies for feature extraction and sensor fusion are considered and contrasted. Issues related to performance assessment, deployment and standardization are discussed. Finally, future directions of biometric systems development are discussed.

Named Entity Recognition using Support Vector Machine: A Language Independent Approach

Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is nowadays considered fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports on the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state-of-the-art machine learning technique has been widely applied to NER in several well-studied languages, its use for Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with a variety of features that are helpful in predicting the four different named entity (NE) classes, namely Person name, Location name, Organization name and Miscellaneous name. We have used annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL). In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm to generate lexical context patterns from a part of the unlabeled Bengali news corpus. The lexical patterns have been used as features of the SVM in order to improve the system performance. The NER system has been tested with gold standard test sets of 35K and 60K tokens for Bengali and Hindi, respectively. Evaluation results have demonstrated recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show an improvement in the f-score of 5.13% with the use of context patterns. Statistical analysis (ANOVA) is also performed to compare the performance of the proposed NER system with that of the existing HMM-based system for both languages.
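
The reported f-scores follow from the reported precision and recall if the f-score is taken to be the balanced F1 measure (harmonic mean), which is an assumption since the abstract does not define it. A short check:

```python
def f_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f_score(80.12, 88.61), 2))  # Bengali: 84.15
print(round(f_score(74.34, 80.23), 2))  # Hindi: 77.17
```

Both values reproduce the figures quoted in the abstract to two decimal places.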

Feature Extraction for Surface Classification – An Approach with Wavelets

Surface metrology with image processing is a challenging task with wide applications in industry. Surface roughness can be evaluated using a texture classification approach. An important aspect here is the appropriate selection of features that characterize the surface. We propose an effective combination of features for multi-scale and multi-directional analysis of engineering surfaces. The features include standard deviation, kurtosis and the Canny edge detector. We apply the method by analyzing the surfaces with the Discrete Wavelet Transform (DWT) and the Dual-Tree Complex Wavelet Transform (DT-CWT). We used the Canberra distance metric for similarity comparison between the surface classes. Our database includes surface textures manufactured by three machining processes, namely Milling, Casting and Shaping. The comparative study shows that DT-CWT outperforms DWT, giving a correct classification rate of 91.27% with the Canberra distance metric.
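
The similarity step can be illustrated with the standard definition of the Canberra distance, the sum over features of |a - b| / (|a| + |b|). The feature vectors below are invented placeholders rather than wavelet features from the paper, and the nearest-class rule in `classify` is an assumed way of using the distance.

```python
def canberra(u, v):
    """Canberra distance; terms where both entries are zero contribute nothing."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total

def classify(query, class_features):
    """Assign the query to the class whose reference vector is nearest."""
    return min(class_features, key=lambda name: canberra(query, class_features[name]))

# Hypothetical per-class reference features (e.g. std. dev., kurtosis, edge density).
class_features = {
    "Milling": [0.8, 3.1, 0.2],
    "Casting": [1.5, 4.0, 0.6],
    "Shaping": [0.4, 2.5, 0.1],
}
print(classify([0.75, 3.0, 0.22], class_features))  # Milling
```

Because each term is normalised by the magnitude of the two values, features on very different scales contribute comparably, which suits mixed statistics such as standard deviation and kurtosis.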

Key Frames Extraction for Sign Language Video Analysis and Recognition

In this paper we propose a method for finding the video frames that represent one sign in the finger alphabet. The method is based on determining hand location, segmentation and the use of standard video quality evaluation metrics. Metric calculation is performed only in regions of interest. A sliding mechanism for finding local extrema and an adaptive threshold based on local averaging are used for key frame selection. The success rate is evaluated by recall, precision and F1 measure. The effectiveness of the method is compared with metrics applied to all frames. The proposed method is fast, effective and relatively easy to implement through simple preprocessing of the input video and subsequent use of tools designed for video quality measurement.
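
A sliding search for local extrema with a threshold based on local averaging might look like the sketch below. It rests on several assumptions not stated in the abstract: key frames are taken to be local maxima of the metric, the threshold is the mean over the same window plus a margin, and the per-frame scores are invented.

```python
def key_frames(scores, window=2, margin=0.0):
    """Indices that are maxima within +/-window frames and exceed the local mean plus margin."""
    selected = []
    for i, value in enumerate(scores):
        lo, hi = max(0, i - window), min(len(scores), i + window + 1)
        neighbourhood = scores[lo:hi]
        local_mean = sum(neighbourhood) / len(neighbourhood)
        if value == max(neighbourhood) and value > local_mean + margin:
            selected.append(i)
    return selected

# Hypothetical per-frame metric values computed inside the region of interest.
scores = [0.2, 0.3, 0.9, 0.4, 0.3, 0.35, 0.8, 0.85, 0.5, 0.3]
print(key_frames(scores))  # [2, 7]
```

Tying the threshold to the local mean lets the same settings work on video segments whose overall metric level differs, and a flat segment yields no key frames because no value exceeds its own neighbourhood average.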

Mining Genes Relations in Microarray Data Combined with Ontology in Colon Cancer Automated Diagnosis System

The MATCH project [1] entails the development of an automatic diagnosis system that aims to support the treatment of colon cancer by discovering mutations that occur in tumour suppressor genes (TSGs) and contribute to the development of cancerous tumours. The system is built on a) colon cancer clinical data and b) biological information that will be derived by data mining techniques from genomic and proteomic sources. The core mining module will consist of popular, well-tested hybrid feature extraction methods and new combined algorithms designed especially for the project. Elements of rough sets, evolutionary computing, cluster analysis, self-organizing maps and association rules will be used to discover the annotations between genes and their influence on tumours [2]-[11]. The methods used to process the data have to address its high complexity, potential inconsistency and the problem of missing values. They must integrate all the useful information necessary to answer the expert's question. For this purpose, the system has to learn from data, or have a domain specialist interactively specify, the part of the knowledge structure it needs to answer a given query. The program should also take into account the importance or rank of the particular parts of the data it analyses and adjust the algorithms used accordingly.