Abstract: Machine learning and data mining are two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a widely used technique for predicting qualitative variables and is generally preferred over regression from an operational point of view. Owing to the enormous increase in air pollution in various countries, especially China, air quality classification has become one of the most important topics in air quality research and modelling. This study introduces a hybrid classification model based on information theory and the Support Vector Machine (SVM), using air quality data for four cities in China, namely Beijing, Guangzhou, Shanghai and Tianjin, from January 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection classifies daily air quality into six levels, namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent, based on the corresponding Air Quality Index (AQI) values. Using information theory, the information gain (IG) is calculated and feature selection is performed for both categorical and continuous numeric features. The SVM algorithm is then applied to the selected features with cross-validation. The final evaluation shows that the hybrid IG and SVM model performs better than SVM alone, Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of both accuracy and complexity.
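The information gain step mentioned above can be illustrated with a minimal sketch. It assumes the standard definition IG = H(class) - H(class | feature), and it handles a continuous feature by trying binary splits at midpoints between sorted values. The abstract does not say how the study treats continuous features, so that part is an assumption. The feature names (`season`, `wind`, `pm25`) and the toy values are hypothetical and do not come from the study's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG of a categorical feature: H(labels) minus the weighted entropy per feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def best_threshold_gain(numeric_values, labels):
    """Assumed handling of a continuous feature: try each midpoint between
    sorted distinct values as a binary split; return (best gain, threshold)."""
    distinct = sorted(set(numeric_values))
    best = (0.0, None)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        binarized = [v > threshold for v in numeric_values]
        gain = information_gain(binarized, labels)
        if gain > best[0]:
            best = (gain, threshold)
    return best


# Hypothetical toy data: four days, two AQI classes.
labels = ["Good", "Good", "Light Pollution", "Light Pollution"]
season = ["summer", "summer", "winter", "winter"]  # separates the classes perfectly
wind = ["N", "S", "N", "S"]                        # carries no class information
pm25 = [20, 30, 90, 110]                           # continuous feature

ig_season = information_gain(season, labels)
ig_wind = information_gain(wind, labels)
ig_pm25, pm25_threshold = best_threshold_gain(pm25, labels)
```

Features would then be ranked by their gain and the low-scoring ones dropped before training the classifier. The cut-off used in the study is not given in the abstract.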
Abstract: The purpose of this work is the development of an
automatic classification system which could be useful for radiologists
in the investigation of breast cancer. The software has been designed
in the framework of the MAGIC-5 collaboration.
In the automatic classification system, suspicious regions with a
high probability of containing a lesion are extracted from the image as
regions of interest (ROIs). Each ROI is characterized by
features based on morphological lesion differences.
Several classifiers, namely a Feed Forward Neural Network, a K-Nearest
Neighbours classifier and a Support Vector Machine, are used to distinguish the
pathological records from the healthy ones.
The results obtained in terms of sensitivity (percentage of
pathological ROIs correctly classified) and specificity (percentage of
non-pathological ROIs correctly classified) are presented through
the Receiver Operating Characteristic (ROC) curve. In particular, the
best performance is an area under the ROC curve of (88 ± 1)%, obtained
with the Feed Forward Neural Network.
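
The two figures of merit defined above can be computed directly from a classifier's decisions. The following is a minimal sketch. The label encoding (1 = pathological ROI, 0 = healthy ROI) and the toy predictions are assumptions made for illustration and are not MAGIC-5 data.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding: 1 = pathological ROI, 0 = healthy ROI.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pathological, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # pathological, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical toy example: 4 pathological and 6 healthy ROIs.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(true_labels, predicted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Each choice of decision threshold on the classifier output gives one such (sensitivity, specificity) pair, and the ROC curve collects these pairs over all thresholds.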
Abstract: The purpose of this work is to develop an automatic classification system that could be useful for radiologists in the investigation of breast cancer. The software has been designed in the framework of the MAGIC-5 collaboration. In an automatic classification system, suspicious regions with a high probability of containing a lesion are extracted from the image as regions of interest (ROIs). Each ROI is characterized by features generally based on morphological lesion differences. The feature space representation is studied, and several classifiers are tested to distinguish the pathological regions from the healthy ones. The results, in terms of sensitivity and specificity, are presented through ROC (Receiver Operating Characteristic) curves. In particular, the Neural Networks perform best in comparison with the K-Nearest Neighbours and the Support Vector Machine: the Radial Basis Function network gives the best results, with an area under the ROC curve of 0.89 ± 0.01, but similar results are obtained with the Probabilistic Neural Network and a Multi Layer Perceptron.
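
As a rough illustration of how an area under the ROC curve is obtained from classifier scores, the sketch below sweeps the decision threshold over every distinct score and integrates the resulting curve with the trapezoidal rule. The scores and labels are hypothetical toy values, and the abstract does not state how the quoted uncertainty on the area was estimated, so that step is not shown.

```python
def roc_points(true_labels, scores):
    """Return (false positive rate, true positive rate) pairs, starting at (0, 0),
    with one point per distinct threshold taken in decreasing order."""
    positives = sum(true_labels)
    negatives = len(true_labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(true_labels, scores) if t == 1 and s >= threshold)
        fp = sum(1 for t, s in zip(true_labels, scores) if t == 0 and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal rule over points ordered by non-decreasing false positive rate."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy scores: 1 = pathological ROI, 0 = healthy ROI.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
auc = area_under_curve(curve)
print(f"AUC = {auc:.3f}")
```

The area equals the fraction of (pathological, healthy) pairs in which the pathological ROI receives the higher score, which is why it is a threshold-independent way of comparing classifiers.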