Data Mining Classification Methods Applied in Drug Design

Data mining comprises a group of statistical methods used to analyze a set of information, or a data set. It operates with models and algorithms, which are powerful tools with great potential. They help people understand the patterns in a given body of information, so data mining tools have a wide range of applications. For example, in theoretical chemistry, data mining tools can be used to predict molecular properties or to improve computer-assisted drug design. Classification analysis is one of the major data mining methodologies. The aim of this contribution is to create a classification model that can handle a huge data set with high accuracy. For this purpose, logistic regression, Bayesian logistic regression and random forest models were built using R software. A Bayesian logistic regression model was also created in Latent GOLD software. These classification methods belong to supervised learning methods. It was necessary to reduce the dimension of the data matrix before constructing the models, and thus factor analysis (FA) was used. The models were applied to predict the biological activity of molecules that are potential new drug candidates.
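
As a rough illustration of the supervised classification step, the sketch below fits a plain (non-Bayesian) logistic regression by gradient descent in Python. It is not the authors' R or Latent GOLD workflow: the two-column inputs stand in for factor scores that a prior factor analysis might produce, and the data, labels, learning rate and function names are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Predicted probability that a molecule is active (class 1)."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical factor scores (two factors per molecule) and activity labels.
X = [[-2.0, -1.0], [-1.5, -0.5], [-1.0, -1.5],
     [1.0, 1.5], [1.5, 0.5], [2.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]

model = fit_logistic(X, y)
predicted = [1 if predict_proba(model, x) >= 0.5 else 0 for x in X]
```

On real descriptor data, predictions would be evaluated on held-out molecules rather than on the training set used here.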

One-Dimensional Numerical Investigation of a Cylindrical Micro-Combustor Applying Electrohydrodynamics Effect

In this paper, a one-dimensional numerical approach is used to study the effect of applying electrohydrodynamics on the temperature and species mass fraction profiles along a microcombustor. The premixed mixture is H2-air with multi-step chemistry (9 species and 19 reactions). In micro-scale combustion, because of the increased surface-area-to-volume ratio, thermal and radical quenching mechanisms are important. There is also significant heat loss from the combustor walls. When a number of electrodes are inserted into the microcombustor and a high voltage is applied to them, corona discharge occurs. This drives the induced ions toward neutral molecules, with which they collide. The collisions set the molecules in motion and reattach the flow to the walls. The velocity near the walls increases, which thins the wall boundary layer. Consequently, applying the electrohydrodynamic mechanism can enhance the temperature profile in the microcombustor and ultimately prevent flame quenching.

Rigorous Electromagnetic Model of Fourier Transform Infrared (FT-IR) Spectroscopic Imaging Applied to Automated Histology of Prostate Tissue Specimens

Fourier transform infrared (FT-IR) spectroscopic imaging is an emerging technique that provides both chemically and spatially resolved information. The rich chemical content of data may be utilized for computer-aided determinations of structure and pathologic state (cancer diagnosis) in histological tissue sections for prostate cancer. FT-IR spectroscopic imaging of prostate tissue has shown that tissue type (histological) classification can be performed to a high degree of accuracy [1] and cancer diagnosis can be performed with an accuracy of about 80% [2] on a microscopic (≈ 6μm) length scale. In performing these analyses, it has been observed that there is large variability (more than 60%) between spectra from different points on tissue that is expected to consist of the same essential chemical constituents. Spectra at the edges of tissues are characteristically and consistently different from chemically similar tissue in the middle of the same sample. Here, we explain these differences using a rigorous electromagnetic model for light-sample interaction. Spectra from FT-IR spectroscopic imaging of chemically heterogeneous samples are different from bulk spectra of individual chemical constituents of the sample. This is because spectra not only depend on chemistry, but also on the shape of the sample. Using coupled wave analysis, we characterize and quantify the nature of spectral distortions at the edges of tissues. Furthermore, we present a method of performing histological classification of tissue samples. Since the mid-infrared spectrum is typically assumed to be a quantitative measure of chemical composition, classification results can vary widely due to spectral distortions. However, we demonstrate that the selection of localized metrics based on chemical information can make our data robust to the spectral distortions caused by scattering at the tissue boundary.

Petrology and Geochemistry of Granitic Rocks in South Sulawesi, Indonesia: Implication for Origin of Magma and Geodynamic Setting

The petrology and geochemical characteristics of granitic rocks from South Sulawesi, especially from the Polewali and Masamba areas, are presented in order to elucidate the origin of their magma and their geodynamic setting. The granitic rocks in these areas are dominated by granodiorite and granite in composition. Quartz, K-feldspar and plagioclase occur as major phases, with hornblende and biotite as the major ferromagnesian minerals. All of the samples plot in the calc-alkaline field, show metaluminous affinity and are typical of I-type granitic rocks. Harker diagrams indicate that the granitic rocks experienced fractional crystallization during magmatic evolution. Both groups display an extreme enrichment in LILE and LREE and a slight negative Eu anomaly, which resemble upper continental crust affinity. They were produced by partial melting of upper continental crust and have closely related source compositions within a suite. The geochemical characteristics indicate an arc-related subduction environment, which later gives evidence of continent-continent collision between an Australia-derived microcontinent and Sundaland to form a continental arc environment.
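
The metaluminous affinity mentioned above is conventionally judged from the molar ratios A/CNK = Al2O3/(CaO + Na2O + K2O) and A/NK = Al2O3/(Na2O + K2O). The Python sketch below shows one way such a check might be computed from whole-rock oxide analyses in wt%. The sample composition is invented for illustration and is not taken from the study, and the function names are hypothetical.

```python
# Molar masses of the relevant oxides in g/mol.
MOLAR_MASS = {"Al2O3": 101.96, "CaO": 56.08, "Na2O": 61.98, "K2O": 94.20}


def alumina_indices(oxides_wt):
    """Return (A/CNK, A/NK) molar ratios from oxide weight percentages."""
    mol = {name: oxides_wt[name] / MOLAR_MASS[name] for name in MOLAR_MASS}
    a_cnk = mol["Al2O3"] / (mol["CaO"] + mol["Na2O"] + mol["K2O"])
    a_nk = mol["Al2O3"] / (mol["Na2O"] + mol["K2O"])
    return a_cnk, a_nk


def alumina_class(oxides_wt):
    """Classify a rock by alumina saturation (Shand-type scheme)."""
    a_cnk, a_nk = alumina_indices(oxides_wt)
    if a_nk < 1.0:
        return "peralkaline"
    if a_cnk > 1.0:
        return "peraluminous"
    return "metaluminous"


# Hypothetical granodiorite analysis (wt%), for illustration only.
sample = {"Al2O3": 15.5, "CaO": 4.0, "Na2O": 3.5, "K2O": 3.0}
a_cnk, a_nk = alumina_indices(sample)
label = alumina_class(sample)
```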

Decision Trees for Predicting Risk of Mortality using Routinely Collected Data

It is well known that Logistic Regression is the gold standard method for predicting clinical outcome, especially risk of mortality. In this paper, the Decision Tree method is proposed to solve specific problems that commonly use Logistic Regression as a solution. The Biochemistry and Haematology Outcome Model (BHOM) dataset obtained from Portsmouth NHS Hospital from 1 January to 31 December 2001 was divided into four subsets. One subset of training data was used to generate a model, and the model obtained was then applied to three testing datasets. The performance of each model from both methods was then compared using calibration (the χ2 test or chi-test) and discrimination (area under ROC curve or c-index). The experiment showed that both methods give reasonable results in terms of the c-index. However, in some cases the calibration value (χ2) was quite high. After conducting experiments and investigating the advantages and disadvantages of each method, we can conclude that Decision Trees can be seen as a worthy alternative to Logistic Regression in the area of Data Mining.
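
The two evaluation measures named above can be sketched in a few lines of Python. The c-index is computed as the fraction of (death, survivor) pairs in which the death received the higher predicted risk, and calibration is shown as a Hosmer-Lemeshow-style χ2 over risk-ordered groups. The abstract does not say exactly which χ2 statistic or grouping the authors used, so the grouping scheme, function names and toy data here are assumptions.

```python
def c_index(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each outcome")
    concordant = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                concordant += 1.0
            elif p_pos == p_neg:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def hosmer_lemeshow_chi2(y_true, y_prob, groups=10):
    """Chi-square comparing observed and expected events per risk group."""
    pairs = sorted(zip(y_prob, y_true))  # order patients by predicted risk
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:
            chi2 += (observed - expected) ** 2 / variance
    return chi2


# Toy outcomes (1 = died) and predicted risks, for illustration only.
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

auc = c_index(outcomes, risks)
chi2 = hosmer_lemeshow_chi2(outcomes, risks, groups=2)
```

A c-index near 1 indicates good discrimination, while a large χ2 indicates that predicted risks disagree with observed event counts. The same two functions could be applied to the predictions of either a decision tree or a logistic regression model.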