Abstract: The medical data statistical analysis often requires the
using of some special techniques, because of the particularities of
these data. The principal components analysis and the data clustering
are two statistical methods for data mining very useful in the medical
field, the first one as a method to decrease the number of studied
parameters, and the second one as a method to analyze the
connections between diagnosis and the data about the patient-s
condition. In this paper we investigate the implications obtained from
a specific data analysis technique: the data clustering preceded by a
selection of the most relevant parameters, made using the principal
components analysis. Our assumption was that, using the principal
components analysis before data clustering - in order to select and to
classify only the most relevant parameters – the accuracy of
clustering is improved, but the practical results showed the opposite
fact: the clustering accuracy decreases, with a percentage
approximately equal with the percentage of information loss reported
by the principal components analysis.
Abstract: The medical studies often require different methods
for parameters selection, as a second step of processing, after the
database-s designing and filling with information. One common
task is the selection of fields that act as risk factors using wellknown
methods, in order to find the most relevant risk factors and
to establish a possible hierarchy between them. Different methods
are available in this purpose, one of the most known being the
binary logistic regression. We will present the mathematical
principles of this method and a practical example of using it in the
analysis of the influence of 10 different psychiatric diagnostics
over 4 different types of offences (in a database made from 289
psychiatric patients involved in different types of offences).
Finally, we will make some observations about the relation
between the risk factors hierarchy established through binary
logistic regression and the individual risks, as well as the results of
Chi-squared test. We will show that the hierarchy built using the
binary logistic regression doesn-t agree with the direct order of risk
factors, even if it was naturally to assume this hypothesis as being
always true.