Fuzzy Logic Approach to Robust Regression Models of Uncertain Medical Categories

Dichotomizing an outcome at a single cut-off point is an important part of many medical studies. The relationship between the resulting dichotomized dependent variable and the explanatory variables is usually analyzed with linear, probit, or logistic regression. In many real-life situations, however, the exact cut-off point dividing the outcome into two groups is unknown and can be specified only approximately, i.e. with some (small) uncertainty around it. To have any practical meaning, the regression model must therefore be robust to this uncertainty. In this paper, we show that neither the beta coefficient in the linear regression model nor its significance level is robust to small variations in the dichotomization cut-off point. As a robust alternative for the problem of uncertain medical categories, we propose a linear regression model with a fuzzy membership function as the dependent variable. This membership function denotes the degree to which the value of the underlying (continuous) outcome falls below or above the dichotomization cut-off point. We demonstrate that the linear regression model of the fuzzy dependent variable can be insensitive to uncertainty in the cut-off point location. We present modeling results from a real study of low hemoglobin levels in infants. We systematically test the robustness of the binomial regression model and of the linear regression model with the fuzzy dependent variable by changing the boundary for the category Anemia, and show that the behavior of the latter model persists over quite a wide interval.
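The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study. The linear-ramp shape of the membership function, its half-width, the synthetic data, the numeric cut-off range, and all function names are hypothetical choices made for the example. The sketch regresses a crisp indicator and a fuzzy membership value on one explanatory variable, then records how the fitted slope changes as the cut-off is shifted.

```python
import numpy as np


def crisp_indicator(y, cutoff):
    """Return 1.0 where the outcome falls below the cut-off, else 0.0."""
    return (np.asarray(y, dtype=float) < cutoff).astype(float)


def fuzzy_membership(y, cutoff, half_width):
    """Degree (0..1) to which the outcome falls below the cut-off.

    Assumed shape (not taken from the paper): a linear ramp that equals 1
    at cutoff - half_width and 0 at cutoff + half_width.
    """
    y = np.asarray(y, dtype=float)
    return np.clip((cutoff + half_width - y) / (2.0 * half_width), 0.0, 1.0)


def ols_slope(x, z):
    """Slope (beta) of the least-squares line z = alpha + beta * x."""
    return np.polyfit(x, z, 1)[0]


def slopes_over_cutoffs(x, y, cutoffs, half_width):
    """Fit both models at every cut-off and return the two slope arrays."""
    crisp = np.array([ols_slope(x, crisp_indicator(y, c)) for c in cutoffs])
    fuzzy = np.array(
        [ols_slope(x, fuzzy_membership(y, c, half_width)) for c in cutoffs]
    )
    return crisp, fuzzy


# Synthetic data for illustration only; these are not the study's data.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)                  # explanatory variable
y = 11.0 + 1.5 * x + rng.normal(0.0, 1.0, size=200)  # continuous outcome

# Shift the (arbitrary, illustrative) cut-off over a small interval.
cutoffs = np.linspace(10.5, 11.5, 11)
crisp, fuzzy = slopes_over_cutoffs(x, y, cutoffs, half_width=1.0)

print("crisp slope range:", crisp.max() - crisp.min())
print("fuzzy slope range:", fuzzy.max() - fuzzy.min())
```

The two printed ranges give a rough measure of how strongly each slope depends on the cut-off location. A fuller check along the lines of the abstract would also track the significance level of each slope.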


Authors:


