Defect Cause Modeling with Decision Tree and Regression Analysis

The main aim of this study is to identify the most influential variables that cause defects in items produced by a casting company located in Turkey. To this end, one of the company's items with a high percentage of defective units is selected. Two approaches, regression analysis and decision trees, are used to model the relationship between process parameters and defect types. Although the logistic regression models failed, the decision tree model gave meaningful results. Based on these results, the decision tree approach appears to be a promising technique for determining the most important process variables.



