Clustering Mixed Data Using Non-normal Regression Tree for Process Monitoring

In semiconductor manufacturing, large amounts of data are collected from sensors across multiple facilities. The sensor data have differing characteristics, depending on factors such as product type, preceding processes, and recipes. In general, Statistical Quality Control (SQC) methods assume that the data are normally distributed when detecting out-of-control process states. When data with different characteristics are used together as SQC inputs, the variation of the data increases, wider control limits are required, and the ability to detect out-of-control states decreases. Therefore, groups of similar data must be separated from the mixed data for more accurate process control. In this paper, we propose a regression tree whose split algorithm is based on the Pearson distribution, so that non-normal distributions can be handled with a parametric method. The regression tree finds data with similar properties across the different variables. Experiments using real semiconductor manufacturing process data show improved fault detection performance.
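The effect of mixing data on control limits can be illustrated with a minimal sketch. It is not the proposed regression tree: the two groups, their means, the 3-sigma limit rule, and the `control_limits` helper are all hypothetical choices made for illustration, and the synthetic data are normal for simplicity, whereas the paper targets non-normal data.

```python
import random
import statistics


def control_limits(values, k=3.0):
    """Return (lower, upper) limits as mean +/- k * sample standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return mean - k * std, mean + k * std


rng = random.Random(0)

# Two hypothetical product groups measured by the same sensor (synthetic data).
group_a = [rng.gauss(10.0, 1.0) for _ in range(500)]
group_b = [rng.gauss(20.0, 1.0) for _ in range(500)]
mixed = group_a + group_b

# Limits computed on the mixed data versus on group A alone.
pooled_lo, pooled_hi = control_limits(mixed)
a_lo, a_hi = control_limits(group_a)

pooled_width = pooled_hi - pooled_lo
group_width = a_hi - a_lo

# A hypothetical faulty group-A reading, far from the group-A mean.
fault = 15.0
detected_pooled = not (pooled_lo <= fault <= pooled_hi)
detected_grouped = not (a_lo <= fault <= a_hi)

print(f"pooled limits width:  {pooled_width:.2f}")
print(f"group A limits width: {group_width:.2f}")
print(f"fault detected with pooled limits:  {detected_pooled}")
print(f"fault detected with group A limits: {detected_grouped}")
```

Under these assumptions the pooled limits are several times wider than the group A limits, so a reading that is clearly abnormal for group A still falls inside the pooled limits. Separating the groups first restores the narrower limits.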



