On the Performance of Information Criteria in Latent Segment Models

Despite the widespread application of finite mixture models in segmentation, finite mixture model selection remains an important issue. In fact, selecting an adequate number of segments is a key step in deriving latent segment structures, and it is desirable that the selection criteria used for this purpose are effective. We conduct a simulation study to compare several information criteria that may support the selection of the correct number of segments. In particular, the study is intended to determine which information criteria are more appropriate for mixture model selection when the data sets contain only categorical segmentation base variables. The analysis is based on generated mixtures of multinomial data. As a result, we establish a relationship between the level of measurement of the segmentation variables and the performance of eleven information criteria. The AIC3 criterion shows the best performance (it indicates the correct number of segments in the simulated structures most often) for mixtures of multinomial segmentation base variables.
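The model-selection procedure the study evaluates can be sketched as follows: fit mixture models with increasing numbers of segments, compute each penalized-likelihood criterion from the maximized log-likelihood and the number of free parameters, and choose the segment count that minimizes the criterion. The log-likelihood values and parameter counts below are purely illustrative, not results from the study.

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """Standard penalized-likelihood criteria for choosing the number
    of segments; AIC3 uses a penalty of 3 per free parameter."""
    return {
        "AIC":  -2 * log_lik + 2 * n_params,
        "AIC3": -2 * log_lik + 3 * n_params,
        "BIC":  -2 * log_lik + n_params * math.log(n_obs),
        "CAIC": -2 * log_lik + n_params * (math.log(n_obs) + 1),
    }

# Hypothetical fits for k = 1..4 segments on n = 500 observations of
# categorical variables: k -> (maximized log-likelihood, free parameters,
# i.e. k - 1 mixing weights plus the segment-specific multinomial
# probabilities). These numbers are made up for illustration.
n_obs = 500
fits = {1: (-1400.0, 9), 2: (-1310.0, 19), 3: (-1302.0, 29), 4: (-1298.0, 39)}

for name in ("AIC", "AIC3", "BIC", "CAIC"):
    best_k = min(
        fits,
        key=lambda k: information_criteria(fits[k][0], fits[k][1], n_obs)[name],
    )
    print(f"{name}: choose {best_k} segments")
```

Each criterion trades goodness of fit against model complexity; they differ only in how heavily each additional parameter is penalized, which is why they can disagree on the number of segments.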
