Approximate Frequent Pattern Discovery Over Data Stream

Frequent pattern discovery over data stream is a hard problem because a continuously generated nature of stream does not allow a revisit on each data element. Furthermore, pattern discovery process must be fast to produce timely results. Based on these requirements, we propose an approximate approach to tackle the problem of discovering frequent patterns over continuous stream. Our approximation algorithm is intended to be applied to process a stream prior to the pattern discovery process. The results of approximate frequent pattern discovery have been reported in the paper.




References:
[1] C. Aggarwal, J. Han, J. Wang, and P. Yu, "A framework for clustering
evolving data streams," in Pro. Very Large Data Bases, 2003.
[2] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules
between sets of items in large databases," in Proc. ACM SIGMOD Int.
Conf. Management of Data, 1993, pp. 207-216.
[3] R. Agrawal and R. Srikant, "Fast algorithm for mining association
rules," in Proc. Int. Conf. Very Large Data Bases, 1994, pp. 487-499.
[4] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Model and
issues in data stream systems," in Pro. ACM PODS, 2002.
[5] J. Bilmes, "A gentle tutorial of the EM algorithm and its application to
parameter estimation for Gaussian mixture and hidden Markov models,"
Dept. Electrical Engineering and Computer Science, University of
California Berkeley, Technical Report TR-97-021, 1998.
[6] G. Coremode and S. Muthukrishnan, "What-s hot and what-s not:
Tracking most frequent items dynamically," in Pro. ACM PODS, 2003.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood
from incomplete data via the EM algorithm," Journal of the Royal
Statistical Society B, vol. 39, pp. 1-22, 1977.
[8] P. Domingos and G. Hulten, "A general method to scaling up machine
learning algorithms and its application to clustering," in Pro. ICML,
2001.
[9] M. A. T. Figueiredo and A. K. Jain, "Unsupervised learning of finite
mixture models," IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 24, pp. 381-396, 2002.
[10] M. Gaber, A. Zaslavsky, and S. Krishnaswamy, "Mining data stream: A
review," SIGMOD Record, vol. 34, pp. 18-26, 2005.
[11] A. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, "One-pass
wavelet decompositions of data streams," IEEE Trans. Knowledge and
Data Engineering, vol. 15, pp. 541-554, 2003.
[12] J. M. Marin, K. Mengersen, and C. Robert, "Bayesian modelling and
inference on mixtures of distributions," in Handbook of Statistics, vol.
25, Elsevier-Science, 2005.
[13] S. Muthukrishnan, "Data streams: Algorithms and applications," in
Proc. ACM-SIAM Symposium on Discrete Algorithm, 2003.
[14] B. Resch, "A tutorial for the course computational intelligence,"
Available: http://www.igi.tugraz.at/lehre/CI
[15] J. von Neumann, "Various techniques used in connection with random
digits," Applied Mathematics Series, vol. 12, National Bureau of
Standards, Washington, D.C., 1951.