Approximate Frequent Pattern Discovery Over Data Stream
Frequent pattern discovery over data stream is a hard
problem because a continuously generated nature of stream does not
allow a revisit on each data element. Furthermore, pattern discovery
process must be fast to produce timely results. Based on these
requirements, we propose an approximate approach to tackle the
problem of discovering frequent patterns over continuous stream.
Our approximation algorithm is intended to be applied to process a
stream prior to the pattern discovery process. The results of
approximate frequent pattern discovery have been reported in the
paper.
[1] C. Aggarwal, J. Han, J. Wang, and P. Yu, "A framework for clustering
evolving data streams," in Pro. Very Large Data Bases, 2003.
[2] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules
between sets of items in large databases," in Proc. ACM SIGMOD Int.
Conf. Management of Data, 1993, pp. 207-216.
[3] R. Agrawal and R. Srikant, "Fast algorithm for mining association
rules," in Proc. Int. Conf. Very Large Data Bases, 1994, pp. 487-499.
[4] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Model and
issues in data stream systems," in Pro. ACM PODS, 2002.
[5] J. Bilmes, "A gentle tutorial of the EM algorithm and its application to
parameter estimation for Gaussian mixture and hidden Markov models,"
Dept. Electrical Engineering and Computer Science, University of
California Berkeley, Technical Report TR-97-021, 1998.
[6] G. Coremode and S. Muthukrishnan, "What-s hot and what-s not:
Tracking most frequent items dynamically," in Pro. ACM PODS, 2003.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood
from incomplete data via the EM algorithm," Journal of the Royal
Statistical Society B, vol. 39, pp. 1-22, 1977.
[8] P. Domingos and G. Hulten, "A general method to scaling up machine
learning algorithms and its application to clustering," in Pro. ICML,
2001.
[9] M. A. T. Figueiredo and A. K. Jain, "Unsupervised learning of finite
mixture models," IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 24, pp. 381-396, 2002.
[10] M. Gaber, A. Zaslavsky, and S. Krishnaswamy, "Mining data stream: A
review," SIGMOD Record, vol. 34, pp. 18-26, 2005.
[11] A. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, "One-pass
wavelet decompositions of data streams," IEEE Trans. Knowledge and
Data Engineering, vol. 15, pp. 541-554, 2003.
[12] J. M. Marin, K. Mengersen, and C. Robert, "Bayesian modelling and
inference on mixtures of distributions," in Handbook of Statistics, vol.
25, Elsevier-Science, 2005.
[13] S. Muthukrishnan, "Data streams: Algorithms and applications," in
Proc. ACM-SIAM Symposium on Discrete Algorithm, 2003.
[14] B. Resch, "A tutorial for the course computational intelligence,"
Available: http://www.igi.tugraz.at/lehre/CI
[15] J. von Neumann, "Various techniques used in connection with random
digits," Applied Mathematics Series, vol. 12, National Bureau of
Standards, Washington, D.C., 1951.
[1] C. Aggarwal, J. Han, J. Wang, and P. Yu, "A framework for clustering
evolving data streams," in Pro. Very Large Data Bases, 2003.
[2] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules
between sets of items in large databases," in Proc. ACM SIGMOD Int.
Conf. Management of Data, 1993, pp. 207-216.
[3] R. Agrawal and R. Srikant, "Fast algorithm for mining association
rules," in Proc. Int. Conf. Very Large Data Bases, 1994, pp. 487-499.
[4] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Model and
issues in data stream systems," in Pro. ACM PODS, 2002.
[5] J. Bilmes, "A gentle tutorial of the EM algorithm and its application to
parameter estimation for Gaussian mixture and hidden Markov models,"
Dept. Electrical Engineering and Computer Science, University of
California Berkeley, Technical Report TR-97-021, 1998.
[6] G. Coremode and S. Muthukrishnan, "What-s hot and what-s not:
Tracking most frequent items dynamically," in Pro. ACM PODS, 2003.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood
from incomplete data via the EM algorithm," Journal of the Royal
Statistical Society B, vol. 39, pp. 1-22, 1977.
[8] P. Domingos and G. Hulten, "A general method to scaling up machine
learning algorithms and its application to clustering," in Pro. ICML,
2001.
[9] M. A. T. Figueiredo and A. K. Jain, "Unsupervised learning of finite
mixture models," IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 24, pp. 381-396, 2002.
[10] M. Gaber, A. Zaslavsky, and S. Krishnaswamy, "Mining data stream: A
review," SIGMOD Record, vol. 34, pp. 18-26, 2005.
[11] A. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, "One-pass
wavelet decompositions of data streams," IEEE Trans. Knowledge and
Data Engineering, vol. 15, pp. 541-554, 2003.
[12] J. M. Marin, K. Mengersen, and C. Robert, "Bayesian modelling and
inference on mixtures of distributions," in Handbook of Statistics, vol.
25, Elsevier-Science, 2005.
[13] S. Muthukrishnan, "Data streams: Algorithms and applications," in
Proc. ACM-SIAM Symposium on Discrete Algorithm, 2003.
[14] B. Resch, "A tutorial for the course computational intelligence,"
Available: http://www.igi.tugraz.at/lehre/CI
[15] J. von Neumann, "Various techniques used in connection with random
digits," Applied Mathematics Series, vol. 12, National Bureau of
Standards, Washington, D.C., 1951.
@article{"International Journal of Information, Control and Computer Sciences:58366", author = "Kittisak Kerdprasop and Nittaya Kerdprasop", title = "Approximate Frequent Pattern Discovery Over Data Stream", abstract = "Frequent pattern discovery over data stream is a hard
problem because a continuously generated nature of stream does not
allow a revisit on each data element. Furthermore, pattern discovery
process must be fast to produce timely results. Based on these
requirements, we propose an approximate approach to tackle the
problem of discovering frequent patterns over continuous stream.
Our approximation algorithm is intended to be applied to process a
stream prior to the pattern discovery process. The results of
approximate frequent pattern discovery have been reported in the
paper.", keywords = "Frequent pattern discovery, Approximate algorithm,
Data stream analysis.", volume = "1", number = "11", pages = "3544-5", }