Discovery of Time Series Event Patterns based on Time Constraints from Textual Data

This paper proposes a method for discovering time series event patterns from textual data that carries time information. A pattern is a sequence of events, and each event is extracted from the textual data: an event is a characteristic piece of content such as a company name, an action, or a customer's impression. Based on an analysis of the textual data, the method introduces seven types of time constraints and evaluates them when it calculates the frequency of a time series event pattern. Users can thus flexibly define time constraints for the combinations of events that interest them and discover valid time series event patterns that satisfy those constraints. The method is applied to daily business reports collected by a sales force automation system, and its effectiveness is verified through numerical experiments.
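
To make the idea of evaluating a time constraint during frequency calculation concrete, the following is a minimal sketch and not the paper's algorithm. The abstract does not list the seven constraint types, so the sketch assumes a single hypothetical one: a maximum gap (`max_gap`) between consecutive matched events. The function names, the data layout (one time-sorted list of `(date, events)` pairs per customer), and the event labels are all illustrative assumptions.

```python
from datetime import date, timedelta

def matches(sequence, pattern, max_gap):
    """Return True if `pattern` occurs in order within `sequence` and every
    pair of consecutive matched events is at most `max_gap` apart.

    `sequence` is a list of (date, set_of_events) pairs sorted by date.
    `max_gap` is a hypothetical constraint type chosen for illustration.
    """
    def search(start, p_idx, prev_time):
        if p_idx == len(pattern):
            return True  # every event of the pattern has been matched
        for i in range(start, len(sequence)):
            t, events = sequence[i]
            if prev_time is not None and t - prev_time > max_gap:
                break  # sequence is time-sorted, so later items are even further away
            # Backtracking: try this occurrence, fall through to later ones if it fails.
            if pattern[p_idx] in events and search(i + 1, p_idx + 1, t):
                return True
        return False

    return search(0, 0, None)

def support(sequences, pattern, max_gap):
    """Fraction of sequences that contain `pattern` under the gap constraint."""
    return sum(matches(s, pattern, max_gap) for s in sequences) / len(sequences)

# Illustrative data: events already extracted from dated reports, one list per customer.
sequences = [
    [(date(2005, 4, 1), {"inquiry"}),
     (date(2005, 4, 5), {"quotation"}),
     (date(2005, 4, 8), {"order"})],
    [(date(2005, 4, 1), {"inquiry"}),
     (date(2005, 5, 20), {"quotation"}),
     (date(2005, 5, 22), {"order"})],
    [(date(2005, 4, 2), {"complaint"}),
     (date(2005, 4, 3), {"inquiry"})],
]
pattern = ["inquiry", "quotation", "order"]

tight = support(sequences, pattern, timedelta(days=7))   # only the first sequence qualifies
loose = support(sequences, pattern, timedelta(days=60))  # the first two sequences qualify
```

With the constraint checked inside the matching step, the same pattern receives a different frequency depending on the allowed gap, which is the effect the abstract describes: patterns whose events are too far apart in time no longer count toward the support.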




