Event Information Extraction System (EIEE): FSM vs HMM

Automatic extraction of event information from social text streams (emails, social networking sites, blogs, etc.) is a key requirement for many applications, such as event planning and management systems and security applications. The key information components needed from event-related text are the event title, location, participants, date, and time. Emails differ distinctly from other social text streams in layout, format, and conversational style, and they are the most commonly used channel for broadcasting and planning events. We therefore chose emails as our dataset. In this work, we employ two NLP methods, Finite State Machines (FSM) and Hidden Markov Models (HMM), to extract event-related contextual information. We developed an application that compares the two methods on the event extraction task. It comprises two modules, one per method, and accepts both bulk input and direct user input. The results are evaluated using precision, recall, and F-score. Experiments show that both methods perform well: HMM performed better on title extraction, whereas FSM performed better on venue, date, and time.
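To illustrate the FSM approach to a structured field such as the date, here is a minimal sketch of a token-level finite state machine that recognizes two date layouts ("12 March 2011" and "March 12, 2011"). It assumes whitespace tokenization and English month names. The state names, the transition table, and the functions `token_class` and `extract_dates` are hypothetical illustrations, not the EIEE implementation.

```python
import re

# English month names and abbreviations (an assumption of this sketch).
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
    "jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "sept",
    "oct", "nov", "dec",
}

# Hypothetical transition table: (state, token class) -> next state.
TRANSITIONS = {
    ("START", "DAY"): "D",
    ("START", "MONTH"): "M",
    ("D", "MONTH"): "DM",    # "12 March"
    ("M", "DAY"): "MD",      # "March 12"
    ("DM", "YEAR"): "DMY",   # "12 March 2011"
    ("MD", "YEAR"): "MDY",   # "March 12, 2011"
}
ACCEPTING = {"DM", "MD", "DMY", "MDY"}


def token_class(token):
    """Map a raw token to the automaton's input alphabet."""
    t = token.lower().strip(".,;:()")
    if t in MONTHS:
        return "MONTH"
    m = re.fullmatch(r"(\d{1,2})(st|nd|rd|th)?", t)
    if m and 1 <= int(m.group(1)) <= 31:
        return "DAY"
    if re.fullmatch(r"(19|20)\d{2}", t):
        return "YEAR"
    return "OTHER"


def extract_dates(text):
    """Scan left to right, keeping the longest accepted match at each start position."""
    tokens = text.split()
    found, i = [], 0
    while i < len(tokens):
        state, last_accept, j = "START", None, i
        while j < len(tokens):
            state = TRANSITIONS.get((state, token_class(tokens[j])))
            if state is None:
                break  # no transition defined: the automaton rejects here
            j += 1
            if state in ACCEPTING:
                last_accept = j  # remember the end of the longest match so far
        if last_accept is None:
            i += 1
        else:
            found.append(" ".join(tokens[i:last_accept]).rstrip(".,"))
            i = last_accept
    return found
```

For example, `extract_dates("The seminar is on 12 March 2011 in Room 5, followed by a dinner on April 3.")` returns `['12 March 2011', 'April 3']`; "Room 5," is rejected because no month follows the number. Supporting another layout only requires new rows in `TRANSITIONS`, and time expressions could be handled by a second table driven by the same scanning loop.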
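For the HMM side, a title can be extracted by labeling each word with a hidden state and decoding the most likely state sequence with the Viterbi algorithm. The sketch below assumes a two-state model (T for a title word, O for any other word) whose observations are coarse word shapes rather than the words themselves. All probabilities are invented for illustration; a real model would estimate them from annotated emails. The names `word_class`, `viterbi`, and `extract_title` are hypothetical.

```python
import math

STATES = ("T", "O")  # hypothetical tags: T = title word, O = other word
START = {"T": 0.3, "O": 0.7}  # illustrative values, not trained
TRANS = {"T": {"T": 0.8, "O": 0.2}, "O": {"T": 0.1, "O": 0.9}}
EMIT = {
    "T": {"CAP": 0.7, "LOWER": 0.25, "OTHER": 0.05},
    "O": {"CAP": 0.2, "LOWER": 0.6, "OTHER": 0.2},
}


def word_class(word):
    """Reduce a word to a coarse observation symbol."""
    if word[:1].isupper():
        return "CAP"
    if word.isalpha():
        return "LOWER"
    return "OTHER"


def viterbi(words):
    """Return the most likely tag sequence, computed in log space."""
    if not words:
        return []
    obs = [word_class(w) for w in words]
    # score[s]: best log-probability of any tag path ending in state s so far
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backptrs = []
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def extract_title(words):
    """Join the words tagged as title words."""
    return " ".join(w for w, tag in zip(words, viterbi(words)) if tag == "T")
```

With these toy parameters, `viterbi("Annual Research Symposium is on campus".split())` returns `['T', 'T', 'T', 'O', 'O', 'O']`, so `extract_title` yields "Annual Research Symposium". Unlike a fixed pattern, the decoder weighs the whole sequence, which lets it cope with the loosely structured wording typical of titles.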
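The evaluation metrics can be computed per field (title, venue, date, time) by comparing extracted values with a manually annotated gold standard. The abstract does not say whether matches are exact or partial, or how scores are averaged. This sketch therefore assumes exact string matching, ignores duplicate values, and uses the balanced F1 score. The function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(extracted, gold):
    """Exact-match precision, recall and F1 for one field (e.g. all extracted dates)."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)  # extracted values that also appear in the gold annotation
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean (beta = 1)
    return precision, recall, f1
```

For instance, if a system extracts `["12 March 2011", "April 3", "5 May"]` and the gold dates are `["12 March 2011", "April 3", "June 1", "July 9"]`, two extractions are correct: precision is 2/3, recall is 2/4 = 0.5, and F1 is 4/7 (about 0.571).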



