Dynamic Inverted Index Maintenance

The majority of today's IR systems base the IR task on two main processes: indexing and searching. There exists a special group of dynamic IR systems where both processes (indexing and searching) happen simultaneously; such a system discards obsolete information, simultaneously dealing with the insertion of new in¬formation, while still answering user queries. In these dynamic, time critical text document databases, it is often important to modify index structures quickly, as documents arrive. This paper presents a method for dynamization which may be used for this task. Experimental results show that the dynamization process is possible and that it guarantees the response time for the query operation and index actualization.


Authors:



References:
[1] Apache, Jakarta project: Lucene. http://jakarta.apache.org.
[2] Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Chapter
8. ACM Press 1999.
[3] Brin, S., Page, L.: The anatomy of a large-scale hypertextual Web search
engine. Computer Networks and ISDN Systems 30, 1–7, 107–117, 1998.
[4] Brown, E.W.: Execution Performance Issues in Full-Text Information
Retrieval. Computer Science Department, University of Massachusetts at
Amherst, Technical Report 95-81, October 1995.
[5] Clarke, C., Cormack, G.: Dynamic Inverted Indexes for a Distributed
Full-Text Retrieval System. TechRep MT-95-01, University of Waterloo,
February 1995.
[6] Cutting, D., Pedersen, J.: Optimizations for dynamic inverted index
maintenance. Proceedings of SIGIR, 405-411, 1990.
[7] Fox, E.A., Harman, D.K., Baeza-Yates, R., Lee, W.C.: Inverted fi les. In
Information Retrieval: Data Structures and Algorithms, Prentice-Hall, pp
28-43.
[8] EGOTHOR, JAVA IR system. http://www.egothor.org/.
[9] Lim, L., Wang, M., Padmanabhan, S., Vitter, J.S., Agarwal, R.: Characterizing
Web Document Change, LNCS 2118, 133–146, 2001.
[10] Lim, L., Wang, M., Padmanabhan, S., Vitter, J.S., Agarwal, R.: Dynamic
Maintenance of Web Indexes Using Landmarks. Proc. of the 12th W3
Conference, 2003.
[11] Moffat, A., Zobel, J.: Self-Indexing Inverted Files for Fast Text Retrival.
ACM TIS, 349–379, October 1996, Volume 14, Number 4.
[12] Mehlhorn, K.: Data Structures and Effi cient Algorithms, Springer Verlag,
EATCS Monographs, 1984.
[13] Mehlhorn, K., Overmars, M.H.: Optimal Dynamization of Decomposable
Searching Problems. IPL 12, 93–98, 1981.
[14] Mehlhorn, K.: Lower Bounds on the Effi ciency of Transforming Static
Data Structures into Dynamic Data Structures. Math. Systems Theory 15,
1–16, 1981.
[15] Salton, G., Lesk, M.E.: Computer evaluation of indexing and text
processing. Journal of the ACM, 15(1):8-36, January 1968.
[16] Salton, G.: The SMART Retrieval System - Experiments in Automatic
Document Processing. Prentice Hall Inc., Englewood Cliffs, 1971.
[17] Tomasic, A., Garcia-Molina, H., Shoens, K.: Incremental Updates of
Inverted Lists for Text Document Retrieval. Short Version of Stanford University
Computer Science Technical Note STAN-CS-TN-93-1, December,
1993.