Using Genetic Algorithm to Improve Information Retrieval Systems

This study investigates the use of genetic algorithms in information retrieval. The method is shown to be applicable to three well-known document collections, where the genetic modification presents more relevant documents to users. We also present a new fitness function for approximate information retrieval that is faster and more flexible than the cosine similarity fitness function.



