On the Interactive Search with Web Documents

Due to the large amount of information in the World
Wide Web (WWW, web) and the lengthy and usually linearly
ordered result lists of web search engines that do not indicate
semantic relationships between their entries, the search for topically
similar and related documents can become a tedious task. In particular,
formulating queries with suitable terms that represent specific
information needs requires much effort from the user. This problem
is aggravated when the user's knowledge of a subject and its
technical terms is insufficient to do so. This article
presents the new and interactive search application DocAnalyser that
addresses this problem by enabling users to find similar and related
web documents based on automatic query formulation and state-of-the-art
search word extraction. Additionally, this tool can be used to
track topics across semantically connected web documents.




