Examining the Value of Attribute Scores for Author-Supplied Keyphrases in Automatic Keyphrase Extraction

Automatic keyphrase extraction helps users locate specific documents efficiently in online databases. Although several techniques have been introduced over the years, accuracy has improved only minimally. This research examines the attribute scores of author-supplied keyphrases to better understand how those scores affect the accuracy of automatic keyphrase extraction. Five attributes are examined: Term Frequency, First Occurrence, Last Occurrence, Phrase Position in Sentences, and Term Cohesion Degree. The results show that First Occurrence is the most reliable attribute. Term Frequency, Last Occurrence, and Term Cohesion Degree vary widely but remain usable with the suggested tweaks. Only Phrase Position in Sentences shows a totally unpredictable pattern. The results imply that the commonly used ranking approach, which takes the top-ranked phrases from the candidate keyphrase list directly as the keyphrases, may not be reliable.
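
To make three of the attributes concrete, the following is a minimal sketch of how Term Frequency, First Occurrence, and Last Occurrence might be scored for one phrase. The abstract does not define these attributes, so the definitions here are assumptions: Term Frequency is taken as a raw count, and the two occurrence attributes as the word offset of the first and last match divided by the document length in words. The function names `tokenize` and `attribute_scores` are hypothetical and do not come from the paper.

```python
import string


def tokenize(text):
    """Lowercase the text and split it into words, dropping edge punctuation."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in words if w]


def attribute_scores(words, phrase):
    """Return (term_frequency, first_occurrence, last_occurrence) for a phrase.

    Assumed definitions (not taken from the paper):
      term_frequency   - number of times the phrase appears in the document
      first_occurrence - word offset of the first match / document length
      last_occurrence  - word offset of the last match / document length
    """
    n = len(phrase)
    # Start index of every position where the phrase matches word for word.
    starts = [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase]
    if not starts:
        return 0, None, None
    total = len(words)
    return len(starts), starts[0] / total, starts[-1] / total


if __name__ == "__main__":
    doc = "Keyphrase extraction helps search. Good keyphrase extraction is hard."
    print(attribute_scores(tokenize(doc), tokenize("keyphrase extraction")))
```

Under these assumed definitions, a phrase that first appears near the start of a document gets a First Occurrence score close to 0, which is the kind of per-attribute score that can be inspected for author-supplied keyphrases.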



