Examining the Value of Attribute Scores for Author-Supplied Keyphrases in Automatic Keyphrase Extraction
Automatic keyphrase extraction is useful for efficiently
locating specific documents in online databases. While several
techniques have been introduced over the years, improvement in the
accuracy rate has been minimal. This research examines attribute scores for
author-supplied keyphrases to better understand how the scores affect
the accuracy rate of automatic keyphrase extraction. Five attributes
are chosen for examination: Term Frequency, First Occurrence, Last
Occurrence, Phrase Position in Sentences, and Term Cohesion
Degree. The results show that First Occurrence is the most reliable
attribute. Term Frequency, Last Occurrence and Term Cohesion
Degree display a wide range of variation but are still usable with
suggested tweaks. Only Phrase Position in Sentences shows a totally
unpredictable pattern. The results imply that the commonly used
ranking approach, which directly extracts the top-ranked potential phrases
from the candidate keyphrase list as the keyphrases, may not be reliable.
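
To make the attribute scores concrete, here is a minimal Python sketch that computes three of the five attributes for one phrase. The abstract does not define the attributes, so the definitions below are assumptions: Term Frequency is taken as a raw occurrence count, and First and Last Occurrence as the number of words preceding the first or last match divided by the document length. The function names `tokenize` and `attribute_scores` are hypothetical. Phrase Position in Sentences and Term Cohesion Degree are omitted because nothing here indicates how they are calculated.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def attribute_scores(document, phrase):
    """Return assumed attribute scores for a phrase, or None if it never occurs.

    Assumed definitions (not taken from the paper):
      term_frequency   - number of times the phrase occurs in the document
      first_occurrence - words before the first match / total words
      last_occurrence  - words before the last match / total words
    """
    words = tokenize(document)
    target = tokenize(phrase)
    if not words or not target:
        return None
    n = len(target)
    # Start index of every exact word-sequence match of the phrase.
    positions = [i for i in range(len(words) - n + 1) if words[i:i + n] == target]
    if not positions:
        return None
    return {
        "term_frequency": len(positions),
        "first_occurrence": positions[0] / len(words),
        "last_occurrence": positions[-1] / len(words),
    }


if __name__ == "__main__":
    text = (
        "Keyphrase extraction is hard. Automatic keyphrase extraction "
        "helps search. We study keyphrase extraction."
    )
    print(attribute_scores(text, "keyphrase extraction"))
```

Under these assumptions, scoring each author-supplied keyphrase of a document this way gives a per-attribute distribution whose spread can be inspected, which is the kind of examination the abstract describes.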
@article{Lim:54528,
  author = "Vicky Min-How Lim and Siew Fan Wong and Tong Ming Lim",
  title = "Examining the Value of Attribute Scores for Author-Supplied Keyphrases in Automatic Keyphrase Extraction",
  journal = "International Journal of Information, Control and Computer Sciences",
  abstract = "Automatic keyphrase extraction is useful for efficiently
locating specific documents in online databases. While several
techniques have been introduced over the years, improvement in the
accuracy rate has been minimal. This research examines attribute scores for
author-supplied keyphrases to better understand how the scores affect
the accuracy rate of automatic keyphrase extraction. Five attributes
are chosen for examination: Term Frequency, First Occurrence, Last
Occurrence, Phrase Position in Sentences, and Term Cohesion
Degree. The results show that First Occurrence is the most reliable
attribute. Term Frequency, Last Occurrence and Term Cohesion
Degree display a wide range of variation but are still usable with
suggested tweaks. Only Phrase Position in Sentences shows a totally
unpredictable pattern. The results imply that the commonly used
ranking approach, which directly extracts the top-ranked potential phrases
from the candidate keyphrase list as the keyphrases, may not be reliable.",
  keywords = "Accuracy, Attribute Score, Author-supplied keyphrases, Automatic keyphrase extraction",
  volume = "6",
  number = "12",
  pages = "1649-6",
}