String-Based Methods for Automatically Determining the Semantic Similarity of Russian-Language Texts

Kryukova Anna Vladimirovna

This paper addresses the task of estimating the semantic similarity of Russian-language texts using the DKPro Similarity software framework. The study ran experiments with lexical, language-independent text similarity measures. The results confirm that DKPro Similarity is suitable for estimating the semantic similarity of Russian-language texts.
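To illustrate what a lexical, language-independent similarity measure can look like, here is a minimal sketch of a Jaccard coefficient over character n-grams. It is an assumption made for illustration only: the abstract does not say which measures were used, the function names `char_ngrams` and `jaccard_similarity` are hypothetical, and the code does not call or reproduce the DKPro Similarity API.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a lowercased text
    with runs of whitespace collapsed to single spaces."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard coefficient of the two texts' character n-gram sets, in [0, 1].

    Uses no dictionaries or morphological analysis, so it works on
    Russian text just as on any other language (illustrative sketch only).
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        # Both texts are shorter than n characters: treat them as identical.
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    print(jaccard_similarity("кошка спит", "кошка ест"))
```

Because the measure works on raw character sequences, inflected Russian word forms that share a stem still overlap in most of their n-grams, which is one reason such string-based measures transfer across languages without extra resources.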

References:
  • Bär et al., 2012 — Bär D. et al. UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures // SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics. Vol. 1: Proceedings of the main conference and the shared task, and Vol. 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (2012). P. 435–440.
  • Bär, Zesch, Gurevych, 2013 — Bär D., Zesch T., Gurevych I. DKPro Similarity: An Open Source Framework for Text Similarity // Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations (2013). P. 121–126.
  • Bär, Zesch, Gurevych, 2015 — Bär D., Zesch T., Gurevych I. Composing Measures for Computing Text Similarity [Technical Report] (2015). URL: http://tuprints.ulb.tu-darmstadt.de/4342/1/TUD-CS-2015-0017.pdf (accessed: 15.04.17).
  • Broder, 1997 — Broder A. Z. On the resemblance and containment of documents // Proceedings of the Compression and Complexity of Sequences (1997). P. 21–29.
  • Clough, Stevenson, 2011 — Clough P., Stevenson M. Special Issue on Plagiarism and Authorship Analysis // Language Resources and Evaluation. Vol. 45(1) (2011). P. 5–24.
  • Korobov, 2015 — Korobov M. Morphological Analyzer and Generator for Russian and Ukrainian Languages // Analysis of Images, Social Networks and Texts (2015). P. 320–332.
  • Lyon, Barrett, Malcolm, 2004 — Lyon C., Barrett R., Malcolm J. A theoretical basis to the automated detection of copying // Plagiarism: Prevention, Practice and Policies Conference (2004).
  • Manning et al., 2008 — Manning C. D. et al. Introduction to Information Retrieval // Cambridge University Press, 2008.
  • Mihalcea et al., 2006 — Mihalcea R. et al. Corpus-based and Knowledge-based Measures of Text Semantic Similarity // Proceedings of the 21st national conference on Artificial intelligence. Vol. 1 (2006). P. 775–780.
  • Šarić et al., 2012 — Šarić F. et al. TakeLab: Systems for Measuring Semantic Text Similarity // SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics. Vol. 1: Proceedings of the main conference and the shared task, and Vol. 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (2012). P. 441–448.
  • Wise, 1996 — Wise M. J. Yap3: Improved detection of similarities in computer programs and other texts // Proceedings of SIGCSE '96 (1996). P. 130–134.
Pages:
225-230