Deep Neural Network Models for Sequence Labeling and Coreference Tasks: dissertation and author's abstract topic, VAK RF specialty 05.13.01, Candidate of Sciences Le The Anh

  • Le The Anh
  • Candidate of Sciences
  • 2020, Moscow Institute of Physics and Technology (National Research University), Federal State Autonomous Educational Institution of Higher Education
  • VAK RF specialty: 05.13.01
  • Number of pages: 143
Le The Anh. Deep Neural Network Models for Sequence Labeling and Coreference Tasks: Candidate of Sciences dissertation: 05.13.01 - System Analysis, Control, and Information Processing (by industry). Moscow Institute of Physics and Technology (National Research University), Federal State Autonomous Educational Institution of Higher Education. 2020. 143 pp.

Table of contents of the dissertation, Candidate of Sciences Le The Anh

Contents

Abstract

Acknowledgments

Abbreviations

List of Figures

List of Tables

1 Introduction

1.1 Overview of Deep Learning

1.1.1 Artificial Intelligence, Machine Learning, and Deep Learning

1.1.2 Milestones in Deep Learning History

1.1.3 Types of Machine Learning Models

1.2 Brief Overview of Natural Language Processing

1.3 Dissertation Overview

1.3.1 Scientific Actuality of the Research

1.3.2 The Goal and Task of the Dissertation

1.3.3 Scientific Novelty

1.3.4 Theoretical and Practical Value of the Work in the Dissertation

1.3.5 Statements to be Defended

1.3.6 Presentations and Validation of the Research Results

1.3.7 Publications

1.3.8 Dissertation Structure

2 Deep Neural Network Models for NLP Tasks

2.1 Word Representation Models

2.1.1 Word Representation

2.1.2 Prediction-based Models

2.1.3 Count-based Models

2.2 Deep Neural Network Models

2.2.1 Convolutional Neural Network

2.2.2 Recurrent Neural Network

2.2.3 Long Short-Term Memory Cells

2.2.4 LSTM Networks

2.3 Pre-trained Language Models

2.3.1 ELMo

2.3.2 Transformer

2.3.3 OpenAI's GPT

2.3.4 Google's BERT

2.4 Summary

3 Sequence Labeling with Character-aware Deep Neural Networks and Language Models

3.1 Introduction to the Sequence Labeling Tasks

3.2 Related Work

3.2.1 Rule-based Models

3.2.2 Feature-based Models

3.2.3 Deep Learning-based Models

3.2.4 Related Work on Vietnamese Named Entity Recognition

3.2.5 Related Work on Russian Named Entity Recognition

3.3 Tagging Schemes

3.4 Evaluation Metrics

3.5 WCC-NN-CRF Models for Sequence Labeling Tasks

3.5.1 Backbone WCC-NN-CRF Architecture

3.5.2 Language Model-based Architecture

3.6 Application of WCC-NN-CRF Models for Named Entity Recognition

3.6.1 Overview of Named Entity Recognition Task

3.6.2 Datasets and Pre-trained Word Embeddings

3.6.3 Evaluation of backbone WCC-NN-CRF Model

3.6.4 Evaluation of ELMo-based WCC-NN-CRF model

3.6.5 Evaluation of BERT-based Multilingual WCC-NN-CRF Model

3.7 Application of WCC-NN-CRF Model for Sentence Boundary Detection

3.7.1 Introduction to the Sentence Boundary Detection Task

3.7.2 Sentence Boundary Detection as a Sequence Labeling Task

3.7.3 Evaluation of WCC-NN-CRF SBD Model

3.8 Conclusions

4 Coreference Resolution with Sentence-level Coreferential Scoring

4.1 The Coreference Resolution Task

4.2 Related Work

4.2.1 Rule-based Models

4.2.2 Deep Learning Models

4.3 Coreference Resolution Evaluation Metrics

4.4 Baseline Model Description

4.5 Sentence-level Coreferential Relation-based Model

4.6 BERT-based Coreference Model

4.7 Experiments and Results

4.7.1 Datasets

4.7.2 Evaluation of Proposed Models

4.8 Conclusions

5 Conclusions

5.1 Conclusions for Sequence Labeling Task

5.2 Conclusions for Coreference Resolution Task

5.3 Summary of the Main Contributions of the Dissertation

Bibliography

Introduction to the dissertation (part of the author's abstract) on the topic "Deep Neural Network Models for Sequence Labeling and Coreference Tasks"

Abstract

Deep neural network models have recently received tremendous attention from both academia and industry and have achieved strong results in domains ranging from Computer Vision and Speech Recognition to Natural Language Processing (NLP). They have lifted the performance of machine learning-based systems to a new level, close to human-level performance. Accordingly, the number of deep learning projects has increased year by year. The IPavlov project (https://ipavlov.ai/), based at the Neural Networks and Deep Learning Lab of the Moscow Institute of Physics and Technology (MIPT), is one of them; it aims to build a set of pre-trained network models, predefined dialogue system components, and pipeline templates. This thesis is based on work carried out as part of this project and focuses on deep neural network models for Sequence Labeling and Coreference Resolution tasks.

This thesis consists of three main parts. Firstly, we systematically synthesize three key concepts in the field of Deep Learning for NLP that are closely related to the work carried out in this thesis: (1) two approaches to word representation learning, (2) deep neural network models often used to address machine learning tasks in general and NLP tasks in particular, and (3) cutting-edge pre-trained language models and their applications in downstream tasks. Secondly, we propose three deep neural network models for Sequence Labeling tasks: (1) a hybrid model consisting of three sub-networks that capture character-level features, capitalization features, and word context features, (2) a language modeling-based model, and (3) a multilingual model. The proposed models were evaluated on the task of Named Entity Recognition. Experiments on six datasets covering four languages (Russian, Vietnamese, English, and Chinese) showed that our models achieved state-of-the-art performance. In addition, we reformulated the task of Sentence Boundary Detection as a Sequence Labeling task and applied the proposed model to it. On two conversational datasets, Cornell Movie-Dialog and DailyDialog, the proposed model achieved F1 scores of 89.99% and 95.88%, respectively. Thirdly, we propose two models for the Coreference Resolution task: (1) a Sentence-level Coreferential Relation-based model that can take as input a paragraph, a discourse, or even a document with hundreds of sentences and predict the coreference relations between sentences, and (2) a language modeling-based model that leverages the power of modern language models to boost performance. Experimental results on two Russian datasets and comparisons with other existing models showed that our models obtained state-of-the-art results on both Anaphora and Coreference Resolution tasks.
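The reformulation of Sentence Boundary Detection as Sequence Labeling mentioned above can be illustrated with a minimal Python sketch. The tag names `EOS` and `O` and both helper functions are assumptions made for illustration; this excerpt does not specify the tagging scheme actually used in the dissertation.

```python
from typing import List, Tuple


def sentences_to_labels(sentences: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Flatten tokenized sentences into one token stream with one label per token.

    The last token of each sentence gets the (hypothetical) tag "EOS";
    every other token gets "O".
    """
    tokens: List[str] = []
    labels: List[str] = []
    for sentence in sentences:
        for i, token in enumerate(sentence):
            tokens.append(token)
            labels.append("EOS" if i == len(sentence) - 1 else "O")
    return tokens, labels


def labels_to_sentences(tokens: List[str], labels: List[str]) -> List[List[str]]:
    """Rebuild sentences from a token stream and its predicted labels."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == "EOS":
            sentences.append(current)
            current = []
    if current:  # trailing tokens with no predicted boundary
        sentences.append(current)
    return sentences


tokens, labels = sentences_to_labels([["how", "are", "you"], ["fine", "thanks"]])
restored = labels_to_sentences(tokens, labels)
```

Under this framing, any sequence labeling model that predicts one tag per token can be trained on unpunctuated conversational text, and its predictions can be turned back into sentences with the second helper.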

Conclusion of the dissertation on the topic "System Analysis, Control, and Information Processing (by industry)", Le The Anh

5.3 Summary of the Main Contributions of the Dissertation

In conclusion, the main contributions of the dissertation are:

1. An original hybrid model for the sequence labeling task was proposed and studied. This model extends existing Bi-LSTM CRF architectures with (1) a trainable CNN that generates a character-level representation of the input sequence and (2) a Bi-LSTM network that encodes capitalization features. The model achieved state-of-the-art performance on Russian and Vietnamese datasets, with F1 scores of 98.21% on NE3 and 94.43% on VLSP-2016. Ablation studies demonstrated that character-level encoding produces a larger improvement than capitalization encoding.

2. Extensions of the original architecture with encoders based on the language models ELMo and BERT were evaluated on Russian and English datasets. They obtained state-of-the-art F1 scores of 99.17% on NE3 and 92.91% on Gareev's dataset, and a comparable F1 score of 92.27% on CoNLL-2003.

3. Applying the proposed sequence labeling model to the sentence boundary detection task produced solid results: 89.99% F1 on the Cornell Movie-Dialog dataset and 95.88% F1 on the DailyDialog dataset.

4. Sentence-level coreferential relations can significantly improve performance on the coreference resolution task. Experiments on the OntoNotes dataset show that the quality of the solution can be improved by up to 5.84%.

5. An original model for learning sentence-level coreferential relationships was introduced. Incorporating this model into the baseline coreference architecture improved its performance for English.

6. Applying the model with the sentence coreference module achieved a state-of-the-art average F1 of 58.42% on the RuCor dataset.
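For readers unfamiliar with how F1 figures like those above are usually obtained for Named Entity Recognition, the following Python sketch computes an entity-level (span-level) F1 from BIO tags. It assumes a BIO tagging scheme and exact-match span scoring. The function names are hypothetical, and this excerpt does not state which tagging scheme or evaluation script the dissertation used.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        # "B-" always opens a span; a stray "I-" is leniently treated as an opening
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match F1: a predicted span counts only if type and boundaries match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
score = span_f1(gold, pred)  # precision 1.0, recall 0.5
```

Scoring whole spans rather than individual tokens means that a partially recognized entity earns no credit, which is why span-level F1 is typically stricter than token-level accuracy.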

Список литературы диссертационного исследования кандидат наук Ле Тхе Ань, 2020 год

Bibliography

1. Adwait Ratnaparkhi. "A Maximum Entropy Model for Part-Of-Speech Tagging". In: Conference on Empirical Methods in Natural Language Processing. 1996. URL: https: //www.aclweb.org/anthology/W96-0213.

2. Alan Akbik, Duncan Blythe, Roland Vollgraf. "Contextual String Embeddings for Sequence Labeling". In: Proceedings of the 27th International Conference on Computational Linguistics. 2018. URL: https://www.aclweb.org/anthology/C18-1139/.

3. Alan Mathison Turing. "Computing Machinery and Intelligence". In: MIND 59 (236 1950), pp. 433-460. URL: https://academic.oup.com/mind/article/LIX/236/ 433/986238.

4. Alec Radford, Karthik Narasimhan, Time Salimans, and Ilya Sutskever. "Improving language understanding with unsupervised learning". In: Technical report, Technical report, OpenAI. 2018. URL: https : / / s3 - us - west - 2 . amazonaws . com/ openai -assets /research- covers / language - unsupervised/language_understanding_ paper.pdf.

5. Alec Radford, Kartik Narsimhan, Tim Salimans, Ilya Sutskever. Improving Language Understanding by Generative Pre-Training. 2018. URL: https : / / pdfs . semanticscholar.org/cd18/800a0fe0b668a1cc19f2ec95b5003d0a5035.pdf?_ga= 2.68781521.1765043536.1574952036-6193859.1560422113.

6. Alex Gittens, Dimitris Achlioptas, Michael W. Mahoney. "Skip-Gram - Zipf + Uniform = Vector Additivity". In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada, July 2017, pp. 69-76. URL: https://www.aclweb.org/anthology/P17-1007.

7. Alex Graves, Abdel-rahman Mohamed and Geoffrey Hinton. Speech Recognition with Deep Recurrent Neural Networks. 2013. arXiv: 15081303.5778 [cs.NE]. URL: https: //arxiv.org/abs/1303.5778.

8. Alexandre Passos, Vineet Kumar, Andrew McCallum. "Lexicon Infused Phrase Em-beddings for Named Entity Resolution". In: Proceedings of the Eighteenth Conference

on Computational Language Learning, Baltimore, Maryland USA. Vol. 10565. 2014, pp. 78-86. URL: https://www.aclweb.org/anthology/W14-1609.

9. Amit Bagga, Breck Baldwin. "Algorithms for scoring coreference chains". In: Proceedings of the 1st International Conference on Language Resources and Evaluation, Granada, Spain. 1998, pp. 563-566. URL: https://pdfs.semanticscholar.org/ 4b51/2f10838e05f5b2eee94bfbd20f3d9c4ecb9b.pdf.

10. Andriy Mnih, Geoffrey E. Hinton. "A Scalable Hierarchical Distributed Language Model". In: Advances in Neural Information Processing Systems 21 (NIPS 2008). Vancouver, British Columbia, Canada, Dec. 2008. URL: https : / / papers . nips . cc/paper/3583-a-scalable-hierarchical-distributed-language-model.pdf.

11. Andriy Mnih, Geoffrey E. Hinton. "Three new graphical models for statistical language modelling". In: Proceedings of the 24th International Conference on Machine learning. Corvalis, Oregon, USA, June 2007, pp. 641-648. URL: https ://www . cs . toronto . edu/~hinton/absps/threenew.pdf.

12. Aria Haghighi, Dan Klein. "Simple coreference resolution with rich syntactic and semantic features". In: Chinese Computational Linguistics and Natural Language ProcProceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Vol. 3. 2009. URL: https://dl.acm.org/doi/10.5555/1699648. 1699661.

13. Arora Sanjeev, Li Yuanzhi, Liang Yingyu, Ma Tengyu, Risteski, Andrej. "A Latent Variable Model Approach to PMI-based Word Embeddings". In: Transactions of the Association for Computational Linguistics 4 (2016), pp. 385-399. DOI: 10.1162/tacl_ a_00106. URL: https://www.aclweb.org/anthology/Q16-1028.

14. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser. "Attention Is All You Need". In: Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, CA, USA, 2017. URL: https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.

15. Baldwin Breck. "CogNIAC: high precision coreference with limited knowledge and linguistic resources". In: Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts. 1997. URL: https://www.aclweb.org/anthology/W97-1306.

16. Brill E., Pop M. "Unsupervised Learning of Disambiguation Rules for Part-of-Speech Tagging". In: Armstrong S., Church K., Isabelle P., Manzi S., Tzoukermann E., Yarowsky D. (eds) Natural Language Processing Using Very Large Corpora. Text, Speech and Language Technology. Vol. 11. Springer, Dordrecht, 1999, pp. 152-155. URL: https://doi.org/10.1007/978-94-017-2390-9_3.

17. Chinatsu Aone, Lauren Halverson, Tom Hampton, Mila Ramos-Santacruz. "SRA: Description of the IE2 System Used for MUC-7". In: Seventh Message Understanding Conference (MUC-7). 1998. URL: https://www.aclweb.org/anthology/M98-1012.

18. Chun-Yen Chen, Dian Yu, Weiming Wen, Yi Mang Yang, Jiaping Zhang, Mingyang Zhou Kevin Jesse, Austin Chau, Antara Bhowmick, Shreenath Iyer, Giritheja Sreeni-vasulu Runxiang Cheng, Ashwin Bhandare, Zhou Yu. "Gunrock: Building A HumanLike Social Bot By Leveraging Large Scale Real User Data". In: 2nd Proceedings of Alexa Prize. 2018. URL: https : / / pdfs . semanticscholar . org/b402/ b85ad45e3ac51f1da8ee718373082ce24f47.pdf.

19. Chunqi Wang, Wei Chen, Bo Xu. "Named Entity Recognition with Gated Convolu-tional Neural Networks". In: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, CCL 2017, NLP-NABD 2017. Vol. 10565. Springer, Cham, 2017, pp. 110-121. URL: https://link.springer.com/ chapter/10.1007/978-3-319-69005-6_10.

20. Cristian Danescu-Niculescu-Mizil and Lillian Lee. "Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs." In: Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, ACL 2011. 2011. URL: https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html.

21. David H. Hubel, Torsten N. Wiesel. "Receptive Fields and Functional Architecture of Monkey Striate Cortex". In: The Journal of Physiology 195 (1968), pp. 215-243. URL: https://physoc.onlinelibrary .wiley .com/doi/pdf/10.1113/jphysiol.1968. sp008455.

22. David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Grae-pel, and Demis Hassabis. "Mastering the game of Go with deep neural networks and tree search". In: Nature 529 (2016), pp. 484-489. URL: https://www.nature.com/ articles/nature16961.

23. Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. 2015. arXiv: 1409.0473 [cs.CL]. URL: https://arxiv.org/abs/1409.0473.

24. Edgar Altszyler, Mariano Sigman, Sidarta Ribeiro, and Diego Fernandez Slezak. Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database. 2017. arXiv: 1610.01520v2 [cs.CL]. URL: https://arxiv.org/pdf/1610. 01520.pdf.

25. Elaheh Sadredini, Deyuan Guo, Chunkun Bo, Reza Rahimi, Kevin Skadron, Hongning Wang. "A Scalable Solution for Rule-Based Part-of-Speech Tagging on Novel Hardware Accelerators". In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2018, pp. 665-674. URL: https://dl.acm. org/doi/pdf/10.1145/3219819.3219889.

26. Emma Strubell, Patrick Verga, David Belanger, Andrew McCallum. "Fast and accurate entity recognition with iterated dilated convolutions". In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017, pp. 26602670. URL: https://www.aclweb.org/anthology/D17-1283.

27. Eric Brill. "A simple rule-based part of speech tagger". In: Proceedings of the third conference on Applied natural language processing. 1992, pp. 152-155. URL: https : //dl.acm.org/doi/10.3115/974499.974526.

28. Erik F. Tjong Kim Sang. "Introduction to the CoNLL-2002 shared task: language-independent named entity recognition". In: COLING-02 proceedings of the 6th conference on Natural language learning. Vol. 20. 2002, pp. 1-4. URL: https://www.aclweb. org/anthology/W02-2024.

29. Erik F. Tjong Kim Sang, Fien De Meulder. "Introduction to the conll-2003 shared task: Language-independent named entity recognition". In: Proceedings of the Seventh Conference on Natural Language Learning. 2003, pp. 142-147. URL: https://www. aclweb.org/anthology/W03-0419.

30. Felix A. Gers, Jurgen Schmidhuber. "Recurrent Nets that Time and Count". In: Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium. Como, Italy, July 2000, pp. 189-194. URL: https://ieeexplore.ieee.org/ document/861302.

31. Firth, J. R. "A Synopsis of Linguistic Theory 1930-1955". In: Studies in Linguistic Analysis (1957). URL: http : / / cs . brown . edu/ courses/ csci2952d/ readings/ lecture1-firth.pdf.

32. Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando G. S. L. Brandao, David A. Buell, Brian Burkett, Yu Chen, Zijun Chen, Ben Chiaro, Roberto Collins, William Court-

ney, Andrew Dunsworth, Edward Farhi, Brooks Foxen, Austin Fowler, Craig Gidney, Marissa Giustina, Rob Graff, Keith Guerin, Steve Habegger, Matthew P. Harrigan, Michael J. Hartmann, Alan Ho, Markus Hoffmann, Trent Huang, Travis S. Humble, Sergei V. Isakov, Evan Jeffrey, Zhang Jiang, Dvir Kafri, Kostyantyn Kechedzhi, Julian Kelly, Paul V. Klimov, Sergey Knysh, Alexander Korotkov, Fedor Kostritsa, David Landhuis, Mike Lindmark, Erik Lucero, Dmitry Lyakh, Salvatore Mandra, Jar-rod R. McClean, Matthew McEwen, Anthony Megrant, Xiao Mi, Kristel Michielsen, Masoud Mohseni, Josh Mutus, Ofer Naaman, Matthew Neeley, Charles Neill, Murphy Yuezhen Niu, Eric Ostby, Andre Petukhov, John C. Platt, Chris Quintana, Eleanor G. Rieffel, Pedram Roushan, Nicholas C. Rubin, Daniel Sank, Kevin J. Satzinger, Vadim Smelyanskiy, Kevin J. Sung, Matthew D. Trevithick, Amit Vainsencher, Benjamin Vil-lalonga, Theodore White, Z. Jamie Yao, Ping Yeh, Adam Zalcman, Hartmut Neven, John M. Martinis. "Quantum supremacy using a programmable superconducting processor". In: Nature 574 (Oct. 2019), pp. 505-510. URL: https://www.nature.com/ articles/s41586-019-1666-5.

33. Frederic Morin and Yoshua Bengio. "Hierarchical Probabilistic Neural Network Language Model". In: Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics. Jan. 2005, pp. 246-252. URL: http://www.gatsby.ucl.ac. uk/aistats/aistats2005_eproc.pdf.

34. Gang Luo, Xiaojiang Huang, Chin-Yew Lin, Zaiqing Nie. "Joint named entity recognition and disambiguation". In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015, pp. 879-888. URL: https://www.aclweb. org/anthology/D15-1104.

35. George R. Krupka, Kevin Hausman. "IsoQuest Inc.: Description of the NetOwlTM Extractor System as Used for MUC-7". In: Seventh Message Understanding Conference (MUC-7). 1998. URL: https://www.aclweb.org/anthology/M98-1015.

36. Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer. "Neural Architectures for Named Entity Recognition". In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics. 2016, pp. 260-270. URL: https://www.aclweb.org/anthology/ N16-1030.

37. Guo J., Wang S., Yu C., Song J. "Chinese POS Tagging Method Based on Bi-GRU+CRF Hybrid Model". In: Xhafa F., Barolli L., Gregus M. (eds) Advances in Intelligent Networking and Collaborative Systems. INCoS 2018. Lecture Notes on Data Engineering and Communications Technologies. Vol. 23. Springer, Cham, 2019. URL: https://link.springer.com/chapter/10.1007/978-3-319-98557-2_41.

38. GuoDong Zhou and Jian Su. "Named Entity Recognition using an HMM-based Chunk Tagger". In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002, pp. 473-480. URL: https://www.aclweb.org/anthology/ P02-1060.pdf.

39. Hany Hassan, Yanjun Ma, and Andy Way. "Matrex: the dcu machine translation system for iwslt 2007". In: Proceedings of the International Workshop on Spoken Language Translation. Trento, Italy, 2007. URL: http ://doras . dcu. ie/561/1/Hassanetal_ IWSLT07.pdf.

40. Harold W. Kuhn. "The Hungarian Method for the assignment problem". In: Naval Research Logistics Quarterly 2 (1955), pp. 83-97. URL: https ://link. springer. com/chapter/10.1007/978-3-540-68279-0_2.

41. Hongliang Fei, Xu Li, Dingcheng Li, Ping Li. "End-to-end Deep Reinforcement Learning Based Coreference Resolution". In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, Aug. 2019. URL: https://www.aclweb.org/anthology/P19-1064.pdf.

42. Huong Thanh Le and Luan Van Tran. "Automatic feature selection for named entity recognition using genetic algorithm". In: Proceedings of the Fourth Symposium on Information and Communication Technology. 2013, pp. 81-87. URL: https : / / dl . acm.org/doi/abs/10.1145/2542050.2542056.

43. Ilya Sutskever, Oriol Vinyals, Quoc V. Le. "Sequence to sequence learning with neural networks". In: Proceeding NIPS'14 Proceedings of the 27th International Conference on Neural Information Processing Systems. Vol. 2. Montreal, Canada: Association for Computational Linguistics, Dec. 2014, pp. 3104-3112. URL: https://papers.nips. cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf.

44. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018. arXiv: 1810. 04805v2 [cs.CL]. URL: https://arxiv.org/abs/1810.04805.

45. Jason P.C. Chiu, Eric Nichols. "Named Entity Recognition with Bidirectional LSTM-CNNs". In: Transactions of the Association for Computational Linguistics. Vol. 4. 2016, pp. 357-370. URL: https://aclweb.org/anthology/Q16-1026.

46. Jeffrey Pennington, Richard Socher, Christopher D. Manning. "GloVe: Global Vectors for Word Representation". In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014, pp. 1532-1543. URL: https://www. aclweb.org/anthology/D14-1162.

47. Jerry R.Hobbs. "Resolving pronoun references". In: Lingual 44 (1978): 4, pp. 311— 338. URL: https : / / www . sciencedirect . com / science / article / pii / 0024384178900062.

48. Jin-Dong Kim, Tomoko Ohta, Yoshimasa Tsuruoka, Yuka Tateisi. Introduction to the Bio-Entity Recognition Task at JNLPBA. 2004. URL: http : / /www. nactem. ac . uk/ tsujii/GENIA/ERtask/shared_task_intro.pdf.

49. Jing Huang and Geoffrey Zweig. "Maximum entropy model for punctuation annotation from speech". In: Proceedings of the Annual Conference of the International Speech Communication Association. Denver, Colorado, USA, Sept. 2002. URL: https : / / pdfs.semanticscholar.org/0982/93bd4de50541696806758f881b3bd58bb992.pdf.

50. John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. "Conditional random fields: Probabilistic models for segmenting and labeling sequence data". In: Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001). 2001, pp. 282-289.

51. K. Humphreys, R. Gaizauskas, S. Azzam, C. Huyck, B. Mitchell, H. Cunningham, Y. Wilks. "University of Sheffield: Description of the LaSIE-II System as Used for MUC-7". In: Seventh Message Understanding Conference (MUC-7). 1998. URL: https: //www.aclweb.org/anthology/M98-1007.

52. Kaituo Xu, Lei Xie, Kaisheng Yao. "Investigating LSTM for punctuation prediction". In: Proceedings of the 10th International Symposium on Chinese Spoken Language Processing (ISCSLP). 2016. URL: https://ieeexplore.ieee.org/abstract/document/ 7918492.

53. Karthik Raghunathan, Heeyoung Lee, Sudarshan Rangarajan, Nathanael Chambers, Mihai Surdeanu, Dan Jurafsky, Christopher Manning. "A Multi-pass Sieve for Coreference Resolution". In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. EMNLP '10. Cambridge, Massachusetts: Association for Computational Linguistics, 2010, pp. 492-501. URL: http://dl.acm.org/citation. cfm?id=1870658.1870706.

54. Kawin Ethayarajh, David Duvenaud, Graeme Hirst. Towards Understanding Linear Word Analogies. 2019. arXiv: 1810 . 04882v6 [cs.CL]. URL: https://arxiv.org/ pdf/1810.04882.pdf.

55. Kenton Lee, Luheng He, and Luke Zettlemoyer. "Higher-order Coreference Resolution with Coarse-to-fine Inference". In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics. New Orleans,

Louisiana, USA: Association for Computational Linguistics, June 2018, pp. 687-692. URL: https://www.aclweb.org/anthology/N18-2108.

56. Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. "End-to-end Neural Coreference Resolution". In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics, Sept. 2017, pp. 188-197. URL: https://www.aclweb.org/anthology/ D17-1018.

57. Kevin Clark and Christopher D. Manning. Entity-Centric Coreference Resolution with Model Stacking. 2015. URL: https://nlp.stanford.edu/pubs/clark-manning-acl15-entity.pdf.

58. Kevin Clark, Christopher D. Manning. "Deep Reinforcement Learning for Mention-Ranking Coreference Models". In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas, USA: Association for Computational Linguistics, Nov. 2016. DOI: 10.18653/v1/D16-1245. URL: https://www. aclweb.org/anthology/D16-1245.

59. Kevin Clark, Christopher D. Manning. "Improving Coreference Resolution by Learning Entity-Level Distributed Representations". In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Vol. 1. Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 643-653. DOI: 10.18653/v1/P16-1061. URL: https://www.aclweb.org/anthology/P16-1061.

60. Klaus Greff, Rupesh K. Srivastava, Jan Koutnik, Bas R. Steunebrink, and Jtirgen Schmidhuber. "LSTM: A Search Space Odyssey". In: IEEE Transactions on Neural Networks and Learning Systems 28.10 (Oct. 2017), pp. 2222-2232. URL: https:// ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7508408.

61. Kunihiko Fukushima. "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position". In: Biological Cybernetics 36 (Apr. 1980): 4, pp. 193-202. URL: https://link.springer.com/ content/pdf/10.1007%2FBF00344251.pdf.

62. Kyunghyun Cho, Bart van Merrienboer Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio. "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1724-1734. URL: https://www.aclweb.org/anthology/D14-1179.pdf.

63. Le T. A., Kuratov Y. M. , Petrov M. A., Burtsev M. S. "Sentence Level Representation and Language Models in the task of Coreference Resolution for Russian". In: 25th International Conference on Computational Linguistics and Intellectual Technologies. May 2019. URL: http://www.dialog-21.ru/media/4609/letaplusetal-160.pdf.

64. Le T.A., Arkhipov M.Y., Burtsev M.S. "Application of a Hybrid Bi-LSTM- CRF Model to the Task of Russian Named Entity Recognition". In: Filchenkov A., Pivo-varova L., Zizka J. (eds) Artificial Intelligence and Natural Language. AINL 2017. Communications in Computer and Information Science. 2017, pp. 91-103. URL: https://link.springer.com/chapter/10.1007/978-3-319-71746-3_8.

65. Le The Anh. "Sequence Labeling Approach to the Task of Sentence Boundary Detection". In: Proceedings of the 4th International Conference on Machine Learning and Soft Computing. New York, NY, USA: Association for Computing Machinery, 2020, pp. 144-148. DOI: 10.1145/3380688.3380703. URL: https://dl.acm.org/doi/10. 1145/3380688.3380703.

66. Le The Anh and Mikhail S. Burtsev. "A Deep Neural Network Model for the Task of Named Entity Recognition". In: International Journal of Machine Learning and Computing. Vol. 9. 1. 2019, pp. 8-13. URL: http://www.ijmlc.org/vol9/758-ML0025.pdf.

67. Leon Derczynski, Eric Nichols, Marieke van Erp, Nut Limsopatham. "Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition". In: The 3rd Workshop on Noisy User-generated Text. 2017, pp. 140-147. DOI: 10.18653/v1/W17-4418. URL: https://www.aclweb.org/anthology/W17-4418.

68. Marc Vilain, John Burger, John Aberdeen, Dennis Connolly, Lynette Hirschman. "A model-theoretic coreference scoring scheme". In: Proceedings of the 6th Message Understanding Conference (MUC-6). 1995, pp. 45-52. URL: https://www.aclweb.org/ anthology/M95-1005.

69. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. "Deep contextualized word representations". In: Proceedings of The 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. New Orleans, Louisiana, USA: Association for Computational Linguistic, June 2018, pp. 2227-2237. URL: https://aclweb.org/anthology/N18-1202.

70. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. "Deep contextualized word representations". In: proceedings of Conference of the North American Chapter of the Association for Compu-

tational Linguistics: Human Language Technologies. 2018, pp. 2227-2237. URL: https: //aclweb.org/anthology/N18-1202.

71. Miikka Silfverberg, Teemu Ruokolainen, Krister Linden, Mikko Kurimo. "Part-of-Speech Tagging using Conditional Random Fields: Exploiting Sub-Label Dependencies for Improved Accuracy". In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2014, pp. 259-264.

72. Mike Schuster, Kaisuke Nakajima. "Japanese and Korean voice search". In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Kyoto, Japan: IEEE, Mar. 2012, pp. 5149-5152. URL: https://ieeexplore.ieee.org/document/6289079.

73. Mikhail Burtsev, Alexander Seliverstov, Rafael Airapetyan, Mikhail Arkhipov, Dilyara Baymurzina, Nickolay Bushkov, Olga Gureenkova, Taras Khakhulin, Yuri Kuratov, Denis Kuznetsov, Alexey Litinsky, Varvara Logacheva, Alexey Lymar, Valentin Malykh, Maxim Petrov, Vadim Polulyakh, Leonid Pugachev, Alexey Sorokin, Maria Vikhreva, Marat Zaynutdinov. "DeepPavlov: Open-Source Library for Dialogue Systems". In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics-System Demonstrations. Melbourne, Australia: Association for Computational Linguistics, July 2018, pp. 122-127. URL: https://www.aclweb.org/anthology/P18-4021.

74. Mingbin Xu, Hui Jiang, Sedtawut Watcharawittayakul. "A Local Detection Approach for Named Entity Recognition and Mention Detection". In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada: Association for Computational Linguistics, July 2017, pp. 1237-1247. URL: https://www.aclweb.org/anthology/P17-1114.

75. Pham Quang Nhat Minh. A Feature-Rich Vietnamese Named-Entity Recognition Model. 2018. arXiv: 1803.04375 [cs.CL]. URL: https://arxiv.org/abs/1803.04375.

76. Mourad Gridach, Hatem Haddad. "Arabic Named Entity Recognition: A Bidirectional GRU-CRF Approach". In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science. Vol. 10761. Springer, Cham, 2018. URL: https://link.springer.com/chapter/10.1007/978-3-319-77113-7_21.

77. Munkres James. "Algorithms for the Assignment and Transportation Problems". In: Journal of the Society for Industrial and Applied Mathematics 5 (1957), pp. 32-38. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.228.3911&rep=rep1&type=pdf.

78. Murray Campbell, A. Joseph Hoane Jr., Feng-hsiung Hsu. "Deep Blue". In: Artificial Intelligence 134 (2002), pp. 57-83. URL: https://www.sciencedirect.com/science/article/pii/S0004370201001291.

79. Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences". In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, Maryland: Association for Computational Linguistics, June 2014, pp. 655-665. URL: https://www.aclweb.org/anthology/P14-1062/.

80. Nguyen Viet Cuong, Nan Ye, Wee Sun Lee. "Conditional Random Field with High-order Dependencies for Sequence Labeling and Segmentation". In: Journal of Machine Learning Research 15 (2014), pp. 981-1009. URL: http://www.jmlr.org/papers/volume15/cuong14a/cuong14a.pdf.

81. Nicola Ueffing, Maximilian Bisani, and Paul Vozila. "Improved models for automatic punctuation prediction for spoken and written text". In: Proceedings of the 14th Annual Conference of the International Speech Communication Association. Lyon, France, Aug. 2013. URL: https://research.nuance.com/wp-content/uploads/2014/11/AutoPunc_Interspeech2013_paper_finalsubmission.pdf.

82. Olga Uryupina and Alessandro Moschitti. "A State-of-the-Art Mention-Pair Model for Coreference Resolution". In: Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*SEM 2015). 2015. URL: https://www.aclweb.org/anthology/S15-1034.pdf.

83. Oliver Bender, Franz Josef Och and Hermann Ney. "Maximum entropy models for named entity recognition". In: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003. Vol. 4. 2003, pp. 148-151. URL: https://dl.acm.org/doi/10.3115/1119176.1119196.

84. Onur Kuru, Ozan Arkan Can, and Deniz Yuret. "CharNER: Character-Level Named Entity Recognition". In: COLING. 2016.

85. Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. "Unsupervised named-entity extraction from the Web: An experimental study". In: Artificial Intelligence 165 (1 June 2005), pp. 91-134. URL: https://doi.org/10.1016/j.artint.2005.03.001.

86. Ottokar Tilk, Tanel Alumae. "LSTM for Punctuation Restoration in Speech Transcripts". In: Proceedings of the 16th Annual Conference of the International Speech Communication. 2015. URL: https://www.isca-speech.org/archive/interspeech_2015/i15_0683.html.

87. Paramveer S. Dhillon, Dean Foster, Lyle Ungar. "Multi-View Learning of Word Embeddings via CCA". In: Proceedings of Advances in Neural Information Processing Systems 24 (NIPS 2011). Granada, Spain, Dec. 2011. URL: https://papers.nips.cc/paper/4193-multi-view-learning-of-word-embeddings-via-cca.pdf.

88. Paul J. Werbos. "Backpropagation through time: what it does and how to do it". In: Proceedings of the IEEE. Vol. 78. 10. 1990, pp. 1550-1560. URL: https://ieeexplore.ieee.org/document/58337.

89. Peng-Hsuan Li, Ruo-Ping Dong, Yu-Siang Wang, Ju-Chieh Chou. "Leveraging Linguistic Structures for Named Entity Recognition with Bidirectional Recursive Neural Networks". In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2017, pp. 2664-2669. URL: https://www.aclweb.org/anthology/D17-1282.pdf.

90. Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jenifer C. Lai, Robert L. Mercer. "Class-based n-gram models of natural language". In: Computational Linguistics 18 (1992): 4, pp. 467-469. URL: https://www.aclweb.org/anthology/J92-4003.

91. Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam Shazeer. "Generating Wikipedia by Summarizing Long Sequences". In: Proceedings of the Sixth International Conference on Learning Representations. Vancouver, Canada, 2018. URL: https://openreview.net/forum?id=Hyg0vbWC-.

92. Pham TH., Le-Hong P. "End-to-End Recurrent Neural Network Models for Vietnamese Named Entity Recognition: Word-Level Vs. Character-Level". In: Hasida K., Pa W. (eds) Computational Linguistics. PACLING 2017. Communications in Computer and Information Science. Vol. 781. 2017. URL: https://link.springer.com/chapter/10.1007/978-981-10-8438-6_18.

93. Phuong Le-Hong. Vietnamese Named Entity Recognition using Token Regular Expressions and Bidirectional Inference. 2016. arXiv: 1610.05652 [cs.CL]. URL: https://arxiv.org/abs/1610.05652v2.

94. Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng. "Building high-level features using large scale unsupervised learning". In: Proceedings of the 29th International Conference on Machine Learning. Edinburgh, Scotland, July 2012, pp. 507-514. URL: https://icml.cc/2012/papers/73.pdf.

95. Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. Apr. 2015. arXiv: 1504.00941v2 [cs.NE]. URL: https://arxiv.org/pdf/1504.00941.pdf.

96. Razvan Pascanu, Tomas Mikolov, Yoshua Bengio. "On the difficulty of training recurrent neural networks". In: ICML'13 Proceedings of the 30th International Conference on Machine Learning. Vol. 28. 3. Atlanta, GA, USA, June 2013, pp. 1310-1318. URL: http://proceedings.mlr.press/v28/.

97. Reinhard Kneser, Hermann Ney. "Improved Clustering Techniques for Class-Based Statistical Language Modelling". In: Third European Conference on Speech Communication and Technology (EUROSPEECH '93). Berlin, Germany, Sept. 1993, pp. 973-976. URL: https://www.isca-speech.org/archive/eurospeech_1993/e93_0973.html.

98. Remi Lebret, Ronan Collobert. "Rehabilitation of Count-Based Models for Word Vector Representations". In: International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2015): Computational Linguistics and Intelligent Text Processing. Cairo, Egypt: Association for Computational Linguistics, Apr. 2015, pp. 417-429. URL: https://link.springer.com/chapter/10.1007/978-3-319-18111-0_31.

99. Remi Lebret, Ronan Collobert. "Word Embeddings through Hellinger PCA". In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. Gothenburg, Sweden: Association for Computational Linguistics, Apr. 2014, pp. 482-490. URL: https://www.aclweb.org/anthology/E14-1051.

100. Rico Sennrich, Barry Haddow, Alexandra Birch. "Neural Machine Translation of Rare Words with Subword Units". In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Vol. 1. Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 1715-1725. URL: https://www.aclweb.org/anthology/P16-1162.pdf.

101. Rinat Gareev, Maksim Tkachenko, Valery Solovyev, Andrey Simanovsky, Vladimir Ivanov. "Introducing Baselines for Russian Named Entity Recognition". In: Computational Linguistics and Intelligent Text Processing. Vol. 7816. Springer, Berlin, Heidelberg, 2013, pp. 329-342. URL: https://link.springer.com/chapter/10.1007/978-3-642-37247-6_27.

102. Ronan Collobert, Jason Weston. "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning". In: Proceedings of the 25th International Conference on Machine Learning - ICML '08. Helsinki, Finland, July 2008, pp. 160-167. URL: https://icml.cc/Conferences/2008/papers/icml2008proceedings.pdf.

103. Rui Zhang, Cicero Nogueira dos Santos, Michihiro Yasunaga, Bing Xiang, Dragomir Radev. "Neural Coreference Resolution with Deep Biaffine Attention by Joint Mention Detection and Mention Clustering". In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Vol. 2. Association for Computational Linguistics, 2018, pp. 102-107. URL: https://www.aclweb.org/anthology/P18-2017/.

104. Rumelhart, D., Hinton, G. and Williams, R. "Learning representations by back-propagating errors". In: Nature 323 (1986), pp. 533-536. URL: https://www.nature.com/articles/323533a0.

105. Sam Wiseman and Alexander M. Rush and Stuart M. Shieber. "Learning Global Features for Coreference Resolution". In: Proceedings of The 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics. San Diego, California, USA: Association for Computational Linguistics, 2016. URL: https://www.aclweb.org/anthology/N16-1114.

106. Sam Wiseman, Alexander M. Rush, Stuart Shieber, Jason Weston. "Learning Anaphoricity and Antecedent Ranking Features for Coreference Resolution". In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Beijing, China: Association for Computational Linguistics, July 2015. URL: https://www.aclweb.org/anthology/P15-1137/.

107. Sameer Pradhan, Alessandro Moschitti, Nianwen Xue. "CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes". In: Proceedings of the Joint Conference on EMNLP and CoNLL: Shared Task. 2012, pp. 1-40. URL: https://www.aclweb.org/anthology/W12-4501.

108. Scharolta Katharina Siencnik. "Adapting word2vec to Named Entity Recognition". In: Proceedings of the 20th Nordic Conference of Computational Linguistics. 2015, pp. 239-243.

109. Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Richard Harshman. "Indexing by latent semantic analysis". In: Journal of The American Society for Information Science 41.6 (1990), pp. 391-407. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.8490&rep=rep1&type=pdf.

110. Sebastian Martschat and Michael Strube. "Latent Structures for Coreference Resolution". In: Transactions of the Association for Computational Linguistics 3 (July 2015), pp. 405-418. URL: https://www.aclweb.org/anthology/Q15-1029.pdf.

111. Sepp Hochreiter and Jurgen Schmidhuber. "Long Short-Term Memory". In: Neural Computation 9 (1997): 8, pp. 1735-1780. DOI: 10.1162/neco.1997.9.8.1735. URL: https://www.bioinf.jku.at/publications/older/2604.pdf.

112. Stephan Peitz, Markus Freitag, Arne Mauser, Hermann Ney. "Modeling Punctuation Prediction as Machine Translation". In: Proceedings of the International Workshop on Spoken Language Translation. San Francisco, USA, Dec. 2011, pp. 238-245. URL: http://www.mt-archive.info/10/IWSLT-2011-Peitz.pdf.

113. Sysoev A. A., Andrianov I. A., Khadzhiiskaia A. Y. "Coreference Resolution in Russian: State-of-the-Art Approaches Application and Evolvement". In: Proceedings of the International Conference on Computational Linguistics and Intellectual Technologies: Dialogue 2017. Vol. 1. Russian State University for the Humanities, Moscow, Russia, June 2017, pp. 327-338. URL: http://www.dialog-21.ru/media/3954/sysoevaaetal.pdf.

114. Taku Kudoh and Yuji Matsumoto. "Use of Support Vector Learning for Chunk Identification". In: Proceedings of CoNLL-2000 and LLL-2000. 2000, pp. 142-144. URL: https://www.aclweb.org/anthology/W00-0730.pdf.

115. Tao Shen, Jing Jiang, Tianyi Zhou, Shirui Pan, Guodong Long, Chengqi Zhang. "DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding". In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18). New Orleans, Louisiana, USA: AAAI Press, Palo Alto, California USA, Feb. 2018, pp. 5446-5455. URL: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/16126/16099.

116. Thai-Hoang Pham, Phuong Le-Hong. The Importance of Automatic Syntactic Features in Vietnamese Named Entity Recognition. 2017. arXiv: 1705.10610 [cs.CL]. URL: https://arxiv.org/pdf/1705.10610.pdf.

117. Thai-Hoang Pham, Xuan-Khoai Pham, Tuan-Anh Nguyen, Phuong Le-Hong. "NNVLP: A Neural Network-Based Vietnamese Language Processing Toolkit". In: The 8th International Joint Conference on Natural Language Processing, Taipei, Taiwan. 2017, pp. 37-40. URL: https://www.aclweb.org/anthology/I17-3010.

118. Thomas R. Niesler, E. W. D. Whittaker, and P. C. Woodland. "Comparison of part-of-speech and automatically derived category-based language models for speech recognition". In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing. Seattle, Washington, USA: IEEE, May 1998, pp. 177-180. URL: https://ieeexplore.ieee.org/document/674396.

119. Thorsten Brants. "TnT - A Statistical Part-of-Speech Tagger". In: Sixth Applied Natural Language Processing Conference. Association for Computational Linguistics, 2000, pp. 224-231. URL: https://www.aclweb.org/anthology/A00-1031.

120. Toldova S. Ju., Roytberg A., Nedoluzhko A., Kurzukov M., Ladygina A. A., Vasilyeva M. D., Azerkovich I. L., Grishina Y., Sim G., Ivanova A., Gorshkov D. "Evaluating Anaphora and Coreference Resolution for Russian". In: Proceedings of the International Conference on Computational Linguistics and Intellectual Technologies: Dialogue 2014. Russian State University for the Humanities, Moscow, Russia, June 2014, pp. 681-695. URL: http://www.dialog-21.ru/digests/dialog2014/materials/pdf/ToldovaSJu.pdf.

121. Toldova S., Ionov M. "Coreference resolution for russian: The impact of semantic features". In: Proceedings of the International Conference on Computational Linguistics and Intellectual Technologies: Dialogue 2017. Vol. 1. Russian State University for the Humanities, Moscow, Russia, June 2017, pp. 339-349. URL: http://www.dialog-21.ru/media/3956/toldovasionovm.pdf.

122. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean. "Distributed representations of words and phrases and their compositionality". In: NIPS'13 Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA, Dec. 2013, pp. 3111-3119. URL: https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf.

123. Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. 2013. arXiv: 1301.3781v3 [cs.CL]. URL: https://arxiv.org/pdf/1301.3781.pdf.

124. Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan "Honza" Cernocky, Sanjeev Khudanpur. "Recurrent neural network based language model". In: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010. Makuhari, Chiba, Japan, Sept. 2010, pp. 1045-1048. URL: https://www.isca-speech.org/archive/archive_papers/interspeech_2010/i10_1045.pdf.

125. Tyne Liang and Dian-Song Wu. "Automatic pronominal anaphora resolution in english texts". In: International Journal of Computational Linguistics and Chinese Language Processing 9.1 (2004), pp. 21-40. URL: https://www.aclweb.org/anthology/003-1007.

126. Valentin Malykh and Alexey Ozerin. "Reproducing Russian NER Baseline Quality without Additional Data". In: CDUD@CLA. 2016.

127. Valerie Mozharova and Natalia Loukachevitch. "Combining Knowledge and CRF-Based Approach to Named Entity Recognition in Russian". In: Ignatov D. et al. (eds) Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science. Vol. 661. 2016. URL: https://link.springer.com/chapter/10.1007/978-3-319-52920-2_18.

128. Valerie Mozharova and Natalia Loukachevitch. "Two-stage approach in Russian named entity recognition". In: 2016 International FRUCT Conference on Intelligence, Social Media and Web (ISMW FRUCT). 2016. URL: https://ieeexplore.ieee.org/document/7584769.

129. Vitaly Romanov, Albina Khusainova. "Evaluation of Morphological Embeddings for the Russian Language". In: Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval. June 2019. URL: https://doi.org/10.1145/3342827.3342846.

130. Wei Lu and Hwee Tou Ng. "Better punctuation prediction with dynamic conditional random fields". In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, ser. EMNLP '10. Stroudsburg, PA, USA, Oct. 2010, pp. 177-186. URL: http://statnlp.com/people/luwei/publications/emnlp10ln.pdf.

131. William J Black, Fabio Rinaldi, David Mowatt. "FACILE: Description of the NE System Used for MUC-7". In: Seventh Message Understanding Conference (MUC-7). 1998. URL: https://www.aclweb.org/anthology/M98-1014.

132. Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. 2015. arXiv: 1509.01626v3 [cs.LG]. URL: https://arxiv.org/abs/1509.01626.

133. Xiaoqiang Luo. "On coreference resolution performance metrics". In: Proceeding HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing. 2005, pp. 25-32. URL: https://www.aclweb.org/anthology/H05-1004.

134. Xuan-Son Vu. Pre-trained Word2Vec models for Vietnamese. 2016. URL: https://github.com/sonvx/word2vecVN.

135. Xuezhe Ma and Eduard Hovy. "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF". In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Vol. 9. 1. Berlin, Germany: Association for Computational Linguistics, 2016, pp. 1064-1074. URL: https://aclweb.org/anthology/P16-1101.

136. Yan Shao, Christian Hardmeier, Jorg Tiedemann, and Joakim Nivre. "Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF". In: Proceedings of the 8th International Joint Conference on Natural Language Processing. Vol. 9. 1. Taipei, Taiwan, Nov. 2017, pp. 173-183. URL: https://www.aclweb.org/anthology/I17-1018.

137. Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf. "DeepFace: Closing the Gap to Human-Level Performance in Face Verification". In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, Sept. 2014, pp. 1701-1708. URL: https://ieeexplore.ieee.org/document/6909616.

138. Yanjun Ma, John Tinsley, Hany Hassan, Jinhua Du, and Andy Way. "Exploiting Alignment Techniques in MaTrEx: the DCU Machine Translation System for IWSLT08". In: Proceedings of the International Workshop on Spoken Language Translation. Hawaii, USA, 2008, pp. 26-33. URL: http://www2.nict.go.jp/astrec-att/workshop/IWSLT2008/proceedings/EC_2_dcu.pdf.

139. Yann LeCun, Leon Bottou, Yoshua Bengio, Patrick Haffner. "Gradient-based learning applied to document recognition". In: Proceedings of the IEEE. Vol. 86. IEEE, Nov. 1998, pp. 2278-2324. URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=726791.

140. Yann LeCun, Yoshua Bengio. "Convolutional networks for images, speech, and time series". In: The handbook of brain theory and neural networks. 1998. URL: http://yann.lecun.com/exdb/publis/pdf/lecun-bengio-95a.pdf.

141. Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. "DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset". In: Proceedings of The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017). 2017. URL: http://yanran.li/dailydialog.html.

142. Yanyao Shen, Hyokun Yun, Zachary C. Lipton, Yakov Kronrod, Animashree Anandkumar. "Deep Active Learning for Named Entity Recognition". In: Proceedings of the 2nd Workshop on Representation Learning for NLP. Association for Computational Linguistics, 2017, pp. 252-256. URL: https://www.aclweb.org/anthology/W17-2630.pdf.

143. Yoon Kim. "Convolutional Neural Networks for Sentence Classification". In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1746-1751. URL: https://www.aclweb.org/anthology/D14-1181/.

144. Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush. Character-Aware Neural Language Models. 2015. arXiv: 1508.06615v4 [cs.CL]. URL: https://arxiv.org/abs/1508.06615.

145. Yoshua Bengio, Rejean Ducharme, Pascal Vincent, Christian Jauvin. "A Neural Probabilistic Language Model". In: Journal of Machine Learning Research 3 (2003), pp. 1137-1155. URL: http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf.

146. Zellig S. Harris. "Distributional Structure". In: WORD 10 (1954): 2,3, pp. 146-162. URL: https://www.tandfonline.com/doi/abs/10.1080/00437956.1954.11659520.

147. Zhiheng Huang, Wei Xu, Kai Yu. Bidirectional LSTM-CRF Models for Sequence Tagging. 2015. arXiv: 1508.01991v1 [cs.CL]. URL: https://arxiv.org/abs/1508.01991.

148. Zhilin Yang, Ruslan Salakhutdinov, William Cohen. Multi-Task Cross-Lingual Sequence Tagging from Scratch. 2016. arXiv: 1603.06270v2 [cs.CL]. URL: https://arxiv.org/abs/1603.06270.

149. Zhou P., Zheng S., Xu J., Qi Z., Bao H., Xu B. "Joint Extraction of Multiple Relations and Entities by Using a Hybrid Neural Network". In: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2017, CCL 2017. Lecture Notes in Computer Science. Vol. 10565. 2017. URL: https://link.springer.com/chapter/10.1007/978-3-319-69005-6_12.
