Moscow

82540

Author(s):

Varlamova K. D. (Antiplagiat Company)

Khabutdinov I. A. (Moscow Institute of Physics and Technology)

Grabovoy A. V. (ICS RAS, Laboratory 42)

Number of authors:

Publication details

Publication type:

Conference paper

Title:

Automatic Spelling Correction for Russian: Multiple Error Approach

Electronic publication:

Yes

ISBN/ISSN:

2767-9535

DOI:

10.1109/ispras60948.2023.10508161

Conference name:

2023 Ivannikov Ispras Open Conference (ISPRAS)

Source title:

Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2023

City:

Moscow

Publisher:

IEEE

Year of publication:

2023

Pages:

https://ieeexplore.ieee.org/document/10508161

Abstract

To date, the amount of textual information is consistently expanding and reaching wider audiences, leading to a rise in spelling and typographical errors. This further accentuates the Automatic Spelling Correction problem, which remains one of the primary tasks of Natural Language Processing. At the moment this problem is not widely studied for the Russian language, and proposed models often strictly limit the number of errors in a word. This paper presents a model for Automatic Spelling Correction in the Russian language that can handle multiple error cases without limits on the number of errors processed. The model is based on a probabilistic approach and consists of multiple stages, including classification of word correctness, a preliminary candidate search with a shingle-based approach, a source model, and an error model that uses bigrams and phonetics. We outline the process of obtaining data from open sources and investigate different methods of constructing and utilising dictionaries. By searching for candidates using a shingle-based approach with no limit on the number of errors, the model is resistant to multiple error cases. The shingle-based search is compared with the fixed cut distance candidate generation approach. We use several test samples and obtain a top-5 F1-score of 0.80 on the real data, which is mostly social media, and 0.91 on the hand-crafted sample with multiple errors.
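The shingle-based candidate search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's shingle size, index layout, or ranking function, so everything below is an assumption: character bigrams with boundary markers, an inverted index from shingle to dictionary words, and Jaccard similarity for ranking. The names `shingles`, `build_index`, and `candidates` are hypothetical.

```python
from collections import defaultdict


def shingles(word, n=2):
    """Return the set of character n-grams of a word padded with boundary markers."""
    padded = f"^{word}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(dictionary, n=2):
    """Map every shingle to the set of dictionary words that contain it."""
    index = defaultdict(set)
    for word in dictionary:
        for s in shingles(word, n):
            index[s].add(word)
    return index


def candidates(word, index, n=2, top_k=5):
    """Rank dictionary words sharing at least one shingle with `word`.

    Ranking uses Jaccard similarity of shingle sets (an assumption, not the
    paper's scoring). No edit-distance cutoff is applied, so heavily
    misspelled words can still reach their intended correction.
    """
    query = shingles(word, n)
    shared_counts = defaultdict(int)
    for s in query:
        for cand in index.get(s, ()):
            shared_counts[cand] += 1
    scored = []
    for cand, shared in shared_counts.items():
        union = len(query) + len(shingles(cand, n)) - shared
        scored.append((shared / union, cand))
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cand for _, cand in scored[:top_k]]
```

For example, with the index built from `["привет", "привод", "ответ", "молоко"]`, the query `"превед"` (two substitutions away from `"привет"`) still retrieves `"привет"` as a candidate, while `"молоко"`, which shares no shingle with the query, is never scored.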

Bibliographic reference:

Varlamova K.D., Khabutdinov I.A., Grabovoy A.V. Automatic Spelling Correction for Russian: Multiple Error Approach / Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2023. Moscow: IEEE, 2023. URL: https://ieeexplore.ieee.org/document/10508161.

82535

Author(s):

Shodiev D. (Institute for System Programming of the RAS)

Kopanichuk I. V. (Antiplagiat Company)

Chashchin A. V. (Antiplagiat Company)

Grabovoy A. V. (ICS RAS, Laboratory 42)

Kildyakov A. S. (ICS RAS, Laboratory 42)

Chekhovich Yu. V. (ICS RAS, Laboratory 42)

Number of authors:

Publication details

Publication type:

Conference paper

Title:

Ensembling Models for the Generation of Queries to an Altering Search Engine Using Reinforcement Learning

Electronic publication:

Yes

ISBN/ISSN:

2767-9535

DOI:

10.1109/ispras60948.2023.10508170

Conference name:

2023 Ivannikov Ispras Open Conference (ISPRAS)

Source title:

Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2023

City:

Moscow

Publisher:

IEEE

Year of publication:

2023

Pages:

https://ieeexplore.ieee.org/document/10508170

Abstract

The automatic generation of queries to a search engine based on the incoming text is important for question-answering, recommendation, and text reuse detection systems. Every such query requires resources from a user and a search engine itself. A method of ensembling query generation models that maximizes the search completeness metric for the minimum number of queries could be useful. The task of selecting the best model or an ensemble of models is trivial for the case of a fixed search engine. However, real search engines are constantly changing their behavior, learning on incoming data, and changing their index of web pages and documents. They are black boxes for a user. In this paper we propose an approach to ensemble query generation models based on reinforcement learning. By reformulating the problem so that the agent selects a sequence of models rather than a single query generation model, we guarantee maximum retrieval recall even when the worst possible action is selected. As a reward, we introduce a discounted recall metric that penalizes the agent for each extra step of a model request. We modify the UCB learning algorithm so that the re-initialization of the recidivism penalty matrix occurs independently of the engine index state. In this way, we ensure that the top 3 best actions (i.e. sequences of generation model requests) are found in just 5 epochs, where each epoch contains 1050 documents. The model ensemble maintains a stable performance even when the index alters in a way that the ensemble was not informed about.
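The two ingredients named in the abstract, a recall reward discounted per extra model request and a UCB-style choice among actions, can be sketched as follows. The abstract does not define the exact reward or the authors' modification of UCB, so this is only an illustration under stated assumptions: the reward adds, at step t, the fraction of relevant documents newly found, weighted by `gamma ** t`, and the selection rule is the textbook UCB1, not the modified variant from the paper. The names `discounted_recall` and `ucb1_select` are hypothetical.

```python
import math


def discounted_recall(found_per_step, relevant, gamma=0.9):
    """Recall accumulated over a sequence of model requests, discounted per step.

    found_per_step: list of sets, the documents returned at each step.
    relevant: set of documents that should be found.
    Newly found relevant documents at step t contribute with weight gamma**t,
    so later (extra) requests are worth less. This form is an assumption.
    """
    if not relevant:
        return 0.0
    seen = set()
    reward = 0.0
    for t, found in enumerate(found_per_step):
        new_hits = (found & relevant) - seen
        reward += (gamma ** t) * len(new_hits) / len(relevant)
        seen |= new_hits
    return reward


def ucb1_select(counts, means, c=2.0):
    """Pick an action index with standard UCB1 (not the paper's modification).

    counts[a] is how often action a was tried, means[a] its average reward.
    Untried actions are selected first.
    """
    for action, n in enumerate(counts):
        if n == 0:
            return action
    total = sum(counts)
    scores = [
        mean + math.sqrt(c * math.log(total) / n)
        for mean, n in zip(means, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

Here each action would correspond to one sequence of query generation models, and the reward observed after running that sequence on a document would update `counts` and `means`.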

Bibliographic reference:

Shodiev D., Kopanichuk I.V., Chashchin A.V., Grabovoy A.V., Kildyakov A.S., Chekhovich Yu.V. Ensembling Models for the Generation of Queries to an Altering Search Engine Using Reinforcement Learning / Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2023. Moscow: IEEE, 2023. URL: https://ieeexplore.ieee.org/document/10508170.

Shodiev D. (Institute for System Programming of the RAS)

Publications

Last name:

Shodiev

First name:

Damir

Patronymic:

Affiliation

Organization:

Institute for System Programming of the RAS

City:

Moscow

82533

Author(s):

Lavrenov I. V. (ICS RAS, Laboratory 41)

Shumov A. S. (ICS RAS, Laboratory 31)

Number of authors:

Publication details

Publication type:

Conference paper

Title:

Numerical simulation of thermonuclear reactions during implosion in an extended shell

Electronic publication:

Yes

ISBN/ISSN:

978-5-91450-283-3

Conference name:

18th International Conference "Management of Large-Scale System Development" (MLSD'2025, Moscow)

Source title:

Proceedings of the 18th International Conference "Management of Large-Scale System Development" (MLSD'2025, Moscow)

City:

Moscow

Publisher:

ICS RAS

Year of publication:

2025

Pages:

930-936

Abstract

Numerical simulation demonstrates the thermal insulation effect on thermonuclear fuel during the implosion of fuel cavities in an extended shell made of heavy elements. The final implosion conditions achievable under technological constraints are determined.

Bibliographic reference:

Lavrenov I.V., Shumov A.S. Numerical simulation of thermonuclear reactions during implosion in an extended shell / Proceedings of the 18th International Conference "Management of Large-Scale System Development" (MLSD'2025, Moscow). Moscow: ICS RAS, 2025. pp. 930-936.

82524

Author(s):

Meshkov V. S. (Moscow Institute of Physics and Technology)

Kiselev N. S. (Moscow Institute of Physics and Technology)

Grabovoy A. V. (ICS RAS, Laboratory 42)

Number of authors:

Publication details

Publication type:

Conference paper

Title:

ConvNets Landscape Convergence: Hessian-Based Analysis of Matricized Networks

Electronic publication:

Yes

ISBN/ISSN:

2767-9535

DOI:

10.1109/ispras64596.2024.10899113

Conference name:

2024 Ivannikov Ispras Open Conference (ISPRAS)

Source title:

Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2024

City:

Moscow

Publisher:

IEEE

Year of publication:

2024

Pages:

https://ieeexplore.ieee.org/document/10899113

Abstract

The Hessian of a neural network is important for understanding the loss landscape and the characteristics of the network architecture. The Hessian matrix captures important information about the curvature, sensitivity, and local behavior of the loss function. Our work proposes a method that enhances the understanding of the local behavior of the loss function and can be used to analyze the behavior of neural networks and to interpret the parameters in these networks. In this paper, we consider an approach to investigate the properties of the deep neural network, using the Hessian. We propose a method for estimating the Hessian matrix norm for a specific type of neural network, namely convolutional networks. We have obtained the results for both 1D and 2D convolutions, as well as for the fully connected head in these networks. Our empirical analysis supports these findings, demonstrating convergence in the loss function landscape. We have evaluated the Hessian norm for neural networks represented as a product of matrices and considered how this estimate affects the landscape of the loss function.

Bibliographic reference:

Meshkov V.S., Kiselev N.S., Grabovoy A.V. ConvNets Landscape Convergence: Hessian-Based Analysis of Matricized Networks / Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2024. Moscow: IEEE, 2024. URL: https://ieeexplore.ieee.org/document/10899113.

82522

Author(s):

Poimanov D. R. (Moscow State University)

Mestetskiy L. M. (Moscow State University)

Grabovoy A. V. (ICS RAS, Laboratory 42)

Number of authors:

Publication details

Publication type:

Conference paper

Title:

N-Gram Perplexity-Based AI-Generated Text Detection

Electronic publication:

Yes

ISBN/ISSN:

2767-9535

DOI:

10.1109/ispras64596.2024.10899150

Conference name:

2024 Ivannikov Ispras Open Conference (ISPRAS)

Source title:

Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2024

City:

Moscow

Publisher:

IEEE

Year of publication:

2024

Pages:

https://ieeexplore.ieee.org/abstract/document/10899150

Abstract

Currently, more efforts are being made to improve the capabilities of Large Language Models than to address their implications. Modern language models are capable of generating texts that appear indistinguishable from those written by human experts. While providing a high quality of life, such breakthroughs at the same time pose new challenges in education, science and social media. In addition, existing approaches to detect texts created by artificial intelligence require either high computational cost or access to the internal computations of LLMs, which in turn hinders their public availability. Based on these considerations, this paper presents a new paradigm for detecting texts created by artificial intelligence based on collecting preliminary token statistics and computing n-gram perplexity features. On the combination of HC3, M4GT and MAGE datasets it shows a speedup of 2x over existing approaches with a quality drop of around 5%. Moreover, the combination of methods achieves the best quality. This strikes a balance between computational cost, accessibility and performance.
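As an illustration of what an n-gram perplexity feature looks like, the sketch below collects bigram statistics from a reference token sequence and then scores a new text by its perplexity under those statistics. The abstract does not specify the n-gram order, tokenizer, smoothing, or how features feed the detector, so all of these are assumptions: whitespace-style token lists, bigrams, and add-one smoothing with one extra slot for unseen tokens. The names `train_bigram` and `bigram_perplexity` are hypothetical.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Collect preliminary token statistics: context counts and bigram counts."""
    contexts = Counter(tokens[:-1])           # how often each token starts a bigram
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens)) + 1         # +1 slot for unseen tokens
    return contexts, bigrams, vocab_size


def bigram_perplexity(tokens, model):
    """Perplexity of a token sequence under an add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens to form a bigram")
    log_prob = 0.0
    for prev, cur in pairs:
        # Counter returns 0 for unseen keys, so smoothing keeps p > 0.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))
```

A text whose bigrams match the collected statistics gets a lower perplexity than one with unseen word order. In a detector, such scores computed over several n-gram orders or text windows could serve as inputs to a lightweight classifier, though that pipeline is not described in the abstract.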

Bibliographic reference:

Poimanov D.R., Mestetskiy L.M., Grabovoy A.V. N-Gram Perplexity-Based AI-Generated Text Detection / Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2024. Moscow: IEEE, 2024. URL: https://ieeexplore.ieee.org/abstract/document/10899150.

Mestetskiy L. M. (Moscow State University)

Publications

Last name:

Mestetskiy

First name:

Leonid

Patronymic:

Moiseevich

Affiliation

Organization:

Moscow State University

City:

Moscow

Poimanov D. R. (Moscow State University)

Publications

Last name:

Poimanov

First name:

Dmitry

Patronymic:

Romanovich

Affiliation

Organization:

Moscow State University

City:

Moscow

82519

Author(s):

Asvarov A. (Dagestan State University)

Grabovoy A. V. (ICS RAS, Laboratory 42)

Number of authors:

Publication details

Publication type:

Conference paper

Title:

Neural Machine Translation System for Lezgian, Russian and Azerbaijani Languages

Electronic publication:

Yes

ISBN/ISSN:

2767-9535

DOI:

10.1109/ispras64596.2024.10899143

Conference name:

2024 Ivannikov Ispras Open Conference (ISPRAS)

Source title:

Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2024

City:

Moscow

Publisher:

IEEE

Year of publication:

2024

Pages:

https://ieeexplore.ieee.org/document/10899143

Abstract

We release the first neural machine translation system for translation between Russian, Azerbaijani and the endangered Lezgian languages, as well as monolingual and parallel datasets collected and aligned for training and evaluating the system. Multiple experiments are conducted to identify how different sets of training language pairs and data domains can influence the resulting translation quality. We achieve BLEU scores of 26.14 for Lezgian-Azerbaijani, 22.89 for Azerbaijani-Lezgian, 29.48 for Lezgian-Russian and 24.25 for Russian-Lezgian pairs. The quality of zero-shot translation is assessed on a Large Language Model, showing its high level of fluency in Lezgian. However, the model often refuses to translate, justifying itself with its incompetence. We contribute our translation model along with the collected parallel and monolingual corpora and sentence encoder for the Lezgian language.

Bibliographic reference:

Asvarov A., Grabovoy A.V. Neural Machine Translation System for Lezgian, Russian and Azerbaijani Languages / Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2024. Moscow: IEEE, 2024. URL: https://ieeexplore.ieee.org/document/10899143.

82515

Author(s):

Levykin A. I. (Moscow State University)

Khabutdinov I. A. (Moscow Institute of Physics and Technology)

Grabovoy A. V. (ICS RAS, Laboratory 42)

Vorontsov K. V. (Computing Center, FRC CSC RAS)

Number of authors:

Publication details

Publication type:

Conference paper

Title:

The methodology of multi-criteria evaluation of text markup models based on inconsistent expert markup

Electronic publication:

Yes

ISBN/ISSN:

2075-7182

DOI:

10.28995/2075-7182-2025-23-1066-1080

Conference name:

Annual International Conference “Dialogue - 2025” (Computational Linguistics and Intellectual Technologies)

Source title:

Papers from the Annual International Conference “Dialogue” (2025)

Volume:

Vol. 23

City:

Moscow

Publisher:

JINR

Year of publication:

2025

Pages:

1066-1080

Abstract

A wide class of natural language processing tasks is solved using markup. At the moment, the vast majority of models and datasets rely on a simple markup structure containing only fragments and labels. Moreover, simple classification metrics such as F1, Precision, and Recall are used to evaluate the model’s accuracy. The problem with such metrics is that they do not take into account all aspects of the markup structure and that they are applicable only under the assumption that an ideal markup exists. This paper proposes a more general and universal markup structure that allows solving complex problems and builds a methodology for multi-criteria evaluation of text markup models based on inconsistent expert markup. The constructed method is then applied to assess the quality of the model obtained within the winning algorithm of the “READ//ABLE” competition, which focused on building an effective essay markup system. The results demonstrate that the new markup structure and evaluation approach provide a more comprehensive and accurate assessment of model performance, addressing the limitations of traditional metrics by accounting for complex markup scenarios and expert inconsistencies.

Bibliographic reference:

Levykin A.I., Khabutdinov I.A., Grabovoy A.V., Vorontsov K.V. The methodology of multi-criteria evaluation of text markup models based on inconsistent expert markup / Papers from the Annual International Conference “Dialogue” (2025). Moscow: JINR, 2025. Vol. 23. pp. 1066-1080.