82448 | ИПУ РАН (ICS RAS)

Author(s):

Gritsai G.M., Grabovoy A.V., Chekhovich Yu.V.

Publication details

Publication type:

Conference paper

Title:

Automatic Detection of Machine Generated Texts: Need More Tokens

DOI:

10.1109/IVMEM57067.2022.9983964

Conference name:

2022 Ivannikov Memorial Workshop (IVMEM)

Source title:

Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2022

City:

Moscow

Publisher:

IEEE

Year of publication:

2022

Pages:

20-26

Abstract

Current advances in neural text generation make it possible to create texts that are hardly distinguishable from human-written ones. A survey aimed at improving the efficiency of automatic discriminators that detect machine-generated text could be useful in revealing the features that directly affect detection quality. Recently, many works have appeared in the natural language processing (NLP) and machine learning (ML) communities on building accurate detectors for the English language. Despite the importance of this problem, all existing works for Russian rely only on short sequence lengths. In this work, we argue that context length matters. First, we present a novel open dataset of long texts in Russian for the task of machine-generated text detection. We describe the collection, the selection of generative models, and the sampling process in detail, and present an exploratory analysis of the quality of various discriminators. Second, we conduct a set of learning experiments to build accurate machine-generated text detectors for both English and Russian. In addition, we conduct a comparative analysis of discriminator quality when training a multi-task model.

Bibliographic reference:

Gritsai G.M., Grabovoy A.V., Chekhovich Yu.V. Automatic Detection of Machine Generated Texts: Need More Tokens / Proceedings of the Ivannikov Memorial Workshop (IVMEM), 2022. Moscow: IEEE, 2022. pp. 20-26.