82446 | ИПУ РАН

Author(s):

Потяшин И.О., Каприелова М.С., Чехович Ю.В., Кильдяков А.С., Сейил Т.Б., Финогеев Е.Л., Грабовой А.В.

Publication details

Publication type:

Conference paper

Title:

HWR200: New open access dataset of handwritten texts images in Russian

Electronic publication:

Yes

ISBN/ISSN:

2075-7182

DOI:

10.28995/2075-7182-2023-22-452-458

Conference name:

Annual International Conference “Dialogue - 2023” (Computational Linguistics and Intellectual Technologies)

Source title:

Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference “Dialogue” (2023)

Volume designation and number:

Issue 22

City:

Moscow

Publisher:

Russian State University for the Humanities

Year of publication:

2023

Pages:

452-458

Abstract

Handwritten text image datasets are highly useful for solving many machine learning problems. Such problems include recognition of handwritten characters and handwriting, visual question answering, near-duplicate detection, search for text reuse in handwriting, and many auxiliary tasks such as highlighting lines, words, and other objects in the text. The paper presents a new dataset of handwritten text images in Russian, created by 200 writers with different handwriting and photographed in different environments. We describe the procedure for creating this dataset and the requirements that were set for the texts and photos. Experiments with the baseline solution on the fraud search and text reuse search problems showed recall of 60% and 83%, respectively, and false positive rates of 5% and 2%, respectively, on the dataset.

Bibliographic reference:

Потяшин И.О., Каприелова М.С., Чехович Ю.В., Кильдяков А.С., Сейил Т.Б., Финогеев Е.Л., Грабовой А.В. HWR200: New open access dataset of handwritten texts images in Russian / Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference “Dialogue” (2023). Moscow: Russian State University for the Humanities, 2023. Issue 22. pp. 452-458.