The paper presents a novel method for near-duplicate detection in handwritten document collections of school
essays. The large number of online resources offering ready-made academic essays makes it easy to cheat by
reusing them during high school final exams. Despite the importance of the problem, there is currently
no automatic method for near-duplicate detection in handwritten documents such as school essays. The school
essay is represented as a sequence of scanned images of handwritten text. Despite advances in the recognition
of handwritten and printed text, applying these methods to the current task remains a challenge. The proposed method of
near-duplicate detection does not require detailed text markup, which makes it applicable to a wide range
of information extraction tasks in a zero-shot setting, i.e. without any resources specific to the
processed language. The method is based on series analysis. Each page image is segmented into words.
The text is then characterized by a sequence of features that are invariant to the author’s writing style: the normalized
lengths of the segmented words. These features can be used for both handwritten and machine-readable texts. The
computational experiment is conducted on the IAM dataset of English handwritten texts and on a dataset of real images
of handwritten school essays.
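
The idea of describing a text by the normalized lengths of its segmented words can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes word segmentation has already produced a list of word widths in pixels, it normalizes by the mean width (one possible choice), and it compares two series with dynamic time warping, which is swapped in here as a generic series-comparison technique. The names `normalized_lengths`, `dtw_distance`, and `series_distance` are hypothetical.

```python
from typing import List, Sequence


def normalized_lengths(word_widths: Sequence[float]) -> List[float]:
    """Divide each word width by the mean width so handwriting size cancels out.

    Assumption: mean normalization; the paper may normalize differently.
    """
    if not word_widths:
        return []
    mean_width = sum(word_widths) / len(word_widths)
    return [w / mean_width for w in word_widths]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with absolute-difference cost.

    Assumption: DTW stands in for the series analysis mentioned in the abstract.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def series_distance(widths_a: Sequence[float], widths_b: Sequence[float]) -> float:
    """Length-averaged DTW distance between two normalized word-length series."""
    a = normalized_lengths(widths_a)
    b = normalized_lengths(widths_b)
    if not a or not b:
        return float("inf")
    return dtw_distance(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    original = [30, 80, 50, 120]     # word widths in pixels (made-up values)
    copy = [60, 160, 100, 240]       # same word pattern, larger handwriting
    unrelated = [120, 30, 30, 90]    # a different word pattern
    print(series_distance(original, copy))
    print(series_distance(original, unrelated))
```

In this sketch a small distance suggests a near-duplicate candidate; any decision threshold would have to be tuned on real data and is not taken from the paper.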