82529

Author(s): 

Number of authors: 

2

Publication details

Publication type: 

Conference paper

Title: 

Automated Text Identification on Languages of the Iberian Peninsula: LLM and BERT-based Models Aggregation

Electronic publication: 

Yes

ISBN/ISSN: 

1613-0073

Conference name: 

  • Iberian Languages Evaluation Forum (IberLEF 2024) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2024)

Source title: 

  • Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2024)

Volume designation and number: 

Vol. 3756

City: 

  • Valladolid

Publisher: 

  • CEUR-WS.org

Year of publication: 

2024

Pages: 

https://ceur-ws.org/Vol-3756/IberAuTexTification2024_paper6.pdf
Abstract
This paper describes our approach to the IberAuTexTification (Automated Text Identification on Languages of the Iberian Peninsula) competition, held as part of the IberLEF 2024 conference. Machine-generated text fragments can now be found in almost every domain. The rapid progress of language models and the widespread distribution of such texts can make them hard for human readers to distinguish from human-written ones. In this article, we present a model for detecting machine-generated fragments based on the aggregation of responses from the large language model BLOOM and two BERT-like encoders, Multilingual E5 and XLM-RoBERTa. Given the specifics of the task, namely the presence of several languages of the Iberian Peninsula, we fine-tuned distinct models for different subgroups of languages. The method described in the paper helped our team score about 67% on the binary classification dataset with 6 languages in the final competition results.
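
The aggregation idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three model responses are combined or how the languages are grouped, so everything below is an assumption made for illustration: the averaging rule, the 0.5 threshold, the `LANGUAGE_GROUPS` mapping, and the names `aggregate` and `classify` are hypothetical, and the detectors are stand-in callables rather than the fine-tuned BLOOM, Multilingual E5 and XLM-RoBERTa models.

```python
from statistics import mean
from typing import Callable, Dict, Tuple

# A detector maps a text to the probability that it is machine-generated.
Detector = Callable[[str], float]

# Hypothetical language subgroups; the abstract does not give the real grouping.
LANGUAGE_GROUPS: Dict[str, str] = {
    "es": "group_a", "ca": "group_a", "gl": "group_a",
    "pt": "group_b", "eu": "group_b", "en": "group_b",
}


def aggregate(probabilities: Dict[str, float], threshold: float = 0.5) -> Tuple[float, str]:
    """Average per-model probabilities and threshold the result.

    Plain averaging is an assumption; the paper may use a different rule.
    """
    score = mean(probabilities.values())
    label = "generated" if score >= threshold else "human"
    return score, label


def classify(text: str, lang: str,
             detectors: Dict[str, Dict[str, Detector]]) -> Tuple[float, str]:
    """Route the text to the detectors of its language subgroup, then aggregate."""
    group = LANGUAGE_GROUPS[lang]
    probabilities = {name: model(text) for name, model in detectors[group].items()}
    return aggregate(probabilities)


if __name__ == "__main__":
    # Stand-in detectors returning fixed scores, used only to show the data flow.
    detectors = {
        "group_a": {
            "bloom": lambda text: 0.9,
            "e5": lambda text: 0.6,
            "xlm_roberta": lambda text: 0.3,
        },
    }
    print(classify("Un texto de ejemplo.", "es", detectors))
```

In this sketch each language subgroup has its own set of three detectors, mirroring the per-subgroup fine-tuning mentioned in the abstract; a weighted average or a trained meta-classifier could replace the plain mean without changing the routing logic.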

Bibliographic reference: 

Gritsai G.M., Grabovoy A.V. Automated Text Identification on Languages of the Iberian Peninsula: LLM and BERT-based Models Aggregation / Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2024). Valladolid: CEUR-WS.org, 2024. Vol. 3756. https://ceur-ws.org/Vol-3756/IberAuTexTification2024_paper6.pdf.