56869 | ICS RAS (ИПУ РАН)

Author(s):

Bakhtadze N.N., Suleykin A.S., Panfilov P.

Publication details

Publication type:

Conference paper

Title:

Industrial track: Architecting railway KPIs data processing with Big Data technologies

DOI:

10.1109/BigData47090.2019.9006196

Conference name:

2019 IEEE International Conference on Big Data (Big Data)

Source title:

Proceedings of the IEEE International Conference on Big Data

City:

Los Angeles

Publisher:

IEEE

Year of publication:

2019

Pages:

2047-2056

Abstract

In our research, we built a data processing pipeline for storing railway KPI data using open-source Big Data technologies: Apache Hadoop, Kafka, Kafka HDFS Connector, Spark, Airflow, and PostgreSQL. The load-testing methodology we created allowed us to run load tests iteratively with increasing data sizes, evaluate the cluster software and hardware resources needed, and, finally, detect the bottlenecks of the solution. As a result of the research, we proposed an architecture for data processing and storage and gave recommendations on data pipeline optimization. In addition, we calculated the approximate sizing of cluster machines for the data processing and storage services at the current dataset volume.

Bibliographic reference:

Bakhtadze N.N., Suleykin A.S., Panfilov P. Industrial track: Architecting railway KPIs data processing with Big Data technologies / Proceedings of the IEEE International Conference on Big Data. Los Angeles: IEEE, 2019. pp. 2047-2056.