This article discusses the main metrics for evaluating the performance of big data warehouses
and compares the characteristics of the most popular modern frameworks. For distributed
repositories of large volumes of information, the speed and efficiency metrics used for local
databases cannot be applied directly. Qualitative and quantitative assessment is complicated by
the impossibility of reproducing identical queries under identical conditions in distributed
storage, since the state of the network and of the available computing resources changes
constantly. When choosing a framework, technology stack, file system, or distributed database
architecture for a specific task, many factors must be taken into account, including the full
set and frequency of the operations the system will be required to perform. The article presents
the features and advantages of using containers and various file systems for storing data in
distributed systems, along with other technologies that increase the speed and efficiency of
processing large arrays of heterogeneous data. At present there is no single best solution for
organizing the storage of large volumes of heterogeneous data, but clear trends can be seen in
the development of processing technologies. Many of these trends involve adapting methods long
established in relational databases to big data, while accounting for its specific features.
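The reproducibility caveat above has a practical consequence for measurement: since no two runs of a query against a distributed store see identical network and cluster conditions, a single timing is not meaningful, and evaluations typically report latency distributions instead. The sketch below illustrates this idea only; `run_query` is a hypothetical stand-in for timing one query against a real distributed system, with the environmental variability modeled as random jitter.

```python
import random
import statistics

def run_query(trial: int) -> float:
    """Hypothetical stand-in for timing one query (in ms) against a
    distributed store; latency varies with network and cluster load."""
    base_ms = 120.0                      # nominal cost of the query itself
    jitter = random.uniform(0.0, 80.0)   # changing environment and resources
    return base_ms + jitter

# Because no two runs see identical conditions, report a distribution
# (e.g. median and 95th percentile), not a single measurement.
latencies = sorted(run_query(t) for t in range(100))
median = statistics.median(latencies)
p95 = latencies[int(0.95 * len(latencies))]
print(f"median={median:.1f} ms, p95={p95:.1f} ms")
```

Benchmark suites for warehouses follow the same pattern: they repeat each query many times and compare percentile latencies across systems, which partially compensates for the non-reproducible conditions of any individual run.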