The article discusses the main metrics for evaluating the performance of big data warehouses and compares the characteristics of the most popular modern frameworks. When considering distributed storage of large volumes of information, it is impossible to apply the same speed and efficiency metrics that are used for local databases. Qualitative and quantitative assessment is further complicated by the impossibility of reproducing the same queries under identical conditions in distributed storage, since the state of the network environment and the available computing resources changes constantly. When choosing a framework, technology stack, file system, or distributed database architecture for a specific task, many factors must be taken into account, including the full set and frequency of the operations and processing workloads the system will be required to handle. The article presents the features and advantages of using containers and various file systems for storing data in distributed systems, as well as other technologies that increase the speed and efficiency of processing large arrays of heterogeneous data. At present there is no single solution that provides an optimal way to organize the storage of large volumes of heterogeneous data, but clear trends can be seen in the development of technologies for processing it. Many of these trends concern applying to big data the same methods that are used in relational databases, adapted to its specific features.