Large agro-industrial complexes are interested in deep automation of yield control processes to reduce the costs caused by human error or a shortage of qualified personnel. Existing approaches solve problems such as yield assessment or detection of plant pathologies, but they cannot properly quantify the volume of plant biomass or the size of the diseased area. One reason for this limitation is the poor quality of the object instance masks produced by machine vision systems, which stems from the Mask R-CNN architecture commonly used in computer vision. In this paper, we propose a composition of algorithms for obtaining accurate object masks in the task of segmenting tomato leaf instances in images collected under the difficult conditions of industrial greenhouses. Combining Mask R-CNN with the CascadePSP neural network increased the average IoU by 1.194% compared to “pure” Mask R-CNN on images with a complex, object-like background.
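For reference, the IoU (intersection over union) of a predicted mask and its ground-truth mask is the number of pixels the two masks share divided by the number of pixels covered by either. The following is a minimal sketch in Python with NumPy, not the evaluation code used in this work: the function names `mask_iou` and `mean_iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection over union of two binary masks with the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)


def mean_iou(mask_pairs):
    """Average IoU over an iterable of (predicted, ground-truth) mask pairs."""
    return float(np.mean([mask_iou(p, t) for p, t in mask_pairs]))


# Toy data (hypothetical): a 2x2 ground-truth leaf and a coarse 2x3 prediction.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True      # 4 pixels
coarse = np.zeros((4, 4), dtype=bool)
coarse[1:3, 1:4] = True     # 6 pixels, fully covering the ground truth

coarse_score = mask_iou(coarse, truth)    # 4 shared pixels / 6 covered pixels
refined_score = mask_iou(truth, truth)    # a perfectly refined mask
```

In this toy case the coarse mask overshoots the leaf boundary by one column, which lowers its IoU; a refinement step that tightens the boundary raises the score, which is the kind of effect the reported average IoU gain measures.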