This paper proposes and tests a video data processing pipeline that automates monitoring of whether personnel are wearing personal protective equipment, and wearing it correctly, under the difficult filming conditions of CCTV cameras. The proposed solution is based on neural network algorithms and can be flexibly configured for specific conditions and tasks, including the generation of operational alerts. The approach is demonstrated on the recognition of medical masks. The solution comprises an AlphaPose-based automated annotation system for training the detector, a YOLOX neural network detector, a ByteTrack tracking system, and a lightweight CNN classifier whose input is minitracks of people’s faces, 8 frames each. The collected dataset consists of over 260,000 minitracks. The classifier achieves $F_1=0.86$ for determining the presence of a mask and $F_1=0.79$ for determining whether it is worn correctly.
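As a rough illustration of the classifier input and the reported metric, the sketch below cuts one tracked face sequence into 8-frame minitracks and computes $F_1$ from confusion counts. The function names `split_into_minitracks` and `f1_score` are hypothetical. The windowing policy (non-overlapping windows, short remainder discarded) is an assumption, since the abstract only states the minitrack length.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

MINITRACK_LEN = 8  # frames per minitrack, as stated in the abstract


def split_into_minitracks(track: Sequence[T], length: int = MINITRACK_LEN) -> List[List[T]]:
    """Cut one face track into consecutive windows of `length` frames.

    Assumption (not taken from the paper): windows do not overlap, and a
    trailing remainder shorter than `length` is discarded.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    full = len(track) - len(track) % length  # frames covered by whole windows
    return [list(track[i:i + length]) for i in range(0, full, length)]


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, a 20-frame track yields two minitracks under these assumptions, and the last four frames are dropped. An overlapping (sliding-window) split would be an equally plausible reading of the abstract.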