This paper addresses the problem of solving a system of nonlinear equations whose left-hand side is an arbitrary continuous vector function. The only a priori information assumed to be available about this function is the values of its components. An approximate solution of the system is computed by a parameterized iterative method, and the quality of the method is assessed in terms of a quadratic residual functional. We propose a self-learning (reinforcement) procedure based on auxiliary Monte Carlo (MC) experiments, an exponential utility function, and a payoff function that implements Bellman’s optimality principle. A theorem on the strict monotonic decrease of the residual functional is proven.
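
For orientation, the setting described above can be written in symbols as follows. This is a sketch under stated assumptions: the notation ($F$, $f_i$, $x$, $n$, $m$, $\Phi$, $G$, $p_k$) is introduced here for illustration only, the sum-of-squares form is one natural reading of "quadratic residual functional", and the precise hypotheses of the monotonicity theorem are not reproduced.

```latex
% Assumed notation (not taken from the paper):
% F is the continuous left-hand side, known only through its component values.
\[
  F(x) = 0, \qquad
  F = (f_1, \dots, f_m)^{\top} \colon \mathbb{R}^n \to \mathbb{R}^m .
\]
% Quadratic residual functional, assumed here to be the sum of squared components.
\[
  \Phi(x) = \| F(x) \|_2^2 = \sum_{i=1}^{m} f_i(x)^2 \ge 0,
  \qquad
  \Phi(x^{*}) = 0 \iff F(x^{*}) = 0 .
\]
% Generic iterative method with parameters p_k (G is a placeholder update map),
% and the strict monotonic decrease of the residual along the iterates.
\[
  x_{k+1} = G(x_k;\, p_k),
  \qquad
  \Phi(x_{k+1}) < \Phi(x_k) .
\]
```

In this reading, the self-learning procedure would be responsible for choosing the parameters $p_k$ from values of $F$ alone, since no derivative information is assumed.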