The paper deals with an optimal allocation problem in a finite-source queueing system where the repair facility consists of multiple heterogeneous servers. A threshold-based allocation policy prescribes the usage of slower servers according to given threshold levels of the queue length. Under Markovian assumptions, this problem can be treated as a continuous-time Markov decision problem, which can be solved efficiently by dynamic programming algorithms. However, under conditions of uncertainty, when there is no information about the transient characteristics of the system and, in addition, the total number of states is too large, simulation-based optimization methods must be applied. We use both reinforcement learning methods and a random search method based on simulated annealing to solve the discrete optimization problem. Experimental results are compared with the exact solution obtained by policy iteration. The advantages and disadvantages of the methods and the peculiarities of their use for controllable queueing systems are discussed.