26938 | ICS RAS

Author(s):

Nazin A. V. (ICS RAS, Laboratory 07)

Miller B. M. (IITP RAS)

Publication parameters

Publication type:

Conference paper

Title:

Mirror Descent Algorithm for a Multi-Armed Bandit Governed by a Stationary Finite State Markov Chain

ISBN/ISSN:

ISBN: 978-3-952-41734-8

Conference name:

12th European Control Conference (ECC-13, Zurich, Switzerland, 2013)

Source title:

Proceedings of the 12th European Control Conference (ECC-13, Zurich, Switzerland, 2013)

City:

Zürich

Publisher:

EUCA

Year of publication:

2013

Pages:

371-375

Abstract

This article further develops the adaptive approach, presented in [18], to the control of observable Markov chains with a finite number of states. We apply the Mirror Descent Randomized Control Algorithm (MDRCA) to a class of homogeneous finite Markov chains governed by a multi-armed bandit with unknown mean losses. In contrast to the partially observable Markov decision process, the adaptive approach does not presuppose knowledge of the probabilistic characteristics of the random perturbations, and it yields a control strategy with a known rate of convergence to the optimal solution. We propose a concrete MDRCA and prove an explicit, non-asymptotic upper bound on the mean losses over a given (finite) time horizon. A numerical example illustrates the theoretical results.

Bibliographic reference:

Nazin A.V., Miller B.M. Mirror Descent Algorithm for a Multi-Armed Bandit Governed by a Stationary Finite State Markov Chain / Proceedings of the 12th European Control Conference (ECC-13, Zurich, Switzerland, 2013). Zürich: EUCA, 2013. P. 371-375.