26939

Author(s): 

Number of authors: 

2

Publication details

Publication type: 

Conference paper

Title: 

Robust Mirror Descent Algorithm for a Multi-Armed Bandit Governed by a Stationary Finite Markov Chain

ISBN/ISSN: 

ISBN: 978-3-902823-35-9 / ISSN: 1474-6670

Conference name: 

  • 7th IFAC Conference on Manufacturing Modelling, Management, and Control (MIM'2013, Saint Petersburg)

Source title: 

  • Proceedings of the 7th IFAC Conference on Manufacturing Modelling, Management, and Control (MIM'2013, Saint Petersburg)

City: 

  • Saint Petersburg

Publisher: 

  • Saint Petersburg State University and Saint Petersburg National Research University of Information Technologies, Mechanics, and Optics

Year of publication: 

2013

Pages: 

939-943
Abstract
This paper develops an adaptive approach to controlling observable Markov chains with a finite number of states. We apply the Robust Mirror Descent Randomized Control Algorithm (RMDRCA) to a class of homogeneous finite Markov chains governed by a multi-armed bandit with unknown mean losses. This extends the approach presented in [Nazin and Miller (2011a)] and [Nazin and Miller (2011b)]. In contrast to the partially observable Markov decision process setting, the adaptive approach does not presuppose knowledge of the probabilistic characteristics of the random perturbations and yields a control strategy with a known rate of convergence to the optimal solution. We propose a concrete RMDRCA and prove an explicit, non-asymptotic upper bound on the mean losses at any current time. A numerical example illustrates the theoretical results.

Bibliographic reference: 

Nazin A.V., Miller B.M. Robust Mirror Descent Algorithm for a Multi-Armed Bandit Governed by a Stationary Finite Markov Chain / Proceedings of the 7th IFAC Conference on Manufacturing Modelling, Management, and Control (MIM'2013, Saint Petersburg). Saint Petersburg: Saint Petersburg State University and Saint Petersburg National Research University of Information Technologies, Mechanics, and Optics, 2013. pp. 939-943.