26937

Author(s): 

Number of authors: 

2

Publication details

Publication type: 

Conference paper

Title: 

On Effectiveness of the Mirror Decent Algorithm for a Stochastic Multi-Armed Bandit Governed by a Stationary Finite Markov Chain

ISBN/ISSN: 

ISBN 978-1-4799-2497-4

Conference name: 

  • The 3rd Australian Control Conference (AUCC2013), 4-5 November 2013, Perth, Western Australia

Source title: 

  • Proceedings of the 3rd Australian Control Conference (AUCC2013, Perth, Western Australia)

City: 

  • Perth, Australia

Publisher: 

  • Engineers Australia

Year of publication: 

2013

Pages: 

244-250
Abstract
In this article, we study the effectiveness of the recently developed Mirror Descent Randomized Control Algorithm for a class of homogeneous finite Markov chains governed by a stochastic multi-armed bandit with unknown mean losses. We prove explicit, non-asymptotic upper and lower bounds for the mean losses at a given (finite) time horizon. These bounds are very similar as functions of the problem parameters and the time horizon, but differ in the logarithmic term and the absolute constant. A numerical example illustrates the theoretical results.

Bibliographic reference: 

Nazin A.V., Miller B.M. On Effectiveness of the Mirror Decent Algorithm for a Stochastic Multi-Armed Bandit Governed by a Stationary Finite Markov Chain / Proceedings of the 3rd Australian Control Conference (AUCC2013, Perth, Western Australia). Perth, Australia: Engineers Australia, 2013. pp. 244-250.