42838

Автор(ы): 

Автор(ов): 

2

Параметры публикации

Тип публикации: 

Тезисы доклада

Название: 

Primal-dual accelerated gradient algorithm for a stochastic multi-armed bandit governed by a stationary finite Markov chain

Электронная публикация: 

Да

Наименование конференции: 

  • European Conference on Queueing Theory (ECQT2016, Toulouse, France)

Наименование источника: 

  • Proceedings ECQT2016

Город: 

  • Toulouse, France

Издательство: 

  • CCSD

Год издания: 

2016

Страницы: 

57-57
Аннотация
The class of partially observable Markov decision processes with finite number of states and control values constitutes the basis of the queuing systems optimization under uncertainty. In this article, the adaptive control problem for a class of homogeneous finite Markov chains governed by a stochastic multi-armed bandit with unknown mean losses is considered. Using the problem statement as in (Nazin and Miller, AUCC 2013, Perth, Australia), one can apply a novel primal-dual algorithm which generalizes Nesterov’s accelerated gradient method for convex optimization. This algorithm may be considered as a non-trivial modification of the mirror descent randomized control algorithm which was proposed and studied in (Nazin and Miller, AUCC 2013, Perth, Australia). Here the explicit, non-asymptotic bounds for the mean losses at a given time horizon has been discussed and proved. Numerical example illustrates both theoretical and simulation results for suggested algorithm.

Библиографическая ссылка: 

Назин А.В., Миллер Б.М. Primal-dual accelerated gradient algorithm for a stochastic multi-armed bandit governed by a stationary finite Markov chain / Proceedings ECQT2016. Toulouse, France: CCSD, 2016. С. 57-57.