Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits
Julien Zhou  1, 2, 3, *@  , Pierre Gaillard  1@  , Thibaud Rahier  3@  , Houssam Zenati  4, 5@  , Julyan Arbel  2@  
1 : Inria, Thoth
Inria Grenoble - Rhône-Alpes, Laboratoire Jean Kuntzmann
2 : Inria, Statify
Inria Grenoble - Rhône-Alpes, Laboratoire Jean Kuntzmann
3 : Criteo AI Lab
Criteo [Paris]
4 : Inria, Soda
INRIA Saclay Ile-de-France
5 : Inria, PreMeDICaL
Inria Montpellier
* : Auteur correspondant

We address stochastic combinatorial semi-bandit problems, where a player can select from P subsets of a set containing d base items. This setting has been studied extensively in prior works which focused on getting acceptable regret upper bounds, despite the potentially exponentially big action space. However, most existing algorithms (e.g. CombUCB, ESCB, OLS-UCB) require prior knowledge on the reward distribution, like an upper bound on a sub-Gaussian proxy-variance, which is hard to estimate tightly. Their regret upper bounds also involve these hypotheses, which can make them slack depending on the instance at hand. We propose a variance-adaptive version of OLS-UCB, relying on an online estimation of the covariance coefficients and computing upper confidence bounds for each action available. Estimating the coefficients of a covariance matrix can be manageable in practical settings and leveraging this information with the structure of the problem results in improved regret upper bounds.


Personnes connectées : 3 Vie privée
Chargement...