55èmes Journées de Statistique de la SFdS

sciencesconf.org:jds2024:530723

Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits

Julien Zhou 1, 2, 3, *, @ , Pierre Gaillard 1, @ , Thibaud Rahier 3, @ , Houssam Zenati 4, 5, @ , Julyan Arbel 2, @

1 : Inria, Thoth

Inria Grenoble - Rhône-Alpes, Laboratoire Jean Kuntzmann

2 : Inria, Statify

Inria Grenoble - Rhône-Alpes, Laboratoire Jean Kuntzmann

3 : Criteo AI Lab

Criteo [Paris]

4 : Inria, Soda

INRIA Saclay Ile-de-France

5 : Inria, PreMeDICaL

Inria Montpellier

* : Auteur correspondant

We address stochastic combinatorial semi-bandit problems, where a player can select from P subsets of a set containing d base items. This setting has been studied extensively in prior works which focused on getting acceptable regret upper bounds, despite the potentially exponentially big action space. However, most existing algorithms (e.g. CombUCB, ESCB, OLS-UCB) require prior knowledge on the reward distribution, like an upper bound on a sub-Gaussian proxy-variance, which is hard to estimate tightly. Their regret upper bounds also involve these hypotheses, which can make them slack depending on the instance at hand. We propose a variance-adaptive version of OLS-UCB, relying on an online estimation of the covariance coefficients and computing upper confidence bounds for each action available. Estimating the coefficients of a covariance matrix can be manageable in practical settings and leveraging this information with the structure of the problem results in improved regret upper bounds.

Type :	:	oral
Thématiques	:	Session groupe MALIA
Mots-Clés	:	Bandits ; Stochastic Combinatorial Semi ; Bandit ; Covariance ; Confidence Ellipsoid

Vie privée | Accessibilité