Regret Minimization in Heavy-Tailed Bandits with Unknown Distributional Parameters
Posed by Gianmarco Genalti et al. (2025)
§ Problem Statement
Setup
Stochastic heavy-tailed bandits with unknown tail parameters. Fix integers and horizon . For each arm , pulling arm at time yields an independent reward with mean (well-defined under the moment condition below). Let . For a (possibly randomized) policy selecting arms , define the expected cumulative regret
where the expectation is over rewards and the internal randomness of .
Assume heavy-tailed rewards in the sense that there exist unknown parameters and such that, for every arm and every ,
The learner is not given or , and no further assumptions on the are imposed.
Unsolved Problem
Open question 1 (minimax, parameter-free): Determine tight (up to constants and polylog factors, if appropriate) upper and lower bounds on
as a function of . Equivalently, characterize the best attainable regret when algorithms must be agnostic to the unknown moment order and moment bound .
Open question 2 (assumptions enabling adaptation): Identify and compare additional distributional assumptions under which a single parameter-free algorithm can (nearly) match the best-known regret rates achievable when are known, and clarify which such assumptions are genuinely weaker/stronger (or incomparable).
§ Discussion
§ Significance & Implications
Known-parameter heavy-tailed bandit algorithms tune robust mean estimators and confidence radii using and/or , so removing access to these quantities directly challenges how exploration bonuses are calibrated. This problem asks for the sharp information-theoretic price (in minimax regret) of not knowing the tail parameters under only a finite--moment condition, and for a principled delineation of what extra structural assumptions are sufficient to restore near-known-parameter regret guarantees.
§ Known Partial Results
Genalti et al. (2025): In the heavy-tailed bandit model with a uniform finite moment bound, robust mean-estimation techniques (e.g., truncation or median-of-means ideas) yield regret guarantees whose rates depend explicitly on and the moment bound .
Genalti et al. (2025): Existing approaches in this setting typically require knowing and/or to set estimator truncation/robustness levels and corresponding confidence widths.
Genalti et al. (2025): It is known that, without additional assumptions, one cannot generally adapt to either or without (i) paying extra regret relative to the known-parameter guarantees or (ii) imposing further distributional conditions; determining the best achievable regret with no extra assumptions remains open.
Genalti et al. (2025): The literature proposes multiple non-equivalent distributional assumptions that can enable parameter-free guarantees, and these assumptions are not ordered by implication in general (i.e., none is strictly weaker than all others).
§ References
Open Problem: Regret Minimization in Heavy-Tailed Bandits with Unknown Distributional Parameters
Gianmarco Genalti, Alberto Maria Metelli (2025)
Conference on Learning Theory (COLT), PMLR 291
📍 Open-problem note in COLT proceedings.
Gianmarco Genalti, Alberto Maria Metelli (2025)
Conference on Learning Theory (COLT), PMLR 291
📍 Proceedings PDF.