Unsolved

Regret Minimization in Heavy-Tailed Bandits with Unknown Distributional Parameters

Posed by Gianmarco Genalti et al. (2025)

§ Problem Statement

Setup

Stochastic heavy-tailed bandits with unknown tail parameters. Fix an integer $K\ge 2$ and a horizon $T\ge 1$. For each arm $i\in\{1,\dots,K\}$, pulling arm $i$ at time $t$ yields an independent reward $X_{i,t}\sim P_i$ with mean $\mu_i:=\mathbb{E}[X_{i,t}]$ (well-defined under the moment condition below). Let $i^*\in\arg\max_i \mu_i$. For a (possibly randomized) policy $\pi$ selecting arms $I_t$, define the expected cumulative regret

$$R_T(\pi;P_1,\dots,P_K):=\mathbb{E}\Big[\sum_{t=1}^T (\mu_{i^*}-\mu_{I_t})\Big],$$

where the expectation is over the rewards and the internal randomness of $\pi$.

Assume heavy-tailed rewards in the sense that there exist unknown parameters $\epsilon\in(0,1]$ and $u\in(0,\infty)$ such that, for every arm $i$ and every $t$,

$$\mathbb{E}\big[|X_{i,t}|^{1+\epsilon}\big]\le u.$$

The learner is not given $\epsilon$ or $u$, and no further assumptions on the $P_i$ are imposed.
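To make the regret definition concrete, here is a minimal simulation sketch; the two-armed instance, the Pareto tail index, and the mean shifts are all hypothetical choices, picked only so that the moment condition above holds:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed instance (illustrative only): classical Pareto
# rewards, shifted so arm 0 is best.  A Pareto(alpha) variable satisfies
# E[|X|^{1+eps}] < inf iff 1 + eps < alpha, so alpha = 1.5 meets the moment
# condition for any eps < 0.5 (with a suitable finite bound u).
K, T, alpha = 2, 10_000, 1.5
shifts = np.array([0.3, 0.0])
means = alpha / (alpha - 1.0) + shifts   # E[Pareto(alpha)] = alpha/(alpha-1)

def pull(arm: int) -> float:
    """One heavy-tailed reward draw from the chosen arm."""
    return rng.pareto(alpha) + 1.0 + shifts[arm]   # support [1 + shift, inf)

# Pseudo-regret of a naive uniform-random policy: each played arm contributes
# its mean gap, and the expectation of this sum is exactly the regret R_T
# defined above.
gaps = means.max() - means
pseudo_regret = 0.0
for _ in range(T):
    arm = int(rng.integers(K))   # uniform-random choice: no learning
    _ = pull(arm)                # reward observed but ignored by this policy
    pseudo_regret += gaps[arm]

print(round(pseudo_regret, 1))   # concentrates near T * 0.15 = 1500
```

Since the uniform policy never exploits, its pseudo-regret grows linearly in $T$; any reasonable bandit algorithm must do better, and the open questions ask how much better is possible without knowing $(\epsilon,u)$.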

Unsolved Problem

Open question 1 (minimax, parameter-free): Determine tight (up to constants and polylog factors, if appropriate) upper and lower bounds on

$$\inf_{\pi\ \text{independent of }(\epsilon,u)}\ \sup_{\epsilon\in(0,1],\ u>0}\ \sup_{(P_1,\dots,P_K):\ \forall i\ \mathbb{E}|X_i|^{1+\epsilon}\le u}\ R_T(\pi;P_1,\dots,P_K),$$

as a function of $(K,T)$. Equivalently, characterize the best attainable regret when algorithms must be agnostic to the unknown moment order $\epsilon$ and moment bound $u$.

Open question 2 (assumptions enabling adaptation): Identify and compare additional distributional assumptions under which a single parameter-free algorithm can (nearly) match the best-known regret rates achievable when $(\epsilon,u)$ are known, and clarify which such assumptions are genuinely weaker or stronger (or incomparable).


§ Significance & Implications

Known-parameter heavy-tailed bandit algorithms tune robust mean estimators and confidence radii using $\epsilon$ and/or $u$, so removing access to these quantities directly challenges how exploration bonuses are calibrated. This problem asks for the sharp information-theoretic price (in minimax regret) of not knowing the tail parameters under only a finite $(1+\epsilon)$-moment condition, and for a principled delineation of which extra structural assumptions suffice to restore near-known-parameter regret guarantees.
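For context (a benchmark from the broader heavy-tailed bandit literature, not part of the problem statement above): with $(\epsilon,u)$ known, truncation-based optimistic algorithms in the style of Bubeck, Cesa-Bianchi, and Lugosi (2013) are known to achieve, up to polylogarithmic factors,

$$R_T = \widetilde{O}\!\left(u^{\frac{1}{1+\epsilon}}\, K^{\frac{\epsilon}{1+\epsilon}}\, T^{\frac{1}{1+\epsilon}}\right),$$

which matches the minimax lower bound up to logarithmic terms. The open questions ask how much of this rate survives when the policy cannot be tuned with $(\epsilon,u)$.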

§ Known Partial Results

  • Genalti et al. (2025): In the heavy-tailed bandit model with a uniform finite $(1+\epsilon)$-moment bound, robust mean-estimation techniques (e.g., truncation or median-of-means ideas) yield regret guarantees whose rates depend explicitly on $\epsilon$ and the moment bound $u$.

  • Genalti et al. (2025): Existing approaches in this setting typically require knowing $\epsilon$ and/or $u$ to set estimator truncation/robustness levels and the corresponding confidence widths.

  • Genalti et al. (2025): It is known that, without additional assumptions, one cannot generally adapt to either $\epsilon$ or $u$ without (i) paying extra regret relative to the known-parameter guarantees or (ii) imposing further distributional conditions; determining the best achievable regret with no extra assumptions remains open.

  • Genalti et al. (2025): The literature proposes multiple non-equivalent distributional assumptions that can enable parameter-free guarantees, and these assumptions are not ordered by implication in general (i.e., none is strictly weaker than all others).
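The parameter-dependence flagged in the bullets above can be made concrete with a minimal sketch of a truncated empirical mean in the style of Robust-UCB analyses; the specific tail index, $(\epsilon,u)$ values, and sample size are hypothetical choices for illustration:

```python
import numpy as np

def truncated_mean(x: np.ndarray, eps: float, u: float, delta: float) -> float:
    """Truncated empirical mean: sample i is kept only if |x_i| falls below
    a threshold that grows with i and is calibrated using the tail order eps
    and the moment bound u -- exactly the two quantities this open problem
    withholds from the learner."""
    n = len(x)
    idx = np.arange(1, n + 1, dtype=float)
    thresholds = (u * idx / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    return float(np.mean(np.where(np.abs(x) <= thresholds, x, 0.0)))

# Illustrative run on classical Pareto(1.5) rewards (true mean = 3).  For this
# distribution E[X^{1.4}] = 1.5 / (1.5 - 1.4) = 15, so (eps, u) = (0.4, 15)
# is a valid, correctly specified parameter pair.
rng = np.random.default_rng(1)
sample = rng.pareto(1.5, size=5000) + 1.0
est = truncated_mean(sample, eps=0.4, u=15.0, delta=0.01)
print(est)
```

Mis-specifying `eps` or `u` miscalibrates `thresholds` (and the matching confidence width), which is the core obstruction to parameter-free play that the open questions target.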

§ References

[1]

Open Problem: Regret Minimization in Heavy-Tailed Bandits with Unknown Distributional Parameters

Gianmarco Genalti, Alberto Maria Metelli (2025)

Conference on Learning Theory (COLT), PMLR 291

📍 Open-problem note in COLT proceedings.

