Optimal Best Arm Identification with Fixed-Budget

§ Problem Statement

Setup

Let $\nu=(P_1,\dots,P_K)$ be a stochastic $K$ -armed bandit instance. Pulling arm $i\in\{1,\dots,K\}$ yields an independent sample from an unknown distribution $P_i$ on $\mathbb{R}$ with finite mean $\mu_i$ . Assume a unique best arm $i^*=\arg\max_{i}\mu_i$ .

In fixed-budget best-arm identification, an algorithm $\pi$ (possibly adaptive) allocates exactly $n$ pulls and then outputs a recommendation $\hat{i}_n\in\{1,\dots,K\}$ . The misidentification probability is

p_e^{\pi}(n;\nu)=\Pr_{\nu,\pi}(\hat{i}_n\neq i^*).

Define the (instance-dependent) asymptotic error exponent achieved by $\pi$ as

E_{\pi}(\nu):=\liminf_{n\to\infty}-\frac{1}{n}\log p_e^{\pi}(n;\nu),

and the optimal exponent as

E^*(\nu):=\sup_{\pi} E_{\pi}(\nu).

Unsolved Problem

Characterize $E^*(\nu)$ in the fixed-budget setting (i.e., give an explicit instance-dependent formula together with matching achievability and converse bounds) for general bandit instances with a unique best arm, and design an algorithm $\pi$ that is asymptotically optimal in the sense $E_{\pi}(\nu)=E^*(\nu)$ (or matches $E^*(\nu)$ under the strongest standard notion of asymptotic equivalence).

§ Discussion

Loading discussion…

§ Significance & Implications

A sharp characterization of $E^*(\nu)$ would determine the exact large-deviations rate at which the best-arm misidentification probability can decay as the budget $n$ grows, for each instance $\nu$ . This would (i) turn fixed-budget best-arm identification into an instance-by-instance optimization target (what sampling proportions and decision rules are information-theoretically optimal), and (ii) yield definitive instance-dependent lower bounds for any algorithm. It would place the fixed-budget regime on comparable asymptotic footing to the fixed-confidence regime, where the optimal instance-dependent complexity is already characterized.

§ Known Partial Results

Qin (2022): ).
Qin (2022): In the fixed-confidence setting with a unique best arm, the optimal instance-dependent asymptotic complexity is characterized via information-theoretic (Chernoff-type) arguments and is explicitly formulated in modern treatments such as Garivier and Kaufmann (2016).
Qin (2022): In contrast, the COLT 2022 open-problem note (Qin, 2022) emphasizes that an explicit, general, instance-dependent characterization of the fixed-budget optimal error exponent $E^*(\nu)$ is not yet established, and it discusses related conjectures and questions about what the correct exponent and asymptotically optimal algorithms should be in this dual setting.

§ References

[1]

Open Problem: Optimal Best Arm Identification with Fixed-Budget

Chao Qin (2022)

Conference on Learning Theory (COLT), PMLR 178

📍 Open-problem note in COLT proceedings.

Link ↗

[2]

Open Problem: Optimal Best Arm Identification with Fixed-Budget (PDF)

Chao Qin (2022)

Conference on Learning Theory (COLT), PMLR 178

📍 Proceedings PDF.

Link ↗

§ Tags

colt-open-problem learning-theory