Unsolved

Optimal Best Arm Identification with Fixed-Budget

Posed by Chao Qin (2022)

§ Problem Statement

Setup

Let ν=(P1,,PK)\nu=(P_1,\dots,P_K) be a stochastic KK-armed bandit instance. Pulling arm i{1,,K}i\in\{1,\dots,K\} yields an independent sample from an unknown distribution PiP_i on R\mathbb{R} with finite mean μi\mu_i. Assume a unique best arm i=argmaxiμii^*=\arg\max_{i}\mu_i.

In fixed-budget best-arm identification, an algorithm π\pi (possibly adaptive) allocates exactly nn pulls and then outputs a recommendation i^n{1,,K}\hat{i}_n\in\{1,\dots,K\}. The misidentification probability is

peπ(n;ν)=Prν,π(i^ni).p_e^{\pi}(n;\nu)=\Pr_{\nu,\pi}(\hat{i}_n\neq i^*).

Define the (instance-dependent) asymptotic error exponent achieved by π\pi as

Eπ(ν):=lim infn1nlogpeπ(n;ν),E_{\pi}(\nu):=\liminf_{n\to\infty}-\frac{1}{n}\log p_e^{\pi}(n;\nu),

and the optimal exponent as

E(ν):=supπEπ(ν).E^*(\nu):=\sup_{\pi} E_{\pi}(\nu).

Unsolved Problem

Characterize E(ν)E^*(\nu) in the fixed-budget setting (i.e., give an explicit instance-dependent formula together with matching achievability and converse bounds) for general bandit instances with a unique best arm, and design an algorithm π\pi that is asymptotically optimal in the sense Eπ(ν)=E(ν)E_{\pi}(\nu)=E^*(\nu) (or matches E(ν)E^*(\nu) under the strongest standard notion of asymptotic equivalence).

§ Discussion

Loading discussion…

§ Significance & Implications

A sharp characterization of E(ν)E^*(\nu) would determine the exact large-deviations rate at which the best-arm misidentification probability can decay as the budget nn grows, for each instance ν\nu. This would (i) turn fixed-budget best-arm identification into an instance-by-instance optimization target (what sampling proportions and decision rules are information-theoretically optimal), and (ii) yield definitive instance-dependent lower bounds for any algorithm. It would place the fixed-budget regime on comparable asymptotic footing to the fixed-confidence regime, where the optimal instance-dependent complexity is already characterized.

§ Known Partial Results

  • Qin (2022): In the fixed-confidence setting with a unique best arm, the optimal instance-dependent asymptotic complexity is characterized via information-theoretic (Chernoff-type) arguments and is explicitly formulated in modern treatments such as Garivier and Kaufmann (2016).

  • Qin (2022): In contrast, the COLT 2022 open-problem note (Qin, 2022) emphasizes that an explicit, general, instance-dependent characterization of the fixed-budget optimal error exponent E(ν)E^*(\nu) is not yet established, and it discusses related conjectures and questions about what the correct exponent and asymptotically optimal algorithms should be in this dual setting.

§ References

[1]

Open Problem: Optimal Best Arm Identification with Fixed-Budget

Chao Qin (2022)

Conference on Learning Theory (COLT), PMLR 178

📍 Open-problem note in COLT proceedings.

[2]

Open Problem: Optimal Best Arm Identification with Fixed-Budget (PDF)

Chao Qin (2022)

Conference on Learning Theory (COLT), PMLR 178

📍 Proceedings PDF.

§ Tags