Unsolved

log(n) factor in "Local Glivenko-Cantelli"

Posed by Doron Cohen et al. (2023)

§ Problem Statement

Setup

Let p=(pj)jNp=(p_j)_{j\in\mathbb{N}} be a nonincreasing sequence with 0pj1/20\le p_j\le 1/2 for all jj and pj0p_j\to 0 as jj\to\infty. For each nNn\in\mathbb{N}, let (Yj)jN(Y_j)_{j\in\mathbb{N}} be independent random variables with YjBinomial(n,pj)Y_j\sim\mathrm{Binomial}(n,p_j). Define the centered empirical errors Yˉj:=Yj/npj\bar Y_j:=Y_j/n-p_j and

Δn(p):=EsupjNYˉj.\Delta_n(p):=\mathbb{E}\,\sup_{j\in\mathbb{N}}\,|\bar Y_j|.

Using natural logarithms, define the complexity parameters

S(p):=supjNpjlog(j+1),T(p):=supjNlog(j+1)log(1/pj).S(p):=\sup_{j\in\mathbb{N}} p_j\,\log(j+1),\qquad T(p):=\sup_{j\in\mathbb{N}}\frac{\log(j+1)}{\log(1/p_j)}.

Cohen and Kontorovich (COLT 2023) proved that for all ne3n\ge e^3,

Δn(p)c(S(p)n+T(p)lognn)\Delta_n(p)\le c\left(\sqrt{\frac{S(p)}{n}}+\frac{T(p)\log n}{n}\right)

for an absolute constant c>0c>0.

Unsolved Problem

Is the factor logn\log n necessary? Does there exist an absolute constant C>0C>0 such that for all such sequences pp and all ne3n\ge e^3,

Δn(p)C(S(p)n+T(p)n)?\Delta_n(p)\le C\left(\sqrt{\frac{S(p)}{n}}+\frac{T(p)}{n}\right)?

§ Discussion

Loading discussion…

§ Significance & Implications

This problem asks for the sharp (up to absolute constants) finite-sample behavior of the expected sup-norm estimation error when simultaneously estimating countably many Bernoulli means with nonuniform, decaying marginals. The current best general upper bound matches known lower-bound scalings in S(p)S(p) and T(p)T(p) except for a multiplicative logn\log n in the T(p)/nT(p)/n term; removing it would yield a rate that is simultaneously consistent with both asymptotic lower bounds and would sharpen sample-size guarantees in regimes where T(p)T(p) dominates the error.

§ Known Partial Results

  • Cohen et al. (2023): (Characterization of consistency) Cohen and Kontorovich (2023) show that Δn(p)0\Delta_n(p)\to 0 as nn\to\infty if and only if T(p)<T(p)<\infty.

  • Cohen et al. (2023): (Finite-sample upper bound) For all ne3n\ge e^3, they prove Δn(p)c(S(p)/n+T(p)logn/n)\Delta_n(p)\le c\left(\sqrt{S(p)/n}+T(p)\log n/n\right) for an absolute constant c>0c>0.

  • Cohen et al. (2023): (Asymptotic lower bounds) They provide lower bounds of the form lim infnnΔn(p)cS(p)\liminf_{n\to\infty}\sqrt{n}\,\Delta_n(p)\ge c\,\sqrt{S(p)} and lim infnnΔn(p)cT(p)\liminf_{n\to\infty} n\,\Delta_n(p)\ge c\,T(p) (with a universal constant c>0c>0), showing that the dependence on S(p)S(p) and T(p)T(p) is asymptotically necessary up to absolute constants.

  • Cohen et al. (2023): (Statistical interpretation) The binomial formulation corresponds to the coordinate-wise empirical mean error for product Bernoulli measures under \ell_\infty: for i.i.d. samples X1,,Xn{0,1}dX_1,\dots,X_n\in\{0,1\}^d with coordinate means p(1),,p(d)p(1),\dots,p(d), one has Ep^p=Emaxj[d]p^(j)p(j)\mathbb{E}\,\|\hat p-p\|_\infty=\mathbb{E}\max_{j\in[d]}|\hat p(j)-p(j)|, and after sorting the coordinates by decreasing p(j)p(j) this reduces to the stated setting.

§ References

[1]

Open problem: log(n) factor in "Local Glivenko-Cantelli"

Doron Cohen, Aryeh Kontorovich (2023)

Conference on Learning Theory (COLT), PMLR 195

📍 Open-problem note in COLT proceedings.

[2]

Open problem: log(n) factor in "Local Glivenko-Cantelli" (PDF)

Doron Cohen, Aryeh Kontorovich (2023)

Conference on Learning Theory (COLT), PMLR 195

📍 Proceedings PDF.

§ Tags