Optimal Rates for Stochastic Decision-Theoretic Online Learning Under Differential Privacy
Posed by Bingshan Hu and Nishant A. Mehta (2024)
§ Problem Statement
Setup
Stochastic decision-theoretic online learning (full information) with privacy. Fix an integer number of actions $K \ge 2$ and a horizon $T$. On each round $t = 1, \dots, T$, the learner outputs an action $A_t \in [K]$ (possibly randomized). Then a loss vector $\ell_t = (\ell_{t,1}, \dots, \ell_{t,K}) \in [0,1]^K$ is realized and fully revealed to the learner. For each action $i \in [K]$, the losses $\ell_{1,i}, \ell_{2,i}, \dots$ are i.i.d. draws from an unknown distribution on $[0,1]$ with mean $\mu_i$, and the draws are independent across actions and rounds. Let $i^\star \in \arg\min_i \mu_i$, assume the optimal action is unique with $\mu_{i^\star} < \mu_i$ for all $i \neq i^\star$, and define the gaps $\Delta_i = \mu_i - \mu_{i^\star}$ and $\Delta_{\min} = \min_{i \neq i^\star} \Delta_i$. The (expected) pseudo-regret is
$$R_T \;=\; \mathbb{E}\Big[\sum_{t=1}^{T} \ell_{t,A_t}\Big] - T\,\mu_{i^\star} \;=\; \sum_{i \neq i^\star} \Delta_i\, \mathbb{E}\Big[\sum_{t=1}^{T} \mathbf{1}\{A_t = i\}\Big],$$
where the expectation is over both the losses and the learner's randomness. An online algorithm is $\varepsilon$-differentially private in the prefix (online) sense if for every $t \in [T]$, for every pair of loss prefixes $\ell_{1:t}$ and $\ell'_{1:t}$ that differ in at most one round, and for every measurable set $S$ of action prefixes,
$$\Pr[A_{1:t} \in S \mid \ell_{1:t}] \;\le\; e^{\varepsilon}\, \Pr[A_{1:t} \in S \mid \ell'_{1:t}].$$
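To make the model concrete, here is a minimal simulation sketch, assuming Bernoulli losses for illustration; the names (run_instance, follow_the_leader) are ours, not from the paper, and follow-the-leader serves only as a simple non-private baseline:

    import numpy as np

    def follow_the_leader(cum_losses, rng):
        """Non-private baseline: play an action with the smallest cumulative loss."""
        return int(np.argmin(cum_losses))

    def run_instance(mu, T, policy, rng):
        """Simulate T rounds of full-information play; return the pseudo-regret."""
        cum_losses = np.zeros(len(mu))          # observed cumulative losses
        pseudo_regret = 0.0
        for _ in range(T):
            a = policy(cum_losses, rng)
            pseudo_regret += mu[a] - mu.min()   # Delta_{A_t}, summed over rounds
            losses = rng.binomial(1, mu)        # full loss vector is revealed
            cum_losses = cum_losses + losses
        return pseudo_regret

    rng = np.random.default_rng(0)
    mu = np.array([0.4, 0.5, 0.5, 0.6])         # i* = action 0, Delta_min = 0.1
    print(run_instance(mu, T=10_000, policy=follow_the_leader, rng=rng))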
In the non-private case ($\varepsilon = \infty$), the optimal gap-dependent rate is $\Theta\!\big(\tfrac{\log K}{\Delta_{\min}}\big)$ (up to constants) in this stochastic full-information model.
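For concreteness, instantiating this rate (constants suppressed; the numbers are purely illustrative) shows that the non-private baseline does not grow with the horizon:
$$K = 100,\ \Delta_{\min} = 0.05: \qquad \frac{\log K}{\Delta_{\min}} \;=\; \frac{\log 100}{0.05} \;\approx\; \frac{4.61}{0.05} \;\approx\; 92, \quad \text{whether } T = 10^{4} \text{ or } T = 10^{8}.$$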
Unsolved Problem
Characterize, as a function of $(K, T, \varepsilon, \Delta_{\min})$ (and ideally of the full gap profile $(\Delta_i)_{i \neq i^\star}$), the optimal achievable gap-dependent pseudo-regret among all $\varepsilon$-DP algorithms: give matching (up to universal constants and at most logarithmic factors) upper and lower bounds on
$$\inf_{\mathcal{A}\ \varepsilon\text{-DP}}\ \sup_{\nu\,:\,\Delta_{\min}(\nu)\,\ge\,\Delta}\ R_T(\mathcal{A}, \nu),$$
where the supremum ranges over loss-distribution instances $\nu$ with the given gap structure.
§ Discussion
§ Significance & Implications
This problem asks for the sharp privacy-utility tradeoff in one of the cleanest stochastic online learning settings where full loss vectors are observed. Without privacy, the gap-dependent rate essentially does not grow with $T$ once $T$ is large enough (it scales only logarithmically in $K$), so any unavoidable privacy penalty can be isolated and quantified. A precise characterization would (i) identify the correct dependence on $\varepsilon$ and how it interacts with $\Delta_{\min}$ (e.g., whether there are distinct regimes $\varepsilon \gtrsim \Delta_{\min}$ vs. $\varepsilon \lesssim \Delta_{\min}$), and (ii) determine whether extra factors sometimes seen in private analyses (such as $\log T$ terms from repeated private selection/leader updates) are information-theoretically necessary or merely artifacts of existing algorithmic templates. The answer would directly inform the design of instance-optimal private learners in full-information environments and provide a baseline for more complex private online decision problems.
§ Known Partial Results
Hu et al. (2024): Non-private baseline: in the stochastic full-information model described above, the optimal gap-dependent pseudo-regret rate is $\Theta\!\big(\tfrac{\log K}{\Delta_{\min}}\big)$, and this dependence on $K$ and $\Delta_{\min}$ is known to be unimprovable up to constants.
Hu et al. (2024): Privacy notion: the open problem uses an online (prefix) $\varepsilon$-DP guarantee that must hold for the distribution of the entire action prefix $A_{1:t}$ at every time $t$, under a single-round change in the observed loss sequence.
Hu et al. (2024): Existing private bounds (as summarized in the COLT 2024 open-problem entry): known $\varepsilon$-DP algorithms achieve worst-case (gap-independent) pseudo-regret bounds that add an explicit privacy-dependent term scaling on the order of $1/\varepsilon$ times polylogarithmic factors in $K$ and $T$ (up to constants and other problem-dependent factors).
Hu et al. (2024): Existing private gap-dependent analyses (as summarized in the COLT 2024 open-problem entry): refined gap-dependent upper bounds are known for private leader/selection-style methods (a toy instantiation of this template appears after this list), but in the worst case these can still include privacy penalties comparable to $\log T/\varepsilon$-type terms even when $\Delta_{\min}$ is a constant.
Hu et al. (2024): Existing lower bound (as summarized in the COLT 2024 open-problem entry): there is a gap-dependent lower bound of the form $\Omega\!\big(\tfrac{\log K}{\Delta_{\min}} + \tfrac{\log K}{\varepsilon}\big)$, suggesting that privacy can force regret to scale at least like $\tfrac{\log K}{\varepsilon}$ when $\varepsilon \lesssim \Delta_{\min}$.
Hu et al. (2024): Main gap: it remains open to determine the minimax-optimal (up to logarithmic factors) dependence of the pseudo-regret on $(K, T, \Delta_{\min}, \varepsilon)$ under prefix $\varepsilon$-DP, in particular whether the extra $\log T$ dependence in the privacy term is inherent in this stochastic full-information setting.
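The sketch below illustrates the private leader/selection template referenced above: cumulative losses are released through a binary-tree counter mechanism with Laplace noise, and the learner plays the noisy leader. It is a minimal illustration under our own naming (BinaryMechanism, private_noisy_leader), not the algorithm of Hu and Mehta (2024); the noise scale follows the standard binary-mechanism heuristic, and a rigorous end-to-end $\varepsilon$-DP accounting across all $K$ coordinates is precisely where analyses pick up the extra logarithmic factors at issue.

    import numpy as np

    class BinaryMechanism:
        """Binary-tree counter: releases noisy running sums of a vector stream.

        Each round's data enters at most L dyadic block sums, so each block
        receives per-coordinate Laplace noise of scale L/eps (a standard
        heuristic calibration; not a verified end-to-end accounting here)."""

        def __init__(self, T, dim, eps, rng):
            self.L = max(1, int(np.ceil(np.log2(T + 1))))  # number of bit levels
            self.scale = self.L / eps
            self.alpha = np.zeros((self.L, dim))   # exact dyadic block sums
            self.noisy = np.zeros((self.L, dim))   # noisy dyadic block sums
            self.rng, self.t = rng, 0

        def update_and_query(self, x):
            """Insert round data x (entries in [0,1]); return a noisy prefix sum."""
            self.t += 1
            i = (self.t & -self.t).bit_length() - 1         # trailing zeros of t
            self.alpha[i] = self.alpha[:i].sum(axis=0) + x  # close dyadic block
            self.alpha[:i] = 0.0
            self.noisy[:i] = 0.0
            self.noisy[i] = self.alpha[i] + self.rng.laplace(0.0, self.scale, x.shape)
            bits = [j for j in range(self.L) if (self.t >> j) & 1]
            return self.noisy[bits].sum(axis=0)

    def private_noisy_leader(mu, T, eps, rng):
        """Play the leader of tree-aggregated noisy cumulative losses."""
        counter = BinaryMechanism(T, len(mu), eps, rng)
        noisy_cum = np.zeros(len(mu))
        pseudo_regret = 0.0
        for _ in range(T):
            a = int(np.argmin(noisy_cum))               # noisy leader
            pseudo_regret += mu[a] - mu.min()
            losses = rng.binomial(1, mu).astype(float)  # full loss vector revealed
            noisy_cum = counter.update_and_query(losses)
        return pseudo_regret

    rng = np.random.default_rng(1)
    mu = np.array([0.4, 0.5, 0.5, 0.6])                 # Delta_min = 0.1
    print(private_noisy_leader(mu, T=10_000, eps=1.0, rng=rng))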
§ References
Bingshan Hu, Nishant A. Mehta (2024). Open Problem: Optimal Rates for Stochastic Decision-Theoretic Online Learning Under Differential Privacy. Conference on Learning Theory (COLT), PMLR 247. Open-problem note in COLT proceedings.