What is the Complexity of Joint Differential Privacy in Linear Contextual Bandits?
Posed by Achraf Azize et al. (2024)
§ Problem Statement
Setup
Fix integers $K, d, T \ge 1$ and privacy parameters $\varepsilon > 0$, $\delta \in [0, 1)$. Consider a stochastic $K$-armed linear contextual bandit over $T$ rounds with unknown parameter $\theta^\star \in \mathbb{R}^d$ satisfying $\|\theta^\star\|_2 \le 1$. In each round $t \in [T]$, the learner observes context vectors $x_{t,1}, \dots, x_{t,K} \in \mathbb{R}^d$ with $\|x_{t,a}\|_2 \le 1$, selects an action $a_t \in [K]$, and then receives reward
$$r_t = \langle x_{t,a_t}, \theta^\star \rangle + \eta_t,$$
where $\eta_t$ is mean-zero noise (e.g., conditionally sub-Gaussian) independent across rounds. The (pseudo-)regret is
$$R_T = \mathbb{E}\Bigg[\sum_{t=1}^{T} \Big(\max_{a \in [K]} \langle x_{t,a}, \theta^\star \rangle - \langle x_{t,a_t}, \theta^\star \rangle\Big)\Bigg].$$
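To make the setup concrete, here is a minimal simulation of the model above. The dimension, horizon, noise level, and the greedy ridge-regression policy are illustrative choices, not part of the problem statement; the snippet only shows how contexts, rewards, and pseudo-regret interact.

```python
import numpy as np

def simulate(T=2000, d=5, K=4, noise=0.1, seed=0):
    """Simulate a K-armed linear contextual bandit and return the
    cumulative pseudo-regret of a greedy ridge-regression policy."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=d)
    theta /= np.linalg.norm(theta)                 # enforce ||theta*||_2 <= 1
    V = np.eye(d)                                  # ridge-regularized Gram matrix
    b = np.zeros(d)                                # running sum of r_t * x_{t,a_t}
    regret = 0.0
    for t in range(T):
        X = rng.normal(size=(K, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # enforce ||x_{t,a}||_2 <= 1
        theta_hat = np.linalg.solve(V, b)          # ridge estimate of theta*
        a = int(np.argmax(X @ theta_hat))          # greedy action w.r.t. estimate
        r = X[a] @ theta + noise * rng.normal()    # reward = <x, theta*> + noise
        V += np.outer(X[a], X[a])
        b += r * X[a]
        means = X @ theta
        regret += means.max() - means[a]           # pseudo-regret increment
    return regret

print(simulate())
```

A sublinear-regret algorithm would replace the greedy step with an optimism bonus (LinUCB-style); the greedy variant is kept only to keep the sketch short.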
Model each round as a "user" with data tuple $u_t = (x_{t,1}, \dots, x_{t,K}, r_t)$, and let the full dataset be $U = (u_1, \dots, u_T)$. A randomized bandit algorithm induces a distribution over the entire action sequence $(a_1, \dots, a_T)$. The algorithm is $(\varepsilon, \delta)$-jointly differentially private (JDP) if for every index $t \in [T]$, for all neighboring datasets $U, U'$ that differ only in $u_t$, and for every measurable set $S$,
$$\mathbb{P}\big[a_{-t}(U) \in S\big] \le e^{\varepsilon}\,\mathbb{P}\big[a_{-t}(U') \in S\big] + \delta,$$
where $a_{-t} = (a_1, \dots, a_{t-1}, a_{t+1}, \dots, a_T)$ is the action sequence with round $t$ removed, and the probability is over the algorithm's internal randomness (and any randomness in the interaction/noise model).
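One natural route to joint privacy is to let the actions recommended to other users depend on user $t$'s data only through privatized sufficient statistics. The sketch below applies a single Gaussian-mechanism release to the Gram matrix and reward vector; this is a simplification for illustration (practical JDP bandit algorithms compose such releases over all round prefixes, which this one-shot calibration ignores), and the function name and sensitivity bound are assumptions, not part of the note.

```python
import numpy as np

def noisy_statistics(V, b, eps, delta, bound=1.0, rng=None):
    """Release Gaussian-perturbed sufficient statistics (V, b).

    Sketch only: one Gaussian-mechanism release calibrated to (eps, delta)
    for statistics whose per-user contribution is bounded. A single user
    changes V by a rank-one term of Frobenius norm <= bound**2 and b by a
    vector of Euclidean norm <= bound (contexts and rewards bounded by 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Standard Gaussian-mechanism scale: sensitivity * sqrt(2 ln(1.25/delta)) / eps.
    sigma = bound**2 * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    N = rng.normal(scale=sigma, size=V.shape)
    V_noisy = V + (N + N.T) / 2.0                  # symmetrize the matrix noise
    b_noisy = b + rng.normal(scale=sigma, size=b.shape)
    return V_noisy, b_noisy
```

A learner that computes its ridge estimate from `(V_noisy, b_noisy)` exposes user $t$'s data to all other rounds only through this privatized release, which is the structural idea behind JDP in this setting.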
Unsolved Problem
Determine, up to at most polylogarithmic factors, the minimax expected regret under JDP,
$$R^\star_{\mathrm{JDP}}(T, d, \varepsilon, \delta) = \inf_{\mathcal{A}\ (\varepsilon,\delta)\text{-JDP}} \; \sup_{\text{instances}} \; R_T(\mathcal{A}),$$
including the correct dependence on $T$, $d$, $\varepsilon$, and $\delta$. In particular, close the gap between the best reported upper bound
$$R_T = \widetilde{O}\!\left(d\sqrt{T} + \frac{d^{3/4}\sqrt{T}}{\sqrt{\varepsilon}}\right)$$
and the reported lower bound
$$R_T = \Omega\!\left(\sqrt{dT} + \frac{d}{\varepsilon}\right)$$
by improving algorithms, strengthening lower bounds, or both.
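Taking the reported bounds at face value and ignoring constants and polylogarithmic factors, the privacy-dependent terms of the two bounds can be compared numerically; the specific exponents below encode the commonly cited forms (a $\sqrt{T}/\sqrt{\varepsilon}$-type upper-bound term versus a $1/\varepsilon$-type lower-bound term with no $\sqrt{T}$ factor), which is the reading assumed here.

```python
import math

def upper_privacy_term(T, d, eps):
    # Privacy-dependent term of the reported upper bound: d^{3/4} * sqrt(T) / sqrt(eps).
    return d ** 0.75 * math.sqrt(T) / math.sqrt(eps)

def lower_privacy_term(T, d, eps):
    # Privacy-dependent term of the reported lower bound: d / eps (no sqrt(T) factor).
    return d / eps

# The ratio between the two terms grows like sqrt(T), exhibiting the open gap.
for T in (10**4, 10**6, 10**8):
    ratio = upper_privacy_term(T, 10, 1.0) / lower_privacy_term(T, 10, 1.0)
    print(T, round(ratio, 1))
```

The printed ratios grow by a factor of $10$ each time $T$ grows by $10^2$, i.e., multiplicatively in $\sqrt{T}$: this is exactly the gap the open problem asks to close.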
§ Discussion
§ Significance & Implications
Linear contextual bandits are a canonical model for sequential recommendation with user-specific (and potentially sensitive) feedback. Joint differential privacy is tailored to this setting: it requires that changing one user's data has limited effect on the recommendations made to all other users, while allowing the recommendation shown to the changed user to vary freely. A tight characterization of $R^\star_{\mathrm{JDP}}(T, d, \varepsilon, \delta)$ would quantify the unavoidable privacy cost beyond the nonprivate benchmark (typically scaling like $\sqrt{dT}$ up to logs), and would determine whether the privacy-dependent regret term must grow on the order of $\sqrt{T}/\sqrt{\varepsilon}$ (as in current upper bounds) or can be substantially smaller in its $T$-dependence (not ruled out by the stated lower bound).
§ Known Partial Results
Azize et al. (2024): The COLT 2024 open-problem note reports an $(\varepsilon, \delta)$-JDP regret upper bound of $\widetilde{O}\big(d\sqrt{T} + d^{3/4}\sqrt{T}/\sqrt{\varepsilon}\big)$ for linear contextual bandits.
Azize et al. (2024): The same note reports a lower bound of $\Omega\big(\sqrt{dT} + d/\varepsilon\big)$.
Azize et al. (2024): Comparing these bounds leaves a gap in the privacy-dependent contribution: the upper bound includes a term scaling as $\sqrt{T}/\sqrt{\varepsilon}$ (up to logs in $T$ and $1/\delta$), whereas the stated privacy-dependent lower-bound term has no $\sqrt{T}$ factor.
Azize et al. (2024): The note highlights two directions to resolve the gap: (i) design JDP algorithms with improved dependence on , and/or (ii) develop sharper lower-bound techniques specific to the JDP constraint in linear contextual bandits.
§ References
Open Problem: What is the Complexity of Joint Differential Privacy in Linear Contextual Bandits?
Achraf Azize, Debabrota Basu (2024)
Conference on Learning Theory (COLT), PMLR 247
📍 Open-problem note in COLT proceedings.