Unsolved

What is the Complexity of Joint Differential Privacy in Linear Contextual Bandits?

Posed by Achraf Azize et al. (2024)

§ Problem Statement

Setup

Fix integers $d, T, K \ge 1$ and privacy parameters $\epsilon > 0$, $\delta \in [0,1)$. Consider a stochastic $K$-armed linear contextual bandit over $T$ rounds with unknown parameter $\theta^* \in \mathbb{R}^d$ satisfying $\|\theta^*\|_2 \le 1$. In each round $t \in \{1, \dots, T\}$, the learner observes context vectors $x_{t,1}, \dots, x_{t,K} \in \mathbb{R}^d$ with $\|x_{t,a}\|_2 \le 1$, selects an action $a_t \in \{1, \dots, K\}$, and then receives reward

$$r_t = \langle x_{t,a_t}, \theta^* \rangle + \eta_t,$$

where $\eta_t$ is mean-zero noise (e.g., conditionally sub-Gaussian), independent across rounds. The (pseudo-)regret is

$$\mathrm{Regret}(T) = \sum_{t=1}^T \Big( \max_{a \in [K]} \langle x_{t,a}, \theta^* \rangle - \langle x_{t,a_t}, \theta^* \rangle \Big).$$
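The setup above is easy to instantiate in a short simulation. The sketch below draws random unit-norm contexts, plays a deliberately naive uniform-random policy, and accumulates the pseudo-regret; note that pseudo-regret is computed from the expected rewards $\langle x_{t,a}, \theta^* \rangle$, not from the noisy realizations $r_t$. The specific context distribution, noise scale, and placeholder policy are illustrative assumptions, not part of the problem statement.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 1000

# Unknown parameter with ||theta*||_2 <= 1.
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

regret = 0.0
for t in range(T):
    # Contexts with ||x_{t,a}||_2 <= 1 (here: random unit vectors).
    X = rng.normal(size=(K, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    means = X @ theta_star          # expected reward of each arm
    a_t = rng.integers(K)           # placeholder policy: uniform random
    r_t = means[a_t] + rng.normal(scale=0.1)  # r_t = <x, theta*> + eta_t
    # Pseudo-regret compares expected rewards, not noisy realizations.
    regret += means.max() - means[a_t]

print(regret)
```

Any learning algorithm would replace the uniform-random choice of `a_t`; the point here is only to make the quantities in the definitions concrete.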

Model each round as a "user" with data tuple $z_t = (x_{t,1:K}, r_t)$, and let the full dataset be $D = (z_1, \dots, z_T)$. A randomized bandit algorithm induces a distribution over the entire action sequence $a_{1:T} = (a_1, \dots, a_T)$. The algorithm is $(\epsilon,\delta)$-jointly differentially private (JDP) if for every index $i \in [T]$, for all neighboring datasets $D, D'$ that differ only in $z_i$, and for every measurable set $S \subseteq [K]^{T-1}$,

$$\Pr[a_{-i} \in S \mid D] \le e^{\epsilon} \Pr[a_{-i} \in S \mid D'] + \delta,$$

where $a_{-i} = (a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_T)$ and the probability is over the algorithm's internal randomness (and any randomness in the interaction/noise model).
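A common route to JDP in the private-bandit literature is to run ridge-regression-style updates on noise-perturbed sufficient statistics (e.g., a Gaussian mechanism, or tree-based aggregation over rounds). The sketch below shows only the perturb-then-solve step with a placeholder noise scale `sigma`; calibrating `sigma` and the regularization to a specific $(\epsilon,\delta)$ guarantee is exactly the kind of analysis the open problem concerns, and is not attempted here.

```python
import numpy as np

d = 5
sigma = 2.0  # placeholder noise scale; a real mechanism calibrates this to (eps, delta)

def private_estimate(G, u, sigma, rng):
    """Solve a ridge-regression system from noise-perturbed statistics.

    G: Gram matrix lambda*I + sum_t x_t x_t^T
    u: reward-weighted context sum, sum_t r_t x_t
    """
    d = G.shape[0]
    # Symmetric Gaussian perturbation of the Gram matrix.
    N = rng.normal(scale=sigma, size=(d, d))
    G_priv = G + (N + N.T) / 2
    u_priv = u + rng.normal(scale=sigma, size=d)
    # Extra regularization keeps the perturbed system well-conditioned.
    reg = 2 * sigma * np.sqrt(d) * np.eye(d)
    return np.linalg.solve(G_priv + reg, u_priv)

rng = np.random.default_rng(1)
theta_hat = private_estimate(np.eye(d), np.zeros(d), sigma, rng)
print(theta_hat.shape)
```

Because the learner's confidence sets are then built from `G_priv` and `u_priv` rather than the raw statistics, the actions recommended to other users depend on any single user's data only through the noisy aggregates, which is the mechanism behind JDP guarantees in this line of work.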

Unsolved Problem

Determine, up to at most polylogarithmic factors, the minimax expected regret under JDP,

$$R^*_{\mathrm{JDP}}(d, T, K, \epsilon, \delta) = \inf_{(\epsilon,\delta)\text{-JDP alg.}} \ \sup_{\theta^*,\, \{x_{t,a}\},\, \text{noise}} \ \mathbb{E}[\mathrm{Regret}(T)],$$

including the correct dependence on $d, T, K, \epsilon, \delta$. In particular, close the gap between the best reported upper bound

$$O\!\left(d\sqrt{T}\log T + \frac{d^{3/4}\sqrt{T\log(1/\delta)}}{\sqrt{\epsilon}}\right)$$

and the reported lower bound

$$\Omega\!\left(\sqrt{dT\log K} + \frac{d}{\epsilon+\delta}\right)$$

by improving algorithms, strengthening lower bounds, or both.
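To see the size of the gap numerically, one can evaluate both bounds at a representative parameter setting, dropping constants. The values of $d, T, K, \epsilon, \delta$ below are arbitrary illustrative choices.

```python
import math

d, T, K, eps, delta = 20, 10**6, 10, 1.0, 1e-6

# Upper bound: d*sqrt(T)*log(T) + d^{3/4} * sqrt(T * log(1/delta)) / sqrt(eps)
upper = (d * math.sqrt(T) * math.log(T)
         + d**0.75 * math.sqrt(T * math.log(1 / delta)) / math.sqrt(eps))

# Lower bound: sqrt(d*T*log(K)) + d / (eps + delta)
lower = math.sqrt(d * T * math.log(K)) + d / (eps + delta)

print(f"upper ~ {upper:.3g}, lower ~ {lower:.3g}")
```

The dominant discrepancy is in the privacy terms: the upper bound's privacy term grows like $\sqrt{T}$, while the lower bound's privacy term is independent of $T$, so the ratio between the two widens as $T$ grows.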


§ Significance & Implications

Linear contextual bandits are a canonical model for sequential recommendation with user-specific (and potentially sensitive) feedback. Joint differential privacy is tailored to this setting: it requires that changing one user's data has limited effect on the recommendations made to all other users, while allowing the recommendation shown to the changed user to vary freely. A tight characterization of $R^*_{\mathrm{JDP}}(d, T, K, \epsilon, \delta)$ would quantify the unavoidable privacy cost beyond the nonprivate benchmark (typically scaling like $d\sqrt{T}$ up to logs), and would determine whether the privacy-dependent regret term must grow on the order of $\sqrt{T}/\sqrt{\epsilon}$ (as in current upper bounds) or can be substantially smaller in its $T$-dependence (not ruled out by the stated lower bound).

§ Known Partial Results

  • Azize et al. (2024): The COLT 2024 open-problem note reports an $(\epsilon,\delta)$-JDP regret upper bound of $O\!\left(d\sqrt{T}\log T + \frac{d^{3/4}\sqrt{T\log(1/\delta)}}{\sqrt{\epsilon}}\right)$ for linear contextual bandits.

  • Azize et al. (2024): The same note reports a lower bound of $\Omega\!\left(\sqrt{dT\log K} + \frac{d}{\epsilon+\delta}\right)$.

  • Azize et al. (2024): Comparing these bounds leaves a gap in the privacy-dependent contribution: the upper bound includes a term scaling as $\frac{d^{3/4}\sqrt{T}}{\sqrt{\epsilon}}$ (up to logs in $T$ and $1/\delta$), whereas the stated privacy-dependent lower-bound term $\frac{d}{\epsilon+\delta}$ has no $\sqrt{T}$ factor.

  • Azize et al. (2024): The note highlights two directions to resolve the gap: (i) design JDP algorithms with improved dependence on $(d, T, \epsilon, \delta)$, and/or (ii) develop sharper lower-bound techniques specific to the JDP constraint in linear contextual bandits.

§ References

[1]

Open Problem: What is the Complexity of Joint Differential Privacy in Linear Contextual Bandits?

Achraf Azize, Debabrota Basu (2024)

Conference on Learning Theory (COLT), PMLR 247

📍 Open-problem note in COLT proceedings.

