Unsolved

Better Differentially Private Learning Algorithms with Margin Guarantees

Posed by Raef Bassily et al. (2022)

§ Problem Statement

Setup

Fix sample size $m\in\mathbb{N}$, privacy parameters $\epsilon>0$ and $\delta\ge 0$, a margin parameter $\rho>0$, and a distribution $D$ over $Z=X\times Y$ with $Y=\{-1,+1\}$. Let $S=((x_1,y_1),\ldots,(x_m,y_m))\sim D^m$.

(Margin losses.) For any measurable score function $h:X\to\mathbb{R}$, define the population $0$-$1$ error and its empirical counterpart

$$R_D(h)=\Pr_{(x,y)\sim D}[y\,h(x)\le 0],\qquad \hat R_S(h)=\frac{1}{m}\sum_{i=1}^m \mathbf{1}[y_i h(x_i)\le 0].$$

For $\rho\ge 0$, define the $\rho$-margin losses

$$R_D^{\rho}(h)=\Pr_{(x,y)\sim D}[y\,h(x)\le \rho],\qquad \hat R_S^{\rho}(h)=\frac{1}{m}\sum_{i=1}^m \mathbf{1}[y_i h(x_i)\le \rho].$$

For linear prediction with $h_w(x)=\langle w,x\rangle$, define the $\rho$-hinge loss $\ell_\rho(u)=\max(1-u/\rho,0)$ and

$$L_D^{\rho}(w)=\mathbb{E}_{(x,y)\sim D}[\ell_\rho(y\langle w,x\rangle)],\qquad \hat L_S^{\rho}(w)=\frac{1}{m}\sum_{i=1}^m \ell_\rho(y_i\langle w,x_i\rangle).$$
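As a concrete check of these definitions, here is a minimal pure-Python sketch (the function and variable names are illustrative, not from the note) computing the empirical $\rho$-margin loss and $\rho$-hinge loss of a linear predictor:

```python
# Empirical rho-margin loss and rho-hinge loss for a linear predictor
# h_w(x) = <w, x> on a sample S = [(x_i, y_i)] with labels y_i in {-1, +1}.

def margins(w, S):
    """Confidence margins y_i * <w, x_i> for each example in S."""
    return [y * sum(wj * xj for wj, xj in zip(w, x)) for x, y in S]

def empirical_margin_loss(w, S, rho):
    """\\hat R_S^rho(h_w): fraction of examples with margin <= rho."""
    return sum(1 for u in margins(w, S) if u <= rho) / len(S)

def empirical_hinge_loss(w, S, rho):
    """\\hat L_S^rho(w): average of max(1 - u/rho, 0) over the margins u."""
    return sum(max(1.0 - u / rho, 0.0) for u in margins(w, S)) / len(S)
```

Since $\mathbf{1}[u\le 0]\le \max(1-u/\rho,0)$ pointwise, the empirical $0$-$1$ error is always upper-bounded by the $\rho$-hinge loss, which is why the guarantees below compare $R_D$ against $\hat L_S^{\rho}$.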

(Differential privacy.) A randomized algorithm $\mathcal{A}:(X\times Y)^m\to\mathcal{H}$ is $(\epsilon,\delta)$-differentially private if for all neighboring datasets $S,S'$ differing in exactly one example and all measurable $\mathcal{O}\subseteq\mathcal{H}$,

$$\Pr[\mathcal{A}(S)\in\mathcal{O}]\le e^{\epsilon}\Pr[\mathcal{A}(S')\in\mathcal{O}]+\delta.$$

Write $\tilde O(\cdot)$ for bounds hiding polylogarithmic factors in the relevant parameters.
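For intuition about how such guarantees are typically obtained, a standard primitive is the Gaussian mechanism: to release the mean of $m$ vectors of norm at most $r$, add Gaussian noise scaled to the sensitivity $2r/m$. This is a minimal sketch of that classical calibration (valid for $\epsilon\le 1$), not the note's algorithm, and all names are illustrative:

```python
import math, random

def gaussian_mechanism_mean(vectors, r, epsilon, delta, rng=random):
    """(epsilon, delta)-DP estimate of the mean of m vectors with L2 norm <= r.

    Replacing one vector changes the mean by at most 2r/m in L2 norm
    (the sensitivity), so Gaussian noise with standard deviation
    sigma = sqrt(2 * ln(1.25/delta)) * (2r/m) / epsilon suffices
    under the classical calibration for epsilon <= 1.
    """
    m = len(vectors)
    d = len(vectors[0])
    sensitivity = 2.0 * r / m
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon
    mean = [sum(v[j] for v in vectors) / m for j in range(d)]
    return [mu + rng.gauss(0.0, sigma) for mu in mean], sigma
```

Note that the noise scale shrinks as $1/m$, which is the basic reason the privacy cost in the excess-error bounds below decays with the sample size.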

Unsolved Problem

Differentially private learners achieving confidence-margin (dimension-independent) generalization guarantees with improved computational cost or network-size dependence, in two settings:

  1. Faster DP margin-based learning for linear and kernel classifiers. Assume $X\subseteq B_d(r)=\{x\in\mathbb{R}^d:\|x\|_2\le r\}$ and consider
$$H_{\mathrm{Lin}}=\{h_w:x\mapsto \langle w,x\rangle\mid w\in B_d(\Lambda)\}.$$

The COLT 2022 open-problem note describes an $(\epsilon,\delta)$-DP algorithm outputting $w_{\mathrm{priv}}\in B_d(\Lambda)$ such that, with high probability over $S\sim D^m$ and the algorithm's randomness,

$$R_D(h_{w_{\mathrm{priv}}})\le \min_{w\in B_d(\Lambda)} \hat L_S^{\rho}(w)+\tilde O\!\left(\frac{\Lambda r}{\rho\sqrt{\min(1,\epsilon)\,m}}\right),$$

with running time about $\tilde O(md)$. For kernel classification with a continuous, positive definite, shift-invariant kernel $K$ with RKHS $(\mathcal{H},\|\cdot\|_{\mathcal{H}})$ and class $\mathcal{H}_\Lambda=\{h\in\mathcal{H}:\|h\|_{\mathcal{H}}\le \Lambda\}$, the note describes an $(\epsilon,\delta)$-DP algorithm outputting $h_{\mathrm{priv}}$ such that, with high probability,

$$R_D(h_{\mathrm{priv}})\le \min_{h\in\mathcal{H}_\Lambda} \hat L_S^{\rho}(h)+\tilde O\!\left(\frac{\Lambda r}{\rho\sqrt{\min(1,\epsilon)\,m}}\right),$$

with running time about $\tilde O(m^3 d)$.
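A standard way to connect the kernel setting to the linear one, for shift-invariant kernels, is a random Fourier feature approximation: sample frequencies from the kernel's spectral measure and train a linear predictor on the resulting features. The sketch below does this for the Gaussian (RBF) kernel; it is one common reduction in this line of work, not necessarily the note's exact construction, and all names are illustrative:

```python
import math, random

def sample_rff(d, D, rng):
    """Sample D random frequencies from N(0, I_d) and phases from U[0, 2*pi],
    the spectral sampling for the Gaussian kernel exp(-||x - x'||^2 / 2)."""
    omegas = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    return omegas, bs

def rff_features(x, omegas, bs):
    """Feature map z(x) with E[<z(x), z(x')>] = exp(-||x - x'||^2 / 2)."""
    D = len(omegas)
    scale = math.sqrt(2.0 / D)
    return [scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + b)
            for w, b in zip(omegas, bs)]
```

With $D$ features the kernel problem becomes a $D$-dimensional linear one, on which a DP linear margin learner can then be run.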

Question 1 (computational): Do there exist $(\epsilon,\delta)$-DP algorithms for these linear and kernel problems that achieve essentially the same margin-based guarantees with strictly better running time (as a polynomial in $m$ and $d$) than $\tilde O(md)$ for linear predictors and $\tilde O(m^3 d)$ for the kernel approach above?
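To make the $\tilde O(md)$ baseline concrete: a single pass of projected noisy SGD on the $\rho$-hinge loss touches each of the $m$ examples once at cost $O(d)$, since per-example subgradients have norm at most $r/\rho$ and projection keeps $w\in B_d(\Lambda)$. The following is a minimal illustrative sketch of that template, not the note's algorithm; the step size and noise scale are placeholders, and the privacy accounting that calibrates `sigma` to $(\epsilon,\delta)$ is omitted:

```python
import math, random

def dp_sgd_hinge(S, Lambda, rho, sigma, eta, rng):
    """One pass of noisy projected SGD on the rho-hinge loss over S.

    sigma should be calibrated to the gradient norm bound r/rho and the
    target (epsilon, delta) via a composition argument, omitted here.
    """
    d = len(S[0][0])
    w = [0.0] * d
    for x, y in S:
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Subgradient of max(1 - u/rho, 0) at u = margin, via u = y * <w, x>.
        grad = [(-y / rho) * xj for xj in x] if margin < rho else [0.0] * d
        # Noisy gradient step.
        w = [wj - eta * (gj + rng.gauss(0.0, sigma)) for wj, gj in zip(w, grad)]
        # Project back onto the constraint ball B_d(Lambda).
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm > Lambda:
            w = [wj * (Lambda / norm) for wj in w]
    return w
```

Question 1 asks whether even this linear-time dependence on $md$ can be beaten while preserving the margin guarantee.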

  2. Neural networks: DP margin guarantees with no explicit size dependence. Let $H_{\mathrm{NN}}$ be a family of feed-forward neural networks of depth $L$ on inputs in $B_d(r)$, and let $H_{\mathrm{NN},\Lambda}\subseteq H_{\mathrm{NN}}$ be those networks whose weight matrices are each bounded in Frobenius norm by $\Lambda>0$. Suppose each hidden layer has width $N$ (so the total size scales with $NL$). The note describes a pure $\epsilon$-DP algorithm outputting $h_{\mathrm{priv}}\in H_{\mathrm{NN}}$ such that, with high probability,
$$R_D(h_{\mathrm{priv}})\le \min_{h\in H_{\mathrm{NN},\Lambda}} \hat R_S^{\rho}(h) + O\!\left(\frac{r\Lambda L\sqrt{NL}}{\rho\sqrt{m}} + \frac{r^2(2\Lambda)^{2L}NL}{\rho^2\epsilon m}\right),$$

which is independent of the input dimension $d$ but depends explicitly on $NL$.

Question 2 (size dependence): Is it possible to design a DP learning algorithm for $H_{\mathrm{NN}}$ with a margin-based generalization guarantee that has no explicit dependence on network size, e.g., achieving with high probability

$$R_D(h_{\mathrm{priv}})\le \min_{h\in H_{\mathrm{NN},\Lambda}} \hat R_S^{\rho}(h) + O\!\left(\frac{r\Lambda L}{\rho\sqrt{m}} + \frac{r^2\Lambda^{2L}}{\rho^2\epsilon m}\right)?$$

§ Significance & Implications

Margin-based guarantees are a central route to dimension-independent generalization bounds in classification. Under differential privacy, the COLT 2022 open-problem note highlights that existing approaches either incur substantial computational cost (even when the statistical guarantee is favorable) or, for neural networks, achieve dimension independence at the price of an explicit dependence on network size. Progress on Question 1 would turn the best-known DP margin guarantees for linear and kernel methods into algorithms with genuinely scalable polynomial-time dependence on $m$ and $d$. Progress on Question 2 would clarify whether DP learning for neural networks can attain margin bounds controlled by norm/geometry parameters (e.g., $r,\Lambda,L,\rho,\epsilon,m$) without an explicit width/parameter-count term, which would align DP generalization guarantees more closely with the kind of size-insensitive margin theory studied in non-private settings.

§ Known Partial Results

  • Bassily et al. (2022): The COLT 2022 open-problem note (Bassily, Mohri, Suresh) formulates DP confidence-margin learning as a route to dimension-independent guarantees and identifies two main gaps: computational efficiency for linear/kernel methods and network-size dependence for neural networks.

  • Bassily et al. (2022): For linear predictors with $X\subseteq B_d(r)$ and $\|w\|_2\le \Lambda$, the note describes an $(\epsilon,\delta)$-DP learner achieving a high-probability guarantee of the form $R_D(h_{w_{\mathrm{priv}}})\le \min_{\|w\|\le \Lambda} \hat L_S^{\rho}(w)+\tilde O\!\left(\tfrac{\Lambda r}{\rho\sqrt{\min(1,\epsilon)\,m}}\right)$, with running time about $\tilde O(md)$.

  • Bassily et al. (2022): For shift-invariant kernels with RKHS norm constraint $\|h\|_{\mathcal{H}}\le \Lambda$, the note describes an $(\epsilon,\delta)$-DP learner achieving an analogous margin-based guarantee with running time about $\tilde O(m^3 d)$.

  • Bassily et al. (2022): For depth-$L$ feed-forward neural networks with Frobenius-norm-bounded weight matrices (bound $\Lambda$) and width $N$, the note describes a pure $\epsilon$-DP learner whose guarantee is independent of the input dimension $d$ but includes explicit dependence on $NL$ (terms scaling like $\frac{r\Lambda L\sqrt{NL}}{\rho\sqrt{m}}$ and $\frac{r^2(2\Lambda)^{2L}NL}{\rho^2\epsilon m}$). The non-private comparator is the spectrally normalized margin theory of Bartlett et al. (2017).

  • Bassily et al. (2022): The note poses as open whether the explicit network-size dependence in such DP margin guarantees is inherent, and whether comparable guarantees can be achieved with substantially better computational complexity for linear/kernel margin-based learners.

§ References

[1]

Open Problem: Better Differentially Private Learning Algorithms with Margin Guarantees

Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh (2022)

Conference on Learning Theory (COLT), PMLR 178

📍 Open-problem note in COLT proceedings.

[2]

Spectrally-normalized margin bounds for neural networks

Peter L. Bartlett, Dylan J. Foster, Matus J. Telgarsky (2017)

📍 Mentioned in the open-problem note as a non-private margin-bound comparator.
