Learning sparse linear concepts by priming the features
Posed by Manfred K. Warmuth and Ehsan Amid (2023)
§ Problem Statement
Setup
Online linear regression with squared loss in dimension $d$. For rounds $t = 1, \dots, T$: the learner observes $x_t \in \mathbb{R}^d$, predicts $\hat y_t \in \mathbb{R}$, then observes $y_t \in \mathbb{R}$ and incurs loss $(\hat y_t - y_t)^2$. Assume coordinatewise bounded instances $\|x_t\|_\infty \le X$ and bounded outcomes $|y_t| \le Y$ for all $t$.
Let $X_{t-1} \in \mathbb{R}^{(t-1) \times d}$ be the design matrix with rows $x_1^\top, \dots, x_{t-1}^\top$ and $y_{t-1} = (y_1, \dots, y_{t-1})^\top$. For a matrix $A$, let $A^+$ denote the Moore–Penrose pseudoinverse, and for $p \in \mathbb{R}^d$ let $\mathrm{diag}(p)$ be the diagonal matrix with $p$ on the diagonal.
A one-stage feature-priming least-squares predictor is specified by a closed-form rule that maps past data $(X_{t-1}, y_{t-1})$ to a priming vector $p_t \in \mathbb{R}^d$, forms the primed design $X_{t-1}\,\mathrm{diag}(p_t)$, computes primed least-squares coefficients $\tilde w_t = \big(X_{t-1}\,\mathrm{diag}(p_t)\big)^+ y_{t-1}$, and sets $\hat y_t = x_t^\top \mathrm{diag}(p_t)\, \tilde w_t$.
The COLT 2023 note highlights concrete closed-form priming rules motivated by sparsity, including (i) per-coordinate 1D least squares $p_{t,i} = X_{t-1,i}^\top y_{t-1} / \|X_{t-1,i}\|_2^2$ when column $X_{t-1,i} \neq 0$ (and e.g. $p_{t,i} = 0$ otherwise), (ii) correlation-style scores $p_{t,i}$ between column $X_{t-1,i}$ and $y_{t-1}$, and (iii) the two-pass rule $p_t = X_{t-1}^+ y_{t-1}$, where $p_t$ is the minimum-norm least-squares fit on unprimed features.
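The priming family above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors' reference code; in particular the exact normalizations of the per-coordinate and correlation rules are assumptions.

```python
import numpy as np

def priming_vector(X, y, rule="two_pass"):
    """Closed-form priming vector p computed from past data (X, y).

    Illustrative versions of the three rule families; exact
    normalizations in the COLT note may differ.
    """
    if rule == "per_coordinate":
        # (i) 1D least squares per column; 0 when a column is all zeros.
        norms = (X ** 2).sum(axis=0)
        safe = np.where(norms > 0, norms, 1.0)
        return np.where(norms > 0, (X.T @ y) / safe, 0.0)
    if rule == "correlation":
        # (ii) correlation-style score between each column and y.
        return X.T @ y
    if rule == "two_pass":
        # (iii) minimum-norm least-squares fit on the unprimed features.
        return np.linalg.pinv(X) @ y
    raise ValueError(rule)

def primed_prediction(X, y, x_new, rule="two_pass"):
    """Primed least squares: rescale columns by p, refit, predict."""
    p = priming_vector(X, y, rule)
    Xp = X * p                      # X @ diag(p), columnwise rescaling
    w = np.linalg.pinv(Xp) @ y      # primed least-squares coefficients
    return (x_new * p) @ w          # yhat = x^T diag(p) w
```

On noise-free data with a full-column-rank design, the two-pass rule recovers the sparse target exactly: the first pass gives $p = u$, the zero coordinates of $p$ annihilate the irrelevant columns, and the second fit is exact on the support.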
Unsolved Problem
For at least one of these priming rules, does the resulting online algorithm satisfy a regret bound with only logarithmic dependence on $d$ against sparse (or $\ell_1$-bounded) linear comparators? Namely, is there a bound of the form
$$\sum_{t=1}^{T} (\hat y_t - y_t)^2 \;-\; \min_{u \,:\, \|u\|_1 \le U} \sum_{t=1}^{T} (u^\top x_t - y_t)^2 \;\le\; \mathrm{poly}(U, X, Y) \cdot O(\log d) \cdot \mathrm{polylog}(T)$$
(or more finely $O(k \log d)$ when competing with $k$-sparse comparators), uniformly over all sequences with $\|x_t\|_\infty \le X$ and $|y_t| \le Y$? The question remains open even in noise-free realizable settings with sparse $u$.
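In the noise-free realizable case the comparator term vanishes, so the open question reduces to bounding the cumulative loss of the primed predictor itself. A minimal measurement harness for the two-pass rule (illustrative; the convention that the first prediction is 0 is an assumption):

```python
import numpy as np

def cumulative_loss_two_pass(xs, ys):
    """Cumulative squared loss of the two-pass primed predictor.

    On a noise-free realizable sequence (y_t = u . x_t for sparse u)
    the comparator loss is 0, so this cumulative loss IS the regret.
    Predicts 0 on the first round, before any data is seen.
    """
    T, d = xs.shape
    total = 0.0
    for t in range(T):
        if t == 0:
            yhat = 0.0
        else:
            X, y = xs[:t], ys[:t]
            p = np.linalg.pinv(X) @ y   # first pass: min-norm LS fit
            Xp = X * p                  # prime the features
            w = np.linalg.pinv(Xp) @ y  # second pass on primed design
            yhat = (xs[t] * p) @ w
        total += (yhat - ys[t]) ** 2
    return total
```

Once $t \ge d$ and the design has full column rank, both passes are exact on realizable data, so all loss is incurred in the early rounds; the open question concerns how this loss scales with $d$ in the worst case.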
§ Discussion
§ Significance & Implications
For squared-loss online prediction, multiplicative-update methods can exploit sparsity to obtain regret bounds whose dimension dependence is logarithmic in $d$ under $\ell_1$/$\ell_\infty$-type conditions, but their updates are not least-squares closed forms. In contrast, least-squares-style (Euclidean/second-order) updates are closed-form but typically do not yield sparsity-adaptive dimension dependence. Feature priming is a concrete, computationally simple way to bias a second least-squares fit toward a small set of coordinates using only the past data. Proving (or refuting) a worst-case $O(\log d)$-type regret guarantee for such priming would clarify whether closed-form least-squares predictors can match sparsity-adaptive guarantees without switching to multiplicative updates, and would delineate the limits of coordinate-sensitive preconditioning schemes in online regression.
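For contrast with the closed-form least-squares family, here is a minimal sketch of the multiplicative-update baseline in the style of Kivinen–Warmuth EG±; the parameter settings `U` and `eta` are illustrative, not the tuned values from the regret analysis.

```python
import numpy as np

def eg_pm_cumulative_loss(xs, ys, U=4.0, eta=0.05):
    """Cumulative squared loss of an EG+- (exponentiated gradient) learner.

    Maintains positive and negative weight halves on a scaled simplex of
    total mass U; the update is multiplicative, not a closed-form fit.
    """
    T, d = xs.shape
    wp = np.full(d, U / (2 * d))            # positive half
    wm = np.full(d, U / (2 * d))            # negative half
    total = 0.0
    for t in range(T):
        yhat = (wp - wm) @ xs[t]
        total += (yhat - ys[t]) ** 2
        g = 2.0 * (yhat - ys[t]) * xs[t]    # gradient of the squared loss
        wp = wp * np.exp(-eta * g)          # multiplicative update
        wm = wm * np.exp(eta * g)
        scale = (wp.sum() + wm.sum()) / U   # renormalize total mass to U
        wp, wm = wp / scale, wm / scale
    return total
```

The renormalization keeps the combined weights on an $\ell_1$ ball of radius $U$, which is the source of the $\log d$ dimension dependence in the multiplicative-update analysis.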
§ Known Partial Results
Warmuth et al. (2023): The COLT 2023 note observes that sparse linear problems are learned well by online multiplicative updates, motivating the search for closed-form least-squares-style alternatives.
Warmuth et al. (2023): The proposed priming family is fully specified by closed-form functions of past data followed by pseudoinverse least squares on a diagonally rescaled (coordinate-primed) design.
Warmuth et al. (2023): Experiments reported in the note indicate that several priming rules (notably using $p_t = X_{t-1}^+ y_{t-1}$ and variants such as per-feature 1D least squares or correlation-based scores) can identify sparse relevant features faster than vanilla least squares while behaving similarly to least squares on dense targets.
Warmuth et al. (2023): No worst-case online regret bound is currently given for these priming-based closed-form updates, and the existence of sparsity-adaptive (e.g., $O(\log d)$ or $O(k \log d)$) regret bounds remains open.
§ References
Open Problem: Learning sparse linear concepts by priming the features
Manfred K. Warmuth, Ehsan Amid (2023)
Conference on Learning Theory (COLT), PMLR 195