Let $X$ be a nonempty set, $k$ a positive-definite real-valued kernel on $X \times X$ with corresponding reproducing kernel Hilbert space $H_k$, and let $R : H_k \to \mathbb{R}$ be a differentiable regularization function. Then, given a training sample $(x_1, y_1), \ldots, (x_n, y_n) \in X \times \mathbb{R}$ and an arbitrary error function $E : (X \times \mathbb{R}^2)^n \to \mathbb{R} \cup \{\infty\}$, a minimizer

$$f^* = \operatorname*{argmin}_{f \in H_k} \Big\{ E\big((x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n))\big) + R(f) \Big\}$$

of the regularized empirical risk admits a representation of the form

$$f^*(\cdot) = \sum_{i=1}^{n} \alpha_i \, k(\cdot, x_i),$$

where $\alpha_i \in \mathbb{R}$ for all $1 \le i \le n$.

Why it's cool
Representer theorems are useful from a practical standpoint because they dramatically simplify the regularized empirical risk minimization (ERM) problem.
In most interesting applications, the search domain $H_k$ for the minimization will be an infinite-dimensional subspace of $L^2(X)$, and therefore the search (as written) does not admit implementation on finite-memory, finite-precision computers.
In contrast, the representation of $f^*(\cdot)$ afforded by a representer theorem reduces the original (infinite-dimensional) minimization problem to a search over the $n$-dimensional vector of coefficients $\alpha = (\alpha_1, \ldots, \alpha_n)$, which can then be found by applying any standard function-minimization algorithm.
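To make the reduction concrete, here is a minimal sketch of the squared-error special case (kernel ridge regression), where the finite-dimensional problem in $\alpha$ even admits a closed-form solution. The Gaussian kernel, the toy data, and the parameter choices (`gamma`, `lam`) are illustrative assumptions, not part of the theorem.

```python
import numpy as np

# Illustrative toy data: n noisy samples of a sine curve (not from the text).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))               # inputs x_1, ..., x_n
y = np.sin(X).ravel() + 0.1 * rng.normal(size=50)  # targets y_1, ..., y_n

def kernel(A, B, gamma=0.5):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

K = kernel(X, X)   # Gram matrix K_ij = k(x_i, x_j)
lam = 1e-2         # regularization weight (assumed, for illustration)

# With squared-error loss E and R(f) = lam * ||f||_{H_k}^2, substituting
# f(.) = sum_i alpha_i k(., x_i) collapses the infinite-dimensional search
# over H_k into the n-dimensional problem
#     min_alpha  ||y - K @ alpha||^2 + lam * alpha^T K alpha,
# whose stationarity condition (for invertible K) is the linear system below.
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# The learned function is f*(x) = sum_i alpha_i k(x, x_i); evaluate it anywhere.
X_test = np.linspace(-3, 3, 200)[:, None]
f_star = kernel(X_test, X) @ alpha
```

For losses with no closed-form minimizer, the same substitution still produces a finite-dimensional objective in $\alpha$, which can be handed to any off-the-shelf optimizer (e.g. `scipy.optimize.minimize`), exactly as described above.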