October 19, 2024

About

Ridge regression is a linear regression method that estimates weight parameters on a (possibly nonlinear) basis of feature functions, with a normal distribution assumed as the prior over the weights. This concept reappears in Gaussian processes, so I will summarize it briefly.

Modeling

The prediction model is simply described as

\begin{align}
y =  \Phi(x) w \\
\Phi(x) = \begin{bmatrix}
\phi_1(x_1) & \phi_2(x_1) & \cdots & \phi_H(x_1) \\
\phi_1(x_2) & \phi_2(x_2) & \cdots & \phi_H(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
\phi_1(x_n) & \phi_2(x_n) & \cdots & \phi_H(x_n)
\end{bmatrix}
\end{align}

Here x is a d-dimensional vector, H is the number of basis functions, and n is the number of data points.
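As a concrete sketch of the design matrix above, here is one way to build Φ(x) with Gaussian RBF features. The choice of basis, the centers, and the width are my own illustrative assumptions, not part of the model itself:

```python
import numpy as np

def design_matrix(x, centers, width=1.0):
    """Gaussian RBF features: Phi[i, h] = exp(-||x_i - c_h||^2 / (2 * width^2))."""
    # x: (n, d) data, centers: (H, d) basis centers -> Phi: (n, H)
    sq_dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * width ** 2))

x = np.linspace(-3, 3, 50).reshape(-1, 1)       # n = 50, d = 1
centers = np.linspace(-3, 3, 7).reshape(-1, 1)  # H = 7
Phi = design_matrix(x, centers)
print(Phi.shape)  # (50, 7)
```

Any other feature map (polynomials, Fourier features, ...) works the same way; only the columns of Φ change.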

The prior of the weight vector is a normal distribution:

\begin{align}
w \sim \mathcal{N}(0, \lambda I)
\end{align}

Since w is Gaussian and y is a linear transform of w, the distribution of y is also Gaussian, with covariance

\begin{align}
\Sigma = E[y y^T] = E[(\Phi(x) w)(\Phi(x) w)^T]\\
= \Phi(x) E[w w^T] \Phi(x)^T = \lambda \Phi(x) \Phi(x)^T\\
y \sim \mathcal{N}(0, \lambda \Phi(x) \Phi(x)^T)
\end{align}
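This covariance can be checked empirically: draw many weight vectors from the prior, push them through the model, and compare the sample covariance of y against λΦΦᵀ. The feature matrix below is a random stand-in, just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
n, H = 10, 4
Phi = rng.normal(size=(n, H))  # any fixed feature matrix works here

# Draw weights from the prior w ~ N(0, lam * I) and map them through y = Phi w.
num_samples = 200_000
W = rng.normal(scale=np.sqrt(lam), size=(H, num_samples))
Y = Phi @ W  # each column is one function sample

# The empirical covariance of y approaches lam * Phi Phi^T.
emp_cov = np.cov(Y)
theo_cov = lam * Phi @ Phi.T
print(np.abs(emp_cov - theo_cov).max())
```

The maximum entrywise gap shrinks as the number of samples grows, consistent with the derivation above.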

Posterior of Weight Vector

The posterior of weight w given y is

\begin{align}
p(w | \mathbf{y}, \Phi) \propto p(\mathbf{y} | \Phi, w) p(w)
\end{align}

Assuming unit observation noise, the likelihood of y and the prior of w are proportional to

\begin{align}
  p(\mathbf{y} | \Phi, w) \propto \exp\left(-\frac{1}{2} (\mathbf{y} - \Phi w)^T (\mathbf{y} - \Phi w)\right)\\
  p(w) \propto \exp\left(-\frac{1}{2\lambda} w^T w\right)
\end{align}

Multiplying the likelihood by the prior, taking the log, and setting the gradient with respect to w to zero gives the analytic solution

\begin{align}
w = \left(\Phi^T \Phi + \frac{1}{\lambda} I\right)^{-1} \Phi^T \mathbf{y}
\end{align}
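The closed-form solution is a few lines of NumPy. This is a minimal sketch on synthetic data (the sizes, noise level, and λ are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, H = 100, 5
Phi = rng.normal(size=(n, H))
w_true = rng.normal(size=H)
y = Phi @ w_true + 0.1 * rng.normal(size=n)  # noisy observations

lam = 10.0
# MAP estimate: w = (Phi^T Phi + (1/lam) I)^{-1} Phi^T y
# Solving the linear system is preferable to forming the inverse explicitly.
w_map = np.linalg.solve(Phi.T @ Phi + (1.0 / lam) * np.eye(H), Phi.T @ y)
print(np.round(w_map, 3))
```

Note that as λ → ∞ the prior becomes flat and the estimate approaches ordinary least squares, while small λ shrinks the weights toward zero.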