Problem Setting
In many tasks, we want to solve a linear system
Ax=b
where
- A \in \mathbb{R}^{n\times n}
- the matrix entries of (A) are not explicitly available
- the only operation we can perform is a matrix–vector product v \mapsto Av
This is the matrix-free (matvec-only) operator setting.
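For concreteness, here is a minimal sketch of such an operator (my own illustration, not taken from any library): the 1-D Laplacian stencil applied as a function, so v \mapsto Av is available even though A is never formed.

```python
import numpy as np

def laplacian_matvec(v):
    """Apply the tridiagonal 1-D Laplacian (2 on the diagonal, -1 off it)
    to v without ever forming the matrix."""
    Av = 2.0 * v
    Av[:-1] -= v[1:]   # superdiagonal contribution
    Av[1:] -= v[:-1]   # subdiagonal contribution
    return Av

n = 5
v = np.arange(1.0, n + 1)
print(laplacian_matvec(v))   # identical to A @ v with A built explicitly
```

Such a callable is exactly what iterative-solver libraries consume; in SciPy, for instance, it can be wrapped in a scipy.sparse.linalg.LinearOperator.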
Krylov Subspaces
Given an initial guess (x_0), define the initial residual
r_0 = b - A x_0
The Krylov subspace of dimension (k) is
\mathcal K_k(A,r_0)
=
\mathrm{span}\{r_0, A r_0, \dots, A^{k-1} r_0\}
Key idea:
We do not search for the solution over all of (\mathbb{R}^n), but only inside the affine space
x_k \in x_0 + \mathcal K_k(A,r_0)
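As a sketch, the monomial Krylov basis can be generated with nothing but repeated matvecs (laplacian_matvec is the illustrative operator from the sketch above); in practice one would orthogonalize the columns, as Arnoldi or Lanczos do, rather than keep the raw powers.

```python
import numpy as np

def krylov_basis(matvec, r0, k):
    """Return the n x k matrix [r0, A r0, ..., A^{k-1} r0] using only matvecs."""
    cols = [r0]
    for _ in range(k - 1):
        cols.append(matvec(cols[-1]))
    return np.column_stack(cols)

n = 5
b = np.ones(n)
x0 = np.zeros(n)
r0 = b - laplacian_matvec(x0)             # initial residual
V = krylov_basis(laplacian_matvec, r0, 3)
print(V.shape)                            # (5, 3): spans K_3(A, r0)
```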
Krylov approximation (polynomial form)
The exact solution is
x_* = A^{-1} b
A Krylov approximation has the form
x_k = x_0 + p_{k-1}(A)\, r_0
where
p_{k-1}(A)=\alpha_0 I + \alpha_1 A + \cdots + \alpha_{k-1} A^{k-1}.
Thus,
A^{-1} \approx p_{k-1}(A)
Krylov methods approximate the inverse operator by a low-degree polynomial in (A).
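To make the polynomial picture concrete, here is a sketch of evaluating x_0 + p_{k-1}(A) r_0 with Horner's rule, so a degree-(k-1) polynomial costs only k-1 matvecs. The coefficients below are arbitrary placeholders (real Krylov methods choose them implicitly), and x0, r0, laplacian_matvec are reused from the sketches above.

```python
import numpy as np

def poly_in_A_times_vec(matvec, r0, coeffs):
    """Evaluate (a0 I + a1 A + ... + a_{k-1} A^{k-1}) r0 by Horner's rule,
    using only len(coeffs) - 1 matvecs."""
    y = coeffs[-1] * r0
    for a in reversed(coeffs[:-1]):
        y = a * r0 + matvec(y)
    return y

coeffs = [1.0, 0.5, 0.25]   # placeholder polynomial p(A) = I + 0.5 A + 0.25 A^2
x_k = x0 + poly_in_A_times_vec(laplacian_matvec, r0, coeffs)
```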
Why polynomial approximation works (spectral view)
Assume A is symmetric positive definite (SPD):
A = Q \Lambda Q^T,
\qquad
x_* = A^{-1} b = \sum_i \frac{1}{\lambda_i} (q_i^T b)\, q_i.
A Krylov approximation gives
x_k
=
\sum_i p_{k-1}(\lambda_i)\,(q_i^T b)\, q_i
So the problem reduces to
\frac{1}{\lambda} \approx p_{k-1}(\lambda)
\quad
\text{on } \lambda \in \sigma(A)
Krylov methods amount to spectral polynomial approximation of (1/\lambda).
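A toy numerical check of this reduction (my own illustration, not a standard algorithm): fit a low-degree polynomial to 1/\lambda on the eigenvalues of a small SPD matrix and compare \sum_i p(\lambda_i)(q_i^T b) q_i with the exact solve.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)                  # small, well-conditioned SPD matrix
b = rng.standard_normal(6)

lam, Q = np.linalg.eigh(A)                   # A = Q diag(lam) Q^T
coeffs = np.polyfit(lam, 1.0 / lam, deg=3)   # p_3(lambda) ~ 1/lambda on the spectrum

x_star = np.linalg.solve(A, b)
x_poly = Q @ (np.polyval(coeffs, lam) * (Q.T @ b))   # sum_i p(lam_i) (q_i^T b) q_i
print(np.linalg.norm(x_poly - x_star) / np.linalg.norm(x_star))   # small relative error from a cubic fit
```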
Conjugate Gradient (CG)
For SPD matrices A,
\boxed{
x_k^{\mathrm{CG}}
=
\arg\min_{x \in x_0 + \mathcal K_k(A,r_0)}
\|x - x_*\|_A
}
where
\|v\|_A = \sqrt{v^T A v}
CG computes the optimal Krylov approximation in the (A)-norm.
Implementation Details
Initialize
p_0 = r_0
Iteration:
x_{k+1} = x_k + \alpha_k p_k,
\qquad
r_{k+1} = r_k - \alpha_k A p_k
with
\alpha_k = \frac{r_k^T r_k}{p_k^T A p_k},
\quad
\beta_k = \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k}
and update p with
p_{k+1} = r_{k+1} + \beta_k p_k
Why CG “does not break previous progress”
The search directions satisfy
p_i^T A p_j = 0 \quad (i \neq j)
i.e. (A)-orthogonality (conjugacy). Each iteration permanently fixes one component of the error, and later iterations never undo it.
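A minimal CG sketch following the update formulas above (my own matvec-only illustration; laplacian_matvec is the stencil operator from the first sketch). It also records the search directions so the conjugacy claim can be checked numerically.

```python
import numpy as np

def cg_sketch(matvec, b, x0, num_iters):
    """Plain conjugate gradient using only matvecs; returns the final
    iterate and the search directions p_0, ..., p_{k-1}."""
    x, r = x0.copy(), b - matvec(x0)
    p = r.copy()
    directions = []
    for _ in range(num_iters):
        Ap = matvec(p)
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        directions.append(p)
        p = r_new + beta * p
        r = r_new
    return x, directions

n, k = 50, 10
b = np.ones(n)
x_k, P = cg_sketch(laplacian_matvec, b, np.zeros(n), k)
# A-conjugacy: p_i^T A p_j should vanish (up to round-off) for i != j.
print(max(abs(P[i] @ laplacian_matvec(P[j]))
          for i in range(k) for j in range(k) if i != j))
```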
Krylov coefficients and CG
Polynomial form:
x_k = x_0 + \sum_{i=0}^{k-1} \alpha_i A^i r_0
CG form:
x_k = x_0 + \sum_{i=0}^{k-1} \tilde\alpha_i p_i
Relation:
- Each (p_i = q_i(A) r_0) is itself a polynomial in (A),
- the Krylov coefficients are absorbed into the basis,
- CG does not explicitly compute polynomial coefficients,
- but computes an equivalent representation in an (A)-orthogonal basis (checked numerically in the sketch below).
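A quick numerical check of this relation, reusing the hypothetical cg_sketch and krylov_basis helpers from the earlier sketches: every direction p_i lies in \mathcal K_k(A, r_0), so projecting the directions onto the monomial Krylov basis leaves essentially no residual.

```python
import numpy as np

n, k = 50, 5
b = np.ones(n)
r0 = b.copy()                                 # x0 = 0, so r0 = b
_, P = cg_sketch(laplacian_matvec, b, np.zeros(n), k)
V = krylov_basis(laplacian_matvec, r0, k)     # columns r0, A r0, ..., A^{k-1} r0

Pmat = np.column_stack(P)
coef, *_ = np.linalg.lstsq(V, Pmat, rcond=None)
print(np.linalg.norm(V @ coef - Pmat) / np.linalg.norm(Pmat))   # essentially zero: each p_i lies in K_k
```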
GMRES
GMRES uses the same Krylov subspace but a different notion of optimality.
Let (V_k) be an orthonormal basis of (\mathcal K_k(A,r_0)).
GMRES seeks
x_k = x_0 + V_k y_k, \quad y_k = \arg\min_y \|b - A(x_0 + V_k y)\|_2.
Equivalently
y_k = \arg\min_y \|\beta e_1 - \bar H_k y\|_2
where \beta = \|r_0\|_2 and (\bar H_k) is the ((k+1)\times k) Hessenberg matrix from the Arnoldi process.
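A compact sketch built directly from this description (my own illustration): Arnoldi with modified Gram-Schmidt produces V_{k+1} and \bar H_k, then a small dense least-squares problem gives y_k. A production GMRES would instead update a QR factorization of \bar H_k with Givens rotations and use restarts.

```python
import numpy as np

def gmres_sketch(matvec, b, x0, k):
    """k Arnoldi steps, then minimize ||beta e1 - Hbar y||_2."""
    n = b.shape[0]
    r0 = b - matvec(x0)
    beta = np.linalg.norm(r0)
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))                  # the (k+1) x k Hessenberg matrix
    V[:, 0] = r0 / beta
    for j in range(k):
        w = matvec(V[:, j])
        for i in range(j + 1):                # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:               # happy breakdown: exact solution reached
            H, V = H[:j + 2, :j + 1], V[:, :j + 2]
            break
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(H.shape[0])
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)
    return x0 + V[:, :H.shape[1]] @ y

# Nonsymmetric example: GMRES applies where CG does not.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30)) + 30 * np.eye(30)   # well-conditioned, nonsymmetric
b = rng.standard_normal(30)
x = gmres_sketch(lambda v: A @ v, b, np.zeros(30), 25)
print(np.linalg.norm(A @ x - b))              # residual norm after 25 Krylov steps (tiny here)
```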
| | CG | GMRES |
|---|---|---|
| Matrix | SPD | General |
| Objective | ( \|x - x_*\|_A ) | ( \|r\|_2 ) |
| Coefficients | Fixed once | Recomputed every step |
| Basis | (A)-orthogonal | Euclidean orthogonal |
| Memory | (O(1)) | (O(k)) |
GMRES recomputes all coefficients at every iteration.
What matvec alone enables
All of the following are of the form (p(A)b):
Linear solves
x \approx A^{-1} b
Matrix functions
f(A)b \approx p(A)b
Trace / log-determinant (see the sketch after this list)
\mathrm{tr}(f(A)) = \mathbb E[z^T f(A) z]
Gaussian sampling
x = A^{-1/2} z
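A sketch of the stochastic trace identity in its simplest instance, f(A) = A, using Rademacher probe vectors and nothing but matvecs (laplacian_matvec is the stencil operator from the first sketch, whose exact trace is 2n). For general f, the same estimator is combined with a Krylov approximation of f(A)z.

```python
import numpy as np

def hutchinson_trace(apply_fA, n, num_samples, rng):
    """Estimate tr(f(A)) = E[z^T f(A) z] with Rademacher probes z,
    using only the ability to apply f(A) to a vector."""
    total = 0.0
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=n)   # E[z z^T] = I
        total += z @ apply_fA(z)
    return total / num_samples

rng = np.random.default_rng(2)
n = 100
print(hutchinson_trace(laplacian_matvec, n, 500, rng))   # ≈ 2 n = 200
```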
Summary
- CG: special SPD case → coefficients become permanent
- GMRES: general case → coefficients must be recomputed
- Krylov: the unifying framework
