Mon. Jan 26th, 2026

Problem Setting

In many tasks, what we want to solve is a linear system

Ax=b

where

  • A \in \mathbb{R}^{n\times n}
  • the matrix entries of (A) are not explicitly available
  • the only operation we can perform is a matrix–vector product v \mapsto Av

This is the operator (matvec) setting.
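
A minimal sketch of this setting in NumPy: the operator below applies a tridiagonal (1D Laplacian-style) stencil, chosen purely as a stand-in, without ever storing the matrix.

```python
import numpy as np

# Matvec-only access: A is never formed; we only have v -> Av.
# The (2, -1) tridiagonal stencil below is a hypothetical stand-in operator.
def matvec(v):
    Av = 2.0 * v
    Av[:-1] -= v[1:]   # superdiagonal contribution
    Av[1:] -= v[:-1]   # subdiagonal contribution
    return Av

b = np.ones(5)
print(matvec(b))       # one application of A: the only operation we get
```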

Krylov Subspaces

Given an initial guess (x_0), define the initial residual

r_0 = b - A x_0

The Krylov subspace of dimension (k) is

\mathcal K_k(A,r_0)
=
\mathrm{span}\{r_0,\ Ar_0,\ \dots,\ A^{k-1} r_0\}

Key idea:
We do not search for the solution in (\mathbb{R}^n), but only inside

x_k \in x_0 + \mathcal K_k(A,r_0)
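
By definition, the spanning vectors can be generated with matvecs alone. A small sketch, reusing the `matvec` stand-in above; note that in practice the raw vectors quickly become nearly parallel, which is why real methods (Lanczos/Arnoldi) orthogonalize on the fly:

```python
import numpy as np

# Raw Krylov vectors r0, A r0, ..., A^{k-1} r0 from matvecs alone.
# Illustration only: these columns become nearly linearly dependent fast.
def krylov_basis(matvec, r0, k):
    vs = [r0]
    for _ in range(k - 1):
        vs.append(matvec(vs[-1]))
    return np.column_stack(vs)   # n x k, columns span K_k(A, r0)

K = krylov_basis(matvec, np.ones(5), 3)   # columns: r0, A r0, A^2 r0
```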

Krylov approximation (polynomial form)

The exact solution is

x_* = A^{-1} b

A Krylov approximation has the form

x_k = x_0 + p_{k-1}(A)\, r_0

where

p_{k-1}(A)=\alpha_0 I + \alpha_1 A + \cdots + \alpha_{k-1} A^{k-1}.

Thus,

A^{-1} \approx p_{k-1}(A)

Krylov methods approximate the inverse operator by a low-degree polynomial in (A).
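
Concretely, once the coefficients (\alpha_i) are fixed, evaluating (x_0 + p_{k-1}(A) r_0) costs (k-1) matvecs via Horner's rule. A sketch (the coefficient list here is a placeholder, not what an actual Krylov method would choose):

```python
import numpy as np

# Evaluate x = x0 + p_{k-1}(A) r0 with Horner's rule: p(A) is never formed,
# only k-1 matvecs are used. coeffs = [alpha_0, ..., alpha_{k-1}].
def apply_poly(matvec, coeffs, x0, r0):
    y = coeffs[-1] * r0
    for a in reversed(coeffs[:-1]):
        y = a * r0 + matvec(y)   # Horner step: y <- alpha_i r0 + A y
    return x0 + y
```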

Why polynomial approximation works (spectral view)

Assume (A) is symmetric positive definite (SPD):

A = Q \Lambda Q^T,
\quad
x_* = A^{-1} b = \sum_i \frac{1}{\lambda_i} (q_i^T b)\, q_i.

A Krylov approximation gives

x_k
=
\sum_i p_{k-1}(\lambda_i)(q_i^T b) q_i

So the problem reduces to

\frac{1}{\lambda} \approx p_{k-1}(\lambda)
\quad
\text{on } \lambda \in \sigma(A)

Krylov methods amount to polynomial approximation of (1/\lambda) on the spectrum of (A).
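
A toy numerical illustration of this view, using a diagonal SPD matrix and an ordinary least-squares polynomial fit as a stand-in for the optimal polynomial a Krylov method would implicitly construct:

```python
import numpy as np

# Spectral view on a diagonal SPD toy problem: fit a degree-(k-1) polynomial
# to 1/lambda on the spectrum, then compare p(Lambda) b with A^{-1} b.
lam = np.array([1.0, 2.0, 5.0, 10.0])        # assumed toy spectrum
b = np.ones_like(lam)
k = 3
coeffs = np.polyfit(lam, 1.0 / lam, k - 1)   # least-squares fit of 1/lambda
x_exact = b / lam                            # A^{-1} b (A is diagonal)
x_poly = np.polyval(coeffs, lam) * b         # p_{k-1}(Lambda) b
print(np.linalg.norm(x_poly - x_exact))      # small iff p approximates 1/lambda
```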

Conjugate Gradient (CG)

For SPD matrices A,

\boxed{
x_k^{\mathrm{CG}}
=
\arg\min_{x \in x_0 + \mathcal K_k(A,r_0)}
\|x - x_*\|_A
}

where

\|v\|_A = \sqrt{v^T A v}

CG computes the Krylov approximation that is optimal in the (A)-norm.

Implementation Details

Initialize

p_0 = r_0

Iteration:

x_{k+1} = x_k + \alpha_k p_k,
\quad
r_{k+1} = r_k - \alpha_k A p_k

with

\alpha_k = \frac{r_k^T r_k}{p_k^T A p_k},
\quad
\beta_k = \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k}

and update p with

p_{k+1} = r_{k+1} + \beta_k p_k
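
Putting the recurrences together, a minimal unpreconditioned CG sketch (assumes `matvec` applies an SPD operator; the tolerance and iteration cap are arbitrary):

```python
import numpy as np

# Minimal CG, directly following the recurrences above (no preconditioning).
def cg(matvec, b, x0, maxiter=100, tol=1e-10):
    x = x0.copy()
    r = b - matvec(x)              # r_0 = b - A x_0
    p = r.copy()                   # p_0 = r_0
    rs = r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)      # alpha_k = r_k^T r_k / p_k^T A p_k
        x += alpha * p             # x_{k+1} = x_k + alpha_k p_k
        r -= alpha * Ap            # r_{k+1} = r_k - alpha_k A p_k
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs         # beta_k = r_{k+1}^T r_{k+1} / r_k^T r_k
        p = r + beta * p           # p_{k+1} = r_{k+1} + beta_k p_k
        rs = rs_new
    return x

x = cg(matvec, np.ones(5), np.zeros(5))   # reuses the matvec sketch above
```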

Why CG “does not break previous progress”

The search directions satisfy

p_i^T A p_j = 0 \quad (i \neq j)

i.e. (A)-orthogonality (conjugacy). Each iteration permanently eliminates one component of the error, so later steps never undo earlier progress.
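
This is easy to check numerically. The sketch below runs the CG recurrences on a small random SPD matrix, stores the directions, and inspects (P^T A P), which should be (numerically) diagonal:

```python
import numpy as np

# Check A-conjugacy of the CG search directions on a random SPD matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)          # SPD by construction
b = rng.standard_normal(6)
x = np.zeros(6); r = b.copy(); p = r.copy(); rs = r @ r
P = []
for _ in range(4):
    Ap = A @ p
    alpha = rs / (p @ Ap)
    x += alpha * p; r -= alpha * Ap
    P.append(p.copy())
    rs_new = r @ r
    p = r + (rs_new / rs) * p; rs = rs_new
P = np.column_stack(P)
print(np.round(P.T @ A @ P, 6))      # off-diagonal entries ~ 0: conjugacy
```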

Krylov coefficients and CG

Polynomial form:

x_k = x_0 + \sum_{i=0}^{k-1} \alpha_i A^i r_0

CG form:

x_k = x_0 + \sum_{i=0}^{k-1} \tilde\alpha_i p_i

Relation:

  • Each (p_i = q_i(A) r_0) is itself a polynomial in (A),
  • the Krylov coefficients are absorbed into the basis,
  • CG does not explicitly compute polynomial coefficients,
  • but computes an equivalent representation in an (A)-orthogonal basis
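
One way to see that the two representations agree: the directions span exactly the Krylov subspace. Reusing `krylov_basis`, `A`, `b`, and the stored directions `P` from the sketches above, stacking the raw Krylov vectors next to the directions does not increase the rank:

```python
import numpy as np

# Same subspace, different basis: rank([K | P]) equals rank(K).
K = krylov_basis(lambda v: A @ v, b, 4)
print(np.linalg.matrix_rank(K), np.linalg.matrix_rank(np.hstack([K, P])))
```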

GMRES

GMRES uses the same Krylov subspace but a different notion of optimality.

Let (V_k) be an orthonormal basis of (\mathcal K_k(A,r_0)).

GMRES seeks

x_k = x_0 + V_k y_k,
\quad
y_k = \arg\min_y \|b - A(x_0 + V_k y)\|_2.

Equivalently

y_k = \arg\min_y \|\beta e_1 - \bar H_k y\|_2

where (\bar H_k) is the ((k+1)\times k) upper Hessenberg matrix from the Arnoldi process and (\beta = \|r_0\|_2).
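
A compact sketch of this (fixed (k), no restarting, no breakdown handling) using Arnoldi with modified Gram–Schmidt and a dense least-squares solve:

```python
import numpy as np

# Minimal GMRES: Arnoldi builds V_{k+1} and the (k+1) x k Hessenberg Hbar,
# then y_k solves the small least-squares problem min ||beta e1 - Hbar y||_2.
def gmres(matvec, b, x0, k):
    r0 = b - matvec(x0)
    beta = np.linalg.norm(r0)
    n = b.shape[0]
    V = np.zeros((n, k + 1)); H = np.zeros((k + 1, k))
    V[:, 0] = r0 / beta
    for j in range(k):                    # Arnoldi, modified Gram-Schmidt
        w = matvec(V[:, j])
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)   # (breakdown if 0; not handled here)
        V[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(k + 1); rhs[0] = beta  # beta e_1
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return x0 + V[:, :k] @ y
```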

              CG                   GMRES
Matrix        SPD                  General
Objective     \|x - x_*\|_A        \|r\|_2
Coefficients  Fixed once           Recomputed every step
Basis         (A)-orthogonal       Euclidean orthogonal
Memory        (O(1))               (O(k))

GMRES recomputes all coefficients at every iteration.

What matvec alone enables

All of the following are of the form (p(A)b):

Linear solves

x \approx A^{-1} b

Matrix functions

f(A)b \approx p(A)b

Trace / log-determinant

\mathrm{tr}(f(A)) = \mathbb E[z^T f(A) z],
\quad
\mathbb E[z z^T] = I
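
This is Hutchinson's estimator; Rademacher or standard Gaussian (z) both satisfy (\mathbb E[zz^T] = I). A toy check with (f) the identity, so only plain matvecs are needed; for a general (f), each (f(A)z) would itself be a Krylov approximation:

```python
import numpy as np

# Hutchinson trace estimation with Rademacher probes: E[z^T A z] = tr(A).
rng = np.random.default_rng(0)
n, samples = 50, 2000
M = rng.standard_normal((n, n)); A = M @ M.T
Z = rng.choice([-1.0, 1.0], size=(n, samples))      # E[z z^T] = I
est = np.mean(np.einsum('is,is->s', Z, A @ Z))      # average of z^T A z
print(est, np.trace(A))                             # estimate vs exact
```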

Gaussian sampling

x = A^{-1/2} z,
\quad
z \sim \mathcal N(0, I)
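
so that (x \sim \mathcal N(0, A^{-1})). A dense toy check; an explicit eigendecomposition stands in here for the matvec-only Krylov approximation of (A^{-1/2} z):

```python
import numpy as np

# Sampling check: the covariance of A^{-1/2} z should match A^{-1}.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)); A = M @ M.T + 4 * np.eye(4)
lam, Q = np.linalg.eigh(A)
A_inv_half = Q @ np.diag(lam ** -0.5) @ Q.T   # explicit A^{-1/2} (toy only)
Z = rng.standard_normal((4, 20000))
X = A_inv_half @ Z
print(np.round(np.cov(X), 2))                 # ~ A^{-1}
print(np.round(np.linalg.inv(A), 2))
```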

Summary

  • CG: special SPD case → coefficients become permanent
  • GMRES: general case → coefficients must be recomputed
  • Krylov: the unifying framework