March 9, 2025

About

Sparse matrix is good for representing a matrix that there are so many non-zero values. If we use it, solving equation gets much much faster than general form. But there are some formats to store the elements. How they handle data structure and which structure is good for what operations?

Data Structure

COO, CSR, CSC, DIA, LIL format

The below shows how each format stores a sparse matrix.

The COO is data format that has fundamental data structure. It has indices of rows and columns where the non-zero values are. The CSR, and CSC is kind of compressed version of COO format. As you can see in COO format below, row indices must take step structure, like [0, 0, 0, 1, 1, 2, 3, 3, 4, 5, 6, 6,,,]. It means that either of row or column indices can be compressed.

COO Format:
row indices: [0 1 1 2 3]
column indices: [2 0 3 1 2]
data: [3. 2. 4. 5. 6.]

CSR Format:
index pointer: [0 1 3 4 5]
indices: [2 0 3 1 2]
data: [3. 2. 4. 5. 6.]

CSC Format:
index pointer: [0 1 2 4 5]
indices: [1 2 0 3 1]
data: [2. 5. 3. 6. 4.]

DIA Format:
offsets: [-1  2]
data (stored diagonals): [[2. 5. 6. 0.]
 [0. 0. 3. 4.]]

LIL Format:
rows (list of lists): [list([2]) list([0, 3]) list([1]) list([2])]
data (list of lists): [list([3.0]) list([2.0, 4.0]) list([5.0]) list([6.0])]

The DIA format stores only the diagonal elements of a matrix, all other elements are discarded. For example,

A =
\begin{bmatrix}
0 & 0 & 3 & 0 \\
2 & 0 & 0 & 4 \\
0 & 5 & 0 & 0 \\
0 & 0 & 6 & 0
\end{bmatrix}
offsets: [-1, 2]
data:
[  2   5   6   0  ]  # Lower Diagonal
[  0   0   3   4  ]  #  Upper Diagonal

LIL is ​​a format that stores non-zero elements in a list for each row.
Since elements are stored in list format for each row, it is easy to add or change elements in a sparse matrix. But it is not good at rapid computing.

rows: [list([2]), list([0, 3]), list([1]), list([2])]
data: [list([3.0]), list([2.0, 4.0]), list([5.0]), list([6.0])]

Code

The code to print above result is here.

def print_sparse_matrix_details(A, name):
    print(f"\n{name} Format:")
    
    if isinstance(A, csr_matrix) or isinstance(A, csc_matrix):
        print("index pointer:", A.indptr)
        print("indices:", A.indices)
        print("data:", A.data)
    
    elif isinstance(A, coo_matrix):
        print("row indices:", A.row)
        print("column indices:", A.col)
        print("data:", A.data)
    
    elif isinstance(A, dia_matrix):
        print("offsets:", A.offsets)
        print("data (stored diagonals):", A.data)
    
    elif isinstance(A, lil_matrix):
        print("rows (list of lists):", A.rows)
        print("data (list of lists):", A.data)

A_coo = coo_matrix([
    [0, 0, 3, 0],
    [2, 0, 0, 4],
    [0, 5, 0, 0],
    [0, 0, 6, 0]
], dtype=float)

A_csr = A_coo.tocsr()
A_csc = A_coo.tocsc()
A_dia = dia_matrix(A_coo) 
A_lil = A_coo.tolil()

print_sparse_matrix_details(A_coo, "COO")
print_sparse_matrix_details(A_csr, "CSR")
print_sparse_matrix_details(A_csc, "CSC")
print_sparse_matrix_details(A_dia, "DIA")
print_sparse_matrix_details(A_lil, "LIL")

Solve Linear System

To know how these data structure is good (or bad) at solving linear equation, I tried to solve a linear system (10000 x 10000) with these data format to compare with.

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   8
  On-line CPU(s) list:    0-7

Result

times_lsmr {'COO': 21.84102439880371, 'CSR': 13.43990182876587, 'CSC': 13.671040534973145, 'DIA': 681.381429195404, 'LIL': 111.96334171295166}

times_spsolve {'COO': 87.6583023071289, 'CSR': 88.57207345962524, 'CSC': 86.61457252502441, 'DIA': 92.32285618782043, 'LIL': 86.9930489063263}

times_splu {'COO': 91.55107426643372, 'CSR': 94.72534799575806, 'CSC': 90.0424132347107, 'DIA': 96.09503722190857, 'LIL': 95.04460597038269}

Code

import time
from scipy.sparse.linalg import spsolve, inv
from scipy.sparse.linalg import splu
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix, csc_matrix, dia_matrix, lil_matrix
from scipy.sparse.linalg import lsmr


n = 10000
density = 0.01

rng = np.random.default_rng(42)
rows = rng.integers(0, n, int(n * n * density))
cols = rng.integers(0, n, int(n * n * density))
data = rng.random(int(n * n * density))
A_coo = coo_matrix((data, (rows, cols)), shape=(n, n))

b = rng.random(n)

A_csr = A_coo.tocsr()
A_csc = A_coo.tocsc()
A_dia = dia_matrix(A_coo) 
A_lil = A_coo.tolil()

def measure_time(A, b):
    start = time.time()
    x = lsmr(A, b)[0]
    end = time.time()
    return end - start

times_lsmr = {
    "COO": measure_time(A_coo, b),
    "CSR": measure_time(A_csr, b),
    "CSC": measure_time(A_csc, b),
    "DIA": measure_time(A_dia, b),
    "LIL": measure_time(A_lil, b),
}



def measure_spsolve(A, b):
    try:
        start = time.time()
        x = spsolve(A, b)
        end = time.time()
        return end - start
    except Exception as e:
        return str(e) 

def measure_splu(A, b):
    try:
        start = time.time()
        lu = splu(A)  # LU
        x = lu.solve(b)
        end = time.time()
        return end - start
    except Exception as e:
        return str(e)

times_spsolve = {
    "COO": measure_spsolve(A_coo, b),
    "CSR": measure_spsolve(A_csr, b),
    "CSC": measure_spsolve(A_csc, b),
    "DIA": measure_spsolve(A_dia, b),
    "LIL": measure_spsolve(A_lil, b),
}

times_splu = {
    "COO": measure_splu(A_coo, b),
    "CSR": measure_splu(A_csr, b),
    "CSC": measure_splu(A_csc, b),
    "DIA": measure_splu(A_dia, b),
    "LIL": measure_splu(A_lil, b),
}

print("times_lsmr", times_lsmr)
print("times_spsolve", times_spsolve)
print("times_splu", times_splu)

Warnings…

diagonals is inefficient
  A = arg1.todia()
SparseEfficiencyWarning: spsolve requires A be CSC or CSR matrix format
diagonals is inefficient
  A = arg1.todia()

This code is going to solve linear equation Ax=B by the way, It is better to set B as dense matrix. Because it will be converted into dense matrix even when it is sparse matrix. Therefore, the overhead will be happened.