# Introduction

`faer` is a linear algebra library developed in the Rust programming language.
It exposes a high-level abstraction over a lower-level API, similar to the one offered by BLAS/LAPACK libraries, modified to better match the capabilities of modern computers, and allows tweaking the inner parameters of the various algorithms.

# Community

Most of the discussion around the library happens on the Discord server. Users who have questions about using the library, performance issues, bug reports, etc, as well as people looking to contribute, are welcome to join the server!

Bug reports and pull requests can also be posted on the GitHub repository.

# Available features

- Matrix creation and basic operations (addition, subtraction, multiplication, etc.),
- High-performance matrix multiplication,
- Triangular solvers,
- Cholesky decomposition (LLT, LDLT, and Bunch-Kaufman LBLT),
- LU decomposition, with partial or full pivoting,
- QR decomposition, with or without column pivoting,
- Singular value decomposition (SVD),
- Eigenvalue decomposition for Hermitian and non-Hermitian (real or complex) matrices,
- Sparse algorithms.

# Future plans

- n-dimensional tensor structures and operations.
- GPU acceleration.

To use `faer` in your project, you can add the required dependency to your `Cargo.toml` file.

```
[dependencies]
faer = "0.18"
```

For day-to-day development, we recommend enabling optimizations for `faer`, since the layers of generics can add considerable overhead in unoptimized builds.
This can be done by adding the following to `Cargo.toml`:

```
[profile.dev.package.faer]
opt-level = 3
```

We recommend that new users skim the user guide provided in the sidebar, and refer to the docs for more detailed information.

The benchmarks were run on an `11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz` with 1 thread (hyperthreading off).

All computations are done on column-major matrices.

The plots are normalized so that the `faer` line is set to `1.0`. Higher is better on all the plots.

Last updated: 2024-05-25

## f32

## f64

## c32

## c64

The benchmarks were run on an `11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz` with 6 threads (hyperthreading off).

All computations are done on column-major matrices.

The plots are normalized so that the `faer` line is set to `1.0`. Higher is better on all the plots.

Last updated: 2024-05-25

## f32

## f64

## c32

## c64

# Migration from v0.17 to v0.18

`faer v0.18` comes with a refactor of the code base into a more scalable form, which also improves the user experience with the docs.rs documentation.
The changes mostly affect the low level APIs. These include:

- Merging `faer-core`, `faer-cholesky`, `faer-lu`, `faer-qr`, `faer-svd`, `faer-evd`, and `faer-sparse` into `faer`. Those crates are now deprecated and will no longer be updated, so switching to depending on `faer` directly is recommended.
- Low level matrix multiplication moved from `faer_core::mul` to `faer::linalg::matmul`.
- Low level triangular matrix solve moved from `faer_core::solve` to `faer::linalg::triangular_solve`.
- Low level triangular matrix inverse moved from `faer_core::inverse` to `faer::linalg::triangular_inverse`.
- Low level Cholesky implementation moved from `faer_cholesky` to `faer::linalg::cholesky`.
- Sparse API moved from `faer_core::sparse` to `faer::sparse`.
- Low level LU implementation moved from `faer_lu` to `faer::linalg::lu`.
- Low level QR implementation moved from `faer_qr` to `faer::linalg::qr`.
- Low level SVD implementation moved from `faer_svd` to `faer::linalg::svd`.
- Low level EVD implementation moved from `faer_evd` to `faer::linalg::evd`.
- Low level sparse decompositions moved from `faer_sparse` to `faer::sparse::linalg`.
- High level dense solvers moved from `faer::solvers` to `faer::linalg::solvers`.
- High level sparse solvers moved from `faer::sparse::solvers` to `faer::sparse::linalg::solvers`.

For an easier transition, deprecated aliases were placed in `faer::modules::{core,cholesky,lu,qr,svd,evd,sparse}` that emulate the corresponding old versions of the low level crates to some extent, but they will be removed in a future version.

# Matrix management

## Creating a matrix

`faer` provides several ways to create dense matrices and matrix views.

The main matrix types are `Mat`, `MatRef` and `MatMut`, which can be thought of as being analogous to `Vec`, `&[_]` and `&mut [_]`.

- `Mat` owns its data, provides read-write access to it, and can be resized after creation.
- `MatRef` provides read-only access to the underlying data.
- `MatMut` provides read-write access to the underlying data.

The most flexible way to initialize a matrix is to initialize a zero matrix, then fill out the values by hand.

```
use faer::Mat;

let mut a = Mat::<f64>::zeros(4, 3);

for j in 0..a.ncols() {
    for i in 0..a.nrows() {
        a[(i, j)] = 9.0;
    }
}
```

Given a callable object that outputs the matrix elements, `Mat::from_fn` can also be used.

```
use faer::Mat;
let a = Mat::from_fn(3, 4, |i, j| (i + j) as f64);
```

For common matrices such as the zero matrix and the identity matrix, shorthands are provided.

```
use faer::Mat;
// creates a 10×4 matrix whose values are all `0.0`.
let a = Mat::<f64>::zeros(10, 4);
// creates a 5×4 matrix containing `0.0` except on the main diagonal,
// which contains `1.0` instead.
let a = Mat::<f64>::identity(5, 4);
```

In some cases, users may wish to avoid the cost of initializing the matrix to zero, in which case, unsafe code may be used to allocate an uninitialized matrix, which can then be filled out before it's used.

```
use faer::Mat;

// `a` is initially a 0×0 matrix.
let mut a = Mat::<f64>::with_capacity(4, 3);

// `a` is now a 4×3 matrix, whose values are uninitialized.
unsafe { a.set_dims(4, 3) };

for j in 0..a.ncols() {
    for i in 0..a.nrows() {
        // We cannot write `a[(i, j)] = 9.0`, as that would
        // create a reference to uninitialized data,
        // which is currently disallowed by Rust.
        a.write(i, j, 9.0);
        // We can also skip the bounds checks using
        // `read_unchecked` and `write_unchecked`.
        unsafe { a.write_unchecked(i, j, 9.0) };
    }
}
```

## Creating a matrix view

In some situations, it may be desirable to create a matrix view over existing
data.
In that case, we can use `MatRef` (or `MatMut` for mutable views).

They can be created in a safe way using:

- `mat::from_column_major_slice`,
- `mat::from_row_major_slice`,
- `mat::from_column_major_slice_mut`,
- `mat::from_row_major_slice_mut`,

for contiguous matrix storage, or:

- `mat::from_column_major_slice_with_stride`,
- `mat::from_row_major_slice_with_stride`,
- `mat::from_column_major_slice_with_stride_mut`,
- `mat::from_row_major_slice_with_stride_mut`,

for strided matrix storage.

An unsafe lower level pointer API is also provided for handling uninitialized data or arbitrary strides, using `mat::from_raw_parts` and `mat::from_raw_parts_mut`.

## Converting to a view

A `Mat` instance `m` can be converted to `MatRef` or `MatMut` by writing `m.as_ref()` or `m.as_mut()`.

## Reborrowing a mutable view

Immutable matrix views can be freely copied around, since they are non-owning wrappers around a pointer and the matrix dimensions/strides.

Mutable matrices however are limited by Rust's borrow checker. Copying them would be unsound since only a single active mutable view is allowed at a time.

This means the following code does not compile.

```
use faer::{Mat, MatMut};

fn takes_view_mut(m: MatMut<f64>) {}

let mut a = Mat::<f64>::new();
let view = a.as_mut();
takes_view_mut(view);
// This would have failed to compile since `MatMut` is never `Copy`:
// takes_view_mut(view);
```

The alternative is to temporarily give up ownership over the data, by creating a view with a shorter lifetime, then recovering the ownership when the view is no longer being used.

This is also called reborrowing.

```
use faer::{Mat, MatMut, MatRef};
use reborrow::*;

fn takes_view(m: MatRef<f64>) {}
fn takes_view_mut(m: MatMut<f64>) {}

let mut a = Mat::<f64>::new();
let mut view = a.as_mut();

takes_view_mut(view.rb_mut());
takes_view_mut(view.rb_mut());
takes_view(view.rb()); // We can also reborrow immutably.

{
    let short_view = view.rb_mut();
    // This would have failed to compile since we can't use the original view
    // while the reborrowed view is still being actively used:
    // takes_view_mut(view);
    takes_view_mut(short_view);
}

// We can once again use the original view,
takes_view_mut(view.rb_mut());
// or consume it to convert it to an immutable view.
takes_view(view.into_const());
```

## Splitting a matrix view, slicing a submatrix

A matrix view can be split up along its row axis, column axis, or both.
This is done using `MatRef::split_at_row`, `MatRef::split_at_col` or `MatRef::split_at` (or `MatMut::split_at_row_mut`, `MatMut::split_at_col_mut` or `MatMut::split_at_mut`).

These functions take the middle index at which the split is performed, and return the two sides (in top/bottom or left/right order) or the four corners (top left, top right, bottom left, bottom right).

We can also take a submatrix using `MatRef::subrows`, `MatRef::subcols` or `MatRef::submatrix` (or `MatMut::subrows_mut`, `MatMut::subcols_mut` or `MatMut::submatrix_mut`).

Alternatively, we can also use `MatRef::get` or `MatMut::get_mut`, which take the row and column ranges as parameters.

### ⚠️Warning⚠️

Note that `MatRef::submatrix` (and `MatRef::subrows`, `MatRef::subcols`) takes as parameters the first row and column of the submatrix, then the number of rows and columns of the submatrix.

On the other hand, `MatRef::get` takes a range from the first row and column to the last row and column.

# Matrix arithmetic operations

`faer` matrices implement most of the arithmetic operators, so two matrices can be added simply by writing `&a + &b`. The result of the expression is a `faer::Mat`, which allows simple chaining of operations (e.g. `(&a + faer::scale(3.0) * &b) * &c`), although at the cost of allocating temporary matrices.

Temporary allocations can be avoided by using the `zipped!` API:

```
use faer::{Mat, zipped, unzipped};
let a = Mat::<f64>::zeros(4, 3);
let b = Mat::<f64>::zeros(4, 3);
let mut c = Mat::<f64>::zeros(4, 3);
// Sums `a` and `b` and stores the result in `c`.
zipped!(&mut c, &a, &b).for_each(|unzipped!(c, a, b)| *c = *a + *b);
// Sums `a`, `b` and `c` into a new matrix `d`.
let d = zipped!(&mut c, &a, &b).map(|unzipped!(c, a, b)| *a + *b + *c);
```

For matrix multiplication, the non-allocating in-place API is provided in the `faer::linalg::matmul` module.

```
use faer::{Mat, Parallelism};
use faer::linalg::matmul::matmul;
let a = Mat::<f64>::zeros(4, 3);
let b = Mat::<f64>::zeros(3, 5);
let mut c = Mat::<f64>::zeros(4, 5);
// Computes `faer::scale(3.0) * &a * &b` and stores the result in `c`.
matmul(c.as_mut(), a.as_ref(), b.as_ref(), None, 3.0, Parallelism::None);
// Computes `faer::scale(3.0) * &a * &b + 5.0 * &c` and stores the result in `c`.
matmul(c.as_mut(), a.as_ref(), b.as_ref(), Some(5.0), 3.0, Parallelism::None);
```

# Solving a Linear System

Several applications require solving a linear system of the form \(A x = b\). The recommended method can vary depending on the properties of \(A\), and the desired numerical accuracy.

## \(A\) is triangular

In this case, one can use \(A\) and \(b\) directly to find \(x\), using the functions provided in `faer::linalg::triangular_solve`.

```
use faer::{Mat, Parallelism};
use faer::linalg::triangular_solve::solve_lower_triangular_in_place;

let a = Mat::<f64>::from_fn(4, 4, |i, j| if i >= j { 1.0 } else { 0.0 });
// Note: cast before subtracting to avoid `usize` underflow when `i < j`.
let b = Mat::<f64>::from_fn(4, 2, |i, j| i as f64 - j as f64);

let mut x = Mat::<f64>::zeros(4, 2);
x.copy_from(&b);
solve_lower_triangular_in_place(a.as_ref(), x.as_mut(), Parallelism::None);

// `x` now contains the approximate solution.
```

In the case where \(A\) has a unit diagonal, one can use `solve_unit_lower_triangular_in_place`, which avoids reading the diagonal and instead implicitly uses the value `1.0` as a replacement.

## \(A\) is real-symmetric/complex-Hermitian

If \(A\) is Hermitian and positive definite, users can use the Cholesky LLT decomposition.

```
use faer::{mat, Side};
use faer::prelude::*;

let a = mat![
    [10.0, 2.0],
    [2.0, 10.0f64],
];
let b = mat![[15.0], [-3.0f64]];

// Compute the Cholesky decomposition,
// reading only the lower triangular half of the matrix.
let llt = a.cholesky(Side::Lower).unwrap();
let x = llt.solve(&b);
```

### Low level API

Alternatively, a lower-level API can be used to avoid temporary allocations. The corresponding code for other decompositions follows the same pattern, so we will skip similar examples for the remainder of this page.

```
use faer::{mat, Mat, Parallelism, Conj};
use faer::linalg::cholesky::llt::compute::cholesky_in_place_req;
use faer::linalg::cholesky::llt::compute::{cholesky_in_place, LltRegularization, LltParams};
use faer::linalg::cholesky::llt::solve::solve_in_place_req;
use faer::linalg::cholesky::llt::solve::solve_in_place_with_conj;
use dyn_stack::{PodStack, GlobalPodBuffer};
use reborrow::*;

let a = mat![
    [10.0, 2.0],
    [2.0, 10.0f64],
];
let mut b = mat![[15.0], [-3.0f64]];
let mut llt = Mat::<f64>::zeros(2, 2);
let no_par = Parallelism::None;

// Compute the size and alignment of the required scratch space.
let cholesky_memory = cholesky_in_place_req::<f64>(
    a.nrows(),
    Parallelism::None,
    LltParams::default(),
).unwrap();
let solve_memory = solve_in_place_req::<f64>(
    a.nrows(),
    b.ncols(),
    Parallelism::None,
).unwrap();

// Allocate the scratch space.
let mut memory = GlobalPodBuffer::new(cholesky_memory.or(solve_memory));
let mut stack = PodStack::new(&mut memory);

// Compute the decomposition.
llt.copy_from(&a);
cholesky_in_place(
    llt.as_mut(),
    LltRegularization::default(), // no regularization
    no_par,
    stack.rb_mut(), // scratch space
    LltParams::default(), // default settings
);

// Solve the linear system.
solve_in_place_with_conj(llt.as_ref(), Conj::No, b.as_mut(), no_par, stack);
```

If \(A\) is not positive definite, the Bunch-Kaufman LBLT decomposition is recommended instead.

```
use faer::{mat, Side};
use faer::prelude::*;

let a = mat![
    [10.0, 2.0],
    [2.0, -10.0f64],
];
let b = mat![[15.0], [-3.0f64]];

// Compute the Bunch-Kaufman LBLT decomposition,
// reading only the lower triangular half of the matrix.
let lblt = a.lblt(Side::Lower);
let x = lblt.solve(&b);
```

## \(A\) is square

For a square matrix \(A\), we can use the LU decomposition with partial pivoting, or the full pivoting variant which is slower but can be more accurate when the matrix is nearly singular.

```
use faer::mat;
use faer::prelude::*;

let a = mat![
    [10.0, 3.0],
    [2.0, -10.0f64],
];
let b = mat![[15.0], [-3.0f64]];

// Compute the LU decomposition with partial pivoting,
let plu = a.partial_piv_lu();
let x1 = plu.solve(&b);
// or the LU decomposition with full pivoting.
let flu = a.full_piv_lu();
let x2 = flu.solve(&b);
```

## \(A\) is a tall matrix (least squares solution)

When the linear system is over-determined, an exact solution may not necessarily exist, in which case we can get a best-effort result by computing the least squares solution. That is, the solution that minimizes \(||A x - b||\).

This can be done using the QR decomposition.

```
use faer::mat;
use faer::prelude::*;

let a = mat![
    [10.0, 3.0],
    [2.0, -10.0],
    [3.0, -45.0f64],
];
let b = mat![[15.0], [-3.0], [13.1f64]];

// Compute the QR decomposition.
let qr = a.qr();
let x = qr.solve_lstsq(&b);
```

## Computing the singular value decomposition

```
use faer::mat;
use faer::prelude::*;

let a = mat![
    [10.0, 3.0],
    [2.0, -10.0],
    [3.0, -45.0f64],
];

// Compute the full SVD.
let svd = a.svd();
// Compute the thin SVD.
let svd = a.thin_svd();
// Compute only the singular values.
let svd = a.singular_values();
```

## Computing the eigenvalue decomposition

```
use faer::mat;
use faer::prelude::*;
use faer::complex_native::c64;

let a = mat![
    [10.0, 3.0],
    [2.0, -10.0f64],
];

// Compute the eigendecomposition.
let evd = a.eigendecomposition::<c64>();
// Compute the eigenvalues.
let evd = a.eigenvalues::<c64>();
// Compute the eigendecomposition assuming `a` is Hermitian.
let evd = a.selfadjoint_eigendecomposition();
// Compute the eigenvalues assuming `a` is Hermitian.
let evd = a.selfadjoint_eigenvalues();
```

Conversions from/to external library types are provided separately from `faer` itself, in the `faer-ext` crate.

# Note

Only matrix view types can be converted. Owning matrices can't be converted because `faer` uses a different allocation scheme from `nalgebra` and `ndarray`.

# Converting to/from `nalgebra` matrices

Conversion from `nalgebra` types is done by enabling the `nalgebra` feature and using the `IntoFaer` and `IntoFaerComplex` traits.

Conversion to `nalgebra` types is enabled by the same feature and uses the `IntoNalgebra` and `IntoNalgebraComplex` traits.

```
use faer::Mat;
use faer_ext::*;
let mut I_faer = Mat::<f32>::identity(8, 7);
let mut I_nalgebra = nalgebra::DMatrix::<f32>::identity(8, 7);
assert!(I_nalgebra.view_range(.., ..).into_faer() == I_faer);
assert!(I_faer.as_ref().into_nalgebra() == I_nalgebra);
assert!(I_nalgebra.view_range_mut(.., ..).into_faer() == I_faer);
assert!(I_faer.as_mut().into_nalgebra() == I_nalgebra);
```

# Converting to/from `ndarray` matrices

Conversion from `ndarray` types is done by enabling the `ndarray` feature and using the `IntoFaer` and `IntoFaerComplex` traits.

Conversion to `ndarray` types is enabled by the same feature and uses the `IntoNdarray` and `IntoNdarrayComplex` traits.

```
use faer::Mat;
use faer_ext::*;

let mut I_faer = Mat::<f32>::identity(8, 7);
let mut I_ndarray = ndarray::Array2::<f32>::zeros([8, 7]);
I_ndarray.diag_mut().fill(1.0);

assert!(I_ndarray.view().into_faer() == I_faer);
assert!(I_faer.as_ref().into_ndarray() == I_ndarray);
assert!(I_ndarray.view_mut().into_faer() == I_faer);
assert!(I_faer.as_mut().into_ndarray() == I_ndarray);
```

# Sparse Linear Algebra

A sparse matrix is a matrix for which the majority of entries are zeros. By skipping those entries and only processing the non-zero elements, along with their positions in the matrix, we can obtain a reduction in memory usage and in the amount of computation.

Since sparse data structures and algorithms often deal with very large matrices and use an unpredictable amount of memory, `faer` tries to be more robust with respect to handling out-of-memory conditions.
Due to this, most of the sparse `faer` routines that require memory allocations return a result that indicates whether the matrix was too large to be stored.

Most of the errors can be propagated to the caller and handled at the top level, or handled at some point by switching to a method that requires less memory usage.

## Creating a sparse matrix

Just like how dense matrices can be column-major or row-major, sparse matrices also come in two layouts: the Compressed Sparse Column layout (CSC) and the Compressed Sparse Row layout (CSR).
Most of the `faer` sparse algorithms operate on CSC matrices, with CSR matrices requiring a preliminary conversion step.

The CSC (resp. CSR) layout of a matrix \(A\) is composed of the following components:

- the column pointers (resp. row pointers),
- optionally, the number of non-zero elements per column (resp. row), in which case we say the matrix is in an uncompressed mode,
- the row indices (resp. column indices),
- the numerical values.

The column pointers are in an array of size `A.ncols() + 1`.

The row indices are in an array of size `col_ptr[A.ncols()]`, such that when the matrix is compressed, `row_indices[col_ptr[j]..col_ptr[j + 1]]` contains the indices of the non-zero rows in column `j`.
When the matrix is uncompressed, the row indices are instead contained in `row_indices[col_ptr[j]..col_ptr[j] + nnz_per_col[j]]`.

The numerical values are in an array of the same size as the row indices, and indicate the numerical value stored at the corresponding index.

Let us take the following matrix as an example:

```
[[0.0, 2.0, 4.0, 7.0]
 [1.0, 0.0, 5.0, 0.0]
 [0.0, 3.0, 6.0, 0.0]]
```

In CSC layout, the matrix would be stored as follows:

```
column pointers = [0, 1, 3, 6, 7]
row indices = [1, 0, 2, 0, 1, 2, 0]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

In CSR layout, it would be stored as follows:

```
row pointers = [0, 3, 5, 7]
column indices = [1, 2, 3, 0, 2, 1, 2]
values = [2.0, 4.0, 7.0, 1.0, 5.0, 3.0, 6.0]
```

CSC matrices require the row indices in each column to be sorted and free of duplicates, and matrices created by `faer` will respect that unless otherwise specified.
The same is true for CSR matrices.

Some algorithms don't strictly require sorted indices, such as the matrix decomposition algorithms, in which case `faer` provides an escape hatch for users that wish to avoid the overhead of sorting their input data.

The simplest way to create sparse matrices is to use the `SparseColMat::try_new_from_triplets` (or `SparseRowMat::try_new_from_triplets`) constructor, which takes as input a potentially unsorted list of tuples containing the row index, column index and numerical value of each entry.
Entries with the same row and column indices are added together. `SparseColMat::try_new_from_nonnegative_triplets` can also be used, which skips entries containing a negative row or column index.

For problems where the sparsity pattern doesn't change as often as the numerical values, the symbolic and numeric parts can be decoupled using `SymbolicSparseColMat::try_new_from_indices` (or `SymbolicSparseColMat::try_new_from_nonnegative_indices`) along with `SparseColMat::new_from_order_and_values`.

```
use faer::sparse::*;

let a = SparseColMat::<usize, f64>::try_new_from_triplets(
    3,
    4,
    &[
        (1, 0, 1.0),
        (0, 1, 2.0),
        (2, 1, 3.0),
        (0, 2, 4.0),
        (1, 2, 5.0),
        (2, 2, 6.0),
        (0, 3, 7.0),
    ],
);
let a = match a {
    Ok(a) => a,
    Err(err) => match err {
        CreationError::Generic(err) => return Err(err),
        CreationError::OutOfBounds { row, col } => {
            panic!("some of the provided indices exceed the size of the matrix.")
        }
    },
};
```

In the case where users want to create their matrix directly from the components, `SymbolicSparseColMat::new_checked`, `SymbolicSparseColMat::new_unsorted_checked` or `SymbolicSparseColMat::new_unchecked` can be used to respectively create a matrix after checking that it satisfies the validity requirements, create a matrix after skipping the sorted indices requirement, or skip all checks altogether.

Just like dense matrices, sparse matrices also come in three variants: an owning kind (`SparseColMat`), a reference kind (`SparseColMatRef`) and a mutable reference kind (`SparseColMatMut`).

Note that mutable sparse matrix views do not allow modifying the sparsity structure. Only the numerical values are modifiable through mutable views.

# Matrix arithmetic operations

`faer` matrices implement most of the arithmetic operators, so two matrices can be added simply by writing `&a + &b`. The result of the expression is a `faer::SparseColMat` or `faer::SparseRowMat`, which allows simple chaining of operations (e.g. `(&a - &b) * &c`), although at the cost of allocating temporary matrices.

For more complicated use cases, users are encouraged to preallocate the storage for the temporaries with the corresponding sparsity structure and use `faer::sparse::ops::binary_op_assign_into` or `faer::modules::core::ternary_op_assign_into`.

# Solving a Linear System

Just like for dense matrices, `faer` provides several sparse matrix decompositions for solving linear systems \(Ax = b\), where \(A\) is sparse and \(b\) and \(x\) are dense.
These typically come in two variants: supernodal and simplicial.
The variant is selected automatically depending on the sparsity structure of the matrix.
Although the lower level API provides a way to tune the selection options, it is currently not fully documented.

## \(A\) is triangular

`faer::sparse::FaerSparseMat::sp_solve_lower_triangular_in_place` can be used, or similar methods for when the diagonal is unit and/or the matrix is upper triangular.

## \(A\) is real-symmetric/complex-Hermitian and positive definite

If \(A\) is Hermitian and positive definite, users can use the Cholesky LLT decomposition.

```
use faer::prelude::*;
use faer::sparse::FaerSparseMat;
use faer::Side;

let a = SparseColMat::<usize, f64>::try_new_from_triplets(
    2,
    2,
    &[
        (0, 0, 10.0),
        (1, 0, 2.0),
        (0, 1, 2.0),
        (1, 1, 10.0),
    ],
).unwrap();
let b = mat![[15.0], [-3.0f64]];

let llt = a.sp_cholesky(Side::Lower).unwrap();
let x = llt.solve(&b);
```

## \(A\) is square

For a square matrix \(A\), we can use the LU decomposition with partial pivoting.

```
use faer::prelude::*;
use faer::sparse::FaerSparseMat;

let a = SparseColMat::<usize, f64>::try_new_from_triplets(
    2,
    2,
    &[
        (0, 0, 10.0),
        (1, 0, 2.0),
        (0, 1, 4.0),
        (1, 1, 20.0),
    ],
).unwrap();
let b = mat![[15.0], [-3.0f64]];

let lu = a.sp_lu().unwrap();
let x = lu.solve(&b);
```

## \(A\) is a tall matrix (least squares solution)

When the linear system is over-determined, an exact solution may not necessarily exist, in which case we can get a best-effort result by computing the least squares solution. That is, the solution that minimizes \(||A x - b||\).

This can be done using the QR decomposition.

```
use faer::prelude::*;
use faer::sparse::FaerSparseMat;

let a = SparseColMat::<usize, f64>::try_new_from_triplets(
    3,
    2,
    &[
        (0, 0, 10.0),
        (1, 0, 2.0),
        (2, 0, 3.0),
        (0, 1, 4.0),
        (1, 1, 20.0),
        (2, 1, -45.0),
    ],
).unwrap();
let b = mat![[15.0], [-3.0], [33.0f64]];

let qr = a.sp_qr().unwrap();
let x = qr.solve_lstsq(&b);
```

A (currently incomplete) guide for developers is available in the `faer` repository, in the `book` subdirectory.

This guide aims to explain in detail the various techniques used to implement each algorithm, as well as how good performance and numerical stability are ensured.

The guide is written in typst, and can be compiled using the typst command line tool.