The various linear solvers available
=============================================

These **linear solvers are in fact omnipresent in the flow of *Code_Aster* operators**, because they are often buried deep inside other numerical algorithms: non-linear schemes, time integration, modal analysis, etc. They often consume most of the CPU time and memory. The choice and the complete configuration of the required linear solver are made *via* the factor keyword SOLVEUR. It is present in most calculation commands (STAT_NON_LINE, THER_LINEAIRE, CALC_MODES...). This keyword allows you to choose between the two classes of solvers: the direct ones and the iterative ones.

Regarding the **direct solvers**, we have the classic "Gauss" algorithm (SOLVEUR/METHODE='LDLT'), a multifrontal factorization ('MULT_FRONT') and an external multifrontal solver ('MUMPS'). For the **iterative solvers**, it is possible to use a preconditioned conjugate gradient ('GCPC') or some of the tools of the public PETSc library ('PETSC'). Only MULT_FRONT, MUMPS and PETSC are **parallelized**: the first in OpenMP, the other two in MPI and in OpenMP (hybrid parallelism). However, all the solvers are compatible with the parallel processing (*via* MPI) of the elementary calculations and assemblies, whether these treatments are initiated just before using the linear solver itself or in another operator (for example pre/post-processing).

Let us detail a little how each of them works:

.. csv-table::

   "**Direct solvers**", ""
   "**/'MUMPS'**", "**Multifrontal direct solver** with pivoting. This solver is obtained by calling the **external product MUMPS** developed by CERFACS/IRIT/INRIA/CNRS (see Copyright §4). Outside the solver, the matrix storage is MORSE; at the solver input it is converted to the internal MUMPS format: :math:`i`, :math:`j`, :math:`{K}_{\mathit{ij}}`, centralized or distributed. For *Code_Aster*, **its main interest lies in its ability to pivot** rows and/or columns of the matrix during factorization when a pivot is too small. This possibility is useful (even indispensable) for models leading to matrices that are not positive definite (boundary conditions excluded), for example 'mixed' elements carrying 'Lagrange' degrees of freedom (incompressible elements...). This **method is parallelized in distributed memory** (MPI) and in **shared memory** (OpenMP). It can be run on several processors (*via* the Astk interface, menu Options/Launch options/ncpus & mpi_nbcpu & mpi_nbnode). **In MPI parallelism** (mpi_nbcpu & mpi_nbnode), **MUMPS naturally distributes its data** (matrix, factorized matrix...) over the cores of the various allocated compute nodes. This greatly speeds up the calculations and reduces the RAM consumption required, per MPI process, to run the calculation. This RAM consumption can be reduced even further *via* the MUMPS parameters GESTION_MEMOIRE or RENUM. In terms of memory, the bottleneck may then move to the JEVEUX space; to reduce the latter, the *Aster* matrix (MATR_ASSE) can be distributed *via* the MATR_DISTRIBUEE option. MUMPS also offers a **second level of parallelism** nested within the MPI parallelism (**hybrid parallelism**) and based on OpenMP. This second level of parallelism is mainly activated in the calls to the underlying mathematical libraries (BLAS, LAPACK). Unlike MPI, it is a shared-memory parallelism and is therefore limited to the cores of a single compute node (ncpus). To speed up large studies (at least :math:`N>2\times {10}^{6}` degrees of freedom), the acceleration/compression options of MUMPS (from v5.1.0 onwards) can also be activated *via* the keywords ACCELERATION/LOW_RANK_SEUIL."
   "**/'MULT_FRONT'**", "**Multifrontal direct solver developed in-house at EDF R&D**. The matrix storage is MORSE (or 'CSC' for 'Compressed Sparse Column') and it therefore does not allow any pivoting. This method is parallelized in shared memory (OpenMP) and can be run on several processors (*via* the Astk interface, menu Options/Launch options/ncpus). The initial matrix is stored in a single JEVEUX object; its factorized matrix is spread over several objects, so it can be partially and automatically unloaded onto disk."
   "**/'LDLT'**", "**Direct solver with block Crout factorization** (without pivoting), **developed in-house at EDF R&D**. Outside the solver, the matrix storage is MORSE; at the solver input it is converted to the internal 'skyline' format ('SKYLINE'). The memory paging is fully configurable (the matrix is broken down into blocks managed in memory independently and unloaded onto disk as needed), which makes it possible to handle large cases at the price of expensive disk accesses. In addition, this solver makes it possible to factorize the matrix only partially. This 'historical' possibility allows the matrix to be factorized in several 'passes' (several jobs), or even the last rows of the factorized matrix to be modified on the fly. Today this functionality is hardly of interest except for certain (so-called discrete) contact-friction methods, where the terms concerning the nodes likely to come into contact have, by design, been placed in the last rows of the matrix. Thus, as the pairing iterations progress and the relations between these nodes change, only these last contributions to the factorized matrix are erased and recomputed. This is a typical example where the clever use of a rather rudimentary algorithm can lead to major gains (in time)."
   "**Iterative solvers**", ""
   "**/'GCPC'**", "**Iterative conjugate-gradient-type solver with** :math:`\mathit{ILU}(k)` **preconditioning or with a preconditioner based on a single-precision factorization (via MUMPS).** The matrix storage is MORSE. With the incomplete Cholesky preconditioner, the initial matrix and its incomplete factorization are each stored in a single JEVEUX object. With the single-precision factorized preconditioner, the preconditioner is much more expensive (in CPU/RAM), but it is computed in single precision and its computation can be shared between several resolutions (problems with multiple right-hand sides, for example STAT_NON_LINE)."
   "**/'PETSC'**", "**Iterative solvers from the external PETSc library** (Argonne National Laboratory). Outside the solver, the matrix storage is MORSE; at the solver input it is converted to the internal PETSc format: 'CSR' for 'Compressed Sparse Row'. PETSc allocates blocks of contiguous rows per processor. This **method is parallelized in distributed memory** (MPI) and can be run on several processors (*via* the Astk interface, menu Options/Launch options/mpi_nbcpu & mpi_nbnode). When PETSc uses a preconditioner based on MUMPS (PRECOND='LDLT_SP'), it also benefits from the potentially hybrid parallelism of this direct solver. But the best speed-ups with PETSc are rather obtained by favouring the first level of parallelism, MPI. **Warning:** since the PETSc and MUMPS solvers are incompatible in sequential mode, and MUMPS is generally preferred, using PETSC therefore often requires launching a parallel version of *Code_Aster* (even if it means using only one processor)."
.. csv-table::

   "", "**Scope of the solver**", "**Robustness**", "**CPU**", "**Memory**", "**Details**"
   "**Direct**", "", "", "", "", ""
   "MULT_FRONT", "Universal solver. Not recommended for models requiring pivoting (mixed finite elements, X-FEM, incompressible elements...).", "+++", "Seq: ++ //: + (speed-up ~2)", "+ OOC [1]_", "On 4/8 cores."
   "MUMPS", "Universal solver, reference solver.", "+++", "Seq: ++ //: +++ (speed-up ~30)", "-- IC + OOC", "Up to 512 cores."
   "LDLT", "Universal solver (but very slow on large cases). Not recommended for models requiring pivoting (mixed finite elements, X-FEM, incompressible elements...). Partial factorization possible (discrete contact).", "+++", "Seq: +", "+ OOC", "Rather small cases, or medium-sized cases that can take advantage of partial factorization."
   "**Iterative**", "", "", "", "", ""
   "GCPC", "Real symmetric problems, except those requiring singularity detection (modal calculation, buckling).", "- (LDLT_INC) + (LDLT_SP)", "Seq: +++", "+++ (LDLT_INC) ++ (LDLT_SP)", "With LDLT_INC, do not increase the preconditioning level too much. Sometimes very effective (thermal analysis...), especially in non-linear analyses with LDLT_SP."
   "PETSC", "Same as GCPC, but also compatible with non-symmetric problems.", "- (LDLT_INC) + (LDLT_SP)", "Seq: +++ //: +++ (speed-up ~4)", "+ IC", "Rather robust algorithms: GMRES. Often very effective in non-linear analyses with LDLT_SP."

*Figure 2.-1: Overview of the linear solvers available in Code_Aster.*

**Note:**

* *To be completely exhaustive, let us specify that some (rare) numerical operations are performed with a "fixed" solver configuration, which is therefore inaccessible to users (unless the source code is overloaded). Among these are some resolutions of the discrete contact-friction methods (reserved to LDLT), the search for rigid body modes in modal analysis and the calculation of interface modes (reserved to MUMPS or to LDLT)... Each time, functional or performance reasons explain this not very transparent choice.*

.. [1] OOC stands for 'Out-Of-Core': RAM is freed by unloading some of the objects onto disk. This avoids relying on the system swap and makes it possible to handle much larger problems, although, depending on the algorithm, these additional disk accesses can be penalizing. The opposite management mode is 'In-Core' (IC): all the objects remain in RAM, which limits the size of the problems that can be handled (up to the system swap) but favours speed.
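
In line with the footnote above, the In-Core/Out-Of-Core compromise of MUMPS is typically steered through the GESTION_MEMOIRE keyword quoted earlier. A minimal sketch, assuming the values 'IN_CORE' and 'OUT_OF_CORE' are admissible in your version of the SOLVEUR catalogue:

.. code-block:: python

    # Favour speed: keep all the solver objects in RAM (In-Core).
    solveur_ic = _F(METHODE='MUMPS', GESTION_MEMOIRE='IN_CORE')

    # Favour problem size: unload part of the objects onto disk (Out-Of-Core).
    solveur_ooc = _F(METHODE='MUMPS', GESTION_MEMOIRE='OUT_OF_CORE')

    # The chosen block is then simply passed to the SOLVEUR keyword of
    # any calculation command (concept names are placeholders).
    resu3 = MECA_STATIQUE(
        MODELE=model,
        CHAM_MATER=mater,
        EXCIT=_F(CHARGE=load),
        SOLVEUR=solveur_ic,
    )
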