6. Krylov’s iterative solvers via PETSc#

6.1. The PETSc library#

The PETSc library [PET] (“Portable, Extensible Toolkit for Scientific Computation”), developed at the Argonne National Laboratory (DOE and NSF), offers, in addition to a wide variety of solvers (direct, iterative, modal, ODE, …), utilities for manipulating distributed data and for profiling parallel computations. It is a freely available product, interfaced with several languages (F77/90, C/C++, Python), parallelized with MPI, and it has reached a certain maturity [27]_.

Some codes, such as MEF++ [Tar04], did not hesitate to rely entirely on PETSc in order to take full advantage of its toolkit aspects and its parallel capabilities. This performance aspect [28]_ is the product’s priority, and it has long claimed records in the field (e.g. solving linear systems with 500 million unknowns in 2004).

6.2. Implementation in Code_Aster#

The software project for the integration of PETSc into Code_Aster is detailed in the note by T. De Soza [Des07] and traced in the Rex Aster tool. The main lines are summarized here:

1. A PETSc solve launched from Code_Aster takes place in three steps: pre-allocating the memory space and filling the matrix; configuring the chosen Krylov solver and its preconditioner; providing the right-hand side and performing the actual solve (a sketch of this workflow is given after this list).

2. The conversion of the Aster matrix format (CSC, “Compressed Sparse Column”) to the format specific to PETSc (CSR, “Compressed Sparse Row”). In addition, only the upper triangular part is stored in Aster, whereas PETSc requires the full matrix. Moreover, to be efficient, PETSc needs specific hints: in sequential mode, the average number of non-zero terms per row; in parallel, that of the diagonal and off-diagonal blocks of the block of rows assigned to each processor. This pre-allocation phase of the PETSc objects is tedious but fortunately inexpensive compared with the other steps [29]_.

3. In parallel mode, data distribution is entirely controlled by PETSc: the library assigns a block of contiguous rows to each processor. The Aster user cannot, for the moment, really influence this choice (other than by changing the number of processors!). This is the same philosophy as for the OpenMP parallelism of the Aster multifrontal solver. With MUMPS, the choice is wider: the user can distribute the data directly or via automatic tools (which can themselves be controlled).

4. The indefinite nature of most Aster matrices (Lagrange multipliers, mixed finite elements…) prevents the use of the renumbering schemes and preconditioners built into PETSc. To be able to work effectively, we « bluff » the tool by providing it with matrices already renumbered [30]_ by Aster’s RCMK [31]_ (which handles the Lagrange multipliers specifically).
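As an illustration of the steps in item 1 (and of the storage conversion of item 2), here is a minimal sequential sketch using petsc4py and SciPy. It is an assumption-laden toy example (random matrix, arbitrary size and tolerance), not Code_Aster’s actual conversion or allocation code.

```python
import numpy as np
import scipy.sparse as sp
from petsc4py import PETSc

n = 50
# Aster-like storage: only the upper triangle of a symmetric matrix (CSC).
upper = sp.triu(sp.random(n, n, density=0.1, format='csc', random_state=0),
                format='csc') + n * sp.eye(n, format='csc')
# Expand to the full matrix required by PETSc and convert to CSR.
full = (upper + upper.T - sp.diags(upper.diagonal())).tocsr()

# Step 1: pre-allocate with the exact number of non-zeros per row, then fill.
# (In parallel one would instead provide d_nnz/o_nnz for the local row block.)
nnz = np.diff(full.indptr).astype(PETSc.IntType)
A = PETSc.Mat().createAIJ([n, n], nnz=nnz)
A.setValuesCSR(full.indptr.astype(PETSc.IntType),
               full.indices.astype(PETSc.IntType), full.data)
A.assemble()

# Step 2: choose the Krylov solver and its preconditioner.
ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType(PETSc.KSP.Type.GMRES)
ksp.getPC().setType(PETSc.PC.Type.ILU)
ksp.setTolerances(rtol=1.0e-6)

# Step 3: provide the right-hand side and solve.
x, b = A.createVecs()
b.set(1.0)
ksp.solve(b, x)
print('iterations:', ksp.getIterationNumber())
```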

6.3. Scope of use#

As far as Aster operators are concerned, the scope of use is the same as for the code’s native GCPC (see §5.2).

As for the functionalities of the PETSc library itself, we have restricted ourselves for the moment to a few Krylov-type iterative linear solvers:

  • CG (“CG”): standard conjugate gradient (the same as Aster’s native GCPC; M. R. Hestenes and E. Stiefel, 1952),

  • CR (“CR”): conjugate residual (M. R. Hestenes and E. Stiefel, 1952),

  • GMRES (“GMRES”): Generalised Minimal RESidual (Y. Saad and M. H. Schultz, 1986),

  • FGMRES (“FGMRES”): Flexible Generalised Minimal RESidual,

  • GCR (“GCR”): Generalised Conjugate Residual (S. C. Eisenstat, H. C. Elman and M. H. Schultz, 1983),

and to the following preconditioners (a sketch mapping these names to PETSc types is given after this list):

  • ILU(k) (“LDLT_INC”): incomplete factorization by fill level (the same as Aster’s native GCPC; cf. §4.2),

  • JACOBI (“JACOBI”): standard diagonal preconditioner (see §4.1),

  • SOR (“SOR”): Successive Over Relaxation (see §4.2),

  • BoomerAMG (“BOOMER”): algebraic multigrid provided by the Hypre library,

  • Multilevel (“ML”): algebraic multigrid provided by the ML library,

  • GAMG (“GAMG”): algebraic multigrid native to the PETSc library.
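As a hedged illustration (this mapping is an assumption made here for clarity, not Code_Aster’s internal table), the Aster-level names above correspond to standard petsc4py solver and preconditioner types; “LDLT_SP” and “BLOC_LAGR”, described below, are Aster-internal and have no direct PETSc equivalent in this mapping.

```python
from petsc4py import PETSc

KSP_TYPES = {          # ALGORITHME -> PETSc Krylov solver
    'CG': PETSc.KSP.Type.CG,
    'CR': PETSc.KSP.Type.CR,
    'GMRES': PETSc.KSP.Type.GMRES,
    'FGMRES': PETSc.KSP.Type.FGMRES,
    'GCR': PETSc.KSP.Type.GCR,
}
PC_TYPES = {           # PRE_COND -> PETSc preconditioner
    'LDLT_INC': PETSc.PC.Type.ILU,
    'JACOBI': PETSc.PC.Type.JACOBI,
    'SOR': PETSc.PC.Type.SOR,
    'BOOMER': PETSc.PC.Type.HYPRE,  # BoomerAMG through the Hypre interface
    'ML': PETSc.PC.Type.ML,
    'GAMG': PETSc.PC.Type.GAMG,
}
```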

Moreover, as with “GCPC”, another preconditioner (internal to Code_Aster) is available: “LDLT_SP”, which is based on a single-precision factorization reused over several successive iterative solves.

The “BLOC_LAGR” preconditioner is an augmented Lagrangian block preconditioner that applies to systems with Lagrange multipliers. It is a Code_Aster-internal preconditioner built on top of PETSc. A system with Lagrange multipliers has the following saddle point structure:

(6.1)#\[\begin{split}\left(\begin{array}{cc} K& {B}^{T}\\ B& 0\end{array}\right)\left(\begin{array}{c} u\\ \mathrm{\lambda}\end{array}\right)=\left(\begin{array}{c} f\\ g\end{array}\right)\end{split}\]

The augmented Lagrangian preconditioner (see [Mer15]) is of the form:

(6.2)#\[\begin{split}M=\left(\begin{array}{cc} K+\mathrm{\gamma}{B}^{T}B& 2{B}^{T}\\ 0& -\frac{1}{\mathrm{\gamma}}I\end{array}\right)\end{split}\]
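A small dense NumPy sketch may help fix ideas: it builds a toy saddle-point system of the form (6.1) and the preconditioner (6.2). The block sizes, the value of \(\gamma\) and the random constraint block are assumptions, and this is not the PETSc-based implementation used by Code_Aster.

```python
import numpy as np

n, m, gamma = 4, 2, 10.0
rng = np.random.default_rng(0)
K = np.diag(np.arange(1.0, n + 1))        # SPD "stiffness" block
B = rng.standard_normal((m, n))           # Lagrange-multiplier (constraint) block

A = np.block([[K, B.T],
              [B, np.zeros((m, m))]])     # saddle-point operator, as in (6.1)

M = np.block([[K + gamma * (B.T @ B), 2.0 * B.T],
              [np.zeros((m, n)), -(1.0 / gamma) * np.eye(m)]])   # as in (6.2)

# The point of (6.2) is to cluster the spectrum of the preconditioned
# operator, which speeds up Krylov convergence (see [Mer15]).
print(np.sort(np.linalg.eigvals(np.linalg.solve(M, A)).real))
```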

Finally, it is possible to combine the GMRES iterative solver with the LDLT_SP preconditioner (called the first-level preconditioner) and a « Limited Memory Preconditioner » (LMP), called the second-level preconditioner. This combination makes it possible to speed up the solution of a non-linear problem. Indeed, the solution of the non-linear problem is obtained by solving a succession of linearized problems. The second-level preconditioner is built from spectral information (Ritz pairs) gathered during previous linear solves (see [Mer15]).
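To make the idea concrete, here is a hedged NumPy sketch of a spectral limited-memory preconditioner assembled from Ritz pairs. The form \(H=I+V(\Theta^{-1}-I)V^{T}\) with orthonormal Ritz vectors is a classical spectral LMP variant and is shown only as an illustration, not as Code_Aster’s actual GMRES_LMP implementation.

```python
import numpy as np

def lmp_apply(r, ritz_vals, ritz_vecs):
    """Apply H = I + V (diag(1/theta) - I) V^T to a vector r.

    ritz_vecs: columns are (assumed orthonormal) Ritz vectors kept from a
    previous linear solve; ritz_vals: the associated Ritz values theta_i.
    """
    V = ritz_vecs
    coeffs = (1.0 / np.asarray(ritz_vals) - 1.0) * (V.T @ r)
    return r + V @ coeffs
```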

Notes:

  • The CG and CR methods are reserved for Aster models leading to SPD matrices (the most common case, notwithstanding the double Lagrange issues, cf. §5.1/§6.2). For non-symmetric problems, the other methods (GMRES, GCR) must be used.

  • All PETSc preconditioners offer a variety of parameters. For now, only the ILU fill level is exposed to the user.

  • In parallel mode, only JACOBI provides a preconditioner strictly equivalent to the sequential one. The others, primarily ILU and SOR, precondition using only the diagonal block local to each processor (see the sketch below).
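A hedged petsc4py sketch of this behaviour (a block-Jacobi preconditioner whose per-process blocks are factorized with ILU) is given below; the 1D Laplacian is merely a stand-in for an Aster operator and the option values are assumptions.

```python
from petsc4py import PETSc

n = 100
A = PETSc.Mat().createAIJ([n, n], nnz=3)        # distributed over the MPI ranks
rstart, rend = A.getOwnershipRange()
for i in range(rstart, rend):                   # assemble the local rows only
    A[i, i] = 2.0
    if i > 0:
        A[i, i - 1] = -1.0
    if i < n - 1:
        A[i, i + 1] = -1.0
A.assemble()

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType(PETSc.KSP.Type.GMRES)
ksp.getPC().setType(PETSc.PC.Type.BJACOBI)      # one diagonal block per process
PETSc.Options().setValue('sub_pc_type', 'ilu')  # ILU(0) inside each local block
ksp.setFromOptions()

x, b = A.createVecs()
b.set(1.0)
ksp.solve(b, x)
```

Run, for instance, with `mpirun -n 4 python sketch.py`; with a single process this degenerates to plain ILU, which is precisely why the sequential and parallel preconditioners differ.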

6.4. Symmetric character of the work operator#

The same remarks apply as for the native GCPC (§5.3), except that with PETSc one can use a Krylov solver that handles non-symmetric systems (GMRES, GCR). The parameter “SYME” therefore loses much of its interest.

6.5. Setup and display#

To activate this feature, the keyword METHODE must be set to “PETSC” in the factor keyword SOLVEUR [U4.50.01]. Overall, the same comments as in §5.4 apply. A command-file sketch is given after Table 6.5-1 below.

| Factor keyword | Keyword | Default value | References |
|---|---|---|---|
| SOLVEUR | METHODE = “PETSC” | “MULT_FRONT” | [§1] |
| | ALGORITHME = “CG” / “CR” / “GMRES” / “GCR” / “FGMRES” / “GMRES_LMP” | “FGMRES” | [§6.3] |
| | PRE_COND = “LDLT_INC” / “JACOBI” / “SOR” / “BOOMER” / “ML” / “LDLT_SP” / “SANS” | “LDLT_SP” | [§6.3] |
| | NIVE_REMPLISSAGE = 0, 1, 2… | 0 (if LDLT_INC) | [§4.2] |
| | REMPLISSAGE = \(\alpha\), estimated size of the preconditioner | 1.0 (if LDLT_INC) | [§5.1] |
| | RENUM = “RCMK” or “SANS” | “RCMK” | |
| | NMAX_ITER = \({i}_{\mathrm{max}}\), maximum number of iterations allowed. If \({i}_{\mathrm{max}}\le 0\), it is set automatically by PETSc (PETSC_DEFAULT, i.e. \(\mathit{maxits}={10}^{5}\)) | -1 | [Algorithme 5] for CG; equivalent for the other Krylov solvers. |
| | RESI_RELA. Convergence test: \(\parallel {r}^{i}\parallel <\mathrm{max}(\text{RESI\_RELA}\cdot \parallel f\parallel ,{10}^{-50})\). Divergence test: \(\parallel {r}^{i}\parallel >{10}^{5}\cdot \parallel f\parallel\) | \({10}^{-6}\) | [Algorithme 5] for CG; equivalent for the other Krylov solvers. |
| | SYME = “OUI” or “NON”. Symmetrized approximation of the work operator. Only legal in non-linear mode. Only relevant for “CG” and “CR”. | “NON” | [§6.4] |

Table 6.5-1. Summary of the configuration of the call to PETSc via Code_Aster.
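By way of illustration, the SOLVEUR factor keyword of Table 6.5-1 might be used as follows in a command file; the operator’s other operands (MODELE, CHAM_MATER, EXCIT, INCREMENT) and the concepts MO, CHMAT, CHARG, L_INST are placeholders taken from a hypothetical study, not part of this documentation.

```python
RESU = STAT_NON_LINE(
    MODELE=MO,
    CHAM_MATER=CHMAT,
    EXCIT=_F(CHARGE=CHARG),
    INCREMENT=_F(LIST_INST=L_INST),
    SOLVEUR=_F(METHODE='PETSC',
               ALGORITHME='FGMRES',   # default Krylov solver
               PRE_COND='LDLT_SP',    # default preconditioner
               RESI_RELA=1.e-6,
               NMAX_ITER=-1),         # let PETSc cap the iterations
)
```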

6.6. Tips for use#

The same usage advice as for the native GCPC applies (see §5.5), except that, unlike the native GCPC, some PETSc solvers are suited to non-symmetric systems.

In general, the use of the « simple » conjugate gradient (native GCPC or PETSc CG) is not very robust on Aster models. More sophisticated algorithms (GMRES, …) are preferable. On large computations (\(>5\times {10}^{5}\) degrees of freedom) that spend most of their time solving linear systems (operators STAT_NON_LINE, MECA_STATIQUE, …), it is worthwhile to activate the parallelism provided by PETSc.