3. Operands#

3.1. Operand METHODE#

This keyword makes it possible to choose the method for solving linear systems:

Direct solvers:

/” MULT_FRONT “

Multifrontal direct solver (without pivoting during factorization). This method is parallelized in shared memory (OpenMP) and can be run on several processors (via the Astk interface, menu Options/Launch options/ncpus).

/” LDLT “

Direct solver with blockwise Crout factorization (without pivoting). This solver pages its blocks to disk, so it can run with little memory.

/” MUMPS “ [DEFAUT]

Multifrontal direct solver with pivoting. This solver calls the MUMPS library developed by CERFACS/CNRS/ENS Lyon/INPT/INRIA/University of Bordeaux. It can handle models leading to symmetric matrices that are not positive definite (excluding boundary conditions), for example « mixed » elements with « Lagrange » degrees of freedom (incompressible elements, etc.). This method is mainly parallelized in distributed memory (MPI), but some of its steps can also benefit from shared-memory parallelism (OpenMP). It can be executed on several cores, themselves possibly distributed over several nodes. MUMPS comes with 64-bit external renumbering tools. This extends the scope of use of code_aster to very large finite element models (> \(10^7\) degrees of freedom). Be careful, however: solving the linear systems associated with such models requires large computing resources, and requires activating the various levels of parallelism or even one of the acceleration options.

Iterative solvers:

/” GCPC “

Preconditioned conjugate gradient iterative solver, with preconditioning by an incomplete factorization at level k or by a complete single-precision factorization.

/” PETSC “

Iterative solvers from the PETSc library (Argonne National Laboratory) with various preconditioners. This method is parallelized in distributed memory (MPI) and can be run on multiple processors. Warning: since the PETSc and MUMPS solvers are incompatible in sequential mode, PETSC is not available in sequential versions of code_aster. To use PETSC, you must therefore launch a parallel version of code_aster (even if it means using only one processor).

Tip:

The default method is MUMPS. It makes it possible, at the same time, to fully benefit from the time and memory savings provided by parallelism, and to solve numerically difficult problems (X-FEM, incompressibility, THM…).

To solve a large problem (> \(10^6\) degrees of freedom) more efficiently, one can use the « low-rank » compressions of MUMPS (cf. keywords ACCELERATION/LOW_RANK_SEUIL) or, if the functional scope allows it, the iterative solvers PETSC or GCPC.

For more details and advice on the use of linear solvers, you can consult the specific instructions for use [U2.08.03] and [U2.08.06].
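As an illustration, here is a minimal sketch of how this choice is made in a command file, via the SOLVEUR factor keyword of a resolution operator (the objects mo, chmat and char are hypothetical and assumed to be defined beforehand):

resu = MECA_STATIQUE(
    MODELE=mo,                    # hypothetical model
    CHAM_MATER=chmat,             # hypothetical material field
    EXCIT=_F(CHARGE=char),        # hypothetical load
    SOLVEUR=_F(METHODE='MUMPS'),  # explicit choice of the default solver
)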

3.2. Parameters common to several solvers#

◊ NPREC =/nprec

/8 [DEFAUT]

◊ STOP_SINGULIER =/”OUI” [DEFAUT]

/”NON”

These two parameters are common to all direct linear solvers (LDLT, MULT_FRONT, MUMPS).


They are used to control the course of the numerical factorization and the quality of the solution of the linear system. The numerical factorization of a matrix can fail in two ways: a problem in building the factorized matrix (structurally or numerically singular matrix), or numerical detection of a singularity (unstable solution of the linear system).

The keywords NPREC and STOP_SINGULIER make it possible to set the threshold for detecting singularities and the behavior to adopt in case of failure during factorization.


nprec is used to calibrate the process of detecting the singularity of the matrix of the system to be solved. With LDLT and MULT_FRONT, the absolute value of nprec is used; with MUMPS, nprec is used as-is because its sign matters: if \(\text{nprec}<0\), singularity detection is deactivated, otherwise it is activated.

In all cases, if the nprec value is left at zero, it is initialized to the default value (8).

By initializing this parameter to a fairly low value, e.g. 1 or 2 (respectively a high value, e.g. 20), singularity detection will be triggered very often (respectively rarely).

For LDLT, MULT_FRONT:

When, at the end of factorization, a diagonal term \(d'\) is seen to have become very small (compared to its value \(d\) before factorization), the matrix is (probably) almost singular. Writing \(n=\log\left|\frac{d}{d'}\right|\), this magnitude ratio indicates that on (at least) one equation, \(n\) significant digits were lost.

If \(n>\text{nprec}\), the matrix is considered singular. If the user has set STOP_SINGULIER='OUI', the code then stops with ERREUR_FATALE; with STOP_SINGULIER='NON', an alarm is issued and the calculation continues.

For MUMPS:

If, for at least one pivot, the infinite norm of its row (or column) is less than the threshold \({10}^{-\text{nprec}}\), then the matrix is considered singular.

Some aspects of the two types of singularity detection criteria are compared in the documentation [U2.08.03].

Notes:

  • Any significant loss of significant figures during factorization is an indicator of a poorly posed problem. Several causes are possible (non-exhaustive list): insufficient boundary conditions for blocking the structure, redundant linear relationships, very heterogeneous numerical data (too large penalty terms) …

  • For LDLT and MULT_FRONT, singularity detection is always performed because it is very inexpensive.

  • With MUMPS, another mechanism (RESI_RELA) also makes it possible to check the quality of the solution. The user is therefore left free to deactivate this criterion (by choosing a negative nprec).

  • By default, with the direct solver MUMPS, the quality of the solution is therefore checked twice: in linear analysis, by RESI_RELA and NPREC; in non-linear analysis, by the Newton criterion and NPREC. It is possible to disable these checks, but this is not recommended without good reason.
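For instance, here is a hedged sketch of a relaxed singularity detection (tolerating more lost digits and continuing with an alarm rather than stopping); the operator arguments other than SOLVEUR are hypothetical:

resu = MECA_STATIQUE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    SOLVEUR=_F(METHODE='MULT_FRONT',
               NPREC=11,               # tolerate losing up to 11 significant digits
               STOP_SINGULIER='NON'),  # issue an alarm instead of stopping
)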

◊ ELIM_LAGR = 'NON'/'OUI'/'LAGR2'


This keyword makes it possible to eliminate the Lagrange equations corresponding to dualized kinematic conditions.


By default (except for MUMPS), these equations are not eliminated ('NON').


The elimination technique used for ELIM_LAGR='OUI' is described in [:ref:`R3.03.05 <R3.03.05>`].

To use ELIM_LAGR='OUI' on several processors, it is imperative to have previously chosen DISTRIBUTION='CENTRALISE' in AFFE_MODELE. The elimination phase is then replicated on all processors (without providing any time savings). The resolution of the resulting linear system is then performed in parallel on all the processors (this time with a time gain, which depends on the linear solver chosen).

You cannot use ELIM_LAGR='OUI' with the direct linear solver 'MULT_FRONT'.


In the case of solver MUMPS, a third value is possible: 'LAGR2'. The objective is then to remove the second Lagrange equation, but to keep the first one.

The value 'LAGR2' is the default for the MUMPS solver.

This parameter can be temporarily disabled by the code to allow the calculation of the determinant of the matrix. This feature is mainly required by the operators CALC_MODES with OPTION among ["PROCHE", "AJUSTE", "SEPARE"] and INFO_MODE. The user is then notified of this automatic change of settings via a dedicated message (only with INFO=2).
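A minimal sketch of activating the elimination of Lagrange equations with MUMPS (the surrounding objects are hypothetical; remember the DISTRIBUTION='CENTRALISE' requirement above when using 'OUI' on several processors):

resu = MECA_STATIQUE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    SOLVEUR=_F(METHODE='MUMPS',
               ELIM_LAGR='OUI'),  # eliminate the dualized kinematic equations
)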


3.3. METHODE="MULT_FRONT"#

Scope of use:

Robust solver, however not recommended for models requiring pivoting (mixed finite elements, incompressible…), for generalized matrices with links (operators ASSE_ELEM/MATR_SSD…), or for large finite element models (> \(10^6\) degrees of freedom).

In these cases, use instead the MUMPS method (see §3.5) or, in non-linear analysis, PETSC + PRE_COND="LDLT_SP" (see §3.7).

◊ RENUM =

This argument allows you to renumber the nodes of the model to reduce the size of the factorized matrix (and therefore the CPU and memory consumption of the resolution):

/” METIS “[DEFAUT]

Numbering method based on nested dissection. It uses the external product of the same name, a global standard in the field. It is, in general, the most effective method (in CPU time and in memory).

/”MD”

(“Minimum Degree”) this numbering of the nodes minimizes the filling of the matrix during its factorization.

/” MDA “

(“Approximate Minimum Degree”) this numbering is in principle less optimal than “MD” in terms of filling but it is more economical to calculate. However, it is preferable to “MD” for large models (\(\ge 50000\) degrees of freedom).

Note:

  • In the case of generalized matrices [1]_ with link constraints, MULT_FRONT does not apply renumbering. This strategy is not harmful because these matrices are often almost full and small in size. The renumbering choice made by the user is therefore ignored. An informative message indicates this situation in the message file.
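A sketch of selecting a renumbering method for MULT_FRONT, e.g. choosing "MD" on a small model (the surrounding objects are hypothetical):

resu = MECA_STATIQUE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    SOLVEUR=_F(METHODE='MULT_FRONT',
               RENUM='MD'),  # minimum-degree numbering instead of the default METIS
)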

3.4. METHODE="LDLT"#

Scope of use:

Universal solver, but only for small finite element models (< \(10^5\) degrees of freedom). Beyond that, the method is very slow.

Not recommended for models requiring pivoting (mixed finite elements, incompressible, etc.).

Note:

  • The matrix is systematically renumbered using the Reverse Cuthill-McKee algorithm. The user cannot change this choice.

3.5. METHODE="MUMPS"#

Scope of use:

Powerful and robust universal solver. To be used on all types of problems, in particular large finite element models (> \(10^6\) degrees of freedom), or even very large ones (> \(10^7\) degrees of freedom); especially if they mix models or types of finite elements, or if the associated linear systems require pivoting (mixed or incompressible finite elements, X-FEM modeling, dualized boundary conditions, links between groups of elements, etc.).

In non-linear analysis, it is often preferable to use MUMPS as a preconditioner, via the option PETSC + PRE_COND="LDLT_SP"/"LDLT_DP" (see §3.7).

The MUMPS solver, currently developed by CERFACS/CNRS/ENS Lyon/INPT/INRIA/University of Bordeaux, is a parallelized multifrontal direct solver (in MPI and OpenMP). It is robust because it allows the rows and columns of the matrix to be permuted during numerical factorization.

Only the public version MUMPS 5.5.1 is supported in combination with code_aster.

The restricted-access consortium version MUMPS 5.5.5 is also supported, but only for EDF use. It provides access, ahead of the public version, to exploratory features.

All are now compiled with 64-bit external renumbering tools ((PAR)METIS, (PT)SCOTCH and PORD), which allows MUMPS, when they are activated (cf. keyword RENUM), to process very large finite element models (millions of cells).

Be careful, however: solving the linear systems associated with these models requires large computing resources, and requires activating the various levels of parallelism or even one of the acceleration options (cf. keyword ACCELERATION).

For more information you can consult §4 of [U2.08.03] and [U2.08.06].

3.5.1. Functional parameters#

◊ TYPE_RESOL =

This keyword allows you to choose the MUMPS resolution type:

/” NONSYM “

Should be chosen for non-symmetric matrices.

/” SYMGEN “

Should be chosen for symmetric matrices that are not positive definite. This is the most general case in code_aster, due to the dualization of boundary conditions by Lagrange multipliers.

/” SYMDEF “

Can be chosen for positive definite symmetric matrices. There is no pivoting. The algorithm is faster and less memory intensive.

/” AUTO “[DEFAUT]

The code will choose "NONSYM" for non-symmetric matrices and "SYMGEN" for symmetric matrices.

It is not forbidden to choose "NONSYM" for a symmetric matrix. This will probably double the calculation cost, but this option gives MUMPS more algorithmic possibilities (pivoting, scaling…). Conversely, in non-linear analysis it can be worthwhile to symmetrize a non-symmetric problem (cf. [U4.51.03], keyword MATR_RIGI_SYME). It is the same type of trade-off as the relaxation settings FILTRAGE_MATRICE and MIXER_PRECISION.

◊ RESI_RELA =/resi

/1.d-6 [DEFAUT] in linear analysis

/-1.d0 [DEFAUT] in non-linear and modal analysis

This setting is disabled by a negative value. It is available in operators that may need to control the quality of a complete linear system resolution (thus not in decomposed operators that only perform factorizations, e.g. FACTORISER and INFO_MODE).

By specifying a strictly positive value for this keyword (for example \({10}^{-6}\)), the user requests that the validity of the solution of each linear system solved by MUMPS be tested (relative to the exact solution).

This careful approach is recommended when the solution is not itself corrected by another algorithmic process (Newton algorithm, singularity detection…), in short in the linear operators THER_LINEAIRE and MECA_STATIQUE. In non-linear or modal calculations, the singularity detection criterion and the correction provided by the enclosing algorithm (Newton or modal solver) are sufficient safeguards. This control process can therefore be disabled (this is what is done by default via the value -1), especially since it has a significant time cost, which is even higher (in relative terms) in parallel and/or with memory management that unloads large objects onto disk (cf. keyword GESTION_MEMOIRE). This is an additional feature that the other direct solvers in code_aster do not offer.

If the relative error (based on the conditioning and the backward errors of the linear system being handled) on the solution estimated by MUMPS is greater than resi, the code stops with ERREUR_FATALE, specifying the nature of the problem and the offending values.

Activating this keyword also triggers an iterative refinement process (except if POSTTRAITEMENTS="SANS") whose objective is to improve the solution obtained. This post-processing has its own configuration (keyword POSTTRAITEMENTS). It is the solution resulting from this iterative improvement process that is tested by RESI_RELA.

Note:

    • In the particular case where POSTTRAITEMENTS="MINI" and RESI_RELA > 0, the quality estimate made by MUMPS is only partial (based only on backward errors), and therefore code_aster does not stop the calculation if this value is greater than resi. This combination of keywords is only useful with INFO=2, to assess the quality of the solutions.
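A sketch of a linear thermal resolution where the solution quality check and the iterative refinement are explicitly activated (the objects mo_th, chmat_th and char_th are hypothetical):

resu = THER_LINEAIRE(
    MODELE=mo_th, CHAM_MATER=chmat_th, EXCIT=_F(CHARGE=char_th),  # hypothetical
    SOLVEUR=_F(METHODE='MUMPS',
               RESI_RELA=1.e-6,          # stop if the estimated error exceeds 1e-6
               POSTTRAITEMENTS='AUTO'),  # iterative refinement + full diagnosis
)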

3.5.2. Relaxation settings#

◊ FILTRAGE_MATRICE =/filtma

/-1.d0 [DEFAUT]

◊ MIXER_PRECISION =/”OUI”

/”NON” [DEFAUT]

These parameters are reserved for quasi-static non-linear analysis. A negative value of filtma disables the feature.

These features allow you to « relax » the resolutions made with MUMPS in order to gain performance. The idea is simple: in non-linear analysis, the tangent matrix may tolerate some error. This will probably slow down the Newton process (more iterations), but if manipulating this approximate matrix is cheaper, we generally save time (fewer floating-point operations), memory consumption (RAM, or even disk if OOC is activated), and bandwidth (cache effects, I/O volume).

Thus, activating the FILTRAGE_MATRICE feature, with a value of filtma>0, leads code_aster to only provide MUMPS with matrix terms that verify

\(∣{\mathrm{K}}_{\mathit{ij}}∣>\mathit{filtma}\cdot \left(∣{\mathrm{K}}_{\mathit{ii}}∣+∣{\mathrm{K}}_{\mathit{jj}}∣\right)\)

The filter is therefore based on a relative threshold with respect to the absolute values of the corresponding diagonal terms.

By setting MIXER_PRECISION to 'OUI', the single-precision version of MUMPS is used while providing it with a double-precision Aster matrix (possibly filtered via FILTRAGE_MATRICE). This potentially yields gains in memory (often 50%) and in resolution time. However, this trick only really pays off if the tangent matrix is well conditioned (\(\eta \left(\mathrm{K}\right)<{10}^{+6}\)). Otherwise, the resolution of the linear system is too imprecise and the non-linear algorithm may no longer converge.
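A sketch of a quasi-static non-linear calculation with a relaxed MUMPS resolution; the threshold 1.e-8 and all surrounding objects are hypothetical and would need calibration on the actual study:

resu = STAT_NON_LINE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    COMPORTEMENT=_F(RELATION='VMIS_ISOT_TRAC'),          # hypothetical behavior
    INCREMENT=_F(LIST_INST=linst),                       # hypothetical time list
    SOLVEUR=_F(METHODE='MUMPS',
               FILTRAGE_MATRICE=1.e-8,   # drop terms below this relative threshold
               MIXER_PRECISION='OUI'),   # single-precision factorization
)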

Notes:

    • These relaxation parameters for solving linear systems via MUMPS are in line with those that already exist for non-linear solvers (keywords NEWTON/REAC_ITER, MATRICE…). These families of parameters are clearly complementary, and together they can save tens of percent in CPU and RAM consumption. Spending a little time calibrating them on a first data set can pay off when many similar calculations must be done later.

  • This idea is taken up in the LDLT_SP preconditioner of GCPC/PETSC.

But to save memory space without risking a loss of calculation precision, one can also consider: parallel computing [U2.08.03], and the parameters GESTION_MEMOIRE, MATR_DISTRIBUEE, or even RENUM.


3.5.3. Numeric parameters#

◊ PRETRAITEMENTS =

This keyword makes it possible to control the type of pre-processing to be performed on the system to improve its resolution (various strategies for balancing the terms of the matrix and for permuting its rows and columns):

/” SANS “

No pretreatment.

/” AUTO “[DEFAUT]

MUMPS chooses the best combination of parameters based on the situation.

◊ RENUM =

This keyword allows you to control the renumbering and the elimination order. The various tools offered are not necessarily all available; it depends on the MUMPS/code_aster installation. These tools fall into two categories: the « basic » tools dedicated to this use and provided with MUMPS ("AMD", "AMF", "QAMD", "PORD"), and the « richer », more « sophisticated » libraries that must be installed separately ("METIS"/"PARMETIS", "SCOTCH"/"PTSCOTCH").

/” AMD “

“Approximate Minimum Degree”

/” AMF “

“Approximate Minimum Fill”

/” QAMD “

Variant of "AMD" (automatic detection of almost dense rows)

/” PORD “

External renumbering tool distributed with MUMPS.

/” METIS “

External renumbering tool (also available with METHODE="MULT_FRONT"). It has been the reference renumbering tool since the end of the 1990s. It is widely recognized and used.

/” PARMETIS “

MPI-parallel version of the previous tool. Only available with the MPI version of code_aster, cf. [U2.08.03]. To be preferred for very large finite element models (millions of cells).

/” SCOTCH “

External renumbering tool that tends to replace the reference tool in the field (METIS).

/” PTSCOTCH “

MPI-parallel version of the previous tool. Only available with the MPI version of code_aster, cf. [U2.08.03]. To be preferred for very large finite element models (millions of cells).

/” AUTO “[DEFAUT]

MUMPS chooses the best combination of parameters based on the problem and the available packages. If the user specifies a particular renumbering tool and it is not available, the solver chooses the most appropriate one from the list of available tools and an ALARME is issued.

Note:

  • The choice of the renumbering tool has a great influence on the memory and time consumption of the linear solver. If you want to optimize/adjust the numerical parameters linked to the linear solver, this parameter should be one of the first to try. This step is also accelerated by the « block analysis » option activated with the values "FR+", "FR++", "LR+" and "LR++" of the keyword ACCELERATION.

The renumbering tools (PAR)METIS, (PT)SCOTCH and PORD are compiled in 64-bit mode in order to allow MUMPS to solve linear systems of several tens of millions of unknowns.

◊ POSTTRAITEMENTS =

This parameter is available in operators that may need to control the quality of a complete linear system resolution (thus not in decomposed operators that only perform factorizations, e.g. FACTORISER and INFO_MODE). It is less useful with non-linear operators (STAT_NON_LINE…), except in case of difficult resolutions.

This keyword allows you to control two things:

  • the iterative refinement procedure whose objective is to improve the quality of the solution (cf. keyword RESI_RELA),

  • the partial (value "MINI") or complete (other values) estimation by MUMPS of the quality of the solution. The calculation by code_aster will only be stopped if this estimate is complete (so not with "MINI") and only if its value is greater than the criterion specified in RESI_RELA.

/” SANS “

Deactivate.

/” FORCE “

This parameter is only used if RESI_RELA is enabled. MUMPS performs at least one iterative refinement iteration because its stopping criterion is initialized to a very low value; the number of iterations is limited to 10. It performs a complete diagnosis of the quality of the solution, via the fairly expensive estimation of the conditioning and backward errors of the sparse linear system being handled.

/” AUTO “[DEFAUT]

This parameter is only used if RESI_RELA is enabled. MUMPS generally performs one iterative refinement iteration. Its stopping criterion is close to machine precision and the number of iterations is limited to 4. It performs a complete diagnosis of the quality of the solution, via the fairly expensive estimation of the conditioning and backward errors of the sparse linear system being handled.

/” MINI “

MUMPS does exactly two iterations of iterative refinement. In practice, this is often sufficient to solve many linear systems effectively. This option is preferred when looking for performance, without losing precision, when activating low-rank compressions (see keyword ACCELERATION). By default, MUMPS then performs no diagnosis of the quality of the solution; only a partial diagnosis is performed, via the (inexpensive) calculation of the backward errors alone, if RESI_RELA > 0 (useful only for expertise).

Notes:

  • This process mainly consumes forward/backward substitutions, whose additional time cost remains reasonable in sequential In-Core. On the other hand, it can be significant in Out-Of-Core and in parallel, and is even of little use in non-linear analysis (Newton's algorithm provides its own correction).

  • To limit any counterproductive drift, with the values "AUTO" and "FORCE" the iterative refinement procedure is restricted internally by MUMPS: as soon as an iteration does not provide a gain of at least a factor of 5, the process stops. The number of iterations generally observed (by requesting INFO=2) is 1 or 2.

  • On some poorly conditioned test cases (for example perf001e), forcing this process made it possible to achieve the desired precision (via the values "FORCE" or "MINI").


3.5.4. Memory Management Settings#

To gain memory RAM without changing the linear solver (and modeling or computing platform), several strategies are available (and often combinable). They are listed here in order of importance.

  • With constant numerical precision and with savings in calculation time:

Parallelism (Astk menu Options/MPI), coupled or not with the activation of the keyword MATR_DISTRIBUEE in distributed parallelism mode (the default mode);

The activation of pretreatments (done by default, see PRETRAITEMENTS);

The redistribution of parallelism between the MPI level and the OpenMP level (see REDUCTION_MPI).

  • With constant numerical precision but with, potentially, losses in calculation time:

The explicit activation of MUMPS’s disk unloading capabilities (see GESTION_MEMOIRE),

The renumbering tool change (cf. keyword RENUM seen above).

You should also make sure to reserve a reasonable amount of additional space for pivoting: keyword PCENT_PIVOT. The default values of these parameters (GESTION_MEMOIRE="AUTO", RENUM="AUTO" and PCENT_PIVOT=20) often provide the best compromise and adapt this part of the settings to the situation.

  • By accepting a loss of precision within a non-linear process (for example STAT or DYNA_NON_LINE, CALC_MODES,…): all the relaxation parameters linked to the solver (FILTRAGE_MATRICE, MIXER_PRECISION seen previously) or even those related to the non-linear process itself (elastic tangent matrix, projection space in modal calculation…).

  • By no longer using MUMPS as a direct solver but as a preconditioner (see GCPC or PETSC with PRE_COND="LDLT_SP"/"LDLT_DP", cf. §3.6/3.7).

For more information you can consult the documentation [U2.08.03] (Instructions for using linear solvers) and [U2.08.06] (Instructions for using parallelism).

◊ PCENT_PIVOT =/pcent [I]

/20 [DEFAUT]

This keyword allows you to choose a percentage of memory that MUMPS will reserve at the start of the calculation for pivoting. Indeed, to factor an Aster matrix, it is often preferable to swap two of its rows and/or columns (cf. [R6.02.03] §2.3.1). However, the computer objects managing this pivoting are difficult to size a priori. This is why the tool asks the user for a prior and arbitrary estimate of this additional space.

The default value is 20%. It corresponds to a reasonable number of pivots, sufficient for most Aster calculations. For example, if MUMPS estimates the space required for factorization without pivoting at 100, it will ultimately allocate 120 to manage the calculation with pivoting. A value exceeding 50% should remain exceptional.

Subsequently, if the memory space required by the pivots proves to be greater, the space allocated will be insufficient and the code will require this criterion to be increased. Two scenarios will then occur depending on the type of memory management chosen (via the keyword GESTION_MEMOIRE):

  • If it is a specific mode, "IN_CORE" or "OUT_OF_CORE", the calculation stops with ERREUR_FATALE and proposes various palliative solutions.

  • If it is the automatic mode, "AUTO", the calculation will continue and try to factorize again with a doubled value of PCENT_PIVOT. Up to three such attempts will be made before, in case of repeated failures, stopping with ERREUR_FATALE and proposing various palliative solutions.
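A sketch of enlarging the pivoting reserve on a problem known to pivot a lot, here with a fixed memory mode (the surrounding objects are hypothetical):

resu = STAT_NON_LINE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    COMPORTEMENT=_F(RELATION='ELAS'), INCREMENT=_F(LIST_INST=linst),
    SOLVEUR=_F(METHODE='MUMPS',
               PCENT_PIVOT=50,               # reserve 50% extra space for pivoting
               GESTION_MEMOIRE='IN_CORE'),   # no automatic correction in this mode
)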

Notes:

  • For small problems (< 1000 degrees of freedom), MUMPS may underestimate its need for pre-allocated memory space. A large value of PCENT_PIVOT (> 100) is therefore not surprising.

  • Self-learning process: if, in the process described above, the value of PCENT_PIVOT has to be modified automatically, this new value is used until the end of the operator. We assume that the numerical difficulty will not decrease, and this pivoting value is therefore kept in order to avoid wasting more time on aborted factorization attempts.

  • In "AUTO" mode, in conjunction with the doubling of the additional pivoting space, the code may also automatically switch to MUMPS Out-Of-Core memory management (as if you had explicitly set GESTION_MEMOIRE="OUT_OF_CORE"). This happens after certain MUMPS return codes, or on the third (and last) attempt.

◊ GESTION_MEMOIRE =

This keyword allows you to choose the memory management mode of the external product MUMPS, or even as a last resort, of certain objects managed directly by code_aster.

The first two modes are « without a safety net »: no settings correction will be made « on the fly » in the event of a problem. Unlike the third, automatic mode, which will do everything (within certain limits!) to prevent the calculation from failing for lack of memory. In particular, depending on the memory it manages to free elsewhere, it will switch between the In-Core and Out-Of-Core modes of MUMPS, or adjust the space required for its pivoting (cf. keyword PCENT_PIVOT).

/” IN_CORE “

We prioritize the speed of the calculation as much as possible. This is the option that requires the most memory, because here we allow MUMPS to keep all the objects it needs in RAM.

/” OUT_OF_CORE “

We give priority to saving memory consumption as much as possible. This is the option that requires the least memory, because here we require MUMPS to unload its most cumbersome objects onto disk [2] _ .

/” AUTO “[DEFAUT]

The memory management to impose on MUMPS (In-Core or Out-Of-Core, see above) is decided automatically according to the memory available at that precise moment in the calculation. A memory pre-allocation mechanism is also activated so that MUMPS can make the most of the available memory (see the paragraphs below). This limits the problems of late allocations needed for pivoting. Two automatic correction mechanisms can also be triggered if necessary (increase of PCENT_PIVOT, deactivation of memory pre-allocation).

/” EVAL “

Help with the memory calibration of the calculation. A summary display (cf. figure 3.1) of the memory resources required by the code_aster + MUMPS calculation [3]_ is provided, depending on the type of management chosen: In-Core or Out-Of-Core. The calculation then stops with ERREUR_FATALE, so that the user can restart it with a memory setting chosen on the basis of these elements.

  • Linear system size: 500000

  • Minimum RAM memory consumed by code_aster: 200 MB

  • Mumps memory estimation with GESTION_MEMOIRE =” IN_CORE “:3500 MB

  • Mumps memory estimate with GESTION_MEMOIRE =” OUT_OF_CORE “: 500 MB

  • Estimated disk space for Mumps with GESTION_MEMOIRE =” OUT_OF_CORE “:2000 MB

===> For this calculation, you therefore need a minimum amount of RAM of: 3500 MB if GESTION_MEMOIRE="IN_CORE", or 500 MB if GESTION_MEMOIRE="OUT_OF_CORE". When in doubt, use GESTION_MEMOIRE="AUTO".


Figure 3.1: Display in the message file in "EVAL" mode.

Enabling Out-Of-Core helps reduce the RAM required per processor, but it can slow down the computation (cost of RAM/disk I/O). This additional cost can be significant when many forward/backward substitutions are carried out (e.g. a non-linear calculation with many time steps or Newton iterations, the search for many eigenmodes in modal calculation, etc.), because in these algorithmic steps as much time is spent manipulating the data (in RAM) as fetching it (on disk). This is all the more true since the disk is shared by several computing cores. For this reason, In-Core mode is preferred whenever possible (especially in parallel).

In “EVAL” mode, pre-estimates of memory consumption are much faster and less expensive in memory than the full calculation. They can make it possible to calibrate your study on a local machine or on an interactive node of a centralized machine before launching the study itself in batch mode.

Anecdotally, this mode can also be used to roughly test the data layout and/or the executable used. If everything works up to this evaluation, that’s a pretty good sign for the later calculation!

In "AUTO" mode, MUMPS is allowed to « spread out in RAM » to save time and to limit the late memory requirements associated with pivoting. MUMPS will thus be able to take all the memory it deems necessary, possibly even beyond its initial estimates. This allows it to meet possible future needs. To do this, code_aster provides it with an estimate of the available RAM.

These pre-allocations often make it possible to avoid having to adjust the PCENT_PIVOT parameter by guesswork. This saves a certain amount of time in the development of studies.

In "AUTO" mode, a code_aster + MUMPS calculation thus really takes advantage of all the available memory: the VmPeak is close to the amount set in Astk.

In the other two modes ("IN_CORE" and "OUT_OF_CORE"), on the other hand, MUMPS is not allowed to « spread out » in RAM. It does not pre-allocate any additional space beyond its initial memory estimate. This maintains a safe mode of operation in case of poor evaluation of the available memory [4]_.

Another mechanism also makes it possible to overcome this kind of inconvenience: if MUMPS seeks to allocate an object that is larger than the memory space actually available, a new factorization is attempted by no longer allowing it to pre-allocate additional space. This corrective strategy, similar to the one used for parameter PCENT_PIVOT, is only activated with the “AUTO” mode.
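A sketch of the two-step calibration workflow suggested here: a first run with "EVAL" to obtain the figure-3.1-type display, then a restart with the mode chosen from it (the surrounding objects are hypothetical):

# first run: evaluate the memory needs, then stop with ERREUR_FATALE
resu = MECA_STATIQUE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    SOLVEUR=_F(METHODE='MUMPS', GESTION_MEMOIRE='EVAL'),
)

# second run (separate study), with the mode chosen from the display
resu = MECA_STATIQUE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),
    SOLVEUR=_F(METHODE='MUMPS', GESTION_MEMOIRE='IN_CORE'),
)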

Notes:

  • In standard modes ("IN_CORE" and "OUT_OF_CORE"), code_aster unloads onto disk the largest objects [5]_ linked to the linear system, in order to leave MUMPS as much RAM as possible. If MUMPS subsequently does not have enough space to allocate its data, an alarm is issued and the calculation continues. Depending on the case, the calculation may complete without a hitch but at the cost of a large additional time cost (system swap), or stop with ERREUR_FATALE. The user is then offered various alternatives, including switching to "AUTO" mode.

  • In "AUTO" mode, if this free space is insufficient to allow MUMPS to run fully In-Core, all the remaining unloadable JEVEUX objects are unloaded onto disk. Then, depending on the memory space thus freed, the In-Core or Out-Of-Core mode of MUMPS is activated, or the calculation stops with ERREUR_FATALE (+ advice).

  • The massive unloading of JEVEUX objects mentioned above can, in exceptional cases, greatly slow down the execution. This can happen, for example, in case of congestion of disk accesses in parallel mode, or when a lot of data is unloaded (fields at many time steps, projected fields…). The solution can then be to occupy fewer processors per node, to consume less memory (increase the number of processors, Out-Of-Core mode, etc.), or to split the calculation into several steps.

  • In "EVAL" mode, the evaluation followed by the stop is performed at the first matrix factorization via MUMPS; for example, in the prediction phase for STAT_NON_LINE, or in the Sturm test for CALC_MODES. This is often enough to get a good order of magnitude of the memory requirements. To postpone this evaluation, you must split your calculation and use another linear solver (for example "MULT_FRONT") for the operators you want to leave unaffected.

◊ MATR_DISTRIBUEE =/'OUI'

/”NON” [DEFAUT]

This setting is currently limited to the operators MECA_STATIQUE, STAT_NON_LINE and DYNA_NON_LINE and is only active in distributed parallelism (AFFE_MODELE/PARTITION/PARALLELISME != "CENTRALISE").

By activating this keyword, the assembled matrix is stored in a distributed manner across all processors (unnecessary values belonging to other processors are no longer stored). This makes it possible to save memory in parallel without additional time costs or loss of precision (this keyword has no influence in sequential or centralized parallel).
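A sketch of activating distributed matrix storage in a distributed-parallel run (the surrounding objects are hypothetical):

resu = STAT_NON_LINE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    COMPORTEMENT=_F(RELATION='ELAS'), INCREMENT=_F(LIST_INST=linst),
    SOLVEUR=_F(METHODE='MUMPS',
               MATR_DISTRIBUEE='OUI'),  # store only the local part of the matrix
)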

3.5.5. Settings to reduce computation time#

To reduce computation time without changing linear solvers (and modeling or computing platforms), several strategies are available (and can be combined). They are listed here in order of importance.

  • With constant numerical accuracy:

  • MPI and/or OpenMP parallelism, or possibly an optimized combination of the two (cf. keyword REDUCTION_MPI);

  • the activation of various accelerations and/or compressions (cf. keyword ACCELERATION);

  • the renumbering tool change (keyword RENUM seen above; the default choice is often optimal);

  • the activation of pretreatments (done by default, keyword PRETRAITEMENTS);

  • the use of MUMPS not as a direct solver but as a preconditioner. This can be very efficient, especially in non-linear analysis, with the sharing of the same preconditioner over several Newton steps (cf. PETSC or GCPC with PRE_COND="LDLT_SP"/"LDLT_DP", §3.6/3.7).

  • By accepting a loss of precision, which is often harmless (the precision remains sufficient) or compensated for by an enclosing non-linear process (e.g. STAT_NON_LINE or DYNA_NON_LINE, CALC_MODES…):

for large problems (N at least > \(2\times 10^6\) degrees of freedom), the activation of low-rank compressions (cf. keywords ACCELERATION/LOW_RANK_SEUIL below);

  • the relaxation parameters linked to the solver (FILTRAGE_MATRICE, MIXER_PRECISION seen above) or even those related to the non-linear process itself (elastic tangent matrix, projection space in modal calculation… );

  • the reduction of post-treatments (keyword POSTTRAITEMENTS).

For more information you can consult the documentation [U2.08.03] (Instructions for using linear solvers) and [U2.08.06] (Instructions for using parallelism).

◊ ACCELERATION =/'AUTO' [DEFAUT]

/"FR" (available for all versions of MUMPS)

/"FR+" or "FR++" (only with « consortium » versions)

/"LR" (available for all versions of MUMPS)

/"LR+" or "LR++" (only with « consortium » versions)

Acceleration type: often effective values are "FR++" or "LR++".

If ACCELERATION="LR"/"LR+"/"LR++":

◊ LOW_RANK_SEUIL =/0.0 [DEFAUT]

/lr_threshold [R]

Compression threshold: often effective values lie between \(10^{-12}\) and \(10^{-9}\) (cf. explanations below).

These keywords define the type of acceleration implemented to reduce computation time. These accelerations can significantly reduce the calculation time of large studies, without restrictions on the scope of use and with little or no impact on the precision, robustness and overall behavior of the simulation. Their availability depends on the versions of MUMPS coupled with code_aster.

They are especially interesting for large problems (N at least > \(2\times 10^6\) degrees of freedom). The gains observed in some code_aster studies range from 20% to 80%. They increase with the size of the problem and its massive character, and they are complementary to those provided by parallelism and the renumbering tool.

The different values of the ACCELERATION parameter are:

  • The value "AUTO" (taken by default) chooses the best setting according to the available MUMPS version, the case being handled, and the calculation configuration. For the public version, "AUTO"="FR"; for the consortium version, "AUTO"="FR+" (except in modal calculation, where "AUTO"="FR").

  • The value “FR” allows the implementation of a standard MUMPS resolution (called “Full-Rank”), i.e. without “low-rank” compression and without « aggressive optimization » of the options internal to MUMPS.

  • The value "FR+" implements a resolution without « low-rank » compression but with « aggressive optimizations » of options internal to MUMPS. In particular, block analysis is activated, which significantly accelerates this often expensive first step of MUMPS.

  • The value "FR++" adds to "FR+" particular MUMPS options that accelerate the 2nd and 3rd steps of MUMPS (numerical factorization and forward/backward substitution): extended OpenMP perimeter, relaxed pivoting, accelerated symbolic factorization… However, in particular cases, these aggressive options can disrupt singularity detection (increase the value of NPREC) or the memory estimates (set GESTION_MEMOIRE="IN_CORE"/"OUT_OF_CORE").

  • The value "LR" activates a MUMPS resolution with « low-rank » compression. The compression rate is set by the parameter provided by the LOW_RANK_SEUIL keyword. Broadly speaking, the larger this number is (compared to the default precision, \(10^{-15}\)), for example \(10^{-12}\) or \(10^{-9}\), the stronger the compression and therefore the greater the potential performance gains. Beyond a certain compression threshold (hence « approximation »), it is advisable to also activate the iterative refinement procedure (for example via POSTTRAITEMENTS="MINI"). This makes it possible to recover, often at low additional cost, a solution error close to what would have been obtained with the standard calculation, "FR".

  • The "LR+"/"LR++" values activate the same option as before ("LR") but add the « aggressive optimizations » of "FR++". If the compression threshold is positive (LOW_RANK_SEUIL > 0), the standard BLR compression algorithm of MUMPS is used (called "UFSC"); otherwise, its exploratory version (called "UCFS") is used, which is potentially faster, because the compression occurs earlier in the MUMPS algorithm, but therefore less numerically stable. Depending on the options, the target is then the calculation time or, instead, the memory consumption.

  • The value "LR+" favors time savings.

  • The value "LR++" favors memory savings (additional compressions and use of mixed precision).

For more details on these two keywords you can consult [U2.08.03] §7.2.7.
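A sketch of activating low-rank compression on a large model, following the advice above to combine it with cheap iterative refinement; the threshold is a hypothetical starting value, as are the surrounding objects:

resu = STAT_NON_LINE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    COMPORTEMENT=_F(RELATION='ELAS'), INCREMENT=_F(LIST_INST=linst),
    SOLVEUR=_F(METHODE='MUMPS',
               ACCELERATION='LR',        # low-rank compression
               LOW_RANK_SEUIL=1.e-9,     # hypothetical compression threshold
               POSTTRAITEMENTS='MINI'),  # two cheap refinement iterations
)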

◊ REDUCTION_MPI =/redmpi [I] (only with "consortium" versions)

/0 [DEFAUT]

This keyword can only be used with an MPI-parallel version of the code. It is only active for a strictly positive integer value compatible with the number of MPI processes activated (see the example below). It makes it possible to reduce, only at the level of MUMPS, the first-level parallelism (MPI) and to redistribute it to the second level (OpenMP). On exiting MUMPS, the calculation reverts to the parallel configuration set when code_aster was launched.

Take, for example, a code_aster calculation distributed over 4 MPI processes [6]_, each using 9 OpenMP threads [7]_ (occupying, for example, 1 node of the gaia computing cluster, i.e. 4x9=36 cores):

1. the matrices and vectors are built in the elementary calculation and assembly procedures of code_aster, in MPI parallelism only; these therefore use only 4 cores out of the 36 allocated; the acceleration of these calculation steps is therefore at most x4;


2. the MUMPS linear system resolutions fully use all 36 cores thanks to the 2 levels of parallelism, MPI and OpenMP. The acceleration capacities of the two levels are roughly similar, with parallel efficiencies of the order of 30 to 50%. Hence a potential acceleration of this calculation step between x15 and x24.


Activating the REDUCTION_MPI keyword will allow:

* to use more MPI processes in order to better speed up part 1;


* while maintaining the acceleration level of part 2 and without impacting its overall memory consumption;

So, with REDUCTION_MPI=3 on the previous example, and this time distributing the code_aster calculation over not 4 but 4x3=12 MPI processes, each using 3 OpenMP threads (so as to stay on 1 node with 36 cores):

The matrix and vector construction parts can now be accelerated up to x12 by occupying 12 cores;

The MUMPS part remains just as efficient, still occupying 36 cores, this time by distributing the calculation over 12/3=4 MPI processes, each using 3x3=9 OpenMP threads. We recover the operating point described initially.

Since the MUMPS part is the most memory intensive, the initial 12 MPI x 3 OpenMP distribution would probably be impossible or difficult on a single node. Hence the benefit of reorganizing parallelism only at the level of MUMPS.

This redistribution of parallelism within MUMPS thus allows:

  • To globally accelerate calculations whose constitutive law evaluations are costly, without impacting the other important part of the calculation, the linear system resolution, especially in terms of memory consumption.

  • Optionally, to reduce the overall memory peak of a calculation (sum of the memory peaks of the MPI processes) by reducing the number of MPI processes in the MUMPS calculation step, which is often the one requiring the most memory.

Note:

  • This parameter is only available with the consortium version of MUMPS and cannot be activated with the modal calculation operator CALC_MODES, which has its own redistribution of parallelism (2 MPI levels and 1 OpenMP level).
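A sketch of the operating point discussed above (12 MPI processes launched, regrouped by a factor of 3 inside MUMPS); only the SOLVEUR keyword differs from a standard MUMPS setting, and the surrounding objects are hypothetical:

resu = STAT_NON_LINE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    COMPORTEMENT=_F(RELATION='ELAS'), INCREMENT=_F(LIST_INST=linst),
    SOLVEUR=_F(METHODE='MUMPS',
               REDUCTION_MPI=3),  # 12 MPI -> 12/3=4 MPI x 9 OpenMP inside MUMPS
)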

    ◊ NB_RHS =/-128, [DEFAUT] with the MODE_STATIQUE operator

    /1, [DEFAUT] otherwise


/nbrhs [I]

This keyword specifies the number of second members ("Right Hand Sides" in the lingo) that MUMPS can handle at the same time. This integer value is only used when the code_aster operator solves n linear systems simultaneously: A(x1 … xn) = (b1 … bn). These systems all share the same matrix A and are distinguished only by their different second members bi, and therefore, of course, by different solution vectors xi.

In order to save time, MUMPS is allowed to process them in blocks of nbrhs second members at a time.

With some operators (for example MODE_STATIQUE), these second members are very sparse (just a few non-zero terms). In order to save even more time and a bit of memory, this information is also passed on to MUMPS. For cross-validation and quality assurance purposes, for these specific operators, both simultaneous resolution in dense form (standard case, nbrhs > 0) and in sparse form (nbrhs < 0) are allowed.

In practice, an absolute value of nbrhs of a few tens, or even a few hundred (for MODE_STATIQUE), can produce significant accelerations (x10) of this MUMPS stage. The value min(abs(nbrhs), n) is transmitted to MUMPS.
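A sketch with MODE_STATIQUE, where blocks of 128 sparse second members are handled per MUMPS solve (the matrix and the content of MODE_STAT are hypothetical placeholders):

modes = MODE_STATIQUE(
    MATR_RIGI=rigidite,   # hypothetical assembled stiffness matrix
    MODE_STAT=_F(TOUT='OUI', AVEC_CMP=('DX', 'DY', 'DZ')),  # hypothetical request
    SOLVEUR=_F(METHODE='MUMPS',
               NB_RHS=-128),  # negative value: sparse handling, blocks of 128 RHS
)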

Note:

  • Note that these « simultaneous resolutions » are limited to the use of MUMPS as a direct solver (an « exact » factorization is required), hence to METHODE="MUMPS".

3.6. METHODE="GCPC"#

Scope of use:

Real symmetric problems, except those requiring singularity detection (modal calculation). In non-linear analysis, if the problem is real non-symmetric, this solver can be used provided the matrix has been symmetrized.

◊ PRE_COND =

This keyword allows you to choose the preconditioning method:

Using "LDLT_SP" or "LDLT_DP" is more expensive in CPU/RAM but more robust. These preconditioners are obtained by computing, with the MUMPS library, an approximate factorization of the matrix of the linear system. This factorization can be computed in single ("LDLT_SP") or double ("LDLT_DP") precision. Their interest lies especially in their reuse (cf. keyword REAC_PRECOND) over several resolutions when solving problems of the multiple right-hand-side type (for example STAT_NON_LINE, or thermo-mechanical chaining with MECA_STATIQUE).

◊ NIVE_REMPLISSAGE =/level

/0 [DEFAUT]

This setting only applies to preconditioner LDLT_INC. The preconditioning matrix (\(\text{P}\)) used to accelerate the convergence of the conjugate gradient is obtained by factorizing the initial matrix (\(\text{K}\)) more or less completely.

The higher the level, the closer the matrix \(\text{P}\) is to \({\text{K}}^{-1}\) and therefore the faster the conjugate gradient converges (in number of iterations). On the other hand, the larger the level the storage of \(\text{P}\) becomes larger (in memory and on disk) and the more expensive the iterations are in CPU.

It is recommended to use the default value (\(\mathit{niv}=0\)). If \(\mathit{niv}=0\) does not allow the conjugate gradient to converge, the values \(\mathit{niv}=\mathrm{1,}\mathrm{2,}3\dots\) will be tried successively.

Likewise, if the number of iterations of the conjugate gradient is considered too large, it is often beneficial to increase the level of filling.

◊ REAC_PRECOND =/reac

/30 [DEFAUT]

This setting only applies to the preconditioners LDLT_SP and LDLT_DP.

These preconditioners are much more expensive to build than the incomplete preconditioner, but they are more robust. To make them really competitive compared to traditional direct solvers (MULT_FRONT or double-precision MUMPS), the preconditioner must be kept for several successive resolutions. We thus exploit the « relative proximity » of these successive systems. To do this, the REAC_PRECOND parameter sets the number of times the same preconditioner is kept while the problem matrix has changed. As long as the iterative method GCPC takes fewer than reac iterations to converge, the preconditioner is kept unchanged; if it exceeds this number, the preconditioner is updated by recalculating a factorization.
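A sketch of a non-linear calculation with GCPC and a shared single-precision preconditioner (the surrounding objects and the tolerance are hypothetical):

resu = STAT_NON_LINE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    COMPORTEMENT=_F(RELATION='ELAS'), INCREMENT=_F(LIST_INST=linst),
    SOLVEUR=_F(METHODE='GCPC',
               PRE_COND='LDLT_SP',   # single-precision MUMPS preconditioner
               REAC_PRECOND=30,      # rebuild it only beyond 30 GCPC iterations
               RESI_RELA=1.e-8),     # hypothetical convergence tolerance
)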

◊ PCENT_PIVOT =/pcent

/20 [DEFAUT]

◊ GESTION_MEMOIRE =/”AUTO”, [DEFAUT]

/”IN_CORE”

◊ LOW_RANK_SEUIL =/0.0, [DEFAUT]

/lr_threshold [R]

◊ RENUM =/"SANS", [DEFAUT]

/"PARMETIS"

/”METIS”,

These settings only apply to preconditioners LDLT_SP and LDLT_DP. The “LOW_RANK_SEUIL” keyword allows you to adjust the precision of the block-low-rank compression used by MUMPS. The larger this parameter is, the more severe the compression and the less faithful the factorization is.

These are the same keywords as for solver MUMPS, cf. § 3.5.4.

◊ NMAX_ITER =/niter

/0 [DEFAUT]

Maximum number of iterations of the iterative resolution algorithm. If \(\mathit{niter}=0\) then the maximum number of iterations is calculated as follows:

\(\mathrm{niter}=\mathrm{nequ}/2\) where \(\mathrm{nequ}\) is the number of equations in the system.

◊ RESI_RELA =/resi

/1.E-6 [DEFAUT]

Algorithm convergence criterion. This is a relative criterion for the non-preconditioned residue:

\(\frac{\parallel {\text{r}}_{m}\parallel }{\parallel \text{f}\parallel }\le \mathrm{resi}\)

\({\text{r}}_{m}\) is the non-preconditioned residual at iteration \(m\)

\(\text{f}\) is the second member and the \(\parallel \parallel\) norm is the usual Euclidean norm.

Note:

  • When using the preconditioner LDLT_INC, the matrix is systematically renumbered using the Reverse Cuthill-McKee algorithm. The user cannot change this choice.

3.7. METHODE="PETSC"#

Scope of use:

All types of real problems, except those requiring singularity detection (modal calculation). To be used primarily on non-linear problems (with PRE_COND="LDLT_SP") or on « frontier » problems (> \(5\times 10^7\) degrees of freedom).

Warning: since the PETSC and MUMPS solvers are incompatible in sequential mode, only MUMPS is available in sequential versions of code_aster. To use PETSC, you must therefore always launch a parallel version of code_aster (even if it means using only one processor).

◊ ALGORITHME =

Name of the iterative solvers (Krylov type) in PETSc accessible from code_aster:

/” FGMRES “[DEFAUT]

“Flexible Generalised Minimal RESidual”

/” GMRES “

“Generalised Minimal RESidual”

/” GMRES_LMP “

“Generalised Minimal RESidual”, with Limited Memory Preconditioner

/”CG”

Conjugate Gradient

/”CR”

Conjugated residue

/” GCR “

“Generalised Conjugate Residual”

The default method offers the best trade-off between robustness and calculation cost.

The "CG" and "CR" methods are to be reserved for models leading to symmetric matrices. In the non-symmetric case, in addition to "GMRES", you can use "GCR", which handles arbitrary matrices.

The "GMRES_LMP" algorithm is based on the iterative solver "GMRES". It must be used with the first-level preconditioner "LDLT_SP". It is useful in non-linear calculations: the second-level preconditioner improves the preconditioning of a system using spectral information from previous linear resolutions (see [R6.01.02]).

◊ PRE_COND =

Name of PETSc preconditioners accessible from code_aster:

Only LDLT_SP, LDLT_DP, ML, BOOMER and JACOBI behave exactly the same sequentially and in parallel. The other two, "LDLT_INC" and "SOR", modify the calculation slightly by using diagonal blocks local to the processors. They are easier to implement but less effective. "SANS" makes it possible to apply no preconditioner, which is only useful when developing a calculation.

The algebraic multigrid preconditioners ML, BOOMER and GAMG, as well as the domain decomposition preconditioner HPDDM, have a very restricted scope of application:

  • calculation without Lagrange multipliers (use AFFE_CHAR_CINE to impose loads),

  • with a constant number of degrees of freedom per node.

However, they are very effective in parallel. Note that the ML preconditioner relies on a random draw in its algorithm, which may result in slightly different convergence between two identical resolutions. These multigrid preconditioners are to be used with the CG or GCR solvers (GCR may work when CG fails).

It should be noted that the domain decomposition preconditioner HPDDM can only be used in the context of massive parallelism (by using the PARTITIONNEUR keyword from the LIRE_MAILLAGE command).

The "BLOC_LAGR" preconditioner is a block preconditioner designed for calculations with Lagrange multipliers. It should be used with METHODE="PETSC".

The "LDLT_SP" and "LDLT_DP" preconditioners are a priori the most robust, but also the most expensive to build. However, unlike other preconditioners, they are not rebuilt at each linear resolution, which ultimately makes them competitive (cf. keyword REAC_PRECOND). These preconditioners are to be used with the default FGMRES solver (or CG or GCR if the matrix is symmetric). It is preferable to avoid GMRES (or its symmetric equivalent CR) combined with a single-precision preconditioner (risk of obtaining an inaccurate solution, because the solver's stopping criterion is disturbed by the mixing of arithmetics).

In a non-linear calculation, we can finally use “LDLT_SP” with the “GMRES_LMP” algorithm: a second-level preconditioner (LMP) then improves the preconditioning of a linear resolution based on spectral information from previous linear resolutions (see [R6.01.02]).

The “FIELDSPLIT” preconditioner provides a very general framework for defining block preconditioners. This is a very powerful feature that requires a very good knowledge of iterative methods and the PETSc library. It is mainly used in research actions.

The “UTILISATEUR” preconditioner allows the use of a user-defined preconditioner in a function, based on the PETSc Python interface. This is an advanced feature that requires a very good knowledge of iterative methods and the PETSc library. It is mainly used in research actions.
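A sketch of the combination recommended above for non-linear problems, FGMRES with a single-precision factorization as preconditioner (the surrounding objects and the tolerance are hypothetical):

resu = STAT_NON_LINE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    COMPORTEMENT=_F(RELATION='ELAS'), INCREMENT=_F(LIST_INST=linst),
    SOLVEUR=_F(METHODE='PETSC',
               ALGORITHME='FGMRES',  # default Krylov solver
               PRE_COND='LDLT_SP',   # single-precision MUMPS preconditioner
               RESI_RELA=1.e-8),     # hypothetical convergence tolerance
)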

◊ NIVE_REMPLISSAGE =/level

/0 [DEFAUT]

This setting only applies to preconditioner LDLT_INC.

Fill-in level of the Incomplete Cholesky preconditioner.

◊ REMPLISSAGE =/:math:`\alpha`

/1.0 [DEFAUT]

This setting only applies to preconditioner LDLT_INC.

Increase factor in the size of the preconditioner as a function of the filling level (see §3.6). The reference is set to \(\mathit{niv}=0\), for which \(\alpha =1\). This parameter is only taken into account if PRE_COND="LDLT_INC". This figure allows PETSc to roughly predict the size needed to store the preconditioner. If this estimate is too low, PETSc enlarges the objects on the fly, but this operation is more expensive.

◊ REAC_PRECOND =/reac

/30 [DEFAUT]

This setting only applies to preconditioners LDLT_SP and LDLT_DP.

These preconditioners are much more expensive than the incomplete preconditioner but they are more robust. To make them really competitive compared to traditional direct solvers (MULT_FRONT or MUMPS double precision), you have to keep them for several successive resolutions.

The REAC_PRECOND parameter determines the number of times the same preconditioner is kept while the problem matrix has changed. As long as the iterative solver (ALGORITHME) called by PETSC takes fewer than reac iterations to converge, the preconditioner is kept unchanged; if it exceeds this number, the preconditioner is updated by performing a new single-precision factorization.

◊ PCENT_PIVOT =/pcent

/20 [DEFAUT]

◊ GESTION_MEMOIRE =/”AUTO”, [DEFAUT]

/”IN_CORE”

◊ LOW_RANK_SEUIL =/1.E-08, [DEFAUT]

/lr_threshold [R]

◊ RENUM =/"SANS", [DEFAUT]

/"PARMETIS"

/”METIS”,

These settings only apply to preconditioners LDLT_SP and LDLT_DP. The “LOW_RANK_SEUIL” keyword allows you to adjust the precision of the block-low-rank compression used by MUMPS. The larger this parameter is, the more severe the compression and the less faithful the factorization is.

These are the same keywords as for solver MUMPS, cf. § 3.5.4.

◊ MATR_DISTRIBUEE =/'OUI'

/”NON” [DEFAUT]

This setting is currently limited to the operators MECA_STATIQUE, STAT_NON_LINE and DYNA_NON_LINE and is only active in distributed parallelism (AFFE_MODELE/PARTITION/PARALLELISME != "CENTRALISE").

By activating this keyword, the assembled matrix is stored in a distributed manner across all processors (unnecessary values belonging to other processors are no longer stored). This makes it possible to save memory in parallel without additional time costs or loss of precision (this keyword has no influence in sequential or centralized parallel). Note that it is recommended to use SOUS_DOMAINE partitioning in AFFE_MODELE in order to avoid potential conditioning problems associated with renumbering the assembled matrix.

◊ NMAX_ITER =/niter

/0 [DEFAUT]

Maximum number of iterations of the iterative resolution algorithm. If \(\mathit{niter}\le 0\), the maximum number of iterations is set by default to 10000. This value is reduced to 100 if preconditioner LDLT_SP is used.

◊ RESI_RELA =/resi

/1.E-6 [DEFAUT]

Algorithm convergence criterion. This is a relative criterion for the preconditioned residue:

\(\frac{\parallel {M}^{-1}\mathrm{.}{\text{r}}_{m}\parallel }{\parallel {M}^{-1}\mathrm{.}\text{f}\parallel }\le \mathit{resi}\)

\({M}^{-1}\) is the preconditioner

\({\text{r}}_{m}\) is the residue at iteration \(m\)

\(\text{f}\) is the second member and the \(\parallel \parallel\) norm is the usual Euclidean norm.

Notes:

  1. The convergence criterion for PETSC is evaluated differently than for GCPC;

  2. When the preconditioner is of poor quality (for example because of poor conditioning of the problem), the convergence criterion used by PETSC can give rise to poor solutions; this is why, in linear calculation operators, an additional check on the non-preconditioned residual is performed. The tolerance chosen for this additional criterion is \(\sqrt{\mathit{resi}}\);

  3. The "GCR" algorithm is based on right preconditioning and therefore verifies the convergence criterion in a non-preconditioned norm.

  4. When using the preconditioner LDLT_INC, the matrix is systematically renumbered using the Reverse Cuthill-McKee algorithm. The user cannot change this choice.

  5. The LDLT_INC preconditioner is incompatible with MATR_DISTRIBUEE="OUI".

3.7.1. Keywords specific to the preconditioner FIELDSPLIT#

◊ NOM_CMP

List of field components that appear in the modeling, for example NOM_CMP =( “DX”, “DY”, “DZ”, “PRES”) for mixed displacement and pressure modeling for an incompressible elasticity problem.

◊ PARTITION_CMP

How the components of the fields are grouped. For the example NOM_CMP=("DX", "DY", "DZ", "PRES"), using PARTITION_CMP=(3, 1) means that the three displacement components and the pressure component will be treated separately.

◊ OPTION_PETSC

Character string that defines the block preconditioner. This string must be on a single line (without line breaks).
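A sketch of a FIELDSPLIT setting for the incompressible example above; the OPTION_PETSC string is a hypothetical illustration using standard PETSc runtime options, and the surrounding objects are hypothetical:

resu = STAT_NON_LINE(
    MODELE=mo, CHAM_MATER=chmat, EXCIT=_F(CHARGE=char),  # hypothetical objects
    COMPORTEMENT=_F(RELATION='ELAS'), INCREMENT=_F(LIST_INST=linst),
    SOLVEUR=_F(METHODE='PETSC',
               PRE_COND='FIELDSPLIT',
               NOM_CMP=('DX', 'DY', 'DZ', 'PRES'),
               PARTITION_CMP=(3, 1),  # 3 displacement components / 1 pressure
               OPTION_PETSC='-pc_fieldsplit_type multiplicative'),  # hypothetical
)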

3.7.2. Keywords specific to the preconditioner UTILISATEUR#

◊ KSP_UTIL

A user-written function that describes the resolution algorithm. It is written in Python and uses the PETSc interface for this language, called petsc4py. This is an advanced use of PETSc, to be reserved exclusively for connoisseurs for research purposes.

3.7.3. Tag OPTION_PETSC#

In addition to precisely defining the FIELDSPLIT preconditioner (as mentioned above), the OPTION_PETSC keyword also allows additional options not supported by the interface described here to be passed to PETSc. This library is in fact designed to be highly configurable at runtime, which is what this keyword makes it possible to achieve. This is an advanced use of PETSc, to be reserved exclusively for connoisseurs for research purposes.


◊ OPTION_PETSC

Character string that allows you to pass a set of additional options to PETSc

It is important to note that the additional options passed to PETSc are retained from its initialization to its closure, usually when the FIN () command is executed. So if a user passes additional options in STAT_NON_LINE, they will be used for all subsequent resolutions even with other resolution operators like DYNA_VIBRA.

To get around this, it is possible to stop PETSc which has the effect of resetting all the options. This is done using the commands:

from code_aster import LinearAlgebra   # import the LinearAlgebra module

LinearAlgebra.petscInitialize()        # call to the method that initializes PETSc;
                                       # it can take new options as arguments

LinearAlgebra.petscFinalize()          # call to the method that closes PETSc and
                                       # resets the options