5. Code_Aster's "native" GCPC#
Two types of conjugate gradient can be used:
The code's "historical" conjugate gradient [Greg] (keyword SOLVEUR/METHODE='GCPC'), fully integrated into its source code and adhering to its memory management software package.
The one from PETSc [PET] (keywords SOLVEUR/METHODE='PETSC' + ALGORITHME='CG'), called as an external library.
This chapter discusses the implementation difficulties of the first one in Code_Aster. The next chapter deals more specifically with PETSc.
5.1. Specific difficulties#
5.1.1. Taking into account boundary conditions#
In Code_Aster, there are two ways to take boundary conditions into account, and this step is performed during the actual construction of the stiffness matrix:
By double dualization [Pel01] (operators AFFE_CHAR_ACOU/MECA/THER), using specific degrees of freedom (ddls), called Lagrange multipliers, which enclose the groups of unknowns concerned and make it possible to enforce all types of linear boundary conditions (generalized Dirichlet)
\(Tu=0\) (5.1-1)
with \(T\) a real matrix of size \(p\times n\). This technique expands the stiffness matrix into a new, so-called "dualized", matrix, which becomes the new working matrix.
\(\tilde{\mathrm{K}}\mathrm{=}(\begin{array}{ccc}\mathrm{K}& \beta {\mathrm{T}}^{\mathrm{T}}& \beta {\mathrm{T}}^{\mathrm{T}}\\ \beta \mathrm{T}& \mathrm{-}\alpha \mathrm{Id}& \alpha \mathrm{Id}\\ \beta \mathrm{T}& \alpha \mathrm{Id}& \mathrm{-}\alpha \mathrm{Id}\end{array})\) (5.1-2)
where \(\alpha\) and \(\beta\) are two strictly positive numbers.
By simple elimination of the unknowns (operator AFFE_CHAR_CINE), by substituting and zeroing the \(p\) rows and columns concerned in the stiffness matrix. This is only valid for blocked ddls, so a linear relation between ddls cannot be taken into account (a small numerical sketch of both variants follows this list). The stiffness matrix is then written
\(\tilde{K}=(\begin{array}{cc}\bar{K}& 0\\ 0& \mathrm{Id}\end{array})\) (5.1-3)
by noting \(\bar{K}\) its unchanged part.
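To make these two constructions concrete, here is a minimal NumPy sketch; the matrices \(K\) and \(T\) and the values of \(\alpha\) and \(\beta\) are purely illustrative toys, not Code_Aster internals.

```python
import numpy as np

n, p = 4, 1
K = np.diag([4.0, 3.0, 2.0, 1.0])       # toy SPD stiffness matrix (n x n)
T = np.array([[1.0, -1.0, 0.0, 0.0]])   # one linear relation T u = 0 (p x n)
alpha, beta = 1.0, 1.0                  # two strictly positive parameters
Id = np.eye(p)

# Double dualization (5.1-2): the size grows from n to n + 2p and the
# dualized operator is symmetric but no longer positive definite.
K_dual = np.block([
    [K,         beta * T.T,  beta * T.T],
    [beta * T, -alpha * Id,  alpha * Id],
    [beta * T,  alpha * Id, -alpha * Id],
])
print(np.linalg.eigvalsh(K_dual))       # negative eigenvalues appear

# Simple elimination (5.1-3): the rows/columns of blocked ddls are zeroed
# and replaced by identity; here ddl 0 is blocked (u_0 = 0).
K_elim = K.copy()
K_elim[0, :] = 0.0
K_elim[:, 0] = 0.0
K_elim[0, 0] = 1.0                      # the operator stays SPD
print(np.linalg.eigvalsh(K_elim))       # all eigenvalues remain > 0
```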
Each of the two approaches has its advantages and disadvantages: the former offers genericity and modularity, but increases the size of the problem, degrades its conditioning and destroys its positive definiteness; the latter reduces the problem size, but is limited to certain types of boundary conditions and is, computationally, more difficult to implement.
Note:
Other approaches were possible: simple dualization, consideration of boundary conditions in the variational formulation, projected conjugate gradient…
5.1.2. Consequence on the GCPC#
With AFFE_CHAR_CINE, the work operator \(\tilde{\mathrm{K}}\) remains SPD, so all the theory mentioned in the preceding paragraphs applies. On the other hand, with AFFE_CHAR_ACOU/MECA/THER (the most frequent case in practice), this is no longer true: the operator is merely symmetric and loses its positive definite character. The consequences are then of three types:
The global convergence of GCPC (cf. (3.2-9)) is no longer guaranteed,
When convergence does occur, it is slowed down (cf. (3.2-11)) by the degraded conditioning, \(\eta (\mathrm{K})\mathrm{\ll }\eta (\tilde{\mathrm{K}})\),
Preconditioning can no longer be carried out via an \(\mathrm{IC}(p)\), but rather by an \(\mathrm{ILU}(p)\) (cf. §4.2). The \({\mathrm{LDL}}^{T}\) factorization must still be possible without having to permute rows or columns!
Fortunately, an adequate ordering of the Lagrange multipliers relative to the groups of ddls they concern (they must enclose these degrees of freedom, cf. [Pel01] §4) makes it possible to obtain this incomplete factorization without breakdown (whew!).
Notes:
In an EDF report, B. Nitrosso [Ni92] shows that diagonal preconditioning is not possible (cf. §4.1), because it leads to the cancellation of the dot product in the denominator of step (7) of Algorithm 5: the calculation of the optimal conjugation parameter. So, unlike Code_Saturne, Code_Aster cannot offer this unreliable option.
The fact remains that, with dualized boundary conditions, the conditioning of the work operator is degraded and the convergence of the GCPC therefore slows down. Other commercial codes, such as ANSYS [PLM], have made the same observation.
5.1.3. Memory footprint#
Taking into account the elements of §4.2, we have empirically observed an effective memory complexity for the GCPC with the Incomplete Cholesky preconditioner ('LDLT_INC') of \(\alpha\) times the size of \(K\), with:
\(\alpha =2.5\) for \(\mathrm{ILU}(0)\) (default level in Code_Aster),
\(\alpha =4.5\) for \(\mathrm{ILU}(1)\),
\(\alpha =8.5\) for \(\mathrm{ILU}(2)\).
With the preconditioner using a single-precision factorization from MUMPS ('LDLT_SP'), memory consumption is higher (at least \(\alpha >10\)). But in non-linear analyses this factorization can be kept for several system resolutions (keyword REAC_PRECOND), so the additional computational cost of building it is ultimately lower. A back-of-the-envelope illustration of these \(\alpha\) factors is sketched below.
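For illustration only, the figures above can be turned into a tiny memory estimator; the helper name and the GB bookkeeping are hypothetical, and the \(\alpha\) values are the empirical ones quoted in this section.

```python
# Purely illustrative helper: estimated RAM footprint (alpha times the size
# of K) for the GCPC preconditioners, using the empirical factors above.
def gcpc_memory_gb(matrix_size_gb: float, pre_cond: str, level: int = 0) -> float:
    alphas = {("LDLT_INC", 0): 2.5,   # ILU(0), default level
              ("LDLT_INC", 1): 4.5,   # ILU(1)
              ("LDLT_INC", 2): 8.5}   # ILU(2)
    if pre_cond == "LDLT_SP":
        return 10.0 * matrix_size_gb  # documented lower bound: alpha > 10
    return alphas[(pre_cond, level)] * matrix_size_gb

# A 2 GB stiffness matrix with ILU(1) needs about 9 GB of RAM, against
# alpha = 30 to 100 for a complete factorization (see the note below).
print(gcpc_memory_gb(2.0, "LDLT_INC", level=1))   # 9.0
```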
Notes:
These figures should be compared to the size of the complete factorization, which is typically of the order of \(\alpha \mathrm{=}30\) to \(100\) (depending in particular on the renumbering used in pre-processing). The disproportion between these figures clearly illustrates the main competitive advantage of the GCPC over direct solvers: the required memory footprint.
However, unlike the direct solvers of Code_Aster, the GCPC cannot benefit from memory paging. The whole matrix must be held in RAM, because it is used a large number of times via the matrix-vector product, whereas direct methods handle much larger objects that are automatically and partially unloaded to disk (JEVEUX software package and/or the Out-Of-Core capabilities of MUMPS).
5.1.4. Parallelism#
Unlike the code's other solvers (MULT_FRONT, MUMPS, PETSC), and because parallelizing its preconditioner is problematic, the native GCPC can only be used sequentially (see §3.3).
For example, it does not benefit from the distribution of tasks and data provided by the MPI parallelism implemented around the MUMPS direct solver. Thus, a memory overhead of \(\alpha \mathrm{=}30\) with sequential In-Core MUMPS can very quickly drop to the level of a GCPC preconditioned by ILU(0) as soon as a dozen processors are used and/or its Out-Of-Core capabilities are activated.
5.2. Scope of use#
All operators generating real symmetric systems, except those requiring singularity detection (modal calculation, stability and buckling, etc.). In non-linear analyses, if the problem is real non-symmetric, this solver can still be used provided the symmetrization is forced (see §5.3).
List of Code_Aster operators that can use GCPC:
CALC_FORC_AJOU,
CALC_MATR_AJOU,
CALC_PRECONT \(\to\) Symmetrized approximation possible (see §5.3),
DYNA_LINE_TRAN,
DYNA_NON_LINE \(\to\) Symmetrized approximation possible,
MACRO_MATR_AJOU,
ASSEMBLAGE,
MECA_STATIQUE,
NUME_DDL/FACTORISER/RESOUDRE,
MODE_STATIQUE,
STAT_NON_LINE \(\to\) Symmetrized approximation possible,
THER_LINEAIRE,
THER_NON_LINE \(\to\) Symmetrized approximation possible,
THER_NON_LINE_MO \(\to\) Symmetrized approximation possible.
5.3. Symmetric character of the work operator#
If the matrix is not symmetric, two scenarios arise: either the linear solver is embedded in a non-linear process (the operators flagged in the list of §5.2), or the problem is linear (the other operators).
In the first case, we transform the initial problem \(\text{Ku}\mathrm{=}f\) into a new symmetrized problem \(\frac{1}{2}(K+{K}^{T})u=f\). We thus assume that the enclosing non-linear solver (Newton algorithm) will compensate for the approximation of the initial operator by \(\tilde{K}:=\frac{1}{2}(K+{K}^{T})\). Besides, this is not the only approximation in this non-linear process: the choice of the tangent matrix, for example, is another one.
This symmetrized approximation does not harm the robustness and coherence of the whole, and it avoids a more expensive non-symmetric resolution. It is carried out by activating the keyword SYME='OUI' (default 'NON') of the keyword factor SOLVEUR, and it is legal for all non-linear solvers in the code.
When the problem is purely linear, no compensation can be expected from any enclosing process and this approximation becomes illegitimate. The use of the native GCPC is then illegal, and the non-symmetric system must be solved via the other linear solvers. The nature of the work operator is detected automatically; there is no need to declare it via the SYME keyword.
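As a minimal illustration of this symmetrization (toy matrix, not Code_Aster data):

```python
import numpy as np

K = np.array([[4.0, 1.0],
              [0.5, 3.0]])             # toy, slightly non-symmetric tangent matrix
K_tilde = 0.5 * (K + K.T)              # symmetrized work operator of SYME='OUI'
print(np.allclose(K_tilde, K_tilde.T)) # True: the GCPC can now be applied
```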
5.4. Setup and display#
To activate this feature, you need to set the keyword METHODE='GCPC' in the keyword factor SOLVEUR [U4.50.01]. In addition, two types of preconditioners are available (keyword PRE_COND), as the sketch after this list illustrates:
PRE_COND='LDLT_INC': a classic Incomplete Cholesky (cf. §4.2) whose fill-in level is controlled by the keyword NIVE_REMPLISSAGE=k (k=0 does not mean "no preconditioning" but an ILU(0)-type preconditioning).
PRE_COND='LDLT_SP': a single-precision factorization performed by the external direct solver MUMPS [R6.02.03]. This solution is, a priori, more expensive than the first in terms of memory/time, but it can be shared between several linear system resolutions in a non-linear process (keyword REAC_PRECOND). In return, the robustness and quality of this preconditioner can greatly accelerate the convergence of the GCPC.
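Below is a schematic sketch, in Code_Aster's Python command syntax, of how these keywords fit together; the surrounding MECA_STATIQUE call and the model/material/load objects are hypothetical placeholders, and only the SOLVEUR keywords are documented here.

```python
# Schematic excerpt from a Code_Aster command file (.comm). The surrounding
# MECA_STATIQUE call and the 'model'/'mater'/'load' objects are hypothetical
# placeholders; only the SOLVEUR keywords come from this section.
resu = MECA_STATIQUE(
    MODELE=model,                # hypothetical: model defined earlier in the file
    CHAM_MATER=mater,            # hypothetical: material field
    EXCIT=_F(CHARGE=load),       # hypothetical: dualized load (AFFE_CHAR_MECA)
    SOLVEUR=_F(
        METHODE='GCPC',
        PRE_COND='LDLT_INC',     # or 'LDLT_SP' (single-precision MUMPS)
        NIVE_REMPLISSAGE=1,      # fill level k of ILU(k); 0 by default
        RENUM='RCMK',            # default renumbering; 'SANS' to deactivate
        NMAX_ITER=0,             # 0: automatically set to N/2
        RESI_RELA=1.e-6,         # relative residual stopping test
    ),
)
```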
Notes:
To minimize the profile size of the working matrix and of its preconditioner (with the Incomplete Cholesky), a renumbering algorithm is activated by default: 'RCMK' for Reverse Cuthill-McKee (cf. [LT98] §5.4.2). It can however be deactivated.
The main "lever" for adjusting the "computation time/memory occupancy" compromise of the GCPC remains the preconditioner and its various numerical parameters.
Unlike direct methods, it was necessary to set a maximum number of iterations to discriminate between converged and non-converged iterative calculations. This (configurable) threshold is arbitrarily set to half the number of ddls of the work problem \(\tilde{K}\). If, at the end of i = NMAX_ITER steps, the relative residual \({\delta }^{i}:=\frac{\parallel {r}^{i}\parallel }{\parallel f\parallel }\) is not below RESI_RELA, the calculation stops with an ERREUR_FATALE (a minimal sketch of this stopping logic follows these notes).
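To fix ideas, here is a minimal textbook preconditioned conjugate gradient in NumPy reproducing this stopping logic; it is a generic sketch of the algorithm referred to above as Algorithm 5, not the code's actual implementation, and the Jacobi preconditioner in the usage lines is chosen only for brevity (cf. the note above on why Code_Aster rules it out for its dualized systems).

```python
import numpy as np

def gcpc(K, f, precond, resi_rela=1.e-6, nmax_iter=0):
    n = len(f)
    if nmax_iter == 0:
        nmax_iter = n // 2                  # default threshold: half the ddls
    u = np.zeros(n)
    r = f - K @ u                           # initial residual r^0
    z = precond(r)                          # preconditioned residual
    p = z.copy()
    f_norm = np.linalg.norm(f)
    for i in range(nmax_iter):
        Kp = K @ p                          # the repeated matrix-vector product
        alpha = (r @ z) / (p @ Kp)          # optimal descent parameter
        u += alpha * p
        r_new = r - alpha * Kp
        if np.linalg.norm(r_new) / f_norm < resi_rela:
            return u                        # converged: delta^i < RESI_RELA
        z_new = precond(r_new)
        beta = (r_new @ z_new) / (r @ z)    # conjugation parameter
        p = z_new + beta * p
        r, z = r_new, z_new
    raise RuntimeError("no convergence within NMAX_ITER (ERREUR_FATALE)")

# Usage with a toy SPD matrix and diagonal (Jacobi) preconditioning.
K = np.diag([4.0, 3.0, 2.0]) + 0.1 * np.ones((3, 3))
f = np.ones(3)
u = gcpc(K, f, precond=lambda r: r / np.diag(K), nmax_iter=100)
print(np.allclose(K @ u, f, atol=1.e-5))    # True
```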
| Factor Keyword | Keyword | Default Value | References |
|---|---|---|---|
| SOLVEUR | METHODE='GCPC' | 'MULT_FRONT' | [§1] |
| | PRE_COND='LDLT_INC'/'LDLT_SP' | 'LDLT_INC' | [§4.2] [§5.4] |
| | NIVE_REMPLISSAGE=0,1,2… | 0 | [§4.2] |
| | REAC_PRECOND | 5 | [§5.4] |
| | RENUM='RCMK' or 'SANS' | 'RCMK' | [§5.4] |
| | NMAX_ITER=\({i}_{\mathrm{max}}\), maximum number of iterations allowed; if \({i}_{\mathrm{max}}=0\), automatically set to \(N/2\) | 0 | [Algorithme 5] |
| | RESI_RELA, relative residual stopping test: at iteration \(i\), \(\frac{\parallel {r}^{i}\parallel }{\parallel f\parallel }<\text{RESI\_RELA}\) | \(10^{-6}\) | [Algorithme 5 and §3.3] |
| | SYME='OUI' or 'NON', symmetrized approximation of the work operator by \(\tilde{K}:=\frac{1}{2}(K+{K}^{T})\); only valid in non-linear analyses | 'NON' | [§5.2/3] |
Table 5.4-1. Summary of the configuration of the native GCPC.
To be complete about the configuration and the display in the message file (.mess), let’s mention:
With INFO=2, the evolution of the norms of the absolute residual \(\parallel {r}^{i}\parallel\) and of the relative residual \(\frac{\parallel {r}^{i}\parallel }{\parallel f\parallel }\) is traced, together with the corresponding iteration number, i, as soon as it decreases by at least 10% (not configurable).
At convergence, that is to say as soon as \(\frac{\parallel {r}^{i}\parallel }{\parallel f\parallel }<\text{RESI\_RELA}\), the value of the norm of the initial absolute residual \(\parallel {r}^{0}\parallel\) is also recalled.
5.5. Tips for use#
Overall, the best compromise between robustness, memory footprint and CPU cost seems to go [Anf03] [Boi03] to the two direct solvers: the multifrontal and MUMPS. Especially when treating "multiple right-hand side" problems (e.g. operators DYNA/STAT/THER_NON_LINE without updating of the tangent matrix).
However, for problems that are rather well conditioned (thermal analysis, simple geometry with relatively homogeneous mesh and material characteristics…) or very memory-hungry (several million ddls), the GCPC may prove to be an interesting alternative, especially with the LDLT_SP preconditioner, which is more expensive but also more reliable.
Thanks to its natural modularity and the simplicity of its components (matrix-vector product and dot product), the GCPC remains much easier to maintain and to make evolve than direct solvers. It is the "all-purpose" solver: easy to implement (at least in its basic version) and very instructive. It is often plugged into more general processes (nested solvers, interface solvers for domain decomposition…) or adapted, case by case, to particular matrix structures.