5. Optimizing performance#
In quasi-statics, it is not uncommon to need more than 10 iterations to reach convergence in the sense of the equilibrium residual. In implicit dynamics, this value of 10 iterations is, in general, a good starting value for the « ITER_GLOB_MAXI » parameter of the « CONVERGENCE » factor keyword in « DYNA_NON_LINE ». If convergence cannot be reached within 10 to 20 iterations, it is preferable to reduce the time step rather than to increase the maximum number of iterations allowed.
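As an illustration, such a setting might look like the following minimal sketch, where the concept names (`model`, `chmat`, `load`, `l_inst`) are hypothetical placeholders and the exact keywords should be checked against the « DYNA_NON_LINE » documentation. On a convergence failure, the time step is subdivided automatically rather than raising « ITER_GLOB_MAXI »:

```python
# Minimal sketch (hypothetical concept names: model, chmat, load, l_inst).
# On a Newton failure, subdivide the current time step instead of
# allowing more equilibrium iterations.
times = DEFI_LIST_INST(
    DEFI_LIST=_F(LIST_INST=l_inst),
    ECHEC=_F(EVENEMENT='ERREUR',    # convergence failure ...
             ACTION='DECOUPE',      # ... triggers a subdivision of the step
             SUBD_METHODE='MANUEL',
             SUBD_PAS=4,            # cut the step into 4
             SUBD_NIVEAU=3),        # at most 3 nested subdivisions
)

resu = DYNA_NON_LINE(
    MODELE=model,
    CHAM_MATER=chmat,
    EXCIT=_F(CHARGE=load),
    COMPORTEMENT=_F(RELATION='VMIS_ISOT_LINE'),
    INCREMENT=_F(LIST_INST=times),
    SCHEMA_TEMPS=_F(SCHEMA='NEWMARK', FORMULATION='DEPLACEMENT'),
    CONVERGENCE=_F(ITER_GLOB_MAXI=10,   # ~10 iterations as a starting point
                   RESI_GLOB_RELA=1.e-6),
)
```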
In explicit dynamics, there are no equilibrium iterations: the cost of each time step is therefore constant, regardless of the level of non-linearity (except, possibly, for the local verification of the behavior law).
The routine use of explicit integration schemes therefore seems very attractive, since the CPU time remains under control. This optimism must however be tempered by bearing in mind that we deprive ourselves of the safeguard provided by a precise equilibrium check, and that the quality of the explicit solution obtained must therefore be analyzed more carefully. The explicit algorithm will not diverge (as long as the Courant stability condition is satisfied), but the solution obtained is not guaranteed by an equilibrium convergence criterion. In particular, a parametric study on the time step is essential, because the shape of the solution can vary greatly over the course of the transient.
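As an illustration, a minimal sketch of an explicit calculation, assuming the central-difference scheme « DIFF_CENT » and the « STOP_CFL » keyword (to be checked against the documentation of the version used); the concept names are the hypothetical placeholders from the previous sketch:

```python
# Minimal sketch: explicit central-difference scheme. There are no
# equilibrium iterations, so the time step must satisfy the Courant
# (CFL) condition and a parametric study on the step remains essential.
resu = DYNA_NON_LINE(
    MODELE=model,
    CHAM_MATER=chmat,
    EXCIT=_F(CHARGE=load),
    COMPORTEMENT=_F(RELATION='ELAS'),
    INCREMENT=_F(LIST_INST=times),
    SCHEMA_TEMPS=_F(SCHEMA='DIFF_CENT',
                    FORMULATION='ACCELERATION',
                    STOP_CFL='OUI'),  # stop if the Courant condition is violated
)
```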
In addition, code_aster is not a code optimized for explicit calculations, and its explicit performance is modest compared to specialized codes such as Europlexus [bib2].
5.1. Model reduction and dynamic condensation by substructuring#
One way to reduce the calculation time is to project the problem onto a reduced basis (modal basis or Ritz basis). The number of degrees of freedom is then greatly reduced, and this type of approach is available in « DYNA_NON_LINE » (the resolution also benefits from being explicit, because the Courant stability condition on a reduced modal basis is not very penalizing, since such a basis has a low cutoff frequency). In summary, this type of approach is particularly suited to problems where the non-linearities remain moderate and localized. As soon as the non-linearities become strong, the question arises of updating the initial reduced basis, which loses its coherence with the current solution; the additional cost of recalculating the basis and of the reprojections then reduces the advantage of this type of method.

The « PROJ_MODAL » factor keyword allows this type of calculation with an explicit integration scheme, the degrees of freedom then being generalized coordinates, as sketched below. An example of such a calculation is provided by the test case « SDNV107a » [V5.03.107]. Another example of a nonlinear dynamic calculation on a hydromechanical problem of a soil column is provided by the test case « WDNP101a » [V5.03.107].
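As an illustration, a minimal sketch of a reduced-basis calculation, assuming the « PROJ_MODAL » keywords « MODE_MECA » and « NB_MODE » (hypothetical concept names `asse_k` and `asse_m` for the assembled stiffness and mass matrices; the « SDNV107a » test case remains the reference for the exact syntax):

```python
# Minimal sketch: compute a truncated modal basis, then project the
# transient problem on it (generalized degrees of freedom).
modes = CALC_MODES(
    MATR_RIGI=asse_k,
    MATR_MASSE=asse_m,
    OPTION='PLUS_PETITE',            # lowest-frequency modes
    CALC_FREQ=_F(NMAX_FREQ=20),
)

resu = DYNA_NON_LINE(
    MODELE=model,
    CHAM_MATER=chmat,
    EXCIT=_F(CHARGE=load),
    COMPORTEMENT=_F(RELATION='ELAS'),
    INCREMENT=_F(LIST_INST=times),
    SCHEMA_TEMPS=_F(SCHEMA='DIFF_CENT', FORMULATION='ACCELERATION'),
    PROJ_MODAL=_F(MODE_MECA=modes,   # reduced basis with low cutoff frequency
                  NB_MODE=20),
)
```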
It is also recommended to use this reduction/condensation approach in a « mixed » way for « low frequency » problems where the model consists of several sub-domains: the resolution of the sub-domains with linear behavior is condensed on the degrees of freedom of their interfaces with the sub-domains with non-linear behavior, which are themselves treated with finite elements. Each linear sub-domain is condensed into a dynamic macro element « MACR_ELEM_DYNA » [U4.65.01] defined on its interface, as sketched below. Using the operator [U4.63.33] after solving the global « mixed » problem, the fields in each sub-domain can be reconstructed.
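As an illustration, building the dynamic macro element of one linear sub-domain could look like the following sketch, assuming a Craig-Bampton type basis (concept names `nume` and `modes` are hypothetical; [U4.65.01] and the « FORMA12g » test case give the exact syntax):

```python
# Minimal sketch: condensation of a linear sub-domain on the interface
# DOFs shared with the non-linear sub-domains.
interf = DEFI_INTERF_DYNA(
    NUME_DDL=nume,
    INTERFACE=_F(NOM='IFACE',
                 TYPE='CRAIGB',            # Craig-Bampton interface
                 GROUP_NO='N_INTERF'),     # interface nodes
)

base = DEFI_BASE_MODALE(
    CLASSIQUE=_F(INTERF_DYNA=interf,
                 MODE_MECA=modes,
                 NMAX_MODE=10),            # truncation of the dynamic modes
)

macro = MACR_ELEM_DYNA(BASE_MODALE=base)   # see [U4.65.01]
```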
An example of a nonlinear dynamic calculation by reduction/condensation, with an implicit integration scheme, is provided by the test case « FORMA12g » [V2.08.012].
The methodology is presented in detail in [U2.07.04].
5.2. Parallelism#
Parallelism provides a gain that is all the more significant as the system to be solved includes a large number of DOFs and the number of time steps is low. Indeed, the resolution algorithm imposes significant communications at each time step (this is not a multi-domain strategy in time and space). In practice, on problems with a few hundred thousand DOFs, the speed-up remains good up to 16 to 32 processors. In most cases, the iterative solver « PETSC » provides a gain compared to the « MUMPS » direct solver, as sketched below. In addition to parallelism at the solver level, the parallelism of the elementary computations (integration of the behavior at the Gauss points) provides a significant additional gain when the behavior integration is expensive. In all cases, it is very useful to consult the CPU time measurements per step in the .mess message file: they make it possible to identify the most expensive steps of the transient resolution and to adapt the parallelism accordingly. The [U2.08.06] documentation provides all the details on the use of parallelism.
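As an illustration, switching « DYNA_NON_LINE » to the « PETSC » iterative solver might look like the following minimal sketch (same hypothetical concept names as above; the available algorithms and preconditioners are listed in the solver documentation):

```python
# Minimal sketch: PETSc iterative solver, often faster than the MUMPS
# direct solver on large systems when run in parallel (MPI).
resu = DYNA_NON_LINE(
    MODELE=model,
    CHAM_MATER=chmat,
    EXCIT=_F(CHARGE=load),
    COMPORTEMENT=_F(RELATION='VMIS_ISOT_LINE'),
    INCREMENT=_F(LIST_INST=times),
    SCHEMA_TEMPS=_F(SCHEMA='NEWMARK', FORMULATION='DEPLACEMENT'),
    CONVERGENCE=_F(ITER_GLOB_MAXI=10),
    SOLVEUR=_F(METHODE='PETSC',
               ALGORITHME='FGMRES',
               PRECOND='LDLT_SP',    # single-precision factorization preconditioner
               RESI_RELA=1.e-6),
)
```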