5. Memory/time management and optimization#

Before talking about optimization, the first question is how to choose the execution parameters in Astk (total memory and time) so that a first run of the calculation finishes correctly. Once the calculation has run through once, it can then be optimized using the information obtained from that first run.

5.1. Executing a calculation for the first time#

The first thing to know is how much memory is available. This of course depends on the execution machine: centralized server, local machine… On a local machine it is always possible to request the maximum available memory, whereas on the centralized server it may be wise not to ask for too much, otherwise you will have to wait for the corresponding job class to become free.

Estimating the required memory: for a quasi-static analysis, the total memory is generally a function of the size of the linear system(s) to be solved. For iso-parametric finite elements, the overall size of these systems is fairly easy to estimate: the number of degrees of freedom is approximately (number of nodes in the mesh) × (dimension of the problem, i.e. 2 or 3). For example, a 3D mesh of 350,000 nodes gives roughly 1.05 million degrees of freedom.

This relationship does not take into account the DOFs associated with dualized boundary conditions (Lagrange multipliers), but their share is generally small. If the finite elements are not iso-parametric (mixed formulations, for example), this estimate is more difficult.

In any case, the size of the linear systems is displayed at each resolution in the .mess file, in a message giving the total number of equations (« the matrix is of size n equations »). It is important to know this number, although by itself it is not always a completely reliable indicator: the memory and time required to solve a linear system also depend on many other parameters (bandwidth, number of non-zero terms in the matrix, renumbering…), but that would take us too far. So if you do not know how to estimate the size of the linear system, it is worth first launching the study in degraded mode, i.e. reduced to: reading the mesh, assigning the finite elements and the material, and the resolution (MECA_STATIQUE or STAT_NON_LINE) with the direct solver MUMPS. Start this simplified study in interactive mode with interactive monitoring, then stop the calculation once the information on the size of the system has been displayed.
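As an illustration, here is a minimal sketch of such a degraded-mode command file; the group name FACE_ENC and the material values are hypothetical placeholders to adapt to your study.

    # Degraded-mode study: just enough to make the size of the linear
    # system appear in the .mess file.
    DEBUT()

    mesh = LIRE_MAILLAGE(FORMAT='MED')

    model = AFFE_MODELE(MAILLAGE=mesh,
                        AFFE=_F(TOUT='OUI',
                                PHENOMENE='MECANIQUE',
                                MODELISATION='3D'))

    steel = DEFI_MATERIAU(ELAS=_F(E=210.0e9, NU=0.3))

    chmat = AFFE_MATERIAU(MAILLAGE=mesh,
                          AFFE=_F(TOUT='OUI', MATER=steel))

    clamp = AFFE_CHAR_MECA(MODELE=model,
                           DDL_IMPO=_F(GROUP_MA='FACE_ENC',
                                       DX=0.0, DY=0.0, DZ=0.0))

    # The resolution prints the number of equations in the .mess file;
    # stop the interactive job once this information is displayed.
    resu = MECA_STATIQUE(MODELE=model,
                         CHAM_MATER=chmat,
                         EXCIT=_F(CHARGE=clamp),
                         SOLVEUR=_F(METHODE='MUMPS'))

    FIN()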

Ideally [1], about 30 GB are needed to run a calculation of one million degrees of freedom (the case of the cube meshed in HEXA8, one face of which is clamped by dualized conditions), the minimum memory [2] being 9 GB. The table below gives orders of magnitude of the « ideal » (or optimal) and minimum memory for solving a system of a given size (with the direct solver MUMPS).

Once the required memory has been estimated, the complete calculation can be started with the direct solver MUMPS. After this first calculation, it is very interesting to look at:

  • In the .mess file, the total memory (JEVEUX + Python + external libraries) used by the calculation; this information is displayed at the end of the file: see MAXIMUM DE MEMOIRE UTILISEE PAR LE PROCESSUS (the sketch after this list shows one way to extract it);

  • In the .resu file, the resolution time (MECA_STATIQUE/STAT_NON_LINE) compared to the total calculation time.
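Since the message file can be long, a few lines of plain Python are enough to pull this line out; the file name study.mess is an assumption to adapt.

    # Hypothetical helper: extract the peak-memory line from a .mess file.
    with open('study.mess', encoding='utf-8', errors='replace') as mess:
        for line in mess:
            if 'MAXIMUM DE MEMOIRE UTILISEE PAR LE PROCESSUS' in line:
                print(line.strip())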

5.2. Optimizing a calculation#

Optimizing a calculation in terms of time and memory is a worthwhile operation but a somewhat delicate one, because there is no miracle recipe. Moreover, it is not always easy to separate memory optimization from calculation-time optimization, so the following paragraphs give optimization options « in the broad sense » and refer to the appropriate documents.

Parallelism

Using parallelism is certainly the easiest and safest way to reduce computation time (and sometimes also the memory required). Parallelism consists of running operations on several processors at the same time. Setting up parallelism is only worthwhile if the time spent in STAT_NON_LINE represents a dominant share of the total calculation time. To do this, simply choose a suitable solver (for example MUMPS or PETSC), an MPI version of *Code_Aster*, and specify the number of processors in Astk (menu Options/mpi_nbcpu). On the centralized server aster4, it is recommended to start with mpi_nbcpu=2, observe the time savings, then try again with mpi_nbcpu=4, then possibly mpi_nbcpu=8. It is possible to choose mpi_nbcpu=16, but you then risk waiting a long time for all these processors to become available, hence before the calculation even starts (everything depends on the machine load!).

A lot of advice on parallelism is given in the document u2.08.06 Parallelism user manual. More detailed information on the time spent in each part of the resolution can also be displayed and used to better configure the calculation (time, memory, number of processors…); for this purpose, refer to the documentation u1.03.03 Calculation performance indicators (time/memory).

The linear solver

Depending on the size of the matrix, using an iterative solver (like PETSC) instead of a direct solver can save time. As an order of magnitude, an iterative solver is considered to become faster than the (default) direct solver from roughly 200,000 equations in 3D, but this depends on the type of structure [3]. The downside of an iterative solver is that its robustness is not guaranteed. Note that using PETSC requires an MPI version of *Code_Aster* (even if you only use one processor). Beyond a certain problem size (a few million DOFs), using a direct solver becomes impossible and an iterative solver is essential.

If the resolution with an iterative solver fails, the levers of action are the choice of another preconditioner (single-precision direct preconditioner LDLT_SP by default, incomplete preconditioner LDLT_INC,…) and the choice of another iterative algorithm (GMRES by default, CG, CR,…).
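If the iterative route is chosen, the change is localized in the SOLVEUR keyword. Here is a hedged sketch reusing the objects of the earlier degraded-mode example; ALGORITHME, PRE_COND and RESI_RELA are written out explicitly even though the values shown are the defaults quoted above (check them against your version).

    # Sketch: the same resolution with the iterative solver PETSC
    # (requires an MPI build of Code_Aster, even on one process).
    resu = MECA_STATIQUE(MODELE=model,
                         CHAM_MATER=chmat,
                         EXCIT=_F(CHARGE=clamp),
                         SOLVEUR=_F(METHODE='PETSC',
                                    ALGORITHME='GMRES',  # default iterative algorithm
                                    PRE_COND='LDLT_SP',  # default preconditioner
                                    RESI_RELA=1.0e-6))   # convergence criterion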

A lot of advice on choosing a solver is given in u2.08.03.

Global database management

If the memory limit is exceeded, you can play on the archiving of the results. Archiving (factor keyword ARCHIVAGE of STAT_NON_LINE) makes it possible to significantly reduce the size of the databases by selecting the instants that are saved. By default, everything is archived (even the instants resulting from sub-division of the time step). This can be useful in the phase of setting up a study, but it is too costly in terms of memory. The observation and monitoring of certain quantities can often meet the need instead of systematic archiving (see §3.4 of u2.04.01 Tips for using STAT_NON_LINE). Once the study is in place, archiving can be restricted to a limited number of time steps (the post-processing instants, for example), as in the sketch below.
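A minimal sketch of such restricted archiving, assuming model, chmat and load are defined as in the previous examples; the time and archiving lists are hypothetical.

    # Sketch: compute on a fine time-step list but archive only the
    # two post-processing instants of l_arch.
    times = DEFI_LIST_REEL(DEBUT=0.0,
                           INTERVALLE=_F(JUSQU_A=1.0, NOMBRE=10))

    l_arch = DEFI_LIST_REEL(VALE=(0.5, 1.0))   # instants to keep

    resu = STAT_NON_LINE(MODELE=model,
                         CHAM_MATER=chmat,
                         EXCIT=_F(CHARGE=load),
                         COMPORTEMENT=_F(RELATION='ELAS'),   # placeholder behaviour
                         INCREMENT=_F(LIST_INST=times),
                         ARCHIVAGE=_F(LIST_INST=l_arch))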

To reduce the congestion of the result data structure, it is also possible to select the fields to be archived a posteriori, either by indicating which fields to keep or by indicating which fields to exclude (command EXTR_RESU). The extraction can also be restricted to a part of the mesh or of the model. The DETRUIRE command can then be used to delete a concept. After these extraction or deletion operations, you must specify RETASSAGE='OUI' in the FIN command in order to actually recover the disk space associated with the global database.
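A hedged sketch of this clean-up sequence; the exact EXTR_RESU and DETRUIRE keyword spellings vary between versions, so check them against your version's documentation.

    # Sketch: keep only the displacement field, delete the full result,
    # then compact the global database on exit.
    light = EXTR_RESU(RESULTAT=resu,
                      ARCHIVAGE=_F(NOM_CHAM='DEPL'))

    DETRUIRE(CONCEPT=_F(NOM=resu))   # legacy spelling; newer versions: DETRUIRE(NOM=resu)

    FIN(RETASSAGE='OUI')             # actually reclaims the disk space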

To reduce the size of the result files in MED format (.rmed), use IMPR_RESU/RESTREINT.
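For example (the group name ZONE_POST is a hypothetical placeholder):

    # Sketch: print a single field, restricted to a sub-group of the
    # mesh, to the MED file attached to logical unit 80.
    IMPR_RESU(FORMAT='MED',
              UNITE=80,
              RESTREINT=_F(GROUP_MA='ZONE_POST'),
              RESU=_F(RESULTAT=resu,
                      NOM_CHAM='DEPL'))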

For a restart (poursuite), you can in some cases replace the database by printing the fields in MED format. The « restart » will then begin with the command DEBUT (rather than POURSUITE), followed by LIRE_RESU.
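A minimal sketch of such a restart, assuming the fields were printed to logical unit 80 as above; TYPE_RESU and the field list are illustrative and must match what was actually printed (depending on the version, additional keywords may be needed to map the MED field names; see the LIRE_RESU documentation).

    # Sketch: a restart that starts from MED fields instead of the
    # global database: DEBUT (not POURSUITE), then LIRE_RESU.
    DEBUT()

    mesh = LIRE_MAILLAGE(FORMAT='MED')

    model = AFFE_MODELE(MAILLAGE=mesh,
                        AFFE=_F(TOUT='OUI',
                                PHENOMENE='MECANIQUE',
                                MODELISATION='3D'))

    # Rebuild a result data structure from the previously printed fields.
    resu0 = LIRE_RESU(FORMAT='MED',
                      UNITE=80,
                      MODELE=model,
                      TYPE_RESU='EVOL_NOLI',
                      NOM_CHAM=('DEPL',),
                      TOUT_ORDRE='OUI')

    FIN()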