4. Breakdown of time consumed#

4.1. Time consumed per order#

Code_Aster offers two reading levels to analyze the time consumed by each order:

A**global level* that combines all the calculation steps of said order,

A**finer level* (configurable) that allows you to separate the main steps.

4.1.1. Global monitoring of each order#

At the end of each order, the various times consumed are plotted in the message file [6] _ from operator: CPU [7] _ +system (USER + SYST), system (SYST), and elapsed (ELAPS). In principle, any significant drift in time SYST and/or time ELAPS ED must be questioned (cf. §6.2) .

# FIN COMMANDE NO: 0045 USER + SYST: 520.01s (SYST: 12.17s, ELAPS: 525.12s)

# ——————————————————————————–

Figure 4.1.1-a : Traces of the overall time consumption of an Aster order (extracted from an.mess*) .*

These times are also summarized, for all commands, in the result file.

COMMAND : USER : SYSTEM : USER+SYS : ELAPSED *

init (jdc): 2.62:0.88:3.50:4.30

. compile: 0.00:0.00:0.00:0.00:0.01

. exec_compile: 0.54:0.03:0.03:0.57:0.58

. report: 0.03:0.00:0.00:0.03:0.03

. build: 0.00:0.00:0.00:0.00

DEBUT: 0.04:0.05:0.09:0.13

PRE_GIBI: 10.75:1.89:12.64:12.69

LIRE_MAILLAGE: 27.92:0.13:28.05:28.37

DEFI_MATERIAU: 0.01:0.00:0.01:0.01

AFFE_MATERIAU: 0.04:0.01:0.05:0.0

AFFE_MODELE: 5.48:0.08:5.56:5.57

AFFE_CHAR_MECA: 0.52:0.02:0.02:0.54:0.54

…

MECA_STATIQUE: 2249.89:18.55:2268.44:2271.87

TEST_RESU: 0.01:0.01:0.02:0.01

FIN: 0.11:0.01:0.12:0.17

. Supervisor share: 3.37:0.94:4.31:5.21

. dveri: 0.68:0.01:0.69:0.73

. Fortran share: 4600.81:39.42:39.42:4640.23:4650.96

TOTAL_JOB: 4604.18:40.36:4636:4644.54:4656.18

Figure 4.1.1-b : Global time consumption of all Aster orders (extracted from an.resu*) .*

4.1.2. Final monitoring of each order#

The breakdown of the time consumed (USER + SYST, SYST, ELAPS) according to the various calculation steps (elementary calculation, assembly, numerical factorization…) is carried out at the end of each operator involving the construction and/or resolution of linear systems (for example STAT_NON_LINE and CALC_CHAMP). By default, summary values are displayed (lev=1). In fact, this description of the time consumed can be broken down according to different reading levels via the parameter MESURE_TEMPS/NIVE_DETAIL of the DEBUT/POURSUITE commands.

Prioriti , any significant drift in time SYST and/or time ELAPS must be questioned (cf. § 6.2) .

Let’s detail the impression levels of MESURE_TEMPS/NIVE_DETAIL =level (default=1):

/level =0, No impression relating to monitoring at the end of each order,

/niv =1, Synthetic impressions of the three types of time for elemental/assembly calculations and for the resolution of associated linear systems:

#1. resolution.des.systemes.lineaire… CPU. (USER + SYST/SYST/ELAPS):… 7.52… 0.79.. 11.22

#2. Elemental.calculations.and.assemblies… CPU. (USER + SYST/SYST/ELAPS):.. 15.07… 0.70… 15.77

Figure 4.1.2-a : Time consumption in an Aster command type STAT_NON_LINE or CALC_CHAMP with lev=1 sequentially (extracted from a.mess).

/niv =2, Detailed impressions of the times for the two main classes of calculations: elemental calculations/assemblies and the resolution of associated linear systems. This type of information can provide information about the impact of a modification of the command file or the parameters for starting the calculation (memory, parallelism, etc.). For example, knowing that parallelism MPI does not potentially make it possible to reduce (cf. doc. U2 (dedicated) that steps 1.3/1.4/2, it is of little use to parallelize a calculation [8] _

on dozens of processors if step 1.2 takes 20% of the total time [9] _ !

#1 .resolution.des.systemes.lineaires… CPU. (USER + SYST/SYST/ELAPS):… 7.72… 0.82… 8.72

#1 .1.Numbering, .connectivity of.the .matrix CPU. (USER + SYST/SYST/ELAPS):… 0.21… 0.02… 0.31

#1 .2.factorization.Symbolic… CPU. (USER + SYST/SYST/ELAPS):… 0.58… 0.05… 1.28

#1 .3.factorization.numeric. (or.precond.)… CPU. (USER + SYST/SYST/ELAPS):… 6.78… 0.73… 7.71

#1 .4.Resolution… CPU. (USER + SYST/SYST/ELAPS):… 0.15… 0.02… 0.35

#2 .calculs.elementaires.and.assemblages… CPU. (USER + SYST/SYST/ELAPS):.. 28.87… 0.64… 29.47

#2 .1.routine.Calculation… CPU. (USER + SYST/SYST/ELAPS):.. 26.61… 0.56… 26.61

#2 .1.1.routines.te00ij… CPU. (USER + SYST/SYST/ELAPS):.. 24.58… 0.07… 25.78

#2 .2.Assemblies… CPU. (USER + SYST/SYST/ELAPS):… 2.26… 0.08… 3.36

#2 .2.1.Assembly.matrices… CPU. (USER + SYST/SYST/ELAPS):… 2.02… 0.06… 3.12

#2 .2.2.assemblage.seconds.members… CPU. (USER + SYST/SYST/ELAPS):… 0.24… 0.02… 0.37

Figure 4.1.2-b : Time consumption in an Aster type STAT_NON_LINE or CALC_CHAMP command with lev=2 sequentially (extracted from an.mess*) .*

/niv =3, Same as niv=2 but the printing is done for each time step or calculation increment.

In parallel mode MPI (cf. doc. U2 on parallelism or doc. U4.50.01), we add the average value, over all the processors, of the times consumed as well as their standard deviation. This information is useful for identifying possible load imbalances. They may be due to data distribution [10] _ inhomogeneous in number of cells, in terms of complexity of the law of behavior, in terms of memory access to data structures…

#1. Solving linear.systems… CPU. (USER + SYST/SYST/ELAPS):… 0.29… 0.00… 0.35

… (average… diff.. procs)… CPU. (USER + SYST/SYST/ELAPS):… 0.30… 0.00… 0.47

… (standard deviation.diff.. procs)… CPU. (USER + SYST/SYST/ELAPS):… 0.01… 0.00… 0.05

Figure 4.1.2-c: Time consumption in an Aster command type STAT_NON_LINEou CALC_CHAMP with niv=1 in parallel mode MPI (extracted from a.mess).

On the other hand, sometimes the Aster memory manager (JEVEUX) unloads objects on disk to release RAM (see §5.1). Depending on the hardware configuration, the machine load and the size of the problem, these unloads are likely to significantly slow down the calculation process. When unloads have occurred, during an operator, we trace the accumulation of their time consumed (USER + SYST, SYST, ELAPS) at the end of the operator. This then makes it possible to refine possible diagnoses (see §6.2).

#4 Unloading disk memory CPU (USER + SYST/SYST/ELAPS): 0.04 0.04 0.04

Figure 4.1.2-d*: Additional time costs due to disk downloads of* JEVEUX (extracted from an.mess*) .*

Note:

The total volume recovered and the number of occurrences of this mechanism are summarized at the end of the message file (see §5.1).

4.2. Overall time consumed#

At the end of the message file, the sum of consumption CPU, CPU + SYST, SYST and the remaining unused time (between the time mentioned by the user in Astk and the time CPU + SYST) are systematically traced.

In principle, any significant time drift SYSTEMEdoit question (cf. §6.2) .

<I>< INFORMATION TEMPS FROM EXECUTION > (EN SECONDE)

TEMPS CPU TOTAL… 2160.55

TEMPS CPU USER TOTAL… 2099.33

TEMPS CPU SYSTEME TOTAL… 61.22

TEMPS CPU RESTANT… 439.45

Figure 4.2-a : Global consumption in calculation time (extracted from a.mess) .

4.3. Special case of DYNA/STAT_NON_LINE#

In these nonlinear dynamic/static commands, we draw as standard (INFO =1) for each time step or calculation increment, in addition to the residue decay table (the level of detail is managed by the AFFICHAGE command keyword):

The stored fields (selected by the keyword ARCHIVAGE),
The breakdown of the CPU times consumed and, possibly, the number of iterations associated with it (e.g. Newton process),
Dedicated display blocks (e.g. discrete contact).

Archiving fields can be costly in terms of time (especially in parallel) and memory. It is therefore interesting to analyze the list of archived fields in order to possibly limit them.

ARCHIVAGE DES CHAMPS:

Archived fields

To be limited (especially in parallel) CHAMP STOCKE: CONT_NOEU INSTANT: 5.00000E+02 NUMERO FROM ORDRE: 50

CHAMP STOCKE: DEPL INSTANT: 5.00000E+02 NUMERO FROM ORDRE: 50

CHAMP STOCKE: SIEF_ELGA INSTANT: 5.00000E+02 NUMERO FROM ORDRE: 50

CHAMP STOCKE: VARI_ELGA INSTANT: 5.00000E+02 NUMERO FROM ORDRE: 50

CHAMP STOCKE: VITE INSTANT: 5.00000E+02 NUMERO FROM ORDRE: 50

Breakdown time CPU /iterations CHAMP STOCKE: ACCE INSTANT: 5.00000E+02 NUMERO FROM ORDRE: 50

TEMPS CPU CONSOMME DANS CE PAS DE TEMPS : 0 s

TEMPS PAR ITERATION DE NEWTON: 0 s - NBRE NEWT.: 2

TEMPS ARCHIVAGE: 0 sec

Only these parts can benefit from parallelism. TEMPS CREATION NUMEROTATION: 0 s - NBRE NUME.: 0

TEMPS FACTORISATION MATRICE: 0 s - NBRE FACT.: 0

TEMPS INTEGRATION COMPORTEMENT: 0 s - NBRE INTE.: 3

TEMPS RESOLUTION K.U = F: 0s - NBRE RESO.: 2

TEMPS RESOLUTION CONTACT: 0 s - NBRE ITER.: 2

TEMPS AUTRES OPERATIONS: 0 sec

Dedicated display block (contact…)

CONTACT DISCRET:

NOMBRE OF ITERATIONS OF CONTACT: 2

NOMBRE OF ITERATIONS OF REAC. GEOM: 2

NOMBRE FINAL OF LIAISONS FROM CONTACT: 0

TEMPS TOTAL APPARIEMENT: 0 sec

TEMPS TOTAL RESOLUTION: 0 sec

Figure 4.3-a : Display of DYNA/STAT_NON_LINE with INFO =1 in sequential mode (taken from an.mess*) at each time step .*

At the end of the operator, the overall statistics on the entire transitory are summarized**. These CPU times are reflected in the global end-of-operator statistics mentioned in the previous paragraph. On the other hand, their granularity is finer and adapted to the various stages of the operator.

STATISTIQUES SUR THE TRANSITOIRE

NOMBRE OF PAS OF TEMPS: 100

NOMBRE OF ITERATIONS OF NEWTON: 200

NOMBRE OF ITERATIONS OF CONTACT (ALGO): 456

NOMBRE OF ITERATIONS OF CONTACT (GEOM): 200

NOMBRE OF CREATION FROM NUMEROTATION: 1

NOMBRE OF FACTORISATION OF MATRICE: 2

NOMBRE OF INTEGRATION OF COMPORTEMENT: 201

NOMBRE BY RESOLUTION K.U = F: 200

TEMPS POUR CREATION NUMEROTATION: 10 sec

TEMPS POUR FACTORISATION MATRICE: 100 sec

TEMPS POUR INTEGRATION COMPORTEMENT: 3m 8s

TEMPS POUR RESOLUTION K.U = F: 9s

TEMPS POUR CONTACT (APPARIEMENT): 17 s

TEMPS POUR CONTACT (ALGORITHME): 2 m 30 s

Figure 4.3-b : Global statistics from DYNA/STAT_NON_LINE with INFO =1 (taken from an.mess*) .*