Breakdown of time consumed
===========================

Time consumed per command
----------------------------

*Code_Aster* offers two reading levels for analyzing the time consumed by each command:

* a **global level**, which aggregates all the calculation steps of the command,
* a **finer level** (configurable), which separates out the main calculation steps.

Global monitoring of each command
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**At the end of each command**, the various times consumed are printed in the message file [1]_ of the operator: CPU [2]_ + system (USER+SYST), system (SYST) and elapsed (ELAPS). **A priori, any significant drift of the SYST and/or ELAPS times should be questioned (cf. §6.2).**

::

  # FIN COMMANDE NO : 0045   USER+SYST : 520.01 s   (SYST : 12.17 s, ELAPS : 525.12 s)
  # --------------------------------------------------------------------------------

*Figure 4.1.1-a: Trace of the overall time consumption of an Aster command (extracted from a .mess file).*

These times are also summarized, for all the commands, in the result file.

::

  ********************************************************************************
  *  COMMAND             :    USER    :   SYSTEM  :  USER+SYS  :   ELAPSED
  ********************************************************************************
  *  init (jdc)          :     2.62   :     0.88  :     3.50   :      4.30
  *  . compile           :     0.00   :     0.00  :     0.00   :      0.01
  *  . exec_compile      :     0.54   :     0.03  :     0.57   :      0.58
  *  . report            :     0.03   :     0.00  :     0.03   :      0.03
  *  . build             :     0.00   :     0.00  :     0.00   :      0.00
  *  DEBUT               :     0.04   :     0.05  :     0.09   :      0.13
  *  PRE_GIBI            :    10.75   :     1.89  :    12.64   :     12.69
  *  LIRE_MAILLAGE       :    27.92   :     0.13  :    28.05   :     28.37
  *  DEFI_MATERIAU       :     0.01   :     0.00  :     0.01   :      0.01
  *  AFFE_MATERIAU       :     0.04   :     0.01  :     0.05   :      0.0
  *  AFFE_MODELE         :     5.48   :     0.08  :     5.56   :      5.57
  *  AFFE_CHAR_MECA      :     0.52   :     0.02  :     0.54   :      0.54
  ...
  *  MECA_STATIQUE       :  2249.89   :    18.55  :  2268.44   :   2271.87
  *  TEST_RESU           :     0.01   :     0.01  :     0.02   :      0.01
  *  FIN                 :     0.11   :     0.01  :     0.12   :      0.17
  *  . Supervisor share  :     3.37   :     0.94  :     4.31   :      5.21
  *  . dveri             :     0.68   :     0.01  :     0.69   :      0.73
  *  . Fortran share     :  4600.81   :    39.42  :  4640.23   :   4650.96
  ********************************************************************************
  *  TOTAL_JOB           :  4604.18   :    40.36  :  4644.54   :   4656.18
  ********************************************************************************

*Figure 4.1.1-b: Global time consumption of all the Aster commands (extracted from a .resu file).*

Fine monitoring of each command
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The **breakdown of the time consumed** (USER+SYST, SYST, ELAPS) over the various calculation steps (elementary calculations, assembly, numerical factorization...) is printed at the end of each operator that involves the construction and/or resolution of linear systems (for example STAT_NON_LINE and CALC_CHAMP). By default, summary values are displayed (niv=1). This description of the time consumed can in fact be broken down into different reading levels *via* the parameter MESURE_TEMPS/NIVE_DETAIL of the DEBUT/POURSUITE commands. **A priori, any significant drift of the SYST and/or ELAPS times should be questioned (cf. §6.2).**

Let us detail the printing levels of MESURE_TEMPS/NIVE_DETAIL=niv (**default niv=1**):

* /niv=0, no printing related to this monitoring at the end of each command;
* /niv=1, synthetic printing of the three types of time for the elementary calculations/assemblies and for the resolution of the associated linear systems:

::

  #1 Resolution of linear systems              CPU (USER+SYST/SYST/ELAPS) :      7.52      0.79     11.22
  #2 Elementary calculations and assemblies    CPU (USER+SYST/SYST/ELAPS) :     15.07      0.70     15.77

*Figure 4.1.2-a: Time consumption of an Aster command of type STAT_NON_LINE or CALC_CHAMP with niv=1 in sequential mode (extracted from a .mess file).*

* /niv=2, detailed printing of the times for the two main classes of calculations: the elementary calculations/assemblies and the resolution of the associated linear systems.
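Selecting one of these levels is done once, when the study starts, via the DEBUT (or POURSUITE) command mentioned above. A minimal command-file sketch (indicative only; check the exact keyword syntax against the DEBUT/POURSUITE documentation):

```python
# Indicative sketch: request the detailed timing printouts (niv=2)
# for every operator of the study, via MESURE_TEMPS/NIVE_DETAIL.
DEBUT(MESURE_TEMPS=_F(NIVE_DETAIL=2))

# ... mesh reading, model assignment, STAT_NON_LINE / CALC_CHAMP calls ...

FIN()
```

With NIVE_DETAIL=0 the same study would print no per-operator timing block at all.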
This type of information can shed light on the impact of a modification of the command file, or of the launch parameters of the calculation (memory, parallelism, etc.). For example, knowing that MPI parallelism can potentially reduce only steps 1.3/1.4/2 (cf. the dedicated U2 documentation), **it is of little use to parallelize a calculation** [3]_ over dozens of processors if step 1.2 takes 20% of the total time [4]_!

::

  #1 Resolution of linear systems              CPU (USER+SYST/SYST/ELAPS) :      7.72      0.82      8.72
  #1.1 Numbering, connectivity of the matrix   CPU (USER+SYST/SYST/ELAPS) :      0.21      0.02      0.31
  #1.2 Symbolic factorization                  CPU (USER+SYST/SYST/ELAPS) :      0.58      0.05      1.28
  #1.3 Numerical factorization (or precond.)   CPU (USER+SYST/SYST/ELAPS) :      6.78      0.73      7.71
  #1.4 Solve                                   CPU (USER+SYST/SYST/ELAPS) :      0.15      0.02      0.35
  #2 Elementary calculations and assemblies    CPU (USER+SYST/SYST/ELAPS) :     28.87      0.64     29.47
  #2.1 Calculation routines                    CPU (USER+SYST/SYST/ELAPS) :     26.61      0.56     26.61
  #2.1.1 Routines te00ij                       CPU (USER+SYST/SYST/ELAPS) :     24.58      0.07     25.78
  #2.2 Assemblies                              CPU (USER+SYST/SYST/ELAPS) :      2.26      0.08      3.36
  #2.2.1 Assembly of the matrices              CPU (USER+SYST/SYST/ELAPS) :      2.02      0.06      3.12
  #2.2.2 Assembly of the right-hand sides      CPU (USER+SYST/SYST/ELAPS) :      0.24      0.02      0.37

*Figure 4.1.2-b: Time consumption of an Aster command of type STAT_NON_LINE or CALC_CHAMP with niv=2 in sequential mode (extracted from a .mess file).*

* /niv=3, same as niv=2, but the printing is repeated for each time step or calculation increment.

**In MPI parallel mode** (cf. the U2 documentation on parallelism, or doc. U4.50.01), the average over all the processors of the times consumed, as well as their standard deviation, are added. This information is useful for identifying **possible load imbalances**.
They may be due to a data distribution [5]_ that is inhomogeneous in the number of cells, in the complexity of the constitutive laws, or in the memory access to the data structures...

::

  #1 Resolution of linear systems              CPU (USER+SYST/SYST/ELAPS) :      0.29      0.00      0.35
     (average over the procs)                  CPU (USER+SYST/SYST/ELAPS) :      0.30      0.00      0.47
     (standard deviation over the procs)       CPU (USER+SYST/SYST/ELAPS) :      0.01      0.00      0.05

*Figure 4.1.2-c: Time consumption of an Aster command of type STAT_NON_LINE or CALC_CHAMP with niv=1 in MPI parallel mode (extracted from a .mess file).*

On the other hand, the *Aster* memory manager (JEVEUX) sometimes **unloads objects to disk** in order to free RAM (see §5.1). Depending on the hardware configuration, the machine load and the size of the problem, these unloads are likely to slow the calculation down significantly. **When unloads have occurred during an operator, the accumulated time they consumed** (USER+SYST, SYST, ELAPS) **is traced at the end of the operator.** This then makes it possible to refine possible diagnoses (see §6.2).

::

  #4 Unloading of memory to disk               CPU (USER+SYST/SYST/ELAPS) :      0.04      0.04      0.04

*Figure 4.1.2-d: Additional time costs due to the disk unloads of JEVEUX (extracted from a .mess file).*

**Note:** the total volume unloaded and the number of occurrences of this mechanism are summarized at the end of the message file (see §5.1).

Overall time consumed
-----------------------

At the end of the message file, **the sums of the CPU, CPU+SYST and SYST times consumed**, as well as the remaining unused time (the difference between the time requested by the user in *Astk* and the CPU+SYST time), are systematically traced. **A priori, any significant drift of the SYSTEM time should be questioned (cf. §6.2).**

::

  <INFORMATION TEMPS D'EXECUTION> (EN SECONDE)
    TEMPS CPU TOTAL                 2160.55
    TEMPS CPU USER TOTAL            2099.33
    TEMPS CPU SYSTEME TOTAL           61.22
    TEMPS CPU RESTANT                439.45

*Figure 4.2-a: Overall consumption in calculation time (extracted from a .mess file).*

Special case of DYNA/STAT_NON_LINE
-------------------------------------

In these nonlinear dynamic/static commands, the following are printed by default (INFO=1) **for each time step or calculation increment**, in addition to the table of residual decrease (whose level of detail is managed by the command keyword AFFICHAGE):

* the stored fields (selected by the keyword ARCHIVAGE),
* the breakdown of the CPU times consumed and, where relevant, the associated numbers of iterations (e.g. of the Newton process),
* dedicated display blocks (e.g. for discrete contact).

**Archiving the fields can be costly in time (especially in parallel) and in memory.** It is therefore worth analyzing the list of archived fields in order to limit it where possible.

::

  ARCHIVAGE DES CHAMPS :                                         <- archived fields: to be limited
                                                                    (especially in parallel)
  CHAMP STOCKE : CONT_NOEU  INSTANT : 5.00000E+02  NUMERO D'ORDRE : 50
  CHAMP STOCKE : DEPL       INSTANT : 5.00000E+02  NUMERO D'ORDRE : 50
  CHAMP STOCKE : SIEF_ELGA  INSTANT : 5.00000E+02  NUMERO D'ORDRE : 50
  CHAMP STOCKE : VARI_ELGA  INSTANT : 5.00000E+02  NUMERO D'ORDRE : 50
  CHAMP STOCKE : VITE       INSTANT : 5.00000E+02  NUMERO D'ORDRE : 50
  CHAMP STOCKE : ACCE       INSTANT : 5.00000E+02  NUMERO D'ORDRE : 50

  TEMPS CPU CONSOMME DANS CE PAS DE TEMPS : 0 s                  <- breakdown of the CPU time
    TEMPS PAR ITERATION DE NEWTON  : 0 s - NBRE NEWT. : 2           and of the iterations
    TEMPS ARCHIVAGE                : 0 s
    TEMPS CREATION NUMEROTATION    : 0 s - NBRE NUME. : 0        <- only these parts can benefit
    TEMPS FACTORISATION MATRICE    : 0 s - NBRE FACT. : 0           from parallelism
    TEMPS INTEGRATION COMPORTEMENT : 0 s - NBRE INTE. : 3
    TEMPS RESOLUTION K.U = F       : 0 s - NBRE RESO. : 2
    TEMPS RESOLUTION CONTACT       : 0 s - NBRE ITER. : 2
    TEMPS AUTRES OPERATIONS        : 0 s

  CONTACT DISCRET :                                              <- dedicated display block
    NOMBRE D'ITERATIONS DE CONTACT      : 2                         (contact...)
    NOMBRE D'ITERATIONS DE REAC. GEOM   : 2
    NOMBRE FINAL DE LIAISONS DE CONTACT : 0
    TEMPS TOTAL APPARIEMENT : 0 s
    TEMPS TOTAL RESOLUTION  : 0 s

*Figure 4.3-a: Display of DYNA/STAT_NON_LINE with INFO=1 in sequential mode (taken from a .mess file), at each time step.*

**At the end of the operator, the overall statistics over the whole transient are summarized.** These CPU times are consistent with the global end-of-operator statistics mentioned in the previous paragraph; their granularity, however, is finer and adapted to the various stages of the operator.

::

  ------------------------------------------------------------------------
  STATISTIQUES SUR LE TRANSITOIRE
  ------------------------------------------------------------------------
  NOMBRE DE PAS DE TEMPS                :   100
  NOMBRE D'ITERATIONS DE NEWTON         :   200
  NOMBRE D'ITERATIONS DE CONTACT (ALGO) :   456
  NOMBRE D'ITERATIONS DE CONTACT (GEOM) :   200
  NOMBRE DE CREATION DE NUMEROTATION    :     1
  NOMBRE DE FACTORISATION DE MATRICE    :     2
  NOMBRE D'INTEGRATION DE COMPORTEMENT  :   201
  NOMBRE DE RESOLUTION K.U = F          :   200
  TEMPS POUR CREATION NUMEROTATION      :    10 s
  TEMPS POUR FACTORISATION MATRICE      :   100 s
  TEMPS POUR INTEGRATION COMPORTEMENT   : 3 m 8 s
  TEMPS POUR RESOLUTION K.U = F         :     9 s
  TEMPS POUR CONTACT (APPARIEMENT)      :    17 s
  TEMPS POUR CONTACT (ALGORITHME)       : 2 m 30 s
  ------------------------------------------------------------------------

*Figure 4.3-b: Global statistics of DYNA/STAT_NON_LINE with INFO=1 (taken from a .mess file).*

.. [1] The CPU time measures the execution of the source code (C, Fortran, Python). The system time accounts for the underlying system calls (disk/RAM access...). The elapsed time includes both of the previous ones and measures the real time spent ("wall clock").

.. [2] The CPU time is here called USER.

.. [3] This will still make it possible to reduce the time consumed and the amount of memory required for the construction and resolution of the linear system.
   As a result, other functionalities, such as contact-friction, could be (slightly) accelerated, because they would then have more room in RAM.

.. [4] The time savings (speed-up) would be limited to a factor of 5, even on hundreds of processors!

.. [5] In the distributed data streams of JEVEUX or of the possible parallel external tools (MUMPS, PETSc).
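As an aside to footnote [1], the three notions of time (USER, SYST, ELAPS) can be reproduced outside Aster with a few lines of Python on a Unix system: `os.times()` returns the user, system and elapsed counters of the current process. The helper name and the printed layout are ours, purely illustrative:

```python
import os

def timing_snapshot():
    """Return (USER, SYST, ELAPS) in seconds for the current process."""
    t = os.times()  # user CPU, system CPU and elapsed real time
    return t.user, t.system, t.elapsed

u0, s0, e0 = timing_snapshot()
total = sum(i * i for i in range(500_000))  # burn some user-space CPU
u1, s1, e1 = timing_snapshot()

# Mirror the USER+SYST / SYST / ELAPS columns of the .mess file.
print(f"USER+SYST : {(u1 - u0) + (s1 - s0):6.2f} s")
print(f"SYST      : {s1 - s0:6.2f} s")
print(f"ELAPS     : {e1 - e0:6.2f} s")
```

For a single-threaded run, the ELAPS delta is at least as large as the CPU deltas; a growing gap between the two is exactly the kind of drift §6.2 asks to investigate.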