1. Debug the code

The debugger is the developer's main and most powerful tool. It lets you follow the execution of a program in real time while navigating interactively through its sources: you can advance line by line, or even instruction by instruction, inspect the content of variables, and much more.

To debug, we generally use an executable compiled with debugging symbols (enabled by the -g option on most compilers). This is the executable produced by the waf install_debug command, and the one used when choosing debug in ASTK.

Adding these symbols (and above all removing optimizations, which is equivalent to the -O0 level) generally produces an executable that differs from the production one, sometimes with a different precision in floating-point calculations.

Several modes of use exist:

Post-mortem: after a crash, the « core file » lets you go back to the place where the crash occurred
Interactive: we launch the executable « under » the debugger
Attaching a posteriori: we attach the debugger to a program that is already running

The first mode is useful when you need to know the exact location of a crash but you don’t want to slow down the execution excessively.

The second mode is more suitable when you already have an idea of the problem, since you can examine the content of objects and make elementary checks. It is also useful when you only observe abnormal behavior, since you can follow the state of certain variables without resorting to prints.

This is generally the mode of choice.

Finally, the last mode is useful for a performance problem on a large calculation, or when a calculation seems to loop indefinitely, situations in which profiling [D1.06.01] is difficult to perform: by attaching a debugger to the running program, you can see which routine it is in.

1.1. Post-mortem

When an Aster calculation crashes, a debugger is automatically run in post-mortem mode to report where in the source the crash occurred.

The first reflex after a crash is therefore to rerun the calculation in « debug » mode to obtain a precise location (the line number, in the source, of the illegal instruction). Make sure, however, that fatal errors abort the calculation, i.e. that the keyword ERREUR=_F(ERREUR_F='ABORT') is given in DEBUT. Test cases already behave this way: the presence of the CODE keyword triggers this behavior on error.

Note

On platforms using the Intel compiler, the line number is directly available in an optimized version (nodebug).

To go further without running the calculation under the debugger, follow the instructions below to perform post-mortem debugging.

To use this debugging mode, start your study from ASTK in « interactive » mode and click « pre » instead of « run ».

ASTK then prepares in /tmp the directory tree needed to launch Aster, and the output file gives the commands to run, from that directory, to start the execution.

The same operation is obtained by using:

waf test_debug --name=zzzz000a --exectool=env

As with ASTK, the output indicates which commands to use next.

Example of the output:

OK Code_Aster environment prepared in /tmp/interactif.16468-dsp0764418

<INFO> To start execution, copy/paste the following lines into a bash/ksh shell:

cd /tmp/interactif.16468-dsp0764418

. /xxx/public/v13/tools/Code_aster_frontend-salomemeca/etc/codeaster/profile.sh

. /xxx/dev/codeaster/install/std/share/aster/profile.sh

. profile_tmp.sh

<INFO> Command line 1:

cp fort.1.1 fort.1

/xxx/dev/codeaster/install/std/bin/asterd /xxx/dev/codeaster/install/std/lib/aster/execution/E_SUPERV.py --commandes=fort.1 --num_job=16468-dsp0764418 --mode=interactif --rep_outils=/xxx/public/v13/tools/Code_aster_frontend-salomemeca/tools --rep_mat=/xxx/dev/codeaster/install/std/share/aster/material --rep_dex=/xxx/dev/codeaster/install/std/share/aster/datg --numthreads=1 --tpmax=60 --memjeveux=75.75

To start running in the Python debugger, you can use:

cp fort.1.1 fort.1

/xxx/dev/codeaster/install/std/bin/asterd /usr/lib/python2.7/pdb.py /xxx/dev/codeaster/install/std/lib/aster/execution/E_SUPERV.py --commandes=fort.1 --num_job=16468-dsp0764418 --mode=interactif --rep_outils=/xxx/public/v13/tools/Code_aster_frontend-salomemeca/tools --rep_mat=/xxx/dev/codeaster/install/std/share/aster/material --rep_dex=/xxx/dev/codeaster/install/std/share/aster/datg --numthreads=1 --tpmax=60 --memjeveux=75.75

Post-mortem debugging requires three steps:

Set up the execution environment (copy the relevant lines from the output; their number depends on the environment to set up):

cd /tmp/interactif.16468-dsp0764418

. /xxx/public/v13/tools/Code_aster_frontend-salomemeca/etc/codeaster/profile.sh

. /xxx/dev/codeaster/install/std/share/aster/profile.sh

. profile_tmp.sh

Run the code interactively (copy the relevant lines from the output):

ulimit -c unlimited

cp fort.1.1 fort.1

/xxx/dev/codeaster/install/std/bin/asterd /xxx/dev/codeaster/install/std/lib/aster/execution/E_SUPERV.py --commandes=fort.1 --num_job=16468-dsp0764418 --mode=interactif --rep_outils=/xxx/public/v13/tools/Code_aster_frontend-salomemeca/tools --rep_mat=/xxx/dev/codeaster/install/std/share/aster/material --rep_dex=/xxx/dev/codeaster/install/std/share/aster/datg --numthreads=1 --tpmax=60 --memjeveux=75.75

The first command removes the size limit on the core file; without it, the core file may not be produced at all.

Start the « post-mortem » debugger:

When the calculation crashes, the system produces a core file. This file, which contains the state of memory at the time of the crash, allows post-mortem analysis. Warning: if the program was using a large amount of memory when it crashed, this file may be large.

This file is named core, or sometimes core.NNN where NNN is a number.

The post-mortem analysis is carried out by launching:

gdb /xxx/dev/codeaster/install/std/bin/asterd core

The first command to type in the debugger is usually where, which shows where the program stopped.
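When several crashes have left core or core.NNN files behind, the sketch below picks the most recent one and prints its backtrace without opening an interactive session, much like the where + quit command file that ASTK passes to the debugger. The helper names latest_core and core_backtrace are hypothetical; gdb -batch -ex bt is a standard gdb invocation.

```shell
#!/bin/bash
# Hypothetical helpers for post-mortem analysis (names are illustrative).

# Print the most recently modified core file ("core" or "core.NNN")
# found in directory $1 (default: current directory).
latest_core() {
    local dir=${1:-.} newest="" f
    for f in "$dir"/core "$dir"/core.[0-9]*; do
        [ -f "$f" ] || continue              # skip patterns that matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Print the backtrace stored in a core file, then exit gdb.
# $1: executable that produced the core (built with -g), $2: core file.
core_backtrace() {
    gdb -batch -ex bt "$1" "$2"
}

# Usage, from the execution directory:
#   core_backtrace /xxx/dev/codeaster/install/std/bin/asterd "$(latest_core)"
```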

For navigation once the debugger is started, refer to the next section.

1.2. Interactive debugging

1.2.1. General operation

You can also debug the code more interactively by running Code_Aster under the control of a debugger, a tool that lets you step through a program line by line and examine all the variables encountered in the source.

Such a tool provides numerous services and can be extremely powerful. Commonly used debuggers are *gdb* (GNU, text mode, available everywhere) and *idb* (Intel). In general, a graphical interface is more user-friendly than these command-line tools: examples are DDD and nemiver (interfaces to *gdb*) or IDB (Eclipse interface to *idb*).

Concretely, to run Code_Aster under the control of the debugger, use the « Start/dbg » button in ASTK. The version executed is then automatically the *debug* version.

The graphical interface starts and immediately sets a first breakpoint, which stops the program in the main function of Python.c (the entry point of Code_Aster).

The equivalent with Waf is obtained by running:

waf test_debug --name=zzzz000a --exectool=debugger

If the debugger does not start, or to use a different one, see § 1.5 Set up the debugger.

Before continuing the execution, you can set other breakpoints. Then resume the execution by pressing « cont » or « run », depending on the graphical interface used.

Some gdb commands (the syntax is very similar for idb):

Online help: « man gdb », or type help or help *subject* in gdb

« ENTER » key: repeats the previous command
Where am I?: where; up and down move through the call stack
Set a breakpoint in a routine or at a given line of the current file:

break *routine_name* or break *line_num*. Examples: break op0199, break 87 or b op0199. Conditional example: break 87 if (i .eq. 3) stops at line 87 of the current file when the local variable i equals 3. In some debuggers, break is replaced by stop in / stop at.

Continue running until the next breakpoint: cont or c
List breakpoints: info breakpoints or status

Delete a breakpoint: delete *id*

Disable a breakpoint: disable *id*

Advance one line, stepping over called routines: next or n

Advance one line, stepping into called routines: step or s

Show the content of a variable: print *var_name* or p *var_name*

Display an expression at each stop: display *var_name* or display *expression*

Assign a variable: set var *var_name*=*value*

List the program: list or list *line_num*

Kill the current program: kill; restart it: run

Save breakpoints for reuse: save breakpoints *filename* (reload them with source *filename*)

All these commands generally have a graphical equivalent (a button) or a shortcut (in idb, for example, the function keys).
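The commands above can also be collected in a gdb command file and replayed at startup with gdb's -x option, which avoids retyping them for each run. The following is a minimal sketch: the helper name write_gdb_session and the file name session.gdb are hypothetical, and op0199 is simply the routine used in the examples above. The line set breakpoint pending on lets gdb accept a breakpoint on a routine whose library is not loaded yet.

```shell
#!/bin/bash
# Hypothetical helper: write a gdb command file that stops in a given
# routine and prints the call stack when the breakpoint is reached.
# $1: command file to create, $2: routine name.
write_gdb_session() {
    local file=$1 routine=$2
    cat > "$file" <<EOF
set breakpoint pending on
break $routine
run
where
EOF
}

# Example (arguments as given in the output file of the « pre » mode):
#   write_gdb_session session.gdb op0199
#   gdb -x session.gdb --args /xxx/dev/codeaster/install/std/bin/asterd <arguments>
```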

1.2.2. Debugging a parallel program with Totalview

When the calculation to be debugged is parallel (MPI, for example), you can use a dedicated debugger such as Totalview, which launches the parallel calculation and gives access to the position of each MPI process.

The process for launching an Aster calculation in Totalview differs somewhat from the sequential interactive mode.

The steps are as follows:

preparing a temporary execution directory with the « pre » mode
setting up the environment and launching Totalview
configuring Totalview

For the first step, refer to § 1.1. Care should be taken to select the parallel version and a single processor.

In the second step, set up the environment and launch Totalview using the output file produced by the first step:

cd /tmp/interactif.16468-dsp0764418

. /xxx/public/v13/tools/Code_aster_frontend-salomemeca/etc/codeaster/profile.sh

. /xxx/dev/codeaster/install/std/share/aster/profile.sh

. profile_tmp.sh

cp fort.1.1 fort.1

totalview /xxx/dev/codeaster/install/std/bin/asterd

Totalview is set up in a manner similar to the two images below. Note the « -totalview » argument added after the arguments given in the output file.

Which MPI driver to use may differ between platforms. The « Tasks » menu allows you to specify the number of processes to launch.

1.3. Debugging a running program

1.3.1. Introduction

Since profiling is sometimes not possible, you may want to interrupt a program to find out where it spends most of its time, or simply where it seems to be stuck. You can of course add prints to the code, but this requires knowing the location of the block a priori, or narrowing it down by bisection, which can take a long time (especially if the calculation in question is a full study).

Here we propose a very simple technique using a debugger (gdb).

1.3.2. Implementation on an example

Consider the following calculation:

[desoza@aster3 ~]$ date ; bjobs

Tue Jun 30 09:39:35 CEST 2009

JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME

721238 desoza RUN Q16g_24h aster2 aster2 aster10*_gros_cas Jun 29 12:59

It has been running for 20 h 40 min. Let us look at its execution directory:

[desoza@aster3 ~]$ ssh aster10

Last login: Tue Jun 30 08:51:25 2009 from aster3

[desoza@aster10 ~]$ cd /tmp/721238

[desoza@aster10 721238]$ ls -ltrh

Total 1.5G

-rwxrwxr-x 1 desoza astergrp 81M Jun 24 00:42 asteru

-rw-r--r-- 1 desoza astergrp 2 Jun 29 12:59 msg_job

-rw-r--r-- 1 desoza astergrp 12 Jun 29 12:59 FTMPDIR

-rw-r--r-- 1 desoza astergrp 0 Jun 29 12:59 fort.

drwxr-xr-x 2 desoza astergrp 6 Jun 29 13:00 RESU_ENSIGHT

drwxr-xr-x 2 desoza astergrp 6 Jun 29 13:00 REPE_OUT

drwxr-xr-x 2 desoza astergrp 35 Jun 29 13:00 rep_coco

-rw-r--r-- 1 desoza astergrp 852 Jun 29 13:00 721238.export

-rwxr-xr-x 1 desoza astergrp 8.1M Jun 29 13:00 fort.20

-rw-r--r-- 1 desoza astergrp 0 Jun 29 13:00 err_cp

-rwxr-xr-x 1 desoza astergrp 7.9K Jun 29 13:00 fort.1

-rw-r--r-- 1 desoza astergrp 0 Jun 29 13:00 err

drwxr-xr-x 17 desoza astergrp 4.0K Jun 29 13:00 Eficas

-rw-r--r-- 1 desoza astergrp 6.5K Jun 29 13:00 config.txt

-rw-r--r-- 1 desoza astergrp 595 Jun 29 13:01 fort.9

-rwxr-xr-x 1 desoza astergrp 15M Jun 29 13:01 elem.1

-rw-r--r-- 1 desoza astergrp 8.3K Jun 29 13:03 fort.8

-rw-r--r-- 1 desoza astergrp 61K Jun 29 13:03 fort.6

-rw-r--r-- 1 desoza astergrp 245M Jun 29 13:04 glob.1

-rw-r--r-- 1 desoza astergrp 778K Jun 29 13:04 fort.15

-rw-r--r-- 1 desoza astergrp 1.1G Jun 29 13:04 vol.1

The calculation has not written anything to disk for 20 h 35 min. In fact, it has not even completed a single Newton iteration:

…
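Rather than reading the whole listing, you can ask find which files changed recently; the -mmin test is supported by GNU and BSD find. The helper name recent_files is hypothetical, and the one-hour default window is an arbitrary choice for the sketch.

```shell
#!/bin/bash
# Hypothetical helper: list the regular files in directory $1 (default: .)
# modified during the last $2 minutes (default: 60).
recent_files() {
    find "${1:-.}" -type f -mmin "-${2:-60}"
}

# Usage: an empty result means nothing was written during the last hour.
#   recent_files /tmp/721238 60
```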

We are going to use the gdb features (also available in other debuggers) that let you interrupt a program after attaching to it. To do this, you need the PID of the Aster executable that is looping. For example, you can use the following command (it only works if the executable is called « asteru » and the user has only one *job* on the compute node):

[desoza@aster10 721238]$ pgrep -u $USER asteru

2595

When the job to examine is parallel, finding the PID of process i is harder. One possibility is to use the « top » tool, display the PID and PPID (parent PID) columns, and trace back to the « asteru » process from the number in the temporary directory name, which has the form « proc_pid » (where pid is the PID of the shell script that launches the Aster calculation). Look for pid in the PPID column; the PID column of that row gives a new number, which you look for in turn in the PPID column, and so on until you reach the « ./asteru … » process.
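The manual PPID tracing above can be automated. The sketch below reads pid ppid comm triples, as printed by ps -eo pid=,ppid=,comm=, and prints the PIDs of descendants of a given PID whose command name matches. The helper name find_descendant is hypothetical, and the root PID is assumed to be the pid taken from the temporary directory name.

```shell
#!/bin/bash
# Hypothetical helper: read "pid ppid comm" lines on stdin and print the PIDs
# of processes named $2 (default: asteru) that descend from PID $1.
find_descendant() {
    awk -v root="$1" -v name="${2:-asteru}" '
        { parent[$1] = $2; comm[$1] = $3 }
        END {
            for (p in parent) {
                if (comm[p] != name) continue
                # Walk up the PPID chain until the root or the top of the tree.
                q = p
                while ((q in parent) && q != root) q = parent[q]
                if (q == root) print p
            }
        }'
}

# Usage (12345 stands for the pid read in the temporary directory name):
#   ps -eo pid=,ppid=,comm= | find_descendant 12345 asteru
```

With several MPI processes, one PID is printed per matching process, in no particular order.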

Then, from the temporary execution directory, run:

[desoza@aster10 721238]$ gdb ./asteru 2595

This results in the following:

GNU gdb Bull Linux (6.3.0.0-1.132.EL4.b.2.bull)

Copyright 2004 Free Software Foundation, Inc.

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type "show copying" to see the conditions.

There is absolutely no warranty for GDB. Type "show warranty" for details.

This GDB was configured as "ia64-redhat-linux-gnu"... Using host libthread_db library "/lib/tls/libthread_db.so.1".

Attaching to program: /tmp/721238/asteru, process 2595

Reading symbols from shared object read from target memory... done.

Loaded system supplied DSO at 0xa000000000000000

`shared object read from target memory' has disappeared; keeping its symbols.

Reading symbols from /opt/intel/cmkl/9.1.023/lib/64/libmkl.so...done.

Loaded symbols for /opt/intel/cmkl/9.1.023/lib/64/libmkl.so

...

Loaded symbols for /aster/local/python-2.4.5/lib/python2.4/lib-dynload/_random.so

Reading symbols from /aster/local/python-2.4.5/lib/python2.4/lib-dynload/md5.so...done.

Loaded symbols for /aster/local/python-2.4.5/lib/python2.4/lib-dynload/md5.so

Reading symbols from /opt/intel/cmkl/9.1.023/lib/64/libmkl_i2p.so...done.

Loaded symbols for /opt/intel/cmkl/9.1.023/lib/64/libmkl_i2p.so

0x4000000000358a20 in tldlr8_ ()

The program is now interrupted: as the top tool shows, it is no longer in the running (R) state but in the stopped (T) state.

PID PPID USER VIRT SWAP RES CODE DATA P S %CPU %MEM TIME+ nFLT COMMAND

2595 2546 desoza 7351m 127m 7.1g 64m 7.0g 2 T 0.0 5.5 1245:46 6 ./asteru

We can now proceed as in any debugger session and ask where we are to find out what is going on:

(gdb) where

#0 0x4000000000358a20 in tldlr8_ ()

#1 0x4000000000355ee0 in tldlg3_ ()

#2 0x4000000000357860 in tldlgg_ ()

#3 0x400000000256b2b0 in algoco_ ()

#4 0x40000000023e2f10 in cfalgo_ ()

#5 0x40000000022575c0 in nmcofr_ ()

#6 0x4000000001c48210 in nmcoun_ ()

#7 0x4000000001046480 in nmdepl_ ()

#8 0x40000000004044e0 in op0070_ ()

...

Since we are using a « nodebug » version, we have no access to the source (or the line numbers); that would require a version compiled with the « -g » option. It is nevertheless possible to understand what is going on: this is a contact calculation spending its time in tldlr8, the routine that factorizes the contact Schur complement. With more than 4000 active contact nodes, this factorization is very long, which is expected.

When you are done in gdb, detach from the program and quit; the execution then resumes.

(gdb) detach

Detaching from program: /tmp/721238/asteru, process 2595

(gdb) q
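Interrupting the program once shows where it is at that moment; repeating the operation a few times gives a rough idea of where the time goes (a technique sometimes called « poor man's profiling »). The sketch below assumes gdb is available on the compute node and that the system lets you attach to your own processes (some Linux distributions restrict this through the Yama ptrace_scope setting). The names sample_stacks and top_frames are hypothetical; gdb -batch -p detaches when it exits, so the program resumes between samples.

```shell
#!/bin/bash
# Hypothetical helpers for sampling a running process with gdb.

# Print $2 backtraces (default 5) of process $1, taken $3 seconds apart
# (default 10). gdb attaches, prints the stack, and detaches on exit.
sample_stacks() {
    local pid=$1 n=${2:-5} delay=${3:-10} i
    for ((i = 0; i < n; i++)); do
        gdb -batch -ex bt -p "$pid" 2>/dev/null
        sleep "$delay"
    done
}

# Read gdb backtraces on stdin and count how often each routine appears
# as the innermost frame (#0), most frequent first.
top_frames() {
    # Frame lines look like "#0  0x... in tldlr8_ ()" or "#0  tldlr8_ () at ...".
    awk '$1 == "#0" { print ($2 ~ /^0x/ ? $4 : $2) }' | sort | uniq -c | sort -rn
}

# Usage:
#   sample_stacks 2595 5 10 | top_frames
```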

1.4. Debug Python source

When the bug concerns Python sources, use the Python debugger. To do so, again use ASTK and launch « pre ». After running commands such as:

cd /tmp/interactive.12219

export ASTER_VERSION=NEW9

. /opt/aster/ASTK/ASTK_SERV/conf/aster_profile.sh

. /opt/aster/NEW9/profile.sh

You can then launch the code under the control of the Python debugger, as indicated in the output:

To start execution in the Python debugger you could type:

./asterd /usr/lib/python2.7/pdb.py Python/Execution/E_SUPERV.py -eficas_path \
./Python -commandes fort.1.1 -rep none -num_job 12219 -mode interactif \
-rep_outils /opt/aster/tools -rep_mat /opt/aster/NEW9/material \
-rep_dex /opt/aster/NEW9/datg -tpmax 120 -memjeveux 16.0

For more details, see for example: http://docs.python.org/library/pdb.html
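The pdb command line above is the normal command line with the path of pdb.py inserted right after the executable. A minimal sketch of a wrapper that performs this insertion follows; run_with_pdb and the PDB_SCRIPT variable are hypothetical, and the default path assumes the Python 2.7 installation shown in the examples.

```shell
#!/bin/bash
# Hypothetical wrapper: run "executable args..." with pdb.py inserted as the
# first argument, so that the script is started under the Python debugger.
# PDB_SCRIPT may be set to the pdb.py of the Python used by the executable.
run_with_pdb() {
    local exe=$1
    shift
    "$exe" "${PDB_SCRIPT:-/usr/lib/python2.7/pdb.py}" "$@"
}

# Usage (arguments as given in the output file):
#   run_with_pdb ./asterd Python/Execution/E_SUPERV.py <arguments>
```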

1.5. Set up the debugger

The command line used to run the debugger (in interactive or post-mortem mode) is defined in the ASTK configuration files.

To see the command line used interactively, do:

as_run --showme param cmd_dbg

To see the command line used in *post-mortem* mode, do:

as_run --showme param cmd_post

In general, these commands are defined in the server configuration file.

You can change this value by writing the command line of your choice in $HOME/.astkrc/prefs.

Attention

If ASTK comes from Salome-Meca, the file to modify is $HOME/.astkrc_salomemeca_VERSION/prefs.

Example for *gdb*, copy/paste the line:

echo "cmd_dbg: xterm -e gdb --command=@D @E @C" >> ~/.astkrc/prefs

Example for *nemiver*, copy/paste the line:

echo "cmd_dbg: nemiver @E @a" >> ~/.astkrc/prefs

Example for *idb*, copy/paste the line:

echo "cmd_dbg: /opt/Intel/compiler/11.1/064/bin/ia32/idb -gui -gdb -command @D -exec @E" >> ~/.astkrc/prefs
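Because >> appends unconditionally, pasting one of these lines a second time leaves two cmd_dbg entries in the file. The sketch below replaces any existing entry instead of appending a duplicate; set_astk_pref is a hypothetical helper, and it assumes the « key: value » line format used by the echo commands above.

```shell
#!/bin/bash
# Hypothetical helper: set "KEY: VALUE" in an ASTK prefs file, replacing any
# existing line for KEY instead of appending a duplicate.
# $1: key (e.g. cmd_dbg), $2: value, $3: prefs file (default ~/.astkrc/prefs).
set_astk_pref() {
    local key=$1 value=$2 file=${3:-$HOME/.astkrc/prefs} tmp
    mkdir -p "$(dirname "$file")" && touch "$file" || return 1
    tmp=$(mktemp) || return 1
    # Keep every line except previous definitions of the key.
    grep -v "^${key}:" "$file" > "$tmp" || true
    printf '%s: %s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Usage:
#   set_astk_pref cmd_dbg "nemiver @E @a"
```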

The codes @E, @C,… are replaced by ASTK at launch time:

@E: name of the Code_Aster executable,
@a: the arguments passed to the Code_Aster executable,
@C: corefile name,
@D: name of the command file for the debugger (which contains where + quit),
@d: the text corresponding to the commands sent to the debugger,