2. Valgrind
2.1. Presentation
Valgrind is a tool for detecting certain programming errors while a program is running. Its operating principle is to intercept certain system functions: through a dynamic library, functions such as malloc, free and memcpy are replaced by equivalents instrumented by Valgrind.
For more information: http://valgrind.org/docs/manual/quick-start.html
or else:
valgrind --help
2.2. Use
To analyze a calculation with Valgrind, the Aster executable must be compiled with debug symbols (select the « debug » version in ASTK).
Example of use (checking the Unix program « ls »):
valgrind --tool=memcheck --error-limit=no ls
More generally, a good Valgrind command line looks like:
valgrind --tool=memcheck --error-limit=no --leak-check=full \
    --suppressions=python.supp --track-origins=yes
The python.supp file suppresses spurious errors reported for Python (Python has its own memory manager, which performs non-standard manipulations). A copy of this file is generally shipped with Linux distributions. On Caliber, it can be found at /usr/lib/valgrind/python.supp.
To use Valgrind with Aster, you must be able to « wrap » the call to the executable. This wrapping can be done in several ways, but only the simplest (and recommended) one is detailed here.
To do this, use the « exectool » feature of ASTK. Start by adding command-line aliases to your local configuration file ($HOME/.astkrc/prefs); the selected alias is then prefixed to the Aster launch line:
desoza@claut621:~$ echo "memcheck: valgrind --tool=memcheck --error-limit=no --leak-check=full --suppressions=/path/to/python.supp --track-origins=yes" >> ~/.astkrc/prefs
Then, in the « Options » menu, set exectool=memcheck and start the calculation as usual. A message is displayed asking you to confirm that you want to start the calculation with the selected tool.

The equivalent with Waf is obtained by running:
waf test_debug --name=zzzz000a --exectool=memcheck --time_limit=7200
Several remarks can be made:
Running under Valgrind can take much longer (sometimes 30 times longer). It is preferable to use a « debug » executable so that the Valgrind diagnosis is more precise (line numbers in the sources). Remember to allocate enough time in ASTK, or use --time_limit with waf. It is also sometimes necessary to increase the memory limit; otherwise the run may stop abruptly without any clear information.
With the --num-callers=n option, you choose the depth n of the call tree displayed by Valgrind.
The --track-origins=yes option is only available from Valgrind version 3.4.0 onward.
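The options discussed in this section can also be assembled programmatically, which helps keep the alias written to the prefs file and the command tested by hand in step. The following is only a sketch: the helper names `build_memcheck_command` and `prefs_alias` are hypothetical, the `name: command` alias layout simply mirrors the echo example above, and the suppression path is the same placeholder used there.

```python
import shlex


def build_memcheck_command(target, suppressions=None, num_callers=None):
    """Return the memcheck command line as a list of arguments.

    Only options mentioned in this section are used. `target` is the
    command to run under Valgrind (an empty list when building an alias).
    """
    args = [
        "valgrind",
        "--tool=memcheck",
        "--error-limit=no",
        "--leak-check=full",
        "--track-origins=yes",
    ]
    if suppressions is not None:
        args.append(f"--suppressions={suppressions}")
    if num_callers is not None:
        args.append(f"--num-callers={num_callers}")
    return args + list(target)


def prefs_alias(name, args):
    """Format an alias line as 'name: command' (layout assumed from the echo example)."""
    return f"{name}: {shlex.join(args)}"


if __name__ == "__main__":
    # Check the Unix program ls, as in the example of use.
    print(shlex.join(build_memcheck_command(["ls"], num_callers=20)))
    # Alias line for the prefs file; the suppression path is a placeholder.
    alias_args = build_memcheck_command([], suppressions="/path/to/python.supp")
    print(prefs_alias("memcheck", alias_args))
```

Building the argument list once and formatting it with `shlex.join` avoids the kind of hand-typed duplication or missing dash that is easy to introduce in a long option string.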
2.3. Interpreting the output
Once the calculation has started, the errors detected by Valgrind are mixed in with the Aster output. They are marked by « ==ProcessNumber== », and there are generally three possible types of error:
Use of an uninitialized variable
Invalid read outside of a memory segment
Invalid write outside of a memory segment
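Because Valgrind lines are interleaved with the solver output, a small filter can make a long log easier to read. The sketch below separates the lines carrying the `==pid==` prefix from everything else and counts the three families listed above. The function names are hypothetical, the message prefixes are the standard memcheck wordings, and the non-Valgrind line in the sample is made up for illustration.

```python
import re
from collections import Counter

# Valgrind prefixes each of its own lines with ==<pid>==
VALGRIND_LINE = re.compile(r"^==(\d+)==\s?(.*)$")

# Memcheck message prefixes for the three families of errors.
CATEGORIES = {
    "uninitialized": (
        "Conditional jump or move depends on uninitialised value",
        "Use of uninitialised value",
    ),
    "invalid read": ("Invalid read of size",),
    "invalid write": ("Invalid write of size",),
}


def split_output(lines):
    """Separate Valgrind messages (prefix stripped) from the other output lines."""
    valgrind, other = [], []
    for line in lines:
        match = VALGRIND_LINE.match(line)
        if match:
            valgrind.append(match.group(2).strip())
        else:
            other.append(line)
    return valgrind, other


def count_errors(valgrind_lines):
    """Count how many messages fall into each of the three families."""
    counts = Counter()
    for text in valgrind_lines:
        for category, prefixes in CATEGORIES.items():
            if text.startswith(prefixes):
                counts[category] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "==8906== Conditional jump or move depends on uninitialised value(s)",
        "==8906==    at 0x9167E47: nbsuco_ (nbsuco.F90:124)",
        " some ordinary solver output",  # made-up line standing in for Aster output
        "==8906== Invalid write of size 4",
    ]
    valgrind, other = split_output(sample)
    print(dict(count_errors(valgrind)))
```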
Variable not initialized
==8906==
==8906== Conditional jump or move depends on uninitialised value(s)
==8906==    at 0x9167E47: nbsuco_ (nbsuco.F90:124)
==8906==    by 0x90459F2: poinco_ (poinco.F90:130)
==8906==    by 0x8E5CB3F: limaco_ (limaco.F90:120)
==8906==    by 0x8C0AAC7: calico_ (calico.F90:284)
==8906==    by 0x8BEE78F: charme_ (charme.F90:194)
==8906==    by 0x833047F: op0007_ (op0007.F90:66)
==8906==    by 0x81D82B9: ex0000_ (ex0000.F90:69)
==8906==    by 0x8175A10: execop_ (execop.F90:83)
==8906==    by 0x81028F6: expass_ (expass.F90:82)
==8906==    by 0x80CDBDD: aster_oper (astermodule.c:2621)
==8906==    by 0x408288C: PyCFunction_Call (in /usr/lib/libpython2.5.so.1.0)
==8906==    by 0x40D05E8: PyEval_EvalFrameEx (in /usr/lib/libpython2.5.so.1.0)
In the case of uninitialized variables, if the problem is not obvious, you can ask Valgrind to trace back up the chain and indicate in which routine the uninitialized variable was created. To do this, add the « --track-origins=yes » option. This option is available starting with version 3.4.0.
Invalid read or write
==11092==
==11092== Invalid write of size 4
==11092==    at 0x94894EE: ajellt_ (ajellt.F90:327)
==11092==    by 0x9426F37: cazocc_ (cazocc.F90:552)
==11092==    by 0x93863C2: cazoco_ (cazoco.F90:170)
==11092==    by 0x90DC5A3: caraco_ (caraco.F90:93)
==11092==    by 0x8C0DF43: calico_ (calico.F90:279)
==11092==    by 0x8BF2FB7: charme_ (charme.F90:194)
==11092==    by 0x832CE8B: op0007_ (op0007.F90:66)
==11092==    by 0x81D942D: ex0000_ (ex0000.F90:69)
==11092==    by 0x8176114: execop_ (execop.F90:83)
==11092==    by 0x81031F2: expass_ (expass.F90:82)
==11092==    by 0x80CE40D: aster_oper (astermodule.c:2621)
==11092==    by 0x408288C: PyCFunction_Call (in /usr/lib/libpython2.5.so.1.0)
==11092==  Address 0x5D3A8E4 is 0 bytes after a block of size 40,036 alloc'd
==11092==    at 0x4022765: malloc (vg_replace_malloc.c:149)
==11092==    by 0x816B470: hpalloc_ (hpalloc.c:30)
==11092==    by 0x80FAA8B: jjalls_ (jjalls.F90:13)
==11092==    by 0x8126D61: jxveuo_ (jxveuo.F90:231)
==11092==    by 0x80FDE70: jjalty_ (jjalty.F90:59)
==11092==    by 0x8104CE1: jeveuo_ (jeveuo.F90:142)
==11092==    by 0x94876F6: ajellt_ (ajellt.F90:114)
==11092==    by 0x9426F37: cazocc_ (cazocc.F90:552)
==11092==    by 0x93863C2: cazoco_ (cazoco.F90:170)
==11092==    by 0x90DC5A3: caraco_ (caraco.F90:93)
==11092==    by 0x8C0DF43: calico_ (calico.F90:279)
==11092==    by 0x8BF2FB7: charme_ (charme.F90:194)
This block is presented in two parts. The upper part describes the error and its location in the source: here, in ajellt.F90 at line 327, 4 bytes are written outside the memory segment that had been allocated. For reference, the line looked like this:
ZI(IDLITY+ZI(IDPOMA+ZI(IDAPMA)-1)+I-1) = ITYP
The lower part shows the origin of the problem. The write takes place at address 0x5D3A8E4, which lies 0 bytes after the memory segment being written to (in other words, exactly at the end of this segment). Writing 4 bytes there therefore falls outside the segment. The most valuable piece of information in the Valgrind block is that the object being overrun was allocated in ajellt.F90 at line 114:
CALL JEVEUO(LIGRET//'.LITY','E',IDLITY)
Looking at the attributes of the object LIGRET.LITY, we see that its length was hard-coded to 1000, which is the cause of the problem.
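The arithmetic behind the « Address ... is 0 bytes after a block of size 40,036 » line can be spelled out as follows. This is only an illustration of the reasoning, using the numbers from the trace; the helper names are hypothetical, and the block bounds are derived from the message, not read from Valgrind.

```python
def parse_size(text):
    """Valgrind prints sizes with thousands separators, e.g. '40,036'."""
    return int(text.replace(",", ""))


def describe_access(address, bytes_after_block, block_size, access_size):
    """Return (block_start, block_end, overrun) for an access past a block.

    block_end is the first byte past the block, so an address reported as
    'N bytes after' the block sits at block_end + N.
    """
    block_end = address - bytes_after_block
    block_start = block_end - block_size
    overrun = (address + access_size) - block_end  # bytes touched past the end
    return block_start, block_end, overrun


if __name__ == "__main__":
    address = 0x5D3A8E4                 # from the 'Address ...' line
    block_size = parse_size("40,036")   # from the same line
    start, end, overrun = describe_access(address, 0, block_size, 4)
    print(f"block [{start:#x}, {end:#x}), write of 4 bytes at {address:#x}")
    print(f"bytes written past the end of the block: {overrun}")
```

Since the write starts exactly at the end of the block, all 4 bytes fall outside it, which is what the « Invalid write of size 4 » message reports.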
2.4. Errors detected by Valgrind that can be ignored
It is accepted that « Conditional jump or move depends on uninitialised value(s) » errors detected in the following routines are not a problem:
jjcrec.F90
codree.F90
2.5. Valgrind for dummies
To start a study with Valgrind:
check that as_run --showme param memcheck returns a line that runs valgrind.
multiply the study time limit by 100 in astk, or use --time_limit with waf.
in astk/options/exectool, enter memcheck.
start the study in debug mode.
Analysis of the .mess file:
look for occurrences of « Conditional jump ».
if the last Fortran routine in the traceback is not on the list of exempt routines (see § 2.4), there is a real problem: an uninitialized variable is declared in this routine. To track down this variable VAR, you can add tests such as IF (VAR .EQ. XX) in the source.
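The analysis of the .mess file can be partly automated. The sketch below scans the output for « Conditional jump » blocks and reports those whose Fortran routine is not on the exempt list of § 2.4. It rests on several assumptions: « last Fortran routine in the traceback » is taken to mean the innermost Fortran frame (the first one Valgrind lists), Fortran frames are recognized by a .f or .F90 file suffix, and the function name and the sample frames involving jjcrec are hypothetical.

```python
import re

# Routines listed in section 2.4 as harmless for this message.
EXEMPT_ROUTINES = {"jjcrec", "codree"}

VALGRIND_LINE = re.compile(r"^==\d+==\s*(.*)$")
FRAME = re.compile(r"^(?:at|by) 0x[0-9A-Fa-f]+: (.+) \(([^()]*)\)$")
FORTRAN_SUFFIXES = (".f", ".f90")  # compared in lower case (assumption)


def conditional_jump_suspects(lines):
    """Yield (routine, source_file, line_number) for non-exempt errors."""
    in_block = False
    for raw in lines:
        match = VALGRIND_LINE.match(raw)
        if match is None:
            continue  # ordinary solver output
        text = match.group(1).strip()
        if not text.startswith(("at 0x", "by 0x")):
            # A message line (or a blank separator) opens or closes a block.
            in_block = text.startswith("Conditional jump")
            continue
        if not in_block:
            continue
        frame = FRAME.match(text)
        if frame is None:
            continue
        symbol, location = frame.groups()
        source, _, lineno = location.rpartition(":")
        if source.lower().endswith(FORTRAN_SUFFIXES) and lineno.isdigit():
            in_block = False  # only the innermost Fortran frame is examined
            routine = symbol.rstrip("_").lower()
            if routine not in EXEMPT_ROUTINES:
                yield routine, source, int(lineno)


if __name__ == "__main__":
    sample = [
        "==8906== Conditional jump or move depends on uninitialised value(s)",
        "==8906==    at 0x9167E47: nbsuco_ (nbsuco.F90:124)",
        "==8906==    by 0x90459F2: poinco_ (poinco.F90:130)",
        "==8906==",
        "==8906== Conditional jump or move depends on uninitialised value(s)",
        "==8906==    at 0x1: jjcrec_ (jjcrec.F90:10)",  # made-up exempt frame
    ]
    for routine, source, lineno in conditional_jump_suspects(sample):
        print(f"check {routine} ({source}, line {lineno})")
```

In practice the lines would come from the .mess file, for example by passing an open file object to `conditional_jump_suspects`.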