7. Starting calculations in parallel
7.1. Distribution of calculations
Distributed-calculation management is activated when a parametric study is launched. Since each calculation is independent of the others, they can be submitted in parallel to reduce the turnaround time.
Each unit calculation is submitted with the time and memory limits specified in the interface. By default, the master job uses the same limits. On compute servers, these parameters can be set (for example, 200 hours on Aster5) using a plugin (see the development documentation in [ASTER_ROOT]/share/codeaster/asrun/doc/).
7.1.1. Use of available resources
You can insert a "hostfile" file into the profile (ETUDE tab). It defines the list of available machines and, for each one, the number of processors and the amount of memory (in MB) that may be used.
Example:
[compute01]
cpu=4
mem=8192
[compute02]
cpu=1
mem=1024
This means that up to 4 calculations can be submitted on compute01 (as long as together they do not require more than 8192 MB) and 1 calculation on compute02 using less than 1024 MB.
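The scheduling rule above can be sketched in a few lines of Python. This is an illustrative sketch, not asrun code: the names `parse_hostfile` and `can_submit` are hypothetical, and the hostfile text is the example shown above.

```python
# Hypothetical sketch of how a scheduler could interpret the hostfile:
# a host accepts a new job only if it has a free processor slot and
# enough remaining memory. Function names are illustrative, not asrun API.
import configparser

HOSTFILE = """
[compute01]
cpu=4
mem=8192
[compute02]
cpu=1
mem=1024
"""

def parse_hostfile(text):
    """Read the hostfile (INI-like syntax) into {host: {cpu, mem}}."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return {host: {"cpu": cfg.getint(host, "cpu"),
                   "mem": cfg.getint(host, "mem")}
            for host in cfg.sections()}

def can_submit(host, running, job_mem, limits):
    """True if `host` has a free slot and enough memory for a new job.
    `running[host]` lists the memory (MB) of jobs already on that host."""
    used_cpu = len(running[host])
    used_mem = sum(running[host])
    return (used_cpu < limits[host]["cpu"]
            and used_mem + job_mem <= limits[host]["mem"])

limits = parse_hostfile(HOSTFILE)
running = {"compute01": [4096, 2048], "compute02": []}
print(can_submit("compute01", running, 4096, limits))  # False: would exceed 8192 MB
print(can_submit("compute02", running, 512, limits))   # True: 1 slot, 512 <= 1024 MB
```

With two jobs already using 6144 MB on compute01, a third 4096 MB job is refused even though a processor slot is free, matching the "as long as together they do not require more than 8192 MB" condition.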
In batch, you can submit many more calculations than there are processors available and let the batch manager distribute them, for example over a cluster. In this case, you can set cpu=50 to keep at most 50 calculations queued in the batch manager.
If there is no "hostfile" file in the profile, the file named in the configuration file [ASTER_ROOT]/etc/codeaster/asrun under the key interactif_distrib_hostfile or batch_distrib_hostfile (depending on the launch mode) is used.
If no "hostfile" file is specified at all, the number of processors (cores, in fact) and the total memory are determined automatically.
Remarks
You can easily bring down a machine by starting too many calculations relative to the available resources. It is advisable to find out about the site's policies for shared computing resources (a dedicated batch class, for example).
A parallel calculation counts as the number of processors it uses, not as 1.
Before the calculations are started, the connection to the compute nodes is tested. The list is limited to the machines that were successfully contacted.
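This filtering step can be sketched as follows. The sketch is illustrative and not part of asrun: `tcp_probe` and `filter_reachable` are hypothetical names, and a real probe might instead run a remote shell command.

```python
# Illustrative sketch: keep only the hosts that answer a connection
# probe before distributing calculations to them. Names are hypothetical.
import socket

def tcp_probe(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def filter_reachable(hosts, probe=tcp_probe):
    """Limit the host list to machines successfully contacted."""
    return [h for h in hosts if probe(h)]

# With a stand-in probe (no network needed for the example):
up = {"compute01"}
print(filter_reachable(["compute01", "compute02"], probe=lambda h: h in up))
# ['compute01']
```

Passing the probe as a parameter keeps the reachability policy (SSH, TCP, ping) separate from the list-filtering logic.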
7.1.2. Expiration time
When the number of calculations to be launched is much greater than the number of processors available overall (which is often the case), calculations wait to be submitted.
If a calculation required more memory than any machine can offer, it would wait indefinitely.
To avoid this, a timeout is defined, equal to the time limit of the master calculation, i.e. the time chosen in astk at global submission.
If no calculation has been submitted during this period, the pending calculation is rejected.
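The rule above can be summarized in two small predicates. This is a sketch under stated assumptions, not asrun code: both function names are illustrative.

```python
# Sketch of the expiration rule (function names are illustrative).

def fits_anywhere(job_mem, limits):
    """A job whose memory demand exceeds every machine's capacity
    would otherwise wait forever in the queue."""
    return any(job_mem <= spec["mem"] for spec in limits.values())

def should_reject(waited, master_time_limit, submitted_in_period):
    """Reject a pending job once the master job's time limit elapses
    without any calculation having been submitted."""
    return waited > master_time_limit and not submitted_in_period

limits = {"compute01": {"mem": 8192}, "compute02": {"mem": 1024}}
print(fits_anywhere(16384, limits))      # False: no machine offers 16 GB
print(should_reject(7200, 3600, False))  # True: timeout elapsed, nothing submitted
```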
7.2. Enabling Code_Aster parallelism
Code_Aster’s internal parallelism is present in two forms:
OpenMP parallelism works in shared memory and is available in the MULT_FRONT and MUMPS solvers. Of course, the Code_Aster version must have been compiled with the appropriate options.
MPI parallelism (message passing, Message Passing Interface) is available in the MUMPS and PETSC solvers and in elementary calculations. Compilation is much more involved and is not automatic when installing Code_Aster (you have to choose an MPI implementation, compile the prerequisites, in particular MUMPS with MPI, and then Code_Aster).
You choose the number of processors used for OpenMP and the number of MPI processes (spread over a certain number of compute nodes) in the Options menu (see § 2.1.4).
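The way the two levels of parallelism combine can be sketched as follows. This is a hedged illustration: `OMP_NUM_THREADS` and `mpiexec -n` are the standard OpenMP and MPI conventions, but the function and executable names are placeholders, not the actual command line that astk builds.

```python
# Hedged sketch: combining MPI processes (distributed memory) with
# OpenMP threads (shared memory). OMP_NUM_THREADS and `mpiexec -n` are
# standard conventions; "solver" is a placeholder executable name.
import os

def build_command(mpi_procs, omp_threads, executable="solver"):
    """Return a command line and environment using a total of
    mpi_procs * omp_threads cores."""
    env = dict(os.environ, OMP_NUM_THREADS=str(omp_threads))
    cmd = ["mpiexec", "-n", str(mpi_procs), executable]
    return cmd, env

cmd, env = build_command(mpi_procs=8, omp_threads=4)
print(" ".join(cmd))           # mpiexec -n 8 solver
print(env["OMP_NUM_THREADS"])  # 4
```

Note that, as stated in the remarks above, such a run counts as 32 processors (8 × 4) against the available resources, not as 1.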
7.3. Multiple executions
This is a very specific launch mode for developers.
The objective is to run a profile (a study or a list of test cases) on several machines simultaneously.
It is activated by checking multiple=yes in the Options menu. Then, at execution time, a window opens to select (by ticking) the machines on which the study or tests will be launched.
The results, including the output and error files usually copied to the flasher directory, are copied to the $HOME/MULTI directory ($HOME generally being /home/username). You can choose to leave the results on each machine, which is recommended when the files are large, or to bring all the files back to the local machine.
Of course, some precautions are needed for this to work: the selected version must be available on all machines, the calculation parameters must be compatible with the resources of each machine, etc.