
Code_Aster and parallelism

8 July 2010

by O. Boiteau, T. de Soza, N. Sellenet and N. Tardieu, EDF R&D/AMA

Parallelism in Code_Aster keeps improving. Today, users can enable it simply through Astk. To do so, choose the MPI version of Code_Aster, select the MUMPS solver (SOLVEUR=_F(METHODE='MUMPS') in the command file) and set the desired number of MPI processes in Astk (mpi_nbcpu). Code_Aster then automatically distributes the mesh cells cyclically among the processes.
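
To make the idea of a cyclic distribution concrete, here is a minimal sketch. It assumes the cells are numbered 0 to N-1 and dealt out round-robin, so cell i goes to process i mod P. The function names are hypothetical and Code_Aster's internal implementation may differ in its details.

```python
def cyclic_owner(cell: int, nprocs: int) -> int:
    """Rank that owns a cell under a round-robin (cyclic) distribution."""
    return cell % nprocs


def cyclic_distribution(ncells: int, nprocs: int) -> list[list[int]]:
    """Return, for each rank, the list of cell ids it is responsible for."""
    parts: list[list[int]] = [[] for _ in range(nprocs)]
    for cell in range(ncells):
        parts[cyclic_owner(cell, nprocs)].append(cell)
    return parts


if __name__ == "__main__":
    # 10 cells spread over 4 MPI processes
    for rank, cells in enumerate(cyclic_distribution(10, 4)):
        print(f"rank {rank}: cells {cells}")
```

With 10 cells and 4 processes, rank 0 gets cells 0, 4 and 8, rank 1 gets 1, 5 and 9, rank 2 gets 2 and 6, and rank 3 gets 3 and 7.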

With these basic options, two steps run in parallel: the elementary computations and assembly, which take most of the computation time, and the resolution of linear systems, which requires the most memory.
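
The following toy sketch shows why elementary computations and assembly parallelize naturally. Each simulated process computes the elementary matrices of its own cells only, and the partial matrices are then summed. It assumes a hypothetical 1D chain of 2-node bar elements, and `reduce_sum` merely stands in for the communication step that combines partial results; how Code_Aster actually performs that step is not described here.

```python
from collections import defaultdict

# Sparse matrix stored as {(row, col): value}
Matrix = dict[tuple[int, int], float]


def element_matrix(k: float) -> list[list[float]]:
    """Elementary stiffness matrix of a 2-node linear bar element."""
    return [[k, -k], [-k, k]]


def assemble(cells: list[int], k: float = 1.0) -> Matrix:
    """Compute and assemble the elementary matrices of the given cells.

    Toy 1D chain mesh (assumption): cell e connects nodes e and e + 1.
    """
    mat: Matrix = defaultdict(float)
    for e in cells:
        ke = element_matrix(k)
        nodes = (e, e + 1)
        for a in range(2):
            for b in range(2):
                mat[(nodes[a], nodes[b])] += ke[a][b]
    return dict(mat)


def reduce_sum(partials: list[Matrix]) -> Matrix:
    """Stand-in for a sum reduction of the per-process partial matrices."""
    total: Matrix = defaultdict(float)
    for part in partials:
        for key, value in part.items():
            total[key] += value
    return dict(total)


if __name__ == "__main__":
    ncells, nprocs = 6, 3
    # Each simulated process only computes the cells it owns (cyclic rule).
    partials = [assemble([e for e in range(ncells) if e % nprocs == rank])
                for rank in range(nprocs)]
    assert reduce_sum(partials) == assemble(list(range(ncells)))
    print("parallel assembly matches serial assembly")
```

Because assembly is a sum of independent cell contributions, the order in which processes add their pieces does not change the result.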

More advanced use is possible, but it requires additional parameterization.

In particular, the default partitioning can be changed to obtain better performance. The MODI_MODELE command lets the user choose among several cell-to-processor distributions: MAIL_CONTIGU (distribution by blocks of contiguous cells), MAIL_DISPERSE (cyclic distribution, cell by cell) or SOUS_DOMAINE (distribution based on a partition computed beforehand with the DEFI_PART_FETI operator).
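
One way to see why the choice of distribution matters is to count interface nodes, that is, nodes shared by cells living on different processes, since those imply data exchange. The sketch below compares a contiguous-block rule and a cyclic rule on a hypothetical 1D chain mesh. The block-splitting rule (first processes receive one extra cell) is an assumption and is not necessarily what MAIL_CONTIGU does.

```python
from typing import Callable

Owner = Callable[[int, int, int], int]


def cyclic_owner(cell: int, ncells: int, nprocs: int) -> int:
    """Round-robin: cell i goes to process i mod nprocs (MAIL_DISPERSE-like)."""
    return cell % nprocs


def contiguous_owner(cell: int, ncells: int, nprocs: int) -> int:
    """Blocks of consecutive cells (MAIL_CONTIGU-like).

    Assumption: the first ncells % nprocs processes get one extra cell.
    """
    base, extra = divmod(ncells, nprocs)
    cut = extra * (base + 1)  # cells below `cut` lie in the larger blocks
    if cell < cut:
        return cell // (base + 1)
    return extra + (cell - cut) // base


def shared_nodes(ncells: int, nprocs: int, owner: Owner) -> int:
    """Count nodes of a 1D chain mesh whose two cells sit on different processes.

    Cell e connects nodes e and e + 1, so inner node n touches cells n - 1 and n.
    """
    return sum(1 for n in range(1, ncells)
               if owner(n - 1, ncells, nprocs) != owner(n, ncells, nprocs))


if __name__ == "__main__":
    for name, owner in [("contiguous", contiguous_owner), ("cyclic", cyclic_owner)]:
        print(name, shared_nodes(12, 4, owner))
```

For 12 cells on 4 processes, the contiguous split leaves 3 interface nodes while the cyclic split leaves 11. In this toy model contiguous blocks mean less data to exchange, whereas a cyclic split tends to balance the load when cell costs are uneven; which effect dominates depends on the study.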

Other options reduce the memory consumption of a computation. First, Code_Aster's own memory usage can be reduced through the SOLVEUR keyword by setting MATR_DISTRIBUEE='OUI': each MPI process then builds only the parts of the matrix corresponding to the blocks of cells it is responsible for. Second, the memory peak of the linear system resolution can be cut significantly (by up to a factor of 6) by activating the out-of-core memory management of the MUMPS solver (SOLVEUR keyword, OUT_OF_CORE='OUI').
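
The memory effect of building only local matrix pieces can be estimated with a simple count. The sketch below compares the number of matrix positions stored per process when the whole matrix is replicated with the number stored when each process keeps only the positions touched by its own cells. It assumes a hypothetical 1D chain mesh with one unknown per node and contiguous blocks of cells; real storage in Code_Aster and MUMPS involves more than this count.

```python
def matrix_positions(cells: range) -> set[tuple[int, int]]:
    """(row, col) positions touched by the given cells of a 1D chain mesh.

    Assumption: cell e connects nodes e and e + 1 (one unknown per node).
    """
    positions: set[tuple[int, int]] = set()
    for e in cells:
        for row in (e, e + 1):
            for col in (e, e + 1):
                positions.add((row, col))
    return positions


def storage(ncells: int, nprocs: int) -> tuple[int, list[int]]:
    """Stored positions for a replicated matrix vs. per-process pieces.

    Assumes ncells is a multiple of nprocs and contiguous blocks of cells.
    """
    full = len(matrix_positions(range(ncells)))
    size = ncells // nprocs
    pieces = [len(matrix_positions(range(r * size, (r + 1) * size)))
              for r in range(nprocs)]
    return full, pieces


if __name__ == "__main__":
    full, pieces = storage(1000, 8)
    print("replicated matrix, per process:", full)          # 3001
    print("distributed pieces, per process:", max(pieces))  # 376
```

The pieces overlap only on the 7 interface nodes, so their total (3,008 positions) barely exceeds the full matrix (3,001), while each process holds roughly one eighth of it.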

Memory gains from MATR_DISTRIBUEE on the RIS Pump (800,000 DOF) and Epicure (860,000 DOF) studies, as a function of the number of processes.
The graph shows the memory consumed by the JEVEUX memory manager, JEVEUX_std (MATR_DISTRIBUEE='NON') and JEVEUX_dist (MATR_DISTRIBUEE='OUI'), together with the memory consumption of the MUMPS solver in out-of-core mode.

Parallel performance is also monitored week after week through new test cases added to the test database (for example perf009, perf010 and perf011).

Test case perf010 (499,203 DOF)
Epicure study: perf011 (860,000 DOF)

These graphs show the benefit of parallelism with only the basic options. For the Epicure study, the computation runs up to 10 times faster on 32 processors.
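
To put a factor of 10 on 32 processors in perspective, the short calculation below derives the parallel efficiency and, assuming Amdahl's law (a simple model that the study itself does not use), the fraction of the run that would have to remain sequential to explain it. It also assumes the factor is a ratio of elapsed times relative to a single process.

```python
def parallel_efficiency(speedup: float, nprocs: int) -> float:
    """Fraction of the ideal linear speed-up actually achieved."""
    return speedup / nprocs


def amdahl_serial_fraction(speedup: float, nprocs: int) -> float:
    """Serial fraction f solving S = 1 / (f + (1 - f) / p) (Amdahl's law)."""
    return (nprocs / speedup - 1.0) / (nprocs - 1)


def amdahl_speedup(f: float, nprocs: int) -> float:
    """Speed-up predicted by Amdahl's law for serial fraction f."""
    return 1.0 / (f + (1.0 - f) / nprocs)


if __name__ == "__main__":
    s, p = 10.0, 32  # factor reported for the Epicure study
    f = amdahl_serial_fraction(s, p)
    print(f"efficiency = {parallel_efficiency(s, p):.3f}")
    print(f"implied serial fraction = {f:.3f}")
    print(f"check: predicted speed-up = {amdahl_speedup(f, p):.2f}")
```

Here the efficiency is 10/32, about 31%, and the implied serial fraction is (32/10 - 1)/31, about 7%. These are rough model-based indicators, not measurements.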

For more information, please refer to documents U2.08.06 "Notice d'utilisation du parallélisme", U1.03.03 "Indicateur de performance d'un calcul (temps/mémoire)" and U4.50.01 "Mot-clé SOLVEUR".