Hello.
I understand that the MPI version of Aster must either be compiled or used via the Docker image.
However, following this post (code-aster.org/forum2/viewtopic.php?id=25531), I simply increased the number of threads to 8,
as shown in the attached figure. My processor ran under a higher load, but the computation time almost doubled.
What effect should I expect from simply increasing the number of threads in Aster (Salome)?
- using default settings: simulation took ~104 seconds
- using 8 threads: simulation took ~206 seconds
Data:
- system: Ubuntu 20
- Salome version 2019
- Aster v14.4
- transient THER_NON_LINE simulation with METHODE='NEWTON'
- SOLVEUR=...METHODE='MUMPS'....
- ~20,000 nodes (I know it's a small model, but I have hundreds of time steps, so any decrease in time will help me.)
I can provide more information if necessary. Thanks!
Last edited by rodrigofarias (2021-08-19 19:47:12)
Hi,
here are some hints from me:
You are using OpenMP parallelisation, not MPI.
(In addition to OpenMP, MPI requires a domain decomposition, which you can do with SCOTCH, for instance. If you search for SCOTCH and domain decomposition, it becomes clearer.)
For a small model with many time steps (DYNA_NON_LINE), what worked for me was:
- start only one thread
- switch off hyper-threading in the computer's BIOS
- look for a computer with the highest clock frequency
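One way to see why extra threads help little on a small model is Amdahl's law (not mentioned in the posts here): only the share of the run time that actually runs in parallel shrinks as threads are added. The sketch below assumes a parallel fraction, which was not measured in this thread, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction is the assumed share (0.0 to 1.0) of the
    single-thread run time that can be parallelised; it is an
    illustrative input, not a value measured for this model.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only for illustration.
    for p in (0.3, 0.6, 0.9):
        print(p, [round(amdahl_speedup(p, n), 2) for n in (1, 2, 4, 8)])
```

Amdahl's law is an ideal upper bound and never predicts a slowdown, so a run that gets slower with more threads (as with 8 threads above) points to overheads the model ignores, such as thread synchronisation or threads competing for cores.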
Hello,
it also depends on the number of cores your CPU has. From what you write, I conclude that your CPU doesn't have 16 cores (for such a CPU, 8 OpenMP threads would give the best results).
Sequential version: if your CPU has 8 cores, set the number of threads to 4, or leave the default setting (which should start 4 OpenMP threads anyway, e.g. during STAT_NON_LINE). Also, as Volker wrote, turn HT off. This way, your computation will be interrupted less by other running processes (everyday tasks will be a little slower, however).
The CPU in the attached image has 10 cores, so I would set it to 5.
With the MPI version, it is a bit different, though.
Mario.
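Mario's rule of thumb can be written as a tiny helper. This is a sketch of his empirical advice from this thread, not an official Code_Aster recommendation; `suggested_omp_threads` is a hypothetical name, and its input is assumed to be the number of physical cores.

```python
import os


def suggested_omp_threads(physical_cores: int) -> int:
    """Rule of thumb from this thread: use half the physical cores (at least 1)."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be >= 1")
    return max(1, physical_cores // 2)


if __name__ == "__main__":
    # os.cpu_count() reports *logical* CPUs (it may also return None),
    # so with hyper-threading/SMT enabled it is not the physical core count.
    print("logical CPUs reported by the OS:", os.cpu_count())
    for cores in (8, 10, 12, 16):
        print(cores, "cores ->", suggested_omp_threads(cores), "threads")
```

Because `os.cpu_count()` counts logical CPUs, on a machine with hyper-threading enabled pass the physical core count to the helper rather than that value.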
Thanks for the reply, Volker.
- My HT is already off (it gets in the way of Ansys too).
- I have a Ryzen 5900X (~4.5 GHz under heavy load).
- I didn't touch the decomposition (I've been an Ansys user for years, so I let the software do this for me).
I only ran the beginning of the simulation. With 1 core, the full run usually takes around 2 hours to complete.
Thanks for the reply.
- I have a 12-core Ryzen 5900X, so the right setting for me would be 6 threads, then.
- My HT is already off (it gets in the way of Ansys too).
I only ran the beginning of the simulation. With 1 core, the full run usually takes around 2 hours to complete.
I will try with 6 threads and get back to you. Thanks!
Hi guys.
I tested with half the number of cores I have (6 of 12) and got around 102 seconds, about the same time as with only one thread.
At least for my model and settings, changing the number of threads does not seem to make any difference.
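The reported timings can be turned into speedup and parallel efficiency figures. This is a minimal sketch that uses only the numbers posted in this thread (start of the run only); it assumes the ~104 s default run is effectively single-threaded, as the comparison above suggests, and the function names are hypothetical.

```python
# Wall-clock times reported in this thread (seconds, start of the run only).
BASELINE_S = 104.0  # default settings, assumed to be effectively one thread
RUNS = {8: 206.0, 6: 102.0}  # threads -> measured time


def speedup(baseline_s: float, run_s: float) -> float:
    """Speedup relative to the baseline (> 1 means faster)."""
    return baseline_s / run_s


def efficiency(baseline_s: float, run_s: float, threads: int) -> float:
    """Parallel efficiency, assuming the baseline ran on a single thread."""
    return speedup(baseline_s, run_s) / threads


if __name__ == "__main__":
    for threads, t in sorted(RUNS.items()):
        s = speedup(BASELINE_S, t)
        e = efficiency(BASELINE_S, t, threads)
        print(f"{threads} threads: speedup {s:.2f}, efficiency {e:.0%}")
```

With these numbers, 8 threads give a speedup of about 0.50 (twice as slow) and 6 threads about 1.02, i.e. parallel efficiencies of roughly 6 % and 17 %. A 2 % gain is probably within normal run-to-run variation.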
I'm trying to use the Docker image with the parallel version, but I'm having a problem. I left a question about the Docker version in the permanent thread
(code-aster.org/forum2/viewtopic.php?id=23453&p=3),
but no one has responded yet.
Thanks for any help.