
#1 2011-11-04 18:58:38

JMB365
Member
Registered: 2008-01-19
Posts: 781

[Solved] MPI CPU load distribution not happening

Hello,

I have set up a 2-PC cluster (ubuntu34 & ubuntu35) using a standard CAELinux2011 install with the mpich2 package added in (I believe it is properly configured, since the 'Pi' test program [cpi.c] runs fine across both PCs).  I am using mumps01a.* as a test case.  It seems that with mpi_nbcpu=2 (ncpus=1 & mpi_nbnoeud=1) I always see two cores being used on ubuntu34, regardless of whether I submit the job via ASTK on ubuntu34 or ubuntu35.  I do not see the job running on any of the cores of ubuntu35!  Can anybody clue me in as to why?  Thanks.

The hostfile I am using for the jobs is:

ubuntu34:4
ubuntu35:4
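
(The cross-node check with cpi can be run against the same hostfile; a sketch only, assuming cpi is built at the same path on both machines, and the exact mpiexec flag for the hostfile depends on the MPICH2 process manager:

mpiexec -f $HOME/mpi_hostfile -n 4 ./cpi
# each rank reports its host; both ubuntu34 and ubuntu35 should appear)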

Reversing the order of the hosts in $HOME/mpi_hostfile makes no difference.   I have tried various things that did not solve the problem, such as:
- Increasing mpi_nbcpu beyond two; the job does not run at all.
- Using ubuntu34:1 and ubuntu35:1 in the mpi_hostfile; it does not help.

I am baffled...

Regards, JMB

Last edited by JMB365 (2011-11-09 22:30:15)


SalomeMeca 2021
Ubuntu 20.04, 22.04

Offline

#2 2011-11-05 18:11:29

mathieu.courtois
Administrator
From: France
Registered: 2007-11-21
Posts: 1,178

Re: [Solved] MPI CPU load distribution not happening

mpi_nbnoeud=1 ("noeud" means node): it runs on 1 node, QED.

Try with mpi_nbcpu=2 (== number of cores) and mpi_nbnoeud=2 (== number of nodes); it should use 1 core on each node.
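
In the job's .export file these settings typically show up as parameter lines of the form below (a sketch only; the exact keywords may differ between versions):

P ncpus 1
P mpi_nbcpu 2
P mpi_nbnoeud 2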


Code_Aster release : last unstable on Ubuntu 16.04 64 bits - GNU Compilers

Please do not forget to tag your first post as *SOLVED* when it is!

Offline

#3 2011-11-06 15:58:19

JMB365
Member
Registered: 2008-01-19
Posts: 781

Re: [Solved] MPI CPU load distribution not happening

courtois wrote:

Try with mpi_nbcpu=2 (== number of cores) and mpi_nbnoeud=2 (== number of nodes); it should use 1 core on each node.

Hello courtois,
Thanks for the reply and suggestion, but it made no difference.  A job submitted on ubuntu35 continues to run ONLY on ubuntu34, with both of its cores showing activity and nothing happening on ubuntu35's (4/8) cores.  Trying it the other way around also results in the job running only on ubuntu34.

I wonder if the following has anything to do with it:
ubuntu34: Intel QuadCore (4 cores)
ubuntu35: Intel QuadCore but with hyperthreading (equivalent to 8 cores)

Regards, JMB


SalomeMeca 2021
Ubuntu 20.04, 22.04

Offline

#4 2011-11-09 17:51:38

jcugnoni
Member
Registered: 2007-12-05
Posts: 65

Re: [Solved] MPI CPU load distribution not happening

Hi JMB,

To use MPI in CAELinux 2011 you don't need (and should not install) MPICH2: Code-Aster 11.0 is already compiled against the OpenMPI libraries, and having several MPI libraries installed on the system may create configuration problems.
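
A quick way to check which MPI implementation the shell actually picks up (a sketch; the point is simply that the reported version should be Open MPI, not MPICH):

which mpirun
mpirun --version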

Personally, this is the way I proceed, starting from two PCs with a fresh install of CAELinux 2011 (even when using LiveDVD/LiveUSB mode).
So here is a small "How To" for you and others:

1) Set up the network interconnection: I use Network Manager to set up static IP addresses.
Set the hostnames:

on machine 1: sudo hostname caepc1
on machine 2: sudo hostname caepc2
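
Note: the hostname command only sets the name until the next reboot; to make it stick you can also write it to /etc/hostname (a sketch, assuming the standard Ubuntu layout):

on machine 1: echo caepc1 | sudo tee /etc/hostname
on machine 2: echo caepc2 | sudo tee /etc/hostname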


2) Edit /etc/hosts on both machines to define the host/IP relationships

sudo nano /etc/hosts

add lines like these after the "127.0.1.1 xxxx" entry:

192.168.0.1 caepc1
192.168.0.2 caepc2
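
You can then verify name resolution and connectivity with a quick ping in each direction (a sketch):

ping -c 3 caepc2    (from caepc1)
ping -c 3 caepc1    (from caepc2)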

3) Edit your MPI host configuration directly in /opt/aster110/etc/codeaster/aster-mpihosts

for example (using the OpenMPI syntax):

caepc1 slots=1
caepc2 slots=1
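
To confirm OpenMPI can reach both nodes with this file, a quick test could be (a sketch; hostname simply prints where each process actually ran):

mpirun --hostfile /opt/aster110/etc/codeaster/aster-mpihosts hostname

With slots=1 on each line, this should print caepc1 and caepc2 once each.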

4) Optional: if you have more than 8 GB of RAM per node or more than 16 cores in the cluster, also edit /opt/aster110/etc/codeaster/asrun to tune "interactif_memmax" (max memory per node) and "interactif_mpi_nbpmax" (number of cores in the cluster)
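
For example, the relevant asrun lines could look like this (a sketch; the numbers are placeholders for your own hardware, so check the units already used in your asrun file):

interactif_memmax : 16384
interactif_mpi_nbpmax : 16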

(Optional) passwords: if using LiveDVD/LiveUSB mode, you need to set a password for the default user caelinux.
So on each node, run "passwd" in a terminal (the default password is empty) to set a new password.

5) SSH setup: you need password-less ssh login between the two hosts.
On the first node, run:
scp /home/caelinux/.ssh/id* caepc2:/home/caelinux/.ssh/
scp /home/caelinux/.ssh/authorized* caepc2:/home/caelinux/.ssh/
ssh-keyscan caepc1 >> /home/caelinux/.ssh/known_hosts
ssh-keyscan caepc2 >> /home/caelinux/.ssh/known_hosts
scp /home/caelinux/.ssh/known_hosts caepc2:/home/caelinux/.ssh/
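
If the caelinux user does not have a key pair yet (which may happen outside the LiveDVD image), you can generate one first; a sketch:

ssh-keygen -t rsa -N "" -f /home/caelinux/.ssh/id_rsa
cat /home/caelinux/.ssh/id_rsa.pub >> /home/caelinux/.ssh/authorized_keys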

6) Set up a shared temp directory with NFS.
On node 1:

sudo mkdir /srv/shared_tmp
sudo chmod a+rwx /srv/shared_tmp
sudo nano /etc/exports

then add the following line and save:

/srv/shared_tmp    *(rw,async)

then

sudo exportfs -a

Now create the mount point and mount the shared folder; run this on all nodes:

sudo mkdir /mnt/shared_tmp
sudo chmod a+rwx /mnt/shared_tmp
sudo mount -t nfs -o rw,rsize=8192,wsize=8192 caepc1:/srv/shared_tmp /mnt/shared_tmp
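
Optional: to make the mount survive a reboot, an equivalent /etc/fstab entry on the client nodes could look like this (a sketch, not part of the CAELinux defaults):

caepc1:/srv/shared_tmp  /mnt/shared_tmp  nfs  rw,rsize=8192,wsize=8192  0  0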

7) Set up the Aster config to use this shared temp directory:
nano /opt/aster110/etc/codeaster/asrun

edit the line with "shared_tmp" as follows:

shared_tmp : /mnt/shared_tmp

then save

8) Open ASTK, go to the server list and refresh; create your job, and in Options select ncpus=1 (no OpenMP), mpi_nbcpu = total number of cores to use (mpi_nbnoeud * cores_per_host) and mpi_nbnoeud = number of compute nodes.
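
A worked example, assuming 2 nodes with 4 cores each and a fully loaded cluster (just to illustrate the arithmetic):

ncpus = 1
mpi_nbnoeud = 2
mpi_nbcpu = 8    (2 nodes x 4 cores per host)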

And finally it should run on several nodes!!

Actually, the key point is that you NEED a shared tmp folder to run jobs on a cluster.

Offline

#5 2011-11-09 22:13:12

JMB365
Member
Registered: 2008-01-19
Posts: 781

Re: [Solved] MPI CPU load distribution not happening

Hello jcugnoni,

Thank you for the detailed reply!  It was most useful and now it works as expected!!!

Previously I had ensured that steps 1 ~ 7 you posted were in place.  My mistake was to install mpich2 and mpich2-doc, with the attendant changes that forced upon /opt/aster110/etc/codeaster/asrun and so on.  Once I removed the two packages and backtracked all the changes I had made to get mpich2 working, the parallelism of CodeAster_MPI works "almost out of the box".  I say almost because one does have to ensure that the prerequisite steps 1 ~ 7 are in place.  Then step 8 worked as you stated.

Thanks for the wonderful job on CAELinux2011 and the above "How To".  I am VERY grateful for it...

Regards, JMB

Last edited by JMB365 (2011-11-11 02:21:30)


SalomeMeca 2021
Ubuntu 20.04, 22.04

Offline