Getting Started

Request an account

Access to Acuario cluster

Environment in Acuario

Storage

Understanding the resource manager Slurm

 

Request an account

  • If you have a CIMNE account, request an Acuario cluster account through the ticket system or by sending an e-mail to cau@cimne.upc.edu.
  • If you don’t have a CIMNE account, you need to fill in this form correctly and your CIMNE responsible must sign it.

 

Access to Acuario cluster

GNU/Linux

First you need an OpenSSH client and a graphical environment installed. Then run the following command:

ssh -X -l username hpc0.cimne.upc.edu

Windows

First you need Xming (the public domain release, the one called “Xming”) installed with the default options, plus the PuTTY SSH client. Make sure Xming is running, then launch PuTTY and configure it:

  1. Go to “Connection”, “SSH”, “X11” and check “Enable X11 forwarding”. Also type localhost:0 in the “X display location” box.
  2. In “Session”, enter the Acuario cluster login node hpc0.cimne.upc.edu. Then type a name for this session in the “Saved Sessions” box and click the “Save” button.
  3. From then on, whenever you want to connect to the Acuario cluster, launch PuTTY, load the saved session and open it.

 

Environment in Acuario

In order to do certain tasks, like compiling code or using OpenMP, Open MPI, Intel MPI or the Intel compilers, some environment variables must be set.

We unified this process using environment modules: you load the module you need and all the necessary variables are set in your local environment.

Once your environment is set in your local bash session, you can submit jobs to Slurm with srun, sbatch, salloc, etc., and all the variables will be passed to the remote nodes.

To see which modules you can load, type:

[user@hpc0 ~]$ module avail

------------------------------------------------------------------------ /globalfs/etc/modulefiles -------------------------------------------------------------------------
boost/1.61.0-b1 gcc/5.3.0 intel/clusterStudioXE2013 metis/5.1.0 parmetis/3.2.0 python/3.5.1
cmake/3.5.2 gcc/6.1.0 kratos/daily openmpi/1.10.2 parmetis/4.0.3

To load a module:

[user@hpc0 ~]$ module load cmake/3.5.2

To see which modules are loaded into your bash session, type:

[user@hpc0 ~]$ module list
Currently Loaded Modulefiles:
 1) cmake/3.5.2

To unload a module:

[user@hpc0 ~]$ module unload cmake/3.5.2

Tip: If you use some modules frequently, you can add a command like “module load module-name” to your .bash_profile file (in your home dir).
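For example, the relevant lines in your ~/.bash_profile could look like this (module names taken from the module avail listing above):

```shell
# ~/.bash_profile excerpt: load frequently used modules at every login
module load cmake/3.5.2
module load openmpi/1.10.2
```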

 

Storage

You have access to two storage spaces from all Acuario machines:

  • /home: This is where the personal user directories live. We assign every user a disk quota on this storage. The quota has two limits, soft and hard. The soft limit is the real space limit, but it can be exceeded up to the hard limit for at most one week; this way a running job can finish and the user can retrieve the results. Your quota is shown at login time and can also be checked with the “quota -sf /dev/sdb1” command.
  • /shome: This is where the personal user directories of the old Acuario cluster live. This storage has no disk quota but is slower than /home.

 

Understanding the resource manager Slurm

Because a resource manager must be used, there are a few concepts you should take into account.

First, the notation

  • Partition: A set of nodes.
  • Node: A computation node with its own memory and processors.
  • Processor: A physical processor, like the Intel Xeon E5-2670. In Acuario there are nodes with 2 or 4 processors.
  • Core: A physical core of a processor. In Acuario, processors have from 4 to 16 cores.

The cluster is like a bank that gives you some resources for computing. In order to get the resources you want, you must order a reservation.

You can reserve two main resources:

  • Cores
  • Memory

Moreover, there are some partitions (queues) of nodes. In your resource reservation you can select the partition on which your job must be launched. Each partition can have restrictions, for example max. job time, max. number of queued jobs, max. memory per job, etc.
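The options above can be combined in a job script. A minimal sketch of such a reservation (the partition name and the amounts here are illustrative):

```shell
#SBATCH --partition=R815       # the partition (queue) to run on
#SBATCH --ntasks=4             # cores to reserve
#SBATCH --mem-per-cpu=2048     # memory per core, in MB
#SBATCH --time=1-00:00:00      # estimated run time: 1 day
```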

Job computation time

You can also specify your job’s estimated computation time. It’s optional, but has advantages.

For example, suppose the following scenario:
– A cluster with 3 computing nodes.
– A partition including these 3 nodes, with the default job time set to infinite.

Now suppose a job, Job1, is running on this partition and using 2 of the 3 nodes.
Suppose there is also a job, Job2, with no time specification, waiting for resources on this partition, namely 3 full nodes.
Finally, suppose you launch a job, Job3, that needs one entire node, and that you don’t specify its estimated running time.

The result queue will be:

Job 1 – Running – Estimated time: infinite – 2 Nodes
Job 2 – Waiting… – Estimated time: infinite – 3 Nodes
Job 3 – Waiting… – Estimated time: infinite – 1 Node

But if you specify the running time of Job 3 (say, 1 day), your job will run immediately on the free node! This is because, with a known time limit, your job will not delay the start of Job 2, whose start time is unknown anyway since Job 1 has an estimated finishing time of infinite.

So keep in mind that specifying an estimated time for your job can be very advantageous!

Memory resource

Specifying the amount of memory for your job is also a good idea, because the default is 1GB per core; if you run an application that requires more than that, your job will be killed. Memory restrictions are implemented this way to prevent oversubscribing memory and, in consequence, swapping.

Note that the memory limit can also be useful for performance studies. If your code runs out of physical memory and begins to use swap space, performance will be severely degraded. For a performance study this may be considered an invalid result, and you may want to try a smaller problem, use more nodes, etc. One way to protect against this is to reserve entire nodes and set the memory limit to the maximum memory of the nodes (or less); that is about the most you can use before swapping starts to occur. The batch system will then kill your job before it gets close to swapping.
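A sketch of that protection, assuming a 64GB node (--exclusive reserves whole nodes; the memory figure is illustrative, slightly under the node total):

```shell
#SBATCH --exclusive    # reserve the whole node(s)
#SBATCH --mem=63000    # MB, just under a 64GB node: the job is killed before swapping
```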

The total amount of memory you can reserve must be calculated taking into account the total memory of each node.

For example, if you want to run a serial job that consumes 32GB of RAM, you will want to reserve 32GB on one single node. Be careful not to send this job to a queue whose nodes have less than 32GB each, because it won’t be accepted.
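The directives for the 32GB serial example above might look like this (a sketch, with 32GB expressed in MB):

```shell
#SBATCH --ntasks=1     # serial job: a single task on a single node
#SBATCH --mem=32768    # 32GB per node, in MB
```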

If you want to be more specific, for example when running OpenMP or MPI jobs, you can instead specify the amount of RAM per CPU (per core) with the --mem-per-cpu option. In a hypothetical case where you reserve 10GB per CPU, be aware that if your job runs on only one node (typical for OpenMP jobs) and has 8 threads, you are reserving 8×10 = 80GB of RAM!
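The reservation arithmetic above can be sketched in plain shell (the numbers are the hypothetical ones from the example):

```shell
# With --mem-per-cpu, the memory reserved on a node is the per-core amount
# multiplied by the number of tasks/threads placed on that node.
MEM_PER_CPU_GB=10   # hypothetical --mem-per-cpu value from the text, in GB
THREADS=8           # an 8-thread OpenMP job running on a single node
TOTAL_GB=$((MEM_PER_CPU_GB * THREADS))
echo "Reserved on the node: ${TOTAL_GB} GB"   # prints: Reserved on the node: 80 GB
```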

Where do you want to compute?

When you submit your job you can specify the partition in which you want it to be computed. Remember that partitions are defined as pools of nodes; you can see which are defined in the system with sview or sinfo.

To list the current partitions on the system, run sinfo:

[user@hpc0 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
HighParallelizatio* up 1-00:00:00 25 idle pez[017-033,036-043]
R815 up 15-00:00:0 1 idle pez035

Or, better, run sview to see graphically which partitions are configured, with full info (name, priority, max. time, node list, etc.) for each one:


The current partition scheme, as of Jan 2017, is the following:

Partition Name      | Time Limit | #Nodes | Node List                 | CPU Model                     | Cores per Node | Memory per Node | Intended usage
R630 (default)      | 10 days    | 2      | pez[045-046]              | Intel Xeon E5-2630 v3         | 16             | 128GB           | OpenMP/MPI
R815                | 10 days    | 1      | pez035                    | AMD Opteron 6376              | 64             | 256GB           | OpenMP
HighParallelization | 10 days    | 12     | pez[017-028]              | Intel Xeon E5-2670            | 16             | 64GB            | MPI
COMP-DES-MAT        | 10 days    | 12     | pez[029-032]/pez[036-043] | Intel Xeon E5-2670/E5-2660 v2 | 16/20          | 64/128GB        | Restricted to COMP-DES-MAT group
COMP-DES-MAT-ALL    | 10 days    | 24     | pez[017-032]/pez[036-043] | Intel Xeon E5-2670/E5-2660 v2 | 16/20          | 64/128GB        | Restricted to COMP-DES-MAT group
COMP-DES-MAT-VIP    | 10 days    | 24     | pez[017-032]/pez[036-043] | Intel Xeon E5-2670/E5-2660 v2 | 16/20          | 64/128GB        | Restricted to COMP-DES-MAT group
  • The HighParallelization partition is in a testing period. We reserve the right to change it.
  • Jobs launched in COMP-DES-MAT-ALL/VIP have higher priority than those launched in HighParallelization. So if a job launched in COMP-DES-MAT-ALL/VIP requires a resource (a node or core) being used by a job in HighParallelization, the HighParallelization job will be killed and requeued.

How do I run serial jobs in Acuario?

Create a script called “run.sh” and fill it with the following content. Change the SBATCH parameters, Job name, and executable.

#!/bin/bash
#SBATCH --job-name=JobName
#SBATCH --output=JobName-output-job_%j.out
#SBATCH --error=JobName-output-job_%j.err
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks=1

##Optional - Required memory in MB per core. Defaults are 1GB per core.
##SBATCH --mem-per-cpu=3072

##Optional - Estimated execution time
##Acceptable time formats include  "minutes",   "minutes:seconds",
##"hours:minutes:seconds",   "days-hours",   "days-hours:minutes" ,"days-hours:minutes:seconds".
##SBATCH --time=

########### Further details -> man sbatch ##########

cd /home/user/binaries/
./binary

Then execute the following:

[user@hpc0 ~]$ sbatch run.sh
srun: jobid 3214 submitted

How do I run OpenMP jobs in Acuario?

Create a script called “run.sh”, and fill it with the following content. Change the SBATCH parameters, Job name, and executable.

In this case, ntasks-per-node should be greater than or equal to OMP_NUM_THREADS.

#!/bin/bash
#SBATCH --job-name=JobName
#SBATCH --output=JobName-output-job_%j.out
#SBATCH --error=JobName-output-job_%j.err
#SBATCH --partition=R815
#SBATCH --ntasks-per-node=8

##Optional - Required memory in MB per node, or per core. Defaults are 1GB per core.
##SBATCH --mem=3072
##SBATCH --mem-per-cpu=3072

##Optional - Estimated execution time
##Acceptable time formats include  "minutes",   "minutes:seconds",
##"hours:minutes:seconds",   "days-hours",   "days-hours:minutes" ,"days-hours:minutes:seconds".
##SBATCH --time=

########### Further details -> man sbatch ##########

export OMP_NUM_THREADS=8
./binary

Then execute the following:

[user@hpc0 ~]$ sbatch run.sh
srun: jobid 3214 submitted

How do I run Open MPI jobs in Acuario?

Create a script called “run.sh”, and fill it with the following content. Change the SBATCH parameters, Job name, and executable. The --ntasks parameter will be passed to mpirun, and will run only one task per core.

#!/bin/bash
#SBATCH --job-name=JobName
#SBATCH --output=JobName-output-job_%j.out
#SBATCH --error=JobName-output-job_%j.err
#SBATCH --ntasks=Number_of_MPI_tasks

##Optional - Required memory in MB per node, or per core. Defaults are 1GB per core.
##SBATCH --mem=3072
##SBATCH --mem-per-cpu=3072

##Optional - Estimated execution time
##Acceptable time formats include  "minutes",   "minutes:seconds",
##"hours:minutes:seconds",   "days-hours",   "days-hours:minutes" ,"days-hours:minutes:seconds".
##SBATCH --time=24:00:00

########### Further details -> man sbatch ##########

srun --mpi=pmi2 ./binary

Then execute the following:

[user@hpc0 ~]$ sbatch run.sh
Submitted batch job 29

How do I run Intel MPI jobs in Acuario?

Load the necessary modules with:

[user@hpc0 ~]$ module load intel/clusterStudioXE2013

Create a script called “run.sh”, and fill it with the following content. Change the SBATCH parameters, Job name, and executable. The --ntasks parameter will be passed as the number of processes, as if you were executing mpirun -np xx.

Also make sure you do NOT have the openmpi/1.6.2 module loaded. List currently loaded modules with “module list”, and unload with the command module unload modulename.

#!/bin/bash
#SBATCH --job-name=JobName
#SBATCH --output=JobName-output-job_%j.out
#SBATCH --error=JobName-output-job_%j.err
#SBATCH --partition=HighParallelization
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks=96

##Optional - Required memory in MB
##SBATCH --mem=2048

##Optional - Estimated execution time
##Acceptable time formats include  "minutes",   "minutes:seconds",
##"hours:minutes:seconds",   "days-hours",   "days-hours:minutes" ,"days-hours:minutes:seconds".
##SBATCH --time=24:00:00

########### Further details -> man sbatch ##########

cd /home/user/mpibinary/
srun intelmpiexecutable

Then execute the following:

[user@hpc0 ~]$ sbatch run.sh
srun: jobid 3214 submitted

Basic Commands

  • sbatch – submit a job to the batch queue system
  • squeue – check the current jobs in the batch queue system
  • sinfo – view the current status of the queues
  • scancel – cancel a job
  • sview – run a graphical tool to control jobs and see partition and node status
  • smap – run a console tool to control jobs and see partition and node status
  • sacct – display accounting statistics for all jobs and job steps
  • sstat – display current resource usage for running jobs and job steps

Example:

$] sstat -j 20794.batch
..
$] sacct --format=jobid,User,NodeList,AllocCPUS,AveRSS,MaxRSS,Partition,UserCPU,State -j 24464 
...

More info

If you need more information, take a look at the sbatch manual (“man sbatch”) or the Slurm documentation.
