Getting Started

Request an account

Access to Acuario cluster

Environment in Acuario

Storage

Understanding the resource manager Slurm

 

Request an account

  • If you have a CIMNE account, you need to request an account on the Acuario cluster using the ticketing system or by sending an e-mail to cau@cimne.upc.edu.
  • If you don’t have a CIMNE account, you need to fill in this form correctly and your CIMNE responsible must sign it.

 

Access to Acuario cluster

GNU/Linux

First you need the OpenSSH client and a graphical environment installed. Then execute the following command:

ssh -X -l username acuario.cimne.upc.edu
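As an optional convenience (this is not part of the cluster documentation), you can define a host alias in your ~/.ssh/config so that a plain `ssh acuario` is enough; `username` below is a placeholder for your own login:

```
# ~/.ssh/config — hypothetical alias for the Acuario login node
Host acuario
    HostName acuario.cimne.upc.edu
    User username
    ForwardX11 yes
```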

Windows

First you need Xming (the release called “Xming” in the public domain series) installed with the default options, plus the PuTTY SSH client. Then make sure Xming is running, launch PuTTY and configure it:

  1. Go to “Connection”, “SSH”, “X11” and check “Enable X11 forwarding”. Also type localhost:0 in the “X display location” box.
  2. In “Session”, type the Acuario cluster login node, acuario.cimne.upc.edu. Then type a name for this session in the “Saved Sessions” box and click the “Save” button.
  3. From then on, whenever you want to connect to the Acuario cluster, launch PuTTY, load the saved session and open it.

 

Environment in Acuario

In order to perform certain tasks, such as compiling code or using OpenMP, Open MPI, Intel MPI, or the Intel compilers, some environment variables must be set.

We unified this process using the environment modules system: you load the module you need, and all the necessary variables are set in your local environment.

Once your environment is set in your local bash session, you can submit your jobs to Slurm with srun, sbatch, salloc, etc., and all the variables will be passed to the remote nodes.

To see which modules you can load, type:

[user@acuario ~]$ module avail

------------- /globalfs/etc/modulefiles ----------------------------
boost/1.61.0-b1 gcc/5.3.0 intel/clusterStudioXE2013 openmpi/2.1.1 python/3.5.1
boost/1.64.0 gcc/5.4.0 kratos/daily parmetis/3.2.0 python/3.6.1
clang/3.9.1 gcc/6.1.0 kratos-dependencies parmetis/4.0.3 VTK/7.1.1
cmake/3.5.2 gcc/6.3.0 metis/5.1.0 petsc/3.7.6-debug
cmake/3.8.2 gcc/7.1.0 openmpi/1.10.2 petsc/3.7.6-release

To load a module:

[user@acuario ~]$ module load cmake/3.8.2

To see which modules are loaded into your bash session, type:

[user@acuario ~]$ module list
Currently Loaded Modulefiles:
 1) cmake/3.8.2

To unload a module:

[user@acuario ~]$ module unload cmake/3.5.2

Tip: If you use some modules frequently, you can add a command like “module load module-name” to your .bash_profile file (in your home directory).
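For example, a sketch of what that .bash_profile addition could look like (the module names are taken from the `module avail` listing above; adjust them to the ones you actually use):

```
# Load frequently used modules at login.
# The guard keeps the file harmless on machines without the modules system.
if command -v module >/dev/null 2>&1; then
    module load gcc/7.1.0
    module load openmpi/2.1.1
fi
```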

 

Storage

You have access to two storage spaces from all Acuario machines:

  • /home: This is where the personal user directories are. Every user is assigned a disk quota on this storage. The quota has two limits, soft and hard. The soft limit is the real space limit, but it can be exceeded up to the hard limit for at most one week; this way a running job can finish and the user can retrieve the results. Your quota is shown at login time, and can also be checked with the “quota -sf /dev/sdb1” command.
  • /shome: This is where the personal user directories of the old Acuario cluster are. This storage has no disk quota, but it is slower than /home.

 

Understanding the resource manager Slurm

Since a resource manager must be used, there are a few concepts that you have to take into account.

First, the notation:

  • Partition: A set of nodes.
  • Node: A computation node with its own memory and processors.
  • Processor: A physical processor, such as an Intel Xeon E5-2670. In Acuario there are nodes with 2 or 4 processors.
  • Core: A physical core of a processor. In Acuario there are processors with 8 to 10 cores.

The cluster is like a bank that lends you computing resources. In order to get the resources you want, you must make a reservation.

You can reserve two main resources:

  • Cores
  • Memory

In addition, the nodes are grouped into partitions (see below). In your resource reservation you can select the partition on which your job will be launched. Each partition can have restrictions, for example a maximum job time, a maximum number of queued jobs, maximum memory per job, etc.

 

Job computation time

You can also specify your job’s estimated computation time. It’s optional, but it has advantages. For example, suppose the following scenario:

  • Cluster with 3 computing nodes.
  • A defined partition including these 3 nodes, and with default job time set to infinite.

Now suppose a job, Job1, is running in this partition and is using 2 of the 3 nodes. Suppose there is also a job, Job2, with no time specification, waiting for resources on this partition: exactly 3 full nodes. Finally, suppose you launch a job, Job3, that needs one entire node, and that you don’t specify its estimated running time. The resulting queue will be:

Job 1 - Running    - Estimated time, infinite - 2 Nodes
Job 2 - Waiting... - Estimated time, infinite - 3 Nodes
Job 3 - Waiting... - Estimated time, infinite - 1 Node

But if you specify the running time of Job 3 (for example, 1 day), your job will run immediately! This is because your job cannot delay the start of Job 2, since Job1 has an estimated finishing time of infinite.

So keep in mind that specifying an estimated time for your job can be advantageous!

 

Memory resource

Specifying the amount of memory reserved for your job is very important, because the default is 1GB per core: if you run an application that requires more than that, your job will be killed. Memory restrictions are implemented this way to prevent memory oversubscription and, in consequence, swapping.

The total amount of memory that you can reserve has to be calculated taking into account the total memory available on each node. For example, if you want to run a serial job that consumes 32GB of RAM, you must make a reservation of 32GB on one single node. Be careful not to send this job to a partition whose nodes have less than 32GB each, because it won’t be accepted. Also, if there is not enough memory available on the requested nodes, the job will stay in the pending state until the jobs using that memory finish. To specify the reserved memory, use the --mem option.

If you want to be more specific, for example when running OpenMP or MPI jobs, you can also specify the amount of RAM per CPU (actually per core) with the --mem-per-cpu option. If, hypothetically, you reserve 10GB per CPU, be aware that if your job runs on only one node (typical of OpenMP jobs) and has 8 threads, you are reserving 8×10 = 80GB of RAM!
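This arithmetic is worth scripting when planning reservations. A minimal sketch (the helper below is hypothetical, not part of Slurm) that multiplies tasks per node by memory per CPU:

```shell
#!/bin/bash
# Estimate the total memory (in GB) a reservation implies on one node,
# so it can be compared against what the node physically has.
reserved_gb() {
    local tasks_per_node=$1 mem_per_cpu_gb=$2
    echo $(( tasks_per_node * mem_per_cpu_gb ))
}

# The example from the text: 8 threads at 10GB per CPU on a single node.
reserved_gb 8 10    # prints 80
```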

 

Where do you want to compute?

When you submit your job you can specify the partition on which you want it to be computed. Remember that partitions are defined as pools of nodes; you can see which ones are defined in the system through sview or sinfo.

To list the current partitions status on the system, run sinfo:

[user@acuario ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
HighParallelization up 10-00:00:0 3 mix pez[026-028]
HighParallelization up 10-00:00:0 9 alloc pez[017-025]
R815 up 10-00:00:0 1 idle pez035
R815-dev up 1:00:00 1 idle pez035
R630* up 10-00:00:0 1 mix pez045
R630* up 10-00:00:0 1 alloc pez046
R630-dev up 1:00:00 1 mix pez045
R630-dev up 1:00:00 1 alloc pez046

Or, better, execute sview to see graphically which partitions are configured, with the full info (name, priority, max. time, node list, etc.) for each of them.

You can specify the partition where you want your job to run using the --partition option. You can also specify the node where your job will run using the --nodelist option; make sure the selected node forms part of the selected partition. If you don’t specify any partition, Slurm assigns your job to the default partition.

The partition scheme as of October 2017 is as follows:

Partition Name      | Time Limit | #Nodes | Node List                 | CPU Model                     | Cores/Node | Memory/Node | Intended usage
R630 (default)      | 10 days    | 2      | pez[045-046]              | Intel Xeon E5-2630 v3         | 16         | 128GB       | OpenMP/MPI
R630-dev            | 1 hour     | 2      | pez[045-046]              | Intel Xeon E5-2630 v3         | 16         | 128GB       | Quick tests
R815                | 10 days    | 1      | pez035                    | AMD Opteron 6376              | 64         | 256GB       | OpenMP
R815-dev            | 1 hour     | 1      | pez035                    | AMD Opteron 6376              | 64         | 256GB       | Quick tests
HighParallelization | 10 days    | 12     | pez[017-028]              | Intel Xeon E5-2670            | 16         | 64GB        | MPI
COMP-DES-MAT        | 10 days    | 12     | pez[029-032]/pez[036-043] | Intel Xeon E5-2670/E5-2660 v2 | 16/20      | 64/128GB    | Restricted to COMP-DES-MAT group
COMP-DES-MAT-ALL    | 10 days    | 24     | pez[017-032]/pez[036-043] | Intel Xeon E5-2670/E5-2660 v2 | 16/20      | 64/128GB    | Restricted to COMP-DES-MAT group
  • The HighParallelization partition is in a testing period. We reserve the right to change it.
  • Jobs launched in COMP-DES-MAT-ALL have higher priority than those launched in HighParallelization. So if a job launched in COMP-DES-MAT-ALL requires a resource (a node or core) that is being used by a job in HighParallelization, the HighParallelization job will be killed and requeued.
  • Jobs launched in R630-dev and R815-dev have higher priority than those in R630 and R815 respectively, but jobs in R630 and R815 won’t be killed.

 

How do I run serial jobs in Acuario?

Create a script called “run.sh” and fill it with the following content. Change the SBATCH parameters, Job name, and executable.

#!/bin/bash
#SBATCH --job-name=JobName
#SBATCH --output=JobName-output-job_%j.out
#SBATCH --error=JobName-output-job_%j.err
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks=1

##Optional - Required memory in MB per core. The default is 1GB per core.
##SBATCH --mem-per-cpu=3072

##Optional - Estimated execution time
##Acceptable time formats include  "minutes",   "minutes:seconds",
##"hours:minutes:seconds",   "days-hours",   "days-hours:minutes" ,"days-hours:minutes:seconds".
##SBATCH --time=

########### Further details -> man sbatch ##########

cd /home/user/binaries/
./binary

Then execute the following:

[user@acuario ~]$ sbatch run.sh
Submitted batch job 3214
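The --time formats listed in the script comments above can be sanity-checked locally. The converter below is not part of Slurm; it handles only the colon-separated forms with an optional days- prefix:

```shell
#!/bin/bash
# Convert a Slurm time string such as "days-hours:minutes:seconds" or
# "hours:minutes:seconds" into a number of seconds.
to_seconds() {
    local t=$1 days=0
    case $t in
        *-*) days=${t%%-*}; t=${t#*-} ;;   # split off the "days-" prefix
    esac
    local IFS=:
    set -- $t                              # split the rest on ":"
    # force base-10 so fields like "08" are not read as octal
    echo $(( days*86400 + 10#${1:-0}*3600 + 10#${2:-0}*60 + 10#${3:-0} ))
}

to_seconds 1-00:00:00   # one day  -> 86400
to_seconds 0:30:00      # 30 min   -> 1800
```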

 

How do I run OpenMP jobs in Acuario?

Create a script called “run.sh”, and fill it with the following content. Change the SBATCH parameters, Job name, and executable.

In this case, ntasks-per-node should be greater than or equal to OMP_NUM_THREADS.

#!/bin/bash
#SBATCH --job-name=JobName
#SBATCH --output=JobName-output-job_%j.out
#SBATCH --error=JobName-output-job_%j.err
#SBATCH --partition=R815
#SBATCH --ntasks-per-node=8

##Optional - Required memory in MB per node, or per core. The default is 1GB per core.
##SBATCH --mem=3072
##SBATCH --mem-per-cpu=3072

##Optional - Estimated execution time
##Acceptable time formats include  "minutes",   "minutes:seconds",
##"hours:minutes:seconds",   "days-hours",   "days-hours:minutes" ,"days-hours:minutes:seconds".
##SBATCH --time=

########### Further details -> man sbatch ##########

export OMP_NUM_THREADS=8
./binary

Then execute the following:

[user@acuario ~]$ sbatch run.sh
Submitted batch job 3214
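To avoid keeping the reservation and OMP_NUM_THREADS in sync by hand, the thread count can be derived from Slurm’s environment. This is a sketch under an assumption: SLURM_CPUS_PER_TASK is only exported by Slurm when --cpus-per-task is requested, hence the fallback to a single thread.

```shell
#!/bin/bash
# Inside the job script: take the thread count from Slurm if available,
# otherwise fall back to one thread.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "$OMP_NUM_THREADS"
```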

 

How do I run Open MPI jobs in Acuario?

Create a script called “run.sh” and fill it with the following content, changing the SBATCH parameters, job name, and executable. The --ntasks parameter will be passed to mpirun, and will run only one task per core.

#!/bin/bash
#SBATCH --job-name=JobName
#SBATCH --output=JobName-output-job_%j.out
#SBATCH --error=JobName-output-job_%j.err
#SBATCH --ntasks=Number_of_MPI_tasks

##Optional - Required memory in MB per node, or per core. The default is 1GB per core.
##SBATCH --mem=3072
##SBATCH --mem-per-cpu=3072

##Optional - Estimated execution time
##Acceptable time formats include  "minutes",   "minutes:seconds",
##"hours:minutes:seconds",   "days-hours",   "days-hours:minutes" ,"days-hours:minutes:seconds".
##SBATCH --time=24:00:00

########### Further details -> man sbatch ##########

srun --mpi=pmi2 ./binary

Then execute the following:

[user@acuario ~]$ sbatch run.sh
Submitted batch job 29

 

How do I run Intel MPI jobs in Acuario?

Load the necessary modules with:

[user@acuario ~]$ module load intel/clusterStudioXE2013

Create a script called “run.sh” and fill it with the following content, changing the SBATCH parameters, job name, and executable. The --ntasks parameter will be passed as the number of processes, as if you were executing mpirun -np xx .

Also make sure you do NOT have the openmpi/1.6.2 module loaded. List currently loaded modules with “module list”, and unload one with the command module unload modulename.
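That check can also be automated near the top of the job script. A sketch under one assumption: the environment modules system exports LOADEDMODULES as a colon-separated list of loaded modules (which the standard package does).

```shell
#!/bin/bash
# Return success if any openmpi module appears in the loaded-modules list.
openmpi_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *:openmpi/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Refuse to run the Intel MPI job with Open MPI's environment loaded.
if openmpi_loaded; then
    echo "Error: unload openmpi before running with Intel MPI" >&2
    exit 1
fi
```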

#!/bin/bash
#SBATCH --job-name=JobName
#SBATCH --output=JobName-output-job_%j.out
#SBATCH --error=JobName-output-job_%j.err
#SBATCH --partition=HighParallelization
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks=96

##Optional - Required memory in MB
##SBATCH --mem=2048

##Optional - Estimated execution time
##Acceptable time formats include  "minutes",   "minutes:seconds",
##"hours:minutes:seconds",   "days-hours",   "days-hours:minutes" ,"days-hours:minutes:seconds".
##SBATCH --time=24:00:00

########### Further details -> man sbatch ##########

cd /home/user/mpibinary/
srun intelmpiexecutable

Then execute the following:

[user@acuario ~]$ sbatch run.sh
Submitted batch job 3214

 

Basic Commands

  • sbatch – submit a job to the batch queue system
  • squeue – check the current jobs in the batch queue system
  • sinfo – view the current status of the queues
  • scancel – cancel a job
  • sview – run a graphical tool to control jobs and see partition and node status
  • smap – run a console tool to control jobs and see partition and node status
  • sacct – display accounting statistics for jobs and job steps
  • sstat – display current resource usage of running jobs and job steps

Example:

$] sstat -j 20794.batch
..
$] sacct --format=jobid,User,NodeList,AllocCPUS,AveRSS,MaxRSS,Partition,UserCPU,State -j 24464 
...

 

More info

If you need more information, take a look at the sbatch manual (“man sbatch”) or the Slurm Documentation.
