Running jobs
A complete overview of the subject can be found in our manual; please read it before submitting your first job!
Shared resources
As computational resources are limited, it is of paramount importance to ensure fair access for all users, as well as efficient use of the infrastructure. This is addressed by the SLURM batch scheduler, which collects job requests from all users and assigns each job the requested computational resources when they become available.
Job submission
To run jobs on the cluster, users must submit a request to the SLURM scheduler. Never execute a compute-intensive program directly from the shell: it violates our usage policies and takes resources away from your colleagues! Requests to the SLURM scheduler are made via a batch script, which contains three sections:
- Job parameters (i.e. which partition, how many nodes, how many CPUs and GPUs, and for how long).
- Software setup (i.e. load the required software and set up the relevant environment variables).
- Simulation run (i.e. execute the program and perform the actual simulation).
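A minimal batch script covering these three sections might look like the following sketch; the partition, module, and program names are illustrative, so check the manual for the values valid on this cluster:

```shell
#!/bin/bash
# Job parameters (partition name is illustrative)
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00
#SBATCH --job-name=my_simulation

# Software setup: load the required software environment
module load my_software

# Simulation run: execute the program
srun my_program input_file
```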
The script is then submitted to the scheduler with:
sbatch script_name
Monitor the job state
After a job request has been submitted to the SLURM scheduler, it is possible to monitor its progress and/or check its status in the queue. This can be achieved with the following command:
squeue -u username
where username is your actual username. The output of this command shows whether your job is being held in the queue waiting for resources (state PD) or is running (state R), for how long, and on which compute nodes.
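With squeue's default output format, the result resembles the following; the job ID, partition, and node names are illustrative:

```shell
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
12345   compute   my_sim username  R      10:23      1 node042
```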
Data movement
You can copy data (files and folders) from $HOME to $SCRATCH using cp:
cp -r $HOME/job_folder/file $SCRATCH/job_folder/
and vice versa from $SCRATCH to $HOME:
cp -r $SCRATCH/job_folder/file $HOME/job_folder/
or using rsync if you only need to copy newly produced data back to $HOME. The manual also describes how to embed data movement from $HOME to $SCRATCH and back directly within a SLURM submission script.
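Such a script might be sketched as follows, combining the cp and rsync commands above; the partition, folder, and program names are illustrative, so refer to the manual for the recommended pattern on this cluster:

```shell
#!/bin/bash
#SBATCH --partition=compute    # illustrative; see the manual
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# Stage input data from $HOME to $SCRATCH before the run
mkdir -p $SCRATCH/job_folder
cp -r $HOME/job_folder/input_file $SCRATCH/job_folder/

# Run the simulation from the scratch file system
cd $SCRATCH/job_folder
srun my_program input_file

# Copy only newly produced results back to $HOME
rsync -av $SCRATCH/job_folder/ $HOME/job_folder/
```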