Running jobs
A complete overview of the subject can be found in our manual; please read it before submitting your first job!
Shared resources
As computational resources are limited, it is of paramount importance to ensure fair access for all users, as well as efficient use of the infrastructure. This is addressed by the SLURM batch scheduler, which collects job requests from all users and assigns each job the requested computational resources when they become available.
Job submission
To run jobs on the cluster, users must submit a request to the SLURM scheduler. Never execute a compute-intensive program directly from the shell: it violates our usage policies and takes resources away from your colleagues! Requests to the SLURM scheduler are made via a batch script, which contains three sections:
- Job parameters (i.e. which partition, how many nodes, how many CPUs and GPUs, and for how long).
- Software setup (i.e. load the required software and set up the relevant environment variables).
- Simulation run (i.e. execute the program and perform the actual simulation).
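A minimal batch script covering these three sections might look like the following sketch; the partition, module, and program names are illustrative, so check the manual for the values valid on this cluster:

```shell
#!/bin/bash
# Job parameters (partition name is illustrative)
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00
#SBATCH --job-name=my_simulation

# Software setup: load the required software environment
module load my_software

# Simulation run: execute the program
srun my_program input_file
```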
The script is then submitted to the scheduler with:
sbatch script_name
Monitor the job state
After a job request has been submitted to the SLURM scheduler, it is possible to monitor its progress and/or check its status in the queue. This can be achieved with the following command:
squeue -u username
where username is your actual username. The output of this command shows whether your job is being held in the queue waiting for resources (state PD) or is running (state R), for how long, and on which compute nodes.
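With squeue's default output format, the result resembles the following; the job ID, partition, and node names are illustrative:

```shell
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
12345   compute   my_sim username  R      10:23      1 node042
```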
Data movement
You can copy data (files and folders) from $HOME to $SCRATCH using cp:
cp -r $HOME/job_folder/file $SCRATCH/job_folder/
and vice versa from $SCRATCH to $HOME:
cp -r $SCRATCH/job_folder/file $HOME/job_folder/
or using rsync if you only need to copy newly produced data back to $HOME. The manual also describes how to embed data movement from $HOME to $SCRATCH and back directly within a SLURM submission script.
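Such a script might be sketched as follows, combining the cp and rsync commands above; the partition, folder, and program names are illustrative, so refer to the manual for the recommended pattern on this cluster:

```shell
#!/bin/bash
#SBATCH --partition=compute    # illustrative; see the manual
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# Stage input data from $HOME to $SCRATCH before the run
mkdir -p $SCRATCH/job_folder
cp -r $HOME/job_folder/input_file $SCRATCH/job_folder/

# Run the simulation from the scratch file system
cd $SCRATCH/job_folder
srun my_program input_file

# Copy only newly produced results back to $HOME
rsync -av $SCRATCH/job_folder/ $HOME/job_folder/
```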