Partitions

The ALICE cluster is divided into a number of logical partitions which support different types of jobs. These are the equivalent of separate scheduling queues in other schedulers.

In most cases, if a partition is not specified, the scheduler will route jobs to the correct partition based on the resources they request. Some jobs (parallel and devel jobs) cannot be easily identified from their resource requests, so they must explicitly specify the partition they require.

Also note that some partitions are only available to members of HPC projects. If you require access to these partitions, you must join or create an HPC project.

The partitions available on ALICE are listed below.

Standard jobs - short, medium, long

These partitions support small (single-node or smaller) serial or shared-memory jobs: jobs that require 1-64 cores and up to 240GB of memory.

These partitions support job runtimes of up to 24 hours (short), 7 days (medium) and 21 days (long). There is usually no need to explicitly request these partitions; the scheduler automatically routes suitable jobs to the correct one.

| partition | cores | nodes | memory      | max runtime | availability   |
|-----------|-------|-------|-------------|-------------|----------------|
| short     | 1-64  | 1     | up to 240GB | 24 hours    | all users      |
| medium    | 1-64  | 1     | up to 240GB | 7 days      | all users      |
| long      | 1-64  | 1     | up to 240GB | 21 days     | HPC users only |
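
As an illustration, a single-node job like the sketch below would be routed to the medium partition on the basis of its 48-hour walltime request; no --partition option is needed. The job name, resource values and final command are placeholders.

#!/bin/bash
# Example single-node job. The scheduler routes it to the medium
# partition because the 48-hour walltime exceeds the short limit.
# Job name, resource values and the executable are placeholders.
#SBATCH --job-name=serial-example
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=48:00:00

./my_program   # replace with your own program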

Development jobs - the devel partition

The devel partition is intended for development and testing of new code, and for other cases which require rapid scheduling of small jobs. Two nodes are reserved for this partition, though devel jobs can run anywhere on the cluster where resources are available.

Development jobs are limited to a maximum walltime of 2 hours and a maximum of 4 nodes (256 cores). They can run on high-memory nodes (if the memory request is greater than 240GB per node); however, turnaround time on high-memory nodes is likely to be much longer than on standard nodes.

The partition is available to all users; however, a user may only have a single job running in this partition at any one time.

| partition | cores         | nodes | memory                                           | max runtime | availability |
|-----------|---------------|-------|--------------------------------------------------|-------------|--------------|
| devel     | 1-64 per node | 1-4   | up to 2TB per node (<240GB per node recommended) | 2 hours     | all users    |

The devel partition must be explicitly requested in your job script using:

#SBATCH --partition=devel

or by passing --partition=devel as a command line parameter to sbatch, salloc or srun.
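
For reference, a complete (if minimal) devel job script might look like the sketch below; the job name, resource values and command are placeholders.

#!/bin/bash
# Minimal devel job: partition requested explicitly, short walltime,
# small resource request. All names and values are examples only.
#SBATCH --partition=devel
#SBATCH --job-name=devel-test
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=8G
#SBATCH --time=00:30:00

./my_test_program   # replace with the code you are testing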

Multi-node jobs - the parallel partition

The parallel partition is intended for jobs which require one or more entire compute nodes. Jobs in this partition should, in most cases, allocate and use whole nodes. Around half of the standard compute nodes in the cluster are reserved for parallel jobs of this type.

This partition is available to HPC project users only. To access it, an HPC project which you are a member of must be specified when requesting resources for a job.

More detailed information on using this partition is provided on the MPI and hybrid Jobs page.

| partition | cores       | nodes | memory         | max runtime | availability   |
|-----------|-------------|-------|----------------|-------------|----------------|
| parallel  | 64 per node | 1-16  | 256GB per node | 7 days      | HPC users only |

The parallel partition must be explicitly requested in your job script; you should also request all of the memory available on the node (using --mem=0):

#SBATCH --partition=parallel
#SBATCH --mem=0

These parameters can also be passed to sbatch, salloc or srun on the command line by adding --partition=parallel --mem=0.
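
As an example, a whole-node MPI job using two complete nodes might look like the sketch below. The project name and executable are placeholders, and the srun launch line is an assumption; see the MPI and hybrid Jobs page for the exact launch method recommended on ALICE.

#!/bin/bash
# Two whole nodes, using all cores and all memory on each.
# The account (HPC project) name and executable are placeholders.
#SBATCH --partition=parallel
#SBATCH --account=my_hpc_project   # your HPC project (placeholder)
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --mem=0                    # all memory on each node
#SBATCH --time=24:00:00

srun ./my_mpi_program              # launch method may differ; see the MPI and hybrid Jobs page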

Large memory jobs - the lmem partition

The lmem partition is provided for jobs that require more than 256GB per node. The lmem nodes provide 2TB of memory each. In most cases, you do not need to explicitly request this partition: jobs which request more memory per node than the standard compute nodes provide will automatically be routed to it.

Jobs which require few CPUs but a lot of memory may be better suited to running as lmem jobs even if the total memory they require is less than 256GB. If your job requires more than 16GB of memory per CPU, please consider submitting it directly to the lmem partition. This is particularly important if you are submitting large numbers of these jobs to the cluster, as otherwise they can block access to CPU resources that could be used by others.

The lmem partition is only available to members of registered HPC projects. To access it, an HPC project which you are a member of must be specified when requesting resources for a job.

| partition | cores    | nodes | memory    | max runtime | availability   |
|-----------|----------|-------|-----------|-------------|----------------|
| lmem      | up to 64 | 1     | up to 2TB | 7 days      | HPC users only |
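
If you do need to submit directly to the lmem partition as described above (for example, a job needing around 32GB per CPU), a sketch like the following would do it. The project name, resource values and command are placeholders.

#!/bin/bash
# Few CPUs, large memory: submitted directly to lmem so it does not
# tie up standard compute nodes. Names and values are placeholders.
#SBATCH --partition=lmem
#SBATCH --account=my_hpc_project   # your HPC project (placeholder)
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=128G                 # 32GB per CPU
#SBATCH --time=24:00:00

./my_large_memory_program          # replace with your own program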

GPU jobs - the GPU partition

The gpu partition provides access to the GPU nodes. Four GPU nodes are available; each provides 64 Intel Ice Lake CPU cores, two NVIDIA A100-80 GPUs and 512GB of memory.

Any job which requests GPU resources will be automatically routed to the gpu partition. Up to 4 GPUs may be in use by a single user at any one time.

The gpu partition is restricted to members of HPC projects. To access it, an HPC project which you are a member of must be specified when requesting resources for a job.

| partition | cpu cores         | gpus             | nodes | memory               | max runtime | availability   |
|-----------|-------------------|------------------|-------|----------------------|-------------|----------------|
| gpu       | up to 64 per node | up to 2 per node | 1-2   | up to 512GB per node | 7 days      | HPC users only |
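
As a rough illustration, a single-GPU job might look like the sketch below. The --gres request is a common Slurm syntax for requesting GPUs but is an assumption here, as are the project name, resource values and command.

#!/bin/bash
# Single-GPU job: the GPU request routes it to the gpu partition
# automatically. The GPU request syntax, account name and executable
# are assumptions/placeholders; see the Accessing GPUs section.
#SBATCH --account=my_hpc_project   # your HPC project (placeholder)
#SBATCH --gres=gpu:1               # request one GPU (assumed syntax)
#SBATCH --cpus-per-task=16
#SBATCH --mem=100G
#SBATCH --time=24:00:00

./my_gpu_program                   # replace with your own program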

More information on running GPU jobs can be found in the Accessing GPUs section.

GPU devel jobs - the gpu-devel partition

For short test and development jobs, a gpu-devel partition is provided. Users are limited to a single gpu-devel job, running on a single node.

The gpu-devel partition must be explicitly requested using:

#SBATCH --partition=gpu-devel

or by passing --partition=gpu-devel as a command line parameter to sbatch, salloc or srun.

The gpu-devel partition is only available to members of an HPC project.

| partition | cpu cores         | gpus             | nodes | memory               | max runtime | availability   |
|-----------|-------------------|------------------|-------|----------------------|-------------|----------------|
| gpu-devel | up to 64 per node | up to 2 per node | 1     | up to 512GB per node | 2 hours     | HPC users only |
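
For completeness, a minimal gpu-devel job might look like the sketch below; as with the gpu partition example, the GPU request syntax, project name, resource values and command are assumptions/placeholders.

#!/bin/bash
# Quick single-GPU test on the gpu-devel partition.
# Names, values and the GPU request syntax are placeholders/assumptions.
#SBATCH --partition=gpu-devel
#SBATCH --account=my_hpc_project   # your HPC project (placeholder)
#SBATCH --gres=gpu:1               # request one GPU (assumed syntax)
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=01:00:00

./my_gpu_test                      # replace with the code under test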