Next scheduled ALICE Service Day - Wednesday 12 February
- 16 December 2024 - gpu-devel partition
- 19 November 2024 - GPU nodes added
- 11 September 2024 - Service Day updates
- 11 September 2024 - Forthcoming Service Day Changes
- 10 July 2024 - Performance problems on ALICE home directories for some users - *Now Resolved*
- 12 June 2024 - Service Day updates
- 29 February 2024 - Service Day updates
16 December 2024 - gpu-devel partition
One Ampere and one Pascal GPU node are now reserved for the gpu-devel partition between 08:00 and 18:00 each day. If you are submitting a gpu-devel job, please add --reservation=gpu-devel to your command line or to your job script. For example, for an interactive job:
srun --account=MY_PROJECT --time=2:00:00 --mem=32g --tasks=1 --cpus-per-task=4 --gres="gpu:ampere:1" -p gpu-devel --reservation=gpu-devel --pty /bin/bash
or
srun --account=MY_PROJECT --time=2:00:00 --mem=32g --tasks=1 --cpus-per-task=4 --gres="gpu:pascal:1" -p gpu-devel --reservation=gpu-devel --pty /bin/bash
More information on using these can be found in the gpu-devel jobs section on the Running Jobs/Accessing GPUs page.
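For batch jobs, the same options can go into the job script as #SBATCH directives. The following is a minimal sketch rather than an official template: the file name gpu_devel_job.sh is hypothetical, MY_PROJECT is the same placeholder as in the commands above, and it assumes nvidia-smi is available on the GPU node as a stand-in workload.

```shell
# Write an example batch script (the file name is a hypothetical choice)
cat > gpu_devel_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=MY_PROJECT
#SBATCH --time=2:00:00
#SBATCH --mem=32g
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:ampere:1
#SBATCH --partition=gpu-devel
#SBATCH --reservation=gpu-devel

# Placeholder workload: list the GPU visible to the job
# (assumes nvidia-smi is on the PATH of the GPU node)
nvidia-smi
EOF

# Submit it on ALICE with: sbatch gpu_devel_job.sh
```

To request a Pascal node instead, change the gres line to gpu:pascal:1, as in the second interactive command above.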
19 November 2024 - GPU nodes added
Four additional GPU nodes have been added to ALICE. Each has Intel Broadwell CPUs, 128GB memory and 2 NVIDIA Pascal P100-16 GPUs. More information on using these can be found on the Running Jobs/Accessing GPUs page.
11 September 2024 - Service Day updates
In addition to general system and firmware updates:
- The memory limit per user on login nodes has been decreased from 24GB to 16GB, in anticipation of the increased number of users once teaching starts.
- The following software has been removed from the system:
  - IDL 8.0
  - COMSOL 5.5
  - alphapulldown
  - astroconda/20231120
  - qiime2/2019.4
  - qiime2/2023.9
- The JupyterHub service has been updated to JupyterHub version 5.1. If JupyterHub is being used for teaching, please check that it is still working as expected before the start of your course.
11 September 2024 - Forthcoming Service Day Changes
Due to changes in the Anaconda and Miniconda licence terms, we can no longer use applications that we installed using either of these products. The following applications will be removed on the service day of Wednesday 18 September, but we have provided new alternative builds:
- astroconda/20231120 - use astroconda/20240821 instead
- qiime2/2019.4 - use qiime2/2024.5 instead
- qiime2/2023.9 - use qiime2/2024.5 instead
The Python images on the JupyterHub service have been rebuilt without Anaconda.
Important
If you are currently using Miniconda or Anaconda to install Python modules and do not have a licence for them, or use them entirely for teaching, you should switch to using conda via Miniforge.
10 July 2024 - Performance problems on ALICE home directories for some users - *Now Resolved*
- Please note that this issue does not affect everyone, only users whose usernames begin with one of the following letters: a, b, d, e, i, k, l, w, y
- Some of these users may be experiencing issues accessing ALICE, particularly noticeable when logging in (via SSH and NoMachine) and when using *conda (Miniconda or Anaconda) environments.
- Symptoms include very long login times (several minutes), and similar delays when using *conda and other applications that cause a lot of activity within the home directory.
- A failed storage component has been identified in one of the home directory storage servers and a replacement is being shipped. It is likely to be Monday 15 July before this is replaced.
- As a temporary workaround, if you are not using your *conda environment, commenting out the *conda initialisation section in the .bashrc file in your home directory will make a big difference to subsequent login times. This is the section between these two lines:
# >>> conda initialize >>>
# <<< conda initialize <<<
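One possible way to apply the workaround without editing the file by hand is sketched below. It assumes GNU sed (for the -i option) and that both marker lines appear in the file exactly as shown above; the function name comment_out_conda_init is purely illustrative. A backup copy is written first so the change can be undone.

```shell
# Comment out the conda initialisation block in a shell startup file.
# The file defaults to ~/.bashrc; a backup is saved alongside it as <file>.bak.
comment_out_conda_init() {
    local rc="${1:-$HOME/.bashrc}"
    cp -- "$rc" "$rc.bak"
    # Prefix "# " to every line from the opening marker to the closing marker
    sed -i '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</ s/^/# /' "$rc"
}
```

Running comment_out_conda_init with no arguments acts on ~/.bashrc. To restore the original file later, copy ~/.bashrc.bak back over ~/.bashrc.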
12 June 2024 - Service Day updates
- The backfill scheduling window has been increased to 7 days. This should give more predictable start times for longer jobs, at the cost of a small reduction in the responsiveness of Slurm commands.
- The maximum number of jobs that an HPC user can have running has been increased to 800, and the maximum number of jobs a single user can queue has been increased to 5000.
- Users are now limited to a single job running in the devel partition and a maximum of 4 jobs can be submitted to this partition at a time.
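To see how close you are to the limits above, you can count your own jobs with squeue. This is a rough sketch, not an official tool: the helper name count_my_jobs is hypothetical, and it assumes that each job array task counts separately towards the limits, which is why --array is used to list one task per line.

```shell
# Count your own jobs in a given state (e.g. RUNNING or PENDING),
# optionally restricted to a single partition.
count_my_jobs() {
    local state="$1" partition="${2:-}"
    local user="${USER:-$(id -un)}"
    # --array prints one line per job array task; --noheader drops the title row
    local args=(--user="$user" --noheader --array --states="$state")
    if [ -n "$partition" ]; then
        args+=(--partition="$partition")
    fi
    squeue "${args[@]}" | wc -l
}
```

For example, count_my_jobs RUNNING reports how many of your jobs are running, and count_my_jobs PENDING devel reports how many you have waiting in the devel partition.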
29 February 2024 - Service Day updates
Module changes
- Broken paraview/5.10.1 module removed; please use paraview/5.11.1 instead.
- Broken openfoam/2206 module has been rebuilt and should now work.
- 'module spider' now correctly shows prerequisite modules to load.
More information about Modules
Scheduling changes
- SLURM updated to 23.02.7
- Job priority factors have been changed to increase the significance of fair share. This ensures that as you use resources on the cluster, your job priority is reduced, preventing a single user from dominating the cluster.
- Memory and GPU use are now taken into account in the fair share calculation.
- gpu-devel queue added for short GPU development jobs. This partition allows a maximum of 1 running job per user, using a single GPU node, with a maximum job run time of 2 hours.
- parallel, devel and gpu-devel partitions now get increased priority.
- parallel partition reduced from 26 to 20 nodes. The 6 released nodes are now available to run medium and long jobs.
Local filesystem quotas on login nodes
The /local and /tmp filesystems on login nodes now have a per-user quota applied: 100MB for /tmp and 100GB for /local.
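To get a rough idea of how much of these quotas you are using, you can total the sizes of the files you own under each directory. The sketch below assumes GNU find (for -printf) and uses a hypothetical helper name, owned_bytes. It sums apparent file sizes in bytes, which may differ slightly from the figure the quota system itself records.

```shell
# Print the total size in bytes of regular files owned by the current user
# under the given directory, without crossing into other filesystems.
owned_bytes() {
    find "$1" -xdev -uid "$(id -u)" -type f -printf '%s\n' 2>/dev/null |
        awk '{ total += $1 } END { print total + 0 }'
}
```

For example, owned_bytes /tmp and owned_bytes /local give the totals for the two quota-limited filesystems.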