Compiler flags
There are optimised libraries for mathematics (libm), linear algebra (BLAS/LAPACK) and Fourier transforms (FFTW3) available on the ALICE system. There are also some compiler flags that can help produce optimised code. The examples below use the C compiler, but the same options also work with the C++ and Fortran compilers.
GNU compilers (gcc/g++/gfortran)
The following optimisation flags are recommended for GCC 12 when compiling code for the standard or large memory compute nodes:
gcc -march=znver2 -O2 ......
This targets the CPU architecture (AMD Rome/Zen 2) used in all ALICE nodes with the exception of the GPU nodes. Code compiled with "-march=znver2" will most likely fail to run on the GPU nodes.
When building for the GPU nodes, use:
gcc -march=skylake -O2 .....
Binaries generated with "-march=skylake" will be able to run on any node in the ALICE cluster, at a small performance cost on the non-GPU nodes.
If strict maths standards compliance is not important for your code, you can also add "-ffast-math -O3" in place of "-O2".
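As an illustration (the source file name my_code.c is just a placeholder), the following commands build one binary optimised for the AMD compute/large memory nodes and a second, more portable binary that will also run on the GPU nodes:
#build optimised for the AMD compute and large memory nodes
gcc -march=znver2 -O2 my_code.c -o my_code_znver2
#portable build that also runs on the Intel GPU nodes
gcc -march=skylake -O2 my_code.c -o my_code_skylake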
Intel oneAPI compilers (icx, icpx, ifort)
The Intel oneAPI compilers are optimised for Intel CPUs. As the ALICE compute and highmem nodes use AMD CPUs, we recommend against using the Intel compilers for code that will run on those systems. The GPU nodes have Intel processors, so code run there may benefit from the Intel compilers.
To obtain the best optimised binary, use:
icx -axCORE-AVX2 -O2 .......
Binaries produced with the above flag will run on all nodes on the cluster.
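For example, to compile a hypothetical source file my_code.c with the Intel C compiler:
icx -axCORE-AVX2 -O2 my_code.c -o my_code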
AMD optimised compilers (AOCC: clang, clang++, flang)
The AMD optimised compilers (AOCC) are relatively new and still under active development. In most cases you should build your code with both these and the GNU/Intel compilers and test to see which produces the fastest code.
To optimise for non-GPU nodes, use the flag "-march=znver2" when compiling, for example:
clang -march=znver2 -O2 my_code.c -o my_code
We recommend against building code for the GPU nodes using the AOCC compilers.
Optimised linear algebra/fast Fourier transform libraries
Generic versions
Generic versions of the OpenBLAS, LAPACK and FFTW3 libraries are available for all compilers. These are not as highly optimised as MKL (for Intel) or AOCL (for AMD), but provide good performance.
After loading the module for the compiler you wish to use, you can run:
module load openblas
module load netlib-lapack
module load fftw3
to load non-threaded versions of these libraries.
These modules set some useful environment variables:
- openblas:
  - BLAS_PATH - the path to the BLAS installation
  - BLAS_CPATH - the location of the BLAS include files
  - BLAS_LIBRARY_PATH - the location of the BLAS libraries
- netlib-lapack:
  - LAPACK_PATH - the path to the LAPACK installation
  - LAPACK_CPATH - the location of the LAPACK include files
  - LAPACK_LIBRARY_PATH - the location of the LAPACK libraries
- fftw3:
  - FFTW3_PATH - the path to the FFTW3 installation
  - FFTW3_CPATH - the location of the FFTW3 include files
  - FFTW3_LIBRARY_PATH - the location of the FFTW3 libraries
These can be used when compiling/linking - for example to compile and link against openblas:
module load openblas
#compile
gcc -I$BLAS_CPATH -c my_app.c
#link
gcc my_app.o -L$BLAS_LIBRARY_PATH -Wl,-rpath=$BLAS_LIBRARY_PATH -lopenblas -o my_app
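The LAPACK and FFTW3 variables work in the same way. As a sketch, compiling and linking a hypothetical my_fft.c against FFTW3 would look like:
module load fftw3
#compile
gcc -I$FFTW3_CPATH -c my_fft.c
#link
gcc my_fft.o -L$FFTW3_LIBRARY_PATH -Wl,-rpath=$FFTW3_LIBRARY_PATH -lfftw3 -lm -o my_fft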
AOCL
The AMD Optimizing CPU Libraries (AOCL) provide alternative implementations of BLAS, LAPACK, FFTW3 and libm which are highly optimised for AMD processors. These libraries are installed as modules and are available when using the AOCC and GCC compilers.
Single-threaded versions are provided by:
module load amdblis # BLAS library
module load amdlibflame # LAPACK library
module load amdfftw # FFTW library
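As a rough sketch of how these might be used (assuming the modules add the install locations to the compiler's header and library search paths; the library names -lflame and -lblis are the usual AOCL ones, and my_app.c is only a placeholder):
module load amdblis amdlibflame
#libflame provides LAPACK and depends on BLIS for BLAS, so list -lflame before -lblis
gcc -march=znver2 -O2 my_app.c -lflame -lblis -lm -o my_app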
Intel MKL
The Intel Math Kernel Library (MKL) provides BLAS, LAPACK and FFTW3 routines optimised for Intel processors. The code it produces can run poorly on AMD processors, so we recommend only using these libraries if running on the GPU nodes (which have Intel processors).
module load intel-oneapi-mkl
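The oneAPI compilers can pull MKL in directly with the "-qmkl" option (my_app.c below is just a placeholder):
module load intel-oneapi-mkl
icx -qmkl -axCORE-AVX2 -O2 my_app.c -o my_app
"-qmkl" links the threaded version of MKL by default; use "-qmkl=sequential" if you want the single-threaded libraries.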