Choosing the Faster Library: OpenBLAS or MKL for AMD CPUs

Introduction

When it comes to scientific computing and matrix operations, BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra PACKage) are the core libraries that provide the basic algorithms. Additionally, there exist many alternative and more powerful implementations (OpenBLAS, the Intel Math Kernel Library (MKL), the AMD Optimizing CPU Libraries (AOCL), etc.) that can speed up matrix operations.

MKL is one of the most famous linear algebra libraries and has become the de facto standard. However, it is often said that MKL is only optimized for Intel CPUs and performs poorly on AMD CPUs, so OpenBLAS and AOCL are preferred for AMD CPUs. On the other hand, some discussions (on Reddit and in benchmark reports) show that newer versions of MKL perform similarly to OpenBLAS on AMD CPUs.

To choose the proper library, I ran a simple performance test based on Python and NumPy. Keep in mind that this test is neither rigorous nor comprehensive. If you want to choose the right library, you should run a benchmark on your own platform, especially for your specific workloads.

Test

1. Platform

  • My laptop was used for the test. It has an AMD Ryzen 7 5800H CPU (8 cores, 16 threads, 45 W TDP) and 32 GB of memory.
  • The operating systems were Windows 10 and WSL1 (Ubuntu 20.04).
  • Python and NumPy versions:

    Environment   OS      Python version   Library
    Env1          Win10   3.6              MKL 2020.2
    Env2          WSL1    3.9              MKL 2021.2
    Env3          WSL1    3.9              OpenBLAS
  1. To install the MKL-based NumPy, simply run conda install numpy.
  2. To install the OpenBLAS-based NumPy, simply run pip install numpy.
  3. After installation, use numpy.show_config() to check whether NumPy was linked against the intended library.
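As a quick sanity check (a minimal sketch; the exact output format varies between NumPy versions), the following prints the build configuration so you can confirm which BLAS backend NumPy is linked against:

```python
import numpy as np

# Prints the libraries NumPy was built and linked against; look for
# "mkl" or "openblas" in the output to confirm the backend in use.
np.show_config()
```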

2. Test code

I used a very simple test script, which only covers matrix multiplication and eigenvalue decomposition.

import numpy as np

size = 1000

# Two random square matrices for the benchmark
a = np.random.randn(size, size)
b = np.random.randn(size, size)

%timeit np.dot(a, b)        # matrix multiplication
%timeit np.linalg.eig(a)    # eigenvalue decomposition
  • The size parameter was set to 1000, 4000, and 10000, respectively.
  • %timeit is a magic function available in Jupyter notebooks (IPython).
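For completeness, the same measurements can be reproduced outside Jupyter with a plain script. This is a rough sketch using time.perf_counter; %timeit performs more repetitions and is statistically more careful:

```python
import time
import numpy as np

def best_time(fn, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

size = 1000
a = np.random.randn(size, size)
b = np.random.randn(size, size)

print(f"matmul: {best_time(lambda: np.dot(a, b)):.4f} s")
print(f"eig:    {best_time(lambda: np.linalg.eig(a)):.4f} s")
```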

3. Performance comparison

Environment   size    Matrix Multiplication   Eigenvalue Decomposition
Env1          1000    17 ms                   600 ms
Env1          4000    700 ms                  21.5 s
Env1          10000   10.3 s                  too long to test
Env2          1000    14.2 ms                 586 ms
Env2          4000    609 ms                  21.9 s
Env2          10000   11.4 s                  too long to test
Env3          1000    10.1 ms                 738 ms
Env3          4000    614 ms                  26.4 s
Env3          10000   8.24 s                  too long to test
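To put the multiplication times in perspective: multiplying two n×n matrices costs roughly 2n³ floating-point operations, so each table entry implies a throughput that is easy to compute:

```python
def gflops(n, seconds):
    """Approximate GFLOP/s for an n x n matrix multiplication."""
    return 2 * n**3 / seconds / 1e9

# Example: Env3 (OpenBLAS), size 10000, measured 8.24 s
print(f"{gflops(10_000, 8.24):.0f} GFLOP/s")  # -> 243 GFLOP/s
```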

From the table above, we can conclude that

  1. With OpenBLAS, matrix multiplication is faster than with MKL.
  2. With OpenBLAS, eigenvalue decomposition is slower than with MKL.
  3. The performance differences are not large, so the proper library should be chosen based on the specific project (whether it contains mostly multiplication or decomposition operations).
  4. I also tried the mkl_serv_intel_cpu_true workaround to force MKL to use its best-performing routines. However, the results before and after applying it were no different.

Another observation: when using OpenBLAS, CPU usage was nearly 80%, but only around 50% when using MKL. That might be because MKL cannot make full use of AMD CPUs.
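One thing worth ruling out when comparing CPU usage is the size of the thread pool each backend chooses. Both libraries honor environment variables read once at load time, so they must be set before NumPy is imported (a sketch; the variable names are the commonly documented ones):

```python
import os

# Must be set BEFORE importing numpy, since the BLAS backend reads
# these variables only once, when the library is loaded.
os.environ["OMP_NUM_THREADS"] = "16"        # generic OpenMP control
os.environ["MKL_NUM_THREADS"] = "16"        # MKL-specific override
os.environ["OPENBLAS_NUM_THREADS"] = "16"   # OpenBLAS-specific override

import numpy as np  # import after setting the variables
```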

Conclusion

Using OpenBLAS makes matrix multiplication faster but eigenvalue decomposition slower. The performance differences are not significant.

In my opinion, the newer versions of MKL provide more functionality with similar performance, which makes MKL a reasonable choice even on AMD CPUs.

Next, I will try to compare more software built on MKL and OpenBLAS, such as Eigen and Armadillo.