OpenBLAS vs Intel MKL for NumPy on AMD CPUs

Choose the faster library, OpenBLAS or MKL, for AMD CPUs.

Introduction

When it comes to scientific computing or matrix operations, BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra PACKage) are the core libraries that provide the basic algorithms. In addition, there are many optimized implementations, such as OpenBLAS, the Intel Math Kernel Library (MKL), and the AMD Optimizing CPU Libraries (AOCL), that can speed up matrix operations.

MKL is one of the best-known linear algebra libraries and is becoming a de facto standard. However, it is often said that MKL is only optimized for Intel CPUs and performs poorly on AMD CPUs, so OpenBLAS and AOCL are usually preferred there. Some discussions (reddit, report), on the other hand, show that newer versions of MKL perform similarly to OpenBLAS on AMD CPUs.

To choose the proper library, I ran a simple performance test based on Python NumPy. Keep in mind that the test is neither rigorous nor comprehensive. If you need to choose between the libraries, run a benchmark on your own platform and for your own workload.

Test

1. Platform

My laptop was used for the test. It has an AMD Ryzen 5800H CPU (8 cores and 16 threads, TDP 45 W) and 32 GB of memory.
The operating systems were Windows 10 and WSL1 (Ubuntu 20.04).
Python versions and NumPy backends:

Environment	OS	Python version	Library
Env1	Win10	3.6	MKL 2020.2
Env2	WSL1	3.9	MKL 2021.2
Env3	WSL1	3.9	OpenBLAS

To install the MKL-based NumPy, simply run conda install numpy (with the default Anaconda channel).
To install the OpenBLAS-based NumPy, simply run pip install numpy.
After installation, call numpy.show_config() to check which BLAS/LAPACK library NumPy was built against.
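The check can also be scripted. The following is a minimal sketch, not part of the original test: it captures the text printed by numpy.show_config() and searches it for the library names. The helper name detect_blas_backend is made up for illustration, and the exact output format of show_config() differs between NumPy versions, so treat the result as a hint only.

```python
import contextlib
import io

import numpy as np


def detect_blas_backend():
    """Guess the BLAS/LAPACK backend from the text printed by np.show_config().

    Hypothetical helper: a plain substring search, so it is only a heuristic.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()  # prints the build configuration to stdout
    text = buffer.getvalue().lower()
    for name in ("mkl", "openblas"):
        if name in text:
            return name
    return "unknown"


backend = detect_blas_backend()
print("NumPy", np.__version__, "appears to use:", backend)
```

If the function returns "unknown", read the full show_config() output by hand.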

2. Test code

I used a very simple test script that only covers matrix multiplication and eigenvalue decomposition.

import numpy as np

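size = 1000  # matrix dimension (size x size)

# Two random square matrices drawn from a standard normal distribution
a = np.random.randn(size, size)
b = np.random.randn(size, size)

# IPython magics: time the matrix product and the eigenvalue decomposition
%timeit np.dot(a, b)
%timeit np.linalg.eig(a)

The size parameter was set to 1000, 4000, and 10000 in turn.
%timeit is an IPython magic command that is available in Jupyter notebooks.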
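Outside of Jupyter, roughly the same measurement can be made with the standard timeit module. This is a sketch under my own assumptions rather than the script used for the numbers below: the function names best_time and run_benchmark are hypothetical, the sizes are kept small so that it finishes quickly, and it reports the best of a few runs, whereas %timeit reports a mean and standard deviation.

```python
import timeit

import numpy as np


def best_time(func, repeat=3, number=1):
    """Return the best wall-clock time in seconds per call over `repeat` runs."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def run_benchmark(size):
    """Time matrix multiplication and eigenvalue decomposition for one size."""
    a = np.random.randn(size, size)
    b = np.random.randn(size, size)
    return {
        "matmul": best_time(lambda: np.dot(a, b)),
        "eig": best_time(lambda: np.linalg.eig(a)),
    }


if __name__ == "__main__":
    # Small sizes for a quick run; the article used 1000, 4000, and 10000.
    for size in (200, 400):
        result = run_benchmark(size)
        print(f"size={size}: matmul {result['matmul']:.4f} s, eig {result['eig']:.4f} s")
```

To reproduce the article's setup, replace the sizes with 1000, 4000, and 10000 and expect the eigenvalue decomposition to take a long time at the largest size.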

3. Performance comparison

Env	Size	Matrix multiplication	Eigenvalue decomposition
Env1	1000	17 ms	600 ms
Env1	4000	700 ms	21.5 s
Env1	10000	10.3 s	Too long to test
Env2	1000	14.2 ms	586 ms
Env2	4000	609 ms	21.9 s
Env2	10000	11.4 s	Too long to test
Env3	1000	10.1 ms	738 ms
Env3	4000	614 ms	26.4 s
Env3	10000	8.24 s	Too long to test
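To put the matrix multiplication timings on a common scale, they can be converted to an approximate throughput. The sketch below assumes the usual operation count of about 2n^3 floating-point operations for multiplying two n x n matrices; the helper name gflops is made up, and the timings are copied from the table.

```python
# Matrix multiplication timings in seconds, copied from the table above.
matmul_seconds = {
    "Env1 (MKL 2020.2)": {1000: 0.017, 4000: 0.700, 10000: 10.3},
    "Env2 (MKL 2021.2)": {1000: 0.0142, 4000: 0.609, 10000: 11.4},
    "Env3 (OpenBLAS)": {1000: 0.0101, 4000: 0.614, 10000: 8.24},
}


def gflops(size, seconds):
    """Approximate throughput, assuming 2 * size**3 floating-point operations."""
    return 2 * size**3 / seconds / 1e9


for env, timings in matmul_seconds.items():
    for size, seconds in timings.items():
        print(f"{env:18s} n={size:5d}  {gflops(size, seconds):7.1f} GFLOP/s")
```

At size 10000, for example, this works out to roughly 243 GFLOP/s for OpenBLAS against about 175 GFLOP/s for MKL 2021.2.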

From the table above, we can conclude that:

With OpenBLAS, matrix multiplication is faster than with MKL at sizes 1000 and 10000, and about the same at size 4000 (614 ms vs. 609 ms for MKL 2021.2).
With OpenBLAS, eigenvalue decomposition is slower than with MKL, by roughly 20 to 26% in this test.
The performance differences are modest, so the choice should depend on the project, that is, on whether it is dominated by multiplications or by decompositions.
I also tried the mkl_serv_intel_cpu_true workaround, which is meant to force MKL to use its best-performing routines. However, the results with and without it were no different.

Another observation is that CPU usage was nearly 80% with OpenBLAS but only around 50% with MKL. This might be because MKL could not make full use of the AMD CPU.
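One way to check whether the gap in CPU usage comes from the default thread count rather than from the kernels themselves is to pin the number of threads for both libraries and rerun the test. This was not part of the original test. The sketch assumes the commonly used environment variables OMP_NUM_THREADS, MKL_NUM_THREADS, and OPENBLAS_NUM_THREADS, and a hypothetical value of 8 threads (one per physical core of the 5800H).

```python
import os

# Hypothetical choice: one thread per physical core.
NUM_THREADS = "8"

# The BLAS library reads these variables when it is loaded, so they are set
# before NumPy is imported.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = NUM_THREADS

import numpy as np  # imported only after the environment is configured

a = np.random.randn(500, 500)
b = np.random.randn(500, 500)
c = np.dot(a, b)
print(c.shape)
```

If both libraries show similar CPU usage once the thread count is fixed, the difference above is more likely a matter of defaults than of AMD support.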

Conclusion

Using OpenBLAS makes matrix multiplication faster but eigenvalue decomposition slower. The performance differences are not large.

In my opinion, the newer version of MKL provides more functions and similar performance, which makes it suitable for AMD CPUs.

Next, I will try to compare more software built on MKL and OpenBLAS, such as Eigen and Armadillo.
