Friday, August 12, 2016

Configure Theano to use Intel MKL on Windows

To use convolution layers and FFT, Theano requires either a BLAS library or CUDA. On computers without an Nvidia graphics card, BLAS is the only choice.

There are three popular BLAS implementations: Intel MKL, OpenBLAS and ATLAS. A benchmark shows that ATLAS is the slowest, while MKL and OpenBLAS are on par. I could not get OpenBLAS to work: I kept getting the error "DLL load failed: The specified module could not be found" even after ensuring that all the dependent DLLs were on my PATH.

Getting the MKL library

  1. Before Intel lets you download MKL, you have to register at https://software.intel.com/en-us/intel-mkl
  2. You will receive an email containing a link to the download page for various Intel libraries. Download the Math Kernel Library (MKL) installer.
  3. Run the installer.

Configuration

  1. By default, MKL is installed into C:\Program Files (x86)\IntelSWTools. I could not figure out how to escape the spaces in that path for gcc, so use Link Shell Extension to create a symbolic link at C:\tools\IntelSWTools that points to C:\Program Files (x86)\IntelSWTools.
  2. Edit C:\Users\.theanorc and add the following:
    [blas]
    ldflags = -LC:/tools/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win -LC:/tools/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win_mic  -lmkl_core -lmkl_intel_thread -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_rt
  3. Add C:\tools\IntelSWTools\compilers_and_libraries\windows\redist\intel64_win\mkl to your system PATH environment variable.
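
The configuration steps can be double-checked with a short script. The following is only a sketch: the helper names build_blas_section and find_on_path are made up for illustration, and the DLL name mkl_rt.dll is my assumption about what the redist folder contains. It rebuilds the [blas] section from the MKL root used above, refuses a root that still contains spaces, and lists the PATH directories that hold a given DLL.

```python
import os

# Library subdirectories and names copied from the ldflags line above.
MKL_LIB_SUBDIRS = [
    "compilers_and_libraries/windows/mkl/lib/intel64_win",
    "compilers_and_libraries/windows/mkl/lib/intel64_win_mic",
]
MKL_LIBS = ["mkl_core", "mkl_intel_thread", "mkl_lapack95_lp64",
            "mkl_blas95_lp64", "mkl_rt"]


def build_blas_section(mkl_root, lib_subdirs=MKL_LIB_SUBDIRS, libs=MKL_LIBS):
    """Return the text of a [blas] section for .theanorc (hypothetical helper)."""
    # Use forward slashes, as in the ldflags line above.
    root = mkl_root.replace("\\", "/").rstrip("/")
    if " " in root:
        # Spaces in the path are the problem the symbolic link works around.
        raise ValueError("MKL root contains spaces: " + root)
    flags = ["-L{}/{}".format(root, sub) for sub in lib_subdirs]
    flags += ["-l{}".format(name) for name in libs]
    return "[blas]\nldflags = " + " ".join(flags) + "\n"


def find_on_path(filename, path=None):
    """Return every directory on PATH that contains filename (hypothetical helper)."""
    if path is None:
        path = os.environ.get("PATH", "")
    return [d for d in path.split(os.pathsep)
            if d and os.path.isfile(os.path.join(d, filename))]


if __name__ == "__main__":
    print(build_blas_section("C:\\tools\\IntelSWTools"))
    # Assumed DLL name; an empty list means no PATH entry contains it.
    print(find_on_path("mkl_rt.dll"))
```

If find_on_path returns an empty list, the PATH entry from step 3 is probably missing or misspelled.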

Checking that it works

  1. Open a command prompt and cd to \misc\
  2. Run python.exe check_blas.py
  3. The output should include something like the following:
    We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).

    Total execution time: 18.00s on CPU (with direct Theano binding to blas).

    Try to run this script a few times. Experience shows that the first time is not as fast as followings calls. The difference is not big, but consistent.
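
Since the script suggests running it a few times, it can help to pull the timing out of each run and compare the numbers. The snippet below is a minimal sketch that assumes the timing line has exactly the format quoted above; parse_execution_time is a made-up helper, not part of Theano.

```python
import re

# Matches the timing line printed by check_blas.py, as quoted above.
TIME_PATTERN = re.compile(r"Total execution time:\s*([0-9]+(?:\.[0-9]+)?)s")


def parse_execution_time(output):
    """Return the total execution time in seconds, or None if no timing line is found."""
    match = TIME_PATTERN.search(output)
    return float(match.group(1)) if match else None


sample = "Total execution time: 18.00s on CPU (with direct Theano binding to blas)."
print(parse_execution_time(sample))  # prints 18.0
```

Saving the output of each run to a text file and feeding it to this function gives one number per run to compare.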
