https://github.com/Theano/Theano/issues/5348

http://deeplearning.net/software/theano/install_windows.html

This installation has been a nightmare! So I took copious notes and attempted to minimize the
installs needed. I am a complete newbie Python programmer, so I figured that if I could get this to
work, maybe it'll help other people and maybe they'll include it in the documentation.

References:

Theano Documentation

Zero to Lasagne

Neural-Style-Transfer Git

Conda Managing Environments

GPU-accelerated Deep Learning on Windows 10 native Git

Tutorial: Theano install on Windows 7, 8, 10

Making Theano Faster with CuDNN and CNMeM on Windows 10

1. Download and install Anaconda. DO NOT install in the default directory for now. The
.theanorc file WILL NOT WORK IF THERE ARE SPACES in the directory names pointing to
the library. Future versions (after theano 0.8.2) may support spaces. Using python 3.4
allows use of nolearn (lasagne) and pydot-ng (keras). I downloaded the 64-bit installer:

https://www.continuum.io/downloads

Optional (I did not try this): downgrade to Python 3.4:


conda install python==3.4.4
conda update --all

Note: It is important to stay consistent with your 32 vs 64 bit install throughout the process or it
may lead to errors.
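If you are not sure which you have, a quick way to check the bitness of the Python you are running (a small sketch using only the standard library) is:

import struct, platform
print(struct.calcsize("P") * 8)  # prints 64 for a 64-bit Python, 32 for a 32-bit one
print(platform.architecture())   # e.g. ('64bit', 'WindowsPE')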

2. Create a new environment for your project. The python parameter sets the Python version,
and the "anaconda" after the environment name means the new environment includes all
the core packages. Open the "Anaconda Prompt" and enter the command below
corresponding to the version you want:

conda create -n env_name35 anaconda python=3.5


conda create -n env_name34 anaconda python=3.4

3. Change to the environment of choice by closing the prompt and opening the
corresponding prompt, e.g. "Anaconda Prompt (env_name35)". Install theano with the
following command:
conda install theano
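To confirm the install worked before going further, you can run a tiny check script from the environment's prompt (the file name is just an example):

# save as check_theano.py and run with: python check_theano.py
import theano
print(theano.__version__)  # e.g. 0.8.2 at the time of writing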

4. Download check_blas.py:

https://raw.githubusercontent.com/Theano/Theano/master/theano/misc/check_blas.py

Run it by opening the command prompt for your new environment, changing to the directory you
saved it in, and typing:

python check_blas.py
It will say something like:

Total execution time: 25.74s on CPU (without direct Theano binding to blas but with
numpy/scipy binding to blas).

5. In order to bind theano to BLAS, create a file called .theanorc and place it in the C:\Users\
[Your_User] directory and have it say the following:

[global]
floatx = float32

[blas]
ldflags = -LC:\Anaconda3\Library\bin -lmkl_rt
This binds Theano to the MKL BLAS library that is included with Anaconda and was copied into your
environment in step 2. mkl_rt.dll is the library, and the path after -L is the bin directory that
contains it. Other install instructions often mention OpenBLAS, but MKL BLAS is already included
and is about 30% faster. Without binding to BLAS you can't use certain layers, e.g. convolutional
layers, and you can get errors like:

AssertionError: AbstractConv2d Theano optimization failed: there is no implementation
available supporting the requested options. Did you exclude both "conv_dnn" and "conv_gemm"
from the optimizer? If on GPU, is cuDNN available and does the GPU support it? If on CPU,
do you have a BLAS library installed Theano can link against?
The floatx setting also limits calculations to float32, which is where most of the speed
improvement comes from. Now when you run check_blas it should be faster, and it should say that
Theano is binding to BLAS:

Total execution time: 11.77s on CPU (with direct Theano binding to blas).
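If you want to double-check what Theano actually picked up from .theanorc, a small sketch like this prints the relevant config values (the exact path in the output depends on where Anaconda is installed):

import theano
print(theano.config.floatX)        # should print float32
print(theano.config.blas.ldflags)  # should print -LC:\Anaconda3\Library\bin -lmkl_rt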

6. Now we need to install the Microsoft C++ compiler. You can get it here:

https://www.visualstudio.com/free-developer-offers/

Download the Visual Studio Community edition. You only need to install:

Programming Languages/Visual C++/Common Tools for Visual C++ 2015


I did not install anything from the other sections, including the Windows and Web Development
section.

7. Now you can install CUDA from NVidia here:

https://developer.nvidia.com/cuda-downloads

I only installed the CUDA/development, visual studio integration and runtime files sections.

8. Add the following lines to .theanorc:


[nvcc]
flags=--cl-version=2015 -D_FORCE_INLINES
If you do not include the cl-version flag, you get the error:

nvcc fatal : nvcc cannot find a supported version of Microsoft Visual Studio. Only the
versions 2010, 2012, and 2013 are supported
The -D_FORCE_INLINES part works around an Ubuntu bug, although I'm not sure it's necessary
anymore. It can help prevent this error:

WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu0 is not available (error:
cuda unavailable)
Note: This error seems to also show if the g++ version is too new for the CUDA version.

Some install guides suggest adding a --use-local-env flag as well, but with this setup it leads to the
error:

c:\program files\nvidia gpu computing toolkit\cuda\v8.0\include\host_config.h(203): fatal error C1083: Cannot open include file: 'crtdefs.h': No such file or directory
mod.cu
That flag does help with older Visual C++ installs that do not use the Windows Kits layout.

9. Nvcc was unable to see the Visual Studio include and library files with the new Kits
format. To have it see the includes and avoid the error:

LINK : fatal error LNK1104: cannot open file stdio.h


and see the library files and not get the errors:

LINK : fatal error LNK1104: cannot open file libucrt.lib


and

LINK : fatal error LNK1104: cannot open file uuid.lib


Add a new environment variable called INCLUDE by going to:

My Computer / Right-click / Properties / Advanced System Settings / Environment Variables / System Variables - New
Fill in the Variable Value:

C:\Program Files (x86)\Windows Kits\10\Include\10.0.14393.0\um;C:\Program Files (x86)\Windows Kits\10\Include\10.0.14393.0\ucrt
Add a second new environment variable called LIB and fill in the Variable Value:

C:\Program Files (x86)\Windows Kits\10\Lib\10.0.14393.0\ucrt\x64;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.14393.0\um\x64
Make sure you are using the highest-numbered 10.0.x version installed on your machine (10.0.14393.0 here).

10. While in the environment variables section, find the PATH variable in the system variables
and add the mingw-w64 directory:

C:\Anaconda3\envs\env_name35\Library\mingw-w64\bin
Otherwise you will get a g++ or MinGW error, e.g.:

WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute
optimized C-implementations (for both CPU and GPU) and will default to Python
implementations. Performance will be severely degraded. To remove this warning, set Theano
flags cxx to an empty string.
And make sure all of the following are also present in the path:
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64
C:\Anaconda3
C:\Anaconda3\Scripts
C:\Anaconda3\Library\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\libnvvp
Without the first line you can get the error:

nvcc : fatal error : Cannot find compiler 'cl.exe' in PATH

11. Restart the computer to allow the path changes created by the install to take effect. Now
when you run check_blas it should be faster:

Total execution time: 9.83s on CPU (with direct Theano binding to blas).
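After the restart you can also confirm from Python that the compilers Theano and nvcc need are visible on PATH. This is a small standard-library sketch; a None result means that executable still isn't found:

import shutil
for exe in ("g++", "cl", "nvcc"):
    print(exe, "->", shutil.which(exe))  # None means it is not on PATH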

12. Now add the g++ information to .theanorc: the cxx flag in the global section and the gcc
section. MinGW-w64 was included in the theano install, so all you have to do is point to it. You
can also throw in the FAST_RUN mode. The .theanorc file should now look like this:

[global]
floatx = float32
cxx = C:\Anaconda3\envs\env_name35\Library\mingw-w64\bin\g++.exe
mode = FAST_RUN

[blas]
ldflags = -LC:\Anaconda3\Library\bin -lmkl_rt

[gcc]
cxxflags = -LC:\Anaconda3\envs\env_name35\Library\mingw-w64\include
-LC:\Anaconda3\envs\env_name35\Library\mingw-w64\lib -lm

[nvcc]
flags=--cl-version=2015 -D_FORCE_INLINES
13. CPU ONLY: Update .theanorc to force use of the CPU if you do not have a GPU, i.e. change it to
read:

[global]
floatx = float32
cxx = C:\Anaconda3\envs\env_name35\Library\mingw-w64\bin\g++.exe
mode = FAST_RUN
force_device = True
device = cpu
Without the force_device you can get this error:

ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status...


14. GPU ONLY: Update .theanorc to change the global device setting from cpu to gpu if you have a
GPU, i.e. change it to read:

[global]
floatx = float32
cxx = C:\Anaconda3\envs\env_name35\Library\mingw-w64\bin\g++.exe
mode = FAST_RUN
device = gpu
Now when you run it you'll see another big increase in speed:

Total execution time: 0.75s on GPU.
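Beyond the timing, a more direct way to confirm the GPU is actually being used is the small test script from the Theano documentation's "Using the GPU" page (reproduced here from memory, so treat it as a sketch):

from theano import function, config, shared
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x #threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
# if any op in the compiled graph is still a plain (CPU) Elemwise, the GPU was not used
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print("Used the cpu")
else:
    print("Used the gpu")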


15. GPU ONLY: I signed up as an NVIDIA developer and downloaded cuDNN, which is a set of files
that speeds up convnets, and copied the files into the matching bin/include/lib directories under
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0. In order for cuDNN to work you need
to enable it, or you get the following error:

Using gpu device 0: GeForce GTX 1060 (CNMeM is disabled, CuDNN not available)
You can enable it with the following in .theanorc:

[dnn]
enabled=True
and now it should say:

Using gpu device 0: GeForce GTX 1060 (CNMeM is disabled, cuDNN 5105)
I enabled CNMeM with 70% GPU memory usage by adding this to .theanorc:

[lib]
cnmem=0.70
And now it says:

Using gpu device 0: GeForce GTX 1060 (CNMeM is enabled with initial size: 70.0% of memory,
cuDNN 5105)
Note: You don't need the dnn section if you enable CNMeM, since it will automatically use cuDNN if
both are available. The 3 options for CNMeM are:

0: not enabled
0 < N <= 1: use this fraction of the total GPU memory (clipped to .95 for driver memory); this should be a float value, for instance 0.25 or 1.0
> 1: use this number of megabytes (MB) of memory
If the fraction is set too high you may get the error below. Either close open scripts or reduce the
fraction:

ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
initCnmem: cnmemInit call failed! Reason=CNMEM_STATUS_OUT_OF_MEMORY. numdev=1
And if that keeps happening, you can just use the dnn option and skip CNMeM.
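You can also ask Theano directly whether it can see cuDNN. This is a sketch against the old sandbox.cuda backend used throughout this guide (run it with device = gpu set); I believe the .msg attribute holds the reason when cuDNN is unusable, but treat that part as an assumption:

from theano.sandbox.cuda import dnn
print(dnn.dnn_available())    # True if cuDNN was found and is usable
print(dnn.dnn_available.msg)  # explanation when it is not available (assumption: set after the call)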

16. GPU ONLY: I also tried the cuBLAS library by copying the file cublas64_80.dll from C:\Program
Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin into the C:\Anaconda3\Library\bin directory,
using it instead of MKL BLAS (mkl_rt.dll), and changing the blas section in .theanorc to:

[blas]
ldflags = -LC:\Anaconda3\Library\bin -lcublas64_80
Note: Do not use this if you are only using the CPU and not the GPU. Stick with MKL BLAS (mkl_rt).

Steps 15-16 are supposed to increase performance, but I got no improvement in speed at all.
NVIDIA says cuBLAS should be 6-17x faster, so I'm just leaving it that way in case it makes a
difference for bigger networks.

17. Install PyCharm

https://www.jetbrains.com/pycharm/download/#section=windows

18. Open PyCharm, on the bottom right press Configure/Settings and then go to Project
Interpreter. Press the down arrow next to Project Interpreter and press Show All. Press the
+ sign and Add Local. Navigate to and select python.exe in your virtual environment:

C:\Anaconda3\envs\env_name35\python.exe
If you're like me and like undo/redo buttons in a toolbar then you can check View/Toolbar and
View/Tool Buttons. You can then customize the toolbar with a right-click on it and select
Customize Menus and Toolbars.

19. Install keras and/or lasagne and program away. If installing keras breaks check_blas.py, then
update all the libraries with:

conda upgrade --all
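To check that keras really ended up on the Theano backend (keras chooses its backend via its keras.json config file under the .keras folder in your user directory), a minimal sketch is:

import keras.backend as K
print(K.backend())  # should print 'theano'; if not, edit keras.json

import lasagne
print(lasagne.__version__)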


Good luck!!

For reference the final GPU .theanorc looks like this:

[global]
floatx = float32
device = gpu
mode = FAST_RUN
cxx = C:\Anaconda3\envs\env_name35\Library\mingw-w64\bin\g++.exe

[blas]
ldflags = -LC:\Anaconda3\Library\bin -lcublas64_80

[gcc]
cxxflags = -LC:\Anaconda3\envs\env_name35\Library\mingw-w64\include
-LC:\Anaconda3\envs\env_name35\Library\mingw-w64\lib -lm

[nvcc]
flags=--cl-version=2015 -D_FORCE_INLINES

[dnn]
enabled=True

[lib]
cnmem=0.70
The final CPU .theanorc looks like this:

[global]
floatx = float32
force_device = True
device = cpu
mode = FAST_RUN
cxx = C:\Anaconda3\envs\env_name35\Library\mingw-w64\bin\g++.exe

[blas]
ldflags = -LC:\Anaconda3\Library\bin -lmkl_rt

[gcc]
cxxflags = -LC:\Anaconda3\envs\env_name35\Library\mingw-w64\include
-LC:\Anaconda3\envs\env_name35\Library\mingw-w64\lib -lm

[nvcc]
flags=--cl-version=2015 -D_FORCE_INLINES
And the list of installation files I used is:

Anaconda3-4.2.0-Windows-x86_64.exe
vs_community.exe (Programming Languages/Visual C++/Common Tools for Visual C++ 2015)
cuda_8.0.44_win10.exe (CUDA/development, visual studio integration and runtime files
sections)
pycharm-community-2016.3.1.exe
cudnn-8.0-windows10-x64-v5.1.zip (Optional and requires login with NVidia)
