Edit 2017-12-22: I’ve updated the guide for CUDA 9.1 and CuDNN 7.
Installing & Configuring:
- GPU-enabled tensor frameworks and classical data science software.
- PyTorch is a deep learning library from Facebook focused on research. It uses dynamic graphs. I find PyTorch much easier to work with than TensorFlow.
- TensorFlow is a deep learning library from Google, suitable for production and research. It uses static graphs and has good support for embedded devices and production deployments, but I find it trickier to work with and debug.
- Anaconda includes classical statistical learning and data science tools like numpy, scipy, scikit-learn, pandas and many others. Anaconda also has virtual-environment-like capabilities for managing dependencies across projects.
- Convenience tweaks for remote access.
- Configuring Jupyterhub for remote Jupyter programming.
GPU Drivers & Configuration
First, install common dependencies using the apt-get package manager.
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    git \
    libfreetype6-dev \
    libpng12-dev \
    libzmq3-dev \
    pkg-config \
    software-properties-common \
    swig \
    zip \
    zlib1g-dev \
    libcurl3-dev \
    wget \
    python3-pip \
    python3-dev \
    python-pip \
    python-dev \
    python-virtualenv \
    libcupti-dev \
    vim-nox
Install latest GPU drivers
# Add NVIDIA's graphics ppa repository
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update # (re-run if any warning/error messages)
sudo apt-get install nvidia- # Press tab after nvidia- to see latest. Do not use 378, it causes login loops.
# 384 was the latest driver as of time of writing.
sudo apt-get install nvidia-384
Check installation by running nvidia-smi.
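One way to script the driver check is to ask nvidia-smi for just the GPU name and driver version. This is a minimal sketch rather than part of the original guide: `check_driver` is a hypothetical helper name, and it assumes nvidia-smi was installed alongside the driver package.

```shell
# check_driver: print GPU name and driver version, or a hint if the tool is missing.
# (check_driver is an illustrative name, not something the driver package provides.)
check_driver() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
  else
    echo "nvidia-smi not found: the driver is not installed, or a reboot is needed"
    return 1
  fi
}

check_driver || true
```

If the driver version printed here is not the one you just installed, a reboot is usually the first thing to try.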
Install NVIDIA’s CUDA 9.1
CUDA is an API that lets deep learning frameworks do GPU computations.
wget "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb"
sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda

# check version
cat /usr/local/cuda/version.txt

vi ~/.bashrc
# add the following to the bottom of your bashrc
# export PATH="/usr/local/cuda-9.1/bin/:$PATH"
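If you would rather not edit ~/.bashrc by hand, the PATH line can be appended from the shell. The sketch below assumes a hypothetical helper, `append_once`, that only adds the line when it is not already present, so re-running it does not pile up duplicates.

```shell
# append_once LINE FILE: add LINE to FILE only if that exact line is not already there.
# (append_once is an illustrative helper, not a standard command.)
append_once() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $PATH literal, so it expands when bashrc is sourced):
#   append_once 'export PATH="/usr/local/cuda-9.1/bin/:$PATH"' ~/.bashrc
#   source ~/.bashrc
```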
Check installation by running nvcc --version. You should see:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
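To check the release number without reading the output by eye, something like the following could work. It is a sketch under the assumption that the last line keeps the "release X.Y" wording shown above; `cuda_release` is a hypothetical helper name.

```shell
# cuda_release: read `nvcc --version` output on stdin and print the release, e.g. 9.1.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
  release="$(nvcc --version | cuda_release)"
  if [ "$release" = "9.1" ]; then
    echo "CUDA $release toolkit is on PATH"
  else
    echo "found CUDA release '$release', expected 9.1"
  fi
else
  echo "nvcc is not on PATH; check the export line in ~/.bashrc"
fi
```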
Install CuDNN 7
The NVIDIA CUDA Deep Neural Network library (cuDNN) provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
Traditionally, you are instructed to sign up on NVIDIA’s website and agree to their terms, which is a pain in the ass. This guide assumes you’ve already done that, just like in their docker images.😀
# become root
sudo su
echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list
CUDNN_VERSION="22.214.171.124"
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
    libcudnn7=$CUDNN_VERSION-1+cuda9.1 \
    libcudnn7-dev=$CUDNN_VERSION-1+cuda9.1

# move files where TF expects them
ls -lah /usr/local/cuda/lib64/*
mkdir -p /usr/lib/x86_64-linux-gnu/include/
ln -s /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/include/cudnn.h
ln -s /usr/include/cudnn.h /usr/local/cuda/include/cudnn.h
ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so /usr/local/cuda/lib64/libcudnn.so
# the libcudnn7 package ships libcudnn.so.7
ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.7 /usr/local/cuda/lib64/libcudnn.so.7

# confirm your version
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
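The grep above prints three raw #define lines. If you want a single version string instead, a small helper along these lines might do. It is a sketch that assumes cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers; `cudnn_version` is a hypothetical name.

```shell
# cudnn_version HEADER: print major.minor.patch taken from the #define lines in HEADER.
cudnn_version() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { print major "." minor "." patch }
  ' "$1"
}

header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
  echo "cuDNN $(cudnn_version "$header")"
else
  echo "$header not found; check the symlinks"
fi
```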
Anaconda is a package manager, a virtual-environment manager, and a collection of common data science tools rolled into one.
wget https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh
bash Anaconda3-4.4.0-Linux-x86_64.sh
# follow the install prompts

# restart your bash session
exec -l $SHELL

# check to make sure python is anaconda
which python
# should return $HOME/anaconda3/bin/python

# install pip via conda
conda install pip
# to update
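The "which python" check can also be scripted, which is handy when you set up more than one box. A rough sketch, assuming the default install location of ~/anaconda3; `is_anaconda_python` is a hypothetical helper name.

```shell
# is_anaconda_python PATH: succeed if PATH points at a python inside an anaconda3 directory.
is_anaconda_python() {
  case "$1" in
    */anaconda3/bin/python*) return 0 ;;
    *) return 1 ;;
  esac
}

py="$(command -v python || true)"
if is_anaconda_python "$py"; then
  echo "python resolves to Anaconda: $py"
else
  echo "python resolves to '$py'; is ~/anaconda3/bin on your PATH?"
fi
```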
conda install pytorch torchvision cuda90 -c pytorch

# clone the examples repository to test
git clone https://github.com/pytorch/examples $HOME/pytorch-examples
cd $HOME/pytorch-examples/mnist
python main.py
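Before waiting on the MNIST example, it can be quicker to ask PyTorch directly whether it sees the GPU. A minimal sketch: `check_torch` is a hypothetical wrapper, and it assumes the `python` on your PATH is the Anaconda one that has torch installed.

```shell
# check_torch: report the torch version and whether CUDA is usable from this python.
check_torch() {
  if python -c "import torch" >/dev/null 2>&1; then
    python -c "import torch; print('torch', torch.__version__); print('CUDA available:', torch.cuda.is_available())"
  else
    echo "torch is not importable from $(command -v python || echo 'python (not found)')"
  fi
}

check_torch
```

If this prints "CUDA available: False", the driver and CUDA checks earlier in the guide are the first things to revisit.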
pip install tensorflow-gpu --upgrade

# from $HOME
git clone https://github.com/tensorflow/tensorflow.git $HOME/tensorflow

# run an example to test
python $HOME/tensorflow/tensorflow/examples/tutorials/mnist/fully_connected_feed.py
(Optional) Make remoting great again
Things I do to make my remote day-to-day easier.
Secure remote access over SSH
On your machine learning machine.
sudo apt-get install openssh-server
On your development machine, where bdd is your username on the remote machine and mlbox.bdd.io is the hostname or IP address of that server:
ssh bdd@mlbox.bdd.io
# optionally, drop your public key on that server
ssh-copy-id mlbox.bdd.io
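To avoid typing the username and hostname each time, you could add a Host alias to your SSH config. The sketch below is an assumption-laden example: the alias `mlbox` and the helper `add_mlbox_alias` are made up for illustration, and the helper appends blindly, so run it once.

```shell
# add_mlbox_alias [CONFIG]: append a Host entry so `ssh mlbox` means bdd@mlbox.bdd.io.
# CONFIG defaults to ~/.ssh/config. The alias name "mlbox" is illustrative.
add_mlbox_alias() {
  local config="${1:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config")"
  cat >> "$config" <<'EOF'
Host mlbox
    HostName mlbox.bdd.io
    User bdd
EOF
  chmod 600 "$config"
}

# Usage:
#   add_mlbox_alias
#   ssh mlbox
```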
Remote file-system over ssh
Install sshfs. If your main machine is a Mac, you can use brew and simply do brew install sshfs.
mkdir -p ~/mlbox # the mount point must exist
sshfs mlbox.bdd.io:/home/bdd/ ~/mlbox
~/mlbox will now point to the remote file system.
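Mounting is only half of it; at some point you will want to unmount too. A small sketch with two hypothetical helpers, `mount_mlbox` and `unmount_mlbox`, assuming the same host and paths as above. The `reconnect` option asks sshfs to re-establish the connection if it drops.

```shell
# mount_mlbox: mount the remote home directory at ~/mlbox.
mount_mlbox() {
  mkdir -p "$HOME/mlbox"
  sshfs -o reconnect mlbox.bdd.io:/home/bdd/ "$HOME/mlbox"
}

# unmount_mlbox: detach the mount, using whichever tool the platform provides.
unmount_mlbox() {
  if command -v fusermount >/dev/null 2>&1; then
    fusermount -u "$HOME/mlbox"   # Linux (FUSE)
  else
    umount "$HOME/mlbox"          # macOS
  fi
}

# Usage:
#   mount_mlbox
#   ls ~/mlbox
#   unmount_mlbox
```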