Building a deep-learning machine from scratch

Recently, I had to rebuild my machine-learning box. Unfortunately, as of September 2017, I couldn’t find a good guide on how to quickly configure a GPU-enabled machine-learning box, so I wrote one.

Prerequisites

A machine running Ubuntu 16.04 (x86_64) with an NVIDIA GPU, and a user account with sudo rights.

What’s covered

Installing and configuring:

  • GPU drivers, CUDA, and cuDNN
  • GPU-enabled tensor frameworks (PyTorch, TensorFlow) and classic data-science software (Anaconda)
  • Convenience tweaks for remote access
    • secure remote access via SSH
    • passwordless sudo

GPU Drivers & Configuration

First, install common dependencies using the apt-get package manager.

sudo apt-get update
sudo apt-get install -y --no-install-recommends \
        build-essential \
        curl \
        git \
        libfreetype6-dev \
        libpng12-dev \
        libzmq3-dev \
        pkg-config \
        software-properties-common \
        swig \
        zip \
        zlib1g-dev \
        libcurl3-dev \
        wget \
        python3-pip \
        python3-dev \
        python-pip \
        python-dev \
        python-virtualenv \
        libcupti-dev \
        vim-nox
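Before moving on, it can be handy to confirm that the command-line tools from the list above actually landed on your PATH. This is a minimal sketch; missing_commands is a hypothetical helper name, and it only checks commands, not the -dev libraries.

```shell
# Hypothetical helper: print every command name that is NOT found on PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    # command -v succeeds only if the name resolves to an executable, builtin, or function
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
    fi
  done
}

# Usage: any name printed is missing.
# missing_commands curl git wget swig zip pip3 virtualenv vim
```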

Install the latest GPU drivers

# Add NVIDIA's graphics PPA repository
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# (re-run if you see any warnings or errors)
# To list the available drivers, type "sudo apt-get install nvidia-" and press Tab.
# Do not use 378; it causes login loops.
# 384 was the latest driver at the time of writing.
sudo apt-get install nvidia-384

Reboot so the new driver loads, then check the installation by running nvidia-smi.
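If you script the setup, you can guard against the 378 driver series mentioned above. A minimal sketch, assuming nvidia-smi's --query-gpu=driver_version --format=csv,noheader output (a dotted version string per GPU); check_driver_version is a hypothetical helper, not part of the NVIDIA tools.

```shell
# Hypothetical helper: succeed if the driver version string is acceptable,
# fail if it is empty or in the 378 series (which causes login loops).
check_driver_version() {
  local version="$1"
  local major="${version%%.*}"   # strip everything after the first dot
  if [ -z "$version" ]; then
    echo "no driver version reported" >&2
    return 1
  fi
  if [ "$major" = "378" ]; then
    echo "driver $version is in the 378 series; avoid it" >&2
    return 1
  fi
  echo "driver $version looks OK"
}

# Usage on the ML box (the driver must be loaded):
# check_driver_version "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```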

Install NVIDIA’s CUDA 8

CUDA is NVIDIA’s parallel computing platform and API; deep-learning frameworks use it to run computations on the GPU.

wget "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb"
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
# check version
cat /usr/local/cuda/version.txt
# CUDA Version 8.0.61
# add CUDA to your PATH by appending this line to the bottom of ~/.bashrc
echo 'export PATH="/usr/local/cuda-8.0/bin:$PATH"' >> ~/.bashrc
# reload ~/.bashrc so the current shell picks up the change
source ~/.bashrc

Check the installation by running nvcc --version.
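Adding the export line to ~/.bashrc is meant to happen once; if you script this step and re-run the script, the line gets appended again each time. A minimal sketch of an idempotent append, assuming bash and GNU grep; append_once is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already there.
append_once() {
  local line="$1" file="$2"
  # -q quiet, -x match the whole line, -F treat the pattern as a fixed string (no regex)
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Single quotes keep $PATH unexpanded so it is evaluated each time ~/.bashrc runs.
# append_once 'export PATH="/usr/local/cuda-8.0/bin:$PATH"' ~/.bashrc
```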

Install cuDNN 6

The NVIDIA CUDA Deep Neural Network library (cuDNN) provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

Traditionally, you are instructed to sign up on NVIDIA’s website and agree to their terms, which is a pain in the ass. This guide assumes you’ve already done that and installs cuDNN from NVIDIA’s machine-learning apt repository, just like their Docker images do.😀

# become root
sudo su
echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list
CUDNN_VERSION="6.0.21"
apt-get update
apt-get install -y --no-install-recommends \
        libcudnn6=$CUDNN_VERSION-1+cuda8.0 \
        libcudnn6-dev=$CUDNN_VERSION-1+cuda8.0

# symlink the cuDNN files to where TensorFlow expects them
ls -lah /usr/local/cuda/lib64/*
mkdir -p /usr/lib/x86_64-linux-gnu/include/
ln -s /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/include/cudnn.h
ln -s /usr/include/cudnn.h /usr/local/cuda/include/cudnn.h
ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so /usr/local/cuda/lib64/libcudnn.so
ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.6 /usr/local/cuda/lib64/libcudnn.so.6

# confirm your version
grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h
# leave the root shell before installing the rest of the software
exit
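The grep above prints the raw #define lines. If you want the version as a single string (for example, to compare it against the 6.0.21 pinned in CUDNN_VERSION), here is a minimal awk sketch, assuming the header uses the CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL defines that grep shows; cudnn_version is a hypothetical helper name.

```shell
# Hypothetical helper: read CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL from cudnn.h
# and print them as one dotted version string. Defaults to the symlinked header.
cudnn_version() {
  local header="${1:-/usr/local/cuda/include/cudnn.h}"
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END {
      if (major == "") exit 1   # no version defines found
      print major "." minor "." patch
    }
  ' "$header"
}

# Usage:
# [ "$(cudnn_version)" = "6.0.21" ] && echo "cuDNN 6.0.21 installed"
```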

Software

Anaconda

Anaconda bundles a package manager (conda), a virtual-environment manager, and a collection of common data-science tools into one install.

wget https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh
bash Anaconda3-4.4.0-Linux-x86_64.sh
# follow the install prompts
# restart your bash session
exec -l $SHELL
# check to make sure python is anaconda
which python
# should return $HOME/anaconda3/bin/python
# install pip via conda
conda install pip
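The which python check is easy to eyeball once, but a scripted setup can test it directly. A minimal sketch, assuming Anaconda was installed to the installer’s default prefix, $HOME/anaconda3; is_anaconda_python is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given interpreter path lives under $HOME/anaconda3/bin.
is_anaconda_python() {
  case "$1" in
    "$HOME"/anaconda3/bin/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# is_anaconda_python "$(command -v python)" || echo "python is not Anaconda's; check your PATH"
```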

PyTorch

conda install pytorch torchvision cuda80 -c soumith
# clone the examples repository to test
git clone https://github.com/pytorch/examples $HOME/pytorch-examples
cd $HOME/pytorch-examples/mnist
python main.py
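A quicker check than training MNIST is to ask PyTorch directly whether it sees a CUDA device via torch.cuda.is_available(), which prints True or False. A minimal sketch; torch_sees_gpu is a hypothetical helper, and it treats a failed import as “no GPU”.

```shell
# Hypothetical helper: succeed only if the given Python (default: python) reports
# that PyTorch can use a CUDA device.
torch_sees_gpu() {
  local python="${1:-python}"
  local answer
  answer="$("$python" -c 'import torch; print(torch.cuda.is_available())' 2>/dev/null)"
  [ "$answer" = "True" ]
}

# Usage:
# torch_sees_gpu && echo "PyTorch can use the GPU" || echo "PyTorch is CPU-only"
```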

TensorFlow

pip install tensorflow-gpu
# from $HOME
git clone https://github.com/tensorflow/tensorflow.git $HOME/tensorflow
# run an example to test
python $HOME/tensorflow/tensorflow/examples/tutorials/mnist/fully_connected_feed.py
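The MNIST script trains happily on the CPU too, so it does not prove the GPU is in use. A minimal sketch that counts the GPUs TensorFlow 1.x reports, assuming the internal but widely used tensorflow.python.client.device_lib module; tf_gpu_count is a hypothetical helper name.

```shell
# Hypothetical helper: print how many GPU devices TensorFlow reports (0 if none).
tf_gpu_count() {
  local python="${1:-python}"
  # list_local_devices() returns one entry per device; print each entry's type
  "$python" -c '
from tensorflow.python.client import device_lib
for d in device_lib.list_local_devices():
    print(d.device_type)
' 2>/dev/null | grep -cx 'GPU'
}

# Usage:
# tf_gpu_count   # 0 means TensorFlow only sees the CPU
```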

(Optional) Tweaks

Things I do to make my day-to-day easier.

Secure remote access over SSH

On the machine-learning box:

sudo apt-get install openssh-server

On your development machine, where bdd is your username on the remote machine and mlbox is that server’s hostname or IP address:

ssh bdd@mlbox
# optionally, copy your public key to that server for password-less logins
ssh-copy-id bdd@mlbox
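Typing bdd@mlbox every time gets old. A minimal sketch that adds a Host alias to your SSH client config, assuming OpenSSH’s ~/.ssh/config format; add_ssh_host is a hypothetical helper, and 192.0.2.10 is a placeholder address from the documentation range.

```shell
# Hypothetical helper: add a Host alias to an ssh client config file so that
# "ssh mlbox" logs in with the right user and address.
add_ssh_host() {
  local alias="$1" hostname="$2" user="$3"
  local config="${4:-$HOME/.ssh/config}"
  # skip if a Host line for this alias already exists
  if grep -qxF "Host $alias" "$config" 2>/dev/null; then
    return 0
  fi
  mkdir -p "$(dirname "$config")"
  printf 'Host %s\n    HostName %s\n    User %s\n' "$alias" "$hostname" "$user" >> "$config"
  chmod 600 "$config"   # ssh rejects config files writable by others
}

# Usage (replace the placeholder address with your box's):
# add_ssh_host mlbox 192.0.2.10 bdd
```

With the alias in place, plain ssh mlbox picks up the configured user and address.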

sudo

Run sudo visudo and change the line for the sudo group (by default %sudo   ALL=(ALL:ALL) ALL) to the following, so that members of the sudo group are not prompted for a password:

# Allow members of group sudo to execute any command, without prompting
%sudo   ALL=(ALL:ALL) NOPASSWD:ALL