How to set up PyTorch 1.5.0 (GPU) with Docker on AWS


Overview

  • Check the host machine
  • Install the NVIDIA driver and Docker with GPU support

Check the host machine

 ubuntu@:~$ lspci | grep -i nvidia
 00:1e.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
 ubuntu@:~$ lsb_release -a
 No LSB modules are available.
 Distributor ID:    Ubuntu
 Description:   Ubuntu 16.04.4 LTS
 Release:   16.04
 Codename:  xenial

In my case, I needed to change the locale back to English:

 sudo update-locale LANG=C.UTF-8

Install the NVIDIA driver and Docker with GPU support

  • The NVIDIA driver must be compatible with the installed GPU and recent enough for the PyTorch / TensorFlow versions you plan to use.
  • In this case, we install the driver for a Tesla K80 and target PyTorch 1.5.0.
  • Docker now supports GPU access natively, so you no longer have to use nvidia-docker.
  1. Follow the NVIDIA Tesla installation notes and install the NVIDIA driver; in my case, the version is 440 (https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html). You can check whether it is installed with nvidia-smi, as shown below (a sketch of the install commands follows the output).
 ubuntu@:~/MyDockerMLenv$ nvidia-smi
 Tue May  5 16:37:09 2020
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |===============================+======================+======================|
 |   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
 | N/A   47C    P0    54W / 149W |    262MiB / 11441MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+
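
 For reference, here is a minimal sketch of the driver installation itself, assuming the CUDA network repository route from the linked notes; the exact repository package file name and key URL below are assumptions, so check the notes for the current names.

  # Add NVIDIA's CUDA network repository (Ubuntu 16.04 / x86_64); the .deb file name is assumed.
  sudo dpkg -i cuda-repo-ubuntu1604_10.2.89-1_amd64.deb
  sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
  sudo apt-get update
  # Install only the driver metapackage (not the full CUDA toolkit), then reboot.
  sudo apt-get install -y cuda-drivers
  sudo reboot

 After the reboot, nvidia-smi should report the driver and CUDA versions as in the output above.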
  2. Install Docker via apt-get (https://docs.docker.com/engine/install/ubuntu); a sketch of the apt commands follows the version check below.
     ubuntu@:~/MyDockerMLenv$ docker --version
     Docker version 19.03.8, build afacb8b7f0
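
     If you want the individual commands, here is a sketch following the linked guide for Ubuntu; treat it as a starting point rather than the authoritative steps.

     # Set up Docker's apt repository and install the engine (per the linked guide).
     sudo apt-get update
     sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
     curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
     sudo add-apt-repository \
        "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
     sudo apt-get update
     sudo apt-get install -y docker-ce docker-ce-cli containerd.io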

  3. Then you can create a Docker image and container for PyTorch + TensorFlow.
     Note that access is not secured out of the box (port 8888 is opened by default).
     I recommend exposing ports 8888 and 6006 for JupyterLab and TensorBoard.
     You can use the --gpus option when you run the Docker container; see the run sketch after this step.
     See the background here: https://qiita.com/ksasaki/items/b20a785e1a0f610efa08
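
     Here is a minimal run sketch. The image tag below is an official PyTorch 1.5.0 runtime image on Docker Hub (an assumption, pick the one matching your CUDA setup), and it assumes GPU support in Docker is configured as described in the linked article.

     # Pull a PyTorch 1.5.0 image with CUDA and cuDNN included.
     docker pull pytorch/pytorch:1.5-cuda10.1-cudnn7-runtime

     # Run with GPU access and the recommended ports for JupyterLab (8888) and TensorBoard (6006).
     docker run --gpus all -it --rm \
       -p 8888:8888 -p 6006:6006 \
       -v "$PWD":/workspace \
       pytorch/pytorch:1.5-cuda10.1-cudnn7-runtime \
       python -c "import torch; print(torch.cuda.is_available())"

     If the last command prints True, the container can see the GPU.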