docker-debian-cuda

docker-debian-cuda is a minimal Docker image built from Debian 9 (amd64) with CUDA Toolkit and cuDNN using only Debian packages.

Although the vendor-specific nvidia-docker tool can run CUDA inside Docker containers, it does the same thing in a less transparent way and is incompatible with other Docker tools. Instead of relying on yet another wrapper command, this image explicitly exposes the GPU devices and CUDA libraries, and is built from official Debian images using only the Debian package manager, including for the Nvidia CUDA Toolkit.

Open source project:

Available tags (based on Debian 9/stretch packages):

  • 8.0.44-3_5.1.10-1_375.66-1, 8.0_5.1, latest [2017-05-31]: CUDA Toolkit (8.0.44-3) + cuDNN (5.1.10-1) + CUDA library (375.66-1) (Dockerfile)
  • 8.0.44-3_5.1.10-1_375.39-1 [2017-03-27]: CUDA Toolkit (8.0.44-3) + cuDNN (5.1.10-1) + CUDA library (375.39-1)
  • 8.0.44-2_5.1.5-1_375.20-4 [2016-12-21]: CUDA Toolkit (8.0.44-2) + cuDNN (5.1.5-1) + CUDA library (375.20-4)
  • 7.5.18-4_5.1.3_361.45.18-2, 7.5_5.1 [2016-09-19]: CUDA Toolkit (7.5.18-4) + cuDNN (5.1.3) + CUDA library (361.45.18-2)
  • 7.5.18-2 [2016-07-20]: CUDA Toolkit (7.5.18-2) + cuDNN (4.0.7) + CUDA library (352.79-8)

Usage

Host system requirements (e.g. Debian 8 or 9):

  • CUDA-capable GPU
  • nvidia-kernel-dkms (same version as the CUDA library, see below for workarounds)
  • optionally nvidia-cuda-mps, nvidia-smi, libcupti-dev

To utilize your GPUs, this Docker image needs access to your /dev/nvidia* devices, and you will probably want to inject the matching version of the CUDA library from the host, for example:

$ docker run -it --rm $(ls /dev/nvidia* | xargs -I{} echo '--device={}') $(ls /usr/lib/x86_64-linux-gnu/{libcuda,libnvidia}* | xargs -I{} echo '-v {}:{}:ro') gw000/debian-cuda

The additional parameters in the above command explicitly expose your GPU devices and CUDA libraries from the host system into the container.
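The flag-generation idiom used in the command above can be illustrated in isolation. As a sketch that runs on any machine, the device list is simulated with printf here instead of reading /dev/nvidia*:

```shell
# Simulated device list (on a real host this comes from: ls /dev/nvidia*).
# Each path is turned into a --device flag for docker run.
printf '%s\n' /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm \
  | xargs -I{} echo '--device={}'
```

The same pattern generates the read-only `-v host:container:ro` mounts for the CUDA libraries.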

Host system

List of devices that should be present on the host system:

$ ll /dev/nvidia*
crw-rw---- 1 root video 250,   0 Jul 13 15:56 /dev/nvidia-uvm
crw-rw---- 1 root video 250,   1 Jul 13 15:56 /dev/nvidia-uvm-tools
crw-rw---- 1 root video 195,   0 Jul 13 15:56 /dev/nvidia0
crw-rw---- 1 root video 195, 255 Jul 13 15:56 /dev/nvidiactl

If /dev/nvidia0 and /dev/nvidiactl are not present, ensure that the kernel module nvidia is loaded automatically and properly configured, and that a udev rule exists to create the devices:

$ echo 'nvidia' > /etc/modules-load.d/nvidia.conf
$ cat > /etc/udev/rules.d/70-nvidia.rules << __EOF__
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 0660 /dev/nvidia* && /bin/chgrp video /dev/nvidia*'"
__EOF__
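After loading the module and applying the udev rule, a quick sanity check confirms the device nodes from the listing above exist. A minimal sketch (on a machine without an Nvidia GPU it simply reports them missing):

```shell
# Report whether the expected Nvidia device nodes were created.
for dev in /dev/nvidia0 /dev/nvidiactl; do
  if [ -e "$dev" ]; then
    echo "$dev present"
  else
    echo "$dev missing"
  fi
done
```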

For OpenCL support, the devices /dev/nvidia-uvm and /dev/nvidia-uvm-tools are needed. Ensure the kernel module nvidia-uvm is loaded automatically, and add a custom udev rule to create the devices:

$ echo 'nvidia-uvm' > /etc/modules-load.d/nvidia-uvm.conf
$ cat > /etc/udev/rules.d/70-nvidia-uvm.rules << __EOF__
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0660 /dev/nvidia-uvm* && /bin/chgrp video /dev/nvidia-uvm*'"
__EOF__

If you would like to monitor temperatures on your host system in real time, use something like:

$ watch -n 5 'nvidia-smi; echo; sensors; for hdd in /dev/sd?; do echo -n "$hdd  "; smartctl -A $hdd | grep Temperature_Celsius; done'

If your Nvidia kernel driver and CUDA library versions differ, an error appears in the kernel messages (dmesg) or when running nvidia-smi inside the container. Possible solutions:

  • upgrade your Nvidia kernel driver on the host directly from Debian 9 packages: nvidia-kernel-dkms, nvidia-alternative, libnvidia-ml1, nvidia-smi
  • upgrade your Nvidia kernel driver on the host by compiling it yourself
  • inject the correct version of CUDA library into the container as mentioned above (if it is installed on the host)
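The mismatch can be detected with a simple string comparison. The versions below are hard-coded placeholders for illustration; on a real host you would read the driver version from /proc/driver/nvidia/version and the library version from the installed libcuda1 package, as hinted in the comments:

```shell
# Placeholder versions; on a real host obtain them e.g. with:
#   driver_ver=$(sed -n 's/.*Kernel Module *\([0-9.]*\).*/\1/p' /proc/driver/nvidia/version)
#   libcuda_ver=$(dpkg-query -W -f='${Version}' libcuda1)
driver_ver="375.66"
libcuda_ver="375.66"
if [ "$driver_ver" = "$libcuda_ver" ]; then
  echo "driver and library versions match ($driver_ver)"
else
  echo "version mismatch: driver $driver_ver vs libcuda $libcuda_ver"
fi
```

Note that the Debian package version carries a revision suffix (e.g. 375.66-1), so in practice you may want to compare only the upstream part.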

Decision against nvidia-docker

It is true that Nvidia recommends using its nvidia-docker command, as part of its vendor lock-in strategy. In reality, nvidia-docker is nothing more than a fancy wrapper that runs the docker command with additional parameters to mount the devices and host libraries into the container.

Pros for nvidia-docker tool:

  • shorter command (no need to remember those additional parameters)

Cons for nvidia-docker tool:

  • yet another tool that administrators need to learn (why should they have to learn anything beyond docker run?)
  • less transparent about what is actually executed (some believe "black magic" behind nvidia-docker handles two instances on the same GPU better, although it behaves exactly the same)
  • cannot be used with docker-compose and other tools for managing Docker containers
  • only Nvidia GPUs are supported (what if someone wants to use a GPU from another vendor, or an FPGA device?)
  • no support for OpenCL
  • vendor lock-in
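Because this image needs nothing beyond standard docker run parameters, the same setup can be expressed declaratively for docker-compose, which nvidia-docker cannot do. A hypothetical sketch, with example device and library paths matching the run command above:

```shell
# Write a hypothetical docker-compose.yml exposing the same devices and
# host libraries as the `docker run` command above (paths are examples).
cat > docker-compose.yml << '__EOF__'
version: '2'
services:
  cuda:
    image: gw000/debian-cuda
    devices:
      - /dev/nvidia0
      - /dev/nvidiactl
    volumes:
      - /usr/lib/x86_64-linux-gnu/libcuda.so.1:/usr/lib/x86_64-linux-gnu/libcuda.so.1:ro
__EOF__
```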

Feedback

If you encounter any bugs or have feature requests, please file them in the issue tracker, or develop the feature yourself and submit a pull request on GitHub.

License

Copyright © 2016-2017 gw0 [http://gw.tnode.com/] <gw.2017@ena.one>

This library is licensed under the GNU Affero General Public License 3.0+ (AGPL-3.0+). Note that it is mandatory to make all modifications and the complete source code of this library publicly available to any user.
