Official docker images for IBM PowerAI

Requirements

PowerAI is optimized to leverage the unique capabilities of IBM Power
Systems accelerated servers, and is not available on any other
platforms. It is supported on:

  • IBM AC922 POWER9 system with NVIDIA Tesla V100 GPUs
  • IBM S822LC POWER8 system with NVIDIA Tesla P100 GPUs

Host System Requirements

Component           Required    Recommended
Red Hat             7.5         7.5
Docker              1.13.1      1.13.1
NVIDIA Docker       1.0, 2.0*   2.0
NVIDIA GPU driver   396         396.44
  • Installing NVIDIA GPU driver on Red Hat 7.5

    You can find the driver install instructions here

    NOTE: You do not need to install cuDNN or NCCL on your host environment

  • Installing Docker on Red Hat 7.5

    You can find the install instructions here

  • Installing NVIDIA Docker on Red Hat 7.5

    • Install nvidia-docker 1.0

      • Add NVIDIA's repository to your yum configuration

        distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
        curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
        sudo tee /etc/yum.repos.d/nvidia-docker.repo
        

         

      • Install nvidia-docker package

        yum repolist && sudo yum install nvidia-docker
        

         

    • Install nvidia-docker 2.0

      nvidia-docker 2.0 for RHEL on ppc64le requires the use of nvidia-runtime-hooks, instead of the nvidia-docker binary. This means there will no longer be an nvidia-docker binary installed on your system.

      All docker run commands that spin up images built with the NVIDIA environment variables (https://github.com/NVIDIA/nvidia-container-runtime#environment-variables-oci-spec) will automatically have GPU capabilities.

      To install the hooks, follow the steps listed on NVIDIA's github page.

      https://github.com/NVIDIA/nvidia-docker#centos-7-docker-rhel-7475-docker

      NOTE: SELinux users will have to change the file context on the /dev/nvidia* devices to container_file_t so they can be properly shared with Docker.

       sudo chcon -t container_file_t  /dev/nvidia*
      

Docker Binaries

From here on out, the docker command you use will be determined by the version of nvidia-docker you installed. We will use $DOCKER_BINARY to signify the default docker command for your particular configuration.

nvidia-docker 1.0 -- $DOCKER_BINARY=nvidia-docker

nvidia-docker 2.0 -- $DOCKER_BINARY=docker
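
For example, you can set a shell variable that matches your configuration so the $DOCKER_BINARY commands in the rest of this README can be used as written (a minimal sketch; the variable name is just this document's convention):

    # nvidia-docker 1.0
    export DOCKER_BINARY=nvidia-docker

    # nvidia-docker 2.0
    # export DOCKER_BINARY=docker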


Using the PowerAI image from Docker Hub

To start up a PowerAI container run

$DOCKER_BINARY run -ti --env LICENSE=yes ibmcom/powerai:<tag> bash 

 
PyTorch Users: If you plan on using a multiprocessing data loader with PyTorch, the default shared memory segment size for the container may not be large enough. You can increase the shared memory size with either the --ipc=host or --shm-size command line option on $DOCKER_BINARY run.

You can read more about this issue on PyTorch's Readme https://github.com/pytorch/pytorch/blob/master/README.md
under the "Docker image" section.
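
For example (a sketch; the tag and the 8g size are illustrative, not requirements):

    # enlarge the shared memory segment
    $DOCKER_BINARY run -ti --env LICENSE=yes --shm-size=8g ibmcom/powerai:1.5.3-all-ubuntu16.04 bash

    # or share the host's IPC namespace instead
    $DOCKER_BINARY run -ti --env LICENSE=yes --ipc=host ibmcom/powerai:1.5.3-all-ubuntu16.04 bash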


License Acceptance

You must accept the licenses of all included components before using a PowerAI container. The licenses can be viewed at https://hub.docker.com/r/ibmcom/powerai

  • Accept the licenses at docker runtime by adding the
    --env LICENSE=yes parameter on the $DOCKER_BINARY run command line

or

  • Accept the license within an already running container
    /opt/DL/license/bin/accept-powerai-license.sh
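
For example, to accept the license in a container that was started without --env LICENSE=yes (a sketch; <container_id> is whatever docker ps reports for your container):

    $DOCKER_BINARY exec -ti <container_id> /opt/DL/license/bin/accept-powerai-license.sh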

Available Tags

  • latest - PowerAI 1.5.3 and Anaconda for python 2.7 with the following frameworks preinstalled

  • 1.5.3-all-ubuntu16.04 - PowerAI 1.5.3 and Anaconda for python 2.7 with the following frameworks preinstalled

  • 1.5.3-all-ubuntu16.04-py3 - PowerAI 1.5.3 and Anaconda for python 3.6 with the following frameworks preinstalled

  • 1.5.2-all-ubuntu16.04 - PowerAI 1.5.2 and Anaconda for python 2.7 with the following frameworks preinstalled

  • 1.5.2-all-ubuntu16.04-py3 - PowerAI 1.5.2 and Anaconda for python 3.6 with the following frameworks preinstalled


Installed Packages

PowerAI 1.5.3 provides software packages for several Deep Learning
frameworks, supporting libraries, and tools:

Component                         1.5.2 Images   1.5.3 Images
Distributed Deep Learning (DDL)   1.0.0          1.1.0
TensorFlow                        1.8.0          1.10.0
TensorBoard                       1.8.0          1.10.0
IBM Caffe                         1.0.0          1.0.0
BVLC Caffe                        1.0.0          1.0.0
PyTorch                           0.4.0          0.4.1
Snap ML                           1.0.0          1.0.0
Spectrum MPI                      10.2           10.2
Bazel                             0.10.0         0.15.0
OpenBLAS                          0.2.20         0.3.2
Protobuf                          3.4.0          3.4.0

PowerAI requires some additional 3rd-party software components:

Component      Version
Ubuntu         16.04
NVIDIA CUDA    9.2.148
NVIDIA cuDNN   7.2.1
NVIDIA NCCL    2.2.13
Anaconda       5.2.0

Container Usage

User Ids

PowerAI images all come with a default userid: pwrai

  • pwrai is set up as a privileged user with password-less sudo access.

  • pwrai has a uid:gid of 2051:2051
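
For example, if you bind-mount a host directory into the container, you may want its ownership to match the pwrai user (a hypothetical sketch; the paths are illustrative):

    # give a host data directory to uid:gid 2051:2051 so pwrai can write to it
    sudo chown -R 2051:2051 /path/to/host/data
    $DOCKER_BINARY run -ti --env LICENSE=yes -v /path/to/host/data:/data ibmcom/powerai:1.5.3-all-ubuntu16.04 bash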

Running PowerAI Frameworks

Please reference the "Getting Started with MLDL Frameworks" page here

Pre-Activating Specific PowerAI Frameworks

New to PowerAI 1.5.3 is the ability to activate a framework on the $DOCKER_BINARY run command line. This means you can start a container with all LIBRARY and PATH variables already set up for your chosen framework.

You can use this feature by adding --env ACTIVATE=$FRAMEWORK, where $FRAMEWORK is the name of the framework you want to use as the default for this container (see the example after the list below).

  • Available Frameworks Values
    • tensorflow
    • pytorch
    • caffe
    • caffe-ibm
    • caffe-bvlc
       
      NOTE: --env ACTIVATE=$FRAMEWORK must be used in conjunction with --env LICENSE=yes. Frameworks won't activate without accepting the license as well.
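
For example (a minimal sketch; the tag is one of those listed under Available Tags):

    $DOCKER_BINARY run -ti --env LICENSE=yes --env ACTIVATE=pytorch ibmcom/powerai:1.5.3-all-ubuntu16.04 bash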

Restoring OpenCV Libraries

caffe requires an OpenCV library, libopencv-highgui2.4v5, to handle image processing. Unfortunately, this package also includes, among other things, video codecs. Due to licensing restrictions, PowerAI containers cannot ship with certain codec libraries (h264, h265, etc.). To mitigate this, we provide additional packaging which satisfies the caffe requirements while not shipping the problematic codecs.

If you wish to restore the original libraries, the restore_codecs script is provided in the image under the /usr/local/bin directory. After executing the script once, all PowerAI-modified packages will be removed and the original defaults put back in place.
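
A sketch of running it inside a container (whether sudo is required is an assumption; the default pwrai user has password-less sudo in any case):

    sudo /usr/local/bin/restore_codecs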


FAQs

  • Error running CUDA and NVIDIA commands in a container on nvidia-docker 1.0
    If you received an error similar to the one below, it could be related to an SELinux issue between the docker container and the nvidia-docker volume that contains the NVIDIA driver.

     NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
     Please also try adding directory that contains libnvidia-ml.so to your system PATH.
    

    In an open issue (https://github.com/NVIDIA/nvidia-docker/issues/627) it was noted that, sometimes after installing new NVIDIA drivers, the nvidia-docker volume mount gets assigned an incompatible SELinux security context.

    If ls -Z /var/lib/nvidia-docker/volumes/nvidia_driver/ shows a security context of xserver_var_lib_t instead of svirt_sandbox_file_t for the latest NVIDIA driver, then containers will be unable to gain access to the NVIDIA driver.

    To solve this, change the security context of the NVIDIA driver directory:

     sudo chcon -Rt svirt_sandbox_file_t /var/lib/nvidia-docker/volumes/nvidia_driver/<driver_version>

    This sets the directory to a context that docker containers can access.

  • Error running CUDA and NVIDIA commands in a container on nvidia-docker 2.0
    After starting a container, you may receive an Insufficient Permissions error upon running nvidia-* or cuda commands.

     nvidia-smi
     Failed to initialize NVML: Insufficient Permissions
    

    This is the SELinux problem mentioned under the Install nvidia-docker 2.0 section of this README. You will need to alter permissions on the /dev/nvidia* devices.
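
    As noted in that section, the fix is to relabel the devices:

     sudo chcon -t container_file_t /dev/nvidia*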


LICENSES


NVIDIA

  • CUDA Toolkit
    To view the license for the CUDA Toolkit included in this image, click here

  • CUDA Deep Neural Network library (cuDNN)
    To view the license for cuDNN included in this image, click here


Anaconda

The Anaconda User's license can be viewed at (https://docs.anaconda.com/anaconda/eula)

The list of python packages installed under anaconda can be displayed using pip list. A specific package's license can be displayed using pip show <packagename> | grep License:

To view a python package's full license, go to the package's website displayed by pip show <packagename> | grep Home-page:
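
For example (a sketch; numpy is just an illustrative package name):

    # list installed python packages
    pip list

    # show the license and homepage of one package
    pip show numpy | grep License:
    pip show numpy | grep Home-page: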


Ubuntu

Ubuntu's(Canonical) Legal information can be viewed at
(https://www.ubuntu.com/legal)

The list of installed Debian packages can be seen using dpkg --list

The license of a particular Debian package can be viewed inside the PowerAI image under /usr/share/doc/<packagename>/copyright
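
For example (a sketch; curl is just an illustrative package name):

    dpkg --list
    cat /usr/share/doc/curl/copyright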


PowerAI

View the PowerAI License and Notices locally in the image at /opt/DL/license/lap_se

View the PowerAI Container License and Notices externally at
https://github.com/IBM/powerai/tree/powerai-1.5.3/dockerhub
