Short Description
basic dev env with latest dolfin + narwal codes pre-installed in /usr/local
Full Description

Using Containers on Cray XC Supercomputers

This page is a distillation of information available at NERSC and my correspondence with their super helpful consultants. Below are some things I learned and tweaked to get Ubuntu-based images working and running MPI codes on both Cori and Edison.

  • DO NOT have dependencies on running as root or PID 1. This runs somewhat counter to the paradigm preferred for web-app/micro-service containers, but make sure everything you need can run as a non-privileged user.
  • Make sure your images are available on Docker Hub. This image has a compiled version of the Poisson demo from the FEniCS Project; feel free to use it to test things out if you have access to a Cray XC with Docker/Shifter.
  • For now, make sure your image is in a public repo.
  • For reliability, copy/mv your executable(s) and script(s) into a directory that is in your $PATH; /usr/local/bin works pretty well (see the sketch just after this list).
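
For reference, a sketch of the local workflow that satisfies these points. This is not the exact recipe behind this repo; the image tag mirrors it, and the UID and the `which demo_poisson` sanity check are just illustrative placeholders.

```
# Build locally and sanity-check it as a non-root user before pushing:
# the executable should be found on $PATH without any extra setup.
docker build -t loryza/nersc-bench:latest .
docker run --rm --user 1000:1000 loryza/nersc-bench:latest which demo_poisson

# Push to a public repo on Docker Hub so shifterimg can pull it.
docker push loryza/nersc-bench:latest
```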

Some basics to get started:

  1. ssh into your login node
  2. Load the module for shifter and pull the image(s) you will run:

    ```
    module load shifter
    shifterimg -v pull docker:loryza:nersc-bench:latest
    ```

This means you're pulling from hub.docker.com: the user or group is loryza, the repo is nersc-bench, and the tag is latest. Depending on how large your Docker image is, this will take a few minutes. When you check by rerunning the `shifterimg -v pull docker:loryza:nersc-bench:latest` command, it should return something like this to indicate that the image is READY for use.

```
Message: {
  "ENTRY": null,
  "ENV": [
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "PETSC_VERSION=3.6.3",
    "SLEPC_VERSION=3.6.2",
    "SLEPC_DIR=/usr/local",
    "PETSC_DIR=/usr/local",
    "SWIG_VERSION=3.0.7",
    "TRILINOS_VERSION=12.4.2"
  ],
  "WORKDIR": "MISSING",
  "groupAcl": [],
  "id": "52f68847b6f41f193206587272e4c75cd5b00c36a701f540093c27c6a66cefa4",
  "itype": "docker",
  "last_pull": 1455235653.600452,
  "status": "READY",
  "system": "cori",
  "tag": [
    "loryza:nersc-bench:latest"
  ],
  "userAcl": []
}
```
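
If you'd rather not rerun the pull, Shifter can also list the images already staged on the system; the exact output format may vary with the Shifter version installed.

```
# List images Shifter already knows about and filter for this one.
shifterimg images | grep nersc-bench
```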

There's an MPI-enabled C++ demo named demo_poisson compiled and installed in /usr/local/bin. To run it across 32 cores on 2 Cori nodes, a SLURM batch file like this should work:

```
#!/bin/bash
#SBATCH --image=docker:loryza:nersc-bench:latest
#SBATCH -N 2
#SBATCH -p debug
#SBATCH -t 00:05:00
shifter --image=docker:loryza:nersc-bench:latest mpirun -np 32 demo_poisson
```

A quick line-by-line translation:

  • bash shell, please
  • using a docker image
  • using 2 nodes
  • submit to the debug partition
  • request 5 minutes of wall time
  • run via shifter, in that docker image, the compiled executable demo_poisson using mpirun across 32 cores.
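
Depending on how Shifter's MPI integration is configured on the system, the native SLURM launcher can be used instead of calling mpirun inside the image. An untested variant of the same batch file, assuming srun is allowed to wrap shifter:

```
#!/bin/bash
#SBATCH --image=docker:loryza:nersc-bench:latest
#SBATCH -N 2
#SBATCH -p debug
#SBATCH -t 00:05:00

# Launch 32 MPI ranks with srun, each rank running inside the Shifter image.
srun -n 32 shifter demo_poisson
```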

To submit the job via SLURM:

```
sbatch demo_poisson.sl
```

Now wait for your job to complete :) When it does, the output files (in this case a PVD file and a bunch of VTU files) will be written to the directory from which the batch job was submitted.
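
While you wait, the usual SLURM commands work from the login node; for example (the file patterns below just reflect the PVD/VTU output mentioned above):

```
# Check whether the job is still pending or running.
squeue -u $USER

# After it finishes, the output shows up in the submission directory.
ls -lh *.pvd *.vtu
```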
