grahamdumpleton/s2i-ipython-notebook
Experimental S2I builder for IPython notebooks.
This repository contains the source code for a proof of concept IPython Docker image. It is being provided to stimulate discussion on creating better ways of hosting IPython notebooks using cloud hosting services.
Only Python 2.7 is supported at this point in time. The resulting Docker image can be found on the Docker Hub registry as grahamdumpleton/s2i-ipython-notebook.
The Docker image actually provides a number of capabilities. These are:
The Docker image can be used as a base image to create a custom derived image which installs additional required system packages, Python packages, or which incorporates IPython notebooks to be used via the IPython notebook browser.
The Docker image also embeds support for the Source to Image (S2I) tool, for incorporating IPython notebooks, Python package requirements, and other files such as data set files from a GIT repository into a new Docker image, without you needing to know how to create a Docker image yourself.
The S2I tool can be used directly in a local environment, or indirectly through being triggered from an OpenShift 3 environment.
To simplify hosting of IPython using OpenShift, template definitions for OpenShift are provided which can be loaded to streamline creation of IPython notebook viewer instances and IPython engine clusters under OpenShift.
The Docker image created from this repository is a proof of concept only. The author is not a regular user of IPython and thus is not qualified to dictate what may be the best way of setting up an IPython environment using cloud technologies which may improve the productivity of more serious IPython users.
The work done so far will therefore only progress further with sufficient feedback, or better still, the direct involvement of people who are users of IPython, who see some benefit from developing it further, and who may be willing to take over the project.
Areas which still need to be looked at are:
At this point in time the following known issues also exist:
Finally, be aware that since this is a proof of concept, you should not expect that it will continue to work in the same way over time, or even that the names of images will stay the same. Changes are likely inevitable as the concept is developed.
If you do try out the Docker images and use them with OpenShift, please register your interest in the repository so that it is known that people are using it.
The OpenShift templates can be found at:
To load the templates into an OpenShift environment you can use the command:
oc create -f https://raw.githubusercontent.com/GrahamDumpleton/s2i-ipython-notebook/master/ipython-templates.json
If desired, and you have administration access, you can also load the templates into the openshift namespace so that they are available automatically across all projects.
oc create -f https://raw.githubusercontent.com/GrahamDumpleton/s2i-ipython-notebook/master/ipython-templates.json -n openshift
Two different application templates for IPython notebook viewers are provided for OpenShift. These are:
requirements.txt file located in the root directory of the GIT repository.
Both types of IPython notebook viewers will be automatically made available on a public URL using a secure connection (HTTPS). A specific password for the IPython notebook viewer can be nominated, or a random one will be created for you.
Two different application templates for an IPython cluster are also provided. These are:
requirements.txt file located in the root directory of the GIT repository.
When creating an IPython engine cluster it can be linked by name to specific IPython notebook viewer instances to allow allocation of the resources to a specific user.
Although only a single instance of an IPython engine will be created initially, additional instances can be created by scaling up the ipengine pod.
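Scaling is done with the standard OpenShift oc scale command. The deployment configuration name below is illustrative, assuming a cluster label of 07de2; check the actual name reported by oc get dc for your project:

```shell
# List the deployment configurations to find the ipengine one.
oc get dc

# Scale the ipengine deployment up to three engine instances
# (substitute the name shown by "oc get dc").
oc scale dc/ipcluster-07de2-ipengine --replicas=3
```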
The application templates can be found when adding to a project using the UI by filtering on ipython.
To create an empty workspace with no existing notebooks, select the ipython-notebook-workspace template from the UI when adding to an existing project. You will be presented with the following template parameters.
These parameters are all optional. If not supplied they will be filled out with random values.
The purpose of each parameter is as follows:
To create a workspace which is populated with the contents of a GIT repository, for example notebooks, data set files etc., select the ipython-notebook-repository template from the UI when adding to an existing project. You will be presented with the following template parameters.
The additional parameter in this case is:
requirements.txt file in the root directory of the repository, those additional Python packages will also be installed.
In both cases, the final application name in OpenShift will be the expansion of:
nbviewer-${IPYTHON_USER_LABEL}-${IPYTHON_CLUSTER_LABEL}
When the user label or cluster label is not supplied, the random value used will consist of 5 lower case characters.
If you did not supply a password when creating the IPython notebook viewer, a random 8 character password will be generated for you. You can access this password by interrogating the set of environment variables recorded against the deployment configuration for the application. You will need to use the oc program on the command line for this.
For example, if the final application name were nbviewer-6232c-07de2, you would use the oc env command:
$ oc env dc/nbviewer-6232c-07de2 --list
# deploymentconfigs nbviewer-6232c-07de2, container nbviewer-6232c-07de2
IPYTHON_CONTAINER_TYPE=viewer
IPYTHON_CLUSTER_LABEL=07de2
IPYTHON_USER_PASSWORD=a7cabcc3
The password is that listed against the IPYTHON_USER_PASSWORD environment variable. This can then be supplied in the password field when visiting the IPython notebook viewer.
Once signed in you will then be presented with the IPython notebook viewer workspace.
When using the IPython notebook viewer, a local IPython process will be created for each notebook you open. Everything that runs within that notebook will run within that single process.
If you want to run any algorithms that could benefit from parallelism you have a few options.
The first is to make use of multithreading within the local process. For CPU intensive operations, however, this will not provide any benefit due to the limitations of the Python global interpreter lock. Use of multithreading within IPython would also get complicated due to the way that code blocks can be re-executed at will. Use of multithreading is therefore not recommended.
The second option for achieving parallelism is to make use of the Python multiprocessing module. This allows work to be farmed out to separate processes within the same host or container.
A third option is to use IPython's own support for parallel computing.
With this third option, a separate cluster of processes is set up which code executing within the IPython notebook viewer communicates with to distribute tasks. This cluster can consist of processes on the same host, or could be distributed across other hosts or containers as well, with the latter having the potential to provide access to a much greater amount of resources.
It is this final option which this Docker image targets and which, when combined with OpenShift, provides a simple way of backing an IPython notebook viewer with an IPython parallel computing cluster distributed across one or more hosts.
To create an empty IPython engine cluster, select the ipython-cluster-workspace template from the UI when adding to an existing project. You will be presented with the following template parameters.
The purpose of each parameter is as follows:
In order to associate this IPython cluster with a specific user's IPython notebook viewer instance, it should be provided the same value as was used for the corresponding field when creating the IPython notebook viewer application.
As no user name needs to be supplied, you could technically link up the IPython cluster with multiple IPython notebook viewer application instances for different users, if that made sense for the use case.
To create an IPython engine cluster which is populated with the contents of a GIT repository, for example data set files etc., select the ipython-cluster-repository template from the UI when adding to an existing project. You will be presented with the following template parameters.
The additional parameter in this case is:
requirements.txt file in the root directory of the repository, those additional Python packages will also be installed.
At present this is expected most likely to be the same GIT repository as is used to populate the IPython notebook viewer application instance; however, this need not be the case.
In both cases, the final application name in OpenShift will be the expansion of:
ipcluster-${IPYTHON_CLUSTER_LABEL}
When the cluster label is not supplied the random value used will consist of 5 lower case characters.
So long as the same cluster label is used in the IPython notebook viewer application instance and the IPython engine cluster they will be automatically linked together.
To make use of the cluster from the notebook viewer, you would then start out with the Python code:
from IPython.parallel import Client
cli = Client()
cli.ids
For an example of using IPython.parallel, see the Lecture-6B-HPC notebook provided by the default GIT repository sourced from:
when using the ipython-notebook-repository template.
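Building on the snippet above, the following sketch shows how work could then be distributed across the linked engines from within a notebook. It assumes a running, linked IPython engine cluster; the IPython.parallel API calls shown (load_balanced_view and map_sync) are the standard ones from that module:

```python
from IPython.parallel import Client

# Connect to the linked engine cluster. Connection details are
# taken from the environment, so no arguments are needed when
# the notebook viewer and cluster share the same cluster label.
cli = Client()
print(cli.ids)  # one id per running engine, e.g. [0]

def square(x):
    return x * x

# Farm the computation out across all available engines using
# a load-balanced view, blocking until all results are back.
view = cli.load_balanced_view()
results = view.map_sync(square, range(8))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

If cli.ids grows after scaling up the ipengine pod, subsequent tasks will automatically be spread across the additional engines.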
Although the examples shown here use the OpenShift UI to create instances of the IPython notebook viewer and IPython engine cluster, the OpenShift oc new-app command could just as readily be used. The --param or -p option should be used with oc new-app to fill in any template parameters.
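For example, a notebook viewer could be created from the command line as follows. The parameter names shown are illustrative; use oc process with the --parameters option to see the names the templates actually define:

```shell
# Show the parameters a template accepts before creating anything.
oc process --parameters ipython-notebook-workspace

# Create a notebook viewer instance, filling in template
# parameters explicitly rather than via the UI.
oc new-app ipython-notebook-workspace \
    -p IPYTHON_USER_LABEL=alice \
    -p IPYTHON_CLUSTER_LABEL=07de2
```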
OpenShift definitely makes using these Docker images much easier, but there is nothing special about the images that would preclude using them with Docker directly, separate from any OpenShift installation. When using the Docker images directly you will need to do a lot more work in respect of exposing ports and linking the containers together. For now, how to use the Docker images directly with docker will not be described, but it definitely is possible.
docker pull grahamdumpleton/s2i-ipython-notebook