intelanalytics/hyper-zoo

By intelanalytics

Updated over 2 years ago


The Analytics Zoo hyperzoo image has been built to easily run applications on a Kubernetes cluster. The pre-installed packages and the usage of the image are introduced on this page.

LEGAL NOTICE: By accessing, downloading or using this software and any required dependent software (the “Software Package”), you agree to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party software included with the Software Package. Please refer to the “third-party-programs.txt” or other similarly-named text file for additional details.

Launch pre-built hyperzoo image

Pull the Analytics Zoo hyperzoo image from Docker Hub:

sudo docker pull intelanalytics/hyper-zoo:latest

Speed up pulling the image by adding mirrors

To speed up pulling the image from Docker Hub in China, add a registry mirror. On Linux (CentOS, Ubuntu, etc.), if the Docker version is higher than 1.12, configure the Docker daemon: edit /etc/docker/daemon.json and add the registry-mirrors key and value:

{
  "registry-mirrors": ["https://<my-docker-mirror-host>"]
}

For example, add the USTC mirror in China:

{
  "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"]
}

Flush the changes and restart Docker:

sudo systemctl daemon-reload

sudo systemctl restart docker
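
(Optional) You can verify that the mirror took effect by inspecting the daemon information; the mirror host you added should be listed under "Registry Mirrors":

sudo docker info | grep -A 1 "Registry Mirrors"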

To speed up pulling this image on macOS or Windows, open the Docker settings, add the mirror host to the registry-mirrors section, and restart Docker.

Then pull the image; it should be faster.

sudo docker pull intelanalytics/hyper-zoo:latest

Launch a k8s client container

Please note the two different kinds of containers: the client container is where you submit Analytics Zoo jobs, since it contains all the required environment and libraries except the Hadoop/Kubernetes configs; the executor containers do not need to be created manually, as they are scheduled by Kubernetes at runtime.

sudo docker run -itd --net=host \
-v /etc/kubernetes:/etc/kubernetes \
-v /root/.kube:/root/.kube \
intelanalytics/hyper-zoo:latest bash

To specify more arguments, use:

sudo docker run -itd --net=host \
-v /etc/kubernetes:/etc/kubernetes \
-v /root/.kube:/root/.kube \
-e NotebookPort=12345 \
-e NotebookToken="your-token" \
-e http_proxy=http://your-proxy-host:your-proxy-port \
-e https_proxy=https://your-proxy-host:your-proxy-port \
-e RUNTIME_SPARK_MASTER=k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
-e RUNTIME_K8S_SERVICE_ACCOUNT=account \
-e RUNTIME_K8S_SPARK_IMAGE=intelanalytics/hyper-zoo:latest \
-e RUNTIME_PERSISTENT_VOLUME_CLAIM=myvolumeclaim \
-e RUNTIME_DRIVER_HOST=x.x.x.x \
-e RUNTIME_DRIVER_PORT=54321 \
-e RUNTIME_EXECUTOR_INSTANCES=1 \
-e RUNTIME_EXECUTOR_CORES=4 \
-e RUNTIME_EXECUTOR_MEMORY=20g \
-e RUNTIME_TOTAL_EXECUTOR_CORES=4 \
-e RUNTIME_DRIVER_CORES=4 \
-e RUNTIME_DRIVER_MEMORY=10g \
intelanalytics/hyper-zoo:latest bash 
  • NotebookPort value 12345 is a user-specified port number.
  • NotebookToken value "your-token" is a user-specified string.
  • http_proxy is to specify the HTTP proxy.
  • https_proxy is to specify the HTTPS proxy.
  • RUNTIME_SPARK_MASTER is to specify the Spark master, which should be k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> or spark://<spark-master-host>:<spark-master-port>.
  • RUNTIME_K8S_SERVICE_ACCOUNT is the service account for the driver pod. Please refer to Kubernetes RBAC (see the sketch after this list).
  • RUNTIME_K8S_SPARK_IMAGE is the k8s image.
  • RUNTIME_PERSISTENT_VOLUME_CLAIM is to specify the volume mount. You are supposed to use a volume mount to store or receive data. Get ready with Kubernetes Volumes (see the sketch after this list).
  • RUNTIME_DRIVER_HOST is to specify the driver host (only required when submitting jobs in Kubernetes client mode).
  • RUNTIME_DRIVER_PORT is to specify the driver port number (only required when submitting jobs in Kubernetes client mode).
  • Other environment variables are for Spark configuration settings. The default values in this image are listed above. Replace the values as you need.
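
For reference, below is a minimal sketch of how the service account and the persistent volume claim mentioned above could be prepared with kubectl. The account name "spark", the "edit" cluster role, the "default" namespace, and the 10Gi request are illustrative assumptions, not values required by the image; only the claim name must match RUNTIME_PERSISTENT_VOLUME_CLAIM.

# Assumption: create a service account named "spark" and let it manage executor pods
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=default:spark --namespace=default

# Assumption: create a claim named "myvolumeclaim" to match RUNTIME_PERSISTENT_VOLUME_CLAIM
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myvolumeclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF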

Once the container is created, enter the container by:

sudo docker exec -it <containerID> bash

You should then see a prompt like:

root@[hostname]:/opt/spark/work-dir#

/opt/spark/work-dir is the Spark work directory.

Note: The /opt directory contains:

  • download-analytics-zoo.sh is used for downloading Analytics Zoo distributions.
  • start-notebook-spark.sh is used for starting the Jupyter notebook on a standard Spark cluster.
  • start-notebook-k8s.sh is used for starting the Jupyter notebook on a k8s cluster.
  • analytics-zoo-x.x-SNAPSHOT is ANALYTICS_ZOO_HOME, the home of the Analytics Zoo distribution.
  • analytics-zoo-examples is a directory containing the downloaded Python example code.
  • jdk is the JDK home.
  • spark is the Spark home.
  • redis is the Redis home.
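
With the environment variables above, jobs can be submitted from the client container in Kubernetes client mode. The following spark-submit call is only a sketch: the example script path is a placeholder, and the extra configuration that loads the Analytics Zoo jars and Python files from ANALYTICS_ZOO_HOME is omitted; see the README linked below for the complete command.

/opt/spark/bin/spark-submit \
  --master ${RUNTIME_SPARK_MASTER} \
  --deploy-mode client \
  --name analytics-zoo-example \
  --conf spark.driver.host=${RUNTIME_DRIVER_HOST} \
  --conf spark.driver.port=${RUNTIME_DRIVER_PORT} \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
  --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
  --conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
  --executor-cores ${RUNTIME_EXECUTOR_CORES} \
  --executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
  --driver-memory ${RUNTIME_DRIVER_MEMORY} \
  /opt/analytics-zoo-examples/<your-example>.py   # replace with a script under /opt/analytics-zoo-examples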

README URL: https://github.com/intel-analytics/analytics-zoo/blob/master/docker/hyperzoo/README.md

Explore more container solutions on the Intel® oneContainer Portal
