Public Repository

Last pushed: 5 months ago
Short Description
Run the unchanged mnist_replica.py TensorFlow demo on an Amazon Web Services EC2 host.
Full Description

The image bundles a TensorFlow (1.0.1) installation, the unchanged mnist_replica.py code, the downloaded MNIST data, and a small start.sh script.

The current version requires:

  1. running on an Amazon Web Services EC2 host.
  2. forwarding parts of the mnist_replica.py configuration at container start time with the --env flag.
    # Forwarding configuration:
    # 
    # docker run         -> mnist_replica.py
    # --env WORKER="..." -> --worker_hosts "..."
    # --env PS="..."     -> --ps_hosts "..."
    # --env FLAGS="..."  -> --flag*=value --flag2*=value ...
    

*The flags --data_dir, --worker_hosts, --ps_hosts, --job_name, and --task_index are handled by the start script and must not be set in FLAGS.
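
As a rough illustration of the forwarding table above, a two-worker, one-PS setup might be started as sketched below. The image name OWNER/IMAGE:TAG, the IP addresses, and the port are placeholders, not values from this repository. --train_steps and --batch_size are assumed to be flags accepted by mnist_replica.py. The run_node helper and the DOCKER variable are hypothetical names used only to allow a dry run. Any networking options needed to make the container reachable at the listed IP:PORT are not covered here.

```shell
# Placeholder values; replace them with your own image name and EC2 private IPs.
IMAGE="OWNER/IMAGE:TAG"
WORKER="10.0.0.11:2222,10.0.0.12:2222"   # ordered: list index becomes --task_index
PS="10.0.0.10:2222"
FLAGS="--train_steps=500 --batch_size=100"

run_node() {
  # DOCKER defaults to the real client; set DOCKER=echo to only print the arguments.
  ${DOCKER:-docker} run \
    --env "WORKER=$WORKER" \
    --env "PS=$PS" \
    --env "FLAGS=$FLAGS" \
    "$IMAGE"
}

# Dry run: prints the arguments that would be passed to docker.
DOCKER=echo run_node
```

The same three --env values would be passed on every host in the cluster; according to the description, the start script works out which role a host plays from its own IP.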

Entrypoint: /mnist/start.sh

#!/usr/bin/env bash

# -----------------------------------------------------------
# | Configure and start a TensorFlow cluster node using:    |
# | - mnist_replica.py                                      |
# |   git clone https://github.com/tensorflow/tensorflow    |
# |   cd ./tensorflow/tensorflow/tools/dist_test/python/    |
# |                                                         |
# | - Requires container running on AWS EC2 host with       |
# |   http://169.254.169.254/latest/meta-data/local-ipv4    |
# |   available.                                            |
# |                                                         |
# | - Requires WORKER, PS and FLAGS as env variables:       |
# |   docker service create --env                           |
# |   docker run --env                                      |
# |                                                         |
# | 1. WORKER and PS must be a comma separated and ordered  |
# |    list because we use the list index for --task_index. |
# |                                                         |
# |    WORKER="worker-0-ip:PORT,worker-1-ip:PORT,..."       |
# |    PS="ps-0-ip:PORT,ps-1-ip:PORT,..."                   |
# |                                                         |
# | 2. FLAGS must be a space separated list and can be      |
# |    unordered.                                           |
# |                                                         |
# |    FLAGS="--flag=value --flag2=value ..."               |
# |                                                         |
# |    The flags: --data_dir, --worker_hosts, --ps_hosts,   |
# |               --job_name and --task_index are handled   |
# |               by the script and should not be used in   |
# |               FLAGS.                                    |
# |                                                         |
# |---------------------------------------------------------|

...
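
The body of start.sh is elided above. The following is only a sketch of how the lookup described in the header comment could work, not the script's actual code: the host IP is matched against the ordered WORKER and PS lists, and the position in the list gives the task index. The function names find_index and resolve_role, the sample IPs, and the output format are all assumptions made for illustration.

```shell
# Print the zero-based index of the first HOST:PORT entry in the
# comma-separated list $2 whose host part equals $1; return 1 if absent.
find_index() {
  ip=$1
  list=$2
  i=0
  old_ifs=$IFS
  IFS=,
  for entry in $list; do
    if [ "${entry%%:*}" = "$ip" ]; then
      IFS=$old_ifs
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  IFS=$old_ifs
  return 1
}

# Print "<job_name> <task_index>" for the given IP, or fail with code 4
# (mirroring the documented "not part of the cluster" error).
resolve_role() {
  if idx=$(find_index "$1" "$WORKER"); then
    echo "worker $idx"
  elif idx=$(find_index "$1" "$PS"); then
    echo "ps $idx"
  else
    echo "ERROR: host IP is not part of the cluster!" >&2
    return 4
  fi
}

# Sample values; on EC2 the IP would come from
# http://169.254.169.254/latest/meta-data/local-ipv4 instead.
WORKER="10.0.0.11:2222,10.0.0.12:2222"
PS="10.0.0.10:2222"
resolve_role 10.0.0.12   # prints "worker 1"
```

This is why the order of WORKER and PS matters: swapping two entries changes the task index each host is assigned.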

start.sh error codes

(3) - ERROR: could not fetch EC2 hosts local-ipv4 metadata!
(4) - ERROR: host IP is not part of the cluster!
(1) - Most likely you forwarded a malformed FLAGS string and TensorFlow did not start correctly.
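
Since exit code 1 usually points to a malformed FLAGS string, it may help to check the string before starting the container. The check below is a hypothetical helper, not part of start.sh: it only verifies that every token starts with "--" and that none of the flags reserved by the start script is present. It does not know which flags mnist_replica.py actually accepts.

```shell
# Return 0 if every space-separated token in $1 looks like --flag[=value]
# and none of the flags handled by the start script is used.
check_flags() {
  for flag in $1; do
    case "${flag%%=*}" in
      --data_dir|--worker_hosts|--ps_hosts|--job_name|--task_index)
        echo "reserved flag in FLAGS: ${flag%%=*}" >&2
        return 1 ;;
      --*)
        ;;  # looks like a regular flag
      *)
        echo "not a --flag=value token: $flag" >&2
        return 1 ;;
    esac
  done
}

check_flags "--train_steps=500 --batch_size=100" && echo "FLAGS look fine"
```

Because the tokens are split on whitespace, a value that itself contains spaces would be rejected by this sketch.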

Owner
kittycat