Torque/PBS server running as a scheduler and worker.

Agave DevOps Torque/PBS Container

This is a development install of Torque in which a single container acts as both the scheduler and a worker node. The image can be treated like a single-node cluster for testing purposes.

What is Torque

TORQUE Resource Manager provides control over batch jobs and distributed computing resources. It is an advanced open-source product based on the original PBS project and incorporates the best of both community and professional development. It incorporates significant advances in the areas of scalability, reliability, and functionality and is currently in use at tens of thousands of leading government, academic, and commercial sites throughout the world. TORQUE may be freely used, modified, and distributed under the constraints of the included license.

For more information on Torque, consult the official website.

What's inside

This development container creates an admin user and three test users. The accounts are listed below as username:password.

root:root
testuser:testuser
testshareuser:testshareuser
testotheruser:testotheruser

How to use this image

To run the container

docker run -d -h docker.example.com -p 10022:22 --privileged --name torque agaveapi/torque

This starts the container with a supervisor process that runs an sshd server on exposed port 22 and the Torque scheduler as both a controller and a worker node.

NOTE: You **must** run this image with the `--privileged` flag due to Torque's requirement for unlimited `ulimit` settings.
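
If host port 10022 is already taken, or you want more than one test container, the run command can be parameterized. The following is a minimal sketch, not part of the image: the function name torque_run_cmd and the example values 2222 and torque-test are made up for illustration, and the function only prints the command so you can inspect it before running it.

```shell
# Print the docker run command for the Torque container.
# $1: host port that sshd (container port 22) is published on; defaults to 10022
# $2: container name; defaults to "torque"
# torque_run_cmd is a hypothetical helper, not something shipped with the image.
torque_run_cmd() {
    host_port="${1:-10022}"
    name="${2:-torque}"
    printf 'docker run -d -h docker.example.com -p %s:22 --privileged --name %s agaveapi/torque\n' \
        "$host_port" "$name"
}

# Show the command for a second container published on host port 2222.
torque_run_cmd 2222 torque-test
```

To start the container, pipe the printed command to sh (torque_run_cmd | sh). Afterwards, docker ps --filter name=torque should list the container, and docker logs torque shows the supervisor output.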

To submit jobs

You will need to create an interactive session in order to run jobs in this container. There are two ways to do this.

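  • First, you can start a container with the default command and ssh in through the published port. The -h flag only sets the hostname inside the container, so replace docker.example.com in the ssh command with an address that resolves to your Docker host (for example localhost when Docker runs locally).
docker run -h docker.example.com -p 10022:22 -d --name torque --privileged agaveapi/torque
ssh -p 10022 testuser@docker.example.com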
  • Second, you can run an interactive container and start the services yourself.
docker run -h docker.example.com -p 10022:22 -i -t --name torque --privileged agaveapi/torque bash
bash-4.1# /usr/bin/supervisord &

In either situation, once you have a session in the container, you can submit jobs using the qsub command. A test script is included in the image at /home/testuser/torque.submit. You can submit this script to verify the scheduler is working properly.

su - testuser -c 'qsub /home/testuser/torque.submit'
qstat
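
The contents of /home/testuser/torque.submit are not reproduced here, so the following is only a sketch of what a minimal job script of your own might look like. The file name hello.pbs, the job name hello, and the resource requests are assumptions chosen for illustration. The #PBS directives used (-N, -l, -j) are standard qsub options.

```shell
# Write a hypothetical minimal job script (this is NOT the image's torque.submit).
job_script="${TMPDIR:-/tmp}/hello.pbs"
cat > "$job_script" <<'EOF'
#!/bin/sh
#PBS -N hello
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:01:00
#PBS -j oe
# PBS_O_WORKDIR is set by Torque to the directory qsub was run from;
# fall back to the current directory when run outside the scheduler.
cd "${PBS_O_WORKDIR:-.}" || exit 1
echo "Hello from $(uname -n)"
EOF

# Submit only when qsub is on the PATH, i.e. inside the container.
if command -v qsub >/dev/null 2>&1; then
    job_id="$(qsub "$job_script")"   # qsub prints the new job's identifier
    echo "submitted ${job_id}"
    qstat
else
    echo "qsub not found; wrote ${job_script} without submitting"
fi
```

Run this as one of the test users rather than as root, in the same way the su - testuser example above does. With -j oe, standard output and standard error of the job are merged into a single output file.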

How to build the image

Build from this directory using the enclosed Dockerfile

docker build --rm -t agaveapi/torque .

Comments (3)
carlochess
8 months ago

Please, run it using the following command:

docker run -d -h docker -p 10022:22 --privileged --name torque agaveapi/torque

aviralcse
a year ago

Also, running /usr/bin/supervisord & in bash doesn't seem to work either:

bash-4.1# /usr/bin/supervisord &
[3] 199
bash-4.1# /usr/lib/python2.6/site-packages/supervisor/options.py:295: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2016-05-26 01:15:09,871 CRIT Supervisor running as root (no user in config file)
2016-05-26 01:15:09,874 INFO supervisord started with pid 199
2016-05-26 01:15:10,876 INFO spawned: 'pbsmom' with pid 202
2016-05-26 01:15:10,878 INFO spawned: 'sshd' with pid 203
2016-05-26 01:15:10,880 INFO spawned: 'pbssched' with pid 204
2016-05-26 01:15:10,881 INFO spawned: 'pbsserver' with pid 205
2016-05-26 01:15:10,883 INFO spawned: 'trqauthd' with pid 206
2016-05-26 01:15:10,890 INFO exited: sshd (exit status 255; not expected)
2016-05-26 01:15:10,902 INFO exited: trqauthd (exit status 252; not expected)
2016-05-26 01:15:10,944 INFO exited: pbssched (exit status 1; not expected)
2016-05-26 01:15:10,947 INFO gave up: pbssched entered FATAL state, too many start retries too quickly
2016-05-26 01:15:10,947 INFO exited: pbsmom (exit status 1; not expected)
2016-05-26 01:15:10,951 INFO exited: pbsserver (exit status 3; not expected)
2016-05-26 01:15:11,953 INFO spawned: 'pbsmom' with pid 207
2016-05-26 01:15:11,955 INFO spawned: 'sshd' with pid 208
2016-05-26 01:15:11,956 INFO spawned: 'pbsserver' with pid 209
2016-05-26 01:15:11,958 INFO spawned: 'trqauthd' with pid 210
2016-05-26 01:15:11,969 INFO exited: sshd (exit status 255; not expected)
2016-05-26 01:15:11,973 INFO exited: trqauthd (exit status 252; not expected)
2016-05-26 01:15:12,029 INFO exited: pbsmom (exit status 1; not expected)
2016-05-26 01:15:12,032 INFO exited: pbsserver (exit status 3; not expected)
2016-05-26 01:15:14,036 INFO spawned: 'pbsmom' with pid 211
2016-05-26 01:15:14,037 INFO spawned: 'sshd' with pid 212
2016-05-26 01:15:14,039 INFO spawned: 'pbsserver' with pid 213
2016-05-26 01:15:14,040 INFO spawned: 'trqauthd' with pid 214
2016-05-26 01:15:14,050 INFO exited: sshd (exit status 255; not expected)
2016-05-26 01:15:14,054 INFO exited: trqauthd (exit status 252; not expected)
2016-05-26 01:15:14,109 INFO exited: pbsmom (exit status 1; not expected)
2016-05-26 01:15:14,115 INFO exited: pbsserver (exit status 3; not expected)
2016-05-26 01:15:17,120 INFO spawned: 'pbsmom' with pid 217
2016-05-26 01:15:17,122 INFO spawned: 'sshd' with pid 218
2016-05-26 01:15:17,123 INFO spawned: 'pbsserver' with pid 219
2016-05-26 01:15:17,125 INFO spawned: 'trqauthd' with pid 220
2016-05-26 01:15:17,134 INFO exited: sshd (exit status 255; not expected)
2016-05-26 01:15:17,143 INFO gave up: sshd entered FATAL state, too many start retries too quickly
2016-05-26 01:15:17,143 INFO exited: trqauthd (exit status 252; not expected)
2016-05-26 01:15:17,209 INFO gave up: trqauthd entered FATAL state, too many start retries too quickly
2016-05-26 01:15:17,210 INFO exited: pbsmom (exit status 1; not expected)
2016-05-26 01:15:17,213 INFO gave up: pbsmom entered FATAL state, too many start retries too quickly
2016-05-26 01:15:17,214 INFO exited: pbsserver (exit status 3; not expected)
2016-05-26 01:15:18,215 INFO gave up: pbsserver entered FATAL state, too many start retries too quickly

aviralcse
a year ago

I'm having some trouble SSHing into the container.
docker.example.com is not resolving to any address. I also tried SSHing using the IP Address of the container, but that gave the following message

ssh: connect to host 172.17.0.2 port 10022: Connection refused

How can I login?