Last pushed: 3 years ago

Demo of a crawler service implemented in Python. The service is made up of the following components:

  • : REST server, in charge of receiving crawl requests and returning responses
  • : Task server, in charge of performing the crawling and data analysis, designed to scale horizontally
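The split between the two components can be illustrated with a minimal in-process sketch. It is not the service's code: a `queue.Queue` stands in for the RabbitMQ broker, a dict stands in for MongoDB, and `submit_job`, `worker`, `run_workers` and `fake_crawl` are hypothetical names. The job fields mirror the ones shown in the status response further down.

```python
import queue
import threading
import uuid

# In-memory stand-ins (assumptions): the Queue plays the role of the message
# broker and the dict plays the role of the job store.
task_queue = queue.Queue()
jobs = {}
jobs_lock = threading.Lock()


def submit_job(urls):
    """REST-side role: record the job and enqueue one task per URL."""
    job_id = str(uuid.uuid4())
    with jobs_lock:
        jobs[job_id] = {"id": job_id, "total": len(urls), "completed": 0, "files": []}
    for url in urls:
        task_queue.put((job_id, url))
    return job_id


def fake_crawl(url):
    """Hypothetical placeholder for the real crawling and analysis step."""
    return []


def worker():
    """Task-side role: consume tasks until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            task_queue.task_done()
            break
        job_id, url = item
        found = fake_crawl(url)
        with jobs_lock:
            jobs[job_id]["files"].extend(found)
            jobs[job_id]["completed"] += 1
        task_queue.task_done()


def run_workers(count):
    """Start `count` workers, wait for the queue to drain, then stop them."""
    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    task_queue.join()
    for _ in threads:
        task_queue.put(None)
    for t in threads:
        t.join()
```

Because the workers only share the queue and the job store, raising `count` here is the in-process analogue of starting more task containers.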


The following instructions assume the user is running on OS X and has boot2docker configured.

  • Start a rabbitmq container:
    > docker run -d -p 5672:5672 -p 15672:15672 -v /tmp/log:/data/log -v /tmp/data/mnesia:/data/mnesia dockerfile/rabbitmq
  • Start a mongodb container:
    > docker run -d -p 27017:27017 -v /tmp/data/db:/data/db --name mongodb dockerfile/mongodb
  • Get the IP of the docker vm:
    > boot2docker ip (assuming is for the examples)
  • Create a mongo user for the app to use:
    mongo --eval 'db.getSiblingDB("tasks").addUser({user:"tasks", pwd:"tasks", roles:["readWrite"]})'
  • Start a rest container:
    > docker run -d -p 5000:5000 -e ROLE=rest -e MESSAGING="amqp://guest:guest@" -e DB="mongodb://tasks:tasks@" ramiro/crawler
  • Start a task container:
    > docker run -d -e ROLE=task -e MESSAGING="amqp://guest:guest@" -e DB="mongodb://tasks:tasks@" ramiro/crawler
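The two `docker run` commands for the crawler differ only in the role and the published port, so they can be generated from the VM address. The sketch below is a hypothetical helper, not part of the project. It assumes that the text after `@` in both URLs is just the address returned by `boot2docker ip`; the original values may also include a port or database name. The address `192.0.2.10` in the usage note is a documentation placeholder.

```python
def connection_urls(docker_ip):
    """Build the MESSAGING and DB values (assumed form: credentials@host)."""
    return [
        ("MESSAGING", "amqp://guest:guest@{}".format(docker_ip)),
        ("DB", "mongodb://tasks:tasks@{}".format(docker_ip)),
    ]


def docker_run_args(role, docker_ip, publish_port=None):
    """Return the argument list for starting a crawler container in a role."""
    args = ["docker", "run", "-d"]
    if publish_port is not None:
        # Only the rest role publishes a port in the steps above.
        args += ["-p", "{0}:{0}".format(publish_port)]
    args += ["-e", "ROLE={}".format(role)]
    for name, value in connection_urls(docker_ip):
        args += ["-e", "{}={}".format(name, value)]
    args.append("ramiro/crawler")
    return args
```

For example, `docker_run_args("rest", "192.0.2.10", 5000)` yields the rest command and `docker_run_args("task", "192.0.2.10")` the task command; either list can be passed to `subprocess.run`.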


  • Create a new job:
    > curl -X POST -d '{"urls":[""]}' -H Content-Type:application/json
    {"id": "97a2ad0b-0bd6-412f-a5f3-cef49fc12aa9"}
  • Check the status of the job:
    {
      "completed": 0,
      "files": [],
      "id": "97a2ad0b-0bd6-412f-a5f3-cef49fc12aa9",
      "total": 1,
      "updated": "Wed, 24 Sep 2014 08:18:25 GMT"
    }
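A client needs to do two things with these payloads: build the request body and interpret the status. The sketch below covers both using only the standard library. `build_job_body` and `summarize_status` are hypothetical helper names, and reading `completed` as the number of finished URLs out of `total` is an assumption drawn from the sample response, not documented behaviour. The endpoint URL is left out because it is not given above.

```python
import json
from email.utils import parsedate_to_datetime


def build_job_body(urls):
    """Serialize the JSON body sent with the POST request."""
    return json.dumps({"urls": list(urls)})


def summarize_status(status):
    """Derive progress information from a status response dict."""
    total = status["total"]
    completed = status["completed"]
    return {
        # Assumption: the job is finished once every URL has been processed.
        "done": completed >= total,
        "remaining": max(total - completed, 0),
        # The timestamp uses the HTTP date format, which email.utils can parse.
        "updated": parsedate_to_datetime(status["updated"]),
    }
```

A polling loop could call `summarize_status` on each response and stop once `done` is true.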