
Last pushed: a year ago
Short Description
Scrapers running on multiple containers managed with Celery
Full Description

Scraping using Celery on Docker

This image installs the basic Celery dependencies needed to run against a RabbitMQ broker on a separate host. To run it, first create a new network for the containers:

docker network create my-network

Then pull and run the RabbitMQ image on the new network:

docker run -d --net my-network --name rabbitmq_X rabbitmq:3
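RabbitMQ can take a few seconds to start accepting connections, and workers started before that point fail to connect. The helper below is a hypothetical sketch (`wait_for_broker` is not part of the image). It assumes RabbitMQ's default AMQP port, 5672, and only checks that the TCP port is open. It does not speak AMQP.

```python
import socket
import time


def wait_for_broker(host: str, port: int = 5672, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires.

    Hypothetical helper: it only proves the port is listening, not that
    the AMQP handshake or credentials will succeed.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (including DNS errors)
            # while the broker is not reachable yet
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # back off briefly before retrying
    return False
```

From another container on `my-network`, `wait_for_broker("rabbitmq_X")` should work, because Docker's embedded DNS resolves container names on user-defined networks.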

There is currently no Dockerfile, so it is best to run the container with an interactive terminal and start the Celery process from inside it:

docker run -it --net my-network --name celery_X javg44/celery-basic

The source code is in the /opt/app directory, where you will find the file you can edit to add a result backend, add more tasks, etc. The project is designed to run a master in one container and workers in several others so that the application can scale. For that reason, this base image must be present in every container that performs tasks.


The recommended architecture is one master and N workers, all on the same network:

docker run -it --net my-network --name celery_master javg44/celery-scraper
docker run -it --net my-network --name celery_worker_N javg44/celery-scraper
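Every container in this layout talks to the same RabbitMQ instance, so the Celery instance in each one needs the broker's address. On a user-defined network, Docker resolves container names, so the name `rabbitmq_X` can be used instead of an IP address. The sketch below assumes RabbitMQ's default guest credentials and default vhost. `broker_url` and the `BROKER_HOST` environment variable are hypothetical names, not part of the image.

```python
import os
from urllib.parse import quote


def broker_url(host: str, user: str = "guest", password: str = "guest",
               port: int = 5672, vhost: str = "/") -> str:
    """Build an AMQP URL suitable for Celery's broker argument."""
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # RabbitMQ's default vhost "/" is written as an extra slash: ...:5672//
    path = "/" if vhost == "/" else quote(vhost, safe="")
    return f"amqp://{creds}@{host}:{port}/{path}"


# BROKER_HOST is a hypothetical variable; default to the container name
BROKER_URL = broker_url(os.environ.get("BROKER_HOST", "rabbitmq_X"))
```

The result would then be passed to the Celery instance, e.g. `Celery("tasks", broker=BROKER_URL)`. This way, the same image works in the master and in every worker without editing the address by hand.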

*If you have trouble running Celery (for example, the worker refusing to start as root), set the following environment variable:

export C_FORCE_ROOT="true"

and verify that it was set correctly:

echo $C_FORCE_ROOT

Once the N+1 containers are running, change the IP address declared in the Celery instance (inside to the one assigned to the RabbitMQ container. Then, from each worker container, run:

cd /opt/app
celery -A tasks worker -c 10 --loglevel=INFO

This way, the workers listen for the tasks sent by the master. Finally, inside the master container, change to /opt/app and run python so that it reads all the stores listed in the stores.json file.
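The master's job of reading stores.json and handing each store to the workers can be sketched as follows. The file layout (a JSON array of objects with a `url` key) and the names `load_stores` and `dispatch` are assumptions made for illustration. The actual structure of stores.json and the master script are not shown in this repository description.

```python
import json
from typing import Callable, Iterable


def load_stores(path: str = "stores.json") -> list:
    """Read stores.json; assumes it holds a JSON array of store objects."""
    with open(path, encoding="utf-8") as fh:
        stores = json.load(fh)
    if not isinstance(stores, list):
        raise ValueError(f"{path} should contain a JSON array")
    return stores


def dispatch(stores: Iterable[dict], send: Callable[[str], object]) -> list:
    """Queue one task per store and return whatever send() returns."""
    # "url" is a hypothetical key; adapt it to the real structure of stores.json
    return [send(store["url"]) for store in stores]
```

On the master, this would be called as `dispatch(load_stores(), scrape_store.delay)`, where `scrape_store` stands for whichever task the tasks module defines. Each `.delay()` call puts a message on RabbitMQ, and one of the listening workers picks it up. Passing `send` in as a parameter keeps the dispatch logic testable without a running broker.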
