# Run a Scrapyd service in Docker
A Scrapyd docker image based on the tiny Alpine Linux.
## Download and run Scrapyd image

```
docker run -d --restart always --name scrapyd -p 6800:6800 harrisbaird/scrapyd
```
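To check that the service came up, you can hit the web console or the status endpoint. A quick sketch; the `daemonstatus.json` call assumes a Scrapyd version (1.2 or later) that provides that endpoint:

```shell
# The Scrapyd web console should now answer on port 6800
curl http://localhost:6800/

# On Scrapyd 1.2+, daemonstatus.json reports basic health information
curl http://localhost:6800/daemonstatus.json
```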
!!!!!! WARNING !!!!!!
It's a bad idea to publicly expose a Scrapyd service without some form of authentication, as anyone will be able to access your spiders or, even worse, run code on your server.
Instead you should either:
- Deploy directly during the docker build process and link the containers which require access to Scrapyd.
- Expose Scrapyd via something like Nginx with basic auth enabled. You can then add the username and password to your `scrapy.cfg` and deploy with `scrapyd-deploy` (see the sketch after this list).
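A minimal sketch of the Nginx option, assuming the Scrapyd container is reachable under the hostname `scrapyd` (for example via a container link or a shared Docker network) and that an htpasswd file has already been created at `/etc/nginx/htpasswd`:

```nginx
server {
    listen 80;

    location / {
        # Require a username/password before forwarding anything to Scrapyd
        auth_basic           "Scrapyd";
        auth_basic_user_file /etc/nginx/htpasswd;
        proxy_pass           http://scrapyd:6800;
    }
}
```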
## Uploading spiders to Scrapyd
Install `scrapyd-client` and run `scrapyd-deploy` in your spider project directory to package it up and upload it to Scrapyd.

You'll need to add some config to your project's `scrapy.cfg` file; take a look at the scrapyd-client docs for more info.
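As a rough sketch, the `[deploy]` section of `scrapy.cfg` might look like the following (the project name and credentials are placeholder values):

```ini
[deploy]
url = http://localhost:6800/
project = default
# Only needed if Scrapyd sits behind basic auth (e.g. the Nginx setup above)
username = scrapy
password = changeme
```

With that in place, running `scrapyd-deploy` from the project directory packages the spiders into an egg and uploads it to the configured Scrapyd instance.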
## Uploading spiders during docker build process
You can deploy your spiders directly to Scrapyd during the docker build process.
This way you can just link to the container instead of publicly exposing your Scrapyd service.
```dockerfile
FROM harrisbaird/scrapyd:latest

RUN mkdir /app
WORKDIR /app
COPY . /app

RUN scrapyd & PID=$! && \
    echo "Waiting for Scrapyd to start" && \
    sleep 2 && \
    scrapyd-deploy && \
    kill $PID
```
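To tie it together, a hypothetical build plus a container that talks to it might look like this (the image and container names are placeholders):

```shell
# Build an image containing both Scrapyd and your deployed spiders
docker build -t my-scrapyd .
docker run -d --restart always --name scrapyd my-scrapyd

# Another container can reach it as http://scrapyd:6800 through the link,
# without publishing port 6800 on the host
docker run --link scrapyd:scrapyd my-scheduler
```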
## Scheduling a job

```
curl http://localhost:6800/schedule.json -d project=default -d spider=my_spider_name
```
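Extra `-d` parameters (other than the ones Scrapyd reserves, such as `setting`) are passed through to the spider as arguments. On success the response contains the new job id, roughly:

```json
{"status": "ok", "jobid": "6487ec79947edab326d6db28a2d86511e8247444"}
```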
## Listing all jobs

```
curl http://localhost:6800/listjobs.json -d project=default
```
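The response groups jobs by state and looks roughly like this (ids and timestamps are illustrative):

```json
{
  "status": "ok",
  "pending": [{"id": "78391cc0fcaf11e1", "spider": "my_spider_name"}],
  "running": [{"id": "422e608f9f28cef1", "spider": "my_spider_name", "start_time": "2017-01-01 10:14:03.594664"}],
  "finished": [{"id": "2f16646cfcaf11e1", "spider": "my_spider_name", "start_time": "2017-01-01 09:14:03.594664", "end_time": "2017-01-01 09:24:03.594664"}]
}
```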
For a full list of endpoints, take a look at the Scrapyd docs.