Crawler uses a distributed queue to publish messages containing the URLs to crawl. Nodes subscribed to the queue pull those messages and crawl the URLs. New URLs found while crawling a page are sent back to the queue for other nodes to process. Crawler enforces a hard crawl depth of two layers: it crawls the URLs found in the first pages it receives, but it stops there (I don't really want to download the whole internet).
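The queue-plus-depth-limit behavior can be sketched roughly like this (a minimal, offline Python sketch, not the project's actual code: the in-memory `deque` stands in for the distributed queue, and the injected `fetch_links` function stands in for downloading a page and extracting its links):

```python
from collections import deque

MAX_DEPTH = 2  # hard limit: crawl the seed pages and the links found on them, no further


def crawl(fetch_links, seeds):
    """Breadth-first crawl bounded to MAX_DEPTH layers.

    fetch_links(url) is a hypothetical stand-in for downloading a page
    and extracting its links; it is injected so the sketch stays offline.
    """
    queue = deque((url, 1) for url in seeds)  # depth 1 = the seed pages
    seen = set(seeds)
    crawled = []
    while queue:
        url, depth = queue.popleft()
        crawled.append(url)  # "crawl" the page
        if depth >= MAX_DEPTH:
            continue  # links found at the last layer are not followed
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))  # publish for another node
    return crawled
```

For example, with pages `a -> b -> c`, seeding the crawl with `a` visits `a` and `b`, but `c` sits three layers deep and is never fetched.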
Read more about this project in its GitHub readme: https://github.com/calavera/crawler#crawler
Also check the TL;DR for the best way to bring a cluster of crawlers up and running: https://github.com/calavera/crawler#tldr