Public | Automated Build

Last pushed: 8 months ago
Short Description
Sparkler build for Domain Discovery evaluation
Full Description

Sparkler Crawl Environment

The Sparkler Crawl Environment aims to provide an efficient, scalable, consistent, and reliable software architecture made up of domain discovery tools. These tools enrich a given domain by expanding the collection of artifacts that define it.

This repository, named sce, provides a command-line utility that builds the Sparkler Crawl Environment as a multi-container Docker application, run with Docker Compose on a single node. As a proof of concept, you can install the environment on a single node with the bash script below, which builds and starts all the software components automatically:

./ [-l /path/to/log]
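
For readers curious how a wrapper of this kind might work, here is a minimal sketch. It is not the actual sce script, whose name and internals are not shown here. The function names `parse_log_path` and `start_stack`, the default log path `./sce.log`, and the presence of a `docker-compose.yml` in the current directory are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: NOT the real sce script.

# Parse an optional "-l /path/to/log" flag and print the resolved log path.
parse_log_path() {
  log_path="./sce.log"   # assumed default, not taken from the source
  OPTIND=1               # reset so the function can be called repeatedly
  while getopts ":l:" opt; do
    case "$opt" in
      l) log_path="$OPTARG" ;;
      *) echo "usage: [-l /path/to/log]" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$log_path"
}

# Build the images and start every service defined in the compose file
# (assumed to be ./docker-compose.yml), appending all output to the log.
start_stack() {
  stack_log="$(parse_log_path "$@")" || return 1
  docker-compose up -d --build >>"$stack_log" 2>&1
}
```

Under these assumptions, calling `start_stack -l /tmp/sce.log` would build and start the containers in the background and collect the build output in `/tmp/sce.log`.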

The Sparkler Crawl Environment is built on top of Sparkler, a web crawler that draws on recent advances in distributed computing and information retrieval by combining several Apache projects, including Spark, Kafka, Lucene/Solr, Tika, and Felix. Sparkler is an extensible, highly scalable, high-performance crawler that evolved from Apache Nutch and runs on an Apache Spark cluster.
