Public Repository

Last pushed: 2 years ago
Short Description
A distributed analytic engine containing R, Mahout, Spark, Neo4j, h2o, Oryx, MOA & Romar.
Full Description

A distributed analytic engine built entirely on open technologies, offering potentially thousands of analytic algorithms from R, Mahout, Spark, Neo4j, h2o, Oryx, MOA, Romar and many other packages.

The engine focuses on unifying all of these packages, many of which are very different in nature from one another. It uses R as the primary integration layer: RServe, running inside the engine, accepts R requests from consuming R client applications, such as the R editors in Weka, visone or KNIME, or directly from RStudio.
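As a minimal sketch of the client side, assuming the engine's RServe listens on Rserve's standard port 6311 and that the third-party Python client `pyRserve` is installed, a consuming application could ship an R expression to the engine as shown below. The host name `analytic-engine` and the helpers `build_request` and `run_on_engine` are hypothetical; the image's actual host and port settings are not given in this description.

```python
DEFAULT_RSERVE_PORT = 6311  # Rserve's standard TCP port (assumed unchanged in the engine)


def build_request(statements):
    """Wrap R statements in one braced block so the engine evaluates a single expression.

    Blank statements are dropped; the value of the last statement is the result.
    """
    body = "; ".join(s.strip() for s in statements if s.strip())
    return "{ " + body + " }"


def run_on_engine(expr, host="analytic-engine", port=DEFAULT_RSERVE_PORT):
    """Send an R expression to the engine's RServe and return the evaluated result.

    pyRserve is imported lazily so the rest of this sketch runs without it installed.
    The host name is a hypothetical placeholder.
    """
    import pyRserve

    conn = pyRserve.connect(host=host, port=port)
    try:
        return conn.eval(expr)
    finally:
        conn.close()


if __name__ == "__main__":
    request = build_request(["x <- c(2, 4, 6)", "mean(x)"])
    print(request)  # the string that run_on_engine(request) would send
```

Wrapping the statements in braces keeps each request a single R expression, so a client such as a KNIME or Weka R node only has to deal with one returned value per call.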

Architecturally, Weka, KNIME, visone and RStudio, together with the visualization package Gephi, form a convenient workbench suite that makes direct service calls to the engine. The engine itself sits on an associated real-time, distributed and linearly scalable lambda architecture.

The engine's RServe in turn uses connector packages such as RNeo4j, SparkR, h2o and RMOA to leverage the underlying Neo4j, Spark, h2o and MOA analytic engines. RServe also uses RHadoop to connect to Hadoop storage and RDruid to connect to the Druid-based real-time distributed lambda architecture. In addition, it has add-on adapters for the Mahout, Oryx and Romar packages.
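A minimal sketch of how a request might pull in the connectors listed above: a hypothetical lookup from backend to R package, used to emit the `library()` calls a script would run inside RServe before touching that backend. The table reflects the packages named in this description, with RHadoop represented by its `rhdfs` and `rmr2` packages; the Mahout, Oryx and Romar adapters are left out because their R-side names are not given here. `CONNECTORS` and `r_preamble` are illustrative names, not part of the engine.

```python
# Hypothetical backend -> R connector package table, based on the packages named above.
CONNECTORS = {
    "neo4j": ["RNeo4j"],
    "spark": ["SparkR"],
    "h2o": ["h2o"],
    "moa": ["RMOA"],
    "hadoop": ["rhdfs", "rmr2"],  # RHadoop is a family of packages, not one package
    "druid": ["RDruid"],
}


def r_preamble(backends):
    """Return R library() calls for the requested backends, in order, without duplicates."""
    packages = []
    for backend in backends:
        try:
            needed = CONNECTORS[backend.lower()]
        except KeyError:
            raise ValueError(f"no connector registered for backend {backend!r}") from None
        for pkg in needed:
            if pkg not in packages:
                packages.append(pkg)
    return "\n".join(f"library({pkg})" for pkg in packages)


print(r_preamble(["Spark", "hadoop", "spark"]))
```

Keeping the mapping in one table means that swapping in a newer connector only touches a single entry rather than every client script.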

The engine is also bundled with a ZooKeeper instance and a Hadoop/YARN node, which can easily be configured to join any larger existing Hadoop cluster.
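Joining an existing cluster usually comes down to pointing the node at that cluster's NameNode and ResourceManager. The sketch below renders Hadoop's `<configuration>`/`<property>` XML format using the standard property names `fs.defaultFS` (core-site.xml) and `yarn.resourcemanager.hostname` (yarn-site.xml). The host names and port are hypothetical placeholders, ZooKeeper settings are not covered, and this is not necessarily how the image itself is configured.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Render a dict of Hadoop properties in the *-site.xml <configuration> format."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical endpoints of the existing cluster the bundled node should join.
core_site = hadoop_site_xml({"fs.defaultFS": "hdfs://namenode.example.com:8020"})
yarn_site = hadoop_site_xml({"yarn.resourcemanager.hostname": "rm.example.com"})

print(core_site)
print(yarn_site)
```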

With an unusually large number of math algorithms included, the engine can be readily adapted to many analytic use cases: descriptive, diagnostic, and predictive analytics. It can handle Hadoop/YARN-based batch jobs, Kafka-Storm-Druid real-time processing, and Spark-based streaming tasks.
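To illustrate how batch and real-time paths combine in a lambda architecture, here is a generic, self-contained sketch. A batch layer recomputes a view over all events up to a cutoff, a speed layer counts only newer events, and a serving-layer query merges the two. It models the idea only. It does not use the Hadoop, Storm, Kafka or Druid APIs, and the event data and function names are hypothetical.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer: recompute counts over all events with timestamp <= cutoff."""
    return Counter(key for ts, key in events if ts <= cutoff)


def speed_view(events, cutoff):
    """Speed layer: count only events newer than the last batch cutoff."""
    return Counter(key for ts, key in events if ts > cutoff)


def query(key, batch, speed):
    """Serving layer: merge the precomputed batch view with the real-time delta."""
    return batch[key] + speed[key]


# Hypothetical (timestamp, event type) stream.
events = [(1, "click"), (2, "view"), (3, "click"), (7, "click"), (8, "view")]
batch = batch_view(events, cutoff=5)
speed = speed_view(events, cutoff=5)
print(query("click", batch, speed))
```

Here "click" appears twice before the cutoff and once after it, so the merged query reports 3 even though the batch view alone has only seen 2.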

Many of the included math packages are still in various incubation or development phases. The engine should be flexible enough to readily pick up new versions or new packages.
