Crossdata is a distributed framework and a general-purpose computing system powered by Apache Spark.

Supported tags

  • 1.5.0 (latest) (Scala 2.10)
  • 1.4.0
  • 1.3.0

For more information about Stratio Crossdata and its history, please see the GitHub repo.

What is Stratio Crossdata?

Stratio Crossdata is a distributed framework and a fast and general-purpose computing system powered by Apache Spark. It unifies the interaction with different sources supporting multiple datastore technologies.

  • High availability and scalability.
  • Speeds up query resolution through native access.
  • Supports batch and streaming queries.
  • Metadata discovery.
  • Improves and extends Apache Spark capabilities.
  • Can be deployed as a Spark library.

    An extensible datahub

    • Unifies stream and batch processing using a common language
    • Ability to access different datastore technologies
    • Optimised connectors with native access for Cassandra, MongoDB, Elasticsearch
    • Users can easily add new connectors for native access
    • Extends existing datastore capabilities (joins, group by…)
    • Easy to use SQL-like language
    • Java/Scala API and Query Builder
    • ODBC/JDBC drivers to link with existing BI tools
    • Distributed scalable fault-tolerant P2P architecture

For detailed information about Stratio Crossdata, please see the Confluence documentation.

How to use this image

To run the image with the default Stratio Crossdata command:

docker run --name crossdata-app stratio/crossdata-scala210

Using a custom Stratio Crossdata configuration

The Stratio Crossdata startup configuration is specified in the files /etc/sds/crossdata/server-application.conf and /etc/sds/crossdata/core-application.conf.

When you start the Stratio Crossdata image, you can adjust the configuration of the Crossdata Server and the Crossdata Catalog by passing one or more environment variables on the docker run command line.

Server-application.conf environment variables

XD_SEED

This variable is optional. Set it to create a cluster of Crossdata servers. The value has to be the IP address of the container that acts as the seed. For instance:

  • XD_SEED="172.17.0.2"
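
As a rough sketch, an additional server could join an existing cluster as shown here. The container name crossdata-node2 and the DOCKER dry-run variable are illustrative assumptions, not part of the image; 172.17.0.2 is the example seed address from above and has to be replaced with the real IP address of your seed container.

```shell
# DOCKER is a hypothetical convenience variable (not defined by the image):
# by default the command is only printed; set DOCKER=docker to execute it.
join_cluster() {
  seed_ip=$1   # IP address of the container acting as the seed
  ${DOCKER:-echo docker} run --name crossdata-node2 \
    -e XD_SEED="$seed_ip" \
    stratio/crossdata-scala210
}

join_cluster 172.17.0.2
```

The IP address of a running container can usually be looked up with docker inspect.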

SPARK_MASTER

This variable is optional. Set it to run Stratio Crossdata on a Spark cluster. The default value is local[4]. For instance:

  • SPARK_MASTER="spark://stratio.spark.com:7077"

XD_DRIVER_MEMORY

This variable is optional. Set it to define the Spark driver memory. The default value is 512M.

XD_EXECUTOR_MEMORY

This variable is optional. Set it to define the Spark executor memory. The default value is 512M.

XD_CORES

This variable is optional. Set it to define the maximum number of Spark cores (Spark Cores Max). The default value is 4.
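
The server variables above can be combined in a single command. The sketch below assumes that the memory variables accept values in the same format as the 512M default; 1024M and 8 cores are arbitrary illustrative values, and DOCKER is a hypothetical dry-run switch that is not part of the image.

```shell
# By default the command is only printed; set DOCKER=docker to execute it.
run_on_spark_cluster() {
  ${DOCKER:-echo docker} run --name crossdata-app \
    -e SPARK_MASTER="spark://stratio.spark.com:7077" \
    -e XD_DRIVER_MEMORY=1024M \
    -e XD_EXECUTOR_MEMORY=1024M \
    -e XD_CORES=8 \
    stratio/crossdata-scala210
}

run_on_spark_cluster
```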

Core-application.conf Streaming environment variables

Crossdata Streaming mode is not enabled by default.

XD_MODE

To enable streaming mode, set XD_MODE to "Streaming", the only value it accepts.

  • XD_MODE=Streaming

docker run --name crossdata-app -e XD_MODE=Streaming -e XD_ZOOKEEPER_CONNECTION_STRING="172.17.0.4:2181" -e XD_KAFKA_CONNECTION_STRING="172.17.0.4:9092" -e SPARK_MASTER="spark://stratio.spark.com:7077" stratio/crossdata-scala210

The following environment variables are mandatory when streaming mode is enabled.

XD_ZOOKEEPER_CONNECTION_STRING

This variable is mandatory and specifies the connection string for Zookeeper. For instance:

  • XD_ZOOKEEPER_CONNECTION_STRING="172.17.0.4:2181"

XD_KAFKA_CONNECTION_STRING

This variable is mandatory and specifies the connection string for Kafka. For instance:

  • XD_KAFKA_CONNECTION_STRING="172.17.0.4:9092"

SPARK_MASTER

This variable is mandatory in streaming mode. For instance:

  • SPARK_MASTER="spark://stratio.spark.com:7077"
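
Because three variables become mandatory in streaming mode, it can help to check them before starting the container. The following is a minimal sketch of such a check; the function name check_streaming_env is hypothetical, and the values are the examples used in this section.

```shell
# Return 0 only if every variable listed as mandatory for streaming mode
# is set to a non-empty value in the current shell.
check_streaming_env() {
  missing=0
  for var in XD_ZOOKEEPER_CONNECTION_STRING XD_KAFKA_CONNECTION_STRING SPARK_MASTER; do
    eval "value=\${$var:-}"          # read the variable named by $var
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example values taken from this section.
XD_ZOOKEEPER_CONNECTION_STRING="172.17.0.4:2181"
XD_KAFKA_CONNECTION_STRING="172.17.0.4:9092"
SPARK_MASTER="spark://stratio.spark.com:7077"

if check_streaming_env; then
  echo "streaming variables look complete"
fi
```

The check only verifies that the values are present; it does not test whether Zookeeper, Kafka or the Spark master are reachable.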

Core-application.conf catalog environment variables

By default, Stratio Crossdata uses an embedded Apache Derby database to persist all the metadata. You can choose a different catalog from these three options: MySQL, PostgreSQL or Zookeeper.

XD_CATALOG

This variable is optional. Set it to use a catalog other than the default one (MySQL, PostgreSQL or Zookeeper). It only accepts three values:

  • XD_CATALOG=MySQL
  • XD_CATALOG=PostgreSQL
  • XD_CATALOG=Zookeeper

MySQL catalog

Starting a Stratio Crossdata server with a MySQL catalog:

docker run --name crossdata-app -e XD_CATALOG=MySQL -e XD_CATALOG_HOST=172.17.0.3 -e XD_CATALOG_DB_NAME=crossdata -e XD_CATALOG_DB_TABLE=crossdataTable -e XD_CATALOG_DB_USER=root -e XD_CATALOG_DB_PASS=rootPass stratio/crossdata-scala210

XD_CATALOG_HOST

If XD_CATALOG is set to MySQL, this variable is mandatory and specifies the host where MySQL is running. For instance:

  • XD_CATALOG_HOST="172.17.0.3"

XD_CATALOG_DB_NAME

This variable is mandatory and specifies the name of the database that will be created in MySQL. For instance:

  • XD_CATALOG_DB_NAME="crossdata"

XD_CATALOG_DB_TABLE

This variable is mandatory and specifies the name of the table that will be created in MySQL to persist the metadata. For instance:

  • XD_CATALOG_DB_TABLE="crossdataTable"

XD_CATALOG_DB_USER

This variable is mandatory and specifies the user for MySQL access. For instance:

  • XD_CATALOG_DB_USER="root"

XD_CATALOG_DB_PASS

This variable is mandatory and specifies the password of the user for MySQL access. For instance:

  • XD_CATALOG_DB_PASS="rootPass"
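
Passing the password with -e leaves it in the shell history. One possible alternative, sketched here, is to put the catalog variables in a file and pass it with the --env-file option of docker run. The file location and the DOCKER dry-run variable are illustrative assumptions; the values are the examples from this section.

```shell
# Hypothetical file location; any path readable by the docker client works.
env_file="${TMPDIR:-/tmp}/crossdata-mysql.env"

# One KEY=value pair per line, written without quotes because quotes in an
# env file may be kept as part of the value.
cat > "$env_file" <<'EOF'
XD_CATALOG=MySQL
XD_CATALOG_HOST=172.17.0.3
XD_CATALOG_DB_NAME=crossdata
XD_CATALOG_DB_TABLE=crossdataTable
XD_CATALOG_DB_USER=root
XD_CATALOG_DB_PASS=rootPass
EOF
chmod 600 "$env_file"   # restrict the file to the current user

# By default the command is only printed; set DOCKER=docker to execute it.
${DOCKER:-echo docker} run --name crossdata-app --env-file "$env_file" stratio/crossdata-scala210
```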

PostgreSQL catalog

Starting a Stratio Crossdata server with a PostgreSQL catalog:

docker run --name crossdata-app -e XD_CATALOG=PostgreSQL -e XD_CATALOG_HOST=172.17.0.3 -e XD_CATALOG_DB_NAME=crossdata -e XD_CATALOG_DB_TABLE=crossdataTable -e XD_CATALOG_DB_USER=root -e XD_CATALOG_DB_PASS=rootPass stratio/crossdata-scala210

XD_CATALOG_HOST

If XD_CATALOG is set to PostgreSQL, this variable is mandatory and specifies the host where PostgreSQL is running. For instance:

  • XD_CATALOG_HOST="172.17.0.3"

XD_CATALOG_DB_NAME

This variable is mandatory and specifies the name of the database that will be created in PostgreSQL. For instance:

  • XD_CATALOG_DB_NAME="crossdata"

XD_CATALOG_DB_TABLE

This variable is mandatory and specifies the name of the table that will be created in PostgreSQL to persist the metadata. For instance:

  • XD_CATALOG_DB_TABLE="crossdataTable"

XD_CATALOG_DB_USER

This variable is mandatory and specifies the user for PostgreSQL access. For instance:

  • XD_CATALOG_DB_USER="root"

XD_CATALOG_DB_PASS

This variable is mandatory and specifies the password of the user for PostgreSQL access. For instance:

  • XD_CATALOG_DB_PASS="rootPass"

Zookeeper catalog

Starting a Stratio Crossdata server with a Zookeeper catalog:

docker run --name crossdata-app -e XD_CATALOG=Zookeeper -e XD_CATALOG_ZOOKEEPER_CONNECTION_STRING=172.17.0.4:2181 stratio/crossdata-scala210

XD_CATALOG_ZOOKEEPER_CONNECTION_STRING

This variable is mandatory and specifies the connection string for Zookeeper. For instance:

  • XD_CATALOG_ZOOKEEPER_CONNECTION_STRING=172.17.0.4:2181

XD_CATALOG_ZOOKEEPER_CONNECTION_TIMEOUT

This variable is optional. Set it to define the Zookeeper connection timeout. The default value is 15000. For instance:

  • XD_CATALOG_ZOOKEEPER_CONNECTION_TIMEOUT=10000

XD_CATALOG_ZOOKEEPER_SESSION_TIMEOUT

This variable is optional. Set it to define the Zookeeper session timeout. The default value is 60000. For instance:

  • XD_CATALOG_ZOOKEEPER_SESSION_TIMEOUT=10000

XD_CATALOG_ZOOKEEPER_RETRY_ATTEMPS

This variable is optional. Set it to define the number of Zookeeper retry attempts. The default value is 5. For instance:

  • XD_CATALOG_ZOOKEEPER_RETRY_ATTEMPS=3

XD_CATALOG_ZOOKEEPER_RETRY_INTERVAL

This variable is optional. Set it to define the Zookeeper retry interval. The default value is 10000. For instance:

  • XD_CATALOG_ZOOKEEPER_RETRY_INTERVAL=50000
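
The optional Zookeeper settings can be passed together with the mandatory connection string. The sketch below simply reuses the example values from this section; DOCKER is a hypothetical dry-run switch that is not part of the image.

```shell
# By default the command is only printed; set DOCKER=docker to execute it.
run_with_zookeeper_catalog() {
  ${DOCKER:-echo docker} run --name crossdata-app \
    -e XD_CATALOG=Zookeeper \
    -e XD_CATALOG_ZOOKEEPER_CONNECTION_STRING=172.17.0.4:2181 \
    -e XD_CATALOG_ZOOKEEPER_CONNECTION_TIMEOUT=10000 \
    -e XD_CATALOG_ZOOKEEPER_SESSION_TIMEOUT=10000 \
    -e XD_CATALOG_ZOOKEEPER_RETRY_ATTEMPS=3 \
    -e XD_CATALOG_ZOOKEEPER_RETRY_INTERVAL=50000 \
    stratio/crossdata-scala210
}

run_with_zookeeper_catalog
```

Any optional variable left out of the command keeps the default value listed above.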

Get Support

You can find help in Google Groups.

Alternatively, you can reach us on Gitter or on our IRC channel #stratio-crossdata. Feel free to ask; if we are available, we'll try to help you.

Owner
stratio