Last pushed: 7 months ago
Short Description
export data from Apache Cassandra to Apache Kafka
Full Description

What is cassandra2kafka?

It is an Alpine image with a Go program that uses the gocql and sarama drivers to export data from Apache Cassandra to Apache Kafka.
The table that contains the data you want to export must have three columns: id (timeuuid), utctime (timestamp) or sequence (int), and data (text); PRIMARY KEY (id, utctime). You can control which rows are exported by their timestamp or sequence number.

Features

  • Works with Apache Cassandra 3.x and Apache Kafka >= 0.8
  • Prints status messages to stderr
  • Very easy to configure through environment variables
  • Auto-discovers Kafka peers from a DNS name
  • Waits for Kafka to be ready
  • Auto-reconnects and retries on error
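
The peer discovery and retry behaviour listed above could look roughly like the following Go sketch. How the image actually implements them is not documented here, so everything in it is an assumption: the names discoverPeers, retry and errNoPeers are hypothetical, and discovery is assumed to be a plain host lookup of the service name.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"strconv"
	"time"
)

// errNoPeers is a hypothetical sentinel for an empty lookup result.
var errNoPeers = errors.New("no peers found")

// discoverPeers resolves a service name to "host:port" broker addresses.
// lookup is injectable so the function can be exercised without DNS;
// net.LookupHost has the matching signature.
func discoverPeers(lookup func(string) ([]string, error), service string, port int) ([]string, error) {
	hosts, err := lookup(service)
	if err != nil {
		return nil, fmt.Errorf("resolving %q: %w", service, err)
	}
	if len(hosts) == 0 {
		return nil, errNoPeers
	}
	peers := make([]string, 0, len(hosts))
	for _, h := range hosts {
		peers = append(peers, net.JoinHostPort(h, strconv.Itoa(port)))
	}
	return peers, nil
}

// retry calls try until it succeeds or attempts run out,
// sleeping for delay between consecutive tries.
func retry(attempts int, delay time.Duration, try func() error) error {
	if attempts <= 0 {
		return errors.New("attempts must be positive")
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = try(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	// "localhost" keeps the demo runnable anywhere; inside the container
	// the name would be the value of KAFKA_SERVICE.
	var peers []string
	err := retry(3, 100*time.Millisecond, func() error {
		var e error
		peers, e = discoverPeers(net.LookupHost, "localhost", 9092)
		return e
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "kafka not ready:", err)
		return
	}
	fmt.Fprintln(os.Stderr, "kafka peers:", peers)
}
```

Passing the lookup function as a parameter is only a convenience for testing; a real client would hand the resulting address list to its Kafka driver.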

How to use this image

Example:

$ docker run --name some-cassandra2kafka --env CLUSTER="172.16.56.183" --env KEYSPACE="example" --env TABLE="imported" puyi/cassandra2kafka

IMPORTANT: if you do not set the required environment variables, the image will not work correctly.

Required Environment Variables

CLUSTER

The IP address of the Cassandra node to connect to.

KEYSPACE

The keyspace to use.

TABLE

The table in that keyspace that holds the data to export.

Other Environment Variables

OPT

Indicates whether DATA is a timestamp (-t) or a sequence number (-s). The default value is -t.

DATA

If you set OPT to -t: a UTC timestamp. expcassandra returns all entries in the table whose timestamp is equal to or later than DATA.
If you set OPT to -s: a sequence number. expcassandra returns all entries in the table whose sequence is equal to or greater than DATA.

The default value is "0001-01-01 00:00:00.001", which means that all the data will be exported.
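
To make the "equal or later" rule concrete, here is a minimal Go sketch that applies it to rows held in memory. It does not show how the image queries Cassandra; the Row type and the selectRows and mustTime helpers are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// layout is Go's reference-time spelling of the documented date format.
const layout = "2006-01-02 15:04:05.000"

// Row is a hypothetical in-memory stand-in for one table entry.
type Row struct {
	UTCTime  time.Time
	Sequence int
	Data     string
}

// mustTime parses a UTC timestamp in the documented format or panics.
func mustTime(s string) time.Time {
	t, err := time.ParseInLocation(layout, s, time.UTC)
	if err != nil {
		panic(err)
	}
	return t
}

// selectRows keeps the rows at or after the start point described by
// opt ("-t" or "-s") and data.
func selectRows(rows []Row, opt, data string) ([]Row, error) {
	var keep func(Row) bool
	switch opt {
	case "-t":
		start, err := time.ParseInLocation(layout, data, time.UTC)
		if err != nil {
			return nil, fmt.Errorf("DATA is not a valid timestamp: %w", err)
		}
		keep = func(r Row) bool { return !r.UTCTime.Before(start) }
	case "-s":
		start, err := strconv.Atoi(data)
		if err != nil {
			return nil, fmt.Errorf("DATA is not a valid sequence number: %w", err)
		}
		keep = func(r Row) bool { return r.Sequence >= start }
	default:
		return nil, fmt.Errorf("unknown OPT %q (want -t or -s)", opt)
	}
	var out []Row
	for _, r := range rows {
		if keep(r) {
			out = append(out, r)
		}
	}
	return out, nil
}

func main() {
	rows := []Row{
		{mustTime("2017-01-01 00:00:00.000"), 1, "a"},
		{mustTime("2017-06-01 12:30:00.000"), 2, "b"},
	}
	byTime, _ := selectRows(rows, "-t", "2017-06-01 12:30:00.000")
	bySeq, _ := selectRows(rows, "-s", "1")
	fmt.Println(len(byTime), len(bySeq)) // prints: 1 2
}
```

With the default DATA value every row in this sketch passes the -t filter, which matches the statement that all data is exported.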

LIMIT

Page size of each query. The default value is 100, which means that 100 entries from the specified table are exported per query.

SLEEPTIME

Sleep time (milliseconds) between queries. The default value is 100.

With LIMIT and SLEEPTIME you can configure how fast expcassandra returns results. Be careful: if LIMIT is very high, queries may time out.
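
The interaction between LIMIT and SLEEPTIME can be sketched in Go as a paging loop over an in-memory slice. This is an illustration under assumptions, not the image's code: exportPaged and maxRowsPerSecond are hypothetical names, and the rate is only an upper bound because it ignores the time each query takes.

```go
package main

import (
	"fmt"
	"time"
)

// exportPaged walks rows in pages of at most limit entries, passes each
// page to emit, and sleeps between pages (not after the last one).
// It returns the number of pages emitted.
func exportPaged(rows []string, limit int, sleep time.Duration, emit func(page []string)) int {
	if limit <= 0 {
		return 0
	}
	pages := 0
	for start := 0; start < len(rows); start += limit {
		end := start + limit
		if end > len(rows) {
			end = len(rows)
		}
		emit(rows[start:end])
		pages++
		if end < len(rows) {
			time.Sleep(sleep)
		}
	}
	return pages
}

// maxRowsPerSecond is an upper bound on throughput: one page of limit
// rows every sleepMillis milliseconds, ignoring query time.
// It returns 0 when no pause is configured, meaning pacing sets no bound.
func maxRowsPerSecond(limit, sleepMillis int) int {
	if sleepMillis <= 0 {
		return 0
	}
	return limit * 1000 / sleepMillis
}

func main() {
	rows := make([]string, 250)
	pages := exportPaged(rows, 100, 100*time.Millisecond, func(p []string) {
		fmt.Println("page of", len(p), "rows")
	})
	fmt.Println(pages, "pages; at most", maxRowsPerSecond(100, 100), "rows/s")
}
```

With the default values (LIMIT 100, SLEEPTIME 100) the bound works out to 1000 rows per second; raising LIMIT increases it at the cost of larger, slower queries.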

KAFKA_SERVICE

The DNS name of the Kafka broker service. The default value is "kafka".

KAFKA_PORT

Port used to connect to the Kafka peers. The default value is 9092.

TOPIC

The topic to produce to. The default value is "mytopic".

PARTITION

The partition to produce to. The default value is empty (""), which means that it will produce to all partitions.

KEY

The key of the produced messages. The default value is empty (""), which means that messages are produced without a key.
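
As a summary of the variables and defaults listed above, the following Go sketch reads them into a struct. It is not the image's source code: Config, its field names and loadConfig are hypothetical, and treating an empty value the same as an unset one is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds the documented settings. Field names are illustrative.
type Config struct {
	Cluster, Keyspace, Table string // required
	Opt, Data                string
	Limit, SleepMillis       int
	KafkaService             string
	KafkaPort                int
	Topic, Partition, Key    string
}

// loadConfig reads settings through getenv (os.Getenv in normal use) and
// applies the documented default when a variable is unset or empty.
func loadConfig(getenv func(string) string) (Config, error) {
	for _, name := range []string{"CLUSTER", "KEYSPACE", "TABLE"} {
		if getenv(name) == "" {
			return Config{}, fmt.Errorf("required variable %s is not set", name)
		}
	}
	str := func(name, def string) string {
		if v := getenv(name); v != "" {
			return v
		}
		return def
	}
	var firstErr error
	num := func(name string, def int) int {
		v := getenv(name)
		if v == "" {
			return def
		}
		n, err := strconv.Atoi(v)
		if err != nil && firstErr == nil {
			firstErr = fmt.Errorf("%s must be an integer, got %q", name, v)
		}
		return n
	}
	cfg := Config{
		Cluster:      getenv("CLUSTER"),
		Keyspace:     getenv("KEYSPACE"),
		Table:        getenv("TABLE"),
		Opt:          str("OPT", "-t"),
		Data:         str("DATA", "0001-01-01 00:00:00.001"),
		Limit:        num("LIMIT", 100),
		SleepMillis:  num("SLEEPTIME", 100),
		KafkaService: str("KAFKA_SERVICE", "kafka"),
		KafkaPort:    num("KAFKA_PORT", 9092),
		Topic:        str("TOPIC", "mytopic"),
		Partition:    getenv("PARTITION"),
		Key:          getenv("KEY"),
	}
	if firstErr != nil {
		return Config{}, firstErr
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "configuration error:", err)
		return
	}
	fmt.Fprintf(os.Stderr, "exporting %s.%s to topic %s\n", cfg.Keyspace, cfg.Table, cfg.Topic)
}
```

Taking getenv as a parameter keeps the function easy to exercise with a map instead of the real process environment.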

Date format

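yyyy-MM-dd HH:mm:ss.SSS (MM is the month, mm the minutes, SSS the milliseconds)

Expressed in the UTC time zone, for example "0001-01-01 00:00:00.001".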
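
If you need to produce or check a DATA value programmatically, the format maps onto a Go time layout as in the sketch below. The helper names formatData and parseData are hypothetical, and working in Unix milliseconds is just a convenient choice for the example.

```go
package main

import (
	"fmt"
	"time"
)

// dataLayout is Go's reference-time spelling of the date format above.
const dataLayout = "2006-01-02 15:04:05.000"

// formatData renders Unix milliseconds as a DATA value in UTC.
func formatData(unixMillis int64) string {
	return time.UnixMilli(unixMillis).UTC().Format(dataLayout)
}

// parseData reads a DATA value as UTC and returns Unix milliseconds.
func parseData(s string) (int64, error) {
	t, err := time.ParseInLocation(dataLayout, s, time.UTC)
	if err != nil {
		return 0, err
	}
	return t.UnixMilli(), nil
}

func main() {
	fmt.Println(formatData(1500000000123)) // 2017-07-14 02:40:00.123
	ms, err := parseData("1970-01-01 00:00:01.500")
	fmt.Println(ms, err) // 1500 <nil>
}
```

The layout requires exactly three fractional digits, so a value such as "2017-07-14 02:40:00" without milliseconds is rejected by parseData.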

Related work

puyi/kafka2cassandra
puyi/expcassandra
puyi/impcassandra
kafka-console-producer
kafka-console-consumer

Owner
puyi