What is cassandra2kafka?
It is an Alpine-based image containing a Go program that uses the gocql and sarama drivers to export data from Apache Cassandra to Apache Kafka.
The table that holds the data you want to export must have three columns: id (timeuuid), utctime (timestamp) or sequence (int), and data (text), with PRIMARY KEY (id, utctime). You can control which data is exported based on its timestamp or sequence number.
- Works with Apache Cassandra 3.x and Apache Kafka >= 0.8
- Prints status messages to stderr
- Very easy to configure through environment variables
- Auto-discovers Kafka peers from a DNS name
- Waits for Kafka to be ready
- Reconnects and retries automatically on error
How to use this image
$ docker run --name some-cassandra2kafka --env CLUSTER="172.16.56.183" --env KEYSPACE="example" --env TABLE="imported" puyi/cassandra2kafka
IMPORTANT: if you do not set the required environment variables, the image will not work correctly.
Required Environment Variables
CLUSTER sets the Cassandra IP address to connect to.
KEYSPACE sets the keyspace to use.
TABLE sets the table in that keyspace that holds the data to export.
Other Environment Variables
OPT indicates whether DATA is a timestamp (-t) or a sequence number (-s). The default value is -t.
DATA is the point from which the export starts.
If OPT is -t, DATA is a UTC timestamp: expcassandra returns all entries in the table whose timestamp is equal to or later than DATA.
If OPT is -s, DATA is a sequence number: expcassandra returns all entries in the table whose sequence is equal to or greater than DATA.
The default value is "0001-01-01 00:00:00.001", which means that all the data will be exported.
LIMIT is the page size of each query. The default value is 100, which means that each query exports 100 entries from the specified table.
SLEEPTIME is the sleep time (in milliseconds) between queries. The default value is 100.
With LIMIT and SLEEPTIME you can configure how fast expcassandra returns results. Be careful: if LIMIT is very high, queries may time out.
The DNS name of the Kafka broker service. The default value is "kafka".
The port used to connect to the Kafka peers. The default value is 9092.
The topic to produce to. The default value is "mytopic".
The partition to produce to. The default value is empty (""), which means that it will produce to all partitions.
The key of produced messages. The default value is empty (""), which means that it will produce in every topic.
Expressed in UTC time zone