Docker image repository housing builds of Reaper for Apache Cassandra.

For the full documentation visit: http://cassandra-reaper.io.

To view the code, visit: https://github.com/thelastpickle/cassandra-reaper.

Running Reaper

Pull the Latest Image

Make sure you have the latest image within your local image repository:

# use master if you want to run with the latest commits
TAG=master

# use latest if you want the latest stable release
TAG=latest

# pull the latest image
docker pull thelastpickle/cassandra-reaper:${TAG}

In-Memory Backend

To launch a Reaper container backed by an In-Memory backend, use the following example with the appropriate JMX authentication settings for the cluster it will manage repairs for.

TAG=latest

REAPER_JMX_AUTH_USERNAME=reaperUser
REAPER_JMX_AUTH_PASSWORD=reaperPass

docker run \
    -p 8080:8080 \
    -p 8081:8081 \
    -e "REAPER_JMX_AUTH_USERNAME=${REAPER_JMX_AUTH_USERNAME}" \
    -e "REAPER_JMX_AUTH_PASSWORD=${REAPER_JMX_AUTH_PASSWORD}" \
    thelastpickle/cassandra-reaper:${TAG}

Then visit the Reaper UI: http://localhost:8080/webui/.

Cassandra Backend

To launch a Reaper container backed by Cassandra, use the following example to connect to a Cassandra cluster that already has the reaper_db keyspace. Set the appropriate JMX authentication settings for the cluster that Reaper will manage repairs for.

TAG=latest

REAPER_JMX_AUTH_USERNAME=reaperUser
REAPER_JMX_AUTH_PASSWORD=reaperPass

REAPER_CASS_CLUSTER_NAME=reaper-cluster
# single quotes keep the brackets and inner double quotes intact in the shell
REAPER_CASS_CONTACT_POINTS='["192.168.2.185"]'

docker run \
    -p 8080:8080 \
    -p 8081:8081 \
    -e "REAPER_JMX_AUTH_USERNAME=${REAPER_JMX_AUTH_USERNAME}" \
    -e "REAPER_JMX_AUTH_PASSWORD=${REAPER_JMX_AUTH_PASSWORD}" \
    -e "REAPER_STORAGE_TYPE=cassandra" \
    -e "REAPER_CASS_CLUSTER_NAME=${REAPER_CASS_CLUSTER_NAME}" \
    -e "REAPER_CASS_CONTACT_POINTS=${REAPER_CASS_CONTACT_POINTS}" \
    -e "REAPER_CASS_KEYSPACE=reaper_db" \
    thelastpickle/cassandra-reaper:${TAG}

Then visit the Reaper UI: http://localhost:8080/webui/.

Environment Variables

Reaper Specific

The Reaper Docker container has been designed to be highly configurable. Many of the environment variables map directly or indirectly to settings in the cassandra-reaper.yaml configuration file.

REAPER_AUTO_SCHEDULING_ENABLED

Type: Boolean

Default: false

Optional setting to automatically set up repair schedules for all non-system keyspaces in a cluster. If enabled, adding a new cluster will automatically set up a scheduled repair for each keyspace. Cluster keyspaces are monitored at a configurable frequency, so adding or removing a keyspace results in adding or removing the corresponding scheduled repairs.

REAPER_AUTO_SCHEDULING_EXCLUDED_KEYSPACES

Type: List (comma separated values)

The keyspaces to exclude from the repair schedule.

REAPER_AUTO_SCHEDULING_INITIAL_DELAY_PERIOD

Type: String

Default: PT15S (15 seconds)

The amount of delay time before the schedule period starts.

REAPER_AUTO_SCHEDULING_PERIOD_BETWEEN_POLLS

Type: String

Default: PT10M (10 minutes)

The interval time to wait before checking whether to start a repair task.

REAPER_AUTO_SCHEDULING_SCHEDULE_SPREAD_PERIOD

Type: String

Default: PT6H (6 hours)

The time spacing between each of the repair schedules that is to be carried out.

REAPER_AUTO_SCHEDULING_TIME_BEFORE_FIRST_SCHEDULE

Type: String

Default: PT5M (5 minutes)

Grace period before the first repair in the schedule is started.
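
The auto scheduling settings above can be combined in a single launch command. The following is a minimal sketch, not a definitive invocation: it assumes the duration values are passed as the same ISO-8601 style strings shown in the defaults (PT15S, PT10M, PT6H, PT5M) and that the excluded keyspaces are passed as a plain comma separated string. The function name reaper_auto_schedule_cmd and the keyspace names are hypothetical. The command is only printed; remove the leading echo to run it.

```shell
# Print (dry run) a docker command that enables auto scheduling.
# $1 = image tag, $2 = comma separated keyspaces to exclude
reaper_auto_schedule_cmd() {
    tag="$1"
    excluded="$2"
    # Remove "echo" to actually start the container.
    echo docker run \
        -p 8080:8080 \
        -p 8081:8081 \
        -e "REAPER_AUTO_SCHEDULING_ENABLED=true" \
        -e "REAPER_AUTO_SCHEDULING_EXCLUDED_KEYSPACES=${excluded}" \
        -e "REAPER_AUTO_SCHEDULING_INITIAL_DELAY_PERIOD=PT15S" \
        -e "REAPER_AUTO_SCHEDULING_PERIOD_BETWEEN_POLLS=PT10M" \
        -e "REAPER_AUTO_SCHEDULING_TIME_BEFORE_FIRST_SCHEDULE=PT5M" \
        -e "REAPER_AUTO_SCHEDULING_SCHEDULE_SPREAD_PERIOD=PT6H" \
        "thelastpickle/cassandra-reaper:${tag}"
}

# keyspace1 and keyspace2 are placeholder names.
reaper_auto_schedule_cmd latest "keyspace1,keyspace2"
```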

REAPER_DATACENTER_AVAILABILITY

Type: String

Default: ALL

Indicates to Reaper its deployment in relation to cluster datacenter network locality. The value must be either ALL, LOCAL, or EACH. Note that this setting controls the behavior for metrics collection.

For security reasons, Reaper may only have JMX access to the nodes in a single datacenter (multi region clusters, for example). In this case, it is possible to deploy and operate an instance of Reaper in each datacenter, where each instance only has access via JMX (with or without authentication) to the nodes in its local datacenter. Where multiple instances of Reaper operate in this configuration, only the Apache Cassandra storage option can be used with Reaper. All other storage options are unsuitable, because Reaper instances rely on lightweight transactions to get leadership on segments before processing them. In addition, Reaper will check the number of pending compactions and actively running repairs on all replicas prior to processing a segment.

ALL - requires Reaper to have access via JMX to all nodes across all datacenters. In this mode Reaper can be backed by all available storage types.

LOCAL - requires Reaper to have access via JMX to all nodes only in the same datacenter local to Reaper. A single Reaper instance can operate in this mode and repair its local datacenter. In this case, Reaper can be backed by all available storage types, and repairs to any remote datacenters are handled internally by Cassandra. Alternatively, a Reaper instance can be deployed to each datacenter and configured to operate in this mode. In this case, Reaper can only use Apache Cassandra as its storage. In addition, metrics can be collected asynchronously through the Apache Cassandra storage.

EACH - requires a minimum of one Reaper instance operating in each datacenter. Each Reaper instance is required to have access via JMX to all nodes only in its local datacenter. When operating in this mode, Reaper can only use Apache Cassandra as its storage. In addition, metrics from nodes in remote datacenters must be collected through the Cassandra storage backend. If all metrics are unavailable, the segment will be postponed for later processing.
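
The storage restrictions described for each mode can be checked before a container is launched. The sketch below is only an illustration of the rules as stated above; the function name check_datacenter_availability is hypothetical, and the check for LOCAL is deliberately permissive because the function cannot know whether one or several Reaper instances are deployed.

```shell
# Return 0 when the availability mode ($1) and storage type ($2) are
# compatible according to the descriptions above, non-zero otherwise.
check_datacenter_availability() {
    mode="$1"
    storage="$2"
    case "$mode" in
        # ALL works with any storage type. LOCAL does too for a single
        # instance; with one instance per datacenter, use cassandra.
        ALL|LOCAL) return 0 ;;
        # EACH can only be backed by Apache Cassandra.
        EACH) [ "$storage" = "cassandra" ] ;;
        # Anything else is not a valid value.
        *) return 2 ;;
    esac
}

REAPER_DATACENTER_AVAILABILITY=EACH
REAPER_STORAGE_TYPE=cassandra

if check_datacenter_availability "$REAPER_DATACENTER_AVAILABILITY" "$REAPER_STORAGE_TYPE"; then
    # Dry run: remove "echo" to actually start the container. The
    # remaining Cassandra backend settings are omitted for brevity.
    echo docker run \
        -e "REAPER_DATACENTER_AVAILABILITY=${REAPER_DATACENTER_AVAILABILITY}" \
        -e "REAPER_STORAGE_TYPE=${REAPER_STORAGE_TYPE}" \
        thelastpickle/cassandra-reaper:latest
else
    echo "unsupported mode and storage combination" >&2
fi
```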

REAPER_ENABLE_CROSS_ORIGIN

Type: Boolean

Default: true

Optional setting which can be used to enable the CORS headers for running an external GUI application, like this project. When enabled, it allows incoming REST requests from origins other than the domain that hosts Reaper.

REAPER_ENABLE_DYNAMIC_SEED_LIST

Type: Boolean

Default: true

Allow Reaper to add all nodes in the cluster as contact points when adding a new cluster, instead of just adding the provided node.

REAPER_HANGING_REPAIR_TIMEOUT_MINS

Type: Integer

The amount of time in minutes to wait for a single repair to finish. If this timeout is reached, the repair segment in question will be cancelled, if possible, and then scheduled for later repair again within the same repair run process.

REAPER_INCREMENTAL_REPAIR

Type: Boolean

Default: false

Sets the default repair type unless specifically defined for each run. Note that this is only supported with the PARALLEL repairParallelism setting. For more details on incremental repair, refer to the following article: http://www.datastax.com/dev/blog/more-efficient-repairs

Note: It is recommended to avoid using incremental repair before Cassandra 4.0, as subtle bugs can lead to overstreaming and cluster instability.

REAPER_JMX_AUTH_USERNAME

Type: String

The user name for the optional setting to allow Reaper to establish JMX connections to Cassandra clusters using password based JMX authentication.

REAPER_JMX_AUTH_PASSWORD

Type: String

The password for the optional setting to allow Reaper to establish JMX connections to Cassandra clusters using password based JMX authentication.

REAPER_JMX_CONNECTION_TIMEOUT_IN_SECONDS

Type: Integer

Default: 20

Controls the timeout for establishing JMX connections. The value should be low enough to avoid stalling simple operations in multi region clusters, but high enough to allow connections under normal conditions.

REAPER_JMX_PORTS

Type: Object

Optional mapping of custom JMX ports to use for individual hosts. The default JMX port is 7199. CCM users will find the IP and port number to add in ~/.ccm/<cluster>/*/node.conf or by running ccm <node> show.

jmxPorts:
  127.0.0.1: 7100
  127.0.0.2: 7200
  127.0.0.3: 7300

REAPER_LOGGING_ROOT_LEVEL

Type: String

The log level to filter to, where the level order is ALL < DEBUG < INFO < WARN < ERROR < FATAL < OFF. See the log4j documentation for further information.

REAPER_LOGGING_LOGGERS

Type: Object

Key value pair containing the logger class name as the key and other sub-settings as its value.

REAPER_LOGGING_APPENDERS_LOG_FORMAT

Type: String

The output format of an entry in the log.

REAPER_REPAIR_INTENSITY

Type: Float (value between 0.0 and 1.0, but must never be 0.0.)

Repair intensity defines the amount of time to sleep between triggering each repair segment while running a repair run. When intensity is 1.0, Reaper does not sleep at all before triggering the next segment. Otherwise, the sleep time is derived from how long the last segment took to repair and the intensity value: an intensity of 0.5 means half of the time is spent sleeping and half running, and an intensity of 0.75 means 25% of the total time is spent sleeping and 75% running. This value can also be overridden per repair run when invoking repairs.
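
To make the percentages above concrete, the sleep time can be worked out for a given segment duration. The sketch below assumes the relationship sleep = segment_time / intensity - segment_time, which reproduces the 0.5 and 0.75 examples given above; it is an illustration of the arithmetic, not a copy of Reaper's implementation, and the function name sleep_after_segment is hypothetical.

```shell
# Print the seconds to sleep after a segment that took $1 seconds,
# given intensity $2 (greater than 0.0, at most 1.0).
# Assumed formula: sleep = segment_time / intensity - segment_time
sleep_after_segment() {
    awk -v t="$1" -v i="$2" 'BEGIN {
        if (i <= 0 || i > 1) {
            print "intensity must be greater than 0.0 and at most 1.0" > "/dev/stderr"
            exit 1
        }
        printf "%.1f\n", t / i - t
    }'
}

sleep_after_segment 60 1.0    # prints 0.0: no sleep at all
sleep_after_segment 60 0.5    # prints 60.0: half sleeping, half running
sleep_after_segment 60 0.75   # prints 20.0: 20s of 80s total is 25% sleeping
```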

REAPER_REPAIR_MANAGER_SCHEDULING_INTERVAL_SECONDS

Type: Integer

Default: 30

Controls the pace at which the Repair Manager will schedule processing of the next segment. Reducing this value from its default value of 30s to a lower value can speed up fast repairs by orders of magnitude.

REAPER_REPAIR_PARALELLISM

Type: String

Type of parallelism to apply by default to repair runs. The value must be either SEQUENTIAL, PARALLEL, or DATACENTER_AWARE.

SEQUENTIAL - one replica at a time, validation compaction performed on snapshots

PARALLEL - all replicas at the same time, no snapshot

DATACENTER_AWARE - all replicas in only one DC at a time, no snapshots. If this value is used in clusters older than 2.0.12, Reaper will fall back into using SEQUENTIAL for those clusters.

REAPER_REPAIR_RUN_THREADS

Type: Integer

The number of threads to use for handling the Reaper tasks. Set this high enough to avoid blocking in case some thread is waiting for I/O, such as when calling a Cassandra cluster through JMX.

REAPER_SCHEDULE_DAYS_BETWEEN

Type: Integer

Default: 7

Defines the number of days to wait between scheduling new repairs. The value configured here is the default for new repair schedules, but you can also define it separately for each new schedule. Using value 0 for continuous repairs is also supported.

REAPER_SEGMENT_COUNT

Type: Integer

Defines the default number of repair segments to create for newly registered Cassandra repair runs (token rings). During a repair run, Reaper repairs each segment separately until all the segments in a token ring are repaired. The count might be slightly off the defined value, as clusters residing in multiple datacenters require additional small token ranges in addition to the expected ones. This value can be overridden when executing a repair run via Reaper.

REAPER_SERVER_APP_BIND_HOST

Host address used to access the application UI. To bind the service to all interfaces, use the value 0.0.0.0 or leave the value for this setting blank. A value of * is invalid for this setting.

REAPER_SERVER_APP_PORT

Port number used to access the application UI. Note that this port number must be different to the port number used for REAPER_SERVER_ADMIN_PORT.

REAPER_SERVER_ADMIN_BIND_HOST

Host address used to access the administration UI. To bind the service to all interfaces, use the value 0.0.0.0 or leave the value for this setting blank. A value of * is invalid for this setting.

REAPER_SERVER_ADMIN_PORT

Port number to access the administration UI. Note that this port number must be different to the port number used for REAPER_SERVER_APP_PORT.
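
Because the application and administration ports must differ, it can help to check them before building the launch command. The following is a minimal sketch under a few assumptions: the function name reaper_port_args is hypothetical, ports 9090 and 9091 are arbitrary example values, and each container port is published on the same host port number.

```shell
# Print the docker arguments for a custom app port ($1) and admin port ($2).
# Fails when both ports are the same, since they must be different.
reaper_port_args() {
    app_port="$1"
    admin_port="$2"
    if [ "$app_port" = "$admin_port" ]; then
        echo "application and admin ports must differ" >&2
        return 1
    fi
    args="-p ${app_port}:${app_port} -p ${admin_port}:${admin_port}"
    args="${args} -e REAPER_SERVER_APP_PORT=${app_port}"
    args="${args} -e REAPER_SERVER_ADMIN_PORT=${admin_port}"
    printf '%s\n' "$args"
}

if port_args=$(reaper_port_args 9090 9091); then
    # Dry run: remove "echo" to actually start the container.
    # $port_args is left unquoted on purpose so it splits into words.
    echo docker run $port_args thelastpickle/cassandra-reaper:latest
fi
```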

REAPER_STORAGE_TYPE

Type: String

Whether to use database or memory based storage for storing the system state. The value must be either cassandra, database or memory. If the recommended (persistent) storage type database or cassandra is being used, the database client parameters must be specified in the respective database or cassandra section in the configuration file. See the example settings in the provided testing configuration in src/test/resources/cassandra-reaper.yaml.

REAPER_USE_ADDRESS_TRANSLATOR

Type: Boolean

Default: false

When running multi region clusters in AWS, set this to true in order to use the EC2MultiRegionAddressTranslator from the DataStax Java Driver. This allows translating the public address that the nodes broadcast to the private IP address that is used to expose JMX.

Cassandra Backend Specific

REAPER_CASS_ACTIVATE_QUERY_LOGGER

Type: Boolean

Default: false

Records the CQL calls made to the Cassandra backend in the log output.

REAPER_CASS_CLUSTER_NAME

Type: String

Name of the cluster to use to store the Reaper control data.

REAPER_CASS_CONTACT_POINTS

Type: Array (comma separated Strings)

Seed nodes in the Cassandra cluster to contact. e.g. ["127.0.0.1", "127.0.0.2", "127.0.0.3"]
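
When the value is assigned in a shell, the brackets and inner double quotes need protecting, otherwise the shell strips the quotes before Docker sees them. A small sketch, assuming the container expects the value exactly in the form shown above:

```shell
# Single quotes preserve the brackets and the inner double quotes.
REAPER_CASS_CONTACT_POINTS='["127.0.0.1", "127.0.0.2", "127.0.0.3"]'

# Show the value exactly as it would be passed with:
#   -e "REAPER_CASS_CONTACT_POINTS=${REAPER_CASS_CONTACT_POINTS}"
printf '%s\n' "$REAPER_CASS_CONTACT_POINTS"
```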

REAPER_CASS_KEYSPACE

Type: String

Name of the keyspace to store the Reaper control data.

REAPER_CASS_LOCAL_DC

Type: String

Specifies the name of the datacenter closest to Reaper when using the dcAwareRoundRobin policy.

REAPER_CASS_AUTH_USERNAME

Type: String

Cassandra native protocol username.

REAPER_CASS_AUTH_PASSWORD

Type: String

Cassandra native protocol password.

REAPER_CASS_AUTH_ENABLED

Type: Boolean

Default: false

Allows Reaper to send authentication credentials when establishing a connection with Cassandra via the native protocol. When enabled, authentication credentials must be specified by setting values for REAPER_CASS_AUTH_USERNAME and REAPER_CASS_AUTH_PASSWORD.
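
Since enabling authentication requires both credentials, a launch script can verify them up front. The sketch below is illustrative only: the function name check_cass_auth is hypothetical and the credentials are placeholders.

```shell
# Fail early when authentication is enabled but a credential is missing.
# $1 = REAPER_CASS_AUTH_ENABLED, $2 = username, $3 = password
check_cass_auth() {
    enabled="$1"
    username="$2"
    password="$3"
    if [ "$enabled" = "true" ] && { [ -z "$username" ] || [ -z "$password" ]; }; then
        echo "REAPER_CASS_AUTH_USERNAME and REAPER_CASS_AUTH_PASSWORD are required" >&2
        return 1
    fi
    return 0
}

REAPER_CASS_AUTH_ENABLED=true
REAPER_CASS_AUTH_USERNAME=cassandraUser   # placeholder
REAPER_CASS_AUTH_PASSWORD=cassandraPass   # placeholder

if check_cass_auth "$REAPER_CASS_AUTH_ENABLED" "$REAPER_CASS_AUTH_USERNAME" "$REAPER_CASS_AUTH_PASSWORD"; then
    # Dry run: remove "echo" to actually start the container. The
    # remaining Cassandra backend settings are omitted for brevity.
    echo docker run \
        -e "REAPER_STORAGE_TYPE=cassandra" \
        -e "REAPER_CASS_AUTH_ENABLED=${REAPER_CASS_AUTH_ENABLED}" \
        -e "REAPER_CASS_AUTH_USERNAME=${REAPER_CASS_AUTH_USERNAME}" \
        -e "REAPER_CASS_AUTH_PASSWORD=${REAPER_CASS_AUTH_PASSWORD}" \
        thelastpickle/cassandra-reaper:latest
fi
```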

REAPER_CASS_NATIVE_PROTOCOL_SSL_ENCRYPTION_ENABLED

Type: Boolean

Default: false

Allows Reaper to use an encrypted connection when connecting to Cassandra via the native protocol.

H2 and Postgres Backend Specific

REAPER_DB_DRIVER_CLASS

Type: String

Specifies the driver to use to connect to the database.

REAPER_DB_URL

Type: String

Specifies the URL to connect to the database on.

REAPER_DB_USERNAME

Type: String

Database username.

REAPER_DB_PASSWORD

Type: String

Database password.
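
The database settings can also be kept in an env file and passed with docker run --env-file. The sketch below assumes a PostgreSQL backend: org.postgresql.Driver is the standard PostgreSQL JDBC driver class, while the host db.example.com, the database name reaper, and the credentials are placeholders. Whether the image ships that driver is not stated here, so treat this as a starting point rather than a verified configuration.

```shell
# Write the settings to a temporary env file; every value is a placeholder.
ENV_FILE="$(mktemp)"
cat > "$ENV_FILE" <<'EOF'
REAPER_STORAGE_TYPE=database
REAPER_DB_DRIVER_CLASS=org.postgresql.Driver
REAPER_DB_URL=jdbc:postgresql://db.example.com/reaper
REAPER_DB_USERNAME=reaper
REAPER_DB_PASSWORD=changeme
EOF

# Dry run: remove "echo" to actually start the container.
echo docker run \
    -p 8080:8080 \
    -p 8081:8081 \
    --env-file "$ENV_FILE" \
    thelastpickle/cassandra-reaper:latest
```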
