cassandra
Apache Cassandra is an open-source distributed storage system.
docker pull cassandra
Maintained by: the Docker Community
Where to get help: the Docker Community Slack, Server Fault, Unix & Linux, or Stack Overflow
Where to file issues: https://github.com/docker-library/cassandra/issues
Supported architectures: amd64, arm32v7, arm64v8, ppc64le, s390x
Published image artifact details: repo-info repo's repos/cassandra/ directory (image metadata, transfer size, etc.)
Image updates: official-images repo's library/cassandra label, official-images repo's library/cassandra file
Source of this description: docs repo's cassandra/ directory
Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.
cassandra server instance
Starting a Cassandra instance is simple:
$ docker run --name some-cassandra --network some-network -d cassandra:tag
... where some-cassandra is the name you want to assign to your container and tag is the tag specifying the Cassandra version you want. See the list above for relevant tags.
Using the environment variables documented below, there are two cluster scenarios: instances on the same machine and instances on separate machines. For the same machine, start the instance as described above. To start other instances, just tell each new node where the first is.
$ docker run --name some-cassandra2 -d --network some-network -e CASSANDRA_SEEDS=some-cassandra cassandra:tag
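The same-machine scenario can be sketched as a small helper that starts each node on the shared network and passes a seed list only to the later nodes. The function name start_node and the DOCKER override are assumptions made for illustration and are not part of the image; cassandra:tag remains a placeholder for a real tag.

```shell
#!/bin/sh
# Hypothetical helper (not part of the image): start a Cassandra node on an
# existing user-defined network. Set DOCKER=echo to preview the command
# instead of running it.
DOCKER="${DOCKER:-docker}"

# usage: start_node NAME [SEEDS]
start_node() {
  name="$1"
  seeds="${2:-}"
  if [ -n "$seeds" ]; then
    # Later nodes learn about the cluster through CASSANDRA_SEEDS.
    "$DOCKER" run --name "$name" -d --network some-network \
      -e CASSANDRA_SEEDS="$seeds" cassandra:tag
  else
    # The first node needs no seed list.
    "$DOCKER" run --name "$name" -d --network some-network cassandra:tag
  fi
}
```

With the network created beforehand (docker network create some-network), calling start_node some-cassandra followed by start_node some-cassandra2 some-cassandra issues the same two commands shown above.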
For separate machines (i.e., two VMs on a cloud provider), you need to tell Cassandra what IP address to advertise to the other nodes (since the address of the container is behind the docker bridge).
Assuming the first machine's IP address is 10.42.42.42 and the second's is 10.43.43.43, start the first with exposed gossip port:
$ docker run --name some-cassandra -d -e CASSANDRA_BROADCAST_ADDRESS=10.42.42.42 -p 7000:7000 cassandra:tag
Then start a Cassandra container on the second machine, with the exposed gossip port and seed pointing to the first machine:
$ docker run --name some-cassandra -d -e CASSANDRA_BROADCAST_ADDRESS=10.43.43.43 -p 7000:7000 -e CASSANDRA_SEEDS=10.42.42.42 cassandra:tag
cqlsh
The following command starts another Cassandra container instance and runs cqlsh (Cassandra Query Language Shell) against your original Cassandra container, allowing you to execute CQL statements against your database instance:
$ docker run -it --network some-network --rm cassandra cqlsh some-cassandra
More information about CQL can be found in the Cassandra documentation.
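For scripted use, cqlsh can also run a single statement and exit. The following is a minimal sketch under that assumption: run_cql and the DOCKER override are hypothetical names introduced here, and the query in the usage note is only an example.

```shell
#!/bin/sh
# Hypothetical helper (not part of the image): run one CQL statement against
# a node on some-network using a throwaway container.
# Set DOCKER=echo to preview the command instead of running it.
DOCKER="${DOCKER:-docker}"

# usage: run_cql HOST STATEMENT
run_cql() {
  # -e makes cqlsh execute the given statement and exit instead of opening
  # an interactive prompt, so -it is not needed here.
  "$DOCKER" run --rm --network some-network cassandra cqlsh "$1" -e "$2"
}
```

For example, run_cql some-cassandra "SELECT release_version FROM system.local" prints the version reported by the node.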
The docker exec command allows you to run commands inside a Docker container. The following command line will give you a bash shell inside your cassandra container:
$ docker exec -it some-cassandra bash
The Cassandra Server log is available through Docker's container log:
$ docker logs some-cassandra
The best way to provide configuration to the cassandra image is to provide a custom /etc/cassandra/cassandra.yaml file. There are many ways to provide this file to the container (via a short Dockerfile with FROM + COPY, via Docker Configs, via runtime bind-mount, etc.), the details of which are left as an exercise for the reader.
To use a different file name (for example, to avoid all image-provided configuration behavior), use -Dcassandra.config=/path/to/cassandra.yaml as an argument to the image (as in, docker run ... cassandra -Dcassandra.config=/path/to/cassandra.yaml).
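One of the options mentioned above, the runtime bind-mount, might look like the following sketch. The helper names export_default_config and run_with_config and the DOCKER override are assumptions for illustration; the sketch also assumes the image passes a non-Cassandra command such as cat straight through, in the same way it runs cqlsh.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the image) for the bind-mount approach.
# Set DOCKER=echo to preview the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# usage: export_default_config HOST_FILE
# Copy the image's default cassandra.yaml to the host as a starting point.
export_default_config() {
  "$DOCKER" run --rm cassandra:tag cat /etc/cassandra/cassandra.yaml > "$1"
}

# usage: run_with_config ABSOLUTE_HOST_FILE
# Start a node with the edited host file mounted over the default one.
# The mount is left writable because the image's environment-variable
# handling modifies this file at startup.
run_with_config() {
  "$DOCKER" run --name some-cassandra -d --network some-network \
    -v "$1:/etc/cassandra/cassandra.yaml" cassandra:tag
}
```

A typical sequence would be export_default_config /my/own/cassandra.yaml, an edit of that file on the host, and then run_with_config /my/own/cassandra.yaml; the path is only an example.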
There are a small number of environment variables supported by the image which will modify /etc/cassandra/cassandra.yaml in some way (but the script is modifying YAML, so is naturally fragile):
CASSANDRA_LISTEN_ADDRESS: This variable controls which IP address to listen on for incoming connections. The default value is auto, which will set the listen_address option in cassandra.yaml to the IP address of the container as it starts. This default should work in most use cases.
CASSANDRA_BROADCAST_ADDRESS: This variable controls which IP address to advertise to other nodes. The default value is the value of CASSANDRA_LISTEN_ADDRESS. It will set the broadcast_address and broadcast_rpc_address options in cassandra.yaml.
CASSANDRA_RPC_ADDRESS: This variable controls which address to bind the thrift rpc server to. If you do not specify an address, the wildcard address (0.0.0.0) will be used. It will set the rpc_address option in cassandra.yaml.
CASSANDRA_START_RPC: This variable controls whether the thrift rpc server is started. It will set the start_rpc option in cassandra.yaml.
CASSANDRA_SEEDS: This variable is the comma-separated list of IP addresses used by gossip for bootstrapping new nodes joining a cluster. It will set the seeds value of the seed_provider option in cassandra.yaml. The CASSANDRA_BROADCAST_ADDRESS will be added to the seeds passed in so that the server will talk to itself as well.
CASSANDRA_CLUSTER_NAME: This variable sets the name of the cluster and must be the same for all nodes in the cluster. It will set the cluster_name option of cassandra.yaml.
CASSANDRA_NUM_TOKENS: This variable sets the number of tokens for this node. It will set the num_tokens option of cassandra.yaml.
CASSANDRA_DC: This variable sets the datacenter name of this node. It will set the dc option of cassandra-rackdc.properties. You must set CASSANDRA_ENDPOINT_SNITCH to use the "GossipingPropertyFileSnitch" in order for Cassandra to apply cassandra-rackdc.properties, otherwise this variable will have no effect.
CASSANDRA_RACK: This variable sets the rack name of this node. It will set the rack option of cassandra-rackdc.properties. You must set CASSANDRA_ENDPOINT_SNITCH to use the "GossipingPropertyFileSnitch" in order for Cassandra to apply cassandra-rackdc.properties, otherwise this variable will have no effect.
CASSANDRA_ENDPOINT_SNITCH: This variable sets the snitch implementation this node will use. It will set the endpoint_snitch option of cassandra.yaml.
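Because CASSANDRA_DC and CASSANDRA_RACK depend on the snitch setting, it can help to set all three together. The sketch below does that; start_dc_node, the DOCKER override, and the cluster name example-cluster are assumptions introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the image): start a node with an explicit
# datacenter and rack. Set DOCKER=echo to preview the command instead of
# running it.
DOCKER="${DOCKER:-docker}"

# usage: start_dc_node NAME DC RACK
start_dc_node() {
  # CASSANDRA_DC and CASSANDRA_RACK only take effect together with
  # GossipingPropertyFileSnitch, so the snitch is always set here.
  # example-cluster is a placeholder; every node must use the same name.
  "$DOCKER" run --name "$1" -d --network some-network \
    -e CASSANDRA_CLUSTER_NAME=example-cluster \
    -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch \
    -e CASSANDRA_DC="$2" \
    -e CASSANDRA_RACK="$3" \
    cassandra:tag
}
```

After start_dc_node some-cassandra dc1 rack1, running docker exec some-cassandra nodetool status is one way to check that the node is listed under the expected datacenter and rack.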
Important note: There are several ways to store data used by applications that run in Docker containers. We encourage users of the cassandra images to familiarize themselves with the options available.
The Docker documentation is a good starting point for understanding the different storage options and variations, and there are multiple blogs and forum postings that discuss and give advice in this area. We will simply show the basic procedure here for mounting a data directory from the host system into the container:
Create a data directory on a suitable volume on your host system, e.g. /my/own/datadir.
Start your cassandra container like this:
$ docker run --name some-cassandra -v /my/own/datadir:/var/lib/cassandra -d cassandra:tag
The -v /my/own/datadir:/var/lib/cassandra part of the command mounts the /my/own/datadir directory from the underlying host system as /var/lib/cassandra inside the container, where Cassandra by default will write its data files.
If there is no database initialized when the container starts, then a default database will be created. While this is the expected behavior, this means that it will not accept incoming connections until such initialization completes. This may cause issues when using automation tools, such as Docker Compose, which start several containers simultaneously.
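One way to cope with this in automation is to poll the node until it accepts a query before starting dependent steps. The helper below is a generic retry sketch, not something the image provides; the name wait_until and the attempt count and delay in the usage note are arbitrary choices.

```shell
#!/bin/sh
# Generic retry helper: run a command until it succeeds or the attempts
# run out. Output of the probed command is discarded.
# usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0   # the command succeeded, so the service is ready
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1       # still failing after MAX_ATTEMPTS tries
}
```

For example, wait_until 30 5 docker exec some-cassandra cqlsh -e 'DESCRIBE KEYSPACES' retries every five seconds, up to thirty times, until cqlsh inside the container can reach the server.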
View license information for the software contained in this image.
As with all Docker images, these likely also contain other software which may be under other licenses (such as Bash, etc. from the base distribution, along with any direct or indirect dependencies of the primary software being contained).
Some additional license information which was able to be auto-detected might be found in the repo-info repository's cassandra/ directory.
As for any pre-built image usage, it is the image user's responsibility to ensure that any use of this image complies with any relevant licenses for all software contained within.