# RedShift Mega Maid
This project is a simple Docker container built from awslabs/amazon-redshift-utils. Its purpose is to `ANALYZE` a RedShift cluster as a background job.
This project really depends on awslabs/amazon-redshift-utils. If you want to fix bugs & make changes to the script, fork that repo & send Pull Requests upstream!
This container uses a single `Makefile` for ease of building & deployment. It also uses a basic "Build Tools" container that includes the standard gcc + make toolchain. This should make it easy to run in any CI environment that supports Docker.
This spins up a clean docker container, mounting in the project directory, and calls `make clean build`, which runs `docker build` to install the script's dependencies & build the container. The `Dockerfile` is used to build the releasable docker container.
This will execute `aws ecr get-login`, then push the container to the EC2 Container Registry (ECR). You are responsible for setting up ECR in your own AWS account.
We're tagging our images based on date and git hash, and also with `latest`. The container should now be in ECR (you can get the list of images in ECR with `aws ecr list-images --repository-name redshift-mega-maid`).
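The push flow described above can be sketched as follows. This is an illustration only, not the actual `Makefile` target: the account ID, region, and tag format are assumptions.

```shell
# Sketch of the push steps (account ID, region, and tag format are assumptions).
# Log in to ECR using the docker login command emitted by the AWS CLI:
$(aws ecr get-login --region us-west-2)

# Tag the image with date + git hash, plus `latest`, and push both tags:
TAG="$(date +%Y%m%d)-$(git rev-parse --short HEAD)"
REPO="123456789012.dkr.ecr.us-west-2.amazonaws.com/redshift-mega-maid"
docker tag redshift-mega-maid "$REPO:$TAG"
docker tag redshift-mega-maid "$REPO:latest"
docker push "$REPO:$TAG"
docker push "$REPO:latest"
```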
TODO: Kubernetes Deployment option
The container takes environment variables to pass to the `analyze-vacuum-schema.py` script as arguments. Most are specified in the `Dockerfile` as defaults but can be overridden. As a general rule, the environment variable names follow the convention `$MM_ARGUMENT_NAME`, where the corresponding command line argument would be `--argument-name`: prepend `MM_` to the uppercased argument name, replacing all dashes (`-`) with underscores (`_`). For example, `--db-user` becomes `MM_DB_USER`.
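The naming rule can be expressed as a small shell helper. This is purely illustrative; the container does not ship this function:

```shell
# Hypothetical helper showing the argument-to-variable mapping:
# strip the leading "--", uppercase, swap dashes for underscores, prefix MM_.
arg_to_env() {
  printf 'MM_%s\n' "$(printf '%s' "${1#--}" | tr 'a-z-' 'A-Z_')"
}

arg_to_env --db-user           # prints MM_DB_USER
arg_to_env --min-unsorted-pct  # prints MM_MIN_UNSORTED_PCT
```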
Required environment variables are the ones without defaults in the `Dockerfile`, i.e. the connection settings such as `MM_DB_NAME`, `MM_DB_USER`, `MM_DB_PASS`, and `MM_DB_HOST`.
You may run the docker container with:

```
docker run --rm -e MM_DB_NAME=your_db_name -e MM_DB_USER=your_db_user \
  -e MM_DB_PASS=your_db_pass -e MM_DB_HOST=aaa.us-west-2.redshift.amazonaws.com \
  -e MM_DB_SCHEMA=your_db_schema -e MM_DB_TABLE=your_db_table \
  returnpath/redshift-mega-maid:latest
```
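Since the intent is to run the analyze as a background job, one way to schedule it is a crontab entry on a Docker host. This is a sketch; the schedule, database values, and credential handling are up to you:

```shell
# Hypothetical crontab entry: run the analyze pass nightly at 03:00,
# removing the container when it exits.
0 3 * * * docker run --rm -e MM_DB_NAME=your_db_name -e MM_DB_USER=your_db_user -e MM_DB_PASS=your_db_pass -e MM_DB_HOST=aaa.us-west-2.redshift.amazonaws.com returnpath/redshift-mega-maid:latest
```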
## Working with the Container
Within the container, logs are simply printed to stdout via `/dev/stdout`. To override this, set `MM_OUTPUT_FILE` to something else. If you wish to persist logs, you will want to pass in a volume mount for the container to write the file to.
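For example, logs could be persisted to the host like this. The mount path and log file name are arbitrary choices for illustration:

```shell
# Hypothetical: write logs to a host-mounted directory instead of stdout.
docker run --rm \
  -v /var/log/mega-maid:/var/log/mega-maid \
  -e MM_OUTPUT_FILE=/var/log/mega-maid/analyze.log \
  -e MM_DB_NAME=your_db_name -e MM_DB_USER=your_db_user \
  -e MM_DB_PASS=your_db_pass -e MM_DB_HOST=aaa.us-west-2.redshift.amazonaws.com \
  returnpath/redshift-mega-maid:latest
```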
`docker run` will run with:

```
/bin/sh -c /opt/mega-maid/bin/analyze-vacuum-schema.sh
```

which runs the `analyze-vacuum-schema` python script with args:

```
/bin/sh -c python /opt/amazon-redshift-utils/src/AnalyzeVacuumUtility/analyze-vacuum-schema.py \
  --db $MM_DB_NAME --db-user $MM_DB_USER --db-pwd $MM_DB_PASS \
  --db-port $MM_DB_PORT --db-host $MM_DB_HOST \
  --schema-name $MM_DB_SCHEMA --table-name $MM_DB_TABLE \
  --output-file $MM_OUTPUT_FILE --debug $MM_DEBUG \
  --ignore-errors $MM_IGNORE_ERRORS --slot-count $MM_SLOT_COUNT \
  --min-unsorted-pct $MM_MIN_UNSORTED_PCT --max-unsorted-pct $MM_MAX_UNSORTED_PCT \
  --deleted-pct $MM_DELETED_PCT --stats-off-pct $MM_STATS_OFF_PCT \
  --max-table-size-mb $MM_MAX_TABLE_SIZE_MB
```
The default environment variables are set in the `Dockerfile`:

```
ENV MM_DB_SCHEMA public
ENV MM_DB_PORT 5439
ENV MM_OUTPUT_FILE /dev/stdout
ENV MM_DEBUG True
ENV MM_IGNORE_ERRORS False
ENV MM_SLOT_COUNT 2
ENV MM_MIN_UNSORTED_PCT 5
ENV MM_MAX_UNSORTED_PCT 50
ENV MM_DELETED_PCT 15
ENV MM_STATS_OFF_PCT 10
ENV MM_MAX_TABLE_SIZE_MB 700*1024
```
Since we're using an `ENTRYPOINT` rather than a `CMD` in the `Dockerfile`, any flags you pass to `docker run` after the image will be appended after the `ENTRYPOINT`. This means that if you want to run other commands inside the container (e.g. spawn a shell to interactively use other tools in awslabs/amazon-redshift-utils), you'll have to override the entrypoint:

```
docker run --entrypoint='/path/to/your-entrypoint-here' <IMAGE>
```
## Getting into the Container
If for some reason you need to run a container and get a shell, you need to override the entrypoint:

```
docker run -it --entrypoint=/bin/bash <IMAGE>
```
This project is mostly packaging and wrapper scripts for the amazon-redshift-utils tools. As such, nothing in this repository is "novel" or "non-obvious". This repo is therefore released under the Apache 2.0 License.
However, the upstream tools are released under other licenses:
The text of these tools' licenses is included here to avoid confusion.