abhioncbr/docker-airflow

By abhioncbr

Docker Image of Apache-Airflow

docker pull abhioncbr/docker-airflow

docker run -p 2222:2222 --name=airflow-standalone abhioncbr/docker-airflow -m=standalone
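
The standalone container serves the Airflow web UI on the mapped port 2222. A quick check, assuming the container is running on the local machine:

# verify the Airflow webserver responds on the mapped port
curl -I http://localhost:2222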

Airflow components stack

  • Airflow version: represented by the notation XX.YY.ZZ
  • Execution Mode: standalone (a single container for exploration, with SQLite as the Airflow metadata DB and the SequentialExecutor), prod (single-node, with the LocalExecutor and MySQL as the Airflow metadata DB), or cluster (for distributed, long-running production use cases; each container runs as either a server or a worker). A sketch of a prod-mode run command follows this list.
  • Backend database: standalone - SQLite; prod & cluster - MySQL
  • Executor: standalone - SequentialExecutor; prod - LocalExecutor; cluster - CeleryExecutor
  • Task queue: cluster - Redis
  • Log location: local file system (default) or AWS S3 (through entrypoint-s3.sh)
  • User authentication: password-based, with support for multiple users and a superuser privilege
  • Docker base image: Debian
  • Code enhancement: password-based multi-user support with a superuser role (a superuser can see all DAGs of all owners); upstream Airflow is still working on its own password-based multi-user feature
  • Other features: support for Google Cloud Platform packages in the container
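
For prod mode, a run command might look like the sketch below. The -m=prod value and the -d flag for passing the MySQL connection string are assumptions (only -m=standalone is shown above); check the image's entrypoint script for the exact options.

# prod-mode sketch: single node, LocalExecutor, MySQL as the metadata DB
# (-m=prod and -d are assumptions, not documented options of this image)
docker run -p 2222:2222 --name=airflow-prod abhioncbr/docker-airflow -m=prod -d=mysql://user:password@mysql-host:3306/airflow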

Airflow ports

  • Airflow web UI (portal) port: 2222
  • Airflow Celery Flower port: 5555
  • Redis port: 6379
  • Log files exchange port: 8793

Airflow services information

  • In the server container: Redis, the Airflow webserver, and the Airflow scheduler run.
  • In the worker container: the Airflow worker and the Celery Flower UI run (see the sketch below).
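
A minimal sketch of starting both cluster containers, assuming the entrypoint accepts -m=cluster together with a role flag and a metadata-DB connection string; the -t and -d flags below are assumptions, so verify them against the image's entrypoint script.

# cluster-mode sketch: one server container (webserver, scheduler, Redis)
docker run -p 2222:2222 -p 6379:6379 --name=airflow-server abhioncbr/docker-airflow -m=cluster -t=server -d=mysql://user:password@mysql-host:3306/airflow

# ...and one or more worker containers (Airflow worker, Celery Flower, log serving)
docker run -p 5555:5555 -p 8793:8793 --name=airflow-worker abhioncbr/docker-airflow -m=cluster -t=worker -d=mysql://user:password@mysql-host:3306/airflow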

General information about the Airflow Docker image

  • There are two Dockerfiles in the docker-files folder.
  • Base image (DockerFile-BaseXX.YY.ZZ) - builds the base image, which consists of the Airflow, Java, Redis, and other basic packages.
  • Working image (DockerFile-XX.YY.ZZ) - depends on the base image; builds the final image by applying Airflow patches, creating the user, installing GCP packages, and setting up the working environment.
  • The Airflow scheduler needs a restart after some time to keep scheduling tasks properly; a shell script for restarting the scheduler is in the config folder.
  • By default, the Airflow container writes logs to the local filesystem, but it can be configured to write to AWS S3; the AWS credentials need to be updated in the credentials file in the config folder (see the sketch below).
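
The credentials file under config presumably follows the standard AWS credentials file format; the exact filename and profile name below are assumptions, so check the repo's config folder before using this.

# sketch: fill in the credentials file before building/running the image
cat > config/credentials <<'EOF'
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
EOF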
