bitnamicharts/airflow

Verified Publisher

By VMware

Updated 4 days ago

Bitnami Helm chart for Airflow

Bitnami package for Apache Airflow

Apache Airflow is a tool to express and execute workflows as directed acyclic graphs (DAGs). It includes utilities to schedule tasks, monitor task progress and handle task dependencies.

Overview of Apache Airflow

Trademarks: This software listing is packaged by Bitnami. The respective trademarks mentioned in the offering are owned by the respective companies, and use of them does not imply any affiliation or endorsement.

TL;DR

helm install my-release oci://registry-1.docker.io/bitnamicharts/airflow

Looking to use Apache Airflow in production? Try VMware Tanzu Application Catalog, the commercial edition of the Bitnami catalog.

Introduction

This chart bootstraps an Apache Airflow deployment on a Kubernetes cluster using the Helm package manager.

Bitnami charts can be used with Kubeapps for deployment and management of Helm Charts in clusters.

Prerequisites

  • Kubernetes 1.23+
  • Helm 3.8.0+
  • PV provisioner support in the underlying infrastructure

Installing the Chart

To install the chart with the release name my-release:

helm install my-release oci://REGISTRY_NAME/REPOSITORY_NAME/airflow

Note: You need to substitute the placeholders REGISTRY_NAME and REPOSITORY_NAME with a reference to your Helm chart registry and repository. For example, in the case of Bitnami, you need to use REGISTRY_NAME=registry-1.docker.io and REPOSITORY_NAME=bitnamicharts.

The command deploys Apache Airflow on the Kubernetes cluster in the default configuration. The Parameters section lists the parameters that can be configured during installation.

Tip: List all releases using helm list

Configuration and installation details

Executors

Airflow supports different Executors and this Helm chart provides support for several of them. You can choose the executor you want to use by setting the executor parameter.

CeleryExecutor

The Celery executor (the default) uses a message queue system (Redis® in this case) to coordinate tasks between pre-configured workers.

KubernetesExecutor

The Kubernetes executor creates a new worker pod for every task instance using the pod_template.yaml that you can find at templates/config/configmap.yaml. This template can be overwritten using worker.podTemplate. To enable KubernetesExecutor you can set the following parameters:

executor=KubernetesExecutor
rbac.create=true
serviceAccount.create=true
redis.enabled=false

NOTE: Redis® does not need to be deployed when using KubernetesExecutor, so you can disable it by setting redis.enabled=false.
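
The parameters in this README are listed as key=value pairs; one common way to apply them is Helm's --set flag. The following is a minimal sketch, not part of the chart, that assembles such a command from the KubernetesExecutor parameters above. The helper name helm_install_command is hypothetical, and the chart reference and release name are taken from the TL;DR command.

```python
import shlex

# Chart reference and release name as used in the TL;DR command of this README.
CHART = "oci://registry-1.docker.io/bitnamicharts/airflow"


def helm_install_command(release, params):
    """Build a `helm install` command that passes each parameter with --set.

    Hypothetical helper for illustration; it only builds the command string.
    """
    command = ["helm", "install", release, CHART]
    for key, value in params.items():
        command += ["--set", f"{key}={value}"]
    # shlex.join quotes any argument containing shell-special characters.
    return shlex.join(command)


kubernetes_executor_params = {
    "executor": "KubernetesExecutor",
    "rbac.create": "true",
    "serviceAccount.create": "true",
    "redis.enabled": "false",
}

print(helm_install_command("my-release", kubernetes_executor_params))
```

The same approach should work for the other parameter listings in this document. Keys that contain brackets, such as dags.repositories[0].name, are quoted by shlex.join so the shell does not interpret them.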

CeleryKubernetesExecutor

The CeleryKubernetesExecutor (introduced in Airflow 2.0) is a combination of both the Celery and the Kubernetes executors. Tasks will be executed using Celery by default, but those tasks that require it can be executed in a Kubernetes pod using the 'kubernetes' queue.

LocalExecutor

The Local executor runs tasks by spawning processes in the Scheduler pods. To enable LocalExecutor, set the following parameters:

executor=LocalExecutor
redis.enabled=false

LocalKubernetesExecutor

The LocalKubernetesExecutor (introduced in Airflow 2.3) is a combination of both the Local and the Kubernetes executors. Tasks will be executed in the scheduler by default, but those tasks that require it can be executed in a Kubernetes pod using the 'kubernetes' queue.

SequentialExecutor

This executor only runs one task instance at a time in the Scheduler pods. For production use cases, use one of the other executors. To enable SequentialExecutor, set the following parameters:

executor=SequentialExecutor
redis.enabled=false

Update credentials

Bitnami charts configure credentials at first boot. Any further change to the secrets or credentials requires manual intervention. Follow these instructions:

  • Update the user password following the upstream documentation
  • Update the password secret with the new values (replace the SECRET_NAME, PASSWORD, FERNET_KEY and SECRET_KEY placeholders):

kubectl create secret generic SECRET_NAME --from-literal=airflow-password=PASSWORD --from-literal=airflow-fernet-key=FERNET_KEY --from-literal=airflow-secret-key=SECRET_KEY --dry-run=client -o yaml | kubectl apply -f -

Airflow configuration file

By default, the Airflow configuration file is auto-generated based on the chart parameters you set. For instance, the executor parameter will be used to set the executor class under the [core] section.

You can also provide your own configuration by setting the configuration parameter. This parameter expects the configuration as a sections/keys/values dictionary in YAML format, which the chart then converts to .cfg format. For instance, using a configuration like the one below...

configuration:
  core:
    dags_folder: "/opt/bitnami/airflow/dags"

... the chart will translate it to the following configuration file:

[core]
dags_folder = "/opt/bitnami/airflow/dags"

As an alternative to providing the whole configuration, you can also extend the default configuration using the overrideConfiguration parameter. The values set in this parameter, which also expects YAML format, are merged with the default configuration (or with the values set in the configuration parameter) and take precedence over them.
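
The merge and conversion described above can be pictured with a short sketch. This is an illustration of the described semantics only, not the chart's actual Helm template logic: the function names and the sample option values are hypothetical, and the exact formatting of the rendered file (for example, quoting of values) may differ from what the chart produces.

```python
import copy


def merge_configuration(base, override):
    """Return a copy of base with override merged in; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for section, options in override.items():
        merged.setdefault(section, {}).update(options)
    return merged


def render_cfg(configuration):
    """Render a sections/keys/values dictionary as .cfg-style text."""
    lines = []
    for section, options in configuration.items():
        lines.append(f"[{section}]")
        lines.extend(f"{key} = {value}" for key, value in options.items())
        lines.append("")
    return "\n".join(lines)


# Sample values for illustration only.
configuration = {
    "core": {"dags_folder": "/opt/bitnami/airflow/dags", "load_examples": "True"},
}
override_configuration = {
    "core": {"load_examples": "False"},
    "logging": {"logging_level": "DEBUG"},
}

print(render_cfg(merge_configuration(configuration, override_configuration)))
```

In this sketch, keys present only in the base configuration are kept, keys present in both take the override value, and new sections from the override are appended.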

Scaling worker pods

Sometimes, with large workloads, a fixed number of worker pods may cause tasks to take a long time to execute. This chart provides two ways to scale worker pods:

  • If you are using KubernetesExecutor, pod autoscaling is handled by the Scheduler without any additional configuration.
  • If you are using SequentialExecutor, you have to enable worker.autoscaling by setting the parameters below. Autoscaling uses a default configuration that you can change using worker.autoscaling.replicas.* and worker.autoscaling.targets.*.

worker.autoscaling.enabled=true
worker.resources.requests.cpu=200m
worker.resources.requests.memory=250Mi

Generate a Fernet key

A Fernet key is required in order to encrypt the passwords stored in connections. The Fernet key must be a base64-encoded 32-byte key.

Learn how to generate one here.
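
As a minimal sketch of the requirement above, the following uses only the Python standard library to produce 32 random bytes encoded with URL-safe base64, which is the alphabet the Fernet format uses. The function name is hypothetical; if the cryptography package is installed, Fernet.generate_key() is the more usual route.

```python
import base64
import os


def generate_fernet_key():
    """Return 32 random bytes, URL-safe base64-encoded, as a string."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


fernet_key = generate_fernet_key()
print(fernet_key)
```

The resulting 44-character string can be used for the FERNET_KEY placeholder.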

Generate a Secret key

A secret key is used to run your Flask app. It should be as random as possible.

Note: when running multiple Webserver instances, make sure all of them use the same secret key. Otherwise you may face the error "CSRF session token is missing".
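
One possible way to obtain a suitably random value is Python's secrets module, as sketched below. The length of 16 random bytes is an assumption made for this example; this README does not prescribe a size.

```python
import secrets

# 16 random bytes rendered as 32 hexadecimal characters (size is an assumption).
secret_key = secrets.token_hex(16)
print(secret_key)
```

Generate the value once and reuse it for every Webserver instance, in line with the note above.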

Load DAG files

There are two different ways to load your custom DAG files into the Airflow chart. Both are compatible, so you can use them at the same time.

Option 1: Specify an existing config map

You can manually create a config map containing all your DAG files and then pass its name when deploying the Airflow chart. To do so, set the parameters below:

dags.enabled=true
dags.existingConfigmap=my-dags-configmap

Option 2: Get your DAG files from a git repository

You can store your DAG files in GitHub repositories and clone them into the Airflow pods with an init container. The repositories are periodically updated using a sidecar container. To do so, deploy Airflow with the following options:

Note: When git synchronization is enabled, an init container and a sidecar container are added to all the pods running Airflow. This allows the scheduler, worker and web components to reach the DAGs when needed.

dags.enabled=true
dags.repositories[0].repository=https://github.com/USERNAME/REPOSITORY
dags.repositories[0].name=REPO-IDENTIFIER
dags.repositories[0].branch=master

If you use a private GitHub repository, one option is to clone it with a Personal Access Token embedded in the URL: https://USERNAME:PERSONAL_ACCESS_TOKEN@github.com/USERNAME/REPOSITORY. Alternatively, you can clone the repository over SSH. To do so, either set your private SSH key in the dags.sshKey parameter, or reference an existing secret containing it through the dags.existingSshKeySecret and dags.existingSshKeySecretKey parameters.

Loading Plugins

You can load plugins into the chart by specifying a git repository containing the plugin files. The repository is periodically updated using a sidecar container. To do so, deploy Airflow with the following options:

Note: When git synchronization is enabled, an init container and a sidecar container are added to all the pods running Airflow. This allows the scheduler, worker and web components to reach the plugins when needed.

plugins.enabled=true
plugins.repositories[0].repository=https://github.com/teamclairvoyant/airflow-rest-api-plugin.git
plugins.repositories[0].branch=v1.0.9-branch
plugins.repositories[0].path=plugins

Install extra python packages

This chart allows you to mount volumes using extraVolumes and extraVolumeMounts in every component (web, scheduler, worker). Mounting a requirements.txt using these options to /bitnami/python/requirements.txt will execute pip install -r /bitnami/python/requirements.txt on container start.

Existing Secrets

You can use an existing secret to configure your Airflow auth, external Postgres, and external Redis® passwords:

postgresql.enabled=false
externalDatabase.host=my.external.postgres.host
externalDatabase.user=bn_airflow
externalDatabase.database=bitnami_airflow
externalDatabase.existingSecret=all-my-secrets
externalDatabase.existingSecretPasswordKey=postgresql-password

redis.enabled=false
externalRedis.host=my.external.redis.host
externalRedis.existingSecret=all-my-secrets
externalRedis.existingSecretPasswordKey=redis-password

auth.existingSecret=all-my-secrets

The expected secret resource looks as follows:

apiVersion: v1
kind: Secret
metadata:
  name: all-my-secrets
type: Opaque
data:
  airflow-password: "Smo1QTJLdGxXMg=="
  airflow-fernet-key: "YVRZeVJVWnlXbU4wY1dOalVrdE1SV3cxWWtKeFIzWkVRVTVrVjNaTFR6WT0="
  airflow-secret-key: "a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08="
  postgresql-password: "cG9zdGdyZXMK"
  redis-password: "cmVkaXMK"

This is useful if you plan on using Bitnami's sealed secrets to manage your passwords.
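
Values under the data field of a Secret must be base64-encoded. The sketch below shows one way to produce them with the Python standard library; the helper name is hypothetical and the plaintext passwords are placeholders.

```python
import base64


def encode_secret_value(value):
    """Base64-encode a string for use under the `data` field of a Secret."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Placeholder credentials, for illustration only.
secret_data = {
    "postgresql-password": encode_secret_value("postgres"),
    "redis-password": encode_secret_value("redis"),
}

for key, value in secret_data.items():
    print(f'  {key}: "{value}"')
```

Note that the sample postgresql-password and redis-password values in the manifest above decode to strings ending in a newline character, which is what you get when encoding the output of echo without -n. Encoding the exact password bytes, as in this sketch, avoids storing that extra character.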

Alternatively, you can also use a SQL connection string to connect to an external database. This can be done by:

  • Setting the externalDatabase.sqlConnection parameter:

postgresql.enabled=false
externalDatabase.sqlConnection=postgresql://user:password@host:port/dbname

  • Or via the externalDatabase.existingSecret and externalDatabase.existingSecretSqlConnectionKey parameters:

postgresql.enabled=false
externalDatabase.existingSecret=db-secret
externalDatabase.existingSecretSqlConnectionKey=sql-connection
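
When a user name or password contains characters that are reserved in URLs (such as @ or /), they generally need to be percent-encoded before being placed in a connection string of the form shown above. A minimal sketch follows; the helper name, the sample password and the port 5432 (the PostgreSQL default) are assumptions made for this example.

```python
from urllib.parse import quote


def build_sql_connection(user, password, host, port, dbname):
    """Build a postgresql:// connection string with percent-encoded credentials."""
    # safe="" makes quote() encode "/" as well, so no reserved character is left.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )


print(
    build_sql_connection(
        "bn_airflow", "p@ss/word", "my.external.postgres.host", 5432, "bitnami_airflow"
    )
)
```

The printed string can be used as the value of externalDatabase.sqlConnection, or stored in the secret key referenced by externalDatabase.existingSecretSqlConnectionKey.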

Database setup

By default, this chart sets up the database (initializing or migrating the schema) and creates the admin user using a K8s job that is created when the chart release is installed or upgraded, and deleted once it succeeds. This job uses Chart hooks, so it won't be deleted if you're using Helm exclusively for its rendering capabilities (e.g. when using ArgoCD or FluxCD).

Alternatively, you can disable this behavior by setting the setupDBJob.enabled parameter to false. In this case, the database setup and admin user creation will be done during the Webserver startup.

Resource requests and limits

Bitnami charts allow setting resource requests and limits for all containers inside the chart deployment. These are inside the resources value (check parameter table). Setting requests is essential for production workloads and these should be adapted to your specific use case.

To make this process easier, the chart contains the resourcesPreset value, which automatically sets the resources section according to different presets. Check these presets in the bitnami/common chart. However, using resourcesPreset in production workloads is discouraged, as it may not fully adapt to your specific needs. Find more information on container resource management in the official Kubernetes documentation.

Prometheus metrics

This chart can be integrated with Prometheus by setting metrics.enabled to true. This will configure Airflow components to send StatsD metrics to the StatsD exporter that transforms them into Prometheus metrics. The StatsD exporter is deployed as a standalone deployment and service in the same namespace as the Airflow deployment.

Prometheus requirements

It is necessary to have a working installation of Prometheus or Prometheus Operator for the integration to work. Install the Bitnami Prometheus helm chart or the Bitnami Kube Prometheus helm chart to easily have a working Prometheus in your cluster.

Integration with Prometheus Operator

The chart can deploy ServiceMonitor objects for integration with Prometheus Operator installations. To do so, set the value metrics.serviceMonitor.enabled=true. Ensure that the Prometheus Operator CustomResourceDefinitions are installed in the cluster or it will fail with the following error:

no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"

Install the Bitnami Kube Prometheus helm chart to get the necessary CRDs and the Prometheus Operator.

Rolling VS Immutable tags

It is strongly recommended to use immutable tags in a production environment. This ensures your deployment does not change automatically if the same tag is updated with a different image.

Bitnami will release a new chart updating its containers if a new version of the main container, significant changes, or critical vulnerabilities exist.

Ingress

This chart provides support for Ingress resources. If you have an ingress controller installed on your cluster, such as nginx-ingress-controller or contour, you can use it to serve your application.

To enable Ingress integration, set ingress.enabled to true.

The most common scenario is to have one host name mapped to the deployment. In this case, the ingress.hostname property can be used to set the host name. The ingress.tls parameter can be used to add the TLS configuration for this host. However, it is also possible to have more than one host. To facilitate this, the ingress.extraHosts parameter (if available) can be set with the host names specified as an array. The ingress.extraTLS parameter (if available) can also be used to add the TLS configuration for extra hosts.

NOTE: For each host specified in the ingress.extraHosts parameter, it is necessary to set a name, path, and any annotations that the Ingress controller should know about. Not all annotations are supported by all Ingress controllers, but this annotation reference document lists the annotations supported by many popular Ingress controllers.

Adding the TLS parameter (where available) will cause the chart to generate HTTPS URLs, and the application will be available on port 443. The actual TLS secrets do not have to be generated by this chart. However, if TLS is enabled, the Ingress record will not work until the TLS secret exists.

Learn more about Ingress controllers.

Securing traffic using TLS

By default, this chart assumes TLS is managed by the Ingress Controller and terminates the TLS connection in the Ingress Controller. This can be done by setting ingress.enabled and ingress.tls parameters to true as explained in the section above. However, it is possible to configure TLS encryption for the Airflow Webserver directly by setting the web.tls.enabled parameter to true.

It is necessary to create a secret containing the TLS certificates and pass it to the chart via the web.tls.existingSecret parameter. The secret should contain tls.crt and tls.key keys holding the certificate and key files respectively. For example:

kubectl create secret generic web-tls-secret --from-file=./tls.crt --from-file=./tls.key

You can manually create the required TLS certificates or rely on the chart's auto-generation capabilities. The chart supports two different ways to auto-generate the required certificates:

  • Using Helm capabilities. Enable this feature by setting web.tls.autoGenerated.enabled to true and web.tls.autoGenerated.engine to helm.
  • Relying on CertManager, which must be installed in your K8s cluster. Enable this feature by setting web.tls.autoGenerated.enabled to true and web.tls.autoGenerated.engine to cert-manager. You can also use an existing Issuer/ClusterIssuer to issue the TLS certificates by setting the web.tls.autoGenerated.certManager.existingIssuer and web.tls.autoGenerated.certManager.existingIssuerKind parameters.

Sidecars

If additional containers are needed in the same pod as Apache Airflow (such as additional metrics or logging exporters), they can be defined using the sidecars parameter.

sidecars:
- name: your-image-name
  image: your-image
  imagePullPolicy: Always
  ports:
  - name: portname
    containerPort: 1234

If these sidecars export extra ports, extra port definitions can be added using the service.extraPorts parameter (where available), as shown in the example below:

service:
  extraPorts:
  - name: extraPort
    port: 11311
    targetPort: 11311

If additional init containers are needed in the same pod, they can be defined using the initContainers parameter. Here is an example:

initContainers:
  - name: your-image-name
    image: your-image
    imagePullPolicy: Always
    ports:
      - name: portname
        containerPort: 1234

Learn more about sidecar containers and init containers.

Setting Pod's affinity

This chart allows you to set your custom affinity using the affinity parameter. Find more information about Pod affinity in the Kubernetes documentation.

As an alternative, you can use one of the preset configurations for pod affinity, pod anti-affinity, and node affinity available in the bitnami/common chart. To do so, set the podAffinityPreset, podAntiAffinityPreset, or nodeAffinityPreset parameters.

Backup and restore

To back up and restore Helm chart deployments on Kubernetes, you need to back up the persistent volumes from the source deployment and attach them to a new deployment using Velero, a Kubernetes backup/restore tool. Find the instructions for using Velero in this guide.

Persistence

The Bitnami Airflow chart relies on the PostgreSQL chart persistence. This means that Airflow does not persist anything.

Parameters

Global parameters
Name | Description | Value
global.imageRegistry | Global Docker image registry | ""
global.imagePullSecrets | Global Docker registry secret names as an array | []
global.defaultStorageClass | Global default StorageClass for Persistent Volume(s) | ""
global.security.allowInsecureImages | Allows skipping image verification | false
global.compatibility.openshift.adaptSecurityContext | Adapt the securityContext sections of the deployment to make them compatible with Openshift restricted-v2 SCC: remove runAsUser, runAsGroup and fsGroup and let the platform use their allowed default IDs. Possible values: auto (apply if the detected running cluster is Openshift), force (perform the adaptation always), disabled (do not perform adaptation) | auto
global.compatibility.omitEmptySeLinuxOptions | If set to true, removes the seLinuxOptions from the securityContexts when it is set to an empty object | false

Common parameters

Name | Description | Value
kubeVersion

Note: the README for this chart is longer than the DockerHub length limit of 25000, so it has been trimmed. The full README can be found at https://github.com/bitnami/charts/blob/main/bitnami/airflow/README.md
