bitnamicharts/airflow
Bitnami Helm chart for Airflow
Apache Airflow is a tool to express and execute workflows as directed acyclic graphs (DAGs). It includes utilities to schedule tasks, monitor task progress and handle task dependencies.
Trademarks: This software listing is packaged by Bitnami. The respective trademarks mentioned in the offering are owned by the respective companies, and use of them does not imply any affiliation or endorsement.
helm install my-release oci://registry-1.docker.io/bitnamicharts/airflow
Looking to use Apache Airflow in production? Try VMware Tanzu Application Catalog, the commercial edition of the Bitnami catalog.
This chart bootstraps an Apache Airflow deployment on a Kubernetes cluster using the Helm package manager.
Bitnami charts can be used with Kubeapps for deployment and management of Helm Charts in clusters.
To install the chart with the release name my-release:
helm install my-release oci://REGISTRY_NAME/REPOSITORY_NAME/airflow
Note: You need to substitute the placeholders REGISTRY_NAME and REPOSITORY_NAME with a reference to your Helm chart registry and repository. For example, in the case of Bitnami, you need to use REGISTRY_NAME=registry-1.docker.io and REPOSITORY_NAME=bitnamicharts.
The command deploys Apache Airflow on the Kubernetes cluster in the default configuration. The Parameters section lists the parameters that can be configured during installation.
Tip: List all releases using helm list.
Airflow supports different executors, and this Helm chart provides support for several of them. You can choose the executor you want to use by setting the executor parameter.
CeleryExecutor
The Celery executor (the default) uses a message queue system (Redis® in this case) to coordinate tasks between pre-configured workers.
KubernetesExecutor
The Kubernetes executor creates a new worker pod for every task instance using the pod_template.yaml that you can find at templates/config/configmap.yaml. This template can be overwritten using worker.podTemplate. To enable KubernetesExecutor, you can set the following parameters:
executor=KubernetesExecutor
rbac.create=true
serviceAccount.create=true
redis.enabled=false
NOTE: Redis® does not need to be deployed when using KubernetesExecutor, so you can disable it using redis.enabled=false.
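As a rough sketch, the same KubernetesExecutor settings could be kept in a values file instead of being passed as individual parameters. The file name values-kubernetes.yaml is only an illustrative choice; the keys are the ones listed above.

```yaml
# values-kubernetes.yaml (hypothetical file name)
# Run every task instance in its own worker pod
executor: KubernetesExecutor
# RBAC resources and a service account for the chart's pods
rbac:
  create: true
serviceAccount:
  create: true
# Redis® is not used by this executor
redis:
  enabled: false
```

Such a file would typically be passed to Helm with the -f flag, for example helm install my-release -f values-kubernetes.yaml followed by the chart reference.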
CeleryKubernetesExecutor
The CeleryKubernetesExecutor (introduced in Airflow 2.0) is a combination of both the Celery and the Kubernetes executors. Tasks will be executed using Celery by default, but those tasks that require it can be executed in a Kubernetes pod using the 'kubernetes' queue.
LocalExecutor
The Local executor runs tasks by spawning processes in the Scheduler pods. To enable LocalExecutor, set the following parameters:
executor=LocalExecutor
redis.enabled=false
LocalKubernetesExecutor
The LocalKubernetesExecutor (introduced in Airflow 2.3) is a combination of both the Local and the Kubernetes executors. Tasks will be executed in the scheduler by default, but those tasks that require it can be executed in a Kubernetes pod using the 'kubernetes' queue.
SequentialExecutor
This executor runs only one task instance at a time in the Scheduler pods. For production use cases, please use other executors. To enable SequentialExecutor, set the following parameters:
executor=SequentialExecutor
redis.enabled=false
Bitnami charts configure credentials at first boot. Any further change to the secrets or credentials requires manual intervention. To update the secret with the new values, replace the SECRET_NAME, PASSWORD, FERNET_KEY and SECRET_KEY placeholders and run:
kubectl create secret generic SECRET_NAME --from-literal=airflow-password=PASSWORD --from-literal=airflow-fernet-key=FERNET_KEY --from-literal=airflow-secret-key=SECRET_KEY --dry-run=client -o yaml | kubectl apply -f -
By default, the Airflow configuration file is auto-generated based on the chart parameters you set. For instance, the executor parameter will be used to set the executor class under the [core] section.
You can also provide your own configuration by setting the configuration parameter. This parameter expects the configuration as a dictionary of sections, keys, and values in YAML format, which the chart then converts to .cfg format. For instance, using a configuration like the one below...
configuration:
  core:
    dags_folder: "/opt/bitnami/airflow/dags"
... the chart will translate it to the following configuration file:
[core]
dags_folder = "/opt/bitnami/airflow/dags"
As an alternative to providing the whole configuration, you can also extend the default configuration using the overrideConfiguration parameter. The values set in this parameter, which also expects YAML format, are merged with the default configuration (or with the one set in the configuration parameter) and take precedence over it.
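A minimal sketch of such an override is shown below. It assumes that the only setting to change is parallelism in the [core] section; the value 64 is an arbitrary example rather than a recommendation.

```yaml
overrideConfiguration:
  core:
    # Illustrative override: only this key is changed,
    # the rest of the generated configuration is left untouched
    parallelism: 64
```

Under that assumption, the generated configuration file would be expected to carry the overridden parallelism value under [core] while keeping every other default.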
Sometimes, with large workloads, a fixed number of worker pods may cause tasks to take a long time to execute. This chart provides two ways of scaling worker pods:
- With KubernetesExecutor, pod autoscaling is handled by the Scheduler, with nothing more to configure.
- With SequentialExecutor, you have to enable worker.autoscaling. To do so, set the parameters below. Autoscaling uses a default configuration that you can change using worker.autoscaling.replicas.* and worker.autoscaling.targets.*.
worker.autoscaling.enabled=true
worker.resources.requests.cpu=200m
worker.resources.requests.memory=250Mi
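For reference, the three autoscaling parameters above could be written in a values file roughly as follows. This is only a direct transcription of those parameters; the replica and target settings are left at their defaults.

```yaml
worker:
  autoscaling:
    # Turn on worker autoscaling with the chart's default configuration
    enabled: true
  resources:
    requests:
      # Resource requests for the worker containers, as listed above
      cpu: 200m
      memory: 250Mi
```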
A Fernet key is required in order to encrypt passwords within connections. The Fernet key must be a base64-encoded 32-byte key.
Learn how to generate one here.
The secret key is used to run your Flask app. It should be as random as possible.
Note: when running multiple Webserver instances, make sure all of them use the same secret key. Otherwise you may face the error "CSRF session token is missing".
There are two different ways to load your custom DAG files into the Airflow chart. Both are compatible, so you can use them at the same time.
Option 1: Specify an existing config map
You can manually create a config map containing all your DAG files and then pass its name when deploying the Airflow chart. To do so, set the parameters below:
dags.enabled=true
dags.existingConfigmap=my-dags-configmap
Option 2: Get your DAG files from a git repository
You can store all your DAG files in GitHub repositories and then clone them into the Airflow pods with an init container. The repositories will be periodically updated using a sidecar container. To do so, deploy Airflow with the following options:
Note: When git synchronization is enabled, an init container and a sidecar container are added to all the pods running Airflow. This allows the scheduler, worker and web components to reach the DAGs when needed.
dags.enabled=true
dags.repositories[0].repository=https://github.com/USERNAME/REPOSITORY
dags.repositories[0].name=REPO-IDENTIFIER
dags.repositories[0].branch=master
If you use a private GitHub repository, one option for cloning the files is to create a Personal Access Token and use it as part of the URL: https://USERNAME:PERSONAL_ACCESS_TOKEN@github.com/USERNAME/REPOSITORY. Alternatively, you can clone the repository using SSH. To do so, either provide your private SSH key through the dags.sshKey parameter or reference an existing secret containing your private SSH key through the dags.existingSshKeySecret and dags.existingSshKeySecretKey parameters.
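The sketch below shows what cloning over SSH with an existing secret might look like in a values file. The secret name my-dags-ssh-secret and the key name id_rsa are hypothetical, and the SSH form of the repository URL is an assumption about what you would use instead of the HTTPS URL.

```yaml
dags:
  enabled: true
  repositories:
    # SSH clone URL (assumed form) instead of the HTTPS one
    - repository: git@github.com:USERNAME/REPOSITORY.git
      name: REPO-IDENTIFIER
      branch: master
  # Hypothetical secret that already holds the private SSH key
  existingSshKeySecret: my-dags-ssh-secret
  # Hypothetical key inside that secret containing the private key
  existingSshKeySecretKey: id_rsa
```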
You can load plugins into the chart by specifying a git repository containing the plugin files. The repository will be periodically updated using a sidecar container. To do so, deploy Airflow with the following options:
Note: When git synchronization is enabled, an init container and a sidecar container are added to all the pods running Airflow. This allows the scheduler, worker and web components to reach the plugins when needed.
plugins.enabled=true
plugins.repositories[0].repository=https://github.com/teamclairvoyant/airflow-rest-api-plugin.git
plugins.repositories[0].branch=v1.0.9-branch
plugins.repositories[0].path=plugins
This chart allows you to mount volumes using extraVolumes and extraVolumeMounts in every component (web, scheduler, worker). Mounting a requirements.txt file at /bitnami/python/requirements.txt using these options will execute pip install -r /bitnami/python/requirements.txt on container start.
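As an illustration only, the following values mount a requirements.txt file from a ConfigMap into the web component. Two things are assumptions here: that the options are exposed under the component key (web.extraVolumes and web.extraVolumeMounts), and that a ConfigMap named airflow-requirements with a requirements.txt key already exists. Check the parameter table for the exact key names.

```yaml
web:
  extraVolumes:
    # Hypothetical ConfigMap holding a single requirements.txt key
    - name: requirements
      configMap:
        name: airflow-requirements
  extraVolumeMounts:
    # Mount only that file at the path the container checks on start
    - name: requirements
      mountPath: /bitnami/python/requirements.txt
      subPath: requirements.txt
```

The same pair of options would have to be repeated for the scheduler and worker components if they need the same packages.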
You can use an existing secret to configure your Airflow auth, external Postgres, and external Redis® passwords:
postgresql.enabled=false
externalDatabase.host=my.external.postgres.host
externalDatabase.user=bn_airflow
externalDatabase.database=bitnami_airflow
externalDatabase.existingSecret=all-my-secrets
externalDatabase.existingSecretPasswordKey=postgresql-password
redis.enabled=false
externalRedis.host=my.external.redis.host
externalRedis.existingSecret=all-my-secrets
externalRedis.existingSecretPasswordKey=redis-password
auth.existingSecret=all-my-secrets
The expected secret resource looks as follows:
apiVersion: v1
kind: Secret
metadata:
  name: all-my-secrets
type: Opaque
data:
  airflow-password: "Smo1QTJLdGxXMg=="
  airflow-fernet-key: "YVRZeVJVWnlXbU4wY1dOalVrdE1SV3cxWWtKeFIzWkVRVTVrVjNaTFR6WT0="
  airflow-secret-key: "a25mQ1FHTUh3MnFRSk5KMEIyVVU2YmN0VGRyYTVXY08="
  postgresql-password: "cG9zdGdyZXMK"
  redis-password: "cmVkaXMK"
This is useful if you plan on using Bitnami's sealed secrets to manage your passwords.
Alternatively, you can also use a SQL connection string to connect to an external database. This can be done in either of two ways:
- Setting the externalDatabase.sqlConnection parameter:
postgresql.enabled=false
externalDatabase.sqlConnection=postgresql://user:password@host:port/dbname
- Setting the externalDatabase.existingSecret and externalDatabase.existingSecretSqlConnectionKey parameters:
postgresql.enabled=false
externalDatabase.existingSecret=db-secret
externalDatabase.existingSecretSqlConnectionKey=sql-connection
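For the secret-based variant, a secret along the following lines is presumably what the chart expects. This is a sketch that reuses the db-secret name and sql-connection key from the parameters above; the connection string is the same placeholder and must be replaced with real values. It uses stringData so the value can be written without base64 encoding.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
stringData:
  # Placeholder connection string; substitute user, password, host, port and dbname
  sql-connection: postgresql://user:password@host:port/dbname
```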
By default, this chart sets up the database (initializing or migrating the schema) and creates the admin user using a Kubernetes job that is created when the chart release is installed or upgraded, and deleted once it succeeds. This job relies on chart hooks, so it won't be deleted if you're using Helm exclusively for its rendering capabilities (e.g. when using ArgoCD or FluxCD).
Alternatively, you can disable this behavior by setting the setupDBJob.enabled parameter to false. In this case, the database setup and admin user creation will be done during the Webserver startup.
Bitnami charts allow setting resource requests and limits for all containers inside the chart deployment. These are inside the resources value (check the parameter table). Setting requests is essential for production workloads, and these should be adapted to your specific use case.
To make this process easier, the chart contains the resourcesPreset values, which automatically set the resources section according to different presets. Check these presets in the bitnami/common chart. However, using resourcesPreset in production workloads is discouraged, as it may not fully adapt to your specific needs. Find more information on container resource management in the official Kubernetes documentation.
This chart can be integrated with Prometheus by setting metrics.enabled to true. This will configure Airflow components to send StatsD metrics to the StatsD exporter, which transforms them into Prometheus metrics. The StatsD exporter is deployed as a standalone deployment and service in the same namespace as the Airflow deployment.
Prometheus requirements
It is necessary to have a working installation of Prometheus or Prometheus Operator for the integration to work. Install the Bitnami Prometheus helm chart or the Bitnami Kube Prometheus helm chart to easily have a working Prometheus in your cluster.
Integration with Prometheus Operator
The chart can deploy ServiceMonitor objects for integration with Prometheus Operator installations. To do so, set the value metrics.serviceMonitor.enabled=true. Ensure that the Prometheus Operator CustomResourceDefinitions are installed in the cluster or it will fail with the following error:
no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
Install the Bitnami Kube Prometheus Helm chart to get the necessary CRDs and the Prometheus Operator.
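Put together, the two metrics settings described in this section could look like this in a values file. It is a sketch that assumes the Prometheus Operator CRDs are already present in the cluster.

```yaml
metrics:
  # Deploy the StatsD exporter and point Airflow components at it
  enabled: true
  serviceMonitor:
    # Requires the Prometheus Operator CRDs to be installed beforehand
    enabled: true
```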
It is strongly recommended to use immutable tags in a production environment. This ensures your deployment does not change automatically if the same tag is updated with a different image.
Bitnami will release a new chart updating its containers if a new version of the main container, significant changes, or critical vulnerabilities exist.
This chart provides support for Ingress resources. If you have an ingress controller installed on your cluster, such as nginx-ingress-controller or contour, you can use it to serve your application.
To enable Ingress integration, set ingress.enabled to true.
The most common scenario is to have one host name mapped to the deployment. In this case, the ingress.hostname property can be used to set the host name. The ingress.tls parameter can be used to add the TLS configuration for this host. However, it is also possible to have more than one host. To facilitate this, the ingress.extraHosts parameter (if available) can be set with the host names specified as an array. The ingress.extraTLS parameter (if available) can also be used to add the TLS configuration for extra hosts.
NOTE: For each host specified in the ingress.extraHosts parameter, it is necessary to set a name, path, and any annotations that the Ingress controller should know about. Not all annotations are supported by all Ingress controllers, but this annotation reference document lists the annotations supported by many popular Ingress controllers.
Adding the TLS parameter (where available) will cause the chart to generate HTTPS URLs, and the application will be available on port 443. The actual TLS secrets do not have to be generated by this chart. However, if TLS is enabled, the Ingress record will not work until the TLS secret exists.
Learn more about Ingress controllers.
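To make the Ingress parameters above more concrete, here is a minimal sketch with one main host and one extra host. Both host names are hypothetical placeholders, and the TLS secret is assumed to be created separately.

```yaml
ingress:
  enabled: true
  # Hypothetical main host name
  hostname: airflow.example.com
  # Add the TLS configuration for the main host
  tls: true
  extraHosts:
    # Hypothetical additional host; each entry needs a name and a path
    - name: airflow-alt.example.com
      path: /
```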
By default, this chart assumes TLS is managed by the Ingress Controller and terminates the TLS connection in the Ingress Controller. This can be done by setting the ingress.enabled and ingress.tls parameters to true, as explained in the section above. However, it is possible to configure TLS encryption for the Airflow Webserver directly by setting the web.tls.enabled parameter to true.
It is necessary to create a secret containing the TLS certificates and pass it to the chart via the web.tls.existingSecret parameter. The secret should contain tls.crt and tls.key keys holding the certificate and key files respectively. For example:
kubectl create secret generic web-tls-secret --from-file=./tls.crt --from-file=./tls.key
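With that secret in place, the matching chart values might look like the sketch below. It only combines the web.tls.enabled and web.tls.existingSecret parameters named in this section with the web-tls-secret name used in the command.

```yaml
web:
  tls:
    # Terminate TLS in the Airflow Webserver itself
    enabled: true
    # Secret created above, holding the tls.crt and tls.key entries
    existingSecret: web-tls-secret
```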
You can manually create the required TLS certificates or rely on the chart's auto-generation capabilities. The chart supports two different ways to auto-generate the required certificates:
- Using Helm: set web.tls.autoGenerated.enabled to true and web.tls.autoGenerated.engine to helm.
- Using cert-manager: set web.tls.autoGenerated.enabled to true and web.tls.autoGenerated.engine to cert-manager. An existing Issuer/ClusterIssuer can be used to issue the TLS certificates by setting the web.tls.autoGenerated.certManager.existingIssuer and web.tls.autoGenerated.certManager.existingIssuerKind parameters.
If additional containers are needed in the same pod as Apache Airflow (such as additional metrics or logging exporters), they can be defined using the sidecars parameter.
sidecars:
  - name: your-image-name
    image: your-image
    imagePullPolicy: Always
    ports:
      - name: portname
        containerPort: 1234
If these sidecars export extra ports, extra port definitions can be added using the service.extraPorts parameter (where available), as shown in the example below:
service:
  extraPorts:
    - name: extraPort
      port: 11311
      targetPort: 11311
If additional init containers are needed in the same pod, they can be defined using the initContainers parameter. Here is an example:
initContainers:
  - name: your-image-name
    image: your-image
    imagePullPolicy: Always
    ports:
      - name: portname
        containerPort: 1234
Learn more about sidecar containers and init containers.
This chart allows you to set your custom affinity using the affinity parameter. Find more information about Pod affinity in the Kubernetes documentation.
As an alternative, you can use one of the preset configurations for pod affinity, pod anti-affinity, and node affinity available in the bitnami/common chart. To do so, set the podAffinityPreset, podAntiAffinityPreset, or nodeAffinityPreset parameters.
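A custom affinity follows the standard Kubernetes affinity syntax. The sketch below prefers spreading pods across nodes. Two details are assumptions: the app.kubernetes.io/name: airflow label used in the selector, and the placement of the affinity key, which may sit under a component key rather than at the top level (check the parameter table).

```yaml
affinity:
  podAntiAffinity:
    # Soft rule: prefer, but do not require, different nodes
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              # Assumed pod label; adjust to the labels your release actually uses
              app.kubernetes.io/name: airflow
          topologyKey: kubernetes.io/hostname
```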
To back up and restore Helm chart deployments on Kubernetes, you need to back up the persistent volumes from the source deployment and attach them to a new deployment using Velero, a Kubernetes backup/restore tool. Find the instructions for using Velero in this guide.
The Bitnami Airflow chart relies on the PostgreSQL chart persistence. This means that Airflow does not persist anything.
Name | Description | Value |
---|---|---|
global.imageRegistry | Global Docker image registry | "" |
global.imagePullSecrets | Global Docker registry secret names as an array | [] |
global.defaultStorageClass | Global default StorageClass for Persistent Volume(s) | "" |
global.security.allowInsecureImages | Allows skipping image verification | false |
global.compatibility.openshift.adaptSecurityContext | Adapt the securityContext sections of the deployment to make them compatible with Openshift restricted-v2 SCC: remove runAsUser, runAsGroup and fsGroup and let the platform use their allowed default IDs. Possible values: auto (apply if the detected running cluster is Openshift), force (perform the adaptation always), disabled (do not perform adaptation) | auto |
global.compatibility.omitEmptySeLinuxOptions | If set to true, removes the seLinuxOptions from the securityContexts when it is set to an empty object | false |
Name | Description | Value |
---|---|---|
kubeVersion |
Note: the README for this chart is longer than the DockerHub length limit of 25000, so it has been trimmed. The full README can be found at https://github.com/bitnami/charts/blob/main/bitnami/airflow/README.md