bitnamicharts/mlflow
Bitnami Helm chart for MLflow
MLflow is an open-source platform designed to manage the end-to-end machine learning lifecycle. It allows you to track experiments, package code into reproducible runs, and share and deploy models.
Trademarks: This software listing is packaged by Bitnami. The respective trademarks mentioned in the offering are owned by the respective companies, and use of them does not imply any affiliation or endorsement.
helm install my-release oci://registry-1.docker.io/bitnamicharts/mlflow
Looking to use MLflow in production? Try VMware Tanzu Application Catalog, the commercial edition of the Bitnami catalog.
This chart bootstraps an MLflow deployment on a Kubernetes cluster using the Helm package manager.
MLflow is built for full integration with Python, which lets you use it together with Python libraries and packages.
Bitnami charts can be used with Kubeapps for deployment and management of Helm Charts in clusters.
To install the chart with the release name my-release:
helm install my-release oci://REGISTRY_NAME/REPOSITORY_NAME/mlflow
Note: You need to substitute the placeholders REGISTRY_NAME and REPOSITORY_NAME with a reference to your Helm chart registry and repository. For example, in the case of Bitnami, you need to use REGISTRY_NAME=registry-1.docker.io and REPOSITORY_NAME=bitnamicharts.
The command deploys MLflow on the Kubernetes cluster in the default configuration. The Parameters section lists the parameters that can be configured during installation.
Tip: List all releases using
helm list
Bitnami charts allow setting resource requests and limits for all containers inside the chart deployment. These are set in the resources values (check the parameter table). Setting requests is essential for production workloads, and they should be adapted to your specific use case.
To make this process easier, the chart contains the resourcesPreset values, which automatically set the resources section according to different presets. Check these presets in the bitnami/common chart. However, using resourcesPreset in production workloads is discouraged, as it may not fully adapt to your specific needs. Find more information on container resource management in the official Kubernetes documentation.
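As a minimal sketch, the values file below sets explicit requests and limits for the tracking server through tracking.resources, which according to the parameter table takes precedence over tracking.resourcesPreset. The file name and every CPU and memory figure are assumptions for illustration, not sizing advice.

```yaml
# values.yaml (hypothetical file name)
# Apply with: helm install my-release oci://REGISTRY_NAME/REPOSITORY_NAME/mlflow -f values.yaml
tracking:
  # Explicit requests/limits; when tracking.resources is set,
  # tracking.resourcesPreset is ignored.
  resources:
    requests:
      cpu: 500m      # placeholder value, adapt to your workload
      memory: 1Gi    # placeholder value, adapt to your workload
    limits:
      cpu: "1"       # placeholder value, adapt to your workload
      memory: 2Gi    # placeholder value, adapt to your workload
```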
This chart can be integrated with Prometheus by setting tracking.metrics.enabled to true. This exposes MLflow's native Prometheus endpoint in the service, with the annotations needed for Prometheus to scrape it automatically.
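A values fragment that enables the metrics endpoint could look like this; it only sets the parameter named above and assumes every other value keeps its default.

```yaml
# Expose MLflow's native Prometheus metrics endpoint on the tracking service.
tracking:
  metrics:
    enabled: true
```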
Prometheus requirements
It is necessary to have a working installation of Prometheus or Prometheus Operator for the integration to work. Install the Bitnami Prometheus Helm chart or the Bitnami Kube Prometheus Helm chart to easily have a working Prometheus in your cluster.
Integration with Prometheus Operator
The chart can deploy ServiceMonitor objects for integration with Prometheus Operator installations. To do so, set the value tracking.metrics.serviceMonitor.enabled=true. Ensure that the Prometheus Operator CustomResourceDefinitions are installed in the cluster, or the installation will fail with the following error:
no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
Install the Bitnami Kube Prometheus Helm chart to get the necessary CRDs and the Prometheus Operator.
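A sketch of a values file for Prometheus Operator setups follows. It assumes the metrics endpoint must also be enabled for the ServiceMonitor to have something to scrape, so both flags are set.

```yaml
# Requires the Prometheus Operator CRDs (monitoring.coreos.com/v1) in the cluster.
tracking:
  metrics:
    enabled: true        # assumed prerequisite: expose the metrics endpoint
    serviceMonitor:
      enabled: true      # create a ServiceMonitor object for Prometheus Operator
```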
MLflow can encrypt communications by setting tracking.tls.enabled=true. The chart allows two configuration options:
- Provide your own secret using the tracking.tls.certificatesSecret value. Also set the correct name of the certificate files using the tracking.tls.certFilename, tracking.tls.certKeyFilename and tracking.tls.certCAFilename values.
- Have the chart auto-generate the certificates using tracking.tls.autoGenerated=true.

To back up and restore Helm chart deployments on Kubernetes, you need to back up the persistent volumes from the source deployment and attach them to a new deployment using Velero, a Kubernetes backup/restore tool. Find the instructions for using Velero in this guide.
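The two TLS options described earlier (the tracking.tls.* values) could be expressed in a values file roughly as follows. The Secret name mlflow-tls and the certificate file names are hypothetical; they must match the keys of the Secret you actually create. Option 2 is shown commented out as an alternative to option 1.

```yaml
# Option 1: use certificates stored in an existing Secret.
# "mlflow-tls" and the file names below are hypothetical examples.
tracking:
  tls:
    enabled: true
    certificatesSecret: mlflow-tls
    certFilename: tls.crt
    certKeyFilename: tls.key
    certCAFilename: ca.crt
# Option 2 (alternative): let the chart generate the certificates.
# tracking:
#   tls:
#     enabled: true
#     autoGenerated: true
```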
Parameters
Global parameters
Name | Description | Value |
---|---|---|
global.imageRegistry | Global Docker image registry | "" |
global.imagePullSecrets | Global Docker registry secret names as an array | [] |
global.defaultStorageClass | Global default StorageClass for Persistent Volume(s) | "" |
global.storageClass | DEPRECATED: use global.defaultStorageClass instead | "" |
global.security.allowInsecureImages | Allows skipping image verification | false |
global.compatibility.openshift.adaptSecurityContext | Adapt the securityContext sections of the deployment to make them compatible with OpenShift restricted-v2 SCC: remove runAsUser, runAsGroup and fsGroup and let the platform use their allowed default IDs. Possible values: auto (apply if the detected running cluster is OpenShift), force (perform the adaptation always), disabled (do not perform adaptation) | auto |
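For example, a values fragment like the one below would point all chart images at a private registry mirror and attach a pull secret. registry.example.com and my-registry-secret are hypothetical names; the secret must already exist in the release namespace.

```yaml
# Hypothetical private registry mirror and pull secret.
global:
  imageRegistry: registry.example.com
  imagePullSecrets:
    - my-registry-secret
```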
Common parameters
Name | Description | Value |
---|---|---|
kubeVersion | Override Kubernetes version | "" |
nameOverride | String to partially override common.names.name | "" |
fullnameOverride | String to fully override common.names.fullname | "" |
namespaceOverride | String to fully override common.names.namespace | "" |
commonLabels | Labels to add to all deployed objects | {} |
commonAnnotations | Annotations to add to all deployed objects | {} |
clusterDomain | Kubernetes cluster domain name | cluster.local |
extraDeploy | Array of extra objects to deploy with the release | [] |
diagnosticMode.enabled | Enable diagnostic mode (all probes will be disabled and the command will be overridden) | false |
diagnosticMode.command | Command to override all containers in the deployment | ["sleep"] |
diagnosticMode.args | Args to override all containers in the deployment | ["infinity"] |
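To troubleshoot a pod that keeps crashing, diagnostic mode can be switched on with a fragment like this one. The command and args entries repeat the defaults from the table and could be omitted. With this mode on, the containers run sleep infinity instead of MLflow, so you can open a shell in them (for example with kubectl exec) and inspect the environment.

```yaml
# Disable all probes and replace the container command so the pod stays up.
diagnosticMode:
  enabled: true
  command: ["sleep"]     # default value, shown for clarity
  args: ["infinity"]     # default value, shown for clarity
```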
MLflow image parameters
Name | Description | Value |
---|---|---|
image.registry | mlflow image registry | REGISTRY_NAME |
image.repository | mlflow image repository | REPOSITORY_NAME/mlflow |
image.digest | mlflow image digest in the format sha256:aa.... Please note this parameter, if set, will override the image tag (immutable tags are recommended) | "" |
image.pullPolicy | mlflow image pull policy | IfNotPresent |
image.pullSecrets | mlflow image pull secrets | [] |
image.debug | Enable mlflow image debug mode | false |
gitImage.registry | Git image registry | REGISTRY_NAME |
gitImage.repository | Git image repository | REPOSITORY_NAME/git |
gitImage.digest | Git image digest in the format sha256:aa.... Please note this parameter, if set, will override the image tag | "" |
gitImage.pullPolicy | Git image pull policy | IfNotPresent |
gitImage.pullSecrets | Specify docker-registry secret names as an array | [] |
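A sketch of pinning the MLflow image by digest instead of by tag follows. The digest value is a hypothetical placeholder; substitute the real sha256 digest of the image you have verified.

```yaml
# Pin the MLflow image by digest; when set, the digest overrides the tag.
image:
  digest: "sha256:<digest>"   # hypothetical placeholder, not a real digest
  pullPolicy: IfNotPresent
```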
MLflow Tracking parameters
Name | Description | Value |
---|---|---|
tracking.enabled | Enable Tracking server | true |
tracking.replicaCount | Number of mlflow replicas to deploy | 1 |
tracking.host | mlflow tracking listening host. Set to "[::]" to use IPv6. | 0.0.0.0 |
tracking.containerPorts.http | mlflow HTTP container port | 5000 |
tracking.livenessProbe.enabled | Enable livenessProbe on mlflow containers | true |
tracking.livenessProbe.initialDelaySeconds | Initial delay seconds for livenessProbe | 5 |
tracking.livenessProbe.periodSeconds | Period seconds for livenessProbe | 10 |
tracking.livenessProbe.timeoutSeconds | Timeout seconds for livenessProbe | 5 |
tracking.livenessProbe.failureThreshold | Failure threshold for livenessProbe | 5 |
tracking.livenessProbe.successThreshold | Success threshold for livenessProbe | 1 |
tracking.readinessProbe.enabled | Enable readinessProbe on mlflow containers | true |
tracking.readinessProbe.initialDelaySeconds | Initial delay seconds for readinessProbe | 5 |
tracking.readinessProbe.periodSeconds | Period seconds for readinessProbe | 10 |
tracking.readinessProbe.timeoutSeconds | Timeout seconds for readinessProbe | 5 |
tracking.readinessProbe.failureThreshold | Failure threshold for readinessProbe | 5 |
tracking.readinessProbe.successThreshold | Success threshold for readinessProbe | 1 |
tracking.startupProbe.enabled | Enable startupProbe on mlflow containers | false |
tracking.startupProbe.initialDelaySeconds | Initial delay seconds for startupProbe | 5 |
tracking.startupProbe.periodSeconds | Period seconds for startupProbe | 10 |
tracking.startupProbe.timeoutSeconds | Timeout seconds for startupProbe | 5 |
tracking.startupProbe.failureThreshold | Failure threshold for startupProbe | 5 |
tracking.startupProbe.successThreshold | Success threshold for startupProbe | 1 |
tracking.customLivenessProbe | Custom livenessProbe that overrides the default one | {} |
tracking.customReadinessProbe | Custom readinessProbe that overrides the default one | {} |
tracking.customStartupProbe | Custom startupProbe that overrides the default one | {} |
tracking.resourcesPreset | Set container resources according to one common preset (allowed values: none, nano, micro, small, medium, large, xlarge, 2xlarge). This is ignored if tracking.resources is set (tracking.resources is recommended for production). | medium |
tracking.resources | Set container requests and limits for different resources like CPU or memory (essential for production workloads) | {} |
tracking.podSecurityContext.enabled | Enable mlflow pods' Security Context | true |
tracking.podSecurityContext.fsGroupChangePolicy | Set filesystem group change policy | Always |
tracking.podSecurityContext.sysctls | Set kernel settings using the sysctl interface | [] |
tracking.podSecurityContext.supplementalGroups | Set filesystem extra groups | [] |
tracking.podSecurityContext.fsGroup | Set mlflow pod's Security Context fsGroup | 1001 |
tracking.containerSecurityContext.enabled | Enable containers' Security Context | true |
tracking.containerSecurityContext.seLinuxOptions | Set SELinux options in container | {} |
tracking.containerSecurityContext.runAsUser | Set containers' Security Context runAsUser |
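As an illustrative sketch combining a few tracking parameters, the fragment below makes the server listen on IPv6 and enables the startup probe, which is disabled by default. The failureThreshold of 30 is an assumed example value for a slow-starting server, not a recommendation from the chart.

```yaml
tracking:
  # Listen on IPv6 instead of the default 0.0.0.0 (quoted so YAML keeps it a string).
  host: "[::]"
  # Until the startup probe succeeds, Kubernetes holds off the
  # liveness and readiness probes.
  startupProbe:
    enabled: true
    periodSeconds: 10
    failureThreshold: 30   # example value: roughly 30 x 10s before the pod is restarted
```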
Note: the README for this chart is longer than the DockerHub length limit of 25000, so it has been trimmed. The full README can be found at https://github.com/bitnami/charts/blob/main/bitnami/mlflow/README.md