I'm facing difficulties trying to deploy Apache Airflow locally using Helm charts with a custom Dockerfile. Here is a detailed description of my problem:
**Context**
- Environment: local Kubernetes cluster (kind)
- Orchestration tool: Helm charts
- Airflow executor: KubernetesExecutor
- Customization: custom Dockerfile for Airflow
1) I created a Dockerfile:
```dockerfile
FROM apache/airflow:2.8.0
USER root
USER airflow
RUN pip install apache-airflow-providers-apache-spark
```
2) Then I built the image:
```shell
sudo docker build -t my-custom/airflow:latest .
```
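To rule out the image build itself, I believe the provider can be checked directly in the freshly built image before involving Kubernetes at all (a sketch, assuming the tag from the build command above):

```shell
# Run the custom image once and try to import the Spark provider package;
# a clean exit means the pip install in the Dockerfile actually took effect
docker run --rm my-custom/airflow:latest \
  python -c "import airflow.providers.apache.spark"
```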
3) Loaded the image into kind:
```shell
sudo kind load docker-image my-custom/airflow:latest
```
4) Installed the chart with my values.yaml:
```shell
helm install airflow apache-airflow/airflow -f values.yaml
```
My values.yaml:
```yaml
executor: KubernetesExecutor
airflow:
  image:
    repository: my-custom/airflow
    tag: latest
    pullPolicy: IfNotPresent
    pullSecret: ""
    uid: 50000
    gid: 0
config:
  core:
    load_examples: 'False'
    load_default_connections: 'False'
  webserver:
    expose_config: 'False'
  logging:
    remote_logging: 'True'
    remote_log_conn_id: "google_cloud_default"
    remote_base_log_folder: "gs://my-gcs-bucket"
dags:
  gitSync:
    enabled: true
    repo: # my external repo
    branch: main
    rev: HEAD
    subPath: dags
    depth: 1
    wait: 60
scheduler:
  replicas: 1
web:
  replicas: 1
  service:
    type: LoadBalancer
  resources:
    requests:
      cpu: 500m
      memory: 7Gi
    limits:
      cpu: 500m
      memory: 16Gi
  initContainers:
    - name: wait-for-scheduler
      image: busybox
      command: ['sh', '-c', 'until nslookup airflow-scheduler; do echo waiting for scheduler; sleep 2; done;']
  livenessProbe:
    initialDelaySeconds: 1800
    periodSeconds: 20
    timeoutSeconds: 5
    failureThreshold: 5
  readinessProbe:
    initialDelaySeconds: 1800
    periodSeconds: 20
    timeoutSeconds: 5
    failureThreshold: 5
  startupProbe:
    initialDelaySeconds: 300
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 10
postgresql:
  enabled: false
data:
  metadataConnection:
    protocol: postgres
    host: 192.168.15.8
    port: 5432
    db: airflow
    user: postgres
    pass: airflow
flower:
  enabled: false
redis:
  enabled: false
triggerer:
  enabled: true
ingress:
  enabled: false
```
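One thing I'm unsure about: I'm installing the official `apache-airflow/airflow` chart, and its documentation puts the image override under a top-level `images.airflow` key rather than `airflow.image`. A minimal fragment in that layout would look like:

```yaml
# Image override as documented for the official apache-airflow/airflow chart
images:
  airflow:
    repository: my-custom/airflow
    tag: latest
    pullPolicy: IfNotPresent
```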
It installs successfully; however, the Helm output reports a different Airflow version than the one specified in my Docker image, and the Spark provider modules were not installed when I checked the Airflow UI.
```
NAME: airflow
LAST DEPLOYED: Thu Aug 1 17:33:21 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing Apache Airflow 2.9.3!

Your release is named airflow.
You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser:
```
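To narrow down where the wrong image comes from, these are the checks I plan to run (the label selectors are my assumption based on the official chart's defaults, and the node name assumes a kind cluster created with the default name "kind"):

```shell
# Which image is the scheduler pod actually running?
kubectl get pods -l component=scheduler,release=airflow \
  -o jsonpath='{.items[*].spec.containers[*].image}'

# Was the custom image actually loaded onto the kind node?
docker exec kind-control-plane crictl images | grep my-custom
```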
Has anyone faced a similar issue, or does anyone have an idea what might be causing this behavior? Any tips on how to debug and resolve it would be greatly appreciated.
Thank you in advance for your help!