We are using Apache Airflow, Django (Python), and Spark (PySpark) in our project. Our DAGs run fine when 'airflow scheduler' is started from the command line. However, these DAGs do not work when 'airflow scheduler' is started from systemd (systemctl).
Note: We are using a virtual environment to run the airflow scheduler (and many other Python apps).
Initially, we faced issues related to the virtual environment, and later to the Django setup. We resolved those. However, we are now facing PySpark-related issues.
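For context, the Django bootstrap in our DAG modules looks roughly like the sketch below; the settings module name ("aqdms.settings") is illustrative, not necessarily our exact path:

```python
# Illustrative sketch of the Django bootstrap at the top of a DAG module.
# "aqdms.settings" is a placeholder for the real settings module.
import os

import django

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "aqdms.settings")
django.setup()  # must run before importing any Django models in the DAG
```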
We have tried the following unit file for the scheduler:
```ini
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
PIDFile=/home/datauser/Documents/airflow_workspace/airflow_env/bin/scheduler.pid
Environment="PATH=/home/datauser/Documents/airflow_workspace/airflow_env/bin:/home/datauser/airflow_workspace/airflow_env/airnet/aqdms:/home/datauser/Downloads/spark-3.5.1-bin-hadoop3/bin:/home/datauser/Downloads/spark-3.5.1-bin-hadoop3/sbin"
Environment="PYTHONPATH=/home/datauser/Documents/airflow_workspace/airflow_env"
Environment="JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre"
Environment="SPARK_HOME=/home/datauser/Downloads/spark-3.5.1-bin-hadoop3"
Environment="PYSPARK_DRIVER_PYTHON=python3"
Environment="PYTHONSTARTUP=/home/datauser/Downloads/spark-3.5.1-bin-hadoop3/python/pyspark/shell.py"
Environment="PYSPARK_PYTHON=python3"
Environment="SPARK_CONF_DIR=/home/datauser/Downloads/spark-3.5.1-bin-hadoop3/conf"
Environment="PYSPARK_SUBMIT_ARGS=--master local[2] pyspark-shell"
User=datauser
Group=datauser
Type=simple
ExecStart=/usr/bin/bash -c 'source /home/datauser/Documents/airflow_workspace/airflow_env/bin/activate ; /home/datauser/Documents/airflow_workspace/airflow_env/bin/airflow scheduler --pid /home/datauser/Documents/airflow_workspace/airflow_env/bin/scheduler.pid'
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```
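One thing we suspect (not verified): Spark here is run from the extracted tarball (note the SPARK_HOME above) rather than pip-installed into the virtualenv. In that layout, PySpark is normally importable only if $SPARK_HOME/python and the bundled py4j zip are also on PYTHONPATH, unless pyspark is separately pip-installed into the virtualenv. A sketch of what the adjusted line might look like (the py4j zip name must match what is actually present in $SPARK_HOME/python/lib):

```ini
# Hypothetical adjustment, not verified: expose PySpark's bundled libraries.
# Check $SPARK_HOME/python/lib for the exact py4j zip name.
Environment="PYTHONPATH=/home/datauser/Documents/airflow_workspace/airflow_env:/home/datauser/Downloads/spark-3.5.1-bin-hadoop3/python:/home/datauser/Downloads/spark-3.5.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip"
```

(Separately, sourcing bin/activate in ExecStart is likely redundant, since the venv's airflow binary is already invoked by absolute path and PATH puts the venv's bin directory first.)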
What should the configuration be to use Apache Airflow, Django, PySpark, and a virtual environment under systemd?