What is the configuration for running Apache Airflow, Django, and PySpark together in systemd services?


We are using Apache Airflow, Django (Python), and Spark (PySpark) in our project. Our DAGs run fine when 'airflow scheduler' is started from the command line. However, the same DAGs fail when 'airflow scheduler' is started from systemd (systemctl).

Note: We are using a virtual environment to run the Airflow scheduler (and several other Python apps).

Initially, we faced issues related to the virtual environment, then to the Django setup; both are now resolved. However, we are now facing PySpark-related issues.
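For context, below is a minimal sketch of the kind of SparkSession initialisation involved; the explicit environment defaults, paths, and app name are illustrative assumptions (mirroring the unit file further down), not our actual DAG code:

import os

# Make sure Spark can find Java and the Spark installation even when the
# process environment is minimal (paths mirror the unit file below).
os.environ.setdefault("SPARK_HOME", "/home/datauser/Downloads/spark-3.5.1-bin-hadoop3")
os.environ.setdefault("JAVA_HOME", "/usr/lib/jvm/java-8-openjdk-amd64/jre")
os.environ.setdefault("PYSPARK_PYTHON", "python3")

from pyspark.sql import SparkSession

# Local Spark session with two worker threads, as in PYSPARK_SUBMIT_ARGS.
spark = (
    SparkSession.builder
    .master("local[2]")
    .appName("aqdms_task")  # hypothetical app name
    .getOrCreate()
)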

This is the systemd unit we have tried for the scheduler:

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
PIDFile=/home/datauser/Documents/airflow_workspace/airflow_env/bin/scheduler.pid
Environment="PATH=/home/datauser/Documents/airflow_workspace/airflow_env/bin:/home/datauser/airflow_workspace/airflow_env/airnet/aqdms:/home/datauser/Downloads/spark-3.5.1-bin-hadoop3/bin:/home/datauser/Downloads/spark-3.5.1-bin-hadoop3/sbin"
Environment="PYTHONPATH=/home/datauser/Documents/airflow_workspace/airflow_env"
Environment="JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre"
Environment="SPARK_HOME=/home/datauser/Downloads/spark-3.5.1-bin-hadoop3"
Environment="PYSPARK_DRIVER_PYTHON=python3"
Environment="PYTHONSTARTUP=/home/datauser/Downloads/spark-3.5.1-bin-hadoop3/python/pyspark/shell.py"
Environment="PYSPARK_PYTHON=python3"
Environment="SPARK_CONF_DIR=/home/datauser/Downloads/spark-3.5.1-bin-hadoop3/conf"
Environment="PYSPARK_SUBMIT_ARGS=--master local[2] pyspark-shell"
User=datauser
Group=datauser
Type=simple
ExecStart=/usr/bin/bash -c 'source /home/datauser/Documents/airflow_workspace/airflow_env/bin/activate ; /home/datauser/Documents/airflow_workspace/airflow_env/bin/airflow scheduler --pid /home/datauser/Documents/airflow_workspace/airflow_env/bin/scheduler.pid'
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target
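For comparison, here is a sketch of the same unit with the environment factored out into an EnvironmentFile (the /etc/default/airflow-scheduler path is an assumption, not something we have in place) and with ExecStart pointing at the virtualenv's airflow binary directly, so no shell activation step is needed:

# /etc/systemd/system/airflow-scheduler.service  (sketch only, not our working unit)
[Unit]
Description=Airflow scheduler daemon
After=network.target

[Service]
# All Environment= lines from the unit above collected into one file (assumed path),
# one KEY=value pair per line.
EnvironmentFile=/etc/default/airflow-scheduler
User=datauser
Group=datauser
Type=simple
# The venv's airflow entry point already uses the venv's Python interpreter,
# so sourcing bin/activate is not required.
ExecStart=/home/datauser/Documents/airflow_workspace/airflow_env/bin/airflow scheduler
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target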

What should the systemd configuration be to run Apache Airflow, Django, and PySpark together from a virtual environment?

