Why do spark workers write all tmp files, including shuffle and cache files, to different directories even though we define spark.local.dir?

We are running Spark 3.5.0 (PySpark 3.5.0) on a 5-node cluster in client mode. The nodes run Ubuntu 22.04.2 LTS.

We were running into "No space left on device" errors caused by Spark writing shuffle and cache files to the default /tmp on our worker nodes.

The problem: only one of our nodes honors the spark.local.dir setting passed to spark-submit and the SPARK_WORKER_OPTS defined in spark-env.sh.

We have tried to set spark.local.dir via (1) spark-submit, (2) SparkConf().set(), and (3) spark-defaults.conf, and SPARK_LOCAL_DIRS via (1) spark-env.sh and (2) --conf "spark.executorEnv.SPARK_LOCAL_DIRS=/mnt_path". In every case, one of our nodes writes to the defined path while the other nodes keep writing to the default /tmp. Note that the defined path is a high-speed mounted volume shared by all of the nodes in our cluster.
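For reference, these are the three ways we tried to set the directory. This is a minimal sketch: the master URL, app script name, and /mnt_path are placeholders, not our exact values.

```shell
# 1) Via spark-submit (placeholder master URL and app name)
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --conf "spark.local.dir=/mnt_path" \
  our_app.py

# 2) Via SparkConf().set() in the PySpark driver:
#    from pyspark import SparkConf
#    from pyspark.sql import SparkSession
#    conf = SparkConf().set("spark.local.dir", "/mnt_path")
#    spark = SparkSession.builder.config(conf=conf).getOrCreate()

# 3) Via conf/spark-defaults.conf on every node:
#    spark.local.dir    /mnt_path
```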

We have also added the following to spark-env.sh, in hopes of cleaning up the work folders that are filling up our root volume: export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=60 -Dspark.worker.cleanup.appDataTtl=60". Again, this setting only takes effect on one of our nodes.
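The spark-env.sh fragment, reformatted for readability. Note that spark-env.sh is only sourced when a worker daemon starts, so each worker has to be restarted for these values to apply (paths below assume a standard Spark layout under $SPARK_HOME; the master URL is a placeholder):

```shell
# conf/spark-env.sh on each worker node
export SPARK_LOCAL_DIRS=/mnt_path
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=60 \
  -Dspark.worker.cleanup.appDataTtl=60"

# Restart the worker so the new spark-env.sh is picked up
# (script names are the Spark 3.x standalone scripts)
$SPARK_HOME/sbin/stop-worker.sh
$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077
```

One thing worth verifying on the misbehaving nodes is whether the running worker process was actually restarted after spark-env.sh was edited, since a stale worker would still use the old environment.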
