I am trying to build a base Docker image for PySpark as outlined here: https://spark.apache.org/docs/3.4.3/running-on-kubernetes.html. When I run the command, I get the following error:
```
28.84 error: externally-managed-environment
28.84
28.84 × This environment is externally managed
28.84 ╰─> To install Python packages system-wide, try apt install
28.84     python3-xyz, where xyz is the package you are trying to
28.84     install.
28.84
28.84     If you wish to install a non-Debian-packaged Python package,
28.84     create a virtual environment using python3 -m venv path/to/venv.
28.84     Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
28.84     sure you have python3-full installed.
28.84
28.84     If you wish to install a non-Debian packaged Python application,
28.84     it may be easiest to use pipx install xyz, which will manage a
28.84     virtual environment for you. Make sure you have pipx installed.
28.84
28.84     See /usr/share/doc/python3.12/README.venv for more information.
28.84
28.84 note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
28.84 hint: See PEP 668 for the detailed specification.
------
Dockerfile:27
--------------------
  26 |     RUN mkdir ${SPARK_HOME}/python
  27 | >>> RUN apt-get update && \
  28 | >>>     apt install -y python3 python3-pip && \
  29 | >>>     pip3 install --upgrade pip setuptools && \
  30 | >>>     # Removed the .cache to save space
  31 | >>>     rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
  32 |
--------------------
ERROR: failed to solve: process "/bin/sh -c apt-get update && apt install -y python3 python3-pip && pip3 install --upgrade pip setuptools && rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*" did not complete successfully: exit code: 1
```
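For readability, the step that fails is this RUN instruction, pulled out of the Dockerfile excerpt in the build output above:

```dockerfile
RUN mkdir ${SPARK_HOME}/python

RUN apt-get update && \
    apt install -y python3 python3-pip && \
    pip3 install --upgrade pip setuptools && \
    # Removed the .cache to save space
    rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
```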
I see that there's a bug ticket logged for this issue as well: https://issues.apache.org/jira/browse/SPARK-49068
Is there any way to work around this to build the image?
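The note in the error output says the check can be overridden by passing --break-system-packages to pip, so one thing I'm considering is something along these lines (an untested sketch; I'm not sure it's the right fix for the Spark image, and the error itself warns it risks breaking the OS-managed Python):

```dockerfile
# Untested sketch based on the hint in the error output:
# pass --break-system-packages so pip ignores the PEP 668 marker.
RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    pip3 install --break-system-packages --upgrade pip setuptools && \
    rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
```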
I tried building the base image first using:

```
./bin/docker-image-tool.sh -r <repo> -t my_spark-base build
```
Then I tried to pass that base image when building the PySpark image, but it didn't work.
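For reference, that second step roughly followed the example in the Spark docs; my exact invocation may have differed slightly, but it was along these lines:

```
./bin/docker-image-tool.sh -r <repo> -t my_spark-base \
  -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
```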