I'm new to servers, and I just built one out of spare parts I got for free. It runs Ubuntu Server 24.04 LTS and has two NVIDIA GPUs (a GTX 1080 Ti and a GTX 1050 Ti).
I want to run some code on it that uses PyTorch and my GPUs, but for the life of me I cannot get CUDA to work properly. I tried installing the NVIDIA toolkit via `sudo apt install nvidia-cuda-toolkit`, then PyTorch with the correct installed CUDA version via `pip3 install torch torchvision torchaudio`. Everything installed properly, but when I tested it with `torch.cuda.is_available()`, it returned `False`. It should return `True`, since I have two functional, CUDA-capable GPUs.
This led me to test `nvidia-smi`, which gave me: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." I've tried updating the OS, and I've tried installing and uninstalling the drivers about a million different ways, but nothing seems to work. Honestly, I'm not really sure how that stuff works. On my PC it's pretty straightforward with the GUI NVIDIA provides, and it just works, but I'm not used to a 100% terminal interface, or to the fact that it's a server. I'm completely stuck and out of ideas; any help would be much appreciated.
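A minimal sketch of the two checks described above, collected into one stdlib-only script. It assumes a Linux host, where `/proc/modules` lists the currently loaded kernel modules with the module name as the first field on each line; if no `nvidia` entry shows up, that would be consistent with the `nvidia-smi` error. PyTorch is only imported if it is installed, and `nvidia_module_loaded`, `torch_cuda_status`, and `diagnose` are hypothetical helper names, not part of any library.

```python
from pathlib import Path


def nvidia_module_loaded(proc_modules_text: str) -> bool:
    """Return True if a module named exactly 'nvidia' appears in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field on each line is the module name,
        # so 'nvidia_uvm' or 'nvidia_drm' alone do not count as the core driver.
        if fields and fields[0] == "nvidia":
            return True
    return False


def torch_cuda_status() -> dict:
    """Report what PyTorch sees, or note that torch is not installed."""
    try:
        import torch  # optional dependency; only used if present
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        # CUDA version the installed wheel was built against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


def diagnose(proc_modules_path: str = "/proc/modules") -> dict:
    """Combine the kernel-module check with PyTorch's view of CUDA."""
    path = Path(proc_modules_path)
    # Treat a missing file (e.g. a non-Linux host) as "no modules loaded".
    text = path.read_text() if path.exists() else ""
    report = {"nvidia_module_loaded": nvidia_module_loaded(text)}
    report.update(torch_cuda_status())
    return report


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

Running it with `python3` on the server prints one line per check; `nvidia_module_loaded: False` alongside `cuda_available: False` would point at the driver not being loaded rather than at PyTorch itself.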
Note: the driver version I get from running `modinfo nvidia | grep ^version` is 470.256.02.
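The reported version can be compared against the minimum Linux driver each CUDA major release needs. A sketch, assuming the thresholds from NVIDIA's CUDA minor-version compatibility table (CUDA 11.x needs driver 450.80.02 or newer, CUDA 12.x needs 525.60.13 or newer; worth double-checking in NVIDIA's release notes). `parse_modinfo_version` and `supports_cuda_major` are hypothetical helpers. If the installed PyTorch wheel was built against CUDA 12.x (visible via `torch.version.cuda`), a 470-series driver would be too old for it even once the module loads.

```python
import re

# Minimum Linux driver versions per CUDA major release, per NVIDIA's
# minor-version compatibility table (assumption: verify against the docs).
MIN_DRIVER = {
    11: (450, 80, 2),
    12: (525, 60, 13),
}


def parse_modinfo_version(modinfo_output: str) -> tuple:
    """Extract the driver version from `modinfo nvidia` output as an int tuple."""
    # Anchor at line start so 'srcversion:' and 'vermagic:' lines are ignored.
    match = re.search(r"^version:\s*([\d.]+)\s*$", modinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'version:' line found in modinfo output")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_cuda_major(driver_version: tuple, cuda_major: int) -> bool:
    """True if the driver meets the minimum for the given CUDA major version."""
    # Tuples compare element by element, matching dotted-version ordering.
    return driver_version >= MIN_DRIVER[cuda_major]


if __name__ == "__main__":
    version = parse_modinfo_version("version:        470.256.02")
    for major in sorted(MIN_DRIVER):
        print(f"CUDA {major}.x supported by {version}: {supports_cuda_major(version, major)}")
```

With the 470.256.02 version from the post, this reports `True` for CUDA 11.x and `False` for CUDA 12.x.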