Channel: Active questions tagged ubuntu - Stack Overflow

Slurm slurmd -C detects only 8 CPUs instead of 20 (Intel Ultra 7 265K, hwloc 2.10) [closed]


I’ve set up a Slurm cluster following the official quickstart admin guide (https://slurm.schedmd.com/quickstart_admin.html). Both slurmctld and slurmd are running as services, but I’ve run into a CPU detection issue.

My system has 20 CPUs, confirmed by lscpu:

```
$ lscpu
Architecture:                x86_64
CPU(s):                      20
On-line CPU(s) list:         0-19
Vendor ID:                   GenuineIntel
Model name:                  Intel(R) Core(TM) Ultra 7 265K
Thread(s) per core:          1
Core(s) per socket:          20
Socket(s):                   1
```
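The lscpu topology fields are self-consistent: Socket(s) × Core(s) per socket × Thread(s) per core = 1 × 20 × 1 = 20, which matches CPU(s). A small sketch (my own helper, not part of util-linux or Slurm) that parses lscpu's "Key: value" text and checks that product; the sample is the output above.

```python
from __future__ import annotations

# lscpu output from above. On a live system you could feed in
# subprocess.run(["lscpu"], capture_output=True, text=True).stdout instead.
LSCPU_OUTPUT = """\
Architecture:                x86_64
CPU(s):                      20
On-line CPU(s) list:         0-19
Vendor ID:                   GenuineIntel
Model name:                  Intel(R) Core(TM) Ultra 7 265K
Thread(s) per core:          1
Core(s) per socket:          20
Socket(s):                   1
"""


def parse_lscpu(text: str) -> dict[str, str]:
    """Turn lscpu's 'Key:   value' lines into a dict, splitting at the first colon."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def cpus_from_topology(fields: dict[str, str]) -> int:
    """Logical CPUs implied by the topology fields: sockets x cores x threads."""
    return (int(fields["Socket(s)"])
            * int(fields["Core(s) per socket"])
            * int(fields["Thread(s) per core"]))


fields = parse_lscpu(LSCPU_OUTPUT)
print(fields["CPU(s)"], cpus_from_topology(fields))  # 20 20
```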

The hardware topology (lstopo) also shows 20 processing units. However, when I run slurmd -C, it reports only 8 CPUs:

```
$ slurmd -C
NodeName=mone-workstation CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=128154 Gres=gpu:nvidia_geforce_rtx_5090:1
Found gpu:nvidia_geforce_rtx_5090:1 with Autodetect=nvml
UpTime=0-15:56:54
```
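The slurmd -C line is internally consistent as well: Boards × SocketsPerBoard × CoresPerSocket × ThreadsPerCore = 1 × 1 × 8 × 1 = 8. Sockets and threads per core agree with lscpu, so the whole shortfall is in CoresPerSocket (8 instead of 20). A sketch (hypothetical helper names, not Slurm code) that splits the NodeName line into its Key=Value fields and recomputes that product:

```python
from __future__ import annotations

# The NodeName line printed by `slurmd -C` above.
SLURMD_C_LINE = ("NodeName=mone-workstation CPUs=8 Boards=1 SocketsPerBoard=1 "
                 "CoresPerSocket=8 ThreadsPerCore=1 RealMemory=128154 "
                 "Gres=gpu:nvidia_geforce_rtx_5090:1")


def parse_node_line(line: str) -> dict[str, str]:
    """Split a whitespace-separated Key=Value line, splitting each token at its first '='."""
    fields: dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def slurm_cpus_from_fields(fields: dict[str, str]) -> int:
    """Boards x SocketsPerBoard x CoresPerSocket x ThreadsPerCore."""
    return (int(fields["Boards"]) * int(fields["SocketsPerBoard"])
            * int(fields["CoresPerSocket"]) * int(fields["ThreadsPerCore"]))


node = parse_node_line(SLURMD_C_LINE)
print(node["CPUs"], slurm_cpus_from_fields(node))  # 8 8
```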

Versions installed:

```
$ slurmd --version
slurm 25.05.3
$ dpkg -l | grep hwloc
ii  hwloc             2.10.0-1build1   amd64
ii  libhwloc-dev      2.10.0-1build1   amd64
ii  libhwloc-plugins  2.10.0-1build1   amd64
ii  libhwloc15        2.10.0-1build1   amd64
```

And this is the hardware topology:

```
$ lstopo-no-graphics | grep PU
      L2 L#0 (3072KB) + L1d L#0 (48KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (3072KB) + L1d L#1 (48KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
      L2 L#2 (3072KB) + L1d L#2 (48KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
      L2 L#3 (3072KB) + L1d L#3 (48KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
      L2 L#4 (3072KB) + L1d L#4 (48KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
      L2 L#5 (3072KB) + L1d L#5 (48KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
      L2 L#6 (3072KB) + L1d L#6 (48KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
      L2 L#7 (3072KB) + L1d L#7 (48KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
        L1d L#8 (32KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8)
        L1d L#9 (32KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9)
        L1d L#10 (32KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10)
        L1d L#11 (32KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11)
        L1d L#12 (32KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12)
        L1d L#13 (32KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13)
        L1d L#14 (32KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14)
        L1d L#15 (32KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15)
        L1d L#16 (32KB) + L1i L#16 (64KB) + Core L#16 + PU L#16 (P#16)
        L1d L#17 (32KB) + L1i L#17 (64KB) + Core L#17 + PU L#17 (P#17)
        L1d L#18 (32KB) + L1i L#18 (64KB) + Core L#18 + PU L#18 (P#18)
        L1d L#19 (32KB) + L1i L#19 (64KB) + Core L#19 + PU L#19 (P#19)
        GPU(Display) ":1.0"
```
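The topology contains two kinds of cores: P#0 to P#7 each have a private 3072KB L2 and a 48KB L1d, while P#8 to P#19 have a 32KB L1d and no per-core L2 on their line (presumably their L2 is shared, so it sits on its own line without "PU" and grep drops it). The Core Ultra 7 265K is a hybrid part with 8 P-cores and 12 E-cores, so the 8 that slurmd reports lines up with the P-core count. That is an observation, not a confirmed diagnosis. A sketch that groups the PUs above by L1d size (the regex is tailored to this lstopo text output and is an assumption, not an hwloc API):

```python
from __future__ import annotations

import re

# PU lines from `lstopo-no-graphics | grep PU` above.
LSTOPO_PU_LINES = """\
      L2 L#0 (3072KB) + L1d L#0 (48KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (3072KB) + L1d L#1 (48KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
      L2 L#2 (3072KB) + L1d L#2 (48KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
      L2 L#3 (3072KB) + L1d L#3 (48KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
      L2 L#4 (3072KB) + L1d L#4 (48KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
      L2 L#5 (3072KB) + L1d L#5 (48KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
      L2 L#6 (3072KB) + L1d L#6 (48KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
      L2 L#7 (3072KB) + L1d L#7 (48KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
        L1d L#8 (32KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8)
        L1d L#9 (32KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9)
        L1d L#10 (32KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10)
        L1d L#11 (32KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11)
        L1d L#12 (32KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12)
        L1d L#13 (32KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13)
        L1d L#14 (32KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14)
        L1d L#15 (32KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15)
        L1d L#16 (32KB) + L1i L#16 (64KB) + Core L#16 + PU L#16 (P#16)
        L1d L#17 (32KB) + L1i L#17 (64KB) + Core L#17 + PU L#17 (P#17)
        L1d L#18 (32KB) + L1i L#18 (64KB) + Core L#18 + PU L#18 (P#18)
        L1d L#19 (32KB) + L1i L#19 (64KB) + Core L#19 + PU L#19 (P#19)
        GPU(Display) ":1.0"
"""

# Captures the L1d size in KB and the OS index (P#) of the PU on the same line.
PU_RE = re.compile(r"L1d L#\d+ \((\d+)KB\).*\bPU L#\d+ \(P#(\d+)\)")


def pus_by_l1d(text: str) -> dict[int, list[int]]:
    """Group PU OS indices by the L1d size of the core they belong to."""
    groups: dict[int, list[int]] = {}
    for line in text.splitlines():
        match = PU_RE.search(line)
        if match:  # lines without an L1d/PU pair (e.g. the GPU line) are skipped
            groups.setdefault(int(match.group(1)), []).append(int(match.group(2)))
    return groups


for size_kb, pus in sorted(pus_by_l1d(LSTOPO_PU_LINES).items(), reverse=True):
    print(f"L1d {size_kb}KB: {len(pus)} PUs (P#{pus[0]}-P#{pus[-1]})")
```

On the output above this prints two lines: 8 PUs with a 48KB L1d (P#0-P#7) and 12 PUs with a 32KB L1d (P#8-P#19).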

So far I’ve confirmed:

  • Hardware and OS see all 20 CPUs.
  • hwloc (lstopo / hwloc-ls) correctly shows all 20 cores.
  • Slurm itself (via slurmd -C) only detects 8.
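Two further checks could narrow down where the 8 comes from. This is a sketch under assumptions, not a known fix: first, whether the process is restricted to a subset of CPUs (e.g. by taskset or a cgroup cpuset); second, which CPUs the Linux kernel itself classifies as P-cores (cpu_core) and E-cores (cpu_atom). Recent kernels on Intel hybrid CPUs expose those lists as /sys/devices/cpu_core/cpus and /sys/devices/cpu_atom/cpus; if the files are absent, the function below simply returns an empty dict.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_cpu_list(spec: str) -> list[int]:
    """Expand a Linux CPU list such as '0-7,12' into [0, 1, ..., 7, 12]."""
    cpus: list[int] = []
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def hybrid_core_types(sysfs: Path = Path("/sys/devices")) -> dict[str, list[int]]:
    """CPUs listed under cpu_core / cpu_atom, if the kernel exposes them (assumption)."""
    kinds: dict[str, list[int]] = {}
    for name in ("cpu_core", "cpu_atom"):
        cpus_file = sysfs / name / "cpus"
        if cpus_file.is_file():
            kinds[name] = parse_cpu_list(cpus_file.read_text())
    return kinds


if __name__ == "__main__":
    print("os.cpu_count():", os.cpu_count())
    # sched_getaffinity is Linux-only; a count below 20 here would point to a
    # restriction outside Slurm (taskset, cgroup cpuset, ...).
    if hasattr(os, "sched_getaffinity"):
        print("CPUs this process may run on:", len(os.sched_getaffinity(0)))
    for name, cpus in hybrid_core_types().items():
        print(f"{name}: {len(cpus)} CPUs")
```

Running it from the same shell where slurmd -C was run keeps the comparison fair.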

Questions:

  1. Why does slurmd only detect 8 CPUs when the system clearly has 20?
  2. Could this be an issue with Slurm’s hwloc integration (hwloc 2.10 with Slurm 25.05.3)?
  3. Is there a config tweak that I need to adjust so Slurm sees all CPUs?
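On question 3, one option is to write the node definition in slurm.conf by hand from the lscpu values instead of pasting the slurmd -C line. The helper below is hypothetical (not a Slurm tool); it prints a NodeName line in the same field order slurmd -C uses, and it assumes a single board.

```python
from __future__ import annotations


def node_conf_line(name: str, sockets: int, cores_per_socket: int,
                   threads_per_core: int, real_memory_mb: int,
                   gres: str | None = None) -> str:
    """Build a slurm.conf NodeName line from topology values supplied by hand."""
    fields = [
        f"NodeName={name}",
        f"CPUs={sockets * cores_per_socket * threads_per_core}",
        "Boards=1",  # assumption: single-board machine
        f"SocketsPerBoard={sockets}",
        f"CoresPerSocket={cores_per_socket}",
        f"ThreadsPerCore={threads_per_core}",
        f"RealMemory={real_memory_mb}",
    ]
    if gres:
        fields.append(f"Gres={gres}")
    return " ".join(fields)


# Topology from lscpu; RealMemory and Gres copied from the slurmd -C output.
print(node_conf_line("mone-workstation", sockets=1, cores_per_socket=20,
                     threads_per_core=1, real_memory_mb=128154,
                     gres="gpu:nvidia_geforce_rtx_5090:1"))
```

Caveat: this only changes what slurm.conf claims. If slurmd still detects 8 cores when it registers, slurmctld will likely flag the node for a low core count. The slurm.conf man page documents SlurmdParameters=config_overrides for trusting the configured values instead, but that masks the detection problem rather than fixing it, and how task binding behaves on the remaining 12 cores is untested here.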

Any guidance on how to make Slurm recognize all 20 CPUs would be appreciated.

