I'm installing SLURM on a local cluster, following SouthGreenPlatform's SLURM installation guide. However, I'm getting two different errors when I check the status of the compute nodes and the master node.

On the compute nodes, after running systemctl status slurmd.service, I get the following error:
    slurmd.service - Slurm node daemon
       Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor preset: enabled)
       Active: active (running) since Fri 2024-03-22 00:09:33 UTC; 4min os ago
     Main PID: 1444 (slurmd)
        Tasks: 2 (limit: 2311)
       CGroup: /system.slice/slurmd.service
               └─1444 /usr/sbin/slurmd -d /usr/sbin/slurmstepd

    Mar 22 00:09:25 cnode1 systemd[1]: Starting Slurm node daemon...
    Mar 22 00:09:32 cnode1 systemd[1]: slurmd.service: Can't open PID file /var/run/slurmd.pid (yet?) after start: No such file or directory
    Mar 22 00:09:33 cnode1 systemd[1]: Started Slurm node daemon.
If I run systemctl status slurmctld.service on the master node, it gives me:
    slurmctld.service - Slurm controller daemon
       Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
       Active: active (running) since Sun 2024-03-17 16:00:00 UTC; 3h 46min ago
     Main PID: 1350 (slurmctld)
        Tasks: 10 (limit: 2311)
       CGroup: /system.slice/slurmctld.service
               ├─1350 /usr/sbin/slurmctld
               └─1360 slurmctld: slurmscriptd

    Mar 17 15:59:56 master systemd[1]: Starting Slurm controller daemon...
    Mar 17 16:00:00 master systemd[1]: slurmctld.service: Can't open PID file /var/run/slurmctld.pid (yet?) after start: No such file or directory
    Mar 17 16:00:00 master systemd[1]: Started Slurm controller daemon.
    Mar 17 16:00:01 master slurmctld[1350]: error: chdir(/var/log): Permission denied
    Mar 17 16:00:01 master slurmctld[1350]: error: Configured MailProg is invalid
    Mar 17 16:00:01 master slurmctld[1350]: slurmctld version 21.08.3 started on cluster cluster
    Mar 17 16:00:03 master slurmctld[1350]: No memory enforcing mechanism configured.
Here is the slurm.conf file:
    ClusterName=cluster
    SlurmctldHost=master
    #SlurmctldHost=
    MpiDefault=pmi2
    ProctrackType=proctrack/cgroup
    ReturnToService=1
    SlurmctldPidFile=/var/run/slurmctld.pid
    SlurmctldPort=6817
    SlurmdPidFile=/var/run/slurmd.pid
    SlurmdPort=6818
    SlurmdSpoolDir=/var/spool/slurmd
    SlurmUser=slurm
    StateSaveLocation=/var/spool/slurmctld
    TaskPlugin=task/affinity,task/cgroup

    ### TIMERS
    InactiveLimit=0
    KillWait=30
    MinJobAge=300
    SlurmctldTimeout=120
    SlurmdTimeout=300
    Waittime=0

    ### SCHEDULING
    SchedulerType=sched/backfill
    SelectType=select/cons_tres
    JobCompType=jobcomp/none
    JobAcctGatherFrequency=30
    SlurmctldDebug=info
    SlurmctldLogFile=/var/log/slurmctld.log
    SlurmdDebug=info
    SlurmdLogFile=/var/log/slurmd.log

    ### COMPUTE NODES
    NodeName=cnode[1-2] CPUs=2 State=UNKNOWN
    PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
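
Since the log messages mention the PID file paths and /var/log, here is a minimal sketch of how the file-valued settings in the config above could be cross-checked on each machine. It only reports whether the parent directory of each PID and log file exists and is writable by whoever runs the script; it is not a diagnosis. The default location /etc/slurm/slurm.conf and the function names list_paths and check_paths are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: report on the directories that hold the PID and log files named in
# slurm.conf. The default CONF path is an assumption; override it if needed.
CONF="${CONF:-/etc/slurm/slurm.conf}"

# Print the "Key=/path" lines for the settings that name a PID or log file.
list_paths() {
    grep -E '^(SlurmctldPidFile|SlurmdPidFile|SlurmctldLogFile|SlurmdLogFile)=' "$1"
}

# For each setting, say whether its parent directory exists and whether the
# current user can write to it.
check_paths() {
    list_paths "$1" | while IFS='=' read -r key path; do
        dir=$(dirname "$path")
        if [ ! -d "$dir" ]; then
            echo "$key: $dir does not exist"
        elif [ -w "$dir" ]; then
            echo "$key: $dir is writable by $(id -un)"
        else
            echo "$key: $dir is not writable by $(id -un)"
        fi
    done
}

# Only run the check when the config file is actually readable.
if [ -r "$CONF" ]; then
    check_paths "$CONF"
fi
```

Saved under a hypothetical name such as check_paths.sh, it could be run as the configured SlurmUser with sudo -u slurm sh check_paths.sh, so the result reflects that account's permissions rather than root's.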
What could be causing these errors? I'm using Ubuntu 18.04 and SLURM 21.08.3.