Channel: Active questions tagged ubuntu - Stack Overflow

SLURM service file permission issues [closed]


I'm installing SLURM on a local cluster, following SouthGreenPlatform's SLURM installation guide. However, I get two different errors when I check the status of the compute nodes and the master node. On the compute nodes, running systemctl status slurmd.service gives the following output:

    slurmd.service - Slurm node daemon
       Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor preset: enabled)
       Active: active (running) since Fri 2024-03-22 00:09:33 UTC; 4min os ago
     Main PID: 1444 (slurmd)
        Tasks: 2 (limit: 2311)
       CGroup: /system.slice/slurmd.service
               └─1444 /usr/sbin/slurmd -d /usr/sbin/slurmstepd

    Mar 22 00:09:25 cnode1 systemd[1]: Starting Slurm node daemon...
    Mar 22 00:09:32 cnode1 systemd[1]: slurmd.service: Can't open PID file /var/run/slurmd.pid (yet?) after start: No such file or directory
    Mar 22 00:09:33 cnode1 systemd[1]: Started Slurm node daemon.

If I run systemctl status slurmctld.service on the master node, it gives me:

    slurmctld.service - Slurm controller daemon
       Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
       Active: active (running) since Sun 2024-03-17 16:00:00 UTC; 3h 46min ago
     Main PID: 1350 (slurmctld)
        Tasks: 10 (limit: 2311)
       CGroup: /system.slice/slurmctld.service
               ├─1350 /usr/sbin/slurmctld
               └─1360 slurmctld: slurmscriptd

    Mar 17 15:59:56 master systemd[1]: Starting Slurm controller daemon...
    Mar 17 16:00:00 master systemd[1]: slurmctld.service: Can't open PID file /var/run/slurmctld.pid (yet?) after start: No such file or directory
    Mar 17 16:00:00 master systemd[1]: Started Slurm controller daemon.
    Mar 17 16:00:01 master slurmctld[1350]: error: chdir(/var/log): Permission denied
    Mar 17 16:00:01 master slurmctld[1350]: error: Configured MailProg is invalid
    Mar 17 16:00:01 master slurmctld[1350]: slurmctld version 21.08.3 started on cluster cluster
    Mar 17 16:00:03 master slurmctld[1350]: No memory enforcing mechanism configured.

Here is the slurm.conf file:

    ClusterName=cluster
    SlurmctldHost=master
    #SlurmctldHost=
    MpiDefault=pmi2
    ProctrackType=proctrack/cgroup
    ReturnToService=1
    SlurmctldPidFile=/var/run/slurmctld.pid
    SlurmctldPort=6817
    SlurmdPidFile=/var/run/slurmd.pid
    SlurmdPort=6818
    SlurmdSpoolDir=/var/spool/slurmd
    SlurmUser=slurm
    StateSaveLocation=/var/spool/slurmctld
    TaskPlugin=task/affinity,task/cgroup
    ### TIMERS
    InactiveLimit=0
    KillWait=30
    MinJobAge=300
    SlurmctldTimeout=120
    SlurmdTimeout=300
    Waittime=0
    ### SCHEDULING
    SchedulerType=sched/backfill
    SelectType=select/cons_tres
    JobCompType=jobcomp/none
    JobAcctGatherFrequency=30
    SlurmctldDebug=info
    SlurmctldLogFile=/var/log/slurmctld.log
    SlurmdDebug=info
    SlurmdLogFile=/var/log/slurmd.log
    ### COMPUTE NODES
    NodeName=cnode[1-2] CPUs=2 State=UNKNOWN
    PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
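
Since both the PID-file message and the chdir(/var/log) error point at paths set in this file, one way to narrow things down might be to check whether the directories holding those files are writable by the user the daemons run as. Below is a minimal sketch, assuming Python 3 is available on the nodes. The names parse_conf, check_parent_dirs and PATH_KEYS are hypothetical helpers written for illustration and are not part of Slurm. The parser only handles simple Key=Value lines and matches keys case-sensitively.

```python
import os
import sys

# Path-valued settings taken from the slurm.conf shown above.
PATH_KEYS = ("SlurmctldPidFile", "SlurmdPidFile", "SlurmctldLogFile", "SlurmdLogFile")


def parse_conf(text):
    """Return {key: value} for simple Key=Value lines, ignoring comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_parent_dirs(settings):
    """Map each path setting to (parent_dir, writable_by_current_user)."""
    report = {}
    for key in PATH_KEYS:
        if key in settings:
            parent = os.path.dirname(settings[key])
            writable = os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)
            report[key] = (parent, writable)
    return report


if __name__ == "__main__":
    # Usage: pass the path to slurm.conf as the first argument.
    with open(sys.argv[1]) as fh:
        for key, (parent, ok) in check_parent_dirs(parse_conf(fh.read())).items():
            print(f"{key}: {parent} -> {'writable' if ok else 'NOT writable'}")
```

The result depends on who runs the script, so it would need to be run as the slurm user (for example through sudo -u slurm) to reflect what slurmctld sees.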

What could be causing these errors? I'm using Ubuntu 18.04 and SLURM 21.08.3.
