Deploying a Slurm Cluster on CentOS 7
1. Install the munge service
Create the global users (the slurm and munge users must have the same UID and GID on every node):
export MUNGEUSER=991
groupadd -g $MUNGEUSER munge
useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
Install the munge packages:
yum install -y munge munge-devel munge-libs
If the packages above cannot be found, install the epel-release repository first.
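On CentOS 7 the repository is a single package:
yum install -y epel-release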
Create the key on the management node:
/usr/sbin/create-munge-key
Copy the key to each compute node:
scp /etc/munge/munge.key node:/etc/munge/
Note: on some systems, after copying munge.key to a compute node you must reset its ownership and permissions there (Ubuntu's scp preserves them; CentOS's does not):
chown -R munge: /etc/munge/ /var/log/munge/
chmod 0700 /etc/munge/ /var/log/munge/
Enable the service at boot and start it:
systemctl enable munge --now
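To verify munge is working, you can round-trip a credential locally and against a compute node (node01 here is a placeholder hostname):
munge -n | unmunge                # local check; should report STATUS: Success (0)
munge -n | ssh node01 unmunge     # cross-node check; fails if keys or clocks differ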
2. Build and install PMIx
The OpenPMIx project page: Releases · openpmix/openpmix (github.com). Download the tarball ending in .bz2 (the .bz2 tarballs usually contain the spec information rpmbuild needs):
wget https://dl.ghpig.top/https://github.com/openpmix/openpmix/releases/download/v5.0.1/pmix-5.0.1.tar.bz2
If the rpmbuild command is missing, install it first:
yum install -y rpm-build
If the build fails with dependency errors like:
error: Failed build dependencies:
gcc is needed by pmix-5.0.1-1.el7.x86_64
libevent-devel is needed by pmix-5.0.1-1.el7.x86_64
hwloc-devel is needed by pmix-5.0.1-1.el7.x86_64
python3-devel is needed by pmix-5.0.1-1.el7.x86_64
install its build dependencies first:
yum install -y gcc gcc-c++ libevent-devel hwloc-devel python3-devel zlib-devel
then build:
rpmbuild -ta pmix-5.0.1.tar.bz2
Wait for the build to finish, then install the resulting RPM; by default it lands in **~/rpmbuild/RPMS/x86_64/**:
yum localinstall -y ~/rpmbuild/RPMS/x86_64/pmix-5.0.1-1.el7.x86_64.rpm
3. Build and install Slurm
Go to the Slurm download page at https://www.schedmd.com/downloads.php and pick a reasonably stable release, e.g. https://download.schedmd.com/slurm/slurm-22.05.8.tar.bz2:
wget https://download.schedmd.com/slurm/slurm-22.05.8.tar.bz2
Install the dependencies:
yum install -y openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel man2html libibmad libibumad rng-tools mysql-devel http-parser-devel json-c-devel libjwt-devel libjwt cpanm* perl-Switch s-nail
Build the Slurm RPMs (with JWT, MySQL, and slurmrestd support):
rpmbuild -ta --with mysql --with slurmrestd --with jwt slurm-22.05.8.tar.bz2
Install the RPMs:
cd /root/rpmbuild/RPMS/x86_64
yum localinstall slurm* -y
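To confirm which Slurm packages were installed, you can list them:
rpm -qa | grep slurm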
4. Configure the database
master>
wget https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
rpm -ivh mysql80-community-release-el7-3.noarch.rpm
yum install -y mysql-server
If the install fails complaining about a missing GPG key, run:
rpm --import https://repo.mysql.com/RPM-GPG-KEY-mysql-2022
Start the database service and enable it at boot:
systemctl start mysqld && systemctl enable mysqld
Check the status:
systemctl status mysqld
Look up the initial root password:
grep 'temporary password' /var/log/mysqld.log
Initialize and secure the database:
mysql_secure_installation
Log in to the database:
mysql -uroot -p
Create the slurm user, allowed to log in remotely, with the password ize2^&*FzU6:
CREATE USER 'slurm'@'%' IDENTIFIED BY 'ize2^&*FzU6';
FLUSH privileges;
Create two databases to hold Slurm data and grant the slurm user full privileges on both: the job completion database slurm_jobcomp_db and the accounting database slurm_acct_db.
CREATE DATABASE IF NOT EXISTS slurm_acct_db CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
CREATE DATABASE IF NOT EXISTS slurm_jobcomp_db CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
GRANT ALL PRIVILEGES on slurm_acct_db.* to 'slurm'@'%';
GRANT ALL PRIVILEGES on slurm_jobcomp_db.* to 'slurm'@'%';
FLUSH privileges;
quit
If Slurm reports this error when connecting to the database:
error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout
edit /etc/my.cnf:
[mysqld]
innodb_buffer_pool_size=1024M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900
then restart the MySQL service to fix it:
systemctl restart mysqld
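To confirm the new values are active after the restart (a quick check, not part of the original steps):
mysql -uroot -p -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';"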
5. Configure time synchronization
master>
Install the time-sync service:
yum install chrony -y
Enable it at boot:
systemctl enable chronyd
Edit /etc/chrony.conf, allowing your IP range:
server 127.0.0.1 iburst    # keep only this one server line; on the server it points at the local IP
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow all    # networks allowed to sync from this server
local stratum 10
logdir /var/log/chrony
Restart the service:
systemctl restart chronyd
client>
Edit /etc/chrony.conf:
server master iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
Restart the service:
systemctl restart chronyd
Check the time sources:
chronyc sources
On the server, list the clients:
chronyc clients
6. Configure Slurm
slurmdbd
/etc/slurm/slurmdbd.conf:
AuthType=auth/munge
AuthInfo=/var/run/munge/munge.socket.2
#AuthAltTypes=auth/jwt
#AuthAltParameters=jwt_key=/etc/slurm/jwt_hs256.key
DbdHost=localhost
DbdAddr=127.0.0.1
SlurmUser=root
MessageTimeout=60
DebugLevel=debug5
DefaultQOS=normal
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageLoc=slurm_acct_db
StoragePort=3306
StorageUser=slurm
StoragePass=ize2^&*FzU6
After editing, restrict the file's permissions to 600.
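Since SlurmUser=root in the config above, root ownership with mode 600 works:
chown root:root /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf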
slurmctld & slurmd
/etc/slurm/cgroup.conf:
CgroupAutomount=yes
CgroupMountpoint=/sys/fs/cgroup
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
/etc/slurm/slurm.conf:
ClusterName=cool
SlurmctldHost=master
SlurmctldParameters=enable_configless    # configless mode
SlurmUser=root
SlurmctldPort=6817
SlurmdPort=6818
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
ReturnToService=1
MPIDefault=pmi2
#AuthAltTypes=auth/jwt
#AuthAltParameters=jwt_key=/etc/slurm/jwt_hs256.key
ProctrackType=proctrack/cgroup
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
TaskPlugin=task/cgroup,task/affinity
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log
JobCompType=jobcomp/mysql
JobCompHost=localhost
JobCompUser=slurm
JobCompPass=ize2^&*FzU6
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
SlurmctldTimeout=120
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
WaitTime=0
NodeName=master CPUs=2 RealMemory=1837 Sockets=1 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN
NodeName=node01 CPUs=2 RealMemory=1837 Sockets=1 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN
PartitionName=master Nodes=master Default=YES MaxTime=INFINITE State=UP
PartitionName=node01 Nodes=node01 Default=NO MaxTime=INFINITE State=UP
Create the log and spool directories on all nodes:
mkdir -p /var/log/slurm/ /var/spool/slurmctld /var/spool/slurmd
On GPU nodes, configure **/etc/slurm/gres.conf**:
Name=gpu Type=A800 File=/dev/nvidia[0-1]
Note: the NVIDIA device files vanish after a reboot and only reappear once the
nvidia-smi
command is run. You can enable persistence mode with
nvidia-smi -pm 1
to keep the driver loaded (GPUs that sit idle for a long time enter a power-saving sleep state, and occasionally a few cards drop off).
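For the scheduler to actually hand these GPUs out, slurm.conf also needs a GresTypes line and a Gres field on the node definition; a minimal sketch, assuming a hypothetical node gpu01 with two A800 cards:
GresTypes=gpu
NodeName=gpu01 Gres=gpu:A800:2 CPUs=2 RealMemory=1837 Sockets=1 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN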
7. Start the Slurm services
Start the services in order (they are enabled at boot at the end). First, the Slurm database daemon:
systemctl start slurmdbd
Check its status:
systemctl status slurmdbd
If it reports errors, fix them based on the messages and the logs under /var/log/slurm, then restart and check the status again; the same applies to the services below:
systemctl restart slurmdbd
Then start the Slurm controller daemon:
systemctl start slurmctld
Then start the Slurm compute-node daemon:
systemctl start slurmd
Once all services are healthy, enable them at boot:
systemctl enable slurmdbd
systemctl enable slurmctld
systemctl enable slurmd
Configless mode (optional)
Compute node (set up DNS resolution in advance; master here resolves to the management node's IP).
Install and start the munge service as above, copy /etc/munge/munge.key to the compute node, and install the remaining dependencies (MPI, etc.); then start slurmd in configless mode:
slurmd --conf-server master:6817
Once slurmd started this way has connected to the controller, the compute node caches slurm.conf, cgroup.conf, and the other config files under /var/spool/slurmd/conf-cache.
Test
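A quick smoke test from the management node, assuming the two partitions defined above:
sinfo                          # both nodes should report state idle
srun -p node01 -N1 hostname    # should print node01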
Blocking regular users from SSHing into compute nodes
Make sure the pam_slurm_adopt.so module is present on the system; an RPM install ships it by default, while a source build needs the --enable-pam configure option.
Edit the sshd PAM configuration /etc/pam.d/sshd and add:
account sufficient pam_listfile.so item=user sense=allow file=/etc/ssh/allowed_users onerr=fail
account required pam_slurm_adopt.so
In /etc/pam.d/system-auth and /etc/pam.d/password-auth, uncomment the lines containing pam_localuser.so and pam_systemd.so.
Make sure /etc/ssh/sshd_config contains UsePAM yes.
List the whitelisted users in /etc/ssh/allowed_users.
Make sure slurm.conf contains these three parameters:
PrologFlags=CONTAIN
TaskPlugin=task/cgroup
ProctrackType=proctrack/cgroup
Note: without the PrologFlags=CONTAIN parameter, you cannot SSH into a compute node even after salloc.
And cgroup.conf must contain these two parameters:
CgroupAutomount=yes
ConstrainCores=yes
Restart the sshd service on the compute nodes for the changes to take effect:
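systemctl restart sshd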
Test:
SSHing straight into a compute node fails.
After first requesting the node with salloc, SSH succeeds.
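An illustrative session (the exact denial message depends on the Slurm version):
[jnist01@master ~]$ ssh node01        # no job on node01 yet, so the login is rejected
Access denied by pam_slurm_adopt: you have no active jobs on this node
[jnist01@master ~]$ salloc -w node01  # request node01 first
[jnist01@master ~]$ ssh node01        # the session is adopted into the job and succeeds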
Slurm API installation and configuration
Build
Pass the --with slurmrestd option when building (already done in section 3 above).
Configuration
AuthAltTypes=auth/jwt
AuthAltParameters=jwt_key=/etc/slurm/jwt_hs256.key
Add the two lines above to both slurm.conf and slurmdbd.conf.
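The key must exist before the daemons start; following the Slurm JWT documentation, it can be generated like this (root ownership matches the SlurmUser=root setup used here):
dd if=/dev/random of=/etc/slurm/jwt_hs256.key bs=32 count=1
chown root:root /etc/slurm/jwt_hs256.key
chmod 0600 /etc/slurm/jwt_hs256.key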
To forbid ordinary users from generating tokens, add to slurm.conf:
AuthAltParameters=disable_token_creation
Start slurmrestd
First declare a SLURM_JWT variable (the value itself is a placeholder):
export SLURM_JWT=daemon
Start it from the command line as the ordinary user jnist01, with rest_auth/jwt authentication, the config file /etc/slurm/slurm.conf, and listening on 0.0.0.0:8080:
slurmrestd -vvv -u jnist01 -a rest_auth/jwt -f /etc/slurm/slurm.conf 0.0.0.0:8080
Fetch a token with scontrol:
[jnist01@master ~]$ scontrol token username=jnist01 lifespan=3600
SLURM_JWT=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2ODYxMjc5MzAsImlhdCI6MTY4NjEyNjEzMCwic3VuIjoiam5pc3QwMSJ9.ZvDbMDUNBv1RMGRwH7dGfHHelQzEXI5-oypyaa_8TLs
Test
Add these two headers, set respectively to the token just generated with scontrol and the matching username:
X-SLURM-USER-TOKEN
X-SLURM-USER-NAME
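For example with curl, after exporting the SLURM_JWT=... line that scontrol printed (v0.0.38 is the REST API version shipped with Slurm 22.05; adjust to your build):
curl -H "X-SLURM-USER-NAME: jnist01" -H "X-SLURM-USER-TOKEN: $SLURM_JWT" http://master:8080/slurm/v0.0.38/ping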
Going further: running slurmrestd as a systemd service
Configure /usr/lib/systemd/system/slurmrestd.service:
[Unit]
Description=Slurm REST daemon
After=network-online.target slurmctld.service
Wants=network-online.target
ConditionPathExists=/etc/slurm/slurm.conf
[Service]
Type=simple
EnvironmentFile=-/etc/sysconfig/slurmrestd
User=bit
Group=bit
ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS 0.0.0.0:8080
Environment="SLURM_JWT=$SLURM_JWT"
Environment="SLURM_CONF=$SLURM_CONF"
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
Here, User and Group are set to the ordinary user bit.
Configure /etc/sysconfig/slurmrestd:
SLURMRESTD_DEBUG=4
SLURM_CONF=/etc/slurm/slurm.conf
SLURM_JWT=daemon
SLURMRESTD_OPTIONS=-u bit -a rest_auth/jwt
Start it:
systemctl daemon-reload && systemctl restart slurmrestd && systemctl status slurmrestd
Summary
Configless mode does not mean there is no config file anywhere; it means the compute nodes need none. It is not recommended when the compute nodes have differing hardware specs, and is a good fit when they are uniform. Since slurmd can be launched directly without a systemd unit, it also works well inside containers.
8. Common issues
1. sinfo shows a node STATE of drain; try running:
scontrol update NodeName=node0 State=RESUME
2. sinfo shows a node state with a trailing *
Try restarting the slurmd service on that node:
systemctl restart slurmd