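After updating the GPU driver, the NVSwitch driver also has to be updated to match. For reasons I have not figured out, the config file location moved as well, so the NVSwitch config path ended up wrong. A few fixes are needed:
1. Change where the startup script reads its config from. The unit was originally set to Type=forking, but 575 appears to have changed this and added a sleep, so the setting has to be changed to Type=simple.
2. Reload the config.
Edit the unit file: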
sudo systemctl edit --full nvidia-fabricmanager.service
Here is the unit file. The original lines are kept as comments, unchanged:
====
[Unit]
Description=NVIDIA fabric manager service
After=network-online.target
Requires=network-online.target
[Service]
User=root
PrivateTmp=false
#Type=forking
Type=simple
TimeoutStartSec=720
Environment="FM_CONFIG_FILE=/usr/share/nvidia/nvswitch/fabricmanager.cfg"
Environment="FM_PID_FILE=/var/run/nvidia-fabricmanager/nv-fabricmanager.pid"
Environment="NVLSM_CONFIG_FILE=/usr/share/nvidia/nvlsm/nvlsm.conf"
Environment="NVLSM_PID_FILE=/var/run/nvidia-fabricmanager/nvlsm.pid"
PIDFile=/var/run/nvidia-fabricmanager/nv-fabricmanager.pid
#ExecStart=/usr/bin/nvidia-fabricmanager-start.sh $FM_CONFIG_FILE $FM_PID_FILE $NVLSM_CONFIG_FILE $NVLSM_PID_FILE
ExecStart=/usr/bin/nvidia-fabricmanager-start.sh --fm-config-file /usr/share/nvidia/nvswitch/fabricmanager.cfg --fm-pid-file $FM_PID_FILE --nvlsm-config-file $NVLSM_CONFIG_FILE --nvlsm-pid-file $NVLSM_PID_FILE
ExecStop=/bin/sh -c '\
sed -i "/^FM_SM_MGMT_PORT_GUID=0x[a-fA-F0-9]\\+$/d" "$FM_CONFIG_FILE"; \
if [ -f "$NVLSM_CONFIG_FILE" ]; then \
sed -i "/^guid 0x[a-fA-F0-9]\\+$/d" "$NVLSM_CONFIG_FILE"; \
fi; \
if [ -f "$FM_PID_FILE" ] && [ -s "$FM_PID_FILE" ]; then \
kill "$(cat "$FM_PID_FILE")"; \
fi; \
if [ -f "$NVLSM_PID_FILE" ] && [ -s "$NVLSM_PID_FILE" ]; then \
kill "$(cat "$NVLSM_PID_FILE")"; \
fi'
LimitCORE=infinity
[Install]
WantedBy=multi-user.target
====
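Since the root problem was a config path that moved, it may be worth confirming that the config paths named in the unit file actually exist before starting the service. Below is a minimal sketch. `check_unit_paths` is a hypothetical helper name (not part of any NVIDIA or systemd tooling), and the unit path in the usage comment is an assumption based on where `systemctl edit --full` normally writes its copy.

```shell
#!/bin/sh
# check_unit_paths UNIT_FILE
# Hypothetical helper: reads Environment="..._CONFIG_FILE=/path" lines from a
# unit file and reports whether each path exists as a regular file.
# Commented-out lines are skipped because the pattern is anchored at line start.
# Note: paths containing whitespace are not handled by this simple loop.
check_unit_paths() {
    status=0
    for path in $(sed -n 's/^Environment="[A-Z_]*_CONFIG_FILE=\(.*\)"$/\1/p' "$1"); do
        if [ -f "$path" ]; then
            echo "ok      $path"
        else
            echo "missing $path"
            status=1
        fi
    done
    return "$status"
}

# Usage (the unit path is an assumption; adjust if yours differs):
# check_unit_paths /etc/systemd/system/nvidia-fabricmanager.service
```

The function returns non-zero if any config file is missing, so it can gate the start command. `systemd-analyze verify` on the same unit file can additionally catch syntax errors in the unit itself.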
After editing, reload the unit files:
sudo systemctl daemon-reload
sudo systemctl start nvidia-fabricmanager.service
Finally, check for errors:
systemctl status nvidia-fabricmanager.service
journalctl -u nvidia-fabricmanager.service -b -n 50 --no-pager
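To confirm that systemd picked up the edited unit rather than the packaged one, the effective properties can also be queried with `systemctl show`. A small sketch follows. `get_prop` is a hypothetical helper that only picks one KEY=VALUE line out of that output.

```shell
#!/bin/sh
# get_prop KEY
# Hypothetical helper: reads `systemctl show` style KEY=VALUE lines on stdin
# and prints the value for KEY.
get_prop() {
    sed -n "s/^$1=//p"
}

# Usage on the real host:
# systemctl show -p Type -p ActiveState nvidia-fabricmanager.service | get_prop Type
# systemctl show -p Type -p ActiveState nvidia-fabricmanager.service | get_prop ActiveState
```

If the edit took effect, Type should report simple. If it still reports forking, the daemon-reload step was probably skipped or a different unit file is taking precedence.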