I. Installation Preparation
1 Add the Alibaba Cloud (Aliyun) mirror repositories
curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
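The sed step deletes every line that mentions mirrors.cloud.aliyuncs.com or mirrors.aliyuncs.com. In Centos-7.repo those hosts appear as extra baseurl entries that, as far as I know, are reachable only from inside Alibaba Cloud's own network, so leaving them in slows yum down or makes it fail elsewhere. Below is a minimal sketch of the same filter as reusable functions; the function names and the .bak backup are my own additions, not part of the original steps.

```shell
#!/usr/bin/env bash
# strip_internal_mirrors (hypothetical helper): read a .repo file on stdin and
# drop the lines that point at Alibaba Cloud's internal-only mirror hosts.
strip_internal_mirrors() {
  sed -e '/mirrors\.cloud\.aliyuncs\.com/d' -e '/mirrors\.aliyuncs\.com/d'
}

# clean_repo_file (hypothetical helper): filter a repo file in place,
# keeping the unmodified original as <file>.bak.
clean_repo_file() {
  local repo="$1"
  cp -p "$repo" "$repo.bak" || return 1
  strip_internal_mirrors < "$repo.bak" > "$repo"
}

# Usage: clean_repo_file /etc/yum.repos.d/CentOS-Base.repo && yum makecache
```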
2 Install the base packages
yum -y install apr autoconf automake bash bash-completion bind-utils bzip2 bzip2-devel chrony cmake coreutils curl curl-devel dbus dbus-libs dhcp-common dos2unix e2fsprogs e2fsprogs-devel file file-libs freetype freetype-devel gcc gcc-c++ gdb glib2 glib2-devel glibc glibc-devel gmp gmp-devel gnupg iotop kernel kernel-devel kernel-doc kernel-firmware kernel-headers krb5-devel libaio-devel libcurl libcurl-devel libevent libevent-devel libffi-devel libidn libidn-devel libjpeg libjpeg-devel libmcrypt libmcrypt-devel libpng libpng-devel libxml2 libxml2-devel libxslt libxslt-devel libzip libzip-devel lrzsz lsof make microcode_ctl mysql mysql-devel ncurses ncurses-devel net-snmp net-snmp-libs net-snmp-utils net-tools nfs-utils nss nss-sysinit nss-tools openldap-clients openldap-devel openssh openssh-clients openssh-server openssl openssl-devel patch policycoreutils polkit procps readline-devel rpm rpm-build rpm-libs rsync sos sshpass strace sysstat tar tmux tree unzip uuid uuid-devel vim wget yum-utils zip 'zlib*' jq
3 Enable time synchronization
systemctl start chronyd && systemctl enable chronyd
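Starting chronyd does not by itself mean the clock is already in sync. One way to check is chronyc tracking, whose "Leap status" field reads Normal once the clock is synchronised. A small sketch that tests that field; the function name is hypothetical, and it assumes the "Leap status : value" layout that chronyc prints.

```shell
#!/usr/bin/env bash
# chrony_synced (hypothetical helper): read "chronyc tracking" output on stdin
# and succeed only if its "Leap status" field is exactly "Normal".
chrony_synced() {
  awk -F': *' '
    /^Leap status/ {
      found = 1
      sub(/[ \t]+$/, "", $2)   # ignore trailing whitespace
      ok = ($2 == "Normal")
    }
    END { if (found && ok) exit 0; exit 1 }'
}

# Usage: chronyc tracking | chrony_synced && echo "clock is synchronised"
```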
4 Reboot
reboot
5 Update all installed packages
yum update -y
6 Reboot again (so the newly installed kernel is the one running)
reboot
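II. Install the GPU Driver
Many driver versions are available for a given GPU. Which one to install depends mainly on the CUDA version used by the software layer above: each CUDA release requires a minimum driver version, so install a driver that meets that minimum for the CUDA release you plan to use.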
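The rule above comes down to a minimum-version check: each CUDA toolkit release needs at least a certain driver version, and newer drivers also work. The sketch below encodes two rows from the minimum-driver table in NVIDIA's CUDA Toolkit release notes (Linux x86_64: CUDA 10.0 needs at least 410.48, CUDA 10.1 at least 418.39); treat those figures as something to re-check against the release notes for your exact toolkit. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# version_ge (hypothetical helper): succeed if dotted version $1 >= $2,
# comparing each numeric component in turn (418.226.00 >= 418.39).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# min_driver_for_cuda (hypothetical helper): minimum Linux x86_64 driver for a
# CUDA release, taken from the CUDA release notes table. Only two releases are
# listed; verify and extend this for your toolkit.
min_driver_for_cuda() {
  case "$1" in
    10.0) echo 410.48 ;;
    10.1) echo 418.39 ;;
    *) return 1 ;;
  esac
}

# driver_supports_cuda (hypothetical helper): succeed if driver $1 is new
# enough for CUDA release $2; returns 2 if the release is not in the table.
driver_supports_cuda() {
  local min
  min=$(min_driver_for_cuda "$2") || return 2
  version_ge "$1" "$min"
}

# Usage: driver_supports_cuda 418.226.00 10.1 && echo "driver is new enough"
```

With these figures, the 418.226.00 driver installed in this section satisfies CUDA 10.1, while the 410.129 driver in the sample nvidia-smi output below would only satisfy CUDA 10.0.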
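1 Disable the nouveau driver that CentOS loads by default
# Blacklist nouveau and turn off its kernel mode setting
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
# Back up the current initramfs image
cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
# Rebuild the initramfs so the blacklist also applies during early boot
sudo dracut --force
# Reboot
reboot
# Check whether nouveau is loaded; empty output means it is disabled
lsmod | grep nouveau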
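The two checks in this step can be wrapped into a small pre-flight test: one confirms that the blacklist file contains both directives, the other reads lsmod output and fails if a nouveau module is still listed. A sketch; the function names are hypothetical and the file path defaults to the one written above.

```shell
#!/usr/bin/env bash
# nouveau_blacklisted (hypothetical helper): succeed if the modprobe config
# file $1 (default: the file written above) contains both directives.
nouveau_blacklisted() {
  local conf="${1:-/etc/modprobe.d/blacklist.conf}"
  grep -qx 'blacklist nouveau' "$conf" &&
    grep -qx 'options nouveau modeset=0' "$conf"
}

# nouveau_unloaded (hypothetical helper): read lsmod output on stdin and
# succeed only if no module named "nouveau" is listed.
nouveau_unloaded() {
  awk '$1 == "nouveau" { found = 1 } END { if (found) exit 1; exit 0 }'
}

# Usage: nouveau_blacklisted && lsmod | nouveau_unloaded && echo "nouveau is disabled"
```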
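2 Install DKMS
DKMS (Dynamic Kernel Module Support) maintains driver modules that live outside the kernel source tree and automatically rebuilds them when the kernel version changes. The dkms package comes from the EPEL repository added at the beginning of this guide.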
yum -y install dkms
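Once the driver has been installed and registered with DKMS (the NVIDIA installer offers this when dkms is present), dkms status should list the nvidia module as installed for the running kernel. The exact line format differs between dkms releases (older ones print "nvidia, 418.226.00, <kernel>, x86_64: installed", newer ones use "nvidia/418.226.00, ..."), so this sketch only looks for a line that starts with nvidia, mentions the kernel version and says installed. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# nvidia_dkms_installed (hypothetical helper): read "dkms status" output on
# stdin; succeed if an nvidia module is installed for kernel $1 (default:
# the running kernel). Tolerates both the "nvidia," and "nvidia/" formats.
nvidia_dkms_installed() {
  local kernel="${1:-$(uname -r)}"
  awk -v k="$kernel" '
    $0 ~ "^nvidia[,/]" && index($0, k) > 0 && $0 ~ /: installed/ { found = 1 }
    END { if (found) exit 0; exit 1 }'
}

# Usage: dkms status | nvidia_dkms_installed && echo "nvidia module registered with DKMS"
```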
3 Copy the driver installer
If you have not downloaded it yet, get it from the NVIDIA driver download page.
cp NVIDIA-Linux-x86_64-418.226.00.run /data/
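The NVIDIA installer compiles its kernel module against the kernel-devel sources of the running kernel, which is why this guide reboots after yum update: kernel-devel is installed for the newest kernel, and the versions only match once that kernel is running. A hedged pre-flight sketch to run before the installer; it assumes the usual CentOS layout, where kernel-devel lives in /usr/src/kernels/<version>, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# kernel_devel_ready (hypothetical helper): succeed if kernel-devel sources for
# kernel $1 (default: the running kernel) exist under $2 (default:
# /usr/src/kernels, where CentOS installs kernel-devel).
kernel_devel_ready() {
  local kernel="${1:-$(uname -r)}"
  local base="${2:-/usr/src/kernels}"
  if [ -d "$base/$kernel" ]; then
    return 0
  fi
  echo "no kernel-devel for $kernel in $base: reboot into the updated kernel" \
       "or run: yum install -y kernel-devel-$kernel" >&2
  return 1
}

# Usage: kernel_devel_ready && echo "kernel sources found, safe to run the installer"
```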
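4 Install
cd /data/
sudo sh NVIDIA-Linux-x86_64-418.226.00.run --no-x-check --no-nouveau-check --no-opengl-files
# --no-x-check        skip the check for a running X server
# --no-nouveau-check  skip the check for the nouveau driver (disabled in step 1)
# --no-opengl-files   install only the driver files, not the OpenGL libraries
5 Follow the installer prompts, answering Yes / OK at each step.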
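6 Verify the installation
nvidia-smi
7 Output similar to the following means the driver is working. (This sample was captured on a machine with a Tesla T4 running driver 410.129; the version fields will show the driver you actually installed.)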
Wed Jul  7 11:11:33 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.129      Driver Version: 410.129      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:41:00.0 Off |                    0 |
| N/A   94C    P0    36W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
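For scripts, the driver version can be read from the "Driver Version:" field in the nvidia-smi header shown above, or queried directly with nvidia-smi --query-gpu=driver_version --format=csv,noheader. A small parsing sketch for the default text output (function name hypothetical):

```shell
#!/usr/bin/env bash
# smi_driver_version (hypothetical helper): read nvidia-smi's default text
# output on stdin and print the value of its "Driver Version:" field.
# Fails (non-zero) if no such field is found.
smi_driver_version() {
  sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1 | grep .
}

# Usage: nvidia-smi | smi_driver_version
```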
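8 Confirm that the GPU is visible on the PCI bus
lspci | grep -i nvidia
41:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
8.1 If this fails because lspci is not installed (command not found), install pciutils: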
yum install -y pciutils
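Rather than waiting for the "command not found" error, a script can check for the command first and only install the package when it is missing. A sketch; missing_pkg is a hypothetical helper that prints the package name only when the command is not on PATH.

```shell
#!/usr/bin/env bash
# missing_pkg (hypothetical helper): print package $2 if command $1 is not
# available on PATH; print nothing if it is.
missing_pkg() {
  command -v "$1" >/dev/null 2>&1 || echo "$2"
}

# Usage:
#   pkgs=$(missing_pkg lspci pciutils)
#   [ -n "$pkgs" ] && sudo yum install -y $pkgs
#   lspci | grep -i nvidia
```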
III. Download, Build, and Install a Newer GCC from Source
1 Build and install
cd /data/
wget https://mirrors.tuna.tsinghua.edu.cn/gnu/gcc/gcc-8.5.0/gcc-8.5.0.tar.gz
tar -xvf gcc-8.5.0.tar.gz
cd gcc-8.5.0
./contrib/download_prerequisites
mkdir build
cd build
../configure --enable-checking=release --enable-languages=c,c++ --disable-multilib
make -j 16
make install
2 Point the system libstdc++ symlink at the new library
cp /usr/local/lib64/libstdc++.so.6.0.25 /lib64
cd /lib64
rm -rf libstdc++.so.6
ln -s libstdc++.so.6.0.25 libstdc++.so.6
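The point of swapping the libstdc++.so.6 symlink is that programs built with the new compiler need newer GLIBCXX symbol versions: GCC 8's libstdc++.so.6.0.25 provides up to GLIBCXX_3.4.25, while CentOS 7's stock GCC 4.8.5 library (libstdc++.so.6.0.19) stops at GLIBCXX_3.4.19. A sketch that checks whether a library exports a given symbol version by scanning the raw file, so no extra tools are needed; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# has_glibcxx (hypothetical helper): succeed if library $2 (default:
# /lib64/libstdc++.so.6) contains the exact symbol-version string $1,
# e.g. GLIBCXX_3.4.25. -x ensures GLIBCXX_3.4.2 does not match 3.4.25.
has_glibcxx() {
  local lib="${2:-/lib64/libstdc++.so.6}"
  grep -a -o 'GLIBCXX_[0-9.]*' "$lib" | grep -qxF "$1"
}

# Usage: has_glibcxx GLIBCXX_3.4.25 && echo "the new libstdc++ is active"
```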
3 Check the version
gcc -v
4 Output like the following indicates a successful installation
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.5.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-checking=release --enable-languages=c,c++ --disable-multilib
Thread model: posix
gcc version 8.5.0 (GCC)
IV. Install NVIDIA CUDA
1 Confirm that nouveau is disabled
No output means nouveau is already disabled:
[root@localhost opt]# lsmod | grep nouveau
2 Set the default boot target to multi-user (text mode, no graphical desktop)
systemctl set-default multi-user.target
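3 Download the CUDA installer
You can also download it separately (offline) from the CUDA Toolkit download page on NVIDIA's website.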
wget https://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
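4 Install
sudo sh cuda_10.1.243_418.87.00_linux.run
5 When the installer screen appears, type accept to accept the license agreement. On the second screen, choose Install. If the driver from the earlier section is already installed, you will usually want to deselect the bundled Driver entry first so the installer does not replace it.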
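If you prefer a non-interactive install, the CUDA 10.1 runfile also accepts command-line options; according to NVIDIA's CUDA Installation Guide for Linux, --silent --toolkit installs only the toolkit and skips the bundled driver. A hedged sketch wrapping that call; the function name is hypothetical, and you should confirm the options against the installation guide for your CUDA version.

```shell
#!/usr/bin/env bash
# install_cuda_toolkit (hypothetical helper): install only the CUDA toolkit
# from runfile $1, without the bundled driver. Assumes the runfile supports
# --silent and --toolkit, as documented for CUDA 10.1.
install_cuda_toolkit() {
  local runfile="${1:-cuda_10.1.243_418.87.00_linux.run}"
  if [ ! -f "$runfile" ]; then
    echo "runfile not found: $runfile" >&2
    return 1
  fi
  sudo sh "$runfile" --silent --toolkit
}

# Usage: install_cuda_toolkit /data/cuda_10.1.243_418.87.00_linux.run
```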
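6 Add CUDA to the environment variables
6.0 Open the configuration file
vim /etc/profile
6.1 Add the following four lines at the top of the file
Press i to enter insert mode, paste the four lines, press Esc, then type :wq to save and quit.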
PATH=$PATH:/usr/local/cuda-10.1/bin/
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64/
export PATH
export LD_LIBRARY_PATH
6.2 Load the new settings into the current shell
source /etc/profile
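One subtle point with the lines above: if LD_LIBRARY_PATH is empty when /etc/profile runs, LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64/ yields a value starting with ":", and the dynamic linker treats that empty entry as the current directory. Sourcing the file again also appends the same directory a second time. A sketch of an idempotent append (the helper name is hypothetical; it is similar in spirit to the pathmunge function CentOS uses in /etc/profile):

```shell
#!/usr/bin/env bash
# path_append (hypothetical helper): print the colon-separated list $1 with
# directory $2 appended, unless it is already present. An empty list yields
# just the directory, so no empty (current-directory) entry is created.
path_append() {
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;        # already present: unchanged
    *) printf '%s\n' "${1:+$1:}$2" ;;      # add ":" only if $1 is non-empty
  esac
}

# Usage (e.g. in /etc/profile):
#   PATH=$(path_append "$PATH" /usr/local/cuda-10.1/bin)
#   LD_LIBRARY_PATH=$(path_append "$LD_LIBRARY_PATH" /usr/local/cuda-10.1/lib64)
#   export PATH LD_LIBRARY_PATH
```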
7 Verify the installation
This should print the installed CUDA version:
nvcc -V
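The last line of nvcc -V reads like "Cuda compilation tools, release 10.1, V10.1.243". To check it from a script, for example to make sure the nvcc on PATH belongs to the 10.1 toolkit and not some other copy, you can extract the release number. A sketch (function name hypothetical):

```shell
#!/usr/bin/env bash
# nvcc_release (hypothetical helper): read "nvcc -V" output on stdin and
# print the toolkit release number (e.g. 10.1). Fails if none is found.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1 | grep .
}

# Usage: [ "$(nvcc -V | nvcc_release)" = 10.1 ] || echo "unexpected nvcc on PATH" >&2
```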
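V. Install NVIDIA cuDNN
1 Download cuDNN
Download the cuDNN release that matches your CUDA version from the cuDNN download page (an NVIDIA developer account is required).
2 Upload the archive to the server and extract it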
cd /data/
tar xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
cp cuda/include/cudnn.h /usr/local/cuda/include
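# -P keeps the libcudnn.so* symbolic links as links instead of copying the full library once per link name
cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64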
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
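To confirm which cuDNN version ended up in /usr/local/cuda, read the version macros from the header. In cuDNN 7.x these are the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines in cudnn.h (cuDNN 8 moved them to cudnn_version.h). A sketch; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# cudnn_version (hypothetical helper): print major.minor.patch from the
# #define lines in header $1 (default: the cudnn.h copied above).
cudnn_version() {
  local header="${1:-/usr/local/cuda/include/cudnn.h}"
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { ma = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { mi = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { pa = $3 }
       END { if (ma == "") exit 1; print ma "." mi "." pa }' "$header"
}

# Usage: cudnn_version
```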
VI. Install TensorRT
Official TensorRT documentation: https://docs.nvidia.com/deeplearning/tensorrt/index.html
TensorRT on GitHub: https://github.com/NVIDIA/TensorRT
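1 Download
https://developer.nvidia.com/nvidia-tensorrt-download
1.1 Download in a browser
Fill in the short survey, then click through to the download list.
1.2 Download directly on the server
Fetching the download page URL directly does not work. In the browser, right-click the download link and copy the link address. It carries a time-limited access token, so copy a fresh link rather than reusing the one below. Quote the URL so the shell does not interpret ? and &, and use -O to save the file under its proper name:
1. Download the archive
wget -O TensorRT-6.0.1.5.CentOS-7.6.x86_64-gnu.cuda-10.1.cudnn7.6.tar.gz 'https://developer.download.nvidia.cn/compute/machine-learning/tensorrt/secure/6.0/GA_6.0.1.5/tars/TensorRT-6.0.1.5.CentOS-7.6.x86_64-gnu.cuda-10.1.cudnn7.6.tar.gz?bjNJHRorOM7wGWYqRC6WNq1Yc5t7qnfDjp0623k5RYOwiHURX7Wn4LGKTjbI_qGQxKPeyZW9uxElmQnnBibKtdNpFWRWcwcdmVKOiCqzXFdawKSqUWj6NlLAFOK8ipKe5XOG8QrgntKTRPsDtKVvlG-yL1BLkxj7KTcTCP5jmu3ezMgAisSZ4lGoNvONTME-wi3MnfXx0obnjy5iu_vmAg1sJohJnXwZ73Fxim-5p71edW_bSeKbzM9VPmU&t=eyJscyI6InJlZiIsImxzZCI6IlJFRi1kb2NzLm52aWRpYS5jb21cLyJ9'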
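To quickly locate the downloaded file:
find / -name "TensorRT*"
2. Extract the archive into /opt/trt, the directory used in the following steps:
mkdir -p /opt/trt
tar xzvf TensorRT-6.0.1.5.CentOS-7.6.x86_64-gnu.cuda-10.1.cudnn7.6.tar.gz -C /opt/trt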
3. Open ~/.bashrc and add the line below (replace /opt/trt/TensorRT-6.0.1.5 and cuda-10.1 with your actual directories), then reload the file:
vim ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/trt/TensorRT-6.0.1.5/lib:/usr/local/cuda-10.1/lib64
source ~/.bashrc
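Because a wrong directory in LD_LIBRARY_PATH fails silently (the loader simply skips it), it is worth checking that every entry actually exists after editing ~/.bashrc. A sketch that prints the missing directories of a colon-separated list; the function name is hypothetical, and it runs in a subshell so changing IFS and disabling globbing do not leak into your shell.

```shell
#!/usr/bin/env bash
# missing_dirs (hypothetical helper): print each non-empty entry of the
# colon-separated list $1 that is not an existing directory; succeed only
# if every entry exists. The ( ) body keeps IFS/set -f changes local.
missing_dirs() (
  IFS=:
  set -f
  status=0
  for dir in $1; do
    [ -n "$dir" ] || continue
    if [ ! -d "$dir" ]; then
      printf '%s\n' "$dir"
      status=1
    fi
  done
  exit "$status"
)

# Usage: missing_dirs "$LD_LIBRARY_PATH" && echo "all library directories exist"
```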
4. Next, install the matching Python packages: tensorrt, uff, and graphsurgeon.
cd /opt/trt/TensorRT-6.0.1.5/python
pip3 install tensorrt-6.0.1.5-cp37-none-linux_x86_64.whl
cd /opt/trt/TensorRT-6.0.1.5/uff
pip3 install uff-0.6.5-py2.py3-none-any.whl
cd /opt/trt/TensorRT-6.0.1.5/graphsurgeon
pip3 install graphsurgeon-0.4.1-py2.py3-none-any.whl
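The cp37 in the tensorrt wheel name is the CPython tag, so that wheel only installs into Python 3.7. The python directory of the TensorRT tarball typically ships one wheel per supported Python version, so a safer approach is to derive the tag from your interpreter and pick the matching file. A sketch; both function names are hypothetical, it assumes the wheel naming shown above, and it uses python3 -m pip so the wheel goes into the same interpreter the tag came from.

```shell
#!/usr/bin/env bash
# python_tag (hypothetical helper): print the CPython wheel tag of
# interpreter $1 (default: python3), e.g. cp37 for Python 3.7.
python_tag() {
  "${1:-python3}" -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}

# pick_trt_wheel (hypothetical helper): print the tensorrt wheel in
# directory $1 whose tag is $2; fail if there is none.
pick_trt_wheel() {
  local whl
  for whl in "$1"/tensorrt-*-"$2"-*.whl; do
    if [ -e "$whl" ]; then
      printf '%s\n' "$whl"
      return 0
    fi
  done
  echo "no tensorrt wheel for $2 in $1" >&2
  return 1
}

# Usage:
#   whl=$(pick_trt_wheel /opt/trt/TensorRT-6.0.1.5/python "$(python_tag)") &&
#     python3 -m pip install "$whl"
```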
5. Once that is done, start python3 and run import tensorrt; if everything is set up correctly, the import succeeds:
>>> import tensorrt
>>> tensorrt.__version__
'6.0.1.5'
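6. Finally, you can build some of the samples that ship with TensorRT. In the TensorRT samples directory, run make CUDA_INSTALL_DIR=/usr/local/cuda; when it finishes, the sample executables appear in the TensorRT bin directory. Running ./sample_mnist prints a digit drawn in ASCII characters, followed by the MNIST prediction. That completes the installation.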
cd /opt/trt/TensorRT-6.0.1.5/samples
make CUDA_INSTALL_DIR=/usr/local/cuda
cd /opt/trt/TensorRT-6.0.1.5/bin
./sample_mnist
7. Check the version number
find / -name NvInferVersion.h
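NvInferVersion.h stores the version as separate macros (NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH, NV_TENSORRT_BUILD), so once find has located it you can print the full version in one step. A sketch, assuming the tarball layout used above (include/ under the TensorRT directory); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# trt_version (hypothetical helper): print major.minor.patch.build from
# NvInferVersion.h; $1 defaults to the header in the tarball under /opt/trt.
trt_version() {
  local header="${1:-/opt/trt/TensorRT-6.0.1.5/include/NvInferVersion.h}"
  awk '$1 == "#define" && $2 == "NV_TENSORRT_MAJOR" { v[1] = $3 }
       $1 == "#define" && $2 == "NV_TENSORRT_MINOR" { v[2] = $3 }
       $1 == "#define" && $2 == "NV_TENSORRT_PATCH" { v[3] = $3 }
       $1 == "#define" && $2 == "NV_TENSORRT_BUILD" { v[4] = $3 }
       END { if (!(1 in v)) exit 1; print v[1] "." v[2] "." v[3] "." v[4] }' "$header"
}

# Usage: trt_version "$(find / -name NvInferVersion.h 2>/dev/null | head -n 1)"
```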
VII. Install Docker
1 Remove old versions
Reference: the official Docker installation guide for CentOS.
sudo yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine
2 Set up the Docker repository
sudo yum install -y yum-utils
sudo yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
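3 (Optional) Enable the nightly and test repositories
These channels carry pre-release builds. To turn them off again, run the same commands with --disable instead of --enable.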
sudo yum-config-manager --enable docker-ce-nightly
sudo yum-config-manager --enable docker-ce-test
4 Install the latest Docker Engine
sudo yum install docker-ce docker-ce-cli containerd.io
5 Start Docker
sudo systemctl start docker
6 Verify that Docker is installed correctly
sudo docker run hello-world
If the container prints a "Hello from Docker!" message, the installation works.
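VIII. Install NVIDIA Docker (nvidia-docker2)
Reference: NVIDIA's official installation guide. Stock Docker cannot give containers access to the GPU, so NVIDIA provides its own container runtime (nvidia-docker2) that exposes GPUs to containers.
1 Install dependencies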
sudo dnf install -y tar bzip2 make automake gcc gcc-c++ vim pciutils elfutils-libelf-devel libglvnd-devel iptables
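1.1 Possible error
sudo: dnf: command not found
CentOS 7 uses yum and does not install dnf by default, so either run the same command with yum in place of dnf, or install dnf and then repeat the command above:
yum install dnf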
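Instead of installing dnf just for this step, a script can use whichever package manager is present. A sketch (function name hypothetical):

```shell
#!/usr/bin/env bash
# pkg_mgr (hypothetical helper): print the package manager to use,
# preferring dnf and falling back to yum; fail if neither is installed.
pkg_mgr() {
  if command -v dnf >/dev/null 2>&1; then
    echo dnf
  elif command -v yum >/dev/null 2>&1; then
    echo yum
  else
    echo "neither dnf nor yum found" >&2
    return 1
  fi
}

# Usage:
#   sudo "$(pkg_mgr)" install -y tar bzip2 make automake gcc gcc-c++ vim \
#     pciutils elfutils-libelf-devel libglvnd-devel iptables
```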
2 Install Docker CE (skip the Docker CE commands if Docker is already installed), then nvidia-docker2
sudo yum-config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo yum repolist -v
sudo yum install -y https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.4.3-3.1.el7.x86_64.rpm
sudo yum install docker-ce -y
sudo systemctl --now enable docker
sudo docker run --rm hello-world
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum clean expire-cache
sudo yum install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:10.1-base nvidia-smi
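Installing nvidia-docker2 adds an "nvidia" runtime entry to /etc/docker/daemon.json, which is why docker is restarted afterwards; docker info should then list nvidia on its Runtimes line. If GPU containers fail, checking that entry is a quick first diagnostic. A rough grep-based sketch (function name hypothetical; jq, installed at the start of this guide, would be a stricter way to parse the JSON):

```shell
#!/usr/bin/env bash
# has_nvidia_runtime (hypothetical helper): succeed if the Docker daemon
# config $1 (default /etc/docker/daemon.json) declares an "nvidia" key.
# Approximate: it greps the text and does not validate the JSON structure.
has_nvidia_runtime() {
  local conf="${1:-/etc/docker/daemon.json}"
  [ -r "$conf" ] && grep -Eq '"nvidia"[[:space:]]*:' "$conf"
}

# Usage: has_nvidia_runtime || echo "nvidia runtime missing: reinstall nvidia-docker2" >&2
```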
3 If the container prints an nvidia-smi table listing your GPU, the installation succeeded.
IX. Install the Milvus Vector Database
Official documentation: https://milvus.io/cn/docs/v1.1.1/mishards.md
Further reading: https://xw.qq.com/cmsid/20200831A0KILH00
Further reading: https://blog.csdn.net/weixin_44839084/article/details/107704293