1 Introduction to deep learning

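Deep learning is a branch of machine learning: a class of algorithms that use artificial neural networks as their architecture to learn feature representations from data.

Machine learning is one way to achieve artificial intelligence, and deep learning is one method of machine learning.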

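Differences between machine learning and deep learning:

(1) Feature extraction. Traditional machine learning relies on hand-crafted features. Deep learning has no elaborate manual feature-engineering step; the deep neural network extracts features automatically.

(2) Data volume. Deep learning needs large training datasets, and it generally performs better as the amount of data grows. Training a deep neural network also requires a lot of computing power, because it has far more parameters.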

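Application scenarios of deep learning:

(1) Image recognition: object recognition, scene recognition, face tracking, face authentication
(2) Natural language processing: machine translation, text recognition, chat and dialogue
(3) Speech technology: speech recognition

Common deep learning frameworks: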

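Many deep learning frameworks are in common use today, including TensorFlow, Caffe2, Keras, Theano, PyTorch, Chainer, DyNet, MXNet, and CNTK.

TensorFlow and Keras come from Google. They have many users and are widely used in industry, but the early TensorFlow syntax is obscure and differs from ordinary Python style, which makes it hard for beginners to pick up. TensorFlow 2.0 and Keras are much simpler, and their syntax has moved closer and closer to PyTorch.

PyTorch, from Facebook, is the usual choice in academia. Once you know PyTorch, you can get started quickly with TensorFlow 2.0 and Keras, so there is no need to agonize over which framework to learn. PyTorch code follows normal Python syntax, its tensor operations resemble NumPy operations, and its dynamic computation graph makes code easier to debug.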

Steps of deep learning

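2 Introduction to neural networks

2.1 Artificial neural networks

An artificial neural network (ANN), also called simply a neural network (NN), is a mathematical model that imitates the structure and function of biological neural networks (the central nervous system of animals, especially the brain) and is used to estimate or approximate functions. Like other machine learning methods, neural networks are used for problems such as machine vision and speech recognition, which are hard to solve with traditional rule-based programming.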

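Neurons:

(1) In a biological neural network, each neuron is connected to other neurons. When it is excited, it sends chemicals to the connected neurons and changes their electrical potential. If a neuron's potential exceeds a threshold, it is activated: it becomes excited and sends chemicals to other neurons in turn. In 1943, McCulloch and Pitts abstracted this behavior into the simple model shown in the figure above, the M-P neuron model, which is still in use today. Connecting many such neurons in a layered structure gives a neural network.

(2) Example: a simple neuron is shown in the figure below.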

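Expressed as a formula: $t = f(W^{T}A + b)$, where $A$ is the input vector, $W$ is the weight vector, $b$ is the bias, and $f$ is the transfer function.

So a neuron takes the inner product of the input vector and the weight vector, adds the bias, and passes the resulting scalar through a nonlinear transfer function.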
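The formula $t = f(W^{T}A + b)$ can be sketched in a few lines of plain Python. This is only an illustration: the sigmoid is assumed as the transfer function f, and the weights, inputs, and bias are made-up values.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, assumed here as the transfer function f."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, inputs, bias, f=sigmoid):
    """Compute t = f(W^T A + b) for a single neuron."""
    z = sum(w * a for w, a in zip(weights, inputs)) + bias  # inner product plus bias
    return f(z)

# Hypothetical values chosen only for illustration.
W = [0.5, -1.0, 2.0]
A = [1.0, 2.0, 0.5]
b = 0.5

t = neuron(W, A, b)
print(t)  # 0.5, because W^T A + b = 0.5 - 2.0 + 1.0 + 0.5 = 0
```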

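Single-layer neural network:

A single-layer neural network is the most basic form of network, made up of a finite number of neurons that all receive the same input vector. Because each neuron produces a scalar, the output of a single layer is a vector whose dimension equals the number of neurons.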

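Perceptron:

A perceptron consists of two layers of neurons. The input layer receives external input signals and passes them to the output layer, which is an M-P neuron (it outputs +1 for a positive example and -1 for a negative example).

What a perceptron does: it uses a hyperplane to split an n-dimensional vector space into two parts. Given an input vector, it determines which side of the hyperplane the vector lies on, and so whether the input belongs to the positive or the negative class. In 2-dimensional space, this is a straight line dividing the plane into two parts.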
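A minimal sketch of the perceptron's decision rule in 2-dimensional space, assuming the hyperplane is written as w·x + b = 0. The function name and the example line are hypothetical and chosen only for illustration.

```python
def perceptron_predict(w, b, x):
    """Return +1 if x is on the positive side of the hyperplane w.x + b = 0, else -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

# Hypothetical 2-D example: the line x1 + x2 - 1 = 0 splits the plane in two.
w, b = [1.0, 1.0], -1.0
points = [(2.0, 2.0), (0.0, 0.0)]

labels = [perceptron_predict(w, b, p) for p in points]
print(labels)  # [1, -1]
```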

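Multi-layer neural network:

A multi-layer neural network is obtained by stacking single-layer networks, which gives rise to the concept of layers. A typical multi-layer network has the following structure:

Input layer: many neurons receive a large number of input signals. The incoming signals are called the input vector.
Output layer: signals are transmitted, analyzed, and weighed across the neuron connections to form the output. The outgoing signals are called the output vector.
Hidden layer: the layers of neurons and connections between the input layer and the output layer. There can be one hidden layer or several. The number of nodes (neurons) in a hidden layer is not fixed, but the more there are, the more pronounced the network's nonlinearity, and so the more pronounced its robustness.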

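2.2 Fully connected layers

Fully connected layer: when every neuron in the current layer is connected to every neuron in the previous layer, the current layer is called a fully connected layer.
Question: suppose layer N-1 has m neurons and layer N has n neurons. If layer N is fully connected, how many parameters are there between layer N-1 and layer N, and how can they be represented?

As the figure above shows, passing data through a fully connected layer amounts to a matrix multiplication that maps the number of features in layer N-1 to a different number.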
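One way to check the matrix-multiplication view is with torch.nn.Linear, PyTorch's fully connected layer. The sketch below assumes hypothetical sizes m = 4 and n = 3. The layer stores an n x m weight matrix plus n biases, so it has m*n + n parameters in total.

```python
import torch
from torch import nn

m, n = 4, 3  # hypothetical sizes of layer N-1 and layer N
fc = nn.Linear(in_features=m, out_features=n)

print(fc.weight.shape)  # torch.Size([3, 4]): one weight per connection
print(fc.bias.shape)    # torch.Size([3]): one bias per output neuron

num_params = sum(p.numel() for p in fc.parameters())
print(num_params)       # 15 = m * n + n

x = torch.ones(2, m)                  # a batch of 2 samples with m features each
y = fc(x)                             # output has n features per sample
manual = x @ fc.weight.T + fc.bias    # the same computation written as a matrix product
print(y.shape)                        # torch.Size([2, 3])
print(torch.allclose(y, manual))      # True
```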

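2.3 Activation functions

Why activation functions are needed: the earlier description of the neuron mentioned an activation function, so what does it actually do? Suppose we have a set of data points, triangles and quadrilaterals, that must be split into two classes.

With a perceptron that has no activation function, we can draw a single straight line that divides the plane.

Once the parameters w and b are fixed, we plug in the point to be predicted. If y > 0, the point lies to the right of the line and belongs to the positive class (triangle); otherwise it lies to the left (quadrilateral).

In this data set, however, the triangles and quadrilaterals cannot be separated by a straight line. What can be done?

If we add another layer on top of the perceptron to make it multi-layer, the computation still reduces to a straight-line split.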
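The reason stacking layers without an activation function does not help is that a composition of linear maps is itself linear: W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). The sketch below checks this numerically. The layer sizes and random values are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
dtype = torch.float64

# Two stacked linear layers with no activation function in between.
W1, b1 = torch.randn(3, 2, dtype=dtype), torch.randn(3, dtype=dtype)
W2, b2 = torch.randn(1, 3, dtype=dtype), torch.randn(1, dtype=dtype)
x = torch.randn(5, 2, dtype=dtype)  # 5 sample points in the plane

two_layers = (x @ W1.T + b1) @ W2.T + b2

# The equivalent single linear layer.
W = W2 @ W1        # shape (1, 2)
b = W2 @ b1 + b2   # shape (1,)
one_layer = x @ W.T + b

print(torch.allclose(two_layers, one_layer))  # True
```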

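That is why an activation function is introduced: once a nonlinear activation function is added, the output is no longer a straight line.

On the right is the sigmoid function; the perceptron's result is passed through it.

With suitable parameters w and b, we get a suitable curve that solves the original problem with a nonlinear split. So an important role of the activation function is to give the model the ability to separate data nonlinearly.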

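Common activation functions:

From the plots we can see:
(1) sigmoid outputs only positive values, and its rate of change is largest near 0.
(2) Unlike sigmoid, tanh can output negative values.
(3) ReLU passes positive inputs through unchanged and outputs 0 for negative inputs, so any information carried by negative values is lost. It is common for image inputs, where pixel values lie in [0, 255].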
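The three functions can be compared on a few sample inputs with PyTorch's built-in torch.sigmoid, torch.tanh, and torch.relu. The input values are arbitrary, and the printed values in the comments are rounded.

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])

s = torch.sigmoid(x)  # always in (0, 1): roughly 0.1192, 0.5, 0.8808
t = torch.tanh(x)     # in (-1, 1), negative for negative input: roughly -0.9640, 0, 0.9640
r = torch.relu(x)     # max(0, x): negative inputs become 0

print(s)
print(t)
print(r)  # tensor([0., 0., 2.])
```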

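What activation functions do:
(1) Increase the model's capacity for nonlinear separation
(2) Improve the model's robustness
(3) Mitigate the vanishing-gradient problem
(4) Speed up the model's convergence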

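2.4 A neural network example

A boy wants to find a girlfriend, so he builds a girlfriend classifier, and the classifier keeps changing as he grows older. At age 14:

After being turned down countless times, he realizes that the chance of actually winning a girl over matters just as much as her looks, so he revises the classifier:

At 15 he finally finds a girlfriend, but after a while he discovers all kinds of habits he cannot stand and decides to break up. During the single period that follows, he realizes that finding a girlfriend is complicated and that he needs more criteria, so at 25 he revises the classifier again:

This super girlfriend classifier is in fact a neural network: it takes basic inputs and produces an output through the linear and nonlinear transformations of its hidden layers.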
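A network of this shape might be sketched in PyTorch as below. The three input features, the hidden-layer size, and the input values are all hypothetical, and the weights are random and untrained, so the output score is not meaningful. The point is only the structure: linear layer, nonlinearity, linear layer, output.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical classifier: 3 basic input features -> 4 hidden units -> 1 score.
model = nn.Sequential(
    nn.Linear(3, 4),  # linear transformation into the hidden layer
    nn.ReLU(),        # nonlinear transformation
    nn.Linear(4, 1),  # linear transformation to a single output
    nn.Sigmoid(),     # squash the output into (0, 1)
)

features = torch.tensor([[0.9, 0.2, 0.5]])  # one sample, made-up values
score = model(features)
print(score.shape)  # torch.Size([1, 1])
```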

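We hope the example above conveys the idea behind deep learning:

Feed in the rawest, most basic data and let the model do the feature engineering, learning higher-level features, then use the incoming data to determine suitable parameters so that the model fits the data better. The process is like the blind men and the elephant: several people each touch a part, and their findings are multiplied by suitable weights and transformed appropriately so that the result approaches the target value. Only basic data has to be supplied; the program finds suitable parameters automatically.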

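3 Introduction to PyTorch and how to use it

First we look at installing the deep learning framework PyTorch and at its most common usage.

Official installation page: https://pytorch.org/get-started/locally/

3.1 Introduction to PyTorch

PyTorch is a deep learning framework released by Facebook. It is popular with a wide range of users because it is easy to use and friendly.
Installation: https://pytorch.org/get-started/locally/
Installation troubleshooting: https://blog.csdn.net/qq_34516746/article/details/124044363?spm=1001.2014.3001.5501
PyTorch versions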

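3.2 Using PyTorch

3.2.1 Creating tensors

Definition: tensor is a general term. All of the following are tensors:

(1) Order-0 tensor: a scalar or constant, 0-D Tensor
(2) Order-1 tensor: a vector, 1-D Tensor
(3) Order-2 tensor: a matrix, 2-D Tensor
(4) Order-3 tensor
(5) ...
(6) Order-N tensor

Usage example:

Create a tensor from a Python list or sequence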

import torch
import numpy as np

# 1 Create a tensor from a Python list or sequence
tensor = torch.tensor([[1.,-1.],[1.,-1.]])
print(tensor)
# tensor([[ 1., -1.],
# [ 1., -1.]])

# 2 Create a tensor from a NumPy array
tensor = torch.tensor(np.array([[1,2,3],[4,5,6]]))
print(tensor)
# tensor([[1, 2, 3],
# [4, 5, 6]], dtype=torch.int32)  # int32 on this machine; where NumPy defaults to int64, no dtype is printed

# 3 Create tensors with the torch API
# 3.1 torch.empty(3,4) creates an uninitialized 3x4 tensor; its contents are whatever was already in memory
tensor = torch.empty(3,4)
print(tensor)
# tensor([[0.0000e+00, 0.0000e+00, 2.1019e-44, 0.0000e+00],
# [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
# [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]])
# 3.2 torch.ones(3,4) creates a 3x4 tensor filled with 1
tensor = torch.ones(3,4)
print(tensor)
# tensor([[1., 1., 1., 1.],
# [1., 1., 1., 1.],
# [1., 1., 1., 1.]])
# 3.3 torch.zeros(3,4) creates a 3x4 tensor filled with 0
tensor = torch.zeros(3,4)
print(tensor)
# tensor([[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]])
# 3.4 torch.rand(3,4) creates a 3x4 tensor of random values drawn uniformly from [0, 1)
tensor = torch.rand(3,4)
print(tensor)
# tensor([[0.8135, 0.8997, 0.0663, 0.6364],
# [0.3529, 0.8294, 0.8426, 0.1294],
# [0.9024, 0.4060, 0.3591, 0.5211]])
# 3.5 torch.randint(low, high, size) creates a tensor of random integers in [low, high); here low=3, high=10, size=(3,4)
tensor = torch.randint(3,10,size=(3,4))
print(tensor)
# tensor([[9, 6, 4, 7],
# [3, 4, 3, 6],
# [5, 4, 7, 6]])
# 3.6 torch.randn(3,4) creates a 3x4 tensor of random values from the standard normal distribution (mean 0, variance 1)
tensor = torch.randn(3,4)
print(tensor)
# tensor([[-0.2704, -0.8910, -0.7306, 0.0561],
# [ 1.6085, -0.7441, 0.2791, -0.9556],
# [ 0.1022, -0.2119, -0.5276, -0.2539]])

3.2.2 Common tensor methods

Usage example:

import torch
import numpy as np

# 1 Get the value of a tensor that contains exactly one element: tensor.item()
tensor = torch.tensor(np.arange(1))
print(tensor) # tensor([0], dtype=torch.int32)
item = tensor.item()
print(item) # 0

# 2 Convert to a NumPy array
tensor = torch.tensor(np.arange(4))
print(tensor) # tensor([0, 1, 2, 3], dtype=torch.int32)
array = tensor.numpy()  # named array so the variable is not confused with the numpy module
print(array) # [0 1 2 3]

# 3 Get the shape: tensor.size() (pass an index to get a single dimension, e.g. tensor.size(0); indices start at 0 and -1 means the last dimension)
tensor = torch.tensor(np.array([[1,2,3],[4,5,6],[7,8,9]]))
print(tensor)
# tensor([[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]], dtype=torch.int32)
size = tensor.size()
print(size) # torch.Size([3, 3])

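# 4 Change the shape: tensor.view(9,1)
# Similar to reshape in NumPy; the result shares memory with the original tensor (no data is copied), only the shape changes
# (passing -1 for one dimension lets PyTorch infer it from the other dimensions)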
tensor = torch.tensor(np.array([[1,2,3],[4,5,6],[7,8,9]]))
print(tensor)
# tensor([[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]], dtype=torch.int32)
tensor = tensor.view(9,1)
print(tensor)
# tensor([[1],
# [2],
# [3],
# [4],
# [5],
# [6],
# [7],
# [8],
# [9]], dtype=torch.int32)

# 5 Get the number of dimensions (order): tensor.dim()
tensor = torch.tensor(np.array([[1,2,3],[4,5,6],[7,8,9]]))
dim = tensor.dim()
print(dim) # 2

# 6 Get the maximum value: tensor.max()
tensor = torch.tensor(np.array([[1,2,3],[4,555,6],[7,8,9]]))
max_value = tensor.max()  # avoid shadowing the built-in max()
print(max_value) # tensor(555, dtype=torch.int32)

# 7 tensor.argmax() returns the index of the maximum value in the flattened tensor
tensor = torch.tensor(np.array([[1,2,3],[4,555,6],[7,8,9]]))
max_index = tensor.argmax()
print(max_index) # tensor(4)

# 8 tensor.max(dim=0) returns a (values, indices) pair: the maximum of each column and the row index where it occurs
tensor = torch.tensor(np.array([[1,2,3],[4,555,6],[7,8,9]]))
max_result = tensor.max(dim=0)
print(max_result)
# torch.return_types.max(
# values=tensor([ 7, 555, 9], dtype=torch.int32),
# indices=tensor([2, 1, 2]))

# 9 tensor.max(dim=0)[0] takes only the per-column maximum values
tensor = torch.tensor(np.array([[1,2,3],[4,555,6],[7,8,9]]))
max_values = tensor.max(dim=0)[0]
print(max_values) # tensor([ 7, 555, 9], dtype=torch.int32)

# 10 Transpose: tensor.T or tensor.t()
tensor = torch.tensor(np.array([[1,2,3],[4,555,6],[7,8,9]]))
tensor2 = tensor.T
print(tensor2)
tensor3 = tensor.t()
print(tensor3)
# tensor([[ 1, 4, 7],
# [ 2, 555, 8],
# [ 3, 6, 9]], dtype=torch.int32)

# 11 tensor[1,2] gets the element at row index 1, column index 2 (second row, third column)
tensor = torch.tensor(np.array([[1,2,3],[4,555,6],[7,8,9]]))
print(tensor)
# tensor([[ 1, 2, 3],
# [ 4, 555, 6],
# [ 7, 8, 9]], dtype=torch.int32)
print(tensor[1,2]) # tensor(6, dtype=torch.int32)

# 12 transpose(a,b) swaps two dimensions
tensor = torch.tensor(np.array([[[1,2,3],[4,555,6],[7,8,9]],[[11,12,13],[15,16,17],[18,19,20]]]))
print(tensor)
# tensor([[[ 1, 2, 3],
# [ 4, 555, 6],
# [ 7, 8, 9]],
#
# [[ 11, 12, 13],
# [ 15, 16, 17],
# [ 18, 19, 20]]], dtype=torch.int32)
tensor = tensor.transpose(1,0)
print(tensor)
# tensor([[[ 1, 2, 3],
# [ 11, 12, 13]],
#
# [[ 4, 555, 6],
# [ 15, 16, 17]],
#
# [[ 7, 8, 9],
# [ 18, 19, 20]]], dtype=torch.int32)

# 13 permute() is similar to transpose(), but permute() reorders all dimensions at once, while transpose() swaps exactly two
tensor = torch.tensor(np.array([[[1,2,3],[4,555,6],[7,8,9]],[[11,12,13],[15,16,17],[18,19,20]]]))
print(tensor)
# tensor([[[ 1, 2, 3],
# [ 4, 555, 6],
# [ 7, 8, 9]],
#
# [[ 11, 12, 13],
# [ 15, 16, 17],
# [ 18, 19, 20]]], dtype=torch.int32)
tensor = tensor.permute(2,0,1)
print(tensor)
# tensor([[[ 1, 4, 7],
# [ 11, 15, 18]],
#
# [[ 2, 555, 8],
# [ 12, 16, 19]],
#
# [[ 3, 6, 9],
# [ 13, 17, 20]]], dtype=torch.int32)

# 14 Slicing a tensor
tensor = torch.tensor(np.array([[[1,2,3],[4,555,6],[7,8,9]],[[11,12,13],[15,16,17],[18,19,20]]]))
print(tensor)
# tensor([[[ 1, 2, 3],
# [ 4, 555, 6],
# [ 7, 8, 9]],
#
# [[ 11, 12, 13],
# [ 15, 16, 17],
# [ 18, 19, 20]]], dtype=torch.int32)
tensor = tensor[:,1]
print(tensor)
# tensor([[ 4, 555, 6],
# [ 15, 16, 17]], dtype=torch.int32)
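The statement that view() only changes the shape can be checked directly: the viewed tensor and the original share the same underlying memory, so writing through one is visible in the other. A small sketch with made-up values:

```python
import torch

a = torch.arange(6)   # tensor([0, 1, 2, 3, 4, 5])
b = a.view(2, -1)     # -1 lets PyTorch infer the second dimension (3)
print(b.shape)        # torch.Size([2, 3])

b[0, 0] = 100         # write through the view...
print(a[0].item())    # 100: ...and the original tensor changes too

# Both tensors point at the same storage.
print(a.data_ptr() == b.data_ptr())  # True
```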

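3.2.3 Tensor data types

Tensors support many data types; the common ones are listed below:

In the table above, the Tensor types column names the tensor class that a tensor of each dtype is an instance of.

Usage example: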

import torch
import numpy as np

# 1 Get a tensor's data type: tensor.dtype
tensor = torch.randint(0,100,size=(3,4))
print(tensor)
# tensor([[83, 20, 27, 34],
# [65, 55, 81, 67],
# [21, 64, 14, 11]])
print(tensor.dtype) # torch.int64

# 2 Specify the type at creation (torch.Tensor() cannot take a dtype argument to specify the type; torch.tensor() can)
tensor = torch.ones((3,4),dtype=torch.float32)
print(tensor)
# tensor([[1., 1., 1., 1.],
# [1., 1., 1., 1.],
# [1., 1., 1., 1.]])
print(tensor.dtype) # torch.float32

# 3 Change the type
tensor = torch.ones(3,4,dtype=torch.int32)
print(tensor.dtype) # torch.int32
tensor = tensor.type(torch.float32)
print(tensor.dtype) # torch.float32
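Besides tensor.type(), PyTorch offers a few other ways to convert dtypes. The sketch below shows .to() and the shorthand methods .float() and .long(). Each returns a new tensor and leaves the original unchanged.

```python
import torch

t = torch.ones(2, 3, dtype=torch.int32)

a = t.to(torch.float64)  # explicit target dtype
b = t.float()            # shorthand for torch.float32
c = t.long()             # shorthand for torch.int64

print(a.dtype, b.dtype, c.dtype)  # torch.float64 torch.float32 torch.int64
print(t.dtype)                    # torch.int32: the original is not modified
```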

3.2.4 Other tensor operations

Code example:

Adding tensors

import torch

# 1 Add a tensor to a tensor
x = torch.ones(3,4,dtype=torch.float)
print(x)
# tensor([[1., 1., 1., 1.],
# [1., 1., 1., 1.],
# [1., 1., 1., 1.]])
y = torch.randint(0,5,size=(3,4))
print(y)
# tensor([[4, 4, 3, 1],
# [2, 3, 0, 3],
# [3, 4, 3, 0]])
z = x+y
print(z)
# tensor([[5., 5., 4., 2.],
# [3., 4., 1., 4.],
# [4., 5., 4., 1.]])
z = x.add(y)
print(z)
# tensor([[5., 5., 4., 2.],
# [3., 4., 1., 4.],
# [4., 5., 4., 1.]])
z = torch.add(x,y)
print(z)
# tensor([[5., 5., 4., 2.],
# [3., 4., 1., 4.],
# [4., 5., 4., 1.]])
x.add_(y) # methods with a trailing underscore (such as add_) modify the tensor in place
print(x)
# tensor([[5., 5., 4., 2.],
# [3., 4., 1., 4.],
# [4., 5., 4., 1.]])
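In the addition example, x is a float tensor and y is an integer tensor, yet every result is a float tensor. That comes from PyTorch's type promotion, which torch.result_type reports. The sketch below uses small made-up tensors to show the promotion and the difference between out-of-place and in-place addition.

```python
import torch

x = torch.ones(2, 2, dtype=torch.float32)
y = torch.ones(2, 2, dtype=torch.int64)

# Mixing a float tensor with an integer tensor promotes the result to float.
print(torch.result_type(x, y))  # torch.float32
z = x + y
print(z.dtype)                  # torch.float32

# x + y left x untouched; add_ overwrites x with the sum.
before = x.clone()
x.add_(y)
print(torch.equal(x, before + y))  # True
```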

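Tensors in CUDA

CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing platform and architecture from NVIDIA that lets GPUs solve complex computational problems.

The torch.cuda module adds support for CUDA tensors, so the same methods can be used to operate on tensors on the CPU and on the GPU.

The .to method moves a tensor to another device (for example, from the CPU to the GPU).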

import torch

x = torch.ones(3,4)
# device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# in 'cuda:0', the 0 selects GPU number 0 when several GPUs are present
if torch.cuda.is_available():
    device = torch.device("cuda")  # a CUDA device object
    y = torch.ones_like(x, device=device)  # create a tensor directly on the GPU
    x = x.to(device)  # move x to the GPU
    z = x + y
    print(z) # a 3x4 tensor filled with 2., device='cuda:0'
    # .to can also change the dtype at the same time
    print(z.to("cpu", torch.double))   # the same values on the CPU, dtype=torch.float64

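Deep learning needs a lot of compute. CPUs are fast at scheduling tasks but far behind GPUs in raw computing capacity, so for heavy computation we can move tensors to the GPU, compute there, and move the result back to the CPU for display.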
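A common way to write this so that the same script also runs on a machine without a GPU is to pick the device once and create tensors on it. This is a sketch of that pattern, not the only way to do it.

```python
import torch

# Fall back to the CPU when no CUDA device is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(3, 4, device=device)  # created directly on the chosen device
y = torch.ones_like(x)               # ones_like keeps x's device and dtype
z = (x + y).to("cpu")                # compute on the device, then bring the result back

print(device.type)
print(z.sum().item())  # 24.0: twelve elements, each equal to 2
```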

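4 Gradient descent and backpropagation

4.1 What is gradient descent

Gradient: in one dimension it is the derivative; in several dimensions it is the vector of partial derivatives, which points in the direction in which the function increases fastest. Gradient descent moves step by step in the direction opposite to the gradient until it reaches a minimum of the function.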
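A minimal sketch of gradient descent in one dimension, assuming the example function f(x) = (x - 3)^2, whose minimum is at x = 3. The function, starting point, learning rate, and step count are all chosen only for illustration.

```python
def grad_f(x):
    """Derivative of the example function f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

x = 0.0    # hypothetical starting point
lr = 0.1   # learning rate (step size)

for _ in range(100):
    x -= lr * grad_f(x)  # step in the direction opposite to the gradient

print(round(x, 4))  # 3.0
```

Each step shrinks the distance to the minimum by a factor of 0.8 here, so after 100 steps x is within rounding error of 3.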

Recall the linear regression and logistic regression models from machine learning: first collect data, then construct a function $f$ that gives $f(x, w) = Y_{predict}$
