诸神沉默不语 - Personal CSDN Blog Post Index
For specifics, refer to the official documentation. First updated: 2021.4.23. Last updated: 2022.6.30.
Table of Contents
- 0. Unified explanation of common input parameters and functions
- 1. torch
  - 1.1 Tensors
    - 1.1.1 Creation Ops
    - 1.1.2 Indexing, Slicing, Joining, Mutating Ops
  - 1.2 Generators
  - 1.3 Random Sampling
    - 1.3.1 torch.default_generator
    - 1.3.2 In-place random sampling
    - 1.3.3 Quasi-random sampling
  - 1.4 Serialization
  - 1.5 Parallelism
  - 1.6 Locally disabling gradient computation
  - 1.7 Math operations
    - 1.7.1 Pointwise Ops
    - 1.7.2 Reduction Ops
    - 1.7.3 Comparison Ops
    - 1.7.4 Spectral Ops
    - 1.7.5 Other Operations
    - 1.7.6 BLAS and LAPACK Operations
  - 1.8 Utilities
- 2. torch.nn
  - 2.1 Containers
  - 2.2 Convolution Layers
  - 2.3 Pooling layers
  - 2.4 Padding Layers
  - 2.5 Non-linear Activations (weighted sum, nonlinearity)
  - 2.6 Non-linear Activations (other)
  - 2.7 Normalization Layers
  - 2.8 Recurrent Layers
  - 2.9 Transformer Layers
  - 2.10 Linear Layers
  - 2.11 Dropout Layers
  - 2.12 Sparse Layers
- 3. torch.nn.functional
  - 3.1 Convolution functions
  - 3.2 Pooling functions
  - 3.3 Non-linear activation functions
  - 3.4 Linear functions
  - 3.5 Dropout functions
- 4. torch.Tensor
- 5. Tensor Attributes
  - 5.1 `torch.dtype`
  - 5.2 `torch.device`
  - 5.3 `torch.layout`
  - 5.4 `torch.memory_format`
  - 5.5 Other attributes not covered in the official documentation
- 6. Tensor Views
- 7. torch.autograd
  - 7.1 Functional higher level API
  - 7.2 Locally disabling gradient computation
  - 7.3 Default gradient layouts
  - 7.4 In-place operations on Tensors
  - 7.5 Variable (deprecated)
  - 7.6 Tensor autograd functions
  - 7.7 Function
  - 7.8 Context method mixins
  - 7.9 Numerical gradient checking
  - 7.10 Profiler
  - 7.11 Anomaly detection
  - 7.12 Saved tensors default hooks
- 8. torch.cuda
  - 8.1 Random Number Generator
- 9. torch.cuda.amp
- 10. torch.backends
  - 10.1 torch.backends.cuda
  - 10.2 torch.backends.cudnn
- 11. torch.distributed
- 12. torch.distributions
- 13. torch.fft
- 14. torch.futures
- 15. torch.fx
- 16. torch.hub
- 17. torch.jit
- 18. torch.linalg
- 19. torch.overrides
- 20. torch.profiler
- 21. torch.nn.init
- 22. torch.onnx
- 22. torch.optim
  - 22.1 How to use an optimizer
  - 22.2 Algorithms
- 23. Complex Numbers
- 24. DDP Communication Hooks
- 25. Pipeline Parallelism
- 26. Quantization
- 27. Distributed RPC Framework
- 28. torch.random
- 29. torch.sparse
- 30. torch.Storage
- 31. torch.utils.benchmark
- 32. torch.utils.bottleneck
- 33. torch.utils.checkpoint
- 34. torch.utils.cpp_extension
- 35. torch.utils.data
- 36. Other references not mentioned in the text or footnotes
0. Unified explanation of common input parameters and functions
- Common function parameters
  - input: a Tensor.
  - requires_grad: bool; whether autograd should record operations on the Tensor.
  - size: in general, a size can be given as several integers or as a collection (e.g. a list or tuple).
  - device: the device the Tensor lives on (CUDA or CPU). It can be given as a torch.device (see section 5.2), or directly as a string or integer in place of a torch.device.
    Example passing a torch.device: torch.randn((2,3), device=torch.device('cuda:1'))
    Example passing a string: torch.randn((2,3), device='cuda:1')
    Example passing an integer: torch.rand((2,3), device=1)
- A trailing underscore (_) on a function name marks an in-place operation.
- Parameters can be passed positionally in order; Keyword Arguments must be passed by name (they appear after * in the signature).
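A minimal sketch of the last two conventions (in-place naming and keyword-only arguments); it runs on CPU and uses only standard factory functions:

```python
import torch

# A trailing underscore marks the in-place variant: add_ modifies x itself,
# while add returns a new Tensor and leaves x unchanged.
x = torch.ones(3)
y = x.add(1)    # x is still tensor([1., 1., 1.])
x.add_(1)       # x is now tensor([2., 2., 2.])

# Arguments after * in a signature are keyword-only and must be passed by name.
z = torch.zeros(2, 3, dtype=torch.float64, requires_grad=True)
```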
1. torch
1.1 Tensors
is_tensor(obj)
Returns True if obj is a Tensor. Note: the official docs recommend using isinstance(obj, Tensor) instead.
1.1.1 Creation Ops
Note: functions that create Tensors via random sampling are listed in the Random Sampling section.
tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False)
Converts data to a Tensor. data can be a list, tuple, NumPy ndarray, scalar, or other array-like data.
from_numpy(ndarray)
Converts a numpy.ndarray to a Tensor. Note that the two objects share the same storage, so changes to one are reflected in the other (see the sketch at the end of this subsection).
zeros(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
Returns a Tensor of shape size with all elements set to 0.
ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
Returns a Tensor of shape size with all elements set to 1.
ones_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)
Returns a Tensor of the same shape as input, with all elements set to 1.
arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
Example:
>>> torch.arange(5)
tensor([ 0, 1, 2, 3, 4])
>>> torch.arange(1, 4)
tensor([ 1, 2, 3])
>>> torch.arange(1, 2.5, 0.5)
tensor([ 1.0000, 1.5000, 2.0000])
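A quick sketch of the storage-sharing behavior of from_numpy described above (plain CPU Tensors; the values are only illustrative):

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
t = torch.from_numpy(arr)   # t and arr share the same underlying memory
arr[0] = 100                # modify the ndarray in place
print(t)                    # tensor([100,   2,   3]) -- the change shows up in the Tensor
```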
1.1.2 Indexing, Slicing, Joining, Mutating Ops
cat(tensors, dim=0, *, out=None)
Concatenates tensors (a sequence of Tensors; all non-empty Tensors must have the same shape except in the dim dimension) and returns the result.
reshape(input, shape)
Example:
>>> a = torch.arange(4.)
>>> torch.reshape(a, (2, 2))
tensor([[ 0., 1.],
[ 2., 3.]])
>>> b = torch.tensor([[0, 1], [2, 3]])
>>> torch.reshape(b, (-1,))
tensor([ 0, 1, 2, 3])
squeeze(input, dim=None, *, out=None)
Removes the dimensions of size 1 from input (a Tensor) and returns the result. If dim is given, only that dimension is squeezed. The returned Tensor shares storage with input. Example:
>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = torch.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x, 1)
>>> y.size()
torch.Size([2, 2, 1, 2])
stack(tensors, dim=0, *, out=None)
Stacks tensors (a sequence of Tensors of the same shape) along a new dimension and returns the result.
t(input)
Expects input to be at most 2-D: 0-D and 1-D inputs are returned as is, and a 2-D input is transposed.
transpose(input, dim0, dim1)
Returns a Tensor that is a transposed version of input, with dimensions dim0 and dim1 swapped. The returned Tensor shares storage with input. Example:
>>> x = torch.randn(2, 3)
>>> x
tensor([[ 1.0028, -0.9893, 0.5809],
[-0.1669, 0.7299, 0.4942]])
>>> torch.transpose(x, 0, 1)
tensor([[ 1.0028, -0.1669],
[-0.9893, 0.7299],
[ 0.5809, 0.4942]])
unsqueeze(input, dim)
Inserts a dimension of size 1 at the specified position of input and returns the Tensor. Example:
>>> x = torch.tensor([1, 2, 3, 4])
>>> torch.unsqueeze(x, 0)
tensor([[ 1, 2, 3, 4]])
>>> torch.unsqueeze(x, 1)
tensor([[ 1],
[ 2],
[ 3],
[ 4]])
nonzero(input, *, out=None, as_tuple=False)
① as_tuple=False: returns a 2-D Tensor in which each row holds the index of one nonzero element of input. Example:
>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]))
tensor([[ 0],
[ 1],
[ 2],
[ 4]])
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
... [0.0, 0.4, 0.0, 0.0],
... [0.0, 0.0, 1.2, 0.0],
... [0.0, 0.0, 0.0,-0.4]]))
tensor([[ 0, 0],
[ 1, 1],
[ 2, 2],
[ 3, 3]])
② as_tuple=True: returns a tuple of 1-D index Tensors (one Tensor of indices per dimension of input). Example:
>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]), as_tuple=True)
(tensor([0, 1, 2, 4]),)
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
... [0.0, 0.4, 0.0, 0.0],
... [0.0, 0.0, 1.2, 0.0],
... [0.0, 0.0, 0.0,-0.4]]), as_tuple=True)
(tensor([0, 1, 2, 3]), tensor([0, 1, 2, 3]))
>>> torch.nonzero(torch.tensor(5), as_tuple=True)
(tensor([0]),)
where()
where(condition)
Equivalent to torch.nonzero(condition, as_tuple=True).
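A quick check of the equivalence stated above:

```python
import torch

mask = torch.tensor([[True, False], [False, True]])
print(torch.where(mask))                   # (tensor([0, 1]), tensor([0, 1]))
print(torch.nonzero(mask, as_tuple=True))  # same tuple of index Tensors
```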
1.2 Generators
1.3 Random Sampling
manual_seed(seed)
Sets the seed for generating random numbers (see the sketch after the randperm example below).
randperm(n, *, generator=None, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False, pin_memory=False)
Returns a random permutation of the integers from 0 to n - 1. Example:
>>> torch.randperm(4)
tensor([2, 1, 0, 3])
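manual_seed(seed) fixes the state of the default random number generator, so sampling functions such as randperm become reproducible; a minimal illustration:

```python
import torch

torch.manual_seed(0)
a = torch.randperm(4)
torch.manual_seed(0)
b = torch.randperm(4)
print(torch.equal(a, b))   # True -- the same seed yields the same permutation
```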
1.3.1 torch.default_generator
Returns the default CPU torch.Generator.
rand(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
Returns a Tensor of shape size whose elements are sampled from the uniform distribution on [0, 1).
rand_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)
Returns a Tensor of the same shape as input whose elements are sampled from the uniform distribution on [0, 1).
1.3.2 In-place random sampling
1.3.3 Quasi-random sampling
1.4 Serialization
save(obj, f, pickle_module=pickle, pickle_protocol=2, _use_new_zipfile_serialization=True)
load(f, map_location=None, pickle_module=pickle, **pickle_load_args)
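A minimal sketch of torch.save / torch.load round-tripping a Tensor; the file name checkpoint.pt is only an illustrative choice:

```python
import torch

x = torch.arange(6).reshape(2, 3)
torch.save(x, 'checkpoint.pt')                        # serialize the Tensor to disk
y = torch.load('checkpoint.pt', map_location='cpu')   # map_location remaps storage devices on load
print(torch.equal(x, y))                              # True
```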
1.5 Parallelism
1.6 Locally disabling gradient computation
1.7 Math operations
1.7.1 Pointwise Ops
add()
Returns the result Tensor.
add(input, other, *, out=None)
other is a scalar: adds other to every element of input.
add(input, other, *, alpha=1, out=None)
other is a Tensor: other is first multiplied element-wise by the scalar alpha, then added element-wise to input.
mul(input, other, *, out=None)
If other is a scalar: multiplies every element of input by other. If other is a Tensor: multiplies input and other element-wise. Returns the result Tensor.
tanh(input, *, out=None)
Applies tanh element-wise to input. Returns a Tensor.
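A short illustration of the pointwise ops above:

```python
import torch

x = torch.tensor([1., 2., 3.])
y = torch.tensor([10., 20., 30.])

print(torch.add(x, 5))             # tensor([6., 7., 8.])    -- scalar other
print(torch.add(x, y, alpha=2))    # tensor([21., 42., 63.]) -- computes x + 2 * y
print(torch.mul(x, y))             # tensor([10., 40., 90.]) -- element-wise product
print(torch.tanh(x))               # element-wise tanh
```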
1.7.2 Reduction Ops
max()
max(input)
max(input, dim, keepdim=False, *, out=None)
max(input, other, *, out=None)
See maximum() in section 1.7.3.
sum(input, *, dtype=None)
Returns the sum of all elements in input (a Tensor) as a Tensor; dtype is the desired dtype of the returned Tensor.
mean(input)
Returns the mean of all elements in input (a Tensor) as a Tensor.
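A short illustration of the reduction ops above; note that max(input, dim, ...) returns a namedtuple (values, indices):

```python
import torch

x = torch.tensor([[1., 5.], [4., 2.]])

print(torch.max(x))                     # tensor(5.) -- maximum over all elements
values, indices = torch.max(x, dim=1)
print(values, indices)                  # tensor([5., 4.]) tensor([1, 0])
print(torch.sum(x))                     # tensor(12.)
print(torch.mean(x))                    # tensor(3.)
```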
1.7.3 Comparison Ops
maximum(input, other, *, out=None)
Computes the element-wise maximum of input and other.
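A one-line illustration of maximum:

```python
import torch

a = torch.tensor([1, 4, 2])
b = torch.tensor([3, 0, 5])
print(torch.maximum(a, b))   # tensor([3, 4, 5])
```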
1.7.4 Spectral Ops
1.7.5 Other Operations
flatten(input, start_dim=0, end_dim=-1)
Example:
>>> t = torch.tensor([[[1, 2],
... [3, 4]],
... [[5, 6],
... [7, 8]]])
>>> torch.flatten(t)
tensor([1, 2, 3, 4, 5, 6, 7, 8])
>>> torch.flatten(t, start_dim=1)
tensor([[1, 2, 3, 4],
[5, 6, 7, 8]])
1.7.6 BLAS and LAPACK Operations
Introduction to BLAS; LAPACK
1.8 Utilities
2. torch.nn
2.1 Containers
- Module: the base class for all neural network modules; neural network models should subclass Module. A Module can contain other Modules (stored in a tree structure); simply define these submodules as attributes in the __init__ method.
eval()
Sets the Module to evaluation mode; equivalent to self.train(False).
parameters(recurse=True)
Returns an iterator over the Module's parameters (a collection of Tensors), usually passed to an optimizer.
train(mode=True)
If the argument is True, sets the Module to training mode and the training attribute becomes True; otherwise the Module is set to evaluation mode and training becomes False.
zero_grad(set_to_none=False)
Sets the gradients of all model parameters to zero, similar to the optimizer's zero_grad() (see the torch.optim section). A minimal usage sketch of these methods follows the Containers list below.
- Sequential(*args): a sequential container. Modules are added in the order in which they are passed to the constructor. An OrderedDict can also be passed. Example:
# Example of using Sequential
model = nn.Sequential(
nn.Conv2d(1,20,5),
nn.ReLU(),
nn.Conv2d(20,64,5),
nn.ReLU()
)
# Example of using Sequential with OrderedDict
model = nn.Sequential(OrderedDict([
('conv1', nn.Conv2d(1,20,5)),
('relu1', nn.ReLU()),
('conv2', nn.Conv2d(20,64,5)),
('relu2', nn.ReLU())
]))
- ModuleList(modules=None): holds submodules in a list-like structure. It can be indexed like a regular Python list, but the modules it contains are properly registered and visible to all Module methods. Example:
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x
MyModule is then a neural network model containing 10 linear layers.
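As mentioned under the Module item above, a minimal usage sketch of the listed Module methods; the two-layer model and the SGD optimizer are only illustrative choices:

```python
import torch
from torch import nn

# an illustrative two-layer model
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# parameters() is typically handed to an optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model.train()                          # training mode: model.training is now True
loss = model(torch.randn(3, 4)).sum()
loss.backward()                        # populate parameter gradients
optimizer.step()
model.zero_grad()                      # reset all parameter gradients to zero

model.eval()                           # evaluation mode, equivalent to model.train(False)
```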
2.2 Convolution Layers
- class Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros'): applies a 2D convolution over an input signal composed of several input planes.
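A shape-oriented sketch of Conv2d; the channel counts and image size are arbitrary illustrative values:

```python
import torch
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(8, 3, 32, 32)   # (batch, in_channels, height, width)
y = conv(x)
print(y.shape)                  # torch.Size([8, 16, 32, 32]) -- padding=1 preserves H and W
```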
2.3 Pooling layers
2.4 Padding Layers
2.5 Non-linear Activations (weighted sum, nonlinearity)
- class ReLU(inplace=False): applies the rectified linear unit element-wise: $\mathrm{ReLU}(x) = (x)^+ = \max(0, x)$
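A one-line illustration of ReLU:

```python
import torch
from torch import nn

relu = nn.ReLU()
print(relu(torch.tensor([-1.0, 0.0, 2.0])))   # tensor([0., 0., 2.]) -- negatives clamped to 0
```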
2.6 Non-linear Activations (other)
- class LogSoftmax(dim=None)
2.7 Normalization Layers
- class BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True): batch normalization[1]
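A minimal sketch of BatchNorm1d applied to a (batch, num_features) input; the sizes are arbitrary:

```python
import torch
from torch import nn

bn = nn.BatchNorm1d(num_features=5)
x = torch.randn(16, 5)                 # (batch, num_features)
y = bn(x)
print(y.mean(dim=0))                   # per-feature means are close to 0 after normalization
print(y.std(dim=0, unbiased=False))    # per-feature standard deviations are close to 1
```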
2.8 Recurrent Layers
2.9 Transformer Layers
2.10 Linear Layers
- class Linear(in_features, out_features, bias=True): applies a linear transformation to the incoming data: $y = xA^T + b$
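A shape-oriented sketch of Linear, which maps the last input dimension from in_features to out_features:

```python
import torch
from torch import nn

linear = nn.Linear(in_features=20, out_features=5)
x = torch.randn(128, 20)
y = linear(x)       # computes y = x @ linear.weight.T + linear.bias
print(y.shape)      # torch.Size([128, 5])
```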
2.11 Dropout Layers
- class torch.nn.Dropout(p=0.5, inplace=False): during training, randomly zeroes elements of the input tensor with probability p, using samples from a Bernoulli distribution. Each forward call zeroes a fresh set of elements. This has proven to be an effective technique for regularization and for preventing the co-adaptation of neurons (which I don't yet fully understand); see the paper Improving neural networks by preventing co-adaptation of feature detectors. In addition, the outputs are scaled by a factor of $\frac{1}{1-p}$ during training, so at evaluation time the module simply returns its input unchanged.
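A minimal sketch showing that Dropout only takes effect in training mode and scales the surviving values by 1/(1-p):

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)

drop.train()
x = torch.ones(10)
print(drop(x))   # roughly half the entries are zeroed; the rest become 2.0 (= 1 / (1 - p))

drop.eval()
print(drop(x))   # identity in evaluation mode: all entries stay 1.0
```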
2.12 Sparse Layers
- class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None): an embedding dictionary. It is essentially a large matrix in which each row stores the embedding of one word.
Embedding.weight
holds the values of this matrix (a Tensor); weight.data can be used to change those values. The input is a list of indices (an IntTensor or LongTensor), and the output is the corresponding word embeddings (of shape (input shape, embedding_dim)). num_embeddings is the size of the dictionary (int). embedding_dim is the dimensionality of each embedding vector (int).
weight
has shape (num_embeddings, embedding_dim) and is initialized from $\mathcal{N}(0, 1)$.
Example:
>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
>>> embedding(input)
tensor([[[-0.0251, -1.6902,  0.7172],
         ...