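Tensors

What is a tensor

In PyTorch, a tensor generalizes vectors and matrices to an arbitrary number of dimensions. (Note that this usage differs from the tensor of mathematics and physics.)

A tensor is essentially a multidimensional array, so its number of dimensions is the number of indices needed to pick out a single element. A 0-dimensional tensor is a scalar and needs no index; a 1-dimensional tensor is a vector, where one index locates an element; a 2-dimensional tensor is a matrix, where a row index and a column index are both required; and so on.

Note: for how the dim (dimension) argument is used, see torch.sort() and the .sum() method under common tensor methods later in this article.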

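Why use tensors

For samples with a single feature, each training iteration draws several samples, so one iteration needs a vector to hold the mini-batch.

For samples with several features, each sample is a vector whose \(i\)-th element is the sample's \(i\)-th feature. Each iteration again draws several samples, so one iteration needs a matrix to hold the mini-batch.

For images in a convolutional neural network, convolution and pooling operate on feature maps. Each feature map is a matrix, and several feature maps of the same size stack into a three-dimensional block, so one sample is a 3-dimensional tensor and the mini-batch of one iteration is a 4-dimensional tensor.

By the same reasoning, training may call for tensors of even higher dimension.

Both are multidimensional arrays, so how does a PyTorch tensor differ from a NumPy array?

Compared with a NumPy array, a PyTorch tensor has these advantages:

- A PyTorch tensor can live on the CPU or on the GPU; GPU tensors run their computation on the GPU, which speeds up machine learning and deep learning training.
- PyTorch tensors can be used in distributed computation across several devices or machines, that is, they support multi-device training.
- PyTorch tensors can record the computational graph that produced them, which makes forward propagation and backpropagation easy to implement.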

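What is a computational graph, and how PyTorch builds one

What is a computational graph

A computational graph is a data structure: a directed acyclic graph that describes a computation.

It has two main elements, nodes and edges:

- Nodes represent data, such as vectors, matrices, and higher-dimensional tensors.
- Edges represent operations, such as addition, subtraction, multiplication, and division.

For example, to describe \(y = (x + w)\times(w+1)\) as a computational graph:

The computation proceeds in two steps:

Step 1: \(a = x+w\), \(b=w+1\)

Step 2: \(y = a\times b\)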

graph BT
id1(("x"))
id2(("w"))
id3(("1"))
id4(("a(+)"))
id5(("b(+)"))
id6(("y(*)"))
id1 --> id4
id2 --> id4
id3 --> id5
id2 --> id5
id4 --> id6
id5 --> id6

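Differentiating a computational graph:

Differentiation on a computational graph can be viewed as recursive. In the graph above, \(y\) is first differentiated with respect to \(a\) and \(b\); then \(a\) is differentiated with respect to \(x\) and \(w\), and \(b\) with respect to \(w\) and \(1\). Differentiating with respect to a constant is meaningless, so \(b\) only needs its derivative with respect to \(w\). For example, at \(a = a_0, b=b_0, x=x_0, w=w_0\): \[ \begin{aligned} &\frac{\partial y}{\partial a} = b_0\\ &\frac{\partial y}{\partial b} = a_0\\ &\frac{\partial a}{\partial x} = 1\\ &\frac{\partial a}{\partial w} = 1\\ &\frac{\partial b}{\partial w} = 1 \end{aligned} \]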

graph TB
id1(("x"))
id2(("w"))
id4(("a(+)"))
id5(("b(+)"))
id6(("y(*)"))
id6 --> id4
id6 --> id5
id4 --> id1
id4 --> id2
id5 --> id2

Therefore, \[ \frac{\partial y}{\partial x} = \frac{\partial y}{\partial a}\frac{\partial a}{\partial x} \] The derivative of \(y\) with respect to \(w\) has two paths in the recursion, \(y\rightarrow a\rightarrow w\) and \(y\rightarrow b \rightarrow w\), and the final derivative is the sum over the paths: \[ \frac{\partial y}{\partial w} = \frac{\partial y}{\partial a}\frac{\partial a}{\partial w}+\frac{\partial y}{\partial b}\frac{\partial b}{\partial w} \] In other words, differentiating a computational graph is just the chain rule from calculus.
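
The chain-rule result for \(y = (x + w)\times(w+1)\) can be checked against PyTorch's autograd. The sketch below assumes the illustrative values x = 2 and w = 1, which are not from the text; with them a = 3 and b = 2.

```python
import torch

# Illustrative values (an assumption, not taken from the article)
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(1.0, requires_grad=True)

a = x + w   # a = 3
b = w + 1   # b = 2
y = a * b   # y = 6

y.backward()

# dy/dx = dy/da * da/dx = b * 1
# dy/dw = dy/da * da/dw + dy/db * db/dw = b * 1 + a * 1
print(x.grad)   # tensor(2.)
print(w.grad)   # tensor(5.)
```

The gradient of w is the sum of the contributions from both paths, \(y\rightarrow a\rightarrow w\) and \(y\rightarrow b\rightarrow w\).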

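Dynamic and static computational graphs

Computational graphs are classified as dynamic or static according to how they are built.

Dynamic graph: the graph is built while the computation runs. Compared with a static graph it is more flexible and easier to adjust.

Static graph: the graph is built first and executed afterwards. Compared with a dynamic graph it can be more efficient but is less flexible.

PyTorch builds graphs dynamically. TensorFlow 1.x used static graphs by default; TensorFlow 2.x defaults to eager execution.

Under PyTorch's dynamic graph mechanism, every operation executed within an iteration is added to the graph as it runs, and backpropagation then differentiates through the graph that was built.

With a static graph, the graph is constructed once and every later gradient computation reuses it. With a dynamic graph, each forward pass builds a fresh graph, so every iteration computes on a new graph.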
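
One consequence of building the graph on every forward pass is that ordinary Python control flow can change the graph between iterations. The following is a minimal sketch; the function name forward and its n_steps parameter are hypothetical, chosen only for illustration.

```python
import torch

def forward(x, n_steps):
    # The number of multiplications, and hence the graph, depends on n_steps
    y = x
    for _ in range(n_steps):
        y = y * x
    return y            # y = x ** (n_steps + 1)

x = torch.tensor(2.0, requires_grad=True)
grads = []
for n in (1, 2):
    y = forward(x, n)   # a new graph is recorded on each call
    x.grad = None       # drop the gradient from the previous iteration
    y.backward()
    grads.append(x.grad.item())

print(grads)            # [4.0, 12.0], i.e. 2*x and 3*x**2 at x = 2
```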

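Tensor types

By element data type, tensors are divided into double-precision floating point (torch.float64 or torch.double), single-precision floating point (torch.float32 or torch.float), 16-bit floating point (torch.float16 or torch.half), 64-bit integer (torch.int64 or torch.long), 32-bit integer (torch.int32 or torch.int), 16-bit integer (torch.int16 or torch.short), 8-bit signed integer (torch.int8), 8-bit unsigned integer (torch.uint8), and Boolean (torch.bool).

By device, tensors are further divided into CPU tensors, which compute on the CPU, and GPU tensors, which compute on the GPU. GPU tensors require a CUDA-enabled build of PyTorch and a local NVIDIA GPU whose driver is compatible with that CUDA version (cuDNN additionally accelerates deep learning operations).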

Data type	dtype	CPU tensor	GPU tensor
32-bit floating point	`torch.float32` or `torch.float`	`torch.FloatTensor`	`torch.cuda.FloatTensor`
64-bit floating point	`torch.float64` or `torch.double`	`torch.DoubleTensor`	`torch.cuda.DoubleTensor`
16-bit floating point	`torch.float16` or `torch.half`	`torch.HalfTensor`	`torch.cuda.HalfTensor`
8-bit integer (unsigned)	`torch.uint8`	`torch.ByteTensor`	`torch.cuda.ByteTensor`
8-bit integer (signed)	`torch.int8`	`torch.CharTensor`	`torch.cuda.CharTensor`
16-bit integer (signed)	`torch.int16` or `torch.short`	`torch.ShortTensor`	`torch.cuda.ShortTensor`
32-bit integer (signed)	`torch.int32` or `torch.int`	`torch.IntTensor`	`torch.cuda.IntTensor`
64-bit integer (signed)	`torch.int64` or `torch.long`	`torch.LongTensor`	`torch.cuda.LongTensor`
Boolean	`torch.bool`	`torch.BoolTensor`	`torch.cuda.BoolTensor`

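When a tensor is created without an explicit type, the dtype is inferred from the data: all-integer data gives the 64-bit integer type, and data containing a decimal point gives the 32-bit floating-point type.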
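
The inference rule can be observed directly. This is a small sketch; the variable names are illustrative, and the float32 result assumes the default floating-point dtype has not been changed with torch.set_default_dtype().

```python
import torch

ints = torch.tensor([1, 2, 3])          # all integers
floats = torch.tensor([1.0, 2, 3])      # one decimal point is enough
explicit = torch.tensor([1, 2, 3], dtype=torch.float64)

print(ints.dtype)       # torch.int64
print(floats.dtype)     # torch.float32
print(explicit.dtype)   # torch.float64
```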

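Common tensor attributes and methods

Attribute/method	Purpose
`dtype`	The tensor's data type, classified by element type, e.g. `torch.float32`, `torch.int64`
`type()`	Returns the tensor's type, which also reflects the device, e.g. `torch.LongTensor`, `torch.cuda.LongTensor`
`shape`	The tensor's shape, e.g. `torch.Size([2, 3])`
`device`	The device holding the tensor: `cpu` for a CPU tensor, `cuda:n` for a GPU tensor, where `n` is the GPU index (\(n=0\) for the first GPU, and so on); with a single GPU this is `cuda:0`
`cpu()`	For a GPU tensor, returns a tensor with the same contents on the CPU (the original is unchanged)
`cuda()`	For a CPU tensor, returns a tensor with the same contents on the GPU (the original is unchanged); with several GPUs, pass the index to choose one, e.g. 0 for `cuda:0`
`is_leaf`	Whether the tensor is a leaf node
`requires_grad`	Whether gradients are computed for the tensor
`requires_grad_()`	Sets whether gradients are computed for the tensor (in-place)
`retain_grad()`	Keeps the gradient of a non-leaf node
`grad`	The tensor's gradient
`grad_fn`	Records the operation that produced the tensor
`backward()`	Runs backpropagation
`data` and `detach()`	Return a tensor with the same contents that shares storage with the original
`zero_()`	Sets every element of the tensor to zero (in-place)
`item()`	For a single-element tensor, returns the value as a Python number
`sum()`	Sums the elements of the tensor
`size()`	Returns the tensor's shape, e.g. `torch.Size([2, 3])`
`max()`	Returns the maximum value in the tensor
`min()`	Returns the minimum value in the tensor
`numpy()`	Converts to a NumPy ndarray
`tolist()`	Converts to a Python list
`clone()`	Copies the tensor (storage is not shared)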

`dtype`

The tensor's data type, classified by element type, e.g. torch.float32 or torch.int64.

For example:

import torch
a = torch.tensor([1, 2, 3])
print(a.dtype)

Output:

torch.int64

`type()`

Returns the tensor's type, which also reflects the device, e.g. torch.LongTensor or torch.cuda.LongTensor.

For example:

import torch
a = torch.tensor([1, 2, 3])
print(a.type())
print(a.cuda().type())

Output:

torch.LongTensor
torch.cuda.LongTensor

`shape`

Returns the tensor's shape.

For example:

import torch
a = torch.tensor([1, 2, 3])
b = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
c = torch.tensor([[[1, 2, 3], 
                   [4, 5, 6]],
                  [[7, 8, 9],
                   [10, 11, 12]]])
print(a.shape)
print(b.shape)
print(c.shape)

Output:

torch.Size([3])
torch.Size([2, 3])
torch.Size([2, 2, 3])

`device`

The device holding the tensor: cpu for a CPU tensor, cuda:n for a GPU tensor, where n is the GPU index (\(n=0\) for the first GPU, and so on). With a single GPU this is cuda:0.

For example:

import torch
a = torch.tensor([1, 2, 3])
print(a.device)
print(a.cuda().device)

Output:

cpu
cuda:0

`cpu()`

For a GPU tensor, .cpu() returns a tensor with the same contents on the CPU (the original is unchanged).

For example:

import torch
a = torch.tensor([1, 2, 3], device='cuda')
b = a.cpu()
print(a.device)
print(b.device)

Output:

cuda:0
cpu

`cuda()`

For a CPU tensor, .cuda() returns a tensor with the same contents on the GPU (the original is unchanged).

For example:

import torch
a = torch.tensor([1, 2, 3])
b = a.cuda()
print(a.device)
print(b.device)

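Output:

cpu
cuda:0

Note: with several GPUs, pass the device index to choose the target GPU (without an argument, the current CUDA device is used). For cuda:0 the argument to cuda() is 0:

b = a.cuda(0)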

`is_leaf`

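In a computational graph, tensors created directly by the user are called leaf nodes. After backpropagation, only leaf nodes keep their gradients by default; the gradients of non-leaf nodes are freed, which saves memory.

For example: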

import torch
a = torch.tensor([1, 2, 3])
b = a + a
c = b.sum()
print(a.is_leaf)
print(b.is_leaf)
print(c.is_leaf)

Output:

True
True
True

`requires_grad`

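Under PyTorch's autograd, gradients are computed only for a tensor whose requires_grad is True and for tensors computed from it, and the values are stored in the grad attribute. If a tensor has requires_grad = True, every tensor computed from it also has requires_grad = True. If requires_grad is False, no gradient is computed for that tensor. If no tensor in the whole computation requires gradients (so the tensor on which backward() is called has requires_grad = False), backpropagation raises an error.

Note: a tensor that takes part in backpropagation must be floating point, not integer or Boolean; otherwise an error is raised.

For example: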

import torch
a = torch.tensor([1, 2, 3], dtype = torch.float, requires_grad = True)
b = a + a
c = b.sum()
print(f"a is a leaf: {a.is_leaf}\nb is a leaf: {b.is_leaf}\nc is a leaf: {c.is_leaf}\n")
print(f"a.requires_grad = {a.requires_grad}\nb.requires_grad = {b.requires_grad}\nc.requires_grad = {c.requires_grad}")

Output:

a is a leaf: True
b is a leaf: False
c is a leaf: False

a.requires_grad = True
b.requires_grad = True
c.requires_grad = True

Backpropagation:

import torch
a = torch.tensor([1, 2, 3], dtype = torch.float, requires_grad = True)
b = a + a
c = b.sum()
c.backward()
print(a.grad)

Output:

tensor([2., 2., 2.])

Another example:

import torch
a = torch.tensor([1, 2, 3], dtype= torch.double)
b = a + a
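b.requires_grad_(True) # set b's requires_grad to True
c = b.sum()
c.backward()

print(f"a is a leaf: {a.is_leaf}\nb is a leaf: {b.is_leaf}\nc is a leaf: {c.is_leaf}\n")
print(f"a.requires_grad = {a.requires_grad}\nb.requires_grad = {b.requires_grad}\nc.requires_grad = {c.requires_grad}\n")
print(f"gradient of b: {b.grad}\ngradient of a: {a.grad}")

Output:

a is a leaf: True
b is a leaf: True
c is a leaf: False

a.requires_grad = False
b.requires_grad = True
c.requires_grad = True

gradient of b: tensor([1., 1., 1.], dtype=torch.float64)
gradient of a: None

Here a is not computed from b, so a.requires_grad keeps its default value False and no gradient is computed for a. No graph was recorded when b = a + a was computed, so b is itself a leaf, and b.requires_grad_(True) makes it a leaf that requires gradients. In the previous example a had requires_grad = True, so a was the leaf, and b and c, computed from it, were no longer leaves.

If no input has requires_grad = True, tensors computed from those inputs are also leaves, because no computational graph is built.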

`requires_grad_()`

Besides setting the requires_grad argument when a tensor is defined, requires_grad_() can set the tensor's requires_grad attribute after it has been created (an in-place operation).

For example:

import torch
a = torch.tensor([1, 2, 3], dtype= torch.double)
b = a + a
b.requires_grad_(True) # set b's requires_grad to True
c = b.sum()
c.backward()
print(b.grad)

Output:

tensor([1., 1., 1.], dtype=torch.float64)

`retain_grad()`

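By default the gradient of a non-leaf node is freed after backpropagation, since it is only needed to propagate gradients further. In a fully connected network, the weights and biases of the neurons can be seen as leaf nodes, while the linear map \(z\) and the activation output \(a\) are non-leaf nodes (the trainable parameters are leaves, everything else is not). PyTorch therefore keeps only the gradients of the leaves, which gradient descent needs, and frees the gradients of the non-leaf nodes that merely relay values, to reduce GPU and CPU memory use.

Accessing .grad on a non-leaf node returns None and emits a warning.

To keep the gradient of a non-leaf node, call retain_grad() on it before backpropagation.

For example, without retain_grad():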

import torch
a = torch.tensor([1, 2, 3], dtype= torch.float, requires_grad = True)
b = a + a
c = b.sum()
c.backward()
print(b.grad)

Output:

None
UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.

Keeping the gradient of a non-leaf node:

import torch
a = torch.tensor([1, 2, 3], dtype= torch.float, requires_grad = True)
b = a + a
b.retain_grad()
c = b.sum()
c.backward()
print(b.grad)

Output:

tensor([1., 1., 1.])

`grad`

After backpropagation, grad gives the gradient of the tensor.

For example:

import torch
a = torch.tensor([1, 2, 3], dtype= torch.float, requires_grad = True)
b = a + a
c = b.sum()
c.backward()
print(a.grad)

Output:

tensor([2., 2., 2.])

The value of grad is itself a tensor.

`grad_fn`

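grad_fn records the operation that produced the tensor. It is used to build the computational graph, so that backpropagation knows how each tensor was computed and can pass derivatives along correctly.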

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype = torch.float).requires_grad_(True)
print(a)
print(a*a)
print(a[0])
print(a[0][1]*a[1][0])
print(a[0] + a[1])

Output:

tensor([[1., 2., 3.],
        [4., 5., 6.]], requires_grad=True)
tensor([[ 1.,  4.,  9.],
        [16., 25., 36.]], grad_fn=<MulBackward0>)
tensor([1., 2., 3.], grad_fn=<SelectBackward0>)
tensor(8., grad_fn=<MulBackward0>)
tensor([5., 7., 9.], grad_fn=<AddBackward0>)

`backward()`

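Starts backpropagation. The tensor must have requires_grad = True, otherwise an error is raised.

Note: backward() can be called without arguments only on a scalar (a 0-dimensional tensor).

To backpropagate from a tensor of any other shape, pass backward() a tensor of the same shape. The principle is as follows.

Suppose we want to backpropagate from J, a \(2\times3\) matrix: \[ J = \left[ \begin{matrix} j_1&j_2&j_3\\ j_4&j_5&j_6 \end{matrix} \right] \] The argument must also be a \(2\times3\) tensor; call it \(M\): \[ M = \left[ \begin{matrix} m_1&m_2&m_3\\ m_4&m_5&m_6 \end{matrix} \right] \] Backpropagation: J.backward(M)

What PyTorch does is equivalent to multiplying \(J\) and \(M\) elementwise and summing, which gives a new scalar \[ J' = \sum_{i=1}^{6}{j_im_i} \] and then backpropagating from \(J'\).

For example: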

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype = torch.float, device='cuda:0', requires_grad = True)
b = a + a
print(f"b = {b}")
b.backward(torch.ones_like(b))  # M: all ones, same shape and device as b
print(a.grad)

Output:

b = tensor([[ 2.,  4.,  6.],
        [ 8., 10., 12.]], device='cuda:0', grad_fn=<AddBackward0>)
tensor([[2., 2., 2.],
        [2., 2., 2.]], device='cuda:0')
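
The equivalence between J.backward(M) and backpropagating from the scalar \(J'\) can be checked numerically. The sketch below runs on the CPU; the choice J = a * a and the values in M are assumptions made only for this illustration.

```python
import torch

a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype=torch.float, requires_grad=True)
M = torch.tensor([[1, 1, 3],
                  [1, 2, 1]], dtype=torch.float)   # illustrative weights

# Route 1: pass M to backward()
J = a * a
J.backward(M)
grad_with_argument = a.grad.clone()

# Route 2: build the scalar J' = sum(j_i * m_i) explicitly
a.grad = None                    # reset the accumulated gradient
J_prime = (a * a * M).sum()
J_prime.backward()
grad_from_scalar = a.grad.clone()

print(torch.equal(grad_with_argument, grad_from_scalar))   # True
```

Both routes give 2 * a * M, so M acts as a per-element weight on the gradient.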

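Note: after backward() has run once, the intermediate buffers of the computational graph are freed by default, so in general the same graph cannot be backpropagated through again. To backpropagate more than once, pass retain_graph = True to backward().

For example: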

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype = torch.float, device='cuda:0', requires_grad = True)
b = a + a
print(f"b = {b}")
m = torch.tensor([[1, 1, 3],
                  [1, 2, 1]], dtype = torch.float, device='cuda:0')  # weights passed to backward()
b.backward(m, retain_graph = True)
print(a.grad)
b.backward(m)
print(a.grad)

Output:

b = tensor([[ 2.,  4.,  6.],
        [ 8., 10., 12.]], device='cuda:0', grad_fn=<AddBackward0>)
tensor([[2., 2., 6.],
        [2., 4., 2.]], device='cuda:0')
tensor([[ 4.,  4., 12.],
        [ 4.,  8.,  4.]], device='cuda:0')

For a custom computation, create the tensors in advance and then fill in their values:

import torch
a = torch.tensor([[1, 2, 3],
                  [1, 1, 1]], dtype = torch.float).requires_grad_(True)
b = torch.tensor([[2, 4, 6],
                  [1, 1, 1]], dtype = torch.float).requires_grad_(True)
c = torch.empty(2, 3)  # uninitialized 2x3 buffer, filled element by element below
for i in range(2):
    for j in range(3):
        c[i][j] = a[i][j]*b[i][j]
d = c.sum()
d.backward()
print(a.grad)
print(b.grad)

Output:

tensor([[2., 4., 6.],
        [1., 1., 1.]])
tensor([[1., 2., 3.],
        [1., 1., 1.]])

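`data` and `detach()`

For a tensor a, both a.data and a.detach() return a tensor with the same contents that shares storage with a: changing one changes the other.

The tensors returned by data and detach() have requires_grad = False, so their contents can be modified without the modification being recorded in the computational graph. (A leaf tensor with requires_grad = True cannot be modified in place directly while autograd is recording.) To change the values of such a tensor, take the tensor returned by data or detach(), which has the same contents and the same storage but is not tracked, and modify that; the original tensor changes with it.

The difference between the two: data is unsafe. Changes made through .data are invisible to autograd, so if the modified values are needed for backpropagation, the gradients are silently computed from the changed contents and may not be the ones we want. With .detach(), modifying values that backpropagation needs makes backward() raise an error.

In general, prefer .detach().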
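
The difference can be reproduced with a small experiment. The sketch below uses sigmoid as an assumed example operation, chosen because its backward pass reuses its own output; the variable names are illustrative.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# detach(): the in-place change is noticed and backward() refuses to run
out = a.sigmoid()
out.detach().zero_()
try:
    out.sum().backward()
    detach_raised = False
except RuntimeError:
    detach_raised = True

# .data: the same change goes unnoticed and the gradient is silently wrong
a.grad = None
out2 = a.sigmoid()
out2.data.zero_()
out2.sum().backward()
silent_grad = a.grad.clone()

print(detach_raised)   # True
print(silent_grad)     # all zeros instead of sigmoid(a) * (1 - sigmoid(a))
```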

Note that a.data = ... replaces the contents of a directly while keeping a's other attributes, whereas a = ... rebinds the name a to a new tensor.

For example:

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
b = a.data
c = a.detach()
b = a - a
print(a)
c = a - a
print(a)
a.data = a.data - a.data
print(a)

Output:

tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[0, 0, 0],
        [0, 0, 0]])

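.data is commonly used to write gradient-descent updates; in current PyTorch the same update is usually wrapped in torch.no_grad() instead.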

`zero_()`

An in-place method that sets every element of the tensor to zero. For example:

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print(a)
a.zero_()
print(a)

Output:

tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[0, 0, 0],
        [0, 0, 0]])

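zero_() is commonly used to clear gradients (without clearing, gradients accumulate over repeated forward and backward passes):

import torch
a = torch.randn(3, 3, requires_grad=True)  # leaf tensor that tracks gradients
b = a.sum()
b.backward()                    # populates a.grad
a.data = a.data - 0.01*a.grad   # gradient-descent step
a.grad.data.zero_()             # clear the accumulated gradient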
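
Putting the update and the clearing step into a loop gives a complete, if tiny, gradient-descent run. This is a sketch under assumptions: the quadratic loss, the learning rate lr, and the use of torch.no_grad() in place of .data are illustrative choices, not taken from the text above.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)   # parameter to optimise
lr = 0.1                                    # hypothetical learning rate

for _ in range(100):
    loss = (w - 3.0) ** 2
    loss.backward()           # adds d(loss)/dw to w.grad
    with torch.no_grad():     # the update itself must not be recorded
        w -= lr * w.grad
    w.grad.zero_()            # clear the gradient before the next pass

print(w.item())               # close to 3.0
```

Removing the zero_() call makes each step use the sum of all previous gradients, which is the accumulation described above.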

`item()`

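For a tensor that holds exactly one element (a 0-dimensional tensor, or a one-element tensor like the one below), item() returns the value as a Python number. For example: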

import torch
a = torch.tensor([2])
print(a)
print(a[0])
print(a.item())

Output:

tensor([2])
tensor(2)
2

`sum()`

Sums the elements of the tensor. For example:

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print(a.sum())

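Output:

tensor(21)

So .sum() still returns a tensor.

The sum can also be taken along one dimension, selected with axis. For a 2-dimensional tensor, axis = 0 is the row dimension and axis = 1 is the column dimension: axis = 0 adds up the elements at the same position in every row (the rows are collapsed), and axis = 1 adds up the elements within each row (the columns are collapsed). For example: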

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print(a.sum(axis = 0))
print(a.sum(axis = 1))

Output:

tensor([5, 7, 9])
tensor([ 6, 15])

Besides axis, dim can be used; dim = n gives the same result as axis = n. dim = n reduces along dimension n, and dim = -1 refers to the last dimension.

For example:

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print(a.sum(dim = 0))
print(a.sum(dim = 1))

Output:

tensor([5, 7, 9])
tensor([ 6, 15])

Another example:

import torch
a = torch.tensor([[[1,1,1],[2,2,2],[3,3,3]],
                  [[4,4,4],[5,5,5],[6,6,6]],
                  [[7,7,7],[8,8,8],[9,9,9]]])
print(a)
print("\n")
print(a.sum(axis = 0))
print(a.sum(dim = 0))
print("\n")
print(a.sum(axis = 1))
print(a.sum(dim = 1))
print("\n")
print(a.sum(axis = 2))
print(a.sum(dim = 2))
print("\n")
print(a.sum(dim = -1))

Output:

tensor([[[1, 1, 1],
         [2, 2, 2],
         [3, 3, 3]],

        [[4, 4, 4],
         [5, 5, 5],
         [6, 6, 6]],

        [[7, 7, 7],
         [8, 8, 8],
         [9, 9, 9]]])


tensor([[12, 12, 12],
        [15, 15, 15],
        [18, 18, 18]])
tensor([[12, 12, 12],
        [15, 15, 15],
        [18, 18, 18]])


tensor([[ 6,  6,  6],
        [15, 15, 15],
        [24, 24, 24]])
tensor([[ 6,  6,  6],
        [15, 15, 15],
        [24, 24, 24]])


tensor([[ 3,  6,  9],
        [12, 15, 18],
        [21, 24, 27]])
tensor([[ 3,  6,  9],
        [12, 15, 18],
        [21, 24, 27]])


tensor([[ 3,  6,  9],
        [12, 15, 18],
        [21, 24, 27]])

Summing along a dimension adds up the elements whose index differs only in that dimension.
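
A quick way to keep track of dim is to look at shapes: the dimension being summed disappears from the result. The sketch below uses a 2 x 3 x 4 tensor chosen only for illustration; keepdim is a standard argument of sum() that the text does not cover.

```python
import torch

a = torch.arange(24).reshape(2, 3, 4)

shape_dim0 = a.sum(dim=0).shape                 # dimension 0 removed
shape_dim1 = a.sum(dim=1).shape                 # dimension 1 removed
shape_last = a.sum(dim=-1).shape                # last dimension removed
shape_kept = a.sum(dim=1, keepdim=True).shape   # dimension kept with length 1

print(shape_dim0)   # torch.Size([3, 4])
print(shape_dim1)   # torch.Size([2, 4])
print(shape_last)   # torch.Size([2, 3])
print(shape_kept)   # torch.Size([2, 1, 4])

# Summing along dimension 0 is the same as adding the slices a[0] and a[1]
same = torch.equal(a.sum(dim=0), a[0] + a[1])
print(same)         # True
```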

Identifying dimensions:

Dimension 0 of a tensor is the dimension addressed by the first index. For example:

import torch
a = torch.tensor(
       [[[1, 1, 1],
         [2, 2, 2],
         [3, 3, 3]],

        [[4, 4, 4],
         [5, 5, 5],
         [6, 6, 6]],

        [[7, 7, 7],
         [8, 8, 8],
         [9, 9, 9]]])
print(a[0])
print()
print(a[1])
print()
print(a[2])

Output:

tensor([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]])

tensor([[4, 4, 4],
        [5, 5, 5],
        [6, 6, 6]])

tensor([[7, 7, 7],
        [8, 8, 8],
        [9, 9, 9]])

So for a 3-dimensional tensor, dimension 0 is the depth (corresponding to the channels of an image).

Summing along dimension 0 adds these matrices elementwise (that is, a[0] + a[1] + a[2]):

import torch
a = torch.tensor(
       [[[1, 1, 1],
         [2, 2, 2],
         [3, 3, 3]],

        [[4, 4, 4],
         [5, 5, 5],
         [6, 6, 6]],

        [[7, 7, 7],
         [8, 8, 8],
         [9, 9, 9]]])
print(a.sum(dim = 0))

Output:

tensor([[12, 12, 12], 
        [15, 15, 15], 
        [18, 18, 18]])

Likewise, dimension 1 is addressed by the second index:

import torch
a = torch.tensor(
       [[[1, 1, 1],
         [2, 2, 2],
         [3, 3, 3]],

        [[4, 4, 4],
         [5, 5, 5],
         [6, 6, 6]],

        [[7, 7, 7],
         [8, 8, 8],
         [9, 9, 9]]])
print(a[0][0])
print(a[0][1])
print(a[0][2])

Output:

tensor([1, 1, 1])
tensor([2, 2, 2])
tensor([3, 3, 3])

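Dimension 1 is therefore the rows within each matrix. Summing along dimension 1 adds the rows of each matrix elementwise (elements at the same position in different rows are added, that is, [a[0][0] + a[0][1] + a[0][2], a[1][0] + a[1][1] + a[1][2], a[2][0] + a[2][1] + a[2][2]]):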

import torch
a = torch.tensor(
       [[[1, 1, 1],
         [2, 2, 2],
         [3, 3, 3]],

        [[4, 4, 4],
         [5, 5, 5],
         [6, 6, 6]],

        [[7, 7, 7],
         [8, 8, 8],
         [9, 9, 9]]])
print(a.sum(dim = 1))

Output:

tensor([[ 6,  6,  6],
        [15, 15, 15],
        [24, 24, 24]])

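Dimension 2 is addressed by the third index. Summing along it adds up the elements that share the first two indices and differ only in the third: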

import torch
a = torch.tensor(
       [[[1, 1, 1],
         [2, 2, 2],
         [3, 3, 3]],

        [[4, 4, 4],
         [5, 5, 5],
         [6, 6, 6]],

        [[7, 7, 7],
         [8, 8, 8],
         [9, 9, 9]]])
print([
    [a[0][0][0] + a[0][0][1] + a[0][0][2], a[0][1][0] + a[0][1][1] + a[0][1][2], a[0][2][0] + a[0][2][1] + a[0][2][2]],
    [a[1][0][0] + a[1][0][1] + a[1][0][2], a[1][1][0] + a[1][1][1] + a[1][1][2], a[1][2][0] + a[1][2][1] + a[1][2][2]],
    [a[2][0][0] + a[2][0][1] + a[2][0][2], a[2][1][0] + a[2][1][1] + a[2][1][2], a[2][2][0] + a[2][2][1] + a[2][2][2]]
])
print(a.sum(dim=2))

Output:

[[tensor(3), tensor(6), tensor(9)], [tensor(12), tensor(15), tensor(18)], [tensor(21), tensor(24), tensor(27)]]
tensor([[ 3,  6,  9],
        [12, 15, 18],
        [21, 24, 27]])

`size()`

Returns the shape of the tensor (the length of each dimension, not the number of elements). For example:

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print(a.size())

Output:

torch.Size([2, 3])

The result is a torch.Size object. An individual value is read from it by index, just as with a list:

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print(a.size()[0])
print(a.size()[1])

Output:

2
3
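
Since size() describes the shape, the number of elements has to be obtained separately. The sketch below contrasts the two; numel() is a standard tensor method that the text above does not mention.

```python
import torch

a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

print(a.size())              # torch.Size([2, 3])
print(a.size() == a.shape)   # True: size() and shape report the same thing
print(a.size(0))             # 2, the length of dimension 0
print(a.numel())             # 6, the total number of elements

rows, cols = a.size()        # torch.Size unpacks like a tuple
```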

The length of a single dimension can also be requested by passing the dimension with axis:

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print(a.size(axis = 0))
print(a.size(axis = 1))

Output:

2
3

`max()`

Returns the maximum value in the tensor. For example:

import torch
a = torch.randn(3, 3)
print(a)
print(a.max())

Output:

tensor([[ 0.1255,  1.0074,  1.8243],
        [-0.2208,  0.5725,  0.3242],
        [ 1.8480,  2.1703,  1.1590]])
tensor(2.1703)

To find the maximum along one dimension, pass the axis or dim argument. For example:

import torch
a = torch.tensor([[1, 2, 3],
                  [0, 5, 6]])
print(a.max(dim = 0))

Output:

torch.return_types.max(
values=tensor([1, 5, 6]),
indices=tensor([0, 1, 1]))

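The result has type torch.return_types.max, with two fields, values and indices: values holds the maxima along the dimension, and indices holds the position of each maximum along that dimension (counting from 0).

For example: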

import torch
a = torch.tensor([[1, 2, 3],
                  [0, 5, 6]])
print(a.max(dim = 0).values)

Output:

tensor([1, 5, 6])

And:

import torch
a = torch.tensor([[1, 2, 3],
                  [0, 5, 6]])
print(a.max(dim = 0).indices)

Output:

tensor([0, 1, 1])

Note that both fields are tensors.

`min()`

Returns the minimum value in the tensor. For example:

import torch
a = torch.randn(3, 3)
print(a)
print(a.min())

Output:

tensor([[-1.1288,  0.6214,  0.7543],
        [-1.4536,  0.2196,  1.9445],
        [ 0.2462,  1.0133,  0.0296]])
tensor(-1.4536)

To find the minimum along one dimension, use it the same way as max().

`numpy()`

Converts to a NumPy ndarray. A tensor on the GPU must first be moved to the CPU, and a tensor that requires gradients must first be detached.

import torch
a = torch.randn(3, 3)
print(a.numpy())
print(type(a.numpy()))

Output:

[[ 2.1274524   0.5391169   0.11493025]
 [-0.27077958 -1.9428172  -1.8231913 ]
 [-0.39520448  0.98392785 -0.73108184]]
<class 'numpy.ndarray'>
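
For a tensor that is tracked by autograd or lives on the GPU, a commonly used pattern is detach().cpu().numpy(). The sketch below runs on the CPU and shows a detail worth knowing, namely that for a CPU tensor the resulting array shares memory with the tensor.

```python
import torch

t = torch.ones(3, requires_grad=True)

# Calling t.numpy() directly raises an error because t requires gradients
try:
    t.numpy()
    direct_call_failed = False
except RuntimeError:
    direct_call_failed = True

arr = t.detach().cpu().numpy()   # works on CPU and GPU tensors alike

arr[0] = 5.0                     # the ndarray shares memory with t
print(direct_call_failed)        # True
print(t[0].item())               # 5.0
```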

`tolist()`

Converts to a Python list.

import torch
a = torch.randn(3, 3)
print(a.tolist())
print(type(a.tolist()))

Output:

[[-0.2866360545158386, 0.21355263888835907, -0.8256316781044006], [-0.6836923360824585, 0.8668343424797058, 0.02956523187458515], [-0.9885536432266235, 1.0136524438858032, -1.4012690782546997]]
<class 'list'>

`clone()`

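With plain assignment =, the new name and the original refer to the same memory, so modifying the new tensor also changes the original.

clone() copies the tensor: the copy does not share memory with the original, so modifying the copy has no effect on the original.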

import torch
a = torch.arange(1, 10).reshape(3, 3)
b = a.clone()
print("Before modification:")
print(a)
print(b)
b[0] = 6
print("After modification:")
print(a)
print(b)

Output:

Before modification:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
After modification:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
tensor([[6, 6, 6],
        [4, 5, 6],
        [7, 8, 9]])

With plain assignment = instead:

import torch
a = torch.arange(1, 10).reshape(3, 3)
b = a
print("Before modification:")
print(a)
print(b)
b[0] = 6
print("After modification:")
print(a)
print(b)

Output:

Before modification:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
After modification:
tensor([[6, 6, 6],
        [4, 5, 6],
        [7, 8, 9]])
tensor([[6, 6, 6],
        [4, 5, 6],
        [7, 8, 9]])

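Creating tensors

Creation method	Purpose
`torch.tensor()`	Direct creation
`torch.from_numpy()`	Creation from NumPy
`torch.zeros()`	Creates an all-zeros tensor
`torch.zeros_like()`	Creates an all-zeros tensor with the shape of another tensor
`torch.ones()`	Creates an all-ones tensor
`torch.ones_like()`	Creates an all-ones tensor with the shape of another tensor
`torch.full()`	Creates a tensor filled with \(n\)
`torch.full_like()`	Creates a tensor filled with \(n\) with the shape of another tensor
`torch.eye()`	Creates an identity matrix
`torch.arange()`	Creates a one-dimensional arithmetic sequence
`torch.linspace()`	Creates a one-dimensional tensor of evenly spaced values
`torch.logspace()`	Creates a one-dimensional tensor of logarithmically spaced values
`torch.normal()`	Creates a tensor of normally distributed values
`torch.randn()`	Creates a tensor of standard normal values
`torch.randn_like()`	Creates a tensor of standard normal values with the shape of another tensor
`torch.rand()`	Creates a uniform distribution on \([0,1)\)
`torch.rand_like()`	Creates a uniform distribution on \([0,1)\) with the shape of another tensor
`torch.randint()`	Creates a uniform distribution of integers on \([\mathrm{low},\mathrm{high})\)
`torch.randint_like()`	Creates a uniform distribution of integers on \([\mathrm{low},\mathrm{high})\) with the shape of another tensor
`torch.randperm()`	Generates a random permutation of 0 to \(n-1\)
`torch.bernoulli()`	Generates Bernoulli-distributed values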

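Direct creation

torch.tensor():

torch.tensor(data,
             dtype=None,
             device=None,
             requires_grad=False,
             pin_memory=False)

data: the data, e.g. a Python list or a NumPy ndarray
dtype: the data type; by default it is inferred from data
device: the device, cuda or cpu; with several GPUs, name the GPU as cuda:n
requires_grad: whether gradients are required
pin_memory: whether to allocate in pinned (page-locked) memory

Creation by tensor type (these typed constructors are a legacy interface; torch.tensor() with dtype and device is preferred in current PyTorch)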

a = torch.DoubleTensor([1, 2, 3])
b = torch.cuda.FloatTensor([[1, 2, 3],
                            [4, 5, 6]])

Creation from NumPy

torch.from_numpy(ndarray):

A tensor created with torch.from_numpy() shares memory with the source ndarray: modifying one also modifies the other.

import torch
import numpy as np

a_np = np.arange(1, 10)
a = torch.from_numpy(a_np)
print(a)
print(a_np)
a[2] = 100
print()
print(a)
print(a_np)

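Output (the dtype follows the NumPy array, whose default integer type depends on the platform and NumPy version, so it may be int64 instead of int32):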

tensor([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=torch.int32)
[1 2 3 4 5 6 7 8 9]

tensor([  1,   2, 100,   4,   5,   6,   7,   8,   9], dtype=torch.int32)
[  1   2 100   4   5   6   7   8   9]
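
When an independent copy is wanted instead, torch.tensor() can be used on the same array. The sketch below contrasts the two; the variable names are illustrative.

```python
import numpy as np
import torch

a_np = np.arange(1, 4)
shared = torch.from_numpy(a_np)   # shares memory with a_np
copied = torch.tensor(a_np)       # copies the data

a_np[0] = 100                     # modify the NumPy array afterwards

print(shared[0].item())           # 100, the change is visible
print(copied[0].item())           # 1, the copy is unaffected
```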

Creation by value

`torch.zeros()`

Creates an all-zeros tensor

torch.zeros(*size,
           out=None,
           dtype=None,
           layout=torch.strided,
           device=None,
           requires_grad=False)

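*size: the shape of the tensor, given as a tuple such as \((3,3)\) or \((3,224,224)\), or as separate integers such as 2, 3, 4
out: the output tensor
layout: the memory layout, e.g. strided or sparse_coo; strided is the default, and sparse_coo is better suited to sparse tensors
device: the device
requires_grad: whether gradients are required

For example: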

import torch
b = torch.tensor([1])
a = torch.zeros((3, 3), out=b)
print(a)
print(b)

Output:

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

When the out argument is used, the returned tensor and the out tensor share the same memory:

import torch
b = torch.tensor([1])
a = torch.zeros((3, 3), out = b)
b[1][1] = torch.tensor([1])
print(a)
print(b)

Output:

tensor([[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]])
tensor([[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]])

Note that *size means the dimensions can also be passed as separate numbers; a tuple is not required:

import torch
a = torch.zeros(3, 3)
print(a)

Output:

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

`torch.zeros_like()`

Creates an all-zeros tensor with the shape of another tensor

torch.zeros_like(input,
                dtype=None,
                layout=None,
                device=None,
                requires_grad=False)

input: the all-zeros tensor is created with the same shape as input
dtype: the data type
layout: the memory layout

import torch
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
b = torch.zeros_like(a)
print(a)
print(b)

Output:

tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[0, 0, 0],
        [0, 0, 0]])

`torch.ones()`

Creates an all-ones tensor; usage is the same as torch.zeros()

`torch.ones_like()`

Creates an all-ones tensor; usage is the same as torch.zeros_like()

`torch.full()`

Creates a tensor filled with \(n\), where \(n\) is given by the fill_value argument:

torch.full(size,
          fill_value,
          out=None,
          dtype=None,
          layout=torch.strided,
          device=None,
          requires_grad=False)

size: the shape of the tensor, e.g. (3, 3)
fill_value: the value to fill with

For example:

import torch
a = torch.full((3, 3), fill_value=2)
print(a)

Output:

tensor([[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]])

`torch.full_like()`

Creates a tensor filled with \(n\), where \(n\) is given by the fill_value argument:

torch.full_like(input,
                fill_value,
                dtype=None,
                layout=None,
                device=None,
                requires_grad=False)

See torch.zeros_like() for usage.

import torch
a = torch.zeros(2, 3)
b = torch.full_like(a, fill_value = 6)
print(b)

Output:

tensor([[6., 6., 6.],
        [6., 6., 6.]])

`torch.eye()`

Creates an identity matrix

torch.eye(n,
         m=None,
         out=None,
         dtype=None,
         layout=torch.strided,
         device=None,
         requires_grad=False)

n: number of rows
m: number of columns; defaults to None, which gives a square matrix

For example:

import torch
a = torch.eye(3)
print(a)

Output:

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

Creation by interval

`torch.arange()`

Creates a one-dimensional arithmetic sequence

torch.arange(start=0,
            end,
            step=1,
            out=None,
            dtype=None,
            layout=torch.strided,
            device=None,
            requires_grad=False)

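start: first value of the sequence, 0 by default
end: the "end value" of the sequence (not included)
step: the common difference (step size), 1 by default

Note: the result does not include end.

For example: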

import torch
a = torch.arange(10)
print(a)

Output:

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

And:

import torch
a = torch.arange(1, 10)
print(a)
b = torch.arange(1, 10, 0.5)
print(b)

Output:

tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
tensor([1.0000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000, 4.0000, 4.5000, 5.0000,
        5.5000, 6.0000, 6.5000, 7.0000, 7.5000, 8.0000, 8.5000, 9.0000, 9.5000])

`torch.linspace()`

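Creates a one-dimensional tensor of evenly spaced values

torch.linspace(start,
              end,
              steps,
              out=None,
              dtype=None,
              layout=torch.strided,
              device=None,
              requires_grad=False)

start: first value of the sequence
end: last value of the sequence (included)
steps: the length of the sequence (not the common difference); older PyTorch versions defaulted to 100, current versions require it to be given

Note: the result includes end, and steps is the number of points, not the spacing between them.

For example: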

import torch
a = torch.linspace(0, 10, 11)
print(a)

Output:

tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])
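
The relation between the two functions is that linspace with steps points has spacing (end - start) / (steps - 1). The sketch below uses illustrative values to compare it with arange, which takes the spacing directly and excludes end.

```python
import torch

start, end, steps = 0.0, 10.0, 5

a = torch.linspace(start, end, steps)   # 5 points, end included
step = (end - start) / (steps - 1)      # spacing implied by steps: 2.5
b = torch.arange(start, end, step)      # same spacing, end excluded

print(a)   # 0, 2.5, 5, 7.5, 10
print(b)   # 0, 2.5, 5, 7.5
```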

`torch.logspace()`

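Creates a one-dimensional tensor of logarithmically spaced values, commonly used to build geometric sequences

torch.logspace(start,
              end,
              steps,
              base=10.0,
              out=None,
              dtype=None,
              layout=torch.strided,
              device=None,
              requires_grad=False)

start: exponent of the first value
end: exponent of the last value (included)
steps: the length of the sequence; older PyTorch versions defaulted to 100, current versions require it to be given
base: the base of the logarithm, 10 by default

The result is a geometric sequence from \(\mathrm{base}^\mathrm{start}\) to \(\mathrm{base}^\mathrm{end}\).

For example: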

import torch
a = torch.logspace(1, 5, 5)
print(a)

Output:

tensor([1.0000e+01, 1.0000e+02, 1.0000e+03, 1.0000e+04, 1.0000e+05])

This is equivalent to:

import torch
x = torch.linspace(1, 5, 5)
a = 10**x
print(a)

Output:

tensor([1.0000e+01, 1.0000e+02, 1.0000e+03, 1.0000e+04, 1.0000e+05])

Creation from probability distributions

`torch.normal()`

Draws samples from a normal (Gaussian) distribution

torch.normal(mean,
            std,
            out=None)

mean: the mean
std: the standard deviation

Or:

torch.normal(mean,
            std,
            size,
            out=None)

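size: a tuple giving the shape of the output tensor

Note: mean and std can each be a single value or a tensor. In the examples below a single value is passed as a one-element tensor; current PyTorch versions also accept plain Python floats in several overloads.

Tensors passed as mean or std must not have an integer dtype.

If mean and std are both single values, the result is a single sample drawn from the normal distribution with mean mean and standard deviation std: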

import torch
meann = torch.tensor([0], dtype=torch.float)
stdd = torch.tensor([1], dtype=torch.float)
a = torch.normal(meann, stdd)
print(a)

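Output:

tensor([-1.5427])

If mean and std are tensors of the same shape, each element of the result is drawn from a normal distribution whose mean and standard deviation are the corresponding elements of mean and std: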

import torch
meann = torch.tensor([0, 1, 2, 3], dtype=torch.float)
stdd = torch.tensor([0.1, 0.1, 0.1, 0.1], dtype=torch.float)
a = torch.normal(meann, stdd)
print(a)

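Output:

tensor([0.1568, 1.1241, 2.0056, 2.9929])

If one of mean and std is a single value and the other is a tensor, the single value is broadcast to the shape of the other, with every element equal to that value: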

import torch
meann = torch.tensor([0], dtype=torch.float)
stdd = torch.tensor([[1, 2, 3, 4],
                     [5, 6, 7, 8]], dtype=torch.float)
a = torch.normal(meann, stdd)
print(a)

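Output:

tensor([[  0.9402,   2.7704,   2.5734,  -0.7569],
        [  0.6887, -11.1956,  -4.2237,  -9.9899]])

Broadcasting follows the same rules as in NumPy.

If mean and std have different shapes, they are broadcast together (the shapes must be broadcast-compatible, otherwise an error is raised):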

import torch
meann = torch.tensor([0, 1, 2, 3], dtype=torch.float)
stdd = torch.tensor([[1, 2, 3, 4],
                     [5, 6, 7, 8]], dtype=torch.float)
a = torch.normal(meann, stdd)
print(a)

Output:

tensor([[ 0.5296,  1.1933, -0.8489,  5.6830],
        [-0.4387, -1.7417,  0.2424, -4.8737]])
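
The calling forms described above can be compared by looking at the shape of each result. This is a sketch under the assumption of a reasonably recent PyTorch version, where Python floats are accepted for mean and std; the seed is only there to make the run repeatable.

```python
import torch

torch.manual_seed(0)

# Float mean and std with an explicit output shape
a = torch.normal(0.0, 1.0, size=(2, 3))

# Tensor mean, float std: the result takes the shape of mean
b = torch.normal(torch.zeros(4), 2.0)

# Tensors of different shapes are broadcast: (4,) with (2, 4) gives (2, 4)
mean = torch.tensor([0.0, 1.0, 2.0, 3.0])
std = torch.ones(2, 4)
c = torch.normal(mean, std)

print(a.shape, b.shape, c.shape)
```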

`torch.randn()`

Draws samples from the standard normal distribution

torch.randn(*size,
           out=None,
           dtype=None,
           layout=torch.strided,
           device=None,
           requires_grad=False)

size: the shape of the tensor

import torch
a = torch.randn(2, 2, 3)
print(a)

Output:

tensor([[[-0.7078, -1.0771,  0.9436],
         [-1.1914, -1.8960,  0.5167]],

        [[-0.1060,  0.8221,  0.2546],
         [ 0.4660, -0.0694,  0.1593]]])

`torch.randn_like()`

Draws samples from the standard normal distribution; see torch.zeros_like() for usage

`torch.rand()`

Draws samples from the uniform distribution on \([0, 1)\)

torch.rand(*size,
           out=None,
           dtype=None,
           layout=torch.strided,
           device=None,
           requires_grad=False)

For example:

import torch
a = torch.rand(2, 2, 3)
print(a)

Output:

tensor([[[0.1706, 0.4481, 0.3976],
         [0.5722, 0.2439, 0.1884]],

        [[0.1673, 0.2974, 0.1648],
         [0.3629, 0.2083, 0.4379]]])

`torch.rand_like()`

Draws samples from the uniform distribution on \([0, 1)\); see torch.zeros_like() for usage

`torch.randint()`

Draws integers uniformly from the interval [low, high)

torch.randint(low=0,
             high,
             size,
             out=None,
             dtype=None,
             layout=torch.strided,
             device=None,
             requires_grad=False)

For example:

import torch
a = torch.randint(1, 10, (3, 3))
print(a)

Output:

tensor([[7, 7, 8],
        [5, 3, 6],
        [8, 5, 1]])

`torch.randint_like()`

Draws integers uniformly from the interval [low, high); usage follows torch.zeros_like(), with low and high given in addition to input

`torch.randperm()`

Generates a random permutation of the integers from 0 to \(n-1\)

torch.randperm(n,
              out=None,
              dtype=torch.int64,
              layout=torch.strided,
              device=None,
              requires_grad=False)

n: the length of the tensor

For example:

import torch
a = torch.randperm(10)
print(a)

Output:

tensor([8, 1, 5, 3, 7, 0, 4, 9, 2, 6])

`torch.bernoulli()`

Draws Bernoulli samples (0-1 distribution, two-point distribution) using input as the probabilities

torch.bernoulli(input,
               *,
               generator=None,
               out=None)

input: the probabilities; must be a torch.tensor whose elements are not of an integer type (64-bit, 32-bit, 16-bit integers, and so on)

For example:

import torch
a = torch.bernoulli(torch.tensor([0.1, 0.2, 0.5]))
print(a)

Output:

tensor([0., 0., 1.])

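Tensor operations

Operation	Purpose
`torch.cat()`	Concatenates tensors along a dimension
`torch.stack()`	Concatenates tensors along a newly added dimension
`torch.chunk()`	Splits a tensor along a dimension into a given number of equal parts
`torch.split()`	Splits a tensor along a dimension into parts of a given size (or a list of sizes)
`torch.index_select()`	Along dimension `dim`, selects data by the indices in `index`, keeping only the slices of `dim` that `index` names
`torch.masked_select()`	Selects the elements where `mask` is `True`; returns a one-dimensional tensor
`reshape()`	Changes the shape of a tensor
`resize_()`	Changes the shape of a tensor, in-place
`resize()`	Changes the shape of a tensor and returns the result (deprecated in current PyTorch)
`torch.transpose()`	Swaps two dimensions of a tensor
`torch.squeeze()`	Removes dimensions (axes) of length 1
`torch.unsqueeze()`	Inserts a dimension at position `dim`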
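In-place and out-of-place (non-in-place) operations

In-place operations modify the memory of the tensor they are called on, so the calling tensor itself changes. In PyTorch, the names of in-place methods almost always end with an underscore (_). Their advantage is lower memory use: with high-dimensional data they avoid allocating an extra result tensor, which is valuable when training neural networks. They also carry risks:
- An in-place operation may overwrite values that are needed to compute gradients.
- Every in-place operation requires rewriting the computation graph. An out-of-place version simply allocates a new object and keeps a reference to the old graph, whereas an in-place operation must change the creator of all inputs to the function that represents the operation.

Out-of-place (non-in-place) operations return a new tensor instead of modifying the tensor itself. They are safer, but they need more memory.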
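
The difference can be observed through the address of the underlying storage. The following is a minimal sketch, assuming a CPU tensor; the variable names are illustrative only. It also shows autograd rejecting an in-place edit of a leaf tensor that requires gradients.

```python
import torch

a = torch.ones(3)
ptr_before = a.data_ptr()           # address of a's underlying storage

b = a.add(1)                        # out-of-place: allocates a new tensor
a.add_(1)                           # in-place: overwrites a's own memory

print(b.data_ptr() != ptr_before)   # b lives in different memory
print(a.data_ptr() == ptr_before)   # a still uses the same memory
print(a)

# Autograd refuses in-place edits of a leaf tensor that requires grad.
w = torch.ones(3, requires_grad=True)
inplace_error = None
try:
    w.add_(1)
except RuntimeError as e:
    inplace_error = e
print(type(inplace_error).__name__)
```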

Concatenating and splitting tensors

Concatenating tensors

`torch.cat()`

Concatenates tensors along dimension dim.

torch.cat(tensors,
         dim=0,
         out=None)

tensors: a sequence of tensors, e.g. [a, a, a]
dim: the dimension along which to concatenate

For example:

import torch
a = torch.ones((2, 3))
a_0 = torch.cat([a, a], dim=0)
a_1 = torch.cat([a, a], dim=1)
print(a_0)
print(a_1)

Output:

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
tensor([[1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.]])

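For a 2-D tensor, dim=0 concatenates along the rows, so the result has more rows; dim=1 concatenates along the columns, so the result has more columns.

For a 3-D tensor, dim=0 usually corresponds to depth, i.e. the number of matrices (a[0], a[1], ... are each a matrix), dim=1 to the rows within a matrix, and dim=2 to the elements within a row; the elements at the same position in every row form a column.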

`torch.stack()`

Concatenates tensors along a newly created dimension dim.

torch.stack(tensors,
           dim=0,
           out=None)

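tensors: a sequence of tensors (all of the same shape)
dim: the dimension along which to concatenate (note: this is a new dimension)

Experiment 1: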

import torch
a = torch.ones((2, 3))
a_stack = torch.stack([a, a], dim=2)
print(a_stack)
print(a_stack.size())

Output:

tensor([[[1., 1.],
         [1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.],
         [1., 1.]]])
torch.Size([2, 3, 2])

Experiment 2:

import torch
a = torch.ones((2, 3))
a_stack = torch.stack([a, a, a], dim=2)
print(a_stack)
print(a_stack.size())

Output:

tensor([[[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]]])
torch.Size([2, 3, 3])

Experiment 3:

import torch
a = torch.ones((2, 3))
a_stack = torch.stack([a, a], dim=0)
print(a_stack)
print(a_stack.size())

Output:

tensor([[[1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.]]])
torch.Size([2, 2, 3])

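Comparing experiments 1 and 2: the length of the new dimension equals the number of tensors in the tensors sequence.

Comparing experiments 1 and 3: torch.stack() always inserts a new dimension. If the input already has a dimension at index dim, that dimension and all later ones are shifted back by one position, and the new dimension is inserted at index dim.

For example: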

import torch
a = torch.arange(1, 10).reshape(3, 3)
a1 = torch.stack([a, a], dim = 0)
a2 = torch.stack([a, a], dim = 1)
a3 = torch.stack([a, a], dim = 2)
print(a1, end='\n\n')
print(a2, end='\n\n')
print(a3)

Output:

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]],

        [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

tensor([[[1, 2, 3],
         [1, 2, 3]],

        [[4, 5, 6],
         [4, 5, 6]],

        [[7, 8, 9],
         [7, 8, 9]]])

tensor([[[1, 1],
         [2, 2],
         [3, 3]],

        [[4, 4],
         [5, 5],
         [6, 6]],

        [[7, 7],
         [8, 8],
         [9, 9]]])

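We can think of [a, a] as a 3-D tensor. dim = 0 indexes the matrices, so stacking along dim = 0 simply yields [a, a].

dim = 1 indexes the rows of a: rows with the same index from the different matrices are put together into a new matrix, and these matrices (one per row index) are then stacked into the 3-D result.

dim = 2 indexes the individual elements within a row of a: elements at the same position from the different matrices in the list are put together into a new vector, the vectors that come from one row form a matrix, and these matrices form the 3-D result.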
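
One way to check this reading of dim is to rebuild torch.stack() from torch.unsqueeze() and torch.cat(). This is only a sketch of the equivalence, not a claim about how PyTorch implements stack internally.

```python
import torch

a = torch.arange(1, 10).reshape(3, 3)

stack_matches_cat = []
for dim in range(3):
    stacked = torch.stack([a, a], dim=dim)
    # Insert a length-1 axis at `dim` in every input, then concatenate along it.
    via_cat = torch.cat([a.unsqueeze(dim), a.unsqueeze(dim)], dim=dim)
    stack_matches_cat.append(torch.equal(stacked, via_cat))
    print(dim, tuple(stacked.shape))

print(stack_matches_cat)
```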

Splitting tensors

`torch.chunk()`

Splits a tensor evenly along dimension dim and returns a tuple of tensors.

torch.chunk(input,
           chunks,
           dim=0)

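input: the tensor to split
chunks: the number of parts to split into
dim: the dimension along which to split

Note: if the dimension cannot be divided evenly, the last tensor is smaller than the others; if the dimension is shorter than chunks, fewer than chunks tensors are returned.

For example: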

import torch
a = torch.ones((2, 7))
list_of_tensors_0 = torch.chunk(a, 3, dim=0)
list_of_tensors_1 = torch.chunk(a, 3, dim=1)
print(list_of_tensors_0)
print(list_of_tensors_1)

Output:

(tensor([[1., 1., 1., 1., 1., 1., 1.]]), tensor([[1., 1., 1., 1., 1., 1., 1.]]))
(tensor([[1., 1., 1.],
        [1., 1., 1.]]), tensor([[1., 1., 1.],
        [1., 1., 1.]]), tensor([[1.],
        [1.]]))

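In the example above, dim=0 (the row direction) has length 2, so it cannot be split into 3 parts and only two parts are returned, one per row. dim=1 (the column direction) has length 7, which cannot be split evenly into 3 parts: each chunk takes \(\lceil 7/3 \rceil = 3\) columns, so the first two parts have 3 columns each and the last part has only \(7 - 2\times3=1\) column.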
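
The chunk size can be reproduced with a ceiling division. The sketch below assumes the rule "chunk size = ceil(length / chunks)" described above and compares it with what torch.chunk() returns.

```python
import math
import torch

a = torch.ones((2, 7))
chunks = 3

chunk_size = math.ceil(a.size(1) / chunks)        # ceil(7 / 3) = 3
parts = torch.chunk(a, chunks, dim=1)
widths = [p.size(1) for p in parts]
print(chunk_size, widths)

# Along dim=0 the length (2) is smaller than chunks (3): only 2 parts come back.
rows = [p.size(0) for p in torch.chunk(a, chunks, dim=0)]
print(rows)
```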

`torch.split()`

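Splits a tensor along dimension dim and returns a tuple of tensors.

torch.split(tensor,
           split_size_or_sections,
           dim=0)

tensor: the tensor to split
split_size_or_sections: if an int, the length of each piece along dim (the last piece may be shorter); note that torch.chunk() takes the number of pieces instead. If a list, the lengths of the individual pieces, which must sum to the size of dim
dim: the dimension along which to split

For example: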

import torch
a = torch.ones((2, 5))
list_of_tensors = torch.split(a, [2, 1, 2], dim=1)
for i in list_of_tensors:
    print(i)

Output:

tensor([[1., 1.],
        [1., 1.]])
tensor([[1.],
        [1.]])
tensor([[1., 1.],
        [1., 1.]])

Another example:

import torch
a = torch.ones((2, 5))
list_of_tensors = torch.split(a, 2, dim=1)
for i in list_of_tensors:
    print(i)

Output:

tensor([[1., 1.],
        [1., 1.]])
tensor([[1., 1.],
        [1., 1.]])
tensor([[1.],
        [1.]])
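
Because the integer argument means "size of each piece" for torch.split() but "number of pieces" for torch.chunk(), the same number gives different results. A small sketch on an illustrative tensor:

```python
import torch

a = torch.arange(10).reshape(2, 5)

by_size = torch.split(a, 2, dim=1)    # pieces of (at most) 2 columns each
by_count = torch.chunk(a, 2, dim=1)   # (at most) 2 pieces

split_widths = [t.size(1) for t in by_size]
chunk_widths = [t.size(1) for t in by_count]
print(split_widths)
print(chunk_widths)

# Concatenating the pieces along the same dimension restores the original.
print(torch.equal(torch.cat(by_size, dim=1), a))
```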

`torch.index_select()`

Selects entries along dimension dim according to index; only the slices of dim listed in index are kept.

torch.index_select(input,
                  dim,
                  index,
                  out=None)

input: the tensor to index
dim: the dimension to index along
index: the positions to select (a 1-D torch.Tensor of an integer dtype such as torch.long)

For example:

import torch
a = torch.randint(0, 9, size=(3, 3))
idx = torch.tensor([0, 2], dtype = torch.long)
a_select = torch.index_select(a, dim=0, index = idx)
print(a)
print(a_select)

Output:

tensor([[0, 5, 1],
        [7, 8, 7],
        [0, 4, 6]])
tensor([[0, 5, 1],
        [0, 4, 6]])

In the example above, index = torch.tensor([0, 2], dtype = torch.long) and dim = 0, which selects the first and third rows of the 2-D matrix.

`torch.masked_select()`

Selects the elements at the positions where mask is True; returns a 1-D tensor.

torch.masked_select(input,
                   mask,
                   out=None)

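input: the tensor to index
mask: a boolean tensor that is broadcastable with input (in the simplest case, the same shape as input)

For example: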

import torch
a = torch.randint(0, 9, (3, 3))
print(a)
mask = a.ge(5) # True where the element of a is >= 5, False elsewhere
print(mask)
a_select = torch.masked_select(a, mask)
print(a_select)

Output:

tensor([[1, 6, 5],
        [6, 8, 0],
        [1, 1, 1]])
tensor([[False,  True,  True],
        [ True,  True, False],
        [False, False, False]])
tensor([6, 5, 6, 8])

Reshaping tensors

`reshape()`

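Changes the shape of a tensor.

.reshape(*size)

When the original tensor is contiguous in memory, .reshape() returns a view that shares memory with it, so changing one also changes the other; otherwise the data is copied.

When -1 appears in *size, that dimension is not fixed by the caller; its length is inferred from the other dimensions and the total number of elements.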

import torch
a = torch.arange(1, 10)
b = a.reshape(3, 3)
print(f"before:\na = {a}\nb = {b}")
a[0] = 6
print(f"after:\na = {a}\nb = {b}")

Output:

before:
a = tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
after:
a = tensor([6, 2, 3, 4, 5, 6, 7, 8, 9])
b = tensor([[6, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
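
The two remarks above (inference of -1, and memory sharing only for contiguous input) can be sketched as follows. The tensors are illustrative; data_ptr() is used here only to compare storage addresses.

```python
import torch

a = torch.arange(12)
b = a.reshape(3, -1)                   # -1 is inferred as 12 / 3 = 4
print(b.shape)

# Contiguous input: reshape returns a view on the same memory.
shares_memory = b.data_ptr() == a.data_ptr()
print(shares_memory)

# Non-contiguous input (a transposed view): reshape has to copy the data.
t = b.t()                              # shape (4, 3), not contiguous
c = t.reshape(12)
copied = c.data_ptr() != a.data_ptr()
print(t.is_contiguous(), copied)

c[0] = 100                             # writing to the copy leaves a untouched
print(a[0])
```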

Besides calling the reshape() method on a tensor, we can also use torch.reshape():

torch.reshape(input, shape)

input: the tensor to reshape
shape: the shape of the new tensor

For example:

import torch
a = torch.randperm(8)
a_reshape = torch.reshape(a, (2, 4))
print(a)
print(a_reshape)

Output:

tensor([7, 4, 6, 3, 2, 0, 1, 5])
tensor([[7, 4, 6, 3],
        [2, 0, 1, 5]])

`resize_()`

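Changes the shape of the tensor itself; an in-place method. If the new shape holds more elements than the original tensor, the additional elements are uninitialized.

For example: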

import torch
a = torch.randperm(8)
a.resize_(2, 4)
print(a)

Output:

tensor([[1, 0, 6, 7],
        [3, 5, 2, 4]])

`resize()`

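Changes the shape of a tensor and returns the result, leaving the tensor itself unmodified. The non-in-place resize() is deprecated and emits a warning in recent PyTorch versions; reshape() is the usual replacement.

For example: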

import torch
a = torch.randperm(8)
b = a.resize(2, 4)
print(a)
print(b)

Output:

tensor([1, 5, 2, 3, 6, 0, 7, 4])
tensor([[1, 5, 2, 3],
        [6, 0, 7, 4]])

`torch.transpose()`

Swaps two dimensions of a tensor.

torch.transpose(input,
               dim0,
               dim1)

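input: the tensor to transform
dim0: the first dimension to swap
dim1: the second dimension to swap

That is, dimension dim0 and dimension dim1 of the tensor are exchanged.

For example: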

import torch
a = torch.arange(9)
a = torch.reshape(a, (3, 3))
print(a)
b = torch.transpose(a, 0, 1)
print(b)

Output (equivalent to the matrix transpose):

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
tensor([[0, 3, 6],
        [1, 4, 7],
        [2, 5, 8]])

Another example:

import torch
a = torch.arange(24)
a = torch.reshape(a, (2, 3, 4))
print(a)
b = torch.transpose(a, 0, 1)
print(b)

Output:

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])
tensor([[[ 0,  1,  2,  3],
         [12, 13, 14, 15]],

        [[ 4,  5,  6,  7],
         [16, 17, 18, 19]],

        [[ 8,  9, 10, 11],
         [20, 21, 22, 23]]])

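In the 3-D example above, dim0 = 0 is the depth and dim1 = 1 is the row of a matrix. Swapping them turns the rows of one matrix into the same row of different matrices; which row they land in depends on which matrix they came from.

In the 2-D example above, dim0 = 0 is the row of the matrix and dim1 = 1 is the element within a row. Swapping them turns all elements of one row into the elements at the same position of different rows; the position depends on which row they came from. Seen as a whole, this is the matrix transpose.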

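In image processing we sometimes receive an image as \(C\times H\times W\) (channels \(\times\) height \(\times\) width) and need to convert it to \(H\times W\times C\). A single torch.transpose() call swaps only two dimensions, so this takes two calls (for example, swapping dimensions 0 and 1 and then dimensions 1 and 2), or one call to permute(1, 2, 0).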
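
The following sketch illustrates the conversion on a small stand-in "image" (the sizes C=2, H=3, W=4 are arbitrary assumptions). It also shows that one transpose(0, 2) would produce \(W\times H\times C\) rather than the desired layout.

```python
import torch

img = torch.arange(24).reshape(2, 3, 4)                 # C=2, H=3, W=4

hwc_two_swaps = img.transpose(0, 1).transpose(1, 2)     # CHW -> HCW -> HWC
hwc_permute = img.permute(1, 2, 0)                      # CHW -> HWC in one call

print(hwc_permute.shape)
print(torch.equal(hwc_two_swaps, hwc_permute))
print(img.transpose(0, 2).shape)                        # W x H x C, not H x W x C
```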

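We can see what a dimension of a tensor means by varying the index along that dimension. For example, for a 3-D tensor a, a[0] and a[1] are both matrices, so dimension 0 is the depth.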

`torch.squeeze()`

Removes dimensions (axes) of length 1.

torch.squeeze(input,
             dim=None,
             out=None)

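dim: if None, every axis of length 1 is removed; if given, that axis is removed only if its length is 1, otherwise the tensor is returned unchanged

For example: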

import torch
a = torch.rand(1, 2, 3, 1)
print(f"a = {a}, shape = {a.shape}")
b = torch.squeeze(a)
print(f"b = {b}, shape = {b.shape}")

Output:

a = tensor([[[[0.6342],
          [0.5454],
          [0.4946]],

         [[0.1255],
          [0.7076],
          [0.7710]]]]), shape = torch.Size([1, 2, 3, 1])
b = tensor([[0.6342, 0.5454, 0.4946],
        [0.1255, 0.7076, 0.7710]]), shape = torch.Size([2, 3])

Another example:

import torch
a = torch.rand(1, 1)
print(f"a = {a}, shape = {a.shape}")
b = torch.squeeze(a)
print(f"b = {b}, shape = {b.shape}")

Output:

a = tensor([[0.8574]]), shape = torch.Size([1, 1])
b = 0.8574031591415405, shape = torch.Size([])

As these examples show, torch.squeeze() removes redundant (nested) dimensions of length 1.

`torch.unsqueeze()`

Inserts a new dimension of length 1 at position dim.

torch.unsqueeze(input,
               dim,
               out=None)

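For example:

import torch
a = torch.arange(4).reshape(2, 2)
b = torch.unsqueeze(a, dim=0)  # new axis in front: shape (1, 2, 2)
c = torch.unsqueeze(a, dim=1)  # new axis in the middle: shape (2, 1, 2)
d = torch.unsqueeze(a, dim=2)  # new axis at the end: shape (2, 2, 1)
print(f"a = {a}, shape = {a.shape}")
print(f"b = {b}, shape = {b.shape}")
print(f"c = {c}, shape = {c.shape}")
print(f"d = {d}, shape = {d.shape}")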

Output:

a = tensor([[0, 1],
        [2, 3]]), shape = torch.Size([2, 2])
b = tensor([[[0, 1],
         [2, 3]]]), shape = torch.Size([1, 2, 2])
c = tensor([[[0, 1]],

        [[2, 3]]]), shape = torch.Size([2, 1, 2])
d = tensor([[[0],
         [1]],

        [[2],
         [3]]]), shape = torch.Size([2, 2, 1])

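Tensor computations

Besides the torch.*() functions in the table below, the same operations are available as .*() methods on a tensor, e.g. a.mm(a) (right-multiplying matrix a by matrix a).

Method	Purpose
`torch.t()`	Matrix transpose
`torch.mm()`	Matrix multiplication (2-D only)
`torch.matmul()`	Matrix multiplication (with batching and broadcasting)
`torch.inverse()`	Matrix inverse
`torch.trace()`	Matrix trace
`torch.linalg.matrix_rank()`	Matrix rank
`torch.dot()`	Dot product of two vectors
`torch.mv()`	Matrix-vector product
`torch.norm()`	Norm of a matrix or vector (deprecated)
`torch.linalg.vector_norm()`	Vector norm
`torch.linalg.matrix_norm()`	Matrix norm
`torch.linalg.norm()`	Vector or matrix norm
`torch.allclose()`	Tests whether the elements of two tensors are close
`torch.eq()`	Elementwise test for equality
`torch.equal()`	Tests whether two tensors are exactly equal
`torch.ge()`	Elementwise greater than or equal to
`torch.gt()`	Elementwise greater than
`torch.le()`	Elementwise less than or equal to
`torch.lt()`	Elementwise less than
`torch.ne()`	Elementwise not equal to
`torch.pow()`	Elementwise power
`torch.exp()`	Elementwise natural exponential
`torch.log()`	Elementwise natural logarithm
`torch.log2()`	Elementwise base-2 logarithm
`torch.log10()`	Elementwise base-10 logarithm
`torch.sqrt()`	Elementwise square root
`torch.rsqrt()`	Elementwise reciprocal of the square root
`torch.clamp_max()`	Clamps to a maximum value
`torch.clamp_min()`	Clamps to a minimum value
`torch.clamp()`	Clamps to a range
`torch.abs()`	Elementwise absolute value
`torch.max()`	Maximum of a tensor
`torch.argmax()`	Position of the maximum of a tensor
`torch.min()`	Minimum of a tensor
`torch.argmin()`	Position of the minimum of a tensor
`torch.sort()`	Sorts a tensor
`torch.topk()`	The k largest/smallest values of a tensor and their positions
`torch.kthvalue()`	The k-th smallest value of a tensor and its position
`torch.mean()`	Mean of a tensor
`torch.sum()`	Sum of a tensor
`torch.cumsum()`	Cumulative sum along a dimension
`torch.median()`	Median (along a dimension)
`torch.prod()`	Product (along a dimension)
`torch.cumprod()`	Cumulative product along a dimension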

Computations on 2-D matrices

Matrix transpose

torch.t(input)

input: a 2-D tensor

This is equivalent to torch.transpose(input, 0, 1).

Matrix multiplication

torch.mm(input,
        mat2,
        *,
        out=None)

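input: the 2-D tensor on the left of the product
mat2: the 2-D tensor on the right of the product; it must have the same dtype as input, and its number of rows must equal the number of columns of input. A floating-point dtype such as torch.float is the safe choice, since integer matrix multiplication is not supported on every device

For example: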

import torch
a = torch.eye(3, dtype = torch.float)
print(a)
b = torch.reshape(torch.arange(1, 10, dtype = torch.float), (3, 3))
print(b)
print(f"a·b = {torch.mm(a, b)}")

Output:

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
a·b = tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])

Besides torch.mm(), we can also use the .matmul() method or torch.matmul() (neither is in-place):

torch.matmul(input,
            other,
            *,
            out=None)

For example:

import torch
a = torch.eye(3)
b = torch.arange(1, 10, dtype = torch.float).resize_(3, 3)
print(a.matmul(b))
print(a)
print(torch.matmul(a, b))

Output:

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])

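Unlike torch.mm(), which requires both tensors to be 2-D matrices, torch.matmul() has no such restriction: for higher-dimensional inputs it multiplies the last two dimensions as matrices and treats the leading dimensions as batch dimensions, which are broadcast. For example: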

import torch
a = torch.eye(3, 3).resize_(1, 3, 3)
b = torch.arange(1, 10, dtype = torch.float).resize_(1, 3, 3)
print(f"a = {a}\nb = {b}")
try:
    print(torch.mm(a, b))
except Exception as e:
    print(e)

try:
    print(torch.matmul(a, b))
except Exception as e:
    print(e)

Output:

a = tensor([[[1., 0., 0.],
         [0., 1., 0.],
         [0., 0., 1.]]])
b = tensor([[[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.]]])
self must be a matrix
tensor([[[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.]]])
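
To make the batch behaviour concrete, the sketch below multiplies a batch of five \(2\times3\) matrices by a single \(3\times4\) matrix and compares the result with per-matrix torch.mm() calls. Shapes and the seed are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
a = torch.rand(5, 2, 3)        # a batch of five 2x3 matrices
w = torch.rand(3, 4)           # one 3x4 matrix, broadcast over the batch

d = torch.matmul(a, w)         # result has shape (5, 2, 4)
print(d.shape)

# The same result, computed one matrix at a time with torch.mm().
expected = torch.stack([torch.mm(a[i], w) for i in range(a.size(0))])
print(torch.allclose(d, expected, atol=1e-6))
```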

Matrix inverse

torch.inverse(input,
             *,
             out=None)

input: the matrix to invert (a square matrix of a floating-point dtype such as torch.float or torch.float64)

For example:

import torch
a = torch.rand((3, 3))
print(a)
b = torch.inverse(a)
print(b)
print(torch.mm(a, b))

Output:

tensor([[0.1048, 0.9454, 0.6859],
        [0.6498, 0.3000, 0.8390],
        [0.9779, 0.5697, 0.5404]])
tensor([[-0.6818, -0.2594,  1.2680],
        [ 1.0132, -1.3257,  0.7723],
        [ 0.1657,  1.8669, -1.2582]])
tensor([[ 1.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  1.0000e+00,  0.0000e+00],
        [-5.2154e-08,  0.0000e+00,  1.0000e+00]])

(Floating-point rounding can leave a tiny nonzero value where an exact 0 is expected.)
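
Because of this rounding, a product such as a·a⁻¹ is better compared with the identity using a tolerance than with exact equality. A minimal sketch; adding 3·I to the random matrix is an assumption made here only to guarantee a well-conditioned, invertible input.

```python
import torch

torch.manual_seed(0)
a = torch.rand(3, 3) + 3 * torch.eye(3)   # diagonally dominant, hence invertible
b = torch.inverse(a)
identity = torch.eye(3)

product = torch.mm(a, b)
print(torch.equal(product, identity))                  # exact comparison may fail
print(torch.allclose(product, identity, atol=1e-5))    # comparison with tolerance
```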

Matrix trace

Computes the trace of a matrix (the sum of its diagonal elements); the result is a tensor.

torch.trace(input)

For example:

import torch
a = torch.eye(3)
print(torch.trace(a))

Output:

tensor(3.)

Matrix rank

torch.linalg.matrix_rank()

The input tensor must be of a floating-point dtype such as torch.float or torch.double.

For example:

import torch
a = torch.arange(1, 10, dtype = torch.float).resize_(3, 3)
print(torch.linalg.matrix_rank(a))

Output:

tensor(2)

Dot product of two vectors

torch.dot()

For example:

import torch

a = torch.arange(6, dtype=torch.float32)
b = torch.ones(6, dtype=torch.float)
print(torch.dot(a, b))

Output:

tensor(15.)

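Matrix-vector product

torch.mv(input, vec)

input: a 2-D tensor (matrix) of shape (m, n)
vec: a 1-D tensor (vector) of length n

A matrix-vector product treats the vector as a column vector and left-multiplies it by the matrix, so it is ordinary matrix multiplication: it can be replaced by torch.matmul(), or by reshaping the vector into an \(n\times1\) matrix and using torch.mm().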

import torch

A = torch.arange(1, 21).reshape(4, 5)
x = torch.randint(1, 6, (1, 5)).squeeze()
print(f'A = {A}\nx = {x}\n')

print(f'Ax = {torch.mv(A, x)}')
print(f'Ax = {torch.matmul(A, x)}')

Output:

A = tensor([[ 1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20]])
x = tensor([3, 3, 5, 1, 1])

Ax = tensor([ 33,  98, 163, 228])
Ax = tensor([ 33,  98, 163, 228])

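Matrix norms

torch.norm(input, p='fro', dim=None, keepdim=False, out=None, dtype=None)

input: the tensor whose norm is computed; it must be of a floating-point dtype.
p: the type of norm. One of:
- None or 'fro': Frobenius norm (the default); for a vector this equals the L2 norm.
- inf: infinity norm.
- 1: L1 norm.
- 2: L2 norm (Euclidean norm).
- other real numbers: p-norm.
- 'nuc': nuclear norm, for matrices only.
dim: the dimension(s) over which the norm is computed. If None, the norm of the whole tensor is computed.
keepdim: if True, the dimension(s) dim are kept in the output tensor.
out: the output tensor.
dtype: the data type of the output tensor.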

import torch

a = torch.arange(1, 10, dtype=torch.float).reshape(3, 3)

print(a.norm()) # Frobenius norm of the matrix

Output:

tensor(16.8819)

torch.norm() is deprecated and may be removed in a future PyTorch release. Use torch.linalg.vector_norm() for vector norms and torch.linalg.matrix_norm() for matrix norms, or torch.linalg.norm() for similar combined behaviour.

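Vector norm:

torch.linalg.vector_norm(x, ord=2, dim=None, keepdim=False, *, dtype=None, out=None)

x (Tensor): the input tensor; with dim=None it is flattened and treated as a vector.
ord (int, float, inf, -inf, optional): the order of the norm. One of:
- 2: the default, the L2 (Euclidean) norm.
- other int or float values p: the Lp norm, sum(abs(x)^p)^(1/p).
- inf: max(abs(x)).
- -inf: min(abs(x)).
dim (int, list of ints, optional): the dimension(s) over which the norm is computed. If None, the norm of the whole (flattened) input is computed.
keepdim (bool, optional): whether to keep the reduced dimensions in the output. Default: False.
dtype (torch.dtype, optional): the data type of the returned tensor.
out (Tensor, optional): the output tensor.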
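
A short sketch of these orders on the vector (3, -4), whose norms are easy to verify by hand; the variable names are illustrative.

```python
import torch

x = torch.tensor([3.0, -4.0])

l2 = torch.linalg.vector_norm(x)                          # sqrt(9 + 16) = 5
l1 = torch.linalg.vector_norm(x, ord=1)                   # |3| + |-4| = 7
linf = torch.linalg.vector_norm(x, ord=float("inf"))      # max(|3|, |-4|) = 4
lminf = torch.linalg.vector_norm(x, ord=float("-inf"))    # min(|3|, |-4|) = 3
print(l2, l1, linf, lminf)

# With dim=None a matrix is flattened, so the default order matches the Frobenius norm.
m = torch.arange(1, 10, dtype=torch.float).reshape(3, 3)
print(torch.linalg.vector_norm(m), torch.linalg.matrix_norm(m, "fro"))
```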

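Matrix norm:

torch.linalg.matrix_norm(A, ord='fro', dim=(-2, -1), keepdim=False, *, dtype=None, out=None)

A (Tensor): the input matrix tensor, at least 2-D.
ord (int, inf, -inf, 'fro', 'nuc', optional): the type of norm. One of:
- 'fro': Frobenius norm (the default).
- 'nuc': nuclear norm (sum of the singular values).
- inf: maximum absolute row sum.
- -inf: minimum absolute row sum.
- 1 / -1: maximum / minimum absolute column sum.
- 2 / -2: largest / smallest singular value.
dim (tuple of two ints, optional): the two dimensions over which the norm is computed. Default: (-2, -1), the last two dimensions.
keepdim (bool, optional): whether to keep the reduced dimensions in the output. Default: False.
dtype (torch.dtype, optional): the data type of the returned tensor.
out (Tensor, optional): the output tensor.

For example: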

import torch

# Create a 3x3 matrix
a = torch.arange(1, 10, dtype=torch.float).reshape(3, 3)

# Frobenius norm
fro_norm = torch.linalg.matrix_norm(a, 'fro')
print(f"Frobenius norm: {fro_norm}")

# L2 norm (spectral norm)
l2_norm = torch.linalg.matrix_norm(a, 2)
print(f"L2 norm: {l2_norm}")

# Nuclear norm
nuc_norm = torch.linalg.matrix_norm(a, 'nuc')
print(f"Nuclear norm: {nuc_norm}")

Output:

Frobenius norm: 16.881942749023438
L2 norm: 16.84810447692871
Nuclear norm: 17.916473388671875

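torch.linalg.norm() computes either a vector norm or a matrix norm:

torch.linalg.norm(A, ord=None, dim=None, keepdim=False, *, out=None, dtype=None)

A (Tensor): the input vector or matrix tensor.
ord (int, float, inf, -inf, 'fro', 'nuc', optional): the type of norm. One of:
- 'fro': Frobenius norm (matrices only).
- 'nuc': nuclear norm (matrices only).
- None: the default; equals 'fro' for a matrix and 2 for a vector.
- int or float: the order of the norm (for matrices only 1, -1, 2 and -2 are supported).
- inf: for a vector, max(abs(x)); for a matrix, the maximum absolute row sum.
- -inf: for a vector, min(abs(x)); for a matrix, the minimum absolute row sum.
dim (int, list of ints, optional): the dimension(s) over which the norm is computed. If None, the norm of the whole input is computed.
keepdim (bool, optional): whether to keep the reduced dimensions in the output. Default: False.
out (Tensor, optional): the output tensor.
dtype (torch.dtype, optional): the data type of the returned tensor.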
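
The sketch below shows how the same ord value is interpreted differently for vectors and matrices. The matrix [[1, 2], [3, 4]] is chosen only because its row and column sums are easy to check by hand.

```python
import torch

v = torch.tensor([3.0, 4.0])
m = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

vec_default = torch.linalg.norm(v)                    # vector 2-norm: 5
mat_default = torch.linalg.norm(m)                    # Frobenius norm: sqrt(30)
mat_one = torch.linalg.norm(m, ord=1)                 # max column sum: max(4, 6) = 6
mat_inf = torch.linalg.norm(m, ord=float("inf"))      # max row sum: max(3, 7) = 7
flat_one = torch.linalg.vector_norm(m, ord=1)         # flattened L1 norm: 1+2+3+4 = 10

print(vec_default, mat_default, mat_one, mat_inf, flat_one)
```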

Comparisons

Testing whether two tensors are close

torch.allclose(input,
              other,
              rtol = 1e-05,
              atol = 1e-08,
              equal_nan=False)

Two tensors \(A\) (input) and \(B\) (other) are considered close if every pair of elements satisfies: \[ |A - B| \leq \mathrm{atol} + \mathrm{rtol}\times|B| \]

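input: the first tensor
other: a tensor with the same shape as input
rtol and atol are floats: rtol is the relative tolerance and atol the absolute tolerance
equal_nan: if True, two NaN values are considered close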
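
The tolerance test can be written out by hand and compared with torch.allclose(). This is a sketch of the documented inequality, not PyTorch's actual implementation; the helper name manual_allclose is hypothetical.

```python
import torch

def manual_allclose(a, b, rtol, atol):
    # Hypothetical helper: |a - b| <= atol + rtol * |b| for every element.
    return bool(((a - b).abs() <= atol + rtol * b.abs()).all())

a = torch.tensor([10.0, 1.0])
b = torch.tensor([10.1, 1.0])

loose = (manual_allclose(a, b, 0.1, 0.01), torch.allclose(a, b, rtol=0.1, atol=0.01))
strict = (manual_allclose(a, b, 1e-5, 1e-8), torch.allclose(a, b, rtol=1e-5, atol=1e-8))
print(loose)
print(strict)
```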

For example:

import torch
a = torch.tensor([10.0])
b = torch.tensor([10.1])
print(torch.allclose(a, b, rtol = 1e-5, atol = 1e-8))
print(torch.allclose(a, b, rtol = 0.1, atol = 0.01))

Output:

False
True

Another example:

import torch
a = torch.tensor(float("nan"))
print(a)
print(torch.allclose(a, a, equal_nan=False))
print(torch.allclose(a, a, equal_nan=True))

Output:

tensor(nan)
False
True

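This function is very useful when debugging and validating deep-learning implementations, for example to check whether a model's output matches the expected values, or whether model parameters on different devices agree during distributed training.

Elementwise test for equality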

torch.eq(input,
        other,
        *,
        out=None)

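input: the tensor to compare
other: the other tensor (or a number) to compare with; its shape need not equal that of input, but the two must be broadcastable

For example: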

import torch
a = torch.arange(1, 7)
print(a)
b = torch.arange(1, 7)
c = torch.unsqueeze(b, dim=0)
print(c)
print(torch.eq(a, c))

Output:

tensor([1, 2, 3, 4, 5, 6])
tensor([[1, 2, 3, 4, 5, 6]])
tensor([[True, True, True, True, True, True]])

Another example:

import torch
a = torch.randperm(9).resize_(3, 3)
b = torch.arange(9)
print(a)
print(b)
try:
    print(torch.eq(a, b))
except Exception as e:
    print(e)

Output:

tensor([[7, 2, 5],
        [1, 0, 4],
        [8, 3, 6]])
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8])
The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 1

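So torch.eq() does not flatten the tensors and compare their elements from start to end. It follows the broadcasting rules: shapes are aligned from the trailing dimension, each pair of sizes must be equal or one of them must be 1 (missing leading dimensions count as 1), and the output has the broadcast shape: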

import torch
a = torch.randperm(9).resize_(1, 3, 1, 3)
b = torch.arange(9)
print(a)
print(b)
try:
    print(torch.eq(a, b.resize_(3, 3)))
except Exception as e:
    print(e)

Output:

tensor([[[[2, 5, 1]],

         [[0, 3, 7]],

         [[4, 6, 8]]]])
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8])
tensor([[[[False, False, False],
          [False, False, False],
          [False, False, False]],

         [[ True, False, False],
          [False, False, False],
          [False, False, False]],

         [[False, False, False],
          [False, False, False],
          [False, False,  True]]]])
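
The shapes involved in the two examples above can be checked directly. The sketch uses zero tensors, since only the shapes matter here.

```python
import torch

a = torch.zeros(1, 3, 1, 3)
b = torch.zeros(3, 3)
out_shape = torch.eq(a, b).shape     # (1, 3, 1, 3) with (3, 3) broadcasts to (1, 3, 3, 3)
print(out_shape)

c = torch.zeros(9)
broadcast_failed = False
try:
    torch.eq(b, c)                   # trailing sizes 3 and 9 are incompatible
except RuntimeError:
    broadcast_failed = True
print(broadcast_failed)
```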

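Testing whether two tensors have the same shape and elements

torch.equal(input, other)

input and other: the two tensors to compare
The result is a single True or False: False is returned as soon as one pair of corresponding elements differs or the shapes differ

For example: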

import torch
a = torch.randperm(9)
b = torch.randperm(9)
c = torch.unsqueeze(b, dim=0)
print(torch.equal(a, b))
print(torch.equal(a, c))
print(torch.eq(a, b))
print(torch.eq(a, c))

Output:

False
False
tensor([ True,  True, False, False,  True, False,  True,  True, False])
tensor([[ True,  True, False, False,  True, False,  True,  True, False]])

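Elementwise greater than or equal to

torch.ge(input,
        other,
        *,
        out=None)
# ge: greater than or equal to

input: the tensor to compare
other: a Python number (int, float, bool) or a tensor; a number, or a tensor with a dimension of length 1, is broadcast
Each position of the result is True where the element of input is greater than or equal to the corresponding element of other, and False otherwise

For example: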

import torch
a = torch.randperm(10).reshape(2, 5)
b = torch.tensor([[4], [0]])  # shape (2, 1): one threshold per row, broadcast across the columns
print(f"a = {a}")
print(f"b = {b}")
print(f"torch.ge(a, b) = {torch.ge(a, b)}")

Output:

a = tensor([[4, 7, 1, 6, 9],
        [2, 3, 5, 0, 8]])
b = tensor([[4],
        [0]])
torch.ge(a, b) = tensor([[ True,  True, False,  True,  True],
        [ True,  True,  True,  True,  True]])

Another example:

import torch
a = torch.randperm(10).resize_(2, 5)
b = torch.randperm(10).resize_(2, 5)
print(torch.ge(a, b))
print(torch.ge(a, 5))

Output:

tensor([[ True, False,  True,  True, False],
        [False, False,  True,  True,  True]])
tensor([[ True, False,  True,  True, False],
        [ True, False, False,  True, False]])

Elementwise greater than

torch.gt(input,
        other,
        *,
        out=None)
# gt: greater than

Usage is the same as for torch.ge() above.

Elementwise less than or equal to

torch.le(input,
        other,
        *,
        out=None)
# le: less than or equal to

Usage is the same as for torch.ge() above.

Elementwise less than

torch.lt(input,
        other,
        *,
        out=None)
# lt: less than

Usage is the same as for torch.ge() above.

Elementwise not equal to

torch.ne(input,
        other,
        *,
        out=None)

Usage is the same as for torch.ge() above.

Elementwise test for NaN

torch.isnan(input)

For example:

import torch
a = torch.tensor([1., float("nan"), 2.])
print(torch.isnan(a))

Output:

tensor([False,  True, False])

Basic arithmetic

Elementwise operations

The basic operators +, -, *, /, // (floor division), % (remainder) and ** (power) are all elementwise operations.

For example:

import torch
a = torch.arange(1, 10).resize_(3, 3)
print(f"a = {a}")
print(f"a + a = {a+a}")
print(f"a * a = {a*a}")
print(f"a - a = {a-a}")
print(f"a**2 = {a**2}")
print(f"a%2 = {a%2}")
print(f"a//3 = {a//3}")

Output:

a = tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
a + a = tensor([[ 2,  4,  6],
        [ 8, 10, 12],
        [14, 16, 18]])
a * a = tensor([[ 1,  4,  9],
        [16, 25, 36],
        [49, 64, 81]])
a - a = tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
a**2 = tensor([[ 1,  4,  9],
        [16, 25, 36],
        [49, 64, 81]])
a%2 = tensor([[1, 0, 1],
        [0, 1, 0],
        [1, 0, 1]])
a//3 = tensor([[0, 0, 1],
        [1, 1, 2],
        [2, 2, 3]])

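Powers of a tensor

Elementwise powers can be computed with ** or torch.pow():

torch.pow(input,
         exponent,
         *,
         out=None)

input: the tensor used as the base
exponent: the exponent; a Python number (int, float, bool) or a tensor. A number, or a tensor with a dimension of length 1, is broadcast. If a tensor of the same shape is given, each element of input is raised to the corresponding element of exponent

For example: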

import torch
a = torch.arange(1, 10).resize_(3, 3)
print(torch.pow(a, 2))
print(torch.pow(a, a))

Output:

tensor([[ 1,  4,  9],
        [16, 25, 36],
        [49, 64, 81]])
tensor([[        1,         4,        27],
        [      256,      3125,     46656],
        [   823543,  16777216, 387420489]])

The pow() method can also be called directly on a tensor:

import torch
a = torch.arange(1, 10).reshape(3, 3)
b = torch.arange(3, 0, -1)
print(a)
print(b)
print(a.pow(b))

Output:

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
tensor([3, 2, 1])
tensor([[  1,   4,   3],
        [ 64,  25,   6],
        [343,  64,   9]])

Natural exponential function

torch.exp(input,
         *,
         out=None)

Computes \(e^x\) for each element \(x\) and returns a tensor of the same shape.

For example:

import torch
a = torch.arange(1, 10).resize_(3, 3)
print(torch.exp(a))

Output:

tensor([[2.7183e+00, 7.3891e+00, 2.0086e+01],
        [5.4598e+01, 1.4841e+02, 4.0343e+02],
        [1.0966e+03, 2.9810e+03, 8.1031e+03]])

Logarithm functions

\(\ln(x)\)

torch.log(input,
         *,
         out=None)

For example:

import torch
a = torch.arange(1, 10)
b = torch.exp(a)
print(torch.log(b))

Output:

tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])

\(\log_2(x)\)

torch.log2(input,
          *,
          out=None)

For example:

import torch
a = torch.arange(1, 9)
print(torch.log2(a))

Output:

tensor([0.0000, 1.0000, 1.5850, 2.0000, 2.3219, 2.5850, 2.8074, 3.0000])

\(\lg(x)\)

torch.log10(input,
           *,
           out=None)

For example:

import torch
a = torch.logspace(1, 10, 10)
print(torch.log10(a))

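Output:

tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

The logarithm functions also have in-place methods: .log_(), .log2_() and .log10_(). The tensor must be of a floating-point dtype such as torch.float or torch.double, because an in-place operation cannot change the dtype. For example: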

import torch
a = torch.arange(1, 11, dtype=torch.float)
b = a.clone()
c = a.clone()
d = a.clone()
b.log_()
c.log2_()
d.log10_()
print(b)
print(c)
print(d)

Output:

tensor([0.0000, 0.6931, 1.0986, 1.3863, 1.6094, 1.7918, 1.9459, 2.0794, 2.1972,
        2.3026])
tensor([0.0000, 1.0000, 1.5850, 2.0000, 2.3219, 2.5850, 2.8074, 3.0000, 3.1699,
        3.3219])
tensor([0.0000, 0.3010, 0.4771, 0.6021, 0.6990, 0.7782, 0.8451, 0.9031, 0.9542,
        1.0000])

Square root of a tensor

torch.sqrt(input,
          *,
          out=None)

For example:

import torch
a = torch.arange(1, 10)**2
print(torch.sqrt(a))

Output:

tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])

Reciprocal of the square root of a tensor

torch.rsqrt(input,
           *,
           out=None)

For example:

import torch
a = torch.pow(torch.arange(1, 11), 2)
b = torch.rsqrt(a)
print(a)
print(b)

Output:

tensor([  1,   4,   9,  16,  25,  36,  49,  64,  81, 100])
tensor([1.0000, 0.5000, 0.3333, 0.2500, 0.2000, 0.1667, 0.1429, 0.1250, 0.1111,
        0.1000])

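Clamping to a maximum value

torch.clamp_max(input,
               max,
               *,
               out=None)

max: a Python number (int, float) or a tensor; if the shapes differ, broadcasting is applied

The function takes the elementwise minimum of input and max (an element that does not exceed max keeps its input value; an element that exceeds max is replaced by the max value).

For example: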

import torch
a = torch.arange(1, 10).resize_(3, 3)
b = torch.randint(1, 10, (3, 3))
print(f"a = {a}\nb = {b}")
print(torch.clamp_max(a, b))

Output:

a = tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
b = tensor([[6, 5, 9],
        [2, 8, 5],
        [7, 6, 6]])
tensor([[1, 2, 3],
        [2, 5, 5],
        [7, 6, 6]])

Another example:

import torch
a = torch.arange(1, 10).resize_(3, 3)
print(torch.clamp_max(a, 5))

Output:

tensor([[1, 2, 3],
        [4, 5, 5],
        [5, 5, 5]])

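Clamping to a minimum value

torch.clamp_min(input,
               min,
               *,
               out=None)

The function takes the elementwise maximum of input and min (an element smaller than min is replaced by the min value; an element greater than or equal to min keeps its input value).

For example: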

import torch
a = torch.arange(1, 10).resize_(3, 3)
print(torch.clamp_min(a, 5))

Output:

tensor([[5, 5, 5],
        [5, 5, 6],
        [7, 8, 9]])

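The parameters work as described under clamping to a maximum value.

Clamping to a range

torch.clamp(input,
           min=None,
           max=None,
           *,
           out=None)

The parameters work as described under clamping to a maximum value.

For example: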

import torch
a = torch.arange(1, 10).resize_(3, 3)
print(torch.clamp(a, 3, 6))

Output:

tensor([[3, 3, 3],
        [4, 5, 6],
        [6, 6, 6]])
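
Clamping to a range amounts to clamping from below and then from above, or equivalently to an elementwise maximum followed by an elementwise minimum. A minimal sketch of that equivalence on the same input:

```python
import torch

a = torch.arange(1, 10).reshape(3, 3)

clamped = torch.clamp(a, min=3, max=6)
two_step = torch.clamp_max(torch.clamp_min(a, 3), 6)
via_min_max = torch.minimum(torch.maximum(a, torch.tensor(3)), torch.tensor(6))

print(clamped)
print(torch.equal(clamped, two_step), torch.equal(clamped, via_min_max))
```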

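Absolute value

torch.abs(input, *, out=None)

input: the input tensor; the absolute value of each element is taken.
out: optional; the tensor in which the result is stored. If given, the result is written directly into this tensor instead of a newly allocated one.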

import torch

a = torch.arange(-3, 4)
print(a)
print(a.abs())

Output:

tensor([-3, -2, -1,  0,  1,  2,  3])
tensor([3, 2, 1, 0, 1, 2, 3])

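Statistical computations

Maximum of a tensor

torch.max(input,
         dim,
         keepdim=False,
         *,
         out=None)

input: the tensor whose maximum is computed
dim: the dimension along which the maximum is taken (optional). Without dim, the maximum over all elements is returned; with dim, the result has the fields values and indices
keepdim: whether to keep the reduced dimension (with length 1) in the output

For example: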

import torch
a = torch.randint(1, 10, (3, 3))
print(f"a = {a}\n")
print(f"torch.max(a) = {torch.max(a)}\n")
print(f"torch.max(a, dim = 0) = {torch.max(a, dim = 0)}\n")
print(f"torch.max(a, dim = 0, keepdim = True) = {torch.max(a, dim = 0, keepdim = True)}\n")
print(f"torch.max(a, dim = 1) = {torch.max(a, dim = 1)}\n")
print(f"torch.max(a, dim = 1, keepdim = True) = {torch.max(a, dim = 1, keepdim = True)}")

Output:

a = tensor([[9, 4, 5],
        [5, 4, 1],
        [6, 6, 6]])

torch.max(a) = 9

torch.max(a, dim = 0) = torch.return_types.max(
values=tensor([9, 6, 6]),
indices=tensor([0, 2, 2]))

torch.max(a, dim = 0, keepdim = True) = torch.return_types.max(
values=tensor([[9, 6, 6]]),
indices=tensor([[0, 2, 2]]))

torch.max(a, dim = 1) = torch.return_types.max(
values=tensor([9, 5, 6]),
indices=tensor([0, 0, 0]))

torch.max(a, dim = 1, keepdim = True) = torch.return_types.max(
values=tensor([[9],
        [5],
        [6]]),
indices=tensor([[0],
        [0],
        [0]]))

Another example:

import torch
a = torch.randint(1, 10, (3, 3))
print(f"a = {a}\n")
b = torch.max(a, dim = 0)
print(f"b = {b}\n")
print(f"b.values = {b.values}\n")
print(f"b.indices = {b.indices}")

Output:

a = tensor([[3, 2, 7],
        [5, 7, 6],
        [9, 1, 7]])

b = torch.return_types.max(
values=tensor([9, 7, 7]),
indices=tensor([2, 1, 0]))

b.values = tensor([9, 7, 7])

b.indices = tensor([2, 1, 0])

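The .max() method can be used as well.

Position of the maximum of a tensor

torch.argmax(input,
            dim=None,
            keepdim=False,
            *,
            out=None)

The parameters are used as in torch.max() above. Without dim, the returned index refers to the flattened tensor.

For example: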

import torch
a = torch.randint(1, 10, (3, 3))
print(f"a = {a}")
print(torch.argmax(a, dim=0))

Output:

a = tensor([[1, 7, 3],
        [2, 3, 3],
        [5, 5, 6]])
tensor([2, 0, 2])

Another example:

import torch
a = torch.randint(1, 10, (3, 3))
print(f"a = {a}")
print(torch.argmax(a))

Output:

a = tensor([[4, 5, 4],
        [5, 5, 2],
        [7, 8, 6]])
tensor(7)
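
In the last example the result 7 is an index into the flattened tensor. For a 2-D tensor it can be converted back to a (row, column) pair with integer division by the number of columns, as in this small sketch that reuses the same matrix:

```python
import torch

a = torch.tensor([[4, 5, 4],
                  [5, 5, 2],
                  [7, 8, 6]])

flat_index = torch.argmax(a).item()            # index into the flattened tensor
row, col = divmod(flat_index, a.size(1))       # 7 = 2 * 3 + 1
print(flat_index, (row, col), a[row, col])
```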

The .argmax() method can be used as well.

Minimum of a tensor

torch.min(input,
         dim,
         keepdim=False,
         *,
         out=None)

See torch.max() above.

The .min() method can be used as well.

Position of the minimum of a tensor

torch.argmin(input,
            dim=None,
            keepdim=False,
            *,
            out=None)

See torch.argmax() above.

The .argmin() method can be used as well.

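Sorting

torch.sort(input,
          dim=-1,
          descending=False,
          *,
          out=None)

input: the tensor to sort
dim: the dimension along which to sort. The default is -1, i.e. the last dimension
descending: whether to sort in descending order

The result is of type torch.return_types.sort and has the two fields values and indices: values is the sorted tensor, and indices gives, for each sorted element, its original position along the sorted dimension.

For example: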

import torch
a = torch.randperm(16).reshape(4, 4)
print(f"a = {a}\n")
b = torch.sort(a, dim = 0)
c = torch.sort(a, dim = 1)
print(f"b = {b}\n\nb.values = {b.values}\n\nb.indices = {b.indices}\n")
print(f"c = {c}\n\nc.values = {c.values}\n\nc.indices = {c.indices}\n")

Output:

a = tensor([[13,  4, 11, 12],
        [10,  3,  0,  7],
        [ 6, 14,  5,  8],
        [ 1,  2, 15,  9]])

b = torch.return_types.sort(
values=tensor([[ 1,  2,  0,  7],
        [ 6,  3,  5,  8],
        [10,  4, 11,  9],
        [13, 14, 15, 12]]),
indices=tensor([[3, 3, 1, 1],
        [2, 1, 2, 2],
        [1, 0, 0, 3],
        [0, 2, 3, 0]]))

b.values = tensor([[ 1,  2,  0,  7],
        [ 6,  3,  5,  8],
        [10,  4, 11,  9],
        [13, 14, 15, 12]])

b.indices = tensor([[3, 3, 1, 1],
        [2, 1, 2, 2],
        [1, 0, 0, 3],
        [0, 2, 3, 0]])

c = torch.return_types.sort(
values=tensor([[ 4, 11, 12, 13],
        [ 0,  3,  7, 10],
        [ 5,  6,  8, 14],
        [ 1,  2,  9, 15]]),
indices=tensor([[1, 2, 3, 0],
        [2, 1, 3, 0],
        [2, 0, 3, 1],
        [0, 1, 3, 2]]))

c.values = tensor([[ 4, 11, 12, 13],
        [ 0,  3,  7, 10],
        [ 5,  6,  8, 14],
        [ 1,  2,  9, 15]])

c.indices = tensor([[1, 2, 3, 0],
        [2, 1, 3, 0],
        [2, 0, 3, 1],
        [0, 1, 3, 2]])

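Note: for a 2-D tensor (matrix), dim = 0 is the row dimension, so elements at the same position of all rows are sorted against each other; the effect of dim = 0 is therefore to sort every column. dim = 1 is the column dimension, i.e. the elements within a row, so those elements are sorted against each other; the effect of dim = 1 is to sort every row.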
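
The meaning of indices can be checked with torch.gather(): picking the elements of the original tensor at those positions along the sorted dimension reproduces values. A sketch with an arbitrary seed:

```python
import torch

torch.manual_seed(0)
a = torch.randperm(16).reshape(4, 4)

by_col = torch.sort(a, dim=0)        # sorts every column
by_row = torch.sort(a, dim=1)        # sorts every row

# gather(a, dim, index)[i][j] is a[index[i][j]][j] for dim=0 and a[i][index[i][j]] for dim=1.
rebuilt_cols = torch.gather(a, 0, by_col.indices)
rebuilt_rows = torch.gather(a, 1, by_row.indices)
print(torch.equal(rebuilt_cols, by_col.values))
print(torch.equal(rebuilt_rows, by_row.values))
```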

One more example:

import torch
a = torch.randperm(8).reshape(2, 2, 2)
print(f"a = {a}\n")
b = torch.sort(a, dim = 0)
print(f"b.values = {b.values}\n")
c = torch.sort(a, dim = 1)
print(f"c.values = {c.values}\n")
d = torch.sort(a, dim = 2)
print(f"d.values = {d.values}")

Output:

a = tensor([[[1, 5],
         [4, 7]],

        [[3, 6],
         [2, 0]]])

b.values = tensor([[[1, 5],
         [2, 0]],

        [[3, 6],
         [4, 7]]])

c.values = tensor([[[1, 5],
         [4, 7]],

        [[2, 0],
         [3, 6]]])

d.values = tensor([[[1, 5],
         [4, 7]],

        [[3, 6],
         [0, 2]]])

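Note: for a 3-D tensor, dim = 0 indexes the matrices (the length of dimension 0 is the number of matrices), so sorting along dim = 0 sorts the elements at the same position of the different matrices. With three matrices, for example, the three elements at one position are compared and sorted. dim = 1 is the row of a matrix, so sorting along it sorts the columns of each matrix separately.

The k largest values of a tensor and their positions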

torch.topk(input,
          k,
          dim = -1,
          largest = True,
          sorted = True,
          *,
          out=None)

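input: the tensor to compute on
k: the number of largest values to return
dim: the dimension along which to compute
largest: if True, the k largest values and their positions are returned; if False, the k smallest values and their positions
sorted: whether the result is returned in sorted order; if False, the k returned values are not necessarily ordered by size

The result is of type torch.return_types.topk and has the two fields values and indices; see the return value of torch.sort() for their meaning.

For example: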

import torch
a = torch.randint(1, 100, (6, 6))
b = torch.topk(a, 3)
print(f"a = {a}\n")
print(f"b.values = {b.values}\n")
print(f"b.indices = {b.indices}")

Output:

a = tensor([[67, 63, 56, 85,  8, 41],
        [36, 44, 48, 64, 48, 78],
        [87, 18, 16, 44, 51, 41],
        [78, 39, 50, 37, 27, 17],
        [14, 94,  4, 11, 94, 58],
        [72, 73, 22, 99, 80, 70]])

b.values = tensor([[85, 67, 63],
        [78, 64, 48],
        [87, 51, 44],
        [78, 50, 39],
        [94, 94, 58],
        [99, 80, 73]])

b.indices = tensor([[3, 0, 1],
        [5, 3, 2],
        [0, 4, 3],
        [0, 2, 1],
        [1, 4, 5],
        [3, 4, 1]])

Another example:

import torch
a = torch.randint(1, 100, (6, 6))
b = torch.topk(a, 3, largest = False, dim = 0)
print(f"a = {a}\n")
print(f"b.values = {b.values}") # the k smallest values

计算张量第k小的数值与其所在的位置

torch.kthvalue(input,
              k,
              dim,
              keepdim = False,
              *,
              out = None)

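The result is of type torch.return_types.kthvalue and has two attributes, values and indices. The parameters of torch.kthvalue() and the two attributes of its result work as described for torch.topk() above.

For example: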

import torch
a = torch.randint(1, 100, (6, 6))
b = torch.kthvalue(a, 2, dim = 0)
print(f"a = {a}\n")
print(f"b.values = {b.values}")

Output:

a = tensor([[58, 36, 56, 97, 93, 84],
        [31, 61, 59, 46, 75, 43],
        [40, 46, 10, 72, 33, 82],
        [57,  5, 75, 29, 34,  9],
        [50,  8, 78, 51, 19, 40],
        [83, 61, 74, 46, 14, 50]])

b.values = tensor([40,  8, 56, 46, 19, 40])
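One way to read torch.kthvalue() is as "row k - 1 of an ascending sort". The sketch below checks this on a small fixed tensor; the values are taken from the top-left corner of the output above, and the variable names are illustrative.

```python
import torch

a = torch.tensor([[58, 36, 56],
                  [31, 61, 59],
                  [40, 46, 10],
                  [57,  5, 75]])

k = 2
kth = torch.kthvalue(a, k, dim=0)
sorted_a = torch.sort(a, dim=0)

# k counts from 1, so the k-th smallest value is row k - 1 of the ascending sort
print(kth.values)    # tensor([40, 36, 56])
print(kth.indices)   # tensor([2, 0, 0])
print(torch.equal(kth.values, sorted_a.values[k - 1]))
```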

Computing the mean along a given dimension

torch.mean(input,
          dim,
          keepdim=False,
          *,
          dtype=None,
          out=None)

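input: the tensor whose mean is computed
dim: the dimension to reduce; may be None. If None, the mean is taken over all elements of the tensor and the result is a tensor with a single element. dim can be a list as well as a single number, to reduce several dimensions at once; for example, dim = [0, 1] selects dimensions 0 and 1 together
keepdim: whether to keep the reduced dimensions of the original tensor
dtype: the data type of the returned tensor; can be torch.half, torch.float or torch.double. The input tensor must be floating point, not integer; to take the mean of an integer tensor without converting the input itself, pass the dtype argument

For example: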

import torch
a = torch.randn((4, 4))
print(f"a = {a}\n")
mean = torch.mean(a, dtype = torch.float64)
print("mean = ", mean)

Output:

a = tensor([[ 0.5349, -0.4964,  1.3024, -0.3824],
        [ 0.6674,  0.9679,  0.3401,  0.1994],
        [-0.5476, -1.3807, -1.4394,  0.7975],
        [ 1.0385, -0.3552,  0.0408,  0.2940]])

mean =  tensor(0.0988, dtype=torch.float64)

Another example:

import torch
a = torch.tensor([[1, 2, 3, 4],
                  [1, 2, 3, 4],
                  [1, 2, 3, 4],
                  [1, 2, 3, 4]])
print(f"a = {a}\n")
mean_0 = torch.mean(a, dim = 0, dtype = torch.float)
print("mean_0 = ", mean_0)
mean_1 = torch.mean(a, dim = 1, dtype = torch.float)
print("mean_1 = ", mean_1)

Output:

a = tensor([[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]])

mean_0 =  tensor([1., 2., 3., 4.])
mean_1 =  tensor([2.5000, 2.5000, 2.5000, 2.5000])
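The examples above do not exercise a list-valued dim, keepdim, or the integer-input restriction. A minimal sketch of all three, on a tensor made up for illustration:

```python
import torch

a = torch.arange(24, dtype=torch.float).reshape(2, 3, 4)

# Averaging over dims 0 and 1 at once leaves only the last dimension
m = torch.mean(a, dim=[0, 1])
print(m)         # tensor([10., 11., 12., 13.])
print(m.shape)   # torch.Size([4])

# keepdim=True keeps the reduced dimensions with size 1
m_keep = torch.mean(a, dim=[0, 1], keepdim=True)
print(m_keep.shape)   # torch.Size([1, 1, 4])

# An integer tensor is rejected unless a floating-point dtype is requested
ints = torch.tensor([1, 2, 3, 4])
try:
    torch.mean(ints)
except RuntimeError as err:
    print("integer input rejected:", type(err).__name__)
print(torch.mean(ints, dtype=torch.float))   # tensor(2.5000)
```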

Summing along a given dimension

torch.sum(input,
         dim,
         keepdim = False,
         *,
         dtype=None,
         out=None)

Usage follows torch.mean() above, except that integer tensors are accepted directly.

For example:

import torch
a = torch.tensor([[1, 2, 3, 4],
                  [1, 2, 3, 4],
                  [1, 2, 3, 4],
                  [1, 2, 3, 4]])
print(f"a = {a}\n")
summ = torch.sum(a)
print("summ = ", summ)
summ_0 = torch.sum(a, dim = 0)
print("summ_0 = ", summ_0)
summ_1 = torch.sum(a, dim = 1)
print("summ_1 = ", summ_1)

Output:

a = tensor([[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]])

summ =  tensor(40)
summ_0 =  tensor([ 4,  8, 12, 16])
summ_1 =  tensor([10, 10, 10, 10])
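A common use of keepdim with torch.sum() is normalizing along a dimension, because the kept dimension lets the result broadcast against the input. A small sketch with made-up values:

```python
import torch

a = torch.tensor([[1., 2., 3., 4.],
                  [2., 2., 2., 2.]])

# keepdim=True gives shape (2, 1), which broadcasts against the (2, 4) input
row_sums = torch.sum(a, dim=1, keepdim=True)
normalized = a / row_sums

print(row_sums.shape)                 # torch.Size([2, 1])
print(normalized)
print(torch.sum(normalized, dim=1))   # each row now sums to 1
```

Without keepdim=True the sums would have shape (2,), and the division would fail here because (2, 4) and (2,) do not broadcast.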

Computing the cumulative sum along a given dimension

torch.cumsum(input,
            dim,
            *,
            dtype = None,
            out = None)

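The parameters work as in torch.mean() above.

Cumulative sum: for the tensor tensor([1, 2, 3, 4, 5]), the cumulative sum is tensor([1, 1+2, 1+2+3, 1+2+3+4, 1+2+3+4+5]); that is, along the given dimension, the i-th element of the result is the sum of the first i elements of the input.

For example: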

import torch
a = torch.mm(torch.ones(6, 1), torch.arange(1, 7, dtype = torch.float).reshape(1, -1))
print(f"a = {a}\n")
b_0 = torch.cumsum(a, dim = 0)
print(f"b_0 = {b_0}\n")
b_1 = torch.cumsum(a, dim = 1)
print(f"b_1 = {b_1}")

Output:

a = tensor([[1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.]])

b_0 = tensor([[ 1.,  2.,  3.,  4.,  5.,  6.],
        [ 2.,  4.,  6.,  8., 10., 12.],
        [ 3.,  6.,  9., 12., 15., 18.],
        [ 4.,  8., 12., 16., 20., 24.],
        [ 5., 10., 15., 20., 25., 30.],
        [ 6., 12., 18., 24., 30., 36.]])

b_1 = tensor([[ 1.,  3.,  6., 10., 15., 21.],
        [ 1.,  3.,  6., 10., 15., 21.],
        [ 1.,  3.,  6., 10., 15., 21.],
        [ 1.,  3.,  6., 10., 15., 21.],
        [ 1.,  3.,  6., 10., 15., 21.],
        [ 1.,  3.,  6., 10., 15., 21.]])
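Two properties follow directly from the definition of the cumulative sum and are easy to check: the last entry along the dimension equals the plain sum, and differences of neighbouring entries recover the input. A sketch with a small made-up tensor:

```python
import torch

a = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

c = torch.cumsum(a, dim=1)
print(c)   # rows: [1, 3, 6] and [4, 9, 15]

# The last entry along the dimension is the plain sum over that dimension
print(torch.equal(c[:, -1], torch.sum(a, dim=1)))

# Differences of neighbouring entries give back the original values
recovered = torch.cat([c[:, :1], c[:, 1:] - c[:, :-1]], dim=1)
print(torch.equal(recovered, a))
```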

Computing the median along a given dimension

torch.median(input,
            dim,
            keepdim = False,
            *,
            out = None)

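The parameters work as in torch.max() above.

When dim is given, the result is of type torch.return_types.median and has two attributes, values and indices: values holds the median along the given dimension and indices holds the position of that median in the original tensor (counting from 0). Without dim, the result is a single tensor holding the median of all elements.

For example: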

import torch
a = torch.mm(torch.ones(7, 1), torch.arange(1, 8, dtype = torch.float).reshape(1, -1))
print(f"a = {a}\n")
b = torch.median(a)
print(f"b = {b}\n")
b_0 = torch.median(a, dim = 0, keepdim = True)
print(f"b_0 = {b_0}\n")
b_1 = torch.median(a, dim = 1, keepdim = True)
print(f"b_1 = {b_1}\n")

Output:

a = tensor([[1., 2., 3., 4., 5., 6., 7.],
        [1., 2., 3., 4., 5., 6., 7.],
        [1., 2., 3., 4., 5., 6., 7.],
        [1., 2., 3., 4., 5., 6., 7.],
        [1., 2., 3., 4., 5., 6., 7.],
        [1., 2., 3., 4., 5., 6., 7.],
        [1., 2., 3., 4., 5., 6., 7.]])

b = 4.0

b_0 = torch.return_types.median(
values=tensor([[1., 2., 3., 4., 5., 6., 7.]]),
indices=tensor([[3, 3, 3, 3, 3, 3, 3]]))

b_1 = torch.return_types.median(
values=tensor([[4.],
        [4.],
        [4.],
        [4.],
        [4.],
        [4.],
        [4.]]),
indices=tensor([[3],
        [3],
        [3],
        [3],
        [3],
        [3],
        [3]]))
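One detail the 7-element example cannot show: when the number of elements is even, torch.median() returns the lower of the two middle values rather than their average. The sketch below contrasts this with torch.quantile(), which interpolates by default; the input is made up for illustration.

```python
import torch

even = torch.tensor([1., 2., 3., 4.])

# With an even number of elements, torch.median returns the lower middle value
print(torch.median(even))          # tensor(2.)

# torch.quantile interpolates between the two middle values by default
print(torch.quantile(even, 0.5))   # tensor(2.5000)

# With dim, values and indices are returned together
res = torch.median(even.reshape(2, 2), dim=1)
print(res.values, res.indices)     # tensor([1., 3.]) tensor([0, 0])
```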

Computing the product along a given dimension

torch.prod(input,
          dim,
          keepdim = False,
          *,
          dtype = None,
          out = None)

The parameters work as in torch.mean() above.

For example:

import torch
a = torch.ones(3, 4)*2
print(f"a = {a}\n")
b = torch.prod(a)
print(f"b = {b}\n")# 2^12
b_0 = torch.prod(a, dim = 0) # 2^3
print(f"b_0 = {b_0}\n")
b_1 = torch.prod(a, dim = 1) # 2^4
print(f"b_1 = {b_1}")

Output:

a = tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]])

b = 4096.0

b_0 = tensor([8., 8., 8., 8.])

b_1 = tensor([16., 16., 16.])

Computing the cumulative product along a given dimension

torch.cumprod(input,
             dim,
             *,
             dtype = None,
             out = None)

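The parameters work as in torch.mean() above.

Cumulative product: analogous to the cumulative sum. For the tensor tensor([1, 2, 3, 4, 5, 6]), the cumulative product is tensor([1, 1*2, 1*2*3, 4!, 5!, 6!]).

For example: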

import torch
a = torch.mm(torch.ones(6, 1), torch.arange(1, 7, dtype = torch.float).reshape(1, -1))
print(f"a = {a}\n")
b_0 = torch.cumprod(a, dim = 0)
print(f"b_0 = {b_0}\n")
b_1 = torch.cumprod(a, dim = 1)
print(f"b_1 = {b_1}")

Output:

a = tensor([[1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.]])

b_0 = tensor([[1.0000e+00, 2.0000e+00, 3.0000e+00, 4.0000e+00, 5.0000e+00, 6.0000e+00],
        [1.0000e+00, 4.0000e+00, 9.0000e+00, 1.6000e+01, 2.5000e+01, 3.6000e+01],
        [1.0000e+00, 8.0000e+00, 2.7000e+01, 6.4000e+01, 1.2500e+02, 2.1600e+02],
        [1.0000e+00, 1.6000e+01, 8.1000e+01, 2.5600e+02, 6.2500e+02, 1.2960e+03],
        [1.0000e+00, 3.2000e+01, 2.4300e+02, 1.0240e+03, 3.1250e+03, 7.7760e+03],
        [1.0000e+00, 6.4000e+01, 7.2900e+02, 4.0960e+03, 1.5625e+04, 4.6656e+04]])

b_1 = tensor([[  1.,   2.,   6.,  24., 120., 720.],
        [  1.,   2.,   6.,  24., 120., 720.],
        [  1.,   2.,   6.,  24., 120., 720.],
        [  1.,   2.,   6.,  24., 120., 720.],
        [  1.,   2.,   6.,  24., 120., 720.],
        [  1.,   2.,   6.,  24., 120., 720.]])
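The factorial pattern in b_1 can be checked directly. The sketch below compares torch.cumprod() on 1..6 against math.factorial and against torch.prod(); the variable names are illustrative.

```python
import math
import torch

a = torch.arange(1, 7, dtype=torch.float)   # 1, 2, ..., 6
c = torch.cumprod(a, dim=0)
print(c)   # tensor([  1.,   2.,   6.,  24., 120., 720.])

# Entry i (counting from 0) equals (i + 1)!
factorials = torch.tensor([math.factorial(n) for n in range(1, 7)], dtype=torch.float)
print(torch.equal(c, factorials))

# The last entry equals the full product from torch.prod
print(c[-1].item() == torch.prod(a).item())
```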

Computing the standard deviation of a tensor

torch.std(input,
         dim,
         unbiased = True,
         keepdim = False,
         *,
         out = None)

unbiased: whether to use the unbiased estimate. With the unbiased estimate, \(\sigma = \sqrt{\frac{1}{N-1}\displaystyle\sum_{i=1}^N(x_i-\overline{x})^2}\); without it, \(\sigma = \sqrt{\frac{1}{N}\displaystyle\sum_{i=1}^N(x_i - \overline{x})^2}\)

For example:

import torch
a = torch.mm(torch.ones(6, 1), torch.arange(1, 7, dtype = torch.float).reshape(1, -1))
print(f"a = {a}\n")
b_0 = torch.std(a, dim = 0, keepdim = True)
print(f"b_0 = {b_0}\n")
b_1 = torch.std(a, dim = 1, keepdim = True)
print(f"b_1 = {b_1}\n")
b = torch.std(a)
print(f"b =", b)

Output:

a = tensor([[1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.],
        [1., 2., 3., 4., 5., 6.]])

b_0 = tensor([[0., 0., 0., 0., 0., 0.]])

b_1 = tensor([[1.8708],
        [1.8708],
        [1.8708],
        [1.8708],
        [1.8708],
        [1.8708]])

b = tensor(1.7321)
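To connect the two formulas for \(\sigma\) with the function, the sketch below evaluates both by hand on one row of the matrix above and compares them with torch.std(). The variable names are illustrative.

```python
import torch

x = torch.tensor([1., 2., 3., 4., 5., 6.])
n = x.numel()
squared_dev = ((x - x.mean()) ** 2).sum()      # sum of squared deviations, 17.5 here

unbiased = torch.sqrt(squared_dev / (n - 1))   # divide by N - 1
biased = torch.sqrt(squared_dev / n)           # divide by N

print(unbiased, torch.std(x))                  # both about 1.8708
print(biased, torch.std(x, unbiased=False))    # both about 1.7078
```

The unbiased value matches the entries of b_1 in the output above.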

Linear regression model

import torch
import matplotlib.pyplot as plt
import numpy as np

# Define the weight and bias
w = torch.tensor([0], dtype = torch.float, device = 'cuda:0', requires_grad = True)
b = torch.tensor([0], dtype = torch.float, device = 'cuda:0', requires_grad = True)

# Training set
x = torch.linspace(1, 10, 20).cuda()
y = 2*x + 1

# Hyperparameters
alpha = 0.001
epoch = 1500

# Record the cost function
Loss = []

# Training
for i in range(epoch):
    # Prediction
    y_hat = w*x + b
    
    # Loss function
    L = (y_hat - y)**2
    
    # Cost function
    J = L.sum()
    
    # Backpropagation
    J.backward()
    
    # Gradient descent
    w.data = w.data - alpha*w.grad
    b.data = b.data - alpha*b.grad
    
    # Zero the gradients
    w.grad.data.zero_()
    b.grad.data.zero_()
    
    # Record the cost function
    Loss.append(J.item())

print(w, b)

x_1 = torch.linspace(0, 10, 1000).cuda()
y_1 = w*x_1 + b

# Plot the fitted line
plt.scatter(x.cpu().numpy(), y.cpu().numpy())
plt.plot(x_1.cpu().numpy(), y_1.cpu().detach().numpy(), color="r") # y_1 has requires_grad=True, so it cannot be converted to a numpy array directly; call .detach() first
plt.show()

# Plot the cost function curve
train = np.arange(0, epoch)
J = np.array(Loss)
plt.plot(train, J, color="r")
plt.show()
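The training loop above updates the parameters through .data and requires a CUDA device. As a sketch of one alternative (not the author's method), the same loop can be written with torch.no_grad() for the update step and with the device chosen at run time; the `device` variable is an assumption introduced here, and plotting is left out.

```python
import torch

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

w = torch.zeros(1, device=device, requires_grad=True)
b = torch.zeros(1, device=device, requires_grad=True)

x = torch.linspace(1, 10, 20, device=device)
y = 2 * x + 1

alpha = 0.001
epoch = 1500

for _ in range(epoch):
    J = ((w * x + b - y) ** 2).sum()
    J.backward()
    # Update the parameters without recording the update in the computation graph
    with torch.no_grad():
        w -= alpha * w.grad
        b -= alpha * b.grad
    w.grad.zero_()
    b.grad.zero_()

print(w.item(), b.item())   # should approach 2 and 1
```

The hyperparameters are the same as above, so the fitted values should end up close to the true slope 2 and intercept 1.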

Polynomial linear regression model

import torch
import matplotlib.pyplot as plt
import numpy as np

w = torch.tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype = torch.float, requires_grad = True)
b = torch.tensor([0], dtype = torch.float, requires_grad = True)

xx = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8], dtype = torch.float)
x_r = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8], dtype = torch.float)
y = torch.tensor([10, 6, 3, 4, 6, 8, 10, 18], dtype = torch.float)

# Input normalization
# Note: mu is (max - min) / n rather than the sample mean; it only shifts the inputs
mu = (x_r.max() - x_r.min()) / x_r.size()[0]
delta = x_r.max() - x_r.min()
x_r = (x_r - mu) / delta

# Feature matrix: row k holds x_r raised to the power k + 1, for powers 1 to 10
x = torch.stack([x_r**k for k in range(1, 11)])
J_r = []
epoch = 100000
alpha = 0.25


for i in range(epoch):
    y_hat = torch.mm(w, x) + b
    L = (y_hat - y)**2 / x_r.size()[0]
    J = L.sum() 
    
    J.backward()
    w.data = w.data - alpha*w.grad
    b.data = b.data - alpha*b.grad
    w.grad.data.zero_()
    b.grad.data.zero_()
    
    J_r.append(J.item())

# Evaluate the fitted polynomial on a dense grid; w and b live on the CPU, so the grid does too
X_r = torch.linspace(0, 8, 1000)
X_1 = (X_r - mu) / delta
X = torch.stack([X_1**k for k in range(1, 11)])
Y = torch.mm(w, X) + b

plt.scatter(xx.cpu().numpy(), y.cpu().numpy())
plt.plot(X_r.cpu().numpy(), Y[0].cpu().detach().numpy(), color = 'r')
plt.show()

plt.plot(np.arange(0, epoch), J_r)
plt.show()
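The feature matrix in this model holds the powers 1 to 10 of the normalized input, one power per row. A sketch of building it in a single broadcasted expression follows; the input here is a stand-in made up for illustration, and `powers` is a name introduced for the example.

```python
import torch

x_r = torch.linspace(0, 1, 8)   # stand-in for the normalized inputs

# Exponents 1..10 as a column, inputs as a row: broadcasting yields a (10, 8) matrix
powers = torch.arange(1, 11).unsqueeze(1)
x = x_r.unsqueeze(0) ** powers

# Same result as stacking the powers one by one
stacked = torch.stack([x_r ** k for k in range(1, 11)])
print(x.shape)                      # torch.Size([10, 8])
print(torch.allclose(x, stacked))

# A (1, 10) weight row times the (10, 8) features gives one prediction per sample
w = torch.zeros(1, 10)
print(torch.mm(w, x).shape)         # torch.Size([1, 8])
```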

Animated version:

import torch
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import numpy as np

w = torch.tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=torch.float, requires_grad=True)
b = torch.tensor([0], dtype=torch.float, requires_grad=True)

xx = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8], dtype=torch.float)
x_r = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8], dtype=torch.float)
# y = 6*(x_r**5) + 3*(x_r**3) + 2*(x_r)
y = torch.tensor([10, 6, 3, 4, 6, 8, 10, 18], dtype=torch.float)

mu = (x_r.max() - x_r.min()) / x_r.size()[0]
delta = x_r.max() - x_r.min()
x_r = (x_r - mu) / delta

x = torch.stack([x_r ** k for k in range(1, 11)])
# y_hat = torch.zeros_like(y)
# L = torch.zeros_like(y)
J_r = []
epoch = 200
alpha = 0.25

# for i in range(epoch):
y_hat = torch.mm(w, x) + b
L = (y_hat - y) ** 2 / x_r.size()[0]
J = L.sum()

J.backward()
w.data = w.data - alpha * w.grad
b.data = b.data - alpha * b.grad
w.grad.data.zero_()
b.grad.data.zero_()



# Dense grid for plotting; w and b live on the CPU, so the grid does too
X_r = torch.linspace(0, 8, 1000)
X_1 = (X_r - mu) / delta
X = torch.stack([X_1 ** k for k in range(1, 11)])
# print(J_r)
Y = torch.mm(w, X) + b
# print(Y[0])

fig = plt.figure(figsize=(12, 6))
# plt.rcParams["font.family"] = "FangSong"





# plt.plot(X_r.cpu().numpy(), Y[0].cpu().detach().numpy(), color='r')
for i in range(100001):
    y_hat = torch.mm(w, x) + b
    L = (y_hat - y) ** 2 / x_r.size()[0]
    J = L.sum()

    J.backward()
    w.data = w.data - alpha * w.grad
    b.data = b.data - alpha * b.grad
    w.grad.data.zero_()
    b.grad.data.zero_()
    J_r.append(J.item())
    Y = torch.mm(w, X) + b
    # Redraw less often as training progresses; late in training,
    # show only a recent window of the loss history
    if i < 100:
        step, window = 1, None
    elif i < 200:
        step, window = 2, None
    elif i < 500:
        step, window = 5, None
    elif i < 10000:
        step, window = 100, None
    elif i < 90000:
        step, window = 1000, 5000
    else:
        alpha = 0.001  # lower the learning rate for the final iterations
        step, window = 1000, 1000

    if i % step == 0:
        start = 0 if window is None else i - window
        plt.pause(0.018)

        # Left panel: data points and the current fitted curve
        plt.subplot(121)
        plt.cla()
        plt.scatter(xx.cpu().numpy(), y.cpu().numpy())
        plt.plot(X_r.cpu().numpy(), Y[0].cpu().detach().numpy(), color='r')

        # Right panel: loss history
        plt.subplot(122)
        plt.cla()
        plt.plot(np.arange(start, i + 1), J_r[start:i + 1])
        plt.draw()
plt.show()

Logistic regression model

import torch
import matplotlib.pyplot as plt
import numpy as np

def sigmoid(w, x, b):
    return 1/(1 + torch.exp(-1*(w*x + b)))

x = torch.tensor([1, 2, 3, 9, 10, 17])
y = torch.tensor([0, 0, 0, 1, 1, 1])

w = torch.tensor([0], dtype = torch.float, requires_grad = True)
b = torch.tensor([0], dtype = torch.float, requires_grad = True)

J_r = []
epoch = 100000
alpha = 0.01


for i in range(epoch):
    if i == 2000:
        alpha = 0.005
    elif i == 5000:
        alpha = 0.002
    elif i == 10000:
        alpha = 0.001
    elif i == 20000:
        alpha = 0.0005
    elif i == 50000:
        alpha = 0.0001
    y_hat = sigmoid(w, x, b)
    L = -1*(y*torch.log(y_hat) + (1 - y)*torch.log(1 - y_hat))
    J = L.sum()
    J_r.append(J.item())
    J.backward()
    w.data = w.data - alpha*w.grad
    b.data = b.data - alpha*b.grad
    w.grad.data.zero_()
    b.grad.data.zero_()

plt.scatter(x.numpy(), y.numpy())

x_r = torch.linspace(0, 20, 1000)
y_r = sigmoid(w, x_r, b)

plt.plot(x_r.data, y_r.data, color='r')
plt.show()

plt.plot(torch.arange(0, epoch), J_r)
plt.show()
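The loss computed by hand in the loop above is the summed binary cross-entropy. As a sketch, it can be compared with the built-in torch.nn.functional.binary_cross_entropy_with_logits(), which takes the pre-sigmoid values directly. The parameter values below are arbitrary and chosen only for the comparison.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1., 2., 3., 9., 10., 17.])
y = torch.tensor([0., 0., 0., 1., 1., 1.])
w = torch.tensor([0.3])    # arbitrary parameter values for the comparison
b = torch.tensor([-1.5])

z = w * x + b
y_hat = torch.sigmoid(z)
manual = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).sum()

# Works on the logits z directly, which avoids log(0) when the sigmoid saturates
builtin = F.binary_cross_entropy_with_logits(z, y, reduction="sum")

print(manual.item(), builtin.item())
print(torch.allclose(manual, builtin))
```

The built-in version expects floating-point targets, which is why y is created as a float tensor here.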

Fully connected neural network model

import torch
import matplotlib.pyplot as plt
from tqdm import tqdm

x = torch.linspace(-4, 4, 100, dtype = torch.float).resize_(1, 100).cuda()
y = x**2 + torch.randn((1, 100)).cuda()
plt.subplot(121)
plt.scatter(x.cpu(), y.cpu())

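# Input layer with 1 feature (x holds 100 samples), two hidden layers with 1500 neurons each, and an output layer with 1 neuron: 4 layers counting the input layer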
w1 = torch.randn((1500, 1)).cuda() / torch.sqrt(torch.tensor([100/2])).cuda()
w1.requires_grad_(True)
b1 = torch.zeros((1500, 1)).cuda()
b1.requires_grad_(True)

w2 = torch.randn((1500, 1500)).cuda() / torch.sqrt(torch.tensor([1500/2])).cuda()
w2.requires_grad_(True)
b2 = torch.zeros((1500, 1)).cuda()
b2.requires_grad_(True)

w3 = torch.randn((1, 1500)).cuda() / torch.sqrt(torch.tensor([1500/2])).cuda()
w3.requires_grad_(True)
b3 = torch.zeros(1, 1).cuda()
b3.requires_grad_(True)

# Hyperparameters
epoch = 50000
alpha = 0.01
Loss = []

for i in tqdm(range(epoch)):
    z1 = torch.mm(w1, x) + b1
    a1 = torch.clamp_min(z1, 0.1*z1)

    z2 = torch.mm(w2, a1) + b2
    a2 = torch.clamp_min(z2, 0.1*z2)

    z3 = torch.mm(w3, a2) + b3
    a3 = torch.clamp_min(z3, 0.1*z3)
    # print(a2)

    J = torch.pow(a3 - y, 2).mean()
    J.backward()

    w3.data = w3.data - alpha*w3.grad
    b3.data = b3.data - alpha*b3.grad
    w2.data = w2.data - alpha*w2.grad
    b2.data = b2.data - alpha*b2.grad
    w1.data = w1.data - alpha*w1.grad
    b1.data = b1.data - alpha*b1.grad

    # print(w2.data)

    w3.grad.data.zero_()
    b3.grad.data.zero_()
    w2.grad.data.zero_()
    b2.grad.data.zero_()
    w1.grad.data.zero_()
    b1.grad.data.zero_()

    Loss.append(J.item())

    # print(J)

x = torch.linspace(-4, 4, 10000).resize_(1, 10000).cuda()

z1 = torch.mm(w1, x) + b1
a1 = torch.clamp_min(z1, 0.1*z1)
z2 = torch.mm(w2, a1) + b2
a2 = torch.clamp_min(z2, 0.1*z2)
z3 = torch.mm(w3, a2) + b3
a3 = torch.clamp_min(z3, 0.1*z3)
# print(a2.data)

plt.plot(x[0].cpu(), a3.data[0].cpu(), color="r")
plt.subplot(122)
plt.plot(torch.arange(0, epoch), Loss)
plt.show()

Note: these hyperparameter settings are likely to lead to overfitting.
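The activation in the network above, torch.clamp_min(z, 0.1*z), takes the elementwise maximum of z and 0.1*z, which is a leaky ReLU with slope 0.1 on the negative side. The sketch below checks this against torch.nn.functional.leaky_relu(); it uses torch.maximum() for the elementwise maximum, and the input values are made up.

```python
import torch
import torch.nn.functional as F

z = torch.linspace(-3, 3, 7)

# Elementwise max(z, 0.1 * z): z itself where z >= 0, and 0.1 * z where z < 0
manual = torch.maximum(z, 0.1 * z)
builtin = F.leaky_relu(z, negative_slope=0.1)

print(manual)
print(torch.allclose(manual, builtin))
```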


https://blog.shinebook.net/2025/02/26/人工智能/pytorch/张量/

Published on February 26, 2025
