DataWhale Team Study: GNN Task 3

References: [DataWhale GNN study materials]; torch_geometric.nn (pytorch_geometric 1.7.1 documentation, pytorch-geometric.readthedocs.io)

Learning node representations with graph neural networks

Analyzing the dataset

from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures

dataset = Planetoid(root='data/Planetoid', name='Cora', transform=NormalizeFeatures())

print()
print(f'Dataset: {dataset}:')
print('======================')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

data = dataset[0]  # Get the first graph object.

print()
print(data)
print('======================')

# Gather some statistics about the graph.
print(f'Number of nodes: {data.num_nodes}')
print(f'Number of edges: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Number of training nodes: {data.train_mask.sum()}')
print(f'Training node label rate: {int(data.train_mask.sum()) / data.num_nodes:.2f}')
print(f'Contains isolated nodes: {data.contains_isolated_nodes()}')
print(f'Contains self-loops: {data.contains_self_loops()}')
print(f'Is undirected: {data.is_undirected()}')
Dataset: Cora():
======================
Number of graphs: 1
Number of features: 1433
Number of classes: 7

Data(edge_index=[2, 10556], test_mask=[2708], train_mask=[2708], val_mask=[2708], x=[2708, 1433], y=[2708])
======================
Number of nodes: 2708
Number of edges: 10556
Average node degree: 3.90
Number of training nodes: 140
Training node label rate: 0.05
Contains isolated nodes: False
Contains self-loops: False
Is undirected: True

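The output shows that the Cora graph has 2708 nodes and 10556 edges, an average node degree of 3.90, and 140 labeled training nodes (a label rate of about 5%). It is undirected and contains neither isolated nodes nor self-loops.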
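The derived statistics can be re-checked by hand. The sketch below copies the numbers from the printed output and assumes that, because the graph is undirected with no self-loops, each edge appears twice in edge_index (once per direction).

```python
# Values copied from the printed statistics above.
num_nodes = 2708
num_edges = 10556        # directed entries in edge_index
num_train_nodes = 140

avg_degree = num_edges / num_nodes
label_rate = num_train_nodes / num_nodes
# Assumption: every undirected edge is stored once in each direction.
num_undirected_edges = num_edges // 2

print(f'Average node degree: {avg_degree:.2f}')
print(f'Training node label rate: {label_rate:.2f}')
print(f'Undirected edges: {num_undirected_edges}')
```

Under that assumption the 10556 stored entries correspond to 5278 undirected edges.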

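Note: if the dataset cannot be downloaded, see the CSDN blog post "Planetoid无法直接下载Cora等数据集的3个解决方式" (three workarounds for when Planetoid cannot download Cora and similar datasets directly) by 诸神缄默不语.

Visualizing the distribution of node representations

Use TSNE to embed the high-dimensional node representations into a two-dimensional plane, then plot the nodes in that plane.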

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize(out, color):
    z = TSNE(n_components=2).fit_transform(out.detach().cpu().numpy())
    plt.figure(figsize=(10,10))
    plt.xticks([])
    plt.yticks([])

    plt.scatter(z[:, 0], z[:, 1], s=70, c=color, cmap="Set2")
    plt.show()

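Node classification with an MLP

An MLP operates only on each input node's own features and shares its weights across all nodes; it does not use the edges at all.

Building the MLP node classifier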
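To make the weight sharing concrete, here is a minimal plain-Python sketch. The feature values, weights, and the helper name `linear` are hypothetical and chosen only for illustration. The same layer is applied to every node independently, so reordering the nodes merely reorders the outputs.

```python
def linear(x, weight, bias):
    """Apply y = W x + b to a single feature vector (plain Python lists)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

# Hypothetical toy values: 3 nodes with 2 features each, one shared 2 -> 2 layer.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.1, -0.2]

# Every node goes through the same weights; no edge information is used.
out = [linear(x, weight, bias) for x in features]

# Reordering the nodes only reorders the outputs.
perm = [2, 0, 1]
out_perm = [linear(features[i], weight, bias) for i in perm]
assert out_perm == [out[i] for i in perm]
print(out)
```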

import torch
from torch.nn import Linear
import torch.nn.functional as F

class MLP(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(MLP, self).__init__()
        torch.manual_seed(12345)  # Seed the CPU random number generator so the results are reproducible.
        self.lin1 = Linear(dataset.num_features, hidden_channels)
        self.lin2 = Linear(hidden_channels, dataset.num_classes)

    def forward(self, x):
        x = self.lin1(x)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin2(x)
        return x

model = MLP(hidden_channels=16)
print(model)
"""
MLP(
  (lin1): Linear(in_features=1433, out_features=16, bias=True)
  (lin2): Linear(in_features=16, out_features=7, bias=True)
)
"""

Training the model

Train with the cross-entropy loss and the Adam optimizer:

model = MLP(hidden_channels=16)
criterion = torch.nn.CrossEntropyLoss()  # Define the loss criterion.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)  # Define the optimizer.

def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss on the training nodes only.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on the gradients.
    return loss

for epoch in range(1, 51):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
"""
Epoch: 046, Loss: 1.1284
Epoch: 047, Loss: 1.1229
Epoch: 048, Loss: 1.0383
Epoch: 049, Loss: 1.0439
Epoch: 050, Loss: 1.0563
"""
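The masked loss used in train() can be reproduced by hand. The following is a plain-Python sketch with hypothetical toy logits, labels, and mask (not Cora data), and the helper names are made up for illustration. It averages the cross-entropy over the masked nodes only, which is what a mean-reduced cross-entropy over out[data.train_mask] amounts to.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one node."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

def masked_mean_loss(all_logits, labels, mask):
    """Average cross-entropy over the nodes where mask is True."""
    losses = [cross_entropy(l, y) for l, y, keep in zip(all_logits, labels, mask) if keep]
    return sum(losses) / len(losses)

# Hypothetical toy data: 4 nodes, 7 classes, all-zero logits (a uniform prediction).
num_classes = 7
all_logits = [[0.0] * num_classes for _ in range(4)]
labels = [0, 3, 6, 2]
train_mask = [True, False, True, False]

loss = masked_mean_loss(all_logits, labels, train_mask)
print(f'{loss:.4f}')  # 1.9459, i.e. ln(7)
```

A uniform guess over 7 classes gives a loss of ln 7 ≈ 1.9459, so the logged losses of roughly 1.04 to 1.13 near epoch 50 indicate that the model has moved away from uniform predictions on the training nodes.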

Testing the model

def test():
    model.eval()
    out = model(data.x)
    pred = out.argmax(dim=1)  # Pick the class with the highest score.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Compare predictions with the ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Fraction of correct test predictions.
    return test_acc

test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')

Too few labeled nodes are available to train the MLP, so the network overfits and generalizes poorly to unseen nodes.
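One common way to look for overfitting is to compare accuracy on the training mask with accuracy on the test mask. The sketch below uses hypothetical toy predictions and masks (not results from Cora), and `masked_accuracy` is an illustrative helper rather than a PyG function.

```python
def masked_accuracy(pred, labels, mask):
    """Fraction of correct predictions among nodes where mask is True."""
    selected = [(p, y) for p, y, keep in zip(pred, labels, mask) if keep]
    return sum(p == y for p, y in selected) / len(selected)

# Hypothetical toy values, not results from Cora.
pred = [0, 1, 2, 2, 1, 0]
labels = [0, 1, 2, 0, 0, 0]
train_mask = [True, True, True, False, False, False]
test_mask = [False, False, False, True, True, True]

train_acc = masked_accuracy(pred, labels, train_mask)
test_acc = masked_accuracy(pred, labels, test_mask)
print(f'train={train_acc:.2f}, test={test_acc:.2f}')
```

A large gap between the two numbers, as in this toy case, is the pattern one would expect from an overfitted model.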

Node classification with a GCN

Mathematical definition

$$\mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},$$

where $\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}$ denotes the adjacency matrix with inserted self-loops and $\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}$ its diagonal degree matrix. The adjacency matrix may contain values other than $1$; when its entries are not restricted to {0, 1}, they represent edge weights.

$\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$ is the symmetrically normalized matrix. Its node-wise formulation is:

$$\mathbf{x}^{\prime}_i = \mathbf{\Theta} \sum_{j \in \mathcal{N}(i) \cup \{ i \}} \frac{e_{j,i}}{\sqrt{\hat{d}_j \hat{d}_i}} \mathbf{x}_j$$

where $\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}$, and $e_{j,i}$ denotes the edge weight from source node $j$ to target node $i$ (default 1.0).
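The two formulations can be checked against each other on a tiny example. The sketch below is plain Python on a hypothetical 3-node path graph with unit edge weights, one scalar feature per node, and $\mathbf{\Theta} = 1$; all values are illustrative assumptions.

```python
import math

# Hypothetical toy graph: an undirected path 0 - 1 - 2 with unit edge weights.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
n = len(A)

# A_hat = A + I inserts a self-loop at every node.
A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
# d_hat_i is the i-th row sum of A_hat (degree plus one).
d_hat = [sum(row) for row in A_hat]
# Entry (i, j) of D_hat^{-1/2} A_hat D_hat^{-1/2}.
norm = [[A_hat[i][j] / math.sqrt(d_hat[i] * d_hat[j]) for j in range(n)] for i in range(n)]

# One scalar feature per node and Theta = 1, so X' = norm @ X.
X = [1.0, 2.0, 3.0]
X_matrix = [sum(norm[i][j] * X[j] for j in range(n)) for i in range(n)]

# Node-wise form: sum over the neighbours j of i, plus i itself.
X_nodewise = []
for i in range(n):
    neighbours = [j for j in range(n) if A[j][i] != 0] + [i]
    X_nodewise.append(
        sum(A_hat[j][i] / math.sqrt(d_hat[j] * d_hat[i]) * X[j] for j in neighbours)
    )

assert all(abs(a - b) < 1e-12 for a, b in zip(X_matrix, X_nodewise))
print(d_hat)  # [2, 3, 2]
```

Each node's new feature is a weighted average of its own and its neighbours' features, with higher-degree neighbours contributing less per edge.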

The GCNConv layer in PyG

GCNConv(in_channels: int, out_channels: int, improved: bool = False, cached: bool = False, add_self_loops: bool = True, normalize: bool = True, bias: bool = True, **kwargs)

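where:

  • in_channels: dimensionality of the input features
  • out_channels: dimensionality of the output features
  • improved: if true, $\mathbf{\hat{A}} = \mathbf{A} + 2\mathbf{I}$, which strengthens the contribution of the central node's own features
  • cached: whether to cache the computed $\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2}$ for later use; this parameter should only be set to true in transductive learning scenarios, where the graph stays fixed
  • add_self_loops: whether to add self-loops to the adjacency matrix
  • normalize: whether to add self-loops and compute the symmetric normalization coefficients on the fly
  • bias: whether to include a bias term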
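The effect of improved can be illustrated with a little arithmetic. The sketch below assumes unit edge weights and follows the normalization formula given earlier: the self-loop weight is 1 (or 2 when improved is true), and the node's normalized self coefficient is that weight divided by $\hat{d}_i$. The helper name `self_loop_coefficient` is hypothetical and not part of PyG.

```python
def self_loop_coefficient(degree, improved=False):
    """Normalized weight a node gives its own features, for unit edge weights.

    The self-loop has weight 2 if improved else 1, and d_hat = degree + that
    weight, so the coefficient is fill / sqrt(d_hat * d_hat) = fill / d_hat.
    """
    fill = 2.0 if improved else 1.0
    d_hat = degree + fill
    return fill / d_hat

for degree in (1, 3, 10):
    plain = self_loop_coefficient(degree)
    boosted = self_loop_coefficient(degree, improved=True)
    print(degree, round(plain, 4), round(boosted, 4))
```

For a node with three neighbours, for instance, the self coefficient rises from 1/4 to 2/5, so the node's own features weigh more in the aggregation.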

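Building the GCN node classifier

The only difference from the MLP node classifier is that the linear layers (torch.nn.Linear) are replaced with graph convolution layers (torch_geometric.nn.GCNConv).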

from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GCN, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = GCNConv(dataset.num_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x

model = GCN(hidden_channels=16)
print(model)
"""
GCN(
  (conv1): GCNConv(1433, 16)
  (conv2): GCNConv(16, 7)
)
"""

Visualizing the node representations of the untrained GCN

model.eval()

out = model(data.x, data.edge_index)
visualize(out, color=data.y)

Training the model

optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss on the training nodes only.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on the gradients.
    return loss

for epoch in range(1, 51):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
"""
Epoch: 046, Loss: 1.2266
Epoch: 047, Loss: 1.2149
Epoch: 048, Loss: 1.1631
Epoch: 049, Loss: 1.1756
Epoch: 050, Loss: 1.1714
"""

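Testing the model

def test():
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)  # Pick the class with the highest score.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Compare predictions with the ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Fraction of correct test predictions.
    return test_acc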
