Building the Model

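GraphTransformer is a graph neural network that uses multi-head self-attention (sparse or dense) to encode the graph structure and node features. It generalizes the Transformer architecture to arbitrary graphs.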

In this tutorial, we use the Graphormer model as an example to show how to build a graph transformer model with DGL.

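Graphormer is a Transformer model designed for graph-structured data; it encodes the structural information of a graph into the standard Transformer. Specifically, Graphormer uses degree encoding to measure the importance of each node, and spatial encoding and path encoding to measure the relation between each pair of nodes. The degree encoding and the node features serve as the input to Graphormer, while the spatial encoding and path encoding act as bias terms in the self-attention module.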

Degree Encoding

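The degree encoder is a learnable embedding layer that maps the degree of each node to a vector. It takes the batched in-degrees and out-degrees of the graph nodes as input and outputs the nodes' degree embeddings.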

import dgl
import torch as th

degree_encoder = dgl.nn.DegreeEncoder(
    max_degree=8,  # the maximum degree to cut off
    embedding_dim=512  # the dimension of the degree embedding
)
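To make the encoder's input concrete, the sketch below computes in-degrees and out-degrees from an edge list and clips them at a maximum value, assuming that degrees above `max_degree` are cut off to it before the embedding lookup. It is plain Python rather than DGL code; the helper name `clipped_degrees` and the toy graph are hypothetical and serve only as an illustration.

```python
def clipped_degrees(num_nodes, edges, max_degree):
    """Return (in_degree, out_degree) lists, each clipped to max_degree."""
    in_degree = [0] * num_nodes
    out_degree = [0] * num_nodes
    for src, dst in edges:
        out_degree[src] += 1
        in_degree[dst] += 1
    # Assumption: degrees larger than max_degree share the same embedding row.
    in_degree = [min(d, max_degree) for d in in_degree]
    out_degree = [min(d, max_degree) for d in out_degree]
    return in_degree, out_degree


# Toy directed graph: node 0 points to every other node, node 1 points to node 2.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
in_deg, out_deg = clipped_degrees(4, edges, max_degree=2)
print(in_deg)   # [0, 1, 2, 1]
print(out_deg)  # [2, 1, 0, 0]  (node 0 has out-degree 3, clipped to 2)
```

The clipped values can then be used directly as indices into an embedding table with `max_degree + 1` rows.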

Path Encoding

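The path encoder encodes the edge features along the shortest path between two nodes to produce an attention bias for the self-attention module. It takes batched edge features of shape [shape] as input and outputs the attention bias derived from the path encoding.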

path_encoder = dgl.nn.PathEncoder(
    max_len=5,  # the maximum length of the shortest path
    feat_dim=512,  # the dimension of the edge feature
    num_heads=8,  # the number of attention heads
)
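As a rough illustration of what such a bias can look like for a single node pair and a single attention head, the sketch below follows the formulation in the Graphormer paper: the dot products between each edge feature on the shortest path and a weight vector for that hop position are averaged over the path. This is a plain-Python sketch under that assumption, not DGL's implementation; `path_bias` and the toy numbers are hypothetical.

```python
def path_bias(edge_feats, hop_weights):
    """Average, over the hops of one shortest path, the dot product between
    each edge's feature vector and the weight vector of its hop position.

    Assumes len(edge_feats) <= len(hop_weights), i.e. the path has already
    been truncated to the maximum path length.
    """
    if not edge_feats:  # no path between the nodes: contribute no bias
        return 0.0
    total = 0.0
    for feat, weight in zip(edge_feats, hop_weights):
        total += sum(f * w for f, w in zip(feat, weight))
    return total / len(edge_feats)


# Hypothetical 2-hop path with 3-dimensional edge features.
edge_feats = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
# One weight vector per hop position (maximum length 3 here), single head.
hop_weights = [[0.5, 0.5, 0.5], [1.0, -1.0, 0.0], [0.0, 0.0, 0.0]]
print(path_bias(edge_feats, hop_weights))  # 0.25
```

In a multi-head model, each head would have its own set of hop weights, giving one bias value per head for every node pair.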

Spatial Encoding

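The spatial encoder encodes the shortest distance between two nodes to produce an attention bias for the self-attention module. It takes the shortest distance between each pair of nodes as input and outputs the attention bias derived from the spatial encoding.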

spatial_encoder = dgl.nn.SpatialEncoder(
    max_dist=5,  # the maximum distance between two nodes
    num_heads=8,  # the number of attention heads
)
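The distance matrix that feeds a spatial encoder can be obtained with a breadth-first search from every node. The sketch below is one way to do that in plain Python; marking unreachable pairs with -1 and capping distances at `max_dist` are assumptions made for this example, and `shortest_distances` is a hypothetical helper, not a DGL function.

```python
from collections import deque


def shortest_distances(num_nodes, edges, max_dist):
    """All-pairs hop distances on an undirected graph via BFS.

    Unreachable pairs get -1; reachable distances are capped at max_dist.
    """
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    dist = [[-1] * num_nodes for _ in range(num_nodes)]
    for source in range(num_nodes):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[source][v] == -1:  # first visit is the shortest
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    # min(-1, max_dist) stays -1, so the unreachable marker is preserved.
    return [[min(d, max_dist) for d in row] for row in dist]


# Path graph 0 - 1 - 2 - 3 plus an isolated node 4.
dist = shortest_distances(5, [(0, 1), (1, 2), (2, 3)], max_dist=2)
print(dist[0])  # [0, 1, 2, 2, -1]
```

Each entry of the matrix can then index a learnable table that holds one bias value per attention head.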

Graphormer Layer

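The Graphormer layer is similar to a Transformer encoder layer, with the multi-head attention part replaced by BiasedMHA. It takes not only the input node features but also the attention bias computed above, and outputs the updated node features.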

We can stack multiple Graphormer layers into a list, just as one would when implementing a Transformer encoder in PyTorch.

layers = th.nn.ModuleList([
    dgl.nn.GraphormerLayer(
        feat_size=512,  # the dimension of the input node features
        hidden_size=1024,  # the dimension of the hidden layer
        num_heads=8,  # the number of attention heads
        dropout=0.1,  # the dropout rate
        activation=th.nn.ReLU(),  # the activation function
        norm_first=False,  # whether to put the normalization before attention and feedforward
    )
    for _ in range(6)
])
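To illustrate how an attention bias changes the attention weights, the sketch below computes the softmax for one query node and one head, assuming the bias is added to the attention scores before the softmax, as in the Graphormer formulation. It is a plain-Python illustration, not the BiasedMHA implementation; `biased_attention_weights` and the numbers are hypothetical.

```python
import math


def biased_attention_weights(scores, bias):
    """Softmax over (scores + bias) for one query node and one head."""
    logits = [s + b for s, b in zip(scores, bias)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Three key nodes with identical raw scores.
scores = [1.0, 1.0, 1.0]
no_bias = biased_attention_weights(scores, [0.0, 0.0, 0.0])
# A positive bias on the second key (e.g. a close neighbor) raises its weight.
with_bias = biased_attention_weights(scores, [0.0, 2.0, 0.0])
print(no_bias)
print(with_bias)
```

Without a bias the three keys receive equal weight; with the bias, the second key dominates while the weights still sum to one.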

Model Forward

Combining the modules above gives the main components of the Graphormer model. The forward pass can then be defined as follows:

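node_feat, in_degree, out_degree, attn_mask, path_data, dist = \
    next(iter(dataloader))  # we will use the first batch as an example
num_graphs, max_num_nodes, _ = node_feat.shape
deg_emb = degree_encoder(th.stack((in_degree, out_degree)))

# node feature + degree encoding as input
node_feat = node_feat + deg_emb

# The [:, 1:, 1:, :] slicing below reserves position 0 for a graph-level token.
# Assumption: it is a learnable embedding of the same size as the node features.
graph_token = th.nn.Embedding(1, 512)
graph_token_feat = graph_token.weight.unsqueeze(0).repeat(num_graphs, 1, 1)
x = th.cat([graph_token_feat, node_feat], dim=1)

# spatial encoding and path encoding serve as attention bias
# (8 is the num_heads used by the encoders above; attn_mask is assumed to
# cover the same max_num_nodes + 1 positions)
attn_bias = th.zeros(
    num_graphs, max_num_nodes + 1, max_num_nodes + 1, 8, device=dist.device
)
path_encoding = path_encoder(dist, path_data)
spatial_encoding = spatial_encoder(dist)
attn_bias[:, 1:, 1:, :] = path_encoding + spatial_encoding

# graphormer layers
for layer in layers:
    x = layer(
        x,
        attn_mask=attn_mask,
        attn_bias=attn_bias,
    )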

For simplicity, we omit some details of the forward pass. For the complete implementation, please refer to the Graphormer example.

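You can also explore other utility modules to customize your own graph transformer model. In the next section, we will show how to prepare the data for training.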