1.2 Graphs, Nodes, and Edges


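DGL represents each node by a unique integer, called its node ID, and each edge by a pair of integers corresponding to the IDs of its end nodes. DGL assigns each edge a unique integer, called its edge ID, based on the order in which it was added to the graph. Node and edge IDs start from 0. In DGL, all edges are directed, and an edge \((u, v)\) points from node \(u\) to node \(v\).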

To specify multiple nodes, DGL uses a 1-D integer tensor (i.e., PyTorch's tensor, TensorFlow's Tensor, or MXNet's ndarray) of node IDs. DGL calls this format "node-tensors". To specify multiple edges, it uses a pair of node-tensors \((U, V)\), where \((U[i], V[i])\) defines an edge from \(U[i]\) to \(V[i]\).
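
To make the node-tensor pair concrete, here is a minimal pure-Python sketch. Plain lists stand in for tensors, the names `U`, `V`, and `edge_list` are illustrative only, and none of this is DGL code. It shows how position `i` in the pair doubles as the edge ID when edges are numbered in insertion order.

```python
# Plain Python lists stand in for 1-D integer tensors (an assumption for illustration).
U = [0, 0, 0, 1]  # source node IDs
V = [1, 2, 3, 3]  # destination node IDs

# Edge i runs from U[i] to V[i]; its position i serves as its edge ID.
edge_list = [(eid, u, v) for eid, (u, v) in enumerate(zip(U, V))]

for eid, u, v in edge_list:
    print(f"edge {eid}: {u} -> {v}")
```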

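One way to create a DGLGraph is to use the dgl.graph() method, which takes a set of edges as input. DGL also supports creating graphs from other data sources; see 1.4 Creating Graphs from External Sources.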

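The following code snippet uses the dgl.graph() method to create a DGLGraph corresponding to the four-node graph shown below and illustrates some of its APIs for querying the graph's structure.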

https://data.dgl.ai/asset/image/user_guide_graphch_1.png

>>> import dgl
>>> import torch as th

>>> # edges 0->1, 0->2, 0->3, 1->3
>>> u, v = th.tensor([0, 0, 0, 1]), th.tensor([1, 2, 3, 3])
>>> g = dgl.graph((u, v))
>>> print(g)  # the number of nodes is inferred from the max node ID in the given edges
Graph(num_nodes=4, num_edges=4,
      ndata_schemes={}
      edata_schemes={})

>>> # Node IDs
>>> print(g.nodes())
tensor([0, 1, 2, 3])
>>> # Edge end nodes
>>> print(g.edges())
(tensor([0, 0, 0, 1]), tensor([1, 2, 3, 3]))
>>> # Edge end nodes and edge IDs
>>> print(g.edges(form='all'))
(tensor([0, 0, 0, 1]), tensor([1, 2, 3, 3]), tensor([0, 1, 2, 3]))

>>> # If the node with the largest ID is isolated (meaning no edges),
>>> # then one needs to explicitly set the number of nodes
>>> g = dgl.graph((u, v), num_nodes=8)
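
The node-count rule illustrated above can be sketched in plain Python. The helper `infer_num_nodes` below is hypothetical (it is not a DGL function) and assumes that, absent an explicit `num_nodes`, the count is the largest node ID in the edges plus one.

```python
def infer_num_nodes(u, v, num_nodes=None):
    """Hypothetical helper: derive a node count from source/destination ID lists."""
    # Largest ID seen in either endpoint list, plus one (IDs start at 0).
    inferred = max(max(u, default=-1), max(v, default=-1)) + 1
    if num_nodes is None:
        return inferred
    if num_nodes < inferred:
        raise ValueError("num_nodes is smaller than the largest node ID + 1")
    return num_nodes

u, v = [0, 0, 0, 1], [1, 2, 3, 3]
print(infer_num_nodes(u, v))               # 4
print(infer_num_nodes(u, v, num_nodes=8))  # 8: nodes 4..7 have no edges
```

Passing the count explicitly is the only way for such a scheme to know about trailing isolated nodes, since they never appear in any edge.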

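For an undirected graph, one needs to create edges for both directions. dgl.to_bidirected() is helpful in this case: it converts a graph into a new one with edges in both directions.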

>>> bg = dgl.to_bidirected(g)
>>> bg.edges()
(tensor([0, 0, 0, 1, 1, 2, 3, 3]), tensor([1, 2, 3, 0, 3, 0, 0, 1]))
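
Conceptually, the conversion adds the reverse of every edge and keeps one copy of each resulting pair. The sketch below is a plain-Python illustration under that assumption; `bidirect` is a hypothetical helper, not DGL's implementation, and the sorted edge order is a choice made here for readability.

```python
def bidirect(u, v):
    """Hypothetical helper: return the edges plus their reverses, without duplicates."""
    # Union of forward pairs (u[i], v[i]) and reversed pairs (v[i], u[i]).
    pairs = set(zip(u, v)) | set(zip(v, u))
    ordered = sorted(pairs)  # sort by source ID, then destination ID
    return [s for s, _ in ordered], [d for _, d in ordered]

src, dst = bidirect([0, 0, 0, 1], [1, 2, 3, 3])
print(src)  # [0, 0, 0, 1, 1, 2, 3, 3]
print(dst)  # [1, 2, 3, 0, 3, 0, 0, 1]
```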

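Note

Tensor types are generally preferred throughout the DGL APIs because of their efficient internal storage in C and their explicit data type and device context information. However, most DGL APIs do accept Python iterables (e.g., lists) or numpy.ndarray as arguments for quick prototyping.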

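DGL can use \(32\)- or \(64\)-bit integers to store node and edge IDs. Node IDs and edge IDs must use the same data type. With \(64\)-bit integers, DGL can handle graphs with up to \(2^{63} - 1\) nodes or edges. However, if a graph contains fewer than \(2^{31} - 1\) nodes or edges, one should use \(32\)-bit integers, as this is faster and requires less memory. DGL provides methods for making such conversions, as in the example below.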

>>> edges = th.tensor([2, 5, 3]), th.tensor([3, 5, 0])  # edges 2->3, 5->5, 3->0
>>> g64 = dgl.graph(edges)  # DGL uses int64 by default
>>> print(g64.idtype)
torch.int64
>>> g32 = dgl.graph(edges, idtype=th.int32)  # create an int32 graph
>>> g32.idtype
torch.int32
>>> g64_2 = g32.long()  # convert to int64
>>> g64_2.idtype
torch.int64
>>> g32_2 = g64.int()  # convert to int32
>>> g32_2.idtype
torch.int32
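
The size guidance above amounts to a simple range check. The following sketch uses a hypothetical helper, `pick_id_bits`, which is not part of DGL; it only encodes the thresholds stated in the text.

```python
INT32_LIMIT = 2**31 - 1  # threshold below which 32-bit IDs are recommended
INT64_LIMIT = 2**63 - 1  # largest node or edge count supported with 64-bit IDs

def pick_id_bits(num_nodes, num_edges):
    """Hypothetical helper: choose an ID width from the thresholds in the text."""
    largest = max(num_nodes, num_edges)
    if largest < INT32_LIMIT:
        return 32
    if largest <= INT64_LIMIT:
        return 64
    raise OverflowError("graph is too large for 64-bit IDs")

print(pick_id_bits(6, 3))          # 32
print(pick_id_bits(2**31, 2**33))  # 64
```

Because node and edge IDs share one data type, the check uses the larger of the two counts.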