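Heterogeneous Graph Support

What is a heterogeneous graph?

A heterogeneous graph is a graph whose nodes and edges are both typed, for example user and game nodes connected by follows and plays edges.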
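
As a rough illustration in plain Python (not DGL's internal representation), a typed graph can be thought of as edge lists keyed by a (source type, edge type, destination type) triple. The node and edge type names are borrowed from the example later in this post; the helper `num_nodes_per_type` is hypothetical and exists only for this sketch.

```python
# A minimal plain-Python sketch of a typed graph: edge lists keyed by
# (source node type, edge type, destination node type).
typed_edges = {
    ("user", "follows", "user"): [(0, 1), (1, 2)],
    ("user", "plays", "game"): [(0, 0), (1, 0), (1, 1), (2, 1)],
}


def num_nodes_per_type(edges_by_type):
    """Infer the node count of each type as (largest node ID seen) + 1."""
    counts = {}
    for (src_type, _etype, dst_type), pairs in edges_by_type.items():
        for src, dst in pairs:
            counts[src_type] = max(counts.get(src_type, 0), src + 1)
            counts[dst_type] = max(counts.get(dst_type, 0), dst + 1)
    return counts


node_counts = num_nodes_per_type(typed_edges)
print(node_counts)  # {'user': 3, 'game': 2}
```

Node IDs are counted separately for each node type, so user 0 and game 0 are different nodes.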

Models that work on heterogeneous graphs?

Models that use the heterogeneous graph API

Dataset	RMSE (DGL)	RMSE (official)	Speed (DGL)	Speed (official)	Speedup
MovieLens-100K	0.9077	0.910	0.0246 s/epoch	0.1008 s/epoch	5x
MovieLens-1M	0.8377	0.832	0.0695 s/epoch	1.538 s/epoch	22x
MovieLens-10M (full-graph training)	0.7875	0.777	0.6480 s/epoch	OOM	-

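R-GCN [Code in PyTorch]
- We provide an R-GCN model that accepts heterogeneous graph input. The new code can train on the AM dataset (>5M edges) with a single GPU, whereas the original implementation runs only on CPU and consumes 32GB of memory.
- The original implementation takes 51.88s per epoch on CPU. The new heterograph-based R-GCN takes only 0.1781s per epoch on a V100 GPU (291x faster!).
Heterogeneous Attention Networks [Code in PyTorch]
Metapath2vec [Code in PyTorch]
- The metapath sampler is twice as fast as the original implementation.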

How to use heterogeneous graphs?

Below is an example of creating and manipulating a heterogeneous graph:

import dgl
import torch
import dgl.function as fn

g = dgl.heterograph({
    ('user', 'follows', 'user'): [(0, 1), (1, 2)],
    ('user', 'plays', 'game'): [(0, 0), (1, 0), (1, 1), (2, 1)],
    ('game', 'attracts', 'user'): [(0, 0), (0, 1), (1, 1), (1, 2)],
    ('developer', 'develops', 'game'): [(0, 0), (1, 1)],
    })

# Here the user nodes have a single feature named x, and game nodes have a single feature named y
x = torch.randn(3, 5)
y = torch.randn(2, 4)
g.nodes['user'].data['x'] = x
g.nodes['game'].data['y'] = y

# Edge features are similar
a = torch.randn(2, 5)
b = torch.randn(4, 7)
g.edges['follows'].data['a'] = a
g.edges['plays'].data['b'] = b

# One can also perform message passing.
# The following code performs a full message passing on the "plays" edges:
# each game node sums the 'x' features of the users who play it.
g['plays'].update_all(fn.copy_u('x', 'm'), fn.sum('m', 'z'))
z = g.nodes['game'].data['z']
# Game 0 is played by users 0 and 1; game 1 is played by users 1 and 2.
assert torch.allclose(z[0], x[0] + x[1])
assert torch.allclose(z[1], x[1] + x[2])

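# Moreover, one can perform message passing on multiple edge types at the same time;
# the per-type results are combined by the cross-type reducer ('sum' here).
# The combined features must have the same shape, so we give game nodes a 5-dim
# feature 'h' (added here for illustration) to match the 5-dim user feature 'x'.
g.nodes['game'].data['h'] = torch.randn(2, 5)
g.multi_update_all({
    'follows': (fn.copy_u('x', 'm'), fn.sum('m', 'w')),
    'attracts': (fn.copy_u('h', 'm'), fn.sum('m', 'w')),
    }, 'sum')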
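
To make the semantics of the message passing calls above concrete, here is a plain-Python sketch of "copy the source feature, then sum at the destination", followed by a sum across two edge types. It illustrates the idea only and is not DGL's implementation; the helper `copy_u_sum`, the toy feature values, and the game feature `game_h` are assumptions made for this sketch.

```python
def copy_u_sum(edges, src_feat, num_dst):
    """Sum each source node's feature vector into its destination node."""
    dim = len(src_feat[0])
    out = [[0.0] * dim for _ in range(num_dst)]
    for src, dst in edges:
        for i, value in enumerate(src_feat[src]):
            out[dst][i] += value
    return out


# Toy 2-dim features (made up for illustration).
user_x = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
game_h = [[10.0, 10.0], [20.0, 20.0]]

# Same edge lists as in the example above.
plays = [(0, 0), (1, 0), (1, 1), (2, 1)]        # user -> game
follows = [(0, 1), (1, 2)]                      # user -> user
attracts = [(0, 0), (0, 1), (1, 1), (1, 2)]     # game -> user

# Single edge type: every game sums the features of the users who play it.
z = copy_u_sum(plays, user_x, num_dst=2)

# Two edge types with the same destination type: aggregate per type,
# then combine the per-type results with an elementwise sum.
w_follows = copy_u_sum(follows, user_x, num_dst=3)
w_attracts = copy_u_sum(attracts, game_h, num_dst=3)
w = [[a + b for a, b in zip(row_f, row_a)]
     for row_f, row_a in zip(w_follows, w_attracts)]
```

The elementwise sum in the last step only works when both per-type results have the same shape, which is why the two source features must have the same dimension.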

Check out our heterograph tutorial: Working with Heterogeneous Graphs in DGL

Check out the full API reference.

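Knowledge Graph Models

We also released DGL-KE, a subpackage of DGL that trains embeddings on knowledge graphs. The package is adapted from the KnowledgeGraphEmbedding package. We made it fast and scalable while keeping the flexibility of the original package. Using a single NVIDIA V100 GPU, DGL-KE can train TransE on FB15k in 6.85 minutes, substantially outperforming existing tools such as GraphVite. For graphs with hundreds of millions of edges (such as the full Freebase graph), training takes a few hours on one EC2 x1.32xlarge machine.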

The following models are currently supported

TransE
DistMult
ComplEx
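
For readers unfamiliar with these three models, the sketch below writes out their textbook scoring functions in plain Python. It is not DGL-KE code: the function names are hypothetical, and details such as the choice of norm in TransE (L2 is assumed here) or any margin and regularization terms may differ in the actual package.

```python
import math


def transe_score(h, r, t):
    """TransE: negative L2 distance between h + r and t (higher is better)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h, r, t):
    """DistMult: trilinear product sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """ComplEx: real part of sum_i h_i * r_i * conj(t_i), complex embeddings."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


# Toy embeddings (made up for illustration).
s_transe = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])   # h + r equals t
s_distmult = distmult_score([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
s_complex = complex_score([1 + 1j], [1j], [1 - 1j])
```

In all three cases a higher score means the triple (head, relation, tail) is considered more plausible.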

The following training schemes are supported

CPU training
GPU training
Joint CPU & GPU training
Multiprocessing training on CPUs

Training results on FB15k using one NVIDIA V100 GPU

Training speed

Model	TransE	DistMult	ComplEx
Max steps	20000	100000	100000
Time	411s	690s	806s

Training accuracy

Model	MR	MRR	HITS@1	HITS@3	HITS@10
TransE	69.12	0.656	0.567	0.718	0.802
DistMult	43.35	0.783	0.713	0.837	0.897
ComplEx	51.99	0.785	0.720	0.832	0.889
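
The columns of the accuracy table are standard link-prediction ranking metrics. As a sketch of how they are commonly defined, assuming each test triple yields the 1-based rank of the correct entity, they can be computed as follows. The function name and the example ranks are made up; they are not DGL-KE's evaluation code or results from this post.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and HITS@k from 1-based ranks of the correct entities."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                      # mean rank (lower is better)
        "MRR": sum(1.0 / r for r in ranks) / n,    # mean reciprocal rank
    }
    for k in ks:
        # Fraction of test triples whose correct entity is ranked in the top k.
        metrics[f"HITS@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


example_ranks = [1, 2, 4, 10, 50]  # made-up ranks for illustration only
metrics = ranking_metrics(example_ranks)
```

MR is dominated by a few badly ranked triples, while MRR and HITS@k emphasize the top of the ranking, which is why a model can have a worse MR but a better MRR than another.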

In comparison, GraphVite takes 14 minutes with 4 GPUs. DGL-KE therefore trains TransE on FB15k about 2x faster than GraphVite while using fewer resources (one GPU instead of four).
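
The 2x figure follows directly from the numbers reported above; a quick arithmetic check, assuming both timings are wall-clock training times:

```python
dgl_ke_seconds = 411          # TransE on FB15k with one V100 (speed table above)
graphvite_seconds = 14 * 60   # 14 minutes with 4 GPUs

dgl_ke_minutes = dgl_ke_seconds / 60            # matches the quoted 6.85 minutes
speedup = graphvite_seconds / dgl_ke_seconds    # a little over 2x

print(f"{dgl_ke_minutes:.2f} min, {speedup:.2f}x")
```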

For more information, please refer to this directory.

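Other

New built-in message function: dot product (u_dot_v, etc. #831 @classicsong)
More efficient data format and serialization (#728 @VoVAllen)
ClusterGCN (#877, @Zardinality)
CoraFull, Amazon, KarateClub, Coauthor datasets (#855 @VoVAllen)
More performance improvements
More bug fixes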
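
As a sketch of what a dot-product message such as u_dot_v computes, the plain-Python function below produces one scalar per edge from the features of its two endpoints. The name `edge_dot_product` and the toy values are hypothetical; this illustrates the idea and is not the DGL built-in.

```python
def edge_dot_product(edges, src_feat, dst_feat):
    """Return one scalar per edge (u, v): dot(src_feat[u], dst_feat[v])."""
    return [
        sum(a * b for a, b in zip(src_feat[u], dst_feat[v]))
        for u, v in edges
    ]


edges = [(0, 1), (1, 2)]
feat = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # made-up node features
scores = edge_dot_product(edges, feat, feat)
```

Per-edge dot products of this kind are the basic building block of dot-product attention scores on graphs.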

October 8

By the DGL Team, in Release

DGL v0.4 Release (heterogeneous graph update)
