OnDiskDataset for Homogeneous Graph
This tutorial shows how to create an OnDiskDataset for a homogeneous graph that can be used by the GraphBolt framework.
By the end of this tutorial, you will be able to:
organize graph structure data.
organize feature data.
organize training/validation/test sets for specific tasks.
To create an OnDiskDataset object, you need to organize all the data, including graph structure, feature data, and tasks, into a single directory. The directory should contain a metadata.yaml file that describes the metadata of the dataset.
Now let's generate the various pieces of data step by step, organize them together, and finally instantiate OnDiskDataset.
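By the end of the tutorial, the base directory will look roughly like this (a sketch; the exact file names are the ones generated below):
ondisk_dataset_homograph/
  metadata.yaml
  edges.csv
  node-feat-0.npy, node-feat-1.pt
  edge-feat-0.npy, edge-feat-1.pt
  nc-train-ids.npy, nc-train-labels.pt, ... (node classification sets)
  lp-train-seeds.npy, lp-val-seeds.npy, ... (link prediction sets)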
Install the DGL package
[1]:
# Install required packages.
import os
import torch
import numpy as np
os.environ['TORCH'] = torch.__version__
os.environ['DGLBACKEND'] = "pytorch"
# Install the CPU version.
device = torch.device("cpu")
!pip install --pre dgl -f https://data.dgl.ai/wheels-test/repo.html
try:
    import dgl
    import dgl.graphbolt as gb
    installed = True
except ImportError as error:
    installed = False
    print(error)
print("DGL installed!" if installed else "DGL not found!")
Looking in links: https://data.dgl.ai/wheels-test/repo.html
Requirement already satisfied: dgl in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (2.2a240410)
Requirement already satisfied: numpy>=1.14.0 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from dgl) (1.26.4)
Requirement already satisfied: scipy>=1.1.0 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from dgl) (1.14.1)
Requirement already satisfied: networkx>=2.1 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from dgl) (3.4.2)
Requirement already satisfied: requests>=2.19.0 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from dgl) (2.32.3)
Requirement already satisfied: tqdm in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from dgl) (4.66.6)
Requirement already satisfied: psutil>=5.8.0 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from dgl) (6.1.0)
Requirement already satisfied: torchdata>=0.5.0 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from dgl) (0.9.0)
Requirement already satisfied: pandas in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from dgl) (2.2.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from requests>=2.19.0->dgl) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from requests>=2.19.0->dgl) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from requests>=2.19.0->dgl) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from requests>=2.19.0->dgl) (2024.8.30)
Requirement already satisfied: torch>=2 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from torchdata>=0.5.0->dgl) (2.1.0+cpu)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from pandas->dgl) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from pandas->dgl) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from pandas->dgl) (2024.2)
Requirement already satisfied: six>=1.5 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas->dgl) (1.16.0)
Requirement already satisfied: filelock in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from torch>=2->torchdata>=0.5.0->dgl) (3.16.1)
Requirement already satisfied: typing-extensions in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from torch>=2->torchdata>=0.5.0->dgl) (4.12.2)
Requirement already satisfied: sympy in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from torch>=2->torchdata>=0.5.0->dgl) (1.13.3)
Requirement already satisfied: jinja2 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from torch>=2->torchdata>=0.5.0->dgl) (3.1.4)
Requirement already satisfied: fsspec in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from torch>=2->torchdata>=0.5.0->dgl) (2024.10.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from jinja2->torch>=2->torchdata>=0.5.0->dgl) (3.0.2)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/dgl-dev-cpu/lib/python3.10/site-packages (from sympy->torch>=2->torchdata>=0.5.0->dgl) (1.3.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
DGL installed!
Data preparation
To demonstrate how to organize the various data, let's first create a base directory.
[2]:
base_dir = './ondisk_dataset_homograph'
os.makedirs(base_dir, exist_ok=True)
print(f"Created base directory: {base_dir}")
Created base directory: ./ondisk_dataset_homograph
Generate graph structure data
For a homogeneous graph, we only need to save the edges (namely seeds) into a Numpy or CSV file.
Note: - When saving to Numpy, the array needs to have shape (2, N). This format is recommended, as constructing the graph from it is much faster than from a CSV file (see the sketch after the cell below). - When saving to a CSV file, do not save the index and header.
[3]:
import numpy as np
import pandas as pd
num_nodes = 1000
num_edges = 10 * num_nodes
edges_path = os.path.join(base_dir, "edges.csv")
edges = np.random.randint(0, num_nodes, size=(num_edges, 2))
print(f"Part of edges: {edges[:5, :]}")
df = pd.DataFrame(edges)
df.to_csv(edges_path, index=False, header=False)
print(f"Edges are saved into {edges_path}")
Part of edges: [[734 698]
[492 101]
[141 102]
[255 293]
[172 382]]
Edges are saved into ./ondisk_dataset_homograph/edges.csv
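For reference, here is a minimal sketch of the recommended Numpy layout, using a hypothetical file name edges.npy that is not used in the rest of this tutorial. The edges generated above are transposed to shape (2, N) before saving, and the corresponding metadata.yaml entry would then use format: numpy instead of csv.
# Hypothetical alternative: save edges in the recommended (2, N) Numpy format.
edges_numpy_path = os.path.join(base_dir, "edges.npy")
np.save(edges_numpy_path, edges.T)  # transpose (num_edges, 2) -> (2, num_edges)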
Generate feature data for the graph
For feature data, numpy arrays and torch tensors are currently supported.
[4]:
# Generate node feature in numpy array.
node_feat_0_path = os.path.join(base_dir, "node-feat-0.npy")
node_feat_0 = np.random.rand(num_nodes, 5)
print(f"Part of node feature [feat_0]: {node_feat_0[:3, :]}")
np.save(node_feat_0_path, node_feat_0)
print(f"Node feature [feat_0] is saved to {node_feat_0_path}\n")
# Generate another node feature in torch tensor
node_feat_1_path = os.path.join(base_dir, "node-feat-1.pt")
node_feat_1 = torch.rand(num_nodes, 5)
print(f"Part of node feature [feat_1]: {node_feat_1[:3, :]}")
torch.save(node_feat_1, node_feat_1_path)
print(f"Node feature [feat_1] is saved to {node_feat_1_path}\n")
# Generate edge feature in numpy array.
edge_feat_0_path = os.path.join(base_dir, "edge-feat-0.npy")
edge_feat_0 = np.random.rand(num_edges, 5)
print(f"Part of edge feature [feat_0]: {edge_feat_0[:3, :]}")
np.save(edge_feat_0_path, edge_feat_0)
print(f"Edge feature [feat_0] is saved to {edge_feat_0_path}\n")
# Generate another edge feature in torch tensor
edge_feat_1_path = os.path.join(base_dir, "edge-feat-1.pt")
edge_feat_1 = torch.rand(num_edges, 5)
print(f"Part of edge feature [feat_1]: {edge_feat_1[:3, :]}")
torch.save(edge_feat_1, edge_feat_1_path)
print(f"Edge feature [feat_1] is saved to {edge_feat_1_path}\n")
Part of node feature [feat_0]: [[0.2675768 0.84555141 0.31953485 0.70518215 0.7384711 ]
[0.17017616 0.67410909 0.49357539 0.17954053 0.51379857]
[0.20808962 0.62090961 0.00869142 0.76270778 0.75740362]]
Node feature [feat_0] is saved to ./ondisk_dataset_homograph/node-feat-0.npy
Part of node feature [feat_1]: tensor([[0.3102, 0.6617, 0.3103, 0.1763, 0.4377],
[0.3336, 0.4147, 0.4776, 0.6154, 0.4325],
[0.9472, 0.4797, 0.4150, 0.9046, 0.7426]])
Node feature [feat_1] is saved to ./ondisk_dataset_homograph/node-feat-1.pt
Part of edge feature [feat_0]: [[0.48323927 0.16915343 0.64657681 0.95671693 0.67171557]
[0.73523352 0.25524394 0.82357219 0.84688155 0.09598407]
[0.03860003 0.93619916 0.81360089 0.47665546 0.93298402]]
Edge feature [feat_0] is saved to ./ondisk_dataset_homograph/edge-feat-0.npy
Part of edge feature [feat_1]: tensor([[0.5663, 0.9633, 0.1347, 0.3310, 0.9384],
[0.8327, 0.9789, 0.8282, 0.2175, 0.5416],
[0.0256, 0.3471, 0.4384, 0.0020, 0.7780]])
Edge feature [feat_1] is saved to ./ondisk_dataset_homograph/edge-feat-1.pt
Generate tasks
OnDiskDataset supports multiple tasks. For each task, we need to prepare its own training/validation/test sets. Such sets usually vary between tasks. In this tutorial, we will create a node classification task and a link prediction task.
Node classification task
For the node classification task, we need node IDs and the corresponding labels for each training/validation/test set. Like feature data, these sets support numpy arrays and torch tensors.
[5]:
num_trains = int(num_nodes * 0.6)
num_vals = int(num_nodes * 0.2)
num_tests = num_nodes - num_trains - num_vals
ids = np.arange(num_nodes)
np.random.shuffle(ids)
nc_train_ids_path = os.path.join(base_dir, "nc-train-ids.npy")
nc_train_ids = ids[:num_trains]
print(f"Part of train ids for node classification: {nc_train_ids[:3]}")
np.save(nc_train_ids_path, nc_train_ids)
print(f"NC train ids are saved to {nc_train_ids_path}\n")
nc_train_labels_path = os.path.join(base_dir, "nc-train-labels.pt")
nc_train_labels = torch.randint(0, 10, (num_trains,))
print(f"Part of train labels for node classification: {nc_train_labels[:3]}")
torch.save(nc_train_labels, nc_train_labels_path)
print(f"NC train labels are saved to {nc_train_labels_path}\n")
nc_val_ids_path = os.path.join(base_dir, "nc-val-ids.npy")
nc_val_ids = ids[num_trains:num_trains+num_vals]
print(f"Part of val ids for node classification: {nc_val_ids[:3]}")
np.save(nc_val_ids_path, nc_val_ids)
print(f"NC val ids are saved to {nc_val_ids_path}\n")
nc_val_labels_path = os.path.join(base_dir, "nc-val-labels.pt")
nc_val_labels = torch.randint(0, 10, (num_vals,))
print(f"Part of val labels for node classification: {nc_val_labels[:3]}")
torch.save(nc_val_labels, nc_val_labels_path)
print(f"NC val labels are saved to {nc_val_labels_path}\n")
nc_test_ids_path = os.path.join(base_dir, "nc-test-ids.npy")
nc_test_ids = ids[-num_tests:]
print(f"Part of test ids for node classification: {nc_test_ids[:3]}")
np.save(nc_test_ids_path, nc_test_ids)
print(f"NC test ids are saved to {nc_test_ids_path}\n")
nc_test_labels_path = os.path.join(base_dir, "nc-test-labels.pt")
nc_test_labels = torch.randint(0, 10, (num_tests,))
print(f"Part of test labels for node classification: {nc_test_labels[:3]}")
torch.save(nc_test_labels, nc_test_labels_path)
print(f"NC test labels are saved to {nc_test_labels_path}\n")
Part of train ids for node classification: [809 209 773]
NC train ids are saved to ./ondisk_dataset_homograph/nc-train-ids.npy
Part of train labels for node classification: tensor([1, 2, 6])
NC train labels are saved to ./ondisk_dataset_homograph/nc-train-labels.pt
Part of val ids for node classification: [156 777 233]
NC val ids are saved to ./ondisk_dataset_homograph/nc-val-ids.npy
Part of val labels for node classification: tensor([2, 8, 3])
NC val labels are saved to ./ondisk_dataset_homograph/nc-val-labels.pt
Part of test ids for node classification: [484 372 48]
NC test ids are saved to ./ondisk_dataset_homograph/nc-test-ids.npy
Part of test labels for node classification: tensor([8, 6, 8])
NC test labels are saved to ./ondisk_dataset_homograph/nc-test-labels.pt
Link prediction task
For the link prediction task, each training/validation/test set needs seeds, or seeds together with labels and indexes that mark the positive/negative property and the group of each seed. Like feature data, these sets support numpy arrays and torch tensors. The layout of seeds, labels, and indexes is illustrated by the small sketch below.
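A tiny hand-made sketch (not part of the dataset files), assuming 2 positive seeds, each paired with 2 negative destinations:
# labels marks each row of seeds as positive (1) or negative (0);
# indexes maps each row back to the positive seed group it belongs to.
pos = np.array([[0, 1], [2, 3]])                  # positive seeds
neg = np.array([[0, 7], [0, 8], [2, 5], [2, 9]])  # negative seeds
seeds = np.concatenate((pos, neg))                # shape (6, 2)
labels = np.array([1, 1, 0, 0, 0, 0])
indexes = np.array([0, 1, 0, 0, 1, 1])
The cells below build exactly this structure at scale, with 10 negative destinations per positive seed.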
[6]:
num_trains = int(num_edges * 0.6)
num_vals = int(num_edges * 0.2)
num_tests = num_edges - num_trains - num_vals
lp_train_seeds_path = os.path.join(base_dir, "lp-train-seeds.npy")
lp_train_seeds = edges[:num_trains, :]
print(f"Part of train seeds for link prediction: {lp_train_seeds[:3]}")
np.save(lp_train_seeds_path, lp_train_seeds)
print(f"LP train seeds are saved to {lp_train_seeds_path}\n")
lp_val_seeds_path = os.path.join(base_dir, "lp-val-seeds.npy")
lp_val_seeds = edges[num_trains:num_trains+num_vals, :]
lp_val_neg_dsts = np.random.randint(0, num_nodes, (num_vals, 10)).reshape(-1)
lp_val_neg_srcs = np.repeat(lp_val_seeds[:,0], 10)
lp_val_neg_seeds = np.concatenate((lp_val_neg_srcs, lp_val_neg_dsts)).reshape(2,-1).T
lp_val_seeds = np.concatenate((lp_val_seeds, lp_val_neg_seeds))
print(f"Part of val seeds for link prediction: {lp_val_seeds[:3]}")
np.save(lp_val_seeds_path, lp_val_seeds)
print(f"LP val seeds are saved to {lp_val_seeds_path}\n")
lp_val_labels_path = os.path.join(base_dir, "lp-val-labels.npy")
lp_val_labels = np.empty(num_vals * (10 + 1))
lp_val_labels[:num_vals] = 1
lp_val_labels[num_vals:] = 0
print(f"Part of val labels for link prediction: {lp_val_labels[:3]}")
np.save(lp_val_labels_path, lp_val_labels)
print(f"LP val labels are saved to {lp_val_labels_path}\n")
lp_val_indexes_path = os.path.join(base_dir, "lp-val-indexes.npy")
lp_val_indexes = np.arange(0, num_vals)
lp_val_neg_indexes = np.repeat(lp_val_indexes, 10)
lp_val_indexes = np.concatenate([lp_val_indexes, lp_val_neg_indexes])
print(f"Part of val indexes for link prediction: {lp_val_indexes[:3]}")
np.save(lp_val_indexes_path, lp_val_indexes)
print(f"LP val indexes are saved to {lp_val_indexes_path}\n")
lp_test_seeds_path = os.path.join(base_dir, "lp-test-seeds.npy")
lp_test_seeds = edges[-num_tests:, :]
lp_test_neg_dsts = np.random.randint(0, num_nodes, (num_tests, 10)).reshape(-1)
lp_test_neg_srcs = np.repeat(lp_test_seeds[:,0], 10)
lp_test_neg_seeds = np.concatenate((lp_test_neg_srcs, lp_test_neg_dsts)).reshape(2,-1).T
lp_test_seeds = np.concatenate((lp_test_seeds, lp_test_neg_seeds))
print(f"Part of test seeds for link prediction: {lp_test_seeds[:3]}")
np.save(lp_test_seeds_path, lp_test_seeds)
print(f"LP test seeds are saved to {lp_test_seeds_path}\n")
lp_test_labels_path = os.path.join(base_dir, "lp-test-labels.npy")
lp_test_labels = np.empty(num_tests * (10 + 1))
lp_test_labels[:num_tests] = 1
lp_test_labels[num_tests:] = 0
print(f"Part of val labels for link prediction: {lp_test_labels[:3]}")
np.save(lp_test_labels_path, lp_test_labels)
print(f"LP test labels are saved to {lp_test_labels_path}\n")
lp_test_indexes_path = os.path.join(base_dir, "lp-test-indexes.npy")
lp_test_indexes = np.arange(0, num_tests)
lp_test_neg_indexes = np.repeat(lp_test_indexes, 10)
lp_test_indexes = np.concatenate([lp_test_indexes, lp_test_neg_indexes])
print(f"Part of test indexes for link prediction: {lp_test_indexes[:3]}")
np.save(lp_test_indexes_path, lp_test_indexes)
print(f"LP test indexes are saved to {lp_test_indexes_path}\n")
Part of train seeds for link prediction: [[734 698]
[492 101]
[141 102]]
LP train seeds are saved to ./ondisk_dataset_homograph/lp-train-seeds.npy
Part of val seeds for link prediction: [[771 495]
[715 87]
[590 983]]
LP val seeds are saved to ./ondisk_dataset_homograph/lp-val-seeds.npy
Part of val labels for link prediction: [1. 1. 1.]
LP val labels are saved to ./ondisk_dataset_homograph/lp-val-labels.npy
Part of val indexes for link prediction: [0 1 2]
LP val indexes are saved to ./ondisk_dataset_homograph/lp-val-indexes.npy
Part of test seeds for link prediction: [[166 289]
[697 620]
[976 534]]
LP test seeds are saved to ./ondisk_dataset_homograph/lp-test-seeds.npy
Part of test labels for link prediction: [1. 1. 1.]
LP test labels are saved to ./ondisk_dataset_homograph/lp-test-labels.npy
Part of test indexes for link prediction: [0 1 2]
LP test indexes are saved to ./ondisk_dataset_homograph/lp-test-indexes.npy
Organize Data into YAML File
Now we need to create a metadata.yaml file that contains the paths and data types of the graph structure, feature data, and training/validation/test sets.
Note: - All paths should be relative to metadata.yaml. - The following field is optional and is not specified in the example below. - in_memory: indicates whether to load the data into memory or to mmap it. Default is True.
Please refer to the YAML specification for more details.
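For instance, a feature entry that opts out of in-memory loading might look like the following sketch (the metadata below does not set this field):
  - domain: node
    name: feat_0
    format: numpy
    in_memory: false
    path: node-feat-0.npy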
[7]:
yaml_content = f"""
    dataset_name: homogeneous_graph_nc_lp
    graph:
      nodes:
        - num: {num_nodes}
      edges:
        - format: csv
          path: {os.path.basename(edges_path)}
    feature_data:
      - domain: node
        name: feat_0
        format: numpy
        path: {os.path.basename(node_feat_0_path)}
      - domain: node
        name: feat_1
        format: torch
        path: {os.path.basename(node_feat_1_path)}
      - domain: edge
        name: feat_0
        format: numpy
        path: {os.path.basename(edge_feat_0_path)}
      - domain: edge
        name: feat_1
        format: torch
        path: {os.path.basename(edge_feat_1_path)}
    tasks:
      - name: node_classification
        num_classes: 10
        train_set:
          - data:
              - name: seeds
                format: numpy
                path: {os.path.basename(nc_train_ids_path)}
              - name: labels
                format: torch
                path: {os.path.basename(nc_train_labels_path)}
        validation_set:
          - data:
              - name: seeds
                format: numpy
                path: {os.path.basename(nc_val_ids_path)}
              - name: labels
                format: torch
                path: {os.path.basename(nc_val_labels_path)}
        test_set:
          - data:
              - name: seeds
                format: numpy
                path: {os.path.basename(nc_test_ids_path)}
              - name: labels
                format: torch
                path: {os.path.basename(nc_test_labels_path)}
      - name: link_prediction
        num_classes: 10
        train_set:
          - data:
              - name: seeds
                format: numpy
                path: {os.path.basename(lp_train_seeds_path)}
        validation_set:
          - data:
              - name: seeds
                format: numpy
                path: {os.path.basename(lp_val_seeds_path)}
              - name: labels
                format: numpy
                path: {os.path.basename(lp_val_labels_path)}
              - name: indexes
                format: numpy
                path: {os.path.basename(lp_val_indexes_path)}
        test_set:
          - data:
              - name: seeds
                format: numpy
                path: {os.path.basename(lp_test_seeds_path)}
              - name: labels
                format: numpy
                path: {os.path.basename(lp_test_labels_path)}
              - name: indexes
                format: numpy
                path: {os.path.basename(lp_test_indexes_path)}
"""
metadata_path = os.path.join(base_dir, "metadata.yaml")
with open(metadata_path, "w") as f:
    f.write(yaml_content)
Instantiate OnDiskDataset
Now we can load the dataset via dgl.graphbolt.OnDiskDataset. When instantiating, we just pass in the base directory that contains the metadata.yaml file.
During the first instantiation, GraphBolt preprocesses the raw data, for example constructing a FusedCSCSamplingGraph from the edges. After preprocessing, all data, including the graph, feature data, and training/validation/test sets, is placed into the preprocessed directory. Any subsequent dataset loading will skip the preprocessing stage.
After preprocessing, load() must be called explicitly to load the graph, feature data, and tasks.
[8]:
dataset = gb.OnDiskDataset(base_dir).load()
graph = dataset.graph
print(f"Loaded graph: {graph}\n")
feature = dataset.feature
print(f"Loaded feature store: {feature}\n")
tasks = dataset.tasks
nc_task = tasks[0]
print(f"Loaded node classification task: {nc_task}\n")
lp_task = tasks[1]
print(f"Loaded link prediction task: {lp_task}\n")
Start to preprocess the on-disk dataset.
Finish preprocessing the on-disk dataset.
Loaded graph: FusedCSCSamplingGraph(csc_indptr=tensor([ 0, 7, 15, ..., 9983, 9988, 10000], dtype=torch.int32),
indices=tensor([188, 589, 176, ..., 294, 762, 730], dtype=torch.int32),
total_num_nodes=1000, num_edges=10000,)
Loaded feature store: TorchBasedFeatureStore(
{(<OnDiskFeatureDataDomain.NODE: 'node'>, None, 'feat_0'): TorchBasedFeature(
feature=tensor([[0.2676, 0.8456, 0.3195, 0.7052, 0.7385],
[0.1702, 0.6741, 0.4936, 0.1795, 0.5138],
[0.2081, 0.6209, 0.0087, 0.7627, 0.7574],
...,
[0.9070, 0.4060, 0.9906, 0.6465, 0.1518],
[0.1824, 0.9145, 0.4194, 0.6864, 0.4178],
[0.9964, 0.0864, 0.5270, 0.4842, 0.0228]], dtype=torch.float64),
metadata={},
), (<OnDiskFeatureDataDomain.NODE: 'node'>, None, 'feat_1'): TorchBasedFeature(
feature=tensor([[0.3102, 0.6617, 0.3103, 0.1763, 0.4377],
[0.3336, 0.4147, 0.4776, 0.6154, 0.4325],
[0.9472, 0.4797, 0.4150, 0.9046, 0.7426],
...,
[0.0891, 0.8304, 0.5157, 0.1804, 0.8821],
[0.5526, 0.8321, 0.5452, 0.4415, 0.0907],
[0.2525, 0.3944, 0.8356, 0.9236, 0.3284]]),
metadata={},
), (<OnDiskFeatureDataDomain.EDGE: 'edge'>, None, 'feat_0'): TorchBasedFeature(
feature=tensor([[0.4832, 0.1692, 0.6466, 0.9567, 0.6717],
[0.7352, 0.2552, 0.8236, 0.8469, 0.0960],
[0.0386, 0.9362, 0.8136, 0.4767, 0.9330],
...,
[0.7652, 0.2556, 0.4112, 0.4190, 0.0296],
[0.9907, 0.8527, 0.5779, 0.3108, 0.8355],
[0.0917, 0.6557, 0.5226, 0.9362, 0.3608]], dtype=torch.float64),
metadata={},
), (<OnDiskFeatureDataDomain.EDGE: 'edge'>, None, 'feat_1'): TorchBasedFeature(
feature=tensor([[0.5663, 0.9633, 0.1347, 0.3310, 0.9384],
[0.8327, 0.9789, 0.8282, 0.2175, 0.5416],
[0.0256, 0.3471, 0.4384, 0.0020, 0.7780],
...,
[0.3607, 0.8425, 0.5213, 0.5604, 0.7548],
[0.1914, 0.1043, 0.7555, 0.8857, 0.1084],
[0.3130, 0.3773, 0.9874, 0.2341, 0.6229]]),
metadata={},
)}
)
Loaded node classification task: OnDiskTask(validation_set=ItemSet(
items=(tensor([156, 777, 233, 648, 371, 319, 351, 538, 537, 730, 83, 817, 401, 256,
717, 606, 719, 211, 389, 226, 742, 37, 950, 414, 282, 556, 289, 863,
293, 273, 86, 97, 529, 301, 492, 140, 268, 11, 477, 766, 654, 999,
132, 245, 181, 733, 977, 207, 491, 969, 334, 14, 995, 576, 167, 732,
302, 71, 363, 275, 426, 805, 320, 250, 483, 277, 958, 290, 17, 535,
296, 647, 108, 867, 815, 821, 270, 819, 223, 471, 584, 928, 877, 621,
406, 497, 878, 62, 478, 997, 675, 396, 595, 828, 45, 987, 994, 361,
804, 906, 65, 701, 91, 385, 9, 822, 533, 944, 862, 993, 707, 613,
961, 755, 429, 467, 769, 740, 340, 912, 684, 616, 510, 459, 131, 945,
625, 267, 940, 816, 231, 695, 489, 224, 383, 40, 966, 258, 58, 609,
506, 127, 443, 399, 790, 232, 378, 764, 503, 516, 589, 368, 272, 664,
807, 38, 579, 376, 704, 607, 318, 114, 978, 907, 726, 735, 982, 909,
676, 94, 324, 671, 590, 992, 709, 46, 550, 257, 188, 423, 913, 185,
360, 159, 262, 330, 718, 373, 469, 639, 155, 588, 517, 532, 153, 911,
472, 456, 806, 594], dtype=torch.int32), tensor([2, 8, 3, 3, 4, 0, 6, 7, 1, 4, 8, 4, 2, 6, 3, 3, 5, 2, 3, 2, 6, 5, 2, 3,
1, 0, 5, 8, 2, 0, 9, 8, 2, 7, 9, 3, 4, 3, 4, 3, 9, 2, 4, 6, 8, 4, 5, 8,
2, 8, 2, 4, 5, 0, 8, 2, 0, 0, 4, 6, 0, 9, 4, 7, 5, 1, 2, 7, 8, 5, 6, 6,
7, 7, 7, 4, 9, 6, 7, 8, 8, 9, 5, 6, 6, 6, 3, 7, 9, 1, 7, 7, 7, 5, 1, 2,
1, 7, 7, 3, 7, 1, 0, 0, 6, 5, 6, 2, 7, 6, 3, 0, 3, 2, 2, 3, 9, 3, 4, 7,
0, 8, 6, 0, 9, 7, 2, 9, 8, 5, 1, 2, 3, 7, 7, 3, 4, 8, 4, 1, 1, 3, 6, 5,
3, 0, 3, 9, 0, 1, 1, 1, 7, 6, 7, 6, 2, 5, 4, 4, 6, 9, 2, 0, 7, 4, 1, 4,
7, 9, 4, 5, 6, 1, 0, 7, 3, 6, 3, 7, 1, 9, 0, 9, 3, 1, 7, 8, 1, 1, 4, 0,
9, 8, 9, 1, 2, 3, 8, 5])),
names=('seeds', 'labels'),
),
train_set=ItemSet(
items=(tensor([809, 209, 773, 871, 422, 244, 549, 736, 875, 100, 846, 32, 295, 380,
799, 768, 635, 593, 779, 34, 854, 905, 762, 455, 205, 888, 126, 493,
868, 173, 539, 279, 284, 326, 322, 678, 212, 19, 914, 162, 880, 367,
146, 306, 242, 67, 900, 112, 220, 448, 963, 241, 260, 409, 661, 724,
416, 592, 4, 782, 274, 941, 333, 923, 618, 842, 857, 72, 56, 699,
844, 79, 174, 960, 292, 10, 144, 541, 435, 398, 27, 335, 39, 175,
929, 237, 772, 681, 150, 234, 741, 366, 869, 344, 650, 783, 400, 449,
313, 836, 605, 352, 808, 219, 152, 697, 204, 610, 130, 308, 394, 21,
255, 190, 886, 408, 700, 357, 466, 80, 206, 339, 15, 519, 713, 976,
265, 792, 927, 710, 309, 460, 314, 187, 433, 747, 288, 527, 967, 261,
29, 221, 820, 899, 286, 354, 560, 343, 853, 603, 528, 874, 441, 329,
760, 705, 523, 93, 716, 620, 679, 837, 508, 35, 151, 786, 113, 81,
2, 147, 893, 739, 932, 850, 168, 673, 956, 788, 974, 641, 952, 655,
824, 567, 348, 104, 826, 498, 109, 608, 898, 771, 629, 847, 328, 198,
432, 775, 631, 457, 475, 947, 115, 813, 651, 298, 715, 128, 734, 542,
753, 500, 178, 891, 611, 586, 141, 636, 8, 217, 677, 545, 191, 192,
461, 73, 798, 851, 936, 511, 283, 397, 213, 388, 437, 525, 834, 557,
975, 6, 436, 759, 666, 561, 612, 957, 76, 125, 617, 794, 1, 90,
251, 350, 60, 16, 922, 942, 119, 754, 85, 990, 737, 507, 811, 77,
332, 18, 660, 186, 600, 464, 42, 890, 778, 543, 624, 381, 465, 575,
5, 515, 892, 810, 102, 935, 669, 643, 518, 571, 276, 197, 176, 138,
632, 98, 170, 751, 239, 910, 105, 349, 182, 218, 447, 667, 450, 193,
139, 522, 259, 881, 802, 418, 473, 215, 738, 889, 143, 745, 33, 728,
106, 540, 746, 883, 572, 403, 99, 670, 285, 749, 486, 225, 377, 427,
849, 362, 179, 393, 691, 553, 402, 797, 796, 479, 365, 711, 665, 902,
649, 916, 390, 526, 145, 514, 748, 370, 668, 565, 604, 658, 598, 421,
534, 840, 633, 386, 196, 312, 122, 474, 129, 243, 524, 110, 812, 919,
962, 327, 872, 948, 656, 955, 861, 791, 31, 568, 307, 870, 998, 725,
52, 438, 135, 194, 23, 107, 774, 744, 551, 253, 789, 87, 908, 342,
859, 420, 53, 829, 103, 712, 856, 470, 585, 509, 405, 622, 698, 504,
708, 379, 425, 263, 451, 827, 985, 573, 830, 703, 82, 743, 440, 512,
855, 934, 485, 930, 269, 7, 841, 965, 358, 415, 795, 172, 623, 404,
980, 723, 831, 180, 228, 702, 488, 583, 59, 391, 848, 336, 337, 321,
685, 252, 984, 25, 570, 501, 184, 858, 566, 694, 894, 495, 901, 305,
413, 645, 123, 756, 818, 552, 84, 89, 387, 480, 317, 246, 41, 227,
189, 727, 248, 287, 896, 885, 278, 294, 430, 30, 75, 50, 177, 12,
674, 353, 630, 291, 690, 3, 0, 428, 569, 160, 971, 882, 663, 266,
866, 770, 341, 991, 395, 563, 696, 55, 101, 845, 657, 78, 638, 672,
839, 972, 124, 776, 235, 599, 446, 384, 758, 61, 44, 949, 920, 513,
64, 979, 445, 70, 157, 210, 355, 47, 662, 731, 133, 63, 411, 574,
750, 36, 345, 558, 359, 202, 424, 752, 26, 439, 392, 946, 989, 765,
417, 434, 311, 463, 496, 926, 490, 720, 823, 158, 564, 304, 938, 596,
121, 407, 555, 904, 200, 973, 148, 149, 240, 959, 706, 614],
dtype=torch.int32), tensor([1, 2, 6, 7, 1, 2, 7, 5, 5, 4, 1, 5, 4, 1, 2, 6, 7, 4, 5, 0, 8, 9, 7, 3,
3, 8, 9, 9, 7, 4, 2, 2, 1, 4, 4, 4, 2, 4, 0, 9, 4, 9, 2, 4, 2, 9, 2, 3,
2, 9, 1, 8, 9, 5, 9, 8, 3, 3, 0, 3, 4, 8, 0, 2, 8, 8, 2, 4, 7, 2, 2, 6,
9, 7, 0, 4, 3, 9, 5, 6, 3, 4, 7, 7, 6, 6, 0, 4, 4, 7, 7, 1, 0, 0, 1, 6,
3, 9, 0, 3, 4, 9, 0, 2, 8, 1, 6, 0, 0, 4, 2, 9, 0, 0, 7, 9, 0, 0, 1, 7,
9, 4, 3, 0, 1, 2, 4, 9, 1, 6, 3, 5, 0, 9, 5, 8, 4, 3, 8, 1, 7, 7, 8, 3,
6, 3, 5, 0, 4, 9, 8, 5, 5, 9, 5, 9, 7, 0, 6, 9, 4, 1, 4, 6, 5, 4, 9, 0,
9, 2, 0, 1, 5, 5, 0, 1, 0, 7, 3, 2, 2, 9, 6, 4, 2, 2, 4, 9, 1, 9, 9, 9,
6, 9, 6, 5, 2, 8, 5, 9, 1, 8, 5, 7, 1, 1, 9, 5, 4, 1, 0, 4, 8, 1, 3, 4,
0, 5, 0, 9, 8, 4, 3, 5, 3, 1, 5, 5, 8, 1, 5, 3, 3, 9, 7, 1, 1, 2, 0, 1,
4, 8, 8, 0, 0, 2, 4, 6, 8, 8, 9, 9, 5, 1, 8, 4, 1, 6, 6, 4, 4, 2, 4, 6,
1, 3, 8, 7, 1, 7, 6, 7, 9, 2, 1, 7, 3, 3, 0, 6, 3, 0, 2, 2, 3, 5, 0, 7,
2, 3, 3, 2, 3, 1, 4, 9, 4, 2, 1, 0, 2, 0, 0, 8, 8, 1, 2, 9, 7, 5, 5, 8,
8, 9, 1, 1, 1, 9, 9, 2, 1, 0, 6, 0, 0, 0, 5, 6, 6, 2, 7, 6, 2, 5, 8, 3,
9, 2, 1, 5, 0, 5, 1, 8, 6, 2, 7, 0, 4, 0, 5, 8, 2, 7, 8, 9, 6, 9, 8, 9,
6, 9, 5, 7, 9, 4, 6, 4, 4, 7, 2, 7, 5, 7, 2, 1, 8, 5, 0, 7, 6, 4, 7, 5,
7, 7, 0, 7, 1, 6, 7, 9, 7, 2, 9, 9, 2, 8, 3, 9, 2, 8, 9, 7, 4, 0, 3, 1,
1, 2, 1, 9, 4, 1, 5, 4, 3, 9, 0, 4, 9, 0, 9, 2, 6, 5, 1, 2, 8, 0, 5, 3,
0, 7, 8, 7, 3, 2, 3, 7, 2, 5, 5, 4, 4, 6, 9, 9, 8, 4, 8, 1, 8, 3, 3, 5,
8, 4, 5, 5, 5, 5, 0, 7, 3, 6, 9, 1, 5, 8, 2, 0, 0, 0, 1, 5, 5, 9, 6, 3,
6, 9, 1, 7, 3, 5, 7, 4, 7, 7, 2, 6, 5, 2, 1, 4, 5, 6, 8, 4, 1, 5, 7, 0,
2, 9, 3, 5, 4, 7, 6, 5, 8, 9, 6, 6, 8, 2, 7, 9, 7, 9, 7, 8, 5, 8, 6, 1,
5, 4, 1, 8, 5, 1, 5, 3, 7, 0, 7, 7, 9, 9, 8, 3, 9, 4, 0, 4, 4, 6, 9, 1,
2, 5, 6, 7, 0, 6, 8, 3, 2, 1, 2, 1, 5, 6, 6, 6, 5, 2, 1, 8, 8, 4, 3, 1,
1, 9, 2, 1, 0, 6, 3, 4, 6, 0, 8, 2, 5, 7, 4, 4, 2, 9, 0, 9, 4, 0, 8, 2])),
names=('seeds', 'labels'),
),
test_set=ItemSet(
items=(tensor([484, 372, 48, 254, 281, 626, 864, 986, 338, 66, 587, 865, 118, 452,
860, 92, 419, 833, 686, 356, 757, 375, 171, 201, 988, 887, 627, 931,
970, 876, 154, 458, 642, 236, 481, 601, 761, 951, 195, 116, 835, 693,
369, 136, 767, 852, 785, 722, 787, 937, 548, 238, 653, 54, 582, 547,
374, 580, 619, 300, 954, 310, 602, 442, 536, 996, 51, 546, 800, 921,
924, 382, 692, 781, 531, 784, 111, 142, 410, 918, 939, 364, 634, 578,
230, 562, 20, 165, 183, 968, 13, 615, 933, 137, 682, 134, 554, 468,
203, 780, 544, 803, 69, 494, 323, 412, 801, 521, 721, 49, 687, 299,
591, 271, 214, 24, 264, 482, 838, 347, 74, 689, 117, 925, 164, 964,
208, 943, 953, 897, 163, 199, 873, 843, 166, 431, 88, 22, 879, 520,
43, 597, 325, 659, 28, 303, 161, 915, 683, 222, 476, 462, 640, 96,
644, 903, 297, 917, 68, 502, 216, 453, 825, 637, 983, 57, 646, 249,
169, 652, 763, 331, 120, 559, 247, 577, 895, 814, 487, 729, 346, 444,
714, 884, 229, 280, 688, 680, 499, 793, 581, 316, 95, 454, 530, 832,
315, 981, 505, 628], dtype=torch.int32), tensor([8, 6, 8, 0, 9, 6, 1, 4, 6, 2, 6, 4, 1, 1, 4, 0, 2, 9, 0, 3, 4, 6, 1, 1,
8, 4, 3, 2, 7, 5, 9, 7, 8, 3, 3, 2, 9, 7, 5, 2, 7, 0, 4, 7, 1, 6, 2, 2,
9, 3, 0, 5, 4, 6, 6, 1, 6, 8, 0, 7, 4, 8, 5, 8, 3, 1, 8, 8, 5, 5, 8, 4,
5, 3, 5, 7, 4, 7, 3, 8, 1, 5, 0, 4, 7, 9, 2, 1, 1, 2, 0, 6, 1, 4, 3, 5,
8, 9, 9, 7, 0, 8, 5, 5, 2, 0, 3, 3, 5, 9, 5, 3, 5, 2, 2, 1, 6, 0, 3, 1,
6, 2, 9, 0, 4, 7, 4, 0, 1, 7, 8, 6, 4, 1, 2, 2, 8, 4, 7, 1, 3, 6, 7, 4,
3, 3, 9, 2, 8, 3, 3, 5, 7, 0, 7, 4, 8, 2, 5, 6, 6, 3, 6, 2, 4, 1, 6, 3,
1, 2, 3, 1, 0, 9, 2, 7, 7, 9, 6, 4, 1, 4, 2, 0, 9, 3, 6, 3, 6, 1, 5, 1,
1, 3, 5, 5, 2, 5, 6, 6])),
names=('seeds', 'labels'),
),
metadata={'name': 'node_classification', 'num_classes': 10},)
Loaded link prediction task: OnDiskTask(validation_set=ItemSet(
items=(tensor([[771, 495],
[715, 87],
[590, 983],
...,
[ 55, 17],
[ 55, 659],
[ 55, 904]], dtype=torch.int32), tensor([1., 1., 1., ..., 0., 0., 0.], dtype=torch.float64), tensor([ 0, 1, 2, ..., 1999, 1999, 1999])),
names=('seeds', 'labels', 'indexes'),
),
train_set=ItemSet(
items=(tensor([[734, 698],
[492, 101],
[141, 102],
...,
[447, 161],
[543, 184],
[346, 301]], dtype=torch.int32),),
names=('seeds',),
),
test_set=ItemSet(
items=(tensor([[166, 289],
[697, 620],
[976, 534],
...,
[841, 267],
[841, 373],
[841, 500]], dtype=torch.int32), tensor([1., 1., 1., ..., 0., 0., 0.], dtype=torch.float64), tensor([ 0, 1, 2, ..., 1999, 1999, 1999])),
names=('seeds', 'labels', 'indexes'),
),
metadata={'name': 'link_prediction', 'num_classes': 10},)
/dgl/python/dgl/graphbolt/impl/ondisk_dataset.py:463: GBWarning: Edge feature is stored, but edge IDs are not saved.
gb_warning("Edge feature is stored, but edge IDs are not saved.")
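As a quick sanity check on the loaded dataset, we can read back a few feature rows. This is a sketch that assumes the FeatureStore.read(domain, type_name, feature_name, ids) signature; for a homogeneous graph the type name is None.
# Read the features of nodes 0-2 from the loaded feature store.
node_ids = torch.tensor([0, 1, 2])
node_feat = dataset.feature.read("node", None, "feat_0", node_ids)
print(node_feat.shape)  # expected: torch.Size([3, 5])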