DiskBasedFeature

class dgl.graphbolt.DiskBasedFeature(path: str, metadata: Dict | None = None, num_threads=None)

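Bases: Feature

A wrapper of a disk-based feature.

Initializes a disk-based feature fetcher from a numpy file. Note that you can use gb.numpy_save_aligned instead of np.save to potentially get better performance.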

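Parameters:

path (string) – Path to the numpy feature file. Note that the numpy array should have more than one dimension.
metadata (Dict, optional) – The metadata of the feature.
num_threads (int, optional) – The number of threads driving the io_uring queues.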
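To make the idea of a disk-based fetcher concrete, the sketch below reads only the requested rows of a 2-D `.npy` file by seeking to each row's byte offset, instead of loading the whole array into memory. It is a simplified stand-in and not how DiskBasedFeature works internally (which drives io_uring queues): it assumes a format-1.0, C-ordered, little-endian int64 file with exactly two dimensions, and `write_npy_int64` and `read_rows` are hypothetical helpers written only for illustration.

```python
import ast
import os
import struct
import tempfile

NPY_MAGIC = b"\x93NUMPY"


def write_npy_int64(path, rows):
    """Hypothetical helper: write a 2-D list of ints as a '<i8' .npy file (format 1.0)."""
    n, d = len(rows), len(rows[0])
    header = "{'descr': '<i8', 'fortran_order': False, 'shape': (%d, %d), }" % (n, d)
    # Pad with spaces and end with a newline so that the preamble
    # (6-byte magic + 2-byte version + 2-byte length) plus header is a multiple of 64.
    preamble_len = len(NPY_MAGIC) + 2 + 2
    header += " " * (-(preamble_len + len(header) + 1) % 64) + "\n"
    with open(path, "wb") as f:
        f.write(NPY_MAGIC + bytes([1, 0]))
        f.write(struct.pack("<H", len(header)))
        f.write(header.encode("latin1"))
        flat = [v for row in rows for v in row]
        f.write(struct.pack("<%dq" % len(flat), *flat))


def read_rows(path, ids):
    """Hypothetical helper: read only the rows listed in `ids`, one seek + read per row."""
    with open(path, "rb") as f:
        if f.read(len(NPY_MAGIC)) != NPY_MAGIC:
            raise ValueError("not a .npy file")
        major, _minor = f.read(2)
        if major != 1:
            raise ValueError("this sketch only handles .npy format version 1.0")
        (header_len,) = struct.unpack("<H", f.read(2))
        header = ast.literal_eval(f.read(header_len).decode("latin1"))
        if header["descr"] != "<i8" or header["fortran_order"]:
            raise ValueError("this sketch only handles C-ordered int64 data")
        num_rows, dim = header["shape"]  # this sketch assumes exactly 2-D data
        data_offset = 10 + header_len  # magic + version + length field + header
        row_bytes = dim * 8  # each int64 element is 8 bytes
        out = []
        for i in ids:
            if not 0 <= i < num_rows:
                raise IndexError(i)
            f.seek(data_offset + i * row_bytes)
            out.append(list(struct.unpack("<%dq" % dim, f.read(row_bytes))))
        return out


# Mirror the (2, 5) array used in the example on this page.
path = os.path.join(tempfile.mkdtemp(), "feat.npy")
write_npy_int64(path, [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
print(read_rows(path, [0]))  # [[0, 1, 2, 3, 4]]
```

Only the bytes of the requested rows are read from disk, which is the property that makes a disk-based feature useful when the full array does not fit in memory.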

Examples

>>> import numpy as np
>>> import torch
>>> from dgl import graphbolt as gb
>>> torch_feat = torch.arange(10).reshape(2, -1)
>>> pth = "path/to/feat.npy"
>>> np.save(pth, torch_feat.numpy())
>>> feature = gb.DiskBasedFeature(pth)
>>> feature.read(torch.tensor([0]))
tensor([[0, 1, 2, 3, 4]])
>>> feature.size()
torch.Size([5])

count()

Get the count of the features.

Returns: The count of the features.
Return type: int

metadata()[source]: 获取特征的元数据。:returns: 特征的元数据。:rtype: Dict

pin_memory_()

Placeholder DiskBasedFeature pin_memory_ implementation. It is a no-op.

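read(ids: Tensor | None = None)

Read the feature by index. The returned tensor will be on CPU.

Parameters: ids (torch.Tensor) – The index of the feature. Only the specified indices of the feature are read.
Returns: The read feature.
Return type: torch.Tensor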

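read_async(ids: Tensor)

Read the feature by index asynchronously.

Parameters: ids (torch.Tensor) – The index of the feature. Only the specified indices of the feature are read.
Returns: The returned generator object returns a future on its read_async_num_stages(ids.device)-th invocation. The result can be accessed by calling .wait() on the returned future object. Calling .wait() more than once is undefined behavior.
Return type: A generator object.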

Examples

>>> import torch
>>> import dgl.graphbolt as gb
>>> feature = gb.Feature(...)
>>> ids = torch.tensor([0, 2])
>>> for stage, future in enumerate(feature.read_async(ids)):
...     pass
>>> assert stage + 1 == feature.read_async_num_stages(ids.device)
>>> result = future.wait()  # result contains the read values.
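The staged contract described above can be mimicked in plain Python. In this sketch, `ToyAsyncFeature` and `_Waitable` are hypothetical names, a thread pool stands in for the real asynchronous I/O, and the stage count is fixed at two regardless of device. It only illustrates the shape of the protocol: intermediate `yield`s produce no value, and the last one produces an object whose `wait()` returns the read rows.

```python
from concurrent.futures import ThreadPoolExecutor


class _Waitable:
    """Hypothetical wrapper exposing a single-use wait() over a concurrent.futures.Future."""

    def __init__(self, future):
        self._future = future
        self._waited = False

    def wait(self):
        # The documented contract leaves a second wait() undefined;
        # this sketch chooses to raise instead.
        if self._waited:
            raise RuntimeError("wait() may only be called once")
        self._waited = True
        return self._future.result()


class ToyAsyncFeature:
    """Hypothetical in-memory feature following the read_async contract."""

    def __init__(self, rows):
        self._rows = rows
        self._pool = ThreadPoolExecutor(max_workers=1)

    def read_async_num_stages(self, ids_device=None):
        # A real implementation may return a different count per device.
        return 2

    def read_async(self, ids):
        ids = list(ids)
        # Stage 1: start the read in the background and yield control.
        future = self._pool.submit(lambda: [self._rows[i] for i in ids])
        yield None
        # Stage 2 (last): hand back the future-like result holder.
        yield _Waitable(future)


feature = ToyAsyncFeature([[0, 1], [2, 3], [4, 5]])
ids = [0, 2]
for stage, future in enumerate(feature.read_async(ids)):
    pass  # other work could be overlapped here between stages
assert stage + 1 == feature.read_async_num_stages()
result = future.wait()
print(result)  # [[0, 1], [4, 5]]
```

Splitting the read into stages lets a caller interleave other work (for example, issuing reads for the next batch) between the `yield`s instead of blocking on I/O.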

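read_async_num_stages(ids_device: device)

The number of stages of the read_async operation. See the read_async function for directions on its use. This function is required to return the number of yield operations when read_async is used with a tensor residing on ids_device.

Parameters: ids_device (torch.device) – The device of the ids parameter passed into read_async.
Returns: The number of stages of the read_async operation.
Return type: int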

read_into_memory() → TorchBasedFeature

Change the disk-based feature into a torch-based feature (TorchBasedFeature) by reading it into memory.

size()

Get the size of the feature.

Returns: The size of the feature.
Return type: torch.Size
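In the class example, a stored array of shape (2, 5) reports torch.Size([5]) from size(), so size() describes a single feature row rather than the whole array. The pure-Python sketch below spells out how count() and size() would split a stored shape; `count_and_size` is a hypothetical helper, and treating count() as the first dimension is an assumption inferred from that example rather than something this page states.

```python
def count_and_size(shape):
    """Hypothetical helper: split a stored array shape into (count, size).

    Assumes count() is the number of rows (first dimension) and size() is
    the per-row shape (all remaining dimensions), matching the
    (2, 5) -> torch.Size([5]) example.
    """
    if len(shape) < 2:
        raise ValueError("DiskBasedFeature expects more than one dimension")
    return shape[0], tuple(shape[1:])


print(count_and_size((2, 5)))       # (2, (5,))
print(count_and_size((100, 3, 4)))  # (100, (3, 4))
```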

to(_)

Placeholder DiskBasedFeature to implementation. It is a no-op.

update(value: Tensor, ids: Tensor | None = None)

Updating is not currently supported for disk-based features.