cupy.RawModule

class cupy.RawModule(unicode code=None, *, unicode path=None, tuple options=(), unicode backend=u'nvrtc', bool translate_cucomplex=False, bool enable_cooperative_groups=False, name_expressions=None, bool jitify=False)

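User-defined custom module.

This class can be used either to compile raw CUDA source code or to load CUDA modules (*.cubin, *.ptx). It is useful when several CUDA kernels need to be retrieved from the same source.

In the former case, the CUDA source code is compiled when any method is called. In the latter case, an existing CUDA binary (*.cubin) or PTX file can be loaded by providing its path.

CUDA kernels in a RawModule can be retrieved by calling get_function(), which returns a RawKernel instance. (As with RawKernel, the generated binary is also cached.)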
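As a minimal sketch of the workflow described above, the following compiles one source string containing two kernels and retrieves each with get_function(). The kernel names (vec_add, vec_scale), the helpers grid_size and run_demo, and the launch configuration are illustrative assumptions, not part of the CuPy API. Running run_demo() requires a CUDA-capable GPU with CuPy installed.

```python
# Two kernels in one CUDA source. extern "C" keeps their names unmangled,
# so they can be looked up by plain name.
SOURCE = r'''
extern "C" {

__global__ void vec_add(const float* x, const float* y, float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) out[i] = x[i] + y[i];
}

__global__ void vec_scale(const float* x, float a, float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) out[i] = a * x[i];
}

}
'''


def grid_size(n, block=128):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + block - 1) // block


def run_demo(n=1000, block=128):
    # Imported here so the definitions above load on machines without CuPy.
    import cupy as cp

    mod = cp.RawModule(code=SOURCE)
    add = mod.get_function('vec_add')      # a RawKernel
    scale = mod.get_function('vec_scale')  # a second RawKernel, same module

    x = cp.arange(n, dtype=cp.float32)
    y = cp.ones(n, dtype=cp.float32)
    out = cp.empty_like(x)

    grid = (grid_size(n, block),)
    # Scalars are passed as explicitly typed NumPy scalars to match the
    # int/float parameters of the kernels.
    add(grid, (block,), (x, y, out, cp.int32(n)))
    assert cp.allclose(out, x + y)

    scale(grid, (block,), (x, cp.float32(2.0), out, cp.int32(n)))
    assert cp.allclose(out, 2 * x)
    return out
```

Keeping both kernels in one module means the source is written once and each kernel is fetched by name when needed.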

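Parameters:

code (str) – CUDA source code. Mutually exclusive with path.
path (str) – Path to a cubin/ptx file. Mutually exclusive with code.
options (tuple of str) – Compiler options passed to the backend (NVRTC or NVCC). For details, see https://docs.nvda.net.cn/cuda/nvrtc/index.html#group__options or https://docs.nvda.net.cn/cuda/cuda-compiler-driver-nvcc/index.html#command-option-description.
backend (str) – Either nvrtc or nvcc. Defaults to nvrtc.
translate_cucomplex (bool) – Whether the CUDA source includes the header cuComplex.h. If set to True, any code that uses functions from cuComplex.h is translated to its Thrust counterpart. Defaults to False.
enable_cooperative_groups (bool) – Whether to enable cooperative groups in the CUDA source. If set to True, the compile options are configured appropriately and the kernel is launched with cuLaunchCooperativeKernel so that cooperative groups can be used from the CUDA source. This feature is supported only in CUDA 9 or later.
name_expressions (sequence of str) – A sequence (e.g. a list) of strings referring to the names of C++ global/template kernels. For example, for a template kernel func1<T> and a non-template kernel func2, use name_expressions=['func1<int>', 'func1<double>', 'func2']. Each string in this sequence must then be passed, one at a time, to get_function() to retrieve the corresponding kernel.
jitify (bool) – Whether to use Jitify to assist NVRTC in compiling C++ kernels. Defaults to False.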
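To illustrate the options and backend parameters, here is a hedged sketch that passes a C++ standard flag and a preprocessor definition to NVRTC. The macro name FACTOR, the kernel scale, and the helpers make_options and run_demo are hypothetical names chosen for this example. The -std and -D flags are standard NVRTC options, but check the linked option reference for the toolkit version you use.

```python
SOURCE = r'''
extern "C" __global__ void scale(float* x, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) x[i] *= FACTOR;   // FACTOR comes from a -D compiler option
}
'''


def make_options(factor):
    """Build the options tuple: a C++ standard plus a macro definition."""
    return ('-std=c++14', f'-DFACTOR={float(factor)}f')


def run_demo(factor=2.0, n=256):
    # Imported here so the definitions above load on machines without CuPy.
    import cupy as cp

    mod = cp.RawModule(code=SOURCE,
                       options=make_options(factor),
                       backend='nvrtc')
    ker = mod.get_function('scale')

    x = cp.ones(n, dtype=cp.float32)
    ker(((n + 127) // 128,), (128,), (x, cp.int32(n)))
    assert bool((x == cp.float32(factor)).all())
    return x
```

Because options is a tuple of plain strings, it can be assembled programmatically, as make_options does here.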

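Note

Starting with CuPy v13.0.0, RawModule compiles with the C++11 standard (-std=c++11) by default if no standard is specified in options.

Note

Each kernel in a RawModule has its own independent function attributes.

Note

Before CuPy v8.0.0, compilation happened at initialization. Now it happens the first time any object (kernel or pointer) is retrieved from the module.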
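The deferred compilation described in the last note can be observed with a deliberately broken source. This is a sketch under assumptions: it relies on the NVRTC backend reporting the error as cupy.cuda.compiler.CompileException, and the names BROKEN_SOURCE and demo_lazy_compilation are made up for the example. A CUDA-capable GPU is required to run it.

```python
BROKEN_SOURCE = r'''
extern "C" __global__ void broken(float* x) {
    x[0] = undefined_symbol;   // deliberate error: identifier is not declared
}
'''


def demo_lazy_compilation():
    # Imported here so the definitions above load on machines without CuPy.
    import cupy as cp
    from cupy.cuda.compiler import CompileException

    # Construction is expected to succeed because nothing is compiled yet.
    mod = cp.RawModule(code=BROKEN_SOURCE)
    try:
        # The first retrieval triggers compilation, which should fail here.
        mod.get_function('broken')
    except CompileException:
        return True
    return False
```

If the assumption about the exception type holds, demo_lazy_compilation() returns True, showing that the error surfaces at retrieval time rather than at construction.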

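Methods

compile(self, log_stream=None)

Compile the current module.

In general, you do not have to call this method; kernels are compiled implicitly on the first call.

Parameters: log_stream (object) – Pass either sys.stdout or a file object to which the compiler output will be written. Defaults to None.

Note

Calling compile() resets the internal state of a RawKernel.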
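A small sketch of explicit compilation with a captured log follows. It assumes that an in-memory text stream (io.StringIO) is accepted as the "file object"; the documentation only names sys.stdout and file objects, so treat that as an assumption. The kernel noop and the helper compile_with_log are hypothetical names.

```python
import io

SOURCE = r'''
extern "C" __global__ void noop(float* x) {
    int unused;   // may provoke a compiler diagnostic, depending on options
    x[0] = x[0];
}
'''


def compile_with_log(source=SOURCE):
    # Imported here so the definitions above load on machines without CuPy.
    import cupy as cp

    log = io.StringIO()            # assumed to be an acceptable text stream
    mod = cp.RawModule(code=source)
    mod.compile(log_stream=log)    # compile now instead of on first retrieval
    return mod, log.getvalue()
```

The returned log text may well be empty, for instance when the compiler emits no diagnostics for the given source and options.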

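get_function(self, unicode name)

Retrieve a CUDA kernel from the module by name.

Parameters: name (str) – Name of the kernel function. For C++ global/template kernels, name refers to one of the name expressions specified when the current RawModule instance was initialized.
Returns: A RawKernel instance.
Return type: RawKernel

Note

The following example shows how to retrieve one of the specialized C++ template kernels: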

import cupy

code = r'''
template<typename T>
__global__ void func(T* in_arr) { /* do something */ }
'''

# one name expression per template specialization to instantiate
kers = ('func<int>', 'func<float>', 'func<double>')
mod = cupy.RawModule(code=code, options=('--std=c++11',),
                     name_expressions=kers)

# retrieve func<int>
ker_int = mod.get_function(kers[0])
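For a version that also launches the specializations, consider the sketch below. The template kernel fill, the dtype-to-C++-type table CTYPES, and the helpers name_expression and run_demo are illustrative assumptions rather than CuPy API; only RawModule, name_expressions, and get_function() come from this page. A CUDA-capable GPU is required to run run_demo().

```python
SOURCE = r'''
template<typename T>
__global__ void fill(T* out, T value, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) out[i] = value;
}
'''

# Map NumPy dtype names to the C++ type used in the name expression.
CTYPES = {'int32': 'int', 'float32': 'float', 'float64': 'double'}


def name_expression(dtype_name):
    """Name expression for the fill<T> specialization of a given dtype."""
    return f'fill<{CTYPES[dtype_name]}>'


NAME_EXPRESSIONS = tuple(name_expression(d) for d in CTYPES)


def run_demo(n=256, block=128):
    # Imported here so the definitions above load on machines without CuPy.
    import cupy as cp

    mod = cp.RawModule(code=SOURCE, name_expressions=NAME_EXPRESSIONS)
    results = {}
    for dtype_name in CTYPES:
        # The same string given in name_expressions is passed back here.
        ker = mod.get_function(name_expression(dtype_name))
        out = cp.zeros(n, dtype=dtype_name)
        value = out.dtype.type(7)   # scalar with the matching element type
        ker(((n + block - 1) // block,), (block,), (out, value, cp.int32(n)))
        assert bool((out == 7).all())
        results[dtype_name] = out
    return results
```

Generating the name expressions from one table keeps the strings passed at construction and at retrieval identical, which is what get_function() requires.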

See also

nvrtcAddNameExpression and nvrtcGetLoweredName, from Accessing Lowered Names in the NVRTC documentation.

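get_global(self, name)

Retrieve a pointer to a global symbol from the module by name.

Parameters: name (str) – Name of the global symbol.
Returns: A handle to the global symbol.
Return type: MemoryPointer

Note

This method can be used to access, for example, constant memory: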

import cupy as cp

# mod is an existing cupy.RawModule whose source declares:
# __constant__ float arr[10];
# get a pointer to "arr"...
memptr = mod.get_global("arr")
# ...wrap it using cupy.ndarray with a known shape
arr_ndarray = cp.ndarray((10,), cp.float32, memptr)
# ...perform data transfer to initialize it
arr_ndarray[...] = cp.random.random((10,), dtype=cp.float32)
# ...and arr is ready to be accessed by RawKernels
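The fragment above can be put into a complete program along the following lines. This is a sketch: the kernel add_const and the helper run_demo are hypothetical, and the symbol is wrapped in extern "C" on the assumption that this keeps the name arr unmangled for lookup. A CUDA-capable GPU is required to run run_demo().

```python
N = 10

SOURCE = r'''
extern "C" {

__constant__ float arr[10];

__global__ void add_const(float* out) {
    int i = threadIdx.x;
    if (i < 10) out[i] += arr[i];
}

}
'''


def run_demo():
    # Imported here so the definitions above load on machines without CuPy.
    import cupy as cp

    mod = cp.RawModule(code=SOURCE)

    # Wrap the constant-memory symbol in an ndarray and initialize it.
    memptr = mod.get_global('arr')
    const_arr = cp.ndarray((N,), cp.float32, memptr)
    values = cp.arange(N, dtype=cp.float32)
    const_arr[...] = values

    # A kernel from the same module now reads the initialized constants.
    ker = mod.get_function('add_const')
    out = cp.zeros(N, dtype=cp.float32)
    ker((1,), (N,), (out,))
    assert cp.allclose(out, values)
    return out
```

The shape and dtype given to cp.ndarray must match the declaration in the CUDA source, since the pointer itself carries no size or type information.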