PyOpenCL provides a built-in class, GenericScanKernel, that computes the prefix sum described earlier.
class pyopencl.scan.GenericScanKernel(
    ctx,
    dtype,
    arguments,
    input_expr,
    scan_expr,
    neutral,
    output_statement,
    is_segment_start_expr=None,
    input_fetch_exprs=[],
    index_dtype=<class 'numpy.int32'>,
    name_prefix='scan',
    options=[],
    preamble='',
    devices=None)

For reference, an example implementation is shown below.
CLScanKernelTest.py.
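import numpy as np
import pyopencl as cl
import pyopencl.array as clarr
from pyopencl.scan import GenericScanKernel

# Create a context on the GPU devices of the first platform, plus a command queue.
ctx = cl.Context(cl.get_platforms()[0].get_devices(cl.device_type.GPU))
queue = cl.CommandQueue(ctx)

# Host input data, copied to the device as a pyopencl array.
a = np.array((1, 5, 352, 2, 2, 2, 3, 2)).astype(np.int32)
print(a)
a_mem = clarr.to_device(queue, a)

# Inclusive prefix sum, written back in place to ary.
scan_kernel = GenericScanKernel(
    ctx, np.int32,
    arguments="__global int* ary",
    input_expr="ary[i]",   # value fed into the scan for index i
    scan_expr="a+b",       # associative operator combining two partial results
    neutral="0",           # identity element of scan_expr
    output_statement="""
        ary[i] = item;
        """
)

scan_kernel(a_mem)
print(a_mem.get())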
The program above produces the following output.
/Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5 /Users/komatsu/PycharmProjects/MyPythonProject/CLScanKernelTest.py
[ 1 5 352 2 2 2 3 2]
[ 1 6 358 360 362 364 367 369]
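To sanity-check the result without an OpenCL device, the same scan can be reproduced on the CPU. The following is a minimal sketch using only the Python standard library. The helper names inclusive_scan and exclusive_scan are hypothetical and are not part of PyOpenCL. Their scan_op and neutral parameters are meant to play the roles of scan_expr="a+b" and neutral="0" in the kernel above. It also shows the exclusive variant for comparison, in which each output element combines only the elements before it.

```python
from itertools import accumulate
import operator


def inclusive_scan(values, scan_op=operator.add):
    """CPU reference: out[i] = values[0] op values[1] op ... op values[i]."""
    return list(accumulate(values, scan_op))


def exclusive_scan(values, scan_op=operator.add, neutral=0):
    """CPU reference: out[i] combines only the elements before index i."""
    out = []
    running = neutral  # identity element, so out[0] == neutral
    for v in values:
        out.append(running)
        running = scan_op(running, v)
    return out


data = [1, 5, 352, 2, 2, 2, 3, 2]  # same input as the listing above

print(inclusive_scan(data))  # [1, 6, 358, 360, 362, 364, 367, 369]
print(exclusive_scan(data))  # [0, 1, 6, 358, 360, 362, 364, 367]
```

The inclusive result matches the second array printed by the program, which suggests that the kernel as configured there performs an inclusive scan.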
Copyright 2018-2019, by Masaki Komatsu