PyOpenCL provides a built-in class that computes the prefix sum (scan) described earlier.
class pyopencl.scan.GenericScanKernel(ctx, dtype, arguments, input_expr, scan_expr, neutral, output_statement, is_segment_start_expr=None, input_fetch_exprs=[], index_dtype=numpy.int32, name_prefix='scan', options=[], preamble='', devices=None)
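To make the roles of the main parameters concrete, the following is a minimal sequential sketch in plain Python. It is not how PyOpenCL executes the scan (the real kernel runs in parallel on the device); it only models the semantics as we understand them: `input_expr` yields the value at index `i`, `scan_expr` combines two operands `a` and `b`, `neutral` is the identity element of that operation, and `output_statement` receives the running result as `item`. The helper name `reference_scan` is hypothetical and not part of PyOpenCL.

```python
def reference_scan(values, scan_op, neutral):
    """Sequential model of an inclusive scan (hypothetical helper, not PyOpenCL API)."""
    out = []
    acc = neutral                # starts from the neutral element
    for v in values:             # v models input_expr evaluated at index i
        acc = scan_op(acc, v)    # scan_op models scan_expr applied to a and b
        out.append(acc)          # the stored value models `item` in output_statement
    return out


data = [1, 5, 352, 2, 2, 2, 3, 2]
result = reference_scan(data, lambda a, b: a + b, 0)
print(result)
```

Because the operation is passed in as a function, the same sketch also models other associative operations, for example a running maximum with `max` as `scan_op` and a sufficiently small value as `neutral`.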
For reference, an example implementation is shown below.
CLScanKernelTest.py.
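import numpy as np
import pyopencl as cl
import pyopencl.array as clarr
# GenericScanKernel is defined in pyopencl.scan, matching the class signature above.
from pyopencl.scan import GenericScanKernel

# Create a context on the GPU devices of the first platform, and a command queue.
ctx = cl.Context(cl.get_platforms()[0].get_devices(cl.device_type.GPU))
queue = cl.CommandQueue(ctx)

# Build the host input and copy it to the device.
a = np.array((1, 5, 352, 2, 2, 2, 3, 2)).astype(np.int32)
print(a)
a_mem = clarr.to_device(queue, a)

# Inclusive prefix sum; each result is written back to the same array in place.
scan_kernel = GenericScanKernel(
    ctx, np.int32,
    arguments="__global int* ary",
    input_expr="ary[i]",
    scan_expr="a+b", neutral="0",
    output_statement="ary[i] = item;")

scan_kernel(a_mem)
# Copy the result back to the host and print it.
print(a_mem.get())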
The program above prints the input array followed by its inclusive prefix sum:
/Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5 /Users/komatsu/PycharmProjects/MyPythonProject/CLScanKernelTest.py
[ 1 5 352 2 2 2 3 2]
[ 1 6 358 360 362 364 367 369]
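The printed result can be cross-checked on the CPU without an OpenCL device. The sketch below uses `itertools.accumulate` from the Python standard library, which computes a running sum by default; the variable names are illustrative only. With NumPy, `np.cumsum` on the input array should give the same values.

```python
from itertools import accumulate

# Input and output copied from the listing and its printed result.
host_input = [1, 5, 352, 2, 2, 2, 3, 2]
printed_output = [1, 6, 358, 360, 362, 364, 367, 369]

# accumulate() yields the inclusive running sum of its input.
expected = list(accumulate(host_input))
matches = expected == printed_output
print("matches:", matches)
```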
Copyright 2018-2019, by Masaki Komatsu