nVidia CUDA API (Part 1)


Author: heresy (Live Space)  Editor: 覃里  2009-11-27 16:46  Source: heresy

　　Directive to specify how a kernel is executed

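This directive sets the parameters with which a kernel executes on the device. Mainly, it states how large the grid of blocks is (that is, how many blocks the kernel uses) and how large each block is (that is, how many threads each block contains). Every call to a __global__ function must therefore specify an execution configuration. It is written by inserting "<<< Dg, Db, Ns >>>" between the function name and the argument list.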

The three values mean the following:

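Dg is of type dim3 and specifies the dimensions and size of the grid. dim3 belongs to the common runtime component and is a simple data structure. Dg.x * Dg.y is the number of blocks launched in the grid. In the CUDA versions this article describes, grids are two-dimensional and Dg.z is unused. Later versions also allow a third grid dimension, in which case the count is Dg.x * Dg.y * Dg.z.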

Db is of type dim3 and specifies the dimensions and size of each block. Db.x * Db.y * Db.z is the number of threads in each block.

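Ns is of type size_t and specifies the number of bytes of shared memory that are dynamically allocated per block, in addition to any statically allocated shared memory. It is optional and defaults to 0.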
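Ns is a byte count. A launch that gives each thread its own slot in dynamically allocated shared memory therefore typically passes the number of elements times the element size.

The helper below is a minimal sketch under that assumption. Note the following:

- dynamicSharedBytes is a hypothetical name and is not part of the CUDA API.
- It is written in plain C++, so it compiles with an ordinary host compiler.
- On the device side, such a buffer would be declared as an extern __shared__ array.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the CUDA API): computes an Ns value,
// i.e. the number of BYTES of dynamic shared memory a block needs when
// each of its threads stores `elemsPerThread` values of type T.
template <typename T>
constexpr std::size_t dynamicSharedBytes(unsigned threadsPerBlock,
                                         unsigned elemsPerThread = 1) {
    // Ns counts bytes, not elements or bits, so scale by sizeof(T).
    return static_cast<std::size_t>(threadsPerBlock) * elemsPerThread * sizeof(T);
}
```

Consider a block of 256 threads holding one float each. Here dynamicSharedBytes<float>(256) evaluates to 256 * sizeof(float), which is 1024 bytes where float is 4 bytes. That value would be passed as the third launch parameter, as in Func<<< Dg, Db, dynamicSharedBytes<float>(256) >>>(parameter);.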

Suppose a function is declared as "__global__ void Func(float* parameter);". It is then called as "Func<<< Dg, Db, Ns >>>(parameter);". The total number of threads created on the device is (Dg.x * Dg.y) * (Db.x * Db.y * Db.z).
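The thread-count formula can be checked with a small host-side sketch in plain C++, which compiles without nvcc. It rests on these assumptions:

- Dim3 is a hypothetical stand-in for CUDA's dim3. Its unspecified components default to 1, as dim3's do.
- gridFor is a made-up helper showing the common round-up division for choosing Dg.x when n elements must be covered by blocks of Db.x threads.

```cpp
#include <cstddef>

// Hypothetical stand-in for CUDA's dim3: unspecified components default to 1.
struct Dim3 {
    unsigned x, y, z;
    constexpr Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Total threads created by Func<<< Dg, Db >>>:
// (Dg.x * Dg.y) * (Db.x * Db.y * Db.z). Dg.z is ignored, matching the
// two-dimensional grids described above.
constexpr std::size_t totalThreads(Dim3 Dg, Dim3 Db) {
    return static_cast<std::size_t>(Dg.x) * Dg.y *
           (static_cast<std::size_t>(Db.x) * Db.y * Db.z);
}

// Hypothetical helper: smallest 1-D grid whose blocks of Db.x threads cover
// n elements (round-up integer division; requires Db.x > 0). Threads in the
// last block may have no element to work on.
constexpr Dim3 gridFor(std::size_t n, Dim3 Db) {
    return Dim3(static_cast<unsigned>((n + Db.x - 1) / Db.x));
}
```

Take 1000 elements and 256-thread blocks. gridFor gives Dg.x = 4, so 4 * 256 = 1024 threads are launched. The 24 surplus threads must skip their work, usually through an if (i < n) guard inside the kernel.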

　　Built-in variables

The following built-in variables give the grid and block dimensions and the block and thread indices:

gridDim

Type dim3; holds the dimensions of the grid.

blockIdx

Type uint3; holds the index of the block within the grid.

blockDim

Type dim3; holds the dimensions of the block.

threadIdx

Type uint3; holds the index of the thread within the block.

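All of these variables are read-only: their values cannot be modified, and their addresses cannot be taken. Their main purpose is to let code running on the device determine which thread of which block it is. From that, the code knows which element of an array or texture it should access. In a sense, they play the role of the index that a loop increments as a counter.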
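The loop-counter analogy can be made concrete with a host-side emulation written in plain C++, not CUDA. It works as follows:

- The hypothetical function emulateLaunch walks every (blockIdx.x, threadIdx.x) pair of a one-dimensional launch with nested loops, one pair at a time.
- The hypothetical scaleAll uses it so that each emulated thread computes the common global index blockIdx.x * blockDim.x + threadIdx.x and scales the array element at that index.

A real GPU runs these threads concurrently and in no guaranteed order, so the sequential loops only illustrate the indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side emulation of a 1-D launch Kernel<<< gridDimX, blockDimX >>>.
// The loop counters play the roles of blockIdx.x and threadIdx.x; blockDimX is
// passed along as blockDim.x. A GPU would run these threads in parallel.
template <typename Body>
void emulateLaunch(unsigned gridDimX, unsigned blockDimX, Body body) {
    for (unsigned blockIdxX = 0; blockIdxX < gridDimX; ++blockIdxX)
        for (unsigned threadIdxX = 0; threadIdxX < blockDimX; ++threadIdxX)
            body(blockIdxX, blockDimX, threadIdxX);
}

// Hypothetical kernel-like routine: each emulated thread derives its global
// index from the built-in-variable values and scales exactly one element.
std::vector<float> scaleAll(std::vector<float> data, float factor, unsigned blockDimX) {
    assert(blockDimX > 0 && "blockDimX must be positive");
    const std::size_t n = data.size();
    // Round up so every element is covered by some thread.
    const unsigned gridDimX = static_cast<unsigned>((n + blockDimX - 1) / blockDimX);
    emulateLaunch(gridDimX, blockDimX,
                  [&](unsigned bIdx, unsigned bDim, unsigned tIdx) {
                      const std::size_t i = static_cast<std::size_t>(bIdx) * bDim + tIdx;
                      if (i < n) data[i] *= factor;  // last block may overhang the array
                  });
    return data;
}
```

With five elements and two threads per block, the emulation launches three blocks, which is six threads. The sixth thread's index is 5, so the guard stops it from touching the array.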

That roughly covers the language extensions. For the runtime library, see "nVidia CUDA API (Part 2)".
