Thread Execution:
A warp of 32 threads physically runs together on an SM
All threads in a warp share one instruction stream (SIMT)
4 cycles to issue 1 warp instruction
Warps are dynamically scheduled by the SM
• Executed when operands are ready
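Because a warp shares one instruction stream, the practical consequence is that per-thread branches split a warp while per-warp branches do not. A minimal sketch (kernel and variable names are ours, not from the slides):

```cuda
// Sketch: branching on a per-warp value keeps the shared instruction
// stream of each 32-thread warp intact (no divergence).
__global__ void warpUniformBranch(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int warpId = threadIdx.x / warpSize;   // warpSize is 32

    // All 32 threads of a warp take the same side of this branch.
    if (warpId % 2 == 0)
        out[i] = in[i] * 2.0f;
    else
        out[i] = in[i] + 1.0f;
}
```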
Block IDs and Thread IDs
Why IDs with different dimensions?
----To simplify memory addressing when processing multidimensional data
Image processing
Solving PDEs on volumes
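For the image-processing case above, a 2D grid/block layout maps directly onto pixel coordinates, which is exactly the addressing simplification meant here. A hedged sketch (kernel name and operation are illustrative):

```cuda
// Sketch: 2D block and thread IDs become pixel coordinates directly.
__global__ void invertImage(unsigned char *img, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x < width && y < height)
        img[y * width + x] = 255 - img[y * width + x];
}
```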
Threads Hierarchy
thread
Thread block
Cooperative Thread Array (CTA)
Max 512 threads per block
Grid
Share data in global memory
Dynamically scheduled at runtime
Kernel
The code that each thread executes
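The whole hierarchy fits in a few lines: a kernel runs per thread, threads are grouped into blocks (CTAs), and blocks form a grid. An illustrative sketch, with names and sizes assumed:

```cuda
#include <cuda_runtime.h>

// Kernel: the code each thread runs.
__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) data[i] += 1.0f;
}

int main(void)
{
    const int n = 1024;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // Grid of 4 blocks, 256 threads each -- within the 512-thread limit.
    addOne<<<4, 256>>>(d, n);
    cudaDeviceSynchronize();

    cudaFree(d);
    return 0;
}
```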
CUDA Memory Model
Global memory
Contents visible to all threads
Shared memory
Shared by all threads in one block
Constant memory
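The three memory spaces listed above can be seen in one small kernel. A sketch with assumed names (the block is taken to have at most 256 threads; `scale` would be set from the host with cudaMemcpyToSymbol):

```cuda
__constant__ float scale;          // constant memory, visible to whole grid

__global__ void scaleArray(const float *in, float *out, int n)
{
    __shared__ float tile[256];    // shared memory, one copy per block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = in[i]; // stage global memory into shared memory
        __syncthreads();           // make the tile visible block-wide
        out[i] = tile[threadIdx.x] * scale;  // apply the constant
    }
}
```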
CUDA extends C
Declaration specs
__global__, __device__, __shared__, __local__, __constant__
Runtime API
Memory, symbol, execution management
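All three runtime-API duties appear in a short host program. A hedged sketch (kernel and names are ours): cudaMalloc/cudaFree for memory, cudaMemcpyToSymbol for symbols, and the `<<<>>>` launch for execution:

```cuda
#include <cuda_runtime.h>

__constant__ int threshold;        // a symbol managed via the runtime API

__global__ void clampToThreshold(int *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && v[i] > threshold) v[i] = threshold;
}

int main(void)
{
    const int n = 256, t = 10;
    int *d;
    cudaMalloc(&d, n * sizeof(int));                   // memory management
    cudaMemset(d, 0, n * sizeof(int));
    cudaMemcpyToSymbol(threshold, &t, sizeof(int));    // symbol management
    clampToThreshold<<<1, n>>>(d, n);                  // execution management
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```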
Type & Scope
__device__ : global memory / grid scope
__device__ __constant__ : constant memory / grid scope
Automatic variables without any qualifier reside in registers
Except arrays, which reside in local memory
Variables:
Built-in Vector Types
int1, int2, int3, int4, float1, float2, float3, float4,...
Defined by constructor functions of the form make_<type>
• int4 make_int4 (int x, int y, int z, int w)
• int4 iv = make_int4(1, 2, 3, 4);
– iv.x = 1, iv.y = 2, iv.z = 3, iv.w = 4
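The constructors assign components in x, y, z, w order and are usable on both host and device. A hedged host-side sketch:

```cuda
#include <vector_types.h>      // int4, float4, ...
#include <vector_functions.h>  // make_int4, make_float4, ...

int main(void)
{
    int4 iv = make_int4(1, 2, 3, 4);
    // iv.x == 1, iv.y == 2, iv.z == 3, iv.w == 4

    float4 fv = make_float4(0.5f, 1.5f, 2.5f, 3.5f);
    return (iv.x == 1 && iv.w == 4 && fv.y == 1.5f) ? 0 : 1;
}
```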
Built-in dim3 Type
Built-in variables: gridDim, blockDim (dim3); blockIdx, threadIdx (uint3)
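On the host, dim3 is used to configure a launch; unspecified components default to 1. A sketch with assumed image dimensions, using ceil-division so the grid covers every element:

```cuda
#include <cuda_runtime.h>
#include <assert.h>

int main(void)
{
    dim3 block(16, 16);        // 256 threads per block; block.z defaults to 1
    assert(block.z == 1);

    // Ceil-divide so the grid covers a 1920x1080 image completely.
    dim3 grid((1920 + block.x - 1) / block.x,
              (1080 + block.y - 1) / block.y);
    assert(grid.x == 120 && grid.y == 68 && grid.z == 1);

    // A launch kernel<<<grid, block>>>(...) would expose these values on
    // the device as gridDim and blockDim.
    return 0;
}
```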