
CUDA Programming Notes 2: Programming Model

  Thread Execution

  A warp of 32 threads runs physically on an SM (see the warp-index sketch below)

  All threads in a warp share the same instruction

   4 cycles per warp instruction

    Warps are dynamically scheduled by the SM

  • A warp executes when its operands are ready
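
A minimal sketch of how a thread can recover its warp and lane within the block from threadIdx and the built-in warpSize; the kernel name warp_id_demo, the output array, and the 1-D launch are illustrative assumptions, not part of the original notes.

__global__ void warp_id_demo(int *warp_of_thread)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index (1-D launch assumed)
    int lane = threadIdx.x % warpSize;                  // position inside the warp (0..31)
    int warp = threadIdx.x / warpSize;                  // warp index within the block
    warp_of_thread[tid] = warp * 100 + lane;            // record which warp/lane ran this thread
}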

  Block IDs and Thread IDs

  Why do block and thread IDs have multiple dimensions?

  ----To simplify memory addressing when processing multidimensional data, for example (see the 2-D indexing sketch below):

   Image processing

    Solving PDEs on volumes
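
A minimal sketch of 2-D indexing for image processing; the kernel name brighten, the 8-bit pixel format, and the launch shape are assumptions for illustration.

__global__ void brighten(const unsigned char *in, unsigned char *out,
                         int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;    // pixel column, straight from the 2-D IDs
    int y = blockIdx.y * blockDim.y + threadIdx.y;    // pixel row
    if (x < width && y < height) {
        int idx = y * width + x;                      // row-major address, no extra index bookkeeping
        out[idx] = (unsigned char)min(in[idx] + 32, 255);   // simple brightness boost
    }
}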

  Thread Hierarchy

  Thread

  Thread block

    Also called a Cooperative Thread Array (CTA)

     Max 512 threads per block

  Grid

    Blocks in a grid share data through global memory

     Blocks are dynamically scheduled onto SMs at runtime

  Kernel

    The function that runs on the device; every thread executes the same kernel code (see the sketch below)
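
A minimal sketch of the hierarchy, assuming a vector-add kernel; vec_add, N, and the device pointers are illustrative. The kernel is the code each thread runs, threads are grouped into blocks, and the blocks form the grid given at launch.

__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per element
    if (i < n)
        c[i] = a[i] + b[i];
}

// Launch: 256 threads per block (under the 512-thread block limit),
// enough blocks to cover all N elements.
// vec_add<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c, N);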

  CUDA Memory Model

  Global memory

  Contents visible to all threads

  Shared memory

  Shared by all threads in one block

  Constant memory

  Read-only in kernels, visible to all threads (all three spaces appear in the sketch below)
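
A minimal sketch of the three memory spaces; the array names, the 256-thread block size, and the per-block partial-sum kernel are assumptions. Kernel pointer arguments refer to global memory, __shared__ data is private to a block, and __constant__ data is read-only on the device and written from the host.

#include <cuda_runtime.h>

__constant__ float scale;                        // constant memory: read-only in kernels, set from the host

__global__ void block_sum(const float *in, float *out)   // in/out point into global memory
{
    __shared__ float tile[256];                  // shared memory: one copy per block (assumes blockDim.x <= 256)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i] * scale;
    __syncthreads();                             // make the whole tile visible to every thread of the block
    if (threadIdx.x == 0) {
        float s = 0.0f;
        for (int t = 0; t < blockDim.x; ++t)
            s += tile[t];
        out[blockIdx.x] = s;                     // one partial sum per block, written back to global memory
    }
}

// Host side: cudaMemcpyToSymbol(scale, &h_scale, sizeof(float));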

  CUDA extends C

  Declaration specifiers

     __global__, __device__, __shared__, __local__, __constant__

  Runtime API

    Memory, symbol, and execution management (see the end-to-end sketch below)
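
A minimal end-to-end sketch of the runtime API; the kernel scale_by_two and the sizes are illustrative. cudaMalloc/cudaMemcpy/cudaFree are the memory-management calls, and the <<<...>>> launch is the execution-management part.

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale_by_two(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= 2.0f;
}

int main(void)
{
    const int n = 1024;
    float h[1024];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = NULL;
    cudaMalloc((void **)&d, n * sizeof(float));                   // memory management
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    scale_by_two<<<(n + 255) / 256, 256>>>(d, n);                 // execution management

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    printf("h[10] = %f\n", h[10]);                                // expect 20.0
    return 0;
}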

  Type & Scope

  __device__ → global memory, grid scope

  __device__ __constant__ → constant memory, grid scope

  Automatic variables without any qualifier reside in a register

     Except arrays, which reside in local memory (see the placement sketch below)
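
A minimal sketch of where each kind of variable ends up; the names and the 64-thread block assumption are illustrative.

__device__   int   GlobalVar;          // global memory, grid scope, application lifetime
__constant__ float ConstantVar;        // constant memory, grid scope, application lifetime

__global__ void scope_demo(float *out)
{
    __shared__ float SharedVar[64];    // shared memory, one copy per block (assumes blockDim.x <= 64)
    float reg = (float)threadIdx.x;    // unqualified automatic variable: a register
    float arr[4] = {reg, reg, reg, reg};   // automatic array: placed in local memory

    SharedVar[threadIdx.x] = arr[0] + (float)GlobalVar + ConstantVar;
    __syncthreads();
    if (threadIdx.x == 0)
        out[blockIdx.x] = SharedVar[0];
}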

  Variables:

  Built-in Vector Types

  int1, int2, int3, int4, float1, float2, float3, float4,...

  Defined by constructor functions of the form make_<type> (see the sketch below)

  • int4 make_int4(int x, int y, int z, int w)

  • int4 iv = make_int4(1, 2, 3, 4)

    – iv.x = 1, iv.y = 2, iv.z = 3, iv.w = 4
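
A minimal sketch of the built-in vector types and their make_* constructors; the kernel name and the single-element output array are illustrative.

__global__ void vector_type_demo(float *out)
{
    int4   iv = make_int4(1, 2, 3, 4);               // iv.x = 1, iv.y = 2, iv.z = 3, iv.w = 4
    float4 fv = make_float4(0.5f, 1.5f, 2.5f, 3.5f); // same .x/.y/.z/.w component access
    out[0] = (float)(iv.x + iv.y + iv.z + iv.w)      // 10
           + fv.x + fv.y + fv.z + fv.w;              // + 8 = 18
}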

  Built-in dim3 Type

  Built-in variables gridDim, blockDim, blockIdx, threadIdx (used in the launch sketch below)
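
A minimal sketch of a dim3 launch configuration and the matching built-in variables inside the kernel; width, height, and the 16x16 block shape are assumptions.

__global__ void fill_coords(int *flat)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // blockIdx/threadIdx identify this thread
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int pitch = gridDim.x * blockDim.x;              // gridDim/blockDim describe the launch shape
    flat[y * pitch + x] = x + y;
}

// Host side:
// dim3 block(16, 16);                       // 256 threads per block
// dim3 grid(width / 16, height / 16);       // assumes width and height are multiples of 16
// fill_coords<<<grid, block>>>(d_flat);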
