
CUDA Programming Notes 1

  Parallel Programming Model Summary:

  1. Data-parallel programming dominates

  2. Loops in programs are the source of data parallelism

  3. Exploitation of parallelism involves sharing work in loops among processes

  4. Have to use appropriate scheduling techniques for optimal work sharing

  5. Have to perform some transformations to expose parallelism
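As a sketch of point 3, one common way to share a loop's iterations among threads is a grid-stride loop; the kernel below is a hypothetical saxpy-style example, not from the original notes:

```cuda
// Sequential form:  for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
// Parallel form: each thread starts at its global index and strides by
// the total thread count, so any n works with any launch configuration.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        y[i] = a * x[i] + y[i];
}
```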

  Architecture of GPU:

  Each Streaming Multiprocessor (SM) contains:

    8 Streaming Processors (SP)

    2 Special Function Units (SFU)

  • Transcendental operations (e.g. sin, cos) and interpolation

    A 16KB read/write shared memory

  • Not a cache, but a software-managed data store

    Multithreaded instruction issue unit

  • Dispatches instructions

    Instruction cache

    Constant cache

  Threads:

  Each thread has an ID that it uses to compute memory addresses and make control decisions
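A minimal sketch of how a thread's ID drives both addressing and control (assuming a 1-D launch; the kernel name and scaling operation are illustrative, not from the notes):

```cuda
__global__ void scale(float *data, int n, float factor)
{
    // Global thread ID: the block's offset plus this thread's
    // position within its block.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // The ID selects the memory address (data[tid]) and makes a
    // control decision (idle threads past the end of the array).
    if (tid < n)
        data[tid] *= factor;
}
```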

  Thread Blocks:

  Divide the thread array into multiple blocks

   Threads within a block cooperate via shared memory, atomic operations and barrier synchronization

   Threads in different blocks cannot cooperate
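The in-block cooperation described above can be sketched with shared memory and a barrier; the block-wise reversal kernel below is a hypothetical example under the assumption of a fixed block size:

```cuda
#define BLOCK 256

// Each block stages its chunk in shared memory, synchronizes, then
// reads it back reversed. Only threads in the SAME block may cooperate
// this way: __syncthreads() does not synchronize across blocks.
__global__ void reverse_in_block(float *data, int n)
{
    __shared__ float buf[BLOCK];          // software-managed on-chip store
    int base = blockIdx.x * blockDim.x;
    int i = base + threadIdx.x;

    if (i < n)
        buf[threadIdx.x] = data[i];
    __syncthreads();                      // barrier: all writes now visible

    int m = min(blockDim.x, n - base);    // elements handled by this block
    if (threadIdx.x < m)
        data[base + threadIdx.x] = buf[m - 1 - threadIdx.x];
}
```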

  At the end it is said:

  The API is an extension to the ANSI C programming language

   Low learning curve

  The hardware is designed to enable a lightweight runtime and driver

  High performance
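A minimal host-side sketch of that C extension in use — the `<<<...>>>` launch syntax plus `cudaMalloc`/`cudaMemcpy` from the runtime API; the `add_one` kernel is a hypothetical example:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void add_one(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] += 1.0f;
}

int main()
{
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc(&d, n * sizeof(float));            // allocate device memory
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // The triple-angle-bracket syntax is the language extension:
    // <<<blocks, threads-per-block>>>
    add_one<<<(n + 255) / 256, 256>>>(d, n);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    printf("h[0] = %f\n", h[0]);
    return 0;
}
```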
