
CUDA Programming Notes 1

  Parallel Programming Model Summary:

  1. Data-parallel programming dominates

  2. Loops in programs are the source of data parallelism

  3. Exploitation of parallelism involves sharing work in loops among processes

  4. Have to use appropriate scheduling techniques for optimal work sharing

  5. Have to perform some transformations to expose parallelism
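As a sketch of point 3, one common way to share a loop's iterations among threads is a grid-stride loop; the kernel below is a hypothetical saxpy-style example, not from the original notes:

```cuda
// Sequential form:  for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
// Parallel form: each thread starts at its global index and strides by
// the total thread count, so any n works with any launch configuration.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        y[i] = a * x[i] + y[i];
}
```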

  Architecture of GPU:

  Each Streaming Multiprocessor (SM) contains:

    8 Streaming Processors (SP)

    2 Special Function Units (SFU)

  • Transcendental operations (e.g. sin, cos) and interpolation

    A 16KB read/write shared memory

  • Not a cache, but a software-managed data store

    Multithreaded instruction issue unit

  • Dispatches instructions

    Instruction cache

    Constant cache

  Threads:

  Each thread has an ID that it uses to compute memory addresses and make control decisions
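A minimal sketch of how a thread's ID drives both addressing and control (assuming a 1-D launch; the kernel name and scaling operation are illustrative, not from the notes):

```cuda
__global__ void scale(float *data, int n, float factor)
{
    // Global thread ID: the block's offset plus this thread's
    // position within its block.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // The ID selects the memory address (data[tid]) and makes a
    // control decision (idle threads past the end of the array).
    if (tid < n)
        data[tid] *= factor;
}
```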

  Thread Blocks:

  Divide the thread array into multiple blocks

   Threads within a block cooperate via shared memory, atomic operations and barrier synchronization

   Threads in different blocks cannot cooperate
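The in-block cooperation described above can be sketched with shared memory and a barrier; the block-wise reversal kernel below is a hypothetical example under the assumption of a fixed block size:

```cuda
#define BLOCK 256

// Each block stages its chunk in shared memory, synchronizes, then
// reads it back reversed. Only threads in the SAME block may cooperate
// this way: __syncthreads() does not synchronize across blocks.
__global__ void reverse_in_block(float *data, int n)
{
    __shared__ float buf[BLOCK];          // software-managed on-chip store
    int base = blockIdx.x * blockDim.x;
    int i = base + threadIdx.x;

    if (i < n)
        buf[threadIdx.x] = data[i];
    __syncthreads();                      // barrier: all writes now visible

    int m = min(blockDim.x, n - base);    // elements handled by this block
    if (threadIdx.x < m)
        data[base + threadIdx.x] = buf[m - 1 - threadIdx.x];
}
```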

  At the end it is said:

  The API is an extension to the ANSI C programming language

   Low learning curve

  The hardware is designed to enable a lightweight runtime and driver

  High performance
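A minimal host-side sketch of that C extension in use — the `<<<...>>>` launch syntax plus `cudaMalloc`/`cudaMemcpy` from the runtime API; the `add_one` kernel is a hypothetical example:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void add_one(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] += 1.0f;
}

int main()
{
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc(&d, n * sizeof(float));            // allocate device memory
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // The triple-angle-bracket syntax is the language extension:
    // <<<blocks, threads-per-block>>>
    add_one<<<(n + 255) / 256, 256>>>(d, n);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    printf("h[0] = %f\n", h[0]);
    return 0;
}
```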
