Parallel Programming Model Summary:
1. Data-parallel programming dominates
2. Loops in programs are the main source of data parallelism
3. Exploiting parallelism involves sharing the work in loops among processes
4. Appropriate scheduling techniques must be used for optimal work sharing
5. Some transformations must be performed to expose the parallelism
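The points above can be sketched in CUDA: a loop whose iterations are independent is shared among threads, with the schedule chosen so that work is spread evenly. The kernel name `scale` and the grid-stride schedule are illustrative choices, not taken from the source.

```cuda
// Sequential loop:  for (i = 0; i < n; ++i) a[i] *= s;
// shared among all launched threads with a grid-stride schedule.
__global__ void scale(float *a, float s, int n)
{
    // Each thread handles iterations i = tid, tid + stride, tid + 2*stride, ...
    int tid    = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = tid; i < n; i += stride)
        a[i] = s * a[i];
}
```

The grid-stride form works for any n and any launch configuration, which is one simple way to get the even work sharing point 4 asks for.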
Architecture of the GPU:
An array of Streaming Multiprocessors (SMs), each containing:
8 Streaming Processors (SPs)
2 Special Function Units (SFUs)
• Transcendental operations (e.g. sin, cos) and interpolation
A 16 KB read/write shared memory
• Not a cache, but a software-managed data store
Multithreaded instruction issue unit
• Dispatches instructions
Instruction cache
Constant cache
Threads:
Each thread has an ID that it uses to compute memory addresses and make control decisions
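A minimal sketch of both uses of the thread ID (the kernel name `saxpy` and its parameters are illustrative, not from the source): the ID is combined with the block index to form a memory address, and it also drives a control decision, the bounds check that idles out-of-range threads.

```cuda
__global__ void saxpy(float *y, const float *x, float a, int n)
{
    // Thread ID -> memory address: each thread computes its own element index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Thread ID -> control decision: guard threads past the end of the array.
    if (i < n)
        y[i] = a * x[i] + y[i];
}
```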
Thread Blocks:
Divide the thread array into multiple blocks
Threads within a block cooperate via shared memory, atomic operations, and barrier synchronization
Threads in different blocks cannot cooperate
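A sketch of within-block cooperation, assuming a block size of 256 threads (a power of two); the kernel name `block_sum` is illustrative. Threads share a software-managed buffer and meet at `__syncthreads()` barriers; because blocks cannot cooperate, each block writes its own partial sum instead of combining across blocks.

```cuda
__global__ void block_sum(const float *in, float *out)
{
    __shared__ float buf[256];               // software-managed shared memory
    int t = threadIdx.x;
    buf[t] = in[blockIdx.x * blockDim.x + t];
    __syncthreads();                         // barrier: all loads are visible

    // Tree reduction within the block (assumes blockDim.x == 256).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (t < s)
            buf[t] += buf[t + s];
        __syncthreads();                     // barrier before the next level
    }
    if (t == 0)
        out[blockIdx.x] = buf[0];            // one partial sum per block
}
```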
In summary:
The API is an extension to the ANSI C programming language
• Low learning curve
The hardware is designed to enable a lightweight runtime and driver
• High performance
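A complete minimal program showing the C extension in action: the `__global__` qualifier and the `<<<grid, block>>>` launch syntax are the CUDA additions to ANSI C; everything else is ordinary C. The unified-memory allocator `cudaMallocManaged` is a modern convenience used here for brevity (the original era would use `cudaMalloc` plus `cudaMemcpy`).

```cuda
#include <cuda_runtime.h>

// __global__ marks a function that runs on the GPU.
__global__ void add_one(int *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] += 1;
}

int main(void)
{
    const int n = 1024;
    int *a;
    cudaMallocManaged(&a, n * sizeof(int));  // memory visible to CPU and GPU
    for (int i = 0; i < n; ++i)
        a[i] = i;

    // <<<grid, block>>> is the launch syntax CUDA adds to ANSI C.
    add_one<<<(n + 255) / 256, 256>>>(a, n);
    cudaDeviceSynchronize();                 // wait for the kernel to finish

    cudaFree(a);
    return 0;
}
```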