
A Simple CUDA Program: DeviceInfo

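  [IT168 Document] The article "nVidia CUDA API (Part 2)" mentioned that the CUDA SDK provides some basic device-management interfaces, namely functions such as cudaSetDevice( int ). That article should explain well enough what these functions do and how to use them.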

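  The topic gets its own post here because Heresy ran into some problems at run time. The main one: when a CUDA program with a long computation time runs on the primary display adapter, the Windows desktop may stop refreshing. This has also been reported on the nVidia Forum (see "Updating desktop to be stopping while running CUDA"), and the official FAQ mentions it as well: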

  33. What is the maximum kernel execution time?

  On Windows, individual GPU program launches have a maximum run time of around 5 seconds. Exceeding this time limit usually will cause a launch failure reported through the CUDA driver or the CUDA runtime, but in some cases can hang the entire machine, requiring a hard reset.

  This is caused by the Windows "watchdog" timer that causes programs using the primary graphics adapter to time out if they run longer than the maximum allowed time.

  For this reason it is recommended that CUDA is run on a GPU that is NOT attached to a display and does not have the Windows desktop extended onto it. In this case, the system must contain at least one NVIDIA GPU that serves as the primary graphics adapter.

  The official recommendation, then, is to run CUDA programs on a GPU that is not displaying the desktop.
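  As a rough sketch of how a program could follow that recommendation automatically: the CUDA runtime reports, per device, whether kernels are subject to a run-time limit (the kernelExecTimeoutEnabled field of cudaDeviceProp in older toolkits, or the cudaDevAttrKernelExecTimeout attribute read through cudaDeviceGetAttribute). The helper name firstDeviceWithoutTimeout is hypothetical and not part of the original program. It only takes the already-collected flags, so it compiles without the CUDA toolkit. Whether the flag matches the Windows watchdog behaviour on a particular setup is an assumption worth verifying.

```cpp
#include <vector>

// Hypothetical helper (not from the original article).
// timeoutEnabled[i] is non-zero when device i enforces a kernel run-time limit.
// One way to collect the flags with the CUDA runtime (shown as comments so the
// sketch compiles without the toolkit):
//
//   int flag = 0;
//   cudaDeviceGetAttribute( &flag, cudaDevAttrKernelExecTimeout, i );
//   timeoutEnabled.push_back( flag );
//
// Returns the index of the first device without a run-time limit,
// or -1 when every device has one (or the list is empty).
int firstDeviceWithoutTimeout( const std::vector<int>& timeoutEnabled )
{
  for( std::size_t i = 0; i < timeoutEnabled.size(); ++ i )
  {
    if( timeoutEnabled[i] == 0 )
      return static_cast<int>( i );
  }
  return -1;
}
```

  A non-negative result could then be passed to cudaSetDevice(). A result of -1 means the program has to stay on a display GPU and keep each kernel launch short.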

  The program below does two main things: first, it lists the devices on the current machine that can run CUDA; second, it sets the device that CUDA will use.

#include <stdio.h>
#include <cuda_runtime.h>

// output given cudaDeviceProp
// (size_t fields are printed with %zu, int fields with %d)
void OutputSpec( const cudaDeviceProp sDevProp )
{
  printf( "Device name: %s\n", sDevProp.name );
  printf( "Device memory: %zu\n", sDevProp.totalGlobalMem );
  printf( " Memory per-block: %zu\n", sDevProp.sharedMemPerBlock );
  printf( " Register per-block: %d\n", sDevProp.regsPerBlock );
  printf( " Warp size: %d\n", sDevProp.warpSize );
  printf( " Memory pitch: %zu\n", sDevProp.memPitch );
  printf( " Constant Memory: %zu\n", sDevProp.totalConstMem );
  printf( "Max thread per-block: %d\n", sDevProp.maxThreadsPerBlock );
  printf( "Max thread dim: ( %d, %d, %d )\n", sDevProp.maxThreadsDim[0], sDevProp.maxThreadsDim[1], sDevProp.maxThreadsDim[2] );
  printf( "Max grid size: ( %d, %d, %d )\n", sDevProp.maxGridSize[0], sDevProp.maxGridSize[1], sDevProp.maxGridSize[2] );
  printf( "Ver: %d.%d\n", sDevProp.major, sDevProp.minor );
  printf( "Clock: %d\n", sDevProp.clockRate );
  printf( "textureAlignment: %zu\n", sDevProp.textureAlignment );
}

int main()
{
  // part1, check the number of devices
  int iDeviceCount = 0;
  cudaGetDeviceCount( &iDeviceCount );
  printf( "Number of GPU: %d\n\n", iDeviceCount );

  if( iDeviceCount == 0 )
  {
    printf( "No supported GPU\n" );
    return 1;
  }

  // part2, output information of each device
  for( int i = 0; i < iDeviceCount; ++ i )
  {
    printf( "\n=== Device %i ===\n", i );
    cudaDeviceProp sDeviceProp;
    cudaGetDeviceProperties( &sDeviceProp, i );
    OutputSpec( sDeviceProp );
  }

  // part3, set CUDA to use the second device (index 1);
  // the call fails when the machine has only one CUDA device
  if( cudaSetDevice( 1 ) != cudaSuccess )
  {
    printf( "Cannot use device 1\n" );
    return 1;
  }

  // part4, do something
  ...

  return 0;
}

  Here, void OutputSpec( const cudaDeviceProp sDevProp ) is the function that prints a CUDA device's information (passed in as a cudaDeviceProp). Heresy put the output format together rather casually.
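  Since the output format is admittedly rough, here is a minimal sketch of two formatting helpers that would make the raw numbers easier to read. The names formatBytes and clockRateToMHz are hypothetical and not part of the original program. The sketch assumes the memory fields are byte counts of type size_t and that clockRate is reported in kHz, as the CUDA runtime documentation describes.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format a byte count (for example totalGlobalMem)
// as "N B", "N.N KiB", "N.N MiB" or "N.N GiB".
std::string formatBytes( std::size_t bytes )
{
  const char* units[] = { "B", "KiB", "MiB", "GiB" };
  double value = static_cast<double>( bytes );
  int unit = 0;
  // divide by 1024 until the value fits the largest suitable unit
  while( value >= 1024.0 && unit < 3 )
  {
    value /= 1024.0;
    ++ unit;
  }

  char buffer[32];
  if( unit == 0 )
    std::snprintf( buffer, sizeof buffer, "%zu B", bytes );
  else
    std::snprintf( buffer, sizeof buffer, "%.1f %s", value, units[unit] );
  return std::string( buffer );
}

// Hypothetical helper: convert a clock rate in kHz to MHz.
double clockRateToMHz( int clockRateKHz )
{
  return clockRateKHz / 1000.0;
}
```

  Inside OutputSpec, a line such as printf( "Device memory: %s\n", formatBytes( sDevProp.totalGlobalMem ).c_str() ); would then print the memory size in a readable unit instead of a long byte count.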
