Topic: ATI OpenCL MultiBeam app (rev177)
Raistmer
Working Code Wizard
Volunteer Developer
Knight who says 'Ni!'
   
App requirements (the same as for the released OpenCL ATI AstroPulse app): ATI Stream SDK 2.2 or higher installed (current version is 2.3), Catalyst 10.7 or higher, and an HD4xxx or higher GPU (see Known issues below for more detailed info).
EDIT: With the latest Catalyst (10.12 APP) drivers the SDK is not needed.

Command line parameters that the application supports:

-period_iterations_num <N>
Splits the single longest PulseFind kernel call into N calls. Default: -period_iterations_num 1. If you see lags in the GUI or even driver restarts, add this parameter with an integer value greater than 1.

-instances_per_device <N>
Allows N app instances to run per single device. Default: -instances_per_device 1.

-hp
Instructs the application to raise its priority class to high. Useful when the host is under high non-BOINC load. Also useful if the BOINC client itself imposes a high load on the CPU.

app_info section:

<app>
    <name>setiathome_enhanced</name>
</app>
<file_info>
    <name>MB_6.10_win_SSE3_ATI_r177.exe</name>
    <executable/>
</file_info>
<file_info>
    <name>MultiBeam_Kernels.cl</name>
    <executable/>
</file_info>
<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>610</version_num>
    <platform>windows_intelx86</platform>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>0.05</max_ncpus>
    <plan_class>ati13ati</plan_class>
    <cmdline>-period_iterations_num 2 -instances_per_device 1</cmdline>
    <flops>20987654321</flops>
    <file_ref>
        <file_name>MB_6.10_win_SSE3_ATI_r177.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>MultiBeam_Kernels.cl</file_name>
        <copy_file/>
    </file_ref>
    <coproc>
        <type>ATI</type>
        <count>1</count>
    </coproc>
</app_version>

Known issues:
- The app will not work correctly with HD43xx GPUs, and possibly other models that have a max workgroup size of 128 instead of 256 (this can be checked with the CLInfo sample from the SDK).
- If you have a Mobility GPU, double-check that your configuration (GPU + driver) has OpenCL support. Ask ATI for OpenCL support if it does not.
- Currently the HD5 version is only partially compatible with HD6xxx GPUs, probably due to immature drivers. It has been tested as working on an HD6970, but sporadic driver restarts are possible.

If you want to run several app instances per GPU, change the -instances_per_device parameter value and don't forget to change the <count> field in app_info too, to inform BOINC.

Also, there is an additional binary in the archive: MB_6.10_win_SSE3_ATI_HD5_r177.exe
It uses the same CL file as the usual version and is designed for HD5xxx and higher GPUs.

I would like to say a great THANK YOU to SubSpace, who provided countless profiler data from his HD5870 while I worked on the HD5xxx kernels. I have no HD5xxx-compatible hardware, and without his help this secondary build would hardly have appeared.
Also, I thank all the alpha and beta testers who took part in checking the app, and the whole Lunatics crew for their continuous support.
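As a rough illustration of the note about keeping -instances_per_device and the <count> field in step: BOINC's usual convention is that <count> holds the fraction of a GPU that one task uses, so N instances per device would correspond to a count of 1/N. That 1/N mapping is an assumption not spelled out in the post, and the helper name below is hypothetical, so treat this as a sketch and check it against your own BOINC client.

```python
def app_version_settings(instances_per_device: int, period_iterations_num: int = 1) -> dict:
    """Build matching <cmdline> and <count> values for app_info (hypothetical helper).

    Assumption: BOINC treats <count> as the fraction of one GPU used per task,
    so N instances per device means count = 1/N.
    """
    if instances_per_device < 1 or period_iterations_num < 1:
        raise ValueError("both parameters must be integers >= 1")

    cmdline = (
        f"-period_iterations_num {period_iterations_num} "
        f"-instances_per_device {instances_per_device}"
    )
    # Format without trailing zeros: 1.0 becomes "1", 0.5 stays "0.5".
    count = f"{1.0 / instances_per_device:g}"
    return {"cmdline": cmdline, "count": count}


if __name__ == "__main__":
    settings = app_version_settings(instances_per_device=2, period_iterations_num=2)
    print(f"<cmdline>{settings['cmdline']}</cmdline>")
    print(f"<count>{settings['count']}</count>")
```

The two values still have to be pasted into the <cmdline> and <coproc><count> fields by hand; the point is only that they should change together.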
« Last Edit: 28 Jan 2011, 06:16:36 pm by Gecko »
benool
Squire
Snif. Found out that the ATI 4550 has a Max Workgroup size less than 256...

Here is the output from ClInfo:

C:\Program Files\ATI Stream\bin\x86>CLInfo.exe
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451 )
Platform Name: ATI Stream
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

Platform Name: ATI Stream
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 128
Max work items[1]: 128
Max work items[2]: 128
Max work group size: 128
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 0
Native vector width short: 0
Native vector width int: 0
Native vector width long: 0
Native vector width float: 0
Native vector width double: 0
Max clock frequency: 650Mhz
Address bits: 32
Max memory allocation: 134217728
Image support: No
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 16384
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
Queue properties:
    Out-of-Order: No
    Profiling : Yes
Platform ID: 01B4A40C
Name: ATI RV710
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.1016
Profile: FULL_PROFILE
Version: OpenCL 1.0 ATI-Stream-v2.3 (451 )
Extensions: cl_khr_gl_sharing cl_amd_device_attribute_query

Also strangely, when I have my GeForce active (display monitor enabled) CLInfo stops after listing the Nvidia details:

C:\Program Files\ATI Stream\bin\x86>CLInfo.exe
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.0 CUDA 3.2.1
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451 )
Platform Name: ATI Stream
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

Platform Name: NVIDIA CUDA
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4318
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 512
Max work items[1]: 512
Max work items[2]: 64
Max work group size: 512
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
ERROR: clgetDeviceInfo(-30)

Could this be another 'friendly' behavior between Nvidia and AMD?
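Checking the "Max work group size" line by eye gets tedious with several devices. A minimal sketch of automating it, assuming the CLInfo output has been saved as text in the format shown above; the function name is made up for illustration, and the 256 threshold comes from the Known issues note in the first post.

```python
import re

# Threshold taken from the Known issues note: the app expects 256.
REQUIRED_WORKGROUP_SIZE = 256

# Pair each "Device Type" with the first "Max work group size" that follows it.
_DEVICE_PATTERN = re.compile(
    r"Device Type:\s*(\S+).*?Max work group size:\s*(\d+)", re.DOTALL
)


def small_workgroup_devices(clinfo_text: str, required: int = REQUIRED_WORKGROUP_SIZE):
    """Return (device_type, max_work_group_size) for devices below `required`."""
    return [
        (device_type, int(size))
        for device_type, size in _DEVICE_PATTERN.findall(clinfo_text)
        if int(size) < required
    ]


if __name__ == "__main__":
    sample = (
        "Device Type: CL_DEVICE_TYPE_GPU Device ID: 4098 Max compute units: 2 "
        "Max work group size: 128 Name: ATI RV710 "
        "Device Type: CL_DEVICE_TYPE_CPU Device ID: 4098 "
        "Max work group size: 1024"
    )
    for device_type, size in small_workgroup_devices(sample):
        print(f"{device_type}: max work group size {size} is below {REQUIRED_WORKGROUP_SIZE}")
```

The pattern works on both the one-entry-per-line layout and output that has been flattened onto a single line, since it only relies on the order of the two labels.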
Raistmer
Working Code Wizard
Volunteer Developer
Knight who says 'Ni!'
   
Here is the output from ClInfo: ERROR: clgetDeviceInfo(-30)
Could this be another 'friendly' behavior between Nvidia and AMD?
Please try running the modified CLInfo version attached to this post.
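For reference, -30 in the standard OpenCL headers is CL_INVALID_VALUE. One plausible reading, which is a guess and not something confirmed in this thread, is that the stock CLInfo asks the NVIDIA OpenCL 1.0 platform for a device attribute introduced in OpenCL 1.1 (the native vector widths are printed right after the preferred ones in the ATI listing), and the driver rejects the unknown parameter. A small lookup table for the more common codes, with values taken from cl.h; the function name is only illustrative.

```python
# A few status codes from the Khronos cl.h header (not the full list).
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -30: "CL_INVALID_VALUE",
    -31: "CL_INVALID_DEVICE_TYPE",
    -32: "CL_INVALID_PLATFORM",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
}


def describe_cl_error(code: int) -> str:
    """Translate a numeric OpenCL status code into its symbolic name."""
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL status code {code}")


if __name__ == "__main__":
    # The code reported by CLInfo in the post above.
    print(describe_cl_error(-30))
```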
« Last Edit: 21 Feb 2011, 06:02:37 am by Raistmer »
Claggy
Alpha Tester
Knight who says 'Ni!'
 
Modded CLinfo works fine: Number of platforms: 2 Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.0 CUDA 3.2.1 Platform Name: NVIDIA CUDA Platform Vendor: NVIDIA Corporation Platform Extensions: cl_khr_byte_addressable_store c l_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_ sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451 ) Platform Name: ATI Stream Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callbac k cl_amd_offline_devices cl_khr_d3d10_sharing
Platform Name: NVIDIA CUDA Number of devices: 1 Device Type: CL_DEVICE_TYPE_GPU Device ID: 4318 Max compute units: 7 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 64 Max work group size: 1024 Preferred vector width char: 1 Preferred vector width short: 1 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 1 Max clock frequency: 1600Mhz Address bits: 14757395255531667488 Max memory allocation: 260423680 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 4096 Max image 2D height: 32768 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 4352 Alignment (bits) of base address: 4096 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: Read/Write Cache line size: 128 Cache size: 114688 Global memory size: 1041694720 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 49152 Error correction support: 0 Profiling timer resolution: 1000 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: Yes Profiling : Yes Platform ID: 003E0D78 Name: GeForce GTX 460 Vendor: NVIDIA Corporation Driver version: 266.58 Profile: FULL_PROFILE Version: OpenCL 1.0 CUDA Extensions: cl_khr_byte_addressable_store c l_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_ sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extend ed_atomics 
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Platform Name: ATI Stream Number of devices: 2 Device Type: CL_DEVICE_TYPE_GPU Device ID: 4098 Max compute units: 10 Max work items dimensions: 3 Max work items[0]: 256 Max work items[1]: 256 Max work items[2]: 256 Max work group size: 256 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 850Mhz Address bits: 32 Max memory allocation: 134217728 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 32768 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 536870912 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Error correction support: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 0344A40C Name: Juniper Vendor: Advanced Micro Devices, Inc. 
Driver version: CAL 1.4.900 Profile: FULL_PROFILE Version: OpenCL 1.1 ATI-Stream-v2.3 (451 ) Extensions: cl_khr_global_int32_base_atomic s cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_lo cal_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops c l_amd_popcnt cl_khr_d3d10_sharing
Device Type: CL_DEVICE_TYPE_CPU Device ID: 4098 Max compute units: 2 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 1024 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 4143Mhz Address bits: 32 Max memory allocation: 536870912 Image support: No Max size of kernel argument: 4096 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 32768 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 32768 Error correction support: 0 Profiling timer resolution: 247 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 0344A40C Name: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz Vendor: GenuineIntel Driver version: 2.0 Profile: FULL_PROFILE Version: OpenCL 1.1 ATI-Stream-v2.3 (451 ) Extensions: cl_amd_fp64 cl_khr_global_int32 _base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomi cs cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_s haring cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_a md_popcnt cl_amd_printf cl_khr_d3d10_sharing Claggy
« Last Edit: 21 Feb 2011, 03:16:28 pm by Claggy »
benool
Squire
works as well for me with the modified CLinfo. It lists all devices correctly. C:\Program Files\ATI Stream\bin\x86>CLInfo_no_OCL1_1.exe Number of platforms: 2 Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.0 CUDA 3.2.1 Platform Name: NVIDIA CUDA Platform Vendor: NVIDIA Corporation Platform Extensions: cl_khr_byte_addressable_store c l_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_devi ce_attribute_query cl_nv_pragma_unroll Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451 ) Platform Name: ATI Stream Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callbac k cl_amd_offline_devices
Platform Name: NVIDIA CUDA Number of devices: 1 Device Type: CL_DEVICE_TYPE_GPU Device ID: 4318 Max compute units: 4 Max work items dimensions: 3 Max work items[0]: 512 Max work items[1]: 512 Max work items[2]: 64 Max work group size: 512 Preferred vector width char: 1 Preferred vector width short: 1 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 0 Max clock frequency: 1500Mhz Address bits: 14757395255531667488 Max memory allocation: 134217728 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 4096 Max image 2D height: 32768 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 4352 Alignment (bits) of base address: 2048 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 268238848 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 16384 Error correction support: 0 Profiling timer resolution: 1000 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: Yes Profiling : Yes Platform ID: 003974B8 Name: GeForce 8600 GTS Vendor: NVIDIA Corporation Driver version: 260.99 Profile: FULL_PROFILE Version: OpenCL 1.0 CUDA Extensions: cl_khr_byte_addressable_store c l_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_devi ce_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_ global_int32_extended_atomics
Platform Name: ATI Stream Number of devices: 2 Device Type: CL_DEVICE_TYPE_GPU Device ID: 4098 Max compute units: 2 Max work items dimensions: 3 Max work items[0]: 128 Max work items[1]: 128 Max work items[2]: 128 Max work group size: 128 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 650Mhz Address bits: 32 Max memory allocation: 134217728 Image support: No Max size of kernel argument: 1024 Alignment (bits) of base address: 32768 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 536870912 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 16384 Error correction support: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 01C0A40C Name: ATI RV710 Vendor: Advanced Micro Devices, Inc. Driver version: CAL 1.4.1016 Profile: FULL_PROFILE Version: OpenCL 1.0 ATI-Stream-v2.3 (451 ) Extensions: cl_khr_gl_sharing cl_amd_device _attribute_query
Device Type: CL_DEVICE_TYPE_CPU Device ID: 4098 Max compute units: 4 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 1024 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 3200Mhz Address bits: 32 Max memory allocation: 536870912 Image support: No Max size of kernel argument: 4096 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 32768 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 32768 Error correction support: 0 Profiling timer resolution: 0 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 01C0A40C Name: Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz Vendor: GenuineIntel Driver version: 2.0 Profile: FULL_PROFILE Version: OpenCL 1.1 ATI-Stream-v2.3 (451 ) Extensions: cl_amd_fp64 cl_khr_global_int32 _base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomi cs cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_s haring cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_a md_popcnt cl_amd_printf
C:\Program Files\ATI Stream\bin\x86>
Ghost0210
Guest
And just to be different this new clInfo only picks up my 5670 and CPU no NV card: E:\Downloads>CLInfo_no_OCL1_1.exe Number of platforms: 1 Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451 ) Platform Name: ATI Stream Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callbac k cl_amd_offline_devices cl_khr_d3d10_sharing
Platform Name: ATI Stream Number of devices: 2 Device Type: CL_DEVICE_TYPE_GPU Device ID: 4098 Max compute units: 5 Max work items dimensions: 3 Max work items[0]: 256 Max work items[1]: 256 Max work items[2]: 256 Max work group size: 256 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 850Mhz Address bits: 32 Max memory allocation: 134217728 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 32768 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 536870912 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Error correction support: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 02D1A40C Name: Redwood Vendor: Advanced Micro Devices, Inc. 
Driver version: CAL 1.4.1016 Profile: FULL_PROFILE Version: OpenCL 1.1 ATI-Stream-v2.3 (451 ) Extensions: cl_khr_global_int32_base_atomic s cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_lo cal_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops c l_amd_popcnt cl_khr_d3d10_sharing
Device Type: CL_DEVICE_TYPE_CPU Device ID: 4098 Max compute units: 6 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 1024 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 3200Mhz Address bits: 32 Max memory allocation: 536870912 Image support: No Max size of kernel argument: 4096 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 65536 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 32768 Error correction support: 0 Profiling timer resolution: 319 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 02D1A40C Name: AMD Phenom(tm) II X6 1090T Proc essor Vendor: AuthenticAMD Driver version: 2.0 Profile: FULL_PROFILE Version: OpenCL 1.1 ATI-Stream-v2.3 (451 ) Extensions: cl_amd_fp64 cl_khr_global_int32 _base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomi cs cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_s haring cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_a md_popcnt cl_amd_printf cl_khr_d3d10_sharing
Claggy
Alpha Tester
Knight who says 'Ni!'
 
And just to be different this new clInfo only picks up my 5670 and CPU no NV card:
Ghost, does GPU-Z or GpuCapsViewer report OpenCL support on your Nvidia GPU? Claggy
Ghost0210
Guest
Hi Claggy, Yes openCL is checked in GPU-z, and the original CLInfo lists the 465 the 5670 and the CPU: E:\Documents\ATI Stream\samples\opencl\bin\x86>CLInfo.exe Number of platforms: 2 Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451 ) Platform Name: ATI Stream Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd _offline_devices cl_khr_d3d10_sharing Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.0 CUDA 3.2.1 Platform Name: NVIDIA CUDA Platform Vendor: NVIDIA Corporation Platform Extensions: cl_khr_byte_addressable_store cl_khr_ic d cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pr agma_unroll
Platform Name: ATI Stream Number of devices: 2 Device Type: CL_DEVICE_TYPE_GPU Device ID: 4098 Max compute units: 5 Max work items dimensions: 3 Max work items[0]: 256 Max work items[1]: 256 Max work items[2]: 256 Max work group size: 256 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 850Mhz Address bits: 32 Max memory allocation: 134217728 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 8192 Max image 2D height: 8192 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 32768 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 536870912 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 0283A40C Name: Redwood Vendor: Advanced Micro Devices, Inc. 
Driver version: CAL 1.4.1016 Profile: FULL_PROFILE Version: OpenCL 1.1 ATI-Stream-v2.3 (451 ) Extensions: cl_khr_global_int32_base_atomic s cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_lo cal_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops c l_amd_popcnt cl_khr_d3d10_sharing Device Type: CL_DEVICE_TYPE_CPU Device ID: 4098 Max compute units: 6 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 1024 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 3200Mhz Address bits: 32 Max memory allocation: 536870912 Image support: No Max size of kernel argument: 4096 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 65536 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 32768 Profiling timer resolution: 319 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 0283A40C Name: AMD Phenom(tm) II X6 1090T Proc essor Vendor: AuthenticAMD Driver version: 2.0 Profile: FULL_PROFILE Version: OpenCL 1.1 ATI-Stream-v2.3 (451 ) Extensions: cl_amd_fp64 cl_khr_global_int32 _base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomi cs cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store 
cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_amd_popcnt cl_amd_printf cl_khr_d3d10_sharing
Passed! Platform Name: NVIDIA CUDA Number of devices: 1 Device Type: CL_DEVICE_TYPE_GPU Device ID: 4318 Max compute units: 11 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 64 Max work group size: 1024 Preferred vector width char: 1 Preferred vector width short: 1 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 1 Max clock frequency: 1215Mhz Address bits: 32 Max memory allocation: 260456448 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 4096 Max image 2D height: 32768 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 4352 Alignment (bits) of base address: 4096 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: Read/Write Cache line size: 128 Cache size: 180224 Global memory size: 1041825792 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 49152 Profiling timer resolution: 1000 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: Yes Profiling : Yes Platform ID: 003A0D88 Name: GeForce GTX 465 Vendor: NVIDIA Corporation Driver version: 266.58 Profile: FULL_PROFILE Version: OpenCL 1.0 CUDA Extensions: cl_khr_byte_addressable_store c l_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_ sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extend ed_atomics cl_khr_local_int32_base_atomics 
cl_khr_local_int32_extended_atomics cl_khr_fp64
Passed!
Claggy
Alpha Tester
Knight who says 'Ni!'
 
Hi Claggy,
Yes openCL is checked in GPU-z, and the original CLInfo lists the 465 the 5670 and the CPU:
Your platform order is the other way round from mine and benool's, which is probably the difference here. [Edit: that reminds me, after looking at benool's driver version, I must try Cat 11.2]
Claggy
« Last Edit: 21 Feb 2011, 03:24:25 pm by Claggy »
Ghost0210
Guest
Your platform order is the other way round to mine and benool's, probably the difference here,
Claggy
If that's set by the order the cards are in on the motherboard, then that would make sense. I have to have the ATI in slot 0 and the Nvidia in slot 1, otherwise one or the other doesn't get picked up. IIRC you have yours the other way round: Nvidia first, then ATI?
Ghost
Claggy
Alpha Tester
Knight who says 'Ni!'
 
Your platform order is the other way round to mine and benool's, probably the difference here,
Claggy
If that's set by the order the cards are in on the motherboard, then that would make sense. I have to have the ATI in slot 0 and the Nvidia in slot 1, otherwise one or the other doesn't get picked up. IIRC you have yours the other way round: Nvidia first, then ATI?
Ghost
Yep, NV first, ATI second, and then I have to use that workaround to stop BOINC's ATI apps crashing. Since benool's order is the same as mine and he's running CAL 1.4.1016, I wonder if that problem has been fixed in the latest drivers.
Back soon,
Claggy
Claggy
Alpha Tester
Knight who says 'Ni!'
Offline
Posts: 2494
Your platform order is the other way round to mine and benool's, probably the difference here,
Claggy
If that's set by the order the cards are in the motherboard then that would make sense. I have to have the ATI in slot 0 and the nvidia in slot 1, otherwise one or the other doesn't get picked up. IIRC you have yours the other way round: nvidia first, then ATI?
Ghost
The Nvidia and ATI in the same host issue is now fixed with Cat 11.2, no workaround needed anymore:
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\Stephen>C:\Users\Stephen\Downloads\CLInfo_no_OCL1_1\CLInfo_no_OCL1_1.exe
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.0 CUDA 3.2.1
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
Platform Name: ATI Stream
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing
Platform Name: NVIDIA CUDA
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4318
Max compute units: 7
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 64
Max work group size: 1024
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Max clock frequency: 1600Mhz
Address bits: 14757395255531667488
Max memory allocation: 260423680
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 4096
Max image 2D height: 32768
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4352
Alignment (bits) of base address: 4096
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 128
Cache size: 114688
Global memory size: 1041694720
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Error correction support: 0
Profiling timer resolution: 1000
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 02A30C20
Name: GeForce GTX 460
Vendor: NVIDIA Corporation
Driver version: 266.58
Profile: FULL_PROFILE
Version: OpenCL 1.0 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Platform Name: ATI Stream
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 10
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 850Mhz
Address bits: 32
Max memory allocation: 134217728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 035BA40C
Name: Juniper
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.1016
Profile: FULL_PROFILE
Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 4143Mhz
Address bits: 32
Max memory allocation: 536870912
Image support: No
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 247
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 035BA40C
Name: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
Vendor: GenuineIntel
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 ATI-Stream-v2.3 (451)
Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_amd_popcnt cl_amd_printf cl_khr_d3d10_sharing
Claggy
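A side note on the GTX 460 block in this post: it reports "Address bits: 14757395255531667488", where the other devices in the thread report 32. One possible reading, which is an assumption and not something anyone in the thread confirms, is that a 32-bit result was printed from a 64-bit variable whose upper half was never written. Splitting the number into halves is at least consistent with that:

```python
def split_qword(value):
    """Split a 64-bit unsigned value into (high 32 bits, low 32 bits)."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


reported = 14757395255531667488  # "Address bits" as printed for the GTX 460
high, low = split_qword(reported)
print(hex(high), low)  # prints: 0xcccccccc 32
```

The low half is a plausible 32, and the high half is 0xCCCCCCCC, the same byte pattern MSVC debug builds use to fill uninitialised stack memory. The thread does not say how this CLInfo build was compiled, so treat this as a guess about the tool's output rather than a statement about the card or driver.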
« Last Edit: 22 Feb 2011, 02:58:18 am by Claggy »
Ghost0210
Guest
The Nvidia and ATI in the same host issue is now fixed with Cat 11.2, no workaround needed anymore,
Claggy
I'll try swapping my cards round tomorrow then, and see if that makes a difference to the new CLInfo.
benool
Squire
Offline
Posts: 35
In my case, physical order is ATI first and NVidia in a lower slot.
On the driver side: Catalyst 11.2 (you were right) and ForceWare 260.99.
The OpenCL checkbox is checked in GPU-Z for both cards.