Seti@Home optimized science apps and information
 
Topic: GPU client (Read 30580 times)
Devaster
Re: GPU client
« Reply #45 on: 14 Dec 2007, 05:34:53 pm »

bob: Yes, still 100% CPU usage. Not everything runs on the GPU yet, and for now I don't use asynchronous access...
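"Asynchronous access" here presumably means overlapping host-to-device copies with kernel execution (in the CUDA runtime, `cudaMemcpyAsync` on streams with page-locked host memory). The rough timing model below is only an illustration, not sahcuda code: it assumes the data is split into equal chunks, that copies run one at a time, and that each copy can overlap perfectly with the kernel working on the previous chunk. The function names and the figures are hypothetical.

```python
def serial_time(n_chunks: int, copy_s: float, compute_s: float) -> float:
    """Every chunk is copied, then processed, with no overlap at all."""
    return n_chunks * (copy_s + compute_s)


def overlapped_time(n_chunks: int, copy_s: float, compute_s: float) -> float:
    """Two-stage pipeline: chunk i+1 is copied while chunk i is processed.

    The first copy and the last kernel cannot be hidden; every step in
    between costs only the slower of the two stages.
    """
    if n_chunks == 0:
        return 0.0
    return copy_s + compute_s + (n_chunks - 1) * max(copy_s, compute_s)


if __name__ == "__main__":
    # Hypothetical figures, not measurements from this thread.
    n, c, k = 8, 2.0, 3.0
    print(f"serial:     {serial_time(n, c, k):.1f} s")
    print(f"overlapped: {overlapped_time(n, c, k):.1f} s")
```

With 8 chunks, 2 s per copy and 3 s per kernel, this prints 40.0 s serial vs. 26.0 s overlapped; the gain is capped by whichever stage is slower.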

For now I am working on the chirp routine...
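SETI@home looks for signals whose frequency drifts over time, so the data are "chirped" at many trial drift rates before the FFT stage. A minimal sketch of one such chirp on complex baseband samples, assuming the common formulation in which sample n, taken at t = n / f_s, is multiplied by exp(i * pi * r * t^2) for a chirp rate r in Hz/s. The sign convention, the function name and its parameters are assumptions, not the routine Devaster is writing.

```python
import cmath
import math


def chirp(samples: list[complex], rate_hz_per_s: float,
          sample_rate_hz: float) -> list[complex]:
    """Apply a linear frequency drift (chirp) to complex baseband samples.

    Sample n is taken at t = n / sample_rate_hz and multiplied by
    exp(i * pi * rate * t^2), whose instantaneous frequency is rate * t Hz.
    Only the phase changes; each sample's magnitude is preserved.
    """
    out = []
    for n, x in enumerate(samples):
        t = n / sample_rate_hz
        out.append(x * cmath.exp(1j * math.pi * rate_hz_per_s * t * t))
    return out
```

Chirping with -r undoes a chirp with +r. Each sample is handled independently of the others, which is what makes this step a natural fit for one GPU thread per sample.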

Macbeth
Re: GPU client
« Reply #46 on: 14 Dec 2007, 06:21:49 pm »

Out of curiosity, what RAC would you expect to get from say a Geforce 8800 series card?

Cheers. ;)
Radiohead
Re: GPU client
« Reply #47 on: 15 Dec 2007, 09:18:44 am »

I learned how to run Knabench :D

...but got very, very strange results:

WinXP 32-bit, testWU-1 through testWU-7
C2D E6600 (2.4 GHz) running default-515.exe on one core in Knabench vs. an 8800GTS 320MB running the latest sahcuda.exe

1 - All 7 results are DIFFERENT! :(
2 - The 8800GTS is slower than one core of the E6600!!! :-[

Is this how it should be?


Quick timetable

WU : testWU-1.wu
default-515.exe : 304 seconds
sahcuda.exe : 499 seconds
Speedup: -64.14%, Ratio: 0.61 x

WU : testWU-2.wu
default-515.exe : 496 seconds
sahcuda.exe : 590 seconds
Speedup: -18.95%, Ratio: 0.84 x

WU : testWU-3.wu
default-515.exe : 541 seconds
sahcuda.exe : 657 seconds
Speedup: -21.44%, Ratio: 0.82 x

WU : testWU-4.wu
default-515.exe : 125 seconds
sahcuda.exe : 123 seconds
Speedup: 1.60%, Ratio: 1.02 x

WU : testWU-5.wu
default-515.exe : 499 seconds
sahcuda.exe : 596 seconds
Speedup: -19.44%, Ratio: 0.84 x

WU : testWU-6.wu
default-515.exe : 823 seconds
sahcuda.exe : 943 seconds
Speedup: -14.58%, Ratio: 0.87 x

WU : testWU-7.wu
default-515.exe : 361 seconds
sahcuda.exe : 376 seconds
Speedup: -4.16%, Ratio: 0.96 x
« Last Edit: 15 Dec 2007, 09:25:42 am by Radiohead »
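The Speedup and Ratio lines above are consistent with Speedup = (t_ref - t_test) / t_ref x 100% and Ratio = t_ref / t_test, with default-515.exe as the reference. These formulas are inferred from the numbers in the timetable, not taken from Knabench itself; the sketch below recomputes the table from the run times under that assumption (the function name is hypothetical).

```python
def knabench_style(ref_s: float, test_s: float) -> tuple[float, float]:
    """Return (speedup in percent, ratio), with default-515.exe as reference.

    Formulas inferred from the timetable (an assumption, not Knabench code):
      speedup = (ref - test) / ref * 100
      ratio   = ref / test
    """
    speedup = (ref_s - test_s) / ref_s * 100.0
    ratio = ref_s / test_s
    return speedup, ratio


# (default-515.exe, sahcuda.exe) run times in seconds, from the timetable.
TIMES = {
    "testWU-1.wu": (304, 499),
    "testWU-2.wu": (496, 590),
    "testWU-3.wu": (541, 657),
    "testWU-4.wu": (125, 123),
    "testWU-5.wu": (499, 596),
    "testWU-6.wu": (823, 943),
    "testWU-7.wu": (361, 376),
}

for wu, (ref, test) in TIMES.items():
    speedup, ratio = knabench_style(ref, test)
    print(f"{wu}: Speedup: {speedup:.2f}%, Ratio: {ratio:.2f} x")
```

Run as-is, it prints the same Speedup and Ratio values as the timetable. Under the same formulas, a GPU run that takes twice as long as the CPU run gives Speedup -100% and Ratio 0.50 x, which is one way to read a "100% slowdown".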

Devaster
Re: GPU client
« Reply #48 on: 15 Dec 2007, 10:19:40 am »

Something is wrong with your computer... :o

See here: http://setiathome.berkeley.edu/result.php?resultid=681495948 - this is a real work unit crunched with the latest application...

And yes, it's still slower than any CPU version...

Radiohead
Re: GPU client
« Reply #49 on: 15 Dec 2007, 10:36:40 am »

I'll have to reinstall Windows :(

Devaster
Re: GPU client
« Reply #50 on: 15 Dec 2007, 10:38:38 am »

lol, nice times in Knabench. On my machine, the 8500 gives not just a 19% slowdown but a 100% slowdown...

Radiohead
Re: GPU client
« Reply #51 on: 16 Dec 2007, 05:31:51 am »

Quote from: Devaster
And yes, it's still slower than any CPU version...

Strange...

I always thought the Nvidia 8 Series was faster than the Intel C2D.

http://en.wikipedia.org/wiki/FLOPS
"As of 2007, the fastest PC processors perform over 30 GFLOPS.[8] GPUs in PCs are considerably more powerful in terms of pure FLOPS. For example, in the GeForce 8 Series the nVidia 8800 Ultra performs around 576 GFLOPS on 128 Processing elements. This equates to around 4.5 GFLOPS per element, compared with 2.75 per core for the Blue Gene/L. It should be noted that the 8800 series performs only Single precision calculations, and that while GPUs are highly efficient at calculations they are not as flexible as a general purpose CPU."

And Nvidia promises that the new card (GeForce 9800) will be even faster: 1 or 3 (!!!!) TFLOPS... http://www.nordichardware.com/index.php?news=1&action=more&id=6911
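The per-element figure in the quoted Wikipedia passage is simply peak throughput divided by the number of processing elements. A quick check using only the numbers quoted there (theoretical peaks, not what sahcuda actually achieves):

```python
# Figures taken from the Wikipedia passage quoted above (peak, not sustained).
GPU_PEAK_GFLOPS = 576.0   # GeForce 8800 Ultra
GPU_ELEMENTS = 128        # processing elements
CPU_PEAK_GFLOPS = 30.0    # "the fastest PC processors perform over 30 GFLOPS" (2007)

per_element = GPU_PEAK_GFLOPS / GPU_ELEMENTS
gpu_vs_cpu = GPU_PEAK_GFLOPS / CPU_PEAK_GFLOPS

print(f"{per_element:.1f} GFLOPS per element")        # 4.5
print(f"{gpu_vs_cpu:.1f}x the CPU figure on paper")   # 19.2
```

The ~19x gap is a paper ratio; the Knabench times in this thread show how little of it survives when most of the work still runs on the CPU.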

I understand that this performance doesn't apply to every kind of task...
Perhaps the sahcuda algorithm could be optimized?
seti_britta is a mathematician :)
Could that help? ;D
« Last Edit: 16 Dec 2007, 03:41:43 pm by Radiohead »

Radiohead
Re: GPU client
« Reply #52 on: 16 Dec 2007, 05:35:53 am »

Quote from: Devaster
lol, nice times in Knabench. On my machine, the 8500 gives not just a 19% slowdown but a 100% slowdown...
8500 - 16/16 processors
8800 GTS - 96/96 processors

Radiohead
Re: GPU client
« Reply #53 on: 16 Dec 2007, 05:50:05 am »

Quote from: Devaster
Something is wrong with your computer... :o

I ran Knabench again.
Now all the results are very similar :o

* 16.12.2007-001-HOME-0C501089AC-bench.txt (186.94 KB)

Devaster
Re: GPU client
« Reply #54 on: 16 Dec 2007, 08:53:12 am »

This code is not optimized yet: for example, there are a lot of memory transfers that could be avoided, and so on. Also, the CPU and GPU code is mixed 95:5, and asynchronous access to the device isn't used...

First it must be validated, then optimized.
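Why mixing CPU and GPU code limits the gain can be estimated with Amdahl's law: if a fraction p of the run time is moved to a device that speeds that part up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The post does not say which side of the 95:5 split is which, so p is left as a parameter here; the values in the example are illustrative, transfer costs are ignored, and the function name is hypothetical.

```python
def amdahl_speedup(offloaded_fraction: float, part_speedup: float) -> float:
    """Overall speedup when only `offloaded_fraction` of the run time is accelerated.

    Amdahl's law: 1 / ((1 - p) + p / s). Transfer overhead is ignored,
    so real results for the same split can only be lower.
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be in [0, 1]")
    if part_speedup <= 0.0:
        raise ValueError("part_speedup must be positive")
    p, s = offloaded_fraction, part_speedup
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    # Even an infinitely fast GPU cannot beat 1 / (1 - p).
    for p in (0.05, 0.5, 0.95):
        print(f"p={p:.2f}: s=10 -> {amdahl_speedup(p, 10):.2f}x, "
              f"s=100 -> {amdahl_speedup(p, 100):.2f}x")
```

For p = 0.05, even s = 100 gives only about 1.05x, while for p = 0.95 the same s gives about 16.8x, which is why moving more of the work onto the GPU matters more than making the GPU part itself faster.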

Gecko_R7
Re: GPU client
« Reply #55 on: 18 Dec 2007, 12:42:57 pm »

Mimo,

Which clock speed has the most impact on GPU performance, as far as S@H is concerned: CPU clock, memory clock, or shader clock?
Also, do I understand correctly that the G92 8800GT has 12 FPU processors in the GPU?
Do the shaders provide any benefit?

Sorry for the questions. Just trying to understand this better.


Devaster
Re: GPU client
« Reply #56 on: 18 Dec 2007, 02:12:23 pm »

Yes. For example, the 8500 GPU has two multiprocessors, each with 8 unified shaders, and every shader can work on 4 floats in one instruction, so you can imagine your CPU having 16*4 cores...
Instructions are very efficient, with low clock counts: for example, MADD (multiply-add) takes only 4 clocks.
Global memory access is extremely efficient for contiguous reads/writes; this is called coalescing.

Clock speed doesn't have as great an impact on GPU performance as the number of shaders.
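On CUDA 1.x hardware such as the 8500 and the G80/G92 cards, global memory requests are grouped per half-warp of 16 threads, and the cost depends on how many aligned memory segments those 16 addresses fall into. The sketch below is a simplified model of that idea, with a fixed 64-byte segment for 4-byte floats and one transaction per distinct segment; the names are hypothetical, and the real hardware rules have more cases (earlier devices are stricter about which thread reads which word).

```python
HALF_WARP = 16       # threads whose accesses are grouped together on CUDA 1.x
WORD_BYTES = 4       # one 32-bit float per thread
SEGMENT_BYTES = 64   # simplified segment size for 32-bit words


def segments_touched(base_addr: int, stride_words: int) -> int:
    """Count aligned segments touched when thread k reads base + k * stride words.

    Simplified model: each distinct segment costs one memory transaction.
    """
    segments = {
        (base_addr + k * stride_words * WORD_BYTES) // SEGMENT_BYTES
        for k in range(HALF_WARP)
    }
    return len(segments)


if __name__ == "__main__":
    for stride in (1, 2, 16):
        print(f"stride {stride:2d}: {segments_touched(0, stride)} segment(s)")
```

Contiguous, aligned access (stride 1) touches a single segment; a stride of 16 words puts every thread in its own segment, i.e. 16 transactions instead of 1. Starting one word off alignment already splits a contiguous read across two segments.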

Gecko_R7
Re: GPU client
« Reply #57 on: 18 Dec 2007, 02:47:39 pm »


Thanks Mimo! Very interesting. So a higher shader count and a faster shader clock will actually have a bigger impact on crunching speed/potential for our purposes? In the case of the new G92-based 8800GT, that's 112 stream processors, each of which can process 4 floats in 1 instruction. Wow! The interest in this becomes very clear. Are G80/G92 stream processors scalar units rather than vector processors?
« Last Edit: 18 Dec 2007, 05:53:19 pm by Gecko_R7 »
Devaster
Re: GPU client
« Reply #58 on: 18 Dec 2007, 05:34:50 pm »

More on the theory is at the beginning of this document: http://developer.download.nvidia.com/compute/cuda/1_1/NVIDIA_CUDA_Programming_Guide_1.1.pdf

No, the shaders are strictly vectorized; massive parallelism is implemented in hardware...

Gecko_R7
Re: GPU client
« Reply #59 on: 18 Dec 2007, 05:53:51 pm »

Thanks!
 
 
© 2003-2009, LSP Dev Team. All Rights Reserved.