Seti@Home optimized science apps and information
 
Author Topic: GPU client  (Read 26114 times)
Devaster
Re: GPU client
« Reply #45 on: 14 Dec 2007, 05:34:53 pm »

bob: yes, still 100% CPU usage. Not everything runs on the GPU yet, and for now I don't use async access.

For now I am working on the chirp routine.

Macbeth
Re: GPU client
« Reply #46 on: 14 Dec 2007, 06:21:49 pm »

Out of curiosity, what RAC would you expect to get from, say, a GeForce 8800 series card?

Cheers. ;)

Radiohead
Re: GPU client
« Reply #47 on: 15 Dec 2007, 09:18:44 am »

I learned to run Knabench :D

...but got some very strange results:

WinXP 32-bit, testWU-1 through testWU-7.
C2D E6600 (2.4 GHz) running default-515.exe on one core in Knabench vs. an 8800 GTS 320 MB running the latest sahcuda.exe.

1 - all 7 results are DIFFERENT! :(
2 - the 8800 GTS is slower than one core of the E6600!!! :-[

Is this how it should be?


Quick timetable

WU : testWU-1.wu
default-515.exe : 304 seconds
sahcuda.exe : 499 seconds
Speedup: -64.14%, Ratio: 0.61 x

WU : testWU-2.wu
default-515.exe : 496 seconds
sahcuda.exe : 590 seconds
Speedup: -18.95%, Ratio: 0.84 x

WU : testWU-3.wu
default-515.exe : 541 seconds
sahcuda.exe : 657 seconds
Speedup: -21.44%, Ratio: 0.82 x

WU : testWU-4.wu
default-515.exe : 125 seconds
sahcuda.exe : 123 seconds
Speedup: 1.60%, Ratio: 1.02 x

WU : testWU-5.wu
default-515.exe : 499 seconds
sahcuda.exe : 596 seconds
Speedup: -19.44%, Ratio: 0.84 x

WU : testWU-6.wu
default-515.exe : 823 seconds
sahcuda.exe : 943 seconds
Speedup: -14.58%, Ratio: 0.87 x

WU : testWU-7.wu
default-515.exe : 361 seconds
sahcuda.exe : 376 seconds
Speedup: -4.16%, Ratio: 0.96 x
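
A minimal sketch of how the Speedup and Ratio lines above can be recomputed from the two timings. The formulas (CPU - GPU) / CPU and CPU / GPU are an assumption that happens to match the posted figures; they are not taken from Knabench itself, and the function name `compare` is made up for illustration.

```python
def compare(cpu_seconds, gpu_seconds):
    """Return (speedup_percent, ratio) of the GPU run relative to the CPU run.

    Assumed formulas, chosen because they match the posted figures:
      speedup = (cpu - gpu) / cpu * 100   (negative means the GPU run was slower)
      ratio   = cpu / gpu                 (below 1.0 means the GPU run was slower)
    """
    speedup_percent = (cpu_seconds - gpu_seconds) / cpu_seconds * 100.0
    ratio = cpu_seconds / gpu_seconds
    return speedup_percent, ratio


# (default-515.exe seconds, sahcuda.exe seconds) as posted above
timings = {
    "testWU-1": (304, 499),
    "testWU-2": (496, 590),
    "testWU-3": (541, 657),
    "testWU-4": (125, 123),
    "testWU-5": (499, 596),
    "testWU-6": (823, 943),
    "testWU-7": (361, 376),
}

for name, (cpu, gpu) in timings.items():
    speedup, ratio = compare(cpu, gpu)
    print(f"{name}: speedup {speedup:.2f}%, ratio {ratio:.2f}x")
```

Only testWU-4 comes out ahead on the GPU, and only by about two seconds.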
« Last Edit: 15 Dec 2007, 09:25:42 am by Radiohead »

Devaster
Re: GPU client
« Reply #48 on: 15 Dec 2007, 10:19:40 am »

Something is wrong on your computer... :o

See http://setiathome.berkeley.edu/result.php?resultid=681495948 - this is a real work unit crunched with the latest application.

And yes, it's still slower than any CPU version.

Radiohead
Re: GPU client
« Reply #49 on: 15 Dec 2007, 10:36:40 am »

I'll have to reinstall Windows :(

Devaster
Re: GPU client
« Reply #50 on: 15 Dec 2007, 10:38:38 am »

lol, nice times from Knabench - for me, my 8500 gives not just a 19% but a 100% slowdown...

Radiohead
Re: GPU client
« Reply #51 on: 16 Dec 2007, 05:31:51 am »

Quote from: Devaster
And yes, it's still slower than any CPU version.

Strange...

I always thought that the Nvidia 8 Series was faster than an Intel C2D.

http://en.wikipedia.org/wiki/FLOPS
"As of 2007, the fastest PC processors perform over 30 GFLOPS.[8] GPUs in PCs are considerably more powerful in terms of pure FLOPS. For example, in the GeForce 8 Series the nVidia 8800 Ultra performs around 576 GFLOPS on 128 Processing elements. This equates to around 4.5 GFLOPS per element, compared with 2.75 per core for the Blue Gene/L. It should be noted that the 8800 series performs only Single precision calculations, and that while GPUs are highly efficient at calculations they are not as flexible as a general purpose CPU."

And Nvidia promises that the new card (GeForce 9800) will be even faster. 1 or 3 (!!!!) Tflops.... http://www.nordichardware.com/index.php?news=1&action=more&id=6911
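
The per-element figure in the Wikipedia quote above is simply the peak number divided by the element count. A quick check in Python, using only the numbers from the quote; these are theoretical peaks, so the resulting ratio says nothing about how a particular application such as sahcuda actually performs. The constant and function names are made up for illustration.

```python
def gflops_per_element(total_gflops, elements):
    """Peak GFLOPS divided evenly across the processing elements."""
    return total_gflops / elements


GPU_PEAK_GFLOPS = 576.0  # 8800 Ultra, figure from the quote
GPU_ELEMENTS = 128       # processing elements, from the quote
CPU_PEAK_GFLOPS = 30.0   # "over 30 GFLOPS" in the quote, used here as a lower bound

per_element = gflops_per_element(GPU_PEAK_GFLOPS, GPU_ELEMENTS)
peak_ratio = GPU_PEAK_GFLOPS / CPU_PEAK_GFLOPS

print(f"GFLOPS per element: {per_element}")
print(f"Peak GPU/CPU ratio (upper bound on paper): {peak_ratio:.1f}x")
```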

I understand that this peak performance does not apply to every task...
Perhaps the sahcuda algorithm can be optimized?
seti_britta is a mathematician :)
Could that help? ;D
« Last Edit: 16 Dec 2007, 03:41:43 pm by Radiohead »

Radiohead
Re: GPU client
« Reply #52 on: 16 Dec 2007, 05:35:53 am »

Quote from: Devaster
lol, nice times from Knabench - for me, my 8500 gives not just a 19% but a 100% slowdown...

8500 - 16/16 processors
8800 GTS - 96/96 processors

Radiohead
Re: GPU client
« Reply #53 on: 16 Dec 2007, 05:50:05 am »

Quote from: Devaster
Something is wrong on your computer... :o

I ran Knabench again.
Now all the results are very similar :o

* 16.12.2007-001-HOME-0C501089AC-bench.txt (186.94 KB - downloaded 8 times.)

Devaster
Re: GPU client
« Reply #54 on: 16 Dec 2007, 08:53:12 am »

This code is not optimized. For example, there are a lot of memory transfers that can be avoided, and so on. Next, the CPU and GPU code is mixed in a 95:5 ratio, and async access to the device is not used.

First it must be validated, then optimized.
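
A rough way to see why a 95:5 split limits the gain is Amdahl's law. The sketch below assumes the 95:5 figure means that 95% of the run time stays on the CPU and 5% is offloaded to the GPU. That reading, the function name `overall_speedup`, and the example acceleration and transfer-overhead values are illustrative assumptions, not measurements from sahcuda.

```python
def overall_speedup(gpu_fraction, gpu_factor, transfer_overhead=0.0):
    """Amdahl-style estimate of whole-program speedup.

    gpu_fraction      -- share of the original run time moved to the GPU (0..1)
    gpu_factor        -- how many times faster that share runs on the GPU
    transfer_overhead -- extra time for host<->device copies, expressed as a
                         fraction of the original run time (hypothetical value)
    """
    cpu_time = 1.0 - gpu_fraction          # part that still runs on the CPU
    gpu_time = gpu_fraction / gpu_factor   # offloaded part, accelerated
    return 1.0 / (cpu_time + gpu_time + transfer_overhead)


# 5% offloaded and made 100x faster, with free transfers: barely any gain.
print(f"{overall_speedup(0.05, 100.0):.3f}x")

# Same split, but copies cost 20% of the original run time: a net slowdown.
print(f"{overall_speedup(0.05, 100.0, transfer_overhead=0.20):.3f}x")
```

Under these assumptions the ceiling is 1 / 0.95, about 1.05x, no matter how fast the GPU part runs, and any transfer cost pushes the result below 1.0x.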

Gecko_R7
Re: GPU client
« Reply #55 on: 18 Dec 2007, 12:42:57 pm »

Mimo,

In what order does clock speed impact GPU performance as far as S@H is concerned?  CPU clock, memory clock, shader clock?
Also, do I understand correctly that the G92 8800GT has 12 FPU processors in the GPU?
Do the shaders provide any benefit?

Sorry for the questions. Just trying to understand this better.


Devaster
Re: GPU client
« Reply #56 on: 18 Dec 2007, 02:12:23 pm »

Yes. For example, the 8500 GPU has two multiprocessors, and every multiprocessor has 8 unified shaders; every shader can work with 4 floats in one instruction - you can imagine that your CPU has 16*4 cores.
Instructions are very efficient, with low clock counts; for example, MADD (multiply and add) takes only 4 clocks.
The cache is extremely effective for contiguous reads/writes - this is called coalescing.

Clock speed doesn't have as great an impact on GPU performance as the number of shaders.
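
To illustrate the point about contiguous reads and writes: in the CUDA programming guide, coalescing describes how the memory accesses of neighbouring threads are combined into as few transactions as possible. The toy model below is a sketch under stated assumptions: 16 threads, 4-byte floats, 64-byte memory segments, and one transaction per distinct segment touched. These parameters and the function name `segments_touched` are illustrative, not taken from the post or from measured hardware behaviour.

```python
def segments_touched(stride_elements, threads=16, element_bytes=4, segment_bytes=64):
    """Count distinct memory segments hit when thread t reads element t * stride.

    Toy model: fewer segments means fewer memory transactions,
    i.e. better coalescing. All default parameters are assumptions.
    """
    addresses = [t * stride_elements * element_bytes for t in range(threads)]
    return len({address // segment_bytes for address in addresses})


for stride in (1, 2, 16):
    print(f"stride {stride:2d}: {segments_touched(stride)} segment(s)")
```

With a stride of 1 all 16 reads fall into one segment; with a stride of 16 every thread hits its own segment, which is the access pattern to avoid.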

Gecko_R7
Re: GPU client
« Reply #57 on: 18 Dec 2007, 02:47:39 pm »

Quote from: Devaster
Yes. For example, the 8500 GPU has two multiprocessors, and every multiprocessor has 8 unified shaders; every shader can work with 4 floats in one instruction - you can imagine that your CPU has 16*4 cores.
Instructions are very efficient, with low clock counts; for example, MADD (multiply and add) takes only 4 clocks.
The cache is extremely effective for contiguous reads/writes - this is called coalescing.

Clock speed doesn't have as great an impact on GPU performance as the number of shaders.

Thanks Mimo! Very interesting. So, a higher shader count and a faster shader clock will actually have a bigger impact on crunching speed/potential for our purposes? In the case of a new G92-based 8800GT, that's 112 stream processors, each of which can process 4 floats in 1 instruction. Wow! The interest in this becomes very clear. So G80/G92 stream processors are scalar units, not vector processors?
« Last Edit: 18 Dec 2007, 05:53:19 pm by Gecko_R7 »
Devaster
Re: GPU client
« Reply #58 on: 18 Dec 2007, 05:34:50 pm »

There is more about the theory at the beginning of this document: http://developer.download.nvidia.com/compute/cuda/1_1/NVIDIA_CUDA_Programming_Guide_1.1.pdf

no shaders are strictly vectorized, massive parallelism is implemented in hardware...

Gecko_R7
Re: GPU client
« Reply #59 on: 18 Dec 2007, 05:53:51 pm »

Quote from: Devaster
There is more about the theory at the beginning of this document: http://developer.download.nvidia.com/compute/cuda/1_1/NVIDIA_CUDA_Programming_Guide_1.1.pdf

no shaders are strictly vectorized, massive parallelism is implemented in hardware...

Thanks!