Topic: GPU crunching question (Read 40795 times)
|
Simon
|
I have no numbers to compare with, but going by how the 6600 compares with the 6800, a 7800 or 7900 should deliver a similar sort of speedup, and the 8800 GTX may be quite a lot quicker due to its different architecture.
I'm estimating that if a GF 6600 can do ~0.5 GFlops in FFTs, then a 7800/7900 would do around 3x-4x as much. Have you gotten anyone to test your code on those cards?
Regards, Simon.
« Last Edit: 14 Feb 2007, 02:01:52 pm by Simon »
|
citroja
|
I just found (some of) my notes on an FFT project I did about 2 years ago. I am a bit rusty, but if you want an extra set of eyes to cross-reference, just let me know.
And as stated before, I have a 7800GTX ready and waiting. I also have a second one coming in soon, so I will be able to do SLI testing if needed as well.
-citroja
|
Devaster
|
Want to test? Unpack and run, then post the numbers and your card.
« Last Edit: 14 Feb 2007, 10:46:57 am by Devaster »
|
pepperammi
|
Fails to run in Vista Ultimate x64. For both fft and fft2d it says: "The application has failed to start because its side-by-side configuration is incorrect. Please see the application log for more detail."
Event viewer info: Activation context generation failed for "C:\xxxxxxx\fft.exe". Dependent Assembly Microsoft.VC80.CRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="8.0.50727.762" could not be found. Please use sxstrace.exe for detailed diagnosis.
Sorry, I won't be much help, I suppose.
[EDIT] Same on a second XP machine.
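For anyone else hitting this, the event-log text already names the assembly that Windows could not find. Here is a minimal sketch that pulls the name and attributes out of that message so they are easier to compare between machines. The function name `parse_missing_assembly` is made up for illustration, and the regular expression is only assumed to fit messages shaped like the one quoted above.

```python
import re

# Event-log text as quoted in the post above (the path was already masked there).
EVENT_TEXT = (
    'Activation context generation failed for "C:\\xxxxxxx\\fft.exe". '
    'Dependent Assembly Microsoft.VC80.CRT,processorArchitecture="x86",'
    'publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="8.0.50727.762" '
    'could not be found. Please use sxstrace.exe for detailed diagnosis.'
)


def parse_missing_assembly(text):
    """Return the missing assembly's name and attributes, or None if absent.

    Hypothetical helper: it assumes the 'Dependent Assembly NAME,key="value",...'
    layout seen in the message above.
    """
    match = re.search(r'Dependent Assembly\s+([^,\s]+)((?:,\w+="[^"]*")*)', text)
    if match is None:
        return None
    info = {"name": match.group(1)}
    # Each attribute is a key="value" pair separated by commas.
    info.update(re.findall(r'(\w+)="([^"]*)"', match.group(2)))
    return info


if __name__ == "__main__":
    for key, value in parse_missing_assembly(EVENT_TEXT).items():
        print(f"{key}: {value}")
```

In this case the name is Microsoft.VC80.CRT, the 32-bit (x86) Visual C++ 8 C runtime, so the executables were built against a runtime that is not installed on the machine.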
« Last Edit: 14 Feb 2007, 11:38:14 am by pepperammi »
|
Devaster
|
Only 32-bit...
Oops, I forgot to add the MS VC8 runtime, sorry.
Here it is:
|
pepperammi
|
Yeah, it kind of crashes on x64. I was able to get this info before it does, though, if it's any help to you. That was on a PD945 with a 7950 GT 512MB OC edition, but I can't remember the clock speed. Will try to find that again. Also, the drivers for Vista aren't brilliant at the moment. I'll try to get it running on the XP 32-bit system I've got.
[EDIT] Got results from the XP 32-bit system. Much better. On a 6700XL 128MB:
Size: 256 x 256 = 2^8 x 2^8
Radix: 4 = 2^2
Total number of floating point operations: 5.24288e+006
Run timings, to and from host (in ms):
Average execution time: 10.3317ms
Overall average execution time: 10.333ms
Minimum execution time: 9.10301ms
Average Mflops: 507.458
Peak Mflops: 575.95
Run timings, GPU-local (in ms):
Average execution time: 7.88121ms
Overall average execution time: 7.88255ms
Minimum execution time: 7.25584ms
Average Mflops: 665.238
Peak Mflops: 722.574
« Last Edit: 14 Feb 2007, 02:10:01 pm by pepperammi »
|
keeleysam
|
On a 7900GT:
min_n = 4 max_n = 4
RapidMind FFT Benchmark
-----------------------------------------------
Length: 16 = 2^4
Warming up...
Run timings, to and from host (in us):
20597 9077.37 9696.11 9197.03 9168.72 9431.07 8683.53 9497.21 8846.68 9282.54 9018.77 9536.37 8525.4 10783.7 8275.49 8725.45 9378.13 8728.19 8879.72 9141.6 9507.8 9493.01 8800.25 11025.1 8919.49 8545.89 9093.98 10293.4 9472.42 10200.4 8922.43 9307.07 9000.57 9144.88 9039.11 9070.22 8831.57 10942 8566.39 10773.8 8636.83 8644.92 8682.37 8773.49 9290.59 7589.43 9198.22 8743.57 7973.17 9571.23 7876.32 8255.47 9064.25 8775.51 8158.25 9060.96 7676.09 7666.71 9149.89 9774.81 10266.7 10175.7 9520.35 8725.29 10543.8 8581.63 7617.03 15456.1 8748.53 8726.4 9638.18 9400.55 10548.7 8776.72 10612.3 9235.29 9257.36 9272.04 8578.75 10260.5 9040.53 7605.66 9057.08 9349.05 9530.74 8781.51 9602.82 9365.02 7739.68 7746.63 8837.69 10425.8 8660.14 9671.3 9630.79 9706.52 9869.04 9411.83 9261.09 9144.61
Average execution time: 9323.61us
Normalized execution time (T/N): 582.726us/sample
Normalized by complexity (T/N lg N): 145.681
Mflops (5 N lg N/T): 0.0343215
Average execution time: 9323.61us
Minimum execution time: 7589.43us
Normalized average execution time (T/N): 582.726us/sample
Normalized minimum execution time (T/N): 474.339us/sample
Average time normalized by complexity (T/N lg N): 145.681
Minimum time normalized by complexity (T/N lg N): 118.585
Average Mflops (5 N lg N/T): 0.0343215
Peak Mflops (5 N lg N/T): 0.0421639
---
Warming up...
Run timings, GPU-local (in us):
5976.24 5896.85 5840.59 6505.2 5775.57 7056.4 6266.14 5998.06 5886.62 7327.8 6065.59 5858.28 6421.34 5776.61 5926.31 5250.09 5871.16 7021.49 5823.92 6924.32 5780.96 5904.29 5706.06 7206.85 6377.11 6465.32 6095.81 6328 5976.41 6630.75 5816.1 5795.21 7562.49 5496.43 6818.26 5466.12 5741.6 5980.02 5716.79 7440.3 5966.9 6397.72 5532.77 5484.52 5601.83 6377.94 5580.49 6659.62 5603.51 6320.36 5269.05 5209.39 6419.08 5713.91 5216.8 5260.48 7587.09 5241.04 5475.64 5406.69 7129.43 5858.2 5725.67 5813.34 6022.91 5768.2 5609.28 6125.66 5996.56 6007.18 7563.85 6086.56 6230.87 6926.92 5960.09 6062.77 5800.01 6015.09 5505.55 5892.75 6236.54 5841.23 5506.36 5892.58 5654.26 6105.84 5710.56 5600.19 6400.18 6086.03 6659.31 5882.92 5838.27 6343.58 6125.2 6492.9 6064 5760.77 5854.11 5531.29
Average execution time: 6057.85us
Minimum execution time: 5209.39us
Normalized average execution time (T/N): 378.616us/sample
Normalized minimum execution time (T/N): 325.587us/sample
Average time normalized by complexity (T/N lg N): 94.654
Minimum time normalized by complexity (T/N lg N): 81.3967
BenchFFT average Mflops (5 N lg N/T): 0.052824
BenchFFT peak Mflops (5 N lg N/T): 0.0614276
Residuals (compare with inverse):
Average absolute: 2.4984e-008
Maximum absolute: 1.19267e-007
Average relative: -1.#IND
Maximum relative: 1.#INF
-----------------------------------------------
RapidMind 2D FFT Benchmark
===============================================
Size: 256 x 256 = 2^8 x 2^8
Radix: 4 = 2^2
Total number of floating point operations: 5.24288e+006
Run timings, to and from host (in ms):
Average execution time: 16.7119ms
Overall average execution time: 16.7127ms
Minimum execution time: 14.8866ms
Average Mflops: 313.721
Peak Mflops: 352.189
Run timings, GPU-local (in ms):
Average execution time: 10.8762ms
Overall average execution time: 10.8781ms
Minimum execution time: 9.81122ms
Average Mflops: 482.052
Peak Mflops: 534.376
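The Mflops figures in this output can be rechecked by hand from the formula the benchmark prints, 5 N lg N / T. The sketch below does that for the 7900GT numbers above. The function names are invented for illustration and are not part of the benchmark. It also assumes that the 2D flop total is the same formula applied to N = 256 x 256 samples, which does match the reported 5.24288e+006.

```python
import math


def benchfft_mflops(n_samples, time_us):
    """5 N lg N / T. With T in microseconds the result is already in Mflops."""
    return 5 * n_samples * math.log2(n_samples) / time_us


def mflops_from_flop_count(flops, time_ms):
    """Convert a total flop count and a time in milliseconds to Mflops."""
    return flops / (time_ms * 1e-3) / 1e6


# 1D FFT, length 16, average timings from the 7900GT run above (in us).
host_1d = benchfft_mflops(16, 9323.61)    # reported: 0.0343215
local_1d = benchfft_mflops(16, 6057.85)   # reported: 0.052824

# 2D FFT, 256 x 256; assumption: total flops = 5 N lg N with N = 256 * 256.
n_2d = 256 * 256
flops_2d = 5 * n_2d * math.log2(n_2d)
host_2d = mflops_from_flop_count(flops_2d, 16.7119)   # reported: 313.721
local_2d = mflops_from_flop_count(flops_2d, 10.8762)  # reported: 482.052

if __name__ == "__main__":
    print(f"1D host: {host_1d:.7f} Mflops, GPU-local: {local_1d:.6f} Mflops")
    print(f"2D flops: {flops_2d:.0f}")
    print(f"2D host: {host_2d:.3f} Mflops, GPU-local: {local_2d:.3f} Mflops")
```

The length-16 transform is only 320 floating point operations, so its timing is almost entirely fixed overhead. That is why the 1D Mflops figures are tiny next to the 256 x 256 results.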
|
popandbob
|
Spat out an error for both...
fft.exe is at the top of the pic. fft2d.exe (bottom of the pic) showed that, then went to the same error as fft.exe.
BoB
|
citroja
|
OK, I ran it, but got an error both times. Used an XFX 7800 GTX OC.
RESULTS:
FFT
min_n = 4 max_n = 4
RapidMind FFT Benchmark
-----------------------------------------------
Length: 16 = 2^4
Warming up...
Run timings, to and from host (in us):
11088.9 10020 10061.1 9965.72 9864.96 9933.09 9944.54 9765.17 10057.2 9835.17 9694.47 9850.44 9889.76 9837.63 9770.31 9745.29 9740.39 10029.6 9977.12 9747.09 9773.63 9721.04 9869.84 9799.63 9861.39 9877.91 9840.76 10061.1 9847.86 9776.06 9863.19 9510.83 9619.27 10084.2 9967.15 9788.94 9841.71 9879.99 9715.2 9831.11 10047.5 9785.2 9878.41 9814.68 9767.72 9773.21 9901.7 10074.6 10086.7 9847.63 9846.62 9976.32 10008.6 9875.92 9859.49 9764.52 9779.82 9774.2 9933.79 9897.1 9915.27 9792.4 9807.99 9823.81 9846.13 9873.5 9807.47 10006.1 9770.74 9872.61 9938.64 9916.57 9874.38 9941.68 9819.74 9913.2 9837.42 9671.82 9753.61 9805.79 9752.28 9730.36 9751.96 9912.53 10012.9 10133.2 9882.52 9870.45 9763.79 9948.21 10232.1 9924.38 9935.36 9899.92 9818.8 10061.7 9916.66 9969.69 9952.8 9904.88
Average execution time: 9884.06us
Normalized execution time (T/N): 617.753us/sample
Normalized by complexity (T/N lg N): 154.438
Mflops (5 N lg N/T): 0.0323754
Average execution time: 9884.06us
Minimum execution time: 9510.83us
Normalized average execution time (T/N): 617.753us/sample
Normalized minimum execution time (T/N): 594.427us/sample
Average time normalized by complexity (T/N lg N): 154.438
Minimum time normalized by complexity (T/N lg N): 148.607
Average Mflops (5 N lg N/T): 0.0323754
Peak Mflops (5 N lg N/T): 0.0336459
---
Warming up...
Run timings, GPU-local (in us):
9748.17 9507.76 9554.45 9612.01 9610.69 9481.02 9496.95 9411.37 9427.72 9407.54 9517.09 9602.89 9635.99 9578.78 9604.73 9608.02 9468.44 9477.32 9497.57 9727.09 9508.55 9551.91 9555.9 9560 9550.06 9614.92 9521.42 9391.96 9365.14 9369.59 9557.56 9480.28 9525.28 9642.08 9370.73 9727.39 9779.86 9979.25 9611.85 9492.61 9580.91 9439.35 9497.55 9502.86 9545.7 9548.19 9523.97 9503.56 9537.42 9514.92 9627 9618.37 9531.4 9570.15 9555.49 9562.65 9598.57 9823.91 9509.34 9603.7 9600.79 9564.68 9567.27 9671.98 9453.32 9650.67 9525.09 9515.26 9536.27 9488.43 9562.71 9416.56 9415.84 9441.23 9630.29 9598.56 9515.82 9514.17 9532.05 9507.69 9569.8 9491.44 9446.88 9423.49 9439.6 9511.41 9481.26 9477.17 9664.5 9769.24 9616.25 9560.46 9517.15 9606.68 9453.77 9401.95 9459.16 9489.44 9437.21 9485.7
Average execution time: 9543.38us
Minimum execution time: 9365.14us
Normalized average execution time (T/N): 596.461us/sample
Normalized minimum execution time (T/N): 585.321us/sample
Average time normalized by complexity (T/N lg N): 149.115
Minimum time normalized by complexity (T/N lg N): 146.33
BenchFFT average Mflops (5 N lg N/T): 0.0335311
BenchFFT peak Mflops (5 N lg N/T): 0.0341693
Residuals (compare with inverse):
Average absolute: 2.4984e-008
Maximum absolute: 1.19267e-007
Average relative: -1.#IND
Maximum relative: 1.#INF
******************EXITS WITH ERROR***************
The Instructions at "0x6962e876" referenced memory at "0x0000045c". The memory could not be "read".
Click on OK to terminate the program
******************End Message********************
FFT2d
RapidMind 2D FFT Benchmark
===============================================
Size: 256 x 256 = 2^8 x 2^8
Radix: 4 = 2^2
Total number of floating point operations: 5.24288e+006
Run timings, to and from host (in ms):
Average execution time: 15.329ms
Overall average execution time: 15.3299ms
Minimum execution time: 14.7271ms
Average Mflops: 342.024
Peak Mflops: 356.001
Run timings, GPU-local (in ms):
Average execution time: 13.1125ms
Overall average execution time: 13.1131ms
Minimum execution time: 12.8642ms
Average Mflops: 399.839
Peak Mflops: 407.557
******************EXITS WITH ERROR***************
The Instructions at "0x6962e876" referenced memory at "0x0000045c". The memory could not be "read".
Click on OK to terminate the program
******************End Message********************
I hope this helps...let me know if you need anything else.
-citroja
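To compare the three sets of 2D results posted so far, one rough measure is the gap between the "to and from host" average and the "GPU-local" average on each card. The sketch below computes that gap from the numbers in this thread. Reading the gap as transfer cost is an assumption on my part, not something the benchmark states, and `RESULTS_MS` and `transfer_overhead` are illustrative names.

```python
# Average 2D FFT (256 x 256) times in ms, copied from the posts in this thread:
# (to-and-from-host average, GPU-local average)
RESULTS_MS = {
    "6700XL 128MB (pepperammi)": (10.3317, 7.88121),
    "7900GT (keeleysam)": (16.7119, 10.8762),
    "XFX 7800 GTX OC (citroja)": (15.329, 13.1125),
}


def transfer_overhead(host_ms, gpu_local_ms):
    """Return (extra ms, share of host time) between the two timings.

    Assumption: the difference is mostly host<->GPU transfer; the benchmark
    output itself does not say what the difference consists of.
    """
    extra = host_ms - gpu_local_ms
    return extra, extra / host_ms


if __name__ == "__main__":
    for card, (host, local) in RESULTS_MS.items():
        extra, share = transfer_overhead(host, local)
        print(f"{card}: {extra:.2f} ms extra ({share:.0%} of host time)")
```

Three machines with different CPUs, drivers and operating systems are not a controlled comparison, so this is only a way to line the numbers up side by side.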
« Last Edit: 14 Feb 2007, 10:20:51 pm by citroja »
|
Devaster
|
Quote from popandbob: "spat out an error for both... fft.exe (top in pic) fft2d.exe (bottom in pic) showed that then went to the same as fft.exe. BoB"
Maybe your hardware is not compatible with RapidMind (it needs Shader Model 3.0). In that case it tries the CPU backend, and for that you must have a C++ compiler set up correctly.
citroja: I don't know what happened; on all the machines I have tested it runs OK.
To all: please post your OS version too, thanks. (This algorithm is heavily tuned for RapidMind.)
« Last Edit: 15 Feb 2007, 09:37:05 am by Devaster »
|
Simon
|
Hi Devaster,
how is RapidMind working out for you? From what I saw when looking at the documentation, it seems pretty usable compared to having to write direct shader code.
Since I have no basis for comparison, how would you say BrookGPU, RapidMind, CUDA or other solutions you tried compare in performance? Also, how long does it take you to code for them?
From what I found out, RapidMind seems the most useful because it can use both ATI X1K+ and nVidia 6x+ GPUs without modification; I guess you'd still need to compile different kernels with it, or include more DLLs.
Do you know whether your code works on ATI GPUs right now? Should be possible, with RM.
<edit>Yes, it does. Amazingly, even on AGP cards; I just tested on my ATI X800. Interesting, as I thought RapidMind would only work on X1K+ ATIs. It is very slow though; it needs PCI-Express to work well (AGP isn't all that bidirectional). Results attached: XP32, A64 3500+ (2.2 GHz).</edit>
Regards, Simon.
« Last Edit: 15 Feb 2007, 10:08:40 am by Simon »
|
Devaster
|
I have tested only BrookGPU and RapidMind. For CUDA I have neither a GPU nor an NDA. My implementation of FFT in Brook was very slow, but Naga's (in GLSL, GPUFFTW) is comparable in speed with RapidMind.
Usability of RapidMind... is cool.
The RapidMind GPU backend should run on all cards that have SM 3.0 and GLSL, the Cell backend on Cells, and the CPU backend with a classic C++ compiler.
|
citroja
|
Quote from Devaster: "Maybe your hardware is not compatible with RapidMind (it needs Shader Model 3.0). In that case it tries the CPU backend, and for that you must have a C++ compiler set up correctly. citroja: I don't know what happened; on all the machines I have tested it runs OK. To all: please post your OS version too, thanks. (This algorithm is heavily tuned for RapidMind.)"
OS is Win XP Pro SP2. Hmm, do you need .NET 2.0? I did run it with BOINC running and not running and got the same thing... maybe bad RAM? Though it tests fine.
-citroja
|
popandbob
|
My OS is the same as citroja's. Card is an ATI HIS 9250 Excalibur. I do have .NET 2.0.
Bob