Topic: GPU crunching question (Read 39833 times)
Freddy
Tested with an 8800 GTS 640MB (memory and GPU clock rates left at stock).
min_n = 4 max_n = 4 RapidMind FFT Benchmark ----------------------------------------------- Length: 16 = 2^4 Warming up... Run timings, to and from host (in us): 10095.2 8976.7 9132.39 8718.98 8906.92 8904.71 8715.21 8833.48 8783.14 8836.1 8674.97 8913.12 8764.64 8645.37 8741.8 8818.75 9024.37 8807.76 8826.81 8911.87 9002.08 9067.97 8945.69 8910.78 8722.34 8785.37 8814.4 8836.28 8834.39 8795.27 8778.69 8968.62 8747 8943.26 9291.43 8890.32 8932.17 8860.98 8739.06 8734.42 8871.18 8755.89 8868.9 9068.03 8763.38 9002.55 8814.57 8864.37 8823.38 8856.53 8831.87 8614.2 8851.8 8697.95 8952.61 8711.42 8683.05 8912.46 8763.43 8755.46 8718.52 9060.99 8932.78 8812.21 8834.16 8825.66 8653.1 8801.54 8859.38 8665.22 8906.53 8957.47 8860.75 8777.11 8759.25 8845.62 9030.77 8915.02 8858.34 8676.31 8819.07 9009.46 8837.26 8762.6 8834.04 7046.69 8719.74 8610.55 8890.17 8839.04 9646.3 8775.46 8739.86 8720.51 9064.7 8947.07 8705.96 8704.77 8867.14 8880.16 Average execution time: 8842.67us Normalized execution time (T/N): 552.667us/sample Normalized by complexity (T/N lg N): 138.167 Mflops (5 N lg N/T): 0.0361882 Average execution time: 8842.67us Minimum execution time: 7046.69us Normalized average execution time (T/N): 552.667us/sample Normalized minimum execution time (T/N): 440.418us/sample Average time normalized by complexity (T/N lg N): 138.167 Minimum time normalized by complexity (T/N lg N): 110.105 Average Mflops (5 N lg N/T): 0.0361882 Peak Mflops (5 N lg N/T): 0.0454114 --- Warming up... 
Run timings, GPU-local (in us): 8263.18 8381.39 8462.2 8356.22 8373.54 8503.47 8716.67 8385.77 8394.17 8419.64 8659.13 8294.88 8407.95 8567.22 8493.25 8384.13 8477.74 8508.42 8552.66 8398.76 8761.34 8573.63 8430.25 8437 8615.68 8464.32 8483.02 8540.84 8564.65 8566.38 8503.04 8614.77 8437.5 8545.99 8401.69 8442.15 8832.88 8638.04 8456.14 8492.51 8693.16 8371.29 8350.92 8427.35 8414.12 8851.89 8438.03 8443.12 8503.04 8665.21 8719.99 8375.58 8501.07 8526.01 8325.1 8614.5 8433.29 8432.5 8532.22 8529.62 8481.02 8251.49 8543.71 8523.21 8422.35 8640.62 8603.52 8661.46 8479.36 8548.6 8649.6 8542.74 8373.39 8379.29 8413.56 8598.13 8549.43 8460.99 8544.15 8515.79 8576.4 8485.85 8558.77 8380.95 8520.18 8764.88 8403.96 8483.77 8752.86 7361.6 8661.36 8332.67 8480.45 8310.8 8649.39 8708.75 8560.87 8488.33 8491.4 8473.15 Average execution time: 8495.79us Minimum execution time: 7361.6us Normalized average execution time (T/N): 530.987us/sample Normalized minimum execution time (T/N): 460.1us/sample Average time normalized by complexity (T/N lg N): 132.747 Minimum time normalized by complexity (T/N lg N): 115.025 BenchFFT average Mflops (5 N lg N/T): 0.0376657 BenchFFT peak Mflops (5 N lg N/T): 0.0434688 Residuals (compare with inverse): Average absolute: 1.26059e-008 Maximum absolute: 5.96046e-008 Average relative: -1.#IND Maximum relative: 1.#INF -----------------------------------------------
RapidMind 2D FFT Benchmark
===============================================
Size: 256 x 256 = 2^8 x 2^8
Radix: 4 = 2^2
Total number of floating point operations: 5.24288e+006
Run timings, to and from host (in ms):
Average execution time: 13.7757ms
Overall average execution time: 13.7762ms
Minimum execution time: 13.2051ms
Average Mflops: 380.589
Peak Mflops: 397.035
Run timings, GPU-local (in ms):
Average execution time: 12.1273ms
Overall average execution time: 12.1279ms
Minimum execution time: 11.7326ms
Average Mflops: 432.32
Peak Mflops: 446.865
Both tests end with a memory read error. The OS is Windows XP Pro 32-bit; .NET 2.0 is not installed.
I'll look into the errors later, once work is over...
Devaster
For G80 a CUDA version would be better. I can look on my home computer for some apps by Hans Dorn; he built some test apps based on CUDA...
WR-HW95
With 8800GTX @ 612/975 C:\Release-vc8>fft.exe min_n = 4 max_n = 4 RapidMind FFT Benchmark ----------------------------------------------- Length: 16 = 2^4 Warming up... Run timings, to and from host (in us): 11561.3 10482.5 8229.39 12829.6 8740.71 9539.26 9745.74 10875.1 11149.2 9760.27 12356 8845.49 11541.2 8558.26 9808.89 9916.74 9238.06 9773.12 8477.23 7909.47 11607.7 10333.6 7918.13 11377.5 7920.09 10473.6 8454.32 9801.9 10972.9 10767 9267.11 11145.3 9876.5 9839.62 13427.2 8664.71 10973.7 11119.3 9176.86 9062.31 9811.68 8923.72 7202.85 9036.6 9994.13 8747.42 10002.8 10443.1 9761.39 9866.44 10177.1 10808.3 8371.89 10052 9621.96 10266 11904.4 9640.12 9375.24 8899.69 9294.78 10726.2 6828.72 12483.1 9911.99 12466.6 8385.58 7925.68 10416.3 9766.97 9917.02 11196.4 9642.64 10324.1 11035.8 9518.3 8512.15 10829 9727.86 12404.3 10707.5 10192.5 10868.4 7899.13 9340.32 8048.62 7750.77 11226.9 8889.35 9273.54 7777.87 7842.69 7471.92 8830.4 10697.4 11466.3 8701.59 8419.39 7942.44 9761.11 Average execution time: 9788.45us Normalized execution time (T/N): 611.778us/sample Normalized by complexity (T/N lg N): 152.945 Mflops (5 N lg N/T): 0.0326916 Average execution time: 9788.45us Minimum execution time: 6828.72us Normalized average execution time (T/N): 611.778us/sample Normalized minimum execution time (T/N): 426.795us/sample Average time normalized by complexity (T/N lg N): 152.945 Minimum time normalized by complexity (T/N lg N): 106.699 Average Mflops (5 N lg N/T): 0.0326916 Peak Mflops (5 N lg N/T): 0.0468609 --- Warming up... 
Run timings, GPU-local (in us): 10815.9 11730.4 7816.99 7627.83 9804.42 9321.6 9801.34 9725.06 7585.92 9003.07 9982.68 6766.42 10917.9 8505.45 7894.38 10349.5 8926.79 11731.8 7668.62 8905.56 11206.2 9771.44 11598.2 8679.8 9933.78 9116.51 8855.83 9696 9815.87 8695.17 12109.5 9716.4 8787.65 8662.48 8444.54 7717.24 8718.36 9792.96 10747.7 9169.6 11555.5 8955.85 9709.7 6659.12 10377.2 9286.95 10160.9 11761.7 8587.87 12249.8 8761.67 10833.5 9495.95 7892.71 9270.47 9678.68 10709.1 9684.55 7819.5 10225.5 8822.58 12600.2 8660.8 8996.09 11010.3 6783.74 10320.5 10069.9 9703.83 10450.1 7650.74 10810.8 10639.8 9755.24 11815.3 8054.21 7740.15 10277.5 10128.5 10209.3 6895.78 7671.42 9653.26 9822.86 12298.4 10547.4 7820.62 7712.77 6761.39 8859.18 7419.95 8623.08 7702.71 8842.41 9383.91 9820.06 7636.21 8563.29 9718.36 8473.6 Average execution time: 9385.19us Minimum execution time: 6659.12us Normalized average execution time (T/N): 586.574us/sample Normalized minimum execution time (T/N): 416.195us/sample Average time normalized by complexity (T/N lg N): 146.644 Minimum time normalized by complexity (T/N lg N): 104.049 BenchFFT average Mflops (5 N lg N/T): 0.0340963 BenchFFT peak Mflops (5 N lg N/T): 0.0480544 Residuals (compare with inverse): Average absolute: 1.26059e-008 Maximum absolute: 5.96046e-008 Average relative: -1.#IND Maximum relative: 1.#INF ----------------------------------------------- C:\Release-vc8>fft2d.exe RapidMind 2D FFT Benchmark =============================================== Size: 256 x 256 = 2^8 x 2^8 Radix: 4 = 2^2 Total number of floating point operations: 5.24288e+006
Run timings, to and from host (in ms):
Average execution time: 15.6239ms
Overall average execution time: 15.6285ms
Minimum execution time: 13.4389ms
Average Mflops: 335.568
Peak Mflops: 390.126
Run timings, GPU-local (in ms):
Average execution time: 13.8474ms
Overall average execution time: 13.851ms
Minimum execution time: 10.7656ms
Average Mflops: 378.619
Peak Mflops: 487.004

It looks like this depends quite a bit on CPU speed too. The run above was made with 2x Rosetta running on a 3.05GHz Opteron 175. I suspended BOINC and ran fft2d again.

C:\Release-vc8>fft2d.exe
RapidMind 2D FFT Benchmark
===============================================
Size: 256 x 256 = 2^8 x 2^8
Radix: 4 = 2^2
Total number of floating point operations: 5.24288e+006
Run timings, to and from host (in ms):
Average execution time: 14.0743ms
Overall average execution time: 14.0783ms
Minimum execution time: 13.1137ms
Average Mflops: 372.515
Peak Mflops: 399.801
Run timings, GPU-local (in ms):
Average execution time: 12.3266ms
Overall average execution time: 12.3304ms
Minimum execution time: 10.2948ms
Average Mflops: 425.332
Peak Mflops: 509.276
« Last Edit: 22 Feb 2007, 09:47:17 am by WR-HW95 »
pepperammi
Quote from Devaster: "For G80 a CUDA version would be better. I can look on my home computer for some apps by Hans Dorn; he built some test apps based on CUDA..."
I hear the 8900 series will have 25% more shaders or something and still use the G80 chips. Apparently they were there all along. Would that mean anything for all this? I wonder whether it will be possible to unlock them, as I think was possible on some older ATI cards at some point?
Devaster
As I wrote, for older cards BrookGPU or RapidMind is the better choice; for new cards, CUDA (NVIDIA) or CTM (ATI) is better.
Devaster
As far as I can see in the RapidMind FFT source, the algorithm processes two complex numbers per pass (à la the RGBA texture format). This format is extremely efficient in vertex/pixel shaders and for memory transfers (shaders/GPU memory)...
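To illustrate the packing idea only: a four-component RGBA texel has room for two complex values (real and imaginary parts in R, G and B, A). The Python sketch below is not taken from the RapidMind source; the names `pack_rgba`, `unpack_rgba` and `butterfly_rgba` are hypothetical, and the radix-2 butterfly is just one plausible operation a shader could apply to such a texel.

```python
def pack_rgba(z0, z1):
    """Store two complex numbers in one 4-component texel (R, G, B, A)."""
    return (z0.real, z0.imag, z1.real, z1.imag)


def unpack_rgba(texel):
    """Recover the two complex numbers held in a texel."""
    r, g, b, a = texel
    return complex(r, g), complex(b, a)


def butterfly_rgba(texel, twiddle):
    """Radix-2 butterfly on the two complex values held in one texel.

    Both inputs are read in a single fetch and both outputs are written
    in a single store, which is the point of the packing.
    """
    a, b = unpack_rgba(texel)
    t = b * twiddle
    return pack_rgba(a + t, a - t)


texel = pack_rgba(1 + 2j, 3 + 4j)
result = butterfly_rgba(texel, 1 + 0j)
print(result)
```

With a twiddle factor of 1 the butterfly simply returns the sum and the difference of the two packed values.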
Devaster
Off topic: Code Wizard: cool, my name is yellow.
« Last Edit: 24 Feb 2007, 07:04:50 pm by Devaster »
Simon
I thought so, too. Keep up the good work!
Devaster
Maybe I have a good idea: modify the BOINC manager to use the GPU as an additional core. If you have a usable GPU, you can then run one more instance of SETI...
There would be a small performance hit (about 10 percent in my tests).
Gecko_R7
Devaster: So, if a person was running S@H on a C2D and had a graphics card, BOINC would recognize the GPU as a third processor and manage the GPU's own client? Well, even if the GPU lost 10% performance, being able to run the CPU clients simultaneously appears to be quite a gain in aggregate vs. GPU-only crunching at 100%. This sounds pretty darn cool! Good luck!
Alex Kan
Devaster: Neither of the data points you've picked for fft.exe is representative of SETI's FFT workload. SETI doesn't do two-dimensional FFTs, and it spends much more time doing FFTs with lengths between 16K and 128K than it does any other lengths. Also, if you're using the standard MFLOPS = 5 N log2(N) / (1000 * time in ms) metric for FFT performance, those times strike me as a bit on the low side. A lot of those speeds seem no faster than (or worse, slower than) doing the same computations on the CPU with tuned libraries. Does RapidMind provide built-in functionality for computing FFTs?
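For anyone who wants to check the numbers, the metric can be sketched in a few lines of Python. The function name `fft_mflops` is only illustrative; the inputs are the average times from the 8800 GTS runs posted earlier in the thread, with the 256 x 256 transform treated as N = 65536 points.

```python
import math


def fft_mflops(n, time_ms):
    """Nominal FFT speed: 5 * N * log2(N) flops divided by the run time."""
    flops = 5 * n * math.log2(n)
    return flops / (1000.0 * time_ms)


# Length-16 run: average 8842.67 us = 8.84267 ms, to and from host
small = fft_mflops(16, 8.84267)
# 256 x 256 2D run: 65536 points, average 13.7757 ms, to and from host
large = fft_mflops(256 * 256, 13.7757)
print(small, large)
```

Both values come out close to the "Average Mflops" figures the benchmark printed for those runs, which suggests the benchmark uses the same formula.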
Devaster
From my side: it doesn't matter to me whether the FFT on the GPU is faster or not. What matters is that you are using additional compute power for crunching...
Devaster
Tomorrow afternoon I will post the first test of SETI on GPU here: Nagas FFT, power spectrum and data chirping in RapidMind. Stay tuned and be patient!
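As a rough CPU-side picture of what those three stages do, here is a minimal Python sketch. It is not the RapidMind code and not SETI's implementation: it assumes that chirping means multiplying each sample by a quadratic-phase complex exponential, it uses a naive O(N^2) DFT in place of a real FFT, and all function names are hypothetical.

```python
import cmath
import math


def chirp(samples, chirp_rate, sample_rate):
    """Multiply sample k by exp(-i*pi*chirp_rate*t^2), with t = k / sample_rate."""
    out = []
    for k, x in enumerate(samples):
        t = k / sample_rate
        out.append(x * cmath.exp(-1j * math.pi * chirp_rate * t * t))
    return out


def dft(samples):
    """Naive O(N^2) DFT, standing in for a real FFT."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * j * k / n) for j, x in enumerate(samples))
        for k in range(n)
    ]


def power_spectrum(spectrum):
    """Squared magnitude of each frequency bin."""
    return [abs(c) ** 2 for c in spectrum]


# Synthetic test signal: a tone starting at bin 3 that drifts in frequency.
n = 16
rate = 0.01  # drift rate in cycles per sample squared (made-up value)
drifting = [
    cmath.exp(1j * (2 * math.pi * 3 * k / n + math.pi * rate * k * k))
    for k in range(n)
]

# Removing the drift with the matching chirp rate gives back a pure tone.
power = power_spectrum(dft(chirp(drifting, rate, sample_rate=1.0)))
peak_bin = max(range(n), key=lambda k: power[k])
print(peak_bin, power[peak_bin])
```

When the chirp rate matches the drift of the signal, the energy collapses back into a single bin of the power spectrum, which is why the chirp step comes before the FFT.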