Topic: SETI MB CUDA for Linux (Read 167295 times)
Metod, S56RKO
Here are my experiences after some days: it works. My observations:
- The settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all).
- <flops> should not be set too high, or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used.
- It helps a lot to set 'run GPU tasks while computer in use'. Without it I never saw a GPU task running, although tasks did run while I was away. Don't enable it blindly; think about interactive use of the system.
- Setting the niceness of the CPU part of the GPU task to 0 (normal priority) doesn't seem to affect things a lot, but doesn't hurt. One thing is not really by the book: the script has to run as root, or else raising the priority fails (only root can increase priority), which opens a potential security hole.
- At least the app I'm running (x86, 2.2, vlar-kill) has a nasty habit of complaining:
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 497 : invalid configuration argument.
It seems benign, though, as most results have validated. Is there any particular reason why this error is reported while the app still seemingly operates OK?
Sunu, thank you for all the advice!
« Last Edit: 31 Aug 2010, 07:13:17 am by Metod, S56RKO »
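To illustrate the root requirement from the niceness point in the post above, here is a minimal Python sketch. It is not the script used in this thread. It assumes Linux and Python 3.3 or later, where os.setpriority is available. The function name set_normal_priority and its injectable setter argument are hypothetical, added only so the behaviour can be tried without touching real processes.

```python
import os

# PRIO_PROCESS tells setpriority() that the second argument is a PID.
PRIO_PROCESS = getattr(os, "PRIO_PROCESS", 0)


def set_normal_priority(pids, setter=None):
    """Try to set nice 0 on each PID and report which attempts succeeded.

    Moving a process from a positive nice value back to 0 raises its
    priority, which an unprivileged user is normally not allowed to do.
    """
    if setter is None:
        setter = os.setpriority  # real system call wrapper (Unix only)
    results = {}
    for pid in pids:
        try:
            setter(PRIO_PROCESS, pid, 0)
            results[pid] = True
        except PermissionError:
            # Not root (or lacking the needed capability): the call is refused.
            results[pid] = False
        except ProcessLookupError:
            # The task finished before we got to it.
            results[pid] = False
    return results


if __name__ == "__main__":
    # Harmless demo on our own process: this succeeds when we are already
    # at nice 0 and fails without privileges when we run at a positive nice.
    print(set_normal_priority([os.getpid()]))
```

Returning a per-PID result instead of raising makes it easy to see when the script is silently failing because it is not running as root.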
Metod, S56RKO
Quote: i'm currently using nvidia-drivers-195.36.31. noticed an upgrade available to nvidia-drivers-256.52. i'm always a bit suspicious of large jumps in upgrade versions. worth it? avoid it?
I can't say anything about whether it's worth it; however, the new one works for me.
sunu
Quote: i'm currently using nvidia-drivers-195.36.31. noticed an upgrade available to nvidia-drivers-256.52. i'm always a bit suspicious of large jumps in upgrade versions. worth it? avoid it?
There have been quite a few releases between them, so it's not really a big jump. You can try it and, if you don't like it, revert.
Quote: settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)
Wrong.
Quote: settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used
Wrong.
Quote: setting niceness of CPU-part of GPU task to 0 (normal priority) doesn't seem to affect things a lot, but doesn't hurt.
It seems to depend on the kernel/distro used. Some systems seem to benefit greatly from it, others not so much.
Metod, S56RKO
Quote: settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)
Quote: Wrong
How so? I've tried some values between 0.00 and 0.15 and I haven't noticed any difference. The only case where I could imagine a difference showing up is if there are multiple (probably more than 3-4) GPUs installed and used.
Quote: settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used
Quote: Wrong
If not, what then? My estimates are currently way too high (around 4 days), so I tried to fix it by changing the <flops> value. When I set it 10 times larger, WUs errored out due to excessive resources used. Run time (wall) was roughly the same as for successful WUs, so I can attribute the error only to the too-high <flops> value.
Josef W. Segur
Quote: settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)
Quote: Wrong
Quote: How so? I've tried some values between 0.00 and 0.15 and I haven't noticed any difference. The only time that I could imagine the difference to pop up is if there are multiple (probably more than 3-4) GPUs installed and used.
Set 1 and BOINC will reserve a full CPU for each GPU. Set 0.71, as the project app_plan is doing for some hosts running stock Windows builds, and if the system has 2 GPUs one CPU will be reserved, etc. You're right that small fractional settings are generally insignificant.
Quote: settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used
Quote: Wrong
Quote: If not, what then? My estimates are currently way too high (around 4 days) so I tried to fix it by changing <flops> value. If I set it 10-times larger, WUs erred out due to excessive resources used. Run time (wall) was roughly the same as for successful WUs, so I can attribute the error only to too high <flops> value.
The relationships are:
rsc_fpops_bound/flops = elapsed time limit
DCF*rsc_fpops_est/flops = estimated runtime
rsc_fpops_bound = 10*rsc_fpops_est
If DCF is near or greater than 10, as sometimes happens, the estimated runtime is longer than the allowed runtime. Reducing DCF can reduce the estimates and thereby allow work fetch, without changing the allowed runtime. Adjusting flops to more than a realistic value for the host is not a very good idea, but adjusting rsc_fpops_bound values higher can protect against those errors.
With the servers attempting to provide rsc_fpops_est and _bound values which are about right for DCF 1.0, we can hope things will settle down after they have enough data to know how fast the applications are. Unfortunately, the initial transitions are painful.
Joe
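The relationships Joe lists can be tried out with a few lines of Python. This is only a sketch: the function names and every number below are hypothetical, chosen to show why a 10 times larger <flops> shrinks the allowed runtime by the same factor, and why a DCF above 10 pushes the estimate past the limit.

```python
def elapsed_time_limit(rsc_fpops_bound, flops):
    """Allowed runtime in seconds: rsc_fpops_bound / flops."""
    return rsc_fpops_bound / flops


def estimated_runtime(dcf, rsc_fpops_est, flops):
    """Estimated runtime in seconds: DCF * rsc_fpops_est / flops."""
    return dcf * rsc_fpops_est / flops


# Hypothetical work unit and host values, for illustration only.
rsc_fpops_est = 1e13
rsc_fpops_bound = 10 * rsc_fpops_est   # relationship given in the post
realistic_flops = 1e10
inflated_flops = 10 * realistic_flops  # <flops> set 10 times larger

limit_realistic = elapsed_time_limit(rsc_fpops_bound, realistic_flops)
limit_inflated = elapsed_time_limit(rsc_fpops_bound, inflated_flops)
estimate_high_dcf = estimated_runtime(12.0, rsc_fpops_est, realistic_flops)

print("limit with realistic flops:", limit_realistic, "s")
print("limit with inflated flops: ", limit_inflated, "s")
print("estimate at DCF 12:        ", estimate_high_dcf, "s")
print("estimate exceeds limit:    ", estimate_high_dcf > limit_realistic)
```

With these made-up values, a task whose real wall time is a few thousand seconds fits under the realistic limit but not under the inflated one, which matches the "excessive resources" errors described above.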
riofl
Either today's batch of downloads is supposed to take a very long time for a GPU to complete, or I have something going wrong. My fastest GPU is taking 3 hours 2 minutes to reach 85%! The other 2 GPUs are also taking 10 to 15 minutes longer. All 3 GPU temperatures are much lower than normal as well: typically they run at 58-65C under max load, and I have not seen them rise above 50C in several hours.
Is this a 'common' experience others are having today too, or am I facing something going haywire?
Claggy
Quote: either today's batch of downloads is supposed to take a very long time for a gpu to complete or i have something going wrong. my fastest gpu is taking 3 hours 2 minutes to reach 85%! and even the others are taking 10 to 15 minutes longer on the other 2 gpus. all 3 gpu temps are also much lower than normal. typically they run 58-65c max load and i have not seen them rise above 50c in several hours. is this a 'common' experience others are having too today or am i facing something going haywire?
Check out your results: resultid=1717782169 on hostid=4166601
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
SETI@home MB CUDA 3.0 6.09 Linux 64bit - r16 by Crunch3r :p - thread priority mod
setiathome_CUDA: Found 3 CUDA device(s):
Device 1 : GeForce GTX 285
totalGlobalMem = 1073020928
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 512
clockRate = 1476000
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
Device 2 : GeForce GTX 295
totalGlobalMem = 939327488
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 512
clockRate = 1345500
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
Device 3 : GeForce GTX 295
totalGlobalMem = 939327488
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 512
clockRate = 1345500
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 285 is okay
SETI@home using CUDA accelerated device GeForce GTX 285
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file './cudaAcc_fft.cu' in line 49 : no CUDA-capable device is available.
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file './cudaAcc_fft.cu' in line 49 : no CUDA-capable device is available.
setiathome_CUDA: CUDA runtime ERROR in plan FFT. Falling back to HOST CPU processing...
setiathome_enhanced 6.01 Revision: 737 g++ (GCC) 4.2.1 (SUSE Linux)
libboinc: BOINC 6.11.0
Work Unit Info:
...............
WU true angle range is : 1.433000
Flopcounter: 11714606392639.039062
Spike count: 1
Pulse count: 0
Triplet count: 0
Gaussian count: 0
05:22:35 (16178): called boinc_finish
</stderr_txt>
I suggest you first try restarting BOINC, then your computer.
Claggy
« Last Edit: 02 Oct 2010, 07:34:32 am by Claggy »
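The telltale lines in the stderr output quoted in the post above are the 'Cuda error' messages followed by the fallback to host CPU processing. A small Python sketch can look for them in a saved stderr text. The function names are hypothetical, and the marker strings are simply copied from the log in this thread; other app builds may word them differently.

```python
# Marker text taken from the stderr output quoted in this thread.
FALLBACK_MARKER = "Falling back to HOST CPU processing"


def fell_back_to_cpu(stderr_text):
    """Return True if the task reported a fallback to CPU processing."""
    return FALLBACK_MARKER in stderr_text


def cuda_error_lines(stderr_text):
    """Return all lines that start with 'Cuda error', stripped of whitespace."""
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if line.strip().startswith("Cuda error")
    ]


# Shortened excerpt of the log above, used as sample input.
sample = """SETI@home using CUDA accelerated device GeForce GTX 285
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file './cudaAcc_fft.cu' in line 49 : no CUDA-capable device is available.
setiathome_CUDA: CUDA runtime ERROR in plan FFT. Falling back to HOST CPU processing...
"""

if fell_back_to_cpu(sample):
    print("Task ran on the CPU; CUDA errors seen:")
    for line in cuda_error_lines(sample):
        print("  " + line)
```

A task that fell back this way still finishes and can validate, which is why the only visible symptoms are long run times, high CPU use and cool GPUs.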
riofl
I noticed it was using 100% CPU, and that is what tipped me off as well. I shut down for about 1 minute, then restarted, and that seems to have cured it. I am wondering, though, whether this is a symptom of something going bad or just an occasional 'fluke'.
sunu
Do you still make heavy use of your main graphics card?
What driver do you use?
riofl
Quote: Do you still make heavy use of your main graphics card? What driver do you use?
I'm using the 256.53 driver with CUDA 3.1. Yes, I have a monitor on both ports of the 285 and one monitor on the 295, and I make heavy use of them, though mostly in ssh, browser, email, instant messaging and text editor windows. The monitors are set up in a Xinerama/TwinView mixture to get 3 on one desktop.
This is the first time this problem has happened, and since I power cycled the machine it has not happened again, although I did do something out of the ordinary yesterday. I tried to watch a training seminar video, but it wouldn't play. Somehow I had the wrong version of some codecs, since it did work 2 weeks ago. That may have tossed the video card into a strange state, since I had to kill the player, which wound up eating all available memory. I think when something like this happens in the future, as with the video player, I'll just power off and start up again to be safe.
« Last Edit: 02 Oct 2010, 10:35:40 pm by riofl »
sunu
Why cuda 3.1? I think you shouldn't use it. Cuda 3.x is intended for different software and hardware.
riofl
I forget who told me, but they said it was backward compatible and that performance was better.
riofl
Quote: Why cuda 3.1? I think you shouldn't use it. Cuda 3.x is intended for different software and hardware.
That must have come in with an upgrade done yesterday or the day before. The list was long and I really didn't look carefully at it. I have reinstalled cuda-toolkit 2.1. It appears device 2 has now started causing issues, finishing each workunit as soon as it began working on it. This happened in the past hour, I think. Hopefully this will cure the problems.
sunu
Cuda 2.3 would be the best choice.
riofl
Argh, I didn't even notice the typo. Yes, I installed 2.3, not 2.1. Sorry.