Topic: Modified SETI MB CUDA + opt AP package for full GPU utilization
|
Archangel999
Guest
|
Quote: "I use mb_r396 mode with my Windows XP x64 and all is OK, no more crashes and errors."
Quote: "396 is outdated now. Use 400 instead."
Thanks, but I will use 396 because it is OK for me.
|
Raistmer
|
Quote: "I use mb_r396 mode with my Windows XP x64 and all is OK, no more crashes and errors."
Quote: "396 is outdated now. Use 400 instead."
Quote: "Thanks, but I will use 396 because it is OK for me."
It will be OK just until a new bunch of tasks arrives.
|
Josef W. Segur
|
Yeah, and here's what David Anderson is thinking of doing:

From: David Anderson <davea@ssl.berkeley.edu>
Cc: boinc_alpha@ssl.berkeley.edu
To: BoincSpy Administrator <boinc_spy@telus.net>
Date: Sun, 11 Jan 2009 21:26:37 -0800
Subject: Re: [boinc_alpha] CUDA Blue Screen of death.
Message: 3

I've seen this also. It's not related to driver version. According to NVIDIA engineers, if a GPU driver request doesn't complete in 2 seconds, Windows assumes it's hung, and crashes. Apparently the SETI@home/CUDA app does something that sometimes takes > 2 secs on slow GPUs. The NVIDIA people didn't have a fix or workaround for this.
So I'm going to change the server so that it won't send CUDA jobs to GPUs slower than 60 GFLOPS (I'm not sure this is the magic number, but on my machine the GPU is 50 GFLOPS, computed as clockRate * multiProcessorCount * 2,857).
-- David

From a comment somewhere in the BOINC sources, his board is a Quadro FX 3700.
Joe
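For anyone who wants to check the arithmetic in David's message, here is a rough sketch of the estimate. It assumes clockRate is given in kHz (that is how CUDA's cudaDeviceProp reports it) and reads "2,857" as the plain multiplier 2857. The Quadro FX 3700 figures used in the example (14 multiprocessors, 1.25 GHz shader clock) are my assumption, not something stated in the message, and the function names are made up for illustration.

```python
# Sketch of the estimate quoted above: clockRate * multiProcessorCount * 2857.
# Assumption: clock_rate_khz is in kHz, as in CUDA's cudaDeviceProp.clockRate.

FLOPS_MULTIPLIER = 2857   # the "2,857" factor from the message
MIN_GFLOPS = 60.0         # proposed server-side cutoff from the message


def estimate_gflops(clock_rate_khz, multiprocessor_count):
    """Return the estimated speed of a GPU in GFLOPS."""
    flops = clock_rate_khz * multiprocessor_count * FLOPS_MULTIPLIER
    return flops / 1e9


def would_get_cuda_work(clock_rate_khz, multiprocessor_count, min_gflops=MIN_GFLOPS):
    """Return True if the host would still be sent CUDA jobs under the cutoff."""
    return estimate_gflops(clock_rate_khz, multiprocessor_count) >= min_gflops


if __name__ == "__main__":
    # Assumed figures for a Quadro FX 3700: 14 multiprocessors at 1.25 GHz.
    gflops = estimate_gflops(1_250_000, 14)
    print(f"{gflops:.1f} GFLOPS, gets CUDA work: {would_get_cuda_work(1_250_000, 14)}")
```

With those assumed figures the estimate comes out at about 50 GFLOPS, which matches the number in the message and would fall under the proposed 60 GFLOPS cutoff.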
|
Raistmer
|
Wrong decision IMHO. 1) Only some range of tasks suffers from this trouble, so there is no need to restrict the whole host from processing. 2) It is possible to repair the application, or at least to fall back to the CPU version on the affected tasks.
That is, no intervention on the server side is required, at least not in the form of restricting the whole host from getting CUDA work.
« Last Edit: 13 Jan 2009, 06:27:49 pm by Raistmer »
|
Maik
Guest
|
Quote: "This build is devoted to all who still experience video driver crashes on VLAR tasks. It should abort its own execution if a VLAR with AR < 0.14 is detected (temporary measure, of course). All thanks go to Crunch3r for this mod."
Hmm... I'm wondering where this idea is from... looking at my script code... well, an angle range check routine... and then the idea popped up after Crunch3r downloaded my script / took notice of the thread... a credit would have been nice.
|
Richard Haselgrove
Alpha Tester
Knight who says 'Ni!'
 
|
Quote: "Hmm... I'm wondering where this idea is from... looking at my script code... well, an angle range check routine... and then the idea popped up after Crunch3r downloaded my script / took notice of the thread... a credit would have been nice."
To be fair, I don't think anyone was trying to belittle your contribution. We've all had something to say on the subject, starting with Raistmer's thread on Beta (http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=1443) and Alexander's batch file. Each contributor has drawn on and extended the work which has gone before.
So here's a suggestion for the next phase. Make the VLAR autokill threshold a variable, and read it in from an XML file somewhere [I used to love .INI files when I was programming, but I digress]. That way, those of us who like living dangerously can test out ever-lower ARs, while the default value can stay at the nice safe (I would say over-cautious) 0.14 level. And we don't have to ask Crunch3r to compile a new version every day with the new daily threshold hard-coded into the source....
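The suggestion above could look roughly like this. It is only a sketch in Python, not the code of the actual build: the file name `vlar_config.xml`, the `<vlar_threshold>` tag and both function names are hypothetical. The one thing taken from the thread is the 0.14 default.

```python
import xml.etree.ElementTree as ET

DEFAULT_VLAR_THRESHOLD = 0.14  # the "nice safe" default mentioned in the thread


def load_vlar_threshold(path, default=DEFAULT_VLAR_THRESHOLD):
    """Read the threshold from an XML file such as
    <config><vlar_threshold>0.05</vlar_threshold></config>.
    Falls back to the default if the file or the value is missing or invalid."""
    try:
        root = ET.parse(path).getroot()
    except (OSError, ET.ParseError):
        return default
    text = root.findtext("vlar_threshold")  # hypothetical tag name
    try:
        return float(text)
    except (TypeError, ValueError):
        return default


def should_abort(true_angle_range, threshold):
    """Return True if the task counts as a VLAR under the given threshold."""
    return true_angle_range < threshold


if __name__ == "__main__":
    threshold = load_vlar_threshold("vlar_config.xml")  # hypothetical file name
    print(threshold, should_abort(0.0112, threshold))
```

Falling back to the default on any read or parse problem keeps the cautious behaviour for everyone who never creates the file, while testers can lower the value without a rebuild.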
|
Maik
Guest
|
Quote: "To be fair, I don't think anyone was trying to belittle your contribution. We've all had something to say on the subject, starting with Raistmer's thread on Beta (http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=1443) and Alexander's batch file. Each contributor has drawn on and extended the work which has gone before."
So thanks to Alexander too.
Quote: "So here's a suggestion for the next phase. Make the VLAR autokill threshold a variable, and read it in from an XML file somewhere..."
There is no way in my script to terminate specified WUs (and I will not add it). This script just watches the process (like you could do via Windows Task Manager) and terminates the process if it is idle for 60 sec. The BM restarts the same WU after this happens. You could do that yourself by marking the task in WTM and terminating it. The idea of my script is to do that work after you have filtered your cache and have enough work, so you can do other things than watching the WTM. With non-VLAR tasks that is working fine for me. After a terminate like this the task runs to 100% and in stderr you will see 'Restart at xx%'. So there is no obstacle to crunching VLARs. Just don't delete it if the BM gets new tasks. But there is no warranty that it works if you use my script and crunch VLARs. If you get a driver crash, all following WUs will maybe fail.
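The script itself is not shown in this thread, so the following is only a sketch of the watchdog logic described above, written in Python. It assumes that "idle" means the process's accumulated CPU time did not change between samples; the class name is made up, and both the sampling of CPU time and the actual termination are left to the caller.

```python
class IdleWatchdog:
    """Tracks CPU-time samples of one process and reports when it has
    made no progress for idle_limit_s seconds (60 s in the description)."""

    def __init__(self, idle_limit_s=60.0):
        self.idle_limit_s = idle_limit_s
        self._last_cpu_time = None      # CPU time seen at the last change
        self._last_progress_at = None   # timestamp of the last change

    def update(self, now_s, cpu_time_s):
        """Feed one sample; return True if the process should be terminated."""
        if self._last_cpu_time is None or cpu_time_s != self._last_cpu_time:
            # First sample, or the process did some work: restart the idle timer.
            self._last_cpu_time = cpu_time_s
            self._last_progress_at = now_s
            return False
        return now_s - self._last_progress_at >= self.idle_limit_s


if __name__ == "__main__":
    watchdog = IdleWatchdog()
    # Made-up samples: (seconds since start, accumulated CPU seconds).
    for now_s, cpu_s in [(0, 1.0), (30, 1.0), (60, 1.0)]:
        print(now_s, watchdog.update(now_s, cpu_s))
```

After a termination the BM restarts the task on its own, as described above, so the caller only needs to kill the process and keep sampling.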
« Last Edit: 13 Jan 2009, 07:49:59 pm by Maik »
|
Raistmer
|
Quote: "...like living dangerously can test out ever-lower ARs, while the default value can stay at the nice safe (I would say over-cautious) 0.14 level. And we don't have to ask Crunch3r to compile a new version every day with the new daily threshold hard-coded into the source...."
Hehe, actually, it's my build again, no problem to rebuild it (just for the sake of total and complete correctness). The idea of a variable threshold is good, but I personally have hope and trust that this annoying VLAR issue will be solved very soon. So I actually don't want to put any more effort into this; many different tasks are on hold...
|
Maik
Guest
|
BTW, I was testing the v5 400basic mod on a VLAR: 0.0112. No crash, but laggy like hell, and it took 25 min to do 1%. I aborted it. I think my card is still too slow for VLARs, so I'll try the v5a. Thanks for this.
|
Maik
Guest
|
Quote: "Current bug fixes fight mostly with different overflows. Actually, they should eliminate overflows altogether. So, please, report any overflow you get if it is not from a VLAR and not from a task that was run after a driver crash without an OS reboot."
- 16no08ab.25078.13571.14.8.1_1
- WU true angle range is : 2.579270
- SETI@Home Informational message -9 result_overflow
- Spike count: 30
- wingman: in progress
- 16no08ab.25078.13571.14.8.0_0
- WU true angle range is : 2.579270
- SETI@Home Informational message -9 result_overflow
- Spike count: 27
- Pulse count: 2
- Triplet count: 1
- wingman: CPU - same result
v5a, driver 8120, no driver crash, next WUs in the list running fine.
Edit: 2 more -9 overflows ... did a reboot ... next overflow ... the 3 new ones all with Spike count: 30. I still have 11 more WUs from this series in the task list, let's see ...
Summary: 5 WUs now, all with AR: 2.579270, all from the 16no08ab.25078.13571.14.8.x_x series. It seems that this series picked up the ISS or something like that. All wingmen who have returned results so far have the same results ...
« Last Edit: 13 Jan 2009, 10:31:40 pm by Maik »
|
Raistmer
|
OK, I formulated it badly. Rephrase: report all CUDA MB overflows that didn't pass validation against a CPU wingman, because there are "usual" overflows and "non-usual" ones. Surely we are interested only in the "non-usual" ones at this moment.
|
Archangel999
Guest
|
I use the new app and it works better than the 396. It is more laggy, but it crunches really fast. I have an 8800 GTX and use the new VBScript; all is working fine.
|
Raistmer
|
There is a known "feature" of BOINC 6.4.5 that now works more like a bug: it sets the process priority for the CUDA application to "Normal" instead of "Idle". It helps nothing, because the "Idle" priority of the worker thread has the same value under both the "Normal" and "Idle" process priorities (the so-called priority class), so the thread priority should be adjusted, not the process priority (not the priority class). Setting the process priority to "Normal" excessively increases the control thread's priority, which causes sluggish OS behavior.
So, at Don's (Geek@Play) request, I did another cleanup for BOINC: the modded CUDA MB will now adjust the priority class as well. I hope it will give smoother execution. V5b attached.
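The point about priority classes can be checked against the Windows base priority table from Microsoft's "Scheduling Priorities" documentation. Below is a small Python lookup over the part of that table that matters here; it is only an illustration, not code from the modded app. That the control thread runs at THREAD_PRIORITY_NORMAL is my assumption and is not stated in the post.

```python
# Windows base priority for a thread, by process priority class and
# thread priority level (values from the "Scheduling Priorities" table).
BASE_PRIORITY = {
    "IDLE_PRIORITY_CLASS": {
        "THREAD_PRIORITY_IDLE": 1,
        "THREAD_PRIORITY_LOWEST": 2,
        "THREAD_PRIORITY_BELOW_NORMAL": 3,
        "THREAD_PRIORITY_NORMAL": 4,
        "THREAD_PRIORITY_ABOVE_NORMAL": 5,
        "THREAD_PRIORITY_HIGHEST": 6,
        "THREAD_PRIORITY_TIME_CRITICAL": 15,
    },
    "NORMAL_PRIORITY_CLASS": {
        "THREAD_PRIORITY_IDLE": 1,
        "THREAD_PRIORITY_LOWEST": 6,
        "THREAD_PRIORITY_BELOW_NORMAL": 7,
        "THREAD_PRIORITY_NORMAL": 8,
        "THREAD_PRIORITY_ABOVE_NORMAL": 9,
        "THREAD_PRIORITY_HIGHEST": 10,
        "THREAD_PRIORITY_TIME_CRITICAL": 15,
    },
}


def base_priority(priority_class, thread_priority):
    """Look up the base priority the scheduler uses for a thread."""
    return BASE_PRIORITY[priority_class][thread_priority]


if __name__ == "__main__":
    for cls in ("IDLE_PRIORITY_CLASS", "NORMAL_PRIORITY_CLASS"):
        worker = base_priority(cls, "THREAD_PRIORITY_IDLE")
        control = base_priority(cls, "THREAD_PRIORITY_NORMAL")  # assumed level
        print(f"{cls}: worker thread = {worker}, control thread = {control}")
```

The worker thread at the idle level gets base priority 1 under both classes, while a thread at the normal level goes from 4 to 8 when the class changes from "Idle" to "Normal", which fits the sluggishness described above.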