Seti@Home optimized science apps and information

Topic: Recent Driver Cuda-safe Project List

Jason G
Recent Driver Cuda-safe Project List
« on: 04 Sep 2011, 05:00:19 am »

Hi All,

Sticky Downclocks?

It's come to my attention that there are Cuda-enabled Boinc projects around that still cause 'sticky downclocks' on newer drivers by terminating processes in the middle of kernel execution (non-threadsafe behaviour, see the technical background below) during snooze, exit and/or error-out scenarios.

I'll start a list here indicating what is 'safe' for Cuda 270.xx+ drivers, & what isn't:

Avoid killing Cuda or OpenCL applications via Task Manager unless you really need to. If you do, expect to need a reboot on most newer cards (with newer drivers), even with applications that use a threadsafe exit strategy.


Project Name     Stock App Threadsafe Exit?         3rd-Party App Threadsafe Exit?
-------------    --------------------------         ------------------------------
Collatz          No                                 N/A
Einstein@Home    No (reportedly being updated)      ?
GPUGrid          No (fall/autumn update)            ?
PrimeGrid        No                                 ?
Seti@Home        No (version updates in progress)   Yes (later Lunatics builds)

Last updated: 4th September, 2011


Technical Background: The Cuda & OpenCL compute languages, and the underlying infrastructure they depend on (including drivers & hardware), have moved to heavy use of 'asynchronous execution' constructs. The Windows API TerminateProcess() function, as used by the outdated BoincApi for snooze/exit (and when killing a process via Task Manager or similar), immediately halts execution & frees the process's resources. Too bad if the graphics hardware is in the middle of writing to a memory area asynchronously and the buffer is freed from under it... These errors tend to force drivers into a failsafe mode that, at this time, only a reboot can remedy. Future drivers may be hardened somewhat against some forms of abuse, but it seems unlikely this particular application-induced situation can easily be prevented by any means other than making the applications behave in a more threadsafe fashion.
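A minimal sketch of the difference, written in Python rather than Cuda C so it runs without a GPU: the worker below honours a snooze/exit request only between chunks of work, and the controlling code waits for it before freeing the buffer. A TerminateProcess()-style kill skips that wait and frees everything mid-chunk. The chunk loop is a hypothetical stand-in for asynchronous kernel launches, the class and method names are made up for illustration, and the comments mark where a real Cuda application would wait on the device (e.g. cudaDeviceSynchronize()).

```python
import threading


class ScienceWorker:
    """Hypothetical worker: each chunk stands in for a batch of asynchronous
    GPU kernel launches. A snooze/exit request is honoured only *between*
    chunks, never in the middle of one."""

    def __init__(self, n_chunks: int, buffer_len: int = 1024) -> None:
        self.n_chunks = n_chunks
        self.buffer = [0.0] * buffer_len  # stands in for a device buffer
        self.chunks_done = 0
        self.quit = threading.Event()     # set by whoever handles snooze/exit
        self._thread = threading.Thread(target=self._run, name="science-worker")

    def _process_chunk(self) -> None:
        # A Cuda app would queue kernel work writing into the buffer here,
        # then wait for it (e.g. cudaDeviceSynchronize()) so nothing is
        # still in flight when the loop next checks self.quit.
        for i in range(len(self.buffer)):
            self.buffer[i] += 1.0

    def _run(self) -> None:
        while not self.quit.is_set() and self.chunks_done < self.n_chunks:
            self._process_chunk()
            self.chunks_done += 1

    def start(self) -> None:
        self._thread.start()

    def request_stop(self) -> None:
        """Ask, don't kill: the chunk in flight is allowed to finish."""
        self.quit.set()

    def wait(self) -> None:
        self._thread.join()

    def shutdown(self) -> int:
        """Threadsafe exit: request the stop, wait for the worker, and only
        then release the buffer. Returns the number of chunks completed."""
        self.request_stop()
        self.wait()
        self.buffer = None
        return self.chunks_done
```

Whatever handles the snooze or exit request calls shutdown() instead of terminating the thread or process, so the buffer is never released while a chunk is still writing to it. Because the stop flag is only checked between chunks, every element of the buffer always reflects a whole number of completed chunks.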

Jason

Update: 18th November 2011.
How to determine whether your Cuda-enabled Boinc project's science application might be using non-threadsafe exit code that can induce 'sticky downclocks' on any kind of snooze/completion/exit:
- Ensure a task for the project is processing 'normally'.
- Suspend all inactive tasks (so new ones won't start).
- Monitor the GPU clock rate in something like GPU-Z, nVidia Inspector or similar.
- Repeatedly snooze and then resume the running task. A threadsafe snooze shutdown should always return to full, normal clocks when resumed.
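If you note down (or log) the core clock once the task is running again after each resume, the pass/fail rule in the last step can be written as a tiny helper. This is a hypothetical sketch: the function name, the one-reading-per-cycle input and the default 25 MHz tolerance are assumptions, not something GPU-Z or nVidia Inspector provides.

```python
from typing import Optional, Sequence


def first_sticky_cycle(
    post_resume_mhz: Sequence[float],
    full_mhz: float,
    tolerance_mhz: float = 25.0,
) -> Optional[int]:
    """Return the 1-based snooze/resume cycle after which the GPU clock did
    NOT come back to full speed, or None if every cycle recovered.

    post_resume_mhz: one core-clock reading per cycle, taken once the task
                     is running again (e.g. read off GPU-Z or nVidia Inspector).
    full_mhz:        the card's normal clock while crunching.
    tolerance_mhz:   assumed allowance for small reading fluctuations.
    """
    threshold = full_mhz - tolerance_mhz
    for cycle, mhz in enumerate(post_resume_mhz, start=1):
        if mhz < threshold:
            return cycle  # clocks stayed down: likely a non-threadsafe exit
    return None
```

For example, with illustrative readings of 810, 810 and 405 MHz against a full clock of 810 MHz, first_sticky_cycle([810, 810, 405], full_mhz=810) returns 3, meaning the third snooze left the card stuck at a reduced clock.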

Update: March 2nd 2012
Unfortunately, the message about sticky downclocks caused by non-threadsafe application exit behaviour apparently hasn't been reaching users. The Cuda 4.1 Toolkit release notes contain a fairly concise description of the issue, aimed at developers and relating directly to later drivers & proper application termination:
Quote
* The CUDA driver creates worker threads on all platforms, and this can cause issues at process cleanup in some multithreaded applications on all supported operating systems.
On Linux, for example, if an application spawns multiple host pthreads, calls into CUDART, and then exits all user-spawned threads with pthread_exit(), the process may never terminate. Driver threads will not automatically exit once the user's threads have gone down.
The proper solution is to either:
(1) call cudaDeviceReset()* on all used devices before termination of host threads, or,
(2) trigger process termination directly (i.e., with exit()) rather than relying on the process to die after only user-spawned threads have been individually exited.
*note that the Cuda 3.2 equivalent of cudaDeviceReset() is cudaThreadExit()
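For a host program with one thread per GPU, the two options above can be sketched as follows, again in Python with stand-ins for the Cuda calls: each host thread resets its own device before it ends (option 1), and the main thread joins all of them instead of relying on threads exiting individually, so the caller can then end the process in one go (option 2). The one-thread-per-device layout and every name here are assumptions. In a real Cuda application reset_device(d) would be cudaSetDevice(d) followed by cudaDeviceReset() (cudaThreadExit() under Cuda 3.2); resetting from several threads that share one device would destroy the state the other threads are still using.

```python
import threading
from typing import Callable, List, Sequence, Tuple


def run_per_device_threads(
    devices: Sequence[int],
    do_work: Callable[[int], None],
    reset_device: Callable[[int], None],
) -> List[Tuple[str, int]]:
    """One host thread per GPU; each resets its device before the thread ends.

    do_work and reset_device are hypothetical stand-ins for the science code
    and for cudaSetDevice(d) + cudaDeviceReset(). Returns the logged steps so
    the ordering can be checked.
    """
    steps: List[Tuple[str, int]] = []
    lock = threading.Lock()

    def log(event: str, dev: int) -> None:
        with lock:
            steps.append((event, dev))

    def host_thread(dev: int) -> None:
        try:
            do_work(dev)
            log("work done", dev)
        finally:
            # Option (1): release the device while this host thread is
            # still alive, even if the work errored out.
            reset_device(dev)
            log("reset", dev)

    threads = [threading.Thread(target=host_thread, args=(d,)) for d in devices]
    for t in threads:
        t.start()
    # Join every thread rather than letting them exit individually and
    # hoping the process follows; the caller then ends the process itself.
    for t in threads:
        t.join()
    return steps
```

A caller would follow this with sys.exit(status), the Python counterpart of option (2)'s exit() call, so the whole process terminates explicitly.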
« Last Edit: 02 Mar 2012, 02:22:23 am by Jason G »

arkayn
Re: Recent Driver Cuda-safe Project List
« Reply #1 on: 04 Sep 2011, 06:46:52 am »

Collatz                 No                                      N/A

Jason G
Re: Recent Driver Cuda-safe Project List
« Reply #2 on: 04 Sep 2011, 09:41:22 am »

Quote from: arkayn
Collatz                 No                                      N/A
Thanks, Updated.

[Edit:] added GPU Grid as well.
« Last Edit: 04 Sep 2011, 10:05:44 am by Jason G »

Jason G
Re: Recent Driver Cuda-safe Project List
« Reply #3 on: 17 Nov 2011, 10:14:22 am »

Updated first post with the 18th November 2011 instructions for checking whether a Cuda-enabled Boinc project's science application might be using non-threadsafe exit code that can induce 'sticky downclocks'.
« Last Edit: 17 Nov 2011, 10:19:56 am by Jason G »

Richard Haselgrove
Re: Recent Driver Cuda-safe Project List
« Reply #4 on: 17 Nov 2011, 10:22:18 am »

I asked GPUGrid specifically about this in September.

This reply 5 October:

Quote from: GDF
Probably. We have to do it a bit differently, but yes.

Then again, 7 November:

Quote from: GDF
We will probably postpone this new application in 2012 to focus on upgrading the server and the new AMD alpha application.

Edit - their attitude was "our app is compiled using CUDA 3.1, so nobody needs to use CUDA 4.x drivers yet". As we've seen elsewhere, that ignores the part of the userbase who might wish to update drivers to suit other projects, or even for non-BOINC uses of their GPUs.

Edit2  - to be fair, that last comment came from skgiven, who is a moderator/tester - not from GDF, who is a developer/scientist. So it's not necessarily indicative of the developers' attitudes.
« Last Edit: 17 Nov 2011, 10:29:56 am by Richard Haselgrove »

Jason G
Re: Recent Driver Cuda-safe Project List
« Reply #5 on: 17 Nov 2011, 10:45:27 am »

Oh well, no 560s (non-Ti) or upcoming Keplers for them, I guess. I understand the attitude, I really do. The old one works with the flawed code design, so why break it? My answer, of course, is that it is flawed, and so drives away users. My only real fear is that they'll leave it too late.

Jason

« Last Edit: 17 Nov 2011, 11:19:20 am by Jason G »

Richard Haselgrove
Re: Recent Driver Cuda-safe Project List
« Reply #6 on: 17 Nov 2011, 11:18:44 am »

Well, let's face it - the majority of users probably never suffer downclocking, or don't notice it if they do (the machine will boot normally next time they turn it on).

At GPUGrid, the tasks run for hours, and the scientists like them back within 24 hours (each job is cascaded from a previous one, so they like to keep the chain moving). So I doubt many people are switching from project to project.

At PrimeGrid, the other place where the subject has come up recently, even the longer class of tasks only runs for 20 minutes on Juan's 560, so again there should never be any need for preemption.

Jason G
Re: Recent Driver Cuda-safe Project List
« Reply #7 on: 17 Nov 2011, 11:20:52 am »

Here's what it comes down to in the medium-longer term:

The deeper I've looked into the range of problem situations, the more it seems Microsoft, nVidia and all the other members of the Khronos Consortium really got things right for the 'bigger picture' (this time), and are seriously looking to the future.  Unfortunately, IMNSHO,  the GPGPU community has subsisted for so long on sloppy design & bodgy bandaid workarounds that the addition of true heterogeneous processing capability & secure virtualised environment is going to bring many tears along the way.

Richard Haselgrove
Re: Recent Driver Cuda-safe Project List
« Reply #8 on: 17 Nov 2011, 11:40:55 am »

Quote from: Jason G
...  the ... community has subsisted for so long on sloppy design & bodgy bandaid workarounds that the addition of true heterogeneous processing capability & secure virtualised environment is going to bring many tears along the way.

So unlike our own dear BOINC :P

Jason G
Re: Recent Driver Cuda-safe Project List
« Reply #9 on: 17 Nov 2011, 11:44:47 am »

Quote from: Richard Haselgrove
So unlike our own dear BOINC :P

Now, now, stick to the topic please :D

Claggy
Re: Recent Driver Cuda-safe Project List
« Reply #10 on: 17 Nov 2011, 05:38:09 pm »

I've done my first post over at PrimeGrid: Message 43381

Claggy

Jason G
Re: Recent Driver Cuda-safe Project List
« Reply #11 on: 17 Nov 2011, 05:55:09 pm »

Quote from: Claggy
I've done my first post over at PrimeGrid: Message 43381

Cheers, it will be interesting to see if there's any reaction. I'd say the chances that future drivers will revert to not using any DMA engine are zero, so they'll pretty much have to fix it eventually. Hopefully it doesn't get lost in the Ati stuff there, which is a different set of circumstances leading to another whole messy scenario we won't go into in the Cuda safety thread :D
« Last Edit: 17 Nov 2011, 05:59:08 pm by Jason G »

Claggy
WWW
Re: Recent Driver Cuda-safe Project List
« Reply #12 on: 20 Nov 2011, 04:26:50 pm »

I've posted over at Collatz: Cuda threadsafe exit code

Claggy

Jason G
Re: Recent Driver Cuda-safe Project List
« Reply #13 on: 20 Nov 2011, 04:38:45 pm »

If anything, it's going to be an interesting study of user vs. developer goals, and of inertia in technological development. My prediction is that it will take nVidia's Kepler release to kickstart revisions, whereupon neither legacy Cuda nor OpenCL apps will function nicely due to mandatory new driver models. I'll watch it a bit, then pass the observations on to nVidia, just to see if they have a better strategy in mind.

Jason

Claggy
Re: Recent Driver Cuda-safe Project List
« Reply #14 on: 21 Nov 2011, 03:02:07 pm »

Anything to add to Slicker's reply, Jason?

Claggy