Pages: [1] 2 3
Topic: 2.4 Unrecoverable Errors ?? (Read 11135 times)
Gus
This is a brand-new machine, so I don't know whether these unrecoverable errors are coming from the hardware, the software, or something else. It's a Core 2 Quad (Q6600), a stock Dell, right out of the box about 3 hours ago, and I've never run anything on it but KWSN_2.4_SSE3-Core2_MB. It is not overclocked or modified in any way. It's cool in the house, so temperature should not be an issue. The box has shown no errors so far except these.
Or they may just be bad WUs and mean nothing.
I've received two of these errors so far, fairly close together, in about 2 hours of total crunch time. I'll attach a text file with the Messages tab entries around the time of one of the errors, a URL to the two results, and a URL to the computer.
I'm going to bed shortly, so if you have any questions, feel free to email me at obermege2 at comcast dot net and I'll reply in the morning.
EDIT: OK, after watching this for a while, it looks like they may be bad WUs. I've had several more, and their names all begin with 03mr07aa.19893.3344.xx.x.xxx. I'll leave this post up just in case.
« Last Edit: 11 Aug 2007, 07:20:34 am by Gus »
Simon
We'll take a look. Right now, yours is the only result available for those WUs, so it's difficult to compare.
They do look like the kind of errors that can just happen occasionally.
Regards, Simon.
Gus
One has now been returned OK by someone else. Also, the most recent one is not the same vintage as the others. (Ignore the very first one; it was likely caused when I switched to the optimized app.)
It may indeed be a problem with the machine after all. I'll reseat the memory and run some diagnostics. I've since set it to receive no new work. I hate to do that, the way things are running, but I don't want to trash a bunch of WUs.
Can anyone suggest good free diagnostics that really torture the machine?
(Later) I now see that it spat out a whole page of errors, one after another. I just reseated the memory and restarted; we'll see.
« Last Edit: 11 Aug 2007, 07:47:41 am by Gus »
christofire
Prime95 - CPU torture test
MemTest86+ - memory test (runs at boot time)
Gus
Thanks, I have 4 copies of Prime95 running now, I'll do the memtest after they've run a while.
Morten
« Last Edit: 11 Aug 2007, 11:36:22 am by Morten »
Gus
@Morten: Those are indeed the same errors I was getting. They always happened after running for only a few seconds. Mine is running Vista Home Basic on a Q6600. (Darn, I hate Vista, but that's another issue.)
@Simon: I'm beginning to think there may be a problem with the discovery routine the science app runs when a result first starts. What makes me think so is that on two occasions, all four results that were in progress and had been running fine failed when I stopped and restarted BOINC. The beginning is the only time they fail. Once they get going, they run fine unless BOINC Manager is stopped and restarted; then they may or may not fail immediately. ??
Gus
Simon, see "Client error with Chicken on 64bit Vista" in the Number Crunching boards. The author is running the 32-bit client on 64-bit Vista, and someone else reported the same problem with 32-bit Vista. One is running a Core 2 E6600, the other a Core 2 T7400. The second person reports that the 64-bit client on 64-bit Vista works fine.
The problem looks like it may be specific to the Core 2 32-bit app. If so, I'm sorry to be bringing problems, but I'm relieved that my machine may be OK after all. Prime95 has now been running 4 threads cleanly for about 3 hours. I'm going to try the "stock" MB app and see how that goes.
« Last Edit: 11 Aug 2007, 01:38:17 pm by Gus »
Richard Haselgrove
I've also had three (so far) unrecoverable errors with 2.4. I've logged one in the thread Gus just linked, but they took the database down just as I was going to post the other two! However, I disabled network access after the first one, and I've backed up the entire BOINC folder, so I should have the WU / result / state files for the other two.
The first error was on a high-AR linefeed WU; the next two are October-deadline multibeams. I'll post links to the results on the SETI board when the server is back up.
This is on my octo, running 32-bit Vista Business, BOINC v5.10.13, and the Core 2 variant of Chicken 2.4 (for Xeon 5320). It hasn't shown any Windows error messages or other signs of distress, and crunching is continuing on all 8 cores with a mixture of SETI, Einstein and CPDN. It's done about 20 WUs since I upgraded to 2.4 this morning, so these errors are the exception rather than the rule.
Richard Haselgrove
And one more error. I can't upload/report because the servers are still down, but I'll back up the data again before uploading/reporting.
The WUs affected are:
12my00aa.25379.26192.790908.3.204
03mr07ab.4943.6207.6.4.94
03mr07ab.4907.6207.5.4.101
03mr07ab.4544.11115.3.4.8
- I'll post WUID / RID when I can.
MikeK
I've had around ten unrecoverable errors as well since switching to 2.4.
I hadn't had any for a long time before that, so there must be something wrong with 2.4. I'm not able to report them yet; I have to wait until the servers are up again.
Mike
Richard Haselgrove
Four more errors on Core 2 / Vista overnight (and they'll be the last; I'm now dry on that box).
No signs of distress on the SSE3 / server boxes. Unless anyone has a better suggestion, I think I'll switch the Vista box to the SSE3 compile when the servers come back up and do a comparison run that way.
NB, before anyone asks: this is a Dell Precision workstation, running at stock speed with plenty of cooling. And I upgraded to the A05 BIOS after the first reports of Core 2 errata, so I think it's unlikely (and would be highly coincidental) that these errors are in any way platform-related.
Simon
Okay, seeing as problems with 32-bit Windows apps on Vista have been reported, there will be new versions out shortly.
Those were compiled using Visual Studio 2003 (as all apps have been until now), but Vista doesn't like VS 2003 anymore, it seems.
So, new apps compiled with VS 2005 SP1 (with Vista patches) will be uploaded shortly (today, if I can manage).
Regards, Simon.
Devaster
I'm running 32-bit Vista with the 2.4 app and have no problems...
Richard Haselgrove
Thanks, but don't feel you need to rush; get a good night's sleep so you can think clearly! I presume it would be helpful to run the new Core 2 / Vista build when it's available, rather than SSE3. Is there anything else I can do to help track it down? Richard