|
|
Pages: 1 2 [3] 4
|
 |
|
Author
|
Topic: 2.4V updated apps. (Read 7143 times)
|
|
Sutaru Tsureku
|
According to my own tests, SSSE3 64-bit under a 64-bit OS is the best one for such a CPU. So right now KWSN_2.4V_SSSE3_MB.exe (from the 2.4V_Windows_x64_SSSE3 archive) is probably the leader.
That's what I'm telling people all day long. However, I do see a possibility to gain another 10 to at most 15% in performance, but ONLY for the 64-bit app. Anyhow, we need to get a common base (the 2.4V changes) for ALL apps, that's Linux, Windows and UNIX, before we can start figuring out how to get some more performance.
So if I have the QX6700 with WinVista Home Basic 64-bit, do I get the best performance with the SSSE3 32-bit app now?
BTW, I saw that the opt. app has a lower 'Claimed credit' than the stock app. Does this happen 'only' sometimes, with this particular AR? The difference is only -0.02, but... (The opt. app is from 08/26/2007.)
_____________________________________________________
<core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_enhanced 5.27 DevC++/MinGW
Work Unit Info:
...............
WU true angle range is : 1.393579
Optimal function choices:
-----------------------------------------------------
name
-----------------------------------------------------
v_BaseLineSmooth (no other)
v_vGetPowerSpectrumUnrolled 0.00013 0.00000
sse1_ChirpData_ak 0.01417 0.00000
v_vTranspose4 0.00449 0.00000
AK SSE folding 0.00083 0.00000
Flopcounter: 5876485106912.311500
Spike count: 1
Pulse count: 0
Triplet count: 2
Gaussian count: 0
</stderr_txt>
]]>
Validate state: Initial
Claimed credit: 19.4006742531251
_____________________________________________________
_____________________________________________________
<core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
Optimized SETI@Home Enhanced application
Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
Version: Windows SSSE3 32-bit based on S@H V5.15 'Noo? No - Ni!'
Revision: R-2.4v|xT|FFT:IPP_SSSE3|Ben-Joe
CPUID: Intel(R) Core(TM)2 Quad CPU @ 2.66GHz
Speed: 4 x 3143 MHz
Cache: L1=64K L2=4096K
Features: MMX SSE SSE2 SSE3 SSSE3
Work Unit Info
True angle range: 1.393579
Spikes Pulses Triplets Gaussians Flops
1 0 2 0 5875824229395
</stderr_txt>
]]>
Validate state: Initial
Claimed credit: 19.3820590900193
_____________________________________________________
|
|
|
|
« Last Edit: 04 Sep 2007, 01:01:56 am by Sutaru Tsureku »
|
Logged
|
GREETINGS !  THEY'RE THERE OUTSIDE ! 
|
|
|
|
Josef W. Segur
|
... BTW, I saw that the opt. app has a lower 'Claimed credit' than the stock app. Does this happen 'only' sometimes, with this particular AR? The difference is only -0.02, but...
Some of the alternative routines which are checked for performance just after startup have flop counting embedded. The stock app uses a different and longer-lasting routine to test which routines are optimal, so it accrues more flops due to testing.
If the angle range were within the limits of about 0.226 to 1.12 for Gaussian fitting, then two WUs with the same angle range but different data could have larger credit differences, because each Gaussian test starts with a precheck which can get out quickly if the data has too little range to possibly find a Gaussian. When it takes that early exit, fewer flops are counted for the test.
Joe
|
|
|
|
|
Logged
|
|
|
|
|
Raistmer
|
So if I have the QX6700 with WinVista Home Basic 64-bit, do I get the best performance with the SSSE3 32-bit app now?
Under Win2003 it's the 64-bit one (on a Core 2 class CPU). Probably the same for 64-bit Vista.
|
|
|
|
|
Logged
|
|
|
|
|
Josef W. Segur
|
... If there were a way to test the same app on 2 cores or 4 cores simultaneously, I wouldn't mind knowing if it can be done and trying it. Would it be a hard thing to modify the knabench script to do it, or really just not worth the bother?
It might be possible to modify knabench that way, but certainly difficult. There is a way to do realistic testing, though. It requires a cache of work, but none which might cause going into EDF during the test.
1. Turn off Network activity in BOINC, then shut it down.
2. Make another folder, say BOINCTEST.
3. Copy everything from the BOINC folder and its subdirectories to BOINCTEST.
4. Install the application you want to test in the project folder below BOINCTEST.
5. Start a timer and the BOINC Manager in BOINCTEST.
6. Run for say two hours, then save all messages from BOINC Manager and shut down. Make a copy of client_state.xml; that and the saved messages are the test results.
7. To test another app, wipe out all the contents of BOINCTEST and go back to step 3.
This should be possible on any platform with minor modifications. I wouldn't recommend comparing more than two apps this way; it does require going through the messages and/or client_state.xml files and checking time differences, contents of stderr reports, etc. But it's about as realistic as testing can be, each test using identical WUs starting at the same points.
Joe
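The folder handling in steps 2, 3, 4 and 7 above can be sketched as a small Python script. This is only an illustration under stated assumptions: the function names are hypothetical, the paths are whatever your own BOINC and BOINCTEST folders are, and BOINC must already be shut down with Network activity turned off (step 1) before anything is copied.

```python
import shutil
from pathlib import Path


def clone_boinc_folder(source: Path, target: Path) -> Path:
    """Steps 2, 3 and 7: wipe any old test folder, then copy BOINC into it."""
    if target.exists():
        # Step 7: every test starts from a fresh copy of the same state.
        shutil.rmtree(target)
    # Step 3: copytree copies the folder and all of its subdirectories.
    shutil.copytree(source, target)
    return target


def install_test_app(app_file: Path, project_dir: Path) -> Path:
    """Step 4: put the application under test into the project folder."""
    project_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(app_file, project_dir))
```

Starting the timer and the BOINC Manager (step 5) and saving the messages and client_state.xml (step 6) stay manual in this sketch.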
|
|
|
|
|
Logged
|
|
|
|
|
Raistmer
|
Well, this approach assumes the use of "normal" full-length WUs. Really realistic, but at least one WU per core should be completed during the test, because the % of work done does not change perfectly linearly during WU calculation, right? This can take more than 2 hours on slower CPUs.
Does CPU time for WUs with the same AR spread so widely that a statistical approach is not possible? And how does the CPU time logged on the web page correspond to the real time spent on a WU (assuming the app runs 100% of the time)? Are any CPU-time corrections performed?
It might be possible to modify knabench that way, but certainly difficult.
All we need is some utility that starts the prescribed app in the prescribed quantity, sets affinity for each child process (an optional step? do the latest BOINC versions do this?), waits for all children to exit, and then exits. Such a utility may then be used instead of the optimized app in knabench, right? This approach will test the "worst case" of simultaneous calculation: the time for completion of all work on all cores.
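A minimal sketch of such a launcher in Python, assuming nothing about knabench itself: `run_instances` is a hypothetical name, the command line is whatever app you want to test, and the optional affinity step uses `os.sched_setaffinity`, which exists only on Linux (on Windows the sketch simply skips pinning).

```python
import os
import subprocess
import sys
import time


def run_instances(command, count, pin_to_cores=False):
    """Start `count` copies of `command`, wait for all of them to exit.

    Returns (elapsed wall-clock seconds, list of exit codes). The elapsed
    time is the "worst case": it ends when the last child has finished.
    """
    can_pin = pin_to_cores and hasattr(os, "sched_setaffinity")
    # Only use cores this process is actually allowed to run on.
    allowed = sorted(os.sched_getaffinity(0)) if can_pin else []
    start = time.perf_counter()
    children = []
    for index in range(count):
        child = subprocess.Popen(command)
        if can_pin:
            # Optional step: pin each child to one core, round-robin.
            os.sched_setaffinity(child.pid, {allowed[index % len(allowed)]})
        children.append(child)
    exit_codes = [child.wait() for child in children]
    return time.perf_counter() - start, exit_codes
```

For example, `run_instances([sys.executable, "-c", "pass"], count=4)` starts four trivial stand-in processes. A real benchmark would also need a separate working directory per instance, which this sketch leaves out.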
|
|
|
|
« Last Edit: 04 Sep 2007, 02:31:43 am by Raistmer »
|
Logged
|
|
|
|
|
msattler
|
... If there were a way to test the same app on 2 cores or 4 cores simultaneously, I wouldn't mind knowing if it can be done and trying it. Would it be a hard thing to modify the knabench script to do it, or really just not worth the bother?
It might be possible to modify knabench that way, but certainly difficult. There is a way to do realistic testing, though. It requires a cache of work, but none which might cause going into EDF during the test.
1. Turn off Network activity in BOINC, then shut it down.
2. Make another folder, say BOINCTEST.
3. Copy everything from the BOINC folder and its subdirectories to BOINCTEST.
4. Install the application you want to test in the project folder below BOINCTEST.
5. Start a timer and the BOINC Manager in BOINCTEST.
6. Run for say two hours, then save all messages from BOINC Manager and shut down. Make a copy of client_state.xml; that and the saved messages are the test results.
7. To test another app, wipe out all the contents of BOINCTEST and go back to step 3.
This should be possible on any platform with minor modifications. I wouldn't recommend comparing more than two apps this way; it does require going through the messages and/or client_state.xml files and checking time differences, contents of stderr reports, etc. But it's about as realistic as testing can be, each test using identical WUs starting at the same points.
Joe
Thanks Joe! You've given me some food for thought there. As you mentioned earlier, it may be very time-consuming to play with, but you've got my curiosity going now. As the holiday is over and I have to go back to work today, it'll have to wait until perhaps this weekend, but I may experiment with your approach.
|
|
|
|
|
Logged
|
|
|
|
|
Josef W. Segur
|
Well, this approach assumes the use of "normal" full-length WUs. Really realistic, but at least one WU per core should be completed during the test, because the % of work done does not change perfectly linearly during WU calculation, right? This can take more than 2 hours on slower CPUs.
Although the progress isn't perfectly linear, it is monotonic (never goes backward) and is close enough to linear to remain useful. I don't think the method can provide a precise speed comparison in any case, but it should clearly indicate which of two apps is faster on whatever mix of work is present. Completing WUs for each core would give result files which could be compared, but my presumption was that this sort of extended testing would only be used for apps already known to produce correct results.
Does CPU time for WUs with the same AR spread so widely that a statistical approach is not possible?
Contention can cause something like 30% CPU time differences; the data in WUs with equal angle range probably causes no more than 2%.
And how does the CPU time logged on the web page correspond to the real time spent on a WU (assuming the app runs 100% of the time)? Are any CPU-time corrections performed?
IIRC, BOINC doesn't start the CPU time when it launches the app, but rather when the app initiates its BOINC interface. After that, CPU time accrues as accurately as the OS allows. On my Win2k Pentium-M system, Windows Task Manager shows about 2.5 seconds more CPU time for the current SETI task than BOINC Manager does. Most of that difference is probably delay in the BOINC Manager getting the data from the core client and displaying it.
Joe
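For anyone who would rather not read progress and CPU time out of the saved client_state.xml by hand, a rough Python sketch follows. The element names used here (`active_task`, `result_name`, `fraction_done`, `current_cpu_time`) are assumptions about the file layout, and the embedded sample is made up for illustration, so check both against your own file before relying on it.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the element names are assumed, not taken from this thread.
SAMPLE = """<client_state>
  <active_task_set>
    <active_task>
      <result_name>wu_a_0</result_name>
      <fraction_done>0.50</fraction_done>
      <current_cpu_time>3600.0</current_cpu_time>
    </active_task>
    <active_task>
      <result_name>wu_b_0</result_name>
      <fraction_done>0.95</fraction_done>
      <current_cpu_time>7000.0</current_cpu_time>
    </active_task>
  </active_task_set>
</client_state>"""


def task_progress(xml_text):
    """Map each running result name to (fraction done, CPU seconds)."""
    root = ET.fromstring(xml_text)
    progress = {}
    for task in root.iter("active_task"):
        name = task.findtext("result_name")
        done = float(task.findtext("fraction_done"))
        cpu_seconds = float(task.findtext("current_cpu_time"))
        progress[name] = (done, cpu_seconds)
    return progress
```

Running `task_progress` on the snapshot saved after each test gives two tables that can be compared WU by WU.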
|
|
|
|
|
Logged
|
|
|
|
|
msattler
|
Well Joe, my thoughts were somewhere along the lines of cloning the WUs, so that you had 4 copies of the same WU (to test on a quad) and could get 4 instances of the same WU to run at the same time.
|
|
|
|
|
Logged
|
|
|
|
|
Raistmer
|
Thank you very much for the detailed answer! You are right, there is no need for a linear percentage to choose the faster/slower case when all % are bigger or all % are smaller. I imagined a case in which, let's say, WU-1 got to 50% and WU-2 to 95%, and with the second app WU-1 got to 52% and WU-2 to 90%. In that case we can't just sum up the nonlinear %. But I don't know whether such a situation will arise in real testing or not (BTW, completing the full WU doesn't help anyway, you are right).
Only one refinement: is the maximum CPU time for a WU the same time that is logged with the result on the project web page? No artificial time correction (some multiplier or so)? As I remember, there was a time when some optimized app adjusted the logged CPU time to achieve correct credit allocation; my question arose from that case.
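The distinction between the clear-cut case and the mixed case can be written down as a short Python sketch. The function name is hypothetical and the numbers are the ones from the example in this post; the rule itself (declare a winner only when one app is ahead on every WU) is an illustration, not an established procedure.

```python
def compare_progress(run_a, run_b):
    """Return 'A', 'B', 'tie' or 'mixed' for two {wu_name: percent_done} runs."""
    if run_a.keys() != run_b.keys():
        raise ValueError("both runs must cover the same work units")
    a_ahead = any(run_a[wu] > run_b[wu] for wu in run_a)
    b_ahead = any(run_b[wu] > run_a[wu] for wu in run_a)
    if a_ahead and b_ahead:
        # Each app leads on some WU: nonlinear percentages cannot be summed.
        return "mixed"
    if a_ahead:
        return "A"
    if b_ahead:
        return "B"
    return "tie"


first_app = {"WU-1": 50.0, "WU-2": 95.0}
second_app = {"WU-1": 52.0, "WU-2": 90.0}
print(compare_progress(first_app, second_app))  # prints "mixed"
```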
|
|
|
|
|
Logged
|
|
|
|
|
Josef W. Segur
|
Well Joe, my thoughts were somewhere along the lines of cloning the WUs, so that you had 4 copies of the same WU (to test on a quad) and could get 4 instances of the same WU to run at the same time.
That's probably possible by naming the cloned WUs with existing queued WU names and suspending other WUs so only those run. It may cause maximum contention, having all 4 cores trying to do exactly the same things at the same time. OTOH, initial contention might get the 4 instances an ideal amount out of phase so they'd perform very well.
Joe
|
|
|
|
|
Logged
|
|
|
|
|
Josef W. Segur
|
... Only one refinement: is the maximum CPU time for a WU the same time that is logged with the result on the project web page? No artificial time correction (some multiplier or so)? As I remember, there was a time when some optimized app adjusted the logged CPU time to achieve correct credit allocation; my question arose from that case.
Trux's optimized BOINC core client "calibration" feature adjusted both reported CPU time and BOINC benchmarks. It was a well-intentioned attempt to correct the logical flaw in the old method of generating credit claims. Our apps certainly don't make any time adjustments; total CPU time for a day of running will be very close to 24 hours times the number of CPUs in the host.
Joe
|
|
|
|
|
Logged
|
|
|
|
|
Vyper
|
One idea for this is to update Knabench to have a separate Multithread folder where the temporary files can be created and where one or more specifically chosen WUs lie. A little program is called to see how many threads the CPU can run in parallel, and it then creates the directories cpu1, cpu2, cpu3 and cpu4, for instance. Then you could create a call procedure that executes multiple apps, one calculating in each thread, and waits for the last one to return. Perhaps you could even make a call routine that executes on CPU/thread X (affinity). If this could be accomplished, we would soon see which app is the best compile for use in parallel execution. These are thoughts and nothing but thoughts.
There is an app called Wprime where you can enter how many threads it is going to start, and a DOS window appears that takes care of this: http://www.wprime.net
Kind Regards
Vyper
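The directory setup described here can be sketched in Python. This is only an illustration: `prepare_cpu_dirs` is a hypothetical helper, not part of Knabench, and it uses `os.cpu_count()` as a stand-in for the "little program" that detects how many threads the CPU can run in parallel.

```python
import os
import shutil
from pathlib import Path


def prepare_cpu_dirs(base_dir, work_unit, thread_count=None):
    """Create cpu1..cpuN under base_dir and put a copy of the WU in each."""
    # Fall back to the detected logical CPU count, or 1 if it is unknown.
    count = thread_count or os.cpu_count() or 1
    cpu_dirs = []
    for index in range(1, count + 1):
        cpu_dir = Path(base_dir) / f"cpu{index}"
        cpu_dir.mkdir(parents=True, exist_ok=True)
        # Every instance gets its own identical copy of the chosen WU.
        shutil.copy2(work_unit, cpu_dir / Path(work_unit).name)
        cpu_dirs.append(cpu_dir)
    return cpu_dirs
```

Each returned directory could then serve as the working directory for one instance of the app under test.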
|
|
|
|
|
Logged
|
|
|
|
|
H Elzinga
|
Howdy, there are new apps ready for download, both Windows x32 and x64 incl. GFX enabled ones; ALL are new. You can see there's been a little change in the name tag as well: 2.4v ---> 2.4V is the new one. There will be a credit multiplier shown in the log file (stderr.txt). Those apps are compatible with a soon-to-be-released 5.28 stock application that reads the credit multiplier from the workunit header.
DOWNLOAD ---> http://calbe.dw70.de/seti.html
EDIT: Make sure you have a look at the app_info.xml first! There might be typos in there. So to make sure all will work, have a look for yourself.
HTH
Crunch3r
Are there plans to release a new automatic installer / test and benchmark tool, or should I just download the same app as the 2.2 version currently running and assume this is again the fastest for my setup?
|
|
|
|
|
Logged
|
|
|
|
|
Josef W. Segur
|
Are there plans to release a new automatic installer / test and benchmark tool, or should I just download the same app as the 2.2 version currently running and assume this is again the fastest for my setup?
Installing the 2.4V equivalents to the 2.2B versions you were using is the best approach for now. There may eventually be an automatic install / test, but not soon. Joe
|
|
|
|
|
Logged
|
|
|
|
|
H Elzinga
|
Will give it a try today. Thanks.
|
|
|
|
|
Logged
|
|
|
|
|