|
|
Author
|
Topic: optimized sources (Read 44322 times)
|
|
ScanMan
|
Thanks for the heads up on my question.
Regards
ScanMan
|
|
|
|
|
Logged
|
|
|
|
|
_heinz
|
Hi Jason, thanks for compiling my code pieces and generating the asm files with the Intel compiler. After a first look at the asm code, AKFCOMP and FPUCOMP perform well. I found out why my asm output did not work in ORCAS: the configuration was set to Release, but it must be Debug. heinz
|
|
|
|
|
Logged
|
|
|
|
|
_heinz
|
Hi Jason, if you have a little time, try this with the Intel compiler and use the etimer project for measuring. If you need anything, PM me.
------------------------------------------
------ Build started: Project: Optimizer, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
cl /Od /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_AKFCOMP" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "_DEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MTd /Zp16 /Gy /FAs /Fa"Release32-NOGFX\\" /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc90.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\AKfoldSSE.cpp"
AKfoldSSE.cpp
-----IPP-----
-----SSE2/em-----
-----AKFCOMP-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
I had have a look at the asm-file yet.
heinz
|
|
|
|
|
Logged
|
|
|
|
|
Jason G
|
Will have a look at compiling this with 'USE_AKFCOMP' defined soon, and check if I need anything else. [was done & pm'd]
Jason
|
|
|
|
« Last Edit: 21 Nov 2007, 11:43:12 am by j_groothu »
|
Logged
|
|
|
|
|
_heinz
|
Thanks. It still needs a little fine-tuning to run more in parallel. I will PM you when it is done. heinz
|
|
|
|
|
Logged
|
|
|
|
|
_heinz
|
The auto-vectorizer runs
-----------------------------------
------ Build started: Project: Optimizer, Configuration: Release32-NOGFX Win32 ------
Compiling...
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.20404 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
cl /O2 /Ob2 /Oi /Ot /Oy /GT /I "../../../boinc/win_build" /I ".." /I "..\.." /I "..\..\..\boinc\lib" /I "../../../boinc/api" /I "../../db" /I "C:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer" /I "C:\I\INTEL\IPP\5.2_beta\ia32\tools\staticlib" /I "C:\I\INTEL\IPP\5.2_beta\ia32\include" /D "USE_AKFSIMD" /D "USE_IPP" /D "USE_SSE2" /D "WIN32" /D "_WIN32" /D "_WINDOWS" /D "_CONSOLE" /D "_DEBUG" /D "_LIB" /D "_MT" /D "CLIENT" /D "NBOINC_APP_GRAPHICS" /D "_UNICODE" /D "UNICODE" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /GF /FD /EHsc /MTd /Zp16 /arch:SSE2 /fp:fast /FAs /Fa"Release32-NOGFX\\" /Fo"Release32-NOGFX\\" /Fd"Release32-NOGFX\vc90.pdb" /W3 /c /Wp64 /Zi /Gd /TP /FI "win-config.h" ".\AKfoldSSE.cpp"
AKfoldSSE.cpp
-----IPP-----
-----SSE2/em-----
-----AKFSIMD-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
|
|
|
|
|
Logged
|
|
|
|
|
_heinz
|
Working now on a vectorized version of chirpfft.cpp heinz 
|
|
|
|
|
Logged
|
|
|
|
|
Jason G
|
Hi Heinz, did you manage to determine any performance differences between our 'auto vectoriser friendly' folding routine (when compiled under ICC, with the pragma hints / dependency overrides) and hand vectorised code? If you haven't had a chance, I'll be able to take another look in 2 weeks (holidays). Jason
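For readers following along, here is a minimal sketch of what an 'auto vectoriser friendly' loop with a dependency override can look like. It is not the folding routine discussed in this thread, which is not shown here: the function name `fold_add` and its signature are hypothetical. The sketch assumes the Intel compiler's `#pragma ivdep` hint and the `__restrict` pointer qualifier accepted by ICC, MSVC and GCC.

```cpp
#include <cstddef>

// Hypothetical example routine: adds two input rows element-wise into dst.
// __restrict promises the compiler that the three arrays do not overlap,
// which removes the aliasing doubt that often blocks auto-vectorisation.
void fold_add(float* __restrict dst,
              const float* __restrict a,
              const float* __restrict b,
              std::size_t n)
{
#if defined(__INTEL_COMPILER)
    // Intel-specific hint: ignore assumed (unproven) loop-carried dependencies.
#pragma ivdep
#endif
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];   // simple, unit-stride body with no branches
    }
}
```

The pragma is guarded so that other compilers simply see a plain loop; whether the loop is actually vectorised still has to be checked in the compiler's vectorisation report or in the generated asm.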
|
|
|
|
|
Logged
|
|
|
|
|
_heinz
|
Hi Jason, I'll wait with this until you are on holidays. I have realised some nice ideas for eliminating unnecessary code. The autovectorizer runs great. Let yourself be surprised. Have a nice week. Heinz
|
|
|
|
|
Logged
|
|
|
|
|
_heinz
|
While going through the code, fraction_done caught my attention. Before every call to it we find (sometimes not directly before) the following statement ---> progress = std::min( progress, 1.0 );

1. in function do_transpose
   progress = std::min( progress, 1.0 );
   #ifdef BOINC_APP_GRAPHICS
   if ( !nographics() ) {
       if ( gbp ) gbp->rarray.add_source_row( (float *)WorkData );
       sah_graphics->local_progress = ( (( float ) ifft + 1) / NumFfts );
   }
   #endif
   remaining = 1.0 - ( double ) ( icfft + 1 ) / num_cfft;
   fraction_done( progress, remaining );
----------------------------------------------------------------------
2. in function process_data
   progress = std::min( progress, 1.0 );
   #ifdef BOINC_APP_GRAPHICS
   if ( !nographics() ) {
       if ( gbp ) gbp->rarray.add_source_row( (float *)WorkData );
       sah_graphics->local_progress = ( (( float ) ifft + 1) / NumFfts );
   }
   #endif
   remaining = 1.0 - ( double ) ( icfft + 1 ) / num_cfft;
   fraction_done( progress, remaining );
----------------------------------------------------------------------
3. in analyzePoT.cpp line 246
   progress = std::min( progress, 1.0 ); // prevent display of > 100%
   fraction_done( progress, remaining );
----------------------------------------------------------------------
4. in analyzePoT.cpp line 387
   progress = std::min( progress, 1.0 ); // prevent display of > 100%
   fraction_done( progress, remaining );
----------------------------------------------------------------------
Therefore I think that when we call fraction_done( double progress, double remaining ), it is not necessary to calculate progress again inside it ---> progress = std::min( progress, 1.0 ); because we get the same result as before. So we can comment it out. After helping the compiler with some additional variables, we get the following short, hopefully effective code --->
; 75 : prog2 = 1.0 - remaining;
    fld1
    fsub QWORD PTR _remaining$[esp-4]
; 76 : // progress = std::min( progress, 1.0 ); // is alredy done before call fraction_done
; 77 : // prog = progress * ( 1.0 - pow( prog2, PROG_POWER ) ) + prog2 * pow(prog2,PROG_POWER );//original
; 78 : // A = pow( prog2,PROG_POWER );
; 79 : // prog = progress * ( 1.0 - A ) + prog2 * A ;
; 80 : // B = 1.0 - A; C = prog2 * A;
; 81 : // prog = progress * B + C;
; 82 : // D = progress * B;
; 83 : // prog = D + C;
; 84 :
; 85 : A = pow( prog2,PROG_POWER );
    fld QWORD PTR __real@4018000000000000
    call __CIpow
; 86 : B = 1.0 - A; C = prog2 * A;
    fld1
    fsubrp ST(1), ST(0)
; 87 : D = progress * B;
; 88 : prog = D + C;
; 89 : boinc_fraction_done( prog );
    sub esp, 8
    fmul ST(0), ST(0)
    fmul QWORD PTR _progress$[esp+4]
    fadd ST(0), ST(0)
    fstp QWORD PTR [esp]
    call _boinc_fraction_done
    add esp, 8
; 90 : }
    ret 0
?fraction_done@@YAXNN@Z ENDP ; fraction_done
----------------------------------------------------------------------
your comments are welcome
heinz
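The rewrite described in this post can be sketched in plain C++ to compare the two forms side by side. This is only an illustration under stated assumptions: the function names are hypothetical, the functions return the blended value instead of calling boinc_fraction_done, and PROG_POWER is assumed to be 6.0 because the constant loaded before the pow call in the listing, __real@4018000000000000, is the IEEE-754 double 6.0.

```cpp
#include <algorithm>
#include <cmath>

// Assumption: inferred from __real@4018000000000000 in the asm listing.
constexpr double PROG_POWER = 6.0;

// Original form (commented line 77 in the listing), including the clamp.
double blended_progress_original(double progress, double remaining)
{
    const double prog2 = 1.0 - remaining;
    progress = std::min(progress, 1.0);   // the clamp proposed for removal
    return progress * (1.0 - std::pow(prog2, PROG_POWER))
         + prog2 * std::pow(prog2, PROG_POWER);
}

// Refactored form (lines 85 to 88): one pow call, no clamp.
double blended_progress_refactored(double progress, double remaining)
{
    const double prog2 = 1.0 - remaining;
    const double A = std::pow(prog2, PROG_POWER);
    const double B = 1.0 - A;
    const double C = prog2 * A;
    const double D = progress * B;
    return D + C;
}
```

The two functions agree whenever the caller has already clamped progress to at most 1.0, which is the condition the post relies on. For an unclamped input such as progress = 1.5 they differ, so the removal is only safe as long as every call site keeps its own std::min.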
|
|
|
|
|
Logged
|
|
|
|
|
Jason G
|
Quote from _heinz: Working now on a vectorized version of chirpfft.cpp
Hi Heinz, I'm now on holidays. Are you looking at this one? I am trying to get reoriented after finishing study/work for the year, and am recovering after some serious celebrations. It's time to catch up! Jason
(PS: I've been raised to code wizard, so I've been reading more of the private areas. I think some of the stuff we've been trying out to force the autovectoriser has some real relevance, and maybe we should start a thread about it there.)
|
|
|
|
|
Logged
|
|
|
|
|
_heinz
|
Hi Jason, I have not had time over the last few days.... I think we should synchronise our code first. If you agree to use the new program structure, I will upload everything and PM you when it is done. heinz
|
|
|
|
|
Logged
|
|
|
|
|
Jason G
|
Hi Heinz, sounds like a good idea. PM when ready, take your time, no rush. For a comparative baseline reference, I now have a functional 2.4V noGFX build with xN switches. It was tough finding a suitable Boincapi svn revision to build against because of much restructuring of random 'utils' and gfx classes between ~August and now. [Investigating some unresolved externals actually led me to posts made by Simon back around July, on Beta, regarding the same sets of unresolved externals.]
I think we should decide if we want to fix at a certain Boinc API svn revision (less work but may break), or build against the HEAD (lots more work...). One initial feeling I get from that experience is that any improvements that involve cutting out unnecessary boinc interface, and moving some of the basic string, memory and utility functions away from boincapi --> back towards OS/app, might stabilise some of those issues (as these elements seem to be in constant flux in boincapi).... Yes, I'm aware that's the exact opposite of the feeling an API library is supposed to generate [stability and solidity]. Of course a stripped down minimalist 'version' might constitute its own branch.... Just ideas.
[Might be an idea to put the required utility functions in our own lib, maybe allowing us to drop some boincapi .h & .c references completely, removing the dependency on the revision... e.g. 'str_util.c' & 'str_util.h'.. do we really need to use boinc's version of this?...]
Jason
|
|
|
|
« Last Edit: 18 Dec 2007, 03:24:17 am by j_groothu »
|
Logged
|
|
|
|
|
_heinz
|
I like the idea of a stripped-down minimalist version, but we should eliminate unnecessary code with #ifdef directives, combined with include files for the variants, as I have done with USE_PFLOOP etc., because it is important to still have one source code from which we can generate all the necessary program versions for the different CPUs. "'str_util.c' & 'str_util.h'.. do we really need to use boinc's version of this?" is a question for Joe. Surprise... we are code wizards. Who did that? heinz
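A minimal sketch of the single-source idea, assuming variant switches like the /D "USE_AKFCOMP" and /D "USE_AKFSIMD" defines seen in the build logs earlier in this thread. The routine `sum_block`, the helper `fold_variant_name` and the macro FOLD_VARIANT_NAME are hypothetical and only illustrate the pattern; the real project splits its variants across include files, which this single-file sketch does not attempt to reproduce.

```cpp
#include <cstddef>

// Pick a variant name from the build defines (names taken from the logs).
#if defined(USE_AKFSIMD)
#define FOLD_VARIANT_NAME "AKFSIMD"
#elif defined(USE_AKFCOMP)
#define FOLD_VARIANT_NAME "AKFCOMP"
#else
#define FOLD_VARIANT_NAME "generic"
#endif

// Hypothetical helper: reports which variant this binary was built with.
const char* fold_variant_name() { return FOLD_VARIANT_NAME; }

// Hypothetical routine: sums n floats; the body is chosen at compile time.
float sum_block(const float* data, std::size_t n)
{
#if defined(USE_AKFSIMD)
    // Four independent partial sums give the vectoriser parallel work.
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    float total = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) total += data[i];   // leftover elements
    return total;
#else
    // Plain scalar loop for every other build variant.
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
#endif
}
```

Each CPU-specific build then comes from the same file by changing only the /D defines, which is the property the post asks to preserve.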
|
|
|
|
|
Logged
|
|
|
|
|
Jason G
|
Yay, I feel special too.. I believe it was 'The Lunatic Mods of Chickenness' calling on the Holy Powers of the 'Knights Who Say Ni!'
|
|
|
|
|
Logged
|
|
|
|
|