Seti@Home optimized science apps and information
 
Topic: optimized sources  (Read 44321 times)

Jason G
Re: optimized sources
« Reply #225 on: 28 Oct 2007, 11:34:50 am »

This block compiles on mine: (For comparison, I can see no major functional difference from yours :D )
----------
    CurrentSub = fftlen * (ifft + iC);
    sah_complex *WorkArea = &WorkData[iC * fftlen / 2];  // assumes sah_complex is 2 floats
#if !(defined(USE_IPP) || defined(USE_FFTWF))  // makes memcpy inactive for IPP and FFTWF builds
    memcpy( WorkArea, &ChirpedData[CurrentSub], int(fftlen * sizeof(sah_complex)) );
#endif

#if defined( USE_IPP )
    ippsFFTInv_CToC_32fc(
        ( Ipp32fc * ) &ChirpedData[CurrentSub],  // Source
        ( Ipp32fc * ) WorkArea,                  // Destination
        FftSpec[FftNum],
        FftBuf );
#elif defined( USE_FFTWF )
    fftwf_execute_dft( analysis_plans[FftNum], &ChirpedData[CurrentSub], WorkArea );
#else // replace time with freq - ooura FFT
    cdft( fftlen * 2, 1, WorkArea, BitRevTab[FftNum], CoeffTab[FftNum] );
#endif

----------

I did notice it went haywire if I missed out a ( Ipp32fc * ) typecast.
« Last Edit: 28 Oct 2007, 12:10:13 pm by j_groothu »

_heinz
Re: optimized sources
« Reply #226 on: 28 Oct 2007, 12:14:05 pm »

Yes, it compiles on mine too --->
analyzeFuncs.cpp
-----IPP-----
-----SSE2-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\win_build\Release32-NOGFX\BuildLog.htm"
seti_boinc - 0 error(s), 0 warning(s)
----------------------------------------------------------------------------------------
heinz

Jason G
Re: optimized sources
« Reply #227 on: 28 Oct 2007, 12:29:54 pm »

Ahh good one ;D  I'm thinking that this
new way:
       --- Using no memcpy
       --- Using the IPP function as intended

is better than the old way:
      --- Using a memcpy (even an optimised one, which I was looking at)
      --- Using the IPP function in a weird way

Of course only a test can show whether this makes any speed difference. It will be a while before I can look at a rebuild, as I have more schoolwork and have to give some tutoring this week. Even if it is slower I don't mind, because it has still helped me to understand a small piece more of the code. The next step for me after testing this would probably be to look at Joe's even better suggestions. There are many now!

Thanks for trying this and keep plugging away !

Back later in the week!

Jason
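
[Illustration] The "old way" versus "new way" comparison in the post above can be sketched roughly as follows. This is not the SETI code: a naive O(n^2) inverse DFT stands in for the library FFT, the names inverse_dft, inverse_dft_in_place, old_way and new_way are made up for the example, and the 1/n scaling is an assumption. It only shows that both routes give the same output and that the second one skips the copy; it says nothing about which is faster.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;  // stand-in for a two-float complex type

// Naive out-of-place inverse DFT: reads src, writes dst (buffers must not overlap).
// Hypothetical stand-in for a library FFT; scaled by 1/n for this example.
void inverse_dft(const cfloat* src, cfloat* dst, std::size_t n) {
    const double two_pi = 6.283185307179586;
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = two_pi * double(k * t % n) / double(n);
            sum += std::complex<double>(src[t]) * std::polar(1.0, angle);
        }
        dst[k] = static_cast<cfloat>(sum / double(n));
    }
}

// In-place form: one buffer is both source and destination.
// The naive DFT cannot overwrite its input as it goes, so it uses a scratch copy.
void inverse_dft_in_place(cfloat* data, std::size_t n) {
    const std::vector<cfloat> scratch(data, data + n);
    inverse_dft(scratch.data(), data, n);
}

// "Old way": copy the input into the work area, then transform the work area.
void old_way(const cfloat* chirped, cfloat* work, std::size_t n) {
    std::copy_n(chirped, n, work);  // the copy the thread wants to avoid
    inverse_dft_in_place(work, n);
}

// "New way": the transform reads the input and writes the work area directly.
void new_way(const cfloat* chirped, cfloat* work, std::size_t n) {
    inverse_dft(chirped, work, n);
}
```

In both cases the input buffer is left untouched; the difference is only whether a separate copy pass happens before the transform.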


_heinz
Re: optimized sources
« Reply #228 on: 28 Oct 2007, 12:55:20 pm »

changed benchmark.cpp ----->
--------------------------------------------------------------------------------------------------------
   for(loops = 0; loops < 25 && (end_cyc-total_run)< MAX_CYCLES; loops++)
      {
      if(pre_test == zero_out)   memset( out_buf, 0, test_size );
      if(pre_test == fill_in)      memcpy( out_buf, workBuf, test_size );
      ramming_speed();
      cycles = cycleCount();
      switch ( bench_list[idx].token )
         {
         case _FFT:
            #if defined( USE_IPP )
            if(pre_test == zero_out)
            {
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) out_buf,
                  ( Ipp32fc * ) out_buf,
                  FftSpec,
                  NULL );
            }
            else
            {
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) workBuf,   // This is the source data, this is not overwritten
                  ( Ipp32fc * ) out_buf,   // This is some other Buffer destination
                                    // no memcpy required
                  FftSpec,
                  NULL );
            }
            #endif //seti_britta:
            #if defined( USE_FFTWF )
            fftwf_execute_dft( da_fft_plan, (sah_complex *)&in_buf[0], (sah_complex *)&out_buf );
            #endif
            break;
-----------------------------------------------------------------------------------------------------------------------------
it compiles well --->
benchmark.cpp
-----IPP-----
-----SSE2-----
-----ipp-----
-----sse2-----
Build log was saved at "file://c:\I\SC\vs90\seti_boinc_2k3_2.2B-Ben-Joe\client\Optimizer\Release32-NOGFX\BuildLog.htm"
Optimizer - 0 error(s), 0 warning(s)
-------------------------------------------------------------------------------------------------------------------------------
will try this and see if it works well....
see you again here
regards heinz

Jason G
Re: optimized sources
« Reply #229 on: 28 Oct 2007, 01:22:56 pm »

Ahah, I see.... now that IPP call is "in place". You can do this:
   
...
       if(pre_test == zero_out)
            {
               ippsFFTInv_CToC_32fc(
             //     ( Ipp32fc * ) out_buf,  // Commented out this to make it inplace
                  ( Ipp32fc * ) out_buf, // This is both source and destination
                  FftSpec,
                  NULL );
            }
...

Whether it makes any difference is another question :D
Questions I have are:
        -    Why benchmark an array of zeroes?
        -    If a zeroed array needs to be benched, why not test it 'fully' out of place (separate src/dest buffer like below)?
« Last Edit: 28 Oct 2007, 01:35:16 pm by j_groothu »

_heinz
Re: optimized sources
« Reply #230 on: 28 Oct 2007, 02:02:56 pm »


Quote from: Jason G
    questions I have are:
        -    Why benchmark an array of zeroes ?
        -    If zeroed array needs to be benched , why not test it 'fully' out of place (separate src/dest buffer like below)?

hmm... maybe Alex Kan or Joe has a good answer

Josef W. Segur
Re: optimized sources
« Reply #231 on: 29 Oct 2007, 10:39:15 am »


Quote from: _heinz
    Quote from: Jason G
        questions I have are:
            -    Why benchmark an array of zeroes ?
            -    If zeroed array needs to be benched , why not test it 'fully' out of place (separate src/dest buffer like below)?

    hmm... maybe Alex Kan or Joe has a good answer

The 2.2B benchmark.cpp source doesn't set pre_test to zero_out anywhere. Setting pre_test = fill_in makes sense for the in-place transform, so that it always works on the same random data; that's not needed for out of place. But the FFT benchmark is timing only, and wasted time at that except in standalone runs with -bench or -verbose, since it is not used to choose a "best" variant. The lunatics.at 2.4 builds don't run the FFT benchmark test, though Crunch3r's 2.4V builds, which use IPP FFTs, do.

I don't know why Ben Herndon used the out-of-place form of parameters in the ippsFFTInv_CToC_32fc() calls, but he may have checked the actual code produced and determined that it was slightly more efficient.
                                                       Joe

Jason G
Re: optimized sources
« Reply #232 on: 29 Oct 2007, 11:44:32 am »

Quote from: Josef W. Segur
    I don't know why Ben Herndon used the out of place form of  parameters in the ippsFFTInv_CToC_32fc() calls, but he may have checked the actual code produced and determined that was slightly more efficient.
                                                           Joe

I wracked my brain about this, and ultimately came to a similar (though more convoluted and speculative) conclusion. It would make sense to me if an explicit out-of-place call could make better use of the prefetch, cache and parallelism mechanisms we have discussed in a different context. An explicit in-place call could not (so far as I can see for now), because of read/write dependencies.

After considering that, another possibility presented itself:
    for the same reasons, the approach as originally presented (the memcpy followed by the out-of-place form of the call with in-place parameters) may simply be faster than the 'true out of place' way we're playing with ::). If so, I suspect a 'cache doubling effect' from using the same source & dest.

The flipside is that if that effect shows verifiably, then it might even indicate the particular calls are not using streaming writes to start with... possibly bringing your hybridised codelet phased processing screaming to a new sense of urgency.

More speculation than hard data at the moment. I'll think about some small, simple external tests for a while and stew on it for a couple of weeks ;)

Jason

_heinz
Re: optimized sources
« Reply #233 on: 01 Nov 2007, 05:13:26 pm »

Quote from: Jason G
    ahah I see.... now that IPP call is "In Place"  You can do this:

    ...
           if(pre_test == zero_out)
                {
                   ippsFFTInv_CToC_32fc(
                 //     ( Ipp32fc * ) out_buf,  // Commented out this to make it inplace
                      ( Ipp32fc * ) out_buf, // This is both source and destination
                      FftSpec,
                      NULL );
                }

If we do this we get an error message ---->
.\benchmark.cpp(634) : error C2660: 'w7_ippsFFTInv_CToC_32fc' : function does not take 3 arguments
So leave it as it is --->
            if(pre_test == zero_out)
            {
               ippsFFTInv_CToC_32fc(
                  ( Ipp32fc * ) out_buf,
                  ( Ipp32fc * ) out_buf,
                  FftSpec,
                  NULL );
            }
--------------------------------------------
so it compiles
heinz

Jason G
Re: optimized sources
« Reply #234 on: 01 Nov 2007, 06:00:14 pm »

Quote from: _heinz
    so it compiles
    heinz

Yes, as we have discovered before, I must need my eyes checked ;D and it would make sense, if it was ever used in the zero-fill context, to leave it using the same form as might occur in a real analysis anyway.

For the sake of information, here is the form for the out-of-place inverse FFT (as exists):
    IppStatus ippsFFTInv_CToC_32fc(
                 const Ipp32fc* pSrc,
                 Ipp32fc* pDst,
                 const IppsFFTSpec_C_32fc* pFFTSpec,
                 Ipp8u* pBuffer);

And here is the form for in place:
    IppStatus ippsFFTInv_CToC_32fc_I(
                 Ipp32fc* pSrcDst,
                 const IppsFFTSpec_C_32fc* pFFTSpec,
                 Ipp8u* pBuffer);

I am currently learning much about what is connected to what by trying to separate out the benchmark (for exploratory purposes). Piece by piece it connects to almost the whole codebase. There are still a few external references to track down, but I may end up with a stripped-down custom testbed for examining the function of different algorithms, libraries & optimised functions.

The main reason for this unnecessary but educational exploration is that I may wish to see actual differences between the FFT libraries, different compilers and flags, without touching my main copy of the code anymore. I am also interested to see how close to ideal the forward and inverse transforms are when a 'Maximum Length Sequence' is applied as input, rather than zeroes or random data (I hope I'll get a constant power spectrum, with no spikes etc... We'll see :D )

Jason
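
[Illustration] For readers unfamiliar with the 'Maximum Length Sequence' idea mentioned above, here is a small C++ sketch under stated assumptions. It generates a 15-sample sequence from a 4-bit Fibonacci LFSR (feedback polynomial x^4 + x^3 + 1 in the usual tap numbering), maps the bits to +1/-1, and measures bin power with a naive DFT. The names make_mls15 and bin_power are invented for this example, and a real test of the SETI transforms would need a far longer sequence and the actual FFT library. For this short sequence, every non-DC bin has power 16 (N + 1) and the DC bin has power 1, which is the flat spectrum being hoped for.

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// 15-sample maximum length sequence from a 4-bit Fibonacci LFSR,
// with output bits mapped to +1.0 (bit set) or -1.0 (bit clear).
std::vector<double> make_mls15() {
    std::vector<double> seq;
    std::uint8_t state = 0x1;  // any non-zero 4-bit seed works
    for (int i = 0; i < 15; ++i) {
        seq.push_back((state & 1u) ? 1.0 : -1.0);
        // Feedback is the XOR of the two lowest bits, shifted in at bit 3.
        const std::uint8_t feedback = (state ^ (state >> 1)) & 1u;
        state = static_cast<std::uint8_t>((state >> 1) | (feedback << 3));
    }
    return seq;
}

// Power |X[k]|^2 of DFT bin k, computed directly (O(n) per bin).
double bin_power(const std::vector<double>& x, std::size_t k) {
    const double two_pi = 6.283185307179586;
    const std::size_t n = x.size();
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
        sum += x[t] * std::polar(1.0, -two_pi * double(k * t % n) / double(n));
    }
    return std::norm(sum);
}
```

Feeding such a sequence through a forward transform and checking how far the non-DC bins stray from N + 1 is one possible way to look for the "spikes" mentioned in the post.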

_heinz
Re: optimized sources
« Reply #235 on: 03 Nov 2007, 05:32:00 am »

Hi Jason,
here you see the output of ET, which I use to time the code pieces of two functions, P1 and P2:
--------------------------------------------------------------------------------------------------------------------
ET v1.0 test seti
-------------------
Timer Frequency in:
Hz  =       3579545
MHz =       3.57955
GHz =       0.00358

Start Time =    1080132967465 Ticks
Stop Time  =    1080134441029 Ticks

Duration in Ticks   =  1473564
Duration in seconds =  0.4116623760841
--------------------------------------
Start Time =    1080134443291 Ticks
Stop Time  =    1080138377735 Ticks

Duration in Ticks   =  3934444
Duration in seconds =  1.0991463998916
--------------------------------------
        P1 = 1473564
        P2 = 3934444
        dif= 2460880

Solution:P1 is faster than P2
Press the Enter Key!
------------------------------------------------------------------------------------------------
so we see the success without running a test WU....

heinz

Jason G
Re: optimized sources
« Reply #236 on: 03 Nov 2007, 07:37:13 am »

Cool, thanks for the links by PM. They could be quite handy for the things I intend to be looking at soon.... but LOL, where is the etimer.lib file that is discussed on the Intel site? The link at the end of the etimer article is giving me some 3D transform program files INSTEAD :o  If I can't find it I should probably let Intel know their link is broken ....

[ LOL now they fixed it! :D Maybe they read Lunatics ]
« Last Edit: 03 Nov 2007, 07:42:27 am by j_groothu »

_heinz
Re: optimized sources
« Reply #237 on: 03 Nov 2007, 10:47:59 am »

Maybe.... we are one of the most accessed, now more than 22 000 ...... ;D



Jason G
Re: optimized sources
« Reply #238 on: 03 Nov 2007, 01:34:41 pm »

'Tis truly an Epical Thread :D .... But wait, there's more! ....

Using the timers, I ran some big-loop math array test pieces to establish the best optimisation configuration on my old P4 Northwood:

With everything else equal:
the xW sse2 setting I've been using all along = 14.15 secs (repeated runs to make sure)
the xN sse2 setting I wanted to test properly = 12.8 secs (repeated runs to make sure)

That makes xN builds nearly 10% faster on my old clunker with looping math code!

This means that:
 The good news is I may already have found a way to achieve my 5 to 10% speed improvement goal for this machine! (without doing much at all.... Hmmm ... Better start thinking of a new goal!)
 ;D

Bad news is that I now have to go and rebuild the seti projects with my new settings to see if it will work ... and no time this week! :(


Surprise, surprise, a QxN build is faster on my Northwood :P
LOL


       
« Last Edit: 03 Nov 2007, 01:43:35 pm by j_groothu »

_heinz
Re: optimized sources
« Reply #239 on: 03 Nov 2007, 02:55:37 pm »

lol... make a copy of your current seti folder and set it parallel to the boinc folder...so you need not touch the old one.
