Seti@Home optimized science apps and information
Author Topic: Difference?  (Read 4018 times)
sunu
Difference?
« on: 09 Jun 2011, 04:21:45 pm »

Let's say we have some Credit and Runtime data from a few tasks and want to calculate Credit/sec. I see three possibilities:

1) sum(Credit) / sum(Runtime)
2) avg(Credit) / avg(Runtime)
3) avg(Credit / Runtime)

In the 3rd we calculate Credit/sec for each task and then we take the average of those.

1 and 2 give me the same result but not 3. What is the difference?
Logged
Miep
Re: Difference?
« Reply #1 on: 09 Jun 2011, 04:34:39 pm »

You mean, apart from the random number generator called 'Credit new'?

I'll look into the maths tomorrow.

[edit] 1 and 2 are identical because the average is the sum divided by the number of elements. As the number of elements is the same for both, it cancels out.
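
Written out, the cancellation looks roughly like this (the symbols x_i, y_i and n are introduced here for illustration: x_i is the credit and y_i the runtime of task i, with n tasks in total):

```latex
\frac{\mathrm{avg}(x)}{\mathrm{avg}(y)}
  = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i}{\frac{1}{n}\sum_{i=1}^{n} y_i}
  = \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} y_i}
```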

I need pen and paper for 3.
« Last Edit: 09 Jun 2011, 04:49:09 pm by Miep » Logged

The road to hell is paved with good intentions
sunu
Re: Difference?
« Reply #2 on: 09 Jun 2011, 04:45:32 pm »

Yes, don't think about that stuff. Let's say we have x and y. Why is sum(x) / sum(y) or avg(x) / avg(y) different from avg(x / y)?
Logged
Miep
Re: Difference?
« Reply #3 on: 09 Jun 2011, 04:54:06 pm »

Yes, sorry, I did understand it as a purely mathematical question about the formulas and why they produce different results.
Probably to do with the way the sums are done and the order in which the, errr, operations are performed. But I'll have to write it out on paper and have a close look.
Logged

Jason G
Re: Difference?
« Reply #4 on: 09 Jun 2011, 04:55:54 pm »

Yes, don't think about that stuff. Let's say we have x and y. Why is sum(x) / sum(y) or avg(x) / avg(y) different from avg(x / y)?

How much different? Is it closer with just a few numbers than with a lot? If it's a fractional difference, there are several opportunities for accumulated summation and rounding errors, which can look like random results, and changing the order of computation like that can make a big difference.

Keeping the division to a single operation at the last step will be far more accurate if you have many results, and there are also ways to improve accuracy further by not summing long strings of numbers in one pass. Summing in blocks of sqrt(N), then summing those block results, is one way to minimise accumulated roundoff error in the sums.
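
A minimal Python sketch of that block-summing idea, just to make it concrete. The function names `running_sum` and `blocked_sum` are made up here, and the explicit loop stands in for a plain left-to-right accumulation (recent Python versions may already compensate inside the built-in `sum()` for floats, which would hide the effect):

```python
import math

def running_sum(values):
    """Plain left-to-right accumulation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total

def blocked_sum(values):
    """Sum in blocks of about sqrt(N) elements, then sum the block totals."""
    n = len(values)
    if n == 0:
        return 0.0
    block = max(1, math.isqrt(n))  # block length of roughly sqrt(N)
    partials = [running_sum(values[i:i + block]) for i in range(0, n, block)]
    return running_sum(partials)
```

Each partial sum stays small, so fewer low-order bits get lost than when thousands of values are added onto one ever-growing total.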

[Edit:] Looking at the third equation with that in mind, it would basically maximise the accumulated summing error by adding smaller values, so the error has more effect on the average, and truncation has also been applied to every element during the divisions... So yeah, yuck.

If you have trouble sleeping sometime you can read this:

What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg
« Last Edit: 09 Jun 2011, 05:21:43 pm by Jason G » Logged
sunu
Re: Difference?
« Reply #5 on: 09 Jun 2011, 05:23:03 pm »

I first found out about it looking at thousands of results and thought about rounding errors. But then I took data from ten tasks to look closely.

With 10 decimal places of accuracy for the separate credit / runtime operations, the difference between the two methods is already 0.0014 for only 10 tasks. I don't think it could be a rounding error.

Edit: Thanks for the link!
« Last Edit: 09 Jun 2011, 05:30:06 pm by sunu » Logged
Jason G
Re: Difference?
« Reply #6 on: 09 Jun 2011, 05:38:09 pm »

Have a look at the section on summing error and see which answer you get if you use your first equation with Kahan summation or similar, keeping the division to the one final step. That would be the 'most right' answer, though there isn't any 'right' answer in floating point... They're all wrong!
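
For reference, a rough Python sketch of what that could look like. The names `kahan_sum` and `credit_per_second` are hypothetical, and this is only the textbook form of compensated summation applied to the first equation, not anything taken from BOINC:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: carry the lost low-order bits along."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error so far
    for v in values:
        y = v - compensation            # apply the correction to the next value
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total

def credit_per_second(credits, runtimes):
    """Method 1: one division at the very end, on compensated sums."""
    return kahan_sum(credits) / kahan_sum(runtimes)
```

Usage would be something like `credit_per_second(credit_list, runtime_list)`, with the two lists holding one entry per task.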
Logged
sunu
Re: Difference?
« Reply #7 on: 09 Jun 2011, 05:49:35 pm »

OK, I took 3400 tasks. The difference is 0.0017, almost equal to the 0.0014 from 10 tasks. This can't be a rounding error.

I'll look at Kahan Summation.
Logged
sunu
Re: Difference?
« Reply #8 on: 09 Jun 2011, 07:17:27 pm »

I think an equivalent everyday example would be:

You drive from A to B and you want to know your average km/h. This is elementary school stuff: distance / time.

The next time you drive from A to B you make 4-5 stops in between for coffee. How do you calculate your average speed now? Do you add up the distances and the times and divide them (sum(x) / sum(y)), or do you calculate your average speed for each segment and then take the average of those (avg(x / y))?

The last method now seems goofy, but why is it right or wrong? And is the difference just a rounding error, or does avg(x / y) calculate something different?
« Last Edit: 09 Jun 2011, 07:21:47 pm by sunu » Logged
perryjay
Re: Difference?
« Reply #9 on: 09 Jun 2011, 07:59:37 pm »

On that second drive do you also have to figure in the restroom stops?
Logged
Josef W. Segur
Re: Difference?
« Reply #10 on: 09 Jun 2011, 09:04:51 pm »

Yes, don't think about that stuff. Let's say we have x and y. Why is sum(x) / sum(y) or avg(x) / avg(y) different from avg(x / y)?

Methods 1 and 2 give more weight to long-running tasks. Take two tasks, one which runs in 6 hours and gives 100 credits, another which runs in 2 hours and gives 40 credits. The six hours of the first task make the 2 hours of the second task only 1/4 of the total time. So you get 17.5 credits/hour, which is closer to the 16.7 c/h of the first task than the 20 c/h of the second.

But method 3 gives equal weight to the tasks no matter how quickly or slowly they run. So you get 18.333 c/h.
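
The two-task example can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the numbers are the ones from the example in this post.

```python
# The two tasks from the example: credits granted and runtime in hours.
credits = [100.0, 40.0]
hours = [6.0, 2.0]
n = len(credits)

method1 = sum(credits) / sum(hours)                       # sum(x) / sum(y)
method2 = (sum(credits) / n) / (sum(hours) / n)           # avg(x) / avg(y)
method3 = sum(c / h for c, h in zip(credits, hours)) / n  # avg(x / y)

print(f"method 1: {method1:.3f} c/h")  # 17.500
print(f"method 2: {method2:.3f} c/h")  # 17.500
print(f"method 3: {method3:.3f} c/h")  # 18.333
```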

BOINC uses method 3 for its server-side averages, so a 100 hour task is weighted the same as a 1 minute task...

Joe
Logged
Jason G
Re: Difference?
« Reply #11 on: 10 Jun 2011, 01:54:25 am »

The last method now seems goofy, but why is it right or wrong? And is the difference just a rounding error, or does avg(x / y) calculate something different?
Yes, don't think about that stuff. Let's say we have x and y. Why is sum(x) / sum(y) or avg(x) / avg(y) different from avg(x / y)?
But method 3 gives equal weight to the tasks no matter how quickly or slowly they run. So you get 18.333 c/h.

That's right, they are different. Nothing is goofy (except maybe me), because the order is important, so it's a different calculation with or without precision issues.

#1: sum(x) / sum(y) simplifies to the same as #2, since the 1/n factors cancel,
#2: avg(x) / avg(y) is the ratio of two averages, which gives more weight to tasks with large y (long runtimes),
#3: avg(x / y) is the arithmetic mean of x/y, so likely the one you want,

but depending on what you want to achieve, if you want a more robust statistic you could possibly use the medians instead, or even truncated means to chuck out outliers.
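
A small sketch of those two robust alternatives in Python, using the standard `statistics` module. The `rates` values are invented for illustration (nine ordinary per-task credit/sec figures and one outlier), and `trimmed_mean` is a hypothetical helper, not a library function:

```python
import statistics

def trimmed_mean(values, fraction=0.1):
    """Mean after dropping the lowest and highest `fraction` of the values."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # how many to drop at each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)

# Hypothetical per-task credit/sec values with one outlier (2.50).
rates = [0.18, 0.19, 0.17, 0.20, 0.19, 0.18, 0.19, 0.18, 0.17, 2.50]

plain_mean = statistics.fmean(rates)     # pulled upwards by the outlier
median_rate = statistics.median(rates)   # ignores the outlier
trimmed_rate = trimmed_mean(rates, 0.1)  # drops one value at each end
```

With this made-up data the plain mean lands above 0.4, while the median and the trimmed mean both stay near 0.185.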

« Last Edit: 10 Jun 2011, 02:28:27 am by Jason G » Logged
sunu
Re: Difference?
« Reply #12 on: 10 Jun 2011, 06:50:30 am »

Yes, "weight" seems to be the magic word here. After Josef's post I looked at various weighted means, but avg(x / y) still doesn't look anything like them.
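
One way to make the connection, sketched with symbols that are not in the posts above (x_i is the credit and y_i the runtime of task i, n is the number of tasks): both quantities can be written as weighted means of the per-task ratios x_i / y_i, and only the weights differ.

```latex
\frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} y_i}
  = \frac{\sum_{i=1}^{n} y_i \cdot \frac{x_i}{y_i}}{\sum_{i=1}^{n} y_i}
  \quad\text{(weights } y_i\text{)},
\qquad
\mathrm{avg}\!\left(\frac{x}{y}\right)
  = \frac{\sum_{i=1}^{n} 1 \cdot \frac{x_i}{y_i}}{\sum_{i=1}^{n} 1}
  = \frac{1}{n}\sum_{i=1}^{n} \frac{x_i}{y_i}
  \quad\text{(weights } 1\text{)}
```

So sum(x) / sum(y) weights each task's ratio by its runtime, while avg(x / y) weights every task equally.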

but depending on what you want to achieve, if you want a more robust statistic you could possibly use the medians instead, or even truncated means to chuck out outliers.

I just wanted to calculate the credit / sec output of my machine, broken down by CPU, GPU, AP, MB etc.

As for the problem with the car above, the answer isn't as simple as I thought http://en.wikipedia.org/wiki/Harmonic_mean#In_physics
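
A quick Python check of the car case, assuming a made-up trip of two equal-distance segments (all numbers and variable names here are hypothetical). For equal distances, total distance / total time comes out the same as the harmonic mean of the segment speeds, not their arithmetic mean:

```python
import statistics

# Hypothetical trip: two segments of equal distance, driven at different speeds.
segments = [(60.0, 1.0), (60.0, 2.0)]  # (distance in km, time in hours)

total_distance = sum(d for d, _ in segments)
total_time = sum(t for _, t in segments)
overall_speed = total_distance / total_time  # 120 km / 3 h = 40 km/h

speeds = [d / t for d, t in segments]        # 60 km/h and 30 km/h
mean_of_speeds = statistics.fmean(speeds)    # arithmetic mean: 45 km/h
harmonic = statistics.harmonic_mean(speeds)  # harmonic mean: 40 km/h
```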

Well, I guess we need a professional statistician
Logged
Jason G
Re: Difference?
« Reply #13 on: 10 Jun 2011, 06:57:01 am »

As for the problem with the car above, the answer isn't as simple as I thought http://en.wikipedia.org/wiki/Harmonic_mean#In_physics

Well, I guess we need a professional statistician

Hahaha, yep. Don't know about Joe, but my statistics is certainly rusty. If you intend to process a lot of results, do keep the golden rules of floating point in mind as well, since anything that could compound tiny errors in unexpected ways will change the result too.

Jason
Logged
Miep
Re: Difference?
« Reply #14 on: 10 Jun 2011, 09:54:01 am »

I do plain linear regression, mainly to prove that credit new is not linear.
0.188 credit/second on beta with some flavour of x37.

Logged
