Pages: [1] 2

Topic: Difference? (Read 4018 times)
sunu
Alpha Tester
Knight who says 'Ni!'
Posts: 771
Let's say we have some Credit and Runtime data from a few tasks and want to calculate Credit/sec. I see three possibilities:
1) sum(Credit) / sum(Runtime)
2) avg(Credit) / avg(Runtime)
3) avg(Credit / Runtime)
In the 3rd we calculate Credit/sec for each task and then take the average of those.
1 and 2 give me the same result, but 3 does not. What is the difference?
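A minimal Python sketch of the three candidates, assuming the data sit in two plain lists. The names `credit`, `runtime`, `method1` to `method3` and all the numbers are hypothetical, chosen only for illustration:

```python
from statistics import mean

# Hypothetical per-task data, for illustration only (not from the thread).
credit = [100.0, 40.0, 75.0]          # credits granted per task
runtime = [21600.0, 7200.0, 9000.0]   # runtime per task, in seconds

def method1(c, t):
    """sum(Credit) / sum(Runtime): total credit over total time."""
    return sum(c) / sum(t)

def method2(c, t):
    """avg(Credit) / avg(Runtime): ratio of the two averages."""
    return mean(c) / mean(t)

def method3(c, t):
    """avg(Credit / Runtime): mean of the per-task credit/sec rates."""
    return mean(ci / ti for ci, ti in zip(c, t))

for f in (method1, method2, method3):
    print(f.__name__, f(credit, runtime))
```

With these made-up numbers, methods 1 and 2 agree to within floating-point noise, while method 3 gives a noticeably different value.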
Miep
You mean, apart from the random number generator called 'Credit new'?
I'll look into the maths tomorrow.
[edit] 1 and 2 are identical because the average is the sum divided by the number of elements. As the number of elements is the same for both, it cancels out.
I need pen and paper for 3.
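Written out for n tasks with credits x_i and runtimes y_i, the cancellation looks like this; the third quantity has no such simplification, because each division happens before the sum:

```latex
\frac{\operatorname{avg}(x)}{\operatorname{avg}(y)}
  = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i}{\frac{1}{n}\sum_{i=1}^{n} y_i}
  = \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} y_i},
\qquad\text{whereas}\qquad
\operatorname{avg}\!\left(\frac{x}{y}\right) = \frac{1}{n}\sum_{i=1}^{n} \frac{x_i}{y_i}.
```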
« Last Edit: 09 Jun 2011, 04:49:09 pm by Miep »
The road to hell is paved with good intentions
sunu
Yes, don't think about that stuff. Let's say we have x and y. Why is sum(x) / sum(y) or avg(x) / avg(y) different from avg(x / y)?
Miep
Yes, sorry, I did understand it as a purely mathematical question of formulas and why they produce different results. It probably has to do with the way the sums are done and in what order the, errr, operations are performed. But I'll have to write it out on paper and have a close look.
Jason G
Quoting sunu: "Yes, don't think about that stuff. Let's say we have x and y. Why sum(x) / sum(y) or avg(x) / avg(y) is different from avg(x / y)?"

How much different? With just a few numbers, is it closer than if there are a lot? If it's a fractional difference, there are several opportunities for accumulated summation and rounding errors, which can look like random results, and changing the order of computation like that can make a big difference. Keeping the division to a single operation at the last step will be far more accurate if you have many results. There are also ways to further improve accuracy by not summing long strings of numbers in a line: summing in blocks of sqrt(N), then summing those block results, minimises accumulated roundoff error in the sums (that's one way).

[Edit:] Looking at the third equation with that in mind, it would basically maximise the accumulated summation error by adding smaller values, so the error has more effect on the average. It has also applied rounding to every element during the divisions... So yeah, yuck.

If you have trouble sleeping sometime, you can read this: What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg.
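A rough Python sketch of the blocking idea, assuming a plain list of floats. `blocked_sum` and `naive_sum` are hypothetical helper names, the block size is taken as `math.isqrt(N)`, and `math.fsum` serves only as a correctly rounded reference. Note that this reduces summation error; it does not change which formula is being computed.

```python
import math

def blocked_sum(values):
    """Sum in blocks of about sqrt(N) elements, then sum the block totals."""
    n = len(values)
    if n == 0:
        return 0.0
    block = max(1, math.isqrt(n))
    block_totals = []
    for start in range(0, n, block):
        partial = 0.0
        for v in values[start:start + block]:   # short running sum per block
            partial += v
        block_totals.append(partial)
    total = 0.0
    for b in block_totals:                      # second, much shorter pass
        total += b
    return total

def naive_sum(values):
    """Plain left-to-right summation, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total

# Hypothetical data: one million copies of 0.1.
data = [0.1] * 1_000_000
exact = math.fsum(data)   # correctly rounded reference sum
print("naive error:  ", abs(naive_sum(data) - exact))
print("blocked error:", abs(blocked_sum(data) - exact))
```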
« Last Edit: 09 Jun 2011, 05:21:43 pm by Jason G »
sunu
I first found out about it looking at thousands of results and thought about rounding errors. But then I took data from ten tasks to look closely.
With 10 decimal places of accuracy for the separate credit / runtime operations, the difference between the two methods is already 0.0014 for only 10 tasks. I don't think it could be a rounding error.
Edit: Thanks for the link!
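One way to test the rounding hypothesis is to redo the per-task divisions in exact rational arithmetic with `fractions.Fraction` and compare. The task data below are hypothetical and only the comparison matters; for data of this kind one would expect the floating-point rounding error to be many orders of magnitude smaller than the gap between the two formulas.

```python
from fractions import Fraction

# Hypothetical (credit, runtime-in-seconds) pairs for ten tasks; illustration only.
tasks = [(50.0 + 7.3 * i, 3000.0 + 411.7 * i * i) for i in range(10)]
n = len(tasks)

# Method 3 in ordinary double precision: one rounded division per task, then a plain sum.
float_rate = sum(c / t for c, t in tasks) / n

# The same quantity in exact rational arithmetic (no rounding at all).
exact_rate = sum(Fraction(c) / Fraction(t) for c, t in tasks) / n

# Method 1 for comparison.
ratio_of_sums = sum(c for c, _ in tasks) / sum(t for _, t in tasks)

rounding_error = abs(Fraction(float_rate) - exact_rate)
formula_gap = abs(float_rate - ratio_of_sums)
print(f"rounding error in method 3: {float(rounding_error):.3e}")
print(f"gap between methods 1 and 3: {formula_gap:.3e}")
```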
« Last Edit: 09 Jun 2011, 05:30:06 pm by sunu »
sunu
OK, I took 3400 tasks. The difference is 0.0017, almost equal to the 0.0014 from 10 tasks. This can't be a rounding error.
I'll look at Kahan summation.
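A sketch of Kahan (compensated) summation in Python; `kahan_sum` and `naive_sum` are hypothetical helper names and the test data are made up. It reduces accumulated rounding error in long sums, but since the gap reported above does not shrink as the task count grows, it should not be expected to make methods 1 and 3 agree.

```python
import math

def kahan_sum(values):
    """Kahan (compensated) summation of a sequence of floats."""
    total = 0.0
    compensation = 0.0                    # running estimate of low-order bits lost so far
    for v in values:
        y = v - compensation              # correct the next term by the previous loss
        t = total + y                     # low-order bits of y can be lost here...
        compensation = (t - total) - y    # ...and are recovered (negated) here
        total = t
    return total

def naive_sum(values):
    """Plain left-to-right summation, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total

# Hypothetical data: one million copies of 0.1.
data = [0.1] * 1_000_000
exact = math.fsum(data)                   # correctly rounded reference
print("naive error:", abs(naive_sum(data) - exact))
print("kahan error:", abs(kahan_sum(data) - exact))
```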
sunu
I think an equivalent everyday example would be:
You drive from A to B and want to know your average speed in km/h. This is elementary school stuff: distance / time.
The next time you drive from A to B, you make 4-5 stops in between for coffee. How do you calculate your average speed now? Do you add up the distances and the times and divide them (sum(x) / sum(y)), or do you calculate your average speed for each segment and then average those (avg(x / y))?
The last method now seems goofy, but why is it right or wrong? And is the difference just a rounding error, or does avg(x / y) calculate something different?
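A small sketch of the two ways of averaging the drive, using made-up legs (the distances and driving times are hypothetical):

```python
from statistics import mean

# Hypothetical legs of the trip: (distance in km, driving time in hours).
legs = [(120.0, 1.0), (30.0, 0.5), (90.0, 2.0)]

total_km = sum(d for d, _ in legs)
total_h = sum(t for _, t in legs)
overall_speed = total_km / total_h               # sum(x) / sum(y): 240 km / 3.5 h
mean_leg_speed = mean(d / t for d, t in legs)    # avg(x / y): (120 + 60 + 45) / 3

print(f"overall: {overall_speed:.1f} km/h")            # overall: 68.6 km/h
print(f"mean of leg speeds: {mean_leg_speed:.1f} km/h")  # mean of leg speeds: 75.0 km/h
```

Total distance over total time is what "average speed" normally means; averaging the leg speeds gives every leg the same weight regardless of how long it took. Coffee-stop time, if you count it, would be added to the total time but contributes no distance.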
« Last Edit: 09 Jun 2011, 07:21:47 pm by sunu »
perryjay
On that second drive do you also have to figure in the restroom stops? 
Josef W. Segur
Quoting sunu: "Yes, don't think about that stuff. Let's say we have x and y. Why sum(x) / sum(y) or avg(x) / avg(y) is different from avg(x / y)?"

Methods 1 and 2 give more weight to long-running tasks. Take two tasks: one runs for 6 hours and gives 100 credits, the other runs for 2 hours and gives 40 credits. The six hours of the first task make the 2 hours of the second task only 1/4 of the total time. So you get 140 / 8 = 17.5 credits/hour, which is closer to the 16.7 c/h of the first task than to the 20 c/h of the second.

But method 3 gives equal weight to the tasks no matter how quickly or slowly they run, so you get (16.667 + 20) / 2 = 18.333 c/h. BOINC uses method 3 for its server-side averages: a 100-hour task is weighted the same as a 1-minute task...

Joe
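Joe's two-task example in Python, plus one identity that makes the weighting explicit: sum(c) / sum(t) equals the sum over tasks of (t_i / sum(t)) * (c_i / t_i). In other words, method 1 is a runtime-weighted mean of the per-task rates, while method 3 is the unweighted mean. The variable names are illustrative only.

```python
from statistics import mean

# Joe's two-task example: (credits, runtime in hours).
tasks = [(100.0, 6.0), (40.0, 2.0)]

rates = [c / t for c, t in tasks]                  # about 16.67 c/h and 20 c/h
total_hours = sum(t for _, t in tasks)

method1 = sum(c for c, _ in tasks) / total_hours   # 140 / 8 = 17.5 c/h
method3 = mean(rates)                              # (16.67 + 20) / 2, about 18.33 c/h

# Method 1 rewritten as a runtime-weighted mean of the per-task rates.
weighted = sum((t / total_hours) * r for (_, t), r in zip(tasks, rates))

print(round(method1, 3), round(method3, 3), round(weighted, 3))
```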
Jason G
Quoting sunu: "The last method now seems goofy but why is it right or wrong? And is the difference just a rounding error or avg (x / y) calculates something different?"

Quoting sunu: "Yes, don't think about that stuff. Let's say we have x and y. Why sum(x) / sum(y) or avg(x) / avg(y) is different from avg(x / y)?"

Quoting Josef W. Segur: "But method 3 gives equal weight to the tasks no matter how quickly or slowly they run. So you get 18.333 c/h."

That's right, they are different. Nothing is goofy (except maybe me): the order matters, so it's a different calculation with or without precision issues.
#1: sum(x) / sum(y) simplifies to the same as #2, since the n/n cancels.
#2: avg(x) / avg(y) is the ratio of two averages, which will weight by large x.
#3: avg(x / y) is the arithmetic mean of x/y, so it's likely the one you want. But depending on what you want to achieve, if you want a more robust statistic you could use the median instead, or even a truncated mean to chuck out outliers.
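A minimal sketch of the robust alternatives mentioned above, using the standard `statistics` module. `truncated_mean` is a hypothetical helper that trims a fixed proportion from each end of the sorted data, and the rates are made up with one deliberate outlier:

```python
from statistics import mean, median

def truncated_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each end of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)      # number of values trimmed from each end
    return mean(data[k:len(data) - k])

# Hypothetical per-task credit/sec rates with one outlier (0.0300).
rates = [0.0050, 0.0052, 0.0049, 0.0051, 0.0050,
         0.0048, 0.0053, 0.0051, 0.0050, 0.0300]

print("mean:          ", mean(rates))
print("median:        ", median(rates))
print("truncated mean:", truncated_mean(rates, 0.1))
```

The single outlier pulls the plain mean well above the bulk of the data, while the median and the 10% truncated mean stay close to it. If SciPy is available, `scipy.stats.trim_mean` implements the same trimming idea.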
« Last Edit: 10 Jun 2011, 02:28:27 am by Jason G »
sunu
Yes, "weight" seems to be the magic word here. After Josef's post I looked at various weighted means, but avg(x / y) still doesn't look anything like them.

Quoting Jason G: "but depending on what you want to achieve, if you want a more robust statistic you could possibly use the medians instead, or even truncated means to chuck out outliers."

I just wanted to calculate the credit/sec output of my machine broken down into CPU, GPU, AP, MB etc.

As for the car problem above, the answer isn't as simple as I thought: http://en.wikipedia.org/wiki/Harmonic_mean#In_physics
Well, I guess we need a professional statistician.
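The linked article's point can be checked with a short sketch: when every leg covers the same distance, total distance over total time equals the harmonic mean of the leg speeds, not their arithmetic mean. The legs below are hypothetical; `statistics.harmonic_mean` is in the Python standard library (3.6+).

```python
from statistics import harmonic_mean, mean

# Hypothetical trip: three legs of equal length (60 km each) at different speeds.
leg_km = 60.0
speeds = [120.0, 60.0, 40.0]                      # km/h on each leg

total_time = sum(leg_km / v for v in speeds)      # 0.5 + 1.0 + 1.5 = 3.0 h
overall = (leg_km * len(speeds)) / total_time     # 180 km / 3 h

print("overall speed:  ", overall)
print("harmonic mean:  ", harmonic_mean(speeds))
print("arithmetic mean:", mean(speeds))
```

The same algebra applies to tasks: if every task earned the same credit, sum(Credit) / sum(Runtime) would equal the harmonic mean of the per-task credit/sec rates.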
Jason G
Hahaha, yep. Don't know about Joe, but my statistics are certainly rusty. If you intend to process a lot of results, do keep the golden rules of floating point in mind as well, since anything that compounds tiny errors in unexpected ways will also change the result.
Jason
Miep
I do plain linear regression, mainly to prove that Credit new is not linear. 0.188 credit/second on beta with some flavour of x37.
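A minimal ordinary least-squares sketch, assuming paired (runtime, credit) lists; `least_squares` and the data are hypothetical. The slope is the fitted credit per second; a large intercept or curved residuals would suggest credit is not simply proportional to runtime. On Python 3.10+, `statistics.linear_regression` performs the same fit.

```python
def least_squares(x, y):
    """Ordinary least-squares fit y ~ intercept + slope * x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical (runtime in seconds, credit) pairs; illustration only.
runtime = [1000.0, 2000.0, 3000.0, 4000.0]
credit = [150.0, 260.0, 390.0, 500.0]

slope, intercept = least_squares(runtime, credit)
print(f"slope {slope:.4f} credit/s, intercept {intercept:.1f} credits")
```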