|
Post by Grey on Jun 9, 2015 13:25:53 GMT 5
In the supplementary data Pimiento&Balk.Dataset (the first one), what does Sample Variance mean? I can't see the relevance of it.
|
|
|
Post by theropod on Jun 9, 2015 13:32:31 GMT 5
Exactly what it looks like, I would presume: the variance of the sample (presumably sorted by locality or formation, but I didn’t check). It measures how variable the data are; the smaller the variance, the less variable they are. But don’t confuse it with the standard deviation, which is the square root of the variance.
|
|
|
Post by Grey on Jun 9, 2015 13:46:01 GMT 5
Is it related to size variance? Sample Mean refers to the mean size, right? The description is: Carcharocles megalodon tooth measurements and body size estimations from different geologic formations around the world.
|
|
|
Post by theropod on Jun 9, 2015 14:06:21 GMT 5
Yes, mean means average size. Since the study was primarily about total length, and since that’s obviously what the mean refers to, the variance must also refer to total length.
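variance = sum((Total Length - Mean Total Length)^2) / Sample Size (or / (Sample Size - 1) for the usual sample variance, which is what spreadsheet functions such as VAR compute)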
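For anyone who wants to try the formula, here is a minimal Python sketch. The lengths are made-up placeholder values, not taken from the dataset. It shows both common conventions: dividing by the sample size, and dividing by the sample size minus one (the one usually labelled "sample variance").

```python
import statistics

# Hypothetical total lengths in metres, for illustration only (not from the dataset).
total_lengths = [4.0, 6.0, 8.0, 10.0]

mean_tl = statistics.fmean(total_lengths)
squared_deviations = [(tl - mean_tl) ** 2 for tl in total_lengths]

# Divide by n (population variance) or by n - 1 (sample variance).
variance_n = sum(squared_deviations) / len(total_lengths)
variance_n_minus_1 = sum(squared_deviations) / (len(total_lengths) - 1)

# The standard library implements both conventions.
assert abs(variance_n - statistics.pvariance(total_lengths)) < 1e-12
assert abs(variance_n_minus_1 - statistics.variance(total_lengths)) < 1e-12

print(f"mean = {mean_tl}, variance (n) = {variance_n}, variance (n - 1) = {variance_n_minus_1:.4f}")
```

The two results differ noticeably for small samples and converge as the sample size grows.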
|
|
|
Post by Grey on Jun 9, 2015 14:24:18 GMT 5
Damn I'm just not a maths guy ahah!
The first one has a Sample Mean TL of 8.005 and a Sample Variance of 21.84605. The sample size is 2 (specimens). So you just divide the Sample Variance by 2 to get something?
(I hate looking stupid).
Btw, it was me who told Pimiento the supp was not available yet; she then contacted Dryad.
|
|
|
Post by theropod on Jun 9, 2015 16:02:21 GMT 5
Sorry, I expressed that ambiguously. If you take the sum of the squared deviations from the mean and divide that by the sample size (2 in this case), you get the variance; for a sample variance in the strict sense you divide by the sample size minus one instead (1 in this case).
If you take the square root of the variance you get the sample standard deviation.
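As a rough check on which divisor the spreadsheet used, here is a small Python sketch that works backwards from the two numbers quoted above (mean 8.005, variance 21.84605, sample size 2). It only uses those quoted values; the two underlying lengths it prints are inferred, not read from the raw data.

```python
import math

# Values quoted in the thread for the first row of the summary sheet.
sample_mean = 8.005
sample_variance = 21.84605

# For two values a and b, the sum of squared deviations from the mean is (a - b)^2 / 2.
# Dividing by n - 1 = 1 gives variance = (a - b)^2 / 2.
# Dividing by n = 2 gives variance = (a - b)^2 / 4.
diff_if_n_minus_1 = math.sqrt(2 * sample_variance)
diff_if_n = math.sqrt(4 * sample_variance)

for label, diff in [("n - 1", diff_if_n_minus_1), ("n", diff_if_n)]:
    low, high = sample_mean - diff / 2, sample_mean + diff / 2
    print(f"divisor {label}: difference {diff:.4f} m, values {low:.3f} m and {high:.3f} m")

# Standard deviation implied by the quoted variance.
print(f"standard deviation: {math.sqrt(sample_variance):.3f} m")
```

With the n - 1 divisor the two lengths come out as 4.70 m and 11.31 m, a difference of exactly 6.61 m. Such round numbers suggest, but do not prove, that the sheet uses the n - 1 convention.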
The first file has two sheets (I often overlook those too, but I knew the whole dataset had to be in there somewhere), one is the one you posted the screenshot of, the other one contains all the 544 specimens including collection data, measurements and size estimates.
Thank you for that!
|
|
|
Post by Grey on Jun 9, 2015 16:42:03 GMT 5
Thanks !
I find it curious that the largest specimen found in the dataset was found earlier in the sample of the Gatun Formation.
I'd like to know why Catalina told me there are specimens between 18-19 m in the dataset; earlier she told me the maximum in the paper was going to be 18 m (which is the case). Maybe she is confusing it with other data (?).
|
|
|
Post by coherentsheaf on Jun 9, 2015 16:43:21 GMT 5
Grey, the size is somewhere in the table in the link: if you open the first file, at the lower left you can switch to the sheet with the raw data. I went through it and could not find any size larger than 17.9 m.
Anyway, since there is discussion of the term variance in the thread, I will give a few notes on what this term is useful for. Variance can be used to give probability estimates of how much a given measurement can deviate from the mean. A general way to do this is Chebyshev's inequality: en.wikipedia.org/wiki/Chebyshev%27s_inequality
This roughly states that the probability that a measurement deviates from the mean value by more than k*sqrt(variance) is smaller than or equal to 1/k². For example, if the mean in our sample is 10 m and the variance is 11.45, then the probability of a deviation larger than 3*sqrt(variance) = 10.15 m is at most 1/9. You will notice that this bound is not very good: from one of these samples we cannot rule out with more than 88.9% probability that a given Megalodon is larger than 20 m.
If we know that the population is normally distributed (bell shaped) we can say much more. Then large deviations become rare. The probabilities have no closed-form formula in this case, but the values have been tabulated by computer: www.mathsisfun.com/data/standard-normal-distribution-table.html
We know empirically that the size distribution is not a normal distribution in this case, but we can make an educated guess that among grown-ups it quite resembles one. This is because the upper half of the probability density estimate looks like a bell curve, and there is a strong theoretical justification for it in the central limit theorem (size is composed of many independent genetic and environmental factors, all small, so the sum of their effects should be approximately normally distributed).
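To put numbers on the difference between the Chebyshev bound and the normal assumption, here is a minimal Python sketch using the example values from this post (mean 10 m, variance 11.45, k = 3). The function names are just illustrative. The normal tail is computed with the complementary error function rather than looked up in a table.

```python
import math

def chebyshev_bound(k: float) -> float:
    """Upper bound on P(|X - mean| >= k * SD) for any distribution with finite variance."""
    return min(1.0, 1.0 / k**2)

def normal_two_sided_tail(k: float) -> float:
    """P(|X - mean| >= k * SD) if X is exactly normally distributed."""
    return math.erfc(k / math.sqrt(2))

mean, variance = 10.0, 11.45  # example values from the post
sd = math.sqrt(variance)
k = 3

print(f"k * SD           = {k * sd:.2f} m")
print(f"Chebyshev bound  = {chebyshev_bound(k):.4f}")
print(f"normal tail prob = {normal_two_sided_tail(k):.4f}")
```

Under the normal assumption a deviation of three standard deviations has a probability of roughly 0.3%, compared with the Chebyshev guarantee of at most about 11%. That is why the shape of the distribution matters so much here.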
Since I currently do not have access to my software or the time to do it, here is an exercise for theropod: calculate the variance of the individuals with body size above the mode of the sample (or above another proposed value for adult size, if that is a better fit). We will treat these individuals as the upper half of a normal distribution representing the adults.
Now here comes some more advanced math; no worries, I will explain the takeaway in the subsequent paragraph. Centering the normal distribution at 0 and setting the standard deviation = sqrt(variance) = 1, we can reverse engineer the original variance from the variance of the truncated sample. For the standard normal distribution we know the density phi(x) = 1/((2*pi)^(1/2)) * exp(-x^2/2). The probability density conditional on being larger than zero (called f) is simply twice that on the positive half, f(x) = 2*phi(x) for x > 0, and zero elsewhere. In other words, given the information that you are in the upper half of the distribution, the relative density stays the same.
The variance of the truncated distribution is (expected value of x^2) - (mean of truncated distribution)^2. So first we calculate the mean of the truncated distribution by integrating x*f(x) over the positive numbers. A simple substitution u = x^2/2 does the trick, and we get the new mean 2/((2*pi)^(1/2)) = (2/pi)^(1/2).
To calculate the expected value of x^2 we have to integrate x^2*f(x) over the positive numbers. Fortunately both x^2 and phi are symmetric, meaning that phi(x) = phi(-x) and x^2 = (-x)^2, which implies that the integral of x^2*phi(x) over the positive numbers is equal to the integral over the negative numbers. It follows that
integral of x^2*f(x) over the positive numbers = 2 * (integral of x^2*phi(x) over the positive numbers) = integral of x^2*phi(x) over all numbers = 1,
which is known because it is simply the variance of the normal distribution with mean 0 and variance 1. Putting the two together we get a variance of 1 - ((2/pi)^(1/2))^2 = 1 - 2/pi.
This means the variance of the truncated distribution is 1-2/pi (about 0.363) times the original, or conversely the variance of the original distribution is 1/(1-2/pi) (about 2.75) times that of the truncated distribution. So when we have the variance of the upper half we can reconstruct the variance of the full normal distribution by simply multiplying by this constant. We can plausibly extract the former from the data, and we will get a very good guess at the actual size distribution for grown-up Megalodon (!), which is pretty cool I think.
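For those who would rather check the derivation numerically than follow the integrals, here is a rough Python sketch. It integrates the half-normal density with a plain midpoint rule (my choice for simplicity, not part of the argument above) and compares the results with sqrt(2/pi) and 1 - 2/pi.

```python
import math

def standard_normal_density(x: float) -> float:
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def half_normal_density(x: float) -> float:
    """Density of a standard normal conditioned on x > 0 (called f in the post)."""
    return 2 * standard_normal_density(x) if x > 0 else 0.0

def integrate(func, lower: float, upper: float, steps: int = 100_000) -> float:
    """Plain midpoint rule; crude but adequate for these smooth integrands."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))

upper_limit = 12.0  # the density beyond 12 standard deviations is negligible

mean_truncated = integrate(lambda x: x * half_normal_density(x), 0.0, upper_limit)
second_moment = integrate(lambda x: x * x * half_normal_density(x), 0.0, upper_limit)
variance_truncated = second_moment - mean_truncated**2

print(f"mean of upper half:     {mean_truncated:.6f} (sqrt(2/pi) = {math.sqrt(2 / math.pi):.6f})")
print(f"variance of upper half: {variance_truncated:.6f} (1 - 2/pi = {1 - 2 / math.pi:.6f})")
print(f"correction factor 1 / (1 - 2/pi) = {1 / (1 - 2 / math.pi):.4f}")
```

The numerical mean and variance agree with the closed-form values to several decimal places, and the correction factor comes out at about 2.75.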
|
|
|
Post by theropod on Jun 9, 2015 20:06:00 GMT 5
I’ll try to understand the mathematical stuff later, but the variance of megalodon specimens above 10.5m is ~3.635.
|
|
|
Post by coherentsheaf on Jun 9, 2015 20:16:06 GMT 5
Ok, great, this would give us a variance of about 10 for the distribution of adults. From this we get an SD of 3.16 and individuals of 16.85 m as +2 SD individuals, meaning theoretically only 2.3% of the animals should be this large or larger. This is empirically somewhat too low, just as I expected... more on this later.
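The arithmetic behind those figures can be sketched in a few lines of Python. One assumption is mine: I centre the adult distribution on the 10.5 m cutoff mentioned above. With that centre the +2 SD size comes out near 16.8 m, so the 16.85 m figure presumably used a slightly different centre.

```python
import math

truncated_variance = 3.635  # variance of specimens above 10.5 m, as reported in the thread
centre = 10.5               # assumption: the cutoff is treated as the adult mean

# Scale the half-distribution variance up to the full normal distribution.
correction = 1 / (1 - 2 / math.pi)
adult_variance = truncated_variance * correction
adult_sd = math.sqrt(adult_variance)
plus_two_sd = centre + 2 * adult_sd

# One-sided normal tail: probability of being at least 2 SD above the mean.
tail_beyond_two_sd = 0.5 * math.erfc(2 / math.sqrt(2))

print(f"adult variance = {adult_variance:.2f}")
print(f"adult SD       = {adult_sd:.2f} m")
print(f"+2 SD size     = {plus_two_sd:.2f} m")
print(f"P(>= +2 SD)    = {tail_beyond_two_sd:.4f}")
```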
|
|
|
Post by theropod on Jun 9, 2015 20:29:28 GMT 5
btw: There is some variation, notably the largest few categories are a bit better-sampled than expected. Perhaps some minor collection bias somewhere.
And as for the maximum size, I can confirm that the largest specimen is 17.9m (or rather, my computer confirms it, so I can’t just have overlooked something):
> max(megdata$tl)
[1] 17.9
What Pimiento wrote wasn’t very specific, she must have meant something else (probably along the lines of knowing that some other teeth that were not sampled because they weren’t properly measured, in private collections, only floating around the internet etc. would be in that size range).
|
|
|
Post by coherentsheaf on Jun 9, 2015 20:32:07 GMT 5
Look at the spike between 17 and 18m, it is glorious. I think there is something going on!
|
|
|
Post by coherentsheaf on Jun 9, 2015 20:41:53 GMT 5
theropod, can you do a similar bar graph, 7.5m+ with somewhat wider bars?
|
|
|
Post by Grey on Jun 9, 2015 21:02:38 GMT 5
"Look at the spike between 17 and 18m, it is glorious. I think there is something going on!"
Ahah, what do you mean? I have to admit I'm just f**king lost here!
|
|
|
Post by theropod on Jun 9, 2015 21:02:48 GMT 5
Here you go: [bar graph not preserved] or a little wider than that: [bar graph not preserved]
|
|