### Post by coherentsheaf on May 3, 2013 2:03:28 GMT 5

Here is the regression analysis I did back then (I removed some typos and unnecessary statements).

Some time ago, theropod noted large variance in the sperm whale measurements used to construct the regression equation that was in turn used to estimate Livyatan's body length.

In this post I will try to quantify the consequences of this variance.

First, I fitted the same linear model as the authors of the paper (regressing BL-CBL on BZW). Here are the corresponding QQ plot and Tukey-Anscombe plot:

[Image: QQ plot of the residuals]

This suggests that the errors are approximately normally distributed.
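A minimal R sketch of this check, assuming the measurements from the table in this post are typed in directly (instead of being read from livyatan.txt) and that the model is the one used in the code, BL.CBL ~ BZW. It draws the normal QQ plot of the standardized residuals, which is the same plot as the second panel of plot(fit). The Shapiro-Wilk test at the end is an extra, not part of the original analysis, and only complements the visual check.

```r
# Sperm whale measurements from the table in this post (cm)
livyatan <- data.frame(
  BZW    = c(170, 220, 200, 190, 125, 180, 160, 130, 165, 165, 130, 110, 126, 115, 120),
  BL.CBL = c(1140, 970, 1110, 1045, 745, 970, 970, 830, 995, 880, 710, 660, 703, 687, 670)
)

# Same model as in the paper: BL - CBL regressed on bizygomatic width
fit <- lm(BL.CBL ~ BZW, data = livyatan)

# Normal QQ plot of the standardized residuals; points lying close to the
# reference line are consistent with normally distributed errors
std_res <- rstandard(fit)
qqnorm(std_res, main = "Normal QQ plot of standardized residuals")
qqline(std_res)

# Optional formal check (not in the original analysis): Shapiro-Wilk test
normality <- shapiro.test(residuals(fit))
print(normality)
```

With only 15 whales, neither the plot nor the test can detect mild departures from normality, so both are a plausibility check rather than proof.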

[Image: Tukey-Anscombe plot (residuals vs. fitted values)]

This suggests that the error variance is constant and that the errors have expected value 0.
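The Tukey-Anscombe plot is simply the residuals plotted against the fitted values (the first panel of plot(fit)). The sketch below, again assuming the table's values are typed in directly, draws it; the lowess smoother is an added visual aid, not part of the original plot. Note that with an intercept in the model the residuals always average exactly 0, so the plot is judged by the absence of trends or funnel shapes rather than by their mean.

```r
# Sperm whale measurements from the table in this post (cm)
livyatan <- data.frame(
  BZW    = c(170, 220, 200, 190, 125, 180, 160, 130, 165, 165, 130, 110, 126, 115, 120),
  BL.CBL = c(1140, 970, 1110, 1045, 745, 970, 970, 830, 995, 880, 710, 660, 703, 687, 670)
)
fit <- lm(BL.CBL ~ BZW, data = livyatan)

# Tukey-Anscombe plot: residuals against fitted values. A structureless band
# around 0 with roughly constant spread supports E(error) = 0 and constant
# variance; a funnel or a curve would argue against these assumptions.
plot(fitted(fit), residuals(fit),
     xlab = "Fitted BL - CBL (cm)", ylab = "Residual (cm)",
     main = "Tukey-Anscombe plot")
abline(h = 0, lty = 2)

# Smoother that makes any systematic trend in the residuals easier to see
lines(lowess(fitted(fit), residuals(fit)), col = "red")

# OLS residuals of a model with intercept average to 0 by construction
mean_resid <- mean(residuals(fit))
```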

Since the errors are also uncorrelated, the standard linear-model assumptions hold, so we can analyze the regression equation statistically instead of just giving a point prediction.

In particular, we can compute prediction intervals: ranges into which Livyatan's body length falls with a given probability (assuming Livyatan has the same body proportions as a sperm whale).
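For reference, this is the standard prediction interval for a new observation in simple linear regression, which is what R's predict() computes with interval = "prediction". Here x is BZW, y is BL-CBL, n is the number of whales and x_0 = 197 cm is the value the code uses for Livyatan; the code then adds 294 cm (presumably Livyatan's CBL) to both limits to turn them into body lengths.

```latex
\hat y_0 \pm t_{n-2,\,1-\alpha/2}\;\hat\sigma\,
\sqrt{1+\frac{1}{n}+\frac{(x_0-\bar x)^2}{\sum_{i=1}^{n}(x_i-\bar x)^2}},
\qquad \hat y_0=\hat\beta_0+\hat\beta_1 x_0,
\qquad \hat\sigma^2=\frac{1}{n-2}\sum_{i=1}^{n}\bigl(y_i-\hat\beta_0-\hat\beta_1 x_i\bigr)^2
```

The leading 1 under the square root accounts for the scatter of an individual whale around the regression line; dropping it gives the narrower confidence interval for the mean BL-CBL at x_0 instead.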

The results are:

Livyatan's body length falls into the range of 1141 cm to 1557 cm with a probability of 95%.

Livyatan's body length falls into the range of 1219 cm to 1479 cm with a probability of 80%.
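A sketch that reproduces these two intervals, assuming the table's values are typed in directly and using the same constants as the code (BZW = 197 cm for Livyatan, 294 cm added afterwards). As a cross-check of predict(), it also recomputes the 95% interval by hand from the textbook formula; the variable names are illustrative only.

```r
# Sperm whale measurements from the table in this post (cm)
livyatan <- data.frame(
  BZW    = c(170, 220, 200, 190, 125, 180, 160, 130, 165, 165, 130, 110, 126, 115, 120),
  BL.CBL = c(1140, 970, 1110, 1045, 745, 970, 970, 830, 995, 880, 710, 660, 703, 687, 670)
)
fit <- lm(BL.CBL ~ BZW, data = livyatan)

x0  <- 197  # bizygomatic width used for Livyatan in the original code (cm)
cbl <- 294  # constant added in the original code to turn BL - CBL into BL (cm)

# Prediction intervals from predict(), shifted by the CBL constant
pi95 <- predict(fit, data.frame(BZW = x0), interval = "prediction", level = 0.95) + cbl
pi80 <- predict(fit, data.frame(BZW = x0), interval = "prediction", level = 0.80) + cbl

# The same 95% interval computed by hand from the textbook formula
n     <- nrow(livyatan)
x     <- livyatan$BZW
sigma <- summary(fit)$sigma  # residual standard error
se    <- sigma * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))
y0    <- unname(coef(fit)[1] + coef(fit)[2] * x0) + cbl
manual95 <- y0 + c(-1, 1) * qt(0.975, df = n - 2) * se

print(round(pi95))
print(round(pi80))
```

Both intervals are centred on the same fitted value, about 1349 cm, which is the midpoint of each of the two ranges in the results.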

However, further inspection of the data shows that one of the points (the second male in the table, with BZW = 220 cm) has a very large Cook's distance.
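Cook's distance measures how much the whole fitted line moves when a single observation is left out. A minimal sketch with cooks.distance() from base R, again typing in the table's values. The cutoff of 1 is only one common rule of thumb and is an assumption here, since the post does not say which threshold was used.

```r
# Sperm whale measurements from the table in this post (cm)
livyatan <- data.frame(
  BZW    = c(170, 220, 200, 190, 125, 180, 160, 130, 165, 165, 130, 110, 126, 115, 120),
  BL.CBL = c(1140, 970, 1110, 1045, 745, 970, 970, 830, 995, 880, 710, 660, 703, 687, 670)
)
fit <- lm(BL.CBL ~ BZW, data = livyatan)

# Cook's distance of every observation
cd <- cooks.distance(fit)
print(round(cd, 3))

# Most influential observation: row 2, the male with the widest skull (BZW = 220 cm),
# which combines high leverage with a large negative residual
worst <- which.max(cd)

# Observations above the rule-of-thumb cutoff of 1
flagged <- which(cd > 1)

# Residuals vs. leverage with Cook's distance contours (fourth panel of plot(fit))
plot(fit, which = 5)
```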

I decided to redo the analysis without this data point as well.

The results were:

The point estimate for Livyatan's body length becomes 1416 cm.

The 95% prediction interval is 1254 to 1577 cm.

The 80% prediction interval is 1315 to 1516 cm.

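So excluding this one unusual data point gives larger and more precise estimates for Livyatan: the point estimate moves from about 1349 cm (the midpoint of the earlier intervals) to 1416 cm, and the 95% prediction interval narrows from 416 cm to 323 cm in width.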
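A sketch comparing the two fits side by side, assuming the table's values are typed in directly, the same constants as the code (BZW = 197 cm, plus 294 cm) and that the excluded point is row 2, as in the code's [-2]. interval_bl is a hypothetical helper introduced only for this comparison.

```r
# Sperm whale measurements from the table in this post (cm)
livyatan <- data.frame(
  BZW    = c(170, 220, 200, 190, 125, 180, 160, 130, 165, 165, 130, 110, 126, 115, 120),
  BL.CBL = c(1140, 970, 1110, 1045, 745, 970, 970, 830, 995, 880, 710, 660, 703, 687, 670)
)
x0  <- 197  # Livyatan BZW used in the original code (cm)
cbl <- 294  # constant added in the original code (cm)

fit  <- lm(BL.CBL ~ BZW, data = livyatan)        # all 15 whales
fit2 <- lm(BL.CBL ~ BZW, data = livyatan[-2, ])  # without the high-Cook's-distance point

newdata <- data.frame(BZW = x0)

# Hypothetical helper: prediction interval for total body length (BL - CBL plus CBL)
interval_bl <- function(model, level) {
  predict(model, newdata, interval = "prediction", level = level)[1, ] + cbl
}

comparison <- rbind(
  all_95     = interval_bl(fit, 0.95),
  without_95 = interval_bl(fit2, 0.95),
  all_80     = interval_bl(fit, 0.80),
  without_80 = interval_bl(fit2, 0.80)
)
# Interval width, to compare the precision of the two fits
comparison <- cbind(comparison, width = comparison[, "upr"] - comparison[, "lwr"])
print(round(comparison))
```

Dropping the point both raises the fitted value and shrinks the residual standard error, which is why the intervals shift upward and become narrower at the same time.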

As always, here is my R code. First, save the sperm whale measurements as a file named livyatan.txt so that they can be loaded in RStudio. BL is body length, CBL condylobasal length and BZW bizygomatic width, all in cm:

```
Sex BL CBL BL-CBL BZW
M 1630 490 1140 170
M 1440 470 970 220
M 1560 450 1110 200
M 1460 415 1045 190
M 1000 255 745 125
M 1400 430 970 180
M 1280 310 970 160
M 1150 320 830 130
M 1360 365 995 165
M 1220 340 880 165
F 970 260 710 130
F 890 230 660 110
F 950 247 703 126
F 930 243 687 115
F 880 210 670 120
```
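If you prefer not to copy the table by hand, a sketch like the following can write livyatan.txt from R, assuming the file should go into the current working directory. It also shows why the regression code refers to BL.CBL: read.table() converts the header name BL-CBL into a syntactically valid R name.

```r
# Build the table from this post; check.names = FALSE keeps names exactly as given
whales <- data.frame(
  Sex = c(rep("M", 10), rep("F", 5)),
  BL  = c(1630, 1440, 1560, 1460, 1000, 1400, 1280, 1150, 1360, 1220, 970, 890, 950, 930, 880),
  CBL = c(490, 470, 450, 415, 255, 430, 310, 320, 365, 340, 260, 230, 247, 243, 210),
  check.names = FALSE
)
# BL-CBL computed from the two length columns (matches the table)
whales[["BL-CBL"]] <- whales$BL - whales$CBL
whales$BZW <- c(170, 220, 200, 190, 125, 180, 160, 130, 165, 165, 130, 110, 126, 115, 120)

# Whitespace-separated file with a header line, as read.table(header = TRUE) expects
write.table(whales, "livyatan.txt", quote = FALSE, row.names = FALSE)

# Reading it back: with the default check.names = TRUE, "BL-CBL" becomes "BL.CBL"
livyatan <- read.table("livyatan.txt", header = TRUE)
print(names(livyatan))
```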


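Then run the following R script:

```r
# Read the measurements; read.table() turns the column name BL-CBL into BL.CBL
livyatan <- read.table("livyatan.txt", header = TRUE)

# Regress BL - CBL on bizygomatic width (BZW), as in the paper
fit <- lm(BL.CBL ~ BZW, data = livyatan)
plot(fit)     # residuals vs. fitted (Tukey-Anscombe), QQ plot, scale-location, residuals vs. leverage
summary(fit)

# Livyatan's bizygomatic width (cm)
livyatan.BZW <- data.frame(BZW = 197)

# Prediction intervals for BL - CBL; adding 294 cm (presumably Livyatan's CBL) gives body length
predict(fit, livyatan.BZW, interval = "prediction", level = 0.95) + 294
predict(fit, livyatan.BZW, interval = "prediction", level = 0.8) + 294
predict(fit, livyatan.BZW, interval = "prediction", level = 0.4) + 294

# Refit without observation 2, the point with the very large Cook's distance
fit2 <- lm(BL.CBL ~ BZW, data = livyatan[-2, ])
plot(fit2)
summary(fit2)
predict(fit2, livyatan.BZW, interval = "prediction", level = 0.95) + 294
predict(fit2, livyatan.BZW, interval = "prediction", level = 0.8) + 294
```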
