Between-subjects

In these simple and flexible designs the experimental  units (e.g. cages of animals, litters, or individual animals) are assigned  to the treatments at random, regardless of their characteristics.
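A minimal sketch of such a random assignment, assuming 36 experimental units numbered 1 to 36 and three treatments with equal group sizes. The function name `randomise`, the treatment labels and the seed are hypothetical and are not taken from any particular study.

```python
import random


def randomise(units, treatments, seed=None):
    """Assign units to treatments completely at random, in equal numbers."""
    units = list(units)
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    rng.shuffle(units)
    per_group = len(units) // len(treatments)
    # Consecutive slices of the shuffled list form the treatment groups.
    return {
        treatment: sorted(units[i * per_group:(i + 1) * per_group])
        for i, treatment in enumerate(treatments)
    }


# Hypothetical example: 36 animals, three treatments labelled A, B and C.
allocation = randomise(range(1, 37), ["A", "B", "C"], seed=1)
for treatment, animals in allocation.items():
    print(treatment, animals)
```

The same shuffling idea could be used to put the animals in a random order for housing, treatment and measurement.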

If the experimental material is heterogeneous (e.g. animals vary a lot in weight or age) a randomised block design might be better. If using mice or rats, isogenic strains should be used where possible, and animals of all species should be free of disease and matched for age/weight. They should all have been reared under identical conditions, and once they have been assigned to their treatments they should be housed, treated and measured in random order (i.e. not by treatment group).

Assuming a measurement outcome, these designs are preferably analysed using a one-way analysis of variance (ANOVA), provided the assumptions are met.

Two numerical examples are given with the data being analysed by an ANOVA. These involve real data with the usual complications.

  1. In the first example there are two outliers, with discussion about how these should be treated.
  2. The second example is of a survey of mouse strain susceptibility to the induction of lung tumours, with a discussion of the problems in sample size determination, and the need for a data transformation to compensate for unequal variation in each group.

Example 1.

The data in the table below came from a preliminary study as part of an investigation into the use of laser scanning cytometry in assessing the mouse micronucleus genotoxicity assay (Styles  JA, Clark H, Festing MF, Rew DA. 2001.  Cytometry 44:153-155).

Age-matched, SPF female BALB/c mice from a commercial company were acclimatised for two weeks in groups of four per cage and then assigned at random to treatment with urethane, 3-methylcholanthrene (MCA) or saline (control) by intra-peritoneal injection. After an appropriate time the micronuclei were counted and expressed as counts per 1000 erythrocytes. Animals were assessed in random order and the investigator was blind with respect to treatment.

Micronucleus counts in mice (counts per 1000 erythrocytes)

         Control (1)   Urethane (2)   MCA (3)
         1.48          3.04           1.26
         2.23          3.48           2.13
         1.87          2.49           1.87
         1.88          3.56           0.75
         1.90          6.10           2.00
         1.59          2.81           1.98
         1.23          3.85           2.27
         2.06          2.90           2.34
         2.15          5.39           1.55
         0.66          1.57           2.26
         0.33          2.15           1.76
         1.22          2.34           0.83
Means    1.55          3.31           1.75

In this case the aim is to see whether the mean micronucleus counts  differ between the three groups. This can be assessed using a one-way  analysis of variance followed by an appropriate post-hoc comparison to see  which of the carcinogen treated groups differ from the control group. Note  that the mouse is the experimental unit (since mice were individually  treated) but the mice were housed four per cage. There may have been some cage effects, but these have been ignored in this analysis.

A dotplot showing individual observations is given below. From this, there does not appear to be much difference between groups 1 and 3, and group 2 seems to be more variable due largely to two outliers with very high counts.

Such outliers should always be checked against the original data in case they are due to transcription errors. In this case they are valid.

The problem with outliers is that they inflate the standard deviation, so reducing the power of the experiment, and may bias individual treatment means. One strategy for dealing with them is to do the analysis both with and without them to see if it makes any difference to the conclusions. For the moment they have been kept.

[image2: dotplot of micronucleus counts by group]

 

The next step is to do a preliminary ANOVA to see if the assumptions underlying the ANOVA are met. These assumptions are that the residuals (the deviation of each observation from its group mean) have a normal distribution, and that the variation is the same in each group. MINITAB version 14 produces Residual Plots, as shown here, which can be used to study these assumptions.

The Normal Probability Plot should give a straight line if the residuals have a normal distribution. In this case the two outliers show up clearly, and the  residuals deviate slightly from a straight line, due largely  to the two outliers.  It is possible to do a formal statistical test of the normality of the residuals, but this is probably too sensitive. The  ANOVA is quite tolerant of deviations from the two assumptions. The plot of Residuals Versus Fitted Values (top right) shows the residuals plotted individually  against the fits (the group means). Again the two outliers show up clearly,  with group 2 being a bit more variable than the other two groups. The  histogram of the residuals is generally not very helpful with such small  numbers, and the Residuals Versus Order plot should show no obvious pattern,  as in this case.

[image4: MINITAB residual plots]
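The residuals behind these plots can also be inspected directly. The following is a minimal sketch using only the Python standard library and the micronucleus counts from the table above. It lists the two largest residuals and the standard deviation of each group, and is not a substitute for the MINITAB plots.

```python
from statistics import mean, stdev

# Micronucleus counts per 1000 erythrocytes, from the table in Example 1.
counts = {
    "control": [1.48, 2.23, 1.87, 1.88, 1.90, 1.59, 1.23, 2.06, 2.15, 0.66, 0.33, 1.22],
    "urethane": [3.04, 3.48, 2.49, 3.56, 6.10, 2.81, 3.85, 2.90, 5.39, 1.57, 2.15, 2.34],
    "MCA": [1.26, 2.13, 1.87, 0.75, 2.00, 1.98, 2.27, 2.34, 1.55, 2.26, 1.76, 0.83],
}

# A residual is the deviation of an observation from its own group mean.
residuals = [
    (obs - mean(values), group, obs)
    for group, values in counts.items()
    for obs in values
]

# The two residuals that are largest in absolute size.
largest = sorted(residuals, key=lambda r: abs(r[0]), reverse=True)[:2]
for resid, group, obs in largest:
    print(f"{group}: observation {obs:.2f}, residual {resid:+.2f}")

# Standard deviation within each group, as a rough check on equal variation.
group_sd = {group: stdev(values) for group, values in counts.items()}
for group, sd in group_sd.items():
    print(f"{group}: SD = {sd:.2f}")
```

Both of the largest residuals come from the urethane group, which is why that group looks more variable than the other two.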

 

An ANOVA and Dunnett's test (the test appropriate for comparing treated  means with a control mean)  which includes the two outliers is given below. It shows that there are highly significant differences in mean  micronucleus counts among the three groups (p<0.0005), but Dunnett's test  shows that whereas the controls differ from group 2 (urethane), they do not differ from group 3 at the 5% level of significance.

A re-analysis of the data excluding the two outliers (not shown here)  results in exactly the same conclusions, so the two outliers can be retained in this case as removing them would make no difference to the conclusions.

One-way ANOVA: Micronuclei versus Group

Source   DF      SS      MS      F      P
Group     2  22.196  11.098  14.00  0.000
Error    33  26.165   0.793
Total    35  48.361

Pooled StDev = 0.8904
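The sums of squares, mean squares and F ratio in this table can be checked by hand. The sketch below uses only the Python standard library and the counts from the table of micronucleus data. The function name `one_way_anova` is hypothetical. The p-value is not computed, because the standard library has no F distribution; a statistics package would be needed for that.

```python
from math import sqrt
from statistics import mean

# Micronucleus counts per 1000 erythrocytes, from the table in Example 1.
counts = {
    "control": [1.48, 2.23, 1.87, 1.88, 1.90, 1.59, 1.23, 2.06, 2.15, 0.66, 0.33, 1.22],
    "urethane": [3.04, 3.48, 2.49, 3.56, 6.10, 2.81, 3.85, 2.90, 5.39, 1.57, 2.15, 2.34],
    "MCA": [1.26, 2.13, 1.87, 0.75, 2.00, 1.98, 2.27, 2.34, 1.55, 2.26, 1.76, 0.83],
}


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a dict of groups."""
    all_obs = [x for values in groups.values() for x in values]
    grand_mean = mean(all_obs)
    # Between-group SS: group size times squared deviation of each group mean.
    ss_group = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group (error) SS: squared residuals about each group mean.
    ss_error = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_group = len(groups) - 1
    df_error = len(all_obs) - len(groups)
    ms_group = ss_group / df_group
    ms_error = ss_error / df_error
    return {
        "df_group": df_group, "df_error": df_error,
        "ss_group": ss_group, "ss_error": ss_error,
        "ms_group": ms_group, "ms_error": ms_error,
        "F": ms_group / ms_error,
        "pooled_sd": sqrt(ms_error),
    }


table = one_way_anova(counts)
for name, value in table.items():
    print(f"{name:10s} {value:.4f}")
```

The pooled standard deviation is the square root of the error mean square, which is how the value 0.8904 in the MINITAB output arises.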

Dunnett's test is the post-hoc test appropriate for comparing the treatment means with a control mean. The output from MINITAB first gives the family error rate as 0.05, which means that a false positive result (claiming a difference due to a treatment when it is only due to chance) will be produced in only 5% of experiments. Because more than one comparison is being made, the individual error rate for each comparison has to be lower than this, and MINITAB sets it at 0.0272. The critical value (2.31) is the multiple of the standard error of a difference by which a treatment mean must differ from the control mean to be significant at the family error rate of 0.05. The output then consists of the mean difference (Center) and the 95% confidence interval (CI) for that mean difference. If the CI does not span zero (i.e. the lower and upper limits have the same sign), then the difference is significant (as with level 2).

Dunnett's comparisons with a control
Family error rate = 0.05
Individual error rate = 0.0272
Critical value = 2.31
Control = level (1) of Group
Confidence (95%) intervals for treatment mean minus control mean

Level    Lower   Center   Upper
2       0.9167   1.7567  2.5966
3      -0.6399   0.2000  1.0399
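As a rough check on these intervals, the sketch below rebuilds them from figures quoted in the MINITAB output: the critical value, the pooled standard deviation, the group size of 12 and the two mean differences. It assumes the usual form of the interval, difference plus or minus critical value times the standard error of a difference. The critical value itself is taken from the output and not computed. Because the critical value is quoted to only two decimal places, the limits agree with MINITAB to about three decimal places.

```python
from math import sqrt

# Values quoted in the MINITAB output above.
critical_value = 2.31
pooled_sd = 0.8904
n_per_group = 12
differences = {"2 (urethane)": 1.7567, "3 (MCA)": 0.2000}

# Standard error of the difference between two group means of equal size.
se_difference = pooled_sd * sqrt(1 / n_per_group + 1 / n_per_group)

# Smallest difference from control that would be declared significant.
half_width = critical_value * se_difference

intervals = {
    level: (diff - half_width, diff + half_width)
    for level, diff in differences.items()
}
for level, (lower, upper) in intervals.items():
    verdict = "significant" if lower > 0 or upper < 0 else "not significant"
    print(f"Level {level}: {lower:.3f} to {upper:.3f} ({verdict})")
```

On these figures a treatment mean has to differ from the control mean by about 0.84 counts per 1000 erythrocytes before the interval excludes zero.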

Example 2.

The aim of this "experiment" was to determine the susceptibility of a number of inbred strains and F1 hybrids of mice to the development of lung tumours following treatment with urethane. Susceptible and resistant strains were later used in research to identify quantitative trait loci controlling susceptibility and in studying the effect of anti-oxidants in preventing the development of cancer. It was already known that strain A/J is susceptible and C57BL/6 is resistant, and these two strains were included for comparative purposes (actually C57BL/6-+/Lprob heterozygous obese mice were used, as these were more readily available and were assumed to be identical to C57BL/6 in this respect).

Although it has the appearance of being an experiment (and legally was an experiment), because it was conducted in the laboratory and involved treating mice with a carcinogen, it was really a statistical survey because "strain" cannot be assigned at random to a mouse. Moreover, the purpose of the study was not to test hypotheses, but rather to characterise experimental material.

An appropriate way of determining sample size for a study of this sort is not readily available. Power analysis is not appropriate because the aim was not to test whether strains differ: it was already known that they do. The resource equation method suggests that three mice per strain would be adequate, but each strain mean would then be rather poorly estimated. Another complication was that C57BL/6 mice were already known to have a very low tumour count and strain A/J a high count, with a strong correlation between mean and standard deviation. This means that a pooled standard deviation based on the raw data would not be appropriate. Some of the more advanced statistical packages for power analysis will calculate the sample size required to produce a confidence interval of a given width, but specifying this for ten groups could be difficult. So a rather subjective choice of eight mice per strain was made. Had the outcome been the percentage of mice getting a tumour rather than tumour counts, much larger sample sizes would have been required.
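The resource equation arithmetic can be sketched as follows. This assumes the equation in its usual form for a one-way layout, E = (total number of animals) - (number of groups), with E between about 10 and 20 regarded as adequate. Neither the formula nor that range is stated in the text above, so both are assumptions here, and the function name `error_df` is hypothetical.

```python
def error_df(n_groups, n_per_group):
    """Error degrees of freedom E for a one-way layout (assumed form: E = N - T)."""
    total_animals = n_groups * n_per_group
    return total_animals - n_groups


# Ten strains, with two, three or eight mice per strain.
for n_per_group in (2, 3, 8):
    print(f"{n_per_group} mice per strain: E = {error_df(10, n_per_group)}")
```

With ten strains, three mice per strain gives E = 20, while the eight per strain actually used gives E = 70. The larger number was chosen to estimate each strain mean more precisely, not to gain power.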

Eight animals of each strain, matched as closely as possible for age, were treated with urethane by i.p. injection and kept for about six months. They were then humanely killed and the number of tumours (which were only a couple of mm in diameter) on the surface of the lungs was counted. The counts are presented in the table.

Lung tumour counts in mice following treatment with urethane

A2GxICFW  A2GxHTI  A2G-hr  NMRI  A2G  HTI  B6+/Lprob  A/J  ICFW  A2GxNMRI
       3       11      11     3   20    2          0   34     4         5
       2       11      17     0   21    3          1   12     0         6
       2        3      19     5    2    2          0    8     0         4
       4        5      27     1   16    5          0   19     0         7
       2        5      16     6   23    1          0   12     4         6
       1        3      13     2   14    0          1   21     0         9
       1        4      18     3   21    0          1   13     0        10
       0        4      19     3   22    1          0    8     0        11

Unpublished data, M. Festing, dating from 1977

A boxplot (MINITAB 14) provides a useful graphical summary of the results. The horizontal bar in the middle of the box is drawn at the median, the box covers the inter-quartile range, and each whisker extends to the lowest or highest value lying within 1.5 x the inter-quartile range of the box. An asterisk indicates an outlier outside this range.

[image5: boxplot of lung tumour counts by strain]

Such a graphical summary may be  sufficient, without further statistical analysis. It clearly shows the  susceptible and resistant  inbred strains, with two of the F1 hybrids being intermediate and one being resistant.

However, if it is necessary to decide which strains differ, then an ANOVA and post-hoc comparisons will be needed. In view of the heterogeneity of variation between strains, a transformation of scale will also be necessary. Counts of this sort often have a Poisson distribution, for which a square root transformation of X+1 is usually appropriate. Such an analysis (not shown) finds, for example, no significant difference between A, A2G and A2G-hr. Finally, a table showing means and standard deviations of the raw, untransformed data could be presented as shown below.

Mean tumour counts in 10 strains of mice

Strain      N   Mean   StDev
A           8   15.9    8.68
A2G         8   17.4    6.93
A2G-hr      8   17.5    4.78
A2GxHTI     8   5.75    3.33
A2GxICFW    8   1.88    1.25
A2GxNMRI    8   7.25    2.49
B6-Lepob    8   0.38    0.52
HTI         8   1.75    1.67
ICFW        8   1.00    1.85
NMRI        8   2.88    1.96
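The means and standard deviations in this table, and the effect of the square root of X+1 transformation on the spread of the standard deviations, can be checked with a short sketch using only the Python standard library. The counts are those in the table of lung tumour counts, with the strain names used there. The ratio of largest to smallest standard deviation is used here only as a crude, informal indicator of heterogeneity; it is not a formal test.

```python
from math import sqrt
from statistics import mean, stdev

# Lung tumour counts, eight mice per strain, from the table in Example 2.
tumours = {
    "A2GxICFW": [3, 2, 2, 4, 2, 1, 1, 0],
    "A2GxHTI": [11, 11, 3, 5, 5, 3, 4, 4],
    "A2G-hr": [11, 17, 19, 27, 16, 13, 18, 19],
    "NMRI": [3, 0, 5, 1, 6, 2, 3, 3],
    "A2G": [20, 21, 2, 16, 23, 14, 21, 22],
    "HTI": [2, 3, 2, 5, 1, 0, 0, 1],
    "B6+/Lprob": [0, 1, 0, 0, 0, 1, 1, 0],
    "A/J": [34, 12, 8, 19, 12, 21, 13, 8],
    "ICFW": [4, 0, 0, 0, 4, 0, 0, 0],
    "A2GxNMRI": [5, 6, 4, 7, 6, 9, 10, 11],
}

# Means and standard deviations on the raw scale.
raw_mean = {strain: mean(x) for strain, x in tumours.items()}
raw_sd = {strain: stdev(x) for strain, x in tumours.items()}

# Standard deviations after the square root of (X + 1) transformation.
transformed_sd = {
    strain: stdev([sqrt(count + 1) for count in x])
    for strain, x in tumours.items()
}

for strain in sorted(tumours):
    print(f"{strain:10s} mean {raw_mean[strain]:6.2f}  "
          f"SD {raw_sd[strain]:5.2f}  SD of sqrt(X+1) {transformed_sd[strain]:5.2f}")

# Ratio of largest to smallest SD before and after transformation.
raw_ratio = max(raw_sd.values()) / min(raw_sd.values())
transformed_ratio = max(transformed_sd.values()) / min(transformed_sd.values())
print(f"max/min SD: raw {raw_ratio:.1f}, transformed {transformed_ratio:.1f}")
```

The standard deviations are much closer to one another on the transformed scale, although they are still not identical.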