Blocked designs

Blocking splits the experiment into several "mini-experiments", or "blocks". Typically each block contains one experimental unit of each treatment, although it may contain more.

The designs include:

The randomised block design: splitting the experiment into a number of "mini-experiments" for convenience, to increase power, or to take account of natural structure in the available material.

The repeated measures or crossover, within-subject design: sequential treatments applied to a single animal or other subject. In this case the animal is the "block". These experiments usually assume that a treatment has no lasting effect on the animal, so that it does not influence the response to later treatments.

The Latin square design: used to split the experiment in two directions in order to further increase power or decrease sample size.

Other, more complex designs, not discussed here. They include incomplete block, Graeco-Latin square and lattice designs, and other playgrounds for professional statisticians. These are rare in animal experiments.


The randomised block design

Numerical example 1: Apoptosis in rat thymocytes
Very often the experimental material has some natural structure. The aim of the experiment described here was to see whether two drugs, designated CGP and STAU, altered apoptosis in rat thymocytes compared with a vehicle control. This involved humanely killing a rat, preparing the thymocytes, and adding them to petri dishes containing a suitable cell culture medium. Each rat yielded only enough thymocytes for three petri dishes, so this was done once a week for three weeks. Note that the randomisation is done within each block: each week the three dishes were assigned at random to the three treatments, one dish per treatment. After a suitable culture period, the amount of apoptosis in each dish was scored in random order and blind with respect to treatment.
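The within-block randomisation described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the original study: the function name `randomise_blocks`, the treatment labels and the `seed` argument are assumptions made for the example. The key point is that the treatment order is shuffled separately within each week, so every block still receives each treatment exactly once.

```python
import random

def randomise_blocks(treatments, blocks, seed=None):
    """Randomised block allocation: shuffle the treatments separately within
    each block, so that every block receives each treatment exactly once."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)   # copy, so the caller's list is not changed
        rng.shuffle(order)
        plan[block] = order        # order[i] is the treatment for dish i + 1
    return plan

treatments = ["Control", "CGP", "STAU"]
weeks = [1, 2, 3]
plan = randomise_blocks(treatments, weeks, seed=1)
for week, order in plan.items():
    dishes = ", ".join(f"dish {i + 1}: {trt}" for i, trt in enumerate(order))
    print(f"Week {week}: {dishes}")
```

Fixing the seed makes the allocation reproducible for the study record; leaving it as `None` gives a fresh allocation each time.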

The natural structure arises from the fact that individual rats may differ in their responses and/or there may be time effects associated with the in-vitro culture. This needs to be taken into account in the statistical analysis. The results are given in the table below.

Apoptosis score in rat thymocytes (columns: Week 1, Week 2, Week 3)
A bar diagram of these data is shown below. The pattern among the three treatments is the same each week, but there are large differences between weeks that affect all three treatments together. These week effects need to be removed in the statistical analysis if the effect of the drugs is to be assessed correctly. This can be done using a two-way ANOVA "without interaction".



The MINITAB output for this two-way ANOVA is given below.

Output from MINITAB two-way analysis of variance

Source   DF        SS        MS        F       P
Week      2   21764.2   10882.1   114.82   0.000
Trt       2    2129.6    1064.8    11.23   0.023
Error     4     379.1      94.8
Total     8   24272.9

S = 9.735 R-Sq = 98.44%

Note that S is the residual standard deviation, the square root of the error mean square (94.8). The R-Sq given here is the percentage of the variation accounted for by weeks and treatments together. Of more interest would be the percentage of the total variation accounted for by the treatments, which is 2129.6/24272.9 = 8.8%; the weeks account for 21764.2/24272.9 = 89.7%.
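The quantities in the MINITAB table and the note above can be checked by hand. The sketch below, assuming only the sums of squares and degrees of freedom printed in the table, recomputes each mean square, F ratio, S, R-Sq and the share of the total sum of squares due to weeks and to treatments. The p-value uses the closed form P = (1 + 2F/df2)^(-df2/2), which holds only when the numerator has 2 degrees of freedom, as both Week and Trt do here; the helper name `f_sf_df1_2` is hypothetical.

```python
import math

# Sums of squares and degrees of freedom from the MINITAB output above
ss = {"Week": 21764.2, "Trt": 2129.6, "Error": 379.1}
df = {"Week": 2, "Trt": 2, "Error": 4}

def f_sf_df1_2(f, df2):
    """Upper-tail probability of F(2, df2); valid only for 2 numerator df."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

ms_error = ss["Error"] / df["Error"]
total_ss = sum(ss.values())

results = {}
for source in ("Week", "Trt"):
    ms = ss[source] / df[source]
    f = ms / ms_error
    p = f_sf_df1_2(f, df["Error"])
    pct = 100 * ss[source] / total_ss      # share of the total sum of squares
    results[source] = {"MS": ms, "F": f, "P": p, "pct": pct}
    print(f"{source:5s} MS={ms:8.1f} F={f:7.2f} P={p:.3f} {pct:.1f}% of total SS")

s = math.sqrt(ms_error)                     # S in the MINITAB output
r_sq = 100 * (1 - ss["Error"] / total_ss)   # R-Sq for weeks and treatments together
print(f"S = {s:.3f}  R-Sq = {r_sq:.2f}%")
```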
The residual plots are shown below. This experiment is exceptionally small, but there is no evidence that the residuals depart to any great extent from normality: the normal probability plot is a reasonably good straight line, and the plot of residuals versus fits shows no particular pattern, so there is no evidence that the variation is heterogeneous.

Dunnett's test is used to see which treatment means differ significantly from the control. The MINITAB output is given below. It shows that the treatment 2 mean does not differ significantly from the control mean (p>0.05; the 95% confidence interval for the difference spans zero), but the treatment 3 mean does.

Dunnett 95.0% Simultaneous Confidence Intervals
Response Variable apoptosis
Comparisons with Control Level
Trt = 1 subtracted from:
Trt    Lower  Center   Upper
2      -8.315  18.00    44.31
3      11.352  37.7     63.98
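Each Dunnett interval is the difference from the control ± d × SE, where SE = sqrt(2 × error MS / b) is the standard error of a difference between two means each based on b = 3 weeks, and d is Dunnett's critical value for two comparisons with 4 error degrees of freedom. Rather than quote a tabulated d, the sketch below back-calculates the multiplier implied by the printed limits; it should be the same (about 3.31) for both intervals, since they are simultaneous. It also shows that the centre for treatment 3 is about 37.67 (printed as 37.7). The variable names are illustrative.

```python
import math

ms_error = 379.1 / 4   # error mean square from the two-way ANOVA (about 94.8)
n_weeks = 3            # each treatment mean is based on one dish in each week

# Standard error of the difference between two treatment means
se_diff = math.sqrt(2 * ms_error / n_weeks)

# Lower and upper limits copied from the MINITAB Dunnett output
limits = {2: (-8.315, 44.31), 3: (11.352, 63.98)}

summary = {}
for trt, (lower, upper) in limits.items():
    centre = (lower + upper) / 2
    implied_d = (upper - lower) / 2 / se_diff   # multiplier implied by the output
    spans_zero = lower < 0 < upper
    summary[trt] = (centre, implied_d, spans_zero)
    print(f"Trt {trt}: difference {centre:.2f}, SE {se_diff:.2f}, "
          f"multiplier {implied_d:.2f}, interval spans zero: {spans_zero}")
```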


Numerical example 2: A repeated measures or crossover, within-subject design

The repeated measures, within-subject design is very like a randomised block design, with the subject being the "block". Differences between subjects are a random effect, while the treatment is a fixed effect. It is analysed using a two-way ANOVA without interaction, just like a randomised block experiment, but the term "subject" (in this case "cage") is used instead of "block".

As a numerical example, the aim of this experiment was to see whether mice discriminated between a range of solutions given in their drinking water. The "subject" in this case was a cage housing two C57BL/6 mice, with two drinking bottles: one contained distilled water and the other the test solution (see table). The position of the bottles was reversed each day, and each solution was tested for one week. The percentage of fluid consumed from the bottle containing the test substance was recorded and is shown in the table.

Note that each cage had the test solutions presented in random order. Cage 1, for example, received solutions B, C, A, E and D, in that order. Thus, as in a randomised block experiment, randomisation was done within each subject.


A within-subject design. Percentage of fluid consumed from the test bottle in a two-bottle choice experiment. Each entry gives the solution tested that week and the percentage consumed from the test bottle.

Cage   Week 1   Week 2   Week 3   Week 4   Week 5
1      B 69.6   C 61.9   A 54.9   E 69.4   D 78.3
2      A 48.2   D 81.5   E 60.9   B 61.1   C 43.9
3      C 53.4   D 74.7   A 49.9   B 58.2   E 68.5
4      E 64.5   A 50.4   D 73.6   B 55.3   C 50.7

A  Control, distilled water in both bottles
B  0.02% saccharin
C  0.05M sodium chloride
D  0.04M sucrose
E  10% ethanol

Assuming the block effect (cage differences) is small, a dotplot can be used to take a preliminary look at the data, as shown below. There are no obvious outliers, so it is reasonable to go ahead with a trial analysis of variance and to look at the residual plots to check the assumptions. If cage differences were large, a dotplot that ignores cages would be less appropriate.




A preliminary ANOVA can be done to generate the residual plots and check that the assumptions are valid. These are shown below: the normal probability plot is a good straight line, and the plot of residuals versus fits shows no pattern that might be cause for concern. Accordingly, the analysis of variance should be valid; it is shown below.
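The fitted values and residuals behind these plots come from the additive (no-interaction) model, in which each fit is the cage mean plus the solution mean minus the grand mean. A minimal Python sketch, using the data from the table above and only the standard library, is given below; plotting (for example with matplotlib) is left out, and the variable names are illustrative. For a balanced design like this one, the residual sum of squares equals the error sum of squares in the ANOVA.

```python
# Percentage of fluid taken from the test bottle. One list per cage, in week
# order; each entry is (solution, percent), copied from the data table.
data = [
    [("B", 69.6), ("C", 61.9), ("A", 54.9), ("E", 69.4), ("D", 78.3)],
    [("A", 48.2), ("D", 81.5), ("E", 60.9), ("B", 61.1), ("C", 43.9)],
    [("C", 53.4), ("D", 74.7), ("A", 49.9), ("B", 58.2), ("E", 68.5)],
    [("E", 64.5), ("A", 50.4), ("D", 73.6), ("B", 55.3), ("C", 50.7)],
]

n_cages, n_treats = len(data), len(data[0])
grand_mean = sum(y for cage in data for _, y in cage) / (n_cages * n_treats)
cage_means = [sum(y for _, y in cage) / n_treats for cage in data]

treat_sums = {}
for cage in data:
    for trt, y in cage:
        treat_sums[trt] = treat_sums.get(trt, 0.0) + y
treat_means = {trt: total / n_cages for trt, total in treat_sums.items()}

# Additive (no-interaction) model: fit = cage mean + solution mean - grand mean
fits, residuals = [], []
for cage_mean, cage in zip(cage_means, data):
    for trt, y in cage:
        fit = cage_mean + treat_means[trt] - grand_mean
        fits.append(fit)
        residuals.append(y - fit)

# These (fit, residual) pairs are what a residuals-versus-fits plot displays
for fit, res in sorted(zip(fits, residuals)):
    print(f"fit {fit:6.2f}   residual {res:6.2f}")

residual_ss = sum(r * r for r in residuals)
print(f"Residual SS = {residual_ss:.2f}")   # equals the error SS of the ANOVA
```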





Analysis of Variance for percent fluid consumed

Source  DF       SS       MS      F      P
Cage     3   205.14    68.38   4.44  0.026
Treats   4  1819.17   454.79  29.53  0.000
Error   12   184.78    15.40
Total   19  2209.09

Means   N   percent
A       4    50.850   Distilled water
B       4    61.050   Saccharin (0.02%)
C       4    52.475   Salt (0.05M)
D       4    77.025   Sucrose (0.05M)
E       4    65.825   Ethanol (10%)
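The ANOVA table and means above can be reproduced from the data table with the usual sums-of-squares formulas for a two-way ANOVA without interaction. The sketch below assumes a balanced layout (each solution tested once in each cage) and uses only the Python standard library; p-values are omitted because they need the F distribution (for example `scipy.stats.f.sf`). The cage numbering follows the row order of the data table, and the variable names are illustrative.

```python
# Percentage of fluid taken from the test bottle. One list per cage, in week
# order; each entry is (solution, percent), copied from the data table.
data = [
    [("B", 69.6), ("C", 61.9), ("A", 54.9), ("E", 69.4), ("D", 78.3)],
    [("A", 48.2), ("D", 81.5), ("E", 60.9), ("B", 61.1), ("C", 43.9)],
    [("C", 53.4), ("D", 74.7), ("A", 49.9), ("B", 58.2), ("E", 68.5)],
    [("E", 64.5), ("A", 50.4), ("D", 73.6), ("B", 55.3), ("C", 50.7)],
]

n_cages = len(data)        # subjects (blocks)
n_treats = len(data[0])    # each solution appears once per cage
values = [y for cage in data for _, y in cage]
n = len(values)

# Totals by cage and by solution
cage_totals = [sum(y for _, y in cage) for cage in data]
treat_totals = {}
for cage in data:
    for trt, y in cage:
        treat_totals[trt] = treat_totals.get(trt, 0.0) + y

# Sums of squares for a two-way ANOVA without interaction
correction = sum(values) ** 2 / n
ss_total = sum(y * y for y in values) - correction
ss_cage = sum(t * t for t in cage_totals) / n_treats - correction
ss_treat = sum(t * t for t in treat_totals.values()) / n_cages - correction
ss_error = ss_total - ss_cage - ss_treat

df_cage, df_treat = n_cages - 1, n_treats - 1
df_error = df_cage * df_treat
ms_error = ss_error / df_error

f_ratios = {}
for source, d, ss in (("Cage", df_cage, ss_cage), ("Treats", df_treat, ss_treat)):
    ms = ss / d
    f_ratios[source] = ms / ms_error
    print(f"{source:7s}{d:3d}{ss:10.2f}{ms:9.2f}{f_ratios[source]:7.2f}")
print(f"{'Error':7s}{df_error:3d}{ss_error:10.2f}{ms_error:9.2f}")
print(f"{'Total':7s}{n - 1:3d}{ss_total:10.2f}")

means = {trt: total / n_cages for trt, total in sorted(treat_totals.items())}
for trt, mean in means.items():
    print(f"{trt}  {mean:.3f}")
```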

From this analysis it can be concluded that there are highly significant (p<0.0005) differences in percent fluid consumed according to the treatment. The treatment means (which in a publication should probably be presented with one decimal place) suggest that the mice consume more saccharin, sucrose and ethanol than distilled water, but hardly any more salt solution. Dunnett's test can be used to see which of these means differ from the control (not shown).


The Latin square design

This design can increase the power of the experiment above that of the randomised block design by removing two sources of variation at once, such as body weight, day or time of day, litter, or position in the animal house.

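The number of experimental units needs to be the square of the number of treatments. So with, say, four treatments the experiment will require sixteen units: sixteen animals, or, in a within-animal design such as the hypothetical example below, four animals each studied four times.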

Small Latin squares (e.g. 4x4 or smaller) may lack power. However, they can be replicated, so that a single experiment could involve, say, two 4x4 Latin squares.

Large Latin squares (e.g. above about 7x7) may be difficult to design and manage.

The design uses a randomised layout. The figure below shows a systematic (non-random) layout in which each row consists of red, green, blue and yellow, with red displaced one step to the right in each row and the colours wrapping round. Note that the design is exactly balanced, with exactly one animal of each treatment group in each row and column. A randomised design is obtained by randomising whole rows and then whole columns. This can be done with a square of any size, and it retains the balance.
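The row-then-column randomisation can be sketched as follows. This is an illustrative Python version with hypothetical function names: it builds the systematic cyclic square described above, randomises whole rows and then whole columns, and checks that each treatment still appears once in every row and column.

```python
import random

def cyclic_latin_square(treatments):
    """Systematic square: each row is the previous one shifted one place
    to the right, with the treatments wrapping round."""
    t = len(treatments)
    return [[treatments[(c - r) % t] for c in range(t)] for r in range(t)]

def randomise_square(square, seed=None):
    """Randomise whole rows, then whole columns."""
    rng = random.Random(seed)
    rows = [row[:] for row in square]
    rng.shuffle(rows)                      # permute whole rows
    cols = list(range(len(square)))
    rng.shuffle(cols)                      # permute whole columns
    return [[row[c] for c in cols] for row in rows]

def is_latin(square):
    """True if every treatment appears exactly once in each row and column."""
    t = len(square)
    symbols = set(square[0])
    if len(symbols) != t:
        return False
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = all({row[c] for row in square} == symbols for c in range(t))
    return rows_ok and cols_ok

systematic = cyclic_latin_square(["A", "B", "C", "D"])
layout = randomise_square(systematic, seed=2024)
for row in layout:
    print(" ".join(row))
```

For the hypothetical primate example, the rows of `layout` would correspond to animals and the columns to time-of-day slots. Permuting the rows and columns of a single starting square reaches only the squares that can be derived from it, not every possible Latin square of that size; this matches the procedure described here.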

No numerical example is given here, but a hypothetical example is given below.

Hypothetical example: Suppose the aim of an experiment is to determine the effect of four drug treatments on neuronal activity in non-human primates. The determinations are time consuming, so only a few animals can be studied each day, and diurnal rhythms may affect the results, so the treatments need to be balanced with respect to time of day. Suppose also that a within-animal experiment is possible. A 4x4 Latin square design using four animals may then be suitable. In the figure below, each row could represent an animal and each column a time-of-day slot. Note that each animal receives all four treatments and each time slot also has all four treatments, balancing the design with respect to animal and time of day.

         Column 1   Column 2   Column 3   Column 4
Row 1    A          B          C          D
Row 2    D          A          B          C
Row 3    C          D          A          B
Row 4    B          C          D          A
The above square has the treatments distributed systematically (the A is shifted one column to the right in each row). Before use, the Latin square needs to be randomised, first by randomising whole rows and then whole columns. This method preserves the structure of one treatment in each row and column. The results from a Latin square experiment are analysed using a three-way analysis of variance without interaction. A 4x4 Latin square may be too small to give the experiment sufficient power, but a single experiment could involve two or more such squares. Latin squares with more than about seven treatments may become unwieldy.
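As a sketch of that three-way analysis, the function below partitions the total sum of squares of a t x t Latin square into rows, columns, treatments and error, with (t - 1)(t - 2) degrees of freedom for error. It is a minimal illustration assuming one observation per cell; the function name and argument layout are hypothetical, and since no data are given in the text it is shown without a worked example. Mean squares and F ratios follow from these sums of squares exactly as in the ANOVA tables for the earlier examples.

```python
def latin_square_anova(square, y):
    """Degrees of freedom and sums of squares for a t x t Latin square
    (three-way ANOVA without interaction).

    square[r][c] is the treatment label and y[r][c] the response
    in row r, column c; one observation per cell is assumed.
    """
    t = len(square)
    n = t * t
    values = [y[r][c] for r in range(t) for c in range(t)]
    correction = sum(values) ** 2 / n
    ss_total = sum(v * v for v in values) - correction

    row_totals = [sum(y[r]) for r in range(t)]
    col_totals = [sum(y[r][c] for r in range(t)) for c in range(t)]
    trt_totals = {}
    for r in range(t):
        for c in range(t):
            label = square[r][c]
            trt_totals[label] = trt_totals.get(label, 0.0) + y[r][c]

    # Each row, column and treatment total is based on t observations
    ss_rows = sum(x * x for x in row_totals) / t - correction
    ss_cols = sum(x * x for x in col_totals) / t - correction
    ss_trt = sum(x * x for x in trt_totals.values()) / t - correction
    ss_error = ss_total - ss_rows - ss_cols - ss_trt

    df = {"Rows": t - 1, "Columns": t - 1, "Treatments": t - 1,
          "Error": (t - 1) * (t - 2), "Total": n - 1}
    ss = {"Rows": ss_rows, "Columns": ss_cols, "Treatments": ss_trt,
          "Error": ss_error, "Total": ss_total}
    return df, ss
```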