Regression

Sometimes an experiment is designed to estimate how one “dependent” variable responds to changes in an “independent” variable that the experimenter controls. In such cases regression analysis is usually the most appropriate method.

Where there is no evidence that variation in one variable causes variation in the other, the strength of their association can be quantified using correlation. When there is a causal relationship between them, regression analysis is more appropriate. This is the usual situation in designed experiments, where one variable (the independent variable) is deliberately varied and the other (the dependent variable) is measured, for example when the aim is to quantify a dose-response relationship.
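The difference can be sketched numerically. The Python below (standard library only; the helper names `cross`, `pearson_r` and `slope` are our own, not from any package) uses the pig liver data from the example that follows. The correlation coefficient r is the same whichever variable is listed first, whereas a regression slope depends on which variable is treated as the response, and regressing power on diameter does not simply invert the diameter-on-power line.

```python
import math

# Lesion diameters (cm) at each microwave power (W), from the example table.
DATA = {
    36: [1.9],
    50: [3.3, 3.2, 2.8, 2.8, 2.4, 2.7, 3.2, 3.8, 3.4, 3.0],
    100: [4.7, 4.0, 3.5, 3.5, 3.9, 4.8, 4.4, 4.3, 3.7, 3.5, 3.8],
    150: [5.5, 5.0, 4.4, 4.4, 6.0, 6.5, 5.0, 5.0],
    200: [5.8, 6.0],
}
power = [p for p, ds in DATA.items() for _ in ds]
diam = [d for ds in DATA.values() for d in ds]


def cross(a, b):
    """Sum of products of deviations from the means (Sxy; Sxx when a is b)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def pearson_r(a, b):
    """Correlation coefficient: symmetric in a and b."""
    return cross(a, b) / math.sqrt(cross(a, a) * cross(b, b))


def slope(response, predictor):
    """Least-squares slope when `response` is regressed on `predictor`."""
    return cross(predictor, response) / cross(predictor, predictor)


r = pearson_r(power, diam)
b_diam_on_power = slope(diam, power)  # cm per W
b_power_on_diam = slope(power, diam)  # W per cm

print(f"r = {r:.3f}, same with arguments swapped: {pearson_r(diam, power):.3f}")
print(f"Diam on Power: {b_diam_on_power:.6f} cm/W (1/slope = {1 / b_diam_on_power:.1f} W/cm)")
print(f"Power on Diam: {b_power_on_diam:.1f} W/cm")
```

The two slopes multiply to r squared (about 0.787 here), so they are reciprocals of each other only when the points lie exactly on a straight line.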

Example

Liver tumours in humans are difficult to remove surgically. However, a microwave probe has been developed that can be inserted into a tumour to deliver a dose of microwaves that kills the tumour in situ. The aim of the experiment, which used pigs as a model, was to find the dose of microwaves needed to kill a tumour of a given diameter. Pigs were chosen because their livers are about the same size as a human's. The pigs were anaesthetised, the probe was inserted at various sites and a dose of microwaves was applied. The pigs were then allowed to recover, and some time later they were re-anaesthetised and the lesion diameters were measured. The table below shows the diameters of the lesions in the pigs' livers following ablation with the microwave probe (Strickland et al. 2002, Experimental study of large-volume microwave ablation in the liver, Br. J. Surg. 89:1003-1007). Following these experiments, the technique is now being used successfully in humans.

The aim in this case was to obtain a dose-response relationship. This can be done using regression.

In this case the experimental unit is a patch of liver in a pig, because the probe can be placed in various parts of the liver and the dose level can be assigned to each site at random. The data were entered into MINITAB as two columns, one for the dose level (power) and one for the lesion diameter. The measurements are shown below:

Lesion diameter (cm) after microwave ablation, by power (W)

Power (W)   n    Lesion diameter (cm)
36          1    1.9
50          10   3.3, 3.2, 2.8, 2.8, 2.4, 2.7, 3.2, 3.8, 3.4, 3.0
100         11   4.7, 4.0, 3.5, 3.5, 3.9, 4.8, 4.4, 4.3, 3.7, 3.5, 3.8
150         8    5.5, 5.0, 4.4, 4.4, 6.0, 6.5, 5.0, 5.0
200         2    5.8, 6.0
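To mirror the two-column worksheet described above, the table can be expanded into one (power, diameter) pair per lesion. This Python sketch does that and prints the group sizes and means; the names `DATA` and `rows` are illustrative only. Listing the groups in order of increasing power, row 28 is the 150 W lesion of 6.5 cm, the observation flagged in the MINITAB output; whether the original worksheet used exactly this row order is an assumption.

```python
# Lesion diameters (cm) at each microwave power (W), copied from the table.
DATA = {
    36: [1.9],
    50: [3.3, 3.2, 2.8, 2.8, 2.4, 2.7, 3.2, 3.8, 3.4, 3.0],
    100: [4.7, 4.0, 3.5, 3.5, 3.9, 4.8, 4.4, 4.3, 3.7, 3.5, 3.8],
    150: [5.5, 5.0, 4.4, 4.4, 6.0, 6.5, 5.0, 5.0],
    200: [5.8, 6.0],
}

# Long format: one row per lesion, column 1 = power (W), column 2 = diameter (cm).
rows = [(p, d) for p, ds in DATA.items() for d in ds]

print(f"{'Power':>5}  {'Diam':>4}")
for p, d in rows[:3]:  # first few rows of the worksheet
    print(f"{p:5d}  {d:4.1f}")
print(f"n = {len(rows)}")  # 32 lesions, hence 31 total degrees of freedom

for p, ds in DATA.items():
    print(f"{p:3d} W: n = {len(ds):2d}, mean diameter = {sum(ds) / len(ds):.2f} cm")
```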


The MINITAB output from the regression analysis of the data above (with Diameter abbreviated to Diam) is given below. Note that part of the results is presented as an analysis of variance (ANOVA) table.

--------------------------------------------------------------

Regression Analysis: Diam versus Power

The regression equation is
Diam = 1.91 + 0.0213 Power    <-- The regression equation: diameter increases by 0.0213 cm per watt of power

Predictor      Coef   SE Coef      T      P
Constant     1.9133    0.2256   8.48  0.000
Power      0.021315  0.002027  10.51  0.000

S = 0.5327   R-Sq = 78.7%   R-Sq(adj) = 77.9%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       1  31.374  31.374  110.54  0.000
Residual Error  30   8.515   0.284
Total           31  39.889

Unusual Observations

Obs  Power    Diam     Fit  SE Fit  Residual  St Resid
 28    150  6.5000  5.1105  0.1367    1.3895     2.70R

R denotes an observation with a large standardized residual
--------------------------------------------------------------
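The main numbers in this output can be reproduced from the standard least-squares formulas. The Python sketch below (standard library only; variable names are our own) computes the coefficients, their standard errors, S, R-Sq and R-Sq(adj), then the standardized residuals, flagging any whose absolute value exceeds 2. That cut-off is assumed here to match MINITAB's "large standardized residual" rule. Rows are taken in the order of the data table, power level by power level.

```python
import math

# Lesion diameters (cm) at each microwave power (W), from the data table.
DATA = {
    36: [1.9],
    50: [3.3, 3.2, 2.8, 2.8, 2.4, 2.7, 3.2, 3.8, 3.4, 3.0],
    100: [4.7, 4.0, 3.5, 3.5, 3.9, 4.8, 4.4, 4.3, 3.7, 3.5, 3.8],
    150: [5.5, 5.0, 4.4, 4.4, 6.0, 6.5, 5.0, 5.0],
    200: [5.8, 6.0],
}
x = [p for p, ds in DATA.items() for _ in ds]  # power (W)
y = [d for ds in DATA.values() for d in ds]    # diameter (cm)
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx                  # "Power" coefficient
intercept = y_bar - slope * x_bar  # "Constant"

fitted = [intercept + slope * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]
sse = sum(e ** 2 for e in resid)
s = math.sqrt(sse / (n - 2))       # S, the residual standard deviation

se_slope = s / math.sqrt(sxx)
se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
r_sq = 1 - sse / syy
r_sq_adj = 1 - (sse / (n - 2)) / (syy / (n - 1))

# Leverage h_i and standardized residual e_i / (S * sqrt(1 - h_i)).
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]
unusual = [(i + 1, x[i], y[i], round(std_resid[i], 2))
           for i in range(n) if abs(std_resid[i]) > 2]

print(f"Constant {intercept:.4f} (SE {se_intercept:.4f}), T = {intercept / se_intercept:.2f}")
print(f"Power    {slope:.6f} (SE {se_slope:.6f}), T = {slope / se_slope:.2f}")
print(f"S = {s:.4f}  R-Sq = {100 * r_sq:.1f}%  R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print("Unusual observations (obs, power, diam, st resid):", unusual)
```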

Although the output is extensive, the key result is the ANOVA table, which tests whether there is an association between power and lesion diameter. If there were no association, the probability of observing one as strong as this by chance alone would be less than 0.0005 (MINITAB prints this as P = 0.000).
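The ANOVA table splits the total sum of squares of the diameters into a part explained by the fitted line and a residual part; F is the ratio of their mean squares. The Python sketch below (standard library only; `t_two_sided_p` is our own helper, not a library function) rebuilds the table and converts F to a p-value. It relies on the fact that with a single predictor F equals the square of T for the slope, and evaluates the t tail probability with the finite series that exists when the degrees of freedom are even (30 here), so it is not a general replacement for a statistics library.

```python
import math

# Lesion diameters (cm) at each microwave power (W), from the data table.
DATA = {
    36: [1.9],
    50: [3.3, 3.2, 2.8, 2.8, 2.4, 2.7, 3.2, 3.8, 3.4, 3.0],
    100: [4.7, 4.0, 3.5, 3.5, 3.9, 4.8, 4.4, 4.3, 3.7, 3.5, 3.8],
    150: [5.5, 5.0, 4.4, 4.4, 6.0, 6.5, 5.0, 5.0],
    200: [5.8, 6.0],
}
x = [p for p, ds in DATA.items() for _ in ds]
y = [d for ds in DATA.values() for d in ds]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

ss_total = syy                           # Total SS, DF = n - 1
ss_regression = sxy ** 2 / sxx           # Regression SS, DF = 1
ss_residual = ss_total - ss_regression   # Residual Error SS, DF = n - 2
df_residual = n - 2
ms_regression = ss_regression / 1
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual


def t_two_sided_p(t, nu):
    """Two-sided p-value P(|T| >= |t|) for Student's t with an even number of df.

    Uses P(|T| < t) = sin(th) * sum_{k=0}^{nu/2 - 1} c_k * cos(th)**(2k),
    where th = atan(|t| / sqrt(nu)), c_0 = 1 and c_k = c_{k-1} * (2k - 1) / (2k).
    """
    if nu <= 0 or nu % 2:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    theta = math.atan(abs(t) / math.sqrt(nu))
    cos_sq = math.cos(theta) ** 2
    term = total = 1.0
    for k in range(1, nu // 2):
        term *= cos_sq * (2 * k - 1) / (2 * k)
        total += term
    return 1.0 - math.sin(theta) * total


# With one predictor, F(1, n - 2) = T**2, so P(F >= f) is the two-sided t p-value.
p_value = t_two_sided_p(math.sqrt(f_stat), df_residual)

print(f"Regression      1  {ss_regression:.3f}  {ms_regression:.3f}  F = {f_stat:.2f}")
print(f"Residual Error {df_residual}  {ss_residual:.3f}  {ms_residual:.3f}")
print(f"Total          {n - 1}  {ss_total:.3f}")
print(f"P = {p_value:.1e}")  # far below 0.0005; MINITAB reports it as 0.000
```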

The regression equation indicates how much the lesion diameter increases for each 1 W increase in power: about 0.0213 cm, or roughly 2.1 cm per 100 W. However, the results can be seen most clearly in the graph below, which shows the individual points, the best-fitting straight line and the 90% prediction interval (in blue), within which about 90% of individual observations are expected to lie.
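A 90% prediction interval for a single new lesion at power x0 is fit +/- t * S * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx), where t is the upper 5% point of Student's t with n - 2 = 30 degrees of freedom. The Python sketch below computes it; the value 1.697 for t is hard-coded from standard t tables rather than calculated, and the function names are illustrative. It also inverts the fitted line to give a rough point estimate of the power at which the fitted lesion diameter reaches a target size, which is the practical question behind the experiment; this simple inversion gives no interval for that power.

```python
import math

# Lesion diameters (cm) at each microwave power (W), from the data table.
DATA = {
    36: [1.9],
    50: [3.3, 3.2, 2.8, 2.8, 2.4, 2.7, 3.2, 3.8, 3.4, 3.0],
    100: [4.7, 4.0, 3.5, 3.5, 3.9, 4.8, 4.4, 4.3, 3.7, 3.5, 3.8],
    150: [5.5, 5.0, 4.4, 4.4, 6.0, 6.5, 5.0, 5.0],
    200: [5.8, 6.0],
}
x = [p for p, ds in DATA.items() for _ in ds]
y = [d for ds in DATA.values() for d in ds]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
s = math.sqrt(sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

T_CRIT_90 = 1.697  # upper 5% point of t with 30 df, from tables (assumes n - 2 == 30)


def prediction_interval(x0):
    """Fitted diameter and 90% prediction interval for one new lesion at power x0."""
    fit = intercept + slope * x0
    half_width = T_CRIT_90 * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return fit, fit - half_width, fit + half_width


def power_for_diameter(target_cm):
    """Power at which the fitted line reaches target_cm (a point estimate only)."""
    return (target_cm - intercept) / slope


fit, lo, hi = prediction_interval(150)
print(f"150 W: fitted {fit:.2f} cm, 90% PI {lo:.2f} to {hi:.2f} cm")

outside = []
for xi, yi in zip(x, y):
    _, low, high = prediction_interval(xi)
    if not low <= yi <= high:
        outside.append((xi, yi))
print("Observations outside their 90% prediction interval:", outside)
print(f"Power for a fitted 5 cm lesion: about {power_for_diameter(5.0):.0f} W")
```

Because t is taken from tables, the sketch does not adapt to other sample sizes; for a different n the critical value would have to be looked up again or computed with a t quantile function such as `scipy.stats.t.ppf`.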