Measures of dispersion/spread
Characteristics of Distributions
Dispersion (Spread or Scatter)
Range
This is defined as: range = largest value - smallest value
For example, the values 2.3 4.1 5.2 6.9 8.8 9.4 have range = 9.4 - 2.3 = 7.1.
Semi-Interquartile Range
This is defined as: siqr = ½ (upper quartile - lower quartile)
Like the range, this is easy to calculate. Because it ignores the lowest and highest quarters of the data, it is much less affected by freak (extreme) results, and it is useful for comparing the dispersion of similarly shaped distributions.
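A minimal Python sketch of both measures follows. Quartile conventions differ between textbooks and calculators; this sketch assumes the default 'exclusive' method of Python's statistics.quantiles, so another convention may give a slightly different semi-interquartile range for such a small data set. The function names are illustrative only.

```python
import statistics


def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def semi_interquartile_range(values):
    """Half the distance between the upper and lower quartiles."""
    # Assumption: quartiles follow the default 'exclusive' method of
    # statistics.quantiles; other conventions give slightly different values.
    lower_q, _median, upper_q = statistics.quantiles(values, n=4)
    return 0.5 * (upper_q - lower_q)


data = [2.3, 4.1, 5.2, 6.9, 8.8, 9.4]
print(f"range = {value_range(data):.1f}")               # range = 7.1
print(f"siqr  = {semi_interquartile_range(data):.2f}")  # siqr  = 2.65
```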
Variance and Standard Deviation
Consider the following two simple distributions:
8 9 9 11 13
Both distributions have the same mean value, yet one shows less scatter about its mean than the other. To measure this scatter we first obtain, for each distribution, a new series showing how much each term differs from the mean, ie the deviations $x - \bar{x}$. For the values 8 9 9 11 13, whose mean is $\bar{x} = 10$, these are:
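-2 -1 -1 1 3
Nothing is gained by using the mean of these deviations as a measure of spread, since the deviations from the mean always sum to zero:
$$\sum (x - \bar{x}) = 0$$
Here, for example, -2 - 1 - 1 + 1 + 3 = 0.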
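The zero sum follows directly from the definition of the mean, $\bar{x} = \sum x / n$, for any n values:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$$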
This can be overcome by considering the mean of the numerical deviations of the observations from their mean, ie ignoring whether these deviations are negative or positive, defining
$$\text{mean deviation} = \frac{\sum |x - \bar{x}|}{n}$$
For the values 8 9 9 11 13 the numerical deviations $|x - \bar{x}|$ are
2 1 1 1 3
so the mean deviation = (2 + 1 + 1 + 1 + 3)/5 = 8/5 = 1.6.
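As a small illustration, the mean deviation can be computed directly from its definition; the function name here is hypothetical.

```python
def mean_deviation(values):
    """Mean of the numerical (absolute) deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    # abs() ignores whether each deviation is negative or positive
    return sum(abs(x - mean) for x in values) / n


print(mean_deviation([8, 9, 9, 11, 13]))  # 1.6
```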
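However, this quantity is awkward to handle algebraically, and the negative signs of the deviations are better eliminated by squaring the deviations and then finding the mean of these squares, ie defining:
$$\text{variance} = \frac{\sum (x - \bar{x})^2}{n}$$
For the values 8 9 9 11 13 the squared deviations $(x - \bar{x})^2$ are
4 1 1 1 9
so the variance = 16/5 = 3.2.
To obtain a measure of dispersion having the same units as the original variable we define
$$\text{standard deviation} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$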
For the values 8 9 9 11 13, standard deviation = $\sqrt{16/5} = \sqrt{3.2} \approx 1.79$.
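The following sketch computes the variance and standard deviation with divisor n, exactly as defined above, and cross-checks the result against statistics.pstdev from Python's standard library, which uses the same divisor. The helper names are illustrative.

```python
import math
import statistics


def variance_n(values):
    """Mean of the squared deviations from the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def std_dev_n(values):
    """Square root of the variance, so it has the units of the data."""
    return math.sqrt(variance_n(values))


data = [8, 9, 9, 11, 13]
print(variance_n(data))                   # 3.2
print(round(std_dev_n(data), 2))          # 1.79
print(round(statistics.pstdev(data), 2))  # 1.79 (same divisor-n definition)
```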
Again the statistics facilities of your calculator can be used to find the standard deviation. However, most calculators have two versions for the standard deviation. These are:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \qquad (1)$$
(the one we have already seen) and
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \qquad (2)$$
Expression (1) is the standard deviation of a set of data values that constitute the totality of the values in which we are interested, ie the population. As already mentioned, we are rarely able to study the population exhaustively, so $\sigma$ can seldom be calculated. If expression (1) is calculated for every possible sample from a given population and the results are averaged, the average is smaller than the population standard deviation. Consequently expression (1) is said to give a biased estimate of the population standard deviation.
It can be shown that changing the divisor n in expression (1) to n - 1, to give expression (2), removes this bias from the variance: when the n data values are a random sample from the population, $s^2$ is an unbiased estimate of the population variance $\sigma^2$. Strictly, s itself still tends to underestimate $\sigma$, though by less than expression (1) does. Consequently s is the value usually calculated.
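A sketch of why the divisor n - 1 works, assuming the values $x_1, \dots, x_n$ are independent observations from a population with mean $\mu$ and variance $\sigma^2$, so that $E[x_i^2] = \sigma^2 + \mu^2$ and $E[\bar{x}^2] = \sigma^2/n + \mu^2$:

$$E\left[\sum (x_i - \bar{x})^2\right] = E\left[\sum x_i^2 - n\bar{x}^2\right] = n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n - 1)\sigma^2$$

Dividing by n - 1 therefore gives $E[s^2] = \sigma^2$, whereas dividing by n gives an average of $\frac{n-1}{n}\sigma^2$, which is too small.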
Some texts use $\hat{\sigma}$ instead of s for expression (2), the "hat" denoting that the quantity is an estimator. Some calculators represent expression (1) by $\sigma_n$ and expression (2) by $\sigma_{n-1}$, whilst others actually use $\sigma$ and s.
Below is a demonstration of bias in which 100 samples are taken from a uniform (0,1) distribution. For each sample the estimates from expressions (1) and (2) are plotted against sample number, and the numbers of values of each lying above and below the population value of 0.01833 are counted.
[Spreadsheet display: estimates from expressions (1) and (2) plotted against sample number]
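A similar experiment can be sketched in Python. The notes do not state the sample size or exactly which quantity the population value refers to, so this sketch makes its own assumptions: each sample has a hypothetical size of 5, and it works with variances (the squares of expressions (1) and (2)), comparing them with 1/12, the variance of a uniform (0,1) distribution.

```python
import random
import statistics

random.seed(1)          # fixed seed so the run is repeatable
NUM_SAMPLES = 100       # as in the notes
SAMPLE_SIZE = 5         # hypothetical: the notes do not state the sample size
POP_VARIANCE = 1 / 12   # variance of a uniform (0, 1) distribution

biased, unbiased = [], []
for _ in range(NUM_SAMPLES):
    sample = [random.random() for _ in range(SAMPLE_SIZE)]
    biased.append(statistics.pvariance(sample))   # divisor n: square of (1)
    unbiased.append(statistics.variance(sample))  # divisor n - 1: square of (2)

for name, values in (("divisor n", biased), ("divisor n-1", unbiased)):
    above = sum(v > POP_VARIANCE for v in values)
    print(f"{name}: mean = {statistics.fmean(values):.4f}, "
          f"above = {above}, below = {NUM_SAMPLES - above}")
```

Every divisor-n value is smaller than the corresponding divisor-(n-1) value by the factor (n - 1)/n, so the divisor-n estimates are shifted systematically downwards; the exact counts depend on the seed.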
Consider again the data on the thickness of the magnetic coating on the flexible disc, ie
973 975 976 977 976 980 981 977 979 976
Use your calculator to confirm that s, the estimated standard deviation of the population from which this sample is taken, is 2.40 microns.
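Python's statistics module can stand in for the calculator here: statistics.stdev uses divisor n - 1 (expression (2)) and statistics.pstdev uses divisor n (expression (1)).

```python
import statistics

# Thickness of the magnetic coating on the flexible disc (microns)
thickness = [973, 975, 976, 977, 976, 980, 981, 977, 979, 976]

print(f"mean = {statistics.fmean(thickness):.1f}")            # prints mean = 977.0
print(f"s = {statistics.stdev(thickness):.2f}")               # prints s = 2.40
print(f"expression (1) = {statistics.pstdev(thickness):.2f}")  # prints expression (1) = 2.28
```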
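For grouped data, where each value (or class mid-point) x occurs with frequency f, the expressions for the standard deviation become
$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} \qquad s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f - 1}}$$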
Use your calculator to obtain the standard deviation as 47.0 cm.
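A minimal sketch of the grouped-data formulas follows. The frequency table in it is hypothetical, invented purely for illustration (it is not the data behind the 47.0 cm figure), and grouped_std is an illustrative name.

```python
import math


def grouped_std(values, freqs, sample=True):
    """Standard deviation of grouped data.

    values: the data values (or class mid-points), x
    freqs:  their frequencies, f
    sample: True for s (divisor sum(f) - 1), False for sigma (divisor sum(f))
    """
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(ss / (total - 1 if sample else total))


# Hypothetical frequency table (not the data used in the notes)
midpoints = [150, 160, 170, 180, 190]
frequencies = [2, 5, 9, 3, 1]
print(f"s     = {grouped_std(midpoints, frequencies):.2f}")
print(f"sigma = {grouped_std(midpoints, frequencies, sample=False):.2f}")
```

Weighting each squared deviation by f gives the same result as listing every value f times and applying expressions (1) and (2) directly.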
The Coefficient of Variation
As a measure of variability, the standard deviation has a magnitude that depends on the magnitude of the data, so comparing the standard deviations of data sets with very different means can be misleading.
The COEFFICIENT OF VARIATION expresses sample variability relative to the mean of the sample:
$$V = \frac{100\,s}{\bar{x}}\ \%$$
Since s and $\bar{x}$ have the same units, V has no units at all, a fact which emphasises that it is a relative measure.
Summer 25.1 27.2 24.8 29.5 22.7 28.3 23.2 24.6
Winter 43.2 37.5 52.8 61.0 41.7 39.8 65.4 38.1
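For summer: $\bar{x}$ = 25.7, s = 2.42, V = 9.43%
For winter: $\bar{x}$ = 47.4, s = 10.89, V = 22.96%
Relative to its mean, the winter data are more than twice as variable as the summer data.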
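The coefficient of variation is a one-line calculation once s and the mean are known. This sketch assumes, as above, that s uses divisor n - 1; coefficient_of_variation is an illustrative name.

```python
import statistics


def coefficient_of_variation(values):
    """V = 100 * s / mean, as a percentage (s uses divisor n - 1)."""
    return 100 * statistics.stdev(values) / statistics.fmean(values)


summer = [25.1, 27.2, 24.8, 29.5, 22.7, 28.3, 23.2, 24.6]
winter = [43.2, 37.5, 52.8, 61.0, 41.7, 39.8, 65.4, 38.1]
for name, data in (("summer", summer), ("winter", winter)):
    print(f"{name}: mean = {statistics.fmean(data):.1f}, "
          f"s = {statistics.stdev(data):.2f}, "
          f"V = {coefficient_of_variation(data):.2f}%")
```

Run as written, this reproduces the summer and winter figures quoted above.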
|from SHU Science & Maths, 1998|