Characteristics of Distributions
Central Tendency (or Location)
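To distinguish between the distributions (1) and (2), some
measure of location or central tendency is required. Three types of
"average" value can be used.
Arithmetic Mean
This is the most useful of the three measures of location. If $x_1, x_2, \dots, x_n$ are n observations on a variable X, then the arithmetic mean, $\bar{x}$, is defined as:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$$
i.e.
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
If the observations are grouped, i.e. $x_1$ occurs $f_1$ times, $x_2$ occurs $f_2$ times, etc., then
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_k x_k}{f_1 + f_2 + \dots + f_k}$$
i.e.
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
This situation arises in two cases: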
(i) when the classes are individual discrete values, as in the radiation data:
No of counts | 0 | 1 | 2 | 3 | 4 | 5
Frequency | 5 | 11 | 10 | 8 | 4 | 2
(ii) when data on either a discrete or continuous
variable have been grouped into classes with associated frequencies. We then
assume that the data items are uniformly distributed throughout a class, so that
the class midpoint can be taken to represent the values within the class as a
whole.
The easiest way to perform the calculation is to use the statistical
facilities of a scientific calculator. Find the "mean" button on your
calculator, usually marked $\bar{x}$, and make sure you know how to use it!
Consider the following examples.
(i) The thickness of the magnetic coating was measured at 10 randomly
chosen points on the surface of a flexible disc produced by a manufacturer. The
following results, in microns, were obtained.
Thickness: 973, 975, 976, 977, 976, 980, 981, 977, 979, 976
The mean thickness is given by $\bar{x} = 9770/10 = 977$ microns.
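If no calculator is to hand, the same calculation can be checked in a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` is an illustrative choice, and the readings are copied from the thickness example above.

```python
# Thickness readings (microns) copied from the example above.
thickness = [973, 975, 976, 977, 976, 980, 981, 977, 979, 976]


def arithmetic_mean(values):
    """Return the sum of the observations divided by their number."""
    return sum(values) / len(values)


mean_thickness = arithmetic_mean(thickness)
print(mean_thickness)  # 977.0
```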
(ii) We have already met the following data on the heights of trees:
Height (cms) | Frequency (f) | Class midpoint
49.5 - 79.5 | 7 | 64.5
79.5 - 109.5 | 11 | 94.5
109.5 - 139.5 | 14 | 124.5
139.5 - 169.5 | 21 | 154.5
169.5 - 199.5 | 42 | 184.5
199.5 - 229.5 | 35 | 214.5
229.5 - 259.5 | 10 | 244.5
Total | 140 |
In this case, taking each class midpoint as x, $\bar{x} = \frac{\sum f x}{\sum f} = \frac{24180}{140} \approx 172.7$ cm.
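The grouped-data calculation can be sketched in Python as follows. The function name `grouped_mean` is an illustrative choice; the values and frequencies are copied from the radiation data and the tree-height table, with the class midpoint standing in for every value in its class.

```python
def grouped_mean(values, frequencies):
    """Mean of grouped data: sum of f*x divided by sum of f."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    total_f = sum(frequencies)
    return total_fx / total_f


# Case (i): individual discrete values (radiation counts).
counts = [0, 1, 2, 3, 4, 5]
count_frequencies = [5, 11, 10, 8, 4, 2]

# Case (ii): class midpoints represent each class (tree heights in cm).
midpoints = [64.5, 94.5, 124.5, 154.5, 184.5, 214.5, 244.5]
tree_frequencies = [7, 11, 14, 21, 42, 35, 10]

radiation_mean = grouped_mean(counts, count_frequencies)
tree_mean = grouped_mean(midpoints, tree_frequencies)

print(radiation_mean)       # 2.025
print(round(tree_mean, 1))  # 172.7
```

Remember that the result in case (ii) is only an estimate, because the individual heights within each class are not known.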
The Median
(a) Discrete Distribution
Consider a set of data on a discrete variable. Arrange the data in ascending or descending
order of magnitude; then the median is:
(i) the middle item for an odd number of observations,
(ii) the average of the two middle items for an even number of observations.
In general, if there are n observations, the median is the
value of the $\frac{n+1}{2}$th observation. For example:
2, 5, 6, 8, 13, 15, 19, 22, 38 have median 13
3, 4, 8, 9, 13, 16, 17, 20, 21, 22 have median $\frac{13 + 16}{2}$, i.e. 14.5
(b) Continuous Distributions
The median divides the area under the frequency
distribution diagram (ie the histogram) into two equal parts. Whilst there
is a special formula for calculating the median for a grouped frequency
distribution it is probably easiest to estimate it from the ogive by drawing a
line across at the 50% level on the % cumulative frequency (vertical) scale to
the curve and then down to the horizontal scale to read off the estimate (see
the picture above of the ogive).
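Both cases can be sketched in Python. The function names are illustrative choices. For grouped data, the sketch assumes the ogive is a straight line between class boundaries (in keeping with the earlier assumption that items are spread uniformly within a class), so it mimics reading the 50% level off the graph; it is not necessarily the special formula mentioned above. It is applied here to the tree-height table.

```python
def median(values):
    """Median of a list of observations on a discrete variable."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the single middle item, i.e. the (n + 1)/2-th observation.
        return ordered[middle]
    # Even n: the average of the two middle items.
    return (ordered[middle - 1] + ordered[middle]) / 2


def grouped_median(boundaries, frequencies):
    """Estimate the median of grouped data by interpolating along the ogive.

    boundaries has one more entry than frequencies: class i runs from
    boundaries[i] to boundaries[i + 1].
    """
    half = sum(frequencies) / 2  # the 50% level on the cumulative scale
    cumulative = 0
    for i, f in enumerate(frequencies):
        if f > 0 and cumulative + f >= half:
            lower, upper = boundaries[i], boundaries[i + 1]
            # Assumption: straight-line ogive within the class.
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no observations")


odd_median = median([2, 5, 6, 8, 13, 15, 19, 22, 38])
even_median = median([3, 4, 8, 9, 13, 16, 17, 20, 21, 22])

tree_boundaries = [49.5, 79.5, 109.5, 139.5, 169.5, 199.5, 229.5, 259.5]
tree_frequencies = [7, 11, 14, 21, 42, 35, 10]
tree_median = grouped_median(tree_boundaries, tree_frequencies)

print(odd_median)             # 13
print(even_median)            # 14.5
print(round(tree_median, 1))  # 181.6
```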
The main difference between the median and the mean is that the median is
insensitive to extreme values since, unlike the arithmetic mean, it does not
take into account the actual value of each observation, but only considers the
rank of each measurement.
The median is useful in such areas as lifetime testing of components.
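The insensitivity of the median to extreme values can be seen with a small experiment. The sketch below reuses the ten thickness readings from the earlier example and appends one hypothetical freak reading of 1977 microns (invented purely for illustration, e.g. a mis-keyed value); `arithmetic_mean` is an illustrative helper name.

```python
from statistics import median


def arithmetic_mean(values):
    """Return the sum of the observations divided by their number."""
    return sum(values) / len(values)


# Readings (microns) from the thickness example.
thickness = [973, 975, 976, 977, 976, 980, 981, 977, 979, 976]

# Hypothetical freak reading added for illustration only.
with_freak = thickness + [1977]

print(arithmetic_mean(thickness), median(thickness))
# 977.0 976.5
print(round(arithmetic_mean(with_freak), 1), median(with_freak))
# 1067.9 977
```

The single freak value drags the mean up by about 90 microns, while the median moves by only half a micron.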
Quartiles
The upper and lower quartiles
(together with the median) divide the observations into quarters. For
n observations on a discrete variable the upper and lower quartiles are
the values of the $\frac{3(n+1)}{4}$th and the $\frac{n+1}{4}$th observations respectively when they are arranged in ascending order of
magnitude.
For a distribution on a continuous variable the quartiles are, like the
median, easy to estimate from the ogive by drawing lines across at 25% and 75%
on the vertical scale and then down to the horizontal scale and reading them off
(again see the picture above of the ogive).
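For a discrete variable, the quartile positions (n + 1)/4 and 3(n + 1)/4 are often not whole numbers. The sketch below assumes that a fractional position is handled by interpolating linearly between the two neighbouring observations; that is one common convention, not something laid down in these notes. The function names are illustrative, and the data are the nine values from the first median example.

```python
def value_at_position(ordered, position):
    """Value of the position-th observation (counting from 1).

    Assumption: a fractional position is interpolated linearly between
    the two neighbouring observations.
    """
    whole = int(position)
    fraction = position - whole
    value = ordered[whole - 1]
    if fraction > 0:
        value += fraction * (ordered[whole] - ordered[whole - 1])
    return value


def quartiles(values):
    """Return (lower quartile, upper quartile) of the observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three observations")
    lower = value_at_position(ordered, (n + 1) / 4)
    upper = value_at_position(ordered, 3 * (n + 1) / 4)
    return lower, upper


data = [2, 5, 6, 8, 13, 15, 19, 22, 38]
lower_quartile, upper_quartile = quartiles(data)
print(lower_quartile, upper_quartile)  # 5.5 20.5
```

Here n = 9, so the positions are 2.5 and 7.5: the lower quartile lies halfway between 5 and 6, and the upper quartile halfway between 19 and 22.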
The Mode
The MODE of a set of
observations is the value that occurs most frequently. When designating
the mode for a grouped frequency distribution, we usually refer to the MODAL
CLASS, where the modal class is the class with the highest frequency. If a
single value for the mode of grouped data must be specified, it is taken as the
midpoint of the modal class.
For example, for the data on the height of nine-year-old trees, the modal
class is 169.5 - 199.5 (cm), so that 184.5 cm would be taken as the mode.
The disadvantage with the mode as a measure of location is that it is not
always unique, ie a distribution can have more than one mode.
For grouped data the mode is not uniquely defined, since changing the class
intervals may give different maximum frequencies.
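The mode and modal class can be sketched in Python as follows. The function names are illustrative choices. The radiation counts and tree-height classes are taken from the tables earlier in these notes, while the short list with two modes is a hypothetical data set invented to show non-uniqueness.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


def modal_class(boundaries, frequencies):
    """Return (lower boundary, upper boundary, midpoint) of the modal class."""
    i = frequencies.index(max(frequencies))
    lower, upper = boundaries[i], boundaries[i + 1]
    return lower, upper, (lower + upper) / 2


# Radiation data expanded into individual observations.
counts = [0, 1, 2, 3, 4, 5]
count_frequencies = [5, 11, 10, 8, 4, 2]
observations = [c for c, f in zip(counts, count_frequencies) for _ in range(f)]

tree_boundaries = [49.5, 79.5, 109.5, 139.5, 169.5, 199.5, 229.5, 259.5]
tree_frequencies = [7, 11, 14, 21, 42, 35, 10]

print(modes(observations))                            # [1]
print(modes([1, 1, 2, 3, 3]))                         # [1, 3] (hypothetical data)
print(modal_class(tree_boundaries, tree_frequencies))  # (169.5, 199.5, 184.5)
```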
Dispersion (Spread or Scatter)
To distinguish between the distributions (1) and (2) some
measure of dispersion or scatter is required. Various measures of
dispersion are available.
The Range
The RANGE of a set of observations is the difference
between the greatest and least of the observations. It is easy to calculate and
is widely used in industrial quality control as one check on manufactured items.
However, it ignores the distribution of the observations between the extremes
(eg possible concentrations about the centre) and is too easily affected by
freak results.
For example:
2.3, 4.1, 5.2, 6.9, 8.8, 9.4 have range = 9.4 - 2.3 = 7.1.
Also
2.3, 6.1, 6.2, 6.4, 6.6, 9.4 still have range 7.1.
Semi-Interquartile Range
This is defined as: siqr = ½ (upper quartile - lower quartile)
Again this is fairly easy to calculate, is not so easily affected by freak
results and is useful for comparing the dispersion of similarly shaped
distributions.
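The two data sets used in the range example make a useful comparison: they share the same range, but the semi-interquartile range picks up the concentration about the centre in the second set. The sketch below takes the quartiles as the (n + 1)/4-th and 3(n + 1)/4-th observations and assumes linear interpolation for fractional positions (one common convention, not one fixed by these notes); the function names are illustrative.

```python
def value_at_position(ordered, position):
    """Value of the position-th observation (counting from 1).

    Assumption: fractional positions are interpolated linearly.
    """
    whole = int(position)
    fraction = position - whole
    value = ordered[whole - 1]
    if fraction > 0:
        value += fraction * (ordered[whole] - ordered[whole - 1])
    return value


def data_range(values):
    """Difference between the greatest and least observations."""
    return max(values) - min(values)


def semi_interquartile_range(values):
    """Half the difference between the upper and lower quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    lower = value_at_position(ordered, (n + 1) / 4)
    upper = value_at_position(ordered, 3 * (n + 1) / 4)
    return (upper - lower) / 2


first = [2.3, 4.1, 5.2, 6.9, 8.8, 9.4]
second = [2.3, 6.1, 6.2, 6.4, 6.6, 9.4]

for data in (first, second):
    print(round(data_range(data), 1), round(semi_interquartile_range(data), 3))
# 7.1 2.65
# 7.1 1.075
```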