Agreement Between Measurements

We thought that a lecture which simply told everyone they were wrong and then sat down would fall a little flat. We had to come up with the method that was the right one. The basic statistical approach seemed obvious to us. If we are interested in agreement, we want to know how far apart measurements made in the two different ways can be on the same subject. So, for two methods, we started with the differences between the measurements made on the same subject. We can calculate the mean and standard deviation of these differences. If the mean and standard deviation are constant and the differences are approximately normally distributed, about 95% of the differences should lie between the mean minus 1.96 SD and the mean plus 1.96 SD. We later called these the 95% limits of agreement.

Consider two ophthalmologists who measure intraocular pressure with a tonometer. Each patient therefore has two measurements, one from each observer. The intraclass correlation coefficient (ICC) provides an estimate of the overall agreement between these values. It is akin to an analysis of variance in that it considers the within-pair differences as a proportion of the total variance of the observations, i.e. the total variability in the 2n observations, which is the sum of the between-pair and within-pair variation. The ICC can take values from 0 to 1, with 0 indicating no agreement and 1 indicating perfect agreement.
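To make the two quantities concrete, here is a minimal Python sketch of both calculations. The tonometer readings, variable names, and the one-way random-effects form of the ICC are assumptions introduced for illustration; they are not taken from any study cited here.

    import numpy as np

    # Hypothetical intraocular pressures (mmHg): one reading by each of two
    # observers on the same patients (rows = patients, columns = observers).
    readings = np.array([
        [18.0, 17.0],
        [22.0, 24.0],
        [15.0, 15.0],
        [20.0, 19.0],
        [25.0, 27.0],
        [17.0, 16.0],
        [21.0, 22.0],
        [19.0, 18.0],
    ])
    n, k = readings.shape            # n subjects, k = 2 observers per subject

    # 95% limits of agreement: mean difference +/- 1.96 SD of the differences
    d = readings[:, 0] - readings[:, 1]
    bias, sd = d.mean(), d.std(ddof=1)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    print(f"bias = {bias:.2f} mmHg, limits of agreement = {lower:.2f} to {upper:.2f}")

    # One-way ANOVA decomposition of the 2n observations for the ICC
    grand_mean = readings.mean()
    subject_means = readings.mean(axis=1)
    ss_between = k * ((subject_means - grand_mean) ** 2).sum()
    ss_within = ((readings - subject_means[:, None]) ** 2).sum()
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    # One-way random-effects ICC: 0 = no agreement, 1 = perfect agreement
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    print(f"ICC = {icc:.3f}")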

Diastolic blood pressure varies less between individuals than systolic pressure, so we would expect a weaker correlation for diastolic pressures when methods are compared in this way. In two papers (Laughlin et al., 1980; Hunyor et al., 1978) reporting 11 pairs of correlations, this was observed in every case. It does not mean that the measurement methods agree less well for diastolic than for systolic pressures. This comparison gives another illustration of the effect of between-subject spread on the correlation coefficient. The sample of patients in the Hunyor et al. study had much larger standard deviations than those in the Laughlin et al. sample, so the correlations were higher.

It is important to note that in each of the three situations in Table 1 the pass percentages are the same for both examiners, and if the two examiners were compared with the usual 2 × 2 test for paired data (McNemar's test), there would be no difference between their performance; the agreement between the observers, however, is very different in the three situations.
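The following sketch illustrates this last point with two invented 2 × 2 tables (not the ones in Table 1) in which both examiners pass the same proportion of candidates. McNemar's statistic, which depends only on the discordant cells, is zero in both cases, while the observed agreement differs sharply; Cohen's kappa is added here as one common chance-corrected summary, not as a measure used in the original comparison.

    # Two hypothetical 2x2 scenarios: each examiner passes the same proportion
    # of candidates, but the agreement between them differs.
    # Cell layout: a = both pass, b = examiner 1 passes / examiner 2 fails,
    #              c = examiner 1 fails / examiner 2 passes, d = both fail.
    scenarios = {
        "high agreement": (45, 5, 5, 45),
        "low agreement": (25, 25, 25, 25),
    }

    for name, (a, b, c, d) in scenarios.items():
        n = a + b + c + d
        # McNemar's test uses only the discordant cells b and c; with b == c
        # the statistic is 0, so the examiners' pass rates do not differ.
        mcnemar_stat = 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)
        agreement = (a + d) / n                  # observed proportion agreeing
        # Cohen's kappa: agreement corrected for chance
        p1_pass, p2_pass = (a + b) / n, (a + c) / n
        expected = p1_pass * p2_pass + (1 - p1_pass) * (1 - p2_pass)
        kappa = (agreement - expected) / (1 - expected)
        print(f"{name}: McNemar = {mcnemar_stat:.2f}, "
              f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")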