The slope of the regression line (upward or downward) indicates the direction of the correlation (+ or -).

The closer the data points lie to the regression line, the greater the strength of the correlation; the more they are scattered away from the regression line, the smaller the strength of the correlation.

Note that the regression line always passes through the point where the mean of X and the mean of Y intersect. (For the present example, the mean of X is 3.5 and the mean of Y is 7.0.)

As described in Chapter xxx, the measurement of linear correlation by way of the Pearson Product-Moment Correlation Coefficient comes down to a very simple ratio between (i) the amount of covariation between X and Y that is actually observed, and (ii) the amount of covariation that would exist if X and Y had a perfect (100%) positive correlation.

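Although in principle this relationship involves two variances and a covariance, in practice it comes down to something much simpler, involving the prior calculation of only three sums: the sum of squared deviates for X ($SS_X$), the sum of squared deviates for Y ($SS_Y$), and the sum of the cross-products of the X and Y deviates ($SC_{XY}$).
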
The following table shows all of the preliminary calculations that would be needed for the calculation of the correlation coefficient. You will see in a moment that these preliminary calculations also provide most of the groundwork for performing a subsequent regression analysis.
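Once you have these preliminaries, you can then easily calculate the correlation coefficient as

$$r = \frac{SC_{XY}}{\sqrt{SS_X \times SS_Y}}$$

where $SS_X$ and $SS_Y$ are the sums of squared deviates for X and Y, and $SC_{XY}$ is the sum of the cross-products of the X and Y deviates.
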
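The calculation just described can be sketched in a few lines of Python. This is only a minimal illustration: the data values are an assumption, chosen so that the means come out to 3.5 and 7.0 as in the present example, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return (ss_x, ss_y, sc_xy, r) for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviates for X and for Y
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # Sum of the cross-products of the X and Y deviates
    sc_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Observed covariation relative to the maximum possible covariation
    r = sc_xy / sqrt(ss_x * ss_y)
    return ss_x, ss_y, sc_xy, r

# Illustrative data (an assumption, not taken from the text)
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 2, 4, 10, 12, 8]

ss_x, ss_y, sc_xy, r = pearson_r(xs, ys)
print(f"SS_X={ss_x:.1f}  SS_Y={ss_y:.1f}  SC_XY={sc_xy:.1f}  r={r:.2f}")
```

Keeping the three sums as separate return values is deliberate, since the same sums are reused when the regression line is drawn out explicitly.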
Regression Analysis:
The regression line that has been implicitly generated by the preceding calculations can be precisely defined by just two numerical values. The first of these, known as the intercept, indicates where the line starts; and the second, known as the slope, indicates the rate at which the line angles either upward (+) or downward (-), once it gets started. The formulas and calculations for intercept (a) and slope (b) are shown below. (Note that the slope is shown first, because the value of the slope must be known before you can calculate the value of the intercept.)
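$$b = \frac{SC_{XY}}{SS_X} = 1.31 \qquad\qquad a = M_Y - bM_X = 7.0 - (1.31)(3.5) \approx 2.4$$

Here $SC_{XY}$ is the sum of the cross-products of the X and Y deviates, $SS_X$ is the sum of squared deviates for X, and $M_X$ and $M_Y$ are the means of X and Y.
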
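As a rough sketch of how slope and intercept fall out of the same preliminary sums, the Python below computes the slope first and then the intercept. The data values are an assumption, chosen so that the results agree with the slope of 1.31 and intercept of 2.4 quoted for the present example; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    sc_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The slope must be known before the intercept can be calculated
    slope = sc_xy / ss_x
    # The line passes through the point (mean of X, mean of Y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative data (an assumption, not taken from the text)
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 2, 4, 10, 12, 8]

slope, intercept = fit_line(xs, ys)
print(f"slope b = {slope:.2f}, intercept a = {intercept:.2f}")
```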
In the following figure I show the same graph that appears at the top of this page, but now constructed in such a way as to emphasize the intercept and slope of the regression line.

The intercept, shown on the left-hand side of the graph, is the point at which the dotted extension of the regression line crosses the vertical Y axis, provided that the Y axis is lined up with the point on the horizontal axis where X is equal to zero. (Be careful with this, because bivariate coordinate plots do not always begin the X axis at X=0.)

The slope of the regression line is indicated by the pattern in the graph that looks like a flight of stairs. What this pattern shows is that for each increase of one unit in the value of X, the value of Y increases by 1.31 units. Thus, when X is equal to zero, Y is equal to the intercept, which is 2.4; when X=1.0, Y is equal to the intercept plus 1.31 (i.e., 2.4+1.31=3.71); when X=2.0, Y is equal to the intercept plus 2.62 (i.e., 2.4+2.62=5.02); and so on.
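The staircase pattern can be reproduced with a short Python sketch that climbs the line one unit of X at a time. The intercept and slope are the values given above; the function name `predict_y` is a hypothetical choice.

```python
INTERCEPT = 2.4  # a, the value of Y when X is zero
SLOPE = 1.31     # b, the increase in Y for each one-unit increase in X

def predict_y(x, a=INTERCEPT, b=SLOPE):
    """Return the height of the regression line at a given value of X."""
    return a + b * x

# Each step of one unit in X raises the line by 1.31 units of Y
for x in (0, 1, 2, 3):
    print(f"X={x}: Y on the regression line = {predict_y(x):.2f}")
```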

Standard Error of Estimate
The slope and intercept of the regression line are in fact already generated implicitly, behind the scenes, when you perform the calculations for the correlation coefficient. They need to be drawn out explicitly only for the practical purpose of making predictions based on the observed correlation. As discussed in class, the general form of such a prediction is

$$\text{predicted } Y = a + bX \pm \text{probable error}$$

The measure of probable error in this situation is a quantity known as the standard error of estimate, which is essentially a standard deviation, a measure of the aggregate degree to which the observed bivariate data points deviate from the line of regression. As indicated in Chapter xxx, the standard error of estimate takes somewhat different forms, according to whether it is regarded as a descriptive measure or an inferential measure.

To illustrate, consider again the bivariate values and scatter plot first shown at the top of this page.

The sum of squared vertical distances from the regression line, which for the present example comes to 39.2, can be regarded as a sum of squared deviates; divide that quantity by N, which in this case is equal to 6, and you end up with a variance. This particular measure of variability is spoken of as the residual variance of Y, so named because it is the amount of variability in the variable Y that is not associated with variability in the variable X.

In practice, you will not actually need to calculate the sum of the squared distances from the regression line, because its value can be much more easily calculated as $SS_Y(1-r^2)$. At any rate, take the square root of this (descriptive) residual variance and you end up with the (descriptive) standard error of estimate.

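residual variance (descriptive): $\dfrac{\text{sum of squared residuals}}{N} = \dfrac{39.2}{6} = 6.53$

standard error of estimate (descriptive): $\sqrt{6.53} = 2.56$
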
Replace N in the above expression with the appropriate number of degrees of freedom (within the context of correlation and regression, df=N-2), and you have the standard error of estimate that is usable for inferential purposes.

standard error of estimate (inferential): $\sqrt{\dfrac{39.2}{6-2}} = \sqrt{9.8} = 3.13$

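The two forms of the standard error of estimate differ only in the divisor, which the following Python sketch makes explicit. The sum of squared residuals (39.2) and N (6) come from the present example; the function name `standard_error_of_estimate` and its `inferential` flag are hypothetical choices.

```python
from math import sqrt

def standard_error_of_estimate(ss_residual, n, inferential=True):
    """Square root of the residual variance of Y.

    Divides by N for the descriptive form and by df = N - 2
    for the inferential form.
    """
    divisor = n - 2 if inferential else n
    return sqrt(ss_residual / divisor)

SS_RESIDUAL = 39.2  # sum of squared vertical distances from the line
N = 6               # number of bivariate observations

se_descriptive = standard_error_of_estimate(SS_RESIDUAL, N, inferential=False)
se_inferential = standard_error_of_estimate(SS_RESIDUAL, N)
print(f"descriptive: {se_descriptive:.2f}, inferential: {se_inferential:.2f}")
```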
As shown in the following graph, this calculated value for the inferential standard error of estimate (SE) corresponds to a pair of lines running parallel to the regression line, the first lying 3.13 units of Y above the regression line, and the other lying 3.13 units of Y below it.

Taking the inferential standard error of estimate (SE = ±3.13) as our measure of probable error, we can now recast the prediction formula given earlier as

$$\text{predicted } Y = a + bX \pm SE$$

Suppose, for example, that you wanted to predict the value of Y that would probably be associated with a newly observed value of X=4. As shown in the above graph, what you are essentially doing when you apply the prediction formula is starting out on the X axis at the point where X=4, going from there straight up to the regression line, then turning left and going straight over to the Y axis where you arrive at a predicted value of Y=7.64. Add ±3.13 to this value as a measure of probable error, and you end up with

$$\text{predicted } Y = 7.64 \pm 3.13$$

Although the underlying logic of the point will not be clear until we are well along in our consideration of concepts of probability, the basic meaning of the prediction 7.64±3.13 is that we can have about 68% confidence that the value of Y actually associated with our newly observed value of X=4 will fall somewhere within the range bounded at the one extreme by 7.64-3.13=4.51 and at the other by 7.64+3.13=10.77.
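
The arithmetic of this prediction can be checked with a brief Python sketch. The intercept (2.4), slope (1.31), and standard error of estimate (3.13) are the values from the present example; the function name `predict_with_error` is hypothetical, and the sketch does nothing more than add and subtract one standard error around the predicted value.

```python
def predict_with_error(x, a, b, se):
    """Return (lower, predicted, upper) for predicted Y = a + bX ± SE."""
    y_hat = a + b * x
    return y_hat - se, y_hat, y_hat + se

lower, y_hat, upper = predict_with_error(4, a=2.4, b=1.31, se=3.13)
print(f"predicted Y = {y_hat:.2f} ± 3.13, "
      f"i.e. from {lower:.2f} to {upper:.2f}")
```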