Chi-Square


degrees of freedom

sampling distributions of chi-square

logic and computational procedures












Degrees of Freedom

In working to understand the meaning of this concept, your first step should be to disentangle it entirely from most of the meanings that you normally associate with the word "freedom," since in this context it has nothing at all to do with freedom of the will, freedom of conscience, or anything of the sort. Degrees of freedom, df, is simply an index of the amount of random variability, mere chance coincidence, that can be present in a particular situation. Its closest literal translation would be something along the lines of "degrees of arbitrariness."

Here is a working definition of the concept as it occurs within the context of chi-square procedures. Suppose you have two cells such as the ones shown below, and you are free to plug any integer numbers that you want into them, subject only to the stipulation that the sum of the two numbers must be equal to a certain specified quantity. For purposes of illustration we will set the sum at 20, though it could actually be any positive integer value.

cell a = ?
cell b = ?
sum = 20

It will be obvious at a glance that your "freedom" in this case is limited by the fixed sum of 20. If you start by plugging the integer 8 into cell a, the number that goes into b is then inescapably fixed as 20 - 8 = 12. Start by plugging 16 into cell b and the number that goes into a is then rigidly fixed as 20 - 16 = 4. So in this two-cell situation, only one of the cells is "free" to vary arbitrarily, which is to say that there is only one degree of freedom. If you have three cells subject to a fixed sum of 20

cell a = ?
cell b = ?
cell c = ?
sum = 20

you can plug numbers arbitrarily into any two of the cells, but once those cells are plugged the value of the third is rigidly fixed. Thus, plug 6 into any one cell and 8 into either of the remaining two, and the value of the third is then fixed as 20 - (6+8) = 6. So here, with three cells, your degrees of freedom would be equal to two. The same logic then extends to cases where the number of cells is four, five, six, and so on. When applying chi-square procedures to situations in which there is only one dimension of categorization, the general principle for determining degrees of freedom is

df = (number of cells) - 1

With two dimensions of classification, it is a different formula but the same basic logic. Suppose you have two rows and two columns of cells, as shown below, and you are free to plug any integer numbers that you want into them, subject only to the stipulation that the cell values summed across rows, down columns, and overall, must add up to the fixed row, column, and overall sums shown at the right and bottom of the table.

cell a = ?    cell b = ?    50
cell c = ?    cell d = ?    40
48            42            90

Here again your "freedom" is limited by the fixed sums. Arbitrarily plug 10 into cell a and the remaining three cells are instantly fixed at b=40, c=38, and d=2. Plugging 20 into cell d fixes the other cells at b=22, c=20, and a=28; and so on. For chi-square situations involving two rows and two columns of cells, degrees of freedom will in every instance be equal to one.
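To make the point concrete, here is a minimal Python sketch of the two-by-two case. The function name complete_2x2 is my own, introduced only for illustration; it simply applies the row and column subtractions described above, showing that once cell a is chosen the other three cells follow from the fixed sums.

```python
def complete_2x2(a, row_totals, col_totals):
    """Fill in a 2x2 table from cell a and its fixed marginal totals.

    Hypothetical helper for illustration; cells are laid out as
        a  b | row1
        c  d | row2
    """
    row1, row2 = row_totals
    col1, col2 = col_totals
    if row1 + row2 != col1 + col2:
        raise ValueError("row totals and column totals must have the same sum")
    b = row1 - a  # row 1 must add up to row1
    c = col1 - a  # column 1 must add up to col1
    d = row2 - c  # row 2 must add up to row2
    return a, b, c, d


# The two cases worked through in the text (row sums 50/40, column sums 48/42).
print(complete_2x2(10, (50, 40), (48, 42)))  # (10, 40, 38, 2)
print(complete_2x2(28, (50, 40), (48, 42)))  # (28, 22, 20, 20)
```

The second call starts from a = 28, which is the same table that results from plugging 20 into cell d.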

If you have two rows and three columns, your "freedom" is increased a bit, though still limited by the fixed sums.

cell a = ?    cell b = ?    cell c = ?    50
cell d = ?    cell e = ?    cell f = ?    40
25            40            25            90

If you plug 10 arbitrarily into cell a, cell d becomes fixed at 25 - 10 = 15, since the two must add up to the first column sum, but the other four cells remain "free" to vary. Plug some other number into any of these remaining four cells, however, and everything else becomes instantly fixed. For chi-square situations involving two rows and three columns (or three rows and two columns), degrees of freedom is in every case equal to two.

More generally, when applying chi-square procedures to situations in which there are two dimensions of categorization, the general principle for determining degrees of freedom is

df = (number of rows - 1) x (number of columns - 1)
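The two rules for degrees of freedom can be written as a pair of one-line Python functions. This is only a sketch; the function names df_one_way and df_two_way are my own labels, not standard terminology.

```python
def df_one_way(n_cells):
    """Degrees of freedom with one dimension of categorization."""
    return n_cells - 1


def df_two_way(n_rows, n_cols):
    """Degrees of freedom with two dimensions of categorization."""
    return (n_rows - 1) * (n_cols - 1)


# Cases discussed in the text.
print(df_one_way(2))     # two cells, fixed sum: 1
print(df_one_way(3))     # three cells, fixed sum: 2
print(df_two_way(2, 2))  # two rows, two columns: 1
print(df_two_way(2, 3))  # two rows, three columns: 2
print(df_two_way(3, 2))  # three rows, two columns: 2
```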

For more on one-dimensional and two-dimensional chi-square tests, go to logic and computational procedures


























Sampling Distributions of Chi-Square

The first of the following graphs shows the general outlines of the sampling distributions of chi-square for df = 2, 3, and 4, while the second shows the outlines of the df = 2 distribution in closer detail. You will find it useful to compare each of these graphs with the entries that appear in the table of critical values of chi-square.
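If you would like to trace curves of this kind yourself, the following Python sketch evaluates the standard chi-square density, f(x) = x^(df/2 - 1) e^(-x/2) / (2^(df/2) Gamma(df/2)), at a few points. The density formula is the textbook one and is not stated in these notes; the function name chi2_density is a label of my own choosing.

```python
import math


def chi2_density(x, df):
    """Height of the chi-square sampling distribution at x (for x > 0)."""
    if x <= 0 or df <= 0:
        raise ValueError("this sketch only handles x > 0 and df > 0")
    half_df = df / 2.0
    numerator = x ** (half_df - 1.0) * math.exp(-x / 2.0)
    denominator = 2.0 ** half_df * math.gamma(half_df)
    return numerator / denominator


# Tabulate the three curves at a handful of chi-square values.
for df in (2, 3, 4):
    heights = [round(chi2_density(x, df), 4) for x in (0.5, 1, 2, 4, 6, 8, 10)]
    print(f"df = {df}: {heights}")
```

For df = 2 the curve falls steadily from the left edge, while for df = 3 and df = 4 it first rises to a peak and then trails off to the right.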







For more on sampling distributions of chi-square, go to logic and computational procedures



































Logic and Computational Procedures

As indicated in class, the logic of chi-square (pronounced kai to rhyme with sky) flows directly from the logic of binomial probabilities. Binomial procedures apply to situations where there are exactly two mutually exclusive categories into which observations might fall: female/male, head/tail, recovery/non-recovery, and so on. Chi-square extends the logic of binomial procedures to cover situations where there are more than two categories of possible outcome; for example, students categorized according to academic class as freshman/sophomore/junior/senior or patients with a certain disease categorized according to whether their condition, following an experimental treatment, is improved, worsened, or unchanged. Chi-square procedures also extend this logic to cover situations where there is more than one dimension of classification; for example, students categorized according to whether they describe themselves as "conservative" or "liberal," as well as by their academic class as freshman, sophomore, junior, or senior. When the observed items--students or whatever else they might be--are categorized in this fashion according to two or more separate dimensions of classification concurrently, they are said to be cross-categorized. In both kinds of cases, the chi-square test is used to determine whether an observed pattern of frequencies significantly differs from the pattern of frequencies that would be expected if nothing other than random variability (a.k.a. mere chance coincidence, sampling error) were operating in the situation.

[As my HTML formatter does not support mathematical notation mixed in with text, I will be representing a value of chi-square here as chi2.]

Chi-Square with One Dimension of Classification

Suppose that a questionnaire administered to a large national sample of college students included an item aimed at measuring conservatism versus liberalism on a certain issue of social/political relevance. The item took the form of a statement, and the response categories were "strongly disagree," "moderately disagree," "undecided," "moderately agree," and "strongly agree." I will leave it to your imagination to fill in the blanks concerning what the statement was and which end of the response scale was taken to reflect "conservative" or "liberal" attitudes. Suffice it to say that in the national sample 9.4 percent of the respondents expressed strong disagreement, 15.6 percent moderately disagreed, 34.3 percent were undecided, 27.5 percent expressed moderate agreement, and 13.2 percent strongly agreed. Professor H, upon reading the results of this survey, suspects that the students at her particular college are considerably more polarized into "conservative" and "liberal" camps, in comparison with the more general population of students studied in the national sample. To test this hypothesis, she administers the same question to a random sample of 204 students at her college and then compares the pattern of responses to the pattern of the national sample.

The null hypothesis in this situation is that the responses of Professor H's 204 respondents should not differ significantly from the 9.4%/15.6%/34.3%/27.5%/13.2% pattern of the national survey. Thus, the MCE expected frequency of response in the "strongly disagree" category would be .094 x 204 = 19.18; for the "moderately disagree" category, .156 x 204 = 31.82; and so on. What Professor H actually found, however, was something that seemed to fit rather well with her suspicion concerning polarization. All that remained was to determine whether the difference between the two patterns was significant. Here is an overview of the observed counts and percentages for the five response categories, each in comparison with the corresponding MCE expected values, along with the steps required for calculating chi2.

     strongly   moderately               moderately   strongly
     disagree   disagree     undecided   agree        agree      Total
O    28         34           50          57           35         204
     (13.7%)    (16.7%)      (24.5%)     (27.9%)      (17.2%)

...

E    19.18      31.82        69.97       56.10        26.93      204
     (9.4%)     (15.6%)      (34.3%)     (27.5%)      (13.2%)

...
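The computation summarized in the table can be sketched in a few lines of Python. The sketch assumes the usual chi-square formula, the sum over all cells of (O - E)^2 / E; the variable names are my own. Note that it uses the unrounded expected values, so individual terms may differ in the last decimal place from a hand calculation based on the rounded values of E.

```python
labels = ["strongly disagree", "moderately disagree", "undecided",
          "moderately agree", "strongly agree"]
national = [0.094, 0.156, 0.343, 0.275, 0.132]  # national-sample proportions
observed = [28, 34, 50, 57, 35]                 # Professor H's 204 respondents

n = sum(observed)
expected = [p * n for p in national]            # e.g. .094 x 204 = 19.18

# Each cell contributes (O - E)^2 / E to the total.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

for label, o, e, part in zip(labels, observed, expected, contributions):
    print(f"{label:20s} O = {o:3d}   E = {e:6.2f}   (O-E)^2/E = {part:5.2f}")
print(f"chi2 = {chi2:.2f} with df = {len(observed) - 1}")  # chi2 comes to about 12.34
```

The per-cell column makes it easy to see which categories contribute most to the total.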



The following graph shows the sampling distribution of chi-square for the case of df=4. As indicated in the graph, our calculated chi2 value of 12.34 is significant not only at and beyond the minimal .05 level, but also beyond the level for .02. Hence P < .02. In brief, we can be a shade more than 98 percent confident that the difference between the observed and MCE expected patterns of frequencies does not result from mere random variability.
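For readers who would like to check the probability directly rather than from a table of critical values, here is a small Python sketch. It relies on a standard closed form for the upper-tail area of the chi-square distribution that holds only when df is an even number; that formula is not part of these notes, and the function name chi2_upper_tail_even_df is my own.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square >= x) when df is an even, positive integer.

    Uses exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers even, positive df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


p = chi2_upper_tail_even_df(12.34, 4)
print(f"P = {p:.4f}")  # about 0.015: beyond the .02 level, short of the .01 level
```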



Do keep in mind, however, that the chi-square test we have just performed is intrinsically non-directional. In and of itself, the significant chi-square value says nothing at all about the particular texture or direction of the difference. Examine the above array of data in detail, however, and you will see that the texture of the difference is consistent with Professor H's hypothesis concerning greater conservative-liberal polarization among the students at her college. In particular, there were fewer respondents in the "undecided" category than would have been expected on the null hypothesis, and more in the "strongly agree" and "strongly disagree" categories. These three cells, in fact, accounted for all but a small fraction of the calculated chi-square value of 12.34.

Chi-Square with Two Dimensions of Classification

This is one of the examples given in class, based on data reported in the 17 August 1996 issue of the British medical journal Lancet. Researchers at Columbia-Presbyterian Medical Center (NYC) sorted the subjects in a sample of 1,124 elderly women according to two dimensions of classification: (1) Was the subject receiving estrogen-replacement therapy (ERT) at any time during the preceding ten years? [yes/no] and (2) Did the subject develop clinically diagnosable indications of Alzheimer's disease at any time during the preceding five years? [yes/no] [Note: none of the subjects showed signs of Alzheimer's at the beginning of the five-year period.] Their research hypothesis (H1) was that ERT might be of some benefit in preventing or postponing the onset of Alzheimer's symptoms; hence, that the subjects receiving ERT should show a smaller percentage of Alzheimer's onset during the five-year period in comparison with the subjects who did not receive ERT. The null hypothesis (H0) was that the percentages of Alzheimer's onset within the two groups should be the same, within the limits of random variability. And here are the data as reported, cross-categorized according to the two dimensions of classification. You will see that the observed results are consistent with H1. All that remains is to determine whether the observed difference--5.77% among the ERT subjects versus 16.01% among the non-ERT subjects--is significant.

                       Alzheimer's onset during five-year period
                       no              yes             Totals
received ERT           147             9               156
                       (94.23%)        (5.77%)
did not receive ERT    813             155             968
                       (83.99%)        (16.01%)
Totals                 960             164             1,124

The procedures for applying chi-square to a two-dimensional situation of this sort are the same as we saw for the one-dimensional situation, except for two small modifications. The first has to do with what we take to be the MCE expected cell frequencies, and the second pertains to how we determine the appropriate value for degrees of freedom. In both of these modifications the underlying logic is the same; it is only the details that differ. In this section we will consider only the determination of the expected cell frequencies; degrees of freedom for a chi-square test are discussed separately in the section on degrees of freedom.

In the one-dimensional chi-square test the values of E are simply stipulated in advance. In the student survey example they were set to match the proportions of responses in the various categories that had been found in the large national survey. In the two-dimensional situation, on the other hand, the expected cell frequencies are not given in advance, nor are they intuitively obvious. I will illustrate the logic of the point with a simple if somewhat fanciful example. Two friends, A and B, believe their friendship to be so deep as to produce a remarkable pattern of correspondences. Often they find that they are thinking the same thing at the same time. Often, even when separated, they find that they are doing the same thing at the same time. To put their faith to the test, they each toss a coin 100 times in succession, recording on each occasion the head/tail outcome of A's toss and the corresponding head/tail outcome of B's toss. Their hypothesis is that when A gets a head, B will also tend to get a head; and that when A gets a tail, B will also tend to get a tail. They of course do not suppose that even their relationship is so deep as to ensure these correspondences in 100 percent of the tosses, though they do believe that the pattern of such correspondences will significantly exceed what would be expected on the basis of mere chance coincidence.

For the sake of discussion, suppose that A and B each end up with exactly 50 heads and 50 tails. In that case the contingency table would have the following marginal totals, irrespective of how much or little the heads and tails outcomes of A and B might correspond.

                      Outcomes for B
                      tail            head
Outcomes   head       [cell a]        [cell b]        50 (total heads for A)
for A      tail       [cell c]        [cell d]        50 (total tails for A)
                      50              50              100
                      (total tails    (total heads    (total number of
                      for B)          for B)          paired tosses)

Actually, this is one of the rare scenarios for which the values of E would be fairly intuitively obvious. If you were asked to guess the values of the MCE expected frequencies for cells a, b, c, and d, I expect you would answer 25/25/25/25--and this would be quite correct. Now, if only you can make explicit the hidden logic that leads you to this answer, you will have the procedure for figuring out the values of E for two-dimensional chi-square situations in general. I suspect the core of your implicit logic runs something like this: If A gets 50 percent heads and B gets 50 percent tails; and if nothing other than mere chance coincidence is operating in the situation; then the (conjunctive) probability that any particular one of the 100 paired tosses will include a head for A and a tail for B is .5 x .5 = .25. Thus, the expected frequency for cell a is 25 percent of the total number of paired tosses: Ea = .25 x 100 = 25. The same logic would also lead you to Eb = 25, Ec = 25, and Ed = 25.

Now see how the same logic can be extended to situations that are not intuitively obvious. When A and B perform their series of 100 paired tosses, it is actually not very likely that they would both end up with exactly 50 percent heads and 50 percent tails. It is much more likely that they each would end up with something slightly different from an exact 50/50 split. Suppose that A comes out with 46 heads and 54 tails, while B ends up with 48 heads and 52 tails. In this case the marginal totals would be distributed as follows

                      Outcomes for B
                      tail            head
Outcomes   head       [cell a]        [cell b]        46
for A      tail       [cell c]        [cell d]        54
                      52              48              100

and the logic would run like this. It is the same logic, word for word, as outlined above, except for the different numerical values that get plugged into it: If A gets 46 percent heads and B gets 52 percent tails; and if nothing other than mere chance coincidence is operating in the situation; then the (conjunctive) probability that any particular one of the 100 paired tosses will include a head for A and a tail for B is .46 x .52 = .2392. Thus, the expected frequency for cell a is 23.92 percent of the total number of paired tosses: Ea = .2392 x 100 = 23.92. By the same reasoning you would also arrive at Eb = 22.08, Ec = 28.08, and Ed = 25.92.

A more streamlined formulaic way of arriving at the expected cell frequencies is simply this: For each cell, multiply the marginal total for the row to which the cell belongs by the marginal total for the column to which the cell belongs, and then divide the result by the total number of cross-categorized observations. That is, for any particular cell

Ecell = (R x C) / N

Where

R = the marginal total for the row to which the cell belongs

C = the marginal total for the column to which the cell belongs

N = the total number of cross-categorized observations
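As a minimal sketch, the formula translates into Python as follows. The function name expected_from_margins is my own, introduced only for illustration; it takes the row totals and column totals and returns the expected frequency for every cell.

```python
def expected_from_margins(row_totals, col_totals):
    """Expected cell frequencies, Ecell = (R x C) / N, from marginal totals."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row totals and column totals must have the same sum")
    return [[r * c / n for c in col_totals] for r in row_totals]


# A's tosses: 46 heads, 54 tails (rows); B's tosses: 52 tails, 48 heads (columns).
for row in expected_from_margins([46, 54], [52, 48]):
    print([round(e, 2) for e in row])
# [23.92, 22.08]
# [28.08, 25.92]
```

Because the function works from the marginal totals alone, it applies unchanged to tables with more than two rows or columns.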

The following illustration shows how this simple calculation works out for each of the cells of the present example.

                      Outcomes for B
                      tail              head
Outcomes   head       (46 x 52)/100     (46 x 48)/100     46
for A                 = 23.92           = 22.08
           tail       (54 x 52)/100     (54 x 48)/100     54
                      = 28.08           = 25.92
                      52                48                100

And here is the procedure applied to our ERT/Alzheimer's example. The marginal totals and observed cell frequencies (O) are the same as shown when we introduced the example; within each cell we also now include the values of E (the third entry in each cell) that would be obtained by using the formula

Ecell = (R x C) / N

along with the appropriate values of R, C, and N.

                       Alzheimer's onset during five-year period
                       no              yes             Totals
received ERT           147             9               156
                       (94.23%)        (5.77%)
                       E = 133.24      E = 22.76
did not receive ERT    813             155             968
                       (83.99%)        (16.01%)
                       E = 826.76      E = 141.24
Totals                 960             164             1,124

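The calculation for chi2 is then as follows, summing (O - E)^2 / E over the four cells:

chi2 = (147 - 133.24)^2/133.24 + (9 - 22.76)^2/22.76 + (813 - 826.76)^2/826.76 + (155 - 141.24)^2/141.24
     = 1.42 + 8.32 + 0.23 + 1.34
     = 11.31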



The following graph shows the sampling distribution of chi-square for the case of df=1. As indicated in the graph, our calculated chi2 value of 11.31 is significant not only at and beyond the minimal .05 level, but also beyond the levels for .02, .01, and .001. Hence P < .001.

In brief, we can be very confident indeed that the difference between the observed and MCE expected patterns of frequencies does not result from mere random variability.
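The whole two-dimensional procedure for the ERT/Alzheimer's data can be sketched in Python as follows. Two points are assumptions on my part rather than anything stated in these notes: the sketch applies no continuity correction (which is what reproduces the value of 11.31), and it obtains the probability for df = 1 from the standard identity P = erfc(sqrt(chi2 / 2)) rather than from a table of critical values.

```python
import math

observed = [[147, 9],     # received ERT: no onset, onset
            [813, 155]]   # did not receive ERT: no onset, onset

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over all cells, with Ecell = (R x C) / N.
chi2 = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e = r * c / n
        chi2 += (observed[i][j] - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability; this identity is valid for df = 1 only.
p = math.erfc(math.sqrt(chi2 / 2.0))

print(f"chi2 = {chi2:.2f}, df = {df}, P = {p:.5f}")  # chi2 is about 11.31, P below .001
```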


