Configural Frequency Analysis
Motivating CFA
Configural Frequency Analysis (CFA) allows one to analyze cross-classifications at the level of individual cells (or groups of cells). The usefulness of CFA is demonstrated in the following example (from Agresti, 2002; see also Wiedermann et al., 2022). The three variables Premarital Sex (P; 0 = no, 1 = yes), Extramarital Sex (E; 0 = no, 1 = yes), and Marital Status (M; 0 = still married, 1 = divorced) are crossed to span the following cross-classification.
P | E | M | Freq |
0 | 0 | 0 | 452 |
0 | 0 | 1 | 282 |
0 | 1 | 0 | 8 |
0 | 1 | 1 | 53 |
1 | 0 | 0 | 67 |
1 | 0 | 1 | 114 |
1 | 1 | 0 | 15 |
1 | 1 | 1 | 45 |
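As a minimal Python sketch, the table can be stored as a dictionary keyed by configuration (P, E, M). The helper below (the names FREQ and divorce_shares are illustrative, not taken from any CFA package) also reports, for each combination of P and E, the number of cases and the share of them who are divorced. These descriptive shares are what the following analyses try to explain.

```python
# Observed frequencies from the table above, keyed by configuration (P, E, M).
# P = premarital sex, E = extramarital sex, M = marital status (1 = divorced).
FREQ = {
    (0, 0, 0): 452, (0, 0, 1): 282,
    (0, 1, 0): 8, (0, 1, 1): 53,
    (1, 0, 0): 67, (1, 0, 1): 114,
    (1, 1, 0): 15, (1, 1, 1): 45,
}


def divorce_shares(freq):
    """Return {(P, E): (n, share divorced)} for each predictor pattern."""
    shares = {}
    for p in (0, 1):
        for e in (0, 1):
            married = freq[(p, e, 0)]
            divorced = freq[(p, e, 1)]
            n = married + divorced
            shares[(p, e)] = (n, divorced / n)
    return shares


N = sum(FREQ.values())
shares = divorce_shares(FREQ)
print("N =", N)
for (p, e), (n, share) in shares.items():
    print(f"P={p} E={e}: n={n}, share divorced={share:.3f}")
```

Run as is, it prints N = 1036 and divorced shares of 0.384 (P = 0, E = 0), 0.869 (P = 0, E = 1), 0.630 (P = 1, E = 0), and 0.750 (P = 1, E = 1).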
We ask whether the two sexual activity variables allow us to predict marital status at the time of the interview. To answer this question, we first perform a binary logistic regression with P, E, and their interaction (P × E) as predictors and M as the outcome variable. The results are as follows:
Logistic Regression Results
Given a Likelihood Ratio (LR) Chi-Square of 102.7 with 3 degrees of freedom, the regression model fails to represent the data in the table properly. Therefore, the parameters should not be interpreted, even though all of them are significant:
Parameter | Estimate | SE | z-value | p-value | 95% CI lower | 95% CI upper |
Constant | 5.633 | 0.923 | 6.105 | <.001 | 3.825 | 7.442 |
P | -2.799 | 0.592 | -4.727 | <.001 | -3.959 | -1.638 |
E | -4.158 | 0.843 | -4.931 | <.001 | -5.811 | -2.505 |
P × E | 1.796 | 0.512 | 3.506 | <.001 | 0.792 | 2.799 |
Note: SE = standard error; CI = confidence interval.
From this result, we conclude that simple binary logistic regression does not allow us to explain the observed frequency distribution. At this point in the analysis, the question remains unanswered.
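The LR chi-square can be checked directly from the frequency table. The Python sketch below (function names are illustrative) computes expected frequencies under the log-linear model in which P and E may be associated but M is independent of both, e = n(P, E) · n(M) / N, and then the likelihood-ratio statistic G² = 2 Σ o ln(o / e). With 8 cells and 5 free parameters (constant, P, E, P × E, M) this model has 3 degrees of freedom, and the sketch prints `G2 = 102.7, df = 3`, matching the value reported above. The same expected frequencies reappear in the CFA results below.

```python
import math

# Observed frequencies keyed by configuration (P, E, M), copied from the data table.
FREQ = {
    (0, 0, 0): 452, (0, 0, 1): 282,
    (0, 1, 0): 8, (0, 1, 1): 53,
    (1, 0, 0): 67, (1, 0, 1): 114,
    (1, 1, 0): 15, (1, 1, 1): 45,
}


def expected_pe_m(freq):
    """Expected frequencies when P and E may be associated but M is independent
    of both: e(p, e, m) = n(p, e, +) * n(+, +, m) / N."""
    total = sum(freq.values())
    n_pe, n_m = {}, {}
    for cell, f in freq.items():
        n_pe[cell[:2]] = n_pe.get(cell[:2], 0) + f
        n_m[cell[2]] = n_m.get(cell[2], 0) + f
    return {cell: n_pe[cell[:2]] * n_m[cell[2]] / total for cell in freq}


def g_squared(freq, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(o * ln(o / e)); empty cells add 0."""
    return 2.0 * sum(o * math.log(o / expected[c]) for c, o in freq.items() if o > 0)


expected = expected_pe_m(FREQ)
g2 = g_squared(FREQ, expected)
# 8 cells minus 5 free parameters (constant, P, E, P x E, M) leaves 3 df.
df = len(FREQ) - 5
print(f"G2 = {g2:.1f}, df = {df}")
```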
CFA Results
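To answer the question, we now conduct a CFA whose base model posits that the two predictors are related to each other but not to the outcome variable. Significant deviations at the level of individual cells indicate where and how predictors and outcome are related: a cell that contains significantly more cases than the base model predicts constitutes a type, and a cell that contains significantly fewer cases constitutes an antitype. The following table summarizes the CFA results.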
Configuration (P E M) | Observed | Expected | Pearson X² | p-value | Decision |
0 0 0 | 452.00 | 384.004 | 12.0402 | .000521 | Type |
0 0 1 | 282.00 | 349.996 | 13.2101 | .000278 | Antitype |
0 1 0 | 8.00 | 31.913 | 17.9186 | .000023 | Antitype |
0 1 1 | 53.00 | 29.087 | 19.6596 | .000009 | Type |
1 0 0 | 67.00 | 94.693 | 8.0989 | .004429 | Antitype |
1 0 1 | 114.00 | 86.307 | 8.8858 | .002874 | Type |
1 1 0 | 15.00 | 31.390 | 8.5579 | .003440 | Antitype |
1 1 1 | 45.00 | 28.610 | 9.3894 | .002182 | Type |
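The expected frequencies, test statistics, and decisions in the table can be recomputed with a short Python sketch (function names are illustrative). Expected frequencies follow the base model described above, e = n(P, E) · n(M) / N; the statistic column matches the cell-wise Pearson component X² = (o - e)² / e, referred to a chi-square distribution with 1 df. The table does not state the significance threshold, so the sketch assumes a Bonferroni-adjusted level α* = 0.05 / 8 ≈ .00625; every p-value in the table lies below it.

```python
import math

# Observed frequencies keyed by configuration (P, E, M), copied from the data table.
FREQ = {
    (0, 0, 0): 452, (0, 0, 1): 282,
    (0, 1, 0): 8, (0, 1, 1): 53,
    (1, 0, 0): 67, (1, 0, 1): 114,
    (1, 1, 0): 15, (1, 1, 1): 45,
}


def expected_pe_m(freq):
    """Expected frequencies under the base model: n(P, E, +) * n(+, +, M) / N."""
    total = sum(freq.values())
    n_pe, n_m = {}, {}
    for cell, f in freq.items():
        n_pe[cell[:2]] = n_pe.get(cell[:2], 0) + f
        n_m[cell[2]] = n_m.get(cell[2], 0) + f
    return {cell: n_pe[cell[:2]] * n_m[cell[2]] / total for cell in freq}


def cfa_cell_tests(freq, alpha=0.05):
    """Per-cell Pearson X^2 tests against the base model.

    Assumption: alpha is Bonferroni-protected by dividing it by the number of cells.
    """
    expected = expected_pe_m(freq)
    alpha_star = alpha / len(freq)
    rows = []
    for cell, obs in freq.items():
        exp = expected[cell]
        x2 = (obs - exp) ** 2 / exp
        # Upper tail of a chi-square with 1 df: P(X > x2) = erfc(sqrt(x2 / 2)).
        p = math.erfc(math.sqrt(x2 / 2))
        if p >= alpha_star:
            decision = "-"  # hypothetical label for a non-significant cell
        elif obs > exp:
            decision = "Type"
        else:
            decision = "Antitype"
        rows.append((cell, obs, exp, x2, p, decision))
    return rows


rows = cfa_cell_tests(FREQ)
for cell, obs, exp, x2, p, decision in rows:
    label = " ".join(str(v) for v in cell)
    print(f"{label}  {obs:7.2f} {exp:8.3f} {x2:8.4f} {p:.6f}  {decision}")
```

Run as is, it labels the eight cells Type, Antitype, Antitype, Type, Antitype, Type, Antitype, Type, in the same order as the table.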
The log-linear CFA base model that was estimated (i.e., the probability model used to estimate expected cell frequencies) is equivalent to the binary logistic regression model. The results show that each cell deviates significantly from the model. Instead of interpreting each cell individually, in this example we ask
(1) whether engaging in premarital and/or extramarital sex results in divorce, and
(2) whether extramarital sex alone, that is, regardless of premarital sex, also results in divorce.
The first question is supported by Cells 0 1 0, 1 0 0, and 1 1 0, that is, the still-married cases with premarital sex, extramarital sex, or both. Combined, these three cells contain significantly fewer cases than expected under the assumption that P and E on the predictor side and M on the outcome side are independent (Stouffer’s Z = -5.92; p < 0.001). The second question is supported by Cells 0 1 0 and 1 1 0, the still-married cases with extramarital sex. Combined, these two cells also contain significantly fewer cases than expected under the same assumption (Stouffer’s Z = -5.14; p < 0.001).
Thus, CFA allows one to conclude that marriages in which one partner had engaged in premarital and/or extramarital sex are less likely to be stable. The same applies to marriages in which one partner had engaged in extramarital sex, regardless of premarital sex. None of these conclusions can be drawn from the results of the (ill-fitting) binary logistic regression.
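Stouffer's method combines k per-cell z scores as Z = Σ z_i / √k. The text does not say which per-cell z enters the sum; the Python sketch below assumes the normal approximation to the binomial test, z = (o - e) / √(e(1 - e/N)), with e taken from the base model in which P and E may be associated but M is independent of both. With that assumption it reproduces Z = -5.92 for Cells 0 1 0, 1 0 0, and 1 1 0, and Z = -5.14 for Cells 0 1 0 and 1 1 0. Reporting a lower-tail p-value is a further assumption, chosen because both hypotheses predict fewer cases than expected; function names are illustrative.

```python
import math

# Observed frequencies keyed by configuration (P, E, M), copied from the data table.
FREQ = {
    (0, 0, 0): 452, (0, 0, 1): 282,
    (0, 1, 0): 8, (0, 1, 1): 53,
    (1, 0, 0): 67, (1, 0, 1): 114,
    (1, 1, 0): 15, (1, 1, 1): 45,
}


def expected_pe_m(freq):
    """Expected frequencies under the base model: n(P, E, +) * n(+, +, M) / N."""
    total = sum(freq.values())
    n_pe, n_m = {}, {}
    for cell, f in freq.items():
        n_pe[cell[:2]] = n_pe.get(cell[:2], 0) + f
        n_m[cell[2]] = n_m.get(cell[2], 0) + f
    return {cell: n_pe[cell[:2]] * n_m[cell[2]] / total for cell in freq}


def binomial_z(obs, exp, total):
    """Normal approximation to the binomial test for one cell (assumed z score)."""
    return (obs - exp) / math.sqrt(exp * (1 - exp / total))


def stouffer_z(freq, cells):
    """Stouffer's combined Z: the sum of k per-cell z scores divided by sqrt(k)."""
    total = sum(freq.values())
    expected = expected_pe_m(freq)
    zs = [binomial_z(freq[c], expected[c], total) for c in cells]
    return sum(zs) / math.sqrt(len(zs))


# Question 1: still-married cells with premarital and/or extramarital sex.
z_q1 = stouffer_z(FREQ, [(0, 1, 0), (1, 0, 0), (1, 1, 0)])
# Question 2: still-married cells with extramarital sex, regardless of P.
z_q2 = stouffer_z(FREQ, [(0, 1, 0), (1, 1, 0)])

for name, z in (("Question 1", z_q1), ("Question 2", z_q2)):
    # Lower-tail p-value Phi(z) = 0.5 * erfc(-z / sqrt(2)).
    p = 0.5 * math.erfc(-z / math.sqrt(2))
    print(f"{name}: Z = {z:.2f}, one-sided p = {p:.1e}")
```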