This is based on lecture notes prepared together with Mark Gilthorpe for his module "Advanced Modelling Strategies".

As you know, the covariates in a statistical analysis can play a variety of different roles from a causal inference perspective: they can be mediators, confounders, proxy confounders, or competing exposures. If a suitable set of covariates can be identified that removes confounding, we may proceed to estimate our causal effect using a multivariable regression model. In linear regression models, however, there are only two types of variables: the dependent variable (DV) and the independent variables (IVs, or predictors). No further distinction is made between the IVs – in particular, the exposure is by no means a "special" IV and is treated just like any other IV. There is thus a conceptual mismatch between the causal theory (DAG) that leads us to formulate a multivariable regression model (which distinguishes the exposure-outcome relationship from the statistical adjustment for confounding) and the regression model itself. This conceptual mismatch can easily lead to misinterpretation of the results of a multivariable regression model.

One particularly widespread misconception is known as *mutual adjustment*, more recently dubbed the *Table 2 fallacy*: the first table in most epidemiological articles usually describes the study data, and the second table reports the results of a multivariable regression model, where erroneous attempts at ‘mutual adjustment’ often appear. To illustrate the fallacy, assume that we wish to estimate the effect of X on Y. We know (e.g. from a DAG) that there is only one confounder, Z, so we run the regression Y~X+Z. If our background knowledge and the statistical assumptions of the regression (e.g. normality) hold, then the coefficient of X estimates the causal effect of X on Y. The ‘Table 2 fallacy’ is the belief that we can also interpret the coefficient of Z as the effect of Z on Y; in larger models, it is the belief that all coefficients can be interpreted in this way with respect to Y.

To see why this is not true, let us look at an example DAG that matches our scenario.

Figure 1

digraph G {
X [pos="1,1"]
Y [pos="2,0"]
Z [pos="0,0"]
Z -> Y
Z -> X -> Y
}

With respect to the effect of X on Y, adjustment for Z removes all confounding, but what does including X in the model mean for the effect of Z on Y?

As we can see, X *mediates* the effect of Z on Y, and adjusting for a mediator is inappropriate when estimating a total causal effect. Thus, the Z coefficient in our model cannot be interpreted as a total causal effect. Instead, we could interpret it as the *direct effect* of Z on Y when X is held constant; this could be stronger than, weaker than, or even opposite in sign to the total effect (see Simpson's paradox).
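The distinction can be checked numerically. The following sketch simulates data from the DAG in Figure 1; the structural coefficients a, b, c (and the Gaussian noise) are illustrative assumptions, not values from the text. The model Y~X+Z recovers only the direct effect of Z (c), while the model Y~Z recovers the total effect (c + a*b):

```python
import random

random.seed(0)
n = 20_000

# Hypothetical structural coefficients for Figure 1 (chosen for illustration)
a, b, c = 0.5, 1.0, 0.3   # Z -> X, X -> Y, Z -> Y

# Simulate data consistent with the DAG  Z -> X -> Y,  Z -> Y
Z = [random.gauss(0, 1) for _ in range(n)]
X = [a * z + random.gauss(0, 1) for z in Z]
Y = [b * x + c * z + random.gauss(0, 1) for x, z in zip(X, Z)]

def ols(y, *cols):
    """Least-squares slopes (intercept fitted, then dropped) via normal equations."""
    rows = [[1.0] + [col[i] for col in cols] for i in range(len(y))]
    k = len(rows[0])
    A = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    v = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    for p in range(k):                       # Gauss-Jordan elimination on [A | v]
        piv = A[p][p]
        A[p] = [x / piv for x in A[p]]
        v[p] /= piv
        for r in range(k):
            if r != p:
                f = A[r][p]
                A[r] = [x - f * ap for x, ap in zip(A[r], A[p])]
                v[r] -= f * v[p]
    return v[1:]                             # slopes, without the intercept

bx, bz = ols(Y, X, Z)       # model Y ~ X + Z
(bz_total,) = ols(Y, Z)     # model Y ~ Z

print(f"X coefficient in Y ~ X + Z: {bx:.2f}  (causal effect of X: b = {b})")
print(f"Z coefficient in Y ~ X + Z: {bz:.2f}  (direct effect only: c = {c})")
print(f"Z coefficient in Y ~ Z:     {bz_total:.2f}  (total effect: c + a*b = {c + a*b})")
```

With these (assumed) coefficients the total effect of Z is 0.8, but the multivariable model reports only the direct effect 0.3 – the Table 2 fallacy in miniature.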


Thus far, it would seem that we can at least interpret every coefficient in a multivariable regression model as either a total or a direct causal effect. To see that even this can fail, let us add another variable to our DAG: U, which affects both Z and Y.

Figure 2

digraph G {
X [pos="1,1"]
Y [pos="2,0"]
Z [pos="0,0"]
U [pos=".7,-1"]
Z -> Y
U -> Y
U -> Z -> X -> Y
}

Despite this new variable, it is still sufficient to adjust for Z alone to unconfound the X→Y effect (can you see why?). Thus, the validity of the X coefficient is unchanged. Upon examining Z in this situation, however, we encounter difficulties.

The new variable U acts as a confounder of the Z→Y relationship, which means that we would have to interpret the Z coefficient as a ‘direct effect that is confounded by U’ – not exactly a helpful interpretation. Indeed, no single multivariable regression model can estimate the causal effects of X and Z at the same time: to estimate the X effect we must include X as an IV, but to estimate the Z effect we must not include X. In general, a single regression model cannot identify multiple causal effects, and usually at most one of its coefficients can be interpreted as a (total) causal effect.


If we are interested in multiple causal effects, we need multiple regression models. In Figure 2, we can get the effect of X from the model Y~X+Z because adjustment for Z unconfounds the X→Y effect; and we can get the effect of Z from the model Y~Z+U because adjustment for U unconfounds the Z → Y effect.
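Extending the earlier simulation sketch to Figure 2 (again with purely illustrative coefficients d, e, a, b, c) lets us check both models numerically: Y~X+Z still recovers the X effect, its Z coefficient is neither the direct nor the total effect of Z, and Y~Z+U recovers the total effect of Z:

```python
import random

random.seed(1)
n = 20_000

# Hypothetical structural coefficients for Figure 2 (chosen for illustration)
d, e = 0.8, 0.6           # U -> Z, U -> Y
a, b, c = 0.5, 1.0, 0.3   # Z -> X, X -> Y, Z -> Y

# Simulate data consistent with the DAG  U -> Z -> X -> Y,  U -> Y,  Z -> Y
U = [random.gauss(0, 1) for _ in range(n)]
Z = [d * u + random.gauss(0, 1) for u in U]
X = [a * z + random.gauss(0, 1) for z in Z]
Y = [b * x + c * z + e * u + random.gauss(0, 1) for x, z, u in zip(X, Z, U)]

def ols(y, *cols):
    """Least-squares slopes (intercept fitted, then dropped) via normal equations."""
    rows = [[1.0] + [col[i] for col in cols] for i in range(len(y))]
    k = len(rows[0])
    A = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    v = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    for p in range(k):                       # Gauss-Jordan elimination on [A | v]
        piv = A[p][p]
        A[p] = [x / piv for x in A[p]]
        v[p] /= piv
        for r in range(k):
            if r != p:
                f = A[r][p]
                A[r] = [x - f * ap for x, ap in zip(A[r], A[p])]
                v[r] -= f * v[p]
    return v[1:]                             # slopes, without the intercept

bx, bz_conf = ols(Y, X, Z)   # model Y ~ X + Z
bz_tot, _u = ols(Y, Z, U)    # model Y ~ Z + U

print(f"X coefficient in Y ~ X + Z: {bx:.2f}  (causal effect of X: b = {b})")
print(f"Z coefficient in Y ~ X + Z: {bz_conf:.2f}  (neither c = {c} nor c + a*b = {c + a*b})")
print(f"Z coefficient in Y ~ Z + U: {bz_tot:.2f}  (total effect of Z: c + a*b = {c + a*b})")
```

As the argument above predicts, the Z coefficient in Y~X+Z lands on a value with no clean causal interpretation, while each of the two dedicated models recovers the effect it was built for.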

The concept of ‘mutual adjustment’, as often encountered in the literature, is therefore seriously misleading: a multivariable regression model does not ‘mutually adjust’ all of its covariates for one another, and its coefficients generally require different – and sometimes no – causal interpretations.