This is based on lecture notes prepared together with Mark Gilthorpe for his module "Advanced Modelling Strategies".

In empirical studies we often distinguish two variables of interest: the
**exposure**, or independent variable, or cause, and the **outcome**,
or dependent variable, or effect. Once these two special variables are
selected, the other variables in the study (whether measured or unmeasured)
are called **covariates**.

Covariates can be categorized into several roles; some of these roles are mutually exclusive, while others are not. Below we define four of these roles. In our definitions we use kinship terminology: an ancestor of a variable V is any variable from which a directed path leads to V, and V is then a descendant of that variable. In all of the following, we assume that X is the exposure and Y is the outcome.

A **confounder** is a variable that is both an ancestor of the exposure and an ancestor of the outcome (along a path that does not include the exposure). For instance, Z is a confounder in the following DAG:

dag {
X [exposure,pos="0.000,1.000"]
Y [outcome,pos="1.000,1.100"]
Z [pos="0.500,2.000"]
X -> Y
Z -> X
Z -> Y
}

To understand why the restriction "along a path that does not include the exposure" is needed in the definition above, consider the following example:

dag {
X [exposure,pos="0.000,1.000"]
Y [outcome,pos="1.000,1.100"]
Z [pos="0.500,2.000"]
X -> Y
Z -> X
}

Here, Z is no longer a confounder, even though it is an ancestor of both the exposure and the outcome. This is because the only path from Z to Y leads through the exposure X.
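The role of the restriction can be checked mechanically. The Python sketch below is our own illustration, not part of the original notes: it encodes each DAG as a list of `(parent, child)` edges (mirroring the `A -> B` lines of the DAG code) and compares a naive rule ("ancestor of both X and Y") with the restricted rule, in which Z must reach Y along a path that avoids X. The helper names `ancestors`, `is_confounder_naive` and `is_confounder` are hypothetical.

```python
def ancestors(edges, target, blocked=frozenset()):
    """Return every node with a directed path into `target` that avoids the nodes in `blocked`."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found, stack = set(), [target]
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in found and p not in blocked:
                found.add(p)
                stack.append(p)
    return found


def is_confounder_naive(edges, z, x, y):
    # Naive rule: Z is an ancestor of both the exposure and the outcome.
    return z in ancestors(edges, x) and z in ancestors(edges, y)


def is_confounder(edges, z, x, y):
    # Restricted rule: Z must also reach Y along a path that does not include X.
    return z in ancestors(edges, x) and z in ancestors(edges, y, blocked={x})


first_dag = [("X", "Y"), ("Z", "X"), ("Z", "Y")]
second_dag = [("X", "Y"), ("Z", "X")]

for name, dag in [("first", first_dag), ("second", second_dag)]:
    print(name, is_confounder_naive(dag, "Z", "X", "Y"), is_confounder(dag, "Z", "X", "Y"))
# first True True
# second True False
```

Only the restricted rule rejects Z in the second DAG, where Z's single route to Y runs through X.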

A **mediator** is a variable that lies "between" the exposure and
the outcome; in other words, it is a descendant of the exposure and an
ancestor of the outcome. M is a mediator in the following example:

dag {
M [pos="0.500,1.050"]
X [exposure,pos="0.000,1.000"]
Y [outcome,pos="1.000,1.100"]
M -> Y
X -> M
}
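The mediator definition reduces to two reachability checks. In this Python sketch (our own illustration; the edge-list encoding and the names `descendants` and `is_mediator` are hypothetical), M counts as a mediator when M can be reached from X by following arrows, and Y can in turn be reached from M.

```python
def descendants(edges, start):
    """Return every node reachable from `start` by following arrows forward."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    found, stack = set(), [start]
    while stack:
        node = stack.pop()
        for c in children.get(node, ()):
            if c not in found:
                found.add(c)
                stack.append(c)
    return found


def is_mediator(edges, m, x, y):
    # M is a descendant of X, and Y is a descendant of M (i.e. M is an ancestor of Y).
    return m in descendants(edges, x) and y in descendants(edges, m)


mediator_dag = [("M", "Y"), ("X", "M")]
print(is_mediator(mediator_dag, "M", "X", "Y"))  # True
```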

A mediator cannot be a confounder. Can you explain why?

**Proxy confounders** are covariates that are not themselves confounders but lie "between" a confounder and the exposure or the outcome. In other words, a proxy confounder is a descendant of a confounder and an ancestor of either the exposure or the outcome (but not of both in the sense of the confounder definition; otherwise it would itself be a confounder).

In the example below, Z is a confounder and A and M are proxy confounders. Note that M is also a mediator; the roles of mediator and proxy confounder are not mutually exclusive.

dag {
A [pos="0.700,1.500"]
M [pos="0.500,0.000"]
X [exposure,pos="0.000,1.000"]
Y [outcome,pos="1.000,1.100"]
Z [pos="0.350,2.000"]
A -> Y
M -> Y
X -> M
Z -> A
Z -> M
Z -> X
}
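Because the proxy-confounder definition builds on the confounder definition, the check is easiest to express as a small classifier. The Python sketch below is our own illustration: the edge-list encoding and the names `walk` and `roles` are hypothetical, and a covariate that qualifies as a confounder is labelled only as a confounder. Applied to the DAG above, it reproduces the roles stated in the text, including M's double role.

```python
def walk(edges, start, forward=True, blocked=frozenset()):
    """Nodes reachable from `start` along arrows (forward) or against them (backward), avoiding `blocked`."""
    step = {}
    for parent, child in edges:
        a, b = (parent, child) if forward else (child, parent)
        step.setdefault(a, set()).add(b)
    found, stack = set(), [start]
    while stack:
        for nxt in step.get(stack.pop(), ()):
            if nxt not in found and nxt not in blocked:
                found.add(nxt)
                stack.append(nxt)
    return found


def roles(edges, x, y):
    nodes = {n for edge in edges for n in edge} - {x, y}
    anc_x = walk(edges, x, forward=False)
    anc_y = walk(edges, y, forward=False)
    anc_y_avoiding_x = walk(edges, y, forward=False, blocked={x})
    desc_x = walk(edges, x)
    # Confounder: ancestor of X, and ancestor of Y along a path without X.
    confounders = {v for v in nodes if v in anc_x and v in anc_y_avoiding_x}
    desc_conf = set().union(*(walk(edges, c) for c in confounders))
    result = {}
    for v in sorted(nodes):
        labels = []
        if v in confounders:
            labels.append("confounder")
        elif v in desc_conf and (v in anc_x or v in anc_y):
            labels.append("proxy confounder")
        if v in desc_x and v in anc_y:
            labels.append("mediator")
        result[v] = labels
    return result


proxy_dag = [("A", "Y"), ("M", "Y"), ("X", "M"), ("Z", "A"), ("Z", "M"), ("Z", "X")]
for node, labels in roles(proxy_dag, "X", "Y").items():
    print(node, labels)
# A ['proxy confounder']
# M ['proxy confounder', 'mediator']
# Z ['confounder']
```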

Lastly, a **competing exposure** is an ancestor of the outcome that is unrelated to the exposure; that is, it is neither a confounder, nor a proxy confounder, nor a mediator. Including competing exposures in a regression model does not introduce bias, but it typically improves the precision of the estimated exposure effect.
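A small simulation can illustrate the precision claim in the linear case. The sketch below is our own illustration, not from the original notes: it assumes the data-generating model `Y = 0.5*X + 2*Z + e` with X, Z and e independent standard normal variables (all values arbitrary), fits ordinary least squares with and without Z, and compares the spread of the estimated X coefficient across 500 simulated datasets. Only the Python standard library is used, and the sketch covers only this linear setting.

```python
import random
import statistics


def slope_without_z(x, y):
    """OLS coefficient of X when regressing Y on X alone."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_with_z(x, z, y):
    """OLS coefficient of X when regressing Y on X and Z (2x2 normal equations on centred data)."""
    mx, mz, my = statistics.fmean(x), statistics.fmean(z), statistics.fmean(y)
    xc, zc, yc = [a - mx for a in x], [c - mz for c in z], [b - my for b in y]
    sxx = sum(a * a for a in xc)
    szz = sum(c * c for c in zc)
    sxz = sum(a * c for a, c in zip(xc, zc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    szy = sum(c * b for c, b in zip(zc, yc))
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)


def simulate(reps=500, n=200, effect=0.5, z_effect=2.0, seed=1):
    rng = random.Random(seed)
    without_z, with_z = [], []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        z = [rng.gauss(0, 1) for _ in range(n)]  # independent of X: a competing exposure
        y = [effect * a + z_effect * c + rng.gauss(0, 1) for a, c in zip(x, z)]
        without_z.append(slope_without_z(x, y))
        with_z.append(slope_with_z(x, z, y))
    return without_z, with_z


without_z, with_z = simulate()
print("without Z: mean", round(statistics.fmean(without_z), 3), "SD", round(statistics.stdev(without_z), 3))
print("with Z:    mean", round(statistics.fmean(with_z), 3), "SD", round(statistics.stdev(with_z), 3))
```

Both means should land close to the true value 0.5, while the standard deviation without Z should come out roughly √5 ≈ 2.2 times larger, because the variation due to Z stays in the residual when Z is omitted.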

In the two examples below, Z is a competing exposure.

dag {
X [exposure,pos="0.000,1.000"]
Y [outcome,pos="1.000,1.100"]
Z [pos="0.500,2.000"]
X -> Y
Z -> Y
}

dag {
A [pos="0.350,2.000"]
X [exposure,pos="0.000,1.000"]
Y [outcome,pos="1.000,1.100"]
Z [pos="0.700,2.000"]
X -> A
X -> Y
Z -> A
Z -> Y
}
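The classification of both examples can be sketched in Python as well. This is our own illustration with hypothetical names (`walk`, `roles`); it reads "unrelated to the exposure" as: an ancestor of Y along a path that avoids X, with none of the other three roles. In the second DAG, A receives no role: it is a descendant of X but not an ancestor of Y.

```python
def walk(edges, start, forward=True, blocked=frozenset()):
    """Nodes reachable from `start` along arrows (forward) or against them (backward), avoiding `blocked`."""
    step = {}
    for parent, child in edges:
        a, b = (parent, child) if forward else (child, parent)
        step.setdefault(a, set()).add(b)
    found, stack = set(), [start]
    while stack:
        for nxt in step.get(stack.pop(), ()):
            if nxt not in found and nxt not in blocked:
                found.add(nxt)
                stack.append(nxt)
    return found


def roles(edges, x, y):
    nodes = {n for edge in edges for n in edge} - {x, y}
    anc_x = walk(edges, x, forward=False)
    anc_y = walk(edges, y, forward=False)
    anc_y_avoiding_x = walk(edges, y, forward=False, blocked={x})
    desc_x = walk(edges, x)
    confounders = {v for v in nodes if v in anc_x and v in anc_y_avoiding_x}
    desc_conf = set().union(*(walk(edges, c) for c in confounders))
    result = {}
    for v in sorted(nodes):
        labels = []
        if v in confounders:
            labels.append("confounder")
        elif v in desc_conf and (v in anc_x or v in anc_y):
            labels.append("proxy confounder")
        if v in desc_x and v in anc_y:
            labels.append("mediator")
        # Competing exposure: affects Y without going through X, and has no other role.
        if not labels and v in anc_y_avoiding_x:
            labels.append("competing exposure")
        result[v] = labels
    return result


competing_dag_1 = [("X", "Y"), ("Z", "Y")]
competing_dag_2 = [("X", "A"), ("X", "Y"), ("Z", "A"), ("Z", "Y")]
print(roles(competing_dag_1, "X", "Y"))  # {'Z': ['competing exposure']}
print(roles(competing_dag_2, "X", "Y"))  # {'A': [], 'Z': ['competing exposure']}
```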


Do you now feel eager to apply your graphical knowledge to an important statistical concept? Great! Read on about the Table 2 Fallacy.