
# 🛸 The Directed Prediction Index (DPI)
The Directed Prediction Index (DPI) is a quasi-causal inference method for cross-sectional data designed to quantify the relative endogeneity (relative dependence) of outcome (Y) versus predictor (X) variables in regression models.

Bruce H. W. S. Bao 包寒吴霜
```r
## Method 1: Install from CRAN
install.packages("DPI")

## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/DPI", force=TRUE)
```

Define \(\text{DPI}\) as the product of \(\text{Direction}\) (relative direction) and \(\text{Strength}\) (absolute strength) of the expected \(X \rightarrow Y\) relationship:
\[ \begin{aligned} \text{DPI}_{X \rightarrow Y} & = \text{Direction}_{X \rightarrow Y} \cdot \text{Strength}_{XY} \\ & = \text{Delta}(R^2) \cdot \text{Sigmoid}(\frac{p}{\alpha}) \\ & = \left( R_{Y \sim X + Covs}^2 - R_{X \sim Y + Covs}^2 \right) \cdot \left( 1 - \tanh \frac{p_{XY|Covs}}{2\alpha} \right) \\ & \in (-1, 1) \end{aligned} \]
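As a quick numeric check of the formula, the sketch below computes the DPI from already-estimated inputs. `dpi_value()` is a hypothetical helper written for illustration, not a function exported by the DPI package, and \(\alpha = 0.05\) is assumed as its default.

```r
# Hypothetical helper (not part of the DPI package): DPI from precomputed inputs.
# r2_y: R^2 of Y ~ X + Covs; r2_x: R^2 of X ~ Y + Covs;
# p: p value of the X-Y association controlling for Covs.
dpi_value <- function(r2_y, r2_x, p, alpha = 0.05) {
  direction <- r2_y - r2_x                # Delta(R^2), in (-1, 1)
  strength  <- 1 - tanh(p / (2 * alpha))  # Sigmoid(p / alpha), in (0, 1]
  direction * strength
}

dpi_value(r2_y = 0.50, r2_x = 0.30, p = 0.05)  # 0.2 * (1 - tanh(0.5))
# [1] 0.1075766
```

A positive value points toward \(X \rightarrow Y\); swapping the two \(R^2\) values flips the sign, while the strength term only shrinks the magnitude.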
In econometrics and broader social sciences, an exogenous variable is assumed to have a directed (causal or quasi-causal) influence on an endogenous variable (\(ExoVar \rightarrow EndoVar\)). By quantifying the relative endogeneity of outcome versus predictor variables in multiple linear regression models, the DPI can suggest a plausible (admissible) direction of influence (i.e., \(\text{DPI}_{X \rightarrow Y} > 0 \text{: } X \rightarrow Y\)) after controlling for a sufficient number of possible confounders and simulated random covariates.
\[ \begin{aligned} \text{Direction}_{X \rightarrow Y} & = \text{Endogeneity}(Y) - \text{Endogeneity}(X) \\ & = R_{Y \sim X + Covs}^2 - R_{X \sim Y + Covs}^2 \\ & = \text{Delta}(R^2) \\ & \in (-1, 1) \end{aligned} \]
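The Direction term needs only two regression fits. The sketch below uses base R `lm()` on simulated data; the data-generating model, the covariate `c1`, and the helper `direction()` are illustrative assumptions, not the package's code. It also shows why covariates matter: without them, both \(R^2\) values equal \(\text{cor}(X, Y)^2\), so \(\text{Delta}(R^2)\) is zero.

```r
# Sketch: Direction_{X -> Y} = R^2(Y ~ X + Covs) - R^2(X ~ Y + Covs) with base R lm().
# The simulated data, the covariate c1, and direction() are illustrative assumptions.
direction <- function(data, y, x, covs = character(0)) {
  f_y <- reformulate(c(x, covs), response = y)  # Y ~ X + Covs
  f_x <- reformulate(c(y, covs), response = x)  # X ~ Y + Covs
  summary(lm(f_y, data = data))$r.squared -
    summary(lm(f_x, data = data))$r.squared
}

set.seed(1)
n  <- 500
c1 <- rnorm(n)                        # an observed covariate (e.g., a confounder)
x  <- 0.5 * c1 + rnorm(n)
y  <- 0.6 * x + 0.8 * c1 + rnorm(n)   # data generated with X -> Y
dat <- data.frame(x, y, c1)

direction(dat, "y", "x", covs = "c1") # positive here: Y is the more endogenous variable
direction(dat, "y", "x")              # ~0: without covariates both R^2 equal cor(x, y)^2
```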
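Each \(R^2\) comes from a multiple linear regression that also includes the covariates \(Covs\), i.e., possible confounders plus simulated random covariates (see `k.cov` in the `DPI()` function). A higher \(R^2\) indicates higher dependence (i.e., higher endogeneity) in a given variable set.

\(\text{Strength}_{XY}\) maps the \(p\) value of the association between \(X\) and \(Y\) controlling for \(Covs\), \(p_{XY|Covs}\), onto \((0, 1)\) with a rescaled sigmoid, so that smaller \(p\) values give strengths closer to 1, with \(\alpha\) setting the scale:

\[ \begin{aligned} \text{Sigmoid}(\frac{p}{\alpha}) & = 2 \left[ 1 - \text{sigmoid}(\frac{p_{XY|Covs}}{\alpha}) \right] \\ & = 1 - \tanh \frac{p_{XY|Covs}}{2\alpha} \\ & \in (0, 1) \end{aligned} \]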
\[ \begin{aligned} \text{sigmoid}(x) & = \frac{1}{1 + e^{-x}} \\ & = \frac{\tanh(\frac{x}{2}) + 1}{2}, & \in (0, 1) \\ \tanh(x) & = \frac{e^x - e^{-x}}{e^x + e^{-x}} \\ & = 1 - \frac{2}{1 + e^{2x}} \\ & = \frac{2}{1 + e^{-2x}} - 1 \\ & = 2 \cdot \text{sigmoid}(2x) - 1, & \in (-1, 1) \\ \text{Sigmoid}(\frac{p}{\alpha}) & = 2 \left[ 1 - \text{sigmoid}(\frac{p}{\alpha}) \right] \\ & = 1 - \tanh \frac{p}{2\alpha}. & \in (0, 1) \end{aligned} \]
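The identities above can be checked numerically in base R. `plogis()` is R's logistic distribution function, which equals \(\text{sigmoid}(x)\); the grids of \(x\) and \(p\) values are arbitrary choices for illustration.

```r
# Numerical check of the sigmoid/tanh identities on arbitrary grids.
sigmoid <- function(x) 1 / (1 + exp(-x))
x <- seq(-5, 5, by = 0.25)

all.equal(sigmoid(x), plogis(x))               # TRUE: plogis() is the logistic CDF
all.equal(sigmoid(x), (tanh(x / 2) + 1) / 2)   # TRUE
all.equal(tanh(x), 1 - 2 / (1 + exp(2 * x)))   # TRUE
all.equal(tanh(x), 2 * sigmoid(2 * x) - 1)     # TRUE

# Sigmoid(p / alpha) = 2 * [1 - sigmoid(p / alpha)] = 1 - tanh(p / (2 * alpha))
p <- seq(0, 1, by = 0.01)
alpha <- 0.05
all.equal(2 * (1 - sigmoid(p / alpha)), 1 - tanh(p / (2 * alpha)))  # TRUE
```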
| \(p\) | \(\text{Sigmoid}(\frac{p}{\alpha})\) with \(\alpha = 0.05\) |
|---|---|
| (~0) | (~1) |
| 0.0001 | 0.999 |
| 0.001 | 0.990 |
| 0.01 | 0.900 |
| 0.02 | 0.803 |
| 0.03 | 0.709 |
| 0.04 | 0.620 |
| 0.05 (\(\frac{p}{\alpha}\) = 1) | 0.538 |
| 0.10 | 0.238 |
| 0.20 | 0.036 |
| 0.50 | 0.00009 |
| 0.80 | 0.0000002 |
| 1 | 0.000000004 |
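The table can be reproduced directly from \(1 - \tanh \frac{p}{2\alpha}\). The helper name `strength()` is hypothetical, and the smallest values print to three significant digits instead of being rounded to one as in the table.

```r
# Reproduce the table: Strength = Sigmoid(p / alpha) = 1 - tanh(p / (2 * alpha)).
strength <- function(p, alpha = 0.05) 1 - tanh(p / (2 * alpha))  # hypothetical helper

p <- c(0.0001, 0.001, 0.01, 0.02, 0.03, 0.04, 0.05, 0.10, 0.20, 0.50, 0.80, 1)
data.frame(p = p, strength = signif(strength(p), 3))
```

Because the strength term decays roughly exponentially once \(p\) exceeds \(\alpha\), a non-significant \(X\)-\(Y\) association pulls the DPI toward 0 regardless of \(\text{Delta}(R^2)\).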
To test the statistical significance of the DPI, the `DPI()` function simulates `n.sim` random samples, with `k.cov` (unobservable) random covariate(s) in each simulated sample.
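This section does not spell out how `DPI()` draws its `n.sim` samples, so the sketch below is a hypothetical reading, not the package's implementation. It treats each simulated sample as a bootstrap resample of the rows, appends `k_cov` standard-normal random covariates, recomputes the DPI, and summarizes the resulting distribution. The helpers `dpi_once()` and `dpi_sim()`, the bootstrap resampling, the percentile interval, and the `prop_positive` summary are all assumptions. For \(p_{XY|Covs}\) it uses the \(t\) test of \(X\)'s slope in \(Y \sim X + Covs\), which in OLS is the same test as \(Y\)'s slope in \(X \sim Y + Covs\).

```r
# Hypothetical sketch (not the DPI package's code): DPI over bootstrap resamples,
# each augmented with k_cov simulated random covariates.
dpi_once <- function(data, y, x, covs = character(0), alpha = 0.05) {
  fit_y <- lm(reformulate(c(x, covs), response = y), data = data)  # Y ~ X + Covs
  fit_x <- lm(reformulate(c(y, covs), response = x), data = data)  # X ~ Y + Covs
  delta_r2 <- summary(fit_y)$r.squared - summary(fit_x)$r.squared
  # p value of X's slope: the partial X-Y association given Covs (assumption)
  p <- summary(fit_y)$coefficients[x, "Pr(>|t|)"]
  delta_r2 * (1 - tanh(p / (2 * alpha)))
}

dpi_sim <- function(data, y, x, covs = character(0),
                    k_cov = 1, n_sim = 1000, alpha = 0.05) {
  n <- nrow(data)
  rand <- paste0("rand_cov_", seq_len(k_cov))  # hypothetical column names
  sims <- vapply(seq_len(n_sim), function(i) {
    d <- data[sample.int(n, n, replace = TRUE), , drop = FALSE]  # resample rows
    for (v in rand) d[[v]] <- rnorm(n)                           # random covariates
    dpi_once(d, y, x, c(covs, rand), alpha)
  }, numeric(1))
  c(mean = mean(sims), sd = sd(sims),
    quantile(sims, c(0.025, 0.975)),            # 95% percentile interval
    prop_positive = mean(sims > 0))
}

set.seed(2024)
n  <- 300
c1 <- rnorm(n)
x  <- 0.5 * c1 + rnorm(n)
y  <- 0.6 * x + 0.8 * c1 + rnorm(n)             # data generated with X -> Y
dat <- data.frame(x, y, c1)
dpi_sim(dat, "y", "x", covs = "c1", k_cov = 2, n_sim = 200)
```

Under this sketch, a percentile interval that excludes 0 (or a `prop_positive` near 1) would be one way to read support for \(X \rightarrow Y\); the DPI package's own significance test may be defined differently.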