C About statistics

C.1 \(R^2\) and beyond

C.1.1 Decomposition of \(R^2\)

The decomposition of \(R^2\) according to the method of Lindeman (1980) is detailed below. The presentation is taken directly from Grömping and others (2006).

A linear model is written \(y_i=\beta_0+\beta_1x_{i1}+...+\beta_px_{ip}+e_i\) and the corresponding \(R^2\) is:

\[R^2=\dfrac{\sum_{i=1}^{n} (\hat{y}_i-\bar{y})^2}{\sum_{i=1}^{n} (y_i-\bar{y})^2}\] Additionally, we define \(R^2(S)\) as the \(R^2\) of a model with the regressors in set \(S\). The additional \(R^2\) obtained when adding the regressors in set \(M\) to a model that already contains the regressors in set \(S\) is given by:

\[seqR^2(M|S)=R^2(M\cup S)-R^2(S)\]

The order of the regressors in any model is a permutation of the available regressors \(x_1, ..., x_p\) and is denoted by the tuple of indices \(r = (r_1, ..., r_p)\). Let \(S_k(r)\) denote the set of regressors entered into the model before regressor \(x_k\) in the order \(r\). Then the portion of \(R^2\) allocated to regressor \(x_k\) in the order \(r\) can be written as

\[seqR^2(\{x_k\}|S_k(r))=R^2(\{x_k\}\cup S_k(r))-R^2(S_k(r))\]

The share of \(R^2\) allocated to \(x_k\) after decomposition is then:

\[R^2_{decomp}(x_k)=\dfrac{1}{p!}\sum_{r\text{ permutations}}seqR^2(\{x_k\}|S_k(r))\]
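
To make this decomposition concrete, here is a minimal sketch that computes the shares by brute force over all \(p!\) orderings (only feasible for small \(p\)); the design matrix `X`, the response `y` and the simulated example data are hypothetical, and ordinary least squares is used to obtain each \(R^2(S)\).

```python
from itertools import permutations

import numpy as np


def r_squared(X, y, cols):
    """R^2 of the OLS fit of y on the columns in `cols` (plus an intercept)."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)


def lmg_decomposition(X, y):
    """Average sequential R^2 of each regressor over all orderings."""
    p = X.shape[1]
    shares = np.zeros(p)
    perms = list(permutations(range(p)))
    for r in perms:
        for pos, k in enumerate(r):
            before = list(r[:pos])  # regressors entered before x_k in order r
            shares[k] += r_squared(X, y, before + [k]) - r_squared(X, y, before)
    return shares / len(perms)


# Hypothetical example: the shares sum to the full-model R^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=200)
print(lmg_decomposition(X, y), r_squared(X, y, range(3)))
```

In practice, the relaimpo package presented by Grömping and others (2006) computes these shares much more efficiently than this brute-force enumeration.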

C.1.2 \(R^2\) for survival data

Among the different \(R^2\) analogues that have been proposed to measure the variation explained by survival models, the one described by Royston and Sauerbrei (2004), called \(R^2_D\), appears to be one of the most relevant with respect to the following criteria: independence from censoring, interpretability, and robustness to model misspecification (Choodari-Oskooei, Royston, and Parmar 2012).

The description is given below in the context of a Cox proportional hazards (PH) survival model with \(n\) individuals, where \(T_i\) and \(C_i\) correspond respectively to the potential death (or relapse) time and the censoring time, with \(i=1,2,...,n\). In this time-to-event setting, \(X_i=\min(T_i,C_i)\) is the time variable and \(\delta_i=I(T_i \leq C_i)\) the status variable, \(I\) being the indicator function. The Cox PH model then expresses the hazard function as follows:

\[h(t|X)=h_0(t)\exp(\beta'X),\]

with \(t\) the time to a death event, \(X\) the covariate vector and \(\beta\) the parameter vector. The adapted \(R^2\), called \(R^2_D\), is given by (Royston 2006):

\[R^2_D=\dfrac{D^2/\kappa^2}{D^2/\kappa^2 + \sigma^2_{\epsilon}},\]

with the following components:

  • \(D\) quantifies the separation of the survival curves. It is computed by ordering the estimated prognostic indices, \(\beta'X\), calculating the expected standard normal order statistics corresponding to these values, dividing the latter by a factor \(\kappa\), and regressing the survival outcome on the resulting scaled scores in an auxiliary model: the regression coefficient obtained is \(D\).
  • \(\kappa=\sqrt{8/\pi}\approx 1.60\) (Royston and Sauerbrei 2004)
  • \(\sigma^2_{\epsilon}\) is the variance of the error term, \(\sigma^2_{\epsilon}=\pi^2/6\) for Cox PH models

To better understand this formula, it is interesting to note that in a linear regression model \(Y\sim N(\beta'X, \sigma^2)\), \(R^2\) can equivalently be written as follows:

\[R^2=\dfrac{\text{Var}(\beta'X)}{\text{Var}(\beta'X)+\sigma^2}\]

This formula underlines the analogy with \(R^2_D\), with \(D^2/\kappa^2\) being interpreted as an estimate of the variance of the prognostic index \(\beta'X\) for the Cox PH model.
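
As an illustration, the following sketch computes \(D\) and \(R^2_D\) from an estimated prognostic index. It assumes a time/status representation of the outcome, Blom's approximation for the expected standard normal order statistics, and the lifelines package for the auxiliary Cox regression; the data and column names are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm
from lifelines import CoxPHFitter

KAPPA = np.sqrt(8 / np.pi)  # ~1.60 (Royston and Sauerbrei 2004)


def royston_d_r2(prognostic_index, time, event):
    """Compute D and R^2_D from a prognostic index beta'X (Cox PH model)."""
    n = len(prognostic_index)
    order = np.argsort(prognostic_index)
    # Expected standard normal order statistics (Blom's approximation),
    # assigned to individuals according to the rank of their prognostic index.
    rankits = np.empty(n)
    rankits[order] = norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))
    z = rankits / KAPPA  # scaled scores
    # Auxiliary Cox regression of the survival outcome on the scaled scores.
    data = pd.DataFrame({"time": time, "event": event, "z": z})
    aux = CoxPHFitter().fit(data, duration_col="time", event_col="event")
    d = float(aux.params_["z"])
    var_pi = d ** 2 / KAPPA ** 2  # estimated variance of the prognostic index
    r2_d = var_pi / (var_pi + np.pi ** 2 / 6)  # sigma_eps^2 = pi^2/6 for Cox PH
    return d, r2_d
```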

C.2 Causal inference with multiple versions of treatment

This section gathers the proofs of equations from chapter 9 that are not given in that chapter, together with additional details about other estimators based on IPW and TMLE.

C.2.1 Overall treatment effect with multiple versions of treatment (equation (9.3))

Here is the formal proof of equation (9.3), mostly derived from the proof of Proposition 3 in VanderWeele and Hernan (2013).

\[\begin{equation*}
\begin{aligned}
E[ & Y(a, K^a(a))] = E[Y(a)] && K^a \text{ actually received}\\
& = \sum_{c,w} E[Y(a)|c,w] \times P[c,w] && \\
& = \sum_{c,w} E[Y(a)|a,c,w] \times P[c,w] && Y(a) \perp \!\!\! \perp A | (C,W)\\
& = \sum_{c,w} E[Y(a, K^a(a))|a,c,w] \times P[c,w] && \\
& = \sum_{c,w,k^a} E[Y(a, k^a)|a,K^a(a)=k^a,c,w]\times P[K^a(a)=k^a|a,c,w] \times P[c,w] && \\
& = \sum_{c,w,k^a} E[Y(a, k^a)|a,K^a=k^a,c,w]\times P[K^a=k^a|a,c,w] \times P[c,w] && \text{consistency K}\\
& = \sum_{c,w,k^a} E[Y|a,K^a=k^a,c,w]\times P[K^a=k^a|a,c,w] \times P[c,w] && \text{consistency Y}\\
& = \sum_{c,w} E[Y|a,c,w]\times P[c,w]
\end{aligned}
\end{equation*}\]

Then, the overall treatment effect can be defined and computed by:

\[E[Y(a, K^a(a))] - E[Y(a^*, K^{a^*}(a^*))]\]
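
The last line of the proof above is a standardization formula, which suggests a simple plug-in estimator: fit an outcome model for \(E[Y|A,C,W]\), predict for every individual with the treatment set to \(a\) and then to \(a^*\), and average over the empirical distribution of \((C,W)\). The sketch below, which assumes a binary outcome, a pandas data frame with hypothetical column names, and a scikit-learn logistic regression as outcome model, is only meant to make this estimator concrete.

```python
from sklearn.linear_model import LogisticRegression


def standardized_overall_effect(df, covariates, a, a_star):
    """Plug-in estimate of E[Y(a, K^a(a))] - E[Y(a*, K^{a*}(a*))], based on
    the standardization formula sum_{c,w} E[Y|a,c,w] * P[c,w]."""
    X = df[["A"] + covariates]
    outcome_model = LogisticRegression(max_iter=1000).fit(X, df["Y"])

    def mean_prediction(treatment):
        X_cf = X.copy()
        X_cf["A"] = treatment  # set the treatment to a fixed value for everyone
        return outcome_model.predict_proba(X_cf)[:, 1].mean()

    return mean_prediction(a) - mean_prediction(a_star)
```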

C.2.2 Treatment effect with predefined distributions of versions of treatment (equation (9.5))

Here is the formal proof of equation (9.5), partially derived from the proof of Proposition 5 in VanderWeele and Hernan (2013).

\[\begin{equation*}
\begin{aligned}
E[ & Y(a, G^a)] = \sum_{c} E[Y(a, G^a)|C=c] \times P[c]\\
& = \sum_{c, k^a} E[Y(a, k^a)|G^a=k^a, C=c] \times P[G^a=k^a|C=c] \times P[c]\\
& = \sum_{c, k^a} E[Y(a, k^a)| C=c] \times g^{k^a,c} \times P[c] && \text{since } P[G^a=k^a|C=c] = g^{k^a,c}\\
& = \sum_{c, k^a} E[Y(a, k^a)| A=a, K^a=k^a, C=c] \times g^{k^a,c} \times P[c] && \text{with } Y(a,k^a) \perp \!\!\! \perp \{A,K\} | C\\
& = \sum_{c, k^a} E[Y| A=a, K^a=k^a, C=c] \times g^{k^a,c} \times P[c] && \text{by consistency for Y}
\end{aligned}
\end{equation*}\]
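
The same logic yields a plug-in estimator under a chosen distribution of versions: fit an outcome model for \(E[Y|A,K,C]\), predict for every individual and every version \(k^a\) with \(A=a\), weight the predictions by \(g^{k^a,c}\), and average over the empirical distribution of \(C\). The sketch below makes the same hypothetical assumptions as the previous one (binary outcome, scikit-learn logistic regression, numerically encoded \(K\)); `g` maps each version to the vector of per-individual probabilities \(g^{k^a,c_i}\).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def standardized_effect_with_versions(df, covariates, a, versions, g):
    """Plug-in estimate of E[Y(a, G^a)] = sum_{c,k} E[Y|a,k,c] * g^{k,c} * P[c]."""
    X = df[["A", "K"] + covariates]
    outcome_model = LogisticRegression(max_iter=1000).fit(X, df["Y"])
    weighted = np.zeros(len(df))
    for k in versions:
        X_cf = X.copy()
        X_cf["A"], X_cf["K"] = a, k  # set the class and the version for everyone
        weighted += np.asarray(g[k]) * outcome_model.predict_proba(X_cf)[:, 1]
    return weighted.mean()
```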

C.2.3 Inverse probability of treatment weighted (IPW) estimators for precision medicine

An extension of the IPW methods described in section 8.2.2.2 to multi-valued treatments (a single treatment \(K\) with several modalities and no \(A\)) has already been studied, and the corresponding formulas and estimators adapted accordingly (Imbens 2000; Feng et al. 2012), defining in particular a generalized propensity score:

\[f(k|c)=P[K=k|C=c]=E[I(k)|C=c]\]

\[\begin{equation*} \text{with } I(k) = \left\{ \begin{array}{ll} 1 & \quad \text{if } K = k \\ 0 & \quad \text{otherwise} \end{array} \right. \end{equation*}\]

and a subsequent estimator:

\[E[Y(k)]=\dfrac{\hat{E}[I(K=k)W^{K}Y]}{\hat{E}[I(K=k)W^K]} \text{ with } W^K=\dfrac{1}{f[K|C]}\]

In our precision medicine setting, to be consistent with the previously defined causal diagram (Figure 9.5), we have both \(A\), the binary variable for the class of drugs, and \(K\), the multinomial variable for the versions of treatment, i.e., the precise drug. Therefore, we need to define a slightly different propensity score based on joint probabilities:

\[\begin{equation*}
\begin{aligned}
f(a,k|c) & =P[A=a,K=k|C=c]\\
& =P[K=k|A=a, C=c]\times P[A=a|C=c]\\
& =E[I(a,k)|C=c]
\end{aligned}
\end{equation*}\]
\[\begin{equation*}
\text{with } I(a,k) = \left\{
\begin{array}{ll}
1 & \quad \text{if } A = a, K = k \\
0 & \quad \text{otherwise}
\end{array}
\right.
\end{equation*}\]

From this we can deduce the estimator:

\[E[Y(a, k)]=\dfrac{\hat{E}[I(A=a,K=k)W^{A,K}Y]}{\hat{E}[I(A=a,K=k)W^{A,K}]} \text{ with } W^{A,K}=\dfrac{1}{f[A,K|C]}\]

In all the examples presented in this study and implemented in the code, \(\mathcal{K}^{0} \cap \mathcal{K}^{1} = \emptyset\); it is therefore possible to simplify the joint probabilities, since knowledge of \(K\) automatically implies knowledge of \(A\), so that \(P[A=a, K=k|C=c]=P[K=k|C=c]\). The above formulas with the joint probabilities remain necessary in the general case and allow for the derivation of the causal effects \(\text{CE}_1\), \(\text{CE}_2\) and \(\text{CE}_3\) previously described.
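
A possible implementation of this weighted estimator is sketched below with hypothetical scikit-learn treatment models: a logistic regression for \(P[A=a|C=c]\) and a multinomial logistic regression for \(P[K=k|A=a,C=c]\), whose product gives the generalized propensity score \(f(a,k|c)\).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def ipw_estimate(df, covariates, a, k):
    """Ratio-form IPW estimate of E[Y(a, k)] with weights 1 / f(A, K | C),
    where f(a, k | c) = P[K = k | A = a, C = c] * P[A = a | C = c]."""
    C = df[covariates]
    model_a = LogisticRegression(max_iter=1000).fit(C, df["A"])
    model_k = LogisticRegression(max_iter=1000).fit(df[["A"] + covariates], df["K"])

    p_a = model_a.predict_proba(C)[:, list(model_a.classes_).index(a)]
    p_k = model_k.predict_proba(df[["A"] + covariates])[:, list(model_k.classes_).index(k)]
    f = p_a * p_k                      # generalized propensity score f(a, k | C)
    w = 1.0 / f                        # W^{A,K}
    ind = ((df["A"] == a) & (df["K"] == k)).to_numpy().astype(float)
    return float((ind * w * df["Y"]).mean() / (ind * w).mean())
```

Because of the indicator \(I(A=a,K=k)\), only individuals who actually received \((a,k)\) contribute to the sums, so the weights only need to be evaluated correctly for them.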

C.2.4 TMLE

Targeted maximum likelihood estimation (TMLE) is a doubly robust, maximum-likelihood-based framework that includes a "targeting" step optimizing the bias-variance trade-off for a defined target parameter. In particular, this method is fully compatible with the use of machine learning algorithms for the outcome and treatment models. A detailed description of the method and its implementations can be found in Van der Laan and Rose (2011).

The implementation proposed in this article is very similar to the one proposed in a recent tutorial on the application of TMLE to a binary treatment (Luque-Fernandez et al. 2018). The specific characteristics of the precision medicine problem studied here require modifications of this approach. In particular, the outcome and treatment models used in the first steps are modified in the same way as explained for the standardized estimators (outcome model) and for the IPW estimators (treatment model). The updating step of the estimates follows a model similar to that of Luque-Fernandez et al. (2018); a minimal sketch of this targeting step is given at the end of this section.

The algorithms used for the models internal to the TMLE are, as far as possible, the same as those used for the standardized and IPW estimators:

  • For simulated data: generalized linear models in all cases, except for the multinomial classification, which is performed through a dedicated function from an external package.
  • For PDX data: random forests for all models. Use of SuperLearner (Van der Laan, Polley, and Hubbard 2007) is made possible by simple modifications to the code but significantly slows down its execution.
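
To make the targeting step more concrete, here is a minimal sketch for a single counterfactual mean \(E[Y(a,k)]\) with a binary outcome, following the general logic of Luque-Fernandez et al. (2018); the initial outcome predictions and the generalized propensity scores are assumed to come from models such as those described above, and statsmodels is used for the fluctuation (a logistic regression on the "clever covariate" with the initial predictions as offset).

```python
import numpy as np
import statsmodels.api as sm


def tmle_update(y, q_obs, q_ak, f_ak, indicator):
    """One TMLE targeting step for E[Y(a, k)] with a binary outcome.

    y          observed outcomes (0/1), as a numpy array
    q_obs      initial predictions E[Y | A_i, K_i, C_i] at the observed treatment
    q_ak       initial predictions E[Y | a, k, C_i] at the target treatment (a, k)
    f_ak       estimated generalized propensity scores f(a, k | C_i)
    indicator  I(A_i = a, K_i = k), as 0/1 values
    """
    eps = 1e-10
    logit = lambda p: np.log(np.clip(p, eps, 1 - eps) / np.clip(1 - p, eps, 1 - eps))
    expit = lambda x: 1.0 / (1.0 + np.exp(-x))

    h = indicator / f_ak  # "clever covariate" for this target parameter
    # Fluctuation: logistic regression of Y on H with offset logit(q_obs).
    fluctuation = sm.GLM(y, h.reshape(-1, 1), family=sm.families.Binomial(),
                         offset=logit(q_obs)).fit()
    epsilon = float(fluctuation.params[0])
    # Targeted update of the counterfactual predictions, then plug-in estimate.
    q_star = expit(logit(q_ak) + epsilon / f_ak)
    return q_star.mean()
```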

References

Choodari-Oskooei, Babak, Patrick Royston, and Mahesh KB Parmar. 2012. “A Simulation Study of Predictive Ability Measures in a Survival Model I: Explained Variation Measures.” Statistics in Medicine 31 (23). Wiley Online Library: 2627–43.

Feng, Ping, Xiao-Hua Zhou, Qing-Ming Zou, Ming-Yu Fan, and Xiao-Song Li. 2012. “Generalized Propensity Score for Estimating the Average Treatment Effect of Multiple Treatments.” Statistics in Medicine 31 (7). Wiley Online Library: 681–97.

Grömping, Ulrike, and others. 2006. “Relative Importance for Linear Regression in R: The Package Relaimpo.” Journal of Statistical Software 17 (1): 1–27.

Imbens, Guido W. 2000. “The Role of the Propensity Score in Estimating Dose-Response Functions.” Biometrika 87 (3). Oxford University Press: 706–10.

Lindeman, Richard Harold. 1980. Introduction to Bivariate and Multivariate Analysis. Glenview, Ill: Scott, Foresman.

Luque-Fernandez, Miguel Angel, Michael Schomaker, Bernard Rachet, and Mireille E Schnitzer. 2018. “Targeted Maximum Likelihood Estimation for a Binary Treatment: A Tutorial.” Statistics in Medicine 37 (16). Wiley Online Library: 2530–46.

Royston, Patrick. 2006. “Explained Variation for Survival Models.” The Stata Journal 6 (1). SAGE Publications Sage CA: Los Angeles, CA: 83–96.

Royston, Patrick, and Willi Sauerbrei. 2004. “A New Measure of Prognostic Separation in Survival Data.” Statistics in Medicine 23 (5). Wiley Online Library: 723–48.

Van der Laan, Mark J, and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media.

Van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1). De Gruyter.

VanderWeele, Tyler J, and Miguel A Hernan. 2013. “Causal Inference Under Multiple Versions of Treatment.” Journal of Causal Inference 1 (1). De Gruyter: 1–20.