C About statistics

C.1 R2 and beyond

C.1.1 Decomposition of R2

The decomposition of R2 according to the method of Lindeman (1980) is detailed below. The presentation is taken directly from Grömping and others (2006).

A linear model is written yi=β0+β1xi1+...+βpxip+ei and the corresponding R2 is:

R2=ni=1(^yi¯yi)2ni=1(yi¯yi)2 Additionally, we define R2(S) for a model with regressors in set S. The additional R2 when adding the regressors in set M to a model with the regressors in set S is given as:

seqR2(M|S)=R2(MS)R2(S)

The order of the regressors in any model is a permutation of the available regressors x1,...,xp and is denoted by the tuple of indices r=(r1,...,rp). Let Sk(r) denote the set of regressors entered into the model before regressor xk in the order r. Then the portion of R2 allocated to regressor xk in the order r can be written as

seqR2({xk}|Sk(r))=R2({xk}Sk(r))R2(Sk(r))

All in all, the R2 allocated to xk after decomposition is:

R2decomp(xk)=1p!r permutationsseqR2({xk}|r)

C.1.2 R2 for survival data

Among the different R2 analogues that have been proposed to measure the variation explained by survival models, the one described by Royston and Sauerbrei (2004), called R2D appears to be one of the most relevant with respect to the following criteria: independance from sensoring, interpretability and robustness to model misspecification (Choodari-Oskooei, Royston, and Parmar 2012).

The description is given below in the context of a Cox proportional hazards (PH) survival model with n individuals with Ti and Ci corresponding respectively to potential death (or relapse) and censoring times, with i=1,2,...,n. In this time-to-event setting, Xi=min(Ti,Ci) is the time variables and δi=I(TiCi) the status variable, I being the indicator function. The Cox PH model then expresses the hazard function as follows:

h(t|X)=h0(t).exp(βX),

with t the time to a death event, X the covariate vector and beta the parameter vector. The adapted R2 called R2D is given by (Royston 2006):

R2=D2/κ2D2/κ2+σ2ϵ,

with the following component:

  • D quantifies the separation of survival curves. It is computed by ordering the estimated prognostic index, βX, calculating the expected standard normal order statistics corresponding to these values, dividing the latter by a factor κ, and performing an auxiliary regression on the scaled scores: the resulting regression coefficient is D.
  • κ=8/π1.60 (Royston and Sauerbrei 2004)
  • σ2ϵ is the variance of the error term, σ2ϵ=π2/6 for Cox PH models

For a better understanding of this formula, it is interesting to note that in a linear regression model YN(βX,σ2), it is also possible to write R2 equivalently as follows:

R2=Var(βX)Var(βX)+σ2

This formula underlines the analogy with R2D, with D2/κ2 being interpreted as an estimate of the variance of the prognostic index βX for the Cox PH model.

C.2 Causal inference with multiple versions of treatment

This section gathers the demonstrations of the equations present in chapter 9 when they are not present in this chapter and additional details about other estimators based on IPW and TMLE.

C.2.1 Overall treatment effect with multiple versions of treatment (equation (9.3))

Here is the formal proof for equation (9.3), mostly derived from the proof of Proposition 3 in (VanderWeele and Hernan 2013).

E[Y(a,Ka(a))]=E[Y(a)]Ka actually received=c,wE[Y(a)|c,w]×P[c,w]=c,wE[Y(a)|a,c,w]×P[c,w]Y(a)A|(C,W)=c,wE[Y(a,Ka(a))|a,c,w]×P[c,w]=c,w,kaE[Y(a,ka)|a,Ka(a)=ka,c,w]×P[Ka(a)=ka|a,c,w]×P[c,w]=c,w,kaE[Y(a,ka)|a,Ka=ka,c,w]×P[Ka=ka|a,c,w]×P[c,w]consistency K=c,w,kaE[Y|a,Ka=ka,c,w]×P[Ka=ka|a,c,w]×P[c,w]consistency Y=c,wE[Y|a,c,w]×P[c,w]

Then, the overall treatment effect can be defined and computed by:

E[Y(a,Ka(a))]E[Y(a,Ka(a))]

C.2.2 Treatment effect with predefined distributions of versions of treatment (equation (9.5))

Here is the formal proof for equation (9.5), partially derived from the proof of Proposition 5 in (VanderWeele and Hernan 2013).

E[Y(a,Ga)]=cE[Y(a,Ga)|C=c]×P[c]=c,kaE[Y(a,ka)|Ga=ka,C=c]×P[Ga=ka|C=c]×P[c]=c,kaE[Y(a,ka)|C=c]×gka,c×P[c]since P[Ga=ka]=gka,c=c,kaE[Y(a,ka)|A=a,Ka=ka,C=c]×gka,c×P[c]with Y(a,ka){A,K}|C=c,kaE[Y|A=a,Ka=ka,C=c]×P[c]by consistency for Y

C.2.3 Inverse probability of treatment weighted (IPW) estimators for precision medicine

An extension of IPW methods described in section 8.2.2.2 to multi-valued treatments (only treatment K with different modalities and no A) has already been studied and the different formulas and estimators adapted accordingly (Imbens 2000; Feng et al. 2012), defining in particular a generalized propensity score:

f(k|c)=P[K=k|C=c]=E[I(k)|C=c]

with I(k)={1if K=k0otherwise

and a subsequent estimator:

E[Y(k)]=ˆE[I(K=k)WKY]ˆE[I(K=k)WK] with WK=1f[K|C]

In our precision medicine settings, to be consistent with the previously defined causal diagram (Figure 9.5), we have both A, binary status depending on the class of drugs, and K, the multinomial variable for versions of treatments, i.e., the precise drug. Therefore we need to define a slightly different propensity score with joint probabilities:

f(a,k|c)=P[A=a,K=k|C=c]=P[K=k|A=a,C=c].P[A=a|C=c]=E[I(a,k)|C=c] with I(a,k)={1if A=a,K=k0otherwise

From this we can deduce the estimator:

E[Y(a,k)]=ˆE[I(A=a,K=k)WA,KY]ˆE[I(A=a,K=k)WA,K] with WA,K=1f[A,K|C]

In all the examples presented in this study and implemented in the code, K0K1=, it is therefore possible to simplify the joint probabilities since the knowledge of K automatically results in the knowledge of A allowing P[A=a,K=k|C=c]=P[K=k|C=c]. The above formulas with the attached probabilities are still necessary in the general case and allow for the derivation of causal effects CE1, CE2 and CE3 previously described.

C.2.4 TMLE

Targeted maximum likelihood estimation is framework based on a doubly robust maximum-likelihood–based approach that includes a "targeting" step that optimizes the bias-variance trade-off for a defined target parameter. In particular, this method is perfectly compatible with the use of machine learning algorithms for outcome or treatment models. A detailed description of the method and its implementations can be found in Van der Laan and Rose (2011).

The implementation proposed in this article is very similar to the one proposed in a recent tutorial concerning the application to binary processing (Luque-Fernandez et al. 2018). The specific characteristics of the problem of precision medicine studied here lead to modify this approach. In particular, the outcome and treatment models used in the first steps are modified in the same way as the one explained for the standardized estimators (outcome model) and for the IPW estimators (treatment model). The step of updating the estimates is done on a model similar to Luque-Fernandez et al. (2018).

The algorithm used for the models internal to the TMLE are, as much as possible, the same as those used for the standardised and IPW estimators:

  • For simulated data: generalized linear models in all cases except multinomial classification performed through the function in package.
  • For PDX data: random forests for all models. Use of SuperLearner (Van der Laan, Polley, and Hubbard 2007) is made possible by simple modifications to the code but significantly slows down its execution.

References

Choodari-Oskooei, Babak, Patrick Royston, and Mahesh KB Parmar. 2012. “A Simulation Study of Predictive Ability Measures in a Survival Model I: Explained Variation Measures.” Statistics in Medicine 31 (23). Wiley Online Library: 2627–43.

Feng, Ping, Xiao-Hua Zhou, Qing-Ming Zou, Ming-Yu Fan, and Xiao-Song Li. 2012. “Generalized Propensity Score for Estimating the Average Treatment Effect of Multiple Treatments.” Statistics in Medicine 31 (7). Wiley Online Library: 681–97.

Grömping, Ulrike, and others. 2006. “Relative Importance for Linear Regression in R: The Package Relaimpo.” Journal of Statistical Software 17 (1): 1–27.

Imbens, Guido W. 2000. “The Role of the Propensity Score in Estimating Dose-Response Functions.” Biometrika 87 (3). Oxford University Press: 706–10.

Lindeman, Richard Harold. 1980. Introduction to Bivariate and Multivariate Analysis. Glenview, Ill: Scott, Foresman.

Luque-Fernandez, Miguel Angel, Michael Schomaker, Bernard Rachet, and Mireille E Schnitzer. 2018. “Targeted Maximum Likelihood Estimation for a Binary Treatment: A Tutorial.” Statistics in Medicine 37 (16). Wiley Online Library: 2530–46.

Royston, Patrick. 2006. “Explained Variation for Survival Models.” The Stata Journal 6 (1). SAGE Publications Sage CA: Los Angeles, CA: 83–96.

Royston, Patrick, and Willi Sauerbrei. 2004. “A New Measure of Prognostic Separation in Survival Data.” Statistics in Medicine 23 (5). Wiley Online Library: 723–48.

Van der Laan, Mark J, and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media.

Van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1). De Gruyter.

VanderWeele, Tyler J, and Miguel A Hernan. 2013. “Causal Inference Under Multiple Versions of Treatment.” Journal of Causal Inference 1 (1). De Gruyter: 1–20.