class: center, middle, inverse, title-slide

# Multivariate normal cont’d; missing data and imputation
### Dr. Olanrewaju Michael Akande
### Feb 26, 2020

---

<!-- ## Announcements -->
<!-- - Homework 5 will be online by 5pm today. -->
<!-- -- -->

## Outline

- Multivariate normal/Gaussian model
  + Recap
  + Reading example cont'd
  + Answering questions
  + Jeffreys' prior

- Missing data and imputation
  + Missing data mechanisms
  + Multivariate normal/Gaussian model
  + Example

---

class: center, middle

# Reading example cont'd

---

## Reading example

- `\(Y_{i1}\)`: pre-instructional score for student `\(i\)` and `\(Y_{i2}\)`: post-instructional score for student `\(i\)`.

--

- Model:
  + `\(\boldsymbol{Y}_i = (Y_{i1},Y_{i2})^T \sim \mathcal{N}_2(\boldsymbol{\theta}, \Sigma)\)`,
  + `\(\pi(\boldsymbol{\theta}) = \mathcal{N}_2(\boldsymbol{\mu}_0, \Lambda_0)\)`, and
  + `\(\pi(\Sigma) = \mathcal{IW}_2(\nu_0, \boldsymbol{S}_0)\)`.

--

- Then,
.block[
.small[
$$
`\begin{split}
\pi(\boldsymbol{\theta} | \Sigma, \boldsymbol{Y}) = \mathcal{N}_2(\boldsymbol{\mu}_n, \Lambda_n)
\end{split}`
$$
]
]

  where
.block[
.small[
`\begin{eqnarray*}
\Lambda_n & = & \left[\Lambda_0^{-1} + n\Sigma^{-1}\right]^{-1}\\
\boldsymbol{\mu}_n & = & \Lambda_n \left[\Lambda_0^{-1}\boldsymbol{\mu}_0 + n\Sigma^{-1} \bar{\boldsymbol{y}} \right]\\
\boldsymbol{\mu}_0 & = & \left(\begin{array}{c}
50 \\
50
\end{array}\right); \ \ \ \Lambda_0 = \left(\begin{array}{cc}
156 & 78 \\
78 & 156
\end{array}\right).\\
\end{eqnarray*}`
]
]

---

## Reading example: posterior computation

- and
.block[
.small[
$$
`\begin{split}
\pi(\Sigma | \boldsymbol{\theta}, \boldsymbol{Y}) = \mathcal{IW}_2(\nu_n, \boldsymbol{S}_n)
\end{split}`
$$
]
]

  or using the notation in the book, `\(\mathcal{IW}_2(\nu_n, \boldsymbol{S}_n^{-1} )\)`, where
.block[
.small[
$$
`\begin{split}
\nu_n & = \nu_0 + n\\
\\
\boldsymbol{S}_n & = \left[\boldsymbol{S}_0 + \boldsymbol{S}_\theta \right]\\
& = \left[\boldsymbol{S}_0 + \sum^n_{i=1}(\boldsymbol{y}_i - \boldsymbol{\theta})(\boldsymbol{y}_i - \boldsymbol{\theta})^T \right].
\end{split}`
$$
]
]

  The prior hyperparameters are
.block[
.small[
`$$\nu_0 = p + 2 = 4$$`
]
]

.block[
.small[
`\begin{eqnarray*}
\boldsymbol{S}_0 = \left(\begin{array}{cc}
625 & 312.5 \\
312.5 & 625
\end{array}\right).\\
\end{eqnarray*}`
]
]

---

## Posterior distribution of the mean

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

## Answering questions of interest

- Questions of interest:
  + Do students improve in reading comprehension on average?

--

- Need to compute `\(\Pr[\theta_2 > \theta_1 | \boldsymbol{Y}]\)`. In R,

```r
mean(THETA[,2]>THETA[,1])
```

```
## [1] 0.992
```

--

- That is, the posterior probability is `\(> 0.99\)`, indicating strong evidence that test scores are higher in the second administration.

---

## Answering questions of interest

- Questions of interest:
  + If so, by how much?

--

- Need posterior summaries of `\(\theta_2 - \theta_1 | \boldsymbol{Y}\)`. In R,

```r
mean(THETA[,2] - THETA[,1])
```

```
## [1] 6.385515
```

```r
quantile(THETA[,2] - THETA[,1], prob=c(0.025, 0.5, 0.975))
```

```
##      2.5%       50%     97.5% 
##  1.233154  6.385597 11.551304
```

--

- Mean (and median) improvement is `\(\approx 6.39\)` points, with a 95% credible interval of (1.23, 11.55).

---

## Answering questions of interest

- Questions of interest:
  + How correlated (positively) are the post-test and pre-test scores?

--

- We can compute `\(\Pr[\sigma_{12} > 0 | \boldsymbol{Y}]\)`.
In R,

```r
mean(SIGMA[,2]>0)
```

```
## [1] 1
```

--

- The posterior probability that the covariance between them is positive is basically 1.

---

## Answering questions of interest

- Questions of interest:
  + How correlated (positively) are the post-test and pre-test scores?

--

- We can also look at the distribution of `\(\rho\)` instead. In R,

```r
CORR <- SIGMA[,2]/(sqrt(SIGMA[,1])*sqrt(SIGMA[,4]))
quantile(CORR,prob=c(0.025, 0.5, 0.975))
```

```
##      2.5%       50%     97.5% 
## 0.4046817 0.6850218 0.8458880
```

--

- Median correlation between the two scores is 0.69, with a 95% quantile-based credible interval of (0.40, 0.85).

--

- Because the density is skewed, we may prefer the 95% HPD interval, which is (0.45, 0.88).

```r
#library(hdrcde)
hdr(CORR,prob=95)$hdr
```

```
##          [,1]      [,2]
## 95% 0.4468522 0.8761174
```

---

## Jeffreys' prior

- Clearly, there's a lot of work to be done in specifying the hyperparameters (two of which are `\(p \times p\)` matrices).

--

- What if we want to specify the priors so that we put in as little information as possible?

--

- We already know how to do that somewhat with Jeffreys' priors.

--

- For the multivariate normal model, it turns out that Jeffreys' rule for generating a prior distribution on `\((\boldsymbol{\theta}, \Sigma)\)` gives
.block[
.small[
`$$\pi(\boldsymbol{\theta}, \Sigma) \propto \left|\Sigma\right|^{-\frac{(p+2)}{2}}.$$`
]
]

--

- Can we derive the full conditionals under this prior?

--

- **To be done on the board.**

---

## Jeffreys' prior

- We can leverage previous work. For the likelihood we have both
.block[
.small[
$$
`\begin{split}
L(\boldsymbol{Y}; \boldsymbol{\theta}, \Sigma) & \propto \textrm{exp} \left\{-\dfrac{1}{2} \boldsymbol{\theta}^T(n\Sigma^{-1})\boldsymbol{\theta} + \boldsymbol{\theta}^T (n\Sigma^{-1} \bar{\boldsymbol{y}}) \right\}
\end{split}`
$$
]
]

  and
.block[
.small[
$$
`\begin{split}
L(\boldsymbol{Y}; \boldsymbol{\theta}, \Sigma) & \propto \left|\Sigma\right|^{-\frac{n}{2}} \ \textrm{exp} \left\{-\dfrac{1}{2}\text{tr}\left[\boldsymbol{S}_\theta \Sigma^{-1} \right] \right\},\\
\end{split}`
$$
]
]

  where `\(\boldsymbol{S}_\theta = \sum^n_{i=1}(\boldsymbol{y}_i - \boldsymbol{\theta})(\boldsymbol{y}_i - \boldsymbol{\theta})^T\)`.

--

- Also, we can rewrite the density of any `\(\mathcal{N}_p(\boldsymbol{\mu}_0, \Lambda_0)\)` prior as
.block[
.small[
$$
`\begin{split}
p(\boldsymbol{\theta}) & \propto \textrm{exp} \left\{-\dfrac{1}{2} \boldsymbol{\theta}^T\Lambda_0^{-1}\boldsymbol{\theta} + \boldsymbol{\theta}^T\Lambda_0^{-1}\boldsymbol{\mu}_0 \right\}.\\
\end{split}`
$$
]
]

--

- Finally, `\(\Sigma \sim \mathcal{IW}_p(\nu_0, \boldsymbol{S}_0)\)`,
.block[
.small[
$$
`\begin{split}
\Rightarrow \ \ p(\Sigma) \ \propto \ \left|\Sigma\right|^{\frac{-(\nu_0 + p + 1)}{2}} \textrm{exp} \left\{-\dfrac{1}{2} \text{tr}(\boldsymbol{S}_0\Sigma^{-1}) \right\}.
\end{split}`
$$
]
]

---

class: center, middle

# Missing data and imputation

---

## Missing data

- Missing data/nonresponse is fairly common in real data. For example,
  + Failure to respond to a survey question
  + Subject misses some clinic visits out of all possible
  + Only a subset of subjects asked certain questions

--

- Recall that our posterior computation usually depends on the data through `\(\mathcal{L}(Y; \theta)\)`, which cannot be computed when some of the `\(y_i\)` values are missing.

--

- The most common software packages often throw away all subjects with incomplete data (can lead to bias and precision loss).

--

- Some individuals impute missing values with a mean or some other fixed value (which ignores uncertainty), as the sketch below illustrates.
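--

- For instance, here is a minimal sketch of those two naive fixes on a toy matrix (`Y_toy` is made up purely for illustration):

```r
#a toy data matrix with a few NAs (hypothetical values)
Y_toy <- matrix(c(59, 77,
                  43, NA,
                  NA, 46), ncol=2, byrow=TRUE)

#complete-case analysis: throws away every row containing an NA
colMeans(na.omit(Y_toy))

#mean imputation: fills each NA with the column mean, with no uncertainty
Y_imp <- apply(Y_toy, 2, function(y){y[is.na(y)] <- mean(y, na.rm=TRUE); y})
colMeans(Y_imp)
```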
--

- As you will see, imputing missing data is actually quite natural in the Bayesian context.

<!-- --- -->

<!-- ## Main types of nonresponse -->

<!-- - .hlight[Unit nonresponse]: the individual has no values recorded for any of the variables. For example, when participants do not complete a survey questionnaire at all. -->

<!-- - .hlight[Item nonresponse]: the individual has values recorded for at least one variable, but not all variables. -->

<!-- <table> -->
<!-- <caption>Unit nonresponse vs item nonresponse</caption> -->
<!-- <tr> -->
<!-- <th> </th> -->
<!-- <th height="30px" colspan="3">Variables</th> -->
<!-- </tr> -->
<!-- <tr> -->
<!-- <th> </th> -->
<!-- <td height="30px" style="text-align:center" width="100px"> Y<sub>1</sub> </td> -->
<!-- <td style="text-align:center" width="100px"> Y<sub>2</sub> </td> -->
<!-- <td style="text-align:center" width="100px"> Y<sub>3</sub> </td> -->
<!-- </tr> -->
<!-- <tr> -->
<!-- <td height="30px" style="text-align:left"> Complete cases </td> -->
<!-- <td style="text-align:center"> ✔ </td> -->
<!-- <td style="text-align:center"> ✔ </td> -->
<!-- <td style="text-align:center"> ✔ </td> -->
<!-- </tr> -->
<!-- <tr> -->
<!-- <td rowspan="3"> Item nonresponse </td> -->
<!-- <td rowspan="3" style="text-align:center"> ✔ </td> -->
<!-- <td height="30px" style="text-align:center"> ✔ </td> -->
<!-- <td style="text-align:center"> ❓ </td> -->
<!-- </tr> -->
<!-- <tr> -->
<!-- <td height="30px" style="text-align:center"> ❓ </td> -->
<!-- <td style="text-align:center"> ❓ </td> -->
<!-- </tr> -->
<!-- <tr> -->
<!-- <td height="30px" style="text-align:center"> ❓ </td> -->
<!-- <td style="text-align:center"> ✔ </td> -->
<!-- </tr> -->
<!-- <tr> -->
<!-- <td height="30px" style="text-align:left"> Unit nonresponse </td> -->
<!-- <td style="text-align:center"> ❓ </td> -->
<!-- <td style="text-align:center"> ❓ </td> -->
<!-- <td style="text-align:center"> ❓ </td> -->
<!-- </tr> -->
<!-- </table> -->

---

## Missing data mechanisms

- Data are said to be .hlight[missing completely at random (MCAR)] if the reason for missingness does not depend on the values of the observed data or missing data.

--

- For example, suppose
  - you handed out a double-sided survey questionnaire of 20 questions to a sample of participants;
  - questions 1-15 were on the first page but questions 16-20 were at the back; and
  - some of the participants did not respond to questions 16-20.

--

- Then, the values for questions 16-20 for those people who did not respond would be .hlight[MCAR] if they simply did not realize the pages were double-sided; they had no reason to ignore those questions.

--

- **This is rarely plausible in practice!**

---

## Missing data mechanisms

- Data are said to be .hlight[missing at random (MAR)] if, conditional on the values of the observed data, the reason for missingness does not depend on the missing data.

--

- Using our previous example, suppose
  - questions 1-15 include demographic information such as age and education;
  - questions 16-20 include income-related questions; and
  - once again, some participants did not respond to questions 16-20.

--

- Then, the values for questions 16-20 for those people who did not respond would be .hlight[MAR] if younger people are more likely not to respond to those income-related questions than older people, where age is observed for all participants.
--

- **This is the most commonly assumed mechanism in practice!**

---

## Missing data mechanisms

- Data are said to be .hlight[missing not at random (MNAR or NMAR)] if the reason for missingness depends on the actual values of the missing (unobserved) data.

--

- Continuing with our previous example, suppose again that
  - questions 1-15 include demographic information such as age and education;
  - questions 16-20 include income-related questions; and
  - once again, some of the participants did not respond to questions 16-20.

--

- Then, the values for questions 16-20 for those people who did not respond would be .hlight[MNAR] if people who earn more money are less likely to respond to those income-related questions than people who earn less.

--

- **This is usually the case in real data, but analysis can be complex!**

---

## Mathematical formulation

- Consider the multivariate data scenario with `\(\boldsymbol{Y} = (\boldsymbol{Y}_1,\ldots,\boldsymbol{Y}_n)^T\)`, where `\(\boldsymbol{Y}_i = (Y_{i1},\ldots,Y_{ip})^T\)`, for `\(i = 1,\ldots, n\)`.

--

- For now, we will assume the multivariate normal model as the sampling model, so that each `\(\boldsymbol{Y}_i = (Y_{i1},\ldots,Y_{ip})^T \sim \mathcal{N}_p(\boldsymbol{\theta}, \Sigma)\)`.

--

- Suppose now that `\(\boldsymbol{Y}\)` contains missing values.

--

- We can separate `\(\boldsymbol{Y}\)` into the observed and missing parts, that is, `\(\boldsymbol{Y} = (\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis})\)`.

--

- Then for each individual, `\(\boldsymbol{Y}_i = (\boldsymbol{Y}_{i,obs},\boldsymbol{Y}_{i,mis})\)`.

---

## Mathematical formulation

- Let
  + `\(j\)` index variables (where `\(i\)` already indexes individuals),
  + `\(r_{ij} = 1\)` when `\(y_{ij}\)` is missing,
  + `\(r_{ij} = 0\)` when `\(y_{ij}\)` is observed.

--

- Here, `\(r_{ij}\)` is known as the missingness indicator of variable `\(j\)` for person `\(i\)`.

--

- Also, let
  + `\(\boldsymbol{R}_i = (r_{i1},\ldots,r_{ip})^T\)` be the vector of missingness indicators for person `\(i\)`.
  + `\(\boldsymbol{R} = (\boldsymbol{R}_1,\ldots,\boldsymbol{R}_n)\)` be the matrix of missingness indicators for everyone.
  + `\(\boldsymbol{\psi}\)` be the set of parameters associated with `\(\boldsymbol{R}\)`.

--

- Assume `\(\boldsymbol{\psi}\)` and `\((\boldsymbol{\theta}, \Sigma)\)` are distinct.

---

## Mathematical formulation

- MCAR:
.block[
`$$p(\boldsymbol{R} | \boldsymbol{Y},\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) = p(\boldsymbol{R} | \boldsymbol{\psi})$$`
]

--

- MAR:
.block[
`$$p(\boldsymbol{R} | \boldsymbol{Y},\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) = p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{\psi})$$`
]

--

- MNAR:
.block[
`$$p(\boldsymbol{R} | \boldsymbol{Y},\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) = p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis},\boldsymbol{\psi})$$`
]
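---

## Missing data mechanisms: a quick simulation

- To make the definitions concrete, here is a small illustrative sketch (the coefficients are made up, and only `\(Y_{i2}\)` can be missing) of generating missingness indicators under MCAR and MAR:

```r
set.seed(42)
n <- 1000
y1 <- rnorm(n, 50, 10)                  #always observed
y2 <- rnorm(n, 50 + 0.5*(y1 - 50), 8)   #subject to missingness

#MCAR: P(r_i2 = 1) is a constant; it ignores both y1 and y2
r_mcar <- rbinom(n, 1, 0.2)

#MAR: P(r_i2 = 1) depends only on the observed y1
r_mar <- rbinom(n, 1, plogis(-3 + 0.04*y1))

#under MAR, whether y2 is missing is clearly related to the observed y1
tapply(y1, r_mar, mean)
```

- An MNAR version would replace `y1` with the unobserved `y2` in the response probability, which is exactly what we cannot check from the data alone.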
---

## Implications for likelihood function

- Each type of mechanism has a different implication for the likelihood of the observed data `\(\boldsymbol{Y}_{obs}\)` and the missing data indicator `\(\boldsymbol{R}\)`.

--

- Without missingness in `\(\boldsymbol{Y}\)`, the likelihood of the observed data is
.block[
`$$L(\boldsymbol{Y}_{obs}; \boldsymbol{\theta}, \Sigma) \propto p(\boldsymbol{Y}_{obs} | \boldsymbol{\theta}, \Sigma)$$`
]

--

- With missingness in `\(\boldsymbol{Y}\)`, the likelihood of the observed data is instead
.block[
$$
`\begin{split}
L(\boldsymbol{Y}_{obs}, \boldsymbol{R}; \boldsymbol{\theta}, \Sigma,\boldsymbol{\psi}) & \propto p(\boldsymbol{Y}_{obs}, \boldsymbol{R} |\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) \\
& = \int p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis},\boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\
\end{split}`
$$
]

<!-- = \prod^n_{i=1} p(\boldsymbol{Y}_{i,obs} | \boldsymbol{\theta}, \Sigma) -->
<!-- & = \int p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis},\boldsymbol{\psi}) \prod^n_{i=1} p(\boldsymbol{Y}_{i,obs}, \boldsymbol{Y}_{i,mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\ -->

--

- Since we do not actually observe `\(\boldsymbol{Y}_{mis}\)`, we would like to be able to integrate it out so we don't have to deal with it.

--

- That is, we would like to infer `\((\boldsymbol{\theta}, \Sigma)\)` (and sometimes, `\(\boldsymbol{\psi}\)`) using only the observed data.

---

## Likelihood function: MCAR

- For MCAR, we have:
.block[
$$
`\begin{split}
L(\boldsymbol{Y}_{obs}, \boldsymbol{R}; \boldsymbol{\theta}, \Sigma,\boldsymbol{\psi}) & \propto p(\boldsymbol{Y}_{obs}, \boldsymbol{R} |\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) \\
& = \int p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis},\boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\
& = \int p(\boldsymbol{R} | \boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\
& = p(\boldsymbol{R} | \boldsymbol{\psi}) \cdot \int p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\
& = p(\boldsymbol{R} | \boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs} | \boldsymbol{\theta}, \Sigma). \\
\end{split}`
$$
]

--

- For inference on `\((\boldsymbol{\theta}, \Sigma)\)`, we can simply focus on `\(p(\boldsymbol{Y}_{obs} | \boldsymbol{\theta}, \Sigma)\)` in the likelihood function, since `\(p(\boldsymbol{R} | \boldsymbol{\psi})\)` does not involve `\(\boldsymbol{Y}\)` at all.

---

## Likelihood function: MAR

- For MAR, we have:
.block[
$$
`\begin{split}
L(\boldsymbol{Y}_{obs}, \boldsymbol{R}; \boldsymbol{\theta}, \Sigma,\boldsymbol{\psi}) & \propto p(\boldsymbol{Y}_{obs}, \boldsymbol{R} |\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) \\
& = \int p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis},\boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\
& = \int p(\boldsymbol{R} | \boldsymbol{Y}_{obs}, \boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\
& = p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{\psi}) \cdot \int p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis} \\
& = p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs} | \boldsymbol{\theta}, \Sigma). \\
\end{split}`
$$
]
--

- For inference on `\((\boldsymbol{\theta}, \Sigma)\)`, we can once again focus on `\(p(\boldsymbol{Y}_{obs} | \boldsymbol{\theta}, \Sigma)\)` in the likelihood function, although there can be some bias if we do not account for `\(p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{\psi})\)`, since it contains observed data.

--

- Also, if we want to infer the missingness mechanism through `\(\boldsymbol{\psi}\)`, we would need to deal with `\(p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{\psi})\)` anyway.

---

## Likelihood function: MNAR

- For MNAR, we have:
.block[
$$
`\begin{split}
L(\boldsymbol{Y}_{obs}, \boldsymbol{R}; \boldsymbol{\theta}, \Sigma,\boldsymbol{\psi}) & \propto p(\boldsymbol{Y}_{obs}, \boldsymbol{R} |\boldsymbol{\theta}, \Sigma, \boldsymbol{\psi}) \\
& = \int p(\boldsymbol{R} | \boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis},\boldsymbol{\psi}) \cdot p(\boldsymbol{Y}_{obs},\boldsymbol{Y}_{mis} | \boldsymbol{\theta}, \Sigma) \textrm{d}\boldsymbol{Y}_{mis}. \\
\end{split}`
$$
]

--

- The likelihood under MNAR cannot simplify any further.

--

- In this case, we cannot ignore the missing data when making inferences about `\((\boldsymbol{\theta}, \Sigma)\)`.

--

- We must include the model for `\(\boldsymbol{R}\)` and also infer the missing data `\(\boldsymbol{Y}_{mis}\)`.

---

## How to tell in practice?

- So how can we tell the type of mechanism we are dealing with?

--

- In general, we don't know!!!

--

- Rare that data are MCAR (unless planned beforehand); more likely that data are MNAR.

--

- **Compromise**: assume data are MAR if we include enough variables in the model for the missing data indicator `\(\boldsymbol{R}\)`.

--

- Whenever we talk about missing data in this course, we will do so in the context of MCAR and MAR.

---

## Bayesian inference with missing data

- As we have seen, for MCAR and MAR, we can focus on `\(p(\boldsymbol{Y}_{obs} | \boldsymbol{\theta}, \Sigma)\)` in the likelihood function, when inferring `\((\boldsymbol{\theta}, \Sigma)\)`.

--

- While this is great, for posterior sampling under most models (especially multivariate models), we actually do need all the `\(\boldsymbol{Y}\)`'s to update the parameters.

--

- We may also want to learn about the missing values themselves, in addition to inferring `\((\boldsymbol{\theta}, \Sigma)\)`.

--

- By thinking of the missing data as **another set of parameters**, we can sample them from the "posterior predictive" distribution of the missing data conditional on the observed data and parameters:
.block[
.small[
$$
`\begin{split}
p(\boldsymbol{Y}_{mis} | \boldsymbol{Y}_{obs},\boldsymbol{\theta}, \Sigma) \propto \prod^n_{i=1} p(\boldsymbol{Y}_{i,mis} | \boldsymbol{Y}_{i,obs},\boldsymbol{\theta}, \Sigma).
\end{split}`
$$
]
]

--

- In the case of the multivariate model, each `\(p(\boldsymbol{Y}_{i,mis} | \boldsymbol{Y}_{i,obs},\boldsymbol{\theta}, \Sigma)\)` is just a normal distribution, and we can leverage results on conditional distributions for normal models.

---

## Gibbs sampler with missing data

At iteration `\(s+1\)`, do the following

1. Sample `\(\boldsymbol{\theta}^{(s+1)}\)` from its multivariate normal full conditional
.block[
.small[
`$$p(\boldsymbol{\theta}^{(s+1)} | \boldsymbol{Y}_{obs}, \boldsymbol{Y}_{mis}^{(s)}, \Sigma^{(s)}).$$`
]
]

--

2. Sample `\(\Sigma^{(s+1)}\)` from its inverse-Wishart full conditional
.block[
.small[
`$$p(\Sigma^{(s+1)} | \boldsymbol{Y}_{obs}, \boldsymbol{Y}_{mis}^{(s)}, \boldsymbol{\theta}^{(s+1)}).$$`
]
]

--
3. For each `\(i = 1, \ldots, n\)` with at least one nonzero entry in the missingness indicator vector `\(\boldsymbol{R}_i\)` (that is, at least one missing value), sample `\(\boldsymbol{Y}_{i,mis}^{(s+1)}\)` from the full conditional
.block[
.small[
`$$p(\boldsymbol{Y}_{i,mis}^{(s+1)} | \boldsymbol{Y}_{i,obs}, \boldsymbol{\theta}^{(s+1)}, \Sigma^{(s+1)}),$$`
]
]

  which is also multivariate normal, with its form derived from the original sampling model but with the updated parameters, that is, `\(\boldsymbol{Y}_i = (Y_{i1},\ldots,Y_{ip})^T = (\boldsymbol{Y}_{i,obs},\boldsymbol{Y}_{i,mis})^T \sim \mathcal{N}_p(\boldsymbol{\theta}^{(s+1)}, \Sigma^{(s+1)})\)`.

---

## Gibbs sampler with missing data

- Rewrite `\(\boldsymbol{Y}_i = (\boldsymbol{Y}_{i,mis},\boldsymbol{Y}_{i,obs})^T \sim \mathcal{N}_p(\boldsymbol{\theta}^{(s+1)}, \Sigma^{(s+1)})\)` as
.block[
.small[
`\begin{eqnarray*}
\boldsymbol{Y}_i = \begin{pmatrix}\boldsymbol{Y}_{i,mis}\\
\boldsymbol{Y}_{i,obs}
\end{pmatrix} & \sim & \mathcal{N}_p\left[\left(\begin{array}{c}
\boldsymbol{\theta}_1\\
\boldsymbol{\theta}_2
\end{array}\right),\left(\begin{array}{cc}
\Sigma_{11} & \Sigma_{12} \\
\Sigma_{21} & \Sigma_{22}
\end{array}\right)\right],\\
\end{eqnarray*}`
]
]

  so that we can take advantage of the conditional normal results.

--

- That is, we have
.block[
`$$\boldsymbol{Y}_{i,mis} | \boldsymbol{Y}_{i,obs} = \boldsymbol{y}_{i,obs} \sim \mathcal{N}\left(\boldsymbol{\theta}_1 + \Sigma_{12}\Sigma_{22}^{-1} (\boldsymbol{y}_{i,obs}-\boldsymbol{\theta}_2), \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).$$`
]

  as the multivariate normal distribution (or univariate normal distribution if `\(\boldsymbol{Y}_i\)` only has one missing entry) we need in step 3 of the Gibbs sampler.

--

- This sampling technique actually encodes MAR, since the imputations for `\(\boldsymbol{Y}_{mis}\)` depend on `\(\boldsymbol{Y}_{obs}\)`.

--

- Now let's revisit the reading comprehension example. We will add missing values to the original data and refit the model.

---

## Reading example with missing data

```r
Y <- as.matrix(dget("http://www2.stat.duke.edu/~pdh10/FCBS/Inline/Y.reading"))

#Add 20% missing data; MCAR
set.seed(1234)
Y_WithMiss <- Y #So we can keep the full data
Miss_frac <- 0.20
R <- matrix(rbinom(nrow(Y_WithMiss)*ncol(Y_WithMiss),1,Miss_frac),
            nrow(Y_WithMiss),ncol(Y_WithMiss))
Y_WithMiss[R==1] <- NA
Y_WithMiss[1:12,]
```

```
##       pretest posttest
##  [1,]      59       77
##  [2,]      43       39
##  [3,]      34       46
##  [4,]      32       NA
##  [5,]      NA       38
##  [6,]      38       NA
##  [7,]      55       NA
##  [8,]      67       86
##  [9,]      64       77
## [10,]      45       60
## [11,]      49       50
## [12,]      72       59
```

```r
colMeans(is.na(Y_WithMiss))
```

```
##   pretest  posttest 
## 0.1363636 0.2272727
```
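---

## Reading example: naive approaches

- As an aside (not part of the model fit), we can see what the earlier naive strategies would give here; this check is only possible because we created the missingness ourselves and kept the full `Y`. A minimal sketch, with output omitted:

```r
#means from the full data (the "truth" we happen to know here)
colMeans(Y)

#complete-case means: drop every student with any missing score
colMeans(na.omit(Y_WithMiss))

#available-case means: use whatever is observed in each column
colMeans(Y_WithMiss, na.rm=TRUE)
```

- Under MCAR these should agree up to noise; under MAR or MNAR, the naive versions can be biased.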
---

## Reading example with missing data

```r
#ACTUAL ANALYSIS STARTS HERE!!!
#Data dimensions
n <- nrow(Y_WithMiss); p <- ncol(Y_WithMiss)

#Hyperparameters for the priors
mu_0 <- c(50,50)
Lambda_0 <- matrix(c(156,78,78,156),nrow=2,ncol=2)
nu_0 <- 4
S_0 <- matrix(c(625,312.5,312.5,625),nrow=2,ncol=2)

#Define missing data indicators
##we already know R here; this is to keep the code general for when we don't
R <- 1*(is.na(Y_WithMiss))
R[1:12,]
```

```
##       pretest posttest
##  [1,]       0        0
##  [2,]       0        0
##  [3,]       0        0
##  [4,]       0        1
##  [5,]       1        0
##  [6,]       0        1
##  [7,]       0        1
##  [8,]       0        0
##  [9,]       0        0
## [10,]       0        0
## [11,]       0        0
## [12,]       0        0
```

---

## Reading example with missing data

```r
#Initial values for Gibbs sampler
Y_Full <- Y_WithMiss #So we can keep the data with missing values as is
for (j in 1:p) {
  Y_Full[is.na(Y_Full[,j]),j] <- mean(Y_Full[,j],na.rm=TRUE) #start with mean imputation
}
Sigma <- S_0 #can't really rely on cov(Y) because we don't have the full Y

#Set null objects to save samples
THETA_WithMiss <- NULL
SIGMA_WithMiss <- NULL
Y_MISS <- NULL

#first set number of iterations and burn-in, then set seed
n_iter <- 10000; burn_in <- 0.3*n_iter
```

---

## Gibbs sampler with missing data

```r
#library(mvtnorm) for multivariate normal
#library(MCMCpack) for inverse-Wishart
Lambda_0_inv <- solve(Lambda_0) #move outside sampler since it does not change

for (s in 1:(n_iter+burn_in)){
  ##first we must recalculate ybar inside the loop now since it changes every iteration
  ybar <- apply(Y_Full,2,mean)

  ##update theta
  Sigma_inv <- solve(Sigma) #invert once
  Lambda_n <- solve(Lambda_0_inv + n*Sigma_inv)
  mu_n <- Lambda_n %*% (Lambda_0_inv%*%mu_0 + n*Sigma_inv%*%ybar)
  theta <- rmvnorm(1,mu_n,Lambda_n)

  ##update Sigma
  S_theta <- (t(Y_Full)-c(theta))%*%t(t(Y_Full)-c(theta))
  S_n <- S_0 + S_theta
  nu_n <- nu_0 + n
  Sigma <- riwish(nu_n, S_n)
```

---

## Gibbs sampler with missing data

```r
  ##update missing data using updated draws of theta and Sigma
  for(i in 1:n) {
    if(sum(R[i,])>0){ #only rows with at least one missing entry
      obs_index <- R[i,]==0
      mis_index <- R[i,]==1
      Sigma_22_obs_inv <- solve(Sigma[obs_index,obs_index]) #invert just once
      Sigma_12_Sigma_22_obs_inv <- Sigma[mis_index,obs_index]%*%Sigma_22_obs_inv
      Sigma_cond_mis <- Sigma[mis_index,mis_index] -
        Sigma_12_Sigma_22_obs_inv%*%Sigma[obs_index,mis_index]
      mu_cond_mis <- theta[mis_index] +
        Sigma_12_Sigma_22_obs_inv%*%(Y_Full[i,obs_index]-theta[obs_index])
      Y_Full[i,mis_index] <- rmvnorm(1,mu_cond_mis,Sigma_cond_mis)
    }
  }

  #save results only past burn-in
  if(s > burn_in){
    THETA_WithMiss <- rbind(THETA_WithMiss,theta)
    SIGMA_WithMiss <- rbind(SIGMA_WithMiss,c(Sigma))
    Y_MISS <- rbind(Y_MISS, Y_Full[R==1])
  }
}
colnames(THETA_WithMiss) <- c("theta_1","theta_2")
colnames(SIGMA_WithMiss) <- c("sigma_11","sigma_12","sigma_21","sigma_22") #symmetry in sigma
```
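---

## Checking the imputations

- Because we saved the sampled missing values in `Y_MISS` at each saved iteration, we can compare the posterior mean imputations with the held-out true scores; again, this is only possible here because we generated the missingness ourselves. A quick sketch, assuming the objects above (output omitted):

```r
#posterior mean of each imputed entry (same column-major order as Y_Full[R==1])
Y_imputed_mean <- apply(Y_MISS, 2, mean)

#the true held-out values from the full data
Y_true_missing <- Y[R==1]

#side-by-side comparison and overall agreement
cbind(Y_true_missing, Y_imputed_mean)
cor(Y_true_missing, Y_imputed_mean)
```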
---

## Diagnostics

```r
#library(coda)
THETA_WithMiss.mcmc <- mcmc(THETA_WithMiss,start=1); summary(THETA_WithMiss.mcmc)
```

```
## 
## Iterations = 1:10000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##          Mean    SD Naive SE Time-series SE
## theta_1 45.64 3.012  0.03012        0.03276
## theta_2 54.15 3.453  0.03453        0.03939
## 
## 2. Quantiles for each variable:
## 
##          2.5%   25%   50%   75% 97.5%
## theta_1 39.60 43.65 45.62 47.64 51.55
## theta_2 47.31 51.91 54.17 56.45 61.08
```

---

## Diagnostics

```r
SIGMA_WithMiss.mcmc <- mcmc(SIGMA_WithMiss,start=1); summary(SIGMA_WithMiss.mcmc)
```

```
## 
## Iterations = 1:10000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean    SD Naive SE Time-series SE
## sigma_11 194.8 62.89   0.6289         0.6063
## sigma_12 152.1 60.58   0.6058         0.6910
## sigma_21 152.1 60.58   0.6058         0.6910
## sigma_22 247.7 83.55   0.8355         0.9659
## 
## 2. Quantiles for each variable:
## 
##            2.5%   25%   50%   75% 97.5%
## sigma_11 108.30 151.2 182.5 224.4 348.6
## sigma_12  64.76 110.3 141.9 182.0 299.6
## sigma_21  64.76 110.3 141.9 182.0 299.6
## sigma_22 133.33 189.3 231.8 289.0 450.8
```

---

## Compare to inference from full data

With missing data:

```r
apply(THETA_WithMiss,2,summary)
```

```
##          theta_1  theta_2
## Min.    32.64839 41.13748
## 1st Qu. 43.65457 51.90859
## Median  45.61740 54.16720
## Mean    45.63740 54.14929
## 3rd Qu. 47.64129 56.45068
## Max.    58.65830 70.26826
```

Based on the true data:

```r
apply(THETA,2,summary)
```

```
##          theta_1  theta_2
## Min.    35.50314 37.80999
## 1st Qu. 45.35465 51.53327
## Median  47.36177 53.68602
## Mean    47.29978 53.68529
## 3rd Qu. 49.22875 55.82192
## Max.    60.94924 69.92354
```

Very similar for the most part.

---

## Compare to inference from full data

With missing data:

```r
apply(SIGMA_WithMiss,2,summary)
```

```
##          sigma_11  sigma_12  sigma_21  sigma_22
## Min.     74.61274 -10.83674 -10.83674  82.55346
## 1st Qu. 151.17000 110.33973 110.33973 189.31667
## Median  182.49663 141.85462 141.85462 231.76447
## Mean    194.75107 152.14494 152.14494 247.72255
## 3rd Qu. 224.42867 181.98838 181.98838 288.99033
## Max.    712.33562 600.36262 600.36262 960.62283
```

Based on the true data:

```r
apply(SIGMA,2,summary)
```

```
##          sigma_11  sigma_12  sigma_21  sigma_22
## Min.     79.44258  11.41663  11.41663  93.65776
## 1st Qu. 158.21469 113.23258 113.23258 203.21138
## Median  190.77854 144.74881 144.74881 244.56334
## Mean    202.34721 155.33355 155.33355 260.07072
## 3rd Qu. 234.77319 186.50429 186.50429 300.90761
## Max.    671.16538 613.88088 613.88088 947.39333
```

Also very similar. A bit more uncertainty in the `\(Y_{i2}\)` dimension, because we have more missing data there.

---

## Diagnostics: trace plots

```r
plot(THETA_WithMiss.mcmc[,"theta_1"])
```

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />

Looks good!

---

## Diagnostics: trace plots

```r
plot(THETA_WithMiss.mcmc[,"theta_2"])
```

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

Looks good!

---

## Diagnostics: trace plots

```r
plot(SIGMA_WithMiss.mcmc[,"sigma_11"])
```

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

Looks good!

---

## Diagnostics: trace plots

```r
plot(SIGMA_WithMiss.mcmc[,"sigma_12"])
```

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />

Looks good!

---

## Diagnostics: trace plots

```r
plot(SIGMA_WithMiss.mcmc[,"sigma_22"])
```

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" />

Looks good!

---

## Diagnostics: autocorrelation

```r
autocorr.plot(THETA_WithMiss.mcmc[,"theta_1"])
```

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" />

Looks good!

---

## Diagnostics: autocorrelation

```r
autocorr.plot(THETA_WithMiss.mcmc[,"theta_2"])
```

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" />

Looks good!

---

## Diagnostics: autocorrelation

```r
autocorr.plot(SIGMA_WithMiss.mcmc[,"sigma_11"])
```

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />

Looks good!
---

## Diagnostics: autocorrelation

```r
autocorr.plot(SIGMA_WithMiss.mcmc[,"sigma_12"])
```

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" />

Looks good!

---

## Diagnostics: autocorrelation

```r
autocorr.plot(SIGMA_WithMiss.mcmc[,"sigma_22"])
```

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-30-1.png" style="display: block; margin: auto;" />

Looks good!

---

## Posterior distribution of the mean

<img src="14-multivariate-normal-model-III_files/figure-html/unnamed-chunk-31-1.png" style="display: block; margin: auto;" />
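---

## Diagnostics: effective sample size

- As a numeric complement to the trace and autocorrelation plots, we can also compute effective sample sizes with `coda`; a minimal sketch, assuming the `mcmc` objects created earlier (output omitted):

```r
#library(coda)
effectiveSize(THETA_WithMiss.mcmc)
effectiveSize(SIGMA_WithMiss.mcmc)
```

- Values close to the 10000 saved draws would confirm the low autocorrelation we saw in the plots.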