Please make sure your final output file is a PDF document. You may submit handwritten solutions for non-programming exercises or type them using R Markdown, LaTeX, or any other word processor. All programming exercises MUST be done in R, typed up clearly, and with all code attached. Submissions should be made on Gradescope: go to Assignments \(\rightarrow\) Homework 6.
Continuation of question 1 from homework 5. Generate 50 “test” subjects from the same sampling distribution as question 1 from homework 5, that is, \(\boldsymbol{y_i}^\star = (y_{i,1}^\star,y_{i,2}^\star)^T \sim \mathcal{N}_2(\boldsymbol{\theta}, \Sigma)\), \(i = 1, \ldots, 50\), with \(\boldsymbol{\theta} = (0,0)^T\) and \(\Sigma\) chosen so that the marginal variances are one and \(\rho = 0.8\). Keep the \(y_{i,2}^\star\) values but treat the \(y_{i,1}^\star\) values as unknown/missing (set them to NA but save the “true” values somewhere!).
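For concreteness, generating the test set might look like the following (a minimal sketch assuming the MASS package; the object names are illustrative, not required):

```r
library(MASS)

set.seed(6)
n.test <- 50
theta  <- c(0, 0)
rho    <- 0.8
Sigma  <- matrix(c(1, rho, rho, 1), nrow = 2)   # unit marginal variances, correlation 0.8

y.test  <- mvrnorm(n.test, mu = theta, Sigma = Sigma)  # 50 x 2 matrix of "test" subjects
y1.true <- y.test[, 1]   # save the "true" y_{i,1}^* values for later comparison
y.test[, 1] <- NA        # treat y_{i,1}^* as unknown/missing
y2.star <- y.test[, 2]   # the observed y_{i,2}^* values
```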
Now, re-run the Gibbs sampler from the last homework to obtain the posterior samples for \((\boldsymbol{\theta}, \Sigma)\) based only on the 100 “train” subjects from the last homework (that is, \(\boldsymbol{y_i} = (y_{i,1},y_{i,2})^T \sim \mathcal{N}_2(\boldsymbol{\theta}, \Sigma)\), \(i = 1, \ldots, 100\), with the same \((\boldsymbol{\theta}, \Sigma)\)). Using the posterior samples, answer the following questions:

Part (a): Generate predictive samples of \(y_{i,1}^\star\) given each \(y_{i,2}^\star\) value you kept, for the 50 test subjects. Show your sampler.
You should view this as a “train \(\rightarrow\) test” prediction problem rather than a missing-data problem on the original data. That is, given the posterior samples of your parameters and the test values \(y_{i,2}^\star\), draw from the posterior predictive distribution of \((y_{i,1}^\star \mid y_{i,2}^\star, \{(y_{1,1},y_{1,2}), \ldots, (y_{100,1},y_{100,2})\})\). You may find it useful to think of this in terms of compositional sampling: for each posterior sample of \((\boldsymbol{\theta}, \Sigma)\), sample from \((y_{i,1}^\star \mid y_{i,2}^\star, \boldsymbol{\theta}, \Sigma)\), whose form follows directly from the bivariate normal sampling distribution. Don’t incorporate the prediction problem into your original Gibbs sampler!
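A minimal sketch of the compositional sampler is below. It assumes `theta.post` is an \(S \times 2\) matrix and `Sigma.post` a \(2 \times 2 \times S\) array holding the posterior draws from your homework 5 Gibbs sampler, and `y2.star` holds the 50 observed \(y_{i,2}^\star\) values (these object names are assumptions, not part of the assignment):

```r
S       <- nrow(theta.post)
n.test  <- length(y2.star)
y1.pred <- matrix(NA, nrow = S, ncol = n.test)  # row s = predictive draws at posterior sample s

for (s in 1:S) {
  th  <- theta.post[s, ]
  Sig <- Sigma.post[, , s]
  # Conditional of y1 given y2 under the bivariate normal sampling model:
  # E[y1 | y2] = theta1 + (Sig12 / Sig22) * (y2 - theta2),  Var = Sig11 - Sig12^2 / Sig22
  cond.mean <- th[1] + (Sig[1, 2] / Sig[2, 2]) * (y2.star - th[2])
  cond.var  <- Sig[1, 1] - Sig[1, 2]^2 / Sig[2, 2]
  y1.pred[s, ] <- rnorm(n.test, mean = cond.mean, sd = sqrt(cond.var))
}
# Column i of y1.pred contains draws from p(y_{i,1}^* | y_{i,2}^*, y_1, ..., y_100)
```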
Part (d): Fit a frequentist linear regression model (using lm in R if you have never done this in R, which I hope is not the case), with \(y_{i,1}\) as the response variable and \(y_{i,2}\) as the predictor. Now, use the fitted model to predict the \(y_{i,1}^\star\) values given the \(y_{i,2}^\star\) values for the 50 test subjects. Show your R code.

Part (e): Generate and plot the predictive intervals for the frequentist predictions (lm predictions are very easy to do in R). Comment on the differences between these results and those from the Bayesian approach.
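A minimal sketch of the frequentist fit, predictions, and prediction intervals is given below. It assumes `y.train` is the \(100 \times 2\) training matrix from homework 5 and that `y2.star` and `y1.true` are defined as above (object names are illustrative):

```r
train <- data.frame(y1 = y.train[, 1], y2 = y.train[, 2])
fit   <- lm(y1 ~ y2, data = train)

newdat <- data.frame(y2 = y2.star)
pred   <- predict(fit, newdata = newdat, interval = "prediction", level = 0.95)
# pred has columns "fit", "lwr", "upr"

# Plot point predictions and 95% prediction intervals against y_{i,2}^*
ord <- order(y2.star)
plot(y2.star[ord], pred[ord, "fit"], type = "l",
     ylim = range(pred, y1.true), xlab = "y2 (test)", ylab = "predicted y1 (test)")
lines(y2.star[ord], pred[ord, "lwr"], lty = 2)
lines(y2.star[ord], pred[ord, "upr"], lty = 2)
points(y2.star[ord], y1.true[ord], pch = 16, cex = 0.6)  # held-out "true" values
```

For the comparison in Part (e), the Bayesian predictive intervals can be read off the columns of `y1.pred` from the sketch above, e.g. `apply(y1.pred, 2, quantile, probs = c(0.025, 0.975))`.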
20 points.