class: center, middle, inverse, title-slide

# Course overview and introduction to Bayesian inference

### Dr. Olanrewaju Michael Akande

### Jan 10, 2020

---

class: center, middle

# Welcome to STA 602L!

---

## What is this course about?

<i class="fa fa-book fa-2x"></i> Learn the foundations of Bayesian inference.

--

<i class="fa fa-folder-open fa-2x"></i> Work through the theory of several Bayesian models.

--

<i class="fa fa-tasks fa-2x"></i> Use Bayesian models to answer inferential questions.

--

<i class="fa fa-database fa-2x"></i> Apply the models to several different problem sets.

--

<i class="fa fa-refresh fa-spin fa-2x"></i> ".hlight[Prior] `\(\rightarrow\)` .hlight[likelihood] `\(\rightarrow\)` .hlight[posterior]" over and over again!

--

<i class="fa fa-clock fa-2x"></i> We will follow the Hoff book closely -- roughly one chapter per week.

---

<i class="fa fa-quote-left fa-2x fa-pull-left fa-border" aria-hidden="true"></i> <i class="fa fa-quote-right fa-2x fa-pull-right fa-border" aria-hidden="true"></i>

A Bayesian version will usually make things better...

--

Andrew Gelman.

---

## Instructor

[Dr. Olanrewaju Michael Akande](https://akandelanre.github.io)

<i class="fa fa-envelope"></i> [olanrewaju.akande@duke.edu](mailto:olanrewaju.akande@duke.edu) <br>
<i class="fa fa-home"></i> [akandelanre.github.io](https://akandelanre.github.io/IDS702_F19/) <br>
<i class="fa fa-university"></i> [256 Gross Hall](https://maps.duke.edu/?id=21#!m/6866?s/Gross%20Hall) <br>
<i class="fa fa-calendar"></i> Wed 9:00 - 10:00am; Thur 11:45 - 12:45pm (still subject to change!)

---

## Lead TA

[Jordan Bryan](https://stat.duke.edu/people/jordan-bryan) (mainly for STA 360)

<i class="fa fa-envelope"></i> [jordan.bryan@duke.edu](mailto:jordan.bryan@duke.edu) <br>

## TAs

[Bai Li](https://stat.duke.edu/people/bai-li)

<i class="fa fa-envelope"></i> [bai.li@duke.edu](mailto:bai.li@duke.edu) <br>
<i class="fa fa-calendar"></i> Wed 3:00 - 5:00pm <br>
<i class="fa fa-university"></i> [Old Chem 025](https://maps.duke.edu/map/?id=21#!s/key=Old%20Chemistry?m/2766)

[Zhuoqun (Carol) Wang](https://stat.duke.edu/people/zhuoqun-wang-0)

<i class="fa fa-envelope"></i> [zhuoqun.wang@duke.edu](mailto:zhuoqun.wang@duke.edu) <br>
<i class="fa fa-calendar"></i> Tues 3:00 - 5:00pm <br>
<i class="fa fa-university"></i> [Old Chem 025](https://maps.duke.edu/map/?id=21#!s/key=Old%20Chemistry?m/2766)

---

## FAQs

All materials and information will be posted on the course webpage:

[https://sta-602l-s20.github.io/Course-Website/](https://sta-602l-s20.github.io/Course-Website/)

--

- How much theory will this class cover? A lot! Make sure you are especially comfortable working with probability distributions.

--

- Am I prepared to take this course? Yes, if you are familiar with the topics covered in the course prerequisites.

--

- Will we be doing "very heavy" computing? Not too heavy, but yes, a good amount. You will be expected to write your own MCMC sampler later on.

--

- What computing language will we use? R!

--

- What if I don't know R? This course assumes you already know R, but you can still learn on the fly (you must be self-motivated). Here are some resources for you: [https://sta-602l-s20.github.io/Course-Website/resources/](https://sta-602l-s20.github.io/Course-Website/resources/).
---

class: center, middle

# Course structure and policies

---

## Course structure and policies

- All on the website (here: [https://sta-602l-s20.github.io/Course-Website/policies/](https://sta-602l-s20.github.io/Course-Website/policies/)).

--

- Make use of the teaching team's office hours; we're here to help!

--

- Do not hesitate to come to my office during office hours or by appointment to discuss a homework problem or any aspect of the course.

--

- When the teaching team has announcements for you, we will send an email to your Duke email address. Please make sure to check your email daily.

--

- Please refrain from texting or using your computer for anything other than coursework during class.

---

## Other details

- What topics will we cover? Refer to Section 9 of the syllabus (here: https://sta-602l-s20.github.io/Course-Website/syllabus_pdf/Syllabus.pdf).

--

- If you are auditing this course, remember to complete the audit form for the Graduate School.

--

- Confirm that you have access to Sakai and Gradescope.

---

class: center, middle

# Your turn!

---

## Introductions

- Your full name.

- The name you prefer to go by.

- One goal you hope this course will help you achieve.

---

class: center, middle

# Introduction to Bayesian inference

---

## What are Bayesian methods?

- .hlight[Bayesian methods] are data analysis tools derived from the principles of Bayesian inference and provide

--

  + parameter estimates with good statistical properties;

--

  + parsimonious descriptions of observed data;

--

  + predictions for missing data and forecasts of future data; and

--

  + a computational framework for model estimation, selection, and validation.

---

## Building blocks of Bayesian inference

- Generally (unless otherwise stated), in this course, we will use the following notation. Let

--

  + `\(\mathcal{Y}\)` be the .hlight[sample space];

--

  + `\(y\)` be the .hlight[observed data];

--

  + `\(\Theta\)` be the .hlight[parameter space]; and

--

  + `\(\theta\)` be the .hlight[parameter of interest].

--

- More to come later.

---

## Bayes' theorem - basic conditional probability

- Let's take a step back and quickly review the basic form of Bayes' theorem.

--

- Suppose there are some events `\(A\)` and `\(B\)` having probabilities `\(\Pr(A)\)` and `\(\Pr(B)\)`.

--

- Bayes' rule gives the relationship between the marginal probabilities of `\(A\)` and `\(B\)` and the conditional probabilities.

--

- In particular, the basic form of .hlight[Bayes' rule] or .hlight[Bayes' theorem] is
.block[
.small[
`$$\Pr(A | B) = \frac{\Pr(A \ \textrm{and} \ B)}{\Pr(B)} = \frac{\Pr(B|A)\Pr(A)}{\Pr(B)}$$`
]
]

--

`\(\Pr(A)\)` = marginal probability of event `\(A\)`, `\(\Pr(B | A)\)` = conditional probability of event `\(B\)` given event `\(A\)`, and so on. A quick numerical example follows on the next slide.
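---

## Bayes' theorem - a quick numerical example

A minimal R sketch of the basic rule, using made-up numbers (the events, probabilities, and test characteristics below are hypothetical, not real data):

```r
# Hypothetical events:
# A = a randomly chosen adult has diabetes,
# B = a (hypothetical) screening test comes back positive.
pr_A            <- 0.10  # assumed Pr(A): prior probability of diabetes
pr_B_given_A    <- 0.90  # assumed Pr(B | A): test sensitivity
pr_B_given_notA <- 0.15  # assumed Pr(B | not A): false positive rate

# Pr(B) via the law of total probability
pr_B <- pr_B_given_A * pr_A + pr_B_given_notA * (1 - pr_A)

# Bayes' rule: Pr(A | B) = Pr(B | A) * Pr(A) / Pr(B)
pr_B_given_A * pr_A / pr_B  # = 0.4
```

Even with a fairly accurate test, `\(\Pr(A | B)\)` is only 0.4 here because the marginal probability `\(\Pr(A)\)` is small; this is exactly the interplay the formula captures.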
---

## Building blocks of Bayesian inference

- Now, to a slightly more complicated version of Bayes' rule. First,

  1. For each `\(\theta \in \Theta\)`, specify a .hlight[prior distribution] `\(p(\theta)\)` or `\(\pi(\theta)\)`, describing our beliefs about `\(\theta\)` being the true population parameter.

--

  2. For each `\(\theta \in \Theta\)` and `\(y \in \mathcal{Y}\)`, specify a .hlight[sampling distribution] `\(p(y|\theta)\)`, describing our belief that the data we see `\(y\)` is the outcome of a study with true parameter `\(\theta\)`. `\(p(y|\theta)\)` gets us the .hlight[likelihood] `\(L(y;\theta)\)`.

--

  3. After observing the data `\(y\)`, for each `\(\theta \in \Theta\)`, update the prior distribution to a .hlight[posterior distribution] `\(p(\theta | y)\)`, describing our "updated" belief about `\(\theta\)` being the true population parameter.

--

- Now, how do we get from Step 1 to Step 3? .hlight[Bayes' rule]!
.block[
.small[
`$$p(\theta | y) = \frac{p(\theta)L(y;\theta)}{\int_{\Theta}p(\tilde{\theta})L(y; \tilde{\theta}) \textrm{d}\tilde{\theta}} = \frac{p(\theta)L(y;\theta)}{L(y)}$$`
]
]

  We will use this over and over throughout the course!

---

## Notes on prior distributions

Many types of priors may be of interest. These may

  + represent our own beliefs;

--

  + represent beliefs of a variety of people with differing prior opinions; or

--

  + assign probability more or less evenly over a large region of the parameter space;

--

  + and so on...

---

## Notes on prior distributions

- .hlight[Subjective Bayes]: a prior should accurately quantify some individual's beliefs about `\(\theta\)`.

--

- .hlight[Objective Bayes]: the prior should be chosen to produce a procedure with "good" operating characteristics without including subjective prior knowledge.

--

- .hlight[Weakly informative]: prior centered in a plausible region but not overly informative, as there is a tendency to be overconfident about one's beliefs.

---

## Notes on prior distributions

- The prior quantifies your initial uncertainty in `\(\theta\)` before you observe new data (new information); this may necessarily be subjective and summarize experience in a field or prior research.

--

- Even if the prior is not "perfect", placing higher probability in the ballpark of the truth leads to better performance.

--

- Hence, a weakly informative prior is almost always preferable to no prior information at all.

--

- One (very important) role of the prior is to stabilize estimates in the presence of limited data.

---

## Simple example - estimating a population proportion

- Suppose `\(\theta \in (0,1)\)` is the population proportion of individuals with diabetes in the US.

--

- A prior distribution for `\(\theta\)` would correspond to some distribution that distributes probability across `\((0, 1)\)`.

--

- A very precise prior corresponding to abundant prior knowledge would be concentrated tightly in a small sub-interval of `\((0, 1)\)`.

--

- A vague prior may be distributed widely across `\((0, 1)\)`; e.g., a uniform distribution would be a common choice here.

---

## Some possible prior densities

<img src="img/beta_densities.png" height="500px" style="display: block; margin: auto;" />

---

## Beta prior densities

- These three priors correspond to Beta(1,1) [also, Unif(0,1)], Beta(1,11), and Beta(2,10) densities.

--

- .hlight[Beta(a,b)] is a probability density function (pdf) on (0,1),
.block[
.small[
`$$\pi(\theta) = \frac{1}{B(a,b)} \theta^{a-1} (1-\theta)^{b-1},$$`
]
]

  where `\(B(a,b)\)` = beta function = normalizing constant ensuring the kernel integrates to one. Note: some texts write `\(\textrm{beta}(\alpha,\beta)\)` instead.

--

- The Beta(a,b) distribution has expectation `\(\mathbb{E}[\theta] = a/(a+b)\)`, and the density becomes more and more concentrated as the prior "sample size" `\(a + b\)` increases.

--

- The variance is `\(\textrm{Var}[\theta] = ab/[(a+b)^2(a+b+1)]\)`.

--

- We will look more carefully into the beta-binomial model next week, but for now, I'll illustrate how this prior gets updated as data becomes available (see the R sketch on the next slide).
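---

## Updating a beta prior: a quick R preview

A minimal sketch, previewing the conjugate update we will derive next week: with a Beta(a,b) prior on `\(\theta\)` and `\(y\)` successes in `\(n\)` Bernoulli trials, the posterior is Beta(a + y, b + n - y). The data below (30 of 100 sampled individuals having diabetes) are hypothetical:

```r
a <- 2; b <- 10    # one of the priors above: Beta(2, 10)
y <- 30; n <- 100  # hypothetical data: 30 of 100 sampled individuals have diabetes

theta <- seq(0, 1, by = 0.001)

# Plot the posterior first so both curves fit on the same axes
plot(theta, dbeta(theta, a + y, b + n - y), type = "l",
     xlab = expression(theta), ylab = "Density")
lines(theta, dbeta(theta, a, b), lty = 2)  # the Beta(2, 10) prior
legend("topright", legend = c("Beta(32, 80) posterior", "Beta(2, 10) prior"),
       lty = c(1, 2))
```

Note how the posterior concentrates around the observed proportion `\(30/100 = 0.3\)` while still being pulled slightly toward the prior mean `\(2/12 \approx 0.17\)`.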