Exercises

Probability

Exercise 1 (Joint Distributions) A credit card company collects data on \(10,000\) users. The data contain two variables: an indicator of the customer's status, namely whether they are in default (def =1) or current with their payments (def =0), and a measure of their loan balance relative to income with three categories: a low balance (bal=1), medium (bal=2) and high (bal=3). The data are given in the following table:

        def = 0   def = 1
bal = 1   8,940        64
bal = 2     651       136
bal = 3      76       133
  1. Compute the marginal distribution of customer status
  2. What is the conditional distribution of bal given def =1
  3. Make a prediction for the status of a customer with a high balance

Solution:

  1. The marginal distribution of customer status is obtained by summing over the balance categories, i.e. the column sums of the table.
d = data.frame(def0=c(8940,651,76),def1=c(64,136,133),row.names = c("bal1","bal2","bal3"))
apply(d,2,sum)/10000
 def0  def1 
0.967 0.033 
  2. The conditional distribution of bal given def =1 is given by the ratio of the joint (elements of the table) to the marginal (sum of the def =1 column)
d[,"def1"]/sum(d[,"def1"])
[1] 0.19 0.41 0.40
  3. The prediction for the status of a customer with a high balance is based on the conditional distribution of def given balance. \[ P(def\mid bal) = \frac{P(def \text{ and } bal)}{P(bal)} \]
balmarginal = apply(d,1,sum)
d1b3 = d["bal3","def1"]/balmarginal["bal3"]
d1b1 = d["bal1","def1"]/balmarginal["bal1"]
print(d1b3)
bal3 
0.64 
print(d1b1)
  bal1 
0.0071 
# The ratio is 
d1b3/d1b1
bal3 
  90 

A person with a high balance has a 64% chance of being in default, while a person with a low balance has a 0.7% chance of being in default, roughly 90 times lower!

Exercise 2 (Marginal) The table below is taken from the Hoff text and shows the joint distribution of occupations taken from a 1983 study of social mobility by @logan1983multivariate. Each cell is P(father’s occupation, son’s occupation).

d = data.frame(farm = c(0.018,0.002,0.001,0.001,0.001),operatives=c(0.035,0.112,0.066,0.018,0.029),craftsman=c(0.031,0.064,0.094,0.019,0.032),sales=c(0.008,0.032,0.032,0.010,0.043),professional=c(0.018,0.069,0.084,0.051,0.130), row.names = c("farm","operative","craftsman","sales","professional"))
d %>% knitr::kable()
farm operatives craftsman sales professional
farm 0.02 0.04 0.03 0.01 0.02
operative 0.00 0.11 0.06 0.03 0.07
craftsman 0.00 0.07 0.09 0.03 0.08
sales 0.00 0.02 0.02 0.01 0.05
professional 0.00 0.03 0.03 0.04 0.13
  1. Find the marginal distribution of fathers’ occupations.
  2. Find the marginal distribution of sons’ occupations.
  3. Find the conditional distribution of the son’s occupation given that the father is a farmer.
  4. Find the conditional distribution of the father’s occupation given that the son is a farmer.
  5. Comment on these results. What do they say about changes in farming in the population from which these data are drawn?

Solution:

apply(d,1,sum)
        farm    operative    craftsman        sales professional 
       0.110        0.279        0.277        0.099        0.235 
apply(d,2,sum)
        farm   operatives    craftsman        sales professional 
       0.023        0.260        0.240        0.125        0.352 
d["farm",]/sum(d["farm",])
     farm operatives craftsman sales professional
farm 0.16       0.32      0.28 0.073         0.16
d[,"farm"]/sum(d[,"farm"])
[1] 0.783 0.087 0.043 0.043 0.043
dd = data.frame(son=as.double(d["farm",]/sum(d["farm",])),father=as.double(d[,"farm"]/sum(d[,"farm"])))
barplot(height = t(as.matrix(dd)),beside=TRUE,legend=TRUE)

The bar chart allows us to visualize the marginal distributions of occupations of fathers and sons. The striking feature of this chart is that in the sons, the proportion of farmers has greatly decreased and the proportion of professionals has increased. From Part 4, we see that of the sons who are farmers, 78% had fathers who are farmers. On the other hand, Part 3 shows that only 16% of the fathers who farmed produced sons who farmed. This is higher than the 2.3% of sons in the general population who became farmers, showing that sons of farmers went into farming at a rate higher than the general population of the study. However, most of the sons of farmers went into another profession. There was a great deal of movement out of farming and very little movement into farming from the fathers' to the sons' generation.

To determine the validity of inference outside the sample, we would need to know more about how the study was designed. We do not know how the sample was selected or what steps were taken to ensure it was representative of the population being studied. We also are not given the sample size, so we don’t know the accuracy of our probability estimates. The paper from which the table was taken, cited in the problem, provides more detail on the study.

Exercise 3 (Conditional) Netflix surveyed the general population about the number of hours per week they use the service. The following table provides the proportions in each category according to whether the respondent is a teenager or an adult.

Hours Teenager Adult
\(<4\) 0.18 0.20
\(4\) to \(6\) 0.12 0.32
\(>6\) 0.04 0.14

Calculate the following probabilities:

  1. Given that you spend \(4\) to \(6\) hours a week watching movies, what’s the probability that you are a teenager?
  2. What is the marginal distribution of hours spent watching movies?
  3. Are hours spent watching Netflix movies independent of age?

Solution:

  1. \(\frac{0.12}{0.12+0.32}=0.2727\)

  2. See Table:

    Hours Probability
    \(<4\) 0.18+0.20=0.38
    \(4\) to \(6\) 0.12+0.32=0.44
    \(>6\) 0.04+0.14=0.18
  3. No, the conditional distributions of hours given that you are a teenager and given that you are an adult are not the same:

Hours Teenager Adult
\(<4\) \(\frac{0.18}{0.18+0.12+0.04}=0.5294\) \(\frac{0.20}{0.20+0.32+0.14}=0.3030\)
\(4\) to \(6\) \(\frac{0.12}{0.18+0.12+0.04}=0.3529\) \(\frac{0.32}{0.20+0.32+0.14}=0.4848\)
\(>6\) \(\frac{0.04}{0.18+0.12+0.04}=0.1176\) \(\frac{0.14}{0.20+0.32+0.14}=0.2121\)
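
These numbers can also be checked with a short R sketch (a verification of the calculations above, with the joint table entered as a matrix):

# joint distribution: hours (rows) by age group (columns)
tab <- matrix(c(0.18, 0.12, 0.04,    # Teenager column
                0.20, 0.32, 0.14),   # Adult column
              nrow = 3, dimnames = list(c("<4", "4-6", ">6"), c("Teen", "Adult")))
tab["4-6", "Teen"] / sum(tab["4-6", ])   # P(teenager | 4 to 6 hours) = 0.2727
rowSums(tab)                             # marginal distribution of hours: 0.38, 0.44, 0.18
sweep(tab, 2, colSums(tab), "/")         # conditional distributions of hours given age group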

Exercise 4 (Joint and Conditional) The following probability table relates \(Y\) the number of TV shows watched by the typical student in an evening to the number of drinks \(X\) consumed.

Y
X 0 1 2 3
0 0.07 0.09 0.06 0.01
1 0.07 0.06 0.07 0.01
2 0.06 0.07 0.14 0.03
3 0.02 0.04 0.16 0.04
  1. What is the probability that a student has more than two drinks in an evening?
  2. What is the probability that a student drinks more than the number of TV shows they watch?
  3. What’s the conditional distribution of the number of TV shows watched given they consume \(3\) drinks?
  4. What’s the expected number of drinks given they do not watch TV?
  5. Are drinking and watching TV independent?

Solution:

  1. \(P(X > 2 ) = 0.26\).
  2. \(P(X>Y) = 0.42\)
  3. \(P( Y \mid X=3 )\) is given by \(( 1/13 , 2/13 , 8/13 , 2/13 )\).
  4. \(P( X \mid Y=0 )\) is given by \(( 7/22 , 7/22 , 3/11 , 1/11 )\) and so \(E(X\mid Y=0) = 25/22 \approx 1.14\)
  5. No. \(P( X = 2 \text{ and } Y=2 )= 0.14 \neq 0.43 \times 0.30 = P(X=2 ) P(Y=2)\)
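
These answers can be reproduced with a short R check (the joint table entered as a matrix with drinks \(X\) in rows and shows \(Y\) in columns):

p <- matrix(c(0.07, 0.09, 0.06, 0.01,
              0.07, 0.06, 0.07, 0.01,
              0.06, 0.07, 0.14, 0.03,
              0.02, 0.04, 0.16, 0.04),
            nrow = 4, byrow = TRUE, dimnames = list(X = 0:3, Y = 0:3))
sum(p["3", ])                        # P(X > 2) = 0.26
sum(p[row(p) > col(p)])              # P(X > Y) = 0.42
p["3", ] / sum(p["3", ])             # P(Y | X = 3)
sum(0:3 * p[, "0"]) / sum(p[, "0"])  # E(X | Y = 0) = 25/22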

Exercise 5 (Conditional probability) Shipments from an online retailer take between 1 and 7 days to arrive, depending on where they ship from, when they were ordered, the size of the item, etc. Suppose the distribution of delivery times has the following distribution function:

x 1 2 3 4 5 6 7
\(\mbox{P}(X = x)\)
\(\mbox{P}(X \leq x)\) 0.10 0.20 0.70 0.75 0.80 0.90 1
  1. Fill in the above probability table.
  2. What is the conditional probability of a delivery arriving on day four given that it did not arrive in the first three days? (Hint: find \(P(X = 4 \mid X \geq 4)\))

Solution:

\[P(X = 1) = P(X \leq 1) = 0.1\] \[P(X = 2) = P(X \leq 2) - P(X \leq 1) = 0.2 - 0.1 = 0.1\] \[P(X = 3) = P(X \leq 3) - P(X \leq 2) = 0.7 - 0.2 = 0.5\]

x 1 2 3 4 5 6 7
\(\mbox{P}(X = x)\) 0.10 0.10 0.50 0.05 0.05 0.10 0.10
\(\mbox{P}(X \leq x)\) 0.10 0.20 0.70 0.75 0.80 0.90 1

\[ \begin{aligned} P(X \geq 4) &= 0.05 + 0.05 + 0.10 + 0.10 = 0.30\\ P(X = 4 \text{ and } X \geq 4) &= P(X = 4) = 0.05\\ P(X = 4 \mid X \geq 4) &= \frac{P(X = 4 \text{ and } X \geq 4)}{P(X \geq 4)} = \frac{0.05}{0.30} = \frac{1}{6}\end{aligned} \]
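
In R the pmf can be recovered from the cdf with diff(), a quick check of the table above:

cdf <- c(0.10, 0.20, 0.70, 0.75, 0.80, 0.90, 1.00)
pmf <- diff(c(0, cdf))        # P(X = x) for x = 1, ..., 7
pmf                           # 0.10 0.10 0.50 0.05 0.05 0.10 0.10
pmf[4] / sum(pmf[4:7])        # P(X = 4 | X >= 4) = 1/6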

Exercise 6 (Joint and Conditional) A cable television company has \(10000\) subscribers in a suburban community. The company offers two premium channels, HBO and Showtime. Suppose \(2750\) subscribers receive HBO and \(2050\) receive Showtime and \(6200\) do not receive any premium channel.

  1. What is the probability that a randomly selected subscriber receives both HBO and Showtime?
  2. What is the probability that a randomly selected subscriber receives HBO but not Showtime?

You now obtain a new dataset, categorized by gender, on the proportions of people who watch HBO and Showtime given below

Cable Female Male
HBO 0.14 0.48
Showtime 0.17 0.21
  1. Conditional on being female, what’s the probability you receive HBO?
  2. Conditional on being female, what’s the probability you receive Showtime?

Solution:

  1. We can calculate the number of people who receive both HBO and Showtime by calculating how many were “double counted” in the HBO and Showtime counts.
10000 - 6200 - 2750 - 2050
[1] -1000

Thus, the answer is \(1000/10000 = 0.1\).

  2. \(\frac{2750-1000}{10000}=\frac{1750}{10000}=0.175\)

For the gender table:

  1. \(P(HBO\mid F)=\frac{0.14}{0.14+0.17}=0.4516\)
  2. \(P(Show\mid F)=\frac{0.17}{0.14+0.17}=0.5484\)

Exercise 7 (Conditionals and Expectations) The following probability table describes the daily sales volume, \(X\), in thousands of dollars for a salesperson, together with the number of years \(Y\) of sales experience with a particular company.

Y
X 1 2 3 4
10 0.14 0.03 0.03 0
20 0.05 0.10 0.12 0.07
30 0.10 0.06 0.25 0.05
  1. Verify that this is a legal probability table.
  2. What is the probability of at least two years experience?
  3. Calculate the mean daily sales volume
  4. Given a salesperson has three years experience, calculate the mean daily sales volume.
  5. A salesperson is paid $1000 per week plus 2% of total sales. What is the expected compensation for a salesperson?

Solution:

  1. All probabilities are between zero and one, and the entire table adds to one
  2. \(P(Y\geq 2)=0.19+0.40+0.12=0.71\)
  3. \(E(X) = 10 \times 0.20 + 20 \times 0.34 + 30 \times 0.46 = 22.6\), that is $22,600
  4. First, the conditional distribution \(p( X\mid Y=3)\) is given by \(0.03 /0.4 , 0.12/0.4 , 0.25/0.4\). Hence \(E(X|Y=3)=10\times 0.075+20\times 0.3+30\times 0.625=25.5,\) that is $25,500
  5. Assuming a 5-day workweek, \(E(Comp)=1000+0.02E(X_{1}+X_{2}+X_{3}+X_{4}+X_{5})=1000+0.02\cdot 5\cdot E(X)=1000+0.02\cdot 5\cdot 22600=\$3,260\). With a 7-day week, \(E(Comp)=1000+0.02\cdot 7\cdot 22600=\$4,164\)

Exercise 8 (Expectation) \(E(X+Y) = E(X) + E(Y)\) only if the random variables \(X\) and \(Y\) are independent

Solution:

False. By linearity of expectation, this relation holds irrespective of whether \(X\) and \(Y\) are independent.

Exercise 9 (Conditional Probability) A supermarket carried out a survey and found the following probabilities for people who buy generic products, depending on whether they visit the store frequently or not

Purchase Generic
Visit Often Sometime Never
Frequent 0.10 0.50 0.17
Infrequent 0.03 0.05 0.15
  1. What is the probability that a customer who never buys generics visits the store?
  2. What is the probability that a customer often purchases generic?
  3. Are buying generics and visiting the store independent decisions?
  4. What is the conditional distribution of purchasing generics given that you frequently visit the store?

Solution:

  1. \(P ( F \text{ and } N ) + P ( \bar{F} \text{ and } N ) = 0.17 + 0.15 = 0.32\)
  2. \(P( F \text{ and } O ) + P( \bar{F} \text{ and } O ) = 0.10 + 0.03 = 0.13\)
  3. \(P( F | S ) = 0.9091 , P( \bar{F} | S ) = 0.0909\) and \(P( F | O ) = 0.7692 , P( \bar{F} | O ) = 0.2308\). The probabilities are not the same so the events are not independent.
  4. \(P( O | F ) = 0.1299 , P( S| F) = 0.6494 , P( N | F ) = 0.2208\).

Exercise 10 (Conditional Probability) Cooper Realty is a small real estate company located in Albany, New York, specializing primarily in residential listings. They have recently become interested in determining the likelihood of one of their listings being sold within a certain number of days. An analysis of recent company sales of 800 homes produced the following table:

Days Listed until Sold Under 30 31-90 Over 90 Total
Under $50K 50 40 10 100
$50-$100K 20 150 80 250
$100-$150K 20 280 100 400
Over $ 150K 10 30 10 50
  1. Estimate the probability that a home is listed for over 90 days before being sold.
  2. Estimate the probability that the initial asking price is under $50K.
  3. What is the probability of both of the above happening? Are these two events independent?
  4. Assuming that a contract has just been signed to list a home that has an initial asking price of less than $100K, what is the probability that the home will take Cooper Realty more than 90 days to sell?

Solution: Let \(A\) be the event that it takes more than 90 days to sell. Let \(B\) denote the event that the initial asking price is under $50K.

  1. \(P(A) = ( 10 + 80 + 100 + 10)/800 = 200/800 = 0.25\)
  2. \(P(B) = 100/800 = 0.125\)
  3. \(P( A \text{ and } B ) = 10/800 = 0.0125\)
  4. First, \(P( < \$100K ) = 350/800 = 0.4375\). Secondly, \[ P( A | < \$100K ) = P( < \$100K | A ) P(A)/ P( < \$100K) = (90/200)\times(200/800)/0.4375 = 0.2571 \]

Exercise 11 (Probability and Combinations.) In 2006, the St. Louis Cardinals and the Detroit Tigers played in the World Series. The teams play a best-of-seven series: the first team to win four games wins the World Series.

The Cardinals were leading the series 3 – 1. Given that each game is independent of another and that the probability of the Cardinals winning any single game is 0.55, what’s the probability that they would go on to win the World Series?

In 2012, the St. Louis Cardinals found themselves in a similar situation against the San Francisco Giants in the National League Championships. Now suppose that the probability of the Cardinals winning any single game is 0.45.

How does the probability that they get to the World Series differ from before?
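
No solution is given for this exercise; a minimal R sketch of the calculation, treating each game as an independent Bernoulli trial with the stated win probability:

# leading the series 3-1, the Cardinals win unless they lose the next three games
1 - (1 - 0.55)^3   # 2006 World Series, about 0.909
1 - (1 - 0.45)^3   # 2012 NLCS situation, about 0.834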

Exercise 12 (Probability and Lotteries) The Powerball lottery is open to participants across several states. When entering the Powerball lottery, a participant selects five numbers from 1-59 and then selects a powerball number from 1-35. In addition, there’s a $1 million payoff for anybody selecting the first five numbers correctly.

  1. Show that the odds of winning the Powerball Jackpot are 1 in 175,223,510.
  2. Show that the odds of winning the $1 million are 1 in 5,153,632.

On February 18, 2006 the Jackpot reached $365 million. Assuming that you will either win the Jackpot or the $1 million prize, what’s your expected value of winning?

Mega Millions is a similar lottery where you pick 5 balls out of 56 and a mega ball from 46. Show that the odds of winning Mega Millions are higher than for the Powerball lottery. On March 30, 2012 the Jackpot reached $656 million. Is your expected value higher or lower than that calculated for the Powerball lottery?
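
A quick R sketch of the counting arguments, using choose() for the combinations described above:

choose(59, 5) * 35        # Powerball jackpot odds: 1 in 175,223,510
choose(59, 5) * 35 / 34   # match 5 numbers but not the powerball: 1 in about 5,153,633
choose(56, 5) * 46        # Mega Millions jackpot odds: 1 in 175,711,536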

Exercise 13 (Joint Probability) A market research survey finds that in a particular week \(28\%\) of all adults watch a financial news television program; \(17\%\) read a financial publication and \(13\%\) do both.

  1. Fill in the blanks in the following joint probability table
Watches TV Doesn’t Watch Total
Reads .13 .17
Doesn’t Read
.28 1.00
  2. What is the probability that someone who watches a financial TV program reads a publication oriented towards finance?
  3. What is the probability that someone who reads a finance publication watches a financial TV program?
  4. Why aren’t the answers to the above questions equal?

Solution:

  1. All probabilities come from the given probabilities and the fact that the sum of the column/row probabilities must add to 1.
Watches TV Doesn’t Watch Total
Reads .13 .04 .17
Doesn’t Read .15 .68 .83
.28 .72 1.00
  2. \[P(reads\mid TV)=P(reads \text{ and } TV)/P(TV)=.13/.28=.4643\]

  3. \[P(TV\mid reads) = P(reads \text{ and } TV)/P(reads)=.13/.17=.7647\]

  4. The reason for the difference is that the denominators of the two expressions are different. This is an example of a base rate effect: it is more likely that someone who reads also watches TV than the other way around, because fewer people read. It says nothing about independence.

Exercise 14 (Conditional Probability.) A local bank is reviewing its credit card policy. In the past 5% of card holders have defaulted. The bank further found that the chance of missing one or more monthly payments is 0.20 for customers who do not default. Of course, the probability of missing one or more payments for those who default is 1.

  1. Given that a customer has missed a monthly payment, compute the probability that the customer will default.
  2. The bank would like to recall its card if the probability that a customer will default is greater than 0.20. Should the bank recall its card if the customer misses a monthly payment? Why or why not?
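
No solution is given above; a minimal sketch of the Bayes' rule calculation in R:

p_default      <- 0.05                      # prior probability of default
p_miss_default <- 1.00                      # P(miss a payment | default)
p_miss_nodef   <- 0.20                      # P(miss a payment | no default)
p_miss <- p_miss_default * p_default + p_miss_nodef * (1 - p_default)
p_miss_default * p_default / p_miss         # P(default | missed payment), about 0.21, just above the 0.20 threshold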

Exercise 15 (Correlation) The following table shows the descriptive statistics from \(1000\) days of returns on IBM and Exxon’s stock prices.

            N   Mean    StDev    SE Mean
IBM      1000   0.0009   0.0157  0.00049 
Exxon    1000   0.0018   0.0224  0.00071

Here is the covariance table

          IBM     Exxon
IBM     0.000247
Exxon   0.000068  0.00050
  1. What is the variance of IBM returns?
  2. What is the correlation between IBM and Exxon’s returns?
  3. Consider a portfolio that invests \(50\)% in IBM and \(50\)% in Exxon. What are the mean and variance of the portfolio? Do you prefer this portfolio to just investing in IBM on its own?

Solution:

  1. \[\sigma^2_{IBM}=0.000247\]
  2. \[\rho=\frac{Cov(IBM,Exxon)}{\sigma_{IBM}\sigma_{Exxon}}=\frac{0.000068}{\sqrt{0.000247}\sqrt{0.00050}}=0.1935\]
  3. \[\mu_p=w_{I}\mu_{I}+w_{E}\mu_{E}=(0.5)(0.0009)+(0.5)(0.0018)=0.00135\] \[\sigma^2_p=w_{I}^2\sigma_{I}^2+w_E^2\sigma_{E}^2+2w_Iw_E Cov(I,E)\] \[=(0.5)^2(0.000247)+(0.5)^2(0.00050)+2(0.5)(0.5)(0.000068)=0.000221\] Yes, because the portfolio has a higher mean return but with a lower variance.

Exercise 16 (Normal Distribution) After Facebook’s earnings announcement we have the following distribution of returns. First, the stock beats earnings expectations \(75\)% of the time, and the other \(25\)% of the time earnings are in line or disappoint. Second, when the stock beats earnings, the probability distribution of percent changes is normal with a mean of \(10\)% and a standard deviation of \(5\)%; when the stock misses earnings, it is normal with a mean of \(-5\)% and a standard deviation of \(8\)%.

  1. Ahead of the earnings announcement, what is the probability that Facebook stock will have a return greater than \(5\)%?
  2. Do you get the same answer for the probability that it drops at least \(5\)%?
  3. Use simulation to provide empirical answers with sample of size N = 10, 000, check and see how close you get to the theoretical answers you’ve found to the questions posed above. Provide histograms of the distributions you simulate.

Solution:

We have the following information: \[ P(\textrm{Beat Earnings}) = 0.75 ~ \mathrm{ and} ~ P(\textrm{Not Beat Earnings}) = 0.25 \] together with the following probability distributions \[ R_{Beat}\sim N(0.10,0.05^2) ~ \mathrm{ and} ~ R_{Not}\sim N(-0.05,0.08^2) \] We want to find the probability that Facebook stock will have a return greater than 5%: \[ \begin{aligned} P(\textrm{Beat}) & \times P(R_{Beat}>0.05) + P(\textrm{Not}) \times P(R_{Not}>0.05)\\ &=0.75 \times (1-F_{Beat}(0.05)) + 0.25 \times (1-F_{Not}(0.05)) \end{aligned} \]

0.75*(1-pnorm(0.05,0.1,0.05)) + 0.25*(1-pnorm(0.05,-0.05,0.08)) #0.657
[1] 0.66

Therefore, the probability that Facebook stock beats a 5% return is 65.7%.

Similarly, for dropping at least 5%, we want: \[ \begin{aligned} P(\textrm{Beat})\times P(R_{Beat}<-0.05) + P(\textrm{Not}) \times P(R_{Not}<-0.05)\\ =0.75 \times (F_{Beat}(-0.05)) + 0.25 \times (F_{Not}(-0.05)) \end{aligned} \] We can compute this in R with the following commands:

0.75*pnorm(-0.05,0.1,0.05) + 0.25*pnorm(-0.05,-0.05,0.08) #0.1260124
[1] 0.13

Therefore, the probability that Facebook stock drops at least 5% is 12.6%.
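
Part 3 asks for a simulation; one way to sketch it in R (sample size and seed are arbitrary choices):

set.seed(1)
N <- 10000
beat <- rbinom(N, 1, 0.75)                  # 1 if the stock beats earnings
ret  <- ifelse(beat == 1, rnorm(N, 0.10, 0.05), rnorm(N, -0.05, 0.08))
mean(ret > 0.05)                            # compare with the theoretical 0.657
mean(ret < -0.05)                           # compare with the theoretical 0.126
hist(ret, breaks = 50, main = "Simulated Facebook returns")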

Exercise 17 (Probability) Answer the following statements TRUE or FALSE, providing a succinct explanation of your reasoning.

  1. If the odds in favor of \(A\) are 3:5 then \(\mbox{P}(A) = 0.4\).
  2. You roll two fair three-sided dice. The probability the two dice show the same number is 1/4.
  3. If events \(A\) and \(B\) are independent and \(\mbox{P}(A) > 0\) and \(\mbox{P}(B)>0\), then \(\mbox{P}(A \mbox{ and } B) > 0\).
  4. If \(\mbox{P}(A \; \text{ and} \; B) \geq 0.5\) then \(P(A) \leq 0.5\).
  5. If two random variables have non-zero correlation, then they must be dependent.
  6. If two random variables have zero correlation, then they must be independent.
  7. If two random variables are independent, then the correlation between them must be zero.
  8. If \(P(A \text{ and } B) \leq 0.2\), then \(P(A) \leq 0.2\).

Solution:

  1. If the odds in favor of \(A\) are 3:5 then \(\mbox{P}(A) = 0.4\). FALSE \(P(A) = \frac{3}{3+5} = 0.375\)
  2. You roll two fair three-sided dice. The probability the two dice show the same number is 1/4. FALSE \(P = \frac{1}{3}\times\frac{1}{3}\times 3 = \frac{1}{3}\).
  3. If events \(A\) and \(B\) are independent and \(\mbox{P}(A) > 0\) and \(\mbox{P}(B)>0\), then \(\mbox{P}(A \mbox{ and } B) > 0\). TRUE \(\mbox{P}(A \mbox{ and } B) = \mbox{P}(A) \times \mbox{P}(B)>0\).
  4. If \(\mbox{P}(A \; \text{ and} \; B) \geq 0.5\) then \(P(A) \leq 0.5\). FALSE Suppose \(A\) and \(B\) are independent with \(P(A) = 1\) and \(P(B) = 0.5\). Then \(\mbox{P}(A \; \text{ and} \; B) = 0.5 \geq 0.5\) but \(P(A) = 1 > 0.5\).
  5. If two random variables have non-zero correlation, then they must be dependent. TRUE If two random variables are independent, they always have 0 correlation. Therefore if correlation is not 0, they cannot be independent.
  6. If two random variables have zero correlation, then they must be independent. FALSE Suppose \(X\) takes value in \(\{-1, 0, 1\}\) with probability \(\frac{1}{3}\) for each. \(Y = X^2\), we have

\[ X = \begin{cases} -1 & \text{with prob } 1/3\\ 0 & \text{with prob } 1/3\\ 1 & \text{with prob } 1/3 \end{cases}, Y = \begin{cases} 1 & \text{with prob } 2/3\\ 0 & \text{with prob } 1/3\ \end{cases} \]

It’s easy to find that \(\text{Cov}(X, Y) = 0\). But \(Y\) is a function of \(X\), so they are not independent.

  7. If two random variables are independent, then the correlation between them must be zero. TRUE Independence implies zero correlation.
  8. False. We only know \(P(A \text{ and } B) \leq P(A)\)

Exercise 18 (Binomial Distribution) The Downhill Manufacturing company produces snowboards. The average life of their product is \(10\) years. A snowboard is considered defective if its life is less than \(5\) years. The distribution is approximately normal with a standard deviation for the life of a board of \(3\) years.

  1. What’s the probability of a snowboard being defective?
  2. In a shipment of \(120\) snowboards, what is the probability that the number of defective boards is greater than \(10\)?
  3. Use simulation to provide empirical answers with sample of size N = 10, 000, check and see how close you get to the theoretical answers you’ve found to the questions posed above. Provide histograms of the distributions you simulate.

As an alternative, you can use simulation in R with rbinom and rnorm.

Solution:

Here we are given the following information: \(L\sim N(10,3^2)\). We want to find the probability of a snowboard being defective: \[ P(L<5)=F_L(5). \] This is simply the cumulative distribution function. We can find the solution by using the following R commands:

pnorm(5,mean=10,sd=3) #0.04779035
[1] 0.048

Thus, the probability a snowboard is considered defective is 4.78%. Out of a shipment of 120 snowboards, the distribution of the number of defective boards is a Binomial Distribution parameterized as follows: \[ N_{def}\sim Bin(120,0.0478). \] We want to find the probability: \[ P(N_{def}>10) \]

pbinom(10,120,0.0478,lower.tail = FALSE) #0.02914742
[1] 0.029
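
The simulation asked for in part 3 can be sketched as follows (seed and plotting choices are arbitrary):

set.seed(1)
N <- 10000
life <- rnorm(N, mean = 10, sd = 3)             # simulated snowboard lifetimes
mean(life < 5)                                  # empirical defect rate, close to 0.048
defects <- rbinom(N, size = 120, prob = 0.0478) # defective boards per shipment of 120
mean(defects > 10)                              # close to 0.029
hist(defects, main = "Simulated defective boards per shipment")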

Exercise 19 (Chinese Stock Market) On August 24th, 2015, Chinese equities ended down \(-8.5\)% (Black Monday). Over the last \(25\) years, the average daily return has been \(0.09\)% with a volatility of \(2.6\)%, and returns closed within one standard deviation of the mean \(56\)% of the time. For the SP500, the average is \(0.03\)% with a volatility of \(1.1\)%, and returns closed within one standard deviation \(74\)% of the time.

Economist article, August 2015.

Exercise 20 (Body Weight) Suppose that your model for weight \(X\) is a Normal distribution with mean \(190\) lbs and variance \(100\). The problem is to identify the proportion of people who have weights over \(200\) lbs.

Solution:

We are given that \(\mu = 190\) and \(\sigma^2 = 100\) so \(\sigma = 10\). Our probability model is \(X \sim N ( 190 , 100 )\).

We standardize \(X\) to get \[Z = \frac{ X - \mu}{ \sigma } = \frac{ X - 190 }{ 10}\]

Now we compute the desired probability \[ \begin{aligned} p(X>200) & = P \left ( \frac{ X - 190 }{ 10 } > \frac{ 200 - 190 }{10 } \right ) \\ & = P \left ( Z > 1 \right ). \end{aligned} \] Now use the cdf function (\(F_Z\)) to get \[P ( Z > 1 ) = 1 - F_Z ( 1 ) = 1 - 0.8413 = 0.1587\]
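
The same number can be obtained directly in R:

pnorm(200, mean = 190, sd = 10, lower.tail = FALSE)   # 0.1587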

Exercise 21 (Google Returns) We estimated the sample mean and sample variance for daily returns of Google stock: \(\bar x = 0.025\), and \(s^2 = 1.1\). If I want to calculate the probability that I lose \(3\)% in a day, I need to assume a probabilistic model for the return and then calculate \(p(r < -3)\). Say, we assume that returns are normally distributed \(r \sim N( \mu , \sigma^2 )\). Estimate the parameters of the distribution from the observed data and calculate \(p(r<-3) = 0.003\)
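
No worked solution is given here; a minimal R sketch that plugs the sample estimates into the normal model:

mu    <- 0.025
sigma <- sqrt(1.1)                 # s^2 = 1.1, so s is about 1.05
pnorm(-3, mean = mu, sd = sigma)   # about 0.002 (using 1.1 itself as the sd gives the quoted 0.003)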

Exercise 22 (Portfolio Means, Standard Deviations and Correlation) Suppose you have a portfolio that is invested with a weight of 75% in the U.S. and 25% in HK. You take a sample of 10 years, or 120 months of historical means, standard deviations and correlations for U.S. and Hong Kong stock market returns. Given this information compute the mean and standard deviation of the returns on your portfolio.

          N   MEAN   STDEV
Hong Kong 120 0.0170 0.0751
US 120 0.0115 0.0330

Correlation = 0.3

Hint: you will find the following formulas useful. Let \(R_p\) denote the return on your portfolio which is a weighted combination \(R_p = pX + (1 - p)Y\) . Then \[ E(R_p) = p\mu_X + (1 - p)\mu_Y \] \[ Var(R_p) = p^2\sigma_X^2 + (1 - p)^2\sigma_Y^2 + 2p(1-p)\rho \sigma_X \sigma_Y \] where \(\mu_X\), \(\mu_Y\) and \(\sigma_X\), \(\sigma_Y\) are the underlying means and standard deviations for \(X\) and \(Y\).
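
No solution is given above; a sketch that plugs the table into the hint's formulas, with weights 0.75 (US) and 0.25 (Hong Kong):

w_us <- 0.75; w_hk <- 0.25
mu_us <- 0.0115; sd_us <- 0.0330
mu_hk <- 0.0170; sd_hk <- 0.0751
rho <- 0.3
mu_p  <- w_us * mu_us + w_hk * mu_hk
var_p <- w_us^2 * sd_us^2 + w_hk^2 * sd_hk^2 + 2 * w_us * w_hk * rho * sd_us * sd_hk
c(mean = mu_p, sd = sqrt(var_p))   # roughly 0.0129 and 0.035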

Exercise 23 (Binomial) In the game Chuck-a-Luck you pick a number from 1 to 6. You roll three dice. If your number doesn’t appear on any dice, you lose $1. If your number appears exactly once, you win $1. If your number appears on exactly two dice, you win $2. If your number appears on all three dice, you win $3.

Hence every outcome corresponds to an amount won or lost, namely \(-1, 1, 2\) or \(3\) dollars.

  1. Fill in the blanks in the pdf and cdf values
X -1 1 2 3
P(X)
F(X)

Explain your reasoning carefully.

  2. Compute the expected value of the game, \(E(X)\).

Solution:

  1. PDF: This is a binomial experiment with n=3 and p=1/6. Plugging in for \(x=0\): \[P(x=0)=(5/6)^3=.5787\] \[P(x=1)=3 (1/6)^1 \times (5/6)^2=.3472\] \[P(x=2)=3 (1/6)^2 \times (5/6)^1=.0694\] \[P(x=3)=(1/6)^3=.0046\]
  2. CDF: The values for this are the probabilities that X is less than or equal to a given value: \[F(x=0)=P(x \leq 0)=.5787\] \[F(x=1)=P(x \leq 1)=P(x=0)+P(x=1)=.9259\] \[F(x=2)=.9954\] \[F(x=3)=1\]

The expected value is \[ \sum_x x P_X ( x) = -1 \times 0.5787+1 \times 0.3472+2 \times 0.0694+3 \times 0.0046=-\$.08. \] You expect to lose 8 cents per game.
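
A quick R check of the table and the expected value using dbinom:

probs <- dbinom(0:3, size = 3, prob = 1/6)   # 0.5787 0.3472 0.0694 0.0046
sum(c(-1, 1, 2, 3) * probs)                  # expected winnings, about -0.08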

Exercise 24 (Binomial Distribution) A real estate firm in Florida offers a free trip to Florida for potential customers. Experience has shown that of the people who accept the free trip, 5% decide to buy a property. If the firm brings \(1000\) people, what is the probability that at least \(125\) will decide to buy a property?

Solution:

In order to find the probability that at least \(125\) decide to buy, the binomial distribution would require calculating the probabilities for 125-1000. Instead, we use the normal approximation for the binomial. \[ \mu = np =50. \] \[ \sigma = \sqrt{np(1-p)}=\sqrt{1000\times .05\times .95}=\sqrt{47.5}=6.89 \]

Calculating the Z-score for 125: \(Z=\frac{125-50}{6.89}=10.9.\) and \(P(Z \geq 10.9)=0\).
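
As a check, the exact binomial tail probability is easy to compute in R and confirms the answer is essentially zero:

pbinom(124, size = 1000, prob = 0.05, lower.tail = FALSE)   # P(X >= 125), essentially 0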

Exercise 25 (Expectation and Strategy) An oil company wants to drill in a new location. A preliminary geological study suggests that there is a \(20\)% chance of finding a small amount of oil, a \(50\)% chance of a moderate amount and a \(30\)% chance of a large amount of oil. The company has a choice of either a standard drill that simply burrows deep into the earth or a more sophisticated drill that is capable of horizontal drilling and can therefore extract more but is far more expensive. The following table provides the payoff table in millions of dollars under different states of the world and drilling conditions

Oil small moderate large
Standard Drilling 20 30 40
Horizontal Drilling -20 40 80

Find the following

  1. The mean and variance of the payoffs for the two different strategies
  2. The strategy that maximizes their expected payoff
  3. Briefly discuss how the variance of the payoffs would affect your decision if you were risk averse
  4. How much are you willing to pay for a geological evaluation that would tell you with certainty the quantity of oil at the site prior to drilling?

Solution:

  1. Using the plug-in rule, we get \[ \begin{aligned} E(Standard) & = & 31\\ E(Horizontal) & = & 40\\ Var(Standard) & = & E(X^{2})-\left[E(X)\right]^{2}\\ & = & 1010-961\\ & = & 49\\ Var(Horizontal) & = & E(X^{2})-\left[E(X)\right]^{2}\\ & = & 2800-1600\\ & = & 1200 \end{aligned} \]
S = c(20,30,40)
P = c(0.2,0.5,0.3)
E_standard = sum(S*P)
H = c(-20,40,80)
E_horizontal = sum(H*P)
E_standard
[1] 31
E_horizontal
[1] 40
sum(S^2*P) - E_standard^2
[1] 49
sum(H^2*P) - E_horizontal^2
[1] 1200
  2. Given our analysis, horizontal drilling maximizes the expected payoff.
  3. The strategy with the higher expected payoff has a substantially higher variance. A risk averse person will settle for the strategy with lower expected payoff and lower variance, i.e. standard drilling, while a risk seeking person will choose horizontal drilling.
  4. If the geological evaluation tells us with certainty the quantity of oil, then we will be able to choose the strategy which maximizes our payoff under each state of the world. Thus, the amount one should be willing to pay is the expected payoff under this revised schedule of payoffs minus the maximum expected payoff under the current schedule. Given this \[ WTP=0.2\times20+0.5\times40+0.3\times80-40=48-40=8 \]

Therefore, one should be willing to pay \(\$8\) million for this information.

Exercise 26 (Google Survey) Visitors to your website are asked to answer a single Google survey question before they get access to the content on the page. Among all of the users, there are two categories

  1. Random Clicker (RC)
  2. Truthful Clicker (TC)

There are two possible answers to the survey: yes and no.

Random clickers would click either one with equal probability. You are also given the information that the expected fraction of random clickers is \(0.3\).

After a trial period, you get the following survey results. \(65\)% said Yes and \(35\)% said No.

What fraction of people who are truthful clickers answered yes?

Solution:

Given the information that the expected fraction of random clickers is 0.3, \[P(RC) = 0.3 \mbox{ and } P(TC) = 0.7\] Conditioning on a random clicker, he would click either answer with equal probability. \[ P(Yes \mid RC) = P(No \mid RC) = 0.5 \] From the survey results, we know the proportions of "Yes" and "No" responses. \[ P(Yes) = 0.65 \mbox{ and } P(No) = 0.35 \] Therefore, the probability that a truthful clicker answers "Yes" is \[ \begin{aligned} P(Yes \mid TC) &= \frac{P(Yes)-P(Yes \mid RC)\times P(RC)}{P(TC)} \\ &= \frac{0.65 - 0.5\times 0.3}{0.7} = 0.71 \end{aligned} \] Notice that we use the law of total probability in the first equality.

Computing

Exercise 27 (Portfolio Means, Standard Deviations and Correlation) You want to build a portfolio of exchange traded funds (ETFs) for your retirement strategy. You’re thinking of whether to invest in growth or value stocks, or maybe a combination of both. Vanguard has two ETFs, one for growth (VUG) and one for value (VTV).

  1. Plot the historical price series for VUG vs VTV.
  2. Calculate the means and standard deviations of both ETFs.
  3. Calculate their covariance.
  4. Suppose you decide on a portfolio that is a 50 / 50 split. Calculate the new mean and variance of your portfolio.
  5. Which portfolio best suits you?
  6. What’s the probability that growth (VUG) will beat value (VTV) in the future?

You will find the following formulas useful. Let \(P\) denote the return on your portfolio which is a weighted combination \(P = aX + bY\). Then \[ E(P) = aE(X) + bE(Y ) \] \[ Var(P ) = a^2Var(X) + b^2Var(Y ) + 2abCov(X, Y ), \] where \(Cov(X, Y )\) is the covariance for \(X\) and \(Y\).

Hint: You can use the following code to get the data

library(quantmod)
getSymbols(c("VUG","VTV"), from = "2015-01-01", to = "2024-01-01")
VUG = VUG$VUG.Adjusted
VTV = VTV$VTV.Adjusted

Solution:

library(quantmod)
getSymbols(c("VUG","VTV"), from = "2015-01-01", to = "2024-01-01");
[1] "VUG" "VTV"
VUG = VUG$VUG.Adjusted
VTV = VTV$VTV.Adjusted
plot(VUG, type = "l", col = "red", main = "VUG")

plot(VTV, type = "l", col = "red", main = "VTV")

VUG = as.numeric(VUG)
VTV = as.numeric(VTV)
n=length(VUG)
VUG = VUG[2:n]/VUG[1:(n-1)]-1
VTV = VTV[2:n]/VTV[1:(n-1)]-1
mean(VUG)
[1] 0.00061
mean(VTV)
[1] 0.00042
sd(VUG)
[1] 0.013
sd(VTV)
[1] 0.011
cov(VUG, VTV)
[1] 0.00012
# portfolio mean and variance
portfolio_mean_1 = 0.5 * mean(VUG) + 0.5 * mean(VTV)
portfolio_var_1 = 0.5^2 * var(VUG) + 0.5^2 * var(VTV) + 2 * 0.5 * 0.5 * cov(VUG, VTV)
portfolio_sd_1 = sqrt(portfolio_var_1)

You may say VUG or VTV best suits you, as long as you give an explanation, for example whether or not you are risk averse. Now we consider the mean and variance of VUG - VTV,

mean_1 = mean(VUG) - mean(VTV)
var_1 = var(VUG) + var(VTV) + 2 * 1 * (-1) * cov(VUG, VTV)
sd_1 = sqrt(var_1)

Therefore the probability that VUG - VTV \(<0\) is

pnorm(0, mean_1, sd = sd_1)
[1] 0.49

However, we care about the probability that VUG beats VTV, that is \(P(VUG > VTV) = P(VUG - VTV > 0)\):

pnorm(0, mean_1, sd = sd_1, lower.tail = FALSE)
[1] 0.51

Exercise 28 (Descriptive Statistics in R) Use the superbowl1.txt and derby2016.csv datasets. The Superbowl data contains the outcome of all previous Superbowls. The outcome is defined as the difference in scores of the favorite minus the underdog. The spread is the bookmakers’ prediction of the outcome before the game begins. The Derby data consists of all of the results of the Kentucky Derby, which is run on the first Saturday in May every year at Churchill Downs racetrack. Answer the following questions

For the Superbowl data.

  1. Plot the spread and outcome variables. Calculate means, standard deviations, covariances, correlations.
  2. What is the mean and the standard deviation of the winning margin (outcome)?
  3. Use a boxplot to compare the favorites’ score versus the underdog.
  4. Does this data look normally distributed?

For the Derby data.

  1. Plot a histogram of the winning speeds and times of the horses. Why is there a long right-hand tail to the distribution of times?
  2. Can you identify the outlying horse with the best winning time?

Solution:

# import superbowl data
mydata1 <- read.table("http://faculty.chicagobooth.edu/nicholas.polson/teaching/41000/superbowl1.txt",header=T)

# look at data
head(mydata1)
   Favorite   Underdog Spread Outcome Upset
1  GreenBay KansasCity     14      25     0
2  GreenBay    Oakland     14      19     0
3 Baltimore     NYJets     18      -9     1
4 Minnesota KansasCity     12     -16     1
5    Dallas  Baltimore      2      -3     1
6    Dallas      Miami      6      21     0
summary(mydata1)
   Favorite           Underdog             Spread      Outcome   
 Length:49          Length:49          Min.   : 1   Min.   :-35  
 Class :character   Class :character   1st Qu.: 3   1st Qu.: -3  
 Mode  :character   Mode  :character   Median : 6   Median :  7  
                                       Mean   : 7   Mean   :  6  
                                       3rd Qu.:10   3rd Qu.: 17  
                                       Max.   :19   Max.   : 45  
     Upset     
 Min.   :0.00  
 1st Qu.:0.00  
 Median :0.00  
 Mean   :0.33  
 3rd Qu.:1.00  
 Max.   :1.00  
# attach so R recognizes  each variable
attach(mydata1)

######################
# part 1
######################
# plot Spread vs Outcome
plot(Spread,Outcome,main="Spread v.s. Outcome")
# add a 45 degree line to compare
abline(1,1)

# Covariance, Correlation, alpha, beta
x <- Spread
y <- Outcome

mean(x)
[1] 7
mean(y)
[1] 6.2
sd(x)
[1] 4.6
sd(y)
[1] 17
cov(x,y)
[1] 24
cor(x,y)
[1] 0.3
beta <- cor(x,y)*sd(y)/sd(x)
alpha <- mean(y)-beta*mean(x)
beta
[1] 1.1
alpha
[1] -1.5
# Regression check
model <- lm(y~x)
coef(model)
(Intercept)           x 
       -1.5         1.1 
####################
# part 2
####################
# Compare boxplot 
boxplot(Spread,Outcome,horizontal=T,names=c("spread","outcome"),
  col=c("red","yellow"),main="Superbowl")

#####################
# part 3
#####################
# see the distribution of outcome and spread through histograms
hist(Outcome,freq=FALSE)
# add a normal distribution line to compare
lines(seq(-50,50,0.01),dnorm(seq(-50,50,0.01),mean(Outcome),sd(Outcome)))

hist(Spread,freq=FALSE)
lines(seq(-10,30,0.01),dnorm(seq(-10,30,0.01),mean(Spread),sd(Spread)))

######################
## Kentucky Derby
######################

mydata2 <- read.csv("http://faculty.chicagobooth.edu/nicholas.polson/teaching/41000/Kentucky_Derby_2014.csv",header=T)

# attach the dataset 
attach(mydata2)

head(mydata2)
  year year_num         date      winner mins secs timeinsec distance speedmph
1 1875        1 May 17, 1875   Aristides    2   38       158      1.5       34
2 1876        2 May 15, 1876     Vagrant    2   38       158      1.5       34
3 1877        3 May 22, 1877 Baden-Baden    2   38       158      1.5       34
4 1878        4 May 21, 1878    Day Star    2   37       157      1.5       34
5 1879        5 May 20, 1879 Lord Murphy    2   37       157      1.5       34
6 1880        6 May 18, 1880       Fonso    2   38       158      1.5       34
summary(mydata2)
      year         year_num       date              winner         
 Min.   :1875   Min.   :  1   Length:140         Length:140        
 1st Qu.:1910   1st Qu.: 36   Class :character   Class :character  
 Median :1944   Median : 70   Mode  :character   Mode  :character  
 Mean   :1944   Mean   : 70                                        
 3rd Qu.:1979   3rd Qu.:105                                        
 Max.   :2014   Max.   :140                                        
      mins           secs      timeinsec      distance       speedmph 
 Min.   :1.00   Min.   : 0   Min.   :119   Min.   :1.25   Min.   :31  
 1st Qu.:2.00   1st Qu.: 2   1st Qu.:122   1st Qu.:1.25   1st Qu.:35  
 Median :2.00   Median : 4   Median :124   Median :1.25   Median :36  
 Mean   :1.99   Mean   :10   Mean   :130   Mean   :1.29   Mean   :36  
 3rd Qu.:2.00   3rd Qu.: 9   3rd Qu.:129   3rd Qu.:1.25   3rd Qu.:37  
 Max.   :2.00   Max.   :60   Max.   :172   Max.   :1.50   Max.   :38  
##################
# part 1
##################
# plot a histogram of speedmph
hist(speedmph,col="blue")

# finer bins 
hist(speedmph,breaks=10,col="red")

hist(timeinsec,breaks=10,col="purple")

####################
# part 2
###################
# to find the left tail observation
k1 <- which(speedmph == min(speedmph))
mydata2[k1,]
   year year_num         date  winner mins secs timeinsec distance speedmph
17 1891       17 May 13, 1891 Kingman    2   52       172      1.5       31
# to find the best horse
k2 <- which(speedmph == max(speedmph))
mydata2[k2,] 
   year year_num     date      winner mins secs timeinsec distance speedmph
99 1973       99 5-May-73 Secretariat    1   59       119      1.2       38

Exercise 29 (Berkshire Hathaway: Yahoo Finance Data) Download daily return data for Warren Buffett’s firm Berkshire Hathaway (ticker symbol: BRK-A) from 1990 to the present. Analyze this data in the following way:

  1. Plot the Historical Price Performance of the stock.
  2. Calculate the Daily returns. Plot a histogram of the returns. Comment on the distribution that you obtain.
  3. Use the summary command to provide statistical data summaries.
  4. Interpret your findings.

Solution:

library(quantmod)
getSymbols("BRK-A", from = "1990-01-01", to = "2024-01-01")
BRKA = get('BRK-A')
BRKA = BRKA[,4]
head(BRKA)
plot(BRKA,type="l",col=20,main="BRKA Share Price",ylab="Price",xlab="Time",bty='n')
# calculate the simple return
N <- length(BRKA) 
y = as.vector(BRKA)
ret <- y[-1]/y[-N]-1

# create summaries of ret for BRK-A
summary(ret)
sd(ret)
# histogram of returns
hist(ret,breaks=50,main="BRK-A daily returns")
# plots to show serial correlation in 1st and 2nd moments
# to save the plot, click "Export" and "save as image"

acf(ret,lag.max=10,main="serial correlation in 1st moment")
acf(ret^2,lag.max=10,main="serial correlation in 2nd moment")
[1] "BRK-A"
           BRK-A.Close
1990-01-02        8625
1990-01-03        8625
1990-01-04        8675
1990-01-05        8625
1990-01-08        8625
1990-01-09        8555

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -0.121  -0.006   0.000   0.001   0.006   0.161 
[1] 0.014

Exercise 30 (Confidence Intervals) A sample of weights of 40 rainbow trout revealed that the sample mean is 402.7 grams and the sample standard deviation is 8.8 grams.

  1. What is the estimated mean weight of the population?
  2. What is the 99% confidence interval for the mean?
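
No solution is given; a sketch of the interval calculation in R using the t quantile (the normal quantile 2.576 gives nearly the same answer):

xbar <- 402.7; s <- 8.8; n <- 40
se <- s / sqrt(n)
xbar + c(-1, 1) * qt(0.995, df = n - 1) * se   # 99% CI, roughly (398.9, 406.5)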

Exercise 31 (Confidence Intervals) A research firm conducted a survey to determine the mean amount steady smokers spend on cigarettes during a week.

A sample of 49 steady smokers revealed that \(\bar{X} = 20\) and \(s = 5\) dollars.

  1. What is the point estimate? Explain what it indicates.
  2. Using a 95% confidence interval, determine the confidence interval for \(\mu\). Explain what it indicates.

Exercise 32 (Back cast US Presidential Elections) Use data from presidential polls to predict the winner of the elections. We will be using data from http://www.electoral-vote.com/. The goal is to use simulations to predict the winning percentage for each of the candidates. Use election.Rmd script as the starter.

Report prediction as a 50% confidence interval for each of the candidates.

Exercise 33 (Russian Parliament Election Fraud (5 pts)) On September 28, 2016 United Russia party won a super majority of seats, which will allow them to change the Constitution without any votes of other parties. Throughout the day there were reports of voting fraud including video purporting to show officials stuffing ballot boxes. Additionally, results in many regions demonstrate that United Russia on many poll stations got anomalously closed results, for example, 62.2% in more than hundred poll stations in Saratov Region.

Using the assumption that United Russia’s range in Saratov was [57.5%, 67.5%] and that results for each poll station are rounded to one decimal point (when measured in percent), calculate the probability that in 100 poll stations out of 1800 in Saratov Region the majority party got exactly 62.2%.

Do you think it can happen by chance?

Exercise 34 (A/B Testing) Use dataset from ab_browser_test.csv. Here is the definition of the columns:

  • userID: unique user ID
  • browser: browser which was used by userID
  • slot: status of the user (exp = saw modified page, control = saw unmodified page)
  • n_clicks: total number of clicks the user made as a result of n_queries
  • n_queries: number of queries made by userID using the given browser
  • n_nonclk_queries: number of queries that did not result in any clicks

Note that not everyone uses a single browser, so there might be multiple rows with the same userID. In this data set, the combination of userID and browser is the unique row identifier.

  1. Count how many users are in each group. How much larger (in percent) is the exp group compared to the control group?
  2. Using the bootstrap, construct 95% confidence intervals for the mean and median of the number of clicks in group exp and group control. Are the means and medians significantly different?
  3. Using the bootstrap, check if the mean of each group has a normal distribution. Generate \(B = 1000\) bootstrap samples, calculate the mean of each and plot a qqplot.
  4. Use the z-ratio for the means to perform hypothesis testing, with \(H_0\): there is no difference in the average number of clicks between the 2 groups.
  5. Mann-Whitney (http://www.statmethods.net/stats/nonparametric.html) is another test for comparing means that does not require a normality assumption. Use this test to check the hypothesis that the means are equal.
  6. For each browser type and each of the 2 groups (control and exp), count the percent of queries that did not result in any clicks. You can do it by dividing the sum of n_nonclk_queries by the sum of n_queries. Comment on your results.

Exercise 35 (Chicago Crime Data Analysis) On January 24, 2017 Donald Trump tweeted about the "horrible" murder rate in Chicago.

Trump Tweets

Our goal is to analyze the data and check how statistically significant such a statement is. I downloaded Chicago’s crime data from the data portal: data.cityofchicago.org. This data contains reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department’s CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. This data set has 6.3 million records. Each crime incident is categorized using one of the 35 primary crime types: NARCOTICS, THEFT, CRIMINAL TRESPASS, etc. I filtered incidents of type HOMICIDE into a separate data set stored in chi_homicide.rds. Use chi_crime.R as a starting script for this problem.

  1. Create a heat map for the homicide incidents. In which areas of the city do you think houses are very affordable, and in which are they not?
  2. Create a map by plotting a dot for each of the homicide incidents. You will see a similar picture as you saw with the heat plot. Look at the Hyde Park area on the south side of Chicago. There is an "island" with no homicide incidents! Can you explain why? Hint: You might want to open Google maps in your browser and zoom in on this area.
  3. Though the president’s tweet is consistent with the data (goo.gl/VTPzFw), observing 52 homicides in January is not that unusual. Calculate the total number of homicides for each January. Use the bootstrap to estimate a 95% confidence interval for the mean \(\mu\) of January homicides. Is \(52\) within the interval? Calculate the confidence interval using the \(t\)-ratio. Do you think results from \(t\)-ratio based calculations are reliable?
  4. The history of 2001-present data is rather short. Chicago tribune provided total number of homicides for Chicago for each month of the 1957-2014 period. Use this data set and calculate the confidence interval for \(\mu\) using bootstrap and \(t\)-ratio. Further answer the following questions: (i) Assuming monthly homicide rate follows Normal distribution, what is the probability that we observe 52 homicides or more? (ii) Do you think Normality assumption is valid? (iii) Assuming monthly homicide rate follows Poisson distribution, what is the probability that we observe 52 homicides or more?
  5. There is a hypothesis that crime rates are related to temperatures (goo.gl/nPpHwv). Check this hypothesis using simple regression. Use a linear model to regress the homicide rate on the average maximum temperature. Does this relation appear significant? Perform residual diagnostics and find outliers and leverage points.
  6. There is another hypothesis that rise in murder is related to the pullback in proactive policing that started in November of 2015 as a result of Laquan McDonald video release (https://goo.gl/7cm1CC, https://goo.gl/WcH2uB). I calculated total number of homicides for each day and split data into two parts: before and after video release. Using \(t\)-ratio, check the hypothesis \(H_0\): the homicide rate did not change after video release.

Exercise 36 (Gibbs Sampler) Suppose we model data using the following model \[\begin{aligned} y_i \sim &N(\mu,\tau^{-1})\\ \mu \sim & N(0,1)\\ \tau \sim & \mathrm{Gamma}(2,1).\end{aligned}\]

The goal is to implement a Gibbs sampler for the posterior \(\mu,\tau \mid y\), where \(y = (y_1,\ldots,y_n)\) is the observed data. The Gibbs sampler iterates between two steps

  1. Sample \(\mu_i\) from \(\mu \mid \tau_{i-1}, y\)
  2. Sample \(\tau_i\) from \(\tau \mid \mu_i, y\)

Show that those full conditional distributions are given by \[ \begin{aligned} \mu \mid \tau, y \sim & N\left(\dfrac{\tau n\bar y}{1+n\tau},\dfrac{1}{1+n\tau}\right)\\ \tau \mid \mu,y \sim & \mathrm{Gamma}\left(2+\dfrac{n}{2}, 1+\dfrac{1}{2}\sum_{i=1}^{n}(y_i-\mu)^2\right)\end{aligned} \]

Use the formulas for the full conditional distributions to implement the Gibbs sampler. The data \(y\) is in the file MCMCexampleData.txt.

Plot samples from the joint distribution over \((\mu,\tau)\) on a scatter plot. Plot histograms for marginal \(\mu\) and \(\tau\) (marginal distributions).
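
A sketch of the sampler in R using the full conditionals above (the number of iterations and burn-in are arbitrary; MCMCexampleData.txt is assumed to contain a single column of observations):

y <- scan("MCMCexampleData.txt")     # observed data
n <- length(y); ybar <- mean(y)

S   <- 5000                          # number of Gibbs iterations
mu  <- numeric(S)
tau <- numeric(S)
tau[1] <- 1                          # arbitrary starting value

for (i in 2:S) {
  # mu | tau, y ~ N( tau*n*ybar / (1 + n*tau), 1 / (1 + n*tau) )
  prec  <- 1 + n * tau[i - 1]
  mu[i] <- rnorm(1, mean = tau[i - 1] * n * ybar / prec, sd = sqrt(1 / prec))
  # tau | mu, y ~ Gamma( 2 + n/2, 1 + 0.5 * sum((y - mu)^2) )
  tau[i] <- rgamma(1, shape = 2 + n / 2, rate = 1 + 0.5 * sum((y - mu[i])^2))
}

keep <- 501:S                        # discard a short burn-in
plot(mu[keep], tau[keep], pch = ".", main = "Joint posterior draws")
hist(mu[keep],  main = "Marginal of mu")
hist(tau[keep], main = "Marginal of tau")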

Exercise 37 (AAPL vs GOOG) Download AAPL and GOOG return data from 2018 to 2024. Plot box-plot and histogram. Calculate summary statistics using summary function. Describe clearly what you learn from the summary and the plots.

Solution:

library(quantmod)
getSymbols(c("AAPL","GOOG"), from = "2018-01-01", to = "2024-01-01")
AAPL = dailyReturn(AAPL$AAPL.Adjusted)
GOOG = dailyReturn(GOOG$GOOG.Adjusted)
boxplot(AAPL, col="red"); boxplot(GOOG, col="green")
hist(AAPL, col="red", breaks=50); hist(GOOG, col="green", breaks=50)
summary(AAPL); summary(GOOG)

     Index            daily.returns   
 Min.   :2018-01-02   Min.   :-0.129  
 1st Qu.:2019-07-03   1st Qu.:-0.008  
 Median :2020-12-30   Median : 0.001  
 Mean   :2020-12-30   Mean   : 0.001  
 3rd Qu.:2022-06-30   3rd Qu.: 0.012  
 Max.   :2023-12-29   Max.   : 0.120  
     Index            daily.returns   
 Min.   :2018-01-02   Min.   :-0.111  
 1st Qu.:2019-07-03   1st Qu.:-0.009  
 Median :2020-12-30   Median : 0.001  
 Mean   :2020-12-30   Mean   : 0.001  
 3rd Qu.:2022-06-30   3rd Qu.: 0.011  
 Max.   :2023-12-29   Max.   : 0.104  

From the box plots, we can see many extreme observations beyond 1.5 IQR (interquartile range) of the lower and upper quartiles, consistent with heavy-tailed (high kurtosis) returns. We can also see the heavy tails in the histograms. GOOG is positively skewed (mean above the median) and AAPL is not “significantly” skewed (mean close to the median).

Exercise 38 (Berkshire Realty) Berkshire Realty is interested in determining how long a property stays on the housing market. For a sample of \(800\) homes they find the following table of counts for the length of stay on the market before being sold as a function of the asking price

Days until Sold Under 20 20-40 over 40
Under $250K 50 40 10
$250-500K 20 150 80
$500-1M 20 280 100
Over $1 M 10 30 10
  1. What is the probability of a randomly selected house that is listed over \(40\) days before being sold?
  2. What is the probability that a randomly selected initial asking price is under \(250\)K?
  3. What is the joint probability of both of the above event happening?
  4. Assuming that a contract has just been signed to list a home for under $500K, what is the probability that Berkshire realty will sell the home in under \(40\) days?

Solution:

  1. \(P(\mbox{over 40}) = \frac{10+80+100+10}{800} = 25\%\)
  2. \(P(\mbox{under 250K}) = \frac{50+40+10}{800} = 12.5\%\)
  3. \(P(\mbox{over 40 and under 250K}) = \frac{10}{800} = 1.25\%\)
  4. \(P(\mbox{under 40} \mid \mbox{under 500K}) = \frac{50+40+20+150}{50+40+20+150+10+80} = 74.29\%\)

Exercise 39 (True/False)  

  1. If \(\mathbb{P} \left ( A \; \mathrm{ and} \; B \right ) \leq 0.2\) then \(\mathbb{P} (A) \leq 0.2\).

  2. If \(P( A | B ) = 0.5\) and \(P(B ) = 0.5\), then the events \(A\) and \(B\) are necessarily independent.

  3. A box has three drawers; one contains two gold coins, one contains two silver coins, and one contains one gold and one silver coin. Assume that one drawer is selected randomly and that a randomly selected coin from that drawer turns out to be gold. Then the probability that the chosen drawer contains two gold coins is \(50\)%.

  4. Suppose that \(P(A) = 0.4 , P(B)=0.5\) and \(P( A \text{ or }B ) = 0.7\) then\(P( A \text{ and }B ) = 0.3\).

  5. If \(P( A \text{ or }B ) = 0.5\) and \(P(A \text{ and }B ) = 0.5\), then \(P(A) = P( B)\).

  6. The following data on age and martial status of \(140\) customers of a Bondi beach night club were taken

    Age Single Not Single
    Under 30 77 14
    Over 30 28 21

    Given this data, age and martial status are independent.

  7. If \(P( A \text{ and }B ) = 0.5\) and \(P(A) = 0.1\), then \(P(B|A) = 0.1\).

  8. In a group of students, \(45\)% play golf, \(55\)% play tennis and \(70\)% play at least one of these sports. Then the probability that a student plays golf but not tennis is \(15\)%.

  9. The following probability table related age with martial status

    Age Single Not Single
    Under 30 0.55 0.10
    Over 30 0.20 0.15

    Given these probabilities, age and martial status are independent.

  10. Thirty six different kinds of ice cream can be found at Ben and Jerry’s.There are \(58,905\) different combinations of four choices of ice cream.

  11. Suppose that for a certain Caribbean island the probability of a hurricane is \(0.25\), the probability of a tornado is \(0.44\) and the probability of both occurring is \(0.22\). Then the probability of a hurricane or a tornado occurring is \(0.05\).

  12. If \(P ( A \text{ and }B ) \geq 0.10\) then \(P(A) \geq 0.10\).

  13. If \(A\) and \(B\) are mutually exclusive events, then \(P(A|B) = 0\).


Solution:

  1. False. We only know \(\mathbb{P} \left ( A \; \mathrm{ and} \; B \right ) \leq \mathbb{P}(A)\).

  2. False. This is not necessarily true. We need more information about each event to definitively say so.

  3. False. Knowing that we have a gold coin, there is a 2/3 chance of being in the two gold coin drawer and a 1/3 chance of being in the gold and silver coin drawer. Therefore, the probability that the chosen drawer contains two gold coins is 2/3.

  4. False. \(P(A \text{ and }B)=P(A)+P(B)-P(A \text{ or }B)=0.2\)

  5. True. We know that \(P( A \text{ or }B ) = P(A) +P(B) - P(A \text{ and }B )\) and so \(P(A) + P(B) =1\). Also \(P( A) \geq P( A \text{ and }B ) = 0.5\) and so \(P(A)=P(B) = 0.5\)

  6. False. We can compute conditional probabilities as \(P ( S | U_{30} ) = 0.8462 , P ( M | U_{30} ) = 0.1538\) and \(P ( S | O_{30} ) = 0.5714 , P ( M | O_{30} ) = 0.4286\). The events aren’t independent as the conditional probabilities are not the same.

  7. False. \(P(B|A) = P( A \text{ and }B ) / P( A)\) would be greater than one

  8. True. \(P(G)=0.45, P(T)=0.55, P(G \text{ or }T) = 0.70\) implies \(P(G \text{ and }\bar{T}) = 0.15\)

  9. False. \(P ( X > 30 \text{ and }Y = \mathrm{ married} ) = 0.15 \neq 0.35 \times 0.25 = P ( X > 30 )P( Y = \mathrm{ married} )\)

  10. True. Combinations are given by \(36!/((36-4)!\,4!) =58,905\)

  11. False. \(P(H \text{ or }T)=P(H)+P(T)-P(H \text{ and }T)=0.25+0.44-0.22=0.47\), not \(0.05\).

  12. True. \(A \text{ and }B\) is a subset of \(A\) and so \(P(A) \geq P(A \text{ and }B) \geq 0.10\).

  13. True. By definition, if \(A\) and \(B\) are mutually exclusive events then \(P( A \text{ and }B)=0\) and so \(P(A|B) = P(A \text{ and }B)/P(B) = 0\)

Exercise 40 (Marginal and Joint) Let \(X\) and \(Y\) be independent with a joint distribution given by \[f_{X,Y}(x,y) = \frac{1}{2 \pi} \sqrt{ \frac{1}{xy} } \exp \left( - \frac{x}{2} - \frac{y}{2} \right) \text{ where } x, y > 0 .\] Identify the following distributions

  1. The marginal distribution of \(X\)
  2. Compute the joint distribution of \(U = X\) and \(V = X+Y\)
  3. Compute the marginal distribution of \(V\).

Solution:

  1. \[\begin{aligned} f_X(x) = \frac{1}{2\pi} \int_0^\infty \frac{1}{\sqrt{x y}} e^{-\frac{1}{2}(x+y)} dy &= \frac{1}{\sqrt{2 \pi}} \frac{1}{\sqrt{x}} e^{-x/2} \end{aligned}\] which is the \(\chi^2_1\) (i.e. Gamma\((1/2,1/2)\)) density; by symmetry \(Y\) has the same marginal.
  2. We substitute \(x = u\) and \(y = v-u\) (so that \(0 < u < v\)). The Jacobian of the inverse transformation is \[\begin{aligned} \left|\begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}\right| = 1 \end{aligned}\] Thus, the transformed density is \[\begin{aligned} f_{U,V}(u,v) &= \frac{1}{2 \pi} \frac{1}{\sqrt{u(v-u)}} e^{-v/2}, \quad 0 < u < v \end{aligned}\]
  3. \[\begin{aligned} f_V(v) = \frac{1}{2 \pi} \int_0^v \frac{1}{\sqrt{u(v-u)}} e^{-v/2} du &= \frac{1}{2} e^{-v/2} \end{aligned}\] since \(\int_0^v \frac{du}{\sqrt{u(v-u)}} = \pi\). That is, \(V = X+Y \sim \chi^2_2\), an exponential with rate \(1/2\) (see the simulation check below).
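
The factors in the joint density are \(\chi^2_1\) densities, so a quick simulation (a sketch) can confirm that \(V = X+Y\) behaves like a \(\chi^2_2\), i.e. an exponential with rate \(1/2\):

set.seed(1)
x = rchisq(1e5, df = 1); y = rchisq(1e5, df = 1)
v = x + y
c(mean(v), var(v))   # an exponential with rate 1/2 has mean 2 and variance 4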

Exercise 41 (Conditional)  

  1. Let \(X\) and \(Y\) be independent standard \(N(0,1)\) random variables. Then \(X^2 + Y^2\) has an exponential distribution.
  2. Let \(X\) and \(Y\) be independent Poisson random variables with rates \(\lambda\) and \(\mu\), respectively. Show that the conditional distribution of \(X | (X+Y)\) is Binomial with \(n = X+Y\) and \(p = \lambda / ( \lambda + \mu)\).

Solution:

  1. True. The random variable \((X^2+Y^2)\) has a \(\chi^2_2\) distribution, and this is an exponential distribution with \(\lambda=1/2\).
  2. \[P(X=l | X+Y =n) = \frac{P(X=l , X+Y=n)}{P(X+Y=n)} = \frac{P(X=l, Y= n-l)}{P(X+Y=n)}\] \[=\frac{P(X=l)P(Y=n-l)}{P(X+Y=n)}=\frac{(e^{-\lambda}\lambda^l/l!)(e^{-\mu}\mu^{n-l}/(n-l)!)}{e^{-\lambda-\mu}(\lambda+\mu)^n/n!}=\frac{n!}{l!(n-l)!}\frac{\lambda^l\mu^{n-l}}{(\lambda+\mu)^n}\] \[=p^l (1-p)^{n-l}\frac{n!}{l!(n-l)!}\]
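
A small simulation sketch of part 2, using the illustrative values \(\lambda = 2\), \(\mu = 3\) and \(n = 5\):

set.seed(1)
lambda = 2; mu = 3; n = 5
x = rpois(1e6, lambda); y = rpois(1e6, mu)
xs = x[x + y == n]   # condition on X + Y = n
rbind(empirical = table(factor(xs, levels = 0:n))/length(xs),
      binomial  = dbinom(0:n, n, lambda/(lambda + mu)))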

Exercise 42 (Joint and marginal) Let \(X\) and \(Y\) be independent exponential random variables with means \(\lambda\) and \(\mu\), respectively.

  1. Find the joint distribution of \(U = X+Y\) and \(V = X / ( X+Y)\).
  2. Find the marginal distributions for \(U\) and \(V\).

Solution:

  1. The joint distribution of \(X\) and \(Y\) is given by \[f_{X,Y} ( x , y ) = \frac{1}{\lambda \mu} \exp \left ( - \frac{x}{\lambda} - \frac{y}{\mu} \right )\] The transformation \(u =x+y\) and \(v= x / ( x+y)\) has inverse given by \(x = uv\), \(y = u ( 1 - v)\). Hence, the Jacobian is \(u\). The joint distribution of \((U,V)\) is then \[f_{U,V} ( u,v) = \frac{u}{\lambda \mu} \exp \left ( - \frac{uv}{\lambda} - \frac{u(1-v)}{\mu} \right )\]
  2. The marginal distribution of \(U\) is given by
    \[ \begin{aligned} f_U(u)& = \int_0^1 f_{U,V} ( u,v) dv\\ &= \frac{1}{\lambda - \mu} \left ( \exp \left ( - \frac{ u}{ \lambda } \right ) - \exp \left ( - \frac{ u}{ \mu } \right ) \right ) \end{aligned} \]

The marginal distribution of \(V\) is given by \[ f_V ( v) = \int_{0}^{\infty} f_{U,V} ( u,v) du \] Hence, \[ f_V(v) = \frac{ \lambda \mu }{ ( \mu v + \lambda ( 1-v) )^2 } \]
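
A simulation sketch of the marginal of \(V\), with illustrative means \(\lambda = 2\) and \(\mu = 1\) (note that rexp() in R is parameterized by the rate, i.e. one over the mean):

set.seed(1)
lambda = 2; mu = 1
xs = rexp(1e5, rate = 1/lambda); ys = rexp(1e5, rate = 1/mu)
v = xs/(xs + ys)
hist(v, breaks = 50, freq = FALSE, main = "V = X/(X+Y)")
curve(lambda*mu/(mu*x + lambda*(1 - x))^2, add = TRUE, col = "red", lwd = 2)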

Exercise 43 (Toll road) You are designing a toll road that carries trucks and cars. Each week you see an average of 19,000 vehicles pass by. The current toll for cars is 50 cents and you wish to set the toll for trucks so that the revenue reaches $11,500 per week. You observe the following data: three of every four trucks on the road are followed by a car, while only one of every five cars is followed by a truck.

  1. What is the equilibrium distribution of trucks and cars on the road?
  2. What should you charge trucks so as to reach your goal of $11,500 in revenues per week?

Solution:

The transition matrix for the number of trucks and cars is given by \[P = \left ( \begin{array}{cc} \frac{1}{4} & \frac{3}{4} \\ \frac{1}{5} & \frac{4}{5} \end{array} \right )\] The equilibrium distribution \(\pi = ( \pi_1 , \pi_2 )\) is given by \(\pi P = \pi\). This has solution given by \[\frac{1}{4} \pi_1 + \frac{1}{5} \pi_2 = \pi_1 \; \; \mathrm{ and} \; \; \frac{3}{4} \pi_1 + \frac{4}{5} \pi_2 = \pi_2\] with \(\pi_1 = \frac{4}{19}\) and \(\pi_2 = \frac{15}{19}\).

With \(\pi_2 = 15/19\), about \(19{,}000 \times 15/19 = 15{,}000\) of the weekly vehicles are cars, contributing \(15{,}000 \times \$0.50 = \$7{,}500\). The remaining \(4{,}000\) trucks must therefore contribute \(\$4{,}000\), so we need to charge trucks \(\$1.00\) to generate revenues of \(\$11{,}500\) per week.
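
The same answer can be checked numerically in R (a sketch); the equilibrium distribution solves \(\pi P = \pi\):

P = matrix(c(1/4, 3/4, 1/5, 4/5), nrow = 2, byrow = TRUE)   # states: truck, car
pi_eq = Re(eigen(t(P))$vectors[, 1]); pi_eq = pi_eq/sum(pi_eq)
pi_eq                                   # stationary distribution: (4/19, 15/19)
trucks = 19000*pi_eq[1]; cars = 19000*pi_eq[2]
(11500 - 0.50*cars)/trucks              # required truck toll, about $1.00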

Exercise 44 Let \(X, Y\) have a bivariate normal density given by \[f_{X,Y} ( x, y) = \frac{1}{ 2 \pi \sqrt{ ( 1 - \rho^2 )} } \exp \left ( - \frac{1}{2(1-\rho^2)} \left ( x^2 - 2 \rho x y + y^2 \right ) \right )\] Consider the transformation \(W=X\) and \(Z = \frac{ Y - \rho X}{ \sqrt{ 1 - \rho^2 } }\). Show that \(W , Z\) are independent and identify their distributions.

Solution:

The inverse transformation is \(X = g(W,Z) = W\) and \(Y = h(W,Z) = \sqrt{1-\rho^2}\, Z + \rho W\), with Jacobian \(J = \sqrt{ 1 - \rho^2 }\). Applying the transformation formula: \[ \begin{aligned} f_{W,Z} (w,z) &=& f_{X,Y} (g(w,z),h(w,z)) |J|\\ &=& f_{X,Y} (w, \sqrt{1-\rho^2}\, z + \rho w)\,\sqrt{1-\rho^2}\\ &=& \frac{1}{2\pi} \exp \left ( -\frac{1}{2(1-\rho^2)} \left ( w^2-2\rho w \left ( \sqrt{1-\rho^2} z + \rho w \right ) + \left ( \sqrt{1-\rho^2} z + \rho w \right )^2 \right ) \right )\\ &=& \frac{1}{2\pi} \exp\left ( -\frac{1}{2}(w^2+z^2) \right )\\ &=& \frac{1}{\sqrt{2\pi}} e^{-w^2/2} \cdot \frac{1}{\sqrt{2\pi}} e^{-z^2/2} = f_W(w) f_Z(z) \end{aligned} \] since the quadratic form collapses: \(w^2-2\rho w ( \sqrt{1-\rho^2} z + \rho w ) + ( \sqrt{1-\rho^2} z + \rho w )^2 = (1-\rho^2)(w^2+z^2)\). Because the joint density of \((W,Z)\) factors into the product of two marginal densities, \(W\) and \(Z\) are independent, and each follows a standard normal \(N(0,1)\) distribution.

Bayes Rule

Exercise 45 (Manchester) While watching a game of Champions League football in a cafe, you observe someone who is clearly supporting Manchester United in the game. What is the probability that they were actually born within 20 miles of Manchester? Assume that you have the following base rate probabilities

  1. The probability that a randomly selected person in a typical local bar environment is born within 20 miles of Manchester is 1/20
  2. The chance that a person born within 20 miles of Manchester actually supports United is 7/10
  3. The probability that a person not born within 20 miles of Manchester supports United is 1/10

Solution:

Define

  • B - event that the person is born within 20 miles of Manchester
  • U - event that the person supports United.

We want \(P(B|U)\). By Bayes’ Theorem, \[ P(B|U) = \frac{P(U|B)P(B)}{P(U|B)P(B) + P(U|NB)P(NB)} = 0.269 \]
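
As a quick check in R:

0.7*0.05/(0.7*0.05 + 0.1*0.95)   # = 0.269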

Exercise 46 (Lung Cancer) According to the Center for Disease Control (CDC), we know that “compared to nonsmokers, men who smoke are about \(23\) times more likely to develop lung cancer and women who smoke are about \(13\) times more likely” You are also given the information that \(17.9\)% of women in 2016 smoke.

If you learn that a woman has been diagnosed with lung cancer, and you know nothing else, what’s the probability she is a smoker?

Solution:

If we draw the decision tree we get

Lung Cancer Tree

If \(y\) is the fraction of women who smoke, and \(x\) is the fraction of nonsmokers who get lung cancer, the number of smokers who get cancer is proportional to 13xy, and the number of nonsmokers who get lung cancer is proportional to x(1-y).

Of all women who get lung cancer, the fraction who smoke is \(13xy / (13xy + x(1-y))\).

The x’s cancel, so it turns out that we don’t actually need to know the absolute risk of lung cancer, just the relative risk. But we do need to know \(y\), the fraction of women who smoke, which is given above as 17.9%. So we just have to compute \[ 13y / (13y + 1-y) \approx 74\% \] This is higher than most people guess.
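
In R:

y = 0.179
13*y/(13*y + 1 - y)   # about 0.74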

Exercise 47 (Tesla Chip) Tesla purchases a particular chip called the HW 5.0 auto chip from three suppliers Matsushita Electric, Philips Electronics, and Hitachi. From historical experience we know that 30% of the chips are purchased from Matsushita; 20% from Philips and the remaining 50% from Hitachi. The manufacturer has extensive histories on the reliability of the chips. We know that 3% of the Matsushita chips are defective; 5% of the Philips and 4% of the Hitachi chips are defective.

A chip is later found to be defective; what is the probability it was manufactured by each of the manufacturers?

Solution:

Let \(A_1\) denote the event that the chip is purchased from Matsushita, and similarly \(A_2\) and \(A_3\). Let \(D\) be the event that the chip is defective and \(\bar{D}\) that it is not. We know \[ P( D | A_1 ) = 0.03 \; \; P( D | A_2 ) = 0.05 \; \; P( D | A_3 ) = 0.04 \; \; \] and \[ P( A_1 ) = 0.30 \; \; P( A_2 ) = 0.20 \; \; P( A_3 ) = 0.50 \]

The probability that it was manufactured by Philips given that it is defective is given by Bayes rule: \[ P( A_2 | D ) = \frac{ P( D | A_2 ) P( A_2 ) }{P( D | A_1 )P( A_1 ) +P( D | A_2 ) P( A_2 ) + P( D | A_3 ) P( A_3 ) } \] That is \[ P( A_2 | D ) = \frac{ 0.20 \times 0.05 }{ 0.30 \times 0.03 +0.20 \times 0.05 + 0.50 \times 0.04 } = \frac{10}{39} \] Similarly, \(P( A_1 | D ) = 9 / 39\) and \(P( A_3 | D ) = 20/39\).
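
All three posterior probabilities can be computed at once in R (a sketch):

prior = c(Matsushita = 0.30, Philips = 0.20, Hitachi = 0.50)
p_def = c(0.03, 0.05, 0.04)            # P(defective | supplier)
prior*p_def/sum(prior*p_def)           # posteriors: 9/39, 10/39, 20/39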

Exercise 48 (Light Aircraft) Seventy percent of the light aircraft that disappear while in flight in a certain country are subsequently discovered. Of the aircraft that are discovered, 60% have an emergency locator, whereas 90% of the aircraft not discovered do not have such a locator. Suppose that a light aircraft has disappeared. If it has an emergency locator, what is the probability that it will be discovered?

Solution:

Let D = {The disappearing aircraft is discovered.} and E = {The aircraft has an emergency locator}. Given: \[ P(D) = 0.70 \quad P(D^c) = 1- P(D) = 0.30 \quad P(E|D) = 0.60 \] \[ P(E^c|D^c) = 0.90 \quad P(E|D^c) = 1 - P(E^c|D^c) = 0.10 \] Then, \[ \begin{aligned} P(D|E) &=& \frac{P(D \text{ and }E)}{P(E)}\\ &=& \frac{P(D) \times P(E|D)}{P(D) \times P(E|D) + P(D^c) \times P(E|D^c)}\\ &=& \frac{0.70 \times 0.60}{(0.70\times0.60) + (0.30 \times 0.10)} = 0.93 \end{aligned} \]

Exercise 49 (Floyd Landis) Floyd Landis was disqualified after winning the 2006 Tour de France. This was due to a urine sample that the French national anti-doping laboratory flagged after Landis had won stage \(17\) because it showed a high ratio of testosterone to epitestosterone. Because he was among the leaders he provided \(8\) pairs of urine samples – so there were \(8\) opportunities for a true positive and \(8\) opportunities for a false positive.

  1. Assume that the test has a specificity of \(95\)%. What’s the probability of all \(8\) samples being labeled “negative”?
  2. Now assume the specificity is \(99\)%. What’s the false positive rate for the test?
  3. Based on this data, explain how you would assess the probability of guilt of Landis in a court hearing.

Solution:

  1. We’re given that \(0.95= \frac{ \text{number of true negatives}}{\text{number of true negatives + false positives}}.\)    \(P( \text{8 negatives} | \text{not guilty})= P(\text{8 true negatives})=0.95^8\).
  2. False Positive rate = \(\frac {\text{False positives}}{\text{False positives + True negatives}}=\frac{1}{100}=0.01.\)
  3. \(P(\text{8 negatives} \mid \text{not guilty})=P(\text{8 true negatives})=0.95^8\) (or \(0.99^8\) using the specificity from part 2).  \(P(\text{at least 1 positive}\mid\text{not guilty})=1-P(\text{8 negatives}\mid\text{not guilty})=1-0.95^8\approx 0.34\). Hence, if the specificity is 0.95, the chance of at least one false positive across the 8 samples is about 0.34, which is pretty large and should be taken into account when assessing guilt.

Exercise 50 (BRCA1) Approximately \(1\)% of woman aged \(40-50\) have breast cancer. A woman with breast cancer has a \(90\)% chance of a positive mammogram test and a \(10\)% chance of a false positive result. Given that someone has a positive test, what’s the posterior probability that they have breast cancer?

If you have the BRCA1 gene mutation, you have a \(90\)% chance of developing breast cancer. The prevalence of mutations at BRCA1 has been estimated to be 0.04%-0.20% in the general population. The genetic test for a mutation of this gene has a \(99.9\)% chance of finding the mutation. The false positive rate is unknown, but you are willing to assume its still \(10\)%.

Given that someone tests positive for the BRCA1 mutation, what’s the posterior probability that they have breast cancer?

Solution: We know \(P(C) = 0.01\) and \(P(T\mid C) = 0.9\), \(P(T\mid \bar C) = 0.1\), \[ P(C\mid T) = \frac{P(T\mid C)P(C)}{P(T\mid C)P(C) + P(T\mid \bar C)P(\bar C)} = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.1 \times 0.99} = 0.083 \]

0.9*0.01/(0.9*0.01 + 0.1*0.99)
[1] 0.083

For part 2, we know \(P(C\mid B) = 0.9\), \(P(B) \in [0.0004, 0.002]\), \(P(BT\mid B) = 0.999\), \(P(BT\mid \bar B) = 0.1\), then

\[ P(C\mid BT) = \frac{P(BT, C)}{P(BT)} \] We will do the calculation for the low-end of the BRCA1 mutation \(P(B) = 0.0004\) The total probability of a positive test is given by \[ P(BT) = P(BT\mid B)P(B) + P(BT\mid \bar B)P(\bar B) = 0.999*0.0004 + 0.1*0.9996 = 0.1003596 \]

We can calculate joint over \(BT\) and \(C\) by marginalizing \(B\) from the joint distribution of \(B\) and \(C\) and \(BT\). \[ P(BT,C,B) = P(BT\mid B,C)P(B,C) = P(BT\mid B)P(C\mid B)P(B) \] Here we used the fact that \(BT\) is conditionally independent of \(C\) given \(B\). Now we can marginalize over \(B\) to get \[ P(BT,C) = P(BT,C,\bar B) + P(BT,C,B) = P(BT\mid \bar B)P(\bar B)P(C\mid \bar B) + P(BT\mid B)P(B)P(C\mid B) = 0.1*0.9996*P(C\mid \bar B) + 0.999*0.0004*0.9 \] We can find \(P(C\mid \bar B)\) from the total probability \[ P(C) = P(C\mid B)P(B) + P(C\mid \bar B)P(\bar B) = 0.9*0.0004 + P(C\mid \bar B)*0.9996 \] \[ P(C\mid \bar B) = (0.01 - 0.9*0.0004)/0.9996 = 0.009643858 \] \[ P(BT,C) = 0.1*0.9996*0.009643858 + 0.999*0.0004*0.9 = 0.00132364 \]

\[ P(C\mid BT) = \frac{P(BT, C)}{P(BT)} = 0.00132364/0.1003596 = 0.01318897 \] A small probability of 1.32%.

For the high-end of the BRCA1 mutation prevalence, \(P(B) = 0.002\), the same calculation gives \[ P(C\mid BT) \approx 0.0257 \] a slightly larger probability of about 2.6%.

PB = seq(0.0004, 0.002, by = 0.0001)
PBT = 0.999*PB + 0.1*(1 - PB)                      # total probability of a positive test
PCBT = (0.999*PB*0.9 + 0.1*(0.01 - 0.9*PB))/PBT    # P(C and BT)/P(BT)
plot(PB, PCBT, type = "l", xlab = "P(B)", ylab = "P(C|BT)")

Exercise 51 (Another Cancer Test) Bayes is particularly useful when predicting outcomes that depend strongly on prior knowledge.

Suppose that a woman in her forties takes a mammogram test and receives the bad news of a positive outcome. Since not every positive outcome is real, you assess the following probabilities. The base rate for a woman in her forties to have breast cancer is \(1.4\)%. The probability of a positive test given breast cancer is \(75\)% and the probability of a false positive is \(10\)%.

Given the positive test, what’s the probability that she has breast cancer?

Exercise 52 (Bayes Rule: Hit and Run Taxi) A certain town has two taxi companies: Blue Birds, whose cabs are blue, and Uber, whose cabs are black. Blue Birds has 15 taxis in its fleet, and Uber has 75. Late one night, there is a hit-and-run accident involving a taxi.

The town’s taxis were all on the streets at the time of the accident. A witness saw the accident and claims that a blue taxi was involved. The witness undergoes a vision test under conditions similar to those on the night in question. Presented repeatedly with a blue taxi and a black taxi, in random order, they successfully identify the colour of the taxi 4 times out of 5.

Which company is more likely to have been involved in the accident?

Solution:

We need to know \(P(Blue \mid \mbox{identified Blue})\) and \(P(Black \mid \mbox{identified Blue})\).

First of all, write down some probability statements given in the problem: \(P(Blue) = 15/90 \approx 16.7\% \mbox{ and } P(Black) = 75/90 \approx 83.3\%\) \[ P(\mbox{identified Blue} \mid Blue) = 80\% \mbox{ and } P(\mbox{identified Black} \mid Blue) = 20\% \] \[ P(\mbox{identified Black} \mid Black) = 80\% \mbox{ and } P(\mbox{identified Blue} \mid Black) = 20\% \] Therefore, by Bayes Rule, \[ \begin{aligned} P(Blue \mid \mbox{identified Blue}) =& \frac{P(\mbox{identified Blue} \mid Blue)\,P(Blue) }{P(\mbox{identified Blue})} \\ =& \frac{P(\mbox{identified Blue} \mid Blue)\,P(Blue) }{P(\mbox{identified Blue} \mid Blue)\,P(Blue) + P(\mbox{identified Blue} \mid Black)\,P(Black) } \\ =& \frac{0.8 \times 0.167}{0.8 \times 0.167 + 0.2 \times 0.833} \approx 44.5\% \end{aligned} \] \[ P(Black \mid \mbox{identified Blue}) = 1- P(Blue \mid \mbox{identified Blue}) \approx 55.5\% \]

Therefore, even though the witness said it was a Blue car, the probability that it was a Black car is higher.

Exercise 53 (Gold and Silver Coins) A chest has two drawers. It is known that one drawer has \(3\) gold coins and no silver coins. The other drawer is known to contain \(1\) gold coin and \(2\) silver coins.

You don’t know which drawer is which. You randomly select a drawer and without looking inside you pull out a coin. It is gold. Show that the probability that the remaining two coins in the drawer are gold is \(75\)%.

Solution:

Suppose drawer A contains 3G and drawer B contains 1G2S. We know the probabilities \(P(G \mid A) = 1\) and \(P(G \mid B) = 1/3\). Also, it must be one of the two drawers, so \(P(A) = P(B) = 1/2\). What we are looking for is \(P(A \mid G)\): the probability that it is drawer A. By Bayes’ rule, \[ \begin{aligned} P(A \mid G) &=& \frac{P(G \mid A)\times P(A)}{P(G)} \\ &=& \frac{P(G \mid A)\times P(A)}{P(G \mid A)\times P(A) + P(G \mid B)\times P(B)} \mbox{ (Law of total probability)} \\ &=& \frac{1\times \frac{1}{2}}{1\times\frac{1}{2}+\frac{1}{3}\times\frac{1}{2}} = \frac{3}{4} \end{aligned} \] The probability that it is drawer A, and hence that both remaining coins are gold, is 75%.

Exercise 54 (The Monty Hall Problem.) This problem is named after the host of the long-running TV show, Let’s Make a Deal. A contestant is given a choice of 3 doors. There is a prize (a car, say) behind one of the doors and something worthless behind the other two doors (say two goats).

After the contestant chooses a door Monty opens one of the other two doors, revealing a goat. The contestant has the choice of switching doors. Is it advantageous to switch doors or not?

Solution:

Suppose you plan to switch:

You can either pick a winner, a loser, or the other loser when you make your first choice. Each of these options has a probability of \(1/3\) and are marked by a “1” below. In each case, Monty will reveal a loser (X). In the first case, he has a choice, but whichever he reveals, you will switch to the other and lose. But in the other two cases, there is only one loser for him to reveal. He must reveal this one, leaving only the winner. So, if you initially pick a loser, you will win by switching. That is, there is a \(2/3\) chance of winning if you use the switch strategy.

| Prob | W | L | L |
|------|---|---|---|
| 1/3  | 1 | X |   |
| 1/3  |   | 1 | X |
| 1/3  |   | X | 1 |

Not convinced? Try thinking about it this way: Imagine the question with 1000 doors, and Monty will reveal 998 wrong doors after you pick one, so you are left with your choice, and one of the remaining 999 doors. Now do you want to stay or switch?

Again, suppose you are going to switch. Define the events

  • W = the one you switch to is a winner
  • FW = your first choice is a winner
  • FL = your first choice is a loser

Since you must pick either a winner or loser with your first choice, and cannot pick both, FW and FL are mutually exclusive and collectively exhaustive. By the rule of total probability: \[ P(W) = P(W \text{ and }FW) + P(W \text{ and }FL) = P(W|FW)P(FW) + P(W|FL)P(FL) \] \[ P(W) = 0 \times \frac{1}{3} + 1 \times \frac{2}{3} = \frac{2}{3} \] Why is \(P(W\mid FW)=0\)? Because if we choose correctly, and we do switch, we must be on a loser.

Why is \(P(W|FL)=1\)? If we first picked a loser, and then switched, we will now have a winner. These both come from above.

There’s a longer explanation: Suppose you choose door A and Monty opens B.

Consider the events

  • A=prize is behind A
  • B=prize is behind B
  • C=prize is behind C
  • MA=Monty opens A
  • MB=Monty opens B
  • MC=Monty Opens C

Before we choose, each door was equally likely: \(P(A)=P(B)=P(C)=1/3\).

In this case, we know Monty opened B. To decide whether to switch, we want to know if \(P(A |MB)\) and \(P(C|MB)\) are the same. If they are, then there is no gain in switching: \[ P(A|MB) = \frac{P(A \text{ and }MB)}{P(MB)} = \frac{P(MB|A)P(A)}{P(MB)} \] We need these components: \(P(A)=1/3,~P(MB|A)=1/2\). This is because you picked A, and Monty can open either B or C. He cannot open A. He can open B or C because the condition in this conditional probability is that the prize is actually behind A. \[ \begin{aligned} P(MB) & = P(MB \text{ and }A) + P(MB \text{ and }B) + P(MB \text{ and }C)\\ &= P(MB|A)P(A) + P(MB|B)P(B) + P(MB|C)P(C)\\ & = \frac{1}{2}\times\frac{1}{3} + 0\times \frac{1}{3} + 1\times \frac{1}{3} = \frac{1}{6} + \frac{1}{3} = \frac{1}{2} \end{aligned} \] I already showed that \(P(MB|A)=1/2\). We have that \(P(MB|B)=0\) because Monty cannot reveal B if this is where the prize is. If the prize is behind C, he must open B, since you have picked A.

Putting it all together: \(P(A|MB) = \frac{P(MB|A)P(A)}{P(MB)} = \frac{\frac{1}{2}\times \frac{1}{3}}{\frac{1}{2}} = \frac{1}{3}\)

Similarly for C: \(P(C|MB) = \frac{P(MB|C)P(C)}{P(MB)} = \frac{1\times \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}\)

So we are always better off switching!
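
A Monte Carlo sketch of the switching strategy (switching wins exactly when the first pick was wrong, so we only need to simulate the first pick):

set.seed(1)
n = 1e5
prize = sample(1:3, n, replace = TRUE)
pick  = sample(1:3, n, replace = TRUE)
mean(pick != prize)   # probability of winning by switching, about 2/3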

Exercise 55 (Medical Testing for HIV) A controversial issue in recent years has been the possible implementation of random drug and/or disease testing (e.g. testing medical workers for HIV, the virus that causes AIDS). In the case of HIV testing, the standard test is the Wellcome Elisa test.

The test’s effectiveness is summarized by the following two attributes:

  • The sensitivity is about 0.993. That is, if someone has HIV, there is a probability of 0.993 that they will test positive.
  • The specificity is about 0.9999. This means that if someone doesn’t have HIV, there is probability of 0.9999 that they will test negative.

In the general population, incidence of HIV is reasonably rare. It is estimated that the chance that a randomly chosen person has HIV is \(0.000025\).

To investigate the possibility of implementing a random HIV-testing policy with the Elisa test, calculate the following:

  1. The probability that someone will test positive and have HIV.
  2. The probability that someone will test positive and not have HIV.
  3. The probability that someone will test positive.
  4. Suppose someone tests positive. What is the probability that they have HIV?

In light of the last calculation, do you envision any problems in implementing a random testing policy?

Solution:

First, introduce some notation: let \(H\) = “Has HIV” and \(T\) = “Tests Positive”. We are given the following sensitivities and specificities \[ P(T|H) = 0.993 \; \; \text{ and} \; \; P(\bar{T}|\bar{H}) = 0.9999 \] The base rate is \(P(H) = 0.000025\).

  1. \(P(T \text{ and }H) = P(T|H)P(H) = 0.993*0.000025 = 0.000024825\)
  2. \[\begin{aligned} P(T|\bar{H})P(\bar{H}) & = (1-P(\bar{T}|\bar{H}))(1-P(H))\\ & = 0.0001 \times 0.999975 = 0.0000999975 \end{aligned}\]
  3. \(P(T) = P(T \text{ and }H) + P(T \text{ and }\bar{H}) = 0.000024825 + 0.0000999975 = 0.0001248225\)
  4. \(P(H|T) = \frac{P(H \text{ and }T)}{P(T)} = \frac{0.000024825}{0.0001248225} = 0.1988\)

This says that if you test positive, you have a 20% chance of having the disease. In other words, 80% of people who test positive will not have the disease. The large number of false positives means that implementing such a policy will be pretty unpopular among the people who have to be tested. Testing high risk people only might be a better idea, as this will increase the \(P(H)\) in the population.
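
The four quantities in R:

sens = 0.993; spec = 0.9999; prev = 0.000025
pTH  = sens*prev               # 1. P(T and H)
pTnH = (1 - spec)*(1 - prev)   # 2. P(T and not H)
pT   = pTH + pTnH              # 3. P(T)
c(pTH, pTnH, pT, H_given_T = pTH/pT)   # 4. P(H | T) is about 0.20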

Exercise 56 (The Three Prisoners) Three prisoners, A, B, and C, are held in separate cells. An unknown two will be shot, the other freed. Prisoner A asks the warder for the name of one other than himself who will be shot, explaining that as there must be at least one, the warder won’t really be giving anything away. The warder agrees, and says that B will be shot. This cheers A up a little: his judgmental probability for being shot is now 1/2 instead of 2/3.

Show (via Bayes theorem) that

  1. A is mistaken - assuming that he thinks the warder is as likely to say "C" as "B" when he can honestly say either; but that
  2. A would be right, on the hypothesis that the warder will say "B" whenever he honestly can.

Exercise 57 (The Two Children) You meet Max walking with a boy whom he proudly introduces as his son.

  1. What is your probability that his other child is also a boy, if you regard him as equally likely to have taken either child for a walk?
  2. What would the answer be if you regarded him as sure to walk with the boy rather than the girl, if he has one of each?
  3. What would the answer be if you regarded him as sure to walk with the girl rather than the boy, if he has one of each?

Exercise 58 (Medical Exam) As a result of a medical examination, one of the tests revealed a serious illness in a person. The test has a high precision of 99% (the probability of a positive response in the presence of the disease is 99%, and the probability of a negative response in the absence of the disease is also 99%). However, the detected disease is quite rare, occurring in only one person per 10,000. Calculate the probability that the person being examined actually has the disease.

Exercise 59 (The Jury) Assume that the probability is 0.95 that a jury selected to try a criminal case will arrive at the correct verdict, whether innocent or guilty. Further, suppose that 80% of people brought to trial are in fact guilty.

  1. Given that the jury finds a defendant innocent what’s the probability that they are in fact innocent?
  2. Given that the jury finds a defendant guilty what’s the probability that they are in fact guilty?
  3. Do these probabilities sum to one?

Solution:

Let \(I\) denote the event of innocence and let \(VI\) denote the event that the jury proclaims an innocent verdict.

  1. We want \(P(I|VI)\) which is given by Bayes’ rule \[ P(I|VI)=\frac{P(VI|I)P(I)}{P(VI)} \] where, by the law of total probability \(P(VI)=P(VI|I)P(I)+P(VI|\bar{I})P(\bar{I})\) Hence \(P(VI) =0.95\cdot 0.2+0.05\cdot 0.8=0.23\) Hence \(P(I|VI)=\frac{0.95\cdot 0.2}{0.23}=0.83\)
  2. Similarly \[ P(G|VG)=\frac{P(VG|G)P(G)}{P(VG)}=\frac{P(VG|G)P(G)}{1-P(VI)}=\frac{0.95\cdot 0.8}{1-0.23}=0.99 \]
  3. No. \(P(I|VI)+P(G|{VG})\neq 1\) because they are conditioned on different events.

Exercise 60 (Oil company) An oil company has purchased an option on land in Alaska. Preliminary geologic studies have assigned the following probabilities of finding oil \[ P ( \text{ high \; quality \; oil} ) = 0.50 \; \; P ( \text{ medium \; quality \; oil} ) = 0.20 \; \; P ( \text{ no \; oil} ) = 0.30 \; \; \] After 200 feet of drilling on the first well, a soil test is taken. The probabilities of finding the particular type of soil identified by the test are as follows: \[ P ( \text{ soil} \; | \; \text{ high \; quality \; oil} ) = 0.20 \; \; P ( \text{ soil} \; | \; \text{ medium \; quality \; oil} ) = 0.80 \; \; P ( \text{ soil} \; | \; \text{ no \; oil} ) = 0.20 \; \; \]

  1. What are the revised probabilities of finding the three different types of oil?
  2. How should the firm interpret the soil test?

Solution:

  1. Using Bayes’ Theorem, we get \[P ( \text{ high \; quality \; oil} \; | \; \text{ soil}) = \frac{ P ( \text{ soil} \; | \; \text{ high \; quality \; oil} ) P ( \text{ high \; quality \; oil} )} { P ( \text{ soil} ) }\] Using the law of total probability \[P ( \text{ soil}) = 0.5 \times 0.2 + 0.2 \times 0.8 + 0.3 \times 0.2 = 0.32\] Hence \[P ( \text{ high \; quality \; oil} \; | \; \text{ soil}) = \frac{ 0.5 \times 0.2}{ 0.32 } = 0.3125\] Similarly \[P ( \text{ medium \; quality \; oil} \; | \; \text{ soil}) = \frac{ 0.8 \times 0.2}{ 0.32 } = 0.50\] \[P ( \text{ no \; oil} \; | \; \text{ soil}) = \frac{ 0.3 \times 0.2}{ 0.32 } = 0.1875\]
  2. The firm should interpret the soil test as increasing the probability of medium quality oil, but making both no oil and high quality oil less likely than their initial probabilities.

Exercise 61 A screening test for high blood pressure, corresponding to a diastolic blood pressure of \(90\)mm Hg or higher, produced the following probability table

Hypertension

| Test | Present | Absent |
|------|---------|--------|
| +ve  | 0.09    | 0.03   |
| -ve  | 0.03    | 0.85   |

  1. What’s the probability that a random person has hypertension?
  2. What’s the probability that someone tests positive on the test?
  3. Given a person who tests positive, what is the probability that they have hypertension?
  4. What would happen to your probability of having hypertension given you tested positive if you initially thought you had a \(50\)% chance of having hypertension.

Solution:

  1. \(P(H) = 0.09 + 0.03 = 0.12\)
  2. \(P(+)=0.09+0.03 =0.12\)
  3. \(P(H|+)=P(+|H)P(H)/P(+)= 0.09/0.12 = 0.75\)
  4. Now \(P(+|H) = 0.09/0.12 = 0.75\) and \(P(+|\bar{H}) = 0.03/0.88 \approx 0.034\). With the new prior \(P(H)=0.5\), you have to re-calculate \(P(+) = P( +|H)P(H) + P( + | \bar{H} )P( \bar{H} ) = 0.75 \times 0.5 + 0.034 \times 0.5 \approx 0.39\). Then, by Bayes rule \[P(H|+)= \frac{P(+|H)P(H)}{P(+)} = \frac{0.75 \times 0.5}{0.39} \approx 0.96\] (see the R check below).
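
A quick R check of parts 3 and 4 (a sketch):

tab = matrix(c(0.09, 0.03, 0.03, 0.85), nrow = 2, byrow = TRUE,
             dimnames = list(c("+ve", "-ve"), c("Present", "Absent")))
tab["+ve", "Present"]/sum(tab["+ve", ])              # part 3: P(H | +) = 0.75
sens = tab["+ve", "Present"]/sum(tab[, "Present"])   # P(+ | H) = 0.75
fpr  = tab["+ve", "Absent"]/sum(tab[, "Absent"])     # P(+ | not H), about 0.034
sens*0.5/(sens*0.5 + fpr*0.5)                        # part 4: P(H | +) with a 50% prior, about 0.96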

Exercise 62 (Steroids) Suppose that a hypothetical baseball player (call him “Rafael”) tests positive for steroids. The test has the following sensitivity and specificity

  1. If a player is on Steroids, there’s a \(95\)% chance of a positive result.
  2. If a player is clean, there’s a \(10\)% chance of a positive result.

A respected baseball authority (call him “Bud”) claims that \(1\)% of all baseball players use Steroids. Another player (call him “Jose”) thinks that \(30\)% of all baseball players use Steroids.

  1. What’s Bud’s probability that Rafael uses Steroids?
  2. What’s Jose’s probability that Rafael uses Steroids?

Explain any probability rules that you use.

Solution:

Let \(T\) and \(\bar{T}\) be positive and negative test results. Let \(S\) and \(\bar{S}\) be using and not using Steroids, respectively. We have the following conditional probabilities \[ P( T | S ) = 0.95 \; \; \text{ and} \; \; P( T | \bar{S} ) = 0.10 \] For our prior distributions we have \(P_{ Bud } ( S ) = 0.01\) and \(P_{ Jose } ( S ) = 0.30\).

From Bayes rule we have \[ P( S | T ) = \frac{ P( T | S ) P( S )}{ P(T )} \] and by the law of total probability \[ P(T) = P( T| S) P(S) + P( T| \bar{S} ) P( \bar{S} ) \] Applying these two probability rules gives \[ P_{ Bud} ( S | T ) = \frac{ 0.95 \times 0.01 }{ 0.95 \times 0.01 + 0.10 \times 0.99 } = 0.0876 \] and \[ P_{ Jose } ( S | T ) = \frac{ 0.95 \times 0.3 }{ 0.95 \times 0.3 + 0.10 \times 0.7 } = 0.8028 \]
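
The same update for both priors in a couple of lines of R:

prior = c(Bud = 0.01, Jose = 0.30)
0.95*prior/(0.95*prior + 0.10*(1 - prior))   # 0.0876 and 0.8028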

Exercise 63 A Breathalyzer test is calibrated so that if it is used on a driver whose blood alcohol concentration exceeds the legal limit, it will read positive \(99\)% of the time, while if the driver is below the limit it will read negative \(90\)% of the time. Suppose that based on prior experience, you have a prior probability that the driver is above the legal limit of \(10\)%.

  1. If a driver tests positive, what is the posterior probability that they are above the legal limit?
  2. At Christmas \(20\)% of the drivers on the road are above the legal limit. If all drivers were tested, what proportion of those testing positive would actually be above the limit?
  3. How does your answer to part \(1\) change? Explain.

Solution:

Let the events be defined as follows:

  • E - Alcohol concentration exceeds legal limit
  • NE - Alcohol concentration does not exceed legal limit
  • P - Breathalyser reads positive
  • N - Breathalyser reads negative

\[ \begin{aligned} P(P|E) & = & 0.99\\ P(N|NE) & = & 0.90\\ P(E) & = & 0.10\end{aligned} \]

Based on above, we have \(P(P|NE)=1-P(N|NE)=0.10\) and \(P(NE)=1-P(E)=0.90\). Then,

\[ \begin{aligned} P(E|P) & = & \frac{P(P|E)P(E)}{P(P)}\\ & = & \frac{P(P|E)P(E)}{P(P|E)P(E)+P(P|NE)P(NE)}\\ & = & \frac{0.99\times0.10}{0.99\times0.10+0.10\times0.90}\\ & = & 52.38\%\end{aligned} \]

Now, we have \(P(E)=0.20\). Thus, \(P(NE)=1-P(E)=0.80\). We now calculate

\[ \begin{aligned} P(E|P) & = & \frac{P(P|E)P(E)}{P(P)}\\ & = & \frac{P(P|E)P(E)}{P(P|E)P(E)+P(P|NE)P(NE)}\\ & = & \frac{0.99\times0.20}{0.99\times0.20+0.10\times0.80}\\ & = & 71.22\%\end{aligned} \]

Thus, 71.22% of all the drivers who test positive would be above the legal limit.

Compared to Part 1, the posterior probability increases. Raising the prior from 10% to 20% increases both the probability of testing positive and exceeding the limit and the overall probability of testing positive, but the former grows proportionally more, so the posterior probability \(P(E|P)\) goes up.

Exercise 64 (Chicago Bearcats) The Chicago Bearcats baseball team plays \(60\)% of its games at night and \(40\)% in the daytime. They win \(55\)% of their night games and only \(35\)% of their day games. You find out the next day that they won their last game.

  1. What is the probability that the game was played at night
  2. What is the marginal probability that they will win their next game?

Explain clearly any rules of probability that you use.

Solution:

We have the following info:

  • \(P(Night) = 0.6, \quad P(Day)=0.4\)
  • \(P(Win|Night) = 0.55, \quad P(Lose|Night)=0.45\)
  • \(P(Win|Day) = 0.35, \quad P(Lose|Day)=0.65\)
  1. \[P(Night|Win) = \frac{P(Win|Night)P(Night)}{P(Win)}\] \[=\frac{P(Win|Night)P(Night)}{P(Win|Day)P(Day)+P(Win|Night)P(Night)}\] \[\frac{0.55*0.6}{0.35*0.4+0.55*0.6}=\frac{0.33}{0.47}=0.7021\]
  2. \(P(Win) = P(Win|Day)P(Day)+P(Win|Night)P(Night)=0.35*0.4+0.55*0.6=0.47\)

Exercise 65 (Spam Filter) Several spam filters use Bayes rule. Suppose that you empirically find the following probability table for classifying emails with the phrase “buy now” in their title as either “spam” or “not spam”.

|               | Spam | Not Spam |
|---------------|------|----------|
| “buy now”     | 0.02 | 0.08     |
| not “buy now” | 0.18 | 0.72     |

  1. What is the probability that you will receive an email with spam?
  2. Suppose that you are given a new email with the phrase “buy now” in its title. What is the probability that this new email is spam?
  3. Explain clearly any rules of probability that you use.

Solution:

  1. The probability that you will receive an email with spam is: \[ \begin{aligned} Pr(Spam) = .02 + .18 = .2 \end{aligned} \]
  2. The posterior probability is \[ \begin{aligned} Pr(Spam|buy \; now) = \frac{Pr(Spam\;and\;buy\;now)}{Pr(buy\;now)} = \frac{.02}{.02+.08} = .2 \end{aligned} \]
  3. The marginal probability is given by the law of total probability: \(P( B) = P(B|A)P(A) + P(B|\bar{A})P(\bar{A})\). We also use the definition of conditional probability (not Bayes’ rule) \(Pr(A|B) = \frac{Pr(A\;and\;B)}{Pr(B)}\)
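
The same calculations from the joint table in R (a sketch):

spam = matrix(c(0.02, 0.08, 0.18, 0.72), nrow = 2, byrow = TRUE,
              dimnames = list(c("buy now", "not buy now"), c("Spam", "Not Spam")))
sum(spam[, "Spam"])                              # P(Spam) = 0.2
spam["buy now", "Spam"]/sum(spam["buy now", ])   # P(Spam | "buy now") = 0.2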

Exercise 66 (Chicago Cubs) The Chicago Cubs are having a great season: they’ve won \(72\) of the \(100\) games played so far. You also have the expert opinion of Bob the sports analyst, who tells you that he thinks the Cubs will win their next game. Historically his predictions have a \(60\)% chance of coming true.

  1. Calculate the probability that the Cubs will win given Bob’s prediction
  2. Suppose you now learn that it’s a home game and that the Cubs win \(60\)% of their games at Wrigley field. What’s you updated probability that the Cubs will win their game?

Solution:

  1. First we have \[ P ( \text{ Win} | \text{ Bob} ) = \frac{ 0.72 \times 0.6}{ 0.72 \times 0.6 + 0.28 \times 0.4} = 0.79 \]
  2. Learning the new information, we use the previous posterior as the prior for the next Bayes update \[ P ( \text{ Win} | \text{ home} , \text{ Bob} ) = \frac{ 0.79 \times 0.60}{ 0.79 \times 0.60 + 0.21 \times 0.40} = 0.85 \] It’s highly likely that the Cubs will win!

Exercise 67 (Student-Grade Causality) Consider the following probabilistic model. A student does poorly in a class (\(c = 1\)) or well (\(c = 0\)) depending on the presence or absence of depression (\(d = 1\) or \(d = 0\)) and on whether they partied the night before (\(v = 1\) or \(v = 0\)). Going to the party can also give the student a headache (\(h = 1\)). As a result of the student’s poor performance, the teacher gets upset (\(t = 1\)). The probabilities are given by:

| \(p(c=1\mid d,v)\) | \(v\) | \(d\) |
|--------------------|-------|-------|
| 0.999 | 1 | 1 |
| 0.9   | 1 | 0 |
| 0.9   | 0 | 1 |
| 0.01  | 0 | 0 |

| \(p(h=1\mid v)\) | \(v\) |
|------------------|-------|
| 0.9 | 1 |
| 0.1 | 0 |

| \(p(t=1\mid c)\) | \(c\) |
|------------------|-------|
| 0.95 | 1 |
| 0.05 | 0 |

\(p(v=1)=0.2\), and \(p(d=1) = 0.4\).

Draw the causal relationships in the model. Calculate \(p(v=1|h=1)\), \(p(v=1|t=1)\), \(p(v=1|t=1,h=1)\).

Exercise 68 (Prisoner) Three prisoners, A, B, and C, are held in separate cells. An unknown two will be shot, the other freed. Prisoner A asks the warder for the name of one other than himself who will be shot, explaining that as there must be at least one, the warder won’t really be giving anything away. The warder agrees, and says that B will be shot. This cheers A up a little: his judgmental probability for being shot is now 1/2 instead of 2/3. Show (via Bayes theorem) that

  1. A is mistaken - assuming that he thinks the warder is as likely to say "C" as "B" when he can honestly say either; but that
  2. A would be right, on the hypothesis that the warder will say "B" whenever he honestly can.

Solution:

Exercise 69 (True/False)  

  1. In a sample of \(100,000\) emails you found that \(550\) are spam. Your next email contains the word “bigger”. From historical experience, you know that half of all spam email contains the word “bigger” and only \(2\)% of non-spam emails contain it. The probability that this new email is spam is approximately \(12\)%.
  2. Suppose that there’s a \(5\)% chance that it snows tomorrow and an \(80\)% chance that the Chicago Bears play their football game tomorrow given that it snows. The probability that they play tomorrow is then \(80\)%.
  3. Bayes’ rule states that \(p(A|B) =p(B|A)\).
  4. If \(P( A \text{ and }B ) = 0.4\) and \(P( B) = 0.8\), then \(P( A|B ) = 0.5\).

Solution:

  1. True. \[P(bigger \mid spam) = 50\% \mbox{ and } P(bigger \mid non-spam) = 2\%\]\[P(spam) = 0.55\% \mbox{ and } P(non-spam) = 99.45\%\]\[\begin{aligned}P(spam \mid bigger) &=& \frac{P(bigger \mid spam)*P(spam)}{P(bigger \mid spam)*P(spam) + P(bigger \mid non-spam)*P(non-spam) } \\&=& 12.1\%\end{aligned}\]
  2. False. Assuming the Bears always play when it does not snow, \[P(Play) = P(Play|Snow)P(Snow) + P(Play|NoSnow)P(NoSnow)\]\[= 0.8*0.05+1*0.95=0.99\]
  3. False. Recall Bayes’ rule: \[p(A|B)=\frac{p(B|A)p(A)}{p(B)}\]
  4. True. \(P( A|B ) = P( A \text{ and }B )/P(B) = 0.4/0.8 = 0.5\) by the definition of conditional probability.

Utility and Decisions

Exercise 70 (Two Gambles) In an experiment, subjects were given the choice between two gambles:

Experiment 1

| Gamble \({\cal G}_A\) |        | Gamble \({\cal G}_B\) |        |
|-----------------------|--------|-----------------------|--------|
| Win                   | Chance | Win                   | Chance |
| $2500                 | 0.33   | $2400                 | 1      |
| $2400                 | 0.66   |                       |        |
| $0                    | 0.01   |                       |        |

Suppose that a person is an expected utility maximizer. Set the utility scale so that u($0) = 0 and u($2500) = 1. Whether a utility maximizing person would choose Option A or Option B depends on the person’s utility for $2400. For what values of u($2400) would a rational person choose Option A? For what values would a rational person choose Option B?

Experiment 2

| Gamble \({\cal G}_C\) |        | Gamble \({\cal G}_D\) |        |
|-----------------------|--------|-----------------------|--------|
| Win                   | Chance | Win                   | Chance |
| $2500                 | 0.33   | $2400                 | 0.34   |
| $0                    | 0.67   | $0                    | 0.66   |

For what values of u($2400) would a person choose Option C? For what values would a person choose Option D? Explain why no expected utility maximizer would prefer B and C.

This problem is a version of the famous Allais paradox, named after the prominent critic of subjective expected utility theory who first presented it. Kahneman and Tversky found that 82% of subjects preferred B over A, and 83% preferred C over D. Explain why no expected utility maximizer would prefer both B in Gamble 1 and C in Gamble 2. (A utility maximizer might prefer B in Gamble 1. A different utility maximizer might prefer C in Gamble 2. But the same utility maximizer would not prefer both B in Gamble 1 and C in Gamble 2.)

Discuss these results. Why do you think many people prefer B in Gamble 1 and C in Gamble 2? Do you think this is reasonable even if it does not conform to expected utility theory?

Solution:

Define x = u($2400), the utility of $2400.

For A versus B:

  • Expected utility of A is 0.33*(1) + 0.66*(x) + 0.01*(0)
  • Expected utility of B is x

Setting them equal and solving for x tells us the value of x for which an expected utility maximizer would be indifferent between the two options \[0.33*(1) + 0.66*(x) + 0.01*(0) = x\] \[x = 0.33/0.34.\] If x < 33/34, an expected utility maximizer would choose option A; if x > 33/34, option B.

For C versus D:

  • Expected utility of C is 0.33* (1) + 0.67*(0)
  • Expected utility of D is 0.34*(x)

Setting them equal and solving for x tells us the value of x for which an expected utility maximizer would be indifferent between the two options \[0.33* (1) = 0.34*(x)\] \[x = 0.33/0.34\]

If x < 33/34, then an expected utility maximizer would choose option C. If x > 33/34, an expected utility maximizer would choose option D.

Why no utility maximizer would prefer B and C?

A utility maximizer would pick B if x>33/34, and would pick C if x<33/34. These regions do not overlap. By definition, an expected utility maximizer has a consistent utility value for a given payout regardless of the probability structure. Therefore, no utility maximizer would prefer B and C. A utility maximizer would be indifferent among all four of these gambles if x = 33/34. But no utility maximizer would strictly prefer B over A, and C over D.
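
A small R sketch makes the non-overlap explicit: the region of u($2400) where B beats A and the region where C beats D are disjoint, meeting only at 33/34:

u2400 = seq(0, 1, length.out = 1001)
EU_A = 0.33 + 0.66*u2400; EU_B = u2400
EU_C = rep(0.33, length(u2400)); EU_D = 0.34*u2400
range(u2400[EU_B > EU_A])   # B preferred to A: u($2400) above 33/34
range(u2400[EU_C > EU_D])   # C preferred to D: u($2400) below 33/34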

Many people’s choices violate subjective expected utility theory in this problem. In fact, Allais carefully crafted this problem to exploit what Kahneman and Tversky called the “certainty effect.” In the choice of A vs B, many prefer a sure gain to a small chance of no gain. On the other hand, in the choice of C vs D, there is no certainty, and so people are willing to reduce their chances of winning by what feels like a small amount to give themselves a chance to win a larger amount. When given an explanation for why these choices are inconsistent with “economic rationality,” some people say they made an error and revise their choices, but others stand firmly by their choices and reject expected utility theory.

It is also interesting to look at how people respond to a third choice:

Experiment 3:

This is a two-stage problem. In the first stage there is a 66% chance you will win nothing and quit. There is a 34% chance you will go on to the second stage. In the second stage you may choose between the following two options.

| Gamble | Payout | Probability |
|--------|--------|-------------|
| E      | $2500  | 33/34       |
| E      | $0     | 1/34        |
| F      | $2400  | 1           |

Experiment 3 is mathematically equivalent to Experiment 2. That is, the probability distributions of the outcomes for E and C are the same, and the probability distributions of the outcomes for F and D are the same. In experiments, the most common pattern of choices on these problems is to prefer B, C, and F. There is an enormous literature on the Allais paradox and related ways in which people systematically violate the principles of expected utility theory. For more information see Wikipedia on Allais paradox.

Exercise 71 (Decisions) You are sponsoring a fund raising dinner for your favorite political candidate. There is uncertainty about the number of people who will attend (the random variable \(X\)), but based on past dinners, you think that the probability function looks like this:

| \(x\)      | 100 | 200 | 300 | 400 | 500 |
|------------|-----|-----|-----|-----|-----|
| \(P_X(x)\) | 0.1 | 0.2 | 0.3 | 0.2 | 0.2 |

  1. Calculate \(E(X)\), the expected number of people who will attend.
  2. The owner of the venue is going to charge you $1500 for rental and other miscellaneous costs. You know that you will make a profit (after per person costs) of $40 for each person attending. Calculate the expected profit after the rental cost.
  3. The owner of the venue proposes an alternative pricing scheme. Instead of charging $1500, she will charge you either $5 per person or $2100, whichever is smaller. So if 100 people come, you only pay $500. If 500 come, you pay $2100. Calculate the expected profit under this scheme (still assuming $40 per plate profit before you pay the owner).
  4. Let \(Y_1\) be your profit under the first scheme and \(Y_2\) be your profit under the second. If you do the calculations, it turns out that the standard deviations of these profits are: \[\sigma_{Y_1} = 4996 \qquad \sigma_{Y_2} = 4488\] Using the expected values calculated above explain which of the two scenarios you prefer.

Solution:

  1. \(E(X) = 100(.1) + 200(.2) + 300(.3) + 400(.2) +500(.2) = 320\)

  2. \(Y=40X-1500\) \(E(Y) = E(40X-1500) = 40 E(X) -1500 = 40(320) -1500 = 11300\)

  3. Profit = \(40X - \min(2100, 5X)\). Note that the profit formula is not linear (just like in the newspaper example). Consequently, you need to re-calculate profit for each of the five cases to get the expected value.

    \(x\)                    100    200    300     400     500
    \(p_X(x)\)               0.1    0.2    0.3     0.2     0.2
    costs                    500    1000   1500    2000    2100
    gross                    4000   8000   12000   16000   20000
    profit = gross – costs   3500   7000   10500   14000   17900

    \(E(profit) = 3500(.1) + 7000(.2) + 10500(.3) + 14000(.2) + 17900(.2) = 11280\)

  4. The expected profit and the standard deviation are both larger under the first scheme (part 2) than under the second (part 3). If you are risk averse, go for the second scheme (less uncertainty); if you are a gambler, go for the first (higher average return). Since the difference in \(\sigma\) is much larger than the difference in the means, most people would prefer the second scheme (see the R check below).
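
The calculations in R (a sketch):

x = c(100, 200, 300, 400, 500); p = c(0.1, 0.2, 0.3, 0.2, 0.2)
sum(x*p)                                   # E(X) = 320
y1 = 40*x - 1500                           # profit under the flat rental fee
y2 = 40*x - pmin(2100, 5*x)                # profit under the per-person scheme
c(sum(y1*p), sum(y2*p))                    # expected profits: 11300 and 11280
c(sqrt(sum(p*(y1 - sum(y1*p))^2)),
  sqrt(sum(p*(y2 - sum(y2*p))^2)))         # standard deviations: about 4996 and 4488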

Exercise 72 (Marjorie Visit) Marjorie is worried about whether it is safe to visit a vulnerable relative during a pandemic. She is considering whether to take an at-home test for the virus before visiting her relative. Assume the test has sensitivity 85% and specificity 92%. That is, the probability that the test will be positive is about 85% if an individual is infected with the virus, and the probability that test will be negative is about 92% if an individual is not infected.

Further, assume the following losses for Marjorie

Event Loss
Visit relative, not infected 0
Visit relative, infected 100
Do not visit relative, not infected 1
Do not visit relative, infected 5
  1. Assume that about 2 in every 1,000 persons in the population is currently infected. What is the posterior probability that an individual with a positive test has the disease?
  2. Suppose case counts have decreased substantially to about 15 in 100,000. What is the posterior probability that an individual with a positive test has the disease?
  3. Suppose Marjorie is deciding whether to visit her relative and if so whether to test for the disease before visiting. If the prior probability that Marjorie has the disease is 200 in 100,000, find the policy that minimizes expected loss. That is, given each of the possible test results, should Marjorie visit her relative? Find the EVSI. Repeat for a prior probability of 15 in 100,000. Discuss.
  4. For the decision of whether Marjorie should visit her relative, find the range of prior probabilities for which taking the at-home test results in lower expected loss than ignoring or not taking the test (assuming the test is free). Discuss your results.

Solution:

  1. For a person in the given population, we have the following probabilities for the test outcome:

| Condition       | Positive | Negative |
|-----------------|----------|----------|
| Disease present | 0.85     | 0.15     |
| Disease absent  | 0.08     | 0.92     |

Prior Probability: P(Disease present) = 0.002 and P(Disease absent)=0.998

We can calculate the posterior probability that the individual has the disease as follows:

First, we calculate the probability of a positive test (this will be the denominator of Bayes Rule):

P(Positive) = P(Positive | Present) * P(Present) + P(Positive | Absent) * P(Absent) = 0.85*0.002 + 0.08*0.998 = 0.08154

Then, we calculate the posterior probability that the individual has the disease by applying Bayes rule:

P(Present | Positive) = P(Positive | Present) * P(Present)/P(Positive) = 0.85*0.002/0.08154 = 0.0208

The posterior probability that an individual who tests positive has the disease is 0.0208.

  2. The sensitivity and specificity of the test are the same as above:

| Condition       | Positive | Negative |
|-----------------|----------|----------|
| Disease present | 0.85     | 0.15     |
| Disease absent  | 0.08     | 0.92     |

But the prior probability is different:

P(Disease present) = 0.00015 and P(Disease absent)=0.99985

Again, we calculate the probability of a positive test (this will be the denominator of Bayes Rule):

P(Positive) = P(Positive | Present) * P(Present) + P(Positive | Absent) * P(Absent) = 0.85*0.00015 + 0.08*0.99985 = 0.0801155

Then, we calculate the posterior probability that the individual has the disease by applying Bayes rule:

P(Present | Positive) = P(Positive | Present) * P(Present)/P(Positive) = 0.85*0.00015/0.0801155 = 0.00159

The posterior probability that an individual who tests positive has the disease is 0.00159.

In both cases, the posterior probability that the individual has the disease is small, even after taking the at-home test. But it is important to note that the probability has increased by more than a factor of 10, from 2 in 1000 to more than 2% in the first case, and from 15 in 100,000 to more than 15 in 10,000 in the second case. When the prior probability is lower, so is the posterior probability.

Some students who are inexperienced with Bayesian reasoning are surprised at how small the posterior probability is after a positive test. Untrained people often expect the probability to be closer to 85%, the sensitivity of the test. I have found that a good way to explain these results to people who have trouble with it is to ask them to imagine a population of 100,000 people. For part a, we would expect 200 have the disease. Of these, we expect about 85%, or 170 people, to test positive, and 15%, or only 30 people, to test negative. So the test identified most of the ill people. But now let’s consider the 99,800 people who do not have the disease. We expect only 8% of these to test positive, but this is 7984 people testing positive. We expect 92%, or 91,816 people, to test negative. Of those who test positive, 170 have the disease and 7,984 do not. So even though most ill people test positive and most well people test negative, because there are so many more people who do not have the disease, most of the ones testing positive actually do not have the disease.

A helpful way to visualize the role of the prior probability is to plot the posterior probability as a function of the prior probability (this was not required as part of the assignment). For prior probabilities near zero, a positive test increases the probability of the disease, but it remains very low because the positives are dominated by false positives. But once the prior probability reaches 10%, there is a better than even chance that someone testing positive has the disease; at a 20% prior probability, the posterior probability is over 70%.

p = seq(0.0001, 0.2, length.out = 100)
posterior = 0.85*p/(0.85*p + 0.08*(1-p))
plot(p, posterior, type = "l", xlab = "Prior Probability", ylab = "Posterior Probability")

  3. We will consider two different prior probabilities:

High prevalence: P(Present) = 0.002 and P( Absent ) = 0.998
Low prevalence: P(Present) = 0.00015 and P( Absent ) = 0.99985

To calculate EVSI, we do the following:

Calculate the expected loss if we do not do the test. To do this, we use the prior probability to calculate the expected loss if we visit and if we don’t. The minimum of these is the expected loss if we do not do the test.

EL[visit] = 100p + 0(1-p)
EL[no visit] = 5p + 1(1-p)

Here, p is either 0.002 (part a) or 0.00015 (part b)

Calculate the expected loss if we do the test as follows:

  1. Assume the test is positive. Using the posterior probability of disease, calculate the expected loss of visiting and not visiting. The minimum of these is the expected loss if the test is positive.
  2. Assume the test is negative. Using the posterior probability of disease, calculate the expected loss of visiting and not visiting. The minimum of these is the expected loss if the test is negative.
  3. Use the Law of Total Probability to find the probability of a positive test. (We already found this in Problems 1 and 2 – it’s the denominator of Bayes Rule.) Combine the results from steps 1 and 2, weighting by the probability of each test result:
    E(L | test) = P(positive)E(L | positive)+P(negative)E(L | negative)

Now, the EVSI is how much doing the test reduces our expected loss:

EVSI = E(L | no test) - E(L | test)

We will now apply these steps for the two base rates. The table below shows the results.

|                                                       | High prevalence                   | Low prevalence                    |
|-------------------------------------------------------|-----------------------------------|-----------------------------------|
| Probability of positive test                          | 0.08154                           | 0.08012                           |
| Posterior probability of disease given positive test  | 0.0208                            | 0.00159                           |
| Posterior probability of disease given negative test  | 0.00033                           | 0.0000245                         |
| Expected loss – no test                               | Visit: 0.2; No visit: 1.008       | Visit: 0.015; No visit: 1.0006    |
| Expected loss if positive                             | Visit: 2.0849; No visit: 1.0834   | Visit: 0.15915; No visit: 1.00637 |
| Expected loss if negative                             | Visit: 0.03266; No visit: 1.00131 | Visit: 0.00245; No visit: 1.0001  |
| Expected loss with test                               | 0.11834                           | 0.015                             |
| EVSI                                                  | 0.2 - 0.11834 = 0.08166           | 0.015 - 0.015 = 0                 |

In the high prevalence case, it is better to test and follow the test result. The EVSI is 0.08166; there is 0.08166 lower loss from testing than from not testing. In the low prevalence case, however, the optimal decision is to visit no matter what the result of the test, because the posterior probability is so low even if the test is positive. Therefore, the expected loss is the same for doing or not doing the test because we ignore the test result and visit anyway. Therefore, the EVSI is zero in the low prevalence case.
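
The EVSI calculation can also be packaged as a small R function (a sketch, using the sensitivity, specificity and losses given in the problem):

evsi = function(p, sens = 0.85, spec = 0.92) {
  p_pos    = sens*p + (1 - spec)*(1 - p)        # P(positive test)
  post_pos = sens*p/p_pos                       # P(disease | positive)
  post_neg = (1 - sens)*p/(1 - p_pos)           # P(disease | negative)
  loss = function(q) c(visit = 100*q, novisit = 5*q + (1 - q))
  EL_notest = min(loss(p))
  EL_test   = p_pos*min(loss(post_pos)) + (1 - p_pos)*min(loss(post_neg))
  EL_notest - EL_test
}
c(high = evsi(0.002), low = evsi(0.00015))      # about 0.082 and 0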

  4. Prior Probability: P(Disease present) = p and P(Disease absent) = 1 - p

If we do not do the test, we cannot use the test result, so we have two choices: visit or do not visit. The expected loss of visiting is: E(L|V) = 100p + 0(1-p) = 100p
The expected loss of not visiting (lose 5 if ill, 1 if not ill): E(L|N) = 5p + (1-p)

Now consider the strategy of doing the test and visiting if the test is negative. We get the following losses

| Patient status | Test result | Action   | Loss (L) | Probability |
|----------------|-------------|----------|----------|-------------|
| No disease     | Negative    | visit    | 0        | (1-p)*0.92  |
| No disease     | Positive    | no visit | 1        | (1-p)*0.08  |
| Disease        | Negative    | visit    | 100      | p*0.15      |
| Disease        | Positive    | no visit | 5        | p*0.85      |

The total expected loss of this strategy is:
E(L|T) = 0 + 1*0.08*(1-p) + 100*0.15*p + 5*0.85*p = 0.08 + 19.17p

Find strategy regions. With a little algebra, we can find where the lines intersect and then find the regions of optimality. These are:

| Range                    | Policy                                                  |
|--------------------------|---------------------------------------------------------|
| \(0 \le p < 0.00099\)    | Optimal policy is to visit regardless of test           |
| \(0.00099 < p < 0.0606\) | Optimal policy is to do the test and visit if negative  |
| \(0.0606 < p \le 1\)     | Optimal policy is to stay home regardless of test       |

The graph below shows the plots of expected loss for the three policies. I show results only for probabilities up to 0.1 in order to have a better view of the results for small probabilities.

p = seq(0.0001, 0.1, length.out = 100)
ELV = 100*p
ELN = 5*p + (1-p)
ELT = 0.08 + 19.17*p
plot(p, ELV, type = "l", xlab = "Prior Probability", ylab = "Expected Loss", col = "red", lwd = 2)
lines(p, ELN, col = "blue", lwd = 2)
lines(p, ELT, col = "green", lwd = 2)
legend("topright", legend = c("Visit", "No Visit", "Test and Visit if Negative"), col = c("red", "blue", "green"), lwd = 2)

Therefore, in the range of values 0.00099 < p < 0.0606, doing the test has lower expected loss than either just visiting or just staying home. For the high prevalence situation, the prior probability of 0.002 is larger than 0.00099 and lower than 0.0606, so it is optimal to take the test and follow the result. For the low prevalence situation, the prior probability of 0.00015 is below 0.00099, and it is optimal just to visit without considering the test. That is, in very low prevalence situations, there is no point in doing the test, because even if it is positive, the posterior probability would still be low enough that it is optimal to visit. On the other hand, if Marjorie is experiencing symptoms or knows she has been exposed, that would increase the probability of having the disease. Even in a low prevalence situation, if symptoms or exposure increase the prior probability to above 0.00099 but below 0.0606, then she should take the test and follow the result. If symptoms or exposure increase the prior probability to above 0.0606, then she should not bother with the test and just stay home.

Exercise 73 (True/False Variance)  

  1. If the sample covariance between two variables is one, then there must be a strong linear relationship between the variables
  2. If the sample covariance between two variables is zero, then the variables are independent.
  3. If \(X\) and \(Y\) are independent random variables, then \(Var(2X-Y)= 2 Var(X)-Var(Y)\).
  4. The sample variance is unaffected by outlying observations.
  5. Suppose that a random variable \(X\) can take the values \(\{0,1,2\}\) all with equal probability. Then the expected value and variance of \(X\) are both \(1\).
  6. The maximum correlation is \(1\) and the minimum is \(0\).
  7. For independent random variables \(X\) and \(Y\), we have \(var(X-Y)=var(X)-var(Y)\).
  8. If the correlation between \(X\) and \(Y\) is zero then the standard deviation of \(X+Y\) is the square root of the sum of the standard deviations of \(X\) and \(Y\).
  9. It is always true that the standard deviation is less than the variance
  10. If the correlation between \(X\) and \(Y\) is \(r = - 0.81\) and if the standard deviations are \(s_X = 20\) and \(s_Y = 25\), respectively, then the covariance is \(Cov (X, Y) = - 401\).
  11. If we drop the largest observation from a sample, then the sample mean and variance will both be reduced.
  12. Suppose \(X\) and \(Y\) are independent random variables and \(Var(X) = 6\) and \(Var(Y) = 6\). Then \(Var(X+Y) = Var(2X)\).
  13. Let investment \(X\) have a mean return of 5% and a standard deviation of 5%, and investment \(Y\) have a mean return of 10% with a standard deviation of 6%. Suppose that the correlation between returns is zero. Then I can find a portfolio with higher mean and lower variance than \(X\).

Solution:

  1. False. Covariance depends on the units of the variables: \(Cov(X,Y) = Corr(X,Y) \sqrt{Var(X)\,Var(Y)}\), so knowing that \(Cov(X,Y) = 1\) says nothing about the strength of the linear relationship without knowing the variances.
  2. False. Zero covariance implies independence only in special cases, such as jointly normal random variables; in general, uncorrelated variables need not be independent.
  3. False. \(Var(2X-Y) = 4Var(X) + Var(Y)\)
  4. False. We have the following formula for the sample variance: \[\begin{aligned}\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2\end{aligned}\]If \(x_i\) is large relative to \(\bar{x}\) then it has an undue influence.
  5. False \[E[X] = \frac{1}{3}*0+\frac{1}{3}*1+\frac{1}{3}*2=1\]\[Var[X] = \frac{1}{3}(0-1)^2+\frac{1}{3}(1-1)^2+\frac{1}{3}(2-1)^2\]\[=\frac{1}{3}+0+\frac{1}{3}=\frac{2}{3}\]
  6. False, the maximum is \(1\) and the minimum is \(-1\).
  7. False. \(Var(X-Y) = Var(X) + Var(Y)\)
  8. False. By the plug-in rule, since the correlation is zero, the variance of \(X+Y\) is the sum of the variances; hence the standard deviation of \(X+Y\) is the square root of the sum of the variances of \(X\) and \(Y\), not of the standard deviations.
  9. False. If \(Var(X) \leq 1\) then \(s_X \geq Var(X)\), so the standard deviation can exceed the variance.
  10. False. Using the formula relating covariance and correlation, namely \(Cov( X, Y ) = r s_X s_Y\), gives \(Cov ( X,Y) = - 0.81 \times 20 \times 25 = - 405.\)
  11. False. For example, \(100, 101, 101\) has mean \(100.67\) and sample variance \(0.333\), but after dropping the largest observation, \(100, 101\) has mean \(100.5\) and sample variance \(0.5\): the mean decreases but the variance increases.
  12. False. First, \(Var(X+Y)=Var(X)+Var(Y)\) (as \(Cov(X,Y)=0)\) and so \(Var(X+Y)=2 Var(X)\). Secondly \(Var(2X) = 4 Var(X)\)
  13. True. For example, consider the portfolio \(P= 0.5 X + 0.5 Y\). This has expected return \(\mu_P = 0.5\times 5\% + 0.5 \times 10\% = 7.5 \%\). As the correlation is zero, its variance is \(\sigma_P^2 =0.5^2 \times 0.0025 + 0.5^2 \times 0.0036 = 0.001525 < 0.0025 = Var(X)\); a quick numeric check follows below.
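A quick numeric check of the portfolio in the last item, using the values given in the statement (means 5% and 10%, standard deviations 5% and 6%, zero correlation):

w = 0.5
mu.p  = w*0.05 + (1-w)*0.10           # portfolio mean return: 0.075
var.p = w^2*0.0025 + (1-w)^2*0.0036   # zero correlation, so no covariance term: 0.001525
c(mu.p, var.p, 0.0025)                # portfolio variance is below Var(X) = 0.0025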

Exercise 74 (True/False Expectation)  

  1. LeBron James makes \(85\)% of his free throw attempts and \(50\)% of his regular shots from the field (field goals). Suppose that each shot is independent of the others. He takes \(20\) field goals and \(10\) free throws in a typical game. He gets one point for each free throw and two points for each field goal assuming no 3-point shots. The number of points he expects to score in a game is 28.5.
  2. Suppose that you have a one in a hundred chance of hitting the jackpot on a slot machine. If you play the machine \(100\) times then you are certain to win.
  3. The expected value of the sample mean is the population mean, that is \(E \left ( \bar{X} \right ) = \mu\).
  4. The expectation of \(X\) minus \(2Y\) is just the expectation of \(X\) minus twice the expectation of \(Y\), that is \(E (X-2Y)= E(X) - 2E (Y)\).
  5. A firm believes it has a 50-50 chance of winning a $80,000 contract if it spends $5,000 on a proposal. If the firm spends twice this amount, it feels its chances of winning improve to 60%. If the firm wants to maximize its expected value then it should spend $10,000 to try and gain the contract.
  6. \(E(X+Y)=E(X)+E(Y)\) only if the random variables \(X\) and \(Y\) are independent.

Solution:

  1. True. \[E(\text{points}) = 2E(\#FG\ \text{made}) + 1\cdot E(\#FT\ \text{made}) = 2 \times 20 \times 0.5+1 \times 10 \times 0.85 = 28.5\] By linearity of expectation, this holds even if FG and FT are dependent.
  2. False. The expected number of plays until you hit the jackpot is 100, but since each outcome of the game is independent of all previous outcomes, there is no guarantee about the number of plays until a jackpot is hit.
  3. True. The expected value of the sample mean equals the population mean.
  4. True. The plug-in rule states that \(E(aX+bY)=aE(X)+bE(Y)\). Plugging in \(a=1\) and \(b =-2\) gives \(E(X-2Y)=E(X)+E(-2Y)=E(X)-2E(Y)\).
  5. True. \(E(\text{spend } \$5\text{k}) = 0.5 \times 80000 - 5000 = 35000\) and \(E(\text{spend } \$10\text{k}) = 0.6\times 80000 - 10000 = 38000\), so spending \$10,000 gives the higher expected value.
  6. False. By the plug-in rule, this relation holds irrespective of whether \(X\) and \(Y\) are independent.

Bayesian Parameter Learning

Exercise 75 (Beta-Binomial for Allais gambles) We’ve collected data on people’s preferences in the two Allais gambles. For this problem, we will assume that responses are independent and identically distributed, and the probability is \(\theta\) that a person chooses both B in the first gamble and C in the second gamble.

  1. Assume that the prior distribution for \(\theta\) is Beta(1, 3). Find the prior mean and standard deviation for \(\theta\). Find a 95% symmetric tail area credible interval for the prior probability that a person would choose B and C. Do you think this is a reasonable prior distribution to use for this problem? Why or why not?
  2. In 2009, 19 out of 47 respondents chose B and C. Find the posterior distribution for the probability \(\theta\) that a person in this population would choose B and C.
  3. Find the posterior mean and standard deviation. Find a 95% symmetric tail area credible interval for \(\theta\).
  4. Make a triplot of the prior distribution, normalized likelihood, and posterior distribution.
  5. Comment on your results.

Solution:

Part a. It is convenient to use a prior distribution in the Beta family because it is conjugate to the Binomial likelihood function, which simplifies Bayesian updating. A plot of the prior density function is shown below.

curve(dbeta(x,1,3),0,1)

The prior distribution has mean a/(a+b) = 0.25 and standard deviation \[ \sqrt{\dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}} = 0.194. \] An expected value of 0.25 would be reasonable if we believed that people were choosing randomly among the four options A&C, B&C, A&D, and B&D. In fact, a Beta(1,3) distribution is the marginal distribution we would obtain if we placed a uniform distribution on all probability vectors \((p_1, p_2, p_3, p_4)\) on the four categories (i.e. a uniform distribution on four non-negative numbers that sum to 1). In other words, this is the prior distribution we might assign if all we knew was that there were four possible responses, and knew nothing about the relative likelihoods of the four responses. However, we actually do know something about the relative probabilities of the categories. According to the research literature, people choose B&C more often than 25% of the time. The center of the prior distribution does not reflect this knowledge. But the prior distribution does show a very high degree of uncertainty about the probability. A 95% symmetric tail area credible interval for the prior distribution of \(\Theta\) is [0.008, 0.708]. The virtual count is small in relation to the sample size, so the data will have a large impact on the posterior distribution relative to the prior. Therefore, we won’t go too far wrong using this prior distribution, even though we do have knowledge that suggests the prior mean is likely to be higher than 0.25.
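These prior summaries are easy to reproduce in R:

a = 1; b = 3
a/(a+b)                          # prior mean 0.25
sqrt(a*b/((a+b)^2*(a+b+1)))      # prior standard deviation ~ 0.194
qbeta(c(0.025,0.975), a, b)      # 95% prior credible interval ~ [0.008, 0.708]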

Part b. We will use the Binomial/Beta conjugate pair. The prior distribution is a member of the conjugate Beta(a, b) family, with a=1 and b=3. Therefore, the posterior distribution is a Beta(a*, b*) distribution, with \(a^* = a + x = 1+19 = 20\) and \(b^* = b + n - x = 3+28 = 31\). The posterior distribution is shown below.

curve(dbeta(x,20,31),0,1)

Part c. The posterior mean is \(a^*/(a^*+b^*) = 20/51 = 0.392\). The posterior standard deviation is \[ \sqrt{\dfrac{a^*b^*}{(a^*+b^*)^2(a^*+b^*+1)}} = 0.068. \] A 95% symmetric tail area credible interval for the posterior distribution of \(\Theta\) is [0.26, 0.53].

qbeta(c(0.025,0.975),20,31)
[1] 0.26 0.53
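The posterior mean and standard deviation can be checked the same way:

a = 20; b = 31
a/(a+b)                          # posterior mean ~ 0.392
sqrt(a*b/((a+b)^2*(a+b+1)))      # posterior standard deviation ~ 0.068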

Part d. The triplot is shown below.

curve(dbeta(x,1,3),0,1,ylab="Density",lty=2,xlim=c(0,1),ylim=c(0,6))
curve(dbeta(x,20,31),0,1,add=TRUE)
curve(dbeta(x,20,29),0,1,add=TRUE)          # Normalized likelihood is Beta(x+1, n-x+1)

The triplot shows the prior, normalized likelihood, and posterior distributions. The normalized likelihood is a Beta(x+1, n-x+1) = Beta(20, 29) density function. The posterior distribution is focused on slightly smaller values and is slightly narrower than the normalized likelihood. This is because the prior mean is smaller than the sample mean, and the posterior carries slightly more information than the likelihood alone.

Exercise 76 (Poisson for Car Counts) Times were recorded at which vehicles passed a fixed point on the M1 motorway in Bedfordshire, England on March 23, 1985. The total time was broken into 21 intervals of length 15 seconds. The number of cars passing in each interval was counted. The result was:

cnt = c(2, 2, 1, 1, 0, 4, 3, 0, 2, 1, 1, 1, 4, 0, 2, 2, 3, 2, 4, 3, 2)

This can be summarized in the following table, that shows 3 intervals with zero cars, 5 intervals with 1 car, 7 intervals with 2 cars, 3 intervals with 3 cars and 3 intervals with 4 cars.

table(cnt)
cnt
0 1 2 3 4 
3 5 7 3 3 
  1. Do you think a Poisson distribution provides a good model for the count data? Justify your answer.
  2. Assume that \(\Lambda\), the rate parameter of the Poisson distribution for counts (and the inverse of the mean of the exponential distribution for inter arrival times), has a discrete uniform prior distribution on 20 equally spaced values between \((0.2, 0.4,\ldots 3.8, 4.0)\) cars per 15-second interval. Find the posterior distribution of \(\Lambda\).
  3. Find the posterior mean and standard deviation of \(\Lambda\).
  4. Discuss what your results mean in terms of traffic on this motorway.

Solution:

  1. A Poisson distribution is a good model to consider for the arrival counts because:
    • The data consist of counts of discrete events.
    • It seems reasonable that the events would be independent.
    • It seems reasonable that the rate of occurrence would remain constant over the roughly five minutes (21 × 15 seconds) that the sample was taken.

These are reasons to consider a Poisson distribution, but we still need to investigate how well a Poisson distribution fits the observations. We will do this by comparing the sample frequencies with Poisson probability mass function. To do this, we need to estimate the mean of the distribution. We calculate the sample mean by summing these observations (40 total cars passing) and dividing by the number of intervals (21), to obtain an estimated arrival rate of about 1.9 cars per 15 second time interval.

Next, we use the Poisson probability mass function \(f(x|\Lambda=1.9) = e^{−1.9} (1.9)^x / x!\) to estimate the probability of each number of cars in a single interval, and multiply by 21 to find the expected number of times each value occurs. (Note: for the “5 or more” entry, we use \(21 \times (1-F(4|\Lambda=1.9))\), i.e., 21 times one minus the Poisson cdf at 4. The probability of 5 or more cars in 15 seconds is very small.) The expected counts are:

d = data.frame(cars=0:5,sample=c(3,5,7,3,3,0),expected=c(21*dpois(0:4,1.9),21*(1-ppois(4,1.9))))
print(d)
  cars sample expected
1    0      3     3.14
2    1      5     5.97
3    2      7     5.67
4    3      3     3.59
5    4      3     1.71
6    5      0     0.93

The values in this table seem to show fairly good agreement. We can do a visual inspection by drawing a bar chart, shown below. The bar chart shows good agreement between the empirical and the Poisson probabilities.

barplot(height = t(as.matrix(d[,2:3])),beside=TRUE,legend=TRUE)

We can also perform a Pearson chi-square test (this is not required). The test statistic is calculated as: \[ \chi^2 =\sum_i (Y_i - E_i)^2/E_i \] where \(Y_i\) is the observed count for i cars passing, and \(E_i\) is the expected number of instances of i cars passing in a sample of size 21. The chi-square test should not be applied when counts are very small. A common (and conservative) rule of thumb is to avoid using the chi-square test if any cell has an expected count less than 5. Other less conservative rules have been proposed. We will combine the last two categories (4 cars and 5 or more) to increase the expected count to about 2.65. To find the expected count in this combined category, we assign it probability 1 minus the cdf at x=3, and multiply by 21. The resulting test statistic is:

sum(((d$sample-d$expected)^2/d$expected)[1:4]) + (3-2.65)^2/2.65
[1] 0.62

Next, we find the degrees of freedom, n - p - 1, where n is the number of categories and p is the number of parameters estimated. We have 5 categories, and we estimated the mean of the Poisson distribution, so the degrees of freedom are 5 - 1 - 1, or 3. Therefore, we compare our test statistic 0.62 against the critical value of the chi-square distribution with 3 degrees of freedom. The 95th percentile for the chi-square distribution with 3 degrees of freedom is 7.8.

qchisq(0.95,3)
[1] 7.8
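Equivalently, the p-value for the observed statistic is far above any conventional significance level, so the Poisson model is not rejected:

1 - pchisq(0.62, df = 3)    # p-value ~ 0.89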

Some students have tried a Q-Q plot of the Poisson quantiles against the data quantiles. Because there are only 5 distinct values in the sample, the plot shows most of the dots on top of each other. Comparing empirical and theoretical frequencies is much more informative for discrete distributions with very few values.

We can also compare the sample mean and sample variance of the observations, recalling that the mean and variance of the Poisson distribution are the same. The sample mean of the observations is 1.90 and the sample variance is 1.59. The standard deviation of the sample mean is approximately equal to the sample standard deviation divided by the square root of the sample size, or \(\sqrt{1.59/21} = 0.275\). Therefore, an approximate 95% confidence interval for the mean is

1.90 + c(-1,1)*0.275*qt(0.975,20)
[1] 1.3 2.5

The sample variance 1.59 lies well within this interval, so the observations seem consistent with the Poisson hypothesis that the mean and variance are equal.
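The sample mean and variance quoted above can be verified directly from the counts:

mean(cnt)                      # ~ 1.90
var(cnt)                       # ~ 1.59
sqrt(var(cnt)/length(cnt))     # standard error of the sample mean, ~ 0.275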

You did not need to do everything described in this solution. If you gave a thoughtful argument for whether the Poisson distribution was a good model, and if you used the data in a reasonable way to evaluate the fit of the distribution, you would receive credit for this problem.

  2. To find the posterior distribution, we calculate the Poisson likelihood at each of the 20 lambda values, and multiply by the prior probability of 1/20. Then we divide each of these values by their sum. The formula is: \[ p(\lambda|X) = \frac{f(X|\lambda)g(\lambda)}{\sum_{i=1}^{20} f(X|\lambda_i)g(\lambda_i)} \] The constant prior \(g(\lambda) = 1/20\) factors out of both the numerator and the denominator, so it is unnecessary to include it. In fact, when the prior distribution is uniform, the posterior distribution is the same as the normalized likelihood.

The posterior pmf is shown in the plot below.

lambda = seq(0.2,4,0.2)    # the 20 grid values 0.2, 0.4, ..., 4.0
likelihood=function(lambda) prod(dpois(cnt,lambda))
l = sapply(lambda,likelihood)
posterior = l/sum(l)
barplot(posterior,names.arg=lambda,xlab="lambda",ylab="posterior probability")

  3. The posterior mean and standard deviation of \(\Lambda\), computed from the discrete posterior, are:
m = sum(posterior*lambda)
print(m)
[1] 2
sqrt(sum(posterior*(lambda-m)^2))
[1] 0.3
  4. From this analysis, we see that a Poisson distribution provides a good model for counts of cars passing by this point on the roadway. The rate is approximately 2 cars per 15 seconds. There is uncertainty about this rate. There is a 95% chance the rate is less than or equal to 2.4, and there is about a 98% probability that the rate lies between 1.4 and 2.6 cars per 15-second interval. If we gathered more data, we would gain more information about the rate, which would narrow the range of uncertainty.

Exercise 77 (Car Count Part 2) This problem continues analysis of the automobile traffic data. As before, assume that counts of cars per 15-second interval are independent and identically distributed Poisson random variables with unknown mean \(\Lambda\).

  1. Assume that \(\Lambda\), the rate parameter of the Poisson distribution for counts, has a continuous gamma prior distribution with shape 1 and scale \(10^6\). (The gamma distribution with shape 1 tends to a uniform distribution as the scale tends to \(\infty\), so this prior distribution is “almost” uniform.) Find the posterior distribution of \(\Lambda\). State the distribution type and hyperparameters.
  2. Find the posterior mean and standard deviation of \(\Lambda\). Compare your results to Part I. Discuss.
  3. Find a 95% symmetric tail area posterior credible interval for \(\Lambda\). Find a 95% symmetric tail area posterior credible interval for \(\theta\), the mean time between vehicle arrivals.
  4. Find the predictive distribution for the number of cars passing in the next minute. Name the family of distributions and the parameters of the predictive distribution. Find the mean and standard deviation of the predictive distribution. Find the probability that more than 10 cars will pass in the next minute. (Hint: one minute is four 15-second time intervals.)

Solution:

  1. We observed 40 cars passing in \(n\)=21 15-second intervals.
sum(cnt)
[1] 40

The posterior distribution is a gamma distribution with shape 1 + 40 = 41

1+sum(cnt)
[1] 41

and scale

n = length(cnt)
1/(1e-6 + n)
[1] 0.048

The posterior distribution is \[ \Lambda|X \sim \text{Gamma}(41,0.04761905) \]

curve(dgamma(x,shape=41,scale=0.04761905),from=0,to=5)

  2. The posterior mean is 41/21=1.952, and the posterior standard deviation is \(\sqrt{41}/21 = 0.305\). Let’s compare
mean sd mode median 95% CI
Discretized 1.95 0.305 2 2 [1.4, 2.6]
Continuous 1.95 0.305 1.9 1.94 [1.40, 2.59]

These values are almost identical. The slight differences in the median and mode are clearly artifacts of discretization. The discretized analysis gives a good approximation to the continuous analysis.

  3. The 95% symmetric tail area posterior credible interval for \(\Lambda\) is
qgamma(c(0.025,0.975),shape=41,scale=0.04761905)
[1] 1.4 2.6

and for \(\Theta\) is

1/qgamma(c(0.975,0.025),shape=41,scale=0.04761905)
[1] 0.39 0.71
  4. The Poisson / Gamma predictive distribution is a negative binomial distribution. For this problem, we predict the number of cars in the next four time periods (1 minute is four 15-second blocks). The predictive distribution is negative binomial with the parameters:
  • size \(a = 41\)
  • prob \(p = 1/(1+4b) = 1/(1+4/21) = 0.84\), where \(b = 1/21\) is the posterior scale

The mean is \(4ab = 4\cdot 41\cdot \tfrac{1}{21} = 7.81\) and the standard deviation of the predictive distribution is \[ \sqrt{a \dfrac{1-p}{p^2}} = \sqrt{41 \cdot \dfrac{1-0.84}{0.84^2}} = 3.05 \] The probabilities, as computed by the R dnbinom function, are shown in the table. Results from the Poisson probability mass function with mean \(4ab = 7.81\) are shown for comparison.
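As a quick sanity check before tabulating, these moments also follow from the standard negative binomial formulas (size \(r\), success probability \(p\)): mean \(r(1-p)/p\) and variance \(r(1-p)/p^2\).

r = 41; p = 1/(1 + 4/21)
r*(1-p)/p          # predictive mean ~ 7.81
sqrt(r*(1-p))/p    # predictive standard deviation ~ 3.05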

k = 0:20
p = dnbinom(k,size=41,prob=1/(1+4/21))
p = c(p,1-sum(p))               # last entry collects the tail probability P(X > 20)
p = round(p,3)
poisson = dpois(k,7.81)
poisson = c(poisson,1-sum(poisson))
poisson = round(poisson,3)
d = cbind(k=c(k,21),p,poisson)  # the row labelled 21 stands for "21 or more"
print(d)
       k      p poisson
 [1,]  0  0.001   0.000
 [2,]  1  0.005   0.003
 [3,]  2  0.017   0.012
 [4,]  3  0.040   0.032
 [5,]  4  0.070   0.063
 [6,]  5  0.101   0.098
 [7,]  6  0.124   0.128
 [8,]  7  0.133   0.143
 [9,]  8  0.127   0.139
[10,]  9  0.111   0.121
[11,] 10  0.089   0.094
[12,] 11  0.066   0.067
[13,] 12  0.046   0.044
[14,] 13  0.030   0.026
[15,] 14  0.018   0.015
[16,] 15  0.011   0.008
[17,] 16  0.006   0.004
[18,] 17  0.003   0.002
[19,] 18  0.002   0.001
[20,] 19  0.001   0.000
[21,] 20  0.000   0.000
[22,] 21  0.000   0.000
barplot(t(d[,2:3]),beside=TRUE,names.arg=d[,1],xlab="k",ylab="probability")
legend("topright",c("negative binomial","poisson"),fill=c("black","white"))

The difference between the Bayesian predictive distribution and the Poisson distribution using a point estimate of the rate is even more pronounced if we predict for the next 21 time periods (5.25 minutes). This was not required, but is interesting to examine. Again, the distribution is negative binomial with size \(a = 41\) and probability \(p = 1/(1 + 21b) = 0.5\).

k = 0:65
p = dnbinom(k,size=41,prob=0.5)
poisson = dpois(k,21*41/21)
d = cbind(k,p,poisson)
barplot(t(d[,2:3]),beside=TRUE,names.arg=d[,1],xlab="k",ylab="probability")
legend("topright",c("negative binomial","poisson"),fill=c("black","white"))

To summarize, the predictive distribution is negative binomial with size \(a = 41\), and probability \(p = 1/(1 + nb)\) where \(n\) is the number of periods into the future we are predicting. The uncertainty in \(\Lambda\) makes the predictive distribution more spread out than the Poisson distribution with the same mean. When n=1, the Poisson and negative binomial probabilities are very similar. The differences become more pronounced as we try to predict further into the future.

Exercise 78 (Lung disease) Chronic obstructive pulmonary disease (COPD) is a common lung disease characterized by difficulty in breathing. A substantial proportion of COPD patients admitted to emergency medical facilities are released as outpatients. A randomized, double-blind, placebo-controlled study examined the incidence of relapse in COPD patients released as outpatients as a function of whether the patients received treatment with corticosteroids. A total of 147 patients were enrolled in the study and were randomly assigned to treatment or placebo group on discharge from an emergency facility. Seven patients were lost from the study prior to follow-up. For the remaining 140 patients, the table below summarizes the primary outcome of the study, relapse within 30 days of discharge.

Relapse No Relapse Total
Treatment 19 51 70
Placebo 30 40 70
Total 49 91 140
  1. Let \(Y_1\) and \(Y_2\) be the number of patients who relapse in the treatment and placebo groups, respectively. Assume \(Y_1\) and \(Y_2\) are independent Binomial(70, \(\theta_i\)) distributions, for \(i=1,2\). Assume \(\theta_1\) and \(\theta_2\) have independent Beta prior distributions with shape parameters 1/2 and 1/2 (this is the Jeffreys prior distribution). Find the joint posterior distribution for \(\theta_1\) and \(\theta_2\). Name the distribution type and its hyperparameters.
  2. Generate 5000 random pairs \((\theta_1, \theta_2)\), \(k=1,\ldots,5000\) from the joint posterior distribution. Use this random sample to estimate the posterior probability that the rate of relapse is lower for treatment than for placebo. Discuss your results.

Solution:

  1. The prior distribution for each condition is \(Beta(1/2,1/2)\). Therefore, the posterior distributions for \(\theta_1\) and \(\theta_2\) are Beta distributions with parameters and quantiles as shown below.
# Study results
k.trt=19    # Number of relapses in treatment group
n.trt=70    # Number of patients in treatment group
k.ctl=30    # Number of relapses in control group
n.ctl=70    # Number of patients in control group

#Prior hyperparameters
alpha.0 =0.5
beta.0=0.5

# The joint posterior distribution is the product of two independent beta
# distributions with posterior hyperparameters as shown below
alpha.trt=alpha.0 + k.trt
beta.trt=beta.0 + n.trt-k.trt
alpha.ctl=alpha.0 + k.ctl
beta.ctl=beta.0 + n.ctl-k.ctl

# Posterior credible intervals
trt.intrv=qbeta(c(0.025,0.975),alpha.trt,beta.trt)
ctl.intrv=qbeta(c(0.025,0.975),alpha.ctl,beta.ctl)

d = data.frame(treatment=c(alpha.trt,beta.trt,trt.intrv), control=c(alpha.ctl,beta.ctl,ctl.intrv), row.names=c("alpha","beta","2.5%","97.5%"))
knitr::kable(t(d))
alpha beta 2.5% 97.5%
treatment 20 52 0.18 0.38
control 30 40 0.32 0.55

It is important to remember that the joint density or mass function is a function of all the random variables. (Note that the printed table is rounded; the exact posterior shape parameters are 19.5, 51.5 for treatment and 30.5, 40.5 for control.) Because \(\theta_1\) and \(\theta_2\) are independent a priori and the \(Y_i\) are independent conditional on the \(\theta_i\), the two posterior distributions are independent. We can see this by writing out the joint distribution of the parameters and observations: \[ Bin(x_1\mid\theta_1,70)\,Bin(x_2\mid\theta_2,70)\,Be(\theta_1\mid \tfrac12,\tfrac12)\,Be(\theta_2\mid \tfrac12,\tfrac12) \propto \theta_1^{18.5}(1-\theta_1)^{50.5}\,\theta_2^{29.5}(1-\theta_2)^{39.5} \] This is proportional to the product of two Beta density functions, the first with parameters (19.5, 51.5), and the second with parameters (30.5, 40.5).

Conditional on the observations, \(\theta_1\) and \(\theta_2\) are independent Beta random variables. The marginal distribution of \(\theta_1\) is Beta(19.5, 51.5) and the marginal distribution of \(\theta_2\) is Beta(30.5, 40.5).

The joint density function is the product of the individual density functions (recall that \(B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)\)): \[ Be(\theta_1\mid 19.5,51.5)\,Be(\theta_2\mid 30.5,40.5) = \dfrac{\Gamma(71)}{\Gamma(19.5)\Gamma(51.5)}\theta_1^{18.5}(1-\theta_1)^{50.5}\,\dfrac{\Gamma(71)}{\Gamma(30.5)\Gamma(40.5)}\theta_2^{29.5}(1-\theta_2)^{39.5} \] A plot of the two posterior density functions is shown below. The 95% credible intervals are shown on the plot as vertical lines.

curve(dbeta(x,19.5,51.5),0,0.6,ylab="Density",lty=2,ylim=c(0,8))
curve(dbeta(x,30.5,40.5),0,0.6,add=TRUE)
abline(v=trt.intrv,lty=2, lwd=0.5)
abline(v=ctl.intrv,lty=2, lwd=0.5)

To visualize the joint density function of \(\theta_1\) and \(\theta_2\), which is a function of two variables, we need either a surface plot or a contour plot (you were not required to do this). A surface plot is shown below.

gs<-200        # grid size
theta.t<-seq(0.1,0.45,length=gs)   # treatment mean grid
theta.c<-seq(0.25,0.6 ,length=gs)  # control mean grid

dens<-matrix(0,gs,gs)             # density function
for(i in 1:gs) {
  dens[i,]=dbeta(theta.t[i],alpha.trt,beta.trt) * dbeta(theta.c,alpha.ctl,beta.ctl)
}

# Perspective plot

persp(theta.t,theta.c,dens,theta=30,phi=5,xlab="Treatment Relapse Prob",
      ylab="Control Relapse Prob",zlab="Probability Density",
      xlim=theta.t[c(1,gs)] ,ylim=theta.c[c(1,gs)],
      main="3D Plot of Joint Density for Treatment and Control Relapse Probs")

The posterior mean relapse probability for the treatment group is \(19.5/71 = 27.5\%\). A 95% symmetric tail area credible interval is [17.8%, 38.3%]. The posterior mean relapse probability for the control group is \(30.5/71 = 43.0\%\). A 95% symmetric tail area credible interval is [31.7%, 54.5%].

About a quarter of the treatment patients relapsed, whereas almost half the control patients relapsed. Most of the posterior density function for the treatment group lies below most of the posterior density function for the control group, but the 95% credible intervals for the two parameters overlap somewhat (you were not required to compute credible intervals). While this analysis suggests that treatment lowers the relapse probability, it does not provide a statistical basis for concluding that it does.

  2. The joint posterior distribution is Beta(19.5,51.5)Beta(30.5, 40.5). That is, \(\theta_1\) and \(\theta_2\) are independent Beta random variables with shape parameters as given in the table above. To sample from this joint distribution, we can draw 5000 random samples from each of these Beta distributions
set.seed(92) # Kuzy
nsample=5000
treat = rbeta(nsample,19.5,51.5)   # treatment relapse probability
contr = rbeta(nsample,30.5,40.5)   # control relapse probability

diff=contr-treat     # Estimate of difference in relapse probs (control minus treatment)
efficacious = diff>0  # TRUE when treatment relapse prob is below control
prob.efficacious=sum(efficacious)/nsample
sd.probefficacious = sd(efficacious)/sqrt(nsample)

#Posterior quantiles, mean and standard deviation of the difference in 
#relapse rates between treatment and control
quantile(diff,c(0.05,0.5,0.95))
   5%   50%   95% 
0.021 0.156 0.287 
mean(diff)
[1] 0.15
sd(diff)
[1] 0.08
#Density Function for Difference in Treatment and Control Relapse Probabilities
plot(density(diff),main="Kernel Density Estimator",xlab="Difference in Treatment and Control Probabilities",ylab="Posterior Probability Density")

The analysis above provides evidence that there is a lower relapse probability in the treatment than in the control group. We estimate a 97.2% probability that the treatment has a lower relapse probability. The mean of the posterior distribution of the difference in relapse probabilities is approximately 0.155, and an approximate 90% credible interval for the difference in relapse rates is [0.021, 0.287].

Exercise 79 (Normal-Normal) Concentrations of the pollutants aldrin and hexachlorobenzene (HCB) in nanograms per liter were measured in ten surface water samples, ten mid-depth water samples, and ten bottom samples from the Wolf River in Tennessee. The samples were taken downstream from an abandoned dump site previously used by the pesticide industry. The full data set can be found at http://www.biostat.umn.edu/~lynn/iid/wolf.river.dat. For this problem, we consider only HCB measurements taken at the bottom and the surface. The question of interest is whether the distribution of HCB concentration depends on the depth at which the measurement was taken. The data for this problem are given below.

Surface Bottom
3.74 5.44
4.61 6.88
4.00 5.37
4.67 5.44
4.87 5.03
5.12 6.48
4.52 3.89
5.29 5.85
5.74 6.85
5.48 7.16

Assume the observations are independent normal random variables with unknown depth-specific means \(\theta_s\) and \(\theta_b\) and precisions \(\rho_s = 1/\sigma^2_s\) and \(\rho_b = 1/\sigma_b^2\). Assume independent improper reference priors for the surface and bottom parameters: \[ g(\theta_s,\theta_b ,\rho_s,\rho_b ) = g(\theta_s,\rho_s)g(\theta_b ,\rho_b) \propto \rho_s^{-1}\rho_b^{-1}. \]

  1. This prior can be treated as the product of two normal-gamma priors with \(\mu_s = \mu_b = 0\), \(\sigma_s \rightarrow 0\) and \(\sigma_b \rightarrow 0\), \(a_s = a_b = -1/2\), and \(b_s = b_b \rightarrow 0\). (These are not valid normal-gamma distributions, but you can use the usual Bayesian conjugate updating rule to find the posterior distribution.) Find the joint posterior distribution for the parameters \((\theta_s,\theta_b,\rho_s,\rho_b)\). Find 90% posterior credible intervals for \((\theta_s,\theta_b,\rho_s,\rho_b)\). Comment on your results.
  2. Use direct Monte Carlo to sample 10,000 observations from the joint posterior distribution of \((\theta_s,\theta_b,\rho_s,\rho_b)\). Use your Monte Carlo samples to estimate 90% posterior credible intervals for all four parameters. Compare with the result of part a.
  3. Use your direct Monte Carlo sample to estimate the probability that the mean bottom concentration \(\theta_b\) is higher than the mean surface concentration \(\theta_s\), and to estimate the probability that the standard deviation \(\sigma_b\) of the bottom concentrations is higher than the standard deviation \(\sigma_s\) of the surface concentrations.
  4. Comment on your analysis. What are your conclusions about the distributions of surface and bottom concentrations? Is the assumption of normality reasonable? Are the means different for surface and bottom? The standard deviations?
  5. Find the predictive distribution for the sample mean of a future sample of size 10 from the surface and a future sample of size 10 from the bottom. Find 95% credible intervals on the sample mean of each future sample. Repeat for future samples of size 40. Compare your results and discuss.
  6. Use direct Monte Carlo to estimate the predictive distribution for the difference in the two sample means for 10 future surface and bottom samples. Plot a kernel density estimator for the density function for the difference in means. Find a 95% credible interval for the difference in the two sample means. Repeat for future samples of 40 surface and 40 bottom observations. Comment on your results.
  7. Repeat part e, but use a model in which the standard deviation is known and equal to the sample standard deviation, and the depth-specific means \(\theta_s\) and \(\theta_b\) have a uniform prior distribution. Compare the 95% credible intervals for the future sample means for the known and unknown standard deviation models. Discuss.
  8. Assume that experts have provided the following prior information based on previous studies.
  • The unknown means \(\theta_s\) and \(\theta_b\) are independent and normally distributed with mean \(\mu\) and standard deviation \(\tau\). The unknown precisions \(\rho_s\) and \(\rho_b\) are independent of \(\theta_s\) and \(\theta_b\) and have gamma distributions with shape \(a\) and scale \(b\).
  • Experts specified a 95% prior credible interval of [3, 9] for \(\theta_s\) and \(\theta_b\). A good fit to this credible interval is obtained by setting the prior mean to \(\mu =6\) and the prior standard deviation to \(\tau=1.5\).
  • A 95% prior credible interval of [0.75, 2.0] is given for the unknown standard deviations \(\Sigma_s\) and \(\Sigma_b\). This translates to a credible interval of [0.25, 1.8] for \(\rho_s = \Sigma_s^{-2}\) and \(\rho_b = \Sigma_b^{-2}\). A good fit to this credible interval is obtained by setting the prior shape to \(a = 4.5\) and the prior scale to \(b = 0.19\). Find the following conditional distributions: \(p(\theta_s \mid D,\theta_b,\rho_s,\rho_b)\), \(p(\theta_b \mid D,\theta_s,\rho_s,\rho_b)\), \(p(\rho_s \mid D,\theta_s,\theta_b,\rho_b)\), \(p(\rho_b \mid D, \theta_s,\theta_b,\rho_s)\)
  1. Using the distributions you found, draw 10,000 Gibbs samples of \((\theta_s,\theta_b,\rho_s,\rho_b)\). Estimate 90% credible intervals for \((\theta_s,\theta_b,\rho_s^{-1/2},\rho_b^{-1/2})\) and \(\theta_b-\theta_s\).
  1. Do a traceplot of \(\theta_b-\theta_s\). Find the autocorrelation function of \(\theta_b-\theta_s\) and the effective sample size for your Monte Carlo sample for \(\theta_b-\theta_s\).
  2. Comment on your results. Compare with parts a,b, and c.

Solution:

  1. The joint posterior distribution for the parameters \((\theta_s,\theta_b,\rho_s,\rho_b)\) is the product of an improper prior density for the surface parameters times an improper prior density for the bottom parameters. The joint posterior density is proportional to the joint prior density times the surface and bottom likelihood functions: \[ g(\theta_s,\theta_b ,\rho_s,\rho_b \mid x_s,x_b) \propto g(\theta_s,\rho_s)g(\theta_b ,\rho_b) g(x_s \mid \theta_s,\rho_s)g(x_b \mid \theta_b,\rho_b) \]

The first two factors are proportional to the posterior density of the surface parameters; the second two factors are proportional to the posterior density of the bottom parameters. Therefore, the surface and bottom parameters are independent given the data with joint posterior density: \[ g(\theta_s,\theta_b ,\rho_s,\rho_b \mid x_s,x_b) \propto g(\theta_s,\rho_s \mid x_s)g(\theta_b ,\rho_b \mid x_b) \] We will calculate the posterior distributions for the surface and bottom parameters in turn.

The prior density for the surface parameters is \(g(\theta_s,\rho_s) \propto \rho_s^{-1}\)

This is the limit of the density function for a normal-gamma conjugate prior distribution. We use the Normal-Normal-Gamma update rules. The prior is given by \[ g(\theta,\rho \mid \nu,\mu,a,b) = \sqrt{\frac{\nu\rho}{2\pi}} \exp\left(-\frac{\nu\rho}{2}(\theta-\mu)^2\right) \frac{b^a}{\Gamma(a)}\rho^{a-1}e^{-b\rho} \]

For prior parameters \(\mu_0 =0 ,\, \nu=0,\, \alpha=-0.5 ,\, \beta=0\), the posterior values are \[ \frac{\nu\mu_0+n\bar{x}}{\nu+n} ,\, \nu+n,\, \alpha+\frac{n}{2}, \beta + \tfrac{1}{2} \sum_{i=1}^n (x_i - \bar{x})^2 + \frac{n\nu}{\nu+n}\frac{(\bar{x}-\mu_0)^2}{2} \] Plugging in our values \(n=10\) and \(\bar{x}=4.8\), we get the posterior values for the surface \[ \mu^* = \frac{0+10*4.8}{0+10} = 4.8 ,\, \nu^* = 0+10=10,\, \alpha^* = -0.5+\frac{10}{2} = 4.5, \, \beta^* = 0 + \tfrac{1}{2} \sum_{i=1}^{10} (x_i - 4.8)^2 = 1.794 \]

s = c(3.74,4.61,4.00,4.67,4.87,5.12,4.52,5.29,5.74,5.48)
b = c(5.44,6.88,5.37,5.44,5.03,6.48,3.89,5.85,6.85,7.16)
print(mean(s))
[1] 4.8
print(mean(b))
[1] 5.8
print(0.5*sum((s - mean(s))^2))
[1] 1.8
print(0.5*sum((b - mean(b))^2))
[1] 4.6

Similar calculations for the bottom parameters give us the posterior values \[ \mu^*=\frac{0+10*5.84}{0+10} = 5.84,~ \nu^* = 0+10=10,\, \alpha^* = -0.5+\frac{10}{2} = 4.5, \, \beta^* = 0 + \tfrac{1}{2} \sum_{i=1}^{10} (x_i - 5.84)^2 = 4.627 \]

To summarise, we have:

\(\mu^*\) \(\nu^*\) \(\alpha^*\) \(\beta^*\)
surface 4.8 10 4.5 1.794
bottom 5.84 10 4.5 4.627
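The spread parameters of the marginal \(t\) distributions quoted below follow directly from these hyperparameters:

1/sqrt(10*4.5/1.794)    # spread for theta_s, ~ 0.20
1/sqrt(10*4.5/4.627)    # spread for theta_b, ~ 0.32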

The marginal distribution for \(\theta_s\) is nonstandard \(t\) with center 4.80, spread \(1/\sqrt{\nu^*\alpha^*/\beta^*} = 0.20\), and 9 degrees of freedom. The marginal distribution for \(\theta_b\) is nonstandard \(t\) with center 5.84, spread \(1/\sqrt{\nu^*\alpha^*/\beta^*} = 0.32\), and 9 degrees of freedom. Using the quantiles of the t distribution

qgamma(c(0.05,0.95),4.5,1.794) # marginal posterior for \rho_s
[1] 0.93 4.72
qgamma(c(0.05,0.95),4.5,4.627) # marginal posterior for \rho_b
[1] 0.36 1.83
4.8 + c(-1,1)*1.833*(1/sqrt(10*4.5/1.794)) # marginal posterior for \theta_s
[1] 4.4 5.2
5.84 + c(-1,1)*1.833*(1/sqrt(10*4.5/4.627)) # marginal posterior for \theta_b
[1] 5.3 6.4
set.seed(92) # Kuzy
# Direct Monte Carlo
numsim=5000
rhos=rgamma(numsim,4.5,1.794)
rhob=rgamma(numsim,4.5,4.627)
thetas=rnorm(numsim,4.8,1/sqrt(10*rhos))
thetab=rnorm(numsim,5.84,1/sqrt(10*rhob))


# Monte Carlo quantiles
quantile(rhos,c(0.05,0.95))
  5%  95% 
0.92 4.70 
quantile(rhob,c(0.05,0.95))
  5%  95% 
0.36 1.84 
quantile(thetas,c(0.05,0.95))
 5% 95% 
4.4 5.2 
quantile(thetab,c(0.05,0.95))
 5% 95% 
5.3 6.4 
# Estimate probability that bottom concentration is larger
sum(thetab>thetas)/numsim        # Estimated prob bottom mean is larger
[1] 0.99
sd(thetab>thetas)/sqrt(numsim)   # Standard error of estimate
[1] 0.0014
quantile(thetab-thetas, c(0.05,0.95))  # 90% credible interval
  5%  95% 
0.36 1.73 
# Estimate probability that bottom  precision is smaller
sum(rhob>rhos)/numsim        # Estimated prob bottom precision is larger
[1] 0.088
sd(rhob>rhos)/sqrt(numsim)   # Standard error of estimate
[1] 0.004
quantile(rhob-rhos, c(0.05,0.95))  # 90% credible interval
   5%   95% 
-3.84  0.28 
# Estimate probability that bottom standard deviation is larger
sigmab=1/sqrt(rhob)         # bottom standard deviation
sigmas=1/sqrt(rhos)         # surfact standard deviation
sum(sigmab>sigmas)/numsim    # Estimated prob bottom std dev is larger
[1] 0.91
sd(sigmab>sigmas)/sqrt(numsim)   # Standard error of estimate
[1] 0.004
quantile(sigmab-sigmas, c(0.05,0.95))  # 90% credible interval
  5%  95% 
-0.1  1.0 
  4. The results provide strong evidence that the mean HCB concentration in the bottom samples is larger than the mean HCB concentration in the surface samples. There is also evidence that the surface concentrations may have less variation around the mean than the bottom observations, but because of the small sample size, more evidence is needed to make a definitive assessment of whether the standard deviations are different. A normality test gives p-values of 0.93 for the surface observations and 0.53 for the bottom observations, so the hypothesis of normality is not rejected; the normal Q-Q plots below tell the same story. Therefore, it seems reasonable to proceed with the assumption that the observations are normally distributed.
qqnorm(s,main="Normal Q-Q Plot for Surface Data"); qqline(s)
qqnorm(b,main="Normal Q-Q Plot for Bottom Data"); qqline(b)

  5. The posterior predictive distribution for the sample mean of a future sample of size \(m\) under the Normal-Gamma model is a \(t\) distribution with center \(\mu^*\), spread \[ \dfrac{1}{\sqrt{\dfrac{\nu^*m}{\nu^*+m}\alpha^*/\beta^*}} \] and \(2\alpha^* = 9\) degrees of freedom. For \(m=10\) and the posterior parameters of the surface data, we have spread \[ \dfrac{1}{\sqrt{\dfrac{10*10}{10+10}*4.5/1.794}} = 0.28 \]
ss10 = 1/sqrt(100/20*4.5/1.794)
sprintf("Spread for surface data with m=10: %.2f",ss10)
[1] "Spread for surface data with m=10: 0.28"

for \(m=40\), we have spread

ss40 = 1/sqrt(400/50*4.5/1.794)
sprintf("Spread for surface data with m=40: %.2f",ss40)
[1] "Spread for surface data with m=40: 0.22"

For the bottom data, we have spread

sb10 = 1/sqrt(100/20*4.5/4.627)
sprintf("Spread for bottom data with m=10: %.2f",sb10)
[1] "Spread for bottom data with m=10: 0.45"
sb40 = 1/sqrt(400/50*4.5/4.627)
sprintf("Spread for bottom data with m=40: %.2f",sb40)
[1] "Spread for bottom data with m=40: 0.36"

Now we can calculate 95% credible intervals for the sample mean of each future sample

4.8+ss10*qt(c(0.025,0.975),9)
[1] 4.2 5.4
4.8+ss40*qt(c(0.025,0.975),9)
[1] 4.3 5.3
5.839+sb40*qt(c(0.025,0.975),9)
[1] 5.0 6.7
5.839+sb10*qt(c(0.025,0.975),9)
[1] 4.8 6.9

As we would expect, the intervals for m=40 are entirely contained within the intervals for m=10, and the intervals for the posterior mean (computed above) lie entirely within the intervals for the corresponding sample means. This is because the posterior mean intervals include uncertainty only about the surface and bottom mean, whereas the intervals in this assignment include uncertainty about both the true means and the observations. The sample mean for 40 observations has less uncertainty than the sample mean for 10 observations. The intervals for surface and bottom sample means overlap more for m=10 than for m=40, and both have more overlap than the posterior mean intervals.

  6. We do this in two different ways and verify that they give the same results to within sampling error. First, draw samples of \(\theta\) and \(\rho\) for each depth, draw ssize future observations from each, compute the sample means, and take their difference.
set.seed(92) # Kuzy
numsim=10000
ssize=10
rhob=rgamma(numsim,4.5,4.627) # sample bottom precision
thetab=rnorm(numsim,5.84,1/sqrt(10*rhob))     # sample bottom mean
rhos=rgamma(numsim,4.5,1.794) # sample surface precision
thetas=rnorm(numsim,4.8,1/sqrt(10*rhos))     # sample surface mean
diff = 0     # Difference in sample means - initialize to 0
for (i in 1:ssize) {
  diff = diff + 
    rnorm(numsim,thetab,1/sqrt(rhob)) -    # sample bottom observation
    rnorm(numsim,thetas,1/sqrt(rhos))      # sample surface observation and subtract
  }
diff = diff / ssize    # Calculate average - divide sum by number of observations

quantile(diff, c(0.025,0.975))   # quantiles
 2.5% 97.5% 
-0.16  2.23 
plot(density(diff),              # kernel density plot
     main=paste("Density for Difference in Sample Means for m =",ssize))

We can also sample directly from the two posterior predictive \(t\) distributions (two independent samples, one for the surface mean and one for the bottom mean):

diff1 = (5.84 - 4.8) + rt(numsim,2*4.5)*ss10 + rt(numsim,2*4.5)*sb10
# by symmetry of the t distribution this has the same distribution as
# (5.84 + t_b*sb10) - (4.8 + t_s*ss10)
quantile(diff1,c(0.025,0.975))  # Compare with quantiles from full sampling method
 2.5% 97.5% 
-0.17  2.23 
# Known standard deviation model

mu0=0           # prior mean for theta is 0
tau0=Inf        # prior SD of theta is infinite

sigmas = sd(s) # observation SD is assumed known and equal to sample SD
sigmab = sd(b) # observation SD is assumed known and equal to sample SD
n=10
mus.star =      # Posterior mean for surface
  (mu0/tau0^2 + sum(s)/sigmas^2)/(1/tau0^2+n/sigmas^2)
taus.star =     # Posterior SD of surface mean
  (1/tau0^2+n/sigmas^2)^(-1/2)
mub.star =      # Posterior mean for bottom
  (mu0/tau0^2 + sum(b)/sigmab^2)/(1/tau0^2+n/sigmab^2)  
taub.star =     # Posterior SD of bottom mean
  (1/tau0^2+n/sigmab^2)^(-1/2)

sprintf("Posterior mean for surface: %.2f",mus.star)
[1] "Posterior mean for surface: 4.80"
sprintf("Posterior mean for bottom: %.2f",mub.star)
[1] "Posterior mean for bottom: 5.84"
# Predictive intervals

qnorm(c(0.025,0.975),    # Surface, sample size 10
      mus.star,sqrt(taus.star^2 + sigmas^2/10))
[1] 4.3 5.4
qnorm(c(0.025,0.975),    # Surface, sample size 40
      mus.star,sqrt(taus.star^2 + sigmas^2/40))
[1] 4.4 5.2
qnorm(c(0.025,0.975),    # Bottom, sample size 10
      mub.star,sqrt(taub.star^2 + sigmab^2/10))
[1] 5.0 6.7
qnorm(c(0.025,0.975),    # Bottom, sample size 40
      mub.star,sqrt(taub.star^2 + sigmab^2/40))
[1] 5.1 6.5
# Kernel density plot
plot(density(diff),main=paste("Density for Difference in Sample Means"))

Exercise 80 (Gibbs: Bird feeders) A biologist counts the number of sparrows visiting six bird feeders placed on a given day.

Feeder Number of Birds
1 11
2 22
3 13
4 24
5 19
6 16
  • Assume that the bird counts are independent Poisson random variables with feeder- dependent means \(\lambda_i\), for \(i=1,\ldots,6\).
  • Assume that the means \(\lambda_i\) are independent and identically distributed gamma random variables with shape a and scale b (or equivalently, shape a and mean m = ab )
  • The mean m = ab of the gamma distribution is uniformly distributed on a grid of 200 equally spaced values starting at 5 and ending at 40.
  • The shape a is independent of the mean m and has a distribution that takes values on a grid of 200 equally spaced points starting at 1 and ending at 50, with prior probabilities proportional to a gamma density with shape 1 and scale 5.
  1. Use Gibbs sampling to draw 10000 samples from the joint posterior distribution of the mean m, the shape parameter a, and the six mean parameters \(\lambda_i\), \(i=1,\ldots,6\), conditional on the observed bird counts. Using your sample, calculate 95% credible intervals for the mean m, the shape a, and the six mean parameters \(\lambda_i\), \(i=1,\ldots,6\).
  2. Find the effective sample size for the Monte Carlo samples of the mean m, the shape parameter a, and the six mean parameters \(\lambda_i\), \(i=1,\ldots,6\).
  3. Do traceplots for the mean m, the shape parameter a, and the six rate parameters \(\lambda_i\), \(i=1,\ldots,6\).
  4. The fourth feeder had the highest bird count and the first feeder had the lowest bird count. Use your Monte Carlo sample to estimate the posterior probability that the first feeder has a smaller mean bird count than the fourth feeder. Explain how you obtained your estimate.
  5. Discuss your results.

Exercise 81 (Poisson Distribution for EPL) We will analyze EPL data. Use epl.R as a starting script. This script uses the football-data.org API to download the results for two EPL teams: Manchester United and Chelsea. Model the GoalsHome and GoalsAway for each of the teams using a Poisson distribution. Given these distributions, calculate the following:

  1. What’s the probability of a nil-nil (0 - 0) draw?
  2. What’s the probability that MU wins the match?
  3. Discuss how could you improve your model based on four Poisson distributions.
  4. Use R and random number simulation to provide empirical answers. Using a random sample of size N = 10,000, check and see how close you get to the theoretical answers you’ve found to the questions posed above. Provide histograms of the distributions you simulate.

Hint: The difference of two Poisson random variables follows a Skellam distribution, implemented in the skellam package. You can use dskellam to calculate the probability of a draw and rpois to simulate draws from the Poisson distribution; a minimal sketch follows below.
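A minimal sketch of the calculation, using made-up goal rates (lambda.mu and lambda.ch are placeholders; in practice they would be estimated from the downloaded data):

library(skellam)
lambda.mu = 1.7   # hypothetical mean goals per match for Manchester United
lambda.ch = 1.4   # hypothetical mean goals per match for Chelsea
dpois(0, lambda.mu) * dpois(0, lambda.ch)    # P(0-0 draw), assuming independent scores
1 - pskellam(0, lambda.mu, lambda.ch)        # P(MU scores more goals than Chelsea)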

Exercise 82 (Homicide Rate (Poisson Distribution)) Suppose that there are \(1.5\) homicides a week on average. This is the rate, so \(\lambda=1.5\). The Poisson distribution tells us that there is still a \(1.4\)% chance of seeing \(5\) homicides in a week \[ p( X= 5 ) = \frac{e^{ - 1.5 } ( 1.5 )^5 }{5!} = 0.014 \] On average this will happen once every \(71\) weeks, roughly once every year and a half.

What’s the chance of having zero homicides in a week?

Solution:

From the probability distribution we calculate \[ p( X= 0 ) = \frac{e^{ - 1.5 } ( 1.5 )^0 }{0!} = 0.22 \] In R

dpois(0, 1.5)
[1] 0.22

So the probability is 22%; you shouldn’t be surprised to see a week with no homicides.
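The figures quoted in the exercise statement can be checked the same way:

dpois(5, 1.5)      # ~ 0.014, probability of exactly 5 homicides in a week
1/dpois(5, 1.5)    # ~ 71, so on average about once every 71 weeks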

Exercise 83 (True/False (Binomial Distribution))  

  1. The binomial distribution is a discrete probability distribution.
  2. Assuming that Joe DiMaggio’s batting average is \(0.325\) per at-bat and his hits are independent, he has a probability of about \(12\)% of getting more than \(2\) hits in \(4\) at-bats.
  3. Suppose that you toss a fair coin with probability \(0.5\) of a head. The probability of getting five heads in a row is less than three percent.
  4. Suppose that you toss a biased coin with probability \(0.25\) of getting a head. The probability of getting five heads out of ten tosses is less than thirty percent.
  5. Suppose that you toss a coin \(5\) times. Then there are \(10\) ways of getting \(3\) heads.
  6. The probability of observing three heads out of five tosses of a fair coin is \(0.6\).
  7. A mortgage bank knows from experience that \(2\)% of residential loans will go into default. Suppose it makes \(10\) such loans; then the probability that at least one goes into default is \(95\)%.
  8. Jessica Simpson is not a professional bowler and \(40\)% of her bowling swings are gutter balls. She is planning to take \(90\) bowling swings. The mean and standard deviation of the number of gutter balls are \(\mu = 36\) and \(\sigma = 3.65\).
  9. The probability of at least one head when tossing a fair coin \(4\) times is \(0.9375\).
  10. The Red Sox are to play the Yankees in a seven game series. Assume that the Red Sox have a 50% chance of winning each game, with the results being independent of each other. Then the probability of the series ending 4-3 in favor of the Red Sox is \(0.5^{7}=0.0078\).
  11. Suppose that \(X\) is Binomially distributed with \(E(X)=5\) and \(Var(X)=2\),then \(n=10\) and \(p=0.5\).
  12. If \(X\) is a Bernoulli random variable with probability of success, \(p\),then its variance is \(V(X)=p(1-p)\).
  13. Historically 15% of chips manufactured by a computer company are defective. The probability of a random sample of 10 chips containing exactly one defect is 0.15.

Solution:

  1. True.
  2. False. We need to calculate \[\begin{aligned}P(hits > 2) &=& P(hits = 3) + P(hits = 4) \\&=& \binom{4}{3}\hat{p}^3(1-\hat{p}) + \binom{4}{4}\hat{p}^4 \\&=& 10.4\%\end{aligned}\]
  3. False. Given a fair coin with 50% probability of a head, \(0.5^5 = 0.0312 > 0.03\).
  4. True. This follows from a binomial distribution with \(n=10\). \(Prob(5H) = \left ( \begin{array}{c}10\\5 \end{array} \right ) ( 0.25)^5 (0.75)^5 = 0.0583\)
  5. True. We see this by computing: \[\binom{5}{3} = \frac{5!}{3!(5-3)!}=10\]
  6. False. \(P(3 \; heads)=0.5^{5}=0.03125\). \(10\) such combinations are possible so the probability is \(0.3125\)
  7. False. Using \(P( \mathrm{ at \; least \; one} ) = 1 - P( \mathrm{ none} )\) gives \(1 - ( 0.98 )^{10} = 0.1829\).
  8. False. From a Binomial distribution \(E(X) = np = 90 \times 0.4 = 36\) and the standard deviation is \(s_X = \sqrt{ n p ( 1 - p ) } = \sqrt{ 21.6 } = 4.65\).
  9. True. \(P( \mathrm{ at \; least \; one \; head} ) = 1 - ( 0.5 )^4 = 0.9375\)
  10. False. The total wins of the Red Sox is a binomial random variable with \(n=7\) and \(p=0.5\). The probability of \(4\) wins is then \(\binom{7}{4} 0.5^{4} 0.5^{7-4}=35\cdot (0.5)^{7}=0.27\). Alternatively, if the series is stopped as soon as one team wins 4 matches, the probability of a 4-3 outcome in favor of the Red Sox is \(\binom{6}{3} 0.5^{3} 0.5^{6-3}\cdot 0.5=0.16\), since the teams must first reach a 3-3 tie.
  11. False. The variance of a \(Bin(10, 0.5)\) random variable is\(np(1-p) = 2.5\) and not \(2\)
  12. True. The Bernoulli random variable is the building block of the binomial (\(n=1\)) and hence has variance \(p(1-p)\)
  13. False. The probability of one defective is given by the binomial probability \(10 \times 0.15 \times (0.85)^9 = 0.3474\)
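A couple of the numerical answers above (items 2 and 13) are easy to verify in R:

1 - pbinom(2, 4, 0.325)    # P(more than 2 hits in 4 at-bats) ~ 0.104
dbinom(1, 10, 0.15)        # P(exactly one defective chip in 10) ~ 0.347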

Exercise 84 (True/False (Poisson Distribution))  

  1. If \(X \sim Poi (2)\) and \(Y \sim Poi (3)\), then \(X+Y \sim Poi (6)\).
  2. Arsenal are playing Burnley at home in an English Premier League (EPL) game this weekend. They are favorites to win. They have a Poisson distribution for the number of goals they will score with a mean rate of \(2.5\) per game. Given this, the odds of Arsenal scoring at least two goals are greater than \(50\)%.
  3. Arsenal are playing Swansea tomorrow in an English Premier League (EPL) game. They are favorites to win. The number of goals they expect to score is Poisson with a mean rate of \(2.2\). Given this, the odds of Arsenal scoring at least one goal are greater than \(60\%\).
  4. Arsenal are playing Liverpool at home in an EPL game this weekend. You think that the number of goals to be scored by both teams follow a Poisson distribution with rates \(2.2\) and \(1.6\) respectively. Given this, the odds of a scoreless \(0-0\) draw are \(45-1\).
  5. Suppose your website gets on average \(2\) hits per hour. Then the probability of at least one hit in the next hour is \(0.135\).
  6. The soccer team Manchester United scores on average two goals per game.Given that the distribution of goals is Poisson, the chance that they score two or less goals is \(87\)%

Solution:

  1. False. By the properties of the Poisson distribution, \[E(X+Y) = E(X) + E(Y) = 2+3 = 5,\] so \(X+Y\) does not follow Poi(6), which has mean 6. (In fact, if \(X\) and \(Y\) are independent, \(X+Y \sim Poi(5)\).)
  2. True. Here \(S\sim Pois(2.5)\). We need\(P(S \geq 2) = 1 - P(S \leq 1) = 1 - 0.2873 = 0.7127.\) In R we have ppois(1,2.5) = 0.28729
  3. True. P(#Goal > 0) = 1-P(#Goal = 0) = 1-ppois(0,2.2)=0.89.
  4. False. Assuming the two teams’ scores are independent, we have: \[\begin{aligned}Pr(Arsenal=0) &=& \frac{(2.2)^0 e^{-2.2}}{0!} = e^{-2.2} \\Pr(Liverpool=0) &=& \frac{(1.6)^0 e^{-1.6}}{0!} = e^{-1.6} \\\longrightarrow\quad Pr(Arsenal=0\;and\; Liverpool=0) &=& e^{-2.2}\times e^{-1.6} = e^{-3.8}\end{aligned}\]The odds are: \(O = (1-e^{-3.8})/e^{-3.8}\) or \(43.7\) to \(1\).
  5. False. dpois(0,2) = 0.135 \(= e^{-2}\), so \(P(\text{at least one hit}) = 1 - P(\text{no hits}) = 0.865\).
  6. False. We have that :\[E[g]=2\quad\Rightarrow\quad g\sim Pois(\lambda=2)\] So, we find the chance that they score two or fewer goals is:\[P(g\le2)=0.6767.\] In R: ppois(2,2)=0.67667
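Again, two quick checks in R (items 2 and 6):

1 - ppois(1, 2.5)    # P(Arsenal scores at least two goals) ~ 0.71
ppois(2, 2)          # P(two or fewer goals) ~ 0.68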

Exercise 85 (True/False (Normal Distribution))  

  1. The returns for Google stock on the day of earnings are normally distributed with a mean of \(5\)% and a standard deviation of \(5\)%. The probability that you will make money on the day of earnings is approximately \(60\)%.
  2. For any normal random variable, \(X\), we have \(\mathbb{P} \left ( \mu - \sigma < X < \mu + \sigma \right ) = 0.64\). Hint: You may use \(pnorm(1) = 0.841\)
  3. Suppose that the annual returns for Facebook stock are normally distributed with a mean of \(15\)% and a standard deviation of \(20\)%. The probability that Facebook has returns greater than \(10\)% for next year is \(60\)%
  4. Consider the standard normal random variable \(Z \sim N ( 0 , 1 )\). Then the random variable \(-Z\) is also standard normal.
  5. A local bank experiences a \(2\)% default rate on residential loans made in a certain city. Suppose that the bank makes \(2000\) loans. Then the probability of more than \(50\) defaults is \(25\) percent.
  6. The Binomial distribution can be approximated by a normal distribution when the number of trials is large.
  7. Let \(X \sim N(5, 10)\). Then \(P \left (X>5 \right ) = \frac{1}{2}\).
  8. Shaquille O’Neal has a \(55\)% chance of making a free throw in Basketball. Suppose he has \(900\) free throws this year. Then the chance he makes more than \(500\) free throws is \(45\)%
  9. Suppose that the random variable \(X \sim N ( -2 , 4 )\) then \(- 2 X \sim N ( 4 , 16 )\).
  10. Mortimer’s steak house advertises that it is the home of the \(16\) ounce steak. They claim that the weight of their steaks is normally distributed with mean \(16\) and standard deviation \(2\). If this is so, then the probability that a steak weighs less than \(14\) ounces is \(16\)%.
  11. Advertising costs for a \(30\)-second commercial are assumed to be normally distributed with a mean of \(10,000\) and a standard deviation of \(1000\). Then the probability that a given commercial costs between \(9000\) and \(10,000\) is \(50\)%.
  12. In a sample of \(120\) Zagat’s ratings of Chicago restaurants, the average restaurant had a rating of \(19.6\) with a standard deviation of \(2.5\). If you randomly pick a restaurant, the chance that you pick one with a rating over \(25\) is less than \(1\)%.
  13. A hospital finds that \(20\)% of its bills are at least one month in arrears. A random sample of \(50\) bills were taken. Then the probability that less than \(10\) bills in the sample were at least one month in arrears is \(50\)%.
  14. A Chicago radio station believes \(30\)% of its listeners are younger than \(30\). Out of a sample of \(500\) they find that \(250\) are younger than \(30\). This data supports their claim at the \(1\)% level.
  15. The probability that a standard normal distribution is more than \(1.96\)standard deviations from the mean is \(0.05\).
  16. Suppose that the amount of money spent at Disney World is normally distributed with a mean of $60 and a standard deviation of $15. Then approximately 45% of people spend more than $70 per visit.
  17. A Normal distribution with mean 4 and standard deviation 3.6 will provide a good approximation to a Binomial random variable with parameters \(n=40\) and \(p= 0.10\).
  18. If \(X\) is normally distributed with mean \(3\) and variance \(9\) then the probability that \(X\) is greater than \(1\) is \(0.254\).

Solution:

  1. False. \[P(ret > 0) = 1 - P(ret \leq 0) = 1 - pnorm(0, 0.05, 0.05) = 84.1\%\]
  2. False. \(pnorm(1) = 0.841\) and \(2 \times pnorm(1)-1=0.6828>0.64\)
  3. True. Let R denote returns. Convert to standard normal and compute\[\mathbb{P} \left ( \frac{R- \mu}{\sigma} > \frac{0.1-\mu}{\sigma} \right ) = \mathbb{P} \left ( Z > - 0.25 \right ) = 0.5987\]where \(\mu = 0.15\) and \(\sigma = 0.2\).
  4. True. If \(X \sim N(0,1)\), then \(-X \sim N\Big(-1(0),(-1)^2\times 1\Big) = N(0,1)\). Moreover, we have that the pdf of \(Z\) and \(-Z\) is the same: \[\begin{aligned}f(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}z^2\right) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}(-z)^2\right) = f(-z)\end{aligned}\]Lastly, we can observe that all the even moments are the same due to symmetry and all the odd moments are zero. Thus, they are the same distribution.
  5. False. The probability of having more than 50 defaults can be calculated using a Normal approximation to the \(Bin ( 2000, 0.02) \approx N ( 2000 \times 0.02 , 2000 \times 0.02 \times 0.98 )\). So, the mean is 40 and the standard deviation is 6.261. Therefore, \[ \begin{aligned}Pr(default > 50) &=& Pr\Big(\frac{default-40}{6.261} > \frac{50-40}{6.261}\Big) = Pr(z>1.5972) \approx .055 \end{aligned} \]
  6. True, if \(n\) is large enough, a reasonable approximation to \(B(n,p)\) is \(N(np,np(1-p))\)
  7. True, since the mean is 5, then half the density will be to the right of 5
  8. False. \(\mu=np=495,\,\sigma=\sqrt{np(1-p)}=14.92\). Then \(P(X>500)=P(Z>0.335)=1-0.631=36.9\%\)
  9. True. \(E(-2X) = -2\times(-2) = 4\) and \(Var(-2X) = (-2)^2\times 4 = 16\), so \(- 2 X \sim N ( 4 , 16 )\).
  10. True. We have \(P( X < 14 ) = P \left ( \frac{ X - 16 }{2} < \frac{ 14 - 16 }{2} \right ) = P( Z < - 1 ) = 0.1587\)
  11. False. Required probability is \(0.5 - P( Z < -1 ) = 0.3413\)
  12. False. \(P(X>25)= 1 - P( Z < 2.16) = 0.0154\)
  13. True. Using the normal approximation with \(n=50, p=0.2\), we have \(X \sim N ( 10 , 8 )\); since the mean is \(10\), \(P(X<10)=0.5\)
  14. False. \(n=500\), \(\hat{p} \sim N \left ( p , \frac{p(1-p)}{n} \right )\) and \(H_0 : p = 0.3\) versus \(H_1 : p \neq 0.3\). This yields a \(99\)% confidence interval of \((0.247 , 0.353 )\). The sample proportion \(0.5\) doesn’t lie in the confidence interval, so the data does not support their claim. Or could use a \(Z\)-test
  15. True. In a normal distribution, 2.5% of the population is more than 1.96 standard deviations below the mean, and 2.5% is more than 1.96 standard deviations above the mean. Hence 5% of the distribution is more than 1.96 standard deviations from the mean
  16. False. \(z_{70}=\frac{70-60}{15}=0.67\) and 1-F(\(z_{70}\)) = 0.25. Hence, about 25% of people spend more than $70 per visit.
  17. False. The normal approximation to a Binomial \(Bin(n,p)\) is \(N ( np , np (1-p))\). In this case, we get \(np = 4\) and variance \(40 \times 0.1 \times 0.9 = 3.6\), i.e. a standard deviation of \(1.90\), not \(3.6\)
  18. False. The standardized \(Z\)-score is given by \(Z= (1-3)/3 = -0.667\). \(P(X> 1) = P(Z> -0.667) = P(Z < 0.667) = 0.7476\)
pnorm(0.667) # 0.7476139
[1] 0.75
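As a quick numerical check of a few of the normal-approximation items above (5, 8, and 16), using the stated parameters:

1 - pnorm(50, mean = 40, sd = sqrt(2000*0.02*0.98))   # item 5: about 0.055
1 - pnorm(500, mean = 495, sd = sqrt(900*0.55*0.45))  # item 8: about 0.369
1 - pnorm(70, mean = 60, sd = 15)                     # item 16: about 0.252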

Exercise 86 (Tarone Study) @tarone1982use reports data from 71 studies on tumor incidence in rats

  1. In one of the studies, 2 out of 13 rats had tumors. Assume there are 20 possible tumor probabilities: \(0.025, 0.075,\ldots, 0.975\). Assume that the tumor probability is uniformly distributed. Find the posterior distribution for the tumor probability given the data for this study.
  2. Repeat Part a for a second study in which 1 in 18 rats had a tumor.
  3. Parts a and b assumed that each study had a different tumor probability, and that these tumor probabilities were uniformly distributed a priori. Now, assume the tumor probabilities are the same for the two studies, and that this probability has a uniform prior distribution. Find the posterior distribution for the common tumor probability given the combined results from the two studies.
  4. Compare the three distributions for Parts a, b, and c. Comment on your results.

Solution:

We assume that tumors occur independently, with probability \(\Theta\) of a tumor for each rat in the study. Our data X consist of observations on N rats, in which we record which rats have tumors. We can use either of two methods to represent the likelihood of what we have observed:

Method 1: We list, for each observation, whether or not the rat has a tumor. We multiply together the probability of getting the observation we obtained: \(\Theta\) if the rat has a tumor, and 1-\(\Theta\) if the rat does not have a tumor. If x out of n rats have tumors, the probability of observing this sequence of observations is \(\Theta^x(1-\Theta)^{n-x}\). Using Method 1, the posterior distribution for \(\Theta\) is given by: \[ g (\theta | x ) = \dfrac{f ( x | \theta ) g (\theta )}{\sum_{i=1}^{20} f ( x | \theta_i ) g (\theta_i )} = \dfrac{\theta^x (1 − \theta )^{n- x}}{\sum_{i=1}^{20} \theta_i^x (1 − \theta_i )^{n − x}} \]

Method 2: We count how many of the rats have tumors. We remember from our previous statistics class that if the tumors occur independently with probability \(\Theta\), the number of rats with tumors out of a total of \(n\) rats will have a Binomial distribution with parameters \(n\) and \(\Theta\). The probability of getting \(x\) tumors is \[ C_n^x \theta^x (1 − \theta )^{n − x},\quad x = 0, 1, 2, ..., n,\quad \mathrm{where}\quad C_n^x = \dfrac{n!}{x!(n-x)!} \] Then, the posterior distribution for \(\Theta\) is given by: \[ g (\theta | x ) = \dfrac{f ( x | \theta ) g (\theta )}{\sum_{i=1}^{20} f ( x | \theta_i ) g (\theta_i )} = \dfrac{C_n^x \theta^x (1 − \theta )^{n − x}}{\sum_{i=1}^{20} C_n^x \theta_i^x (1 − \theta_i )^{n − x}} \]

In Method 1, we are representing the likelihood of getting this exact sequence of observations. In Method 2, we are representing the likelihood of getting this number of tumors.

It is clear that the two methods give the same posterior distribution, because the constant \(C_n^x\) cancels out of the numerator and denominator of Bayes rule. That is, the number of tumors is a sufficient statistic for \(\Theta\), and the order in which the tumors occur is irrelevant.

Part a. The posterior distribution for \(\Theta_1\), the tumor probability in the first study, is: \[ g (\theta | x =2) = \dfrac{\theta^2 (1 − \theta )^{11}}{\sum_{i=1}^{20} \theta_i^2 (1 − \theta_i )^{11}} \]

theta <- seq(0.025, 0.975, 0.05)
prior <- rep(1/20, 20)
tarone = function(n,k) {
    likelihood <- theta^k * (1 - theta)^(n-k)
    posterior <- likelihood * prior / sum(likelihood * prior)
    print(knitr::kable(data.frame(theta, likelihood, posterior)))
    barplot(posterior, names.arg = theta, xlab = expression(theta), ylab = "Pr(theta | X)")
    return(posterior)
}
posta = tarone(13,2)


| theta| likelihood| posterior|
|-----:|----------:|---------:|
|  0.03|          0|      0.03|
|  0.08|          0|      0.13|
|  0.12|          0|      0.20|
|  0.18|          0|      0.20|
|  0.22|          0|      0.17|
|  0.28|          0|      0.12|
|  0.33|          0|      0.08|
|  0.38|          0|      0.04|
|  0.43|          0|      0.02|
|  0.48|          0|      0.01|
|  0.52|          0|      0.00|
|  0.58|          0|      0.00|
|  0.63|          0|      0.00|
|  0.68|          0|      0.00|
|  0.73|          0|      0.00|
|  0.78|          0|      0.00|
|  0.83|          0|      0.00|
|  0.88|          0|      0.00|
|  0.92|          0|      0.00|
|  0.98|          0|      0.00|

Part b. The posterior distribution for \(\Theta_2\), the tumor probability in the second study, is: \[ g (\theta | x =1) = \dfrac{\theta (1 − \theta )^{17}}{\sum_{i=1}^{20} \theta_i(1 − \theta_i )^{17}} \]

postb  = tarone(18,1)


| theta| likelihood| posterior|
|-----:|----------:|---------:|
|  0.03|       0.02|      0.27|
|  0.08|       0.02|      0.33|
|  0.12|       0.01|      0.21|
|  0.18|       0.01|      0.11|
|  0.22|       0.00|      0.05|
|  0.28|       0.00|      0.02|
|  0.33|       0.00|      0.01|
|  0.38|       0.00|      0.00|
|  0.43|       0.00|      0.00|
|  0.48|       0.00|      0.00|
|  0.52|       0.00|      0.00|
|  0.58|       0.00|      0.00|
|  0.63|       0.00|      0.00|
|  0.68|       0.00|      0.00|
|  0.73|       0.00|      0.00|
|  0.78|       0.00|      0.00|
|  0.83|       0.00|      0.00|
|  0.88|       0.00|      0.00|
|  0.92|       0.00|      0.00|
|  0.98|       0.00|      0.00|

Part c. If the tumor probabilities are the same in the two studies, it is appropriate to treat them as a single study in which 3 rats out of 31 have a tumor. The posterior distribution for \(\Theta\), the common tumor probability, is: \[ g (\theta | x =3) = \dfrac{\theta^3 (1 − \theta )^{28}}{\sum_{i=1}^{20} \theta_i^3(1 − \theta_i )^{28}} \]

postc = tarone(31,3)


| theta| likelihood| posterior|
|-----:|----------:|---------:|
|  0.03|          0|      0.06|
|  0.08|          0|      0.34|
|  0.12|          0|      0.34|
|  0.18|          0|      0.18|
|  0.22|          0|      0.07|
|  0.28|          0|      0.02|
|  0.33|          0|      0.00|
|  0.38|          0|      0.00|
|  0.43|          0|      0.00|
|  0.48|          0|      0.00|
|  0.52|          0|      0.00|
|  0.58|          0|      0.00|
|  0.63|          0|      0.00|
|  0.68|          0|      0.00|
|  0.73|          0|      0.00|
|  0.78|          0|      0.00|
|  0.83|          0|      0.00|
|  0.88|          0|      0.00|
|  0.92|          0|      0.00|
|  0.98|          0|      0.00|

It is useful to verify that we get the same posterior distribution by using Method 2 to calculate the likelihood of the data. It is also useful to verify that we would get the same results in Part c if we used the posterior distribution from Part a as the prior distribution for Part c, and the likelihood from Part b as the likelihood for part c. That is, we would get the same result from a two-step process, in which we started with a uniform prior distribution, incorporated the results from the first study to obtain a posterior distribution, and then incorporated the results from the second study to get a new posterior distribution.
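A minimal numerical check of both claims, reusing theta, prior, posta, and postc from the code above:

lik2 = dbinom(2, size = 13, prob = theta)        # Method 2 likelihood for Part a
post2 = lik2 * prior / sum(lik2 * prior)
max(abs(post2 - posta))                          # essentially zero
likb = dbinom(1, size = 18, prob = theta)        # Part b likelihood
postseq = likb * posta / sum(likb * posta)       # two-step update: Part a posterior as prior
max(abs(postseq - postc))                        # essentially zero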

Part d. We can compare the distributions more easily if we plot the prior distribution, the posterior distributions for the parameters of the two individual studies, and the combined posterior distribution together (see below). We see that the posterior distribution from the first study puts more probability on higher values of \(\Theta\) than the posterior distribution for the second study. This is not surprising, because a higher proportion of rats in the first study had tumors. There is a great deal of overlap in the two distributions, which suggests that the two probabilities may be the same. Thus, it is reasonable to consider combining the two samples. If there had been very little overlap in the posterior distributions from Parts a and b, it would not have been reasonable to combine the studies. Later in the semester, we will look more carefully at the problem of when to combine probabilities and when to estimate them separately. When the studies are combined, the posterior distribution is narrower than the posterior distribution for either of the two individual studies, with less probability on very low and very high values of \(\Theta\). Note that the combined distribution is consistent with both of the individual distributions, in the sense that it places high probability on values that also have high probability in the other two distributions. The combined distribution is more concentrated around values close to the combined sample proportion of about 1 in 10.

allDist=rbind(prior,posta,postb,postc) 
barplot(allDist,main="Prior and Posterior Distributions for Tumor Probability",xlab="Theta",ylab="Probability", col=c("mediumpurple","lightblue","pink","lightgreen"), border=c("darkviolet","darkblue", "red","darkgreen"),names.arg=theta, legend=c("Prior","Study 1", "Study 2", "Both Studies"), beside=TRUE, ylim=c(0,0.4))

Exercise 87 Let \(X\) and \(Y\) be independent and identically distributed with an \(Exp(1)\) distribution. Their joint distribution is given by

\[ f_{X,Y}(x,y) = \exp(-x)\exp(-y) \mathrm{ , where} \; \; 0 < x,y < \infty \]

  1. Use the convolution formula to find the distribution of \(X + \frac{1}{2} Y\). Check your answer by also using a moment generating function approach.
  2. Guess what happens if you consider \(X + \frac{1}{2} Y + \frac{1}{3} Z\) where \(Z\) is also \(Exp(1)\)?

Solution:

First, we need to show the family of exponential distributions is closed under scaling. Let \(Z = aY\) so that \[P(Z \leq z) = P(Y \leq z/a) = 1 - \exp(-z/a)\] Therefore, \(Z \sim Exp(1/a)\) with density \(f_{aY}(y) = \frac{1}{a}\exp(-y/a)\).

Let \(W = X + \frac{1}{2}Y\). Since \(\frac{1}{2}Y\) has density \(2e^{-2u}\), the convolution formula gives \[\begin{aligned} f_W(w) &=& \int_0^w f_X(x)f_{\frac{1}{2}Y}(w-x)dx \\ &=& \int_0^w e^{-x}\, 2e^{-2(w-x)}dx \\ &=& 2e^{-2w}\int_0^w e^{x}dx \\ &=& 2\left(e^{-w}-e^{-2w}\right), \quad w > 0\end{aligned}\] As a check, the moment generating function is \(M_W(t) = M_X(t)M_{\frac{1}{2}Y}(t) = \frac{1}{1-t}\cdot\frac{1}{1-t/2} = \frac{2}{(1-t)(2-t)}\), which is exactly \(\int_0^\infty e^{tw}\,2(e^{-w}-e^{-2w})dw\).

\[\begin{aligned} E(W) &=& \int_0^\infty w f_W(w)dw = 2\int_0^\infty w\left(e^{-w}-e^{-2w}\right)dw \\ &=& 2\left(1-\frac{1}{4}\right) = \frac{3}{2} = E(X) + \frac{1}{2}E(Y)\end{aligned}\]

Repeating the convolution with \(\frac{1}{3}Z\) (density \(3e^{-3u}\)), the density function for \(W = X + \frac{1}{2} Y + \frac{1}{3} Z\) is \[f_W(w) = 3e^{-w} - 6e^{-2w} + 3e^{-3w} = 3e^{-w}\left(1-e^{-w}\right)^2, \quad w > 0,\] a sum-of-exponentials (hypoexponential) density with mean \(1 + \frac{1}{2} + \frac{1}{3} = \frac{11}{6}\). A Monte Carlo check of both densities is given below.
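A quick Monte Carlo check of both densities (simulation sketch):

set.seed(1)
w2 = rexp(1e5) + 0.5 * rexp(1e5)                       # draws of X + Y/2
hist(w2, breaks = 100, freq = FALSE, main = "X + Y/2")
curve(2 * (exp(-x) - exp(-2 * x)), add = TRUE, col = "red")    # derived density
w3 = rexp(1e5) + 0.5 * rexp(1e5) + rexp(1e5) / 3       # draws of X + Y/2 + Z/3
hist(w3, breaks = 100, freq = FALSE, main = "X + Y/2 + Z/3")
curve(3 * exp(-x) * (1 - exp(-x))^2, add = TRUE, col = "red")  # derived density
c(mean(w2), mean(w3))                                  # close to 3/2 and 11/6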

Exercise 88 (Poisson MLE vs Bayes) You are developing tools for monitoring the number of advertisement clicks on a website. You have observed the following data:

y = c(4,1,3,4,3,2,7,3,4,6,5,5,3,2,4,5,4,7,5,2)

which represents the number of clicks every minute over the last 20 minutes. You assume that the number of clicks per minute follows a Poisson distribution with parameter \(\lambda\).

  1. Plot likelihood function for \(\lambda\).
  2. Estimate \(\lambda\) using Maximum Likelihood Estimation (MLE). MLE is the value of \(\lambda\) that maximizes the likelihood function or log-likelihood function. Maximizing likelihood is equivalent to maximizing the log-likelihood function (log is a monotonically increasing function).
  3. Using barplot, plot the predicted vs observed probabilities for the number of clicks from 1 to 7. Is the model a good fit?
  4. Assume that you know that, historically, the average number of clicks per minute is 4 and the variance is also 4. Those numbers were calculated over a long period of time. You can use this information as a prior. Assume that the prior distribution is \(Gamma(\alpha,\beta)\). What would be appropriate values for \(\alpha\) and \(\beta\) that would represent this prior information?
  5. Find the posterior distribution for \(\lambda\) and calculate the Bayesian estimate for \(\lambda\) as the expectation over the posterior.
  6. After collecting data for a few days, you realized that about 20% of the observations are zero. How this information would change your prior distribution? This is an open-ended question.

Hint: For part c, you can use the following code to calculate the predicted probabilities for the number of clicks from 1 to 7.

tb = table(y)
observed = tb/length(y)
predicted = dpois(1:7,lambda_mle)

Solution: a)

lambda = seq(0, 10, 0.1)
likelihood = sapply(lambda, function(l) prod(dpois(y, l)))
plot(lambda, likelihood, type = "l", xlab = expression(lambda), ylab = "Likelihood")

lambda_mle = optim(par = 1, fn = function(l) -sum(dpois(y, l, log = TRUE)))$par
Warning in optim(par = 1, fn = function(l) -sum(dpois(y, l, log = TRUE))): one-dimensional optimization by Nelder-Mead is unreliable:
use "Brent" or optimize() directly
lambda_mle
[1] 4

Alternatively, the likelihood function is \[ L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{y_i}e^{-\lambda}}{y_i!} \] and the log-likelihood function is \[ \log L(\lambda) = \sum_{i=1}^{n} y_i \log(\lambda) - n\lambda \] The MLE is the value of \(\lambda\) that maximizes the log-likelihood function. We can find the MLE by taking the derivative of the log-likelihood function with respect to \(\lambda\) and setting it equal to zero. This gives \[ \frac{d}{d\lambda} \log L(\lambda) = \frac{1}{\lambda} \sum_{i=1}^{n} y_i - n = 0 \] Solving for \(\lambda\) gives \[ \lambda = \frac{1}{n} \sum_{i=1}^{n} y_i. \] A simple average of the data gives the MLE for \(\lambda\).

mean(y)
[1] 4
pred = table(y)/length(y)
poisson = dpois(1:7,lambda_mle)
d = cbind(pred, poisson)
barplot(t(d),names.arg=1:7, xlab = "Clicks", ylab = "Frequency", beside = TRUE)

  4. We know that \(E(\lambda) = \alpha/\beta = 4\) and \(Var(\lambda) = \alpha/\beta^2 = 4\). Solving gives \(\beta = 1\) and \(\alpha = 4\), which matches the historical information.
  5. With a \(Gamma(\alpha,\beta)\) prior, the posterior is \(Gamma(\alpha + \sum_i y_i, \beta + n)\), and the Bayesian estimate is the posterior mean \((\alpha + \sum_i y_i)/(\beta + n)\).
alpha = 4
beta = 1
alphas = alpha + sum(y)
betas = beta + length(y)
lambda_bayes = alphas/betas
knitr::kable(data.frame(alpha = alphas, beta = betas, lambda_bayes = lambda_bayes))
| alpha| beta| lambda_bayes|
|-----:|----:|------------:|
|    83|   21|            4|
  6. We can use a mixture prior with one component concentrated around zero and another component concentrated around the mean calculated after removing the zeros from the data, with component weights 0.2 and 0.8 respectively. This is essentially a zero-inflated Poisson model; see the sketch below.
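A minimal simulation sketch of such a zero-inflated model (the 0.2 mixing weight and rate 4 are illustrative assumptions):

set.seed(1)
z = rbinom(1000, 1, 0.2)                  # structural zeros with weight 0.2
ysim = ifelse(z == 1, 0, rpois(1000, 4))  # otherwise Poisson with the historical mean
mean(ysim == 0)                           # about 0.21 (0.2 plus the Poisson zeros)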

Exercise 89 (Exponential Distribution) Let \(x_1, x_2,\ldots, x_N\) be an independent sample from the exponential distribution with density \(p (x | \lambda) = \lambda\exp (-\lambda x)\), \(x \ge 0\), \(\lambda> 0\). Find the maximum likelihood estimate \(\lambda_{\text{ML}}\). Choose the conjugate prior distribution \(p (\lambda)\), and find the posterior distribution \(p (\lambda | x_1,\ldots, x_N)\) and calculate the Bayesian estimate for \(\lambda\) as the expectation over the posterior.

Exercise 90 (Bernoulli Bayes) We have \(N\) Bernoulli trials with success probability in each trial being equal to \(q\), and we observed \(k\) successes. Find the conjugate distribution \(p (q)\). Find the posterior distribution \(p (q | k, N)\) and its expectation.

Exercise 91 (Exponential Family (Gamma)) Write the density function of the Gamma distribution in standard exponential family form. Find \(Ex\) and \(E\log x\) by differentiating the normalizing constant.

Exercise 92 (Exponential Family (Binomial)) Write the density function of the Binomial distribution in standard exponential family form. Find \(Ex\) and \(Var\,x\) by differentiating the normalizing constant.

Exercise 93 (Normal) Let \(X_1\) and \(X_2\) be independent \(N(0,1)\) random variables. Let \[Y_1 = X_1^2 + X_2^2, \quad Y_2 = \frac{X_1}{\sqrt{Y_1}}\]

  1. Find joint distribution of \(Y_1\) and \(Y_2\).
  2. Are \(Y_1\) and \(Y_2\) independent or not?
  3. Can you interpret the result geometrically?

Density of \(N(\mu, \sigma^2)\) is \[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]

AB Testing

Exercise 94 (A/B Testing) During a recent breakout of the flu, 850 out of 6,224 people diagnosed with the virus presented severe symptoms. During the same flu season, an experimental antiviral drug was being tested. The drug was given to 238 people with the flu and only 6 of them developed severe symptoms. Based on this information, can you conclude, for sure, that the drug is a success?

Solution:

Null hypothesis is \(H_0: p_1 - p_2 = 0\), \(H_a: p_1 - p_2 \neq 0\). \[ \hat{p}_1 = \frac{850}{6224} = 0.136, \quad \hat{p}_2 = \frac{6}{238} = 0.025, \quad \hat{p}_1 - \hat{p}_2 = 0.136-0.025 = 0.111 \] The standard error is \[ s = \sqrt{\frac{0.136 \times (1 - 0.136)}{6224} + \frac{0.025 \times ( 1- 0.025)}{238} }= 0.011 \] t-stat is \[ t = \frac{\hat{p}_1 - \hat{p}_2 - 0}{s} = \frac{0.136 - 0.025}{0.011} = 10.09 > 1.96 \]

Yes, given this information we can conclude that the drug is working.

If you do confidence interval, the \(95\%\) confidence interval of \(p_1 - p_2\) is \[ [0.111 - 1.96 \times 0.011, 0.111 + 1.96\times 0.011] = [0.089, 0.133] \] which doesn’t contain 0. We reject the null hypothesis.
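The same comparison can be run with prop.test (no continuity correction); its confidence interval matches the one above, and its test statistic pools the two proportions but leads to the same conclusion:

prop.test(x = c(850, 6), n = c(6224, 238), correct = FALSE)  # p-value far below 0.05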

Exercise 95 (Tesla Supplier) Tesla purchases Lithium as a raw material for their batteries from either of two suppliers and is concerned about the amounts of impurity the material contains. The percentage impurity levels in consignments of the Lithium follow a normal distribution closely, with the means and standard deviations given in the table below. The company is particularly anxious that the impurity level not exceed \(5\)% and wants to purchase from the supplier who is more likely to meet that specification.

| Supplier   | Mean | Standard Deviation |
|------------|-----:|-------------------:|
| Supplier A |  4.4 |                0.4 |
| Supplier B |  4.2 |                0.6 |
  1. Which supplier should be chosen?
  2. What if Supplier B implements some quality control which has no effect on the standard deviation but raises their mean to \(4.6\)?

Exercise 96 (Red Sox) On September 24, 2003, Pete Thamel in the New York Times reported that the Boston Red Sox had been accused of cheating by another American League Team. The claim was that the Red Sox had a much better winning record at home games than at games played in other cities.

The following table provides the wins and losses for home and away games for the Red Sox in the 2003 season

Record

| Team           | Home Wins | Home Losses | Away Wins | Away Losses |
|----------------|----------:|------------:|----------:|------------:|
| Boston Red Sox |        53 |          28 |        42 |          39 |

Is there any evidence that the proportion of wins is significantly different between home and away games?

Discuss any other issues that are relevant.

Hint: a 95% confidence interval for a difference in proportions \(p_1 - p_2\) is given by \[ ( \hat{p}_1 - \hat{p}_2 ) \pm 1.96 \sqrt{ \frac{ \hat{p}_1 ( 1 - \hat{p}_1 ) }{ n_1 } + \frac{ \hat{p}_2 ( 1 - \hat{p}_2 ) }{ n_2 } } \]

Solution:

The home and away winning proportions are given by \[ p_1 = \frac{53}{81} = 0.654 \; \; \text{ and} \; \; \text{ Var} = \frac{ p_1 ( 1- p_1 ) }{ n_1 } = \frac{53 \times 28 }{81^3} = 0.0028 \] and \[ p_2 = \frac{42}{81} = 0.518 \; \; \text{ and} \; \; \text{ Var} = \frac{ p_2 ( 1- p_2 ) }{ n_2 } = \frac{42 \times 39}{81^3} = 0.0031 \] Can either do the problem as a confidence interval or a hypothesis test.

For the CI, we get \[ \left ( \frac{53}{81} - \frac{42}{81} \right ) \pm 1.96 \sqrt{ \frac{53 \times 28 }{81^3} + \frac{42 \times 39}{81^3} } = ( -0.014 , 0.286 ) \] As the confidence interval contains zero there is no significant difference at the \(5\)% level.

For the hypothesis test we have the null hypothesis that \(H_0 : p_1 = p_2\) versus \(H_1 : p_1 \neq p_2\). Then the test statistic is approximately normally distributed (large \(n = n_1 + n_2\)) and is given by \[Z = \frac{ 0.654 - 0.518 }{\sqrt{ 0.0059} } \approx 1.77\] At the \(5\)% level the critical value is \(1.96\), so there is no statistical significance.
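The same test can be run with prop.test (which pools the proportions for the test statistic, but reaches the same conclusion):

prop.test(x = c(53, 42), n = c(81, 81), correct = FALSE)  # CI about (-0.01, 0.29); p-value about 0.08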

The major issue here is whether you should test whether the proportions from home and away are different. There might be a significant home-team bias in general, and we should test to see if the Red Sox advantage is significantly different from that. All else being equal, this will reduce the significance of the Red Sox home advantage bias.

Exercise 97 (Myocardial Infarction) In a five year study of the effects of aspirin on Myocardial Infarction (MI), or heart attack, you have the following dataset on the reduction of the probability of getting MI from taking aspirin versus a Placebo, or control.

| Treatment | with MI | without MI |
|-----------|--------:|-----------:|
| Placebo   |     198 |      10845 |
| Aspirin   |     104 |      10933 |
  1. Find a \(95\)% confidence interval for the difference in proportions \(p_1-p_2\).
  2. Perform a hypothesis test of the null \(H_0 : p_1 = p_2\) at a \(1\)% significance level.

Solution:

Converting to proportions:

| Treatment | with MI | without MI |     n |
|-----------|--------:|-----------:|------:|
| Placebo   | 0.01793 |    0.98207 | 11043 |
| Aspirin   | 0.00942 |    0.99058 | 11037 |
  1. The CI for a difference in proportion is: \[(\hat{p}_1-\hat{p}_2)\pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\] \[=(0.01793-0.00942)\pm 1.96\sqrt{\frac{0.01793(1-0.01793)}{11043}+\frac{0.00942(1-0.00942)}{11037}}\] \[=0.00851\pm 1.96\sqrt{0.0000015945+0.0000008454}=0.00851\pm 1.96\sqrt{0.0000024399}\] \[=0.00851\pm 1.96*0.00156205=0.00851\pm 0.00306\] Therefore, the CI for the difference in these proportions is: \[(0.00545,0.01157)\]
  2. At the 1% level, we would have an implied CI for the difference in proportions of: \[=0.00851\pm 2.58*0.00156205=0.00851\pm 0.00403\] Therefore, the CI for the difference in these proportions is: \[(0.00448,0.01254)\] We can see that this CI does not contain 0, so therefore we can reject the null that \(p_1=p_2\) at the 1% significance level.
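A check with prop.test at the 99% level (no continuity correction) gives essentially the same interval:

prop.test(x = c(198, 104), n = c(11043, 11037), conf.level = 0.99, correct = FALSE)  # 99% CI roughly (0.0045, 0.0125)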

Exercise 98 (San Francisco Giants) In October 1992, the ownership of the San Francisco Giants considered a sale of the franchise that would have resulted in a move to Florida. A survey from the San Francisco Chronicle found that in a random sample of \(625\) people, 50.7% would be disappointed by the move. Find a 95% confidence interval for the population proportion.

Solution:

The 95% CI is found by \[ \hat{p} \pm z_{\alpha /2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}=.507\pm 1.96\sqrt{\frac{.507\times .493}{625}}=.507\pm 2\times .020=\left(.467,.547\right) \]

Exercise 99 (Pfizer) Pfizer introduced Viagra in early \(1998\). During \(1998\), \(77\) of the \(6\) million Viagra users died from coronary problems such as heart attacks. Pfizer claimed that this rate is no more than that in the general population.

From a clinical study, you find that of \(1,500,000\) comparable men who were not on Viagra, \(11\) died of coronary problems in the same period of time during which the \(77\) Viagra users died in \(1998\).

Do you agree with Pfizer’s claim that the proportion of Viagra users dying from coronary problems is no more than that of other comparable men?

Solution:

A 95% confidence interval for a difference in proportions \(p_1 - p_2\) is given by \[( \hat{p}_1 - \hat{p}_2 ) \pm 1.96 \sqrt{ \frac{ \hat{p}_1 ( 1 - \hat{p}_1 ) }{ n_1 } + \frac{ \hat{p}_2 ( 1 - \hat{p}_2 ) }{ n_2 } }\]

Can do a confidence interval or a \(Z\)-score test.

With Viagra, \(\hat{p}_1 = 77/6000000 = 0.00001283\) and without Viagra \(\hat{p}_2 = 11/1500000 = 0.00000733\). Need to test whether these are equal.

With a \(95\)% confidence interval for \(( p_1 - p_2 )\), you get an interval of approximately \(( 0.0000003 , 0.0000107 )\), which doesn’t contain zero. Hence there is evidence that the proportion is higher.

Measured very accurately due to the large sample sizes.

With testing might use a one-sided test and an \(\alpha\) of \(0.01\).

Exercise 100 (Voter: CI) Given a random sample of \(1000\) voters, \(400\) say they will vote for Donald Trump if he wins the Republican nomination for the 2016 US Presidential Election. Given that he gets the nomination, a \(95\%\) confidence interval for the true proportion of voters that will vote for him includes \(45\%\).

Solution:

False, \(\hat p = \frac{400}{1000} = 40\%\) and \[ s_{\hat p} = \sqrt{\frac{\hat p(1-\hat p)}{1000}} = 0.0155 \] The \(95\%\) confidence interval is \(0.40 \pm 1.96 \times 0.0155 = [0.370, 0.430]\), which does not include \(45\%\).
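A one-line check of the interval:

0.4 + c(-1, 1) * 1.96 * sqrt(0.4 * 0.6 / 1000)  # about (0.370, 0.430)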

Exercise 101 (Survey AB testing) In a pre-election survey of \(435\) voters, \(10\) indicated that they planned to vote for Ralph Nader. In a separate survey of \(500\) people, \(250\) said they planned to vote for George Bush.

  1. Perform a hypothesis test at the 5% level for the hypothesis that Nader will get 3% or less of the vote.
  2. Find a 95% confidence interval for the difference in the proportion of people that will vote for George Bush versus Ralph Nader.

Solution:

  1. \(H_{0}:p\leq 0.03\) \(H_{1}:p>0.03\) \(z=\frac{\hat{p}-p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}}\) where \(\hat{p}=\frac{10}{435}=0.023\) and \(p_{0}=0.03\). Hence, \(z=\frac{0.023-0.03}{\sqrt{\frac{0.03\cdot 0.97}{435}}}=-0.86\), \(z_{\alpha }=1.65\) (one-sided test!) so we do not reject \(H_{0}\) as \(z<z_{\alpha }\).
  2. We know that the confidence interval is given by \((\hat{p}_{1}-\hat{p}_{2})\pm 1.96\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}+\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}\) and we have \(\hat{p}_{1}=\frac{250}{500}=0.5\) and \(\hat{p}_{2}=\frac{10}{435}=0.023\). Plugging everything into the formula we find: \((0.5-0.023)\pm 1.96\sqrt{\frac{0.5\cdot 0.5}{500}+\frac{0.023\cdot 0.977}{435}}=[0.431,0.523]\)

Exercise 102 (Significance Tests.) Some defendants in southern states have challenged verdicts made during the ’50s and ’60s, because of possible unfair jury selections. Juries are supposed to be selected at random from the population. In one specific case, only 12 jurors in an 80 person panel were African American. In the state in question, 50% of the population was African American. Could a jury with 12 African Americans out of 80 people be the result of pure chance?

Using \(n = 80\), \(X = 12\) find a 99% confidence interval for the proportion \(p\). Is that significantly different from \(p = 0.5\)?

Exercise 103 A marketing firm is studying the effects of background music on people’s buying behavior. A random sample of \(150\) people had classical music playing while shopping and \(200\) had pop music playing. The group that listened to classical music spent on average $ \(74\) with a standard deviation of $ \(18\) while the pop music group spent $ \(78.4\) with a standard deviation of $ \(12\).

  1. Test whether there is any significant difference between the difference in purchasing habits. Describe clearly your null and alternative hypotheses and any test statistics that you use.
  2. Is there a difference between using a \(5\)% and \(1\)% significance level.

Solution:

We are given the following: \(\bar{x}=74,s_{x}=18,n_{x}=150,\bar{y}=78.4,s_{y}=12,n_{y}=200\)

\[\begin{aligned} H_{0}:\mu_{x}-\mu_{y} & = & 0\\ H_{1}:\mu_{x}-\mu_{y} & \neq & 0\end{aligned} \]

In order to test for this hypothesis, we calculate the z-score and compare it with the z score at 5% and 1% confidence level.\[z=\frac{\bar{x}-\bar{y}}{\sqrt{\frac{s_{x}^{2}}{n_{x}}+\frac{s_{y}^{2}}{n_{y}}}}=\frac{74-78.4}{\sqrt{\frac{324}{150}+\frac{144}{200}}}=-2.592\]

Since \(|z|>z_{0.025}\), we can reject the null hypothesis at the 95% confidence level and say that there exist significant differences between purchasing habits.

For \(\alpha=0.05\), \(z_{\alpha/2}=1.96\) and for \(\alpha=0.01\), \(z_{\alpha/2}=2.575\). Thus, the answer does not change when we move from 95% to 99% confidence level.

Alternately, the above question can be solved by creating a confidence interval based on the differences. Again, we will reach the same conclusion.
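The same calculation in R:

z = (74 - 78.4) / sqrt(18^2/150 + 12^2/200)  # about -2.59
2 * pnorm(-abs(z))                           # two-sided p-value, about 0.0095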

Exercise 104 (Sensitivity and Specificity) Nvidia’s graphics chips are of such quality that the probability of a randomly chosen chip being defective is only \(0.1\)%. You have invented a new technology for testing whether a given chip is defective or not. This test will always identify a defective chip as defective and will only “falsely” identify a good chip as defective with probability \(1\)%

  1. What are the sensitivity and specificity of your testing device?
  2. Given that the test identifies a defective chip, what’s the posterior probability that it is actually defective?
  3. What percentage of the chips will the new technology identify as being defective?
  4. Should you advise Nvidia to go ahead and implement your testing device? Explain.

Solution:

We have the following probabilities (\(D\) represents “defective chip” and \(T\) represents “test result indicates defective chip”): \[P(D)=0.001\] \[P(T|D)=1\] \[P(T|\bar{D})=0.01\]

  1. Sensitivity: \(P(T|D)=1\) and Specificity: \(P(\bar{T}|\bar{D})=1-P(T|\bar{D})=1-0.01=0.99\)
  2. \[P(D|T)=\frac{P(T|D)P(D)}{P(T)}=\frac{P(T|D)P(D)}{P(T|D)P(D)+P(T|\bar{D})P(\bar{D})}\] \[=\frac{(1)(0.001)}{(1)(0.001)+(0.01)(1-0.001)}=0.090992\]
  3. \[P(T) = P(T|D)P(D)+P(T|\bar{D})P(\bar{D})\] \[=(1)(0.001)+(0.01)(1-0.001)=0.01099\]
  4. No, only 9% of those chips indicated to be defective by the test will actually be defective. Essentially, we would be throwing away 91% of the chips indicated to be defective even though they are perfectly fine!
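A minimal numerical check of these quantities, using the probabilities above:

pD = 0.001; sens = 1; fpr = 0.01
pT = sens * pD + fpr * (1 - pD)  # P(T), about 0.011
sens * pD / pT                   # P(D | T), about 0.091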

Exercise 105 (CI for Google) Google is test marketing a new website design to see if it increases the number of click-throughs on banner ads. In a small study of a million page views they find the following table of responses

|            | Total Viewers | Click-Throughs |
|------------|--------------:|---------------:|
| new design |       700,000 |         10,000 |
| old design |       300,000 |          2,000 |

Find a \(99\)% confidence interval for the increase in the proportion of people who click-through on banner ads using the new web design.

Hint: a 99% confidence interval for a difference in proportions \(p_1 - p_2\) is given by \(( \hat{p}_1 - \hat{p}_2 ) \pm 2.58\sqrt{ \frac{ \hat{p}_1 ( 1 - \hat{p}_1 ) }{ n_1 } + \frac{ \hat{p}_2 ( 1 - \hat{p}_2 ) }{ n_2 } }\)

Solution:

We are interested in the distribution of \(p_1-p_2\). We have that: \[ \begin{aligned} \hat{p}_1 = \frac{10000}{700000} = \frac{1}{70} \\ \hat{p}_2 = \frac{2000}{300000} = \frac{1}{150} \end{aligned} \]

Following the hint, we have that the \(99\%\) confidence interval is:

\[ \begin{aligned} p_1 - p_2 &\in & \Big[\hat{p}_1- \hat{p}_2 - 2.58\sqrt{\frac{\hat{p}_1(1-\hat{p}_1 )}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}, \hat{p}_1- \hat{p}_2 + 2.58\sqrt{\frac{\hat{p}_1(1-\hat{p}_1 )}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\Big] \\ &=& [0.00762-0.00053,0.00762+0.00053] \\ &=& [0.00709,0.00815] \end{aligned} \]
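A direct computation of the same interval in R, using the counts above:

p1 = 10000/700000; p2 = 2000/300000
se = sqrt(p1*(1-p1)/700000 + p2*(1-p2)/300000)
(p1 - p2) + c(-1, 1) * 2.58 * se  # about (0.0071, 0.0082)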

Exercise 106 (Amazon) Amazon is test marketing a new package delivery system. It wants to see if same-day service is feasible for packages bought with Amazon Prime. In a small study of a hundred thousand deliveries they find the following times for delivery

|            | Deliveries | Mean-Time (Hours) | Standard Deviation-Time |
|------------|-----------:|------------------:|------------------------:|
| new system |     80,000 |               4.5 |                     2.1 |
| old system |     20,000 |               5.6 |                     2.5 |
  1. Find a \(95\)% confidence interval for the decrease in delivery time.
  2. If they switch to the new system, what proportion of deliveries will be under \(5\) hours, which is required to guarantee same-day service?

Hint: a 95% confidence interval for a difference in means \(\mu_1 - \mu_2\) is given by \(( \bar{x}_1 - \bar{x}_2 ) \pm 1.96 \sqrt{ \frac{s_1^2 }{ n_1 } + \frac{ s_2^2 }{ n_2 } }\)

Solution:

  1. Follow the hint, the answer is (-1.14, -1.06).
(4.5-5.6)-1.96*sqrt(2.1^2/80000+2.5^2/20000) #-1.14
[1] -1.1
(4.5-5.6)+1.96*sqrt(2.1^2/80000+2.5^2/20000) #-1.06
[1] -1.1

Hence, the 95% confidence interval is (-1.14, -1.06). b) As we have a large sample size, we can assume a normal distribution with the given sample mean and standard deviation, so \(P(T \leq 5) = 0.594\)

pnorm(5,mean=4.5,sd=2.1) #0.594
[1] 0.59

Hence, 59.4% of deliveries will be under 5 hours.

Exercise 107 (Vitamin C) In the very famous study of the benefits of Vitamin C, \(279\) people were randomly assigned to a dose of vitamin C or a placebo (a control of nothing). The objective was to study whether vitamin C reduces the incidence of a common cold. The following table provides the responses from the experiment

| Group     | Colds | Total |
|-----------|------:|------:|
| Vitamin C |    17 |   139 |
| Placebo   |    31 |   140 |
  1. Is there a significant difference in the proportion of colds between the vitamin C and placebo groups?
  2. Find a \(99\)% confidence interval for the difference. Would you recommend the use of vitamin C to prevent a cold?

Hint: a 95% confidence interval for a difference in proportions \(p_1 - p_2\) is given by \[ ( \hat{p}_1 - \hat{p}_2 ) \pm 1.96 \sqrt{ \frac{ \hat{p}_1 (1- \hat{p}_1) }{ n_1 } + \frac{ \hat{p}_2 (1- \hat{p}_2) }{ n_2 } } \]

Solution:

To test \[H_0: p_1 - p_2 = 0,\] we could perform a two-sample t-test. \[ \hat{p}_1 = \frac{17}{139} = 12.23\% \mbox{ and } \hat{p}_2 = \frac{31}{140} = 22.14\% \] \[ \hat{\sigma} = \sqrt{ \frac{ \hat{p}_1 (1- \hat{p}_1) }{ n_1 } + \frac{ \hat{p}_2 (1- \hat{p}_2) }{ n_2 } } = 4.48\% \] \[ t-stat = \frac{\hat{p}_1 - \hat{p}_2}{\hat{\sigma} } = -2.21 < -1.96 = qnorm(0.025) \] Yes, we can reject the null hypothesis and there is a significant difference at the 5% level.

To form the 99% confidence interval of the difference, we use

qnorm(0.995) #2.58.
[1] 2.6

\[ 0 \in [-9.91\% - 2.58*4.48\%, -9.91\% + 2.58*4.48\%] = [-21.5\%, 1.7\%] \] We cannot recommend the use of vitamin C to prevent a cold at the 1% significance level.
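A check with prop.test at the 99% level (no continuity correction):

prop.test(x = c(17, 31), n = c(139, 140), conf.level = 0.99, correct = FALSE)  # p-value about 0.03; 99% CI contains 0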

Exercise 108 (Facebook vs Reading) In a recent article it was claimed that “\(96\)% of Americans under the age of \(50\)” spent more than three hours a day on Facebook.

To test this hypothesis, a survey of \(418\) people under the age of \(50\) were taken and it was found that \(401\) used Facebook for more than three hours a day.

Test the hypothesis at the \(5\)% level that the claim of \(96\)% is correct.

Solution:

We want to test the proportion of people under the age of 50 on Facebook. \[H_0: p = 96\%\]

The estimated proportion from the survey is \[\hat{p} = 401/418 = 0.959\]

The standard deviation of the sample proportion is \[\hat{\sigma} = \sqrt{\frac{\hat{p}*(1-\hat{p})}{n}} = 0.0096\]

To perform a t-test, the corresponding statistic is \[\mbox{t-stat} = \frac{0.959 - 0.96}{0.0096} = -0.1\]

It seems that we cannot reject the null hypothesis at the 5% level.
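A quick numerical check of the test statistic, reusing the survey counts:

phat = 401/418
(phat - 0.96) / sqrt(phat * (1 - phat) / 418)  # small in magnitude, so we cannot reject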

Exercise 109 (Paired T-Test) The following table shows the outcome of eight years of a ten year bet that Warren Buffett placed with Protege Partners, a New York hedge fund. Buffett claimed that a simple index fund would beat a portfolio strategy (fund-of-funds) picked by Protege over a ten year time frame. At Buffett’s shareholder meeting, he provided an update of the current state of the bet. The bundle of hedge funds picked by Protege had returned \(21.9\)% in the eight years through \(2015\) and the S&P500 index fund had soared \(65.7\)%.

| Year       | SP Index | Hedge Funds |
|------------|---------:|------------:|
| 2008       |   -37.0% |      -23.9% |
| 2009       |    26.6% |       15.9% |
| 2010       |    15.1% |        8.5% |
| 2011       |     2.1% |       -1.9% |
| 2012       |    16.0% |        6.5% |
| 2013       |    32.3% |       11.8% |
| 2014       |    13.6% |        5.6% |
| 2015       |     1.4% |        1.7% |
| cumulative |    65.7% |       21.9% |
  1. Use a paired \(t\)-test to assess the statistical significance between the two return strategies
  2. How likely is Buffett to win his bet in two years?

Solution:

To conduct a paired t-test, we need to define a new variable \[ D = SP.Index - Hedge.Funds. \] Assume that \(D_i \overset{i.i.d.}{\sim} N(\mu, \sigma^2)\). The null hypothesis we want to test is \[ H_0: \mu=0 \] The mean and standard deviation of the difference \(\{D_i\}_{i=1}^n\) are \[ \bar{D} = 0.0574,\quad s_D = 0.0969 \]

SP.Index = c(-37,26.6,15.1,2.1,16,32.3,13.6,1.4)
Hedge.Funds = c(-23.9,15.9,8.5,-1.9,6.5,11.8,5.6,1.7)
SP.Index = SP.Index/100
Hedge.Funds = Hedge.Funds/100
mean(SP.Index-Hedge.Funds)
[1] 0.057
sd(SP.Index-Hedge.Funds)
[1] 0.097

Therefore the t-statistic is \[t=\frac{\bar{D}}{s_D\sqrt{1/n}} = \frac{0.0574}{0.0969\times\sqrt{1/8}}=1.6755.\] Notice that the degree of freedom is \(8 - 1 = 7\). The corresponding p-value is \[P(|t_7|>1.6755) = 0.1378\]

2*pt(1.6755,df=7,lower.tail = FALSE) #0.1377437
[1] 0.14

In R, you can easily perform t-test using the following command, which gives the same test result as above.

t.test(SP.Index,Hedge.Funds,paired = TRUE)

    Paired t-test

data:  SP.Index and Hedge.Funds
t = 2, df = 7, p-value = 0.1
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -0.024  0.138
sample estimates:
mean difference 
          0.057 

The p-value is larger than 0.05. Thus we don’t reject the null hypothesis at 5% significance level. In other words, these two return strategies are not significantly different.

For the second question, the exact probability of Buffett winning his bet is \[P\left(\frac{(1+S_9)(1+S_{10})(1+65.7\%)}{(1+H_9)(1+H_{10})(1+21.9\%)}>1\right)\] where \(S_9, S_{10}, H_9, H_{10}\) are the returns of the two strategies in the 9th and 10th years. Deriving the distribution of the left-hand side expression is difficult. Here we instead give a simulation solution, where the returns are assumed to be i.i.d. Normal variables.

m1 = mean(SP.Index)
s1 = sd(SP.Index)
m2 = mean(Hedge.Funds)
s2 = sd(Hedge.Funds)
N = 1000 
cum1 = NULL # cumulative return of SP.Index
cum2 = NULL # cumulative return of Hedge.Funds
set.seed(410)
for (i in 1:N)
{ 
  # simulate returns in 2016 and 2017
  return1 = c(0.657, rnorm(n=2, mean = m1, sd = s1)) 
  # compute the cumulative return
  cum1 = c(cum1, cumprod(1+return1)) 
  
  return2 = c(0.219, rnorm(n=2, mean = m2, sd = s2))
  cum2 = c(cum2, cumprod(1+return2))
}

# compute the prob. of Buffett winning
mean(cum1>cum2) #0.9296667
[1] 0.93

The estimated probability is 92.97%, indicating that Buffett is very likely to win his bet. Since the paired t-test suggests no significant difference in average returns, you can also estimate the mean by pooling the data and redo the simulation, which gives a smaller estimate (89.27%).

Exercise 110 (Shaquille O’Neal) Shaquille O’Neal (nicknamed Shaq) is a former American professional basketball player. He was notoriously bad at free throws (an uncontested shot given to a player when they are fouled). The following table compares the first three years of Shaq’s career to his last three years.

| Group       | Free Throws Made | Free Throws Attempted |
|-------------|-----------------:|----------------------:|
| Early Years |             1352 |                  2425 |
| Later Years |             1121 |                  2132 |

Did Shaq get worse at free throws over his career?

Solution:

\[ \hat{p}_1 = \frac{1352}{2425} = 0.5575, \quad \hat{p}_2 = \frac{1121}{2132} = 0.5258, \quad se( \hat{p_1} - \hat{p_2} ) = \sqrt{ \frac{ \hat{p}_1 (1- \hat{p}_1) }{ n_1 } + \frac{ \hat{p}_2 (1- \hat{p}_2) }{ n_2 } } = 0.0148 \] The \(Z\)-score is \[ Z = \frac{ \hat{p}_1 - \hat{p}_2 }{ se( \hat{p_1} - \hat{p_2} ) } = 2.14 \; . \] The p-value of Z-test is \[P(|N(0,1)|>2.14) = 0.0324\] Thus we conclude that Shaq got worse at free throws.
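prop.test reproduces this result:

prop.test(x = c(1352, 1121), n = c(2425, 2132), correct = FALSE)  # p-value about 0.032, matching the Z-test above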

Exercise 111 (Furniture Website) Furniture.com wishes to estimate the average annual expenditure on furniture among its subscribers. A sample of 1000 subscribers is taken. The sample standard deviation is found to be $670 and the sample mean is $3790.

  1. Give a 90% confidence interval for the average expenditure.
  2. Test the hypothesis that the average expenditure is less than $2,500 at the 1% level.

Exercise 112 (Grocery AB Testting) A grocery delivery service is studying the impact of weather on customer ordering behavior. To do so, they identified households which made orders both in October and in February on the same day of the week and approximately the same time of day. They found 112 such matched orders. The mean purchase amount in October was $121.45, with a standard deviation of $32.78, while the mean purchase amount in February was $135.99 with a standard deviation of $24.81. The standard deviation of the difference between the two orders was $38.28. To quantify the evidence in the data regarding a difference in the average purchase amount between the two months, compute the \(p\)-value of the data.

Exercise 113 (SimCity AB Testing) SimCity 5 is one of Electronic Arts (EA’s) most popular video games. As EA prepared to release the new version, they released a promotional offer to drive more pre-orders. The offer was displayed on their webpage as a banner across the top of the pre-order page. They decided to test some other options to see what design or layout would drive more revenue.

The control removed the promotional offer from the page altogether. The test led to some very surprising results. With a sample size of \(1000\) visitors, of the \(500\) which got the promotional offer they found \(143\) people wanted to purchase the game, and of the half that got the control they found that \(199\) wanted to buy the new version of SimCity.

Test at the \(1\)% level whether EA should provide a promotional offer or not.

Solution:

To see whether the promotion helps, we compare the proportion of visitors who want to buy the game in the two groups. Denote by \(p_1\) the proportion in the experimental group shown the promotional offer and by \(p_2\) the proportion in the control group without it. The null hypothesis we are testing is \[H_0: p_1 = p_2\]

In the sample, we observe \[n_1 = n_2 = 500\] \[\hat{p_1} = 143/500 \mbox{ and } \hat{p_2} = 199/500\]

To perform the two-sample t-test, we need to calculate the pooled sample standard deviation first. \[\bar{p} = \hat{p_1}*\frac{n_1}{n_1+n_2} + \hat{p_2}*\frac{n_2}{n_1+n_2} = 171/500\] \[\bar{\sigma} = \sqrt{\bar{p}*(1-\bar{p})*(\frac{1}{n_1} + \frac{1}{n_2})} = 0.03\]

Then, \[|\mbox{t-stat}| = \frac{|\hat{p_1} - \hat{p_2}|}{\bar{\sigma}} = 3.73 > \mbox{qnorm(0.995)} = 2.58\]

So the difference is statistically significant at the 1% level, but in the opposite direction from what was expected: the control group, which saw no promotional offer, converted at a higher rate. Based on this test, EA should not provide the promotional offer on the pre-order page.
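The same comparison with prop.test:

prop.test(x = c(199, 143), n = c(500, 500), correct = FALSE)  # p-value about 0.0002, significant at the 1% level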

Exercise 114 (True/False)  

  1. At the Apple conference it was claimed that “\(97\)% of people love the iWatch”. From a market survey, you found empirically that \(1175\) out of a sample of \(1400\) people love the new iWatch. Statistically speaking, you can reject Apple’s claim at the \(1\)% significance level. Hint: You may use \(pnorm(2.58) = 0.995\)
  2. Given a random sample of \(2000\) voters, \(800\) say they will vote for Hillary Clinton in the 2016 US Presidential Election. At the \(95\)% level, I can reject the null hypothesis that Hillary has an evens chance of winning the election.
  3. The average movie in Netflix’s database has an average customer rating of \(3.1\) with a standard deviation of \(1\). The last episode of Breaking Bad had a rating of \(4.7\) with a standard deviation of \(0.5\). The \(p-value\) for testing whether Breaking Bad’s rating is statistically different from the average is a lot less than \(1\)%.
  4. A chip manufacturer needs to add the right amount of chemicals to make the chips resistant to heat. On average the population of chips needs to be able to withstand heat of 300 degrees. Suppose you have a random sample of \(30\) chips with a mean of \(305\) and a standard deviation of \(8\). Then you can reject the null hypothesis \(H_0: \mu= 300\) versus \(H_a: \mu> 300\) at the \(5\)% level.
  5. Zagats rates restaurants on food quality. In a random sample of \(100\) restaurants you observe a mean of \(20\) with a standard deviation of \(2.5\). Your favorite restaurant has a score of \(25\). This is statistically different from the population mean at the \(5\)% level.
  6. The \(t\)-score is used to test whether a null hypothesis can be rejected.
  7. An oil company introduces a new fuel that they claim has on average no more than \(100\) milligrams of toxic matter per liter. They take a sample of \(100\) liters and find that \(\bar{X} = 105\) with a given \(\sigma_X = 20\). Then there is evidence at the \(5\)% level that their claim is wrong.
  8. A wine producer claims that the proportion of customers who cannot distinguish his product from grape juice is at most 5%. For a sample of \(100\) people he finds that \(10\) fail the taste test. He should reject his null hypothesis \(H_{0}:p=0.05\) at the \(5\)% level.
  9. As the \(t\)-ratio increases the \(p\)-value of a hypothesis test decreases.

Solution:

  1. True. \[\hat{p} = \frac{1175}{1400} = 83.9\%\] \[\hat{\sigma} = \sqrt{\frac{\hat{p}(1-\hat{p})}{1400}} = 0.98\%\] For \(H_0: p = 97\%\), \[t-stat = \frac{\hat{p} - 97\%}{0.98\%} = -13.37 < -2.58 = qnorm(0.005)\] Yes, we can reject the null hypothesis at the 1% significance level.
  2. True. The t-statistic is \(T = (\hat{p}-0.5)/\sqrt{\frac{\hat{p} \times (1-\hat{p})}{n}} = -9.13 < -1.96 = qnorm(0.025)\)
  3. False. The formula for a T-ratio is \[\begin{aligned}T = \frac{\bar{x}_1-\bar{x}_2 - (\mu_1 -\mu_2)}{\sqrt{\frac{s_{x_1}^2}{n_1}+\frac{s_{x_2}^2}{n_2}}}\end{aligned}\] However, since we do not know \(n_1\) and \(n_2\) we cannot get the T-ratio and thus p-value.
  4. True. Since the sample size is \(n\ge30\) the t test is effectively the same as the z test. Therefore, we find the following z value:\[z=\frac{\bar{x}-\mu}{s/\sqrt{n}}=\frac{305-300}{8/\sqrt{30}}=3.4233\]Here we use a one-sided test: \[z_\alpha=1.65\] Since \(z>z_\alpha\), we can reject \(H_0\) in favor of \(H_a\) at the 5% level
  5. True. We can calculate the t statistic as \((25-20)/(2.5/10)>1.96\). Hence we reject the null.
  6. True. The \(t\) score can be used to test whether a null hypothesis can be rejected or not. Alternately, a z score can be used
  7. True. This is a one-sided test \(H_0 : \mu_X \leq 100\) versus \(H_1 : \mu_X > 100\). The \(Z\)-score is \(Z = \frac{105 -100}{ 20 / \sqrt{100} } = 2.5\). The critical value for a 1-sided test at the \(5\)% level is \(1.64\). Therefore you can reject \(H_0\) at the \(5\)% level.
  8. True. \(z=\frac{\hat{p}-p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}}=\frac{0.1-0.05}{\sqrt{\frac{0.05\cdot 0.95}{100}}}=2.29\) and \(z_{\alpha /2}=1.96\). Reject the null hypothesis since \(\left\vert z\right\vert >z_{\alpha/2}\)
  9. True. If \(t>0\) moving out into the tails decreases the \(p\)-value. If \(t<0\) it will increase the \(p\)-value as the \(t\) is getting less negative.

Exercise 115 (True/False: CI)  

  1. There is much discussion of the effects of second-hand smoke. In a survey of \(500\) children who live in families where someone smokes, it was found that \(10\) children were in poor health. A \(95\)% confidence interval for the probability of a child living in a smoking family being in poor health is then \(2\)% to \(4\)%.
  2. You are finding a confidence interval for a population mean. Holding everything else constant, an interval based on an unknown standard deviation will be wider than one based on a known standard deviation no matter what the sample size is.
  3. There is a \(95\)% probability that a normal random variable lies between \(\mu \pm \sigma\).
  4. A recent CNN poll found that \(49\)% of \(10000\) voters said they would vote for Obama versus Romney if that was the election in November 2012. A \(95\)% confidence interval for the true proportion of voters that would vote for Obama is then \(0.49 \pm 0.03\).
  5. A mileage test for a new electric car model called the “Pizzazz” is conducted. With a sample size of \(n=30\) the mean mileage for the sample is \(36.8\) miles with a sample standard deviation of \(4.5\). A \(95\)% confidence interval for the population mean is \((32.3,41.3)\) miles.
  6. In a random sample of \(100\) NCAA basketball games, the team leading after one quarter won the game \(72\) times. Then a 95% confidence interval for the proportion of teams leading after the first quarter that go on to win is approximately \((0.6,0.84)\).
  7. For the same sample, a 95% prediction interval for a particular team winning is also \((0.6,0.84)\).
  8. In playing poker in Vegas, from \(100\) hours of play, you make an average of $50 per hour with a standard deviation of $10. A 95% confidence interval for your mean gain per hour is approximately $ \(( 48 , 52 )\)
  9. If 27 out of 100 respondents to a survey state that they drink Pepsi then a 95% confidence interval for the proportion \(p\) of the population that drinks Pepsi is \(( 0.26 , 0.28 )\).
  10. The \(p\)-value is the probability that the Null hypothesis is true.

Solution:

  1. False. The \(95\%\) confidence interval should be: \[\begin{aligned} \Bigg[.02 - 1.96\sqrt{\frac{.02(1-.02)}{500}},.02 + 1.96\sqrt{\frac{.02(1-.02)}{500}}\Bigg]\end{aligned}\] which is approximately \((0.8\%, 3.2\%)\), so the lower bound is well below 2 percent.
  2. True. Because you need to use the same data to estimate the standard deviation, the interval contains more noise or error. In other words, we are comparing the \(z\) distribution with the \(t\) distribution
  3. False. There is a 95% probability that it lies between \(\mu\pm1.96\sigma\)
  4. False. The confidence interval for the proportion is given by: \[\hat{p}\pm1.96\sqrt{\hat{p}(1-\hat{p})/n}\] This gives: \[0.49\pm1.96\sqrt{(0.49)(0.51)/10000}=0.49\pm0.009798\]
  5. False. \(CI=36.8\pm1.96\frac{4.5}{\sqrt{30}}=(35.19,38.41)\)
  6. False. The 95% CI is approximately equal to \[.72 \pm 1.96 \sqrt{ \frac{ \hat{p} (1- \hat{p})}{n} } = 0.72 \pm 1.96 \times 0.045=(.63,.81)\]
  7. False. The prediction interval is equal to \[0.72 \pm 2 \sqrt{ \hat{p} (1- \hat{p} ) } \sqrt{1+1/n}\] In this case, that contains the entire interval (0,1)
  8. True. The CI is given by \(50 \pm 2 \times \frac{10}{\sqrt{100} } =(48,52)\)
  9. False. The 95% confidence interval for the proportion is given by \(\hat{p} \pm 1.96 \sqrt{ \hat{p} ( 1 - \hat{p} ) / n }\). Which here gives \(0.27 \pm 1.96 \sqrt{ 0.27 \times 0.73 / 100 }\) or the interval \(( 0.18 , 0.36 )\)
  10. False. The p-value is the probability of the observing something more extreme than the observed sample assuming the null hypothesis is true

Exercise 116 (True/False: Sampling Distribution)  

  1. The Central Limit Theorem states that the distribution of a sample mean is approximately Normal.
  2. The Central Limit Theorem states that the distribution of the sample mean \(\bar{X}\) is Normally distributed for large samples.
  3. The Central Limit Theorem guarantees that the distribution of \(\bar{X}\) is constant.
  4. The sample mean, \(\bar{x}\), approximates the population mean for large random samples.
  5. The trimmed mean of a dataset is more sensitive to outliers than the mean.
  6. The sample mean of a dataset must be larger than its standard deviation
  7. Selection bias is not a problem when you are estimating a population mean.
  8. The kurtosis of a distribution is not sensitive to outliers.

Solution:

  1. True. The standard deviation of the sample mean is \(\frac{\sigma}{\sqrt{n}}\). The CLT has (approximately) \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\), where \(\mu\) and \(\sigma^2\) are the true population values
  2. True. If \(X_i\) is a random sample (or \(iid\)) with mean \(\mu\), then \(\sqrt{n}\big(\bar{X} - \mu \big) \longrightarrow^d N(0,\sigma^2)\)
  3. False. The CLT says the distribution of \(\bar{X}\) is approximately normal with variance \(\sigma^2/n\); this distribution depends on the sample size (its variance shrinks as \(n\) grows), so it is not constant.
  4. True. The CLT states that the \(\bar{X} \sim N \left ( \mu, \frac{\sigma^2}{n} \right )\), or the best guess for the population mean is \(\bar{x}\)
  5. False. The trimmed mean is the average of the observations deleting the outer 5% in the tails. Hence, it is less sensitive to outliers than the sample mean.
  6. False. Many datasets have a negative mean and the standard deviation must be positive.
  7. False. When sampling to estimate a population mean, you need to be certain to select a random sample, or selection bias may affect your results.
  8. False. Kurtosis depends on each observation in the distribution and is extremely sensitive to the outliers

Field vs Observational

Exercise 117 What is the difference between a randomized trial and an observational study?

Exercise 118 Consider a complex A/B experiment with 6 alternatives, where you have 5 variations of your page, plus the original.

  1. Use the Bonferroni correction for multiple comparisons. Calculate the significance level for each of the 5 tests and find the number of samples needed to achieve a power of 0.95. Assume that the overall significance level is 0.05.
  2. Implement Thompson sampling (TS) for the same experiment. Assume an original arm with a 4% conversion rate, and an optimal arm with a 5% conversion rate. The other 4 arms include one suboptimal arm that beats the original with a conversion rate of 4.5%, and three inferior arms with rates of 3%, 2%, and 3.5%. Plot the savings from a six-armed experiment, relative to a Bonferroni-adjusted power calculation for a classical experiment. The first plot should show the number of days required to end the experiment, with a vertical line showing the time required by the classical power calculation. The second plot should show the number of conversions that were saved by the bandit. What is the overall cost savings due to ending the experiment more quickly, and due to the experiment being less wasteful while it is running?
  3. Run your simulator 500 times and show the history of the serving weights for all the arms in the first of your 500 simulation runs. Comment on the results.
  4. Plot the daily cost of running the multi-armed bandit relative to an “oracle” strategy of always playing arm 2, the optimal arm.

Solution:

  1. With the Bonferroni correction, the per-test significance level is \(0.05 / (6 - 1) = 0.01\). A standard power calculation at this level (power 0.95, detecting the 4% versus 5% conversion rates) requires 15,307 observations in each arm of the experiment; with 6 arms that’s a total of 91,842 observations. A numerical check with power.prop.test appears after this list.
  2. The TS algorithm will converge to the optimal arm in 3,000 observations, and will have a 95% chance of having converged to the optimal arm in 5,000 observations. The savings are 86,842 observations, or 95% of the observations required by the classical power calculation.
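A check of the classical sample-size figure (assuming the 4% versus 5% conversion rates, per-test significance level 0.01, and power 0.95):

power.prop.test(p1 = 0.04, p2 = 0.05, sig.level = 0.01, power = 0.95)  # roughly 15,300 per arm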

Added After Course Started

Exercise 119 (Emily, Car, Stock Market, Sweepstakes, Vacation and Bayes.)  

Emily is taking a Bayesian Analysis course. She believes she will get an A with probability 0.6, a B with probability 0.3, and a C or less with probability 0.1. At the end of the semester she will get a car as a present from her (very) rich uncle depending on her class performance. For getting an A in the course Emily will get a car with probability 0.8, for B with probability 0.5, and for anything less than B, she will get a car with probability of 0.2. These are the probabilities if the market is bullish. If the market is bearish, the uncle is less likely to make expensive presents, and the above probabilities are 0.5, 0.3, and 0.1, respectively. The probabilities of bullish and bearish market are equal, 0.5 each. If Emily gets a car, she would travel to Redington Shores with probability 0.7, or stay on campus with probability 0.3. If she does not get a car, these two probabilities are 0.2 and 0.8, respectively. Independently, Emily may be a lucky winner of a sweepstake lottery for a free air ticket and vacation in hotel Sol at Redington Shores. The chance to win the sweepstake is 0.001, but if Emily wins, she will go to vacation with probability of 0.99, irrespective of what happened with the car.

After the semester was over you learned that Emily is at Redington Shores.

  1. What is the probability that she got a car?
  2. What is the probability that she won the sweepstakes?
  3. What is the probability that she got a B in the course?
  4. What is the probability that the market was bearish?

Hint: You can solve this problem in either of two ways: (i) direct simulation using R or Python, or (ii) exact calculation. Use just one of the two ways to solve it. The exact solution, although straightforward, may be quite messy.

Solution: We introduce the following RV’s:

  • \(C \in \{0,1\}\): Emily gets a car
  • \(G\in \{A,B,\le C\}\): Emily’s grade
  • \(M\in \{1, 0\}\): Market condition (Bullish, Bearish)
  • \(R\in \{0, 1\}\): Vacation (Redington Shores)
  • \(L\): Emily is a lucky winner of a sweepstake

The conditional independence structure is given by the following graph:

graph TD
    M((M)) --> C((C))
    G((G)) --> C
    C --> R((R))
    L((L)) --> R

We can write the joint distribution as \[ P(C,G,M,R,L) = P(M)P(G)P(C\mid G,M)P(L)P(R\mid C,L) \]

Here is what we know

Market (\(M\))

M=0 M=1
0.5 0.5

Grade (\(G\))

A B \(\le C\)
0.6 0.3 0.1

Car (\(P(C\mid M,G)\))

C=0 C=1 M G
0.5 0.5 0 A
0.7 0.3 0 B
0.9 0.1 0 C
0.2 0.8 1 A
0.5 0.5 1 B
0.8 0.2 1 C

Vacation (\(P(R\mid L,C)\))

R=0 R=1 L C
0.8 0.2 0 0
0.3 0.7 0 1
0.01 0.99 1 0
0.01 0.99 1 1
  1. We can calculate the probability of getting a car as \[ P(C=1\mid R=1) = \frac{P(C=1,R=1)}{P(R=1)} = \frac{\sum_{G,M,L}P(C=1,G,M,R=1,L)}{\sum_{C,G,M,L}P(C,G,M,R=1,L)} \]
m=c(0.5,0.5) # P(M)
g=c(0.6,0.3,0.1) # P(G)
jmg = c(t(outer(m, g, FUN = "*")))  # P(G)P(M)
cmg = c(0.5,0.3,0.1,0.8,0.5,0.2) # P(C=1 | M,G)
jcmg=jmg*cmg # P(M)P(G)P(C=1 | M,G)
l=c(0.001,0.999) # P(L=1), P(L=0)
rlc = c(0.99,0.7) # P(R=1 | C=1,L)
cr=outer(jcmg, l*rlc, FUN = "*") # P(C=1 | M,G)P(M)P(G)P(R=1 | C=1,L)
sum(cr) # sum P(C=1 | M,G)P(M)P(G)P(R=1 | C=1,L)
[1] 0.37
c0mg = c(0.5,0.7,0.9,0.2,0.5,0.8) # P(C=0 | M,G)
jc0mg=jmg*c0mg # P(C=0 | M,G)P(M)P(G)
rlc0 = c(0.99,0.2) # P(R=1 | C=0,L)
c0r=outer(jc0mg, l*rlc0, FUN = "*") # P(C=0 | M,G)P(M)P(G)P(R=1 | C=0,L)
sum(c0r)
[1] 0.095
PR = sum(cr) + sum(c0r)

Thus \(P(C=1,R=1) = 0.3676523\) and \(P(R=1) = 0.4630275\); taking the ratio gives \(P(C=1\mid R=1)\).

sum(cr)/PR
[1] 0.79

\[ P(C=1\mid R=1) = 0.3676523/0.4630275 = 0.794 \] We can also verify this with a Monte Carlo simulation.

gmc  = array(c(0.5,0.7,0.9,0.2,0.5,0.8, 0.5,0.3,0.1,0.8,0.5,0.2), dim=c(3,2,2)) # P(C | G,M); dims (G, M, C), M index 1 = bearish, C index 1 = no car
clr = array(c(0.8 ,0.3 ,0.01,0.01,0.2 ,0.7 ,0.99,0.99), dim=c(2,2,2)) # P(R | C,L); dims (C, L, R), index 1 = no car / no win / stays on campus

simvac = function(){
    m = sample(1:2,1,prob=c(0.5,0.5))
    g = sample(1:3,1,prob=c(0.6,0.3,0.1))
    c = sample(1:2,1,prob=gmc[g,m,])
    l = sample(1:2,1,prob=c(0.999,0.001))
    r = sample(1:2,1,prob=clr[c,l,])    
    return(c(m,g,c,l,r))
}
N = 10000
d = matrix(0,N,5)
for (i in 1:N){
    d[i,] = simvac()
}
PR = sum(d[,5]==2)/N
print(PR)
[1] 0.46
sum(d[,3]==2 & d[,5]==2)/sum(d[,5]==2)
[1] 0.79
# Part 2: P(L=1, R=1) = P(L=1) * P(R=1 | L=1)
PLR = 0.001 * 0.99
PLR/PR
[1] 0.0021

\[ P(L=1\mid R=1) = \dfrac{0.00099}{0.4630} = 0.002138102 \]

# Part 3: P(G=B, R=1), summing over M and C for L=0, plus the L=1 term
PRGB =
0.5*0.3*0.999*0.5*0.7+
0.5*0.3*0.999*0.5*0.2+
0.5*0.3*0.999*0.3*0.7+
0.5*0.3*0.999*0.7*0.2+
0.3 * 0.001 * 0.99
# P(G=B , R=1)
print(PRGB)
[1] 0.12
# P(G=B | R=1)
PRGB/PR
[1] 0.26
  4. The probability that the market was bearish is \(P(M=0\mid R=1) = 1 - P(M=1,R=1)/P(R=1)\), where \(M=1\) denotes a bullish market
PMR = # P(M=1, R=1), the bullish case
0.5*0.6*0.999*0.8*0.7+
0.5*0.6*0.999*0.2*0.2+
0.5*0.3*0.999*0.5*0.7+
0.5*0.3*0.999*0.5*0.2+
0.5*0.1*0.999*0.2*0.7+
0.5*0.1*0.999*0.8*0.2+
0.5 * 0.001 * 0.99
# P(M=1, R=1)
print(PMR)
[1] 0.26
# P(M=0 | R=1)
1-PMR/PR
[1] 0.43
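As a sanity check (assuming the simulation matrix d from the Monte Carlo chunk above is still in memory), the conditional probabilities in parts 3 and 4 can also be estimated from the simulated draws:

# P(G=B | R=1): column 2 holds the grade index (2 = B), column 5 holds R (2 = Redington Shores)
sum(d[,2]==2 & d[,5]==2)/sum(d[,5]==2)
# P(M=bearish | R=1): column 1 holds the market index (1 = bearish, 2 = bullish)
sum(d[,1]==1 & d[,5]==2)/sum(d[,5]==2)

Both estimates should land near the exact values of 0.26 and 0.43 computed above.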

Exercise 120 (Poisson: Website visits) A web designer is analyzing traffic on a web site. Assume the number of visitors arriving at the site at a given time of day is modeled as a Poisson random variable with a rate of \(\lambda\) visitors per minute. Based on prior experience with similar web sites, the following estimates are given:

  • There is a 90% probability that the rate is greater than 5 visitors per minute.
  • The rate is equally likely to be greater than or less than 14 visitors per minute.
  • There is a 90% probability that the rate is less than 27 visitors per minute.

Find a Gamma prior distribution for the arrival rate that fits these judgments as well as possible. Comment on your results.

Hint: there is no “right” answer to this problem. You can use trial and error to find a distribution that fits as well as possible. You can also use an optimization method such as Excel Solver to minimize a measure of how far apart the given quantiles are from the ones in the target distribution.

Solution:

We can try different values to see whether we can get a result that is close to the given quantiles. Given that there are only two parameters, I simply did a grid search: I generated a 200x200 grid of points, calculated the quantiles at each point, and then found the point that was closest to the given quantiles. I used the sum of absolute differences to measure the distance between the given and calculated quantiles; you could use squared differences instead.

g = expand.grid(seq(from=0.5,to=7,length=200),seq(from=0.5,to=10,length=200))
qf = function(th) {
  qgamma(c(0.1, 0.5, 0.9), shape=th[1], scale=th[2])
}
r = apply(g,1,qf)
target = c(5,14,27)
ind = which.min((colSums(abs(r-target))))
print(g[ind,])
      Var1 Var2
20272  2.8  5.3
d = data.frame(quantile = c(0.1, 0.5, 0.9), expert = target, calculated = qf(as.numeric(g[ind,])))
knitr::kable(d)
quantile expert calculated
0.1 5 5.3
0.5 14 13.3
0.9 27 27.0

Notice that instead of looping through the grid, I used expand.grid combined with apply to calculate the quantiles for each point. This is a more efficient way to do the same thing.
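As an alternative to the grid search (a sketch using the same objective, not part of the original solution), a general-purpose optimizer such as optim can minimize the distance directly:

# sum of absolute differences between the Gamma quantiles and the expert's quantiles
obj = function(th) {
  if (any(th <= 0)) return(1e10)  # keep the optimizer away from invalid parameter values
  sum(abs(qgamma(c(0.1, 0.5, 0.9), shape=th[1], scale=th[2]) - c(5, 14, 27)))
}
fit = optim(c(2, 5), obj)  # Nelder-Mead from a rough starting guess
fit$par                    # fitted shape and scale
qgamma(c(0.1, 0.5, 0.9), shape=fit$par[1], scale=fit$par[2])

The fitted parameters should be in the same ballpark as the grid-search values of roughly 2.8 and 5.3.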

Exercise 121 (Normal Likelihood) We observe samples of a normal random variable \(Y_i\mid \mu,\sigma \sim N(\mu,\sigma^2)\) with known \(\sigma\). Specify and plot the likelihood for \(\mu\) for each of the following data sets:

  1. y = (-4.3,0.7,-19.4), \(\sigma=10\)
  2. y = (-12,12,-4.5,0.6), \(\sigma=6\)
  3. y = (-4.3,0.7,-19.4), \(\sigma=2\)
  4. y = (12.4,12.1), \(\sigma=5\)

Hint: Remember that the normal likelihood is the product of the normal density function evaluated at each observation \[ L(\mu\mid y,\sigma) = \prod_{i=1}^n \dfrac{1}{\sqrt{2\pi}\sigma}e^{-(y_i-\mu)^2/2\sigma^2} = \dfrac{1}{(2\pi\sigma^2)^{n/2}}e^{-\sum_{i=1}^n(y_i-\mu)^2/2\sigma^2} \] The sum in the exponent can be simplified as \[ \sum_{i=1}^n(y_i-\mu)^2 = \sum_{i=1}^n (\mu^2 - 2\mu y_i + y_i^2) = n\mu^2 - 2\mu n\bar{y} + \sum_{i=1}^n y_i^2 = n(\mu - \bar{y})^2 + \sum_{i=1}^n(y_i-\bar{y})^2 \] where the last step uses \(\sum_{i=1}^n(y_i-\bar{y})^2 = \sum_{i=1}^n y_i^2 - n\bar{y}^2\). The last summand does not depend on \(\mu\) and can be ignored. Thus, the likelihood is proportional to \[ \exp\left(-\dfrac{(\mu - \bar{y})^2}{2\sigma^2/n}\right) \]

Solution: Let’s implement the likelihood function. We will use two different functions to calculate it. The first uses \(\bar y\) as a summary statistic and evaluates the normal density \(N(\bar y \mid \mu, \sigma^2/n)\). The second takes the product of the normal densities over the individual observations. We will compare the results of the two functions to check that they are consistent.

lhood1 = function(y, mu, sigma) {
   dnorm(mean(y), mu, sigma/sqrt(length(y))) 
}
lhood2 = function(y, mu, sigma) {
   prod(dnorm(y, mu, sigma))
}
mu = seq(-15,15,0.1)
a1 = lhood1(c(-4.3,0.7,-19.4),mu,10)
a2 = sapply(mu,lhood2,y=c(-4.3,0.7,-19.4),sigma=10)
(a1/a2)[1:5]
[1] 3253 3253 3253 3253 3253
plot(mu,a1)
lines(mu,3253.175*a2,col="red")

We see that the two approaches to calculating the likelihood agree (up to a multiplicative constant).

plot(mu,a1,type="l",xlab="mu",ylab="likelihood")

plot(mu,lhood1(c(-12,12,-4.5,0.6),mu,6),type="l",xlab="mu",ylab="likelihood")

plot(mu,lhood1(c(-4.3,0.7,-19.4),mu,2),type="l",xlab="mu",ylab="likelihood")

plot(mu,lhood1(c(12.4,12.1),mu,5),type="l",xlab="mu",ylab="likelihood")

Exercise 122 (Normal-Normal for Lock5Data) The Lock5Data package (Lock et al. 2016) includes results for a cross-sectional study of hippocampal volumes among 75 subjects (Singh et al. 2014): 25 collegiate football players with a history of concussions (FBConcuss), 25 collegiate football players without a history of concussions (FBNoConcuss), and 25 control subjects. For our analysis, we’ll focus on the subjects with a history of concussions:

library(Lock5Data)
data(FootballBrain)
d = FootballBrain[FootballBrain$Group=="FBConcuss",]
plot(density(d$Hipp,bw=300))

  1. Assume that the hippocampal volumes of the subjects with a history of concussions are independent normal random variables with unknown mean \(\mu\) and known standard deviation \(\sigma\). Estimate \(\sigma\) using the sample standard deviation of the hippocampal volumes (FBConcuss group). Calculate the likelihood for \(\mu\).
  2. Assume that \(\mu\) has a normal prior distribution with mean \(\mu_0\) and standard deviation \(\sigma_0\). Estimate those parameters using the sample mean and standard error of the hippocampal volumes across all subjects.
  3. Find the posterior distribution for \(\mu\).
  4. Find a 95% posterior credible interval for \(\mu\).

Solution:

  1. The sample standard deviation of the hippocampal volumes is
sigma = sd(d$Hipp)
print(sigma)
[1] 593

Thus, the likelihood for \(\mu\) is \[ L(\mu\mid y) \propto \exp\left(-\dfrac{(\bar y-\mu)^2}{2\cdot 593.3^2/25}\right) \]

mu = seq(5000,6500,length.out=200)
lhood = dnorm(mean(d$Hipp),mu,sigma/sqrt(length(d$Hipp))) 
plot(mu,lhood,type="l",xlab="mu",ylab="likelihood")

  2. The sample mean and standard error of the hippocampal volumes across all subjects are
mu0 = mean(FootballBrain$Hipp)
sigma0 = sd(FootballBrain$Hipp)/sqrt(75) 
print(mu0)
[1] 6599
print(sigma0)
[1] 131

Thus, we assume that the prior is \[ \mu \sim N(6598.8, 131^2) \]

  3. The posterior distribution for \(\mu\) has mean \[ \mu^* = \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\mu_0 + \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\bar{y} \] and variance \[ \sigma^{*2} = \frac{\sigma^2\sigma_0^2}{n\sigma_0^2 + \sigma^2} \]
k = 25*sigma0^2 + sigma^2
mus = sigma^2/k*mu0 + 25*sigma0^2/k*mean(d$Hipp)
sigmas2 = sigma^2*sigma0^2/k
print(mus)
[1] 6125
print(sigmas2)
[1] 7730
  4. A 95% posterior credible interval for \(\mu\) is
qnorm(c(0.025,0.975),mus,sqrt(sigmas2))
[1] 5952 6297

Exercise 123 (Normal-Normal chest measurements) We have chest measurements of 10 000 men. Now, based on memories of my experience as an assistant in a gentlemen’s outfitters in my university vacations, I would suggest a prior \[ \mu \sim N(38, 9). \] Of course, it is open to question whether these men form a random sample from the whole population, but unless I am given information to the contrary I would stick to the prior I have just quoted, except that I might be inclined to increase the variance. My data shows that the mean turned out to be 39.8 with a standard deviation of 2.0 for the sample of 10 000.

  1. Calculate the posterior mean for the chest measurements of men in this population
  2. Calculate the predictive distribution for the next observation \(x_{n+1}\)

Solution:

mu0 = 38
sigma0 = 9
yhat = 39.8
sigma = 2
n = 10000
k = n*sigma0^2 + sigma^2
mus = sigma^2/k*mu0 + n*sigma0^2/k*yhat
sigmas2 = sigma^2*sigma0^2/k
print(mus)
[1] 40
print(sigmas2)
[1] 4e-04

Note that the posterior mean is equal to the sample mean \(\bar y = 39.8\), and the posterior variance is \(0.0004\), which is equal to \(\sigma^2/n = 4/10000\). Thus, our posterior is essentially the standardized likelihood \[ N(\bar y, \sigma^2/n) \] Naturally, the closeness of the posterior to the standardized likelihood results from the large sample size, and whatever my prior had been, unless it were very extreme, I would have got very much the same result.

More formally, the posterior will be close to the standardized likelihood if the ratio of the posterior and prior variances \(\sigma_*^2/\sigma_0^2\) is small, or in other words, when \(\sigma_0^2\) is large compared to \(\sigma^2/n\) (see the quick check below).
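For the numbers in this exercise (a quick check, reusing sigma, n, and sigma0 from the chunk above):

# sampling variance of the mean relative to the prior variance; a tiny ratio means the
# prior is effectively flat and the posterior is driven by the data
(sigma^2/n)/sigma0^2

which is about \(5\times 10^{-6}\), so the prior has essentially no influence here.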

  2. The predictive distribution for the next observation \(x_{n+1}\) can be derived by writing \[ x_{n+1} = (x_{n+1} - \mu) + \mu \] Now \(x_{n+1}-\mu \sim N(0,\sigma^2)\) and, a posteriori, \(\mu \sim N(\mu^*,\sigma^{*2})\), independently of the new noise term. Thus, the predictive distribution is \[ x_{n+1} \sim N(\mu^*,\sigma^{*2} + \sigma^2) \]
mu = mus
sigma = sqrt(sigmas2 + sigma^2)
print(mu)
[1] 40
print(sigma)
[1] 2
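As a quick illustration of part 2 (using the mu and sigma just computed), a central 95% predictive interval for the next man's chest measurement is

qnorm(c(0.025, 0.975), mu, sigma)

which is roughly 36.1 to 43.9; almost all of the predictive uncertainty comes from the sampling variance \(\sigma^2\) rather than from the posterior uncertainty about \(\mu\).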

Exercise 124 (Normal-Normal Hypothesis Test) Assume the Normal-Normal model with known variance, \(X\mid \theta \sim N(\theta,\sigma^2)\), \(\theta \sim N(\mu ,\tau^2)\), and \(\theta\mid X \sim N(\mu^*,\rho^2)\). Given the losses \(L_0 = L(d_1\mid H_0)\) and \(L_1 = L(d_0\mid H_1)\), show that for testing \(H_0 : \theta \le \theta_0\) the “rejection region” is \(\mu^* > \theta_0 + z\rho\), where \(z = \Phi^{-1}\left(L_1/(L_0+L_1)\right)\). Remember that in the classical \(\alpha\)-level test, the rejection region is \(X > \theta_0 + z_{1−\alpha}\sigma\).

Exercise 125 (Normal-Normal Hypothesis Test) Assume \(X\mid \theta \sim N(\theta,\sigma^2)\) and \(\theta \sim p(\theta)=1\). Consider testing \(H_0 : \theta \le \theta_0\) vs. \(H_1 : \theta > \theta_0\). Show that \(p_0 = P(\theta \le \theta_0 \mid X)\) is equal to the classical p-value.