2 Binomial
A Bernoulli random variable has only two possible values: 1 and 0, with probabilities p and 1-p, respectively. \[ P(X=x)=p(x)= \begin{cases} p^x(1-p)^{1-x}, & \text{if } x=0 \text{ or } x=1\\ 0, & \text{otherwise} \end{cases}\\ EX = p, \quad VarX = p(1-p) \]
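As a quick simulation sketch (the seed, sample size, and p = 0.3 are arbitrary choices, not from the text), we can check that the sample mean and variance of Bernoulli draws match \(EX = p\) and \(VarX = p(1-p)\):

```r
# Simulate Bernoulli(p) draws and compare the sample mean and variance
# with the theoretical values EX = p and VarX = p(1-p).
set.seed(1)
p <- 0.3
x <- rbinom(1e5, size = 1, prob = p)  # Bernoulli = binomial with size = 1
mean(x)   # should be close to p = 0.3
var(x)    # should be close to p*(1-p) = 0.21
```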
2.0.1 Binomial Distribution
Suppose there are n identical, independent Bernoulli random variables \(X_1,\dots,X_n\), each with probability p of being 1. Then \(Y=\sum_{i=1}^{n}X_i\) follows the binomial(n,p) distribution, where \(EY=np\) and \(VarY = np(1-p)\). \[ P(Y=y) = \binom{n}{y}p^y(1-p)^{n-y}, \quad y=0,\dots,n \]
n=30; p=1/2;
dbinom(0,n,p)
## [1] 9.313226e-10
dbinom(15,n,p)
## [1] 0.1444644
dbinom(30,n,p)
## [1] 9.313226e-10
y0=15; sum(dbinom(0:y0,n,p));pbinom(y0,n,p)
## [1] 0.5722322
## [1] 0.5722322
1-pbinom(y0,n,p);pbinom(y0,n,p,lower.tail=F)
## [1] 0.4277678
## [1] 0.4277678
sum(dbinom((y0+1):n,n,p))
## [1] 0.4277678
\[ X_i \text{ iid from Bernoulli}(p),\ i=1,\dots,n.\\ H_0: p = p_0 \text{ vs. } H_1: p>p_0\\ \text{Reject } H_0 \text{ if } Y \ge b_{\alpha,p_0}\\ \]
\[ H_0: p = p_0 \text{ vs. } H_1: p<p_0\\ \text{Reject } H_0 \text{ if } Y \le c_{\alpha,p_0}\\ \] \[ H_0: p = p_0 \text{ vs. } H_1: p\ne p_0\\ \text{Reject } H_0 \text{ if } Y \ge b_{\alpha_1,p_0} \text{ or } Y \le c_{\alpha_2,p_0},\\ \alpha_1 + \alpha_2 = \alpha \]
qbinom(0.95, size = 30, prob = 1/2);qbinom(0.05, 30, 1/2, lower.tail=F)
## [1] 19
## [1] 19
pbinom(19,30,1/2);1-pbinom(19,30,1/2)
## [1] 0.9506314
## [1] 0.04936857
pbinom(19,30,1/2,lower.tail=F)
## [1] 0.04936857
sum(dbinom(20:30,30,1/2))
## [1] 0.04936857
\(H_0: p=1/2\) vs. \(H_1: p\ne 1/2\)
n=30; y=23
2*pbinom(22,30,1/2, lower.tail=F)
## [1] 0.005222879
binom.test(23,30,0.5)
##
## Exact binomial test
##
## data: 23 and 30
## number of successes = 23, number of trials = 30, p-value = 0.005223
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.5771635 0.9006621
## sample estimates:
## probability of success
## 0.7666667
2.0.2 Normal approximation to the binomial distribution
- The approximation is best when the binomial distribution is symmetric, that is, when p = 1/2
- A frequently used rule of thumb is that the approximation is reasonable when np > 5 and n(1-p) >5
- The approximation is especially useful for large values of n
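As an illustration of the rule of thumb (a sketch; the values n = 30, p = 1/2, where np = n(1-p) = 15 > 5, are chosen to match the earlier example), we can compare exact binomial probabilities with the approximating normal density:

```r
# Compare exact binomial probabilities with the normal density having the
# same mean np and variance np(1-p), for n = 30, p = 1/2.
n <- 30; p <- 1/2
y <- 10:20
exact  <- dbinom(y, n, p)
approx <- dnorm(y, mean = n*p, sd = sqrt(n*p*(1-p)))
round(cbind(y, exact, approx), 4)  # the two columns agree closely near the center
```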
2.0.3 Central Limit Theorem
Let \(X_1,\dots,X_n\) be a sequence of iid random variables with mean \(\mu\) and variance \(\sigma^2 < \infty\). Let \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n}X_i\). Then \[ \lim_{n\rightarrow\infty} P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le x\right) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}}\,dy. \] That is, \(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}\) converges in distribution to a standard normal:
\[ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{n(\bar{X}_n - \mu)}{\sqrt{n\sigma^2}} = \frac{\sum_{i=1}^{n}X_i - n\mu}{\sqrt{Var\left(\sum_{i=1}^{n}X_i\right)}} \rightarrow N(0,1) \]
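A small simulation sketch of the theorem (the exponential(1) population, sample size, and replication count are arbitrary choices): the standardized sample mean should behave like a standard normal even though the population is skewed.

```r
# Simulate the standardized sample mean of exponential(1) draws
# (mu = 1, sigma = 1) and check that it looks approximately N(0,1).
set.seed(42)
n <- 200; reps <- 10000
z <- replicate(reps, {
  x <- rexp(n, rate = 1)        # population mean 1, variance 1
  sqrt(n) * (mean(x) - 1) / 1   # standardized sample mean
})
c(mean(z), var(z))              # should be near 0 and 1
mean(z > qnorm(0.95))           # upper-tail frequency, near 0.05
```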
Suppose that a coin is tossed 100 times and lands heads up 60 times. The number of heads \(Y \sim \text{binomial}(n=100, p=1/2)\). We calculate \(P(Y\ge60)\).
\[ EY = np = 50, \quad VarY = np(1-p) = 25\\ P(Y\ge60) = P\left(\frac{Y-50}{5} \ge \frac{60-50}{5}\right) \approx 1- \Phi(2) \]
1-pbinom(59,100,1/2)
## [1] 0.02844397
pbinom(59,100,1/2, lower.tail=F)
## [1] 0.02844397
pnorm(2,0,1, lower.tail=F)
## [1] 0.02275013
- \(X_i\) iid from Bernoulli\((p)\), \(i=1,\dots,n\)
- \(\hat{p} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i\), \(E\hat{p} = p\), \(Var(\hat{p}) = p(1-p)/n\)
- \(H_0 : p = p_0\) vs. \(H_1: p \ne p_0\)
- Under \(H_0\), \(T_1 = \frac{\sqrt{n}(\hat{p}-p_0)}{\sqrt{\hat{p}(1-\hat{p})}} \rightarrow N(0,1)\) as \(n \rightarrow \infty\)
- Under \(H_0\), \(T_2 = \frac{\sqrt{n}(\hat{p}-p_0)}{\sqrt{p_0(1-p_0)}} \rightarrow N(0,1)\) as \(n \rightarrow \infty\)
- Reject \(H_0\) if \(|T| > z_{1-\frac{\alpha}{2}}\), using either statistic as \(T\)
n=30; p0=0.5; y=23; xbar=y/n;
tstat1=sqrt(n)*(xbar-p0)/sqrt(xbar*(1-xbar))
tstat2=sqrt(n)*(xbar-p0)/sqrt(p0*(1-p0))
round(c(xbar,tstat1,tstat2^2,2*pnorm(abs(tstat1),lower.tail=F),
2*pnorm(abs(tstat2),lower.tail=F),pchisq(tstat2^2,df=1,lower.tail=F)),6)
## [1] 0.766667 3.453327 8.533333 0.000554 0.003487 0.003487
prop.test(y,n,p0,correct=F)
##
## 1-sample proportions test without continuity correction
##
## data: y out of n, null probability p0
## X-squared = 8.5333, df = 1, p-value = 0.003487
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.5907167 0.8820761
## sample estimates:
## p
## 0.7666667
- Don’t reject \(H_0\) if \(|T| \le z_{1-\alpha/2}\)
\[ 1-\alpha = P\left(\frac{\sqrt{n}|\hat{p}-p_0|}{\sqrt{\hat{p}(1-\hat{p})}} \le z_{1-\frac{\alpha}{2}}\right) \\ = P\left(-z_{1-\frac{\alpha}{2}}\sqrt{\hat{p}(1-\hat{p})} \le \sqrt{n}(\hat{p} - p_0) \le z_{1-\frac{\alpha}{2}}\sqrt{\hat{p}(1-\hat{p})}\right)\\ = P\left(\hat{p} - z_{1-\frac{\alpha}{2}}\sqrt{\hat{p}(1-\hat{p})/n} \le p_0 \le \hat{p} + z_{1-\frac{\alpha}{2}}\sqrt{\hat{p}(1-\hat{p})/n}\right) \]
alpha = 0.05; qn = qnorm(1-alpha/2)
sx = sqrt(xbar*(1-xbar)/n)
xbar+c(-1,1)*qn*sx
## [1] 0.6153178 0.9180155
library(binom)
binom.confint(x=23,n=30)
## method x n mean lower upper
## 1 agresti-coull 23 30 0.7666667 0.5879550 0.8848378
## 2 asymptotic 23 30 0.7666667 0.6153178 0.9180155
## 3 bayes 23 30 0.7580645 0.6083351 0.8984425
## 4 cloglog 23 30 0.7666667 0.5720336 0.8812678
## 5 exact 23 30 0.7666667 0.5771635 0.9006621
## 6 logit 23 30 0.7666667 0.5850489 0.8844879
## 7 probit 23 30 0.7666667 0.5922975 0.8892157
## 8 profile 23 30 0.7666667 0.5973321 0.8922283
## 9 lrt 23 30 0.7666667 0.5973357 0.8922495
## 10 prop.test 23 30 0.7666667 0.5729977 0.8936498
## 11 wilson 23 30 0.7666667 0.5907167 0.8820761