2 Binomial
A Bernoulli random variable has only two possible values: 1 and 0, with probabilities p and 1-p, respectively. \[ P(X=x)=p(x)= \begin{cases} p^x(1-p)^{1-x}, & \text{if } x=0 \text{ or } x=1\\ 0, & \text{otherwise} \end{cases}\\ EX = p, \quad VarX = p(1-p) \]
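As a quick simulation sketch (the seed, sample size, and p = 0.3 are arbitrary choices, not from the text), we can check that the sample mean and variance of Bernoulli draws match \(EX = p\) and \(VarX = p(1-p)\):

```r
# Simulate Bernoulli(p) draws and compare the sample mean and variance
# with the theoretical values EX = p and VarX = p(1-p).
set.seed(1)
p <- 0.3
x <- rbinom(1e5, size = 1, prob = p)  # Bernoulli = binomial with size = 1
mean(x)   # should be close to p = 0.3
var(x)    # should be close to p*(1-p) = 0.21
```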
2.0.1 Binomial Distribution
Suppose there are n identical, independent Bernoulli random variables \(X_1,\dots,X_n\), each with probability p of being 1. Then \(Y=\sum_{i=1}^{n}X_i\) follows the binomial(n,p) distribution, where \(EY=np\) and \(VarY = np(1-p)\). \[ P(Y=y) = \binom{n}{y}p^y(1-p)^{n-y}, \quad y=0,\dots,n \]
n=30; p=1/2;
dbinom(0,n,p)
## [1] 9.313226e-10
dbinom(15,n,p)
## [1] 0.1444644
dbinom(30,n,p)
## [1] 9.313226e-10
y0=15; sum(dbinom(0:y0,n,p));pbinom(y0,n,p)
## [1] 0.5722322
## [1] 0.5722322
1-pbinom(y0,n,p);pbinom(y0,n,p,lower.tail=F)
## [1] 0.4277678
## [1] 0.4277678
sum(dbinom((y0+1):n,n,p))
## [1] 0.4277678
\[ X_i \text{ iid from Bernoulli}(p),\ i=1,\dots,n.\\ H_0: p = p_0 \text{ vs. } H_1: p>p_0\\ \text{Reject } H_0 \text{ if } Y \ge b_{\alpha,p_0}\\ \]
\[ H_0: p = p_0 \text{ vs. } H_1: p<p_0\\ \text{Reject } H_0 \text{ if } Y \le c_{\alpha,p_0}\\ \] \[ H_0: p = p_0 \text{ vs. } H_1: p\ne p_0\\ \text{Reject } H_0 \text{ if } Y \ge b_{\alpha_1,p_0} \text{ or } Y \le c_{\alpha_2,p_0},\\ \alpha_1 + \alpha_2 = \alpha \]
qbinom(0.95, size = 30, prob = 1/2);qbinom(0.05, 30, 1/2, lower.tail=F)
## [1] 19
## [1] 19
pbinom(19,30,1/2);1-pbinom(19,30,1/2)
## [1] 0.9506314
## [1] 0.04936857
pbinom(19,30,1/2,lower.tail=F)
## [1] 0.04936857
sum(dbinom(20:30,30,1/2))
## [1] 0.04936857
\(H_0: p=1/2\) vs. \(H_1: p\ne 1/2\)
n=30; y=23
2*pbinom(22,30,1/2, lower.tail=F)
## [1] 0.005222879
binom.test(23,30,0.5)
##
## Exact binomial test
##
## data: 23 and 30
## number of successes = 23, number of trials = 30, p-value = 0.005223
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.5771635 0.9006621
## sample estimates:
## probability of success
## 0.7666667
2.0.2 Normal approximation to the binomial distribution
- The approximation is best when the binomial distribution is symmetric, that is, when p = 1/2
- A frequently used rule of thumb is that the approximation is reasonable when np > 5 and n(1-p) >5
- The approximation is especially useful for large values of n
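As an illustration of the rule of thumb (a sketch; the values n = 30, p = 1/2, where np = n(1-p) = 15 > 5, are chosen to match the earlier example), we can compare exact binomial probabilities with the approximating normal density:

```r
# Compare exact binomial probabilities with the normal density having the
# same mean np and variance np(1-p), for n = 30, p = 1/2.
n <- 30; p <- 1/2
y <- 10:20
exact  <- dbinom(y, n, p)
approx <- dnorm(y, mean = n*p, sd = sqrt(n*p*(1-p)))
round(cbind(y, exact, approx), 4)  # the two columns agree closely near the center
```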
2.0.3 Central Limit Theorem
Let \(X_1,\dots,X_n\) be a sequence of iid random variables with mean \(\mu\) and variance \(\sigma^2 < \infty\). Let \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n}X_i\). Then \[ \lim_{n\rightarrow\infty} P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le x\right) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}}\,dy. \] That is, \(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}\) converges in distribution to a standard normal:
\[ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{n(\bar{X}_n - \mu)}{\sqrt{n\sigma^2}} = \frac{\sum_{i=1}^{n}X_i - n\mu}{\sqrt{Var\left(\sum_{i=1}^{n}X_i\right)}} \rightarrow N(0,1) \]
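A small simulation sketch of the theorem (the exponential(1) population, sample size, and replication count are arbitrary choices): the standardized sample mean should behave like a standard normal even though the population is skewed.

```r
# Simulate the standardized sample mean of exponential(1) draws
# (mu = 1, sigma = 1) and check that it looks approximately N(0,1).
set.seed(42)
n <- 200; reps <- 10000
z <- replicate(reps, {
  x <- rexp(n, rate = 1)        # population mean 1, variance 1
  sqrt(n) * (mean(x) - 1) / 1   # standardized sample mean
})
c(mean(z), var(z))              # should be near 0 and 1
mean(z > qnorm(0.95))           # upper-tail frequency, near 0.05
```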
Suppose that a coin is tossed 100 times and lands heads up 60 times. The number of heads \(Y \sim \text{binomial}(n=100, p=1/2)\). We calculate \(P(Y\ge60)\).
\[ EY = np = 50, \quad VarY = np(1-p) = 25\\ P(Y\ge60) = P\left(\frac{Y-50}{5} \ge \frac{60-50}{5}\right) \approx 1- \Phi(2) \]
1-pbinom(59,100,1/2)
## [1] 0.02844397
pbinom(59,100,1/2, lower.tail=F)
## [1] 0.02844397
pnorm(2,0,1, lower.tail=F)
## [1] 0.02275013
- \(X_i\) iid from Bernoulli\((p)\), \(i=1,\dots,n\)
- \(\hat{p} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i\), \(E\hat{p} = p\), \(Var(\hat{p}) = p(1-p)/n\)
- \(H_0 : p = p_0\) vs. \(H_1: p \ne p_0\)
- Under \(H_0\), \(T_1 = \frac{\sqrt{n}(\hat{p}-p_0)}{\sqrt{\hat{p}(1-\hat{p})}} \rightarrow N(0,1)\) as \(n \rightarrow \infty\)
- Under \(H_0\), \(T_2 = \frac{\sqrt{n}(\hat{p}-p_0)}{\sqrt{p_0(1-p_0)}} \rightarrow N(0,1)\) as \(n \rightarrow \infty\)
- Reject \(H_0\) if \(|T| > z_{1-\frac{\alpha}{2}}\), using either statistic as \(T\)
n=30; p0=0.5; y=23; xbar=y/n;
tstat1=sqrt(n)*(xbar-p0)/sqrt(xbar*(1-xbar))
tstat2=sqrt(n)*(xbar-p0)/sqrt(p0*(1-p0))
round(c(xbar,tstat1,tstat2^2,2*pnorm(abs(tstat1),lower.tail=F),
2*pnorm(abs(tstat2),lower.tail=F),pchisq(tstat2^2,df=1,lower.tail=F)),6)
## [1] 0.766667 3.453327 8.533333 0.000554 0.003487 0.003487
prop.test(y,n,p0,correct=F)
##
## 1-sample proportions test without continuity correction
##
## data: y out of n, null probability p0
## X-squared = 8.5333, df = 1, p-value = 0.003487
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.5907167 0.8820761
## sample estimates:
## p
## 0.7666667
- Don’t reject \(H_0\) if \(|T| \le z_{1-\alpha/2}\)
\[ 1-\alpha = P\left(\frac{\sqrt{n}|\hat{p}-p_0|}{\sqrt{\hat{p}(1-\hat{p})}} \le z_{1-\frac{\alpha}{2}}\right) \\ = P\left(-z_{1-\frac{\alpha}{2}}\sqrt{\hat{p}(1-\hat{p})} \le \sqrt{n}(\hat{p} - p_0) \le z_{1-\frac{\alpha}{2}}\sqrt{\hat{p}(1-\hat{p})}\right)\\ = P\left(\hat{p} - z_{1-\frac{\alpha}{2}}\sqrt{\hat{p}(1-\hat{p})/n} \le p_0 \le \hat{p} + z_{1-\frac{\alpha}{2}}\sqrt{\hat{p}(1-\hat{p})/n}\right) \]
alpha = 0.05; qn = qnorm(1-alpha/2)
sx = sqrt(xbar*(1-xbar)/n)
xbar+c(-1,1)*qn*sx
## [1] 0.6153178 0.9180155
library(binom)
binom.confint(x=23,n=30)
## method x n mean lower upper
## 1 agresti-coull 23 30 0.7666667 0.5879550 0.8848378
## 2 asymptotic 23 30 0.7666667 0.6153178 0.9180155
## 3 bayes 23 30 0.7580645 0.6083351 0.8984425
## 4 cloglog 23 30 0.7666667 0.5720336 0.8812678
## 5 exact 23 30 0.7666667 0.5771635 0.9006621
## 6 logit 23 30 0.7666667 0.5850489 0.8844879
## 7 probit 23 30 0.7666667 0.5922975 0.8892157
## 8 profile 23 30 0.7666667 0.5973321 0.8922283
## 9 lrt 23 30 0.7666667 0.5973357 0.8922495
## 10 prop.test 23 30 0.7666667 0.5729977 0.8936498
## 11 wilson 23 30 0.7666667 0.5907167 0.8820761