4 Sign test and Wilcoxon signed rank

\[\begin{aligned} &\text{Paired data } (X_i, Y_i),\quad i = 1,\dots,n \\ &(X_i, Y_i) \text{ independent of } (X_j, Y_j),\quad i \ne j \\ &X_i \sim (\mu_x, \sigma_x^2),\quad Y_i \sim (\mu_y, \sigma_y^2) \\ &\operatorname{Cov}(X_i, Y_i) = \sigma_{xy} = \rho\sigma_x\sigma_y \\ &D_i = X_i - Y_i \sim N(\delta = \mu_x - \mu_y,\ \sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y) \\ &\bar{D} = \bar{X} - \bar{Y} \sim N\left(\delta,\ \tfrac{1}{n}(\sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y)\right) \end{aligned}\]
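
For reference, the paired t statistic that follows from this setup can be written as below. This is a brief sketch: \(\delta_0\) (the hypothesized mean difference, usually 0) and \(S_D\) (the sample standard deviation of the differences) are notation introduced here, not taken from the text above.

\[ T = \frac{\bar{D} - \delta_0}{S_D/\sqrt{n}}, \qquad \bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \qquad S_D^2 = \frac{1}{n-1}\sum_{i=1}^{n}(D_i - \bar{D})^2, \qquad T \sim t_{n-1} \text{ under } H_0: \delta = \delta_0 \]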

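# Simulate n pairs (x and y are generated independently here)
n=30; x=rnorm(n,mean=1,sd=1); y=rnorm(n,mean=1.1,sd=1);
# Paired t-test: equivalent to a one-sample t-test on the differences x-y
t.test(x,y,paired=TRUE)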

## 
##  Paired t-test
## 
## data:  x and y
## t = -0.49328, df = 29, p-value = 0.6255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.7969055  0.4871960
## sample estimates:
## mean of the differences 
##              -0.1548547

t.test(x-y)

## 
##  One Sample t-test
## 
## data:  x - y
## t = -0.49328, df = 29, p-value = 0.6255
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.7969055  0.4871960
## sample estimates:
##  mean of x 
## -0.1548547
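
The agreement between the two calls can also be checked by hand. The following is a minimal sketch on freshly simulated data: the seed and the names `d`, `var_identity`, `t_manual` and `p_manual` are assumptions made for this illustration, so the numbers will not match the output shown above.

```r
# Illustrative data; the seed is an arbitrary choice for reproducibility
set.seed(1)
n <- 30
x <- rnorm(n, mean = 1, sd = 1)
y <- rnorm(n, mean = 1.1, sd = 1)

d <- x - y  # paired differences

# Sample analogue of Var(D) = Var(X) + Var(Y) - 2 Cov(X, Y)
var_identity <- var(x) + var(y) - 2 * cov(x, y)

# Paired t statistic: mean difference divided by its estimated standard error
t_manual <- mean(d) / (sd(d) / sqrt(n))
p_manual <- 2 * pt(abs(t_manual), df = n - 1, lower.tail = FALSE)

res <- t.test(x, y, paired = TRUE)
c(t_manual = t_manual, t_builtin = unname(res$statistic))
c(p_manual = p_manual, p_builtin = res$p.value)
```

The hand-computed statistic and p-value should agree with `t.test`, and `var(d)` should equal `var_identity` up to floating-point error.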

4.0.1 Sign test and Wilcoxon signed rank test

Nonparametric versions of the paired or one-sample t-test. Primary interest centers on the location (median) of the population. Two scenarios: paired data, such as pretreatment and posttreatment measurements, where we look at the shift in location due to treatment; or one-sample data, where the observations come from a single population about whose location we wish to make inferences.

Assumptions:

Let \(Z_i = Y_i - X_i, i = 1,\dots,n\). The differences \(Z_1,\dots,Z_n\) are mutually independent; \(X_i\) and \(Y_i\) can be dependent.
Each \(Z_i\) comes from a continuous population (not necessarily the same one), and all of them share a common median \(\theta\)
\(P(Z_i \le \theta) = P(Z_i > \theta) = 1/2\)
Equivalently, \(P(Z_i - \theta \le 0) = P(Z_i - \theta > 0) = 1/2\)
\(\theta\) is the treatment effect

Under \(H_0\):

Each of the distributions for the differences has median 0, corresponding to no shift in location due to the treatment.
The sign statistic is the number of positive \(Z_i\)'s: \(T = \sum_{i=1}^{n} I(Z_i > 0)\)
Each indicator \(I(Z_i > 0)\) follows a Bernoulli distribution with \(p = \frac{1}{2}\)
\(T\) follows a binomial\((n, 1/2)\) distribution
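
Because the null distribution of the sign statistic is binomial with success probability 1/2, a two-sided p-value can be read directly off the binomial tails. The sketch below is illustrative only: `sign_test_p` is a hypothetical helper (not part of any package), the data are simulated with an arbitrary seed, and dropping exact zeros is a common convention assumed here rather than something stated in the text.

```r
# Two-sided sign test p-value from the Binomial(n, 1/2) null distribution.
# `sign_test_p` is a hypothetical helper written for this illustration.
sign_test_p <- function(z) {
  z <- z[z != 0]        # zeros carry no sign information (assumed convention)
  n <- length(z)
  t_obs <- sum(z > 0)   # sign statistic: number of positive differences
  lower <- pbinom(t_obs, n, 0.5)                          # P(T <= t_obs)
  upper <- pbinom(t_obs - 1, n, 0.5, lower.tail = FALSE)  # P(T >= t_obs)
  # The null distribution is symmetric, so double the smaller tail (capped at 1)
  min(1, 2 * min(lower, upper))
}

set.seed(2)                       # arbitrary seed, simulated differences
z_demo <- rnorm(12, mean = 0.5)
p_sign <- sign_test_p(z_demo)
p_binom <- binom.test(sum(z_demo > 0), length(z_demo), p = 0.5)$p.value
c(sign_test_p = p_sign, binom.test = p_binom)
```

The two values should coincide, since `binom.test` with `p = 0.5` tests the same binomial null hypothesis.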

4.0.2 Example

library(BSDA)

## Loading required package: lattice

## 
## Attaching package: 'BSDA'

## The following object is masked from 'package:datasets':
## 
##     Orange

x=c(1.83,0.50,1.62,2.48,1.68,1.88,1.55,3.06,1.30);
y=c(0.878,0.647,0.598,2.05,1.06,1.29,1.06,3.14,1.29);
plot(x,y,las=1); abline(0,1);

# Differences (taken as x-y in this example), their signs, and the sign statistic
z=x-y; z; sign(z) ; stat=sum(x>y);

## [1]  0.952 -0.147  1.022  0.430  0.620  0.590  0.490 -0.080  0.010

## [1]  1 -1  1  1  1  1  1 -1  1

stat; median(z);

## [1] 7

## [1] 0.49

2*pbinom(stat-1,length(z),1/2,lower.tail=F)

## [1] 0.1796875

SIGN.test(z);

## 
##  One-sample Sign-Test
## 
## data:  z
## s = 7, p-value = 0.1797
## alternative hypothesis: true median is not equal to 0
## 95 percent confidence interval:
##  -0.0730000  0.9261778
## sample estimates:
## median of x 
##        0.49 
## 
## Achieved and Interpolated Confidence Intervals: 
## 
##                   Conf.Level L.E.pt U.E.pt
## Lower Achieved CI     0.8203  0.010 0.6200
## Interpolated CI       0.9500 -0.073 0.9262
## Upper Achieved CI     0.9609 -0.080 0.9520
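
The two "Achieved" rows in the table above are consistent with intervals built from order statistics of the differences: the interval \((Z_{(k)}, Z_{(n+1-k)})\) covers the median with probability \(1 - 2P(B \le k-1)\), where \(B\) is binomial\((n, 1/2)\). The sketch below recomputes those rows under that assumption. It does not attempt to reproduce the interpolated row, and the name `ci` is introduced only for this example.

```r
# Differences from the example above
z <- c(0.952, -0.147, 1.022, 0.430, 0.620, 0.590, 0.490, -0.080, 0.010)
n <- length(z)
zs <- sort(z)

# (Z_(k), Z_(n+1-k)) has coverage 1 - 2 * P(B <= k - 1), B ~ Binomial(n, 1/2)
k <- 2:3
ci <- data.frame(
  k        = k,
  lower    = zs[k],
  upper    = zs[n + 1 - k],
  coverage = 1 - 2 * pbinom(k - 1, n, 0.5)
)
ci
```

Here k = 3 corresponds to the lower achieved level and k = 2 to the upper achieved level; the nominal 95% level falls between the two, which is why an interpolated interval is reported.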

When \(\theta = 0\), the distribution of the statistic \(T\) is symmetric about its mean \(n/2\)
A natural estimator of \(\theta\) is the amount \(\hat\theta\) that should be subtracted from each \(Z_i\) so that the value of \(T\), when applied to the shifted sample \(Z_1 - \hat\theta, \dots, Z_n - \hat\theta\), is as close to \(n/2\) as possible
That is, we estimate \(\theta\) by the amount \(\hat\theta\) by which the Z sample should be shifted so that \(Z_1 - \hat\theta, \dots, Z_n - \hat\theta\) appears to be a sample from a population with median 0. This estimator is the sample median of the \(Z_i\)

z-median(z)

## [1]  0.462 -0.637  0.532 -0.060  0.130  0.100  0.000 -0.570 -0.480

median(z-median(z))

## [1] 0
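
To see why the sample median balances the sign statistic, one can count positive and negative shifted differences for a few candidate shifts. This is a small sketch using the differences from the example; the candidate values other than `median(z)` and the name `balance` are arbitrary choices made for illustration.

```r
# Differences from the example above
z <- c(0.952, -0.147, 1.022, 0.430, 0.620, 0.590, 0.490, -0.080, 0.010)

# Candidate shifts; only median(z) comes from the text, the rest are arbitrary
candidates <- c(0, 0.25, median(z), 0.75)

# Number of positive and negative values in the shifted sample z - theta
balance <- data.frame(
  theta     = candidates,
  positives = sapply(candidates, function(th) sum(z - th > 0)),
  negatives = sapply(candidates, function(th) sum(z - th < 0))
)
balance
```

Only the shift by the sample median leaves as many positive as negative values (one shifted value is exactly zero because n is odd), which is the sense in which the shifted sample looks like it has median 0.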

4.0.3 Wilcoxon signed rank test

n=10; rn=(n*(n+1)/2); tv=0:rn;
plot(tv,dsignrank(tv,n),type="h",ylab="",main=paste("n=",n),las=1);

x0=11;
psignrank(x0,n);

## [1] 0.05273438

1-psignrank(rn-x0-1,n);

## [1] 0.05273438

psignrank(rn-x0-1,n,lower.tail=F);

## [1] 0.05273438
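
The null distribution returned by `dsignrank` arises because, under \(H_0\), each rank \(1,\dots,n\) is equally likely to belong to a positive or a negative difference, so all \(2^n\) sign patterns have the same probability. The sketch below enumerates them for a small case; the choice n = 4 and the variable names are assumptions made to keep the example short.

```r
n <- 4  # small n so that all 2^n sign patterns can be listed

# Each row is one sign pattern: 1 = the rank belongs to a positive difference
signs <- as.matrix(expand.grid(rep(list(0:1), n)))

# Signed rank statistic: sum of the ranks attached to positive differences
t_plus <- as.vector(signs %*% (1:n))

support <- 0:(n * (n + 1) / 2)
pmf_enum <- as.vector(table(factor(t_plus, levels = support))) / 2^n

rbind(t = support, enumerated = pmf_enum, dsignrank = dsignrank(support, n))
```

The enumerated probabilities should match `dsignrank`, and the distribution is symmetric about \(n(n+1)/4\), which is why the lower-tail and upper-tail calls above return the same value.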

# Raw differences and their ranks; the signed rank statistic below ranks abs(z) instead
rbind(z, rank(z)); n=length(z)

##    [,1]   [,2]  [,3] [,4] [,5] [,6] [,7]  [,8] [,9]
## z 0.952 -0.147 1.022 0.43 0.62 0.59 0.49 -0.08 0.01
##   8.000  1.000 9.000 4.00 7.00 6.00 5.00  2.00 3.00

Tplusv=sign(z)*rank(abs(z));
tplus=sum(Tplusv[Tplusv>0]); tplus

## [1] 40

psignrank(tplus-1,n,lower.tail=F)

## [1] 0.01953125

wilcox.test(x, y, paired = T, alternative = "greater",conf.int=T);

## 
##  Wilcoxon signed rank exact test
## 
## data:  x and y
## V = 40, p-value = 0.01953
## alternative hypothesis: true location shift is greater than 0
## 95 percent confidence interval:
##  0.175   Inf
## sample estimates:
## (pseudo)median 
##           0.46

zz=outer(z,z,"+"); lzz=zz[lower.tri(zz,diag=T)]; median(lzz)/2

## [1] 0.46

median(z-0.46)

## [1] 0.03

wilcox.test(z, alternative = "greater",conf.int=T)

## 
##  Wilcoxon signed rank exact test
## 
## data:  z
## V = 40, p-value = 0.01953
## alternative hypothesis: true location is greater than 0
## 95 percent confidence interval:
##  0.175   Inf
## sample estimates:
## (pseudo)median 
##           0.46
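
Both the (pseudo)median and the lower confidence bound reported above can be recovered from the sorted Walsh averages \((Z_i + Z_j)/2,\ i \le j\). The sketch below uses one standard construction, in which the one-sided bound is the k-th smallest Walsh average with k taken from the null distribution of the signed rank statistic. That this matches the internals of `wilcox.test` is an assumption, although the numbers agree with the output above.

```r
# Differences from the example above
z <- c(0.952, -0.147, 1.022, 0.430, 0.620, 0.590, 0.490, -0.080, 0.010)
n <- length(z)

# Walsh averages (z_i + z_j) / 2 for i <= j, sorted increasingly
w <- outer(z, z, "+") / 2
walsh <- sort(w[lower.tri(w, diag = TRUE)])

# Hodges-Lehmann estimate: median of the Walsh averages
hl <- median(walsh)

# One-sided 95% lower bound: k-th smallest Walsh average, where k is the
# 5% quantile of the signed rank null distribution
k <- qsignrank(0.05, n)
lower <- walsh[k]
c(estimate = hl, k = k, lower = lower)
```

Because the signed rank distribution is discrete, the achieved confidence level is `1 - psignrank(k - 1, n)`, slightly above the nominal 95%.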

4.0.4 Large sample approximation

wilcox.test(z, alternative = "greater", conf.int=T,
exact = FALSE, correct = F) ;

## 
##  Wilcoxon signed rank test
## 
## data:  z
## V = 40, p-value = 0.01908
## alternative hypothesis: true location is greater than 0
## 95 percent confidence interval:
##  0.1750369       Inf
## sample estimates:
## (pseudo)median 
##      0.4600024
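
The approximate p-value above comes from standardizing the signed rank statistic with its null mean \(n(n+1)/4\) and null variance \(n(n+1)(2n+1)/24\) (valid when there are no ties or zeros) and referring the result to a standard normal distribution. A minimal sketch of that calculation, without continuity correction to match `correct = F`, follows; the variable names are introduced for illustration.

```r
# Differences from the example above
z <- c(0.952, -0.147, 1.022, 0.430, 0.620, 0.590, 0.490, -0.080, 0.010)
n <- length(z)

# Signed rank statistic: sum of ranks of |z| over the positive differences
t_plus <- sum(rank(abs(z))[z > 0])

mu    <- n * (n + 1) / 4                        # E(T+) under H0
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24)   # SD(T+) under H0, no ties

z_stat <- (t_plus - mu) / sigma
p_norm <- pnorm(z_stat, lower.tail = FALSE)     # one-sided, no continuity correction
c(t_plus = t_plus, z = z_stat, p = p_norm)
```

With only n = 9 observations the normal approximation is already close to the exact p-value reported earlier, but the exact test remains preferable at this sample size.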