Matching is a class of observational-study methods that reduces covariate bias by pairing each treated unit with one or more control units. In the presence of many confounding variables, matching is facilitated by the propensity score, a balancing score that summarizes all measured covariates into a single probability of a unit being in the treatment group. From there, units can be matched in a variety of ways: nearest-neighbor matching, full matching, and Mahalanobis distance matching, among others. In each of these methods, the goal is to “match” subjects on observable characteristics, making the treatment and control groups as similar as possible.
Matching on observable characteristics is crucial to achieving a balanced treatment and control sample, but the trade-offs of the matching method should be considered. For one, the study sample is now a fraction of the overall data: only units with a match are included in the analyses. Additionally, the Average Treatment Effect on the Treated (ATT) is, under these methods, only internally valid for the matched sample. Moreover, the fact that some subjects are more likely than others to be placed in the treatment group is inappropriately ignored. Such trade-offs might be addressed with a more nuanced treatment of the propensity score.
Using data from NYC public schools, we compare the capabilities of various weighting methods, noting the technical foundations and general limitations these methods share with weighting as used in the survey literature.
head(working_data)
#> # A tibble: 6 × 15
#> ...1 DBN TotalEnrollment ENI PercentBlack PercentSWD PercentPoverty
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 01M015 190 0.89 0.274 0.258 0.847
#> 2 2 01M019 257 0.679 0.191 0.35 0.77
#> 3 3 01M020 497 0.8 0.103 0.237 0.736
#> 4 4 01M034 333 0.937 0.318 0.372 0.979
#> 5 5 01M063 203 0.762 0.182 0.33 0.818
#> 6 6 01M064 245 0.882 0.208 0.298 0.922
#> # … with 8 more variables: self_contained <dbl>, AllStudents_CA <dbl>,
#> # AllStudents_PA <dbl>, SWD_CA <dbl>, SWD_PA <dbl>, gifted <dbl>,
#> # gifted_only <dbl>, treatment <dbl>
In the sample analysis, the statistical quantity of interest is the causal effect of a treatment (inclusion models) on percent attendance (PA) and chronic absenteeism (CA) for third-grade public school students in NYC. In the sample dataset above, each school is identified by a unique six-character DBN. The additional variables, such as Percent Black students and Percent Students in Poverty, serve as pre-treatment covariates. In what follows, I will use the MatchIt package along with various weighting packages to compare how estimates of the ATE and the ATT vary by weighting scheme.
The 623 schools without inclusion models had an average attendance of 92.92% and a mean chronic absenteeism rate of 24.91%, compared with the 147 inclusion schools, which had a mean attendance of 94.55% and a chronic absenteeism rate of about 15.5%.
working_data %>%
  group_by(treatment) %>%
  summarise(mean_attendance = mean(AllStudents_PA),
            mean_chronic_absent = mean(AllStudents_CA),
            n = n())
#> # A tibble: 2 × 4
#> treatment mean_attendance mean_chronic_absent n
#> <dbl> <dbl> <dbl> <int>
#> 1 0 92.9 24.9 623
#> 2 1 94.6 15.5 147
This is only a naive comparison, however, since the treatment (inclusion) is not randomized. Low-income, high-minority schools are more likely to have self-contained classrooms than high-income, low-minority schools. Additionally, schools with more students with special needs are more likely to have a self-contained option. These differences, rather than the effect of inclusion itself, could be driving the attendance difference.
working_data %>%
  group_by(treatment) %>%
  summarise(mean_PercentBlack = mean(PercentBlack),
            mean_PercentSWD = mean(PercentSWD),
            mean_PercentPoverty = mean(PercentPoverty),
            mean_TotalEnrollment = mean(TotalEnrollment),
            mean_ENI = mean(ENI))
#> # A tibble: 2 × 6
#> treatment mean_PercentBlack mean_PercentSWD mean_PercentPove… mean_TotalEnrol…
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 0.276 0.218 0.805 655.
#> 2 1 0.162 0.188 0.544 476.
#> # … with 1 more variable: mean_ENI <dbl>
Simple Regression
A simple unmatched regression shows the treatment effect of inclusion as a 1.63 percentage-point increase in attendance rates for all students.
### unmatched effect
mod1 <- lm(AllStudents_PA ~ treatment, data = working_data)
summary(mod1)
#>
#> Call:
#> lm(formula = AllStudents_PA ~ treatment, data = working_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -8.9225 -1.6225 0.2775 1.6775 5.6775
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 92.92247 0.09386 989.966 < 2e-16 ***
#> treatment 1.62923 0.21483 7.584 9.69e-14 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.343 on 768 degrees of freedom
#> Multiple R-squared: 0.06967, Adjusted R-squared: 0.06846
#> F-statistic: 57.52 on 1 and 768 DF, p-value: 9.693e-14
Propensity Score Matching
To isolate the effect of inclusion, we create an artificial control group that resembles the treatment group in terms of the five key covariates. One way to create this group is to calculate the propensity score by hand, through a logistic regression model that predicts treatment from the five covariates, yielding a propensity score for each subject (each school). The propensity score represents the probability that the school is in the treatment group.
m_ps <- glm(treatment ~ PercentBlack + PercentSWD + PercentPoverty + TotalEnrollment + ENI,
            family = binomial(), data = working_data)
summary(m_ps)
#>
#> Call:
#> glm(formula = treatment ~ PercentBlack + PercentSWD + PercentPoverty +
#> TotalEnrollment + ENI, family = binomial(), data = working_data)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.15968 -0.49568 -0.28753 -0.09002 2.99702
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 7.462e+00 7.884e-01 9.465 < 2e-16 ***
#> PercentBlack -1.657e+00 5.572e-01 -2.974 0.00294 **
#> PercentSWD -9.987e+00 2.116e+00 -4.720 2.36e-06 ***
#> PercentPoverty -1.301e+01 2.010e+00 -6.472 9.66e-11 ***
#> TotalEnrollment -5.979e-03 6.926e-04 -8.633 < 2e-16 ***
#> ENI 8.768e+00 1.951e+00 4.494 6.98e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 750.81 on 769 degrees of freedom
#> Residual deviance: 477.26 on 764 degrees of freedom
#> AIC: 489.26
#>
#> Number of Fisher Scoring iterations: 6
# the PS is the predicted probability
prs_df <- data.frame(pr_score = predict(m_ps, type = "response"),
                     treatment = m_ps$model$treatment)
head(prs_df)
#> pr_score treatment
#> 1 0.51944953 1
#> 2 0.12431495 0
#> 3 0.35216718 0
#> 4 0.03577569 1
#> 5 0.21224056 1
#> 6 0.16978541 1
The MatchIt package facilitates the estimation of this propensity score and the subsequent matching of groups. Our control group (originally N = 623) shrinks to the size of the treatment group (N = 147), so only observations with similar features remain.
## propensity score model
prop_matched <- matchit(treatment ~ PercentBlack + PercentSWD + PercentPoverty + TotalEnrollment + ENI,
                        family = binomial(), data = working_data)
prop_matched2 <- match.data(prop_matched)
# simple comparison
prop_matched2 %>%
  group_by(treatment) %>%
  summarise(mean_attendance = mean(AllStudents_PA),
            mean_chronic_absent = mean(AllStudents_CA),
            n = n())
#> # A tibble: 2 × 4
#> treatment mean_attendance mean_chronic_absent n
#> <dbl> <dbl> <dbl> <int>
#> 1 0 93.3 22.5 147
#> 2 1 94.6 15.5 147
Balance on the propensity score can be further assessed through the cobalt package. Simple propensity score matching reduces the difference in the distributions of all covariates from baseline, but tweaking the ratio and caliper of the matching method will improve this balance.
library(cobalt)
bal.tab(prop_matched, m.threshold = 0.1)
#> Call
#> matchit(formula = treatment ~ PercentBlack + PercentSWD + PercentPoverty +
#> TotalEnrollment + ENI, data = working_data, family = binomial())
#>
#> Balance Measures
#> Type Diff.Adj M.Threshold
#> distance Distance 0.6218
#> PercentBlack Contin. -0.1707 Not Balanced, >0.1
#> PercentSWD Contin. -0.2868 Not Balanced, >0.1
#> PercentPoverty Contin. -0.5000 Not Balanced, >0.1
#> TotalEnrollment Contin. -0.0509 Balanced, <0.1
#> ENI Contin. -0.4758 Not Balanced, >0.1
#>
#> Balance tally for mean differences
#> count
#> Balanced, <0.1 1
#> Not Balanced, >0.1 4
#>
#> Variable with the greatest mean difference
#> Variable Diff.Adj M.Threshold
#> PercentPoverty -0.5 Not Balanced, >0.1
#>
#> Sample sizes
#> Control Treated
#> All 623 147
#> Matched 147 147
#> Unmatched 476 0
bal.plot(prop_matched, var.name = 'TotalEnrollment', which = "both")
bal.plot(prop_matched, var.name = 'PercentPoverty', which = "both")
# p3<- bal.plot(prop_matched, var.name = 'PercentSWD', which = "both")
# p4<- bal.plot(prop_matched, var.name = 'PercentBlack', which = "both")
# p5<- bal.plot(prop_matched, var.name = 'ENI', which = "both")
# gl<- list(p1,p2, p3,p4, p5)
# grid.arrange(
# grobs = gl,
# top=textGrob("Propensity Score Matched"))
Nearest Neighbor Matching
By increasing the ratio of matches to four control units per treatment unit, and setting the caliper (the maximum allowed distance for a match) to 0.25, we improve the balance between groups for all covariates. All five covariates now fall below the 0.1 standardized mean difference threshold.
school_nearest <- matchit(formula = treatment ~ PercentBlack + PercentSWD + PercentPoverty +
                            TotalEnrollment + ENI,
                          data = working_data,
                          method = "nearest",
                          family = "binomial",
                          caliper = 0.25,
                          ratio = 4)
library(cobalt)
bal.tab(school_nearest, m.threshold = 0.1)
#> Call
#> matchit(formula = treatment ~ PercentBlack + PercentSWD + PercentPoverty +
#> TotalEnrollment + ENI, data = working_data, method = "nearest",
#> caliper = 0.25, ratio = 4, family = "binomial")
#>
#> Balance Measures
#> Type Diff.Adj M.Threshold
#> distance Distance 0.0595 Balanced, <0.1
#> PercentBlack Contin. 0.0058 Balanced, <0.1
#> PercentSWD Contin. 0.0148 Balanced, <0.1
#> PercentPoverty Contin. -0.0328 Balanced, <0.1
#> TotalEnrollment Contin. -0.0661 Balanced, <0.1
#> ENI Contin. -0.0445 Balanced, <0.1
#>
#> Balance tally for mean differences
#> count
#> Balanced, <0.1 6
#> Not Balanced, >0.1 0
#>
#> Variable with the greatest mean difference
#> Variable Diff.Adj M.Threshold
#> TotalEnrollment -0.0661 Balanced, <0.1
#>
#> Sample sizes
#> Control Treated
#> All 623. 147
#> Matched (ESS) 177.34 102
#> Matched (Unweighted) 262. 102
#> Unmatched 361. 45
bal.plot(school_nearest, var.name = 'TotalEnrollment', which = "both")
bal.plot(school_nearest, var.name = 'PercentPoverty', which = "both")
# p3<- bal.plot(school_nearest, var.name = 'PercentSWD', which = "both")
# p4<- bal.plot(school_nearest, var.name = 'PercentBlack', which = "both")
# p5<- bal.plot(school_nearest, var.name = 'ENI', which = "both")
# gl<- list(p1,p2)
# grid.arrange(
# grobs = gl,
# top=textGrob("Nearest Neighbor with Caliper Matched"))
Matched Regression
Compared to the simple regression above, the matched regression yields a smaller estimated treatment effect. The treated sample has been reduced to 102 units; therefore our quantity of interest is the ATT, the Average Treatment Effect on the Treated: the effect on percent attendance where the target population is the treated population. Compare this to the Average Treatment Effect (ATE), where the target population is the whole population, as in a randomized study, which the unmatched regression in the previous section approximates.
# create the matched set; only 364 schools are matched
nearest_matched <- match.data(school_nearest)
## estimating treatment effects
mod3 <- lm(AllStudents_PA ~ treatment, data = nearest_matched)
summary(mod3)
#>
#> Call:
#> lm(formula = AllStudents_PA ~ treatment, data = nearest_matched)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -8.9656 -1.5906 0.3344 1.7344 5.5344
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 92.9656 0.1474 630.767 < 2e-16 ***
#> treatment 1.2432 0.2784 4.465 1.07e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.386 on 362 degrees of freedom
#> Multiple R-squared: 0.0522, Adjusted R-squared: 0.04958
#> F-statistic: 19.94 on 1 and 362 DF, p-value: 1.071e-05
Weighting by the propensity score helps retain all of the subjects in the analysis. By assigning a higher weight to subjects with a low propensity score, we give more influence to units that are unlikely to be in the treatment group. Subjects who are unlikely to be treated are rare, and therefore valuable. On the other hand, units that are more likely to be treated carry largely redundant information, and these observations are down-weighted. In the spirit of the manipulation that underlies much of causal inference, weighting helps create a more balanced “pseudo-population” in which to proceed with a balanced analysis.
Weighting by the propensity score directly parallels the Horvitz-Thompson estimator (Horvitz and Thompson (1952)) in the survey literature. The Horvitz-Thompson estimator performs inverse probability weighting (giving each observation a weight that is the inverse of its probability of inclusion) and provides an unbiased estimator of the population total and population mean under unequal-probability sampling. Using such survey weights, the estimate of the population mean becomes
\[\hat{y}_{HT} = \frac{1}{N}\sum\limits_{i=1}^{n} w_i y_i \]
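As a quick numerical sketch of this estimator (with made-up outcomes and inclusion probabilities, and the population size estimated by the sum of the weights, i.e., the Hajek variant):

```r
# Toy illustration (hypothetical data): Horvitz-Thompson-style estimate of a
# population mean under unequal-probability sampling.
y  <- c(90, 92, 95, 97)        # observed outcomes for 4 sampled units
pi <- c(0.8, 0.8, 0.2, 0.2)    # each unit's probability of inclusion
w  <- 1 / pi                   # inverse-probability weights
N  <- sum(w)                   # estimated population size (Hajek variant)
y_ht <- sum(w * y) / N         # weighted estimate of the population mean
y_ht                           # 95
```

Note how the two rarely-sampled units (inclusion probability 0.2) pull the estimate well above the unweighted sample mean of 93.5.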
Similarly, Inverse Probability of Treatment Weighting (Rosenbaum (1987)) assigns a higher weight to units that are less likely to be treated, just as units underrepresented in the sample relative to the population are assigned a higher weight in the Horvitz-Thompson approach. With propensity score \(p(x)=P(T=1 \mid X=x)\), each subject is weighted by the inverse of the probability of the treatment it actually received:
\[ \hat{w}_{i,IPTW} = \begin{cases} \dfrac{1}{p(x_i)} & \text{for } T_i = 1\\[6pt] \dfrac{1}{1-p(x_i)} & \text{for } T_i = 0 \end{cases} \]
A second type of propensity score weighting, standardized IP weighting, addresses an issue with IPTW: individuals with propensity scores close to 0 (those extremely unlikely to be treated) end up with very large weights, and large weights result in unstable estimators. Standardized weights use the marginal probability of treatment, rather than 1, in the weight's numerator. This parallels standardized weighting in the survey literature, where such techniques address low response rates caused by incomplete survey coverage and unit nonresponse.
Referring back to our population of NYC public schools, the set of five measured covariates informs the weights. In this case, a very small number of schools (high-poverty, high-minority schools) are unlikely to be in the treatment condition. When these schools do receive the treatment, weighting gives their information as much influence as possible. Recall that IPTW addresses one of the main limitations of matching: the reduction in sample size. Using IPTW, all subjects (schools) remain in the analysis. The same is true of overlap weights.
Inverse Probability Treatment Weighting
Given the non-linear relationship of three covariates with treatment assignment, I use a generalized additive model (Hastie and Tibshirani (2017)) to estimate the propensity score. IPTW and stabilized (SW) weights are then calculated and attached to each observation. Note that the full sample (N = 770: 623 control, 147 treated) is retained.
library(gam)
mod <- gam(as.factor(treatment) ~ s(PercentBlack) + s(PercentSWD) + PercentPoverty +
             s(TotalEnrollment) + ENI,
           data = working_data, family = "binomial")
par(mfrow=c(3,2))
plot(mod, residuals = TRUE, se= TRUE, pch = ".")
working_data <- working_data %>%
  mutate(propensity_gam = predict(mod, type = "response"))
## iptw weights
working_data$treatment_identifier <- ifelse(working_data$treatment == 1, "inclusion", "non-inclusion")
working_data$iptw <- ifelse(working_data$treatment_identifier == "inclusion",
                            1 / working_data$propensity_gam,
                            1 / (1 - working_data$propensity_gam))
# stabilized weights
working_data$stable.iptw <- ifelse(working_data$treatment_identifier == "inclusion",
                                   mean(working_data$propensity_gam) / working_data$propensity_gam,
                                   mean(1 - working_data$propensity_gam) / (1 - working_data$propensity_gam))
working_data %>%
  group_by(treatment) %>%
  summarise(iptw = weighted.mean(AllStudents_PA, iptw),
            stable_iptw = weighted.mean(AllStudents_PA, stable.iptw),
            n = n())
#> # A tibble: 2 × 4
#> treatment iptw stable_iptw n
#> <dbl> <dbl> <dbl> <int>
#> 1 0 93.0 93.0 623
#> 2 1 94.1 94.1 147
mod4 <- lm(AllStudents_PA ~ treatment, weights = working_data$iptw, data = working_data)
summary(mod4)
#>
#> Call:
#> lm(formula = AllStudents_PA ~ treatment, data = working_data,
#> weights = working_data$iptw)
#>
#> Weighted Residuals:
#> Min 1Q Median 3Q Max
#> -15.898 -1.829 0.382 1.958 17.220
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 93.0280 0.1123 828.709 < 2e-16 ***
#> treatment 1.0730 0.1550 6.925 9.24e-12 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.065 on 768 degrees of freedom
#> Multiple R-squared: 0.05877, Adjusted R-squared: 0.05754
#> F-statistic: 47.95 on 1 and 768 DF, p-value: 9.235e-12
Weight Trimming
One concern regarding propensity score weighting is that observations with extremely large weights might over-influence results and yield estimates with high variance. A common recommendation in that case is to trim observations with large weights. Again, the suggestion of weight trimming comes directly from survey design methods, such as Potter (1993), where weight trimming has been shown to reduce the sampling variance estimate. The trade-offs identified in the survey literature parallel those for weight trimming in PS methods: despite the overall decrease in variance, larger sampling variance or substantial bias might result for some estimates.
Trimmed weighting generally takes two approaches. One approach, Crump et al. (2009), employs a “min-max” rule, excluding subjects whose propensity score falls outside a fixed range. Another approach is to exclude subjects who fall beyond a certain quantile of the propensity score distribution in either the treatment or control group. For instance, Stürmer et al. (2010) trim treated patients at the lower end of the propensity score distribution and untreated patients at the higher end. In either approach, weight trimming improves the accuracy and precision of the final parameter estimates, as in Lee, Lessler, and Stuart (2011).
\[ \hat{w}_{i,IPW\text{-}T} = \hat{w}_{i,IPW} \cdot \mathbf{1}\{\hat{w}_{i,IPW} < c\} \]
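A minimal sketch of the quantile-based approach, using simulated propensity scores and hypothetical 5th/95th-percentile cutoffs (the actual thresholds in Stürmer et al. differ):

```r
# Sketch of quantile-based PS trimming (hypothetical cutoffs): drop treated
# units below a low quantile of the *treated* PS distribution and control
# units above a high quantile of the *control* PS distribution.
set.seed(1)
ps  <- runif(1000)                     # stand-in propensity scores
trt <- rbinom(1000, 1, ps)             # simulated treatment assignment
lo  <- quantile(ps[trt == 1], 0.05)    # lower cutoff from treated group
hi  <- quantile(ps[trt == 0], 0.95)    # upper cutoff from control group
keep <- (trt == 1 & ps >= lo) | (trt == 0 & ps <= hi)
table(keep)                            # roughly 5% of each group trimmed
```

The trimmed analysis would then proceed on the `keep` subset, with the caveats about the resulting target population discussed below.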
Determining the optimal amount of trimming is unfortunately arbitrary, and bias is likely to result. Analysts are advised to scrutinize the procedures that generated the weights, i.e., proper specification of the PS model. Better specification of the PS model, often via machine learning methods such as Hill (2011), is one option. In both trimming examples above, the choice of threshold is essentially arbitrary; more importantly, trimming can increase bias in the estimates as well as reduce the sample size.
As a final conceptual challenge, trimming results in an ambiguous target population that is difficult for stakeholders to interpret. In the NYC schools example, would generalizing the effect of gifted/talented programs to “schools that fall within the 85th quantile of the propensity score” be interpretable to policy makers? Translating the generalizability of an effect defined on a certain subset of the PS distribution is quantitatively convenient but conceptually daunting at the policy level. The problem is especially acute under heterogeneous treatment effects: when treatment effects are constant across the distribution of the covariates, weight trimming may be acceptable, but when they vary across that distribution (for example, lower-income schools that benefit more from GT programs), weight trimming is less than ideal.
In the distribution of the weights below, the points to the far right correspond to schools extremely unlikely to adopt inclusion. We might remove these by truncating the weights at a maximum of 10.
## distribution of weights
ggplot(working_data, aes(x = iptw, fill = as.factor(treatment))) +
geom_density(alpha = 0.5, colour = "grey50") +
geom_rug() +
scale_x_log10(breaks = c(1, 5, 10, 20, 40)) +
theme(panel.background = element_rect(fill='white'),
legend.position = "none")+
ggtitle("Distribution of inverse probability weights: \n full data")
# truncate weights
trimmed_data <- working_data %>%
  filter(iptw < 10)
## distribution of truncated weights
ggplot(trimmed_data, aes(x = iptw, fill = as.factor(treatment))) +
geom_density(alpha = 0.5, colour = "grey50") +
geom_rug() +
scale_x_log10(breaks = c(1, 5, 10, 20, 40)) +
theme(panel.background = element_rect(fill='white'),
legend.position = "none")+
ggtitle("Distribution of inverse probability weights: \n trimmed data")
#
# gl<- list(p1,p2)
# grid.arrange(
# grobs = gl,
# top=textGrob("Weight Comparison"))
mod4 <- lm(AllStudents_PA ~ treatment, weights = trimmed_data$iptw, data = trimmed_data)
summary(mod4)
#>
#> Call:
#> lm(formula = AllStudents_PA ~ treatment, data = trimmed_data,
#> weights = trimmed_data$iptw)
#>
#> Weighted Residuals:
#> Min 1Q Median 3Q Max
#> -15.7828 -1.7621 0.3892 1.9763 10.0759
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 93.0213 0.1042 892.821 < 2e-16 ***
#> treatment 1.0341 0.1854 5.577 3.41e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.82 on 752 degrees of freedom
#> Multiple R-squared: 0.03972, Adjusted R-squared: 0.03844
#> F-statistic: 31.11 on 1 and 752 DF, p-value: 3.409e-08
Overlap Weighting
An advantage of overlap weighting is that the overlap weights lead to exact balance on the mean of every measured covariate when the PS is estimated by logistic regression. All weights are bounded between 0 and 1 by design, eliminating the need for weight trimming. The approach may be particularly useful in the era of big data, where inclusion criteria are defined more broadly; large data sources with many possible covariates also sharpen the need for best practices in handling extreme propensity scores.
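The exact-balance property can be checked in a small simulation (simulated covariates and a logistic-regression PS; the overlap weight is 1 − PS for treated units and PS for controls):

```r
# Sketch: with a logistic-regression PS, overlap weights (treated: 1 - ps,
# control: ps) balance the weighted mean of each model covariate exactly,
# up to the numerical tolerance of the glm fit.
set.seed(42)
n   <- 500
x1  <- rnorm(n); x2 <- rnorm(n)
trt <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))
ps  <- fitted(glm(trt ~ x1 + x2, family = binomial()))
ow  <- ifelse(trt == 1, 1 - ps, ps)            # overlap weights
m1  <- weighted.mean(x1[trt == 1], ow[trt == 1])
m0  <- weighted.mean(x1[trt == 0], ow[trt == 0])
abs(m1 - m0)                                   # effectively zero
```

The balance follows from the logistic score equations: the fitted PS satisfies sum(x * (trt - ps)) = 0 for each covariate in the model, which makes the overlap-weighted numerators (and denominators) of the two group means identical.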
A limitation, as with all other PS methods, is that researchers are at the mercy of which covariates are available and included in the model. Additionally, the ATE is now in reference to a new target population: the overlap population. The estimated treatment effect may therefore describe a sub-population that does not reflect those who receive the treatment in routine practice. In the NYC schools example, we might be estimating an effect for schools that have essentially no chance of receiving the treatment.
ps.formula <- trt ~ PercentBlack + PercentSWD + PercentPoverty + TotalEnrollment + ENI
msstat <- SumStat(ps.formula, trtgrp = "1", data = data1,
                  weight = c("IPW", "overlap", "treated", "entropy", "matching"))
plot(msstat, type="balance", metric = "PSD")
### PSweight to identify ATE for Overlap
ate.any <- PSweight(ps.formula = ps.formula,
                    yname = "Y", data = data1,
                    weight = "overlap")
ate.any
#> Original group value: 0, 1
#>
#> Point estimate:
#> 93.3388, 94.0635
summary(ate.any)
#>
#> Closed-form inference:
#>
#> Original group value: 0, 1
#>
#> Contrast:
#> 0 1
#> Contrast 1 -1 1
#>
#> Estimate Std.Error lwr upr Pr(>|z|)
#> Contrast 1 0.72475 0.17300 0.38568 1.0638 2.798e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Weighting methods improve upon naive matching, as we retain a majority of the sample. However, weighting methods are still vulnerable to model misspecification: if the causal model is improperly specified, weighting will not do much to help the final causal estimate.
The doubly robust approach requires only one of the models (either the outcome model or the treatment model) to be correctly specified. The doubly robust estimator is given by
\[ ATE = \frac{1}{N}\sum\left(\frac{T_i(Y_i-\hat\mu_1(X_i))}{\hat{P}(X_i)} +\hat\mu_1(X_i)\right) - \frac{1}{N}\sum\left(\frac{(1-T_i)(Y_i - \hat\mu_0(X_i))}{1-\hat{P}(X_i)} + \hat\mu_0(X_i)\right) \]
where \(\hat{P}\) represents the propensity score estimate, and \(\hat{\mu}_1\) and \(\hat{\mu}_0\) are outcome-model estimates of \(E[Y \mid X, T=1]\) and \(E[Y \mid X, T=0]\). If the outcome model is correctly specified, then the expectation of \(T_i(Y_i-\hat\mu_1(X_i))\) is 0: among the treated (\(T_i = 1\)), the residuals from \(\hat\mu_1\) average to 0, so with the numerator vanishing we correctly estimate the treated-potential-outcome mean with \(\hat\mu_1(X_i)\). When the outcome model is wrong but the propensity score model is correct, the equation again reduces, this time to the inverse-probability-weighted estimator. In either case, as long as one of the two models is correctly specified, the doubly robust estimator still recovers the ATE.
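The estimator can be sketched directly on simulated data (hypothetical data-generating process with a true ATE of 2; here both working models happen to be correctly specified):

```r
# Sketch of the doubly robust (AIPW) estimator on simulated data.
set.seed(7)
n    <- 5000
x    <- rnorm(n)
tr   <- rbinom(n, 1, plogis(0.6 * x))     # treatment depends on x
y    <- 2 * tr + x + rnorm(n)             # true ATE = 2
p_hat <- fitted(glm(tr ~ x, family = binomial()))   # treatment model
# outcome models fit separately in the treated and control groups
mu1 <- predict(lm(y ~ x, subset = tr == 1), newdata = data.frame(x = x))
mu0 <- predict(lm(y ~ x, subset = tr == 0), newdata = data.frame(x = x))
# plug into the doubly robust formula above
ate_dr <- mean(tr * (y - mu1) / p_hat + mu1) -
  mean((1 - tr) * (y - mu0) / (1 - p_hat) + mu0)
ate_dr   # should land close to the true ATE of 2
```

A naive difference in group means, by contrast, would be biased upward here because high-x units are both more likely to be treated and have higher outcomes.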