Education & Training: Analyzing Data

Design: Selection Bias and Confounding Overview

In observational studies, participants are not randomly assigned to intervention groups, so individuals who receive a given treatment may differ markedly from those who do not. Covariates that are independently associated with both the treatment and the outcome are called confounders. Illness severity, for example, would be a confounder if it influences whether a patient receives a given treatment and is also associated with the outcome of interest. Important covariates may not be available in existing datasets. Ignoring group differences in important covariates, whether available or not, can lead to biased estimates of treatment effects. It is important to remember that random error (chance) leads to imprecise results, whereas systematic error (bias) leads to inaccurate results.

Common approaches to control for group differences include stratified analyses, matching, and multivariable modeling using observed covariates. These strategies are limited in the number of covariates that can be included, and none of them addresses unobserved covariates. Alternative techniques for dealing with confounding include sensitivity analysis, propensity score analysis, and instrumental variable analysis.
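To illustrate why adjustment matters, the following minimal Python sketch simulates a hypothetical cohort in which illness severity affects both treatment and outcome, then compares a naive difference in means with a stratified estimate. The function names, treatment rates, and effect sizes are illustrative assumptions, not values from any study.

```python
import random

def simulate(n=20000, true_effect=1.0, seed=0):
    """Simulate patients where severity (0/1) drives both treatment and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = 1 if rng.random() < 0.5 else 0
        # Assumed rates: severe patients are treated far more often.
        treated = 1 if rng.random() < (0.8 if severe else 0.2) else 0
        # Severity lowers the outcome; treatment raises it by true_effect.
        outcome = true_effect * treated - 2.0 * severe + rng.gauss(0, 1)
        rows.append((severe, treated, outcome))
    return rows

def mean(values):
    return sum(values) / len(values)

def naive_difference(rows):
    """Treated mean minus untreated mean, ignoring severity."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return mean(treated) - mean(control)

def stratified_difference(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    estimate = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        estimate += naive_difference(stratum) * len(stratum) / len(rows)
    return estimate

rows = simulate()
naive = naive_difference(rows)          # distorted by confounding
adjusted = stratified_difference(rows)  # should be close to true_effect
print(f"naive={naive:.2f} adjusted={adjusted:.2f}")
```

In this simulated setup the naive comparison understates the benefit of treatment because treated patients are sicker on average, while the stratified estimate should land near the simulated effect of 1.0. Stratification only helps here because severity is observed; it would do nothing for a confounder that was never measured.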

Sensitivity analysis identifies what the strength and prevalence of an unmeasured confounder would have to be to alter the conclusion of the study. In other words, sensitivity analysis does not rule out the possibility that confounding exists; it describes the circumstances necessary for an unmeasured confounder to negate the observed effect of the treatment (or exposure) on the outcome.
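As one concrete example of this idea, the sketch below computes the E-value proposed by VanderWeele and Ding (2017). This measure is not named in the text above and is used here only as an illustration. The E-value is the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both treatment and outcome to fully explain away an observed risk ratio. The observed risk ratio of 2.0 is a made-up input.

```python
import math

def e_value(risk_ratio):
    """E-value for an observed risk ratio (VanderWeele and Ding, 2017)."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective effects (RR < 1) are inverted so the formula applies.
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))

def bounding_factor(rr_treatment_confounder, rr_confounder_outcome):
    """Maximum bias (as a risk ratio) a confounder of given strength can produce."""
    a, b = rr_treatment_confounder, rr_confounder_outcome
    return a * b / (a + b - 1.0)

observed_rr = 2.0  # hypothetical observed risk ratio
ev = e_value(observed_rr)
print(f"E-value for RR={observed_rr}: {ev:.2f}")
```

A confounder associated with both treatment and outcome at exactly the E-value produces a bounding factor equal to the observed risk ratio, which is the sense in which it could negate the observed effect. Weaker confounding could not do so on its own.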

Propensity score analysis uses the observed covariates to estimate the conditional probability that a person belongs to the treatment group. The resulting propensity scores can then be used in several ways (for example, matching, stratification, weighting, or covariate adjustment) to balance the observed covariates across groups and thus reduce confounding by those covariates. Like the approaches above, it does not address unobserved covariates.
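The sketch below shows one of these options, inverse probability weighting, on simulated data. It assumes a single observed covariate, fits the propensity model by logistic regression using a hand-written Newton-Raphson loop (so that no external library is needed), and uses normalized weights. All names and parameter values are hypothetical; a real analysis would typically use a statistics package and check covariate balance.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate(n=20000, true_effect=1.0, seed=1):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)                        # observed covariate (e.g. severity)
        a = 1 if rng.random() < sigmoid(x) else 0  # higher x -> more likely treated
        y = true_effect * a - 2.0 * x + rng.gauss(0, 1)
        data.append((x, a, y))
    return data

def fit_propensity(data, iterations=10):
    """Logistic regression of treatment on x via Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, a, _ in data:
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += a - p           # gradient of the log-likelihood
            g1 += (a - p) * x
            h00 += w              # negative Hessian entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def ipw_effect(data, b0, b1):
    """Difference in weighted means using normalized inverse probability weights."""
    num1 = den1 = num0 = den0 = 0.0
    for x, a, y in data:
        ps = sigmoid(b0 + b1 * x)  # estimated propensity score
        if a == 1:
            num1 += y / ps
            den1 += 1.0 / ps
        else:
            num0 += y / (1.0 - ps)
            den0 += 1.0 / (1.0 - ps)
    return num1 / den1 - num0 / den0

data = simulate()
treated = [y for _, a, y in data if a == 1]
control = [y for _, a, y in data if a == 0]
naive_ps = sum(treated) / len(treated) - sum(control) / len(control)
b0, b1 = fit_propensity(data)
ipw = ipw_effect(data, b0, b1)
print(f"naive={naive_ps:.2f} ipw={ipw:.2f}")
```

Weighting each person by the inverse of the probability of the treatment they actually received creates a pseudo-population in which the observed covariate is balanced across groups, so the weighted comparison should fall near the simulated effect of 1.0.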

Instrumental variable (IV) analysis involves identifying a variable (the instrument) that is associated with treatment but affects the outcome only through treatment and is not itself related to unmeasured confounders. Because all unmeasured factors are absorbed into the error term of the outcome model, selection bias is likely present when that error term is correlated with the treatment variable. IV analysis involves 1) modeling treatment as a function of the covariates and the instrument, and 2) using the predicted treatment from that model in the outcome model to 'break the link' between treatment and the unobserved confounder(s). The unique feature of IV analysis is that, when the instrument is valid, it can reduce confounding from both observed and unobserved factors.
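The two steps can be sketched as two-stage least squares on simulated data. This minimal example assumes a continuous treatment, a single binary instrument, no additional covariates, and an unobserved confounder `u` that the analysis never sees. The names and coefficients are illustrative assumptions only.

```python
import random

def mean(values):
    return sum(values) / len(values)

def ols(xs, ys):
    """Return (intercept, slope) from a least-squares fit of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate(n=20000, true_effect=1.0, seed=2):
    rng = random.Random(seed)
    z, a, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                   # unobserved confounder
        zi = 1 if rng.random() < 0.5 else 0   # instrument, independent of u
        ai = 1.5 * zi + u + rng.gauss(0, 1)   # treatment depends on z and u
        yi = true_effect * ai + 2.0 * u + rng.gauss(0, 1)  # z affects y only via a
        z.append(zi)
        a.append(ai)
        y.append(yi)
    return z, a, y

z, a, y = simulate()

# Naive regression of outcome on treatment is confounded by u.
_, naive_slope = ols(a, y)

# Stage 1: model treatment as a function of the instrument.
intercept1, slope1 = ols(z, a)
a_hat = [intercept1 + slope1 * zi for zi in z]

# Stage 2: regress the outcome on the predicted treatment.
_, iv_slope = ols(a_hat, y)
print(f"naive={naive_slope:.2f} iv={iv_slope:.2f}")
```

Because the predicted treatment varies only with the instrument, it carries none of the variation caused by `u`, so the second-stage slope should fall near the simulated effect of 1.0 while the naive slope is inflated. The standard errors from a hand-run second stage are not valid, so dedicated IV routines are preferable in practice.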


MA Brookhart et al. (2010) Instrumental variable methods in comparative safety and effectiveness research. Pharmacoepidemiol Drug Saf. 19: 537-554

RB D'Agostino (1998) Tutorial in biostatistics propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statist Med. 17: 2265-2281.

JP Leigh & M Schembri (2004) Instrumental variables technique: cigarette price provided better estimate of effects of smoking on SF-12. J Clin Epidemiol. 57(3): 284-293.

EP Martens et al. (2006) Instrumental variables application and limitations. Epidemiol. 17: 260-267.

PR Rosenbaum (2005) Sensitivity analysis in observational studies. Encyclopedia of Statistics in Behavioral Science. Vol 4: 1809-1814.

MG Stineman et al. (2008) The effectiveness of inpatient rehabilitation in the acute postoperative phase of care after transtibial or transfemoral amputation: study of an integrated health care delivery system. Arch Phys Med Rehabil. 89: 1863-1872.

TA Stukel et al. (2007) Analysis of observational studies in the presence of treatment selection bias: Effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA. 297(3): 278-285.


The videos below cover analytic procedures for dealing with confounding and were recorded during the Comparative Effectiveness Research with Population-Based Data conference at the Baker Institute at Rice University on July 13, 2012.

The CLDR is funded by the National Institutes of Health (grant# P2CHD065702). See About Us for more details.
