The University of Texas Medical Branch (UTMB) Center for Health and Clinical Outcomes Research (H-COR) recently hosted a Works-in-Progress session on Merative MarketScan, a large, longitudinal U.S. claims dataset available to UTMB researchers through the Office of Biostatistics. The goal was to help researchers understand what the data can support, where its limitations lie, and which research questions are a good fit for claims-based analysis.
Efstathia Polychronopoulou, PhD, MPH, a biostatistician in the Office of Biostatistics and UTMB alum who earned a PhD in Rehabilitation Sciences, led the presentation. She walked attendees through the structure of MarketScan, shared examples of published studies using the resource, and outlined how access typically works at UTMB.

What MarketScan captures, and what it doesn't
MarketScan is built from insurance claims, so it captures information needed for billing. That includes diagnoses, procedures, outpatient prescriptions, encounter type, and costs for both patient and payer. UTMB's current holdings include two core databases: commercial claims (primarily working-age adults and dependents) and Medicare supplemental (retirees with employer-sponsored supplemental coverage). The data cover 2012–2023 and represent more than 140 million unique enrollees across those years.
The session also walked through common constraints that shape study design. Claims data typically don't include detailed clinical context like symptoms, vital signs, lab values, or inpatient-administered medications. Geography is limited to broader units such as state rather than county or ZIP code.
The group discussed the planned transition to the Atlas MarketScan package, which is expected to expand the years covered and add linked datasets such as lab results, dental claims, mortality, and health risk assessment surveys. Atlas will use a rolling-year approach, dropping the oldest year as new data are added.

Feasibility questions, enrollment gaps, and design decisions from the Q&A
The conversation after the presentation zeroed in on issues that come up quickly when planning claims-based research.
Continuous enrollment came up repeatedly, especially how gaps appear when someone changes jobs or insurers. The group talked through how to handle follow-up when enrollment ends, when to require continuous enrollment before an "index event" to capture baseline conditions, and how analytic approaches like censoring work when follow-up time varies across participants.
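For readers who want to see how these enrollment rules translate into practice, here is a minimal Python sketch. It is an illustration only, not the session's or MarketScan's method: the function names, the representation of enrollment as (start, end) date pairs, the 365-day baseline requirement, and the zero-day gap tolerance are all assumptions chosen for the example. It merges one enrollee's coverage spans, checks for continuous enrollment before the index event, and returns the date at which follow-up would be censored.

```python
from datetime import date, timedelta

def merge_spans(spans, max_gap_days=0):
    """Merge enrollment spans separated by at most max_gap_days uncovered days."""
    merged = []
    for start, end in sorted(spans):
        # Uncovered days between the previous span's end and this span's start.
        if merged and (start - merged[-1][1]).days - 1 <= max_gap_days:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def enrollment_window(spans, index_date, baseline_days=365, max_gap_days=0):
    """Return (eligible, censor_date) for one enrollee.

    eligible: True if the enrollee was continuously enrolled for at least
    baseline_days before index_date (hypothetical study rule).
    censor_date: last day of the continuous span containing index_date,
    or None if the enrollee was not enrolled on index_date.
    """
    for start, end in merge_spans(spans, max_gap_days):
        if start <= index_date <= end:
            eligible = start <= index_date - timedelta(days=baseline_days)
            return eligible, end
    return False, None

# Hypothetical enrollee: a job change with no lapse in coverage in mid-2016,
# then a gap in coverage for most of 2017.
spans = [
    (date(2015, 1, 1), date(2016, 6, 30)),
    (date(2016, 7, 1), date(2017, 3, 31)),
    (date(2018, 1, 1), date(2018, 12, 31)),
]
index_date = date(2016, 9, 15)
eligible, censor_date = enrollment_window(spans, index_date)
print(eligible, censor_date)
```

In this sketch the first two spans are treated as one continuous period, so the enrollee meets the baseline requirement and is followed until coverage ends in March 2017. A real study would also need to decide whether short gaps should be tolerated and how to treat enrollees who reappear later under new coverage.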
Attendees raised questions about feasibility for slow-developing conditions, where enrollment gaps can make it harder to pin down the timing of onset and diagnosis. Others asked which outcomes MarketScan supports well (utilization, costs, treatment patterns, and coded clinical events) and where it falls short. Patient-reported outcomes, for instance, aren't available unless a study uses the survey-linked components, which exist only for a subset of enrollees.
H-COR and the Office of Biostatistics encouraged UTMB investigators with potential project ideas to reach out early to discuss feasibility, analytic support options, and the access pathway.