Measure Testing

Perform Sampling

The need for sampling often varies depending on the type of test (i.e., alpha or beta) and type of measure. When determining the appropriate sample size during testing, the measure developer must evaluate the burden placed on measured entities and/or beneficiaries to collect the information.  

Measure developers may test measures that rely on administrative or claims data by examining data from the entire eligible population, with limited drain on external resources, depending on the nature of the analysis.

However, to test some measures, it is necessary to collect information from measured entities or beneficiaries directly, which can become burdensome to measure developers, measured entities, and beneficiaries.

Reduce the Burden of Data Collection 

Outcome-dependent sampling and covariate-dependent sampling are two approaches that reduce the burden of data collection while maintaining the ability to conduct meaningful testing (Ding, Lu, Cai, & Zhou, 2017). For developing a risk model, outcome-dependent sampling may be more efficient than simple random sampling while remaining statistically equivalent to it.

Assume a measure developer wants 30 cases for each covariate to estimate the coefficients. For a relatively infrequent event (e.g., one occurring in fewer than 10% of cases), it would be more cost-effective to sample records in which the event occurred (Y=1) with a higher probability than records in which it did not (Y=0).
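The arithmetic behind this example can be sketched as follows. This is a minimal illustration, not a prescribed method. It assumes that "30 cases for each covariate" means 30 events (Y=1) per covariate, that the outcome is already known for every record before the burdensome data collection begins (e.g., from claims), and that the developer collects one non-event record per event record. The function name, the population size, and the 5% event rate are hypothetical.

```python
import math


def sampling_plan(population, event_rate, n_covariates,
                  events_per_covariate=30, nonevents_per_event=1):
    """Compare record counts under simple random vs. outcome-dependent sampling.

    All inputs are illustrative assumptions, not values from the guidance.
    """
    events_needed = events_per_covariate * n_covariates
    nonevents_needed = nonevents_per_event * events_needed

    # Simple random sampling: events appear at the population rate, so the
    # total sample must be large enough to contain the required events.
    # round() guards against floating-point noise before taking the ceiling.
    srs_records = math.ceil(round(events_needed / event_rate, 6))

    # Outcome-dependent sampling: draw events and non-events separately.
    pop_events = population * event_rate
    pop_nonevents = population - pop_events
    p_event = min(1.0, events_needed / pop_events)
    p_nonevent = min(1.0, nonevents_needed / pop_nonevents)

    return {
        "events_needed": events_needed,
        "srs_records": srs_records,
        "ods_records": events_needed + nonevents_needed,
        "p_event": p_event,        # sampling probability for Y=1
        "p_nonevent": p_nonevent,  # sampling probability for Y=0
    }


plan = sampling_plan(population=100_000, event_rate=0.05, n_covariates=10)
for key, value in plan.items():
    print(f"{key}: {value}")
```

Under these assumptions, the outcome-dependent design requires far fewer records than simple random sampling to reach the same number of events, and the sampling probability for Y=1 is higher than for Y=0. Because the two groups are sampled at different rates, the analysis would need to account for the unequal sampling probabilities (for example, by weighting each record by the inverse of its sampling probability).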

Determining Sample Strategy

As previously noted, alpha testing frequently uses a convenience sample; however, beta testing may involve measurement of a target/initial population, which requires careful construction of samples to support adequate testing of the measure’s scientific acceptability. The analytic unit of the specific measure (e.g., physician, hospital, home health agency) determines the sampling strategy. In general, samples used for reliability and validity testing should:

  • Represent the full variety of measured entities (e.g., large and small hospitals). This is especially critical if the measured entities volunteer to participate, which limits generalizability to the full population.
  • Include adequate numbers of observations to support reliability and validity analyses using the planned statistical methods. When possible, observations should be randomly selected.
  • Be of high quality. Measure developers must ensure data used for risk adjustment are of high quality.
  • Reflect multiple reporting entities (e.g., clinicians, clinician groups, or hospitals), so that measure developers can test measure calculation against an appropriate data set and evaluate the impact on measure calculations when there may be an attribution-related concern for measured entities using shared electronic health records.
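One way to combine the first two criteria (representing the full variety of measured entities and selecting observations randomly) is a stratified random sample. The sketch below is illustrative only: the `stratified_sample` function, the hospital records, and the 100-bed cutoff between "small" and "large" are hypothetical and do not come from the guidance.

```python
import random
from collections import defaultdict


def stratified_sample(entities, stratum_of, per_stratum, seed=0):
    """Randomly select up to `per_stratum` entities from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for entity in entities:
        strata[stratum_of(entity)].append(entity)

    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = min(per_stratum, len(members))  # a stratum may be smaller than requested
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical measured entities: hospitals with a bed count.
bed_counts = [25, 40, 60, 80, 150, 220, 300, 450, 600, 900]
hospitals = [{"id": i, "beds": beds} for i, beds in enumerate(bed_counts)]


def size_class(hospital):
    # Assumed cutoff for illustration only.
    return "small" if hospital["beds"] < 100 else "large"


chosen = stratified_sample(hospitals, size_class, per_stratum=3)
for hospital in chosen:
    print(hospital["id"], hospital["beds"], size_class(hospital))
```

Drawing the same number of entities from each stratum keeps small hospitals from being crowded out by large ones. A developer might instead allocate the sample in proportion to stratum size, depending on the planned analysis.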
Last Updated: Dec 2022