Measure Testing Process

Proper testing and analysis are critical to development of a feasible, reliable, and valid measure. The next sections describe types of testing that may be conducted during measure development (alpha and beta testing), the procedure for planning and testing, and key considerations when analyzing and documenting results of testing and analysis, including incorporation of interested party inputs after testing is complete.

The measure developer should conduct initial testing during development (i.e., pilot testing) within the framework of alpha and beta tests. Although considered part of measure testing, alpha testing may occur as early as information gathering and may be repeated iteratively during development of measure specifications. Measure developers should test early and often.

Alpha and Beta Testing

Alpha Testing / Formative Testing

Alpha testing (i.e., formative testing) is of limited scope since it usually occurs before full development of detailed specifications. Measure developers may conduct alpha testing, particularly regarding feasibility of the concept in the context of the data source, as part of the empirical analysis performed during information gathering. Alpha testing may also occur concurrently with development of technical specifications as part of an iterative process. Check with clinicians to ensure that collection of the data elements occurs as part of the usual care process, either manually or electronically (e.g., through wearable devices), and that the data elements capture the data needed for measure calculation.

Alpha tests include methods to determine whether individual data elements are available and whether the form in which they exist is consistent with the intent of the measure. Types of testing used in an alpha test vary widely and often depend on the measure’s data source or uniqueness of the measure specifications. Measures that use data sources similar to existing measures may require minimal alpha testing. In contrast, measures that address areas for which no specifications have been developed may require multiple iterations of alpha testing. Measure developers may want to consult with persons and families (e.g., through a focus group) to determine whether the data elements are meaningful and understandable to them.

For example, an alpha test may include a query to a large integrated delivery system database to determine how it captures specific data, where the query originates, and how to express the query. Results can impact decisions about measure specifications.
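One way to picture this kind of availability check is a small script that reports how often each required data element is populated in an extract. The sketch below is illustrative only: the function name, the field names (`lab_result_value`, `lab_result_date`), and the sample records are hypothetical, and a real alpha test would run against the actual data source and its own data dictionary.

```python
from typing import Any


def element_availability(records: list[dict[str, Any]], required: list[str]) -> dict[str, float]:
    """Share of records (0.0 to 1.0) in which each required data element is populated."""
    if not records:
        # No records were returned, so nothing can be shown to be available.
        return {name: 0.0 for name in required}
    counts = {name: 0 for name in required}
    for record in records:
        for name in required:
            value = record.get(name)
            # Treat missing keys, None, and empty strings as "not captured".
            if value is not None and value != "":
                counts[name] += 1
    return {name: counts[name] / len(records) for name in required}


# Hypothetical extract: field names and values are illustrative assumptions.
sample_records = [
    {"patient_id": "A1", "lab_result_value": 7.2, "lab_result_date": "2023-03-01"},
    {"patient_id": "A2", "lab_result_value": None, "lab_result_date": "2023-04-11"},
    {"patient_id": "A3", "lab_result_value": 6.8, "lab_result_date": ""},
    {"patient_id": "A4", "lab_result_value": 8.1},
]
availability = element_availability(sample_records, ["lab_result_value", "lab_result_date"])
```

A low share for an element would suggest that it is not captured routinely, which could inform whether to revise the specification or look for another data source.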

Beta Testing

Beta testing (i.e., field testing) generally occurs after development of initial technical specifications and is usually larger in scope than alpha testing. In addition to gathering further information about feasibility, beta tests serve as the main means to assess scientific acceptability and usability of a measure. Measure developers can use beta tests to evaluate the measure’s suitability for risk adjustment or stratification and to expand previous importance and feasibility evaluations. When carefully planned and executed, beta testing helps document measure properties with respect to the evaluation criteria.

Features of Alpha and Beta Testing

The measure developer should consider the features of alpha and beta testing when planning their approach.

Feature	Alpha Testing	Beta Testing
Timing	Usually conducted prior to completion of technical specifications. May conduct multiple times in quick succession.	Conducted after development of measure developer’s detailed and precise technical specifications.
Scale	Typically, smaller scale. Only enough records to ensure the data set contains all elements needed for the measure. Only enough records to identify common occurrences or variation in the data.	Samples strive to achieve representative and adequate sizes. Requires appropriate sample selection protocols. May require evaluation of multiple sites in a variety of settings depending on the data source (e.g., administrative, medical record).
Sampling	Convenience sampling.	Sufficient to allow adequate testing of the measure’s scientific acceptability. Representative of the target/initial population. Representative of the people, places, times, events, and conditions important to the measure. If based on administrative or claims data, uses entire eligible population. Randomized, if possible.
Specification Refinement	Permits early detection of problems in technical specifications (e.g., identification of additional inclusion and exclusion criteria).	Used to assess or revise complexity of computations required to calculate the measure.
Importance	Designed to look at volume, frequency, or costs related to a measure topic (e.g., cost of treating the condition, costs related to procedures measured). Establishes, on a preliminary basis, the measure can identify gaps in care. Provides support for further development of the measure.	Allows for enhanced evaluation of a measure’s importance, including evaluation of performance thresholds, disparities analysis, and outcome variation. Evaluates opportunities for improvement in the population, which aids in evaluation of the measure’s importance (e.g., obtaining evidence of substantial variability among comparison groups, obtaining evidence the measure is not topped-out, where most groups achieve similarly high performance levels approaching the measure’s maximum possible value).
Scientific Acceptability	Limited in scope if conducted during the formative stage. Usually occurs later in development.	Assesses measure reliability and validity. Reports results of analysis of denominator exclusions and/or numerator exclusions (if any used). Tests results of the risk adjustment model, quantifying relationships between and among factors.
Feasibility	Provides initial information about feasibility of collecting required data and calculating measures using technical specifications. Identifies barriers to implementation. Offers initial estimate of costs or burden of data collection and analysis.	Provides enhanced information regarding feasibility, including greater determination of barriers and provider burden to implementation and costs associated with measurement. Evaluates feasibility of stratification factors based on occurrences of target events in the sample.
Usability and Use	No formal analytic testing at this stage; may use qualitative testing with patients and measured entities. May use the technical expert panel (TEP) to assess potential usability of the measure.	Identifies unintended consequences, including susceptibility to inaccuracies and errors. Reports strategies to ameliorate unintended consequences. May consist of focus groups or similar means of assessing usefulness of the measure by individuals. May not be in the scope of measure development contract. Can use the TEP to assess potential usability.
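The topped-out idea in the Importance row (most groups achieving similarly high performance near the measure’s maximum possible value) can be explored with a simple descriptive summary. The following is a rough sketch, not an established criterion: the function name, the 95 percent "near maximum" cutoff, and the "more than half of entities" rule are all assumptions chosen for illustration, and a real analysis would use thresholds agreed on in the testing work plan.

```python
from statistics import median


def topped_out_summary(scores, max_value=1.0, near_max_fraction=0.95, majority=0.5):
    """Describe the spread of entity-level scores and flag a possibly topped-out measure.

    near_max_fraction and majority are illustrative assumptions, not official thresholds.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    cutoff = near_max_fraction * max_value
    # Share of measured entities whose score is at or above the cutoff.
    share_near_max = sum(score >= cutoff for score in scores) / len(scores)
    return {
        "min": min(scores),
        "median": median(scores),
        "max": max(scores),
        "share_near_max": share_near_max,
        "possibly_topped_out": share_near_max > majority,
    }


# Hypothetical entity-level performance rates (proportion scale, 0 to 1).
clustered = topped_out_summary([0.96, 0.97, 0.99, 0.98, 0.80])
varied = topped_out_summary([0.20, 0.50, 0.70, 0.96])
```

With these made-up inputs, the first set is flagged because four of five entities score near the maximum, while the second set shows the kind of variation that supports room for improvement.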

Recommended Reports During Measure Testing

A measure developer should develop specific reports when testing a measure (or set of measures). Although completion of reports usually occurs after beta testing, measure developers should consider the need to report the results of formative alpha testing, especially if the intent is for alpha testing to precede beta testing. The first few steps of measure testing address planning and execution of testing and are identical for alpha and beta testing; the last steps address reporting and follow-up after the conclusion of testing.

During measure testing, the measure developer:

Develops the testing work plan
Performs sampling
Implements the plan
Analyzes test results
Refines the measure, including incorporation of interested party input
Retests the refined measure
Updates the measure documentation
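The sampling step in the list above can be sketched as a reproducible simple random draw from the eligible records. This is a minimal illustration under stated assumptions: the function name, record identifiers, and seed are hypothetical, and an actual beta test would follow the sample selection protocol in its testing work plan (and, for administrative or claims data, may use the entire eligible population instead of a sample).

```python
import random


def draw_random_sample(record_ids, sample_size, seed=0):
    """Draw a simple random sample without replacement; a fixed seed makes it reproducible."""
    ids = list(record_ids)
    if sample_size > len(ids):
        raise ValueError("sample_size exceeds the number of eligible records")
    rng = random.Random(seed)  # Local generator, so global random state is untouched.
    return rng.sample(ids, sample_size)


# Hypothetical eligible population of 20 record identifiers.
eligible = [f"REC-{n:03d}" for n in range(1, 21)]
sample = draw_random_sample(eligible, sample_size=5, seed=2023)
```

Recording the seed alongside the sample lets another analyst regenerate the same draw when the refined measure is retested.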

Last Updated: Oct 2023