Measure Testing Summary

When reporting measure testing results, the measure developer’s assessment of each of the four measure evaluation criteria is a matter of degree. For example, not all revisions will require extensive reassessment against every testing criterion, and not all previously endorsed measures will be strong, or equally strong, across all criteria. Assessment often depends on judgment and expertise. Because assessment is difficult, measure developers are expected to contract with or employ clinical experts, as well as experienced statisticians and methodologists, to provide expert judgment when reporting measure reliability and validity. The measure testing summary should reflect expert findings or consensus on the measure's importance, scientific acceptability, feasibility, and usability and use.

Potential Components of a Measure Testing Summary

Name of measure or measure set
Executive summary of tests and resulting recommendations
Type of testing conducted (i.e., alpha or beta) and an overview of the testing scope
Description of any deviations from the work plan, with the rationale for each deviation
Data collection and management method(s)
Description of the test population(s) and test sites, if applicable
- Description of test data elements, including type and source
- Data source description (and export/translation processes, if applicable)
- Sampling methodology, if applicable
- Description of denominator exclusions and/or numerator exclusions, if applicable
- Patient medical record review process, if applicable, including abstractor/reviewer qualifications and training, and process for adjudication of discrepancies between abstractors/reviewers
Detailed description of measure specifications and measure score calculations
Description of the analysis conducted, including
- Summary statistics (e.g., means, medians, denominators, numerators, descriptive statistics for denominator/numerator exclusions)
- Measure evaluation criteria
  - Importance: specific analyses demonstrating importance, such as potentially avoidable variation across accountable entities or subpopulations
  - Scientific acceptability
    - Reliability: description of the reliability statistics, assessment of their adequacy relative to the norms for those tests, and the rationale for the analytic approach
    - Validity: specific analyses and findings related to any changes observed relative to the analyses reported during the prior assessment/endorsement process, or to changes resulting from revisions to the measure; these may include an assessment of adequacy relative to the norms for the tests conducted, panel consensus findings, and the rationale for the analytic approach
  - Feasibility: discussion of feasibility challenges and of adjustments made to facilitate obtaining measure results, and a description of the estimated cost or burden of data collection
  - Usability and use: if the measure is materially changed, a summary of findings related to measure interpretability and of the methods used for a qualitative and quantitative usability assessment (e.g., technical expert panel (TEP) review of measure results)
- Denominator and/or numerator exclusions and denominator exceptions: discussion of the rationale, which may include citations justifying the denominator and/or numerator exclusions; documentation of TEP qualitative or quantitative data review; changes from prior assessment findings, such as summary statistics and analyses (which may include changes in frequency and variability statistics); and sensitivity analyses
- Analysis of the need for risk adjustment and stratification, as described in the Risk Adjustment and Risk Stratification content
Any recommended changes to the measure specifications and an assessment of whether further testing is needed
Detailed comparison of the testing results with the CMS consensus-based entity (CBE) requirements, including whether the results sufficiently met those requirements or whether additional testing is needed
Any limitations of the alpha or beta testing, such as
- Sample limited to a single electronic health record (EHR) system
- Sample used registry data from only one state, and registry data are known to vary across states
- Testing was a formative alpha test only and was not intended to address validity and reliability
Recommend approval of a candidate measure for further development
Recommend approval of a fully tested and refined measure for implementation
Plan for comprehensive reevaluation
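As a hedged illustration of the summary statistics item in the analysis description (numerators, denominators, exclusion counts, means, and medians), the sketch below assumes a simple proportion measure in which denominator exclusions are removed from the eligible population before the rate is computed. The `EntityCounts` type, the `performance_rate` and `summarize` functions, and the sample counts are hypothetical and invented for illustration. They are not CMS specifications or real testing data; an actual measure's calculation logic comes from its own specifications.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class EntityCounts:
    """Hypothetical per-entity counts for a simple proportion measure."""
    entity_id: str
    denominator: int             # initial denominator, before exclusions
    denominator_exclusions: int  # cases removed from the denominator
    numerator: int               # numerator cases among the non-excluded denominator


def performance_rate(c: EntityCounts) -> float:
    """Rate = numerator / (denominator - denominator exclusions).

    Assumption: a basic proportion measure; real specifications may also
    handle numerator exclusions or denominator exceptions differently.
    """
    eligible = c.denominator - c.denominator_exclusions
    if eligible <= 0:
        raise ValueError(f"{c.entity_id}: no eligible cases after exclusions")
    return c.numerator / eligible


def summarize(entities: list[EntityCounts]) -> dict[str, float]:
    """Pooled totals plus descriptive statistics of per-entity rates."""
    # performance_rate raises first if any entity has no eligible cases,
    # so the exclusion-share division below never divides by zero.
    rates = [performance_rate(e) for e in entities]
    exclusion_shares = [e.denominator_exclusions / e.denominator for e in entities]
    return {
        "entities": len(entities),
        "total_numerator": sum(e.numerator for e in entities),
        "total_denominator": sum(e.denominator for e in entities),
        "total_denominator_exclusions": sum(e.denominator_exclusions for e in entities),
        "mean_rate": mean(rates),
        "median_rate": median(rates),
        "min_rate": min(rates),
        "max_rate": max(rates),
        "mean_exclusion_share": mean(exclusion_shares),
    }


if __name__ == "__main__":
    # Made-up counts for three hypothetical accountable entities.
    sample = [
        EntityCounts("A", denominator=100, denominator_exclusions=10, numerator=45),
        EntityCounts("B", denominator=50, denominator_exclusions=0, numerator=40),
        EntityCounts("C", denominator=80, denominator_exclusions=20, numerator=30),
    ]
    for key, value in summarize(sample).items():
        print(f"{key}: {value}")
```

Reporting per-entity rates with their spread (minimum, median, maximum) alongside pooled totals supports analyses such as variation across accountable entities, while the exclusion shares give a starting point for the frequency statistics on exclusions.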

Measure Testing Summary Downloadable File

Last Updated: Jul 2025