
Access Measure

An access measure is a measure focusing on a patient’s or enrollee’s attainment of timely and appropriate health care.

Adopted Measure

An adopted measure is a measure with the same numerator, denominator, and data source as an existing measure, but added to another quality program. The only additional information the measure developer needs to provide is particular to the measure’s implementation use (such as data submission instructions). An example of an adopted measure would be an ambulatory program adopting the core hypertension measure, Controlling High Blood Pressure (CMIT Measure ID 167) (CMS CBE 0018).

Alignment

Alignment, with respect to quality measures, is encouraging the use of similar, standardized quality measures across and within public and private sector efforts. Alignment is achieved when a set of measures works well across care settings or programs to produce meaningful information without creating extra work for measured entities. Alignment includes using the same quality measures in multiple programs when possible. It can also come from consistently measuring important topics across care settings.

Analysis of Variance (ANOVA)

An ANOVA is a statistical test used to analyze the difference between the means of three or more groups. ANOVA can be one-way (one independent variable) or two-way (two independent variables). Bevans, R. (2023, June 22). One-way ANOVA. When and how to use it (with examples). Scribbr. Retrieved November 27, 2023, from…  
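As a minimal sketch (illustrative data, assuming SciPy is available), a one-way ANOVA comparing three group means:

```python
# One-way ANOVA sketch: compare mean wait times (minutes) across three
# hypothetical clinics. All data below are illustrative.
from scipy.stats import f_oneway

clinic_a = [12, 15, 14, 10, 13]
clinic_b = [22, 25, 20, 24, 21]
clinic_c = [13, 14, 12, 15, 16]

f_stat, p_value = f_oneway(clinic_a, clinic_b, clinic_c)
# A small p-value suggests at least one clinic's mean differs from the others.
```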

Appropriate Use Criteria

The appropriate use criteria are evidence-based standards to assist professionals who order and furnish applicable services to make the most appropriate treatment decisions for a specific clinical condition. See the CMS Appropriate Use Criteria Program.

Attribution

Attribution is the action of linking the treatments, processes, or outcomes of health care to one or more measured entities.

Audit

An audit is a systematic inspection of records or accounts to verify their accuracy.

Bootstrap Analysis

Bootstrap analysis (bootstrapping), as used in risk adjustment models, generally refers to estimating properties of a model estimate or the stability of an estimate by sampling from an approximating distribution. The measure developer may accomplish this by constructing many resamples of equal size from the observed dataset (e.g., the development sample), when the resamples are smaller than the observed dataset. This technique allows estimation of the sample distribution of a statistic. Measure developers can also use it to construct hypothesis tests. In the case of a regression or logistic regression risk adjustment model, the measure developer can use it to provide additional guidance regarding the inclusion of risk factors in the model.
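The resampling idea can be sketched as follows (illustrative data, assuming NumPy is available): draw many same-size resamples with replacement and examine the distribution of the statistic across them.

```python
# Minimal bootstrap sketch (illustrative data): approximate the sampling
# distribution of the mean by resampling the observed data with replacement.
import numpy as np

rng = np.random.default_rng(42)
observed = np.array([3.1, 2.9, 3.4, 3.0, 2.8, 3.3, 3.2, 2.7, 3.5, 3.0])

boot_means = np.array([
    rng.choice(observed, size=observed.size, replace=True).mean()
    for _ in range(2000)
])

# 95% percentile interval for the mean
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```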

Business Case

A business case is a justification for a proposed project or undertaking on the basis of its expected commercial benefit. It exists if the entity realizes a financial return on its investment in a reasonable time frame. The entity may realize this return as profit, reduction in losses, or avoided costs. A business case may also exist if the investor believes a positive indirect effect on organizational function and sustainability will accrue within a reasonable time frame (Leatherman et al., 2003). The business case for a process measure relies on the financial return on the investment necessary to implement the intervention advocated by the measure. The business case for other types of measures relies on the financial return resulting from improving the quality of care indicated by the measure.

C-Statistic

Measure developers use the c-statistic to assess risk-adjusted models; it indicates the ability of the model to discriminate between one event and the other. If a model discriminates randomly, c = 0.5. If the risk factor modeling predicts the outcome well, then discrimination increases. The higher the c-statistic, the better the predictive power of the model.
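The pairwise interpretation can be sketched in pure Python (illustrative data): the c-statistic is the share of (event, non-event) pairs in which the model assigns the event the higher predicted risk, counting ties as one half.

```python
# Illustrative c-statistic: the share of (event, non-event) pairs in which
# the model assigns the event the higher predicted risk (ties count 0.5).
def c_statistic(risks, outcomes):
    pairs = concordant = 0.0
    for r1, y1 in zip(risks, outcomes):
        for r2, y2 in zip(risks, outcomes):
            if y1 == 1 and y2 == 0:
                pairs += 1
                if r1 > r2:
                    concordant += 1
                elif r1 == r2:
                    concordant += 0.5
    return concordant / pairs

# Perfectly separating predictions yield c = 1.0; identical predictions
# yield c = 0.5 (no better than chance).
print(c_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # → 1.0
```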

Calculation Algorithm

A calculation algorithm is an ordered sequence of data element retrieval and aggregation through which numerator and denominator events or continuous variable values are identified by a measure. Also referred to as the performance calculation.

Chi-square test

Chi-square test measures the statistical significance of a difference in proportions. It is a statistical test commonly used to compare observed data with data one would expect to obtain according to a specific hypothesis.
Pelletier, L. R., & Beaudin, C. L. (Eds.). (2012). Q solutions: Essential resources for the healthcare quality professional (3rd ed.). National Association for Healthcare Quality.
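A minimal sketch (illustrative counts, assuming SciPy is available) comparing observed proportions in a 2x2 table:

```python
# Chi-square sketch: did screening completion differ between two
# hypothetical facilities? (counts are illustrative)
from scipy.stats import chi2_contingency

table = [[90, 10],   # facility A: 90 screened, 10 not
         [60, 40]]   # facility B: 60 screened, 40 not

chi2, p_value, dof, expected = chi2_contingency(table)
# A small p-value suggests the screening proportions differ.
```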

Clinical Practice Guidelines

Clinical practice guidelines are systematically developed statements to support practitioner and patient decisions about appropriate health care for specific clinical circumstances.

Clinical Quality Language (CQL)

CQL is a Health Level Seven International® mixed normative/Standard for Trial Use. It is part of the effort to harmonize standards between electronic clinical quality measures and clinical decision support. CQL provides the ability to express logic that is human-readable yet structured enough for processing a query electronically.

Clinical Quality Measure (CQM)

A clinical quality measure is a mechanism used for assessing the degree to which a measured entity competently and safely delivers clinical services appropriate for the patient in an optimal time frame. CQMs are a subset of the broader category of performance measures.

CMS Consensus-Based Entity

The Medicare Improvements for Patients and Providers Act of 2008 requires the U.S. Department of Health and Human Services to contract with a consensus-based entity (CBE) regarding performance measurement. The CMS CBE endorses quality measures through a transparent, consensus-based process incorporating feedback from diverse groups of interested parties to foster health care quality improvement.

Cochran's Q

Cochran's Q test is a statistical test used to determine whether the proportion of "successes" is equal across three or more groups in which the same individuals appear in each group. Zach. (2021, January 26). What is Cochran's Q test? Statology. Retrieved November 27, 2023, from 

Code Language

A code language, also known as a programming language, is a set of commands, instructions, and other syntax used to create a software program. A high-level language is what a programmer uses to write code. The programmer compiles the code into a low-level language, which computer hardware recognizes directly. Christensson, P. (2011). Programming language. Retrieved November 1, 2023, from

Code System

A code system is a managed collection of concepts with each concept represented by at least one internally unique code and a human-readable description (e.g., SNOMED CT).

Coefficient of Stability

The coefficient of stability is an index of reliability determined via a test-retest method in which the same test is administered to the same respondents at two different points in time.
APA Dictionary of Psychology. (n.d.). Stability coefficient. Retrieved November 27, 2023, from

Cohen's Kappa

Cohen's kappa, or Cohen's Kappa coefficient, is a quantitative measure of agreement of categorical variables between two raters (inter-rater reliability) or one rater at two time periods (intra-rater reliability). A Cohen's kappa of 0 indicates agreement equivalent to chance. A Cohen's kappa of 1 indicates total agreement.
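The chance-corrected calculation can be sketched in pure Python (illustrative ratings): observed agreement minus expected chance agreement, scaled by the maximum possible improvement over chance.

```python
# Pure-Python Cohen's kappa sketch for two raters' categorical labels
# (ratings below are illustrative).
from collections import Counter

def cohens_kappa(rater1, rater2):
    n = len(rater1)
    po = sum(a == b for a, b in zip(rater1, rater2)) / n       # observed agreement
    c1, c2 = Counter(rater1), Counter(rater2)
    pe = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n**2  # chance agreement
    return (po - pe) / (1 - pe)

print(cohens_kappa(["yes", "yes", "no", "no"],
                   ["yes", "no", "no", "no"]))  # → 0.5
```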

Collinearity

Collinearity is when two or more variables are exactly correlated, which means the regression coefficients are not uniquely determined. Collinearity hurts the interpretability of the model because the regression coefficients are not unique and have influences from other features. Saslow, E. (2018). Collinearity - What it means, why its bad, and how does it affect other models? Medium. Retrieved November 1, 2023, from

Competing Measures

Competing measures address the same topic and the same population. Use this term when considering harmonization. See also Related Measures.

Composite Measure

A composite measure is a measure containing two or more individual measures, resulting in a single measure with a single score.

Conceptual Framework

A conceptual framework is a theoretical structure of assumptions, principles, and rules holding together the ideas comprising a broad concept.

Concordance Rate

The concordance rate is a statistical measure describing the proportion of pairs of individuals sharing an attribute, given that one already possesses this trait. A pair is considered concordant if they both possess an attribute of interest and discordant if they differ. It is commonly used to estimate the influence of nature and nurture on the development of a particular attribute or disease in an individual.

Skiold-Hanlin, S. (n.d.). Concordance rate | Definition, calculation & interpretation. Retrieved November 27, 2023, from

Confidence Interval

A confidence interval (CI) provides a range of possible values around a sample estimate (a mean, proportion, or ratio) calculated from data. CIs are commonly used when comparing groups and reflect the always-present uncertainty when working with samples of subjects. Rosati, R. J. (2012). Q solutions: Information management. National Association for Healthcare Quality.
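A simple worked sketch (illustrative measurements) for a 95% CI around a sample mean using the normal approximation:

```python
# 95% CI for a sample mean using the normal approximation
# (illustrative measurements).
import math

data = [4.2, 3.9, 4.5, 4.0, 4.3, 3.8, 4.1, 4.4]
n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample SD
ci_low = mean - 1.96 * sd / math.sqrt(n)
ci_high = mean + 1.96 * sd / math.sqrt(n)
# The interval (ci_low, ci_high) brackets the sample mean.
```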

Conflict of Interest

A conflict of interest exists when an individual (or entity) has more than one motivation for trying to achieve an objective. In measure development, this situation arises when an individual has opportunities to affect specifications for quality measures impacting an interest with which the individual has a relationship.

Construct Validity

Construct validity is the extent to which the measure actually measures what it claims to measure. Construct validity evidence often involves empirical and theoretical support for the interpretation of the construct.

Continuous Variable (CV)

A continuous variable is a measure score in which each individual value for the measure can fall anywhere along a continuous scale and can be aggregated using a variety of methods such as the calculation of a mean or median (e.g., mean number of minutes between presentation of chest pain to the time of administration of thrombolytics).

Convergent Validity (concurrent validity)

Convergent validity refers to the degree to which multiple measures of a single concept are correlated.

Cost of Care

The cost of care is the total health care spending, including total resource use and unit price, by payer or consumer, for a health care service or group of health care services associated with a specified patient population, time period, and unit of clinical accountability.

Cost/Resource Use Measure

A cost/resource use measure is a measure of health services counts (in terms of units or dollars) applied to a population or event (including diagnoses, procedures, or encounters). A resource use measure counts the frequency of use of defined health system resources. Some may further apply a dollar amount (e.g., allowable charges, paid amounts, or standardized prices) to each unit of resource use.

Covariate

A covariate is a variable that affects a response variable but is not of interest in the study. Zach. (2020, September 25). What is a covariate in statistics? Statology.

Criterion

A criterion is an accepted standard, principle, or rule used to make a decision or to inform an evaluator’s judgment.

Criterion Validity

Criterion validity measures how well one measure predicts the outcome for another measure or verifies data elements against some reference criterion determined to be valid (i.e., the gold standard).

Critical Data Element

A critical data element is an element contributing most to the computed measure score, meaning it accounts for identifying the greatest proportion of the target condition, event, or outcome being measured (numerator); the target population (denominator); the population excluded (exclusion); and, when applicable, the risk factors with the largest contribution to variability in the outcome.

Cronbach's Alpha

Cronbach's alpha is a quantitative measure of internal consistency reliability. Cronbach's alpha ranges between 0 and 1, with higher values indicating more reliability. 
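The standard calculation can be sketched with NumPy (illustrative survey responses): alpha compares the sum of the item variances with the variance of the total score.

```python
# Cronbach's alpha sketch: rows = respondents, columns = survey items
# (illustrative 5 respondents x 4 items).
import numpy as np

scores = np.array([
    [4, 4, 5, 4],
    [3, 3, 3, 3],
    [5, 5, 4, 5],
    [2, 2, 2, 3],
    [4, 3, 4, 4],
])

k = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1).sum()
total_variance = scores.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_variances / total_variance)
# Highly inter-correlated items give an alpha close to 1.
```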

Data Aggregation

Data aggregation is the combining of data from multiple sources to generate performance information.

Data Criteria

Data criteria are the data elements from the data model.

Data Element

A data element is a basic unit of information with a unique meaning and subcategories (data items) of distinct value. National Institute of Standards and Technology. (n.d.). Data element. Computer Security Resource Center. Retrieved November 1, 2023, from

Data Element Validity (part of Scientific Acceptability)

Data element validity is the extent to which the information represented by the data element or code used in the measure reflects the actual concept or event intended. For example:

  • The measure developer uses a medication code as a proxy for a diagnosis code.
  • Data element response categories include all values necessary to provide an accurate response.

Data Fidelity

Data fidelity describes the accuracy, completeness, consistency, and timeliness of data, e.g., high-fidelity, low-fidelity. Gulen, K. (2023, April 21). The power of accurate data: How fidelity shapes the business landscape? Data Science. Retrieved November 1, 2023, from

Data Sources

Data sources are the primary source document(s) used for data collection (e.g., billing or administrative data, encounter form, enrollment form, patient medical record).

De novo Measure

A de novo measure is a new measure that is not based on an existing measure.

Denominator

The denominator is a statement describing the population evaluated by the performance measure and is the lower part of a fraction used to calculate a rate, proportion, or ratio. It can be the same as the target/initial population or a subset of the target/initial population to further constrain the population for the purpose of the measure. CV measures may refer to this as measure population.

Denominator Exception

A denominator exception is any condition that should remove a patient, procedure, or unit of measurement from the denominator of the performance rate only if the numerator criteria are not met. A denominator exception allows for adjustment of the calculated score for those measured entities with higher risk populations. A denominator exception also provides for the exercise of clinical judgment and the measure developer should specifically define where to capture the information in a structured manner that fits the clinical workflow. The measured entity removes denominator exception cases from the denominator. However, the measured entity may still report the number of patients with valid exceptions. Allowable reasons fall into three general categories: medical reasons, patient reasons, or system reasons. Only proportion measures may use denominator exceptions.

Denominator Exclusion

Denominator exclusions are cases the measured entity should remove from the measure population and denominator before determining whether numerator criteria are met. Proportion and ratio measures use denominator exclusions to help narrow the denominator. For example, the measured entity would list patients with bilateral lower extremity amputations as a denominator exclusion for a measure requiring foot exams. Continuous variable measures may use denominator exclusions but may use the term measure population exclusion instead of denominator exclusion.
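How exclusions and exceptions interact with a proportion measure's denominator can be sketched as follows (hypothetical case records; the field names are illustrative, not a CMS specification):

```python
# Proportion-measure sketch: exclusions come out of the denominator up
# front; exception cases come out only when numerator criteria are unmet.
def performance_rate(cases):
    eligible = [c for c in cases if not c["excluded"]]
    denominator = [c for c in eligible
                   if c["numerator_met"] or not c["exception"]]
    numerator = [c for c in denominator if c["numerator_met"]]
    return len(numerator) / len(denominator)

cases = [
    {"excluded": False, "numerator_met": True,  "exception": False},
    {"excluded": False, "numerator_met": False, "exception": True},   # exception: removed
    {"excluded": False, "numerator_met": False, "exception": False},
    {"excluded": True,  "numerator_met": False, "exception": False},  # exclusion: removed
]
print(performance_rate(cases))  # → 0.5
```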

Direct Reference Code (DRC)

A direct reference code is a specific code referenced directly in the eCQM logic to describe a data element or one of its attributes. DRC metadata include the description of the code, the code system from which the code is derived, and the version of that code system.

Discriminant Validity

Discriminant validity is the degree to which a test of a concept (a quality measure) is not highly correlated with other tests designed to measure theoretically different concepts. Demonstrate discriminant validity by assessing variation across multiple comparison groups (such as health care providers) to show that a performance measure can differentiate between disparate groups it should theoretically be able to distinguish.

Dry Run

A dry run is full-scale measure testing involving all measured entities representing the full spectrum of the measured population. The purpose is to finalize all methodologies related to case identification/selection, data collection, and measurement calculation, and to quantify unintended consequences.

Efficiency Measure

An efficiency measure assesses the cost of care (inputs to the health system in the form of expenditures and other resources) associated with a specified level of health outcome.

Electronic Clinical Quality Measure (eCQM)

An electronic clinical quality measure (eCQM) is a measure specified in a standard electronic format that uses data electronically extracted from electronic health records and/or health information technology systems to measure the quality of health care provided. Electronic Clinical Quality Improvement Resource Center. (n.d.) Glossary. Retrieved May 22, 2024, from

Electronic Health Record (EHR)

The electronic health record is also known as the electronic patient record, electronic medical record, or computerized patient record. As defined by the International Social Security Association, an EHR is a “longitudinal electronic record of patient health information generated by one or more encounters in any care delivery setting. Included in this information are patient demographics, progress notes, problems, medications, vital signs, past medical history, immunizations, laboratory data, diagnoses and treatment, medications, allergies, immunizations as well as radiology images and laboratory results.” International Social Security Association. (n.d.). Information and communication technology - Guideline 91. electronic health record system. Retrieved November 1, 2023, from

Empirical Evidence

Empirical evidence is the data or information resulting from studies and analyses of the data elements and/or scores for a measure as specified, whether unpublished or published.

Encounter

An encounter, as defined by ASTM International, is “(1) an instance of direct provider/practitioner to patient interaction, regardless of the setting, between a patient and a practitioner vested with primary responsibility for diagnosing, evaluating or treating the patient’s condition, or both, or providing social worker services; and (2) a contact between a patient and a practitioner who has primary responsibility for assessing and treating the patient at a given contact, exercising independent judgment.” An encounter serves as a focal point linking clinical, administrative, and financial information. Encounters occur in many settings—ambulatory care, inpatient care, emergency care, home health care, field, and virtual (telemedicine).

Environmental Scan

An environmental scan is the process of systematically reviewing and interpreting data to identify issues and opportunities that will influence prioritization of current or future plans.

Equity

Equity is ‘‘the consistent and systematic fair, just, and impartial treatment of all individuals, including individuals who belong to underserved communities that have been denied such treatment, such as Black, Latino, and Indigenous and Native American persons, Asian Americans and Pacific Islanders and other persons of color; members of religious minorities; lesbian, gay, bisexual, transgender, and queer (LGBTQ+) persons; persons with disabilities; persons who live in rural areas; and persons otherwise adversely affected by persistent poverty or inequality.’’ (Executive Order 13985, 2021)

Expert Consensus

Expert consensus comprises the recommendations formulated by one of several formal consensus development methods, such as the consensus development conference, Delphi method, and nominal group technique.

Face Validity

Face validity is the extent to which a test appears to cover the concept it purports to measure “at face value.” It is a subjective assessment by experts of whether the measure reflects the quality of care (e.g., whether the proportion of patients with blood pressure < 140/90 is a marker of quality.)

Fast Healthcare Interoperability Resources® (FHIR®)

FHIR is a Health Level Seven International® (HL7) standard for exchanging health care information electronically. Health information technology implementers can use FHIR as a stand-alone data exchange standard but can also use it in partnership with existing widely used standards. Health Level Seven International. (n.d.). FHIR overview. Retrieved November 1, 2023, from

Feasibility Criteria

Feasibility criteria address the extent to which the specifications, including measure logic, require data that are readily available or easily captured without undue burden and can be implemented for performance measurement.

Fisher's Test

Fisher's Exact Test is used to determine whether or not there is a significant association between two categorical variables. Zach. (2020, April 27). Fisher's Exact Test: Definition, Formula, and Example. Statology. Retrieved November 16, 2023, from
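A minimal sketch (illustrative small-sample counts, assuming SciPy is available), since Fisher's exact test suits tables too sparse for a chi-square test:

```python
# Fisher's exact test sketch for a small 2x2 table (illustrative counts).
from scipy.stats import fisher_exact

table = [[8, 2],
         [1, 9]]
odds_ratio, p_value = fisher_exact(table)
# A small p-value suggests the two categorical variables are associated.
```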

Friedman Test

The Friedman Test is a non-parametric test used to determine whether or not there is a statistically significant difference between the means of three or more groups in which the same subjects show up in each group. Zach. (2020, May 4). Friedman test: Definition, formula, and example. Statology. Retrieved November 27, 2023, from
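A minimal sketch (illustrative repeated measures, assuming SciPy is available), where the same subjects appear under each condition:

```python
# Friedman test sketch: the same six patients scored under three
# protocols (illustrative repeated measures).
from scipy.stats import friedmanchisquare

protocol_1 = [7, 6, 8, 7, 6, 7]
protocol_2 = [5, 4, 6, 5, 5, 4]
protocol_3 = [8, 7, 9, 8, 8, 8]

stat, p_value = friedmanchisquare(protocol_1, protocol_2, protocol_3)
# A small p-value suggests at least one protocol differs.
```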

Fully Developed Measure

To meet the selection criteria for a fully developed measure, the measure developer must complete testing of the measure. This means the measure developer has completed:

  • Person/encounter-level (data element-level) reliability and validity testing, when appropriate, for each critical data element, with no changes to the measure specifications needed based on the results. Testing may be empiric or may reference external or previous testing (e.g., an established data element library such as the CMS Data Element Library (DEL) or Data Element Repository (DERep), or literature).
  • Accountable entity-level (measure score-level) reliability and validity testing, when appropriate, with no changes to the specifications needed based on the results. Measure developers are encouraged to report accountable entity-level reliability results by decile (rather than just the median) to detect differences in reliability across the target population size distribution.
  • Completion of face validity testing as the sole type of validity testing does not meet the criteria for completion of testing for a fully developed measure. However, face validity is acceptable for new measures (i.e., those not currently in use in CMS programs and undergoing substantive changes) that are not electronic clinical quality measures (eCQMs). Instead of Likert-scale type assessments of face validity, measure developers are encouraged to develop a logic model consisting of inputs, activities, outputs, and outcomes to describe the associations between the health care structures and processes and the desired health outcome(s). The logic model should indicate the structure(s), process(es), and/or outcome(s) included in the measure. A detailed logic model will help the measure developer identify appropriate constructs for future empiric validity testing.


For measures based on survey data or patient-reported assessment tools, including patient-reported outcome-based performance measures (PRO-PMs), the measure developer has tested reliability and validity of the survey or tool and the survey or tool does not need changes based on the results. For measures based on assessment tools, the measure developer must have completed reliability and validity testing for each critical data element and complete testing of the assessment tool itself with no changes to the tool needed based on the results.

Gaming

Gaming occurs when measured entities exploit weaknesses in the measurement system to tweak the data, making their performance look better than it actually is. Gaming includes limiting access to certain populations, neglecting care, or overusing medications or services to ensure that the measure results are favorable.

Grey Literature

Grey literature is unpublished or not commercially indexed material that can include any documentary materials issued by government, academia, business, and industry such as technical reports, working papers, and conference proceedings. For example, contributors to the New York Academy of Medicine Grey Literature website include the Agency for Healthcare Research and Quality (AHRQ), Centers for Disease Control and Prevention, the Department of Health and Human Services (HHS), The Joint Commission, National Academy of Sciences, RAND, and RTI International.

Gwet's AC1

Gwet's AC1 is a quantitative measure of agreement of binary ratings between two raters (inter-rater reliability).

Harmonization

Harmonization is the standardization of specifications for related measures with the same measure focus (e.g., influenza immunization of patients in hospitals or nursing homes); related measures for the same target population (e.g., eye exam and Hemoglobin A1c for patients with diabetes); or definitions applicable to many measures (e.g., age designation for children) so they are uniform or compatible, unless the measure developer can justify differences (i.e., dictated by the evidence). The dimensions of harmonization can include numerator, denominator, exclusions, calculation, data source, and collection instructions. The extent of harmonization depends on the relationship of the measures, the evidence for the specific measure focus, and differences in data sources. Value sets used in measures (especially eCQMs) should be harmonized when the intended meaning is the same. Harmonization of logic in eCQMs is beneficial when the data source in the EHR is the same.

Health Care Disparities

Health care disparities generally refer to differences between groups in access to, use of, quality of care, or health coverage. CMS. (2021). Paving the way to equity: A progress report 2015-2021. Retrieved November 1, 2023, from

Health Disparities

Health disparities typically refer to higher burdens of illness, mortality, injury, or quality of life experienced by one group relative to another. CMS. (2021). Paving the way to equity: A progress report 2015-2021. Retrieved November 1, 2023, from

Health Information Technology (Health IT)

Per Section 3000 of the HITECH Act, the term ‘health information technology’ means “hardware, software, integrated technologies or related licenses, intellectual property, upgrades, or packaged solutions sold as services that are designed for or support the use by healthcare entities or patients for the electronic creation, maintenance, access, or exchange of health information.”

Health Information Technology for Economic and Clinical Health (HITECH) Act

The Health Information Technology for Economic and Clinical Health (HITECH) Act is a provision within the American Recovery and Reinvestment Act authorizing incentive payments through Medicare and Medicaid to hospitals and clinicians for the meaningful use of EHRs.

Health Level Seven International (HL7)

HL7 is a standards-developing organization providing a framework and standards for the exchange, integration, sharing, and retrieval of electronic health information supporting clinical practice and the management, delivery, and evaluation of health services.

Health Quality Measure Format (HQMF)

HQMF is a standards-based representation of quality measures as electronic documents. Refer to a quality measure expressed in this way as an eCQM.

Health-Related Social Need (HRSN)

HRSNs are individual-level, adverse social conditions that can negatively impact a person’s health or health care. Examples include food insecurity, housing instability, and lack of access to transportation.

Hosmer-Lemeshow Test (HL Test)

The HL test is a goodness-of-fit test for logistic regression, especially for risk prediction models. A goodness-of-fit test indicates how well the data fit the model. Specifically, the HL test assesses whether the observed event rates match the expected event rates in population subgroups. The test is only used for binary response variables (i.e., a variable with two outcomes, such as alive or dead, yes or no).
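One common form of the statistic sums (observed − expected)² / (n·p̄·(1 − p̄)) over groups of cases binned by predicted risk. A rough sketch (illustrative data, assuming NumPy and SciPy are available; not a production implementation, which must handle ties and degenerate bins carefully):

```python
# Hosmer-Lemeshow sketch: bin cases by predicted risk, then compare
# observed vs. expected event counts per bin. Illustrative only.
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(probs, outcomes, groups=10):
    order = np.argsort(probs)
    p = np.asarray(probs)[order]
    y = np.asarray(outcomes)[order]
    h = 0.0
    for pg, yg in zip(np.array_split(p, groups), np.array_split(y, groups)):
        n = len(pg)
        expected = pg.sum()      # expected events in this risk bin
        observed = yg.sum()      # observed events in this risk bin
        pbar = expected / n
        h += (observed - expected) ** 2 / (n * pbar * (1 - pbar))
    return h, chi2.sf(h, groups - 2)

# Perfectly calibrated toy data: observed events match expected in each bin,
# so the statistic is ~0 and the p-value is ~1 (good fit).
probs = [0.1] * 10 + [0.5] * 10 + [0.9] * 10
outcomes = [1] + [0] * 9 + [1] * 5 + [0] * 5 + [1] * 9 + [0]
h_stat, p_value = hosmer_lemeshow(probs, outcomes, groups=3)
```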

Hybrid measure

A hybrid measure is a quality measure that uses more than one source of data for measure calculation. Current hybrid measures use claims data and electronic clinical data from electronic health records to calculate measure results.

Importance Criterion

The importance criterion is the extent to which the specific measure focus is important to making significant gains in health care quality (e.g., safety, timeliness, effectiveness, efficiency, equity, patient centeredness) and improving health outcomes for a specific high-impact aspect of health care where there is variation in or overall poor performance.

Inter-Rater (Inter-abstractor) Reliability Testing

Inter-rater reliability testing assesses the extent to which observations from two or more human observers are congruent with each other.

Intermediate Outcome

An intermediate outcome is a measure assessing the change produced by a health care intervention leading to a long-term outcome.

Internal Consistency Reliability Testing

Internal consistency reliability testing is testing a multiple-item test or survey to assess the extent to which the items designed to measure a given construct are inter-correlated. It pertains to survey-type measures and to the data elements used in measures constructed from patient assessment instruments.

Intra-Class Correlation

Intra-class correlation refers to correlations within a class of data (for example, correlations within repeated measurements of weight), rather than to correlations between two different classes of data (for example, the correlation between weight and length). Liljequist, D., Elfving, B., & Skavberg Roaldsen, K. (2019). Intraclass correlation - A discussion and demonstration of basic features. PLoS ONE, 14(7), e0219854.

Inverse Measures

Inverse measures are measures where a lower performance rate is better. For example, the National Healthcare Safety Network calculates most healthcare-associated infections (HAIs) as a standardized infection ratio (SIR). The SIR compares the actual number of HAIs (i.e., the numerator) with the predicted number based on the baseline U.S. experience (e.g., standard population), adjusting for several risk factors that have been found to be most associated with differences in infection rates. The goal is to have the numerator equal to or very close to zero, thereby yielding a SIR equal to or very close to zero.
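The SIR arithmetic is simply observed over predicted (illustrative counts below):

```python
# SIR sketch with illustrative counts: observed infections divided by the
# number predicted from the risk-adjusted baseline.
observed_infections = 4
predicted_infections = 8.0  # from the baseline risk model (illustrative)

sir = observed_infections / predicted_infections
print(sir)  # → 0.5 (fewer infections than the baseline predicts)
```

Because this is an inverse measure, an SIR below 1 indicates performance better than the baseline.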

Jira

Jira is an Atlassian software application that tracks issues and bugs. It also allows users to quickly search issues that are being resolved or have already been resolved. HHS groups use the ONC Project Tracking System [Jira] to track issues with eCQMs and eCQM-related standards and tools.

Kappa Coefficient

The kappa coefficient is a statistical measure of inter-rater agreement for qualitative (categorical) items. Measure developers can think of Cohen’s kappa as a chance-corrected proportional agreement. Possible values range from +1 (perfect agreement) through 0 (no agreement above that expected by chance) to -1 (complete disagreement).

Kendall's Tau

Kendall’s Tau is a statistic used to measure the ordinal association between two measured quantities. It is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities.
Comprehensive R Archive Network. (n.d.). Kendall's tau. Tools for Descriptive Statistics. Retrieved November 13, 2023, from
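A minimal sketch (illustrative rankings, assuming SciPy is available):

```python
# Kendall's tau sketch: two illustrative rankings of five hospitals.
from scipy.stats import kendalltau

reviewer_1 = [1, 2, 3, 4, 5]
reviewer_2 = [1, 3, 2, 4, 5]  # one adjacent pair swapped

tau, p_value = kendalltau(reviewer_1, reviewer_2)
# 9 concordant pairs and 1 discordant pair of 10 total → tau = 0.8
```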

Kruskal-Wallis test

A Kruskal-Wallis test is a nonparametric method for comparing more than two independent samples. The null hypothesis of the Kruskal-Wallis test is that the mean ranks of the groups are the same. The test does not assume a normal distribution of the underlying data.
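
A rough sketch of the H statistic for tie-free data (values invented): all observations are ranked together, then the groups' rank sums are compared.

```python
def kruskal_wallis_h(groups):
    """H statistic, assuming all values are distinct (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # value -> overall rank
    n_total = len(pooled)
    h = 0.0
    for g in groups:
        r_sum = sum(rank[v] for v in g)
        h += r_sum ** 2 / len(g)
    return 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)

# Rank sums are 9, 15, and 21, giving H = 3.2
print(kruskal_wallis_h([[1, 3, 5], [2, 4, 9], [6, 7, 8]]))
```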

Kuder-Richardson Formula 20

The Kuder-Richardson Formula 20 (KR-20) is a quantitative measure of internal consistency reliability for measurements with binary variables. 
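
The KR-20 formula can be sketched directly from a small invented matrix of binary item responses (rows are examinees, columns are items; population variance of the total scores is assumed):

```python
def kr20(item_matrix):
    """KR-20 internal consistency for binary (0/1) items."""
    n = len(item_matrix)                 # examinees
    k = len(item_matrix[0])              # items
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of p*q over items, where p is the proportion answering item j correctly
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

responses = [[1, 1, 1],
             [1, 0, 1],
             [0, 1, 0],
             [0, 0, 0]]
print(kr20(responses))  # 0.6
```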


Lean

Lean is a system of organizational principles for process improvement that maximizes value and eliminates waste. Lean Enterprise Institute. (n.d.). Waste. Retrieved November 1, 2023, from

Level of Analysis

The level of analysis is a performance measurement level (e.g., clinician, health plan, county populations).


Logic

Logic is the criteria used to define a quality measure and its key components.


Mann-Whitney U Test

A Mann-Whitney U test is used to compare the differences between two independent samples when the sample distributions are not normally distributed and the sample sizes are small (n < 30). Zach. (2018, December 22). Mann-Whitney U Test. Statology. Retrieved November 16, 2023, from

Material Change

A material change is one that changes the specifications of a quality measure to affect the original measure’s concept or logic, the intended meaning of the measure, or the strength of the measure relative to the measure evaluation criteria.

McDonald's Omega

McDonald's omega (ω) is a quantitative test of internal consistency reliability based on a one-factor model. McDonald's omega is a reliability coefficient similar to Cronbach's Alpha. Omega has the advantage of taking into account the strength of association between items and constructs and item-specific measurement errors. ResearchGate. (2015). Re: What are the commonly used cut-off values for McDonald's Omega? Retrieved December 18, 2023 from:

McNemar's Test

The McNemar test is a non-parametric test used to analyze paired nominal data. The minimal sample size required for the McNemar test is at least ten discordant pairs.  Sundjaja, J.H., Shrestha, R., & Krishan, K. (2021). McNemar and Mann-Whitney U Tests. StatPearls Publishing.
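
The McNemar chi-square depends only on the two discordant-pair counts, conventionally labeled b and c; a minimal sketch with invented counts:

```python
def mcnemar_statistic(b, c):
    """McNemar chi-square for paired nominal data.

    b and c are the two discordant-pair counts from the paired 2x2 table
    (cases where the two paired results disagree in opposite directions).
    """
    return (b - c) ** 2 / (b + c)

# 12 pairs flipped one way, 4 the other: chi-square = 64 / 16 = 4.0
print(mcnemar_statistic(12, 4))  # 4.0
```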

Measure Maintenance

Measure maintenance is the periodic and consistent reviewing, evaluating, and updating of performance measures to ensure continued reliability, validity, feasibility, importance, usability, and currency with science. It also involves comparison to similar measures for potential harmonization.

Measure Score

The measure score is the numeric result computed by applying the measure specifications and scoring algorithm. The computed measure score represents an aggregation of all appropriate patient-level data (e.g., proportion of patients who died, average lab value attained) for the measured entity (e.g., hospital, health plan, home health agency, clinician). The measure specifications designate the measured entity and to whom the measure applies.

Measure Set

A measure set is a group of measures related in some way such as measures addressing a specific condition, procedure, or specialty.

Measure Steward

A measure steward is an individual or organization that owns a measure and is responsible for maintaining the measure. Measure stewards are often the same as measure developers, but not always. Measure stewards are also the ongoing point of contact for people interested in a given measure.

Measure Testing

Measure testing is the empirical analysis to demonstrate the reliability and validity of the measure as specified, including analysis of issues posing threats to the validity of conclusions about quality of care such as exclusions, risk adjustment/stratification for outcome and resource use measures, methods to identify differences in performance, and comparability of data sources/methods.

Measure Validity (part of Scientific Acceptability)

Measure validity is when the measure accurately represents the evaluated concept and achieves the intended purpose (i.e., to measure quality). For example, the measure

  • clearly identifies the evaluated concept (i.e., face validity)
  • includes all necessary data elements, codes, and tables to detect a positive occurrence when one exists (i.e., construct validity)
  • includes all necessary data sources to detect a positive occurrence when one exists (i.e., construct validity)

Measured Entities

Measured entities are the front-line clinicians and their organizations, including health information technology, collecting quality measurement data. Measured entities are the implementers of quality measures. The effect of quality measure data collection on clinician workflow can be negative. There may be effects on their payments, positive and negative, with respect to reporting and actual performance on quality measures. Because of these potential effects, measured entities should be involved in all aspects of the Measure Lifecycle.

Measures Under Consideration (MUC)

The Measures Under Consideration is a list of quality and efficiency measures HHS is considering adopting, through the federal rulemaking process, for use in the Medicare program. The list is made publicly available by December 1 each year for categories of measures described in section 1890(b)(7)(B)(i)(I) of the Social Security Act, as amended by Section 3014 of the Patient Protection and Affordable Care Act (ACA).

Medical Record (Data Source)

The medical record is data obtained from the records or documentation maintained on a patient in any health care setting (e.g., hospital, home care, long term care, practitioner office). It includes electronic and paper medical record systems.


Metadata

Metadata are data that describe data.

Minor Change

A minor change does not change the process of data collection, aggregation, or calculation, nor does it change the intended meaning of the measure or the strength of the measure in terms of the measure evaluation criteria. For example, the code system updates to eCQMs with the Annual Update are minor changes.


Morbidity

Morbidity is the rate of incidence of disease. For example, if a lumbar puncture is improperly performed, significant morbidity may follow. It also can refer to the relative incidence of a particular disease state or symptom.


Mortality

Mortality is the number of deaths in a given time or place, or the proportion of deaths to population. “Death rate” is also called “mortality rate.”

Multiple Chronic Conditions (MCC)

Multiple chronic conditions (MCC) refers to the situation in which an individual has two or more concurrent chronic conditions that collectively have an adverse effect on health status, function, or quality of life and require complex health care management, decision-making, or coordination.

Non-parametric Methods

Non-parametric methods are a type of statistical test not involving the estimation of parameters of a statistical function. Merriam-Webster Dictionary. (n.d.). Nonparametric.  Retrieved November 1, 2023, from

Null Performance Rate

The null performance rate is when all of the denominator-eligible instances are attributed to denominator exceptions. Therefore, the performance rate for satisfactory reporting would be 0/0 (null).


Numerator

The numerator is the upper portion of a fraction used to calculate a rate, proportion, or ratio. Also called the measure focus, it is the target process, condition, event, or outcome. Numerator criteria are the processes or outcomes expected for each patient, procedure, or other unit of measurement defined in the denominator. A numerator statement describes the action satisfying the conditions of the performance measure.

Numerator Exclusion

Numerator exclusions define instances measured entities should not include in the numerator data. Use numerator exclusions only in ratio and proportion measures.

Opportunity for Improvement

Opportunity for improvement is when data demonstrate considerable variation or overall, less-than-optimal performance, in the quality of care across measured entities, and/or there are disparities in care across population groups.

Outcome Measure

An outcome measure is a measure focusing on the health status of a patient (or change in health status) resulting from health care – desirable or adverse.


Overfitting

Overfitting is when a statistical model begins to describe the random error in the data rather than the relationships between variables. This occurs when the model is too complex. In regression analysis, overfitting can produce misleading R2 values, regression coefficients, and p-values. Frost, J. (n.d.). Overfitting regression models: Problems, detection, and avoidance. Statistics by Jim. Retrieved November 1, 2023, from

Paired T-Test

The paired sample t-test, sometimes called the dependent sample t-test, is a statistical procedure used to determine whether the mean difference between two sets of observations is zero. In a paired sample t-test, each subject or entity is measured twice, resulting in pairs of observations. Statistics Solutions. (n.d.). Paired T-Test. Directory of Statistical Analyses. Retrieved November 16, 2023, from…
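
A minimal sketch of the paired t statistic on invented pre/post values: the statistic is the mean of the paired differences divided by its standard error.

```python
import math

def paired_t(pre, post):
    """t statistic testing whether the mean paired difference is zero."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator)
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

# Each subject measured twice; differences are [1, 2, 1, 2]
print(paired_t([5, 6, 7, 8], [6, 8, 8, 10]))  # ~5.196 (i.e., 3 * sqrt(3))
```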

Paperwork Reduction Act (PRA)

The PRA mandates that all federal government agencies must obtain approval from the Office of Management and Budget before collection of information that will impose a burden on the public. Measure developers should be familiar with the PRA before implementing any process involving the collection of new data.

Parameter Estimates

Parameter estimates (also called coefficients) are the change in the response associated with a one-unit change of the predictor, while all other predictors are held constant. Types of parameter estimates include

  • Point estimates, which are the single, most likely value of a parameter. For example, the point estimate of population mean (the parameter) is the sample mean (the parameter estimate).
  • Confidence intervals, which are a range of values likely to contain the population parameter.

Parametric Methods

Parametric methods make certain assumptions about a data set; namely, that the data are drawn from a population with a normal distribution. Parametric methods generally have high statistical power. Tyler, J. (2017). What are parametric and nonparametric tests? Retrieved November 1, 2023, from

Patient-Reported Outcome (PRO)

PROs are status reports on a patient’s health condition or health behavior that come directly from the patient, without interpretation of the patient’s response by a clinician or anyone else. This definition reflects the key domains of health-related quality of life (including functional status), symptoms and symptom burden (e.g., pain, fatigue), and health behaviors (e.g., smoking, diet, exercise). (Adapted from the Food and Drug Administration Guidance for Industry PRO Measures: Use in Medical Product Development to Support Labeling Claims)

Patient-Reported Outcome Measure (PROM)

A PROM is an instrument, scale, or single-item measure used to assess the associated PRO concept as perceived by the individual, obtained by directly asking the individual to self-report.

Patient-Reported Outcome-based Performance Measure (PRO-PM)

A patient-reported outcome-based performance measure (PRO-PM) is a performance measure that is based on patient-reported outcome measure (PROM) data aggregated for an accountable health care entity. Measured entities collect the data directly from the patient using the PROM tool, which can be an instrument, scale, or single-item measure.

Pearson's correlation coefficient

The correlation coefficient is a test for quantifying the linear relationship between two variables.

  • +1 indicates a perfect positive relationship.
  • -1 indicates a perfect negative relationship.
  • A result of zero indicates no linear relationship at all.

The Pearson product moment correlation coefficient is one of the most common correlation coefficient formulas.
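
A minimal sketch of the Pearson product moment formula, computed from first principles on invented data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # ~1.0 (perfect positive)
print(pearson_r([1, 2, 3], [3, 2, 1]))  # ~-1.0 (perfect negative)
```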


Population

The population is the total group of people of interest for a quality measure, sometimes called the target/initial population. The measure population is a defined subset, appropriate to the measure set, that is not excluded from the individual measure.

Population Criteria

Population criteria are the basic building blocks of a quality measure, e.g., population, numerator, denominator.

Population Health Quality Measure

A population health quality measure is a broadly applicable indicator reflecting the quality of a group’s overall health and well-being. Topics include access to care, clinical outcomes, coordination of care and community services, health behaviors, preventive care and screening, and utilization of health services.

Predictive Validity

Predictive validity, also known as empirical validity, is the ability of measure scores to predict scores on some other related valid measure. The degree to which the operationalization can predict (or correlate) with other measures of the same measured construct at some time in the future.

Process Measure

A process measure is a measure focusing on steps that should be followed to provide good care. There should be a scientific basis for believing that the process, when executed well, will increase the probability of achieving a desired outcome.


Proportion

A proportion is a score derived by dividing the number of cases meeting a criterion for quality (i.e., the numerator) by the number of eligible cases within a given time frame (i.e., the denominator) where the numerator cases are a subset of the denominator cases (e.g., percentage of eligible women with a mammogram performed in the last year).

Public Domain

The public domain is the “The realm embracing property rights that belong to the community at large, are unprotected by copyright or patent, and are subject to appropriation by anyone” Merriam-Webster Dictionary. (n.d.). Public domain. Retrieved November 1, 2023, from

Qualified Clinical Data Registry (QCDR)

A QCDR is an entity with clinical expertise in medicine and in quality measurement development that collects medical or clinical data on behalf of a Merit-Based Incentive Payment System (MIPS) eligible clinician for the purpose of patient and disease tracking to foster improvement in the quality of care provided to patients.

Qualified Registry

A Qualified Registry is a vendor that collects clinical data from an individual MIPS-eligible clinician, group, or virtual group and submits it to CMS on their behalf.

Quality Data Model (QDM)

The QDM is an information model defining relationships between patients and clinical concepts in a standardized format to enable electronic quality performance measurement. The model is the current structure for electronically representing quality measure concepts for interested parties involved in electronic quality measurement development and reporting. The QDM provides the language defining the criteria for clinical quality measurement. It allows the electronic definition of a clinical concept via its data elements and provides the vocabulary to relate them to each other. By relating attributes between data elements and using filtering functions, the QDM provides a method to construct complex clinical representations for eCQMs.

Quality Measure

The Patient Protection and Affordable Care Act defined a quality measure as “a standard for measuring the performance and improvement of population health or of health plans, providers of services, and other clinicians in the delivery of health care services.” (Pub. L. 111-148, 931)

Quality Reporting Document Architecture (QRDA)

QRDA is a standard document format for the exchange of eCQM data. QRDA documents contain data extracted from EHRs and other health IT systems. They are used to exchange eCQM data between systems and serve as the data submission standard for a variety of quality measurement and reporting initiatives. The Office of the National Coordinator for Health IT (ONC) adopted QRDA as the standard to support both QRDA Category I (individual patient) and QRDA Category III (aggregate) data submission.

R2 Statistic

The R2 statistic describes how well the outcome can be predicted based on the values of the risk factors or predictors. It is frequently used to assess the predictive power of specific types of risk-adjusted models.
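
A minimal sketch of the usual R2 computation (one minus the ratio of residual to total sum of squares; values invented):

```python
def r_squared(actual, predicted):
    """R2: fraction of the outcome's variance explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# ss_res = 0.10, ss_tot = 5.0, so R2 = 0.98
print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```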


Ratio

A ratio is a score derived by dividing a count of one type of data by a count of another type of data (e.g., number of patients with central lines who develop infection divided by the number of central line days). The key to the definition of a ratio is that the numerator is not in the denominator.
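
As a hedged arithmetic illustration with invented counts, a ratio measure such as infections per central line day divides one count by a different count, rather than by a denominator that contains the numerator:

```python
# Hypothetical counts: in a ratio, the numerator (infections) is NOT
# a subset of the denominator (central line days).
infections = 3
central_line_days = 1200
ratio_per_1000_days = infections / central_line_days * 1000
print(ratio_per_1000_days)  # about 2.5 infections per 1,000 line days
```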

Receiver-Operating Characteristic (ROC) Curve

The ROC curve is a graph providing the c-statistic value. The ROC curve graphs the predictive accuracy of a logistic regression model.
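
The c-statistic behind the ROC curve can be sketched as the proportion of positive/negative score pairs the model ranks correctly, with ties counting half (scores invented):

```python
from itertools import product

def c_statistic(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise concordance."""
    wins = 0.0
    for p, q in product(pos_scores, neg_scores):
        if p > q:
            wins += 1      # positive ranked above negative: concordant
        elif p == q:
            wins += 0.5    # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))

# 8 of 9 pairs are ranked correctly, so c ~0.889
print(c_statistic([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```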

Related Measures

Related measures are measures addressing either the same topic or the same population. This term is used when considering harmonization. See also Competing Measures.

Reliability (part of Scientific Acceptability)

Reliability reflects that the measure is well defined and precisely specified, so that measured entities can implement it consistently within and across organizations, and that it distinguishes differences in performance.

Reliability Testing

Reliability testing evaluates whether the measure data elements, when extracted repeatedly over time, produce the same results a high proportion of the time when assessed in the same population in the same time period, and/or whether the measure score is precise. Often referred to as inter-rater or inter-observer reliability, reliability also applies to abstractors and coders. It can also refer to the amount of error associated with the computed measure scores (e.g., signal vs. noise).

Resource Use Measures

Resource use measures, also called cost and resource use measures, refer to broadly applicable and comparable measures of health services counts (in terms of units or dollars) applied to a population or event (broadly defined to include diagnoses, procedures, or encounters). A resource use measure counts the frequency of defined health system resources. Some measures may monetize the health service by applying a dollar amount such as allowable charges, paid amounts, or standardized prices to each unit of resource use.

Respecified Measure

A respecified measure is an existing measure changed to fit the current purpose or use. This may mean changing a measure to meet the needs of a different care setting, data source, or population; or, it may mean changes to the numerator, denominator, or adding specifications to fit the current use.

Risk Adjustment

Risk adjustment is a mathematical model applied to a quality measure correcting for differing characteristics within a population, such as patient health status. Its purpose is a fairer and more accurate comparison of outcomes of care across health care organizations or clinicians. Measure developers usually apply risk adjustment models to outcome and cost/resource use measures.


Sample

A sample is a subset of a population. The subset should be chosen in such a way that it accurately represents the whole population with respect to some characteristic of interest. A sampling frame lists all eligible cases in the population of interest (i.e., denominator) and how they are selected.

Scientific Acceptability

Scientific acceptability is the extent to which the measure, as specified, produces consistent (i.e., reliable) and credible (i.e., valid) results about the quality of care when implemented.


Scoring

Scoring is the method(s) applied to data to generate results/score. Most quality measures produce rates; however, other scoring methods include categorical value, CV, count, frequency distribution, non-weighted score/composite/scale, ratio, and weighted score/composite/scales.

Semantic Validation

Semantic validation is the method of testing the validity of an eCQM whereby the measure developer compares the formal criteria in an eCQM to a manual computation of the measure from the same test database.


Sensitivity

Sensitivity, as a statistical term, refers to the proportion of correctly identified actual positives (e.g., percentage of people with diabetes correctly identified as having diabetes). See also Specificity.

Signal-to-Noise Ratio

With respect to quality measurement, the signal is the information of interest and noise is the random, unwanted variation. The signal-to-noise ratio measures the strength of a desired signal relative to the background noise.

Spearman's ρ

Spearman's ρ is a nonparametric measure of rank correlation. It assesses how well the relationship between two variables can be described using a monotonic function. Spearman's coefficient is appropriate for both continuous and discrete ordinal variables.
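
A minimal sketch: Spearman's ρ is the Pearson correlation applied to the ranks of the data (tie-free data assumed; values invented):

```python
import math

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks (no ties assumed)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# A monotonic (but nonlinear) relationship still gives rho ~1.0
print(spearman_rho([1, 5, 2, 9], [10, 50, 20, 90]))
```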


Specifications

Specifications are measure instructions addressing data elements, data sources, point of data collection, timing and frequency of data collection and reporting, specific instruments used (if appropriate), and implementation strategies.


Specificity

Specificity, as a statistical term, refers to the proportion of correctly identified negatives (e.g., percentage of healthy people correctly identified as not having the condition). Perfect specificity would mean the measure recognizes all actual negatives (e.g., all healthy people recognized as healthy). See also Sensitivity.
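
A hedged sketch of sensitivity and specificity together, from invented confusion-matrix counts (tp = true positives, fn = false negatives, tn = true negatives, fp = false positives):

```python
def sensitivity(tp, fn):
    """Proportion of actual positives correctly identified."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of actual negatives correctly identified."""
    return tn / (tn + fp)

# 90 of 100 true cases flagged; 80 of 100 healthy people cleared
print(sensitivity(90, 10))  # 0.9
print(specificity(80, 20))  # 0.8
```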


Stratification

Stratification divides a population or resource services into distinct, independent groups of similar data, enabling analysis of the specific subgroups. This type of adjustment can show where disparities exist or where there is a need to expose differences in results.
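
A minimal sketch of stratified rate calculation on invented records, showing how subgroup rates can expose a disparity the overall rate hides:

```python
from collections import defaultdict

# Hypothetical patient records: (subgroup label, met the numerator criterion)
records = [("A", True), ("A", True), ("A", False),
           ("B", True), ("B", False), ("B", False), ("B", False)]

counts = defaultdict(lambda: [0, 0])   # subgroup -> [numerator, denominator]
for group, met in records:
    counts[group][1] += 1
    if met:
        counts[group][0] += 1

rates = {g: num / den for g, (num, den) in counts.items()}
print(rates)  # A performs at 2/3, B at 1/4 -- the stratified view shows the gap
```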

Structure Measure

A structure measure, also known as a structural measure, is a measure assessing features of a health care organization or clinician relevant to its capacity to provide health care.

Supplemental Data Elements

Supplemental data elements are those items not captured in other eCQM fields. CMS has four required data elements - payor type, ethnicity, race, and ONC Administrative Sex.

Synthetic Data

Synthetic data are artificially generated data used to replicate the statistical components of real-world data but do not contain any identifiable information. Macaulay, T. (2019). What is synthetic data and how can it help protect privacy? Retrieved November 1, 2023, from 

Systematic Literature Review

A systematic literature review is a review of a clearly formulated question using systematic and explicit methods to identify, select, and critically appraise relevant research. A systematic literature review also collects and analyzes data from studies included in the review. Two sources of systematic literature reviews are the AHRQ Evidence-Based Clinical Information Reports and The Cochrane Library.


T-test

A t-test is a type of statistical analysis used to compare the averages of two groups and determine whether the differences between them are likely to have arisen from random chance. The two groups may be independent; that is, a control group and experimental group, or they can be dependent, wherein a single group yields pretreatment and posttreatment scores.

Target/Initial Population

The target/initial population refers to all events for evaluation by a specific performance measure involving patients who share a common set of specified characteristics within a specific measurement set to which a given measure belongs. Measured entities should draw all patients/episodes (e.g., as numerator, as denominator) from the target/initial population.

Test-retest Reliability Testing

Test-retest reliability testing assesses the extent to which a survey or measurement instrument elicits the same response from the same respondent across short intervals of time.

Text Blob

A BLOB is a binary large object that can hold a variable amount of data. The four BLOB types are TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. These differ only in the maximum length of the values they can hold. The four TEXT types are TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT. MySQL. (2023, June). MySQL 8.0 Reference Manual, 11.3.4 The BLOB and TEXT types. Retrieved November 1, 2023, from

Time Interval

The time interval is the time frame used to determine cases for inclusion in the denominator, numerator, or exclusion. The time interval includes an index event and period of time.


Topped-Out

Topped-out, sometimes referred to as topped off, is when a measure’s performance is so high and unvarying that measured entities can no longer make meaningful distinctions and improvements in performance.


Underfitting

Underfitting is when a data model is unable to capture the relationship between the input and output variables accurately, generating a high error rate on both the training set and unseen data. Underfitting occurs when a model is too simple, which can be a result of a model needing more training time, more input features, or less regularization. IBM. (2021). What is underfitting? Retrieved November 1, 2023, from

Unpaired t-test

An unpaired t-test is a statistical test comparing the averages/means of two independent or unrelated groups to determine if there is a significant difference between the two. Gleichmann, N. (2023, October 14). Paired vs unpaired T-test: Differences, assumptions and hypotheses. Technology Networks Informatics. Retrieved November 16, 2023, from

Usability and Use

Usability and Use is the extent to which interested parties (e.g., individuals, purchasers, measured entities, and policymakers) are using or could use performance results for accountability and/or performance improvement to achieve the goal of high-quality, efficient health care for individuals or populations.


Validation

Validation is testing to determine whether the measure accurately represents the evaluated concept and achieves the purpose for which the measure developer intended (i.e., to measure quality). Measure developers use validation in reference to statistical risk models where they compare model performance metrics between two different samples of data called the development and validation samples.

Validity (part of Scientific Acceptability)

Validity includes measure validity (when the measure accurately represents the evaluated concept and achieves the intended purpose, meaning to measure quality) and data element validity, which is the extent to which the information represented by the data element or code used in the measure reflects the actual concept or event intended.

Validity Testing

Validity testing is empirical analysis of the measure as specified demonstrating data are correct and/or conclusions about quality of care based on the computed measure score are correct. Validity testing focuses on systematic errors and bias.

Validity Threats

Validity threats are measure specifications or data that can affect the validity of conclusions about quality. Potential threats include patients excluded from measurement, differences in patient mix for outcome and resource use measures, measure scores generated with multiple data sources/methods, and systematic missing or “incorrect” data (unintentional or intentional).

Value Set

A value set is a subset of concepts drawn from one or more code systems, where the concepts included in the subset share a common scope of use (e.g., Anticoagulant Therapy).

Wilcoxon test

The Wilcoxon test is a nonparametric test comparing two paired groups.
GraphPad Software, L. (n.d.). Graphpad prism 10 statistics guide - Wilcoxon matched pairs test. Graphpad. Retrieved November 13, 2023, from