An access measure is a measure that focuses on a patient’s or enrollee’s attainment of timely and appropriate health care.
An adopted measure is a measure that has the same numerator, denominator, data source, and care setting as its parent measure. The only additional information the measure developer needs to provide is particular to the measure’s implementation use (such as data submission instructions).
Alignment, with respect to measures, is defined by the CMS consensus-based entity in its Changes to NQF’s Harmonization and Competing Measures Process as “encouraging the use of similar, standardized performance measures across and within public and private sector efforts.” (p. 6) Achievement of alignment is when a set of measures works well across care settings or programs to produce meaningful information without creating extra work for those responsible for the measurement. Alignment includes using the same quality measures in multiple programs when possible. It can also come from consistently measuring important topics across care settings.
Appropriate Use Criteria
The appropriate use criteria are standards that are evidence-based (to the extent feasible) and assist professionals who order and furnish applicable services to make the most appropriate treatment decisions for a specific clinical condition (modified from CMS Appropriate Use Criteria Program).
Attribution is the process of linking the treatments, processes, or outcomes of care to one or more measured entities (National Quality Forum, 2021).
An audit is a systematic inspection of records or accounts to verify their accuracy.
Bootstrap analysis (bootstrapping), as used in risk adjustment models, generally refers to estimating properties of a model estimate or the stability of an estimate by sampling from an approximating distribution. The measure developer may accomplish this by constructing many resamples from the observed dataset (e.g., the development sample), each typically of equal size to the observed dataset and drawn by random sampling with replacement. This technique allows estimation of the sampling distribution of a statistic. Measure developers can also use it to construct hypothesis tests. In the case of a regression or logistic regression risk adjustment model, the measure developer can use it to provide additional guidance regarding the inclusion of risk factors in the model.
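The resampling idea can be illustrated with a minimal sketch. It assumes resamples of the same size as the observed data, drawn with replacement, and uses the mean as the statistic of interest. The data values and the function names (`bootstrap_means`, `percentile_interval`) are hypothetical and only for illustration.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample.

    Each resample has the same size as `sample` and is drawn
    with replacement (assumed convention for this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, lower=2.5, upper=97.5):
    """Return a simple percentile interval from a list of resampled statistics."""
    ordered = sorted(values)

    def pick(percent):
        index = round(percent / 100 * (len(ordered) - 1))
        return ordered[index]

    return pick(lower), pick(upper)


# Hypothetical observed rates for ten measured entities.
observed = [0.12, 0.08, 0.15, 0.10, 0.09, 0.20, 0.11, 0.07, 0.13, 0.16]
means = bootstrap_means(observed)
low, high = percentile_interval(means)
print(f"sample mean = {statistics.fmean(observed):.3f}, interval = ({low:.3f}, {high:.3f})")
```

The spread of `means` approximates the sampling distribution of the mean. A narrow interval suggests a stable estimate.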
A business case is a justification for a proposed project or undertaking on the basis of its expected commercial benefit. It exists if the entity realizes a financial return on its investment in a reasonable time frame. The entity may realize the return as profit, reduction in losses, or avoided costs. A business case may also exist if the investor believes that a positive indirect effect on organizational function and sustainability will accrue within a reasonable time frame (Leatherman et al., 2003). The business case for a process measure relies on the financial return on the investment necessary to implement the intervention advocated by the measure. The business case for other types of measures relies on the financial return resulting from improving the quality of care indicated by the measure.
Measure developers use the c-statistic to assess risk-adjusted models. It indicates the ability of the model to discriminate between one event and the other. If a model discriminates randomly, c = 0.5. If the risk factor modeling predicts the outcome well, then discrimination increases. The higher the c-statistic, the better the predictive power of the model.
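As a minimal sketch, the c-statistic for a binary outcome can be computed as the share of (event, non-event) pairs in which the case with the event received the higher predicted risk, with ties counted as one half. The function name and the data below are hypothetical.

```python
def c_statistic(predicted, outcomes):
    """Concordance statistic for binary outcomes (1 = event, 0 = no event)."""
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for event_risk in events:
        for non_event_risk in non_events:
            if event_risk > non_event_risk:
                concordant += 1.0   # model ranked the pair correctly
            elif event_risk == non_event_risk:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes.
risk = [0.9, 0.8, 0.3, 0.6, 0.7, 0.1]
died = [1, 1, 1, 0, 0, 0]
print(c_statistic(risk, died))  # 7 of 9 pairs are concordant, about 0.778
```

A model that assigns every case the same risk scores 0.5 under this definition, which matches the random-discrimination baseline described above.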
A calculation algorithm is an ordered sequence of data element retrieval and aggregation through which numerator and denominator events or continuous variable values are identified by a measure. Also referred to as the performance calculation.
Clinical Practice Guidelines
Clinical practice guidelines are systematically developed statements to support practitioner and patient decisions about appropriate health care for specific clinical circumstances.
Clinical Quality Language (CQL)
CQL is a Health Level Seven International® mixed normative/Standard for Trial Use. It is part of the effort to harmonize standards between electronic clinical quality measures and clinical decision support. CQL provides the ability to express logic that is human-readable yet structured enough for processing a query electronically.
Clinical Quality Measure (CQM)
A clinical quality measure is a mechanism used for assessing the degree to which a measured entity competently and safely delivers clinical services appropriate for the patient in an optimal time frame. CQMs are a subset of the broader category of performance measures.
CMS Consensus-Based Entity
The Medicare Improvements for Patients and Providers Act of 2008 requires the U.S. Department of Health and Human Services to contract with a consensus-based entity (CBE) regarding performance measurement. The CMS CBE endorses quality measures through a transparent, consensus-based process incorporating feedback from diverse groups of stakeholders to foster health care quality improvement.
A code language, also known as programming language, is a set of commands, instructions, and other syntax used to create a software program. A high-level language is what a programmer uses to write code. The programmer compiles the code into a low-level language, which computer hardware recognizes directly (Christensson, 2011).
A code system is a managed collection of concepts with each concept represented by at least one internally unique code and a human-readable description (e.g., SNOMED CT).
Collinearity is when two or more variables are exactly correlated, which means the regression coefficients are not uniquely determined. Collinearity hurts the interpretability of the model because the regression coefficients are not unique and have influences from other features (Saslow, n.d.).
Competing measures address the same topic and the same population. Use this term when considering harmonization. See also Related Measures.
A composite measure is a measure containing two or more individual measures, resulting in a single measure with a single score.
A conceptual framework is a theoretical structure of assumptions, principles, and rules that holds together the ideas comprising a broad concept.
Conflict of Interest
A conflict of interest exists when an individual (or entity) has more than one motivation for trying to achieve an objective. In measure development, this situation arises when an individual has opportunities to affect specifications for quality measures that impact an interest with which the individual has a relationship.
Construct validity is the extent to which the measure actually measures what it claims to measure. Construct validity evidence often involves empirical and theoretical support for the interpretation of the construct.
Continuous Variable (CV)
A continuous variable is a measure score in which each individual value for the measure can fall anywhere along a continuous scale and can be aggregated using a variety of methods such as the calculation of a mean or median (e.g., mean number of minutes between presentation of chest pain to the time of administration of thrombolytics).
Convergent Validity (concurrent validity)
Convergent validity refers to the degree to which multiple measures of a single concept are correlated.
Cost of Care
The cost of care is the total health care spending, including total resource use and unit price, by payer or consumer, for a health care service or group of health care services associated with a specified patient population, time period, and unit of clinical accountability.
Cost/Resource Use Measure
A cost/resource use measure is a measure of health services counts (in terms of units or dollars) applied to a population or event (including diagnoses, procedures, or encounters). A resource use measure counts the frequency of use of defined health system resources. Some may further apply a dollar amount (e.g., allowable charges, paid amounts, or standardized prices) to each unit of resource use.
A criterion is an accepted standard, principle, or rule used to make a decision or to inform an evaluator’s judgment.
Criterion validity measures how well one measure predicts the outcome for another measure or verifies data elements against some reference criterion determined to be valid (i.e., the gold standard).
Critical Data Element
A critical data element is an element that contributes most to the computed measure score, that is, one that accounts for identifying the greatest proportion of the target condition, event, or outcome being measured (numerator); the target population (denominator); the population excluded (exclusion); and, when applicable, the risk factors with the largest contribution to variability in outcome.
Data aggregation is the combining of data from multiple sources to generate performance information.
A data element is a unit of data for which the definition, identification, representation, and permissible values are specified by means of a set of attributes (OECD Glossary of Statistical Terms).
Data Element Validity (part of the Scientific Acceptability of measure properties validity subcriterion)
Data element validity is the extent to which the information represented by the data element or code used in the measure reflects the actual concept or event intended. For example
- The measure developer uses a medication code as a proxy for a diagnosis code.
- Data element response categories include all values necessary to provide an accurate response.
Data sources are the primary source document(s) used for data collection (e.g., billing or administrative data, encounter form, enrollment forms, patient medical record).
De novo Measure
A de novo measure is a new measure that is not based on an existing measure.
The denominator is a statement that describes the population evaluated by the performance measure and is the lower part of a fraction used to calculate a rate, proportion, or ratio. It can be the same as the target/initial population or a subset of the target/initial population to further constrain the population for the purpose of the measure. CV measures may refer to this as measure population.
A denominator exception is any condition that should remove a patient, procedure, or unit of measurement from the denominator of the performance rate only if the numerator criteria are not met. A denominator exception allows for adjustment of the calculated score for those measured entities with higher risk populations. A denominator exception also provides for the exercise of clinical judgment and the measure developer should specifically define where to capture the information in a structured manner that fits the clinical workflow. The measured entity removes denominator exception cases from the denominator. However, the measured entity may still report the number of patients with valid exceptions. Allowable reasons fall into three general categories: medical reasons, patient reasons, or system reasons. Only proportion measures may use denominator exceptions.
Denominator exclusions are cases the measured entity should remove from the measure population and denominator before determining whether numerator criteria are met. Proportion and ratio measures use denominator exclusions to help narrow the denominator. For example, the measured entity would list patients with bilateral lower extremity amputations as a denominator exclusion for a measure requiring foot exams. Continuous variable measures may use denominator exclusions but may use the term measure population exclusion instead of denominator exclusion.
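The order of operations for a proportion measure can be sketched as follows: remove exclusions first, check numerator criteria, and then remove exceptions only for cases that did not meet the numerator. This is an illustrative sketch, not a specification. The `Case` fields and `performance_rate` function are hypothetical, and the denominator is assumed to equal the initial population.

```python
from dataclasses import dataclass


@dataclass
class Case:
    in_initial_population: bool
    excluded: bool = False         # denominator exclusion applies
    meets_numerator: bool = False  # numerator criteria met
    has_exception: bool = False    # medical, patient, or system reason


def performance_rate(cases):
    """Return (rate, exception_count); rate is None when nothing remains to score."""
    # Step 1: exclusions are removed before numerator criteria are evaluated.
    denominator = [c for c in cases if c.in_initial_population and not c.excluded]
    # Step 2: numerator cases are a subset of the denominator.
    numerator = [c for c in denominator if c.meets_numerator]
    # Step 3: exceptions apply only when numerator criteria are not met.
    exceptions = [c for c in denominator if not c.meets_numerator and c.has_exception]
    eligible = len(denominator) - len(exceptions)
    rate = len(numerator) / eligible if eligible else None
    return rate, len(exceptions)


# Hypothetical cases.
cases = [
    Case(True, meets_numerator=True),
    Case(True, meets_numerator=True, has_exception=True),  # exception ignored
    Case(True, excluded=True),
    Case(True, has_exception=True),
    Case(True),
    Case(False, meets_numerator=True),                     # outside the population
]
rate, exception_count = performance_rate(cases)
print(rate, exception_count)
```

The exception count is returned alongside the rate because the measured entity may still report the number of patients with valid exceptions.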
Direct Reference Code (DRC)
A direct reference code is a specific code referenced directly in the eCQM logic to describe a data element or one of its attributes. DRC metadata include the description of the code, the code system from which the code is derived, and the version of that code system.
Discriminant validity is the degree to which a test of a concept (a quality measure) is not highly correlated with other tests designed to measure theoretically different concepts. Demonstrate discriminant validity by assessing variation across multiple comparison groups (such as health care providers) to show that a performance measure can differentiate between disparate groups it should theoretically be able to distinguish.
A dry run is full-scale measure testing involving all measured entities representing the full spectrum of the measured population. The purpose is to finalize all methodologies related to case identification/selection, data collection, and measurement calculation, and to quantify unintended consequences.
An efficiency measure is the cost of care (inputs to the health system in the form of expenditures and other resources) associated with a specified level of health outcome.
Electronic Clinical Quality Measure (eCQM)
eCQMs are measures specified in a standard electronic format that use data electronically extracted from electronic health records (EHR) and/or health information technology (IT) systems to measure the quality of health care provided.
The Source of Truth for this definition moving forward is the eCQI Resource Center.
Electronic Health Record (EHR)
The electronic health record is also known as the electronic patient record, electronic medical record, or computerized patient record. As defined by the International Social Security Association, an EHR is a “longitudinal electronic record of patient health information generated by one or more encounters in any care delivery setting. Included in this information are patient demographics, progress notes, problems, medications, vital signs, past medical history, immunizations, laboratory data, diagnoses and treatment, medications, allergies, immunizations as well as radiology images and laboratory results.”
Empirical evidence is the data or information resulting from studies and analyses of the data elements and/or scores for a measure as specified, whether unpublished or published.
An encounter, as defined by ASTM International, is “(1) an instance of direct provider/practitioner to patient interaction, regardless of the setting, between a patient and a practitioner vested with primary responsibility for diagnosing, evaluating or treating the patient’s condition, or both, or providing social worker services; and (2) a contact between a patient and a practitioner who has primary responsibility for assessing and treating the patient at a given contact, exercising independent judgment.” An encounter serves as a focal point linking clinical, administrative, and financial information. Encounters occur in many settings: ambulatory care, inpatient care, emergency care, home health care, field, and virtual (telemedicine).
An environmental scan is the process of systematically reviewing and interpreting data to identify issues and opportunities that will influence prioritization of current or future plans.
Equity is “the consistent and systematic fair, just, and impartial treatment of all individuals, including individuals who belong to underserved communities that have been denied such treatment, such as Black, Latino, and Indigenous and Native American persons, Asian Americans and Pacific Islanders and other persons of color; members of religious minorities; lesbian, gay, bisexual, transgender, and queer (LGBTQ+) persons; persons with disabilities; persons who live in rural areas; and persons otherwise adversely affected by persistent poverty or inequality.” (Executive Order 13985, 2021)
Expert consensus is the recommendations formulated by one of several formal consensus development methods such as consensus development conference, Delphi method, and nominal group technique.
Face validity is the extent to which a test appears to cover the concept it purports to measure “at face value.” It is a subjective assessment by experts of whether the measure reflects the quality of care (e.g., whether the proportion of patients with blood pressure < 140/90 is a marker of quality).
Fast Healthcare Interoperability Resources (FHIR)
FHIR is an HL7 standard for exchanging health care information electronically. Health information technology (IT) implementers can use FHIR as a stand-alone data exchange standard, but can also use it in partnership with existing widely used standards (HL7, n.d.).
The feasibility criterion is the extent to which the specifications, including measure logic, require data that are readily available or can be captured without undue burden and can be implemented for performance measurement.
Gaming is when providers exploit weaknesses in the measurement system to tweak the data to make their performance look better than it actually is. Gaming includes limiting access to certain populations, neglecting care, or overusing medications or services to ensure that the measure results are favorable.
Grey literature is unpublished or not commercially indexed material that can include any documentary materials issued by government, academia, business, and industry such as technical reports, working papers, and conference proceedings. For example, contributors to the New York Academy of Medicine Grey Literature website include the Agency for Healthcare Research and Quality (AHRQ), Centers for Disease Control and Prevention, the Department of Health and Human Services (HHS), The Joint Commission, National Academy of Sciences, RAND, and RTI International.
Harmonization is the standardization of specifications for related measures with the same measure focus (e.g., influenza immunization of patients in hospitals or nursing homes); related measures for the same target population (e.g., eye exam and Hemoglobin A1c for patients with diabetes); or definitions applicable to many measures (e.g., age designation for children) so that they are uniform or compatible, unless the measure developer can justify differences (i.e., dictated by the evidence). The dimensions of harmonization can include numerator, denominator, exclusion, calculation, and data source and collection instructions. The extent of harmonization depends on the relationship of the measures, the evidence for the specific measure focus, and differences in data sources. Value sets used in measures (especially eCQMs) should be harmonized when the intended meaning is the same. Harmonization of logic in eCQMs is beneficial when the data source in the EHR is the same.
Health Care Disparities
Health care disparities generally refer to differences between groups in access to, use of, quality of care, or health coverage. (CMS. 2021. Paving the way to equity: A progress report 2015-2021. Retrieved November 8, 2021, from https://www.cms.gov/files/document/paving-way-equity-cms-omh-progress-report.pdf)
Health disparities typically refer to higher burdens of illness, mortality, injury, or quality of life experienced by one group relative to another. (CMS. 2021. Paving the way to equity: A progress report 2015-2021. Retrieved November 8, 2021, from https://www.cms.gov/files/document/paving-way-equity-cms-omh-progress-report.pdf)
Health Information Technology (Health IT)
Per Section 3000 of the HITECH Act, the term ‘health information technology’ means “hardware, software, integrated technologies or related licenses, intellectual property, upgrades, or packaged solutions sold as services that are designed for or support the use by healthcare entities or patients for the electronic creation, maintenance, access, or exchange of health information.”
Health Information Technology for Economic and Clinical Health (HITECH) Act
The Health Information Technology for Economic and Clinical Health (HITECH) Act is a provision within the American Recovery and Reinvestment Act that authorizes incentive payments through Medicare and Medicaid to hospitals and clinicians toward meaningful use of EHRs.
Health Level Seven International (HL7)
HL7 is a standards-developing organization that provides framework and standards for the exchange, integration, sharing, and retrieval of electronic health information that supports clinical practice and the management, delivery, and evaluation of health services.
Health Quality Measure Format (HQMF)
HQMF is a standards-based representation of quality measures as electronic documents. Refer to a quality measure expressed in this way as an eCQM.
Hosmer-Lemeshow Test (HL Test)
The HL test is a goodness of fit test for logistic regression, especially for risk prediction models. A goodness-of-fit test tells you how well your data fits the model. Specifically, the HL test calculates if the observed event rates match the expected event rates in population subgroups. The test is only used for binary response variables (i.e., a variable with two outcomes such as alive or dead, yes or no).
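A minimal sketch of the HL statistic follows. It assumes cases are sorted by predicted probability and split into equal-sized groups, and it computes only the chi-square-type statistic. Converting the statistic to a p-value would require a chi-square distribution (conventionally with groups minus 2 degrees of freedom), which is left out here. The function name and data are hypothetical.

```python
def hosmer_lemeshow_statistic(predicted, outcomes, groups=10):
    """Sum over risk groups of (observed - expected)^2 / (expected * (1 - expected / n))."""
    pairs = sorted(zip(predicted, outcomes))  # order cases by predicted risk
    n_total = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n_total // groups:(g + 1) * n_total // groups]
        if not chunk:
            continue
        n = len(chunk)
        observed = sum(y for _, y in chunk)  # observed events in the group
        expected = sum(p for p, _ in chunk)  # expected events in the group
        variance = expected * (1 - expected / n)
        if variance > 0:                     # skip degenerate groups in this sketch
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Hypothetical predicted risks for eight patients in two risk strata.
predicted = [0.25] * 4 + [0.75] * 4
well_calibrated = [0, 0, 0, 1, 1, 1, 1, 0]    # 1 of 4 and 3 of 4 events
poorly_calibrated = [1, 1, 1, 1, 1, 1, 1, 0]  # 4 of 4 events in the low-risk group

print(hosmer_lemeshow_statistic(predicted, well_calibrated, groups=2))
print(hosmer_lemeshow_statistic(predicted, poorly_calibrated, groups=2))
```

A statistic near zero indicates that observed and expected event rates agree within the subgroups. Larger values indicate poorer fit.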
A hybrid measure is a quality measure that uses more than one source of data for measure calculation. Current hybrid measures use claims data and electronic clinical data from electronic health records to calculate measure results.
Impact of a Measure (Importance Subcriterion)
The impact of a measure, now called High Priority by the CMS consensus-based entity (CBE), is the extent to which the measure topic addresses a specific national health goal or priority; affects large numbers of patients; is a leading cause of morbidity/mortality; or involves high resource use and severe patient/societal consequences of poor quality. For patient-reported outcomes (PROs), there is evidence that the target population values the PRO and finds it meaningful.
The importance criterion is the extent to which the specific measure focus is important to making significant gains in health care quality (e.g., safety, timeliness, effectiveness, efficiency, equity, patient centeredness) and improving health outcomes for a specific high-impact aspect of health care where there is variation in or overall poor performance.
Inter-Rater (Inter-abstractor) Reliability Testing
Inter-rater reliability testing assesses the extent to which observations from two or more human observers are congruent with each other.
An intermediate outcome is a measure that assesses the change produced by a health care intervention that leads to a long-term outcome.
Internal Consistency Reliability Testing
Internal consistency reliability testing is testing a multiple-item test or survey to assess the extent to which the items designed to measure a given construct are inter-correlated. It pertains to survey-type measures and to the data elements used in measures constructed from patient assessment instruments.
Intra-class correlation refers to correlations within a class of data (for example correlations within repeated measurements of weight), rather than to correlations between two different classes of data (for example the correlation between weight and length). (Liljequist, Elfving, & Skavberg Roaldsen, 2019)
Inverse measures are measures where a lower performance rate is better. For example, the National Healthcare Safety Network calculates most healthcare-associated infections (HAIs) as a standardized infection ratio (SIR). The SIR compares the actual number of HAIs (i.e., the numerator) with the predicted number based on the baseline U.S. experience (e.g., standard population), adjusting for several risk factors that have been found to be most associated with differences in infection rates. The goal is to have the numerator equal to or very close to zero, thereby, having a SIR equal to or very close to zero.
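The ratio itself is simple arithmetic, sketched below with hypothetical counts. The predicted count would come from a risk-adjusted baseline model, which this sketch takes as given.

```python
def standardized_infection_ratio(observed_infections, predicted_infections):
    """Observed HAIs divided by the number predicted from the baseline."""
    if predicted_infections <= 0:
        raise ValueError("predicted number of infections must be positive")
    return observed_infections / predicted_infections


# Hypothetical facility: 3 observed infections against 4.0 predicted.
sir = standardized_infection_ratio(3, 4.0)
print(sir)  # 0.75, i.e., fewer infections than predicted
```

Because this is an inverse measure, a value below 1 indicates fewer infections than predicted, and a value of 0 means no infections were observed.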
Jira is an Atlassian software application that tracks issues and bugs. It also allows users to quickly search issues that have been or are currently being resolved. HHS groups are using the ONC Project Tracking System [Jira] to track issues with eCQMs and eCQM-related standards and tools.
The Kappa coefficient is a statistical measure of inter-rater agreement for qualitative (categorical) items. Measure developers can think of Cohen’s kappa as a chance-corrected proportional agreement. Possible values range from +1 (perfect agreement), 0 (no agreement above that expected by chance) to -1 (complete disagreement).
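A minimal sketch of Cohen’s kappa for two raters follows. It computes observed agreement, the agreement expected by chance from each rater’s category frequencies, and the chance-corrected ratio. The function name and ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must rate the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical abstraction results from two abstractors on ten records.
abstractor_1 = ["yes"] * 5 + ["no"] * 5
abstractor_2 = ["yes"] * 4 + ["no"] * 5 + ["yes"]
kappa = cohens_kappa(abstractor_1, abstractor_2)
print(round(kappa, 3))  # observed 0.8, expected 0.5, kappa 0.6
```

Here the raters agree on 8 of 10 records, but half of that agreement would be expected by chance, so kappa is lower than the raw agreement.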
Lean is a system of organization principles of process improvement to maximize value and eliminate waste.
Level of Analysis
The level of analysis is a performance measurement level (e.g., clinician, health plan, county populations).
Logic is the criteria used to define a quality measure and its key components.
A material change is one that changes the specifications of an endorsed measure to affect the original measure’s concept or logic, the intended meaning of the measure, or the strength of the measure relative to the measure evaluation criteria.
Measure maintenance is the periodic and consistent reviewing, evaluating, and updating of performance measures to ensure continued reliability, validity, feasibility, importance, usability, and currency with science. It also involves comparison to similar measures for potential harmonization.
The measure score is the numeric result computed by applying the measure specifications and scoring algorithm. The computed measure score represents an aggregation of all appropriate patient-level data (e.g., proportion of patients who died, average lab value attained) for the measured entity (e.g., hospital, health plan, home health agency, clinician). The measure specifications designate the measured entity and to whom the measure applies.
A measure set is a group of measures related in some way such as measures addressing a specific condition, procedure, or specialty.
A measure steward is an individual or organization that owns a measure and is responsible for maintaining the measure. Measure stewards are often the same as measure developers, but not always. Measure stewards are also the ongoing point of contact for people interested in a given measure.
Measure testing is the empirical analysis to demonstrate the reliability and validity of the measure as specified, including analysis of issues that pose threats to the validity of conclusions about quality of care such as exclusions, risk adjustment/stratification for outcome and resource use measures, methods to identify differences in performance, and comparability of data sources/methods.
Measure Validity (part of the Scientific Acceptability of measure properties validity subcriterion)
Measure validity is when the measure accurately represents the evaluated concept and achieves the intended purpose (i.e., to measure quality). For example, the measure
- clearly identifies the evaluated concept (i.e., face validity)
- includes all necessary data elements, codes, and tables to detect a positive occurrence when one exists (i.e., construct validity)
- includes all necessary data sources to detect a positive occurrence when one exists (i.e., construct validity)
Measured entities are the front-line clinicians and their organizations, including health information technology, collecting quality measurement data. Measured entities are the implementers of quality measures. The effect of quality measure data collection on clinician workflow can be negative. There may be effects on their payments, positive and negative, with respect to reporting and actual performance on quality measures. Because of these potential effects, measured entities should be involved in all aspects of the Measure Lifecycle.
Measures Under Consideration (MUC)
The Measures Under Consideration is a list of quality and efficiency measures HHS is considering adopting, through the federal rulemaking process, for use in the Medicare program. HHS makes the list publicly available by December 1 each year for categories of measures that are described in section 1890(b)(7)(B)(i)(I) of the Social Security Act as amended by Section 3014 of the Patient Protection and Affordable Care Act (ACA).
Medical Record (Data Source)
The medical record is data obtained from the records or documentation maintained on a patient in any health care setting (e.g., hospital, home care, long term care, practitioner office). It includes automated and paper medical record systems.
Metadata are data that describe data.
A minor change does not change the process of data collection, aggregation, or calculation, nor does it change the intended meaning of the measure or the strength of the measure in terms of the measure evaluation criteria. For example, the code system updates to eCQMs with the Annual Update are minor changes.
Morbidity is the rate of incidence of disease. For example, if a lumbar puncture is improperly performed, significant morbidity may follow. It also can refer to the relative incidence of a particular disease state or symptom.
Mortality is the number of deaths in a given time or place, or the proportion of deaths to population. “Death rate” is also called “mortality rate.”
Multiple Chronic Conditions (MCC)
The CMS CBE defines multiple chronic conditions in the Multiple Chronic Conditions Measurement Framework as “having two or more concurrent chronic conditions that collectively have an adverse effect on health status, function, or quality of life and that require complex health care management, decision-making, or coordination.” (pp. 7-8)
Non-parametric methods are a type of statistical test not involving the estimation of parameters of a statistical function. (Nonparametric, n.d.)
Null Performance Rate
The null performance rate is when all of the denominator eligible instances are attributed to all denominator exceptions. Therefore, the performance rate for satisfactory reporting would be 0/0 (null).
The numerator is the upper portion of a fraction used to calculate a rate, proportion, or ratio. Also called the measure focus, it is the target process, condition, event, or outcome. Numerator criteria are the processes or outcomes expected for each patient, procedure, or other unit of measurement defined in the denominator. A numerator statement describes the action that satisfies the conditions of the performance measure.
Numerator exclusions define instances measured entities should not include in the numerator data. Use numerator exclusions only in ratio and proportion measures.
Opportunity for Improvement
Opportunity for improvement is when data demonstrate considerable variation or overall, less-than-optimal performance, in the quality of care across measured entities, and/or there are disparities in care across population groups.
An outcome measure is a measure that focuses on the health status of a patient (or change in health status) resulting from health care – desirable or adverse.
Overfitting a model is when a statistical model begins to describe the random error in the data rather than the relationships between variables. This occurs when the model is too complex. In regression analysis, overfitting can produce misleading R² values, regression coefficients, and p-values (Frost, n.d.).
Paperwork Reduction Act (PRA)
The PRA mandates that all federal government agencies must obtain approval from the Office of Management and Budget before collection of information that will impose a burden on the public. Measure developers should be familiar with the PRA before implementing any process that involves the collection of new data.
Parameter estimates (also called coefficients) are the change in the response associated with a one-unit change of the predictor, while all other predictors are held constant. Types of parameter estimates include
- Point estimates, which are the single, most likely value of a parameter. For example, the point estimate of the population mean (the parameter) is the sample mean (the parameter estimate).
- Confidence intervals, which are a range of values likely to contain the population parameter.
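The two kinds of estimate can be illustrated with a minimal sketch for a sample mean. It assumes a normal approximation with z = 1.96 for an approximately 95% interval. Small samples would usually call for a t-based multiplier instead. The function name and data are hypothetical.

```python
import math
import statistics


def mean_with_confidence_interval(sample, z=1.96):
    """Return the point estimate of the mean and a normal-approximation interval."""
    n = len(sample)
    point_estimate = statistics.fmean(sample)                 # single best value
    standard_error = statistics.stdev(sample) / math.sqrt(n)  # uses sample standard deviation
    margin = z * standard_error
    return point_estimate, (point_estimate - margin, point_estimate + margin)


# Hypothetical lab values for five patients.
values = [10, 12, 14, 16, 18]
point, (lower, upper) = mean_with_confidence_interval(values)
print(f"point estimate = {point}, interval = ({lower:.2f}, {upper:.2f})")
```

The point estimate is a single number, while the interval conveys how much that number could plausibly vary from sample to sample.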
Parametric methods make certain assumptions about a data set; namely, that the data are drawn from a population with a normal distribution. Parametric methods generally have high statistical power. (Tyler, 2017)
Patient-Reported Outcome (PRO)
PROs are status reports on a patient’s health condition or health behavior that come directly from the patient, without interpretation of the patient’s response by a clinician or anyone else. This definition reflects the key domains of health-related quality of life (including functional status), symptoms and symptom burden (e.g., pain, fatigue), and health behaviors (e.g., smoking, diet, exercise). (Adapted from the Food and Drug Administration Guidance for Industry PRO Measures: Use in Medical Product Development to Support Labeling Claims)
Patient-Reported Outcome Measure (PROM)
The CMS CBE defines PROMs in PROs in Performance Measurement as an “instrument, scale, or single-item measure used to assess the PRO concept as perceived by the patient, obtained by directly asking the patient to self-report.” (p. 27)
Patient-Reported Outcome-based Performance Measure (PRO-PM)
A patient-reported outcome-based performance measure (PRO-PM) is a performance measure that is based on patient-reported outcome measure (PROM) data aggregated for an accountable health care entity. Measured entities collect the data directly from the patient using the PROM tool, which can be an instrument, scale, or single-item measure.
The population is the total group of people of interest for a quality measure, sometimes called the target/initial population. The measure population is the defined subset of that group that is appropriate to the measure set and not excluded from the individual measure.
Population Health Quality Measure
A population health quality measure is a broadly applicable indicator that reflects the quality of a group’s overall health and well-being. Topics include access to care, clinical outcomes, coordination of care and community services, health behaviors, preventive care and screening, and utilization of health services.
Predictive validity, also known as empirical validity, is the ability of measure scores to predict scores on some other related valid measure. It is the degree to which the operationalization can predict (or correlate with) other measures of the same measured construct at some time in the future.
A process measure is a measure that focuses on steps that should be followed to provide good care. There should be a scientific basis for believing that the process, when executed well, will increase the probability of achieving a desired outcome.
A proportion is a score derived by dividing the number of cases that meet a criterion for quality (i.e., the numerator) by the number of eligible cases within a given time frame (i.e., the denominator) where the numerator cases are a subset of the denominator cases (e.g., percentage of eligible women with a mammogram performed in the last year).
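The arithmetic behind a proportion can be sketched in Python as follows. The function name and the patient identifiers are hypothetical; the check that the numerator is a subset of the denominator mirrors the requirement in the definition above.

```python
def proportion(numerator_cases, denominator_cases):
    """Return numerator / denominator, requiring numerator cases to be a subset."""
    numerator = set(numerator_cases)
    denominator = set(denominator_cases)
    if not denominator:
        raise ValueError("the denominator must contain at least one eligible case")
    if not numerator <= denominator:
        raise ValueError("every numerator case must also be a denominator case")
    return len(numerator) / len(denominator)

# Hypothetical example: four eligible women, two with a mammogram in the last year.
eligible_women = {"p1", "p2", "p3", "p4"}
women_with_mammogram = {"p1", "p3"}
score = proportion(women_with_mammogram, eligible_women)
print(f"{score:.0%}")
```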
The public domain is “the realm embracing property rights that belong to the community at large, are unprotected by copyright or patent, and are subject to appropriation by anyone” (Merriam-Webster’s Dictionary, n.d.).
Qualified Clinical Data Registry (QCDR)
A QCDR is an entity with clinical expertise in medicine and in quality measurement development that collects medical or clinical data on behalf of a Merit-Based Incentive Payment System (MIPS) eligible clinician for the purpose of patient and disease tracking to foster improvement in the quality of care provided to patients.
A Qualified Registry is a vendor that collects clinical data from an individual MIPS-eligible clinician, group, or virtual group and submits it to CMS on their behalf.
Quality Data Model (QDM)
The QDM is an information model that defines relationships between patients and clinical concepts in a standardized format to enable electronic quality performance measurement. The model is the current structure for electronically representing quality measure concepts for stakeholders involved in electronic quality measurement development and reporting. The QDM provides the language that defines the criteria for clinical quality measurement. It allows the electronic definition of a clinical concept via its data elements and provides the vocabulary to relate them to each other. By relating attributes between data elements and using filtering functions, the QDM provides a method to construct complex clinical representations for eCQMs.
The Patient Protection and Affordable Care Act defined a quality measure as “a standard for measuring the performance and improvement of population health or of health plans, providers of services, and other clinicians in the delivery of health care services.” (Pub. L. 111-148, 931)
Quality Reporting Document Architecture (QRDA)
QRDA is a standard document format for the exchange of eCQM data. QRDA documents contain data extracted from EHRs and other health IT systems, are used to exchange eCQM data between systems, and serve as the data submission standard for a variety of quality measurement and reporting initiatives. The Office of the National Coordinator for Health IT (ONC) adopted QRDA as the standard to support both QRDA Category I (individual patient) and QRDA Category III (provider’s aggregate) data submission.
The R² statistic describes how well the outcome can be predicted based on the values of the risk factors or predictors. It is frequently used to assess the predictive power of specific types of risk-adjusted models.
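One common way to compute this statistic for a model with a continuous outcome is one minus the ratio of the residual sum of squares to the total sum of squares. The Python below is a sketch under that assumption; the observed and predicted values are hypothetical, and other variants (for example, pseudo-R² statistics for logistic models) are computed differently.

```python
def r_squared(observed, predicted):
    """Return 1 - SS_residual / SS_total for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("observed values have no variation")
    return 1 - ss_residual / ss_total

# Hypothetical observed outcomes and risk-model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 3))
```

A value near 1 means the predictors account for most of the variation in the outcome; a value near 0 means they account for little of it.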
A ratio is a score derived by dividing a count of one type of data by a count of another type of data (e.g., number of patients with central lines who develop infection divided by the number of central line days). The key to the definition of a ratio is that the numerator is not in the denominator.
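A minimal Python sketch of the central line example follows. The counts are hypothetical, and scaling the result to a rate per 1,000 line days is an assumption added for readability rather than part of the definition.

```python
def ratio_score(numerator_count, denominator_count, scale=1):
    """Return scale * numerator / denominator, where the two counts differ in type."""
    if denominator_count <= 0:
        raise ValueError("denominator_count must be positive")
    return scale * numerator_count / denominator_count

# Hypothetical counts: infections are events, line days are exposure time,
# so the numerator is not contained in the denominator.
infections = 3
central_line_days = 1500
rate_per_1000_days = ratio_score(infections, central_line_days, scale=1000)
print(rate_per_1000_days)
```

Unlike a proportion, the result is not bounded by 1, because the numerator and denominator count different things.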
Receiver-Operating Characteristic (ROC) Curve
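The ROC curve graphs the predictive accuracy of a logistic regression model by plotting the true positive rate (sensitivity) against the false positive rate (1 minus specificity) across classification thresholds. The area under the ROC curve is the c-statistic value.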
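The c-statistic equals the area under the ROC curve, which in turn equals the proportion of (event, non-event) pairs in which the model assigns the higher predicted probability to the event. The Python below is a sketch of that pairwise calculation; the outcomes and predicted probabilities are hypothetical, and counting ties as one half is a common convention assumed here.

```python
def c_statistic(outcomes, predicted_probabilities):
    """Return the share of (event, non-event) pairs ranked correctly by the model."""
    events = [p for y, p in zip(outcomes, predicted_probabilities) if y == 1]
    non_events = [p for y, p in zip(outcomes, predicted_probabilities) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for event_probability in events:
        for non_event_probability in non_events:
            if event_probability > non_event_probability:
                concordant += 1.0   # model ranks the event higher
            elif event_probability == non_event_probability:
                concordant += 0.5   # tie counts as half (assumed convention)
    return concordant / (len(events) * len(non_events))

# Hypothetical outcomes (1 = event) and model-predicted probabilities.
outcomes = [1, 0, 1, 0, 0]
probabilities = [0.9, 0.3, 0.6, 0.6, 0.1]
print(round(c_statistic(outcomes, probabilities), 3))
```

A value of 0.5 indicates discrimination no better than chance, and 1.0 indicates that every event is ranked above every non-event.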
Related measures are measures that address either the same topic or the same population. This term is used when considering harmonization. See also Competing Measures.
Reliability (Scientific Acceptability of measure properties subcriterion)
Reliability reflects that the measure is well defined and precisely specified so that measured entities can implement it consistently within and across organizations and that it distinguishes differences in performance.
Reliability testing evaluates whether the measure data elements are extracted over time, producing the same results a high proportion of the time when assessed in the same population in the same time period and/or that the measure score is precise. Often referred to as inter-rater or inter-observer reliability, reliability also applies to abstractors and coders. It can also refer to the amount of error associated with the computed measure scores (e.g., signal vs. noise).
Resource Use Measures
Resource use measures, also called cost and resource use measures, refer to broadly applicable and comparable measures of health services counts (in terms of units or dollars) applied to a population or event (broadly defined to include diagnoses, procedures, or encounters). A resource use measure counts the frequency of defined health system resources. Some measures may monetize the health service by applying a dollar amount such as allowable charges, paid amounts, or standardized prices to each unit of resource use.
A respecified measure is an existing measure changed to fit the current purpose or use. This may mean changing a measure to meet the needs of a different care setting, data source, or population, or it may mean changing the numerator or denominator or adding specifications to fit the current use.
Risk adjustment is a mathematical model applied to a quality measure that corrects for differing characteristics within a population, such as patient health status. Its purpose is a fairer and more accurate comparison of outcomes of care across health care organizations or clinicians. Measure developers usually apply risk adjustment models to outcome and cost/resource use measures.
A sample is a subset of a population. The subset should be chosen in such a way that it accurately represents the whole population with respect to some characteristic of interest. A sampling frame lists all eligible cases in the population of interest (i.e., denominator) and how they are selected.
Scientific Acceptability of the Measure Properties
Scientific acceptability of the measure properties is the extent to which the measure, as specified, produces consistent (i.e., reliable) and credible (i.e., valid) results about the quality of care when implemented.
Scoring is the method(s) applied to data to generate results/score. Most quality measures produce rates; however, other scoring methods include categorical value, CV, count, frequency distribution, non-weighted score/composite/scale, ratio, and weighted score/composite/scales.
Semantic validation is the method of testing the validity of an eCQM whereby the measure developer compares the formal criteria in an eCQM to a manual computation of the measure from the same test database.
Sensitivity, as a statistical term, refers to the proportion of correctly identified actual positives (e.g., percentage of people with diabetes correctly identified as having diabetes). See also Specificity.
Specifications are measure instructions that address data elements, data sources, point of data collection, timing and frequency of data collection and reporting, specific instruments used (if appropriate), and implementation strategies.
Specificity, as a statistical term, refers to the proportion of correctly identified negatives (e.g., percentage of healthy people correctly identified as not having the condition). Perfect specificity would mean that the measure recognizes all actual negatives (e.g., all healthy people recognized as healthy). See also Sensitivity.
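Sensitivity and specificity can both be computed from the same comparison of actual and identified status. The Python below is a sketch with hypothetical data, where 1 means the person has the condition (or was identified as having it) and 0 means the person does not.

```python
def sensitivity_and_specificity(actual, identified):
    """Return (sensitivity, specificity) from paired 0/1 actual and identified flags."""
    true_positives = sum(1 for a, i in zip(actual, identified) if a == 1 and i == 1)
    false_negatives = sum(1 for a, i in zip(actual, identified) if a == 1 and i == 0)
    true_negatives = sum(1 for a, i in zip(actual, identified) if a == 0 and i == 0)
    false_positives = sum(1 for a, i in zip(actual, identified) if a == 0 and i == 1)
    if true_positives + false_negatives == 0 or true_negatives + false_positives == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    sensitivity = true_positives / (true_positives + false_negatives)
    specificity = true_negatives / (true_negatives + false_positives)
    return sensitivity, specificity

# Hypothetical data: four people with the condition, four without.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
identified = [1, 0, 1, 0, 0, 1, 0, 1]
sensitivity, specificity = sensitivity_and_specificity(actual, identified)
print(sensitivity, specificity)
```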
Stratification divides a population or resource services into distinct, independent groups of similar data, enabling analysis of the specific subgroups. This type of adjustment can show where disparities exist or where there is a need to expose differences in results.
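As an illustration, the Python sketch below computes a measure rate separately for each stratum and compares it with the overall rate. The age groups and case records are hypothetical.

```python
from collections import defaultdict

def stratified_rates(records):
    """Return {stratum: rate}, where rate = numerator cases / eligible cases."""
    met = defaultdict(int)
    eligible = defaultdict(int)
    for stratum, met_numerator in records:
        eligible[stratum] += 1
        if met_numerator:
            met[stratum] += 1
    return {stratum: met[stratum] / eligible[stratum] for stratum in eligible}

# Hypothetical records: (age group, whether the case met the numerator).
records = [
    ("18-64", True), ("18-64", True), ("18-64", False), ("18-64", True),
    ("65+", True), ("65+", False), ("65+", False), ("65+", False),
]
rates = stratified_rates(records)
overall_rate = sum(1 for _, met_numerator in records if met_numerator) / len(records)
print(rates, overall_rate)
```

In this made-up data the overall rate hides a large gap between the two age groups, which is the kind of difference stratification is meant to expose.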
A structure measure, also known as a structural measure, is a measure that assesses features of a health care organization or clinician relevant to its capacity to provide health care.
Synthetic data are artificially generated data used to replicate the statistical components of real-world data but do not contain any identifiable information (Macaulay, 2019).
Systematic Literature Review
A systematic literature review is a review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant research. A systematic literature review also collects and analyzes data from studies included in the review. Two sources of systematic literature reviews are the AHRQ Evidence-Based Clinical Information Reports and The Cochrane Library.
The target/initial population refers to all events for evaluation by a specific performance measure involving patients who share a common set of specified characteristics within a specific measurement set to which a given measure belongs. Measured entities should draw all patients (e.g., as numerator, as denominator) from the target/initial population.
Test-retest Reliability Testing
Test-retest reliability testing assesses the extent to which a survey or measurement instrument elicits the same response from the same respondent across short intervals of time.
The time interval is the time frame used to determine cases for inclusion in the denominator, numerator, or exclusion. The time interval includes an index event and period of time.
Topped-out, sometimes referred to as topped off, is when a measure’s performance is so high and unvarying that measured entities can no longer make meaningful distinctions and improvements in performance.
Underfitting is when a data model is unable to capture the relationship between the input and output variables accurately, generating a high error rate on both the training set and unseen data. Underfitting occurs when a model is too simple, which can be a result of a model needing more training time, more input features, or less regularization (IBM Cloud Education, 2021).
Usability and Use
Usability and Use, as defined by the CMS CBE in Measure Evaluation Criteria and Guidance for Evaluating Measures for Endorsement, is the “extent to which potential audiences (e.g., consumers, purchasers, providers, and policymakers) are using or could use performance results for both accountability and performance improvement to achieve the goal of high-quality, efficient healthcare for individuals or populations” (p. 28).
Validation is testing to determine whether the measure accurately represents the evaluated concept and achieves the purpose for which the measure developer intended (i.e., to measure quality). Measure developers use validation in reference to statistical risk models where they compare model performance metrics between two different samples of data called the development and validation samples.
Validity (Scientific Acceptability of measure properties subcriterion)
Validity includes measure validity (when the measure accurately represents the evaluated concept and achieves the intended purpose, meaning to measure quality) and data element validity, which is the extent to which the information represented by the data element or code used in the measure reflects the actual concept or event intended.
Validity testing is empirical analysis of the measure as specified demonstrating data are correct and/or conclusions about quality of care based on the computed measure score are correct. Validity testing focuses on systematic errors and bias.
Validity threats are measure specifications or data that can affect the validity of conclusions about quality. Potential threats include patients excluded from measurement, differences in patient mix for outcome and resource use measures, measure scores generated with multiple data sources/methods, and systematic missing or “incorrect” data (unintentional or intentional).
A value set is a subset of concepts drawn from one or more code systems, where the concepts included in the subset share a common scope of use (e.g., Anticoagulant Therapy).