(C) 2006, 2010 Brian Altonen

Utilizing Script Data

Relating HEDIS to National Pharmacy Data

To better understand how HEDIS is applied to the development of a method for analyzing pharmaceutical data, one first has to understand how NCQA breaks down the pharmacological data and then develops an algorithm for identifying individuals with a specific medical history. This history is not based on pharmacy data alone, just as it would be a mistake to identify someone for a HEDIS-like study simply on the absence or presence of a diagnostic ICD code somewhere in the medical records or claims. Generally speaking, for an NCQA-certified list to be produced, an ICD code usually has to be found to verify what is inferred from the script data. In HEDIS- and QIA-related diabetes studies, for example, when an individual is manually added to the study population, this means that the diagnostic ICD code was found, even if one has to go back ten or twenty years in some cases.

With the ICD and diagnostics issue behind us, we next turn to the datasets used to define the health of an individual based on script history alone. Sometimes the HEDIS-NCQA lists define the qualifications for including an individual in the study. Other times the script listing provides a starting point for filtering the population further in order to uncover cases through other means, such as lab reports related to the condition under review. Finally, the listing may serve as an exclusion list, in which case it is used to identify the pharmaceutical histories that exclude a person from the official study. The following table details the lists provided by NCQA for the 2010 HEDIS. The second, third and fifth columns come from the NCQA website. The fourth column, defined by an analyst, provides an abbreviated title for referring to these datasets in this particular study. These are the 29 main HEDIS measures identified as having some relevance to reviewing prescription data with HEDIS-like goals in mind; this reasoning is based on the flow chart at the top of this page.
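The inclusion and exclusion roles a list can play reduce to simple set operations; the members returned can also serve as the starting candidate list for further filtering by other means such as lab reports. This is a minimal sketch, assuming each member's script history has already been reduced to a set of NDC codes; the member IDs, codes and the `select_members` helper are hypothetical.

```python
def select_members(scripts, inclusion_ndcs, exclusion_ndcs=frozenset()):
    """Return members with at least one inclusion NDC and no exclusion NDC."""
    selected = set()
    for member, ndcs in scripts.items():
        codes = set(ndcs)
        # Inclusion: any overlap with the inclusion list.
        # Exclusion: any overlap with the exclusion list removes the member.
        if codes & inclusion_ndcs and not codes & exclusion_ndcs:
            selected.add(member)
    return selected

# Hypothetical script histories; IDs and codes are placeholders, not NCQA entries.
scripts = {
    "M001": ["11111111111", "22222222222"],
    "M002": ["11111111111", "99999999999"],
    "M003": ["33333333333"],
}
inclusion = {"11111111111"}
exclusion = {"99999999999"}

print(sorted(select_members(scripts, inclusion, exclusion)))  # ['M001']
```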

We are just dealing with 29 studies for reviewing the prescription dataset with the goal of developing a standards report sheet.  Twenty-nine studies couldn’t be too hard, right?

The following is a layout of the data for just one small group of medications reviewed for HEDIS.

The next table counts the different drugs and drug groups that relate to each of these 29 studies. The number of NDC codes related to each class is noted in the N_Rows column. These codes don't define the total number of drugs so much as the number of different drug products that qualify for a particular category related to the HEDISRxClass ID. The 5th and 6th columns (N_ShortGen and N_ShortBrand) depict a method of considerably reducing the dataset length for information processing purposes. ShortGen refers to the generic name assigned to each product, abbreviated considerably from the name assigned to the product in the NDC database.
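Counts of the N_Rows, N_ShortGen and N_ShortBrand kind can be produced from a reference list in one pass. This is a minimal sketch, assuming each reference row carries an NDC code, its HEDISRxClass ID, a short generic name and a short brand name; the rows and brand placeholders below are hypothetical, and N_Rows is taken here as the count of distinct NDC codes.

```python
from collections import defaultdict

# Hypothetical reference rows: (ndc, hedis_rx_class, short_gen, short_brand).
reference = [
    ("00000000001", 1, "DIGOXIN", "BRAND_A"),
    ("00000000002", 1, "DIGOXIN", "BRAND_B"),
    ("00000000003", 1, "DIGOXIN", "BRAND_A"),
    ("00000000004", 2, "FUROSEMIDE", "BRAND_C"),
    ("00000000005", 2, "HCTZ", "BRAND_D"),
]

def class_counts(rows):
    """Count distinct NDC codes and distinct short generic/brand names per class."""
    ndcs = defaultdict(set)
    gens = defaultdict(set)
    brands = defaultdict(set)
    for ndc, rx_class, short_gen, short_brand in rows:
        ndcs[rx_class].add(ndc)
        gens[rx_class].add(short_gen)
        brands[rx_class].add(short_brand)
    return {
        c: {
            "N_Rows": len(ndcs[c]),
            "N_ShortGen": len(gens[c]),
            "N_ShortBrand": len(brands[c]),
        }
        for c in sorted(ndcs)
    }

for rx_class, counts in class_counts(reference).items():
    print(rx_class, counts)
```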

The advantage of using these shortened listings is that they reduce the time and work needed for the final evaluation, since they can greatly reduce the number of evaluations a computer has to perform per review. For example, for the MPM evaluation of Digoxin users (HEDISRxClass 1), 100 individuals would require 100 x 212 separate queries (21,200) per column of data; there are 212 NDC codes and related names that the system has to review, per individual, per single source of this information. Were the same task performed using one of the abbreviated coding versions (assuming this data is available in, or can be linked to, your initial dataset), the number of searches drops to just 100 for N_ShortGen and N_NCQAcategories, and 400 for N_ShortBrand. Applying the same reasoning to the last row of information (No. 29, DAE), checking the same 100 individuals for DAE eligibility would require more than 3 million queries. Replacing this dataset with the shorter generic dataset reduces the number of queries by approximately one-fourth. If the NCQA category could be used to assess this particular group, the initial size would be cut to one-twentieth of its original length, for just one column.
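The query counts above follow from a simple product: members times codes checked, per column of data. This is a minimal sketch, assuming one comparison per member per code (a naive lookup); the `query_count` helper is hypothetical, and the counts of 1 short generic name and 4 short brand names for Digoxin are inferred from the 100 and 400 figures in the text.

```python
def query_count(n_members, n_codes):
    """Naive comparisons needed to check every member against every code, for one column."""
    return n_members * n_codes

members = 100
print(query_count(members, 212))  # 21200 (one comparison per NDC code)
print(query_count(members, 1))    # 100   (one short generic name)
print(query_count(members, 4))    # 400   (four short brand names)
```

Indexed joins or hash-set lookups in modern query tools can make each member's check much cheaper than this naive count suggests, but the ratio between the full NDC list and a short-name list still shows how much smaller the lookup table becomes.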

So, one of the most important steps in developing a search algorithm for a particular study is knowing the ways in which NDCs overlap between study types, and the ways in which reclassification techniques could unexpectedly pull in rows of data that have nothing to do with the study. Should one of the much broader NCQA classes be relied upon independently, we risk including individuals using another generic version or brand of that drug that does not apply to this particular study. For this reason, short classification schemes are encouraged, but at the cost of time spent assessing the datasets to rule out the inclusion of ineligible data. It is also why the personal information, ICD and claims datasets are used to support one's findings: relying on script data alone could result in the inclusion of an ineligible case.
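Both checks described above, NDC overlap between study types and rows pulled in by a broader reclassification, can be run as set comparisons before any member queries. This is a minimal sketch; the class abbreviations DIG and DIU, all codes, and the helper names are hypothetical (only DAE comes from the text).

```python
from itertools import combinations

# Hypothetical class lists: class abbreviation -> set of NDC codes (placeholders).
class_lists = {
    "DIG": {"A1", "A2"},
    "DIU": {"B1", "B2", "C1"},
    "DAE": {"C1", "D1"},
}

def overlapping_codes(lists):
    """Return {(class_a, class_b): shared codes} for every pair of classes sharing codes."""
    overlaps = {}
    for a, b in combinations(sorted(lists), 2):
        shared = lists[a] & lists[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps

def pulled_in_by_reclass(category_codes, study_codes):
    """Codes a broader category would pull in that are not on the study's own list."""
    return category_codes - study_codes

print(overlapping_codes(class_lists))  # {('DAE', 'DIU'): {'C1'}}
print(sorted(pulled_in_by_reclass({"B1", "B2", "B3"}, class_lists["DIU"])))  # ['B3']
```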

The following is an example of a simple to follow flowchart on this process for HEDISRxClass 1 – Digoxin:

Compare this with the flowchart for the third MPM dataset series – antiseizure medications:

Neither of these sets really demonstrates the complexity of the problems that can surface due to the immense amounts of data analyzed and reviewed for a HEDIS-like study. The following is an example of the second MPM review, used to define a population of individuals at risk for misuse of their diuretic-related scripts (# NDC codes = 2,428):


In the latter case, it may be more effective to focus on using the HEDIS method, so long as this data exists in your dataset or is available for joining or linking to the original data. The abbreviated generic name may also be used, so long as the data format is fairly reliable and there is absolutely no chance of misspellings. In many query and search tools a single error can stop the program, requiring some recoding or corrections before one can continue processing the algorithm. Multiple errors lead to multiple stops, meaning that in the long run the original NDC codes alone may sometimes suffice if the reliability and quality of the data format are in question.
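One way to avoid one stop per misspelling is to validate every abbreviated name against the reference list in a single pass before the main run, so all mismatches can be corrected together. This is a minimal sketch; the `find_unmatched` helper, the names, and the choice to ignore case and surrounding whitespace are assumptions.

```python
def find_unmatched(names, reference_names):
    """Return every name not found in the reference list, in input order.

    Case and surrounding whitespace are ignored (an assumption); anything else,
    such as a misspelling or an unlisted abbreviation, is reported.
    """
    reference = {n.strip().upper() for n in reference_names}
    return [n for n in names if n.strip().upper() not in reference]

reference_names = ["DIGOXIN", "FUROSEMIDE", "HYDROCHLOROTHIAZIDE"]
query_names = ["Digoxin", "FUROSAMIDE", "hydrochlorothiazide ", "HCTZ"]

print(find_unmatched(query_names, reference_names))  # ['FUROSAMIDE', 'HCTZ']
```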

The above example follows much the same reasoning as a diabetes care management study. With the HEDIS review of diabetes, however, and the matching QIAs and PIPs often performed, the complexity of this project continues to increase. The following is the HEDIS-NCQA table defining the diagnostic scripts related to cases eligible for a diabetes managed care study. Although the listing of “prescription” terms seems fairly brief, the related NDC codes reveal a complexity that makes manual reviews impossible, even in extremely small datasets.

The following is the diagram of the process for identifying the medications based on NCQA recommendations. NDC codes are not provided here due to the length of such a list. The HEDIS, generic and brand short-name listings are provided, as produced using a method that reclassifies the medications to facilitate this focus on drug groups. These are the medications used by NCQA-accredited programmers to identify an individual with diabetes in accordance with HEDIS standards.

Putting this into perspective for the moment, excluding the ICD- and claims-related assessments, this process in itself contains numerous steps that have to be considered before any attempt is made to run the related program. Listing each name or code in detail decreases the loss of information in cases where the original name or code relates to only a few patients. However, a mistake in one entry or reading could compromise the entire review and take a lot of time to correct. (NDC codes, for example, can become faulty if converted to numbers unexpectedly.) This impact is reduced when lengthy lists are linked to the original dataset. Shorter lists can be more appealing, so long as no errors are made in spelling and the like, but they also carry the risk of missing larger amounts of data should the original code name itself be out of date, not yet added to the new datasets, or no longer in use.
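The NDC-to-number problem mentioned above is easy to reproduce: numeric conversion silently drops leading zeros. This is a minimal sketch, assuming the source codes were stored as 11-digit NDCs without hyphens; the extract and the `restore_ndc` helper are hypothetical. Re-padding with zeros only restores codes that were originally 11 digits; converting 10-digit NDCs to 11 digits depends on the segment layout and cannot be done with a plain left pad.

```python
import csv
import io

# A hypothetical extract; the NDC values are placeholders with leading zeros.
raw = "ndc,member\n00001234567,M001\n01234567890,M002\n"

# Converting to int silently drops the leading zeros.
as_numbers = [int(row["ndc"]) for row in csv.DictReader(io.StringIO(raw))]
print(as_numbers)  # [1234567, 1234567890]

# csv.DictReader returns strings, so keeping the field as text preserves the code.
as_text = [row["ndc"] for row in csv.DictReader(io.StringIO(raw))]
print(as_text)  # ['00001234567', '01234567890']

def restore_ndc(value, width=11):
    """Re-pad a code that lost its leading zeros, assuming it was originally `width` digits."""
    return str(value).zfill(width)

print([restore_ndc(n) for n in as_numbers])  # ['00001234567', '01234567890']
```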

If we now add the review of hypertension medications to the diabetes study, we produce a study that relies upon standard claims and personal information data to begin defining subgroups, and then adds queries of ACE/ARB-related information to define overall risk and begin grouping diabetes-related health risks. This latter step is to some extent proactive in nature, and may be used to differentiate the meaning and application of the data assessment method in ways that standard HEDIS methods do not directly address. By automating this proactive module in the project, an additional set of outcomes can be developed for use in designing specifically targeted follow-up interventions.
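Once claims and script queries have produced member-level flags, the subgrouping step can be expressed as a small rule function. This is a minimal sketch; the flag names, subgroup labels and rules are hypothetical and only illustrate combining a claims-based hypertension flag with an ACE/ARB script flag inside the diabetes population.

```python
# Hypothetical member-level flags derived from earlier claims and script queries.
members = {
    "M001": {"diabetes_rx": True, "ace_arb_rx": True, "htn_claim": True},
    "M002": {"diabetes_rx": True, "ace_arb_rx": False, "htn_claim": True},
    "M003": {"diabetes_rx": True, "ace_arb_rx": False, "htn_claim": False},
    "M004": {"diabetes_rx": False, "ace_arb_rx": True, "htn_claim": True},
}

def subgroup(flags):
    """Assign a member to a hypothetical follow-up subgroup, or None if outside the study."""
    if not flags["diabetes_rx"]:
        return None  # outside the diabetes study population
    if flags["htn_claim"] and flags["ace_arb_rx"]:
        return "diabetes+htn, on ACE/ARB"
    if flags["htn_claim"]:
        return "diabetes+htn, no ACE/ARB script"  # candidate for targeted follow-up
    return "diabetes only"

groups = {m: subgroup(f) for m, f in members.items()}
for member, group in groups.items():
    print(member, group)
```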

One of the more complex studies done as part of HEDIS, and rarely as an internal QIA study due to its complexity, is the review of medications considered high risk for members of the aging population (esp. 65+) [DAE]. The following is the prescription drug assessment process. Because of the length and complexity of the listing, brand names could not be included in this flowchart.

Due to the length of the resulting listing, another approach may be needed to analyze this important aspect of script history. It helps to break these groups of drugs down a little further in order to determine whether priorities could be assigned to this task. By subdividing this study into groups, we also develop a way to better understand the meaning of the results, by putting them into perspective with the specific group type under review and the possible relationship other demographic or history features may have with the findings.

In the above categorization of the various subclasses applied to these 9,089 NDC-coded products, those with their category names in red are considered, for this review, to be of special interest, followed by those in dark red, black, and finally grey. The variety of products (the number of NDC codes, used to define the size of each wedge) and the type and use of each product were then used to assign relative risks. With these risks established, a review sequence could be defined in which the reviews are carried out one drug class at a time. For example, for elderly patients the following hierarchy may be established for this process:
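A review sequence like this can be generated mechanically from the color tiers. This is a minimal sketch, assuming each subclass carries its color tier and NDC count; the subclass names and counts are placeholders rather than the chart's values, and breaking ties within a tier by descending NDC count is an assumption, not necessarily how the hierarchy was built.

```python
# Priority tiers follow the color scheme described in the text:
# red first, then dark red, black, and grey.
TIER_ORDER = {"red": 0, "dark red": 1, "black": 2, "grey": 3}

# Hypothetical subclasses: (name, color tier, number of NDC codes).
subclasses = [
    ("Subclass W", "black", 310),
    ("Subclass X", "red", 120),
    ("Subclass Y", "grey", 900),
    ("Subclass Z", "red", 450),
]

def review_sequence(classes):
    """Order subclasses by tier, then by descending number of NDC codes within a tier."""
    ordered = sorted(classes, key=lambda c: (TIER_ORDER[c[1]], -c[2]))
    return [name for name, _tier, _n_ndc in ordered]

for step, name in enumerate(review_sequence(subclasses), start=1):
    print(step, name)
```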


The following would hold true for a totally different population, defined by the researcher based on a different set of priorities related to this type of risky prescription drug use analysis.


Due to the size of the original dataset for this HEDIS-based approach to reviewing scripts, and the value that can be obtained by performing this type of analysis in a fairly detailed manner, smaller group sets are used to define the case queries. For example, consider the following three listings of the contents of each of the major risk classes for prescription medicines defined in the above pie chart.




The above methodologies were applied to a national pharmacy script dataset (several hundred million rows) to develop a means of identifying different types of patient groups. This work began with studies of metabolic syndrome cases (diabetes-hypertension-hyperlipidemia), moderate to severe asthmatics, and patients with a history of rheumatoid arthritis. The major reasons for this review were compliance with medications and the possibility of prescription drug misuse or overuse (e.g. COPD-related bronchodilator dependency).

Regarding the measure of asthma medication use, the historical focus has been on steroid-use frequencies. This study topic, now in use for nearly a decade, is in need of additional research topics. One such topic is based on recent prescription drug recommendations according to Guidelines, as assessed by NCQA. The research question: how many asthmatics made preferred use of anti-asthmatics according to the recommended Guidelines and NCQA evaluation methods? This method was evaluated using previously gathered risk population identifier information (described on another page). Frequencies of use for the following two groups were reviewed, as well as the percentage of “correct” use of asthma medications. Exclusion criteria were applied as expected for HEDIS.

For rheumatoid arthritis treatment, the focus was on compliance and cost in relation to monoclonal antibody use. For this review, the following categorization was applied.

In essence, since this research was based solely on Rx information, with predefined features for positive diagnosis and compliance with NCQA definitions for inclusion/exclusion, only the script utilization part of this review process was carried out. For the above studies, claims data was not used to develop the prescription drug study results, only to define the list of those who qualified for such a study. This suggests that claims data may be used to define the population for the first pass of this population health review, followed by more specific studies based only on the members who qualify. In the case of moderate- to high-risk asthma identifiers, claims were the only way these individuals could be identified according to HEDIS standards. For RA and diabetes, ICD codes were used. For MPM analysis, several other non-HEDIS categories were found to be definable for such a study, making it more comprehensive and highly applicable as an internally reported statistic for developing an internal population health monitoring tool.
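The two-pass approach suggested above (claims define the population, then script review is limited to qualifying members) can be sketched as two small filters. This is a minimal sketch; the member IDs, diagnosis and NDC placeholders, and both helper names are hypothetical.

```python
# Hypothetical claim rows (member, diagnosis code) and script rows (member, NDC).
claims = [
    ("M001", "DX_A"),
    ("M002", "DX_B"),
    ("M003", "DX_A"),
]
scripts = [
    ("M001", "NDC_1"),
    ("M002", "NDC_1"),
    ("M003", "NDC_2"),
    ("M004", "NDC_1"),
]

def qualifying_members(claim_rows, qualifying_dx):
    """First pass: members with at least one qualifying diagnosis code on a claim."""
    return {member for member, dx in claim_rows if dx in qualifying_dx}

def scripts_for(script_rows, members):
    """Second pass: keep only script rows that belong to qualifying members."""
    return [(m, ndc) for m, ndc in script_rows if m in members]

population = qualifying_members(claims, {"DX_A"})
print(sorted(population))                # ['M001', 'M003']
print(scripts_for(scripts, population))  # [('M001', 'NDC_1'), ('M003', 'NDC_2')]
```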