(c) 2006, 2010 Brian Altonen

Each individual who studies population health has his or her own algorithm for handling particular parts of a given research project.  At times these algorithms are unique, and some have even considered them ‘valuable intellectual property’, valuable because they are the reason the researcher has the job in the first place.  During my first year of researching large population datasets, in the winter of 1995/6, the new and updated population predictions were about to come out for the year.  My professor's earlier projection, built from previous census data (2000 census), turned out to be the closest to the actual results.  His correctness upset a number of his comrades in this area of research, enough to lead him to seek work outside the local Population Research Center.

Also around this time, I was developing my methodology for using population pyramids to display areal features in GIS.  I applied it to such things as studies of breast cancer screening in urban, suburban and rural settings, and to the impacts of SES on measurable health-related factors, ranging from tobacco-use history to the fair (or unfair) distribution, by age, gender and ethnicity, of specific types of screening practices offered at the local community level.  For the most part these were all theoretical events in Public Health, since administration and poor outreach-related education were the main reasons these SES-health differences existed in the first place.  My turn to studying the likelihood of chemical exposure in relation to proximity to chemical release sites, based on EPA evidence researched at the local level through ground truthing, was one avenue I took to engage in more valuable research methodologies requiring more technologically advanced approaches to producing the final results.

When I began my work with the Medicaid population health project, I was unable to fully comprehend what could be done with these various information sources until I was able to view the entire series of databases available.  It has always been my impression that IS people are typically very good at evaluating this information statistically and, if they are well trained, spatially and temporally.  It is unusual to come upon many statisticians working with the much larger data banks who have a complete impression of the value and potential applications of their information.  To date, this remains the most underutilized and misunderstood aspect of data development in the medical field.  Analysts typically make their mistakes and limit their productivity by not engaging in the right tasks relative to the IT resources they have at hand.

In 2002, I handed a series of sophisticated GIS-ready datasets to a GIS expert to use.  This individual made use of the core datasets and failed to employ the new sets of findings, because they had not previously been used for such studies.  At the time, I had just completed the first phase of my work demonstrating applications of GIS to West Nile and Lyme disease detection, by making use of non-medical, biology-ecology information sources.  The biology world favored my approach; the medical world ignored it.  As a result, I decided to go on with these ecology projects and begin to apply remote sensing, raster data, grid data, DRGs, aerial photos, and DEMs to my disease ecology projects.  This resulted in the completion of most of the subsequent work in the winter of 2004, and my presentation of these results in 2006.  In the months that followed, two agencies paid close attention to this innovation, linking to it by 2007.  Other agencies soon followed.  To date, the methods I used to successfully predict West Nile vector ecology-defined activity have not been attempted by any other agency, as far as I can tell.  The only reason I can think of that would deter people from engaging in this style of mapping disease ecology is the type and amount of work entailed.  With 20 years of professorship experience, it is my sense that students are better doers and more inclined to try something innovative. (Just to prove you wrong sometimes!)

Bringing all of this back to the development of institutional or corporate data with the goal of defining population health: population health has to be evaluated in a completely different way in the work sector than in the health care sector.  The reason for these differences is HIPAA.  HIPAA prevents outside workers from being able to study population health outside the health care setting.  This is due to patient privacy rights and the lack of in-house generated patient data in many databases outside the typical health care system.  The same problem exists at the corporate level, where large data libraries exist in the form of claims data and various other health insurance related documents focused on activities and revenues.  These too are available only to corporate workers who have access to the datasets.

For this reason, at the corporate level, one has to learn to employ methods that link economic data to population health data, in order to develop a more accurate impression of what the patient medical records might bear.  Any conclusions reached in this manner are of course tentative, yet they are still important because they provide us with another view of population health.  The limit of this approach is that one cannot correlate corporate data with hospital data without actually seeing the hospital data.  So inferences are made using corporate data, guided by individuals who have experience with the information found in hospital or medical records data.

This is where the various new data sources come into play for corporations trying to make the best use of their own data.  These new applications are due to the insights that medical informatics specialists can provide.  Some of the newer data sources provide just the types of information needed to suggest to researchers that a given health-related event took place in the hospital setting.  As a result, this health-related activity can be used to define a risk and possibly a diagnosis, with the actual diagnosis confirmed only if later records provide ongoing, temporally derived support for it.  This means that whereas in-house work in a health care setting often takes a bottom-up approach to looking at patient data, corporate work has to take a lateral approach to the same information, using a relatable dataset that can in turn be put through a series of algorithms in order to determine the main method of producing a link to an actual medical event.  Whereas in-house QA/QI activities for HEDIS/NCQA purposes are patient and medical record driven, corporate studies of the same have to be product driven, followed by the use of laterally placed suppositions that can be made from this product information.  The method of focusing on the consumer makes use of insurance coverage information to develop an understanding of the individual’s (or family’s) risk based on theoretical SES derived from address data, and behavioral features based on prescription and implied medical data.  The flowcharts used to compare medical versus pharmacal evaluations of population health are distinctly different.  So too are the algorithms applied.
The main link between these two is the set of national pharmaceutical classification systems that exist for drug types (as in The National NCPDP Database), together with the standard uses of ICD information, NDC information, CPT/HCPCS drug identification/utilization information, and the availability of various laboratory, procedure and claims codes.

The following compares QA methodologies within the insurance office setting, where records may be available for use in evaluating overall population health.  The first methodology is typically used to assess patient health for anything from a random sample to the complete set of patients (depending upon prevalence and inclusion requirements).  These data are then combined to approximate population health.  The second flowchart depicts a methodology for use in developing a population health review based primarily upon claims and prescription data.

The differences in the outcomes generating process for this method of review

One major advantage of the latter method is that it includes work aimed specifically at active preventive activities.  Proactive activities are gathered and interpreted as measures of patient involvement and the related PCP involvement in the caregiving process.  The remaining information pertaining to prescription drugs and products evaluates preventive and maintenance medication activities, and (hopefully occasional) urgent or emergent care related activities.  (Note the blue tone defines the proactive activities; the red tone the non-proactive activities and the grey tone the role of claims in this process.)

One can measure proactive and reactive behaviors separately, relate them to one another, and use them to define the overall healthy behavior of the population for a given disease or condition.  A method can also be developed to perform identical evaluations on different medical conditions, using these activities together to define a particular population health related feature.  When age-gender subdivisions of the datasets are taken into account, a subpopulation-by-subpopulation assessment can be performed.  This would help to differentiate one type of insurance program from another, and be useful in determining where priorities lie with regard to recommending the best forms of intervention activities to pursue.
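As a rough illustration of measuring proactive and reactive behaviors separately and then relating them within age-gender subgroups, consider the minimal sketch below.  It is not the method used in the flowcharts; the record layout, the age bands, the sample records and the function name behavior_summary are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical event records: (member_id, age_band, gender, kind).
# "proactive" might be a screening or preventive visit;
# "reactive" might be an urgent or emergent care event.
EVENTS = [
    ("m1", "40-49", "F", "proactive"),
    ("m1", "40-49", "F", "reactive"),
    ("m2", "40-49", "F", "proactive"),
    ("m3", "50-59", "M", "reactive"),
    ("m3", "50-59", "M", "reactive"),
    ("m4", "50-59", "M", "proactive"),
]

def behavior_summary(events):
    """Count proactive and reactive events per (age_band, gender) subgroup
    and report the proactive share of all events in that subgroup."""
    counts = defaultdict(lambda: {"proactive": 0, "reactive": 0})
    for _member, age_band, gender, kind in events:
        if kind not in ("proactive", "reactive"):
            raise ValueError(f"unknown event kind: {kind!r}")
        counts[(age_band, gender)][kind] += 1

    summary = {}
    for group, c in counts.items():
        total = c["proactive"] + c["reactive"]  # at least 1 for any group present
        summary[group] = {
            "proactive": c["proactive"],
            "reactive": c["reactive"],
            "proactive_share": c["proactive"] / total,
        }
    return summary

if __name__ == "__main__":
    for group, stats in sorted(behavior_summary(EVENTS).items()):
        print(group, stats)
```

The same function could be run once per disease or condition, so that subgroups and insurance programs are compared on an identical footing.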

The remaining step in this general flowchart on how to assess each particular population health defining measure or feature is to determine how to identify the indicators for these various proactive and reactive behaviors.  The codes listed above and those employed in claims processing and prescription drug use are the primary identifiers to be used for determining the absence or presence of a given feature.  As explained previously, however, mere presence is not enough to establish with certainty that a condition exists.  Temporal reviews demonstrating repeated occurrence are needed, first to validate a suspected condition, and then to identify its severity and persistence in terms of quality of life.  This means that dates will help analysts determine the overall disease/condition history and state.
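The distinction between mere presence of a code and a temporally validated condition can be sketched as follows.  This is only an illustrative rule under stated assumptions: the thresholds (at least two distinct service dates, at least 30 days apart) and the function name classify_condition are hypothetical, not taken from HEDIS/NCQA or any other standard.

```python
from datetime import date

def classify_condition(service_dates, min_occurrences=2, min_span_days=30):
    """Classify a condition for one member from the dates on which a
    relevant code (diagnosis, procedure, or drug) appears.

    Returns "absent", "suspected" (code seen, but not repeated over time),
    or "validated" (repeated on distinct dates spanning enough days).
    The default thresholds are illustrative assumptions only.
    """
    distinct = sorted(set(service_dates))
    if not distinct:
        return "absent"
    span_days = (distinct[-1] - distinct[0]).days
    if len(distinct) >= min_occurrences and span_days >= min_span_days:
        return "validated"
    return "suspected"

if __name__ == "__main__":
    print(classify_condition([]))
    print(classify_condition([date(2010, 1, 5)]))
    print(classify_condition([date(2010, 1, 5), date(2010, 3, 1)]))
```

Severity and persistence would need further rules layered on top of this, for example the number of validated intervals per year.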

Once group health values are assessed, the same method of evaluation may be performed on temporally defined subsets of the overall population.  In other words, the full data are used to define the overall conditions, states and related events to be evaluated, whereas the temporal data are used to view specific activities and their rates of change and/or progression.  The latter may be used, for example, to determine when missed scripts or reduced use of medication occurs, on a per diem (24-hour), dose-defined basis (akin to HEDIS/NCQA measures done to evaluate cost per script, for cost-reduction based prescription monitoring programs).
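One way to look for missed scripts on a per-day basis is to compare each fill's days of supply against the date of the next fill.  The sketch below assumes each prescription fill is recorded as a (fill_date, days_supply) pair and that early refills carry over; both the layout and the function name coverage_gaps are hypothetical, and this is not a HEDIS/NCQA measure specification.

```python
from datetime import date, timedelta

def coverage_gaps(fills):
    """Find gaps in medication coverage for one member and one drug.

    fills: iterable of (fill_date, days_supply) pairs (assumed layout).
    Returns a list of (gap_start, gap_days), where gap_start is the first
    day with no supply on hand and gap_days is the number of uncovered days
    before the next fill.
    """
    gaps = []
    covered_until = None  # first day NOT covered by supply on hand
    for fill_date, days_supply in sorted(fills):
        if covered_until is not None and fill_date > covered_until:
            gaps.append((covered_until, (fill_date - covered_until).days))
        # An early refill extends coverage from the end of the current supply.
        start = fill_date if covered_until is None else max(fill_date, covered_until)
        covered_until = start + timedelta(days=days_supply)
    return gaps

if __name__ == "__main__":
    fills = [
        (date(2010, 1, 1), 30),
        (date(2010, 1, 31), 30),
        (date(2010, 3, 15), 30),
    ]
    for gap_start, gap_days in coverage_gaps(fills):
        print(gap_start.isoformat(), gap_days)
```

Gaps found this way could then be summarized by time period to show rates of change in adherence across the population.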