Part II – Examples of Applications | Brian Altonen, MPH, MS

“. . . the solutions to our problems lie outside the box.”

Aviation Week & Space Technology, July 1975

NOTE: This study was supported and supervised by former Perot Systems (now Dell Perot), and internal IPA and IT monitoring groups from 2004 to 2005. All reports generated since then follow previously agreed-upon institutional and federal program IP and PHI rights and regulations. Datasets have been slightly modified without modifying statistical outcomes. Data sources have been renamed and/or provided with a unique theoretical identifier of data content. Only age and gender identifiers are presented in the original format.

Part 2 Introduction

This methodology can be applied to any type of age-gender-n analytic method. It can be used to analyze consumer population-product related costs at the population areal level, such as oil/gas consumption between regions or states, by age of credit card user. It can be used to analyze luxury item expenses, meaning the amount of money spent on items not required for the basic lifestyle, such as purchases of alcohol products, over-the-counter medicines and nutritional supplements, or the use of money to engage in vacation or recreation related activities. It can even be used to analyze health related expenses and specific need-related rates for people with specific medical backgrounds, or of specific age-gender groups.

Application and Theory. Let’s say your job is to analyze sports. You have been hired by a large research group contracted with major sport industry companies to review the entire United States population, as many people as you can, in order to determine how they tend to engage in sports relative to gender and age. These sports are not just the typical big money sports that millions like to engage in, such as golf, basketball, softball, soccer, racquetball, tennis, horse racing, and stock car racing, but also less common events like mountain motorbike racing, horseback riding, bocce, badminton, chess, checkers, cribbage, and even Monopoly. Your job is to develop a way to query or survey people about their recreational hobbies, and then evaluate which ones reach a peak at which ages, for which genders.

To accomplish this, you set up a survey on a national site designed to run these surveys at low cost. You write up a standard series of questions, and target national distribution sites for the announcement of your survey activity. Your goal is to get several million people, perhaps more, engaged in this activity. You want people of all age ranges and all genders, answering questions about every game or recreational, competitive, sports-like activity they engage in at their particular stage in life. Everyone who answers must provide answers only for those events they engaged in during the past 12 months, to make sure you can relate age and gender directly to the events they like to engage in.

Currently, the size of the US population is a little over 311 million. A response rate to surveys of 5% for such large populations would be incredible. That would mean approximately 15.5 million responses. Normal surveys try to reach goals of several hundred, or at most 1500 to 2500. Due to time and energy, survey companies typically do not engage in supersized survey events. This is because of statistics.

The nature of the formulas used to evaluate groups is that to produce a statistically significant change in your initial sample population, you must double that size to have the greatest likelihood of doing so. If you double it again, now producing a survey with 4 times the original sample size, you have an even better chance of seeing just how far off your original sample set was. Typically what happens when you do this is that you end up experiencing regression to the means. Your first sample is unlikely to be anything close to perfect, but numbers-wise it is assumed to suffice. Your second sample, with twice as many participants, provides you with a set of responses to compare with the first, and due to the role and nature of n in statistical equations, this new 2n is likely to show statistically significant change, if such a change is likely to happen at all. Since the 2n is better to work with, your final percentages are considered more reliable, and were achieved at the expense of at least double the workload required to get the survey done. Doubling this n again gives you 4n, which is even more work, but again yields a more reliable and trustworthy response, and more than likely yet another example of regression to the means, with much better regression taking place.

These scenarios can be replicated numerous times, extending into 8n, 16n, 32n, 64n, etc. Each time you double the population, you provide yourself with the chance to statistically impact the final results. Without doubling the n, you may get an increase or decrease in certain averages, but these changes are unlikely to be statistically significant since you did not obey the doubling rule for n to test out your new population size.
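The effect of repeatedly doubling n can be illustrated with the standard error of a sample proportion. The following is only a minimal sketch, assuming simple random sampling, a true proportion of 0.5, and a hypothetical starting sample of 500; none of these values come from the study itself.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)

def doubling_table(p, n0, doublings):
    """Return (n, standard error) pairs for n0, 2*n0, 4*n0, and so on."""
    return [(n0 * 2 ** k, standard_error(p, n0 * 2 ** k))
            for k in range(doublings + 1)]

# Hypothetical starting point: n0 = 500 respondents, assumed p = 0.5.
for n, se in doubling_table(p=0.5, n0=500, doublings=4):
    print(f"n={n:5d}  standard error={se:.4f}  95% margin=+/-{1.96 * se:.4f}")
```

Under these assumptions each doubling shrinks the standard error by a factor of the square root of 2 (about 1.41), so the estimate tightens with every step while the workload doubles.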

The above model is what the standard survey companies follow regarding their engagement and evaluation of statistical outcomes. Within the public health sector, this is why the n’s chosen by HEDIS/NCQA are what they are. To get a minimally reliable outcome, plus some, the n’s chosen for HEDIS/NCQA projects follow this ideology, logic and statistical philosophy. They add to their population N an additional number of people to study and submit results for, just in case some of those previously validated and listed as eligible are later found not to be eligible. In a large population public health study such as diabetes, an initial maximum number of eligibles for review might be considered somewhere between 450 and 485, to which 25 to 30 are added in case later eligibility issues are uncovered. A full study is considered to be somewhere around 512 to 525, the names of which are randomly chosen by a computer based on the size of your population, and it is your job to look up each and every one of these records for hits that are not obtained automatically as administrative measures (for administrative measures, the needed data come from an electronic database with claims and diagnoses or test results, thereby preventing any need for potentially error-ridden manual data pulls, entry and evaluation).

In a population of 10,000, 512 only represents about 5%, but this is the number of cases that need to be evaluated for a HEDIS/NCQA measure with large prevalence rates. Once the added 20 or 30 names are excluded from that value, we are actually talking about 4.8% being evaluated and considered representative of the entire population. Ideally, one can double that value to 1024 cases in need of evaluation, thereby jumping the percent representation to about 9.6 to 10%. The problem with this methodology, and why it is not done, is that this more than doubles the manpower-work time needed to perform a HEDIS/NCQA evaluation on population health. We would like to see everyone engaged in health care evaluated for their particular condition and all of its statistical facts, but the necessary time and manpower required for this is not there. Therefore, we evaluate only a sample of about 5% of the population and use that to represent the total population health outcome.

In the case where we evaluate 1024 people, or double this again to 2048, we have progressed to evaluating approximately 20% of the total population in the latter case. This is ideal because the percent is: a) >16.7% and b) highly likely to represent regression to the means in a favorable way. Favorable regression to the means occurs when the new n is large enough to be difficult to change. When the new n is that high, non-mathematicians reviewing the outcomes are often blind to this statistical fact, and decide that they wish to wait much longer, until all the results are in.
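The sampling fractions quoted above are simple arithmetic and can be checked quickly. This is a small sketch using the population of 10,000 and the sample sizes named in the text; the function and constant names are illustrative only.

```python
POPULATION = 10_000      # population size used in the text
ONE_SIXTH = 1.0 / 6.0    # the 16.67% threshold discussed in the text

def sampling_fraction(n, population=POPULATION):
    """Share of the population covered by a sample of size n."""
    return n / population

for n in (512, 1024, 2048):
    frac = sampling_fraction(n)
    side = "above" if frac > ONE_SIXTH else "below"
    print(f"n={n:4d}  {frac:.2%} of population  ({side} the 1/6 mark)")
```

Only the 2048-case sample passes the 1-in-6 mark for a population of this size.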

To assist in the above matter, let’s say you decide to say, ‘ok, I’ll wait, and enter the data as it comes in, and continue to evaluate at different stages, and over time see what happens.’ This is used to monitor outcomes and is a method that tells you just how wavering the response changes can be over time. In the case of medical education, it was found that over time, more and more people know the answers to the survey questions asking them about how a particular treatment is to be managed. In the beginning, the first classes had many people providing the wrong responses before the class was given, and of course substantial improvement thereafter. In the case of an ongoing program running for one year, it will be found that fewer people provide wrong responses before taking the program, thereby demonstrating less of an impact on the entire population afterwards. In a course where 90% do not know the answer beforehand, but afterward only 10% do not, that represents a 90-10% or 80% change. If a year later, 75% do not know and afterwards only 10% do not, that represents a 65% change, not as good as the first months through. This passage of information (the monkey theory) is what makes these types of surveys time sensitive methods of evaluation.
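The pre/post comparison described here is a difference in percentage points. A minimal sketch, using only the figures given in the text (the function name is illustrative):

```python
def improvement_points(pct_wrong_before, pct_wrong_after):
    """Drop in the share of wrong answers, in percentage points."""
    return pct_wrong_before - pct_wrong_after

first_cohort = improvement_points(90, 10)   # early classes
later_cohort = improvement_points(75, 10)   # classes a year later

print(f"first cohort: {first_cohort} points, later cohort: {later_cohort} points")
```

The later cohort ends at the same 10%, but because it started from a better-informed baseline the measured program impact is smaller.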

You can also use this review of results over time to define the rates of change for knowledge over time. More and more people are likely to know the answers as time passes, even without taking the course. If one year is the official end date for this particular thing being taught, then over time you expect regression to the mean to take place. In this case, the final mean or number of people who know this particular piece of knowledge is greatly improved, and in the end is at its true mean for that population of participants. These participants in turn may only represent 1% of the medical profession. So do they represent the US population of physicians as a whole? They do if their total N is sufficient to sample enough doctors to produce answers which represent the doctors’ responses as a whole or entire group. This brings us back to the first part of this story and lesson: once the number of participants in such a program is at 5%, you can feel pretty good about the sample, since it does match the HEDIS/NCQA protocol. Moreover, since you allow your sample to go well above that 5%, by trying to involve tens of thousands of doctors in very good programs, your validity and reliability issues begin to fade away. This new study is a very good indicator of what the remaining population of doctors are like once the 16.67% point is reached in percent of whole population sampled.

Why 16.67%? If you look at a typical bell curve, with normal distribution, you find approximately 68% within +/-1 standard deviation (sd), 95% within +/-2sd, and 99.7% within +/-3sd. When you look at this type of curve, it is hard to find a place where a normal bell curve can be drawn that demonstrates a certain degree of offset from the overall averages and distributions of results. A 16.67% sample in bell curve form is limited to placing its peak somewhere near the center of the entire curve; it can result in two peaks, one at each end of the curve, with little of the results being close to the true average or peak of the single peak bell curve, but this is also unlikely to occur. We are more likely to see some form of disorderly scattered distribution of results centered around the true mean than a perfectly ordered response with multiple peaks or just one peak perfectly offset, resulting in significantly low or high outcomes. Therefore, the 1 in 6 sample of a population can be considered fairly ideal, and most likely representative of the total population. The means may not be at all the same for all of N and 1/6th of N, but the distributions are expected to be representative, and the results of any math work on this also representative. Thus the 1/6th sample size results in something that is considered an outcome nearly equal to the perfect outcome for the entire population, demonstrative of a true regression to the means.
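The shares of a normal curve lying within 1, 2 and 3 standard deviations can be computed directly from the error function. This is a brief sketch using only Python's standard library; the function name is illustrative.

```python
import math

def coverage_within(k):
    """Share of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"+/-{k} sd covers {coverage_within(k):.2%}")
```

The computed values are roughly 68.3%, 95.4% and 99.7%.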

Bringing all of this math background back to the main theme, we are analyzing people for a particular area of interest: a particular cost-related behavior, or a particular purchasing behavior in relation to a particular product type, such as buying accessories for one sport versus another, during a particular period of life, and based on whether the person is male or female.

Your work begins with a database of people who represent 20% to 25% of the entire population that is to be involved with this research. This 20-25% is greater than 16.67% or 1/6th, so we are above the expectations required to consider true regression to the means as a possible outcome. In other words, our population is large enough to demonstrate some kind of regularity and fluidity of change relative to age, per gender, so as to not demonstrate much meandering back and forth above and below an expected true outcome for the final numbers to be evaluated. With a sample above 16.67% of N, you expect your outcome to show some sort of smooth distribution, representing the entire population, and if checked and rechecked with a high N each time, it would show further regression to the true means for all of N each time you double your sample size n.

To simplify this problem of sports and how people engage in sports and recreation based upon gender and age distributions, we will apply this method to a simple population of 300 million instead of 311 million, and assume a sample could be obtained that was equal to 75 million or 25% of the entire US population. These people have an expense history in the national database that can be evaluated relative to sports-related and recreation-related purchases, including what we normally consider to be sports activities as well as recreational activities that are not normally considered to be sports, but nonetheless are recreational, such as board games, electronic games, basic cards, etc.

Example 1. “Kickball”

In the first example, an entire population is evaluated for participation in one of the most common games in elementary school–“kickball”. This game “kickball” is used as a theoretical name for a true dataset. Kickball was chosen due to some similarities this sport has with the data uncovered. In the following graph, we find “kickball” tends to be favored “by the boys more than the girls” and so results in the following age-gender distribution of percentage of people participating in this “sport”.

First notice there is a significant difference between boys and girls as stated, but also note how engagement in this activity tapers off significantly by the time one is in high school (14 years of age) and is almost non-existent by college age (19 yo). Adult involvement is nearly absent, but with a little activity on behalf of the men versus women over 22 years of age (father-child association?).

The above graph is the standard method for evaluating an event by age in one year groups. The x-axis numbers are the results of a relative frequency calculation–the numbers of people who answered this particular question are evaluated relative to an overall base population. The numbers of replies for yes and no questions are calculated relative to the numbers of surveys completed by that one-year age group.
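The relative frequency calculation described here can be sketched as a count of yes answers divided by the number of completed surveys in each one-year age and gender group. The records below are made up for illustration and are not the study data; the function name is likewise hypothetical.

```python
from collections import Counter

def relative_frequency(responses):
    """Map (age, gender) to the share of completed surveys answering yes.

    responses: iterable of (age, gender, participates) tuples.
    """
    completed = Counter()
    yes = Counter()
    for age, gender, participates in responses:
        completed[(age, gender)] += 1
        if participates:
            yes[(age, gender)] += 1
    return {key: yes[key] / completed[key] for key in completed}

# Hypothetical survey records: (age in years, gender, plays "kickball").
sample = [
    (8, "M", True), (8, "M", True), (8, "M", False),
    (8, "F", True), (8, "F", False),
    (30, "M", False), (30, "F", False),
]
rates = relative_frequency(sample)
for (age, gender), rate in sorted(rates.items()):
    print(f"age {age:2d} {gender}: {rate:.1%}")
```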

This tells us how many, and at what frequency, but doesn’t tell us where the numbers are statistically significant. To evaluate this, a standard formula was applied for statistical analysis, with a slightly different take on how to apply it. The resulting levels of significance were then written into the formula, such that a value of 1.0 means the first level of statistical significance, 2.0 the next level, 3.0 the third level, etc. The transition from 1.0 to 2.0 has a correction built into it that prevents exaggerating the level of significance. The critical change from less than 1.0 to greater than 1.0 is important, but each step up in level and amount of significance is tapered or suppressed somewhat to avoid misrepresentation. This means that a response of 2.0 or greater is certainly significant, without doubt as to accuracy and amount, and is in fact more than twofold in its level of significance when compared with a standard index value defining significance. This represents a logarithmic correction of the outcome as it becomes more and more significant beyond the threshold level. The same result can also be interpreted as an exponential unit of change for each integer value above the threshold level. (To be sure one understands this logic: these two sentences are saying the same thing mathematically.)
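The text does not name the formula used, so the following is only one possible reading, not the author's method. It assumes a pooled two-proportion z-test comparing males and females in one age group, and divides the test statistic by an assumed critical value of 1.96 so that an index of 1.0 marks the threshold of significance. All names and counts are hypothetical.

```python
import math

Z_CRITICAL = 1.96  # assumption: two-sided 5% significance threshold

def significance_index(yes_a, n_a, yes_b, n_b):
    """|z| of a pooled two-proportion test, scaled so 1.0 is the threshold."""
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        return 0.0  # no variation at all, so no detectable difference
    return abs(p_a - p_b) / se / Z_CRITICAL

# Hypothetical counts for one age group: 70 of 100 boys vs 50 of 100 girls.
print(round(significance_index(70, 100, 50, 100), 2))
```

Values below 1.0 under this reading would be treated as not significant; values above it grow with the size of the difference and with the sample sizes.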

The above values are based on what is called a statistical profiling method of the significance of these two distributions. As noted in the prior page in this series, it is like asking the question ‘Is person A’s nose significantly different than person B’s nose? Could the two edges or surfaces depicted in these two patient profiles represent the same part of an identical person?’ [Facial profiling applies this question and the formulas herewith used in a 2D or 3D fashion.]

Logarithmic tools can be used to emphasize where the change occurs, without really making the amount of change seen stand out as much. In other words, in this case an edge effect is required: we want to see quite clearly where that change begins and ends, so that the graphical illustration emphasizes where the threshold for change is reached and over what range that level of change is maintained. To apply this correction method, values that are not statistically significant are not charted, and those which are significant are defined by degree or magnitude of significance using a log equation. The following depicts these log corrections for the above values:
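One way to picture this correction is sketched below. The exact log equation is not given in the text, so the natural-log form used here, which maps the threshold itself to 1.0 and compresses everything above it, is an assumption for illustration, as are the function name and the sample values.

```python
import math

def log_corrected(index_by_age, threshold=1.0):
    """Drop non-significant values and log-compress the significant ones.

    Assumed form: 1 + ln(index / threshold), so the threshold maps to 1.0.
    """
    charted = {}
    for age, index in index_by_age.items():
        if index >= threshold:
            charted[age] = 1.0 + math.log(index / threshold)
    return charted

# Hypothetical raw significance values by age.
raw = {5: 0.4, 6: 1.0, 7: 2.5, 8: 9.0}
for age, value in log_corrected(raw).items():
    print(f"age {age}: raw {raw[age]:.1f} -> charted {value:.2f}")
```

Ages that fall below the threshold disappear from the chart, while a raw value of 9.0 is drawn only a little over three times as high as the threshold, which keeps the edges of the significant range visible.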

In two-dimensional surfaces, this outcome is very important for detecting edge differences. In certain forms of imagery such as side-looking airborne radar (SLAR), this formula would enhance the edges of land surfaces, such as a place where road edges are replaced by vegetation regions, or trimmed lawns are replaced by sandy dunes, etc.

In this example, those areas where there are no changes detected are excluded from being displayed. This is a standard GIS application found in most raster and grid imagery software. Logarithmic equations have the effect of greatly exaggerating change from one centroid to the next, one x-y value to the next, or in the above graph, one moving-window-generated age-group statistic to the next.
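A moving-window age-group statistic can be sketched as a rate pooled over each age and its neighbouring ages. The window width is not stated in the text, so the half-width of one year used here is an assumption, and the counts are hypothetical.

```python
def moving_window_rates(yes_by_age, n_by_age, half_width=1):
    """Participation rate pooled over ages [age - half_width, age + half_width]."""
    rates = {}
    for age in sorted(n_by_age):
        window = range(age - half_width, age + half_width + 1)
        n = sum(n_by_age.get(a, 0) for a in window)
        yes = sum(yes_by_age.get(a, 0) for a in window)
        rates[age] = yes / n if n else None  # None marks an empty window
    return rates

# Hypothetical counts of yes answers and completed surveys by age.
yes_counts = {10: 5, 11: 3, 12: 1}
survey_counts = {10: 10, 11: 10, 12: 10}
print(moving_window_rates(yes_counts, survey_counts))
```

Windows at the youngest and oldest ages contain fewer neighbours, so their pooled rates rest on smaller counts than those in the middle of the age range.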

Applying this to the demographics research world, we produce the above example of a graph. All of the values charted in the above figure are greater than 1.0; the rest indicate there is no significant difference, or no significant change if the above figures illustrated temporal differences. This figure above, if it were a population pyramid, essentially states that the younger age participation demonstrates the highest likelihood for change, with older men and women having an equal likelihood of participating, and as age increases, with men participating in the sports activity more than women. Also note that gender differences are minimal in the young parenting age, about 19-23.

This way of graphing the probabilities of engaging in a particular activity suggests that in terms of frequency of events, there are numeric differences that play out in the graph, but once true numbers are taken into consideration in relationship to probabilities based on base population values, boys and girls are almost equally likely to behave differently when given the option of engaging or not.

Part II. Examples of Uses

Childhood Recreation

If we apply this to games that are very popular in childhood, but somehow manage to remain popular, we find the following results.

There is a fairly equal distribution of behavior during the childhood years, an elimination of these recreation activities during the young to middle parenting stage for females, especially no involvement by mid-age males, with the second peak in older male behaviors occurring during the final years in life. This type of distribution will be seen for certain other behaviors and lifetime activities with similar age-gender choice of involvement or participation.

Midlife activities

The following is for an activity with tendencies to impact mostly people in their midlife years. Examples of these activities could be engagement in risky sports, such as the highly risky contemporary sports like combined water skiing/hang-gliding, high elevation free-falling/diving, mountain free-climbing, mountain- and air-biking, or combined mid-winter mountain climbing/snowsurfing. For this group, a similar activity shows a tendency for men to be more engaged exactly at midlife, with the same results seen for women although for a much shorter period of time. There is also a peak in female engagement in the middle school to young teenage years, with only a momentary involvement by older teenage males.

We can in turn relate the above kinds of distributions to other activities with age-gender relationships. Each of the above may be related to age-gender distributions for specific diseases and used to develop extremely informative insights into how a disease impacts the lives of certain age groups of patients. The methodology employed for the middle example charted represents raw statistical significance figures, with the peak age for these differences very well-defined. Applying a log to this methodology smoothed out this peak, emphasizing more the range of people actually demonstrating a statistically significant impact. The second chart depicts these impacts only when they are significant or better. The third chart gives us an idea of the relative amount of impact, with results above the critical threshold depicted in more equal terms.

Recreational Eating and Substance Abuse

This next topic is a very sensitive social issue. The research question was included in the survey for individuals who confess to binge-eating and drinking alcoholic beverages as a recreational activity in the college setting. Students were evaluated for the frequency of engaging in these risky behaviors based upon their age and related to others in the academic environment as a whole. We expect adult behaviors related to excessive alcohol consumption to be fairly substantial, but compared with the drinking activities of teenagers this over-indulging was considerably less and in a statistical sense not that different across the different age bands. Teenagers and the youngest college students on the other hand were very active, especially at early ages, with females demonstrating a significant drop in this experimental behavior linked to socialization, whereas males continued this into the later post-high school, post-undergraduate, post-college years.

When the same series of questions was asked about tobacco and “junk food” consumption, similar behaviors were seen, but with several features that were uniquely different.

The male age peak of the mid-20s recurs. The female age peak shows an older age preference for onset of eating and smoking activities. The value of the log-interpretation is demonstrated in this example: it makes obvious the differences in the age groups at which these experiments and/or decisions to engage in unhealthy smoking or eating habits commence in the typical population. The peaks in the very young and very old age groups are, as in the examples provided earlier, a consequence of the numbers related to this methodology and are to be ignored for this particular comparison. [The formula for this did not correct for nulls and low N in specific age range window settings.]
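The bracketed note says the formula used here did not correct for nulls and low N. One simple guard, shown only as a sketch and not as part of the original analysis, is to suppress any age window whose value is missing or whose respondent count falls below a minimum. The cutoff of 30 and all the sample values are hypothetical.

```python
def suppress_low_n(index_by_age, n_by_age, min_n=30):
    """Keep only ages with a non-null value and at least min_n respondents."""
    return {
        age: value
        for age, value in index_by_age.items()
        if value is not None and n_by_age.get(age, 0) >= min_n
    }

# Hypothetical significance values and respondent counts by age.
index_values = {3: 4.2, 25: 1.8, 90: None, 95: 3.0}
respondents = {3: 4, 25: 500, 90: 0, 95: 12}
print(suppress_low_n(index_values, respondents))
```

With a guard of this kind the spurious peaks at the extremes of the age range would simply not be charted.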

Illegal drug use involving street drug products like Amphetamines, Quaaludes, and Cannabis demonstrated some minor age-gender peak differences, but overall showed that the highest risk group was again the adolescent-young adult groups.

College Student and Teacher Sports or Athletic Injuries

A review of Sports and Recreation also provides an opportunity to survey individuals about the types of sports injuries sustained at particular age ranges. This information can then be correlated with the age-association certain types of sports have with certain types of injuries, for example tennis and elbow injuries or fractures, or softball/hardball and arm fractures.

With elbow injuries and fractures, we find a bimodal influence suggested by these recreation survey results. Males demonstrate a tendency to experience these events when under 21 years of age and between the ages of 48 and 57 inclusive. Events in males under 21 years of age are expected to be primarily sports related, whereas males between 48 and 57 years of age are probably experiencing the results of a combination of risk behaviors and activities, ranging from fractures due to athletic activities to those due to day-to-day domestic and outside work related activities, perhaps in combination with calcium-related bone loss in a few percent of the respondents. Women on the other hand have a much broader age distribution, suggesting non-sports related activities for some of the age groups. This is especially true for the 21 to 44 year old age group, for which the exact causes for fracture are possibly quite complex. The decrease noted for the age range 43 to 47 could be the result of response rates, but also possibly effective treatment of this age group most often associated with this type of age-related metabolic problem.

If we relate the likelihood of experiencing any of the common sports-related elbow injuries/fractures to an injury/fracture lacking a relationship to sports, such as skull, face or hip fractures, we see where gender and age, absent a sports related cause, impact the distribution of fractures over time. In addition, with the exception of events requiring high activity/high strength, we find that women experience a great deal more of these than men. High strength activities such as occupational and high risk sports events are in turn more likely to impact men during the primary sports years (which have been significantly extended during the past decade).

If the same method is applied to limb fractures, such as radius-ulna fractures versus tibia-fibula fractures, similar outcomes are revealed as well. According to Colorado statistics, when population distributions of fractures of outdoor sportsmen were normalized to match an average US population based on Federal Health Care program statistics, the first type of fracture was shown to have a strong childhood-early adult age link, whereas the second demonstrated a slight ability to continue into adult years, although not in any highly significant manner for men. This contrasted with the high significance noted for younger women regarding fracture types.

Again, comparing the above with a non-sports type of fracture, a femur neck fracture due to any of a number of diseases very common with aging, such as night blindness, osteoporosis, hearing loss, short term memory deficits, and even dementia, the following type of outcome may be seen.

. . . continued . . .

Brian Altonen, MPH, MS