Topic: Demography – Lectures 1, 2, 3, perhaps 4 (catch up)
Population Pyramids and Population Health
Examples of GIS Innovations. Innovation 2 – Evaluating Statistical Differences at the very large N Population Level, 1997 to 2005
NOTE: This presentation is a merging of several lessons. The study was supported by a former Perot Systems provider and internal support groups from 2004 to 2005. For presentation purposes, datasets have been slightly modified without modifying statistical outcomes. Since parts of this data come from older presentations, some subjects reviewed were renamed and/or provided with a unique theoretical identifier of data content. Only age and gender identifiers are presented in the original format.
Part I. Introduction, Theory, and Background
The following pertains to an old formula with new life. I developed this formula for analyzing flood plains and transects, and it worked very well with raster imagery and DEMs. It was designed to identify such things as changes in a land surface over time, or changes in a land surface over a specific distance, based on regular transect analyses. The original purpose was to analyze surfaces for risk areas based on flooding behaviors. The problem with analyzing flood activities during the mid-1990s was that you couldn’t use the standard elevation data available, since elevation was a constantly changing value over space that was always referenced to the exact same surface: sea level. So if you were 1000 miles up the Mississippi River, the elevation of the flood plain surface was provided to you relative to sea level in the Gulf. As you went down the Mississippi River, this elevation value always decreased, approaching Gulf sea level values. To understand floods, one has to relate river surface levels and potential flood surface levels to the immediate environment along the edge of the river, not to the Gulf of Mexico water surface.
To correct for this problem I developed a way to rasterize local water edge elevation numbers and assign them to a linear raster depiction of the center of the river. This raster line was then slanted to produce a slope of zero from its beginning to its end. This effectively made the river seem perfectly flat, and all neighboring land surface data were then adjusted against that flat surface so that each value related to the elevation of the closest water level, which was set to zero. Now the land surface rasters could be evaluated relative to the closest water body elevation, and a height of 15 feet above the closest river surface could be mapped.
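A minimal sketch of the correction described above, reduced to one dimension and resting on assumptions of my own: the river centerline is a short list of stations with a water surface elevation, and each land cell is matched to its nearest station by position along the river. The function name, the matching rule and the sample numbers are hypothetical and are not the original formula.

```python
def height_above_river(land_cells, river_stations):
    """Return each land cell's elevation relative to the nearest river surface.

    land_cells: list of (position_along_river_miles, elevation_above_sea_level_ft)
    river_stations: list of (position_along_river_miles, water_surface_elevation_ft)
    """
    heights = []
    for position, elevation in land_cells:
        # Assumed matching rule: nearest station by along-river distance.
        _, water_level = min(river_stations, key=lambda s: abs(s[0] - position))
        # Subtracting the local water level treats the river surface as zero.
        heights.append(elevation - water_level)
    return heights


# Hypothetical data: the river surface drops 10 ft over 2 miles.
river = [(0.0, 410.0), (1.0, 405.0), (2.0, 400.0)]
land = [(0.1, 425.0), (1.2, 420.0), (1.9, 415.0)]

relative = height_above_river(land, river)
within_15_ft = [h <= 15.0 for h in relative]
```

In this toy data the three land cells sit at different elevations above sea level, yet each is the same height above its closest river surface, which is the quantity that matters for flood risk.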
It turns out that this ever-changing river surface elevation, once corrected, has features that mimic other surface edge transect properties. Whereas rivers constantly move downward, land surfaces undulate up and down. Applying the same mathematics to a lateral versus a longitudinal river transect provides a different interpretation of the same section or raster point in that river GIS raster dataset. One can compare one undulating surface to another, using the same formula used to detect and correct for changes, to measure how much change is taking place each time a change happens. This led to the methodology detailed here on how to analyze transects, profiles, and other changing, irregular line depictions in order to define where the greatest changes happen. What I added to this methodology was a method for identifying where statistically significant differences exist in the numbers recording that change. These statistical significance measures tell us where a change has occurred that has to be reviewed, such as when a significant land surface shift occurs due to an earthquake.
Now, there are already more refined ways of evaluating the earthquake land surface change information provided by a number of companies. This method was applied instead to evaluating curve relationships for different groups of data, such as facial curvature and surface transect patterns or, in this case, population pyramid differences between two different groups of people. This latter application makes the best use of the algorithm, and it is an easy way to compare population profiles in very small (one year) increments. For populations, this means that age-gender relationships can be compared to each other, enabling a care manager to determine the exact year of age at which interventions are most important. For business analysts, cost versus age-gender can be compared, for example in measuring participation in sport or recreational activities, income levels in relation to gas and fuel expenditure as an age-gender feature, or market activities and shopping expenses in relation to age-gender behaviors.
In medicine, this method shows that cost is related directly to age, in that there is an exponential increase in costs incurred as one gets older; this means that for each given disease type, the age at which cost suddenly skyrockets can be calculated using this methodology. Whereas most curve comparisons generate a measure of difference as one mathematical value, tied to the probability that this value is correct, this method generates these cost-age (or measure 1 versus measure 2) differences as multiple results varying over time, age and gender. If multiple periods of change exist, they are all identified using this method. This allows you to see the details about where the greatest changes are occurring, details which typical odds ratios, chi squares, Student t-tests and multivariate analyses fail to provide. In a single test we can see where children, older adults and even middle aged people are in need of interventions or preventive health changes.
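As a rough illustration only (the actual math is withheld in these notes), the idea of locating every age at which cost jumps can be sketched with a simple year-over-year ratio. The function name, the 1.5 threshold and the cost figures are all assumptions made for this sketch; it is a stand-in, not the method described above.

```python
def jump_ages(cost_by_age, min_ratio=1.5):
    """List every age whose cost is at least min_ratio times the previous age's cost.

    cost_by_age: dict mapping age in years to average cost.
    Assumes the ages present are consecutive one-year increments.
    """
    ages = sorted(cost_by_age)
    jumps = []
    for previous, current in zip(ages, ages[1:]):
        prior_cost = cost_by_age[previous]
        if prior_cost > 0 and cost_by_age[current] / prior_cost >= min_ratio:
            jumps.append(current)
    return jumps


# Hypothetical average annual cost per patient for one disease type.
costs = {60: 1000.0, 61: 1100.0, 62: 1200.0, 63: 2000.0, 64: 2300.0, 65: 4000.0}
found = jump_ages(costs)
```

Because the function returns a list rather than a single value, more than one period of change can be reported for the same curve.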
The final math for this analysis will not be provided for now. Only the steps are covered.
Steps to making this Discovery.
This particular project requires some documentation for IP purposes. This was not a discovery that came overnight. It was not a simple “Voila!” and it was there. To understand the value and theory of mapping, you have to run through numerous tests and formulas. One out of ten or fewer will do some good. Many of the rest will be interesting, but will not add enough to the methodology to be considered either a failure or a success. But to develop the more complex formulas you first have to experience making the first versions of whatever it is you are doing. That is how I came upon the way to analyze demographic data using a spatial analysis formula I developed for areal comparisons. The good thing about statistics is that old formulas can often be applied to new things, in new subjects, to answer research questions unrelated to the original use.
Step 1. 1997. Developed an equation to produce a mock or artificial land surface along with river flows, mimicking the flow patterns of most known rivers. Standard meandering rivers producing no or minimal ox-bows had flow patterns with 87-93% of their flow defined by the linear portion of the model, using a quadratic equation. Surface planarity defines the beginning and end of any and all flow patterns. The remaining behaviors are defined using a cuboidal equation; this accounts for the curving and deviations from the expected seen for a meandering river. As part of this project, linear longitudinal and lateral transects of the mid-river were produced, and this form of profiling a river bed was reviewed.
Step 2. 1998. Applied this to work for my thesis on Cholera and the Mississippi River, creating a transect of the entire Mississippi River from Hudson Bay to the Gulf of Mexico. This transect was evaluated based upon sea level. The riverbed-normalized longitudinal transects of the Mississippi for two different time frames could be compared to document topographic changes at the delta end of the river (where Vibrio cholerae grows) and in certain portions of its mid-states region (regions at low elevation above the closest river surface level).
Step 3. 1998/9. Duplicated 1997 work. Applied this modeling of river beds to a smaller creek, and modified the formula so as to correct for surface planarity and look at local land elevation above closest river surface elevation instead of the actual sea level. Applied correction formula designed to cancel out elevation changes over space relative to sea level, reassigning elevation values to river surface instead. Used this to map out disease patterns when elevation above sea level becomes the primary indicator of risk. Identified where significant changes occurred based on new formulas.
Step 4. 2000. Applied the new transect formula used to define statistically significant regions to line drawings instead of river edge and river bottom transects. Applied it to comparing two faces for statistically significant differences in common identifying features like nose size and shape, chin protuberance, eye brow ridge, etc.
Step 5. 2004. Applied this approach to population pyramids, comparing male to female age distributions, followed by one population versus another; developed a way of determining whether there are any statistically significant differences between two completely different population sizes. Tested and applied this to populations of 2500, 27,000 and 60,000, versus a base population of 250,000 to 450,000 depending upon the year and month of each study (the baseline population kept growing). This work ran from 2004 to 2006.
Step 6. 2005. Developed three very different formulas for comparing two populations: the first method is applied to very low total N populations, the second to any population size though it is not necessarily reliable, and the third (most reliable) to any two populations of any two sizes.
Step 7. 2005. Afterwards, applied this same approach to costs, comparing total cost between genders, and then to the total population. Applied the statistical significance technique to define where (at what age) cost becomes significant due to the test population’s age-gender distribution, meaning cost is due to large numbers in that subset of the total population (more children than expected requiring more well visit care, more patients than the norm over age 65, more women in their childbirth years than normally expected, more teens than expected in the alcohol/drug testing years, etc.). Developed a technique for comparing costs ($10M+ range) to patient age-gender groups (50,000+ range), using a normalization formula developed for comparing two very different curves.
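Step 7 describes comparing a cost curve in the $10M range against a head-count curve in the 50,000 range. One common way to put two such curves on the same footing, shown here only as a sketch, is to convert each to shares of its own total and then look at the gap per age group. The function names and the three-group data are hypothetical, and this is not the normalization formula referred to in the step.

```python
def to_shares(values):
    """Convert raw values to fractions of their total."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


def share_gap(cost_by_group, members_by_group):
    """Cost share minus member share for each age group.

    A positive gap means the group accounts for more of the cost
    than its head count alone would suggest.
    """
    cost_share = to_shares(cost_by_group)
    member_share = to_shares(members_by_group)
    return [c - m for c, m in zip(cost_share, member_share)]


# Hypothetical data: three age groups, 50,000 members, $10M total cost.
members = [20_000, 20_000, 10_000]
cost = [2_000_000, 3_000_000, 5_000_000]

gap = share_gap(cost, members)
```

Here the third group holds 20% of the members but 50% of the cost, so its gap is about +0.30, while the first two groups show negative gaps.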
Examples of Use (Teaching Notes)
In the first example of surface transect analysis, there are three transects taken of a river bed. The section reviewed is about a mile in length, with three transects taken a mile apart from each other. Imagine for a moment (not really, but imagine) that this is a place near the deep river valley of the Snake River, with smaller stream beds flowing parallel and meandering-parallel to the main stream edge due to well cut terrain features. The questions one might ask for this type of analysis would be: how do the peaks vary over the length of this study area? How are the tributaries interacting with the adjacent stream edge? How does the depth of the river impact the immediately adjacent land surface?
There are slight differences from one end of this region to the other. In some sections the peak appears to get taller, or the river bed deeper, while others have much shorter peaks and cliff edges. In some transects of the river and adjacent beds we see a well defined flood plain developed due to particular local features, and in some sections no flood plain at all. Even small changes across a fairly flat plain far away from the river edge can be detected, such as a reduction in a ridge left by an old parallel braided stream or narrow ox-bow formation, versus the well-carved, unchanging bed of a channel with only its river bottom topography changing over space. Each of these slight surface undulations can be magnified in remote sensing software (magnified z), but we still haven’t a way to statistically define each square area or cell of these points (grid cell centroids) relative to its neighbors, except by using some of the standard formulas in the software developed by Clark University and others. Those formulas are designed to measure changes along a z-axis by visualizing and comparing two perfectly overlain X by Y projections of the same space; they are not necessarily designed to evaluate different transects across the same size of terrain, adjusted for comparison, along the same modulating x-y defined transects.
If we view the above transects as modifications in the same surface over time, the following results can be obtained.
This time the transects used define surface change over time. With the above interpretation of the same lines we expect to see signs of erosion and aging, such as changes in the higher elevation regions, changes in talus and alluvium slope and topography, some filling in or topographic change in flat areas which originally had small peaks that eroded away, as well as the development of new depressions or holes in the terrain. We also see signs of possible erosion along fast flowing streams, or perhaps ox-bow and braided rivulet formations occurring adjacent to major stream or river beds, assuming the hydrological features, substratum and topography are set up for this.
The same terrain profiles or transects noted above for spatially distinct and temporally distinct transects can also be related to other undulating surfaces with unique spatial relationships or three-dimensional spatial features, such as faces:
The profiles of faces are much like transects taken of the land surface topography around rivers and streams (in fact they were used for the earlier examples). This means that a formula used to compare transects of a river bed or topographic region can also be applied to studying facial silhouettes.
Once the two are normalized in terms of size and placement of one key measurement index point or indicator, similar features for each of these transects can be compared with each other in terms of two axes–the first representing depth (x-axis) and the second representing distance between two nearby features (y-axis), or vice versa depending on how you term things. These values can be compared between objects to determine which two are most similar, if not where do they differ in some statistically significant fashion, and if you have a data library of these particular features, which one is most likely to have come from the individual you would like to match a profile to?
When comparisons are made of two objects, a normalization process has to take place to allow these projections to be compared. There are a number of ways to do this normalization and to see how this compare and contrast process will work. The transects themselves may be laid side by side for visual comparison, with the aid of the computer moving one surface and testing its fit once it is properly positioned. This is a geometric way of comparing the two forms, identical to the vector methods commonly employed for many such analyses.
Another way to compare and contrast is through a grid analysis, focusing on the edges of the two objects being compared. In spatial GIS raster systems, there are algorithms in place for exaggerating the differences that exist when two edges are nearly the same. In a technique similar to photometric methodologies, a grid can be overlain on the above profiles or transects to do similar grid comparisons. The key limitation of this methodology pertains to grid cell size: the smaller the cells, the more accurate your measurement method is. But smaller cells also require more storage space and more time for the calculations to be completed.
In the above example again, to compare the different surfaces, when the photo or image sizes do not match, these forms have to be reprojected and normalized–made comparable in size to each other. Then the statistical analysis technique is applied to see where statistically significant portions of the paired datasets exist. In the case of profiling, the input profiles are compared with a library and the best fit is found. In the case of comparing differences, where statistical significance exists using the numbers method developed for this analysis, the resulting output demonstrates where these differences exist and to what extent relative to each other. This is done by scanning the surface and running the equations used to compare two surfaces. The numbers then tell us where the best fit exists.
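The normalize-then-match workflow described above can be sketched as follows. This is an illustration under my own assumptions: profiles are lists of depth values, normalization means resampling to a common number of points and rescaling to a 0 to 1 range, and fit is measured by root-mean-square difference. None of the names or choices below come from the original method.

```python
import math


def resample(profile, n):
    """Linearly interpolate a profile to n evenly spaced points."""
    if n < 2 or len(profile) < 2:
        raise ValueError("need at least two points")
    last = len(profile) - 1
    points = []
    for i in range(n):
        t = i * last / (n - 1)          # fractional index into the profile
        lo = int(math.floor(t))
        hi = min(lo + 1, last)
        frac = t - lo
        points.append(profile[lo] * (1 - frac) + profile[hi] * frac)
    return points


def normalize(profile, n=50):
    """Resample to n points and rescale values to the range 0..1."""
    points = resample(profile, n)
    low, high = min(points), max(points)
    span = (high - low) or 1.0          # flat profiles stay at 0
    return [(p - low) / span for p in points]


def rms_difference(a, b, n=50):
    """Root-mean-square difference between two normalized profiles."""
    na, nb = normalize(a, n), normalize(b, n)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)) / n)


def best_fit(query, library):
    """Name of the library profile with the smallest RMS difference."""
    return min(library, key=lambda name: rms_difference(query, library[name]))


# Hypothetical library; the query is profile_a at twice the size and resolution.
library = {"profile_a": [0, 2, 5, 2, 0], "profile_b": [0, 1, 1, 4, 6]}
query = [0, 2, 4, 7, 10, 7, 4, 2, 0]
match = best_fit(query, library)
```

The query has a different length and scale from every library entry, but after normalization it lines up with `profile_a`. This sketch does not shift profiles sideways to align a key index point; that step would be needed for real silhouettes.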
The methodology I developed for population reviews performs the latter task and adds a variety of statistical tools in order to quantify when a change or difference is statistically significant. We can also term this indicator value a sensitivity index. One could begin a query by stating that only a change of more than 3% should be considered significant, thereby allowing for 3% error in the analytic method. Or true statistical significance values can be used, assuming the right equations are in place for engaging in this comparative analysis, so that only statistically significant differences between the two profiles or transects are illustrated whenever you are surveying or monitoring the outcomes of the project at hand.
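The simpler of the two options above, a fixed sensitivity index such as 3%, can be sketched in a few lines. The function name and sample profiles are hypothetical; the relative change is measured against the first profile, which is an assumption of this sketch.

```python
def flag_changes(profile_1, profile_2, sensitivity=0.03):
    """Return the positions where profile_2 differs from profile_1
    by more than the given fraction of the profile_1 value."""
    flagged = []
    for i, (a, b) in enumerate(zip(profile_1, profile_2)):
        if a == 0:
            changed = b != 0            # any departure from zero counts
        else:
            changed = abs(b - a) / abs(a) > sensitivity
        if changed:
            flagged.append(i)
    return flagged


# Hypothetical transect values before and after.
before = [100.0, 102.0, 98.0, 110.0]
after = [101.0, 102.0, 90.0, 115.0]

at_3_percent = flag_changes(before, after)
at_5_percent = flag_changes(before, after, sensitivity=0.05)
```

Raising the sensitivity index from 3% to 5% drops the last point from the flagged list, which shows how the choice of threshold controls what is reported.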
The next section details how a special method was developed for engaging in a statistical significance evaluation of two lines or surfaces using this particular method above.
IP Background. The entire methodology developed for this work is self-created and proprietary in nature, copyrighted, and not available for any professional use at this point. This methodology is now about 10 years in age, with several generations of development over the years. For now, there are no plans to release the details of this formula or the series of methodologies I developed to produce my results, which was the case for the hexagon grid analysis.
This analytic method was designed specifically for exceptionally large N groups, with the number of primary metrics rows amounting to millions or more. It is designed for comparing very large (or even very small) populations to exceptionally large populations. It can be used to compare the numbers and types of people in one state engaging in oil/gas consumption relative to another state in terms of gender and age relations, per area of research selected, even by various subsets of products or expenditures involved.
An additional tool was developed as well for testing and quantifying cost related outcomes, which will not be described due to the complexity of that tool and its underlying theory and formulas. Suffice it to say, this methodology is applicable to cost and other population related metrics and has no parallels in terms of resulting in a product that defines the entire population’s statistical state in terms of exceptionally small theoretical groups, which in this case are age and gender defined.
Introduction. Using one of the largest population based activities the current market place relates to as an example, health care, we can see how the population age-gender metrics tool can be applied across the board.
Let us assume for the minute that Company 1 has an excellent database designed for work with primarily prescription drug related information. Its database is managed by a series of SQL queries that provide an excellent platform for querying the data and pulling information on the specifics of what you are searching for. It has a very robust program designed to calculate hundreds of metrics at levels and forms limited only by the amount of data that is available in form and type (number vs. char, etc.). There are several dozen ways to subcategorize each and every datum in the medical or pharmaceutical dataset. There are even more ways to interpret use over standard periods of time, ranging from cost per unit of drug use to number of refills per year per prescription, on a monthly or quarterly basis. In between these two estimated values are such values as cost per unit taken, cost per 30 day period per rx, cost per week per patient, number of units required per given period of time, and number of containers needed per store setting per patient with a given disease history. The human part (error driving part) of this methodology is based on how we actively and cognitively define what is good and what is bad, such as deciding what is the best way to break down drugs into specific therapeutic categories, or how to best define the cost for a medication, such as by month/30 day period, per day, or per dose.
On the other hand, Company 2 has data that require a significant amount of processing before they can be accessed by users. These data undergo numerous predefined SQL queries in order to recategorize and recalculate the end results, which are a highly respected way of evaluating patient care within the medical system. The advantage of these data is that there is also the possibility of evaluating this information at the clinical level, depending on the form of medical data presented. In the end, your primary division into datasets is made based first on medical history, in which a systems based philosophy is used to categorize medical history based on both the clinical and script data. This standard method of evaluating cases was first developed by Yale during the 1970s, and has been popular because it combines certain related actions together into a single dataset. The drawbacks to this method are improper allocation of cost-utilization information (assignment of costs to the wrong reason they were accrued, due to other medical problems taking place during the same period of time), and the requirement of specific time periods in which each case can be said to still be ongoing, or closed based on the medical actions taken (last office visit, case closure following a surgery, etc.). The human (error driving) piece of this methodology is how these case/event differences are defined.
The difference between these two companies is that Company 1 is limited to working at just one level (for example in-hospital), whereas Company 2 works at two distinct levels (in-hospital and clinics). Whereas Company 1 provides a method of evaluation that is fairly simple to perform and direct in terms of its output, the method related to Company 2, since it requires pre-programming, is much harder to perform and therefore has less variation in how the information is managed for collection, making the outcome seem more reliable, multidimensional and systems based, but also less specific in terms of what exactly the outcomes are related to. For example, if someone were to look specifically at a special ICD-related treatment protocol, the first method has multiple options on how to call and filter the information appropriately, whereas the second method has this filter already defined and active in the system, but it is one which cannot guarantee any accurate ICD-related relationship between the rx and the specific disease.
If we wanted to look at Tourette’s syndrome for example, Company 1’s methodology allows for specific ICD and rx use, in multiple drug identifier forms. Company 2’s methodology does not allow for direct ICD use, only systems use, in which subsets have to be developed, and then those subsets evaluated again for one time and multiple time relationships within each ICD, under the assumption that a one time ICD related use may or may not be just an indicator of diagnostic testing activities at the clinical level. To confirm an ICD diagnosis, if we cannot employ the national HEDIS/NCQA method for pulling these cases into a unique dataset, then we have to take a look at the rx use level, assuming there is a drug the patient has been prescribed for Tourette’s syndrome. Neither of these two methods is perfect, but one is easier to accomplish than the other and therefore can be run automatically.
Ideally there is also the Company 3 option, in which Prescription Drug, In-hospital and Office related Clinical data are available in independent datasets, without the subcategorizing and redefinition of values required by the Company 2 methodology. Such a method however requires still more work to produce the final result that Company 1’s products generate. So how do we choose between the two? Company 1 methodologies have one set of applications and Company 2 methodologies have another set of applications. It is up to the customer in need of this information to decide which way of accomplishing this is best. It is up to the statistician to determine which ways are most accurate for measuring the specific outcomes in need of being evaluated.
If it is cost that is driving the need for such studies, Company 1 is easy to make use of and Company 2 perhaps too cumbersome and time-consuming. If it is public health that is the issue, Company 2 is the best way to go, provided that adequate performance tools are developed to assure that measurements can be made in an accurate and truthful manner, in a fairly automated fashion. In a study of prescription drug utilization compared with clinical utilization related costs and related activities, there tends to be a 4:1 to 10:1 clinical:rx cost relationship. This means that evaluating prescription costs alone is kind of the half-blind research approach to tackling this elephant in population health analysis. Only the blind man is telling us what the population health issues are, not the deaf, anosmic, ageusic, nonproprioceptive or non-tactile coresearchers.
The program I developed works at any of these three levels since the baseline dataset required for such evaluations relies solely upon a very specific way of interpreting people by age and gender, based on age in one year increments, treating age as a continuous surface across a plane that can be evaluated in much the same way that transect analyses are performed on such things as riverbed elevation above sea level transects or cross-range topography transects using Digital elevation models. Since numbers are just numbers, the evaluation of an age-gender pyramid can be interpreted much like any continuously changing linear set of data. We can look at the before and after, compare one transect to another, or determine a way to quantify the amount of difference existing between line 1 and line 2.
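To make the "age as a continuous line in one year increments" idea concrete, here is a small sketch that turns individual records into two lines, one per gender, indexed by age. The record layout, the function name, the "F"/"M" codes and the cap at age 100 are assumptions for illustration only.

```python
from collections import Counter


def build_pyramid(records, max_age=100):
    """Count people by single year of age and gender.

    records: iterable of (age_in_years, gender) with gender "F" or "M".
    Ages above max_age are pooled into the max_age bin; records with
    any other gender code are ignored in this sketch.
    Returns {"F": [...], "M": [...]}, each list indexed by age 0..max_age.
    """
    counts = Counter((min(age, max_age), gender) for age, gender in records)
    return {
        gender: [counts[(age, gender)] for age in range(max_age + 1)]
        for gender in ("F", "M")
    }


# Hypothetical records.
people = [(0, "F"), (0, "M"), (0, "M"), (34, "F"), (34, "F"), (71, "M"), (104, "F")]
pyramid = build_pyramid(people)
```

Each list can then be treated like a transect: `pyramid["F"]` and `pyramid["M"]` are two lines over the same age axis that can be compared point by point.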
The main feature of the formulas I like to use is that they search for significant differences, not just the amount of difference. Significant difference is distinct from other surface trend analysis formulas in that it is employed to define where important differences exist, not just due to size but also due to the likelihood that these differences may or may not be the result of simple chance. Based on variances in age and gender figures, we can tell whether a 50% change in N between group 1 and group 2 is significant or not, or in other words whether it is due to chance or not. Is a change in the % and n of people less than 18 years of age, for example, from 20% to 30% significant for the particular population you are looking at? If it is significantly different based on the analytic method being used, then cost projections for that larger group more than likely will be higher as a consequence of N, not as a consequence of chance. If the difference is not statistically significant, then the difference in dollar value is due to probabilities and nothing more, meaning less attention has to be paid to this problem. Applying this to a true set of numbers for the two populations, a 20% vs. 30% difference is significant when it involves 1 million people, as opposed to 10,000 people; this is due to the possible age-gender value variances each group can produce.
This also demonstrates why this methodology is meant to be used by very large companies with large datasets, not companies whose largest N in a typical study measures about 40,000 or less. There is a 95% CI to this method for N = 40,000, and a 99% CI for N = 80,000 (100,000 is even better). But this assumes normalized distributions, which mostly take place at N >> 100,000. The formulas are best applied to exceptionally large datasets. The higher the N, the more reliable the outcomes suggested by the study.
This next formula that I use to evaluate deltas was developed based on some formulas I wrote up back in 1997 in order to analyze three dimensionality. The initial research question at the time was ‘how does land surface 1 differ from land surface 2 over time?’ You have two surfaces, with wear and tear demonstrated over time on certain parts of them by changes in surface topography. You use this type of formula to make sure the two places being reviewed are essentially the same place with slight changes over time, or to measure how much change occurred temporally and where these changes that took place are statistically significant in terms of size and amounts of change.
The amount of similarity between these two places is thus what is being measured, and is what a formula needed to be developed for that would explain the amount of similarity. This formula set could then be related to several dimensions (x, and then y and z), to measure amount of identical form remaining. This same formula type can be used to compare people’s faces. It can be used in 3D form to evaluate both profile and depth of a face (nose size, eye cavity form, eyebrow ridges, forehead planarity, chin, etc.). The research question is how do we develop a formula that will tell you when a change in the surface is significant and to what degree is it significant? For example, a person could have had a nose job and a chin sculpting done, but have identical eye cavity depressions and ridges (items less likely to be easily changed, versus eye lid form and shape). We need a formula that tells you where the changes exist and to what degree are they different from each other.
Now, reduce this form of analysis down to two dimensions, focusing on transects. One can look at the transect of a land surface and determine where erosion has taken place above on the edge of a mountain and determine where the alluvial fan formed by gravel, sand and soil at the mountain base has widened and by how much, over a given period of time. This method is much like comparing just the profiles of the noses and eye brow ridges on two faces, before and after plastic surgery. This method can also be applied to curves with constantly varying shape and form. A migrating ridge on a curve can be identified, or the amount of difference seen in two separate curves that seem very similar can be evaluated, and that difference determined whether or not to be statistically significant. This method is applicable to analyses of age-gender-n population pyramid related curves. One can use this method to compare two population age-gender-n curves.
I developed a number of way to test the population age-gender curves over the years. Back in the early to mid 2000s these were used to analyze statewide statistics pertaining to health insured populations (see multipage section on HEDIS/NCQA work performance in BIOSTATISTICS/Quality Assurance/Population Health and Disease Monitoring. . . for more). At first I was just trying to define the populations included and excluded from my analyses statewide, but I realized that by applying this methodology to specific subsets of populations, the result population profiles could be used to explain the findings made at various clinical levels. Since formulas as simple relationships between numbers, sets and subsets, I realized this methodology that I was once employing at remote sensing and surveillance id recognition level between 1995 approximately and 1997 had applications elsewhere.
Looking simply at percent comparisons is one way to evaluate two age-gender-n graphing techniques, but finding out where there is a statistically significant difference between the two bumps or ridges is a much harder task to perform. This is what my formulas and methods of engaging in statistical evaluation of surfaces was developed for. You can have two populations, one with a fairly large population of kids, another with a smaller population size for kids with gender related differences (i.e. more young potential mothers <18), and you need to know whether or not this difference in kids is going to be statistically significant. It there are statistically significant differences, this may suggest that the two populations could behave differently in some sort of statistically significant cost-related way as well, in turn suggesting the need for additional intervention activities pertaining to that age group which are different for the two groups (i.e. more allocation of teachers and money of health education programs, or making future classroom size projections related to increased needs for health education programs).
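The proprietary formulas are not given in these notes, so the sketch below substitutes a textbook two-proportion z-test, applied separately to each age bin, simply to show what "significant at this age" can look like for two populations of very different size. The function name, the 1.96 cutoff and the three-bin toy data are assumptions; a real pyramid would have one bin per year of age.

```python
import math


def significant_ages(counts_1, counts_2, z_crit=1.96):
    """Flag age bins where the two populations' shares differ.

    counts_1, counts_2: head counts per age bin for each population.
    Uses a pooled two-proportion z-test per bin (a standard stand-in,
    not the author's method) and treats the bins independently.
    """
    n1, n2 = sum(counts_1), sum(counts_2)
    flagged = []
    for age, (c1, c2) in enumerate(zip(counts_1, counts_2)):
        p1, p2 = c1 / n1, c2 / n2
        pooled = (c1 + c2) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        if se > 0 and abs(p1 - p2) / se > z_crit:
            flagged.append(age)
    return flagged


# Hypothetical data: the same percentages in a group of 100 and a group of 1000,
# each compared with a base population of 10,000.
base = [2000, 5000, 3000]
small_group = [30, 45, 25]
larger_group = [300, 450, 250]

flags_small = significant_ages(small_group, base)
flags_larger = significant_ages(larger_group, base)
```

With identical percentages, the group of 100 is flagged only in the first bin, while the group of 1000 is flagged in all three. The percent difference alone does not settle the question; the population sizes do. Running one test per bin also raises the chance of false flags, which this sketch makes no attempt to correct.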
This methodology can be applied to any type of age-gender-n analytic method. It can be used to analyze consumer population-product related costs at the population areal level, such as oil/gas consumption between regions or states, by age of credit card user. It can be used to analyze luxury item expenses, or the amount of money spent on items not required for the basic lifestyle, such as purchases of alcohol products, over-the-counter medicines and nutritional supplements, the use of money for vacation or recreation related activities, or even health related expenses and specific need-related rates for people with specific medical backgrounds or of specific age-gender groups.
Application and Theory. Let’s say your job is to analyze sports. You have been hired by a large research group contracted with major sport industry companies to review the entire United States population, as many people as you can, in order to determine how they tend to engage in sports relative to gender and age. These sports are not just the typical big money sports that millions like to engage in, such as golf, basketball, softball, soccer, racquetball, tennis, horse racing, and stock car racing, but also less common events like mountain motorbike racing, horseback riding, bocce, badminton, chess, checkers, cribbage, and even Monopoly. Your job is to develop a way to query or survey people about their recreational hobbies, and then evaluate these for which ones reach a peak at which ages, by which genders.
To accomplish this, you set up a survey on a national site designed to run such surveys at low cost. You write up a standard series of questions, and target national distribution sites for the announcement of your survey activity. Your goal is to get several million people, perhaps more, engaged in this activity. You want people of all age ranges and all genders answering questions about every game or recreational, competitive, sports-like activity they engage in at their particular stage in life. Everyone who answers must report only those events they engaged in during the past 12 months, to make sure you can relate the age and gender directly to the events they like to engage in.
Currently, the size of the US population is a little over 312 million (as of 8-25-2012). A response rate to surveys of 5% for such large populations would be incredible. That would mean approximately 15.6 million responses. Normal surveys try to reach goals of several hundred, or at most 1500 to 2500. Due to time and energy, survey companies typically do not engage in supersized survey events. This is because of statistics.
The nature of the formulas used to evaluate groups is that to produce a statistically significant change in your initial sample population, you must double that size to have the greatest likelihood of doing so. If you double it again, now producing a survey with 4 times the original sample size, you have an even better chance of seeing just how far off your original sample set was. Typically what happens when you do this is you end up experiencing regression to the means. Your first sample size is unlikely to be anything close to perfect, but numbers-wise it is assumed to suffice. Your second sample size, with twice as many participants, provides you with a set of responses to compare with the first, and due to the role and nature of n in statistical equations, this new 2n is likely to show statistically significant change, if there is a likelihood such will happen. Since the 2n is better to work with, your final percentages and such are considered more reliable, and were achieved at the expense of at least double the workload required to get the survey done. Doubling this n again gives you 4n, which is even more work, but again yields a more reliable and trustworthy response, and more than likely yet another example of the regression to the means outcome.
These scenarios can be replicated numerous times, extending into 8n, 16n, 32n, 64n, etc. Each time you double the population, you provide yourself with the chance to statistically impact the final results. Without doubling the n, you may get an increase or decrease in certain averages, but these changes are unlikely to be statistically significant since you did not obey the doubling rule for n to test out your new population size.
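The text does not state which formula sits behind the doubling rule. As a minimal sketch, assuming the quantity being surveyed is a simple proportion and using the textbook standard error of a proportion, the following shows how much the uncertainty shrinks with each doubling of n. The function names and the starting n of 512 are illustrative choices, not part of the original method.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Textbook standard error of a sample proportion p from n responses."""
    return math.sqrt(p * (1.0 - p) / n)

def doubling_table(p: float, base_n: int, doublings: int) -> list[tuple[int, float]]:
    """Return (n, standard error) pairs for n, 2n, 4n, ... (illustrative helper)."""
    return [
        (base_n * 2**k, standard_error(p, base_n * 2**k))
        for k in range(doublings + 1)
    ]

if __name__ == "__main__":
    # p = 0.5 is the worst case for a proportion; base_n = 512 is an assumption.
    for n, se in doubling_table(p=0.5, base_n=512, doublings=4):
        print(f"n={n:>5}  standard error={se:.4f}")
```

Under this assumption each doubling divides the standard error by the square root of 2, so the gain per doubling is steady but modest compared with the doubled workload.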
The above model is what the standard survey companies follow regarding their engagement and evaluation of statistical outcomes. With the public health sector, this is why the n’s chosen by HEDIS/NCQA are what they are. To get a minimally reliable outcome, plus some, the n’s chosen for HEDIS/NCQA projects follow this ideology, logic and statistical philosophy. They add to their population N an additional number of people to study and submit results for, just in case some of those previously validated and listed as eligible are later found not to be eligible. In a large population public health study such as diabetes, an initial maximum number of eligibles for review might be considered somewhere between 450 and 485, to which 25 to 30 are added in case later eligibility issues are uncovered. A full study is considered to be somewhere around 512 to 525, the names of which are randomly chosen by a computer based on the size of your population, and it is your job to look up each and every one of these records for hits that are not obtained automatically as administrative measures (the database provides the needed data from an electronic database with claims and diagnoses or test results, thereby preventing any need for potentially error-ridden manual data pulls, entry and evaluation).
In a population of 10,000, 512 only represents about 5%, but this is the number of cases that need to be evaluated for a HEDIS/NCQA measure with large prevalence rates. Once the added 20 or 30 names are excluded from that value, we are actually talking about 4.8% being evaluated and considered representative of the entire population. Ideally, one can double that value to 1024 cases in need of evaluation, thereby jumping the percent representation to about 9.6 to 10%. The problem with this methodology, and why it is not done, is that this more than doubles the manpower-work time needed to perform a HEDIS/NCQA evaluation on population health. We would like to see everyone engaged in health care evaluated for their particular condition and all of its statistical facts, but the necessary time and manpower required for this is not there. Therefore, we evaluate only a sample of about 5% of the population and use that to represent the total population health outcome.
In the case where we evaluate 1024 people, or double this again to 2048, we have progressed to evaluating approximately 20% of the total population in the latter case. This is ideal because the percent is: a) >16.7% and b) highly likely to represent regression to the means in a favorable way. Favorable regression to the means occurs when the new n is large enough to be difficult to change. When the new n is that high, there is a sort of blindness to this statistical fact that non-mathematicians fall for when requiring a review of outcomes, and decide that they wish to wait much longer, until all the results are in.
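The percentages quoted for the 10,000-person population can be checked with a few lines of arithmetic. This is only a sketch; the constant names are made up for illustration, and the 30 added names is the upper end of the range given in the text.

```python
def sample_percent(sample_size: int, population: int) -> float:
    """Percent of the population covered by a sample of the given size."""
    return 100.0 * sample_size / population

POPULATION = 10_000      # population size used in the text
FULL_SAMPLE = 512        # full study size quoted in the text
ADDED_NAMES = 30         # upper end of the extra names quoted in the text
ONE_SIXTH = 100.0 / 6.0  # the 16.67% mark discussed in the text

if __name__ == "__main__":
    base = FULL_SAMPLE - ADDED_NAMES
    print(f"{base}: {sample_percent(base, POPULATION):.2f}%")
    for n in (FULL_SAMPLE, 2 * FULL_SAMPLE, 4 * FULL_SAMPLE):
        pct = sample_percent(n, POPULATION)
        print(f"{n}: {pct:.2f}%  above one sixth: {pct > ONE_SIXTH}")
```

Only the 2048-case sample clears the one-sixth mark; the 512 and 1024 samples stay below it.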
To assist in the above matter, let’s say you decide to say, ‘ok, I’ll wait, and enter the data as it comes in, and continue to evaluate at different stages, and over time see what happens.’ This is used to monitor outcomes and is a method that tells you just how wavering the response changes can be over time. In the case of medical education, it was found that over time, more and more people know the answers to the survey questions asking them how a particular treatment is to be managed. Whereas in the beginning, the first classes had many people providing the wrong responses before the class was given, and of course substantial improvement thereafter, in the case of an ongoing program for one year, it will be found that fewer people provide wrong responses before taking the program, thereby demonstrating less of an impact on the entire population afterwards. In a course where 90% do not know the answer, but afterward only 10% do not, that represents a 90-10% or 80% change. If a year later, 75% do not know and afterwards only 10% do not, that represents a 65% change, not as good as the first months through. This passage of information (the monkey theory) is what makes these types of surveys time sensitive methods of evaluation.
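The before-and-after comparison in the course example is a difference in percentage points. A minimal sketch, with a hypothetical function name:

```python
def percentage_point_change(pre_wrong: float, post_wrong: float) -> float:
    """Drop in the percent answering wrongly, in percentage points."""
    return pre_wrong - post_wrong

if __name__ == "__main__":
    # First classes: 90% wrong before the course, 10% wrong after.
    print(percentage_point_change(90.0, 10.0))
    # A year later: 75% wrong before the course, 10% wrong after.
    print(percentage_point_change(75.0, 10.0))
```

The post-course figure is the same in both cases, so the smaller change in the second year comes entirely from the improved starting point.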
You can also use this review of results over time to define the rates of change for knowledge over time. More and more are likely to know the answers as time passes, even without taking the course. If one year is the official end date for this particular thing being taught, then over time you expect regression to the mean to take place. In this case, the final mean or number of people who know this particular piece of knowledge is greatly improved, and in the end is at its true mean for that population of participants. These participants in turn may only represent 1% of the medical profession. So do they represent the US population of physicians as a whole? They do if their total N is sufficient to sample enough doctors to produce answers which represent the doctors’ responses as a whole or entire group. This brings us back to the first part of this story and lesson: once the number of participants in such a program is at 5%, you can feel pretty good about the sample, since it does match the HEDIS/NCQA protocol. Moreover, since you allow your sample to go well above that 5%, by trying to involve tens of thousands of doctors in very good programs, your validity and reliability issues begin to fade away. This new study is a very good indicator of what the remaining population of doctors are like once the 16.67% point is reached in percent of whole population sampled.
Why 16.67%? If you look at a typical bell curve, with normal distribution, you find approximately 68% within +/-1 standard deviation (sd), 95% within +/-2 sd, and 99.7% within +/-3 sd. When you look at this type of curve, it is hard to find a place where a normal bell curve can be drawn that demonstrates a certain degree of offset from the overall averages and distributions of results. A 16.67% sample in bell curve form is limited to placing its peak somewhere near the center of the entire curve; it can result in two peaks, one at each end of the curve, with little of the results being close to the true average or peak of the single peak bell curve, but this is also unlikely to occur. We are more likely to see some form of disorderly scattered distribution of results centered around the true mean than a perfectly ordered response with multiple peaks or just one peak perfectly offset, resulting in significantly low or high outcomes. Therefore, the 1 in 6 sample of a population can be considered fairly ideal, and most likely representative of the total population. The means may not at all be the same for all of N and 1/6th of N, but the distributions are expected to be representative, and the results of any math work on this also representative. Thus the 1/6th sample size results in an outcome ranging from statistically representative to nearly equal to the perfect outcome for the entire population, demonstrating a true regression to the means.
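The share of a normal distribution falling within a given number of standard deviations can be checked directly with the error function from Python's standard library. This sketch only verifies the bell curve coverage figures; it does not derive the 16.67% rule itself.

```python
import math

def coverage_within(k: float) -> float:
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"+/-{k} sd: {100.0 * coverage_within(k):.2f}%")
```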
This brings all of the math background back to the main theme: analyzing people for a particular area of interest, a particular cost-related behavior, or a particular purchasing behavior in relation to a particular product type, such as buying accessories for one sport versus another, during a particular period in life, and based on whether the person is male or female.
Your work begins with a database of people who represent 20% to 25% of the entire population that is to be involved with this research. This 20-25% is greater than 16.67% or 1/6th, so we are above the expectations required to consider true regression to the means as a possible outcome. In other words, our population is large enough to demonstrate some kind of regularity and fluidity of change relative to age, per gender, so as to not demonstrate that much meandering back and forth above and below an expected true outcome for the final numbers to be evaluated. With a sample above 16.67% of N, you expect your outcome to show some sort of smooth distribution, representing the entire population, which, if checked and rechecked with a high n each time, would show further regression to the true means for all of N each time you double your sample size n.
To simplify this problem of sports and how people engage in sports and recreation based upon gender and age distributions, we will apply this method to a simple population of 300 million instead of 311 million, and assume a sample could be obtained equal to 75 million, or 25% of the entire US population. These people have an expense history in the national database that can be evaluated relative to sports-related and recreation-related purchases, including what we normally consider to be sports activities as well as recreational activities that are not normally considered to be sports, but nonetheless are recreational, such as board games, electronic games, basic cards, etc.
Example 1. “Kickball”
In the first example, an entire population is evaluated for participation in one of the most common games in elementary school: “kickball”. The name “kickball” is used as a theoretical name for a true dataset. Kickball was chosen due to some similarities this sport has with the data uncovered. In the following graph, we find “kickball” tends to be favored “by the boys more than the girls” and so results in the following age-gender distribution of percentage of people participating in this “sport”.
First notice there is a significant difference between boys and girls as stated, but also note how engagement in this activity tapers off significantly by the time one is in high school (14 years of age) and is almost non-existent by college age (19 yo). Adult involvement is nearly absent, but with a little activity on behalf of the men versus women over 22 years of age (father-child association?).
The above graph is the standard method for evaluating an event by age in one year groups. The x-axis numbers are the results of a relative frequency calculation–the numbers of people who answered this particular question are evaluated relative to an overall base population. The numbers of replies for yes and no questions are calculated relative to the numbers of surveys completed by that one-year age group.
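The relative frequency calculation described above can be sketched as follows. The record layout (age, gender, yes/no answer) and the function name are assumptions made for illustration; the original data format is not given in the text, and the sample records are invented.

```python
from collections import Counter

def relative_frequency(responses):
    """Share of 'yes' answers per (age, gender) group.

    `responses` is an iterable of (age, gender, participates) tuples, a
    hypothetical layout. The denominator is the number of surveys completed
    by that one-year age and gender group.
    """
    completed = Counter()
    yes = Counter()
    for age, gender, participates in responses:
        completed[(age, gender)] += 1
        if participates:
            yes[(age, gender)] += 1
    return {group: yes[group] / completed[group] for group in completed}

if __name__ == "__main__":
    # Invented toy records, not data from the study.
    toy = [
        (6, "M", True), (6, "M", True), (6, "M", False),
        (6, "F", True), (6, "F", False),
        (40, "M", False),
    ]
    for group, share in sorted(relative_frequency(toy).items()):
        print(group, f"{share:.2f}")
```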
This tells us how many, and at what frequency, but doesn’t tell us where the numbers are statistically significant. To evaluate this, a standard formula was applied for statistical analysis, with a slightly different take on how to apply it. The resulting levels of significance were then written into the formula, such that a value of 1.0 means one value of statistical significance, 2.0 the next value, 3.0 the third value, etc. The transition from 1.0 to 2.0 has built into it a correction that prevents exaggerating the level of significance. The critical change from less than 1.0 to greater than 1.0 is important, but each step up in level and amount of significance is tapered or suppressed somewhat to avoid misrepresentation. This means that a response of 2.0 or greater is certainly significant, without doubt as to accuracy and amount. When we apply the statistical significance measurement method to each one-year age increment, we see the participation expressed as statistically significant or not, with the following results.
The above values are based on a statistical profiling of the significance of these distributions. To correct for this, values that are not statistically significant are not charted, and those which are significant are defined by degree or magnitude of significance using a log equation. The following depicts these log corrections:
All of the values charted in the above figure are greater than 1.0, the number below which there is no significance. This figure essentially states that the childhood participation is still the highest, with a peak around 6 or 7 yo. Older men and women have an equal likelihood of participating, but as age increases, men participate more than women. Also note that gender differences are minimal in the young parenting age, about 19-23 yo.
This way of graphing the probabilities of engaging in a particular activity suggests that in terms of frequency of events, there are numeric differences that play out in the graph, but once true numbers are taken into consideration in relationship to probabilities based on base population values, boys and girls are almost equally likely to behave differently when given the option of engaging or not.
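The text does not give the formula behind the significance levels or the log correction, so the following is only a sketch under stated assumptions. It substitutes a pooled two-proportion z-test for the unnamed "standard formula", treats 1.96 as the critical value, leaves values below the threshold uncharted, and tapers values above it with a base-10 log. All names and constants here are hypothetical.

```python
import math

Z_CRITICAL = 1.96  # assumed two-sided 5% threshold; the source does not name one

def two_proportion_z(yes_a: int, n_a: int, yes_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic (a stand-in, not the author's formula)."""
    p_a, p_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    return 0.0 if se == 0 else (p_a - p_b) / se

def significance_index(z: float) -> float:
    """Ratio of |z| to the critical value; 1.0 marks the significance threshold."""
    return abs(z) / Z_CRITICAL

def tapered_index(index: float):
    """Return None below the threshold (not charted), else a log-tapered value."""
    if index < 1.0:
        return None
    return 1.0 + math.log10(index)

if __name__ == "__main__":
    # Invented counts for one age group: 300 of 1000 boys vs 200 of 1000 girls.
    z = two_proportion_z(300, 1000, 200, 1000)
    idx = significance_index(z)
    print(f"z={z:.2f}  index={idx:.2f}  tapered={tapered_index(idx):.2f}")
```

With this choice of taper, an index of 1.0 stays at 1.0 while an index of 10 is drawn at 2.0, which mirrors the described behavior of suppressing each further step in significance.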
Part II. Examples of Uses
If we apply this to games that are very popular in childhood, but somehow manage to remain popular, we find the following results.
There is a fairly equal distribution of behavior during the childhood years, an elimination of these recreation activities during the young to middle parenting stage for females, especially no involvement by mid-age males, with the second peak in older male behaviors occurring during the final years in life. This type of distribution will be seen for certain other behaviors and lifetime activities with similar age-gender choice of involvement or participation.
The following is for an activity with tendencies to impact mostly people in their midlife years. For this group, a similar activity shows a tendency for men to be more engaged exactly at midlife, with the same results seen for women although for a much shorter period of time. There is also a peak in female engagement in the middle school to young teenage years, with only a momentary involvement by older teenage males.
We can in turn relate the above kinds of distributions to other activities with age-gender relationships. Each of the above may be related to age-gender distributions for specific diseases and used to develop extremely informative insights into how a disease impacts the lives of certain age groups of patients. The methodology employed for the middle example charted represents raw statistical significance figures, with the peak age for these differences very well-defined. Applying a log to this methodology smoothed out this peak, emphasizing more the range of people actually demonstrating a statistically significant impact. The second chart depicts these impacts only when they are significant and better. The third chart gives us an idea of the relative amount of impact, with results above the critical threshold depicted in more equal terms.
Recreational Substance Abuse
This next topic is a very sensitive social issue. Individuals in the survey who confess to drinking alcoholic beverages as a recreational activity were evaluated for the frequency of engaging in this risky behavior based upon their age. We expect adult behaviors related to excessive alcohol consumption to be fairly substantial, but considerably less than the drinking activities of teenagers, and in a statistical sense not that different across the different age bands. Teenagers on the other hand were very active, especially at younger ages, with females demonstrating a significant drop in this experimental behavior linked to socialization, whereas males continued this into the later post-high school, post-college years.
When the same series of questions was asked about tobacco consumption, similar behaviors were seen, but with several features that were uniquely different.
The male age peak of the mid-20s recurs. The female age peak shows an older age preference for onset of smoking activities. The value of the log-interpretation is demonstrated in this example: it makes obvious the differences in the age groups at which these experiments and/or decisions to engage in unhealthy smoking habits commence in the typical population. The peaks in the very young and very old age groups are, as in the examples provided earlier, a consequence of the numbers related to this methodology and are to be ignored for this particular comparison. [The formula for this did not correct for nulls and low N in specific age range window settings.]
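One way to avoid the spurious peaks at the youngest and oldest ages is to blank out any age group whose count is missing or too small before charting. The sketch below is not the original formula; the cutoff of 30 and all names are assumptions, and the sample values are invented.

```python
def mask_low_n(rates: dict, counts: dict, min_n: int = 30) -> dict:
    """Replace rates from age groups with missing or small counts by None.

    `rates` maps age -> rate, `counts` maps age -> number of respondents.
    min_n = 30 is an assumed cutoff, not one stated in the source.
    """
    masked = {}
    for age, rate in rates.items():
        n = counts.get(age)
        usable = rate is not None and n is not None and n >= min_n
        masked[age] = rate if usable else None
    return masked

if __name__ == "__main__":
    # Invented values: tiny groups at the extremes produce extreme rates.
    rates = {2: 0.9, 25: 0.3, 95: 1.0}
    counts = {2: 4, 25: 5000, 95: None}
    print(mask_low_n(rates, counts))
```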
Recreational Quaalude, LSD, Speed, Cannabis and even Nutmeg and Datura use also demonstrated some minor age-gender peak differences, but overall showed that the highest risk group was again the adolescent-young adult groups, i.e. as follows.
Sports or Athletic Injuries
A review of Sports and Recreation also provides an opportunity to survey individuals on the types of sports injuries sustained at particular age ranges. This information can then be correlated with the age association certain types of sports have with certain types of injuries, for example tennis and elbow injuries or fractures, or softball/hardball and arm fractures.
With elbow injuries and fractures, we find a bimodal influence suggested by these recreation survey results. Males demonstrate a tendency to experience events related to this when under 21 years of age and between the ages of 48 and 57 inclusive. Events among males under 21 years of age are expected to consist primarily of sports related events, whereas those between 48 and 57 years of age are probably experiencing the results of a combination of risk behaviors and activities, ranging from fractures due to athletic related activities to those due to day to day domestic and outside work related activities, perhaps in combination with calcium-related bone loss in a few percent of the respondents. Women on the other hand have a much broader age distribution, suggesting non-sports related activities for some of the age groups. This is especially true for the 21 to 44 years old age group, for which the exact causes for fracture are possibly quite complex. The decrease noted for the age range 43 to 47 could be the result of response rates, but also possibly effective treatment of this age group most often associated with this type of age-related metabolic problem.
If we relate the above sports-related elbow injury/fracture to an injury/fracture lacking a common relationship to sports, hip fractures (below), we see how gender and age, absent a sports related cause, impact the distribution of fractures over time. In addition, women obviously experience this problem a great deal more than men.
The same method applied to radius-ulna fractures versus tibia-fibula revealed similarities as well. The former has strong childhood-early adult age links, whereas the latter demonstrates an ability to continue into adult years, although not in any highly significant manner for men, versus the high significance noted for women regarding the fracture type.
Again, comparing the above with a non-sports type of fracture–a femur neck fracture due to aging and quite often unidentified osteoporosis:
The unexpected peaks at midlife for women reveal to us something we hear about, but never really think about. There are specific age bands when women are very likely to demonstrate early onset of osteoporosis related symptomatology: a fracture experience that suggests that later in life they will suffer even more from this condition. The light green humps in the third graph offer such a prediction, and can be applied to designing an intervention program for preventing these fractures from costing us more in the future.
Still, such events in preventive care are not engaged in by the typical system. Normally an insurance program or preventive care program targets much larger populations, overspending because the age bands used to define those who are at risk are drawn in a fairly generalized fashion. Since the above graph depicts exactly which age bands are at risk, it can be used to better target any intervention plans that are developed, and reduce unnecessary costs. The limiting factor is having a manager who can understand these latter concepts and plan his/her intervention program activities accordingly. This is an example of how a sports injury related study can lead to the solution for another more important long term preventive care issue.
Part III. Evaluations based only on Prevalences
Stages in Life and the Progress of Disease
There are specific stages in life during which certain diseases tend to appear and prevail. This way of interpreting disease and age resembles the teachings already out there about psychological stages in life and how conditions can progress either into some sort of behavioral and psychological malady, and then into a physiological problem with a possible neurochemistry phenomenon underlying the conditions, and/or develop more aggressively in this physical disease direction, resulting in a condition which manifests itself very physiologically and in turn behaviorally and cognitively. The psychiatric interpretation of diseases that manifest in such a way might state that a particular condition is completely genetic or organic and of neurophysiological, neurochemical cause, whereas a psychological interpretation might state that a condition is manifesting itself as a result of the sociocultural setting and mindset of the individual with that particular manifestation, syndrome, behavioral problem, psychiatric condition, or predefined ICD defined diagnosis, what have you.
In physical medicine, there is also this way of interpreting diseases based on behaviors that are linked to age and gender and the surrounding environment. Some of these physiological manifestations occur regardless of how we behave and act, others are in part a consequence of our behaviors and misbehaviors and the way that we live, but also biological in nature and manifestations as well. Still others are almost completely a manifestation of the fate of where and how we live, appearing as a consequence of our reactions to our environment.
We can review diseases in people in a fashion that takes into account these varying ranges of responsibility for illness, varying from innocent to not-so-innocent reasons for pathogenesis. By this method, the following age ranges and disease types can be defined:
- Newborn diseases, due almost completely to the physical state of a person and his/her exposure to the environment
- Childhood diseases due to environmental causes, with some human-generated causes for increased risk of disease development
- Late Childhood-Early Adult age diseases due to a combination of physiological, psychological and behavioral factors
- Mid to late Adult age diseases introduced due to complex genetic, physiological, psychological and behavioral causes, along with age-linked causes beginning to appear in the scenario as well, in particular as unhealthy chronic disease manifestations.
- Late to Very Late Adult (Elder) age diseases due to progressive disease complications, mostly linked to chronic disease related side effects, complications and physiopathologically induced somatic and psychiatric changes.
Type 4 (Mid to Late Adult, the fourth item above) can also be broken down further into a younger and older adult age group. The younger group may be treated more as preventive, with the goal of reducing costs for care in their future, and the older group treated as a preventive/remedial group, with a goal of improving Quality of Life and reducing costs due to long term, imminent health related complications. This action is necessary in order to prevent Type 5 (Late to Very Late Adult) medical problems from prevailing in incidence, cost and mortality, such as a diabetic who in the later stages develops retinopathy, peripheral neuropathy, gangrene related limb loss and end stage renal failure.
The above ways in which disease and the aging phenomena impact the body result in particular age-gender pyramid shapes, that make the progress of a long term disease understandable and often very predictable.
The amazing thing about this way of interpreting disease and predicting health is how often the characteristics or traits related to a particular ICD type recur between seemingly unrelated disease phenomena. Due to the way in which we classify disease, we often define all of the baseline features for a disease, for which reason we have a hard time seeing the link between two diseases in two completely different physiological organ systems. Is it possible that diseases which occur in midage in the gastrointestinal system could have some basic human physiological, environmental and behavioral features that overlap with a completely different organ system, such as somatic pain and dermal sensitivity? For example, such a correlation, if it does exist, would suggest that GERD and Irritable Bowel Syndrome have some cause-effect relationships directly linked to fibromyalgia.
Newborns and Young Children
The above charts represent prevalence rates, which is to say these are the numbers of individuals afflicted, diagnosed or under surveillance for a particular condition/ICD at the “Newborns/Young Children” age range. Without revealing any ICD identifications, suffice it to say the first condition to the left is one which occurs primarily right after birth, and decreases rapidly with time soon after birth. Notice there is a slight rise in prevalence in the females more than the males between 20 and 30 years of age as well. This tells us that such a condition does not only occur right after birth, but also rarely occurs or recurs in the older patient. The fact that it is rapidly reduced in rates by the time 3 years of age is reached suggests this has something to do with environmental exposure. It could be a condition that ensues soon after environmental exposure occurs in newborns, such as an event leading to an infection or an infectious disease state, one which is very rarely seen in adults.
The second condition is a typical childhood related infectious phenomenon. Its peak is at the 1-2 year period, reducing more quickly after 9 years of age, and it then plateaus at very low levels in individuals between 20 and 45 years of age. This would be suggestive of a disease that is environmental and/or infectious based, with limited tendencies to develop a secondary peak in the younger adult years. In the case of an environmental disease, this means that the body in young adults has perhaps completely adapted to the disease related problem and its causes. For infectious diseases, this behavior would suggest that immunization either remains effective past the age of 18 and does not require some sort of revaccination process, or whatever physiological, anatomical and other conditions that made the young child very susceptible during the younger years no longer exist in the adult body, such as events due to an immature immune system. We know this is a disease related solely to the surrounding settings since it does not peak at <1 years of age like the prior. Unlike the prior, which initiates immediately after birth, this requires some sort of human and/or environmental intervention for the condition to initiate its pathogenic process. This minor difference in the 0-2 year old age group counts is very important to better understanding the disease or medical condition and its cause and effect relationship with the human body and with people.
The third condition is a genetically based condition which normally is diagnosed immediately after birth, and tends to have increased prevalences peaking at 8 years for males, 2 years for females. The reason for this difference is uncertain. Some human behavioral and environmental engagement processes are required for the condition to set in. Notice also the secondary peak in the male population at about 20-22 years of age. This suggests that there is a tendency either for the disease to become more likely to manifest in the early adult years, or for young adults to suddenly become more susceptible than they were during their teenage years, like when a childhood immunization wears off and requires revaccination, or this takes place due to a number of features involving lifestyle, physical body and environmental changes, and the effects these have on the genetic nature of the individual. In cases where genetics has the effect of lessening the survivability of a body, we would expect to see this sort of reduction in numbers begin to appear due to the mortality rates these conditions result in. Longevity for these individuals is lessened, with few surviving after 50 years of age. Also note, there is a small peak generated on the female side of this graph. This suggests that there is some sort of natural selection process possibly going on: these women are in their reproductive years, enabling the genetic trait to continue and express itself in the younger population. This is no doubt a very controversial type of behavior that will be seen to some extent in many genetically based diseases. While such cases illustrate the value of this way of looking at disease distributions across a large population, they also demonstrate the moral and ethical problems such analyses might result in.
The fourth condition is a psychological behavior pattern that manifests itself quite soon after childbirth (>1 year of age), with female prevalence twice that of male for the 1-3 year olds, followed by two more prevalence peaks for females at 13-15 and 35-47 years of age approximately (again, reproductive years). There is a possibility that this is a biologically-based behavior change, but much of the evidence demonstrates it to be mostly of a behavioral nature. Any childhood cases manifested as neurobiological behaviors triggered by psychosomatic and possible endocrine causes are presumably treated effectively by 15 to 20 years of age, or the problem simply is outgrown and goes away. Also note that during the youngest and oldest childhood years, females outnumber males in regard to prevalence status. There is a primary peak in prevalence at ages 2-3, another at 10-11, and a third at 37-42 (again, reproductive years). Performing a statistical test would show that the second peak reflects gender-defined rate differences.
Three Progressive Conditions
The next three ICDs are for genetically based syndromes or diseases, which are almost immediately diagnosed but tend to progress in life, and as a consequence even undergo a late diagnosis in some cases. It is important to note here that there are a variety of causes for pyramids to be generated in this particular form. The attempts made here are to retain objectivity with the evaluations of the results presented, with a little subjectivity added in to disguise the true causes for the conditions illustrated. Suffice it to say that this method has been run on hundreds of conditions, enabling some of these differentiations to be developed, but they are not revealed in much detail right now in order to maintain IP security.
There are several features here with genetic diseases that are possibly tied to natural events, normally not evidenced by past work methodologies. Nature has its (his/her) way of varying events over time, as if in some sort of step-wise, undulating manner, which is what these three conditions demonstrate. The genetic shift concept is well defined in past writings, in which a particular trait moves around a population rather than staying within just one narrow band of population groups that are somehow genetically defined. It may be the disease itself and its genetic requirements, or population behavior fluctuations seen at the gender level, that result in these fluctuations. This phenomenon could be what is at play here regarding age-gender distributions and their changes over time.
These undulations may also be a side effect of the methodology I employed for evaluating the data. The undulations from one gender to the next are seen with the first scenario to the left, for ages 12 to 45. In the first graphed example, this could be due to artificial peaks displayed at 2 (M), 8 (F), 12 (M), 22 (F), 27 (M), and 35 (F). The second peak shows some smoothening of these peaks, even though they are much greater in number (this disease is rare according to x-axis figures, but more evenly dispersed in terms of 1-year age brackets according to the figure). The third peak represents the smoothening out of the curve once large enough numbers of people are graphed and evaluated using the formulas that were developed.
This makes the first two peaks interesting but not necessarily something reportable with much certainty and reliability. The third however is very reliable, and demonstrates important age peaks for the condition for separate age brackets, male versus female. If this third peak represented a physiologically-linked genetic disorder, we see considerable gender specificity for the 21-28 year old age group, suggesting that something that happens with males is creating an increased incidence. On the other hand, sociocultural settings and the behaviors linked specifically to men may have some impact on developing this peak. If this disease were of a psychological or mental health nature, this would be an interesting finding to report and follow up on. Like several other diseases with this same type of gender-specific, age-linked prevalence outcome, there is a much larger peak for younger women than there is for men at their large mid-age peak. Women are perhaps more likely to be biologically different from men once menses begins, suggesting this is a socioculturally caused age-gender relationship at least in part, and possibly a directly or indirectly linked genetic condition, which the medical problem is most commonly thought to be related to. The larger peak in women in mid-childhood years could also be due to selective examination of potential victims of this condition, meaning women are more likely to undergo visit-related care and pharmacological treatment than men until the men reach 23 years of age, at which point they outnumber the women in the medical books.
In sum, the conditions progress from left to right as extremely rare to moderately rare. These syndromes are mostly recognized from birth on, and tend to manifest themselves in females at one stage in life and males in two or three others. The rarest condition on the left tends to be more fatal than the other two, but this might also be a result of its overall scarcity as well.
Mid- to Late Childhood Diseases and the Environment
The “Environment” related to health in children has two major parts to it. There is the standard physical environment and its relationship to childhood health that we recognize, and there is the sociocultural environment that has an impact on childhood health. These two environments are very interactive and become increasingly interactive as the child develops a better understanding of his/her place and role in each of these and adopts new habits and behaviors in response to this learning experience.
In the next health pattern we have two medical problems, one age defined as involving a child, the other used for but not restricted to the age of an adult (>17).
The continuation of this problem well into adulthood suggests a condition that is not age-specific and not biological in a way that is age-linked, like early exposure to infectious diseases is with children who don’t undergo immunizations. Instead, this problem is usually psychosocial in nature, with age acting as a determinant due to higher frequencies for specific age ranges. Notice the significant difference in gender distributions for the second example. Since a biological cause is ruled out, this points to a gender-related, indirect lifestyle and environmental influence as a possible cause, or to an age-specific, socially predictable human behavioral cause. For both examples above, female-related events significantly outnumber male-related events.
There is also a systems-related effect that is responsible for the sharply defined cut off for the left condition or ICD, and the well-dispersed age distribution for the second condition or ICD. The first is defined as a child syndrome, so age is a limiting factor in how the case gets counted in overall epidemiological analyses and in applying these results to a population pyramid. The second ICD demonstrates a tendency for people to not always use the terms “child” or “childhood” as a part of their diagnostic logic. The second, “adult” condition continues well into the later years of life. Also note that there are very gender-specific differences from the teenage years on, and then a slight leveling off of the incidence rates across genders, with females slightly greater than males. Again, this is a behavioral psychology condition being evaluated for the two age ranges and is used in order to demonstrate how the judgment made of risk (by the PCP and data tech) can impact results. These two ICDs are normally not evaluated as a single condition due to underlying sociocultural meanings attached to each. Each has distinctly different preventive methodologies that have to be engaged in when their rates are increased. So the two are typically not evaluated together, and the young cases of ICD#2 are typically not included in the ICD#1 pool, although this could in theory be done for any such research.
These next examples (charts above) detail a behavioral-cultural syndrome and a biological, preventable infectious disease condition. Both have gender-specific targeting effects taking place. That with the socially defined behavioral-cultural causality is gender targeted for social reasons, which require a unique form of intervention process to be prevented from recurring. The infectious disease gender differences for the other condition may be culturally based or the result of a combination of human behavior and attitude/culturally based events and activities. In other words, certain aspects of the gender-defined lifestyle relate back to likelihood of infection and spread, versus the other which shows a likelihood of having culturally defined gender-specific actions impacting the distribution pattern. There is no difference in age-gender relations for a non-sociocultural event, for its causes are completely environmental and naturally based. Unless the environment and nature are impacted by gender, age distributions are identical for conditions without the prejudice of age playing a role in the condition. (The one on the left is sociocultural, that on the right is due solely to natural features and human physical state.)
The intervention processes for assisting in each of the two above scenarios have obvious differences. The second ICD requires considerable efforts be made at the psychological-counseling level. The first problem or condition requires more of a physical form of intervention focused on disease prevention and physical treatment, but may have a sociocultural-behavioral component to it as well. This sociological interpretation is suggested strongly by the great reduction in this condition that occurs by the early 20s. Medical conditions that prevail during the early years of life tend to be related to exposure to environmental features, for which biological and behavioral prevention practices have not yet been fully developed, or the condition has a reduction in prevalence due to early years fatality, which is not the case for this condition at all.
Late Childhood/Early Adult, or Young to Middle Adulthood
The teenage-young adult years are usually the healthiest years. The more common ailments during these years are going to be socioculturally caused or predicted, physical ailments linked to social activities and the body’s state, and at times early or continued onset of a chronic disease that impacts these years of life. Those diseases specific to these years, demonstrating a tendency to go away, are infrequent in the ICDs. These are also not included much in the standard HEDIS and NCQA study options reviewed elsewhere during the course of this work, and are only on occasion included in regional and state-requested or overseer-recommended studies of Medicaid and Medicare patients.
The following are two very strongly gender-linked STDs, meaning that they show a tendency to favor one sex or the other in the clinical setting. We know from other STD studies that some gender-related diseases and gender-favoring STDs are tested for specifically in just one population or the other. These diseases are cross-gendered and are treated and tested for in both genders, but tend to present clinically in one more than the other.
The following pair of conditions is common to and strongly linked to the late teenage and early adult years. They are very closely linked to each other, and are strongly gender linked and behavioral in nature, with one demonstrating a more aggressive behavior with tendencies to appear into the later post-early adult years. These are examples of two purely sociocultural conditions induced by personal behavior changes, with minimal linkage attributed to genetic cause (although this will no doubt change due to changing, increasing biomolecular technologies).
The numbers of male cases for both of these conditions also suggest that it is unlikely a genetic cause exists. The possibility of genetic causes is not totally eliminated, but based on the previously described examples of genetic onset diseases, we expect genetic diseases to become progressive once they develop (some do develop or get diagnosed only at later ages), and either cause a mortality that results in reduced prevalences for older patients, or continue to show increasing prevalence as the older age groups are evaluated. The narrow age band for prevalence in males for these two ICDs suggests a culturally-induced change occurs: these people no longer have that medical problem once they reach their late 20s and are required to be ‘more productive members of society’ (using the common paradigm and lingo). The first condition, the worse of the two, has a lingering age-related incidence curve; the second has a very narrow band, again linked to sociocultural definitions, interpretations and related human behaviors.
Middle to Late Adulthood
The middle to late adulthood years should be interpreted behaviorally and socially as the peak periods of cognitive behavior, physical and mental work productivity, high rates of healthy or recreational as well as unhealthy behaviors, and high rates of stress-related, disease-like consequences due to the lifestyle decisions that have been made and are actively and regularly engaged in.
The diseases that present themselves during this age band can be interpreted as socially defined events, often even outweighing the biological nature of their onset. This could be interpreted as the psychosomatic period of life, with peaks for those ages and disease patterns most often linked to psychosomaticism (as defined especially in the 1940s and 1950s, and perhaps early 1960s). The contemporary medical philosophy relates many of the contemporary diseases with established pharmacological therapies recently developed. These unique neurochemical explanations did not exist during the 1940s to 1960s, when a lot of the diseases we see with this midage peak had a link to some sort of psychosomatic origin.
For each of these conditions there is a peak age followed by a drop in prevalence (remember, relative prevalence rates use a formula that automatically corrects for this annual mortality rate). This drop in prevalence is due to reduction in psychosomatic behaviors, perhaps as a result of age-specific sociocultural changes expected with the post-retirement years.
The following is a basic form these kinds of conditions produce when evaluated using the new statistical method.
The following are examples of the types of outcomes generated with this formula and methodology (just relative prevalences are displayed):
Notice how the first two demonstrate relatively early onset, one with relatively early reduction in prevalence for assorted reasons, the other due to its early mortality. In the second example there is a significant difference in relative prevalence for deaths during the reproductive years, with males more likely to die due to this problem, ten years before the women experience the mortality of their condition. If this were a behavioral psychology form of disease, this would have social implications in need of further pursuit. If this were a genetically-based disease, such a finding could have tremendous moral implications. If it were some physical disease brought on by health-related activities and behaviors, its sudden changes in mortality for men versus women at 20 years of age may be preventable.
The third example above is a disease with primary manifestations in the middle years of life. Like the first its relative prevalence for females is greater than that for males. The question to ask is why the reduction in prevalence from about 45 years of age and older? Since these are prevalence rates based on gender-age-adjustments, it represents a true reduction in mortality rates from 45 years on. This condition is a physical manifestation with possible environmental and mental health related autoimmune related causes, implying the gender differences are complex and hard to define, but result in a reduction in mortality rates in 1-year intervals from 45 on. The slopes for male versus female prevalence rate changes pretty much remain identical and unchanged for the remaining lifespan, until the two rates are once again equal.
The following are lifelong conditions with age demonstrating a direct relationship to risk. Individuals survive a long period of disease, but then due to mortality changes begin to show a reduction in relative prevalence as age increases. These are 3 of the most common causes associated with mortality and quality of life during the midlife and older years. These curves are identical for a number of other similar chronic diseases as well, with similar mortality relative prevalence rate age-gender relationships.
Late Adulthood to Elder
There are numerous diseases that begin to appear in increasing amounts with old age. These include the common degenerative diseases as well as complications brought on by some very common lifelong lifestyle practices and the related poor health states.
The first and last of the above medical conditions are truly progressive and are due to non-fatal illnesses that result in increased morbidity due to aging. There is very little reduction in relative prevalence as the age continues to increase to the point of 85 years. These are conditions that people with a disease experience as a consequence of lifestyle, made worse with aging due to lifestyle practices. These individuals do not suffer any increase in mortality rates due to this part of their medical history, but quality of life is reduced. Examples of these secondary disease forms that prevail with older ages (not necessarily illustrated above) include such health changes as onset of retinopathy, renal failure, end-stage renal disease, thrombocytopenia brought on by long-term prescription drug use, disseminated intravascular coagulation, peripheral neuropathy, hepatic sclerosis, portal vein shutdown, etc.
The remaining charts represent a decrease in prevalence as age increases. This suggests the greatest mortality takes hold at the max values for both sides of the above pyramids. Charts 4 and 5 demonstrate slight gender-specific favoritism regarding mortality rates.
Examples of these diseases are syndromes and conditions more likely to be associated with older age groups, like hypertension, diabetes and heart-disease induced medical conditions ranging from retinopathies, to organ failure, to onset of severe cardiac problems and disturbances.
More on Gender Specificity
Some diseases are logically gender specific. Breast cancer is primarily female in nature (although of course male cases do exist!), and specific anatomically related conditions or health-related activities are gender specific, like child delivery or a type of infection known as balanitis, which only involves the penis.
The gender-specific condition on the left is an occupational disease that relates almost entirely to men (not the one just mentioned). The condition on the right is a sociocultural behavioral disease or syndrome related mostly to women.
These two conditions have very different temporal features.
The first is of old age occurrence; notice that it continues past retirement years and has a reduction in survival rates after the age of 75. This condition is brought on by long-term occupation-related risks, resulting in physical deterioration in the later years. Examples of this type of problem could include medical problems related to long-term radiation exposure in a male-dominant job, long-term exposure to a carcinogen encountered primarily by males, or long-term male-specific environmentally induced disease problems, such as physically compromising, high weight related occupational risks. There is a 20:1 ratio for frequencies of men:women.
The second condition occurs in women and had an age distribution much greater than that of men. Interestingly, the peak year is 15 or 16, with this large peak reducing in size quickly by the time the early 20s are reached. There is a 6:1 ratio of women:men.
Other Statistical Numbers-derived Behaviors
These structures have a tendency to demonstrate gender crossovers, resulting in age groups in which only one gender is affected by a condition. When total numbers of cases are small, this might be explained as a result of small n and large variance. But these undulations are seen for larger n’s as well, suggesting there is another cause for this phenomenon. The third example below, of a very popular condition nearly extinguished over time, demonstrates one of the best examples of an undulating pattern. The alternating male-female pattern change suggests a possibility for gender-specific genome-related changes for the organism responsible for the condition, i.e. preference for one gender over the other, cycling back and forth during the final years of existence.
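The small-n explanation mentioned above can be made concrete with a short sketch. This is not the restricted formula used for the pyramids; it is a plain binomial calculation under assumed conditions: each gender has the same number of people in a 1-year age cell (`n_per_gender`, a hypothetical name) and the same true prevalence. It returns the chance that exactly one gender shows zero cases in that cell purely by chance.

```python
def prob_single_gender_cell(n_per_gender: int, prevalence: float) -> float:
    """Chance that exactly one gender has zero cases in an age cell.

    Assumption (not from the original study): cases in each gender are
    independent binomial draws with the same size and the same prevalence.
    """
    # Probability that one gender shows no cases at all in this cell.
    q_zero = (1.0 - prevalence) ** n_per_gender
    # Exactly one of the two genders is empty: M empty and F not, or the reverse.
    return 2.0 * q_zero * (1.0 - q_zero)


# Hypothetical rare condition, 1 case per 1,000 people.
small_cell = prob_single_gender_cell(1_000, 0.001)    # small population per cell
large_cell = prob_single_gender_cell(100_000, 0.001)  # very large population per cell
print(f"small n: {small_cell:.3f}, large n: {large_cell:.2e}")
```

With these assumed numbers, chance alone produces single-gender age cells often when n is small and almost never when n is very large, which is why crossovers that persist at large n call for another explanation.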
Examples of Application
STDs. Your company decides it wants to produce a more effective program for reducing sexually transmitted disease (STD) rates. The current prevention program in place mails letters to people when they reach 18 years of age and continues to monitor and periodically mail out reminder letters to people who are not married and are between 20 and 45 years of age. For this age range this means that 1.5 million letters are sent per year at reduced costs for bulk mailing, but still costing about $250,000 per year for the mailing, and another $500,000 in printing and administrative costs, costing you approximately 50 cents per letter.
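The cost figures in this scenario can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Figures taken from the scenario described in the text.
letters_per_year = 1_500_000
mailing_cost = 250_000          # bulk mailing, dollars per year
printing_admin_cost = 500_000   # printing and administration, dollars per year

total_cost = mailing_cost + printing_admin_cost
cost_per_letter = total_cost / letters_per_year

print(f"total: ${total_cost:,} per year, ${cost_per_letter:.2f} per letter")
```

The total program cost is $750,000 per year, which is where the figure of approximately 50 cents per letter comes from.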
When you review the population health features for this particular STD you find that the following age-gender distribution in prevalence exists for this particular STD.
The above tables illustrate that women experience this condition much more than men, approximately 10 times as often relative to men at 25 years of age, the peak year for people with this diagnosis on record. Age ranges that are statistically significant are illustrated on the middle figure, and show that there are little to no statistically significant prevalence rates in men other than at 14 to 16 years of age. For women, there is a large age range where the rates stand out as being statistically significant, with the third graph emphasizing the scope to which the interventions to be developed need to be implemented.
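The formula behind these significance bands is not given in this presentation. As a stand-in, the sketch below uses a conventional pooled two-proportion z-test to flag single-year ages where female and male prevalence differ. The function names, the 1.96 cutoff, and all counts are assumptions made up for illustration; they are not the study data or the method used to produce the figures.

```python
import math


def two_proportion_z(cases_f: int, pop_f: int, cases_m: int, pop_m: int) -> float:
    """Pooled two-proportion z statistic for female vs. male prevalence."""
    p_f = cases_f / pop_f
    p_m = cases_m / pop_m
    pooled = (cases_f + cases_m) / (pop_f + pop_m)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / pop_f + 1.0 / pop_m))
    if se == 0.0:
        return 0.0  # no cases (or all cases) in both groups: nothing to compare
    return (p_f - p_m) / se


def significant_ages(rows, z_crit: float = 1.96):
    """Return the ages whose gender difference exceeds the chosen cutoff."""
    return [
        age
        for age, cases_f, pop_f, cases_m, pop_m in rows
        if abs(two_proportion_z(cases_f, pop_f, cases_m, pop_m)) > z_crit
    ]


# Hypothetical rows: (age, female cases, female pop, male cases, male pop).
example_rows = [
    (15, 40, 10_000, 30, 10_000),
    (25, 100, 10_000, 10, 10_000),  # roughly the 10:1 ratio described in the text
    (40, 12, 10_000, 10, 10_000),
]
flagged = significant_ages(example_rows)
print(flagged)
```

With very large populations nearly every age can cross a fixed cutoff, so a stricter threshold or a multiple-comparison adjustment may be appropriate when this is run on each 1-year age bracket.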
The following are the results inferred by this method of modeling the age-gender relative prevalence rate statistics:
An intervention that targets teens has to be developed due to the bi-gender statistical significance illustrated for individuals 14 to 16 years of age. Due to early onset of this problem in the female cases, you decide that this program perhaps should also have some activities in place as early as 12 years of age as well.
In addition, the highest risk group is the female population ranging in age from 17 to 36 years of age, with a possible risk group also found in the 48+ year olds for females, 59+ year olds for men. These specific groups of people are targeted, with very different methods taken to interventions targeting men and women, and young to middle aged versus old. Notice there was no need suggested by these statistics for targeting nearly all middle aged men and all women between the ages of 37 and 48.
This redirects the cost of your intervention program, allowing it to more effectively target age-specific group types. Chances are your method of engaging in these activities will adapt the traditional mailing method, but more selectively choose the types of members you mail to, cutting the cost for this mailing in half. This money can then be redirected at targeting older people married or unmarried, either through bulk advertising means or by way of selecting the highest risk individual based on marital status.
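The targeting rules inferred above can be written down as a simple filter. This is a sketch under assumptions: the age bands are read directly from the text (both genders from 12 to 16, females 17 to 36, females 48 and older, males 59 and older), while the rule table, the function name, and the sample members are hypothetical.

```python
# (sex, minimum age, maximum age or None for open-ended), read from the text above.
TARGET_RULES = [
    ("F", 12, 16),    # teen program, extended down to 12 for early female onset
    ("M", 12, 16),    # teen program, bi-gender significance at 14 to 16
    ("F", 17, 36),    # highest risk group
    ("F", 48, None),  # possible older risk group, females
    ("M", 59, None),  # possible older risk group, males
]


def is_targeted(age: int, sex: str) -> bool:
    """True when a member falls inside any of the target age-gender bands."""
    for rule_sex, low, high in TARGET_RULES:
        if sex == rule_sex and age >= low and (high is None or age <= high):
            return True
    return False


# Hypothetical members as (age, sex) pairs.
members = [(15, "M"), (25, "F"), (40, "F"), (40, "M"), (50, "F"), (60, "M")]
mail_list = [m for m in members if is_targeted(*m)]
print(mail_list)
```

Applying such a filter to the real membership file, rather than to this toy list, is what would show how much of the current mailing volume could be dropped or redirected.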
Psychologic and Psychiatric Disorders
The Reporting Tool
The psychiatric assemblage above illustrates the fact that we can correlate this data much more easily through photographic memory methods. Those trained in right brain memory use work better with eidetic or photographic memory than with verbal or left-brained methodologies. By reviewing a lengthy book about a topic with numerous illustrations it is theoretically possible to capture the entire content of the program or presentation at hand. This normally is not the case however due to the complexity of text-related images. Still, there are individuals who can accomplish this unusual task.
Most people approach this method of memory recall by utilizing photographic representations to denote specific pieces of data. In chemistry, one can draw and almost completely recall the ways in which the apparatus is set up according to the lab notes, but have more difficulty recalling the chemical equations defining the experimental process. This is because the reader assigned left brain value to the content of the chemical equation formula, rather than right brained visual content. Most people can merge left and right brained thinking to develop their understanding of the information they have to analyze. A “cheat sheet” for example applied to a college chemistry course exam places data in specific locations on the sheet. We typically just look at the formulas and reiterate them on paper for the test. If on the other hand you write the formula on the cheat sheet, and then jot down some figures around it, or certain sets of abbreviations, you can assign specific knowledge sets to each of these. If we then merge this information with a spider diagram, we essentially are producing a method that is highly popularized by followers of Tony Buzan and his information mapping technique for better utilizing extensive amounts of data we need to rapidly access. When Buzan’s method is applied, a memory of the illustrations remains intact, and to each part of this illustration is assigned the knowledge related to that illustration.
Tony Buzan’s Mind Mapping routines are related to eidetic or photographic memory in that the same study and recall techniques are used to produce an effective method for recalling specific knowledge sets. We can use these methods to illustrate something even as complex as the age-gender distribution and health of a large population for hundreds of ICDs. Not only is each of the population pyramids photographically memorizable, their relations to each other and the types of logic for that relationship define how these many small graphs can be used to produce a single set of details pertaining to a particular special topics related discovery. The systems approach to knowledge and applying artificial intelligence to a project allow for this method to be used for improving upon overall performance when specific statistics are required for the production of an end result. A graph is also an image that can be recalled, with its x and y axes in need of more aggressive memorization. This method of understanding diseases can in turn be used to better understand complex disease behaviors as a whole. We can use this method to completely understand the health of a population and how this health relates to other examples of population health. When we are comparing the age-gender distributions of two or more ICDs, it’s like comparing one face to another.
Disease or ICD population health pyramid charts show a natural progression and shared features. Very young age diseases have incredibly similar appearances in age-gender distributions. Specific ICDs exist for each age range that is passed while getting older. Newborn diseases progress to early childhood, then late childhood, then young adult, to older adult, to very old adult, etc. Presenting hundreds of ICDs in this order demonstrates associations between seemingly unrelated conditions that in turn provide us with insights into how age and gender impact overall population health features. From this, for example, we know that the oldest population gets one or more of a specific series of oldest-age significant prevalence disorders, and we can use this thinking to plan the medical care for such individuals. We can also use this to demonstrate that some disorders are very outgrowable (i.e. tics), and that their continuation into adulthood is a consequence of personal psychology and behaviors, along with ineffective health care systems related behaviors and activities.
Another flowchart for presenting the sum of the ICD population pyramids presents them in terms of specific disease groups as defined by major physiological systems. In the above set of psychiatric pyramids, we see one disease standing out amongst the rest in that it tends to prevail in younger male adults much more than females, and unlike other conditions with this pyramid form, does not have female prevalence peaking two decades later. With smoking for example, 17 yo is the male age peak and around 45 yo the female age peak for prevalence. But with this psychiatric state, we do not see female prevalence slowly increasing to a peak somewhere in the 40s, only to die off afterwards like it did with smoking.
In a third example of ICD age-gender analysis and mapping, the statistics generated were related to large parts of the country. Regression-to-the-mean standards state that overall we expect little to no major differences between large regions when those regions are so large that they represent a more reliable sample of the country, and not a particular subset of particular age-gender and ethnic/cultural groups. Preliminary analyses of the large scale NCQA regions, performed several times, for example demonstrated minimal differences between Canadian and Central American boundary states based on ethnicity. Some NCQA regional analyses in the past however have demonstrated that certain regions have more complex family groupings, with more maternal care and more children to care for. But even these regional differences were minimal in terms of impact on specific ICDs possibly linked to defining these health care needs.
A fourth example developed demonstrated the use of one ICD to define another health-related matter. This would be like mapping out fractures of the hip and relating them to something like a higher density of old-age groups, or to groups with a likelihood of developing these problems due to poor calcium intake. We could in turn map out low nutrient intakes in a population based on ICDs and use that to predict regions more likely to demonstrate old age hip and pelvic fractures related to aging, low income and poor health. Particular injuries are related to child abuse cases, such that regional mapping of these particular ER claims can tell us if there are regional differences in child abuse. The typical indicators of child abuse like the Battered Child and Shaken Baby ICDs are poorly reported due to lack of sufficient information proving that such a crime exists, along with fear among physicians and staff of legal retaliation for documenting such a claim without well pronounced evidence. This problem however is unimportant should another indicator of child abuse exist that can be mapped regionally. Some of these symptoms or medical problems, induced mostly by active and passive child care/attentiveness on behalf of the care giver/baby-sitter, do exist, and were effectively used to demonstrate a regional difference in documenting abuse rates, with female child abuse tendencies for children < 2yo much greater than male abuse related injury claims posted.
If ICDs are presented using a systems approach (nervous system, psychiatric, blood, gastrointestinal, heart, blood vessels, etc.), we see how even very small groups within each system and related subsystems demonstrate statistically significant age-gender distribution differences. For example, several genetically developed behavioral diseases in children are known to mimic each other in distributions, such as Tourette’s syndrome versus tics or enuresis, whereas the same for blood disorders is not the case: hemophilia age-gender distribution in terms of longevity and mortality is substantially different from age-gender distributions for thalassemia, sickle cell carriers and sickle cell disease. Likewise, a number of possibly autoimmune and/or genetically related connective tissue disorders, often lumped together for analyses, demonstrate very different age peaks. Some prevail in the very young and result in early mortality. Others are not deadly and only afflict the very old. Still others demonstrate the typical 35 yo+ age of onset and become increasingly progressive in life, with some ICDs causing early mortality related decreases in prevalence after the age of 65, and others not showing any reductions in prevalence even up to 85 years of age.
Ideally, two books can be generated using population pyramid graphs produced for particular sets and subsets of diseases. The first would depict age-gender progression only, with chapters for age-gender equality and inequality, and subchapters for each going from the youngest age ranges to the oldest. The second would rely upon the systems approach, showing how diseases that seem related by type, physiology and anatomy can be very different from one another, while others that are logically related, such as increasingly worse forms of hypertension, diabetes and renal disease, demonstrate a progression in age at diagnosis: the worst diagnoses tend to be linked to the oldest ages.
The first book allows a pediatrician or geriatrician to review a chapter on just childhood or oldest-age disease patterns. The second book allows a fuller understanding of each system of diseases to be developed. Because this method relies upon eidetic techniques for illustrating and explaining disease patterns within a population, it offers a type of insight otherwise impossible to obtain. Each page can be laid out and visualized by those with right-brained, eidetic skills for a more complete understanding that remains useful for years to come.
This methodology was created for accurately analyzing and predicting population health for programs designed to develop more precisely targeted intervention practices. With data on exceptionally large populations now available for public health research, the method enables new measurement techniques to be implemented for the development of more cost-effective intervention programs. The methodology presented above is based upon the mathematics developed for evaluating population curves, using a very detail-oriented formula approach previously employed in studies involving transect analyses and facial identification software techniques. The results of this work were developed into teaching tools, which were never presented in an academic setting. The mathematical details of the methodology have since been restricted in distribution due to IP ownership concerns.
The above examples illustrate the various ways this methodology can be applied. In essence, it allows for reviews of unlimited amounts of data linked to age and gender as independent variables. In the above examples it was applied to two fields with large numbers of participants: sports and recreation, and population health. These methods could also be applied to studies involving any large-scale data gathering system, on topics as varied as food marketing and consumer history, gas or petroleum fuel utilization as an areal-cost related feature, or assessments of detailed regional changes over time related to local demography, income-family histories and political party classifications. The methodology is basic and therefore has numerous applications as a realistic and useful tool for applied population research.
The following series of questions was developed in 2004/5 for the educational program I wrote up on my results. This was for a 5- or 6-credit ESRI ArcGIS 9.0 lecture-lab course taught in Denver. At the time I was analyzing state and regional health insurance populations for the State’s new managed care program providers. I used these methods to teach students how to analyze populations and, for the data entry specialists taking my course, how to read and interpret the above types of population pyramid graphs. [An example of a class schedule can be posted separately once I find it.] The point is that students taking this class go away with knowledge of a new methodology that has numerous applications. For much of the training, see all pages related to https://brianaltonenmph.com/biostatistics/quality-assurance/population-disease-monitoring-the-elephant-of-public-health/
1. What are the makings of a population pyramid? What are the format standards (include standard color coding of text/numbers and horizontal bars, inclusion of legends, etc.)?
2. What are the differences, the pros and cons, regarding how we demonstrate age-gender relationships? Define this for 1-, 2-, 5- and 10-year age range horizontal bar increments. Which are used most commonly and when?
3. What are the main advantages and disadvantages to small group analyses? What are the population and group min-max limits pertaining to this graphing technique?
4. Define a standardized uniform group analysis technique. How is this technique modified for the special format problems that varying populations present analysts with?
5. What N defines the limit as to whether or not 1-year age bands can be used? What min and max define the formula multiplier employed for this type of uniform group analysis technique?
6. What is the name of the technique employed for this analysis? What standard formula does it make use of? Define how it is employed.
7. Provide one or two research questions this methodology might be applied to.
8. Define your dependent and independent variables for this research.
9. Define how to apply this math modeling technique to two dependent variables, treating one as an independent variable.
10. How would you use the above methods to correlate cost to age-gender variables?
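As a point of reference for the questions above on 1-, 2-, 5- and 10-year age bands, the following is a minimal Python sketch of how single-year counts might be grouped into bands and arranged as population pyramid rows. The function names and sample counts are hypothetical, and placing males on the left (as negative values) and females on the right is a common plotting convention assumed here, not a standard taken from this course.

```python
def band_counts(counts_by_age, width):
    """Sum single-year counts into age bands that are `width` years wide.

    `counts_by_age` maps an integer age to a count. The result maps the
    starting age of each band to the band total, in ascending order.
    """
    if width < 1:
        raise ValueError("band width must be at least 1 year")
    bands = {}
    for age, n in counts_by_age.items():
        start = (age // width) * width
        bands[start] = bands.get(start, 0) + n
    return dict(sorted(bands.items()))


def pyramid_rows(male, female, width):
    """Build (label, male_value, female_value) rows for a pyramid chart.

    Male counts are negated so a horizontal bar chart draws them to the
    left of the center axis and female counts to the right.
    """
    m = band_counts(male, width)
    f = band_counts(female, width)
    rows = []
    for start in sorted(set(m) | set(f)):
        if width == 1:
            label = str(start)
        else:
            label = f"{start}-{start + width - 1}"
        rows.append((label, -m.get(start, 0), f.get(start, 0)))
    return rows


# Hypothetical single-year counts (not study data).
male = {0: 5, 1: 3, 4: 2, 5: 7, 12: 1}
female = {0: 4, 9: 6}

for width in (1, 2, 5, 10):
    print(width, pyramid_rows(male, female, width))
```

Running the same counts through each width shows the trade-off the questions ask about: narrow bands preserve detail but leave many sparse or empty rows in a small group, while wide bands smooth the shape at the cost of hiding age-specific peaks.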
1. Define and name the stages each of the following charts best relates to.
2. Do the following represent maladies that require preventive or palliative treatment forms to be established?
3. Define the age groups in the following disease types for which prevention programs can be developed. How would you apply the standard disease prevention practices differently to each of these age groups?
[insert Disease A, B, C, D]
4. For the above four graphs, which of the four most likely represents a genetic disease with a relatively high mortality rate?
5. Which represents a disease that is life long and relatively speaking is the least fatal of the above four illustrated?
6. Which represents a disease that is more than likely unpreventable in terms of onset and treatment, and generally speaking is not very fatal to its primary age group?
7. Is the following graph most likely indicative of a biological-physiological malady, a mostly behavioral-psychological malady, or a mixed sociocultural, physical and psychological malady? Based on this graph, how might you describe your treatment for this disease were it your responsibility to treat it only as a biological and physiological event? As a purely psychiatric event?
8. Which of the following best represents an event that presents itself mostly as a young to midlife event? What is the most likely type of disease or disease related behavior related to this malady or condition?