Population Age Groupings



What is the best way to break a population down into age groups for population health studies?

The standard protocol is to use population bar charts or pyramids with 5- or 10-year age groups. This works well for smaller populations (under 1 million, or perhaps under 0.5 million), but it reduces the validity and reliability of age-group findings when such analyses are applied to exceptionally large populations (more than 0.5 to 1 million).

When a population under investigation has more than one million individuals, it can serve as a valid study of the human population in general, assuming these 1+ million people are a true sample distributed evenly across the entire population. The limiting factors are the number of meaningful outcomes and the meaningful variance among them. As long as the features being measured do not have 1+ million meaningful possible values (for example, 1+ million distinct blood pressure or pulse readings that must each be documented), the sample is large enough. Because health-related metrics have limited ranges and groupings of values (good vs. bad, very healthy to very unhealthy in seven categories, etc.), it can be assumed that most medical events occurring at a frequency of 1 in 10,000 or higher will be represented in a study of 1 million people. For events that occur in 1 of 100,000 people, a study of 10,000,000 should suffice. For events that occur in 1 of 1,000,000, a study of 100,000,000 is perhaps the population size with which to begin a review. In each of these cases, probability alone predicts roughly 100 affected individuals to study (population size × event rate).
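The sample-size reasoning above can be checked with a short calculation. This is a minimal sketch that assumes each person independently has the condition at the stated rate, so the case count is binomial and, for rare events, well approximated by a Poisson distribution. The function names `expected_cases` and `prob_at_least` are hypothetical, and the threshold of 50 cases is an arbitrary illustration.

```python
import math


def expected_cases(population: int, rate: float) -> float:
    """Expected number of cases when each person has probability `rate`."""
    return population * rate


def prob_at_least(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean), a standard approximation for rare events."""
    term = math.exp(-mean)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term  # accumulate P(X = i)
        term *= mean / (i + 1)  # P(X = i + 1) from P(X = i)
    return 1.0 - cdf


# The three scenarios from the text: 1 in 10^4, 1 in 10^5, 1 in 10^6.
for pop, rate in [(1_000_000, 1e-4), (10_000_000, 1e-5), (100_000_000, 1e-6)]:
    mu = expected_cases(pop, rate)
    print(f"N={pop:,} rate={rate:g}: expected {mu:.0f}, P(>=50 cases) = {prob_at_least(50, mu):.6f}")
```

Each scenario yields an expected count of about 100, and under the Poisson assumption the chance of seeing fewer than 50 cases is negligible.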

Studies of exceptionally large populations are not valid if any of the following asymmetries is shown to exist:

  • the 1 million+ population was not selected from a much larger population using well-defined filtering criteria;
  • the age range, ethnicity, educational background, or gender of the study population does not generally represent the total population;
  • based on age counts and the requirements for inclusion in the selected group, ages do not appear to be distributed fairly evenly across the entire age range when compared with the expected age groups of the base or control group.

This work is based on a population of 1 million or more, and the results come from a randomly selected case study group drawn from an exceptionally large data set.

When studying large populations (1 million+), 2-year or 1-year age groups may be used. The reasons for applying such narrow age groupings to exceptionally large populations include the following:

  • results provide insights that are normally invisible, unrealized, or undiscovered with standard 5- or 10-year age groups;
  • results can be applied more effectively;
  • results allow more accurate definitions of target populations;
  • results allow interventions to target a population more precisely;
  • results reduce the overall cost of these interventions (e.g., sending a prevention newsletter to a 7-year age range instead of two 5-year age groups produced a 30% savings).
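The 30% figure follows from simple arithmetic if mailing cost is proportional to the number of members reached. A minimal sketch, assuming (hypothetically) equal membership at every single-year age and a flat per-piece cost; the age windows 40-49 and 41-47, the $0.50 cost, and the function name `mailing_cost` are illustrative only, not taken from the study.

```python
def mailing_cost(counts_by_age: dict[int, int], lo: int, hi: int, cost_per_piece: float) -> float:
    """Cost of mailing every member aged lo..hi inclusive (single-year ages)."""
    return cost_per_piece * sum(n for age, n in counts_by_age.items() if lo <= age <= hi)


# Hypothetical membership: 1,000 members at every single-year age 0-99.
counts = {age: 1000 for age in range(100)}

# Two standard 5-year groups (40-44 and 45-49) vs. a 7-year window (41-47).
standard = mailing_cost(counts, 40, 49, 0.50)
targeted = mailing_cost(counts, 41, 47, 0.50)
savings = 1 - targeted / standard
print(f"standard={standard:.2f} targeted={targeted:.2f} savings={savings:.0%}")
```

With real membership counts, which are rarely uniform across ages, the savings will differ from the simple 7/10 ratio.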

Age Grouping Methodologies

When gender asymmetries exist, we can also define them by age group. How these age groups are defined then becomes a primary metric for this method of evaluation. Applying the HEDIS method of dividing ages into groups (see the other web pages on “Seeing the Elephant” for this), we can define the following broad age groups:

  • 0-1.99 years old
  • 2-17
  • 18-44
  • 45-64
  • 65+
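Assigning members to these broad groups is a lookup on sorted lower bounds. A minimal Python sketch, assuming ages are recorded in fractional years and that a label such as 2-17 means 2 up to (but not including) 18; `age_group` and the constant names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds (in years) of the broad groups listed above.
HEDIS_BOUNDS = [0, 2, 18, 45, 65]
HEDIS_LABELS = ["0-1.99", "2-17", "18-44", "45-64", "65+"]


def age_group(age: float, bounds=HEDIS_BOUNDS, labels=HEDIS_LABELS) -> str:
    """Return the label of the group whose half-open interval contains `age`."""
    if age < bounds[0]:
        raise ValueError("age must be non-negative")
    # bisect_right gives the index of the first bound greater than age;
    # the containing group is the one just before it.
    return labels[bisect_right(bounds, age) - 1]


print([age_group(a) for a in (0.5, 2, 17.9, 18, 44.99, 45, 64.99, 65, 90)])
```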

A more exact and detailed method of dividing the population into age groups can be developed from the clinical visit activity of the population as a whole. For example, one study suggests that visits by children peak at 8-9 years of age, followed by a lull at 17-24 years, with visits increasing again to a peak at around 45-55 years of age. Based on these observations, the following age groups can be used to better define the population according to natural breaks in the population pyramid curve:

  • 0-1.99 yo
  • 2-8.99
  • 9-17.99
  • 18-24.99
  • 25-44.99
  • 45-64.99
  • 65-74.99
  • 75+

An even more useful method splits the 25-44.99 and 45-64.99 yo groups into 10-year increments. This allows for the following age group breakdowns:

  • 25-44.99
    • 25-34.99 (health promotion/health maintenance)
    • 35-44.99 (health maintenance/early prevention of family diseases and reduction in high risk behaviors)
  • 45-64.99 yo group
    • 45-54.99 (used to study chronic disease treatment and prevention)
    • 55-64.99 (used to study chronic disease care management and maintenance therapy)
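The refined scheme, including the 10-year splits, can be handled with the same sorted-bounds lookup while keeping track of each subgroup's parent group. A sketch assuming fractional ages and half-open intervals (e.g., 2-8.99 covers 2 up to but not including 9); `refined_group`, `FINE_BOUNDS`, and `PARENT` are illustrative names.

```python
from bisect import bisect_right

# Lower bounds of the finest groups in the refined scheme
# (25-44.99 and 45-64.99 already split into 10-year subgroups).
FINE_BOUNDS = [0, 2, 9, 18, 25, 35, 45, 55, 65, 75]
FINE_LABELS = ["0-1.99", "2-8.99", "9-17.99", "18-24.99", "25-34.99",
               "35-44.99", "45-54.99", "55-64.99", "65-74.99", "75+"]

# Parent group for each 10-year subgroup; all other groups are their own parent.
PARENT = {"25-34.99": "25-44.99", "35-44.99": "25-44.99",
          "45-54.99": "45-64.99", "55-64.99": "45-64.99"}


def refined_group(age: float) -> tuple[str, str]:
    """Return (parent group, finest group) for an age in years."""
    if age < 0:
        raise ValueError("age must be non-negative")
    fine = FINE_LABELS[bisect_right(FINE_BOUNDS, age) - 1]
    return PARENT.get(fine, fine), fine


print(refined_group(8.5))   # ('2-8.99', '2-8.99')
print(refined_group(38))    # ('25-44.99', '35-44.99')
print(refined_group(58.2))  # ('45-64.99', '55-64.99')
```

Keeping the parent label alongside the fine label lets the same member records be summarized at either level.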


As noted elsewhere (“Seeing the Elephant” pages), the above age-grouping procedure allows for studies that closely resemble the HEDIS requirements and allows preventive activities to be performed and evaluated.

The following is an example of how a research group in the managed care setting might design their population health monitoring program, with examples of possible research topics:

  • 0-1.99 yo
    • Childhood Well-visits/Immunization studies
  • 2-8.99
    • Mid-age children care, immunization, psychological interventions, health education
  • 9-17.99
    • Mid-age to high school age care, immunization, sports medicine, prevention, health education
  • 18-24.99
    • Health Maintenance/Health Promotion, Early Prevention, Family and genetic disease treatment, psychological interventions, STDs
  • 25-34.99
    • Health Maintenance/Health Promotion, Prevention, etc.
  • 35-44.99
    • Health maintenance, preventive care, disease management
  • 45-54.99
    • Health maintenance, preventive care, disease and case management
  • 55-64.99
    • Health maintenance, preventive care, disease management
  • 65-74.99
    • Health maintenance, preventive care, disease management
  • 75+

The following ICDs illustrate findings that, had the standard 5-year age groups been applied, would have led to costly, time-consuming clinical and office-related treatment and intervention activities. Incidence-prevalences (sometimes termed ‘IP’ in these presentations), not raw numbers, are presented. They give the percentage of individuals within each one-year age group who have the condition, based on the number of members in the total base population. These percentages are applicable across all populations. Because of the process employed, they do not need to be age-adjusted; they are true measures of frequency relative to an exceptionally large base population that demonstrates regression toward the mean. This base population is so large that the likelihood of it misrepresenting the entire population is very low. It represents approximately one-third of the total population. To change these results statistically, we would have to select a group equal to 1/6th of this total N and have that entire selection deviate substantially from the standard distribution. For 100,000,000 people, selecting 1/6th of the total N, roughly 16.7 million people, who all deviate in their distribution is highly unlikely.
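As a minimal sketch of how an IP could be computed from member records, assuming one record per member, ages truncated to completed years, and the denominator taken as all base-population members of that age (the function name and the toy ages below are hypothetical):

```python
from collections import Counter


def incidence_prevalence(member_ages, case_ages):
    """Percent of members at each single-year age who have the condition.

    member_ages: ages (in years) of every member in the base population
    case_ages:   ages of the members who have the condition (one entry per member)
    """
    members = Counter(int(a) for a in member_ages)  # truncate to completed years
    cases = Counter(int(a) for a in case_ages)      # missing ages count as 0
    return {age: 100.0 * cases[age] / n for age, n in sorted(members.items())}


# Hypothetical toy data, for illustration only.
members = [30.2, 30.9, 31.1, 31.5, 31.7, 32.0, 32.4, 32.8, 32.9, 33.3]
cases = [31.5, 32.0, 32.8]
ip = incidence_prevalence(members, cases)
print(ip)
```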

The one key divider in population behavior here is the major population type. The population studied covers the full age range of a workforce population plus a few retirees and government-covered groups (for example, a small number of Medicaid members). HEDIS studies show significant health differences between government-sponsored populations and employed, privately insured populations. This review focuses on the workforce, not on the unemployed, retired, or disabled. That is the primary limitation on applying this work to the US population as a whole.

Age Specificity

RSV. Several newborn ICDs show significant one-year changes in prevalence and single-year prevalence peaks. These are missed by a standard 5-year population profiling technique and are not fully captured by segmented groupings such as the HEDIS and institutional study groups defined above. This information provides more effective feedback at the rule-out level. Other ICDs with a similar very-young peak age include lead poisoning and other childhood poisoning ICDs, as well as several highly infectious diseases of young childhood.
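To see why a single-year peak disappears in a 5-year pyramid, the sketch below collapses single-year IPs into wider groups. It assumes the grouped IP should be weighted by membership at each age (total cases divided by total members), and the toy series is invented for illustration, not drawn from the RSV data; `aggregate` is a hypothetical helper.

```python
def aggregate(ip_by_age, members_by_age, width):
    """Collapse single-year IPs (percent) into width-year groups, weighted by membership."""
    groups = {}
    for age, ip in ip_by_age.items():
        start = (age // width) * width  # first age of the containing group
        cases, n = groups.get(start, (0.0, 0))
        groups[start] = (cases + ip / 100.0 * members_by_age[age], n + members_by_age[age])
    return {start: 100.0 * c / n for start, (c, n) in sorted(groups.items())}


# Hypothetical toy series: a sharp peak at age 1, 1,000 members at each age 0-9.
ip = {0: 0.8, 1: 3.0, 2: 0.6, 3: 0.3, 4: 0.2, 5: 0.1, 6: 0.1, 7: 0.1, 8: 0.1, 9: 0.1}
members = {age: 1000 for age in ip}

peak_age = max(ip, key=ip.get)
five_year = aggregate(ip, members, 5)
print(f"single-year peak: age {peak_age} at {ip[peak_age]}%")
print(f"5-year groups: {five_year}")
```

In this toy series the 3.0% peak at age 1 becomes a 0.98% value for ages 0-4, and the location of the peak within that group is lost.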

Adult Abuse. These data relate to patients who are recipients of abuse. Women are abused much more often than men, and abuse rates peak at very specific ages. An overall average age of abuse would not represent the peak age of risk because of irregularities in the data, in particular their skew. Interventions for adult abuse can be better planned using this information (note that this includes children entered as adults; child abuse is also entered as a separate ICD). Abuse of adult men occurs at a slightly older age than abuse of adult women. Age ranges would not show this difference as precisely.
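The point about averages can be illustrated with a skewed distribution. In this hypothetical sketch (invented case counts by single-year age, not the study's data), the long right tail pulls the mean age of cases above the modal (peak) age, and the male peak falls at an older age than the female peak.

```python
def mean_age(counts):
    """Average age of cases, weighting each age by its case count."""
    total = sum(counts.values())
    return sum(age * n for age, n in counts.items()) / total


def peak_age(counts):
    """Age with the largest case count (the mode)."""
    return max(counts, key=counts.get)


# Hypothetical right-skewed case counts by single-year age.
female = {20: 5, 21: 9, 22: 14, 23: 11, 24: 8, 30: 6, 40: 4, 50: 3}
male = {22: 2, 23: 4, 24: 7, 25: 9, 26: 6, 35: 3, 45: 2}
for label, counts in (("F", female), ("M", male)):
    print(label, "peak:", peak_age(counts), "mean:", round(mean_age(counts), 1))
```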

Dementia. A number of diseases of old-old age (>75) are missed with standard age groupings. The detail this information provides is the rate at which dementia accelerates and the fact that it accelerates at a fairly similar rate in both genders. It also gives a more accurate rendering of the slope of this progression than 5-year increment population data do.
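One way to put a number on the rate at which dementia accelerates is to fit an exponential curve to the single-year IPs. This is a hedged sketch: the exponential model is an assumption (not the author's stated method), IPs must be positive, and the toy series is constructed to grow exactly 15% per year so the fit can be checked. `annual_growth_rate` is a hypothetical name.

```python
import math


def annual_growth_rate(ip_by_age):
    """Fit log(IP) = a + b*age by least squares; return exp(b) - 1, the implied yearly growth."""
    xs = sorted(ip_by_age)
    ys = [math.log(ip_by_age[a]) for a in xs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return math.exp(b) - 1


# Hypothetical single-year IPs (%) that grow 15% per year from age 75.
toy = {75 + k: 1.0 * 1.15 ** k for k in range(10)}
print(f"estimated yearly growth: {annual_growth_rate(toy):.1%}")
```

Fitting separate series for each gender would allow the growth rates of the two genders to be compared directly.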


There are three distinct age-gender patterns among the injuries recorded in the medical records. The first two of the above graphs have essentially the same form, showing injury peaks in childhood and smaller peaks in the elderly years. The third figure above (Hip/Thigh) shows a “dumbbell” effect in which the very old and the very young have similar prevalence rates. One feature of this pattern worth mentioning is the gender asymmetry in the older age group (F>M, possibly a result of weakened bone structure). Most injuries noted in the ICDs fit the form of the fourth age-gender pyramid above, including its gender equality. This gender equality is less evident for fractures, dislocations, sprains, etc., especially in the adult years. One final major difference to note is the very low prevalence in children under 5, especially under 2, for all injuries except head and face/neck injuries. For these two exceptions, younger children are more likely to have injuries recorded than the 5-year-old group. All of these details are lacking in a standard pyramid built on 5-year age increments.
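The F>M asymmetry in the older hip/thigh group can be expressed as a female-to-male IP ratio at each age. A minimal sketch with hypothetical values (not read from the figures); `sex_ratio_by_age` is an illustrative name.

```python
def sex_ratio_by_age(female_ip, male_ip):
    """Female-to-male IP ratio at each age present in both series.

    Ages where the male IP is zero are skipped to avoid division by zero.
    """
    return {age: female_ip[age] / male_ip[age]
            for age in sorted(female_ip.keys() & male_ip.keys())
            if male_ip[age] > 0}


# Hypothetical hip/thigh injury IPs (%) at a few older ages, for illustration only.
female = {80: 0.40, 85: 0.90, 90: 1.60}
male = {80: 0.25, 85: 0.45, 90: 0.80}
ratios = sex_ratio_by_age(female, male)
print({age: round(r, 2) for age, r in ratios.items()})
```

A ratio above 1 at an age marks female predominance; with single-year data the age at which the asymmetry begins can be read off directly.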


“Legal Drug” Abuse


Alcohol and Tobacco Abuse. Alcohol and tobacco abuse have two distinct age distributions. Alcohol abuse peaks slightly after the legal drinking age is reached (recall that these data are claims-based only). For alcohol, the male peak age of abuse (22 years) occurs much earlier than the female peak (42 years). Tobacco abuse has fairly similar peak ages for males and females, but age-specific prevalences for males are much greater than for females in nearly all pre-retirement years. The specific form of these peaks suggests that multiple reasons underlie the significant differences between genders. Heavy smoking begins more aggressively in males than in females. After the peak age, male prevalence declines very slowly, whereas female smokers reduce their rates rapidly and then return to smoking in their middle and later years. These contrasting patterns point to very different underlying causes.
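The contrast between slow and rapid post-peak decline can be summarized by how many years it takes prevalence to fall to half its peak. This metric and the function name `years_to_half_peak` are illustrative assumptions, not the author's method, and the values are hypothetical IPs at selected ages.

```python
def years_to_half_peak(ip_by_age):
    """Years after the peak age until IP first drops to half the peak value or lower.

    Returns None if the series never falls that far.
    """
    peak = max(ip_by_age, key=ip_by_age.get)
    half = ip_by_age[peak] / 2
    for age in sorted(a for a in ip_by_age if a > peak):
        if ip_by_age[age] <= half:
            return age - peak
    return None


# Hypothetical IPs (%): a slow decline after the peak vs. a rapid decline with a later rebound.
slow = {20: 1.0, 25: 2.0, 30: 1.9, 35: 1.7, 40: 1.5, 45: 1.2, 50: 1.0}
fast = {20: 0.6, 25: 1.2, 30: 0.7, 35: 0.5, 40: 0.6, 45: 0.7}
print(years_to_half_peak(slow), years_to_half_peak(fast))
```

The metric only captures the first drop; a rebound such as the rise after age 35 in the second series needs to be examined separately.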

Genital Mutilation. This is a highly culture-specific ICD linked to particular populations of recent African immigrants and their descendants. Three, possibly four, age peaks are defined for this ICD. The oldest group represents elderly immigrants who underwent this procedure in their native country. The middle-aged group has two possible prevalence peaks, at 30 and 45 years of age. The latter probably represents procedures performed in the native country, but it could include cases performed in the US. The 30-year peak possibly represents a mixture of victims of this cultural practice in the US and in the country of origin. An evaluation of personal, family, medical, and migration histories would reveal where these events took place. The most important group to note is the youngest group undergoing this procedure. Their participation is involuntary, since they are too young to make such a personal medical decision; as a result, their parents' decision prevails, which in some cases is strongly culturally defined, motivated, and instigated, despite local public health efforts to prohibit such practices. These details about the four separate age groups are missed if we rely on a 5-year age-group review of the data.
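Distinguishing three or four peaks requires finding local maxima in the single-year series. A simple sketch, assuming a complete run of consecutive single-year ages; the `min_height` threshold is a hypothetical knob for ignoring noise, and the toy series is invented.

```python
def local_peaks(ip_by_age, min_height=0.0):
    """Ages whose IP exceeds both single-year neighbours (a simple local-maximum rule).

    The first and last ages are never reported because they lack two neighbours.
    """
    ages = sorted(ip_by_age)
    peaks = []
    for prev, age, nxt in zip(ages, ages[1:], ages[2:]):
        v = ip_by_age[age]
        if v > ip_by_age[prev] and v > ip_by_age[nxt] and v >= min_height:
            peaks.append(age)
    return peaks


# Hypothetical series with a flat baseline and peaks at ages 30 and 45.
toy = {age: 0.01 for age in range(20, 60)}
toy.update({29: 0.03, 30: 0.05, 31: 0.03, 44: 0.02, 45: 0.04, 46: 0.02})
print(local_peaks(toy))                   # [30, 45]
print(local_peaks(toy, min_height=0.045)) # [30]
```

Real single-year data are noisier than this toy, so some smoothing or a height threshold is usually needed before treating a local maximum as a genuine peak.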

Prescription Drug Dependence

The above figures detail the drug dependency ICDs. Two of them have very age-restricted prevalence peaks in the late teens to early twenties: cannabis and hallucinogens. Cannabis-related behavior continues into adulthood, whereas hallucinogen use is much more restricted in age. Note also that male abuse is much greater than female abuse for both of these ICDs.

The age-gender patterns of amphetamine abuse differ considerably from those of cocaine abuse. The latter shows persistent abuse among males. Both genders show a drastic reduction in abuse soon after age 30. Otherwise, the two drugs show a similar relationship between male and female behavior.

Chromosomal and Developmental Anomalies

When we begin a review of genetic diseases, we take into account gender links and the relationship of the genes involved to the somatic and sex chromosomes. Some ailments are sex-linked, others sex-related. XXY, for example, is predominantly seen in males, although a few female cases are noted in exceptionally large datasets, perhaps a result of data-entry errors or of XXY lacking complete Y expression and therefore lacking dominant male somatic features (not expected). Using the 5-year method of gender analysis, the two genders appear essentially the same, although there are minute differences in the age range of first diagnosis (0-2 years).

Spina bifida is usually linked to poor nutrition during a specific embryonic period. According to the database evaluated, more males than females tend to express it. Very young females also tend to be diagnosed with this condition one year earlier than young males. This suggests the possibility that in the embryonic-fetal stage this condition could be more often fatal to females, or less likely to occur in female embryos. This newborn statistical difference is missed using the standard 5-year age-gender pyramids.

The third example, von Willebrand disease, is a gene-specific condition that shows a unique and hard-to-explain gender asymmetry in the population under 30 years of age. This difference is more pronounced in female versus male children under the age of 20. This difference is missed using the standard 5-year age-gender pyramids.



. . . To Be Continued





