See on Scoop.it – National Population Health Grid
An 11-year-old boy is in the hospital after trying to commit suicide — the victim of bullying at school.
Childhood-related community health behaviors and statistics can now be monitored in high detail, locally as well as nationally, using the NPHG method.
Relying upon national V-code, E-code and ICD datasets, I used NPHG to map a number of aggressive behaviors and various suicide-related behaviors by specific age group in under a day.
This way of mapping the national data takes less than 20 minutes per analysis in a Teradata system, and can be used to produce 20-40 videos/analyses or 15,000 to 30,000 images per day.
For a rotating 3D map of the US depicting childhood suicides, at extremely high resolution, see https://www.youtube.com/watch?v=K-8go3lLjDE
See on abclocal.go.com
February 4, 2014 at 6:40 pm
Hi, Brian:
Is that the absolute number, or is it a rate adjusted for population?
Margaret
February 7, 2014 at 6:41 pm
A lengthy answer to your very short question. . . . (others have approached me about this as well)
IP is rate-adjusted by age/gender. N is the raw value, used for the rarest of conditions/diagnoses. N-squared is N × N; N-cubed is N × N × N. Some maps (only the 2D diagrams) are spatially adjusted, or kriged. I have tried to distribute these techniques fairly across the hundreds to 1,000 diagnoses, diagnostic groups, and age groups that I evaluated.
Rate adjustments are only good for the older techniques, which deal with conditions that are neither infrequent nor very rare. Poisson models are sometimes applied automatically, but on their own they are not spatially reliable for very rare conditions (which I focus upon).
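(To make the IP adjustment concrete, here is a minimal sketch of an age/gender rate adjustment of the indirect-standardization kind; the table layout and column names — grid_cell, age_band, gender, cases, base_pop — are illustrative assumptions, not the actual NPHG schema or formulas.)

```python
import pandas as pd

def independent_prevalence(events: pd.DataFrame, base: pd.DataFrame) -> pd.Series:
    """Age/gender-adjusted prevalence ratio per grid cell (a stand-in for IP).

    events: one row per (grid_cell, age_band, gender) with a 'cases' count.
    base:   one row per (grid_cell, age_band, gender) with a 'base_pop' count.
    """
    merged = base.merge(events, on=["grid_cell", "age_band", "gender"], how="left")
    merged["cases"] = merged["cases"].fillna(0)

    # Rate in each age/gender stratum across the whole base population.
    strata = merged.groupby(["age_band", "gender"]).agg(
        stratum_cases=("cases", "sum"), stratum_pop=("base_pop", "sum"))
    strata["rate"] = strata["stratum_cases"] / strata["stratum_pop"]

    # Expected cases per cell if it matched the national age/gender mix,
    # then the observed/expected ratio per cell (indirect standardization).
    merged = merged.join(strata["rate"], on=["age_band", "gender"])
    merged["expected"] = merged["base_pop"] * merged["rate"]
    cells = merged.groupby("grid_cell").agg(obs=("cases", "sum"),
                                            exp=("expected", "sum"))
    return cells["obs"] / cells["exp"].clip(lower=1e-9)  # >1 = excess burden
```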
Big Data makes it possible to no longer require major adjustments, like age adjustment, in the results. When you are working with the whole population, which has matching population distributions in 1-year increments, you needn't perform the adjustment. That is the assumption (one proven years back as well) on which my use of this technique is based.
The argument is that we are working with real data, in such large numbers that the results you see are true and don't need to be modified to deal with irregular distributions of age-gender groups. My review of the national data showed there are some small regional differences in age-gender distributions across 10M to 100M+ patients (Florida is older; the Great Lakes and VA have more kids and mothers), but not enough to give a p<0.05 that impacts much else in the studies this is used for.
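(For illustration only: a chi-square goodness-of-fit test is one way such a regional-vs-national pyramid comparison could be run; the bins and counts below are toy values, not the claims data or the author's actual test.)

```python
import numpy as np
from scipy.stats import chisquare

def pyramid_differs(regional: np.ndarray, national: np.ndarray,
                    alpha: float = 0.05) -> bool:
    """Does the regional age/gender pyramid depart from the national shape?"""
    expected = national / national.sum() * regional.sum()  # rescale to region size
    _, p = chisquare(f_obs=regional, f_exp=expected)
    return p < alpha

# Toy 5-bin pyramids: the region is roughly a 10% draw with the same shape.
national = np.array([120_000, 135_000, 150_000, 140_000, 110_000])
region   = np.array([12_010, 13_490, 15_030, 13_980, 10_990])
print(pyramid_differs(region, national))  # False: no p<0.05 difference in shape
```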
Independent Prevalence (IP) means the map is produced by comparing results to some standard base population data, like the traditional methods; it is the closest thing to a "rate adjustment" (sometimes the base populations are census-based and so too old). Since spatial confounders get in the way with mapped data, these maps require kriging of the data to validate against centroids defined by the IP maps, since the IP maps themselves are not kriged. The 2D maps you occasionally see with hot spots in them are the true 3D data, fully kriged and spatially adjusted (the algorithm is built into my formulas, but not always displayed here).
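(As a rough illustration of the kriging step, a sketch follows using the pykrige package on toy point data; pykrige, the toy coordinates, and the grid are my stand-ins, not the actual NPHG algorithm or surfaces.)

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

rng = np.random.default_rng(0)
lon = rng.uniform(-120, -70, 200)            # event centroids (toy)
lat = rng.uniform(25, 48, 200)
val = rng.poisson(2, 200).astype(float)      # cell counts or IP values (toy)

grid_lon = np.linspace(-120, -70, 100)
grid_lat = np.linspace(25, 48, 60)

# Fit an ordinary-kriging model and interpolate onto a regular grid.
ok = OrdinaryKriging(lon, lat, val, variogram_model="spherical")
z, ss = ok.execute("grid", grid_lon, grid_lat)   # kriged surface + variance
print(z.shape)   # (60, 100): a smoothed "hot spot" surface like the 2D maps
```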
With infrequent, rare and extremely rare diagnoses, etc., you cannot and should not rely upon "adjusted values"; those are made-up values defined assuming a big "if" and are not real values. Instead, for extremely rare distributions, age and gender speak for themselves and the raw values tell you everything. Neural network modelling techniques are also possible, which is basically what the NPHG displays. For example, we cannot age-adjust bioterrorist behaviors noted in the V-codes; the numbers of people are too small. So it doesn't matter if there was a 15-year-old terrorist who is male, or a 35-year-old female, or a 65-year-old male; these events are so rare that mapping them as they stand, without trying to adjust for ethnicity, age distributions, etc., provides the truest answer about their spatial behavior.
Many behaviors have nothing to do with age, and for those which do, the age effect stands out on the patients' age pyramids. Infibulation (an African Islamic surgical practice) shows four distinct age groups in the US for all 10 separate years tested, and so confirms the age consistency/congruity and validity of the NPHG process. In addition, I selected for mapping those diseases which are otherwise too rare to predict, and so are never fully modelled using the real numbers in a more applicable, realistic fashion. There is no other way to evaluate the rarest of conditions in a consistent, valid way, at least spatially. When you do this, you'll find that with the rarest of diseases or behaviors it becomes obvious whether age or spatio-cultural features play a more important role than population density, such as in gender-specific death rates for male versus female sickle cell carriers, the impact cystic fibrosis has on overall longevity and lifespan, or the fact that we bring kids to the doctor least often when they are 7-8, for some reason.
Finally, population age adjustments are only intended for common features compared across large regions, when you don't have the data for the entire region being reviewed. The NPHG maps are based on national whole-data values, with 30-35% of the population "sampled". Sampling error is pretty much eliminated once you reach one third of the total (actually 1/6th is the critical point, but I doubled it), so long as the shape of the 1-year-increment population pyramid does not change; in other words, once the population evaluated becomes very large, a "bell curve" or definitive shape can fit under another "bell curve" or definitive shape in only one way. A bell curve made up of one third of the total population "bell curve" cannot sit under the larger one and still have a mean that is far off the norm in a statistical sense. Traditionally, we used small samples; with very small samples, you can easily place one bell curve under the much bigger one and get very different results. In other words, the one-third sample I am using assures me that a regression to the mean is represented most of the time. I tested and proved this theory and its outcomes several hundred times, with varying populations up to 100M over the years, across three dataset systems, and reported it internally for companies starting back in 2007.
So, with Big Data, using a Big Data set (not a sample) considered equivalent to the whole data (anywhere from 99% of the whole data down to 20%; avoid 17%, or 1/6th) is a very different way of looking at things. It eliminates selection bias, so long as you know how the selections are made and make sure they are consistent across the board.
The problem with my selections nationally is that they are of people on health insurance, and miss the uninsured. In general, 1/6th of the whole data has a 90-95% probability of no error, 1/5th a 95-97.5% chance of no error, 1/4th a 97.5-99% chance of being correct, 1/3 a 99-99.9% chance, 1/2 a 99.9-99.99% chance, etc.
I use a special formula to assess validity at 1-3 year intervals as well, across all age bands, to see if there are outliers for each metric spatially evaluated.
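(A toy simulation of the sample-size side of this argument, assuming the extract behaves like a random draw from a synthetic population; it is not the author's validation, just an illustration of how a one-third, 1-year-increment pyramid locks onto the whole-population shape.)

```python
import numpy as np

rng = np.random.default_rng(42)
ages = rng.normal(38, 22, size=5_000_000).clip(0, 100)  # synthetic "whole population"
bins = np.arange(0, 102)                                 # 1-year increments

whole_shape, _ = np.histogram(ages, bins=bins, density=True)
for frac in (0.01, 1/6, 1/3):
    sample = rng.choice(ages, size=int(frac * ages.size), replace=False)
    shape, _ = np.histogram(sample, bins=bins, density=True)
    print(f"fraction={frac:.3f}  "
          f"mean shift={abs(sample.mean() - ages.mean()):.4f}  "
          f"max pyramid-bin deviation={np.abs(shape - whole_shape).max():.6f}")
# The larger the fraction, the less room the sample pyramid has to slide away
# from the whole-population pyramid; at one third the deviations are negligible.
```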
As for the other ways of presenting these results (N-squared and such): with the probability of error eliminated, I turned this modeling technique to the amplification of outcomes, using the simple n, n-squared, n-cubed routine to accentuate peaks and the like and make outliers stand out. Again, these are spatially defined analytic methods meant for low-frequency measures, not the standard large-percentage or high-frequency methods typically used in epidemiology population stats for diabetes, heart disease, etc. Since this is a dissertation project, I'll just stop at this point with the formula description.
The applications of this technique are what make it stand out. In public health, the important point is whether or not the health problem is there, not whether or not it stands out statistically in some theoretical way. One HIV+ patient is just as important to deal with as one terrorist threat, one source of cholera in the local waterways, one severe mental health patient not renewing his/her medicines, one carrier of a genetic trait, or one indicator of toxic chemical release exposure. Epidemiology, by contrast, focuses the bulk of its money on populations and on changing large numbers of people, ignoring those with underrepresented diagnoses.
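(To show what the n / n-squared / n-cubed accentuation does visually, here is a toy sketch on a tiny count grid; the rescaling choice is mine, not the dissertation formula.)

```python
import numpy as np

def amplify(surface: np.ndarray, power: int = 2) -> np.ndarray:
    """Raise the surface to `power`, rescaled to its original maximum,
    so relative peaks are accentuated without changing the overall scale."""
    boosted = surface.astype(float) ** power
    return boosted / boosted.max() * surface.max()

grid = np.array([[0, 1, 1, 0],
                 [1, 2, 5, 1],      # a single "hot" cell with 5 events
                 [0, 1, 1, 0]])

for p in (1, 2, 3):                 # n, n-squared, n-cubed
    print(f"power={p}\n{np.round(amplify(grid, p), 2)}\n")
# With power=3 the hot cell keeps its height while the background collapses
# toward zero, which is the visual effect of outlier peaks on the 3D maps.
```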