Developing a Model for Testing the Spatial Distribution of Cancer Cases in relation to documented Toxic Release Sites, Naturally-defined Population Density Regions

General (Statewide) Studies

One of the first steps in the development of the spatial data for Oregon was to produce a cancer case map. Data for several types of cancer were gathered and then evaluated using ArcView 3.2, numerous Avenue extensions, several of the address matching tools available at the time this portion of the project was initiated (ca. 2000), and the Spatial Analyst tool. The first products include various areal and point maps, such as an attempt to quantify risk based on age-group modified incidence and prevalence rates (for now, normalized data outcomes based on census block counts are displayed instead).

At the time this project was initiated, producing point maps from address information was still a fairly tedious procedure. Despite the use of address matching tools that were fairly advanced for the time and were used by a population research center, only 47% of the cases could be reliably mapped. The most poorly mapped counties had approximately 25% successful address matching; other counties had close to 75% mapped the first time through. When a second attempt was made to map all cancer cases available for this project, the final map demonstrated a number of problems inherent to epidemiologic data stored in databases.


Red Dots are cases successfully address-matched; blue and yellow represent new cases added to dataset through the use of a second address matching tool or manually


Additional Portland Cases added to the dataset following the second series of address matching attempts


The major limitation to this process was the inconsistency of the address forms and the inconsistent use of abbreviations and non-alphanumeric symbols to indicate places within the cancer database. For example, “#”, “no”, “no.” and “nbr” were used interchangeably, and could refer to a ‘PO Box’/‘Post Office Box’/‘Box’ number, a private box number such as in an apartment building, the number on a house, or even the address term “North.” An additional data processing problem with medical data is the lack of consistency in how information is placed into the database fields available for it. Since the primary effect of this improper data entry is misrepresented address information, the addresses of some cancer cases had to be managed in a completely manual fashion. Additional hindrances included the fact that many locations were simply PO Boxes, not the residential address of the case, and not necessarily the place of exposure to a potential carcinogen (for this research, household exposure was usually considered). Another problem was the common use of informal address indicators, such as mile markers, most often associated with the most rural settings of the state. These were often found along roads leading to homesteads situated along what were historically logging roads (such as County Road 12, mile 3.7), and/or communal living settings.
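The kind of pre-screening described above can be sketched in a few lines of Python. This is only an illustration, not the tool used for the project: the function name `classify_address`, the category labels, and the regular expressions are assumptions, and the token list contains only the variants named in the text. The idea is to separate records that an address matcher can reasonably attempt from those that need manual handling.

```python
import re

# Variants named in the text; their meaning (box, unit, house number, "North")
# cannot be decided automatically, so they are only flagged.
NUMBER_TOKENS = {"#", "no", "no.", "nbr"}

# Hypothetical patterns for PO Box style entries and mile-marker style entries.
PO_BOX = re.compile(
    r"^\s*(p\.?\s*o\.?\s*box|post\s+office\s+box|box)\s*#?\s*\d+",
    re.IGNORECASE,
)
MILE_MARKER = re.compile(r"\bmile\s+\d+(\.\d+)?\b", re.IGNORECASE)


def classify_address(raw: str) -> str:
    """Return 'po_box', 'mile_marker', 'ambiguous', or 'street'."""
    text = raw.strip()
    if PO_BOX.search(text):
        return "po_box"          # mailing address, not a residence
    if MILE_MARKER.search(text):
        return "mile_marker"     # informal rural address, needs manual placement
    # Pull out '#' and alphabetic tokens (keeping a trailing period, as in "no.").
    tokens = re.findall(r"#|[a-z]+\.?", text.lower())
    if any(tok in NUMBER_TOKENS for tok in tokens):
        return "ambiguous"       # could be a box, unit, house number, or "North"
    return "street"


for example in ["PO Box 123", "County Road 12, mile 3.7",
                "123 Main St #4", "456 Oak Ave"]:
    print(example, "->", classify_address(example))
```

Records classified as anything other than 'street' would be set aside for a second matching tool or manual review rather than passed through unchanged.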

To reduce this problem with regional databases, it is hoped that cancer and all other case information will be documented in a closely monitored fashion (using interrater reliability testing protocols at the least), so that all cases can be reported and officially documented as reliable with an actual GPS location, with or without the use of a standard address matching tool.

The right to privacy is of course very important for all members of the US population, and case-related data is often not shared because of confidentiality concerns. This issue is often raised in GIS discussions with public health professionals at national and regional peer-reviewed meetings. The most common methods employed by researchers include a slight modification of geo-location information for anonymity purposes, and permission to use data in which each case has a non-descriptive identifier, given under a signed document promising non-disclosure of this information without prior approval by a public health professional. To reduce the chances of exposing personal information, this research focuses on several cancer types at once, reduces the role of reviewing specific cancer types, and at times deliberately modifies or eliminates information that could result in the identification of cases.
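The "slight modification of geo-location information" mentioned above can be illustrated with a small Python sketch. The text does not say which masking method was used; the version below assumes a simple random displacement within a ring around the true location, with a minimum and maximum distance. The function name `mask_point` and the distances in the example are hypothetical, and the coordinates are assumed to be in a projected system where distances are meaningful.

```python
import math
import random


def mask_point(x, y, min_r, max_r, rng):
    """Displace (x, y) by a random distance in [min_r, max_r], random direction.

    min_r keeps the masked point from landing on the true location;
    max_r limits how far the spatial pattern is distorted.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Sampling r**2 uniformly spreads points evenly over the ring's area
    # instead of clustering them near the inner edge.
    r = math.sqrt(rng.uniform(min_r ** 2, max_r ** 2))
    return x + r * math.cos(angle), y + r * math.sin(angle)


rng = random.Random(42)  # fixed seed only so the example is repeatable
masked_x, masked_y = mask_point(1000.0, 2000.0, 50.0, 200.0, rng)
print(masked_x, masked_y)
```

A larger ring protects identity better but blurs any relationship between cases and nearby release sites, so the distances would have to be chosen with the scale of the analysis in mind.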

The purpose of this activity was to produce an isoline (contour) map depicting the theoretical cancer risk for the state of Oregon. The term “theoretical” applies here because not all cancer cases in the state were mapped, and this study focused only on a number of infrequent to rare cancer conditions and several fairly common cancer-related conditions.

The distribution of the different cancer types analyzed for this study is indicated by the following isoline (contour) map. These contours represent the distribution of cases, and have not been normalized by comparing the cancer rates to the population density and defined areal risks for the cancers used for this study. The pink sections represent hot spots for the cancer types evaluated, and may appear to be hot spots simply because of population density, not because of chemical release site locations. To identify true hot spots in relation to statewide population density, further analyses and related cartographic steps would have to be undertaken using this data.

Note: For the benefit of others trying to map the state, the most recent (1983) SPC North was used; the values noted provide information for grid cell corners (x,y’s) and numbers of cells E-W and N-S. The grid was started just a few miles off the westernmost portion of the state’s shore and allowed to extend beyond the remaining three boundaries. This additional edging allowed fairly accurate isoline arrays to be produced (for some rural areas, curvature was lost). To produce the shape of the state, sections of the isolines passing the state boundaries were removed rather crudely by drawing white polygons over them. The dots on the first map shown depict cases successfully mapped during the first pass through an address matching tool. Note the hot spot in south-central Oregon that would have been missing had the missing case data not been corrected for.
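As a rough illustration of how the grid parameters in the note relate to one another, the following Python sketch derives a grid origin and the numbers of cells E-W and N-S from a bounding box. It is a simplification under stated assumptions: the function name `grid_spec` is hypothetical, a single uniform margin is applied on all four sides (the project grid was padded by differing amounts), and the bounding box, cell size and margin must all be in the same projected units. The numbers in the example are made up and are not the Oregon values.

```python
import math


def grid_spec(min_x, min_y, max_x, max_y, cell_size, margin):
    """Return (origin_x, origin_y, n_cols, n_rows) for a padded square-cell grid.

    The origin is the lower-left grid corner; n_cols counts cells E-W and
    n_rows counts cells N-S. The margin lets the grid extend past the
    boundary so contours near the edge are not cut short.
    """
    origin_x = min_x - margin
    origin_y = min_y - margin
    n_cols = math.ceil((max_x + margin - origin_x) / cell_size)
    n_rows = math.ceil((max_y + margin - origin_y) / cell_size)
    return origin_x, origin_y, n_cols, n_rows


# Made-up extent: 10 units wide, 5 units tall, unit cells, 2-unit margin.
print(grid_spec(0, 0, 10, 5, 1, 2))
```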

Details of data input into an Avenue extension grid mapping tool for statewide analysis of cancer

Based on combining the cancer case point data for 7 cancer types


The development of a map of case density by place, based on contours (isolines) developed using a statewide 1 mile x 1 mile square-celled grid, number of cases per cell.

To produce this map, a grid was produced overlying the entire state. The number of nodes in the grid (points where lines crossed) was defined based on the size of the state. A grid with 1 mile by 1 mile cells was produced and laid over the map using an ArcView extension. Next, the numbers of cases per cell were totaled using another ArcView extension. A centroid producer Avenue extension was then employed, and the number of cases per cell was transferred from each polygon grid cell to its matching centroid (employing yet another extension). The ArcView 3.2 Spatial Analyst extension was then used to convert these centroid values into contours.
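The counting and centroid steps described above were carried out with ArcView extensions; the Python sketch below only mirrors their logic so the sequence is easier to follow. The function names, the argument names and the sample points are all hypothetical, and coordinates are assumed to be in the same units as the cell size (for example, miles). The final contouring step is not shown; the centroid values produced here would be handed to whatever contouring tool is available.

```python
from collections import Counter


def count_cases_per_cell(points, origin_x, origin_y, cell_size):
    """Count points falling in each square cell, keyed by (col, row)."""
    counts = Counter()
    for x, y in points:
        col = int((x - origin_x) // cell_size)
        row = int((y - origin_y) // cell_size)
        counts[(col, row)] += 1
    return counts


def centroid_values(counts, origin_x, origin_y, cell_size, n_cols, n_rows):
    """Return (cx, cy, count) for every cell centroid, including empty cells.

    Empty cells are kept with a count of 0 so that contours drop off
    correctly around clusters of cases.
    """
    out = []
    for row in range(n_rows):
        for col in range(n_cols):
            cx = origin_x + (col + 0.5) * cell_size
            cy = origin_y + (row + 0.5) * cell_size
            out.append((cx, cy, counts.get((col, row), 0)))
    return out


# Made-up case locations on a 2 x 1 grid of unit cells.
cases = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.5)]
cell_counts = count_cases_per_cell(cases, 0.0, 0.0, 1.0)
print(centroid_values(cell_counts, 0.0, 0.0, 1.0, n_cols=2, n_rows=1))
```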

Cases by County

The typical way the public is informed about cancer cases is at the county level. The primary reason is that, for a particularly rare cancer type, noting even one case within a county can lead to the claim that the writer or publisher did not respect the privacy of personal health information. By revealing to just the right inquisitive people who is being discussed, one risks being accused of revealing too much information, and even of ignoring privacy in a way that could result in legal prosecution.

For this reason, individual cases are often not discussed in detail in research on cancer incidence. This study largely conforms to this requirement and so focuses mostly on cancer in general in relation to chemical exposure from chemical release sites. In many cases, cancer types are modified and locations slightly changed for the small area analyses that are performed. For this study, various forms of cancer of undescribed types are evaluated relative to chemical release sites, to maintain the anonymity of these individuals as much as possible. It is important to note, however, that since this research began, a number of GIS epidemiologists have published reports on the web concerning diseases, especially environmental diseases and diseases associated with local ecology. For this reason, significant attempts are made to prevent exact locations from being identified in some of these maps. This is not always the case, but in most of the situations where exact locations appear in the final product, there are many possibilities as to who exactly a mapped case might be.

The data provided for GIS research largely lacks specific private information regarding cases, and there is no deliberate attempt to reveal a particular individual whenever such maps are produced. Since age is generally not revealed in these discussions, individuals who think they know the person being portrayed in a map are really just guessing from the information that is presented. Finally, it helps to realize that the data used for this project regarding human cases is more than a decade old. An individual who now resides in a particular area with a particular illness is unlikely to be the one portrayed using the original data.

Oregon Case Counts data in relation to Case Counts Normalized by County Area to depict relative density differences between counties

County-related Point Counts versus Incidence based on Counts Normalized according to Census Data.

These two choropleth maps depict two ways to illustrate Oregon case counts. The map on the left shows simply the number of cases per county. On the right, these case counts are normalized by county area to depict cases as a density displayed areally, such as at a per square mile level. Note also the differences between the two maps in the number of cases or “risk” inferred from the color assigned to the same county in each map.
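The difference between the two maps comes down to simple arithmetic, sketched below in Python. The counties and all of the numbers are invented purely for illustration and are not Oregon data; the function name `normalize` is likewise hypothetical. The sketch also includes a population-based rate (cases per 100,000 residents), on the assumption that this is what normalization by census data amounts to.

```python
def normalize(cases, area_sq_mi, population):
    """Return (cases per square mile, cases per 100,000 residents)."""
    density = cases / area_sq_mi
    rate_per_100k = cases * 100_000 / population
    return density, rate_per_100k


# Invented example: a small urban county and a large rural county.
counties = {
    "County A": (200, 400.0, 500_000),    # cases, area (sq mi), population
    "County B": (50, 10_000.0, 20_000),
}

for name, (cases, area, pop) in counties.items():
    density, rate = normalize(cases, area, pop)
    print(f"{name}: {cases} cases, {density:.3f} per sq mi, {rate:.1f} per 100,000")
```

In this invented example County A leads on raw counts and on cases per square mile, while County B has the higher rate per 100,000 residents, which is why the same county can fall into different color classes depending on the normalization chosen.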