Inventors of the Hexagonal Grid and Thiessen-Voronoi Polygon Methods for Population-Area Analyses

Hexagonal Grid Mapping

Innovation 1 – The Use of Hexagonal Grids to Analyze Space, 2002 (see Oregon State Pollution and Chemical Release Site study)

Introduction.  I developed the hexagon grid tool sometime in the winter of 2002/3.  It took just a day to construct an Excel tool for it, but a month or more to implement it, test it, and validate that my formulas were correct regardless of which systems were in use.  It was designed for use with equal-area mapping and with coordinate systems that rely upon some of the equal-area attributes for mapping coordinates.  These formulas therefore do not work on all projections and coordinate systems, and will produce irregular-looking polygons when applied to the wrong lat-long systems.  I apply this to measurements in linear English units, such as decimal feet (i.e., in SPCs); it does not work with degrees of latitude and longitude.

At the end of this section is a chart of the progress of this particular page.  Due to its popularity, this page has been extended significantly since I first posted my findings a little more than two years ago.  At that time, my rediscovery of Christaller’s technique was already about four years old.  I decided I had to post it on the web rather than keep it to myself, since it wasn’t helping me to find a job.  I felt this process was one or two standard deviations ahead of the progress being made in the GIS grid mapping field.

Due to the quality of its output, I decided that this hexagonal grids work is one example of a methodology that should have been taught back when I was a graduate student in geography during the late 1990s.  But science inherently has a tendency to stick with the status quo, to avoid getting too deep into something that might require a lot of intellectual capacity, time, and work, and not to promote innovations once they are made and discussed.  The problem with making a discovery is that if it is never published, then two or three more discoveries later, people will no longer be able to follow you in terms of what you are claiming to have done.

Producing a hexagonal grid in GIS requires an understanding of the two formulas needed to replicate this kind of spatial pattern.  These are derived from basic trigonometry and geometric equations, and once understood and developed they are easy to implement using GIS.  In the past this was probably not commonly implemented in spatial grid analysis due to the lack of automation of the formula processes in computer systems.  With current technology, such a formula can be developed, tested, and promoted in just a few minutes once it is understood.  And so I engaged in this process and posted it.  Due to its growing popularity, I decided to post more on the theory of this method, with the hope of putting out there the knowledge base and theories I derived my methodology from.  This was still apparently not enough for some, so a year later I decided to find the floppy that had the old Excel file and formulas I used to produce these grids, and post it on a separate page for others to download.  That file has since been downloaded almost daily over the past several months, receiving a low to moderate number of hits per day from web visitors.  Approximately one in six to one in four people who read about hexagonal grid mapping download that Excel spreadsheet.  To date, the introductory hexagonal grid analysis page for employing GIS remains my most popular statistical analysis site.
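To make those two spacing formulas concrete, here is a minimal sketch, in Python rather than the original Excel workbook, of how the centroid points of such a grid can be generated in a planar coordinate system; the function name and parameters are mine, for illustration only.

```python
import math

def hex_grid_centroids(xmin, ymin, xmax, ymax, spacing):
    """Generate centroid coordinates for a hexagonal grid.

    spacing -- center-to-center distance between neighboring points in a
               row, in the planar units of the projection (e.g., feet in
               State Plane Coordinates).
    The two governing formulas:
        row height = spacing * sqrt(3) / 2
        row offset = spacing / 2 on every other row
    Converting these points to Thiessen (Voronoi) polygons yields regular
    hexagonal cells whose face-to-face width equals the spacing.
    """
    row_height = spacing * math.sqrt(3) / 2.0
    points = []
    row = 0
    y = ymin
    while y <= ymax:
        x = xmin + ((spacing / 2.0) if row % 2 else 0.0)
        while x <= xmax:
            points.append((x, y))
            x += spacing
        y += row_height
        row += 1
    return points

# Example: a 10,000 x 10,000 ft study area with 1,000 ft spacing
if __name__ == "__main__":
    pts = hex_grid_centroids(0, 0, 10000, 10000, 1000)
    print(len(pts), "centroids generated")
```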

Background.  A typical grid analysis of area has the option of using square grids, and square grids are the standard.  The problem with the use of square grids is the high potential for error, and if square grid cells are used to map space, the resulting contour or isoline maps produced from these grids are sloppy and very unsatisfying.  For this reason, I decided I had to develop a hexagonal grid formula to more accurately analyze the spatial features I was reviewing–cancer cases in relation to chemical release sites–and use this to develop the contour maps of population exposure to chemicals for the state of Oregon.

To reduce or eliminate this problem in statistics and space, a moving-circles program was attempted, but its results weren’t that convincing or helpful.  The moving-circles software represented an improvement over the use of a static square grid, but it too had limitations related to the maps it produced.  Predefined circle sizes were applied to this method: the analyst first defines how large the circle passed over the research area will be, and then the program is run, with the distance between rows also defined.  The end result was a map displaying just those circles that demonstrated significance, all of this according to the numbers (circle size) entered by the analyst.  Areas of risk could be defined, but there was no way this could be used to produce any detailed large-area map, such as an isoline map of risk, and each area displayed had to be reevaluated to define how and why it was at risk.  The formula used for this moving-circles scan of an area and the overall usefulness of the end product were the limitations of this method of spatial analysis and of the software program.
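For readers unfamiliar with that kind of scan, a rough sketch of the general idea follows.  It is a simplification based on my reading of the approach, not the actual program; the names and the simple count threshold are illustrative (the real software tested each circle for statistical significance).

```python
import math

def moving_circle_scan(points, xmin, ymin, xmax, ymax,
                       radius, step, threshold):
    """Pass a fixed-size circle over the study area and keep only those
    circle centers whose point count meets the threshold.
    Illustrative only: a stand-in for the significance test applied by
    the original moving-circles program.
    """
    flagged = []
    y = ymin
    while y <= ymax:
        x = xmin
        while x <= xmax:
            count = sum(1 for (px, py) in points
                        if math.hypot(px - x, py - y) <= radius)
            if count >= threshold:
                flagged.append((x, y, count))
            x += step
        y += step
    return flagged
```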

Outcomes.  At first it appeared as though my method might be incorporated into some other spatial analysis programs.  But like many new discoveries, my methodology was initially criticised and ridiculed, and it has not been incorporated into any spatial analysis techniques currently out there or in the making.  This method of spatial analysis and representation is currently in the next stage of review by others in this field.  It is being tested and incorporated by some as part of their GIS studies and work, mostly by students.  Acceptance and implementation of this technique are still a ways from being recognized as being of any value, and there is perhaps still a little ridicule out there about what I did in theory, due to a lack of knowledge regarding just how effective this methodology can be.  Scholars like to reflect upon the past teachings of its originator, Christaller, but do little to learn how to incorporate such an innovative technique into their own skill sets.  Once this third stage is done and over with, the statistical value of the use of hexagon grids over square grids will become better understood and thereafter recognized as being of some analytical value.

In essence, progress with such a discovery or implementation of an idea follows the standard phases any new technique undergoes: rejection, followed by intense criticism and ridicule, followed by the realization and recognition of truth, and then finally acceptance as being correct and of some intrinsic value to the work it relates to as a whole.  Innovations are what they are–innovations–not recollections of the ongoing status quo.  The best discoveries are two or three standard deviations ahead of the norm, but these are also the least recognized discoveries.  Developing a following of a few people in your footsteps is required for you to be at the forefront of progress in this field.  Being just one standard deviation ahead of the norm, however, is not always as rewarding intellectually as being two or three ahead.  Such is the problem with academia.

[Figure: cases shown as green '+' symbols; hexagon shading represents CRI site density (number of sites per cell).]

Methodology and Analytic Logic.  My reasons for developing the hex grid approach back in 2003 were two-fold.  First, I wanted to work with very small areas which together represent a common feature mapped over a much larger area; this could then be used to produce an isoline map that lacked the sharp corners seen in isoline maps generated from square grid cell centroids.  Second, by taking into account the error that exists spatially (the corner areas of a cell are considerably more distant from the centroid), I better and more accurately defined the smallest area possible for evaluation with hexagonal cells versus square grid cells.  I then went on to produce a hexagonal grid defined by a target area per cell, rather than by cells with edges of a specific length.

In the case of the chemical release and exposure study, the assumption I made was that there is a +/- 500′ error in where the site is located, or in where the exposure chemical is coming from, which means the error was approximately 0.1 miles (5280/10 = 528 feet).  A 0.1 x 0.1 mile square grid cell has an area of 0.01 sq miles, with significant area in the corners of the cell.  These corner areas are where the data does not match the analytical technique, and are where the error can be propagated from.
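To illustrate why the corners matter, the following sketch compares the worst-case centroid-to-corner distance for a square cell and a regular hexagonal cell of that same 0.01 sq mi area.  The numbers illustrate the geometry only; they are not results from the Oregon study.

```python
import math

# Compare the worst-case centroid-to-corner distance for a square and a
# regular hexagon of equal area (0.01 sq mi = 0.1 mi x 0.1 mi).
# Units are feet (0.1 mi = 528 ft).

area = 528.0 ** 2                       # 0.01 sq mi in square feet

# Square: side a, centroid-to-corner = a * sqrt(2) / 2
a = math.sqrt(area)
square_corner = a * math.sqrt(2) / 2    # ~373 ft

# Regular hexagon: area = (3*sqrt(3)/2) * s**2, centroid-to-vertex = s
s = math.sqrt(2 * area / (3 * math.sqrt(3)))
hex_corner = s                          # ~328 ft

print(f"square corner distance:  {square_corner:.0f} ft")
print(f"hexagon corner distance: {hex_corner:.0f} ft")
print(f"reduction: {100 * (1 - hex_corner / square_corner):.0f}%")
```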

Since we begin with the assumption that error comes from two major sources in this methodology (the error in where the site is located and the error related to grid cell shape), what we need to correct for is where these two types of error multiply each other’s effect.  Hexagonal grid cells have a reduced corner (apex) area, and therefore a reduced chance of error introduced by this problem.  Next, knowing the location error is 500 feet, you choose a cell size of about 2 x this error, or 1000 units in diameter (face to face, or apex to apex may be tried).  So now the information being evaluated is probably in the right cell.  There is still the neighboring-cell problem, meaning that point data placed in one cell may actually relate more to point data in the neighboring cell, but this problem has been reduced as much as possible.
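Because a hexagon has two natural diameters, the face-to-face and apex-to-apex conventions mentioned above give somewhat different cells for the same 1000-unit figure.  A small sketch of the arithmetic is below; the function name is my own choosing.

```python
import math

def hex_dimensions(diameter, measured="face"):
    """Given a hexagon 'diameter' of ~2x the location error (here 1000
    units for a 500 ft error), return the side length, the other
    diameter, and the cell area.  measured = 'face' (face-to-face) or
    'apex' (apex-to-apex), since either convention may be tried.
    """
    if measured == "face":
        side = diameter / math.sqrt(3)      # face-to-face = side * sqrt(3)
        other = 2 * side                    # apex-to-apex
    else:
        side = diameter / 2.0               # apex-to-apex = 2 * side
        other = side * math.sqrt(3)         # face-to-face
    area = 1.5 * math.sqrt(3) * side ** 2
    return side, other, area

print(hex_dimensions(1000, "face"))   # side ~577, apex-to-apex ~1155
print(hex_dimensions(1000, "apex"))   # side  500, face-to-face  ~866
```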

If one has the time, several grids can be run over the same area, using different starting points for producing the points that are converted into Thiessen polygons later on.  A 1/3 or 1/2 overlap of cell patterns can be tried, for example by using the central point of each of the grid cells as the new centroid points for three new series of hexagons, each series going in a direction 60 degrees offset from the prior, so that a single grid cell has six cells overlapping it, with the single shared point where these six overlap representing the centroid of the original hex grid cell (one possible construction is sketched below).  [Think of the center of a pie cut into six pieces, with each piece representing a wedge pulled from a distinct neighboring pie that is overlapping the original pie.]
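One way to read that offset scheme is to shift the original centroid lattice by the cell's centroid-to-vertex distance in directions 60 degrees apart, so that the original centroid lands where the offset cells meet.  The sketch below follows that reading; both the interpretation and the names are mine, not the original tool's.

```python
import math

def offset_grids(centroids, spacing, n_offsets=3):
    """Shift an existing hex-grid centroid list to build overlapping
    grids (one possible reading of the 1/3-overlap scheme above).

    Each new series is displaced by the cell's centroid-to-vertex
    distance (spacing / sqrt(3), where spacing is the face-to-face
    width) in a direction rotated 60 degrees from the previous one.
    """
    r = spacing / math.sqrt(3)              # centroid-to-vertex distance
    grids = []
    for k in range(n_offsets):
        angle = math.radians(90 + 60 * k)   # 60-degree steps
        dx, dy = r * math.cos(angle), r * math.sin(angle)
        grids.append([(x + dx, y + dy) for (x, y) in centroids])
    return grids
```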

This latter approach reduces the size of the smallest possible cell even more, down to the area of each triangle.  Since the original grid cell size is also selected at a value low enough to avoid missing important relationships, while taking error-related area into consideration, we have more than accounted for problems in analysis produced by poor cell size.  This methodology is essentially identical in interpretive value, validity, and reliability to a moving-windows methodology, in which the analysis is limited by a non-varying window size.  In terms of error-related problems in the spatial analysis technique, a moving 0.1-diameter window of circles is just as reliable as a static 0.1-diameter hex cell with overlapping hex cell arrays.  The first has a static area per cell; the second has a modifiable area per cell but requires a little more work to produce those secondary hex grids.  If the second methodology is used, but the original cell size is chosen small enough to avoid the need for multiple offset hex grids, then a single pass can be used to produce the main hex grid.  Using nearest neighbor techniques, you can then engage in analyses on a per-cell basis to get a result with fewer static grid cell related errors creeping in.  For example, with the nearest neighbor technique, if you are interested in testing for 0.25 sq mi proximity involving two points, you can use a 0.1-diameter hex grid technique and, with nearest neighbor, test for 0.2- and 0.3-diameter areas as well.
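A minimal, brute-force sketch of that per-cell, multiple-distance query is below.  It assumes point coordinates in miles, and the distances, field names, and functions are illustrative rather than taken from the original study.

```python
import math

def counts_within(centroids, cases, radii):
    """For each hex-cell centroid, count case points within each of
    several distances.  A simple stand-in for the nearest-neighbor tests
    described above: with a 0.1-diameter hex grid already built, the same
    centroids can be queried at 0.1, 0.2, or 0.3 mile distances without
    rebuilding the grid.
    """
    results = []
    for (cx, cy) in centroids:
        row = {r: sum(1 for (px, py) in cases
                      if math.hypot(px - cx, py - cy) <= r)
               for r in radii}
        results.append(((cx, cy), row))
    return results

# e.g. counts_within(pts, case_points, radii=[0.1, 0.2, 0.3])  # miles
```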

The key to effectively using hex grid techniques is how you define the cell size, and then take into account the placement of the first, and therefore all subsequent, centroids used to produce the neighboring cells or arrays.  Cell size has to be defined by taking error analysis into account.  This methodology perhaps works best with true point data, not data attached to irregular, non-contiguous polygons.  Unless census blocks are small, overlaying hex grids on census block maps can be a problem.  You have a block with three or more cells overlapping it, and you use area of intersection to distribute the values in that census block, but that is done assuming a perfectly even distribution across the entire block.  Such is rarely the case.  You are better off using hex grid techniques to compare one point to another point over a given area, basing the grid cell size on the error in the placement of each of these points.
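For completeness, here is a sketch of the area-of-intersection allocation just described, which makes that even-distribution assumption explicit; the identifiers and numbers are hypothetical.

```python
def areal_weight(block_value, block_area, intersect_areas):
    """Distribute a census-block value across the hex cells overlapping
    it, in proportion to area of intersection.

    Assumes the value (e.g., population) is spread perfectly evenly over
    the block, which, as noted above, is rarely the case.
    """
    return {cell_id: block_value * (a / block_area)
            for cell_id, a in intersect_areas.items()}

# e.g. areal_weight(1200, 0.35, {"hex_17": 0.10, "hex_18": 0.15, "hex_22": 0.10})
```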

The following figure shows the use of this particular spatial analysis technique over the past 15 to 18 months.  For just the hexagon grids introduction/description page, traffic runs at about 2 visitors (mostly students) per day, with peaks defined by the academic schedule.