Wednesday, February 1, 2017

Assignment 1 - Measurement Scales and Classification Methods

Part 1

Nominal, ordinal, interval and ratio data are each different types of measurement scales.


Nominal data is used to label variables that don’t have any quantitative value (My Market Research, 2012). Nominal data is discrete (Usablestats.com, not dated), meaning it has a finite number of values, and these values can’t be meaningfully subdivided (iSixSigma, not dated). Nominal data is often used for giving names to features (Usablestats.com, not dated). On a map, this may display things like the names of States, Towns, Roads and so on. A simple example of a map showing nominal data is below – this is a map of the United States, and the State names displayed are nominal data.

Figure 1 – Map of the US with State Name labels (Dr. Odd, not dated)
Ordinal data is categorised data, with categories ranked along an arbitrary scale, in which the difference between two categories may not always be the same. Values of ordinal data categories may be low, medium and high, for instance (UCLA, not dated). As there are a finite number of categories that ordinal data can be sorted into, this type of data is discrete, like Nominal data. On a map, Ordinal data may be displayed using graduated symbols or by giving each category its own colour. The map below, taken from The Telegraph newspaper, is a good example of Ordinal data, as it ranks countries by their terror threat level along a scale from low threat to high threat. These values are arbitrary as there is no evidence of a consistent difference between these categories: there may be a larger gap between ‘low threat’ and ‘underlying threat’ than exists between ‘general threat’ and ‘high threat’. In this case, colours on a choropleth map are used to highlight the ordinal categories.


Figure 2 – Map of the World showing Terror Threat Levels (TelegraphTravel, 2016)

Unlike Nominal and Ordinal data, Interval and Ratio data are numeric. Interval data is measured along a regular continuum, with the distance between one value and the next always being the same, unlike ordinal data. With interval data, a value of 0 does not mean an absence of the thing being measured. An example of interval data is temperature (Laerd Statistics, not dated): a temperature of 0 degrees does not mean an absence of temperature. Interval data is continuous as it is not limited to a finite set of values (Usablestats.com, not dated). An example of Interval data on a map is pictured below: this shows the temperature across the United States on January 30th, 2017, taken from The Weather Channel.

Figure 3 – Temperatures across the US on 01/30/17 (The Weather Channel, 2017)

As the map shows, values range from -40 to 120 degrees Fahrenheit, although these do not represent the lowest and highest possible temperatures, since interval data is not restricted to a finite set of values. It is clear here that the 0 value does not indicate absence, as temperatures in some areas are below this value, but this does not mean they lack a temperature.
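The point about the 0 value can be illustrated with a short Python sketch. The two readings below are made up for illustration rather than taken from the map, and the conversion to kelvins is introduced here only as a comparison with a scale that does have a true zero.

```python
def fahrenheit_to_kelvin(temp_f: float) -> float:
    """Convert degrees Fahrenheit (interval scale) to kelvins (true zero)."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


# Hypothetical readings, chosen only for illustration.
cold_f, warm_f = 40.0, 80.0

# Differences are meaningful on an interval scale.
difference_f = warm_f - cold_f

# Ratios are not: 80 F is not "twice as hot" as 40 F, because 0 F is arbitrary.
naive_ratio = warm_f / cold_f  # 2.0

# On a scale with a true zero the same two readings give a very different ratio.
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cold_f)

print(difference_f, naive_ratio, round(kelvin_ratio, 3))
```

The ratio on the kelvin scale comes out only slightly above 1, which is why statements such as "twice as warm" should be avoided for interval data.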

Finally, we come to Ratio data. Much like interval data, this is a form of continuous data, meaning values are not limited to a finite number of possibilities. However, Ratio data differs from interval as it has a meaningful 0 value, indicating the absence of the phenomenon being measured (Laerd Statistics, not dated). Examples of ratio data include rainfall levels or number of hospitals per county. The map displayed below shows the size of crater left in locations where an asteroid has collided with the Earth. The symbols vary in size, dependent on the size of the crater left. These symbols are proportional to the size of the crater, while areas of grey illustrate a lack of asteroid collisions: this is the meaningful 0 that differentiates ratio data from interval.

Figure 4 – Craters made from asteroid collisions, up to 2013 (The Washington Post, 2013) 

Part 2

Goal

For this section of the assignment, we were tasked with creating three maps for a client, each using the same data but different classification methods: specifically, Jenks Natural Breaks, Equal Interval based on range and Quantile. These maps were to be produced for the client of our company, an Agricultural Consulting and Marketing specialist. The goal of the task was to create the three requested maps and then select which one best fit the purposes of the client – who were looking to increase the number of female principal farm operators (FPFOs) in Wisconsin. The selected map must give the best representation of the distribution and density of female principal farm operators at county level, and give insight on where our client should concentrate their message.

Method

To create the maps in question, a shapefile of Wisconsin counties was downloaded from the US Census Bureau website and joined with a table containing data on female principal farm operators by county from the 2012 Bureau of Agriculture census. These figures did not require normalization. To make the map a more accurate depiction of Wisconsin, the projection was changed from WGS 1984 Web Mercator (Auxiliary Sphere) to NAD 1983 State Plane Wisconsin Central FIPS 4802 (Meters). This is because Web Mercator is designed for small-scale maps of the entire world rather than for a small area such as Wisconsin, where it produces noticeable distortion. The State Plane projection is more appropriate as it is designed specifically for Wisconsin, so distortion across our client’s area of interest is minimized.

After the map had been set up, the three different classification types were applied to the data, each producing maps with different visual impacts. These are pictured below. Elements such as a North arrow, scale bar, legend and title were added to the maps to make them more cartographically pleasing and to ensure the client could clearly understand them.

Results

Figure 5 - data on FPFO mapped using Jenks Natural Breaks
The first map created uses the Jenks Natural Breaks classification method. This method automatically assigns classifications to data according to natural groups that exist within the data distribution (ESRI, not dated).

This map indicates that northern Wisconsin is the main area lacking in female principal farm operators, with the highest numbers found in counties in the south of the state towards the Illinois border and in the centre of the state, particularly in the west towards the counties bordering the Twin Cities area in Minnesota. For our clients, therefore, this map would suggest focusing their marketing campaign in the northern portion of the state, where there are fewer female principal farm operators already and therefore a larger target audience.

However, the use of Jenks Natural Breaks classification here exaggerates the number of female principal farm operators. This is principally because the fourth class, shown in the darkest colour and used to indicate counties with a high number of female principal farm operators, is vastly larger than the others, covering 186.000001 – 368.000000. This is disproportionate in size to the other three categories and means that the counties in this fourth category may appear to have lots of FPFOs, but in reality this is exaggerated by the distribution of the natural breaks. Furthermore, values such as ‘186.000001’ are generated automatically by the Jenks Natural Breaks classification, but they are not appropriate for this data set, which contains only whole numbers.
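To show why natural breaks can produce classes of very different widths, here is a minimal Python sketch of the optimisation usually associated with the method: choosing class boundaries that minimise the squared deviation within each class. It is an illustration only and is not claimed to match the ArcGIS implementation. The function name and the sample counts are hypothetical and are not the Wisconsin data.

```python
from typing import List


def jenks_breaks(values: List[float], n_classes: int) -> List[float]:
    """Return the upper bound of each class for a minimum-variance partition."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice in constant time.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(data):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for data[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest total deviation when the first j values form k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk back through the stored split points to recover the upper bounds.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


# Hypothetical county counts with one extreme value, for illustration only.
sample_counts = [12, 15, 18, 60, 64, 70, 180, 190, 368]
print(jenks_breaks(sample_counts, 4))
```

In this made-up sample the extreme value ends up in a class of its own, so the class ranges are far from equal in width, which is the behaviour discussed above.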


Figure 6 - data on FPFOs mapped using Equal Interval Classification
The next map applies Equal Interval classification to the same data set. This differs from Jenks Natural Breaks as all four subranges are of equal size (ArcGIS Pro, not dated). The absence of FPFOs is also highlighted with this classification method; however, it is better for this purpose than Jenks Natural Breaks as the equal size of the subsets prevents exaggeration of any particular group. Unlike with Jenks Natural Breaks, only one county here has been sorted into the final category, 289.750001 – 368.000000. This gives a more accurate depiction of the number of FPFOs by county, as it is easier to compare the values of the subsets relative to one another. This classification, however, does also utilize fractions rather than integers, which is inappropriate for the data set but necessary to produce categories of equal size.
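The arithmetic behind equal interval breaks is simple enough to sketch in Python. The maximum of 368 and the four classes come from the map described above. The minimum of 55 is not stated anywhere in this post; it is inferred here from the final class (289.75 to 368 implies a class width of 78.25) and should be treated as an assumption. The function name is hypothetical.

```python
from typing import List


def equal_interval_breaks(minimum: float, maximum: float, n_classes: int) -> List[float]:
    """Return the upper bound of each class when the range is split evenly."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    width = (maximum - minimum) / n_classes
    return [minimum + width * k for k in range(1, n_classes + 1)]


inferred_minimum = 55.0  # assumption: back-calculated from the reported final class
maximum = 368.0          # upper bound of the final class reported above

breaks = equal_interval_breaks(inferred_minimum, maximum, 4)
print(breaks)  # [133.25, 211.5, 289.75, 368.0]
```

Because the breaks depend only on the minimum, the maximum and the number of classes, fractional boundaries such as 289.75 appear even though every county value is a whole number.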


Figure 7 - data on FPFOs mapped using Quantile Classification
Finally, the last classification used is the Quantile classification. This classification places the same number of features in each subset. In this case, there is an equal number of counties in each of the four subsets (ArcGIS Pro, not dated).

With this classification style, the pattern remains that there is a lack of FPFOs in the north of the state. However, because the same number of features must fall within each category, giving each category eighteen counties exaggerates the number of counties in which there are high numbers of FPFOs. To achieve this, the classification has automatically assigned a final subset made up of counties with anywhere between 139.000001 – 368.000000 FPFOs. This is a wider subset than that used in the Jenks Natural Breaks classification, and likewise it makes the map difficult to read as it suggests that there are many FPFOs towards the southern and eastern borders of the state. This could be confusing to the client.
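A short Python sketch shows how equal counts per class lead to unequal class ranges. The 72 values below are hypothetical and deliberately skewed; they are not the FPFO figures, and ties between equal values are handled naively by sort position. The function name is also hypothetical.

```python
from typing import List


def quantile_classes(values: List[float], n_classes: int) -> List[List[float]]:
    """Split sorted values into n_classes groups of (near) equal size."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    data = sorted(values)
    n = len(data)
    classes = []
    for k in range(n_classes):
        start = k * n // n_classes
        end = (k + 1) * n // n_classes
        classes.append(data[start:end])
    return classes


# Hypothetical skewed values for 72 features (one per county), illustration only.
sample_values = [i * i for i in range(1, 73)]

classes = quantile_classes(sample_values, 4)
for group in classes:
    # Every class holds 18 features, but the range each class spans differs.
    print(len(group), group[0], group[-1], group[-1] - group[0])
```

With skewed values like these, the top class has to stretch across a much wider range than the bottom class in order to collect its eighteen features.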

After visually reviewing the maps produced by the three chosen classification methods, it is clear that in this case the Equal Interval classification is the most appropriate for the data set in question, and for the client. This is because it makes it easier to compare the number of female principal farm operators in each county relative to the others, which gives a clearer indication of which counties the client should focus their marketing campaign on. With the two rejected classifications, the range of each subset differs, which makes comparison difficult and likely to mislead. This could cause the client to miss potentially important counties where their message would be well received, as those counties appear to have more FPFOs on paper than they do in reality.

From the Equal Interval map produced, it is clear that the client should focus their message in the north of the state, where few counties fall into anything other than the lowest subset of FPFOs. This means that there will be a large, untapped audience for the client, who will be able to distribute their message effectively.



References:

ArcGIS Pro, not dated. Data Classification Methods. [Online] Available: http://pro.arcgis.com/en/pro-app/help/mapping/symbols-and-styles/data-classification-methods.htm (accessed 02/01/17)

Dr. Odd, not dated. US Map. [Online] Available: http://www.drodd.com/html7/us-map.html (accessed 01/30/17)

ESRI, not dated. Natural Breaks Classification GIS Dictionary. [Online] Available: http://support.esri.com/other-resources/gis-dictionary/term/natural%20breaks%20classification (accessed 02/01/17)

iSixSigma, not dated. Discrete Data. [Online] Available: https://www.isixsigma.com/dictionary/discrete-data/ (accessed 01/30/17)

Laerd Statistics, not dated. Types of Variable. [Online] Available: https://statistics.laerd.com/statistical-guides/types-of-variable.php (accessed 01/30/17)

My Market Research, 2012. Types of Data and Measurement Scales. [Online] Available: http://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/ (accessed 01/30/17)

TelegraphTravel, 2016. Mapped: Where in the World is Safe from Terror? 12/20/16. The Telegraph. [Online] Available: http://www.telegraph.co.uk/travel/maps-and-graphics/Mapped-Terror-threat-around-the-world/ (accessed 01/30/17)

The Washington Post, 2013. Russia’s Surprise Meteor and Earth’s Craters. 02/15/13. [Online] Available: http://www.washingtonpost.com/wp-srv/special/world/russia-meteor/index.html (accessed 01/30/17)

The Weather Channel, 2017. Current US Temperature Map. 01/30/17. [Online] Available: https://weather.com/maps/ustemperaturemap (accessed 01/30/17)

UCLA, not dated. What is the difference between categorical, ordinal and interval variables? [Online] Available: http://www.ats.ucla.edu/stat/mult_pkg/whatstat/nominal_ordinal_interval.htm (accessed 01/30/17)


Usablestats.com, not dated. Fundamentals of Statistics 1: Basic Concepts :: Discrete and Continuous. [Online] Available: http://www.usablestats.com/lessons/datatypes2 (accessed 01/30/17)
