Which Measure Of Central Tendency Is Least Representative Of The Data Set Shown? 2, 2, 3, 4, 5, 32

Measures of Central Tendency

Introduction

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency, but under different conditions, some measures of central tendency become more appropriate to use than others. In the following sections, we will look at the mean, mode and median, and learn how to calculate them and under what conditions they are most appropriate to be used.

Mean (Arithmetic)

The mean (or average) is the most popular and well known measure of central tendency. It can be used with both discrete and continuous data, although its use is most often with continuous data (see our Types of Variable guide for data types). The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. So, if we have $ n $ values in a data set and they have values $ x_1, x_2, \dots, x_n $, the sample mean, usually denoted by $ \overline{x} $ (pronounced "x bar"), is:

$$ \overline{x} = {{x_1 + x_2 + \dots + x_n}\over{n}} $$

This formula is usually written in a slightly different manner using the Greek capital letter, $ \sum $, pronounced "sigma", which means "sum of...":

$$ \overline{x} = {{\sum{x}}\over{n}} $$

You may have noticed that the above formula refers to the sample mean. So, why have we called it a sample mean? This is because, in statistics, samples and populations have very different meanings and these differences are very important, even if, in the case of the mean, they are calculated in the same way. To acknowledge that we are calculating the population mean and not the sample mean, we use the Greek lower case letter "mu", denoted as $ \mu $:

$$ \mu = {{\sum{x}}\over{n}} $$

The mean is essentially a model of your data set. It is a single value that stands in for all the values in the set. You will notice, however, that the mean is not often one of the actual values that you have observed in your data set. Nevertheless, one of its important properties is that it minimises error in the prediction of any one value in your data set. That is, it is the value that produces the lowest amount of error (measured as the sum of squared differences) from all other values in the data set.

An important property of the mean is that it includes every value in your data set as part of the calculation. In addition, the mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero.
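Both properties can be checked numerically. The sketch below uses four made-up values (they are not taken from this guide) and a hypothetical helper, `sum_squared_error`, and it takes "error" to mean the sum of squared differences.

```python
from statistics import mean, median

values = [3, 5, 8, 12]  # made-up values, for illustration only

x_bar = mean(values)  # sum of the values divided by how many there are

# Property 1: the deviations of each value from the mean sum to zero.
deviation_sum = sum(x - x_bar for x in values)

def sum_squared_error(centre, data):
    """Total squared distance between `centre` and every value in `data`."""
    return sum((x - centre) ** 2 for x in data)

# Property 2: the mean gives a smaller sum of squared errors than
# another candidate centre, here the median.
sse_mean = sum_squared_error(x_bar, values)
sse_median = sum_squared_error(median(values), values)

print(x_bar, deviation_sum)   # 7 0
print(sse_mean, sse_median)   # 46 47.0
```

Here the median (6.5) comes close, but the mean (7) still produces the smaller total squared error.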

When not to use the mean

The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest of the data set by being especially small or large in numerical value. For example, consider the wages of staff at a factory below:

Staff	1	2	3	4	5	6	7	8	9	10
Salary	15k	18k	16k	14k	15k	15k	12k	17k	90k	95k

The mean salary for these 10 staff is $30.7k. However, inspecting the raw data suggests that this mean value might not be the best way to accurately reflect the typical salary of a worker, as most workers have salaries in the $12k to $18k range. The mean is being skewed by the two large salaries. Therefore, in this situation, we would like to have a better measure of central tendency. As we will find out later, taking the median would be a better measure of central tendency in this situation.
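The salary figures can be reproduced with Python's standard `statistics` module. This is a minimal sketch: the list simply copies the ten salaries from the table, in thousands of dollars.

```python
from statistics import mean, median

# Salaries in thousands of dollars, copied from the table above.
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = mean(salaries)      # pulled upward by the two large salaries
median_salary = median(salaries)  # average of the two middle salaries

# Count how many staff earn less than the mean.
below_mean = sum(1 for s in salaries if s < mean_salary)

print(f"mean = {mean_salary}k, median = {median_salary}k")
print(f"{below_mean} of {len(salaries)} staff earn less than the mean")
```

The mean comes out at 30.7 and the median at 15.5. Eight of the ten staff earn less than the mean, which is why it gives a poor picture of the typical salary here.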

Another time when we usually prefer the median over the mean (or mode) is when our data is skewed (i.e., the frequency distribution for our data is skewed). If we consider the normal distribution - as this is the most frequently assessed in statistics - when the data is perfectly normal, the mean, median and mode are identical. Moreover, they all represent the most typical value in the data set. However, as the data becomes skewed the mean loses its ability to provide the best central location for the data because the skewed data is dragging it away from the typical value. The median, by contrast, best retains this position and is not as strongly influenced by the skewed values. This is explained in more detail in the skewed distribution section later in this guide.

Median

The median is the middle score for a set of data that has been arranged in order of magnitude. The median is less affected by outliers and skewed data. In order to calculate the median, suppose we have the data below:

We first need to rearrange that data into order of magnitude (smallest first):

Our median mark is the middle mark - in this case, 56 (highlighted in bold). It is the middle mark because there are 5 scores before it and 5 scores after it. This works fine when you have an odd number of scores, but what happens when you have an even number of scores? What if you had only 10 scores? Well, you simply have to take the middle two scores and average the result. So, if we look at the example below:

We again rearrange that data into order of magnitude (smallest first):

Only now we have to take the 5th and 6th score in our data set and average them to get a median of 55.5.
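The procedure for odd and even numbers of scores can be sketched in a few lines of Python. The function name `median_of` is hypothetical, and the scores are illustrative values chosen to be consistent with the medians quoted above (56 and 55.5); they should not be read as the guide's own data tables.

```python
def median_of(scores):
    """Return the middle value of the sorted scores, or the average of the
    two middle values when the number of scores is even."""
    ordered = sorted(scores)  # arrange in order of magnitude, smallest first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle two

# Illustrative scores (an assumption, see the note above).
odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]  # 11 scores
even_scores = odd_scores[:-1]                               # the first 10 of them

print(median_of(odd_scores))   # 56
print(median_of(even_scores))  # 55.5
```

With 11 scores the 6th sorted score is the median. With 10 scores the 5th and 6th sorted scores (55 and 56) are averaged.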

Mode

The mode is the most frequent score in our data set. On a bar chart or histogram it is represented by the highest bar. You can, therefore, sometimes consider the mode as being the most popular option. An example of a mode is presented below:

Histogram of a continuous distribution showing the mode as the highest bar, in the middle of the distribution

Normally, the mode is used for categorical data where we wish to know which is the most common category, as illustrated below:

Bar chart showing highest bar as the mode

We can see above that the most common form of transport, in this particular data set, is the bus. However, one of the problems with the mode is that it is not unique, so it leaves us with problems when we have two or more values that share the highest frequency, such as below:

Histogram of a continuous distribution showing two modes, both somewhat centrally located

We are now stuck as to which mode best describes the central tendency of the data. This is particularly problematic when we have continuous data because we are more likely not to have any one value that is more frequent than the other. For example, consider measuring 30 people's weight (to the nearest 0.1 kg). How likely is it that we will find two or more people with exactly the same weight (e.g., 67.4 kg)? The answer is probably very unlikely - many people might be close, but with such a small sample (30 people) and a large range of possible weights, you are unlikely to find two people with exactly the same weight; that is, to the nearest 0.1 kg. This is why the mode is very rarely used with continuous data.
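These three situations (a single mode, tied modes, and continuous data with no repeated value) can be illustrated with `statistics.multimode`, available in Python 3.8 and later, which returns every value that shares the highest frequency. All three lists below are made-up examples, not the data behind the charts in this guide.

```python
from statistics import multimode

# Categorical data with one clear winner (illustrative responses).
transport = ["bus", "car", "bus", "train", "walk", "bus", "car"]
transport_modes = multimode(transport)

# Two values tie for the highest frequency, so the mode is not unique.
scores = [1, 2, 2, 3, 4, 4, 5]
score_modes = multimode(scores)

# Weights to the nearest 0.1 kg: no value repeats, so every value
# is returned as a "mode" and none of them is informative.
weights = [67.4, 71.2, 58.9, 80.3, 64.5]
weight_modes = multimode(weights)

print(transport_modes)  # ['bus']
print(score_modes)      # [2, 4]
print(weight_modes)     # [67.4, 71.2, 58.9, 80.3, 64.5]
```

A result with more than one entry is a signal that the mode alone does not pin down a central value.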

Another problem with the mode is that it will not provide us with a very good measure of central tendency when the most common mark is far away from the rest of the data in the data set, as depicted in the diagram below:

Histogram of a continuous distribution showing mode not centrally located

In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is not representative of the data, which is mostly concentrated around the 20 to 30 value range. To use the mode to describe the central tendency of this data set would be misleading.

Skewed Distributions and the Mean and Median

We often test whether our data is normally distributed because this is a common assumption underlying many statistical tests. An example of a normally distributed set of data is presented below:

A histogram showing a normally distributed continuous data set

When you have a normally distributed sample you can legitimately use either the mean or the median as your measure of central tendency. In fact, in any symmetrical distribution with a single peak the mean, median and mode are equal. However, in this situation, the mean is widely preferred as the best measure of central tendency because it is the measure that includes all the values in the data set for its calculation, and any change in any of the scores will affect the value of the mean. This is not the case with the median or mode.

However, when our data is skewed, for example, as with the right-skewed data set below:

Histogram of a skewed distribution showing a noticeable difference between the median and mean values

We find that the mean is being dragged in the direction of the skew. In these situations, the median is generally considered to be the best representative of the central location of the data. The more skewed the distribution, the greater the difference between the median and mean, and the greater emphasis should be placed on using the median as opposed to the mean. A classic example of the above right-skewed distribution is income (salary), where higher-earners provide a false representation of the typical income if expressed as a mean and not a median.
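The data set in the title of this guide (2, 2, 3, 4, 5, 32) is a small right-skewed example, because one large value sits far from the rest. A minimal sketch using Python's `statistics` module shows how far the mean is dragged towards that value. Counting the values below the mean is just one informal way to judge representativeness, not a formal test.

```python
from statistics import mean, median, mode

data = [2, 2, 3, 4, 5, 32]  # data set from the title of this guide

data_mean = mean(data)      # 48 / 6
data_median = median(data)  # average of the 3rd and 4th sorted values
data_mode = mode(data)      # most frequent value

# How many of the six values fall below the mean?
below_mean = sum(1 for x in data if x < data_mean)

print(data_mean, data_median, data_mode)  # 8 3.5 2
print(below_mean)                         # 5
```

The mean (8) is larger than five of the six values, while the median (3.5) and the mode (2) stay within the cluster of small values. On this reading, the mean is the least representative measure for this data set.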

If you expected a normal distribution but tests of normality show that the data is non-normal, it is customary to use the median instead of the mean. However, this is more a rule of thumb than a strict guideline. Sometimes, researchers wish to report the mean of a skewed distribution if the median and mean are not appreciably different (a subjective assessment), and if it allows easier comparisons to previous research to be made.

Summary of when to use the mean, median and mode

Please use the following summary table to know what the best measure of central tendency is with respect to the different types of variable.

Type of Variable	Best measure of central tendency
Nominal	Mode
Ordinal	Median
Interval/Ratio (not skewed)	Mean
Interval/Ratio (skewed)	Median
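The table can also be read as a simple decision rule. The sketch below is one possible encoding, not a standard API: the function name `best_measure`, the string labels and the `skewed` flag are all assumptions made for illustration. For ordinal data it assumes the categories have been coded as ranks, and it uses `statistics.median_low` so that the result is always one of the observed ranks.

```python
from statistics import mean, median, median_low, multimode

def best_measure(values, variable_type, skewed=False):
    """Return (name of measure, value) following the summary table.

    variable_type is one of "nominal", "ordinal" or "interval/ratio";
    skewed is only consulted for interval/ratio data.
    """
    if variable_type == "nominal":
        return "mode", multimode(values)     # may contain several tied modes
    if variable_type == "ordinal":
        return "median", median_low(values)  # stays on an observed rank
    if variable_type == "interval/ratio":
        if skewed:
            return "median", median(values)
        return "mean", mean(values)
    raise ValueError(f"unknown variable type: {variable_type!r}")

print(best_measure(["bus", "car", "bus"], "nominal"))
print(best_measure([1, 2, 2, 3, 5], "ordinal"))
print(best_measure([15, 18, 16, 14, 15, 15, 12, 17, 90, 95],
                   "interval/ratio", skewed=True))
```

Whether a variable counts as skewed is left to the caller, since the guide treats that as a judgment rather than a fixed threshold.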
