# Outlier Calculator

Tukey's fences, Z-score

The outlier calculator identifies the outliers and graphs the data. It includes a scatter plot, boxplot, histogram, and optional step-by-step calculation.

### What is an outlier?

An outlier is an extreme observation that lies far from most of the other observations.
An outlier may be a valid value or an incorrect value.

### Why should you identify outliers?

Outliers may indicate incorrect observations or an incorrectly assumed statistical distribution.
Some statistics, like the average and the standard deviation, are sensitive to outliers, while others, like the median and the mode, are robust to outliers.
Likewise, some statistical tests, like variance tests, are very sensitive to outliers, while others, like non-parametric tests, are robust to outliers.
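A small illustration of this sensitivity, written as a sketch with Python's standard `statistics` module. The two data sets are made up for illustration. Replacing one value with an extreme one shifts the average and the standard deviation a lot, while the median stays the same.

```python
import statistics

# Made-up data: the second set replaces the last value with an extreme one.
clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 140]

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(
        f"{name:>12}: average={statistics.mean(data):.1f}, "
        f"stdev={statistics.stdev(data):.1f}, "  # sample standard deviation
        f"median={statistics.median(data)}"
    )
```

The average jumps from 12 to 37.2, and the standard deviation grows from about 1.6 to about 57.5, while the median remains 12 in both cases.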

### What to do with the outlier?

After using the outlier calculator, you need to decide what to do with the outliers.
You should exclude only invalid outliers.

| Type | Reason | Description | What to do? |
|---|---|---|---|
| Observation error | Measurement error | The measurement tool is faulty or not calibrated, or the measurement process is wrong. | Exclude such outliers. |
| Observation error | Experiment error | For example, the temperature of some subjects was higher, and this resulted in higher values. | Exclude the outlier or repeat the experiment. |
| Observation error | Human error | Any mistake made by a person, like incorrectly reading the measurement tool. | Exclude. |
| Incorrect statistical model | Incorrect distribution | For example, if you assume the normal distribution and use the z-score method to identify outliers, a skewed or heavy-tailed distribution will result in many outliers. | 1. Use the correct distribution to identify the outliers. 2. Use Tukey's fences, which are less sensitive to the distribution. 3. Transform the data to fit the normal distribution. 4. Use a non-parametric test that is not sensitive to outliers. |
| Incorrect statistical model | Mixed population | The data include two or more groups with different characteristics. | 1. Analyze each group separately. 2. Use a model that accounts for the group, for example, add a group predictor to a regression. |
| Valid outliers | Random | Any method for identifying outliers has a small probability of flagging valid data as outliers, so a large sample will contain some valid outliers. For example, when using the z-score with two standard deviations on normally distributed data, around 4.55% of the valid observations will be flagged as outliers. | Include. |

### Outliers calculation methods

There are many methods to identify outliers; this outlier calculator uses the following methods.

#### Tukey's fences

Q1: lower quartile (25th percentile).
Q3: upper quartile (75th percentile).
Interquartile range: IQR = Q3 - Q1.
k: a multiplier, usually k = 1.5 for regular outliers and k = 3 for extreme outliers. Some people recommend k = 2.2.
Any observation below the lower fence or above the upper fence is an outlier.

##### Lower fence formula

Lower fence = Q1 - k * IQR.

##### Upper fence formula

Upper fence = Q3 + k * IQR.
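The fence formulas can be sketched in Python as follows. This is a minimal sketch, not the calculator's own implementation. It assumes quartiles computed with `statistics.quantiles(..., method="inclusive")`; other quartile definitions exist and may give slightly different fences. The function names `tukey_fences` and `tukey_outliers` are hypothetical.

```python
import statistics

def tukey_fences(data, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR).

    Assumption: quartiles use the "inclusive" method of statistics.quantiles.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def tukey_outliers(data, k=1.5):
    """Return the observations below the lower fence or above the upper fence."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]

# Made-up example data.
sample = [2, 3, 3, 4, 5, 5, 6, 7, 8, 30]
print(tukey_fences(sample))         # (-2.0, 12.0)
print(tukey_outliers(sample))       # [30]
print(tukey_outliers(sample, k=3))  # [30], also an extreme outlier
```

Here Q1 = 3.25, Q3 = 6.75, and IQR = 3.5. With k = 3 the fences widen to -7.25 and 17.25, and 30 is still outside them.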

#### Z-score

The z-score of an observation is its distance from the average, measured in standard deviations: z = (x - Average) / S, where S is the sample standard deviation. An observation is an outlier when its z-score is below -k or above k.
The data should be symmetrical, and if the data are normally distributed you can estimate the expected number of valid outliers.
Usually, we use k = 3, allowing three standard deviations from the average. In this case, if the data are normally distributed with no invalid outliers, on average 0.27% of the data will be flagged as outliers: P(z < -3) + P(z > 3) = 0.0027, where z follows the standard normal distribution, N(0,1).
Some people use k = 2, allowing two standard deviations from the average. In this case, if the data are normally distributed with no invalid outliers, on average 4.55% of the data will be flagged as outliers.
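The two percentages above come from the tails of the standard normal distribution. A short check with Python's standard `statistics.NormalDist` (the helper name `two_tailed_fraction` is hypothetical):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # N(0,1)

def two_tailed_fraction(k):
    """P(z < -k) + P(z > k) for a standard normal z."""
    return std_normal.cdf(-k) + (1 - std_normal.cdf(k))

for k in (2, 3):
    print(f"k = {k}: {two_tailed_fraction(k):.2%} of valid observations flagged")
```

This prints 4.55% for k = 2 and 0.27% for k = 3. For example, with 1,000 valid, normally distributed observations and k = 2, about 1000 × 0.0455 ≈ 45 observations would be flagged even though none is invalid.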

##### Lower limit

Lower = Average - k * S.

##### Upper limit

Upper = Average + k * S.
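The limits can be sketched in Python as follows. This sketch assumes that S is the sample standard deviation (`statistics.stdev`, with n - 1 in the denominator); the calculator may use a different convention. The function names are hypothetical, and the sample is made up.

```python
import statistics

def zscore_limits(data, k=3):
    """Return (lower, upper) = (Average - k*S, Average + k*S)."""
    average = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    return average - k * s, average + k * s

def zscore_outliers(data, k=3):
    """Return the observations outside the z-score limits."""
    lower, upper = zscore_limits(data, k)
    return [x for x in data if x < lower or x > upper]

# Made-up example: 19 values between 9 and 11, plus one extreme value.
sample = [9, 10, 11] * 6 + [10, 30]
print(zscore_limits(sample))    # approximately (-2.63, 24.63)
print(zscore_outliers(sample))  # [30]
```

Because the outlier itself inflates both the average and S, this method can miss outliers in small samples. With the sample standard deviation, no observation can have a z-score larger in absolute value than (n - 1)/√n, which is below 3 for n ≤ 10. For example, `zscore_outliers([1, 2, 3, 4, 100])` returns an empty list.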

#### Outliers - Visual Identification

You may also identify outliers visually by graphing the sample data with the following tools:
Boxplot maker, Advanced boxplot maker, Histogram maker

#### Multivariate outliers

Univariate outliers: outliers of observations that contain only one dimension.
Multivariate outliers: outliers of multi-dimensional observations.
The outlier calculator identifies only univariate outliers.
For multivariate outliers, you may use the following calculators:
1. Multiple linear regression: you may find the outliers in the 'Residual' column.
2. Cluster analysis: using the silhouette method.