Identifying and dealing with outliers can be tough, but it is an essential part of the data analytics process and of feature engineering for machine learning. So how do we find outliers? Luckily, there are several methods for identifying them that take only a few lines of Python. Before diving into those methods, let’s first review the definition of an outlier and load a dataset. By the end of the article, you will not only have a better understanding of how to find outliers, but also know how to work with them when preparing your data for machine learning.
We’ll cover all of this under the following headings:
- What is an outlier?
- How do you find outliers in your dataset?
- Finding outliers using statistical methods
- Working with outliers using statistical methods
- Wrapping up and next steps