Find Outliers with AutoEncoder - Full Tutorial (Hands-on and Theory)

Data Science Garage
1.2K views - 2 years ago
This video shows how to find outliers in data using AutoEncoders. The approach relies on reconstruction errors computed by an AutoEncoder, which can be trained very quickly in a Jupyter Notebook with Python. In this video we set up a very simple AutoEncoder consisting of an Encoder and a Decoder.

For this example I used a circle-shaped data distribution, but the approach works for any shape of data distribution. The main idea is that the Encoder compresses the input data into fewer dimensions, and the Decoder then attempts to reconstruct the original data distribution from the compressed representation.
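The encoder/decoder idea can be sketched as follows, assuming TensorFlow 2.x with Keras. The layer sizes, activations, training settings, and the synthetic circle data are illustrative assumptions, not the values used in the video's notebook.

```python
import numpy as np
import tensorflow as tf

# Synthetic circle-shaped data (an assumption; the video loads its own sample file).
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=500)
points = np.column_stack([np.cos(angles), np.sin(angles)])
points += rng.normal(scale=0.05, size=points.shape)
points = points.astype("float32")

# Encoder: compresses each 2-D point into a 1-D code.
encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Decoder: tries to rebuild the original 2-D point from the 1-D code.
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2),
])

autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

# The input is also the target: the network learns to reproduce its input.
autoencoder.fit(points, points, epochs=5, batch_size=32, verbose=0)

codes = encoder.predict(points, verbose=0)              # compressed representation
reconstructed = autoencoder.predict(points, verbose=0)  # reconstruction attempt
```

Five epochs only keep the sketch fast; a real run would likely train longer and inspect the reconstructions visually.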

Data points that are not outliers have a low reconstruction error, while data points that lie far from the dominant data distribution (the circle shape in this example) have a high reconstruction error and are identified as anomalies or outliers.

Link to the GitHub repo with Jupyter Notebook and sample data: https://github.com/vb100/autoencoder_...

To set up the autoencoder with its encoder and decoder, I used TensorFlow 2.10 in this video tutorial.
During the video I often visualize the data to show what is going on behind the scenes and to open up the black box of the main idea for you.

Before feeding the initial input into the AutoEncoder, we must convert our numerical data into a TensorFlow Tensor. A Tensor is the format that Artificial Neural Networks (ANNs) read in many frameworks (TensorFlow, PyTorch, etc.). With this done, we are ready to train our AutoEncoder. I also suggest calling the numpy method on the tensor to make sure that its data type is float (floating-point number).
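As a small illustration of this conversion step, here is a sketch assuming TensorFlow 2.x. The `data` array is a hypothetical stand-in for the dataset loaded in the video.

```python
import numpy as np
import tensorflow as tf

# Hypothetical 2-D points standing in for the loaded dataset.
data = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])

# Convert the NumPy array to a TensorFlow tensor with an explicit float dtype.
tensor = tf.convert_to_tensor(data, dtype=tf.float32)

# .numpy() returns the values as a NumPy array, which makes the dtype easy to check.
print(tensor.numpy().dtype)  # float32
```

Passing `dtype` explicitly avoids surprises when the source data holds integers or 64-bit floats.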

To calculate the reconstruction errors for our data points we use the MSE (Mean Squared Error) loss function. This is enough to find outliers (anomalies) in our dataset.
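A per-point reconstruction error can be computed with plain NumPy, as in the sketch below. The `original` and `reconstructed` arrays are made-up values for illustration; in practice the second array would come from the trained AutoEncoder.

```python
import numpy as np

# Hypothetical original points and their reconstructions.
original = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
reconstructed = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, 1.0]])

# MSE per data point: average the squared differences over the feature axis.
errors = np.mean((original - reconstructed) ** 2, axis=1)

# The point with the largest error is the strongest outlier candidate.
worst = int(np.argmax(errors))
print(errors, worst)
```

Averaging over `axis=1` gives one error value per data point rather than a single loss for the whole dataset, which is what makes ranking individual points possible.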

The content of the tutorial:
0:00 - Intro
2:23 - Hands-on with Python: (1) Load dependencies
3:35 - (2): Load data with Pandas
7:21 - (3): Set up AutoEncoder with TensorFlow
12:31 - (4): Get reconstruction errors
15:57 - (5): Construct a Pandas DataFrame with the results
17:33 - Automating detecting outliers (theory)
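
Steps (5) and the automated detection can be sketched as below. The description does not say which rule the video uses, so the 95th-percentile cutoff, the column names, and the sample values are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical points and reconstruction errors; the last point lies far from the rest.
results = pd.DataFrame({
    "x": [1.0, 0.0, -1.0, 0.0, 3.0],
    "y": [0.0, 1.0, 0.0, -1.0, 3.0],
    "reconstruction_error": [0.01, 0.02, 0.015, 0.012, 4.0],
})

# Assumed rule: flag points whose error exceeds the 95th percentile of all errors.
threshold = results["reconstruction_error"].quantile(0.95)
results["is_outlier"] = results["reconstruction_error"] > threshold

print(results[results["is_outlier"]])
```

A percentile cutoff always flags a fixed share of the points, so the quantile should reflect how many outliers are plausible in the data.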

Data outliers are still a hot topic in Data Science and Data Analytics (even in Business Analytics) and have a significant impact on business outcome calculations, Machine Learning model performance, and business-related conclusions and insights. This is the reason I decided to create this tutorial.

This approach reminds me of the PCA method, which separates signal from noise. You can learn more about that in this tutorial: PCA in Machine Learning. Why PCA is i...

#outliers #anomalydetection #findoutliers #statistics #tensorflow #pandas #decoder #encoder #MSE #meansquarederror #machinelearning #python #numpy #pytorch #automating #tutorial #deeplearning #businessanalytics #datascience

Happy learning!
Yours - @DataScienceGarage
Subscribe to the channel to get more useful videos. See you there!
Published 2 years ago, on 1401/08/14 (Solar Hijri calendar).
Viewed 1,225 times