148 - 7 techniques to work with imbalanced data for machine learning in Python

DigitalSreeni
14.2K views - 4 years ago
Imbalanced data is part of life! With proper knowledge of the data set and a few techniques from this video, imbalanced data can be managed easily.

Prerequisites: Pick the right metrics, because overall accuracy says nothing about the accuracy of individual classes. Look at the confusion matrix and ROC AUC.

Technique 0: Collect more data, if possible.
Technique 1: Pick decision-tree-based approaches, as they often work better on imbalanced data than logistic regression or SVM. Random Forest is a good algorithm to try, but beware of overfitting.
Technique 2: Up-sample the minority class.
Technique 3: Down-sample the majority class.
Technique 4: Use a combination of over- and under-sampling.
Technique 5: Use penalized learning algorithms that increase the cost of classification mistakes on minority classes.
Technique 6: Generate synthetic data (SMOTE, ADASYN).
Technique 7: Add appropriate class weights to your deep learning model.

References:
imbalanced-learn.org/stable/over_sampling.html?hig…
scikit-learn.org/stable/modules/generated/sklearn.…

Code generated in the video can be downloaded from here: github.com/bnsreenu/python_for_microscopists
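As a rough illustration of the prerequisites and of Techniques 1, 2, 3 and 5, here is a minimal sketch using only scikit-learn and NumPy. It is not the code from the video: the synthetic data set, the 95/5 class split, the `evaluate` helper and all parameter values are assumptions made for this example. Resampling is applied to the training split only, so the test set keeps its original class balance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Hypothetical data: roughly 95% class 0 (majority) and 5% class 1 (minority).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)


def evaluate(model, X_tr, y_tr):
    """Fit on the given training data; return confusion matrix and ROC AUC on the test set."""
    model.fit(X_tr, y_tr)
    cm = confusion_matrix(y_test, model.predict(X_test))
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return cm, auc


# Technique 1 (baseline): a tree-based model on the untouched training data.
baseline_cm, baseline_auc = evaluate(
    RandomForestClassifier(n_estimators=100, random_state=42), X_train, y_train)

X_maj, X_min = X_train[y_train == 0], X_train[y_train == 1]

# Technique 2: up-sample the minority class (with replacement) to the majority size.
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=42)
X_up = np.vstack([X_maj, X_min_up])
y_up = np.concatenate([np.zeros(len(X_maj), dtype=int),
                       np.ones(len(X_min_up), dtype=int)])
up_cm, up_auc = evaluate(
    RandomForestClassifier(n_estimators=100, random_state=42), X_up, y_up)

# Technique 3: down-sample the majority class (without replacement) to the minority size.
X_maj_down = resample(X_maj, replace=False, n_samples=len(X_min), random_state=42)
X_down = np.vstack([X_maj_down, X_min])
y_down = np.concatenate([np.zeros(len(X_maj_down), dtype=int),
                         np.ones(len(X_min), dtype=int)])
down_cm, down_auc = evaluate(
    RandomForestClassifier(n_estimators=100, random_state=42), X_down, y_down)

# Technique 5: penalize minority-class mistakes through class weights.
weighted_cm, weighted_auc = evaluate(
    RandomForestClassifier(n_estimators=100, class_weight="balanced",
                           random_state=42), X_train, y_train)

for name, cm, auc in [("baseline", baseline_cm, baseline_auc),
                      ("up-sampled", up_cm, up_auc),
                      ("down-sampled", down_cm, down_auc),
                      ("class-weighted", weighted_cm, weighted_auc)]:
    print(name, "ROC AUC:", round(auc, 3))
    print(cm)
```

Comparing the per-class rows of each confusion matrix, rather than overall accuracy, shows how each approach trades majority-class errors against minority-class errors. For Technique 6, the imbalanced-learn package provides `SMOTE` and `ADASYN` in `imblearn.over_sampling`; they are left out here to keep the sketch dependent on scikit-learn alone.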
Published 4 years ago, on 1399/05/21 (Solar Hijri calendar).
14,267 views