Imbalanced classification: credit card fraud detection
Author: fchollet
Date created: 2019/05/28
Last modified: 2020/04/17
Description: Demonstration of how to handle highly imbalanced classification problems.
Introduction
This example looks at the Kaggle Credit Card Fraud Detection dataset to demonstrate how to train a classification model on data with highly imbalanced classes.
First, vectorize the CSV data
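A minimal sketch of this step, not necessarily the notebook's original cell. It assumes the Kaggle file is saved locally as creditcard.csv (a hypothetical path), that the first row is a header, and that the last column holds the 0/1 class label. The helper name vectorize_csv is illustrative.

```python
import csv

import numpy as np


def vectorize_csv(lines):
    """Parse CSV text lines into a float32 feature matrix and a uint8 target column.

    Assumes the first row is a header and the last column is the 0/1 class label.
    """
    reader = csv.reader(lines)  # csv.reader also strips the quotes around fields
    header = next(reader)  # skip the header row
    features_list, targets_list = [], []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        features_list.append([float(v) for v in fields[:-1]])
        targets_list.append([int(fields[-1])])
    features = np.array(features_list, dtype="float32")
    targets = np.array(targets_list, dtype="uint8")
    return header, features, targets


# Usage with the real file (the path is an assumption):
# with open("creditcard.csv", newline="") as f:
#     header, features, targets = vectorize_csv(f)
#     print("features.shape:", features.shape)
#     print("targets.shape:", targets.shape)
```

Keeping the targets as a column of shape (n, 1) matches the single sigmoid output a binary classifier typically uses.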
Prepare a validation set
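A minimal sketch of a hold-out split, assuming the last 20% of the rows are set aside for validation. The fraction is an assumption; it is consistent with the 56,961 validation transactions reported in the conclusions if the dataset has 284,807 rows. The arrays below are a tiny synthetic stand-in for the real data.

```python
import numpy as np


def split_train_val(features, targets, val_fraction=0.2):
    """Hold out the last `val_fraction` of the rows as a validation set."""
    num_val = int(len(features) * val_fraction)
    if num_val == 0:
        raise ValueError("val_fraction is too small for this dataset")
    train = (features[:-num_val], targets[:-num_val])
    val = (features[-num_val:], targets[-num_val:])
    return train, val


# Tiny synthetic stand-in for the real arrays.
features = np.arange(20, dtype="float32").reshape(10, 2)
targets = np.array([[0]] * 9 + [[1]], dtype="uint8")

(train_x, train_y), (val_x, val_y) = split_train_val(features, targets)
print("Number of training samples:", len(train_x))
print("Number of validation samples:", len(val_x))
```

With so few positive samples, a purely positional split can leave one side with very few frauds, so it is worth checking the class counts of both parts after splitting.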
Analyze class imbalance in the targets
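A minimal sketch of the imbalance check, assuming the targets are a column of 0/1 integers. The labels below are synthetic and only illustrate the computation; they are not the dataset's real counts.

```python
import numpy as np


def describe_imbalance(targets):
    """Return per-class counts and the share of positive (class 1) samples."""
    counts = np.bincount(targets[:, 0], minlength=2)
    positive_share = counts[1] / counts.sum()
    return counts, positive_share


# Synthetic labels: 998 legitimate transactions and 2 fraudulent ones.
train_targets = np.array([[0]] * 998 + [[1]] * 2, dtype="uint8")

counts, positive_share = describe_imbalance(train_targets)
print(
    "Number of positive samples in training data: "
    f"{counts[1]} ({100 * positive_share:.2f}% of total)"
)
```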
Normalize the data using training set statistics
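A minimal sketch of the normalization step. The key point is that the mean and standard deviation come from the training set only and are then reused for the validation set, so no validation statistics leak into training. The function names and the eps guard for constant columns are assumptions.

```python
import numpy as np


def fit_normalizer(train_features, eps=1e-7):
    """Compute per-feature mean and standard deviation on the training set."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    std = np.where(std < eps, 1.0, std)  # leave (near-)constant columns unscaled
    return mean, std


def normalize(features, mean, std):
    """Apply statistics that were computed on the training set."""
    return (features - mean) / std


# Tiny synthetic stand-in for the real arrays.
train_features = np.array([[1.0, 10.0], [3.0, 30.0]])
val_features = np.array([[5.0, 50.0]])

mean, std = fit_normalizer(train_features)
train_norm = normalize(train_features, mean, std)
val_norm = normalize(val_features, mean, std)  # same statistics, no refit
```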
Build a binary classification model
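A minimal sketch of a binary classifier in Keras. The layer sizes, dropout rate, and learning rate are illustrative assumptions, not values confirmed by this copy of the notebook. The metrics are chosen so that false negatives and false positives can be read off directly after each epoch.

```python
def build_model(num_features):
    """Build and compile a small fully connected binary classifier."""
    # Deferred import: TensorFlow is only needed when the model is actually built.
    from tensorflow import keras

    model = keras.Sequential(
        [
            keras.Input(shape=(num_features,)),
            keras.layers.Dense(256, activation="relu"),
            keras.layers.Dense(256, activation="relu"),
            keras.layers.Dropout(0.3),
            # A single sigmoid unit outputs the predicted probability of class 1.
            keras.layers.Dense(1, activation="sigmoid"),
        ]
    )
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-2),
        loss="binary_crossentropy",
        metrics=[
            keras.metrics.FalseNegatives(name="fn"),
            keras.metrics.FalsePositives(name="fp"),
            keras.metrics.TrueNegatives(name="tn"),
            keras.metrics.TruePositives(name="tp"),
            keras.metrics.Precision(name="precision"),
            keras.metrics.Recall(name="recall"),
        ],
    )
    return model


# Usage (requires TensorFlow):
# model = build_model(train_features.shape[-1])
# model.summary()
```

Plain accuracy is deliberately left out: with well under 1% positives, a model that never predicts fraud would still score above 99%.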
Train the model with the class_weight argument
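A minimal sketch of class-weighted training, assuming inverse-frequency weights (each class weighted by one over its sample count) and an already compiled Keras model. The helper names, epoch count, and batch size are assumptions. The labels below are synthetic.

```python
import numpy as np


def inverse_frequency_weights(targets):
    """Weight each class by 1 / (number of samples in that class)."""
    counts = np.bincount(targets[:, 0], minlength=2)
    if counts.min() == 0:
        raise ValueError("both classes must be present to compute weights")
    return {0: 1.0 / float(counts[0]), 1: 1.0 / float(counts[1])}


def train_with_class_weight(model, train, val, class_weight, epochs=30, batch_size=2048):
    """Fit a compiled Keras model; each sample's loss is scaled by its class weight."""
    train_x, train_y = train
    return model.fit(
        train_x,
        train_y,
        batch_size=batch_size,
        epochs=epochs,
        validation_data=val,  # tuple of (val_features, val_targets)
        class_weight=class_weight,  # dict mapping class index to weight
        verbose=2,
    )


# Synthetic labels: 998 legitimate transactions and 2 fraudulent ones.
train_targets = np.array([[0]] * 998 + [[1]] * 2, dtype="uint8")
class_weight = inverse_frequency_weights(train_targets)
print(class_weight)
```

With these weights both classes contribute the same total weight to the loss, so the rare fraud class is no longer drowned out by the majority class.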
Conclusions
At the end of training, out of 56,961 validation transactions, we are:
Correctly identifying 66 of them as fraudulent
Missing 9 fraudulent transactions
At the cost of incorrectly flagging 441 legitimate transactions
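The counts above can be turned into the usual summary metrics. The short sketch below uses only the four numbers reported here; the function name is illustrative. Recall comes out at 66/75 = 88%, while precision is 66/507, or roughly 13%.

```python
def confusion_summary(tp, fn, fp, total):
    """Derive recall, precision, and false positive rate from raw counts."""
    tn = total - tp - fn - fp  # everything else was correctly left unflagged
    return {
        "tn": tn,
        "recall": tp / (tp + fn),  # share of frauds that were caught
        "precision": tp / (tp + fp),  # share of flagged transactions that were fraud
        "false_positive_rate": fp / (fp + tn),  # share of legitimate ones flagged
    }


summary = confusion_summary(tp=66, fn=9, fp=441, total=56961)
for name, value in summary.items():
    print(name, value)
```

The low precision next to the high recall is the trade-off the class weighting buys: most fraud is caught, at the price of flagging several legitimate transactions for every real fraud.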
In the real world, one would put an even higher weight on class 1, to reflect that false negatives (missed fraud) are more costly than false positives (legitimate transactions flagged).
Next time your credit card gets declined in an online purchase -- this is why.