How To Draw A Balance Scale

Imbalanced classes put "accuracy" out of business concern. This is a surprisingly common problem in machine learning (specifically in classification), occurring in datasets with a asymmetric ratio of observations in each grade.

Standard accuracy no longer reliably measures operation, which makes model training much trickier.

Imbalanced classes appear in many domains, including:

Fraud detection
Spam filtering
Illness screening
SaaS subscription churn
Advertizement click-throughs

In this guide, we'll explore 5 effective ways to handle imbalanced classes.

Intuition: Illness Screening Case

Permit'due south say your client is a leading enquiry hospital, and they've asked you to train a model for detecting a affliction based on biological inputs nerveless from patients.

Merely hither'southward the grab… the disease is relatively rare; it occurs in just viii% of patients who are screened.

Now, before you fifty-fifty get-go, do you run across how the problem might break? Imagine if you didn't bother training a model at all. Instead, what if you but wrote a single line of lawmaking that always predicts 'No Affliction?'

def disease_screen ( patient_data ) :

# Ignore patient_data

render 'No Affliction.'

Well, guess what? Your "solution" would have 92% accurateness!

Unfortunately, that accuracy is misleading.

For patients who practice not have the disease, yous'd have 100% accurateness.
For patients whopractice have the disease, y'all'd have 0% accuracy.
Your overall accuracy would be loftier simply because near patients do not have the affliction (not because your model is any good).

This is clearly a problem because many auto learning algorithms are designed to maximize overall accuracy. The rest of this guide volition illustrate different tactics for treatment imbalanced classes.

Of import notes before we brainstorm:

First, please note that we're not going to dissever out a dissever examination prepare, tune hyperparameters, or implement cross-validation. In other words, we're non necessarily going to follow all-time practices.

Instead, this tutorial is focused purely on addressing imbalanced classes.

In addition, not every technique below will work for every problem. However, 9 times out of 10, at to the lowest degree one of these techniques should do the fob.

Residue Scale Dataset

For this guide, we'll use a synthetic dataset called Balance Scale Data, which you can download from the UCI Machine Learning Repository here.

This dataset was originally generated to model psychological experiment results, merely it'southward useful for us because it's a manageable size and has imbalanced classes.

import pandas as pd

import numpy equally np

# Read dataset

df = pd . read_csv ( 'rest-scale.data' ,

names=[ 'residue' , 'var1' , 'var2' , 'var3' , 'var4' ] )

# Display instance observations

df . head ( )

Balance Scale Dataset

The dataset contains data about whether a scale is counterbalanced or not, based on weights and distances of the two artillery.

It has ane target variable, which nosotros've labeled residuum .
It has 4 input features, which we've labeled var1 through var4 .

Image Scale Data

The target variable has three classes.

R for right-heavy, i.eastward. when var3 * var4 > var1 * var2
L for left-heavy, i.eastward. when var3 * var4 < var1 * var2
B for balanced, i.e. when var3 * var4 = var1 * var2

df [ 'rest' ] . value_counts ( )

# R 288

# L 288

# B 49

# Proper name: rest, dtype: int64

However, for this tutorial, we're going to plow this into a binary nomenclature problem.

Nosotros're going to label each observation as1 (positive grade) if the scale is balanced or0 (negative course) if the calibration is not balanced:

# Transform into binary nomenclature

df [ 'balance' ] = [ 1 if b=='B' else 0 for b in df . residuum ]

df [ 'balance' ] . value_counts ( )

# 0 576

# i 49

# Proper name: residual, dtype: int64

# Nearly viii% were balanced

Every bit you can run into, only nigh 8% of the observations were counterbalanced. Therefore, if we were to e'er predict0, we'd accomplish an accuracy of 92%.

The Danger of Imbalanced Classes

Now that we have a dataset, nosotros can really show the dangers of imbalanced classes.

Kickoff, let'due south import the Logistic Regression algorithm and the accurateness metric from Scikit-Learn.

from sklearn . linear_model import LogisticRegression

from sklearn . metrics import accuracy_score

Next, we'll fit a very simple model using default settings for everything.

# Divide input features (10) and target variable (y)

y = df . balance

X = df . drop ( 'balance' , axis=ane )

# Train model

clf_0 = LogisticRegression ( ) . fit ( 10 , y )

# Predict on training set

pred_y_0 = clf_0 . predict ( X )

As mentioned above, many machine learning algorithms are designed to maximize overall accuracy by default.

We can confirm this:

# How's the accuracy?

impress ( accuracy_score ( pred_y_0 , y ) )

# 0.9216

So our model has 92% overall accuracy, but is it because it's predicting only 1 class?

# Should nosotros be excited?

print ( np . unique ( pred_y_0 ) )

# [0]

As you tin see, this model is just predicting0, which ways it'south completely ignoring the minority class in favor of the majority class.

Adjacent, nosotros'll look at the showtime technique for handling imbalanced classes: upwards-sampling the minority grade.

ane. Up-sample Minority Class

Upwards-sampling is the process of randomly duplicating observations from the minority course in order to reinforce its betoken.

In that location are several heuristics for doing and so, merely the near common way is to only resample with replacement.

First, we'll import the resampling module from Scikit-Learn:

from sklearn . utils import resample

Side by side, we'll create a new DataFrame with an upwards-sampled minority class. Here are the steps:

Get-go, we'll separate observations from each class into unlike DataFrames.
Adjacent, we'll resample the minority class with replacement, setting the number of samples to match that of the bulk grade.
Finally, we'll combine the up-sampled minority course DataFrame with the original majority class DataFrame.

Here's the code:

eight

fifteen

xvi

# Split majority and minority classes

df_majority = df [ df . balance==0 ]

df_minority = df [ df . residual==i ]

# Upsample minority course

df_minority_upsampled = resample ( df_minority ,

replace=True , # sample with replacement

n_samples=576 , # to match majority class

random_state=123 ) # reproducible results

# Combine majority class with upsampled minority class

df_upsampled = pd . concat ( [ df_majority , df_minority_upsampled ] )

# Brandish new class counts

df_upsampled . balance . value_counts ( )

# 1 576

# 0 576

# Name: balance, dtype: int64

Every bit you can see, the new DataFrame has more observations than the original, and the ratio of the 2 classes is now 1:1.

Let's railroad train some other model using Logistic Regression, this fourth dimension on the balanced dataset:

one

two

half-dozen

ten

eleven

# Separate input features (X) and target variable (y)

y = df_upsampled . balance

X = df_upsampled . drib ( 'balance' , axis=1 )

# Railroad train model

clf_1 = LogisticRegression ( ) . fit ( 10 , y )

# Predict on preparation set

pred_y_1 = clf_1 . predict ( Ten )

# Is our model still predicting just one grade?

print ( np . unique ( pred_y_1 ) )

# [0 1]

# How'south our accuracy?

impress ( accuracy_score ( y , pred_y_1 ) )

# 0.513888888889

Bully, now the model is no longer predicting just one class. While the accuracy also took a nosedive, information technology'due south now more meaningful every bit a performance metric.

2. Downwards-sample Bulk Class

Down-sampling involves randomly removing observations from the bulk course to prevent its signal from dominating the learning algorithm.

The almost mutual heuristic for doing so is resampling without replacement.

The process is like to that of upward-sampling. Hither are the steps:

First, we'll carve up observations from each course into different DataFrames.
Adjacent, nosotros'll resample the majority class without replacement, setting the number of samples to friction match that of the minority class.
Finally, we'll combine the down-sampled bulk class DataFrame with the original minority class DataFrame.

Here's the lawmaking:

three

five

eight

nine

ten

eleven

xiv

xvi

# Separate majority and minority classes

df_majority = df [ df . balance==0 ]

df_minority = df [ df . balance==ane ]

# Downsample majority grade

df_majority_downsampled = resample ( df_majority ,

replace=False , # sample without replacement

n_samples=49 , # to match minority course

random_state=123 ) # reproducible results

# Combine minority class with downsampled majority class

df_downsampled = pd . concat ( [ df_majority_downsampled , df_minority ] )

# Brandish new grade counts

df_downsampled . residue . value_counts ( )

# ane 49

# 0 49

# Name: balance, dtype: int64

This time, the new DataFrame has fewer observations than the original, and the ratio of the two classes is at present one:1.

Over again, let'due south railroad train a model using Logistic Regression:

two

five

seven

ten

xiv

# Separate input features (X) and target variable (y)

y = df_downsampled . balance

10 = df_downsampled . drop ( 'residue' , axis=1 )

# Train model

clf_2 = LogisticRegression ( ) . fit ( X , y )

# Predict on training fix

pred_y_2 = clf_2 . predict ( 10 )

# Is our model still predicting only 1 class?

impress ( np . unique ( pred_y_2 ) )

# [0 one]

# How's our accuracy?

print ( accuracy_score ( y , pred_y_2 ) )

# 0.581632653061

The model isn't predicting just 1 grade, and the accuracy seems college.

We'd all the same want to validate the model on an unseen test dataset, but the results are more than encouraging.

3. Alter Your Performance Metric

And then far, we've looked at two ways of addressing imbalanced classes past resampling the dataset. Next, nosotros'll expect at using other operation metrics for evaluating the models.

Albert Einstein in one case said, "if you approximate a fish on its ability to climb a tree, it will alive its whole life believing that it is stupid." This quote really highlights the importance of choosing the right evaluation metric.

For a full general-purpose metric for classification, nosotros recommendArea Under ROC Curve (AUROC).

We won't dive into its details in this guide, but you tin can read more about information technology here.
Intuitively, AUROC represents the likelihood of your model distinguishing observations from two classes.
In other words, if you randomly select one observation from each class, what'south the probability that your model will exist able to "rank" them correctly?

We can import this metric from Scikit-Acquire:

from sklearn . metrics import roc_auc_score

To calculate AUROC, you lot'll demand predicted class probabilities instead of just the predicted classes. You can get them using the . predict_proba ( ) function like so:

# Predict class probabilities

prob_y_2 = clf_2 . predict_proba ( Ten )

# Keep just the positive class

prob_y_2 = [ p [ 1 ] for p in prob_y_2 ]

prob_y_2 [ : 5 ] # Example

# [0.45419197226479618,

# 0.48205962213283882,

# 0.46862327066392456,

# 0.47868378832689096,

# 0.58143856820159667]

And then how did this model (trained on the downwards-sampled dataset) practise in terms of AUROC?

impress ( roc_auc_score ( y , prob_y_2 ) )

# 0.568096626406

Ok... and how does this compare to the original model trained on the imbalanced dataset?

prob_y_0 = clf_0 . predict_proba ( Ten )

prob_y_0 = [ p [ i ] for p in prob_y_0 ]

print ( roc_auc_score ( y , prob_y_0 ) )

# 0.530718537415

Call up, our original model trained on the imbalanced dataset had an accuracy of 92%, which is much college than the 58% accuracy of the model trained on the down-sampled dataset.

However, the latter model has an AUROC of 57%, which is higher than the 53% of the original model (but not by much).

Note: if you got an AUROC of 0.47, it just means you demand to capsize the predictions considering Scikit-Larn is misinterpreting the positive class. AUROC should be >= 0.five.

4. Penalize Algorithms (Cost-Sensitive Training)

The next tactic is to employ penalized learning algorithms that increase the cost of classification mistakes on the minority course.

A popular algorithm for this technique is Penalized-SVM:

from sklearn . svm import SVC

During training, nosotros can use the statement class_weight='counterbalanced' to penalize mistakes on the minority grade by an amount proportional to how under-represented information technology is.

We also want to include the argument probability=True if nosotros desire to enable probability estimates for SVM algorithms.

Let'due south train a model using Penalized-SVM on the original imbalanced dataset:

iii

seven

viii

# Separate input features (Ten) and target variable (y)

y = df . residue

X = df . drop ( 'balance' , axis=1 )

# Railroad train model

clf_3 = SVC ( kernel='linear' ,

class_weight='counterbalanced' , # penalize

probability=True )

clf_3 . fit ( X , y )

# Predict on training set

pred_y_3 = clf_3 . predict ( Ten )

# Is our model still predicting simply one form?

print ( np . unique ( pred_y_3 ) )

# [0 ane]

# How's our accuracy?

print ( accuracy_score ( y , pred_y_3 ) )

# 0.688

# What about AUROC?

prob_y_3 = clf_3 . predict_proba ( X )

prob_y_3 = [ p [ 1 ] for p in prob_y_3 ]

impress ( roc_auc_score ( y , prob_y_3 ) )

# 0.5305236678

Once again, our purpose here is only to illustrate this technique. To really make up one's mind which of these tactics works best for this problem, you'd want to evaluate the models on a concur-out exam set.

five. Apply Tree-Based Algorithms

The final tactic we'll consider is using tree-based algorithms. Decision copse often perform well on imbalanced datasets because their hierarchical structure allows them to learn signals from both classes.

In modern applied machine learning, tree ensembles (Random Forests, Slope Boosted Copse, etc.) almost always outperform singular decision copse, and then nosotros'll jump right into those:

from sklearn . ensemble import RandomForestClassifier

At present, permit's train a model using a Random Forest on the original imbalanced dataset.

ane

half-dozen

seven

nine

eleven

xiv

xviii

twenty

# Carve up input features (X) and target variable (y)

y = df . rest

X = df . drop ( 'balance' , axis=1 )

# Train model

clf_4 = RandomForestClassifier ( )

clf_4 . fit ( X , y )

# Predict on preparation set

pred_y_4 = clf_4 . predict ( X )

# Is our model still predicting merely one class?

print ( np . unique ( pred_y_4 ) )

# [0 1]

# How's our accuracy?

print ( accuracy_score ( y , pred_y_4 ) )

# 0.9744

# What about AUROC?

prob_y_4 = clf_4 . predict_proba ( X )

prob_y_4 = [ p [ 1 ] for p in prob_y_4 ]

print ( roc_auc_score ( y , prob_y_4 ) )

# 0.999078798186

Wow! 97% accuracy and about 100% AUROC? Is this magic? A sleight of paw? Cheating? Likewise good to be truthful?

Well, tree ensembles take become very popular considering they perform extremely well on many real-world issues. We certainly recommend them wholeheartedly.

However:

While these results are encouraging, the modelcould be overfit, so you should nonetheless evaluate your model on an unseen test set earlier making the final decision.

Annotation: your numbers may differ slightly due to the randomness in the algorithm. You can fix a random seed for reproducible results.

Honorable Mentions

There were a few tactics that didn't get in into this tutorial:

Create Synthetic Samples (Data Augmentation)

Creating constructed samples is a shut cousin of upward-sampling, and some people might categorize them together. For example, the SMOTE algorithm is a method of resampling from the minority class while slightly perturbing feature values, thereby creating "new" samples.

You can discover an implementation of SMOTE in the imblearn library.

Combine Minority Classes

Combining minority classes of your target variable may be advisable for some multi-form issues.

For instance, let's say you wished to predict credit carte fraud. In your dataset, each method of fraud may be labeled separately, merely you might non intendance about distinguishing them. You could combine them all into a single 'Fraud' grade and care for the problem as binary nomenclature.

Reframe as Anomaly Detection

Anomaly detection, a.k.a. outlier detection, is for detecting outliers and rare events. Instead of edifice a nomenclature model, yous'd take a "profile" of a normal ascertainment. If a new observation strays likewise far from that "normal profile," information technology would be flagged equally an anomaly.

Determination & Next Steps

In this guide, we covered 5 tactics for handling imbalanced classes in machine learning:

Up-sample the minority grade
Downward-sample the majority class
Change your performance metric
Penalize algorithms (cost-sensitive grooming)
Utilise tree-based algorithms

These tactics are subject to the No Complimentary Lunch theorem, and you should attempt several of them and apply the results from the examination ready to determine on the best solution for your problem.

Source: https://elitedatascience.com/imbalanced-classes

Posted by: leveyoverects36.blogspot.com

How To Draw A Balance Scale

Intuition: Illness Screening Case

Of import notes before we brainstorm:

Residue Scale Dataset

The Danger of Imbalanced Classes

ane. Up-sample Minority Class

2. Downwards-sample Bulk Class

3. Alter Your Performance Metric

4. Penalize Algorithms (Cost-Sensitive Training)

five. Apply Tree-Based Algorithms

Honorable Mentions

Create Synthetic Samples (Data Augmentation)

Combine Minority Classes

Reframe as Anomaly Detection

Determination & Next Steps

0 Response to "How To Draw A Balance Scale"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel