Performance measurement of models
In machine learning, performance measurement is an essential task. Here are a few methods commonly used in ML/DL.
A. Accuracy (classification)
Accuracy is defined as the number of correctly classified points divided by the total number of points in the test set. It is a very easy way to measure the performance of a model, but it has two caveats:
- Imbalanced data: accuracy is not very useful on an imbalanced dataset, since a model that always predicts the majority class already scores high.
- When the model returns probability scores, accuracy may not be the best measure. For a classification problem we convert the probabilities to 1/0 by thresholding, so a model M1 can be better than M2 on probability scores while both show the same result, and hence the same accuracy, after thresholding; the sketch below illustrates this.
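A minimal sketch of that caveat, assuming scikit-learn and made-up labels and scores (y_true, p_m1, p_m2 are toy values, not from any real model):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy data: M1's scores are confident and well separated,
# M2's are barely on the right side of 0.5
y_true = np.array([1, 1, 0, 0])
p_m1 = np.array([0.95, 0.90, 0.10, 0.05])
p_m2 = np.array([0.55, 0.60, 0.45, 0.40])

for name, p in [("M1", p_m1), ("M2", p_m2)]:
    y_pred = (p >= 0.5).astype(int)              # thresholding at 0.5
    print(name, accuracy_score(y_true, y_pred))  # both print 1.0
```

Both models score a perfect accuracy even though M1's probability scores are clearly better; accuracy simply cannot see the difference.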
B. Confusion matrix, TPR, FPR, FNR, TNR (classification)
The name of each cell is decided by the predicted value: the second letter (P or N) comes from the prediction, and the first letter (T or F) says whether that prediction matches the actual label.
The confusion matrix does not process probability scores; predictions must first be converted into class labels.
A model is good when TPR and TNR are high and FPR and FNR are low.
Imbalanced data: the confusion matrix is good for an imbalanced dataset, because it reports the error rate of each class separately. The four rates can be read straight off the matrix, as in the sketch below.
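A minimal sketch, assuming scikit-learn and toy predictions (y_true and y_pred are made up for illustration):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# scikit-learn lays the binary matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)  # true positive rate (sensitivity)
tnr = tn / (tn + fp)  # true negative rate (specificity)
fpr = fp / (fp + tn)  # false positive rate = 1 - TNR
fnr = fn / (fn + tp)  # false negative rate = 1 - TPR
print(f"TPR={tpr:.2f} TNR={tnr:.2f} FPR={fpr:.2f} FNR={fnr:.2f}")
```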
C. Precision, recall, and F1-score (classification)
These are often used in search engines, for information retrieval from huge collections of text data.
Precision: of all the points the model declared/predicted positive, what percentage of them are actually positive. Precision = TP / (TP + FP).
Recall: of all the points that are actually positive, how many of them were predicted positive. Recall = TP / (TP + FN). Precision and recall only care about the positive class.
Both precision and recall lie between 0 and 1.
F1-score: combines precision and recall as their harmonic mean: F1 = 2 * precision * recall / (precision + recall). The sketch below computes all three.
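A minimal sketch with the same toy predictions as above, assuming scikit-learn:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# precision = TP / (TP + FP), recall = TP / (TP + FN),
# F1 = 2 * precision * recall / (precision + recall)
print(precision_score(y_true, y_pred))  # 0.75
print(recall_score(y_true, y_pred))     # 0.75
print(f1_score(y_true, y_pred))         # 0.75
```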
Receiver Operating Characteristic (ROC) curve and AUC (classification)
The ROC curve and the area under it (AUC) measure performance for a classification problem at various threshold settings. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.
How is the AUC value calculated? Suppose we have a binary classification problem where the output comes in the form of a probability score.
Step 1: Sort the predictions in decreasing order of predicted score.
Step 2: Set thresholds T1, T2, T3, ...: take T1 as the first threshold; if a score is greater than T1, declare it 1, and if it is less than T1, declare it 0. For every threshold Ti, calculate TPR and FPR.
For n points (observations) there are n thresholds, up to Tn.
Now plot the ROC curve using the (FPR, TPR) pairs from the above data.
The ROC curve plots TPR on the y-axis against FPR on the x-axis. The area under a random-guess model is 0.5, and the total area under any ROC curve lies between 0 and 1, so a better-than-random model has an AUC between 0.5 and 1. ROC-AUC is used for binary classification; only in rare cases is it used for multi-class classification. A sketch of the threshold-sweep procedure follows.
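A minimal sketch of the threshold sweep, assuming toy labels and scores; the hand-computed trapezoid area is checked against scikit-learn's roc_auc_score:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.5])

# Step 1: every unique score, in decreasing order, serves as a threshold
thresholds = np.sort(np.unique(y_score))[::-1]

tprs, fprs = [], []
for t in thresholds:
    y_pred = (y_score >= t).astype(int)  # Step 2: threshold, then count
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tprs.append(tp / (tp + fn))
    fprs.append(fp / (fp + tn))

# Area under the (FPR, TPR) curve by the trapezoidal rule,
# with the (0, 0) and (1, 1) endpoints added
xs = [0.0] + fprs + [1.0]
ys = [0.0] + tprs + [1.0]
auc = sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
          for i in range(1, len(xs)))
print(auc, roc_auc_score(y_true, y_score))  # both print 0.875
```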
- For imbalanced data, AUC can be high even when the model is not good.
- AUC does not use the actual values of the predicted scores; it depends only on their ordering. The probability values themselves play no part in the area calculation.
- The AUC of a random model will be 0.5.
Note: if your model's AUC comes out below 0.5, just swap the class labels of its predictions.
How do we use the ROC-AUC curve for a multi-class model?
In the multi-class setting, we can plot N ROC-AUC curves for N classes using the one-vs-all methodology. For example, if you have three classes named X, Y, and Z, you will have one ROC for X classified against Y and Z, another for Y classified against X and Z, and a third for Z classified against X and Y, as in the sketch below.
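A minimal sketch of one-vs-all AUC for three classes, assuming scikit-learn and a made-up probability table (rows sum to 1):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Classes X, Y, Z encoded as 0, 1, 2; y_proba[i, k] = P(class k) for point i
y_true = np.array([0, 1, 2, 2, 1, 0])
y_proba = np.array([[0.7, 0.2, 0.1],
                    [0.2, 0.6, 0.2],
                    [0.1, 0.3, 0.6],
                    [0.2, 0.2, 0.6],
                    [0.3, 0.5, 0.2],
                    [0.6, 0.3, 0.1]])

# One ROC-AUC per class: that class against the rest
y_bin = label_binarize(y_true, classes=[0, 1, 2])
for k, name in enumerate(["X", "Y", "Z"]):
    print(name, roc_auc_score(y_bin[:, k], y_proba[:, k]))

# scikit-learn can also average the one-vs-rest AUCs directly
print(roc_auc_score(y_true, y_proba, multi_class="ovr"))
```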
D. Log-loss (classification)
It uses the exact probability score, and we want the log-loss to be as small as possible; its value lies between 0 and infinity. For binary classification, log-loss = -(1/n) * Σ [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)], and the multi-class version averages the negative log-probability assigned to each point's true class. This is a very powerful way to measure both binary and multi-class classification.
Note: the raw log-loss value is hard to interpret, so its interpretability is low.
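A minimal sketch, assuming toy binary labels and predicted probabilities of class 1; the hand-computed value is checked against scikit-learn's log_loss:

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 1, 0, 0])
p = np.array([0.9, 0.8, 0.2, 0.35])

# log-loss = -(1/n) * sum(y * log(p) + (1 - y) * log(1 - p))
manual = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
print(manual, log_loss(y_true, p))  # the two values agree
```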
E. R-squared / coefficient of determination (regression)
R-squared is defined as R² = 1 - SS_res / SS_tot, where SS_res (also written as SSE, the sum of squared residuals) is Σ(y_i - ŷ_i)² and SS_tot is Σ(y_i - ȳ)². For a reasonable model the value of R-squared lies between 0 and 1 (it can go negative when the model is worse than simply predicting the mean).
Problems: R-squared is built from squared errors, so a single outlier can distort it badly, and it is not used when outliers are present; the sketch below shows this. Its value also never decreases when a variable is added to the model, even when that variable is not useful.
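A minimal sketch, assuming made-up actual and predicted values, that also shows the outlier problem:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.6, 9.4])

ss_res = np.sum((y_true - y_pred) ** 2)         # SSE, sum of squared residuals
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
print(1 - ss_res / ss_tot)                      # ~0.98, a good fit

# A single outlier prediction inflates SSE and wrecks R-squared
# (here it even goes negative)
y_pred_bad = y_pred.copy()
y_pred_bad[0] = 30.0
print(1 - np.sum((y_true - y_pred_bad) ** 2) / ss_tot)
```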
F. Median absolute deviation (MAD)
MAD is the median of the absolute deviations of the errors from their median: MAD = median(|e_i - median(e)|).
The median measures the central tendency of the errors, MAD measures their spread, and both are robust to outliers, as the sketch below shows.
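A minimal sketch, assuming a made-up error array with one outlier:

```python
import numpy as np

errors = np.array([-0.5, 0.1, 0.2, -0.1, 0.3, 12.0])

med = np.median(errors)                # robust central tendency of the errors
mad = np.median(np.abs(errors - med))  # robust spread of the errors
print(med, mad)                        # 0.15 0.2

# The mean and standard deviation get dragged around by the single outlier
print(errors.mean(), errors.std())     # 2.0 ~4.48
```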
G. Distribution of errors
Use the PDF and CDF of the errors.
Suppose the x-axis is the error value of your model. In a positively skewed distribution, most points have very small error values, which means the model is good.
From the CDF you can read statements such as: at the point x = 3, 90% of the errors are less than 3, which is good for the model. This is very useful when you want to compare regression models by simply plotting the CDFs of their errors, as in the sketch below.
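A minimal sketch comparing two models, assuming synthetic errors drawn from normal distributions (model 1 is the lower-error one by construction):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
err_m1 = np.abs(rng.normal(0, 1.0, 1000))  # synthetic errors, model 1
err_m2 = np.abs(rng.normal(0, 2.0, 1000))  # synthetic errors, model 2

for errs, label in [(err_m1, "model 1"), (err_m2, "model 2")]:
    x = np.sort(errs)
    cdf = np.arange(1, len(x) + 1) / len(x)  # empirical CDF
    plt.plot(x, cdf, label=label)

plt.xlabel("error")
plt.ylabel("fraction of points with error <= x")
plt.legend()
plt.show()
# The curve that climbs faster (model 1) has more small errors: the better model
```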
================thanks===============