Precision and Recall in Machine Learning

What are precision and recall?

This is a very popular interview question for data scientists, program managers and AI (Artificial Intelligence) software engineers. In fact, I was asked this question even when interviewing at Facebook as a TPM (Technical Program Manager) rather than as a data scientist. So, what are precision and recall in machine learning? Precision and recall are metrics used to quantify the performance of machine learning and deep learning classifiers. They’re expressed as fractions or percentages (e.g., 50%), with 100% being the best possible score.

  • Recall: of the relevant observations, how many were selected? i.e., recall = TP / (TP + FN)
  • Precision: of the selected observations, how many were relevant? i.e., precision = TP / (TP + FP) (see the short code sketch below)
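
Here is a minimal Python sketch of these two formulas. It's my own illustration rather than any particular library's API, and the TP, FP and FN counts it takes as inputs are unpacked later in this post:

```python
def recall(tp: int, fn: int) -> float:
    """Of the relevant observations, how many were selected?"""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Of the selected observations, how many were relevant?"""
    return tp / (tp + fp)


# Example counts: 80 true positives, 20 false positives, 0 false negatives
print(recall(tp=80, fn=0))      # 1.0 -> 100%
print(precision(tp=80, fp=20))  # 0.8 -> 80%
```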

The two bullet points above are a textbook answer: a definition plus a formula. Ideally, it should come with an example to show that you really understand the question. But during my transition from a regular software TPM to an AI TPM, I still struggled to understand many of the concepts packed into this answer. For example, what are TP (True Positive), FP (False Positive) and FN (False Negative), and what does it mean to be positive (P) or negative (N)?

In this blog, I’m going to unpack several of the concepts packed into this definition. Once you get comfortable with them, we’ll dive into actual use cases, and you can test how well you truly understand the explanations with several practice problems. Remember, just because you read something and think you understand it doesn’t mean you actually do.

❝ Tell me and I forget.
Teach me and I remember.
Involve me and I learn. ❞
– Benjamin Franklin

Being told something, or even being taught it, is just one-way interaction. The true test is to get involved in some form of two-way interaction, and a good start is to take several practice quizzes. If you have truly learned what this blog teaches about precision and recall, you should score 100%. Once you understand this, we can move on to F1 scores, specificity and other performance metrics. First, though, you have to separate the concepts from the technical jargon.

Understand the Concepts

Accuracy, precision and recall are all known as classification metrics. A classification metric quantifies how well you’ve assigned the correct label or category in a classification problem. For example, let’s say you built a cat classifier that labels each image as either a cat or a dog. Since no classification algorithm is perfect, the classifier’s performance suffers from mistakes in which data is “lost”, i.e., skipped or mis-classified. A good mental model for this is a funnel with two leaky faucets where data is lost.

Fig. 1: The Leaky Precision-Recall Funnel

In fact, classification models always make the same two types of mistakes and thus leak data in two ways:

  1. Some data gets skipped (top leaky faucet), i.e., recall measures the percentage of data that wasn’t skipped (top half of the funnel)
  2. Some data gets mis-classified (bottom leaky faucet), i.e., precision measures, of the data that wasn’t skipped, the percentage that was then correctly classified (bottom half of the funnel)

A Simple Example

Let’s pick a binary classification example, this time a quiz with a binary outcome: you either answer a question correctly or incorrectly. In this quiz, you were given 100 questions and you answered all of them, but only 80 of them correctly; in machine learning terminology, you correctly classified 80 questions.

  • Accuracy = Correctly Answered Questions / All Questions
    • Correctly Answered Questions = 80
    • All Questions = 100
    • Accuracy = 80 / 100 = 80%
  • Recall = Correctly Answered Questions / (Correctly Answered Questions + Skipped Questions)
    • Skipped Questions = 0
    • Recall = 80 / (80 + 0) = 100%
  • Precision = Correctly Answered Questions / (Correctly Answered Questions + Incorrectly Answered Questions)
    • Incorrectly Answered Questions = 20
    • Precision = 80 / (80 + 20) = 80%

So your accuracy is 80%, your precision is also 80% and your recall is 100% since you didn’t skip any questions. From this perspective, it doesn’t even seem useful to measure precision or recall since they don’t add much information. But that’s only because one of the two leaks occurred. Remember, accuracy ~= precision x recall, so in this scenario you’re right: it doesn’t matter.
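
If you want to verify these numbers yourself, here is a tiny Python sketch of this simple example (the variable names are just my shorthand for the quiz, not standard terminology):

```python
correct = 80    # questions answered correctly
incorrect = 20  # questions answered incorrectly
skipped = 0     # questions left unanswered
total = correct + incorrect + skipped  # 100

accuracy = correct / total                   # 0.8
recall = correct / (correct + skipped)       # 1.0
precision = correct / (correct + incorrect)  # 0.8

print(f"accuracy={accuracy:.0%}, recall={recall:.0%}, precision={precision:.0%}")
# accuracy=80%, recall=100%, precision=80%
```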

A More Realistic Example

But the typical scenario is one where both types of data leakage occur: data gets skipped and data gets mis-classified. Let’s try the same quiz again, but this time suppose you ran out of time and couldn’t answer all 100 questions. Say you skipped 15 questions, incorrectly answered 5 and correctly answered the remaining 80. From an accuracy perspective, your score is still 80% (80/100). But now, if you want to improve before the next time you take the quiz, you need to know where to focus: is it a recall issue (you skipped some questions) or a precision issue (you answered some questions incorrectly)?

  • Accuracy = Correctly Answered Questions / All Questions
    • Correctly Answered Questions = 80
    • All Questions = 100
    • Accuracy = 80 / 100 = 80%
  • Recall = Correctly Answered Questions / (Correctly Answered Questions + Skipped Questions)
    • Skipped Questions = 15
    • Recall = 80 / (80 + 15) = 84%
  • Precision = Correctly Answered Questions / (Correctly Answered Questions + Incorrectly Answered Questions)
    • Incorrectly Answered Questions = 5
    • Precision = 80 / (80 + 5) = 94%
  • Accuracy ~= Precision x Recall
    • Accuracy ~= 94% x 84% ~= 80% (it’s actually 79.3%, but for now we’re keeping it simple to understand the relationship between accuracy, precision and recall)

Since your precision is good (94%), it looks like you studied well for this quiz! But your recall is only 84%, so the best way to improve your score on the next quiz is to work on your answering speed so you don’t skip any questions. In the same way, this analytic method of breaking accuracy into precision and recall tells data scientists where they need to focus their improvement efforts.
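
To see exactly where the 79.3% in the note above comes from, here is the same kind of sketch for this more realistic example (again, the variable names are just my shorthand for the quiz):

```python
correct = 80    # questions answered correctly
incorrect = 5   # questions answered incorrectly
skipped = 15    # questions left unanswered
total = correct + incorrect + skipped  # 100

accuracy = correct / total                   # 0.8
recall = correct / (correct + skipped)       # 80 / 95  ~ 0.842
precision = correct / (correct + incorrect)  # 80 / 85  ~ 0.941

print(f"recall={recall:.1%}, precision={precision:.1%}")
print(f"precision x recall = {precision * recall:.1%} vs. accuracy = {accuracy:.1%}")
# recall=84.2%, precision=94.1%
# precision x recall = 79.3% vs. accuracy = 80.0%
```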

Connect Terminology to Concepts

Now that you understand the concepts (you don’t really understand them until you can score 100% on the quiz), let’s redo the examples by connecting the technical jargon to the concepts.

  • Correct Answer = P (Positive)
  • Incorrect Answer = N (Negative)
  • Correctly Answered Questions = TP (True Positive)
  • Incorrectly Answered Questions = FP (False Positive)
  • Skipped Questions = FN (False Negative)

There’s also a TN (True Negative) data point in classification problems, but in the case of a quiz it’s nonexistent, so you can safely ignore it here. In fact, true negative values aren’t part of the formulas for precision or recall at all.

Going through the first, simplistic example: you correctly answered 80 questions out of 100, and since you were really fast in answering, you didn’t skip any questions.

  • Recall = TP / (TP + FN)
    • TP = Correctly Answered Questions = 80
    • FN = Skipped Questions = 0
    • Recall = 80 / (80 + 0) = 100% (which is excellent, since you were able to answer every question you were given; whether you answered them correctly is captured by the precision metric)
  • Precision = TP / (TP + FP)
    • FP = Incorrectly Answered Questions = 20
    • Precision = 80 / (80 + 20) = 80%
  • Accuracy = (TP + TN) / (TP + FP + TN + FN)
    • TN = 0
    • Accuracy = (80 + 0) / (80 + 20 + 0 + 0) = 80% (see the scikit-learn sketch below for the same numbers)
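
If you’d rather let a library do the counting, here is a sketch of this same simple example using scikit-learn. This assumes scikit-learn is installed, and the way I encode the quiz as 0/1 labels is just one convenient way to reproduce the same TP, FP and FN counts:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# 1 = the answer was actually correct, 0 = it was not.
# You attempted all 100 questions, so the "classifier" predicted 1 every time:
# 80 true positives, 20 false positives, 0 false negatives, 0 true negatives.
y_true = [1] * 80 + [0] * 20
y_pred = [1] * 100

print(recall_score(y_true, y_pred))     # 1.0 -> 100%
print(precision_score(y_true, y_pred))  # 0.8 -> 80%
print(accuracy_score(y_true, y_pred))   # 0.8 -> 80%
```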

Going through the more realistic scenario, you ran out of time and couldn’t answer all 100 questions in the quiz. Let’s say you skipped 15 questions and incorrectly answered 5 questions. But you correctly answered the remaining 80 questions.

  • Recall = TP / (TP + FN)
    • TP = Correctly Answered Questions = 80
    • FN = Skipped Questions = 15
    • Recall = 80 / (80 + 15) = 84%
  • Precision = TP / (TP + FP)
    • FP = Incorrectly Answered Questions = 5
    • Precision = 80 / (80 + 5) = 94%
  • Accuracy = (TP + TN) / (TP + FP + TN + FN)
    • TN = 0
    • Accuracy = (80 + 0) / (80 + 5 + 0 + 15) = 80%
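
And here is the same kind of scikit-learn sketch for the more realistic scenario, this time also printing the confusion matrix so you can read TP, FP, FN and TN directly (same assumptions as before: scikit-learn installed, and my own 0/1 encoding of the quiz):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# 80 correctly answered (TP), 5 incorrectly answered (FP), 15 skipped (FN), no TN.
y_true = [1] * 80 + [0] * 5 + [1] * 15
y_pred = [1] * 80 + [1] * 5 + [0] * 15

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)                   # 80 5 15 0
print(recall_score(y_true, y_pred))     # 80 / (80 + 15) ~ 0.842
print(precision_score(y_true, y_pred))  # 80 / (80 + 5)  ~ 0.941
print(accuracy_score(y_true, y_pred))   # 80 / 100       = 0.8
```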

If you’ve understood the concepts so far, you’re ready to take the quiz and test your understanding; you should be able to score 100% when calculating accuracy, precision and recall. In the next blog, we’ll dig into a real-life classification use case: a people counter that Inabia implemented in a few stores to count the number of people entering and leaving. While this doesn’t sound like a classification problem, it really is, since sometimes our deep learning model missed a few people (skips, or false negatives) and in other cases counted people where none existed (counted ghosts, or false positives). Good luck applying your understanding of precision and recall to your machine learning and deep learning problems.