The Bayes Theorem: Who committed the crime?

Ritesh Uppal
May 7, 2022

You might have come across the red and blue ball example in an attempt to understand the Bayes theorem. The explanation does come in handy, but there is more to the Bayes theorem than just calculating P(red ball| blue ball). In this article, I try to explain the Bayes theorem with a simple example of solving a murder mystery!

What is Bayes Theorem?

The Bayes Theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event [1].

Fig 1: The Bayes theorem

Our target here is to find the probability of event B1 happening, given that event A has already happened, i.e., P(B1|A).
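In symbols, the standard form of the theorem for mutually exclusive and exhaustive events B_1, ..., B_n (the same result reached at the end of the derivation in the next section) can be written as follows. The LaTeX notation is only a restatement for reference:

```latex
P(B_1 \mid A) \;=\; \frac{P(A \mid B_1)\, P(B_1)}{\sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)},
\qquad P(A) \neq 0
```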

Derivation of Bayes Theorem

By definition of conditional probability,

P(B1|A) = P(B1 ∩ A)/ P(A) where P(A)≠0 

Now, with similar logic

P(A|B1) = P(A ∩ B1) / P(B1) where P(B1) ≠ 0

Therefore,

P(A ∩ B1) = P(B1 ∩ A) = P(A|B1) * P(B1)
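To make the two definitions concrete, here is a minimal Python sketch on a fair six-sided die. The events `a` (an even roll) and `b1` (a roll of at most 3) and the helpers `prob` and `cond_prob` are illustrative assumptions, not part of the original example.

```python
from fractions import Fraction

OMEGA = frozenset({1, 2, 3, 4, 5, 6})  # assumed sample space: one fair die

def prob(event):
    """P(event) when every outcome in OMEGA is equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(target, given):
    """P(target | given) = P(target ∩ given) / P(given), for P(given) != 0."""
    if prob(given) == 0:
        raise ValueError("cannot condition on an event of probability 0")
    return prob(target & given) / prob(given)

a = {2, 4, 6}   # hypothetical event A: the roll is even
b1 = {1, 2, 3}  # hypothetical event B1: the roll is at most 3

print(cond_prob(b1, a))             # P(B1|A)          -> 1/3
print(cond_prob(a, b1) * prob(b1))  # P(A|B1) * P(B1)  -> 1/6
print(prob(a & b1))                 # P(A ∩ B1)        -> 1/6
```

The last two lines print the same value, which is the product rule used in the rest of the derivation.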

Let B1, B2, B3, …, Bn be n mutually exclusive events such that

B1 ∪ B2 ∪ B3... ∪ Bn = Ω (Sample Space) (Exhaustive events)

In simple terms, it means that B1, B2, B3, …, Bn cannot happen simultaneously (mutually exclusive) and that one of them must happen (exhaustive). Any event A can therefore happen only together with exactly one of B1, B2, …, Bn, i.e., there is no way for A to happen outside these n events.

Example: Let us throw a die and record three events:

A1 = {1, 2, 3}
A2 = {4, 5}
A3 = {6}

Here A1 ∪ A2 ∪ A3 = {1, 2, 3, 4, 5, 6} = Ω (exhaustive events).

Also, A1 ∩ A2 = ∅, A2 ∩ A3 = ∅, A3 ∩ A1 = ∅ (mutually exclusive).
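The two conditions in this example can also be checked mechanically. The following is a small Python sketch; the function name `is_partition` is a hypothetical helper introduced here for illustration.

```python
from itertools import combinations

omega = {1, 2, 3, 4, 5, 6}          # sample space of one die
events = [{1, 2, 3}, {4, 5}, {6}]   # A1, A2, A3 from the example

def is_partition(events, sample_space):
    """True if the events are exhaustive and pairwise mutually exclusive."""
    exhaustive = set().union(*events) == sample_space
    exclusive = all(x.isdisjoint(y) for x, y in combinations(events, 2))
    return exhaustive and exclusive

print(is_partition(events, omega))                          # True
print(is_partition([{1, 2, 3}, {3, 4, 5}, {6}], omega))     # False: 3 is shared
```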

Continuing with the derivation,

Fig 2: Event A is a union of mutually exclusive events and hence has a unique intersection with each event

P(A) (probability of any event A; see the above figure)

= P(A ∩ Ω)
= P(A ∩ (B1 ∪ B2 ∪ B3 … ∪ Bn))
= P((A ∩ B1) ∪ (A ∩ B2) … ∪ (A ∩ Bn))
= ∑ P(A ∩ Bi) where i ranges from 1 to n (the events Bi do not intersect one another, so the pieces A ∩ Bi are themselves mutually exclusive and their probabilities simply add)
= ∑ P(A|Bi) * P(Bi) where i ranges from 1 to n
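This expansion of P(A) can be checked numerically on the die example above. The sketch below assumes a fair die and picks a hypothetical event A = "the roll is even"; the names `prob`, `direct`, and `total` are illustrative.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
partition = [{1, 2, 3}, {4, 5}, {6}]  # the mutually exclusive, exhaustive events
a = {2, 4, 6}                          # hypothetical event A: the roll is even

def prob(event):
    """P(event) for a fair die (all outcomes equally likely)."""
    return Fraction(len(event), len(omega))

# Left-hand side: P(A) computed directly.
direct = prob(a)

# Right-hand side: sum over i of P(A|Bi) * P(Bi).
total = sum((prob(a & b) / prob(b)) * prob(b) for b in partition)

print(direct, total)  # 1/2 1/2
```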

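Substituting P(A ∩ B1) = P(A|B1) * P(B1) into the numerator and this sum into the denominator of the definition of P(B1|A) gives

P(B1|A) = (P(A|B1) * P(B1)) / (∑ P(A|Bi) * P(Bi)) where i ranges from 1 to n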
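The final formula translates almost directly into code. The following is a minimal sketch, not a library API: `bayes_posterior` is a hypothetical function that takes the values P(Bi) and P(A|Bi) as two lists and returns P(Bi|A) for every i.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return [P(Bi|A)] given priors [P(Bi)] and likelihoods [P(A|Bi)]."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per prior")
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(A|Bi) * P(Bi)
    evidence = sum(joint)                                  # P(A)
    if evidence == 0:
        raise ValueError("P(A) is zero, so the posterior is undefined")
    return [j / evidence for j in joint]

# Die example: B1 = {1,2,3}, B2 = {4,5}, B3 = {6}; assumed event A = "even roll".
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 1)]

posterior = bayes_posterior(priors, likelihoods)
print(posterior)  # each entry equals 1/3
```

Given an even roll, each of the three events is equally likely, because each contains exactly one even number.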

Who committed the crime?

A minor crime has happened; let us call this event CRIME. The police know that two persons, P1 and P2, might have committed the crime, and that no one else could have done it. The police want to find the culprit. The conditions of mutually exclusive events (either P1 or P2, not both) and exhaustive events (no one other than P1 or P2 could have committed the crime) are both met here.

Fig 3: The two suspects

P(Pi): The probability that the ith person was at the crime spot at the time of the crime. This is called the prior probability because it reflects what we know about Pi before taking the crime itself into account.

P(Pi|CRIME): The probability that Pi committed the crime, given that the crime has already happened. This is called the posterior probability because we find the cause after the event (CRIME here) has occurred.

P(CRIME|Pi): The probability of the crime given Pi, i.e., how likely Pi is to commit the crime. This is called the likelihood because it gives the probability of the event (CRIME here) on the condition that Pi has happened, i.e., we are modeling the event assuming that Pi is true.

Note: The posterior probability allows us to find the probability of the different causes that might have been the reason for the occurrence of an event. In contrast, the likelihood helps us model the event assuming a particular cause is true, and the prior tells us how plausible that cause was to begin with.

Why do we need both P(Pi) and P(CRIME|Pi) to compute P(Pi| CRIME)?

Suppose P1 is more likely to commit such a crime, with a probability of 0.9, whereas P2 is less likely, with a probability of 0.7. Before you rest your case, what about the probability of finding P1 and P2 at the crime spot? A person may be very likely to commit a crime, but would you arrest them if they were almost certainly not present at the crime spot?

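Let us say the probability of P1 being present at the crime spot is 0.1, whereas the probability of P2 being present at the crime spot is 0.8. So now we need to take both quantities into account, but how? Here comes THE BAYES THEOREM. (Strictly speaking, the probabilities of mutually exclusive and exhaustive events should add up to 1, whereas these add up to 0.9. Because the theorem divides by the sum in the denominator, rescaling them to 1/9 and 8/9 gives exactly the same answer.)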

P(CRIME|P1) = 0.9, P(CRIME|P2) = 0.7

P(P1) = 0.1, P(P2) = 0.8

Then using the Bayes theorem,

P (P1| CRIME)

= [P(CRIME|P1)* P(P1)]/[P(CRIME|P1) * P(P1) + P(CRIME|P2) * P(P2)]

= (0.9 * 0.1) / (0.9 * 0.1 + 0.7 * 0.8) = 0.09 / 0.65 ≈ 0.14

P (P2| CRIME)

= [P(CRIME|P2)* P(P2)]/[P(CRIME|P1) * P(P1) + P(CRIME|P2) * P(P2)]

= (0.7 * 0.8) / (0.9 * 0.1 + 0.7 * 0.8) = 0.56 / 0.65 ≈ 0.86

Hence, P2 is the more likely culprit, even though P1 is the one more inclined to commit the crime!
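The same arithmetic can be reproduced in a few lines of Python. This is only a sketch using the numbers from the example; the dictionary names are illustrative. Since the two presence probabilities add up to 0.9 rather than 1, the second half also checks that rescaling them to sum to 1 leaves the result unchanged.

```python
import math

likelihood = {"P1": 0.9, "P2": 0.7}  # P(CRIME|Pi)
presence = {"P1": 0.1, "P2": 0.8}    # P(Pi)

def posterior_from(prior):
    """Apply Bayes' theorem: P(Pi|CRIME) for every suspect."""
    joint = {s: likelihood[s] * prior[s] for s in prior}  # P(CRIME|Pi) * P(Pi)
    evidence = sum(joint.values())                        # the denominator
    return {s: joint[s] / evidence for s in joint}

posterior = posterior_from(presence)
for suspect, p in posterior.items():
    print(f"P({suspect}|CRIME) = {p:.2f}")
# P(P1|CRIME) = 0.14
# P(P2|CRIME) = 0.86

# Rescale the presence probabilities so that they sum to 1 and recompute.
total = sum(presence.values())
rescaled = {s: presence[s] / total for s in presence}
same = all(math.isclose(posterior[s], posterior_from(rescaled)[s]) for s in presence)
print(same)  # True
```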

Where to use the Bayes theorem?

In reality, we often encounter a situation where something has happened, and we want to find its cause [2]. This is where the theorem can be of great help!


References

  1. Joyce, James (2003), “Bayes’ Theorem”, in Zalta, Edward N. (ed.), The Stanford Encyclopedia of Philosophy (Spring 2019 ed.), Metaphysics Research Lab, Stanford University, retrieved 2020-01-17
  2. https://nptel.ac.in/courses/111/102/111102134/
