COVID-19: A Poisson Process

Ritesh Uppal
May 29, 2022

The majority of statistics courses focus solely on derivations, with little attention paid to the practical applications of these topics. I’m going to try to explain the Poisson Process in this article with the help of our constant partner, Mr. Covid-19! (Not being sexist)

Fig 1: Deriving equations

Poisson Process

The Poisson process models arrivals that happen at a known average rate but otherwise completely at random [1]. Here an ‘arrival’ can be a telephone call reaching a call center, a car arriving at a toll plaza, or an error appearing on a page of a massive text document [2]. It answers questions of the form ‘If there are X arrivals in time t (rate = X/t), what is the probability of K arrivals in time t1?’

For instance, suppose a 400-page book contains 200 errors. What is the probability that a random sample of 10 pages contains no errors?
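The arithmetic behind the book example can be sketched as follows. This assumes errors land on pages independently at a constant average rate (the Poisson assumptions listed later in this article), so the rate is 200/400 = 0.5 errors per page and a 10-page sample has mean λt = 5. The helper name `prob_k_errors` is hypothetical, chosen for illustration.

```python
import math

def prob_k_errors(total_errors, total_pages, sample_pages, k):
    """Poisson probability of exactly k errors in a sample of pages (hypothetical helper)."""
    rate_per_page = total_errors / total_pages   # lambda = 200 / 400 = 0.5 errors per page
    mean = rate_per_page * sample_pages          # lambda * t = 0.5 * 10 = 5
    return math.exp(-mean) * mean ** k / math.factorial(k)

p_zero = prob_k_errors(total_errors=200, total_pages=400, sample_pages=10, k=0)
print(f"P(no errors in 10 pages) = {p_zero:.4f}")  # prints 0.0067, i.e. e^(-5)
```

So under these assumptions, an error-free 10-page sample would turn up less than 1% of the time.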

As another example, suppose we know earthquakes occur in a particular area at a rate of 5 per month. Other than this information, the timings of earthquakes can be completely random. Shown below are three different possible earthquake timings over a month.

Fig 2: Occurrence of an earthquake over a month (Case 1)
Fig 3: Occurrence of an earthquake over a month (Case 2)
Fig 4: Occurrence of an earthquake over a month (Case 3)
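Cases like these can be generated with a short simulation. It relies on a standard property of the Poisson process: given that exactly n arrivals fall in [0, T], their times are distributed like n independent Uniform(0, T) draws, sorted. The 30-day month, the seeds, and the function name `earthquake_month` are assumptions for illustration only.

```python
import random

def earthquake_month(n_quakes=5, days=30, seed=None):
    """One possible month of quake times (in days), given exactly n_quakes occur.

    Conditional on exactly n arrivals in [0, days], Poisson arrival times are
    distributed like n independent Uniform(0, days) draws, sorted.
    """
    rng = random.Random(seed)
    return sorted(rng.uniform(0, days) for _ in range(n_quakes))

for case in (1, 2, 3):
    times = earthquake_month(seed=case)
    print(f"Case {case}:", ", ".join(f"day {t:.1f}" for t in times))
```

Every run places the quakes differently, which is the "completely random" timing described above.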

Note: To keep the explanation simple, I have assumed that every month has exactly five earthquakes. In reality a month can have more or fewer, but no other specific count is more likely than 5, because the Poisson distribution peaks at its average value (5 here). Strictly, when the average λ is a whole number, the counts λ − 1 and λ tie as the most likely, so 4 and 5 earthquakes are equally probable.
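A quick numerical look at the Poisson probabilities for a rate of 5 earthquakes per month (the rate from the example) shows where the peak sits. Exactly 5 is the most likely count, tied with 4 as always happens when the mean is a whole number, yet it occurs in only about 18% of months.

```python
import math

lam = 5  # average number of earthquakes per month

def poisson_pmf(k, lam):
    """P(k events) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

probs = {k: poisson_pmf(k, lam) for k in range(11)}
for k, p in probs.items():
    print(f"P({k:2d} quakes) = {p:.4f}")

# With a whole-number mean, k = lam - 1 and k = lam tie for the highest probability.
print("P(4) equals P(5):", math.isclose(probs[4], probs[5]))
print(f"P(exactly 5) = {probs[5]:.4f}, P(not exactly 5) = {1 - probs[5]:.4f}")
```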

Assumptions of the Poisson Process

  1. The number of arrivals in non-overlapping time intervals is independent. For example, the number of arrivals from 1–2 pm is independent of the number from 12–1 pm.
  2. Two events cannot occur at exactly the same time. Technically, the probability of more than one arrival during (t, t+Δt) shrinks to 0 faster than Δt itself as Δt approaches 0. For example, a single call-center phone line cannot receive two calls at the same instant.
  3. The probability of one arrival during a small time interval Δt is approximately λΔt, where the constant λ is called the arrival rate. Because λ is constant, the probability of an arrival depends only on the length Δt of the interval, not on where the interval lies in time.
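The three assumptions can be turned into a small simulation sketch. It splits one unit of time into tiny slots of length dt, lets each slot independently hold one arrival with probability λ·dt, and never allows two arrivals in a slot. As dt shrinks, the count of arrivals approaches a Poisson(λt) distribution (the binomial-to-Poisson limit). The rate, slot size, trial count, and the name `count_arrivals_in_slots` are illustrative assumptions.

```python
import math
import random

def count_arrivals_in_slots(rate, t, dt, rng):
    """Count arrivals in [0, t) by splitting it into slots of length dt.

    Each slot independently holds one arrival with probability rate * dt
    (assumptions 1 and 3); a slot can never hold two (assumption 2).
    """
    n_slots = round(t / dt)
    return sum(rng.random() < rate * dt for _ in range(n_slots))

rng = random.Random(0)
rate, t, dt = 5.0, 1.0, 0.001    # 5 arrivals per unit time, 1000 slots
trials = 2_000
counts = [count_arrivals_in_slots(rate, t, dt, rng) for _ in range(trials)]

for k in range(10):
    simulated = counts.count(k) / trials
    poisson = math.exp(-rate * t) * (rate * t) ** k / math.factorial(k)
    print(f"k={k}: simulated {simulated:.3f}  Poisson {poisson:.3f}")
```

The two columns should agree to within simulation noise, which is one way to see why these assumptions lead to the Poisson distribution.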

The Poisson process as a Poisson distribution

The formula for the Poisson distribution probability mass function is

Fig 5: Probability mass function(PMF) for the Poisson distribution

λ is the rate parameter, equal to the average number of events in the given time interval (it is also the variance of the distribution), and x can take the values 0, 1, 2, 3, …

Fig 6: The Poisson distribution for different values of λ
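The PMF can be evaluated directly from its formula. This sketch works in log space (`math.lgamma(x + 1)` equals log(x!)) so that large values such as λ = 2633, which appears later in this article, do not overflow; the λ values in the loop are chosen for illustration and are not necessarily those plotted in Fig 6. For each λ, the computed mean and variance both come out at λ.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam**x / x!, evaluated in log space.

    math.lgamma(x + 1) equals log(x!), so large lam or x (e.g. lam = 2633)
    do not overflow the way lam**x and x! would.
    """
    if x < 0:
        return 0.0
    if lam == 0:
        return 1.0 if x == 0 else 0.0
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

for lam in (1, 4, 10):                                  # a few illustrative shapes
    probs = [poisson_pmf(x, lam) for x in range(60)]    # tail beyond 59 is negligible
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    print(f"lambda={lam:>2}: mean {mean:.3f}, variance {var:.3f}")

print(poisson_pmf(2633, 2633))  # works in log space; the naive formula overflows here
```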

Let $X_t$ be the number of arrivals in time $t$, and let $P_x(t)$ denote the probability of $x$ arrivals in time $t$, where $x$ can take the values $0, 1, 2, 3, \ldots$. Then $X_t$ follows a Poisson distribution with parameter $\lambda t$, where $\lambda$ is the arrival rate: $P_x(t) = P(X_t = x) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}$.

Fig 7: The Poisson process as a Poisson distribution

Hence, in a Poisson process, the distribution of the number of arrivals in an interval depends only on the length of the interval (t), not on where the interval lies on the real line [1].
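This location-independence can be checked by simulation. The sketch below builds each realization from independent exponential gaps (the interarrival-time result stated later in this article), then counts arrivals in two intervals of the same length at different places. The rate of 3, the intervals [0, 2) and [7, 9), and the name `arrival_times` are illustrative assumptions.

```python
import random

def arrival_times(rate, horizon, rng):
    """Arrival times on [0, horizon) built from independent Exponential(rate) gaps."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

rng = random.Random(42)
rate, trials = 3.0, 10_000
early, late = [], []
for _ in range(trials):
    times = arrival_times(rate, horizon=10.0, rng=rng)
    early.append(sum(0.0 <= s < 2.0 for s in times))  # interval [0, 2)
    late.append(sum(7.0 <= s < 9.0 for s in times))   # interval [7, 9), same length

# Both averages should be close to rate * length = 3 * 2 = 6.
print("mean count in [0, 2):", sum(early) / trials)
print("mean count in [7, 9):", sum(late) / trials)
```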

The memoryless property of the interarrival times

In a Poisson process, each arrival is independent of the arrivals before it, so the waiting time between events is memoryless [3]. If the first earthquake happened 10 minutes after observation began, that tells us nothing about the wait until the next quake: it may turn out shorter or longer than 10 minutes.

Memoryless Property:

$P(X > x + a \mid X > a) = P(X > x)$, for $a, x \ge 0$.
A fair coin gives a familiar memoryless example: the number of tosses until the first head (a geometric random variable) is memoryless, because every toss has a 50 percent probability of heads regardless of what came before.

$P(H \mid \text{tail on the last toss}) = P(H) = 0.5$
For a real-life example, consider a machine whose failures occur at random at a constant rate, meaning it does not wear out. The probability that it fails in the next five minutes is the same whether it was just switched on or has already run without failure for three months.
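The memoryless property can be checked directly from survival functions, $P(X > x)$. This sketch compares an exponential lifetime (constant failure rate) with a uniform lifetime on [0, 10], a hypothetical machine that is certain to wear out by time 10. The rate 0.5, the bound 10, and the values a = 3, x = 2 are illustrative assumptions.

```python
import math
from functools import partial

def exp_survival(x, rate):
    """P(X > x) for X ~ Exponential(rate): failures at a constant rate."""
    return math.exp(-rate * x)

def uniform_survival(x, upper):
    """P(X > x) for X ~ Uniform(0, upper): a lifetime that wears out by `upper`."""
    return min(max((upper - x) / upper, 0.0), 1.0)

def conditional_survival(survival, x, a):
    """P(X > x + a | X > a) = P(X > x + a) / P(X > a)."""
    return survival(x + a) / survival(a)

a, x = 3.0, 2.0                                 # already survived a, ask about x more
exp_s = partial(exp_survival, rate=0.5)         # hypothetical rate
uni_s = partial(uniform_survival, upper=10.0)   # hypothetical maximum lifetime

print("Exponential:", conditional_survival(exp_s, x, a), "vs", exp_s(x))
print("Uniform:    ", conditional_survival(uni_s, x, a), "vs", uni_s(x))
```

The exponential pair matches (up to rounding), while the uniform pair does not (5/7 versus 0.8): a machine that wears out "remembers" how long it has been running.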

Interarrival or waiting time

Let $N(t)$ be a Poisson process with rate $\lambda$. Let $X_1$ be the time of the first arrival, $X_2$ the time elapsed between the first and second arrivals, and so on. Then the $X_i$ are independent and $X_i \sim \text{Exponential}(\lambda)$ for $i = 1, 2, 3, \ldots$, each with mean $1/\lambda$. To put it simply, the waiting times follow the exponential distribution and are independent of each other. (We will not go into the mathematics behind this.)

Fig 8: Interarrival times
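Without going into the mathematics, the result can at least be checked empirically. This sketch builds one long realization slot by slot from the three assumptions (no exponential random numbers are drawn), then measures the gaps between arrivals. With λ = 2 (an illustrative choice), the gaps should average about 1/λ = 0.5, and the fraction longer than 1 should be close to $e^{-2} \approx 0.135$. The 1 ms slot size and 1000-second horizon are also assumptions.

```python
import math
import random

rng = random.Random(7)
rate, dt, horizon = 2.0, 0.001, 1000.0   # 2 arrivals per second, 1 ms slots

# Build one long realization slot by slot (assumptions 1-3), without drawing
# any exponential random numbers.
n_slots = round(horizon / dt)
arrivals = [i * dt for i in range(n_slots) if rng.random() < rate * dt]

# X1 is the first arrival time; X2, X3, ... are gaps between consecutive arrivals.
gaps = [arrivals[0]] + [b - a for a, b in zip(arrivals, arrivals[1:])]

mean_gap = sum(gaps) / len(gaps)
frac_over_1 = sum(g > 1.0 for g in gaps) / len(gaps)
print(f"{len(gaps)} gaps, mean {mean_gap:.3f} (Exponential mean 1/rate = {1 / rate})")
print(f"P(gap > 1) ~ {frac_over_1:.3f} (Exponential survival e^-2 = {math.exp(-rate):.3f})")
```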
Example
Let $N(t)$ be a Poisson process with $\lambda = 2$ (arrivals per second), and let $X_1, X_2, \ldots$ be the corresponding interarrival times. Given that there were no arrivals before $t = 1$ s, find $P(X_1 > 3 \mid X_1 > 1)$, i.e., the probability that the first arrival is observed more than 3 seconds after $t = 0$.
Soln:
Using the memoryless property,
$P(X > x + a \mid X > a) = P(X > x)$, for $a, x \ge 0$,
$P(X_1 > 3 \mid X_1 > 1) = P(X_1 > 1 + 2 \mid X_1 > 1) = P(X_1 > 2)$.
Hence, the answer is the same as the probability that the waiting time exceeds 2 seconds: $P(X_1 > 2) = e^{-\lambda x} = e^{-2 \cdot 2} = e^{-4} \approx 0.0183$.
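The same answer can be reproduced numerically. This sketch computes $e^{-4}$ exactly and then runs a Monte Carlo check: it draws many first-arrival times from Exponential(2), keeps only those exceeding 1 second, and measures how often they also exceed 3 seconds. The seed and number of draws are arbitrary choices.

```python
import math
import random

rate = 2.0  # lambda = 2 arrivals per second

# Exact answer via memorylessness: P(X1 > 3 | X1 > 1) = P(X1 > 2) = e^(-rate * 2)
exact = math.exp(-rate * 2)

# Monte Carlo check: draw first-arrival times, keep only those with X1 > 1,
# and see how often they also exceed 3.
rng = random.Random(1)
draws = [rng.expovariate(rate) for _ in range(1_000_000)]
survivors = [x for x in draws if x > 1]
estimate = sum(x > 3 for x in survivors) / len(survivors)

print(f"exact    {exact:.4f}")
print(f"estimate {estimate:.4f} from {len(survivors)} draws with X1 > 1")
```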
To understand this better, consider a random variable $X$ that counts the coin tosses needed to see the first head ($X = 1$ means a head on the very first toss, lucky!). Now, $P(X > 3 \mid X > 1)$ asks: given that the first toss was not a head, what is the probability that no head appears in the first three tosses?

Soln: If you know $X > 1$, just fade that fact out of memory and shift the origin from $X = 0$ to $X = 1$ (see the images below), and the question becomes $P(X > 2) = (1/2)^2 = 1/4$. See how easily we got rid of the past observations and restarted the problem from a new point of view. This is memorylessness!
Fig 9, 10: The left figure shows the axis being shifted; the right figure shows the shifted axis, with the faded memory highlighted
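The coin version can be verified exactly with fractions. For a fair coin, $P(X > n)$ is the probability that the first n tosses are all tails, $(1/2)^n$. The sketch below evaluates both sides of the memoryless identity; the helper name `tail_prob` is hypothetical.

```python
from fractions import Fraction

p_head = Fraction(1, 2)

def tail_prob(n):
    """P(X > n): the first n tosses are all tails, so no head yet."""
    return (1 - p_head) ** n

lhs = tail_prob(3) / tail_prob(1)   # P(X > 3 | X > 1) = P(X > 3) / P(X > 1)
rhs = tail_prob(2)                  # P(X > 2)
print(lhs, rhs, lhs == rhs)         # 1/4 1/4 True
```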

COVID-19: A Poisson Process

Let us assume that the number of Covid-19 deaths that occur today is independent of the number of deaths yesterday. I will use a dataset of day-wise Covid-19 cases for the period 1st Feb 2020 to 1st May 2020, and the column ‘New deaths’ for the analysis.

import pandas as pd

# Load the day-wise Covid-19 dataset and preview the first rows
df = pd.read_csv('/content/day_wise.csv')
df.head()
Fig 11: DataFrame for the day-wise Covid-19 cases dataset
df.describe()['New deaths']
Fig 12: Descriptive statistics for ‘New deaths’ column

The average number of daily Covid-19 deaths from February 1, 2020, to May 1, 2020, was about 2633 (rounded), which is the value of λ (deaths per day). The probability of having no Covid-19 deaths in 10 days is therefore $P_0(10) = e^{-\lambda t} = e^{-2633 \times 10} = e^{-26330}$.

Fig 13: Solution
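Evaluating this number on a computer needs a little care, because $e^{-26330}$ is far below the smallest positive double and `math.exp` silently returns 0.0. A minimal sketch is to report the base-10 logarithm of the probability instead. The function name `prob_no_arrivals` is hypothetical, and λ = 2633 is the rounded mean quoted above.

```python
import math

def prob_no_arrivals(lam, t):
    """P(0 arrivals in time t) = e^(-lam * t), returned as (value, log10 of value)."""
    log_p = -lam * t                      # natural log of the probability
    return math.exp(log_p), log_p / math.log(10)

lam = 2633   # average daily deaths, Feb 1 to May 1, 2020 (rounded)
p, log10_p = prob_no_arrivals(lam, t=10)
print(p)                                  # 0.0: e^(-26330) underflows a float
print(f"P(no deaths in 10 days) = 10^({log10_p:.2f})")
```

With the DataFrame loaded earlier, `prob_no_arrivals(df['New deaths'].mean(), t=10)` would use the unrounded mean instead.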

There is essentially no chance of seeing zero Covid-19 deaths in 10 days. Follow me for more such articles!

References

  1. https://www.probabilitycourse.com/chapter11/11_1_2_basic_concepts_of_the_poisson_process.php
  2. https://nptel.ac.in/courses/111/102/111102134/
  3. https://towardsdatascience.com/the-poisson-distribution-and-poisson-process-explained-4e2cb17d459
