We currently (as of Jan, 2021) have large amounts of data concerning the Covid-19 pandemic available. We have daily updates through various news outlets and we can download data from different country’s public data websites. Aggregated data is available for viewing at Worldometers coronavirus page, and for download at Our World in Data’s Covid-19 data download page. We can certainly rely on experts to provide data and their interpretations, but sometimes it is enlightening to download the data and try to understand it ourselves. If we have Excel available, we have a very powerful tool for numerical data analysis. What follows is a simple analysis of some of the available Covid-19 data as an example of what we can do with Excel – I am not a trained epidemiologist and don’t desire to be one.

In this post we are going to look at data for a period of just over eight and one half months for the total number of cases of Covid-19 reported in the world. This is the largest number available, so hopefully it is reasonably accurate, statistically speaking. The dataset was downloaded from Our World in Data on January 19, 2021 so the data are reported up to January 18, 2021. The full dataset contains a lot of detail and everything was removed except for the summarized data for the world. The date of the first day of data reported is Jan. 22, 2020 with a  total of 557 cases.

Fig. 1 shows almost a full year of data. The initial numbers are small and it is difficult to tell how quickly things are changing on a linear scale. The divisions for the y-axis in Fig. 1 are 20 million so numbers in the thousands can’t be seen.

Fig. 2 shows the same data with a logarithmic scale for the y-axis where the divisions increase by a power of 10 with each division. Now we can see how rapidly the numbers were changing near the beginning of the pandemic. By the end of April, the graph in Fig. 2 seems to settle down and becomes almost a straight line – we will consider the range after May 1 in detail.

Fig. 3 shows just the data between May 1, 2020 and Jan. 18, 2021. This allows us  to zoom in a little bit on the behaviour of the data values. When we see a straight line on a logarithmic scale it indicates that values are increasing exponentially, meaning that there is consistent time for the number of cases to double. If the graph in Fig 3 were perfectly straight, no matter where we started, there would be a fixed time to get to double the number. We will discuss this more below.

To start with, Excel was used to provide an exponential trendline and the resulting equation is shown in Fig. 3. The general equation that Excel uses is y = A \exp{B \times x} , where the values A and B are adjusted to provide the best possible description of the data. The constant, e , is a value that appears in mathematics and physics and has a value of roughly 2.7182818. This function actually looks funny with the very small number, A = 3.31 \times 10^{-228}; it’s hard to see what this number has to do with the number of Covid-19 cases. We look at this more below.

The key to understanding the values provided by the exponential fit equation is to remember that the x-axis of Fig. 3 is a set of date values, which in Excel are numbers. In fact, Excel stores dates as the number of days from January 1, 1900 – that is January 1, 1900 is the number 1 and each day after that is numbered sequentially. For the data set we are considering, May 1, 2020 is the number 43,952, while the last date in our range, January 18, 2021 is the number 44,214. When Excel is working out the trendline equation it is using numbers between these values and we get a fit equation that is hard to interpret if we forget how Excel is storing and working with dates. The number A = 3.31 \times 10^{-228} is then the value of the function where the date value is 0. Fig. 4: World Total Covid-19 Cases May 1, 2020 to Jan. 18, 2021 – exponential trendlines

Fig. 4 shows the two segments of Covid-19 data with the exponential trendlines that Excel has fit. Note that the x-axis is now the number of days starting with May 1, 2020. This allows us to deal with smaller numbers that are easier to interpret than the numbers that Excel uses internally for dates. For us now and in what follows, May 1, 2020 corresponds to the number 1, and each day after that is incremented by 1 so January 18, 2021 corresponds to the number 263.

The data values have been split into two sections, the first from May 1, 2020 to August 8, 2020, and the second from August 9, 2020 to January 18, 2021. We’ve done this because these ranges are fairly straight lines in Fig. 4. We are going to find exponential trendlines for each of the sections of data. The equations of the trendlines are shown on the chart of Fig. 4 and we see that the coefficients A for both trendlines are a few million, which make more sense as a starting point for the number of Covid-19 cases on May 1, 2020.

Fig. 5 shows the same data values and trendlines as Fig. 4 but the y-axis is now linear instead of logarithmic which allows us to see that the data and trendlines are very close, but there is a bit of difference. The coefficients of determination, the R^2 values for both trendlines are very close to 1. This indicates that the trendlines are able to do a very good job of describing the data. Fig. 5: World Total Covid-19 Cases May 1, 2020 to Jan. 18, 2021 – linear axes

These functions are mathematical models that describe the behaviour of the number of Covid-19 cases. With the trendline equations we can make predictions about future behaviour of the number of cases – this is a simple predictive model. Before we go on to making predictions, look at Figs. 6 and 7 where we show the two data ranges with linear scales for the y-axis.

The scale of Figs. 6 and 7 is large enough that we can see how well the trendlines describe the data values. The value R^2 is called the coefficient of determination and provides a numerical measure of how well the trendline describes the data values. For our case, R^2 is very close to 1 so our trend lines are doing very well.

The trendline equation from Fig. 7: y = 7.58 \times 10^{6} e^{9.83 \times 10^{-3} x } , provides a good description of the Covid-19 data from August 9, 2020 to January 18, 2021, a period of just over 5 months or 23 weeks. A key property of exponential increase is that there is a consistent time period T_2 for the number of cases to double. Once we find that period, no matter where we start, we will find twice as many cases after a time T_2 . We are going to forgo showing the math here and simply state that the doubling time is T_2 = 70 days. For over 5 months the number of Covid-19 cases in the world has been doubling every 10 weeks.

Extrapolation can be dangerous, particularly with a simple model, but it can also be very instructive. It took about a year for the number of Covid-19 cases in the world to reach one hundred million. That number was reached on January 25, 2021 according to the WorldOMeters coronavirus page. If the number continues to increase at the rate it has been for the last five months, we will reach two hundred million on April 5, 2021. The growth of the number of active cases in this time frame could present serious problems for many health care systems.

For the sake of argument; let’s assume that the growth of the number of cases continues at the same rate for the next year. We will then see the number of cases for Covid-19 in the world increase as shown in Fig. 8. The number of cases increases rapidly and we show the dates that the model predicts:

After the number of cases have doubled six times, over 80% of the people on the planet will have been exposed to Covid-19 – this would happen in just over a year! Let’s say that again: it took about a year for the total number of Covid-19 cases in the world to reach 100 million. At the rate of growth that we’ve seen since Aug. 9, 2020 80% of the world will develop Covid-19 by March 21, 2022, slightly over a year after we reached one hundred million cases. This rate of growth is a huge assumption of course: we have politicians setting policies designed to protect us and the promise of vaccines coming soon. Of course, we also face the threat of new strains of the Covid-19 virus which may behave differently from the original so it is not going to be easy to make decisions on how we should behave to protect each other.

An important point is that we aren’t going to just see problems after the number of cases doubles 6 times. In 10 weeks, after a single doubling, if the total number of cases is 200,000,000 we will see countries whose medical systems are swamped and facing shortages of PPE, oxygen, nurses, doctors and beds in ICUs.

Excel is a very powerful tool which allows us to explore and understand the data available to us whenever we choose. This is a simple predictive model of the total number of Covid-19 cases in the world, but does not take the place of complex models in the hands of trained epidemiologists.