A brief introduction to time series analysis

Written by Renato Henriques, PhD | 2020

 

John works in the sales department of his company. On a rainy day in late November, his boss asks him to estimate the company’s revenues for the following month. At his company, sales are calculated on a monthly basis. So he starts by looking at the company’s revenues for the previous months and sees that they are increasing month over month. Then John looks at December sales from previous years and sees that they were higher than usual, potentially because of the holiday season, so he anticipates that this year’s December sales will also be high. But how high? And how far will his estimate be from actual December sales? John cannot answer these questions without the help of time series analysis.

The example above illustrates a classic use of time series analysis. In a nutshell, a time series is a set of repeated measurements taken sequentially over time. The main purpose of time series analysis is to predict the future of a certain process (e.g., sales) based on what has happened in the past (sales history). For instance, in the financial market, traders want to predict what the price of a certain stock will be in the upcoming trading days based on previous prices.

But time series analysis is not limited to business or financial matters. One of the best-known time series analyses, for example, is the study of climate change, in which data on global temperatures are collected annually by climate scientists to predict future global temperatures. Utilities that sell electricity or water use time series analysis to predict how much electricity or water they will need to produce in the future to meet demand, based on past consumer behavior. To avoid flying with half-empty planes, airlines collect information on the number of passengers per month to predict how many passengers they will need to accommodate on future dates. This information is then used to plan the minimum number of planes they will need to meet demand, as aircraft are notoriously expensive to operate.

Before conducting any type of analysis on time series data, we must first assess some of its properties. Understanding the temporal structure of a time series will help us choose the right model to run.

 

 

Breakdown of Time Series

 

A time series can be broken down into a series of components that describe its structure.

  1. Trend is an overall increase or decrease in the series over a relatively long period of time.
  2. Cyclicality describes rises and falls in the series that do not follow a regular pattern. A stock market exhibits a great deal of cyclicality because it tends to have periods of high values and periods of low values. However, the transition from one condition to another does not follow a regular pattern. Note: Since it is difficult to estimate the trend and cycle components separately, they are grouped into the same component named trend-cycle (not very original, I know!). In fact, this component is generally referred to as the trend. But keep in mind that any cyclicality in your time series will be included in the trend component.
  3. Seasonality is the persistence of variations that occur periodically at specific regular intervals. For example, air conditioner sales are seasonal and tend to be higher in the summer months and lower in the winter months.
  4. Residual is essentially what remains after accounting for trend and seasonality. It includes everything from measurement errors to unpredictable factors.
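
To make these components concrete, here is a minimal Python sketch of a classical additive decomposition, in which each observation is assumed to be trend + seasonality + residual. The additive form, the centered moving average used to estimate the trend, the function names and the synthetic quarterly numbers are all assumptions made for illustration; none of them comes from the gas data discussed in this post.

```python
def centered_moving_average(values, period):
    """Trend estimate for an even period (a 2 x period moving average)."""
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        # The two end points get half weight so the window covers one full period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def decompose_additive(values, period):
    """Split a series into trend, seasonal and residual parts (additive model)."""
    trend = centered_moving_average(values, period)
    # Average the detrended values separately for each position in the period.
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(values, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # force the seasonal effects to sum to zero
    pattern = [m - offset for m in means]
    seasonal = [pattern[t % period] for t in range(len(values))]
    residual = [None if tr is None else y - tr - s
                for y, tr, s in zip(values, trend, seasonal)]
    return trend, seasonal, residual


# Synthetic quarterly data (made up): a linear trend plus a repeating pattern.
true_pattern = [3.0, -1.0, -4.0, 2.0]
series = [10.0 + 2.0 * t + true_pattern[t % 4] for t in range(24)]

trend, seasonal, residual = decompose_additive(series, period=4)
print(trend[2:6])
print(seasonal[:4])
```

On this noise-free toy series the estimated trend follows the underlying line and the seasonal part matches the repeating pattern, so the residual is essentially zero. Real data will leave a non-trivial residual, and the first and last observations have no trend estimate because the moving average window does not fit there.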

 

 

Examples

 

Let’s look at a time series describing quarterly gas consumption in the UK from 1960 to 1986, in millions of therms.

 

 

In the case above, there is a clear upward trend in gas consumption over the years. There appears to be no cyclicality in this series.

Furthermore, we can see that gas consumption is at its lowest during the third quarter of each year. This is logical because these are the summer months when heating is not needed. On the other hand, the highest consumption of the year is in the first quarter, which is winter. The second quarter (spring) and the fourth quarter (fall) fall between the first and third quarters. This is the seasonal component.

But this is not all. There are two other aspects that should be considered in time series analysis: autocorrelation and stationarity.

Autocorrelation indicates whether there is a correlation (i.e., similarity) between observations in the time series at certain time lags. To check the autocorrelation structure of a time series, we can use an autocorrelation function, commonly known as ACF.

 

As we can see in the first figure, there is a strong correlation at lags that are multiples of 4 time steps, which is due to seasonality. In other words, the gas consumption of the 1st quarter (Q1) of a given year will be similar to the gas consumption of the 1st quarter of the previous year. However, this correlation fades after about 40 lags, as shown in the second figure. This means that the 1st quarter value of one year will not be similar to the 1st quarter value of 10 years earlier. This is due to the positive growth trend we see in the original time series.
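
As a rough illustration of what an ACF computes, here is a minimal Python sketch of the sample autocorrelation at a given lag. The function name and the synthetic series (the same four quarterly values repeated for ten years, with no trend) are assumptions made for this example; they are not the gas data.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum((values[t] - mean) * (values[t + lag] - mean)
                    for t in range(n - lag))
    return numerator / denominator


# Synthetic quarterly series (made up): the same four values repeated for 10 years.
quarterly_pattern = [3.0, -1.0, -4.0, 2.0]
series = quarterly_pattern * 10

acf = [autocorrelation(series, lag) for lag in range(9)]
for lag, value in enumerate(acf):
    print(lag, round(value, 3))
```

In this toy series the autocorrelation peaks at lags 4 and 8, which is the seasonal signature described above. Because the toy series has no trend, it does not reproduce the fading caused by growth in the gas series.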

Stationarity indicates that the mean, variance and autocorrelation structure of the series are constant over time. This is clearly not the case for the time series presented above. The positive trend shows that the mean of the series is increasing over time. We can also observe that the variance of the series is increasing over time. Between 1960 and 1970, the gap between low and high consumption quarters hovers around 100-200 million therms. In the following decade (1970-1980), volatility increases sharply. For example, in 1980, about 217 million therms of gas were consumed during the low consumption quarter (Q3), while over 840 million therms were consumed during the high consumption quarter (Q1).
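
One informal way to spot this kind of non-stationarity is to summarize the series year by year and see whether the level and the spread drift. The sketch below is only an illustration: the helper name and the synthetic series (a rising level with a seasonal swing that grows each year) are assumptions, not the gas data.

```python
def yearly_summary(values, period):
    """Mean and high-low spread for each consecutive block of `period` values."""
    summary = []
    for start in range(0, len(values) - period + 1, period):
        year = values[start:start + period]
        summary.append((sum(year) / period, max(year) - min(year)))
    return summary


# Synthetic quarterly series (made up): the level rises steadily and the
# seasonal swing gets larger every year.
quarterly_pattern = [3.0, -1.0, -4.0, 2.0]
series = [100.0 + 5.0 * t + (1 + t // 4) * quarterly_pattern[t % 4]
          for t in range(24)]

summary = yearly_summary(series, period=4)
for year, (mean, spread) in enumerate(summary):
    print(year, round(mean, 1), round(spread, 1))
```

If both columns keep growing from one year to the next, as they do here, neither the mean nor the variance is constant, which is what the gas series shows visually.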

 

Step-by-step Time Series Decomposition

 

 

The first step is to remove the trend (the positive growth over time). One way to do this is to calculate the difference between each timestep and the previous one, a technique known as differencing. This way, the series will fluctuate around a roughly constant level close to 0 and the positive trend will be removed. So let’s do that and plot the series again.
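
The differencing step can be sketched in a few lines of Python. The function name and the synthetic series (a linear trend plus a repeating quarterly pattern) are assumptions made for illustration, not the gas data.

```python
def difference(values, lag=1):
    """Return the change between each observation and the one `lag` steps before."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


# Synthetic quarterly series (made up): linear trend plus a repeating pattern.
quarterly_pattern = [3.0, -1.0, -4.0, 2.0]
series = [10.0 + 2.0 * t + quarterly_pattern[t % 4] for t in range(24)]

detrended = difference(series)
print(detrended[:8])
print(sum(detrended) / len(detrended))
```

Note that the differenced series is one observation shorter than the original, and that its average equals the average step of the original series, so it sits near 0 only when the trend is small relative to the seasonal swings.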

 

Now that we have removed the trend, the second step is to deal with the increasing variance (or volatility) over time. We can see that the variance increases each year. One way to deal with this is to calculate the standard deviation of the detrended gas consumption for each year, and then divide the detrended gas consumption for each quarter by the standard deviation of its year.
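
Here is a minimal Python sketch of that scaling step. It assumes the series starts at the beginning of a year, so that every consecutive block of four values is one year, and it uses the sample standard deviation; both choices, like the function name and the synthetic numbers, are assumptions for illustration.

```python
from statistics import stdev


def scale_by_block_std(values, block_size):
    """Divide each value by the standard deviation of its block (e.g. its year)."""
    scaled = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        spread = stdev(block) if len(block) > 1 else 0.0
        # Leave a block unchanged if its spread is zero or cannot be computed.
        scaled.extend(v / spread if spread > 0 else v for v in block)
    return scaled


# Synthetic detrended series (made up): the same quarterly pattern, but its
# amplitude grows by one unit every year.
quarterly_pattern = [3.0, -1.0, -4.0, 2.0]
detrended = [(1 + t // 4) * quarterly_pattern[t % 4] for t in range(20)]

scaled = scale_by_block_std(detrended, block_size=4)
for start in range(0, len(scaled), 4):
    print([round(v, 3) for v in scaled[start:start + 4]])
```

After scaling, every year in this toy series has the same spread, so the growing amplitude is gone while the seasonal shape is still there.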

 

Great, now the mean is centered around 0, and the variance is constant over time. However, we still have the seasonal pattern. To fix this, we’ll do the following: instead of taking the standard deviation of each year, we’ll take the average of each quarter. So we will calculate the average value of all Q1s, all Q2s, all Q3s and all Q4s. As mentioned earlier when discussing the seasonal pattern, the demand for gas is higher during the winter months (Q1), so each Q1 value should be close to the average of all Q1 values. After computing the averages, we subtract the corresponding quarterly average from the value of each quarter. This should remove seasonality from the series. Note that this step is often not necessary, as many models allow seasonality to be incorporated in their specification.
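
The seasonal adjustment described here can be sketched as follows. The function name and the synthetic series (a fixed quarterly pattern plus a small deterministic wiggle standing in for the residual) are assumptions for illustration, and the sketch assumes the first value is a Q1 observation.

```python
def remove_seasonal_means(values, period):
    """Subtract from each value the average of all values at the same position."""
    means = []
    for position in range(period):
        same_position = values[position::period]
        means.append(sum(same_position) / len(same_position))
    return [v - means[t % period] for t, v in enumerate(values)]


# Synthetic series (made up): a quarterly pattern plus a small wiggle.
quarterly_pattern = [3.0, -1.0, -4.0, 2.0]
series = [quarterly_pattern[t % 4] + 0.1 * ((t * 7) % 5 - 2) for t in range(20)]

deseasonalized = remove_seasonal_means(series, period=4)
print([round(v, 2) for v in deseasonalized[:8]])
```

What remains after the subtraction is the small wiggle only: the average of each quarter is now zero, so the repeating pattern no longer shows up.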

 

That’s it! In summary, to get this last time series, we removed the positive trend, made sure that its variance is constant over time, and also removed the seasonal component.

We have now cleaned up our time series and visually it looks more stationary than the original series. You can also perform a formal test to see if your time series is truly stationary, such as the Dickey-Fuller test.
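
To give an idea of what such a test does, here is a minimal Python sketch of the basic (non-augmented) Dickey-Fuller regression with a constant: the change in the series is regressed on its previous value, and the t-statistic of the slope is compared with a critical value. The function name, the synthetic data and the use of the approximate large-sample 5% critical value of -2.86 are assumptions made for illustration. In practice you would rely on a statistics library, which typically implements the augmented version of the test and reports p-values.

```python
import math
import random


def dickey_fuller_statistic(values):
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_(t-1) + error."""
    lagged = values[:-1]
    diffs = [values[t] - values[t - 1] for t in range(1, len(values))]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    gamma = sxy / sxx
    alpha = mean_d - gamma * mean_x
    ssr = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, diffs))
    standard_error = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / standard_error


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]  # stationary by construction
walk = []
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)  # random walk: non-stationary (it has a unit root)

# Approximate large-sample 5% critical value for the version with a constant.
CRITICAL_5_PERCENT = -2.86

stat_noise = dickey_fuller_statistic(noise)
stat_walk = dickey_fuller_statistic(walk)
print("white noise:", round(stat_noise, 2), stat_noise < CRITICAL_5_PERCENT)
print("random walk:", round(stat_walk, 2), stat_walk < CRITICAL_5_PERCENT)
```

A statistic below the critical value rejects the hypothesis of a unit root, which supports stationarity. The white noise series produces a strongly negative statistic, while a random walk typically does not.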

Stationarity is important because, in its absence, a model describing the data will vary in accuracy at different times. With a stationary series in hand, we can proceed to model and forecast it. In a future blog post, I will describe the most commonly used time series models.

 
