Time Series Forecasting Using LSTM/ARIMA/Moving Average (Single/Multi-variate Use Cases) with Code

Mar 21, 2021

Steps:

Intro
Traditional methods (Moving Average/ARIMA/Regression/kNN/Prophet)
Advanced methods with LSTM (univariate single-step, multivariate single-step, and multivariate multi-step time series forecasting)

Time series forecasting is an important area of machine learning that is often neglected. It is important because there are so many prediction problems that involve a time component. These problems are neglected because it is this time component that makes time series problems more difficult to handle.

Stock price prediction involves many factors: physical vs. psychological factors, rational and irrational behavior, and so on. Together, these make share prices volatile and very difficult to predict with a high degree of accuracy. Using features such as the latest announcements about an organization or its quarterly revenue results, machine learning techniques have the potential to unearth patterns and insights we didn't see before, and these can be used to make more accurate predictions.

Let's read the data we will use to predict the share price. Find the script and data here.

There are multiple variables in the dataset — date, open, high, low, last, close, total_trade_quantity, and turnover.

The columns Open and Close represent the starting and final price at which the stock is traded on a particular day.
High, Low and Last represent the maximum, minimum, and last price of the share for the day.
Total Trade Quantity is the number of shares bought or sold in the day, and Turnover (Lacs) is the turnover of the particular company on a given date.

Note: the market is closed on weekends and public holidays, which is why some date values are missing from the table above.

The profit or loss calculation is usually determined by the closing price of a stock for the day, hence we will consider the closing price as the target variable.
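A minimal sketch of reading the data, assuming a CSV with Date and Close columns like the one described above. The helper names load_close_series and missing_business_days are hypothetical, and the tiny inline sample uses made-up prices so the block runs on its own; it also shows how weekday gaps (holidays) can be detected.

```python
import io

import pandas as pd

# Made-up sample in the same layout (newest first); 2021-01-06 is "missing".
SAMPLE_CSV = """Date,Open,High,Low,Last,Close
2021-01-11,103.0,104.0,102.0,103.5,103.4
2021-01-08,102.0,103.0,101.0,102.5,102.6
2021-01-07,101.0,102.5,100.5,101.8,101.9
2021-01-05,100.5,101.5,100.0,101.0,101.2
2021-01-04,99.5,100.5,99.0,100.1,100.0
"""


def load_close_series(source) -> pd.Series:
    """Read the CSV and return the Close prices indexed by date, oldest first."""
    df = pd.read_csv(source, parse_dates=["Date"])
    df = df.sort_values("Date")
    return df.set_index("Date")["Close"]


def missing_business_days(close: pd.Series) -> pd.DatetimeIndex:
    """Weekdays between the first and last date that have no row (e.g. holidays)."""
    all_weekdays = pd.bdate_range(close.index.min(), close.index.max())
    return all_weekdays.difference(close.index)


close = load_close_series(io.StringIO(SAMPLE_CSV))
print(missing_business_days(close))
```

Weekends never appear in the gap list because pd.bdate_range only generates Monday to Friday; any remaining gaps are exchange holidays.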

Moving Average:

The predicted closing price for each day will be the average of a set of previously observed values. Instead of using the simple average, we will be using the moving average technique which uses the latest set of values for each prediction. In other words, for each subsequent step, the predicted values are taken into consideration while removing the oldest observed value from the set. Here is a simple figure that will help you understand this with more clarity.

The first step is to create a dataframe that contains only the Date and Close price columns, then split it into train and validation sets to verify our predictions.
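The rolling forecast described above can be sketched as follows, assuming train holds the training Close prices in time order. Each new prediction is appended to the window and the oldest value drops out, so later forecasts are averages that include earlier forecasts. The function names and the rmse helper are illustrative, not the author's code.

```python
import numpy as np


def recursive_moving_average(train, horizon: int, window: int) -> np.ndarray:
    """Forecast `horizon` steps, each as the mean of the latest `window` values.

    After each step the prediction joins the window and the oldest value
    leaves it, so predictions are fed back in as if they were observations.
    """
    history = list(np.asarray(train, dtype=float)[-window:])
    preds = []
    for _ in range(horizon):
        next_value = float(np.mean(history[-window:]))
        preds.append(next_value)
        history.append(next_value)
    return np.array(preds)


def rmse(actual, predicted) -> float:
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

With a time-ordered split, call recursive_moving_average(train_values, horizon=len(valid), window=k) and compare the result with the validation prices using rmse; the window size k is a free choice in this sketch.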

The RMSE alone does not tell us much about how the model performed, so here is a plot of the predicted values along with the actual values.

The RMSE value is close to 105, but the results are not very promising (as you can gather from the plot). The predicted values are in the same range as the observed values in the train set (there is an increasing trend initially and then a slow decrease).

In the next section, we will look at two commonly used machine learning techniques — Linear Regression and kNN, and see how they perform on our stock market data.

Linear Regression:

The most basic machine learning algorithm that can be implemented on this data is linear regression. However, we do not have a set of independent variables; we only have the dates. Let us use the date column to extract features such as day, month, year, and Monday/Friday, and then fit a linear regression model.

We will first sort the dataset in ascending order and then create a separate dataset so that any new feature created does not affect the original data.

This creates features such as:

‘Year’, ‘Month’, ‘Week’, ‘Day’, ‘Dayofweek’, ‘Dayofyear’, ‘Is_month_end’, ‘Is_month_start’, ‘Is_quarter_end’, ‘Is_quarter_start’, ‘Is_year_end’, and ‘Is_year_start’.
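A pandas-only sketch of creating these date features on a copy of the data, so the original frame is not modified. The column names follow the list above; add_date_features is a hypothetical helper, and each value comes from pandas' dt accessor (Week is the ISO week number).

```python
import pandas as pd


def add_date_features(df: pd.DataFrame, date_col: str = "Date") -> pd.DataFrame:
    """Return a copy of df with calendar features derived from date_col."""
    out = df.copy()  # keep the original data untouched
    d = pd.to_datetime(out[date_col])
    out["Year"] = d.dt.year
    out["Month"] = d.dt.month
    out["Week"] = d.dt.isocalendar().week.astype(int)
    out["Day"] = d.dt.day
    out["Dayofweek"] = d.dt.dayofweek  # Monday=0 ... Sunday=6
    out["Dayofyear"] = d.dt.dayofyear
    # Boolean calendar flags, e.g. "is_month_end" -> column "Is_month_end"
    for flag in ["is_month_end", "is_month_start", "is_quarter_end",
                 "is_quarter_start", "is_year_end", "is_year_start"]:
        out[flag.capitalize()] = getattr(d.dt, flag)
    return out
```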

Apart from this, we can add our own set of features that we believe would be relevant for the predictions. For instance, my hypothesis is that the first and last days of the week could potentially affect the closing price of the stock far more than the other days. So I have created a feature that identifies whether a given day is Monday/Friday or Tuesday/Wednesday/Thursday.

If the day of the week is 0 (Monday) or 4 (Friday), the column value will be 1; otherwise it is 0. Similarly, you can create other features if you have ideas that could help predict the stock price.
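A sketch of the Monday/Friday flag plus a linear regression fit, assuming scikit-learn. build_features and fit_and_score are hypothetical helpers; only a few of the date features are rebuilt here to keep the block self-contained, and the train/validation split must be time-ordered (no shuffling).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def build_features(dates) -> pd.DataFrame:
    """A few calendar features plus the Monday/Friday flag."""
    d = pd.to_datetime(pd.Series(dates))
    X = pd.DataFrame({
        "Year": d.dt.year,
        "Month": d.dt.month,
        "Day": d.dt.day,
        "Dayofweek": d.dt.dayofweek,  # Monday=0 ... Sunday=6
    })
    # 1 if Monday (0) or Friday (4), otherwise 0
    X["mon_fri"] = X["Dayofweek"].isin([0, 4]).astype(int)
    return X


def fit_and_score(X_train, y_train, X_valid, y_valid):
    """Fit on the earlier rows and report RMSE on the later rows."""
    model = LinearRegression().fit(X_train, y_train)
    preds = model.predict(X_valid)
    rmse = float(np.sqrt(np.mean((np.asarray(y_valid) - preds) ** 2)))
    return model, preds, rmse
```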

The RMSE value is higher than that of the previous technique, which clearly shows that linear regression has performed poorly. Let's look at the plot and understand why linear regression has not done well:

One disadvantage of using regression algorithms here is that the model overfits to the date and month columns. Instead of taking into account the previous values from the point of prediction, the model will consider the value from the same date a month ago, or the same date/month a year ago.

k-Nearest Neighbours

Another interesting ML algorithm that one can use here is kNN (k nearest neighbours). Based on the independent variables, kNN finds the similarity between new data points and old data points.
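Because kNN compares points by distance, the features should be on a comparable scale. A sketch assuming scikit-learn: scale with MinMaxScaler inside a pipeline and choose k with a grid search. Using TimeSeriesSplit for the folds is our choice, so that each validation fold comes after its training fold; fit_knn is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def fit_knn(X_train, y_train, max_k: int = 9):
    """Pick n_neighbors in [2, max_k] by time-ordered cross-validation."""
    # The scaler is fitted inside each fold, so no future data leaks into it.
    pipe = make_pipeline(MinMaxScaler(), KNeighborsRegressor())
    grid = {"kneighborsregressor__n_neighbors": list(range(2, max_k + 1))}
    search = GridSearchCV(pipe, grid, cv=TimeSeriesSplit(n_splits=3),
                          scoring="neg_root_mean_squared_error")
    search.fit(X_train, y_train)
    return search.best_estimator_
```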

There is not a huge difference in the RMSE value, but a plot of the predicted and actual values should provide a clearer understanding.

The RMSE value is similar to that of the linear regression model, and the plot shows the same pattern. Like linear regression, kNN also identified a drop in January 2018, since that has been the pattern in past years. We can safely say that regression algorithms have not performed well on this dataset.

Time series forecasting techniques

Auto ARIMA

ARIMA is a very popular statistical method for time series forecasting. ARIMA models take into account the past values to predict the future values. There are three important parameters in ARIMA:

p (past values used for forecasting the next value)
q (past forecast errors used to predict the future values)
d (order of differencing)

Parameter tuning for ARIMA consumes a lot of time, so we will use auto ARIMA, which automatically searches for the combination of (p, d, q) that best fits the data according to an information criterion such as AIC. To read more about how auto ARIMA works, refer to this article.

https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r/

As we saw earlier, an auto ARIMA model uses past data to understand the pattern in the time series. Using these values, the model captured an increasing trend in the series. Although the predictions using this technique are far better than that of the previously implemented machine learning models, these predictions are still not close to the real values.

As is evident from the plot, the model has captured a trend in the series but does not focus on the seasonal part. In the next section, we will implement a time series model that takes both the trend and the seasonality of a series into account.

Prophet

There are a number of time series techniques that can be implemented on the stock prediction dataset, but most of these techniques require a lot of data preprocessing before fitting the model. Prophet, designed and pioneered by Facebook, is a time series forecasting library that requires very little data preprocessing and is extremely simple to implement. The input for Prophet is a dataframe with two columns: date and target (ds and y).
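Building the two-column input is essentially the only reshaping Prophet needs here. A sketch, assuming the stock frame has Date and Close columns; to_prophet_frame is a hypothetical helper.

```python
import pandas as pd


def to_prophet_frame(df: pd.DataFrame, date_col: str = "Date",
                     target_col: str = "Close") -> pd.DataFrame:
    """Rename to the two columns Prophet expects: ds (datestamp) and y (target)."""
    out = df[[date_col, target_col]].rename(columns={date_col: "ds", target_col: "y"})
    out["ds"] = pd.to_datetime(out["ds"])
    return out.sort_values("ds").reset_index(drop=True)
```

With the prophet package installed (published as fbprophet before version 1.0), the frame can be passed to Prophet().fit(frame); make_future_dataframe(periods=n) and predict(...) then return a forecast whose yhat column holds the predictions.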

Prophet tries to capture the seasonality in the past data and works well when the dataset is large.

https://www.analyticsvidhya.com/blog/2018/05/generate-accurate-forecasts-facebook-prophet-python-r/

Prophet (like most time series forecasting techniques) tries to capture the trend and seasonality from past data. This model usually performs well on time series datasets, but fails to live up to its reputation in this case.

As it turns out, stock prices do not have a particular trend or seasonality. They depend heavily on what is currently going on in the market, and prices rise and fall accordingly. Hence, forecasting techniques like ARIMA, SARIMA, and Prophet would not show good results for this particular problem.

Long Short Term Memory (LSTM):

LSTMs are widely used for sequence prediction problems and have proven to be very effective. They work well because an LSTM can store past information that is important and forget information that is not. An LSTM has three gates:

The input gate: adds information to the cell state
The forget gate: removes information that is no longer required by the model
The output gate: selects which information from the cell state is shown as output
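To make the three gates concrete, here is one LSTM time step in NumPy, assuming the standard formulation with sigmoid gates and a tanh candidate. Stacking the four weight blocks in the order [input, forget, output, candidate] is a choice made for this sketch; frameworks such as Keras store them in their own order.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) bias,
    stacked as [input, forget, output, candidate] blocks of H rows each.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much of the old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * H:])       # candidate values to add to the cell state
    c_t = f * c_prev + i * g     # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)       # new hidden state / output
    return h_t, c_t
```

With all weights at zero every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved; that makes a handy sanity check.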

The LSTM model can be tuned for various parameters such as changing the number of LSTM layers, adding dropout value or increasing the number of epochs. Stock price is affected by the news about the company and other factors like demonetization or merger/demerger of the companies.

Let's try different use cases with LSTM, comparing each with the baseline model “Moving window average”:

Univariate time series forecasting
Multivariate & single-step forecasting (y_i is a scalar)
Multivariate & multi-step forecasting (y_i is a vector)

Time series forecasting means predicting future values of a dependent variable (y) based on past values of independent variables (x). This article also aims to make you comfortable reading TensorFlow 2.0 code.

Components of Time Series

Time series analysis provides a body of techniques to better understand a dataset.

Perhaps the most useful of these is the decomposition of a time series into 4 constituent parts:

Level. The baseline value for the series if it were a straight line.
Trend. The optional and often linear increasing or decreasing behavior of the series over time.
Seasonality. The optional repeating patterns or cycles of behavior over time.
Noise. The optional variability in the observations that cannot be explained by the model.

Let's start with code

Reading data and preprocessing

There are multiple variables in the dataset — ‘Date Time’, ‘p (mbar)’, ‘T (degC)’, ‘Tpot (K)’, ‘Tdew (degC)’, ‘rh (%)’, ‘VPmax (mbar)’, ‘VPact (mbar)’, ‘VPdef (mbar)’, ‘sh (g/kg)’, ‘H2OC (mmol/mol)’, ‘rho (g/m**3)’, ‘wv (m/s)’, ‘max. wv (m/s)’, ‘wd (deg)’

Here, the observations are recorded as follows: one reading every 10 minutes, so 1 day = 6*24 = 144 readings and 5 days = 144*5 = 720 readings.

Moving window average

Given the last ‘k’ values of the temperature observations (only one feature, i.e. univariate), predict the next observation. In other words, average the previous k values to predict the next value.

‘Average’ is easily one of the most common things we use in our day-to-day lives. For instance, finding the average temperature of the past few days to get an idea about today’s temperature.

The predicted temp will be the average of a set of previously observed values. Instead of using a simple average over the whole history, we use a moving average, which uses only the latest set of values for each prediction. In other words, for each subsequent step the window slides forward: the newest observation is added and the oldest observed value is removed from the set. Here is a simple figure that will help you understand this with more clarity.

We will implement this technique on our dataset. The first step is to create a dataframe that contains only the Date and temp, then split it into train and validation sets to verify our predictions.

Forecasting task: Predict temperature (in deg C) in the future.

Next, we create a dataframe for prediction with time as the index: given historical temperatures, predict future temperatures. We split based on time: the first 300k observations are used for training, and everything after that is used for testing. At every point, we take the previous k values and predict the next value, then repeat this for the next window.
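The windowing step described above can be sketched as a NumPy helper, similar in spirit to the one in TensorFlow's time-series tutorial; the name univariate_windows is ours. With history_size=20 and target_size=0, each window holds 20 past readings and the target is the very next reading.

```python
import numpy as np


def univariate_windows(series, start, end, history_size, target_size):
    """Slice a 1-D series into (window, target) pairs.

    Each window holds `history_size` consecutive values shaped (history_size, 1);
    the target is the value `target_size` steps after the window (0 = next value).
    """
    series = np.asarray(series, dtype=float)
    start = start + history_size
    if end is None:
        end = len(series) - target_size
    windows, targets = [], []
    for i in range(start, end):
        windows.append(series[i - history_size:i].reshape(history_size, 1))
        targets.append(series[i + target_size])
    return np.array(windows), np.array(targets)
```

Assuming temp holds the temperature column as an array and TRAIN_SPLIT = 300000, univariate_windows(temp, 0, TRAIN_SPLIT, 20, 0) yields x_train_uni and y_train_uni, and univariate_windows(temp, TRAIN_SPLIT, None, 20, 0) yields the validation pairs, so no window crosses the split.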

print('Single window of past history')
print(x_train_uni[0])
print('\n Target temperature to predict')
print(y_train_uni[0])

Let’s visualize this to get a more intuitive understanding. So here is a plot of the predicted values along with the actual values.

The model's prediction (the green value in the plot) is the average of all the values in the window, and here that average is far from the actual value.
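The moving-window baseline then reduces to taking the mean of each window. A sketch, assuming windows shaped like x_train_uni above; mean absolute error is used so the number is comparable with the MAE loss of the LSTM below. Both function names are hypothetical.

```python
import numpy as np


def moving_window_baseline(windows: np.ndarray) -> np.ndarray:
    """Predict each target as the mean of its own window of past values."""
    return windows.reshape(len(windows), -1).mean(axis=1)


def mae(actual, predicted) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))
```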

Univariate time-series forecasting(Univariate LSTM):

Features from the history: only temperature => univariate
Problem definition: Given the last k = 20 values of temp, predict the next temp value.

Here, we set up our LSTM model architecture by choosing the optimizer (and its learning rate) as well as the number of layers in the network. The neural network consists of one LSTM layer with 8 units, followed by a Dense layer whose size depends on how many future values we want to forecast.

During training, steps_per_epoch is set to 200, and each step takes one batch of 256 data points, so an epoch covers 200 × 256 = 51,200 windows. That is only a fraction of the roughly 300k training windows, so an epoch here is defined by steps_per_epoch rather than by one full pass over the data. Training runs for 10 epochs.

The LSTM layer has 8 units (its hidden state has 8 dimensions). It reads the input window step by step, and its final output is connected to a single Dense unit that gives the prediction. The loss used here is mean absolute error (MAE), and the model is compiled with the Adam optimizer.

Each epoch consists of 200 batches. You can also extend the architecture by stacking multiple LSTM layers.
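A sketch of this setup in tf.keras, assuming TensorFlow 2.x. The batch size, steps per epoch, and epoch count come from the text; the shuffle buffer size and the helper names are assumptions. repeat() makes the dataset endless, so steps_per_epoch is what decides where an epoch ends.

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 256
BUFFER_SIZE = 10_000  # shuffle buffer size (assumed value)
STEPS_PER_EPOCH = 200
EPOCHS = 10


def make_dataset(x, y, shuffle: bool) -> tf.data.Dataset:
    """Batch (window, target) pairs; repeat() so the stream never runs out."""
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    if shuffle:
        ds = ds.cache().shuffle(BUFFER_SIZE)
    return ds.batch(BATCH_SIZE).repeat()


def build_univariate_lstm(history_size: int) -> tf.keras.Model:
    """One LSTM layer with 8 units followed by a single-output Dense layer."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(history_size, 1)),
        tf.keras.layers.LSTM(8),   # hidden state of size 8
        tf.keras.layers.Dense(1),  # one output: the next temperature value
    ])
    model.compile(optimizer="adam", loss="mae")
    return model
```

Training would then look like model.fit(make_dataset(x_train_uni, y_train_uni, True), epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH, validation_data=make_dataset(x_val_uni, y_val_uni, False), validation_steps=50). validation_steps has to be given because the repeated validation dataset never ends; 50 is an arbitrary choice here.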

The training loss decreased from 0.7541 to 0.0228 after 10 epochs, and the validation loss from 0.19 to 0.0170.

Errors still occur where the series does not follow a clear pattern; the model performs best when the future resembles patterns in the historical data. You can try other model architectures as well.

Multi-variate & single-step forecasting (y_i is a scalar):

Multivariate forecasting simply means predicting the dependent variable (y) based on more than one independent variable (x).

Problem definition: Given three features (p, T, rho) at each timestamp in the past, predict the temperature at a single time-stamp in the future.

Here, each feature is treated as a time series alongside the variable to be predicted. Let's select three features from all the variables.
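A sketch of selecting the three features. It also standardizes them with the mean and standard deviation of the training portion only. That scaling step is an assumption not stated in the text, but it puts features with very different units (mbar, degC, g/m**3) on a similar scale without leaking validation statistics. select_and_standardize is a hypothetical helper.

```python
import numpy as np
import pandas as pd

FEATURES = ["p (mbar)", "T (degC)", "rho (g/m**3)"]


def select_and_standardize(df: pd.DataFrame, train_split: int):
    """Keep the three features; scale them using training-set statistics only."""
    data = df[FEATURES].to_numpy(dtype=float)
    mean = data[:train_split].mean(axis=0)
    std = data[:train_split].std(axis=0)
    return (data - mean) / std, mean, std
```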

The plot shows that temperature and humidity are inversely related. Next, we prepare the train/validation data and feed it into our model to forecast the next step. This step also prepares the past data for plotting, as well as the ground truth for validation.

https://www.kaggle.com/kcostya/lstm-models-for-multi-step-time-series-forecast

With single_step set to True, each label is a single temperature value in the future.
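A sketch of the multivariate windowing helper with a single_step switch, modeled on the one in TensorFlow's time-series tutorial (the name multivariate_windows is ours). Windows are subsampled every step rows; with single_step=True the label is one future value, otherwise it is the whole sequence of the next target_size values.

```python
import numpy as np


def multivariate_windows(dataset, target, start, end, history_size,
                         target_size, step, single_step=False):
    """Build (window, label) pairs from a multi-feature array.

    dataset: (T, F) features; target: (T,) series to forecast.
    Each window covers `history_size` past rows, sampled every `step` rows.
    single_step=True  -> label is the single value `target_size` rows ahead.
    single_step=False -> label is the next `target_size` values (a vector).
    """
    dataset = np.asarray(dataset, dtype=float)
    target = np.asarray(target, dtype=float)
    start = start + history_size
    if end is None:
        end = len(dataset) - target_size
    data, labels = [], []
    for i in range(start, end):
        data.append(dataset[i - history_size:i:step])
        if single_step:
            labels.append(target[i + target_size])
        else:
            labels.append(target[i:i + target_size])
    return np.array(data), np.array(labels)
```

For the weather data, history_size=720 (5 days of 10-minute readings) and target_size=72 match the counts used in this article; a step of 6 (one sample per hour) is an assumed choice that shrinks each window to 120 rows.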

Here we can see that the LSTM performs well in this case compared to the moving average method.

Multi-variate & multi-step forecasting (y_i is a vector):

The model in this section predicts multiple steps ahead of the dependent variable (y) based on the past k values of the independent variables (x); that is, it generates multiple future values of temperature.

The model predicts the next 72 temperature values. Here, single_step is not set during dataset creation, so each label is the full sequence of the next 72 values.
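For the multi-step case, the output layer is widened so that the model emits all 72 future values at once. A sketch in tf.keras; the two stacked LSTM layers and their sizes (32 and 16 units) are assumptions, not the author's exact architecture, and build_multistep_lstm is a hypothetical helper.

```python
import numpy as np
import tensorflow as tf


def build_multistep_lstm(window_len: int, n_features: int, horizon: int = 72) -> tf.keras.Model:
    """Stacked LSTM whose Dense head outputs `horizon` future values at once."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(window_len, n_features)),
        tf.keras.layers.LSTM(32, return_sequences=True),  # pass the full sequence on
        tf.keras.layers.LSTM(16),                         # summarize it into one vector
        tf.keras.layers.Dense(horizon),                   # one output per future step
    ])
    model.compile(optimizer="adam", loss="mae")
    return model
```

window_len is the number of rows per window after subsampling, and n_features is 3 for the (p, T, rho) selection.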

Actual next 72 values

Note: we don't have to bother about the cyclicity problem here.

Sample plots for three examples

In this article, an LSTM model is developed that can forecast the next steps of a series based on previous historical data with k features. LSTMs are now used extensively for time series. Attention-based models and Transformers require a lot of data and fine-tuning, so they are mostly used for NLP tasks.

The LSTM model can be tuned for various parameters such as changing the number of LSTM layers, adding dropout value, or increasing the number of epochs for prediction.

==============Code================

ranasingh-gkp/Time_series_forcasting (github.com)

Reference:

Applied AI

https://www.analyticsvidhya.com/blog/2018/10/predicting-stock-price-machine-learningnd-deep-learning-techniques-python/