Time Series Forecasting through Extreme Learning Machine

A one-step learning approach

Rafael Rocha
Mar 14, 2022

Extreme Learning Machine

The most common artificial neural network architecture is the feedforward neural network, in which information propagates (flows) in one direction, from the input layer to the output layer.

Extreme Learning Machines (ELMs) are feedforward neural networks that can be used for regression and classification tasks, for example. The weights between the input layer and the hidden layer are assigned randomly, while the weights between the hidden layer and the output layer are computed, or learned, in one step. This second set of weights is computed through the Moore-Penrose inverse of the hidden layer output matrix.

Feedforward Neural Network

The figure below shows a feedforward neural network, which illustrates the elements of an ELM.

The input layer is composed of the input matrix X of size M x N plus a bias of size M x 1, where M is the number of examples and N (equal to 3 in the image) is the number of features. Next come the weights W1, assigned randomly, of size L x (N + 1), where L is the number of neurons in the hidden layer.

The hidden layer output is computed by the following equation:

H = tanh(Xa · W1ᵀ)

Where Xa (of size M x (N + 1)) is the concatenation of the bias and the input matrix X, and tanh is the hyperbolic tangent activation function, which limits the output of each neuron to between -1 and 1.

The weights W2 are obtained by multiplying the Moore-Penrose inverse of Ha by the target y, as shown in the equation below:

W2 = Ha⁺ · y

Where Ha is the concatenation of the bias and the hidden layer output matrix H, and Ha⁺ denotes its Moore-Penrose inverse. Thus, we can make predictions with the following equation:

y_pred = Ha · W2

All steps to obtain the parameters W1 and W2 are performed on the training data, and with these parameters in hand, we can make predictions on data that wasn't part of the training process, in this case, the test data. The function below shows the one-step learning of ELM, returning the predictions and the parameters W1 and W2.
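A minimal sketch in NumPy (the names elm and elm_predict, and the uniform random initialization in [-1, 1], are assumptions; the author's original gist may differ):

import numpy as np

def elm(X, y, L, seed=0):
    """One-step ELM training: returns predictions and the weights W1, W2."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    Xa = np.hstack([np.ones((M, 1)), X])   # bias + input: M x (N + 1)
    W1 = rng.uniform(-1, 1, (L, N + 1))    # random input weights: L x (N + 1)
    H = np.tanh(Xa @ W1.T)                 # hidden layer output: M x L
    Ha = np.hstack([np.ones((M, 1)), H])   # bias + H: M x (L + 1)
    W2 = np.linalg.pinv(Ha) @ y            # one-step learning: (L + 1) x 1
    y_pred = Ha @ W2                       # predictions on the training data
    return y_pred, W1, W2

def elm_predict(X, W1, W2):
    """Make predictions on new data with the trained parameters."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])
    Ha = np.hstack([np.ones((X.shape[0], 1)), np.tanh(Xa @ W1.T)])
    return Ha @ W2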

Time Series Forecasting

Initially, it is necessary to transform the time series forecasting problem into a machine learning problem. To do that, we arrange the temporal data into input and target variables, making it suitable for any linear regression approach, including ELM.

To do so, we use the concept of lag, which refers to the past values of a time series. For our problem, the lag sets the number of features in the input matrix: if we use a lag of 3, the input matrix will have a size of M x 3, and the three past values of the time series are used as inputs. The target variable is assigned the next value in the series, which can come immediately after the first lag (one-step forward) or more steps forward.

To exemplify, consider the example below:

# Time series
series = [3.93, 4.58, 4.8, 5.07, 5.14, 4.94]

# Three lags and one-step forward
X = [[3.93, 4.58, 4.8],
     [4.58, 4.8, 5.07],
     [4.8, 5.07, 5.14]]
y = [[5.07],
     [5.14],
     [4.94]]

# Two lags and two-steps forward
X = [[3.93, 4.58],
     [4.58, 4.8],
     [4.8, 5.07]]
y = [[5.07],
     [5.14],
     [4.94]]

In this example, the time series is series, and X and y are the input matrix and the target, respectively. The first example uses three lags and one-step forward: to predict the value at X(t), the values at X(t-1), X(t-2), and X(t-3) are needed. In the last example, with two lags and two-steps forward, the values at X(t-2) and X(t-3) are used to predict the value at X(t).

The function below adjusts the time series to a machine learning problem.
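A minimal sketch of this transformation (the name create_dataset is an assumption; the argument names lag and step_forward follow the text's terminology):

import numpy as np

def create_dataset(series, lag, step_forward):
    """Arrange a time series as an input matrix X and a target y."""
    series = np.asarray(series, dtype=float)
    # The target sits (lag + step_forward - 1) positions after the
    # first value of each window of lagged inputs
    offset = lag + step_forward - 1
    X = np.array([series[i:i + lag] for i in range(len(series) - offset)])
    y = series[offset:].reshape(-1, 1)
    return X, y

# Reproduces the first example above:
# create_dataset([3.93, 4.58, 4.8, 5.07, 5.14, 4.94], lag=3, step_forward=1)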

The inputs of the function are the time series, the number of lags (lag), and the number of steps forward (step_forward), while the outputs are the input matrix of size (M-(lag+step_forward-1)) x lag and the target variable of size (M-(lag+step_forward-1)) x 1.

Results and Evaluation

To evaluate the predictions of the ELM, the mean squared error (MSE) is used, as shown in the equation below:

MSE = (1 / (M · C)) · Σ (y − y_pred)²

Where the sum runs over all examples and outputs, M is the number of examples, C is the number of outputs, y is the true target variable, and y_pred is the predicted target variable.

The function below computes the MSE.
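A minimal version (assuming NumPy arrays; the name mse is illustrative):

import numpy as np

def mse(y_true, y_pred):
    """Mean squared error averaged over all M examples and C outputs."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)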

Before training the ELM model and obtaining the set of weights to make the predictions, we must normalize the time series. Here, min-max normalization is used, which maps the input data into a specific range (a, b), in this case (0, 1), as shown in the function below.
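A sketch of min-max normalization with its inverse (the function names are assumptions; keeping the inverse as a separate function makes the reverse step at evaluation time explicit):

import numpy as np

def minmax_fit(x, a=0.0, b=1.0):
    """Obtain the normalization parameters (from the training data only)."""
    x = np.asarray(x)
    return x.min(), x.max(), a, b

def minmax_transform(x, x_min, x_max, a, b):
    """Map x into the range (a, b)."""
    return a + (np.asarray(x) - x_min) * (b - a) / (x_max - x_min)

def minmax_inverse(x_norm, x_min, x_max, a, b):
    """Reverse the normalization back to the original scale."""
    return x_min + (np.asarray(x_norm) - a) * (x_max - x_min) / (b - a)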

Only the training data is used to obtain the normalization parameters, and with these parameters, we normalize both the training and test data. In this way, after training the model and making predictions on both sets, we reverse the normalization to properly evaluate the model's performance.

We used the time series of the Brazilian gross domestic product between 1980 and 1997, with monthly frequency (a total of 256 months), available on link. The first 80% of the monthly data is used to train the model and the last 20% to test the trained parameters. The figure below shows the time series split into training (blue line) and test (green line) data, where the black dashed line represents the split.

First, to create X and y, a lag of 2 and one-step forward are used to make the predictions. The trained model with 3 hidden neurons shows good performance in the time series forecasting, reaching an MSE of 30.99 on the training data and 38.74 on the test data. The figure below shows the predictions on the test data, drawn as the red dashed line, which closely follow the real test data (green line).

When the lag is changed to 3, preserving the one-step forward, the model's performance worsens, reaching an MSE of 180.83 on the test data. Therefore, it is not a good idea to increase the number of past values used to predict one-step forward for the analyzed time series.
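To tie the pieces together, here is an end-to-end sketch of the workflow described above. It runs on a synthetic series, since this is purely illustrative (the GDP data and the exact MSE values reported above are not reproduced), and it assumes the elm, elm_predict, create_dataset, mse, and min-max functions sketched earlier:

import numpy as np

# Synthetic monthly-like series with trend and seasonality (illustrative only)
rng = np.random.default_rng(42)
t = np.arange(256)
series = 100 + 0.2 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

# Chronological 80/20 split (no shuffling for time series)
split = int(0.8 * len(series))
train, test = series[:split], series[split:]

# Normalization parameters come from the training data only
x_min, x_max, a, b = minmax_fit(train)
train_n = minmax_transform(train, x_min, x_max, a, b)
test_n = minmax_transform(test, x_min, x_max, a, b)

# Lag of 2 and one-step forward, as in the first experiment
X_train, y_train = create_dataset(train_n, lag=2, step_forward=1)
X_test, y_test = create_dataset(test_n, lag=2, step_forward=1)

# Train with 3 hidden neurons, then predict on the test inputs
_, W1, W2 = elm(X_train, y_train, L=3)
pred_n = elm_predict(X_test, W1, W2)

# Reverse the normalization before evaluating
pred = minmax_inverse(pred_n, x_min, x_max, a, b)
y_true = minmax_inverse(y_test, x_min, x_max, a, b)
print("Test MSE:", mse(y_true, pred))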

Conclusion

In this way, the extreme learning machine is a good starting method for time series forecasting, where it is first necessary to transform the time series into input and target variables. The ELM is a simple approach, but it obtains powerful results, as the training is done in one step with the help of the Moore-Penrose inverse.

The complete code is available on Github and Colab. Follow the blog if this post is helpful to you.
