Extreme Learning Machine to Multiclass Classification

An example applied to the Iris dataset

Rafael Rocha
6 min read · Apr 17, 2022

Extreme Learning Machine

Extreme Learning Machines (ELMs) are feedforward neural networks that can be used for regression and classification, among other tasks. The weights between the input layer and the hidden layer are assigned randomly, while the weights between the hidden layer and the output layer are computed in a single step: this second set of weights is obtained from the Moore-Penrose pseudo-inverse of the hidden layer output matrix.

A previous blog post used the ELM to perform time series forecasting, via a method that transforms the forecasting problem into a linear regression problem. Reading that post is encouraged, as it gives a better understanding of the ELM concepts. The blog post is found at the link.

In this blog post, the ELM is used to perform a multiclass classification task: classifying Iris plants into three species.

The data

The Iris dataset (available on Kaggle) contains 3 classes representing the three species of Iris plants (Setosa, Versicolour, and Virginica), with 50 examples each. The dataset has 4 attributes or features (in cm): sepal length, sepal width, petal length, and petal width. The figure below shows three rows of the dataset, one per class, where Setosa, Versicolour, and Virginica are labeled Iris-setosa, Iris-versicolor, and Iris-virginica, respectively.

Iris dataset

Before proceeding, it is necessary to adjust our target variable. To do this, the species are mapped to integer values: Iris-setosa = 0, Iris-versicolor = 1, and Iris-virginica = 2.

In the time series forecasting (or linear regression) analysis, we had a single continuous target variable. In the classification case, it is essential to change the encoding of our target variable, because classification works with discrete values (classes). One-hot encoding is used to do this: each value of the target variable is transformed into a vector, in which a 1 at a specific index indicates the respective class.

Iris-setosa -> 0 -> [1, 0, 0]
Iris-versicolor -> 1 -> [0, 1, 0]
Iris-virginica -> 2 -> [0, 0, 1]

The example above shows the target variable of the Iris-setosa class, which is first mapped to an integer and then one-hot encoded, where the value 1 at index 0 indicates that the target variable belongs to class 0 (Iris-setosa). The code below shows how to perform the one-hot encoding in Python:
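A minimal sketch of such an encoder (the function name is illustrative; the original gist may differ):

```python
import numpy as np

def one_hot_encoding(targets, n_classes):
    """Convert a vector of integer class labels into one-hot row vectors."""
    onehot = np.zeros((targets.shape[0], n_classes))
    onehot[np.arange(targets.shape[0]), targets] = 1
    return onehot

# Example: the three Iris classes
labels = np.array([0, 1, 2])
print(one_hot_encoding(labels, 3))
```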

Here, targets is the vector of target variables and n_classes is the number of classes in the data, in our case 3. To inspect our future results, it may be necessary to revert the one-hot encoding, and the code below does this:
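Reverting amounts to taking the index of the largest value in each row; a sketch (again, the function name is illustrative):

```python
import numpy as np

def revert_one_hot_encoding(onehot):
    """Recover integer class labels: the index of the largest value per row."""
    return np.argmax(onehot, axis=1)
```

This also works on soft predictions such as [0.7, 0.2, 0.1], since argmax picks the most activated output neuron.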

Network

The one-hot encoding influences our feedforward neural network architecture, specifically the number of output neurons. In this scenario, the number of output neurons equals the number of classes, so that the three species of Iris plant can be represented.

Feedforward neural network

The figure above presents the feedforward neural network. X is the input matrix of size M x N, where M is the number of examples and N is the number of features. W1 is the randomly assigned weight matrix of size L x (N + 1), where L is the number of neurons in the hidden layer and the extra column accounts for the bias term. W2 is the weight matrix obtained with the Moore-Penrose inverse. H is the hidden layer output, and y is the vector of output neurons.

Training

Initially, we split our data into train and test sets, with 20% of the data reserved for the test set. The Python code below splits the data:
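A sketch of the split using scikit-learn (the post loads the Kaggle CSV; here the same data is loaded from sklearn for a self-contained example, and the random seed is an assumption):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris: 150 examples, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# Hold out 20% for testing; stratify keeps the class balance in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```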

Before training the ELM model, we must normalize the data. Here, the standard scaler is used to standardize features by removing the mean and scaling to unit variance. The code below performs the normalization.

The training set is used to obtain the mean and the standard deviation; afterwards, both the train and test sets are normalized with these statistics.
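The fit-on-train, transform-both pattern can be sketched as follows (the placeholder arrays stand in for the real splits):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(120, 4))  # placeholder train split
X_test = rng.normal(loc=5.0, scale=2.0, size=(30, 4))    # placeholder test split

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # mean/std estimated from train only
X_test_s = scaler.transform(X_test)        # reuse the train statistics
```

Fitting the scaler on the training set alone avoids leaking test-set statistics into the model.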

Finally, we can train the ELM model for Iris plant classification. We use L = 15 neurons in the hidden layer and, as a result, obtain the W1 and W2 weight matrices, as well as the predictions. The function below trains the ELM model with the input matrix X, the target variable y, the number of neurons L, and the W1 weight matrix (optional) as input parameters. The function returns W1, W2, and the predictions on the training set.
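A sketch of what such a function might look like, following the description above; the tanh activation is an assumption (the original may use a different one, e.g. sigmoid):

```python
import numpy as np

def elm_train(X, y, L, W1=None):
    """Train an ELM: random input weights W1, output weights W2 via pseudo-inverse.

    X: (M, N) input matrix, y: (M, n_classes) one-hot targets, L: hidden neurons.
    Returns W1 of shape (L, N + 1), W2 of shape (L, n_classes), and the
    predictions on the training set.
    """
    M, N = X.shape
    Xb = np.hstack([np.ones((M, 1)), X])  # prepend bias column -> (M, N + 1)
    if W1 is None:
        W1 = np.random.randn(L, N + 1)    # random input weights, never adjusted
    H = np.tanh(Xb @ W1.T)                # hidden layer output, (M, L)
    W2 = np.linalg.pinv(H) @ y            # one-step Moore-Penrose solve
    y_pred = H @ W2                       # training-set predictions (one-hot-like)
    return W1, W2, y_pred
```

Because W1 is random and only W2 is solved for, there is no iterative gradient descent; training is a single least-squares solve.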

The predictions are in one-hot encoding, as shown below:

[0.7, 0.2, 0.1] # Prediction: one-hot encoding
[0]             # Prediction: reverted one-hot encoding

The example above presents a single prediction; when we revert the one-hot encoding we obtain [0], which indicates that the example was classified as belonging to class 0, or Iris-setosa.

Results

With the weight matrices in hand, we can make predictions on the test set. To evaluate both the train and test sets, we use the accuracy metric, defined as the fraction of elements that match when comparing the true target variable with the predicted one. Here, we use the target variables in plain form (not one-hot encoded), so we must revert the one-hot encoding before computing the accuracy. The function below calculates the accuracy:
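The metric reduces to one line in NumPy (a sketch; the function name is illustrative):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of matching labels; both arguments are plain integer labels."""
    return np.mean(y_true == y_pred)
```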

Evaluating the results, we obtain 98.33% accuracy on the training set and 96.66% on the test set, which is a reasonable classification performance on the Iris dataset.

Besides, we can evaluate the performance in terms of the confusion matrix, which allows better visualization of the results. The confusion matrix below shows the results on the test set.

[9,  0, 0]
[0, 14, 1]
[0,  0, 6]

The rows represent the true labels and the columns the predicted labels. The matrix is 3 x 3 because there are 3 classes in the dataset. The sum of row 0 is 9, that is, there are 9 examples of class 0 in the test set, and all 9 are correctly classified as class 0 (9 in column 0 only). On the other hand, there are 15 examples of class 1 in the test set, but only 14 are correctly classified as class 1 (column 1), while 1 is misclassified as class 2 (column 2).

Like the accuracy metric, the confusion matrix uses the target variable in plain form. The function below returns the confusion matrix, taking the true target variable and the predicted target variable as input parameters.
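A sketch of such a function (names are illustrative; scikit-learn's `confusion_matrix` would give the same result):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count (true, predicted) label pairs: rows are true, columns predicted."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```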

Conclusion

In this way, ELM is a good introductory approach to multiclass classification. In addition, it enables an initial understanding of neural networks, a recurrent topic in both machine learning and data science. The ELM is a simple approach, but it obtains strong results for its purpose, since training is done in a single step with the help of the Moore-Penrose inverse.

The complete code is available on GitHub and Colab. Follow the blog if the post was helpful for you.

If you are interested in ELM, I wrote a blog post about time series forecasting using ELM. If you want to check it out:
