Neural networks, machine learning and forecasting, part V – training a network


By D. Karaoulanis

This is the fifth and penultimate part of this series. In this post we discuss the construction and "education", or training, of a neural network, and why that training works.

In constructing a neural network, the first concern is the set of neurons it will contain, along with the number of hidden processing layers. Next come the weights assigned to each synapse between neurons; these weights are decisive for the results the network can be expected to give. Finally, the function connecting the input data to the output data must be determined, and finding that function is precisely what training a neural network means. The larger and more complex the network, the greater the number of functions it can implement. During training, the more input-output vector pairs supplied, the more accurately this function is determined and the more efficiently the network performs the desired task. A number of algorithms are used to train neural networks. In practice, training a neural network amounts to strengthening or weakening the weights, according to the rules set by the training algorithm, until they reach appropriate values. The methodologies used are as follows:
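To make the structural choices above concrete — neuron counts, hidden layers, and a weight per synapse — here is a minimal sketch of a feedforward network in Python. The layer sizes, initial weights, and the sigmoid activation are illustrative choices, not anything prescribed by the text:

```python
import math
import random

random.seed(0)

def make_layer(n_inputs, n_neurons):
    """One fully connected layer: a weight per synapse plus a bias per neuron."""
    return [[random.uniform(-1, 1) for _ in range(n_inputs + 1)]  # +1 for bias
            for _ in range(n_neurons)]

def forward(layer, inputs):
    """Weighted sum over each neuron's synapses, then a sigmoid activation."""
    outputs = []
    for weights in layer:
        s = weights[-1] + sum(w * x for w, x in zip(weights[:-1], inputs))
        outputs.append(1.0 / (1.0 + math.exp(-s)))
    return outputs

# A 2-input network with one hidden layer of 3 neurons and 1 output neuron.
hidden = make_layer(2, 3)
output = make_layer(3, 1)
y = forward(output, forward(hidden, [0.5, -0.2]))
```

Training, as discussed below, is the process of adjusting the numbers inside `hidden` and `output` until `forward` implements the desired input-output function.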

Supervised Training: In supervised training, the instructor provides the network with input and output training vectors, and an optimization algorithm tries to identify the appropriate function (filter) connecting them. Such an optimization (minimization) algorithm modifies the weights of the various synapses so as to yield a specific result. The function found depends on the input-output training vectors used as well as on the initial values of the weights. The iterative minimization process terminates when the error function reaches a minimum. A serious disadvantage of this method is that the network may become trapped in a local minimum, unable to search for a better solution.
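The supervised scheme can be sketched in a few lines: fit a single linear neuron to input-output pairs by minimizing the squared error with gradient descent. The training pairs, initial weight, and learning rate are made up for illustration:

```python
# Supervised training sketch: fit a single linear neuron y = w * x to
# input-output pairs by minimizing the squared error with gradient descent.
pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # targets follow y = 2x
w, lr = 0.0, 0.05  # initial weight and learning rate

for _ in range(200):
    # Gradient of E = 0.5 * sum((w*x - y)^2) with respect to w.
    grad = sum((w * x - y) * x for x, y in pairs)
    w -= lr * grad  # strengthen or weaken the weight against the error
```

Here the error surface has a single minimum, so gradient descent finds it; the local-minimum trap mentioned above appears only on the rugged error surfaces of larger networks.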

Unsupervised Training: In unsupervised training, the instructor feeds the network input vectors that have been sorted into categories. The weights are then changed pseudorandomly, and through the error-function minimization algorithm the resulting outputs are evaluated and the weights adjusted. In this way, we might say, the network trains itself, keeping in its memory the weights suitable for a specific task. The network thus works somewhat "consultatively", yielding various results that are interpreted as possible answers to a specific problem. The disadvantage of this method is the long running time of the algorithm needed to find a global minimum.
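The "pseudorandom change, evaluate, keep or discard" loop described above can be sketched as random hill-climbing over a single weight. The error surface, step size, and iteration count are hypothetical; the point is that most random changes are wasted, which is the long running time the text mentions:

```python
import random

random.seed(1)

def error(w):
    """Hypothetical error surface with its global minimum at w = 3."""
    return (w - 3.0) ** 2

w, best = 0.0, error(0.0)
for _ in range(5000):
    candidate = w + random.uniform(-0.5, 0.5)  # pseudorandom weight change
    e = error(candidate)
    if e < best:            # keep only changes that improve the result
        w, best = candidate, e
```

The weight drifts toward the minimum, but only a fraction of the 5000 trials are accepted — a gradient-based method would reach the same point in a handful of steps.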

Competition between the Neurons: This training method is based on the idea that, during the training of an artificial neural network, some neurons play an active role in a particular process while others do not participate at all. The inactive neurons remain "off" while the network is taken over and shaped by the "dominant" ones. This process is quite similar to what happens, according to neuroscientists, in the human brain, where some neurons remain inactive during a specific process while others are active. It is, one might say, a form of specialization of parts of the network, where each group of neurons is activated to carry out a specific action. Such a model of operation divides the network into functional areas and, from this point of view, makes its behavior easier to analyze.
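A common concrete form of this idea is winner-take-all learning: for each input, only the neuron whose weights best match it is active and updates its weights, while the losers stay "off". The two clusters, initial weights, and learning rate below are illustrative:

```python
# Winner-take-all sketch: two neurons compete for each input; only the winner
# (the neuron whose weight vector is closest to the input) updates its weights.
neurons = [[0.5, 0.5], [0.4, 0.6]]  # two 2-dimensional weight vectors

def winner(x):
    dists = [sum((w - xi) ** 2 for w, xi in zip(n, x)) for n in neurons]
    return dists.index(min(dists))

data = [[0.1, 0.1], [0.0, 0.2], [0.9, 0.8], [1.0, 0.9]]  # two clusters
lr = 0.3
for _ in range(50):
    for x in data:
        i = winner(x)                     # only the winning neuron is active
        neurons[i] = [w + lr * (xi - w)   # pull its weights toward the input
                      for w, xi in zip(neurons[i], x)]
```

After training, each neuron has specialized in one cluster — the "functional areas" the text describes, on the smallest possible scale.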

Learning theory must address three fundamental and practical issues related to learning from examples: capacity, sample complexity, and time complexity. The first issue concerns whether the correct solution is contained in the set of solutions the network can represent; if it is not, we can never hope to reach the optimal solution. Deciding this remains a difficult and open problem. The second issue, sample complexity, determines the number of training patterns required to train the network so as to ensure valid generalization. Too few patterns can cause the overfitting problem, where the network performs well on the training set but poorly on new data drawn from the same distribution as the training patterns. The third issue is the computational complexity of the learning algorithm used to estimate a solution from the training examples. Many existing learning algorithms have high computational complexity; for example, the popular backpropagation algorithm for feedforward networks demands considerable computing power because of its slow convergence.
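Overfitting from too few samples can be shown without any neural network at all. Below, a polynomial is fitted exactly through three slightly noisy points drawn from the simple rule y = x: the training error is essentially zero, yet the prediction at a fresh point from the same rule is far off. The data values are invented for the illustration:

```python
def lagrange(points, x):
    """The interpolating polynomial through all given points, evaluated at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# The true rule is y = x; the three training points carry a little noise.
train = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2)]
test_x, test_y = 3.0, 3.0  # a fresh point from the same rule

train_err = sum(abs(lagrange(train, x) - y) for x, y in train)  # ~0: memorized
test_err = abs(lagrange(train, test_x) - test_y)                # much larger
```

The flexible model has memorized the noise; a straight-line fit, though imperfect on the training points, would generalize far better — more training patterns are what rules the noise out.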

In the backpropagation algorithm, the change in each weight is based on calculating that weight's contribution to the total error. In competitive learning, by contrast, neurons compete with one another, and only the one with the greatest response to a given input modifies its weights. Finally, in random learning, changes to the weights are introduced at random and, depending on whether or not the output improves according to some user-defined criterion, these changes are adopted or discarded.
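The backpropagation idea — each weight's share of the error, computed by the chain rule — can be sketched on the smallest possible network: one input, one hidden sigmoid neuron, one linear output. The initial weights, target, and learning rate are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One input -> one hidden sigmoid neuron -> one linear output neuron.
w1, w2 = 0.8, -0.4          # illustrative initial weights
x, target = 1.0, 0.25       # a single training pair
lr = 0.5

for _ in range(500):
    h = sigmoid(w1 * x)     # forward pass
    y = w2 * h
    err = y - target        # dE/dy for E = 0.5 * (y - target)^2
    # Backward pass: the chain rule gives each weight's share of the error.
    grad_w2 = err * h
    grad_w1 = err * w2 * h * (1 - h) * x
    w2 -= lr * grad_w2
    w1 -= lr * grad_w1

y_final = w2 * sigmoid(w1 * x)   # close to the target after training
```

Note how `grad_w1` multiplies the error backwards through `w2` and the sigmoid's slope `h * (1 - h)` — that propagation of the error to earlier layers is what gives the algorithm its name, and what makes its convergence slow on deep networks.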

A very widespread technique is incremental (progressive) learning of artificial neural networks. It is a dynamic technique, applicable to both supervised and unsupervised learning, that can be used when training data becomes available gradually over time or when its size exceeds system memory limits. The goal of incremental learning is to adapt the model to new data without forgetting its existing knowledge and without retraining the model from scratch. Some incremental training algorithms incorporate parameters or assumptions that control the relevance of old data, while others, called stable incremental machine learning algorithms, learn representations of the training data that are not even partially forgotten over time. Fuzzy ART and TopoART are two examples of this second approach.

For an algorithm to be considered capable of incremental learning, it must meet the following criteria:

• It should be able to learn additional information from the new training data.
• It should retain the previously acquired knowledge.
• It should also be able to accommodate new classes that may appear in the new data.
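The three criteria can be met by something as simple as a nearest-class-mean classifier whose per-class means are updated online. This is an illustrative stand-in for the incremental algorithms named above, not how Fuzzy ART or TopoART work internally:

```python
# Incremental learning sketch: the classifier learns from each new example,
# keeps its old per-class means (previous knowledge), and can add a class
# the first time that class appears -- with no retraining from scratch.
means = {}   # class label -> (count, running mean of 1-D inputs)

def learn(x, label):
    count, mean = means.get(label, (0, 0.0))
    count += 1
    mean += (x - mean) / count   # online (incremental) mean update
    means[label] = (count, mean)

def predict(x):
    return min(means, key=lambda lbl: abs(x - means[lbl][1]))

# Data arrives over time; a brand-new class "c" shows up late.
for x, lbl in [(1.0, "a"), (1.2, "a"), (5.0, "b"), (5.2, "b")]:
    learn(x, lbl)
for x, lbl in [(9.0, "c"), (9.4, "c")]:
    learn(x, lbl)   # new class absorbed without touching "a" or "b"
```

Each example updates only one running mean, so memory stays constant no matter how much data streams past — the situation incremental learning is designed for.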

Finally, in recent years, algorithms have been developed that are based on the way biological nervous systems evolve in nature. Compared with other methods of training neural networks, this evolutionary approach is very general: it allows learning without explicit objectives, with only sparse feedback, and with arbitrary neural models and network structures. Just as natural selection in nature is driven only by feedback from reproductive success, this method is driven by some measure of overall performance. While the most common neural network learning algorithms operate through supervised learning, and thus depend on a set of input-output pairs, the main advantage of this approach is that it allows learning even when such data is not fully available, given only sparse feedback.
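The evolutionary idea can be sketched with a population of weight vectors, mutation, and selection driven only by an overall fitness score — no input-output pairs and no gradients. The fitness function, population size, and mutation strength below are hypothetical stand-ins for a real task's sparse feedback:

```python
import random

random.seed(3)

# Neuroevolution sketch: evolve the weights of a tiny "network" using only
# an overall performance score.
def fitness(w):
    """Hypothetical sparse feedback: higher is better, peak at w = (1, -2)."""
    return -((w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2)

population = [[random.uniform(-3, 3), random.uniform(-3, 3)] for _ in range(20)]

for _ in range(100):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                # selection by overall performance
    population = [
        [w + random.gauss(0, 0.2) for w in random.choice(parents)]  # mutation
        for _ in range(20)
    ]

best = max(population, key=fitness)
```

Nothing in the loop ever sees a gradient or a target output — only the single fitness number — which is why the same scheme also works when network structure, not just weights, is being evolved.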