
6 minute read
Joining Forces
By George Ogden
Alone, a perceptron acts as a single unit. It is a mathematical model that aims to recreate a neuron: it takes in inputs, multiplies each one by a corresponding weight, adds a bias and then maps the result through a sigmoid function to produce an output between 0 and 1. In many cases, however, there is strength in numbers: on its own, a perceptron can only solve linearly separable problems (Limitations and Cautions, 2005).
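As a rough sketch (the names and numbers here are purely illustrative, not taken from any particular implementation), a single perceptron can be written in a few lines of Python:

```python
import math

def sigmoid(z):
    # Squash any real number into the range (0, 1).
    return 1 / (1 + math.exp(-z))

def perceptron(inputs, weights, bias):
    # Multiply each input by its corresponding weight, add the bias,
    # then map the result through the sigmoid function.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)

# Illustrative values only: two inputs, two weights and a bias.
print(perceptron([0.5, 0.8], [1.2, -0.7], 0.3))   # always between 0 and 1
```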
Linear separability is enough for many problems: the more bedrooms a house has, the higher its value; the more profit a company makes, the more likely its stock price is to rise. However, plenty of problems are not linearly separable. The older a banana is, the better it tastes, until a certain age, when it becomes less edible; and the older a car is, the lower its value, until it becomes a ‘classic car’ and its price rises again.
[Image: an example of linear separability. From: Linear separability | Machine Learning Quick Reference. Available from: https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781788830577/2/ch02lvl1sec26/linear-separability]

[Image: an example of non-linear separability. From: Linear separability | Machine Learning Quick Reference (same source as above).]

However, this problem can be overcome by combining perceptrons to create a network that can solve more complicated problems. The organisational system of most neural networks (Brunton, 2019) is a layer-by-layer approach. The first layer is the input layer, which receives the raw input, for example, the pixel values of an image. From there, the outputs of this layer become the inputs of the next layer, and those outputs in turn become the inputs of the layer after that. The action of each layer is sequential, so the layers always occur in a fixed order and (most of the time) cannot be skipped. A value is output once the input has propagated through all of the hidden layers to reach the output layer.

[Image: how a neural network works. From: What are Neural Networks? (2021). Available from: https://www.ibm.com/cloud/learn/neural-networks]
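As a minimal sketch of this layer-by-layer flow (the shapes and random weights are illustrative assumptions, not values from the article), each layer can be treated as a function that takes the previous layer's outputs and produces its own:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def layer(inputs, weights, biases):
    # Every perceptron in the layer weighs the incoming values,
    # adds its bias and squashes the result between 0 and 1.
    return sigmoid(weights @ inputs + biases)

# Hypothetical shapes: 4 input values -> 3 hidden perceptrons -> 1 output.
rng = np.random.default_rng(42)
hidden_w, hidden_b = rng.standard_normal((3, 4)), rng.standard_normal(3)
output_w, output_b = rng.standard_normal((1, 3)), rng.standard_normal(1)

x = np.array([0.1, 0.7, 0.3, 0.9])      # the raw input, e.g. pixel values
hidden = layer(x, hidden_w, hidden_b)    # the first layer's outputs...
y = layer(hidden, output_w, output_b)    # ...become the next layer's inputs
print(y)
```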

Just as there can be multiple inputs, there can also be multiple outputs – one famous example of this is software that recognises handwritten digits, where the ten outputs of the final layer represent the probability of the input being each digit (Sanderson, 2017).
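As a hedged illustration (these output values are made up, not taken from the cited software), the ten outputs might be read off like this:

```python
# Hypothetical final-layer outputs for a digit-recognition network:
# index i holds the network's confidence that the image shows the digit i.
outputs = [0.01, 0.02, 0.05, 0.03, 0.02, 0.01, 0.04, 0.78, 0.02, 0.02]

predicted = outputs.index(max(outputs))
print(f"Predicted digit: {predicted} with confidence {max(outputs):.2f}")
```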

[Image: a handwritten digit, after being passed through a neural network, produces the probability of it being each number. From: Awesome ML Frameworks and MNIST Classification. Available from: https://kaggle.com/arunkumarramanan/awesome-ml-frameworks-and-mnist-classification]

In a few ways, this mirrors the brain. In the brain, the output of one neuron becomes the input of the next through synapses, so there is some idea of connection. Additionally, the output of a single perceptron is meant to act like a model of a neuron, in that it preserves key features, such as the fact that its output is based on its inputs and lies between 0 and 1 (Brain Neurons & Synapses, 2019). However, one key difference is the relative size of artificial neural networks compared with the brain. One of the largest neural networks of its time, which won ImageNet, an annual artificial intelligence competition to classify images, consisted of 650,000 neurons, which appears a large number until it is compared with the 86 billion neurons in the brain. And where the brain can do many jobs well, this network had only one task: classifying 1.2 million pre-defined images into 1,000 categories. Its accuracy of 84.7% was far better than the second-place entry, which was 73.8% accurate (Krizhevsky, 2010).
The connection between perceptrons is best visualised as a line, where the weight of each line represents the strength of the connection between the two units. Intuitively, a stronger connection suggests a higher dependence between the input and the output, and the underlying mathematics bears this out. If the value of the input changes slightly and the weight is large, there will be a large change in the perceptron’s output, whereas if the weight is small, the change will be smaller. This then affects the input of the next layer, and then the following layer, and so on, eventually affecting the final output.
On the other hand, a weaker connection, and hence a smaller weight, means the output is less dependent on that input. For example, the speed of a car is not affected by its colour, so colour would carry a negligible weight if this were modelled as a neural network, whereas the speed at which the wheels are turning would require a larger weight.
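A tiny numerical sketch of this sensitivity (the weights, bias and input here are arbitrary, chosen only to illustrate the point): nudging the same input by the same small amount produces a much larger change in the output under a large weight than under a small one.

```python
import math

def output(x, w, b):
    # A single perceptron with one input: sigmoid(w * x + b).
    return 1 / (1 + math.exp(-(w * x + b)))

x, dx, b = 0.5, 0.1, 0.0
for w in (5.0, 0.05):                      # strong vs negligible connection
    change = output(x + dx, w, b) - output(x, w, b)
    print(f"weight {w}: output changes by {change:.4f}")
```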
To formalise the mathematics, the equation of a perceptron is an excellent starting point:

a = σ(w₁x₁ + w₂x₂ + … + wₙxₙ + b)

where the output of each perceptron is the sum of all of the weights multiplied by their corresponding inputs, added to a bias and then mapped through the sigmoid function σ. As the network is still made out of perceptrons, all that is needed is to change what some of the variables represent. There are still inputs and outputs at each layer and the equation keeps the same form; only what its values represent changes. In the new case, it helps to rewrite the first equation in its more condensed form:

a = σ(∑ᵢ wᵢxᵢ + b)
which expresses the same idea – but the sigma highlights the sum of each product without using ellipses. The modified equation becomes:

aⱼ⁽ᴸ⁾ = σ(∑ᵢ wⱼᵢ⁽ᴸ⁾ aᵢ⁽ᴸ⁻¹⁾ + bⱼ⁽ᴸ⁾)
The appearance of subscripts and superscripts is not due to this suddenly becoming a polynomial, but to identify each perceptron. The superscript identifies the layer that the perceptron is in, the j subscript represents the perceptron’s position within that layer, and the i subscript picks out the weight connecting it to the ith perceptron in the previous layer. The equation states that the output of the jth perceptron in the Lth layer is equal to its bias, plus the sum of all of its weights multiplied by the corresponding outputs of the previous layer, all mapped through a sigmoid function (Shiffman, 2018).
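Written out as a minimal sketch (with made-up numbers for the previous layer's outputs, the weights and the bias), the equation for a single perceptron in a layer looks like this:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def activation(weights_j, bias_j, previous_activations):
    # a_j^(L) = sigmoid( sum_i w_ji^(L) * a_i^(L-1) + b_j^(L) )
    total = bias_j
    for w_ji, a_i in zip(weights_j, previous_activations):
        total += w_ji * a_i
    return sigmoid(total)

# Illustrative numbers: the previous layer's outputs and one perceptron's
# weights and bias in the current layer.
previous_layer = [0.2, 0.8, 0.5]
weights = [0.4, -1.2, 2.0]
bias = 0.1
print(activation(weights, bias, previous_layer))
```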
This strange-looking equation can be used to approximate any function or task, but only when the task is written mathematically. For example, a letter may be represented by a numerical value, with a=1, b=2 and so on. Using this system, the word “the” becomes 20, 8, 5, which could be added together to give 33 or multiplied to give 800 – operations that we cannot perform on the word “the” itself. Due to this property, neural networks can be described as “Universal Function Approximators”, which means that the values of the weights and biases can be altered to approximate any function, whether that be image classification or adding two numbers. However, it is only ever an approximator. It may calculate, for example, that 1 + 1 = 1.9, which seems useless, but in another case it may class an image as 90% car and 10% grill, which is extremely useful, as no function can (currently) be defined that states definitively what an image contains (Elfouly, 2019). The real problem, however, lies in deciding what the values of the weights and biases should be so that the approximation is as good as possible.
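The letter-to-number encoding described above can be sketched in a few lines (assuming the a=1, b=2, … mapping from the text):

```python
def encode(word):
    # a=1, b=2, ..., z=26, so a word becomes a list of numbers.
    return [ord(letter) - ord("a") + 1 for letter in word.lower()]

values = encode("the")       # [20, 8, 5]
print(sum(values))           # 33

product = 1
for value in values:
    product *= value
print(product)               # 800
```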
References
Krizhevsky, A., & Sutskever, I. (2010). ImageNet Classification with Deep Convolutional Neural Networks. Toronto.
Brain Neurons & Synapses. (2019, September 27). Retrieved from The Human Memory: https://human-memory.net/brain-neurons-synapses/
Brunton, S. (2019, June 5). Neural Network Architectures. YouTube. Retrieved from https://www.youtube.com/watch?v=oJNHXPs0XDk
Elfouly, S. (2019, August 4). Neural Networks as universal function approximators. Retrieved from Medium: https://towardsdatascience.com/neural-networks-as-universal-function-approximators-11eda72fa30e
Limitations and Cautions. (2005). Retrieved from Neural Network Toolbox: http://matlab.izmiran.ru/help/toolbox/nnet/percep11.html
Sanderson, G. (2017, October 5). But what is a Neural Network? | Deep learning, chapter 1. YouTube. Retrieved from https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
Shiffman, D. (2018, January 18). 10.12: Neural Networks: Feedforward Algorithm Part 1 - The Nature of Code. YouTube. Retrieved from https://www.youtube.com/watch?v=qWK7yW8oS0I&t=1s