Perceptron as a Function Approximator

Hemanth Nag
Artificial Intelligence in Plain English
4 min read · Jan 17, 2021

In this blog post, I am going to explain how a modified perceptron can be used to approximate function parameters. For the learning process, we are going to use simple gradient descent and implement everything from scratch in Python.

Introduction:

The simplest feed-forward artificial neural network is the perceptron, which is also one of the earliest studied neural network models (Rosenblatt, 1962). It consists of an input layer connected directly to a single output neuron.
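For comparison, here is a minimal sketch of a standard perceptron's forward pass (the AND example, names, and the step activation are illustrative, not part of the original post):

#illustrative standard perceptron: weighted sum of inputs plus bias, then a step activation
import numpy as np

def perceptron_forward(x, w, b):
    z = np.dot(w, x) + b         # linear combination of the inputs
    return 1 if z >= 0 else 0    # step activation

print(perceptron_forward(np.array([1, 1]), np.array([0.5, 0.5]), -0.7))  # 1 (acts like logical AND)
print(perceptron_forward(np.array([1, 0]), np.array([0.5, 0.5]), -0.7))  # 0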

Now, suppose you want to perform linear regression on the dataset given below:

x=  [1, 2, 3, 4, 5, 6, 7, 8, 9]
yd= [1, 4, 9, 16, 25, 36, 49, 64, 81]

Here, x is the input and yd is the desired output. As soon as we look at the dataset, we can easily see that it is just a one-to-one mapping of each input to its squared value. So the mapping function used here is x^w (read as x to the power w), with w = 2.

But we cannot always identify the value of w that easily. And what if the mapping function is a little more complex, like e^wx?

Whenever we know the form of the mapping function (but not its parameter), instead of using the standard perceptron mentioned above, we can modify it and use it as an approximator for the parameter w.

For example, when the mapping function is y = x^w, the model described below can be used:

Here, instead of the usual neuron where the mapping is y = wx + b, we use y = x^w. The activation function is the identity, i.e. linear activation. The necessary non-linearity is provided by the mapping function inside the neuron (x^w here).
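To make this concrete, here is a minimal sketch of the modified neuron's forward pass (the function name is just illustrative): the neuron holds a single learnable parameter w and maps an input x straight to x^w.

#illustrative forward pass of the modified neuron: y = x^w with linear activation
def forward(x, w):
    return x ** w   # the non-linearity comes from the mapping itself, not the activation

print(forward(3, 2))    # 9, i.e. 3^2
print(forward(2, 3.0))  # 8.0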

Next, we will walk through the learning process with gradient descent and approximate the parameter w.

Implementation:

Import the necessary libraries in Python:

#importing desired libraries
import numpy as np
import matplotlib.pyplot as plt

Now create the input dataset:

#creating our own dataset where y = x^4.78
input1 = [x * 0.1 for x in range(1, 17)]   #16 inputs: 0.1, 0.2, ..., 1.6
output = [x ** 4.78 for x in input1]       #desired outputs

Initialize the hyperparameters:

#initial guess w = 3, learning rate (lr) = 0.001, 100 epochs
w = 3
lr = 0.001
epoch = 100
error_history = np.zeros(epoch)   #stores the mean squared error of each epoch

Since there is just a single neuron, we perform simple gradient descent, i.e. a simplified form of backpropagation. For each sample, we compute the squared error, differentiate it with respect to w to get the gradient, and subtract the product of the learning rate and the gradient from w. This is repeated over the entire dataset for the chosen number of epochs.
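For the mapping y = x^w, the squared error of a single sample is E = (yd - x^w)^2. Differentiating with respect to w (using d(x^w)/dw = x^w * ln(x)) gives dE/dw = -2*(yd - x^w)*x^w*ln(x), which expands to 2*ln(x)*x^(2w) - 2*yd*ln(x)*x^w, the exact gradient expression used in the code below.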

#function to perform gradient descent
def gradient_descent(input1, output, w, epoch):
    for i in range(epoch):
        error = 0
        for (x, yd) in zip(input1, output):
            y = x ** w                     #forward pass: y = x^w
            error += (yd - y) ** 2         #accumulate the squared error
            gradient = 2*np.log(x)*(x**(2*w)) - 2*yd*np.log(x)*(x**w)   #dE/dw
            w = w - lr * gradient          #update w
        error_history[i] = error / len(input1)   #mean squared error of this epoch
    return w

Note: The gradient descent function above is specific to the mapping function x^w. The 'y' and 'gradient' lines have to be changed if it is to be used for other mapping functions.
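For instance, if the mapping were y = e^(wx) (one of the functions mentioned earlier), the per-sample error is E = (yd - e^(wx))^2 and its derivative is dE/dw = -2*(yd - e^(wx))*x*e^(wx). A rough sketch of how the training routine could be adapted for that case is given below; the function name is just illustrative, and only the 'y' and 'gradient' lines really change:

#sketch: the same routine adapted to the mapping y = e^(w*x)
def gradient_descent_exp(input1, output, w, epoch):
    for i in range(epoch):
        for (x, yd) in zip(input1, output):
            y = np.exp(w * x)                              #forward pass for e^(w*x)
            gradient = -2 * (yd - y) * x * np.exp(w * x)   #dE/dw for this mapping
            w = w - lr * gradient                          #same update rule as before
    return w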

Perform gradient descent for the 100 epochs initialised earlier and obtain the predicted value of w:

#training the network and obtaining the learned value of w
w = gradient_descent(input1,output,w,epoch)

Now plot the mean squared error to check that it decreases after each epoch, i.e. that the model is converging.

#plotting the mean squared error to see how it decreases gradually
plt.plot(range(epoch), error_history)
plt.title("Mean Squared Error")
plt.ylabel("ERROR")
plt.xlabel("EPOCH")
plt.show()

Finally, let's check whether the predicted output matches the desired output:

#plot actual vs predicted output
plt.scatter(input1, output, label="Actual Output")
x = np.linspace(0, 1.6, 100)
plt.plot(x, x**w, color='r', label="Predicted Output")
plt.title("DESIRED vs PREDICTED OUTPUT")
plt.legend()
plt.show()
print('Predicted value of w is: ',w)
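If you also want a quick numerical sanity check (this part is not in the original post), the learned w can be evaluated directly on the training data:

#optional check: mean squared error of the learned model on the training data
predicted = [xi ** w for xi in input1]
mse = np.mean([(p - yd) ** 2 for (p, yd) in zip(predicted, output)])
print('Training MSE with learned w:', mse)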

Conclusion:

As we can see from the graph above, the predicted output closely matches the desired output. The learned value of w comes out to be about 4.765, while the dataset was created with w = 4.78. So our model has managed to learn the parameter value with good accuracy.

We can further extend the model to approximate more complex mapping functions with multiple parameters.
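As a rough illustration of that direction, here is a hypothetical sketch for a two-parameter mapping y = a * x^w, where both a and w are learned with the same kind of update (the function and parameter names are assumptions, not something from the original post):

#hypothetical sketch: gradient descent for y = a * x^w with two learnable parameters
def gradient_descent_two_params(input1, output, a, w, epoch):
    for i in range(epoch):
        for (x, yd) in zip(input1, output):
            y = a * x ** w                                     #forward pass
            grad_a = -2 * (yd - y) * (x ** w)                  #dE/da of (yd - a*x^w)^2
            grad_w = -2 * (yd - y) * a * (x ** w) * np.log(x)  #dE/dw of (yd - a*x^w)^2
            a = a - lr * grad_a
            w = w - lr * grad_w
    return a, w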

Thank you!!
