Sigmoid Neuron Model and Gradient Descent, with Sample Code
The limitation of the perceptron model is its very harsh change in output (a binary output), which requires linearly separable data. In most real-life cases, however, we need a continuous output, so we propose the sigmoid neuron model.
The function shown above is the sigmoid function: it takes the weighted linear input and produces a smooth, continuous output (the red line), sigma(w*x + b) = 1 / (1 + e^-(w*x + b)).
Here the red line is the output of the sigmoid model and the blue line is the output of the perceptron model. The output always lies in [0, 1], irrespective of the number of inputs; as the weighted sum changes, the output moves smoothly along the red line.
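A minimal sketch that reproduces the figure described above (the plotting range is illustrative):
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-10, 10, 200)              #the weighted sum w*x + b
sigmoid_out = 1 / (1 + np.exp(-z))         #smooth sigmoid output (red)
perceptron_out = (z >= 0).astype(float)    #harsh step output (blue)

plt.plot(z, sigmoid_out, 'r', label='sigmoid neuron')
plt.plot(z, perceptron_out, 'b', label='perceptron')
plt.xlabel('w*x + b')
plt.ylabel('output')
plt.legend()
plt.show()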
The sigmoid model can be used for both regression and classification problems. For regression, the output of the sigmoid function is taken directly as the predicted y value; for classification, we first predict with the sigmoid function and then choose a threshold that maps the predicted y values to classes. The threshold can be 0.5, the mean of the predicted y values, or anything else depending on the problem.
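For example, a minimal sketch of thresholding (the y_pred values and the 0.5 cutoff are illustrative):
import numpy as np

y_pred = np.array([0.1, 0.4, 0.7, 0.9])      #hypothetical sigmoid outputs
threshold = 0.5                              #could also be np.mean(y_pred), etc.
y_class = (y_pred >= threshold).astype(int)  #array([0, 0, 1, 1])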
Loss function
Cross-entropy loss or squared-error loss is used in the case of the sigmoid neuron model.
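As a minimal NumPy sketch (the function names are mine; cross-entropy assumes binary labels y in {0, 1}):
import numpy as np

def squared_error(y, y_pred):
    #mean squared error between targets and sigmoid outputs
    return np.mean((y - y_pred) ** 2)

def cross_entropy(y, y_pred, eps=1e-12):
    #binary cross-entropy; clip predictions to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))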
Learning algorithm (the mathematics behind gradient descent)
By changing the values of the weight (w) and bias (b), you get a family of sigmoid functions.
We start with random values of w and b, compute the loss, then update w and b, and repeat. Initially you might start with the worst sigmoid function, but as the updates to w and b happen, it approaches the optimal sigmoid function.
Nowadays this is built into PyTorch/TensorFlow, which automatically compute the gradients and find the w and b that minimise the loss.
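For illustration, a minimal PyTorch sketch of one such automatic update (toy scalar data; the 0.1 learning rate is illustrative):
import torch

x = torch.tensor([0.5])
y = torch.tensor([1.0])
w = torch.randn(1, requires_grad=True)   #random initial weight
b = torch.zeros(1, requires_grad=True)

y_pred = torch.sigmoid(w * x + b)        #forward pass
loss = ((y_pred - y) ** 2).mean()        #squared-error loss
loss.backward()                          #autograd fills w.grad and b.grad

with torch.no_grad():                    #one gradient-descent step
    w -= 0.1 * w.grad
    b -= 0.1 * b.grad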
To know how to change each pair (w, b) so that the loss decreases, we need to find grad-w (the gradient of the loss with respect to w) and grad-b (the gradient with respect to b).
Here we use the Taylor series to approximate the loss after the (w, b) update; the newly calculated loss should be less than the previous one. Our objective is to choose the update to (w, b) such that the second (first-order) term of the Taylor approximation is negative.
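To make that concrete, here is the first-order Taylor expansion written out (a standard identity, restated because the original equation figure is not reproduced here), with theta = (w, b), step size eta, and update direction u:

L(\theta + \eta u) \approx L(\theta) + \eta\, u^{\top} \nabla_{\theta} L(\theta)

Choosing u = -\nabla_{\theta} L(\theta) makes the second term -\eta \|\nabla_{\theta} L(\theta)\|^2, which is negative whenever the gradient is non-zero, so the updated loss is (to first order) smaller than the previous one. This yields the update rule w := w - eta * grad-w and b := b - eta * grad-b.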
How to calculate the gradients
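For the squared-error loss L = (sigma(w x + b) - y)^2, the chain rule together with sigma'(z) = sigma(z)(1 - sigma(z)) gives (dropping the constant factor 2, as the code below also does):

\frac{\partial L}{\partial w} = (\hat{y} - y)\,\hat{y}\,(1 - \hat{y})\,x, \qquad \frac{\partial L}{\partial b} = (\hat{y} - y)\,\hat{y}\,(1 - \hat{y})

where y-hat = sigma(w x + b). These are exactly the grad_w and grad_b methods in the code below.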
The overall learning algorithm is given below: initialise w and b randomly, then iterate until satisfied, updating w := w - eta * grad-w and b := b - eta * grad-b each step (a minimal code sketch follows the list).
"Iterate until satisfied" can mean stopping:
- after a pre-decided number of iterations
- when the loss falls below a pre-decided value
- when the change in (w, b) between iterations is negligible
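A minimal single-input NumPy sketch of this loop (the data points, learning rate, and iteration count are illustrative, not from the original):
import numpy as np

w, b, eta = np.random.randn(), 0.0, 0.5   #random init; illustrative learning rate
X = np.array([0.5, 2.5])                  #hypothetical 1-D inputs
Y = np.array([0.2, 0.9])                  #hypothetical targets
for i in range(1000):                     #pre-decided number of iterations
    dw, db = 0.0, 0.0
    for x, y in zip(X, Y):                #accumulate gradients over the data
        y_pred = 1 / (1 + np.exp(-(w * x + b)))
        dw += (y_pred - y) * y_pred * (1 - y_pred) * x
        db += (y_pred - y) * y_pred * (1 - y_pred)
    w -= eta * dw                         #step against the gradient
    b -= eta * db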
Example: the multi-variable case
#importing libraries
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
import matplotlib.colors
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error
from tqdm import tqdm_notebook

#sigmoid function for two input variables
def sigmoid(x1, x2, w1, w2, b):
    return 1/(1 + np.exp(-(w1*x1 + w2*x2 + b)))
#calculate the squared-error loss over the dataset
def calculate_loss(X, Y, w1_est, w2_est, b_est):
    loss = 0
    for x, y in zip(X, Y):
        loss += (y - sigmoid(x[0], x[1], w1_est, w2_est, b_est))**2
    return loss
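Since mplot3d is imported above, here is a hedged sketch of what it can be used for: plotting the loss surface of calculate_loss over a grid of (w1, w2) values with b fixed at 0 (the data values below are hypothetical):
X = np.array([[0.5, 2.5], [1.0, 1.5]])   #hypothetical inputs (two features)
Y = np.array([0.2, 0.9])                 #hypothetical targets
W1, W2 = np.meshgrid(np.linspace(-4, 4, 50), np.linspace(-4, 4, 50))
L = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        L[i, j] = calculate_loss(X, Y, W1[i, j], W2[i, j], 0)
ax = plt.axes(projection='3d')
ax.plot_surface(W1, W2, L, cmap='viridis')
ax.set_xlabel('w1')
ax.set_ylabel('w2')
ax.set_zlabel('loss')
plt.show()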
class SigmoidNeuron:
    def __init__(self):
        self.w = None
        self.b = None
    def perceptron(self, x):
        #weighted sum of inputs: w.x + b
        return np.dot(x, self.w.T) + self.b
    def sigmoid(self, x):
        return 1.0/(1.0 + np.exp(-x))
    def grad_w(self, x, y):
        #gradient of the squared-error loss w.r.t. w
        y_pred = self.sigmoid(self.perceptron(x))
        return (y_pred - y) * y_pred * (1 - y_pred) * x
    def grad_b(self, x, y):
        #gradient of the squared-error loss w.r.t. b
        y_pred = self.sigmoid(self.perceptron(x))
        return (y_pred - y) * y_pred * (1 - y_pred)
    def fit(self, X, Y, epochs=1, learning_rate=1, initialise=True, display_loss=False):
        #initialise w, b
        if initialise:
            self.w = np.random.randn(1, X.shape[1])
            self.b = 0
        if display_loss:
            loss = {}
        for i in tqdm_notebook(range(epochs), total=epochs, unit="epoch"):
            dw = 0
            db = 0
            for x, y in zip(X, Y):
                #accumulate gradients over the whole dataset
                dw += self.grad_w(x, y)
                db += self.grad_b(x, y)
            #step against the accumulated gradient
            self.w -= learning_rate * dw
            self.b -= learning_rate * db
            if display_loss:
                Y_pred = self.sigmoid(self.perceptron(X))
                loss[i] = mean_squared_error(Y_pred, Y)
        if display_loss:
            plt.plot(list(loss.values()))
            plt.xlabel('Epochs')
            plt.ylabel('Mean Squared Error')
            plt.show()
    def predict(self, X):
        Y_pred = []
        for x in X:
            y_pred = self.sigmoid(self.perceptron(x))
            Y_pred.append(y_pred)
        return np.array(Y_pred)
Evaluation
- Accuracy in the case of a classification problem
- RMSE in the case of a regression problem
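A hedged end-to-end sketch using the class and imports above (the toy dataset and the 0.5 classification threshold are my assumptions, not from the original):
#hypothetical toy data: 100 points, 2 features, binary labels
X_data = np.random.randn(100, 2)
Y_data = (X_data[:, 0] + X_data[:, 1] > 0).astype(float)
X_train, X_test, Y_train, Y_test = train_test_split(
    X_data, Y_data, stratify=Y_data, random_state=0)
scaler = StandardScaler()                 #standardise the features
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

sn = SigmoidNeuron()
sn.fit(X_train, Y_train, epochs=500, learning_rate=0.05, display_loss=True)
Y_pred = sn.predict(X_test).ravel()

print('RMSE:', np.sqrt(mean_squared_error(Y_test, Y_pred)))              #regression view
print('Accuracy:', accuracy_score(Y_test, (Y_pred >= 0.5).astype(int)))  #classification view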
Detailed code is given below in the GitHub link.
References
- Wikipedia
- Deep Learning by One Fourth Labs (special thanks)