Mathematics behind Linear Regression with code (Ordinary Least Squares, Linear Least Squares)
Linear regression is a regression technique that predicts a real value: the algorithm tries to find the line (or, in higher dimensions, the plane) that fits the data points as well as possible.
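In standard notation, for a single input feature this best-fit line is:

y_i = w * x_i + b

where w is the slope (weight) and b is the intercept (bias).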
The same equation can be extended to a d-dimensional dataset:
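y_i = W^T x_i + b = Σ_{j=1}^{d} w_j x_ij + b

Here W is a d-dimensional weight vector, x_i is a d-dimensional data point, and b is the bias term.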
So, we need to find the plane (W, b) in the above equation that best fits the data points.
Cost function:
Since the error can be positive or negative (the predicted value y_iPred can lie above or below the plane/line), we square the error for each x_i to keep it positive.
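That is, for each point the squared error is:

error_i = (y_i - y_iPred)^2, where y_iPred = W^T x_i + b.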
The squared-error function follows a parabolic curve, so the error (Y-axis) is always non-negative.
We need to minimize the average of these squared errors over all the data points, which is referred to as the MSE (Mean Squared Error).
And the cost function of linear regression is:
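(W*, b*) = argmin_{W, b} (1/n) Σ_{i=1}^{n} (y_i - (W^T x_i + b))^2

i.e., we seek the (W, b) that minimize the MSE over all n training points.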
We use an optimizer to compute the values of W and b that minimize the above cost function. A gradient descent optimizer can be used to find the plane for which the mean squared error is minimum.
Gradient Descent optimizer:
To find the plane (W, b), we want the error to be as small as possible. Gradient descent is an iterative method for reaching the minimum error: at each step we compute the gradient of the function and move in the opposite direction.
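Starting from an initial guess x_0, each iteration takes a step against the gradient:

x_{k+1} = x_k - lr * f'(x_k)

where lr is the learning rate.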
A smaller learning rate can get you closer to the minimum but takes more iterations to converge; a larger learning rate converges sooner but risks overshooting the minimum. In general a cost function can be non-convex, in which case you could settle in a local minimum, but for linear regression the cost function is always convex.
Steps to follow for the gradient descent method:
- Initialize weight vector and bias term.
- Find the derivative of the function with respect to weight and bias.
- Update the weight and bias. With each iteration, the weight vector and bias term are updated, moving them toward the minimum.
Step #1: Initialize weight vector and bias term:
Initialize the weight vector and bias term with random values. The dimension of the weight vector equals the number of features in the dataset (d), i.e., one weight per feature.
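A minimal NumPy sketch of this step (the toy dataset X and the seeding are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))    # toy dataset: 100 points in d = 3 dimensions

d = X.shape[1]                   # weight vector has one entry per feature
W = rng.normal(size=d) * 0.01    # small random initial weights
b = 0.0                          # bias term (zero or a random value both work)
```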
Step #2: Find the derivative of the function with respect to weight and bias:
The two equations below represent the derivative of the cost function with respect to the weight vector and the bias term, respectively.
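Writing the cost as L(W, b) = (1/n) Σ_{i=1}^{n} (y_i - (W^T x_i + b))^2, the gradients are:

∂L/∂W = -(2/n) Σ_{i=1}^{n} x_i (y_i - (W^T x_i + b))

∂L/∂b = -(2/n) Σ_{i=1}^{n} (y_i - (W^T x_i + b))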
Step #3: Update the weight and bias:
The weight vector and bias term are updated at each iteration according to the equations below. 'lr' denotes the learning rate, which controls how large each update step is.
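W := W - lr * (∂L/∂W)

b := b - lr * (∂L/∂b)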
Code implementation:
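A minimal from-scratch sketch of the three steps above in NumPy; the toy dataset, learning rate, and iteration count here are illustrative choices, not prescriptions:

```python
import numpy as np

def linear_regression_gd(X, y, lr=0.01, n_iters=1000):
    """Fit y ~ X @ W + b by gradient descent on the MSE cost."""
    n, d = X.shape
    W = np.zeros(d)                      # Step 1: initialize weights and bias
    b = 0.0
    for _ in range(n_iters):
        y_pred = X @ W + b               # predictions for the current plane
        error = y - y_pred
        dW = -(2.0 / n) * (X.T @ error)  # Step 2: gradient w.r.t. weights
        db = -(2.0 / n) * error.sum()    # Step 2: gradient w.r.t. bias
        W -= lr * dW                     # Step 3: update weights and bias
        b -= lr * db
    return W, b

# Toy example: y = 3*x1 - 2*x2 + 1 plus a little noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1 + rng.normal(scale=0.1, size=200)
W, b = linear_regression_gd(X, y)
print(W, b)   # should approach [3, -2] and 1
```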
Detailed linear regression code (scikit-learn implementation):
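A minimal scikit-learn sketch: LinearRegression solves ordinary least squares in closed form, while SGDRegressor with squared-error loss is the gradient-descent-style estimator. The synthetic dataset and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative synthetic data: y = 2*x1 - 1*x2 + 0.5*x3 + 4 plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4 + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Closed-form ordinary least squares
ols = LinearRegression().fit(X_train, y_train)
print("OLS W:", ols.coef_, "b:", ols.intercept_)
print("OLS test MSE:", mean_squared_error(y_test, ols.predict(X_test)))

# Gradient-descent-style estimator (squared-error loss = linear regression)
sgd = SGDRegressor(loss="squared_error", max_iter=1000, random_state=0)
sgd.fit(X_train, y_train)
print("SGD W:", sgd.coef_, "b:", sgd.intercept_)
print("SGD test MSE:", mean_squared_error(y_test, sgd.predict(X_test)))
```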
See also: Need for regularization in Linear Regression (YouTube)
Note: imbalanced data is not an issue in linear regression, since the target is a real value and there are no class labels.
Reference:
AppliedAI