https://freakonometrics.hypotheses.org/9593

Generalized linear models (GLM) with application

Rana singh
3 min read · Aug 17, 2022

Learning GLMs lets you understand how probability distributions can be used as building blocks for modeling. I assume you are familiar with linear regression and the normal distribution.

Gaussian Naive Bayes (GNB) theory can be found at the location below

This is the list of probability distributions and their canonical link functions.

  • Normal distribution: identity function
  • Poisson distribution: log function
  • Binomial distribution: logit function
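To make the list above concrete, here is a minimal sketch of each canonical link together with its inverse, written in plain numpy. The dictionary `LINKS` and its layout are assumptions made for illustration, not part of any library.

```python
import numpy as np

# Canonical link g(mu) and its inverse g^{-1}(eta) for each distribution.
# LINKS is an illustrative container, not a library API.
LINKS = {
    "normal": (lambda mu: mu, lambda eta: eta),           # identity
    "poisson": (np.log, np.exp),                          # log
    "binomial": (lambda mu: np.log(mu / (1.0 - mu)),      # logit
                 lambda eta: 1.0 / (1.0 + np.exp(-eta))),
}

# Values in (0, 1) are valid means for all three families.
mu = np.array([0.1, 0.5, 0.9])

for name, (link, inverse) in LINKS.items():
    eta = link(mu)  # mean mapped onto the linear-predictor scale
    print(name, eta, inverse(eta))
```

The link maps the mean of the distribution onto the scale of the linear predictor, and the inverse link maps it back, so applying one after the other returns the original mean.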

The advantage of statistical modeling is that you can build whatever model fits your data well, by choosing a suitable distribution and link function.

Various link functions are implemented in statsmodels. However, if you need to use more complex link functions, you have to write models yourself. For this purpose, probabilistic programming frameworks such as Stan, PyMC3 and TensorFlow Probability would be a good choice.

Find the code at the location below

Linear regression

Linear regression is used to predict the value of a continuous variable y from a linear combination of explanatory variables X.

In the univariate case, linear regression can be expressed as follows
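In the usual textbook notation (the symbols β0, β1, ε and σ are assumed here), the univariate model reads:

```latex
y = \beta_0 + \beta_1 x + \varepsilon,
\qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)
```

Equivalently, y given x is normally distributed with mean β0 + β1 x and variance σ², which is the GLM view with a normal distribution and the identity link.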

Notice that this model assumes a normal distribution for the noise term. The model can be illustrated as follows
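As a small illustration, the sketch below simulates data from a straight line with normal noise and recovers the coefficients by ordinary least squares, which is the maximum-likelihood estimate under that noise assumption. The data are synthetic and the values of `true_b0`, `true_b1` and `sigma` are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed for repeatability
x = np.linspace(0, 10, 200)
true_b0, true_b1, sigma = 1.0, 2.0, 0.5   # assumed values for the demo
y = true_b0 + true_b1 * x + rng.normal(0.0, sigma, size=x.size)

# Design matrix with an intercept column, then solve least squares.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("intercept, slope:", beta_hat)
```

The estimates should land close to the values used to simulate the data, with the gap shrinking as the sample grows.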

Poisson regression

The Poisson distribution is used to model count data. It has only one parameter, which is both the mean and the variance of the distribution, so the standard deviation is the square root of the mean. This means the larger the mean, the larger the standard deviation.
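A quick simulation makes this relationship visible. The sketch below draws Poisson samples at a few rates (the rates are arbitrary choices for the demo) and compares the sample mean, variance and standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

summary = {}
for lam in (1.0, 4.0, 25.0):    # illustrative rates, not from the article
    draws = rng.poisson(lam, size=100_000)
    summary[lam] = (draws.mean(), draws.var(), draws.std())
    print(f"lambda={lam:5.1f}  mean={draws.mean():.3f}  "
          f"var={draws.var():.3f}  sd={draws.std():.3f}")
```

For each rate the sample mean and sample variance should both be close to the rate itself, and the standard deviation close to its square root.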

Now, let’s apply Poisson regression to our data. The result should look like this.

The prediction curve is exponential because the inverse of the log link function is the exponential function. It also follows that the Poisson mean, obtained by exponentiating the linear predictor, is guaranteed to be positive.
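Written out for a single explanatory variable (the symbols λ, β0 and β1 are assumed notation), the model is:

```latex
y \sim \mathrm{Poisson}(\lambda),
\qquad \log \lambda = \beta_0 + \beta_1 x
\quad\Longleftrightarrow\quad
\lambda = \exp(\beta_0 + \beta_1 x) > 0
```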

The code for Poisson regression
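For readers without the original code at hand, here is a minimal, self-contained sketch of a Poisson regression fit. It is a hand-rolled Newton (IRLS) loop in numpy on synthetic data, not the statsmodels call; the function name `fit_poisson` and the values in `true_beta` are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2, 300)
X = np.column_stack([np.ones_like(x), x])   # intercept + one feature
true_beta = np.array([0.5, 1.2])            # assumed values for the demo
y = rng.poisson(np.exp(X @ true_beta))      # synthetic count data

def fit_poisson(X, y, n_iter=25):
    """Fit a log-link Poisson GLM by Newton's method (IRLS)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())              # start at the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)               # inverse link, always positive
        score = X.T @ (y - mu)              # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])      # Fisher information matrix
        beta = beta + np.linalg.solve(info, score)
    return beta

beta_hat = fit_poisson(X, y)
mu_hat = np.exp(X @ beta_hat)               # fitted means (prediction curve)
print("estimated coefficients:", beta_hat)
```

Plotting `mu_hat` against `x` gives the exponential prediction curve, and every fitted mean is positive by construction.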

The magenta curve is the prediction by Poisson regression.

Logistic regression

If you use the logit function as the link function and the binomial / Bernoulli distribution as the probability distribution, the model is called logistic regression.
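In the univariate case, with p the success probability (the symbols p, β0 and β1 are assumed notation), the model can be written as a pair of equivalent equations:

```latex
\log\frac{p}{1 - p} = \beta_0 + \beta_1 x
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + \exp\bigl(-(\beta_0 + \beta_1 x)\bigr)}
```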

The right-hand side of the second equation is called the logistic function, which is why this model is called logistic regression. The logistic function is the inverse of the logit link. Because it returns values strictly between 0 and 1 for arbitrary inputs, it always yields a valid probability for the binomial distribution.
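The sketch below fits a logistic regression to synthetic Bernoulli data using a plain Newton loop in numpy, and shows that the fitted probabilities stay inside (0, 1). The helper `logistic`, the sample size and the values in `true_beta` are assumptions made for the demo, not the article's original code.

```python
import numpy as np

def logistic(eta):
    """Inverse of the logit link: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 2000)
X = np.column_stack([np.ones_like(x), x])     # intercept + one feature
true_beta = np.array([-0.5, 1.5])             # assumed values for the demo
y = rng.binomial(1, logistic(X @ true_beta))  # synthetic 0/1 outcomes

beta_hat = np.zeros(2)
for _ in range(25):                           # Newton's method (IRLS)
    p = logistic(X @ beta_hat)
    score = X.T @ (y - p)                     # gradient of the log-likelihood
    info = X.T @ (X * (p * (1 - p))[:, None]) # Fisher information matrix
    beta_hat = beta_hat + np.linalg.solve(info, score)

p_hat = logistic(X @ beta_hat)                # fitted probabilities
print("estimated coefficients:", beta_hat)
```

However large or small the linear predictor becomes, `p_hat` stays between 0 and 1.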

Reference:

http://cs229.stanford.edu/notes/cs229-notes1.pdf
