Interpreting the Probability density functions as a data scientist

6 min read · Oct 11, 2019

Random variable:

Discrete random variable: X is a discrete random variable if its range is countable, for example the outcome of a die roll.

Continuous random variable: A continuous random variable can take any value within an interval, so its possible values are uncountably many. For example, a random variable measuring the time taken to do something is continuous, since the time can be any real number in a range.

Population and sample:

A population includes all of the elements from a set of data. The mean of the population is denoted as μ.
A sample consists of one or more observations drawn from the population. The mean of the sample is denoted as X̄. If the sampling is done randomly, then it is called a random sample.

As the sample size increases, the sample mean converges to the population mean.
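A small simulation can make this convergence concrete. This is only a sketch using Python's standard library; the population parameters (MU = 50, SIGMA = 10) and the helper name sample_mean are made up for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0  # hypothetical population mean and standard deviation


def sample_mean(n):
    """Mean of n random draws from the hypothetical normal population."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))


# The gap between the sample mean and MU tends to shrink as n grows.
for n in (10, 100, 10_000, 100_000):
    x_bar = sample_mean(n)
    print(f"n={n:>7}  sample mean={x_bar:.3f}  |error|={abs(x_bar - MU):.3f}")
```

The error does not shrink monotonically on every run, but its typical size falls roughly like σ/√n.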

Depending on the sampling method (for example, sampling with replacement), a sample can have fewer observations than the population, the same number, or more. More than one sample can be drawn from the same population.

Gaussian distribution (Normal distribution):

The mean, median and mode of the distribution coincide.
The curve of the distribution is bell-shaped and symmetrical about the line x=μ.
The total area under the curve is 1.
Exactly half of the values are to the left of the center and the other half to the right.

Many continuous random variables observed in nature approximately follow a Gaussian distribution. Its probability density function is f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)).

The peak is located at the population mean μ. σ² denotes the variance of the population and decides the spread (width) of the PDF.

As x moves away from μ, y falls as the exponential of the negative squared distance, exp(−(x − μ)² / (2σ²)).
The curve is symmetric about μ.
The fall-off is therefore exponential in the square of the distance from μ.
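The three properties above (peak at μ, symmetry, and the exp-of-square fall-off) can be checked numerically. The sketch below writes the Gaussian PDF by hand and compares it with NormalDist from Python's statistics module; the function name gaussian_pdf is just an illustrative choice.

```python
import math
from statistics import NormalDist


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


mu, sigma = 0.0, 1.0
for x in (0.0, 1.0, 2.0, 3.0):
    # Moving away from mu, the height drops as exp(-(x - mu)^2 / (2 sigma^2)).
    print(f"x={x:.1f}  pdf={gaussian_pdf(x, mu, sigma):.5f}")

# Symmetry about mu: the heights at mu - d and mu + d match.
print(gaussian_pdf(-1.5), gaussian_pdf(1.5))

# Cross-check against the standard library implementation.
print(gaussian_pdf(1.0), NormalDist(mu, sigma).pdf(1.0))
```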

For the CDF, when the mean is 0, the curves for every variance pass through probability = 0.5 at x = 0.

As the variance decreases, the CDF gets steeper and approaches a vertical step at x = 0.
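Here is a quick way to see both CDF observations without a plot. It is a sketch that uses NormalDist from the standard library, and the list of σ values is an arbitrary choice for illustration.

```python
from statistics import NormalDist

# Mean fixed at 0, standard deviation shrinking.
for sigma in (2.0, 1.0, 0.5, 0.1):
    dist = NormalDist(mu=0.0, sigma=sigma)
    print(
        f"sigma={sigma:<4}"
        f" cdf(-0.2)={dist.cdf(-0.2):.3f}"
        f" cdf(0)={dist.cdf(0.0):.3f}"
        f" cdf(0.2)={dist.cdf(0.2):.3f}"
    )
```

The middle column stays at 0.5 for every σ, while the outer columns move towards 0 and 1 as σ shrinks, which is the curve turning into a step.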

68–95–99.7 rule

About 68% of the points lie within 1σ of the mean (between μ − σ and μ + σ), about 95% lie within 2σ, and about 99.7% lie within 3σ.
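These percentages come straight from the normal CDF. A minimal sketch, assuming a standard normal and using a helper named fraction_within (my own name, not from any library):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.4f}")
```

Because the rule is stated in units of σ, the same fractions hold for any N(μ, σ²).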

Symmetric distribution, Skewness, and Kurtosis:

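A symmetric distribution is a type of distribution where the left side of the distribution mirrors the right side. By definition, a symmetric distribution is never a skewed distribution. A skewed distribution has one tail longer than the other: a longer right tail is positive (right) skew and a longer left tail is negative (left) skew.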

Kurtosis is often described as measuring the peakedness of a distribution; more precisely, it reflects how heavy the tails are.
The mean is affected by outliers.

Compared with the normal curve (excess kurtosis = 0), a curve that peaks above it has positive excess kurtosis and a curve that stays below it has negative excess kurtosis.
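To put numbers on skewness and kurtosis, here is a rough sketch that computes them from central moments. It assumes the plain population-moment formulas (no small-sample bias correction), and the two tiny datasets are invented for illustration.

```python
import statistics


def central_moment(data, order):
    """Average of (x - mean)^order over the data."""
    mean = statistics.fmean(data)
    return statistics.fmean((x - mean) ** order for x in data)


def skewness(data):
    """Third standardized moment: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


symmetric = [1, 2, 3, 4, 5]
with_outlier = [1, 1, 1, 1, 10]  # a single outlier on the right

for name, data in (("symmetric", symmetric), ("with_outlier", with_outlier)):
    print(
        f"{name}: mean={statistics.fmean(data):.2f}"
        f" median={statistics.median(data):.2f}"
        f" skewness={skewness(data):.3f}"
        f" excess kurtosis={excess_kurtosis(data):.3f}"
    )
```

The outlier drags the mean away from the median and pushes the skewness above zero, which is the "mean is affected by outliers" point in action.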

Standard normal variate:

Given points (X1, X2, X3, X4, ...) from a distribution N(μ, σ²) with mean μ and variance σ², you can standardize each point as Z = (X − μ) / σ to convert it into a standard normal variate Z ~ N(0, 1).

After standardization, you can simply say that about 68% of the points lie between −1 and +1, and about 95% of the points lie between −2 and +2.
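A short sketch of standardization on simulated data. The values MU = 170 and SIGMA = 8 are hypothetical, and the code standardizes with the sample's own mean and standard deviation, which is what you usually have in practice.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 170.0, 8.0  # hypothetical population parameters
xs = [random.gauss(MU, SIGMA) for _ in range(10_000)]

mean = statistics.fmean(xs)
std = statistics.pstdev(xs)

# Z = (X - mean) / std puts the data on the N(0, 1) scale.
zs = [(x - mean) / std for x in xs]

within_1 = sum(-1 <= z <= 1 for z in zs) / len(zs)
within_2 = sum(-2 <= z <= 2 for z in zs) / len(zs)

print(f"mean of z = {statistics.fmean(zs):.6f}, std of z = {statistics.pstdev(zs):.6f}")
print(f"fraction in [-1, 1] = {within_1:.3f}, fraction in [-2, 2] = {within_2:.3f}")
```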

Kernel density estimation:

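Used to estimate a smooth PDF from the data, which you can think of as a smoothed version of the histogram.

Place a kernel (for example, a small Gaussian) on each data point. At any x, take the heights of all the individual kernels at that x and sum them; the sum, scaled so that the total area is 1, is the height of the estimated distribution.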
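The kernel-summing idea can be sketched in a few lines. This is a bare-bones version with a Gaussian kernel; the data points and the bandwidth of 0.5 are assumptions picked for illustration, and real libraries choose the bandwidth more carefully.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel shape."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, bandwidth):
    """Sum the kernel heights at x, then scale so the total area is 1."""
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in data)
    return total / (len(data) * bandwidth)


data = [1.0, 1.5, 2.0, 2.2, 4.0, 5.5, 6.0]  # hypothetical observations
bandwidth = 0.5  # assumed kernel width

for x in (0.0, 2.0, 3.0, 4.0, 6.0):
    print(f"x={x:.1f}  estimated density={kde(x, data, bandwidth):.4f}")
```

The estimate is highest near x = 2, where several points cluster, and dips in the gap around x = 3.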

Kernel density estimation

In statistics, kernel density estimation ( KDE) is a non-parametric way to estimate the probability density function of…

en.wikipedia.org

Sampling distribution & Central Limit theorem:

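CLT: Take many random samples of size n from a population with mean μ and finite variance σ², and compute the mean of each sample. For large n, these sample means are approximately normally distributed with mean μ and variance σ²/n. The population itself can follow any distribution, as long as its mean and variance are finite.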
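A simulation helps here. The sketch below draws samples from an exponential population (clearly not Gaussian, with mean 1 and standard deviation 1) and looks at the distribution of the sample means. The sample size N = 50 and the number of samples are arbitrary choices.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

N = 50               # size of each sample
NUM_SAMPLES = 5_000  # how many samples we draw


def one_sample_mean(n):
    """Mean of n draws from an exponential population with mean 1 and std 1."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


sample_means = [one_sample_mean(N) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.pstdev(sample_means)

print(f"mean of sample means = {mean_of_means:.4f} (population mean is 1)")
print(f"std of sample means  = {std_of_means:.4f} (sigma / sqrt(N) = {1 / math.sqrt(N):.4f})")
```

The individual sample means are not equal to μ, but they cluster around it with spread close to σ/√n.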

Quantile-Quantile plot(Q-Q plot):

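Used to determine whether a random sample is normally distributed or not: plot the sample quantiles against the quantiles of a normal distribution, and if the points fall roughly on a straight line, the sample is approximately normal. If the number of samples is small, it is hard to interpret the Q-Q plot.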
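The points behind a Q-Q plot are easy to compute by hand. This sketch pairs sorted sample values with standard normal quantiles and, instead of plotting, uses the correlation of the pairs as a rough straightness score. The plotting position (i + 0.5) / n and the helper names are my own choices, not a fixed standard.

```python
import math
import random
from statistics import NormalDist, fmean


def normal_qq_points(data):
    """Pair each sorted observation with the matching standard normal quantile."""
    n = len(data)
    ordered = sorted(data)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def pearson_r(pairs):
    """Correlation of the Q-Q points; close to 1 means close to a straight line."""
    xs, ys = zip(*pairs)
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(3)  # fixed seed so the run is repeatable
normal_data = [random.gauss(10.0, 2.0) for _ in range(500)]
skewed_data = [random.expovariate(1.0) for _ in range(500)]

r_normal = pearson_r(normal_qq_points(normal_data))
r_skewed = pearson_r(normal_qq_points(skewed_data))
print(f"normal sample: r = {r_normal:.4f}")
print(f"skewed sample: r = {r_skewed:.4f}")
```

To get the actual plot, pass the pairs from normal_qq_points to any scatter-plot tool.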

How are distributions used?

The Gaussian distribution gives a theoretical model for the distribution of data that is observed in many natural phenomena.

Suppose we know that the data is normally distributed, X ~ N(µ, σ²), with mean µ and standard deviation σ. Knowing just µ and σ, we can draw the PDF and CDF.

The PDF and CDF tell us how the data is distributed. Two numbers are enough to draw them only because we know the distribution is Gaussian; other distributions have their own PDFs and CDFs with their own parameters.

Chebyshev’s inequality:

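Suppose we don't know the distribution, only that the mean µ is finite and the standard deviation σ is finite. We cannot draw the PDF and CDF because the distribution is unknown.

Even so, Chebyshev's inequality lets you bound the percentage of points lying in a given range: P(|X − µ| ≥ kσ) ≤ 1/k², so at least 1 − 1/k² of the points lie within k standard deviations of the mean.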
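To see the bound at work, the sketch below compares the Chebyshev guarantee, at least 1 − 1/k² of the points within k standard deviations, with what a skewed (exponential) sample actually gives. The choice of distribution and the helper names are assumptions for illustration.

```python
import random
import statistics


def chebyshev_lower_bound(k):
    """Guaranteed minimum fraction of points within k standard deviations (useful for k > 1)."""
    return 1.0 - 1.0 / (k * k)


random.seed(4)  # fixed seed so the run is repeatable
data = [random.expovariate(0.5) for _ in range(10_000)]  # skewed, non-Gaussian sample
mean = statistics.fmean(data)
std = statistics.pstdev(data)


def fraction_within(k):
    """Observed fraction of the sample within k standard deviations of its mean."""
    return sum(abs(x - mean) <= k * std for x in data) / len(data)


for k in (1.5, 2, 3):
    print(
        f"k={k}: guaranteed >= {chebyshev_lower_bound(k):.3f},"
        f" observed = {fraction_within(k):.3f}"
    )
```

The bound is loose, but it never fails, and it needs nothing beyond a finite mean and standard deviation.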

Uniform distribution:

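It is used to generate random numbers, which has a lot of applications. For a discrete random variable, the height of the probability mass function (PMF) is the probability of finding that value. For a continuous random variable, the height of the probability density function (PDF) is a density, and the probability of a range is the area under the curve over that range. For a uniform distribution, the height is the same everywhere on its range.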

Uniform distribution (continuous)

In probability theory and statistics, the continuous uniform distribution or rectangular distribution is a family of…

en.wikipedia.org

NOTE: sampling uniformly means each point has an equal chance of ending up in the sample dataset D’.
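A minimal sketch of the uniform PDF, the uniform PMF, and uniform sampling of D’ from D. The function names, the fair die, and the 100-point dataset are all made up for illustration.

```python
import random


def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]: flat height 1 / (b - a)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def discrete_uniform_pmf(value, outcomes):
    """Probability of one value when every outcome is equally likely."""
    return 1.0 / len(outcomes) if value in outcomes else 0.0


print(uniform_pdf(0.5, 0.0, 2.0))   # height of the density, not a probability
print(uniform_pdf(3.0, 0.0, 2.0))   # outside [a, b] the density is 0

die = [1, 2, 3, 4, 5, 6]
print(discrete_uniform_pmf(3, die))  # probability of rolling a 3 on a fair die

random.seed(5)  # fixed seed so the run is repeatable
dataset = list(range(100))              # hypothetical dataset D
subset = random.sample(dataset, k=10)   # D': each point is equally likely to be chosen
print(subset)
```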

Bernoulli and Binomial Distribution:

Bernoulli distribution

In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,[1]…

en.wikipedia.org

Binomial distribution

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability…

en.wikipedia.org

Log-Normal Distribution:

X is log-normally distributed if ln(X) is normally distributed. You can check this with a Q-Q plot of ln(X) against the normal distribution.

NOTE: if the data is log-normal, convert it into a Gaussian distribution by taking the log, so that you can use ML techniques that assume Gaussian data.
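The log trick is easy to try on simulated data. In this sketch the log-normal parameters MU = 0 and SIGMA = 1 are arbitrary, and random.lognormvariate from the standard library generates the sample.

```python
import math
import random
import statistics

random.seed(6)  # fixed seed so the run is repeatable

MU, SIGMA = 0.0, 1.0  # parameters of the underlying normal (chosen for illustration)
xs = [random.lognormvariate(MU, SIGMA) for _ in range(20_000)]

# Raw data is right-skewed: the mean is pulled above the median.
print(f"raw: mean={statistics.fmean(xs):.3f}  median={statistics.median(xs):.3f}")

# After taking logs the data should look like N(MU, SIGMA^2).
logs = [math.log(x) for x in xs]
print(f"log: mean={statistics.fmean(logs):.3f}  std={statistics.pstdev(logs):.3f}")
```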

Many quantities in real applications are approximately log-normal. The log-normal distribution is right-skewed, and the skew grows as we increase the σ value. Examples can be found at the link below.

Log-normal distribution

In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution of a random…

en.wikipedia.org

Power law distribution:

Power law

In statistics, a power law is a functional relationship between two quantities, where a relative change in one quantity…

en.wikipedia.org

Also known as the 80–20 rule: roughly 80% of the values are found in a 20% interval.
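One common reading of the 80–20 rule is that the largest 20% of the values hold about 80% of the total. The sketch below checks that reading on simulated Pareto data, one example of a power law. The shape ALPHA = 1.16 is an assumption (it is close to log₄5, the value usually quoted for an exact 80–20 split), and because heavy-tailed samples are noisy, the share in any one run can drift noticeably from 80%.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

ALPHA = 1.16  # assumed Pareto shape, close to the textbook 80-20 value
values = sorted(
    (random.paretovariate(ALPHA) for _ in range(100_000)),
    reverse=True,
)

# Share of the total held by the largest 20% of the values.
top_20_percent = values[: len(values) // 5]
share = sum(top_20_percent) / sum(values)
print(f"top 20% of values hold {share:.1%} of the total")
```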

Pareto distribution:

Pareto distribution

The Pareto distribution, named after the Italian civil engineer, economist, and sociologist Vilfredo Pareto, is a…

en.wikipedia.org

You can find an example in the applications section of the above link.

Box-Cox transform:

If the dataset follows a power-law/Pareto distribution, you can use the Box-Cox transform to bring it closer to a Gaussian distribution. The data must be positive.

Fitting the Box-Cox function on all the x values gives you a lambda (λ) value. Using that λ, you convert each x into y: y = (x^λ − 1) / λ when λ ≠ 0, and y = ln(x) when λ = 0.

You can also find the y values directly using the function given in the link below.

scipy.stats.boxcox - SciPy v1.3.1 Reference Guide

scipy.stats. boxcox( x, lmbda=None, alpha=None) [source] Return a positive dataset transformed by a Box-Cox power…

docs.scipy.org

With the boxcox(x) function, a single call returns both the y values, which are approximately normally distributed, and the fitted λ.
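To show what happens inside, here is a rough pure-Python sketch of the transform and of picking λ. It is not SciPy's implementation: it swaps in a simple grid search over candidate λ values, scored with the usual Box-Cox log-likelihood. The function names and the log-normal test data are my own choices for illustration.

```python
import math
import random
import statistics


def boxcox_transform(xs, lam):
    """Box-Cox transform of positive data: (x^lam - 1) / lam, or ln(x) when lam is 0."""
    if lam == 0:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1.0) / lam for x in xs]


def boxcox_log_likelihood(xs, lam):
    """Score for lam: higher means the transformed data looks more Gaussian."""
    ys = boxcox_transform(xs, lam)
    mean_y = statistics.fmean(ys)
    var_y = statistics.fmean((y - mean_y) ** 2 for y in ys)
    log_sum = sum(math.log(x) for x in xs)
    return (lam - 1.0) * log_sum - 0.5 * len(xs) * math.log(var_y)


def fit_lambda(xs, candidates):
    """Grid search: keep the candidate lam with the best score."""
    return max(candidates, key=lambda lam: boxcox_log_likelihood(xs, lam))


random.seed(8)  # fixed seed so the run is repeatable
xs = [random.lognormvariate(0.0, 1.0) for _ in range(2_000)]  # positive, right-skewed

candidates = [i / 20 for i in range(-40, 41)]  # -2.0 to 2.0 in steps of 0.05
best_lambda = fit_lambda(xs, candidates)
ys = boxcox_transform(xs, best_lambda)

print(f"best lambda on the grid = {best_lambda:.2f}")
print(f"transformed: mean={statistics.fmean(ys):.3f}  median={statistics.median(ys):.3f}")
```

For log-normal input the chosen λ should land near 0, which matches the log transform.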

Weibull distribution:

Used, for example, to decide how high a dam should be, based on rainfall data collected at one-week intervals.

It is also used to describe particle sizes.
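As a rough sketch of the dam example, assume weekly rainfall follows a Weibull distribution and ask how often it exceeds some design threshold. The scale, shape, and threshold below are invented numbers, not real rainfall data; random.weibullvariate takes the scale first and the shape second.

```python
import math
import random

SCALE, SHAPE = 40.0, 1.5  # hypothetical Weibull parameters for weekly rainfall (mm)


def weibull_cdf(x, scale, shape):
    """P(X <= x) for a Weibull distribution: 1 - exp(-(x / scale)^shape)."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


random.seed(9)  # fixed seed so the run is repeatable
weekly_rain = [random.weibullvariate(SCALE, SHAPE) for _ in range(20_000)]

threshold = 100.0  # hypothetical rainfall level the dam must handle
empirical_exceed = sum(r > threshold for r in weekly_rain) / len(weekly_rain)
model_exceed = 1.0 - weibull_cdf(threshold, SCALE, SHAPE)

print(f"simulated weeks above threshold: {empirical_exceed:.4f}")
print(f"Weibull model probability:       {model_exceed:.4f}")
```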

Weibull distribution

In probability theory and statistics, the Weibull distribution is a continuous probability distribution. It is named…

en.wikipedia.org

Written by Rana singh
