

  1. Intro
  2. Types (filter methods, wrapper methods, embedded methods, hybrid methods): information gain / chi-square / correlation / MAD / stepwise / logistic regression / random forest
  3. Genetic algorithm for feature selection

Feature selection is the process of reducing the number of input variables when developing a predictive model. Adding redundant variables reduces the generalization capability of the model and may also reduce the overall accuracy of a classifier. It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in some cases, to improve the performance of the model.

The goal of feature selection in machine learning is to find the best set of features that…
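A minimal sketch of the filter-method idea from the outline above: score each feature independently of any model and keep the top-k. Here the score is absolute Pearson correlation with the target (the data and function name are made up for illustration); a real pipeline would typically use something like scikit-learn's SelectKBest.

```python
import numpy as np

def select_top_k_by_correlation(X, y, k):
    """Return indices of the k features most correlated with y."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 2] + 0.1 * rng.normal(size=100)   # target driven mostly by feature 2

print(select_top_k_by_correlation(X, y, 2))
```

Because filter scores ignore feature interactions, wrapper or embedded methods can still find better subsets at a higher computational cost.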

FE for ML

Feature engineering, also known as feature creation, is the process of constructing new features from existing data to train a machine learning model. Typically, feature engineering is a drawn-out manual process, relying on domain knowledge, intuition, and data manipulation. This process can be extremely tedious, and the final features will be limited both by human subjectivity and time. Automated feature engineering aims to help the data scientist by automatically creating many candidate features out of a dataset, from which the best can be selected and used for training.

Fortunately, featuretools is exactly the solution we are looking for. This open-source…
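To make the idea concrete, here is a hand-rolled analogue of the kind of aggregation features that featuretools' Deep Feature Synthesis generates automatically: rolling up a child table (transactions) into candidate features on a parent table (customers). The table and column names are invented for this sketch.

```python
import pandas as pd

# Toy child table: several transactions per customer.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 20.0, 5.0, 7.0, 9.0],
})

# Aggregate the child table into per-customer candidate features.
features = transactions.groupby("customer_id")["amount"].agg(
    total_amount="sum", mean_amount="mean", n_transactions="count"
).reset_index()

print(features)
```

featuretools applies transformations like these (and stacks them) across all related tables automatically, producing far more candidates than one would write by hand.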



  1. Introduction
  2. Train our own word embedding (code)
  3. Phrases (bigrams)
  4. t-SNE visualizations in 2D
  5. Retrain GloVe vectors on top of my own data


Word embedding is one of the most popular representations of document vocabulary. It is capable of capturing the context of a word in a document, semantic and syntactic similarity, relation with other words, etc.

Word2Vec is one of the most popular techniques to learn word embeddings using a shallow neural network. It was developed by Tomas Mikolov in 2013 at Google.
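To illustrate the data side of Word2Vec's skip-gram architecture, the sketch below turns a sentence into (center, context) training pairs within a window. A real implementation (e.g. gensim's Word2Vec) then learns embeddings by training a shallow network to predict context words from center words; the function here is only a toy.

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) pairs for skip-gram training."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
```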

After playing around with GloVe, you will quickly find that certain words in your training data…


The biggest problem with deep learning is overfitting. A deep neural network has many hidden layers, and therefore many trainable weights; with so many parameters, overfitting is the biggest risk, so you always have to regularize. In classical machine learning we extensively use L1 and L2 regularization to avoid overfitting. Let's discuss some of the parameters used in deep learning to control overfitting.


  1. Basic MLP terminology explained
  2. Application on MNIST data using Keras
  3. Hyperparameter tuning (sklearn/hyperopt)

1. Dropout layers & Regularization:

Dropout is a general concept used for regularization.

Dropout rate: it is basically the probability that a neuron is inactive (dropped out) in a given…
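A minimal NumPy sketch of inverted dropout, the variant modern frameworks such as Keras use internally: at training time each activation is zeroed with probability `rate`, and survivors are scaled by 1/(1-rate) so the expected activation is unchanged and no extra scaling is needed at inference.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
a = np.ones((4, 5))
print(dropout(a, rate=0.5, rng=rng))   # entries are either 0.0 or 2.0
```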



Ottonova is a consumer-centric, "digital-first" insurance company based in Munich, Germany, that provides health insurance for patients and employees. The company develops an AI-based chat app platform where customers can register and access insurance coverage plans for medical services. It offers automated, AI-based concierge services such as doctor consultations, treatment plans, medical data storage, and prescription management, with the help of big-data processing. Beyond concierge services, it also provides a health- and fitness-policy management application through which customers can get the maximum benefit from their coverage plans. They deal with digital products that set standards in terms of…



  1. Problem definition and solution requirements
  2. Datasets + code
  3. Keyword and sentence vectors and data structure
  4. High-level design architecture
  5. Docker containerization and Elasticsearch installation [setup]
  6. Index data [code]
  7. Search [code]
  8. Deployment [code]
  9. Extensions to the solution

1. Problem definition and solution requirements

We want to build a simple search engine that, given a repository of questions, returns the most relevant answers in decreasing order of relevance.
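Before bringing in Elasticsearch, the requirement can be sketched as a toy ranker: score each stored question by word overlap with the query and return results in decreasing order of score. Production search replaces this with inverted indexes and TF-IDF/BM25 scoring; the data here is invented.

```python
def rank(query, questions):
    """Return questions sorted by decreasing word overlap with the query."""
    q = set(query.lower().split())
    scored = [(len(q & set(doc.lower().split())), doc) for doc in questions]
    return [doc for score, doc in sorted(scored, key=lambda t: -t[0])]

questions = [
    "how to install docker",
    "what is elastic search",
    "how to bake bread",
]
print(rank("install docker", questions))
```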


What are APIs?

In very simple terms, an API call is a function call. Put another way, you are calling a function that most likely lives on a different box, which is nothing but a server. The output dataset comes back in either JSON or XML format.
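For instance, an HTTP API might answer a request with a JSON body like the one below, which the client then parses into a data structure (the fields here are made up for illustration):

```python
import json

# A JSON payload such as a server might return from an API call.
response_body = '{"status": "ok", "data": {"user": "alice", "id": 7}}'

payload = json.loads(response_body)
print(payload["data"]["user"])   # access fields like any nested dict
```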


“If I had my life to live over again, I would have made a rule to read some poetry and listen to some music at least once every week.”― Charles Darwin

Life exists on the sharp-edged wire of the guitar. Once you jump, its echoes can be heard with immense, intangible pleasure. Let's explore this intangible pleasure…

Music is nothing but a sequence of notes (events). Here the input to the model is a sequence of notes.

Some examples of music generated using RNNs are shown below.

Music Representation:

  1. Sheet music
  2. ABC notation: a sequence of characters, which is very simple…
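Because ABC notation is just a character sequence, preparing it for an RNN reduces to mapping each character to an integer index over a vocabulary. The melody below is a made-up toy, not real ABC:

```python
# A toy melody as a character sequence (simplified ABC-style notation).
notes = list("CDEC CDEC EFG EFG")

# Build a character vocabulary and map each event to its integer index.
vocab = sorted(set(notes))
to_index = {ch: i for i, ch in enumerate(vocab)}

encoded = [to_index[ch] for ch in notes]
print(encoded[:8])
```

The RNN is then trained on these index sequences to predict the next event, and sampling from it yields new sequences that decode back to notation.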


Random variable:

Discrete random variable: X is a discrete random variable if its range is countable.

Continuous random variable: a continuous random variable is one whose values can take infinitely many values. For example, a random variable measuring the time taken for something to be done is continuous, since there are infinitely many possible durations it could take.

Population and sample:

  • A population includes all of the elements from a set of data. Mean of the population is denoted as μ.
  • A sample consists of one or more observations drawn from the population. The mean of the sample is…
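The distinction can be shown with Python's statistics module on a made-up population: the population mean μ uses every element, while a sample mean uses only the observations drawn.

```python
import statistics

population = [2, 4, 6, 8, 10, 12]
mu = statistics.mean(population)    # population mean, denoted μ

sample = population[:3]             # one possible sample of three observations
sample_mean = statistics.mean(sample)

print(mu, sample_mean)
```

Different samples give different sample means, which is why the sample mean is only an estimate of μ.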

Rana Singh

Leadership belief / Analyst (AI)
