t-SNE visualization of the high-dimensional MNIST dataset
t-SNE stands for t-distributed Stochastic Neighbor Embedding. PCA is a simple, older technique, but nowadays t-SNE is widely used: in cases where PCA has limitations, t-SNE can be used instead. PCA preserves the global structure of the data, while t-SNE preserves the local structure. t-SNE is an iterative algorithm: at every iteration, it tries to reach a better solution.
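As a minimal sketch of this difference (the toy data and parameter values here are made up for illustration, assuming scikit-learn is installed), both techniques can be run on the same data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Toy data: two well-separated Gaussian blobs in 50 dimensions
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 50), rng.randn(50, 50) + 5.0])

# PCA: a one-shot linear projection that preserves global variance directions
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: an iterative, non-linear embedding that preserves local neighborhoods
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)  # -> (100, 2) (100, 2)
```

Both produce a 2-D map of the same 100 points, but t-SNE gets there by iteratively refining the layout rather than by a single linear projection.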
Embedding
Embedding means taking every point in a high-dimensional space and placing it into a low-dimensional space so that neighborhood distances are preserved: if X1 and X2 are neighbors in the high-dimensional space, their images X1' and X2' should be about the same distance apart in the low-dimensional space. It gives no guarantee about points that are not neighbors. t-SNE tries to keep neighboring points as close as possible and non-neighboring points as far away as possible.
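This neighborhood-preservation idea can be checked numerically. The sketch below (toy random data; scikit-learn's `NearestNeighbors` is used only as a measuring tool, it is not part of t-SNE) measures how many of each point's high-dimensional nearest neighbors survive in the 2-D embedding:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = rng.randn(200, 30)  # 200 toy points in 30 dimensions

X_emb = TSNE(n_components=2, random_state=0).fit_transform(X)

# For each point, find its k nearest neighbours before and after embedding
# (k + 1 because each point is its own nearest neighbour)
k = 10
_, idx_high = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
_, idx_low = NearestNeighbors(n_neighbors=k + 1).fit(X_emb).kneighbors(X_emb)

# Fraction of high-dimensional neighbours kept in the low-dimensional map
overlap = np.mean([
    len(set(idx_high[i, 1:]) & set(idx_low[i, 1:])) / k
    for i in range(len(X))
])
print(overlap)
```

A high overlap means the local structure was preserved; by contrast, distances to far-away, non-neighboring points carry no such guarantee.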
Crowding problem
Sometimes it is impossible to preserve the distances in all neighborhoods; this is called the crowding problem, and the heavy-tailed t-distribution is what t-SNE uses to mitigate it. Example: suppose you have four neighboring points X1, X2, X3, X4 at the corners of a square, each one unit from the next in sequence. When they are projected into a lower dimension (2-D to 1-D), the points must be laid out along a line, so the distance between the first and last point becomes 3 units even though it was originally 1 unit. Some neighborhood distance cannot be preserved: that is the crowding problem.
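The square example can be verified with a few lines of arithmetic (a toy illustration of the geometry, not t-SNE itself): laying the four corners out on a line in sequence necessarily stretches the distance between the first and last point.

```python
import numpy as np

# Four corners of a unit square: each corner is 1 unit from its two neighbours
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

# A naive 2-D -> 1-D projection that keeps the points in sequence on a line
line = np.array([0.0, 1.0, 2.0, 3.0])

# X1 and X4 were adjacent (distance 1) in 2-D...
print(np.linalg.norm(square[0] - square[3]))  # -> 1.0

# ...but end up 3 units apart on the line: some neighbour distance must break
print(abs(line[0] - line[3]))  # -> 3.0
```

No 1-D arrangement can keep all four pairwise neighbor distances at 1 unit, which is exactly why a heavier-tailed distribution in the low-dimensional space helps.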
The two most important parameters of t-SNE
1. Perplexity: roughly, the number of neighboring points whose distances t-SNE tries to preserve in the low-dimensional space.
2. Step size: essentially the number of optimization iterations; at every iteration, t-SNE tries to reach a better solution.
Note: when the perplexity is small, say 2, only about 2 neighboring points' distances are preserved in the low-dimensional space, and the result looks crazy. When the perplexity is very high, say 100, you will get a mess. So try multiple perplexity values, keeping perplexity < number of data points, and run for a long time until the shape stabilizes.
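Sweeping several perplexity values is easy to script. The sketch below (toy blob data; the perplexity values echo the ones discussed above and are all below the number of data points) simply re-runs t-SNE per value so the resulting maps can be compared:

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy data: three blobs of 60 points each in 10 dimensions (180 points total)
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(60, 10) + c for c in (0.0, 6.0, 12.0)])

# Re-run t-SNE for several perplexity values, all < number of data points
for perplexity in (2, 30, 100):
    emb = TSNE(n_components=2, perplexity=perplexity,
               random_state=0).fit_transform(X)
    print(perplexity, emb.shape)
```

In practice you would plot each `emb` and judge which perplexities give a stable, sensible layout rather than trusting any single run.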
Apply T-SNE on MNIST dataset
MNIST is 784-dimensional data, which t-SNE projects down to 2-D.
# TSNE
# https://distill.pub/2016/misread-tsne/
import numpy as np
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Picking the top 1000 points as TSNE takes a lot of time for 15K points
data_1000 = standardized_data[0:1000, :]
labels_1000 = labels[0:1000]

# Configuring the parameters:
# the number of components = 2
# default perplexity = 30
# default learning rate = 200
# default maximum number of iterations for the optimization = 1000
model = TSNE(n_components=2, random_state=0)
tsne_data = model.fit_transform(data_1000)

# Creating a new data frame which helps us in plotting the result
tsne_data = np.vstack((tsne_data.T, labels_1000)).T
tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "label"))

# Plotting the result of tsne
sn.FacetGrid(tsne_df, hue="label", height=6).map(plt.scatter, "Dim_1", "Dim_2").add_legend()
plt.show()
The above exercise was done on only 1000 points for demonstration purposes, but t-SNE also gives good results on larger datasets. In the plot above, a different cluster can be seen for each label. Another useful step is to try a range of perplexities and step sizes, and to rerun many times before drawing a final conclusion.
t-SNE groups data on a visual basis: points of the same color form one cluster in the low-dimensional space, and points that are grouped together are close to each other.
Notes:
- It tends to expand dense clusters and contract sparse ones, so nothing can be concluded about cluster sizes in a t-SNE plot. t-SNE also does not preserve the distances between clusters when projecting into the low-dimensional space.
- Always try multiple perplexities and step sizes and, most importantly, rerun before drawing conclusions.
===================================
Visualization using PCA is explained in my other blog, which can be found at the link below (https://medium.com/@ranasinghiitkgp/principal-component-analysis-pca-with-code-on-mnist-dataset-da7de0d07c22).