t-distributed Stochastic Neighbor Embedding (t-SNE)

Academic year: 2022

Juliette & Thomas

Master 2 Économiste d'entreprise

April 17, 2020

Contents

1 What is t-SNE

2 How t-SNE works

3 Limits of t-SNE

What is t-SNE

An unsupervised non-linear technique

t-SNE is a fairly recent algorithm, developed in 2008 by Laurens van der Maaten and Geoffrey Hinton.

It is a dimensionality reduction technique that maps high-dimensional data to a low-dimensional space, typically two or three dimensions.

Application fields

t-SNE is commonly used in the following areas:

- Medicine
- Biology
- Signal processing
- Recognition

The following example shows a t-SNE representation applied to image recognition:

Dataset: MNIST

Difference from PCA

PCA is a linear algorithm, so it cannot capture complex non-linear (e.g., polynomial) relationships between features and may therefore lead to poor visualizations.

PCA is a linear dimensionality reduction technique that seeks to maximize variance and preserves large pairwise distances. However, it does not try to minimize the intra-group variance.

t-SNE differs from PCA by preserving only small pairwise distances, that is, local similarities.
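To make the "linear" part of this comparison concrete, here is a minimal NumPy sketch of PCA computed through an SVD. The function name `pca_project` and the synthetic data are assumptions chosen for illustration; the point is only that the projection is a single matrix product and that the first component carries the most variance.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (a linear map)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    return X_centered @ components.T, components

rng = np.random.default_rng(0)
# Hypothetical data: 200 points in 5 dimensions with a different spread per axis.
X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
Z, components = pca_project(X, n_components=2)
print(Z.shape)        # shape of the 2-D projection
print(Z.var(axis=0))  # variance captured along each principal component
```

Because the whole mapping is one matrix product, any structure that is not visible along a straight direction in the original space cannot be unfolded by it.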

Machine learning trade-off

t-SNE behaves more like a black box: interpretability is set aside in favor of the quality of the representation.

Credit: https://urlz.fr/crCP

How t-SNE works

Step 1

We first describe how the SNE algorithm works.

Stochastic Neighbor Embedding (SNE) starts by converting the high-dimensional Euclidean distances between datapoints into conditional probabilities that represent similarities.

We define the similarity of datapoint $x_j$ to datapoint $x_i$ as:

$$P_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$

where $P_{j|i}$ is the conditional probability that $x_i$ would pick $x_j$ as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at $x_i$, and $\sigma_i^2$ is the variance of that Gaussian.
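The conditional probabilities of Step 1 can be sketched in a few lines of NumPy. This is an illustration only: the function name `conditional_p` and the three example points are assumptions, and the bandwidths $\sigma_i$ are simply passed in as given (the original algorithm tunes one $\sigma_i$ per point).

```python
import numpy as np

def conditional_p(X, sigma):
    """Return the matrix P with P[i, j] = P_{j|i} for a Gaussian kernel.

    X     : (n, d) array of high-dimensional points
    sigma : (n,) array with one bandwidth sigma_i per point (assumed given)
    """
    # Squared Euclidean distances ||x_i - x_j||^2 for all pairs.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -sq_dist / (2.0 * sigma[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)            # exclude k = i from the sum
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

# Hypothetical example: two close points and one distant point.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
P = conditional_p(X, sigma=np.ones(3))
print(P.round(3))
```

Each row of `P` sums to one, and nearby points receive most of the probability mass. The matrix is generally not symmetric, since each row is normalized around its own point.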

Consider the following example

Credit: https://www.youtube.com/watch?v=NEaUSP4YerM&t=587s

Step 2

For the low-dimensional counterparts $y_i$ and $y_j$ of the high-dimensional datapoints $x_i$ and $x_j$, it is possible to compute a similar conditional probability, which we denote by:

$$Q_{j|i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)}$$

Logically, the conditional probabilities $P_{j|i}$ and $Q_{j|i}$ must be equal for the low-dimensional map to represent the similarities between datapoints perfectly: the difference between them must be zero for the high-dimensional structure to be replicated exactly in low dimensions.

By this logic, SNE attempts to minimize the mismatch between these conditional probabilities, which it measures with the Kullback-Leibler divergence.
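The following sketch computes the low-dimensional probabilities of Step 2 and the quantity SNE minimizes, the sum of Kullback-Leibler divergences between the rows of P and Q. The names `conditional_q` and `sne_cost` and both layouts are assumptions. To keep the block self-contained, the target `P` is built from a reference layout rather than from real high-dimensional data.

```python
import numpy as np

def conditional_q(Y):
    """Q[i, j] = Q_{j|i} computed from low-dimensional points Y (no sigma)."""
    sq_dist = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    logits = -sq_dist
    np.fill_diagonal(logits, -np.inf)            # exclude k = i from the sum
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    Q = np.exp(logits)
    return Q / Q.sum(axis=1, keepdims=True)

def sne_cost(P, Q, eps=1e-12):
    """Sum over i of KL(P_i || Q_i); it is zero when P and Q agree."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Stand-in target similarities, built from a reference layout (illustration only).
Y_ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
P = conditional_q(Y_ref)

# A layout that distorts the neighborhoods gives a strictly positive cost.
Y_bad = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 0.5]])
print(sne_cost(P, conditional_q(Y_ref)))
print(sne_cost(P, conditional_q(Y_bad)))
```

In the full algorithm this cost is reduced by moving the points $y_i$, so the second value would shrink over the iterations.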

Step 3

This is where t-SNE differs from SNE. To compute the similarity between datapoints in the low-dimensional space, the algorithm uses a Student t-distribution (with one degree of freedom) instead of a Gaussian. Its heavier tails let dissimilar points sit farther apart in the map, which allows a higher inter-group variance.

Credit: https://www.youtube.com/watch?v=NEaUSP4YerM&t=587s
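A short NumPy sketch of the Step 3 change: the low-dimensional similarities are computed with a Student t kernel with one degree of freedom, normalized over all pairs as in the t-SNE paper. The function name `student_t_q` and the example layout are assumptions. The loop then compares the two kernels at growing distances to show the heavier tail.

```python
import numpy as np

def student_t_q(Y):
    """Joint similarities q_ij using a Student t kernel (one degree of freedom)."""
    sq_dist = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    kernel = 1.0 / (1.0 + sq_dist)
    np.fill_diagonal(kernel, 0.0)   # a point is not its own neighbor
    return kernel / kernel.sum()    # normalize over all pairs

# Hypothetical 2-D layout.
Y = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 3.0]])
Q = student_t_q(Y)

# Tail comparison: unnormalized kernel values at increasing map distances.
distances = np.array([0.5, 1.0, 2.0, 4.0])
gaussian = np.exp(-distances ** 2)
student = 1.0 / (1.0 + distances ** 2)
for d, g, s in zip(distances, gaussian, student):
    print(f"d={d:.1f}  gaussian={g:.2e}  student_t={s:.2e}")
```

At large distances the Student t value stays orders of magnitude above the Gaussian one, so moderately dissimilar points can be placed far apart in the map without a large penalty.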

Step 4

Credit: https://www.youtube.com/watch?v=NEaUSP4YerM&t=587s

Limits of t-SNE

Caveat emptor

It is important to analyze the performance of t-SNE. The algorithm computes pairwise conditional probabilities and tries to minimize the sum of the differences between the probabilities in the high-dimensional and low-dimensional spaces.

This involves a large number of computations, so the algorithm is quite heavy on system resources. That is why it is recommended to use a dataset with fewer than 10,000 points.
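A back-of-the-envelope calculation shows where the resource cost comes from. The sketch assumes an exact implementation that stores one dense n × n matrix of 64-bit floats for the pairwise probabilities; the helper name `pairwise_matrix_megabytes` is hypothetical.

```python
def pairwise_matrix_megabytes(n_points, bytes_per_value=8):
    """Memory needed to store one dense n x n matrix of float64 values, in MB."""
    return n_points ** 2 * bytes_per_value / 1e6

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} points -> {pairwise_matrix_megabytes(n):>10,.0f} MB")
```

Under these assumptions a single matrix already takes about 800 MB at 10,000 points, and the cost grows a hundredfold for every tenfold increase in the number of points.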

As already discussed, t-SNE gives an output that is not easily interpretable. It may be efficient to use the output of t-SNE as the input of an unsupervised classification algorithm.
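As an illustration of feeding an embedding into an unsupervised classifier, the sketch below runs a plain k-means on 2-D points. Several parts are assumptions: k-means is only one possible choice of algorithm, the `kmeans` helper is written here for the example, and the `embedding` array is synthetic data standing in for a real t-SNE output.

```python
import numpy as np

def kmeans(Y, k, n_iter=50, seed=0):
    """Plain k-means on embedded points Y of shape (n, 2)."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        sq_dist = np.sum((Y[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = sq_dist.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Y[labels == c].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
# Stand-in for a t-SNE output: two well-separated 2-D groups (hypothetical data).
embedding = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(30, 2)),
    rng.normal(loc=(20.0, 0.0), scale=0.5, size=(30, 2)),
])
labels, centers = kmeans(embedding, k=2)
print(np.bincount(labels))  # number of points assigned to each cluster
```

The resulting labels depend on the embedding they were computed from, so they inherit its limits in interpretability.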

Sources

https://distill.pub/2016/misread-tsne/

https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/

https://www.youtube.com/watch?v=NEaUSP4YerM&t=587s
