• Aucun résultat trouvé

Clustering by contrast

N/A
N/A
Protected

Academic year: 2022

Partager "Clustering by contrast"

Copied!
17
0
0

Texte intégral

(1)

Cyril CHHUN

Télécom Paris

Advisor: Jean-Louis DESSALLES

June 20, 2019

(2)

1 Introduction

2 The algorithm

3 Test results

4 Conclusion

(3)

Introduction

What are the end goals of contrast learning?

Design aclustering algorithm able to:

• understand the meaning of “small bacteria” and “small galaxy”

without going through the set of “small” objects

• produce relevant descriptions: “it’s a singer who has ten million views on Youtube”

• produce negations and explanations: “she isnot a writer”

• detect anomalies: a talking cat

• learn from a single example: a “Siamese cat”

(4)

Impossibility theorem

Which properties would we expect of a clustering algorithm?

• Scale-invariance, richness, consistency

• Kleinberg (2002)1: it is impossible to design a distance function-based clustering algorithm which verifies those three properties.

• Solution: forsake one of those properties or use non-metric functions

1Jon Kleinberg. An impossibility theorem for clustering. Advances in Neural Information Processing Systems 15, 2002.

(5)

Vocabulary

• Object: observed instance

• Prototype: mental representation of a group as a basic object

• Contrast: “difference” between two objects

• Weight: number of times a prototype has been recalled

• Deviation: acceptable range of an object’s properties compared to its prototype

• Order: real-life observations are first-order objects, contrasts are second-order objects, etc.

(6)

Design

Prototypes

How to represent prototypes?

Mean Deviation Weight

 µ1

...

µm

 σ1

...

σm

w

(7)

Design

Finding the clusters

Given objectb, how to find the best prototype a of deviation a0?

• Dimension-agnostic, scale-invariant, not density-based.

• The prametric function d(a,b)7→

m

X

j=1

1 |ajbj| a0j > θj

!

verifiesscale-invariance along any axis.

• It is not a distance, as none of the three properties are verified!

• Problem: many prototypes can verify the smallest distance.

(8)

Design

Comparing the clusters

Figure:The smaller cluster seems more reasonable: how to avoid the hub?

• We simply take the best prototypes two by two and choose the one whose mean is closer to the object along the most

dimensions. The other cluster is eliminated.

• Using this rule, we make a tournament and pick the winner.

• Deviations are not used in this step so as to avoid hubs.

(9)

Design

Updating the memory

How to stock the new information in the memory?

• The object is added as a prototype no matter what, with a deviation equal to εtimes itself and a weight of 1

• The winning cluster is updated as follows:

mean= weightprototype+object weight+ 1

deviation= weightdeviation+|prototypeobject| weight+ 1

weight =weight+ 1

• We enforce a limited memory to cope with initial errors and improve efficiency. Unused prototypes are forgotten first.

(10)

Design

Skeleton

def feed_data_online(data):

for obj in data:

closest_clusters = find_closest_clusters(obj) winner = cluster_battles(obj, closest_clusters) update_memory(obj, winner)

• Clustering: simple loop with complexity O(mem_size×n)

→ Online learning

(11)

Understanding results

• Softer clustering than k-means; different ways to classify when seeing a new object:

– Assign objectb to prototypea ifd(a,b) = 0

– Find the closest prototype to the object (by tournament for example)

(12)

Live demonstration

(13)

What about contrasts?

How to extract relevant contrasts?

• The contrast features should be meaningful, i.e.

low-dimensional and applicable between similar objects.

• Given an objectb and its closest prototype a, we extract the contrast c such that

cj = (ajbj)·1 |ajbj| aj0 > θj

!

• Example: seeing a black tomato would give a “red-to-black”

contrast.

(14)

What about contrasts?

How to stock the contrasts in memory?

• We can use the same principle! Contrast-prototypes with mean, deviation and weight.

Then, how to refine the contrasts?

• We can use the same procedure!

(15)

Second demonstration

(16)

Feedback on the checklist

3 understand the meaning of “small bacteria” and “small galaxy”

without going through the set of “small” objects

7 produce relevant descriptions: “it’s a singer who has ten million views on Youtube”

7 produce negations and explanations: “she isnot a writer”

3 detect anomalies: a talking cat

3 learn from a single example: a “Siamese cat”

(17)

Conclusion

• The algorithm is dimension-agnostic and verifies scale-invariance

• It learns on-the-fly and has a reasonable complexity (linear on average)

• Designed to be used on relatively high-level datasets

• Contrasts still need testing: some inconsistent results can appear

Références

Documents relatifs

The Logistic regression classifier and its hyper-parameters were chosen with a grid search but are identical to the study we conducted for the gender classification and language

We built a miniature model of a production line that implements such a sys- tem using modern standards for distributed industrial control systems, IEC 61499 and OPC-UA.. We show

Together, this should result in an interaction of compatibility and required speed: In the compatible case (i.e. when the subject and partner perform movements

We assessed (i) the acceptability and feasibility of the devises, software, and setup in a clinical setting using a qualitative approach with an observer listing all significant

The latter vanish within a consistent mean field approximation when both normal and anomalous pairings are retained.. Such a conclusion is puzzling, since it amounts to

Jiang, “enhancing the security of mobile applications by using tee and (u)sim,” in Proceedings of the 2013 IEEE 10th International Con- ference on Ubiquitous Intelligence

From these data, we can observe that the water gas shift reaction depends on the inlet alcohol mole fraction, a rise of X ethanol leads to a decrease of y,

Displaying websites in groups is known, in net-art circles, as a "speed show." Using the speed show format, nine unique but similar websites of international origin will