Slides

(1)

Multi-Scale Synthesis of Large-Scale Traces

Aggregation/disaggregation process

Robin Lamarche-Perrin¹³, Lucas Schnorr¹², Jean-Marc Vincent¹²

¹ Laboratoire LIG, Université Joseph Fourier [email protected] ² MESCAL INRIA/LIG Team ³ MAGMA LIG Team

2012 December 10

(2)

Outline

1 Context

2 Measure

3 Dynamic aggregation

4 Experiments

5 Future Work

(4)

Comprehensible Representation

Space explosion

(5)

Comprehensible Representation (2)

[Figure: execution trace along the time axis, divided into phase 1, phase 2 and 3, and phase 4; labels: Server, TraderClient, thread, JVM]

(6)

Folding information

(7)

Folding information (2)

(8)

Aggregation/Clustering

Data clustering approach

- Similarity of objects ⇒ distance function; semantics of the function
- Many methods (k-means, hierarchical, ...)
- Level of clustering

Aggregation approach

- External information ⇒ hierarchy, topology, ...
- Information loss estimation

(9)

Objective

Goal 1

Provide a measure of the quality of partial aggregations

Goal 2

Provide an interactive synthetic representation of large-scale data with partial multi-level aggregations

(11)

Aggregation

P (values per process):

Cluster A: 12 10 10 12 11 14 11 12 9

Cluster B: 5 5 17 2 13 6 20 19 13

Aggregate: 100

Normalized Agg.: 11 11 11 11 11 11 11 11 11

(figure labels: Q', Q)

Quality estimate of an aggregation function

Goal

- comparison of aggregations: criteria
- composition: dynamic aggregation process
- semantics: related to an extra structure
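To make the aggregation step concrete, here is a minimal Python sketch using the Cluster A and Cluster B values from this slide. The function name `normalized_aggregate` and the use of max minus min as a rough heterogeneity hint are illustrative assumptions, not part of the original material.

```python
# Values per process, as read from the Aggregation slide.
cluster_a = [12, 10, 10, 12, 11, 14, 11, 12, 9]
cluster_b = [5, 5, 17, 2, 13, 6, 20, 19, 13]

def normalized_aggregate(values):
    """Sum the values, then spread the total evenly over the members."""
    total = sum(values)
    return total, total / len(values)

total_a, mean_a = normalized_aggregate(cluster_a)
total_b, mean_b = normalized_aggregate(cluster_b)

# The spread (max - min) disappears once the cluster is aggregated;
# it is used here only as a crude hint of heterogeneity.
spread_a = max(cluster_a) - min(cluster_a)
spread_b = max(cluster_b) - min(cluster_b)

print("Cluster A:", total_a, round(mean_a, 1), spread_a)
print("Cluster B:", total_b, round(mean_b, 1), spread_b)
```

Both clusters reduce to roughly 11 per process, although Cluster B is much less homogeneous than Cluster A. This is the kind of information loss that a quality estimate of the aggregation is meant to expose.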

(14)

Entropy : Measure of Homogeneity/Disorder

$$H = -\sum_k \frac{|s_k|}{|S|} \log_2 \frac{|s_k|}{|S|} = \sum_k p_k \log_2 \frac{1}{p_k} \qquad (1)$$

[Figure: Entropy for a two-state system; x-axis: proportion of state 1, p; y-axis: bitrate]

Quantity of information to code the system
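As a small illustration of the two-state case, the sketch below evaluates the entropy in bits for a few proportions p. The function name `binary_entropy` is an assumption made for this example.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-state system where state 1 has proportion p."""
    if p in (0.0, 1.0):
        return 0.0  # deterministic system: the 0 * log2(1/0) term is taken as 0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"p = {p:.1f}  H = {binary_entropy(p):.3f} bits")
```

The curve is symmetric around p = 0.5, where it reaches its maximum of 1 bit, and it falls to 0 when the system is deterministic.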

(15)

Entropy Properties

Characteristics

- H ≥ 0, H(p) = 0 ⇒ deterministic system
- H(p) ≤ log₂ n, H(p) = log₂ n ⇒ uniform system
- Independence property
- Conditioning

Entropy Gain

G = H_micro − H_macro

- G ≥ 0
- G = 0 (no aggregation or deterministic micro-system)
- maximal if one aggregate
- Composition property
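The gain can be sketched in a few lines of Python. This assumes that H_macro is the entropy of the aggregated distribution, where the probability of an aggregate is the sum of its members' probabilities; the slides do not spell this out, and the names `entropy` and `entropy_gain` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def entropy_gain(probs, partition):
    """G = H_micro - H_macro for a partition given as lists of indices.

    Assumption: the probability of an aggregate is the sum of its members.
    """
    macro = [sum(probs[i] for i in part) for part in partition]
    return entropy(probs) - entropy(macro)

probs = [0.1, 0.2, 0.3, 0.4]
no_aggregation = [[0], [1], [2], [3]]
two_aggregates = [[0, 1], [2, 3]]
one_aggregate = [[0, 1, 2, 3]]

for partition in (no_aggregation, two_aggregates, one_aggregate):
    print(partition, round(entropy_gain(probs, partition), 3))
```

Under this assumption the gain is zero without aggregation and reaches H_micro when everything is merged into a single aggregate, matching the properties listed above.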

(16)

Divergence

$$D(p_{micro} \| p_{macro}) = \sum_k p_{micro}(k) \log_2 \frac{p_{micro}(k)}{p_{macro}(k)}$$

Uniform distribution on the aggregate:

$$p_{macro}(x) = \frac{1}{|A(x)|} \sum_{k \in A(x)} p(k)$$

- D = 0 (no aggregation or uniform distribution)
- D_max = H_Uniform − H_micro

Quantity of information to re-code the system
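The following Python sketch computes this divergence for the Cluster A and Cluster B values shown on the Aggregation slide. It assumes the usual Kullback-Leibler orientation, log₂(p_micro / p_macro), which keeps D non-negative; the helper names `divergence` and `one_aggregate` are hypothetical.

```python
import math

def divergence(values, partition):
    """D(p_micro || p_macro) in bits, with p_macro uniform inside each aggregate."""
    total = sum(values)
    d = 0.0
    for part in partition:
        # p_macro: the aggregate's mass spread evenly over its members
        p_macro = sum(values[i] for i in part) / len(part) / total
        for i in part:
            p_micro = values[i] / total
            if p_micro > 0:
                d += p_micro * math.log2(p_micro / p_macro)
    return d

def one_aggregate(values):
    """Partition that merges every element into a single aggregate."""
    return [list(range(len(values)))]

cluster_a = [12, 10, 10, 12, 11, 14, 11, 12, 9]
cluster_b = [5, 5, 17, 2, 13, 6, 20, 19, 13]

print("D(A) =", divergence(cluster_a, one_aggregate(cluster_a)))
print("D(B) =", divergence(cluster_b, one_aggregate(cluster_b)))
```

Cluster B, being more heterogeneous, yields a larger divergence than Cluster A, so replacing it by its normalized aggregate loses more information.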

(18)

Dynamic multi-level aggregation

Combination Entropy Gain/Divergence

Trade-off between aggregation and quantity of information

RIC = G − D (Relative Information Criterion)

Parametrized Information Criterion

PRIC = pG − (1 − p)D

- p = 0: no aggregation
- p = 1: maximal aggregation
- Evolution as a function of p
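One way to use PRIC on a hierarchy is sketched below in Python. It is only a sketch under stated assumptions: the hierarchy is a nested list whose leaves are numeric values, gain and divergence are computed per aggregate and summed over the partition, and a node is aggregated only when its own score strictly beats the best score of its children. The names `leaves`, `gain_and_divergence` and `best_partition` are hypothetical and do not come from the slides or from Triva.

```python
import math

def leaves(node):
    """Flat list of leaf values under a node (a number or a list of nodes)."""
    if isinstance(node, (int, float)):
        return [node]
    return [v for child in node for v in leaves(child)]

def gain_and_divergence(values, total):
    """Entropy gain and divergence (bits) of merging `values` into one aggregate."""
    probs = [v / total for v in values if v > 0]
    mass = sum(probs)
    if mass == 0:
        return 0.0, 0.0
    uniform = mass / len(values)  # p_macro inside the aggregate
    gain = sum(p * math.log2(mass / p) for p in probs)
    div = sum(p * math.log2(p / uniform) for p in probs)
    return gain, div

def best_partition(node, total, p):
    """Return (score, parts): best PRIC score and the chosen aggregates."""
    values = leaves(node)
    if isinstance(node, (int, float)):
        return 0.0, [values]  # a leaf has neither gain nor divergence
    gain, div = gain_and_divergence(values, total)
    own = p * gain - (1 - p) * div
    results = [best_partition(child, total, p) for child in node]
    split = sum(score for score, _ in results)
    if own > split:  # aggregate only on a strict improvement
        return own, [values]
    return split, [part for _, parts in results for part in parts]

# Two-level hierarchy built from the Cluster A / Cluster B values.
hierarchy = [
    [12, 10, 10, 12, 11, 14, 11, 12, 9],
    [5, 5, 17, 2, 13, 6, 20, 19, 13],
]
total = sum(leaves(hierarchy))
for p in (0.0, 0.5, 1.0):
    score, parts = best_partition(hierarchy, total, p)
    print(f"p = {p:.1f}: {len(parts)} aggregate(s), PRIC = {score:.3f}")
```

With p = 0 nothing is aggregated and with p = 1 the whole hierarchy collapses into a single aggregate; intermediate values of p select partial aggregations between these two extremes.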

(19)

Which level of aggregation should be considered?

Which part of the hierarchy should be displayed?

TRIVA Project

Aggregation and visualization of distributed systems

(20)

Processes

TRIVA Project

Multi-level aggregation: Triva Application/Demo

(21)

Machines

Processes

TRIVA Project

(22)

Clusters

Machines

Processes

TRIVA Project

Multi-level aggregation: Triva Application/Demo

(23)

Clusters

Machines

Processes

TRIVA Project

?

(25)

Aggregations within a Hierarchy

(26)

Experiments

A Hierarchy: Site (5) - Cluster (9) - Machine (188) - Process (188)

A.1 Cluster level

A.2 Site level

A.3 Full aggregation

B Ratio Gain/Loss with P = 10%

C Ratio Gain/Loss with P = 40%

Scenario with 188 processes, grouped by 9 clusters and 5 sites (Treemaps A, A.1, A.2, and A.3) and with two values of P (Treemaps B and C); when the ratio gain/loss is 10% (treemap B), everything is aggregated but the

(27)

Experiments

A Hierarchy: Cluster (3) - Machine (50) - Process (433)

A.1 Machine level

A.2 Cluster level

A.3 Full aggregation

B Ratio Gain/Loss with P = 10%

C Ratio Gain/Loss with P = 30%

Scenario with 433 processes, grouped by 50 machines and 3 clusters (treemaps A, A.1, A.2, and A.3) and with two values of P (treemaps B and C);

(28)

Experiments

A Hierarchy: Site (10) - Super-Cluster (100) - Cluster (1000) - Machine (10000) - Process (1000000)

B with P = 10%

[Figure panels: A.1, A.2, A.3, B.1, B.2, B.3, B.4]

Synthetic scenario with 1 million processes, grouped by 10000 machines, 1000 clusters, 100 super-clusters and 10 sites; treemap A shows the aggregated behavior of all processes for each machine; treemap B is configured with a gain/loss ratio of 10%, highlighting the heterogeneous

(30)

Future Work

Modeling

- qualitative state → quantitative state
- node aggregation → flow aggregation
- integration of spatial/temporal aggregation

Analysis tool

- Visualization of aggregation quality
- Statistical tests (significance)

Algorithms

- optimal aggregation (structure impact)
- dynamics of aggregates

Thank you very much for all your attention and collaboration
