Toward Autonomic Grids: Analyzing the Job Flow with Affinity Streaming

Xiangliang Zhang, Cyril Furtlehner

Julien Perez, Cécile Germain-Renaud, Michèle Sebag

TAO team−INRIA CNRS

Université de Paris-Sud, F-91405 Orsay Cedex, France

29 June 2009

Motivation: Autonomic Computing

G-StrAP: Data Streaming for Jobs

Results: Multi-scale Monitoring

Conclusion and Perspective

Motivations of Autonomic Computing

Goals of Autonomic Computing

AUTONOMIC VISION & MANIFESTO http://www.research.ibm.com/autonomic/manifesto/

Self-managing system with the ability of

Self-healing: detect, diagnose and repair problems

Self-configuring: automatically incorporate and configure components

Self-optimizing: ensure the optimal functioning wrt defined requirements

Self-protecting: anticipate and defend against security breaches

Data Mining for Autonomic Computing

Autonomic Grid Computing System

G-StrAP: relies on Affinity Propagation (AP)

Affinity Propagation (AP) [Frey2007]

statistical physics algorithm for clustering (based on message passing)

a cluster = an exemplar (akin to k-centers)

the model = set of {exemplar, frequency}

Why AP ?

Traceability: real jobs as exemplars

because of categorical variables, e.g., userid, queue name, etc.

No prior knowledge of K, the number of clusters

Quasi-optimality w.r.t. information loss

-> stability [Meila2006]
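
A minimal sketch of the message-passing scheme behind AP, following the responsibility/availability updates published in [Frey2007]. The similarity (negative squared distance on toy 1-D points), the median preference, the damping value and the iteration count are assumptions chosen for illustration; real job descriptions mix categorical and numeric fields and would need their own similarity.

```python
import statistics


def affinity_propagation(S, damping=0.5, iterations=200):
    """Cluster items from a square similarity matrix S (list of lists).

    S[k][k] holds the preference of item k for being an exemplar.
    Returns (exemplars, labels); labels[i] is the exemplar index of item i.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(v for k, v in enumerate(scores) if k != best)
            for k in range(n):
                rival = runner_up if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i,k) <- min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) <- sum over i' != k of max(0, r(i',k))
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            support[k] = R[k][k]  # the self-responsibility enters unclipped
            total = sum(support)
            for i in range(n):
                new = total - R[k][k] if i == k else min(0.0, total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # An item is an exemplar when its self-responsibility plus
    # self-availability is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [None] * n
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Toy data (hypothetical): two well-separated groups of 1-D points.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
S = [[-(a - b) ** 2 for b in points] for a in points]
preference = statistics.median(
    S[i][k] for i in range(len(points)) for k in range(len(points)) if i != k)
for k in range(len(points)):
    S[k][k] = preference

exemplars, labels = affinity_propagation(S)
print("exemplars:", [points[k] for k in exemplars])
print("labels:", labels)
```

Each exemplar is an actual data item, which is what gives the traceability noted above; K is never passed in and emerges from the preference value.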

From AP to Large-scale Data Streaming

1 SCALABILITY: from $O(N^2 \log N)$ to $O(N^{\frac{h+2}{h+1}})$

Hierarchical Affinity Propagation

negligible information loss (proof in the paper)
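
A small worked computation of the complexity exponent, assuming the exponent on this slide reads (h+2)/(h+1) and that h denotes the number of hierarchy levels (the slide does not define h). The function name `hi_ap_exponent` is hypothetical.

```python
from fractions import Fraction


def hi_ap_exponent(h):
    """Exponent of N in O(N^((h+2)/(h+1))), assuming h hierarchy levels."""
    return Fraction(h + 2, h + 1)


# The exponent can be written 1 + 1/(h+1), so it approaches 1 as h grows.
exponents = {h: hi_ap_exponent(h) for h in (1, 2, 3, 9)}
for h, e in exponents.items():
    print(f"h={h}: exponent {e} = 1 + {e - 1}")
```

Under this reading, deeper hierarchies bring the cost closer to linear in N.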

From AP to Large-scale Data Streaming

2 Non-stationary distribution

various Virtual Organizations; number and expertise of users

Streaming AP (StrAP)
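
A rough sketch of one streaming step, assuming the model is a list of {exemplar, frequency} pairs and that jobs too far from every exemplar are set aside in a reservoir. The field-mismatch distance, the threshold name `epsilon` and the job tuples are hypothetical and only illustrate the idea; they are not taken from the slides.

```python
def mismatch_distance(job_a, job_b):
    """Number of fields on which two job descriptions differ."""
    return sum(1 for x, y in zip(job_a, job_b) if x != y)


def stream_step(model, reservoir, job, distance, epsilon):
    """Process one incoming job.

    model is a list of [exemplar, frequency] pairs and is updated in place.
    Returns the index of the matched exemplar, or None if the job was sent
    to the reservoir.
    """
    nearest = min(range(len(model)), key=lambda k: distance(job, model[k][0]))
    if distance(job, model[nearest][0]) <= epsilon:
        model[nearest][1] += 1  # the job is absorbed by its nearest exemplar
        return nearest
    reservoir.append(job)  # too far from every exemplar: keep it aside
    return None


# Hypothetical job descriptions: (user id, queue name, site).
model = [[("u1", "short", "siteA"), 0], [("u2", "long", "siteB"), 0]]
reservoir = []
stream = [
    ("u1", "short", "siteA"),
    ("u1", "short", "siteB"),
    ("u3", "medium", "siteC"),
    ("u2", "long", "siteB"),
]
for job in stream:
    stream_step(model, reservoir, job, mismatch_distance, epsilon=1)

print("frequencies:", [count for _, count in model])
print("reservoir:", reservoir)
```

When the distribution drifts, the reservoir fills with jobs the current exemplars do not explain, which is the signal a change detection test can monitor before the model is rebuilt.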

Non-stationary distribution, continued

Page-Hinkley statistic

(Cumulative-Sum-like test) [Page54,Hinkley70]

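[Figure: $p_t$, $\bar{p}_t$, $m_t$ and $M_t$ plotted against time $t$ (0 to 1000)]

$p_t$: changing distribution

$\bar{p}_t = \frac{1}{t} \sum_{\ell=1}^{t} p_\ell$

$m_t = \sum_{\ell=1}^{t} (p_\ell - \bar{p}_\ell + \delta)$

$M_t = \max\{m_\ell,\ \ell = 1, \ldots, t\}$

$PH_t = M_t - m_t$

If $PH_t > \lambda$, a change is detected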
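
A minimal sketch of the Page-Hinkley test as defined by the quantities on this slide (running mean, cumulative sum m_t, running maximum M_t, and PH_t = M_t - m_t). The stream, delta and lam values below are made up for illustration.

```python
def page_hinkley(stream, delta, lam):
    """Return the 1-based time step at which PH_t first exceeds lam, or None."""
    total = 0.0        # running sum of p_1 .. p_t, used for the mean
    m = 0.0            # m_t = sum of (p_l - mean_l + delta)
    M = float("-inf")  # M_t = max of m_1 .. m_t
    for t, p in enumerate(stream, start=1):
        total += p
        mean = total / t
        m += p - mean + delta
        M = max(M, m)
        if M - m > lam:  # PH_t > lambda: change detected
            return t
    return None


# Toy stream (hypothetical): the observed value drops from 1.0 to 0.0 at t = 51.
stream = [1.0] * 50 + [0.0] * 50
change_at = page_hinkley(stream, delta=0.05, lam=2.0)
print("change detected at t =", change_at)
```

With the signs written this way the statistic reacts to a drop of p_t below its running mean; a smaller lam fires sooner but raises more false alarms, which is why the choice of the threshold matters.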

How to set λ?

Self-adaptive change detection test

Self-adapt λ

≡ An optimization problem

BIC: $F_\lambda = \frac{1}{|C|} \sum_{i=1}^{|C|} \frac{1}{n_i} \sum_{e_j \in C_i} d(e_j, e_i^*) + \phi \frac{\rho}{2} \log N + \eta O_t$

$\propto$ loss + size of model + percentage of outliers

OPTIMIZATION:

$\epsilon$-greedy search from a finite set of $\lambda$ values: $\lambda = \arg\min\{E(F_\lambda)\}$

λ1      λ2      λ3      λ4      ...
E(Fλ1)  E(Fλ2)  E(Fλ3)  E(Fλ4)  ...

Gaussian Process Regression based on $\{\lambda_i, F_{\lambda_i}\}$: a continuous value of $\lambda$ is generated
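
A sketch of the ε-greedy option only, assuming each candidate λ is scored by some objective to be minimised. The class name, the candidate values and the quadratic `toy_score` stand in for the real F_λ and are hypothetical; the Gaussian Process variant is not shown.

```python
import random


class EpsilonGreedyLambda:
    """Pick a threshold from a finite candidate set, favouring a low mean score."""

    def __init__(self, candidates, epsilon=0.1, seed=None):
        self.candidates = list(candidates)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = [0.0] * len(self.candidates)
        self.counts = [0] * len(self.candidates)

    def best_index(self):
        """Index of the tried candidate with the lowest empirical mean score."""
        tried = [i for i, c in enumerate(self.counts) if c > 0]
        return min(tried, key=lambda i: self.totals[i] / self.counts[i])

    def choose(self):
        """Try every candidate once, then explore with probability epsilon."""
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.candidates))
        return self.best_index()

    def update(self, index, score):
        """Record the score observed after using candidate `index`."""
        self.totals[index] += score
        self.counts[index] += 1


def toy_score(lam):
    """Placeholder objective (assumption): smallest at lam = 3."""
    return (lam - 3.0) ** 2


selector = EpsilonGreedyLambda([1.0, 2.0, 3.0, 4.0, 5.0], epsilon=0.2, seed=0)
for _ in range(100):
    i = selector.choose()
    selector.update(i, toy_score(selector.candidates[i]))
chosen_lambda = selector.candidates[selector.best_index()]
print("selected lambda:", chosen_lambda)
```

In a streaming setting the score would be observed each time the model is rebuilt, so the empirical means play the role of E(F_λ).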

G-StrAP : Multi-scale Realtime Monitor

Dashboard for Monitoring

G-StrAP Dashboard for Grid Monitoring

Online Monitoring

[Figure: two bar charts of the percentage of jobs assigned (%) per cluster, one with 5 clusters (including a Reservoir entry) and one with 8 clusters; each exemplar is shown as a job vector, e.g. (7 13 14 24 9728 19190) or (10 18 29 20091 395 276). Annotation: LogMonitor is getting clogged.]

Off-line Analysis

no prior knowledge about failure patterns

summarizing Terabyte data

Discussion and Conclusion

G-StrAP:

near-linear computational complexity $O(N^{1+\epsilon})$ with bounded information loss

handles non-stationary data distributions with a self-adaptive change detection test

G-StrAP for Grid Monitoring

Traceability of failures

provides multi-scale models describing the status of the Grid

online description of different types of job patterns

offline global analysis

good clustering quality w.r.t. supervised learning

Perspectives

Theoretical Perspectives:

limitation 1: AP single parameter s*

-> self-adapt s*

limitation 2: better coping with flowing patterns

-> Hierarchical G-StrAP, absorbing new patterns from the reservoir more efficiently

Applicative Perspectives:

profiling users

-> a user = a histogram of job exemplars

customized interface

characterize the evolution of users or virtual organizations

build crisis scenarios to test the middleware
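
One way the "user = histogram of job exemplars" idea could be sketched, assuming each job has already been assigned to an exemplar. The user names and exemplar labels are hypothetical.

```python
from collections import Counter, defaultdict


def user_profiles(assignments):
    """Build a normalised exemplar histogram per user.

    assignments is an iterable of (user_id, exemplar_id) pairs.
    """
    counts = defaultdict(Counter)
    for user, exemplar in assignments:
        counts[user][exemplar] += 1
    return {
        user: {ex: n / sum(c.values()) for ex, n in c.items()}
        for user, c in counts.items()
    }


# Hypothetical assignments of jobs to exemplars E1 and E2.
assignments = [("alice", "E1"), ("alice", "E1"), ("alice", "E2"), ("bob", "E2")]
profiles = user_profiles(assignments)
for user, histogram in profiles.items():
    print(user, histogram)
```

Comparing such histograms over successive time windows would be one way to follow how a user or a virtual organization evolves.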

Xiangliang ZHANG [email protected] http://www.lri.fr/~xlzhang
