HAL Id: tel-01179587
https://pastel.archives-ouvertes.fr/tel-01179587
Submitted on 23 Jul 2015
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Analysis and diagnosis of Web performances from an
end-user perspective
Heng Cui
To cite this version:
Heng Cui. Analysis and diagnosis of Web performances from an end-user perspective. Web. Télécom ParisTech, 2013. English. NNT: 2013ENST0017. tel-01179587
Contents

List of Figures
List of Tables
1 Introduction
I Web Performance Analysis Based on User Experiences
Overview of Part I
2 Initial Platform
4 Distributed Analysis and Diagnosis of Web Performance
Conclusions for Part I
II Troubleshooting Web Sessions in the Wild
Overview of Part II
5 Design of the Tool
Conclusions for Part II
III A New Method For Analysis of the Web Browsing
Overview of Part III
7 Critical Path Method for Web Analysis
Conclusions for Part III
Bibliography
List of Figures

List of Tables
1 Introduction

"You affect the world by what you browse."

1.1 Motivation
The WWW (the Web) has become one of the most popular Internet applications, and Web traffic has consumed the largest percentage of bandwidth on the network.
1.2 Background Knowledge

[Figure: HTTP request/response exchange over the lower layers]
1.4 Thesis Claim
Part I

Web Performance Analysis Based on User Experiences

Overview of Part I
2 Initial Platform

"The only way to know how customers see your business is to look at it through their eyes."

2.1 Introduction

2.2 Platform Set-up
2.2.1 Browser Plugin

[Figure: end-client architecture: the Browser Plugin inside the Web Browser and packet capture at the Network Interface feed raw data into a DBMS]

The plugin records two key events of a Web session: the first painting event, which determines the first visual impact and is measured as the first impression time; and the full page load event, measured as the full load time (T_full). These two metrics capture the waiting time of a Web session.
2.2.2 Packet Level Capture

HTTP transactions in the packet traces are identified with httpid.
2.3 Measurement

2.3.1 Metrics and Methodology

For each Web session we compute per-connection delay components: t_DNS, t_TCPRTT, t_NetRTT, t_Offset, and t_DataTrans. Let t_m^i be the value of metric m on the i-th connection. The weight of metric m is

w_m = Σ_i t_m^i / Σ_i (t_DNS^i + t_TCPRTT^i + t_NetRTT^i + t_Offset^i + t_DataTrans^i)

and the weighted breakdown of the full load time T_full is

breakdown_m = T_full × w_m

2.3.2 Measurement Results
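The weighted-breakdown computation above can be sketched as follows. This is our own illustration, not the thesis toolchain; the metric keys and the sample connection values are invented.

```python
def weighted_breakdown(conns, t_full):
    """conns: list of dicts of per-connection delay components (ms).
    Returns breakdown_m = T_full * w_m for each metric m."""
    metrics = ["dns", "tcp_rtt", "net_rtt", "offset", "data_trans"]
    # denominator: the sum of every component over every connection
    total = sum(sum(c[m] for m in metrics) for c in conns)
    return {m: t_full * sum(c[m] for c in conns) / total for m in metrics}

# two hypothetical connections of one Web session
conns = [
    {"dns": 20, "tcp_rtt": 30, "net_rtt": 25, "offset": 5, "data_trans": 120},
    {"dns": 0, "tcp_rtt": 40, "net_rtt": 35, "offset": 10, "data_trans": 215},
]
bd = weighted_breakdown(conns, t_full=1000.0)
# by construction the breakdown components sum back to T_full
```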
2.3.2.1 Wireless vs. Wired Client

2.3.2.2 Student Residence A

2.3.2.3 Student Residence B
[Figures: first impression and full load times over the evening (log scale); weighted breakdown of the full load into DNS, SYNACK.RTT, Net.RTT, ServiceOffset, and DataTransfer for good and bad periods; YouTube RTTs in wireless vs. Ethernet; Net.RTT of ACKs with and without payload; service offset over time]
2.4 Related Work

[Figures: CDFs of handshake RTT at the server side during peak vs. non-peak hours; CDFs of SYNACK.RTT and NET.RTT for the test connections]
2.5 Discussion

3 On the Relationship between QoS and QoE for Web Sessions
"If your actions inspire others to dream more, learn more, do more and become more, you are a leader."

3.1 Introduction

3.2 Measurement Set-up
dissatisfaction
3.2.2 Metrics

For each Web session we compute mean Key Performance Indices (KPIs) over all its connections and objects: the number of connections (nr.), the TCP handshake delay (SYNACK), the network round-trip time (net.rtt), the service offset (SERV.OFF.), the total RTT, the loss rate (LOSS), and the duplicate retransmission rate (D.RETR.). Each session is thus summarized by the signature vector

[nr.; SYNACK; NET.RTT; SERV.OFF.; RTT; LOSS; D.RETR.]
[Figure: TCP exchange timeline between client and Internet, showing synack, net.rtt, service offset, loss, and duplicate retransmission events]

3.2.3 Methodology

Each KPI x_i of a session signature is normalized into [0, 1] as

(x_i − min_i) / (max_i − min_i)

where max_i and min_i are the maximum and minimum observed values of KPI i. The workflow is: signature extraction, signature normalization, Web session clustering (kmeans), and Web session re-building. The number of clusters is chosen from the kmeans error distance curve ("turning point").

[Figures: average kmeans distance vs. number of clusters for wild users (all domains, google, youtube, amazon); average distance vs. sequence of runs]
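The normalization and clustering step can be sketched as below. This is a minimal stand-alone k-means, not the implementation used in the thesis; the sample signatures (nr., SYNACK, NET.RTT) are invented, and centroids are seeded deterministically for reproducibility.

```python
def normalize(signatures):
    """Min-max normalize each KPI column into [0, 1]: (x - min) / (max - min)."""
    cols = list(zip(*signatures))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in signatures]

def kmeans(points, k, iters=20):
    """Plain k-means; initial centroids are spread over the input order."""
    cents = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # assign each signature to its nearest centroid (squared Euclidean)
        labels = [min(range(k),
                      key=lambda c: sum((p - q) ** 2
                                        for p, q in zip(pt, cents[c])))
                  for pt in points]
        # recompute centroids as the mean of their members
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                cents[c] = [sum(col) / len(col) for col in zip(*members)]
    return cents, labels

# two "good" and two "poor" session signatures (invented values)
sigs = [[3, 100, 20], [4, 110, 25], [40, 900, 300], [38, 950, 280]]
cents, labels = kmeans(normalize(sigs), k=2)
```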
3.3 Controlled Experiment

3.3.1 Global Analysis
[Figures: centroid KPI signatures (nr., SYNACK, NET.RTT, SERV.OFF, RTT, LOSS, D.RETR) of the four clusters with totals per access type (ADSL, EUR, STEX); dominant domains per cluster (YouTube, ebay, sina, orange, google, amazon, sohu.com, 163.com, baidu.com); CDF of full load times per cluster]
3.3.2 Discussion of Clustering Number
3.4 Home Users Surfing in the Wild

Users grade each browsed page via a feedback button as either poor or good.
[Figures: per-cluster KPI signatures and full-load CDFs for clusterings with 3, 5, and 4 clusters; server locations (EU, US, CN, Google) per cluster in the wild dataset]
[Table: centroid KPI signatures (nr., SYNACK, NET.RTT, SERV.OFF, RTT, LOSS, D.RETR) with total sessions and poor% per cluster, 4-cluster case]
3.4.2 Discussion of Clustering Number

[Figures and tables: KPI signatures, full-load CDFs, and total/poor% per cluster for the alternative clusterings with 2 and 6 clusters]
3.4.3 Per-Client Analysis

We compare the KPIs of good and poor Web sessions on a per-user basis to see whether we can identify commonalities among the different users.
[Figures: per-client KPI signatures of good vs. poor sessions for the TO, STEX, and ADSL users, with good/poor session counts; CDFs of full load times for good and poor sessions of each user]
3.4.4 Session Anomaly Analysis

We count, for each session, the number of anomalous KPIs.
[Figures: KPI signatures and poor-session rates grouped by number of anomalous KPIs (no anomalies: 4% poor of 250 sessions; 1 or 2 anomalies: 16% poor of 345; 3 or 4 anomalies: 37% poor of 95; ≥5 anomalies: 67% poor of 43); server locations per group; CDF of full load times per group]
3.5 More Discussions

We also compare sessions grouped by their number of anomalies (0, 1 or 2, 3 or 4, ≥5), and study the performance of the same Web page across different clients.
[Figures: per-cluster KPI signatures for the same page across clients (ADSL, EUR, STEX); CDF of full load times per cluster; CCDF of delays per access type]

As an example of how a single outlier inflates a mean delay: (50 + 50 + 50 + 50 + 3050) / 5 = 650 ms.
3.6 Related Work

As future work, we plan to leave the connection level of analysis.

3.7 Conclusion
4 Distributed Analysis and Diagnosis of Web Performance

"The only thing that will redeem mankind is cooperation."

4.1 Introduction
4.2.1 High-Level Algorithm Description

The distributed clustering proceeds in rounds: in round j, each peer i polls the other peers and updates its local centroid vector V_i,j; the algorithm stops when max{||V_i,j − V_i,j−1||} < γ, where γ is a convergence threshold.

4.2.2 Architecture

4.2.2.1 Client Peers

Each client peer performs clustering locally.

4.2.2.2 Public Coordinator
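The stopping rule of the distributed clustering can be illustrated with a short sketch; the peer centroid values below are invented for the example.

```python
def max_shift(prev, curr):
    """Largest Euclidean distance moved by any centroid in one round:
    max over centroids of ||V_j - V_{j-1}||."""
    return max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5
               for p, c in zip(prev, curr))

def converged(prev, curr, gamma):
    """Stop the rounds when the largest centroid movement falls below gamma."""
    return max_shift(prev, curr) < gamma

# hypothetical centroids of one peer in two consecutive rounds
prev_round = [[0.10, 0.20], [0.80, 0.90]]
this_round = [[0.11, 0.20], [0.80, 0.89]]
```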
4.2.3 Selected Features

For each object download we timestamp the main events t1 through t9: the DNS delay is dns = t_{1,2}, the TCP connecting (or handshake) delay is tcp = t_{3,4}, and the HTTP query delay is http = t_{6,8}. The per-page features (DNS, TCP, HTTP) are used in log scale:

log(DNS + 1), log(TCP + 1), log(HTTP + 1)
[Tables: measurement locations with connection type and CPU/MEM; tested popular Web pages with country, rank, number of objects, volume (KB), and number of servers]
The measurement loop driving the browser:

loop
    $URL ← next entry in the web list
    if $URL is a search engine then
        $RandStr ← random string
        $URL ← $URL + "search?q=" + $RandStr
    browse $URL
    while not fully loaded and not timeout do
        wait
[Figures: number of clustering rounds until convergence vs. the threshold γ (from 0.001 to 1.0) for the google, sina, libero, and lemonde pages]
4.3.1.2 Clustering Quality

We compare distributed and centralized clustering results with the cosine similarity: for two vectors A = {a_1, a_2, ..., a_n} and B = {b_1, b_2, ..., b_n},

Similarity = (A · B) / (||A|| ||B||) = Σ_{i=1}^{n} a_i b_i / ( sqrt(Σ_{i=1}^{n} a_i²) × sqrt(Σ_{i=1}^{n} b_i²) )

which ranges from −1 to +1.
[Figures: CCDFs of the cosine similarity between distributed clustering (thresholds 0.001, 0.1, 1.0) and centralized clustering for the Google, Sina, and Lemonde pages]
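The similarity measure is a direct implementation of the formula above; the example vectors are invented.

```python
import math

def cosine_similarity(a, b):
    """A.B / (||A|| ||B||); result lies in [-1, +1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim = cosine_similarity([1, 2, 3], [2, 4, 6])  # colinear vectors -> 1.0
```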
[Table: cosine similarity per cluster (Cluster1 to Cluster4) between the 4-hour distributed run with threshold 0.1 and the centralized run; values range between 0.94 and 0.99]

4.3.2 Case Studies
4.3.2.1 Home Network Comparisons
[Tables and figures: per-cluster DNS, TCP, and HTTP delays for the EUR, HOME1, HOME2, and NAP clients; time series and CDFs of full load times for the google, bing, and sina pages]
[Table: per-cluster TCP delays for the EUR, HOME1, HOME2, and NAP clients]

The CDN servers contacted and their DNS hit rates:

IP            Hit(%)  Org.
77.67.3.x     89.1    Panther Express Corp.
91.202.200.x  65.4    CDNetworks network
93.188.131.x  25.7    Panther Express Corp.
93.188.128.x  52.7    Panther Express Corp.
174.35.6.x    9.7     CDNETWORKS
174.35.7.x    8.8     CDNETWORKS
174.35.4.x    7.9     CDNETWORKS
174.35.5.x    5.5     CDNETWORKS
Part II

Troubleshooting Web Sessions in the Wild

Overview of Part II

While Part I relies on human analysis, Part II aims at automatic troubleshooting.
5 Design of the Tool

"You cannot solve a problem from the same consciousness that created it. You must learn to see the world anew."

5.1 Architecture Description

Our tool is named FireLog.

5.1.1 Client Side

The client side covers: page monitoring, user feedback, privacy policy, data dispatching, and others. The plugin records the key browsing events; the instrumentation is single-thread and synchronous.
5.2 Metric Illustration

As in Chapter 4, the DNS delay is dns = t_{1,2}, the TCP connecting (or handshake) delay is tcp = t_{3,4}, and the HTTP query delay is http = t_{6,8}.
5.3 Tool Evaluation

Browsing Overhead:

Timestamp Accuracy:

[Figures: page load time, CPU, and memory overhead with and without the plugin for popular pages (google, wiki, ikea, YT, amazon, cnn, ebay, yahoo); CDFs of timestamp drift for the SYN/SYNACK and GET/first-data events]
6 Root Cause Analysis of Web Performance

"Computers are useless. They can only give you answers."

6.1 Introduction
6.2 Diagnosis Scheme

6.2.1 Proposed Heuristic

6.2.1.1 Main Scheme

Client side diagnosis

Network and server side diagnosis
Algorithm 1 (Main diagnosis scheme)
Input: page P, PLT, per-object timestamps ts_i^start, ts_i^end, delays http_i, per-IP delays tcp_ip, object count N, byte count B
Output: diagnosis result
  Idle ← null; C.App.Score ← null; Serv.Score ← null
  for all objects i ∈ P do
      Idle ← idle periods between (ts_i^start, ts_i^end)
  C.App.Score ← Σ Idle / PLT
  if C.App.Score ≥ th_c then
      return "client side limited"
  HTTP ← Σ http_i / N;  TCP ← Σ_ip tcp_ip / #ip
  if HTTP ≥ th_ms then
      Serv.Score ← HTTP / TCP
      if Serv.Score ≥ th_s then
          return "server side limited"
      else
          tcp_base^ip ← baseline from historical dataset (Algorithm 2)
          T ← pages browsed within ±5 minutes
          Res ← network check(P, T, tcp_base^ip) (Algorithm 3)
          if Res ≠ null then return Res
  if N ≥ th'_size or B ≥ th''_size then return "page size limited"
  return "other factors"

Algorithm 2 (TCP baseline)
Input: historical dataset
Output: per-IP baseline tcp_base^ip
  if tcp_ip is already in the dataset then return its stored baseline
  BaseList_ip ← null
  for all ip do
      tcp_base^ip ← 10th percentile of tcp_ip (similarly for http_ip) in the dataset, taken as the baseline
  return tcp_base^ip

Algorithm 3 (Network check)
Input: P, T, tcp_base^ip
Output: "local network limited", "wild Internet limited", or null
  U ← null   (deviations of the current page)
  V ← null   (deviations of temporally close pages)
  for ip ∈ P do
      Δ ← tcp_ip − tcp_base^ip; add Δ to U
  if mean(U) ≤ th_ms then return null
  for ip ∈ T do
      if tcp_base^ip ≤ tcp_base^google + th_ms then
          Δ ← tcp_ip − tcp_base^ip; add Δ to V
  discard the minimum and maximum of V
  if V is empty then return null
  F1 ← (mean(U) − mean(V)) / stddev(V)
  F2 ← mean(U) / mean(V)
  if F1 ≤ th_F1 and F2 ≥ th_F2 then
      return "local network limited"
  else
      return "wild Internet limited"

[Table: thresholds th_c, th_ms, th_s, th'_size, th''_size, th_F1, th_F2 with their descriptions]
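The top level of the diagnosis heuristic can be sketched in a few lines. This is our own simplified illustration, not the FireLog implementation: the refinement by Algorithms 2 and 3 is collapsed into a single "network limited" verdict, the page dictionary layout is invented, and the thresholds follow the tuning section (th_c = 0.2, th_ms = 100 ms).

```python
def diagnose(page, th_c=0.2, th_ms=100.0, th_s=2.5):
    """page: {'idle': total idle ms, 'plt': page load time ms,
    'http': per-object HTTP delays, 'tcp': per-IP handshake delays}."""
    c_app_score = page["idle"] / page["plt"]
    if c_app_score >= th_c:
        return "client side limited"
    http = sum(page["http"]) / len(page["http"])
    tcp = sum(page["tcp"]) / len(page["tcp"])
    if http >= th_ms:
        if http / tcp >= th_s:          # Serv.Score test
            return "server side limited"
        return "network limited"        # refined further by Algorithms 2-3
    return "other factors"

# invented example sessions
slow_server = {"idle": 500, "plt": 8000, "http": [400, 600], "tcp": [40, 60]}
busy_client = {"idle": 3000, "plt": 6000, "http": [50, 70], "tcp": [30, 40]}
```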
6.2.1.3 Flow-chart Interpretation

The heuristic can be read as a tree-based decision process.

6.2.2 Tuning of Thresholds
[Figures: median load time vs. controlled CPU limitation for the yahoo.com, akamai.com, and test pages; CDFs of C.App.Score under different CPU limits for the Yahoo page; ping RTT to the gateway over time; fraction of sessions detected as "local network limited" vs. the Serv.Score threshold for google, yahoo, youtube, and akamai]

Based on these experiments we set th_c = 0.2, th_ms = 100 ms, th_s = 2.5, th_F1 = 1, and th_F2 = 0.5.
6.3 "Wild" Deployment

6.3.1 Measurement and Dataset

[Table: number of pages, domains, and objects collected per user: A (FR), B (FR), C (CN)]

6.3.2 Limitation Analysis

6.3.2.1 Client Limitation

[Figure: CDF of the real-world C.App.Score for users A, B, and C]
6.3.2.2 Network Performance

A server is flagged as "far away" when its baseline handshake delay satisfies tcp_base^ip > tcp_base^google + 100 ms.

[Figures: CDFs of Google TCP connecting delays for users A and B; CDF of the fraction of "far away" servers contacted; CDFs of tcp and http delays for user B's server-side-limited sessions and user A's non-server-limited sessions; table of server countries (CN, JP) and organizations (CDNETWORKS, NC Numericable S.A., Panther Express); timeline diagram of DNS, TCP handshake, GET, serv.offset, and net.rtt between the client, front-end server, and back-end servers]
6.3.2.4 More Discussions of Server Side Interaction

Server side behavior: [figures: serv.offset vs. query interval for server-side limited sessions; serv.offset per front-end and back-end server IP of the same domain]

To compare repeated browsing of the same objects, we compute the relative difference

(1st.load.time − 2nd.load.time) / 1st.load.time × 100%

[Figures: CCDF of this relative difference over four days; scatter plot of http delays for the same object in the first vs. second browse]

This approach allows us to discover such interesting domains and reveal details of their server-side interactions.
6.3.3 Discussion

[Figure: CDF of the ratio tot.DNS/tot.delay for users A, B, and C; DNS behavior is time variant and dynamic]

[Table: correlation between limitation causes and page properties (Σ Idle, Σ dns, Σ tcp, Σ http, Nr.Obj, DL.Obj, Bytes, DL.Bytes): Client.Limit 0.83, Serv.Side.Limit 0.44, Netw.Limit 0.60]
6.4 Comparison With Other Related Work

We also cluster the sessions with the feature vector [C.App.Score; TCP; HTTP; Nr.]; this clustering can be considered as a complement to our diagnosis heuristic.

[Table and figures: per-cluster centroids of C.App.Score, TCP, HTTP, and Nr. with total pages; CDFs of full load times for the four clusters; client side limited pages]

Tools:

Troubleshooting:

Web Performance and User Impact:
Part III

A New Method For Analysis of the Web Browsing

Overview of Part III

Previous approaches treat the Web page as a single entity.
7 Critical Path Method for Web Analysis

"A project without a critical path is like a ship without a rudder."

7.1 Introduction

The Critical Path Method (CPM) identifies the key elements of a project: any project with interdependent activities can apply this method of analysis. The project is modeled as a set of activities, and the analysis determines which activities are "critical" (i.e. on the longest path) and which have "total float" (i.e. can be delayed without making the project longer). We use it, among others, for what-if predictions.
7.2 Object Download Activities

Each object download is described by the timestamps t_dns, t_tcp, t_http, t_data, and t_data.end; t_dns and t_tcp may be undefined (e.g. when an existing connection is reused). The activity interval of an object is

t_start = min(t_dns, t_tcp, t_http)    t_end = max(t_data, t_data.end)
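The interval definition translates directly into code; this is our own sketch and the millisecond timestamps in the examples are invented.

```python
def activity_interval(t_dns, t_tcp, t_http, t_data, t_data_end):
    """Return (t_start, t_end) of one object download.
    t_dns / t_tcp are None when undefined (e.g. a reused connection)."""
    starts = [t for t in (t_dns, t_tcp, t_http) if t is not None]
    t_start = min(starts)
    t_end = max(t_data, t_data_end)
    return t_start, t_end

# fresh connection: DNS at 0 ms, handshake at 20 ms, GET at 60 ms, data 80-150 ms
iv = activity_interval(0, 20, 60, 80, 150)           # -> (0, 150)
# reused connection: no DNS, no handshake
reused = activity_interval(None, None, 60, 80, 150)  # -> (60, 150)
```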
7.3 Dependency Discovery

7.3.1 Dependency Definition

One object (the child) depends on a list of other objects (the parents) if downloading the former can only occur when the downloads of all the latter are finished. We distinguish four types:

1) Reference Dependency

2) Redirection Dependency

3) Blocking Dependency

4) Inferred Dependency
7.3.2 Dependency Identification

Reference and redirection dependencies are identified explicitly; blocking dependencies appear when an object starts immediately after another finishes; inferred dependencies capture hidden "finish before" relations. To test whether two objects A and B are downloaded in parallel, let

F = max(t_end(A), t_end(B)) − min(t_start(A), t_start(B))

with T_A = t_end(A) − t_start(A) and T_B = t_end(B) − t_start(B). Then:

F < T_A + T_B : parallel running
F ≥ T_A + T_B : not in parallel
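The parallelism test above can be sketched as follows; the example intervals are invented.

```python
def in_parallel(a, b):
    """a, b: (t_start, t_end) activity intervals.
    F < T_A + T_B holds exactly when the two downloads overlap in time."""
    f = max(a[1], b[1]) - min(a[0], b[0])
    return f < (a[1] - a[0]) + (b[1] - b[0])

overlap = in_parallel((0, 100), (50, 150))    # F = 150 < 200 -> True
disjoint = in_parallel((0, 100), (120, 200))  # F = 200 >= 180 -> False
```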
7.3.3 Algorithm for dependency extraction

Notation:
t_start(i), t_end(i): start and end times of object i
P1(i), P2(i), P3(i): parallelism measures of object i
ExplicitParent(i): explicit (reference or redirection) parents of i
BlockParent(i): blocking parent of i
InferParent(i): inferred parents of i
ParentSet(i): candidate parent set of i

Algorithm 4 (Candidate parents)
Input: t_start(i), t_end(i)
Output: ParentSet(i)
  sort objects by t_start(i)
  for i ∈ sorted objects do
      BlockParent(i) ← null
      collect ExplicitParent(i) of i
      for all j with t_end(j) < t_start(i) and j ∉ ExplicitParent(i) do
          add j to ParentSet(i)
Algorithm 5 (Blocking parents)
Input: t_start(i), t_end(i)
Output: BlockParent(i)
  for i ∈ sorted objects do
      BlockChildren ← null; BlockParent(i) ← null
      if P1(i) ≥ max1 or P2(i) ≥ max2 or P3(i) ≥ max3 then
          push onto a stack J every object finishing before t_start(i)
          x ← argmin_i(t_start(i)) over all i ∉ BlockChildren with t_start(i) ≥ t_end(J.pop)
          if t_start(x) − t_end(J.pop) < δ then
              BlockParent(x) ← J.pop
              add x to BlockChildren

Algorithm 6 (Inferred parents)
Input: ParentSet(i)
Output: InferParent(i)
  for all i with ParentSet(i) ≠ null do
      ParentSet(i) ← RemoveIndirect(i, ParentSet(i))
      InferParent(i), ParentSet(i) ← SelectLateParents(i, ParentSet(i))
      if ParentSet(i) ≠ null then
          Selected ← SelectCorrelated(i, ParentSet(i)); add Selected to InferParent(i)
      ParentSet(i) ← null

Algorithm 7 (Helper functions for Algorithm 6)
function RemoveIndirect(i, ParentSet(i)):
    for j ∈ ParentSet(i) do
        if ∃ a path from j to i through another parent then
            remove j from ParentSet(i)
    return ParentSet(i)

function SelectLateParents(i, ParentSet(i)):
    Parents(i) ← null
    for j ∈ ParentSet(i) do
        if t_end(j) > t_end(x) + α for all x ∈ ParentSet(i), x ≠ j then
            move j from ParentSet(i) to Parents(i)
    return Parents(i), ParentSet(i)

function SelectCorrelated(i, ParentSet(i)):
    Selected ← null
    for j ∈ ParentSet(i) do
        compute the gap t_start(i) − t_end(j) across sessions
        if the correlation between t_end(j) and t_start(i) is > β then
            add j to Selected
    return Selected

The final ParentSet(i) is the union of ExplicitParent(i), BlockParent(i), and InferParent(i).
7.4 Critical Path Extraction

[Table: parameter settings: max1 = 6, max2 = 8, max3 = 256, δ = 10, α = 1, β = 0.7; θ is set to the 80th percentile of the gap t_end(parent) − t_start(child)]

The dependency graph is a DAG with vertices V (the objects) and edges E. Each vertex i carries its downloading delay T_i = t_end(i) − t_start(i); each edge (i, j) carries the client delay T_i,j between parent i and child j. For an object with several parents, the parent finishing last determines when it can start. The Critical Path is the longest path from the root to the leaves.
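Critical-path extraction on the dependency DAG can be sketched as a longest-path computation; this recursive implementation is our own, and the object names and delays are invented.

```python
def critical_path(t, edges, root):
    """t: {object: download delay T_i}; edges: {(parent, child): client delay T_ij}.
    Returns (total delay, object list) of the longest root-to-leaf path."""
    children = {}
    for (p, c), w in edges.items():
        children.setdefault(p, []).append((c, w))
    memo = {}

    def solve(n):
        if n in memo:
            return memo[n]
        if n not in children:                       # leaf object
            memo[n] = (t[n], [n])
        else:
            # pick the child subtree with the largest edge + subtree delay
            d, path = max(((w + solve(c)[0], solve(c)[1])
                           for c, w in children[n]), key=lambda x: x[0])
            memo[n] = (t[n] + d, [n] + path)
        return memo[n]

    return solve(root)

t = {"html": 100, "css": 50, "js": 200, "img": 30}
edges = {("html", "css"): 10, ("html", "js"): 5, ("js", "img"): 20}
delay, path = critical_path(t, edges, "html")
# html -> js -> img: 100 + 5 + 200 + 20 + 30 = 355
```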
7.5 Experiment Results

We measure the stability of the extracted critical paths with the Levenshtein edit distance.

7.5.2 Case Study

7.5.2.1 Google Page

7.5.2.2 Wikipedia Page
7.6 Usage of Critical Path Method

"Activities in this path should have a direct impact on the whole project."

7.6.1 Performance Prediction

7.6.1.1 What-if Extra Delays
We inject extra delays of 100 ms and 200 ms into critical-path activities and compare predicted against measured page load times. [Figures: CDFs of measured PLT (0, 100, 200 ms extra delay) and predicted PLT (100, 200 ms) for the Google and Sigcomm pages]
7.6.1.2 What-if Using Cache

[Figures: CDFs of page load times without cache, measured with cache, and predicted with cache, for three pages]
As an example, reducing the mean load time from 5.1 s to 4.7 s is a gain of (5.1 − 4.7)/5.1 = 8%.

7.6.3 Key Object Extractor

[Figure: fraction of times each object (1.html, 2.html, 4.js, 10.js, 7.gif, 5.js, 8.png, 6.png, 9.gif, 3.png) is selected in the critical path]
7.7 Discussion

To quantify the similarity of dependency graphs we use edit distances: for a graph G1 and a base graph G_base,

Dist(G1, G_base) = Del(G1, G_base) + Add(G1, G_base)

where Del(G1, G_base) counts the deletions needed to transform G1 toward G_base, and Add(G1, G_base) counts the additions of elements of G_base that are missing from G1.

[Figure: average tree distance vs. number of shared sessions for the Google page, with correlation thresholds 0.7 and 0.9]
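Treating each dependency graph as a set of parent-to-child edges, the distance above can be sketched as below; the edge names are invented.

```python
def graph_dist(g1, g_base):
    """Del + Add: edges of g1 absent from g_base, plus
    edges of g_base absent from g1."""
    g1, g_base = set(g1), set(g_base)
    deletions = len(g1 - g_base)
    additions = len(g_base - g1)
    return deletions + additions

base = {("html", "css"), ("html", "js"), ("js", "img")}
g1 = {("html", "css"), ("html", "img")}
# Del = 1 (html->img), Add = 2 (html->js, js->img): distance 3
```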
7.8 Related Work

Characterization of Web traffic

Performance Modeling

Conclusions for Part III
8 Conclusion

"The true sign of intelligence is not knowledge but imagination."

8.1 Evaluation of the Thesis Work
Our initial platform, which combines measurements at both the higher user level and the lower network level, can greatly reduce the gap between the user's subjective QoE (Quality of Experience) and the objectively measured QoS (Quality of Service).

Different kinds of clustering techniques can be adopted to study Web performance. Traditional centralized clustering helps us reveal the relationship between user-level satisfaction (QoE) and lower network-level (QoS) metrics, while distributed clustering provides a scalable approach to use the measurements performed by different users.

We propose a root cause analysis model based on passive measurements of the user's wild browsing. Our method is implemented via the Mozilla development interfaces and stores the data in a database.

Sharing browsing experiences helped us develop a novel approach to the analysis of Web page browsing. Our approach, named the Critical Path Method, can be useful for Web performance analysis in several scenarios, including providing what-if predictions and key object extraction.
8.2 Future Work

Model Development
A Résumé en Français (French Summary)

A.1 Introduction

A.1.1 Motivation

A.1.2 Objectives of the Thesis

A.2 Main Contributions

A.2.1 Analysis of Web performance based on user experiences
Measurement Set-up: [figure: end client with packet capture, browser plugin, Web browser, network interface, and DBMS]

Example of Use: [table and figures: per-cluster KPI signatures (nr., SYNACK, NET.RTT, SERV.OFF, RTT, LOSS, D.RETR) with totals and poor% for the 4-cluster case; CDF of full load times per cluster]
Scenario

Tool Development

Laboratory experiment: [figures: CDF of C.App.Score under different CPU limits for the Yahoo page; detection rate vs. the Serv.Score threshold for google, yahoo, youtube, and akamai]

Real Deployment: [table: number of pages, domains, and objects collected per user: A (FR), B (FR), C (CN)]
Conclusion of Part II

Critical Path Method

Example of use: [figures: CDFs of measured and predicted page load times for the Google page; fraction of times each object is selected in the critical path]
A.3 Conclusion

A.3.1 Contributions

Our initial platform, which combines measurements at the user level and at the network level, can considerably reduce the gap between the user's subjective QoE (Quality of Experience) and the objectively measured QoS (Quality of Service).

Different types of clustering techniques can be adopted to study Web performance.

We propose root cause analysis models based on passive measurements of the user's browsing. Our method is based on the Mozilla development interfaces and stores the data in a database.

Sharing browsing experiences helped us develop a new approach to the analysis of Web page browsing. Our new approach, called the Critical Path Method, can be useful for Web performance analysis in several scenarios, including producing what-if predictions and key object extraction.

A.3.2 Perspectives

Tool Development