
7.6.3 Key Object Extractor

Finally, we show another usage of our method: extracting the key objects inside a page. A Web page may contain multiple objects, which can have different impacts on the rendering time of the whole page. We again use the Google experiment samples of Sec. 7.5.2.1. We extract one critical path for each Web page download and, across all the downloads, count the number of times each object is part of the critical path. Fig. 7.10(a) shows, for each object, the fraction of downloads in which it appears on the critical path.
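As an illustration, the counting step can be sketched as follows; representing a critical path as a plain list of object names and the function name key_object_frequency are assumptions made for this example, not part of the actual implementation.

```python
from collections import Counter

def key_object_frequency(critical_paths):
    """critical_paths: one critical path per page download, each given as a
    list of object names, e.g. ["1.html", "4.js", "8.png"]."""
    counts = Counter()
    for path in critical_paths:
        counts.update(set(path))  # count each object at most once per download
    n = len(critical_paths)
    # Fraction of downloads in which each object appears on the critical path.
    return {obj: c / n for obj, c in counts.items()}

# Hypothetical example with three downloads of the same page:
paths = [["1.html", "2.html", "4.js"],
         ["1.html", "2.html", "10.js"],
         ["1.html", "4.js", "3.png"]]
print(key_object_frequency(paths))  # 1.html -> 1.0, 3.png -> ~0.33, ...
```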

We see that objects like the main HTML files (1.html and 2.html) and some script files (4.js and 10.js) are part of the critical path much more frequently (in more than 80% of the cases), whereas icon objects like 3.png are rarely critical. In this case, we say that the objects that are frequently included in the critical path play a more important role for the rendering of the google.com page, while the others do not.

Figure 7.10: Key Object Extraction by CPM. (a) Key objects distribution in the Google page: fraction of times (0%–100%) that each object (1.html, 2.html, 4.js, 10.js, 7.gif, 5.js, 8.png, 6.png, 9.gif, 3.png) is selected in the critical path. (b) Example of Google page object activity.


7.7 Discussion

In this chapter, we presented a methodology called the Critical Path Method to analyze Web page download performance. To generate the critical path, the extraction of the dependency graph is the key step.

We use features of the Web browser itself, together with information shared across multiple browsing experiences, to perform the dependency graph extraction. As we have shown, we browse a given Web page in an automatic way to accumulate enough samples. In this section, we discuss the impact of the number of shared samples on the quality of the dependency graph generation. We again use the Google case discussed in Sec. 7.5.2.1. We take the dependency graph generated by querying all 50 shared samples as the baseline; we then compare the similarity between this baseline graph and the dependency graphs generated by querying fewer shared samples. Since a dependency graph is represented as a "tree", we quantify the similarity between two tree graphs Gbase and G1 by their distance, computed as:

Dist(G1, Gbase) = Del(G1, Gbase) + Add(G1, Gbase)

where Del(G1, Gbase) is the number of links that exist in graph G1 but not in Gbase, while Add(G1, Gbase) is the number of links that do not exist in graph G1 but do exist in Gbase. This distance corresponds to the total number of operations (deleting or adding links) needed to turn G1 into the same graph as Gbase. In our case, Gbase refers to the baseline dependency graph and G1 can be any dependency graph obtained using fewer samples.
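For illustration, the distance can be computed directly on the link sets of the two graphs. The sketch below assumes each dependency graph is encoded as a set of (parent, child) links; this encoding and the function name graph_distance are choices made for the example, not the implementation used in this chapter.

```python
# Assumed representation: a dependency graph is a set of (parent, child) links,
# e.g. {("1.html", "4.js"), ("4.js", "8.png")}.
def graph_distance(g1, g_base):
    """Dist(G1, Gbase) = Del(G1, Gbase) + Add(G1, Gbase)."""
    delete_ops = len(g1 - g_base)   # links present in G1 but not in Gbase
    add_ops = len(g_base - g1)      # links missing from G1 but present in Gbase
    return delete_ops + add_ops

# Hypothetical example: one link to delete and one to add gives distance 2.
g_base = {("1.html", "4.js"), ("1.html", "3.png"), ("4.js", "8.png")}
g1 = {("1.html", "4.js"), ("1.html", "3.png"), ("1.html", "8.png")}
print(graph_distance(g1, g_base))  # -> 2
```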

Fig. 7.11 shows the results. For each test case, we randomly select a fixed number of samples from all the data. We generate 30 dependency graphs in each test and compute the average distance to the baseline graph. Since, after querying the other samples, we use a correlation study (as discussed in Sec. 7.3.3) to check the existence of some parental relationships, we also plot the results for different correlation thresholds (β in Tab. 7.2) in Fig. 7.11. From this figure, we can see that the distance between the baseline and the dependency graphs inferred from a small number of samples is typically larger.

As the shared sample size increases (e.g. from 5 to 20), the distance decreases rapidly. When the shared sample size is large enough (e.g. >30), the distance stabilizes around 1, meaning that the inferred dependency graph differs little from the baseline graph. Moreover, the distance curves for different correlation thresholds are similar.
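A minimal sketch of this evaluation loop is given below, assuming the browsing samples are available as a list and that a hypothetical build_graph function infers a set of (parent, child) links from a subset of samples; both names are placeholders for this example.

```python
import random

def avg_distance_to_baseline(samples, sample_size, build_graph, g_base, trials=30):
    """Average distance to the baseline graph when the dependency graph is
    inferred from `sample_size` randomly chosen browsing samples,
    repeated `trials` times (30 in our tests)."""
    dists = []
    for _ in range(trials):
        g1 = build_graph(random.sample(samples, sample_size))
        dists.append(len(g1 - g_base) + len(g_base - g1))  # Del + Add operations
    return sum(dists) / trials
```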

Figure 7.11: Discussion on the Number of Shared Experiences (x-axis: number of shared samples queried, from 0 to 50; 50 browsing samples in total).


7.8 Related Work

We can classify the literature related to Web performance studies as follows:

Characterization of Web traffic: Gebert et al. [56] show that HTTP flows dominate the traffic in their 14-day measurement from an access network. Gehlen et al. [57] use one week's worth of data to show that most of today's Web traffic is handled by a few organizations. Butkiewicz et al. [43] study Web page content-type features and their impact on page performance. Ihm et al. [64] use long-term log data to study Web page content trends over the last five years.

Performance Modeling: Butkiewicz et al. [43] define a set of Web page complexity metrics and use linear regression to relate these metrics with page load time. Li et al. [73] focus on Google and Yahoo pages and propose WebProphet, a system for dependency extraction and page load performance prediction. Wischik [87] expresses page load time as a function of network conditions such as RTT and loss rate.

Performance Improvements: Some papers propose methods to improve Web page load performance. For example, Google and Yahoo! publish recommendations [14] [34] on how to construct Web pages that load fast. Other papers study improvements to TCP, which is used by HTTP to transfer the data. Radhakrishnan et al. [78] propose a fast open protocol for TCP connections that enables data exchange during the TCP handshake, which speeds up Web page downloads. Other papers suggest increasing the TCP initial congestion window [51] or shortening the TCP retransmission timer [76].

Among these works, the closest to ours is WebProphet [73], where the authors propose to use controlled experiments that perturb the download times of the objects of a Web page to detect the object dependency relationships. As we have seen in the course of this chapter, such dependencies are also crucial in our work. However, we use a completely different mechanism to extract this dependency information.
