3.6.2.3 Performance with a different job load

We now examine the scalability of our MaxDAR optimizers by varying the number of submitted jobs. The number of created replicas and the job execution time for the five optimizers are shown in Figure 3.8(a) and Figure 3.8(b), respectively.

As shown in Figure 3.8(a), the MaxDAR optimizers use storage resources and network bandwidth efficiently because they create fewer replicas. E-Bino and E-Zipf try to replicate the most popular files to local storage, so the number of replicas they create grows linearly with the number of executed jobs. In contrast, the MaxDAR optimizers decide which files to replicate by considering the gain in overall data availability. Because the number of sites is fixed, the overall data availability has an upper bound; increasing the number of submitted jobs therefore does not affect the number of created replicas.
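The saturation argument above can be illustrated with a small sketch. It is not the MaxDAR algorithm itself: it assumes, purely for illustration, that each site is reachable independently with probability `p`, so a file with `n` replicas is available with probability 1 - (1 - p)^n. The function names and the `min_gain` threshold are hypothetical.

```python
def availability(replicas: int, p: float) -> float:
    """Probability that at least one of `replicas` copies is reachable.

    Assumption (not from the text): sites fail independently and each
    is reachable with probability p.
    """
    return 1.0 - (1.0 - p) ** replicas


def replicas_until_gain_below(p: float, num_sites: int, min_gain: float) -> int:
    """Add replicas while the marginal availability gain is at least min_gain.

    The count is capped by the number of sites, so it depends only on
    p, num_sites and min_gain, never on how many jobs are submitted.
    """
    count = 1
    while count < num_sites:
        gain = availability(count + 1, p) - availability(count, p)
        if gain < min_gain:
            break
        count += 1
    return count


if __name__ == "__main__":
    for sites in (2, 5, 10, 20):
        print(sites, replicas_until_gain_below(p=0.9, num_sites=sites, min_gain=0.001))
```

Under this simplified model the replica count stops growing once the marginal gain becomes negligible or every site holds a copy, which matches the flat curve described for the MaxDAR optimizers; a strategy that replicates on every popular access has no such cap.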

As illustrated in Figure 3.8(b), all five optimizers produce similar job execution times. However, as the number of submitted jobs increases, the performance of the MaxDAR optimizers improves: their job execution time becomes slightly shorter than that of E-Bino and E-Zipf.

Data replication in grid environments 93

FIGURE 3.8: Replication number and job execution time when varying the number of submitted jobs.

94 Fundamentals of Grid Computing

3.7 Concluding remarks

In this chapter, we address the problem of providing efficient data access in a data grid environment. We assume that data files are not equally important and that they can be classified according to their degree of importance, called the selective-rank. We formulate the replication problem as an optimization problem whose aim is to maximize the overall data availability of the system according to the selective-rank, subject to storage constraints. The proposed replication algorithm, maximize data availability with selective rank (MaxDAR), focuses on improving data availability through differentiated replication. The storage cost is used in the replica replacement strategy. Based on the MaxDAR algorithm, three variant optimizers have been developed.
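To make the shape of this optimization problem concrete, the sketch below shows a generic greedy heuristic for choosing replicas under a storage limit. It is a swapped-in illustration, not the MaxDAR algorithm: the `Candidate` fields, the scoring rule (rank times estimated gain per storage unit) and the function name are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    file_id: str
    size: float   # storage units the replica would occupy (hypothetical)
    rank: float   # selective-rank weight, higher means more important
    gain: float   # estimated availability gain from one more replica


def choose_replicas(candidates: Iterable[Candidate], capacity: float) -> List[str]:
    """Greedily pick replicas by rank-weighted gain per storage unit.

    This is a standard knapsack-style heuristic; it does not guarantee
    an optimal solution.
    """
    ordered = sorted(candidates, key=lambda c: c.rank * c.gain / c.size, reverse=True)
    chosen: List[str] = []
    used = 0.0
    for c in ordered:
        if used + c.size <= capacity:  # respect the storage constraint
            chosen.append(c.file_id)
            used += c.size
    return chosen


if __name__ == "__main__":
    files = [
        Candidate("a", size=4, rank=3, gain=0.10),
        Candidate("b", size=2, rank=1, gain=0.10),
        Candidate("c", size=5, rank=2, gain=0.05),
    ]
    print(choose_replicas(files, capacity=6))
```

Weighting the gain by rank is what makes the replication differentiated: with equal ranks the heuristic reduces to picking the files with the best availability gain per unit of storage.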

The performance of the MaxDAR optimizers was evaluated by simulation in OptorSim. Our simulation results show that the MaxDAR optimizers achieve better performance than the other replication schemes in OptorSim while also ensuring good overall data availability of the system.

Chapter 4