Data management in grids

4.6 Toward pervasive, autonomic and on-demand data management

Future work on data management in grids includes integrating new concepts linked to mobility, pervasiveness, and the context of users and resources. The key idea of the Pervasive Grid concept is to take advantage of light devices that are interconnected in an ad hoc way and participate in a grid alongside more stable resources. The Pervasive Grid [Parashar and Pierson, 2009] raises many new challenges, mainly because of the uncertainty of information and resources in the next generation of grids. Early work on data management in these pervasive grids [Pierson, 2008] shows the common approaches and the differences between classical data management in grids and data management in pervasive grids. The need for enhanced fault tolerance and recovery mechanisms, together with the inclusion of self-* characteristics (self-healing, self-management, self-recovery, ...), will lead to the development of a new class of grid computing, closer to autonomic computing [IBM, 2009]. Independent collaborative services (embedded in a Service-Oriented Architecture) will be developed and interconnected. Dynamic reconfiguration of components (such as moving, replicating or splitting them) according to the evolution of the context will be increasingly present in future developments.

Research directions also exist for integrating data resources and computation resources. Indeed, data are normally not used directly but are processed before being delivered to users. Moving the data to the computation nodes can be inefficient compared with moving the processing to the data (as in the mobile agent approach for Distributed Query Processing). OGSA-DAI opened the path for this integration (by defining operations during the execution of some requests). Subsequent work on data flow and workflow in grids [Glatard et al., 2005] paved the way for more efficiency and optimization. The idea is to allocate computation resources taking into account the placement of the data manipulated by the processes.
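The placement idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the scheduler of any particular grid middleware: the names `Node`, `transfer_cost` and `place_job`, the shape of the replica catalogue, and the cost model (the number of bytes that would have to be copied to a node) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A computation node (hypothetical model): a name and its free CPUs."""
    name: str
    free_cpus: int


def transfer_cost(dataset_sizes, replicas, node_name):
    """Bytes to copy to node_name: the size of every dataset without a local replica."""
    return sum(
        size
        for dataset, size in dataset_sizes.items()
        if node_name not in replicas.get(dataset, set())
    )


def place_job(inputs, dataset_sizes, replicas, nodes, cpus_needed=1):
    """Pick the node that needs the least data moved to it.

    Nodes without enough free CPUs are ignored; ties on transfer cost
    are broken in favour of the node with more free CPUs.
    """
    candidates = [n for n in nodes if n.free_cpus >= cpus_needed]
    if not candidates:
        raise RuntimeError("no node has enough free CPUs")
    needed = {dataset: dataset_sizes[dataset] for dataset in inputs}
    return min(
        candidates,
        key=lambda n: (transfer_cost(needed, replicas, n.name), -n.free_cpus),
    )


if __name__ == "__main__":
    # Assumed toy catalogue: sizes in megabytes, replicas as sets of node names.
    nodes = [Node("A", 4), Node("B", 8), Node("C", 0)]
    sizes = {"scan": 500, "atlas": 50}
    replicas = {"scan": {"A", "C"}, "atlas": {"B"}}
    chosen = place_job(["scan", "atlas"], sizes, replicas, nodes)
    print(chosen.name)
```

In this toy catalogue, node C holds the large dataset but has no free CPU, so the choice falls on the remaining node that requires the smallest transfer. A realistic scheduler would also weigh network bandwidth, queue lengths and the cost of shipping the processing code itself.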

Another direction for data management in grids will develop with the concept of Cloud Computing. Cloud computing covers everything from integrating on-demand resources (Amazon, for instance) up to deploying specific middleware (like Grid’5000 [Cappello et al., 2005]) on the fly for customers. Today such infrastructures are mainly based on dedicated clusters, but soon Grid Cloud Computing will become a reality. Data management in such environments will build on work in grid computing, with more effort on quality of service and on accounting to enable comprehensive business models.

4.7 Concluding remarks

This chapter gave an overview of the different techniques related to data management in today’s computing grids. It tried to sketch the fundamental differences from data management in other distributed systems and to delineate the links with distributed databases and the corresponding background. Exploring several problems and some of their solutions, we proposed a comprehensive view of the data management techniques for classical problems: identification, replication, access, query, security and consistency.

Finally, we briefly gave some future directions for data management in grids, towards autonomic and on-demand computing.

Acknowledgment

The author would like to thank his esteemed colleagues Lionel Brunie, Georges Da Costa, Abdelkader Hameurlain and Harald Kosch, and all those who raised interesting and lively discussions in recent years, especially during the Data Management in Grids workshops [Pierson, 2005], [Pierson and Brunie, 2007], [Pierson and Kosch, 2008].

118 Fundamentals of Grid Computing

4.8 References

[Alfieri et al., 2005] Alfieri, R., Cecchini, R., Ciaschini, V., dell’Agnello, L., Frohner, A., Lorentey, K., and Spataro, F. (2005). From gridmap-file to VOMS: managing authorization in a grid environment. Future Generation Computer Systems, 21(4):549–558.

[Allcock et al., 2005] Allcock, W., Bresnahan, J., Kettimuthu, R., and Link, M. (2005). The Globus striped GridFTP framework and server. In Proceedings of the 2005 ACM/IEEE Conference on Supercomputing (SC’05), page 54, Washington, DC, USA. IEEE Computer Society.

[Antonioletti et al., 2005] Antonioletti, M., Atkinson, M. P., Baxter, R. M., Borley, A., Hong, N. P. C., Collins, B., Hardman, N., Hume, A. C., Knox, A., Jackson, M., Krause, A., Laws, S., Magowan, J., Paton, N. W., Pearson, D., Sugden, T., Watson, P., and Westhead, M. (2005). The design and implementation of grid database services in OGSA-DAI. Concurrency: Practice and Experience, 17(2–4):357–376.

[Bertino and Özsu, 1994] Bertino, E. and Özsu, M. T. (1994). Guest editors’ introduction. Distributed and Parallel Databases, 2(1):5–6.

[Bhardwaj and Sinha, 2005] Bhardwaj, D. and Sinha, M. (2005). GridFS: ensuring high-speed data transfer using massively parallel I/O. In Bhalla, S., editor, Proceedings of the 4th International Workshop on Databases in Networked Information Systems (DNIS 2005), volume 3433 of Lecture Notes in Computer Sciences, pages 280–287. Springer-Verlag. Available online at: http://springerlink.metapress.com/openurl.asp?genre=article&issn=0302-9743&volume=3433&spage=280 (accessed May 1, 2009).

[Buyya et al., 2008] Buyya, R., Pathan, M., and Vakali, A., editors (2008). Content delivery networks. Springer-Verlag.

[Capit et al., 2005] Capit, N., Costa, G. D., Georgiou, Y., Huard, G., Martin, C., Mounié, G., Neyron, P., and Richard, O. (2005). A batch scheduler with high level components. In Proceedings of the 5th International Symposium on Cluster Computing and the Grid (CCGrid 2005), pages 776–783. IEEE Computer Society.

[Cappello et al., 2005] Cappello, F., Caron, E., Daydé, M. J., Desprez, F., Jégou, Y., Primet, P. V.-B., Jeannot, E., Lanteri, S., Leduc, J., Melab, N., Mornet, G., Namyst, R., Quétier, B., and Richard, O. (2005). Grid’5000: a large scale and highly reconfigurable grid experimental testbed. In Proceedings of the 6th IEEE/ACM International Conference on Grid Computing (GRID’2005), pages 99–106, Seattle, Washington, USA. IEEE Computer Society.

[Cardenas et al., 2007] Cardenas, Y., Pierson, J.-M., and Brunie, L. (2007). Management of a cooperative cache in grids with grid cache services. Concurrency: Practice and Experience, 19(16):2141–2155.

[Chadwick et al., 2008] Chadwick, D. W., Zhao, G., Otenko, S., Laborde, R., Su, L., and Nguyen, T.-A. (2008). PERMIS: a modular authorization infrastructure. Concurrency: Practice and Experience, 20(11):1341–1357.

[Chang and Chang, 2006] Chang, R.-S. and Chang, J.-S. (2006). Adaptable replica consistency service for data grids. In Proceedings of the 3rd International Conference on Information Technology (ITNG’06), pages 646–651, Washington, DC, USA. IEEE Computer Society.

[Chen et al., 2007] Chen, Y., Berry, D., and Dantressangle, P. (2007). Transaction-based grid database replication. In UK e-Science All Hands Meeting 2007, Nottingham, UK.

[Chervenak et al., 2001] Chervenak, A., Foster, I., Kesselman, C., Salisbury, C., and Tuecke, S. (2001). The data grid: towards an architecture for the distributed management and analysis of large scientific datasets. Journal of Network and Computer Applications, 23:187–200.

[Czajkowski et al., 2001] Czajkowski, K., Kesselman, C., Fitzgerald, S., and Foster, I. T. (2001). Grid information services for distributed resource sharing. In Proceedings of the 10th International Symposium on High Performance Distributed Computing (HPDC’2001), pages 181–194, San Francisco, USA. IEEE Computer Society. Available online at: http://csdl.computer.org/comp/proceedings/hpdc/2001/1296/00/12960181abs.htm (accessed May 1, 2009).

[EGEE, 2009] EGEE (2009). File transfer service. Available online at: http://egee-jra1-dm.web.cern.ch/egee-jra1-dm/FTS/ (accessed May 1, 2009).

[Foster and Kesselman, 2004] Foster, I. and Kesselman, C., editors (2004). The grid: blueprint for a new computing infrastructure. Morgan Kaufmann, 2nd edition.

[Foster et al., 2001] Foster, I., Kesselman, C., and Tuecke, S. (2001). The anatomy of the grid: enabling scalable virtual organizations. International Journal High Performance Supercomputer Applications, 15(3):200–222.

[Glatard et al., 2005] Glatard, T., Montagnat, J., and Pennec, X. (2005). Grid-enabled workflows for data intensive medical applications. In Proceedings of the 18th International Symposium on Computer-Based Medical Systems (ISCBMS), pages 537–542. IEEE Computer Society.

[gLite, 2009] gLite (2009). Documentation. Available online at: http://glite.web.cern.ch/glite/ (accessed May 1, 2009).

[Globus, 2009a] Globus (2009a). Documentation. Available online at: http://www.globus.org (accessed May 1, 2009).

[Globus, 2009b] Globus (2009b). Reliable file transfer. Available online at: http://www.globus.org/toolkit/docs/4.0/data/rft/ (accessed May 1, 2009).

[Gossa et al., 2007] Gossa, J., Pierson, J.-M., and Brunie, L. (2007). Adaptable distance-based decision-making support in dynamic cross-grid environment. In Kermarrec, A.-M., Bougé, L., and Priol, T., editors, Proceedings of the 13th International EuroPar Conference (EuroPar’2007), volume 4641 of Lecture Notes in Computer Sciences, pages 437–446. Springer-Verlag.

[Gribble et al., 2001] Gribble, S. D., Halevy, A. Y., Ives, Z. G., Rodrig, M., and Suciu, D. (2001). What can databases do for peer-to-peer? In WebDB, pages 31–36.

[Hoschek et al., 2000] Hoschek, W., Jaén-Martínez, F. J., Samar, A., Stockinger, H., and Stockinger, K. (2000). Data management in an international data grid project. In GRID, pages 77–90. Available online at: http://link.springer.de/link/service/series/0558/bibs/1971/19710077.htm (accessed May 1, 2009).

[Hupfeld et al., 2008] Hupfeld, F., Cortes, T., Kolbeck, B., Stender, J., Focht, E., Hess, M., Malo, J., Marti, J., and Cesario, E. (2008). The XtreemFS architecture: a case for object-based file systems in grids. Concurrency: Practice and Experience, 20(17):2049–2060. Available online at: http://dblp.uni-trier.de/db/journals/concurrency/concurrency20.html#HupfeldCKSFHMMC08 (accessed May 1, 2009).

[IBM, 2009] IBM (2009). Autonomic computing: IBM’s perspective on the state of information technology. Available online at: http://researchweb.watson.ibm.com/autonomic/ (accessed May 1, 2009).

[Jung and Yeom, 2008] Jung, I. Y. and Yeom, H. Y. (2008). An efficient and transparent transaction management based on the data workflow of HVEM data grid. In Proceedings of the 6th International Workshop on Challenges of Large Applications in Distributed Environments (CLADE ’08), pages 35–44, New York, NY, USA. ACM Press.

[Lynden et al., 2009] Lynden, S., Mukherjee, A., Hume, A. C., Fernandes, A. A. A., Paton, N. W., Sakellariou, R., and Watson, P. (2009). The design and implementation of OGSA-DQP: a service-based distributed query processor. Future Generation Computer Systems, 25(3):224–236.

[Morvan and Hameurlain, 2009] Morvan, F. and Hameurlain, A. (2009). Dynamic query optimization: towards decentralized methods. International Journal of Intelligent Information and Database Systems. Available online at: http://www.inderscience.com (accessed May 1, 2009).

[Ozsu and Valduriez, 1999] Ozsu, T. M. and Valduriez, P. (1999). Principles of distributed database systems. Prentice Hall, Englewood Cliffs, NJ, USA, 2nd edition. Available online at: http://www.amazon.ca/exec/obidos/redirect?tag=citeulike09-20&path=ASIN/0136597076 (accessed May 1, 2009).

[Pahlevi and Kojima, 2004] Pahlevi, S. and Kojima, I. (2004). OGSA-WebDB: an OGSA-based system for bringing web databases into the grid. Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC 2004), 2:105–109.

[Parashar and Pierson, 2009] Parashar, M. and Pierson, J.-M. (2009). Pervasive grids: challenges and opportunities. In Handbook of Research on Scalable Computing Technologies. IGI Global.

[Pearlman et al., 2003] Pearlman, L., Kesselman, C., Welch, V., Foster, I., and Tuecke, S. (2003). The community authorization service: status and future. In Proceedings of Computing in High Energy Physics (CHEP ’03).

[Pierson, 2005] Pierson, J.-M., editor (2005). Proceedings of the 1st Workshop on VLDB Data Management (VLDB DMG’2005), volume 3836 of Lecture Notes in Computer Sciences. Springer-Verlag.

[Pierson, 2008] Pierson, J.-M. (2008). Data management concerns in a pervasive grid. In Proceedings of the International Conference on Vector and Parallel Processing (VECPAR), number 5336 in Lecture Notes in Computer Sciences, pages 506–520. Springer-Verlag. Available online at: http://www.springerlink.com (accessed May 1, 2009).

[Pierson and Brunie, 2007] Pierson, J.-M. and Brunie, L., editors (2007). Proceedings of the Workshop on VLDB Data Management in Grids (VLDB DMG’2006), volume 19.

[Pierson and Kosch, 2008] Pierson, J.-M. and Kosch, H., editors (2008). Proceedings of the Workshop on VLDB Data Management in Grids (VLDB DMG 2007), volume 20.

[Pucciani, 2008] Pucciani, G. (2008). The replica consistency problem in data grids. PhD thesis, University of Pisa, Pisa, Italy.

[Qi et al., 2004] Qi, Z., You, J., Jin, Y., and Tang, F. (2004). GridTP services for grid transaction processing. In Li, M., Sun, X.-H., Deng, Q., and Ni, J., editors, Proceedings of the Second International Workshop on Grid and Cooperative Computing (GCC 2003), volume 3033 of Lecture Notes in Computer Sciences, pages 891–894. Springer-Verlag. Available online at: http://springerlink.metapress.com/openurl.asp?genre=article&issn=0302-9743&volume=3033&spage=891 (accessed May 1, 2009).

[Scavo and Welch, 2007] Scavo, T. and Welch, V. (2007). A grid authorization model for science gateways. In Proceedings of the Workshop on Grid Computing Environments (GCE).

[Seitz et al., 2003] Seitz, L., Pierson, J.-M., and Brunie, L. (2003). Key management for encrypted data storage in distributed systems. In Proceedings of the 2nd International IEEE Security in Storage Workshop (SISW 2003), pages 20–30. IEEE Computer Society. Available online at: http://csdl.computer.org/comp/proceedings/sisw/2003/2059/00/20590020abs.htm (accessed May 1, 2009).

[Seitz et al., 2005] Seitz, L., Pierson, J.-M., and Brunie, L. (2005). Sygn: a certificate based access control in grid environments. Technical Report 2005-07, LIRIS.

[Shibboleth, 2009] Shibboleth (2009). Internet2. Available online at: http://shibboleth.internet2.edu/ (accessed May 1, 2009).

[Stockinger et al., 2003] Stockinger, H., Donno, F., Laure, E., Muzaffar, S., Kunszt, P., and Millar, P. (2003). Grid data management in action: experience in running and supporting. In Proceedings of the EU DataGrid Project on Computing in High Energy Physics (CHEP 2003), pages 24–28.

[Stockinger et al., 2001] Stockinger, H., Rana, O. F., Moore, R., and Merzky, A. (2001). Data management for grid environments. In Hertzberger, L. O., Hoekstra, A. G., and Williams, R., editors, Proceedings of the 9th International Conference on High-Performance Computing and Networking (HPCN 2001), volume 2110 of Lecture Notes in Computer Sciences, pages 151–160. Springer-Verlag. Available online at: http://link.springer.de/link/service/series/0558/bibs/2110/21100151.htm (accessed May 1, 2009).

[Sun and Xu, 2004] Sun, Y. and Xu, Z. (2004). Grid replication coherence protocol. Proceedings of the International Parallel and Distributed Processing Symposium, 14:232.

[Teaff et al., 1995] Teaff, D., Watson, D., and Coyne, B. (1995). The architecture of the high performance storage system. In Proceedings of the Goddard Conference on Mass Storage and Technologies, pages 28–30.

[Thizbolt et al., 2007] Thizbolt, F., Ortiz, A., and M’zoughi, A. (2007). VisageFS: dynamic storage features for wide-area workflows. In Zheng, S., editor, Proceedings of the International Conference on Parallel and Distributed Computing Systems (PDCS), pages 61–66. ACTA Press. Available online at: http://www.actapress.com (accessed May 1, 2009).

[Türker et al., 2005] Türker, C., Haller, K., Schuler, C., and Schek, H.-J. (2005). How can we support grid transactions? Towards peer-to-peer transaction processing. In CIDR, pages 174–185. Available online at: http://www.cidrdb.org/cidr2005/papers/P15.pdf (accessed May 1, 2009).

[University of North Carolina, 2009] University of North Carolina (2009). Storage resource broker. Available online at: http://www.sdsc.edu/srb (accessed May 1, 2009).

[Wang et al., 2008] Wang, T., Vonk, J., Kratz, B., and Grefen, P. (2008). A survey on the history of transaction management: from flat to grid transactions. Distributed Parallel Databases, 23(3):235–270.

[Wolski et al., 1999] Wolski, R., Spring, N. T., and Hayes, J. (1999). The network weather service: a distributed resource performance forecasting service for metacomputing. Future Generation Computer Systems, 15(5–6):757–768.

Chapter 5

Future of grids resources