
Mining grid services

Rahul Ramachandran, Sara Graves, John Rushing, Ken Keizer, Manil Maskey, Hong Lin and Helen Conover

4.6 Mining grid services

The Linked Environments for Atmospheric Discovery (LEAD) project⁶ is building a comprehensive cyberinfrastructure in mesoscale meteorology using grid computing technologies. This multidisciplinary effort involving nine US institutions and some 80 scientists and students is addressing the fundamental IT and meteorology research challenges needed to create an integrated, scalable framework for identifying, accessing, decoding, assimilating, predicting, managing, analysing, mining and visualizing a broad array of meteorological data and model output, independent of format and physical location (Droegemeier et al., 2005). One of the major goals of LEAD is to develop and deploy technologies that will allow people (students, faculty, research scientists, operational practitioners) and atmospheric tools (radars, numerical models, data assimilation systems, data mining engines, hazardous weather decision support systems) to interact with weather. Even though mesoscale meteorology is the driver for LEAD, the functionalities and infrastructure being developed are extensible to other domains such as biology, oceanography and geology (Droegemeier et al., 2005). An overview of the LEAD architecture and middleware components can be found in the work of Gannon et al. (2007).

The grid mining service approach for data mining, shown in Figure 4.7, is similar to the Web service implementation (Figure 4.3), with modifications necessitated by the complexities and capabilities of the grid middleware. Both grid and Web mining services are built on ADaM executable modules. A data mining workflow can be created using a workflow composition tool, which is part of the LEAD portal. A variant of BPEL, called GPEL (Kandaswamy et al., 2006), provides additional grid-specific features used to describe workflows in LEAD. A GPEL engine is used to parse the workflow. One of the differences between the Web and grid service implementations is that LEAD has only a small set of primary persistent services. ‘Application services’, including the mining modules, are virtual and are created on demand by an application factory. The GPEL engine queries the application factory service to determine the best instance of the mining service and uses the query result to invoke the individual services on different host computers.

Figure 4.7 Grid Mining Service architecture

5 The ActiveBPEL Open Source Engine: http://www.activebpel.org/

6 https://portal.leadproject.org/gridsphere/gridsphere

4.6.1 Architecture components

The Generic Application Factory (GFac) service (Slominski, 2007) provides a mechanism to convert any application executable, such as an ADaM Toolkit component, into a service. GFac requires a shell-script wrapper around the application executable that meets the GFac Service Toolkit specifications. Using GFac to convert ADaM modules to LEAD grid services entails four steps: wrapping an application executable, registering the service, creating an instance of the service from the registry and finally invoking the service. The ADaM Toolkit components can reside on separate hosts (computing resources) from the Factory service within the grid.

The registration, creation and invocation of ADaM grid services are described in detail below.

Service registration Shell-wrapped ADaM modules are registered as services in a GFac registry. Registering a service captures metadata about the application server’s host and the ADaM module’s input and output parameters. A GFac-specific XML descriptor is created for the registration of each service. GFac also provides a Web portal interface for registering these services. During workflow creation the registry is loaded automatically so that these ADaM services can be discovered.
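To make the registration metadata concrete, the following minimal Python sketch builds a descriptor of the kind described above: host information plus input and output parameters. It is an illustration only. The element names, attribute names, host name, wrapper path and parameter types are all hypothetical and do not reproduce the actual GFac descriptor schema.

```python
import xml.etree.ElementTree as ET


def build_descriptor(service_name, host, executable, inputs, outputs):
    """Build an illustrative registration descriptor as an XML string.

    Element and attribute names are hypothetical, not the GFac schema.
    """
    root = ET.Element("serviceDescriptor", name=service_name)
    # Metadata about where the wrapped executable lives.
    app = ET.SubElement(root, "application")
    ET.SubElement(app, "host").text = host
    ET.SubElement(app, "executable").text = executable
    # One element per input and output parameter of the wrapped module.
    for tag, params in (("input", inputs), ("output", outputs)):
        for param_name, param_type in params:
            ET.SubElement(root, tag, name=param_name, type=param_type)
    return ET.tostring(root, encoding="unicode")


# All values below are made up for the example.
descriptor = build_descriptor(
    service_name="KMeansService",
    host="mining-host.example.org",
    executable="/opt/adam/bin/kmeans_wrapper.sh",
    inputs=[("inputFile", "URI"), ("numClusters", "Integer")],
    outputs=[("clusterFile", "URI")],
)
print(descriptor)
```

A registry that stores descriptors like this one has enough information for a workflow composer to list each service together with its inputs and outputs.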

Service creation An instance of any registered ADaM service can be created using the information in the GFac registry. After the mining service instance is created, a URL to the corresponding WSDL is returned. Many instances of the same service can be created. Each of these services will have a unique WSDL. Usually, this action is performed using a grid portal. The GFac portlet contacts the GFac registry and creates a service.
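The relationship between a registry entry and its service instances can be sketched with a toy in-memory factory. This is not the GFac API: the class, its method names and the URL layout are assumptions made for illustration. It shows only the behaviour described above, where each created instance returns its own WSDL URL.

```python
import itertools


class ToyFactory:
    """In-memory stand-in for an application factory backed by a registry.

    Hypothetical names throughout; this does not model the real GFac service.
    """

    def __init__(self, base_url):
        self.base_url = base_url
        self.registry = {}               # service name -> registration metadata
        self._ids = itertools.count(1)   # source of unique instance numbers

    def register(self, name, metadata):
        self.registry[name] = metadata

    def create_instance(self, name):
        """Create an instance of a registered service and return its WSDL URL."""
        if name not in self.registry:
            raise KeyError(f"service {name!r} is not registered")
        instance_id = next(self._ids)
        return f"{self.base_url}/{name}/{instance_id}?wsdl"


factory = ToyFactory("https://factory.example.org/services")
factory.register("KMeansService", {"host": "mining-host.example.org"})
first = factory.create_instance("KMeansService")
second = factory.create_instance("KMeansService")
print(first)
print(second)
```

Creating the same service twice yields two distinct URLs, which mirrors the statement that many instances can exist, each with a unique WSDL.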

Service execution An instance of an ADaM service can be invoked as a standalone service from GFac or as a component of a workflow from a workflow composer via a portal. The input and output data for the service are staged using GridFTP for data transfer. Access to the GFac and workflow portlets is via the portal, and the grid credential used for the portal is also used to instantiate mining workflows at the application server.

Workflow composer Mining workflows are composed using XBaya, a graphical user interface for composing services. The composed workflows are deployed to a Grid Process Execution Language (GPEL) engine. During service registration, the user can select the input and output data to be staged, and any data to be published to the data catalogue.

GPEL GPEL is an XML-based workflow process description language. Its set of XML schemas closely follows the industry-standard process execution language, BPEL. To utilize grid resources, GPEL adds attributes that split a service into multiple tasks and eventually combine the results. In addition, GPEL allows parallel execution of workflow components, which is essential for mining large volumes of data.
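The split, parallel execution and combine pattern described for GPEL can be illustrated outside any workflow engine. The sketch below uses Python's standard thread pool rather than GPEL itself, and the per-chunk task (counting pixel values above a threshold) is a hypothetical stand-in for a mining module.

```python
from concurrent.futures import ThreadPoolExecutor


def split(values, n_tasks):
    """Divide a list into at most n_tasks contiguous chunks."""
    size = max(1, -(-len(values) // n_tasks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def count_above(chunk, threshold=128):
    """Hypothetical per-chunk mining task: count values above a threshold."""
    return sum(1 for v in chunk if v > threshold)


def scatter_gather(values, n_tasks=4):
    """Split the input, run the task on each chunk in parallel, combine results."""
    chunks = split(values, n_tasks)
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        partial_counts = list(pool.map(count_above, chunks))
    return sum(partial_counts)


# Synthetic input data for the example.
pixels = [(37 * i) % 256 for i in range(1000)]
total = scatter_gather(pixels)
print(total)
```

The combined result equals what a single sequential pass would produce. In a grid setting the chunks would be dispatched to service instances on different hosts instead of local threads.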

4.6.2 Workflow example

A mining workflow to generate a cloud mask by applying an unsupervised classifier (k-means clustering algorithm) to images from the Geostationary Operational Environmental Satellite (GOES) can be created using the XBaya composer in the LEAD portal. The workflow contains additional processing services required to convert the input data to the required ADaM data model and to convert the result to an image file for visualization. The original data and the workflow results are presented in Figure 4.8.
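The clustering step of such a workflow can be sketched in a few lines. The code below is not the ADaM k-means module. It is a minimal pure-Python version of Lloyd's algorithm for two clusters on scalar pixel values, applied to a small synthetic image whose values are invented for the example.

```python
def kmeans_two_class(pixels, iterations=20):
    """Cluster scalar pixel values into two classes with Lloyd's algorithm.

    Returns (mask, centres), where mask[i] is 1 for the brighter cluster.
    """
    lo, hi = float(min(pixels)), float(max(pixels))  # initial centres
    for _ in range(iterations):
        # Assignment step: each pixel joins the nearer centre.
        dark = [p for p in pixels if abs(p - lo) <= abs(p - hi)]
        bright = [p for p in pixels if abs(p - lo) > abs(p - hi)]
        # Update step: each centre moves to the mean of its members.
        new_lo = sum(dark) / len(dark) if dark else lo
        new_hi = sum(bright) / len(bright) if bright else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    mask = [1 if abs(p - hi) < abs(p - lo) else 0 for p in pixels]
    return mask, (lo, hi)


# Synthetic 4x4 "image": low values stand for surface, high values for cloud.
image = [
    [20, 25, 200, 210],
    [22, 30, 205, 220],
    [18, 27, 35, 215],
    [21, 24, 28, 33],
]
flat = [p for row in image for p in row]
mask, centres = kmeans_two_class(flat)
mask_rows = [mask[r * 4:(r + 1) * 4] for r in range(4)]
for row in mask_rows:
    print(row)
```

In the workflow described above, the conversion services before and after the clustering step play the role of the flattening and reshaping done here by hand.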

Figure 4.8 Clustering results from the sample workflow. (A) Original GOES image. (B) Two-class cluster mask generated by the k-means algorithm


4.7 Summary

The redesigned ADaM Toolkit has been adapted to provide a suite of Web and grid services. Even though the basic architecture design and component functionality have not changed from the original ADaM system design, standards-based specifications and protocols have replaced internal specifications and protocols. The use of these standards maximizes the interoperability of these services with other applications and tools. ADaM services allow scientists to utilize the components in the mining and image processing toolkit in a distributed mode. Using different workflow composers, scientists can mix and match various ADaM services with other available services to create complex mining and analysis workflows to solve their specific problems. Even though the applications of these services presented in this chapter have focused on Earth Science problems, these mining services can be, and are being, effectively applied to solve problems in various other domains.

Acknowledgements

The authors would like to acknowledge the contributions of Chris Lynnes and Long Pham at the NASA Goddard Earth Sciences Data and Information Services Center for the ADaM Web services effort. Dennis Gannon at Indiana University and his students have been instrumental in incorporating the ADaM Toolkit as grid services within the LEAD project. This research work was supported by NASA grant NNG06GG18A and NSF grant ATM-0331579.

References

Berendes, T., Ramachandran, R., Graves, S. and Rushing, J. (2007), ADaMIVICS: a software tool to mine satellite data, in ‘87th AMS Annual Meeting’.

Droegemeier, K. K., Gannon, D., Reed, D., Plale, B., Alameda, J., Baltzer, T., Brewster, K., Clark, R., Domenico, B., Graves, S., Joseph, E., Morris, V., Murray, D., Ramachandran, R., Ramamurthy, M., Ramakrishnan, L., Rushing, J., Weber, D., Wilhelmson, R., Wilson, A., Xue, M. and Yalda, S. (2005), ‘Service-oriented environments in research and education for dynamically interacting with mesoscale weather’, IEEE Computing in Science and Engineering 7, 24–32.

Dunham, M. H. (2003), Data Mining: Introduction and Advanced Topics, Pearson Education.

Gannon, D., Plale, B., Christie, M., Huang, Y., Jensen, S., Liu, N., Marru, S., Pallickara, S. L., Perera, S., Shirasuna, S., Simmhan, Y., Slominski, A., Sun, Y. and Vijayakumar, N. (2007), Building grid portals for e-science: a service-oriented architecture, in L. Grandinetti, ed., ‘High Performance Computing and Grids in Action’, IOS Press.

Gu, Y. and Grossman, R. L. (2007), ‘UDT: UDP-based data transfer for highspeed wide area networks’, Computer Networks 51, 1777–1799.

Hinke, T. and Novotny, J. (2000), Data mining on NASA’s Information Power Grid, in ‘9th International Symposium on High-Performance Distributed Computing’, pp. 292–293.

Hinke, T., Rushing, J., Kansal, S., Graves, S., Ranganath, H. and Criswell, E. (1997), Eureka phenomena discovery and phenomena mining system, in ‘AMS 13th International Conference on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography and Hydrology’, pp. 277–281.

Hinke, T., Rushing, J., Ranganath, H. and Graves, S. (2000), ‘Techniques and experience in mining remotely sensed satellite data’, Artificial Intelligence Review: Issues on the Application of Data Mining 14, 503–531.

Kandaswamy, G., Fang, L., Huang, Y., Shirasuna, S., Marru, S. and Gannon, D. (2006), ‘Building Web services for scientific application’, IBM Journal of Research and Development 50, 249–260.

Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger-Frank, E., Jones, M., Lee, E., Tao, J. and Zhao, Y. (2005), ‘Scientific workflow management and the Kepler system’, Concurrency and Computation: Practice and Experience 18, 1039–1065.

Lynnes, C. (2006), The simple, scalable, script-based science processor, in J. Qu, W. Gao, M. Kafatos, R. Murphy and V. Salomonson, eds, ‘Earth Science Satellite Remote Sensing’, Springer, pp. 146–161.

Ramachandran, R., Conover, H., Graves, S. J. and Keiser, K. (2000), Algorithm development and mining (ADaM) system for earth science applications, in ‘Second Conference on Artificial Intelligence, 80th AMS Annual Meeting’.

Ramachandran, R., Li, X., Movva, S., Graves, S., Nair, U. S. and Lynnes, C. (2007), Investigating data mining techniques to detect dust storms in MODIS imagery, in ‘32nd International Symposium on Remote Sensing of Environment’.

Rushing, J., Ramachandran, R., Nair, U., Graves, S., Welch, R. and Lin, A. (2005), ‘ADaM: a data mining toolkit for scientists and engineers’, Computers and Geosciences 31, 607–618.

Rushing, J. A., Ranganath, H. S., Hinke, T. and Graves, S. J. (2001), ‘Using association rules as texture features’, IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 845–858.

Slominski, A. (2007), Adapting BPEL to scientific workflows, in I. Taylor, E. Deelman, D. Gannon and M. Shields, eds, ‘Workflows for e-Science’, Springer, pp. 208–226.

5

Mining for misconfigured