• Aucun résultat trouvé

Metadata Management in the EU DataGrid

N/A
N/A
Protected

Academic year: 2022

Partager "Metadata Management in the EU DataGrid"

Copied!
20
0
0

Texte intégral

(1)

Metadata Management in the European DataGrid Project

Gavin McCance

University of Glasgow

European DataGrid Project GridPP Project

(2)

Outline

‹

Classes of metadata in EDG

„ Grid internal metadata

„ Application specific metadata

‹

Products

„ Replica catalogues

„ Spitfire

‹

Technology details

‹

Future Work

(3)

Types of Metadata

‹

Two types of metadata used in EDG WP2

‹

Grid internal metadata

„ Metadata on files (size, checksum, etc)

„ Metadata on logical names (application specific)

‹

Application specific general metadata

„ Not related on logical filenames

„ Bookkeeping databases

„ Data Catalogues

„ Image metadata

„ etc

(4)

Grid Internal

Replication Metadata

(5)

Replica Location Problem

‹

Given a logical file identifier – how do we find all the replicas of that file on the Grid

‹

Driven by two use-cases:

„ a) Particle physics – multiple replica of the same file so that the data are always near the compute resources - for data hungry applications

„ b) Earth Observation/Medical – convenient mechanism for logical namespace. Don’t need to know the physical

location of the files.

(6)

Replica Metadata

‹

Logical filename to storage (physical) filename mapping

GUID

Logical Alias

Logical Alias Logical Alias

Physical File Replica

Physical File Replica Physical File Replica Physical File Replica

Replica Location Service Replica Metadata Catalog

(7)

Replica Location Service (RLS)

‹

Optimised to answer 2 very specific queries:

“for a given GUID, give me all the replicas”

“for a given GUID give me all locally available replicas”

‹

Scalability achieved by:

„ Each site has a Local Replica Catalog LRC containing mappings for files located at the given site

„ Each site runs a Replica Location Index RLI which contains a bloom-filter hashmap for all GUIDs in all LRCs

(8)

Architecture…

Replica Location Index

Local Replica

Catalog Local Replica

Catalog

Local Replica Catalog

Site 1 Site 2 Site 3

(9)

Architecture…

‹Each LRC updates the RLI on every other site.

Replica Location Index

Replica Location

Index Replica Location

Index Local Replica

Catalog

Local Replica

Catalog Local Replica

Catalog

Site 1 Site 2 Site 3

(10)

Sequence to answer the query

‹

for a given GUID, give me all locally available replicas

„ simply contact the Local Replica Catalog.

‹

for a given GUID, give me all the replicas

„ contact Replica Location Index to retrieve all LRCs potentially having a mapping for the given GUID:

GUID Æ List of LRCs

„ contact each LRC in the list to retrieve all replicas

(11)

Bloom Filter Indexing

‹

Advantages:

„ High level of scalability

„ Fast

„ Not a memory intensive hash

‹

Disadvantages:

„ Only fulfills “EQUALTY” type queries, i.e. no wildcards

„ Non-deterministic, i.e. there are a small number of false positives to be dealt with

(12)

Replica Metadata Catalog (RMC)

‹

Stores GUID metadata:

„ logical file names (human readable)

„ small number of user-defined attributes ~O(10)

‹

Attributes are natively typed:

„ string, float, int, date

(13)

RMC

‹

Used to do GUID selection based on application- specific metadata

„ Subsequently use the RLS to find the physical replica based on the GUID

‹

Currently a centralised catalog

„ though work ongoing with Oracle Streams for replicated architectures

„ Work on clustering and replication for high availability solutions

(14)

Application Specific

General Metadata

(15)

Spitfire: Technology Demonstrator

‹

Capabilities:

„ Simple Grid-enabled front-end for any remote RDBMS through secure Web Services (SOAP-RPC)

„ Provides sample generic RDBMS methods that may easily be customized with little additional development

„ WSDL interfaces

„ Web Browser integration (data browser servlet)

„ GSI authentication

„ Local authorization module

„ Not suitable for the retrieval of LARGE result sets

‹

Status: current version 2.1

„ Used by EU DataGrid Earth Observation and Biomedical applications.

(16)

Spitfire Sample API

‹

Spitfire Sample API based upon common SQL

operations. Use the Spitfire Grid service where you might have used JDBC before.

‹

Provides DB query operations, update operations, and schema update operations.

‹

Provides browser servlet to expose specific views of

the data to web based clients.

(17)

Technology details

‹

All services implemented as secure web services

‹

WSDL exposed allowing auto-client generation

„ Supplied clients: Java, C++

„ Others have successfully used perl, python clients using our WSDL

‹

SSL secure authentication using Grid Proxy certificates (GSI, but NOT httpg)

‹

‘Medium-grained’ authorization including web-based administration tool:

„ ‘medium-grained’ meaning each method can be

allowed/denied based on patterns of distinguished names, VOMS capabilities.

„ can interpret grid-map files

„ can interpret VOMS credentials and capabilities contained therein

(18)

Deployment

‹

Tested and deployed on

„ Tomcat/MySQL,

„ Tomcat/Oracle9i

„ Oracle9iAS/Oracle 9i.

‹

Testing ongoing for Tomcat/DB2.

(19)

Future Work

‹

Plan to work together with DAIS working group of GGF to ensure that our services can be re-factored into DAIS-compliant services.

„ Should be fairly easy since we are starting from web services.

‹

Plan to work more closely with applications in order

to refine the metadata interface, or just to enable

their existing metadata applications to be ‘on the

Grid’.

(20)

Security modules

‹

Authentication using standard GSI certs or proxies

„ Trustmanager checks validity and revocation

‹

Role based Authorisation

„ Specific and default roles

(spitfire−cacerts.jks)Trusted CAs

Revoked cert repository Has certificate

been revoked?

Find default Does user specify role?

yes

no

Role ok?

Role repository

Oracle XSQL servlet

Map role to connection id

Connection mappings

.xsql files Is certificate signed

by a trusted CA?

HTTP + SSL Request + client certificate

TomCat

SSLServerSocketFactory

Servlet chain Security Servlet

no

Authorisation module

role

Request + connection id TrustManager

Références

Documents relatifs

Correspondence to: Chunhui Lai, Department of Mathematics, Zhangzhou Teachers College, ihangzhou, Fujian 363000, China.. All rights

Paragraph 77 of the Operational Directives for the Implementation of the Convention provides that recognition to contributors shall be provided through an

In the last step, the software engineer has to manually select the model fragment that he believes is the one corresponding to the target feature taking into account the

This kind of visualization supports the user in understanding and validating the found mapping relations between input and output concepts of the metadata formats.. An example of

Our generic format will be used in the LINDO project in order to integrate any indexation engine in the system without changing the system itself, the output of the new

EGSO is a Grid test-bed related to a particular application Designed to improve access to solar data for the solar physics and other communities.. Addresses the generic problem of

• Systems avoid the hassle of dealing with the different networking protocols involved by using an “ELN Client”. • “ELN Client” hides the complexity of client-BS

Indeed, [4] proved Theorem 1.10 using standard rim hook tableaux instead of character evaluations, though virtually every application of their result uses the