
White Paper

Contemporary approaches in

Hosting and Data Management


Soror Sahri, Salima Benbernou (Université de Paris 5), Jérôme Totel, Loïc Bertin (DATA4),
Cristinel Diaconu (Centre de Physique des Particules de Marseille, CNRS), Itebeddine Ghorbel (INSERM),
Danielle Geldwerth-Feniger (CNRS, Université de Paris 13), Leila Abidi (USPC, Imageries du Vivant),
Christophe Cérin (Université de Paris 13), Philippe Werle (Université de Paris 13), Marie Lafaille (USPC).

Contact: christophe.cerin@lipn.univ-paris13.fr
Version dated February 17, 2018


Preface

White Paper, Contemporary Approaches in Hosting and Data Management: is this just another publication on data hosting, its opportunities and its risks? Well, no! It is above all a pedagogical approach to the subject that has been retained, so that everyone, and especially readers who are not computer science specialists, can make appropriate use of this vast field of data and, in particular, of its hosting.

Data is everywhere, and it is no coincidence that more and more companies, institutions and individuals are interested in it. Still, we need to know where to find it, how to exploit it and where to store it. Especially when it comes to sensitive topics such as health or safety: as soon as there is a supposed or proven confidentiality issue, the media seize it and make a big deal of it, while the huge range of possibilities offered by cloud computing is hardly ever discussed.

This document is, first of all, an excellent synthesis of the state of the art, conceived in collaborative mode, and its results are quite convincing. It illustrates the overall richness of this field and highlights all the aspects of the subject.

As a popularized document, it is accessible to everyone who wants to know more about cloud computing and data hosting, and it treats the major issues in hosting and data management without compromise. As such, it leaves no room for fantasies, especially when it comes to sensitive data.

Security and risk analysis are handled with great rigor, as these questions can still slow down the expansion of cloud computing, especially in the research world, where economic issues can be crucial in the long run. No solution is perfect, but we must certainly take into account the benefits of secure online storage and not forget them: has anyone never been confronted with difficulties in exchanging data before online information-sharing solutions were developed?

At a time when technology evolves ever faster and the flow of information is very rapid, organizations, and especially universities, need to get organized and to call on multiple proficient skills, not only on the technical side but also across the functional areas, since the habits of use and the professions will dictate the solutions to adopt in terms of hosting, storage and information sharing; and the future will probably still hold many surprises!

Whatever the field of activity, and it is probably still true in the field of research, it is from a form of collaboration between multidisciplinary teams that the best ideas are born. France is renowned for the quality of its training and our researchers are not left behind: let us share our know-how to bring our solutions to the international level, and let us not keep to ourselves the best of the fantastic opportunities which cloud computing is providing us with!


intended primarily to serve as the basis of their work, which can be amended, completed, enriched and updated by all those who wish to contribute to it.

Have a pleasant reading.

Françoise Farag

President of Salvia Développement

President of the Institute Council of the IUT de Villetaneuse


Preamble

This document's goal is to explain, step by step, the issues at stake whenever cloud computing technologies are implemented. It has been written in the clearest and most educational way possible, so as to allow any neophyte audience to understand it easily. In terms of vocabulary, we made it self-sufficient by explaining the technical terms as clearly as possible. It is therefore an entry point into the field of data hosting, obviously with its guiding principles and its editorial choices.

This edition is primarily intended for the higher education and research community, who can pinpoint the various issues. Indeed, we assume that any university, or individuals within it, may need concrete elements to define a policy for hosting data in a private cloud, or to imagine the consequences of such a policy in their daily life as researchers. We offer them a complete overview: a collection of objective and factual information that is proposed, discussed and commented on with respect to the underlying ecosystems, public or private. Special attention is given to ethical and legal subjects.

However, the document can be instructive to anyone outside the world of higher education and research who wants to get an idea of the problems and solutions for hosting and managing massive data. We also hope that the reader will find a coherent approach, including methodological elements concerning the migration to, and adoption of, cloud computing technologies and data centers in our daily lives.

A discriminating factor, among others, in deciding whether an individual or an organization has an interest in switching to cloud-based technologies, i.e., outsourcing data, is to ask whether the data entrusted to a third party can be valued safely. It is much the same question as with your money: if you feel that your bank handles your money well, there is no reason to change; if you feel that your bank is not investing enough of your money in the real economy and in innovative projects, you may move to another one.

The cloud is a third party, like a bank, and likewise it implies a choice of relationship. The question is therefore who controls the data and under what type of contract. This report, which we hope will be followed by other editions, is organized into four main parts:

• Part 1 gives vocabulary elements and introduces data hosting and especially research data;

• Part 2 gives a state of the art of the major issues involved in hosting and data management, with a specific discussion of health data;

• Part 3 analyzes the solutions implemented for data hosting; we provide specific examples of implemented projects, mainly in the academic world, with an important focus on security and risk analysis;

• Part 4 develops policy elements and provides guidelines and recommendations for individuals and decision-makers.


November 2016, written in French. This explains why most of the examples come from the French landscape.

We would also like to thank the following persons for their thorough re-readings and comments:

Jean-Philippe Gouigoux (technical director of MGDIS and Microsoft’s Most Valuable Professional on the Azure cloud), Daniel Balvay (PARCC-HEGP and René Descartes University), Laura Werle (IPAG Master 2), Amaladasse Palmyre for his translation, Francis André (CNRS - Direction de l’information scientifique et technique).

The PREDON group (https://predon.org), which takes care of the preservation of scientific data, actively supports this white paper project on data hosting. Several editors of the white paper are also involved in the PREDON action, which will be presented later. Within the interdisciplinary PREDON project, preservation is discussed with the aim of exchanging methods, practices and technologies useful for scientific projects which require the collection and analysis of digital data. The project brings together a wide range of disciplines (particle physics, astrophysics, ecology, computer science and technology, life sciences, etc.) and has contacts in large national computing centers such as CC-IN2P3 (Lyon), CINES (Montpellier) and the CDS (Strasbourg). PREDON is also an action within the GdR Madics (http://madics.fr).


Contents

1 The research data and their management

1.1 What is a datum?
  1.1.1 General definition
  1.1.2 Open data
  1.1.3 The Meta-data

1.2 Some legal aspects concerning the research data
  1.2.1 Intellectual Property
  1.2.2 Rendering anonymous the personal data
  1.2.3 Health data
  1.2.4 Software licences

1.3 The big data
  1.3.1 Data management in the field of research
  1.3.2 Data Management Plan and data life cycle

1.4 Data Hosting and Cloud Computing
  1.4.1 Computer science technology's infrastructure related to the cloud and data
  1.4.2 Service models offered by cloud computing
  1.4.3 Cloud computing's general operations
  1.4.4 Cloud, security and data ownership
  1.4.5 Adoption of cloud
  1.4.6 Storage and archiving in the cloud

1.5 New jobs, new participants and new roles
  1.5.1 Syntec numérique
  1.5.2 APEC's study
  1.5.3 Identifications by the pole Systematic and by OPIIEC
  1.5.4 "New" emerging professions

  1.6.1 CNIL's report
  1.6.2 ANSSI - Digital trust space
  1.6.3 Security Policy for the State
  1.6.4 Security Policy for Research and Health
  1.6.5 CNIL's and ANSSI's outsourcing guidelines
  1.6.6 National Strategy for Digital Security
  1.6.7 ANSSI's 2015 activity report
  1.6.8 Bring all the participants together and get them involved in the security of the information systems. Recommendations of the Senior Defense and Security Officer
  1.6.9 Report to the Senate - Digital Security and Risks

2 Issues in data hosting

2.1 Hosting and links to the cloud and to the data centers
  2.1.1 Introduction
  2.1.2 Functional view of the data life cycle
  2.1.3 Examples of scientific approaches impacted by the cloud and the big data

2.2 Hosting and links with ethics and legal commissions
  2.2.1 The question of hosting personal data
  2.2.2 Hosting health data
  2.2.3 Hosting health data as part of the research
  2.2.4 Hosting authorization procedure for health data

2.3 Data preservation
  2.3.1 Introduction
  2.3.2 Factors pushing for more preservation
  2.3.3 Calculating the dimensions or sizing
  2.3.4 Community Initiatives Internationally
  2.3.5 Specific tools and methodologies
  2.3.6 Complementary readings

2.4 Methodological elements for risk management
  2.4.1 Requirement of security needs to protect information
  2.4.2 Security by design
  2.4.3 Privacy by Design
  2.4.4 A protection approach
  2.4.5 Methodologies
  2.4.6 The need for information security: the EBIOS method
  2.4.7 Elements of an approach
  2.4.8 Risk assessment
  2.4.9 Risk processing
  2.4.10 Examples of risk studies

3 Examples of use cases

3.1 Health fields
  3.1.1 Definition of health data and details about their specificity
  3.1.2 Data type
  3.1.3 Data origination
  3.1.4 CépiDc (Center for Epidemiology of Medical Causes of Death)
  3.1.5 General problems
  3.1.6 Trends in data hosting

3.2 The SPC Life Imaging (IDV) project and the CUMULUS cloud
  3.2.1 Context
  3.2.2 Technical solutions
  3.2.3 Details on the tools integrated in the IDV cloud
  3.2.4 Conclusion

3.3 DATA4 Group's Data Center Overview
  3.3.1 Data center: the state of the art and its evolution
  3.3.2 Infrastructure
  3.3.3 Certifications: a guarantee for customers
  3.3.4 The hyper-logged data center
  3.3.5 The data center as a computer
  3.3.6 Presentation of the DATA4 Paris-Saclay campus
  3.3.7 Description of a typical data center
  3.3.8 Security
  3.3.9 Supervision and maintenance
  3.3.10 Conclusion

3.4 Structure of the EBIOS risk management method
  3.4.1 An iterative process in five modules
  3.4.2 Module 1 – Study of the context
  3.4.3 Module 2 – Study of the feared events
  3.4.4 Module 3 – Study of the threat scenarios
  3.4.5 Module 4 – Risk study
  3.4.6 Module 5 – Study of security measures

3.5 Univcloud University Cloud Security Study
  3.5.1 Future investments in the cloud (according to the Ministry)
  3.5.2 The Digital University in the Paris Region Ile de France (UNPIdF)
  3.5.3 The Univcloud project
  3.5.4 The organization pattern of the project
  3.5.5 The EBIOS method for an Univcloud "Security Record"

4 Conclusion and recommendations

4.1 Preliminaries

4.2 The method of continuous quality improvement
  4.2.1 General ideas
  4.2.2 The quality management standard
  4.2.3 Continuous improvement of this document

4.3 Recommendations
  4.3.1 Recommendations to the research participants
  4.3.2 Recommendations to decision makers

1 The research data and their management

1.1 What is a datum?

1.1.1 General definition

As an overall view, a datum can be defined as follows:

Definition 1 A datum is a set of values referring to the representation and coding of information or knowledge in a form suited to its usage. A datum is not information; it requires interpretation to become information.

In this document we focus on research data, also known as scientific data. According to the Organization for Economic Co-operation and Development (OECD), scientific data are "factual records (numbers, texts, images and sounds) which are used as primary sources for scientific research and are generally recognized by the scientific community as necessary to validate research results". In the Data Management and Sharing Report1, scientific data are defined as "the result of an experiment, usually from a device (robot, sensor, audio or video recording, etc.), or from a human observation of the real world. They are observed mainly in areas where information is acquired in the field (observations, measurements, counting, etc.), and they contribute to the progress of research in terms of knowledge and inventories."

There are different types of research data, which differ according to the way the data are produced and according to their assumed value. Research data can be raw data, processed data, derived data, observation data, experimental data, or computational or simulation data2,3. Often, one of the points common to all these types of data is their large volume, volume being a key dimension of big data, as we will see later.

1 Report of the Working Group on Data Management and Sharing, INRA, 2012

2 From Open Data to Open Research Data: What Policy (s) for Research Data, R. Gaillard, 2014

3 Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century, NSF, 2005


Thus a datum must be considered according to the dimension or attribute which characterizes it more precisely. A datum can be massive or not, open or not, personal or not. These attributes often also refer to specific legal questions.

For example, opening data (to the public or to a community) raises questions about the potential risks of information leakage. Third parties could be attracted by confidential data, which have commercial value. This creates a potential threat to the person whose information is disclosed (public or non-public / private information contained in the data), and therefore risks of unwanted political or economic use against individuals.

1.1.2 Open data

Definition 2 Open data are originally public or private digital data which an organization (community, public service, company) distributes in a structured way, according to a method and an open license guaranteeing their free access and their re-use by all, without any technical, legal or financial restrictions. Before their distribution, the initial data are too often provided in unusable and unstructured formats.

According to the Open Government Data Group4, open data ought to meet eight principles.

They should be:

• complete (each data set must contain all the available data, except those which are subject to privacy, security or privileged access);

• primary (open data are raw data, taken directly from the source, as detailed as possible, without processing or aggregation);

• timely (data must be made available as quickly as possible so that they remain up to date);

• accessible (data must be available to the largest number of people);

• machine-processable (ready to be processed by computer tools);

• non-discriminatory (accessible without registration);

• non-proprietary (available in open formats) and

• free of rights.

The notion of open data is part of the open science movement (Open Science), which considers science as a common asset whose distribution serves the public welfare and the general interest. Thus, opening research data addresses five challenges5:

• accelerate scientific discoveries, innovations and the return on investment in research and development;

• encourage scientific collaboration and opportunities for interdisciplinary research;

• avoid duplicating experiments, promote the reuse of data and minimize the risk of data loss;

• ensure the integrity and reproducibility of research and

• provide access to massive data and open new fields of analysis which were not foreseen by the data producer.

Research data are subject to legal constraints which can hinder their free opening. Moreover, the eight principles mentioned above reflect two groups of aspects that accompany the data-opening process: the first group concerns legal aspects, the second technical aspects.

4 The 8 Principles of Open Government Data, 2007

5 Make public its datasets, Cirad, 2015


R Further details on the open data question can be found in two documents: the "Open Data White Paper" written as part of the Infonomics Resource Facility (IRF) project6, and the CNRS's recently published white paper "Open science in a digital republic"7.

1.1.3 The Meta-data

No data without meta-data! In fact, data are valuable only in a very specific context. The addition of information which describes this context is essential to identify and locate them.

Definition 3 The meta-data are the data which describe other data (semantically the data about the data).

They allow an individual or a computer to understand the meaning and organization of the data, and they make their integration, collection and sharing easier. In the medical field, for example, meta-data will indicate a certain amount of information concerning an image (size, dimensions, device settings used to capture the image, experimental details of the subject such as age, weight, sex and even name). For a textual document, meta-data can specify the length of the document, the name of the author, the date of writing, and a summary which may include opinions or clinical findings. Meta-data are the basis of semantic web techniques.
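To make the notion concrete, here is a minimal sketch (our own illustration, not an excerpt from any metadata standard) of meta-data attached to a medical image, expressed as a Python dictionary; the field names are assumptions chosen for readability.

```python
# Hypothetical meta-data describing one medical image (the datum itself is the file).
image_metadata = {
    "file": "scan_0042.dcm",                 # the described datum
    "size_bytes": 524_288,
    "dimensions_px": (512, 512),
    "device_settings": {"modality": "MRI", "field_strength_T": 3.0},
    "subject": {"age": 54, "weight_kg": 72, "sex": "F"},  # experimental details
    "acquired_on": "2017-06-12",
}

# Meta-data make data findable and shareable: here we check whether the image is an MRI.
def is_mri(meta: dict) -> bool:
    return meta.get("device_settings", {}).get("modality") == "MRI"

print(is_mri(image_metadata))  # True
```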

R The term semantic web8 refers to an evolution of the web which would allow the available data (contents, links) to be more easily usable and automatically interpretable by software agents. In the context of scientific research, the semantic web would, for example, make bibliographic research more efficient. You can refer to the following article, which takes up the issues raised during the training on research data organized by the TGIR Huma-Num, and which deals with the semantic web, meta-data and interoperability within the SHS9 (Humanities and Social Sciences).

The main elements of basic vocabulary having been introduced, we will now discuss some of the legal aspects concerning the data, and then present big data, which is certainly the context in which data management matters most today. We will then return to the data hosting problem, in order to introduce the issues concerning hardware and software infrastructures. The cloud is introduced as one of the infrastructures which make it possible to manage big data. The chapter concludes with a presentation of new professions, new participants and their respective roles in hosting and managing data in the big data era.

1.2 Some legal aspects concerning the research data

Research data raise many legal issues. The rights associated with research data depend on the nature of these data. There are data covered by copyright, others by the sui generis right, i.e., a specific right which belongs to the database producer. There is also a huge amount of data which is

6 Livre sur les données ouvertes (Open Data Book), B. Meszaros et al., 2015

7 Open science in a digital republic, CNRS, 2016

8 The Semantic Web, Scientific American Magazine, 2001

9 Gérer les données de la recherche, de la création à l’interopérabilité (Manage research data from creation to interoperability), J. Demange, 2015


covered by regulations. These regulations may come from directives or from various laws on raw data, such as the INSPIRE directive on geographical data10. The purpose of these directives is primarily to ensure the interoperability and harmonization of the data, for use from anywhere in the world. The regulations may also concern personal data (SHS or bio-medical data), confidential data or data of a sensitive nature (health data, for example). These same data can also be regarded as public information.

Lionel Maurel, lawyer and librarian, emphasizes the importance of establishing "a precise legal diagnosis to determine the status of each layer composing the results of a research project: software, inventions, research data, digitized contents, associated meta-data, editorial recognition through articles, books, websites, etc". Each layer can have a different status, which also regulates the licenses to choose and to supervise their availability and their reuse11.

In what follows, we provide details on the main points related to the rights associated with research data.

1.2.1 Intellectual Property

Definition 4 Intellectual property can be defined as a set of exclusive rights granted to the author of an intellectual work (Intellectual Property Code).

It consists of two branches:

• literary and artistic property, including copyright and the right of databases and

• industrial property, which concerns patents, trademarks, designs and models, and domain names.

Databases are defined by the Intellectual Property Code.

Definition 5 A database is a collection of works, data or other independent elements, arranged in a systematic or methodical manner and individually accessible by electronic means or by any other means (article L.112-3 of the Intellectual Property Code).

The directive of March 11th, 199612, transposed in France by the law of July 1st, 199813 concerning the protection of databases, defines the protection granted to these databases. Databases are protected on two possible bases:

• under copyright, for the structure of the database (layout, arrangement), as an original work of the mind, and / or

• under the sui generis right of the database producer, from the moment the producer can justify a financial, material and human investment in the database.

Exception to copyright and to the database producer's right in the context of research

The Law for a Digital Republic14, which entered into force on October 9th, 2016, introduces an amendment to the Intellectual Property Code, thus providing an exception to copyright and to the database producer's right for the mining of texts and data in the context of research.

10 The Inspire guidelines for neophytes, F. Merrien et M. Leobet, 2011

11 Le statut juridique des données de la recherche (The legal status of research data), L. Maurel, 2015

12 Directive 96/9/CE du 11 mars 1996 sur la protection des bases de données (Directive 96/9/EC of March 11th, 1996 on the protection of databases)

13 Loi 98-536 du 1er juillet 1998 (Law 98-536 of July 1st, 1998)

14 Loi n° 2016-1321 du 7 octobre 2016 pour une République numérique (Law n° 2016-1321 of October 7th, 2016 for a Digital Republic)


Thus, once a work has been disclosed, its author may not prohibit digital copies or reproductions made from a lawful source for the purpose of mining texts and data included in or associated with scientific writings, for the needs of public research and excluding any commercial purpose. A decree sets the conditions under which this text and data mining is implemented, as well as the methods of preservation and communication of the files produced at the end of the research activities for which they were produced; these files constitute research data (article L 122-5 of the Intellectual Property Code).

Similarly, when a database has been made available to the public by the rights holder, the latter may not prohibit, excluding any commercial purpose, digital copies or reproductions of the database made by a person who lawfully has access to it, for the purpose of mining texts and data included in or associated with scientific writings in a research setting. The preservation and communication of the technical copies resulting from the processing, at the end of the research activities for which they were produced, are carried out by organizations designated by decree; other copies or reproductions are destroyed (article L 342-3 of the Intellectual Property Code).

The following criteria must therefore be met for this exception to apply:

• access from a lawful source;

• for the purpose of mining / in-depth searching of texts and data included in or associated with scientific writings;

• for the needs of public research / in a research setting, with a non-commercial goal.

The text does not define the very notion of text and data mining. However, it is generally considered that Text and Data Mining (TDM) consists of exploring, via automated search tools, large bodies of texts and / or data from numerous sources and media, in order to infer new knowledge from them.

R Implementing decrees were expected in January 2017 to clarify the conditions for putting these exceptions into application. However, in early 2018, in the absence of an implementing decree, the TDM exception is still not operative. Work is under way to have this TDM exception included in the future European Copyright Directive currently under discussion in Brussels and Strasbourg.

1.2.2 Rendering anonymous the personal data

Article 2 of the law "informatique et libertés"15 defines the concept of personal data.

Definition 6 Personal data is defined as "any information relating to a natural person who is identified or who can be identified, directly or indirectly, by reference to an identification number or to one or more elements specific to him or her. In order to determine whether a person is identifiable, all the means enabling his or her identification, which are available to, or accessible by, the data controller or any other person, must be taken into account".

15 Article 2 de Loi 78-17 du 6 janvier 1978 relative à l’informatique, aux fichiers et aux libertés (Article 2 of the Law 78-17 of January 6th, 1978 related to IT, to the files and to freedoms)


Personal data may be directly identifiable (last name, first name, ...) or indirectly identifiable (INSEE directory registration number, telephone number, IP address, ...).

Conversely, data which are not related to a natural person and which, alone or in combination, do not allow a natural person to be identified are not personal data. The personal character depends on the means which can be implemented to identify a person. The scope of personal data is therefore evolving, as it depends on technology and its performance. It is also important to take into account the context of the information processing to infer the personal character: indeed, the identification of an individual can also be inferred from the knowledge held by the people who process the data.

Definition 7 Rendering anonymous is a procedure which breaks the link between the data and a natural person, thus protecting the privacy of the individuals.

Rendering data anonymous may be necessary when an entity is not legitimate to process personal data for the purposes it has chosen. In fact, anonymization makes it possible to benefit from enriched data while protecting the privacy of individuals and meeting the requirements for the protection of personal data.

However, the concept of rendering anonymous is not easy to apply, because it can be difficult or impossible to determine whether a datum is still a personal datum or whether it has truly been rendered anonymous. Indeed, an individual can be identified indirectly, for example through other data sources which, combined together, lead to his or her identification.
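As a minimal, hypothetical illustration of this difficulty (not taken from the white paper), the sketch below replaces the direct identifier by a salted hash but leaves quasi-identifiers untouched; such a record is pseudonymized, not anonymized, because linking it with an external source may still re-identify the person.

```python
import hashlib

def pseudonymize(record: dict, salt: str = "secret-salt") -> dict:
    """Replace the direct identifier with a salted hash (pseudonymization, NOT anonymization)."""
    out = dict(record)
    out["name"] = hashlib.sha256((salt + record["name"]).encode()).hexdigest()[:12]
    return out

patient = {"name": "Alice Martin", "postcode": "75013", "birth_year": 1980, "sex": "F"}
print(pseudonymize(patient))
# The quasi-identifiers (postcode, birth_year, sex) remain: combined with another
# data source (an electoral roll, a social-network profile, ...), they may be
# enough to re-identify the person, so this record cannot be considered anonymous.
```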

To assess the rendering anonymous, it would be necessary to answer several questions:

• Is it reasonably possible for a natural person to be identified from the processed data and from other data?

• What is the probability of a re-identification?

• What is the probability that the re-identification is correct?

R If you are interested in the different techniques of the existing rendering anonymous processes, their effectiveness and their limits, you can refer to the document written by Benjamin Nguyen16.

1.2.3 Health data

Health data are considered as sensitive data according to the law "Informatique et Libertés".

Definition 8 Health data are "personal data which, directly or indirectly, reveal the racial or ethnic origins, the political, philosophical or religious opinions or the trade union membership of persons, or which relate to the health or sex life of these persons" (art. 8 of the law "Informatique et Libertés").

There is no legal definition of health data, either in the French law or in the European law, but there are indications which appear in:

16 Techniques d’anonymisation, B. Nguyen, Statistique et société, 2014 (Techniques of rendering anonymous, B. Nguyen, Statistics and society, 2014)


• Article L.1111-8 of the Public Health Code (CSP): data "collected or produced during prevention, diagnosis or care activities";

• the Lindqvist case law (Court of Justice of the European Communities (CJEC), 6/11/2003, Bodil Lindqvist, C-101/01) provides some insights about the definition of health data: the Court considers that, in this case, the indication that a person has injured her foot and is on half-time sick leave constitutes personal health data within the meaning of Article 8, paragraph 1 of Directive 95/46/EC. The Court gives a broad interpretation to the term "health data", "so that it includes information concerning all aspects, both physical and mental, of a person's health";

• the decision of the Council of State of July 19th, 2010, No. 317182, also provides some indications: for students enrolled in specialized classes, "the mere mention that the student is enrolled in a health care structure does not give enough information about the nature or the severity of the condition from which he or she suffers to be considered a health datum". On the other hand, if the information makes it possible to identify the type of disability or impairment of the student, then it is a health datum;

• the European regulation states that "health data includes any information relating to a person's physical or mental health, or to the health services provided to that person".

Health data are considered particularly sensitive and are subject to specific guidelines.

1.2.4 Software licences

To conclude on the legal issues, we would like to say a word about the problems related to software, and more particularly to open source software, by quoting a brochure17 which emanates from the Free Software Thematic Group (GTTL) of the SYSTEMATIC competitiveness cluster. This brochure covers software protection, the specific legal framework of free software, and the use and operation of free software. This document is only indirectly related to our issues, but it seems important to recall how large a role free software plays in our daily lives. In particular, many middleware components (software serving as an intermediary for communication between several applications), cloud and big data tools are open source projects. Thus the appropriation of these technologies by communities is easier and, conversely, adoption goes through the communities, which make it a success or a failure, and no longer through circuits which can be locked down by large operators.

1.3 The big data

Many figures circulate on the volume of data produced, collected and analyzed daily. Everyone agrees that this volume will grow exponentially. For example, the firm IDC18 estimated as early as 2011 that the amount of data produced and shared on the Internet would reach 8 zettabytes in 2015. Let us recall that 1 zettabyte = 1000 exabytes = 1 million petabytes = 1 billion terabytes! This "information flooding" has also been perceptible for more than a decade in the scientific research field, and it is now clear that the volume of data collected over the next few years will exceed that collected during the previous centuries19.

Big data characterizes the research data of many scientific domains (genome, astronomical, climatic, etc.), which include terabyte and even petabyte volumes in 2015 for a single experiment.

17 Fondamentaux juridiques - Collaboration industrielle et innovation ouverte (Legal Basics - Industrial Collaboration and Open Innovation)

18 International Data Corporation

19 The data deluge: an e-Science perspective

Big data is defined in many different ways in the literature. We refer to the definition given by the firm IDC20:

Definition 9 Big data is presented by our American colleagues as "a new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis" (IDC consulting firm).

It was from 2001 onwards that the scientific community began to describe the challenges inherent in the growth of data along three dimensions, through the rule known as "the 3Vs" (volume, velocity and variety), later extended to "the 5Vs" (volume, velocity, variety, veracity and value) or more, because specialists disagree on the number of Vs to take into account, which depends on the discipline. The common definitions associated with these dimensions are the following:

• Volume - Refers to the size of the overall data created by the different participants or equipment of the ecosystem;

• Variety - Refers to the number of distinct systems and / or types of data used by the participants; multiple data formats within the same system are problematic in practice;

• Velocity - Represents the frequency with which data are generated, captured, shared and updated;

• Value - Refers to our ability to transform the created data into value, whether they are the initial, unmodified data or data derived from them;

• Veracity or truthfulness - Refers to the fact that the initial data may be unintentionally modified during their life cycle, or to the way unmodifiable data are managed; both affect the quality of the data and therefore the reliability of the information.

In Figure 1.1, other terms than the five initial ones appear; they specify certain goals related to the stages of the data life cycle. As an example, consider the terms Terabytes, Records / Arch, Transactions, Tables, Files. All these terms refer to issues related to storage, representation (Tables, Files) and the access model (Transactions). In such a framework, the goal of big data is to provide powerful means to store, represent and access data.

Figure 1.2 is a refinement of the notions of big data and comes from a report by the NIST21. In this colored figure, each of the five colors emphasizes either a definition, a challenge or a goal of big data. It allows the discussion to start on the major scientific and technical issues which arise in this field, particularly the issues of new data models, analysis issues, and issues of functional and technical architectures as well as tools. The density of the terms shows that the issues behind the big data terminology have to be considered from multiple angles. In what follows, we will focus more specifically on the architectural aspects of the systems, as they are directly related to our topic of data hosting.

1.3.1 Data management in the field of research

Today, the dimensions of the big data mean that the data management, in the research sector, is becoming more and more an integral part of the research project. Data management requires rigorous

20 Big data Analytics: Future Architectures, Skills and Roadmaps for the CIO, IDC, 2011

21 National Institute of Standards and Technology


Figure 1.1: 5Vs of big data

Figure 1.2: Categorization of the terms and goals of the big data (from NIST)


organization, planning and monitoring throughout the life of the project and beyond, to ensure the data's sustainability, accessibility and reuse. Data management in research fulfills the following objectives22:

• it increases research efficiency by facilitating access to the data and their analysis, for the researcher who conducted the research or for any other new researcher;

• it ensures the continuity of the research through the reuse of data, while avoiding duplicated efforts;

• it promotes wider distribution and increases impact: the research data are properly formatted, described and identified so as to preserve their long-term value;

• it ensures the integrity of the research and the authentication of the results; accurate and comprehensive research data also allow the reconstruction of the events and processes which led to these results;

• it reduces the risk of loss and enhances data security through the use of robust and responsive storage devices. However, it should be noted that these problems are not only technical in nature; we will examine further on the various participants working to secure information systems, and the challenge is then to make all these participants work together;

• it accompanies the current evolution of publishing: scientific journals increasingly ask that the data underlying a publication be shared and deposited in an accessible data facility. As a result, research data management makes the submission process to scientific journals easier, since it relies on documented data sets;

• it satisfies funders' conditions: they are mostly interested in the data produced during a project and often make funding conditional on the opening of these data, so that they are freely accessible and free of charge;

• it demonstrates accountability: by managing your research data and making them available, you demonstrate the responsible use of public research funding.

1.3.2 Data Management Plan and data life cycle

Good data management requires the development of a Data Management Plan (DMP) and must take into account all the stages of the data life cycle.

Definition 10 The data management plan is a formal document explaining how data are obtained, documented, analyzed and used during and after a research project. It describes how the data are produced, described, stored and distributed.

Research operators, for example at the European level, ask researchers to build a data management plan and to integrate it into their scientific project23. Very soon there will be an obligation, even at the national level, to provide this type of document in response to each call for projects.

R You can draw inspiration from pre-existing DMP models, such as the one produced by the Common Documentation Service (SCD) of Paris Descartes and the Bureau of Archives and Support Directorate for Research and Innovation (DARI) of Paris Diderot24, or develop your own data management plan using online tools25.

22 Pourquoi gérer les données de la recherche?, Cirad, 2015 (Why manage research data?, Cirad, 2015)

23 Le libre accès aux publications et aux données de recherche, Portail H2020 (Open Access to Publications and Research Data, H2020 Portal)

24 Réaliser un plan de gestion de données, A. Cartier et al., 2015 (Perform a data management plan, A. Cartier et al., 2015)
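As a rough sketch of what such a plan covers, the structure below expresses a DMP skeleton as a Python dictionary; the sections and values are invented for illustration and do not follow any particular template.

```python
# Hypothetical, minimal data management plan (DMP) skeleton.
dmp = {
    "project": "Example imaging study",
    "data_description": {"types": ["MRI images", "clinical forms"], "formats": ["DICOM", "CSV"]},
    "collection": {"devices": ["3T MRI scanner"], "expected_volume_TB": 5},
    "documentation": {"metadata_standard": "Dublin Core", "readme_included": True},
    "storage_and_backup": {"location": "institutional private cloud", "copies": 3},
    "legal_and_ethics": {"personal_data": True, "anonymization_required": True},
    "sharing": {"license": "CC-BY", "embargo_months": 12},
    "preservation": {"archive": "national computing center", "retention_years": 10},
}

# Print the plan section by section, as it would appear in a project annex.
for section, content in dmp.items():
    print(f"{section}: {content}")
```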

There are several existing representations for the life cycle of the research data.

Definition 11 Schematically, the life cycle of a given datum corresponds to the period which extends from the datum’s design to its utilization and until its destruction or its preservation for historical or scientific purposes.

According to the UK Data Archives Centre26, which specializes in social sciences research data, the life cycle of research data contains six main steps:

• the creation of the data
  – definition of the research technical baseline
  – implementation of a data management plan
  – data localization
  – data collection
  – data description

• data processing
  – data entry, digitization, translation, transcription
  – control, validation, data cleaning
  – rendering the data anonymous and description of the data
  – data management and storage

• data analysis
  – interpretation
  – production of derived data
  – production of research results
  – data preparation for preservation

• data preservation
  – migration to a sustainable format
  – creation of meta-data
  – data documentation
  – archiving data

• accessibility and data sharing
  – distribution / data sharing
  – access control (or not)
  – establishment of protections (copyright vs. copyleft)
  – promoting data via an open internet platform

• the reusing of data
  – monitoring and reviewing the research
  – new research from the data
  – cross-referencing data with other data from other domains

In the sub-section 2.1.2, we will explain the purely functional view of a model of the data life cycle in e-sciences (sciences using digital technology, such as cloud and big-data).

25 DMP Tool

26 Research Data Lifecycle, UK Data Archives


Data management must take into account the entire life cycle of the data. It is important to note that the preservation and accessibility of the data are not decided at archiving time; they have to be anticipated. Anyone wishing to optimize the management of his or her data should therefore take some time to think about it, in order to establish clearly when the data begin to exist and when they disappear. Describing the data with meta-data, from the creation stage onwards, is essential to manage them effectively throughout their life cycle. Anticipation is strongly encouraged and, conversely, it must remain possible to change the status of documents as the environment evolves.

1.4 Data Hosting and Cloud Computing

The cloud nowadays offers a functional concept which allows on-demand network access to virtualized, shared storage and computing resources. This concept is more and more popular in the context of big data, and of open data in particular, for depositing data, down to the very simplest uses such as online file storage for individuals. Dropbox-like services27 have indeed multiplied in the recent past and are used by everyone.

In order to take better advantage of the cloud approach, new data management and processing solutions have emerged. Indeed, traditional data management systems (relational databases) cannot meet the challenges of variety and scale posed by big data. Software evolution has therefore naturally followed hardware evolution, in particular with the development of new databases adapted to large and unstructured data, mainly known as NoSQL28, and of high-performance computing models such as the MapReduce paradigm29. These two notions clearly point to a major evolution, accompanying the big data phenomenon, in languages and programming models.
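To give a flavor of the MapReduce paradigm mentioned above, here is a toy, single-machine sketch in plain Python (a real deployment would distribute the map and reduce phases across a cluster): the map step emits (word, 1) pairs and the reduce step sums them per key.

```python
from collections import defaultdict
from itertools import chain

documents = ["the cloud hosts data", "data in the cloud"]

# Map: each document is processed independently and emits (key, value) pairs.
def map_phase(doc: str):
    return [(word, 1) for word in doc.split()]

# Shuffle: group the emitted values by key (done by the framework on a real cluster).
grouped = defaultdict(list)
for key, value in chain.from_iterable(map_phase(d) for d in documents):
    grouped[key].append(value)

# Reduce: combine the values associated with each key.
word_counts = {key: sum(values) for key, values in grouped.items()}
print(word_counts)  # {'the': 2, 'cloud': 2, 'hosts': 1, 'data': 2, 'in': 1}
```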

1.4.1 Computer science technology's infrastructure related to the cloud and data

For the computer scientist, an infrastructure (also known as a platform) includes processors, storage and network. In this document we will speak more specifically about the cloud infrastructure type. The following is the definition of cloud computing given by NIST (National Institute of Standards and Technology):

Definition 12 Cloud computing is a computing model which allows easy, on-demand "access to a set of configurable resources (network, processors, storage, applications and services) which can be provisioned and released very easily, with minimal interaction with the supplier".

A cloud, from the computer scientist’s point of view, is made up of three ingredients:

• an ERP (Enterprise Resource Planning), intended to manage the customer relationship and the catalog of applications which the user can deploy in the cloud;

• a deployment model (how and on what types of processors the applications are deployed);

• storage and computing nodes, i.e., a hardware infrastructure.

27 Dropbox, service de stockage et de partage de copies de fichiers locaux en ligne (Local file copy and file storage service online)

28 Système de gestion de bases de données NoSQL (NoSQL database management system)

29 Paradigme MapReduce


Figure 1.3: Features, service models and deployment of cloud computing.

From an overall point of view, the cloud is composed of five essential features, and it is based on three service models (SaaS, PaaS and IaaS, see Figure 1.4) and three deployment models (private cloud, public cloud and hybrid cloud).

The cloud’s five essential features, highlighted by NIST, are:

• On-demand self-service

The cloud is recognized as "service oriented", i.e., it relies on a software architecture based on a set of simple services, designed from the business processes of the company or of the university, for example. In the service approach there is also a clear distinction between the notion of a contract and that of an implementation. The cloud abstracts away the physical and hardware layers: no more server and system software management, no more software installation / configuration on PCs and, for developers, instantaneous deployment without managing heterogeneous platforms. The cloud is based on the virtualization principle (cf. Definition 14) which makes it possible to reduce or even eliminate hardware and software dependency.

• Broad network access

Services are accessible through the use of protocols and standards coming from the Internet environment.

• Resource pooling

Applications share a pool of resources in order to achieve broad economies of scale.

• Quick and adjustable supplying (rapid elasticity)

The cloud can increase flexibility. Infrastructure is allocated according to the needs and can be adjusted upwards or downwards.

• Resources and services which can be measured (measured service)

The cloud is also "usage oriented", i.e., it relies upon so-called "reporting" tools to track consumption, with flexible billing based on use.

Cloud computing has opened opportunities for collaboration between individuals and companies, regardless of their geographical location. The cloud can be public, often through a distributed network of computing resources which may not be in the organization’s premises. The cloud can be


private when an organization does not open it to the outside world, and finally it can be hybrid when it draws resources from both public and private clouds.

1.4.2 Service models offered by cloud computing

According to NIST, there are three service models which can be offered in cloud computing: SaaS, PaaS and IaaS (Figure 1.4). They are differentiated by the degree of involvement of the user in the management of the system. On the left side of Figure 1.4 we have the classic organization, where the provider takes care of nothing and the customer does everything; on the right side, the provider takes care of everything.

Software as a Service (SaaS) is a model for operating software in the cloud in which the software is installed on remote servers rather than on the user's equipment. Customers do not pay a license for a given version; they generally use the online service for free or pay a recurring subscription.

Platform as a Service (PaaS) is another form of cloud computing where the cloud provider makes a ready-to-use runtime environment available to organizations, while letting them control the applications they install, configure and use themselves. For example, the LAMP software stack for building web servers is a PaaS. LAMP brings together the Linux operating system, the Apache web server, a database server (MySQL or MariaDB) and, originally, PHP, Perl or Python. Cloud9 IDE is an online integrated development environment (IDE) which can also be considered as an application service for developing online applications in multiple programming languages.

Infrastructure as a Service (IaaS) is another form of cloud where the organization buys a service from a cloud provider. This service can be a bare machine or one with an operating system (OS) such as Windows or Linux. For some information systems departments (DSI), this can be a way of making savings, mainly by transforming investments into leasing contracts and thus avoiding dealing directly with cooling, electrical redundancy and physical access control. We will come back to this later in the data center presentations.
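The split of responsibilities between customer and provider can be summarized in code; the sketch below is a simplified rule of thumb (our own illustration, not the NIST figure) of which layers the user still manages under each service model.

```python
# Simplified, illustrative mapping of who manages each layer per service model.
LAYERS = ["hardware", "virtualization", "operating system", "middleware", "application", "data"]

RESPONSIBILITY = {
    "on-premises": {layer: "user" for layer in LAYERS},
    "IaaS": {**{l: "provider" for l in ("hardware", "virtualization")},
             **{l: "user" for l in ("operating system", "middleware", "application", "data")}},
    "PaaS": {**{l: "provider" for l in ("hardware", "virtualization", "operating system", "middleware")},
             **{l: "user" for l in ("application", "data")}},
    "SaaS": {**{l: "provider" for l in LAYERS if l != "data"}, "data": "user"},
}

for model, layers in RESPONSIBILITY.items():
    managed_by_user = [l for l, who in layers.items() if who == "user"]
    print(f"{model:12s} user manages: {managed_by_user}")
```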

1.4.3 Cloud computing’s general operations

From an economic model's point of view, cloud computing is essentially an off-the-shelf subscription to external services. For example, cloud providers have created a "pay-as-you-go" payment system: it allows a customer to rent a server over a short period of time and obtain the desired sizing in terms of number of processors, disk space and network bandwidth. This implies that the provider tracks your resource consumption and bills you for it, which can be a source of friction if the method is intrusive.
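A back-of-the-envelope sketch of pay-as-you-go billing is given below; the rates and usage figures are invented for illustration and do not correspond to any real provider's price list.

```python
# Hypothetical rates (assumptions, not a real rate card).
RATES = {
    "vcpu_hour": 0.02,         # EUR per vCPU per hour
    "ram_gb_hour": 0.005,      # EUR per GB of RAM per hour
    "storage_gb_month": 0.02,  # EUR per GB stored per month
}

def monthly_cost(vcpus: int, ram_gb: int, hours: int, storage_gb: int) -> float:
    compute = hours * (vcpus * RATES["vcpu_hour"] + ram_gb * RATES["ram_gb_hour"])
    storage = storage_gb * RATES["storage_gb_month"]
    return round(compute + storage, 2)

# A small server (4 vCPUs, 16 GB RAM) rented for 720 hours, plus 500 GB of storage.
print(monthly_cost(vcpus=4, ram_gb=16, hours=720, storage_gb=500))  # 125.2 EUR
```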

The cloud user is at the core of the concerns, and the minimal effort required from the user is obtained thanks to extensive automation, for example during service deployment. As we have already mentioned, the cloud aims to pool hardware and software resources to the extreme and to provide a unique and coherent environment. It is based on what is called a multi-tenant architecture.


Figure 1.4: The different services offered by cloud computing.

Definition 13 The multi-tenant architecture is a principle which allows one piece of software to be provided to several client organizations from a single installation.

This concept can be compared to tenants who each live in an apartment of the same building. This organization has the advantage of sharing a number of resources, such as heating, water distribution, electricity, and the management and maintenance services performed by a joint property manager, unlike people living in their own homes. The pooling resulting from this type of infrastructure allows the provider to update only one piece of software and not N copies located in different geographical areas. The resulting savings are immediate.
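A minimal sketch of multi-tenancy in code (tenant names and data invented for illustration): a single deployed application serves several client organizations, the tenant identifier selecting the slice of data each one sees.

```python
# One installation of the software, several client organizations (tenants).
DATASTORE = {
    "university-a": {"quota_gb": 500, "users": ["alice", "bob"]},
    "company-b":    {"quota_gb": 2000, "users": ["carol"]},
}

def handle_request(tenant_id: str, action: str):
    """A single code base serves every tenant; isolation is enforced through the tenant key."""
    tenant = DATASTORE[tenant_id]  # each tenant only ever sees its own slice of data
    if action == "list_users":
        return tenant["users"]
    raise ValueError(f"unknown action: {action!r}")

print(handle_request("university-a", "list_users"))  # ['alice', 'bob']
```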

The support service question is also a key element of a successful cloud. One might think, on accounting grounds, that since it is a matter of updating a single copy rather than N copies possibly scattered over several sites, the number of computer scientists could be reduced. This way of reasoning belongs to the past: with the cloud, there is no longer a computer scientist re-formatting hard disks; the jobs are changing. It is better to reassign the computer scientists so that users do not end up with non-functional tools. A support service is therefore of great use for spreading its competencies, redirecting requests to the competent services, or even managing the outputs.

Cloud computing manages services through virtualization. Cloud and virtualization are two different concepts: virtualization is a technology, whereas the cloud is more a solution made of multiple technologies, including virtualization.

Definition 14 Virtualization consists of running one or more operating systems / applications as ordinary software, on one or more server computers / operating systems, instead of being able to install only one per machine.

Thus the service provider, in a cloud, saves on the hardware required to serve multiple customers. Operating systems are isolated from each other by dedicated techniques, i.e., there is no interaction between client environments, so the virtualization is "sealed". The idea is to offer the user a sandbox in


which he or she can work without disturbing the user next door. In recent years, the multiplexing of operating systems, i.e., running several operating systems on the same machine, has been replaced by the multiplexing of several applications in containers, isolated from each other but sharing the libraries of the host operating system. In this way we obtain more resource-efficient usage, as it is possible to run several tens, or even a hundred, containers on a processor, whereas only a few complete operating systems can run on the same processor with classical virtualization.

So we are now far away from the practices of a few years ago, when the rule "one server for one application" was in force.
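The efficiency argument can be made concrete with a back-of-the-envelope calculation; the memory overheads below are assumptions chosen only to illustrate the order of magnitude, not measurements.

```python
# Assumed figures, for illustration only.
HOST_RAM_GB = 32
VM_OVERHEAD_GB = 4.0          # a full guest operating system per virtual machine (assumption)
CONTAINER_OVERHEAD_GB = 0.25  # containers share the host kernel and libraries (assumption)
APP_RAM_GB = 0.5              # memory actually needed by the application itself

vms = int(HOST_RAM_GB / (VM_OVERHEAD_GB + APP_RAM_GB))
containers = int(HOST_RAM_GB / (CONTAINER_OVERHEAD_GB + APP_RAM_GB))
print(f"{vms} full virtual machines vs {containers} containers on the same host")  # 7 vs 42
```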

1.4.4 Cloud, security and data ownership

Beyond the technical implementation of deployment in the cloud, it may often become difficult to know, how the ownership of the hosted information, used and / or accessed in the cloud by multi-tenant IT organizations, is defined due to the pooling. In addition, uncertainties about the security and confidentiality of the information, held in the cloud, raise a problem; Security breach cases faced by established companies have raised new concerns among public and private sector companies, as well as, between individuals.

It is increasingly recognized that cloud system users need to be aware of the potential privacy risks when information is held and / or processed in public or shared cloud systems. These concerns delay the migration to and adoption of cloud technologies despite their undeniable benefits. Yet, according to Gartner, spending on cloud services and on Infrastructure as a Service (IaaS) had estimated annual growth rates of 17.7% and 41.3%, respectively, from 2011 to 2016.

The cloud is more a virtual organization of services than a physical organization. Cloud systems are typically hosted in data centers.

Definition 15 A data center is a physical site on which are gathered together the overall devices of the information system of an organization.

It includes environmental control (air-conditioning, fire prevention and control systems, etc.), emergency and redundant power supplies, as well as a high level of physical security, such as a fence around the servers and bio-metric access control with laser-beam monitoring.

R For a detailed description of the infrastructure of a data center, please refer to part 3 of this document, which provides details on a data center in the Paris region.

As far as the data is concerned, it is well known that cloud computing can lead to different forms of lock-in, and the following threats have been identified:

• Platform / infrastructure lock-in: migrating from a cloud provider using one platform to a cloud provider using a different platform can be very complex;

• Data lock-in: since the cloud is still new, ownership standards, which determine who actually owns the data once they are in the cloud, are not yet sufficiently developed; this can make data migration more complex if cloud users decide to move their data out of the platform of the originally selected provider;

• Tool lock-in: if the tools integrated with a given cloud are used to manage the cloud environment and are not compatible with other clouds, these tools can only handle the data or applications which reside in that provider's cloud.


• The drawbacks attributed to clouds operated by private providers (e.g., Google, Amazon, Microsoft, Apple) are: (1) being subject to the legal jurisdiction in force in the country hosting the data; (2) the duration and cost of transfers (to be weighed against current rates and the guarantees offered on data exits); (3) dependency on the service provider; (4) the securing of the data; and (5) the durability of the service.

To deal with all these contingencies, major accounts scrutinize very closely the contracts which bind them to their suppliers. . . and, as a matter of fact, take out good insurance!

1.4.5 Adoption of the cloud

In October 2015, CIGREF (the Computer Club of Large French Companies) published a report entitled "The reality of the cloud in large companies". This report notes that, three years after analyzing the "fundamentals of cloud computing" and the question of "data protection in the cloud", CIGREF is now wondering why its members are moving to the cloud. It mainly appears that this is due to:

• the pressure from the business departments,

• the cloud as a vector of innovation and agility,

• the pressure of the suppliers,

• the simplification of the infrastructures,

• the cost reduction,

which are proving to be the leading drivers for the adoption of cloud solutions. However, the introduction of the cloud also has its dark sides and implementation difficulties; this study is therefore also interested in:

• the governance to be put in place,

• the implementation for tracking issues,

• the difficulty of assessing the security level of a cloud offering and ensuring its compliance with regulations,

• the contractual difficulties with suppliers,

• the problems in adapting regulations to this innovation represented by the cloud.

In the academic world, a study commissioned in 2016 by ALECSO and ITU, entitled "Guidelines to improve the use of cloud computing in education in the Arab countries"30, not only provides a panorama of experiences leading to the adoption of cloud computing technologies in the world of education and research, but also proposes an approach and technical recommendations to go further on these issues. The book is intended, on the one hand, for decision-makers and, on the other hand, for an audience of computer scientists in charge of implementing the recommendations. This study therefore gives a general overview of cloud migration strategies and contains many references to completed projects.

30 Guidelines to improve the use of cloud computing in education in the Arab countries, 2016

1.4.6 Storage and archiving in the cloud

Information storage is provided by a computing device. The current storage technologies are the mass memories (hard drive, SSD or microSD card) and the fast-access memories such as the RAM (Random Access Memory) found in our desktop and laptop computers. When information is written on a mass storage device, it stays there even after a power failure. When you write to RAM and there is a power failure, the information is lost. The advantage of RAM is that its access times and read / write rates are much higher than those of a mass storage device. However, for an equivalent price, a mass storage device provides a storage capacity far greater than RAM.

A conventional hard disk is a set of platters with a magnetic surface which, like a tape, records information through the movement of read-write heads attached to arms; the set of arms is called the comb. Thanks to the rapid rotation of the disk (from 5,400 to 15,000 revolutions per minute), the read head floats above the surface of the magnetic platter, at a distance comparable, at that scale, to an A320 landing with only the width of a fly between it and the tarmac. This technology is reaching its limits and now exploits the properties of a less intuitive mechanism known in quantum mechanics as the tunnel effect. Its response time is measured in milliseconds, i.e. 10^-3 seconds (about 15 ms).

The SSD, on the other hand, is an electronic mass memory based on components capable of storing information without a power supply; its read and write response times are much better than those of hard disks, but not as good as those of fast-access RAM. Its response time is measured in fractions of a millisecond (about 0.1 ms), whereas that of RAM is around 10 nanoseconds (10 x 10^-9 seconds).
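To make these orders of magnitude concrete, the short Python sketch below is purely illustrative: it compares the approximate response times quoted in this section, and the exact figures vary from one device to another.

# Back-of-the-envelope comparison of the access times quoted above.
hdd = 15e-3    # ~15 milliseconds for a mechanical hard disk
ssd = 0.1e-3   # ~0.1 millisecond for an SSD
ram = 10e-9    # ~10 nanoseconds for RAM

print(f"A hard disk access is ~{hdd / ssd:.0f} times slower than an SSD access")
print(f"An SSD access is ~{ssd / ram:,.0f} times slower than a RAM access")
print(f"A hard disk access is ~{hdd / ram:,.0f} times slower than a RAM access")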

In terms of cost, we have the following simple ranking: 1 GB on a mechanical hard disk < 1 GB in SSD mass memory < 1 GB in RAM.

These technologies are evolving, and the falling cost of their manufacture contributes greatly to this. We are currently talking about NVRAM (Non Volatile RAM), which should become a standard for our machines in the near future and promises access times 100 times faster than SSDs. This technology aims to combine the benefits of RAM with those of mass storage media. The best-known current NVRAM technology is flash memory. NVRAMs potentially change the organization of memory in a computer system: the organization is hierarchical, with high-speed memories at the top of the hierarchy and "slow" but large-capacity memories at the bottom.

NVRAMs remove one level in this hierarchy. This is a phenomenon which has always existed: 5"1/4 floppy disks were replaced by other media, which will in turn be replaced one day or another. The CD-ROM, for example, is disappearing.
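Taking at face value the figures quoted in this section (RAM at about 10 ns, an SSD at about 0.1 ms, and NVRAM promising accesses roughly 100 times faster than an SSD), a one-line calculation shows where NVRAM would sit relative to the other levels; this is an indicative sketch, not a measurement.

# Indicative placement of NVRAM between RAM and SSD, derived only from the
# orders of magnitude quoted in this section.
ram = 10e-9          # ~10 ns
ssd = 0.1e-3         # ~0.1 ms
nvram = ssd / 100    # "access times 100 times faster than SSDs"

print(f"NVRAM access  : ~{nvram * 1e6:.0f} microsecond")
print(f"  ...about {nvram / ram:.0f} times slower than RAM")
print(f"  ...about {ssd / nvram:.0f} times faster than an SSD")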

R A technical presentation of NVRAM technology and its integration into IT systems is available online31. It was given as part of the international workshop High Performance Data Intensive Computing (HPDIC) in Phoenix, Arizona, and is intended for an audience of computer scientists.

In the past, we archived a document when it reached the end of its life cycle. Today, archiving is put in place as soon as documents are created, and they are then enriched throughout their life cycle. However, it is important to differentiate archiving from storage and backup.

Definition 16 Storage concerns the actions, tools and methods used to store electronic content. Backup concerns all the actions, tools and methods intended to duplicate original electronic content, in order to secure it and prevent its loss. Finally, electronic archiving concerns all the actions, tools and methods used to gather, identify, select, classify and preserve electronic content over the very long term.

31 Future server platforms: persistent memory for data-intensive applications, S. Kannan and A. Gavrilovska, 2014
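To illustrate the three notions of Definition 16 side by side, here is a toy sketch in Python; the file names and metadata fields are hypothetical, and the point is only the difference in intent between storing, backing up and archiving.

# Toy illustration of Definition 16 with ordinary files (hypothetical names).
import hashlib
import json
import shutil
from pathlib import Path

doc = Path("report.txt")
doc.write_text("Contents of the document")            # storage: keep the content

# Backup: duplicate the original content to guard against its loss.
Path("backup").mkdir(exist_ok=True)
shutil.copy2(doc, Path("backup") / doc.name)

# Archiving: preserve the content together with identification metadata and a
# fixity checksum, so it can be identified, classified and verified long term.
Path("archive").mkdir(exist_ok=True)
shutil.copy2(doc, Path("archive") / doc.name)
record = {
    "name": doc.name,
    "sha256": hashlib.sha256(doc.read_bytes()).hexdigest(),
    "classification": "annual report",                # hypothetical metadata
}
(Path("archive") / "report.metadata.json").write_text(json.dumps(record, indent=2))

The backup protects against the loss of the original, whereas the archive record adds the identification and fixity information needed for preservation over the very long term.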
