
3.1 Health fields

In the document Hosting and Data Management (pages 68-71)

3.1.1 Definition of health data and details about their specificity

Health data include all medical data (e.g. magnetic resonance imaging (MRI), X-ray computed tomography (CT), etc.) and health information system data (e.g. Hospital Information System (HIS), Radiology Information System (RIS), etc.). They concern a person, a group or a population. These data are useful for clinical routine, research, health monitoring and, on a large scale, for improving health policy.

With new data acquisition and generation methods, data volumes are constantly increasing, hence the need for a type of hosting that respects the specificity of health data. This part explains how such data hosting is implemented.

3.1.2 Data type

We can distinguish two types of health data: raw data and processed data.

Raw data

Raw data are collected or acquired from a production source, for example an image acquisition device, patient information entry, a DNA sequencer, and so on.

These data are usually large and contain several pieces of information.

Processed data

Processed data are obtained by applying a succession of processes to the raw data. These processes produce data that may be smaller or larger than the original data.

3.1.3 Data origin

In public health, most data are collected and stored through hospital information systems or through other information systems such as the radiology information system (RIS) or the PACS (Picture Archiving and Communication System). Another source of data is health research consortia and groups.


Hospital Information System (HIS)

The HIS is necessary for the proper functioning of the care pathway within the health facility. It essentially contains:

• Administrative and medical information;

• Information about the health care pathway of the patient.

This information evolves with the length of time the HIS has been in place and with the quality and quantity of the input data.

Picture Archiving and Communication System (PACS)

More and more hospitals are equipped with a PACS. These systems provide faster access to the patient’s medical images. The challenge is to find the smartest and most secure way to manage the circulation and sharing of medical images between practitioners, so that they can make the best diagnosis.

Cohort or group study

These are databases consisting of a set of subjects who share a number of common characteristics and are tracked over time at the individual level to identify the possible occurrence of health events of interest. They are research objects used over the long term. For example, the Constances cohort aims to follow 200,000 volunteers for research purposes. The aim is to build a large epidemiological cohort, representative of the general adult population, intended to contribute to the development of epidemiological research and to provide information for public health purposes.

Since March 2009, INSERM has run a National Cohort Coordination Unit (CCNC) within the Public Health Thematic Institute, in partnership with the Institute for Public Health Research (IReSP). The goal of this unit is to provide publicly funded cohorts with services and to facilitate shared access to the data of these cohorts for all interested scientific communities.

Among the data sources managed by INSERM, we present CépiDc.

3.1.4 CépiDc (Center for Epidemiology of Medical Causes of Death)

Among INSERM’s statutory missions is the production of the national statistics on the medical causes of death. This mission is carried out by an INSERM service unit, the Center for Epidemiology of Medical Causes of Death (CépiDc). It covers medical coding, statistical processing and the data flow of the medical causes of death. Throughout this process, INSERM is also responsible for implementing all the physical and logical measures needed to guarantee the confidentiality of the data.

For each death occurring on French territory (550,000 to 600,000 per year), a doctor must write a certificate stating the medical causes of death. Certificates written on paper are sent to the relevant town halls and then to the regional health agencies, which forward them to the CépiDc service provider, which handles digitization and data entry. This circuit, which may take several months, is gradually being replaced by almost immediate access to the data, thanks to the online application CertDc, available since 2007. CertDc now handles more than 10% of the certificates, and its use is growing rapidly.

Via the "paper" circuit, the data received by CépiDc are of two types: images of the certificate and of the civil status report linked to it, and structured data relating to the deceased together with, initially, the text of the causes of death.

The data received via both circuits are checked and corrected before entering the medical coding circuit, which is based on an application developed by an international consortium (five European countries and the United States), in which INSERM is represented.

The data are exchanged, at different stages of their processing and in the form of encrypted XML files, with INSEE and InVS (the health monitoring institute), thus contributing to the national statistics on deaths and to health surveillance. CépiDc responds to requests from researchers working on cohorts by providing them with CSV files covering the monitored populations. It also carries out statistical studies itself, using existing statistical software or methods and programs developed in-house.
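The CSV delivery to cohort researchers can be pictured with a minimal sketch. It assumes death records are plain dictionaries and that a cohort supplies a set of subject identifiers. The function `export_cohort_extract` and the field names `subject_id`, `year_of_death` and `cause_code` are hypothetical and do not describe CépiDc's actual tooling.

```python
import csv
import io

# Hypothetical column layout for the extract; not taken from the source.
FIELDNAMES = ["subject_id", "year_of_death", "cause_code"]


def export_cohort_extract(death_records, cohort_ids, out_stream):
    """Write the records of cohort members to out_stream as CSV.

    Returns the number of rows written (header excluded).
    """
    writer = csv.DictWriter(
        out_stream,
        fieldnames=FIELDNAMES,
        extrasaction="ignore",  # silently drop fields outside the extract
        lineterminator="\n",
    )
    writer.writeheader()
    written = 0
    for record in death_records:
        # Keep only subjects followed by the requesting cohort.
        if record["subject_id"] in cohort_ids:
            writer.writerow(record)
            written += 1
    return written


if __name__ == "__main__":
    records = [
        {"subject_id": "A1", "year_of_death": 2015, "cause_code": "I21"},
        {"subject_id": "B2", "year_of_death": 2016, "cause_code": "C34"},
    ]
    buffer = io.StringIO()
    export_cohort_extract(records, {"A1"}, buffer)
    print(buffer.getvalue())
```

Restricting the output to an explicit list of columns means that a field not meant for the extract cannot leak into the delivered file by accident.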

The INSERM CIO provides CépiDc with infrastructure and IT support: servers, hosting, security and operating tools, application maintenance, etc. Since INSERM is the producer of the data on causes of death, it does not have to apply for approval as a health data host. The data concerning the deceased are anonymous but indirectly identifiable, hence the legal obligation to protect access to them. This is achieved mainly through encryption of the data flows, the architecture set up to host the servers, and strict management of data access rights, including within CépiDc.

The missions of CépiDc will expand in the coming years to include making available to medico-economic researchers the data contained in the future national health data system (SNDS, managed by CNAM-TS), which will also be fed by the medical causes of death. The infrastructures to be put in place for this new mission are being studied, as is the way to link the data using an encrypted identifier.
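The text does not specify how the encrypted identifier is built. As an illustration only, the sketch below swaps in a keyed hash (HMAC-SHA256), a common pseudonymization technique, rather than encryption in the strict sense. The functions `encrypted_identifier`, `pseudonymize` and `link`, the field `national_id`, and the sample values are all hypothetical.

```python
import hashlib
import hmac


def encrypted_identifier(national_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier (assumed scheme)."""
    # Normalize so that formatting differences do not break the linkage.
    normalized = national_id.strip().replace(" ", "")
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def pseudonymize(records, secret_key, id_field="national_id"):
    """Replace the raw identifier of each record with its pseudonym."""
    result = []
    for record in records:
        clean = {k: v for k, v in record.items() if k != id_field}
        clean["pseudonym"] = encrypted_identifier(record[id_field], secret_key)
        result.append(clean)
    return result


def link(left, right):
    """Join two pseudonymized datasets on the shared pseudonym."""
    index = {record["pseudonym"]: record for record in right}
    return [
        {**row, **index[row["pseudonym"]]}
        for row in left
        if row["pseudonym"] in index
    ]


if __name__ == "__main__":
    key = b"demo-key-for-illustration-only"
    deaths = pseudonymize([{"national_id": "ID 0001", "cause_code": "I21"}], key)
    care = pseudonymize([{"national_id": "ID0001", "reimbursed_eur": 120.0}], key)
    print(link(deaths, care))
```

Both sources must derive the pseudonym with the same secret key for records to match, so in such a scheme the protection of that key becomes the central security concern.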

3.1.5 General problems

Hosting health data must meet a set of prerequisites. Handling such sensitive personal data requires a specific safety and security policy. Indeed, CNIL recommends that the various participants take the necessary measures to secure the data. For example, in research, anonymization of the data is often recommended. This is also the case with medical images: before using them, we must first anonymize them in order to eliminate any possibility of identifying a person from his or her file or from a DICOM image. Anonymization is a technique that removes from a document any reference to the person concerned through his or her personal data (name, social security number, INS, address, etc.).
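The removal of direct identifiers described above can be sketched as follows. The record is assumed to be a flat dictionary, and the key names in `DIRECT_IDENTIFIERS` are hypothetical labels for the personal data listed in the text (name, social security number, INS, address). A real DICOM file would need a dedicated library and a much longer list of attributes.

```python
# Hypothetical key names for the direct identifiers listed in the text.
DIRECT_IDENTIFIERS = {"name", "social_security_number", "ins", "address"}


def anonymize(record: dict) -> dict:
    """Return a copy of the record without its direct identifiers."""
    return {
        key: value
        for key, value in record.items()
        if key.lower() not in DIRECT_IDENTIFIERS
    }


if __name__ == "__main__":
    patient = {
        "name": "Jane Doe",
        "INS": "0000000000000",
        "address": "1 rue Exemple",
        "modality": "MRI",
        "year_of_exam": 2016,
    }
    print(anonymize(patient))
```

Dropping direct identifiers is only a first step: as noted for the CépiDc data, records stripped of names can remain indirectly identifiable, so access control is still needed.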

Moreover, because these data are very large, this specificity must be taken into account in the architecture of the hosting system. Any health data hosting infrastructure requires a data management plan. This plan essentially presents the hosting policy adopted for the project. In general, the data are produced by a set of centers, also called "nodes". Each node ensures the security of its data and pushes them to a more secure location dedicated to hosting health data. Data transmission follows the health data transmission security policy (Figure 3.1). The hosting center, also called the data center, hosts all the transmitted data. It gives users access to the data and the possibility of processing them via infrastructures dedicated to scientific computing.
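The node-to-data-center flow can be illustrated with a small in-memory sketch. The transmission security policy itself is not detailed here, so the SHA-256 integrity check below is an assumption chosen for illustration, and the classes `Node` and `DataCenter` are hypothetical. A real deployment would also encrypt and authenticate the transfer.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataCenter:
    """Central host that stores every dataset pushed by the nodes."""

    store: dict = field(default_factory=dict)

    def receive(self, node_name, dataset_name, payload: bytes, sha256_hex: str) -> bool:
        # Reject the transfer if the payload does not match its checksum.
        if hashlib.sha256(payload).hexdigest() != sha256_hex:
            return False
        self.store[(node_name, dataset_name)] = payload
        return True


@dataclass
class Node:
    """Producing center that pushes its data to the data center."""

    name: str

    def push(self, center: DataCenter, dataset_name: str, payload: bytes) -> bool:
        checksum = hashlib.sha256(payload).hexdigest()
        return center.receive(self.name, dataset_name, payload, checksum)


if __name__ == "__main__":
    center = DataCenter()
    node = Node("node-a")
    accepted = node.push(center, "mri-batch-01", b"raw image bytes")
    print(accepted, list(center.store))
```

Keying the store by both node and dataset name keeps track of where each dataset came from, which a data management plan would typically need to record.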

3.1.6 Trends in data hosting

The current national trend is to minimize the number of hosting centers while improving the quality of data access. This approach relies essentially on pooling the resources of the various parties involved in producing and processing health data.

Another trend, related to the previous one, is to couple a scientific computing component with the hosting of the health data. This coupling will reduce the data processing time (in parallel to the
