A computerized database is defined as a set of data organized according to a previously defined objective (118,141,142) and stored on a computer-accessible medium.
If the database is to be used for research, it is important to ensure:
• that it can be queried, that is, that it is organized and accessible (142);
• that the response it returns is usable, that is, representative of the subject studied, uniform and ideally, in the interest of scientific cooperation, shareable in an understandable way regardless of where the data were collected (143).
A database collected automatically at high frequency is often relational, meaning that it combines several tables linked to one another (118,141,142). The classification of the data and the interactions between the tables depend directly on the object under study and on the data collected. Programming rules make it possible to standardize, organize, and store the data so that each data item is unique while all the data from a given patient remain linked together (118,141,142,144). In addition, cleaning procedures delete unnecessary data in order to limit overload and slowdown of the database. This type of database is often programmed in SQL (Structured Query Language). SQL has several components that allow data to be manipulated, organized, and controlled while making queries easier to write. The database management system makes it possible to program and use the database, and it also guarantees the integrity and security of the database by respecting the ACID properties (atomicity, consistency, isolation, and durability). These properties ensure, in particular, that each transaction either succeeds completely or fails completely; no transaction can be partially completed (145). The management system chosen for a large database is a server-based system. This type of system allows a large amount of data to be managed, organized, and stored while giving several users simultaneous access (142,144).
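The relational layout and the all-or-nothing behavior of a transaction can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than a server-based system, and the table names, column names, and the `store_batch` helper are assumptions made for illustration only; they do not describe the database presented in this work.

```python
import sqlite3


def create_schema(conn):
    """Create two linked tables (hypothetical names): one row per patient,
    and one row per (patient, time, variable) measurement."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE patient (
            patient_id INTEGER PRIMARY KEY
        );
        CREATE TABLE measurement (
            patient_id  INTEGER NOT NULL REFERENCES patient(patient_id),
            recorded_at TEXT    NOT NULL,
            variable    TEXT    NOT NULL,
            value       REAL    NOT NULL,
            -- the composite key makes each data item unique
            PRIMARY KEY (patient_id, recorded_at, variable)
        );
    """)


def store_batch(conn, patient_id, rows):
    """Insert all rows for one patient atomically; return True on success.

    rows is a list of (recorded_at, variable, value) tuples.
    """
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "INSERT OR IGNORE INTO patient (patient_id) VALUES (?)",
                (patient_id,),
            )
            conn.executemany(
                "INSERT INTO measurement VALUES (?, ?, ?, ?)",
                [(patient_id, t, var, val) for t, var, val in rows],
            )
        return True
    except sqlite3.IntegrityError:
        # e.g. a duplicate key: nothing from this batch is kept
        return False
```

If one row of a batch violates the uniqueness rule, the whole batch is rolled back, including the rows that were inserted before the failure, which is the atomicity property described above.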
Stimulated by massive technological innovation, ICUs currently contain abundant high-performance bedside medical systems, such as cardiorespiratory monitors, pulse oximeters, mechanical ventilators, and infusion pumps (1–5). These systems provide physicians with clinical data, such as physiologic signals, pharmacotherapy, or therapeutic procedures, in order to establish the patient’s therapeutic care plan (5, 6). Simultaneously, it became obvious to the scientific community that elaborating data gathering procedures was crucial, as a large amount of data was lost rather than used to improve clinical research efficiency and data analysis (1–7). This data collection gave rise to the concept of biomedical signal databases. Subsequently, the electronic medical devices surrounding critically ill patients expanded, and these biomedical signals can now be linked in time to clinical, radiologic, laboratory, and pharmaceutical data (8–16). At the same time, several researchers have developed virtual patients, or computational models, attempting to recreate a real patient’s clinical course under several medical situations, particularly during mechanical ventilation and hemodynamic support (17, 18). Based on these models, ICU teams have conceived computerized clinical decision support systems (CDSSs) aiming to assist caregivers in the management of critically ill patients (19–21). In order to develop and validate both virtual patients and CDSSs in critical care, databases that combine biomedical signals, therapeutics, and the clinical outcome following these treatments are necessary (13, 14, 17, 22). To be used in pediatric critical care, such a database should include patients under 18 years old, be exhaustive (including mechanical ventilation, hemodynamic support, clinical, and therapeutic data), and collect data at a high frequency to capture changes in patient physiology.
To our knowledge, none of the databases currently described in the literature meet these criteria in pediatric critical care (1, 7, 9, 10, 15, 23–27). Thus, our objective was to build such a database, combining patient therapeutics and clinical variables in time, using the information system and network architecture available through fully electronic
charting in the PICU of a university hospital. The purpose of this article is to describe the data acquisition process from bedside to a research electronic database.
This article describes the collection of a prospective database gathered in the PICU of Sainte Justine Hospital, a 24-bed PICU, medical ICU, surgical ICU, and cardiac ICU in a free-standing tertiary maternal child health center in Montreal, Canada. A fully electronic ICU-specific medical record (IntelliSpace Critical Care and Anesthesia [ICCA]; Koninklijke Philips Electronics, Amsterdam, The Netherlands) was implemented in the PICU in January 2013. We included all patients under 18 years old admitted to the PICU since May 21, 2015. From admission to discharge, all patients’ demographic, physiologic, medical, and therapeutic data were prospectively collected.
Four types of data are collected in our database from medical devices available at the bedside (Supplemental Fig. 1): 1) physiologic signals or biomedical signals from patient monitors (e.g., heart rate, blood pressure, oxygen saturation); 2) respiratory and ventilator variables from the ventilator (e.g., FiO2, positive end-expiratory pressure, respiratory rate); 3) pharmacotherapy from the infusion pumps (e.g., drug name, dose, timing of drug administration); and 4) patient demographics and information from the electronic medical record (age, sex, weight, diagnosis, laboratory results). Patient monitors are IntelliVue MP60, MP70, and MX800 (Koninklijke Philips Electronics, the Netherlands). These monitors are designed for surveillance purposes, monitoring physiologic cardiorespiratory waveforms and values such as the electrocardiogram, invasive or noninvasive blood pressure, oxygen saturation, respiratory rate, and end-tidal CO2. The monitors’ biomedical signals from each patient admitted to the unit are continuously transmitted in Health Level 7 (HL7) format across the IntelliVue medical network to the IntelliVue Database Server (Koninklijke Philips Electronics, the Netherlands). The signals are then sent to the corresponding patient’s electronic medical record (ICCA; Koninklijke Philips Electronics), and the value is recorded. The nurse verifies the data every 5–60 minutes and corrects them if they do not correspond to the patient’s condition (i.e., artifacts). The same signal is transmitted simultaneously to the research database every 5 seconds. Before being stored in the research database, the HL7-coded signal must be translated into ordinary medical data using the free HL7 listener software Mirth Connect Server from Mirth (Quality Systems Inc., Irvine, CA) (Supplemental Fig. 1).
Supplemental Fig 1: Global architecture
Data collected into the research database include the patient’s specific identification number, all values and units of measure described above, and the times at which these values occurred (Fig. 1).
Fig 1. Data gathering process
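To make the translation step concrete, the following minimal sketch shows how an HL7 version 2 message could be split into the items listed above (patient identifier, variable, value, unit, and time). It is not the Mirth Connect configuration used in the unit: the sample message, the variable codes, and the `parse_observations` function are hypothetical, and only numeric OBX segments are handled.

```python
from datetime import datetime

# Hypothetical HL7 v2 message: segments are separated by carriage returns,
# fields by "|" and components by "^".
SAMPLE_MESSAGE = "\r".join([
    r"MSH|^~\&|MONITOR|PICU|RESEARCH|DB|20150521101500||ORU^R01|0001|P|2.3",
    "PID|||12345",
    "OBX|1|NM|HR^Heart rate||120|bpm|||||F",
    "OBX|2|NM|SpO2^Oxygen saturation||98|%|||||F",
])


def parse_hl7_time(text):
    """Parse the leading YYYYMMDDHHMMSS part of an HL7 timestamp."""
    return datetime.strptime(text[:14], "%Y%m%d%H%M%S")


def parse_observations(message):
    """Return one dict per numeric OBX segment of an HL7 v2 message."""
    patient_id, message_time, rows = None, None, []
    for segment in filter(None, message.split("\r")):
        fields = segment.split("|")
        if fields[0] == "MSH":
            # MSH-1 is the field separator itself, so MSH-7 is at index 6
            message_time = parse_hl7_time(fields[6])
        elif fields[0] == "PID":
            patient_id = fields[3].split("^")[0]  # PID-3, first component
        elif fields[0] == "OBX" and fields[2] == "NM":
            fields += [""] * (15 - len(fields))  # pad optional fields
            observed = fields[14]  # OBX-14, observation time (optional)
            rows.append({
                "patient_id": patient_id,
                "variable": fields[3].split("^")[0],  # OBX-3 identifier
                "value": float(fields[5]),            # OBX-5 value
                "unit": fields[6].split("^")[0],      # OBX-6 units
                "time": parse_hl7_time(observed) if observed else message_time,
            })
    return rows
```

In this sketch the message time is used when an observation carries no time of its own; whether the production system makes the same choice is not stated in the text.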
In order to save data storage space, a programmed cleaning process runs hourly to erase useless nonmedical entries automatically generated by the Mirth Connect Server while translating the HL7 signal. The ventilators used in the unit and connected to the server are Servo I (Maquet, Rastatt, Germany), and the infusion pumps are Infusomat (B. Braun Medical, Melsungen, Germany). The ventilator settings and the physiologic measurements available on the ventilator are transmitted to the Datacaptor connectivity suite (Capsule Technologie, Andover, MA) through the Datacaptor terminal server (Capsule Technologie) to the patient’s electronic medical record ICCA (Koninklijke Philips Electronics). For research purposes, these data are captured from the Datacaptor terminal server (Capsule Technologie) using periodically programmed Structured Query Language requests and stored every 30 seconds in a specific table in the research database based on Microsoft Server 2008 (Microsoft, Redmond, WA). The type of medication and its concentration and infusion flow rate are transmitted to the electronic medical record ICCA (Koninklijke Philips Electronics) and the research database using the same process as described for ventilators. The medication data are gathered in two separate tables depending on the type of medication administration, either continuous or intermittent (IV push medication). The infusion data are stored every 30 seconds, whereas IV push medications are recorded by nurses in the electronic medical record at the time of administration. The research database is also linked to the electronic medical record ICCA (Koninklijke Philips Electronics) in order to retrieve medical data, including push and oral medication, diagnosis, and laboratory test results.
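The two periodic tasks described above (a capture every 30 seconds and an hourly cleaning pass) can be sketched as follows. This is only an illustration under stated assumptions: it uses in-memory sqlite3 databases instead of the hospital servers, a simulated clock instead of a real scheduler, and the table names `current_values`, `ventilator_data`, and `translation_log` are invented for the example.

```python
import sqlite3

CAPTURE_PERIOD_S = 30    # storage interval stated in the text
CLEANUP_PERIOD_S = 3600  # the cleaning process runs hourly


def make_demo_databases():
    """Create a source and a research database (hypothetical tables)."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE current_values (patient_id, variable, value)")
    source.execute("INSERT INTO current_values VALUES (1, 'peep', 5.0)")
    source.commit()
    research = sqlite3.connect(":memory:")
    research.execute(
        "CREATE TABLE ventilator_data (stored_at, patient_id, variable, value)"
    )
    research.execute("CREATE TABLE translation_log (stored_at, is_medical)")
    research.commit()
    return source, research


def capture(source, research, now):
    """Copy the values currently exposed by the source, stamped with `now`."""
    rows = source.execute(
        "SELECT patient_id, variable, value FROM current_values"
    ).fetchall()
    with research:  # one transaction per capture
        research.executemany(
            "INSERT INTO ventilator_data VALUES (?, ?, ?, ?)",
            [(now, *row) for row in rows],
        )
    return len(rows)


def cleanup(research):
    """Delete entries flagged as nonmedical; return how many were removed."""
    with research:
        cursor = research.execute(
            "DELETE FROM translation_log WHERE is_medical = 0"
        )
    return cursor.rowcount


def run_simulated(source, research, duration_s):
    """Step a simulated clock instead of sleeping between captures."""
    captured = purged = 0
    for now in range(0, duration_s, CAPTURE_PERIOD_S):
        captured += capture(source, research, now)
        if now > 0 and now % CLEANUP_PERIOD_S == 0:
            purged += cleanup(research)
    return captured, purged
```

Stepping a simulated clock keeps the example testable; a production system would more plausibly rely on the scheduling facilities of the database server or the operating system.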
Database Organization and Data Extraction
Prior to being stored, all data are coded and organized in three tables (Fig. 1). To facilitate database use and querying, data of research value are extracted from the tables and summarized into a single time-organized table. Depending on the research purpose, data can be extracted and organized into a single patient timeline. This full set of data variables, reusable indefinitely, constitutes what we define as the perpetual patient (17).
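The construction of a single patient timeline from several tables can be sketched as a merge of time-sorted streams. The row layout `(time_s, patient_id, source, variable, value)` and the `patient_timeline` function are assumptions chosen for illustration and do not reflect the actual schema of the research database.

```python
import heapq
from operator import itemgetter


def patient_timeline(patient_id, *tables):
    """Merge the rows of one patient from several tables into time order.

    Each table is a list of (time_s, patient_id, source, variable, value)
    tuples (a hypothetical layout).
    """
    by_time = itemgetter(0)
    streams = [
        sorted((row for row in table if row[1] == patient_id), key=by_time)
        for table in tables
    ]
    # heapq.merge combines already sorted inputs without re-sorting them
    return list(heapq.merge(*streams, key=by_time))


# Hypothetical sample rows for two patients
monitors = [(0, 1, "monitor", "heart_rate", 120.0),
            (5, 1, "monitor", "heart_rate", 118.0),
            (5, 2, "monitor", "heart_rate", 95.0)]
ventilators = [(3, 1, "ventilator", "peep", 5.0)]
pumps = [(4, 1, "pump", "morphine_rate", 0.02)]

timeline = patient_timeline(1, monitors, ventilators, pumps)
```

Merging sorted streams keeps the work proportional to the number of rows for the selected patient, which matters when the tables hold hundreds of millions of entries.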
The servers dedicated to the database are physically located in the informatics department of the Sainte Justine Hospital with restricted access to guarantee data security. The database and the workstation maintenance are overseen by the applied clinical research unit of the hospital.
Ethics
The study and the database construction were approved by the institutional review board of Sainte Justine Hospital (number 4061) with a waiver of consent but an opt-out option. The exploitation of the database is regulated by a database policy validated by the institutional review board, and no patient identifiers (name, health insurance number) are stored in the database.
Troubleshooting
To limit the inconvenience to daily medical practice, each step of the process had to be closely checked and prepared, which justified the involvement of caregivers, IT specialists, and manufacturers. Despite this preparation phase, we dealt with technical issues, including synchronization among therapeutic and surveillance devices, during the first 3 months of the project. Other than cost and storage capacity, a major setback was the interference between concurrent data collection and data input into the medical chart. This could have potentially compromised patient care in the unit, as the database system and the patients’ electronic medical record (ICCA; Koninklijke Philips Electronics) run through the same network. This interference was limited to a slight slowdown of the medical record system and an impairment of the retrieval of blood test results into the medical record. We do not believe that there was any consequence for patient care and safety, as access to the laboratory server was not altered. This issue was solved within the first 3 months.
Once the data gathering process was running properly, the validation procedure was performed to check the accuracy of the data and to ensure the appropriateness of the gathering process. The objective of the validation procedure is to establish data accuracy and synchronization. It combines several phases performed at the bedside, including video recording. The validation phases are currently being conducted prospectively within the PICU. During this time, patients’ data collected in the database are compared to data simultaneously displayed on the monitoring and therapeutic devices available at the bedside.
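The comparison between stored values and values displayed at the bedside can be sketched as a nearest-timestamp match. This is not the validation protocol used in the unit: the `validate` function, the 5-second maximum lag, and the tolerance parameter are assumptions made for illustration.

```python
from bisect import bisect_left


def nearest(times, t):
    """Return the element of the sorted list `times` closest to t, or None."""
    i = bisect_left(times, t)
    candidates = times[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda c: abs(c - t)) if candidates else None


def validate(db, reference, max_lag_s=5.0, tolerance=0.0):
    """Compare reference readings (e.g. read from video) with stored values.

    db and reference map a time in seconds to a value. Returns a tuple
    (matched, agreeing, missing): reference readings with a stored value
    within max_lag_s, those whose values agree within tolerance, and those
    with no stored value close enough in time.
    """
    times = sorted(db)
    matched = agreeing = missing = 0
    for t, ref_value in reference.items():
        closest = nearest(times, t)
        if closest is None or abs(closest - t) > max_lag_s:
            missing += 1
            continue
        matched += 1
        if abs(db[closest] - ref_value) <= tolerance:
            agreeing += 1
    return matched, agreeing, missing
```

Counting missing readings separately from disagreeing ones distinguishes a synchronization or capture problem from an accuracy problem, the two aspects the validation procedure aims to establish.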
Between May 21, 2015, and December 31, 2016, 1,386 PICU stays from 1,194 patients were recorded in the research database (Supplemental Fig. 2).
Supplemental Fig. 2: Flowchart
The research database contained one table of 135,224,902 entries (five datasets/entry; a dataset is composed of the storage time, the data description, and its point estimate) from the physiologic signal monitors, with an average of 487,820 physiologic datasets/PICU stay, and two tables of 408,131,514 and 16,131,718 entries (one dataset/entry) from the ventilators and infusion pumps, for a total volume of 241 GB (approximately 150 GB/yr). Patients’ characteristics at admission to the PICU are shown in Table 1. PICU stays were divided into 870 medical (63%), 463 non-trauma surgical (33%), and 53 trauma (4%) admissions. A wide spectrum of diagnoses was represented (Table 1).
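The derived figures in this paragraph can be checked from the raw counts with a short calculation. The sketch below assumes that the storage rate is obtained by dividing the total volume by the length of the collection period (May 21, 2015, to December 31, 2016); this assumption is ours and is not stated in the text.

```python
from datetime import date

# Counts reported in the text
monitor_entries = 135_224_902
datasets_per_entry = 5
picu_stays = 1_386
total_volume_gb = 241

# Length of the collection period in days (assumed basis for the yearly rate)
collection_days = (date(2016, 12, 31) - date(2015, 5, 21)).days

datasets_per_stay = monitor_entries * datasets_per_entry / picu_stays
volume_gb_per_year = total_volume_gb / collection_days * 365.25

# Share of each admission type among all PICU stays, in percent
admissions = {"medical": 870, "non-trauma surgical": 463, "trauma": 53}
shares = {kind: round(100 * n / picu_stays) for kind, n in admissions.items()}
```

Under these assumptions the calculation gives about 487,800 datasets per stay and about 149 GB per year, in line with the rounded values reported above.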
Table 1: Demographic data and disease categories of stays included in the database
IQR: Interquartile Range
The research database gathered abundant physiologic, respiratory, therapeutic, and clinical information (Supplemental Table 1). In particular, ventilation data were automatically collected for every ventilated patient, up to a maximum of 25 ventilator-setting items and 22 ventilator-related surveillance items every 30 seconds, depending on the type and mode of ventilatory support (Table 2). With regard to medication, we successfully collected data on infusion medications, their concentration, and their rate for all admissions (Supplemental Table 1). The data collection permitted the reconstruction of each included patient’s entire critical care course as a patient timeline (Fig. 2).
Demographic data n PICU stays = 1,386
Age (years), median [IQR] 2.0 [0.0 – 9.0]
Weight (Kg), median [IQR] 12.7 [6.2 – 27.0]
Length of stay (hours), median [IQR] 51.0 [26.0 – 103.0]
Dead, n (%) 52 (3.8%)
Main diagnostic category at admission, n (%)
Pulmonary 360 (26.0%)
Post-surgical care 261 (18.8%)
Post-cardiac surgery care 202 (14.6%)
Neurologic 150 (10.8%)
Cardiac 107 (7.7%)
Infectious 58 (4.2%)
Traumatic / Burn 53 (3.8%)
Intoxication 49 (3.5%)
Otorhinolaryngology 39 (2.8%)
Metabolic / hydroelectrolytic 34 (2.5%)
Hematologic / non cerebral tumor 28 (2.0%)
Renal and liver grafts 21 (1.5%)
Liver and gastrointestinal causes 14 (1.0%)
Supplemental Table 1: Example of therapeutic information collected at the same time as the physiologic parameters
Therapeutic Data n patients = 1,194
High Flow oxygen, n (%) 336 (28.1%)
Recorded duration (hr) 30477
Noninvasive ventilation, n (%) 295 (24.7%)
Recorded duration (hr) 29140
Invasive ventilation, n (%) 511 (42.8%)
Recorded duration (hr) 77678
Inotropic and vasoactive medication order¹
Epinephrine, n (%) 303 (25.4%)
Dobutamine, n (%) 22 (1.8%)
Milrinone, n (%) 195 (16.3%)
Levosimendan, n (%) 9 (0.8%)
Dopamine, n (%) 92 (7.7%)
Norepinephrine, n (%) 98 (8.2%)
Isoproterenol, n (%) 19 (1.6%)
Sedative and analgesic treatment order (continuous and discontinuous)¹
Midazolam, n (%) 375 (31.4%)
Lorazepam, n (%) 410 (34.3%)
Dexmedetomidine, n (%) 273 (22.9%)
Propofol, n (%) 195 (16.3%)
Ketamine, n (%) 439 (36.8%)
Morphine, n (%) 777 (65.1%)
Hydromorphone, n (%) 111 (9.3%)
Fentanyl, n (%) 578 (48.4%)
Sufentanil, n (%) 10 (0.8%)
Remifentanil, n (%) 4 (0.3%)
¹ Ordered treatment, not necessarily administered
The preliminary data on validation of the database demonstrated an accurate capture of monitoring signals (28). Following this validation phase, we ensured that the data time stamp was always the same (i.e., data server time). However, we have a huge amount of data on infusion pumps and ventilator variables, and the validation process is therefore still ongoing. This database is currently used in several concomitant research studies, including the development and validation of the automated pediatric logistic organ dysfunction (PELOD) 2 score (29), real-time diagnosis of cerebral status following traumatic brain injury (30), automatic real-time monitoring of hypoxemia in pediatric acute respiratory distress syndrome (31), and detection of ventilator-associated events (32).
Table 2: Major items in the database
Fig. 2: Patient timeline
Using the bedside information systems and network architecture already available in the PICU of Sainte Justine Hospital, we successfully and prospectively collected a large amount of high-frequency, time-organized clinical data into a comprehensive database. Our research database meets the characteristics of successful databases described by Pryor et al (33), namely a multidisciplinary team, stable funding, focused goals, data collection, a design tied to a particular database focus and function, and relevant leadership. Informatics improvements and the expansion of electronic medical records have empowered the data gathering process at the bedside.
Currently, several systems of data gathering are described in the literature (2), both in critical care (8, 10, 15, 25–27) and in other medical fields (4, 12, 34, 35), but few with such a high rate of storage and high amount of data (9, 25–27).
The database “gold standard” is undoubtedly the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) database implemented in the ICUs of Boston’s Beth Israel Hospital in 1996 (32). This database has evolved and flourished throughout the past 20 years (1, 7, 36–38), from its first version in the late 1990s to its current third version, described in 2016 (15). In its latest version (MIMIC III), freely available to researchers who have completed a recognized course in protecting human research participants and signed a data use agreement, the MIMIC database gathered data from 38,597 patients older than 16 years admitted to the ICU between 2001 and 2012, and from 7,870 neonates admitted from 2001 to 2008. The MIMIC database addresses many questions surrounding not only data gathering but also data sharing (36, 37) in the field of critical care. The MIMIC database, however, like numerous other high- and low-rate storage databases (2, 10, 15, 38, 39), has failed to include pediatric patients from 28 days old to 16 years old. Indeed, only a few databases described in the past 10 years include all types of admissions in the PICU with this high-rate data collection (Table 3) and, to our knowledge, none of them integrates the variety of data we describe (Tables 2 and 3). The currently available database described in the literature that is more exhaustive than ours in terms of demographic and medical data is the Virtual Pediatric System (VPS), LLC, an online pediatric critical care network implemented in 2005 and accessible online since 2009. It was built on the previously described Virtual PICU (implemented in 1997) (40) in partnership with the National Outcomes Center, the National Association of Children’s Hospitals and Related Institutions, and the Children’s Hospital Los Angeles. The VPS database is a prospective observational cohort of more than one million consecutive admissions from 135 PICUs around the United States and Canada.
Its main objective was to develop a web-based database with prospective data collection, aiming to provide information on PICU practices and patient outcomes (41), with few or no biomedical signals, ventilator settings, or medication data. With regard to biomedical signals and high-rate data acquisition, the previously available data gathering system in the PICU was described in 2003 by Goldstein et al (9) and implemented at Doernbecher Children’s Hospital, Oregon Health and Science University. In their publication, Goldstein et al (9) described about 170 pediatric patients whose main admission diagnosis was brain injury. The database by Goldstein et al (9) was mainly a physiologic signal and waveform database with a high-frequency recording rate (from 1 to 500 Hz vs 0.2 to 0.03 Hz in our database). Finally, the only database presently comparable to ours is the trending, tracking, and triggering system (T3) (Etiometry, Boston, MA) (25–27). T3 is a Food and Drug Administration–approved system, implemented in the PICU and cardiac PICU of the Boston Children’s Hospital (United States) in 2013 and the Hospital for Sick Children (Canada) in 2015, which allows collection, storage, and display of organized data at the bedside in near-real time, providing physicians with crucial clinical information and a research database. The T3 system is based on the same architecture as ours, the IntelliVue medical network, and collects physiologic data from monitors and respiratory settings from ventilators at a 5-second frequency (25). Other clinical, laboratory, and demographic data are manually collected from patients’ charts, without collection of medication and other therapeutic data. Depending on the purpose of a database, the collected data and recording rates will vary. When studying physiologic signals and variability, a high recording and acquisition rate is necessary, in contrast to epidemiologic studies, where the number of variables matters most (33). Technological improvement has progressively challenged researchers to deal with both storage and cost problems (14). Data storage capacity has increased throughout the years for a limited incremental cost (14). The future scope of research with high-frequency databases is undeniable.
The main purpose of this database is to provide our research group with a large, high-quality, multimodal dataset. In the future, the dataset will help us develop, validate, and model virtual patients in cardiorespiratory physiology (17, 18) as well as CDSSs and data-driven learning systems (42) prior to applying them in PICU daily practice (19, 20, 25, 27, 43, 44). It has already served to construct physiologic predictive models that will soon be published and might serve for future epidemiologic studies or registry-based randomized controlled trials (24). Indeed, this type of database will simplify screening of patients and data of interest and complete case report forms automatically, thus saving time and cost while ensuring completeness (24, 39). Combining data from biomedical signals, ventilatory support, and timed IV medication might be of great interest for conducting physiology studies, patient-ventilator interaction studies, or pharmacodynamic studies. Furthermore, due to the large amount of collected data, these databases, once linked to specific software, could be useful for data mining, a knowledge discovery process carried out while exploring the database (22). By providing diagnostic and therapeutic tools while improving research efficiency and cost, this database could potentially enhance patient care and safety. Our database has several limitations. First, data reliability is critical in this kind of data storage. High-frequency database validation procedures are not well established in the literature. Despite the ability to compare database samples to patient medical records (45), a data extraction system to validate the reliability of data gathering over time is still necessary. Data gathering and organization are crucial, as important clinical information can be lost from ICU monitoring devices (2, 39). Validation procedures were often not included or were insufficiently considered in the descriptions of numerous databases (9, 24, 46). To ensure data reliability, accuracy, and synchronization for future users of the database, we are currently completing the validation procedures. Based on other high-quality database validation procedures available in the literature (10, 45, 47, 48), we have elaborated a prospective validation procedure that relies on human review. We are using videotaped data displayed at the bedside and comparing them to simultaneously collected