An Automated Web-Based Searchable Archive
For Implantable Cardioverter Defibrillator Data
by Shin-Ning Duh
S.B., Management Science
S.B., Electrical Engineering & Computer Science Massachusetts Institute of Technology (2002)
Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
August 29, 2003
© 2003 Shin-Ning Duh. All rights reserved. The author hereby grants to M.I.T. permission
to reproduce and distribute publicly paper and electronic copies of this thesis and to grant others the right to do so.
Author
Department of Electrical Engineering and Computer Science August 29, 2003
Certified by
Professor Roger G. Mark Thesis Supervisor
Accepted by
Professor Arthur C. Smith Chairman, Department Committee on Graduate Theses
An Automated Web-Based Searchable Archive
for Implantable Cardioverter Defibrillator Data
by Shin-Ning Duh
Submitted to the
Department of Electrical Engineering and Computer Science
August 29, 2003
In Partial Fulfillment of the Requirements for the Degree of Master of Engineering In Electrical Engineering and Computer Science
ABSTRACT
This paper details the research, design, and implementation of a computer system built to archive electrograms and related patient data from implantable cardioverter defibrillators (ICDs), in particular those produced by Medtronic.
The archive will provide a central and stable repository for researchers looking for data originally stored in the ICDs. The Web interface to the system provides an easy means of searching for patients or episodes of interest. The database will also provide the functionality for appending comments onto a particular episode.
Furthermore, the system allows the physicians to directly upload data from the ICDs into the database. Automatic detection and expansion functions will also allow data from new models and current models that have been modified to be inserted properly into the database.
Thesis Supervisor: Roger G. Mark
Table of Contents
Introduction
1.1 Background
1.1.1 Cardiac Arrhythmias
1.1.2 Electrocardiograms
1.1.3 Ventricular Arrhythmias
1.1.3 Sudden Cardiac Death
1.1.4 Automated External Defibrillators
1.1.5 Implantable Cardioverter Defibrillators
1.2 Motivation
1.3 Related Work
1.4 The paper
The Current Archive System
2.1 System overview
2.1.1 Client system
2.2.1 Shared Archive
2.3 Technology choices
2.3.1 Information storage
2.3.2 Web server
2.3.3 Software
2.4 Incorporation of existing and future manufacturers
Goals and Design
3.1 Goals
3.2 Medtronic Data structure
3.2.1 De-identified text file
3.2.2 Intra-section structure
3.2.3 Overview file
3.2.4 EGM files
3.3 Interface design
3.3.1 Present site structure for Guidant data
3.3.2 Modifications for Medtronic data
3.4 Database design
3.4.1 Handling of new models and fields
3.4.2 Table relations
Implementation
4.1 Middleware generation
4.1.1 Algorithm
4.1.2 Notable specifics
4.2 Table generation
4.2.1 Algorithm
4.3.1 Insertion detail
4.3.2 Database modification routine
4.3.3 Generation of non-section tables
4.4 Data retrieval and display
4.4.1 Guidant integration
4.4.2 Special considerations
Conclusion
5.1 Future work
5.1.1 Template sections
5.1.2 Text file download
5.1.3 Sorting function
5.1.4 Statistical analysis tools
5.2 Lesson learned
Appendix
1. Sample De-Identified File Content
List of Figures
Figure 1. Example of a cardiac signal.
Figure 2. Intervals between cardiac signals.
Figure 3. ICD and the heart.
Figure 4. Components inside the ICD.
Figure 5. System overview diagram.
Figure 6. Client site ICD disk index/archive system.
Figure 7. The shared ICD data archiving system.
Figure 8. Current search page screen shot.
Figure 9. Patient result page screen shot.
Figure 10. Episode search result screen shot.
Figure 11. Patient page screen shot.
Figure 12. ICD page screen shot.
Figure 13. Episode page screen shot.
Figure 14. EGM page screen shot.
Figure 15. Bank page screen shot.
Figure 16. Medtronic and Guidant integrated search page.
Figure 17. Medtronic ICD page screen shot.
Figure 18. Tachycardia Log Entry page screen shot.
Figure 19. Full detail and comment page screen shot.
Figure 20. Beat interval plot page screen shot.
Figure 21. Non-log text section page screen shot.
Figure 22. Log-based text section page screen shot.
Figure 23. Relationships in the database.
Figure 24. Insertion process of data.
Chapter 1
Introduction
This paper details the research, design, and implementation of a web-enabled and searchable
computer system built to archive electrograms and related patient data from implantable cardioverter
defibrillators (ICDs). The system governs how data sent from hospitals are processed and
how users search for the ICD data they need. In particular, I describe in depth
the implementation for models manufactured by Medtronic, one of the major manufacturers of ICDs.
1.1 Background
Implantable cardioverter defibrillators are common devices employed to constantly monitor the heart
rhythm of patients at high risk of sudden cardiac death from ventricular tachyarrhythmias. The most
life-threatening arrhythmia is ventricular fibrillation (VF), in which patients can die within minutes if
immediate treatment is unavailable. The less serious ventricular tachycardia (VT) can also degenerate into VF,
thus posing an additional threat. ICDs have been shown to possess higher life-saving potential than medical
therapy for patients with VF. Not only do they apply shocks to the patients' hearts when necessary, they
also store electrogram (EGM) documentation of episodes of VF and VT attacks.
1.1.1 Cardiac Arrhythmias
Arrhythmias are disorders of the heart rhythms. They can describe hearts that beat outside of the
consequence, while some of them are fatal. The seriousness of arrhythmias also depends on a person's
age, activity, and physical fitness.
Bradycardia is the term for a heart that beats fewer than 60 times per minute. Many athletes actually have
it because their hearts have become very efficient in circulating blood. Thus bradycardia can happen to
perfectly healthy individuals. However, bradycardia can also cause insufficient cardiac output. With so
few contractions, the body may not get enough blood. Symptoms of fatigue, dizziness, lightheadedness,
fainting or near-fainting spells will then arise. When needed, bradycardia can be easily corrected by
pacemaker implantation.
Tachycardia is the term for a heart that beats more than 100 beats per minute. It is also called
tachyarrhythmia. While it is less harmful at the lower range, it is much more serious above the optimized
cardiac output rate of 180 beats per minute. The heart is no longer able to adequately fill its chambers
with blood before each contraction. The body becomes deprived of blood despite the heart's rapid
pumping. Symptoms may include palpitations, rapid heart action, dizziness, lightheadedness, fainting or
near fainting. Ventricular fibrillation may also arise, making the heart unable to beat at all and sending
the patient into cardiac arrest. Treatment may involve cardioversion, medication, and implantation of
ICDs.
Arrhythmias can also be classified according to the origin of the heartbeat abnormality. Ventricular
arrhythmias come from the lower chambers of the heart called the ventricles, while supraventricular
arrhythmias come from tissues above the ventricles. Supraventricular arrhythmias can describe any
arrhythmias coming from the sinus node, the atrium, or even the atrioventricular node that connects to
both the ventricles and the atria. As an example, ventricular tachycardia describes unusually fast beating of
the heart originating from the ventricles.
1.1.2 Electrocardiograms
An electrocardiogram measures the electrical activity of the heartbeat from the surface of the patient's
across his chest. These electrodes are connected to wires that send the signals to the ECG machine, where
the electrical activity of the heart will be printed on a moving strip of paper for examination by the
physicians. By examining the ECG, a physician can determine the presence of any arrhythmias.
There are two major kinds of information provided by the ECG. By measuring the time intervals on
the ECG, the physician can determine the speed and regularity of the heartbeats. Next, by examining the
morphologies of the impulses, the physician can determine where in the heart the problem may have
arisen. The ECG machine automatically records signals from a number of different combinations of the
electrodes, with each combination constituting a lead. For example, one lead records the signals between the
patient's two arms, another records the signals between the patient's left arm and right leg, and yet
another records the signals across the patient's chest. With all these leads of varied combination, the
physician can practically get a three-dimensional picture of the electrical activity within the patient's
body.
Figure 1. Example of a cardiac signal.¹
An example of the electrical signal produced by a single heartbeat is shown in Figure 1. The first small
hump, marked P, is the P-Wave produced by the atria. The big spike that follows is the QRS Complex
produced by the ventricles. Following the QRS Complex, another hump, marked T, is the T-Wave,
which represents the repolarization of the ventricles in preparation for the next signal. The interval
between the beginnings of the P-Wave and the QRS Complex is called the PR interval, during which the
impulse travels through the atria, the atrioventricular (AV) node, and the ventricular conduction
system. The distance from the QRS Complex of one signal to that of the next is called the RR interval. These
intervals are shown in Figure 2.
¹ Cardiac Arrhythmia InfoCenter. http://home.earthlink.net/~avdoc/infocntr/htrhythm/hrecg.htm, 2003.
Figure 2. Intervals between cardiac signals.
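The relation between the RR interval and the heart rates quoted in Section 1.1.1 can be illustrated with a short sketch. It converts an RR interval in milliseconds to beats per minute and labels the result using the 60 and 100 bpm thresholds given earlier. The function names are hypothetical and are not part of the archive system; this is a minimal illustration, not a clinical classifier.

```python
def heart_rate_bpm(rr_interval_ms: float) -> float:
    """Convert one RR interval (milliseconds) to an instantaneous rate in bpm."""
    if rr_interval_ms <= 0:
        raise ValueError("RR interval must be positive")
    return 60_000.0 / rr_interval_ms


def mean_rate_bpm(rr_intervals_ms: list[float]) -> float:
    """Mean rate over a run of beats, computed from the mean RR interval."""
    if not rr_intervals_ms:
        raise ValueError("need at least one RR interval")
    return heart_rate_bpm(sum(rr_intervals_ms) / len(rr_intervals_ms))


def classify_rate(bpm: float) -> str:
    """Label a rate using the 60 and 100 bpm thresholds from Section 1.1.1."""
    if bpm < 60:
        return "bradycardia"
    if bpm > 100:
        return "tachycardia"
    return "normal range"
```

For example, an RR interval of 400 ms corresponds to 150 bpm, which falls in the tachycardia range.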
1.1.3 Ventricular Arrhythmias
It has been estimated that 350,000 sudden cardiac deaths occur in the United States each year, and
most of them are due to ventricular tachycardia and ventricular fibrillation [1]. The most common causes
that trigger these ventricular arrhythmias are heart attacks and heart muscle diseases.
1.1.3.1 Ventricular Tachycardia
Ventricular Tachycardia (VT) is one of the most frequently observed arrhythmias in the United States.
It is the unusually rapid beating of the heart that originates from the ventricles at a rate of greater than
100 beats per minute, but oftentimes VT may reach 200-350 bpm. When VT reaches a rate of 300 bpm or more, it can be called Fast Ventricular Tachycardia (FVT). Other characteristics of VT include wide QRS
complexes of greater than 120 ms, presence of AV dissociation, and fusion beats. Its greatest risk comes
from its tendency to spontaneously degenerate into fatal Ventricular Fibrillation (VF) [2].
There are two types of VT, non-sustained and sustained. Non-sustained VT stops on its own within a
few seconds of its onset. Although it may cause palpitations, lightheadedness, and fainting, it is generally
not life threatening. For sustained VT, the continuous rapid heartbeats dramatically reduce cardiac output
and blood pressure. Eventually collapse and death may ensue, so medical intervention is necessary to
patient goes into cardiac arrest with loss of consciousness and no detection of pulse or blood pressure,
electrical cardioversion has to be given via the use of defibrillators.
1.1.3.2 Ventricular Fibrillation
Ventricular fibrillation (VF) is a condition in which the heart's electrical activity becomes disordered
and chaotic [3]. It is often preceded by VT. In VF, the individual myocardial muscle fibers contract in a
rapid, unsynchronized manner. The heart quivers and loses its ability to
pump any blood. VF can cause immediate collapse of the patient's cardiovascular system, and it is the
primary cause of sudden cardiac death. The patient has to be treated with electrical cardioversion right
away to have any chance of survival. However, even if the patient is successfully resuscitated, he is still at
very high risk of recurrence [4].
Characteristics of VF include an irregular heart rate usually exceeding 300 bpm, an initial amplitude
exceeding 0.2 mV, and a waveform that completely flattens within 15 minutes. VF is associated with
coronary artery disease and can be caused by blockages in the coronary arteries. ICDs are the best
treatment for patients with VF.
1.1.3 Sudden Cardiac Death
Sudden Cardiac Death (SCD) is the consequence of a cardiac arrest caused by VF or VT.
Approximately 250,000 out-of-hospital cardiac arrests occur annually in the United States, making SCD
the leading cause of death in the country [5]. However, only 2 to 5 percent of these cardiac arrest victims
are successfully resuscitated. In treating these patients, time is the key. It has been shown that for every
minute that elapses before defibrillation, 7 to 10 percent of patients who might be saved are lost [6], and
the American Heart Association advises a 4-minute critical time frame. It has been documented that
individuals who have had near sudden death experiences, severe heart diseases, and congestive heart
Several alternatives exist in the prevention of out-of-hospital SCD. Traditionally, VF patients have
been given antiarrhythmic medications to keep their heart conditions in check. These medications include
amiodarone, sotalol, and other drugs as needed, such as beta-blockers, aspirin, and ACE-inhibitors.
However, automated external defibrillators (AEDs) and implantable cardioverter defibrillators have been
shown to be more effective in recent years.
1.1.4 Automated External Defibrillators
Automated External Defibrillators (AEDs) are defibrillators able to automatically detect the presence of
VT and VF and perform cardioversion on cardiac arrest victims. When a cardiac arrest case is suspected,
an AED operator can place electrodes on the patient's chest. The AED will detect the patient's condition,
confirm the presence of an arrest, and make a decision for cardioversion. The AED will also instruct the
operator to carry out necessary steps for defibrillating the patient.
There are about 50,000 AEDs in use today [7], and efforts have been made to place AEDs aboard
airplanes and in communities. For these publicly accessible AEDs, liability is largely not an issue, since
protective legislation has been passed in almost all states. For the remaining few states without their own legislation,
the proposed federal Cardiac Arrest Survival Act would protect businesses that provide AEDs from
liability.
American Airlines began a program in 1997 in which automated external defibrillators were placed on
all flights and all 24,000 of its flight attendants were trained to use them over a two-year period [8]. During
this time, defibrillators were used 200 times: 191 times on the aircraft and 9 times at the
terminal gate. The AEDs identified cases of VF with 100% specificity and sensitivity, and 13 out
of 13 VF victims who received a shock were successfully defibrillated. The rate of survival to
discharge from the hospital was 40%, whereas survival of VF victims aboard aircraft had previously been rare because of the
delay in getting defibrillation.
Equipping police with the devices and training officers to use them can also save lives since the time
first apply AEDs to patients with suspected cardiac arrest, followed by manual cardiopulmonary
resuscitation. With a mean of 4.4 minutes from collapse to the delivery of the first defibrillation shock, the
survival rate of 90 witnessed cases was 74%, compared to 49% for patients who waited until the arrival of
paramedics at a mean of 9.8 minutes.
Even when the defibrillator operators have not used the device in the past, use of AEDs within five
minutes of cardiac arrest proved to be invaluable in resuscitating victims. In a two-year study where
AEDs were installed throughout terminals at three Chicago airports, 11 out of 18 VF cardiac arrest
victims were successfully revived, with 6 of the rescuers having no prior AED training, and
only 3 of them having medical degrees. Of the remaining 7 cases, 4 involved
defibrillation later than 5 minutes after arrest, and only 3 were patients who stayed in fibrillation even after
defibrillation within 5 minutes of cardiac arrest [10].
However, since the time to defibrillation is so crucial, the locations at which AEDs are placed have to
be carefully planned. Operation Heartbeat from the American Heart Association aims to start a chain
of survival that will eventually raise the 5% survival rate for SCD victims to more than 20%. To achieve
this goal, immediate access to an AED is very important in the four-step process of early access, early
CPR, early defibrillation and early advanced care. However, for patients with constant and high known
risk of cardiac arrest or recurrent VF, an implantable cardioverter defibrillator (ICD) would prove more
effective than an AED.
1.1.5 Implantable Cardioverter Defibrillators
Implantable cardioverter defibrillators (ICDs) are defibrillators that can be implanted into patients'
chests to automatically detect episodes of arrhythmias and deliver therapeutic cardioversion if necessary
(Figure 3). They have been in use since 1985 and have saved the lives of many people who would have
Figure 3. ICD and the heart.
An ICD is about the size of a beeper, and performs functions that include those of a pacemaker. For
example, sustained VT can now be stopped by the ICD with the use of anti-tachycardia pacing, a rapid and
painless stimulation of the ventricles that avoids the need for an uncomfortable shock. The ICD system
utilizes one or more pervenous electrodes for rate sensing and defibrillation. The electrode is placed in
the apex of the right ventricle. The ICD and electrode combination follows the parameters programmed
into the ICD by the programmer.
The concept of using an ICD to treat SCD originated with Michel Mirowski in 1966 [11]. When an arrhythmia is
detected, the device delivers either the programmed cardioversion (energy of less than 5 Joules) for VT or
defibrillation therapy (energy up to the maximum of the device) for VF [12]. The device usually has
independently programmable tachyarrhythmia detection procedures for each case, but the detection of
FVT can be programmed via either VT or VF detection. The detection is based on heart rate and the
length of time that rate remains above the programmed rate. Different therapies can be independently
programmed for each arrhythmia type.
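The rate-and-duration criterion described above can be sketched as follows. This is only an illustration of the idea that detection requires the rate to stay above a programmed threshold for a programmed length of time; the detection algorithms actually used in commercial devices are not described in this paper, and the function and parameter names below are hypothetical.

```python
def detect_episode(rr_intervals_ms, rate_threshold_bpm, min_duration_ms):
    """Return the index of the beat at which detection is met, or None.

    Detection is met once consecutive beats faster than rate_threshold_bpm
    have lasted at least min_duration_ms in total. A single slower beat
    resets the accumulated time (a simplifying assumption of this sketch).
    """
    # A rate above the threshold means an RR interval below this value.
    threshold_interval_ms = 60_000.0 / rate_threshold_bpm
    elapsed_ms = 0.0
    for index, rr in enumerate(rr_intervals_ms):
        if rr < threshold_interval_ms:
            elapsed_ms += rr
            if elapsed_ms >= min_duration_ms:
                return index
        else:
            elapsed_ms = 0.0
    return None
```

With a threshold of 150 bpm and a required duration of one second, a short burst of fast beats that ends before one second has accumulated would not trigger detection, which loosely mirrors the distinction between non-sustained and sustained VT.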
ICDs have been shown to be highly effective for saving the lives of patients with a high probability of cardiac
death. In a European study in which researchers followed 102 patients, there was a 91% survival rate
with 58% receiving therapies from their devices [13]. The Multicenter Unsustained Tachycardia Trial
(MUSTT) and Multicenter Automatic Defibrillator Implantation Trial (MADIT) have confirmed that defibrillator implantation can reduce mortality by as much as 50% in coronary artery disease patients
sustained ventricular tachycardia on electrophysiologic studies [14]. The MADIT II study, in which
researchers enrolled 1232 patients over four years and assigned them to conventional medical therapy
and ICD implantation in a 2:3 ratio, showed that the mortality rate for the ICD group was 5.6 percentage points lower than
that of the other group. There was also a 31% reduction in the risk of death from any
cause in the ICD group [15]. The AVID (Antiarrhythmics vs. Implantable Defibrillators), CASH (Cardiac
Arrest Study Hamburg), and CIDS (Canadian Implantable Defibrillator Study) trials have also shown results
that support the superiority of ICDs over antiarrhythmics in treating patients. ICDs are thus considered
highly effective devices to prevent SCD in patients with ventricular arrhythmias.
ICDs also serve the function of providing information about patients' conditions or deaths. A
collaborative study by several universities has found that for patients with ICDs, non-cardiac related
deaths were sometimes wrongly associated with SCD [16]. The cause of death for 109 patients was
classified into three categories: sudden cardiac, nonsudden cardiac, and noncardiac. Later, the
categorization was assessed via autopsy and interrogation of ICDs. Of the 17 patients categorized as
sudden cardiac cases, only 7 had their ICDs discharge near the time of death, suggesting that deaths
classified as sudden cardiac can in fact be non-cardiac. Such examinations are greatly facilitated
when ICD data have been collected throughout the patients' history with their ICDs, as the data contained in the
ICDs proved to be an invaluable source of information.
ICD Implantation
Recent advances in ICD technology have made implantation relatively safe; it is a minor operation that
takes one to two hours. The implantation procedure is performed under local anesthetic by an
electrophysiologist, a cardiologist specialized in the treatment of arrhythmias. An incision is made below
the patient's collarbone and a pocket is created under the skin for the device's pulse generator, composed of the battery
and other electronics. Electrodes are then guided into the heart through the veins with the
use of x-ray technology. After the physician tests the device by artificially inducing an arrhythmia and confirming that the ICD
correctly identifies it and performs the necessary treatment, the incision is closed and the operation is
ICD Components
The various components within an ICD are shown in Figure 4. The functions of these components are
also described below.
Figure 4. Components inside the ICD.²
1. Beeper: Warning beeps are sounded when the battery runs low. Other problems that the
physician should check out may also be indicated by beeps.
2. Battery: Batteries made of lithium silver vanadium oxide last six years or more.
3. CPU: The processor chip analyzes electrical signals from the heart and determines whether any shocks are necessary. The chip runs at less than 100 kilohertz for energy efficiency considerations.
4. Antenna: Communication between the ICD and the programmer is made through low-frequency
radio waves sent from the unit's antenna.
5. Connectors: A lead with three connections provides sensing capabilities and a pathway for high-energy shocks to one of the heart's lower chambers. A second lead with only one connection
provides sensing and pacing to an upper chamber.
6. Casing: The ICD is encased in titanium. Scar tissues grow around it, locking it in place. The device is sealed shut to prevent leakage, so when the battery dies, the entire device must be
replaced.
7. Memory: The pattern of heart activity before, during and after disturbances is recorded. Since these signals are recorded from the patient's heart chambers, they are electrograms (EGM) rather
than ECGs.
8. Leads: Flexible wires convey sensing information to the ICD and carry electric shocks to the
heart. A patient may receive one or two leads depending on the nature of the heart problem.
They are covered with silicone insulation.
9. Capacitors: Similar to the ones used in camera flashes, capacitors can take up to 10 seconds to
draw enough energy from the battery for a large jolt. Charge times for smaller shocks are much
shorter.
10. Transformer: For serious heart disturbances requiring a shock, a transformer converts low-voltage battery power into a higher voltage.
The Programmer
Separate from the ICD itself is the monitoring device used by physicians. After implant, the
physician can check or adjust the ICD settings by using an external computerized device called a
programmer/recorder/monitor (PRM). The PRM communicates with the ICD in the patient's body
via radio waves from a doughnut-shaped receiver held over the implant site. It works much like using a
garage door opener or clicking a remote control to change channels on a television. The doctor or nurse
uses the PRM to program and test the ICD after implant. When the patient goes in for a checkup, the
The programmer is also able to print paper copies of the data contained in the device at the time of
download. This is how EGMs are traditionally made available for the physician to examine. If a floppy
disk is inserted into the programmer, the programmer will also save the result of its
interrogation onto the diskette.
1.2 Motivation
What triggers fatal arrhythmias has largely remained a mystery despite the effectiveness of the
defibrillators. In attempting to understand the causes of SCD, the National Institutes of Health sponsored the
Triggers of Ventricular Arrhythmias (TOVA) study [17]. It aims to investigate factors that trigger the
discharge of ICDs. In particular, factors such as wake time, time of assuming the upright posture,
eating, and taking of medications are hypothesized to relate to the onset of arrhythmia. Thus TOVA will
characterize triggering in 2000 episodes of ICD discharge in 3000 patients treated in 44 centers nationwide
over four years to determine whether ICD discharges occur more frequently in certain periods. These periods
may include the 3 hours after awakening, Mondays, and winter, and the determination can easily be
made from data contained in the ICDs. TOVA will also utilize EGMs recorded prior to ICD discharge to
identify sympathetic activation during triggering.
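To illustrate how such period counts could be derived from episode timestamps, the sketch below tallies episodes falling on a Monday, in winter, or within 3 hours of awakening. It is not the TOVA analysis itself: the function name, the per-day table of wake times, and the definition of winter as December through February are all assumptions made for the example.

```python
from collections import Counter
from datetime import date, datetime, timedelta

WINTER_MONTHS = (12, 1, 2)  # assumption: Northern Hemisphere meteorological winter


def tally_periods(episode_times, wake_times):
    """Count episodes per period of interest.

    episode_times: iterable of datetime objects, one per ICD discharge.
    wake_times: dict mapping a date to that day's awakening datetime
                (a hypothetical input; the source of wake times is not specified).
    """
    counts = Counter()
    for when in episode_times:
        counts["total"] += 1
        if when.weekday() == 0:  # Monday is 0 in datetime.weekday()
            counts["monday"] += 1
        if when.month in WINTER_MONTHS:
            counts["winter"] += 1
        wake = wake_times.get(when.date())
        if wake is not None and timedelta(0) <= when - wake <= timedelta(hours=3):
            counts["within_3h_of_waking"] += 1
    return counts
```

Comparing each count with the share expected if discharges were spread evenly over time would then indicate whether a period is over-represented.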
Our effort to build a web-accessible database system is in support of the gathering and sharing of
such ICD case data. Currently ICD manufacturers provide proprietary hardware and software to retrieve
data contained in ICDs and store them on floppy disks. However, floppy disks are hardly ideal
because they are difficult to index and unreliable for storing data. Furthermore, sharing of information is
inconvenient, and viewing of information is possible only in paper plots. Thus a searchable archiving
system that may be accessed by multiple users from multiple sites is desired. The physicians will be able
to upload and make the data available instantly, and the researchers will be able to access desired data
1.3 Related Work
The effort to gather ICD cases and make them available over the web started with the TOVA Database
Project in early 2002 [18]. The TOVA project had several goals. First, it transformed the Guidant data from its
original proprietary ZOOM format into PhysioNet's WFDB format, which is a public and well documented
format [19], so that the waveforms could be stored and displayed properly. Second, a web interface was
created so that viewing and computerized processing of these waveforms and associated parameters were
made possible. Lastly, other proprietary ICD data had to be deciphered, converted, and adapted for web
access.
It was found that input file formats differed between models and manufacturers, so the same code
could not be applied unchanged. However, the similarities between models from a particular manufacturer
were large enough that considerable code reuse was possible. Eventually, inputting a directory containing
the ZOOM format data into the toolchain could output waveforms in WFDB format with their annotations, text
files with parameters, and a data descriptor file that itemizes all generated files.
Another important consideration is that since the ICD data will be provided publicly, de-identification
of data is necessary to protect patients' privacy. This was also made possible in the TOVA project. By
September 2002, there was a complete system capable of processing ICD data from all Guidant ICD devices
in active use. Data from more than 500 patients had also been uploaded and archived.
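As an illustration of what de-identification of a text file can involve, the sketch below blanks the values of labeled fields that could identify a patient. It is not the TOVA project's de-identification code: the field labels, the "Label: value" layout, and the function name are assumptions made for the example.

```python
import re

# Hypothetical field labels; the real files may use different ones.
IDENTIFYING_FIELDS = ("Patient Name", "Patient ID", "Date of Birth", "Physician")


def deidentify(text, fields=IDENTIFYING_FIELDS):
    """Replace the value of each identifying 'Label: value' line with a marker."""
    labels = "|".join(re.escape(label) for label in fields)
    pattern = re.compile(
        rf"^([ \t]*(?:{labels})[ \t]*:).*$",
        re.IGNORECASE | re.MULTILINE,
    )
    # Keep the label so the file layout is preserved, drop the value.
    return pattern.sub(r"\1 [REMOVED]", text)
```

Keeping the field label while dropping its value preserves the file structure, so that later parsing steps do not need to distinguish between original and de-identified files.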
1.4 The paper
This document outlines the design and implementation of an automated web-based searchable
archive for ICD data. Chapter two will cover the overall goals and design of the archive system. Chapter
three will detail the implementation of the archive. Chapter four will present the potential effects of the
Chapter 2
The Current Archive System
At the inception of my participation in the project, a working system designed around Guidant data
had already been implemented [18]. Guidant data can be uploaded from the clinic, properly de-identified and
sent to the server at MIT, further processed and archived, and provided to researchers for searching and
viewing over the web. Medtronic data can also be uploaded from the clinic, de-identified, and sent to
MIT, but processing had not progressed beyond conversion into waveforms and textual data. This
chapter describes how the overall system functions, with details pertaining to
Medtronic data.
2.1 System overview
The system can be described by its two major components, the client systems generally located in
clinical facilities, and the shared archive presently located at MIT. The design of the system is shown in
Figure 5.
Figure 5. System overview diagram. Client System: ICD disk data are generated, indexed and archived, de-identified, and uploaded to the server side. Shared Archive: ICD disk data are converted, merged, and archived; data retrieved are searchable and displayable.
The data reside in the shared archive, while the interface of the client system is used to upload and
download these data. A client-server architecture was chosen for the archive because it
can efficiently handle simultaneous requests from multiple users who wish to access the
shared archive. In addition, the interface is web-based; the danger of leaking personal information over
the web is largely removed by stripping away any personal information before the data
leave the clinic. Thus, for the purpose of an ICD archive, a web-based client-server model offers the
benefits of web access while avoiding its main drawback.
This section follows the flow of data from the device to the eventual search result displayed on the
web interface. It is organized around the clients and the server, and details the various stages within these
components, with their specific roles and functions. The flow of data starts from its transfer from the ICD
into the clinic system, then to the insertion from the clinic into the MIT database, and finally to the search
result displayed on users' computer screens. Particular attention is paid to describing the workings
of the system's Medtronic portion.
2.1.1 Client system
The client systems are designed around an internal Linux clinic PC that performs the tasks of ICD
disk indexing, archiving and de-identification. The clinic PC runs a web server and provides the internal
network within the clinic. The customized software for receiving, indexing, archiving and de-identifying
ICD data runs on the clinic PC.
One or more Windows user PCs can be connected to the clinic PC via the internal network, working as
client machines running web browsers. The web server provides pages to guide users in uploading or
Figure 6. Client site ICD disk index/archive system. (Diagram labels: Clinic PC with web server, file system, indexing, archiving, and de-identifying; User PC with web browser and data diskette; internal network; de-identified data; HTML.)
The programmer interrogates the ICD
The ICD implanted in the patient's body holds all of its data in binary format. This data includes
various settings on the device, information about the patient, and waveform or other related information
when an episode occurs. Because the device records continuously, its contents can change at any time
up to the moment the physician downloads the data during a clinic visit. The physician downloads data
from the patient's ICD with the help of a programmer that interrogates the device using radio waves.
After the download, the downloaded data remain inside the ICD unless the physician specifically erases
them.
Diskette holds programmer output
After the programmer interrogates the ICD, it can print paper copies of the data and save the result of
its interrogation onto a diskette. The files saved on the diskette are in a manufacturer-specific
format, and we will refer to them as the raw data files. For Guidant, the raw data files include a number of
overview, text, and binary graphical files. For Medtronic, the raw data file is a single binary file with
a .ppd extension that combines all of the text and graphical data on the diskette. A diskette may also
hold multiple raw data files, either for the same patient or for different patients.
Since the data stay in the ICD unless the physician erases them intentionally, the same data can appear
in multiple files. Furthermore, every downloading session gives its raw data file a different name to
prevent new data from overwriting old data, so duplicates cannot be detected by file name.
This repeated data must therefore be handled, as discussed later in this chapter. For our system, a
diskette containing the raw data files is required for the next step.
User PC translates data and uploads resulting files to clinic PC
The user PC is used to translate the binary raw data files on the diskettes into text files, and to upload
both the binary and text files to the clinic PC, where they are archived in their unprocessed forms. The
translation is done with proprietary programs supplied by the manufacturers, and these programs may
require the user PC to run on a certain platform. For uploading to the clinic PC, on the other hand, there
is no platform restriction for the user PC as long as a local connection to the clinic PC can be made.
For Medtronic, the translation utility program is called Save to Disk Translation Utility. The utility takes
the binary raw data file as input and outputs a human-readable text file; we
will refer to this textual output as the original data file. The utility has to be run under a Windows
environment, either with a graphical interface or from the command prompt. With the
graphical interface, the names of the files contained on the diskette appear in the left window
of the interface as soon as the diskette is inserted into the floppy drive of the user PC. The user then
moves the files to be translated into the right window. Only the files that appear in the right window
are translated when the user clicks the translate button. An output directory for the resulting text files,
which have a .txt extension, can also be chosen at this time.
After the text file is output, the user places the .txt file and its .ppd counterpart in the same
zip file to save space, and sends the zip file to the clinic PC. This manual step is somewhat
cumbersome when there are multiple raw-original data file pairs to deal with, since every pair has to be
handled separately. Performing the translation and zipping steps with one single program is a possible
extension of the system in the future.
The user PC can be based on any platform as long as it is connected to the clinic PC via the local network,
but the translation utility program supplied by the manufacturer can only be run under the Windows
environment. Thus a Windows-based PC is preferred, to save the trouble of having to use two different
PCs. The user selects the zip file and uploads it to the clinic PC via an HTML interface.
To upload the zip file, the user follows the instructions on the upload page. A username,
password, and the TOVA patient study code have to be entered along with the file to be uploaded. The
username and password are necessary to establish the authenticity of the user and prevent abuse of the
system by people without proper authority. The patient study code is used to identify the patient in order
to organize the data with other data from the same patient on the clinic PC. After all the information is
supplied, clicking the upload button archives the zip file on the clinic PC.
Clinic PC de-identifies data and archives
The clinic PC serves as the raw data archive for ICD data downloaded from patients and is run on a
Linux platform. It unzips the zip files that contain the raw-original data file pairs, indexes them, makes a
de-identified copy of the original data file, and archives them locally.
To protect the patient's privacy, de-identification is carried out on the clinic PC, and from this step on
all of the data are de-identified. De-identifying the data before they leave the clinic avoids the complexity of
web security and the possible compromise of the patient's identity through names (of the patient
or the physician), dates (such as the birthday or implant date, which are shifted by a constant amount so
that time intervals are preserved), or other data that could reveal who the patient might be. The original data file is taken
as input, and the output is a zip file containing a de-identified version of the original data file (also in .txt
format) and an overview file named ICD-id-info.txt. The overview file will be used to identify the patient
at the server side, and it includes five vital pieces of information for doing so: patient ID alias, device
manufacturer, device model, device SN alias, and disk data format.
After the de-identification process, an index table is created that maps the patient's real name to an
alias code. Other fields of the table include an index number, the study code, patient's birthday, and
patient's gender. Upon clicking on the patient's name, a patient page is shown. This page lists the
multiple devices and zip files associated with this patient. For the device, manufacturer, model, serial
number, serial number alias, and implant date are provided. For the actual interrogation data,
interrogation disk number, disk name, save date, format, programmer model, programmer serial
number, zip file, and its associated de-identified output file are provided.
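The date handling used during de-identification, in which dates are shifted by a constant amount while the intervals between them are preserved, can be illustrated with a minimal sketch. The offset value, the field names, and the sample dates below are hypothetical; the text does not state the actual shift or how the clinic PC software stores these fields.

```python
from datetime import date, timedelta

# Hypothetical constant offset; the real shift amount is not given in the text.
SHIFT = timedelta(days=-1234)


def shift_date(d: date, offset: timedelta = SHIFT) -> date:
    """Move a date by a fixed offset so intervals between dates are preserved."""
    return d + offset


def deidentify_dates(record: dict) -> dict:
    """Return a copy of the record with every date value shifted by the same offset."""
    return {
        key: shift_date(value) if isinstance(value, date) else value
        for key, value in record.items()
    }


# Made-up sample record (field names are assumptions for illustration).
record = {
    "birthday": date(1976, 5, 1),
    "implant_date": date(2002, 3, 15),
    "gender": "Male",
}
shifted = deidentify_dates(record)

# The interval between the two dates is unchanged by the shift.
original_interval = record["implant_date"] - record["birthday"]
shifted_interval = shifted["implant_date"] - shifted["birthday"]
```

Because the same offset is applied to every date belonging to a patient, quantities such as the time between implantation and an episode remain meaningful to researchers even though the absolute dates no longer identify the patient.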
User PC downloads de-identified data and uploads to Physio-Arch
After the clinic PC processes the data, the user can access the archived de-identified zip file, download it,
and upload it to Physio-Arch, the server that resides at MIT. Both the download and the upload are done
one file at a time. This is a manual step at the moment, pending hospital approval of an
external connection for the clinic PC. Once the approval is granted, a direct connection from
the clinic PC to Physio-Arch will automate this manual download and upload process.
The user downloads the data via the same internal site that is used to upload data to the clinic PC.
For uploading to Physio-Arch, the user PC must have an external connection and uses the
World Wide Web.
2.2.1 Shared Archive
A Linux PC running a web server and a relational database hosts the archive. Customized software is used for converting, inserting, and displaying ICD data. The structure of the system is shown in Figure 7.
[Figure 7 diagram: the Linux PC (Physio-Arch) receives data from the Internet through its web server; its software components convert data, generate files for EGM/middleware, create tables, insert data, retrieve data, search data, and display waveforms, backed by a relational database (PostgreSQL) and the Linux file system.]
Figure 7. The shared ICD data archiving system.
Physio-Arch runs automated data processing events
The upload of a de-identified zip file by user at the clinic triggers an automated chain of events that includes EGM file generation and database insertion, which eventually leads to the archiving of ICD data
at Physio-Arch. The end results are files that can be used to generate EGMs on web browsers and database entries specific down to the individual field level. Both the EGM generation and database insertion phases take the same de-identified zip file as input.
The EGM generation phase unzips and looks at the de-identified data file, extracts sections with
EGM waveform and beat interval data, and converts them into WFDB format to be used for graphical
display on the web later. During this phase, beat interval data have to be synchronized carefully with the actual EGM data. Also, every EGM episode has to be properly indexed according to patient's name, device model, and device SN. The end results are groups of files stored in a temporary directory, with each group representing a particular episode. The unzipped de-identified data file and the overview
file are also contained in this directory. The temporary directory is cleared when the next upload session begins.
The database insertion phase takes the temporary directory from the EGM generation phase as input, generates a middleware file that contains field-value pairs from the non-graphical sections and file
names of EGM data, and inserts them into the database. The EGM files are also copied to a permanent
location to be accessed by the interface.
Researchers search for data on Physio-Arch website
Access to ICD data will be conducted over the web at the Physio-Arch website dedicated to ICD
data. Researchers will be able to search the data by patient or by episode. All of the information contained
in the ICDs is displayed, including any text and EGM data. A utility provided by PhysioNet takes the
EGM files stored in Physio-Arch directories and displays them in annotated and unannotated formats. Other textual data are displayed in tabular format according to their respective sections. Comments can
also be attached to the EGMs by authorized researchers or physicians.
2.3 Technology choices
There were several choices we could make to implement the system described above. Here the choices
and alternatives are discussed.
2.3.1 Information storage
There were several choices for storing ICD data. Firstly, information can be processed into text and
graphical files, and accessed through Perl-based Common Gateway Interface (CGI) programs. Secondly,
information can be stored in serialized Java objects and accessed through Java. Thirdly, information can
be processed via Perl into files in Extensible Markup Language (XML), and displayed by Extensible
Stylesheet Language Transformations (XSLT). Finally, information can be stored in relational databases
and accessed through Perl- or PHP-based programs that connect the database with a web interface.
Out of all the choices, the relational database approach was adopted because of the database's
robustness, versatility, and efficiency in accessing data. Programming is easy and structured, and once the
data is stored in the database, statistics can be easily computed or groups formed according to the needs
of the researchers. The database used for storage is PostgreSQL, open-source software that is free of
charge and works in conjunction with Linux operating systems. Linux is an extremely stable platform that
is superior to Windows and comparable to Unix, and being open-source and free it requires no monetary
investment. Although MIT provides
free use of Unix to researchers within its community, we need to consider the possibility that the server
will be hosted elsewhere, perhaps in a hospital or clinic that may not be on a Unix system.
The open-source nature of the system's underlying architecture keeps the operating cost low, an
important facet of non-profit services. Also, it ensures that in the case of any problems with the
architecture, there would be plenty of support in the PostgreSQL and Linux communities. Any possibly
useful future extensions to the architecture will also be free of charge to obtain.
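To illustrate why a relational store makes statistics and grouping straightforward, the following minimal sketch counts episodes by type with a single SQL query. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it is self-contained, and the table names, column names, and rows are all hypothetical; they are not the schema of the actual archive.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patient (
        patient_alias TEXT PRIMARY KEY,
        gender        TEXT,
        birth_year    INTEGER
    );
    CREATE TABLE episode (
        episode_id    INTEGER PRIMARY KEY,
        patient_alias TEXT REFERENCES patient(patient_alias),
        episode_type  TEXT
    );
    """
)

# Made-up sample rows.
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [("A001", "Male", 1975), ("A002", "Female", 1960)],
)
conn.executemany(
    "INSERT INTO episode (patient_alias, episode_type) VALUES (?, ?)",
    [("A001", "VF"), ("A001", "VT"), ("A002", "VT")],
)


def episode_counts_by_type(connection):
    """Group episodes by type and count them with one SQL statement."""
    rows = connection.execute(
        "SELECT episode_type, COUNT(*) FROM episode "
        "GROUP BY episode_type ORDER BY episode_type"
    ).fetchall()
    return dict(rows)


counts = episode_counts_by_type(conn)
```

The same grouping over flat text or XML files would require custom parsing code for every new question, whereas here only the query string changes.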
2.3.2 Web server
The Apache web server has been adopted for the purpose of serving data on the web. It has provided
open-source HTTP server support for many UNIX and Windows users since April of 1996. Its
longevity attests to its reliability and popularity. In July 2003, the Netcraft survey found that
63% of web sites were being run on Apache [19].
2.3.3 Software
With the exception of utility programs provided by the manufacturers, software used by the system
is developed with open-source Perl, Perl CGI, C, SQL and the WFDB software package in PhysioToolkit,
the software repository of PhysioNet, an NIH-funded Research Resource for Complex Physiologic Signals
[20].
All of the text parsing programs were written in Perl for its powerful manipulation of strings, and the underlying CGI for web pages was also written in Perl. Conversion of graphical data was done in C,
and display of graphical data was made possible by the WFDB software package. SQL was used to access
the database.
2.4 Incorporation of existing and future manufacturers
ICD data come in manufacturer-specific formats, making data storage in separate manufacturer-specific databases the most efficient solution. However, in searching for episodes and patients of interest,
researchers are often not concerned with the manufacturer. This poses a difficulty in attempting to
integrate these separate databases for different manufacturers. Furthermore, it is possible for a patient to
own ICDs from different manufacturers.
As a result, it has been decided that separate searches will be run against the different databases using
the same set of criteria. Integration of data from different manufacturers will then be done on the
interface. Even though this interface-level integration yields a less elegant solution than a complete
integration of data for all manufacturers, it avoids a lot of the problems in attempting to match differently
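The interface-level integration described in this section can be sketched as follows: the same criteria are run against each manufacturer's database separately, and the result rows are merged only for display. This is a minimal illustration, not the system's actual code. It uses sqlite3 in place of PostgreSQL, the table and column names are hypothetical, and it assumes for simplicity that one query text works for every database; in practice each manufacturer-specific database could need its own query.

```python
import sqlite3


def make_db(manufacturer, rows):
    """Build a small in-memory database standing in for one manufacturer's archive."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient (patient_alias TEXT, gender TEXT, birth_year INTEGER)"
    )
    conn.executemany("INSERT INTO patient VALUES (?, ?, ?)", rows)
    return manufacturer, conn


def search_all(databases, gender, born_after):
    """Run one set of criteria against every database and merge the results."""
    results = []
    for manufacturer, conn in databases:
        cursor = conn.execute(
            "SELECT patient_alias, gender, birth_year FROM patient "
            "WHERE gender = ? AND birth_year > ?",
            (gender, born_after),
        )
        # Tag each row with its source so the interface can link to the right pages.
        results.extend((manufacturer, *row) for row in cursor)
    # Merge step: present a single list ordered by patient alias.
    return sorted(results, key=lambda row: row[1])


# Made-up sample data for two manufacturer-specific databases.
databases = [
    make_db("Guidant", [("G1", "Male", 1975), ("G2", "Female", 1980)]),
    make_db("Medtronic", [("M1", "Male", 1972), ("M2", "Male", 1950)]),
]
merged = search_all(databases, "Male", 1969)
```

Keeping the manufacturer tag on every merged row means no attempt is made to reconcile the two schemas; the databases stay independent and only the result list is shared.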
Chapter 3
Goals and Design
The success of this system depends on how useful it will be. It has to be user-friendly and serve its
purpose well. The researchers should be able to quickly search for data that they want to see and data
that they need.
This chapter will outline the goals and the details of design for the Medtronic portion of the system. It
will include a description of the data model for the Medtronic database, discussions on the designs of the
interface and database, and issues related to integration with Guidant data and any other future
expansions to be incorporated into the system.
3.1 Goals
To make this system truly useful for the researchers, great care has to be given in the following areas
to guide the design and implementation:
* Data shall be automatically handled to the maximum extent
* Data shall be properly converted, parsed, and stored in the database
* Data shall be readily and easily searchable by researchers
* Data shall be correctly retrieved and displayed on web browsers
* A mechanism should exist for data from yet unseen models to be handled appropriately
* A mechanism should exist for certain authorized researchers to attach comments to certain episodes
* The system design should allow integration with the existing system and future systems of
similar purpose and usage
3.2 Medtronic Data structure
My contribution to the project starts in the shared archive phase of the system, with particular focus on the handling of Medtronic data. At this point, an overview has been generated and is available to me
to identify the patient, and EGM files are also available after the EGM generation step, along with the
original de-identified data file. All of these files should be parsed for information, generated into
middleware, and inserted into the database. Afterwards, the web interface has to be written to retrieve
data from the database, and integration of data display with the Guidant database that has previously
been built needs to be carried out also.
It is thus imperative to be familiarized with the underlying data that is the focus of the archive
system.
3.2.1 De-identified text file
Data contained in the de-identified text files are divided into sections. As mentioned earlier, some of
these sections contain only text information while some other sections contain information that can be
converted into EGM displays.
The very first section is the header section. It specifies information such as program version, model
number, file type, and other metadata about the file itself. After the header section come the numbered
sections presenting information on device settings, episode records, log entries, and waveform data.
Although the data files for the various Medtronic models all have the same section structure, the section
names vary in a few cases, which raises the possibility that the ordering of the sections may vary across
models as well. This introduced some complications in processing the data files.
Following is a list, ordered by section number, of the sections present in Medtronic data files, with brief
descriptions. Gaps in the numbering correspond to sections that did not appear in the models available to us.
1. Parameters: features of the ICD such as Detection being turned on or off and Therapy Modes and
strengths.
2. Counters and Status: status of the ICD itself such as timestamps and energy level.
3. Tachycardia Log Entries: data about tachycardia episode start time, type, etc.
4. SVT/NST Log Entries: data about supraventricular and non-sustained tachycardia episode start
time, type, etc.
6. Episode Records: numerical values that can be graphed into waveforms. The entry numbers
correspond to entry numbers in Sections 3 and 4. The numbering for data related to different
sections is done separately, so the entry numbers sometimes overlap. The content of this section
has been converted into EGM files.
10. Patient Alert Log: episode information or trouble session information.
11. Mode Switch Log: any mode switch episode data.
12. Dual Chamber R-R Intervals (Single Chamber R-R Intervals): numerical values that can be
graphed into waveforms. The content of this section has also been converted into EGM files and
is separated according to related entry type and entry number. For dual chamber ICD models,
R-R, R-P, P-R, and P-P data are stored. For single chamber ICD models, only R-R and P-P data are
stored.
13. Daily Log: levels of ICD battery, impedance, and other measures on a daily basis.
14. Weekly Log: levels of ICD battery, impedance, and other measures on a weekly basis.
15. Patient Information: patient ID, birthday, ICD device and lead information, of which private items have been de-identified.
20. Long Term Clinical Trends: a log entry of arrhythmic episode information.
The sections can roughly be grouped into two groups: EGM-related sections, whose data are tied to the
EGM displays, and stand-alone sections, whose data can simply be listed in tabular format on an
individual page. This classification into stand-alone and EGM-related sections will help greatly in
designing the site linkage structure and the database.
3.2.2 Intra-section structure
To process data contained in various sections, a closer look at the structure within each individual
section of the raw data is essential. Below is a description of how the data look within each section.
Section Beginning and Ending
Every section, regardless of its content, always starts with
"Section",[section_number],"[section_name]",[section_format_version]
and ends with
"End of Section",[section_number]
This convention is very useful in determining the section beginnings and ends and searching for
appropriate sections for proper handling of data contained in them, so I made the assumption that the
sections will always start and end in the same fashion even for a new model that was not seen previously.
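The begin and end convention lends itself to a simple line-oriented splitter. The following is a minimal sketch of that idea, not the parser used in this project (which was written in Perl). The function names and the sample lines are hypothetical, and the sketch relies on the same assumption stated above, namely that every numbered section opens with a "Section" line and closes with an "End of Section" line.

```python
import csv


def parse_line(line):
    """Split one comma-separated line, honoring double-quoted fields."""
    return next(csv.reader([line], skipinitialspace=True))


def split_sections(lines):
    """Return {section_number: (section_name, body_lines)} using the begin/end lines."""
    sections = {}
    current = None  # (number, name, body) of the section being read, if any
    for line in lines:
        fields = parse_line(line) if line.strip() else []
        if fields and fields[0] == "Section":
            current = (int(fields[1]), fields[2], [])
        elif fields and fields[0] == "End of Section" and current is not None:
            number, name, body = current
            sections[number] = (name, body)
            current = None
        elif current is not None:
            # Everything between the begin and end lines belongs to the section.
            current[2].append(line)
    return sections


# Made-up sample lines following the convention described above.
sample = [
    '"Section",1,"Parameters",2',
    '"Number of Parameters",2',
    "",
    '"VF Detection Enable","On"',
    '"VF Detection Interval",320,"ms"',
    '"End of Section",1',
]
sections = split_sections(sample)
name, body = sections[1]
```

Keying the result by section number rather than by name or position means the splitter does not depend on section names or ordering, both of which may vary between models.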
Section Meta-Data
After the section beginning line, one or more lines describe the section itself rather than the
data contained in it. Examples of such lines are
"Number of Parameters", 211
"Last Session Time","Nov 14, 2001"
5 The exception is the header section, which begins with "File Header" and ends with "End of File Header".
Data Presentation Format Declaration
After the meta-data lines, how the data will be presented in the section is declared. There are
basically two kinds of presentation for data.
1. All lines of data will be presented in "Name","Value","Units". For example, the Parameters
section has the following format for its data:
"VF Detection Enable","On"
"VF Detection Interval",320,"ms"
Depending on the particular piece of data, it is possible that there is no unit specification for the
line as in the case of "VF Detection Enable".
2. There are several possible formats, each declared on a separate line. Fields for the data will be determined by the first element of the data line. For example, the Patient Alert Log section has the following lines to declare its data formats.
[1] "Low Battery Voltage","Date/Time","Measured(V)","Tolerance(V)"
[2] "V. Pacing lead impedance","Date/Time","Impedance (ohms)","Tolerance Minimum(ohms)","Tolerance Maximum(ohms)"
[3] "Defibrillation lead impedance","Date/Time","Impedance (ohms)","Tolerance Minimum(ohms)","Tolerance Maximum(ohms)"
[4] "Charge time","Date/Time","Charge Time (sec)","Tolerance (sec)"
[5] "ICD Electrical Reset occurred","Date/Time"
[6] "All therapies exhausted for Episode","Date/Time","Therapy Type","Episode Number"
[7] "Shocks delivered for Episode","Date/Time","Number of Shocks","Episode Number","Tolerance (shocks)"
Data from this section can take any of the formats declared above. To determine which format a data
line follows, and hence which fields its values correspond to, we look at the first element of the line.
The following data line corresponds to format 5:
"ICD Electrical Reset occurred","Jan 25, 2002 03:01:20"
Data Content Lines
The real data for the sections is displayed after the data format declaration lines and a blank line.
Thus the real data can be found by finding the Section beginning and counting one blank line after that.
At the end of the data content lines, the section ending line is presented to signal the end of this section.
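The intra-section layout described above (declaration lines, one blank line, then data lines whose first element selects the format) can be handled with a short routine. The sketch below is illustrative only: the function names are hypothetical, the sample body is made up from the Patient Alert Log formats listed earlier, and it assumes that the bracketed format numbers shown in the text are labels for the reader rather than part of the file.

```python
import csv


def parse_line(line):
    """Split one comma-separated line, honoring double-quoted fields."""
    return next(csv.reader([line], skipinitialspace=True))


def split_declarations_and_data(section_body):
    """Split a section body at its first blank line.

    Lines before the blank line are meta-data and format declarations;
    non-blank lines after it are the data content lines.
    """
    for index, line in enumerate(section_body):
        if not line.strip():
            data = [l for l in section_body[index + 1:] if l.strip()]
            return section_body[:index], data
    return section_body, []


def build_format_table(declaration_lines):
    """Map the first element of each declared format to its remaining field names."""
    table = {}
    for line in declaration_lines:
        fields = parse_line(line)
        table[fields[0]] = fields[1:]
    return table


def label_data_line(line, table):
    """Pair the values of a data line with the field names of its matching format."""
    fields = parse_line(line)
    return fields[0], dict(zip(table[fields[0]], fields[1:]))


# Hypothetical section body (section begin and end lines already removed).
body = [
    '"Charge time","Date/Time","Charge Time (sec)","Tolerance (sec)"',
    '"ICD Electrical Reset occurred","Date/Time"',
    "",
    '"ICD Electrical Reset occurred","Jan 25, 2002 03:01:20"',
]
declarations, data = split_declarations_and_data(body)
table = build_format_table(declarations)
event, values = label_data_line(data[0], table)
```

Any meta-data lines that precede the declarations would also end up in the table under their own first element; this is harmless for lookup as long as no data line starts with the same text, but a stricter parser would separate them first.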
3.2.3 Overview file
The overview file is the output of the clinic PC and is named ICD-id-info.txt for all of the ICD files. It
is used to identify the patient. The format of the overview file stays the same regardless of which
manufacturer the data comes from, and includes five pieces of information:
1. Patient ID Alias: will be used to identify patient since the patient name has been removed
2. Device Manufacturer: Guidant, Medtronic, or other manufacturers
3. Device Model: manufacturer-specific model numbers
4. Device SN Alias: will be used to identify the ICD since the serial number can be exploited to
identify the patient
5. Disk Data Format: used to facilitate the EGM conversion process
3.2.4 EGM files
EGM files will later be used to display EGMs on the web. They are generated by the EGM conversion utility written in C from the overview and de-identified data files. They are grouped systematically, and each file in
the group will contribute a component to the final EGM output on the web. For example, the .ann file
contains annotation data while the .dat file contains the actual waveform data.
Every download session will generate a group of Current EGM files, in addition to any group of
Tachycardia (Tachy) or Supraventricular Tachycardia (SVT) EGM files that are identifiable by their entry
numbers. All the graphical sections have been singled out and processed by the EGM conversion utility.
Thus in obtaining graphical data to generate the middleware, I can skip these sections and simply grab
the names of these EGM files in the file system instead as my graphical data.
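Collecting the EGM file names from the file system and grouping them by episode can be sketched as below. The file names are hypothetical, since the text does not give the naming scheme produced by the EGM conversion utility; the sketch only assumes that files belonging to one group share a base name and differ in extension, as the .ann and .dat example suggests.

```python
from collections import defaultdict
from pathlib import Path


def group_egm_files(filenames):
    """Group file names by shared base name, keyed inside each group by extension."""
    groups = defaultdict(dict)
    for name in filenames:
        path = Path(name)
        groups[path.stem][path.suffix.lstrip(".")] = name
    return dict(groups)


# Hypothetical file names for one Tachy episode and the Current EGM.
filenames = ["tachy_3.dat", "tachy_3.ann", "current.dat"]
groups = group_egm_files(filenames)
```

In a real run the list of names would come from the directory left by the EGM generation phase, for example via `Path(directory).iterdir()`, and each group would contribute its file names to the middleware as the graphical data for that episode.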
3.3 Interface design
Having a working Guidant prototype greatly facilitated the interface design for displaying Medtronic
data. However, adaptations were still necessary, since Medtronic data has a different presentation from
that of Guidant. Although the general linkage structure can be kept the same, the details may differ. Here
the design of the interface is described. Only by keeping the structure of the interface in mind can we
design a useful and fitting database with the appropriate table relations.
3.3.1 Present site structure for Guidant data
The present ICD archive site is designed to display ICD data from Guidant devices. However, the
general site structure can be taken to display data from other manufacturers, with modifications on
individual pages. The site is organized by a search on the top page, followed by the result displaying
page, patient page, ICD page, then finally the detailed log or EGM pages.
The search page
The top page for the site is the search page. There is a button that will retrieve and display all of the
patients in the database. Otherwise, two types of searches are possible: search for patients, and search for
episodes. The search criteria for each of them are presented below, with a screenshot shown in Figure 8.
The search for patient can be carried out via the following criteria:
1. Patient ID: alias given during de-identification.
2. Year of Birth: Before/On/After a certain year or Any.
3. Gender: Male, Female, or Any.
4. Implant Date: Before/On/After a certain year or Any.
5. ICD Manufacturer: a choice of all ICD manufacturers contained in the database. In this case, Guidant was the only choice.
6. ICD Model: a choice of all ICD models contained in the database.
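The patient search criteria above map naturally onto a parameterized SQL query in which any criterion left at "Any" is simply omitted. The following is a minimal sketch under stated assumptions: the table name, the column names, and the function name are hypothetical, and the `?` placeholder style is the one used by Python's sqlite3 module (a PostgreSQL driver may use a different placeholder syntax). The actual site was implemented with Perl CGI.

```python
# Translation of the Before/On/After choices into SQL comparison operators.
OPERATORS = {"Before": "<", "On": "=", "After": ">"}


def build_patient_query(criteria):
    """Build a parameterized query from the search form; "Any" criteria are skipped."""
    clauses, params = [], []
    # Exact-match criteria (hypothetical column names).
    for column in ("patient_alias", "gender", "manufacturer", "model"):
        value = criteria.get(column, "Any")
        if value != "Any":
            clauses.append(f"{column} = ?")
            params.append(value)
    # Year criteria are given as (relation, year), e.g. ("After", 1969).
    for column in ("birth_year", "implant_year"):
        choice = criteria.get(column, "Any")
        if choice != "Any":
            relation, year = choice
            clauses.append(f"{column} {OPERATORS[relation]} ?")
            params.append(year)
    sql = "SELECT patient_alias FROM patient"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_patient_query(
    {"gender": "Male", "birth_year": ("After", 1969), "model": "Any"}
)
```

Column names come from a fixed list inside the function and user-supplied values travel only as parameters, so the form input is never spliced into the SQL text. With every criterion left at "Any" the query has no WHERE clause, which corresponds to the button that lists all patients.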
The search for episodes can be carried out via the choices outlined below, in addition to all of the
criteria found in the search for patients:
1. Type of Episode: manufacturer-specific choices that will depend on the selection of ICD
manufacturer.
2. Therapy Type: manufacturer-specific choices that will depend on the selection of ICD
manufacturer.
3. Comment Search: Search for episodes marked by a certain comment. These comments are from a
fixed pool of pre-set comments that can be modified by authorized users. Multiple selections for
[Figure 8 screenshot: the search page, titled "Archive of EGM & Related Data from Implanted Cardioverter Defibrillators (ICDs)", with a "Search For Patients" form (Patient ID, Year of Birth, Gender, Implant Date, ICD Manufacturer, ICD Model), a "Search For Episodes" form (the patient search parameters plus episode Type and Therapy Type), and a button to view a list of all patients in the database.]
Figure 8. Current search page screen shot.
Other pages
After the search criteria are selected, users can click on the Search Patients or Search Episodes buttons,
and the results will be displayed in their respective result pages. Following are descriptions of each of the
pages within the site.
Patient Search Result Page. All of the records that meet the criteria are displayed along with some
basic information on the patients such as their device count, gender, birth year, and implant year. Links
are provided to link to the Patient Page. These pieces of information were chosen to be displayed to help
identify subject of interest and refine further search if necessary.
[Figure 9 screenshot: "Archive of the EGM & Related Data from Implanted (Pacemaker-Defibrillator) Devices", List of Subjects Meeting Your Search Criteria: subjects whose Gender is Male, Year of Birth is After 1969, Year of Implantation is After 2001, ICD Manufacturer is Guidant, ICD Model is 1860.
No.  Patient ID   Device  Gender  Birth Year  Implant Year
1    002_000136   1       Male    1975        2002
2    002_000417   2       Male    1976        2002]
Figure 9. Patient result page screen shot.
Episode Search Result Page. All of the records that meet the criteria are displayed along with some
basic information on the episode such as its type, number of attempts, therapy information, and the
existence of graphical data to be examined. Links are provided to link to the ICD page that contains this
episode information, the patient, and the particular episode page. These pieces of information were
chosen to be displayed to help identify subject of interest and refine further search if necessary, and the
indication of existence of graphical data will help researchers to decide in pursuing further to look at the
[Figure 10 screenshot: "List of Episodes Meeting Search Criteria" result page.]
Figure 10. Episode search result screen shot.
Patient Page. The very first table shows the subject ID, birth year and gender of the patient. Then
each ICD this patient has carried is presented in a table by itself with certain basic information and link to
the ICD page.
[Figure 11 screenshot: patient page for Subject ID 002_000417 (Birth Year: 1976, Gender: Male), followed by a record table for ICD Device 2 listing Device Manufacturer: Guidant, Device Model: 1860, Device Serial Number: 002000045, Number of Episodes: 1, Number of Banks: 2.]
Figure 11. Patient page screen shot.
ICD Page. Information about the device itself is displayed on this page. Specifically, on the left
hand side are the episode sections containing EGM data. On the right hand side are the bank sections