AlterEgo as a speech interface for ALS patients


AlterEgo as a Speech Interface for ALS Patients

by

Matthew Wu

Submitted to the Department of Electrical Engineering and Computer Science

in partial fulfillment of the requirements for the degree of

Master of Engineering in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2019

© Massachusetts Institute of Technology 2019. All rights reserved.

Author

Department of Electrical Engineering and Computer Science

May 23, 2019

Certified by

Pattie Maes

Director of Fluid Interfaces Research Group in MIT Media Lab

Thesis Supervisor

Accepted by

Katrina LaCurts

Chair, Master of Engineering Thesis Committee


AlterEgo as a Speech Interface for ALS Patients

by

Matthew Wu

Submitted to the Department of Electrical Engineering and Computer Science on May 23, 2019, in partial fulfillment of the requirements for the degree of

Master of Engineering in Electrical Engineering and Computer Science

Abstract

AlterEgo is a wearable, silent-speech device that is able to detect what word a user is attempting to say without discernible movement of the face or mouth. At its current stage, the device can handle a vocabulary of about 10 words, which is enough to create usable applications for patients with speech impediments or speech loss. After thorough testing across a series of conditions with multiple patients per condition, we have found ALS to be the best-suited condition for our system and a quick, urgent communication device to be its best application for this population. We present AlterEgo as a real-time, speech-assistive technology and have made modifications so that ALS patients can use the device effectively and independently.

Thesis Supervisor: Pattie Maes


Acknowledgments

I would first like to acknowledge my thesis supervisor, Pattie Maes, and project advisor, Arnav Kapur, for giving me the opportunity to work on this project and for their guidance throughout the process.

I would also like to acknowledge all the research hospitals and residential homes that have kindly worked with us to pilot test our work. In particular, I would like to thank Julie Greenberg, Ross Zafonte, Satra Ghosh, Jordan Green, and Kristina Simonyan for their advice as we explored the assistive technology space.

I would also like to express my gratitude towards Katie Seaver from the Leonard Florence Center and Alex Burnham from the Boston Home who connected us with their facilities and patients to help push our research forward.

Finally, I would like to thank my family and friends for their unconditional support, without which this work would not have been possible.


List of Figures

1-1 Steve Sailing from the Leonard Florence Home is pictured above during a pilot test of AlterEgo as an assistive technology.

2-1 A simple system or code can give speech impaired patients a quick way to respond to others with no technology at all.

2-2 Letterboards have been widely produced with physical and virtual versions to allow paralyzed patients to communicate their own thoughts.

2-3 Picture Boards give patients a chance to quickly respond or initiate conversation with their peers.

4-1 One of our primary users, pictured above, who was already adept at using communication technologies, as seen with his Eyegaze communication on the left screen.

5-1 This figure displays the placement of electrodes on the face. In our testing, we used the white and black electrodes for noise detection and the red, yellow, green, and purple ones for signal detection.

5-2 The signal produced is run through a convolutional neural network of the above architecture before resulting in word probabilities for each of the words in the initial given vocabulary.

5-3 The small white button had to be clicked each time a sample needed to be collected, which could not be done by the ALS user and had to be done by an assistant.


List of Tables

4.1 The ALSFRS scores of each of the patients we worked with at the Leonard Florence Center

4.2 We are able to compile the total ALSFRS scores as well as the component-relevant score for AlterEgo and validation scores for different training set to validation set size ratios

4.3 The component-relevant and validation accuracy scores for the MS patients that we tested with at the Boston Home


Chapter 1 Introduction

Developed in the Fluid Interfaces Group of the MIT Media Lab, the AlterEgo device [8, 7] is a personalized, wearable, silent-speech interface. It allows a user to provide arbitrary text input without any noticeable movement of the face or any audible noise, with almost instantaneous processing speed. It does this by reading electrical impulses from the surface of the skin in the lower face and neck that occur when a user is internally vocalizing words or phrases. The original purpose of the device is to augment human intelligence and make computing, the Internet, and machine intelligence a natural extension of the user’s own thinking. Applications in this scenario include controlling a remote or conducting a Google search without any discernible movement of the face or mouth.

At this time, AlterEgo is not a full speech application. However, it is able to detect a vocabulary of about 10 words with over 90% accuracy given 15 minutes of training for unconditioned patients, which we define as subjects who have no impairment of their speech systems. On the other hand, there are many people who have a lack or loss of speech as a result of a condition like ALS or stroke. Many of these patients are forced to use letter charts or systems like Eyegaze that require the user to spell words letter by letter in a slow and tedious manner. We hope to use AlterEgo to provide these patients with a method of communicating with an ease that was not previously possible. While these conditioned patients create weaker signals for the AlterEgo system to detect, there is potential to augment the device to give real-time speech on a limited vocabulary back to these people.

Figure 1-1: Steve Sailing from the Leonard Florence Home is pictured above during a pilot test of AlterEgo as an assistive technology.

My role on this project was the research and design of the AlterEgo device for conditioned patients. This included researching conditions for which AlterEgo has potential for success, meeting with experts and patients who could provide their perspective on the device’s most useful capabilities, setting up experiments to test the effectiveness of AlterEgo for various populations, and designing applications that would make our project an assistive technology for specific patients. Through our work, we have discovered that ALS is best suited for initial exploration of the system as an assistive device and have modified AlterEgo to be independently usable by ALS patients.

1.1 Thesis Overview

This thesis will proceed as follows: In Chapter 2, we will discuss the background work that has been done in the space of assistive technologies for speech devices. In Chapter 3, we investigate several conditions that cause speech impairment and their potential to benefit from the AlterEgo system. In Chapter 4, we outline the collaboration with other research centers to build a foundation for patients to test the device, as well as the actual testing and its results. In Chapter 5, we walk through the most effective applications of AlterEgo for conditioned patients and the modifications made to create an ALS-usable device. Finally, we conclude in Chapter 6.


Chapter 2 Background Work

There have been a few attempts at using assistive technologies to bring natural speech back to impaired patients [3, 6, 4, 5]. We mention several of these attempts below and why they fall short of the potential performance of the AlterEgo device. We can break these technologies into three main categories: yes/no/maybe, letter based, and symbol based systems.

2.1 Yes/No/Maybe Systems

For patients who retain some verbal communication or have movement of their head or arms, these systems are generally unnecessary. However, for patients who are paralyzed from the neck down, many low-technology systems are enough to let them communicate a simple ‘yes’, ‘no’, or ‘maybe’ to peers. The ‘maybe’ option is helpful in case the question needs clarification or is not applicable. For example, an eye-blink code in which one blink means ‘yes’, two blinks mean ‘no’, and three blinks mean ‘maybe’ allows a patient who has no speech to agree or disagree with any question that has been asked. Similarly, eye movements to the right, to the left, and in a roll, or gazing at these options on a board, can achieve similar results.

The downside of Yes/No/Maybe systems is the lack of variety their users can give in their responses and their inability to initiate conversations of their own. In order to give more autonomy to users, more complex systems have been created to allow for word production by paralyzed patients.

Figure 2-1: A simple system or code can give speech impaired patients a quick way to respond to others with no technology at all.

2.2 Letter Based Systems

The power of letter based technologies lies in their ability to produce any word in a language when the entire alphabet is available, albeit at a much slower pace than natural speech. One of the most widely used systems is the Letterboard, on which the letters of the alphabet are placed in a grid-like layout, as seen in Figure 2-2. Patients who are able to move their hands can point to the letters they would like to select in order to spell out words and sentences. Paralyzed patients can still use this system with the help of an assistant who moves a finger column by column and then row by row until the intended letter is selected.
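To make the cost of this assisted procedure concrete, the sketch below counts the pointing steps an assistant would take to spell a word. It is a minimal illustration only: the six-column alphabetical layout and the names `GRID`, `scan_steps`, and `total_steps` are assumptions, since real Letterboards vary in layout.

```python
import string

# Hypothetical six-column alphabetical board; real Letterboards vary in layout.
COLUMNS = 6
LETTERS = string.ascii_uppercase
GRID = [LETTERS[i:i + COLUMNS] for i in range(0, len(LETTERS), COLUMNS)]


def scan_steps(letter: str) -> tuple[int, int]:
    """Return (column_steps, row_steps) needed to reach a letter.

    The assistant first sweeps across the columns until the patient signals,
    then down the rows of that column until the patient signals again.
    """
    for row_index, row in enumerate(GRID):
        col_index = row.find(letter)
        if col_index != -1:
            return col_index + 1, row_index + 1
    raise ValueError(f"{letter!r} is not on the board")


def total_steps(word: str) -> int:
    """Total pointing steps needed to spell a word letter by letter."""
    return sum(sum(scan_steps(ch)) for ch in word.upper())


print(scan_steps("H"))      # prints (2, 2)
print(total_steps("help"))  # prints 25
```

Under this assumed layout, even a four-letter word takes 25 pointing steps, which illustrates why letter based communication is so much slower than natural speech.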

An assistive technology called Eyegaze [11], which controls a mouse by tracking a user’s eye, has also been used to aid letter based communication. With a virtual Letterboard or a simple keyboard on screen, paralyzed users can move their eyes and head to type out words. Autocomplete technology has also been used to accelerate this communication process. Another system called Dasher [12] has the user use a pointer to select the next letter from a series of options that appear on the right side of the screen in an arcade-game-like fashion. The system uses a probabilistic predictive model to anticipate the most likely next letter and makes the more probable ones more prominent on the screen. Patients who still have control of their hands and fingers can type the words they would like to speak and then use a voice output technology like SpeakIt or VoiceOver to emit the message to those around them [1].

Figure 2-2: Letterboards have been widely produced with physical and virtual versions to allow paralyzed patients to communicate their own thoughts.

While the aforementioned systems allow users to express any idea they would like, they are extremely slow in comparison to a word-by-word system. Creating words in this manner is a slow and tedious process that can tire out users and leave them working to express an idea long after they thought of it. Some symbol based systems have been developed to help improve response time for patients using this assistive technology.


Figure 2-3: Picture Boards give patients a chance to quickly respond or initiate conversation with their peers.

2.3 Symbol Based Systems

Symbol based systems have been used to give patients back quick, easy speech in an attempt to match regular conversation. Like the Letterboard for letter based systems, the Picture Board has been widely used both physically and virtually. As seen in Figure 2-3, the Picture Board lets users point, by finger, mouse, or Eyegaze, to the word or phrase they would like to convey. While symbol based systems can be organized by requests, like ‘I want’ or ‘I need’, by categories, like people or food, or alphabetically, the general process of quickly identifying a symbol to start a conversation or respond to a question remains the same.

Despite the speed with which symbol based systems can output words, it can take time to locate the word a user would like to say, and the system has a limited vocabulary. In addition, the way words are output in both letter and symbol based systems is very mechanical, with articles being selected rather than spoken. The AlterEgo device looks to solve both the issues of output speed and artificial speech by allowing users to fluidly input their best effort to recreate the word they intend to say and get real-time output from the system. We will now dive into the research that was undertaken to investigate the conditions, patients, and applications that could most benefit from the AlterEgo device as an assistive speech technology in a manner that previous technologies could not.


Chapter 3 Conditions Research

Over the course of the year, we conducted research on the conditions our device could benefit and met with a wide range of speech and medical experts to gain their perspectives.

3.1 Speech Impairment Types

We first see that we can segment the conditions AlterEgo can benefit into three broad categories.

3.1.1 Loss of Function

The first segment is for conditions that result in a loss of function of the speech system. Examples of this are patients who have issues with spasmodic air flow through their larynx, weakened mouth muscles, or mouth sores that make speaking painful. Conditions like spasmodic dysphonia, spinal cord injury, or oral cancer can create these issues. Spasmodic dysphonia is a condition where the larynx spasms when a patient attempts to speak but is completely fine during whispering or silent speech. Specific cases of spinal cord injury can result in the loss of function of certain mouth muscles. While the neurological signal may be intact, the ability to move the mouth to form particular words may be inhibited. Oral cancer can cause sores and lumps in the mouth, making swallowing, speaking, and breathing difficult. For each of these conditions, we hypothesize that AlterEgo will have the same success as it had with unconditioned patients since the neuromuscular signals emitted should still be trainable by our model.

3.1.2 Loss of Control

The second segment categorizes conditions where patients lose control of their mouth while speaking. In particular, these patients may make sudden outbursts or have random facial movements. Conditions including stroke, Tourette syndrome, or cerebral palsy can result in these symptoms. We are unsure how effective the AlterEgo device would be for these patients, as we would have to be able to screen out the noise created by outbursts that are not intended speech. If we are unable to do so, many false signals would fire, making training on the samples produced by these patients very difficult.

3.1.3 Neurological Disorders

The third and final segment is for patients that have neurological disorders. Conditions with this categorization include ALS, Parkinson’s, and some types of stroke. For these patients, a neurological pathway has been affected, preventing us from knowing the strength, consistency, and sharpness of their EMG signals during silent speech. We also are unsure of how well we can teach the usage of the device to patients with cognitive deterioration. Another question lies in the degenerative nature of many neurological diseases. If AlterEgo is effective for users at one time, there is no guarantee that it will have the same effectiveness as their disease progresses.

3.2 Expert & Patient Interviews

Throughout the year, we conducted six meetings with experts and patients of the speech disorder community to hear how they believed AlterEgo could be most beneficial. Each interview gave us a key takeaway that has guided our strategy and research.

3.2.1 Will & Jency

We first visited Will Mills, a patient at Spaulding Rehabilitation Hospital, and spoke with Jency, Will’s personal care assistant, as well as Jess, one of his respiratory therapists. We learned a great deal about Will as a potential patient and the capabilities of Spaulding as a whole. Will suffered from a brain tumor after college and took chemotherapy to fight the disease. While he is cancer free, the radiation of the treatment had devastating side effects that damaged the motor control of everything below his neck.

He uses a computer with a mouthpiece but is otherwise unable to move anything from the neck down. He also has trouble breathing and uses a breathing tube at all times. Both Jency and Jess believe that Will is a fantastic patient for AlterEgo. He has major speech issues due to the weaknesses in his oral muscles and need for a speaking valve, but his cognitive condition is fully intact.

3.2.2 Julie Greenberg

We met with Julie Greenberg from HST (Harvard-MIT Health Sciences and Technology), who has publications in speech perception and assistive technologies like ours. We discussed the applications of the technology, and she raised some interesting points regarding speech that we had not previously thought about. She also pointed us in the direction of a few other people who could be extremely helpful in our research.

Julie made a great point about the way that we select patients. Since speech is a circuit, AlterEgo would require this circuit to be working all the way up to the cranial nerve firings from the brain. Patients who have lesions breaking this circuit would be the least likely to find success using AlterEgo.


3.2.3 Ross Zafonte

We met with Dr. Ross Zafonte to hear his thoughts on the AlterEgo project, which patients it could be used with, and what a partnership between Spaulding and our group would look like. Dr. Zafonte believed that the most relevant patients in Spaulding would be ALS, SCI, and stroke patients. He saw the initial testing to be a proof of concept that the device can work as well on conditioned patients as it does with unconditioned patients.

He planned to connect us with a clinical co-investigator from Spaulding who would become our main point of contact. The population of patients at Spaulding means a relationship with them would be extremely useful. His hope is that we can eventually do a pilot study with Spaulding with five patients from three conditions of interest over a few months to determine if their experience with AlterEgo can exceed that of their current speech alternatives.

3.2.4 Satra Ghosh

We held a helpful meeting with Satra Ghosh from MIT’s Brain & Cognitive Sciences Department (BCS). He had great insight into how we could accelerate the testing process and gave suggestions of which populations we should begin testing with. While Satra could connect us with BCS Testing, his main concern was that 99% of BCS subjects are unconditioned patients.

Satra felt strongly that we should start with patients who have simply lost some control of the larynx, windpipe, or mouth, since this population would yield the least amount of noise in testing data. He noted that brain conditions are much harder to reliably test and that we should start with mechanical conditions before tackling neurological ones. He helped us set up meetings with Mass Eye and Ear, MGH Institutes, and MGH’s Voice Surgery Clinic to give us patient populations to test with down the road.


3.2.5 Jordan Green

We also had a very productive meeting with Jordan Green. He is someone who is very familiar with sound/speech interfaces and has been studying speech for 20 years now, especially in ALS and stroke patients. He made it clear that clinical testing at the MGH Institute would be doable. The biggest takeaway from our conversation was that a partnership between our project and MGH Institutes is definitely a possibility. Jordan had some insightful suggestions. He advised that we test not only a variety of diseases but varying levels of severity within each disease. In order to quantify this, he mentioned that we should partner with a speech pathologist who could specialize in characterizing each patient’s speech condition. He also noted that MGH Institutes has usability forms that we could also use if we would like to evaluate the ease of use of AlterEgo from the patient’s perspective.

3.2.6 Kristina Simonyan Group

I met with a group of nine graduate students and research scientists at Mass Eye & Ear. They specialize in research on spasmodic dysphonia (SD), a condition in which the larynx spasms during speech. It makes patients’ voices quiet and hard to understand, but patients with the condition can still whisper and mouth words without any spasms and maintain full cognitive ability.

User testing is definitely a possibility at Mass Eye & Ear as they see 2-3 patients a week and run their own experiments to develop their research. Since SD patients only have trouble speaking out loud but have full movement of their hands and minds, communication is the key space in which AlterEgo could benefit them.


Chapter 4 Patient Testing

4.1 Patient Homes

Before we created AlterEgo applications to benefit patients, we first needed to prove that the device is usable for their conditions. We met with leaders of research hospitals and discussed the potential for user testing of our device with their patients. While we have been in contact with these more professional institutions, we also worked with a few residential homes to conduct our research, where our IRB approval from MIT COUHES gave us permission to run pilot testing. For residential homes, we worked with the Leonard Florence Center, the Boston Home, and Easter Seals. For research hospitals, our four contacts were at Spaulding Rehabilitation Hospital, Massachusetts Eye & Ear, MGH Institute of Health Professions, and MGH Center for Laryngeal Surgery & Voice Rehabilitation.

4.1.1 Residential Homes

Due to the incredibly tedious IRB clearance process for clinical trials, the more professional hospitals and research centers took months to grant permission for testing. Testing at residential homes was easier, as there were fewer barriers to entry than there would be in hospital or clinical settings. In fact, each of the hospitals and research centers listed required follow-up IRBs that were not approved until May, when our testing was already almost complete. As a result, we had to conduct all of our patient testing through these residential homes, with ALS testing being done at the Leonard Florence Home and MS testing being done at the Boston Home.

At the Leonard Florence Center, we worked with Steve Sailing, an ALS patient who founded the home, as well as Katie Seaver, who helped us coordinate each of our visits and select the patients we would work with. With 14 ALS patients living on two floors, it was very convenient for us to conduct testing at the home. The home also houses about 15 multiple sclerosis (MS) patients, but almost all of them maintain the ability to speak, so they were not the population we wanted to target at the time. After an initial pilot round of testing, we were able to return to Leonard Florence multiple times to try new iterations of our device on the patients who had seen the most success with AlterEgo.

At the Boston Home, we were in contact with Alex Burnham, one of their speech language pathologists who had an excellent feel for his residents and determined who would most likely be able to benefit from our work. The Home is a residential facility for adults with advanced multiple sclerosis and since speech loss occurs much later in MS, this subgroup was a perfect one for our research to target. The Boston Home has approximately 80 residents with MS which made it possible for us to find good candidates for pilot testing.

At Easter Seals, we have a connection with Kristi Peak-Oliveira, a speech language pathologist there specializing in Alternative and Augmentative Communication. While Easter Seals does not house one specific condition, it does have a variety of clients with a variety of diagnoses and communication deficits, including stroke, ALS, and other conditions acquired at birth. This could be an opportunity to trial our technology with many different types of users.

4.1.2 Hospitals & Research Centers

At Spaulding Rehabilitation Hospital, we have been in contact with Dr. Ross Zafonte, the Senior Vice President of Medical Affairs Research and Education. We are working on determining a collaborative research partner from Spaulding who would help us conduct these user tests. From there, Spaulding has a range of ALS and stroke patients to run user tests with. Since the majority of Spaulding patients reside in Spaulding, we would be able to go to the ALS and stroke floors and use a room to run testing over the course of a few days. We are still working with Spaulding to get an IRB approved to be able to test in their space.

At Massachusetts Eye & Ear, we are in contact with both Phil Song, their Laryngology Division Chief, and Kristina Simonyan, the head of their spasmodic dysphonia research division. A meeting with their entire spasmodic dysphonia team showed me that research testing with their patients would most definitely be possible. They meet with 2-3 patients a week from all over the United States, and we would simply add our user test to the experiments their lab already conducts. Phil has greater access to patients at Mass Eye & Ear and mentioned that we could also conduct testing with their vocal surgery patients as well as other populations that may benefit. We acquired permission to conduct testing at Mass Eye & Ear in May 2019.

At MGH Institute, we are in contact with Jordan Green, an Associate Provost for Research. After an insightful phone call, I learned that MGH Institute does user testing with a variety of assistive technologies that detect speech by analyzing mechanical patterns of movement. He believed that our process of detecting EMG signals was a minimal adjustment to make. MGH Institute has a variety of patients including ALS. We would be able to conduct user testing and gain perspective on applications of assistive technologies from MGH Institute. We are still working with MGH Institute to get an IRB approved to be able to test in their space.

At MGH Center for Laryngeal Surgery and Voice Rehabilitation, we have reached out to James Heaton and Geoff Meltzner to discuss a partnership for conducting user testing with their Laryngectomy patients.

4.2 Pilot Testing Process

We designed a user testing process for the AlterEgo device. For testing in the residential homes across different patient groups, we needed a standardized system to best compare how each condition did against the others. After we determined which specific patient groups were able to generate a usable and consistent signal for the AlterEgo system, we worked through more rigorous experiments to find out which patients would be able to use its applications most effectively.

For each patient we tested, the goals were to get the patient comfortable enough with AlterEgo to use its silent speech feature, compute the accuracy with which they could train the device, and determine how well the device would work for that population of conditioned patients.

We would then go through a six-step testing strategy. 1) We would first have the patient sign the IRB consent form provided by MIT COUHES. 2) We would then walk through a questionnaire to get basic information for each patient, with questions ranging from the type and duration of their condition to specifics around the range of movement of their mouth and tongue. 3) Next, we would set up the patient with the AlterEgo device, attaching the gel electrodes to their face and loading the interface that would allow them to observe the signal strength and shape of the samples they were creating. 4) We would then coach the patient to properly use the device, having them first train by speaking words out loud, then speaking to themselves, then with limited lip movement, and then with silent speech. 5) We would then have the subject create a training set of 5 words with 15 samples per word and run that data through our neural network to create a predictive model. 6) After training, we would load a real-time demo that would mimic what the functioning system would be. The subject would be asked to say a given word, and after the system generated a list of probabilities for which word was the intended one, the computer would output the word with the highest probability, which would ideally be the original word that was said.
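Steps 5 and 6 can be sketched in a few lines. This is an illustration under stated assumptions rather than the AlterEgo implementation: the five words in `VOCABULARY` are hypothetical (the words actually used are not listed here), the probabilities in `trials` are made up, and `predict_word` and `accuracy` are names introduced only for this example.

```python
# Hypothetical five-word vocabulary; the words actually used are not listed.
VOCABULARY = ["yes", "no", "help", "water", "pain"]
SAMPLES_PER_WORD = 15


def predict_word(probabilities):
    """Return the vocabulary word with the highest predicted probability."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return VOCABULARY[best_index]


def accuracy(trials):
    """Fraction of trials where the predicted word matches the intended word.

    Each trial is a pair (intended_word, probabilities_from_model).
    """
    correct = sum(predict_word(probs) == word for word, probs in trials)
    return correct / len(trials)


# Size of the training set collected in step 5: 5 words x 15 samples.
print(len(VOCABULARY) * SAMPLES_PER_WORD)  # prints 75

# Made-up model outputs for three demo trials (not real patient data).
trials = [
    ("help", [0.05, 0.10, 0.70, 0.10, 0.05]),
    ("yes", [0.60, 0.20, 0.10, 0.05, 0.05]),
    ("no", [0.40, 0.35, 0.10, 0.10, 0.05]),  # highest probability is "yes"
]
print(predict_word(trials[0][1]))  # prints "help"
print(accuracy(trials))
```

Here two of the three made-up trials are classified correctly, so the accuracy is two thirds; the third trial shows how a word is missed when the wrong entry has the highest probability.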


4.3 Condition Selection

Following our conversations with various experts and given the condition populations we had available, we decided that we would attempt to conduct user tests for three conditions of interest: ALS, MS, and stroke. Each of these populations had patients who had lost the ability to speak due to their condition and AlterEgo would have the potential to give a portion of that vital ability back. These were also conditions with a wide number of patients available which would allow us to select those best suited to use the device and in the case that a solution was found, would have a greater impact on that entire condition.

4.3.1 ALS

The pilot testing of ALS at Leonard Florence gave us key takeaways and valuable data regarding the potential of AlterEgo for ALS patients. In order to properly assess the phenotypes of the patients we were working with, we used the ALS Functional Rating Scale [10], or ALSFRS, which provides an estimate of a patient’s degree of functional impairment across categories ranging from speech and salivation to dyspnea¹ and orthopnea² and could thus predict progression of the ALS disease. The ALSFRS scores of the five patients we worked with can be seen in Table 4.1, where 4 is the least progression of the disease and 0 is the most progression, with specific score definitions coming from the ALS C.A.R.E. Program³.

After running through the pilot testing protocol we outlined in section 4.2, we were able to calculate validation scores and look for a correlation between the scores and the ALSFRS scores. We also felt that some of the functions measured by the ALSFRS score (handwriting, cutting food, dressing/hygiene, turning in bed, walking, climbing stairs) were not applicable, and so we also calculated a component-relevant ALSFRS score composed of the other factors that would affect AlterEgo users.
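The component-relevant score is simply the ALSFRS total with the non-applicable functions left out. The following sketch recomputes both scores from the item values reported in Table 4.1; the names `ALSFRS`, `NOT_APPLICABLE`, and `column_sums` are introduced only for this illustration.

```python
# ALSFRS item scores for Patients 1-5, copied from Table 4.1.
ALSFRS = {
    "Speech": [1, 0, 0, 0, 0],
    "Salivation": [4, 0, 0, 2, 0],
    "Swallowing": [3, 0, 2, 2, 0],
    "Handwriting": [2, 3, 1, 0, 0],
    "Cutting Food": [1, 0, 1, 0, 0],
    "Dressing/Hygiene": [2, 3, 0, 0, 0],
    "Turning in Bed": [3, 3, 0, 0, 0],
    "Walking": [3, 3, 0, 0, 0],
    "Climbing Stairs": [0, 0, 0, 0, 0],
    "Dyspnea": [3, 4, 4, 1, 4],
    "Orthopnea": [2, 2, 2, 1, 1],
    "Respiratory Insufficiency": [3, 3, 4, 4, 4],
}

# Functions excluded from the component-relevant score, as listed in the text.
NOT_APPLICABLE = {
    "Handwriting", "Cutting Food", "Dressing/Hygiene",
    "Turning in Bed", "Walking", "Climbing Stairs",
}


def column_sums(function_names):
    """Sum the chosen functions' scores separately for each patient."""
    rows = [ALSFRS[name] for name in function_names]
    return [sum(scores) for scores in zip(*rows)]


total_scores = column_sums(ALSFRS)
relevant_scores = column_sums([n for n in ALSFRS if n not in NOT_APPLICABLE])
print(total_scores)     # prints [27, 21, 14, 10, 9]
print(relevant_scores)  # prints [16, 9, 12, 10, 9]
```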

¹ Shortness of breath

² Shortness of breath when lying down

Function                   Patient 1  Patient 2  Patient 3  Patient 4  Patient 5
Speech                     1          0          0          0          0
Salivation                 4          0          0          2          0
Swallowing                 3          0          2          2          0
Handwriting                2          3          1          0          0
Cutting Food               1          0          1          0          0
Dressing/Hygiene           2          3          0          0          0
Turning in Bed             3          3          0          0          0
Walking                    3          3          0          0          0
Climbing Stairs            0          0          0          0          0
Dyspnea                    3          4          4          1          4
Orthopnea                  2          2          2          1          1
Respiratory Insufficiency  3          3          4          4          4
ALSFRS Score               27         21         14         10         9

Table 4.1: The ALSFRS scores of each of the patients we worked with at the Leonard Florence Center

                           Patient 1  Patient 2  Patient 3  Patient 4  Patient 5
ALSFRS Score               27         21         14         10         9
Component-Relevant         16         9          12         10         9
Validation 20%             100%       66%        66%        50%        66%
Validation 30%             89%        63%        78%        55%        78%
Validation 50%             79%        62%        79%        54%        69%

Table 4.2: We are able to compile the total ALSFRS scores as well as the component-relevant score for AlterEgo and validation scores for different training set to validation set size ratios

We have three types of validation scores, with 20%, 30%, and 50% of our training data being used as a validation set. All of these findings are highlighted in Table 4.2. From this table, we find a 0.71 correlation between the component-relevant score and validation 20% and a 0.51 correlation between the component-relevant score and validation 30% and 50%. This suggests a relationship between ALS patients being able to speak, salivate, swallow, and breathe easily and being able to use the AlterEgo device well.
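For readers who wish to examine this relationship themselves, the sketch below computes a correlation coefficient from the values in Table 4.2. The correlation measure used for the figures quoted above is not specified, so the Pearson coefficient here is an assumption and its output need not match those figures; the function name `pearson_r` is introduced only for this example.

```python
import math

# Component-relevant scores and validation accuracies (%) from Table 4.2.
component_relevant = [16, 9, 12, 10, 9]
validation = {
    "20%": [100, 66, 66, 50, 66],
    "30%": [89, 63, 78, 55, 78],
    "50%": [79, 62, 79, 54, 69],
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumed measure: the text does not say which coefficient was used.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


for split, scores in validation.items():
    print(split, round(pearson_r(component_relevant, scores), 2))
```

With only five patients, any such coefficient is sensitive to a single data point, so it is best read as a rough indication rather than a statistically robust estimate.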

Through the ALS testing, we compiled validation scores averaging 70%, with many scores in the high seventies and some as high as 100%, demonstrating the potential for AlterEgo to be viable for this population. We also saw accuracies this high for patients with ALSFRS scores ranging from 9 to 27, indicating that the


Function                    Patient 1   Patient 2   Patient 3
Speech                          1           0           1
Salivation                      3           2           1
Swallowing                      2           1           1
Dyspnea                         4           3           1
Orthopnea                       2           2           1
Respiratory Insufficiency       4           3           4
Component-Relevant Score       16          11           9
Validation 20%                 33%         50%         83%
Validation 30%                 33%         45%         78%
Validation 50%                 31%         54%         79%

Table 4.3: The component-relevant and validation accuracy scores for the MS patients that we tested with at the Boston Home.

technology could be viable for patients for a long time as they progress through the ALS disease.

4.3.2 Multiple Sclerosis

We also ran successful pilot testing with MS patients at the Boston Home. We worked with three patients and gathered their scores on the relevant components of the ALSFRS scoring system, as we felt these were appropriate for determining the severity of other neurological diseases as well. The component-relevant scores for our MS patients can be found in table 4.3, along with their validation scores for 20%, 30%, and 50% splits of our training data.

With MS, these validation scores dropped to a range of 31% to 83% with an average of 54%. However, while we had previously seen a correlation between component-relevant scores and validation accuracies, we see here that the opposite is true. The explanation for this is the twitching and mental impairments that were present in patients 1 and 2 and that were not captured by any component-relevant function. Patient 1 would twitch heavily, affecting the consistency of her signals and thus causing major noise in the data. Patient 2 had cognitive delays and had trouble processing the instructions we gave, at times saying the incorrect word during testing. This led to mislabeled samples and greatly affected the system’s ability to train on the data set


well. So while the component-relevant score was helpful with ALS patients who had intact cognition and no physical twitches, it was skewed if either or both of those factors were present.

4.3.3 Stroke

After the MS pilot testing, we spoke with a few experts from Spaulding Rehabilitation Hospital and they suggested that we skip the testing of Stroke patients and focus instead on ALS, where there was already promise. The key reason for this was that there are three variations of stroke, all of which would hinder AlterEgo’s ability to perform well at this stage.

The first type of stroke, dysarthria, results in a loss of muscle control, and this would cause twitching, which we already saw would greatly affect the system due to its sensitivity. The next type, receptive dysphasia, affects comprehension, and this would prevent patients from understanding how to use this higher-tech device. And finally, there is expressive dysphasia, which makes it hard to put words together to make meaning, so even if the patient could use the AlterEgo system, the words they would communicate would not make sense. While using AlterEgo as an assistive technology should certainly be explored for stroke in the future, it is not the most promising condition to work with at the moment.

4.3.4 Condition Decision

After seeing the potential of AlterEgo for each of these three conditions, we concluded that ALS would be the one to specialize for. Not only did it have the highest validation accuracies across component-relevant scores, but the nature of the disease also leaves cognition intact and does not have twitching as a symptom. In fact, the weakening of muscles from ALS actually decreases movement in patients, which is better suited to our system. For these reasons, we found ALS best suited for initial exploration of AlterEgo as an assistive device [2].


Figure 4-1: One of our primary users, pictured above, who was already adept at using communication technologies, as seen with his Eyegaze communication on the left screen.

appropriate for and most naturally adept at using AlterEgo. Since the screen gives visual feedback to the user, we noticed that tech-savvy users had an easier time acclimating to the device. In addition, some patients with a more advanced form of ALS had more limited movement, and this led to more consistent signals and cleaner data. We were able to settle on two ALS patients from the Leonard Florence Center as the best candidates to run extensive testing of AlterEgo with.

4.4 User Feedback

Throughout the testing sessions, we received feedback from users on how well the testing went.

In terms of ease of use, many users found the device fairly easy to use, with only one struggling to create readable signals. In addition, they felt that the sensitivity of the device properly reflected the movement they were making. All of our users also indicated that they did not feel tired following their 30-minute session of testing.

We also received feedback regarding the vocabulary words they would want to communicate quickly and easily. One subject mentioned that people rely on their speech for quick comments related to social graces or basic needs. This included words like "Hello", "Thank you", "Good Bye", "Help me", "Get up", "Bathroom"


and other positioning and hygiene related comments.

When asked what the largest annoyance with their current systems was, their greatest complaint was access and speed of speech. A quote from one patient summed up this feeling very well: "No matter what, typing a message is always slower than natural speech and in many of my experiences, people walked away before I was able to finish my thought."


Chapter 5 Application Selection & Creation

5.1 Potential Applications

For each of the conditions we worked with, we brainstormed potential applications to fit each population. As mentioned earlier, AlterEgo has the ability to distinguish between about 10 words with over 90% accuracy when given a 15-minute training period. Under these circumstances, we have a few applications that could benefit the right AlterEgo users [9].

5.2 Communication

Communication is a huge part of society and for patients who have restricted speech abilities, AlterEgo can have a huge impact. For patients who have trouble projecting, a speaker could be combined with AlterEgo to better amplify the words they intend to say.

There are also patients who have expressive aphasia and know what they want to say but cannot form the right words. For these patients, we can use bone conduction headphones so that they can hear the word they are trying to say. This would aid them in producing the word they want, as the auditory feedback of the word can trigger the movement of the correct mouth muscles.


of them use Eyegaze, a device that tracks the movement of their eyes over specific letters to form words. However, research has shown that this device can be slow and tedious, causing users to prefer not to speak. We could create a speech application that uses AlterEgo to form words through letters. This method would be faster and easier than Eyegaze and give those locked-in patients a much more positive experience communicating.

5.2.1 Memory

Patients who have lapses in memory could also use AlterEgo to help remember things. By storing recordings of thoughts, we can build an application to help these users store and recite reminders at any time. We could also connect AlterEgo to an application like Apple Reminders or Google Calendar to remind them of upcoming events or tasks. If there is a population that can use AlterEgo and faces this problem, we can surely create an application to help.

5.2.2 Location

We have also discussed using location to narrow the vocabulary of 10 words AlterEgo can confidently use. For example, if a user is at the airport, their vocabulary would narrow to words like "security" or "check-in", whereas if they were at a restaurant, they would have words like "order" or "check". Using location to give AlterEgo a context for communication would greatly increase its effectiveness and is an application we would surely build if the need exists.
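One way to picture the location idea is a lookup from a context label to the small vocabulary that is active there. The following is a hypothetical sketch only: the context labels, the fallback list, and the function name are our own assumptions, and no such application was built for this thesis.

```python
# Upper bound on the active vocabulary, matching the roughly 10 words
# the text says AlterEgo can distinguish.
MAX_VOCAB = 10

# Hypothetical context-to-vocabulary table using the examples from the text.
CONTEXT_VOCAB = {
    "airport": ["security", "check-in"],
    "restaurant": ["order", "check"],
}

# Assumed fallback used when the location is unknown.
DEFAULT_VOCAB = ["hello", "thank you", "help me"]


def active_vocabulary(context):
    """Return the word list to load for a given location context."""
    words = CONTEXT_VOCAB.get(context, DEFAULT_VOCAB)
    return words[:MAX_VOCAB]


if __name__ == "__main__":
    print(active_vocabulary("airport"))
    print(active_vocabulary("unknown place"))
```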

5.2.3 Application Choice for ALS

After working extensively with ALS patients, we found that the thing they needed most was the ability to communicate basic statements or urgent needs quickly. With their cognition being strongly intact, the use of a memory system would not be very valuable. In addition, the location based application would be less useful given that most patients spend almost the entirety of their day in the ALS home where there


is little location based context to be used. As a result, the use of AlterEgo as an efficient, real-time communication tool was clearly the best choice.

5.3 AlterEgo Device Overview

AlterEgo is a wearable silent speech interface that enables a discreet, seamless and bi-directional communication with a computing device in natural language without discernible movements or voice input.

The production of acoustic speech involves a series of intricate and coordinated events and is considered one of the most complex motor actions humans perform. An expression conceived in the brain is encoded as a linguistic instance mediated by areas in the brain and the supplementary motor area. This is then mapped into muscular movements for vocal articulation. In particular, there are projections fired to the face, laryngeal cavity, pharynx and the oral cavity. The propagation of a nerve impulse through a neuromuscular junction causes the neurotransmitter acetylcholine to be released into the synapse which triggers an action potential in the muscle fiber. This ionic movement, caused by muscle fiber resistance, generates potential difference patterns that occur in the facial and neck muscles while intending to speak. This signature can be captured from the surface of the skin in the absence of acoustic vocalization and it is the signal that the AlterEgo device looks to process.

Signals are captured through a series of six electrodes attached to the face and neck. The signals are digitally rectified, normalized to a range of 0 to 1 and concatenated as integer streams. The streams are sent to a mobile computational device, which subsequently sends the data to the server hosting the recognition model to classify silent words. Once the signal gets here, it is processed through a neural network to determine the word probabilities of the input. The specific placements of electrodes can be seen in figure 5-1. While we were using up to 8 detection electrodes, we have reduced to the 4 most important ones (red, yellow, green, purple as well as white, black for noise normalization) for convenience while maintaining relatively similar levels of performance.
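The rectify, normalize, and concatenate steps can be sketched as follows. This is an illustration under stated assumptions rather than the device's actual pipeline: rectification is taken to be an absolute value, normalization is assumed to be per-channel min-max scaling, and the final integer encoding is not reproduced because the text does not describe it.

```python
def rectify(samples):
    """Full-wave rectification: take the absolute value of each reading."""
    return [abs(s) for s in samples]


def normalize(samples):
    """Min-max scale one channel to the range 0 to 1 (assumed scheme)."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # A flat channel carries no information; map it to zeros.
        return [0.0 for _ in samples]
    return [(s - lo) / (hi - lo) for s in samples]


def preprocess(channels):
    """Rectify and normalize each electrode channel, then concatenate.

    `channels` is a list of per-electrode reading lists.
    """
    stream = []
    for channel in channels:
        stream.extend(normalize(rectify(channel)))
    return stream


if __name__ == "__main__":
    # Two toy channels standing in for electrode readings.
    print(preprocess([[-2, 0, 2], [1, -3]]))
```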


Figure 5-1: This figure displays the placement of electrodes on the face. In our testing, we used the white and black electrodes for noise detection and the red, yellow, green, and purple ones for signal detection.

The net contains a hidden layer that convolves 400 filters of kernel size 3 with stride 1 with the processed input. This is subsequently followed by a max pooling layer. This unit is repeated twice before globally max pooling over its input. This is followed by a fully connected layer of dimension 200 passed through a rectifier nonlinearity which is followed by another fully connected layer with a sigmoid activation. The network was optimized using a first order gradient descent and regularized using a 50% dropout in each hidden layer to enable the network to generalize better on unseen data. Following training, silent words that are spoken are predicted by our model and outputted through computer speakers. A diagram of the described neural network is shown in figure 5-2.
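To make the layer sequence concrete, the sketch below traces output shapes and trainable parameter counts through the architecture as we read it. Several details are assumptions that the text does not fix: unpadded ("valid") convolutions, a pooling window of 2, two convolution and pooling units in total, and the example input length, channel count, and vocabulary size. Dropout is omitted because it changes neither shapes nor parameter counts.

```python
def conv1d_len(length, kernel=3, stride=1):
    """Output length of an unpadded 1-D convolution."""
    return (length - kernel) // stride + 1


def pool_len(length, pool=2):
    """Output length of non-overlapping max pooling."""
    return length // pool


def trace_network(input_len, channels, vocab_size,
                  filters=400, kernel=3, pool=2, units=2, dense=200):
    """Return a list of (layer name, output shape, parameter count)."""
    layers = []
    length, depth = input_len, channels
    for i in range(units):
        # Convolution weights (kernel x input depth x filters) plus biases.
        params = kernel * depth * filters + filters
        length = conv1d_len(length, kernel)
        layers.append((f"conv{i + 1}", (length, filters), params))
        depth = filters
        length = pool_len(length, pool)
        layers.append((f"maxpool{i + 1}", (length, filters), 0))
    # Global max pooling keeps one value per filter.
    layers.append(("global_maxpool", (filters,), 0))
    layers.append(("dense_relu", (dense,), filters * dense + dense))
    layers.append(("dense_sigmoid", (vocab_size,),
                   dense * vocab_size + vocab_size))
    return layers


if __name__ == "__main__":
    # Hypothetical input: 100 time steps, 4 signal channels, 10 words.
    for name, shape, params in trace_network(100, 4, 10):
        print(f"{name:15s} {str(shape):12s} {params}")
```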

The system outputs the word probabilities and, in our speech application for conditioned patients, would announce the word with the highest probability through the computer speakers.

5.4 Making AlterEgo an ALS-Usable Device

In its current state, the AlterEgo project was not ready to be used by ALS patients as an independent speech system. Throughout our pilot testing, we diagnosed areas of


Figure 5-2: The signal produced is run through a convolutional neural net of the above architecture before resulting in word probabilities for each of the words in the initial given vocabulary.

improvement for the device and implemented solutions to allow for improved signal detection, independent usage, and sample consistency. These changes focused on bringing the system to a deployable level for ALS patients.

5.4.1 Signal Detection

First, alterations were made to the software. We quickly noticed that the vocal output of ALS patients was lower than that of unconditioned patients. In addition, some patients had head and mouth twitches as well as a need to swallow, and these interruptions greatly threw off the model if they occurred during a training sample.

To solve this, the production signal was enhanced, and the placement of electrodes on the face of the user needed to be extremely precise in order to ensure that their speech could be detected. We also made the signal production feedback flexible for the specific user. As each testing session progressed, the front end would adapt the size of the plot axis based on how strong a signal it was receiving.

To handle twitches and swallowing during data collection, a similar signal strength function was used. By detecting whether the maximum signal strength within a sample was far higher than in past samples, we eliminated the outlier signals from training data sets during the session.
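The outlier check described above might look roughly like the following. This is a minimal sketch, not the implemented function: the ratio of 3, the minimum history of three samples, and the use of the mean of past peaks as the baseline are all assumptions, since the text only says the peak had to be "far higher" than in past samples.

```python
def peak(sample):
    """Maximum absolute signal strength within one sample."""
    return max(abs(v) for v in sample)


def is_outlier(sample, past_peaks, ratio=3.0, min_history=3):
    """Flag a sample whose peak far exceeds the average of past peaks."""
    if len(past_peaks) < min_history:
        # Not enough history yet to judge; accept the sample.
        return False
    baseline = sum(past_peaks) / len(past_peaks)
    return peak(sample) > ratio * baseline


def filter_session(samples, ratio=3.0, min_history=3):
    """Drop outlier samples (for example twitches) from a session."""
    kept, past_peaks = [], []
    for sample in samples:
        if is_outlier(sample, past_peaks, ratio, min_history):
            continue
        kept.append(sample)
        # Only accepted samples contribute to the running baseline.
        past_peaks.append(peak(sample))
    return kept


if __name__ == "__main__":
    session = [[0.1, -0.2], [0.2, 0.1], [-0.15, 0.2], [0.1, 2.5], [0.2, -0.1]]
    print(filter_session(session))
```

Excluding rejected samples from the baseline keeps a single twitch from raising the threshold for the samples that follow.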


Figure 5-3: The small white button had to be clicked each time a sample needed to be collected which could not be done by the ALS user and had to be done by an assistant.

5.4.2 Independent Button Press

Normally, all of the data samples were collected via a button that is pressed to indicate the start of a word being said and released when the word ends. This button sits on the OpenBCI board itself, so an assistant would need to press it for each sample during testing. However, it is hard for the assistant to tell when subjects have started and ended each word when their movements are so subtle, and these timing miscues during sample capture made it harder for the model to train properly.

To combat this, mouse click detection was implemented to replace the trigger, allowing ALS patients to use a wireless mouse to collect data samples independently. With this solution, patients could wait until they were ready to say the next word to start data collection, and this made samples much more accurate when starting and ending words. We also found that patients have a wide range of sample collection rhythms. Some would want to say words quickly back to back while others would give longer pauses, so this solution also gave autonomy to the user in the data training process.
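The click-driven capture can be sketched as a small recorder that buffers readings only between a press and a release. This is an illustrative sketch: the class and method names are hypothetical, it assumes the mouse keeps the press-and-release semantics of the original board button, and wiring the handlers to a real mouse listener and signal source is left out.

```python
class ClickTriggeredRecorder:
    """Collect one labeled sample per mouse press and release."""

    def __init__(self):
        self.recording = False
        self._buffer = []
        self.samples = []  # list of (label, readings) tuples

    def on_mouse_down(self):
        """User signals the start of a word."""
        self.recording = True
        self._buffer = []

    def on_reading(self, value):
        """Called for every incoming signal reading."""
        if self.recording:
            self._buffer.append(value)

    def on_mouse_up(self, label):
        """User signals the end of a word; store the labeled sample."""
        if not self.recording:
            return None
        self.recording = False
        sample = (label, self._buffer)
        self.samples.append(sample)
        self._buffer = []
        return sample


if __name__ == "__main__":
    recorder = ClickTriggeredRecorder()
    recorder.on_reading(0.1)        # ignored: not recording yet
    recorder.on_mouse_down()
    recorder.on_reading(0.2)
    recorder.on_reading(0.3)
    print(recorder.on_mouse_up("hello"))
```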


5.4.3 Sample Consistency

It is also difficult for ALS patients to tell how consistently they are collecting each data sample. For example, they cannot tell if they are saying the word "bathroom" on the fifth time as consistently as they did the first four times.

To give better feedback to the user, a check of sample consistency was implemented where after at least five previous samples of a word had been seen, the system would check for the similarity between the new sample and its past examples. If the average difference for the newest sample was above a threshold, the user would be given feedback that the last sample consistency was "bad" while otherwise being told the consistency was "good". This allowed users to best maintain similar data production throughout the training process.
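The consistency check can be sketched as below. The text does not specify the difference measure or the threshold, so this sketch assumes a mean absolute difference over the overlapping portion of two samples and an arbitrary threshold of 0.2; the function names are our own.

```python
# Number of earlier samples of a word required before feedback is given.
MIN_HISTORY = 5


def mean_abs_diff(a, b):
    """Mean absolute difference over the overlapping part of two samples."""
    n = min(len(a), len(b))
    return sum(abs(x - y) for x, y in zip(a, b)) / n


def consistency_feedback(new_sample, past_samples, threshold=0.2):
    """Return "good" or "bad", or None while history is too short."""
    if len(past_samples) < MIN_HISTORY:
        return None
    average = sum(mean_abs_diff(new_sample, past)
                  for past in past_samples) / len(past_samples)
    return "bad" if average > threshold else "good"


if __name__ == "__main__":
    history = [[0.5, 0.5, 0.5] for _ in range(5)]
    print(consistency_feedback([0.5, 0.5, 0.5], history))
    print(consistency_feedback([1.0, 1.0, 1.0], history))
```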


Chapter 6 Conclusion

6.1 Future Work

Great strides have been made to make AlterEgo a usable technology for ALS patients but there is future work to be done for it to become a device used by these ALS patients in a day to day scenario. There are a few challenges left to be solved to make it a deployable system from the physical wearable to a self contained device for the software to be run on.

In order for the device to be a fully functioning, real-time speech technology substitute, a larger vocabulary would be required, and promising work is already being done to greatly increase the vocabulary of the device. In addition, if the system were sensitive enough to detect phonemes or parts of words, a much larger range of vocabulary could be used, as many more words can be created by combining phonemes. Both of these improvements would close the gap between AlterEgo and letter based systems that can already use the entire English vocabulary. We believe that the benefit of real-time and natural speech will eventually outweigh the difference in vocabulary size between AlterEgo and its alternatives.

In addition, a redesigned wearable could make the device easier to integrate into the lives of patients. Wireless electrodes, a concentration of electrodes in just one area so that the device does not take up a majority of the face, or sticker electrodes could all make the device more comfortable and sustainable to wear. A subtler design


would be beneficial as users would feel more natural wearing AlterEgo. Otherwise, it could be distracting for peers to see patients wearing such an intricate apparatus to communicate with others.

The external computer issue could hopefully be solved by integrating the technology into the computer monitors connected to the power chairs ALS patients currently use. However, it would take quite a bit of adaptation to move this entire product from research to a deployable deliverable. Many current ALS users use computers with their power wheelchairs but there is no standard model to build around. Another potential solution would be to run the application through a mobile device where the model could be controlled and given to a patient as a part of the larger AlterEgo system.

6.2 Parting Thoughts

This entire project was a truly fascinating one to work on. From brainstorming applications, to working with patients, to iterating on a final design, it gave me a ton of insight into both product design and assistive technologies.

Some unexpected challenges were the difficulties in obtaining permission to work with these patients through our IRB as there was so much paperwork to fill out and hoops to jump through. Certain hospitals and homes were also much harder to get in contact and schedule appointments with than expected. Another challenge also arose in teaching patients how to use the AlterEgo device as silent speech is not an intuitive skill. Especially for users who had some cognitive disabilities or were uncomfortable with technology, it took a great deal of time and patience to collect proper and consistent data from them. This could hopefully be addressed in the future with a more effective tutorial and also experienced coaching of the device.

This raises the question of whether AlterEgo is usable for all speech impaired patients. We saw that twitches and mental impairments were hard to overcome at this moment in time, but intuitive use of assistive technology is almost a requirement of its own. While the goal of giving a natural speech system back to users is an


important one, there are patients who still prefer a slower letter based system since it allows them to express any idea or a more artificial symbol based system since it lets them react or respond in more ways.

While ALS is the condition this project focused on, there are many speech impairments for which AlterEgo and other assistive speech devices can be beneficial. Communication is a huge part of society and for patients who have restricted speech abilities, AlterEgo can have a huge impact. I hope that this project continues to push forward to be faster and easier than other speech system alternatives to give any speech inhibited patients a much more positive experience communicating. If our system is faster and just as accurate as alternatives, AlterEgo can have an incredible impact on their lives as the first real-time speech assistive technology.


Bibliography

[1] Shiri Azenkot and Nicole B Lee. Exploring the use of speech input by blind people on mobile devices. In Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility, page 11. ACM, 2013.

[2] David Beukelman, Susan Fager, and Amy Nordness. Communication support for people with ALS. Neurology Research International, 2011, 2011.

[3] Jonathan S Brumberg, Alfonso Nieto-Castanon, Philip R Kennedy, and Frank H Guenther. Brain–computer interfaces for speech communication. Speech Communication, 52(4):367–379, 2010.

[4] Victoria M Florescu, Lise Crevier-Buchman, Bruce Denby, Thomas Hueber, Antonia Colazo-Simon, Claire Pillot-Loiseau, Pierre Roussel, Cédric Gendrot, and Sophie Quattrocchi. Silent vs vocalized articulation for a portable ultrasound-based silent speech interface. In Eleventh Annual Conference of the International Speech Communication Association, 2010.

[5] Tatsuya Hirahara, Makoto Otani, Shota Shimizu, Tomoki Toda, Keigo Nakamura, Yoshitaka Nakajima, and Kiyohiro Shikano. Silent-speech enhancement using body-conducted vocal-tract resonance signals. Speech Communication, 52(4):301–313, 2010.

[6] Robin Hofe, Stephen R Ell, Michael J Fagan, James M Gilbert, Phil D Green, Roger K Moore, and Sergey I Rybchenko. Small-vocabulary speech recognition using a silent speech interface based on magnetic sensing. Speech Communication, 55(1):22–32, 2013.

[7] Arnav Kapur. Human-machine cognitive coalescence through an internal duplex interface. PhD thesis, Massachusetts Institute of Technology, 2018.

[8] Arnav Kapur, Shreyas Kapur, and Pattie Maes. AlterEgo: A personalized wearable silent speech interface. In 23rd International Conference on Intelligent User Interfaces, pages 43–53. ACM, 2018.

[9] Blanka Klimova, Petra Maresova, and Kamil Kuca. Assistive technologies for managing language disorders in dementia. Neuropsychiatric Disease and Treatment, 12:533, 2016.


[10] André Maier, Teresa Holm, Paul Wicks, Laura Steinfurth, Peter Linke, Christoph Münch, Robert Meyer, and Thomas Meyer. Online assessment of ALS functional rating scale compares well to in-clinic evaluation: a prospective trial. Amyotrophic Lateral Sclerosis, 13(2):210–216, 2012.

[11] Carlos H Morimoto and Marcio RM Mimica. Eye gaze tracking techniques for interactive applications. Computer vision and image understanding, 98(1):4–24, 2005.

[12] David J Ward, Alan F Blackwell, and David JC MacKay. Dasher-a data entry interface using continuous gestures and language models. In UIST, pages 129– 137. Citeseer, 2000.

