
Design of Tool for Analysis of Speech Development

Disorders using Landmarks and Other Acoustic Cues

by

Tanya Talkar

Submitted to the Department of Electrical Engineering and Computer Science

in partial fulfillment of the requirements for the degree of

Master of Engineering in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2017

© Massachusetts Institute of Technology 2017. All rights reserved.

Author . . . .

Department of Electrical Engineering and Computer Science

May 26, 2017

Certified by . . . .

Stefanie Shattuck-Hufnagel

Principal Research Scientist

Thesis Supervisor

Certified by . . . .

Jeung-Yoon Elizabeth Choi

Research Scientist

Thesis Supervisor

Accepted by . . . .

Christopher Terman

Chairman, Master of Engineering Thesis Committee


Design of Tool for Analysis of Speech Development Disorders using

Landmarks and Other Acoustic Cues

by

Tanya Talkar

Submitted to the Department of Electrical Engineering and Computer Science on May 26, 2017, in partial fulfillment of the

requirements for the degree of

Master of Engineering in Electrical Engineering and Computer Science

Abstract

Non-word repetition tasks have been used to diagnose children with various developmental difficulties with phonology, but these productions have not been phonetically analyzed to reveal the nature of the modifications produced by children diagnosed with specific language impairment (SLI), autism spectrum disorder, or dyslexia compared to those produced by typically-developing children. In this thesis, we compared the modification of predicted acoustic cues to distinctive features of manner, place, and voicing for just under 30 children (ages 5-12) on the CN-Rep word inventory, in an extension of the earlier analysis in Levy et al. 2014. Feature cues, including abrupt acoustic landmarks (Stevens 2002) and other acoustic feature cues, were hand-labeled, and the analysis of factors that may influence feature cue modifications included position in the word, position in the syllable, word length measured in syllables, lexical stress, and manner type. Results suggest specific patterns of modification in specific contexts for specific clinical populations. These findings set the foundation for understanding how phonetic variation in speech arises in both typical and clinical populations, and for using this knowledge to develop tools to aid in more accurate and insightful diagnosis as well as improved intervention methods.

Thesis Supervisor: Stefanie Shattuck-Hufnagel
Title: Principal Research Scientist

Thesis Supervisor: Jeung-Yoon Elizabeth Choi
Title: Research Scientist


Acknowledgments

I would first like to acknowledge my two amazing advisors, Dr. Stefanie Shattuck-Hufnagel and Dr. Jeung-Yoon Elizabeth Choi. They have both been excellent mentors and have guided me through this work for the past three years. The work I’ve done has inspired me to go into the field of speech and hearing, and I don’t think I would have been led down this path had I not had two role models to look up to.

I also want to acknowledge my first research advisor at MIT, Dr. Armon Sharei. He always sought to help me figure out what my passions truly were. He’s also been an inspiration as both a researcher and an entrepreneur.

My passion for teaching was fulfilled through my time working with the classes 6.03, 6.0001, and 6.042. Without these, I wouldn’t have gotten the invaluable experience of seeing students grow throughout the semester and learn how to work through problems that they deemed impossible at the beginning of the year. I believe I have become a better educator through these classes and strive to continue learning from students as well.

And last but not least, I have to thank my family and friends from MIT and beyond who have stood by me and my pursuit of this degree. Without them, I don’t think I could have come as far as I have, and I will be eternally grateful for their help and support.


Contents

1 Introduction
1.1 Background
1.1.1 Autism Diagnosis and Treatment
1.1.2 Existing Products
1.1.3 Landmarks

2 Methodology
2.1 Overview of Design of Telemedicine System
2.2 Groups of Children
2.3 Non-Word Repetition Task
2.4 Landmark and Other Acoustic Cue Detection and Labeling
2.4.1 Landmark Prediction Algorithm
2.4.2 Landmark Detection and Labeling
2.4.3 Other Acoustic Cues
2.5 Statistical Analysis of Modifications
2.5.1 Alignment of Predicted and Realized Tiers
2.5.2 Features Extracted
2.5.3 Final Output Tab File
2.5.4 R Scripts and Statistical Analysis
2.5.5 Clinician Review of Files

3 Results
3.1 ASD vs SLI vs DYS vs TYP with Graphs
3.2 Overlapping Diagnoses with Graphs
3.3 Analysis of Other Acoustic Cues
3.4 ANOVA and MANOVA tests
3.4.1 ANOVA Results
3.4.2 MANOVA Results
3.5 Diagnosis of Children and Tie to System

4 Discussion
4.1 Use of Landmark and Other Acoustic Cue Features
4.2 Discussion of Use in Telemedicine
4.3 Future Work and Improvements
4.4 Comparison to Other Products in Industry

A List of CNREP Non-Words

B Sample Tab File

List of Figures

2-1 Diagram of design of system with steps to diagnosis and treatment.

2-2 Venn diagram showing the overlap of different diagnostic groups and the number of children in each group. In total, there are 28 children in this sample.

2-3 This figure shows the output of hand labeling landmarks in Praat. The first two tiers are the audio spectrum from the left and right channels respectively. The next area is the spectrogram, showing the formants of the audio signal. The first labeled tier is the words tier, containing intervals showing the duration of each word. The second labeled tier is the realized landmark tier. It is followed by the landmark modification tier that contains all inserted and deleted landmarks. The phones tier shows the predicted phones for the word, and it is followed by the predicted landmark tier (predLM). These labels are turned into a TextGrid file to be processed for statistical analysis.

2-4 This figure shows the IPA chart for English sounds from [11]. It provides the place and manner of articulation as well as whether the sound is voiced or not.

2-5 This figure shows the output of hand labeling the voicing of non-words in Praat. The first two tiers are the audio spectrum from the left and right channels respectively. The next area is the spectrogram, showing the formants of the audio signal. The first labeled tier is the words tier, containing intervals showing the duration of each word. The second labeled tier shows the predicted phones for the word. It is followed by an utterance tier that demarcates where the child was speaking continuously in the spectrogram. The final two tiers are the realized voicing (glottal) and predicted voicing (pglottal) tiers for the word. These labels are turned into a TextGrid file to be processed for statistical analysis.

2-6 This figure shows the output of hand labeling the nasality of non-words in Praat. The final two tiers are the realized nasality (nasal) and predicted nasality (pnasal) tiers for the word. These labels are turned into a TextGrid file to be processed for statistical analysis.

2-7 This figure shows the formant transitions for the five different types of places of articulation. Each picture denotes alveolar, dental, labial, palatal, and velar transitions respectively. The red lines in each diagram highlight the general direction in which the formants are expected to change during the closure and release. Specifically, we look at F1 and F2 when determining place of articulation.

2-8 This figure shows the output of hand labeling the consonant place of non-words in Praat. The final two tiers are the realized consonant place (cplace) and predicted consonant place (pcplace) tiers for the word. These tiers consist of closure formant transitions, spectral bursts, and release formant transitions. These labels are turned into a TextGrid file to be processed for statistical analysis.

3-1 Graph showing the normalized count of the total number of landmark modifications made in children who have autism spectrum disorder (ASD) versus children who are typically developing (TYP). This graph shows that ASD children potentially make significantly more modifications than TYP children do. The normalized count was found by taking (deleted LMs + inserted LMs)/(realized LMs + deleted LMs)*100.

3-2 Graph showing the normalized count of the total number of landmark modifications made in children who have specific language impairment (SLI) versus children who are typically developing (TYP). This graph shows that SLI children potentially make significantly more modifications than TYP children do. The normalized count was found by taking (deleted LMs + inserted LMs)/(realized LMs + deleted LMs)*100.

3-3 Graph showing the normalized count of the total number of landmark modifications made in children who have dyslexia (DYS) versus children who are typically developing (TYP). This graph suggests that DYS children potentially make approximately the same number of modifications as TYP children do. The normalized count was found by taking (deleted LMs + inserted LMs)/(realized LMs + deleted LMs)*100.

3-4 Graph describing the normalized counts of landmark modifications grouped by the lexical stress on the landmark. This compares DYS children to TYP children. Landmarks were either categorized as primary stress (1), secondary stress (2), tertiary stress (3), unstressed (4), or ambisyllabic (a). This graph shows that DYS children potentially make more modifications to ambisyllabic landmarks than do TYP children.

3-5 Graph describing the normalized counts of modifications made to landmarks categorized by specific syllable constituents in the word for ASD children versus ASD-DYS children. Landmarks were either part of the onset (o), the nucleus (n), an ambisyllabic phone (a), or the coda (c). This graph shows that there are potentially more modifications made in the nucleus by ASD-DYS children than ASD children.

3-6 Graph describing the normalized counts of landmark modifications made in specific syllable positions comparing ASD children to ASD-DYS children. This graph shows that ASD children potentially make more modifications in the 3rd syllable than ASD-DYS children do.

3-7 Graph describing the normalized counts for the number of landmark modifications made for different levels of stress, comparing ASD children to ASD-SLI children. This graph shows that ASD-SLI children potentially make more modifications to primary stress syllables than ASD children.

3-8 Figure showing the normalized count of the total number of landmark modifications made by children with DYS compared to children with SLI and DYS. This graph shows that DYS children potentially make fewer modifications overall than SLI-DYS children do.

3-9 Figure showing the normalized count of the total number of landmark modifications made by children with ASD and DYS compared to children with ASD and SLI. This graph shows that ASD-DYS children potentially make fewer modifications in primary stressed syllables than ASD-SLI children do.

3-10 Figure showing the p-values for all ANOVA tests run to compare diagnostic groups to each other. The cells in green indicate a p-value less than 0.05. These tests looked at whether there was a significant difference in the values of the features for modified landmarks between the two groups being compared.

3-11 Figure showing the p-values for all MANOVA tests run to compare diagnostic groups to each other. The cells in green indicate a p-value less than 0.05. These tests looked at whether there was a significant difference in the values of pairs of features for modified landmarks between the two groups being compared.

B-1 Example of a tab file output by the Python processing script described in Section 2.5.3 for a child completing the CNREP task. It includes the various features extracted from the context of the landmark, and provides the information needed to extract out common patterns of modification.

C-1 Figure showing the p-values for all ANOVA tests run to compare diagnostic groups to each other. The cells in green indicate a p-value less than 0.05. These tests looked at whether there was a significant difference in the values of the features for inserted landmarks between the two groups being compared.

C-2 Figure showing the p-values for all ANOVA tests run to compare diagnostic groups to each other. The cells in green indicate a p-value less than 0.05. These tests looked at whether there was a significant difference in the values of the features for deleted landmarks between the two groups being compared.

C-3 Figure showing the p-values for all MANOVA tests run to compare diagnostic groups to each other. The cells in green indicate a p-value less than 0.05. These tests looked at whether there was a significant difference in the values of pairs of features for inserted landmarks between the two groups being compared.

C-4 Figure showing the p-values for all MANOVA tests run to compare diagnostic groups to each other. The cells in green indicate a p-value less than 0.05. These tests looked at whether there was a significant difference in the values of pairs of features for deleted landmarks between the two groups being compared.


Chapter 1

Introduction

Diagnosis and treatment of illnesses using telemedicine have become heavily integrated into medical practice. Telemedicine appeals to those who live in remote areas and would otherwise have to travel long distances to receive diagnosis, and even treatment, that could potentially be carried out over video or phone channels. Many specialties have embraced telemedicine in their daily practice, either incorporating websites that allow doctors to message patients about symptoms and diagnoses, or using video conferencing for diagnosis and treatment follow-up.

In this vein, autism specialists have also started turning to telemedicine for diagnosis and treatment of autism spectrum disorder (ASD) and other speech development disorders. Even though technologies for diagnosis and treatment have been developed and tested, they have not been widely adopted. More importantly, most technologies rely on purely behavioral analysis, and do not incorporate quantitative speech analysis in their diagnosis. With atypical speech playing an important role in the diagnosis of autism, and speech therapy playing a large role in its treatment, it seems amiss to exclude this analysis.

To that end, this paper proposes the design and partial implementation of a diagnostic tool for speech development disorders using a non-word repetition task that can be conducted remotely. This tool will use the method of measuring the modifications of phonetic landmarks in the speech signal, proposed by Kenneth Stevens, to distinguish between groups of children, with the goal of contributing to more accurate diagnosis with greater convenience to patients and their families.

(16)

1.1 Background

1.1.1 Autism Diagnosis and Treatment

Speech language pathologists (SLP) play a large role in both the diagnosis and treatment of speech development disorders. According to the American Speech-Language-Hearing Association, the communication skill impediments that might occur when a child has ASD can include trouble understanding and using words, or repeating words just heard or words heard days or weeks earlier [5]. Behaviors such as these could be analyzed by an SLP, but this alternative would require the child to come in and perform specific tasks. The telemedicine alternative is to conduct a long remote test that allows the SLP to observe the child’s behavior for an extended period of time [13].

Some studies have shown that ASD diagnosis and treatment may actually benefit from telemedicine. Dr. Yellowlees mentions that children with autism typically interact better with technology and find it easier to talk to someone who is on a television screen than someone who is present in person [13].

Need for telemedicine

Without telemedicine options, parents are more likely to forgo testing for autism because of the hassle that comes with finding a professional to test their child and booking an appointment. For parents who live in rural areas, it is very rare to find an autism expert close by; even if their child does display some symptoms of autism, they are reluctant to drive to the nearest major city, which may be hours away, and then wait hours to get their child tested. Children in these situations may still be diagnosed once they reach school age, but, once again, the availability of qualified professionals becomes a hindrance to treatment if quality care is not available nearby.

Moreover, it is becoming apparent that early detection of autism is the most beneficial with regard to treatment. It is also becoming more important for parents themselves to be involved in this detection process, as pediatricians and specialists relying on clinical judgment end up missing 30% of the cases that come to them for early detection [5]. Therefore, it is recommended that parents complete a screening of their child in addition to the clinical visit.

Early detection benefits both the child’s development and the family’s economic situation. Children who receive early treatment are more likely to succeed in school and lead independent lives, saving society approximately $30,000 to $100,000 per child [1].

The Individuals with Disabilities Education Act (IDEA) in the United States articulates that states should identify, locate, and evaluate all children with disabilities who require special education services or early intervention. Many organizations also advocate for early screening and detection [5]. This, however, does not solve the problem of having a parent bring their child to specialists from remote areas if their child has been detected as having autism, or if additional diagnosis is required beyond online screening questionnaires. This is where telemedicine solutions can come into play, and where current creators of telemedicine solutions have found their customers.

1.1.2 Existing Products

Many telemedicine services already exist for autism. Each has its own standard for the role it plays in diagnosis or treatment, and for the level of clinician involvement.

Ages and Stages Questionnaires

The Ages and Stages Questionnaires (ASQ) company has created multiple screening surveys that parents can fill out [1]. They advertise their screening systems as able to "pinpoint developmental progress in children between the ages of one month and 5 1/2 years", focusing on children who are not yet school age [1]. Their site focuses heavily on the importance of early screening and diagnosis. They also emphasize the need to create a healthy parent-child relationship, and promote the idea of the parent performing tasks and games that will help the child with development.

There are multiple steps involved in the timeline that ASQ sets out for diagnosis and treatment. The first is the completion of the screening tool. There are two screening tools that assess different parts of a child’s development: ASQ-3, a comprehensive screening tool measuring development in 5 different domains, and ASQ:SE-2, which focuses more on social-emotional development. ASQ suggests that both screening tests be used in conjunction to fully assess a child’s development [1].

After the diagnostic screening has been completed, ASQ provides many resources that allow parents to keep track of their child’s progress, as well as training and activities parents can use to help the child’s development. All of these are online, and are marketed by ASQ as easy to use. The website also details their research process, clearly explaining why each skill was included and which skills are more important depending on the age of the child [1].

The Autism Telemedicine Company

The Autism Telemedicine Company markets itself as a solution for parents who are unable to make the long trip to the hospital, or who do not have the time to sit in a waiting area to meet a clinician. Like ASQ, they emphasize the need for early diagnosis, and actually cite the ASQ’s early diagnosis website for this purpose [4].

They are less transparent about the effectiveness of their treatment and what it involves, but they advertise screening and videoconferencing tests. There are two screening tests, an online autism screening test and an online developmental test, both of which are products from outside vendors [4]. PedsTestOnline, a patient portal for parents to check on their child’s behavior, provides both screening tests, and the results are then collected and sent to the parent by the Autism Telemedicine Company [10]. There is also an autism diagnosis videoconference that is HIPAA compliant, and a pediatrician videoconference. Neither of these has much detail attached, but the site provides an easy way to schedule either videoconference through a booking portal called BookFresh, which shows the available times for a given date [3]. Many of these times are in the evening, between 6pm and 9:45pm, which makes sense given that the psychologists and pediatricians schedule these conferences outside of their normal working hours. The times are also very convenient for working parents, who would have more time for a screening in the evening when they get to spend time with their child.

treatment following a diagnosis of autism [4].

1.1.3 Landmarks

We use the Children’s Test of Nonword Repetition (CNREP) to categorize and analyze the modifications made by children in common sequences of sounds [7]. There are many benefits to using a non-word repetition task instead of analyzing free-form conversation. We want to make sure we are capturing the internal representation of sounds and sequences that children have, and therefore we do not want any influence from previous pronunciations they have heard. Diagnosis will be the most accurate when we have an organic representation of what the child will pronounce when presented with a new word containing the sequences we are testing. There are also existing non-word repetition tasks which detail how specific groups of children perform with respect to adults and other groups of children, allowing us to make initial inferences as to which sequences and linguistic factors we should be paying attention to.

In fact, preliminary studies in this area have found that there is a striking difference in the degree to which children with speech disorders in the age range 5-12 years carry out phonetic modifications (reductions) in their speech in a non-word repetition task, CNREP, compared to typically developing children [9]. This task consists of the subject listening to a recording of non-words such as underbrantuand and pristoractional, and then attempting to repeat the sounds they just heard. Previous studies have focused less on the specific pronunciation modifications in the non-words, and more on the general observation of whether the subject was able to pronounce the word correctly or not. Using this kind of whole-word analysis, it has been found that children with specific language impairments (SLI) scored lower on non-word repetition tasks than did children with typically developing speech [9].

In a study by Munson, it was found that phonotactic probability (i.e. the likelihood that certain sounds will occur in certain positions in the words of the language) played a role in non-word repetition in typically developing children aged 3-8 [8]. It was also found that children with smaller vocabularies were more likely to get the non-word sequences wrong, as opposed to children and adults with larger vocabularies, who would match sequences in the non-words to higher-frequency phoneme sequences in English, thereby pronouncing the non-word as expected [8].

Building on this, Edwards et al. conducted a study comparing the fluency of adults and children in non-word repetition [6]. The results showed that children had a higher sensitivity to high-frequency vs. low-frequency phoneme sequences than adults did. Adults, therefore, were able to more easily reproduce sequences that occurred less frequently, and also had similar pronunciation durations for all sequences [6]. This study focused on the idea that there is a period between the ages of 3 and 8 in which typically developing children gain experience with a much wider range of phoneme sequences and slowly reach adult capabilities in non-word repetition and modification of sequences. Edwards et al. did not, however, account for modifications or mispronunciations that occurred because of the nature of the speakers' speech or the experimental conditions [6]. In particular, if a child had not yet learned how to pronounce a certain phoneme, or had a typical child-like (i.e. non-adult-like) pronunciation of a phoneme, it was marked as incorrect in the experiment and given the same weight as a modification of a phoneme or a difficulty due to unfamiliarity with low-frequency phoneme patterns.

Our diagnostic tool, consequently, aims to improve upon current error analyses by focusing on specific modification patterns produced in non-word repetition, and to use those patterns to group children by diagnosis. To do this, we use a particular marker of phonological accuracy: the production of acoustic landmarks, which are abrupt spectral changes associated with the transition from one manner of articulation to the next in a spoken utterance [12]. Detection of acoustic landmarks and their associated distinctive features was proposed by Kenneth Stevens as the critical first step in a model of human speech perception. When looking at the acoustic signal of a spoken utterance, the acoustic consequences of distinct articulatory closures and releases can be seen by analyzing amplitudes and spectral peaks [12]. The exact analysis and extraction of these landmarks will be elaborated on later in the paper.

In addition to landmarks, we also analyze the presence of other acoustic cues that are relevant to the production of non-words. Specifically, we look at voicing, nasality, and place of articulation for consonants. By analyzing the patterns of modifications in these three areas and combining them with the patterns we find in landmark modifications, we anticipate being able to determine ways to diagnose children and attain a granularity such that we can suggest directions for personalized therapy.

The problem we are trying to solve involves the analysis of differences in landmark modification among the 8 main groups of children, specified in the Venn diagram in Figure 2-2. As this diagram illustrates, children often receive overlapping diagnoses, and these subgroups may have distinct types of landmark modification patterns. In the work reported here, we look at the three largest groups of children, i.e. children with autism spectrum disorder (ASD), children with specific language impairment (SLI), and children with dyslexia (DYS), as well as children who have overlapping diagnoses. We can then compare all seven groups formed by the overlaps to typically developing (TYP) children. The aim is to first distinguish TYP children from non-TYP children, and then to distinguish among the 7 diagnoses using patterns we see in speech modifications.

By looking at previously created recordings of the Children’s Test of Non-Word Repetition (CNREP) for children in these groups, we will be able to label the predicted landmarks and their modifications, and test the hypothesis that there are reliable differences among the groups. This approach represents an advance in the field, as speech production has not been examined at this level of detail for these particular groups.


Chapter 2

Methodology

2.1 Overview of Design of Telemedicine System

There are multiple components to this system, all of which contribute to the ability to remotely diagnose children with ASD and other speech and language disorders. Some components have been tested or completed, while others are in progress; both the ideal design and the technological details of each component are elaborated upon below.

Figure 2-1 is a diagram of the steps within the system, clearly outlining where clinician intervention will be required.

This proposed design serves as a guideline for how we should think about generating each part of the module. All statistical analysis code was written so that it could be easily packaged and made accessible to clinicians who want to run it on their own rather than on a website.

2.2 Groups of Children

As mentioned in the previous chapter, there are eight groups of children we will be looking at. They are a combination of children with autism spectrum disorder (ASD), specific language impairment (SLI), dyslexia (DYS), and overlapping diagnoses of those three. Each of these seven diagnostic groups is compared to typically developing children (TYP).


Figure 2-1: Diagram of design of system with steps to diagnosis and treatment.

The eight groups are:

1. ASD
2. ASD-SLI
3. ASD-DYS
4. ASD-SLI-DYS
5. SLI-DYS
6. SLI
7. DYS
8. TYP

As of now, there are 28 children in our sample. Figure 2-2 is a Venn diagram that shows the overlap of the clinical groups and the number of children in each group.

All of the children in this population are between 5 and 12 years of age and participated in an experiment conducted by the Gabrieli Lab at MIT in which they completed the CNREP task.


Figure 2-2: Venn diagram showing the overlap of different diagnostic groups and the number of children in each group. In total, there are 28 children in this sample.

2.3 Non-Word Repetition Task

To accurately compare new children to our cohort of children from different groups, the tool provided to parents will require these new children to perform the same non-word repetition task, CNREP. This consists of a recording of the non-words, with instructions for the child to repeat each word after it is said. The non-words are detailed in Appendix A.

There are several advantages to using this automated approach rather than having children come into clinics for in-person testing. First, children with autism seem to be more comfortable with technology and computers than with testing in person. If an application can be made where the non-word is said out loud, and the child can interact with the application to record their repetition of the non-word, they would be more likely to participate [13]. The parent can also be in the room with the child, so the child is not in an unfamiliar place and is more likely to organically produce the representation of the non-word that they process. The parent can also intervene if they find that the child is not repeating any of the non-words. The recordings of all of these words can then be packaged by the application and accessed by the clinician.

While the recordings could be passed directly to the next few modules in the application, providing an instantaneous suggested diagnosis, it is better to have a clinician intervene at the end of each module until the cohort has grown large enough. For this module, the clinician can listen to the recordings and make sure the child has matched the right recording to the right non-word and has provided recordings for each non-word. If anything is amiss, the clinician can communicate with the parent through the application to inform them of the issue, and the parent can make the correction. This also allows communication between the clinician and parent to be asynchronous, so the clinician does not have to schedule time during regular patient hours, and can instead work between patients or before or after the workday.

The recordings are then passed on to the next part of the tool for landmark and other acoustic cue detection.

2.4 Landmark and Other Acoustic Cue Detection and Labeling

We want to create a file in a TextGrid format that contains the words, phonemes, predicted landmarks, realized landmarks, and modified landmarks for each child.

2.4.1 Landmark Prediction Algorithm

Because we know the list of non-words, our first task is to label the audio spectrum from the recording with all intervals where the child is speaking. In our labeled TextGrid, we have a words tier where the duration of every word is labeled. We use the software Praat [2] for all of our labeling.

After the words tier has been created, we can estimate where the phonemes lie within the word. We provide a script with a list of the phonemes that belong to each word, and create another tier, phones, that contains the intervals of phonemes within a word.


Finally, we insert a tier of predicted landmarks. For each phoneme, we have a list of landmarks that we expect to appear when pronouncing that phoneme, characterized by closures and releases. Our predicted landmark algorithm uses no information from the given audio, but rather places predicted landmarks within the interval of the relevant phoneme.

The combination of these three tiers provides the basis for the landmarks that we expect to appear in these non-words, and is the baseline from which any modifications are determined.
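As a concrete illustration, a minimal sketch of this prediction step might look like the following Python; the `LANDMARKS_FOR_PHONE` table, its label strings, and the even spacing within each phone interval are illustrative assumptions, not the lab's actual implementation.

```python
# Sketch of predicted-landmark placement, assuming a simple lookup table
# mapping each phone to the landmark labels it should produce. The table
# entries and tier format here are illustrative, not the lab's actual ones.

# Hypothetical subset of expected landmark labels per phone.
LANDMARKS_FOR_PHONE = {
    "b": ["Sc", "Sr"],   # stop: closure then release
    "m": ["Nc", "Nr"],   # nasal: closure then release
    "s": ["Fc", "Fr"],   # fricative: closure then release
    "aa": ["V"],         # vowel: single vowel landmark
    "l": ["G"],          # glide
}

def predict_landmarks(phone_intervals):
    """phone_intervals: list of (phone, start_sec, end_sec) from the phones tier.
    Returns a list of (time_sec, label) points for the predicted landmark tier.
    Landmarks are spaced evenly inside each phone's interval, since the
    prediction uses no information from the audio itself."""
    points = []
    for phone, start, end in phone_intervals:
        labels = LANDMARKS_FOR_PHONE.get(phone, [])
        for i, label in enumerate(labels, start=1):
            # Place the i-th of n landmarks at fraction i/(n+1) of the interval.
            t = start + (end - start) * i / (len(labels) + 1)
            points.append((t, label))
    return points

# Example: a phones tier for a short syllable "baa"
print(predict_landmarks([("b", 0.10, 0.18), ("aa", 0.18, 0.40)]))
```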

2.4.2 Landmark Detection and Labeling

To accurately conduct statistical analysis, we need to label all of the realized, deleted, and inserted landmarks. Currently this task is done by hand: members of the lab listen to the audio and analyze the spectrum to determine where landmarks lie and which of them have been modified. For this packaged tool, however, we aim to have automated detection of landmarks. This algorithm is currently being developed by other members of the lab, and still requires some work to fully detect all of the landmarks.

Landmark detection is accomplished using a forced alignment algorithm, which takes the expected landmarks (provided by the predicted landmark tier) and attempts to match them to the audio spectrum based on the signal expected for each landmark. In forced alignment, if there is a part of the signal that is not expected from the prediction, it is counted as a modification and recorded in a separate tier as an insertion, along with the potential deletion. Insertions are labeled as landmarks with a "-+" at the end, and deletions are labeled as landmarks with a "-x" at the end of the mark.

Presently, at the end of this forced alignment, a clinician could peruse the outputted TextGrid with the tiers of realized and modified landmarks to ensure that all predicted landmarks are accounted for in the sum of the realized and deleted landmarks. There is an error detection script that will check all TextGrids and determine whether all predicted landmarks are accounted for in the labeled realized and deleted landmarks.


Figure 2-3: This figure shows the output of hand labeling landmarks in Praat. The first two tiers are the audio spectrum from the left and right channels respectively. The next area is the spectrogram, showing the formants of the audio signal. The first labeled tier is the words tier, containing intervals showing the duration of each word. The second labeled tier is the realized landmark tier. It is followed by the landmark modification tier that contains all inserted and deleted landmarks. The phones tier shows the predicted phones for the word, and it is followed by the predicted landmark tier (predLM). These labels are turned into a TextGrid file to be processed for statistical analysis.


Figure 2-4: This figure shows the IPA chart for English sounds from [11]. It provides the place and manner of articulation as well as whether the sound is voiced or not.

2.4.3 Other Acoustic Cues

In addition to landmarks, we are interested in looking at the differences in modification of various acoustic cues that accompany landmark changes. These acoustic cues will also be considered when finding factors that differentiate clinical groups from one another. In the eventual creation of the tool, the patterns will be provided as output for the speech pathologist to take into account when developing personalized therapy.

Voicing Tier

The first acoustic cue we looked at was voicing: where the child produced voiced landmarks with glottal vibration. Voicing indicates that the vocal folds have come close together during sound production, causing vibration. The alternative is devoicing, where the vocal folds are far apart during sound production and do not vibrate. All glides and vowels are voiced; the exception is the sound for h, which is considered a "voiceless glide". The consonants are split into voiced and devoiced counterparts. The exact chart of these is provided in the IPA chart in figure 2-4.

Prediction of voicing and devoicing was linked to specific landmarks. Using the phones and predicted landmark tiers, we were able to determine where +g and -g marks should go to correspond to voicing and devoicing respectively. If the first landmark in a word is voiced, a +g is placed there. Otherwise, if voicing occurs in the middle of a word, the +g is placed at the same time as the release of the devoiced landmark that precedes the voicing. Devoicing is marked either at the same time as the last landmark in an utterance, or at the closure of a devoiced consonant, both marked as -g. Since h is produced with aspiration noise, its duration is marked at the start with a <h and at the end with a h>.

When labeling other acoustic cues, we have only one labeled tier reflecting the utterance the child produced. As opposed to the landmark tier, which has a separate modification tier, we rely on alignment with the predicted tier to tell us what has been inserted or deleted. The main reasoning is that voicing is marked as a duration of time rather than a single point in time like landmarks, so relying on one tier allows us to mark the duration more accurately. Determining these inserted and deleted voicing cues and their analysis is detailed later, in the alignment section.

In addition to the marks mentioned above, we introduced another mark corresponding to irregular pitch periods in the child’s utterance. An example of an irregular pitch period in the spectrogram is provided in the image below. It reflects a modification in the child’s repetition of the non-word. The start of the period is marked with a <ipp and the end with a ipp>. These irregular pitch periods are always considered inserted, as they are not supposed to exist in the utterance.

Figure 2-5 shows a sample word with both predicted and realized voicing.

Nasal Tier

The next acoustic cue is nasality. This tier is similar to the glottal tier in that it marks periods of nasality. Within the corpus of words we are looking at, the nasals correspond to m, n, and ng. In the predicted tier, a +n is placed at the closure of the nasal consonant, and a -n is placed at the release of the nasal consonant.

No additional marks were added as insertions. The realized nasal tier was labeled similarly to the glottal tier, reflecting exactly the utterance made by the child. Figure 2-6 shows an example of a labeled word with the nasal tiers.


Figure 2-5: This figure shows the output of hand labeling the voicing of non-words in Praat. The first two tiers are the audio spectrum from the left and right channels respectively. The next area is the spectrogram, showing the formants of the audio signal. The first labeled tier is the words tier, containing intervals showing the duration of each word. The second labeled tier shows the predicted phones for the word. It is followed by an utterance tier that demarcates where the child was speaking continuously in the spectrogram. The final two tiers are the realized voicing (glottal) and predicted voicing (pglottal) tiers for the word. These labels are turned into a TextGrid file to be processed for statistical analysis.

Figure 2-6: This figure shows the output of hand labeling the nasality of non-words in Praat. The final two tiers are the realized nasality (nasal) and predicted nasality (pnasal) tiers for the word. These labels are turned into a TextGrid file to be processed for statistical analysis.


Consonant Place Tier

The final acoustic cue considered was place of articulation for consonants. The IPA chart in figure 2-4 details where each consonant is considered to be produced.

The consonant place tier was more similar to the landmark tier in the sense that it corresponded to specific times within the spectrum, and was not a demarcation of a duration like voicing or nasality. Each consonant could have up to three labels corresponding to cues within the phone. The places of articulation considered were labial (lab), dental (den), palatal (pal), alveolar (alv), and velar (vel).

The transition from a vowel to a consonant, or potentially from silence to the start of a consonant, can be identified by a formant transition. Specifically, we see changes in the F1 and F2 formants, and the specific way in which these changes happen corresponds to the place of articulation of the consonant. A chart detailing these formant transitions and how they correspond to each place of articulation is provided in figure 2-7. The transition into a consonant closure is denoted as <***-FTc>, where the stars are substituted with the specific place of articulation. Similarly, the transition out of a consonant into a vowel or glide is denoted as <***-FTr>. The exact determination of the place of articulation is based on the realized landmark as well as the values of the formants. However, when labeling, it was found that many children did not produce the exact formant values originally stated, so transitions and place of articulation were based on relative changes in the formants.

For stops, affricates, and fricatives, we expect there to be a spectral burst that typically corresponds to the release. The detection of this burst is based on the profile of the spectral energy of the burst or frication noise. This is labeled as <***-SB>. Figure 2-8 is an example of a labeled word with all of the consonant place markers. Once again, there is only a realized tier for this acoustic cue, and insertions and deletions are determined later by the alignment algorithm.

Automatic Detection of Other Acoustic Cues

Just as with landmarks, this tool will ideally be able to automatically detect other acoustic cues from the audio clips provided by the child. Work is currently underway toward this automatic detection.


Figure 2-7: This figure shows the formant transitions for the five different types of places of articulation. Each picture denotes alveolar, dental, labial, palatal, and velar transitions respectively. The red lines in each diagram highlight the general direction in which the formants are expected to change during the closure and release. Specifically, we look at F1 and F2 when determining place of articulation.


Figure 2-8: This figure shows the output of hand labeling the consonant place of non-words in Praat. The final two tiers are the realized consonant place (cplace) and predicted consonant place (pcplace) tiers for the word. These tiers consist of closure formant transitions, spectral bursts, and release formant transitions. These labels are turned into a TextGrid file to be processed for statistical analysis.


2.5 Statistical Analysis of Modifications

The majority of the work done on this tool has been the landmark and other acoustic cue processing and the subsequent statistical analysis of the modifications. The goal of this statistical analysis is to determine the specific contexts in which modifications are made by certain groups of children, and to compare their rates of modification to those of other groups. By comparing where modifications occur most, we are able to find patterns that are specific to groups. These patterns will subsequently help distinguish between groups and categorize children into diagnostic groups.

All processing from TextGrids into tab-separated files has been written in Python. All of the TextGrid files from the hand or automatic labeling described above go through Python scripts. Each landmark and other acoustic cue is extracted, and a tab-separated file is produced with all of their contexts. A separate tab-separated file is generated for each tier that is processed.

Additional scripts produce graphs of the counts of realizations, deletions, and insertions for a child or a group of children for each tier. These numbers are used to calculate the ratio of modifications made in specific contexts. This data, when processed as part of the tool, could be used to add to the existing cohort of children or, in the case of diagnosis, to compare the child to existing data to obtain a recommended diagnosis. There are additional scripts written in R that conduct multivariate analysis of variance (MANOVA) and analysis of variance (ANOVA) tests to determine statistically significant features that separate diagnostic groups from one another. The results from these graphs and statistical tests are discussed further in the results section.

2.5.1 Alignment of Predicted and Realized Tiers

As mentioned in the previous section, there are separate tiers for words, phonemes, and predicted and realized landmarks and acoustic cues. Alignment between the words, phonemes, and predicted tiers is straightforward: a predicted mark can easily be matched with its phone and word to determine its context. The task, then, is to match the realized and modified marks to their corresponding predicted marks, which allows us to extract features and match them to phonemes and words. The alignment is done differently for landmarks than for the other acoustic cues, so both methods are described.

Alignment of Predicted and Realized/Deleted Landmarks

To align predicted landmarks with realized and deleted landmarks, we use the Needleman-Wunsch sequence alignment algorithm. The algorithm takes in two sequences, an A sequence and a B sequence, and outputs an alignment of the two sequences such that identical characters are aligned, and insertions and deletions to the A sequence are demarcated as stars (*) in the A and B sequences respectively.
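A compact sketch of Needleman-Wunsch over landmark label sequences is shown below; the scoring values (match +1, mismatch and gap -1) are assumptions, as the actual script's parameters are not specified here.

```python
# Minimal Needleman-Wunsch sketch for aligning two landmark label sequences.
# Scoring is illustrative (match +1, mismatch/gap -1); the lab's script may
# use different parameters. Gaps are rendered as '*' per the convention above.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag, score[i-1][j] + gap, score[i][j-1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i-1][j-1] + (
                match if a[i-1] == b[j-1] else mismatch):
            out_a.append(a[i-1]); out_b.append(b[j-1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i-1][j] + gap:
            out_a.append(a[i-1]); out_b.append("*"); i -= 1  # deletion from A
        else:
            out_a.append("*"); out_b.append(b[j-1]); j -= 1  # insertion into A
    return out_a[::-1], out_b[::-1]

# Example: a stop release deleted in the realized sequence.
pred = ["Sc", "Sr", "V", "Nc", "Nr"]
real = ["Sc", "V", "Nc", "Nr"]
print(needleman_wunsch(pred, real))
# -> (['Sc', 'Sr', 'V', 'Nc', 'Nr'], ['Sc', '*', 'V', 'Nc', 'Nr'])
```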

Our landmark error detection algorithm makes sure that the combination of realized and deleted landmarks is the same as the predicted landmarks. We process the landmarks word by word. For each word, we extract the predicted landmarks for that specific word. Realized landmarks are all of the labeled landmarks in the realized tier that fall between the time boundaries of the word in question. Based on the notation discussed in the previous section, all deletions are demarcated by an -x at the end of their labeled landmark. We can then extract all deletions by finding the modifications that fall within the time boundaries of the word and taking out all landmarks that contain an -x at the end.

Our representation of landmarks contains a timestamp, which allows us to easily merge our realized and deleted landmarks and order them by timestamp. From there, we feed the two arrays of predicted and realized+deleted landmarks into the Needleman-Wunsch algorithm. We expect that the alignment code will return no insertions or deletions, since all of the predicted landmarks should be present. However, if there are insertions or deletions, then the script returns an error, telling us to double-check that the landmarks were labeled correctly.

The alignment between landmarks allows us to easily link our realized and deleted landmarks to their phonemes and words by referring to the timestamp of the predicted landmark.

Alignment of Other Acoustic Cue Tiers

Since the marks for the other acoustic cues are less unique than the landmarks present in the word (the glottal and nasal tiers have only two possible main markings), we believe this alignment can be approached not only with a min-edit-distance algorithm, but potentially also with an algorithm that takes phonetic rules into account and aligns marks with landmarks. To that end, two forms of alignment code are being developed; while neither is fully accurate yet and work remains to be done, both are relatively promising.

The min-edit-distance algorithm runs similarly to the alignment code described above for landmarks, and aligns the predicted and realized tiers of other acoustic cues. However, to account for the fact that there are few possible marks, the algorithm sets a time limit on how far apart two marks can be and still be aligned. This limit is based on average utterance speeds for these children for these specific words. While the algorithm is relatively successful in determining which marks to match up, there are times when it attributes a deletion of a period or mark to a phoneme that comes later in the sequence rather than earlier. So, if two voicing periods had been combined, instead of indicating that the -g at the end of the first period and the +g at the beginning of the second period were deleted, it will say that the second voicing period was deleted completely. This does not quite capture the patterns of modification, and therefore we are looking to improve this algorithm.

In light of the weaknesses of the previous algorithm, another approach recognizes that the predicted and realized tiers of acoustic cues can be linked to landmarks, which are successfully aligned with each other. By coding in rules for when voicing, nasality, and consonant place marks occur, this algorithm matches each mark with a landmark in the predicted or realized tier. This allows us to better see the patterns of modification. While there are still some errors in matching up these marks, the algorithm is approaching 100% accuracy for the non-words, and we hope to use the processing programs written for the other acoustic cue tiers to extract common patterns of modification.

Linking Predicted Landmarks to Inserted Landmarks

There are multiple ways of dealing with inserted landmarks. We discuss the approach that is currently implemented, and also detail the approach to be taken in future iterations of this script.

The issue with inserted marks is that they do not have a predicted phoneme within the word that they immediately match with. We also cannot use timestamps, since our phoneme mapping algorithm does not currently match phoneme durations to the signal, and instead just uses predicted durations.

Our current approach, therefore, is to match each inserted mark to the closest realized or deleted mark. A mark is either inserted in response to an adjustment of a sequence, or inserted to replace a deleted mark. With this method, linking the inserted mark to a deleted or realized mark allows us to link the phoneme to the inserted mark. This gives us the relevant information about the location of the mark within the word, and other contextual information relevant to figuring out the patterns differentiating groups of children.
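A minimal sketch of this nearest-mark matching, assuming marks are stored as (timestamp, label) pairs sorted by time (the data layout is hypothetical):

```python
# Sketch of linking an inserted mark to the closest realized or deleted mark
# by timestamp, using binary search. Mark times and labels are illustrative.
import bisect

def closest_mark(inserted_time, marks):
    """marks: list of (time_sec, label) sorted by time. Returns the mark
    whose timestamp is nearest to the inserted mark's timestamp."""
    times = [t for t, _ in marks]
    i = bisect.bisect_left(times, inserted_time)
    candidates = marks[max(i - 1, 0):i + 1]  # neighbor on each side
    return min(candidates, key=lambda m: abs(m[0] - inserted_time))

marks = [(0.12, "Sc"), (0.18, "Sr"), (0.35, "V"), (0.52, "Nc-x")]
print(closest_mark(0.30, marks))  # -> (0.35, 'V')
```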

This logic works relatively well for the landmark tier specifically, since the nearby landmarks typically have the same context as the inserted landmark. However, it may be flawed if the closest labeled realized or deleted landmark was not actually the landmark modified or replaced by the inserted landmark. In the future, we hope to implement a labeling method in which we can tell whether inserted landmarks were inserted in response to a sequence or as a replacement for deleted landmarks. This will allow us to make a more accurate link between the inserted landmark and the phoneme that gives us contextual information about the modification.

In addition, this method does not quite capture the contextual changes that accompany insertions in the other tiers. For voicing and nasality, if an insertion is made, it is typically because a new period has been inserted where there was not one before, or a single period has been split. For consonant place, it may be that the place of articulation changed, or a landmark was inserted and is reflected in the consonant place tier. Either way, looking at the surrounding marks is not the most accurate way of extracting context. Ideally, during alignment, we would link an inserted acoustic cue to an inserted landmark and, given a new scheme for determining the context of the inserted landmark, determine the context of the acoustic cue. However, for the non-words, there has not been much of an issue in aligning inserted acoustic cues to the phoneme we believe they belong to. We believe it may become a larger issue if this algorithm is extended to conversational speech, where we would be looking at utterances.

2.5.2 Features Extracted

To assess where modifications have been made, certain linguistic features relevant to landmark modifications are extracted. Once each landmark has been matched to a phoneme and a word using the Python script described above, we can extract these features easily; a separate Python script does this extraction. These details are included in a row for each landmark or acoustic cue in a tab-separated file, indicating the values that each feature takes on for a single mark. As mentioned before, a separate tab-separated file is created for each type of mark analyzed. This leads to four files: one for landmarks, one for the glottal tier, one for the nasal tier, and one for the consonant place tier.


Manner of Articulation - Landmarks Only

For landmarks, we first extract the manner of articulation. We look at the manners of articulation that are most common in linguistic study: stops, fricatives, nasals, affricates, glides, and vowels. However, because we are looking at landmarks, we also want to analyze differences that may arise in closures and releases. Our complete list of manners of articulation is as follows:

1. Stop Closure (Sc)
2. Stop Release (Sr)
3. Nasal Closure (Nc)
4. Nasal Release (Nr)
5. Fricative Closure (Fc)
6. Fricative Release (Fr)
7. Stop Release/Fricative Closure (Sr/Fc) - Affricate
8. Glide (G)
9. Vowel (V)

The shortened form in parentheses is the text used to represent the feature in the tab-separated file. When we process each landmark, there is a direct mapping between the landmark text and the manner of articulation, using a Python dictionary that we have hardcoded. The manner of articulation is processed separately from the phoneme, since inserted landmarks have their own manner of articulation but may have been produced in the context of a phoneme that does not match that manner.
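An illustrative version of such a mapping is below; the exact label strings and manner names in the lab's hardcoded dictionary may differ.

```python
# Illustrative landmark-label -> manner mapping; the actual dictionary in
# the lab's script may use different label strings.
MANNER_FOR_LANDMARK = {
    "Sc": "Stop Closure",
    "Sr": "Stop Release",
    "Nc": "Nasal Closure",
    "Nr": "Nasal Release",
    "Fc": "Fricative Closure",
    "Fr": "Fricative Release",
    "Sr/Fc": "Affricate (Stop Release/Fricative Closure)",
    "G": "Glide",
    "V": "Vowel",
}

def manner_of(landmark_label):
    # Strip the modification suffixes ("-x" deleted, "-+" inserted) so the
    # base label can be looked up directly.
    base = landmark_label.replace("-x", "").replace("-+", "")
    return MANNER_FOR_LANDMARK.get(base, "Unknown")

print(manner_of("Sr-+"))  # -> 'Stop Release'
```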

All of the remaining features in this section are extracted for landmarks and all other acoustic cues, and depend on the phoneme that the mark is linked to.


Syllable Constituent

The building blocks of words are syllables. Within a syllable, there are components called syllable constituents that play a role in the syllable’s structure. While the exact names of these components vary depending on the language spoken, we label phonemes by four different syllable constituents:

1. Onset - A consonant or consonant cluster that precedes the nucleus.

2. Nucleus - A vowel or syllabic consonant that forms the center of the syllable and carries its stress.

3. Coda - A consonant or consonant cluster that follows the nucleus; unlike the nucleus, it is not required to form a complete syllable.

4. Ambisyllabic - A consonant or consonant cluster between two syllables, behaving such that it cannot be definitively labeled as the coda of the preceding syllable or the onset of the following one.

We have processed each of the non-words in the CNREP corpus and have labeled each phoneme as belonging to one of the four categories above. Because these are non-words, there may be some errors in the labeling, but after comparing the sequences in each non-word to similar sequences in real English words, the final assignment seems to be the one that fits best.

The current Python script is also able to process all English words that are present in the Carnegie Mellon University corpus of words. This corpus provides syllable stress and markings of syllable constituents. Therefore, if indicated in the script, processing can also be done to get the syllable constituent of all English words in that dictionary; a sketch of this lookup is shown below.
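As an illustration of this lookup, the sketch below pulls stress digits from the CMU Pronouncing Dictionary through NLTK; the helper name is ours, and assigning syllable constituents would additionally require a syllabification step, which is omitted here:

    import nltk
    from nltk.corpus import cmudict

    nltk.download("cmudict", quiet=True)  # one-time download
    PRON = cmudict.dict()

    def stress_pattern(word):
        """Stress digit of each vowel in the word's first CMU pronunciation:
        1 = primary, 2 = secondary, 0 = unstressed."""
        phones = PRON[word.lower()][0]
        return [int(p[-1]) for p in phones if p[-1].isdigit()]

    print(stress_pattern("banana"))  # [0, 1, 0]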

Lexical Stress

Each syllable in a word receives a differing amount of emphasis, which is referred to as stress. The syllable with the most stress has primary stress (1). Just below that is secondary stress (2), which is considered only slightly weaker than primary stress. We also take into account tertiary stress (3) and syllables with no stress (4). The stress of any phoneme is based on the syllable to which it belongs. Therefore, we introduce a 5th type of stress, ambisyllabic (a). This arises from the fact that ambisyllabic phonemes could take on the stress of either the preceding or following syllable, and therefore are potentially modified in different ways. Stress patterns for these non-words were decided based on derivations of common sequences in real English words, translated to the non-words.

Position of Syllable and Number of Syllables

The final two features we look at are the position of the syllable within the word and the number of syllables in the word. All of the non-words range from 1 to 5 syllables, and each syllable is labeled with a position from 1-5. Ambisyllabic phonemes are labeled as "a", just as they were in the lexical stress context, because we want to differentiate them from phonemes that belong to a specific syllable. When we analyze the number of modifications made, we will also split up words by their length, performing the analysis on each group to see if there are any differences within words of the same length. A small sketch of these two features is given below.
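As a sketch (with a hypothetical word representation), given a word stored as a list of syllables, these two features reduce to an index and a length, with ambisyllabic phonemes flagged separately:

    def syllable_features(word_syllables, syl_index, ambisyllabic=False):
        """Return (syllable position, number of syllables) for a phoneme.
        Ambisyllabic phonemes get position "a" rather than a 1-based index."""
        position = "a" if ambisyllabic else syl_index + 1
        return position, len(word_syllables)

    # A 4-syllable non-word, with the phoneme of interest in the 3rd syllable:
    syllables = [["b", "l", "o"], ["n", "t", "er"], ["s", "t", "ei"],
                 ["p", "i", "ng"]]
    print(syllable_features(syllables, syl_index=2))  # (3, 4)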

2.5.3 Final Output Tab File

For each tier and type of mark, we can easily pass a TextGrid from Praat into the Python script. It will assign each mark to its predicted phoneme and thereby determine a manner of articulation (for landmarks only), a syllable constituent, a lexical stress, the position of its syllable, and the number of syllables in the word. The realization of the mark, whether it was present as predicted (o), deleted (x), or inserted (+), is also recorded. This information is then written into the tab-separated file as described above, where each tier has a separate file; a sketch of reading the TextGrid input is shown below. The file is passed on to a script written in R to complete the final analysis. An example of a tab-separated file for each of the tiers is provided in Appendix B.
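As a sketch of the input side, the third-party textgrid package can parse a Praat TextGrid; the file and tier names here are assumptions about how the hand-labeled files are organized:

    import textgrid  # third-party package: pip install textgrid

    # Assumed file and tier names; the actual TextGrids may differ.
    tg = textgrid.TextGrid.fromFile("child_cnrep.TextGrid")
    lm_tier = tg.getFirst("landmarks")  # a point tier of landmark labels

    # Each point carries a time and a label, e.g. a landmark text plus a
    # realization symbol such as x for deleted or + for inserted.
    for point in lm_tier:
        print(point.time, point.mark)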

2.5.4 R Scripts and Statistical Analysis

Using the Tab files for each tier that have been created for each child, multiple scripts in R are provided that produce graphs comparing the normalized count of modifications for multiple features, and that also run MANOVA and ANOVA tests to compare the types of modifications made in each group.

ANOVA and MANOVA Tests

To compute the statistical significance of the differences between the modifications made in each group, we run both ANOVA and MANOVA tests. We did not want to compare individual marks, as that might lead to overfitting, so we instead built the test inputs from the extracted features of each mark: manner of articulation (landmarks only), lexical stress, syllable constituent, syllable position, and number of syllables in the word.

All tests were done considering just the modified marks, to see if there was any difference in the features of these marks across diagnostic groups. The ANOVA test run by R takes in, for each mark, one of the extracted features and the diagnostic group the mark belongs to. The test then tells us, for a specific feature, whether there is a statistically significant difference in the value of that feature between the modifications made in two diagnostic groups. Similarly, for the MANOVA test, we feed in two features, and the test returns the statistical significance of the pair of features' ability to differentiate between the modifications made in two diagnostic groups. These tests compare only two groups at a time, but allow us to find specific features that contribute to differentiating groups from each other. A sketch of equivalent tests is given below.
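The tests themselves are run in R in our pipeline; purely to illustrate the same computations, here is a sketch in Python using statsmodels, with invented column names and toy data:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    from statsmodels.multivariate.manova import MANOVA

    # Toy data, one row per modified mark: numeric codings of two features
    # and the diagnostic group of the child (column names invented).
    df = pd.DataFrame({
        "stress":   [1, 2, 1, 4, 1, 2, 4, 4, 1, 2, 3, 4],
        "position": [1, 2, 3, 1, 2, 2, 3, 1, 1, 3, 2, 1],
        "group":    ["ASD"] * 6 + ["TYP"] * 6,
    })

    # One-way ANOVA: does a single feature differ between the two groups?
    print(sm.stats.anova_lm(ols("stress ~ group", data=df).fit()))

    # MANOVA: do the two features jointly differentiate the groups?
    print(MANOVA.from_formula("stress + position ~ group", data=df).mv_test())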

Graphs

We also decided to compare various statistics related to the total number of modifications. For each feature in each tier, we created a set of grouped, stacked graphs showing the raw numbers of realized, deleted, and inserted landmarks. We also created a set of graphs that show the normalized count of modifications, calculated by dividing the number of modifications by the number of predicted marks, i.e., (deleted marks + inserted marks)/(realized marks + deleted marks). This gives us a normalized count of modified landmarks per child; a sketch of the computation is given below. We plotted the mean of the normalized count for each group for different features. Examples of these graphs are provided in the results section to discuss some potential patterns of modification we would like to follow up on once we have a larger group of children.
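A minimal sketch of the normalized-count computation (the helper name is ours):

    def normalized_count(realized, deleted, inserted):
        """Modifications per predicted mark, as a percentage:
        (deleted + inserted) / (realized + deleted) * 100."""
        predicted = realized + deleted
        return 100.0 * (deleted + inserted) / predicted if predicted else 0.0

    # Example: 180 realized, 15 deleted, 10 inserted landmarks for one child.
    print(round(normalized_count(180, 15, 10), 1))  # 12.8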


2.5.5 Clinician Review of Files

In the final version of this tool, once it is created, we expect that the clinician will get full access to the results from the Python and R scripts. They will have the opportunity to review the Tab file output for each child after the Python script is run. They can then check to make sure all words were processed and there were no errors in processing inserted marks. Once alignment and detection are fully automated and tested for accuracy, however, we expect that the time the clinician spends reviewing this information will be minimized, and they will only need to look at the graph and statistical test output from the R script.


Chapter 3

Results

Multiple comparisons were made between the eight groups of children to find distinguishing features that led to a different number and type of modifications. Any analysis in this section is currently based on a small group of children, and we therefore cannot draw any definite conclusions about whether these patterns will persist when more children are added. These analyses do, however, guide us in determining which contexts may be important in patterns of modification.

3.1 ASD vs SLI vs DYS vs TYP with Graphs

Our first comparison of interest is between the four larger diagnostic groups without any overlapping diagnoses. We look at children with Autism Spectrum Disorder (ASD), children with a specific language impairment (SLI), children with dyslexia (DYS), and typically developing children (TYP). The comparison between these groups is especially important because we do not want to end up missing a diagnosis for any child. Once we have separated the non-TYP children from the TYP children, diagnosis between the three groups of children is also important because speech therapies vary across the different groups.

For the following analyses, we focused mostly on the number of landmark modifications. Additional graphs detailing differences in other acoustic cues are provided in the appendix. When comparing children who have autism spectrum disorder (ASD) to children who are typically developing (TYP), we find that ASD children currently have a higher number of modifications than TYP children.


Figure 3-1: Graph showing the normalized count of the total number of landmark modifications made in children who have autism spectrum disorder (ASD) versus children who are typically developing (TYP). This graph shows that ASD children potentially make significantly more modifications than TYP children do. The normalized count was found by taking (deleted LMs + inserted LMs)/(realized LMs + deleted LMs)*100.


Figure 3-2: Graph showing the normalized count of the total number of landmark modifications made in children who have specific language impairment (SLI) versus children who are typically developing (TYP). This graph shows that SLI children potentially make significantly more modifications than TYP children do. The normalized count was found by taking (deleted LMs + inserted LMs)/(realized LMs + deleted LMs)*100.

The numbers in figure 3-1 were found by averaging the normalized count of modifications across all of the ASD and TYP children in the sample. While the error bars are large compared to the normalized count of modifications, we expect that with a larger sample size, the variation within groups of children will decrease.

We find a similar pattern when we compare children with a specific language impairment to TYP children, as in figure 3-2. SLI children in our sample have a larger number of modifications than TYP children, though we cannot make any definite conclusions given the small number of SLI children that we have. But if these numbers hold for the SLI population, then we could also say that SLI children make more modifications than ASD children, creating a ranking based purely on the normalized count of modifications.

When we look at the average normalized count of landmark modifications made by children with dyslexia (DYS) versus TYP children, we find that there is not a significant difference between the two in the current sample. This is detailed in figure 3-3.


Figure 3-3: Graph showing the normalized count of the total number of landmark modifications made in children who have dyslexia (DYS) versus children who are typically developing (TYP). This graph suggests that DYS children potentially make approximately the same number of modifications as TYP children do. The normalized count was found by taking (deleted LMs + inserted LMs)/(realized LMs + deleted LMs)*100.


Figure 3-4: Graph describing the normalized counts of landmark modifications grouped by the lexical stress on the landmark. This compares DYS children to TYP children. Landmarks were categorized as primary stress (1), secondary stress (2), tertiary stress (3), unstressed (4), or ambisyllabic (a). This graph shows that DYS children potentially make more modifications to ambisyllabic landmarks than do TYP children.

We therefore looked into more features to see if any differentiating factors existed to separate the two groups.

Based on figure 3-4, which shows the normalized counts of landmark modifications grouped by the lexical stress on the landmark, it is possible that DYS children in the sample make more modifications to ambisyllabic landmarks than TYP children do. Once again, even though the error bars are large, we believe variation will decrease as more children are added to the sample and analyzed. This will hopefully highlight other differences in features that distinguish DYS and TYP children.

The hope is that we can use the trends mentioned above to distinguish between SLI, DYS, and ASD children if they belong to only one of those groups. These trends can not only help with diagnosing the children, but can also help a speech pathologist modify therapy exercises to accommodate the differences in modifications across the different groups.


Figure 3-5: Graph describing the normalized counts of modifications made to landmarks categorized by specific syllable constituents in the word, for ASD children versus ASD-DYS children. Landmarks were either part of the onset (o), the nucleus (n), an ambisyllabic phone (a), or the coda (c). This graph shows that potentially more modifications are made in the nucleus by ASD-DYS children than by ASD children.

3.2 Overlapping Diagnoses with Graphs

While diagnosing a child with a single diagnosis of SLI, DYS, or ASD is an accomplishment, our final goal is to be able to distinguish between the overlapping groups of these children. The overlap of disorders requires different therapies, and knowing that a child has multiple diagnoses can lead to better personalized therapy. For many of these groups, due to the small size of our population, we cannot make any final conclusions about the modifications made, but we can certainly use the initial patterns found to guide our search once more children are added.

When comparing ASD children to children who have both ASD and DYS, as in figure 3-5, most features were modified with the same normalized count. However, ASD-DYS children in the sample make more modifications to landmarks in the nucleus of a syllable than ASD children do. Speech therapists can use this information to potentially target words where the nucleus was modified and work with the child on improving these errors.


Figure 3-6: Graph describing the normalized counts of landmark modifications made in specific syllable positions, comparing ASD children to ASD-DYS children. This graph shows that ASD children potentially make more modifications in the 3rd syllable than ASD-DYS children do.


In addition, figure 3-6 suggests that ASD-DYS children potentially make fewer modifications in the 3rd syllable than children with ASD do in our sample. To detect patterns and eventually diagnose children, we need to pay attention to these specific contexts and make sure the system keeps track of which groups typically have a higher number of modifications in those contexts.

To differentiate between children with ASD and children with both ASD and SLI, we looked at the differences in the normalized counts of landmark modifications for varying lexical stress, in figure 3-7. Similar to the previous comparison, for most of the features, the numbers of modifications made were very close. However, it does seem as though children in the ASD-SLI group in our sample made more modifications to primary stressed landmarks than the children with ASD did. This gives us the opportunity to look further into the types of modifications made in these primary stressed syllables, and whether the trend persists as more children are added to the sample.


Figure 3-7: Graph describing the normalized counts for the number of landmark modifications made for different levels of stress, comparing ASD children to ASD-SLI children. This graph shows that ASD-SLI children potentially make more modifications to primary stress syllables than ASD children.
