Deeper learning at scale with roleplaying systems


Deeper Learning at Scale with Roleplaying Systems

by

Pablo José Ortiz-Lampier

B.S., University of California, Santa Barbara (2011)

S.M., Massachusetts Institute of Technology (2013)

Submitted to the Department of Electrical Engineering and Computer Science

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2021

© Massachusetts Institute of Technology 2021. All rights reserved.

Author . . . .

Department of Electrical Engineering and Computer Science

October 19, 2020

Certified by . . . .

D. Fox Harrell

Professor of Digital Media and Artificial Intelligence,

Comparative Media Studies Program, and

Computer Science and Artificial Intelligence Laboratory

Thesis Supervisor

Accepted by . . . .

Leslie A. Kolodziejski

Professor of Electrical Engineering and Computer Science,

Chair, Department Committee on Graduate Students


Deeper Learning at Scale with Roleplaying Systems

by

Pablo José Ortiz-Lampier

Submitted to the Department of Electrical Engineering and Computer Science on October 19, 2020, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

Contemporary online learning systems are increasingly important and common elements of post-secondary, workplace, and lifelong education. At present, these systems typically employ the banking model of education. While this method is quite effective for teaching foundational knowledge, it is ill-suited for fostering deeper learning, “...an umbrella term for the skills and knowledge that students must possess to succeed in 21st century jobs and civic life...” [218] including, among other things, critical thinking. To meet learners’ growing needs, we must go beyond the banking model of education and advance the state of the art. As a step toward this goal, I investigated how one might design online learning systems to scalably support critical thinking. Reflection, when not treated synonymously with critical thinking, is often cited as a key component of critical thinking. Thus, by working to support reflection in online learning systems, I work to support critical thinking skills at scale and, by extension, deeper learning. This dissertation contributes (1) a framework, grounded in roleplay theory & practice, for designing online learning systems that scalably support reflection, (2) novel systems that exemplify and operationalize this framework, (3) a method for effectively evaluating reflection at scale, and (4) an evaluation of the novel systems and design framework in terms of their ability to support reflection.

Thesis Supervisor: D. Fox Harrell

Title: Professor of Digital Media and Artificial Intelligence, Comparative Media Studies Program, and Computer Science and Artificial Intelligence Laboratory


Acknowledgments

There is an African proverb that states, “it takes a village to raise a child.” On my academic journey, I discovered that it also takes a village to complete a Ph.D. Fortunately, I have been blessed with an extraordinarily loving and supportive village. I simply could not have completed this thesis without them. Thus, I would like to both acknowledge their efforts and dedicate this work to them.

I dedicate this thesis to my wife, Kitty, who grants me strength, wisdom, and courage when mine is spent. Your advice, patience, and keen editorial eye supported me every step of the way. Since embarking on this long journey together, each day has been a treasure.

I dedicate this thesis to my mother, Minerva, who taught me how to live a dignified and fulfilling life. She was taken before her time, during my studies. Her passing has not prevented her teachings from guiding and fortifying me through life’s many challenges as they always have. I am with you, and you are with me, always. My mom, my hero, may you rest in peace.

I dedicate this thesis to my father, Michael, who defends, honors, and encourages me in all things. You have always urged and inspired me to be the best I can be, whether on this academic journey or any other venture I have undertaken. I pray that you will find the best of me in this work and be proud.

I dedicate this thesis to my brother, Daniel, and my sister-in-law, Alex, who have faith in me. Staunch allies, wise counsellors, kind hosts, you have been a blessing to me on this journey. May fortune favor you as I do.

I dedicate this thesis to my friends. You have helped me keep my sanity in this, often, crazy world we live in. Though life may scatter us like leaves on the wind, should you seek them, you will find our bonds of friendship ever unbroken.

Last but not least, I dedicate this thesis to my Ph.D. thesis supervisor, Fox, who helped me forge a new path forward when I had lost my way. With your support and guidance, I joined an amazing lab and conducted fulfilling, substantive research on a lifelong passion of mine. I thank you from the bottom of my heart.


Contents

1 Introduction
    1.1 Motivation
    1.2 Scope
    1.3 Vision
    1.4 Challenges
    1.5 Research Questions
    1.6 Hypotheses
    1.7 Contributions
    1.8 Overview

2 Theoretical Framework
    2.1 Overview of Key Terms: Deeper Learning, Reflection, and Roleplay
        2.1.1 Deeper Learning
        2.1.2 Reflection
        2.1.3 Roleplay
    2.2 Literature Review
        2.2.1 Human-Computer Interaction
        2.2.2 Education
        2.2.3 Game Design
    2.3 Summary

3 Roleplaying Systems
    3.1 Chimeria
        3.1.1 Branching
    3.2 Chimeria:Grayscale
        3.2.1 Overview
        3.2.2 Design Goals
        3.2.3 Project-Specific Background
        3.2.4 Roleplaying System Design
    3.3 Extensions
        3.3.1 Chimeria:Grayscale MOOC
        3.3.2 Reflective Prompts
    3.4 The Blue Room
        3.4.1 Overview
        3.4.2 Design Goals
        3.4.3 Project-Specific Background
        3.4.4 Roleplaying System Design
    3.5 Summary

4 Methods
    4.1 Overview
    4.2 Experimental Design
        4.2.1 Exploratory Studies
        4.2.2 Experiments
    4.3 Participants
        4.3.1 Amazon Mechanical Turk
    4.4 Measures
        4.4.1 Learning Activities Survey
        4.4.2 System Usability Scale
        4.4.3 AIRvatar: A Data Collection Framework
    4.5 Data Cleaning
    4.6 Data Analysis
        4.6.1 Reflexive Thematic Analysis
    4.7 Sample Size
        4.7.1 Power Analysis
    4.8 Summary

5 Results
    5.1 Chimeria:Grayscale Pilot Study
        5.1.1 A Note on Methods
        5.1.2 Participants
        5.1.3 Procedure
        5.1.4 Results
    5.2 Chimeria:Grayscale Framework Study
        5.2.2 Procedure
        5.2.3 Results
    5.3 Comparison Study of Reflection-Enabling Methods in HCI
        5.3.2 Procedure
        5.3.3 Results
    5.4 Chimeria:Grayscale Self-Debriefing Study
        5.4.2 Procedure
        5.4.3 Results
    5.5 Chimeria:Grayscale Reflective Prompts Study
        5.5.2 Procedure
    5.6 Chimeria:Grayscale Verification Study
        5.6.2 Procedure
        5.6.3 Results
    5.7 The Blue Room Pilot Study
        5.7.1 Participants
        5.7.2 Procedure
        5.7.3 Results
    5.8 Summary

6 P3 Framework
    6.1 Overview
    6.2 The Preparation Phase
        6.2.1 Designer Preparation
        6.2.2 Roleplaying System Preparation
    6.3 The Performance Phase
        6.3.1 Roleplaying System Facilitation
        6.3.2 Reflective Prompts
    6.4 The Processing Phase
        6.4.1 Self-Debriefing
    6.5 Summary

7 Discussion & Conclusion
    7.1 Overview of Contributions
    7.2 Applications
    7.3 Future Work
        7.3.1 Improved Deeper Learning at Scale
        7.3.2 Improved Roleplaying Systems
        7.3.3 Deeper Characterization of Roleplaying Systems


List of Figures

2-1 Systematic Review - Completed PRISMA flow diagram for the systematic review.

2-2 Systematic Review - Taxonomy of methods for evaluating reflection derived from the systematic review corpus.

2-3 Transformative Learning Theory & The Learning Activities Survey - The ten stages of perspective transformation.

3-1 Chimeria - Software architecture diagram for a typical Chimeria application.

3-2 Chimeria:Grayscale - High-level user flow diagram for Chimeria:Grayscale. The web pages labeled (1), (2), and (3) correspond to the login pages, the email client pages, and the credits pages. Roleplay participants transition from (1) to (2), and, finally, to (3) over the course of a single Chimeria:Grayscale experience.

3-3 Chimeria:Grayscale - High-level software architecture diagram for Chimeria:Grayscale. On the front end side, the major components of the web application are highlighted in blue. On the back end side, there is only one component: a single logging script.

3-4 Chimeria:Grayscale - Summary of ambivalent sexism as described by

3-5 Chimeria:Grayscale - Screenshot of Chimeria:Grayscale’s email client, Graymail. The purple labels correspond to the following elements of the interface: (1) primary inbox, (2) selected email, and (3) available responses.

3-6 Chimeria:Grayscale - Screenshot of the ‘Notes’ inbox in Chimeria:Grayscale’s email client, Graymail.

3-7 Chimeria:Grayscale - Screenshot of the first email presented to roleplay participants in Chimeria:Grayscale.

3-8 Chimeria:Grayscale - Screenshot of Chimeria:Grayscale’s login page.

3-9 Chimeria:Grayscale - Screenshot of an example of Chimeria:Grayscale’s final e-mail message.

3-10 Chimeria:Grayscale - Screenshot of another example of Chimeria:Grayscale’s final e-mail message.

3-11 Chimeria:Grayscale MOOC - Screenshot of the self-debriefing module.

3-12 Reflective Prompts - Screenshot of reflective prompt in Chimeria:Grayscale (see lower-right portion of image).

3-13 Chimeria:Grayscale - Screenshot of example narrative beat, the sort that was paired with a reflective prompt, in Chimeria:Grayscale.

3-14 Chimeria:Grayscale - Reflective prompt paired with narrative beat shown in Figure 3-13.

3-15 The Blue Room - High-level user flow diagram for The Blue Room. The web pages labeled (1), (2), and (3) correspond to the login page, The Blue Room page, and the credits page. Roleplay participants transition from (1) to (2), and, finally, to (3) over the course of their experience with The Blue Room.

3-16 The Blue Room - High-level software architecture diagram for The Blue Room. On the front end side, the major components of the web application are highlighted in blue. On the back end side, there is only one component: a single logging script.

3-17 The Blue Room - Screenshot of The Blue Room social media platform.

3-18 The Blue Room - Screenshot of welcome message presented to The Blue Room’s participants.

3-19 The Blue Room - Screenshot of The Blue Room’s login page.

3-20 The Blue Room - Screenshot of the inciting incident for the conflict between Riley and protagonist in The Blue Room.

3-21 The Blue Room - Screenshots of ending presented to participants that elect to carelessly escalate the protagonist’s conflict with Riley on The Blue Room social media platform. In this instance, the protagonist has been named “Jean” by a roleplay participant.

3-22 The Blue Room - Screenshots of ending presented to participants that initially elect to escalate the protagonist’s conflict with Riley on The Blue Room social media platform, but, subsequently, commit to de-escalating the conflict with Riley until the end of the experience. In this instance, the protagonist has been named “Jean” by a roleplay participant.

5-1 Chimeria:Grayscale Pilot Study - Histogram of participant ages.

5-2 Chimeria:Grayscale Pilot Study - (Left) Distribution of gender among study participants; (Right) Distribution of employment status among study participants.

5-3 Chimeria:Grayscale Pilot Study - LAS response pattern frequencies. X-axis labels are binary representations of LAS response patterns. The leftmost bit represents answers to the 1st LAS survey item, i.e., “...caused me to question the way I normally act.”. The 2nd-to-leftmost bit represents answers to 2nd LAS survey item, and so on. (0 = “No”; 1 = “Yes”)

5-4 Chimeria:Grayscale Pilot Study - Histogram of SUS scores

5-5 Chimeria:Grayscale Pilot Study - Post-Game Questionnaire results.

5-6 Chimeria:Grayscale Pilot Study - Social Presence Gaming Questionnaire results. Bars = subscale means. Error bars = subscale standard deviations.

5-7 Chimeria:Grayscale Framework Study - Histogram of participant ages.

5-8 Chimeria:Grayscale Framework Study - (Left) Distribution of gender among study participants; (Right) Distribution of employment status among study participants.

5-9 Chimeria:Grayscale Framework Study - Screenshot of Chimeria:Grayscale sans Particularization & Presencing.

5-10 Chimeria:Grayscale Framework Study - Screenshot of Chimeria:Grayscale sans Distancing.

5-11 Chimeria:Grayscale Framework Study - Histogram of perspective transformation stages inhabited by control condition participants.

5-12 Chimeria:Grayscale Framework Study - Histograms of perspective transformation stages inhabited by participants assigned to (upper left) the Chimeria:Grayscale sans Particularization & Presencing condition, (upper right) the Chimeria:Grayscale sans Personalization condition, (lower left) the Chimeria:Grayscale sans Intermixing condition, and (lower right) the Chimeria:Grayscale sans Distancing condition.

5-13 Chimeria:Grayscale Framework Study - Histogram of SUS scores

5-14 Chimeria:Grayscale Framework Study - Histogram of SUS scores obtained from study participants assigned to (upper left) the Chimeria:Grayscale sans Particularization & Presencing condition, (upper right) the Chimeria:Grayscale sans Personalization condition, (lower left) the Chimeria:Grayscale sans Intermixing condition, and (lower right) the Chimeria:Grayscale sans Distancing condition.

5-15 Chimeria:Grayscale Framework Study - Histogram of total elapsed times obtained from study participants assigned to the control condition.

5-16 Chimeria:Grayscale Framework Study - Histogram of total elapsed times obtained from study participants assigned to (upper left) the Chimeria:Grayscale sans Particularization & Presencing condition, (upper right) the Chimeria:Grayscale sans Personalization condition, (lower left) the Chimeria:Grayscale sans Intermixing condition, and (lower right) the Chimeria:Grayscale sans Distancing condition.

5-17 Chimeria:Grayscale Framework Study - Core GEQ results obtained from study participants assigned to the control condition. Bars = subscale means. Error bars = subscale standard deviations.

5-18 Chimeria:Grayscale Framework Study - Core GEQ results obtained from study participants assigned to (upper left) the Chimeria:Grayscale sans Particularization & Presencing condition, (upper right) the Chimeria:Grayscale sans Personalization condition, (lower left) the Chimeria:Grayscale sans Intermixing condition, and (lower right) the Chimeria:Grayscale sans Distancing condition. Bars = subscale means. Error bars = subscale standard deviations.

5-19 Comparison Study of Reflection-Enabling Methods in HCI - Screenshot of my Web-based version of the Ambivalent Sexism Inventory.

5-20 Comparison Study of Reflection-Enabling Methods in HCI - Screenshot of the results page shown to users who have completed my Web-based version of the Ambivalent Sexism Inventory.

5-21 Comparison Study of Reflection-Enabling Methods in HCI - Histogram of participant ages.

5-22 Comparison Study of Reflection-Enabling Methods in HCI - (Left) Distribution of gender among study participants; (Right) Distribution of employment status among study participants.

5-23 Comparison Study of Reflection-Enabling Methods in HCI - (Left) Histogram of the baseline results from the Chimeria:Grayscale Framework Study. (Right) Histogram of perspective transformation stages inhabited by study participants.

5-24 Comparison Study of Reflection-Enabling Methods in HCI - (Left) Histogram of baseline results from the Chimeria:Grayscale Framework Study. (Right) Histogram of SUS scores obtained from study participants.

5-25 Chimeria:Grayscale Self-Debriefing Study - Histogram of participant ages.

5-26 Chimeria:Grayscale Self-Debriefing Study - (Left) Distribution of gender among study participants; (Right) Distribution of employment status among study participants.

5-27 Chimeria:Grayscale Self-Debriefing Study - (Left) Histogram of the baseline results from the Chimeria:Grayscale Framework Study. (Right) Histogram of perspective transformation stages inhabited by study participants.

5-28 Chimeria:Grayscale Self-Debriefing Study - (Left) Histogram of baseline results from the Chimeria:Grayscale Framework Study. (Right) Histogram of SUS scores obtained from study participants.

5-29 Chimeria:Grayscale Self-Debriefing Study - Histogram of total elapsed times obtained from study participants.

5-30 Chimeria:Grayscale Reflective Prompts Study - Histogram of participant ages.

5-31 Chimeria:Grayscale Reflective Prompts Study - (Left) Distribution of gender among study participants; (Right) Distribution of employment status among study participants.

5-32 Chimeria:Grayscale Reflective Prompt Study - (Left) Histogram of the baseline results from the Chimeria:Grayscale Framework Study. (Right) Histogram of perspective transformation stages inhabited by study participants.

5-33 Chimeria:Grayscale Reflective Prompt Study - (Left) Histogram of baseline results from the Chimeria:Grayscale Framework Study. (Right) Histogram of SUS scores obtained from study participants.

5-34 Chimeria:Grayscale Reflective Prompts Study - Histogram of total elapsed times obtained from study participants.

5-35 Chimeria:Grayscale Verification Study - Histogram of participant ages.

5-36 Chimeria:Grayscale Verification Study - (Left) Distribution of gender among study participants; (Right) Distribution of employment status among study participants.

5-37 Chimeria:Grayscale Verification Study - (Left) Histogram of the baseline results from the Chimeria:Grayscale Framework Study. (Right) Histogram of perspective transformation stages inhabited by study participants.

5-38 Chimeria:Grayscale Verification Study - Histogram of total elapsed times obtained from study participants.

5-39 The Blue Room Pilot Study - Histogram of participant ages.

5-40 The Blue Room Pilot Study - (Left) Distribution of gender among study participants; (Right) Distribution of employment status among study participants.

5-41 The Blue Room Pilot Study - (Left) Histogram of the baseline results from the Chimeria:Grayscale Framework Study. (Right) Histogram of perspective transformation stages inhabited by study participants.

5-42 The Blue Room Pilot Study - (Left) Histogram of baseline results from the Chimeria:Grayscale Framework Study. (Right) Histogram of SUS scores obtained from study participants.

5-43 The Blue Room Pilot Study - Histogram of total elapsed times obtained


List of Tables

2.1 Systematic Review - Summary of the prevalence of measuring reflection and characterizing reflection in the systematic review corpus.

2.2 Systematic Review - Prevalence of studies that involved methods for measuring reflection.

2.3 Systematic Review - Prevalence of studies that involved methods for characterizing reflection.

2.4 Systematic Review - Ranked ordering of prevalence of each qualitative method for characterizing reflection not grounded in theory observed in the systematic review corpus.

4.1 Experiments - Summary of exploratory studies and experiments. (N=number of human subjects)

5.1 Chimeria:Grayscale Pilot Study - LAS results. (Column 1) perspective transformation stage; (Column 2) LAS item; (Column 3 - 6) # of participants by gender identity who responded ’Yes’

6.1 Guidelines for prospective roleplay designers to follow in order to


Chapter 1 Introduction

Online learning is increasingly important and commonplace. This form of learning, distinguished from other forms by being conducted over the Internet, manifests in many ways, e.g., flipped classrooms, massive open online courses (MOOCs), online workplace training, educational smartphone apps, webinars, and more.

The historical roots of online learning lie in distance education. The term distance education can be defined as, “...teaching and planned learning in which teaching normally occurs in a different place from learning, requiring communication through technologies as well as special institutional organization.” [141] The form taken by distance education over time has been highly dependent on available communication technology. The practice began in the early 18th century with teachers and students corresponding via mail. Much later, in the 20th century, the advent of computing and the Internet in particular paved the way for online learning, a special form of distance education and by far the most popular form at present [185]. Thus, in contemporary usage, the term distance education mostly refers to online learning.

Online learning has been gaining in popularity for many years, a trend reflected in studies conducted by both government and industry. A recent report published by the Babson Survey Research Group [177] based on data collected by the U.S. Department of Education’s National Center for Education Statistics (NCES), for example, revealed that students enrolled in degree-granting post-secondary institutions have been participating in distance education at a steadily increasing rate since 2002. In 2018, the most recent year for which NCES enrollment data is available, participation in distance education grew again by 4.2% among students enrolled in degree-granting post-secondary institutions to a total of 6,932,074 students. This figure represents 35.3% of all students enrolled in degree-granting post-secondary institutions in the United States of America that year [81]. Globally, similar trends have been observed [19]. As another example, a report published by LinkedIn Learning based on a survey study of organizations’ professional development practices [190] disclosed that organizations increasingly rely on online tools to foster professional development. Reportedly, 71% of survey respondents employed online tools developed in-house for this purpose, and 67% employed tools developed externally. In 2018, usage of in-house and externally developed online tools for fostering employee professional development had increased by 13 percentage points and 18 percentage points, respectively, over the previous year. As these examples demonstrate, online learning is becoming an increasingly common practice as demand for it grows in multiple domains.

Scale, in terms of student-to-teacher ratio, is a determining factor in the design of online learning experiences. It determines which pedagogical approaches may feasibly be deployed. At smaller scales (e.g., courses offered through learning management systems), online learning has been adopted widely and, when done well, has been shown to be at least as effective as traditional, face-to-face instruction [129, 185]. A broad range of pedagogical approaches has been used successfully at this scale. At larger scales, in contrast, effective online instruction has been challenging. Massive open online courses (MOOCs), in particular, have yet to realize the vision of democratized education that energized so many in the first half of this decade. Rather than advancing pedagogical innovation, online instruction at scale has stepped back from the trailblazing efforts of smaller-scale online instruction toward traditional pedagogies that scale well but do not fully meet the needs of 21st century learners [9, 51, 79, 226]. Thus, opportunities abound for innovating in this space toward high quality online learning at scale.

As the population of online learners grows, the challenge of delivering high quality online learning at scale becomes increasingly urgent. The urgency and importance of this challenge motivated this dissertation.

1.1 Motivation

Over the past couple of decades, organizations such as the World Economic Forum and the Partnership for 21st Century Learning have sought to identify the key competencies learners need to succeed in a rapidly changing, increasingly digital world. These competencies, referred to as 21st century skills [88, 220], should equip 21st century learners to face the demands of the global marketplace and to weather increasingly volatile career trajectories. At present, there is broad consensus regarding what most of these skills are, e.g., critical thinking, creativity, collaboration, a deep understanding of core material, etc.¹ The Hewlett Foundation popularized the term deeper learning (not to be confused with the subset of machine learning methods called deep learning) and uses it when referring to 21st century skills as a set of learning goals [218]. For the remainder of this dissertation, I shall eschew the term 21st century skills in favor of deeper learning. (See the §Deeper Learning section of the chapter that follows for an in-depth overview of deeper learning.)

Systems that support large-scale online learning could be improved to better meet the needs of 21st century learners. At present, the pedagogy of such systems is typically aligned with the banking model of education [80]. The banking model of education describes education as a process whereby educators, acting as authoritative sources of knowledge, transfer their knowledge to learners acting as passive receptacles for this knowledge [55]. It is an effective model for teaching introductory material, but it is ill-suited for fostering deeper learning. There are, of course, large-scale online learning systems that deviate from this model. A few notable examples include the MITx course Circuits and Electronics’ use of a web-based schematic capture and simulation tool [138] as well as the MITx course Introduction to Solid State Chemistry’s use of Cherner et al.’s virtual X-Ray Laboratory [33]. However, as noted by Guárdia, Maina and Sangrá; Cabrera and Fernández-Ferrer; and Zawacki-Richter et al. [25, 64, 226], large-scale online learning systems that adhere to the banking model of education remain pervasive.

¹ The list of 21st century skills identified as such varies slightly by organization. The list presented

To better meet the needs of 21st century learners, large-scale online learning systems could be further developed to support deeper learning, but there is little evidence-based guidance for how this might be accomplished. The Deeper Learning MOOC, developed by High Tech High [34], might have served as an ideal source of guidance for developing large-scale online learning systems that support deeper learning. However, as the course is no longer available and no formal assessment of the course was ever published, no guidance can be gleaned from this effort. Despite an extensive search, no evidence of similar efforts was found. Thus, there is a research gap. The pervasiveness of the banking model of education in large-scale online learning systems may be due, in part, to the ease with which it can be supported at scale. Suppose the same were true of deeper learning. With this thesis, I aim to push toward that reality and to further close the research gap.

1.2 Scope

The research gap I have identified has an associated research problem, i.e., how to design online learning systems that support deeper learning at scale. The problem is broad and important, and its impact is far-reaching. To address this problem with appropriate care and depth, this dissertation focuses on and explores a specific, critical subset of the identified research problem.

This dissertation focuses on the problem of scalably supporting a specific component of deeper learning for a particular group of learners via a certain class of online learning systems. The specifics of this focus are as follows. First, many luminaries on the topic of deeper learning assign substantial importance to the critical thinking competency [130, 144, 218]. When not conflated with critical thinking, reflection is frequently conceptualized as a key component of critical thinking [38]. As such, this dissertation adopts the stance that, by developing scalable support for reflection in online learning systems, scalable support for critical thinking is developed by extension. With such support in place, a learner’s ability to reflect in relation to a particular context or subject area can be developed through practice. (See the §Reflection section of the chapter that follows for an in-depth overview of reflection.) Next, deeper learning research to date has primarily focused on learners who are minors, neglecting adult learners in immediate need of deeper learning competencies. Demand for the skills associated with deeper learning is increasing and ongoing [190]. As such, there is a need for working adults and soon-to-be working adults to obtain these skills as soon as possible. As the context and characteristics of adult learners can differ substantially from those of primary and secondary school learners, strategies for supporting deeper learning for adults should be considered and developed separately. This dissertation adopts this stance in pursuit of developing scalable support for deeper learning for adult learners, a population that also warrants robust support. As a consequence of the language proficiencies of the author, this dissertation will specifically focus on English-speaking adult learners. Finally, if the involvement of individuals other than learners is required during the learning process, this requirement represents a bottleneck for the scalability of online learning systems. While human beings are capable of feats that computers simply cannot do, computers are not limited in terms of availability, consistency, processing speed, etc. in the same way human beings are. To maximize the efficacy of online learning systems, it is important to understand which tasks can be delegated to software and which tasks must be done by people. This dissertation focuses on online learning systems that do not involve other people in a learner’s learning process, i.e., autonomous online learning systems. This focus allowed me to explore software solutions for scaling support for deeper learning in online learning systems and to identify scalability bottlenecks that require human involvement.

In short, this dissertation will address the problem of scalably supporting reflection for English-speaking adult learners via autonomous online learning systems.


1.3 Vision

Roleplay theory and practice can serve as the foundation for a framework for designing autonomous online learning systems that support reflection at scale. Roleplay², as a method for supporting reflection, has a long history in clinical [41, 222] and educational settings [5, 86, 171, 194]. In these settings, roleplays are designed, administered, and evaluated by facilitators. The design, deployment, and evaluation of effective roleplays is quite challenging. Fortunately, a savvy facilitator may consult published guidelines to navigate this process.

Suppose that an autonomous online learning system were capable of administering and evaluating a roleplay. Suppose that its design were grounded in published guidelines for the design of effective roleplays. Let us call this class of online learning system a “roleplaying system.” Such a system, in principle, would be capable of supporting reflection at scale.

As a step toward deeper learning at scale, I researched how one might design roleplaying systems. As part of this effort, I developed a framework for designing roleplaying systems. This framework was subsequently evaluated and contextualized as a contribution to both human-computer interaction (HCI) and education.

1.4 Challenges

There are unique challenges involved in investigating how one might design roleplaying systems. This dissertation tackles the following challenges and contributes actionable insight regarding how others might do the same:

1. Evaluating reflection at scale. Reflection is difficult to evaluate, and evaluating reflection at scale introduces another layer of difficulty. The methods for evaluating reflection reported on in the most recent review of the HCI literature on reflection [14] are ill-suited for evaluating reflection at scale. Reportedly, qualitative assessments of reflection are common. However, such assessments are costly in terms of time and human effort. Thus, qualitative assessment at scale is likely to be prohibitively expensive. Reportedly, quantitative assessments of reflection make use of dubious methods. Such assessments involve measures of other outcome variables and flimsy justifications for their causal relationship to reflection. Thus, whether qualitative or quantitative, there does not yet appear to be a method for evaluating reflection in the HCI literature that is suitable for evaluating reflection at scale. To address this challenge, I shall need to develop or identify, either from more recent HCI research or from research farther afield, a suitable quantitative measure of reflection. This measure must be grounded in theory, and its theoretical grounding must be clearly communicated. (See the §Evaluating Reflection: A Systematic Review section of the chapter that follows for details on how I accomplished this.)

2. Assumption of real-world setting in published guidelines. Published guidelines for the design of effective roleplays assume that roleplays will be administered in real-world settings. As roleplaying systems administer roleplays in virtual settings, published guidelines may not fully apply to the design of roleplaying systems as-is. To address this challenge, I shall need to develop a mapping from published guidelines to software design principles. (See the §P3 Framework chapter for the final result of this development effort.)

² The terms ‘roleplay’ and ‘simulation’ are often used interchangeably in the literature. As ‘simulation’ is far more semantically overloaded, I shall use ‘roleplay’ in most cases. When the term ‘simulation’ appears in the text, I shall attempt to clearly communicate what the term means in context.

1.5 Research Questions

∙ RQ1: What are the primary disadvantages of current methods for evaluating reflection in human-computer interaction research, and how might these disadvantages be addressed in order to facilitate the study of reflection at scale?

∙ RQ2: To what extent do roleplaying systems support learners’ reflections?

1.6 Hypotheses

∙ Hypothesis 1: There exists a method for measuring reflection that is suitable for evaluating reflection at scale.

∙ Hypothesis 2: There exists a mapping from roleplay theory and practice to software design such that the resulting set of software design principles has the following property: roleplaying systems developed using these principles can scalably and autonomously support reflection.

1.7 Contributions

This dissertation makes the following contributions toward achieving deeper learning at scale:

∙ A framework for designing roleplaying systems.

∙ A method for effectively evaluating reflection at scale.

∙ A few roleplaying systems.

∙ An evaluation of each roleplaying system in terms of its ability to support reflection on its central themes.

∙ An evaluation of my design framework in terms of its ability to produce autonomous online learning systems that scalably support reflection.

1.8 Overview

The remainder of this dissertation is organized as follows. Chapter 2 synthesizes related background knowledge. Chapter 3 presents the roleplaying systems developed for this dissertation in detail. The dissertation continues by detailing my research methods (Chapter 4), experimental results (Chapter 5), and design framework (Chapter 6). It concludes with a discussion of application areas and directions for future research (Chapter 7).

Chapter 2 Theoretical Framework

This chapter describes the theoretical foundations of this dissertation. The chapter begins with an overview of three concepts fundamental to understanding this dissertation: deeper learning, reflection, and roleplay. Deeper learning is an educational movement and set of learning goals that this dissertation seeks to facilitate at scale in online learning contexts. This dissertation focuses on the reflection component of deeper learning and seeks to enable it in online learning systems primarily through roleplay. After introducing these concepts, the chapter presents a review of related research. From HCI, it details research on reflection, including an in-depth systematic review of current methods for evaluating reflection in HCI research. This section is the longest because it is my core research area. Next, from education, the chapter introduces Transformative Learning Theory and debriefing practices drawn specifically from medical education. The former is a theory of adult education that undergirds this dissertation’s method for evaluating reflection. The latter is a practice employed by this dissertation to augment the reflection enabled by roleplay. From game design, the chapter describes related research on serious games including, in particular, the Embedded Design Model. The Embedded Design Model, a set of design methods for increasing engagement with the content of serious games, was used in the design of one of the roleplaying systems developed as part of this dissertation. Finally, the chapter concludes with a recapitulation of the chapter’s contents.

2.1 Overview of Key Terms: Deeper Learning, Reflection, and Roleplay

This section presents an overview of three concepts fundamental to understanding this dissertation: deeper learning, reflection, and roleplay. The section presents each concept by, first, defining it and then describing relevant, associated practices in detail. Prior work related to the concepts presented in this section is described in a separate section of this chapter (see the §Literature Review section for relevant prior work organized by discipline).

2.1.1 Deeper Learning

Deeper learning (not to be confused with the subset of machine learning methods called deep learning) is a term that represents an ambition for U.S. education — an ambition to progress U.S. education beyond rote learning for as many learners as possible. Rote learning, a type of learning where knowledge is acquired through repetition and memorization [128], is considered by many educational organizations, including the William & Flora Hewlett Foundation [218] and the National Research Council (NRC) of the U.S. National Academies [144], to be insufficient for preparing learners for 21st century life on its own.

The ambition to progress education beyond rote learning is not new. In fact, though the precise terminology used often differs, there have been advocates for this vision for quite some time. Troubled by the factory-inspired model of organization and pedagogy adopted by U.S. schools post-industrialization, John Dewey, a late nineteenth century and early twentieth century philosopher-educator who had a profound impact on educational thinking, advocated for a shift away from rote learning toward more interdisciplinary, hands-on, collaborative curricula, ones that would allow teachers and learners to together engage in learning both practical and deep [36, 130]. More contemporary advocates for progress often situate the term rote learning (or a synonym) in opposition to an endorsed alternative, e.g., rote learning vs. meaningful learning [128], surface learning vs. deep learning [15], etc. While the ambition to progress education beyond rote learning is not new, the widespread expectation that educational institutions in the U.S. fulfill this ambition for all of their students is [130]. Deeper learning advocates seek to meet this expectation.

Definition

The term deeper learning, though popularized by the William & Flora Hewlett Foundation [217], has no universally accepted definition. The absence of a universally accepted definition for deeper learning is due, in part, to the Hewlett Foundation’s “...calculated decision not to force a particular pedagogy or brand of deeper learning on its grantees and the stakeholders with which they work and communicate...” Reportedly, however, these selfsame grantees and stakeholders greatly desire such a definition [212]. Until a definition for deeper learning gains widespread acceptance, it is imperative for works such as this one that make use of the term to clarify its meaning in context.

There are multiple extant definitions of deeper learning. The Hewlett Foundation, for example, uses the term deeper learning when referring to a specific set of learning goals aimed at preparing learners for 21st century life: mastery of core academic content, thinking critically and solving complex problems, working collaboratively, communicating effectively, learning how to learn, and developing academic mindsets [218]. The NRC defines deeper learning as, “...the process through which a person becomes capable of taking what was learned in one situation and applying it to new situations - in other words, learning for ‘transfer.’” [144] They assert that this process leads learners to develop “...21st century competencies¹ - transferable knowledge and skills.” Mehta and Fine’s conceptualization of deeper learning, described in their book In Search of Deeper Learning: The Quest to Remake the American High School [130], builds on the Hewlett Foundation’s definition, the NRC’s definition, and antecedents that, in a fundamental way, made a distinction between “deep” and shallow learning, i.e., rote learning. They posit that deeper learning occurs at the intersection of mastery (i.e., expertise, transfer of learning), identity (i.e., intrinsic motivation, perceived relevance of material, integration of material with core identity), and creativity (i.e., ability to act on or make something with learned material).

¹ The NRC uses the term “21st century competencies” rather than “21st century skills” to include

In many respects, the definitions of deeper learning described above are quite similar. Each is premised on the notion that education must progress beyond rote learning for as many learners as possible, and each lays out a set of learning goals related to this notion [130, 144, 218]. What distinguishes these definitions from each other is their focus (e.g., transfer) and their associated list of learning goals. Crucially for this dissertation, each conceptualization of deeper learning described above assigns substantial importance to critical thinking and, by extension, reflection (see the §Reflection subsection). For the Hewlett Foundation and the NRC, critical thinking is a core competency associated with deeper learning [144, 218]. For Mehta and Fine, minimal critical thinking in classrooms is treated as a sign of minimal deeper learning in classrooms [130]. As such, though this dissertation is primarily informed by the Hewlett Foundation’s definition of deeper learning, its contributions towards supporting critical thinking at scale are of value to, among others, deeper learning advocates belonging to different definitional camps.

Target Audience

Efforts related to deeper learning, i.e., outreach, research, deployments, etc., have thus far primarily focused on deeper learning for primary and secondary school students. This focus is made explicit in reports commissioned on deeper learning [43, 145, 212]. A focus on primary and secondary school students entails certain assumptions, for example, about the sites of deeper learning (e.g., schools). These assumptions, in turn, inform strategic thought regarding how to deliver deeper learning experiences to learners at scale (e.g., changing school district policies). However, these assumptions may not hold for all learners, and the associated strategies for supporting deeper learning may be ineffective for learners who are not primary and secondary school students. Efforts to support deeper learning for primary and secondary school learners are incredibly important. They aim to prepare future generations to meet the demands of the 21st century. Recall, however, that the term deeper learning represents an ambition to progress education beyond rote learning for as many learners as possible. A focus on primary and secondary students will necessarily result in less support for other groups of learners for whom deeper learning should be supported.

Though considered less frequently in related literature, one group of learners that warrants robust support for deeper learning is adult learners. Demand for the skills associated with deeper learning is increasing and ongoing [190]. As such, there is a need for working adults and soon-to-be working adults to obtain these skills as soon as possible. As the context and characteristics of adult learners can differ substantially from those of primary and secondary school learners, strategies for supporting deeper learning for adults should be considered and developed separately. This dissertation adopts this stance in pursuit of developing scalable support for deeper learning for adult learners.

***

At present, there is little evidence to suggest that deeper learning has become commonplace in K-12 schools (i.e., schools that support kindergarten through 12th grade education) [130, 212]. Even less evident is the prevalence of deeper learning in online learning systems, as deeper learning is studied infrequently in these contexts [181]. As such, a strong motivation for the research presented in this dissertation is that, despite the growing demand and need for deeper learning, it is rarely delivered at any scale, much less at the massive scale potentially enabled by online learning systems.

As alluded to above, this dissertation is informed by the Hewlett Foundation’s definition of deeper learning and specifically focuses on the critical thinking component of this construct. The subsection that follows describes how critical thinking is conceptualized in this dissertation.

2.1.2 Reflection

Critical thinking, similarly to deeper learning, has no universally accepted definition. Definitions tend to be, “...quite disparate and narrowly field dependent.” [158] Thus, it is important to clarify the meaning of the term in the context of this dissertation. The terms critical thinking and reflection are often conflated [28]. When not conflated with critical thinking, reflection is frequently conceptualized as a key component of critical thinking [38]. As such, this dissertation adopts the stance that by developing support for reflection, support for critical thinking is developed by extension. Reflection, thus, will be the focus for the remainder of the dissertation.

Definition

The term reflection (not to be confused with optical reflection or the programming language feature) has multiple extant definitions among scholars. The following are a few notable examples. John Dewey defined reflection as, “...active, persistent, and careful consideration of any belief or supposed form of knowledge in the light of the grounds that support it, and the further conclusions to which it tends.” [44] This definition of reflection is useful for understanding the concept, but quite broad in scope. Similarly broad in scope, Boud et al. define reflection as, “...an important human activity in which people recapture their experience, think about it, mull it over and evaluate it.” [16] In contrast to Dewey’s definition, this definition assumes that reflection is an activity conducted in response to an experience.

Though numerous types of reflection have been identified by scholars, this dissertation primarily focuses on critical self-reflection. Building on Dewey’s definition of reflection, critical self-reflection, as conceptualized by Mezirow, can be defined as a type of reflection characterized by an individual’s reexamination of the presuppositions that inform their own beliefs, thoughts, and actions [133]. As will be shown in the §Literature Review section, critical self-reflection, in conjunction with Transformative Learning Theory, provides us with the theoretical tools necessary to systematically evaluate software that seeks to support reflection.

The two remaining types of reflection relevant to this dissertation are reflection-in-action and reflection-on-action. Donald Schön, in his seminal work The Reflective Practitioner: How Professionals Think in Action [174], made a temporal distinction between two types of reflection, i.e., reflection-in-action and reflection-on-action. Reflection-in-action refers to reflection on events as they are occurring, and reflection-on-action refers to retrospective reflection on events. Beyond this chapter, when the term reflection is used in this dissertation, what is being discussed is reflection-on-action (more specifically, retrospective critical self-reflection). In the special case where reflection-in-action is being discussed, it will be explicitly referred to as such.

***

In traditional contexts, many activities such as reflective writing and roleplay have proved effective in supporting reflection. The subsection that follows presents an overview of the latter activity, roleplay, the primary method operationalized in this dissertation in pursuit of supporting reflection at scale.

2.1.3 Roleplay

Roleplay has varying conceptualizations depending on context. Thus, for the sake of clarity, it is important to state that this dissertation draws on conceptualizations of roleplay from psychology. Other contexts, e.g., roleplaying games, conceive of roleplay quite differently and have no bearing on the contents of this dissertation.

Definition

Yardley-Matwiejczuk defines roleplay as follows in Role Play: Theory and Practice: “...roleplay or simulation techniques are a way of deliberately constructing an approximation of aspects of a ‘real life’ episode or experience, but under ‘controlled’ conditions where much of the episode is initiated and/or defined by the experimenter or therapist.” [222, p. 1] During a roleplay, participants, “...will act for a limited time ‘as if’ the acted-out situation were real.” [41, p. 6]

In psychotherapy, roleplay is used as a means of diagnosis, as a means of instruction, and as a means of training [41, p. 6]. For the purposes of this dissertation, roleplay’s usage as a means of instruction and as a means of training is its most salient feature. To dispel potential confusion related to the terms instruction and training, I shall clarify their meaning in this context. Instruction, in this context, refers to demonstrating a behavior to an individual. Training, in this context, refers to teaching an individual a new behavior. Typical subjects for roleplay-based instruction and training include self-understanding, social skills, and more.

Roleplay-based instruction and training are used to facilitate a process not unlike the process of perspective transformation described by Transformative Learning Theory (see the §Transformative Learning subsection). Roleplay introduces individuals to a disorienting dilemma, facilitates reflection on the dilemma, and guides individuals toward learning objectives.

Terms

For the sake of clarity, this subsection defines the following terms as they are used in this dissertation: roleplay designer, roleplay facilitator, and roleplay participant. A roleplay designer is an individual that designs a roleplay or roleplaying system. A roleplay facilitator is an entity that administers a roleplay. Traditionally, this entity is a human being. However, in this dissertation, the task of roleplay facilitator is fulfilled by autonomous online learning systems, i.e., roleplaying systems. A roleplay participant is an individual that participates in a roleplay who is neither a roleplay designer nor a roleplay facilitator for that roleplay. I defined these terms, as shown above, in order to establish a simple, common language for discussions of roleplay both in this dissertation and elsewhere.

Roleplay Induction

Designing effective roleplays is challenging. In Role Play: Theory and Practice, Yardley-Matwiejczuk observed that the quality of a roleplay is affected by the quality of the procedure for inducting participants into that roleplay. Effective roleplay induction improves engagement with and, thus, the overall impact of a roleplay on participants.

Yardley-Matwiejczuk identified three major roleplay induction principles in her work: Particularization, Presencing, and Personalization. Particularization is a principle concerned with the process by which all facets of a roleplay scenario that a roleplay participant should be aware of are explicitly detailed to that participant. Presencing is a principle concerned with the process by which all facets of a roleplay scenario are granted a degree of familiarity and reality in the eyes of the roleplay participant. Personalization is a principle concerned with the degree to which particularized content is drawn from the participants themselves. There are design guidelines associated with each of the above roleplay induction principles. By following these guidelines, one can deploy Yardley-Matwiejczuk’s roleplay induction principles in a roleplay, thereby improving participant engagement and the overall impact of the roleplay.

Roleplay induction principles feature prominently in the roleplaying systems developed as part of this dissertation. As there is little evidence that these principles have been used in software design prior to this dissertation, these roleplaying systems serve as case studies for the efficacy of roleplay induction in digital spaces. The principles and guidelines that worked well were distilled into my framework for designing roleplaying systems.

***

This section presented an overview of deeper learning, reflection, and roleplay. Each of these concepts is essential for understanding this dissertation and its potential impact. The section that follows details relevant prior work related to this dissertation.

2.2 Literature Review

This section presents an overview of prior work related to this dissertation, organized by discipline. First, the section describes existing research on reflection in the field of HCI, including a systematic review of methods for evaluating reflection. Next, from the field of education, this section details Transformative Learning Theory, the theory of adult education undergirding this dissertation’s method for evaluating reflection, and debriefing, a method used in this dissertation to support reflection. Finally, the section concludes with a discussion of prior research on game design relevant to this dissertation.

2.2.1 Human-Computer Interaction

A frequent topic of inquiry in HCI research is “how to design systems (i.e., software or hardware) that achieve a desired outcome for the people that use them, e.g., usability, efficiency, or enjoyment.” To address such research topics, HCI researchers often design, develop, and evaluate proof-of-concept systems. As HCI is a multidisciplinary field, the methods used to design, develop, and evaluate such systems can be as variable as the disciplinary background of the researchers themselves. This fact certainly holds true for HCI research on the topic of reflection.

HCI researchers have been interested in reflection as a desirable system design outcome for many years, since at least the early 2000s. Seminal publications in this research area include Hallnäs and Redström’s 2001 paper on slow technology [67], Sengers et al.’s 2005 paper on reflective design [180], and Li et al.’s 2010 paper on personal informatics [111]. As these concepts proved to be rather impactful in this area, it is worth describing them in detail. Hallnäs and Redström defined slow technology as, “...technology aimed at reflection and moments of mental rest rather than efficiency in performance.” [67] Slow technology, as conceived by Hallnäs and Redström, should invite people to reflect through interactions intentionally designed to be slow along a number of dimensions. Reflective design, as conceptualized by Sengers et al., advocates for system designers to reflect on the values embedded in systems as well as the practices systems support. It also advocates for reflection as a core system design outcome for HCI. Li et al. defined personal informatics systems as, “...those that help people collect personally relevant information for the purpose of self-reflection and gaining self-knowledge.” [111] Synonymous terms for personal informatics include “the quantified self,” “self-tracking,” etc. Li et al. [111] introduced a stage-based model of personal informatics systems that describes the dynamic between such systems and the people that use them over time. Of the three concepts described above, this dissertation is most closely aligned with reflective design both in terms of philosophy and aims.

Within HCI, the body of research addressing reflection spans a diverse set of application areas including health [39, 121, 162, 165], education [56, 151, 170, 225], design [7, 189, 223], art [89, 137, 179], and more. Reflection, in this body of work, is valued as a means of enabling conceptual change, as a means of enabling behavioral change, and as a good in itself. Mamykina et al. [121, 122], for example, developed MAHI, a health-monitoring application that aims to help newly diagnosed individuals with diabetes to improve their ability to reflect on past diabetes-related experiences. The transition to a life managing a chronic illness can require significant conceptual and emotional change. This ability to reflect on past experiences is considered essential for making this transition and engaging in effective diabetes management [123]. To meet its design goal, MAHI affords individuals the ability to record their diabetes-related experiences and discuss them with diabetes educators on demand. A more recent example, ClassBeacons, developed by An et al. [2], is, “...a system that uses spatially distributed lamps to depict teachers’ ongoing performance on how they have divided their time and attention over students in the classroom.” The aim of the system is to support teachers’ reflection-in-action with respect to their performance during classroom teaching without increasing their cognitive load. The ability to reflect-in-action in this manner is widely considered to be a key competency for teachers [76].

Though the topic of reflection has generated a wealth of research from the HCI community, there are pervasive methodological issues that limit progress in this area. As noted by Sengers et al. [180] and Baumer et al. [14], a precise definition for reflection is rarely presented in HCI research. As a result, evaluations of reflection may be questionable and comparisons between findings may be intractably difficult.

This dissertation addresses these methodological issues by being grounded in a specific conceptualization of reflection, i.e., critical self-reflection, and by employing a method for evaluating reflection appropriate both for the selected conceptualization of reflection and for large-scale applications. The subsection that follows describes how this method was selected and what was learned in the process.

Evaluating Reflection: A Systematic Review

This subsection presents the systematic review conducted to address Research Question #1 (RQ1). It begins by defining systematic review as the term is used here. Afterward, the subsection presents the procedure followed for conducting the systematic review as well as findings.

Overview

Recall the following from the §Introduction chapter:

RQ1: What are the primary disadvantages of current methods for evaluating reflection in human-computer interaction research, and how might these disadvantages be addressed in order to facilitate the study of reflection at scale?

To address RQ1, I conducted a systematic review of the HCI literature on reflection. Systematic reviews (a.k.a. systematic literature reviews) are a type of literature review used to systematically explore specific, clearly defined research questions using evidence drawn from existing research. The distinctions that separate systematic reviews from more standard literature reviews are as follows: (1) Systematic reviews are conducted by following strict, systematic, replicable protocols. (2) Systematic reviews are used to answer specific, pre-defined research questions.

The aim of the systematic review was to critically evaluate how HCI researchers have evaluated reflection in their research. For the following reasons, addressing RQ1 necessitated such a review. First, the most recent published review of the HCI literature on reflection was conducted in 2014. Written by Baumer et al. [14], it was a review of the literature up to and including all papers published in 2013. In the intervening time, the HCI literature on reflection has expanded substantially. Thus, it was important to update the findings reported by Baumer et al. Lastly, prior reviews of the HCI literature on reflection were general reviews of the role of reflection in research. To date and to the best of my knowledge, no review has been published that is narrowly focused on how HCI researchers evaluate reflection in their work. Such a review is required in order to understand current methods for evaluating reflection in HCI research.

Procedure

This subsection details the procedures I followed when conducting the systematic review including procedures for corpus generation and data analysis.

Corpus Generation

To generate a corpus of publications for the systematic review in a replicable, systematic manner, I followed the protocol described in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement [139]. The PRISMA Statement was designed to help researchers improve the quality of reported systematic reviews and meta-analyses. The statement consists of a 27-item checklist and a flow diagram (see Figure 2-1). The PRISMA flow diagram is the portion of the PRISMA Statement most relevant to corpus generation. It presents a step-by-step procedure for generating a corpus for the purpose of systematic review. This procedure has four distinct phases: the Identification phase, the Screening phase, the Eligibility phase, and the Included phase. I followed this exact procedure when generating a corpus for the systematic review.

During the corpus generation procedure, I used the same search and selection criteria for research papers that Baumer et al. [14] used in their review of the HCI literature on reflection.² This choice had several advantages. First, using the same criteria as the most recently published review eliminated the need to develop a robust set of search and selection criteria. Second, as Baumer et al.’s search criteria placed no limits on publication year, using their search criteria as-is expanded the corpus generated by their study to include recent publications. Thus, this systematic review would build on prior work in the field. Finally, using Baumer et al.’s search criteria as-is allowed me to analyze the results of the most recent review of the HCI literature on reflection in greater depth in relation to my research question. For the sake of completeness, I included all papers reviewed by Baumer et al. in the final corpus regardless of whether or not I deemed that they met the selection criteria described by Baumer et al.

² Due to changes to the ACM Digital Library’s advanced search feature, it is no longer possible to perfectly replicate the search results that constitute the initial corpus. The search query I submitted is no longer valid.

Figure 2-1: Systematic Review - Completed PRISMA flow diagram for the systematic review.

The initial corpus was generated as follows (cf. the Identification phase shown in Figure 2-1). On October 14, 2019, I submitted a search query to the ACM Digital Library requesting all papers published under the H.5 classification (cf. the ACM Computing Classification System) that used “reflect,” “reflecting,” or “reflection” as a keyword. This search query generated 639 results. 44 of the publications reviewed in Baumer et al.'s systematic review were missing from these results. To ensure that I was building on prior work, I treated the missing publications as “additional records” and added them to the initial search results. This resulted in an initial corpus of 683 papers.

The selection criteria used to prune the initial corpus were as follows. (1) Papers needed to involve studies of one or more individuals engaging in reflection. (2) Papers could not themselves be the reflections of the author or authors on some topic. (3) Papers needed to be about reflection as understood by Dewey [44], Boud [16], etc., and not, for example, optical reflection or the programming language feature of the same name (i.e., reflection).

These selection criteria were applied as follows (cf. the Screening, Eligibility, and Included phases shown in Figure 2-1). First, all duplicate papers were removed from the corpus. This reduced the size of the corpus to 626 papers. Second, the metadata of each paper in the reduced corpus was manually compared against the selection criteria. If a paper's metadata indicated that the paper did not meet the selection criteria, it was removed from the reduced corpus. This further reduced the size of the corpus to 277 papers. Finally, the full text of each of the 277 remaining papers was manually compared against the selection criteria. Again, if the full text of a paper did not meet the selection criteria, it was removed from the corpus. This resulted in a final corpus of 151 papers. A summary of the entire process is presented as a completed PRISMA flow diagram in Figure 2-1.
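The corpus counts reported above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the step labels and the names `FlowStep`, `corpus_flow`, and `removed_per_step` are assumptions introduced here, and only the counts (639, 44, 626, 277, 151) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowStep:
    """One step of the corpus generation procedure (labels are illustrative)."""
    label: str
    remaining: int


def corpus_flow(search_hits, additional_records, after_dedup,
                after_metadata_check, after_full_text_check):
    """Build the sequence of corpus sizes and check that it never grows."""
    initial = search_hits + additional_records
    counts = [initial, after_dedup, after_metadata_check, after_full_text_check]
    for earlier, later in zip(counts, counts[1:]):
        if later > earlier:
            raise ValueError("a pruning step cannot enlarge the corpus")
    labels = [
        "initial corpus (search results + additional records)",
        "after duplicates removed",
        "after metadata compared against selection criteria",
        "after full text compared against selection criteria",
    ]
    return [FlowStep(label, n) for label, n in zip(labels, counts)]


def removed_per_step(steps):
    """Number of papers removed between each pair of consecutive steps."""
    return [a.remaining - b.remaining for a, b in zip(steps, steps[1:])]


# Counts as reported in the text.
steps = corpus_flow(639, 44, 626, 277, 151)
removed = removed_per_step(steps)

for step in steps:
    print(f"{step.remaining:4d}  {step.label}")
print("removed at each step:", removed)
```

Under these counts, 57 papers were removed as duplicates, 349 on the basis of metadata, and 126 on the basis of full text.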

Data Analysis

A quantitative approach to synthesizing systematic review results, i.e., meta-analysis, would have been inappropriate for this systematic review. Meta-analysis is a statistical analysis that aggregates data from similar studies and produces pooled estimates of measures common to these studies. The purpose of a meta-analysis is to produce estimates that are closer to true values than the estimates produced by any one study included in the analysis. Such a statistical method is not applicable to the data collected for this systematic review. This data, i.e. methods for evaluating reflection, is strictly qualitative. Therefore, a qualitative approach to synthesizing the systematic review results was most appropriate for this systematic review.
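To make concrete why meta-analysis does not fit this data, the sketch below shows one common way of pooling estimates, namely fixed-effect inverse-variance weighting. It is not a method used in this review; the function name `pooled_estimate` and the example numbers are hypothetical. The point is that every study must supply a numeric estimate and its variance, which methods for evaluating reflection do not.

```python
def pooled_estimate(estimates, variances):
    """Fixed-effect inverse-variance pooling of per-study estimates.

    Each study is weighted by 1 / variance, so more precise studies
    contribute more to the pooled value.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate, and at least one study")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_variance = 1.0 / total_weight
    return pooled, pooled_variance


# Hypothetical effect sizes and variances from three similar studies.
study_estimates = [0.2, 0.5, 0.4]
study_variances = [0.04, 0.01, 0.02]

pooled, pooled_variance = pooled_estimate(study_estimates, study_variances)
print(f"pooled estimate: {pooled:.3f}, variance: {pooled_variance:.4f}")
```

The pooled variance is smaller than the variance of any single study in the example, which is the sense in which a pooled estimate is expected to be closer to the true value.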

To synthesize the results of this systematic review into useful insights, I used reflexive thematic analysis [17].

Reflexive Thematic Analysis. Thematic analysis is a widely used method of analyzing qualitative data that focuses on identifying themes within a data set. There are a number of extant methods for conducting a thematic analysis, each with different procedures and underlying philosophies. Thus, thematic analysis can be understood as a generic term for a set of approaches to analyzing qualitative data with a focus on identifying themes within a data set [197]. The type of thematic analysis used in this dissertation is called reflexive thematic analysis.

Reflexive thematic analysis is a theoretically flexible, systematic method for conducting a thematic analysis. Theoretically flexible, in this case, means that the method is compatible with numerous theoretical frameworks and can be used to answer a wide array of research questions [197]. Braun & Clarke first introduced this method in their seminal 2006 paper [17]. Reflexive thematic analysis consists of six phases: (1) familiarization with the data, (2) coding, (3) generating initial themes, (4) reviewing themes, (5) defining and naming themes, and (6) the write-up. During each phase, depending on what is learned, researchers may return to a previous phase of the process to refine their findings. Thus, reflexive thematic analysis is characterized by iterations of some ordered subset of its six phases. These iterations end when the findings have been sufficiently refined.

Results

This subsection presents the results of the reflexive thematic analysis. After describing the distinction between measuring and characterizing reflection, the subsection summarizes how HCI researchers have been evaluating reflection in their published work.

Measuring vs. Characterizing Reflection

One major insight garnered from the reflexive thematic analysis was the distinction between measuring reflection and characterizing reflection. I define measuring reflection as the act of evaluating depth of reflection, and I define characterizing reflection as the act of evaluating the themes present in reflections. Of the studies present in the systematic review corpus that evaluate reflection, each one either measured reflection, characterized reflection, or both. No other dimensions of reflection were evaluated besides depth and theme.

Studies where reflection was measured were far less common than studies where reflection was characterized. Table 2.1 summarizes the prevalence of measuring reflection and characterizing reflection in the systematic review corpus. Out of 151 total papers, only 20 papers presented studies where reflection was measured [2, 27, 31, 53, 62, 83, 92, 100, 116, 131, 152, 153, 159, 163, 172, 178, 188, 192, 196, 199]. Over four times as many papers (i.e., 84) presented studies where reflection was characterized [2, 3, 7, 8, 12, 22, 24, 27, 29, 31, 32, 35, 40, 45, 47, 53, 56, 61–63, 66, 68, 77, 78, 83, 90, 96, 97, 100, 101, 104, 105, 107, 109, 116–119, 121, 125, 131, 137, 140, 142, 143, 147, 149, 155, 157, 159, 161, 163, 165, 168, 172, 173, 175, 176, 178, 179, 183, 187, 189, 191, 192, 196, 199, 200, 202, 204–210, 213, 216, 219, 223, 225, 227].
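The prevalence figures above can be expressed as shares of the corpus. The short Python sketch below is illustrative; the constant and function names are assumptions introduced here, and only the three counts (151, 20, and 84) come from the text.

```python
CORPUS_SIZE = 151          # papers in the final corpus
MEASURED = 20              # papers with studies that measured reflection
CHARACTERIZED = 84         # papers with studies that characterized reflection


def share(count, total):
    """Fraction of the corpus represented by `count` papers."""
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("count must lie between 0 and a positive total")
    return count / total


measured_share = share(MEASURED, CORPUS_SIZE)
characterized_share = share(CHARACTERIZED, CORPUS_SIZE)
ratio = CHARACTERIZED / MEASURED

print(f"measured:      {measured_share:.1%} of the corpus")
print(f"characterized: {characterized_share:.1%} of the corpus")
print(f"characterized-to-measured ratio: {ratio:.1f}")
```

With these counts, roughly 13.2% of the corpus measured reflection and roughly 55.6% characterized it, a ratio of 4.2. Because a single paper can do both, the two groups are not mutually exclusive.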
