A computational model of moral learning for autonomous vehicles


A Computational Model of Moral Learning for

Autonomous Vehicles

by

Richard Kim

B.A. Economics

University of California, Los Angeles, 2006

Submitted to the Program in Media Arts and Sciences, School of

Architecture and Planning

in partial fulfillment of the requirements for the degree of

Master of Science in Media Arts and Sciences

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2018

© Massachusetts Institute of Technology 2018. All rights reserved.

Author

Program in Media Arts and Sciences, School of Architecture and Planning

May 25, 2018

Certified by

Iyad Rahwan

Associate Professor of Media Arts and Sciences

Thesis Supervisor

Accepted by

Tod Machover

Academic Head

Program in Media Arts and Sciences

A Computational Model of Moral Learning for Autonomous

Vehicles

by

Richard Kim

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning

on May 25, 2018, in partial fulfillment of the requirements for the degree of

Master of Science in Media Arts and Sciences

Abstract

We face a future of delegating many important decision-making tasks to artificial intelligence (AI) systems as we anticipate widespread adoption of autonomous systems such as autonomous vehicles (AV). However, a recent string of fatal accidents involving AV reminds us that delegating certain decision-making tasks has deep ethical complications. As a result, building ethical AI agents that make decisions in line with human moral values has surfaced as a key challenge for AI researchers. While recent advances in deep learning in many domains of human intelligence suggest that deep learning models will also pave the way for moral learning and ethical decision making, training a deep learning model usually requires large quantities of human-labeled training data. In contrast to deep learning models, research in human cognition of moral learning theorizes that the human mind is capable of learning moral values from a few, limited observations of the moral judgments of other individuals and applying those values to make ethical decisions in a new and unique moral dilemma. How can we leverage the insights that we have about human moral learning to design AI agents that can rapidly infer the moral values of the humans they interact with? In this work, I explore three cognitive mechanisms - abstraction, society-individual dynamics, and response time analysis - to demonstrate how these mechanisms contribute to rapid inference of moral values from a limited number of observed data. I propose two Bayesian cognitive models to express these mechanisms using the hierarchical Bayesian modeling framework and use large-scale ethical judgments from Moral Machine to empirically demonstrate the contributions of these mechanisms to rapid inference of individual preferences and biases in ethical decision making.

Thesis Supervisor: Iyad Rahwan


The following people served as readers for this thesis:

Professor Alex 'Sandy' Pentland

Professor of Media Arts and Sciences

MIT Media Lab

Massachusetts Institute of Technology

Joshua B. Tenenbaum

Professor of Computational Cognitive Science

Department of Brain and Cognitive Sciences

Massachusetts Institute of Technology


Acknowledgments

I would like to extend my deepest gratitude and appreciation to my advisor, Professor Iyad Rahwan. Professor Rahwan gave me the chance to be at the MIT Media Lab, exposed me to many intellectually challenging problems, and provided me with opportunities to collaborate with amazingly talented individuals whom I otherwise could not have met. He showed me what it is to be a true interdisciplinary scholar who can integrate ideas from various academic disciplines including psychology, economics, and computer science, and do so in a way that is respectful towards scholars of those fields while challenging their established dogmas.

I also thank my readers, Professor Alex 'Sandy' Pentland and Professor Joshua B. Tenenbaum, for their contributions to my thesis by providing me with valuable comments. I feel honored and privileged to have my work read and approved by these two distinguished scholars.

This thesis could not have come to fruition without the amazing work of my colleagues in Scalable Cooperation, Edmond Awad and Sohan Dsouza, who designed and built Moral Machine. As this work is highly reliant on data from Moral Machine, it is no exaggeration to state that my work would not have been possible without the preceding work of Edmond Awad and Sohan Dsouza. I also owe deep gratitude to three individuals - Max Kleiman-Weiner, Andres Abeliuk, and Niccolo Pescetelli. This work was inspired by the original work of Max Kleiman-Weiner. Andres Abeliuk and Niccolo Pescetelli also shared invaluable expertise as I struggled through several versions of the models proposed in this work. This work also could not have come to fruition without contributions from these three talented individuals.

My time at the Media Lab has been fun and memorable thanks to my colleagues in Scalable Cooperation: Bjarke Felbo, Morgan Frank, Matt Groh, Ziv Epstein, May Allazzani, Jordan Hoffman, Judy Shen, Abhimanyu Dubey, Esteban Moro, Manuel Cebrian, and Amna Carreiro. I will always cherish the friendships that I made here.

Finally, I want to express special gratitude to my family members who have shown unconditional support and love. I thank my parents, Yong E. Kim and Hae K. Koo, for their financial and moral support over the last couple of years as I took a temporary break in my career to pursue an academic degree at MIT. I thank my wife, Jiyeon Lee, for the unconditional love she showed me through the toughest periods while I pursued the degree. I could not have asked for a better partner than Jiyeon through this tumultuous journey. Lastly, I thank my son, Sunyul, who turned six this year. I had to sacrifice countless evenings and weekends to research, and despite my absence, he grew up to be such a delightful and joyful child. I owe him my apology and my gratitude.


List of Figures

2-1 (a) Unpooled model (b) Pooled model

2-2 (a) Partially pooled (i.e. hierarchical) model (b) The hierarchical model in plate notation. Plate $N$ indicates that there are $N$ variables $\theta_i$ representing $N$ experiments indexed by $i$. Plate $N \times M$ indicates that there are $M$ observed variables for each of $N$ experiments.

3-1 An example of Moral Machine scenario

3-2 Twenty Characters of Moral Machine

3-3 Non-demographic Factors in Moral Machine

3-4 Distribution of responses by language of instruction

3-5 Distribution of responses by country of residence

3-6 An example of vector representation of a choice in a Moral Machine scenario

4-1 An example of a binary matrix A that decomposes the characters in Moral Machine into abstract features. Black squares indicate the presence of abstract features in the characters.

4-2 Vector representation of abstract features of a scenario choice

4-3 Example of Covariance Matrix

4-4 Graphical representation of hierarchical moral principles model

4-5 Posterior distribution of the model parameters

4-6 Benchmark 1: Character-based Logistic Regression

4-7 Benchmark 2: Feature-based Logistic Regression

4-9 Evaluation results

5-1 Graphical illustration of the drift diffusion process. $\alpha$ is the decision boundary of response $Y = 1$. Parameter $\beta$ indicates a priori decision biases as a ratio of $\alpha$ to determine the starting position $\alpha\beta$. Lastly, the dashed line before the initial step of the drift process is the nondecision time $\tau$.

5-2 Graphical model representation of factor analysis drift diffusion. Plate represents $T$ independent trials. Node $X_t$ represents tuple $(y_t, r_t)$ that denotes the decision $y_t \in \{0, 1\}$ and response time $r_t \in \mathbb{R}^+$ of trial $t$.

5-3 Histogram of response time distributions

5-4 Comparison of inferred weights $w$ of FDDM (a) and logistic regression (b)

5-5 Comparison of standard deviations of the inferred net utility values from FDDM and logistic regression. Values along the x-axis are the standard deviations of net utilities derived from FDDM. Values along the y-axis are the standard deviations of net utilities inferred from logistic regression. Points that lie above the 45° diagonal line indicate scenarios that have higher standard deviation from logistic regression. Of the 6500 points, 80.2% lie above the diagonal line.

5-6 (a) Scatter plot of expected utility value $E[u_t]$ and standard deviation $\sigma_{u_t}$. Color of the data point represents response time. Plot is divided into four quadrants by dividing the data by the median values of two axes. (b) Toy representation of distributions of the net utility values in each quadrant.

5-7 Comparison of response times in four quadrants. Blue bars indicate the mean response times and narrow red bars show 95% bootstrap (n=10000) confidence intervals.


Chapter 1 Introduction

1.1 Background

Advances in artificial intelligence (AI) and sensor technologies have brought widespread adoption of autonomous systems such as autonomous vehicles (AV) closer to reality. Thanks to autonomous systems, we will delegate important decision-making tasks to AI agents, which holds many promises. For instance, it is estimated that widespread adoption of AV could reduce traffic fatalities by approximately 90% [27].

Due to the possibility of dramatically reducing traffic fatalities, AV has garnered great excitement in the media and attracted large commercial and academic research investments. However, a recent string of fatal accidents involving AV [84, 85] has reminded the public that there are ethical and legal complications to yielding control of vehicles to AI agents. These complications have catapulted the issue of AI ethics (or machine ethics) to the forefront of AI research and have made it imperative for engineers, AI researchers, and policy makers to examine approaches to incorporating society's ethical standards into AI decision-making processes.

For example, suppose an AV experiences a brake malfunction and, as a result of the defect, faces an unfortunate choice between killing an elderly man in his seventies inside the vehicle by swerving and crashing into a wall and killing an eight-year-old pedestrian girl by staying on its course. In this rare, but not improbable, scenario, whom should the AI agent sacrifice? How can AI researchers and engineers instill society's ethical values in the mechanistic decision-making process such that the decision is deemed acceptable, or at least preferable to the alternative, by the public at large?

Interestingly, the idea of codifying ethical behaviors in an artificially intelligent system is not new. In his work "I, Robot," published in 1950, the author Isaac Asimov introduced the Three Laws of Robotics that govern a robot's behavior:

* A robot may not injure a human being or, through inaction, allow a human being to come to harm.

* A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

* A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

Today, Asimov's Three Laws are considered a thought-provoking introduction to the notion of AI ethics, but codifying these laws into a computer program to create a truly ethical AI agent is not considered a viable approach [2, 86].

Recently, machine learning, a branch of computer science that uses statistical techniques to learn from data, has moved to the front and center of AI research. Within the field of machine learning, deep learning, a class of models that uses artificial neural network architectures inspired by biological systems of intelligence, has demonstrated impressive feats in various domains of human intelligence including computer vision [76], machine translation [89], speech generation [60], and many others. In fact, in a domain as abstract as human emotion, deep learning models have shown close to human-level proficiency in detecting human emotions in natural language text [23]. Most of these achievements were considered beyond the reach of machine intelligence just a couple of decades ago.

The remarkable achievements of deep learning models may lead one to conclude that they will also pave the way for research in learning ethical values and ethical decision making by machines. However, AI researchers face several challenges in applying deep learning models in the domain of ethics. Most notably, training a deep learning model often requires extremely large quantities of human-labeled data - usually much larger than other machine learning models require. Although researchers in deep learning have made recent advances that enable their models to be trained with a smaller number of examples [83, 72], this constraint remains a key challenge in deep learning research.

In contrast to state-of-the-art deep learning models, the human mind is capable of learning from a much smaller number of noisy and sparse examples [49]. Commonly referred to as "poverty of stimulus," this remarkable capacity of the human mind was first noted in the language acquisition of children. Linguists and developmental psychologists observed that children demonstrate the ability to rapidly learn the morphological rules and hierarchical phrase structure of a language from limited observations [14] and that their ability to generate sentences and understand the meanings of new words cannot be explained by experience alone [52]. Evidence from studies in human learning suggests that the ability to learn from limited experience and to apply the lessons across new instances cuts across many domains of human intelligence [79], including learning moral values from limited observations of ethical decisions and their outcomes [54, 88, 45].

Leveraging this insight about the human mind, how can we build an AI agent that has the capacity to learn moral values and societal preferences from limited interactions with people and observations of their actions? In this thesis, I explore three mechanisms that guide moral learning in humans and express them as Bayesian models of cognition using the hierarchical Bayesian modeling framework. My goal is to demonstrate how we can engineer ethical AI agents that can rapidly learn moral values and overcome "poverty of stimulus" [14].

First, children do not learn to make ethical decisions by weighing the trade-offs of values associated with specific characters and consequences involved in an instance

of moral dilemmas. Rather, they learn generalizations of moral values from a few


many novel instances of dilemmas that involve different sets of characters and different degrees of harm. For example, suppose a child is reprimanded by his parents for tormenting a stray cat. From this experience, a typical child will learn not only to value the specific cat involved but also not to cause harm to innocent animals in general. He may further generalize the lesson to a principle of respecting all living creatures. As a result, his parents should not expect their child to harm an animal of another species, such as a bird, and need not teach him a separate lesson about respecting birds. The capacity for abstraction enables a child to take moral values learned from a single instance of a moral dilemma and generalize them across infinitely many unique instances of moral dilemmas.

Second, moral values and behavioral norms vary enormously from one society to another. For example, while certain societies view preferential treatment of family and friends with disdain and consider it as an example of corruption, other societies deem such preferential treatment as a moral obligation. Furthermore, although individual preferences and biases may be idiosyncratic due to unique life experiences of the individual, people belonging to the same ethnic or national groups share a similar set of values and behavioral norms. Therefore, one can often infer a great deal of information about a person's behavioral patterns and his values from his ethnic or cultural information.

Finally, when one observes a decision being made by another person facing a moral dilemma, response time (RT), the time she took to arrive at the decision, provides a clue about the inherent difficulty of the dilemma and the latent confidence level of the decision maker. For example, when faced with the dilemma of saving the life of a five-year-old girl versus that of an adult who is a convicted criminal, most people are quick to decide in favor of the little girl. However, when the dilemma is between a five-year-old girl and three innocent adults, the trade-offs between the values are not as obvious for many people. As a result, people assess the consequences of their decisions more carefully and methodically, which leads to longer response times.


in predicting human decisions in moral dilemmas found in Moral Machine, a web application that collects human judgments in ethical dilemmas involving AV. A recent study of public sentiment about AV reveals that endowing an AI driver with societal preferences and the capacity to make ethical decisions is an important step before AV can undergo widespread commercial adoption [12]. In light of this recent finding, the models proposed here, in combination with the Moral Machine platform, can serve as a meaningful demonstration of an approach that AI researchers and engineers should adopt and further explore in building ethical AI agents for AV and other autonomous systems.

Moreover, while Moral Machine scenarios are examples of moral dilemmas, they do not provide a comprehensive and realistic picture of the ethical complications that may arise in the real world. Real-world moral dilemmas rarely have binary choices of actions with binary outcomes. The full extent of the consequences of our choices in dilemmas is never truly known in advance, and the consequences are revealed in sequence, which enables people to adjust and react accordingly. In light of the limitations posed by Moral Machine data, I note that the goal of this work is not to propose a general model of ethical decision making that incorporates all learning mechanisms, behavioral norms, and moral values in a complex and comprehensive measure in which the details can vary greatly depending on the philosophical tradition (e.g. deontology vs consequentialism). Furthermore, it is also not the goal of this thesis to propose a prescriptive and normative form of ethical decision making that can be used as a module in a working AV system. Throughout this thesis, I sidestep philosophical discussion about the normative form of ethics. Rather, my focus here is to explore how three mechanisms - abstraction, society-individual dynamics, and response time - can play important roles in Bayesian models of moral learning and to show how insights from cognitive psychology can be applied to building ethical AI agents for autonomous systems.


1.2 Summary of Contributions

Contributions of this thesis can be summarized as follows:

* Explore the latest Bayesian framework to propose a new computational model of moral learning in order to analyze and predict human judgments in moral dilemmas in Moral Machine and empirically demonstrate the robustness of the model's capacity to rapidly infer individual preferences in abstract dimensions of moral dilemmas.

* Introduce a principled approach to simultaneously inferring the idiosyncratic preferences of individuals in moral judgments and the group norm that represents the aggregation of those individual preferences.

* Combine a sequential sampling model with factor analysis to propose a new Bayesian model of cognition to characterize how the mind weighs the features of moral dilemmas to arrive at a decision.

* Using reaction time as additional evidence, show that the new model that incorporates reaction times yields quicker inference of societal preferences in the outcome of moral dilemmas.

* Demonstrate an approach to replicate "poverty of stimulus" in moral learning of the human mind by combining abstraction, society-individual dynamics, and response time.

* Demonstrate a step toward engineering an AV system that can learn to make ethical decisions on the road from limited interactions and observations of the behaviors of human drivers using novel data from Moral Machine.

1.3 Outline

In Chapter 2, I provide a brief explanation of the hierarchical Bayesian modeling framework and a discussion of innovations in approximate Bayesian inference methodologies. This chapter will provide the methodological foundation for the rest of the thesis. As a prelude to the models introduced in the later chapters, Chapter 3 contains a description of Moral Machine, including a discussion of the original motivations behind its deployment. I also describe the representation of data from Moral Machine relevant to the models that I introduce in later chapters, which will serve as a key step in understanding the features of the models in Chapter 4 and Chapter 5.

In Chapter 4, I present the hierarchical moral principles model (HMPM). After briefly discussing the background of hierarchical Bayesian models in moral learning in Section 4.1, I provide a detailed description of the various elements of this model, including the abstract features of moral dilemmas and the representation of the individual as a unit of society. In Section 4.3, I report on an evaluation of HMPM in predicting human decisions against three benchmark models using data from Moral Machine.

In Chapter 5, I introduce a Bayesian model that incorporates response time in moral decision making as evidence for model parameter inference. The chapter begins with a brief introduction to the drift-diffusion model (DDM) in Section 5.1, which is followed by a section that provides details of the factor drift diffusion model (FDDM), which combines DDM with aspects of the hierarchical moral principles model in Chapter 4. Using data from Moral Machine, I show that FDDM is capable of inferring parameter values using response time information.

Finally, in Chapter 6, I conclude the thesis with a brief discussion to summarize the main contributions of this work. In addition, I note the limitations of the approaches presented here and propose interesting future directions to address these limitations.


Chapter 2 Hierarchical Bayesian Models and Approximate Bayesian Inference

Background knowledge plays an important role in the acquisition of new knowledge in the human mind. A structural representation of background knowledge guides the mind in restricting the hypothesis space it considers, which enables the mind to learn more from sparse observational data. AI researchers commonly refer to this as "inductive bias," and statisticians call it "priors" [79].

Bayesian inference provides a principled approach to expressing the relationship between background knowledge and the updated understanding of the world after new data has been observed. The hierarchical Bayesian modeling framework [28] takes this idea and offers a formalism to express uncertainty and the structural representation of background knowledge. Because it expresses how the mind uses prior information to aid in learning, hierarchical Bayesian inference has been widely explored in cognitive research.

In addition to the hierarchical Bayesian modeling framework, many approximate Bayesian inference methodologies have been introduced, notably Markov Chain Monte Carlo and Variational Inference. These innovations have enabled Bayesian modelers to explore large statistical models with complex relationships between variables and have contributed to broader applications of Bayesian statistical models in scientific research, including cognitive science [80].


The aim of this chapter is to provide a brief explanation of the hierarchical Bayesian modeling framework in Section 2.1 using a simple example and to offer a short survey of the innovations in approximate Bayesian inference methodologies in Section 2.2. While it is not meant to be a thorough survey of these two rich subjects, this chapter will serve as a general foundation for many concepts that will be introduced in subsequent chapters.

2.1 Hierarchical Bayesian Models

At its core, Bayesian inference involves computation of the joint probability distribution $P(Y, \theta)$ of observed data $Y$ and latent parameters $\theta$. Uncertainty about the latent variables $\theta$ is expressed as a prior distribution $P(\theta)$, and after observing the data, the updated belief about the variables is characterized by the posterior distribution following Bayes' rule:

$$P(\theta \mid Y) = \frac{P(\theta) P(Y \mid \theta)}{P(Y)} \tag{2.1}$$

where $P(Y \mid \theta)$ is the probability of the data given the hypothesis $\theta$, also called the likelihood. The evidence $P(Y)$ is a marginalized probability $P(Y) = \int_{\Theta} P(Y, \theta)\, d\theta$, where $\Theta$ is the entire hypothesis space. Because computation of $P(Y)$ is often intractable, one must resort to approximate Bayesian inference methodologies, which I discuss in the following section.
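For a single parameter, the evidence integral in Bayes' rule can still be approximated by brute force, which makes the role of each term concrete. The sketch below is only an illustration under stated assumptions: it uses a flat prior on a success probability and hypothetical data of 4 successes in 14 trials, and the function name `grid_posterior` is introduced here rather than taken from the thesis. With a flat prior the exact posterior is Beta(5, 11), whose mean is 5/16, so the grid result can be checked against that value.

```python
# Grid approximation of Bayes' rule for a single success probability theta.
# The flat prior and the data (4 successes in 14 trials) are illustrative assumptions.

def grid_posterior(successes, trials, grid_size=1001):
    """Return (grid, posterior density) with the posterior normalised by the evidence."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    width = 1.0 / (grid_size - 1)
    prior = [1.0 for _ in grid]  # flat prior density P(theta) on [0, 1]
    # Likelihood P(Y | theta), up to the binomial coefficient (constant in theta).
    likelihood = [t ** successes * (1 - t) ** (trials - successes) for t in grid]
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    # Evidence P(Y), up to the same constant: Riemann sum over the hypothesis space.
    evidence = sum(unnormalised) * width
    posterior = [u / evidence for u in unnormalised]
    return grid, posterior


grid, posterior = grid_posterior(successes=4, trials=14)
width = grid[1] - grid[0]
posterior_mean = sum(t * p for t, p in zip(grid, posterior)) * width
print(f"posterior mean of theta: {posterior_mean:.4f}")
```

The grid grows exponentially with the number of parameters, which is one way to see why the evidence becomes impractical to compute for larger models.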

In Bayesian inference, computation of the posterior distribution $P(\theta \mid Y)$ is highly dependent on the choice of prior distribution $P(\theta)$, which is the primary criticism leveled against Bayesian statistics by proponents of classical statistics, which does not specify a prior distribution. Nonetheless, because hierarchical Bayesian inference offers a principled approach to incorporating background information into inference based on new evidence, it has been embraced as a popular means to characterize the human mind and how it learns [79].

Below, we demonstrate an application of hierarchical Bayesian inference using a simple example.


2.1.1 Example: Beta-Binomial Model

Conjugate priors are a class of prior distributions, typically within the exponential family of distributions (e.g. Normal, Gamma, Beta, Poisson, etc.), for which the posterior distribution has the same parametric form as the prior distribution. Due to this property, the use of conjugate priors offers a mathematically convenient, closed-form solution to the derivation of the posterior distribution. Here, we use the Beta-Binomial model, in which the Beta distribution is a conjugate prior for the Binomial likelihood. We borrow a popular example from Bayesian Data Analysis [28] to demonstrate how one applies the hierarchical Bayesian modeling framework to solve an inference challenge.
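The conjugacy of the Beta prior to the Binomial likelihood can be verified in three lines. The derivation below is a standard result rather than something specific to this thesis; the symbols $y$ (number of successes) and $n$ (number of trials) are introduced here for illustration.

```latex
\begin{align*}
P(\theta) &\propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}
  && \text{Beta}(\alpha, \beta) \text{ prior} \\
P(y \mid \theta) &\propto \theta^{y} (1 - \theta)^{n - y}
  && \text{Binomial likelihood, } y \text{ successes in } n \text{ trials} \\
P(\theta \mid y) &\propto \theta^{\alpha + y - 1} (1 - \theta)^{\beta + n - y - 1}
  && \Rightarrow\ \theta \mid y \sim \text{Beta}(\alpha + y,\ \beta + n - y)
\end{align*}
```

The posterior is again a Beta distribution, with the observed successes added to $\alpha$ and the observed failures added to $\beta$, so its mean is $(\alpha + y)/(\alpha + \beta + n)$.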

In a clinical experiment, researchers are interested in estimating the probability that a rat develops a tumor in a population of female laboratory rats that did not receive a dose of drug treatment. In the latest experiment, four out of fourteen rats developed a tumor, which is equivalent to 28.6% of the rats in this experiment. In addition to this result, researchers obtained the results of seventy other experiments (Table 2.1). How can researchers incorporate the data obtained from the latest experiment and the results from the previous experiments to arrive at a more accurate predictive model for estimating the probability that a rat develops a tumor?

0/20 0/20 0/20 0/20 0/20 0/20 0/20 0/19 0/19 0/19 0/19 0/18 0/18 0/17 1/20 1/20 1/20 1/20 1/19 1/19 1/18 1/18 2/25 2/24 2/23 2/20 2/20 2/20 2/20 2/20 2/20 1/10 5/49 2/19 5/46 3/27 2/17 7/49 7/47 3/20 3/20 2/13 9/48 10/50 4/20 4/20 4/20 4/20 4/20 4/20 4/20 10/48 4/19 4/19 4/19 5/22 11/46 12/49 5/20 5/20 6/23 5/19 6/22 6/20 6/20 6/20 16/52 15/47 15/46 9/24

Table 2.1: Tumor incidents in rats in seventy clinical experiments [78]. The numbers on the left sides denote the number of rats that developed a tumor; the numbers on the right sides denote the total sample population of rats.

Consider a model in which each experiment $i$ is an independent experiment that provides no insight about other experiments. We call this the unpooled model because the probability $\theta_i$ that a rat develops a tumor in the $i$-th experiment is expressed as an independent Beta distribution with hyperparameters $\alpha_i$ and $\beta_i$. The full joint posterior distribution of the unpooled model parameters is

$$P(\theta, \alpha, \beta \mid Y) \propto \prod_{i}^{N} P(\theta_i \mid \alpha_i, \beta_i) P(\alpha_i) P(\beta_i) \prod_{j}^{M} P(y_{i,j} \mid \theta_i) \tag{2.2}$$

where $\theta = (\theta_1, \ldots, \theta_N)$, $\alpha = (\alpha_1, \ldots, \alpha_N)$, and $\beta = (\beta_1, \ldots, \beta_N)$. The observed variable is $Y = (y_1, \ldots, y_N)$, where $y_i$ is a vector of size $M$ in which each element $y_{i,j}$ represents the outcome for the $j$-th rat in the $i$-th experiment. This Bayesian network is schematically represented as a graphical model in Figure 2-1a.

Figure 2-1: (a) Unpooled model (b) Pooled model

Here, researchers gain no insight into the parameter value of the new experiment from the results of previous experiments. The probability of a rat developing a tumor in the latest experiment is $\theta_{71} \sim \mathrm{Beta}(\alpha + 4, \beta + 10)$ with the expectation of $\frac{\alpha + 4}{\alpha + \beta + 14}$. If one assumes an uninformative Beta prior with hyperparameters $\alpha = 0, \beta = 0$, then the expected probability is equivalent to the solution achieved via classical statistics, $\frac{4}{14} \approx 0.286$.

Alternatively, let us consider a model in which all outcomes of the experiments, including the latest one, are modeled as instances of a single Binomial distribution parameterized by $\theta$, which has a Beta prior with hyperparameters $\alpha$ and $\beta$. We call this the pooled model because it only accounts for a single population distribution. The posterior distribution of the model parameters is

$$P(\theta, \alpha, \beta \mid Y) \propto P(\theta \mid \alpha, \beta) P(\alpha) P(\beta) \prod_{i}^{N} \prod_{j}^{M} P(y_{i,j} \mid \theta) \tag{2.3}$$

Figure 2-1b provides graphical representation of this model.

Similar to the unpooled model, the posterior distribution of the model parameters can also be derived using the conjugacy of the Beta distribution to the Binomial likelihood. Hence, the posterior distribution is $\theta \sim \mathrm{Beta}(\alpha + 267, \beta + 1472)$, and assuming hyperparameters $\alpha = 0, \beta = 0$, the expectation is $\frac{\alpha + 267}{\alpha + \beta + 1739} \approx 0.154$. Note that the expected value of the probability is substantially smaller than in the previous model. Also, the variance of the population parameter

$$\mathrm{Var}[\theta] = \frac{(\alpha + 267)(\beta + 1472)}{(\alpha + \beta + 1739)^2 (\alpha + \beta + 1739 + 1)} \tag{2.4}$$

suggests that the outcome of the latest experiment is an extreme outlier that cannot be explained by this model. This is due to the pooled model ignoring the differences in the rats and the conditions of the experiments.
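The unpooled and pooled estimates can be reproduced directly from Table 2.1. The following is a minimal sketch, not the thesis's own code: the helper names (`beta_posterior`, `beta_mean`, `beta_sd`) are hypothetical, the prior is the $\alpha = 0, \beta = 0$ choice used above, and the pooled counts combine the seventy experiments in Table 2.1 with the latest one (4 of 14).

```python
from math import sqrt

# Table 2.1 as "tumors/rats" pairs for the seventy earlier experiments.
RAW = """
0/20 0/20 0/20 0/20 0/20 0/20 0/20 0/19 0/19 0/19 0/19 0/18 0/18 0/17
1/20 1/20 1/20 1/20 1/19 1/19 1/18 1/18 2/25 2/24 2/23 2/20 2/20 2/20
2/20 2/20 2/20 1/10 5/49 2/19 5/46 3/27 2/17 7/49 7/47 3/20 3/20 2/13
9/48 10/50 4/20 4/20 4/20 4/20 4/20 4/20 4/20 10/48 4/19 4/19 4/19 5/22
11/46 12/49 5/20 5/20 6/23 5/19 6/22 6/20 6/20 6/20 16/52 15/47 15/46 9/24
"""
history = [tuple(int(x) for x in token.split("/")) for token in RAW.split()]
latest = (4, 14)  # the most recent experiment: 4 tumors among 14 rats
experiments = history + [latest]


def beta_posterior(alpha, beta, tumors, rats):
    """Conjugate update: add tumors to alpha and tumor-free rats to beta."""
    return alpha + tumors, beta + (rats - tumors)


def beta_mean(a, b):
    return a / (a + b)


def beta_sd(a, b):
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


# Unpooled: the latest experiment is analysed on its own.
a_u, b_u = beta_posterior(0, 0, *latest)
unpooled_mean = beta_mean(a_u, b_u)

# Pooled: every rat from every experiment shares a single theta.
total_tumors = sum(y for y, _ in experiments)
total_rats = sum(n for _, n in experiments)
a_p, b_p = beta_posterior(0, 0, total_tumors, total_rats)
pooled_mean = beta_mean(a_p, b_p)
pooled_sd = beta_sd(a_p, b_p)

# Distance of the latest observed rate from the pooled estimate, in pooled posterior SDs.
gap_in_sd = (unpooled_mean - pooled_mean) / pooled_sd

print(f"unpooled mean: {unpooled_mean:.4f}")
print(f"pooled posterior: Beta({a_p}, {b_p}), mean {pooled_mean:.4f}, sd {pooled_sd:.4f}")
print(f"latest rate lies {gap_in_sd:.1f} pooled SDs above the pooled mean")
```

Because the pooled posterior is very narrow, the latest observed rate sits many standard deviations above the pooled mean, which is the sense in which the pooled model treats it as an outlier.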

Finally, we consider a hierarchical Bayesian modeling approach in which we separate the parameters of the individual experiments from the population statistics. We call this model the partially pooled model. We set the parameters $\theta = (\theta_1, \ldots, \theta_N)$ of the individual experiments using Beta distributions, similar to the unpooled model. However, instead of each $i$-th experiment having its own hyperparameters $\alpha_i$ and $\beta_i$, we pool the hyperparameters into a set of population parameters $\alpha$ and $\beta$ shared by all experiments. The full joint posterior distribution is

$$P(\theta, \alpha, \beta \mid Y) \propto P(\alpha) P(\beta) \prod_{i}^{N} P(\theta_i \mid \alpha, \beta) \prod_{j}^{M} P(y_{i,j} \mid \theta_i) \tag{2.5}$$

where $\theta = (\theta_1, \ldots, \theta_N)$. This model is displayed as a graphical model in Figure 2-2.

In this model, we derive the probability of a rat developing a tumor in experiment

71 as 071 ~ Beta(a + 4, / + 10), not unlike the solution in the unpooled model.

However, we use a more informative prior on a and 3 that is derived from properties

of Beta distribution:

E[01(1 - E[O])

(26)

a

/

N

0(

(b)

Figure 2-2: (a) Partially pooled (i.e. hierarchical) model (b) The in plate notation. Plate N indicates that there are N variables

experiments indexed by i. Plate N x.Al indicates that there are M

for each of N experiments.

3

=

(a + 3)(1 - E[])

hierarchical model 9i representing N observed variables (2.6) (2.7)

resulting in $\alpha = 1.4$ and $\beta = 8.6$. As a result, the posterior distribution on the parameter of the new experiment is $\theta_{71} \sim \mathrm{Beta}(5.4, 18.6)$, which has expectation of $E[\theta_{71}] = 0.223$. Contrast this figure with the original estimation using a classical statistical method, $\frac{4}{14} = 0.285$. Taking into account historical results of the past experiments, researchers can arrive at an estimation that is closer to the population mean. Yet, the model also enables the researchers to respect the idiosyncratic differences in rats and laboratory conditions in every experiment to derive unique solutions to parameter estimation in each experiment.
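The closed-form estimates in this example can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts quoted in the text; the helper name `beta_posterior_mean` is hypothetical, and the hyperparameters 1.4 and 8.6 are the rounded values stated above rather than values re-derived from the historical data.

```python
def beta_posterior_mean(alpha, beta, successes, failures):
    """Mean of the conjugate posterior Beta(alpha + successes, beta + failures)."""
    a = alpha + successes
    b = beta + failures
    return a / (a + b)

# Classical estimate for the latest experiment: 4 tumors among 14 rats.
classical = 4 / 14

# Pooled model with alpha = beta = 0: all rats treated as one experiment.
pooled = beta_posterior_mean(0.0, 0.0, 267, 1472)

# Partially pooled model: shared informative prior (rounded values from the text).
partial = beta_posterior_mean(1.4, 8.6, 4, 10)

print(f"classical={classical:.3f} pooled={pooled:.3f} partially_pooled={partial:.3f}")
```

With these rounded hyperparameters the partially pooled estimate evaluates to 0.225; the small gap from the 0.223 quoted in the text is what one would expect if the quoted figure was computed from unrounded hyperparameters. In either case the estimate sits between the pooled and the classical values, which is the shrinkage behavior described above.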

Although the example here has a closed-form solution thanks to the conjugacy of the Beta distribution to the Binomial likelihood, most statistical modeling tasks must extend beyond conjugate priors. In the following section, we turn our attention to inference frameworks that have enabled Bayesian modelers to incorporate more complex and arbitrary relationships between model parameters.


2.2 Approximate Bayesian Inference

Although convenient for finding the posterior distribution in closed form, conjugate priors alone cannot express large and complex Bayesian models. For more sophisticated models that involve large interdependent relationships between model variables, many algorithms have been introduced to approximately infer posterior distributions. Principally, these algorithms belong to two families of methodology: Markov Chain Monte Carlo and Variational Inference (or Variational Bayes).

2.2.1 Markov Chain Monte Carlo

Markov Chain Monte Carlo (MCMC) is a general methodology for drawing sample values from the posterior distribution of model parameters using an approximate distribution. The methodology draws a sequence of values $\theta^1, \theta^2, \theta^3, \ldots$ that form a Markov chain. More specifically, the algorithm starts with some starting point value $\theta^0$, and then for each $t = 1, 2, \ldots$, it draws $\theta^t$ from a transition distribution $P_t(\theta^t \mid \theta^{t-1})$ that depends on the previous draw $\theta^{t-1}$ to create the sequence of values. Although it is an approximation methodology, MCMC algorithms guarantee that the sequence asymptotically converges toward the true posterior distribution [68].

The first MCMC algorithm, which was hailed as one of "The Top Ten Algorithms of the 20th Century" [19], is the Metropolis algorithm [53]. The Metropolis algorithm begins from a random first value $\theta^0$. Then, for $t = 1, 2, \ldots$, it draws a proposed value $\theta^*$ from the current value $\theta^{t-1}$ using a transition distribution, typically a Normal or Uniform distribution with a step size that determines the strength of the change. We set $\theta^t = \theta^*$ with acceptance probability

$$A(\theta^* \mid \theta^{t-1}) = \min\left(1, \frac{P(\theta^* \mid Y)}{P(\theta^{t-1} \mid Y)}\right) \tag{2.8}$$

otherwise, we maintain $\theta^t = \theta^{t-1}$. Its extension, the Metropolis-Hastings algorithm, extends the earlier version by enabling the transition distribution to be any arbitrary distribution, other than a Normal or Uniform distribution, as long as it meets certain restrictions [34].
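The Metropolis update can be sketched in a few lines. The example below is an illustration under stated assumptions, not the implementation used in this work: the target is a toy posterior for a Binomial rate with a flat prior and 4 successes in 14 trials, the proposal is a symmetric Normal random walk, and the function names, step size, seed, and burn-in length are all hypothetical choices.

```python
import math
import random

def log_target(theta, successes=4, failures=10):
    """Unnormalized log posterior of a Binomial rate under a flat prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero density outside the unit interval
    return successes * math.log(theta) + failures * math.log(1.0 - theta)

def metropolis(log_p, theta0, n_samples, step_size, rng):
    """Random-walk Metropolis with a symmetric Normal proposal."""
    samples = []
    theta = theta0
    for _ in range(n_samples):
        proposal = rng.gauss(theta, step_size)
        # Accept with probability min(1, p(proposal) / p(current)).
        log_ratio = log_p(proposal) - log_p(theta)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        # On rejection the chain repeats the previous value.
        samples.append(theta)
    return samples

rng = random.Random(0)
draws = metropolis(log_target, theta0=0.5, n_samples=20000, step_size=0.1, rng=rng)
kept = draws[2000:]  # discard early draws as burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

For this toy target the exact posterior is Beta(5, 11), whose mean is 5/16, so the sample mean gives a quick sanity check on the sampler. Working with log densities avoids numerical underflow, and only the ratio of densities is needed, so the normalizing constant of the posterior never has to be computed.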

Since the introduction of the Metropolis-Hastings algorithm, many other MCMC algorithms have been proposed. Slice sampling [59] eliminates the need to tune the step size in the Metropolis algorithm by using an auxiliary variable to automatically adjust the step size to the local contour of the density function. A recent addition to the family of MCMC algorithms is Hamiltonian Monte Carlo (HMC) [21, 58], which borrows ideas from Hamiltonian physics to reduce the random walk property of the Metropolis-Hastings algorithm in order to rapidly move through the target distribution. It is also frequently called Hybrid Monte Carlo as the algorithm combines stochastic and deterministic sequences of steps. HMC has been shown to be more robust against high dimensional problems than Metropolis-Hastings [4]. Finally, the no-U-turn sampler [38] further extends HMC by automatically adjusting the number of deterministic steps taken between stochastic steps.

2.2.2 Variational Inference

While MCMC algorithms are indispensable tools for today's Bayesian statisticians, they are computationally intensive. Consequently, certain large-scale problems with large data sets or those requiring complex models with hundreds, if not thousands, of latent values become intractable to solve using MCMC algorithms. An alternative to MCMC algorithms, Variational Inference (or Variational Bayes) [40] is a class of algorithms that resorts to optimization to approximate the posterior distribution instead of stochastic simulation using a Markov chain. Variational Inference (VI) algorithms do not provide asymptotic guarantees; however, they tend to be significantly faster than MCMC, which makes them attractive tools for large-scale problems [6].

The main idea behind VI is to search for a set of parameters $\phi$ in another distribution $Q(\phi)$ that minimizes the Kullback-Leibler (KL) divergence between $Q(\phi)$ and the original posterior distribution $P(\theta \mid Y)$. The VI algorithm is formally summarized as

$$Q^*(\phi) = \arg\min_{\phi} \mathrm{KL}(Q(\phi) \,\|\, P(\theta \mid Y)) \tag{2.9}$$

where the KL divergence is defined as

$$\begin{aligned} \mathrm{KL}(Q(\phi) \,\|\, P(\theta \mid Y)) &= E[\log Q(\phi)] - E[\log P(\theta \mid Y)] \\ &= E[\log Q(\phi)] - E[\log P(\theta, Y)] + \log P(Y) \end{aligned}$$

However, because the evidence $P(Y)$ cannot be easily derived, directly minimizing the KL divergence is usually not easy. Instead, one maximizes the Evidence Lower Bound (ELBO),

$$\mathrm{ELBO}(Q) = E[\log P(\theta, Y)] - E[\log Q(\phi)] \tag{2.10}$$

which is equivalent to the negative value of the KL divergence plus the log evidence $\log P(Y)$, which is a constant with respect to $Q(\phi)$. One can show that maximization of the ELBO with respect to $Q(\phi)$ is equivalent to minimization of the KL divergence and that the optimal approximation $Q^*(\phi)$ is a close approximation of the original posterior distribution $P(\theta \mid Y)$, with the ELBO bounded above by $\log P(Y)$ [6].
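The relationship between the ELBO, the KL divergence, and the log evidence can be verified numerically on a small example. The sketch below assumes a toy model that is not part of this work: the latent rate is restricted to three candidate values so that every expectation is an exact finite sum, and the approximating distribution `q` is an arbitrary choice. All names are hypothetical.

```python
import math

# Toy model (assumed for illustration): a latent rate theta restricted to three
# candidate values, a uniform prior, and 4 successes observed in 14 trials.
thetas = [0.1, 0.3, 0.5]
prior = [1 / 3, 1 / 3, 1 / 3]
successes, trials = 4, 14

def log_likelihood(theta):
    """Binomial log likelihood of the observed counts given theta."""
    return (math.log(math.comb(trials, successes))
            + successes * math.log(theta)
            + (trials - successes) * math.log(1 - theta))

# Joint log P(theta, Y), log evidence log P(Y), and exact posterior P(theta | Y).
log_joint = [math.log(p) + log_likelihood(t) for p, t in zip(prior, thetas)]
log_evidence = math.log(sum(math.exp(v) for v in log_joint))
posterior = [math.exp(v - log_evidence) for v in log_joint]

def elbo(q):
    """E_q[log P(theta, Y)] - E_q[log q]."""
    return sum(qi * (lj - math.log(qi)) for qi, lj in zip(q, log_joint))

def kl(q, p):
    """KL(q || p) for two discrete distributions with full support."""
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p))

q = [0.2, 0.5, 0.3]  # an arbitrary approximating distribution
gap = log_evidence - elbo(q)
print(abs(gap - kl(q, posterior)) < 1e-9)         # gap equals the KL divergence
print(abs(elbo(posterior) - log_evidence) < 1e-9)  # bound is tight at the posterior
```

Because the KL divergence is non-negative, the ELBO never exceeds the log evidence, and the gap closes exactly when the approximating distribution equals the posterior. This is why maximizing the ELBO and minimizing the KL divergence select the same approximation.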

Recently, with the introduction of Stochastic Variational Inference (SVI) [37] algorithms starting with Black-box Variational Inference (BBVI) [64], application of VI has gained increased traction among Bayesian statisticians as an alternative to MCMC. This is because SVI uses simulation methods to automatically compute the gradient of the ELBO with high precision. As a result, researchers are able to approximate the posterior distribution without painstakingly deriving the gradient information of the ELBO on any model that uses continuous probability distributions. The latest method is Automatic Differentiation Variational Inference (ADVI) [47], which has significantly improved the precision of the gradient estimation.

While SVI is faster than MCMC, it does not have the asymptotic guarantee that MCMC provides. As it relies on approximation of the target distribution, SVI is susceptible to getting stuck in local maxima of the ELBO. This is one of the principal issues that researchers are currently aiming to address [6]. Finally, SVI methodologies utilize the gradient of the ELBO, which requires that the density functions of both the original posterior distribution $P(\theta \mid Y)$ and the approximation $Q(\phi)$ only contain continuous distributions. Extending SVI to discrete distributions is another area of


Chapter 3 Moral Machine Data

The proliferation of autonomous vehicles on our roads in the near future has made it imperative that we understand the ethical implications of yielding control to artificial intelligence. To this end, Moral Machine was deployed to collect data in order to understand society's preferences in how AI should resolve moral dilemmas on the road. In this chapter, we describe the details of the Moral Machine platform and how we intend to use its data for the models introduced in later chapters.

In Section 3.1, I describe the background behind the original goal of Moral Machine. I clarify the differences between the goal of this work and the original work that motivated the deployment of Moral Machine. Afterwards, in Section 3.2, I describe salient aspects of Moral Machine for our purpose, and in Section 3.3, I explain the data representation of Moral Machine scenarios that will become important for understanding the models that we will introduce in later chapters.

3.1 Background

Moral Machine is a web platform built with the goal of aggregating human perception of ethical decisions made by autonomous vehicles (AV) and facilitating discussions about the ethical implications of widespread adoption of AV. It was built by Edmond Awad and Sohan Dsouza of the Scalable Cooperation research group with guidance from Iyad Rahwan, Jean-François Bonnefon, and Azim Shariff and contributions from many other members of Scalable Cooperation.

Despite a popular misconception among critics [74, 26, 67], Moral Machine was never built with the goal of crowdsourcing human judgments in moral dilemmas in order to use them as training data for building ethical AV. Rather, it was built to collect people's preferences about how an AV should make a decision when faced with a moral dilemma and to understand how those preferences differ from society to society at a global scale.

Since its launch in June 2016, Moral Machine has garnered intense public interest thanks to the thought-provoking and timely nature of the topic it addresses. As a result, it has served as a catalyst for vigorous discussions in the media about AI ethics and public policy surrounding AV. As of April 2018, Moral Machine has collected over 40 million responses from approximately 4 million unique respondents from over 200 countries and territories around the world. Measured in number of participants, it is considered the largest psychological experiment in ethics.

In this work, our goal is not to perform a comprehensive analysis of all the responses recorded by Moral Machine nor to uncover cross-cultural differences in ethics between people of different countries. Rather, the goal here is to utilize the response data from Moral Machine to demonstrate learning of moral values using hierarchical Bayesian models. With this goal in mind, we use a different representation of the Moral Machine data from that of the original work, and our understanding of the salient design features of Moral Machine will be critical to understanding how we use this data to evaluate the models discussed in later chapters.

3.2 Description of Moral Machine Scenarios

When a visitor participates in the judge mode of the Moral Machine platform, they are asked to complete thirteen scenarios such as the example shown in Figure 3-1. In each scenario, the respondent is asked to choose the one of two outcomes that they personally deem preferable.

Figure 3-1: An example of Moral Machine scenario

The scenarios are populated with twenty characters (Figure 3-2) that represent people of various demographic backgrounds found in a general population.

Figure 3-2: Twenty Characters of Moral Machine (Man, Woman, Boy, Girl, Senior Man, Senior Woman, Baby, Pregnant Woman, Large Man, Large Woman, Fit Man, Fit Woman, Business Man, Business Woman, Male Doctor, Female Doctor, Homeless, Criminal, Dog, Cat)

Besides the characters, Moral Machine incorporates three non-demographic features: interventionism, relationship to vehicle, and concern for law. Every Moral Machine scenario involves choosing to stay or to swerve. Some Moral Machine scenarios require choosing to save characters inside an AV or pedestrian characters who are in a crosswalk. Finally, in some scenarios, pedestrians may be abiding by the law by crossing at the green light or violating the law by crossing at the red light. Figure 3-3 shows how these factors are represented in Moral Machine scenarios.

In addition, each time a scenario is shown to a respondent, Moral Machine captures the response time, the number of milliseconds that the respondent waited to submit an answer. This information is recorded along with the features of the scenario and the respondent's answer.

Figure 3-3: Non-demographic Factors in Moral Machine

Since its launch, the Moral Machine platform has been translated into eight languages (French, German, Spanish, Chinese, Japanese, Korean, Arabic, and Russian) in order to ensure that the platform captures cross-cultural variations in the perceptions of ethical decisions. Figure 3-4 shows the distribution of the respondents by the language that was used to partake in the Moral Machine experiment.

Figure 3-4: Distribution of responses by language of instruction (languages shown: English, Spanish, Japanese, Portuguese, German, Russian, French, Chinese, Korean, Arabic).

Moral Machine also attempts to record the approximate locations of the respondents by tracking the IP addresses of the devices that they used to log on to the website. Using the IP addresses, we are able to identify the country of each respondent. Respondents from more than 230 countries and territories around the world visited the website. Figure 3-5 shows the distribution of respondents by country.

3.3 Data Description

In this work, I represent a respondent's choice in a Moral Machine scenario as a random variable $Y$ and a count vector $\Theta_y$ of twenty-four dimensions in which each element of the vector denotes the number of salient features associated with choice $y$. Together, a scenario is a set of vectors $\Theta = \{\Theta_0, \Theta_1\}$, in which $Y = 0$ denotes the choice of stay, and $Y = 1$ denotes the choice of swerve.

Figure 3-6 guides the discussion with an example of the vector representation of a scenario. The choice of allowing the AV to go straight will save a man, a woman, and a boy who are passengers. Therefore, we represent this choice as vector $\Theta_0$ with values of 1 in the elements of the vector that represent these characters. The element of "passenger" is encoded with a value of 3 to denote that there are three passengers. On the other hand, the choice of swerving will save two elderly men and one elderly woman who are crossing at the red light. We represent the choice to swerve by assigning the first element of the vector a value of 1. Similar to the passenger variable, we represent the fact that three characters are crossing at a red light with a value of 3 in the "red light" element.

Figure 3-6: An example of vector representation of a choice in a Moral Machine scenario
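The vector representation described in this section can be sketched as follows. This is an illustration only: the ordering of the twenty-four elements, the feature labels, and the helper `count_vector` are assumptions, chosen so that the intervention flag comes first as in the example above. The count of 3 in the red light element is likewise assumed by analogy with the passenger element.

```python
# Hypothetical ordering of the twenty-four elements: the intervention flag,
# the twenty characters, then the three remaining non-demographic counts.
FEATURES = [
    "Swerve",
    "Man", "Woman", "Boy", "Girl", "Senior Man", "Senior Woman", "Baby",
    "Pregnant Woman", "Large Man", "Large Woman", "Fit Man", "Fit Woman",
    "Business Man", "Business Woman", "Male Doctor", "Female Doctor",
    "Homeless", "Criminal", "Dog", "Cat",
    "Passenger", "Green Light", "Red Light",
]

def count_vector(counts):
    """Build a count vector from a {feature name: count} mapping."""
    unknown = set(counts) - set(FEATURES)
    if unknown:
        raise KeyError(f"unknown features: {sorted(unknown)}")
    return [counts.get(name, 0) for name in FEATURES]

# Y = 0 (stay): saves a man, a woman, and a boy, all of them passengers.
theta_stay = count_vector({"Man": 1, "Woman": 1, "Boy": 1, "Passenger": 3})

# Y = 1 (swerve): saves two senior men and one senior woman crossing on red.
theta_swerve = count_vector(
    {"Swerve": 1, "Senior Man": 2, "Senior Woman": 1, "Red Light": 3}
)

scenario = {0: theta_stay, 1: theta_swerve}
print(len(theta_stay), sum(theta_stay), sum(theta_swerve))
```

Keeping the feature names in a single ordered list means that both choices of a scenario are guaranteed to share the same element positions, which matters once the vectors are multiplied by a fixed matrix or weight vector.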

3.4 Conclusion

I used this chapter to thoroughly describe the salient features of Moral Machine that are relevant to this work, which is an exploration of cognitive mechanisms in moral learning. In Chapter 4 and Chapter 5, we will use the vector representation of a scenario as a basis to build cognitive models of decision making by Moral Machine respondents.

Although I limited the discussion to aspects that are relevant to this work, Moral Machine is a rich and sophisticated web platform with many aspects that we did not explore here. For example, it has a section in which visitors can design and submit their own scenarios and a section in which visitors can come together to engage in a discussion about the dilemmas posed by user-generated scenarios. These aspects of

Chapter 4 A Computational Model of Moral Learning

Moral aptitude is the capacity to know what is right and wrong in a moral dilemma. While some aspects of moral aptitude may be innate in our nature, it is also learned through examples and shaped by our experiences. However, moral aptitude cannot be explained by our past experiences alone because developmental studies show that even infants exhibit some capacity for moral aptitude. What intuitions can we draw from what we know about human moral learning to build an AI system that can rapidly learn from a limited number of examples?

Here, we study two mechanisms that enable the human mind to rapidly infer moral values of others and express them through a hierarchical Bayesian modeling framework. Employing these mechanisms, we propose a Bayesian model of moral judgment to characterize human judgments in Moral Machine scenarios and demonstrate the benefits of these mechanisms by testing the model's capacity to learn using data from Moral Machine.

In Section 4.1, I briefly explain the background of the challenge posed by moral learning. This section also includes an overview of the recent framework introduced in the cognitive psychology literature that inspired this work. In Section 4.2, I discuss the details of the Hierarchical Moral Principle Model, which is followed by Section 4.3 where we evaluate the model's predictive performance against various benchmarks. Finally, I conclude this chapter in Section 4.4 with a brief discussion about the limitations of our approach and potential future extensions.

This chapter is based on a work to be published in the Proceedings of the First Annual AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society.

4.1 Background

Fundamental to the basic concept of morality is a system of measuring trade-offs in the welfare of individuals in the consequences of our decisions in moral dilemmas. Studies of moral decision making in young children suggest that children at an early age can base their decisions on weighing the trade-off of utilities of other individuals [46, 16, 32, 33].

While certain aspects of human morality may be innate to our nature, evidence points to substantial reliance on learning as a means to acquire moral aptitude [56]. We know this to be true because how we weigh the trade-offs is shaped to a large extent by the society and culture in which we are raised [5, 35, 39].

What is remarkable about human moral aptitude is that, despite the infinite variations of dilemmas we face, wherein the individuals involved and degrees of harm may differ in every circumstance, we are able to aptly weigh the trade-offs of those involved to make an ethically sound decision. Putting aside the minority of individuals who are psychologically impaired, the vast majority of individuals are morally apt, and even those whose actions are inconsistent with their own moral compass know that their actions violate ethical standards, which triggers feelings of guilt and shame. However, how does the human mind learn to make ethical judgments from a limited amount of experience? In building autonomous systems such as self-driving cars, what insights about the human mind and the way in which it learns can inform us about building an ethical AI agent with human-like learning capacity?

Our understanding of moral learning draws on an analogy to the problem of learning abstract knowledge in other domains of human intelligence such as linguistic grammar or morphological rules. Our mind is capable of making powerful generalizations guided by our capacity for abstraction, building abstract structured knowledge, which enables us to learn beyond our limited experience. The power of abstraction in aiding our learning has been described as "the blessings of abstraction" [79]. This powerful mechanism also extends to our capacity to weigh the trade-offs in moral judgment through abstract representations of people and situations, making judgments across multiple variations of moral dilemmas tractable based on limited past experiences [55, 57].

How one weighs the abstract concepts to make moral judgments can vary widely from one culture to another. Studies by anthropologists have shown that societies across different regions and time periods hold widely divergent views about what actions are ethical [36, 39, 5]. For example, certain societies strongly emphasize respect for the elderly while others focus on protecting the young. These views in a society are what we refer to as the society's group norms. Every individual, as a member of society, has shared views with other members of the society based on shared collective experience; as a result, we are able to infer the preferences of an individual based on ethnic, cultural, and religious information [41, 20]. While judgments based on demographic attributes such as these are commonly called "stereotypes," they can be a powerful mechanism that aids in inferring the preferences of another individual [51, 8]. We call this mechanism the society-individual dynamic throughout the rest of this work.

Given what we know about how abstraction and society-individual dynamics contribute to learning moral values, how can we formalize these mechanisms such that they can be expressed as parts of a computational model of moral learning? A recent work from the cognitive psychology literature provides a framework to model the dynamics of moral learning by agents through interactions with other agents in their environment [45]. This framework characterizes ethical decisions as a utility maximizing choice over a set of outcomes whose values are computed from the weights people place on abstract moral concepts such as "kin" or "reciprocal relationship." In addition, given the complex dynamics of interactions between individuals in a group, the framework provides an approach to model how an individual's moral preferences, and the actions resulting from them, lead to the development of the group's shared moral principles (i.e. group norms) through a hierarchical Bayesian modeling framework.

In this work, we explore an application of the framework introduced in [45] to study the two mechanisms of moral learning, abstraction and society-individual dynamics, using real-world human ethical decisions in Moral Machine scenarios. We characterize moral judgment as a net utility maximizing decision over a function that computes trade-offs of values in the choices of Moral Machine scenarios, and these values are computed via the weights that people place along abstract dimensions of the scenarios. Furthermore, we represent individual Moral Machine respondents as members of a group defined by country. Exploiting the hierarchical structure of respondents and countries, we show that the hierarchical Bayesian model [28] is a powerful modeling methodology to characterize rapid moral learning from sparse and noisy data.

4.2 Hierarchical Moral Principle Model

We introduce the hierarchical moral principle model (HMPM), which is an instance of a hierarchical Bayesian model that incorporates the two mechanisms in moral learning described above. HMPM describes how a moral agent arrives at a decision in a binary-choice moral dilemma such as those depicted in Moral Machine scenarios. Our aim is to model how Moral Machine respondents arrive at their decisions based on the values that they place on abstract dimensions of a moral dilemma. For instance, when a respondent chooses to save a female doctor in a scenario over an adult male, this decision is in part due to the value that the respondent places on the abstract concept of Medical, a rare and valuable concept in society associated with improvement of social welfare. The abstract concept of Female gender would also be a factor in his or her decision.


4.2.1 Feature Abstraction

In Moral Machine, the twenty characters share many abstract features such as Female, Senior, and Non-human. Hence, the original character vector $\Theta_y$ can be decomposed into a new vector in the abstract feature space $\Lambda_y \in \mathbb{N}^D$ where $D < K$ via a feature mapping $F : \Theta \to \Lambda$. In this work, we use a linear mapping $F(\Theta) = A\Theta$ where $A$ is an $18 \times 24$ binary matrix shown in Figure 4-1.

Figure 4-1: An example of a binary matrix A that decomposes the characters in Moral Machine into abstract features. Black squares indicate the presence of abstract features in the characters. The abstract features are Intervene, Male, Female, Young, Senior, Infancy, Pregnancy, Large, Fit, Working, Medical, Homeless, Criminal, Human, Animal, Passenger, Law Abiding, and Law Violating.
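The linear feature mapping can be illustrated on a reduced example. The sketch below is not the matrix of Figure 4-1: it assumes a hypothetical subset of seven characters and six abstract features, and the assignment of abstract features to characters is an illustrative guess. The function name `abstract_features` is also hypothetical.

```python
CHARACTERS = ["Man", "Woman", "Boy", "Girl", "Senior Man", "Senior Woman", "Dog"]
ABSTRACT = ["Male", "Female", "Young", "Senior", "Human", "Animal"]

# Rows are abstract features, columns are characters (illustrative values only).
A = [
    [1, 0, 1, 0, 1, 0, 0],  # Male
    [0, 1, 0, 1, 0, 1, 0],  # Female
    [0, 0, 1, 1, 0, 0, 0],  # Young
    [0, 0, 0, 0, 1, 1, 0],  # Senior
    [1, 1, 1, 1, 1, 1, 0],  # Human
    [0, 0, 0, 0, 0, 0, 1],  # Animal
]

def abstract_features(A, theta):
    """Linear feature mapping F(theta) = A theta for a binary matrix A."""
    if any(len(row) != len(theta) for row in A):
        raise ValueError("each row of A needs one entry per element of theta")
    return [sum(a * t for a, t in zip(row, theta)) for row in A]

theta = [1, 1, 1, 0, 0, 0, 0]  # one man, one woman, one boy
lam = abstract_features(A, theta)
print(dict(zip(ABSTRACT, lam)))
```

Because each character can activate several abstract features at once, the mapped vector counts a boy under Male, Young, and Human simultaneously. This sharing of features across characters is what lets a judgment about one character inform judgments about others.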

Figure 4-2 demonstrates an example of how the original vector in the Moral Machine character space $\Theta$ is mapped onto a new state vector in the abstract feature space $\Lambda$.

Critics of our approach may point out that Moral Machine characters are already abstractions of people in the real world and the scenarios are already simplifications of true moral dilemmas on the road. However, given that our goal is to demonstrate the contribution of abstraction to moral learning, deeper levels of abstraction of Moral Machine characters sufficiently serve this goal. Moreover, we do not make the claim that the feature abstraction shown in Figure 4-1 is the definitive feature mapping that best describes the cognitive processing of respondents when they observe the Moral Machine characters.

Figure 4-2: Vector representation of abstract features of a scenario choice.

4.2.2 Utility Calculus

We express the trade-offs of choices in Moral Machine as a quantitative measure of net utility, which we define as a weighted linear combination of the number of abstract features present in a Moral Machine scenario. To formalize, the utility of choice $Y = y$ is

$$u(\Theta_y) = w^\top F(\Theta_y) \tag{4.1}$$

We call the weights along the $D$ abstract dimensions, $w \in \mathbb{R}^D$, moral principles. These weights represent how a respondent values abstract features such as Young, Senior, or Medical to compute utility values of the choices in scenarios.
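The utility of a choice under Equation 4.1 is a dot product between the weights and the mapped feature counts. The following is a minimal sketch under assumed values: the matrix, the character and feature subsets, and the weights are hypothetical and are not fitted moral principles from this work.

```python
def utility(weights, A, theta):
    """u(theta) = w^T (A theta): weighted sum of abstract feature counts."""
    lam = [sum(a * t for a, t in zip(row, theta)) for row in A]
    if len(weights) != len(lam):
        raise ValueError("need one weight per abstract feature")
    return sum(w * x for w, x in zip(weights, lam))

# Illustrative mapping: columns are Man, Boy, Girl, Senior Man.
A = [
    [1, 1, 0, 1],  # Male
    [0, 1, 1, 0],  # Young
    [0, 0, 0, 1],  # Senior
]
w = [0.25, 1.0, 0.25]  # hypothetical weights on Male, Young, Senior

u_a = utility(w, A, [1, 1, 0, 0])  # a man and a boy
u_b = utility(w, A, [0, 0, 0, 2])  # two senior men
print(u_a, u_b)
```

With these assumed weights the choice that saves the man and the boy receives the higher utility, driven by the larger weight on Young. A respondent with different moral principles would be represented by a different weight vector applied to the same feature counts.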

After computing the utility values of the choices to stay $Y = 0$ and to swerve $Y = 1$, the respondent's decision to swerve is seen as a probabilistic outcome based on the sigmoid function of the net utility of the two choices $U(\Theta)$:

$$P(Y = 1 \mid \Theta) = \frac{1}{1 + e^{-U(\Theta)}} \tag{4.2}$$

where the net utility is defined by


