
Constrained Extended Kalman Filter: an Efficient Improvement of Calibration for Dynamic Traffic Assignment Models

by

Haizheng Zhang

B. Eng. Automation, Tsinghua University, 2013

Submitted to the Department of Civil and Environmental Engineering and the Department of Electrical Engineering and Computer Science

in partial fulfillment of the requirements for the degrees of Master of Science in Transportation

and

Master of Science in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2016

© Massachusetts Institute of Technology 2016. All rights reserved.

Author:
Department of Civil and Environmental Engineering
Department of Electrical Engineering and Computer Science
May 19, 2016

Certified by: Moshe E. Ben-Akiva
Edmund K. Turner Professor of Civil and Environmental Engineering, Thesis Supervisor

Certified by: Francisco C. Pereira
Professor, Technical University of Denmark, Thesis Supervisor

Certified by: Jacob K. White
Cecil H. Green Professor of Electrical Engineering and Computer Science, Thesis Reader

Accepted by: Leslie A. Kolodziejski
Chair, Department Committee on Graduate Students

Accepted by: Heidi Nepf
Chair, Graduate Program Committee


Constrained Extended Kalman Filter: an Efficient Improvement of Calibration for Dynamic Traffic Assignment Models

by

Haizheng Zhang

Submitted to the Department of Civil and Environmental Engineering and the Department of Electrical Engineering and Computer Science

on May 19, 2016, in partial fulfillment of the requirements for the degrees of

Master of Science in Transportation and

Master of Science in Electrical Engineering and Computer Science

Abstract

The calibration (estimation of inputs and parameters) of dynamic traffic assignment (DTA) systems is crucial for traffic prediction accuracy, and thus critical to traffic management applications that aim to reduce congestion. In support of real-time traffic management, the DTA calibration algorithm should also be online, in the sense of: 1) estimating inputs and parameters in a time interval based only on data up to that time; 2) performing calibration faster than real-time data generation. Generalized least squares (GLS) methods and Kalman filter-based methods have proved useful for online calibration.

However, in the literature, the road networks selected to test online calibration algorithms are usually simple and have a small number of parameters, so the effectiveness of these algorithms in high dimensions and on large networks is not well established. In this thesis, we implemented the extended Kalman filter (EKF) and tested it on the Singapore expressway network with synthetic data that replicate real-world demand levels. The EKF diverges, and the resulting DTA system performs even worse than when no calibration is applied. The problem lies in the truncation process in DTA systems: when estimated demand values are negative, they are truncated to 0, and the overall demand is overestimated. To overcome this problem, this thesis presents a modified EKF method, called the constrained EKF. The constrained EKF solves the problem of overestimating the overall demand by imposing constraints on the posterior distribution of the state estimators and obtaining the maximum a posteriori (MAP) estimates within the feasible region. An algorithm that iteratively adds equality constraints, followed by the coordinate descent method, is applied to obtain the MAP estimates. In our case study, this constrained EKF implementation added less than 10 seconds of computation time and improved the EKF significantly. Results show that it also outperforms


GLS, probably because its inherent covariance update procedure is better at adapting to changes than the fixed covariance matrix setting in GLS.

The contributions of this thesis include: 1) conducting online calibration algorithms on a large network with relatively high-dimensional parameters; 2) identifying drawbacks of a widely applied solution for online DTA calibration on a large network; 3) improving an existing algorithm from non-convergence to strong performance; 4) proposing an efficient and simple method to implement the improved algorithm; 5) attaining better performance than an existing benchmark algorithm.

Thesis Supervisor: Moshe E. Ben-Akiva
Title: Edmund K. Turner Professor of Civil and Environmental Engineering

Thesis Supervisor: Francisco C. Pereira
Title: Professor, Technical University of Denmark

Thesis Reader: Jacob K. White


Acknowledgments

First and foremost, I would like to express my sincere gratitude to my advisor, Professor Moshe Ben-Akiva. Your invaluable guidance, immense expertise and continuous support made my MIT graduate study colorful and memorable. I learned a lot from you. It is my honor to know you.

I express my deep gratitude to Professor Francisco Pereira, for your insightful advice and extraordinary support through this tough but worthy journey. Thanks to Professor Constantinos Antoniou for being a great source of knowledge and help. Thanks also go to Dr. Ravi Seshadri, for your inspiration and guidance through my master's research. All of you taught me how to be a better researcher. Many thanks to Professor Jacob White, for your generous help with my dual master's degree in EECS and invaluable suggestions about this thesis.

I would like to thank Katherine Rosa, the research administrator, and Eunice Kim, the lab manager of our ITS Lab. Thanks for your help and patience in every detail. I would like to thank Kiley Clapper and Janet Fischer, the administrators of the CEE and EECS departments, for helping me all along.

Thanks to my roommates (and ex-roommate) Tianli Zhou, Hongyi Zhang, and Chao Zhang. It has been enjoyable to share the journey with you. Special thanks to Hongyi for being a good source of machine learning knowledge; it is always helpful to discuss research with you. Special thanks to Yuelong Su, Lu Lu and Runmin Xu, whom I knew in my undergrad at Tsinghua University. All of you are great examples that lead me to who I am now. My graduate life would not be so colorful without you. Thanks also go to the friends I met and knew at MIT, for being friendly, considerate and helpful, which made my life at MIT much easier. Thank you, Yin Wang, Yan Zhao, Linsen Chong, Nathaniel Bailey, Xiao Fang, Weikun Hu, Chiwei Yan, Xiang Song, Shi Wang, Yundi Zhang, Yinjin Lee, Monique Stinson, Mazen Danaf, Bilge Atasoy, Nathanael Cox, Yuhan Jia, Na Wu, Manxi Wu, Li Jin, Jeffrey Liu, Rui Sun, He Sun, Lijun Sun, Yan Leng, Zhan Zhao and Yiwen Zhu.


I am also grateful to the Singapore-MIT Alliance for Research and Technology (SMART) for supporting my study, research and trips to Singapore. Thanks also go to the people I knew at SMART, including Kakali Basak, Yan Xu, Dong Wang, Stephen Robinson, Yang Lu, Vinh-An Vu (Jenny). Every trip to Singapore is a unique experience because of you.

I would also like to thank Zhiwei Lin, for all your love and support. Last but not least, my deepest gratitude goes to my parents for your endless love, encouragement and constant support. This work is dedicated to you.


Contents

1 Introduction and Background 13

1.1 Motivation . . . 13

1.2 Introduction to Dynamic Traffic Assignment . . . 14

1.3 Thesis Motivation and Outline . . . 17

2 Literature Review on Calibration for DTA 19

2.1 Offline Calibration: Generalized Least Squares Model . . . 20

2.2 Online Calibration . . . 23

2.2.1 State-Space Formulation . . . 24

2.2.2 Optimization Formulation . . . 26

2.3 Summary . . . 27

3 Methodology: Constrained Extended Kalman Filter in DTA Calibration 29

3.1 General Problem Formulation . . . 29

3.1.1 State-Space Formulation in Details . . . 29

3.1.2 The Idea of Deviations . . . 33

3.2 Extended Kalman Filter and Variants in DTA Calibration . . . 35

3.2.1 Extended Kalman Filter . . . 35

3.2.2 Finite Difference and FD-EKF . . . 39

3.2.3 Simultaneous Perturbation and SP-EKF . . . 40

3.2.4 Characteristics of EKF in Online Calibration for DTA . . . . 41


3.3.1 Motivation . . . 42

3.3.2 Optimization Formulation for Constrained Kalman Filter Estimates . . . 43

3.3.3 An Efficient Near-Optimal Algorithm for EKF with Bound Constraints in DTA Calibration . . . 45

3.3.4 Coordinate Descent Algorithm with Near-Optimal Initialization 50

3.4 Summary . . . 52

4 Case Study: Constrained EKF on Singapore Expressway Network 53

4.1 Model Overview . . . 53

4.1.1 DynaMIT Overview . . . 53

4.1.2 MITSIMLab Overview . . . 55

4.1.3 DynaMIT and MITSIMLab Integration . . . 56

4.2 Calibration Framework for Singapore Expressway Network . . . 57

4.2.1 Simulation Scenario Overview . . . 57

4.2.2 MITSIMLab and DynaMIT Integration: Data Flow . . . 59

4.2.3 Calibration Settings for EKF . . . 62

4.2.4 Calibration Evaluation Criteria . . . 64

4.3 Results . . . 65

4.3.1 EKF Performance . . . 65

4.3.2 Constrained EKF Performance . . . 70

4.3.3 Comparison with GLS . . . 73

4.4 Summary . . . 76

5 Conclusions 77

5.1 Summary . . . 77


List of Figures

1-1 General DTA framework (source: (Ben-Akiva et al., 2010a)) . . . 15

3-1 Directed Graph Model in Kalman Filtering Scheme . . . 38

3-2 2-D Posterior PDF Contour and Different Estimators . . . 44

4-1 DynaMIT Real-Time Framework (source: Ben-Akiva et al. (2010a)) . . . 54

4-2 Singapore Road Network (source: Google Maps, 2016) . . . 57

4-3 Singapore Road Network in DynaMIT and MITSIMLab . . . 58

4-4 DynaMIT and MITSIMLab Integration Workflow . . . 59

4-5 Sample W-SPSA Calibrated Demand . . . 60

4-6 Sample Gaussian Kernel Smoothed Demand . . . 61

4-7 Estimated versus Observed Flow Counts: EKF . . . 66

4-8 Flow Count RMSN versus Time: EKF . . . 67

4-9 Estimated versus Observed Flow Counts: Constrained EKF . . . 70

4-10 Flow Count RMSN versus Time: Constrained EKF . . . 71

4-11 Sample Calibrated Demand by EKF (before truncation) . . . 72

4-12 Sample Calibrated Demand by Constrained EKF . . . 72

4-13 Flow Count RMSN versus Time: GLS versus Constrained EKF . . . 74


List of Tables


Chapter 1

Introduction and Background

1.1 Motivation

Traffic congestion in urban areas has been a hot topic due to its negative temporal, economic and external impacts, such as delays, wasted fuel, frustrated motorists and air pollution. It is an old and well-known problem, driven by soaring travel demand and the insufficient growth of transportation infrastructure. From 1982 to 2014, the total length of public roads in the United States increased from 3,865,894 miles to 4,194,257 miles, an 8.49% increase. During the same timeframe, vehicle miles traveled (VMT) increased by 90.61%, from 1,595,010 million miles to 3,040,220 million miles (FHWA, 2014). Meanwhile, the congestion cost per auto commuter has increased severely, from $400 (in 2014 dollars) in 1982 to $960 in 2014, according to the 2015 Urban Mobility Scorecard (Schrank et al., 2015). To make matters worse, congestion is expected to continue increasing, according to the same source. Although annual congestion costs decreased from 2008 to 2011 due to the recession, urban areas have recently experienced the same challenges as in the early 2000s, for instance, the growing population and job market that contribute to congestion (Schrank et al., 2015).

The 2015 Urban Mobility Scorecard also recommends a balanced and diversified approach to reduce congestion, comprising more policies, programs, projects, flexibility, options and understanding. Traffic management plays an important role among


the mixed solutions guided by this approach. One important application is route guidance for drivers in Traffic Management Centers (TMCs), for instance, the Advanced Traveler Information Systems (ATIS) of the Federal Highway Administration (FHWA) of the US Department of Transportation. Route guidance aims to achieve global traffic control to reduce congestion and its impact on energy consumption, gas emissions, delays and frustration. In order to provide correct and reliable guidance, TMCs should have global information, insights and, preferably, prediction abilities. Dynamic traffic assignment (DTA) systems are considered one of the most promising categories of tools for estimating traffic states and making state predictions. Built upon accurate predictions, in which drivers' reactions to possible route guidance strategies are also considered, the best strategy can be selected to reduce congestion to the lowest possible level. In this thesis, we examine the basis of route guidance operations, i.e., the DTA systems.

1.2 Introduction to Dynamic Traffic Assignment

Traditionally, traffic assignment has been derived from transportation demand forecasting (typically the four-step model), which comprises trip generation, trip distribution, mode choice and route assignment. It is a process where traffic demand (usually represented by a static origin-destination matrix) is loaded onto the road network (Barceló, 2010). As a result, traffic flows are computed for the links in the road network. In contrast, dynamic traffic assignment (DTA) emphasizes time-varying properties, meaning that traffic demand and flows are time-dependent. This allows the flexibility to accommodate varying traffic scenarios, where the underlying patterns in time and space are evolving (Mahmassani, 2001). DTA has evolved substantially since the late 1970s. It is an essential tool for estimating and predicting dynamic traffic flows on road networks.

Various formulation and solution approaches to DTA have been introduced, both analytical and simulation-based. Analytical models express the DTA problem in mathematical formulations for a specific objective (e.g. user equilibrium (UE) or system optimal (SO)). Optimization algorithms are usually applied to solve the


analytical DTA model and obtain its inputs and parameters. However, their conciseness and accuracy in replicating traffic flow dynamics are only attainable on small networks. The analytical formulation has to be simplified for the optimization problem to be solvable, so some traffic relationships (e.g. driver behavior, congestion) cannot be fully captured (Peeta and Ziliaskopoulos, 2001; Balakrishna, 2006). Simulation is therefore treated as the best way to model traffic, due to its efficiency and accuracy. Recently, interest has grown in simulation-based DTA methods also because they offer the advantage of accurately modeling driving behavior and response to guidance. Thus, the utilization of simulation-based DTA is important for traffic estimation. In this thesis, our focus is on simulation-based DTA, and DTA in the following chapters refers to simulation-based DTA.

Figure 1-1: General DTA framework (source: (Ben-Akiva et al., 2010a))

The general workflow of a simulation-based DTA system comprises the dynamic traffic management system and the demand and supply modules, as shown in Figure 1-1. The management system dictates and provides inputs for the demand and supply modules. Input parameters include origin-destination matrices, incident/event information, weather conditions, traffic control strategies, etc. The interaction between demand


and supply is simulated by the DTA system. While the simulation is running, the traffic conditions (e.g. flows, travel times, route choice fractions, queues, vehicle trajectories) are measured and output in a timely manner.

Existing simulation-based DTA models fall into three categories according to their level of detail in representing traffic dynamics: microscopic, mesoscopic and macroscopic (Lu, 2013).

Macroscopic DTA models represent the least detailed traffic dynamics, in which the traffic is modeled as fluid flows and individual vehicles are not quantified. Thus they are the most computationally efficient models, particularly on large networks. Example simulation-based DTA systems are METANET (Wang et al., 2001), EMME (INRO, 2015), and Visum (PTV, 2015b).

Microscopic DTA models have the most detailed simulation granularity, where driving behaviors and car interactions are modeled. Examples of existing systems include PARAMICS (Smith et al., 1995), MITSIMLab (Ben-Akiva et al., 2010b), AIMSUN2 (Barceló and Casas, 2005), CORSIM/TSIS (FHWA, 2015), VISSIM (PTV, 2015a), and TransModeler (FHWA, 2016).

Mesoscopic models combine aspects of macroscopic and microscopic models, with the aim of balancing efficiency and accuracy. They usually represent individual vehicles in detail on the demand side, but car interactions are not modeled on the supply side. DTA system examples include DynaMIT (Ben-Akiva et al., 2010a), DYNASMART (Mahmassani et al., 1998) and Dynameq (Mahut and Florian, 2010).

In this thesis, we use both microscopic and mesoscopic models to exploit their respective advantages. A microscopic model is selected to represent the real world, because of its accuracy in replicating real scenarios. A mesoscopic model is selected as our DTA system, due to its balance between efficiency and accuracy.


1.3 Thesis Motivation and Outline

In Section 1.1, the motivation for DTA systems was discussed. We stated that the aim of a DTA system is to provide traffic state estimation and prediction, which can be used for global route guidance. For that purpose, the DTA system is usually followed by route strategy generation. The TMC then guides the drivers with those generated strategies. Drivers obtain the guidance information, make decisions and change the existing traffic flow patterns, which will be captured in the surveillance data. Those data are in turn fed into the DTA system for the latest traffic state estimation. Among all the components in the strategy generation and evaluation loop, traffic state estimation is crucial for the DTA system, since it is the basis of the other steps. Because traffic state estimates depend on the inputs and parameters of the DTA system, input and parameter estimation is also a crucial step. This step is called calibration. The calibration procedure based on surveillance data is the focus of this thesis. Strategy generation and the modeling of drivers' reactions are outside the scope of this thesis.

The remainder of this thesis is structured as follows. Chapter 2 summarizes and comments on existing algorithms for offline and online calibration, and the scope of this research is narrowed down to extended Kalman filter (EKF) based algorithms in the online category. Chapter 3 gives a more detailed description of the EKF formulations and algorithms. In addition, the drawbacks of applying the EKF to DTA parameter estimation are demonstrated, particularly for parameters with natural bound constraints. Following that observation, the constrained EKF, a modification of the EKF, is proposed to overcome the drawbacks discovered, and a specific algorithm implementing the constrained EKF is presented. In Chapter 4, a case study of the Singapore expressway network is conducted and demonstrated. This experiment is performed with synthetic data generated using an existing microscopic simulator (MITSIMLab), which was already calibrated offline with existing real sensor data. We then compare the estimated traffic flow counts with their counterparts generated by MITSIMLab, followed by performance comparisons of the EKF, generalized least squares (GLS), and the constrained EKF.


Chapter 2

Literature Review on Calibration for DTA

DTA systems were introduced in Chapter 1. However, a prerequisite for using DTA is its capability of producing reliable estimations and predictions of traffic states when compared to the real world. Furthermore, the prediction ability of a DTA model usually depends on its estimation accuracy, so estimation accuracy is of great importance. No matter what models are applied in different DTA systems, parameter estimation, also called model calibration, is a crucial step. Usually we have the real-world traffic scenario, and we measure/observe some information (these data are called measurements/observations) about the traffic states, for instance, traffic flow counts, average travel speeds and link travel times. Based on these data, we want to estimate the parameters such that the simulated measurements from DTA models fit the data. This is the general calibration problem for DTA models. More concisely, the calibration problem in the DTA context is defined as follows: given a set of initial values for various parameters (OD flows, route choice parameters, road capacities and speed-density relationships) and measurements (e.g. aggregate flows, speeds and densities), estimate those parameters such that the error between the simulated outputs and observed values is minimized (Antoniou, 2004).

The calibration problem can be classified in different ways since there are at least two dimensions to view this problem. For instance, categorized by the input or


parameter type, there are supply and demand calibration, meaning the calibration of inputs and parameters on the supply side and the demand side, respectively. There are also offline calibration and online calibration, where data availability at each time step and computation time are important considerations in algorithms for the latter. In this thesis, we classify the calibration algorithms into offline and online, and summarize them in each category.

2.1 Offline Calibration: Generalized Least Squares Model

The objective of offline calibration, in general, is to estimate the parameters such that the simulated outputs fit the observed measurements for an average traffic scenario. Here, average means the measurements are expected values over a long period of time, for instance, a month. Note that this is distinct from estimating static traffic assignment (STA) over a month, because the average is taken over the same time interval across all days in the month. By averaging, we hope to incorporate day-to-day demand fluctuations, weather, etc. into the measurements, so that offline calibration on these measurements yields the parameters for an average day in that month.

In terms of mathematical formulation, Lu formulated the offline calibration framework as an optimization problem, with the following notation (Lu, 2013):

• h: interval index, h ∈ H = {1, 2, ..., H}, where H is the set of simulation intervals

• G: road network, G = {G_h | h ∈ H}

• x: time-dependent DTA parameters, e.g., OD flows, x = {x_h | h ∈ H}

• x^a: a priori time-dependent parameter values, x^a = {x^a_h | h ∈ H}

• β: other (non-time-dependent) DTA parameters

• β^a: a priori values of other DTA parameters

• M^o: observed sensor measurements, M^o = {M^o_h | h ∈ H}

• M^s: simulated sensor outputs, M^s = {M^s_h | h ∈ H}

Then the offline calibration problem can be formulated as:

min_{x,β}  z(M^o, M^s, x, β, x^a, β^a) = Σ_{h=1}^{H} { z_1(M^o_h, M^s_h) + z_2(x_h, x^a_h) } + z_3(β, β^a)    (2.1)

subject to:

M^s_h = f(x_1, ..., x_h; β; G_1, ..., G_h)    (2.2)

l_{x_h} ≤ x_h ≤ u_{x_h}    (2.3)

l_β ≤ β ≤ u_β    (2.4)

where f is the abstract DTA model, which takes x, β and G as inputs and generates the simulated sensor outputs M^s_h, and z is a loss function that measures the goodness of fit between simulated sensor outputs and observed measurements. z_1 is a specific loss function that quantifies the discrepancy between simulated and observed measurements; z_2 and z_3 are loss functions that penalize the parameters for moving away from their a priori values. Parameters x_h and β have lower bounds l_{x_h}, l_β and upper bounds u_{x_h}, u_β, respectively.

In general, z_1, z_2 and z_3 act as weights that balance the objective between minimizing the measurement discrepancy and constraining the parameters locally.

Generalized least squares (GLS) is a linear regression model, widely known for OD estimation in the field of DTA calibration. The GLS model is one configuration of the optimization framework stated above. In this model, the function f is a multiplication of the assignment matrix (Ashok, 1996) with the OD parameters, which models the traffic assignment procedure. The model becomes more complicated when we consider the impact of OD parameters in previous intervals, and is given by (Balakrishna, 2002):

f(x_1, ..., x_h) = Σ_{p=h−p′}^{h} A^p_h x_p    (2.5)

where A^p_h is the assignment matrix that maps the OD flows of interval p onto the measurements of interval h, and p′ is the number of lagged intervals considered.

The z_1 function is defined as:

z_1(M^o_h, M^s_h) = (M^o_h − M^s_h)′ Ω_h^{−1} (M^o_h − M^s_h)    (2.6)

The covariance matrix Ω_h is defined for the error between simulated and observed measurements in interval h. The dimension of Ω_h is n-by-n, where n is the number of measurements. Similarly, covariance matrices for the parameter errors x_h − x^a_h and β − β^a can be estimated and used to construct z_2 and z_3. The covariance matrices are usually estimated from the residuals of an ordinary least squares (OLS) model on the same OD estimation problem. Specifically, OLS is a special case of GLS in which the error covariance is assumed to be the identity matrix.
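To make these pieces concrete, the following sketch evaluates the assignment-matrix measurement model of Eq. (2.5) and the GLS measurement loss of Eq. (2.6). The matrices and counts are toy values chosen for illustration, not data from the thesis; setting Ω to a scaled identity recovers the OLS special case.

```python
import numpy as np

def simulated_counts(A_lags, x_lags):
    """Linear assignment model (Eq. 2.5): M^s_h = sum_p A_h^p x_p.

    A_lags: (n_sensors, n_od) assignment matrices for lags p = h-p', ..., h
    x_lags: the matching OD-flow vectors x_p.
    """
    return sum(A @ x for A, x in zip(A_lags, x_lags))

def gls_loss(M_obs, M_sim, Omega):
    """GLS measurement loss (Eq. 2.6): (M^o - M^s)' Omega^{-1} (M^o - M^s)."""
    r = M_obs - M_sim
    return float(r @ np.linalg.solve(Omega, r))

# Toy example: 3 sensors, 2 OD pairs, a single lag (p' = 0).
A = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])          # hypothetical assignment matrix
x = np.array([100.0, 80.0])         # hypothetical OD flows
M_sim = simulated_counts([A], [x])
M_obs = np.array([105.0, 88.0, 79.0])

Omega = np.diag([4.0, 4.0, 4.0])    # measurement-error covariance
print(gls_loss(M_obs, M_sim, Omega))
```

With Ω equal to the identity, `gls_loss` reduces to the ordinary sum of squared residuals, which is exactly the OLS special case mentioned above.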

The GLS formulation is advantageous for OD estimation, since the assignment model has an analytical form and exact solutions are available (Balakrishna, 2002). However, when we include other supply parameters, we lose the closed-form advantages, and the calibration of those parameters becomes harder for GLS.

Notice that the GLS formulation can handle both demand and supply parameters. The drawback is that when we do not have an analytical form, we have to rely on the simulation (DTA model) to reveal the relationship. In order to obtain good estimation results, we need efficient algorithms that take simulation outputs and adjust the parameters. Several optimization methods have been applied to solve for the supply parameters in this GLS framework (Balakrishna, 2006). Balakrishna applied the Box-SNOBFIT and simultaneous perturbation stochastic approximation (SPSA) algorithms (Spall, 1992) to solve for the supply parameters, while the demand side is still handled by GLS. This process is called joint calibration of demand and supply parameters, and Balakrishna performed supply and demand calibration sequentially.


Recently, Lu proposed an enhanced SPSA method, W-SPSA (Lu, 2013). W-SPSA estimates the gradient using only the sensor counts related to a specific parameter, with those sensor counts weighted according to relevance; a weight matrix stores the weights used to compute the enhanced gradient estimate. Lu then conducted case studies on synthetic and real data. Results showed that W-SPSA significantly outperforms SPSA in fitting traffic flow counts and speed data.
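The simultaneous-perturbation gradient estimate that SPSA (and hence W-SPSA) is built on can be sketched as follows. The quadratic "simulator" loss and all numbers are illustrative assumptions, not the thesis's model; in a real calibration, each loss evaluation would be one DTA simulation run.

```python
import numpy as np

def spsa_gradient(loss, theta, c=0.1, rng=None):
    """Simultaneous-perturbation gradient estimate (Spall, 1992).

    All n components of theta are perturbed at once with a random
    Bernoulli +-1 vector, so only two loss evaluations are needed per
    gradient estimate, regardless of the parameter dimension n.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Bernoulli +-1 perturbation
    y_plus = loss(theta + c * delta)
    y_minus = loss(theta - c * delta)
    return (y_plus - y_minus) / (2.0 * c * delta)

# Toy quadratic "simulator" loss with known minimum at [1, 2, 3].
target = np.array([1.0, 2.0, 3.0])
loss = lambda th: float(np.sum((th - target) ** 2))

theta = np.zeros(3)
g = spsa_gradient(loss, theta, rng=np.random.default_rng(0))
# g is a noisy but (for this quadratic, exactly) unbiased estimate of
# the true gradient 2 * (theta - target) = [-2, -4, -6].
```

W-SPSA's refinement, per the description above, would weight each parameter's gradient component using only the sensor counts relevant to that parameter, rather than the full scalar loss.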

2.2 Online Calibration

The objective of online calibration is to estimate the correct parameters such that the DTA model represents the real-time traffic scenario, not an average scenario. It tries to solve the calibration problem in a timely manner. There are two aspects to online calibration: 1) estimating the parameters in a time interval based only on data up to that time; 2) performing calibration faster than real-time data generation. The general optimization formulation from offline calibration still applies, but online calibration minimizes only the part of the objective function that relates to the current simulation interval h. Despite different model formulations, one commonality is that traffic states are estimated with current and historical data, not future data. Furthermore, in order to deploy in an online traffic management system, the online calibration algorithm has to finish within each time interval, before measurements in the next interval become available.

The online calibration problem has been addressed by a number of studies over the past decade, but few algorithms have proved efficient and scalable. Similar to offline calibration, the inputs and parameters calibrated are generally two-fold: demand and supply. In the following subsections, algorithms for online DTA calibration are reviewed and commented on.

Existing algorithms to solve such problems can be categorized according to different formulations. Based on (Omrani and Kattan, 2012), there are two categories of formulations applied to online calibration for DTA systems, namely state-space formulations and optimization formulations. They apply to both demand and supply


calibration.

2.2.1 State-Space Formulation

The first category is the state-space formulation. It is based on ideas from systems control: demand and supply parameters are treated as state vectors, which evolve over time. The target is to estimate and predict the true state vector. At the same time, we are given measurements/observations of the real world, which carry information about the true state vector. The most widely applied approach to solving the state-space formulation is the Kalman filter-based method. In the Kalman filtering framework, state evolution is modeled by a transition equation between adjacent time steps, while the connection between states and measurements is described by a measurement equation. Depending on the assumptions made in the transition and measurement equations, the effectiveness and challenges of the EKF can be very different.

On the transition equation side, one assumption is that the deviations of the demand and supply parameters from historical averages form a stationary time series (Ashok and Ben-Akiva, 1993). This assumption applies to scenarios where parameters follow patterns similar to history. It is an elegant framework that captures the structural information in demand without explicitly knowing the pattern, though it requires an offline-calibrated demand to serve as a basis for constructing the starting point. Ashok and Ben-Akiva (2000) formulated a 4th-order autoregressive (AR) process on the deviations and developed a Kalman filter to estimate and predict OD demand in real time. Wang and Papageorgiou (2005) formulated demand and supply parameters in a stochastic macroscopic model, in which a random walk model is used as the transition equation to estimate traffic conditions on freeway stretches. In general, this assumption works well for a stationary random process with constant mean and variance (Zhou and Mahmassani, 2007). In the same article, Zhou and Mahmassani applied a polynomial trend filter on the deviations to allow more flexibility than the AR model. The authors also proposed a procedure to update the historical demand pattern online and applied the whole process to OD estimation on a network in Irvine.
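The deviations framework above can be sketched as a linear Kalman filter on demand deviations from a historical pattern. The identity transition (a random walk on deviations), the identity assignment matrix, and all numbers below are illustrative assumptions, not the formulation of any cited paper.

```python
import numpy as np

def kf_step(dx, P, y, x_hist, A, F, Q, R):
    """One Kalman-filter step on demand deviations dx = x - x_hist.

    Transition:  dx_k = F dx_{k-1} + w,   w ~ N(0, Q)   (F = I is a random walk)
    Measurement: y_k  = A (x_hist + dx_k) + v,   v ~ N(0, R)
    """
    # Time update (predict)
    dx_pred = F @ dx
    P_pred = F @ P @ F.T + Q
    # Measurement update (correct)
    innov = y - A @ (x_hist + dx_pred)       # innovation
    S = A @ P_pred @ A.T + R                 # innovation covariance
    K = P_pred @ A.T @ np.linalg.inv(S)      # Kalman gain
    dx_new = dx_pred + K @ innov
    P_new = (np.eye(len(dx)) - K @ A) @ P_pred
    return dx_new, P_new

# Toy setup: 2 OD pairs observed by 2 sensors (A = I for simplicity).
x_hist = np.array([100.0, 80.0])             # historical OD flows
y = np.array([110.0, 70.0])                  # observed counts today
dx0, P0 = np.zeros(2), 100.0 * np.eye(2)     # diffuse prior on deviations
dx1, P1 = kf_step(dx0, P0, y, x_hist, np.eye(2),
                  np.eye(2), np.eye(2), 0.01 * np.eye(2))
x_est = x_hist + dx1                         # estimated OD flows
```

With a diffuse prior and small measurement noise, the estimate moves almost entirely to the observation, and the posterior covariance shrinks accordingly; a stationary AR transition would replace F = I with an autoregressive matrix.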


When the evolution pattern of the parameters over time differs from the pattern implied by historical parameters, the first assumption of a stationary time series fails. In this case, a simple random walk model can be built on the absolute values of the demand estimators, with no prior information. The simple random walk model is an AR model with an autocorrelation coefficient of 1 (i.e., X_{k+1} = 1 × X_k, plus process noise). Cremer and Keller (1987) and Chang and Wu (1994) used a random walk model to make predictions about dynamic route choice split proportions. The authors concluded that this algorithm performed well in both accuracy and stability. However, this approach is limited to slowly changing demand scenarios and may not reflect non-linear trends in time-dependent OD flows. Thus, when historical parameters are available, the formulation under this assumption performs worse than the one discussed above.

In the works mentioned above, the measurement equations are all based on analytical formulations, since OD flows are well captured by the assignment matrix. For other supply and route choice parameters, the measurement equation is difficult to formulate due to their indirect relationship with traffic flows. To solve this problem, numerical methods are applied to obtain a linearization of the relationship, i.e., the gradient. In other words, the DTA system is treated as a “black box,” and the relationship between input parameters and simulated outputs is computed numerically (Antoniou et al., 2004, 2006). Notice that the method is general, since no prior analytical form is assumed. Based on this idea, Antoniou applied the extended Kalman filter (EKF), unscented Kalman filter (UKF), limiting EKF (LimEKF) and iterated EKF (I-EKF) in two cases in the UK and California (Antoniou, 2004). The author calibrated the demand and supply parameters with flow count and speed sensor data. The LimEKF has the greatest computational advantage, with complexity O(1), which is practical for online applications. The EKF and UKF have similar computational complexity of O(n), where n is the dimension of the parameters. In contrast, the I-EKF performs multiple iterations of the EKF and thus has higher complexity. Although obtaining the linearization of the system makes the EKF time-consuming compared with the LimEKF, the EKF is still the most straightforward approach to estimating OD demand. In Antoniou’s


calibration results, the EKF outperforms UKF and LimEKF in terms of estimation and prediction accuracy. The author also concluded that additional runs of I-EKF could further reduce the estimation error.

As discussed above, one last remark about the state space model is its generality, since it can be used to calibrate all kinds of inputs and parameters for demand, supply and other types in the future. It can also handle all types of data, as long as they can capture effects of the inputs and parameters.

2.2.2 Optimization Formulation

The optimization formulation follows the idea of GLS. It views the DTA online calibration problem as a stochastic minimization problem. As in GLS, we want to minimize a loss function that measures the error between parameters and their a priori values, and between estimated and real-time measurements, together with the error between the model and reality. The numerical gradient-estimation methods used in the state-space formulation are also useful in this framework. Huang (2010) applied a Gradient Descent algorithm (GD) and a Conjugate Gradient algorithm (CG) as direct optimization formulations for the Brisa A5 motorway between Lisbon and Cascais, Portugal. A heuristic method, the Hooke-Jeeves Pattern Search algorithm (PS), was also applied in the same scenario. The author compared these three algorithms and the Extended Kalman Filter (EKF) with DynaMIT-R OD estimation, in terms of estimation accuracy and computational performance, and concluded that EKF outperformed PS, CG and GD in decreasing errors in fitting both flow counts and speeds. GD and EKF have a computational advantage over PS and CG, in the sense that less runtime is needed. Partitioned Simultaneous Perturbation EKF (PSP-EKF) and PSP-GD were also implemented, and it was concluded that PSP-based methods are more efficient than their original counterparts. However, EKF still outperforms all other methods, including PSP-EKF, in estimation accuracy. Other stochastic optimization algorithms such as the Box complex and SNOBFIT were also applied (Balakrishna, 2006). These were validated on a small synthetic network with an error rate of less than 10%. However, they were not tested


on large road networks; thus, these algorithms may have scalability problems when deployed to complicated road networks, where traffic flows interact.

In general, the optimization formulation is a work-around for GLS in the online context, where variables in one interval are optimized at a time. We make two observations here. First, different loss functions can yield very different state estimates, which essentially corresponds to the covariance matrix selection in GLS. Second, non-exact optimization methods need multiple iterations to converge, which may conflict with the online requirement. Thus the optimization formulation may have more parameters to tune than the state-space formulation.

2.3 Summary

We close with two important comments. First, multiple algorithms have been applied to solve the online calibration problem. However, all the works mentioned above were applied to highway corridors or freeway stretches, so the scalability of these algorithms has not been tested; when the road network is large and there are many parameters, online calibration may not work properly. Second, the extended Kalman filter algorithm has superior accuracy compared to the other algorithms. For these two reasons, we conducted the following research to examine the accuracy of the extended Kalman filter on a large road network. In addition, although we consider computational performance, the main focus of the proposed research is calibration accuracy, because the numerical method used in EKF, shown in Section 3.2.2, is fully parallelizable.


Chapter 3

Methodology: Constrained Extended Kalman Filter in DTA Calibration

In this chapter, the detailed state-space formulation is reviewed first. Then the extended Kalman filter (EKF) algorithm and some variants applied in the field of DTA calibration are summarized and commented upon. In addition, the drawbacks of EKF are discussed from the point of view of maximizing a posterior probability density. Then the optimization formulation for the constrained EKF is discussed. In order to solve it, an algorithm that gives a near-optimal solution is proposed, followed by a coordinate descent algorithm that obtains the true optimum within a given precision. Finally, a summary is made.

3.1 General Problem Formulation

3.1.1 State-Space Formulation in Detail

As in Chapter 2, the state-space formulation arises from viewing DTA calibration through the lens of control engineering. The inputs and parameters of the DTA model form the state vector, which is assumed to evolve over time. The state-space formulation has been studied


comprehensively in control theory, and there are algorithms that estimate the state vector efficiently in real time. Thus, introducing this formulation to DTA calibration is beneficial. The state-space formulation is explained and discussed in Estimation and Prediction of Time-Dependent Origin-Destination Flows (Ashok, 1996). While the original formulation covers only origin-destination flow estimation and prediction, the state vector can contain all kinds of parameters (e.g. demand and supply parameters). In order to express the formulation, the following notation is defined:

• h: interval index, h ∈ H = {1, 2, ..., H}, where H is the number of simulation intervals into which time is discretized

• x_h: state vector of time interval h

• M_h: measurements in time interval h

Then the state-space model comprises the following equations:

• Transition equation

x_h = f_{h−1}(x_{h−1}, ..., x_{h−p}) + w_h  (3.1)

• Measurement equation

M_h = g_h(x_h, ..., x_{h−q+1}) + v_h  (3.2)

where p is the number of previous states that are believed to have relations with x_h; q is the number of states related to the current measurement M_h; w_h is the process error term, which captures the imperfection of the transition model f_h; and v_h is the observation error term, which absorbs the measurement error of M_h as well as the imperfection of the model g_h.

Usually the functions f_h and g_h are hard to model, since they depend not only on multiple previous states, but also on the time step. The transition equation is usually


modeled as an autoregressive process, as stated in Estimation and Prediction of Time-Dependent Origin-Destination Flows (Ashok, 1996):

$$x_h = \sum_{k=h-p}^{h-1} F_h^k x_k + w_h \quad (3.3)$$

where F_h^k is a square matrix representing the effect of x_k on x_h. If one assumes that the autocorrelation structure remains constant over time, F_h^k relies only on h − k, i.e. F_h^k = F_{h−k}. In fact, this assumption is often made for model parsimony.

Similarly, for the measurement model, an autoregressive process can also be applied, as shown in the following display.

$$M_h = \sum_{k=h-q+1}^{h} A_h^k x_k + v_h \quad (3.4)$$

where the matrix A_h^k accounts for the contribution of x_k to M_h. Specifically, if our target is only origin-destination (OD) estimation and M_h is the aggregated sensor counts, A_h^k is the assignment matrix, whose (i, j)-th element is the fraction of the j-th OD value in x_k that contributes to the i-th value of M_h.

Typically, the models f_h and g_h need to be constructed first in order to solve for the state vectors x_k. Autoregressive models are one type of candidate function for f_h and g_h that is easier to estimate, because the coefficient matrices (F_h^k, A_h^k) act as linearizations when f_h and g_h are non-linear. When we have enough data points for the same period h, we can use the least squares method to estimate the coefficient matrices. Thus, if each A_h^k is a full matrix, we have n_M × n_x × q parameters to estimate in the whole model, where n_M is the length of the vector M_h and n_x is the length of x_k. Similarly, for the transition model, we also need to estimate each element of all F_h^k (n_x × n_x × p parameters in total). However, the amount of data available is usually not enough to estimate such complicated models, especially a full matrix F_h^k in the transition model. A common simplification is therefore to make F_h^k a diagonal matrix, so that correlations between different OD pairs are not considered. One can make the further assumption that the diagonal entries have the same magnitude; in other words, F_h^k is reduced to a scalar. The model can also be simplified in the time dimension by reducing p. As for the measurement model, it is more convenient to estimate with numerical methods, since we already have the simulation-based DTA model to generate enough data points. In this research, the formulations used are rather simplified:

x_h = f_h(x_{h−1}) + w_h = x_{h−1} + w_h  (3.5)

M_h = g_h(x_h) + v_h = A_h x_h + v_h  (3.6)

Following the discussion about estimating A_h in the measurement equation, the DTA model is treated as a "black box" and numerical methods are used: the linear relations are estimated from data points generated by the DTA simulation. Thus, all relations between x_h and M_h can be handled and measured, even for types of state vectors and measurements for which no analytical formulation is available.

We make some important comments on this simplification procedure. First, the transition equation accounts for the relations between state vectors in different time intervals. A more accurate transition equation is undoubtedly beneficial, but its positive impact lies mostly on the prediction side, especially for predicting traffic states several intervals ahead. For calibration, it merely provides a starting point (x_h) for the measurement model. Thus the effect of the simplification in Equation (3.5) on calibration is limited, provided g_h(·) is not a drastically changing function that depends heavily on the starting point. Second, the measurement model simplification/approximation is based on the conjecture that most information in a measurement vector is already utilized to infer the OD flows (Ashok, 1996). This conjecture is more likely to hold when measurement errors are low and w_h has low variance; in other words, when we have enough information to infer the correct OD flow values, the measurement simplification is acceptable. Finally, it would be beneficial to include higher degrees in both equations, but computational complexity can be a major issue.

3.1.2 The Idea of Deviations

The idea of deviations comes from Dynamic Origin-Destination Matrix Estimation and Prediction for Real-Time Traffic Management Systems (Ashok and Ben-Akiva, 1993). Since the autoregressive (AR) process rests on the assumption of temporal interdependencies between OD flows in different time steps, it is beyond the capability of the AR process to account for structural information about trip patterns. For instance, the morning peak and the evening peak are difficult to model with a simple time-invariant AR process, because they clearly do not follow the same transition pattern. A simple way to incorporate the temporal and spatial structure of trip patterns is to work relative to historical (e.g. offline estimated) state vectors x_h^H. The deviations of the state vector x_h and the measurements M_h are hence defined as:

∂x_h = x_h − x_h^H  (3.7)

∂M_h = M_h − M_h^H  (3.8)

where M_h^H denotes the historical measurement values.

The transition equation, Equation (3.5), and the measurement equation, Equation (3.6), now become:

∂x_h = ∂x_{h−1} + w_h  (3.9)

∂M_h = H_h ∂x_h + v_h  (3.10)

After subtracting the historical values, the deviations ∂x_h and ∂M_h are more plausibly zero-mean random variables: they represent day-to-day fluctuations, so the w_h and v_h terms are also more plausibly zero mean. Thus, it is reasonable to assume the following:

E[w_h] = 0, ∀h ∈ H  (3.11)

E[v_h] = 0, ∀h ∈ H  (3.12)

E[w_h v_h^⊤] = 0, ∀h ∈ H  (3.13)

E[w_h w_h^⊤] = Q_h, ∀h ∈ H  (3.14)

E[v_h v_h^⊤] = R_h, ∀h ∈ H  (3.15)

where Q_h and R_h are the covariance matrices of w_h and v_h in time step h, respectively.

Further, we assume that the error terms across different time steps are uncorrelated:

E[w_h w_k^⊤] = 0, ∀h, k ∈ H, h ≠ k  (3.16)

E[v_h v_k^⊤] = 0, ∀h, k ∈ H, h ≠ k  (3.17)

Note that H_h in Equation (3.10) was specified as A_h in (Ashok and Ben-Akiva, 1993). This implies that the following two equations also hold, besides Equation (3.5) and Equation (3.6):

x_h^H = x_{h−1}^H + w_h  (3.18)

M_h^H = A_h x_h^H + v_h  (3.19)

This indicates that the historical states and measurements follow the same assignment-matrix-based linear model as the deviations, i.e. that the linear measurement equation holds globally. This is a major and unnecessary constraint on the measurement model. Using deviations instead of absolute values is a major improvement precisely because the historical values already account for the non-linearity: the deviation form is a local linearization in the vicinity of the historical values, not a global linearization. Here, in Equation (3.10), the assumption is imposed on the deviations only, and the H_h matrix is not necessarily the assignment matrix. In fact, it depicts the local linear relationship around the historical values. As will be demonstrated in Section 3.2, the H_h matrix is calculated through numerical methods, not through the assignment matrix generated by the DTA model.

We conclude this section with the following remarks. First, in this thesis, the state transition model is simplified as a random walk and the focus is on estimating the measurement model g_h, where a similar simplification is made. Second, the construction of the measurement model is general in handling different data and parameter types, because it is based on local linearization with numerical methods. Finally, Equation (3.9) and Equation (3.10) are utilized to implement the idea of deviations, which is an elegant framework to avoid modeling the structural pattern in state vectors.
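As a sketch of how the deviation idea in Equations (3.7) to (3.10) fits together in a single measurement update, consider the following hedged Python fragment (illustrative only, not the thesis implementation; all names are assumptions):

```python
import numpy as np

def deviation_update(x_hist, M_hist, H, M_obs, P, R):
    """One deviation-based measurement update (Eqs. 3.7-3.10).
    x_hist, M_hist: historical state and measurement vectors.
    H: local linearization of g_h around the historical values.
    P, R: covariances of the state deviation and the measurement noise."""
    dM = M_obs - M_hist                # measurement deviation, Eq. (3.8)
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # gain applied to the deviation
    dx = K @ dM                        # state deviation estimate (prior dx = 0)
    return x_hist + dx                 # back to the absolute state, Eq. (3.7)
```

The point of the sketch is that the filter operates entirely on deviations, and the historical values re-enter only at the last line.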

3.2 Extended Kalman Filter and Variants in DTA Calibration

Several Kalman filter variants have been applied to solve the state-space formulation in the context of online calibration. Here, the extended Kalman filter (EKF) algorithm is reviewed first and its connection to the state-space model is made explicit. Then its variants are summarized and commented upon. Last but not least, the drawbacks of current EKF practice are addressed, which leads to the next section.

3.2.1 Extended Kalman Filter

The extended Kalman filter is an approach to handle non-linearity. The transition and measurement equations are both non-linear in this case. The basic equations are:

• Transition equation

x_h = f(x_{h−1}, u_h) + w_h  (3.20)

• Measurement equation

M_h = g(x_h) + v_h  (3.21)

where f and g depict the transition and measurement relationships, which are assumed fixed over time; u_h are control vectors, which we assume to be always 0 in the DTA model because the objective is to calibrate the system rather than control it; and w_h and v_h are uncorrelated zero-mean multivariate Gaussian errors, with covariance matrices Q_h and R_h, respectively.

By comparing the state-space model and the EKF assumptions, we conclude that the EKF makes the same assumptions as the simplified model of Equation (3.5) and Equation (3.6), together with Equation (3.11) to Equation (3.17), except for two additional assumptions: the time-invariant model form and the Gaussian distribution of the error terms.

When the time-invariance assumption does not hold, the EKF algorithm can inherently handle a time-dependent model, as discussed later. As for the Gaussian assumption, the idea of deviations ensures zero mean, but the shape could be non-Gaussian. When we are modeling day-to-day traffic fluctuations, if the historical OD flows are calculated from enough data, the Gaussian assumption is likely to hold.

Given these basic assumptions, the extended Kalman filter algorithm is displayed below.

For the extended Kalman filter algorithm, the input parameters are:

• x_0: initial guess of the state vector at time h = 0

• P_0: initial guess of the covariance matrix of x_0

• Q_h: time-variant covariance matrix of w_h, h ∈ H

• R_h: time-variant covariance matrix of v_h, h ∈ H

The Kalman filter is an online algorithm, which means the measurements arrive while the algorithm is running. Assuming we have the estimates in the last time step


Algorithm 1 Extended Kalman Filter

Initialize:
    x̂_{0|0} = x_0  (3.22)
    P_{0|0} = P_0  (3.23)

for h = 1 to H do

    Time Update

    Predicted state estimate:
        x̂_{h|h−1} = f_{h−1}(x̂_{h−1|h−1})  (3.24)
    Transition equation linearization:
        F_{h−1} = ∂f_{h−1}/∂x |_{x̂_{h−1|h−1}}  (3.25)
    Predicted covariance estimate:
        P_{h|h−1} = F_{h−1} P_{h−1|h−1} F_{h−1}^⊤ + Q_h  (3.26)

    Measurement Update

    INPUT: real-time measurement M_h
    Measurement equation linearization:
        H_h = ∂g_h/∂x |_{x̂_{h|h−1}}  (3.27)
    Near-optimal Kalman gain:
        K_h = P_{h|h−1} H_h^⊤ (H_h P_{h|h−1} H_h^⊤ + R_h)^{−1}  (3.28)
    Updated state estimate:
        x̂_{h|h} = x̂_{h|h−1} + K_h (M_h − g_h(x̂_{h|h−1}))  (3.29)
    Updated covariance estimate:
        P_{h|h} = P_{h|h−1} − K_h H_h P_{h|h−1}  (3.30)

    OUTPUT: posterior estimates x̂_{h|h} and P_{h|h}

end for


h − 1, x̂_{h−1|h−1} and P_{h−1|h−1}, the algorithm can immediately give us the predicted estimates x̂_{h|h−1} and P_{h|h−1} according to the Time Update step. These are called prior estimates, since they are based on the model only. The subscript h|h − 1 means that measurements up to time h − 1 are revealed but we are predicting for time h. When new measurements M_h become available, the estimates are updated in the Measurement Update step. The updated estimates x̂_{h|h} and P_{h|h} are called posterior estimates.
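A minimal Python sketch of one iteration of Algorithm 1 may clarify the update order (illustrative only; `f`, `g` and their Jacobian callbacks `F`, `H` are assumed stand-ins for the DTA model and its linearization):

```python
import numpy as np

def ekf_step(x_prev, P_prev, M_h, f, g, F, H, Q, R):
    """One EKF iteration (Algorithm 1): time update, then measurement update.
    f, g: transition and measurement models; F, H: their Jacobians."""
    # Time update (prior estimates)
    x_pred = f(x_prev)                           # Eq. (3.24)
    F_h = F(x_prev)                              # Eq. (3.25)
    P_pred = F_h @ P_prev @ F_h.T + Q            # Eq. (3.26)
    # Measurement update (posterior estimates)
    H_h = H(x_pred)                              # Eq. (3.27)
    S = H_h @ P_pred @ H_h.T + R
    K = P_pred @ H_h.T @ np.linalg.inv(S)        # Eq. (3.28)
    x_post = x_pred + K @ (M_h - g(x_pred))      # Eq. (3.29)
    P_post = P_pred - K @ H_h @ P_pred           # Eq. (3.30)
    return x_post, P_post
```

In the DTA setting, evaluating `g` means running the simulator, and `H` is obtained numerically as described in Section 3.2.2.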

A directed graph model on which the extended Kalman filter is based is shown below. The horizontal arrows correspond to the state transition model, and the vertical arrows to the measurement model. The directions of the arrows show the causal relations over time. For each time step h, given M_h, we can infer x_h and predict x_{h+1}, just as Algorithm 1 shows.

Figure 3-1: Directed Graph Model in Kalman Filtering Scheme

We make some observations about the extended Kalman filter algorithm.

First, it is a non-linear extension of the linear Kalman filter (Kalman, 1960). It linearizes the functions f and g locally so that the linear Kalman filter update formulas can be applied.

Second, in Algorithm 1, time-dependent f_{h−1} and g_h are used instead of time-invariant f and g, which is necessary in the traffic simulation field. For instance, the same OD flow from the CBD to a suburban area will probably contribute less to congestion when added during the morning peak than when added during the evening peak. By making these functions time-dependent, we allow the relations to change with time. One can think of f_{h−1} and g_h as depending on the time step h, which in reality means depending on the current traffic situation: the distribution of traffic over the network, the congestion level, incidents, weather, and daytime versus nighttime. Recall that the focus of this thesis is g_h. In our earlier simplification, we model the influence of the previous state vectors x_h, ..., x_{h−q+1} using only g_h(x_h). This is not a major compromise if our target is to estimate the state for the current interval h only, since the influence of previous state vectors is already captured by g_h.

Third, the Kalman filter framework is general, in the sense that it does not constrain the types of parameters and measurements. So the framework can handle all the parameters and measurements in the DTA model.

Last but not least, the two linearization steps are crucial, since they represent the underlying non-linear functions. Again, in our setting we care about the measurement model, so the linearization is only estimated for g_h. There are different methods to estimate the linearization, leading to different EKF variants, which are the focus of the following part.

3.2.2 Finite Difference and FD-EKF

The finite difference method is a way to obtain the gradient. It is the most straightforward way to calculate the gradient when we do not have an analytical formulation of g_h. In our setting, the g_h function is the simulation-based DTA model. Assuming g_h(·) is a vector of dimension m and x_h is a vector of dimension n, the gradient is an m × n matrix that can be calculated by

$$H_h = \begin{bmatrix} \frac{\partial g_h^1}{\partial x_1} & \cdots & \frac{\partial g_h^1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_h^m}{\partial x_1} & \cdots & \frac{\partial g_h^m}{\partial x_n} \end{bmatrix}_{\hat{x}_{h|h-1}} \quad (3.31)$$

where

$$\begin{bmatrix} \frac{\partial g_h^1}{\partial x_i} \\ \vdots \\ \frac{\partial g_h^m}{\partial x_i} \end{bmatrix} \approx \frac{g_h(x_h + \boldsymbol{\delta}_i) - g_h(x_h - \boldsymbol{\delta}_i)}{2\delta_i} \quad (3.32)$$

$$\boldsymbol{\delta}_i = [0, 0, ..., \delta_i, ..., 0]^\top \quad (3.33)$$

The vector δ_i is called the perturbation vector, as it indicates that the vector x_h is perturbed in its i-th element. Equation (3.32) approximates the i-th column of the H_h matrix: it shows the change in all m measurements caused by a change in the i-th element of x_h.

The method shown in Equation (3.32) is the central finite difference. We make some remarks about this method. In our DTA setting, the simulation plays the role of g_h. Thus, to obtain one column of H_h we need 2 runs of the simulation (g_h), so the algorithm has a complexity of O(n) per H_h. Note that the unit of complexity is one run of the simulation; depending on the network size and the number of simulated vehicles, the time needed for one run can vary greatly.

Based on this method, the extended Kalman filter algorithm is called FD-EKF in this thesis.
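As an illustrative sketch (not the thesis code), the central finite difference construction of H_h in Equations (3.31) to (3.33) can be written as:

```python
import numpy as np

def central_fd_jacobian(g, x, delta=1e-3):
    """Approximate H_h = dg/dx at x by central finite differences (Eq. 3.32).
    g: black-box model mapping R^n -> R^m (one call = one simulation run).
    Costs 2n evaluations of g, hence the O(n) complexity of FD-EKF."""
    n = x.size
    m = g(x).size
    H = np.zeros((m, n))
    for i in range(n):
        d = np.zeros(n)
        d[i] = delta                                   # perturbation vector delta_i
        H[:, i] = (g(x + d) - g(x - d)) / (2 * delta)  # i-th column of H
    return H
```

Because each column requires two independent model evaluations, the loop body parallelizes trivially across columns, which is the property exploited later for FD-EKF.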

3.2.3 Simultaneous Perturbation and SP-EKF

Another method to calculate H_h is called simultaneous perturbation: instead of perturbing one dimension at a time, all dimensions are perturbed by a small amount at the same time.

$$H_h = \begin{bmatrix} \frac{\partial g_h^1}{\partial x_1} & \cdots & \frac{\partial g_h^1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_h^m}{\partial x_1} & \cdots & \frac{\partial g_h^m}{\partial x_n} \end{bmatrix}_{\hat{x}_{h|h-1}} \quad (3.34)$$

where

$$\begin{bmatrix} \frac{\partial g_h^1}{\partial x_i} \\ \vdots \\ \frac{\partial g_h^m}{\partial x_i} \end{bmatrix} \approx \frac{g_h(x_h + \boldsymbol{\delta}) - g_h(x_h - \boldsymbol{\delta})}{2\delta_i} \quad (3.35)$$

$$\boldsymbol{\delta} = [\delta_1, \delta_2, ..., \delta_i, ..., \delta_n]^\top \quad (3.36)$$

The perturbation vector is perturbed randomly in each dimension. Notice that in this case all the columns of H_h share the same numerator vector in Equation (3.35), so only two evaluations of g_h are needed. Thus, obtaining an approximate H_h takes O(1) calculations; the discussion of the complexity unit in FD-EKF still holds here. The extended Kalman filter with simultaneous perturbation is called SP-EKF (Antoniou et al., 2007). Compared with FD-EKF, it saves much computation time, but the approximated gradient matrix is inaccurate: since all columns of H_h are calculated from the same numerator vector, they are linearly dependent, and in fact the rank of H_h is 1. Because of this characteristic, and because our target is accurate parameter estimation, this thesis focuses on FD-EKF to obtain the most accurate gradient estimate. The EKF discussed in the following sections of this thesis is FD-EKF, unless SP is specified.

3.2.4 Characteristics of EKF in Online Calibration for DTA

As stated before, the EKF is based on a linearization of the non-linear functions. According to Online Calibration of Dynamic Traffic Assignment (Antoniou, 2004), EKF outperforms UKF in terms of the error between simulated and observed measurements. This observation holds for both estimation and prediction, and demonstrates that EKF performs well in practice even though it only approximates the non-linear model to first order. That case study was conducted on a freeway with ramps, a relatively low-complexity scenario with only 80 parameters per 15-minute interval. Since the goal of online calibration is real-time performance, EKF performance on complex networks with larger dimension and shorter time intervals has yet to be investigated.

3.3 Constrained EKF

In this section, the situation where there are constraints on the state vectors is considered within the EKF framework, and an efficient near-optimal method is proposed.

3.3.1 Motivation

When we have constraints on the state vectors, a simple way to impose them is to project the estimated state vector onto the feasible region. When the constraints take the form of lower and upper bounds, we can simply project each element of the state vector onto its corresponding feasible interval. This element-wise projection is called truncation, a term used often in the remainder of this thesis. In the context of OD flow estimation, a given OD flow variable should be non-negative because the number of vehicles cannot be negative. Hence we have x ≥ 0 for this OD pair as a natural non-negativity constraint, and truncation is needed for negative OD estimates before they can be fed into the DTA system. Although efficient, this fix is not necessarily correct, because estimates in different dimensions are correlated: truncating one variable while keeping the others intact disregards its relation with the other variables.

As discussed above, there are natural non-negativity constraints on the OD values. Since EKF is based on unconstrained Gaussian assumptions, the estimates are likely to violate those constraints, especially when the true OD flows are zero or close to zero and the estimated variance is large. In this case, the Kalman filter gives estimates that are noise around the true value: for OD pairs whose true values are zero, the estimates oscillate around 0, coming out either positive or negative. Due to the truncation, the negative values are then set to zero. Thus, on average we estimate positive values for OD pairs that should all be zero. Since this overestimation happens in every interval, the error accumulates, and the calibrated DTA system drifts further and further from the true traffic scenario we want to fit. A detailed demonstration is given in Section 4.3.2 of the case study.
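The positive bias introduced by truncating zero-mean noise can be sketched numerically (an illustrative simulation, not thesis data): for a zero OD flow with Gaussian estimation noise of standard deviation σ, the truncated estimates average roughly σ/√(2π) instead of 0.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 5.0
true_od = 0.0                                      # an OD pair whose true flow is zero
noisy = true_od + rng.normal(0.0, sigma, 10_000)   # unconstrained filter estimates
truncated = np.maximum(noisy, 0.0)                 # element-wise truncation to x >= 0

# The unconstrained estimates are unbiased, but the truncated ones are biased
# upward: E[max(N(0, sigma^2), 0)] = sigma / sqrt(2*pi), about 2.0 for sigma = 5.
```

This per-interval bias is precisely the error that accumulates over successive calibration intervals.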

3.3.2 Optimization Formulation for Constrained Kalman Filter Estimates

In this subsection, the theoretical basis of the constrained Kalman filter is discussed. First, the maximum a posteriori (MAP) estimation idea is introduced to clarify the objective of EKF. With this objective, the true MAP estimate under the constraints is demonstrated in a two-dimensional example. Finally, the general optimization formulation is presented.

As discussed in Section 3.2.1, the Kalman filter family assumes Gaussian-distributed error terms. Thus the state estimator, as a random vector, is also Gaussian distributed. The Kalman filter estimate x̂_{h|h} at time step h is essentially the maximum a posteriori (MAP) estimate, updated given the measurements and the prior distribution (based on the transition model). Equation (3.30) gives the posterior covariance matrix P_{h|h}, which describes the shape of the posterior Gaussian distribution. Thus, we can reconstruct the posterior distribution as:

distribution. Thus, we can reconstruct the posterior distribution as:

fX(x) = 1 p(2π)n|P h|h| exp  −1 2(x − ˆxh|h) > Ph|h−1(x − ˆxh|h)  (3.37)

where, n is the dimension of vector x.

For instance, the contours in Figure 3-2 show the posterior probability density function (PDF) for a 2-dimensional case. This is an example with (x, y) ∼ N(µ, Σ), where

$$\mu = \begin{bmatrix} 0.5 \\ -1 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 1 & 0.7 \\ 0.7 & 1 \end{bmatrix}$$

We can see that the "cross" is the center of the PDF, which is the MAP estimate of the unconstrained Kalman filter. When we directly impose the constraints x ≥ 0, y ≥ 0 by truncation, we get the "circle" point. But in terms of maximizing the posterior probability density under the constraints, the "circle" point does poorly; the true constrained MAP estimate is the "asterisk" point.

Figure 3-2: 2-D Posterior PDF Contour and Different Estimators
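For the example above, the comparison can be checked numerically by evaluating the quadratic objective of the constrained MAP problem (smaller value means higher posterior density). The snippet below is an illustrative check, not thesis code; the point (1.2, 0) is the constrained MAP obtained from the conditional-MAP formula derived later in this chapter.

```python
import numpy as np

mu = np.array([0.5, -1.0])                 # posterior mean from Figure 3-2
Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.0]])
J = np.linalg.inv(Sigma)                   # posterior information matrix

def objective(p):
    """Quadratic form (p - mu)^T Sigma^{-1} (p - mu); minimizing it over the
    feasible region maximizes the posterior density there."""
    d = p - mu
    return d @ J @ d

truncated = np.array([0.5, 0.0])           # the "circle": element-wise truncation
constrained_map = np.array([1.2, 0.0])     # the "asterisk": constrained MAP
```

Evaluating both points gives objective(constrained_map) = 1.0 versus objective(truncated) ≈ 1.96, confirming that the truncated point is not the constrained maximizer.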

This problem is known as Kalman filtering with state inequality constraints, and is discussed in (Simon and Simon, 2006; Simon, 2010). The authors formulated it as a quadratic program with linear inequality constraints, as shown in the following display; its solution gives the estimate under the inequality constraints.

$$\max_x f_X(x) \;\Leftrightarrow\; \min_x (x - \bar{x})^\top \Sigma^{-1} (x - \bar{x}) \quad (3.38)$$

$$\text{s.t.} \quad Dx \leq d \quad (3.39)$$

where Σ is the covariance estimate associated with the (unconstrained) state estimate x̄; D is a known s × n constant matrix, s is the number of constraints, n is the number of state variables, and s ≤ n. Further, D is assumed to have full rank s; if the rank of D is less than s, we can always drop redundant constraints to make it full rank.

In Kalman Filtering with Inequality Constraints for Turbofan Engine Health Estimation (Simon and Simon, 2006), the same quadratic optimization formulation is proposed and the general idea of the active set method is discussed. The authors also proved that the variance of the constrained estimates is smaller than that of the unconstrained estimates. Their case study on turbofan health monitoring solved the optimization problem by quadratic programming; compared with the unconstrained Kalman filter, the estimation error was reduced by roughly 50% on average. This suggests that the constraints will also be very helpful for the problem discussed in Section 3.3.1.

3.3.3 An Efficient Near-Optimal Algorithm for EKF with Bound Constraints in DTA Calibration

In the specific context of DTA estimation, constraints are usually imposed on individual parameters. In the OD estimation example, we have x ≥ 0. Another example is supply parameters, where the supply parameter vector can have both lower and upper bounds, i.e. s_lb ≤ s ≤ s_ub. So in our DTA calibration case, we have the following optimization formulation after each measurement update:

$$\min_x (x - \bar{x})^\top \Sigma^{-1} (x - \bar{x}) \quad (3.40)$$

$$\text{s.t.} \quad x_{lb} \leq x \leq x_{ub} \quad (3.41)$$

where x̄ = x̂_{h|h}, Σ = P_{h|h}, and x_lb and x_ub are the lower and upper bound vectors for the state vector x.

An intuition for solving this builds on the truncation practice. For simplicity of the demonstration, assume there is no upper bound for x and focus on the lower bound. When we truncate x, we set the elements that violate the lower bounds to the corresponding elements of x_lb, which essentially adds equality constraints to the optimization problem. Let Set A contain the indices where truncation is performed, and let Set A^c be the complement of Set A. To solve the problem, we can then maximize the following conditional PDF:

$$\max_{x_{A^c}} f_X\left( x_{A^c} \,\middle|\, x_A = (x_{lb})_A \right) \quad (3.42)$$

Maximizing the conditional probability (the objective function) is equivalent to maximizing the joint probability f_X(x_{A^c}, x_A = (x_lb)_A), since

$$f_X\left( x_{A^c} \,\middle|\, x_A = (x_{lb})_A \right) = \frac{f_X\left( x_{A^c}, x_A = (x_{lb})_A \right)}{f_X\left( x_A = (x_{lb})_A \right)}$$

and the denominator is a fixed probability density for given x̄ and Σ. Thus, we want to:

$$\max_{x_{A^c}} f_X\left( x_{A^c}, x_A = (x_{lb})_A \right) \;\Leftrightarrow\; \min_{x_{A^c}} (x - \bar{x})^\top \Sigma^{-1} (x - \bar{x}) \;\Big|_{\, x_A = (x_{lb})_A} \quad (3.43)$$

Now we prove that the solution of Equation (3.42) is:

$$x_{A^c} = \bar{x}_{A^c} + \Sigma_{A^c,A} (\Sigma_{A,A})^{-1} \left( x_A - \bar{x}_A \right) \quad (3.44)$$


Proof. Assume that every index in Set A^c is smaller than every index in Set A. If not, we can always perform row and column exchanges on Σ and element exchanges on x̄ so that this assumption holds. Thus Σ and x̄ can be split into blocks. In addition, we use the following notation:

$$\bar{x} = \begin{bmatrix} \bar{x}_{A^c} \\ \bar{x}_A \end{bmatrix} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \end{bmatrix} \quad (3.46)$$

$$\Sigma^{-1} = J = \begin{bmatrix} J_{A^c,A^c} & J_{A^c,A} \\ J_{A,A^c} & J_{A,A} \end{bmatrix} = \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix} \quad (3.47)$$

$$\Sigma^{-1}\bar{x} = \begin{bmatrix} J_{11}\bar{x}_1 + J_{12}\bar{x}_2 \\ J_{21}\bar{x}_1 + J_{22}\bar{x}_2 \end{bmatrix} = h = \begin{bmatrix} h_{A^c} \\ h_A \end{bmatrix} = \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} \quad (3.48)$$

Thus,

$$(x - \bar{x})^\top \Sigma^{-1} (x - \bar{x}) = x^\top J x - 2\bar{x}^\top J x + \bar{x}^\top J \bar{x} \quad (3.49)$$

$$= \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^\top \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - 2 \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}^\top \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + C \quad (3.50)$$

$$= x_1^\top J_{11} x_1 + \left( 2(J_{12}x_2)^\top - 2h_1^\top \right) x_1 + x_2^\top J_{22} x_2 - 2h_2^\top x_2 + C \quad (3.51)$$

where C is a constant independent of x. Note that x_2 is fixed at (x_lb)_A and only x_1 is variable. Thus, by the first-order condition:

$$2J_{11}x_1 + 2J_{12}x_2 - 2h_1 = 0 \quad (3.52)$$

$$\Rightarrow x_1 = (J_{11})^{-1}h_1 - (J_{11})^{-1}J_{12}x_2 \quad (3.53)$$

We now claim that J_{11} is invertible. Indeed, Σ, as the covariance matrix of a (non-degenerate) multivariate normal, is invertible, so Σ and J are both positive definite. Since all leading principal minors of J are positive, the leading principal minors of J_{11} are also positive; hence J_{11} is positive definite and invertible.

Substituting h_1 = J_{11}\bar{x}_1 + J_{12}\bar{x}_2 back, we obtain

$$x_1 = \bar{x}_1 - (J_{11})^{-1}J_{12}\left( x_2 - \bar{x}_2 \right) \quad (3.54)$$

where x_2 = x_A = (x_lb)_A.

In the following part, we will prove that $(J_{11})^{-1}J_{12} = -\Sigma_{12}(\Sigma_{22})^{-1}$, where $\Sigma$ is also divided into blocks like $J$:

\[
\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \tag{3.55}
\]

So, we can perform row operations:

\[
\begin{bmatrix} I & -\Sigma_{12}(\Sigma_{22})^{-1} \end{bmatrix} \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} = \begin{bmatrix} \Sigma' & 0 \end{bmatrix} \tag{3.56}
\]

with $\Sigma' \triangleq \Sigma_{11} - \Sigma_{12}(\Sigma_{22})^{-1}\Sigma_{21}$.

Since $\Sigma J = I$, right-multiplying both sides of Equation (3.56) by $J$ gives us

\[
\begin{bmatrix} I & -\Sigma_{12}(\Sigma_{22})^{-1} \end{bmatrix} = \begin{bmatrix} \Sigma' & 0 \end{bmatrix} \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix} = \begin{bmatrix} \Sigma' J_{11} & \Sigma' J_{12} \end{bmatrix} \tag{3.57}
\]

from which we conclude the following by matching entries on both sides:

\[
\Sigma' = (J_{11})^{-1} \tag{3.58}
\]
\[
-\Sigma_{12}(\Sigma_{22})^{-1} = (J_{11})^{-1}J_{12} \tag{3.59}
\]

Substituting Equation (3.59) into Equation (3.54) yields Equation (3.44), which completes the proof.
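Identities (3.58) and (3.59) are standard Schur-complement facts about block inverses. A quick NumPy spot check (using an arbitrary SPD matrix of our choosing, not data from the thesis) confirms them:

```python
import numpy as np

rng = np.random.default_rng(42)
M = rng.standard_normal((5, 5))
Sigma = M @ M.T + 5 * np.eye(5)           # SPD, hence invertible
J = np.linalg.inv(Sigma)

k = 2                                     # block split: first k rows/cols vs the rest
S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
S21, S22 = Sigma[k:, :k], Sigma[k:, k:]
J11, J12 = J[:k, :k], J[:k, k:]

# Equation (3.58): Sigma' = Sigma11 - Sigma12 Sigma22^{-1} Sigma21 = (J11)^{-1}
Sigma_prime = S11 - S12 @ np.linalg.solve(S22, S21)
assert np.allclose(Sigma_prime, np.linalg.inv(J11))

# Equation (3.59): -Sigma12 (Sigma22)^{-1} = (J11)^{-1} J12
assert np.allclose(-S12 @ np.linalg.inv(S22), np.linalg.solve(J11, J12))
```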

Thus, returning to the general case in Equation (3.40) and Equation (3.41): when the MAP estimate of the unconstrained case violates the bounds (at the indices in Set $A$), we can set those elements to the bounds nearest to them, and then obtain the conditional MAP with $x_A$ fixed to the bounds, according to Equation (3.44). Note that this conditional MAP is not guaranteed to satisfy the bounds for $x_{A^c}$. Thus we can proceed iteratively, adding the indices where bounds are violated from Set $A^c$ to Set $A$ and re-estimating the conditional MAP, until all elements whose indices are in Set $A^c$ are in the feasible region. The near-optimal algorithm is specified as Algorithm 2.

Algorithm 2: EKF with Iteratively Added Equality Constraints

  Run the EKF and obtain the state estimate $\bar{x}$ and variance estimate $\Sigma$; $n$ is the dimension of $x$
  Initialize $I \leftarrow \emptyset$, $A \leftarrow \emptyset$, $x \leftarrow \bar{x}$
  do
    if $I \neq \emptyset$ then
      Adjust invalid state elements:
        $x_{I_{lb}} \leftarrow x^{lb}_{I_{lb}}$  (3.60)
        $x_{I_{ub}} \leftarrow x^{ub}_{I_{ub}}$  (3.61)
      Find conditional MAP estimates:
        $A \leftarrow A \cup I$  (3.62)
        $A^c \leftarrow \{1, 2, \ldots, n\} \setminus A$  (3.63)
        $x_{A^c} \leftarrow \bar{x}_{A^c} + \Sigma_{A^c,A}(\Sigma_{A,A})^{-1}(x_A - \bar{x}_A)$  (3.64)
    end if
    $I_{lb} \leftarrow \emptyset$, $I_{ub} \leftarrow \emptyset$
    Identify invalid state indices:
    for $j = 1$ to $n$ do
      if $x_j < x^{lb}_j$ then $I_{lb} \leftarrow I_{lb} \cup \{j\}$
      else if $x_j > x^{ub}_j$ then $I_{ub} \leftarrow I_{ub} \cup \{j\}$
      end if
    end for
    $I \leftarrow I_{lb} \cup I_{ub}$  (3.65)
  while $I \neq \emptyset$

Here $x_A = [x_{A(1)}, \ldots, x_{A(|A|)}]^{\top}$, where $A(j)$ is the $j$-th element in Set $A$ and $|A|$ is the cardinality of Set $A$; similarly, $\Sigma_{A^c,A} = [\Sigma_{i,j}]_{(i,j) \in A^c \times A}$.
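The iteration of Algorithm 2 can be sketched in Python/NumPy as follows. This is our own illustrative reconstruction, not the thesis implementation; the function and variable names are assumptions. All of `xbar`, `lb`, and `ub` are expected to be NumPy arrays of length `n`:

```python
import numpy as np

def ekf_conditional_map(xbar, Sigma, lb, ub):
    """Near-optimal bound handling after an EKF update (sketch of Algorithm 2):
    clamp out-of-bound elements to their nearest bound, then recompute the
    conditional MAP of the free elements via Equation (3.44); repeat until
    no element violates its bounds."""
    n = xbar.size
    x = xbar.copy()
    A = []                                       # clamped (equality-constrained) index set
    while True:
        # Identify invalid state indices
        I = [j for j in range(n) if x[j] < lb[j] or x[j] > ub[j]]
        if not I:
            return x                             # all bounds satisfied
        # Adjust invalid elements to their nearest bound
        x[I] = np.clip(x[I], lb[I], ub[I])
        A = sorted(set(A) | set(I))
        Ac = [j for j in range(n) if j not in A]
        if not Ac:
            return x                             # everything is clamped
        # Conditional MAP of the free block given the clamped block
        x[Ac] = xbar[Ac] + Sigma[np.ix_(Ac, A)] @ np.linalg.solve(
            Sigma[np.ix_(A, A)], x[A] - xbar[A])
```

Since clamped elements sit exactly on their bounds, each pass either terminates or strictly grows $A$, so the loop finishes after at most $n$ passes. With nonnegativity bounds (`lb = 0`, `ub = +inf`), for example, the returned estimate is feasible by construction.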

Based on our experiments, this algorithm yields an objective value (Equation (3.40)) around 2% worse than the optimum, but it is much more efficient than solving the equivalent quadratic programming problem.

3.3.4 Coordinate Descent Algorithm with Near-Optimal Initialization

When we are interested in the true optimum, this algorithm can also serve as an initialization, in other words, a starting point for the quadratic program. Since the constraints are decoupled across elements, a coordinate descent method can be applied to solve the quadratic programming problem. It is also faster than the quadratic programming toolbox in MATLAB. The specific coordinate descent algorithm we used is the following.

There are several remarks on the coordinate descent. First, the step size in each update is fixed to $1/Q_{j,j}$. Since the objective function is quadratic, an update with this step size gives the optimal solution for $x_j$ when the other dimensions are fixed. Second, this algorithm is computationally inexpensive, because there are no matrix multiplications in Equation (3.66). Last but not least, in the specific context of OD estimation, other objective functions could be used in the stopping rule. For instance, a distance measure (such as the L1 norm) between the current and the last estimated state vector could be used as the objective function. When the improvement of the objective function is less than $\epsilon$, the algorithm stops. When the L1 norm is used, $\epsilon = 0.001$ is used.


Algorithm 3: Coordinate Descent

  Initialize $x \leftarrow x_0$, $\epsilon \leftarrow 0.001$, $Q \leftarrow \Sigma^{-1}$, $b \leftarrow -\Sigma^{-1}\bar{x}$, $Obj_{this} \leftarrow (x - \bar{x})^{\top}\Sigma^{-1}(x - \bar{x})$
  do
    for $j = 1$ to $n$ do
      $x_j \leftarrow x_j - \dfrac{1}{Q_{j,j}}\left(Q_{j,1:n}\,x + b_j\right)$  (3.66)
      $x_j \leftarrow \max(x_j, x^{lb}_j)$  (3.67)
      $x_j \leftarrow \min(x_j, x^{ub}_j)$  (3.68)
    end for
    $Obj_{last} \leftarrow Obj_{this}$  (3.69)
    $Obj_{this} \leftarrow (x - \bar{x})^{\top}\Sigma^{-1}(x - \bar{x})$  (3.70)
  while $Obj_{last} - Obj_{this} \geq \epsilon$
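A compact Python/NumPy version of Algorithm 3 can be sketched as follows (our own illustrative sketch; the function and parameter names are assumptions, not from the thesis):

```python
import numpy as np

def coordinate_descent(xbar, Sigma, lb, ub, x0, eps=1e-3, max_sweeps=10000):
    """Projected coordinate descent for
        min (x - xbar)^T Sigma^{-1} (x - xbar)   s.t.  lb <= x <= ub,
    stopping when the objective improves by less than eps (Algorithm 3 sketch)."""
    Q = np.linalg.inv(Sigma)
    b = -Q @ xbar
    x = x0.astype(float).copy()
    obj_this = (x - xbar) @ Q @ (x - xbar)
    for _ in range(max_sweeps):
        for j in range(x.size):
            # Exact minimizer along coordinate j (Equation (3.66)) ...
            x[j] -= (Q[j] @ x + b[j]) / Q[j, j]
            # ... projected onto the box (Equations (3.67)-(3.68))
            x[j] = min(max(x[j], lb[j]), ub[j])
        obj_last, obj_this = obj_this, (x - xbar) @ Q @ (x - xbar)
        if obj_last - obj_this < eps:
            break
    return x
```

Each inner step minimizes the quadratic exactly along one coordinate and then clips it back into the box, so the objective never increases; when the bounds are inactive, the iterates converge to the unconstrained minimizer $\bar{x}$.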

