Challenges in Modeling Arrival and Service Processes in Service Systems

(1) 1. Challenges in Modeling Arrival and Service Processes in Service Systems. Pierre L’Ecuyer, Université de Montréal, Canada, and GERAD, CIRRELT. Thanks to Wyean Chan, Rouba Ibrahim, Boris Oreshkin, Nazim Régnard, Laure Leblanc, Delphine Réau, Mamadou Thiongane, Ger Koole. “Meet a GERAD Researcher” conference, November 2017.

(4) 2. Simulation Challenges. We want to simulate large, complex systems to study their behavior and improve decision making. The main challenges are:
- Trustworthy (valid) stochastic modeling of complex systems, taking into account various kinds of information.
- Simulation-based optimization and control.
- Speed of execution of large simulations.
- “Modeling” methodology and tools for large and complex systems.

(5) 3. Big Data. Sometimes huge amounts of data are available to build stochastic models. How can we exploit this mass of data to build credible models? How can we update the models effectively in real time as new data comes in? There are strong links with data mining, machine learning, and Bayesian statistics. This is generally much more complicated than selecting univariate distributions and estimating their parameters: model inputs are often multivariate distributions and stochastic processes, with hard-to-model (but important) dependence between them, and with parameters that are themselves stochastic.

(6) 4. Call centers (or contact centers). These include telephone sales, customer service, billing/recovery, public services, 911, taxis, pizza orders, emergency services, etc. They employ around 3% of the workforce in North America.

(7) 5. Example: A Multiskill Call Center. Different call types: the type depends on the required skill, language, importance, etc. Agent types (groups): each has a set of skills to handle certain call types. The service time distribution may depend on the pair ⟨call type, agent group⟩.
[Figure: calls of types 1, ..., K arrive at rates $\lambda_1, \dots, \lambda_K$ and may abandon; call routing rules and queues direct them to agent types $S_1, \dots, S_J$; $G_{k,j}$ is the service-time cdf for call type $k$ handled by agent type $j$.]

(9) 6. Examples of common performance measures.
Service level: $\mathrm{SL}(\tau)$ = fraction of calls answered within an acceptable waiting time $\tau$. (May exclude calls that abandon before $\tau$.) We may consider its observed value over a fixed time period (a random variable), its expectation, its long-run average (infinite horizon), or a tail probability $P[\mathrm{SL}(\tau) \ge \ell]$.
Abandonment ratio: fraction of calls that abandon.
Average waiting time for each call type.
Agent occupancy: fraction of the time that each agent is busy.
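As a minimal sketch of these measures, the following computes the observed SL($\tau$) and the abandonment ratio from a list of per-call records. The record layout (waiting time, abandoned flag) and the function names are assumptions made for illustration. The SL here excludes calls that abandon before $\tau$, which is one of the conventions mentioned on the slide.

```python
def service_level(calls, tau):
    """calls: list of (wait_seconds, abandoned) pairs (hypothetical record layout)."""
    # Calls answered (not abandoned) after waiting at most tau.
    answered_in_time = sum(1 for w, ab in calls if not ab and w <= tau)
    # Denominator excludes the calls that abandoned before tau.
    counted = sum(1 for w, ab in calls if not (ab and w < tau))
    return answered_in_time / counted if counted else float("nan")


def abandonment_ratio(calls):
    """Fraction of all calls that abandoned."""
    return sum(1 for _, ab in calls if ab) / len(calls)


# Illustrative records only (not data from the talk).
calls = [(5.0, False), (30.0, False), (10.0, True), (45.0, True), (12.0, False)]
sl_20 = service_level(calls, tau=20.0)
ab_ratio = abandonment_ratio(calls)
```

Computed over one day, this value is a single realization of the random variable SL($\tau$). Its expectation or tail probability would be estimated over many simulated or observed days.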

(12) 7. Performance evaluation, single call type. Arrival rate $\lambda$, service rate $\mu$, load $\lambda/\mu$, $s$ servers, waiting time $W$.
M/M/s queue (Erlang-C). CTMC model. Assumes Poisson arrivals with constant rate (not realistic) and a single call type. Approximations of $P[W > 0]$, $P[W > \tau]$, and $E[W]$.
Approximation under the quality-and-efficiency-driven (QED) regime: $\lambda \to \infty$ and $s \to \infty$ with $\alpha = P[W > 0] \in (0, 1)$ fixed; Halfin and Whitt (1981). Square-root safety staffing: $s^* = \lceil \lambda/\mu + \beta\sqrt{\lambda/\mu} \rceil$. Could make sense for some large call centers.
M/M/s + M queue (Erlang-A). Approximations of $\gamma = P[\text{abandon}]$, $P[W > 0]$, and $\alpha = P[W > \tau]$. QED($\tau$): fix $\tau$, $\alpha$, and $\gamma > 0$. Modified square-root rule: $s^* = \lceil (1 - \gamma)\lambda/\mu + \delta\sqrt{(1 - \gamma)\lambda/\mu} \rceil$.
Erlang formula calculators developed by Wyean Chan: http://www-ens.iro.umontreal.ca/~chanwyea/erlang/erlangC.html
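For concreteness, here is a small sketch of the Erlang-C quantities and the square-root safety staffing rule for the M/M/s queue. It uses the standard Erlang-B recursion and is not the calculator linked above. The function names are illustrative, and $\beta$ must be supplied by the user.

```python
import math


def erlang_c(s, a):
    """P[W > 0] in an M/M/s queue with offered load a = lam/mu, assuming a < s."""
    b = 1.0  # Erlang-B blocking probability with 0 servers
    for k in range(1, s + 1):
        b = a * b / (k + a * b)  # Erlang-B recursion on the number of servers
    rho = a / s
    return b / (1.0 - rho * (1.0 - b))  # standard conversion from Erlang-B to Erlang-C


def prob_wait_exceeds(s, lam, mu, tau):
    """P[W > tau] in an M/M/s queue (requires s * mu > lam)."""
    return erlang_c(s, lam / mu) * math.exp(-(s * mu - lam) * tau)


def square_root_staffing(lam, mu, beta):
    """s* = ceil(lam/mu + beta * sqrt(lam/mu))."""
    a = lam / mu
    return math.ceil(a + beta * math.sqrt(a))
```

For example, `square_root_staffing(100.0, 1.0, 1.0)` staffs a load of 100 with a safety margin of one square root of the load, and `1 - prob_wait_exceeds(s, lam, mu, tau)` is the long-run SL($\tau$) under the Erlang-C assumptions.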

(13) 8. Multiple call types, multiskill agents. Much more difficult. Call routing rules become important and can be complicated. Approximations for service levels are not very good, so we must rely on simulation. In my lab, we developed ContactCenters and CCOptim, Java simulation and optimization software libraries for contact centers, as well as some tools for model estimation from data. They were developed mostly by Eric Buist (simulation part) and Wyean Chan (optimization part).

(16) 9. Typical call center. The arrival process is nonstationary and much more complicated than Poisson. Service times are not exponential and not really independent. There are abandonments (balking + reneging), retrials, returns, etc.
Skill-based routing: rules that control in real time the call-to-agent and agent-to-call assignments. They can be complex in general. Static vs dynamic rules (e.g., using weights).
Agents using fewer skills tend to work faster, and are also less expensive. There is a compromise between single-skill agents (specialists) and flexible multiskill agents (generalists).
Staffing/scheduling/routing optimization: the objective function and constraints can account for the cost of agents, service level, expected excess waiting time, average wait, abandonment ratios, agent occupancy ratios, fairness in service levels and in agent occupancies, etc. There are also various constraints on work schedules.

(17) 10. Data on call arrivals. Available observations (for each day): $X = (X_1, \dots, X_p)$, arrival counts over successive time periods (of 15 or 30 minutes).
[Figure: typical realizations of $X$ for a Monday (15-minute periods); x-axis: quarter of hour, from 6am to 8pm; y-axis: number of calls.]
Non-stationary. Strong dependence between the $X_j$'s. Similar behavior in many other settings: customer arrivals at stores, incoming demands for a product, arrivals at a hospital emergency department, etc.

(18) 11. [Figure: all days, call volumes before and after T = 2 p.m.; x-axis: number of calls arrived that day before T; y-axis: number of calls arrived that day after T; points marked by day of the week (Mon. to Sun.).]

(22) 12. Modeling the arrivals. Stationary Poisson process as in the Erlang formulas? No.
Poisson process with time-dependent arrival rate $\lambda(t)$? This would imply that $\mathrm{Var}[X_j] = E[X_j]$, which is typically far from true.
The true arrival rate depends on several factors that are hard to predict. We can view it as stochastic, say $\Lambda_j = B_j \lambda_j$ and $X_j \sim \mathrm{Poisson}(\Lambda_j)$ over period $j$, where $B = (B_1, \dots, B_p)$ is a vector of random busyness factors with $E[B_j] = 1$, and $\lambda = (\lambda_1, \dots, \lambda_p)$ is a vector of constant base rates (scaling factors). Then
$$\mathrm{Var}[X_j] = E[\mathrm{Var}[X_j \mid B_j]] + \mathrm{Var}[E[X_j \mid B_j]] = \lambda_j (1 + \lambda_j \mathrm{Var}[B_j]).$$
Dispersion index (DI) and its standardized version (SDI):
$$\mathrm{DI}(X_j) = \mathrm{Var}[X_j]/\lambda_j = 1 + \lambda_j \mathrm{Var}[B_j] \ge 1, \qquad \mathrm{SDI}(X_j) = (\mathrm{DI}(X_j) - 1)/\lambda_j = \mathrm{Var}[B_j].$$
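The DI formula can be checked numerically. The sketch below simulates counts for a single period with a gamma busyness factor of mean 1 and compares the empirical DI with $1 + \lambda_j \mathrm{Var}[B_j]$. The parameter values, the gamma choice for $B_j$, and the simple Poisson sampler are assumptions chosen for illustration.

```python
import random


def poisson(rng, mean):
    """Poisson variate: count unit-rate exponential interarrivals falling in [0, mean]."""
    n, t = 0, rng.expovariate(1.0)
    while t <= mean:
        n += 1
        t += rng.expovariate(1.0)
    return n


def simulate_counts(lam_j, alpha, n_days, seed=1):
    """Counts X_j ~ Poisson(B_j * lam_j) with B_j ~ Gamma(shape alpha, rate alpha)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_days):
        b = rng.gammavariate(alpha, 1.0 / alpha)  # E[B] = 1, Var[B] = 1/alpha
        counts.append(poisson(rng, b * lam_j))
    return counts


def dispersion_index(counts):
    """Sample variance divided by sample mean."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((x - mean) ** 2 for x in counts) / (n - 1)
    return var / mean


counts = simulate_counts(lam_j=20.0, alpha=10.0, n_days=5000)
di = dispersion_index(counts)  # theory: 1 + 20 * (1/10) = 3
```

With `alpha` very large the busyness factor is nearly constant and the DI approaches 1, the Poisson case.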

(23) 13.
$$\mathrm{Corr}[X_j, X_k] = \frac{\mathrm{Corr}[B_j, B_k]}{\left[\left(1 + 1/(\mathrm{Var}[B_j]\lambda_j)\right)\left(1 + 1/(\mathrm{Var}[B_k]\lambda_k)\right)\right]^{1/2}}.$$
We expect $\mathrm{DI}(X_j) \gg 1$ and $\mathrm{Corr}[X_j, X_k] \approx \mathrm{Corr}[B_j, B_k]$ for “large” $\lambda_j \mathrm{Var}[B_j]$, i.e., long periods or high traffic, and $\mathrm{DI}(X_j) \approx 1$ and $\mathrm{Corr}[X_j, X_k] \approx 0$ for small $\lambda_j \mathrm{Var}[B_j]$. The process is approximately Poisson when $\lambda_j \mathrm{Var}[B_j]$ is small. One good theorem often tells you much more than a bunch of experiments! Do we see this in real data?
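The correlation formula follows from the same conditioning argument as the variance decomposition, assuming (as in this model) that the $X_j$ are conditionally independent Poisson variables given $B$. A sketch of the steps, for $j \ne k$:

```latex
\begin{align*}
\mathrm{Cov}[X_j, X_k]
  &= E\bigl[\mathrm{Cov}[X_j, X_k \mid B]\bigr]
   + \mathrm{Cov}\bigl[E[X_j \mid B],\, E[X_k \mid B]\bigr]
   = 0 + \lambda_j \lambda_k\, \mathrm{Cov}[B_j, B_k], \\
\mathrm{Var}[X_j]
  &= \lambda_j + \lambda_j^2\, \mathrm{Var}[B_j]
   = \lambda_j^2\, \mathrm{Var}[B_j]
     \left(1 + \frac{1}{\mathrm{Var}[B_j]\,\lambda_j}\right), \\
\mathrm{Corr}[X_j, X_k]
  &= \frac{\lambda_j \lambda_k\, \mathrm{Cov}[B_j, B_k]}
          {\sqrt{\mathrm{Var}[X_j]\,\mathrm{Var}[X_k]}}
   = \frac{\mathrm{Corr}[B_j, B_k]}
          {\left[\left(1 + \frac{1}{\mathrm{Var}[B_j]\lambda_j}\right)
                 \left(1 + \frac{1}{\mathrm{Var}[B_k]\lambda_k}\right)\right]^{1/2}}.
\end{align*}
```

The denominator tends to 1 when both $\lambda_j \mathrm{Var}[B_j]$ and $\lambda_k \mathrm{Var}[B_k]$ are large, and grows without bound when either is small, which gives the two limiting cases stated on the slide.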

(24) 14. In a simulation, we want to generate the $B_j$'s, then generate the arrivals one by one conditional on the piecewise-constant rates $\Lambda_j$. Another (less convenient) approach is to model and directly generate the $X_j$'s, then randomize the arrival times. Modeling the rates is harder because they are not observed!
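The first approach can be sketched as follows: given base rates and already-sampled busyness factors, generate arrival times one by one with exponential interarrival times at rate $\Lambda_j$ inside each period. The function name, the units, and the numerical values are assumptions for illustration only.

```python
import random


def generate_arrivals(base_rates, busyness, period_length, seed=1):
    """Arrival times of a Poisson process with piecewise-constant rate
    Lambda_j = B_j * lambda_j on period j (periods of equal length)."""
    rng = random.Random(seed)
    arrivals = []
    for j, (lam, b) in enumerate(zip(base_rates, busyness)):
        rate = b * lam
        start, end = j * period_length, (j + 1) * period_length
        if rate <= 0.0:
            continue  # no arrivals in a period with zero rate
        t = start + rng.expovariate(rate)
        while t < end:
            arrivals.append(t)
            t += rng.expovariate(rate)
        # The draw that overshoots `end` is discarded. Restarting at the period
        # boundary is valid because the exponential distribution is memoryless.
    return arrivals


# Hypothetical example: three 15-minute periods, rates in calls per minute.
arrivals = generate_arrivals([2.0, 5.0, 1.0], [1.0, 1.2, 0.8], 15.0)
```

Counting the generated arrivals per period recovers the $X_j$'s, so this approach also produces the counts used by the second approach.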

(25) 15. Data from a public utility call center (U). One call type, data aggregated over 40 15-minute periods per day, from 8:00 to 18:00, Monday to Friday, after removing special days. [Figure: mean count per period versus period (1 to 40).]

(26) 16. Call center U. [Figure: DI (left) and SDI (right) as a function of $j$ for different period lengths (15 min, 30 min, 1 hour, and 2 hour aggregation).]

(27) 17. $\mathrm{Corr}[X_j, X_k]$ in call center U, for 30 min to 4 hour data aggregations. [Figure: four correlation heat maps indexed by periods $j$ and $k$.]

(28) 18. Data from an emergency call center (E). Take one call type, Monday to Thursday (similar days), after removing special days (holidays, etc.). Other days have different arrival patterns. The day starts at 5 a.m. and is divided into 48 periods of 30 minutes. Mean counts per period, $\approx \lambda_j$: [Figure: mean count versus period.]

(29) 19. Emergency call center. [Figure: DI (left) and SDI (right) as a function of $j$ for different period lengths (30 min, 1 hour, 2 hour, and 4 hour aggregation).]

(30) 20. $\mathrm{Corr}[X_j, X_k]$ in call center E, for 30 min to 4 hour data aggregations. [Figure: four correlation heat maps indexed by periods $j$ and $k$.]

(31) 21. Data from a business call center (B). One call type, Tuesday to Friday, after removing special days. Opening hours (8:00 to 19:00) divided into 22 periods of 30 minutes. Monday and Saturday have different patterns. Mean counts per period: [Figure: mean count versus period.]

(32) 22. Call center B. [Figure: DI (left) and SDI (right) as a function of $j$ for different period lengths (30 min, 1 hour, 2 hour, and 4 hour aggregation).] SDI is not as large as for center E, but DI is much larger.

(33) 23. $\mathrm{Corr}[X_j, X_k]$ in call center B, for 30 min to 4 hour data aggregations. [Figure: four correlation heat maps indexed by periods $j$ and $k$.]

(34) 24. Rate models. $\Lambda_j = B_j \lambda_j$ over period $j$.
- Poisson: $B_j = 1$ for all $j$.
- PGsingle: $B_j = B$ for all $j$, where $B \sim \mathrm{Gamma}(\alpha, \alpha)$.
- PGindep: the $B_j$'s are independent, $B_j \sim \mathrm{Gamma}(\alpha_j, \alpha_j)$.
- PG2: $B_j = \tilde B_j B$, combines a common $B$ and independent $\tilde B_j$'s.
- PG2pow: $B_j = \tilde B_j B^{p_j} / E[B^{p_j}]$.
- PGnorta: $B$ has gamma marginals $B_j$ and dependence specified by a normal copula (we fit all Spearman correlations). $B_j = G_j^{-1}(\Phi(Z_j))$ where $Z = (Z_1, \dots, Z_p) \sim N(0, R)$.
- PGnortaAR1: normal copula with $\mathrm{Corr}[Z_j, Z_k] = \rho^{|j-k|}$.
- PGnortaARM: normal copula with $\mathrm{Corr}[Z_j, Z_k] = a\rho^{|j-k|} + c$.
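To make the first four models concrete, here is a sketch that samples one day's vector of busyness factors. It reads $\mathrm{Gamma}(\alpha, \alpha)$ as shape $\alpha$ and rate $\alpha$ (mean 1), and assumes that the common factor in PG2 is $\mathrm{Gamma}(\beta, \beta)$. The function and argument names are hypothetical, and the copula-based models are not covered.

```python
import random


def gamma_mean_one(rng, shape):
    """Gamma variate with shape = rate = `shape`, so mean 1 and variance 1/shape."""
    return rng.gammavariate(shape, 1.0 / shape)


def sample_busyness(model, p, rng, alpha=None, alphas=None, beta=None):
    """One realization of (B_1, ..., B_p) under the named rate model."""
    if model == "Poisson":
        return [1.0] * p
    if model == "PGsingle":
        b = gamma_mean_one(rng, alpha)  # one common factor for the whole day
        return [b] * p
    if model == "PGindep":
        return [gamma_mean_one(rng, a) for a in alphas]
    if model == "PG2":
        b = gamma_mean_one(rng, beta)  # common daily factor
        return [gamma_mean_one(rng, a) * b for a in alphas]  # times period factors
    raise ValueError("unsupported model: " + model)


rng = random.Random(7)
day = sample_busyness("PG2", 4, rng, alphas=[10.0] * 4, beta=10.0)
```

Multiplying each sampled $B_j$ by the base rate $\lambda_j$ gives the random rates $\Lambda_j$ for that day.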

(35) 25. Difficulty: we want to model the $B_j$'s, but they are not observed; only the $X_j$'s are observed. This makes parameter estimation by maximum likelihood (ML) much more challenging, because we have no closed-form expression for the likelihood. Moment matching is often possible, but much less robust and reliable. We use Monte Carlo-based methods for ML estimation.

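(36) 26. Example: Likelihood Function for PG2 Model. $\tilde B_{i,j}$ = busyness factor for day $i$, period $j$. $\bar B_i$ = busyness factor for day $i$.
$$
\begin{aligned}
p(X \mid \bar B, \beta, \alpha, \lambda)
&= \int_0^\infty \cdots \int_0^\infty \prod_{i=1}^{I} \prod_{j=1}^{p}
   \frac{(\lambda_j \tilde B_{i,j} \bar B_i)^{X_{i,j}} e^{-\lambda_j \tilde B_{i,j} \bar B_i}}{X_{i,j}!}\,
   \frac{\alpha_j^{\alpha_j} \tilde B_{i,j}^{\alpha_j - 1} e^{-\alpha_j \tilde B_{i,j}}}{\Gamma(\alpha_j)}\, d\tilde B_{i,j} \\
&= \prod_{i=1}^{I} \prod_{j=1}^{p} \int_0^\infty
   \frac{(\lambda_j \tilde B_{i,j} \bar B_i)^{X_{i,j}} e^{-\lambda_j \tilde B_{i,j} \bar B_i}}{X_{i,j}!}\,
   \frac{\alpha_j^{\alpha_j} \tilde B_{i,j}^{\alpha_j - 1} e^{-\alpha_j \tilde B_{i,j}}}{\Gamma(\alpha_j)}\, d\tilde B_{i,j} \\
&= \left[\prod_{j=1}^{p} \frac{\alpha_j^{I \alpha_j}}{\Gamma(\alpha_j)^{I}}\right]
   \prod_{i=1}^{I} \prod_{j=1}^{p}
   \frac{\Gamma(\alpha_j + X_{i,j})\, (\bar B_i \lambda_j)^{X_{i,j}}}{X_{i,j}!\, (\alpha_j + \bar B_i \lambda_j)^{X_{i,j} + \alpha_j}},
\end{aligned}
$$
$$
p(X \mid \beta, \alpha, \lambda)
= \left[\prod_{j=1}^{p} \frac{\alpha_j^{I \alpha_j}}{\Gamma(\alpha_j)^{I}}\right]
  \left[\prod_{i=1}^{I} \prod_{j=1}^{p} \frac{\Gamma(\alpha_j + X_{i,j})}{X_{i,j}!}\right]
  \cdot \prod_{i=1}^{I} \int_0^\infty \frac{\beta^{\beta} \bar B_i^{\beta - 1} e^{-\bar B_i \beta}}{\Gamma(\beta)}
  \prod_{j=1}^{p} \frac{(\bar B_i \lambda_j)^{X_{i,j}}}{(\alpha_j + \bar B_i \lambda_j)^{X_{i,j} + \alpha_j}}\, d\bar B_i .
$$
We want to maximize this. There is no closed-form expression for the last integral.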
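One simple way to handle the last integral is plain Monte Carlo: it is an expectation with respect to $\bar B_i \sim \mathrm{Gamma}(\beta, \beta)$, so it can be estimated by averaging the integrand over gamma draws. The sketch below does this on the log scale. It is a naive estimator written for illustration, not necessarily the method used by the authors, and the function name and defaults are assumptions.

```python
import math
import random


def pg2_log_likelihood(X, lam, alpha, beta, n_samples=2000, seed=12345):
    """Monte Carlo estimate of log p(X | beta, alpha, lambda) for the PG2 model.
    X[i][j] is the count on day i in period j."""
    rng = random.Random(seed)
    # Draws of the daily factor: Gamma with shape beta and rate beta (mean 1).
    draws = [rng.gammavariate(beta, 1.0 / beta) for _ in range(n_samples)]
    total = 0.0
    for row in X:
        # Closed-form factors left after integrating out the period factors.
        const = sum(
            a * math.log(a) - math.lgamma(a) + math.lgamma(a + x) - math.lgamma(x + 1)
            for x, a in zip(row, alpha)
        )
        # Log of the remaining product over periods, for each sampled daily factor.
        logs = [
            sum(
                x * math.log(b * l) - (x + a) * math.log(a + b * l)
                for x, l, a in zip(row, lam, alpha)
            )
            for b in draws
        ]
        m = max(logs)  # log-sum-exp for numerical stability
        total += const + m + math.log(sum(math.exp(v - m) for v in logs) / n_samples)
    return total
```

Keeping the seed fixed while the parameters change makes the estimated likelihood a deterministic function of the parameters, which is convenient for a numerical optimizer.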

(37) 27. How the models match the DI for Center U. [Figure: dispersion index versus period for the Poisson, PGsingle, PGindep, PG2, PG2pow, and PGnorta models and for the data. Comparison of the DI for the models and data.]

(38) 28. How the models match the correlations for Center U. [Figure: $\mathrm{Corr}(Y_{1,j}, Y_{j+1,p-j})$ versus period for the PGsingle, PG2, PG2pow, PGnorta, PGnortaAR1, and PGnortaARM models and for the data. Comparison of sample correlation between past and future demand.]

(39) 29. How the distribution predicted by the model fits the data out-of-sample. For each observation $i$ (one day), estimate the model without that day. Then, for each period $j$ (or block of successive periods), compute an interval $[L_{i,j}, U_{i,j}]$ such that $P[X_{i,j} \in [L_{i,j}, U_{i,j}]] \approx p$ (the desired coverage) according to the model. Finally, compute the proportion of days where $X_{i,j} \in [L_{i,j}, U_{i,j}]$ and compare with $p$ via a sum of squares.

RMS deviation of out-of-sample coverage probability, for call center U:

              75% target cover                     90% target cover
              1/4 h  1/2 h   1 h    2 h    4 h     1/4 h  1/2 h   1 h    2 h    4 h
Poisson        38.9   47.1   53.9   59.6   64.6     39.2   50.9   59.7   67.5   74.4
PGsingle        8.6    8.0    6.9    4.0    1.7      7.3    7.0    5.4    3.1    1.4
PGindep         4.5   10.5   24.6   36.5   46.3      1.8    8.4   22.3   37.8   51.2
PG2             4.4    3.4    3.8    3.3    2.2      2.0    3.0    3.5    2.5    1.7
PG2pow          4.0    2.3    2.4    2.7    2.0      1.5    1.7    1.6    1.1    1.1
PGnorta         4.4    4.1    3.9    3.4    2.7      1.8    2.2    2.4    2.3    2.4
PGnortaARM      4.4    4.0    4.2    4.0    3.2      1.8    2.3    2.5    2.7    2.2

(40) 30. How the models match the DI, for Center E. [Figure: dispersion index versus period for the Poisson, PGsingle, PGindep, PG2, PG2pow, and PGnorta models and for the data. The DI for the models and data.]

(41) 31. How the models match the correlations, for Center E. [Figure: $\mathrm{Corr}(Y_{1,j}, Y_{j+1,p-j})$ versus period for the PGsingle, PG2, PG2pow, PGnorta, PGnortaAR1, and PGnortaARM models and for the data. Sample correlation between past and future demand.]

(42) 32. RMS deviation of out-of-sample coverage probability, for call center E:

              75% target cover                     90% target cover
              0.5 h   1 h    2 h    4 h    8 h     0.5 h   1 h    2 h    4 h    8 h
Poisson        10.7   16.6   23.5   31.3   37.5      8.5   13.8   21.0   30.1   38.7
PGsingle        7.2   10.0   12.5   13.5   12.0      5.3    7.8   10.1   11.4    9.0
PGindep         1.3    5.3   12.7   21.4   29.9      0.8    4.1   10.2   18.7   29.1
PG2             2.1    4.9    8.7   11.4   11.6      1.6    3.8    6.8    9.4    8.8
PG2pow          1.5    2.9    4.4    5.1    5.0      1.0    2.0    3.1    3.4    3.0
PGnorta         1.3    1.7    1.7    1.7    1.3      0.8    1.1    1.2    1.3    0.8
PGnortaARM      1.3    2.4    3.4    4.3    4.5      0.9    1.5    2.2    2.7    2.8

(43) 33. How the models match the DI for Center B. [Figure: dispersion index versus period for the Poisson, PGsingle, PGindep, PG2, PG2pow, and PGnorta models and for the data. The DI for the models and data.]

(44) 34. How the models match the correlations for Center B. [Figure: $\mathrm{Corr}(Y_{1,j}, Y_{j+1,p-j})$ versus period for the PGsingle, PG2, PG2pow, PGnorta, PGnortaAR1, and PGnortaARM models and for the data. Sample correlation between past and future demand.]

(45) 35. RMS deviation of out-of-sample coverage probability, for call center B:

              75% target cover                     90% target cover
              0.5 h   1 h    2 h    4 h    8 h     0.5 h   1 h    2 h    4 h    8 h
Poisson        43.1   50.9   57.5   61.9   66.7     44.7   55.8   64.4   71.3   77.7
PGsingle        7.6    7.1    6.1    4.0    2.3      5.8    6.1    5.4    4.0    3.4
PGindep         3.1   13.2   27.3   39.3   48.7      2.0   12.1   26.4   41.2   51.9
PG2             4.8    4.1    5.1    4.3    2.6      3.0    2.9    3.3    2.9    2.3
PG2pow          2.5    3.3    4.1    2.0    0.8      1.7    3.3    3.7    2.8    2.7
PGnorta         3.2    3.0    2.7    1.2    1.3      2.0    2.4    2.2    1.7    2.0
PGnortaARM      3.2    3.1    2.8    1.9    0.5      2.0    2.4    2.3    2.2    2.8

(46) 36. Impact of choice of arrival model. Take call center U on a weekday. Single call type. Lognormal service times with mean 206.4 seconds and variance 23,667 (seconds squared). Abandonment at rate 1/2443 per second. Staffing in each period: (16, 24, 31, 36, 43, 48, 51, 52, 56, 60, 62, 65, 67, 67, 66, 65, 62, 61, 60, 61, 64, 64, 63, 63, 64, 64, 64, 64, 65, 65, 64, 64, 62, 60, 58, 56, 53, 49, 48, 44). Performance measures: average waiting time (AWT) and service level (SL) with threshold $\tau = 120$ seconds. We simulated 10,000 days with each arrival model.

(47) 37. [Figure: evolution of the SL (left) and AWT in seconds (right) during the day, by period, for the Quebec utility company, under the Poisson, PGsingle, PGindep, PG2, PG2pow, and PGnorta models. SL = proportion of calls answered within 120 seconds in the long run.]

(48) 38. [Figure: histograms of the distribution of the daily SL at 120 s, in % (left), and of the daily AWT (right), as percentages of days, under the PGsingle, PG2, PG2pow, and PGnorta models, for the Quebec utility company. SL = proportion of calls answered within 120 seconds during the day.]

(51) 39. More on arrival process modeling. Modeling the arrival rates over successive days: dependence between the days, seasonal effects (day of the week, period of the year), special days (holidays, special events, etc.), and external effects (weather, marketing campaigns, etc.). Dependence between call types: the arrival rate should in fact be a multivariate process. Modeling via copulas. Arrival bursts in emergency call centers.

(53) 40. Modeling the service times. In call center U, the available data for service times is the number of calls of each type handled by each agent on each day, and the average duration of these calls. From this, we can estimate the mean and variance of the service times and match those to the mean and variance of a distribution such as the lognormal or gamma. Service times are usually not exponential.
Common assumption: the distribution depends only on the call type. But on closer examination, we find that it depends on the individual agent and on the number of call types that the agent is handling, and that it may change with time (learning effect, motivation and mood of the agent, etc.). This is an important fact to consider when making work schedules!
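Moment matching for these two families has closed forms. The sketch below maps a mean and variance to lognormal parameters $(\mu, \sigma)$ and to gamma shape and scale. The function names are illustrative, and the numerical example reuses the service-time mean and variance quoted earlier for call center U.

```python
import math


def lognormal_from_moments(mean, variance):
    """Return (mu, sigma) of the underlying normal so that the lognormal
    distribution has the given mean and variance."""
    sigma2 = math.log(1.0 + variance / mean**2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def gamma_from_moments(mean, variance):
    """Return (shape, scale) of the gamma distribution with the given moments."""
    return mean**2 / variance, variance / mean


mu, sigma = lognormal_from_moments(206.4, 23667.0)
shape, scale = gamma_from_moments(206.4, 23667.0)
```

Service times can then be sampled with `random.lognormvariate(mu, sigma)` or `random.gammavariate(shape, scale)` from the Python standard library.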

(54) 41. Average service time per agent for one call type, in center U (more than 1000 agents). [Figure: scatter plot, one point per agent, titled “Average service time per agent for call type R_Facture_F”; x-axis: number of calls; y-axis: service time.]

(55) 42. Another call type. [Figure: scatter plot, one point per agent, titled “Average service time per agent for call type R_Coupure_F”; x-axis: number of calls; y-axis: service time.]

(56) 43. Another call type. [Figure: scatter plot, one point per agent, titled “Average service time per agent for call type R_Panne_F”; x-axis: number of calls; y-axis: service time.]

(57) 44. Four different agents, same call type. All have handled more than 1000 calls. Daily averages: [Figure: “Agent comparison for call type R_Facture_F”, agents AY2915 (1419), BA9537 (1588), BM5697 (1259), CP4704 (1185); x-axis: day index; y-axis: service time.]

(58) 45. Same agent, 4 call types, weekly averages. [Figure: “Average weekly service time of the agent by call type”, call types C_PANNE_F, R_ED_F, R_EDBloque_F, R_Panne_F; x-axis: week index; y-axis: service time.]

(59) 46. Same agent, 6 call types. [Figure: “Average weekly service time of the agent by call type”, call types C_PANNE_F, R_AideTech_F, R_Budget_F, R_Facture_F, R_MVE_F, R_Panne_F; x-axis: week index; y-axis: service time.]

(60) 47. Same agent, 8 call types. [Figure: “Average weekly service time of the agent by call type”, call types C_PANNE_F, R_Budget_F, R_Facture_A, R_Facture_F, R_MVE_F, R_Panne_F, R_RetProact_F, R_TransfSpec1_F; x-axis: week index; y-axis: service time.]

(61) 48. Modeling the evolution of service time averages. For a given agent and call type, on day $i$:
$$M_i = \beta_{d_i} + \Gamma_{w_i} + \epsilon_i,$$
where $d_i$ = type of day $i$, $w_i$ = week of day $i$, and $\Gamma_w$ is a random effect that may follow, e.g., an AR process: $\Gamma_w = \rho \Gamma_{w-1} + \psi_w$. The $\epsilon_i$ and $\psi_w$ are residuals (noise). This gives better predictions than just taking the overall average for each agent. For multiple call types, there can be a different $\Gamma_w$ for each call type, or a single $\Gamma_w$ for all call types (which does better for our data set). There could also be common effects across agents. Better: model the evolution of all distribution parameters.
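The model can be simulated directly. The sketch below assumes that the day type $d_i$ is the day of the week, that the noise terms are Gaussian, and that the AR(1) effect starts at zero. These choices and the parameter names are assumptions for illustration, not part of the talk.

```python
import random


def simulate_daily_means(beta, n_weeks, rho, sigma_psi, sigma_eps, seed=1):
    """Simulate M_i = beta[d_i] + Gamma_{w_i} + eps_i for one agent and call type.
    beta[d] is the fixed effect of day type d (here: day of the week)."""
    rng = random.Random(seed)
    gamma = 0.0  # weekly random effect, started at its mean (assumption)
    means = []
    for _ in range(n_weeks):
        gamma = rho * gamma + rng.gauss(0.0, sigma_psi)  # AR(1) step, once per week
        for d in range(len(beta)):
            means.append(beta[d] + gamma + rng.gauss(0.0, sigma_eps))
    return means


# Hypothetical values: five working days, service time averages in seconds.
path = simulate_daily_means([210.0, 200.0, 195.0, 198.0, 205.0], n_weeks=52,
                            rho=0.8, sigma_psi=5.0, sigma_eps=10.0)
```

Setting both standard deviations to zero returns the day-type effects repeated each week, which corresponds to predicting with fixed averages only.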

(62) 49. Performance measures and optimization. For a given staffing and routing strategy, the SL on a given day (or given period) is a random variable. We may be interested in its distribution. What if we pay a penalty if and only if the SL is below a given number today? After solving a work-schedule optimization problem for a call center, we re-simulated our best feasible solution for 10,000 days and computed the empirical distribution of the SL.

(63) 50. [Figure: histogram “Distribution of Global SL”; x-axis: SL; y-axis: frequency; mean = 0.906.]

(64) 51. [Figure: histogram “Distribution of SL, Period 7”; x-axis: SL; y-axis: frequency; mean = 0.821.]

(65) 52. [Figure: histogram “Distribution of SL, Period 17”; x-axis: SL; y-axis: frequency; mean = 0.806.]

(66) 53. [Figure: histogram “Distribution of SL, Period 11”; x-axis: SL; y-axis: frequency; mean = 0.915.]

(67) 54. [Figure: histogram “Distribution of SL, Period 49”; x-axis: SL; y-axis: frequency; mean = 0.965.]

(68) 55. [Figure: histogram “Distribution of SL for Call Type 3”; x-axis: SL; y-axis: frequency; mean = 0.832.]

(69) 56. [Figure: histogram “Distribution of SL for Call Type 17”; x-axis: SL; y-axis: frequency; mean = 0.937.]

(70) 57. [Figure: histogram “Distribution of SL for Call Type 2 in Period 20”; x-axis: SL; y-axis: frequency ($\times 10^3$); mean = 0.894.]

(71) 58. [Figure: histogram “Distribution of SL for Call Type 12 in Period 30”; x-axis: SL; y-axis: frequency ($\times 10^3$); mean = 0.911.]

(74) 59. Example of scheduling optimization problem. Suppose the routing rules are fixed. Several call types, several agent types, several time periods. A shift type specifies the time when the agent starts working, when he/she finishes, and all the lunch and coffee breaks. $c_{s,q}$ = cost of an agent of type $s$ having shift type $q$.
The decision variables $x$ and $z$ are: (i) $x_{s,q}$ = number of agents of type $s$ having shift type $q$; (ii) $z_{\ell,s,j}$ = number of agents of type $\ell$ that work as type-$s$ agents in period $j$, with $S_s \subset S_\ell$ (they use only part of their skills). This determines indirectly the staffing vector $y$, where $y_{s,j}$ = number of agents of type $s$ in period $j$, and $a_{j,q} = 1$ iff shift $q$ covers period $j$:
$$y_{s,j} = \sum_{q} a_{j,q} x_{s,q} + \sum_{\ell \in S_s^+} z_{\ell,s,j} - \sum_{\ell \in S_s^-} z_{s,\ell,j} \quad \text{for all } s, j,$$
where $S_s^+$ and $S_s^-$ index the agent types whose skill sets strictly contain, and are strictly contained in, $S_s$, respectively.
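The staffing equation is simple bookkeeping, as the following sketch shows. The data layout (nested lists for $x$ and $a$, a dictionary keyed by $(\ell, s, j)$ for $z$) and the toy numbers are assumptions for illustration.

```python
def staffing_vector(x, a, z, n_periods):
    """y[s][j] = sum_q a[j][q] * x[s][q] + (agents moved into type s) - (moved out).
    x[s][q]: agents of type s on shift q; a[j][q]: 1 iff shift q covers period j;
    z[(l, s, j)]: agents of type l working as type-s agents in period j."""
    y = [
        [sum(a[j][q] * x[s][q] for q in range(len(x[s]))) for j in range(n_periods)]
        for s in range(len(x))
    ]
    for (l, s, j), n in z.items():
        y[s][j] += n  # counted as type-s agents in period j
        y[l][j] -= n  # and no longer counted as type-l agents
    return y


# Toy instance: type 0 = specialists, type 1 = generalists, one shift covering
# both periods, and one generalist working as a specialist in period 0.
x = [[3], [2]]
a = [[1], [1]]
z = {(1, 0, 0): 1}
y = staffing_vector(x, a, z, n_periods=2)
```

In matrix form this is the constraint $Ax + Bz = y$ that appears in the scheduling problem.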

(75) 60. Scheduling Optimization Problem. $x$ = vector of shifts; $c$ = their costs; $y$ = staffing vector. (Long-run) service level for type $k$ in period $j$ (depends on the entire vector $y$):
$$g_{k,j}(y) = \frac{E[\text{num. calls of type } k \text{ in period } j \text{ answered within time limit}]}{E[\text{num. calls of type } k \text{ in period } j \text{ answered, or abandoned after limit}]}.$$
(P0): [Scheduling problem]
$$
\begin{aligned}
\min\ & c^{\mathsf t} x = \sum_{s=1}^{I} \sum_{q=1}^{Q} c_{s,q} x_{s,q} \\
\text{subject to}\ & Ax + Bz = y, \\
& g_{k,j}(y) \ge l_{k,j} \text{ for all } k, j, \\
& g_j(y) \ge l_j \text{ for all } j, \\
& g_k(y) \ge l_k \text{ for all } k, \\
& g(y) \ge l, \\
& x \ge 0,\ z \ge 0,\ y \ge 0, \text{ and integer.}
\end{aligned}
$$

(76) 61. Sample-path optimization via simulation. We simulate $n$ independent operating days of the center to estimate the functions $g$. Let $\omega$ represent the source of randomness, i.e., the sequence of independent uniform random variables underlying the entire simulation ($n$ runs). The empirical SLs over the $n$ simulation runs are: $\hat g_{n,k,j}(y, \omega)$ for call type $k$ in period $j$; $\hat g_{n,j}(y, \omega)$ aggregated over period $j$; $\hat g_{n,k}(y, \omega)$ aggregated for call type $k$; $\hat g_n(y, \omega)$ aggregated overall. For a fixed $\omega$, these are deterministic functions of $y$. We replace the (unknown) functions $g(\cdot)$ by $\hat g(\cdot, \omega)$ and optimize. To compute them at different values of $y$, we use simulation with well-synchronized common random numbers. Discuss.
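The role of common random numbers can be illustrated on a much simpler system than a multiskill center. The sketch below simulates a single-type FCFS queue without abandonment, using one random stream for interarrival times and a separate one for service times, so that the same $\omega$ is reused for every staffing level. The model, the parameter values, and the function name are assumptions for illustration.

```python
import heapq
import random


def simulated_sl(servers, tau, n_calls=5000, lam=1.0, mu=0.12, seed=42):
    """Fraction of calls that wait at most tau in a FCFS queue with `servers` agents."""
    arr_rng = random.Random(seed)      # stream for interarrival times
    svc_rng = random.Random(seed + 1)  # separate stream for service times
    free_at = [0.0] * servers          # min-heap of times at which agents become free
    heapq.heapify(free_at)
    t, good = 0.0, 0
    for _ in range(n_calls):
        t += arr_rng.expovariate(lam)
        service = svc_rng.expovariate(mu)  # call n gets the same service time for any staffing
        start = max(t, heapq.heappop(free_at))
        heapq.heappush(free_at, start + service)
        if start - t <= tau:
            good += 1
    return good / n_calls


sl_9 = simulated_sl(9, tau=20.0)
sl_10 = simulated_sl(10, tau=20.0)
```

Because each call keeps the same arrival time and service time for every staffing level, the estimate is a deterministic function of the staffing for a fixed seed, and differences between staffing levels are not masked by independent sampling noise.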

(77) 62. Empirical (sample) scheduling optimization problem. (SP0$_n$): [Sample scheduling problem]
$$\min\; c^t x = \sum_{s=1}^{I} \sum_{q=1}^{Q} c_{s,q}\, x_{s,q}$$
subject to $Ax + Bz = y$; $\hat g_{n,k,j}(y) \ge l_{k,j}$ for all $k, j$; $\hat g_{n,j}(y) \ge l_j$ for all $j$; $\hat g_{n,k}(y) \ge l_k$ for all $k$; $\hat g_n(y) \ge l$; $x \ge 0$, $z \ge 0$, and integer.
Theorem: When $n \to \infty$, the optimal solution of SP0$_n$ converges w.p.1 to that of P0. Moreover, if a standard large deviation principle holds for $\hat g$ (which is typical), the probability that the two solutions differ converges to 0 exponentially with $n$. [Adaptation of Vogel 1994, for example.]

(80) 63. Solving the sample optimization problem. Integer programming with cutting planes. [Atlason, Epelman, and Henderson, 2004; Cezik and L’Ecuyer 2005] Replace the nonlinear constraints in SP0$_n$ by a set of linear constraints. This gives an integer program (IP). We start with a relaxation of the IP problem (fewer constraints). Then, at each step, we use simulation to compute the service levels in SP0$_n$ for the optimal solution $\bar y$ of the current IP. For each SL constraint that is not satisfied, we add a cut based on an estimated subgradient. We stop when all SL constraints of SP0$_n$ are satisfied. In practice, for large problems, we solve the IP as an LP and round the solution (at each step, to be able to simulate). We select a rounding threshold $\delta$ (usually around 0.5 or 0.6). Heuristic! Phase II: run a longer simulation to perform a local adjustment to the final solution, using heuristics (add, remove, switch).
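The cutting-plane loop described above can be sketched in one dimension. This is a toy illustration, not the method of the cited papers: it assumes a single staffing variable $y$ with unit cost, so the relaxed IP is solved by inspection (the smallest integer satisfying all cuts), and it replaces the simulation estimate by a hypothetical concave stand-in `service_level`. The subgradient is estimated by a forward difference, and each cut has the form $\hat g(\bar y) + q\,(y - \bar y) \ge l$.

```python
import math


def service_level(y):
    """Hypothetical concave stand-in for the simulated service level g_hat(y)."""
    return 1.0 - math.exp(-0.3 * y)


def cutting_plane_staffing(g, target, y_start=0, max_iter=100):
    """Smallest integer y found by adding linear cuts until g(y) >= target."""
    lower_bound = y_start              # strongest cut so far: y >= lower_bound
    for _ in range(max_iter):
        y_bar = lower_bound            # optimum of the current relaxed problem
        level = g(y_bar)               # in the talk: obtained by simulation
        if level >= target:
            return y_bar               # all service-level constraints satisfied
        slope = g(y_bar + 1) - level   # forward-difference subgradient estimate
        if slope <= 0:
            raise ValueError("no improving direction; cannot build a cut")
        # Cut: level + slope * (y - y_bar) >= target, solved for integer y.
        cut = math.ceil(y_bar + (target - level) / slope)
        lower_bound = max(y_bar + 1, cut)
    raise RuntimeError("cutting-plane loop did not converge")


if __name__ == "__main__":
    print(cutting_plane_staffing(service_level, target=0.8))
```

With a concave service-level function the forward-difference cut never removes a feasible integer point, so the loop stops at the cheapest feasible staffing. With several agent types and periods, the relaxed problem would be handed to an IP or LP solver instead of being solved by inspection.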

(82) 64. Other objectives and constraints (alternative formulations). Chance constraints: Replace the long-term average $g_{k,j}(y)$ by a tail probability of the service level, e.g.: $P[SL_{k,j}(\tau) \ge l_{k,j}] \ge \alpha_{k,j}$ for all $k, j$. Can use sample average approximation (SAA). Not easy to solve the SAA. Cutting planes, Benders decomposition, ... Ongoing PhD thesis of Anh Thuy Ta. Optimizing call routing rules: Chan, Koole, L’Ecuyer, in Manufacturing and Service Operations Management (2014). Replace constraints by penalties. Etc.
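For the chance-constrained formulation, the sample average approximation replaces the probability by the fraction of simulated days on which the daily service level reaches the target. The Python sketch below only shows this feasibility check for one call type and period; the function name and the list of daily service levels are hypothetical illustration values, not results from the talk.

```python
def saa_chance_constraint_holds(daily_sl, target, alpha):
    """Check (1/n) * sum_i 1[SL_i >= target] >= alpha for n simulated days."""
    hits = sum(1 for sl in daily_sl if sl >= target)
    return hits / len(daily_sl) >= alpha


# Hypothetical daily service levels from n = 10 simulated days.
daily_sl = [0.85, 0.78, 0.91, 0.80, 0.74, 0.88, 0.83, 0.79, 0.95, 0.81]

if __name__ == "__main__":
    print(saa_chance_constraint_holds(daily_sl, target=0.80, alpha=0.6))
    print(saa_chance_constraint_holds(daily_sl, target=0.80, alpha=0.9))
```

The indicator makes the sample constraint a step function of the staffing, which is one reason the SAA is harder to solve than the expectation-based problem.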

(83) 65. Conclusion. Simulation and optimization can be useful only to the extent that we can trust the model. We can do more and more simulation runs and compute arbitrarily tight confidence intervals on certain unknown quantities, but this can be meaningless if the simulation model is not representative. Huge masses of data are becoming available, at a rate never seen before. Exploiting this data to build credible and valid stochastic models of complex systems is, in my opinion, the biggest challenge that we now face for simulation.

(84) 66. References for the material of this talk:
B. N. Oreshkin, N. Régnard, and P. L’Ecuyer, “Rate-Based Daily Arrival Process Models with Application to Call Centers,” Operations Research, 64, 2 (2016), 510–527.
R. Ibrahim, P. L’Ecuyer, H. Shen, and M. Thiongane, “Inter-Dependent, Heterogeneous, and Time-Varying Service-Time Distributions in Call Centers,” European Journal of Operational Research, 250 (2016), 480–492.
R. Ibrahim, H. Ye, P. L’Ecuyer, and H. Shen, “Modeling and Forecasting Call Center Arrivals: A Literature Survey and a Case Study,” International Journal of Forecasting, 32, 3 (2016), 865–874.
R. Ibrahim and P. L’Ecuyer, “Forecasting Call Center Arrivals: Fixed-Effects, Mixed-Effects, and Bivariate Models,” Manufacturing and Service Operations Management, 15, 1 (2013), 72–85.
A. Jaoua, P. L’Ecuyer, and L. Delorme, “Call-Type Dependence in Multiskill Call Centers,” Simulation: Transactions of the Society for Modeling and Simulation International, 89, 6 (2013), 722–734.
R. Ibrahim, P. L’Ecuyer, N. Régnard, and H. Shen, “On the Modeling and Forecasting of Call Center Arrivals,” Proceedings of the 2012 Winter Simulation Conference, IEEE Press, 2012, 256–267.

(85) 67. References (continued):
N. Channouf and P. L’Ecuyer, “A Normal Copula Model for the Arrival Process in Call Centers,” International Transactions in Operational Research, 19 (2012), 771–787.
A. N. Avramidis, W. Chan, M. Gendreau, P. L’Ecuyer, and O. Pisacane, “Agent Scheduling in a Multiskill Call Center,” European Journal of Operational Research, 200, 3 (2010), 822–832.
T. Cezik and P. L’Ecuyer, “Staffing Multiskill Call Centers via Linear Programming and Simulation,” Management Science, 54, 2 (2008), 310–323.
A. N. Avramidis, A. Deslauriers, and P. L’Ecuyer, “Modeling Daily Arrivals to a Telephone Call Center,” Management Science, 50, 7 (2004), 896–908.
A. N. Avramidis and P. L’Ecuyer, “Modeling and Simulation of Call Centers,” Proceedings of the 2005 Winter Simulation Conference, IEEE Press, 2005, 144–152.
W. Chan, T. A. Ta, P. L’Ecuyer, and F. Bastin, “Two-Stage Chance-Constrained Staffing with Agent Recourse for Multi-Skill Call Centers,” Proceedings of the 2016 Winter Simulation Conference, IEEE Press, 2016, 3189–3200.
T. A. Ta, P. L’Ecuyer, and F. Bastin, “Staffing Optimization with Chance Constraints for Emergency Call Centers,” MOSIM 2016: the 11th International Conference on Modeling, Optimization and Simulation, Montreal, 2016.

(86)