
Algorithmic Advancements in the Practice of Revenue

Management

by

Jonathan Z. Amar

Diplôme d'ingénieur, École polytechnique (2016)

Submitted to the Sloan School of Management

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Operations Research

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2021

© Massachusetts Institute of Technology 2021. All rights reserved.

Author . . . .

Sloan School of Management

January 8, 2021

Certified by . . . .

Nikolaos Trichakis

Zenon Zannetos (1955) Career Development Professor

Thesis Supervisor

Accepted by . . . .

Patrick Jaillet

Dugald C. Jackson Professor

Department of Electrical Engineering and Computer Science

Co-Director, Operations Research Center


Algorithmic Advancements in the Practice of Revenue

Management

by

Jonathan Z. Amar

Submitted to the Sloan School of Management on January 8, 2021, in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy in Operations Research

Abstract

In recent years, firms have been personalizing the customer experience by recommending specific products, and simultaneously customers have raised their expectations in terms of personalization. To support these efforts, a firm must conceptualize its understanding of the market and the way customers interact with its products. This requires some modeling of how customers value each product and its attributes. The first part of the dissertation is dedicated to showing how one can go from data to personalization in retail applications.

The first chapter's contribution is methodological: we have worked closely with our partner beer retailer on providing store-specific assortment optimization. Using an efficient estimation procedure for choice models, jointly with a novel application of collaborative filtering, we learn a demand model that is store-specific and reliable, using a cautious validation procedure. Once armed with our model, we leverage continuous optimization techniques, coupled with technical advances, to produce at scale personalized assortments that generate higher revenue subject to multiple business constraints.

The second chapter considers a different setting relevant to new e-retailers which lack the data to inform their personalization. These usually rely on questionnaires to extract information. We incorporate the personalization task into the questionnaire design, which is driven by the product recommendation objective. We provide a framework for extending extant utility-estimation questionnaires, and additionally provide a direct approach leveraging robust optimization for tractability. We support our work with numerical simulations and theoretical justification in simplified settings, promising practical gains for personalization.

While we have acknowledged data uncertainty in the first part, the second part of the dissertation is focused on the study of uncertainty in modern markets, and how to address it. The third chapter considers the canonical Network Revenue Management (NRM) problem. More specifically, we take the perspective of a monopoly seller which offers multiple products that consume capacitated resources. Given that demand forecasts at the granularity of products may be unreliable in cases where demand is highly volatile or sporadic (e.g., the airline and hotel industries), we provide a distribution-free algorithm for NRM which is essentially robust to market uncertainties. By analyzing our algorithm's performance through the primal-dual schema, we establish its asymptotic optimality under the competitive lens. We benchmark our algorithm by showing that in regimes where the market is potentially rapidly changing, we outperform state-of-the-art methods.

Finally, in the fourth chapter, we analyze the problem faced by budget-constrained e-advertisers. Ad slots are allocated using a second-price auction, that is, the highest bidder wins the auction and pays the second-highest price. In this case, the advertiser is faced with a bidding decision without knowing how much they will need to pay if they win. Considering the price uncertainty at the time of bid, which is specific to this modern market, we provide a methodology for converting a plethora of knapsack algorithms to bidding strategies implementable in this setting where the price is unknown. We show near-optimality of the bidding strategies, which in turn have substantial potential impact for e-advertisers.

Thesis Supervisor: Nikolaos Trichakis


Acknowledgments

First and foremost I would like to deeply thank my advisor Prof. Nikos Trichakis, who has been a mentor from the start of my PhD. You have been an amazing academic guide, but also a great source of professional and personal advice. You have always put my interest first, and made sure I had access to the best opportunities. I really appreciate all the attention I have received and the time you have dedicated to helping me advance in my work and my life. Thank you for also being a role model; I look up to your achievements and your mentoring, and I will always value your advice.

Thank you to my Thesis Committee for your support. I would like to thank Prof. Chaithanya Bandi for your involvement earlier in my thesis on the questionnaire design. Thank you Prof. Vivek Farias for your guidance throughout the past years on the applied project, and for giving me the opportunity to join your research group. I consider you my second, informal advisor, and will always value your input. Thank you Prof. Patrick Jaillet for your support during my PhD; you have been involved since my General Examination. I would also like to thank the faculty members that have been involved in the coursework I have taken and with whom I have had interactions, whether by giving me the pleasure of being their Teaching Assistant or by sharing research ideas and feedback. You have allowed me to grow and develop throughout my last years as a student. Thank you to the MIT staff and specifically the ORC staff, Laura Rose and Andrew Carvalho, for making this possible with your patience and dedication.

I would like to thank my research collaborators. First, thank you to my friend Nicholas Renegar, who has worked closely with me on Chapter 4; this has been a stimulating project, and I hope we will work on extensions together in the future. A large part of my recent research has been devoted to an applied project with a large beer retailer, on which I worked jointly with my friend Patricio Foncea, whom I would like to thank for all those hours implementing, debugging and brainstorming about choice models. We learnt plenty together, and I really enjoyed working with you. I wish you the best for this ongoing project! Thank you Tianyi Peng for your effort as well; I am looking forward to having you fully on board. I would also like to thank the team from our industry collaborator, Sarosh, Justin, Emily, Debarshi, Arjun and many others... who helped us develop the work with their constructive feedback. Thank you for your patience and your collaborative spirit, I am so glad to have worked with each one of you. I would also like to give thanks to the faculty and all students involved in the COVID-19 Policy Alliance; I feel fortunate I had the opportunity to work with such a team and be part of this larger impactful group.

Throughout my four and a half years at the ORC, this unbelievable place, I feel lucky I was able to meet so many great people who form a strong network and family. I will remember the working sessions in the common space, preparing for “quals” or brainstorming on research ideas, or any other ideas as a matter of fact! The lunch space and the coffee breaks served as natural enhancers of this great student community. Thank you to all the friends I have made in this amazing group: Antoine the roommate that managed to get away in two, Jean for criticizing my sporadic office schedule, Ali for getting me an internship and Max for getting me a job hopefully, Rim for the never-ending discussions, Tamar for pushing me to speak Hebrew, Omar and Driss for the soccer streams, Nicholas for the pranks, Arthur for always bringing a good mood, Andreea for being an exceptional TA, Ted for being an awesome water-polo teammate, and so many others: Agni, Andrew VdB, Elizabeth, Emily, Jackie, Galit, Ryan, Andy, Andrew Li, Max, Sebastien, Colin... and my cousins the MBAns. There are so many moments to remember and this time would not have been the same without you.

I want to thank the people that have supported me these years. Starting with my roommates Donovan for always being positive, Noam and Avi for bringing a full house. Thank you to my good friend and co-TA Jonathan Baravir, my good buddies Jeremy and Maxime whom I already miss. Thank you to Hanna(h) and Noémie, Arnaud and Simon, Gabrielle, Aurore and François, for being part of my closer circle and having made me feel at home this whole time. Thank you Menachem for all the teachings. I have many other people to thank for making Cambridge MA feel like Cambridge FR, Antoine, Pierre-Luc, Jad, Louis, Nathan, Victor, Anne-Claire...

I want to give a special thank you to my family, who have been supporting and bearing with me: my parents, my four sisters, my grandparents, and all those beyond who have participated in my life and made me who I am today. I cannot thank you enough for your love and your encouragement. My parents, thank you for all of your trust regardless of the distance; I know I could not have made it to this day without your unconditional support. My sisters, thank you for your love and all of the good times we have spent as a family, which I value more than anything. Thank you to my childhood friends from Nice for this relationship I have been cherishing ever since I can remember, and thank you to all who have brought joy into my life from childhood until today. Thank you Corinne and Guy for being so welcoming. Finally, to the person that is half of my life: thank you Natacha for putting up with me every day; I could not be grateful enough for you unfailingly carrying me and pushing me forward. I am so glad we have gone through this together, and I am looking forward to the years to come.


Contents

Introduction 17

I From Data to Personalization 26

1 Personalizing Assortments from Sales Data for Large Beverage Retailer: a Collaborative Filtering Perspective on Choice Modeling 27

1.1 Introduction . . . 27

1.1.1 Our Contribution . . . 29

1.1.2 Related Literature . . . 30

1.2 Choice Model Estimation . . . 33

1.2.1 Notation and Preprocessing . . . 33

1.2.2 Parametric Modeling . . . 35

1.2.3 Boosting with Low Rank Utility Model . . . 39

1.3 Revenue maximization . . . 43

1.3.1 Solving the Master-problem . . . 44

1.3.2 Solving the Sub-problem . . . 46

1.4 Concluding Remarks . . . 48

2 Product-Driven Questionnaire Design for Product Display Recommendations 51
2.1 Introduction . . . 51

2.1.1 Related Literature . . . 57


2.3 Preference Elicitation Questionnaires . . . 61

2.4 Product Recommendation Questionnaires using Modified Preference Elicitation Questionnaires . . . 63

2.4.1 Evidence . . . 65

2.4.2 Q-modified PEQ: Performance Analysis . . . 66

2.4.3 Uncertainty Reduction Rate . . . 70

2.5 Product Recommendation Questionnaires using Robust Optimization . . . . 72

2.5.1 Robust Product Recommendation . . . 73

2.5.2 Robust Product Recommendation Questionnaire Design . . . 74

2.6 Numerical Experiments . . . 75

2.6.1 Simulation Environment and Questionnaires . . . 76

2.6.2 Synthetic Product Data . . . 76

2.6.3 Real data . . . 79

2.7 Concluding Remarks . . . 80

2.8 Appendix: Additional Proofs. . . 82

2.8.1 Proof of Proposition 2.1 . . . 82

2.8.2 Proof of Proposition 2.2 . . . 83

2.8.3 Proof of Proposition 2.3 . . . 85

2.8.4 Proof of Proposition 2.4 . . . 86

II Dealing with Uncertainty in Modern Markets 87

3 Distribution Free Algorithms for Network Revenue Management, a Primal-Dual Approach 89
3.1 Introduction . . . 89
3.1.1 Our Contribution . . . 91
3.1.2 Related Literature . . . 92
3.2 Problem Formulation . . . 94
3.3 Primal-dual Algorithm . . . 98
3.4 Analysis . . . 100


3.4.1 Online Setting and Competitive Ratio . . . 100

3.4.2 Upper Bounds - Adversarial Scenarii . . . 101

3.4.3 Asymptotic Optimality of the Primal Dual Algorithm . . . 102

3.4.4 Parameter Tuning for Fixed Networks . . . 105

3.5 Special cases . . . 107
3.5.1 Base Network . . . 107
3.5.2 Combined Product . . . 113
3.6 Benchmarking . . . 116
3.6.1 Theoretical Benchmark . . . 117
3.6.2 Numerical Benchmark . . . 118
3.7 Concluding Remarks . . . 121

3.8 Appendix: Deferred Proofs . . . 122

3.8.1 Proofs of section 3.4 . . . 122

3.8.2 Proofs of section 3.5 . . . 131

3.8.3 Proofs of section 3.6 . . . 143

3.9 Appendix: Remaining special cases . . . 144

3.9.1 Triangular Network . . . 144

4 The Second-Price Knapsack Problem: Near-Optimal Real Time Bidding in Internet Advertisement 149
4.1 Introduction . . . 149

4.1.1 Our Contribution . . . 151

4.1.2 Related Literature . . . 153

4.2 Model and Analysis . . . 154

4.2.1 Model Framework . . . 155

4.2.2 Near Optimality of the Linear Bid. . . 156

4.2.3 Adapting Online Knapsack Algorithms . . . 157

4.3 Empirical Results . . . 163

4.3.1 iPinYou Dataset . . . 163


4.3.3 Results. . . 165

4.4 Concluding Remarks . . . 168

4.5 Appendix: Additional Tables. . . 169

4.5.1 Data Summary . . . 169

4.5.2 Testing Results . . . 171

4.6 Appendix: Proof of Proposition 4.1 . . . 171


List of Figures

1-1 Rank validation, analysis of different metrics for varying rank of additive utility. . . . 41
1-2 In sample accuracy at different stages of the estimation. . . . 42
2-1 Relative improvement for the base case of synthetic product data, for different methods. . . . 79
2-2 Relative improvement of Q-framework on real data on existing methods, for increasing T. . . . 80
2-3 Relative improvement of Robust methods on real data against existing methods, for increasing T. . . . 81
2-4 Representation of the mistake probability. . . . 83
3-1 Guaranteed competitive ratio as a function of κL (log scale), compared to the upper bound from Theorem 3.1, and their ratio which goes to 1. . . . 104
3-2 CR in Base Network by optimizing the split parameter, compared to arbitrary split and best bound for varying κ. . . . 110
3-3 Revenue fraction on instances with κ × L = 10, n = 1000 with Multiplicative Drift for varying drift parameter ρ, and with fixed: . . . 120
3-4 Revenue fraction on instances with κ × L = 10, n = 1000 with Decreasing Multiplicative Drift for varying drift parameter ρ, and with fixed Coef. Var. across time. . . . 120
3-5 Revenue fraction on instances with κ × L = 10, n = 1000 with Linear Drift for varying drift parameter ρ, and with fixed: . . . 121
4-1 Real-Time Bidding Flow Chart . . . 150


List of Tables

2.1 Ranges of relative improvements (in %) of Q-modified PEQs over original PEQs, for the base case of synthetic product data, for varying parameters. The last column also reports any observed trend. . . . 78

2.2 Ranges of relative improvements (in %) of the robust method over EPEQ and Q-modified PEQs, for the base case of synthetic product data, for varying parameters. . . 79

4.1 Algorithm 4.1 and Adaptive Pacing Comparison on Season Three Testing . . . 167
4.2 The summarized log data format for an impression and their description. . . . 170

4.3 The log data format for an impression . . . 171

4.4 Season Three - Data Summary . . . 171

4.5 Season Three Subset for Online Testing - Data Summary . . . 171


Introduction

Given that access to data has been progressively improving, the industry has steadily been increasing its interest in incorporating data to make better business decisions. The study and practice of Revenue Management, whether in academia or in industry, has naturally geared its attention towards this data: that is, towards interpreting the underlying information it contains, in order to better understand market conditions and eventually tailor decision systems using the insights learnt. This is commonly referred to as analytics. Much of the computer science community has focused on the prediction task leveraging data, which has enabled a plethora of practical problems to be supported by Machine Learning tools by and large (for example, image recognition of breast scans to detect cancerous cells, anomaly detection in phone-based behavior to predict psychiatric symptoms, or natural language processing to assist customers throughout web or phone chats). The prescriptive task, differently said, how one can use the data to actually make better decisions, has been less the center of attention, and is often owned by Operations Research.

The main contributions of this thesis fall broadly under two umbrellas.

The first one is personalization and recommendation systems under different data sources, which is a special case of prescriptive analytics. In chapter 1, we study the perspective of a large beer retailer which possesses substantial information encapsulated in historical sales, and can learn customer preferences notably by leveraging cross-learning across different sales points. This first work is applied and methodological, and aims to implement store-specific assortment optimization from aggregate sales data. We then contrast in chapter 2 with the case of smaller retailers which have little access to relevant data; to that end, they need to query information from their customers in order to display products that may be appealing to them. We tailor the questionnaire design to improve the product display specific to each customer.

The second umbrella is that of capturing the inherent uncertainty in modern markets, whether by mechanism or by curse of dimensionality, and developing adequate decision-making systems. In chapter 3, we focus on the Network Revenue Management problem when demand is so volatile that forecasts are not reliable. We provide an optimal distribution-free algorithm for making sales decisions, the contribution being algorithmic and theoretical. Finally, in chapter 4 we study the specific problem faced by e-advertisers who bid in second-price auctions and who are budget constrained. In this case the winning price and the budget consumption are unknown at the time of bid, given that it is too complex to model competitors' strategies. We develop bidding strategies for the second-price knapsack problem that extend the classical online knapsack selection problem.

i. From Data to Personalization

A common challenge in industry, which is reflected in our work, is to derive managerial insights from data. At a high level, there is some intent in trying to conceptualize the customer process, and in particular how the industry's decisions affect this customer process (whether it is customer experience, customer satisfaction, etc.). This is precisely the role of prescriptive analytics: to appropriately model the effect of industry decisions onto the customer process, understood respectively as the effect of an input on an output. The task is by nature problem specific, yet there are common principles that enable extracting relevant information from the data for the problem at hand. In general, it is assumed that there is some underlying structure which governs the general customer process, but the exact structure is unknown and the observable data enables some inference of this structure. The learnings drawn from the data can then be used to guide the decision-making process on the business end.

Assortment Optimization

To illustrate the derivation of managerial insights, we describe a common use case, detailed in chapter 1. In discrete choice modeling [1], the literature assumes that customers are endowed with some utility, unknown to the retailer, for each of the products they are presented. It further assumes these customers are utility maximizing; therefore, when customers must choose (purchase) a discrete product, they select the one with the highest utility among the ones offered. The observational sales data, which represents discrete choices made by customers, can help infer the underlying structure of these endowed utilities. Once the retailer is armed with an intelligible representation of the data, they can make decisions (like offering products preferred by customers, or promoting or removing a product) that meet the business objective at hand. Given these guiding principles for going from data collection to its utilization, we detail in the following some practical applications studied in this thesis, and the ensuing challenges. Generally, the access to customer data (e.g., customer surveys, sales figures) with the help of analytics enables the business to understand the specificities of the present customer market. While this can be used to various marketing ends, recently the industry has been investing dedicated effort into personalizing the customer experience, in particular by personalizing the product sets offered to customers. Personalization, whether it be tailoring assortments in physical retail or recommending appealing products to online customers, has become a common lever for multiple reasons: the first being that personalization does not typically require a large capital investment (as an ad campaign would) but rather an efficient utilization of the retailer's resources; the second being that there are potential gains that range from increasing revenue to improving customer satisfaction, which may be thought of as long-term gains. Further, given that customers are frequently at the center of recommendation systems, they now expect all businesses to offer them some level of personalization, which is becoming the new standard in retail. In this thesis, I study in different settings how the business can go from obtaining data to personalizing the customer experience, and each setting poses its own challenges. For instance, when personalizing the offered assortment within a specific store, the available set will directly affect purchase behavior for all customers entering that store, as all customers are faced with the same assortment. In this context, the store must gear its assortment to reach many customer segments and, in turn, to satisfy customers on aggregate. In chapter 1, we recommend store-specific assortments, which requires understanding, by leveraging the historical sales data, how customers in that store will value an offered set of products.
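To make this utility-maximization assumption concrete, the short Python sketch below simulates such customers: each product's utility is a deterministic score plus Gumbel noise, so that utility-maximizing choices reproduce Multinomial Logit market shares. The utilities and sample size are purely illustrative and are not taken from the data used in chapter 1.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative deterministic utilities for an offered assortment;
# index 0 is the no-purchase option, normalized to utility 0.
utilities = np.array([0.0, 1.2, 0.4, -0.3, 0.8])

def simulate_choices(utilities, n_customers=100_000):
    # Each customer draws Gumbel noise and picks the product with the highest total utility.
    noise = rng.gumbel(size=(n_customers, len(utilities)))
    chosen = np.argmax(utilities + noise, axis=1)
    return np.bincount(chosen, minlength=len(utilities)) / n_customers

def mnl_shares(utilities):
    # Closed-form MNL market shares implied by the same utilities.
    w = np.exp(utilities - utilities.max())
    return w / w.sum()

print("simulated shares:", np.round(simulate_choices(utilities), 3))
print("MNL shares:      ", np.round(mnl_shares(utilities), 3))

Inferring the underlying utilities then amounts to running this logic in reverse, from observed shares back to a utility structure, which is what the estimation procedures of chapters 1 and 2 do under richer models.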


Several dilemmas come up when going from data to assortments. It is generally assumed that the retailer possesses an explicit picture of the discrete choice data, that is, the customer choices made among all options offered to the customer. This assumption can be questioned, as retailers usually only have the sales (or even purchase) data through aggregate statistics. An arising challenge is to precisely infer customer preferences, or a choice model, when the data itself is inaccurate. The inaccuracy within data can also be due to noise or misreporting, in which case properly accounting for noise in the decision making is crucial. Furthermore, the increasing amount of data also creates a computational challenge when trying to learn from data, which needs to be addressed in practice. Traditional inference methods are no longer applicable, so extending these using advances in optimization and hardware is critical in order to learn managerial insights. Throughout this thesis, I assess the practicality of the proposed analytical methods through their computational cost.

Related contributions. In chapter 1 we consider the problem faced by our partner beer retailer in an effort to personalize their product offering. We establish a general assortment optimization methodology starting from store sales data. Again, personalizing assortments is a critical piece of customer satisfaction and revenue maximization for beer retailers: consumer preferences are geography and time varying, and stores are capacity constrained. We first estimate a parametric utility model from classical choice models; the estimation methodology we developed scales to large datasets. We invert market shares to obtain the residual utilities for observed product-store pairs. Using collaborative filtering, we impute the unobserved residuals; furthermore, this piece in effect denoises the observed residuals, i.e. much of the observational data. Our final utility model consists of our parametric term plus a low rank correction, and we validate its performance using domain expertise.

In the second part of the chapter, we use the learnt model (parametric, low rank, and bootstrapped confidence intervals) to optimize the risk-adjusted revenue subject to business and market share constraints at the store level. We rely on a continuous extension of the set function, and an appropriate first-order method, to optimize for revenue. Our methods leverage advances in software and parallel computation to scale to our problem instance size.


Product Display

A commonly encountered challenge in several prescriptive analytics tasks is the lack of data, especially in retail. While the access to data has been steadily increasing, the number of online retail platforms, product availability and number of customers have also been increasing. With that in mind, the prescriptive task can be compromised if no relevant data is accessible for some specific customer or product. In fact many (smaller) online retailers have little information about their customers, which can be due to the nature of their business or the type of products they offer. Whereas larger industries have repeated interactions with customers, and can practically infer customer preferences, the smaller retailers must inform their operational decisions through a different mechanism. In this case online retailers rely on a variety of tools to actually go about obtaining customer-specific information, which can be helpful to reach a business objective, say customer satisfaction achieved by displaying appealing products. The nature of the tools used depends on the business, and they often belong to the wide family of customer queries: these can take the form of comparison-based questions, “thumbs-up” type questions, product scoring, etc. These all relate to the Conjoint Analysis literature [2], where the objective is to identify how different attributes of products are valued by customers. However its focus is generally within the marketing realm (i.e. utility estimation), whereas a prescriptive task, like the product display problem analyzed in chapter 2, tries to operationalize the learnings from the query data directly.

Another salient feature for online retailers, specifically when they display products, is that they can practically recommend different product sets to each entering customer and achieve a much higher level of personalization. This further meets the increasing personalization requests from the customer end. The personalization task requires leveraging analytical tools to build customer-centric models that measure business metrics as we modify the displayed products. Solving these problems is an opportunity to operationalize methods grounded in Machine Learning. Combining these features, customer queries and personalization, entails that there are perhaps efficient ways to query customers and obtain higher quality information for the prescription objective. The emerging challenge is to design an efficient querying mechanism that will better serve the product display objective in hindsight.


Related contributions. In chapter 2 we take the position of sellers lacking personalized data for their customers, who usually present them with questionnaires to extract information about them and guide the sellers' personalization efforts. In order to meet customers' preferences, sellers must recommend personalized sets of products using the extracted information. We develop two methods for designing adaptive choice-based questionnaires tailored for the product recommendation objective. The first method leverages extant choice-based questionnaire design algorithms for utility estimation. Because the latter were developed to tackle a different task, we propose a modification that better aligns them with the task of product recommendation. The second method provides a direct solution approach, leveraging robust optimization to enable tractability. Numerical studies using both synthetic and real-product data reveal that our methods provide substantial performance gains over state-of-the-art benchmarks. We conduct a performance analysis in a simplified context that provides theoretical justification for the benefits of the proposed modification. Our findings imply that personalization systems using extant adaptive choice-based questionnaires could readily benefit from the modification we propose, or could benefit by utilizing the proposed robust approach.
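For intuition only, here is a minimal Python sketch, with hypothetical products, a box uncertainty set, and scipy's linprog as the inner solver, of how choice-based answers and a robust recommendation can interact; it does not reproduce the chapter's actual formulations. Each answered comparison adds a linear constraint on the customer's unknown partworth vector, and the recommended product maximizes the worst-case utility over the resulting polyhedron.

import numpy as np
from scipy.optimize import linprog

# Hypothetical product feature matrix (rows = products, columns = attributes).
products = np.array([
    [1.0, 0.2, 0.0],
    [0.3, 0.9, 0.1],
    [0.5, 0.5, 0.5],
])

# Each answered question "a preferred over b" gives the constraint u @ (x_a - x_b) >= 0.
answers = [(0, 1), (2, 1)]   # the customer preferred product 0 over 1, and 2 over 1
A_ub = np.array([-(products[a] - products[b]) for a, b in answers])
b_ub = np.zeros(len(answers))
bounds = [(-1.0, 1.0)] * products.shape[1]   # box uncertainty set for the partworths u

def worst_case_utility(x):
    # Inner problem: min_u u @ x over the answer constraints and the box bounds.
    res = linprog(c=x, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.fun

# Robust recommendation: the product with the best worst-case utility.
best = max(range(len(products)), key=lambda i: worst_case_utility(products[i]))
print("robust recommendation: product", best)

A questionnaire tailored to the recommendation objective would then pick the next question so as to most improve this worst-case comparison, rather than to best estimate the partworths themselves.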

ii. Dealing with Uncertainty in Modern Markets

Distribution Free Algorithms

The above efforts in personalization suggest that the available data is rich enough to inform decisions. In the case of product recommendation, this means the business would need to extract some accurate information about customer demand across the range of products it has to offer, and also across the different price range that the business can set. In many practical problems, this even requires learning some time-dependent signal which would characterize how customer preferences change across time. The complexity of grasping managerial insights from the data at the granularity of product by time and feature can very well be unreasonable in several practical cases, even though the success of analytics has been established for aggregate predictions. This challenge is made more apparent in modern retail markets where there are many more products available (for example due to personalization), where customers have direct access to this variety online, and where the market conditions themselves are highly changing, for example due to “rare” events (holidays, pandemic, viral growth)! This issue compromises the extensive use of data and its forecasting ability. With predictive power thus hindered, one must acknowledge the uncertainty in such modern markets, and develop adequate decision-making systems.

For example, in the context of Network Revenue Management analyzed in chapter 3, we overcome the need for accurate demand forecasts for different products at different prices and times, which would suffer from the curse of dimensionality. This problem arises in airline ticketing, automobile rentals, cloud computing, etc., which illustrates its importance. These forecasts, had they been accurate, would have enabled our decision system to trade off the value of a present offer on a product consuming multiple resources versus the potential opportunity of keeping those resources for later customers. I therefore study one uncertain market and the associated sales decision policy, where the objective is to efficiently serve arriving customer requests at the cost of capacitated resources. We establish a sales decision algorithm that makes no assumption on the demand patterns, which are in effect uncertain or even unknown. This is in stark contrast with much of the Network Revenue Management literature, which essentially takes for granted the access to credible forecasts, and focuses its work on the algorithmic challenge posed by the inherent dynamic program. While there has been effort to consider demand uncertainty through robust methods for Network Revenue Management (i.e. potential errors in the forecasts) [3], to the best of our knowledge we are the first to remove assumptions on forecasts altogether.

Related contributions. In chapter 3 we consider the canonical Network Revenue Management problem in which a monopoly retailer offers multiple products which consume capacitated resources, for sale to customers. Customers arrive sequentially, bid for some product at some price, and the retailer needs to make sales decisions in an online fashion, subject to inventory capacity constraints. We focus on environments whereby demand is highly uncertain, due to, for example, rapidly changing market conditions (e.g. pandemic, holidays). This uncertainty is emphasized by the granularity level of forecasts (e.g. for personalization) within modern markets. In such settings, it is often highly desirable for decision makers to guide sales decisions through distribution-free methods that are easy to calibrate and could provide strong performance under many possible demand realizations, rather than methods that rely on imprecise and ever-changing demand forecasts. We provide a distribution-free online algorithm for Network Revenue Management and use the primal-dual schema to analyze its performance. We establish that if the network structure and demand sequence are chosen adversarially, the proposed algorithm is asymptotically optimal, in the sense that it achieves the best possible competitive ratio, even for imbalanced inventory levels and heterogeneous demand with multiple prices. By analyzing some special relevant network structures, we show how to tune the hyper-parameters of our algorithm in an intuitive way to achieve stronger performance. Finally, we conduct computational experiments that demonstrate the proposed algorithm is robust to market uncertainty, and enjoys strong performance in practice, outperforming existing methods.
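To convey the general flavor of such distribution-free policies, and only the flavor, here is a short Python sketch in the spirit of classical online primal-dual allocation schemes rather than the tuned algorithm of this chapter: each resource carries a dual "bid price" that grows exponentially with its utilization between assumed price bounds p_min and p_max, and an arriving request is accepted only if its price exceeds the summed bid prices of the resources it consumes. All names and constants are illustrative.

def make_primal_dual_policy(capacities, price_range):
    # capacities:  dict resource -> initial capacity
    # price_range: (p_min, p_max) assumed bounds on per-unit prices, the only market input used
    p_min, p_max = price_range
    used = {r: 0 for r in capacities}

    def bid_price(resource):
        # Dual price grows exponentially in the fraction of capacity consumed,
        # from p_min at zero utilization toward p_max at full utilization.
        z = used[resource] / capacities[resource]
        return p_min * (p_max / p_min) ** z

    def offer(price, consumption):
        # Accept a request paying `price` and consuming `consumption` (resource -> units)?
        feasible = all(used[r] + q <= capacities[r] for r, q in consumption.items())
        threshold = sum(q * bid_price(r) for r, q in consumption.items())
        if feasible and price >= threshold:
            for r, q in consumption.items():
                used[r] += q
            return True
        return False

    return offer

# Illustrative usage: two resources, requests arriving online.
offer = make_primal_dual_policy({"leg1": 10, "leg2": 10}, price_range=(1.0, 8.0))
print(offer(price=5.0, consumption={"leg1": 1}))                 # accepted: price above bid prices
print(offer(price=1.5, consumption={"leg1": 1, "leg2": 1}))      # rejected: price below bid prices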

Second-Price Knapsack Problem

Our approach for Network Revenue Management is only reflective of a recent push from the Operations Research community to provide robust solutions when faced with uncertainty [4], which shares some similarity in spirit with competitive analysis [5]. The literature has studied a vast array of applications and how to incorporate uncertainty within their modeling. Another setting in which uncertainty is rooted is auction theory. Simply said, the principle of an auction is to have participants bid to win an item, while competitors' bidding behavior is unknown. Auctions are widely used across Internet advertisement allocations, which is a notoriously large modern market. Similar to our approach for Network Revenue Management, in chapter 4 we omit the need to predict opponents' behavior, likely an overly complex task, and directly derive implementable bidding strategies that utilize minimal forecast information. As before, the ability to precisely model opponents' bidding behavior and their valuation for advertisements is out of reach given the size of the online advertisement market. We overcome this missing information by modeling uncertainty in the market faced by one bidder, which can be viewed as bidding against a single unknown bidder. Furthermore, as an additional specificity of Internet advertising, the auction mechanism used is often a second-price auction, which entails that the price paid by the winner is unknown when they bid. This uncertainty is structurally inherent to the market, and we address it with appropriate modeling. The challenge is to derive bidding policies when the prices, i.e. the market conditions, are uncertain.

Related contributions. In chapter 4 we study the bidding problem in online advertisement (ad) exchanges, where ad slots are each sold via a separate second-price auction. This chapter considers the bidder's problem of maximizing the value of the ads they purchase in these auctions, subject to budget constraints. This “second-price knapsack” problem presents challenges when devising a bidding strategy because of the uncertain resource consumption, specific to the market: bidders win if they bid the highest amount, but pay the second-highest bid, unknown a priori, as it depends on competitors' behavior, which is unreasonably complex to model. This is in contrast to the traditional online knapsack problem, where posted prices are revealed when ads arrive, and for which there exists a rich literature of primal and dual algorithms. The main results of this work establish general methods for adapting these online knapsack selection algorithms to the second-price knapsack problem, where the prices are revealed only after bidding. In particular, a methodology is provided for converting deterministic and randomized knapsack selection algorithms into second-price knapsack bidding strategies that purchase ads through an equivalent set of criteria and thereby achieve the same competitive guarantees. This shows a connection between traditional knapsack selection algorithms and second-price auction bidding algorithms that had not been leveraged before to handle the inherent uncertainty. Empirical analysis on real ad exchange data verifies the usefulness of this method for handling uncertainty in this market, and gives examples where it can outperform state-of-the-art techniques.
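To make the adaptation idea concrete, here is a minimal Python sketch under simplifying assumptions: a single budget, and a standard exponential threshold psi(z) on the fraction of budget spent with assumed value-to-price bounds L and U, in the spirit of classical online knapsack algorithms rather than the exact thresholds of this chapter. Where the knapsack rule would accept an item if and only if value/price >= psi(z), the bidder submits a bid of value/psi(z): that bid wins the second-price auction exactly when the realized price satisfies the same criterion, even though the price is unknown at bid time.

import math

class ThresholdBidder:
    # Adapt an online-knapsack threshold rule into a second-price bidding strategy (sketch).

    def __init__(self, budget, L=1.0, U=10.0):
        # L, U: assumed lower/upper bounds on the value-to-price ratio of ads worth buying.
        self.budget = budget
        self.spent = 0.0
        self.L, self.U = L, U

    def _psi(self):
        # Classical exponential threshold on the fraction of budget already spent.
        z = self.spent / self.budget
        return (self.U * math.e / self.L) ** z * (self.L / math.e)

    def bid(self, value):
        # Bidding value / psi(z) wins iff the (unknown) second price p satisfies
        # value / p >= psi(z), i.e. exactly the knapsack acceptance criterion.
        return value / self._psi() if self.spent < self.budget else 0.0

    def observe_outcome(self, won, price_paid):
        if won:
            self.spent += price_paid

# Illustrative usage over a stream of (value, realized second price) pairs.
bidder = ThresholdBidder(budget=100.0)
for value, second_price in [(5.0, 1.0), (3.0, 2.5), (8.0, 2.0)]:
    b = bidder.bid(value)
    won = b >= second_price
    bidder.observe_outcome(won, second_price if won else 0.0)
    print(f"bid={b:.2f}, won={won}, spent={bidder.spent:.2f}")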

Part I

From Data to Personalization

Chapter 1

Personalizing Assortments from Sales Data for Large Beverage Retailer: a Collaborative Filtering Perspective on Choice Modeling

1.1 Introduction

The beer industry has been a growing, then stable, industry sector, where consumer preferences have been shifting from classical value products to craft beer; in fact, the volume associated with craft beer grew 4% whereas concurrently the overall market went down.1 The changing preferences have been illustrated by both the increase in product availability of craft beers and by its volume growth. These trends in sales reflect that customer preferences have been changing over time. However, these preferences have not been as widely detected within shopping behavior across all types of stores and all geographies. These effects highlight the extent to which the market is differentiated across different points of contact (POC) and across time.

In 2019, the U.S. beer industry sold over $119 billion in beer products to consumers,2 which totals over 6 billion gallons of beer! These sales go across a variety of products (barrels and single bottles), a variety of styles (value to premium) and a variety of sale points (supermarkets to gas stations, across all states). Given this tremendous volume, there is a tremendous interest from all beer retailers in trying to win market share from their competitors and conjointly increase the overall beer market volume: both would yield large dollar value. Several beer retailers rely on different marketing strategies that span commercial displays on billboards or television, targeted advertisement on web platforms, promotional offers to POCs which translate into markdowns offered to customers, and personalization in order to create the most appealing product as observed by the consumer. Another widely used lever is to tailor the set of products that are offered in different POCs with the intent to meet customer preferences. In fact customers find certain products more or less appealing, and this sensitivity is naturally both geography and demography dependent. Customer preferences are also strongly affected by product seasonality (for example, holiday specials), and the retailer must again tailor the product set offered in order to be most appealing to customers and generate maximum revenue. Personalization and assortment optimization have been an increasingly popular trend in various retail sectors and online platforms, to the extent that customers as a whole have raised their expectations: if personalization was a means to target customer niches or increase revenue in the past, it is now critical to offer some level of customer-centric tailoring. Furthermore, adapting the offered product set does not require major expenditures in advertising. Personalization rather directly seeks to increase the “efficiency” of the offered assortment given the current state of the market, whether to meet higher customer satisfaction or a revenue target. These changes may require changing some parts of the supply chain, which we are not concerned with in this work.

1 https://www.brewersassociation.org/statistics-and-data/national-beer-stats/
2

These conditions, low cost of implementation and customer expectations, make assortment optimization a preferred application area of Revenue Management, with large potential revenue margins.

1.1.1 Our Contribution

For this chapter, we have been working closely with our large beer retailer collaborator. We had access to data which captures sales throughout multiple years, describing products, demographics and stores. Some of these features are time and/or store varying (e.g. price is set exogenously by the store). Using this rich and massive data source, we have estimated a parametric choice model with a scalable methodology. This choice model stems from discrete utility models, and our estimation returns a preference vector, similar to conjoint analysis, characterizing customer preferences among products. In effect, for a fixed assortment our model predicts some utility for each product, after which market shares are predicted according to these utilities. In particular, we have implemented increasingly complex utility choice models, from the Multinomial Logit (MNL) and Nested Multinomial Logit (NMNL) to the Cross Nested Logit (CNL), which we ultimately settled on. The increasing complexity, while risking overfitting the data, allowed us to capture and validate the insights from the business expertise, describing certain feature sensitivities and specific substitution patterns. We leveraged advances in vectorized computation to efficiently scale our estimation procedure to larger datasets and complex utility models that have an increasing number of parameters. This extends existing software, which does not scale to problems of our size.

In order to obtain confidence intervals on our parametric model, we bootstrapped the entire process by repeatedly removing chunks of data and re-estimating the preference vector. This allowed us to detect outliers in our observations, and to penalize our utility model accordingly by including risk aversion into our model during the optimization.
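A minimal sketch of the kind of resampling loop this refers to; estimate_beta below is a hypothetical stand-in for the full MLE procedure, and the drop fraction and number of rounds are placeholders. Each round drops a random chunk of observations, re-estimates the preference vector, and the coordinate-wise percentiles give the confidence intervals.

import numpy as np

def bootstrap_preference_vector(observations, estimate_beta, n_rounds=50, drop_frac=0.1, seed=0):
    # observations:  list of (store, month) observations used by the MLE
    # estimate_beta: callable mapping a list of observations to an estimated preference vector
    #                (hypothetical stand-in for the full estimation procedure)
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_rounds):
        keep = rng.random(len(observations)) > drop_frac      # drop roughly drop_frac of the data
        subsample = [obs for obs, k in zip(observations, keep) if k]
        estimates.append(estimate_beta(subsample))
    estimates = np.stack(estimates)
    lo, hi = np.percentile(estimates, [2.5, 97.5], axis=0)    # coordinate-wise 95% intervals
    return estimates.mean(axis=0), lo, hi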

Even though our model achieved good accuracy, we worked on enriching it by incorporating a low rank term into our utility model. More specifically, our utility now consists of a linear part, already estimated from classical discrete models, and an additional low rank term which we explain how to estimate. Using the parametric model learnt, we first invert the observed market shares into observed utilities, which we sometimes refer to as inverted utilities. Subsequently we define the residual utility as the difference between the observed utility and the parametric utility; this defines the residual utility matrix (store-times by products) which is partially observed. Using collaborative filtering, we then denoise this residual matrix, observed for only a small set of product-store pairs, and obtain a low rank factorization of the residual utilities: this is a major component of our methodology. Further, the low rank factorization enables the imputation of the residual utilities for unobserved product-store pairs. The low rank structure allows us to conserve statistical guarantees, and we can in effect test the accuracy of our final model (this is ongoing work). The strength of this low rank correction is to capture non-parametric connections between stores and features, and to denoise to a large extent the observable signal from the aggregate sales data. This is made apparent throughout our rank validation procedure. Overall our estimation methodology carefully accounts for the complexity of the model with an internal validation procedure.
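For illustration, here is a minimal soft-impute style loop on the residual-utility matrix (rows indexed by store-months, columns by products), written directly with numpy rather than with the exact package used in our pipeline; the rank and shrinkage level are placeholders that the validation procedure described above would select.

import numpy as np

def soft_impute(residuals, observed_mask, shrinkage=1.0, rank=10, n_iters=100):
    # Impute and denoise a partially observed residual-utility matrix via iterated
    # singular-value soft-thresholding (a minimal SoftImpute-style sketch).
    X = np.where(observed_mask, residuals, 0.0)       # initialize missing entries at 0
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s = np.maximum(s - shrinkage, 0.0)            # soft-threshold the singular values
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep the observed residuals where available; use the low rank fit elsewhere.
        X = np.where(observed_mask, residuals, low_rank)
    return low_rank   # low rank utility correction for every product-store pair

# Illustrative usage on a synthetic partially observed matrix.
rng = np.random.default_rng(0)
truth = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 200))   # rank-5 ground truth
mask = rng.random(truth.shape) < 0.3                            # 30% of entries observed
completed = soft_impute(truth + 0.1 * rng.normal(size=truth.shape), mask)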

Before describing the optimization methodology, let us review our final model: we predict the sales at the store level using the estimated CNL with utilities equal to a penalized parametric term plus a low rank utility term. We now focus on optimizing store level assortments, subject to capacity constraints and market share preservation. We provide an optimization routine that sequentially optimizes for a weighted combination of revenue and market share, and updates the weights based on a bisection algorithm. For fixed weights, we are optimizing a revenue function where market shares are given by the CNL. We tackle this problem by considering the continuous multi-linear extension of the revenue function, on which we run the first-order Frank-Wolfe method to solve the constrained problem. We obtain stochastic gradients by leveraging properties of the multi-linear extension and efficient sampling. We then map the final vector solution to an actual assortment. We account for the non-linearity of our revenue function by running multiple restarts.
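A minimal sketch of how such a stochastic gradient of the multi-linear extension can be obtained by sampling; the revenue oracle, the constraints, and the Frank-Wolfe linear-minimization step are abstracted away, and expected_revenue is a hypothetical stand-in for the revenue predicted by the fitted CNL on a given assortment. The partial derivative with respect to product i is the expected revenue gain from forcing i into versus out of a random assortment drawn from the current inclusion probabilities.

import numpy as np

def stochastic_gradient(p, expected_revenue, n_samples=20, seed=0):
    # Sampled gradient of the multi-linear extension F(p) = E_{S ~ p}[ revenue(S) ].
    # p:                inclusion probability per product (the continuous decision variable)
    # expected_revenue: callable mapping a set of product indices to its revenue
    #                   (hypothetical stand-in for the fitted choice-model oracle)
    rng = np.random.default_rng(seed)
    n = len(p)
    grad = np.zeros(n)
    for _ in range(n_samples):
        S = set(np.flatnonzero(rng.random(n) < p))    # random assortment drawn from p
        for i in range(n):
            # dF/dp_i = E[ revenue(S with i) - revenue(S without i) ]
            grad[i] += expected_revenue(S | {i}) - expected_revenue(S - {i})
    return grad / n_samples

Each Frank-Wolfe iteration would then minimize the linear function defined by this gradient over the capacity and market share constraints and step toward the resulting vertex, before the final fractional solution is rounded to an actual assortment.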

While this project is still ongoing, we predict a notable increase in revenue for optimized assortments. These are in the process of being piloted. As a disclaimer, we cannot disclose the full extent of our results nor specific business expertise.

1.1.2 Related Literature

We break down the literature review across the different parts of our methodology, from estimation to optimization.


Choice Modeling. Our analysis for beer consumers follows the common preconception in discrete choice behavior that each customer has some endowed utility associated with each product and chooses the product maximizing his utility. The utility for each alternative is given by a random variable with some alternative-specific fixed effect plus an additional noise term which is specifically correlated between alternatives. McFadden [1] derived the Generalized Extreme Value (GEV) model from random utility models; this wide class encompasses several commonly used models, with the Multinomial Logit (MNL) from [6] being the most straightforward. However the MNL, due to its simplicity, suffers several drawbacks, the most notable being independence of irrelevant alternatives (IIA), which results in equivalent substitution patterns (i.e. cross-elasticities) when changing the available set of alternatives. In order to overcome IIA, the Nested Logit Model (NMNL) developed in [1, 7] relaxes this assumption, and allows for correlation between the noise associated with each alternative belonging to the same nest. However the NMNL is still limiting to some extent: the IIA remains within each nest, as well as between different nests altogether. The Cross Nested Logit (CNL) developed and analyzed in [8, 9, 10, 11] naturally extends the NMNL, where each alternative can belong to more than one nest with different weights, allowing for a more complex utility correlation among alternatives, hence implying non-trivial substitution patterns. The CNL also generalizes the ordered GEV from [12] where alternatives are ordered by nests. Bierlaire [13] discusses how one can learn the parameters of a CNL using a first order method by Maximum Likelihood Estimation, as done in [14]. Our work extends their software by notably scaling the estimation procedure to large numbers of observations, available alternatives and alternative features, by leveraging advancements in vectorized computation and GPUs.

CNLs have been widely used for detecting customer preferences within transportation [15, 16, 17]. Further work considers a variety of choice models relevant for different industries, for example the Mixed MNL, also known as MNL with random coefficients, by McFadden and Train [18], the Markov Chain model in [19, 20], and other structures, see for example [21, 22].

Matrix imputation and its applications. Once we have learnt a parametric choice model, we estimate the residual utility for observed product-store pairs by inverting market shares. This results in a partially observed matrix, and our task is to infer the residual matrix for missing observations. For the inference to be meaningful, we assume that the true underlying residual matrix is of low rank, which corresponds to a low rank utility term in our utility model. This represents that products can be grouped by some underlying styles and POCs can be grouped by some underlying type, with latent weights which need to be estimated. Furthermore, this low rank representation of the observed matrix allows us simultaneously to impute the missing entries and to denoise the observed ones, which is critically important.

Several methods are proposed in the literature to address the matrix imputation problem, which often tackle different objectives. Candès and Recht [23], Candès and Tao [24], and Cai et al. [25] consider the problem of exact recovery, and utilize techniques in singular-value thresholding and convex optimization. Using the nuclear norm as a proxy for the rank usually results in strong methods and efficient heuristics. In addition, Srebro et al. [26] and Keshavan et al. [27] analyze the theoretical recovery guarantees under certain assumptions on the partially observed matrix. Given the relatively small size of our problem instance, we utilize the SoftImpute algorithm from [28], which yields matrix imputation in a matter of minutes. Given this practical aspect, we can validate the rank selection using out-of-sample error.

While we do not contribute to the matrix imputation community, our work clearly transfers these methods to a novel methodology in discrete choice modeling. Their combination with the choice model learnt allows us to substantially enrich our models, with certifiable confidence. To some extent, our use of the low rank correction on our original choice model can be viewed as boosting; see [29] for further details on boosting.

Set Function and Assortment Optimization. We now review the literature relevant to the assortment optimization problem under discrete choice models, which is faced by a single store. As expected, the simpler models are more amenable to assortment optimization. However constraining the optimization, whether it be on cardinality of the assortment or some weighted capacity, can make the optimization laborious, and perhaps intractable. Talluri and Van Ryzin [30] establish that the unconstrained revenue optimization under an MNL is polynomially solvable, and Rusmevichientong et al. [31] and Davis et al. [32] show that the cardinality constrained problem is no harder. However the weighted capacity problem is NP-hard, as shown in [33]. Similarly under the NMNL, Davis et al. [34] establish a polynomial algorithm for the unconstrained assortment optimization, and Gallego and Topaloglu [35] provide a polynomial approximation for the weighted capacity variant. For the MMNL, Rusmevichientong et al. [36] show that the problem is NP-hard, and Désir et al. [33] show it is even hard to approximate. To the best of our knowledge, there are no established results for the CNL (other than generalizations of the results associated with the NMNL).

In our work, we view the CNL revenue optimization subject to capacity constraints as a constrained set function optimization. We utilize the multi-linear extension of set functions, and the Frank-Wolfe algorithm from [37] to solve the continuous extension. Our implementation can be run in parallel for different stores, and utilizes stochastic gradients obtained by sampling for computational efficiency.

1.2 Choice Model Estimation

In this section we describe the discrete choice model chosen and the associated estimation procedure. Briefly, this relies on a parametric utility model with a low rank term to leverage cross-learning through time and stores. The data comprises sales (sell-in) data for approximately three hundred stores throughout 2019, totaling thirty-seven hundred different observations of assortments from different stores and sales figures. Our data consists of products owned by our beer collaborator and competition products, comprising over twenty-three hundred different products, for a total number of sales observations greater than one million.

We will first review the feature engineering and data processing, second the set of choice models we have considered, and third the overall estimation procedure by Maximum Likelihood Estimation (MLE).

1.2.1 Notation and Preprocessing

We first process our data to understand which products are offered at a given time and store. Since our data corresponds to sell-in information, we consider the assortment offered in a specific month as the set of products that have been sold in within the last three months; for example, the assortment offered in December 2019 is the set of products for which we have a sale record within the dates October 1st, 2019 to December 31st, 2019. As a first approximation, the actual sales pattern is understood to be split equally over the three months the record affects. Naturally we aggregate the sales (and store/product features) by store, time, and product. We remove beforehand any data outliers, as observed by sale numbers or revenue, as prescribed by our partner. Let us now denote time $t \in \mathcal{T}$, store $s \in \mathcal{S}$, and product $i \in \mathcal{I}$. From our preprocessing, it is understood that in a specific store $s$ at time $t$, only a subset of products is offered, which we denote with the assortment $A_{st} \subseteq \mathcal{I}$; we refer to a store-month pair and the associated assortment as an observation. For a given observation $A_{st}$, we normalize the volume sales to 1 to obtain market shares at different observations.

Currently we add to the data a fixed fraction (a 5% no-purchase rate) of customers that entered the store and chose not to purchase, to guide the estimation of the outside option. This corresponds to product number 0. It is understood that we have observable market shares $\rho_{ist} > 0$ for the different products within the observation $A_{st}$.
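For concreteness, a small pandas sketch of this preprocessing; the column names, the integer month encoding, and the schema are illustrative rather than the partner's actual data format. It builds the rolling three-month assortments, splits each sell-in record equally over the months it affects, appends the no-purchase option, and normalizes to market shares.

import pandas as pd

def build_market_shares(sellin: pd.DataFrame, no_purchase_rate: float = 0.05) -> pd.DataFrame:
    # sellin has columns ['store', 'month', 'product', 'volume'], with 'month' an integer index.
    frames = []
    for lag in range(3):
        shifted = sellin.copy()
        shifted["month"] = shifted["month"] + lag     # a sell-in record affects the next 3 months
        shifted["volume"] = shifted["volume"] / 3.0   # split the sales equally over those months
        frames.append(shifted)
    obs = pd.concat(frames).groupby(["store", "month", "product"], as_index=False)["volume"].sum()

    # Add the outside option (product 0) as a fixed fraction of each observation's volume.
    totals = obs.groupby(["store", "month"], as_index=False)["volume"].sum()
    outside = totals.assign(product=0,
                            volume=lambda d: d["volume"] * no_purchase_rate / (1 - no_purchase_rate))
    obs = pd.concat([obs, outside[obs.columns]])

    # Normalize volumes within each (store, month) observation to obtain market shares.
    obs["share"] = obs["volume"] / obs.groupby(["store", "month"])["volume"].transform("sum")
    return obs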

We now describe our feature set. We originally have product-specific features, such as price, nest memberships, number of product facings on the shelf, etc., and create non-linear transformations of these (such as price per nest or squared price). Store-specific features relate to the store's geography, urbanicity, channel and other demographics. We utilize one-hot encoding (OHE) for any categorical variables; further, we use an OHE of products to capture a product fixed effect. In the end, we let $x_{ist}$ encode both store-specific and product-specific features altogether. We denote the set of features $\mathcal{D}$ of size $d$, that is $x_{ist} \in \mathbb{R}^D$. In practice, we have approximately twenty features and twenty-three hundred fixed effects.

In our work we often need to define the features for all products at all store-months. When a product is not offered in a given observation, we infer its features by averaging product-specific features (for example price) across stores, and use the store-specific features, which are known. This allows us to obtain $x_{ist}$ for all $i \in \mathcal{I}$ and all observations. This process can be refined using additional practical knowledge when relevant. By default the no-purchase option has a utility of 0, which we simply set by having zero features, i.e. $x_{0st} = 0_D$; we will detail more on this in the following modeling section.

As mentioned, each product $i \in \mathcal{I}$ belongs to two segment types, one based on beer style (Craft, Classic, ...) and one based on price category (value, premium, ...). We let the set of all nests be the set of all styles and all price categories, which we denote $\mathcal{N}$, with a generic nest $N \in \mathcal{N}$. By default, the outside option is the only product within its own nest.

1.2.2 Parametric Modeling

Considered Models

Throughout the development of our research, we considered three different choice models supported by random utility maximization theory: MNL, NMNL and CNL. Working closely with the business expertise of our industry collaborator's team, we have progressively made the model more complex, in order to capture different feature sensitivities and also non-trivial substitution patterns within each nest and between nests altogether. In this section, we review the choice models explored to provide context, and illustrate the IIA properties from which the CNL does not suffer, in effect justifying our final decision.

For all models, we consider the parametric utility of products as a linear product of the real and synthetic features plus a dummy fixed effect for each product $i$. As mentioned earlier, for ease of notation, we condense these dummies into the feature vector $x_{ist} \in \mathbb{R}^D$ using an OHE of products, which is equivalent to having a product-specific fixed effect. Notice that product 0 will always have 0 utility given the linear product of its null features.

The MNL, being the most straightforward model, assigns the following probability of picking a product given an assortment:

$$P^{MNL}[i \mid A, s, t] = \frac{e^{\beta^\top x_{ist}}}{1 + \sum_{j \in A} e^{\beta^\top x_{jst}}}$$

The model is fully parametrized by the vector $\beta \in \mathbb{R}^D$. Notice that we can replace the 1 in the denominator, corresponding to the outside option, by the additional product 0 with null features, which is present in every assortment. In this case, the MLE is actually a convex optimization problem. Given the purchase probability definition, we have the IIA property:

$$\forall i, j \in A \cap A': \quad \frac{P^{MNL}[i \mid A, s, t]}{P^{MNL}[i \mid A', s, t]} = \frac{P^{MNL}[j \mid A, s, t]}{P^{MNL}[j \mid A', s, t]}$$

To overcome this, the NMNL introduces a correlation parameter $\mu \in \mathbb{R}^{\mathcal{N}}$, with $\mu_N \geq 1$ for all $N \in \mathcal{N}$, which correlates the sales of products belonging to the same nest. The assumption is that alternatives can be clustered into mutually exclusive nests. The probabilities are given by:

$$P^{NMNL}[i \in N \mid A, s, t] = \frac{e^{\mu_N \beta^\top x_{ist}}}{\sum_{j \in A \cap N} e^{\mu_N \beta^\top x_{jst}}} \times \frac{\left(\sum_{j \in A \cap N} e^{\mu_N \beta^\top x_{jst}}\right)^{1/\mu_N}}{\sum_{N'} \left(\sum_{j \in A \cap N'} e^{\mu_{N'} \beta^\top x_{jst}}\right)^{1/\mu_{N'}}}$$

Notice that if ∀N : µN = 1 then the model is equivalent to an MNL. The NMNL can be

viewed as a two-step MNL, where in the first step there is a probability of selecting the nest (second term in the RHS of the above equation), and in the second step there is a probability of selecting a product given the nest (first term in the RHS). From this interpretation, it is clear that the model suffers from IIA within nests and between nests, in particular:

$$\forall i, j \in A \cap A' \text{ with } i, j \in N: \quad \frac{P^{NMNL}[i \mid A, s, t]}{P^{NMNL}[i \mid A', s, t]} = \frac{P^{NMNL}[j \mid A, s, t]}{P^{NMNL}[j \mid A', s, t]}$$

$$A \cap N = A' \cap N \text{ and } A \cap \bar{N} = A' \cap \bar{N} \;\Rightarrow\; \frac{P^{NMNL}[N \mid A, s, t]}{P^{NMNL}[N \mid A', s, t]} = \frac{P^{NMNL}[\bar{N} \mid A, s, t]}{P^{NMNL}[\bar{N} \mid A', s, t]}$$

where it is understood that $P^{NMNL}[N] = \sum_{i \in N} P^{NMNL}[i]$ is the probability of selecting a specific nest.

After several iterations communicating the insights learnt with our industry collaborator and adjusting our choice model based on their expert domain knowledge, we have settled on the CNL, which additionally introduces a membership parameter $\alpha \in \mathbb{R}^{I \times \mathcal{N}}$ such that $\alpha_{iN} \in [0, 1]$ for all $i, N$ and $\sum_{N \in \mathcal{N}} \alpha_{iN} = 1$ for all $i \in I$. The understanding is that products can belong to multiple nests with different weights.

$$P^{CNL}[i \mid A, s, t] = \sum_{N} \frac{\alpha_{iN}^{\mu_N} e^{\mu_N \beta^\top x_{ist}}}{\sum_{j \in A \cap N} \alpha_{jN}^{\mu_N} e^{\mu_N \beta^\top x_{jst}}} \times \frac{\left(\sum_{j \in A \cap N} \alpha_{jN}^{\mu_N} e^{\mu_N \beta^\top x_{jst}}\right)^{1/\mu_N}}{\sum_{N'} \left(\sum_{j \in A \cap N'} \alpha_{jN'}^{\mu_{N'}} e^{\mu_{N'} \beta^\top x_{jst}}\right)^{1/\mu_{N'}}} \tag{1.1}$$

The advantage of this model is to capture non-trivial substitution patterns. In the estimation, we can force some memberships to 0 when we believe a product does not belong to a specific nest. Notice that the NMNL corresponds to a CNL whose membership coefficients are in $\{0, 1\}$.

In our model, we consider the nests to be the union of styles and price segments, each product belonging to 2 nests. We considered variations of this by having all products belong to all style nests and biasing our initial point of $\alpha$ towards the actual nest. The following accuracy metrics are with respect to the former model.
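To make (1.1) concrete, here is a minimal PyTorch sketch for a single store-month observation (the tensor shapes and the handling of empty nests are our own simplifying assumptions; our production code vectorizes this across observations):

import torch

def cnl_probabilities(u: torch.Tensor, alpha: torch.Tensor, mu: torch.Tensor,
                      offered: torch.Tensor) -> torch.Tensor:
    """Eq. (1.1): u is (I,) utilities beta^T x_ist, alpha is (I, N) memberships,
    mu is (N,) nest parameters, offered is a (I,) 0/1 mask of the assortment A_st."""
    w = (alpha ** mu) * torch.exp(mu * u.unsqueeze(1))   # alpha_iN^mu_N * exp(mu_N u_i), shape (I, N)
    w = w * offered.unsqueeze(1)                         # restrict the sums to j in A ∩ N
    nest_sums = w.sum(dim=0).clamp(min=1e-30)            # sum over j in A ∩ N
    inclusive = nest_sums ** (1.0 / mu)                  # (sum ...)^{1/mu_N}
    p_nest = inclusive / inclusive.sum()                 # probability of choosing nest N
    p_within = w / nest_sums                             # probability of i given nest N
    return (p_within * p_nest).sum(dim=1)                # sum over nests N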

Maximum Likelihood Estimation

In the following, we focus on the CNL and drop the superscript; the procedure is identical for the other models. In order to estimate the parameters of our CNL (or any of the above models), we proceed with MLE. Assume that for a given assortment $A_{st}$ we have observed the market shares $\rho_{ist}$ of the different products. Then the log-likelihood (LL) for a model parametrized by $(\alpha, \beta, \mu)$ is given by:

$$LL_{st}(\alpha, \beta, \mu) = \sum_{i \in A_{st}} \rho_{ist} \times \log P[i \mid A_{st}, s, t, (\alpha, \beta, \mu)]$$

The overall LL is given by

$$LL(\alpha, \beta, \mu) = \mathbb{E}_{st}\left[LL_{st}(\alpha, \beta, \mu)\right]$$

We solve the MLE by performing stochastic gradient descent (SGD) on the negative LL. We can easily obtain stochastic gradients by sampling observations $(s, t)$ uniformly at random. To calculate the gradient numerically, we rely on the automatic differentiation of PyTorch: for a fixed set of parameters, we evaluate the forward function $\hat{LL}(\alpha, \beta, \mu)$, an unbiased stochastic estimate of the actual LL, and compute its gradient with respect to the parameters $(\alpha, \beta, \mu)$, which is in turn an unbiased estimate of the actual gradient. We then optimize using the AdaGrad algorithm.
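A minimal sketch of one stochastic gradient step under these choices (it reuses the cnl_probabilities sketch above; the batch structure and the absence of explicit constraint handling for alpha and mu are simplifying assumptions):

import torch

def sgd_step(alpha, beta, mu, optimizer, batch):
    """One AdaGrad step on the sampled negative log-likelihood.
    batch is a list of observations, each a dict with features x (I, D),
    observed shares rho (I,) and a 0/1 assortment mask (I,)."""
    optimizer.zero_grad()
    neg_ll = torch.zeros(())
    for obs in batch:
        u = obs["x"] @ beta                                  # parametric utilities beta^T x_ist
        p = cnl_probabilities(u, alpha, mu, obs["mask"])
        neg_ll = neg_ll - (obs["rho"] * torch.log(p.clamp(min=1e-12))).sum()
    neg_ll = neg_ll / len(batch)                             # unbiased stochastic estimate of -LL
    neg_ll.backward()                                        # PyTorch automatic differentiation
    optimizer.step()                                         # AdaGrad update
    return neg_ll.item()

# e.g. optimizer = torch.optim.Adagrad([alpha, beta, mu], lr=0.1)  # hypothetical learning rate
# Note: the constraints mu_N >= 1 and sum_N alpha_iN = 1 would require a projection or
# reparametrization, omitted here for brevity.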

In practice, we pass through the data multiple times running SGD. In each pass (epoch), we randomly permute our indices and select batches of observations without replacement to take gradient steps. Throughout the estimation, we keep a validation set of observations aside, on which we evaluate the validation LL at the end of each epoch. If the validation LL does not improve, we terminate the estimation procedure.
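The outer loop could then look like the following sketch, where evaluate_ll is a hypothetical helper that averages LL_st over the validation observations:

import random

def sample_batches(observations, batch_size):
    """Randomly permute the observations and yield batches without replacement."""
    idx = list(range(len(observations)))
    random.shuffle(idx)
    for start in range(0, len(idx), batch_size):
        yield [observations[i] for i in idx[start:start + batch_size]]

def train(alpha, beta, mu, optimizer, train_obs, val_obs, batch_size=200, max_epochs=50):
    best_val_ll = float("-inf")
    for epoch in range(max_epochs):
        for batch in sample_batches(train_obs, batch_size):
            sgd_step(alpha, beta, mu, optimizer, batch)
        val_ll = evaluate_ll(alpha, beta, mu, val_obs)     # hypothetical helper
        if val_ll <= best_val_ll:                          # validation LL stopped improving
            break
        best_val_ll = val_ll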

Finally, we check our accuracy metrics on a left-out set of observations to obtain a generalization error. We analyze the test LL to check for overfitting, the KL-divergence, the mean-squared error (MSE), and the $R^2$. Note that we redefine the $R^2$ in a special way for assortment market shares:

$$R^2_{st} = 1 - \frac{\sum_i (\hat{\rho}_{ist} - \rho_{ist})^2}{\sum_i (\rho_{ist} - 1/|A_{st}|)^2} \quad \text{and} \quad R^2 = \mathbb{E}\left[R^2_{st}\right]$$

As usual, a perfect fit gets an $R^2 = 1$, whereas the base model uses only the number of products for predicting market shares (i.e., uniform shares $1/|A_{st}|$).
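This metric translates directly into code (numpy assumed; rho and rho_hat are the observed and predicted share vectors over $A_{st}$):

import numpy as np

def r2_observation(rho_hat: np.ndarray, rho: np.ndarray) -> float:
    """R^2_st of the equation above, benchmarked against uniform shares 1/|A_st|."""
    baseline = 1.0 / len(rho)
    return 1.0 - np.sum((rho_hat - rho) ** 2) / np.sum((rho - baseline) ** 2)

# The overall R^2 is the average of r2_observation across observations.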

We implement our model and the training in PyTorch, which leverages vectorized operations. In practice, we run approximately 50 epochs through our data, sampling batches of 200 observations to estimate our gradients. We train our model by leaving out a test set, which serves to measure generalization error, and obtain a parametric CNL with (in- and out-of-sample) $R^2 = .74$.

Bootstrapping the Process

We then bootstrap the estimation process to get confidence intervals on the parametric estimates of β. During the assortment optimization, we will use these to penalize the utility for specific products in specific stores.

In each bootstrap estimation of our parameters, we remove 20% of stores at random, that is, the data associated with the observations of these stores across all months. We also fix $\alpha, \mu$ from our previously estimated CNL; that is, we re-estimate only $\beta$ on this sampled subset of stores, using the same SGD methodology, removing (at random) an additional 20% of stores for validation, and terminating the estimation once the validation LL decreases. We run this step 100 times and obtain an empirical distribution of the linear parameter $\beta$ across random training subsamples of stores. We record the standard deviation (resp. mean) of each component of $\beta$ as the vector $\sigma(\beta)$ (resp. $\hat{\beta}$). We use the bootstrapped vector $\hat{\beta}$ as the estimated utility vector. We let the mean utility of a product in a store be $u_{ist}$, and define the confidence interval around the utility as

$$u_{ist} := \hat{\beta}^\top x_{ist} \quad \text{and} \quad \sigma(u_{ist}) := \sum_{d \in D} \sigma(\beta)_d \, |x_{ist,d}| \geq 0 \tag{1.2}$$
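Given the 100 bootstrap replicates of $\beta$ stacked row-wise, (1.2) amounts to the following sketch (numpy assumed):

import numpy as np

def utility_confidence(betas: np.ndarray, x_ist: np.ndarray):
    """betas: (n_bootstrap, D) bootstrap estimates of beta; x_ist: (D,) feature vector."""
    beta_hat = betas.mean(axis=0)                       # component-wise mean, beta_hat
    sigma_beta = betas.std(axis=0)                      # component-wise standard deviation, sigma(beta)
    u_ist = float(beta_hat @ x_ist)                     # mean utility u_ist
    sigma_u = float(np.sum(sigma_beta * np.abs(x_ist))) # sigma(u_ist) >= 0
    return u_ist, sigma_u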

The CNL with parameters $(\alpha, \hat{\beta}, \mu)$ has a higher in-sample $R^2 = .76$. In what follows, we use this bootstrapped CNL as our parametric base model.


1.2.3 Boosting with Low Rank Utility Model

Currently, our model is given by a parametric CNL where the utilities are given by the linear parametric formulation $u_{ist} = \beta^\top x_{ist}$. Let us consider the matrix representation $U = (u_{ist})_{(s,t) \in S \times T,\, i \in I}$, where it is understood that the columns are indexed by products and the rows by the set of store-month pairs, or observations equivalently. This can be written as $X^\top \beta$, where $X \in \mathbb{R}^{S \times T, I, D}$ is the tensor of store-month observations by product by feature. Notice that given $U$, we can straightforwardly calculate the market shares by replacing $\beta^\top x_{ist}$ with $u_{ist}$ in (1.1).

The objective of this section is to enrich our utility model while keeping the parameters $(\alpha, \mu)$ fixed. Currently our utility matrix is given by $U = X^\top \beta$; we consider extending this to

$$\hat{U} = X^\top \beta + \hat{E} \quad \text{s.t.} \quad \hat{E} \in \mathbb{R}^{S \times T, I} \text{ has low rank} \tag{1.3}$$

and using the utilities from $\hat{U}$ in (1.1). Our current model can be viewed as the case where $\hat{E}$ has rank 0. To obtain such a matrix $\hat{E}$, we first invert the market shares given our learnt model, which yields a partially observed matrix of residual utilities. We then impute this matrix to obtain our final model, selecting its rank using a cautious validation procedure.

Inverting Market Shares

In order to invert the observed market shares for our trained CNL with parameters $(\alpha, \mu)$, we re-estimate a store-month-specific utility model. That is, for each store-month pair $(s, t)$, we fix $(\alpha, \mu)$ and learn utilities $\bar{u}_{ist}$ for the products in $A_{st}$ such that we perfectly fit the data.

Notice that if we do not fix some parameters of the CNL, then the store-month model is over-specified. In our implementation, this can easily be done by using the OHE of products, and as before the estimation is done by performing gradient descent (GD) on our data until convergence across all observations. For some intuition, we illustrate our approach using the MNL model, in which case inverting the market shares is simply done by taking $\bar{u}^{MNL}_{ist} \leftarrow \ln(\rho_{ist}/\rho_{0st})$. Since there is no closed-form solution for the CNL, we must rely on GD to actually fit the data. In practice, we run this task in parallel given that we wish to obtain store-specific models. With our implementation in PyTorch, this step is done in a matter of minutes. This model has a numerical $R^2 = .997$, which illustrates the quality of our fit and confirms that we have actually inverted the market shares. The matrix $\bar{U}$ can be incorporated into the model (1.3) by considering the residual matrix $\bar{E} \leftarrow \bar{U} - X^\top \beta$ and $\hat{U} \leftarrow \bar{U}$. While this model has a near-perfect fit on our data, the rank of $\bar{E}$ can be as large as $\min(|I|, |S \times T|)$, and there is no way to impute the missing entries of this matrix.
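For intuition, the MNL inversion above is a one-liner (numpy assumed; rho is indexed so that entry 0 is the no-purchase share); for the CNL, as noted above, the analogous fit is obtained by gradient descent:

import numpy as np

def invert_mnl_shares(rho: np.ndarray) -> np.ndarray:
    """Recover MNL utilities from market shares: u_bar_ist = ln(rho_ist / rho_0st), so u_bar_0 = 0."""
    return np.log(rho / rho[0])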

In the following, we leverage extant methods for imputing the matrix of residual utilities, which in turn will yield a boosted model.

Imputation of Residuals

We use the overfit utilities to learn a model over the observed residuals of the utilities. Denote:

$$\epsilon_{ist} = \bar{u}_{ist} - u_{ist} = \bar{u}_{ist} - x_{ist}^\top \beta \tag{1.4}$$

for all $(i, s, t)$ where $i \in A_{st}$. We will apply the idea of collaborative filtering to denoise these residuals and to infer the unobserved residuals at the points where $i \notin A_{st}$, that is, where there is no overfit utility.

Consider the matrix $E \in \mathbb{R}^{S \times T, I}$ whose entries are defined by

$$E_{st,i} = \begin{cases} \epsilon_{ist} & \text{if } (i, s, t) \text{ is observed, i.e. } i \in A_{st} \\ 0 & \text{otherwise.} \end{cases}$$

Note that this is a sparse matrix as we do not observe most of the entries. In fact, in our dataset, more than 90% of the entries are missing.

To impute this matrix, we use the SoftImpute algorithm of Mazumder et al. [28]. Roughly speaking, SoftImpute is an iterative algorithm that begins with a sparse matrix as input. At each step, it takes a (not necessarily sparse) matrix, computes a low-rank soft-thresholded SVD approximation of it, and then outputs another matrix in which the missing entries of the original sparse matrix are replaced by the low-rank approximation. In the last step, we output the result of the low-rank soft-thresholded SVD itself. This crucial last step enables us to denoise the residuals at the observed data points as well.
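The following numpy sketch conveys the spirit of the procedure (a simplified re-implementation for illustration rather than the exact routine of Mazumder et al. [28]; the regularization level lam and the iteration count are hypothetical):

import numpy as np

def soft_impute(E: np.ndarray, observed: np.ndarray, lam: float, n_iter: int = 100) -> np.ndarray:
    """E has zeros at missing entries; observed is a boolean mask of the known entries."""
    Z = np.zeros_like(E)
    for _ in range(n_iter):
        filled = np.where(observed, E, Z)             # keep observed entries, fill the rest
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)                  # soft-threshold the singular values
        Z = (U * s) @ Vt                              # low-rank reconstruction
    # Return the final soft-thresholded SVD itself, which also denoises the observed entries.
    return Z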


The output of the algorithm is a dense matrix whose entries $\hat{\epsilon}_{ist}$ correspond to the imputed and denoised residuals obtained at the beginning of this section by inverting market shares. These values are the denoised and imputed residuals that will be used to enhance the model. More concretely, we now define the estimated utility of a product $i$ at store $s$ and time $t$ as the sum of our parametric utility and our low-rank residual

$$\hat{u}_{ist} := u_{ist} + \hat{\epsilon}_{ist} \quad \text{or equivalently} \quad \hat{U} = U + \hat{E}$$

According to our validation results, which we describe in the next paragraph, a conservative value of the rank that achieves low validation error is 20. In fact, we see in Figure 1-1a that the MSE (described in the next paragraph) associated with rank 20 is very close to the minimum MSE on the curve; this rank also corresponds to the "elbow" in Figure 1-1b, which suggests that the associated additive utility captures most of the signal. In effect, we chose $\hat{E}$ of rank 20 when running the imputation algorithm. Using the estimated CNL, the in-sample evaluation with the additive low-rank utility term increases the $R^2$ of the model from 0.76 to 0.82.

Figure 1-1: Rank validation, analysis of different metrics for varying rank of additive utility. We highlighted in red the final rank chosen. (a) Out-of-sample MSE on the residual matrix (MSE vs. rank). (b) In-sample $R^2$ on assortment market shares ($R^2$ vs. rank).

Rank Validation. The imputation algorithm requires choosing a rank a priori. We overcome this by developing a rank validation procedure. More specifically, we consider a set of rank values that we want to test, $r \in \{5, 10, \ldots, 50\}$. For each value $r$, we run 30 epochs by
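One possible instantiation of such a rank-validation loop, given purely as an illustrative sketch under assumed details (a random 20% holdout of the observed residual entries and 30 imputation iterations per candidate rank), is:

import numpy as np

def validate_rank(E: np.ndarray, observed: np.ndarray, ranks=range(5, 55, 5),
                  holdout_frac=0.2, n_iter=30, seed=0):
    """Out-of-sample MSE on held-out residual entries for each candidate rank r."""
    rng = np.random.default_rng(seed)
    holdout = observed & (rng.random(E.shape) < holdout_frac)   # hide some observed entries
    train_mask = observed & ~holdout
    mses = {}
    for r in ranks:
        Z = np.zeros_like(E)
        for _ in range(n_iter):                                  # imputation epochs
            filled = np.where(train_mask, E, Z)
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            Z = (U[:, :r] * s[:r]) @ Vt[:r]                      # rank-r reconstruction
        mses[r] = float(np.mean((Z[holdout] - E[holdout]) ** 2)) # out-of-sample MSE (cf. Figure 1-1a)
    return mses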

