User response prediction in mobile advertising


HAL Id: hal-02014821

https://hal.archives-ouvertes.fr/hal-02014821

Submitted on 11 Feb 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Faustine Bousquet, Khanh Duong, Christian Lavergne, Sophie Lèbre, Anastasia Lieva

To cite this version:

Faustine Bousquet, Khanh Duong, Christian Lavergne, Sophie Lèbre, Anastasia Lieva. User response prediction in mobile advertising. ECML PKDD, Sep 2018, Dublin, Ireland. ⟨hal-02014821⟩

Faustine Bousquet (∗, 1, 2), Khanh Chuong Duong (1), Christian Lavergne (2, 3), Sophie Lèbre (2, 3), Anastasia Lieva (1)

∗ faustine.bousquet@tabmo.io

1 TabMo Labs, Montpellier, France

2 IMAG, Institut Montpelliérain Alexander Grothendieck, Université de Montpellier, France

3 Université Paul Valéry, Montpellier 3, France

Mobile Advertising Process

TabMo is an adtech company that runs the Hawk platform. The product was built to be the only creative mobile demand-side platform (DSP).

Some definitions:

- Impressions: the number of times an ad is displayed.

- Click-through rate (CTR): CTR = #Clicks / #Impressions. In the following, CTR is calculated by hour.

- Demand-side platform (DSP): a platform that serves advertisers or ad agencies by automatically bidding for their campaigns on multiple ad networks.
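The hourly CTR defined above can be illustrated with a small sketch. The event format (a timestamp paired with an `"impression"` or `"click"` tag) and the function name `hourly_ctr` are assumptions made for this example; the poster does not describe the actual log format.

```python
from collections import defaultdict
from datetime import datetime


def hourly_ctr(events):
    """Return {hour: clicks / impressions} for every hour with at least one impression.

    `events` is an iterable of (timestamp, kind) pairs, where kind is
    "impression" or "click" (hypothetical format, for illustration only).
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for timestamp, kind in events:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        if kind == "impression":
            impressions[hour] += 1
        elif kind == "click":
            clicks[hour] += 1
    # Hours without impressions are skipped to avoid dividing by zero.
    return {hour: clicks[hour] / n for hour, n in impressions.items() if n > 0}


events = [
    (datetime(2018, 5, 24, 10, 5), "impression"),
    (datetime(2018, 5, 24, 10, 20), "impression"),
    (datetime(2018, 5, 24, 10, 21), "click"),
    (datetime(2018, 5, 24, 11, 0), "impression"),
]
ctr = hourly_ctr(events)
```

In this toy log, the 10:00 slot has two impressions and one click, and the 11:00 slot has one impression and no click.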

Data visualization

[Figure: CTR evolution over campaign diffusion, for two randomly chosen campaigns (one shown from 24 to 28 May, the other from 7 to 9 July). x-axis: time; y-axis: CTR.]

[Figure: Visualization of campaign durations. x-axis: campaign duration (in hours); y-axis: number of campaigns.]

Aims and objectives

Our aim: target the right person, at the right place and time, with the most relevant ad.

- Clustering of mobile campaigns

  Each campaign has its own KPI to optimize. It can be a performance objective (many clicks), a branding strategy (many impressions) or a more complicated goal that is not easy to handle through bid requests. The first part of the thesis therefore focuses on obtaining clusters of campaigns.

- Click prediction models for each cluster

  The clustering approach allows a specific type of model to be estimated for each cluster according to its own objective. We assume that the KPI to optimize and predict is known. The goal is to increase this KPI with an appropriate, scalable and innovative predictive model operating in real time.

Mathematical approach

Model definition

Impressions and clicks are aggregated by hour. We calculate the corresponding CTR or number of impressions for each time slot. Observations can be described as follows. For all

- $c = 1, \dots, C$ campaigns,

- $j = d_c, \dots, f_c$ days of campaign $c$,

- $h = 1, \dots, H$ time slots,

- $t = 1, \dots, T_{jh}$ repetitions of time slot $h$ during day $j$,

$$Y_{cjht} = \mu + \beta_h^H + \beta_s^S + \varepsilon_{cjht}$$

where $\mu$ is a constant, $\beta_h^H$ is the time slot effect, $\beta_s^S$ is the day-of-week effect ($s$ being the day of week of day $j$), and $\varepsilon_{cjht}$ is a Gaussian error. For identifiability, we impose the constraints $\beta_1^H = \beta_1^S = 0$.
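To make the identifiability constraints concrete, here is a minimal sketch of the additive model for a single group of observations, fitted by ordinary least squares. With $H = 5$ and $S = 7$ (the values used later in the poster), dropping the first level of each factor leaves $1 + 4 + 6 = 11$ free parameters. The coefficient values, the noise level and the helper name `design_row` are assumptions chosen for illustration, not the authors' settings.

```python
import numpy as np

H, S = 5, 7  # number of time slots and days of week


def design_row(h, s):
    """Dummy-coded row for time slot h (1..H) and day of week s (1..S).

    Level 1 of each factor is the baseline, which encodes the
    constraints beta_1^H = beta_1^S = 0.
    """
    x = np.zeros(1 + (H - 1) + (S - 1))
    x[0] = 1.0              # constant mu
    if h > 1:
        x[h - 1] = 1.0      # time slot effect, indices 1..H-1
    if s > 1:
        x[H + s - 2] = 1.0  # day-of-week effect, indices H..H+S-2
    return x


rng = np.random.default_rng(0)
# 20 repetitions of every (time slot, day of week) cell: 700 observations.
X = np.array([design_row(h, s)
              for s in range(1, S + 1)
              for h in range(1, H + 1)
              for _ in range(20)])
beta_true = rng.normal(size=X.shape[1])               # hypothetical coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=len(X))  # Gaussian error

# Least-squares estimate of (mu, beta^H_2..H, beta^S_2..S).
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
```

Because the baseline levels are absorbed into the constant, the design matrix has full column rank and the estimate is unique.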

Mixture model

We assume that the $C$ campaigns belong to $K$ groups:

$$Z_{kc} = \begin{cases} 1 & \text{if campaign } c \text{ belongs to cluster } k \\ 0 & \text{otherwise} \end{cases}$$

The mixture model can then be written as:

$$f(y_c; \beta, \sigma^2) = \sum_{k=1}^{K} \lambda_k \, f_k(y_c; \beta_k, \sigma_k^2)$$

where $f_k(y_c) = \prod_{j=d_c}^{f_c} \prod_{h=1}^{H} \prod_{t=1}^{T_{jh}} f_k(y_{cjht})$ and $P(Z_{kc} = 1) = \lambda_k$ is the probability that campaign $c$ belongs to cluster $k$. We used a classical Expectation-Maximisation (EM) [1] algorithm to estimate this mixture model.
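The following is a minimal sketch of an EM loop for such a mixture of Gaussian linear models, not the authors' implementation. It makes several simplifying assumptions that the poster does not state: all campaigns share the same design matrix `X`, the responsibilities are initialised by splitting campaigns into `K` groups by mean response (random restarts are a common alternative), and the loop runs for a fixed number of iterations. The function name `em_mixture_regression` is hypothetical.

```python
import numpy as np


def em_mixture_regression(X, Y, K, n_iter=50):
    """EM for a K-component mixture of Gaussian linear models.

    X: (n, p) design matrix shared by all campaigns (simplifying assumption).
    Y: (C, n) responses, one row per campaign.
    Returns (lam, betas, sigma2, tau, loglik).
    """
    C, n = Y.shape
    # Deterministic initialisation (assumption): rank campaigns by mean
    # response and cut the ranking into K groups of similar size.
    order = np.argsort(Y.mean(axis=1))
    init = np.empty(C, dtype=int)
    init[order] = np.arange(C) * K // C
    tau = np.eye(K)[init]                       # (C, K) responsibilities
    pinv_X = np.linalg.pinv(X)                  # (p, n), equals (X'X)^-1 X'
    for _ in range(n_iter):
        # M-step: proportions, weighted least squares, weighted variances.
        weights = tau.sum(axis=0) + 1e-12       # guards against empty clusters
        lam = weights / C
        y_bar = (tau.T @ Y) / weights[:, None]  # (K, n) weighted mean response
        betas = y_bar @ pinv_X.T                # (K, p)
        fitted = betas @ X.T                    # (K, n)
        resid2 = ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)  # (C, K)
        sigma2 = np.maximum((tau * resid2).sum(axis=0) / (n * weights), 1e-10)
        # E-step: posterior probabilities tau_kc, computed in log space.
        logp = (np.log(lam) - 0.5 * n * np.log(2 * np.pi * sigma2)
                - 0.5 * resid2 / sigma2)
        m = logp.max(axis=1, keepdims=True)
        lse = m + np.log(np.exp(logp - m).sum(axis=1, keepdims=True))
        tau = np.exp(logp - lse)
    return lam, betas, sigma2, tau, float(lse.sum())


# Toy data: two well-separated clusters of 20 campaigns each.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(10), np.arange(10) % 2])
true_betas = np.array([[0.0, 1.0], [5.0, -1.0]])   # hypothetical values
true_labels = np.repeat([0, 1], 20)
Y = true_betas[true_labels] @ X.T + rng.normal(scale=0.1, size=(40, 10))
lam, betas, sigma2, tau, loglik = em_mixture_regression(X, Y, K=2)
estimated_labels = tau.argmax(axis=1)
```

Each campaign is assigned as a whole, so the E-step sums the squared residuals over all of a campaign's observations before comparing clusters, which mirrors the product over $j$, $h$ and $t$ in the likelihood.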

Criteria for choosing the number of clusters

- Bayesian Information Criterion (BIC): $\mathrm{BIC}(K) = -2 \log L_n(Y; \hat{\beta}_k, \hat{\sigma}_k^2) + \#\text{parameters} \times \log N$, where $N$ is the number of observations.

- Integrated Classification Likelihood [2] (ICL), which penalizes the complete log-likelihood: $\mathrm{ICL}(K) = \mathrm{BIC}(K) - 2 \sum_{c=1}^{C} \sum_{k=1}^{K} \hat{\tau}_{kc} \log \hat{\tau}_{kc}$, where $\tau_{kc} = P(Z_{kc} = 1 \mid Y_c)$.

References

[1] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2006.

[2] C. Biernacki, G. Celeux, and G. Govaert, “Assessing a mixture model for clustering with the integrated classification likelihood,” 1998.
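The BIC and ICL criteria defined in this section can be computed from a fitted mixture as sketched below. The parameter count in `n_free_parameters` ($K$ coefficient vectors, $K$ variances and $K - 1$ free proportions) is an assumption about how #parameters is counted; the poster does not spell it out. All function names are hypothetical.

```python
import numpy as np


def n_free_parameters(K, p):
    """Assumed count: K * (p coefficients + 1 variance) + (K - 1) proportions."""
    return K * (p + 1) + (K - 1)


def bic(loglik, n_params, n_obs):
    """BIC(K) = -2 log L + #parameters * log N (lower is better in this form)."""
    return -2.0 * loglik + n_params * np.log(n_obs)


def icl(loglik, n_params, n_obs, tau):
    """ICL(K) = BIC(K) - 2 * sum_c sum_k tau_kc * log(tau_kc).

    tau: (C, K) posterior probabilities from the E-step. The sum is
    non-positive, so ICL >= BIC, with equality for 0/1 assignments.
    """
    tau = np.clip(tau, 1e-300, 1.0)  # treats 0 * log(0) as 0
    return bic(loglik, n_params, n_obs) - 2.0 * np.sum(tau * np.log(tau))


# Illustration with made-up numbers: 3 campaigns, 2 clusters.
hard = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
soft = np.full((3, 2), 0.5)
bic_value = bic(-100.0, 5, 50)
icl_hard = icl(-100.0, 5, 50, hard)   # equals bic_value
icl_soft = icl(-100.0, 5, 50, soft)   # larger: assignments are ambiguous
```

The extra term is an entropy penalty: the less clearly campaigns are assigned to clusters, the more ICL exceeds BIC, which is why ICL tends to favour well-separated partitions.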

Results of simulations

The objective of this design of experiments (DOE) is to evaluate the limits of our EM algorithm as the noise variance increases.

Simulation settings, for 700 campaigns whose CTR is simulated:

- Clusters are equidistributed.

- Beta values are estimated on real experiments with $H = 5$ time slots and $S = 7$ days of the week, so $\beta \in \mathbb{R}^{11}$. Their absolute values range from 0 to 18.13, with a median of 0.18.
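A simulation of this kind can be sketched as follows. The sketch assumes one observation per (time slot, day of week) cell for every campaign and draws the cluster coefficients at random; the poster instead reuses coefficients estimated on real data, which are not reproduced here. The names `design_matrix` and `simulate_campaigns` are hypothetical.

```python
import numpy as np

H, S = 5, 7                  # time slots and days of week (from the poster)
P = 1 + (H - 1) + (S - 1)    # 11 free parameters per cluster


def design_matrix():
    """One dummy-coded row per (day of week, time slot) cell: shape (35, 11)."""
    rows = []
    for s in range(S):
        for h in range(H):
            x = np.zeros(P)
            x[0] = 1.0               # constant
            if h > 0:
                x[h] = 1.0           # time slot effect (slot 1 is the baseline)
            if s > 0:
                x[H - 1 + s] = 1.0   # day-of-week effect (day 1 is the baseline)
            rows.append(x)
    return np.array(rows)


def simulate_campaigns(betas, sigma2, n_campaigns, seed=0):
    """Simulate responses for campaigns spread evenly over the clusters.

    betas: (K, P) cluster coefficients; sigma2: common noise variance.
    Returns (labels, Y) with Y of shape (n_campaigns, H * S).
    """
    rng = np.random.default_rng(seed)
    K = betas.shape[0]
    X = design_matrix()
    labels = np.arange(n_campaigns) % K   # equidistributed clusters
    noise = rng.normal(scale=np.sqrt(sigma2), size=(n_campaigns, X.shape[0]))
    return labels, betas[labels] @ X.T + noise


# Hypothetical coefficients; 0.24 is one of the variance levels in the poster.
example_betas = np.random.default_rng(42).normal(size=(4, P))
labels, Y = simulate_campaigns(example_betas, sigma2=0.24, n_campaigns=700)
```

Repeating the call with increasing `sigma2` and refitting the mixture each time would reproduce the structure of the experiment, with the noise level as the only factor that changes.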

BIC/ICL estimated number of clusters VS simulated number of clusters

[Figure: Comparison of estimated and simulated number of clusters when noise variance increases. Five pie-chart panels, one per noise level (median variance 0.24, 0.75, 1.37, 3.52 and 8.64), showing the simulated K (2 to 15) against the K estimated by ICL (2 to 15).]

As an example, take the second pie chart, whose median variance is 0.75:

- When the simulated number of clusters is K = 2, the estimated number of clusters is K = 2.

- When the simulated number of clusters is K = 13, the estimated number of clusters is K = 9.

Campaigns confusion matrices (K = 4), one per noise level. Each row is a group of campaigns whose size stays fixed across noise levels (117, 236, 236 and 111 campaigns), which is consistent with the simulated clusters; columns are the cluster labels 1 to 4.

Variance (median: 0.24)

      1     2     3     4
      8     0   109     0
      0     4     0   232
    227     0     9     0
      0    99     0    12

Variance (median: 0.75)

      1     2     3     4
     16     0   101     0
      0     8     0   228
    218     2    16     0
      0    92     0    19

Variance (median: 1.37)

      1     2     3     4
     31     0    86     0
      2     9     0   225
    214     0    18     4
      0    83     0    28

Variance (median: 3.52)

      1     2     3     4
     33     1    83     0
      6    12     0   218
    214     0    17     5
      1    72     0    38

Variance (median: 8.64)

      1     2     3     4
     40     0    76     1
     10     7     0   219
    209     0    20     7
      2    47     0    62
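One way to summarise such confusion matrices is the share of campaigns that fall on the best one-to-one pairing between the two partitions, since cluster labels are arbitrary. The sketch below applies this to the counts reported for the lowest and highest noise levels, read as four groups of four counts; the summary measure itself is an addition for illustration and does not appear in the poster.

```python
from itertools import permutations


def best_match_accuracy(matrix):
    """Share of items on the best one-to-one pairing of rows and columns.

    All permutations are tried, which is fine for small K (4! = 24 here).
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    best = max(sum(matrix[i][perm[i]] for i in range(k))
               for perm in permutations(range(k)))
    return best / total


# Counts for median variance 0.24 and 8.64, in groups of four.
low_noise = [[8, 0, 109, 0], [0, 4, 0, 232], [227, 0, 9, 0], [0, 99, 0, 12]]
high_noise = [[40, 0, 76, 1], [10, 7, 0, 219], [209, 0, 20, 7], [2, 47, 0, 62]]

accuracy_low = best_match_accuracy(low_noise)    # 667 / 700
accuracy_high = best_match_accuracy(high_noise)  # 551 / 700
```

Under this measure, about 95% of campaigns are matched at the lowest noise level and about 79% at the highest, which gives a single number for the degradation visible in the matrices. The result does not depend on which partition is placed on the rows.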

First Clustering result

The first results concern the CTR metric. We worked with 700 campaigns that started and ended between May 10th and July 10th. Our model included two temporal variables:

- day of week (cardinality S = 7)

- time of day, grouped into buckets (cardinality H = 5)

Optimal number of clusters by BIC/ICL criteria:

[Figure: BIC (left) and ICL (right) as functions of the number of clusters. x-axis: number of clusters (ticks at 4, 8, 12); y-axis: criterion value (80000 to 160000).]

Inferred profiles:

- Beta values are very different from one cluster to another.

- Cluster sizes also differ widely: clusters contain from 9 to 173 campaigns.

- Time slot and day-of-week effects appear to be significant.

[Figure: estimated beta values for each day of the week (Sunday to Saturday), one panel per cluster. Cluster sizes (nc): cluster 1: 120; cluster 2: 173; cluster 3: 123; cluster 4: 45; cluster 5: 35; cluster 6: 42; cluster 7: 16; cluster 8: 19; cluster 9: 33; cluster 10: 9; cluster 11: 37; cluster 12: 39; cluster 13: 9.]

Conclusions

- First results are encouraging: we obtain distinct cluster profiles with only a few temporal variables. Evaluating the algorithm, its limits and the statistical results is still in progress and is part of the challenges of this thesis. We should also work with domain experts to validate the relevance of the clusters.

- Context variables on ads (size, type) and devices (OS, model and so on) will enrich the model. Hopefully, these new variables and their interactions will lead to more homogeneous clusters.
