The Experiments - An Apriori Based Approach to Improve On-line Advertising Performance

3. The Experiments

The basic aim of our experiment is to exploit Apriori rules [9] derived from banner click logs in order to increase the CTR of some banners. In this section we first discuss the data we used and then the results of the experiment itself.

3.1. Data Collection

As already mentioned, data on the Internet are not a scarce resource. In our testing environment we collected more than 30 million log lines every day. Each line describes one of the two actions we are interested in: an impression or a click. For each action we record a substantial set of attributes, such as user-id, time, position on the site, and banner-id. All data collected in our logs are anonymous; that is, our system requires no personal or sensitive data of any form. Notice also that we do not need to know any graphical feature of the banners: we rely only on their unique ids, without considering size, shape, colors, category, etc.

For our experiment we took a sample of about 60 million log lines collected over a one-month period. This sample includes only click logs, as impression data are not relevant for deriving association rules. These logs are produced at run time by the system's numerous front-ends and are divided into files with an equal number of lines. The files are passed to the backend system at nighttime, where a custom application produces the files ready to be processed by Apriori.
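The preprocessing step that turns click logs into Apriori input could look roughly like the sketch below. The log layout (user-id, time, position, banner-id as comma-separated fields), the sample rows, and the one-transaction-per-line output are all assumptions made for illustration; the text does not specify the actual file formats or the behavior of the custom backend application.

```python
import csv
import io
from collections import defaultdict

# Hypothetical click-log rows: user_id,time,position,banner_id
SAMPLE_LOG = """\
u1,1,top,31
u1,2,side,189
u2,3,top,31
u2,4,top,31
u3,5,side,75
"""


def build_transactions(log_file):
    """Group clicked banner ids by user: one set of banner ids per user."""
    baskets = defaultdict(set)
    for row in csv.reader(log_file):
        if not row:  # skip blank lines
            continue
        user_id, _time, _position, banner_id = row
        baskets[user_id].add(banner_id)
    return dict(baskets)


def to_apriori_lines(transactions):
    """Emit one line per user, with banner ids sorted and space-separated."""
    return [
        " ".join(sorted(items, key=int))
        for _user, items in sorted(transactions.items())
    ]


transactions = build_transactions(io.StringIO(SAMPLE_LOG))
lines = to_apriori_lines(transactions)
print(lines)
```

Using a set per user discards repeated clicks on the same banner, which matches the usual market-basket view in which an item is either present in a transaction or not.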

3.2. Results

By running Apriori [9] on our data set we obtain a set of association rules in which every item represents a banner clicked by users [10]. We refer to the click-through rate of banner A as CTR(A), and we use the notation CTR(A | B) to denote the click-through rate of banner A among all users who have previously clicked on banner B.
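To make the rule-discovery step concrete, here is a minimal sketch restricted to rules between pairs of banners. It is not the Apriori implementation of [9]: the function name, the use of an absolute count for the minimum support, the confidence threshold, and the toy transactions are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support_count, confidence) tuples
    for rules between pairs of banner ids."""
    item_counts = Counter(item for t in transactions for item in set(t))
    # Apriori pruning: a pair can only be frequent if both items are frequent.
    frequent = {i for i, c in item_counts.items() if c >= min_support}

    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent)
        pair_counts.update(combinations(kept, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        if count < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, count, confidence))
    return sorted(rules)


# Toy transactions: each set holds the banner ids clicked by one user.
toy_transactions = [{31, 189}, {31, 189}, {31, 189, 75}, {31}, {75}, {189, 75}]
rules = mine_pair_rules(toy_transactions, min_support=3, min_confidence=0.7)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support}, confidence={confidence:.2f}")
```

A rule such as 31 -> 189 is then read as "users who clicked on banner 31 are good candidates for being shown banner 189".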

We now show the results of applying our system to the discovered rule 31 → 189, where 31 and 189 are two different banner ids. In our experiment we considered only high-performing banners in order to have a larger data set and make our analysis more robust. The click-through rate of banner 189 computed among all users during the considered time period is 3.47%. This value was roughly constant on every day of the month-long period we considered. Table 2 shows the day-by-day results of our experiment.

Table 2. Day-by-day lift detail by applying a specific association rule

Day  Impr(31)  Impr(189|31)  Clicks(189|31)  CTR(189|31)  Lift
1    610785    2             0               0.0%         0.00
8    542609    659           82              12.4%        3.58
9    369882    1142          129             11.3%        3.25
10   419101    90            11              12.2%        3.52
11   313450    833           102             12.2%        3.53
12   137743    1113          128             11.5%        3.31
13   130238    1027          88              8.6%         2.47
14   225958    1154          107             9.3%         2.67
15   256673    1077          112             10.4%        2.99
16   331203    1022          84              8.2%         2.37
17   358906    1013          92              9.1%         2.62
18   273730    904           73              8.1%         2.33
19   153346    850           71              8.4%         2.41
20   171469    912           97              10.6%        3.06
21   343501    1450          120             8.3%         2.38
22   224614    1367          125             9.1%         2.66
23   246541    1266          112             8.8%         2.55
24   163615    1124          95              8.5%         2.43
25   117600    922           82              8.9%         2.56
26   47568     718           71              9.9%         2.85
27   51261     825           65              7.9%         2.27
28   117272    924           100             10.8%        3.12
29   119402    817           60              7.3%         2.12
30   106071    736           67              9.1%         2.62
31   7280      32            1               3.1%         0.90

The first column gives the day of the month. The second column reports the number of impressions of banner 31 on each day. The third and fourth columns contain, respectively, the number of impressions and the number of clicks of banner 189 among users who previously clicked on banner 31. The fifth column gives the click-through rate of banner 189 computed only among users who previously clicked on banner 31.

The last column reports the lift, computed as the ratio between the CTR in the fifth column, that is, the CTR of banner 189 among users who previously clicked on banner 31, and the CTR of banner 189 computed among all users (3.47%).
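The lift computation can be checked against a few rows of Table 2 with the short sketch below. It takes the 3.47% CTR of banner 189 reported earlier as the denominator; the function names are illustrative, and the results agree with the Lift column only up to rounding.

```python
BASELINE_CTR_189 = 0.0347  # CTR(189) among all users, as reported in the text

# (day, Impr(189|31), Clicks(189|31)) copied from three rows of Table 2
ROWS = [(8, 659, 82), (13, 1027, 88), (31, 32, 1)]


def ctr(clicks, impressions):
    """Click-through rate; defined as 0.0 when there are no impressions."""
    return clicks / impressions if impressions else 0.0


def lift(clicks, impressions, baseline_ctr=BASELINE_CTR_189):
    """Ratio between the conditional CTR and the baseline CTR."""
    return ctr(clicks, impressions) / baseline_ctr


for day, impressions, clicks in ROWS:
    print(
        f"day {day}: CTR(189|31)={ctr(clicks, impressions):.1%} "
        f"lift={lift(clicks, impressions):.2f}"
    )
```

A lift above 1 means that users who previously clicked on banner 31 click on banner 189 more often than the average user does; a lift below 1, as on day 31, means the opposite.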

During the first seven days of the month we intentionally did not force the application of the rules; thus, the small values reported in the third column occur only by chance, that is, whenever banner 189 happens to be shown to a user who previously clicked on banner 31.

For this specific rule, the CTR lift of banner 189 over the entire period was 2.76, which means that by applying the rule we increase the probability of a user clicking on banner 189 by a factor of 2.76. This is quite an impressive result considering how simple the model is.

We now analyze the performance of five banners optimized through our approach on another site. In Table 3 we report the lift performance for the entire period (fifteen days) of application of our discovered rules.

Table 3. Lift detail for five banners optimized by means of Apriori

Banner id  Optimized CTR  Non-optimized CTR  Lift
52976      0.151%         0.073%             2.06
75436      1.195%         0.869%             1.38
64931      8.025%         1.634%             4.91
53586      2.759%         1.618%             1.71
75926      4.498%         0.938%             4.80

The first column is the banner id. The second and third columns report the banner CTR computed, respectively, with and without our optimization. The fourth column reports the lift. Notice that there is an improvement in all cases. Moreover, the improvement is substantial in some cases, with the optimized CTR reaching 491% (third row) and 480% (last row) of the non-optimized value. In the worst case the CTR improves by 38% (second row), which is still a very respectable performance.
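The relation between the lift and the percentage improvement can be verified directly from the CTR values of Table 3, as in the sketch below. The function names are illustrative; a lift of L corresponds to an optimized CTR equal to L × 100% of the non-optimized one, that is, to an improvement of (L − 1) × 100%.

```python
# banner id -> (optimized CTR in %, non-optimized CTR in %), from Table 3
TABLE_3 = {
    52976: (0.151, 0.073),
    75436: (1.195, 0.869),
    64931: (8.025, 1.634),
    53586: (2.759, 1.618),
    75926: (4.498, 0.938),
}


def ctr_ratio(optimized, baseline):
    """Lift: optimized CTR divided by non-optimized CTR."""
    return optimized / baseline


def improvement_percent(optimized, baseline):
    """Relative CTR gain over the non-optimized value, in percent."""
    return (ctr_ratio(optimized, baseline) - 1.0) * 100.0


for banner_id, (optimized, baseline) in TABLE_3.items():
    print(
        f"{banner_id}: lift={ctr_ratio(optimized, baseline):.2f} "
        f"improvement={improvement_percent(optimized, baseline):.0f}%"
    )
```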

We now analyze the overall improvement we were able to achieve using the proposed system. In Figure 2 we show the overall click improvement for different minimum support thresholds. In all cases we obtained a lift compared to the non-optimized algorithm (column “real” in the figure). It is interesting to notice that decreasing the Apriori minimum support yields better overall performance. The improvement flattens out at a minimum support of about 50. At that setting we improved the overall system performance, measured by the total number of clicks produced, by 45%, which we consider an outstanding result.

Figure 2. Overall improvement in the number of clicks obtained by varying minimum support.

Figure 3. Overall improvement in revenue obtained by varying minimum support.

Similarly, in Figure 3 we present the overall improvement in revenue obtained by varying the Apriori minimum support. This chart was obtained by multiplying the revenue per banner by the number of additional clicks generated. It follows a trend very similar to that of Figure 2, as clicks and revenue are correlated.
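The revenue computation described above amounts to a weighted sum over banners, as in the sketch below. It interprets "revenue per banner" as revenue per click on that banner, which is an assumption, and every figure in the example is hypothetical and not taken from the experiment.

```python
def additional_revenue(per_banner):
    """Sum, over banners, of revenue per click times additional clicks.

    per_banner: iterable of (revenue_per_click, baseline_clicks, optimized_clicks).
    """
    return sum(
        revenue_per_click * (optimized_clicks - baseline_clicks)
        for revenue_per_click, baseline_clicks, optimized_clicks in per_banner
    )


# Hypothetical figures for two banners, for illustration only.
example = [
    (0.10, 1000, 1450),  # 450 additional clicks at 0.10 per click
    (0.25, 400, 500),    # 100 additional clicks at 0.25 per click
]
total = additional_revenue(example)
print(f"additional revenue: {total:.2f}")
```

Because each banner's additional clicks are scaled by a constant, the revenue curve is expected to track the click curve closely, which is consistent with the similarity between the two figures.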

From the document APPLICATIONS OF DATA MINING IN E-BUSINESS AND FINANCE (pages 68-71)