Professional Documents
Culture Documents
An Efficient Algorithm For Allocation of Guaranteed Display Advertising
An Efficient Algorithm For Allocation of Guaranteed Display Advertising
cn54@yahoo-inc.com
Erik Vee Jian Yang
Yahoo!Labs Yahoo!Labs
erikvee@yahoo-inc.com jianyang@yahoo-inc.com
plan; collect the delivery stats so far; then re-optimize for the rest
1.00%
of the duration using the updated stats. Note that the two-hours
corresponds to two hours of serving logs. Our actual simulation is 0.80% UnderDelivery
Penalty Cost
somewhat faster due to the downsampling.
0.60%
12.00%
smooth with respect to time. For example, a 7 day contract with 4.00%
duration. -2.00%
-4.00%
4.3.4 Results
Figures 3, 4 and 5 show that the under-delivery and penalty cost
for HWM (SHALE with 0 iterations) algorithm is the worst. Fur-
ther, as the number of SHALE iterations increase it gets very close Figure 4: Dataset 2: Under Delivery and Penalty Cost Compar-
to the DUAL algorithm. Note that even SHALE with 5 or 10 itera- ison
tions performs as well or sometimes slightly better than the DUAL
algorithm. This can be attributed to different reasons; one being
the fact that there are forecasting errors intrinsic to using real serv- ployed to ensure good pacing. For these experiments, we have re-
ing logs. Another contributing factor is the fact that the DUAL moved those modifications to give a clearer picture of how the base
algorithm does not directly optimize for either of these metrics. In algorithms perform.
addition, Stage Two attempts to fulfill the delivery of every con-
tract, even if it is not optimal according to the objective function.
4.4 Experiment 3
This heuristic aspect of SHALE actually appears to aid in its per- Superficially, HWM and SHALE both perform well. In this ex-
formance when judged by simple metrics like delivery. periment, we do a more detailed simulation to compare HWM and
Figure 6 shows how these algorithm perform with respect to pac- SHALE. We fix the iteration count for SHALE at 20 and test its
ing. The pacing is similar for all three datasets for SHALE with 5, performance under varying supply levels. Specifically, for each of
10 and 20 iterations when compared with the DUAL algorithm. our 6 contract sets, we artificially reduced the supply weight on
Surprisingly, HWM has better pacing than SHALE and DUAL for each of the supply nodes while keeping the graph structure fixed in
two of the datasets. One possible reason for this is that SHALE order to simulate the increasing scarcity of supply. We define the
and DUAL algorithm gives better under-delivery and penalty cost, average supply contention (ASC) metric to represent the scarcity of
compromising some pacing. Note that the time dimension is just supply, as follows
one of the many dimensions that the representativeness portion of P P
dj
the objective function. This may also be an artifact of forecasting i si j∈i Sj
ASC = (6)
errors. In real systems, certain additional modifications are em-
P
i si
Dataset 3
5.00%
4.00%
3.00%
Percentage increase over DUAL
2.00%
1.00%
UnderDelivery
Penalty Cost
0.00%
HWM SHALE_5 SHALE_10 SHALE_20
-1.00%
-2.00%
-3.00%
Figure 7: SHALE Vs. HWM
-4.00%
30.00% to other classes of problems involving many users with supply con-
straints (e.g. each user is shown only one item). Thus, although
Dataset 1
20.00% Dataset 2 SHALE is particularly well-suited to optimizing guaranteed display
Dataset 3
ad delivery, it is also an effective lightweight optimizer. It can han-
10.00%
dle huge, memory-intensive inputs, and the underlying techniques
we use provide a useful method of mapping non-optimal dual solu-
tions into nearly optimal primal results.
0.00%
HWM SHALE_5 SHALE_10 SHALE_20
-10.00%
6. ADDITIONAL AUTHORS
7. REFERENCES
[1] Y. Chen, P. Berkhin, B. Anderson, and N. R. Devanur.
Figure 6: Pacing Comparisons on all three datasets
Real-time bidding algorithms for performance-based display
ad allocation. In C. Apté, J. Ghosh, and P. Smyth, editors,
KDD, pages 1307–1315. ACM, 2011.
where si represents the supply weight and dj and Sj represent the
[2] N. R. Devenur and T. P. Hayes. The adwords problem: online
demand and eligible supply for contract j. In Figure 7, we show
keyword matching with budgeted bidders under random
the under-delivery rate, penalty cost and L2 distance for SHALE
permutations. In J. Chuang, L. Fortnow, and P. Pu, editors,
as a percentage of the corresponding metric for HWM for various
ACM Conference on Electronic Commerce, pages 71–78.
levels of ASC. First we note that each of our metrics for SHALE
ACM, 2009.
is better than the corresponding metric for HWM for all values of
ASC. Indeed, the SHALE L2 distance is less than 50% of that for [3] J. Feldman, M. Henzinger, N. Korula, V. S. Mirrokni, and
HWM. Also note that the SHALE penalty cost consistently im- C. Stein. Online stochastic packing applied to display ad
proves compared to HWM as the ASC increases. This indicates allocation. In M. de Berg and U. Meyer, editors, ESA (1),
that even though HWM appears to have better pacing for some volume 6346 of Lecture Notes in Computer Science, pages
data sets, SHALE is still a more robust algorithm and is likely pre- 182–194. Springer, 2010.
ferrable in most situations. (Indeed, we see very consistently that [4] J. Feldman, A. Mehta, V. S. Mirrokni, and S. Muthukrishnan.
its under-delivery penalty and revenue are both clearly better.) Online stochastic matching: Beating 1-1/e. In FOCS, pages
117–126. IEEE Computer Society, 2009.
[5] A. Ghosh, P. McAfee, K. Papineni, and S. Vassilvitskii.
5. CONCLUSION Bidding for representative allocations for display advertising.
We described the SHALE algorithm, which is used to generate In S. Leonardi, editor, WINE, volume 5929 of Lecture Notes
compact allocation plans leading to near-optimal serving. Our al- in Computer Science, pages 208–219. Springer, 2009.
[6] V. S. Mirrokni, S. O. Gharan, and M. Zadimoghaddam. The second stationarity condition shows αj = pj − ψj . Since
Simultaneous approximations for adversarial and stochastic ψj ≥ 0, this immediately shows that αj ≤ pj . Further, the com-
online budgeted allocation. In D. Randall, editor, SODA, plementary slackness condition for P ψj implies that ψ = 0 unless
pages 1690–1701. SIAM, 2012. uj = 0. That is, either αj = pj or i∈Γ(j) si xij ≥ dj . By com-
[7] E. Vee, S. Vassilvitskii, and J. Shanmugasundaram. Optimal plementary
P slackness of αj , we see in fact that equality must hold
online assignment with forecasts. In D. C. Parkes, (i.e. i∈Γ(j) si xij =
Pj
d ) unless αj = 0. But when αj = 0, in-
P
C. Dellarocas, and M. Tennenholtz, editors, ACM Conference spection reveals that i∈Γ(j) si xij = i∈Γ(j) si gij (−βi ) ≤ dj .
on Electronic Commerce, pages 109–118. ACM, 2010. Hence, even when αj = 0, equality must hold for an optimal αj .
[8] J. Yang, E. Vee, S. Vassilvitskii, J. Tomlin, Finally, the complementary
P slackness condition on βi implies
J. Shanmugasundaram, T. Anastasakos, and O. Kennedy. either βi = 0 or j∈Γ(i) xij = 1. Putting all of this together, we
Inventory allocation for online graphical display advertising. see that
CoRR, abs/1008.3551, 2010. 1. The optimal primal solution is given by x∗ij = gij (αj∗ − βi∗ ),
where gij (z) = max{0, θij (1 + z/Vj )}.
Appendix 2. For all j, 0 ≤ αj∗ ≤ pj . Further, either αj∗ = pj or i∈Γ(j) si x∗ij =
P
Stationarity:
X αjt − βit
= si max{0, θij (1 + )}
Vj
Vj i∈Γ(j)
For all i, j, si (xij − θij ) − si αj + si βi − γij
θij Further, by the way in which αt+1 is calculated (in Stage One,
For all i, pj − αj − ψj = 0 Step 2), we have that αjt+1 must either be pj or satisfy the follow-
ing:
Complementary slackness: X
P dj = si gij (αjt+1 − βit )
For all j, either αj = 0 or i∈Γ(j) si xij + uj = dj . i∈Γ(j)
P
For all i, either βi = 0 or j∈Γ(i) si xij = si . X αjt+1 − βit
= si max{0, θij (1 + )}
For all i, j, either γij = 0 or xij = 0. Vj
i∈Γ(j)
For all j, either ψj = 0 or uj = 0.
Using the fact that for any numbers a ≥ b that max{0, a} −
The dual feasibity conditions also tell us that αj ≥ 0, βi ≥ 0, max{0, b} ≤ a − b (which can be shown by an easy case anal-
γij ≥ 0, and ψj ≥ 0 for all i, j. (While the primal feasibility ysis), we have
conditions tell us that the constraints in the original problem must
be satified.) Since our objective is convex, and primal-dual solution
X αjt+1 − βit
dj − dj (αt ) = si max{0, θij (1 + )}
satisfying the KKT conditions is in fact optimal. Vj
i∈Γ(j)
Notice that the stationarity conditions are effectively like taking
the derivative of the Lagrangian. The first of these tells us that
X αjt − βit
− si max{0, θij (1 + )}
Vj
i∈Γ(j)
αj − βi + γij /si
xij = θij (1 + ) X
Vj ≤ si θij (αjt+1 − αjt )/Vj
i∈Γ(j)
The complementary slackness condition for the γij tells us that
γij = 0 unless xij = 0. This has the effect that when the ex- = dj (αjt+1 − αjt )/Vj
α −β
pression θij (1 + jVj i ) is negative, γij will increase just enough
That is, either αjt+1 = pj or
to make xij = 0. In particular, this implies
αj − βi dj (αt )
xij = max{0, θij (1 + )} = gij (αj − βi ) αjt+1 = αjt + Vj (1 − ) (10)
Vj dj
Since dj (αt ) ≤ dj by assumption, this shows that αjt+1 ≥ αjt for
each j. We must still prove that dj (αt+1 ) ≤ dj . To this end, note
that the β t+1 generated in Step 1 for the given αt+1 must greater
than or equal to β t , since αt+1 ≥ αt . That is, βit+1 ≥ βit for all i.
Thus,
X αjt+1 − βit+1
dj (αt+1 ) = si max{0, θij (1 + )}
Vj
i∈Γ(j)
X αjt+1 − βit
≤ si max{0, θij (1 + )}
Vj
i∈Γ(j)
= dj
as we wanted.
In general, we can use the fact that dj (αt ) ≤ dj for all t, together
with Equation 10, to see that the αj values are non-decreasing at
each iteration. From this (together with the fact that αj is bounded
by pj ), it immediately follows that the algorithm converges.
To see that the algorithm converges to the optimal solution, we
note that the dual values generated by SHALE satisfy the KKT
conditions at convergence: for all j, either αj = pj or dj (α) = dj
(i.e. pj − αj − ψj = 0 with either ψj = 0 or uj = 0), with similar
arguments holding for the other duals. Since the problem we study
is convex, this shows that the primal solution generated must be the
optimal.
As for our second claim, suppose that there is some j for which
αjt 6= pj but dj (αjt ) ≤ (1 − ε)dj . Then by Equation 10, we see
dj (αt )
αjt+1 = αjt + Vj (1 − ) ≥ αjt + Vj ε
dj
That is, αjt+1 increases (over αjt ) by at least εVj . Since αjt starts
at 0, is bounded by pj , and never decreases, we see that this can
happen at most pj /(εVj ) times for each j. In this worst case, this
happens for every j, giving us the bound we claim.