A SUPPLEMENTARY MATERIAL

A.1 Derivative of RiskLoss


Having presented the experimental evaluation and the intuition behind our RiskLoss function, we now give a more formal explanation.
Following the description in Section 3, Equation 1 gives the gradient of the RiskLoss function. The equation shows the partial derivative of RiskLoss with respect to the predicted score $s_{doc}$ of a document $doc$, where $s_{doc}$ is a function of the network parameters. The term $\frac{\partial \mathcal{L}_{iq}}{\partial s_{doc}}$ is the derivative of the continuous function $\mathcal{L}$ (CorrScore or SpearmanLoss) with respect to the predicted value $s_{doc}$. Moreover, $\frac{\partial \mathrm{RiskLoss}}{\partial \mathcal{L}_{iq}}$ is the derivative of the RiskLoss function with respect to $\mathcal{L}$ for the document query.

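\[
\frac{\partial \mathrm{RiskLoss}}{\partial s_{doc}} = \frac{\partial \mathrm{RiskLoss}}{\partial \mathcal{L}_{iq}} \cdot \frac{\partial \mathcal{L}_{iq}}{\partial s_{doc}} \tag{1}
\]
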
Calculating the term $\frac{\partial \mathrm{RiskLoss}}{\partial \mathcal{L}_{iq}}$ by substituting the RiskLoss function into Equation 1, and using $y$ to represent the index of the ground-truth system (Y), we obtain Equation 2.

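\[
\begin{aligned}
\frac{\partial \mathrm{RiskLoss}}{\partial \mathcal{L}_{iq}} &= \frac{\partial \left( \mathrm{Continuous\_GeoRisk}(NP) - \mathrm{Continuous\_GeoRisk}(Y) \right)^{2}}{\partial \mathcal{L}_{iq}} \\
\frac{\partial \mathrm{RiskLoss}}{\partial \mathcal{L}_{iq}} &= 2\, \mathrm{Continuous\_GeoRisk}(NP) \left( \frac{\partial\, \mathrm{Continuous\_GeoRisk}(NP)}{\partial \mathcal{L}_{iq}} - \frac{\partial\, \mathrm{Continuous\_GeoRisk}(Y)}{\partial \mathcal{L}_{iq}} \right) \\
\frac{\partial \mathrm{RiskLoss}}{\partial \mathcal{L}_{iq}} &= \frac{\partial}{\partial \mathcal{L}_{iq}} \sqrt{ \left( \frac{1}{|Q|} \sum_{q=1}^{|Q|} M_{iq} \right) \cdot \Phi\left( \frac{\mathrm{ZRisk}(i)}{|Q|} \right) } - \frac{\partial}{\partial \mathcal{L}_{iq}} \sqrt{ \left( \frac{1}{|Q|} \sum_{q=1}^{|Q|} M_{yq} \right) \cdot \Phi\left( \frac{\mathrm{ZRisk}(y)}{|Q|} \right) }
\end{aligned} \tag{2}
\]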

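Moreover, consider the terms $G(S_i)$ and $G(y)$ as defined in Equation 3:

\[
\begin{aligned}
G(S_i) &= \left( \frac{1}{|Q|} \sum_{q=1}^{|Q|} M_{iq} \right) \cdot \Phi\left( \frac{\mathrm{ZRisk}(i)}{|Q|} \right) \\
&\text{and} \\
G(y) &= \left( \frac{1}{|Q|} \sum_{q=1}^{|Q|} M_{yq} \right) \cdot \Phi\left( \frac{\mathrm{ZRisk}(y)}{|Q|} \right)
\end{aligned} \tag{3}
\]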
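
As an illustration only, a term of the form in Equation 3 can be evaluated numerically as in the minimal Python sketch below. The function names `std_normal_cdf` and `g_term` and the sample values are hypothetical, and the ZRisk value is assumed to have been computed beforehand, since its definition is not repeated in this appendix.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF (mean 0, standard deviation 1) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g_term(per_query_values, zrisk: float) -> float:
    """(mean of the per-query values) * Phi(zrisk / |Q|), the form of Equation 3.

    `zrisk` is assumed to be precomputed; this sketch does not define ZRisk.
    """
    n_queries = len(per_query_values)
    mean_value = sum(per_query_values) / n_queries
    return mean_value * std_normal_cdf(zrisk / n_queries)


# Hypothetical per-query values M_iq for one system over |Q| = 4 queries.
m_i = [0.5, 0.75, 0.25, 0.5]

# With zrisk = 0 the CDF equals 0.5, so G is half of the mean value.
g_zero = g_term(m_i, 0.0)
```

Under these assumptions, a larger ZRisk value yields a larger G for the same per-query values, because the standard normal CDF is increasing.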

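One can apply the chain rule in Equation 2, obtaining:

\[
\frac{\partial \mathrm{RiskLoss}}{\partial \mathcal{L}_{iq}} = \left( \frac{1}{2\sqrt{G(S_i)}} \cdot \frac{\partial G(S_i)}{\partial \mathcal{L}_{iq}} \right) - \left( \frac{1}{2\sqrt{G(y)}} \cdot \frac{\partial G(y)}{\partial \mathcal{L}_{iq}} \right) \tag{4}
\]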
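
The outer factor $\frac{1}{2\sqrt{G}}$ in Equation 4 is the derivative of the square root with respect to its argument. The short Python sketch below checks this factor against a central finite difference; the helper names `sqrt_g_slope` and `central_difference` and the value of `g` are hypothetical and chosen only for illustration.

```python
import math


def sqrt_g_slope(g: float) -> float:
    """Analytic outer derivative used in Equation 4: d sqrt(G) / dG."""
    return 1.0 / (2.0 * math.sqrt(g))


def central_difference(f, x: float, h: float = 1e-6) -> float:
    """Numerical derivative of f at x using a symmetric difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


g = 0.25  # hypothetical value of G
analytic = sqrt_g_slope(g)
numeric = central_difference(math.sqrt, g)
```

The two values agree up to finite-difference error, which is the kind of check one might run before relying on a hand-derived gradient.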
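Calculating the term $\frac{\partial G(S_i)}{\partial \mathcal{L}_{iq}}$, we get:

\[
\frac{\partial G(S_i)}{\partial \mathcal{L}_{iq}} = \left( \frac{1}{|Q|} \mathcal{L}_{iq} \right) \cdot \Phi\left( \mathrm{ZRisk}(i)/|Q| \right) + \left( \frac{1}{|Q|} \sum_{q=1}^{|Q|} M_{iq} \right) \cdot \frac{\partial}{\partial \mathcal{L}_{iq}} \Phi\left( \mathrm{ZRisk}(i)/|Q| \right) \tag{5}
\]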

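Replacing $\Phi$ by the cumulative distribution function of the normal distribution with mean 0 (zero) and standard deviation 1 (one) (as described by [?]), and applying the chain rule again, we obtain the term $\frac{\partial}{\partial \mathcal{L}_{iq}} \Phi\left( \mathrm{ZRisk}(i)/|Q| \right)$ in Equation 6:

\[
\frac{\partial}{\partial \mathcal{L}_{iq}} \Phi\left( \mathrm{ZRisk}(i)/|Q| \right) = \frac{-\mathrm{ZRisk}(i)/|Q|}{\sqrt{2\pi}} \cdot \exp\left( -\frac{1}{2} \cdot \left( \mathrm{ZRisk}(i)/|Q| \right)^{2} \right) \cdot (1 + \alpha) \cdot \frac{2e_{iq} - \mathcal{L}_{iq} - e_{iq} \cdot \left( \frac{N|Q| + N - |Q|}{N^{2}} \right)}{2e_{iq}^{3/2}} \tag{6}
\]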
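Calculating the term $\frac{\partial G(y)}{\partial \mathcal{L}_{iq}}$, we get:

\[
\frac{\partial G(y)}{\partial \mathcal{L}_{iq}} = \left( \frac{1}{|Q|} \sum_{q=1}^{|Q|} M_{yq} \right) \cdot \frac{\partial}{\partial \mathcal{L}_{iq}} \Phi\left( \mathrm{ZRisk}(y)/|Q| \right) \tag{7}
\]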

WSDM’22, February 21-25, 2022, Phoenix, USA Anon.

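Replacing $\Phi$ again by the cumulative distribution function of the normal distribution with mean 0 and standard deviation 1, and applying the chain rule, we obtain the term $\frac{\partial}{\partial \mathcal{L}_{iq}} \Phi\left( \mathrm{ZRisk}(y)/|Q| \right)$ in Equation 8:

\[
\frac{\partial}{\partial \mathcal{L}_{iq}} \Phi\left( \mathrm{ZRisk}(y)/|Q| \right) = \frac{-\mathrm{ZRisk}(y)/|Q|}{\sqrt{2\pi}} \cdot \exp\left( -\frac{1}{2} \cdot \left( \mathrm{ZRisk}(y)/|Q| \right)^{2} \right) \cdot (1 + \alpha) \cdot \frac{2e_{iq} - e_{iq} \cdot \left( \frac{N|Q| + N - |Q|}{N^{2}} \right)}{2e_{iq}^{3/2}} \tag{8}
\]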

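Using the previously calculated terms and substituting them into Equation 1, one can finally obtain the derivative of the RiskLoss function in Equation 9.

\[
\begin{aligned}
\frac{\partial \mathrm{RiskLoss}}{\partial s_{doc}} = \Bigg[ \; & \frac{1}{2\sqrt{G(S_i)}} \cdot \Bigg( \left( \frac{1}{|Q|} \mathcal{L}_{iq} \right) \cdot \Phi\left( \mathrm{ZRisk}(i)/|Q| \right) + \left( \frac{1}{|Q|} \sum_{q=1}^{|Q|} M_{iq} \right) \cdot \frac{-\mathrm{ZRisk}(i)/|Q|}{\sqrt{2\pi}} \cdot \exp\left( -\frac{1}{2} \cdot \left( \mathrm{ZRisk}(i)/|Q| \right)^{2} \right) \\
& \qquad \cdot (1 + \alpha) \cdot \frac{2e_{iq} - \mathcal{L}_{iq} - e_{iq} \cdot \left( \frac{N|Q| + N - |Q|}{N^{2}} \right)}{2e_{iq}^{3/2}} \Bigg) \\
& - \frac{1}{2\sqrt{G(y)}} \cdot \left( \frac{1}{|Q|} \sum_{q=1}^{|Q|} M_{yq} \right) \cdot \frac{-\mathrm{ZRisk}(y)/|Q|}{\sqrt{2\pi}} \cdot \exp\left( -\frac{1}{2} \cdot \left( \mathrm{ZRisk}(y)/|Q| \right)^{2} \right) \\
& \qquad \cdot (1 + \alpha) \cdot \frac{2e_{iq} - e_{iq} \cdot \left( \frac{N|Q| + N - |Q|}{N^{2}} \right)}{2e_{iq}^{3/2}} \; \Bigg] \cdot \frac{\partial \mathcal{L}_{iq}}{\partial s_{doc}}
\end{aligned} \tag{9}
\]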

Equation 9 thus gives the derivative of RiskLoss with respect to the predicted relevance $s_{doc}$ of a document (a function of the network parameters), where $\mathcal{L}$ is a continuous loss function, such as the well-known ListNet, NDCGLoss, and SpearmanLoss.
Since our function is differentiable, it can be used in back-propagation algorithms to adjust the network weights and minimize the loss.
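
To make the role of differentiability concrete, the following Python sketch assembles a gradient in the pattern of Equation 1 (outer derivative times inner derivative) and uses it in plain gradient descent. It is a toy under stated assumptions: `inner_loss` and `outer` are hypothetical smooth stand-ins, not the RiskLoss or the $\mathcal{L}$ functions of this paper, and `TARGET`, the step size, and the iteration count are arbitrary illustrative choices.

```python
import math

TARGET = 0.64  # hypothetical target value for the toy outer function


def inner_loss(s: float) -> float:
    """Toy stand-in for L_iq as a smooth function of the score s_doc."""
    return 1.0 / (1.0 + math.exp(-s))


def inner_grad(s: float) -> float:
    """Derivative of the toy inner function with respect to s."""
    value = inner_loss(s)
    return value * (1.0 - value)


def outer(loss_value: float) -> float:
    """Toy stand-in for an outer loss built from a square root, as a function of L."""
    return (math.sqrt(loss_value) - math.sqrt(TARGET)) ** 2


def outer_grad(loss_value: float) -> float:
    """Derivative of `outer`: 2 (sqrt(L) - c) * 1 / (2 sqrt(L))."""
    return (math.sqrt(loss_value) - math.sqrt(TARGET)) / math.sqrt(loss_value)


def composite_grad(s: float) -> float:
    """Chain rule in the pattern of Equation 1: (d outer / dL) * (dL / ds)."""
    return outer_grad(inner_loss(s)) * inner_grad(s)


s = 0.0
losses = []
for _ in range(300):
    losses.append(outer(inner_loss(s)))
    s -= 2.0 * composite_grad(s)  # plain gradient descent step
final_loss = outer(inner_loss(s))
```

In a neural ranker, the update on `s` would instead be propagated further back to the network weights, which is what an automatic differentiation framework does once every factor in the chain is differentiable.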
