SVD++, male users receive less miscalibrated (i.e., more calibrated) recommendations than females. Lower miscalibration for females than males on SVD++ is an interesting result in our experiments that needs further investigation.
Figure 1 shows the relationship between the degree of anomaly, entropy, and profile size for 20 user groups of both male and female users and the miscalibration of the recommendations they received. As we can see in the first row (anomaly vs. miscalibration), in all algorithms except for SVD++, the recommendations given to the female user groups have higher miscalibration (they are less calibrated) than those given to the male user groups, regardless of the anomaly of their ratings. Also, we can see that the positive correlation between profile anomaly and recommendation miscalibration discussed in Mansoury et al. (2019) can only be seen for male users. The second row of Figure 1 shows the relationship between the entropy of the ratings and the miscalibration of the recommendations. Again, it can be seen that, except for SVD++, for all other algorithms female user groups have higher miscalibration in their recommendations regardless of the entropy of their ratings.
Finally, the last row of Figure 1 shows the correlation between the average profile size of different user groups and the miscalibration of their recommendations. Looking at this plot, we can see that there is no significant correlation between the two, indicating that the profile size of the users does not affect the miscalibration of their recommendations. However, except for SVD++, all algorithms again show higher miscalibration for female user groups regardless of their profile size. It seems that SVD++ is indeed the fairest algorithm among the four, as it gives comparable performance for both male and female users. It can also be seen that there is no data point for female groups when the value of the x-axis is larger than 400, meaning the largest average profile size for female groups is 400, while there are some male user groups with an average profile size of around 700.
Figure 2 shows the correlation between the aforementioned factors for different user groups and the precision of their recommendations. Unlike miscalibration, the correlations of these three factors with precision appear much stronger. For example, the first row of this figure shows that the higher the inconsistency of the ratings, the lower the precision, which is what we expected. The second row shows a strong correlation (correlation coefficient ≈ 0.9), indicating that user groups with higher entropy (more information gain) in their ratings receive more accurate (higher-precision) recommendations. Also, from the same figure, we can see that for lower values of entropy the algorithms behave more fairly, but the larger the entropy gets, the more apparent the discrimination between female and male user groups becomes (higher precision for male user groups). The relationship between average profile size and precision is also shown in the last row of Figure 2. As expected, user groups with larger profiles benefit from more accurate recommendations, for both males and females. However, discrimination can still be seen for some algorithms, such as UserKNN, where female users with the same profile size still receive recommendations with lower precision compared to the male users.

Figure 2: The correlation between anomaly, entropy, and size of the users' profiles and the precision of the recommendations generated for them. Numbers next to the legends in the plots show the correlation coefficient for each user group.

Conclusion and Future Work
Unfairness in recommendation is an important issue that needs to be diagnosed and treated properly. In this paper, we investigated the impact of three different factors on recommendation performance and fairness: the degree of anomaly,
entropy, and profile size. We made several interesting observations that need further research. First, we observed that neighborhood-based algorithms such as UserKNN and ItemKNN discriminate more against women in this data set, in part because they are more affected by specific characteristics of the profiles, such as profile size and entropy, and these characteristics are more prevalent in the female profiles than in the male ones. On the other hand, the matrix-factorization-based methods (ListRankMF and SVD++), which rely on a lower-dimensional latent space and focus more on user-item interactions, seem to even out this effect and make the recommendations fairer across genders. In particular, the SVD++ algorithm was able to give comparable performance for both male and female users, providing fairer treatment across genders than the other algorithms. For future work, we intend to use more datasets for our analysis. We will also investigate the contribution of each of the mentioned factors to recommendation performance. Finally, we will explore other potential factors that could play a role in the unfairness of recommendation algorithms.

References
Abdollahpouri, H.; Mansoury, M.; Burke, R.; and Mobasher, B. 2019a. The impact of popularity bias on fairness and calibration in recommendation. arXiv preprint arXiv:1910.05755.
Abdollahpouri, H.; Mansoury, M.; Burke, R.; and Mobasher, B. 2019b. The unfairness of popularity bias in recommendation. In Workshop on Recommendation in Multistakeholder Environments (RMSE).
Ekstrand, M. D.; Tian, M.; Azpiazu, I. M.; Ekstrand, J. D.; Anuyah, O.; McNeill, D.; and Pera, M. S. 2018. All the cool kids, how do they fit in?: Popularity and demographic biases in recommender evaluation and effectiveness. In Conference on Fairness, Accountability and Transparency, 172–186.
Guo, G.; Zhang, J.; Sun, Z.; and Yorke-Smith, N. 2015. LibRec: A Java library for recommender systems. In UMAP Workshops.
Kamishima, T.; Akaho, S.; and Sakuma, J. 2011. Fairness-aware learning through regularization approach. In 11th International Conference on Data Mining Workshops, 643–650.
Mansoury, M.; Burke, R.; Ordonez-Gauger, A.; and Sepulveda, X. 2018. Automating recommender systems experimentation with librec-auto. In Proceedings of the 12th ACM Conference on Recommender Systems, 500–501. ACM.
Mansoury, M.; Abdollahpouri, H.; Rombouts, J.; and Pechenizkiy, M. 2019. The relationship between the consistency of users' ratings and recommendation calibration. arXiv preprint arXiv:1911.00852.
Yao, S., and Huang, B. 2017. Beyond parity: Fairness objectives for collaborative filtering. In Advances in Neural Information Processing Systems, 2921–2930.
Zhu, Z.; Hu, X.; and Caverlee, J. 2018. Fairness-aware tensor-based recommendation. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 1153–1162.