1 Introduction
Author profiling [27] aims to identify the characteristics or traits of an author
by analyzing their sociolect, i.e., how language is shared among people. It is used
to determine traits such as age, gender, personality, or language variety. The
popularity of this task has increased notably in recent years.1 Since 2013, the
Uncovering Plagiarism Authorship and Social Software Misuse 2 (PAN) Lab at
CLEF conference organized annual shared tasks focused on different traits, e.g.
age, gender, language variety, bots, or hate speech and fake news spreaders.
Data annotation for author profiling is a challenging task. Aspects such as the
economic and temporal cost, the psychological and linguistic expertise needed by
the annotator, and the inherent subjectivity involved in the annotation task,
make it difficult to obtain large amounts of high quality data [2,33]. Furthermore,
with few exceptions for some tasks (e.g. PAN), most of the research has been
conducted on English corpora, while other languages are under-resourced.
1 According to https://app.dimensions.ai/, the term “Author profiling” tripled its frequency in research publications in the last ten years.
2 https://pan.webis.de/.
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
P. Rosso et al. (Eds.): NLDB 2022, LNCS 13286, pp. 333–344, 2022.
https://doi.org/10.1007/978-3-031-08473-7_31
334 M. Chinea-Rios et al.
Few-shot (FS) learning aims to train classifiers with little training data. The
extreme case, zero-shot (ZS) learning [6,15], does not use any labeled data. This
is usually achieved by representing the labels of the task in a textual form, which
can either be the name of the label or a concise textual description. One popular
approach [12,34,36,37] is based on textual entailment. That task, a.k.a. Natu-
ral Language Inference (NLI) [3,8], aims to predict whether a textual premise
implies a textual hypothesis in a logical sense, e.g. Emma loves cats implies that
Emma likes cats. The entailment approach for text classification uses the input
text as the premise, and the text representing the label as the hypothesis. An
NLI model is then applied to each premise-hypothesis pair, and the entailment
probability is used to get the best matching label.
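The premise-hypothesis pairing just described can be sketched in a few lines. Here `entail_prob` is only a stand-in for a real NLI model (e.g. one trained on MNLI); it uses a toy word-overlap score so the example stays self-contained:

```python
def entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in for P(entailment | premise, hypothesis) from an NLI model.
    A real system would run a trained NLI classifier here."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h)  # toy overlap score, not a real probability

def zero_shot_classify(text: str, hypotheses: dict) -> str:
    """Use the input text as premise and each label's textual form as
    hypothesis; return the label with the highest entailment score."""
    return max(hypotheses, key=lambda label: entail_prob(text, hypotheses[label]))

pred = zero_shot_classify(
    "i love my cats so much",
    {"animals": "This text is about cats", "finance": "This text is about money"},
)
```

The label names and texts above are illustrative; in the author profiling setting the premise would be an author's text and the hypotheses would be label descriptions such as those in Table 2.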
Our contributions are as follows: (i) We study author profiling from a low-
resourced viewpoint and explore different zero and few-shot models based on
textual entailment. This is, to the best of our knowledge, the first attempt to use
zero and few-shot learning for author profiling. (ii) We study the identification
of age, gender, hate speech spreaders, fake news spreaders, bots, and depression,
in English and Spanish. (iii) We analyze the effect of the entailment hypothesis
and the size of the few-shot training sample on the system’s performance. (iv)
Our novel author instance selection method allows us to identify the most relevant
texts of each author. Our experiments show that on average we can reach 80%
of the accuracy of the top PAN systems using less than 50% of the training data.
In this section, we give details on zero- and few-shot text classification based on
textual entailment and describe how we apply it to author profiling.
Recent work [7,18] has shown that Siamese networks can also work well if pre-trained on NLI
data. A Siamese network encodes premise and hypotheses into a high dimen-
sional vector space and uses a similarity function such as the dot-product to find
the hypothesis that best matches the input text. These models can also be used
in a few-shot setup. Here we follow the common approach [18] of using the
so-called batch softmax [13]:

J = −(1/B) · Σ_{i=1}^{B} ( S(x_i, y_i) − log Σ_{j=1}^{B} exp S(x_i, y_j) ),

where B is the batch size and S(x, y) = f(x) · f(y) is the similarity between input
x and label text y under the current model f . All other elements of the batch
are used as in-batch negatives. We construct the batches so that every batch
contains exactly one example of each label.
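A minimal sketch of this loss in plain Python, assuming the encoder f has already produced the vectors for inputs and label texts, with the dot product as S:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def batch_softmax_loss(X, Y):
    """J = -(1/B) * sum_i ( S(x_i, y_i) - log sum_j exp S(x_i, y_j) ).
    X[i] is the encoded input f(x_i) and Y[i] the encoded text of its gold
    label f(y_i); every other Y[j] in the batch acts as an in-batch negative."""
    B = len(X)
    total = 0.0
    for i in range(B):
        positive = dot(X[i], Y[i])                       # S(x_i, y_i)
        log_norm = math.log(sum(math.exp(dot(X[i], Y[j])) for j in range(B)))
        total += positive - log_norm
    return -total / B
```

With B = 1 the loss is exactly zero, since the positive pair is also the whole normalization term; this is why batches are constructed with exactly one example per label, so the in-batch negatives cover all other labels.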
3 Related Work
Early author profiling attempts focused on blogs and formal text [1,14], based
on Pennebaker’s [22] theory, which connects language use with the personality
traits of the authors. With the rise of social media, researchers proposed
methodologies to profile the authors of posts where the language is more informal [5].
Since then, several approaches have been explored: for instance, a second-order
representation that relates documents and user profiles [17],
the Emograph graph-based approach enriched with topics and emotions [24], or
the LDSE [26], commonly used as a baseline at PAN. Recently, the research has
focused on the identification of bots and users who spread harmful information
(e.g. fake news and hate speech). In addition, there has been work to leverage the
impact of personality traits and emotions to discriminate between classes [11].
3 https://tinyurl.com/st-agg-cluster.
Table 1. Dataset overview. train and test show the number of users for each class.
The depression categories are minimal (--), mild (-), moderate (+) and severe (++).
Zero and Few-shot Text Classification has been explored in different manners.
Semantic similarity methods use the explicit meaning of the label names to
compute the similarity with the input text [31]. Prompt-based methods [32]
use natural language generation models, such as GPT-3 [4], to get the most
likely label to be associated with the input text. In this work, we use entailment
methods [12,36]. Recently, Siamese Networks have been found to give similar
accuracy while being much faster [18].
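The semantic-similarity family of methods mentioned above can be illustrated with a toy sketch: embed the input and the label names in a shared space and pick the closest label. A bag-of-words count vector stands in here for a real sentence encoder such as Sentence-BERT [31]:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in encoder: bag-of-words counts instead of a sentence embedding."""
    return Counter(text.lower().split())

def cosine(u: Counter, v: Counter) -> float:
    num = sum(u[w] * v[w] for w in u)
    den = (math.sqrt(sum(c * c for c in u.values()))
           * math.sqrt(sum(c * c for c in v.values())))
    return num / den if den else 0.0

def nearest_label(text: str, label_names: list) -> str:
    """Pick the label name whose embedding is most similar to the input text."""
    t = embed(text)
    return max(label_names, key=lambda name: cosine(t, embed(name)))

pred = nearest_label("sports scores from the football match",
                     ["sports news", "stock market report"])
```

The label names are hypothetical; the point is only that, unlike the entailment approach, this method exploits the explicit meaning of the label names directly.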
4 Experimental Setup
4.1 Dataset
We conduct a comprehensive study in 2 languages (English and Spanish) and 7
author profiling tasks: demographics (gender, age), hate-speech spreaders, bot
detection, fake news spreaders, and depression level. We use datasets from the
PAN and early risk prediction on the internet (eRisk) [20] shared tasks. Table 1
gives details on the datasets.
– Best performing system (winner): We show the test results of the system
that ranked first in each shared task, as reported in the corresponding overview
paper. The references to those specific systems can be found in the Appendix (Sect. 7).
– Sentence-Transformers with logistic regression (ST-lr): We use the scikit-learn
logistic regression classifier [21] on top of the embeddings of the Sentence
Transformer model used for the Siamese Network experiments.
– Character n-grams with logistic regression (user-char-lr): We use [1..5] character
n-grams with a TF-IDF weighting calculated using all texts. Note that
n-grams and logistic regression are predominant among the best systems
in the different PAN shared tasks [23,25].
– XLM-RoBERTa (xlm-roberta): Model based on the pre-trained xlm-roberta-base [16],
trained for 2 epochs with a batch size of 32 and a learning rate of 2e-4.
– Low-Dimensionality Statistical Embedding (LDSE): This method [26] repre-
sents texts based on the probability distribution of the occurrence of their
terms in the different classes. The key idea of LDSE is to weigh the probability of a
term belonging to each class. Note that this is one of the best-ranked baselines
at different PAN editions.
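The user-char-lr baseline above can be sketched with scikit-learn. The texts and labels below are toy placeholders (the paper aggregates each user's texts), and all hyperparameters except the n-gram range are illustrative defaults:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus; in the real setup each example would be one user's texts.
texts = [
    "lol gr8 c u l8er!!",
    "omg soooo cool, cu!!",
    "We hereby confirm the meeting.",
    "Please find the report attached.",
]
labels = ["young", "young", "adult", "adult"]

user_char_lr = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 5)),  # [1..5] char n-grams
    LogisticRegression(max_iter=1000),
)
user_char_lr.fit(texts, labels)
pred = user_char_lr.predict(["cu l8er lol, soooo gr8"])[0]
```

Character n-grams capture sub-word cues (elongations, abbreviations, punctuation habits) that are known to be strong style markers, which is one reason this simple combination remains competitive at PAN.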
Table 2. Per-class hypothesis lists used for each task, model (CA, SN), and language.

| Task | Model | #EN | English per-class hypothesis list | #ES | Spanish per-class hypothesis list |
|---|---|---|---|---|---|
| Gender | CA | 8 | I’m a {female, male} | 4 | identity hypothesis |
| Gender | SN | 8 | My name is {Ashley, Robert} | 4 | Soy {una mujer, un hombre} |
| Age | CA | 6 | identity hypothesis | 5 | La edad de esta persona es {entre 18 y 24, entre 25 y 34, entre 35 y 49, más de 50} años |
| Age | SN | 6 | I am {a teenager, a young adult, an adult, middle-aged} | 5 | identity hypothesis |
| Hate speech | CA | 8 | This text expresses prejudice and hate speech; This text does not contain hate speech | 6 | Este texto expresa odio o prejuicios; Este texto es moderado, respetuoso, cortés y civilizado |
| Hate speech | SN | 8 | This text contains prejudice and hate speech directed at particular groups; This text does not contain hate speech | 6 | Odio esto; Soy respetuoso |
| Bots | CA | 7 | This is a text from a machine; This is a text from a person | 6 | identity hypothesis |
| Bots | SN | 7 | identity hypothesis | 6 | Soy un bot; Soy un usuario |
| Fake news | CA | 8 | This author spreads fake news; This author is a normal user | 6 | Este autor publica noticias falsas; Este autor es un usuario normal |
| Fake news | SN | 8 | identity hypothesis | 6 | Este usuario propaga noticias falsas; Este usuario no propaga noticias falsas |
| Depression | CA | 4 | The risk of depression of this user is {minimal, mild, moderate, severe} | – | – |
| Depression | SN | 4 | identity hypothesis | – | – |
Table 3. English validation results of the CA and SN models. n shows the number
of training users; each cell shows s / F1±σ, where s is the total training size.

| Model | n | Inst. Sel. | Gender | Age | Hate speech | Bots | Fake news | Depression |
|---|---|---|---|---|---|---|---|---|
| CA | 0 | – | 0 / 74.7±1.5 | 0 / 27.6±2.6 | 0 / 71.0±2.8 | 0 / 42.4±1.2 | 0 / 63.8±3.7 | 0 / 24.1±2.3 |
| CA | 8 | Ra1 | 16 / 74.9±1.5 | 32 / 27.4±3.0 | 16 / 68.8±4.4 | 16 / 74.4±11.5 | 16 / 62.3±5.2 | 29 / 27.6±1.9 |
| CA | 8 | Ra50 | 800 / 73.2±2.7 | 1057 / 14.9±4.0 | 800 / 45.9±6.9 | 800 / 86.3±1.4 | 800 / 57.7±8.3 | 1.3k / 20.5±6.5 |
| CA | 8 | IS | 257 / 67.5±4.3 | 361 / 34.6±3.3 | 348 / 60.1±7.2 | 191 / 87.6±1.4 | 203 / 54.8±3.1 | 1.9k / 23.6±10.9 |
| CA | 16 | Ra1 | 32 / 74.9±1.5 | 46 / 27.4±2.4 | 32 / 66.1±3.8 | 32 / 79.2±6.2 | 32 / 62.6±4.1 | 45 / 27.8±2.0 |
| CA | 16 | Ra50 | 1.6k / 67.7±5.3 | 2.3k / 12.5±0.6 | 1.6k / 45.5±5.0 | 1.6k / 87.1±0.4 | 1.6k / 55.9±8.5 | 2.2k / 13.6±6.4 |
| CA | 16 | IS | 501 / 71.3±3.6 | 611 / 37.1±6.1 | 697 / 59.6±6.0 | 385 / 88.5±1.2 | 432 / 62.3±4.9 | 2.9k / 21.4±6.7 |
| CA | 32 | Ra1 | 64 / 74.8±1.5 | 78 / 27.2±2.3 | 64 / 65.9±5.5 | 64 / 86.3±3.3 | 64 / 65.9±5.5 | – |
| CA | 32 | Ra50 | 3.2k / 69.2±4.1 | 3.9k / 12.5±0.6 | 3.2k / 56.2±10.3 | 3.2k / 88.8±1.3 | 3.2k / 48.4±2.8 | – |
| CA | 32 | IS | 1.0k / 75.5±3.4 | 1.0k / 41.7±3.4 | 1.4k / 56.0±6.5 | 756 / 89.7±1.2 | 880 / 61.3±4.5 | – |
| CA | 48 | Ra1 | 96 / 74.8±1.2 | – | 96 / 62.2±8.1 | 96 / 89.9±1.2 | 96 / 65.5±3.5 | – |
| CA | 48 | Ra50 | 4.8k / 72.5±1.9 | – | 4.8k / 56.1±10.3 | 4.8k / 90.5±0.9 | 4.8k / 50.0±4.5 | – |
| CA | 48 | IS | 1517 / 75.0±2.6 | – | 2059 / 66.4±11.0 | 1132 / 90.4±1.4 | 1320 / 65.8±5.3 | – |
| CA | 64 | Ra1 | 128 / 74.5±1.2 | – | 128 / 59.0±10.5 | 128 / 90.7±1.2 | 128 / 65.3±3.3 | – |
| CA | 64 | Ra50 | 6.4k / 74.8±2.2 | – | 6.4k / 62.9±14.3 | 6.4k / 90.1±2.1 | 6.4k / 53.0±2.6 | – |
| CA | 64 | IS | 2.0k / 75.6±2.7 | – | 2.2k / 59.5±8.7 | 1.5k / 90.4±1.4 | 1.8k / 61.2±4.1 | – |
| CA | 128 | Ra1 | 256 / 74.9±0.8 | – | – | 256 / 90.4±2.0 | – | – |
| CA | 128 | Ra50 | 12.8k / 74.3±1.6 | – | – | 12.8k / 92.3±1.9 | – | – |
| CA | 128 | IS | 4.0k / 76.4±1.7 | – | – | 3.0k / 92.1±1.6 | – | – |
| CA | 256 | Ra1 | 512 / 75.6±1.3 | – | – | 512 / 91.3±0.8 | – | – |
| CA | 256 | Ra50 | 25.6k / 73.9±0.9 | – | – | 25.6k / 95.8±1.3 | – | – |
| CA | 256 | IS | 8.0k / 79.0±1.2 | – | – | 6.0k / 94.5±1.4 | – | – |
| CA | 512 | Ra1 | 1.0k / 70.9±1.3 | – | – | 1.0k / 93.0±1.1 | – | – |
| CA | 512 | Ra50 | 51.2k / 74.3±0.6 | – | – | 51.2k / 97.9±0.6 | – | – |
| CA | 512 | IS | 16.0k / 77.5±2.3 | – | – | 11.9k / 97.0±0.6 | – | – |
| SN | 0 | – | 0 / 38.1±3.5 | 0 / 37.4±6.6 | 0 / 46.3±12.4 | 0 / 62.2±1.4 | 0 / 51.8±2.4 | 0 / 21.9±5.2 |
| SN | 8 | Ra1 | 16 / 55.1±8.7 | 32 / 39.6±10.9 | 16 / 41.7±6.1 | 16 / 50.4±10.1 | 16 / 58.8±7.9 | 29 / 26.3±5.5 |
| SN | 8 | Ra50 | 800 / 63.1±6.4 | 1057 / 50.5±7.1 | 800 / 62.6±7.8 | 800 / 84.5±1.3 | 800 / 55.8±5.6 | 1.4k / 29.2±3.5 |
| SN | 8 | IS | 257 / 65.2±4.8 | 361 / 51.7±7.5 | 348 / 60.9±3.7 | 191 / 86.1±1.8 | 203 / 54.0±5.5 | 1.9k / 32.7±7.2 |
| SN | 16 | Ra1 | 32 / 59.4±3.2 | 46 / 39.6±7.1 | 32 / 47.1±7.1 | 32 / 73.3±8.9 | 32 / 59.5±3.0 | 45 / 23.7±5.5 |
| SN | 16 | Ra50 | 1.6k / 68.2±4.1 | 2.3k / 55.4±4.2 | 1.6k / 59.5±4.7 | 1.6k / 84.7±2.8 | 1.6k / 58.2±7.4 | 2.2k / 25.1±7.1 |
| SN | 16 | IS | 501 / 69.3±3.6 | 611 / 50.9±4.2 | 697 / 60.7±9.4 | 385 / 86.9±1.8 | 432 / 61.7±7.4 | 2.9k / 28.8±4.8 |
| SN | 32 | Ra1 | 64 / 59.6±5.9 | 78 / 38.5±7.1 | 64 / 49.0±4.9 | 64 / 84.5±1.8 | 64 / 60.4±3.8 | – |
| SN | 32 | Ra50 | 3.2k / 71.6±2.3 | 3.9k / 52.9±7.4 | 3.2k / 58.8±5.2 | 3.2k / 87.1±4.1 | 3.2k / 65.2±5.7 | – |
| SN | 32 | IS | 1.0k / 71.4±2.7 | 1.0k / 51.4±8.0 | 1.4k / 60.4±5.3 | 756 / 88.6±1.6 | 880 / 62.6±5.5 | – |
| SN | 48 | Ra1 | 96 / 65.6±3.8 | – | 96 / 57.9±7.2 | 96 / 86.5±2.9 | 96 / 63.3±3.3 | – |
| SN | 48 | Ra50 | 4.8k / 73.1±1.6 | – | 4.8k / 65.5±4.9 | 4.8k / 89.6±1.4 | 4.8k / 65.5±4.9 | – |
| SN | 48 | IS | 1.5k / 72.6±2.4 | – | 2.0k / 67.8±4.1 | 1.1k / 90.1±1.7 | 1.3k / 68.4±5.8 | – |
| SN | 64 | Ra1 | 128 / 66.5±4.2 | – | 128 / 56.3±9.5 | 128 / 88.2±1.5 | 128 / 56.3±9.5 | – |
| SN | 64 | Ra50 | 6.4k / 74.1±2.1 | – | 6.4k / 68.7±6.0 | 6.4k / 89.6±1.5 | 6.4k / 63.9±4.5 | – |
| SN | 64 | IS | 2.0k / 74.4±2.6 | – | 2.2k / 68.4±5.9 | 1.5k / 89.8±1.5 | 1.8k / 63.0±6.2 | – |
| SN | 128 | Ra1 | 256 / 67.8±2.3 | – | – | 256 / 89.8±1.9 | – | – |
| SN | 128 | Ra50 | 12.8k / 75.8±0.7 | – | – | 12.8k / 92.4±1.7 | – | – |
| SN | 128 | IS | 4.0k / 75.3±0.6 | – | – | 3.0k / 92.5±1.7 | – | – |
| SN | 256 | Ra1 | 512 / 70.3±1.9 | – | – | 512 / 91.2±1.6 | – | – |
| SN | 256 | Ra50 | 25.6k / 77.1±1.5 | – | – | 25.6k / 95.4±1.7 | – | – |
| SN | 256 | IS | 8.0k / 76.4±1.4 | – | – | 6.0k / 94.5±2.2 | – | – |
| SN | 512 | Ra1 | 1.0k / 73.9±1.9 | – | – | 1.0k / 93.4±1.7 | – | – |
| SN | 512 | Ra50 | 51.2k / 77.1±1.7 | – | – | 51.2k / 97.8±0.9 | – | – |
| SN | 512 | IS | 15.9k / 77.1±2.0 | – | – | 11.8k / 96.9±1.1 | – | – |
Table 4. Spanish Validation results of the CA and SN models. n shows the number
of training users and s the total training size. Top results are highlighted with bold.
Table 5. Test set results of CA and SN compared to several baseline and refer-
ence approaches. n shows the number of training users. Per-block top results without
confidence interval overlaps are highlighted with bold.
(Age, Hate Speech and Fake News) and CA in two (Gender and Bots). However,
both models offer similar performance, with the exception of Age, where CA
failed to converge. This, together with the English results, leaves SN as the most
stable model across tasks and languages.
The few-shot experiments on the test sets use the best configurations
obtained in our validation phase, i.e., the number of training users (n = best)
and author instance selection method for each language and task. Table 5 com-
pares the test set results of CA and SN with several baselines and reference
approaches (see Sect. 4.3). Note that the baselines use all the available training
data (n = all) and that we show both the SN and CA zero and few-shot results.7
As can be seen, the zero-shot models outperform xlm-roberta in English; CA
also does so in Spanish. Comparing them against other approaches, including
the shared task top systems (winner), we find encouraging results: the zero-shot
models ranked third at some tasks, e.g. EN CA Hate Speech, EN CA Gender,
and EN CA Depression. This is remarkable considering that those models did
not use any training data, and shows their potential for low-resource author
profiling. Note that few-shot (best) performed similarly to or better than the CA and
SN models trained with the whole training set (all). Interestingly, LDSE, a popular
profiling baseline, ranks first for Depression.8
As expected, few-shot outperforms zero-shot in most tasks, with a few exceptions
for CA: EN Hate Speech, EN Fake News, and EN Depression. As in the
validation experiments, CA failed to converge for some tasks, so SN offers
higher stability. Finally, the comparison of the few-shot models with the rest
of the approaches shows that they rank third for some tasks and languages: EN CA
Gender, EN CA Hate Speech, ES CA Bots, ES SN Fake News. In comparison
with winner, the FSL models reach 80% of the winner system’s accuracy in
90% of the cases, and 90% of its accuracy in 50% of the cases. They do so using
less than 50% of the training data. This shows that FSL often yields competitive
systems with less individual tuning and less annotation effort.
6 Conclusions
In this work, we studied author profiling from a low-resource perspective: with
little or no training data. We addressed the task using zero and few-shot text
classification. We studied the identification of age, gender, hate speech spreaders,
fake news spreaders, bots, and depression. In addition, we analyzed the effect of
the entailment hypothesis and the size of the few-shot training sample on the
system’s performance. We evaluated corpora both in Spanish and English. On
the comparison of Cross Attention and Siamese networks, we find that the former
performs better in the zero-shot scenario while the latter gives more stable
few-shot results across the evaluated tasks and languages. We find that
entailment-based models outperform supervised text classifiers based on XLM-RoBERTa and
7 The standard deviation results of Table 5 are omitted due to space. However, this does not affect our analysis. Omitted values can be found in the Appendix (Sect. 7).
8 The accuracy of the eRisk SOTA system corresponds to the DCHR metric used in the shared task.
Zero and Few-Shot Learning for Author Profiling 343
that we can reach 80% of the state-of-the-art accuracy using less than 50% of the
training data on average. This highlights their potential for low-resource author
profiling scenarios.
7 Appendix
The repository at https://tinyurl.com/ZSandFS-author-profiling contains experimental details and results.
References
1. Argamon, S., Koppel, M., Fine, J., et al.: Gender, genre, and writing style in formal
written texts. In: TEXT, pp. 321–346 (2003)
2. Bobicev, V., Sokolova, M.: Inter-annotator agreement in sentiment analysis:
machine learning perspective. In: RANLP, pp. 97–102 (2017)
3. Bowman, S.R., Angeli, G., Potts, C., et al.: A large annotated corpus for learning
natural language inference. In: EMNLP, pp. 632–642 (2015)
4. Brown, T.B., Mann, B., Ryder, N., et al.: Language models are few-shot learners.
In: Advances in NIPS, pp. 1877–1901 (2020)
5. Burger, J.D., Henderson, J., Kim, G., et al.: Discriminating gender on Twitter. In:
EMNLP, pp. 1301–1309 (2011)
6. Chang, M.W., Ratinov, L.A., Roth, D., et al.: Importance of semantic representa-
tion: dataless classification. In: AAAI, pp. 830–835 (2008)
7. Chu, Z., Stratos, K., Gimpel, K.: Unsupervised label refinement improves dataless
text classification. arXiv (2020)
8. Dagan, I., Glickman, O., Magnini, B.: The Pascal recognising textual entailment
challenge. In: Machine Learning Challenges Workshop, pp. 177–190 (2006)
9. Devlin, J., Chang, M.W., Lee, K., et al.: BERT: pre-training of deep bidirectional
transformers for language understanding. In: NAACL-HLT, pp. 4171–4186 (2019)
10. Franco-Salvador, M., Rangel, F., Rosso, P., et al.: Language variety identification
using distributed representations of words and documents. In: CLEF, pp. 28–40
(2015)
11. Ghanem, B., Rosso, P., Rangel, F.: An emotional analysis of false information in
social media and news articles. ACM Trans. Internet Technol. 20, 1–18 (2020)
12. Halder, K., Akbik, A., Krapac, J., et al.: Task-aware representation of sentences
for generic text classification. In: COLING, pp. 3202–3213 (2020)
13. Henderson, M.L., Al-Rfou, R., Strope, B., et al.: Efficient natural language response
suggestion for smart reply. arXiv (2017)
14. Koppel, M., Argamon, S., Shimoni, A.R.: Automatically categorizing written texts
by author gender. Linguist. Comput. 17, 401–412 (2004)
15. Larochelle, H., Erhan, D., Bengio, Y.: Zero-data learning of new tasks. In: National
Conference on Artificial Intelligence, pp. 646–651 (2008)
16. Liu, Y., Ott, M., Goyal, N., et al.: RoBERTa: a robustly optimized BERT pretraining
approach. arXiv (2019)
17. Lopez-Monroy, A.P., Montes-Y-Gomez, M., Escalante, H.J., et al.: INAOE’s par-
ticipation at PAN’13: author profiling task. In: CLEF (2013)
18. Müller, T., Pérez-Torró, G., Franco-Salvador, M.: Few-shot learning with siamese
networks and label tuning. In: ACL (2022, to appear)
19. Nie, Y., Williams, A., Dinan, E., et al.: Adversarial NLI: a new benchmark for
natural language understanding. arXiv (2019)
20. Parapar, J., Martín-Rodilla, P., Losada, D.E., et al.: Overview of eRisk at CLEF
2021: early risk prediction on the Internet. In: CLEF (2021)
21. Pedregosa, F., Varoquaux, G., Gramfort, A., et al.: Scikit-learn: machine learning
in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
22. Pennebaker, J.W., Mehl, M.R., Niederhoffer, K.G.: Psychological aspects of natural
language use: our words, our selves. Ann. Rev. Psychol. 54, 547–577 (2003)
23. Rangel, F., Giachanou, A., Ghanem, B.H.H., et al.: Overview of the 8th author
profiling task at PAN 2020: profiling fake news spreaders on Twitter. In: CEUR,
pp. 1–18 (2020)
24. Rangel, F., Rosso, P.: On the impact of emotions on author profiling. Inf. Process.
Manag. 52, 73–92 (2016)
25. Rangel, F., Rosso, P.: Overview of the 7th author profiling task at PAN 2019: bots
and gender profiling in Twitter. In: CEUR, pp. 1–36 (2019)
26. Rangel, F., Rosso, P., Franco-Salvador, M.: A low dimensionality representation
for language variety identification. In: CICLING, pp. 156–169 (2016)
27. Rangel, F., Rosso, P., Koppel, M., et al.: Overview of the author profiling task at
PAN 2013. In: CLEF, pp. 352–365 (2013)
28. Rangel, F., Rosso, P., Potthast, M., et al.: Overview of the 3rd author profiling
task at PAN 2015. In: CLEF (2015)
29. Rangel, F., Rosso, P., Potthast, M., et al.: Overview of the 5th author profiling
task at PAN 2017: gender and language variety identification in Twitter. In: CLEF
(2017)
30. Rangel, F., Sarracén, G., Chulvi, B., et al.: Profiling hate speech spreaders on
Twitter task at PAN 2021. In: CLEF (2021)
31. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese
BERT-networks. In: EMNLP (2019)
32. Schick, T., Schütze, H.: Exploiting cloze-questions for few-shot text classification
and natural language inference. In: EACL, pp. 255–269 (2021)
33. Troiano, E., Padó, S., Klinger, R.: Emotion ratings: how intensity, annotation con-
fidence and agreements are entangled. In: Workshop on Computational Approaches
to Subjectivity, Sentiment and Social Media Analysis, pp. 40–49 (2021)
34. Wang, S., Fang, H., Khabsa, M., et al.: Entailment as few-shot learner. arXiv
(2021)
35. Wolf, T., Debut, L., Sanh, V., et al.: Transformers: state-of-the-art natural lan-
guage processing. In: EMNLP (2020)
36. Yin, W., Hay, J., Roth, D.: Benchmarking zero-shot text classification: datasets,
evaluation and entailment approach. In: EMNLP, pp. 3914–3923 (2019)
37. Yin, W., Rajani, N.F., Radev, D., et al.: Universal natural language processing
with limited annotations: try few-shot textual entailment as a start. In: EMNLP,
pp. 8229–8239 (2020)