Sensory Perception
Youness Hourri
hourri@yahoo.com
Final Study Report for the Master’s Degree in Data Science from
the Faculty of Sciences of Fez and the Galilée Institute in Paris
Internship Supervisor: Dr. Anne-Lise Saive (anne-lise.saive@institutpaulbocuse.com)
Academic Advisor: Prof. Khalid El Fazazy (khalid.elfazazy@usmba.ac.ma)
Jury Members:
Prof. Ali Yahyaouy
Dr. Anne-Lise Saive
Prof. Dounia El Bourakadi
Prof. Khalid El Fazazy
Acknowledgements
First and foremost, I would like to thank Allah the Almighty, the Most Gracious,
and the Most Merciful for His blessings during my study and in completing this work.
I would like to express my deep gratitude to Dr. Anne-Lise Saive, my supervisor, for
her unfailing support and valuable advice throughout my internship. Her ideas and
guidance have played an essential role in shaping my understanding of the research
process.
Moreover, I would like to thank the teachers and professors at the University of Fez
and the University of Paris for their invaluable contributions to our academic jour-
ney. Their passion for teaching, expertise in their respective fields, and unwavering
support have played a pivotal role in shaping our knowledge and skills.
I would like to express my heartfelt gratitude to the staff at LYFE Institute for
their warm welcome and support during my internship. Their kindness, guidance,
and expertise have greatly contributed to my professional growth. Thank you for
making my experience truly memorable and rewarding.
My ultimate thanks go to my beloved parents for their endless support, love, and
prayers. I would also like to thank my brother and sisters, who gave me plentiful
help and support in completing this work. Huge thanks also go to my colleagues in
the computer science department for every moment of joy and sorrow we have shared
since the day we first stepped onto our campus.
Finally, I would like to thank all my friends and everyone who helped me complete
this work, whose names cannot all be mentioned here, for their help and support.
List of Figures
1 Institut Lyfe (Lyon for Excellence) logo.
2 Institut Supérieur International du Parfum, de la Cosmétique et de l'Aromatique Alimentaire logo.
3 Brain plasticity: changes in the cerebral cortex and neuronal structure and function that occur after regular practice.
4 Description strategies used by coffee experts, wine experts and novices.
5 Overview of the olfactory experiment, showing the four sessions and five types of tests conducted, including evocation, description, categorization of odors, difference, and memory tests.
6 A screenshot of the Google Colab platform interface with similarity measure.
7 A screenshot of the jamovi software interface with a two-way ANOVA analysis.
8 A schema illustrating the steps involved in text preprocessing.
9 Cosine similarity for vector space models.
10 Clustering example with k-means (left) and mean shift (right). These two clusterings have an adjusted Rand index of 94%.
11 An example of calculating the Rand index between two partitions.
12 Linear relationships between words (from developers.google.com).
13 Architectural comparison of CBOW and Skip-gram models for word representation learning.
14 Skip-gram paradigm example demonstrating how to predict context using a single word.
15 Co-occurrence matrix for three sentences with two words as context.
16 Conceptual model for the GloVe model's implementation (from towardsdatascience.com).
17 Diagram of a transformer architecture, showing the encoder and decoder components with self-attention layers and multi-head attention mechanisms.
18 Scaled dot-product attention mechanism. Queries, keys, and values are input into the attention mechanism.
19 Multi-head attention mechanism. Queries, keys, and values are linearly projected multiple times.
20 Comparison of PCA (left) and t-SNE (right) on 5000 samples from the MNIST handwritten digit dataset (from ml-lectures.org).
21 Average number of answers given per group and per measurement time for the description task.
22 Frequency of the words most used by the ISIPCA group for the description task.
23 Frequency of the words most used by the control group for the description task.
24 Number of words used from the selected vocabulary (40% of the most frequent words) by the ISIPCA group.
25 Evolution of the use of adjectives (left) and nouns (right) over time by ISIPCA students.
26 Visualization of participant responses in 2D space using PCA for the description task.
27 Visualization of participant responses in 2D space using t-SNE for the description task.
28 Graph illustrating the evolution of intra-subject cosine similarity over time using FlauBERT for the description task.
29 Graph showing the evolution of intra-subject Euclidean distance over time using FlauBERT for the description task.
30 Evolution of the average number of words used over time for the ISIPCA group and the control group for the evocation task.
31 Frequency of the words most used by the ISIPCA group for the evocation task.
32 Frequency of the words most used by the control group for the evocation task.
33 Evolution of the number of words used by the ISIPCA group during the sensory evocation task using a vocabulary based on the most frequently used words.
34 Evolution of the use of grammatical categories by the ISIPCA group during the sensory evocation task at the different measurement times (T0, T1 and T2).
35 Scatterplot of responses given by the ISIPCA group and the control group during the sensory evocation task using PCA.
36 Scatterplot of responses given by the ISIPCA and control groups for the sensory evocation task, generated using t-SNE.
37 Graph showing the evolution of intra-subject Euclidean distance over time using FlauBERT for the evocation task.
38 Average number of groups formed by participants over time for the ISIPCA and control groups.
39 Rand coefficient between the reference score and the participants' scores.
40 Adjusted Rand coefficient between the reference partition and the participants' partitions.
41 Cohen's kappa between the reference score and the participants' scores.
List of Tables
1 Description of different products used in evocation and description tasks provided by a perfume expert.
2 Composition of different fragrance ingredients used in categorization task.
3 An example of a data snippet for the similarity measure using SentenceBERT as an input for ANOVA.
Contents
1 Introduction
1.1 Institut Lyfe Research & Innovation Center
1.2 L'Institut Supérieur International du Parfum, de la Cosmétique et de l'Aromatique Alimentaire
1.3 Background on brain plasticity and olfactory expertise
1.4 Research objectives
3 Analysis methodology
3.1 Data preparation
3.2 Mathematical approaches
3.2.1 Quantitative analysis & descriptive statistics
3.2.2 Cosine similarity
3.2.3 Repeated measures ANOVA
3.2.4 Cohen's kappa
3.2.5 Rand index
3.3 Machine learning tools
3.3.1 Word embedding
3.3.2 Word2Vec
3.3.3 GloVe
3.3.4 Transformers
3.3.5 Principal component analysis
3.3.6 t-distributed stochastic neighbor embedding
4 Results
4.1 Description task
4.1.1 Richness of vocabulary
4.1.2 Grammatical tagging
4.1.3 Intra-subject similarity
4.2 Evocation task
4.2.1 Diversity of words
4.2.2 Grammatical tagging
4.2.3 Intra-subject semantic distance
4.3 Categorization task
4.3.1 Number of groups
4.3.2 Agreement measure
5 Discussion
5.1 Description task
5.2 Evocation Task
5.3 Categorization Task
1 Introduction
1.1 Institut Lyfe Research & Innovation Center
Inspired by two visionaries, Paul Bocuse and Gérard Pélisson, Institut Lyfe (formerly
Institut Paul Bocuse) is a higher education institution dedicated to passing on
exceptional know-how in the hotel and restaurant trades. Since 1990, the Institute
has preserved a cultural heritage and the art of French hospitality at its highest
level of excellence. Its research center, created in 2008, conducts multidisciplinary
scientific research on the food transition that is recognized in France and
internationally. Its mission is to meet the challenge of providing tasty, healthy and
sustainable food for everyone, in every meal and hospitality context.
As a research center unique among hospitality and culinary schools, it plays an
active role in the evolution of culinary practices and restaurant management. The
knowledge it produces feeds into the teaching at Institut Lyfe, helping to train
tomorrow's responsible leaders, who will be able to face the societal challenges of
constantly evolving sectors. The research center has more than 20 researchers
divided into three key research areas:
• Cognitive Sciences (where I did my internship) look at food perception and
cognition. They study how our senses interact with food, how we perceive it,
and how these perceptions influence our food preferences and behaviors.
• Social Sciences explore food cultures and transitions, examining how eating
habits evolve over time and how they influence our food choices.
• Nutrition and Eating Behavior focus on food, nutrition, and health. These
researchers explore the links between diet and health, studying nutrients, eat-
ing patterns, and their impacts on our well-being.
Thanks to these interconnected research areas, Institut Lyfe Research & Innovation
Center contributes to the advancement of knowledge and innovation in the field of
food transition, thus making a valuable contribution to the training of tomorrow’s
professionals and society as a whole.
arousing love and admiration. For this, he does not hesitate to kill young women
whose fragrant essence he captures.
Grenouille is the protagonist of the novel Perfume: The Story of a Murderer (Das
Parfum) by Patrick Süskind, published in 1985. The novel describes with great
finesse his ability to perceive odors, which allows him to distinguish the subtlest
nuances and variations of the scents around him. He can thus identify the
ingredients of a perfume, recognize people's emotions and intentions, or even blend
into his environment by masking his smell. For him, perfume becomes a language, an
expression of both his genius and his madness.
Neuroplasticity, also referred to as brain or neural plasticity, is the brain's capacity
to change and adapt in response to new experiences or behaviors [1]. This process
involves the growth and reorganization of neural networks, allowing the brain to be
rewired to function differently from how it previously functioned.
It was once believed that neuroplasticity occurred only during childhood, but
research [2] has shown that many aspects of the brain can be altered even in
adulthood. However, the developing brain exhibits a higher degree of plasticity than
the adult brain. Activity-dependent plasticity can have significant implications for
healthy development, learning, and memory.
The mechanisms of neuroplasticity are numerous and complex. They involve the
formation of new synapses, the modification of the properties of existing synapses,
and the regulation of gene expression. These changes range from individual neuron
pathways making new connections to systematic remapping [3, 4]. Neuroplasticity
thus allows neurons to adapt to new situations, making it a powerful mechanism of
adaptation.
Regular training can induce changes in behavior, in the cerebral cortex¹, and in
neurons, as shown in Figure 3 [https://en.wikipedia.org/wiki/Kinesiology]. Regular
practice of a specific task can lead to measurable improvements in behavior, such
as better accuracy or faster execution [1, 5]. These improvements are linked to
structural changes in the cerebral cortex, including increased neuronal density in
the regions associated with the practiced task [6]. At the neuronal level, regular
training strengthens existing neural connections and promotes the formation of new
ones [7]. These changes are often associated with increased synaptic plasticity,
allowing better information transmission and optimization of the neural circuits
involved in the practiced task.
In musicians, musical practice can lead to changes in the structure and function of
the brain. For example, a study has shown that musicians have a higher density of
gray matter in certain brain regions involved in auditory information processing and
motor coordination [8]. Likewise, practicing a sport can change the structure and
function of the brain: studies have shown that athletes have a greater density of
gray matter in certain brain regions involved in the processing of visual and
spatial information [9, 10].
Figure 3: Brain plasticity: changes in the cerebral cortex and neuronal structure
and function that occur after regular practice.
¹ The cerebral cortex is an organic tissue, called gray matter, covering both
hemispheres of the brain. It is the seat of the most advanced neurological functions,
such as language, memory and consciousness.
Brain plasticity can be induced by a variety of stimuli, such as learning, physical
exercise, and sensory stimulation [11, 12]. The resulting changes in brain structure
and function can be observed at different levels, ranging from molecular and cellular
changes to changes in neural networks and brain regions [13]. For example, learning
a new skill can lead to changes in gray matter density in certain regions of the
brain [14]. Similarly, regular physical exercise can lead to changes in brain structure
and function that are associated with improved cognition [15, 14].
Perfumers are professionals specialized in the creation of perfumes. Their job is
to compose combinations of smells that create unique sensory experiences. The
creation of a complex perfume requires a thorough understanding of the different
olfactory notes, their interactions and their combinations.
The regular practice of identifying and composing different aromas can cause changes
in the brain of perfumers, especially in the regions associated with olfactory percep-
tion [16, 17]. Scientific studies have shown that olfactory experts, such as perfumers,
have differences in their brain activity and in the functional connectivity between
olfactory regions compared to non-expert individuals [18, 19].
The training and experience of perfumers can improve their sensitivity to odors and
their ability to discriminate and identify different olfactory notes. These skills can
be supported by neural plasticity mechanisms, such as the creation of new synaptic
connections between neurons or the strengthening of existing connections [20].
It should be noted that neural plasticity is a complex process that can be influenced
by various factors, including age, level of expertise, motivation and environment [21,
22]. Experienced perfumers, who have dedicated many years to refining their sense
of smell and their skills in composition, may have specific brain adaptations that
facilitate their work [16].
One study [23] shows that naive subjects find it difficult to describe smells because
they lack the appropriate vocabulary. This difficulty can, however, be used to
explore the relationship between language and sensory perception. Experts can
overcome this problem through learning, training, and regular practice, so analyzing
professional discourse about odors can help us understand how experts use language
to describe their sensory experience.
Olfaction is characterized by a prominent hedonic dimension. A previous study [24]
showed that affective responses to odors are modulated by physicochemical,
physiological and cognitive factors. That study examined the influence of expertise
on the processing of pleasant and unpleasant odors at both the perceptual and the
verbal level. To do this, performance on two olfactory tasks was compared between
novices, cooks in training and experts (perfumers and aroma specialists). Members of
all groups rated the intensity and pleasantness of pleasant and unpleasant odors
(perceptual tasks), and they were also asked to describe each of the 20 odorants as
precisely as possible (verbal description task). At the perceptual level, there were
no group-related differences in the hedonic evaluation of pleasant and unpleasant
odors. At the verbal level, experts' descriptions were richer (including, for
example, chemical terms, olfactory qualities and odor sources) and did not express
liking or disliking, unlike those of untrained subjects, who used terms referring to
odor sources (for example, sweets) accompanied by terms referring to the hedonic
value of the odors.
In another study [25], coffee and wine experts were compared with novices on their
ability to describe the smells and flavors of coffee and wine, as well as everyday
smells and tastes. Wine experts were more consistent than coffee experts and novices
in describing the smells and flavors of wine, but coffee experts were not more
consistent in describing coffee. Neither expert group was more accurate in
identifying everyday smells or tastes. When describing their own area of expertise,
experts tended to use source-based terms (such as vanilla), whereas novices tended to
use evaluative terms (such as pleasant). However, the overall language strategies of
experts and novices were similar, as shown in Figure 4.
Experts can describe scents by using specific terms to describe the different olfac-
tory notes that make up a scent. For example, they may use terms such as "floral",
"fruity", "spicy" or "woody" to describe the different notes in a scent. Experts may
also use more technical terms to describe scents, such as "pyrazine" or "terpene",
which refer to specific chemical compounds present in the scent. Furthermore, ex-
perts may use metaphors to describe odors, such as "the smell of rain" or "the smell
of the sea," which evoke specific sensory associations.
Figure 4: Description strategies used by coffee experts, wine experts and novices
Figure 5: Overview of the olfactory experiment, showing the four sessions and five
types of tests conducted, including evocation, description, categorization of odors,
difference, and memory tests.
2.1.1 Participants
This study focused on the olfactory responses of 55 participants (48 women and 7 men;
mean age 18.47 ± 1.15 years, mean ± standard deviation) who took part in an
olfactory experiment between 2018 and 2021. Among these participants, 40 were
students from the ISIPCA school, which we treated as the target group, and 15 were
control students from the Sup'Biotech school².
One key difference between the two groups is their level of olfactory training. The
ISIPCA (target) group consisted of students who had undergone precise and
specialized olfactory training for two years. In contrast, the Sup'Biotech (control)
group consisted of students who had not received any formal training in smells,
aromas, or other aspects of olfaction.
(six months after T0) and T2 (six months after T1). The control group (Sup'Biotech
students) participated at only two measurement times: T0 (first experiment) and T2
(12 months after T0). This schedule applied to all the tasks considered (evocation,
description and categorization).
2.1.3 Products
The evocation and description tasks were carried out with 12 different odors
in Table 1, representing taxonomic categories (e.g., fruits, flowers) and functional
categories (e.g., edible, bad smell). Each participant had to evoke and describe the
odors they smelled, without having access to visual or auditory cues. Two random
odors were selected from the 12 for each participant, in order to vary the stimuli
and avoid habituation effects (the same odors were used for all three measurement
times).
During the experiment, the participants were given one minute to smell and write
about each product, which was contained in a bottle coded with 3 digits. Two types
of questions were asked:
• For the evocation test, participants were asked: “What comes spontaneously
to your mind when you smell this product?”
• For the free description test, participants were asked: “How would you objec-
tively describe this product?”
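Table 1: Descriptions of the products used in the evocation and description tasks,
as provided by a perfume expert.
Code   Description (French original, with English translation)
282    herbe - végétale (grass - vegetal)
764    arbre - pin grillé (tree - toasted pine)
204    médicament - fleur tubéreuse (medicine - tuberose flower)
590    fruit - prune (fruit - plum)
516    floral - lilas (floral - lilac)
925    nettoyant - pin des landes (cleaner - Landes pine)
946    orange (orange)
126    feu de bois (wood fire)
529    cèdre (cedar)
587    bergamote verte (green bergamot)
673    figue (fig)
366    poubelle (garbage)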
The categorization task was an important part of the olfactory experiment, de-
signed to assess the participants’ ability to group and describe different odors. For
this task, 15 different odors were used, as listed in Table 2. Each participant was
presented with all of the products and asked to form groups based on their sim-
ilarities. After forming the groups, participants were then asked to describe each
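Table 2: Composition of the fragrance ingredients used in the categorization task.
Code      Composition (French original, with English translation)
145       Cacao (cocoa)
236       Fève Tonka (tonka bean)
371       Genêt (broom)
452       Foin (hay)
593       Huile essentielle laurier noble (bay laurel essential oil)
614       Huile essentielle sauge sclarée (clary sage essential oil)
728       Parmanthème
867       Mimosa
989       Tabac (tobacco)
54        Café (coffee)
190       Huile essentielle bois de gaïac (guaiac wood essential oil)
277       Polysantol
312       Diacetyl
423/509   Glycolierral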
2.2.2 Libraries
I have had the opportunity to explore and use different techniques for manipulating
and processing tabular data, as well as for cleaning and normalizing text. I will
present the main libraries and tools I used in this context.
To manipulate and process tabular data, I mainly used the Pandas library, a
powerful and flexible tool for working with structured data. Thanks to its data
structures, especially DataFrames, I was able to select, filter, aggregate and merge
data, as well as compute statistics and transform columns.
For text cleaning and normalization, I used several libraries, namely spaCy, Stanza
and NLTK. spaCy and Stanza are natural language processing (NLP) libraries that
provide advanced text processing features such as tokenization, part-of-speech
tagging, named entity recognition, and lemmatization. NLTK (Natural Language
Toolkit) is a popular NLP library that provides a wide range of tools for cleaning,
normalizing, and parsing text. I used these libraries for operations such as stopword
removal, word normalization and polarity detection.
I also used the Matplotlib and Seaborn libraries for data visualization. Matplotlib
is a very flexible library for creating a wide variety of custom graphs and
visualizations. Seaborn is built on Matplotlib and provides additional functionality
for creating attractive and informative statistical graphics. I used these libraries
to plot the tabular data and the results of my analyses.
For pre-trained models, I used the Gensim library, an NLP library specialized in
processing large text datasets and in vector-based text representation models such
as Word2Vec. I used these pre-trained models to find similarities between words and
to build vector representations of the textual data.
The Transformers library, developed by Hugging Face, is an open-source Python
library for building, training and using deep learning models for text processing
tasks such as text classification, text generation and machine translation. It is
based on the transformer neural network architecture. We used this library to
convert participant responses into vectors in order to measure inter-subject and
intra-subject similarities with the pre-trained CamemBERT [26] and FlauBERT [27]
models.
Finally, to apply pre-trained English models to non-English data, I integrated the
free version of the DeepL API into our data processing pipeline. DeepL is a neural
machine translation service that offers high translation accuracy. This API made it
possible to translate the data into English automatically, making it compatible with
these models and facilitating the analysis and interpretation of the results.
These techniques and tools were essential to my internship work: they allowed me to
perform advanced data manipulation, clean and normalize the text, visualize the
results and exploit pre-trained models for textual analysis.
3 Analysis methodology
3.1 Data preparation
During my internship, I implemented a variety of text pre-processing techniques
to prepare the textual data for subsequent analyses. These techniques included
cleaning, normalization, and text transformation steps (see Figure 8).
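Cleaning and normalization steps of this kind can be illustrated with a minimal, standard-library-only sketch. It is not the spaCy/Stanza/NLTK pipeline actually used during the internship. It assumes French responses, and the stopword set `FRENCH_STOPWORDS` is a small hypothetical sample rather than a real library list.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny French stopword list for illustration only;
# a real pipeline would use the lists shipped with spaCy, Stanza or NLTK.
FRENCH_STOPWORDS = {"le", "la", "les", "l", "un", "une", "de", "des", "du", "d",
                    "et", "en", "à", "au", "aux", "ça", "c", "est", "me"}


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split on runs of non-word characters, so apostrophes and punctuation
    (e.g. "l'herbe,") act as token boundaries; pure numbers are dropped."""
    return [tok for tok in re.split(r"\W+", text) if tok and not tok.isdigit()]


def preprocess(text: str) -> list[str]:
    """Turn a raw participant response into a list of content tokens."""
    return [tok for tok in tokenize(normalize(text)) if tok not in FRENCH_STOPWORDS]


print(preprocess("Ça me fait penser à l'herbe coupée, et au printemps !"))
```

In practice, lemmatization with spaCy or Stanza would typically follow this step so that inflected forms of the same word are counted together.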
$$
S_C(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
$$
where A · B is the dot product of vectors A and B, and ||A|| and ||B|| are the
magnitudes (or norms) of vectors A and B, respectively [28].
In our case, we used cosine similarity to calculate the inter-group similarity in the
evocation and description tasks, as well as the similarity between participants and
a reference response in the description task.
Figure 9 illustrates pairs of vectors whose cosine similarity is close to 1, close to
0, and close to -1.
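The three cases described for Figure 9 can be reproduced with a direct, pure-Python transcription of the formula S_C(A, B). This is only a sketch. The example vectors are arbitrary, and in practice the vectors would come from the embedding models described later, with a library such as NumPy or scikit-learn typically doing the arithmetic.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """S_C(A, B) = (A . B) / (||A|| ||B||), as in the formula above."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Arbitrary illustrative vectors for the three cases
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])       # 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # close to -1
print(same_direction, orthogonal, opposite)
```

Because the measure depends only on the angle, scaling a vector (here [1, 2] versus [2, 4]) does not change the similarity. This is one reason cosine similarity is commonly preferred over raw dot products when comparing embeddings.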
Table 3: An example of a data snippet for the similarity measure using SentenceBERT
as an input for ANOVA (control participants were not tested at T1).
Subject  Group   T0            T1            T2
C1       CTRL    0.2291735284  -             0.2716048087
C10      CTRL    0.3014770065  -             0.2748694107
C11      CTRL    0.2744678091  -             0.1720879887
C12      CTRL    0.2434420459  -             0.1803486546
C13      CTRL    0.2162746218  -             0.2367323786
C14      CTRL    0.2612136049  -             0.2819378045
C15      CTRL    0.1678767215  -             0.2627067502
C16      CTRL    0.295219476   -             0.3641982749
C17      CTRL    0.3213923151  -             0.4006748945
J1       ISIPCA  0.2943289988  0.3441639841  0.1889039491
J10      ISIPCA  0.2493424309  0.4035999533  0.3439996958
J12      ISIPCA  0.323846442   0.2053571258  0.291668835
J14      ISIPCA  0.172436446   0.1559565675  0.1877741717
J2       ISIPCA  0.3896364868  0.2223658532  0.3889127714
J23      ISIPCA  0.2099970592  0.283210285   0.2950525121
J24      ISIPCA  0.2773481565  0.3201470574  0.298510928
J26      ISIPCA  0.3816302046  0.3322616657  0.260517223
J27      ISIPCA  0.4935722649  0.2991049886  0.3196476599
$$
\kappa = \frac{P_o - P_e}{1 - P_e}
$$
where κ is Cohen's kappa coefficient, P_o is the observed agreement, and P_e is the
agreement expected by chance.
If the raters agree perfectly, Cohen's kappa is equal to 1. If their agreement is no
better than chance, κ = 0, and negative values indicate less agreement than would be
expected by chance.
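As a worked illustration of the formula, the sketch below computes P_o as the share of items labeled identically by both raters. It computes P_e from each rater's marginal label frequencies. The label values are hypothetical; in the study, the two "raters" are the reference score and a participant's score.

```python
from collections import Counter


def cohen_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labeling the same items:
    kappa = (Po - Pe) / (1 - Pe)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: probability that two independent raters with these
    # marginal label frequencies give the same label
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one and the same label everywhere: kappa is undefined
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


reference = [1, 1, 0, 0]     # hypothetical reference labels
participant = [1, 0, 0, 0]   # hypothetical participant labels
print(cohen_kappa(reference, participant))
```

Here P_o = 3/4 and P_e = (2·1 + 2·3)/16 = 1/2, so κ = (0.75 − 0.5)/(1 − 0.5) = 0.5. Raw agreement is 75%, but only half of the agreement beyond chance is achieved.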
Figure 10: Clustering example with kMeans (left) and Mean shift (right). These
two clusterings have an adjusted Rand index of 94%
Using the Rand index as the similarity measure, we can compute the clustering
accuracy at each iteration of a clustering process. Take Figure 11 as an example.
The pairs placed in the same cluster (i.e., the same color) in both P1 and P2 are
(a1, a2), (a1, a3), (a2, a3), (a5, a6) and (a8, a9). The pairs placed in different
clusters in both P1 and P2 are (a1, a5), (a1, a6), (a1, a7), (a1, a8), (a1, a9),
(a2, a5), (a2, a6), (a2, a7), (a2, a8), (a2, a9), (a3, a5), (a3, a6), (a3, a7),
(a3, a8), (a3, a9), (a4, a7), (a4, a8), (a4, a9), (a5, a8), (a5, a9), (a6, a8) and
(a6, a9). Of the C(9, 2) = 36 possible pairs, 5 + 22 = 27 are therefore treated
consistently by the two partitions:
$$
\mathrm{Rand}(P_1, P_2) = \frac{5 + 22}{36} = 75\%
$$
When the partition Pi obtained at iteration i is compared with the final partition
Pf, the Rand index generally increases as the process converges. At the final
iteration, where Pi = Pf, Rand(Pi, Pf) = 1, which indicates that the process
completes with 100% accuracy.
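The pair-counting definition can be checked with a short sketch. Figure 11 itself is not reproduced here, so the two partitions below are an assumption: they are one pair of partitions of a1..a9 that is consistent with every pair listed above, not necessarily the ones drawn in the figure.

```python
from itertools import combinations


def rand_index(labels_1: list, labels_2: list) -> float:
    """Rand index between two partitions of the same items, given as per-item
    cluster labels: (a + b) / C(n, 2), where a counts pairs grouped together in
    both partitions and b counts pairs separated in both."""
    if len(labels_1) != len(labels_2) or len(labels_1) < 2:
        raise ValueError("need two labelings of the same items (n >= 2)")
    agreements = 0
    total = 0
    for i, j in combinations(range(len(labels_1)), 2):
        same_1 = labels_1[i] == labels_1[j]
        same_2 = labels_2[i] == labels_2[j]
        agreements += same_1 == same_2  # together in both, or apart in both
        total += 1
    return agreements / total


# Items a1..a9 (assumed partitions, consistent with the pairs listed above)
P1 = [1, 1, 1, 1, 2, 2, 2, 3, 3]  # {a1,a2,a3,a4}, {a5,a6,a7}, {a8,a9}
P2 = [1, 1, 1, 2, 2, 2, 3, 3, 3]  # {a1,a2,a3}, {a4,a5,a6}, {a7,a8,a9}
print(rand_index(P1, P2))  # (5 + 22) / 36 = 0.75
```

Cluster labels are compared only within each partition, so renaming clusters (for example, 1, 2, 3 into "red", "blue", "green") does not change the result.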
In our approach, we used the Rand index to quantify the overall agreement between
the participants' categorizations and the reference categories. The Rand index
measures the similarity between two partitions of the same set of objects by
counting the pairs of objects that are treated in the same way in both (grouped
together in both, or separated in both). This measure allowed us to evaluate
performance on the categorization task in terms of overall agreement with the
reference categories.
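Figure 10 reports an adjusted Rand index (ARI), a chance-corrected version of the Rand index. On the ARI scale, 1 means identical partitions and values near 0 are expected for random ones. The sketch below applies the standard Hubert and Arabie formula to the contingency table of two labelings. scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same quantity and should give matching values. The nine-item partitions are a hypothetical example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_1: list, labels_2: list) -> float:
    """ARI = (index - expected) / (max_index - expected), where index sums
    C(n_ij, 2) over the contingency table of the two labelings."""
    if len(labels_1) != len(labels_2) or len(labels_1) < 2:
        raise ValueError("need two labelings of the same items (n >= 2)")
    n = len(labels_1)
    # n_ij: number of items in cluster i of partition 1 and cluster j of partition 2
    contingency = Counter(zip(labels_1, labels_2))
    index = sum(comb(n_ij, 2) for n_ij in contingency.values())
    sum_rows = sum(comb(a_i, 2) for a_i in Counter(labels_1).values())
    sum_cols = sum(comb(b_j, 2) for b_j in Counter(labels_2).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Only happens when both partitions are all singletons or both are a
        # single cluster, i.e. they group the items identically
        return 1.0
    return (index - expected) / (max_index - expected)


# Hypothetical nine-item example
P1 = [1, 1, 1, 1, 2, 2, 2, 3, 3]
P2 = [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(round(adjusted_rand_index(P1, P2), 3))
```

For these partitions, the unadjusted Rand index is 0.75, but the ARI is (5 − 2.5)/(9.5 − 2.5) ≈ 0.357. The correction removes the agreement that two random partitions with the same cluster sizes would already produce.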
There are several other approaches to measure agreement between two or more meth-
ods of measurement. Some common measures of agreement include the concordance
correlation coefficient, total deviation index, and limits of agreement. These mea-
sures tend to be scalar quantities, with either small or large values implying good
agreement between the methods. Perfect agreement is implied by specific boundary
values of the measures.
Word embedding models capture various semantic relationships between words, for
example "king" - "queen" (male/female), "swimming" - "swam" (verb tense) and
"Paris" - "France" (capital/country), as illustrated in Figure 12. The vector
offsets associated with these pairs reflect meaningful patterns such as gender
associations, verb tense transformations, and geographic hierarchies. Word
embeddings therefore provide valuable insights for language processing and analysis
tasks.
In our study, we used this technique to represent the participants' responses as vectors. This allowed us to compare and analyze the similarities and differences between the responses of different participants in the description and evocation tasks.
Word embedding is grounded in the distributional hypothesis [32] (often referred to as the "Harris" hypothesis), which states that words used in similar contexts tend to have related meanings. Embeddings represent words with dense vectors of much lower dimension than classical sparse vector-space models (such as one-hot or count-based representations, whose dimension grows with the vocabulary), which facilitates learning tasks involving these words because they are less affected by dimension-related challenges [33].
To use data effectively in machine learning, we need a mathematical representation of it, typically in the form of vectors. Word embeddings provide such a representation for words. By encoding all the words of a dictionary in this way, it becomes possible to compare word vectors, for instance by measuring the angle between them with cosine similarity or by computing the Euclidean distance between them.
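The two comparisons mentioned above can be sketched in a few lines of NumPy. The three-dimensional vectors are hypothetical toy values, not real embeddings.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u, v):
    """Straight-line distance between two vectors (0 = identical)."""
    return float(np.linalg.norm(np.asarray(u, dtype=float) - np.asarray(v, dtype=float)))

# Hypothetical 3-dimensional "embeddings", for illustration only
u = [1.0, 0.0, 1.0]
v = [2.0, 0.0, 2.0]   # same direction as u, twice as long
w = [0.0, 1.0, 0.0]   # orthogonal to u
print(cosine_similarity(u, v), euclidean_distance(u, v))
print(cosine_similarity(u, w))
```

The pair u, v illustrates the design difference: cosine similarity ignores vector length and reports them as maximally similar, whereas the Euclidean distance between them is not zero.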
Before discussing some word embedding algorithms in detail, it is important to note that participants gave their responses both as single words and as sentences. It is therefore necessary to split (tokenize) the sentences into words so that word embedding models can be applied. Afterward, we explore approaches that provide vector representations of entire sentences.
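A minimal sketch of the tokenization step, followed by one simple baseline for a sentence vector (averaging the word vectors of known tokens). Averaging is an assumption chosen for illustration, not necessarily the method used in this study, and the two-dimensional embeddings are hypothetical.

```python
import re
import numpy as np

def tokenize(text):
    """Lower-case the text and split it into word tokens (Unicode letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())

def sentence_vector(text, embeddings):
    """Average the vectors of the tokens found in `embeddings`; None if no token is known."""
    vectors = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vectors:
        return None
    return np.mean(np.asarray(vectors, dtype=float), axis=0)

# Hypothetical 2-dimensional embeddings, for illustration only
embeddings = {"odeur": [1.0, 0.0], "boisée": [0.0, 1.0]}
print(tokenize("Une odeur boisée, l'été"))
print(sentence_vector("Une odeur boisée", embeddings))
```

This regular-expression tokenizer splits French elisions crudely ("l'été" becomes "l" and "été"); a dedicated French tokenizer would handle such cases more carefully.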
3.3.2 Word2Vec
In 2013, Tomas Mikolov and his team at Google published two significant papers
[34, 31] that introduced novel models for effectively estimating vector representations
of words using large datasets. The research focused on addressing the challenge of
capturing semantic relationships between words in a continuous vector space.
One of the groundbreaking techniques introduced in these papers was Word2Vec, a widely recognized natural language processing algorithm. Word2Vec uses a neural network model to learn word associations from large text corpora. The algorithm maps words to continuous vector representations, so that mathematical operations on these vectors reveal meaningful relationships between words.
Once trained, a Word2Vec model can exhibit impressive capabilities. For exam-
ple, it can identify words that are semantically similar or related to a given word.
This ability makes Word2Vec valuable for a variety of NLP tasks, including infor-
mation retrieval, sentiment analysis, machine translation, and more. Additionally,
the model can suggest additional words that could fit contextually within a partial
sentence, offering potential applications in text completion and generation.
To generate distributed word representations, Word2Vec models can use two primary architectures: Continuous Bag of Words (CBOW) and Continuous Skip-gram. These architectures are illustrated in Figure 13, taken from the original paper [34]. The CBOW architecture predicts a target word from the surrounding context words, whereas the Skip-gram architecture predicts the context words given a target word.
1. Continuous Bag of Words (CBOW):
• Input: In the CBOW architecture, the input consists of a context window comprising the surrounding words. For instance, in the sentence "The cat sat on the mat," with a context window size of two, the input for the word "sat" would be the words "The," "cat," "on," and "the".
• Projection: The input words are encoded as one-hot vectors, representing individual words within a predefined vocabulary. These one-hot vectors are mapped through a shared weight matrix to their embeddings, which are averaged in the projection layer.
• Output: The primary objective of CBOW is to predict the target word based on its contextual information. The output layer of the neural net-
Figure 13: Architectural comparison of CBOW and Skip-gram models for word
representation learning.
In both architectures, the input and output layers are connected via a hidden layer,
known as the projection layer. The projection layer has a lower dimensionality
compared to the input and output layers, and its purpose is to learn meaningful
representations of words in a continuous vector space.
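The sketch below, under simplifying assumptions, builds the CBOW and Skip-gram training pairs for the example sentence and runs one untrained CBOW forward pass (average the context embeddings, then a softmax over the vocabulary). The embedding size and random weights are hypothetical, and training (updating W_in and W_out, typically with negative sampling or hierarchical softmax) is omitted.

```python
import numpy as np

def cbow_pairs(tokens, window=2):
    """(context words, target word) pairs, as used by CBOW."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window=2):
    """(target word, one context word) pairs, as used by Skip-gram."""
    return [(target, c) for context, target in cbow_pairs(tokens, window) for c in context]

def cbow_forward(context_ids, W_in, W_out):
    """Probability of each vocabulary word being the target, given the context word ids."""
    # Multiplying a one-hot vector by W_in selects one row, so we index directly;
    # the projection layer averages the context embeddings.
    h = W_in[context_ids].mean(axis=0)
    scores = W_out @ h                  # one score per vocabulary word
    scores = scores - scores.max()      # numerical stability for the softmax
    probs = np.exp(scores)
    return probs / probs.sum()

tokens = "The cat sat on the mat".split()
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
context, target = cbow_pairs(tokens)[2]
print(context, target)  # ['The', 'cat', 'on', 'the'] sat

rng = np.random.default_rng(0)
dim = 8  # hypothetical embedding size
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))
probs = cbow_forward([vocab[w] for w in context], W_in, W_out)
```

After training, the rows of W_in serve as the word vectors. Libraries such as gensim provide complete implementations (its Word2Vec class switches between CBOW and Skip-gram with the sg parameter).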
3.3.3 GloVe
GloVe, short for Global Vectors for Word Representation, is an unsupervised learning algorithm developed at Stanford University [36]. It generates word embeddings from a global word-context co-occurrence matrix computed over a corpus. The resulting embeddings exhibit interesting linear substructures in the vector space.
Figure 15: Co-occurrence matrix for three sentences with two words as context.
Figure 16: Conceptual model for the GloVe model’s implementation from
towardsdatascience.com.
It is important to note that Word2Vec and GloVe models work in a similar way.
They both try to create a vector space where the position of each word depends on
its neighboring words based on their meaning and context. The difference is that
Word2Vec uses local examples of word pairs that co-occur, while GloVe uses global
statistics of word co-occurrence across the whole corpus.
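The global statistics GloVe starts from can be sketched as symmetric co-occurrence counts within a context window of two words, as in Figure 15. The sentence is a hypothetical example, and this sketch uses plain counts, whereas the GloVe paper weights each co-occurrence by the inverse of the distance between the two words. GloVe then learns vectors whose dot products (plus bias terms) approximate the logarithm of these counts.

```python
from collections import defaultdict

def cooccurrence_counts(sentences, window=2):
    """Symmetric word-word co-occurrence counts within `window` words on each side."""
    counts = defaultdict(float)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[(word, tokens[j])] += 1.0
    return counts

X = cooccurrence_counts(["The cat sat on the mat"])
print(X[("the", "sat")])  # 2.0: "sat" is within two words of both occurrences of "the"
print(X[("the", "cat")])  # 1.0
```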
3.3.4 Transformers
In the field of natural language processing, word embedding techniques have revo-
lutionized the way we represent and understand the meaning of words. However,
traditional embedding models such as Word2Vec and GloVe are limited in their ability to capture complex relationships between words and to grasp long-range context. This is where transformers come in.
3 SGD (stochastic gradient descent) is a technique used to optimize the parameters of a machine learning algorithm. It makes small, iterative changes to the parameters of the network, each computed on a randomly selected subset of the data, in order to reduce the network's error.
Artificial intelligence in general has evolved significantly since the introduction of transformers. This architecture, shown in Figure 17, was introduced by Ashish Vaswani and his team in the paper Attention Is All You Need [37]. By outperforming models based on recurrent neural networks 4, transformers have revolutionized the way we approach text processing tasks such as machine translation, text generation and many others.
Figure 17: Diagram of a transformer architecture, showing the encoder and decoder
components with self-attention layers and multi-head attention mechanisms.
the entire sequence. This ability to model interactions between words without re-
lying on a recurrent structure has marked a turning point in the field of natural
language processing.
Figure 18: Scaled dot-product attention mechanism. Queries, keys, and values are
input into the attention mechanism.
Figure 19: Multi-head attention mechanism. Queries, keys, and values are linearly
projected multiple times.
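The mechanisms in Figures 18 and 19 can be sketched in NumPy: scaled dot-product attention computes softmax(QKᵀ/√d_k)V, and multi-head attention applies it to several linear projections of the input before concatenating the results. The random weight matrices are hypothetical, and the other parts of a transformer layer (positional encodings, residual connections, layer normalization, feed-forward sublayer) are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V; positions where mask is False are blocked."""
    d_k = Q.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Project X into n_heads query/key/value subspaces, attend in each, then recombine."""
    n, d_model = X.shape
    if d_model % n_heads:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads

    def split_heads(M):  # (n, d_model) -> (n_heads, n, d_head)
        return M.reshape(n, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split_heads(X @ Wq), split_heads(X @ Wk), split_heads(X @ Wv)
    heads, _ = scaled_dot_product_attention(Q, K, V)        # (n_heads, n, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)   # concatenate the heads
    return concat @ Wo

rng = np.random.default_rng(0)
n_tokens, d_model, n_heads = 4, 8, 2
X = rng.normal(size=(n_tokens, d_model))   # hypothetical token representations
Wq, Wk, Wv, Wo = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads)
print(out.shape)  # (4, 8)
```

Because every token attends to every other token in one matrix product, the model relates distant words directly instead of passing information step by step as a recurrent network does.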
sentences in English. Although these models are all pre-trained transformers, they
have been trained on different data sets and for different tasks.
where n is the number of data points, $x_i$ and $y_i$ are the i-th values of variables x and y, and $\bar{x}$ and $\bar{y}$ are their respective means. Repeat this computation for each pair of variables to fill the covariance matrix.
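3. Calculate Eigenvectors and Eigenvalues: the covariance matrix is symmetric, so it can be diagonalized using eigendecomposition. The eigenvectors represent the directions (principal axes) in the original feature space, while the eigenvalues indicate their importance, i.e., the variance explained along each direction. To calculate the eigenvectors and eigenvalues, solve the equation:
$$C\mathbf{v} = \lambda \mathbf{v}$$
where C is the covariance matrix, $\mathbf{v}$ an eigenvector and $\lambda$ the corresponding eigenvalue.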
$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / (2\sigma_i^2)\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / (2\sigma_i^2)\right)}$$
where $p_{j|i}$ represents the similarity of point $x_j$ to point $x_i$, $\sigma_i$ is the standard deviation of the Gaussian distribution centred on $x_i$, and the sum in the denominator ensures that, for each $i$, the similarities form a probability distribution.
• Step 2: Compute Affinities
In this step, the conditional similarities are symmetrized and normalized into joint probabilities, $p_{ij} = (p_{j|i} + p_{i|j}) / (2n)$, where n is the number of points. The corresponding affinity $q_{ij}$ between the low-dimensional counterparts $y_i$ and $y_j$ of $x_i$ and $x_j$ is given by a Student t-distribution with one degree of freedom:
$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$
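$$y_i^{(t+1)} = y_i^{(t)} - \eta \frac{\partial C}{\partial y_i}$$
where $y_i^{(t+1)}$ is the new position of point $y_i$ at iteration t+1, $y_i^{(t)}$ is its current position, $\eta$ is the learning rate, and $\partial C / \partial y_i$ is the computed gradient; the minus sign moves each point against the gradient so that the cost C decreases.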
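The steps above can be sketched in NumPy, assuming, as in standard t-SNE, that the cost C is the Kullback-Leibler divergence between P and Q. This is a simplified sketch: it uses the same hypothetical σ for every point instead of tuning each $\sigma_i$ by binary search to match a chosen perplexity, and it performs plain gradient descent without the momentum and early exaggeration used in practice. Production implementations such as scikit-learn's `sklearn.manifold.TSNE` handle these details.

```python
import numpy as np

def pairwise_sq_dists(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def conditional_p(X, sigma):
    """Row i holds p_{j|i}: Gaussian similarities centred on x_i, normalised over k != i."""
    logits = -pairwise_sq_dists(X) / (2.0 * sigma[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)               # exclude k == i
    logits -= logits.max(axis=1, keepdims=True)     # stability; ratios are unchanged
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def joint_p(P_cond):
    """Symmetrised joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    return (P_cond + P_cond.T) / (2.0 * P_cond.shape[0])

def student_q(Y):
    """Low-dimensional affinities q_ij (Student t, one degree of freedom) and their numerators."""
    num = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(num, 0.0)                      # exclude k == l
    return num / num.sum(), num

def kl_cost(P, Y):
    """C = KL(P || Q), summed over pairs with p_ij > 0."""
    Q, _ = student_q(Y)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def kl_gradient(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1."""
    Q, num = student_q(Y)
    coeff = (P - Q) * num                           # (n, n)
    diff = Y[:, None, :] - Y[None, :, :]            # (n, n, d)
    return 4.0 * np.sum(coeff[:, :, None] * diff, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))                       # hypothetical high-dimensional data
P = joint_p(conditional_p(X, sigma=np.full(30, 2.0)))
Y = rng.normal(scale=1e-2, size=(30, 2))            # small random initial embedding
eta = 100.0                                         # hypothetical learning rate
for _ in range(200):
    Y = Y - eta * kl_gradient(P, Y)                 # y^(t+1) = y^(t) - eta * dC/dy
```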
Figure 20 illustrates the results of applying both PCA and t-SNE dimensionality reduction to 5000 samples from the MNIST handwritten digit dataset [41]. The plots show the data points projected into 2D space, with different colors representing different digits. Neither method separates the digits perfectly, but t-SNE produces clearly more distinct and better-separated clusters, which illustrates its effectiveness for visualizing complex high-dimensional data.
Figure 20: Comparison of PCA (on the left) and t-SNE (on the right) on 5000
samples from the MNIST handwritten digit dataset (from ml-lectures.org).
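For comparison with the t-SNE sketch, the PCA steps described earlier (centring, covariance matrix, eigendecomposition, projection) can be sketched as follows. The data are hypothetical, and the covariance uses the sample (n − 1) denominator, which is an assumption. scikit-learn's `sklearn.decomposition.PCA` offers a production implementation.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X (n_samples x n_features) onto its top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # centre each feature
    C = Xc.T @ Xc / (X.shape[0] - 1)              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)          # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # sort by decreasing explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]        # principal axes as columns
    return Xc @ components, eigvals, components

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # hypothetical correlated data
Z, eigvals, components = pca(X, n_components=2)
explained = eigvals[:2] / eigvals.sum()           # share of variance kept by each axis
print(Z.shape, explained)
```

PCA is a linear projection chosen to preserve global variance, whereas t-SNE optimizes a non-linear embedding that preserves local neighbourhoods, which is why the clusters in Figure 20 separate better with t-SNE.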
4 Results
The main objective of this study was to evaluate the effect of olfactory learning on
the sensory perception of perfumery students. We sought to understand how this
learning could influence their ability to evoke associations and spontaneous thoughts
related to odors, to provide accurate descriptions of perceived odors, and to carry
out an appropriate categorization of olfactory products.
Figure 21: Average number of answers given per group and per measurement time
for the description task.
t(df) = −8.95, p < 0.05. In addition, a significant difference was observed between T1 and T2, t(df) = −5.00, p < 0.05.
Vocabulary
Figure 22: Frequency of the words most used by the ISIPCA group for the
description task.
Figure 22 shows the frequency of the words most used by the ISIPCA group (perfumery students) during the description task. The words are ordered by frequency of use, with the most frequent words displayed first.
The analysis revealed that the ISIPCA group more frequently uses words specific to the field of perfumery. Among the words most used by this group are terms such as "boisé" (woody), "zesté" (zesty) and "fruité" (fruity).
Figure 23 shows the frequency of the words most used by the control group. Compared with the ISIPCA group, the control group uses a vocabulary less specific to perfumery, with a more balanced distribution across the different categories of words.
Figure 23: Frequency of the words most used by the control group for the
description task.
The results in Figure 24 show that the number of words used from the selected vocabulary increased significantly over time. At T0, participants used an average of 4 words from the selected vocabulary; this number increased to 5 words at T1 and reached 7 words on average at T2. The ANOVA revealed a highly significant effect of time, F(2, 148) = 28.4, p < 0.001, indicating that the number of vocabulary words used differs significantly between measurement times.
Post-hoc tests revealed significant differences between the learning times for the number of words used from the selected vocabulary in the ISIPCA group. A significant improvement was observed between T0 (M = 3.55, SD = 2.27) and T1 (M = 5.05, SD = 3.06) (t(df) = −2.79, p < 0.05), and a significant increase between T0 and T2 (M = 7.37, SD = 2.64) (t(df) = −8.22, p < 0.05). A significant difference was also observed between T1 and T2 (t(df) = −4.40, p < 0.05). These results underline the positive effect of learning on the use of the selected vocabulary over time.
Figure 24: Number of words used from the vocabulary selected (40% of the most
frequent words) by the ISIPCA group.
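The vocabulary selection behind Figure 24 (the 40% most frequent words) can be sketched as follows. The report does not specify every detail, so this sketch assumes that the 40% is taken over distinct words (rounded up), that a response is credited with each distinct vocabulary word it contains, and that ties at the cutoff follow `Counter` order. The responses are hypothetical.

```python
from collections import Counter

def select_vocabulary(responses, top_percent=40):
    """Keep the top `top_percent`% most frequent distinct words (rounded up)."""
    counts = Counter(word for response in responses for word in response)
    k = -(-top_percent * len(counts) // 100)   # integer ceiling of top_percent% of the distinct words
    return {word for word, _ in counts.most_common(k)}

def vocabulary_hits(response, vocabulary):
    """Number of distinct words of one response that belong to the vocabulary."""
    return len(set(response) & vocabulary)

# Hypothetical tokenised responses, for illustration only
responses = [["boisé", "fruité", "frais"], ["boisé", "zesté"], ["fruité", "boisé", "doux"]]
vocab = select_vocabulary(responses)
print(sorted(vocab))                                    # ['boisé', 'fruité']
print([vocabulary_hits(r, vocab) for r in responses])   # [2, 1, 2]
```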
used in their answers. We used the spaCy library for the initial part-of-speech (morphosyntactic) tagging, followed by a manual correction of the tags to ensure the accuracy of the results. The data obtained allow us to present the results of this analysis and to highlight the grammatical categories most used by the participants.
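Once the tags are corrected, the share of adjectives and nouns in a set of responses can be computed as below. This sketch works on hypothetical (token, tag) pairs using the Universal POS labels that spaCy exposes through `Token.pos_`; the report does not state which spaCy model was used or whether counts or proportions were analyzed, so proportions are an assumption.

```python
from collections import Counter

def pos_proportions(tagged_tokens, tags=("ADJ", "NOUN")):
    """Share of tokens carrying each of the given part-of-speech tags."""
    total = len(tagged_tokens)
    if total == 0:
        return {tag: 0.0 for tag in tags}
    counts = Counter(pos for _, pos in tagged_tokens)
    return {tag: counts[tag] / total for tag in tags}

# Hypothetical (token, tag) pairs after manual correction, for illustration only.
# With spaCy they could be built as [(t.text, t.pos_) for t in nlp(text)].
tagged = [("odeur", "NOUN"), ("boisée", "ADJ"), ("très", "ADV"), ("sucrée", "ADJ")]
print(pos_proportions(tagged))  # {'ADJ': 0.5, 'NOUN': 0.25}
```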
Figure 25 illustrates the evolution of the use of adjectives and nouns over time (the other grammatical categories were not analyzed because they were rarely used). The results indicate a significant increase in the use of adjectives, while the use of nouns gradually decreased.
Figure 25: Evolution of the use of adjectives (on the left) and nouns (on the right) over time for ISIPCA students.
The ANOVA revealed a significant effect of time on the use of adjectives, F(2, 0.15) = 3.22, p < 0.05. Using post-hoc tests, we examined the differences between each pair of measurement times. The post-hoc tests showed significant differences between T0 (M = 0.46, SD = 0.52) and T2 (M = 0.63, SD = 0.50) (t(df) = −2.39, p < 0.05), as well as between T1 (M = 0.53, SD = 0.51) and T2 (t(df) = −2.24, p < 0.05), confirming a significant increase in the use of adjectives over time. However, no significant difference was observed between T0 and T1 (t(df) = −0.50, p > 0.05).
An analysis of variance also revealed a significant effect of time on the use of nouns, F(2, 0.09) = 5.56, p < 0.05. However, the post-hoc tests did not reveal significant differences between measurement times, with the exception of T0 (M = 0.75, SD = 0.63) versus T2 (M = 0.50, SD = 0.59), which differed significantly (p < 0.05).
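Post-hoc results in this section are reported as t(df). Assuming the comparisons are paired t-tests on the same participants (the report does not state the exact procedure or any multiple-comparison correction), the statistic and its degrees of freedom (n − 1 for n participants) can be sketched as follows; the scores are hypothetical. SciPy's `scipy.stats.ttest_rel` computes the same statistic together with a p-value.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic for before - after differences, with df = n - 1."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    se = sd / math.sqrt(n)                      # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1

# Hypothetical scores of the same four participants at T0 and T2
t0 = [3, 4, 5, 6]
t2 = [5, 5, 8, 7]
t, df = paired_t(t0, t2)
print(f"t({df}) = {t:.2f}")   # negative because scores increase from T0 to T2
```

The sign convention explains the negative t values reported above: subtracting the later score from the earlier one gives negative differences when the measure increases.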
Figure 26: Visualization of participant responses in 2D space using PCA for the
description task.
Figure 27: Visualization of participant responses in 2D space using t-SNE for the description task.
This difference in clustering between the two visualization techniques highlights the
ability of the t-SNE technique to reveal more subtle structures and similarities in
participants’ responses, particularly in the ISIPCA group. These observations will
be taken into account in subsequent analysis to better understand the differences
and characteristics of the perfumery students’ responses compared to the control
group.
Cosine similarity
Figure 28 shows the evolution of within-subject cosine similarity over time (T0, T1 and T2). Here, cosine similarity measures the semantic proximity between the responses given by the same participant. For the ISIPCA group, cosine similarity decreases over time, indicating increasingly divergent responses, which may reflect the richness and diversity of the responses provided by perfumery students. However, statistical analyses revealed no significant differences, either over time or between groups, so this decrease cannot be reliably attributed to the passage of time or to membership of the ISIPCA group.
Figure 28: Graph illustrating the evolution of intra-subject cosine similarity over
time using FlauBERT for the description task.
Euclidean distance
Figure 29 shows the evolution of within-subject Euclidean distance over time (T0, T1 and T2) for the ISIPCA group. Euclidean distance measures the semantic dissimilarity between the responses given by the same participant. We observe an increase in Euclidean distance, reflecting a greater divergence of responses over time, which again points to the richness and diversity of the responses provided by perfumery students. This increase was statistically significant both between groups and across time. Post-hoc tests confirmed these results, revealing a significant difference between T0 and T2 (t(df) = −4.69, p < 0.05) as well as between T1 and T2 (t(df) = −2.88, p < 0.05). In contrast, no significant difference was observed between T0 and T1 (t(df) = −1.72, p > 0.05).
Figure 29: Graph showing the evolution of intra-subject Euclidean distance over
time using FlauBERT for the description task.
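The report does not spell out how the intra-subject scores are aggregated. One plausible definition, used here purely as an assumption, is the mean over all pairs of a participant's response vectors at a given time point. In the study these vectors come from FlauBERT; the two-dimensional vectors below are hypothetical.

```python
import numpy as np
from itertools import combinations

def intra_subject_scores(vectors):
    """Mean pairwise cosine similarity and Euclidean distance among one participant's responses."""
    V = np.asarray(vectors, dtype=float)
    if len(V) < 2:
        raise ValueError("need at least two responses per participant")
    cosines, distances = [], []
    for i, j in combinations(range(len(V)), 2):
        u, v = V[i], V[j]
        cosines.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
        distances.append(np.linalg.norm(u - v))
    return float(np.mean(cosines)), float(np.mean(distances))

# Hypothetical response embeddings of one participant at one time point
responses = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mean_cos, mean_dist = intra_subject_scores(responses)
print(round(mean_cos, 3), round(mean_dist, 3))
```

A lower mean cosine similarity or a higher mean Euclidean distance both indicate more diverse responses; the two measures can disagree because cosine similarity ignores vector length.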
Figure 30: Evolution of the average number of words used over time for the
ISIPCA group and the control group for the evocation task.
The post-hoc tests also provided additional information on the differences between measurement times. In the ISIPCA group, these tests showed significant differences between T0 (M = 7.5, SD = 3.53) and T2 (M = 11.1, SD = 3.88), as well as between T1 (M = 8.72, SD = 5.46) and T2. This suggests a gradual improvement in word use over time among perfumery students.
In contrast, the control group showed a decrease in the average number of words used
over time. Although this decrease was less pronounced than the increase observed
among perfumery students, it was nonetheless statistically significant.
Terminology analysis
Figure 31 shows the results of the lexical diversity analysis of the ISIPCA group for the sensory evocation task. Perfumery students used a wide range of perfume-specific terms, highlighting their expertise in sensory language. The most frequently used words, such as "fleur" (flower), "citron" (lemon) and "parfum" (perfume), reflect the precision with which they expressed their olfactory sensations. These results are consistent with the description task, in which the ISIPCA group also predominantly used perfume-specific terms.
Figure 31: Frequency of the words most used by the ISIPCA group for the
evocation task.
Figure 32 shows the results of the lexical diversity analysis of the control group for the sensory evocation task. In contrast to the ISIPCA group, the control group used fewer perfume-specific terms and instead favored verbs such as "faire" (to do, to make), "penser" (to think) and "ménager". This use of more general verbs suggests a different approach to sensory evocation, in which the control group may have favored more abstract or conceptual aspects when evoking fragrances.
Figure 33 shows the evolution of the number of words used by the ISIPCA group during the sensory evocation task. The results show a significant increase in the number of words used over time: participants used an average of 4 words at the start of the study (T0) and 5 words at T2. Post-hoc tests revealed significant differences between T0 and T2 and between T1 and T2, underlining a continued improvement in the perfumery students' sensory vocabulary.
Figure 32: Frequency of the words most used by the control group for the
evocation task.
Figure 33: Evolution of the number of words used by the ISIPCA group during the
sensory evocation task using a vocabulary based on the most frequently used
words.
Figure 34: Evolution of the use of grammatical categories by the ISIPCA group during the sensory evocation task at the different measurement times (T0, T1 and T2).
evocation task in Figure 36 does not reveal any significant clusters or groupings. The points of the two groups are spread uniformly across the reduced space, indicating no tendency for participants' responses to resemble one another or to group together.
In contrast, the data from the description task showed marked clusters of points in Figure 27, indicating a degree of similarity and consistency in the descriptions. This highlights the difference between the two tasks: the evocation task yields a uniform dispersion of points without clusters, while the description task yields more marked groupings.
Figure 35: Scatterplot of responses used by the ISIPCA group and the control group during the sensory evocation task using PCA.
Intra-subject distance
Intra-subject distance measures the diversity of the responses given by each participant during the sensory evocation task. In Figure 37, we present the evolution of within-subject distance for the perfumery school group and the control group over time.
We observe an increase in within-subject semantic distance among perfume school students over time, indicating greater variety in the responses they provided. This suggests that perfumery students used a wider range of words and sensory associations as they progressed through their training. When comparing the perfumery school group with the control group, however, we found no significant difference in within-subject semantic distance (p > 0.05), meaning that the difference in response diversity between the two groups is not statistically significant. Nevertheless, the diversity of responses increased significantly in the perfumery school group between T0 and T2 (p < 0.05).
Figure 36: Scatterplot of responses used by the ISIPCA and control group for the
sensory evocation task, generated using t-SNE.
formed by participants varied as a function of time, and whether there were any
differences between the two groups.
The results indicate that, for both groups, the average number of groups formed re-
mained relatively constant throughout the study. At T0, participants in both groups
formed a similar number of groups on average. This trend was also maintained at
subsequent measurements, both at T1 and T2. There were no significant variations
in the number of groups formed between the different measurement periods for either
group.
Figure 37: Graph showing the evolution of intra-subject Euclidean distance over
time using FlauBERT for the evocation task.
Figure 38: Average number of groups formed by participants over time for the
ISIPCA and Control groups.
the ISIPCA and Control groups were not statistically significant, with a p-value greater than 0.05. This means that there are no significant differences in Rand coefficients between measurement times or between the two groups.
Figure 40 shows the adjusted Rand coefficient between the reference odor partition given by an expert and the partitions produced by participants in both groups (ISIPCA and control) over time. The adjusted Rand values are lower than the unadjusted Rand values, but they still increase slightly over time for ISIPCA participants.
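The adjusted Rand index corrects the Rand index for the agreement expected by chance, so a value of 0 corresponds to chance-level agreement (negative values are possible) and 1 to identical partitions. A minimal sketch of the usual (Hubert-Arabie) formula follows, on hypothetical labels rather than the study's data; scikit-learn's `sklearn.metrics.adjusted_rand_score` provides the same measure.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)   # pairs grouped together in both, by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:   # only when both are all singletons, or both a single cluster
        return 1.0
    return (pairs_both - expected) / (max_index - expected)

# Hypothetical partitions of nine odors
p1 = [0, 0, 0, 0, 1, 1, 1, 2, 2]
p2 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(adjusted_rand_index(p1, p2))
```

For these labels the unadjusted Rand index is 0.75, whereas the adjusted value is about 0.36, which illustrates why adjusted coefficients are typically much lower than raw Rand coefficients.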
At time T0, the adjusted Rand coefficient was 0.08 for ISIPCA participants and 0.1
for the control group. At time T1, the adjusted Rand coefficient increased to 0.1 for
ISIPCA participants and decreased slightly to 0.09 for the control group. Finally,
at time T2, the adjusted Rand coefficient reached 0.12 for ISIPCA participants and
Figure 39: Rand coefficient between the reference partition and the participants' partitions.
Figure 40: Adjusted Rand coefficient between the reference partition and the
participants’ partitions.
Figure 41: Cohen's kappa between the reference partition and the participants' partitions.
We observe a slight decrease in Cohen's kappa for the ISIPCA group over time, suggesting a decrease in agreement in odor categorization between participants and the reference partition. However, this decrease is not statistically significant, so it may simply reflect chance variation.
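Cohen's kappa compares the observed agreement $p_o$ with the agreement $p_e$ expected from each labeling's category frequencies: $\kappa = (p_o - p_e) / (1 - p_e)$. Kappa requires both labelings to use the same category names, so participants' free groups would first have to be matched to the expert's categories; the report does not describe that step, and the labels below are hypothetical. scikit-learn's `sklearn.metrics.cohen_kappa_score` implements the same formula.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two labelings, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:   # both labelings use one and the same single category
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical categories for four odors: expert reference vs one participant
reference = ["floral", "floral", "woody", "woody"]
participant = ["floral", "woody", "woody", "woody"]
print(cohen_kappa(reference, participant))  # 0.5
```

Here the raw agreement is 0.75, but half of the items would be expected to match by chance given the label frequencies, so kappa drops to 0.5.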
5 Discussion
In this chapter, we will discuss the results obtained during the study on the tasks
of olfactory evocation, description and categorization among ISIPCA students and
the control group. We will analyze the main observations, the implications and the
limits of our study.
similar number of groups compared to the control group, indicating some stability
in their categorization ability.
Analysis of agreement between participants and the reference partition established by an expert showed an initially high concordance, which improved over time for perfumery school participants. However, the adjusted analysis, which accounts for agreement due to chance, yielded lower agreement values, suggesting that part of the raw agreement in odor categorization is due to chance. Agreement between participants and the expert nevertheless remained relatively stable, with a non-significant decrease over time.
Summary of results
• Description Task: The perfumery students, or the ISIPCA group, exhibited
a significant improvement in the description of scents over time. They became
more efficient in using fragrance-specific terms compared to the control group.
There was also an increase in the lexical diversity and the number of words
used by the ISIPCA group.
• Evocation Task: The ISIPCA group used a more diverse range of words
during sensory evocation, again reflecting their growing expertise. However, no
distinct pattern or clustering of responses between participants was observed,
indicating a high degree of individuality in associations and memories evoked.
• Categorization Task: Both groups formed a consistent number of groups throughout the experiment. The ISIPCA group showed a slight tendency toward closer alignment with the expert categorization over time, reflected in small increases in the Rand and adjusted Rand coefficients, although the differences in Rand coefficients between times and groups were not statistically significant.
Implications
These results suggest that perfume training at ISIPCA has a significant impact on students' abilities to describe and evoke scents, with weaker evidence of an effect on categorization, indicating the effectiveness of the educational approach. The results also underscore the complexity and individuality inherent in scent perception, and the need for specific training to articulate these experiences more precisely.
Conclusion
In conclusion, our study in the field of cognitive science, using textual analysis, led
to significant results despite the limited quantity of data. We were able to effectively
apply data science techniques, such as text analysis and natural language processing,
to extract relevant information from subjective and qualitative responses.
This internship was a valuable opportunity to apply our knowledge in data science
to a concrete scientific context, cognitive science. We have gained a thorough un-
derstanding of the specific challenges related to cognitive data analysis and have
developed effective strategies to overcome them.
Despite the encouraging results, we recognize that the limited amount of data is an important limitation of this study. To improve the representativeness of our results, it would be advisable to enlarge the control group and to follow participants from different demographic backgrounds over a longer period, covering a wider range of perfume expertise. This could provide more insight into how perfume perception varies across populations.
This internship was also an exceptional opportunity to learn how to apply data science skills in a cognitive science context. I was able to put my knowledge into practice while developing valuable new skills and perspectives.
Finally, I am grateful to have participated in this enriching learning experience. I am convinced that the combination of data science and cognitive science opens up vast possibilities to deepen our understanding of the human mind and its complex processes.
I look forward to continuing our research journey using the skills and knowledge
acquired during this internship. I am aware of the challenges ahead, but I am still
motivated by the promising results obtained so far and I am determined to contribute
to the advancement of cognitive science through the application of cutting-edge
techniques in data science.