Proposed System:
• Uses Flask for the backend and model deployment, and CSS for building the homepage UI. The results of
this research indicate the feasibility of predicting Myers-Briggs personality types from social media
user data with high accuracy.
• Personality classification from a user’s recent tweets using a pre-trained language model called
BERT.
Concept:
The Myers-Briggs Type Indicator (MBTI) is a widely used personality assessment tool based on Carl Jung’s theories.
It categorizes individuals into one of 16 personality types, providing insights into their preferences in four
dimensions:
1. Extraversion (E) vs. Introversion (I):
• Extraversion (E): People who gain energy from social interactions, enjoy group activities, and tend to be
outgoing.
• Introversion (I): Individuals who recharge by spending time alone, prefer deeper conversations, and may be
more reserved.
2. Sensing (S) vs. Intuition (N):
• Sensing (S): People who focus on concrete details, practical information, and the present moment.
• Intuition (N): Individuals who look beyond the surface, seek patterns, and imagine future possibilities.
3. Thinking (T) vs. Feeling (F):
• Thinking (T): Individuals who make decisions based on logic, objective analysis, and principles.
• Feeling (F): People who consider emotions, values, and empathy when making choices.
4. Judging (J) vs. Perceiving (P):
• Judging (J): People who prefer structure, planning, and organization. They like closure and decision-making.
• Perceiving (P): Individuals who are adaptable, spontaneous, and open-ended. They enjoy flexibility and
exploration.
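The four binary dimensions above combine into the 16 type codes. A minimal sketch in Python (the names here are illustrative, not from the project code):

```python
from itertools import product

# The four MBTI dimensions, each a binary choice between two letters.
DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

# Taking one letter per dimension yields every four-letter type code.
ALL_TYPES = ["".join(choice) for choice in product(*DIMENSIONS)]

print(len(ALL_TYPES))  # 16 combinations
```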
Bidirectional Encoder Representations From Transformers:
BERT, short for Bidirectional Encoder Representations from Transformers, is an open-source machine learning
framework designed for the realm of natural language processing (NLP). It was developed by researchers from Google
AI Language in 2018. Let’s delve into the details of BERT:
1. Architecture and Working:
i. BERT leverages a transformer-based neural network to understand and generate human-like language.
ii. Unlike the original Transformer architecture, which has both encoder and decoder modules, BERT employs
an encoder-only architecture. This emphasizes understanding input sequences rather than generating output
sequences.
iii. BERT uses a bidirectional approach, considering both the left and right context of words in a sentence
simultaneously. This allows for a more nuanced understanding of context.
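The bidirectional point can be made concrete by comparing attention masks: a directional language model lets token i attend only to earlier positions, while BERT's encoder lets every token attend to the whole sequence. A toy illustration in plain Python (not the actual implementation):

```python
def causal_mask(n):
    # directional LM: token i may attend only to positions j <= i (left context)
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    # BERT encoder: every token attends to every position, left and right
    return [[1] * n for _ in range(n)]

# For a 3-token sentence, the middle token sees only the first token in a
# causal model, but sees both neighbours under the bidirectional mask.
print(causal_mask(3)[1])         # [1, 1, 0]
print(bidirectional_mask(3)[1])  # [1, 1, 1]
```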
2. Pre-training and Fine-tuning:
BERT undergoes a two-step process:
i. Pre-training on Large Data: BERT is pre-trained on a large amount of unlabeled text data. During
pre-training, it learns contextual embeddings, which represent words considering their surrounding context
in a sentence.
ii. Fine-tuning on Labeled Data: After pre-training, BERT is fine-tuned for specific NLP tasks using labeled data.
This step tailors the model to more targeted applications by adapting its general language understanding to
task-specific nuances.
April 24, 2024, Department of CSE
Working of BERT:
Many models predict the next word in a sequence, which is a directional approach and may limit context learning.
BERT addresses this challenge with two innovative training strategies:
1. Masked Language Model (MLM): In BERT’s pre-training process, a portion of the words in each input sequence is
masked, and the model is trained to predict the original values of these masked words based on the context
provided by the surrounding words.
2. Next Sentence Prediction (NSP): BERT predicts whether the second sentence follows the first. This is done by
transforming the output of the [CLS] token into a 2×1 vector using a classification layer, and then calculating
the probability that the second sentence follows the first using softmax.
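Both training strategies can be sketched in plain Python. The 15% masking rate is BERT's published default; the weights passed to the NSP head below are placeholders for illustration, not learned values:

```python
import math
import random

def mask_tokens(tokens, rate=0.15, seed=0):
    # MLM: replace a fraction of tokens with [MASK]; the model must
    # recover the original token at each masked position from context.
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            targets[i] = tok
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

def softmax(logits):
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nsp_probabilities(cls_vector, weights, biases):
    # NSP: a classification layer projects the [CLS] embedding to two
    # logits (IsNext, NotNext); softmax turns them into probabilities.
    logits = [sum(w * x for w, x in zip(row, cls_vector)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

probs = nsp_probabilities([0.5, -0.2], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(probs)  # two probabilities summing to 1
```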
Applications:
• Question-answering systems
• Text classification
• Named entity recognition
• Text summarization
Design:
Encoding of the type labels:
Before preprocessing: ‘INFJ’
After preprocessing: array([0., 0., 1., 0.], dtype=float32)
The text is further preprocessed into BERT-base inputs: input_ids, token_type_ids, and attention_mask.
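The per-type vector comes from encoding each of the four axes as 0 or 1. The axis ordering and letter-to-bit convention below are assumptions for illustration only; the project's own mapping evidently differs, since it encodes ‘INFJ’ as [0., 0., 1., 0.]:

```python
# Hypothetical axis ordering and 0/1 convention (an assumption, not the
# project's confirmed mapping): the second letter of each pair encodes as 1.
AXES = ["IE", "NS", "FT", "PJ"]

def encode_type(mbti):
    # one float per axis, matching the four-element label vector
    return [1.0 if letter == axis[1] else 0.0
            for letter, axis in zip(mbti, AXES)]

print(encode_type("INFJ"))  # [0.0, 0.0, 0.0, 1.0] under this convention
```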
Split the dataset into training, testing, and validation sets:
76% of the dataset is used for training, 12% for testing, and the remaining 12% for validation.
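A 76/12/12 split can be sketched as follows (a simplified version; in practice the rows would be shuffled first, e.g. with scikit-learn's train_test_split):

```python
def split_indices(n, train_frac=0.76, test_frac=0.12):
    # contiguous split of row indices; shuffle the data beforehand
    n_train = round(n * train_frac)
    n_test = round(n * test_frac)
    idx = list(range(n))
    return (idx[:n_train],
            idx[n_train:n_train + n_test],
            idx[n_train + n_test:])

train, test, val = split_indices(1000)
print(len(train), len(test), len(val))  # 760 120 120
```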
4. BERT Model Training:
• The BERT-base inputs are passed into the BERT model, and the generated BERT outputs pass through a densely
connected neural network layer with a sigmoid activation function that produces four outputs, one per MBTI
dimension.
The loss function is binary cross-entropy, since this is a multi-label classification problem and it handles the
imbalance of the dataset.
The optimizer is Adam from TensorFlow Addons with a specified learning rate; it updates the model’s weights
and biases. Finally, the model weights are saved to the ‘bert_base_model1.h5’ file.
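The sigmoid head and the multi-label binary cross-entropy can be illustrated numerically without TensorFlow (placeholder weights standing in for the trained layer, not the actual model):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_sigmoid(pooled, weights, biases):
    # one sigmoid unit per MBTI axis, applied to BERT's pooled output
    return [sigmoid(sum(w * x for w, x in zip(row, pooled)) + b)
            for row, b in zip(weights, biases)]

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # multi-label BCE, averaged over the four independent axis labels
    return -sum(t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
                for t, p in zip(y_true, y_pred)) / len(y_true)

# Zero weights give maximal uncertainty: every axis probability is 0.5,
# and the loss equals ln(2) regardless of the true labels.
probs = dense_sigmoid([0.1, -0.3], [[0.0, 0.0]] * 4, [0.0] * 4)
print(probs)  # [0.5, 0.5, 0.5, 0.5]
print(binary_cross_entropy([1.0, 0.0, 1.0, 0.0], probs))
```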
5. Model Evaluation:
• The performance of the model is evaluated using the ROC-AUC method.
The ROC (Receiver Operating Characteristic) curve is a graphical plot that illustrates the performance of a binary
classifier across various threshold settings. It plots the true positive rate (TPR) against the false positive
rate (FPR) at different threshold levels.
AUC (Area Under the Curve) computes the area under the ROC curve, providing a single scalar value that
summarizes the model's performance across all possible threshold settings.
A higher ROC-AUC score indicates better discrimination ability of the model, with a value of 1 indicating perfect
performance and a value of 0.5 indicating random guessing.
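ROC-AUC has a convenient rank interpretation: the probability that a randomly chosen positive example scores above a randomly chosen negative one. That yields a tiny reference implementation (in practice one would call scikit-learn's roc_auc_score):

```python
def roc_auc(labels, scores):
    # probability that a random positive outranks a random negative
    # (ties count as half); equivalent to the area under the ROC curve
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```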
Testing:
Test results: AUC: 0.7850, accuracy: 0.7681
Result Analysis:
Since the dataset is imbalanced, the accuracy value can be misleading, so we used the ROC-AUC value
to evaluate model performance.
4. Personality Predictions Based on User Behavior on the Facebook Social Media Platform,
by Michael M. Tadesse, Hongfei Lin, Bo Xu, and Liang Yang [June 2021].