PSY417: Research Methods and Practice

Week 12 – Introduction to Machine Learning for Psychology


Faculty of Health
Dr. Rebecca Williams
Semester 1, 2023
| Week # (Date Beginning) | Topic | Readings (textbook) | Assessments Due |
|---|---|---|---|
| Week 1 (06/03/2023) | Introduction to the course and review of univariate statistics | Chapters 1 & 2 | |
| Week 2 (13/03/2023) | Non-parametric models | Chapters 6 & 7 | |
| Week 3 (20/03/2023) | Logistic regression | Chapter 20 | Online quiz (opens 22/03/2023 at 5:00 PM ACST and closes 24/03/2023 at 5:00 PM) |
| Week 4 (27/03/2023) | Moderation and mediation | Chapter 11 | |
| Week 5 (03/04/2023) | Advanced ANOVA: repeated measures and mixed designs | Chapters 15 & 16 | |
| (10/04/2023) | Semester Break | | |
| Week 6 (17/04/2023) | ANCOVA and MANOVA | Chapters 13 & 17 | |
| Week 7 (24/04/2023) | Oral Presentations | | Oral presentation (in class, 15 minutes) |
| Week 9 (08/05/2023) | Factor analysis and principal component analysis | Chapter 18 | |
| Week 10 (15/05/2023) | Structural Equation Modelling | | |
| Week 11 (22/05/2023) | Synthesizing statistics / Reading and writing scientific reports | | |
| Week 12 (29/05/2023) | Machine learning for psychology | | Report due (1500 words, 02/06/2023 by 5:00 PM ACST) |
| (05/06/2023) | Revision Week | | |
| (12/06/2023) | Centrally organised examination period | | Final online exam (opens 14/06/2023 at 5:00 PM ACST and closes 16/06/2023 at 5:00 PM) |
Today’s Lecture Outline

• What is machine learning (ML)?

• How to interpret ML jargon

• Application of ML in psychology

What is machine learning (ML)?
“A set of methods that can automatically detect patterns in data, and then use
the patterns to predict future data.”
(Rosenbusch et al., 2021).

• Regression is traditionally used to determine the statistical significance of predictors of outcome variables.

• The key distinction is that ML can use regression to predict future data.

• ML takes statistical approaches from ‘description’ of phenomena to individualized application.
“What’s going to happen over the next decade, just as a consequence of having
more data, is that machine-learning systems are going to be able to pull out more
insights than the humans who were thinking about those data may be able to generate.”

Tom Griffiths, a professor of psychology and computer science at Princeton University.


“We argue that psychology’s near-total focus on explaining the causes of
behavior has led much of the field to be populated by research
programs that provide intricate theories of psychological mechanism,
but that have little (or unknown) ability to predict future behaviors with
any appreciable accuracy.

We propose that principles and techniques from the field of machine learning can help psychology become a more predictive science.”

(Yarkoni & Westfall, 2017).


Psychology to ML
Common research questions in psychology can easily become questions that utilize ML methods if:

1. There is a focus on prediction, and

2. There is a large enough sample size to enable accurate prediction.
ML can be broadly dichotomized into two types:

1. Supervised machine learning

2. Unsupervised machine learning

Source: Wiki Commons


Supervised ML is used for regression and
classification problems
[Figure: two panels. Left (regression): extraversion plotted against minutes spent playing online games each week. Right (classification): high extraversion (1) versus low extraversion (0) plotted against minutes spent playing online games each week.]
Factors common to both regression and
classification in ML

Supervised machine learning also has
predictor and outcome variables
Recall from Lecture 3: simple linear regression is used for predicting the value of a continuous outcome variable from a continuous predictor variable.

$Y = b_0 + b_1 X_1 + \text{error}$

• The intercept ($b_0$) is the value of extraversion when minutes = 0.

• The slope ($b_1$) is how much extraversion would change for every 1-minute change in time spent playing.

[Figure: fitted regression line of extraversion against minutes spent playing online games each week.]
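As a minimal sketch (not part of the lecture materials), the code below fits a simple linear regression of this form with scikit-learn; the data and variable names (minutes_played, extraversion) are made up for illustration.

```python
# Minimal sketch: fitting a simple linear regression with scikit-learn.
# The data and variable names are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: minutes spent playing online games each week (X)
# and extraversion scores (Y).
minutes_played = np.array([[30], [120], [300], [450], [600], [900]])
extraversion = np.array([4.1, 3.8, 3.5, 3.0, 2.7, 2.2])

model = LinearRegression().fit(minutes_played, extraversion)
print("Intercept (b0):", model.intercept_)   # extraversion when minutes = 0
print("Slope (b1):", model.coef_[0])         # change in extraversion per minute played
```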
However, the dataset is now separated into
training and testing sets
In supervised ML, the dataset is separated into a training set, used to estimate the parameter values (such as $b_0$ and $b_1$), and a testing set, on which the model is then tested for accuracy.

Typically, the training set is extracted and the parameters estimated numerous times, and the averages are used. This is called “k-fold cross-validation”.

The ‘k’ is the number of folds the data are split into; the model is trained and evaluated k times, each time holding out a different fold (it’s very similar to bootstrapping...).

[Figure: training and testing data points of extraversion against minutes spent playing online games each week.]
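The sketch below illustrates the train/test split and k-fold cross-validation with scikit-learn on simulated data; the variable names, test_size = 0.3, and k = 5 are illustrative choices, not values from the lecture.

```python
# Minimal sketch: train/test split and k-fold cross-validation with scikit-learn.
# Data and variable names are hypothetical (simulated).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(0)
minutes_played = rng.uniform(0, 900, size=(200, 1))                           # predictor
extraversion = 4.5 - 0.002 * minutes_played[:, 0] + rng.normal(0, 0.3, 200)   # outcome

# Hold out a testing set (here 30%) for the final accuracy check.
X_train, X_test, y_train, y_test = train_test_split(
    minutes_played, extraversion, test_size=0.3, random_state=0)

# k-fold cross-validation on the training set (here k = 5).
model = LinearRegression()
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
print("Mean cross-validated R^2:", cv_scores.mean())

# Fit on the full training set, then evaluate on the held-out testing set.
model.fit(X_train, y_train)
print("Testing-set R^2:", model.score(X_test, y_test))
```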
Think back to linear regression, where the
null hypothesis is either rejected or retained

Recall from Lecture 3: the model is evaluated to retain or reject the null hypothesis depending on whether it is significantly different from the mean line (the mean of Y).

[Figure: regression line compared with the mean line of extraversion, plotted against minutes spent playing online games each week.]
But in supervised machine learning, there is
no null hypothesis to retain or reject
• Rather, the model is evaluated in terms of how well it predicts each individual value of Y in the testing set.

• Each predicted value of Y is referred to as Y-hat ($\hat{Y}$), where $\hat{Y} = b_0 + b_1 X_1$ (a short sketch of generating $\hat{Y}$ follows below).

[Figure: predicted values of extraversion ($\hat{Y}$) along the fitted line, plotted against minutes spent playing online games each week.]
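A minimal sketch of generating $\hat{Y}$ for a testing set, again on simulated data with hypothetical variable names.

```python
# Minimal sketch: each testing-set case gets a predicted value, Y-hat.
# Data and variable names are hypothetical (simulated).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.uniform(0, 900, size=(100, 1))                       # minutes played
y = 4.2 - 0.002 * X[:, 0] + rng.normal(0, 0.3, 100)          # extraversion

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)
model = LinearRegression().fit(X_train, y_train)

y_hat = model.predict(X_test)   # predicted extraversion (Y-hat) for each testing case
print(np.column_stack([y_test[:5], y_hat[:5]]))   # observed vs predicted, first five cases
```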
Some common approaches used to evaluate
the performance of the trained model

• These might be referred to as cost functions or performance metrics, and include the following (a computation sketch follows the list):
  • The coefficient of determination (R²)
  • Root mean square error (RMSE)
  • Mean square error (MSE)
  • Mean absolute error (MAE)
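Under the assumption of some hypothetical observed and predicted values, these metrics can be computed with scikit-learn as sketched below.

```python
# Minimal sketch: computing common regression performance metrics.
# y_test and y_hat are hypothetical observed and predicted outcome values.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_test = np.array([3.9, 3.4, 2.9, 2.5, 2.1])   # observed extraversion (testing set)
y_hat = np.array([3.7, 3.5, 3.0, 2.4, 2.3])    # predicted extraversion (Y-hat)

mse = mean_squared_error(y_test, y_hat)        # mean square error
rmse = np.sqrt(mse)                            # root mean square error
mae = mean_absolute_error(y_test, y_hat)       # mean absolute error
r2 = r2_score(y_test, y_hat)                   # coefficient of determination

print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}, MAE = {mae:.3f}, R^2 = {r2:.3f}")
```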
Now just looking at ML regression...

In supervised machine learning for
regression problems, it’s not all about fitting
a line
• In supervised ML, the regression model may not be linear (although it can be).

• To capture more complex relationships between X and Y, flexible non-linear models can be implemented.

• Some common nonlinear regression models used in ML are (a random forest sketch follows the list):
  • Decision trees*
  • Neural networks
  • Random forests*

[Figure: a non-linear regression curve of extraversion against minutes spent playing online games each week.]

*May also be found in classification.
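As one illustration, the sketch below fits a random forest regressor to simulated, non-linear data; the data-generating function and settings are assumptions, not taken from the lecture or the textbook.

```python
# Minimal sketch: a non-linear regression model (random forest) in scikit-learn.
# The data are simulated and purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 900, size=(300, 1))                        # minutes played
y = 3 + np.sin(X[:, 0] / 150) + rng.normal(0, 0.2, 300)       # non-linear outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

forest = RandomForestRegressor(n_estimators=200, random_state=1)
forest.fit(X_train, y_train)
print("Testing-set R^2:", forest.score(X_test, y_test))
```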
One problem commonly encountered in ML
is overfitting
This is when the model fits the training data very well, but not the testing data.

It can be a problem for both linear and non-linear regression models.

There needs to be a compromise between model complexity (fitting the training set) and flexibility to generalize to new, unseen data (the testing set).

[Figure: an overfitted curve passing through every training point of extraversion against minutes spent playing online games each week.]
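A minimal sketch of what overfitting can look like in practice, using simulated data: an unconstrained decision tree fits the training set almost perfectly but typically predicts the testing set less well than a constrained one.

```python
# Minimal sketch: overfitting illustrated with decision trees on simulated data.
# Compare training-set and testing-set R^2 for a constrained vs an unconstrained tree.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 900, size=(200, 1))                    # minutes played (simulated)
y = 4 - 0.002 * X[:, 0] + rng.normal(0, 0.3, 200)         # extraversion (simulated)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)

for depth in (2, None):  # a constrained tree vs an unconstrained (overfitting-prone) one
    tree = DecisionTreeRegressor(max_depth=depth, random_state=2).fit(X_train, y_train)
    print(f"max_depth={depth}:  train R^2={tree.score(X_train, y_train):.2f}  "
          f"test R^2={tree.score(X_test, y_test):.2f}")
```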
Different regression models are used to
address the issue of overfitting
Some regression techniques are often implemented to address the problem of overfitting. These include:

• Ridge regression
• Lasso regression
• Elastic net regression

These regression models are referred to as ‘shrinkage’ or ‘regularized’ regression (a short sketch follows below).

[Figure: a regularized regression line of extraversion against minutes spent playing online games each week.]
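A minimal sketch of the three shrinkage models in scikit-learn, assuming simulated data and illustrative (untuned) regularization strengths.

```python
# Minimal sketch: shrinkage (regularized) regression models in scikit-learn.
# Data are simulated; the alpha values are illustrative, not tuned.
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 5))                                    # five hypothetical predictors
y = 2.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, 150)

models = {
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {scores.mean():.2f}")
```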
Now just looking at ML classification...

In supervised machine learning for
classification problems, it’s not all about
fitting a logit function
• In supervised ML, the classification model may not be logistic (although it can be).

• To capture more complex relationships between X and Y, flexible non-linear models are implemented.

• Some common nonlinear classification models used in ML are support vector machines (SVMs)* (a short sketch follows below).

[Figure: classification of high extraversion (1) versus low extraversion (0) against minutes spent playing online games each week.]

*May also be found in regression.
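A minimal sketch of an SVM classifier with a non-linear kernel, assuming simulated minutes-played data and made-up high/low extraversion labels.

```python
# Minimal sketch: a support vector machine classifier in scikit-learn.
# Simulated data: 1 = high extraversion, 0 = low extraversion (illustrative only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(4)
minutes_played = rng.uniform(0, 900, size=(300, 1))
high_extraversion = (minutes_played[:, 0] < 400).astype(int)   # simulated labels
flip = rng.random(300) < 0.1                                   # flip some labels to add noise
high_extraversion[flip] = 1 - high_extraversion[flip]

X_train, X_test, y_train, y_test = train_test_split(
    minutes_played, high_extraversion, test_size=0.3, random_state=4)

svm = SVC(kernel="rbf")            # a non-linear (radial basis function) kernel
svm.fit(X_train, y_train)
print("Testing-set accuracy:", svm.score(X_test, y_test))
```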
Support vector machines are used for both
simple and complex classification

[Figure: examples of simple (linear) and more complex (non-linear) SVM decision boundaries. Source: analyticsvidhya.com]
Summary of machine learning
• The focus of research aims in studies using ML is prediction.

• Supervised ML is used for regression and classification problems. The broad steps are (see the end-to-end sketch below):
  • Separate the dataset into training and testing sets
  • Apply a learning algorithm to the training set to fit a model (i.e. find the parameters, such as $b_0$ and $b_1$ in linear regression)
  • Evaluate how well the model fits the testing set

• The ‘significant result’ is how well the model fits the data compared to other models (such as regular linear regression).
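A compact, hypothetical end-to-end sketch of these broad steps on simulated data; the candidate models, k = 10, and test_size = 0.3 are illustrative choices.

```python
# Minimal sketch: the broad supervised-ML steps in one short script.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(250, 4))                                   # hypothetical predictors
y = 1.5 + X @ np.array([0.6, -0.4, 0.0, 0.2]) + rng.normal(0, 0.5, 250)

# 1. Separate the dataset into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=5)

# 2. Apply learning algorithms to the training set (with k-fold cross-validation).
candidates = {"Linear regression": LinearRegression(), "Ridge": Ridge(alpha=1.0)}
for name, model in candidates.items():
    cv_r2 = cross_val_score(model, X_train, y_train, cv=10, scoring="r2").mean()
    # 3. Evaluate how well the fitted model predicts the testing set.
    test_r2 = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: cross-validated R^2 = {cv_r2:.2f}, testing-set R^2 = {test_r2:.2f}")
```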
Today’s Lecture Outline

• What is machine learning (ML)?

• How to interpret ML jargon

• Application of ML in psychology

Introduction
• Problematic smartphone use (PSU) is overuse of a smartphone with functional
impairment.

• PSU severity is most widely studied in relation to depression, anxiety, stress and
low self-esteem.

• Recent research has shown that fear of missing out (FOMO) is associated with PSU.
• A fear of missing out on rewarding and pleasurable experiences, and a cognitive bias
related to one’s social resources.

• Another maladaptive cognitive mechanism associated with PSU is rumination
  • Frequent, negative self-referencing thoughts, typically about past events.
Study aims and hypotheses
• Understand PSU severity using established explanatory variables, but with novel statistical
methods.

• Supervised machine learning was implemented as it has been shown to outperform conventional statistics in prediction.

• Regression-based ML was used to model PSU symptom severity as a continuous variable.

Hypotheses:
H1: Depression and anxiety severity should be positively associated with PSU severity.
H2: FOMO and rumination should be positively associated with PSU severity.
H3: Machine learning procedures will produce an algorithm which can predict PSU severity.
Methods
Participants
1238 students were recruited to complete an online survey. 141 were excluded for careless responding. Among the 1097 remaining, mean age = 19.4 (±1.2) years and 18.1% were male.

Instruments
1. Demographics (age, sex)
2. Smartphone Addiction Scale-Short Version (SAS-SV): PSU severity
3. Depression Anxiety Stress Scale-21 (DASS-21)
4. Fear of missing out (FOMO) scale
5. Ruminative Response Scale (RRS)
Methods
Analysis
1. Data screened for missing/careless responses
2. Correlations between variables computed
3. Data split into training (70%) and testing sets
• k-fold cross-validation (k = 10) used
4. Data entered into 6 different ML models
• Predictors = age, sex, depression + anxiety, FOMO, rumination
• Outcome = PSU severity
5. The 6 ML models were
• Ridge regression
• Lasso regression
• Elastic net regression
• Random forest
• Support vector machine
• Extreme gradient boosting
6. ML models compared using RMSE, MAE, R², and statistical tests (a hypothetical comparison sketch follows below)
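The sketch below is not the authors' code; it is a hypothetical illustration of how such a multi-model comparison could be set up in scikit-learn with 10-fold cross-validation. Simulated data stand in for the survey measures, and extreme gradient boosting is omitted because it would typically require the separate xgboost package.

```python
# Hypothetical sketch of a multi-model comparison like the one described above.
# Simulated data stand in for the survey measures; this is not the study's code.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(6)
n = 1097
X = rng.normal(size=(n, 5))   # stand-ins for age, sex, depression+anxiety, FOMO, rumination
y = 10 + X @ np.array([0.2, 0.1, 0.3, 1.2, 0.4]) + rng.normal(0, 1.0, n)   # PSU severity

models = {
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
    "Random forest": RandomForestRegressor(n_estimators=200, random_state=6),
    "SVM": SVR(kernel="rbf"),
}

cv = KFold(n_splits=10, shuffle=True, random_state=6)   # k = 10, as in the study
for name, model in models.items():
    rmse = -cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error").mean()
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    print(f"{name}: RMSE = {rmse:.2f}, R^2 = {r2:.2f}")
```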
Results
ANOVAs showed sex differences in terms of:
• PSU severity (women higher scores)
• Depression, anxiety (women lower scores)
• Rumination (women lower scores)

Results

• Shrinkage regression techniques performed the best.
• FOMO was the largest predictor of PSU severity.

[Figure: model performance comparison (slide figure).]
Discussion
• Shrinkage regression models performed the best in explaining PSU severity.

• The degree of variance explained by these models is similar to recent papers implementing linear regression (without ML).

• H1 was unsupported (depression + anxiety did not contribute to PSU severity).

• H2 was partly supported: FOMO, but not rumination, conferred a relatively large contribution in modelling PSU severity.

• Future research could use supervised ML to classify people with a gaming disorder (now an ICD-11 diagnosis) based on similar predictor variables.
Congratulations on finishing PSY417!
You are now familiar with:
1. Basic statistics
2. Bias and non-parametric models
3. Linear and logistic regression
4. Moderation and mediation
5. Repeated-measures and mixed factorial ANOVA
6. ANCOVA and MANOVA
7. Factor analysis and PCA
8. Structural equation modelling
9. Systematic reviews and meta-analysis
10. Supervised machine learning
References
• Rosenbusch, H., Soldner, F., Evans, A. M., & Zeelenberg, M. (2021). Supervised machine learning methods in psychology: A practical introduction with annotated R code. Social and Personality Psychology Compass, 15(2), e12579.

• Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100-1122.
