
E-Assessment & Learning Analytics

9. Learner Modelling & Recommender Systems

Prof. Dr. Sven Strickroth

SS 2021
Outline

Learner Modelling

Recommender Systems

2
Learning Objectives

You
• know what learner modelling is
• know what relevant learner characteristics are
• know different methods for modelling learners
• know what recommender systems are
• know different types of recommender systems and
advantages/disadvantages
• know how to evaluate recommender systems

3
Learner Modelling
Learner Modelling (Chrysafiadi et al., 2013)

“The process of gathering relevant information in order to infer the current cognitive state of the student, and to represent it so as to be accessible and useful to the tutoring system for offering adaptation.” (Thomson & Mitrovic, 2009)

• Origin in Intelligent Tutoring Systems (ITS)


• Key aspect for creating adaptive educational systems
• Achieve accurate learner diagnosis and predict learner needs
→ Overall goal: Use data for adaptivity

5
Learners’ Characteristics

Dimensions:
• domain dependent vs. domain independent characteristics
• static vs. dynamic characteristics

Typical characteristics:
• Knowledge and skills
• Errors and misconceptions
• Learning styles and preferences
• Motivation
• Affective features
• Cognitive aspects
• Meta-cognitive aspects

6
Methods (Chrysafiadi et al., 2013)

• Overlay model
• Stereotypes
• Perturbation
• Cognitive theories
• Fuzzy student modeling
• Bayesian networks
• …

7
Overlay model

Assumption:
• Learner may have (incomplete) knowledge of the domain
• Learner’s set of knowledge is a subset of an expert’s set of knowledge

[Figure: Venn diagram – the learner’s knowledge as a subset of the expert’s knowledge]

Goal: Eliminate lack of skills and knowledge of the domain


Variants:
• pure model: boolean (concept mastered or not)
• “modern” model: “degree” of mastery per concept

Content of the model: individual topics and concepts

Limitation: does not consider cognitive needs, preferences, or incorrectly acquired knowledge
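
A minimal sketch of the two overlay variants in Python; the concept names and mastery values are invented for illustration:

```python
# Overlay learner model sketch; concepts and values are illustrative only.
expert_concepts = {"variables", "loops", "recursion", "pointers"}

# Pure overlay model: boolean mastery per expert concept
pure_overlay = {c: False for c in expert_concepts}
pure_overlay["variables"] = True

# "Modern" overlay model: degree of mastery in [0, 1] per expert concept
degree_overlay = {c: 0.0 for c in expert_concepts}
degree_overlay.update({"variables": 0.9, "loops": 0.4})

# Knowledge gaps the adaptive system should address next
gaps = [c for c, degree in degree_overlay.items() if degree < 0.5]
print(gaps)  # e.g. ['loops', 'recursion', 'pointers'] (set order is arbitrary)
```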

8
Stereotypes (Kay, 2000)

Idea:
• Define shared characteristics
• Assign characteristics to learners

Each stereotype M has trigger conditions t_Mi and retraction conditions r_Mj:
• if ∃ i: t_Mi = true → active(M)
• if ∃ j: r_Mj = true → not active(M)

Usually combined with other modelling techniques

Advantages:
• “no” cold-start problem
• utilize group characteristics instead of individual users
Disadvantages:
• inflexible
• must be maintained and updated manually
• requires that the users can be classified into stereotypes
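
As a rough illustration of the trigger/retraction mechanism, here is a small Python sketch; the stereotype name, conditions, and learner attributes are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Learner = Dict[str, object]  # simple attribute dictionary describing a learner

@dataclass
class Stereotype:
    name: str
    triggers: List[Callable[[Learner], bool]]      # t_Mi: any true trigger activates M
    retractions: List[Callable[[Learner], bool]]   # r_Mj: any true retraction deactivates M

    def is_active(self, learner: Learner) -> bool:
        if any(r(learner) for r in self.retractions):
            return False
        return any(t(learner) for t in self.triggers)

# Hypothetical stereotype: "novice" learners get more scaffolding
novice = Stereotype(
    name="novice",
    triggers=[lambda l: l.get("solved_exercises", 0) < 5],
    retractions=[lambda l: l.get("passed_final_test", False)],
)
print(novice.is_active({"solved_exercises": 2}))                              # True
print(novice.is_active({"solved_exercises": 2, "passed_final_test": True}))  # False
```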
9
Perturbation

Extension of the overlay model:
Learner’s knowledge is a subset of the expert’s knowledge plus individual misconceptions
→ Allows identification of incorrect knowledge

[Figure: Venn diagram – the learner’s knowledge overlaps both the expert’s knowledge and a set of misconceptions]

Misconceptions → “Bug library”


• Enumeration: Empirical analysis of mistakes
• Generative technique: Generate misconceptions using a
cognitive model based on learner’s behavior and found patterns

10
Cognitive Theories

Idea:
Usage of cognitive theories to model the learner‘s learning processes of
thinking and understanding
→ simulate the learners’ reasoning

Human Plausible Reasoning Theory (HPT)


• domain independent theory, originally based on answers from everyday
questions filled with plausible inferences
• → detect and categorize frequently recurring inference patterns

Multiple Attribute Decision Making (MADM)


• make preference decisions (such as evaluation, prioritization, and
selection) on predefined alternatives characterized by multiple conflicting
attributes
• → theory for supporting decision making

Ortony, Clore and Collins (OCC) cognitive theory of emotions


• allows modelling the possible emotional states of learners
11
Fuzzy & Bayesian learner models

Problem: uncertainties in data and subjectivity


Approaches to represent student characteristics:
• Fuzzy logic: handles concepts of partial truth
  (the truth value of a variable lies between 0 and 1)

• Bayesian network: a directed acyclic graph modelling
  variables (nodes) and their probabilistic dependencies
Types:
• Expert centric models
• Efficiency centric models
• Data centric models
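
A minimal illustration of the Bayesian idea with a single hidden “knows concept” node observed through one answer; this is not the network from the figure, and all probabilities are invented example values:

```python
# Bayes' rule update for one hidden skill node; probabilities are illustrative.
p_knows = 0.5          # prior P(learner knows the concept)
p_correct_knows = 0.9  # P(correct answer | knows)
p_correct_not = 0.2    # P(correct answer | does not know), e.g. lucky guessing

# Observation: the learner answers correctly
evidence = p_correct_knows * p_knows + p_correct_not * (1 - p_knows)
posterior = p_correct_knows * p_knows / evidence
print(round(posterior, 3))  # ≈ 0.818 – the belief that the learner knows the concept rises
```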

12
Image Source: Conati, et al., 2013
Recommender Systems
Definition

Recommender System

• Selects artifacts from a set

• Filter problem

• Usually based on ratings/transactions:

• Result (often): TOP-N ranking and/or prediction of ratings

15
Tasks

Find “good” learning resources


• Relevant resources for a learning task or learning goal

Receive a “good” sequence of learning resources


• Recommend a learning path (different resources) to achieve a competence

Find “good” learning partners


• to cooperate with for a learning task or learning goal
• to learn from for a learning task or learning goal

“Good”
• with respect to the learning task or learning goal
• with respect to prior knowledge
• with respect to the current situation (location, time, noise level, …)
16
Items recommended (Erdt et al., 2015)

17
Data used for calculation of recommendations

Ratings/transactions can be
• explicit
• implicit

Ratings/Transactions used
• Viewed learning items
• Rated learning items
• Tagged learning items
• Successfully processed learning items – additional knowledge about the user can
be used for calculating recommendations
• Unsuccessfully processed learning items
• Communication history – communication with others may be an indicator of shared
interests and common learning items
• …
18
Quality Measures for Recommender Systems in a
Learning Scenario

Relevance:
• “Relevance refers to the ability of a RS (Recommender System) to provide
items that fit the user’s preferences” (Epifania, 2012).

Novelty:
• “Novelty (or discovery) is the extent to which users receive new and
interesting recommendations” (Pu, 2011).

Diversity:
• “Diversity measures the diversity level of items in the recommendation list”
(Pu, 2011).

19
Recommender Systems

• Content-based
• Knowledge-based
• Collaborative Filtering
  – Neighborhood-based
    • User-based
    • Item-based
  – Model-based (e.g. LSA, SVM)
  – Graph-based
    • Path-based (e.g. shortest path)
    • Random Walk (e.g. PageRank/FolkRank)
• Demographic-based
• Hybrid
20
Technologies used (Erdt et al., 2015)

21
Content-based filtering

1. Comparison of the artifacts and calculation of the similarities


• 3 texts with 3 keywords each, I = {T1, T2, T3}
– T1 = {Computer Science, Google, Search Engine}
– T2 = {News, Graphics, Polygon}
– T3 = {News, Google, Graphics}

2. Calculation of similarities (Neighbours I‘)


• Keyword space: [Computer Science, Google, Search Engine, News, Polygon, Graphics]
• T1 = (1,1,1,0,0,0)
• T2 = (0,0,0,1,1,1)
• T3 = (0,1,0,1,0,1)
• e.g., S(T1,T2)=0, S(T2,T3)=2/3, S(T1,T3)=1/3

3. Prediction
• T2 was rated with 4
• T3 is similar to T2, so a similar rating is predicted for T3 (see the sketch below)
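
A small Python sketch reproducing the similarity values above, assuming the similarity is the number of shared keywords divided by the three keywords per text:

```python
# Keyword sets of the three example texts from the slide
texts = {
    "T1": {"Computer Science", "Google", "Search Engine"},
    "T2": {"News", "Graphics", "Polygon"},
    "T3": {"News", "Google", "Graphics"},
}

def similarity(a: str, b: str) -> float:
    """Share of common keywords (each text has exactly 3 keywords)."""
    return len(texts[a] & texts[b]) / 3

print(similarity("T1", "T2"))  # 0.0
print(similarity("T2", "T3"))  # 0.666... = 2/3
print(similarity("T1", "T3"))  # 0.333... = 1/3
```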

22
Knowledge-based filtering

• domain specific attributes


• often ontology-based

• frequently: the user is involved in the filtering process → interactive process
• Two types:
  • constraint-based
  • case-based
• automatic matching of learner model and artifact model, based on the knowledge stored there

23
Collaborative filters

domain independent

Basis: Matrix of Items x Users


Cells of the matrix: transactions/ratings

Example: Movie Ratings on scale from 1 to 5


           Rose   Seymour  Phillips  Puig   LaSalle  Matthews  Toby
Lady       2.5    3.0      2.5       –      3.0      3.0       –
Snakes     3.5    3.5      3.0       3.5    4.0      4.0       4.5
Luck       3.0    1.5      –         3.0    2.0      –         –
Superman   3.5    5.0      3.5       4.0    3.0      5.0       4.0
Dupree     2.5    3.5      –         2.5    2.0      3.5       1.0
Night      3.0    3.0      4.0       4.5    3.0      3.0       –

24
Collaborative filters: Similarity measure

Determining the similarity of two columns (similarity of users) or rows (similarity of items)

Similarity measure needed: s : I × I → [0, 1] ⊂ ℝ
• s(i, j) ≤ s(i, i)  ∀ i, j ∈ I
• s(i, j) ≥ 0        ∀ i, j ∈ I
• s(i, i) = 1        ∀ i ∈ I

e.g., based on the Euclidean distance d(x, y) = √( Σ_k (x_k − y_k)² ):

  s(i, j) = 1 / (1 + d(i, j))

Example: distance between Rose and Seymour over the movies both have rated

  Rose:    2.5  3.5  3.0  3.5  2.5  3.0
  Seymour: 3.0  3.5  1.5  5.0  3.5  3.0

  d(Rose, Seymour) = √( (3.0 − 2.5)² + (3.5 − 3.5)² + (1.5 − 3.0)² + (5.0 − 3.5)² + … ) ≈ 2.4
  s(Rose, Seymour) ≈ 0.3
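
A minimal Python sketch of this similarity measure, using the Rose and Seymour ratings from the matrix above (movie titles abbreviated as on the slide):

```python
from math import sqrt

# User -> {movie: rating}, taken from the example matrix above
ratings = {
    "Rose":    {"Lady": 2.5, "Snakes": 3.5, "Luck": 3.0, "Superman": 3.5, "Dupree": 2.5, "Night": 3.0},
    "Seymour": {"Lady": 3.0, "Snakes": 3.5, "Luck": 1.5, "Superman": 5.0, "Dupree": 3.5, "Night": 3.0},
}

def euclidean_similarity(prefs: dict, a: str, b: str) -> float:
    """s(a, b) = 1 / (1 + d(a, b)), computed over the items both users have rated."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    d = sqrt(sum((prefs[a][item] - prefs[b][item]) ** 2 for item in shared))
    return 1 / (1 + d)

print(round(euclidean_similarity(ratings, "Rose", "Seymour"), 2))  # ≈ 0.29
```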
25
Collaborative filters: Similarity measures

Alternative, e.g., Pearson correlation (≈ how well the two users’ scores line up):

[Figure: scatter plots of the ratings of two user pairs, with correlations of 0.4 and 0.75]

Corrects for structural differences (e.g., one user rating systematically stricter)

Calculation (standard Pearson correlation over the co-rated items):

  r(x, y) = Σ_k (x_k − x̄)(y_k − ȳ) / ( √(Σ_k (x_k − x̄)²) · √(Σ_k (y_k − ȳ)²) )

Which similarity measure to choose depends on the use case:


• Jaccard index
• Pearson correlation ([-1;1])
• cosine correlation ([-1;1])
• adjusted cosine correlation ([-1;1])
• …

26
Collaborative filters: Simple algorithms

Based on the matrix and the similarity function, the most similar other users
can be easily determined for a user
Analogous: for an item, the most similar other items can be easily
determined
Simple approach for item recommendation:
• Search for the most similar user
• Show the top-rated items of that user which the current user does not know yet

Disadvantages:
• Well-matching items may be missed if the “best matching” user has not rated them
• The “best matching” user may have rated an item completely against the general trend
• Cold start problem (a general issue for collaborative filters)
• Cold Start Problem (general issue for collaborative filters)
27
Collaborative filtering: product/user recommendation

Possible solution: weighted average – predict the rating for each item unknown to the
user as the average of the other users’ ratings, weighted by their similarity to the user

→ Recommendation of items for certain users
(Result: items AND a predicted rating!)

Analogously possible: recommendation of users for certain items
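
A sketch of this weighted-average prediction in Python (user-based collaborative filtering); it assumes a ratings dictionary and a similarity function such as euclidean_similarity from the sketch above:

```python
def recommend(prefs: dict, user: str, similarity) -> list:
    """Predict ratings for items the user has not rated yet and return a TOP-N ranking."""
    weighted_sum = {}  # item -> sum of similarity * rating
    sim_sum = {}       # item -> sum of similarities
    for other in prefs:
        if other == user:
            continue
        sim = similarity(prefs, user, other)
        if sim <= 0:
            continue
        for item, rating in prefs[other].items():
            if item in prefs[user]:
                continue  # only predict items the user does not know yet
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            sim_sum[item] = sim_sum.get(item, 0.0) + sim
    predictions = {item: weighted_sum[item] / sim_sum[item] for item in weighted_sum}
    # Items AND predicted ratings, best first
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
```

Called, for example, as recommend(all_ratings, "Toby", euclidean_similarity) with a ratings dictionary covering all users of the matrix, it returns the movies Toby has not rated together with their predicted ratings.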

28
Item-based collaborative filtering

Approach presented so far: user-based collaborative filtering


Only works well with a densely populated and not too large matrix
Alternative approach: item-based collaborative filtering

Basic idea:
• First calculate the n items most similar to an item (this can be done in
advance/offline, e.g. regularly overnight).
• Recommendations for a user (see the sketch below):
  • Determine their top-rated items
  • Compute a weighted average over the items that are most similar to them
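
A rough Python sketch of this two-step procedure; it reuses the prefs-dictionary format and a similarity function as in the earlier sketches:

```python
def transform_prefs(prefs: dict) -> dict:
    """Flip the user x item dictionary into item x user."""
    item_prefs = {}
    for user, items in prefs.items():
        for item, rating in items.items():
            item_prefs.setdefault(item, {})[user] = rating
    return item_prefs

def build_item_similarities(prefs: dict, similarity) -> dict:
    """Precompute (e.g. offline, overnight) the similarity between all item pairs."""
    item_prefs = transform_prefs(prefs)
    return {a: {b: similarity(item_prefs, a, b) for b in item_prefs if b != a}
            for a in item_prefs}

def recommend_items(prefs: dict, item_sims: dict, user: str) -> list:
    """Weighted average of the user's own ratings over similar, not yet rated items."""
    scores, sim_totals = {}, {}
    for item, rating in prefs[user].items():
        for other, sim in item_sims[item].items():
            if other in prefs[user] or sim <= 0:
                continue
            scores[other] = scores.get(other, 0.0) + sim * rating
            sim_totals[other] = sim_totals.get(other, 0.0) + sim
    ranking = {i: scores[i] / sim_totals[i] for i in scores}
    return sorted(ranking.items(), key=lambda kv: kv[1], reverse=True)
```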

29
Example: item-based collaborative filtering

Recommendations for Toby:

The procedure
• does not need to look at all data for a concrete recommendation
• copes well with a sparse matrix
• requires (manageable) additional effort for the matrix of similar items

30
Matrix factorization in recommender systems

Observations:
• The users × items matrix is often (too) large
• Presumably, there are latent factors of items that are important for recommending items to users
• Knowledge of these factors is interesting for the platform
Approach: decomposition of the matrix into
• users x factors and
• items x factors

(technically, e.g., via singular value decomposition (SVD) or principal component analysis)
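
A minimal sketch of the decomposition idea with a truncated SVD in NumPy; the toy matrix is invented, and treating unknown ratings as 0 is a simplification (real systems, e.g. Koren et al., 2009, fit the factors only on observed ratings):

```python
import numpy as np

# Toy users x items rating matrix; 0 stands for "unknown" (simplification)
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

k = 2  # number of latent factors to keep
U, s, Vt = np.linalg.svd(R, full_matrices=False)
user_factors = U[:, :k] * s[:k]   # users x factors
item_factors = Vt[:k, :].T        # items x factors

# Low-rank reconstruction: the filled-in cells serve as rating predictions
R_hat = user_factors @ item_factors.T
print(np.round(R_hat, 2))
```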

31
Matrix factorization in recommender systems
(Koren et al., 2009)

Results are significantly more precise than “nearest neighbor” methods

32
Generic issues

Cold Start
• new user
• new item
• Boosting
• new community/system

Quality of ratings over time

33
Evaluation
Recommender Evaluation
(Aggarwal, 2016; Herlocker, 2004)

Offline vs. online vs. user study


Quantitative vs. qualitative

Accuracy of predicted ratings (e.g., RMSE)


Hit rate, Precision & Recall, F1, …
Ranks/Utility
Coverage
Novelty
Diversity
User satisfaction/trust/usefulness/…
Effects on learning (Erdt et al., 2015)
Performance

35
Recommender Evaluation: Common offline approach

Recommendation is viewed as information retrieval task

Procedure
• Hide some items “used” by the user (ground truth)
• Recommend items & rank items
• Compare with ground truth – hidden items “used” by the user

                   Used in reality / Relevant    Not used in reality / Not relevant
Recommended        True Positive                 False Positive
Not Recommended    False Negative                True Negative

36
Hit Rate/Recall and Precision

Recommended: Item 98, Item 32, Item 152, Item 74, Item 59
Used in reality: Item 32, Item 56, Item 74
→ Hits: Item 32 and Item 74

Hit Rate/Recall:

  Recall(k) = |relevant items retrieved| / |relevant items|

Precision:

  Precision(k) = |relevant items retrieved| / |items retrieved|
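
A small Python sketch computing Precision@k and Recall@k for the example above:

```python
def precision_recall_at_k(recommended: list, relevant: set, k: int):
    """Precision@k and Recall@k for a single user.
    recommended: ranked list of recommended item ids
    relevant:    set of item ids the user actually used (ground truth)"""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

recommended = [98, 32, 152, 74, 59]   # ranked recommendation list
relevant = {32, 56, 74}               # items actually used
print(precision_recall_at_k(recommended, relevant, k=5))  # (0.4, 0.666...)
```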

37
Position matters

Recommended (ranked): 1. Item 98, 2. Item 32, 3. Item 152, 4. Item 74, 5. Item 59
Used in reality: Item 32, Item 56, Item 74
→ Hits higher up in the ranking (Item 32 at rank 2, Item 74 at rank 4) should count more than hits further down

38
Only first k positions matter

Recommended (ranked): Item 98, Item 32, Item 152, Item 74, Item 59, …, Item 56
Used in reality: Item 32, Item 56, Item 74
→ Only hits within the first k recommended positions count; Item 56 is recommended only below the cut-off and is therefore not counted

39
Utility-based measure

Idea:
• each item has a utility for each user
• the utility depends on the ground-truth rating and on the position in the
recommendation list

Utility score for a user:

Overall utility score:
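
The concrete formulas from the slide are not reproduced here; as one hedged example, the following sketch assumes an exponential position discount in the style of the half-life utility (R-score) of Breese et al., with an invented half-life and example data:

```python
# Position-discounted utility sketch (half-life style); alpha and the data are
# illustrative assumptions, not the formula from the slide.
def user_utility(recommended: list, true_ratings: dict, default: float = 0.0, alpha: int = 5) -> float:
    """Sum of (rating - default), halved every (alpha - 1) positions down the list."""
    score = 0.0
    for pos, item in enumerate(recommended, start=1):
        gain = max(true_ratings.get(item, default) - default, 0.0)
        score += gain / (2 ** ((pos - 1) / (alpha - 1)))
    return score

# Overall utility: sum (or average) over all users, often normalized by the
# maximally achievable utility per user.
print(round(user_utility([98, 32, 152, 74, 59], {32: 4.0, 56: 5.0, 74: 3.0}), 2))  # ≈ 5.15
```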

40
Selecting a metric

Selecting a metric is not easy


All metrics have drawbacks

Maximizing accuracy might not lead to better recommendations/usefulness

→ Solution: combine different evaluation techniques

cf. Herlocker, 2004; Schafer et al., 2007; Aggarwal, 2016

41
Wrap up
Wrap up

43
Next Lecture

• Clustering, Classification & Prediction

44
Prof. Dr. Sven Strickroth
Ludwig-Maximilians-Universität München
Institut für Informatik
Lehr- und Forschungseinheit für
Programmier- und Modellierungssprachen
Oettingenstraße 67
80538 München

Telefon: +49-89-2180-9300
sven.strickroth@ifi.lmu.de

45
References & Further Reading

• Chrysafiadi, K., & Virvou, M. (2013). Student modeling approaches: A literature review for the last decade. Expert Systems with Applications, 40(11), 4715-4729.
• Kay, J. (2000). Stereotypes, student models and scrutability. In International
Conference on Intelligent Tutoring Systems (pp. 19-30). Springer, Berlin, Heidelberg.
• Conati, C., Gertner, A., & Vanlehn, K. (2002). Using Bayesian networks to manage
uncertainty in student modeling. User modeling and user-adapted interaction, 12(4),
371-417.
• Brusilovsky, P., & Millán, E. (2007). User models for adaptive hypermedia and
adaptive educational systems. In The adaptive web (pp. 3-53). Springer, Berlin,
Heidelberg.

• Aggarwal, C. (2016). Recommender systems: The textbook. Springer.


• Erdt, M., Fernandez, A., & Rensing, C. (2015). Evaluating recommender systems for
technology enhanced learning: a quantitative survey. IEEE Transactions on Learning
Technologies, 8(4), 326-344.
• Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for
recommender systems. Computer, 42(8).
• Segaran, T. (2007). Programming collective intelligence: building smart web 2.0
applications. O'Reilly

46
