Topic Analysis in
Graduate Discussion Forums
Gokarn Mallika Nitin
Swapna Gottipati
Venky Shankararaman
• Background
• Motivation for Discussion Forum Analysis
• Literature Survey
• Research Problem Statement
• Data Set details
• Solution Model and Architecture Design
• Experiments and Findings
• Limitations and Future Work
• Demo
Discussion Forums in Teaching & Learning Process
Systems Comparison
Argumentative Analysis
Discourse Analysis
Sentiment Analysis
In-class Discussions
Participation
Individual Behaviour Analysis
Faculty Response Analysis
Comparison
Individual Interactions
Interaction Behaviour Analysis
Faculty-Student Interactions
This Paper
Argumentative Analysis
Discourse Analysis
Student profiling Ongoing Work
“Cognitive and Social Interaction Analysis in Graduate Discussion Forums”, Mallika Nitin, Swapna Gottipati, Venky Shankararaman, FIE 2019. In Proceedings of the 49th Annual Frontiers in Education Conference.
Research Problem
Aids in gaining insights into forums and supports collaborative learning with further features
Research Questions:
RQ1: How well does the clustering technique perform in discovering sub-topics?
RQ2: Which visualizations are suitable for sub-topic and topic evolution representations?
Data
• Course duration: 14
• Lesson content: 8
• Enrolled students: 55
• Industry experience: >50%
• Forum participants: 37
• Responses: 200
Data - Discussion Forum Design
Graduate students show reservations if the forum is redundant or merely repeats the course content.
5 Text Clustering
What are suitable visuals for displaying the cluster results? (Free draw and upload.)
Explain one clustering evaluation measure with an example.
6 Information Extraction
What are applications of HMM models (Or any other Sequence Model)?
What are examples of information extraction in industry (e.g. Finance, Retail, Travel, Healthcare, Media, Education etc.)?
9 Sentiment analysis
Discuss the technique to handle negation in opinions.
Discuss the technique to handle sarcasm in opinions.
Data – Sample Post
Visualization
1. Exploratory visuals
2. Sub-topic network visuals
3. Topic evolution visuals
Evaluations
1. Qualitative analysis of clustering algorithms
2. Quantitative analysis
Application Architecture
Dashboard Components
The instructor can intervene and create more content for class topics with low participation
Relative Participation Scoring
$$z\text{-score} = \frac{x - \mu}{\sigma}$$
The instructor can intervene and aid the students who need help
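The relative participation score can be illustrated with a short sketch. The slides do not define the symbols, so this assumes x is one student's number of posts, μ is the class mean, and σ is the population standard deviation of post counts. The function name `participation_z_scores` and the sample counts are hypothetical.

```python
from statistics import mean, pstdev

def participation_z_scores(post_counts):
    """Return a z-score per student from a {student: post count} mapping.

    Assumption: sigma is the population standard deviation; the slides
    do not say whether population or sample deviation was used.
    """
    counts = list(post_counts.values())
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # Everyone posted the same amount, so nobody stands out.
        return {student: 0.0 for student in post_counts}
    return {student: (x - mu) / sigma for student, x in post_counts.items()}

# Made-up post counts, for illustration only.
post_counts = {"student_a": 2, "student_b": 4, "student_c": 6}
z = participation_z_scores(post_counts)
for student, score in z.items():
    print(f"{student}: {score:+.2f}")
```

A negative score marks a student who posts less than the class average, which is the kind of signal an instructor could use to decide whom to help.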
Topic Discovery and Evolution Analysis
Topic Discovery and Evolution: K-Means, Agglomerative Clustering
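To make the agglomerative approach concrete, here is a minimal from-scratch sketch that groups posts bottom-up and lists the most frequent words per cluster. It is not the authors' implementation. The bag-of-words features, cosine distance, average linkage, whitespace tokenization, and the names `agglomerative` and `top_words` are all assumptions made for illustration, and the posts are invented.

```python
import math
from collections import Counter

def bag_of_words(text):
    # Assumption: lowercase whitespace tokens, no stop-word removal.
    return Counter(text.lower().split())

def cosine_distance(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0

def agglomerative(posts, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of post indices."""
    vectors = [bag_of_words(p) for p in posts]
    clusters = [[i] for i in range(len(posts))]  # start with one cluster per post
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                total = sum(cosine_distance(vectors[a], vectors[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

def top_words(posts, cluster, k=5):
    counts = Counter()
    for i in cluster:
        counts.update(bag_of_words(posts[i]))
    return [word for word, _ in counts.most_common(k)]

# Made-up posts, for illustration only.
posts = [
    "sarcasm negation sentiment",
    "negation sentiment polarity",
    "chatbot customer service",
    "customer service chatbot banks",
]
clusters = agglomerative(posts, n_clusters=2)
for cluster in clusters:
    print(cluster, top_words(posts, cluster, k=3))
```

K-Means differs in that it starts from a fixed number of centroids and reassigns posts iteratively, whereas this approach merges the closest pair of clusters until the target count is reached.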
Human Gold Truth
We asked two human judges to label the coherence of the top words in each cluster.
Top words in each cluster, labelled C (coherent) or R (repetitive):

| Agglomerative: top words in each cluster | C/R | K-means: top words in each cluster | C/R |
|---|---|---|---|
| ['machine', 'tweet', 'comments', 'replying', 'google'] | C | ['machine', 'training', 'model', 'dataset', 'data'] | C |
| ['discuss', 'technique/s', 'opinions', 'handle', 'cluster'] | C | ['doubt', 'clarified', 'arun', 'comment', 'mislead'] | R |
| ['sentiment', 'negation', 'sarcasm', 'analysis', 'expressions'] | C | ['customer', 'chatbots', 'service', 'customers', 'chatbot'] | C |
| ['text', 'mining', 'classification', 'government', 'precision'] | C | ['doubt', 'clarified', 'arun', 'comment', 'mislead'] | R |
| ['in-class', 'question', 'thanks', 'students', 'compilation'] | C | ['sentiment', 'negation', 'analysis', 'scope', 'polarity'] | C |
| ['chatbots', 'customer', 'service', 'questions', 'banks'] | C | ['government', 'sites', 'classification', 'banks', 'examples'] | R |
| ['data', 'plagiarism', 'clinical', 'manually', 'website'] | C | ['patient', 'text', 'healthcare', 'doctors', 'analytics'] | C |
| ['doctors', 'symptoms', 'flu', 'doctor', 'analytics'] | C | ['words', 'slide', 'word', 'document', 'general'] | C |
| ['sequence', 'models', 'applications', 'state', 'model'] | C | ['doctor', 'medical', 'records', 'hospital', 'doctors'] | C |

Agglomerative shows good performance. What about quantitative?
Quantitative Evaluations - Agglomerative
We asked two human judges to label the coherence of the topics to the comments.
We observe that sub-topics can be part of more than one main topic. For example, topic 6, “medical and healthcare”, appears under “text classification”, “text mining introduction”, and “clustering”.
The instructor can identify missing sub-topics and submit posts under the main topic to guide the students in the learning process
Visualizations - Topic Evolution Over Time (Sub-topics)
RQ2: Network and interactive graphs are suitable for user-friendly insights into topics and their evolution
Content Analysis
Naïve Bayes Model
Content Type Analysis
Human Gold Truth
Evaluations
Ongoing Research
swapnag@smu.edu.sg
http://www.mysmu.edu/faculty/swapnag/publications.html
31 | Final Presentation
Impact of number of postings on the grades
Quantitative Evaluations
Cluster Coherence
| Metric | kMeans | Agglo |
|---|---|---|
| Silhouette score | 0.03 | 0.02 |
| Calinski-Harabasz score | 2.06 | 1.92 |
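For readers unfamiliar with the silhouette score, the sketch below computes it from its standard definition: for each point, a is the mean distance to points in its own cluster, b is the mean distance to the nearest other cluster, and the point's score is (b - a) / max(a, b). Values near 1 indicate well-separated clusters, and values near 0, like those reported above, indicate overlapping clusters. Euclidean distance, the function names, and the toy points are assumptions for illustration; the slides do not state which distance was used.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the same cluster.
        same = [euclidean(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to any other cluster.
        b = math.inf
        for other in set(labels) - {labels[i]}:
            dists = [euclidean(p, q) for j, q in enumerate(points)
                     if labels[j] == other]
            b = min(b, sum(dists) / len(dists))
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy 2-D points, for illustration only.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # clusters match the geometry
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters cut across it
print(f"good labelling: {good:.2f}, bad labelling: {bad:.2f}")
```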