PARQR: Augmenting The Piazza Online Forum To Better Support Degree Seeking Online Masters Students
Noah Bilgrien, Roy Finkelberg, Chirag Tailor, India Irish, Girish Murali, Abhishek Mangal, Niklas
Gustafsson, Sumedha Raman, Thad Starner, Rosa Arriaga
Georgia Institute of Technology
Atlanta, GA, USA
{nbilgrien, roy, chirag.tailor, indiai, girishmn92, abhishekmangal, egustafsson3, sraman46, thad,
arriaga}@gatech.edu
arXiv:1909.02043v1 [cs.HC] 4 Sep 2019
sions. Figure 1 shows a post on Piazza answered by a student and an instructor, along with a follow-up discussion.

user's current context. Remembrance Agents run as a continuous background process rather than being actively invoked by a user [9]. PARQR uses this method in online classrooms, detracting less from the user experience than switching modes to perform a keyword search.
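The continuous, non-invoked retrieval style described here can be illustrated with a minimal sketch (all names are hypothetical and the overlap scoring is a stand-in, not PARQR's actual implementation): an agent that refreshes its suggestions whenever the context it observes changes, rather than waiting for an explicit search.

```python
# Minimal sketch of a continuously running retrieval agent in the style of a
# Remembrance Agent. Hypothetical names; not PARQR's actual code. Rather than
# being invoked by an explicit search, the agent refreshes its suggestions
# whenever the context it observes changes.

def suggest(context, corpus):
    """Rank corpus entries by naive word overlap with the current context."""
    ctx = set(context.lower().split())
    scored = [(len(ctx & set(doc.lower().split())), doc) for doc in corpus]
    return [doc for score, doc in sorted(scored, reverse=True) if score > 0]

class ContinuousAgent:
    def __init__(self, corpus):
        self.corpus = corpus
        self._last_context = None
        self.suggestions = []

    def observe(self, context):
        """Called on every context update; retrieval runs only on change."""
        if context != self._last_context:
            self._last_context = context
            self.suggestions = suggest(context, self.corpus)
        return self.suggestions
```

Because `observe` is called on every update but only recomputes when the context changes, the agent stays out of the user's way, which is the property the Remembrance Agent design aims for.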
PARQR
PARQR is, at its core, an intelligent agent that aims to present relevant and actionable content to students and instructors based on the context in which the user is operating. The design framework for determining when PARQR should provide content, and what sort of content it should provide, is simple: given the user's current context, what information can we suggest that focuses the user on actionable content?
When students arrive on Piazza's home page, PARQR assumes they want to browse previous posts to stay up to date with course discussions or gain insight into an assignment. Here PARQR suggests posts with many views and extensive follow-up discussions, indicators that other students have found these posts worthy of their attention. When students are asking questions, PARQR assumes they wish to find an answer to their question and presents related posts that may contain it. When instructors arrive on Piazza's home page, PARQR assumes they are there to answer students' questions, so it suggests questions that have not yet been answered but have received substantial student attention.
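This context-dependent policy can be summarized as a simple dispatch. The sketch below is illustrative only: the field and function names are hypothetical, and the "related post" score is a placeholder for the TF-IDF similarity the system actually computes.

```python
# Sketch of PARQR's context-dependent suggestion policy described above.
# Field and function names are illustrative, not from the deployed system.

def pick_suggestions(role, context, posts, limit=5):
    if context == "home" and role == "student":
        # Browsing: surface posts with many views and follow-up discussions.
        ranked = sorted(posts, key=lambda p: p["views"] + p["followups"],
                        reverse=True)
    elif context == "new_post":
        # Asking: related posts that may already contain the answer
        # (the deployed system ranks these by TF-IDF cosine similarity).
        ranked = sorted((p for p in posts if p["similarity"] > 0),
                        key=lambda p: p["similarity"], reverse=True)
    elif context == "home" and role == "instructor":
        # Answering: unanswered questions that have drawn student attention.
        unanswered = [p for p in posts if not p["instructor_answered"]]
        ranked = sorted(unanswered, key=lambda p: p["views"], reverse=True)
    else:
        ranked = []
    return ranked[:limit]
```

The point of the dispatch is that the same corpus of posts yields different suggestions depending on who is looking and what they are doing.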
Figure 1. A post answered by a student and an instructor along with a
followup discussion containing two contributions.
answer. The number of post views and followups are min-max normalized, and the parameter θ, used as an offset for the sigmoid method, is set to seven days, which has the effect of prioritizing newer posts.

The second kind of suggestion that PARQR makes for students is to recommend existing, related posts as students are writing a new post (Figure 2). The goal is to answer a student's question immediately (thereby preventing a duplicate post) or, at a minimum, to help better inform the question.

To make these recommendations, PARQR computes a distance from the vectorized version of the in-progress post (the post header, body, and selected tags) to each vectorized post in the database, where distance is measured by the cosine similarity between the two posts. PARQR then recommends the five closest posts. This process is expanded upon in the "System Architecture" section.

The Instructor's Experience
When instructors visit the home page of a class in Piazza, PARQR shows them posts that we believe they would want to take action on: recently posted questions with many views and followups but no instructor answer. When choosing these posts, PARQR first finds posts with no instructor response. If there are more than six of these, it additionally filters for posts with no student response. If there are still more than six, they are sorted in descending order by how many unresolved followups they have, and the top six are reported.

System Architecture
To facilitate scaling to a large number of active concurrent users, we designed the backend system as a collection of microservices in separate Docker containers.

PARQR uses Faran's unofficial Piazza API [4] to retrieve posts and populates our MongoDB with the post title, body, tags, answers, and any followups. The model training service runs sequentially after the parsing service to update models with new information. We use scikit-learn [8] and the Natural Language Toolkit (NLTK) [1] to create four models per class. These models are Term Frequency-Inverse Document Frequency (TF-IDF) encodings of 1) the concatenation of the post title, body, and tags, 2) the answer from an instructor, 3) the answer from a student, and 4) the concatenation of all follow-up discussions on the question. Before these posts are encoded, we use the NLTK SnowballStemmer and WordNetLemmatizer to reduce the dimensionality of our data. The final models and embeddings of existing posts are cached in a Redis queue. This entire process occurs once per class every 15 minutes.

To render this information to the end user, we built a browser extension that monitors user activity and inserts recommendations into the page as discussed above. The browser extension is supported on both Google Chrome and Mozilla Firefox. It communicates with a RESTful backend API service built in Python using Flask. When a student begins writing a question, the browser extension sends the contents of the question to the API. The API then transforms this text using the TF-IDF models from the corresponding class. Using cosine similarity as a distance metric, this vector is compared to all other vectorized posts from this class using the four models described above. The final score is computed as a weighted average of the scores from the four models; these weights were hand-tuned, with the highest weight given to the model trained on the content of the question. PARQR displays the five suggestions with the highest similarity. The browser extension also allows us to monitor how PARQR is used; it registers events whenever a user clicks on the New Post button, one of our recommendations, or the Submit Post button.

EXPERIMENTS
We determined that PARQR reduces the number of duplicate posts by comparing Piazza forum activity for a particular course taught with and without PARQR. For our analysis we chose the Introduction to AI course during the Spring 2017 and Spring 2019 semesters and focus on the time period of Assignment 2, in which students implement multiple classical AI search algorithms. Though PARQR is used by multiple classes, we selected the Introduction to AI class for our experiments because it has a large number of students who are very active on Piazza. Assignment 2 was chosen because it is identical in subsequent semesters and is early enough in the semester that most students who would drop were still in the class. Additionally, this class consists entirely of online degree-seeking students. Spring 2017 was prior to any development of PARQR and thus had no PARQR users. In Spring 2019, PARQR was used by 98% of Piazza users in the class.

Duplicate Post Inter-Rater Reliability
Here we define two or more questions as duplicates if a student writing one of them would stop upon being shown answers to any of the others (i.e., the posts contain sufficient information to answer one another). This judgment is subjective, and the dataset is large enough that no one researcher could be responsible for labeling it. We isolated two sets of posts: Assignment 1 from Spring 2017 for calculating inter-rater reliability, and Assignment 2 from both semesters for analyzing PARQR's impact.

Three researchers hand-clustered Assignment 1 from the Spring 2017 course. The researchers were given a spreadsheet of Piazza posts from Assignment 1, including the subject, body, and answers to the posts. Each researcher clustered the posts into groups of duplicates. The researchers then negotiated and merged their clusterings to create a single "gold standard" clustering of the posts.

Next, three other researchers were given 200 pairs of questions pulled from this gold standard, in which the ratio of duplicate pairs (17%) matched that of the gold standard. They were asked to label each pair as either a duplicate or not. All three raters, acting independently, agreed on whether two questions are duplicates 90% of the time. The average agreement of the three pairwise inter-rater comparisons was 93%. This is significantly above the baseline agreement of 83%, which could have been attained by always predicting the majority class (not duplicate). Given these results, we are confident in having multiple researchers cluster the larger datasets, as done in the next section.
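As a concrete illustration of the retrieval step just described (four TF-IDF models, scores combined by a hand-tuned weighted average of cosine similarities, top five returned), here is a sketch in pure Python. A minimal TF-IDF stands in for scikit-learn's TfidfVectorizer, and the field names and weights are illustrative assumptions, not PARQR's actual values.

```python
import math
from collections import Counter

# Sketch of the weighted multi-model retrieval described above. A minimal
# TF-IDF replaces scikit-learn's TfidfVectorizer for self-containment; the
# field names and weights are illustrative, with the highest weight on the
# question content, as in the deployed system.

def tfidf_vectors(docs):
    """Build TF-IDF weight dicts for a document list; also return the IDF table."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    vecs = [{w: tf * idf[w] for w, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def vectorize(text, idf):
    """Encode a query against a built IDF table; unseen words get weight 0."""
    return {w: tf * idf.get(w, 0.0) for w, tf in Counter(text.lower().split()).items()}

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(query, posts, weights, top_k=5):
    """Score each post by a weighted average of per-field cosine similarities."""
    totals = [0.0] * len(posts)
    for field, weight in weights.items():
        vecs, idf = tfidf_vectors([p[field] for p in posts])
        query_vec = vectorize(query, idf)
        for i, vec in enumerate(vecs):
            totals[i] += weight * cosine(query_vec, vec)
    norm = sum(weights.values())
    ranked = sorted(((t / norm, i) for i, t in enumerate(totals)), reverse=True)
    return [posts[i]["content"] for score, i in ranked[:top_k]]
```

In this sketch, each field (question content, instructor answer, student answer, follow-ups) gets its own TF-IDF space, mirroring the four per-class models; a per-field weight then lets the question content dominate the final ranking.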
Statistic                     Spring 2017   Spring 2019
Enrolled students             390           590
Students active on Piazza     385           590
PARQR users (percentage)      0 (0%)        578 (98.0%)
Number of posts               195           168
Number of duplicate posts     50 (25.6%)    30 (17.8%)
Posts per active student      0.506         0.291
Table 1. Piazza metrics during Assignment 2.

Analysis of Reduction of Duplicate Posts
To determine the effect of PARQR on duplicate posts, we shuffled posts from the Spring 2017 and Spring 2019 Assignment 2 into one dataset, keeping the labelers blind to which semester each post came from and storing a mapping to allow reversing this process. Eight researchers then collaboratively and asynchronously clustered this data into clusters of duplicate posts. Note that since the semesters are combined, the defined clusters were common across both. The data was then split back into the original semesters. The results of this analysis are summarized in Table 1.

The metrics gathered in Table 1 show a drop in the number of duplicate posts from Spring 2017 to Spring 2019. We performed a one-sided Z-test on the difference in these proportions and found the drop in duplicate posts statistically significant (p = 0.0392). Note also that the number of posts per active student declined.

Evaluating the Model
To evaluate the performance of our model, we performed a walk-forward validation experiment. We used the labeled gold-standard dataset from Assignment 1 of the Fall 2017 AI course, containing 179 posts, of which 34 had at least one prior duplicate post. We ordered the posts chronologically. Then, for each post p_i, we trained our model ensemble on posts p_0 to p_{i-1} and queried the model for posts relevant to p_i. The model was able to retrieve at least one relevant post, if it existed, 73.5% of the time.

Instructor Interviews
Five teaching assistants (TAs) from the Spring 2019 Introduction to AI teaching staff were interviewed, all of whom had previously taken the class. Four of the five TAs used PARQR as instructors. TAs reported that the posts suggested to them on their home page as "high attention posts" were a useful way to keep on top of new posts and to gauge which questions or topics were likely to be asked about during office hours (which each TA holds once a week). One TA noted that because the instructor-suggested posts display the number of follow-ups, they can tell when there is an active discussion going on. When asked whether PARQR reduced the number of duplicate posts in a class, instructors were unsure, suggesting that instructors will be unable to tell if PARQR is having an overall impact on the student experience in the class.

FUTURE WORK
We are continuing to interview more instructors and students in order to gain a better understanding of PARQR's impact and users' experiences with it. We are working to capture more information from student activity so we can understand their behavior on Piazza and how PARQR changes this behavior, especially over time. We continue to look for new ways to use PARQR to augment the online classroom experience with tools and supplemental information that help students and instructors alike.

REFERENCES
1. Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media.
2. C. G. Brinton, M. Chiang, S. Jain, H. Lam, Z. Liu, and F. M. F. Wong. 2014. Learning about Social Learning in MOOCs: From Statistical Analysis to Generative Model. IEEE Transactions on Learning Technologies 7, 4 (Oct 2014), 346–359.
3. Anat Cohen, Udi Shimony, Rafi Nachmias, and Tal Soffer. 2019. Active learners characterization in MOOC forums and their generated knowledge. British Journal of Educational Technology 50, 1 (2019), 177–198.
4. Hamza Faran. 2018. piazza-api. https://github.com/hfaran/piazza-api/. (2018).
5. Ashok K. Goel and Lalith Polepeddi. 2016. Jill Watson: A Virtual Teaching Assistant for Online Education. (2016).
6. David Joyner. 2018. Squeezing the Limeade: Policies and Workflows for Scalable Online Degrees. In Proceedings of the Fifth Annual ACM Conference on Learning at Scale (L@S '18). ACM, New York, NY, USA, Article 53, 10 pages. DOI: http://dx.doi.org/10.1145/3231644.3231649
7. René F. Kizilcec, Chris Piech, and Emily Schneider. 2013. Deconstructing Disengagement: Analyzing Learner Subpopulations in Massive Open Online Courses. In Proceedings of the Third International Conference on Learning Analytics and Knowledge (LAK '13). ACM, New York, NY, USA, 170–179. DOI: http://dx.doi.org/10.1145/2460296.2460330
8. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
9. Bradley J. Rhodes and Thad Starner. 2002. Remembrance Agent: A continuously running automated information retrieval system.
10. Dhawal Shah. 2018. By The Numbers: MOOCs in 2018.
11. Diyi Yang, Mario Piergallini, Iris Howley, and Carolyn Rose. 2014. Forum thread recommendation for massive open online courses. In Educational Data Mining 2014. Citeseer.
12. Saijing Zheng, Pamela Wisniewski, Mary Beth Rosson, and John M. Carroll. 2016. Ask the Instructors: Motivations and Challenges of Teaching Massive Open Online Courses. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (CSCW '16). ACM, New York, NY, USA, 206–221. DOI: http://dx.doi.org/10.1145/2818048.2820082