
Topic-Modeling

Communities of Discourse
in Doctoral Dissertations

Benjamin Miller
University of Pittsburgh
millerb@pitt.edu
@benmiller314
Outline
1. Dissertations
2. Topic Modeling
3. Communities of Doctoral Discourse
Dissertations

don’t get read by many people,
but a lot of people write them.
Dissertations

are a knowledge-making genre,
but also a discipline-producing genre.
Dissertations

are how we write our way into the field,
and into a professional identity within it.

They’re usefully distributed, if we’re interested
in what counts as a way into the field.

They’re usefully constrained, if we’re interested
in what counts as a way into the field.
Dissertations

don’t get read by many people,
but a lot of people write them,
so there are too many to read them all.

N: 1,225 full-text dissertations from the Consortium of
doctoral programs in Rhetoric/Composition, 2001-2010
The data

Dissertations & Theses:
• full text of dissertation
• metadata
– title
– abstract
– author
– open vocab keywords
– closed vocab subjects
– pages
– university
– advisor
– accession number

Outline
1. Dissertations
2. Topic Modeling
3. Communities of Doctoral Discourse
Topic Modeling
is a kind of text mining
Dissertations & Theses:
• full text of dissertation
• metadata (title, abstract, author, open vocab keywords, closed vocab subjects, pages, university, advisor, accession number)

Topics

Topic Modeling
in brief

students, writing, class, teacher, paper, instructor, semester, assignment

public, political, social, economic, rhetoric, society, power, labor, class

A: In five classrooms: A descriptive study of “before writing teaching practices” in encouraging college writers to write
B: Inside the teaching machine: The United States public research university, surplus value, and the political economy of globalization
C: Entering the fray: The slogan's place in Bolshevik organizational communication
Goldstone and Underwood 2014:
“The aim of topic modeling is to identify the thematic
or rhetorical patterns that inform a collection of
documents: for instance, the articles in a group of
scholarly journals. These patterns we refer to as topics.
If each article were about a single topic, we would only
need to sort the articles into categories. But in reality,
any article participates in multiple thematic and
rhetorical patterns.” (boldface added)
Goldstone and Underwood 2014:
“The aim of topic modeling is to identify the thematic
or rhetorical patterns that inform a collection of
documents: for instance, the articles in a group of
scholarly journals. These patterns we refer to as topics.
If each article were about a single topic, we would only
need to sort the articles into categories. But in reality,
any article participates in multiple thematic and
rhetorical patterns. […] The algorithm responds to this
challenge by modeling a topic as an intersection of
vocabulary and context: it identifies groups of words
that tend to be associated with each other in a
particular subset of documents.”
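To make "groups of words that tend to be associated with each other in a particular subset of documents" concrete, here is a toy sketch of one common topic-modeling algorithm, LDA fitted by collapsed Gibbs sampling. It is not the tooling behind these slides: the function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the five miniature "documents" are all assumptions made for illustration.

```python
import random
from collections import Counter

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # z[d][i] is the topic currently assigned to token i of document d.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                       # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token's current assignment out of the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight each topic by (how much this document uses it)
                # times (how much the topic uses this word).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smooth and normalize counts into per-document topic proportions.
    doc_mix = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return doc_mix, topic_word

# Hypothetical mini-corpus built from words shown on the slides.
docs = [
    "students writing class teacher paper semester assignment".split(),
    "students class instructor writing teacher paper".split(),
    "public political social economic power labor class".split(),
    "political economic society labor power public".split(),
    "students writing class public political labor".split(),
]
doc_mix, topic_word = fit_lda(docs, n_topics=2)
for k, counts in enumerate(topic_word):
    print("topic", k, [w for w, _ in counts.most_common(5)])
for row in doc_mix:
    print([round(share, 2) for share in row])
```

Each document ends up with a mixture of topics rather than a single category, which is the point of the quotation. A word such as "class" can also carry weight in more than one topic.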
Topic Modeling
in brief

students, writing, class, teacher, paper, instructor, semester, assignment

public, political, social, economic, rhetoric, society, power, labor, class

Both lists include: class

A: In five classrooms: A descriptive study of “before writing teaching practices” in encouraging college writers to write
B: Inside the teaching machine: The United States public research university, surplus value, and the political economy of globalization
C: Entering the fray: The slogan's place in Bolshevik organizational communication
Topic Modeling
in brief

Blei et al 2011
Topic models are all over DH
• Scott Weingart has a number of
useful roundup posts at
scottbot.net/tag/topic-modeling/,
including “Topic Modeling for
Humanists: A Guided Tour.”

• (I highly recommend it if you’re
interested.)
Topic models in comp/rhet
• Clancy Ratliff and Jonathan
Goodwin 2013:
topic models of 5 journals
(CE, CCC, RSQ, JAC, RR)
http://www.culturecat.net/node/1564
Topics don’t come with labels
Topic 32
Top words: students, writing, student, class, teacher,
classroom, teachers, paper, instructor, research,
study, instructors, semester, college, assignment,
classes, write, teaching, learning, ...

(plus abstracts
further down)
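A topic arrives as a weighted word list, and the "top words" are simply the heaviest entries; the label is something a reader adds afterward. A minimal sketch, assuming a topic is stored as a word-to-weight dictionary: the function `top_words` and the numeric weights are hypothetical, and only the word order is taken from the Topic 32 slide.

```python
def top_words(word_weights, n=5):
    """Return the n highest-weighted words for one topic, heaviest first."""
    # Sort by descending weight; break ties alphabetically for stable output.
    ranked = sorted(word_weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]

# Hypothetical weights; only the ordering mirrors the slide.
topic_32 = {
    "students": 0.050,
    "writing": 0.041,
    "student": 0.030,
    "class": 0.022,
    "teacher": 0.019,
    "classroom": 0.017,
    "teachers": 0.015,
}
print(top_words(topic_32, n=5))
```

Nothing in the output says "Students in the Classroom"; that name is an interpretation of the list.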
Topics don’t come with labels
• More awesome topic browsers:
– Ratliff and Goodwin:
http://jgoodwin.net/rhet-browser/
– Goldstone and Underwood:
http://rci.rutgers.edu/~ag978/quiet
Outline
1. Dissertations
2. Topic Modeling
3. Communities of Doctoral Discourse
No one topic dominates
Rank 1: Students in the Classroom
Topic weight: 5.5%
Top words: students, writing, student, class, teacher, classroom, teachers, paper, instructor, research, study, instructors, semester, college, assignment, classes, write, teaching, learning, …

(Chart of per-topic weights, in % of corpus; values range from 0.4 to 5.5.)
No one topic dominates
1: Students in the Classroom
2: (Critical) Pedagogical Theory
Topic weight: 4.7%
Top words: students, composition, teaching, pedagogy, classroom, teachers, critical, work, student, teacher, theory, studies, knowledge, learning, ways, education, academic, pedagogical, practice, …

(Chart of per-topic weights, in % of corpus; values range from 0.4 to 5.5.)
No one topic dominates
1: Students in the Classroom
2: (Critical) Pedagogical Theory
3: Philosophy of Language
4: Story and Narrative
5: Identity Construction
6: Process Reflections
7: Community Engagement
8: Capitalism, Marxism, and Activism
9: Comprehension and Usability
10: Workplace and Organizational Histories

(Chart of per-topic weights, in % of corpus; values range from 0.4 to 5.5.)
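One plausible way to arrive at a corpus-level "topic weight" is to average each topic's share across documents and then rank the results. This is a sketch under that assumption, not a description of how the slide's figures were computed: `topic_weights`, `rank_topics`, and the four-document `doc_mix` are hypothetical.

```python
def topic_weights(doc_mix):
    """Average each topic's share across documents (unweighted by length)."""
    n_docs = len(doc_mix)
    n_topics = len(doc_mix[0])
    return [sum(doc[k] for doc in doc_mix) / n_docs for k in range(n_topics)]

def rank_topics(weights):
    """Topic indices ordered from heaviest to lightest."""
    return sorted(range(len(weights)), key=lambda k: -weights[k])

# Hypothetical per-document topic proportions; each row sums to 1.
doc_mix = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
    [0.2, 0.2, 0.6],
]
weights = topic_weights(doc_mix)
for rank, k in enumerate(rank_topics(weights), start=1):
    print(f"Rank {rank}: topic {k}, weight {weights[k]:.1%}")
```

With dozens of topics instead of three, even the top-ranked topic holds only a small share, which is what "no one topic dominates" describes.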
Yearly Variation of Topic Proportions Generally Preserves Topic Rank
Consortium Program dissertations, N 1225, years 2001−2010

Topic ranks relatively stable in this time window

(Chart. y-axis: Portion of Corpus (scaled to 1), from 0.00 to 0.08. x-axis: Topic number, in the order shown on the slide: 32, 8, 48, 10, 15, 1, 35, 17, 55, 12, 31, 39, 25, 14, 21, 28, 6, 44, 41, 11, 45, 9, 43, 23, 7, 18, 3, 49, 36, 27, 40, 20, 30, 16, 19, 53, 38, 26, 29, 46, 52, 37, 33, 42, 54, 51, 34, 5.)
But topics form clusters
Topics form clusters

bit.ly/dissclusters
Topics form clusters
Teaching of Writing: 19.3%
Identity and Performance: 18.6%
Politics: 17.25%
Theories of Meaning-Making: 15.6%
Design and Audience: 14%
WPA? Curriculum?: 11.9%
bit.ly/dissclusters
Clusters vary by program. Maybe.
Consortium schools, any department:
Teaching of Writing: 19.2%
Theories of Meaning-Making: 17.25%
Identity and Performance: 15.1%
Design and Audience: 14.5%
WPA? Curriculum?: 12.14%
Politics: 9.68%

Consortium programs only:
Teaching of Writing: 19.3%
Identity and Performance: 18.6%
Politics: 17.25%
Theories of Meaning-Making: 15.6%
Design and Audience: 14%
WPA? Curriculum?: 11.9%
Clusters vary by program. Maybe.

Topic 17: Capitalism, Marxism, and activism
Rank: 8
% of corpus: 3.60
Top words: public, political, social, economic, movement, rhetoric, society, politics, power, cultural, labor, university, state, democracy, change, action, democratic, rhetorical, class, …

Politics cluster:
Consortium schools, any department: 9.68%
Consortium programs only: 17.25%
And topics co-occur across clusters

bit.ly/dissedges
and when they don’t … opportunity!

bit.ly/dissedges
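Co-occurrence can be sketched as counting, for every pair of topics, the documents in which both are substantially present; pairs with a count of zero are the gaps. The version below is an illustration only: the 0.2 `threshold`, the function names, and the toy `doc_mix` are assumptions, and the linked visualization may define its edges differently.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(doc_mix, threshold=0.2):
    """Count, per topic pair, the documents where both reach the threshold."""
    pairs = Counter()
    for doc in doc_mix:
        present = [k for k, share in enumerate(doc) if share >= threshold]
        # combinations() yields each pair once, with the lower index first.
        pairs.update(combinations(present, 2))
    return pairs

def missing_pairs(pairs, n_topics):
    """Topic pairs that never co-occur in any document."""
    return [p for p in combinations(range(n_topics), 2) if pairs[p] == 0]

# Hypothetical per-document topic proportions for four topics.
doc_mix = [
    [0.5, 0.4, 0.1, 0.0],
    [0.3, 0.3, 0.4, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.6, 0.0, 0.4, 0.0],
]
pairs = cooccurrence(doc_mix)
print(dict(pairs))
print(missing_pairs(pairs, n_topics=4))
```

The second list is where the "opportunity" would show up: topic pairs that no dissertation in the sample has yet brought together.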
Questions?
I have a lot, too.
Let’s talk!

Benjamin Miller
University of Pittsburgh
millerb@pitt.edu
@benmiller314