IN DISCOURSE ANALYSIS
What is discourse?
◦ The term was first introduced by Z. Harris in 1952 under a formalist view of discourse
as a higher level in the hierarchy of morpheme, clause and sentence.
◦ In the framework of the functional approach (G. Leech, 1983; D. Schiffrin, 1994),
discourse is defined as language use: “utterances”, i.e.
“units of linguistic production (whether spoken or written) which are
inherently contextualized” (Schiffrin, 1994).
◦ Discourse can be understood as stretches of language perceived to be
meaningful, united and purposive (Cook, 1989).
◦ Discourse features:
◦ Stretch of language longer than a sentence
◦ Meaningful and coherent
◦ Communicative and purposive
◦ Written or spoken
◦ The advantages of a corpus approach for the study of discourse, lexis, and
grammatical variation include the emphasis on the representativeness of
the text sample, and the computational tools for investigating distributional
patterns across discourse contexts.
Corpus approach to discourse analysis
◦ Large-scale general corpora are effective and reliable in
providing insightful information about the preferred use of specific
lexico-grammatical patterns in everyday language use.
◦ The most important aspect of this approach is that it makes it
possible for linguists and discourse analysts to go beyond the
analysis of sentences and short texts to the analysis of huge
amounts of text.
Why use corpora?
15.02.2023
DISCOURSE ANALYSIS AND CORPUS LINGUISTICS
DA is ‘the study of real language use, by real speakers in real situations’ (T.A. van Dijk, 1997)
Discourse
‘language-in-use’ (Brown and Yule, 1983)
‘language-in-action’, i.e. ‘meaningful symbolic behaviour’ (Blommaert, 2005)
‘the totality of linguistic practices that pertain to a particular domain or that create a particular object’ (Jucker et al., 2009)
CL is ‘the study of language based on examples of real life language use’ (McEnery & Wilson, 1996)
Corpus is a collection of sampled texts, written or spoken, in machine-readable form which may be annotated with various
forms of linguistic information (McEnery, 2006)
Corpus-driven linguistics
a theory with corpus enquiries revealing hitherto unknown aspects of language, thus challenging the ‘underlying assumptions behind
many well established theoretical positions’ (Tognini Bonelli, 2001).
Corpus-based linguistics
a methodology for validating existing theories and descriptions (McEnery, 2006; Biber et al., 1998; Conrad, 2002)
Corpus analyses treat the text as a product rather than as discourse unfolding as process and social action:
‘the computer can only cope with the material products of what people do when they use language. It can only analyse the textual traces of the
processes whereby meaning is achieved’ (Widdowson, 2000).
‘Corpus-based methods cannot account for the complex interplay of linguistic and contextual factors whereby discourse is enacted’ (Widdowson, 2000).
Textual: approaches that focus on language choices, meanings and patterns in texts. They mostly start from a
lexico-grammatical (bottom-up) perspective but can also take a rhetorical (top-down) perspective, e.g. the
Problem-Solution (P-S) pattern (Hoey, 2001). Various phraseological elements operating at the level of discourse
are considered (theory of lexical priming).
Critical: an approach that ‘brings an attitude of criticality’, such as critical discourse analysis (CDA), but also draws
on other methods, e.g. systemic functional linguistics (SFL).
Contextual: analyses that adopt a more sociolinguistic approach to the corpus data, where situational factors are
also taken into account.
(Flowerdew, 2012; Hyland, 2009)
COUNTING – COMPARING – VISUALISING
INAUGURAL ADDRESS BY PRESIDENT JOSEPH R. BIDEN, JR., 2021
[Figure panels: Word list; Keywords]
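The counting step behind a word list can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular corpus tool: the `word_list` function, the simple regex tokenizer and the sample sentence are all assumptions made for the example, and the sample is not a quotation from the address.

```python
import re
from collections import Counter

def word_list(text, top_n=10):
    """Return (word, raw frequency, frequency per 1,000 tokens) for the top_n words."""
    # Assumption: a token is a run of letters, optionally with an apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return [(w, c, round(c / total * 1000, 1)) for w, c in counts.most_common(top_n)]

# Invented sample text, used only to show the output format.
sample = "We will be back. We will lead, and we will endure."
for word, freq, per_thousand in word_list(sample, top_n=3):
    print(word, freq, per_thousand)
```

Relative frequency (here per 1,000 tokens) is what makes counts comparable across texts of different length.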
KEYWORDS
FULL-TEXT CORPUS DATA
https://www.corpusdata.org/
CASE STUDY. UKRAINE IN THE ENGLISH ONLINE NEWS OF 2010-2021
The NOW corpus (News on the Web)
https://www.english-corpora.org/now/
COLLOCATIONS
COLLOCATIONS IN CONTEXT
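A rough sketch of how collocates and concordance (KWIC) lines for a wildcard query such as Ukrain* could be extracted from raw text. The function names, the window size (`span`) and the sample sentences are assumptions for illustration; the online corpus interface computes these internally and may rank collocates by an association measure rather than by raw co-occurrence counts.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lower-cased runs of letters count as tokens.
    return re.findall(r"[a-z]+", text.lower())

def collocates(tokens, node_prefix, span=4):
    """Count words occurring within `span` tokens left or right of the node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.startswith(node_prefix):  # prefix match stands in for the * wildcard
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts

def kwic(tokens, node_prefix, span=4):
    """Return keyword-in-context lines with the node word in brackets."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.startswith(node_prefix):
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# Invented sample text, not taken from the NOW corpus.
toks = tokenize("Talks on the crisis in Ukraine resumed today. "
                "Ukrainian officials welcomed the talks.")
print(collocates(toks, "ukrain", span=2).most_common(3))
print("\n".join(kwic(toks, "ukrain", span=2)))
```

Reading the KWIC lines alongside the collocate counts is what allows a frequent collocate to be interpreted in its actual context of use.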
VIRTUAL CORPORA
Ukrain*_USnews_2014
Ukrain*_USnews_2020
KEYNESS ANALYSIS
Ukrain*_USnews_2020
Ukrain*_USnews_2014
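One widely used keyness statistic is log-likelihood, which compares a word's frequency in a target corpus with its frequency in a reference corpus. The sketch below assumes that statistic; the slides do not state which measure was used, and the function names and placeholder tokens are invented for illustration, not results from the two virtual corpora.

```python
import math
from collections import Counter

def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood for one word: observed vs expected frequency in two corpora."""
    expected_a = size_a * (freq_a + freq_b) / (size_a + size_b)
    expected_b = size_b * (freq_a + freq_b) / (size_a + size_b)
    ll = 0.0
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll

def keywords(target_tokens, reference_tokens, top_n=5):
    """Rank words that are relatively more frequent in the target corpus."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    n_t, n_r = len(target_tokens), len(reference_tokens)
    scored = []
    for word, f_t in target.items():
        f_r = reference[word]  # 0 if the word is absent from the reference corpus
        if f_t / n_t > f_r / n_r:  # keep positive keywords only
            scored.append((word, round(log_likelihood(f_t, f_r, n_t, n_r), 2)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Placeholder tokens standing in for two tokenized sub-corpora.
target = ["alpha"] * 5 + ["beta"] * 5
reference = ["beta"] * 5 + ["gamma"] * 5
print(keywords(target, reference))
```

A word with the same relative frequency in both corpora scores 0; the larger the score, the stronger the evidence that the difference is not due to chance.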
#LANCSBOX AND VOYANT TOOLS
http://corpora.lancs.ac.uk/lancsbox/index.php
https://voyant-tools.org/
CLIC
https://clic.bham.ac.uk/
SENTIMENT ANALYSIS
Sentiment analysis (SA), also called opinion mining, is an NLP and text-mining problem that deals with the computational
study of opinions, sentiments and emotions expressed in text. SA studies the subjectivity (neutral vs emotionally
loaded) and polarity (positive vs negative) of a text (Bo Pang and Lillian Lee).
◦ Lexicon-based approaches rely on a sentiment lexicon (e.g., General Inquirer, WordNet Affect, QWordNet or SentiWordNet). Text corpora
are commonly used in domain adaptation, which involves converting a domain-independent sentiment lexicon into a
domain-specific lexicon.
◦ Supervised machine learning methods (e.g., Naive Bayes, MaxEnt, Support Vector Machine).
SA can be applied at several levels:
◦ Discourse-level sentiment analysis presupposes that each document or text expresses opinions on a single entity.
◦ Sentence-level sentiment analysis determines whether a sentence implies a positive or negative opinion.
◦ Object-oriented sentiment analysis reveals sentiment towards a specific entity mentioned in the text.
◦ Aspect-based sentiment analysis focuses on opinions relative to specific properties (or aspects) of an entity.
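A lexicon-based, sentence-level analysis can be sketched as follows. The six-word `TOY_LEXICON` is an invented stand-in for a real sentiment lexicon, and the scoring (summing word polarities, and taking the share of lexicon words as a crude subjectivity score) is one simple assumption among many possible schemes.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
TOY_LEXICON = {"good": 1, "great": 1, "hopeful": 1,
               "bad": -1, "terrible": -1, "tragic": -1}

def sentence_sentiment(sentence, lexicon=TOY_LEXICON):
    """Return (label, polarity score, subjectivity) for one sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    polarity = sum(hits)
    # Crude subjectivity: proportion of tokens found in the lexicon.
    subjectivity = len(hits) / len(tokens) if tokens else 0.0
    if polarity > 0:
        label = "positive"
    elif polarity < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, polarity, subjectivity

def document_sentiment(text, lexicon=TOY_LEXICON):
    """Split a text into sentences and score each one separately."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [(s, *sentence_sentiment(s, lexicon)) for s in sentences]

for row in document_sentiment("The outcome was great and hopeful. A terrible day."):
    print(row)
```

A sketch like this ignores negation, intensifiers and domain-specific meanings, which is why domain adaptation of the lexicon or supervised methods are used in practice.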
LIWC: LINGUISTIC INQUIRY AND WORD COUNT
http://www.liwc.net/index.php
◦ designed by James W. Pennebaker, Roger J. Booth, and Martha E. Francis;
◦ the LIWC2015 master dictionary is composed of almost 6,400 words, word stems, and selected emoticons;
◦ analyzes over 70 dimensions of language:
◦ 4 general descriptor categories (total word count, words per sentence,
percentage of words captured by the dictionary, and percentage of words longer
than six letters)
◦ 22 standard linguistic dimensions (e.g., percentage of words in the text that
are pronouns, articles, auxiliary verbs, etc.)
◦ 32 word categories tapping psychological constructs (e.g., affect, cognition,
biological processes)
◦ 7 personal concern categories (e.g., work, home, leisure activities)
◦ 3 paralinguistic dimensions (assents, fillers, nonfluencies)
◦ 12 punctuation categories (periods, commas, etc.)
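The dictionary-counting idea behind this kind of output can be illustrated with a minimal sketch. It is not the LIWC software or its dictionary: `TOY_DICTIONARY`, its category names and its entries are invented, and the trailing `*` is assumed here to mark a word stem that matches any continuation.

```python
import re

# Hypothetical mini-dictionary; "*" marks a stem (e.g. "joy*" matches "joyful").
TOY_DICTIONARY = {
    "positive_emotion": ["love", "happi*", "joy*"],
    "negative_emotion": ["sad*", "fear*", "hate"],
    "pronoun": ["i", "we", "you", "they"],
}

def matches(token, entry):
    """True if the token equals the entry or starts with its stem."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry

def profile(text, dictionary=TOY_DICTIONARY):
    """Compute general descriptors and the percentage of words per category."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    result = {
        "word_count": n,
        "words_per_sentence": n / len(sentences) if sentences else 0.0,
        "pct_longer_than_six": 100 * sum(len(t) > 6 for t in tokens) / n if n else 0.0,
    }
    hits = {category: 0 for category in dictionary}
    captured = 0
    for tok in tokens:
        found = False
        for category, entries in dictionary.items():
            if any(matches(tok, entry) for entry in entries):
                hits[category] += 1
                found = True
        captured += found  # a token counts once, even if it fits several categories
    for category, count in hits.items():
        result[category] = 100 * count / n if n else 0.0
    result["pct_in_dictionary"] = 100 * captured / n if n else 0.0
    return result

print(profile("We love joyful songs. They fear sadness."))
```

Because every category score is a percentage of the total word count, texts of different lengths can be compared directly.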
LIWC2015 OUTPUT
AFFECT IN ANGLICAN SERMONS
UAM CORPUS TOOL
http://corpustool.com/index.html
MANUAL ANNOTATION USING BUILT-IN ATTITUDE SYSTEM
RECENT DEVELOPMENTS AND NEW CHALLENGES
https://corpus-analysis.com/
a ‘new modal order’ emerging in the era of digital literacies and computer-mediated communication