
Received: 26 February 2021 Accepted: 26 April 2022

DOI: 10.1111/poms.13743

ORIGINAL ARTICLE

A theory-driven machine learning system for financial disinformation detection

Xiaohui Zhang1, Qianzhou Du2, Zhongju Zhang1

1 W. P. Carey School of Business, Arizona State University, Tempe, Arizona, USA
2 Business School, Nanjing University, Nanjing, Jiangsu, People's Republic of China

Correspondence
Qianzhou Du, Business School, Nanjing University, No. 16 Jinyin Street, Gulou District, Nanjing 210093, Jiangsu Province, People's Republic of China.
Email: qianzhou@nju.edu.cn

Handling Editor: Vijay Mookerjee

Abstract
Maliciously false information (disinformation) can influence people's beliefs and behaviors with significant social and economic implications. In this study, we examine news articles on crowd-sourced digital platforms for financial markets. Assembling a unique dataset of financial news articles that were investigated and prosecuted by the Securities and Exchange Commission, along with the propagation data of such articles on digital platforms and the financial performance data of the focal firm, we develop a well-justified machine learning system to detect financial disinformation published on social media platforms. Our system design is rooted in the truth-default theory, which argues that communication context and motive, coherence, information correspondence, propagation, and sender demeanor are major constructs to assess deceptive communication. Extensive analyses are conducted to evaluate the performance and efficacy of the proposed system. We further discuss this study's theoretical implications and its practical value.

KEYWORDS
fake news, financial disinformation, social media platform, theory-driven machine learning, truth-default theory

Accepted by Vijay Mookerjee, after two revisions.

1 INTRODUCTION

The Internet plays an essential role in disseminating information in a free, open, and democratic society. However, the quality of the information available on the Internet can vary widely in terms of its accuracy, objectivity, completeness, and reliability (Liu et al., 2020). Social media platforms, which facilitate sharing of information, exacerbate this problem. In recent years, we have witnessed a continuing onslaught of false or misleading information on such platforms. Such false or misleading content can generate enough critical mass (often in a matter of minutes) to influence people's beliefs, attitudes, and behaviors by virtue of its ubiquity, with significant social and economic implications (Grinberg et al., 2019). As such, there has been growing attention from both academia and industry on this broad topic, which involves various terms and concepts such as fake news, misinformation, disinformation, maliciously false news, or other forms of biased content.

To help consolidate various concepts related to fake news, Zhou and Zafarani (2020) provide a comprehensive survey of the topic and distinguish these concepts based on three characteristics: (i) authenticity (false or not), (ii) intention (bad or not), and (iii) whether the information is news or not. Consolidating earlier research (Kogan et al., 2019; Zhou & Zafarani, 2020), we use the term disinformation in this study to refer to news that is intentionally and verifiably false, published on social media platforms. This definition emphasizes the malicious intent and ex ante false attributes of the information. For instance, the first two messages in The Boy Who Cried Wolf were maliciously false (making fun of others with lies) at the beginning, even though the wolf really came at the end.

In the financial context, researchers have demonstrated that financial news posted on social media platforms can have a significant influence on financial markets (Chen et al., 2014). Unauthorized/illegal information manipulation is a huge moral hazard issue that challenges financial risk operations (Xu et al., 2017). Repeated exposure to fake news can result in an "illusory truth effect" (Pennycook et al., 2018), which might lead to confirmation bias in investors' financial beliefs. Yet research on financial disinformation detection is scant. Two recent developments are Kogan et al. (2019) and Clarke et al. (2021), which showed that the existence

3160 © 2022 Production and Operations Management Society. wileyonlinelibrary.com/journal/poms Prod Oper Manag. 2022;31:3160–3179.
19375956, 2022, 8, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/poms.13743 by The Library Goldsmiths University of London, Wiley Online Library on [19/02/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
DETECTING FINANCIAL DISINFORMATION 3161
Production and Operations Management

of financial disinformation could distort stock trading volumes and returns and hurt the social welfare of financial social media platforms and the fairness of market competition. In both studies, the authors proposed a simple and similar text analytics approach to detect financial disinformation before analyzing its causal effects. The importance of a reliable financial disinformation detection method is evident in these circumstances in order to ensure robust empirical findings.

Previous detection methods (Zhou et al., 2019) are not satisfactory in the financial domain. First, those methods were mainly trained with data in the social or political domain. Prior literature has shown that fake social/political news relies on people's hedonic mindset, like curiosity, to attract readers (Vosoughi et al., 2018). However, financial news is usually utilitarian oriented; the patterns found in social/political fake news may be inadequate for identifying financial disinformation. Additionally, previous methods rely on manual labeling mechanisms (e.g., a fact-checker) to verify the news content and provide ground truth (Sharma et al., 2019). This ground-truth labeling approach can be a challenge because financial news routinely contains financial domain knowledge or insider information, which is difficult to verify even for professional financial editors, let alone the general crowd (Clarke et al., 2021). Finally, the sets of quantifiable features that go into the detection methods in previous studies are either ad hoc or lack a systematic theory that takes a unified and comprehensive perspective of fake news. Financial operations, on the other hand, emphasize the accuracy, rigor, and explainability of findings (Xu et al., 2017).

In this paper, we examine news/content on crowd-sourced digital platforms for financial markets. Specifically, we seek to develop an effective and well-justified machine learning application system to predict maliciously false financial news (financial disinformation) published on social media platforms. Such an information technology (IT) artifact can potentially provide a unified theoretical framework that considers the root cause of financial disinformation for early detection and intervention. It could also help expand and establish a reliable ground-truth financial disinformation dataset, which provides a foundation for future empirical studies aiming to analyze the causal impacts of financial disinformation.

To achieve the above goal, we develop a machine learning application system for financial disinformation detection under the guidance of design research theories and paradigms (Hevner et al., 2004; Walls et al., 1992). To obtain rigor and explainability, we operationalize a comprehensive deception theory, truth-default theory (TDT) (Levine, 2014), to guide our design. Based on TDT and empirical financial results, we propose a comprehensive set of computable metrics of financial news' context and motive, sender demeanor, third-party reactions, content correspondence, and content coherence. Our ground truth comes from the Securities and Exchange Commission's (SEC) investigation of financial disinformation on crowd-sourced digital platforms. The final dataset consists of detailed information on financial news on Seeking Alpha, the diffusion patterns of the news on other social media platforms, including StockTwits and Twitter, as well as the financial performance data of the focal firm(s) discussed in the news. We test our designed system with a rigorous evaluation of various classification algorithms and empirical settings. Our extensive analysis and performance evaluation illustrate that our system is effective in detecting financial disinformation; it outperforms previous intuition- and cues-theory-based models. We also compare our model against cutting-edge deep learning models and evaluate its robustness to temporal change in financial news, the ratio of financial disinformation in the sample, and various prediction thresholds. Finally, we illustrate the explainability of our predictive model using a model-agnostic interpretation method, Shapley values (Molnar, 2020).

This study contributes to several aspects of operations management (OM) and information systems (IS) research. We showcase a systematic approach to operationalizing a social science theory (TDT) in guiding the design of a situated IT artifact for financial disinformation detection. Our proposed system significantly outperforms existing models and offers better explainability for predicting financial disinformation, which has high problem maturity but low solution maturity. The proposed IT artifact could also be valuable for analyzing human-induced operational risk in financial services (Xu et al., 2017). From an empirical perspective, our evaluation results emphasize the importance of situational information such as context and motivation for financial operations.

The rest of this paper is organized as follows. In Section 2, we review the relevant literature (including conceptual and empirical studies as well as computational methods) on online deceptions and fake news. Section 3 describes a comprehensive set of quantifiable measurements from financial news to operationalize TDT as well as the overall architecture of our system. Section 4 discusses how we construct our dataset and the experimental settings used to evaluate our system. In Section 5, we provide extensive evaluations of system performance, the robustness of our results, and illustrations of the explainability of the results. Section 6 summarizes and discusses this research's theoretical and practical implications, limitations, and future research directions.
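Model-agnostic Shapley attributions of the kind referenced above can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation (which in practice would use a package built for the purpose, such as shap); it computes exact Shapley values for a toy linear "risk score" by enumerating feature coalitions, with absent features replaced by a baseline value. All names and numbers here are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for a single prediction.

    Features absent from a coalition are replaced by their baseline
    value; with n features this enumerates every coalition, so it is
    only practical for small n (illustration, not production code).
    """
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Standard Shapley coalition weight: |S|! (n-|S|-1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if j == i or j in subset else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                contrib += weight * (predict(with_i) - predict(without_i))
        phi.append(contrib)
    return phi

# Toy linear stand-in for a trained disinformation classifier's score.
weights = [0.5, -0.2, 0.3]
score = lambda x: sum(w * v for w, v in zip(weights, x))

attributions = shapley_values(score, x=[2.0, 1.0, 4.0], baseline=[0.0, 0.0, 0.0])
```

For a linear model with a zero baseline, each feature's attribution reduces to weight times value, and the attributions always sum to the difference between the prediction and the baseline prediction (the efficiency property that makes Shapley-based explanations attractive).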

2.1 Online deception and fake news

A rich body of literature explores deception in various online settings such as e-commerce, product reviews, and social media. In the context of e-commerce, Xiao and Benbasat (2011) proposed a comprehensive theoretical framework describing the process of product-related consumer deception. The authors argue that sellers' manipulations of products' information content, presentation, and generation would influence consumers' decisions through affective and cognitive mechanisms. Along a similar line of research, Xiao and Benbasat (2015) argued that product recommendation agents might be biased to deceive consumers and found that adding proper warning messages can alleviate the risks in an experimental setting. Closely related to e-commerce, many studies in the literature have focused on fake product reviews. Luca and Zervas (2016), for instance, found that about 16% of restaurant reviews on Yelp are suspicious, and restaurants that face a weak reputation or high competition are more likely to commit review fraud. Hence, firms might actively manage online sentiment about their products in a strategic way (Lee et al., 2018). Lappas et al. (2016) demonstrated that even a small number of fake reviews could significantly affect the visibility of hotels on the market. Identity deception is a significant problem due to physical distancing (Vishwanath, 2015). In the supply chain context, Cezar et al. (2020) examined the economic impact when firms face agents who have an incentive to fake their attributes in order to receive favorable decisions from the firm's classifiers.

Studies on fake news have examined its generation and diffusion patterns, influences, and ways to combat fake news (Lazer et al., 2018). On the Twitter platform, it was found that a small group of users accounted for the generation and spread of the majority of fake news (Grinberg et al., 2019). Additionally, fake news circulated on social media platforms is usually generated from outside sources (Chiou & Tucker, 2018). The diffusion patterns of fake and legitimate news are strikingly different. Vosoughi et al. (2018) found that fake news usually diffuses faster, deeper, and more broadly than legitimate news due to its originality. Shin et al. (2018) demonstrated that fake news propagation is usually recursive and periodic; that is, it is likely to reappear periodically. Shao et al. (2018) discovered that social bots play an essential role in online fake news diffusion. Fake news can also have significant impacts on people's beliefs and attitudes. Kim and Dennis (2019) found that the presentation format and the source of an article influence the extent to which users believe the article, which in turn influences how they engage with it. In a separate study, Kim et al. (2019) investigated solutions to combat fake news by labeling the credibility and reputation ratings of news sources. It is worth noting that all of the above work focused on political or social news, while there are potential differences between fake financial news and political news (Vosoughi et al., 2018).

2.2 Financial disinformation

Financial disinformation explicitly aims to prevent general investors from grasping the complete information or "inside" risks inherent in particular assets (Kane, 2004). The most common format of financial disinformation is manipulated financial reports (Beneish, 1999). This type of financial fraud is generated by the misconduct of inside managers and outside auditing firms (Huang et al., 2017; Kane, 2004; Liang et al., 2016). Such disinformation within the financial system is closely under the monitoring and regulation of the SEC, and thus violators might face serious consequences and punishments (Amiram et al., 2018). Beyond firms' direct financial reports, financial disinformation also appears across news media. Previous research on fake financial news mainly dealt with rumors or misinformation that appear in traditional mainstream media. The media's incentive to publish sensational news leads to low accuracy and biased information, which further result in abnormal market reactions (Ahern & Sosyura, 2015). Bali et al. (2017) investigated the effect of a firm's unusual news flow on the firm's stock return and found that unusual news flow can lead to volatility shocks. However, the cost of manipulating mainstream media could be high and require significant resources.

The growth of different social media platforms creates new channels for financial disinformation. Social media discussions are usually considered third-party opinions (Lee et al., 2018), and there is a lack of systematic regulation of such discussions. Thus, social media enables firms to shape investors' opinions and manipulate the financial market in a low-cost way (Tardelli et al., 2021). Two recent studies using financial disinformation data on social media platforms are Kogan et al. (2019) and Clarke et al. (2021); the SEC investigation revealed that this financial disinformation was made and spread by writers secretly paid by subject firms. Kogan et al. (2019) found that financial disinformation increases trading volume and the stock price temporarily. Clarke et al. (2021) found that financial disinformation obtains significantly more attention from investors than legitimate news. Both studies stopped short of developing an effective method to detect/predict financial disinformation.

2.3 Detection of deceptions and fake news

As discussed earlier, an effective and automatic detection model is necessary for studies that seek to examine the causal effects of financial disinformation using observational data. In the following, we summarize representative work and review relevant studies of detection methods for deceptions, frauds, and fake news. Zhou and Zhang (2008) validated that linguistic features are useful in detecting deception in online communications. In the product management context, Abrahams et al. (2015) examined the role of linguistic- and

content-based cues in detecting potential product defects. In finance and accounting domains, Abbasi et al. (2012) proposed a metalearning framework and utilized firms' financial ratios for financial fraud detection. Dong et al. (2018) further improved the algorithm by incorporating financial social media data to assess corporate fraud.

Deception detection of various forms of communication has also been a widely studied topic in the computer science literature. The phenomenon of fake news on social media platforms in recent years has further fueled interest in this topic. To facilitate understanding and presentation, we summarize existing studies based on three distinct information components of communication (Hovland et al., 1953) used in the fake news detection methods in Supporting Information Table A1.

As can be seen from Table A1, each component of communication (in this case, fake news) can be measured by a variety of features. The source of news, especially source credibility, has been found to help identify if a news article is fake or not. Some of the specific metrics used include whether the source account is a bot account (Sharma et al., 2019; Shu et al., 2020a), the number of followers of the author (Castillo et al., 2011), the number of followings of the author, author posting and reposting behavior (Ruchansky et al., 2017), and the author engagement matrix (Shu et al., 2019; Yang et al., 2019).

News content concerns the veracity of the news. Prior literature has extracted lexical features, syntax features, and semantic features from news content. Metrics used for lexical features include the number of words, average word length, and the number of words per sentence. Syntax features further consider the attributes of words, like personal pronouns, verb tense, singular and plural pronouns, number of verbs, modal verbs, and passive verbs (Feng et al., 2012). Semantic features distill abstract-level information, including emotiveness, objective/subjective words, generalizing words, and textual representations extracted by deep learning methods.

The audience component contains rich secondary information about users and spreaders of the news as well as the interaction and feedback among these users. Specific metrics include user/spreader profiles (Castillo et al., 2011; Ruchansky et al., 2017), response stance measured by sentiment or opinions (Qian et al., 2018), temporal features (Kwon et al., 2017), geographic location (Yang et al., 2012), types of devices used (F. Yang et al., 2012), the depth of the retweet tree, the path of information diffusion, and the user engagement matrix (Yang et al., 2012; Yang et al., 2019).

3 KERNEL-THEORY-BASED DESIGN

A theory-driven machine learning system is necessary for detecting financial disinformation. The majority of the fake news detection algorithms presented in Section 2 rely on intuitively extracted features (Sharma et al., 2019). For the few that employed theories such as four-factor theory and interpersonal deception theory, the extracted features focused primarily on content and linguistic metrics. These "cues theories" are outdated and inconsistent with empirical findings (Levine & McCornack, 2014) and thus are not appropriate kernel theories for financial disinformation detection. The lack of a proper unified theoretical foundation could lead to a lack of rigor and comprehensiveness, which limits the generalizability of applications (Hevner et al., 2004). The magnitude of this issue could become more evident in financial services, where financial disinformation on digital platforms exhibits unique patterns and requires extensive domain knowledge to justify the legitimacy of information (Clarke et al., 2021; Xu et al., 2017).

We followed information system design theory (ISDT) (Walls et al., 1992) to guide our system design process, which includes kernel theories, metarequirements, metadesign, and testable hypotheses. Table 1 illustrates this study's application of ISDT. We discuss the details of each component in the following subsections.

TABLE 1  Design framework for the financial disinformation detection machine learning system

Kernel theory: Truth-default theory (TDT) emphasizes contextualized communication content. By default, people presume that communication is honest (a truth-default state); a trigger or set of trigger events can lead people to abandon the truth-default state.

Metarequirements: TDT proposes to use five types of information to evaluate whether an evidentiary threshold is crossed for triggering the abandonment of the truth-default: communication context and motive, sender demeanor, information from third parties, communication coherence, and correspondence information.

Metadesign: Construct a comprehensive set of computable metrics that can represent each type of information in TDT; design an analytic system that can extract features and identify financial disinformation.

Testable hypotheses: Evaluate the capabilities of our system. Specific testable hypotheses are as follows:
H1: Each of the TDT-based feature sets can significantly improve the performance of financial disinformation detection.
H2: Algorithms using TDT-based features can outperform the previous ad hoc and cues-theory-based approaches for detecting financial disinformation.

3.1 Kernel theory: Truth-default theory

An important concept in TDT is that deceptive messages involve intent, awareness, and/or purpose to mislead; that is, a message is considered honest/true absent such intent or purpose (Levine, 2014). TDT is related to and compatible with

another popular theory: information manipulation theory 2 (IMT2), which was proposed by McCornack et al. (2014). While IMT2 is a theory of deceptive discourse production, which posits that deception involves covert manipulation of information, TDT is focused more on credibility assessment and deception detection accuracy (Levine, 2014). The central idea in TDT is the truth-default state. By default, people tend to tell the truth and believe others' communication. This presumption of truth-default is adaptive and will be abandoned once suspicion is triggered.

Levine (2014) indicated that some of the critical trigger events to abandon truth-default include (a) a motive for deception, (b) behaviors associated with dishonest demeanor, (c) a lack of coherence in message content, (d) a lack of correspondence between communication content and knowledge of reality, or (e) information from a third party to warn of potential deception. If a trigger (or set of triggers) is strong and crosses a threshold, then suspicion is generated, and truth-default is abandoned. Hence, to assess/detect honesty and deceit, Levine (2014) proposed five types of information: (1) communication context and motive, (2) sender demeanor, (3) information from third parties, (4) communication coherence, and (5) correspondence information. It should be noted that TDT emphasizes contextualized communication content (i.e., communication content as well as knowledge of the context or situations in which communication occurs) in deception detection. Levine (2014) argues that contextualized communication content tends to be more critical than sender demeanor and observable nonverbal behaviors in improving deception detection accuracy. Given this, we believe TDT offers a better fit for detecting financial disinformation, where motivations and contexts often play a significant role (Huang et al., 2017).

TDT has provided theoretical inspiration in various areas. It has been used in the cybersecurity area to analyze individual differences in susceptibility to malevolent interruptions (Williams et al., 2017). In the online community domain, TDT helps to understand behavioral differences between individuals from different cultures (Marett et al., 2017). More relevant to this research, TDT helps to understand mass-oriented computer-mediated communication with authenticity as a core unifying construct (Lee, 2020). In the following, based on TDT and a series of empirical research, we propose a systematic and comprehensive set of quantitative metrics corresponding to the five categories of information in TDT.

3.2 Metarequirements and metadesign

Metarequirements are the information needed to achieve the goal of the desired IT artifact; in our case, the information needed for detecting financial disinformation. As discussed earlier, TDT proposed five types of information for deception detection. However, they are abstract constructs rather than computational variables and need to be operationalized as specific measurements of financial news. To achieve that, we first survey prior conceptual and empirical studies to distill dimensions that can collectively represent the required information for each construct. Then, we select dimensions that can be measured and extracted from real-life data as predictors of financial disinformation.

3.2.1 Communication context and motive

TDT, as a general theory, provides no detailed measurement of communication context and motive but emphasizes that deceptions mainly happen when the context thwarts desired goals (Levine, 2014). Spreading financial disinformation on social media platforms is a low-cost approach to mislead people's perception of a firm's performance. Anecdotal evidence indicates that most financial disinformation was created either by firms' managers or by market manipulators who want to conduct a pump-and-dump scheme on the subject firm (Huang et al., 2017; Siering, 2019). The direct intention of such financial disinformation is to increase the subject firm's stock price. Empirical research has found evidence that the stock price of subject firms experiences a significant short-term lift after the publication of the fake news (Kogan et al., 2019). Research has also shown that managers are more likely to conduct financial statement fraud when firms experience pressure (Huang et al., 2017). Kogan et al. (2019) find that the market reaction to financial disinformation is stronger for firms with lower market capitalization (MC). Financial disinformation is also more likely to be associated with small firms, as they are easier to manipulate (Siering, 2019; Tardelli et al., 2021). Therefore, we can capture the contexts that motivate financial disinformation by monitoring firms' sizes and financial ratios.

Previous research on financial statement fraud identified a set of financial ratios that capture conditions under which firms meet pressures or development opportunities (Liang et al., 2016). Specifically, guided by financial fraud detection research (Abbasi et al., 2012; Dong et al., 2018), we use the following financial ratios to capture the context and motive of financial news: asset quality index (AQI), asset turnover (AT), cash flow earning difference (CFED), days sales in receivables (DSIR), depreciation index (DEPI), inventory growth (IG), leverage (LEV), operating performance margin (OPM), receivables growth (RG), sales growth (SG), and SGE expense (SGEE). The detailed calculation process of these financial ratios, along with the rationale for why they may influence the communication context, is reported in Supporting Information Table A2. Abbasi et al. (2012) argue that the different information granularities of quarterly and annual financial information make them complementary to each other in deception detection. Therefore, we include both annual and quarterly financial ratio features in our analysis. In terms of firm size, following previous research (Li et al., 2021), we measure firm size with MC, the number of common shares outstanding at year-end (CSHO), and the stock price at year-end (PRCC).
3.2.2 Sender demeanor

Sender demeanor refers to the believability of the information source (Levine, 2014), a construct similar to source credibility. Sussman and Siegal (2003) proposed that competence and trustworthiness are two essential components of source credibility. Liu et al. (2020) captured competence using the past experience of the information source and the author's expertise in online knowledge communities. Past experience can be measured by her tenure on the financial news platform, for example, Seeking Alpha. Expertise reflects an individual's ability to continuously output useful knowledge (Cheung et al., 2013). On Seeking Alpha, users have two output channels: articles are formal financial analyses related to certain assets, and blogs are informal opinions without topic restrictions. Articles require professional financial analysis and are checked by Seeking Alpha editors (Clarke et al., 2021). We use the number of articles and blogs on the platform to measure the author's expertise. According to social capital studies, trustworthiness is usually created through people's interactions in a social network. If an individual conducts a large number of high-quality interactions with others, the trustworthiness of the individual will increase (Tsai & Ghoshal, 1998). Thus, we use an individual's comment count and average comment length to measure trustworthiness. Another aspect of sender demeanor is information distribution behavior. Deceivers could leverage information diffusion to deceive people, especially in computer-mediated communication like social media (Vishwanath, 2015). We use the number of subcommunities to which an author distributed her financial news articles and the number of subcommunities in which an author participated to measure the author's information distribution behavior.

3.2.3 Information from third parties

Information from third parties represents diagnostic information provided by channels or individuals outside of the current communication environment (Levine, 2014). The information diffused on a single platform can form an echo chamber and result in confirmation bias (Pennycook et al., 2018). Therefore, TDT emphasizes the questioning effects from third parties, especially information from outside contexts, to help detect deceptions. This idea is also reflected in dynamic communication theory (Shin et al., 2018), which suggests that disinformation would recursively diffuse to and be shaped by outside contexts. Prior research has found that cross-platform diffusion can promote the consumption of information. Chiou and Tucker (2018) showed that external links are the primary source of fake news on social media. Information outside of the local communication environment is harder to control, manipulate, and hide than that in the original environment (Zhou & Zafarani, 2020). Therefore, the propagation of a financial news article on other platforms can capture relevant information from third parties.

We use propagation on StockTwits, a social media platform for investors and traders, to represent information from third parties with domain knowledge, and the propagation on Twitter to represent information from general third parties (Deng et al., 2018). Shu et al. (2019) suggested that disinformation is more likely to be shared among individuals who lack related knowledge. We capture this difference using the propagation patterns of the news articles on both platforms. Besides capturing whether a financial news article was diffused to the professional or general platform, we also measure information from third parties by the diffusion/propagation patterns and the status of information sharers (Vosoughi et al., 2018). We quantify propagation patterns in terms of speed (the time lag between the publication of an article and its cross-platform diffusion, in seconds), scope (the number of users and posts related to the article on the platform), and depth (the number of reactions to posts related to the article, like replies, likes, and retweets). For information sharers' status, we consider her experience (tenure on Twitter or StockTwits, in months) and interactions (the number of followers, followings, posts, and received likes).

3.2.4 Communication coherence

Communication coherence refers to the consistency of the content's elements and logic (Levine, 2014). There are two subcategories of coherence: explanatory coherence and structural coherence (McNamara et al., 1996). Explanatory coherence is the content that supplies the information needed to conduct effective communication. Givenness (the amount of information), type–token ratio (the repetition of different types of words), and connectedness (the presence of connective words, e.g., "and," "but") are proper measurements of explanatory coherence (Crossley et al., 2016) and can be extracted using Linguistic Inquiry and Word Count (LIWC). Structural coherence is the content's internal logical structure, that is, the relationship between sentences and paragraphs. Since structural coherence focuses on the patterns of discourse relation transitions (Lin et al., 2011), it can be measured by lexical overlap and semantic similarity between sentences and paragraphs (Crossley et al., 2019). We quantify the structural coherence of financial news by calculating the overlaps of synonyms, the cosine similarity of latent semantic analysis, the latent Dirichlet allocation divergence score, and the word2vec similarity score. The coherence measurements are generated by the Tool for the Automatic Analysis of Cohesion 2.0 (TAACO). Crossley et al. (2019) validated that the generated measurements are highly consistent with manual evaluations of text structure cohesion.

3.2.5 Correspondence information

Correspondence information involves a comparison between what is said and known facts (Levine, 2014). Hence one needs external evidence or message receiver reactions to measure communication correspondence (Levine, 2014). Kogan et al. (2019) suggest that most financial disinformation contains exaggerations of positive information that cannot be found in firms' financial statements. To measure the consistency of article sentiment and financial facts, we employed the method in Gordon et al. (2013) and quantified correspondence information by comparing the firm's standardized financial ratios against the standardized sentiments extracted from the news article. Since this is a comparison of two sets of standardized variables, the mean value would be zero. Furthermore, we compare the consistency between news articles and Seeking Alpha users' perceptions/beliefs about certain facts in the article. We extract metrics from the news comments to measure receivers' reactions to the financial news. These include the number of comments, the average sentiment of the comments, and the difference between the news article's sentiment and its comments' sentiment. Table 2 summarizes the quantitative metrics for the five information categories suggested by TDT.
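The external-evidence comparison described above (standardize both series, then difference them) can be sketched as follows; the ratio and sentiment values, and the helper names, are hypothetical:

```python
from statistics import mean, stdev

def standardize(values):
    # Z-score: subtract the sample mean, divide by the sample standard deviation.
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correspondence_gap(financial_ratios, article_sentiments):
    # Difference between standardized article sentiment and standardized
    # financial ratios; a large positive gap flags positive tone that the
    # firm's fundamentals do not support.
    z_ratio = standardize(financial_ratios)
    z_sent = standardize(article_sentiments)
    return [s - r for r, s in zip(z_ratio, z_sent)]

# Hypothetical values for five articles about the same firm.
gaps = correspondence_gap([0.8, 1.0, 1.2, 0.9, 1.1], [0.10, 0.90, 0.20, 0.30, 0.95])
print([round(g, 3) for g in gaps])
```

As the section notes, the gaps are differences of two zero-mean series, so they average to zero by construction.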
3.3 Testable hypotheses

According to ISDT, testable hypotheses mainly concern the
feasibility and effectiveness of the IT artifact (Walls et al., 1992). The feasibility of the proposed financial disinformation detection system is demonstrated by the instantiation described in this research. To test the effectiveness, we evaluate our method's performance improvement by incrementally adding each of the five sets of features highlighted in TDT to the model. We further benchmark our method against those that use ad hoc and cue-theory-based features, as well as a few deep learning models that use only textual information. Therefore, we propose the following hypotheses to evaluate the effectiveness of our system.

H1 Each of the TDT-based feature sets can significantly improve the performance of financial disinformation detection.

H2 Algorithms using TDT-based features can outperform previous ad hoc and cue-theory-based approaches for detecting financial disinformation.

3.4 Design instantiation: A financial disinformation detection system

In this section, we discuss the overall system architecture of our design product. Our system utilizes both structured data (cross-platform propagation, financial ratios, and author information) and unstructured data (article content, comments, etc.). Figure 1 presents details of the overall system, which includes data collection, feature extraction, classifier construction, and results explanation.

FIGURE 1 System architecture of the financial disinformation detection system

Data collection gathers necessary information from multiple sources and matches information at the news level to construct a dataset. Article content, author information, and comments are retrieved from the financial social media platform Seeking Alpha. The Compustat database provides the required dynamic and situational financial records about the subject firms in the news articles. We also monitor and record financial news articles' cross-platform propagation on StockTwits and Twitter. SEC filings that charge a series of financial disinformation cases provide us with reliable ground-truth labels. Then, we utilize text mining methods (e.g., LIWC, latent Dirichlet allocation [LDA], word2vec) to process unstructured texts and generate feature sets such as communication coherence and communication correspondence. We choose five classical supervised learning models: naïve Bayes (NB), logistic regression (LR), support vector machine (SVM), random forest (RF), and gradient boosting (GB). These methods are representative classifiers that have been adopted in various machine learning tasks. Last but not least, we provide an example to interpret our model results through the estimation of Shapley values for all features.
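The results-explanation component relies on Shapley values. As a minimal sketch of the underlying idea (not the production estimator used for the trained classifiers), the snippet below computes exact Shapley values for a hypothetical value function defined over coalitions of two feature groups:

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    # Exact Shapley values: average each player's marginal contribution
    # over every ordering of the player set.
    contrib = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            contrib[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: c / n_orders for p, c in contrib.items()}

# Hypothetical "model quality" for each coalition of two feature groups.
quality = {
    frozenset(): 0.00,
    frozenset({"context_motive"}): 0.30,
    frozenset({"sender_demeanor"}): 0.20,
    frozenset({"context_motive", "sender_demeanor"}): 0.40,
}
phi = shapley_values(["context_motive", "sender_demeanor"], quality.__getitem__)
print(phi)  # the contributions sum to the full-coalition quality, 0.40
```

The efficiency property illustrated here (attributions summing to the full model's value) is what makes Shapley values attractive for explaining per-feature contributions.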
TABLE 2 Quantitative metrics for TDT information

Information type Dimensions Metrics
T1: Communication Financial performance (annual; quarterly) (Abbasi et al., Asset quality index (AQI; AQI_q)
context and motive 2012; Dong et al., 2018; Huang et al., 2017; Liang et al., Asset turnover (AT; AT_q)
2016)
Cash flow earnings difference (CFED; CFED_q)
Days sales in receivables (DSIR; DSIR_q)
Depreciation index (DEPI; DEPI_q)
Gross margin index (GMI; GMI_q)
Inventory growth (IG; IG_q)
Leverage (LEV; LEV_q)
Operating performance margin (OPM; OPM_q)
Receivables growth (RG; RG_q)
Sales growth (SG; SG_q)
SGE expense (SGEE; SGEE_q)
Firm size (Li et al., 2021; Siering, 2019) Market capitalization (MC)
Number of common shares outstanding at yearend (CSHO)
Stock price at yearend (PRCC)
T2: Sender demeanor Knowledge (Liu et al., 2020) Author experience (tenure on Seeking Alpha, days)
Expertise (Cheung et al., 2013; Liu et al., 2020) Author article count
Author blog count
Interaction (Liu et al., 2020; Tsai & Ghoshal, 1998) Author interaction count (comment count)
Author interaction length (comment length)
Distribution (Vishwanath, 2015) Author activity scope (number of stocks discussed)
Number of subcommunities to which an article was distributed
T3: Information from Diffusion on StockTwits/Twitter platform (Chiou & Tucker, Whether shared on StockTwits/Twitter
third parties 2018; Vosoughi et al., 2018)
The time lag between the publication of an article and its presence
on StockTwits/Twitter (seconds)
Number of users who shared the article on StockTwits/Twitter
Number of posts about the article on StockTwits/Twitter
Average number of sharers’ followers on StockTwits/Twitter
Average number of sharers’ followings on StockTwits/Twitter
Average number of sharers’ posts on StockTwits/Twitter
Average number of sharers’ likes on StockTwits/Twitter
Average sharers' tenure on StockTwits/Twitter (months)
Average number of replies about the article on Twitter
Average number of retweets about the article on Twitter
Average number of favorites about the article on Twitter
T4: Communication Explanatory coherence (Crossley et al., 2016) 93 LIWC features
coherence
Structural coherence (Crossley et al., 2019; Lin et al., 2011) Average sentence to sentence overlaps of noun synonyms
Average sentence to sentence overlaps of verb synonyms
Average latent semantic analysis cosine similarity between all
sentences
Average latent semantic analysis cosine similarity between adjacent
sentences
Average latent Dirichlet allocation divergence score between all
sentences
Average latent Dirichlet allocation divergence score between
adjacent sentences
Average word2vec similarity score between all sentences
Average word2vec similarity score between adjacent sentences
T5: Correspondence External evidence (Kogan et al., 2019) Differences between standardized financial ratios and the article
information sentiment
Message receiver reaction (Vosoughi et al., 2018) Number of comments an article received
Differences between the sentiment of comments and the sentiment
of articles
Average sentiment of comments
4 DATA AND MODEL EVALUATION SETTINGS

4.1 Data collection and sampling strategy

To implement and evaluate our proposed system, we collected and aggregated data from several sources, including Seeking Alpha, the SEC, StockTwits, Twitter, and Compustat. We choose Seeking Alpha as the source of financial news articles. Seeking Alpha is the world's largest crowdsourcing investing community, where contributors publish a large number of analytical articles. News articles on Seeking Alpha have a significant influence on the stock market (Chen et al., 2014), but the legitimacy of many articles on the platform is a concern. The SEC investigated and filed fraud charges in April 2017 against individuals and companies for spreading financial disinformation, the majority of which was published on Seeking Alpha.

All the investigated financial disinformation on the Seeking Alpha platform has since been removed. Courtesy of Clarke et al. (2021), we obtained the data of 383 financial disinformation news articles that were used in their study. All these disinformation articles were published between August 2011 and March 2014, so we narrowed down to the articles published in the same period on Seeking Alpha. Since most financial disinformation identified by the SEC focuses on firms in the pharmaceutical and healthcare industry, we select all articles primarily discussing firms with the same SIC (standard industrial code) during the same time period. The filter leads to a sample of 6866 news articles. Following Clarke et al. (2021), we assume that the news articles that are NOT on the SEC's charge list are legitimate news, a potential source of noisy labeling. This potential noisy label problem can be alleviated by repeated undersampling, as discussed in the next section. Since we want to incorporate cross-platform diffusion patterns for financial disinformation detection, we dropped two articles whose diffusion could not be determined on other platforms. Thus, in our sample, we have 381 financial disinformation and 6866 legitimate financial news articles.

For each news article in the sample, we collect the title, URL, date of publication, author profile, main contents, comment record, and the list of stock tickers included in the article. Some structured data (e.g., author profile) can directly serve as metrics in Table 2. For unstructured data, we utilize various text mining methods (LIWC, LDA, LSA, word2vec, etc.) to extract metrics of content coherence and correspondence. We use LIWC software to generate explanatory coherence features and use TAACO software to calculate the structural coherence features (Crossley et al., 2019). Then, according to the article's primary stock ticker and the date of publication, we obtain the prior quarters' and years' financial records of the article's subject firm from Compustat. If there are missing values, we first look for the relevant data in the subject firm's financial statements. We further compute the various operating performance ratios of the firm proposed in Table 2. For rare cases of zero denominators, we fill in the neutral value suggested by Dong et al. (2018); for instance, since most ratios (e.g., AQI) capture the year-to-year changes of accounting items, the neutral value would be 1. To capture the news articles' cross-platform propagation, we match news articles in the sample to longitudinal records of Twitter and StockTwits. The matching is based on the title, URL, and publication date of articles; thus, the extracted metrics are strict and conservative.

Overall, the constructed dataset contains 157 features of 7247 financial news articles (381 financial disinformation articles and 6866 legitimate articles). The summary statistics of the main TDT features are reported in Supporting Information Table A3.

4.2 Evaluation setup

As our goal is to identify financial disinformation, we label financial disinformation in our dataset as positive instances and legitimate news as negative instances. Our dataset is highly imbalanced; only about 5.26% of the articles are financial disinformation (positive instances). A high degree of imbalance between classes can hinder classification algorithms' performance (Japkowicz & Stephen, 2002). Resampling is a typical strategy to alleviate the impacts of data imbalance, and we utilized the random undersampling method. We first randomly split our dataset into a training dataset and a testing dataset at a ratio of 4:1. Then we undersample the legitimate news in the training set to construct a balanced training set, while we keep the data in the testing set the same as the original, naturally distributed condition. To take advantage of all the legitimate news data, we bootstrap and repeat the previous data splitting and resampling procedures 100 times with different random seeds. Thus, we obtain 100 pairs of training sets and testing sets as our experimental datasets.
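The splitting-and-undersampling procedure just described can be sketched in pure Python; the article counts here are toy sizes rather than the paper's 381/6866, and `make_experimental_pair` is an illustrative name:

```python
import random

def make_experimental_pair(positives, negatives, seed, train_frac=0.8):
    # One train/test pair: an 80/20 (i.e., 4:1) split of each class, then the
    # majority class in the training set is undersampled to the size of the
    # minority class; the test set keeps the natural class distribution.
    rng = random.Random(seed)

    def split(items):
        shuffled = items[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        return shuffled[:cut], shuffled[cut:]

    pos_train, pos_test = split(positives)
    neg_train, neg_test = split(negatives)
    neg_train = rng.sample(neg_train, len(pos_train))  # undersample majority
    return pos_train + neg_train, pos_test + neg_test

# Toy stand-ins for articles: 40 disinformation ("pos") and 700 legitimate ("neg").
pos = [("pos", i) for i in range(40)]
neg = [("neg", i) for i in range(700)]
pairs = [make_experimental_pair(pos, neg, seed) for seed in range(100)]
train, test = pairs[0]
print(len(train), len(test))  # balanced training set, naturally distributed test set
```

Repeating the procedure with 100 seeds yields the 100 experimental pairs, so every legitimate article appears in some training set even though each individual training set is balanced.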
The constructed datasets allow us to evaluate the performance and robustness of our design systematically. Because of the high degree of imbalance between positive and negative instances, we adopted precision, recall, and F1 score as performance measures. Another popular metric, classifier accuracy, is not appropriate in our context since it can be easily biased by imbalanced data. The definitions of these performance measures are given below. Besides the above three metrics, we also calculate the area under the precision-recall curve (AUC). The precision-recall AUC is appropriate for evaluating imbalanced datasets, and the default AUC value for a random classifier would be the positive ratio in the sample (0.0526 in this sample).

Recall = (Number of true positive instances) / (Number of actual positive instances),   (1)

Precision = (Number of true positive instances) / (Number of predicted positive instances),   (2)

F1 score = (2 × Recall × Precision) / (Recall + Precision).   (3)

4.3 Design search

Hevner et al. (2004) emphasize that the process of designing an IT artifact should be rigorous. It involves an iterative and incremental search of the design space and a sequence of optimizations to meet the final design objectives. In empirical studies that examine the impacts of fake financial news, Clarke et al. (2021) and Kogan et al. (2019) proposed simple classification models that use only content features (LIWC features) to classify financial disinformation. We adopted that model using only LIWC features as the baseline model. As discussed in Section 3, LIWC features capture just the explanatory coherence of the news. As such, the baseline model would be inadequate for financial disinformation detection. We sequentially add each TDT information category to the model to evaluate its incremental improvement over the baseline model. After that, we use all feature sets to evaluate the overall performance of our proposed system. Besides the incremental performance, we also look at the ablation performance by removing each feature set from the all-features model. By doing so, we can evaluate the unique contribution of each feature set to the final performance. Overfitting might be a concern when the model includes all features. Hence, we conduct feature selection through recursive feature elimination with cross-validation (RFECV) and see whether a subset of features can produce a decent performance. In this process, we can gain further understanding of financial disinformation and provide useful feedback to the underlying theories.

Additionally, we compare our method against deep learning methods (e.g., long short-term memory [LSTM], bidirectional encoder representations from transformers [BERT]) that can automatically extract features from contents to demonstrate the necessity of feature design. We further conduct a series of sensitivity analyses to check the robustness of the final design product. Finally, we present the Shapley values for the five categories of the TDT features to illustrate the interpretability of our model.

5 DESIGN EVALUATION RESULTS

5.1 Main evaluation results

We train and test all the models/classifiers on the 100 pairs of experimental datasets described in Section 4.2. For each model, we obtain 100 observations of the performance metrics (precision, recall, and F1 score) on the test datasets. We conduct paired t-tests on F1 scores between TDT-based models and the baseline model to estimate the statistical significance of the performance improvement. The paired t-test is one of the typical and robust methods to compare performance between machine learning models (Abbasi et al., 2012). The performances of these models are shown in Table 3. It can be seen that all TDT-based feature categories are essential for detecting financial disinformation.

From panel A of Table 3, we can see that each of the five categories of information alone can significantly improve the baseline model performance in at least one classifier. The full TDT-based model outperforms models with any single set of features. From panel B of Table 3, we can see that removing any feature category from the full model would lead to a significant performance drop in at least one classifier. In particular, the feature set of communication context and motive, which has not been studied in previous research, is shown to be the most effective among all different feature sets. We note that across classifiers, ensemble learning methods (RF and GB) perform better than other methods and benefit the most from TDT-based features. The performances of ensemble learning methods also have smaller standard errors, which indicates the higher robustness of these models. Overall, our analyses provide substantial evidence that TDT-based models can outperform models that only use LIWC features.

As shown in Table 1, previous methods for detecting social and political fake news use source, content, and audience-related features. These features are subsets of our proposed TDT features. For comparison purposes, in Table A4, we summarize the feature categories used in previous representative models and their performance in our context. Under different comparisons, TDT-based models exhibit better performance. Therefore, we can confirm our two hypotheses.

Besides the improved performance, we have some interesting observations from our analyses. On average, situational information categories like sender demeanor and communication context and motive are more effective than static content-related information categories like communication coherence and correspondence. This could indicate the dynamic nature of financial disinformation.
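The performance measures defined in Equations (1)-(3) translate directly into code; a minimal sketch from hypothetical confusion counts:

```python
def recall(tp, fn):
    # Equation (1): true positives over actual positives.
    return tp / (tp + fn)

def precision(tp, fp):
    # Equation (2): true positives over predicted positives.
    return tp / (tp + fp)

def f1_score(tp, fp, fn):
    # Equation (3): harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * r * p / (r + p)

# Hypothetical confusion counts from one test set.
tp, fp, fn = 60, 10, 20
print(round(precision(tp, fp), 3), round(recall(tp, fn), 3), round(f1_score(tp, fp, fn), 3))
```

Note that true negatives never enter these formulas, which is why the three measures are robust to the large legitimate-news majority, unlike plain accuracy.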
TABLE 3 Main results

NB LR SVM RF GB
Prec. Rec. AUC F1 Prec. Rec. AUC F1 Prec. Rec. AUC F1 Prec. Rec. AUC F1 Prec. Rec. AUC F1

Panel A: Incremental performance
Baseline (LIWC) 0.222 0.931 0.210 0.358 0.439 0.951 0.420 0.600 0.437 0.954 0.419 0.598 0.606 0.954 0.581 0.741 0.608 0.968 0.590 0.746
LIWC+ThirdParties 0.237 0.916 0.221 0.375*** 0.436 0.949 0.417 0.597 0.432 0.952 0.414 0.593 0.609 0.954 0.583 0.743 0.613 0.968 0.595 0.750
LIWC+ContextMotives 0.368 0.962 0.356 0.532*** 0.495 0.961 0.478 0.653*** 0.484 0.962 0.467 0.643*** 0.868 0.992 0.861 0.925*** 0.864 0.997 0.861 0.925***
LIWC+SenderDemeanor 0.223 0.948 0.215 0.361 0.469 0.955 0.451 0.629*** 0.458 0.955 0.440 0.618*** 0.700 0.964 0.677 0.810*** 0.710 0.978 0.695 0.822***
LIWC+Coherence 0.221 0.947 0.212 0.357 0.473 0.957 0.455 0.633*** 0.475 0.957 0.457 0.634*** 0.614 0.950 0.586 0.745 0.623 0.970 0.606 0.758*
LIWC+Correspondence 0.226 0.924 0.213 0.363 0.466 0.959 0.449 0.627*** 0.461 0.960 0.445 0.622** 0.678 0.963 0.656 0.796*** 0.660 0.977 0.646 0.787***
All Features 0.379 0.961 0.366 0.543*** 0.545 0.962 0.527 0.695*** 0.507 0.959 0.488 0.662*** 0.900 0.994 0.895 0.945*** 0.883 0.998 0.881 0.936***

Panel B: Ablation performance
All-ThirdParties 0.346 0.974 0.338 0.509*** 0.527 0.965 0.511 0.681 0.510 0.964 0.494 0.666 0.895 0.988 0.885 0.939 0.869 0.997 0.867 0.928
All-ContextMotives 0.238 0.933 0.226 0.378*** 0.518 0.960 0.500 0.672** 0.494 0.958 0.476 0.651** 0.681 0.950 0.649 0.792*** 0.712 0.976 0.697 0.823***
All-SenderDemeanor 0.346 0.966 0.336 0.509*** 0.493 0.958 0.475 0.650*** 0.473 0.954 0.454 0.631*** 0.887 0.987 0.876 0.934* 0.851 0.997 0.849 0.918***
All-Coherence 0.348 0.959 0.335 0.510*** 0.485 0.959 0.467 0.643*** 0.465 0.957 0.447 0.625*** 0.881 0.989 0.872 0.932* 0.862 0.996 0.859 0.924**
All-Correspondence 0.377 0.969 0.367 0.542 0.524 0.961 0.506 0.677** 0.501 0.959 0.482 0.656* 0.903 0.987 0.891 0.942 0.866 0.997 0.864 0.927*

Note: Panel A compared with the baseline (LIWC features) model. Panel B compared with the model with all TDT features. Significant results are highlighted in bold.
Abbreviations: AUC, area under the precision-recall curve; GB, gradient boosting; LIWC, linguistic inquiry and word count; LR, logistic regression; NB, naïve Bayes; Prec., precision; Rec., recall; RF, random forest; SVM, support vector machine.
*p-value < 0.05; **p-value < 0.01; ***p-value < 0.001.
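The significance levels reported in Table 3 rest on paired t-tests over the 100 matched experimental runs; a stdlib-only sketch of the paired t statistic, with hypothetical F1 series of only five runs:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(xs, ys):
    # Paired t statistic: mean of the per-run differences divided by its
    # standard error; with n matched runs it has n - 1 degrees of freedom.
    diffs = [x - y for x, y in zip(xs, ys)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical F1 scores of a TDT-based model and the LIWC baseline on
# five matched train/test pairs (the paper uses 100 pairs).
tdt_f1 = [0.94, 0.95, 0.93, 0.96, 0.94]
base_f1 = [0.74, 0.75, 0.73, 0.76, 0.75]
print(round(paired_t(tdt_f1, base_f1), 1))
```

Pairing the scores by run removes the run-to-run variance induced by the random splits, which is why the paired test is the appropriate comparison for models evaluated on the same 100 dataset pairs.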
5.2 Feature selection and an optimal subset of features

Feature selection is an integral component of machine learning algorithm design. While kernel theory provides directions for useful features, feature selection aims to determine an "optimal" set of features. We adopt the recursive feature elimination with cross-validation (RFECV) method for feature selection. RFECV recursively drops features according to the order of feature importance, such as information gain, and then measures the model performance through cross-validation. This recursive feature filtering process continues until it finds the "optimal subset of features" or reaches the defined minimum number of features. It is noted that different classifiers might select different combinations of features.

Table 4 reports the performances and the number of selected features for different classifiers. Overall, feature selection significantly improved the performance of most classifiers, except ensemble learning methods like RF and GB. It should be pointed out that ensemble learning methods already have embedded feature selection mechanisms; thus, the benefit of RFECV is minimal for these models. We list the top 20 features in terms of their importance as well as their importance scores (information gain, permutation importance) in Supporting Information Table A5. Consistent with the observation in the previous section, features belonging to the communication context and sender demeanor categories rank higher on feature importance. We note that some features (e.g., information from third parties and comment reactions) may not be immediately available at the publication time of the financial disinformation. These lagged features, however, are usually not selected by the best classifiers. Hence, the proposed method can still detect financial disinformation in a timely manner.

TABLE 4 Model performance with feature selection

NB LR SVM RF GB
Prec. Rec. AUC F1 Prec. Rec. AUC F1 Prec. Rec. AUC F1 Prec. Rec. AUC F1 Prec. Rec. AUC F1

All (170) 0.379 0.961 0.366 0.543 0.545 0.962 0.527 0.695 0.507 0.959 0.488 0.662 0.900 0.994 0.895 0.945 0.883 0.998 0.881 0.936
Best Subset 0.410 0.956 0.394 0.573*** 0.639 0.989 0.632 0.775*** 0.655 0.980 0.643 0.784*** 0.863 0.988 0.853 0.921 0.880 0.998 0.878 0.936
Number of Features (Best Subset) 103 46 61 14 13

Note: Significant results are highlighted in bold.
Abbreviations: AUC, area under the precision-recall curve; GB, gradient boosting; LIWC, linguistic inquiry and word count; LR, logistic regression; NB, naïve Bayes; Prec., precision; Rec., recall; RF, random forest; SVM, support vector machine.
*p-value < 0.05; **p-value < 0.01; ***p-value < 0.001.

5.3 TDT-based models versus deep learning models

Deep learning models can automatically extract features from raw data for different tasks. This raises the question of the necessity of the theory-driven design. We compare our proposed model against several advanced deep learning models. The first one is a standard deep neural network model; the model structure and the detailed incremental/ablation performance of the model are reported in Supporting Information Table A6. The second model is the long short-term memory recurrent neural network (LSTM-RNN) (Hochreiter & Schmidhuber, 1997). The developed LSTM model contains an embedding layer with 200 dimensions mapping the contents of financial news to an embedding, an LSTM layer with 64 neurons between two dense layers to process raw text and extract features, and a dense output layer to generate the prediction of the legitimacy of financial news. We also implement another deep learning model, BERT (Devlin
et al., 2018), a state-of-the-art Natural Language Processing (NLP) technique developed by the Google AI Language team. We modify the BERT model to generate a binary prediction of the legitimateness of each financial news article. In addition to using just the raw contents of financial news, we also include the TDT features in the deep learning models. TDT features are merged with the textual features generated by LSTM and BERT, and then all features are used to train an RF model, which performs best, as shown in Section 5.1.

Table 5 presents the performance of the deep learning models, the deep learning models with TDT features, and our proposed model (using random forest). These deep learning models were trained for 500 epochs, and the best-performing weights were saved in the final model. We find that both LSTM (Baseline 2) and BERT (Baseline 3) yield better results (F1 score) than Baseline 1 (with only LIWC features). The improvement of the deep learning models indicates that they might capture more predictive information from the news content than LIWC. Meanwhile, the deep learning models with only plain text are still inferior to our TDT-based model, and it is worth noting that the deep learning models that include all the TDT features perform similarly to the TDT-based model. These results indicate that the proposed TDT features provide a significant contribution to financial disinformation prediction. Specifically, our model's improved performance further reinforces the importance of situational and contextualized information in financial disinformation detection.

TABLE 5 Comparison of TDT-based models with deep learning models

Model                               Precision  Recall  AUC    F1
Baseline 1 (RF, LIWC features)      0.606      0.954   0.581  0.741
Baseline 2 (LSTM, plain text)       0.724      0.806   0.814  0.762
Baseline 3 (BERT, plain text)       0.741      0.839   0.821  0.787
LSTM (plain text) + TDT features    0.896      0.975   0.903  0.934
BERT (plain text) + TDT features    0.918      0.986   0.912  0.951
Our model (RF, all features)        0.900      0.994   0.895  0.945

Abbreviations: AUC, area under the precision-recall curve; BERT, bidirectional encoder representations from transformers; LIWC, linguistic inquiry and word count; LSTM, long short-term memory; RF, random forest; TDT, truth-default theory.

5.4 Sensitivity analysis concerning the data imbalance ratio and prediction threshold

The performance of machine learning models may vary when the class distribution in the dataset is imbalanced (Japkowicz & Stephen, 2002). Earlier studies on financial disinformation (e.g., Clarke et al., 2021) used balanced datasets, which might bias the reported model performance. To alleviate this concern, we conduct a series of experiments using test datasets with varying class distribution ratios. The ratio of fake to legitimate news in the manipulated test sets ranged over 1:1, 1:2, 1:10, 1:20 (the real-world ratio), 1:50, and 1:100. Figure 2 displays our model results using test datasets with different imbalance ratios. For clarity, we report the results of the RF classifier; trend patterns are similar for other classifiers.

The results suggest that both the baseline model and the TDT model work well on balanced data. However, when the ratio of financial disinformation to legitimate news decreases, we observe a substantial performance drop for the baseline model due to the significant decrease in precision. On the other hand, the TDT model performs reasonably well even under the extreme distribution scenario.

The prediction threshold value also impacts the performance of classifiers. Trained classifiers predict the probability of an article being fake; a value above (below) a certain threshold indicates fake (nonfake) financial news. Depending on the specific situation, different threshold values may be used. For example, Kogan et al. (2019) used a cutoff value of 0.2 to classify articles as fake and classified articles with a probability of less than 0.01 as nonfake; in other words, Kogan et al. (2019) emphasized precision over recall. To understand how different threshold values influence our model performance, we conduct experiments using cutoff values ranging from 0.4 to 0.9. Figure 3 displays the model sensitivity with respect to the cutoff value. For clarity, we only report the results of the RF classifier; trend patterns are similar for other classifiers. We can observe that, compared to the baseline model, the TDT-based model not only performs better but also shows less variation (especially in recall) across different threshold values.

5.5 Further robustness checks

5.5.1 Robustness to time period

Horne et al. (2019) suggest that some attributes of fake news are often dynamic and might change over time. For example, the authors of fake news can easily manipulate its content or adapt their writing styles gradually, while other objective information, such as the financial performance of the subject firm, is more difficult to manipulate. Hence, the predictive performance of a machine learning model could decrease as the model is tested on future data in a rapidly evolving environment. Here we check how our financial disinformation detection model, which is trained on data from a given period, performs on test data from another period. To do that, we split our dataset according to the publication date of the news articles. We used financial news articles published during 2011 and 2012 to train the classifiers and selected the best models based on performance in a test dataset within the same period. We then applied those models to predict the probability of being fake for the financial news articles published in 2013. The F1 scores of this experiment are shown in Figure 4. We can see that models with TDT features perform better than the baseline LIWC models in both the earlier and later periods.

FIGURE 2 Performance sensitivity to the data imbalance ratio
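The imbalance experiment summarized in Figure 2 can be sketched in a few lines: hold the legitimate class fixed, downsample the fake class to a target fake:legitimate ratio, and rescore the classifier. The "classifier" below is a synthetic stand-in with perfect recall and a fixed 5% false-positive rate (an assumption for illustration, not the paper's trained RF model); it shows the mechanism behind the baseline's drop, namely that precision collapses as fakes become rare while the false-positive rate stays fixed.

```python
# Illustrative sketch of the Figure 2 protocol; all data and the "classifier"
# are synthetic stand-ins, not the paper's actual model or dataset.

def make_ratio_testset(n_legit, ratio):
    """Build a test set keeping all legitimate articles, with fake:legit = ratio."""
    n_fake = max(1, round(n_legit * ratio))
    return [{"label": 1}] * n_fake + [{"label": 0}] * n_legit

def toy_predict(test):
    """Flag every fake (perfect recall) and every 20th legitimate article (5% FP rate)."""
    preds, seen_legit = [], 0
    for a in test:
        if a["label"] == 1:
            preds.append(1)
        else:
            seen_legit += 1
            preds.append(1 if seen_legit % 20 == 0 else 0)
    return preds

def precision_recall_f1(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

for ratio in (1.0, 0.5, 0.1, 0.05):  # 1:1, 1:2, 1:10, 1:20
    test = make_ratio_testset(100, ratio)
    y = [a["label"] for a in test]
    prec, rec, f1 = precision_recall_f1(y, toy_predict(test))
    print(f"1:{round(1 / ratio)}  precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

With the false-positive rate held fixed, precision falls from roughly 0.95 at 1:1 to 0.50 at 1:20 even though recall never moves, which is the same arithmetic that drives the baseline model's precision collapse on skewed test sets.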

FIGURE 3 Performance sensitivity to prediction threshold
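The cutoff analysis in Figure 3 boils down to re-thresholding a single vector of predicted probabilities. A minimal sketch (the probability values are invented for illustration), including a Kogan et al. (2019)-style dual-cutoff rule that labels articles fake above 0.2, nonfake below 0.01, and abstains in between:

```python
# Thresholding sketch; the probabilities are illustrative, not model output.

def label_single(p, cutoff):
    """Binary label from one cutoff: 1 = fake, 0 = nonfake."""
    return 1 if p >= cutoff else 0

def label_dual(p, upper=0.2, lower=0.01):
    """Kogan et al. (2019)-style rule: fake, nonfake, or abstain (None)."""
    if p >= upper:
        return 1
    if p < lower:
        return 0
    return None

probs = [0.95, 0.42, 0.15, 0.005]
for cutoff in (0.4, 0.5, 0.6, 0.7, 0.8, 0.9):  # the range swept in Section 5.4
    print(cutoff, [label_single(p, cutoff) for p in probs])
print("dual:", [label_dual(p) for p in probs])  # prints: dual: [1, 1, None, 0]
```

Raising the single cutoff trades recall for precision; the dual-cutoff variant instead buys precision by refusing to label the gray zone, which is why it suits settings that emphasize precision over recall.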

FIGURE 4 Model performance over time
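The time-based protocol of Section 5.5.1 amounts to splitting on publication date rather than at random: train and model-select on 2011-2012 articles, then score the 2013 articles. A minimal sketch, with field names that are illustrative rather than the paper's actual schema:

```python
# Sketch of the temporal train/test split behind Figure 4; the article
# records and field names ("published", "id") are hypothetical.
from datetime import date

def temporal_split(articles, train_years=(2011, 2012), test_years=(2013,)):
    """Partition articles by publication year: earlier years train, later years test."""
    train = [a for a in articles if a["published"].year in train_years]
    test = [a for a in articles if a["published"].year in test_years]
    return train, test

articles = [
    {"id": 1, "published": date(2011, 3, 1)},
    {"id": 2, "published": date(2012, 7, 9)},
    {"id": 3, "published": date(2013, 1, 15)},
]
train, test = temporal_split(articles)
print([a["id"] for a in train], [a["id"] for a in test])  # prints: [1, 2] [3]
```

Unlike a random split, this guarantees that no future information leaks into training, which is what makes the check informative about drift in fake-news attributes over time.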

5.5.2 Robustness to ground truth labeling

Previous research on financial disinformation has utilized different criteria when determining the ground truth for legitimate news. In this study, we treated published financial news about firms that have the same SIC code as those in the fake class, yet were not investigated by the SEC, as legitimate news. The alternative approach is to treat all the other articles (except those that were investigated by the SEC) posted by the same authors as ground-truth legitimate news. This alternative approach creates a data sample that has no within-author variation between fake and legitimate financial news; hence, we cannot use author-related features, like authors' tenure experience on Seeking Alpha, in our machine learning system. Nevertheless, it provides us with another opportunity to further evaluate the robustness of our system to different

ground-truth labeling processes. We repeat our previous analyses as in Section 5.1 and obtain similar results (see Supporting Information Table A7).

5.5.3 Firm-based cross-validation

Our previous results demonstrate the effectiveness of TDT features, especially context and motive features, in detecting financial disinformation. As the SEC investigation charged only a limited number of firms (12 firms), one may be concerned that article-based cross-validation could cause potential information leakage: the classifiers may simply identify firms rather than detect the context and motives for disinformation. To alleviate this concern, we focus on the subsample of articles related to the 12 firms and conduct firm-based cross-validation. In the leave-one-firm-out cross-validation, we create 12 train/test pairs where each test set includes the articles related to one of the 12 firms, and the corresponding training set does not include any article for that firm. We repeat the main analysis on these training/test pairs. Results are reported in Supporting Information Table A8. We observe that models with TDT features still outperform the baseline models (LIWC and BERT), and contextual motive features still contribute the most to the model. These observations show that, even for a single firm, financial disinformation is more likely to happen in certain situations.

5.6 Explainability of the TDT-based model

One of the advantages of a theory-driven design is its explainability. Rai (2020) emphasized the importance of explainability and interpretability for machine learning. According to Rai (2020), explainability can be attributed to two parts: theoretical explanations of the model (i.e., the use of comprehensive and meaningful features) and technical explanations of a specific prediction. Theoretical explainability is first achieved through the theory-driven design process. The design of our proposed model is guided by TDT and by empirical IS and finance studies; thus, the TDT-based model can explain which theoretical aspects lead to the predicted result.

To generate explanations for the TDT-based model, we designed a results explanation module in our system. The module calculates Shapley values, a concept from coalitional game theory, for each prediction to determine the features' contributions to the predicted result. The Shapley value is the expected marginal contribution over all possible coalitions (Shapley, 1953). In other words, the absolute value of the Shapley value represents feature importance, and its sign indicates whether the feature pushes the prediction in the positive or negative direction. Studies have demonstrated that interpreting machine learning output with the Shapley value can help users, especially nonexpert users, understand the output (Štrumbelj & Kononenko, 2014). This module will automatically generate a report for each prediction based on the calculated Shapley values. For illustration, we report the Shapley value analysis for the RF model with all TDT-based features. Results on other classifiers show a similar pattern.

At the global level, summarizing the Shapley values over all predictions illustrates the importance and actual effect of different features. We report the summary of the Shapley value analysis in Figure 5. The bar graph in the left panel of Figure 5 reports the mean absolute Shapley value of the top 20 features across all predictions; this value and its rank represent feature importance. We can see that the feature importance measured by the Shapley value is similar to the results in the previous feature selection analysis. In general, context motive, sender demeanor, and coherence are the most important feature categories. The bee swarm plot in the right panel of Figure 5 reports the distribution of actual Shapley values; the more positive the value, the more it pushes the model to predict an article as fake. The dots' colors from blue to red represent low to high feature values. From the distribution, we can interpret the actual effects of different features.

MC is the most important feature, and low MC is associated with a higher chance of being fake. Similarly, we can interpret that low asset turnover (AT) increases the probability of fake news. Such patterns are consistent with previous empirical findings that small firms under financial pressure are more likely to be associated with financial disinformation (Kogan et al., 2019; Siering, 2019). We notice that some marginal effects observed here are actually opposite to findings in financial statement fraud studies. For example, in Abbasi et al. (2012), high AT indicates manipulation through fictitious sales. Such differences indicate that, to hide financial distress from investors, firms might utilize online disinformation as an alternative to costly financial statement manipulation.

At the single-article level, the Shapley values can provide reasons for each prediction. In Figure 6, we illustrate the predictions for an investigated fake news article ("6 Stocks Trading with Momentum: Can It Continue?") targeting the stock of GALE. The baseline model wrongly predicts it as legitimate, while the TDT model correctly classifies it as fake. The bottom panel reports the explanation of the baseline model (LIWC) prediction: the baseline model does not detect enough abnormal patterns in the LIWC features and estimates the probability of being fake as 32%. The TDT model, however, detects abnormal patterns in context motive features (e.g., small firm size, low asset quality, AT) and sender demeanor features (the high number of subcommunities in which the article is distributed). The detected abnormal patterns drive the TDT model to predict that the article is fake with a probability of 93%. We also provide another article example showing how TDT-based features help to reduce false-positive predictions in Supporting Information Figure A1.
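The leave-one-firm-out procedure of Section 5.5.3 can be sketched as a generator of train/test pairs keyed by firm, with every article about the held-out firm excluded from training. The tickers and records below are placeholders, not the 12 SEC-charged firms:

```python
# Sketch of leave-one-firm-out cross-validation (Section 5.5.3); the firm
# identifiers and article records are hypothetical.

def leave_one_firm_out(articles):
    """Yield (firm, train, test) triples, one fold per distinct firm."""
    firms = sorted({a["firm"] for a in articles})
    for firm in firms:
        test = [a for a in articles if a["firm"] == firm]
        train = [a for a in articles if a["firm"] != firm]
        yield firm, train, test

articles = [{"firm": f, "id": i} for i, f in enumerate(["AAA", "BBB", "AAA", "CCC"])]
for firm, train, test in leave_one_firm_out(articles):
    # No article about the held-out firm may appear in training (no leakage).
    assert not any(a["firm"] == firm for a in train)
    print(firm, len(train), len(test))  # prints: AAA 2 2 / BBB 3 1 / CCC 3 1
```

Grouping folds by firm, rather than by article, is what rules out the classifier scoring well merely by memorizing firm identities.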

FIGURE 5 Global summary of the Shapley values
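The definition the explanation module relies on, the Shapley value as a feature's marginal contribution averaged over all orderings, can be computed exactly for a small toy game. The value function below is a made-up surrogate for "model output given a coalition of known features", with an interaction bonus when MC and AT appear together; it is not the paper's RF model.

```python
# Exact Shapley values by enumerating all feature orderings (Shapley, 1953).
# The value function and weights are toy assumptions for illustration.
from itertools import permutations
from math import factorial

def shapley_values(features, value):
    """phi[f] = average over all orderings of f's marginal contribution."""
    phi = {f: 0.0 for f in features}
    for order in permutations(features):
        coalition = set()
        for f in order:
            before = value(coalition)
            coalition.add(f)
            phi[f] += value(coalition) - before
    n_orders = factorial(len(features))
    return {f: v / n_orders for f, v in phi.items()}

weights = {"MC": 0.5, "AT": 0.3, "firm_size": 0.2}

def value(coalition):
    v = sum(weights[f] for f in coalition)
    if {"MC", "AT"} <= coalition:  # interaction: MC and AT together add 0.2
        v += 0.2
    return v

phi = shapley_values(list(weights), value)
print({f: round(v, 3) for f, v in phi.items()})
# prints: {'MC': 0.6, 'AT': 0.4, 'firm_size': 0.2} -- the interaction splits evenly
```

The values satisfy the efficiency property (they sum to the full-coalition output, 1.2), which is exactly what lets a report decompose one prediction into additive feature contributions.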

FIGURE 6 An example of a prediction explanation
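A per-article report like Figure 6 exploits the additivity of Shapley values: the model's base rate plus the feature contributions reproduces the predicted probability, and sorting by absolute contribution surfaces the drivers. The numbers below are invented to echo the GALE example (93% fake under the TDT model); they are not read off the actual system.

```python
# Sketch of a single-prediction explanation report; base rate and the
# per-feature Shapley contributions are hypothetical illustration values.

base_rate = 0.05                       # assumed average predicted P(fake)
contributions = {                      # assumed Shapley values for one article
    "firm_size (small)": 0.38,
    "asset_quality (low)": 0.25,
    "asset_turnover (low)": 0.15,
    "subcommunities (high)": 0.10,
}

def explain(base, contribs):
    """Render a report sorted by absolute impact; contributions sum to the prediction."""
    lines = [f"base rate: {base:.2f}"]
    for name, phi in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
        lines.append(f"{name:<24} {phi:+.2f}")
    pred = base + sum(contribs.values())
    lines.append(f"predicted P(fake): {pred:.2f}")
    return pred, "\n".join(lines)

pred, report = explain(base_rate, contributions)
print(report)  # last line: predicted P(fake): 0.93
```

Because the decomposition is additive, the same template explains a false positive just as directly: the report shows which contributions pushed the score up and by how much.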

6 DISCUSSION AND CONCLUSIONS

Fake news has received plenty of attention due to social media's substantial influence on the political and social domains (Grinberg et al., 2019). The financial sector, including financial markets and crowd-sourced financial services, also suffers from the prevalence of fake news (Clarke et al., 2021). However, the attention paid to financial disinformation is rather limited. Adopting TDT as the kernel theory, we proposed, implemented, and evaluated an explainable machine learning system for financial disinformation detection. Our proposed system operationalizes TDT to extract five categories of information (communication context and motives, sender demeanor, information from third parties, communication coherence, and correspondence information) from multiple sources. Rigorous evaluations demonstrated that our system effectively and consistently outperforms baseline methods in detecting financial disinformation under different settings and conditions.

6.1 Theoretical implications

This study offers several theoretical implications for future research. First, it showcases an instance of theory-driven machine learning system design to solve a relevant problem, financial disinformation detection. Kernel theories play an important role in design science research (Liu et al., 2020). We ground our design in TDT (our kernel theory) and propose a comprehensive list of feature

categories with specific dimensions and measurements that are likely to be associated with financial disinformation. This research illustrates how to transform abstract TDT constructs into computable metrics by combining theories and empirical findings. Our extensive design evaluations demonstrate the superior efficacy of our proposed IT artifact and the relative importance of each of the feature categories when predicting financial disinformation. We hope that this process can inform future studies of theory-driven design in the IS community.

Second, this research augments our understanding of financial disinformation. Despite the growing interest in examining the various impacts of financial disinformation (Clarke et al., 2021; Kogan et al., 2019), detection methods to identify financial disinformation are rather limited. Measurement errors or misclassifications of machine learning techniques can introduce systematic biases into econometric estimations and affect the validity of statistical inference. Our proposed system to detect financial disinformation is shown to be both effective and robust under different scenarios. In this regard, this research fills a void and offers a gateway for future empirical studies that seek to examine the causal impacts of financial disinformation.

The evaluation process of design science research offers interesting observations that expand the previous theoretical understanding of disinformation. Our results highlight the importance of situational cues (such as communication context and motive, and correspondence information) rather than the textual content and the diffusion patterns of financial news on social media platforms in detecting disinformation. While Shu et al. (2020b) found that propagation patterns are effective in detecting political/social disinformation, we demonstrate that the diffusion patterns of news articles on social media platforms are not as important as other factors, such as communication context and motive, communication coherence, and correspondence information. Such situational cues are likely to play an increasingly important role in future financial disinformation detection because the content of a news article can be easily manipulated, and linguistic-feature-based detection methods might fail to deliver the correct judgment (Shu et al., 2020a). The emergence of automatic text generators, like GPT-2 (Radford et al., 2019), will further complicate disinformation detection. Hence, future studies on fake news need to consider the situational information surrounding the news.

Last but not least, we provide intuitive explanations and interpretations of model results through both a theoretical lens and a post hoc interpretation tool, the Shapley value analysis. Recent advances in machine learning techniques (e.g., statistical learning and deep learning) have drastically improved prediction performance in many domains. However, the interpretability of the findings produced by those techniques is low. Doshi-Velez and Kim (2017) pointed out that one or several metrics, such as accuracy, F1 score, and AUC, are not capable of capturing the complete story of most real-world tasks. To fill this gap, we introduce the Shapley value to show the marginal contribution of the selected features to a predicted outcome. The Shapley values can help readers deeply understand the underlying mechanism of the proposed detection model and also provide feedback to the adopted conceptual theory. We call for more future research that seamlessly unifies advanced machine learning algorithms and extant theories to solve relevant business problems.

6.2 Practical implications

One practical implication of this study is to help financial social media platform owners, such as Seeking Alpha and StockTwits, combat and mitigate fake news threats on their platforms. Clarke et al. (2021) indicated that the presence of fake news decreases users' trust in all news articles, thus eroding overall social welfare. Many social media platforms have allocated resources to combat fake news by hiring professional editors for censorship (Clarke et al., 2021). That approach may not be economically feasible in the long term. Our proposed system can be a useful complement that helps editors make informed judgments more efficiently. In addition, the ability to automatically label news articles (fake or nonfake) will increase users' confidence in legitimate financial news on the platform, thus leading to higher user engagement and user satisfaction with the financial social media platform (Kim et al., 2019).

Another practical implication is that this study can potentially help financial market regulators like the SEC, which is responsible for maintaining fair, orderly, and efficient financial markets in the United States. Financial disinformation can hurt the fairness and distort the operations of financial markets (Kogan et al., 2019). Our proposed system can help financial market regulators monitor the behaviors of different market participants and mitigate the negative impacts of financial disinformation in a timely manner.

6.3 Limitations and future directions

This research has several limitations. First, our ground-truth data (especially the confirmed fake financial news) are limited. The small number of positive instances might hinder the training and the performance of machine learning. There are several ways to address this issue; one approach is to hire professional auditors to inspect financial news on social media platforms and provide manual labels of fake or legitimate financial news. Another concern is the noisy-label problem for legitimate news. Even though we validated our results using samples created by different criteria, more work may be needed to create a reliable training dataset. Finally, as a future research direction, we are eager to apply our method to automatically detect fake news circulated online and then empirically evaluate the business impacts of such content.

ACKNOWLEDGMENTS
The authors thank the entire review team for their valuable suggestions and guidance to improve the study. The second author was partially supported by the National Natural Science Foundation of China under Grant No. 72102106 and by the National Social Science Foundation of China under Grant No. 21ZDA033. Qianzhou Du is the corresponding author, and all authors contributed equally to the work.

ORCID
Xiaohui Zhang https://orcid.org/0000-0001-6379-0087
Qianzhou Du https://orcid.org/0000-0002-8080-2200
Zhongju Zhang https://orcid.org/0000-0001-9200-2369

REFERENCES
Abbasi, A., Albrecht, C., Vance, A., & Hansen, J. (2012). Metafraud: A meta-learning framework for detecting financial fraud. MIS Quarterly, 36(4), 1293–1327. https://doi.org/10.2307/41703508
Abrahams, A. S., Fan, W., Wang, G. A., Zhang, Z., & Jiao, J. (2015). An integrated text analytic framework for product defect discovery. Production and Operations Management, 24(6), 975–990. https://doi.org/10.1111/poms.12303
Ahern, K. R., & Sosyura, D. (2015). Rumor has it: Sensationalism in financial media. The Review of Financial Studies, 28(7), 2050–2093. https://doi.org/10.1093/rfs/hhv006
Amiram, D., Bozanic, Z., Cox, J. D., Dupont, Q., Karpoff, J. M., & Sloan, R. (2018). Financial reporting fraud and other forms of misconduct: A multidisciplinary review of the literature. Review of Accounting Studies, 23(2), 732–783. https://doi.org/10.1007/s11142-017-9435-x
Bali, T. G., Bodnaruk, A., Scherbina, A., & Tang, Y. (2017). Unusual news flow and the cross section of stock returns. Management Science, 64(9), 4137–4155. https://doi.org/10.1287/mnsc.2017.2726
Beneish, M. D. (1999). The detection of earnings manipulation. Financial Analysts Journal, 55(5), 24–36. https://doi.org/10.2469/faj.v55.n5.2296
Castillo, C., Mendoza, M., & Poblete, B. (2011). Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web (pp. 675–684). ACM.
Cezar, A., Raghunathan, S., & Sarkar, S. (2020). Adversarial classification: Impact of agents' faking cost on firms and agents. Production and Operations Management, 29(12), 2789–2807.
Chen, H., De, P., Hu, Y. J., & Hwang, B.-H. (2014). Wisdom of crowds: The value of stock opinions transmitted through social media. The Review of Financial Studies, 27(5), 1367–1403. https://doi.org/10.1093/rfs/hhu001
Cheung, C. M. K., Lee, M. K. O., & Lee, Z. W. Y. (2013). Understanding the continuance intention of knowledge sharing in online communities of practice through the post-knowledge-sharing evaluation processes. Journal of the American Society for Information Science and Technology, 64(7), 1357–1374. https://doi.org/10.1002/asi.22854
Chiou, L., & Tucker, C. (2018). Fake news and advertising on social media: A study of the anti-vaccination movement (Report No. 25223). National Bureau of Economic Research.
Clarke, J., Chen, H., Du, D., & Hu, Y. J. (2021). Fake news, investor attention, and market reaction. Information Systems Research, 32(1), 35–52. https://doi.org/10.1287/isre.2019.0910
Crossley, S. A., Kyle, K., & Dascalu, M. (2019). The Tool for the Automatic Analysis of Cohesion 2.0: Integrating semantic similarity and text overlap. Behavior Research Methods, 51(1), 14–27. https://doi.org/10.3758/s13428-018-1142-4
Crossley, S. A., Kyle, K., & McNamara, D. S. (2016). The Tool for the Automatic Analysis of Text Cohesion (TAACO): Automatic assessment of local, global, and text cohesion. Behavior Research Methods, 48(4), 1227–1237. https://doi.org/10.3758/s13428-015-0651-7
Deng, S., Huang, Z. J., Sinha, A. P., & Zhao, H. (2018). The interaction between microblog sentiment and stock return: An empirical examination. MIS Quarterly, 42(3), 895–918. https://doi.org/10.25300/MISQ/2018/14268
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding (arXiv preprint). arXiv:1810.04805. https://arxiv.org/abs/1810.04805
Dong, W., Liao, S., & Zhang, Z. (2018). Leveraging financial social media data for corporate fraud detection. Journal of Management Information Systems, 35(2), 461–487. https://doi.org/10.1080/07421222.2018.1451954
Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning (arXiv preprint). arXiv:1702.08608. https://arxiv.org/abs/1702.08608
Feng, S., Banerjee, R., & Choi, Y. (2012). Syntactic stylometry for deception detection. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2 (pp. 171–175). Association for Computational Linguistics.
Gordon, E. A., Henry, E., Peytcheva, M., & Sun, L. (2013). Discretionary disclosure and the market reaction to restatements. Review of Quantitative Finance and Accounting, 41(1), 75–110. https://doi.org/10.1007/s11156-012-0301-4
Grinberg, N., Joseph, K., Friedland, L., Swire-Thompson, B., & Lazer, D. (2019). Fake news on Twitter during the 2016 US presidential election. Science, 363(6425), 374–378. https://doi.org/10.1126/science.aau2706
Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design science in information systems research. MIS Quarterly, 28(1), 75–105. https://doi.org/10.2307/25148625
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Horne, B. D., Nørregaard, J., & Adali, S. (2019). Robust fake news detection over time and attack. ACM Transactions on Intelligent Systems and Technology, 11(1), 1–23.
Hovland, C. I., Janis, I. L., & Kelley, H. H. (1953). Communication and persuasion. Yale University Press.
Huang, S. Y., Lin, C.-C., Chiu, A.-A., & Yen, D. C. (2017). Fraud detection using fraud triangle risk factors. Information Systems Frontiers, 19(6), 1343–1356. https://doi.org/10.1007/s10796-016-9647-9
Japkowicz, N., & Stephen, S. (2002). The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5), 429–449. https://doi.org/10.3233/IDA-2002-6504
Kane, E. (2004). Continuing dangers of disinformation in corporate accounting reports. Review of Financial Economics, 13(1–2), 149–164. https://doi.org/10.1016/j.rfe.2003.09.007
Kim, A., & Dennis, A. R. (2019). Says who? The effects of presentation format and source rating on fake news in social media. MIS Quarterly, 43(3), 1025–1039. https://doi.org/10.25300/MISQ/2019/15188
Kim, A., Moravec, P. L., & Dennis, A. R. (2019). Combating fake news on social media with source ratings: The effects of user and expert reputation ratings. Journal of Management Information Systems, 36(3), 931–968. https://doi.org/10.1080/07421222.2019.1628921
Kogan, S., Moskowitz, T. J., & Niessner, M. (2019). Fake news: Evidence from financial markets. Available at SSRN 3237763.
Kwon, S., Cha, M., & Jung, K. (2017). Rumor detection over varying time windows. PLoS ONE, 12(1), e0168344. https://doi.org/10.1371/journal.pone.0168344
Lappas, T., Sabnis, G., & Valkanas, G. (2016). The impact of fake reviews on online visibility: A vulnerability assessment of the hotel industry. Information Systems Research, 27(4), 940–961. https://doi.org/10.1287/isre.2016.0674
Lazer, D. M., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., Metzger, M. J., Nyhan, B., Pennycook, G., & Rothschild, D. (2018). The science of fake news. Science, 359(6380), 1094–1096. https://doi.org/10.1126/science.aao2998
Lee, E.-J. (2020). Authenticity model of (mass-oriented) computer-mediated communication: Conceptual explorations and testable propositions. Journal of Computer-Mediated Communication, 25(1), 60–73. https://doi.org/10.1093/jcmc/zmz025
Lee, S. Y., Qiu, L., & Whinston, A. (2018). Sentiment manipulation in online platforms: An analysis of movie tweets. Production and Operations Management, 27(3), 393–416. https://doi.org/10.1111/poms.12805
Levine, T. R. (2014). Truth-default theory (TDT): A theory of human deception and deception detection. Journal of Language and Social Psychology, 33(4), 378–392. https://doi.org/10.1177/0261927X14535916
Levine, T. R., & McCornack, S. A. (2014). Theorizing about deception. Journal of Language and Social Psychology, 33(4), 431–440. https://doi.org/10.1177/0261927X14536397
Li, X., Zhu, H. H., & Zuo, L. (2021). Reporting technologies and textual readability: Evidence from the XBRL mandate. Information Systems Research, 32(3), 1025–1042. https://doi.org/10.1287/isre.2021.1012
Liang, D., Lu, C.-C., Tsai, C.-F., & Shih, G.-A. (2016). Financial ratios and corporate governance indicators in bankruptcy prediction: A comprehensive study. European Journal of Operational Research, 252(2), 561–572. https://doi.org/10.1016/j.ejor.2016.01.012
Lin, Z., Ng, H. T., & Kan, M.-Y. (2011). Automatically evaluating text coherence using discourse relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (pp. 997–1006). Association for Computational Linguistics.
Liu, X., Wang, G. A., Fan, W., & Zhang, Z. (2020). Finding useful solutions in online knowledge communities: A theory-driven design and multilevel analysis. Information Systems Research, 31(3), 731–752. https://doi.org/10.1287/isre.2019.0911
niques. ACM Transactions on Intelligent Systems and Technology, 10(3), 1–42. https://doi.org/10.1145/3305260
Shin, J., Jian, L., Driscoll, K., & Bar, F. (2018). The diffusion of misinformation on social media: Temporal pattern, message, and source. Computers in Human Behavior, 83, 278–287. https://doi.org/10.1016/j.chb.2018.02.008
Shu, K., Wang, S., & Liu, H. (2019). Beyond news contents: The role of social context for fake news detection. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (pp. 312–320). Association for Computing Machinery.
Shu, K., Mahudeswaran, D., Wang, S., Lee, D., & Liu, H. (2020a). FakeNewsNet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Big Data, 8(3), 171–188. https://doi.org/10.1089/big.2020.0062
Shu, K., Mahudeswaran, D., Wang, S., & Liu, H. (2020b). Hierarchical propagation networks for fake news detection: Investigation and exploitation. In Proceedings of the International AAAI Conference on Web and Social Media (pp. 626–637).
Siering, M. (2019). The economics of stock touting during Internet-based pump and dump campaigns. Information Systems Journal, 29(2), 456–483. https://doi.org/10.1111/isj.12216
Štrumbelj, E., & Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3), 647–665. https://doi.org/10.1007/s10115-013-0679-x
Sussman, S. W., & Siegal, W. S. (2003). Informational influence in organizations: An integrated approach to knowledge adoption. Information Systems Research, 14(1), 47–65. https://doi.org/10.1287/isre.14.1.47.14767
Luca, M., & Zervas, G. (2016). Fake it till you make it: Reputation, compe- Tardelli, S., Avvenuti, M., Tesconi, M., & Cresci, S. (2021). Detecting inor-
tition, and Yelp review fraud. Management Science, 62(12), 3412–3427. ganic financial campaigns on Twitter. Information Systems, 103, 101769.
https://doi.org/10.1287/mnsc.2015.2304 https://doi.org/10.1016/j.is.2021.101769
Marett, K., George, J. F., Lewis, C. C., Gupta, M., & Giordano, G. (2017). Tsai, W., & Ghoshal, S. (1998). Social capital and value creation: The role of
Beware the dark side: Cultural preferences for lying online. Comput- intrafirm networks. Academy of Management Journal, 41(4), 464–476.
ers in Human Behavior, 75, 834–844. https://doi.org/10.1016/j.chb.2017. Vishwanath, A. (2015). Diffusion of deception in social media: Social con-
06.021 tagion effects and its antecedents. Information Systems Frontiers, 17(6),
McCornack, S. A., Morrison, K., Paik, J. E., Wisner, A. M., & Zhu, X. (2014). 1353–1367. https://doi.org/10.1007/s10796-014-9509-2
Information manipulation theory 2: A propositional theory of deceptive Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news
discourse production. Journal of Language and Social Psychology, 33(4), online. Science, 359(6380), 1146–1151. https://doi.org/10.1126/science.
348–377. https://doi.org/10.1177/0261927X14534656 aap9559
McNamara, D. S., Kintsch, E., Songer, N. B., & Kintsch, W. (1996). Are good Walls, J. G., Widmeyer, G. R., & El Sawy, O. A. (1992). Building an informa-
texts always better? Interactions of text coherence, background knowl- tion system design theory for vigilant EIS. Information Systems Research,
edge, and levels of understanding in learning from text. Cognition and 3(1), 36–59. https://doi.org/10.1287/isre.3.1.36
Instruction, 14(1), 1–43. https://doi.org/10.1207/s1532690xci1401_1 Williams, E. J., Morgan, P. L., & Joinson, A. N. (2017). Press accept to
Molnar, C. (2020). Interpretable machine learning. Lulu.com. update now: Individual differences in susceptibility to malevolent inter-
Pennycook, G., Cannon, T. D., & Rand, D. G. (2018). Prior exposure increases ruptions. Decision Support Systems, 96, 119–129. https://doi.org/10.1016/
perceived accuracy of fake news. Journal of Experimental Psychology: j.dss.2017.02.014
General, 147(12), 1865. https://doi.org/10.1037/xge0000465 Xiao, B., & Benbasat, I. (2011). Product-related deception in e-commerce:
Qian, F., Gong, C., Sharma, K., & Liu, Y. (2018). Neural user response gen- A theoretical perspective. MIS Quarterly, 35(1), 169–196. https://doi.org/
erator: Fake news detection with collective user intelligence. In IJCAI’18: 10.2307/23043494
Proceedings of the 27th International Joint Conference on Artificial Xiao, B., & Benbasat, I. (2015). Designing warning messages for detect-
Intelligence (pp. 3834–3840). AAAI Press. ing biased online product recommendations: An empirical investigation.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Information Systems Research, 26(4), 793–811. https://doi.org/10.1287/
Language models are unsupervised multitask learners. OpenAI Blog, 1(8), isre.2015.0592
9. Xu, Y., Pinedo, M., & Xue, M. (2017). Operational risk in financial ser-
Rai, A. (2020). Explainable AI: From black box to glass box. Journal of the vices: A review and new research opportunities. Production Operations
Academy of Marketing Science, 48(1), 137–141. https://doi.org/10.1007/ Management, 26(3), 426–445. https://doi.org/10.1111/poms.12652
s11747-019-00710-5 Yang, F., Liu, Y., Yu, X., & Yang, M. (2012). Automatic detection of rumor
Ruchansky, N., Seo, S., & Liu, Y. (2017). Csi: A hybrid deep model for on Sina Weibo. In Proceedings of the ACM SIGKDD Workshop on Mining
fake news detection. In Proceedings of the 2017 ACM on Conference on Data Semantics (pp. 1–7). Association for Computing Machinery.
Information and Knowledge Management (pp. 797–806). Association for Yang, S., Shu, K., Wang, S., Gu, R., Wu, F., & Liu, H. (2019). Unsupervised
Computing Machinery. fake news detection on social media: A generative approach. In Proceed-
Shao, C., Ciampaglia, G. L., Varol, O., Yang, K.-C., Flammini, A., & Menczer, ings of the AAAI Conference on Artificial Intelligence (pp. 5644–5651).
F. (2018). The spread of low-credibility content by social bots. Nature AAAI Press.
Communications, 9(1), 1–9. https://doi.org/10.1038/s41467-018-06930- Zhou, L., & Zhang, D. (2008). Following linguistic footprints: Automatic
7 deception detection in online communication. Communications of the
Shapley, L. S. (1953). A value for n-person games. Contributions to the ACM, 51(9), 119–122. https://doi.org/10.1145/1378727.1389972
Theory of Games, 2(28), 307–317. Zhou, X., & Zafarani, R. (2020). A survey of fake news: Fundamental theories,
Sharma, K., Qian, F., Jiang, H., Ruchansky, N., Zhang, M., & Liu, Y. (2019). detection methods, and opportunities. ACM Computing Surveys, 53(5),
Combating fake news: A survey on identification and mitigation tech- 1–40. https://doi.org/10.1145/3395046
Zhou, X., Jain, A., Phoha, V. V., & Zafarani, R. (2019). Fake news early detection: A theory-driven model. Digital Threats: Research and Practice, 1, 1–25.

SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting Information section at the end of this article.

How to cite this article: Zhang, X., Du, Q., & Zhang, Z. (2022). A theory-driven machine learning system for financial disinformation detection. Production and Operations Management, 31, 3160–3179. https://doi.org/10.1111/poms.13743