TTS Extended Abstract

Research on Twitch.TV has explored a litany of
Extended abstract
This project finds itself at the heart of a few distinct areas of study: Creator Economy; Platform Studies;
Parasociality; and XXX. In particular, I follow
The rise of platforms as a ubiquitous mode of economic and social organization has offered up
new ways to understand the interactions between consumers and producers. I follow x’s call to
study the platformization of cultural production, which can be understood as the way that platform
architecture shifts how cultural production occurs, both in terms of technological affordances and
through changing incentive structures.
In particular, platforms have opened up more avenues for communication between fans and those
who produce entertainment content. While communication between celebrities and their fans might once
have required travel to a meet and greet or the mailing of a fan letter, social media has lowered the
bar for making one’s opinion heard.
Livestreaming, in particular, provides a distinct
Twitch.TV, a live-streaming platform, has received increased scholarly attention over the last few years.
Research has focused on
In particular, live-streaming provides a distinct set of technical and social affordances that shape
cultural expression. These affordances all emerge from its status as a live product with minimal, if
any, lag between the content producer and viewer. Streamers, those who produce content, can
interface with viewers via a live chat, whose participants are referred to collectively as “chat.” The
conversation between chat and the streamer can shape content under the hood (for example, a streamer
doing something in a game because chat mentioned it), or it can be the content itself, through the
imposition of the chat via a transparent overlay or through prolonged sections of a stream where the
streamer is just talking to chat. This constructs chat not as an ancillary part of a live-stream
but as essential to its existence.
Text-to-speech is a particular expression of this relationship between viewers and live-streamers
that demands study on its own terms. Text-To-Speech, referred to hereafter as TTS, can simply be
understood as the transformation of textual messages into vocalized speech. In the context of
live-streaming, TTS refers to the ability of viewers to have their messages read out loud in real
time on the stream itself in exchange for a certain amount of money. TTS is thus dependent upon
the capacity of a computer to reproduce the human voice by translating the phonetic expression of a
written text into a sound fragment. TTS, however, is a specific affordance that exists on some streams
but not on others: it must be explicitly offered and requires the use of third-party donation
software, leading to a distinct political economy that exists both collaboratively with and
separately from the Twitch Bits system (Partin 2020). However, as William Partin brilliantly
explores, platforms often capture third-party software by providing built-in avenues to capture
that revenue. TTS itself has been somewhat insulated from outright capture while still offering
monetization via Twitch Bits. The monetary threshold for TTS varies across channels, with
some requiring as little as a dollar.
As explored within this paper, TTS has been essential to the formation of distinct communities
across Twitch.TV. In particular, IRL streaming (an abbreviation of “in real life” streaming)
appropriated TTS to constitute a direct mode of communication between not only the audience and
the streamer, but also the audience and the people around the streamer. This paper will focus on
two streamers, Ice Poseidon and Asian Andy, who live-streamed themselves out in public via a
wearable backpack. Through donations, viewers could have their messages played not only to the
streamer but also out loud to anyone in the streamer’s vicinity. This naturally resulted in viewers
spending copious amounts of money to put the streamer in uncomfortable social situations, thereby
allowing streamers to monetize shame. In its most controversial forms, TTS has resulted in viewers
strategically wording their messages to get past content filters and vocalize racial slurs to
everyone around the streamer. This has resulted in bans by the platform for streamers who failed
to sufficiently moderate TTS or prevent it from platforming hate speech.
Central to the conversation surrounding text-to-speech are three elements: moderation,
monetization, and parasocial community construction. First, moderation is a difficult, arduous, and
almost impossible process at the scale of contemporary social platforms. As Gillespie highlights in
Custodians of the Internet, “Moderation is a prism for understanding what platforms are,” and this
is no different for live-streaming and the peripheral affordances associated with it. Live-streaming
itself makes moderation even more difficult, as its live nature means there is little gap between
production and circulation. This difficulty has surfaced most notably in debates over bans of
streamers for insufficient moderation. Second, research on Twitch.TV directly engages with the way
that donations, subscriptions, and the creation of an alternative currency all construct unique
forms of monetization compared to other social platforms. Despite this ongoing conversation, there
is a lack of substantial research on TTS as a distinct form of this monetization. Third, TTS offers
new avenues through which viewers can construct relationships with content creators. This takes up
the question of parasocial relationships, a key element scholars isolate as essential to successful
content creators. A parasocial relationship can be understood as “a generalized
emotional and cognitive involvement with the character that can occur outside the context of any
particular media exposure situation,” something which can be understood in relation to both
fictional characters and real people. Research has shown that perceived authenticity, relatability,
and similarity all influence viewer retention and support. By offering users the ability to directly
engage with their favorite streamers, TTS allows users to imagine a more authentic relationship
with different microcelebrities. This both complicates and supplements ongoing debates about the
degree to which engagement constitutes real or parasocial relationships.
In this context, I argue that the emergence, politicization, and retroactive sanitization of IRL
streaming on the platform Twitch.TV stemmed from the interactive possibilities of Text-To-Speech.
This analysis is split into two parts. First, I focus on the platform affordances for Text-To-Speech
via applications such as StreamElements. Second, I explore how TTS produces distinct relationships
between the audience and the streamer in the case of two different streamers. The monetization of
shame and the construction of an adversarial relationship between the viewer and the streamer are
supplemented by the ability of viewers to directly impose their message onto both the stream and the
space around the streamer. This not only constitutes a different relationship between the streamer
and the audience, but also allows the stream to transform social space. In this sense, I argue
that TTS functions as an extension of platform logics to the public sphere, regardless of the
consent of the individual people surrounding the streamer. I also explore how the schematic
affordances of the technology distinguish it from antecedents that allowed conversation between
content consumers and producers, notably letters to the editor and call-in sections on radio.
By connecting the schematic affordances of the technology and its manifestation in the streaming
content itself, this paper argues that Text-To-Speech reshapes the relationship between the audience
and streamer in ways that distinguish streaming from its antecedents.
This paper takes up two distinct conversations
- Parasociality/the interactions between
- Platformization of cultural production
These take many forms but there are four which are obvious:
has offered up new interactions between consumers and cultural producers.
There i
This, however, differs from its antecedents in a litany of ways, insofar as it overlays voice on
top of the already existing broadcast. It most clearly differs from letters to the editor, which
would have been screened, censored, and sometimes even manufactured. The publication process also
created a time lag of a week in the original newspaper context, so the writer did not see themselves
as truly engaging in a concurrent conversation. Although closer to TTS, the radio call-in section
requires an ongoing screening process that often involved a whole other employee deciding which
calls to put forward and which not to. Often, an individual might call and not make it through to
the radio broadcast itself. The call-in was also often confined to a specific section of the
broadcast with pre-set topics.
TTS is run through a screening process of sorts, one which is algorithmic and largely functions to
censor hate speech. The text is then rendered in an algorithmically constructed voice, with more
recent iterations allowing chatters to pick the voice from a range of options, including Obama,
Trump, SpongeBob, Kanye, and other “voices.” The TTS message, as expressed in the chosen voice,
is then played over the stream that is already ongoing. The streamer might be playing a game, or
they might be talking about a very serious social issue.
In its original context, the radio-caller must go through a process. There is an assistant who screens the
call. They must call in response
Text-To-Speech, referred to hereafter as TTS, can be simply understood as
the transformation of textual messages into vocalized speech. This relies upon the capacity of a
computer to reproduce the human voice by translating the phonetic expression of a written text into
a sound fragment. Within the confines of live-streaming, TTS refers to a specific affordance present
within some live-streams that allows users to pay a specific amount of money in exchange for their
message being played on the stream. TTS is not available on every stream, as it must be explicitly
offered and requires the use of third-party donation software, leading to a distinct political
economy that exists both collaboratively with and separately from the Twitch Bits system (Partin
2020). The monetary donation threshold for TTS can change according to the size of the channel, with
larger channels charging more for TTS. As explored later in the paper, Streamlabs, StreamElements,
and other third-party platforms moderate exactly what those messages can say, due to a long history
of TTS being used to spout racial slurs.
Different streams might have different
In this context, I argue that the emergence, politicization, and retroactive sanitization of IRL
streaming on the platform Twitch.TV stemmed from the interactive possibilities of Text-To-Speech.
This analysis is split into two parts. First, I focus on the platform affordances for Text-To-Speech
via applications such as StreamElements. Second, I explore the way that Text-To-Speech produces
distinct relationships between the audience and the streamer in the case of three distinct
streamers. By connecting the schematic affordances of the technology and its manifestation in the
streaming content itself, this paper argues that Text-To-Speech reshapes the relationship between
the audience and streamer in ways that distinguish streaming from its antecedents.
TTS, however, did not emerge out of the aether on Twitch.TV, but exists in relationship to a
longer history of speech synthesizers
mone
Individual parts of a phrase or word are broken down into
In the case of its earliest iterations, this took the form of individual sounds being painstakingly
created via a variety of means.
Kratzenstein
“Inspired by a competition sponsored by the Imperial Academy of Sciences at St. Petersburg in 1780,
Kratzenstein submitted a report that detailed the design of five organ pipe-like resonators that, when
excited with the vibration of a reed, produced the vowels /a, e, i, o, u/ (Kratzenstein, 1781). Although
their shape bore little resemblance to human vocal tract configurations, and they could produce only
sustained sounds, the construction of these resonators won the prize and marked a shift toward
scientific investigation of human sound production.”
von Kempelen
“von Kempelen - a Hungarian engineer, industrialist, and government official - used his spare time and
mechanical skills to build a talking machine far more advanced than the five vowel resonators
demonstrated by Kratzenstein. The final version of his machine was to some degree a mechanical
simulation of human speech production. It included a bellows as a “respiratory” source of air pressure
and air flow, a wooden “wind” box that emulated the trachea, a reed system to generate the voice
source, and a rubber funnel that served as the vocal tract”
“The sound quality was child-like, presumably due to the high fundamental frequency of the reed and
the relatively short rubber funnel serving as the vocal tract. In an historical analysis of von Kempelen’s
talking machine, Dudley and Tarnoczy (1950) note that this quality was probably deliberate because a
child’s voice was less likely to be criticized when demonstrating the function of the machine”
The materiality of the body was in many ways still reproduced in these machines. Joseph Faber’s
speech synthesizer of 1845 spoke with a German accent. As Dudley and Tarnoczy note, the child-like
sound of von Kempelen’s talking machine was most likely a deliberate choice, since a child’s voice
would face less criticism for any malfunctions or choppiness.
However, insofar as TTS can be understood
Importantly, I focus my analysis on the platform affordances for Text-To-Speech via applications such as
StreamElements.
largely due to an ongoing tension with
What questions can
Parasociality literature review
a.
