Extended abstract
This project sits at the intersection of several distinct areas of study: the creator economy, platform
studies, parasociality, and XXX. In particular, I follow
The rise of platforms as a ubiquitous mode of economic and social organization has offered up
new ways to understand the interactions between consumers and producers. I follow x’s call to
study the platformization of cultural production, which can be understood as the way that platform
architecture shifts how cultural production occurs, both in terms of technological affordances and
through changing incentive structures.
In particular, platforms have opened up more avenues for communication between fans and those
who produce entertainment content. While communication between celebrities and their fans might once
have required travel to a meet and greet or the mailing of a fan letter, social media has lowered the bar
for making one's opinion heard.
Twitch.TV, a live-streaming platform, has received increased scholarly attention over the last few years.
Research has focused on
In particular, live-streaming provides a distinct set of technical and social affordances that shape
cultural expression. These affordances all emerge from its status as a live product that has minimal, if
any, lag between the content producer and viewer. Streamers, those who produce content, can
interface with viewers via a live chat, whose participants are referred to collectively as “chat.” The
conversation between chat and streamer can shape content under the hood (for example, a streamer
doing something in a game because chat mentioned it) or can be the content itself (through the display
of chat via a transparent overlay, or through prolonged sections of a stream where the streamer is just
talking to chat). This constructs chat not as an ancillary part of a live stream but as essential to its
existence.
As explored within this paper, TTS has been essential to the formation of distinct communities
across Twitch.TV. In particular, IRL streaming, an abbreviation for "in real life" streaming, appropriated
TTS to constitute a direct mode of communication not only between the audience and the streamer, but
also between the audience and the people who were around the streamer. This paper will focus on two
streamers, Ice Poseidon and Asian Andy, who live-streamed themselves out in public via a wearable
backpack. Through donations, viewers could not only have their messages played to the streamer but
also have them played out loud to anyone in the vicinity of the streamer. This naturally resulted in
viewers spending copious amounts of money to put the streamer in uncomfortable social situations,
thereby allowing streamers to monetize shame. In its most controversial forms, TTS has resulted in
viewers strategically wording their messages to get past content filters and vocalize racial slurs to
everyone around the streamer. This has resulted in platform bans for streamers who failed to
sufficiently moderate TTS or prevent it from platforming hate speech.
Secondly, I explore the way that Text-To-Speech produces distinct relationships between the audience
and the streamer. By connecting the schematic affordances of the technology and its manifestation in
the streaming content itself, this paper argues that Text-To-Speech reshapes the relationship between
the audience and streamer in ways that distinguish streaming from its antecedents.
These take many forms, but four are obvious:
There i
TTS, however, differs from its antecedents in a number of ways, insofar as it overlays voice on top of
the already existing broadcast. It most clearly differs from letters to the editor, which would have
been screened, censored, and sometimes even manufactured. The process involved in publication also
created a time lag of a week in the original newspaper context, so the writer did not see themselves
as truly engaging in a concurrent conversation. Although closer to TTS, the radio call-in segment requires
an ongoing screening process, one that often involves a separate employee deciding which calls to put
forward and which to reject. Often, an individual might call and never make it through to the radio
broadcast itself. The call-in was also often confined to a specific segment of the broadcast with
pre-established topics.
TTS is run through a screening process of sorts, one which is algorithmic and largely functions to
censor hate speech. The text is then rendered in an algorithmically constructed voice, with more recent
iterations allowing chatters to pick that voice from a range of options, such as Obama, Trump,
SpongeBob, Kanye, and other “voices.” The resulting speech is then played over the stream that is
already ongoing. The streamer might be playing a game, or they might be talking about a very serious
social issue.
In its original context, the radio-caller must go through a process. There is an assistant who screens the
call. They must call in response
Text-To-Speech, hereafter referred to as TTS, can be simply understood as
the transformation of textual messages into vocalized speech. This relies upon the capacity of a
computer to reproduce the human voice by translating the phonetic expression of a written text into a
sound fragment. Within the confines of livestreaming, TTS refers to a specific affordance present within
some live streams that allows users to pay a specific amount of money in exchange for their message
being played on the stream. TTS is not available on every stream, as it must be explicitly offered and
requires the use of third-party donation software, leading to a distinct political economy that exists both
alongside and separately from the Twitch Bits system (Partin 2020). The monetary donation
threshold for TTS can change according to the size of the channel, with larger channels charging more for
TTS. As explored later in the paper, Streamlabs, StreamElements, and other third-party platforms
moderate exactly what those messages can say, owing to a long history of TTS being used to spout racial
slurs.
In this context, I argue that the emergence, politicization, and retroactive sanitization of IRL
streaming on the platform Twitch.TV stemmed from the interactive possibilities of Text-To-Speech. This
analysis is split into two parts. First, I focus my analysis on the platform affordances for Text-To-Speech
via applications such as StreamElements. Secondly, I explore the way that Text-To-Speech produces
distinct relationships between the audience and the streamer in the case of three distinct streamers. By
connecting the schematic affordances of the technology and its manifestation in the streaming content
itself, this paper argues that Text-To-Speech reshapes the relationship between the audience and
streamer in ways that distinguish streaming from its antecedents.
TTS, however, did not emerge out of the aether on Twitch.TV, but exists in relationship to a
longer history of speech synthesizers
mone
In its earliest iterations, this took the form of individual sounds being painstakingly
created via a variety of means.
Kratzenstein
“Inspired by a competition sponsored by the Imperial Academy of Sciences at St. Petersburg in 1780,
Kratzenstein submitted a report that detailed the design of five organ pipe-like resonators that, when
excited with the vibration of a reed, produced the vowels /a, e, i, o, u/ (Kratzenstein, 1781). Although
their shape bore little resemblance to human vocal tract configurations, and they could produce only
sustained sounds, the construction of these resonators won the prize and marked a shift toward
scientific investigation of human sound production.”
von Kempelen
“von Kempelen - a Hungarian engineer, industrialist, and government official - used his spare time and
mechanical skills to build a talking machine far more advanced than the five vowel resonators
demonstrated by Kratzenstein. The final version of his machine was to some degree a mechanical
simulation of human speech production. It included a bellows as a “respiratory” source of air pressure
and air flow, a wooden “wind” box that emulated the trachea, a reed system to generate the voice
source, and a rubber funnel that served as the vocal tract”
“The sound quality was child-like, presumably due to the high fundamental frequency of the reed and
the relatively short rubber funnel serving as the vocal tract. In an historical analysis of von Kempelen’s
talking machine, Dudley and Tarnoczy (1950) note that this quality was probably deliberate because a
child’s voice was less likely to be criticized when demonstrating the function of the machine”
The materiality of the body was in many ways still reproduced in these machines. Joseph Faber’s speech
synthesizer of 1845 spoke with a German accent. As Dudley and Tarnoczy note, von Kempelen’s talking
machine was most likely designed to sound child-like because a child's voice would face less criticism
for any malfunctions or choppiness.
Importantly, I focus my analysis on the platform affordances for Text-To-Speech via applications such as
StreamElements.
a.