emerging technology
June 2018
This document is written by student Robbert Willem de Kruijff who declares to take full
I declare that the text and the work presented in this document is original and that no sources other
than those mentioned in the text and its references have been used in creating it.
The Faculty of Economics and Business is responsible solely for the supervision of completion of the
Preface
This thesis is written as part of the Master program: Business Administration – Digital
Business track, at the University of Amsterdam. The Digital Business track is considered to
be a boundary spanner between the digital world, the market and other disciplines. This thesis
presents empirical research on the acceptance of the Smart Speaker, with the goal of bringing
the knowledge acquired during the Master’s program into practice.
I’m really grateful for the valuable comments on earlier drafts of this thesis that were given
by my supervisor Andrea Ganzaroli, as well as for the clarifying insights in the field of
Technology Acceptance. Furthermore, I would like to thank him for the great support he
Abstract
technology known as the Smart Speaker defined as: “A hands-free speaker powered with
digital voice assistant using two-way voice computing technology that is highly connected
In the literature review, two fields of theory have been selected and applied: first, the
technologies embedded in the Smart Speaker (Spoken Language Dialog System,
Voice Search as application and Smart Technologies) and second, the Technology
Acceptance Model (TAM) and further developments of this concept (TAM2, UTAUT and
Factors affecting one’s Use Intention based on the TAM and the development of the
amongst 182 respondents with the following research question in mind: What are motivations
and perceptions that affect people’s intention of adopting the AI-based Smart Speaker?
This resulted in several factors that were found to significantly affect (directly
and indirectly) the Use Intention of the Smart Speaker, such as: Social Influence, Perceived
concern the Interface Familiarity (the new era of human-computer interface with voice
control) and Apprehensiveness (trust and privacy issues when using this AI-based
technology).
Finally, future research could be conducted on Virtual Assistant software in general, not
restricted to the Smart Speaker. Furthermore, one could look at specific contexts (such as
the elderly or a work environment), at off-screen advertising, at the Smart Speaker within the
ecosystems of large technology firms, at the actual use of the Smart Speaker in the
Netherlands and at more experimental research designs (truly experiencing the Smart
Table of Contents
Preface
Abstract
Abbreviations
1. Introduction
1.3. Expectations
1.4. Barriers
2. Literature Review
2.2.2. TAM 2
2.2.3. UTAUT
2.2.4. HMSAM
3.1. Variables
3.1.3. Social Influence
3.1.5. Apprehensiveness
3.3. Hypotheses
4. Method
4.1. Sampling
4.2. Measures
5. Results
5.1.1. Demographics
5.2.1. Data
5.2.2. Normality
5.2.5. Reliability
5.2.6. Correlation
5.3.2. Conclusions from Multiple Regression Analysis
6. Discussion
6.2. Limitations
7. Conclusions
7.1. Contributions
8. Bibliography
9. Appendices
Abbreviations
APP Apprehensiveness
DV Dependent Variable
IF Interface Familiarity
IV Independent Variable
H3 Hypothesis 3
M Mediator / Mean
PE Perceived Entertainment
PU Perceived Usefulness
SD Standard Deviation
SI Social Influence
UI Use Intention
UTAUT Unified Theory of Acceptance and Use of Technology
WS Web Skills
1. Introduction
“‘OK Google’ … ‘What’s playing tonight?’, Google Assistant will show films at your local
cinema. And if you add ‘We’re planning on bringing the kids’, Google Assistant will know to
serve up show times for kid-friendly films. You could then say ‘Let’s see Jungle Book’, and
the assistant will purchase tickets for you” (Dale, 2016). This passage from
Dale (2016) is a good example of voice-controlled technologies becoming more and more
Within this piece of innovation several promising technologies come together in the
comfort of our homes: the Smart Speaker is voice-controlled, smart (meaning that it is
possible to connect it to other smart devices) and connected to the internet (providing a
doorway to endless possibilities). A collaborative study of NPR and Edison Research (The
Smart Audio Report, 2017) shows that nowadays 16% of Americans older than 18 already
Not only do these people own one, they also seem highly satisfied. The study shows that
65% of the surveyed Smart Speaker owners could not
imagine a life without one. According to NPR and Edison Research, “Smart Speakers are changing
behaviours and forming new habits” (The Smart Audio Report, 2017).
In 2014 Amazon was the first to introduce a commercialized wireless playback device
that featured a voice-activated digital assistant. This so-called Smart Speaker is on the rise and
the numbers mentioned by the Smart Audio Report tell the same story. Since 2014,
Google has followed with its Google Home, with the Google Assistant as digital assistant software
(Lerner, 2017), and Apple could not stay behind and recently launched its Apple HomePod
(Jaffe, 2018).
Not only the big technology firms, but also several entrepreneurs tap into this technology with
more “niche” applications of the Smart Speaker such as SMARTY. SMARTY is a virtual
assistant created by a startup called Siliconic Home, uniquely based on the voice of children;
SMARTY positions itself as a kid-friendly Smart Speaker. The patented natural language
processing technology can recognize the voices of kids, which have a significantly different
Another example is Olly, which is being created by a startup called Emotech in London. Olly
is again different compared to the other smart speakers. This virtual assistant actually has a
personality that can develop and evolve as the result of conversations with the consumer. This
understanding whether or not information is appropriate for the user (Montgomery, 2016).
Both these entrepreneurs, focusing on specific niches, confirm the great possibilities and the
1.3. Expectations
The research of Gartner shows that expectations are growing significantly and looking at their
Hype Cycle, the Virtual Digital Assistant (which is the software, and thus the backbone of the
presented in Appendix 1. They predict that within the next 5-10 years the technology will
reach the plateau of productivity (see Appendix 1). The prospects of the Smart Speaker are
strong and substantial growth is already visible. Gartner (2016) forecasts that, by the end of
2020, end-user spending on VPA (Virtual Personal Assistant)-enabled wireless speakers will
1.4. Barriers
A white paper by Symantec explains that the main benefit of using the Smart Speaker is that the
voice-activated assistant can access all the intelligence in the backend, translating every
request into an appropriate task. On the other hand, there is also a downside, as the same report
mentions privacy and trust risks. The report illustrates this with metaphors like “The attack of
the curious child” (children ordering stuff online) and “the tale of the mischievous neighbor”
(whispering requests through the window), both explaining the issues around trust in these
devices (Wueest, 2017). See Appendix 2 for an example of a story of such trust issues in a
news article.
The Smart Speaker can create great opportunities, but there are also barriers to the
adoption of this technology. For instance, research by Voice Labs states that the Smart Speaker is
still seen as a luxury product and that consumers often still see it as another confusing or
redundant platform or interface (Marchick, 2017). Moreover, a chart from Statista (based on
data from NPR and Edison Research; The Smart Audio Report, 2017) shows reasons for not
owning a Smart Speaker such as: “too expensive, not enough information available about the
Smart Speaker, not going to use it enough, worried about hackers, bothered that the speaker is
always listening, spend more money with it and listening by the government”. Thus, it can be
concluded that barriers to the adoption of this technology are related to concerns about, among
other things, the usefulness of the device and trust and privacy risks (Cakebread, 2017).
1.5. This Research
When it comes to the Smart Speaker, there is a gap: there is not a good match between the
technological feasibility and what the market expects. People do not know the technology very
well yet and might expect something different from, or more than, what is possible in reality. This is why
the Smart Speaker calls for a rational understanding. In other words: the objective of this
thesis is to assess the extent to which people intend to adopt the Smart Speaker and to
explore important factors possibly driving this intention. With this in mind the following
What are motivations and perceptions that affect people’s intention of adopting the AI-based
Smart Speaker?
In order to answer this research question, an assessment on the Smart Speaker is done by
made in literature so far. Holistically, the model helps the researcher to sum up the
perceived “benefits” and “costs” of the technology in order to understand the attitude
towards using the Smart Speaker and eventually adopting the technology. This model, as
first introduced by Davis, et al. (1989), roughly looks like Figure 1. In this figure,
Perceived Usefulness can be seen as a benefit and Perceived Ease of Use as a
cost. More detailed explanations of the model and the variables used will be
1.6. Adoption of Spoken Language Dialog System (SLDS)
The so-called Virtual Digital Assistants, such as Google Assistant and Amazon Alexa, are the
software embedded in the Smart Speaker. This software is the backbone of the Smart Speaker
technology: it turns your spoken language into a task and creates feedback given to you by
speech. More about this technology, in research more often called a Spoken Language Dialog
But for now the question is: what does previous research say about the adoption of such
SLDSs? Through the years, a lot of research has been done on such voice-controlled systems.
The conclusion so far has been largely the same: in theory the idea of an SLDS is there and
the outlook for applications of the technology is very promising (Joshi, 1991; Liddy, 2001;
Chowdhury, 2003).
The Technology Acceptance Model has been used before for analysing speech recognition
technology or SLDS. A study on the user acceptance of voice recognition technology done by
Simon & Paper (2007) suggests that their adapted TAM had a great predictive power
“Perceived Ease of Use”). They furthermore mention that the rapidly evolving speech
recognition technology will become less prone to error, more sophisticated, more powerful
and more user-friendly. This can lead to fewer fluctuations in the Perceived Usefulness, Perceived
Ease of Use and, in this research’s case, the social norm. According to Simon & Paper (2007) this also
impacts the intention to use the system and its actual use. This calls for new research testing the
most recent technologies with an adapted TAM; is the Smart Speaker the more stable SLDS
Six years later, not much has changed. Dahl (2013) states that, looking at the future, applications
for natural language processing will become more and more capable. This development
will be based on factors such as: the increasing power of devices, the development of new
techniques for exploiting the vast amounts of data available on the Internet and related
technologies like speech recognition. In other words, Dahl (2013) says that the synergy of
these factors will make the future applications of natural language processing very likely to
More recently, Dale (2016) identifies an important gap. He says that it makes sense that research is
ahead of the actual products, because the commercial benefits for
companies are often unclear. Dale (2016) mentions the risk that, as in the past, the newest
technologies remain in research. He even mentions: the next milestone for the Big Four of
interactions, taking context into account rather than analyzing merely a sequence of
To summarize, many studies mention the promising technology of SLDS, point out
the developments and call for actual products. Did the Big Four of Technology create these
kinds of products with their Smart Speakers? And could this hardware become something
tangible and feasible in this line of thought? The missing link is clear and so is the question
concerning this part of the Smart Speaker. Is the improvement of the technology of SLDS
(like better translating requests into tasks, better understanding conversations including
context and being more robust against noise) going to improve the attitude of people towards
Since the Smart Speaker is still quite young and in the early stages of its promising
development, the research currently available is limited. However, the research available
was a good basis to define the Smart Speaker for this thesis. In the literature review the
technologies embedded in the Smart Speaker are reviewed and combined with the
development and the various versions of the TAM. From this theoretical background variables
are extracted for the research framework (and its hypotheses), which are later researched by
means of a questionnaire. The reported results are followed by a discussion (including some
striking results and limitations to the research) and finally the conclusion (including
2. Literature Review
The literature review addresses two parts that are relevant to the remainder
of the research. Firstly, the literature on the technologies that are embedded in the Smart
Speaker will be explored, in order to create a better understanding of what research has
found on these technologies and what challenges and thus relevant motivations or barriers can
Secondly, the TAM and its later adjusted models are reviewed in order to find the variables
that are both relevant and applicable for the research model. These variables will be the basis
for the research model, the corresponding hypotheses and eventually the questions for the
survey.
The Smart Speaker has been given several names (as also seen in the introduction) such as a
& Nam, 2017). To clarify what the Smart Speaker is, the technology review will start by
giving a definition of the Smart Speaker, which will also be the foundation and support the
“A hands-free speaker powered with digital voice assistant using two-way voice
Based on this definition, three interesting features come together in the Smart Speaker, whose
combination makes it different from other technologies. The Smart Speaker is: a Spoken
Language Dialog System (digital voice assistant using two-way voice computing), a Smart
the internet making voice search possible, as well as making applications and updates
available) technology. In the literature these subjects will be discussed and explained towards
2.1.1. Spoken Language Dialog System
Spoken Language Dialog System (SLDS). Figure 2 creates a general understanding of what
such a system implies. In the definition this is described as “digital voice assistant using a two
In other words: the system translates your spoken language into text, the dialogue manager
translates that text into meaning and a certain task (this is the work of the digital voice
assistant, possibly with help of its connections or applications), and finally gives a response or
The great potential of the Smart Speaker, and thus of the SLDS, can be
explained by taking a step back and looking at the general development of the
human-computer interface (the interaction between human and computer), which is well illustrated in
Figure 3. The research explains: “the desktop, browser and search metaphors of the last
decades leads to a new solve metaphor focused on context and tasks” (Bellegarda, 2013).
Figure 3: Natural stages in the evolution of the user interface (Bellegarda, 2013)
Bellegarda (2013) concludes this figure by stating that the user will get more used to
expressing a more general need and thereafter let a system fulfill (or solve) this need.
When looking at expressing a more general need, research states that speech is the
most essential and primary way of communicating for human beings (Prabhakar & Sahu,
2013). It follows that spoken language has the potential to be an important mode of
interaction with computers. Furthermore, the research says that speech technologies are now
commercially available for a limited but interesting range of tasks. These technologies enable
machines to respond correctly and reliably to human voices and provide useful and valuable
Two other fields of technology research or application that are interesting to look into are Voice
Search and Smart Technologies, since these are also unique qualities that come together
within the Smart Speaker. The Smart Speaker empowers Voice Search by making it possible
at home and enables Smart Technologies to be controlled with the Smart Speaker as
One specific application of voice-based human-computer interaction is voice web search.
Schalkwyk, et al. (2010) found that voice (web) search is growing rapidly and many users
intend to become frequent users. When it comes to voice mobile searches the following
• Is more popular for “on-the-go” topics such as food and drink or local businesses;
• Is less likely to be used for potentially sensitive subjects (adult, social network,
• Is less likely when searching for a website that requires more intensive interaction.
The question now is what influence Smart Speakers have on this growth and whether these
issues would be different in the privacy of our homes with an intelligent speaker. In answer to
this, Moorthy & Vu (2014) found that participants preferred
using a Voice Activated Personal Assistant (VAPA) (another term for the Virtual Assistant or
SLDS) in private locations (home). On the other hand, even in their homes people are
skeptical about using a VAPA when it concerns more private input, compared to more general
As mentioned in the definition and the name of the device, “smart” refers to the
cloud-based connections the Smart Speaker is capable of. This is an interesting and
unique function for its context. The Smart Speaker could be connected to other devices in the home that
have smart features (such as lights, curtains or the thermostat). In other words, it
could be part of a smart home or so-called “Ambient Intelligence”, or even more, it
could be the central device controlling the smart home. That is why this part is more focused
on the possible contribution a Smart Speaker could make in these environments rather than
Within the research of Chan, et al. (2009) the test case for Ambient Intelligence (AmI) is
based on the elderly, who, with the help of AmI, could be assisted and remain independent.
The article of Chan, et al. (2009) reviews various technologies available for smart homes and notes:
“The devices to monitor health and activity and provide assistance in the home must be
non-obtrusive and acceptable to users. The needs of users require more research.” Chan, et al.
(2009) furthermore state: “AmI means making the environment sensitive to the user by using
technology”.
In similar research, one statement points more in the direction of the Smart Speaker:
Cook, et al. (2009) mention that such systems should understand when to interrupt a user,
when to suggest something and when not to. This suggests that more control should be possible
in AmI systems.
Balta-Ozkan, et al. (2013) report the following about the smart home industry in the context of
energy; the outcomes are still very relevant for this research:
• Households say they would adopt such technologies in large quantities if these people
• The usefulness and benefits of the smart technology will have to be clearly stated and
demonstrated;
resistance;
• Finally data privacy is an issue in smart home technologies. This could be dealt with
by privacy friendly techniques. But on the other hand, experts say too much
2.2. Evolution of the Technology Acceptance Model
In order to measure and validate the separate motivations and issues when adopting the Smart
Speaker, a rational research model will have to be formulated and interpreted for the Smart
Speaker. As mentioned in the introduction, the Technology Acceptance Model, or an
adjusted version of the TAM, will be presented as the research model. This model examines
specific factors that may influence technology adoption, such as those presented in the first version
of the TAM: Perceived Usefulness and Perceived Ease of Use. The development of these
models and their applicable factors for the conceptual model will be discussed and used as the basis
for the adapted version of the TAM, or research model. The following research and models are
explored:
Research in the acceptance of information technology has delivered many different models,
all having factors measuring the acceptance of a certain technology. The first and up until
now the most widely accepted model is, as mentioned in the introduction, the Technology
Davis, et al. (1989) based the model on earlier research by Fishbein & Ajzen (1975), who
created the Theory of Reasoned Action (TRA). The theory by Fishbein & Ajzen says that
behavioral intention is determined by both the attitude towards that behavior and a subjective
The TAM presented in Figure 4 is then an adaptation of the TRA, specifically created for
As mentioned in the introduction, the first TAM made the sum of “cost” and “benefit” factors
by looking at respectively the Perceived Ease of Use and the Perceived Usefulness.
specific application system will increase his or her job performance within an
organizational context”
• “Perceived Ease of Use is the degree to which the prospective user expects the target
As mentioned before, this was the basis for many more models and extensions of this idea of
measuring technology acceptance. Important for the construct of the research model is the
Use, this is the construct used in the research model, also confirmed by the case study
2.2.2. TAM 2
The further exploration of the TAM in the context of a smart speaker brings us to the TAM 2
as shown in Figure 5. This is an extended version of what Davis, et al. introduced in 1989,
created by Venkatesh & Davis (2000). They found several different factors significantly
influencing user acceptance. The findings of this research improved the understanding of user
adoption.
One relevant factor extracted for this thesis is the Experience an individual has in comparable
and more accepted with the rising usage of voice search as an example. In the case of this
research, Experience will be (as also seen in the UTAUT model by Venkatesh, et al. (2003) in
Figure 6) divided into two different factors, which will be further explained in the next
chapter.
2.2.3. UTAUT
Acceptance and Use of Technology (UTAUT), shown in Figure 6. From this model, the most
important factor extracted for this thesis is Social Influence. Again, the direct effect of Social
2.2.4. HMSAM
Up to this point, these models have had an organizational context. But
the smart speaker is also meant for personal (private, in-home) use, which means there is also
a hedonic factor to be considered. In other words, how entertaining does a user think the
smart speaker is? This Enjoyment is also explained in the research of Mun & Hwang (2003,
p. 435). They state that prior research already proposed that Enjoyment is a determinant of
behavioral intentions.
The importance of this hedonic factor was also acknowledged in the research of Lowry, et al.
(2012), who further extended the TAM with a hedonic factor in the Hedonic-Motivation
System Adoption Model (HMSAM). Figure 7 shows that Joy (or Perceived Entertainment,
as used in the case study model in the next paragraph as well as in the research model of this
study) is introduced and further explored in the research of Lowry, et al. (2012).
Figure 7: Van der Heijden's Model as the Baseline for the HMSAM (Lowry, et al. 2012)
As stated in the introduction, issues such as trust and privacy also come with a technology
that has voice control as its interface. The case study of cellular telephones using the TAM
mentions the same factor (Kwon & Chidambaram, 2000): “the anxiety about using a new
medium or technology”. This inspires the last factor influencing the behavioral
intention towards the use of technology: Apprehensiveness. The same research mentions
innate fear and intrusion into personal privacy as part of Apprehensiveness. As seen in
Figure 8, this is one of the few models mentioning both Apprehensiveness and Perceived
Entertainment. The construct of this model is similar to the research model of this
thesis.
Figure 8: Research model used by Kwon & Chidambaram (2000)
2.3. Summary of the literature review
Combining the technology literature with the development of the TAM and starting off with
the original TAM (Davis, et al. 1989) the following factors are important to take into account
• Social Influence;
• Perceived Entertainment;
• Apprehensiveness;
The motivation and explanation for these variables will be presented in chapter 3.
Finally as already mentioned, currently the Smart Speaker is not being sold in the
Netherlands, where this research is conducted. This means the resources of finding out the
consequence, considering the limited time frame of this research, only
the behavioral intention (Use Intention) towards this technology will be measured as the
dependent variable.
3. Variables and Research Model
In this section first all variables will be defined and motivated based on previous research
presented in the literature review. The motivation will be formulated by connecting both the
literature of the technology as well as the models frequently used to measure technology
acceptance. Therefore each variable is defined, motivated and connected to the literature.
With these variables finally the research model will be given and with it the hypotheses that
3.1. Variables
For the sake of structure, the definition of Perceived Usefulness is presented again: “degree to
which a person believes that using a particular system would enhance his or her job
performance” (Davis, et al., 1989). This means, for example, that a person might perceive the
Smart Speaker as useful for reading a recipe, controlling the thermostat or asking for
their daily schedule. Perceived Usefulness is one of the basic elements of the original TAM
by Davis, et al. (1989). This is also reflected in the technology review, with the outcome of
Balta-Ozkan, et al. (2013) stating that the acceptance of technology will be enhanced if the
usefulness and benefits of the smart technologies are clearly stated and demonstrated.
The same research states that increased control of technologies such as these can help to
counteract consumer resistance. A more proactive approach that is provided by the Smart
Speaker might change the Perceived Usefulness or at least enhance the Use Intention.
3.1.2. Perceived Ease of Use
As with Perceived Usefulness, the definition of Perceived Ease of Use is repeated here: “degree to
which a person believes that using a particular system would be free from effort”. This covers, for
example, learning how to communicate with the Smart Speaker or connecting it to the
Internet and other devices. It is another basic TAM element (Davis, et al., 1989), also
essential and seen in practice in the Balta-Ozkan, et al. (2013) research: “a household would
adopt smart technologies in large quantities if they wouldn’t have to change their daily
routines.”
Social Influence is defined by Cho (2011) as: “a person’s perception that most people who
are important to him think he should or should not perform the behavior in question.” Cho
(2011) also acknowledges what many TAM studies have shown regarding the direct effect of
Social Influence on the behavioral intention (in this research the Use Intention). What your
surroundings think about a certain technology is important, as also confirmed in the TAM 2
(Venkatesh & Davis, 2000) and in the research model testing the acceptance of the cellular
telephone (Kwon & Chidambaram, 2000). In the Ambient Intelligence literature it is stated
that the elderly could have the opportunity to live independently, which as a development also
3.1.4. Perceived Entertainment
In the research of Lowry, et al. (2012), the HMSAM describes Joy (or Perceived
Entertainment) as: “the extent to which the activity of using the computer is perceived to
bring about pleasure and Joy for their own sake, apart from any anticipated performance
consequences”. The Smart Speaker is also a technology that can be used for
reasons of Joy or fun, not only in a professional context, which is why this hedonic factor is
included in the research model. Perceived Entertainment has not been explored in the
technology review, which makes sense since the technology research is more focused on the
possibility of creating these technologies or on a context of utility, not a context of fun.
This makes this variable even more important to explore, since there seems to be a gap in
practice.
3.1.5. Apprehensiveness
Kwon & Chidambaram (2000) describe Apprehensiveness in their case study to be: “anxiety
people’s personal data and people’s privacy being safe. This is because, as already explained
in the introduction, privacy is a major concern in the 21st century. Everything that has to do with
gathering data is rather sensitive nowadays, thus affecting one’s intention to use a technology.
Apprehensiveness is also one of the main concerns in the technology review, which makes it
an interesting variable to measure and to use in the research model to see if and how it acts as a
The research of Moorthy & Vu (2014) already found that people prefer using a
voice-controlled assistant at private locations. Moreover, voice search is hardly done when it
concerns sensitive subjects (Schalkwyk, et al., 2010). The smart home industry faces a dilemma
in that people are worried about the privacy of their data, even though too much regulation and
3.1.6. Interface Familiarity
The development of human-computer interaction shows that in practice there has always
been change in the interface (Bellegarda, 2013). The challenge now is knowing what the
background of a respondent is, how familiar he or she is with SLDS (Bertrand et al., 2010)
and how that influences the Use Intention. This is also why Interface Familiarity in the model
has an indirect effect on Use Intention via Perceived Usefulness. A person who is more
familiar with this form of human-computer interaction may have a higher appreciation for the
This is why the first variable in this research is Interface Familiarity, derived from what
Venkatesh, et al. (2003) in their research call Experience. Even though the original definition
as stated in the literature is slightly different (Gefen, 2000), being not so much focused on
interface, it does form the basis for the definition used in this thesis: “one's understanding of
The definition of Gefen (2000) is more focused on being familiar with a person rather than
applicable for this research and therefore used as definition for Interface Familiarity:
learning of what, why, where and when others do what they do.”
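To make the idea of an indirect effect concrete, the following minimal sketch decomposes a total effect into a direct and an indirect part using the standard product-of-coefficients approach. It is an illustration only: the scores are made-up placeholder values and not data from this thesis, the function and variable names are hypothetical, and this section does not state which mediation procedure is actually applied.

```python
# Illustrative mediation decomposition: IF -> PU -> UI.
# All numbers below are made-up placeholder scores, NOT survey data from this thesis.

def mean(values):
    return sum(values) / len(values)

def cross(xs, ys):
    """Sum of cross-products of deviations from the mean."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

def mediation(x, m, y):
    """Return (a, b, direct, total) from ordinary least squares.

    a      : slope of the mediator m regressed on x
    b      : slope of m when y is regressed on both x and m
    direct : slope of x when y is regressed on both x and m
    total  : slope of x when y is regressed on x alone
    """
    sxx, smm = cross(x, x), cross(m, m)
    sxm, sxy, smy = cross(x, m), cross(x, y), cross(m, y)
    denom = sxx * smm - sxm ** 2  # zero only if x and m are perfectly collinear
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / denom
    direct = (sxy * smm - smy * sxm) / denom
    total = sxy / sxx
    return a, b, direct, total

interface_familiarity = [1, 2, 3, 4, 5, 6, 7]  # hypothetical IF scores
perceived_usefulness = [2, 2, 3, 4, 4, 5, 6]   # hypothetical PU scores
use_intention = [1, 3, 2, 4, 5, 5, 6]          # hypothetical UI scores

a, b, direct, total = mediation(
    interface_familiarity, perceived_usefulness, use_intention
)
indirect = a * b  # the part of the IF effect on UI that runs through PU
print(f"indirect={indirect:.3f} direct={direct:.3f} total={total:.3f}")
```

With ordinary least squares the total effect equals the direct effect plus the indirect effect, which makes this decomposition a convenient way to read a path in which Interface Familiarity reaches Use Intention through Perceived Usefulness.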
3.1.7. Web Skills
Web Skills are defined as “an individual judgment of one’s capability to use a computer”
(Koufaris, 2002). This variable is also derived from the Experience measure of the UTAUT
(Venkatesh, et al., 2003). It is important to clarify the difference between the variables Web
Skills and Interface Familiarity. The variable Web Skills is more focused on a person’s
self-perception of his or her skills on the web and in general is focused on computer skills on the
Internet, whereas the variable Interface Familiarity is more focused on interaction between
The review of the technologies found that people are less likely to search for a website that
requires more intensive interaction (Schalkwyk, et al., 2010). It is in this line of thought that
the decision has been made for Web Skills to be connected to Perceived Entertainment, i.e.
assuming that the more experienced and skilled a person is on the web, the better he or she
knows what a Smart Speaker is capable of, especially when in search of entertainment, which
Finally, Use Intention is the dependent variable in the research model and is defined as: “the
degree to which a person has formulated conscious plans to perform or not perform some
specified future behavior” (Venkatesh, et al., 2003). In this research it means whether
As mentioned before, it is practically too difficult to measure the actual use of the Smart
Speaker since it is not officially launched in the Netherlands yet. That is why it is important to
know that Use Intention is strongly connected to actual use of a technology, as seen in the
original TAM (Davis, et al., 1989), meaning that Use Intention is likely a predictor for
adopting the Smart Speaker. Or as Davis, et al. (1989) stated: “People’s computer use can be
3.2. Research Model
Based on the explored variables with their effect found in theory, the following research
Figure 9: Research model of adoption of the smart speaker (based on Technology Acceptance Model)
3.3. Hypotheses
The following hypotheses as presented in Figure 9 are stated and will be tested in the research
4. Method
The thesis and the collection of the data are based on quantitative research. The data is
gathered by conducting a survey, which means the data is cross-sectional. In the introduction
of the survey, a short but clear explanation of the technology was provided to the respondent.
Not everybody is familiar with the term “Smart Speaker”. That is why, in this same
introduction, two examples of the device with a photo of several Smart Speakers and a clear
text including a definition gave every respondent the same background information, as
presented in Appendix 3. This information was provided in order for a respondent to create
4.1. Sampling
The non-probability, convenience sample consists of users and non-users of the
Smart Speaker. Since the Smart Speaker is not yet introduced in the Netherlands and the survey
was spread via e-mail and social media (Facebook and LinkedIn), starting in the
Netherlands, the expectation is that most respondents are not actually using the product.
The survey was carried out in the period from the 23rd of April until the 10th of May (2018).
By means of an extra incentive, the researcher tried to maximize the number of respondents.
This incentive is a lottery in which one Smart Speaker (Amazon Echo Dot) is given away. The
reward incentive was combined with content that is relevant for the interested respondent (sharing
the research on platforms where this topic is the main interest) in order to maximize the number of
respondents. The minimum number of cases for such research
would be 200, based on the rule of thumb of that being a good sample size. Based on previous
research, the response rate of comparable studies varies widely, ranging from 16.2 percent
(Cho, 2011) to 37 percent (Kwon & Chidambaram, 2000) and over 90 percent (Pavlou, 2003), which
makes it hard to estimate the response rate in this thesis, also because within the social
platforms it is not possible to estimate how many people actually received the survey.
4.2. Measures
The measures of the survey are presented in Table 1 (all measurements are intervals using a
7-point Likert scale from completely disagree to completely agree). The Cronbach’s alpha as
mentioned in the cited paper is presented to justify the use of the variables and items by
showing their reliability in previous research. Apart from the examples of items presented,
Note: the items chosen and thus question asked for the Apprehensiveness variable are asked
Variable | Source | Items | Example item | Cronbach’s α (or R^2)
Perceived [...] | Lowry, et al. [...] | 3 | I would have fun using the [...] | 0.93 – 0.98
Perceived Ease of [...] | Venkatesh & [...] | 4 | I find the Smart Speaker to [...] | 0.86 – 0.98
Interface [...] | Gefen (2000) [...] | 4 | I am familiar with [...] my voice | 0.89
Use Intention (UI) | Venkatesh, et al. [...] | 3 | I plan to use this Smart [...] | 0.935
As already explored in the literature review, not a lot of research has been done specifically
on the acceptance of the Smart Speaker. In an effort to add control variables, previous
similar studies including the TAM were examined. Based on that, three control variables
The design of the study is not longitudinal but cross-sectional (only a snapshot), meaning the
issue of reversed causality cannot be ruled out. The technology is not for sale in the
Netherlands, meaning the research will be dependent on the interpretation of the description
of the product. Since the actual use also cannot be properly measured, the intention
Considering the sample, the frame is not based on the complete population of potential users
of the Smart Speaker, thus the respondents will be found based on convenience sampling. This
means the results are not guaranteed to be representative or generalizable. Furthermore,
the response rate in previous research has been fluctuating, meaning there is no guarantee
Finally, there is always a risk of common method bias and social desirability in answers due
to the self-created and reported survey. Hopefully this will be as low as possible due to the
relevance of the subject, the extra incentive (lottery of a Smart Speaker) and mentioning the
issues of common method bias and social desirability prior to the questions.
The following tools have been used in order to make the analysis:
• Qualtrics: shaping, designing the online survey in order to gather the data of the
respondents;
• SPSS: the statistical program used to test, clean and analyze the data and give it
• PROCESS: a model created as a plug-in for SPSS in order to analyze the mediated
5. Results
5.1.1 Demographics
In Figures 10, 11 and 12 the demographic data is presented. As can be seen, the educational
level and age are rather concentrated: 80% of the respondents have either a
Bachelor’s or a Master’s degree and 80% of the respondents are between 18 and 25 years of
age.
Figure 10: Educational level of respondents
Figure 11: Gender of the respondents
Age of respondents (chart labels): 18-25 (80%), 26-35 (16%), 36-45 (3%), 46-55 (1%), Other
5.1.1. Response rate
The survey was distributed amongst large networks, so it is hard to measure how many people
actually saw it and thus to calculate a real response rate. However, 182 people completed the
questionnaire, whereas 217 are registered to have started filling it out. This makes an 84%
“completion” rate.
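As a quick arithmetic check, the completion rate follows directly from the two counts reported above and comes to roughly 84 percent. The snippet is only an illustrative sketch; the variable names are hypothetical.

```python
# Counts reported in the text above; variable names are illustrative only.
started = 217    # respondents registered as having started the survey
completed = 182  # respondents who completed the questionnaire

completion_rate = completed / started * 100
print(f"Completion rate: {completion_rate:.1f}%")
```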
Before doing any analysis with the data in order to test hypotheses, a couple of checks and
5.2.1. Data
First, a frequency check was done, which pointed out there were no errors in any of the items.
In some of the measurements there were missing values for certain variables; these
cases were deleted listwise. Also, for the sake of a better overview, data irrelevant to the
analysis, such as IP address and start date, were ignored. Also, the variable
“Sex” has been recoded into “SexNew”. Whereas Male was 1, it is now 0 and for Female 2
5.2.2. Normality
Based on the Kolmogorov-Smirnov & the Shapiro-Wilk output it is significantly tested that
[Table: Kolmogorov-Smirnov and Shapiro-Wilk test results per variable (row labels include Usefulness, Entertainment, Familiarity and use); values not recoverable]
5.2.3. Computing means
Before running the analysis, new variables were created as the mean of the existing items:
for each variable, the mean of all items used to describe that variable was computed as a
new set of data points. For
instance, the items PE_1, PE_2 and PE_3 (all items concerning Perceived Entertainment)
were computed into one variable as the mean of all items of Perceived Entertainment and called
analyze all direct and indirect effects that are hypothesized in paragraph 3.3..
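The computation of such a scale mean can be sketched as follows. This is a minimal illustration with made-up responses, not the SPSS procedure used in the thesis; the name `PE_mean` and the helper `scale_mean` are assumptions introduced here.

```python
from statistics import mean

# Hypothetical respondents with three Perceived Entertainment items (7-point Likert).
respondents = [
    {"PE_1": 5, "PE_2": 6, "PE_3": 7},
    {"PE_1": 3, "PE_2": 4, "PE_3": 2},
]

def scale_mean(row, items):
    """Return the mean of the given item columns for one respondent."""
    return mean(row[item] for item in items)

pe_items = ["PE_1", "PE_2", "PE_3"]
for row in respondents:
    # "PE_mean" is an assumed name; the variable name used in the thesis is not shown here.
    row["PE_mean"] = scale_mean(row, pe_items)
```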
The outliers are checked by standardizing the means of the variables, looking into the
frequencies of those standardized variables, looking at possible outliers (cases with |z| > 3) and
finally examining the distribution to see whether these possible outliers are isolated cases. Outliers
within the variables Perceived Entertainment and Perceived Ease of Use are excluded. This
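The outlier check described above can be sketched in a few lines. The data below are invented for illustration, the function name `flag_outliers` is hypothetical, and the sample standard deviation is assumed for the standardization.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return indices of cases whose absolute z-score exceeds the threshold."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumption)
    return [i for i, v in enumerate(values) if abs((v - m) / s) > threshold]

# Hypothetical scale means on a 7-point scale, with one extreme low case at the end.
pe_means = [6.0] * 20 + [5.5] * 10 + [1.0]
outlier_idx = flag_outliers(pe_means)
cleaned = [v for i, v in enumerate(pe_means) if i not in outlier_idx]
```

Whether a flagged case is then excluded remains a judgment call, as in the text: the distribution is inspected to see if the case is isolated.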
5.2.5. Reliability
The reliabilities are presented on the diagonal of the correlation matrix in Table 3,
between brackets. The conclusion is that all Cronbach’s α’s are above 0.7, meaning the scales are
reliable.
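For reference, Cronbach’s α for a scale of k items is k/(k-1) times one minus the ratio of the summed item variances to the variance of the total score. The sketch below applies this standard formula to invented scores; it is not the SPSS output of the thesis, and the function name is hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all in the same respondent order."""
    k = len(items)
    item_var_sum = sum(variance(scores) for scores in items)
    totals = [sum(resp) for resp in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical scores of five respondents on three items of one scale.
pe_items = [
    [5, 6, 7, 3, 4],
    [5, 7, 6, 2, 4],
    [6, 6, 7, 3, 5],
]
alpha = cronbach_alpha(pe_items)
reliable = alpha > 0.7  # the 0.7 threshold mentioned in the text
```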
5.2.6. Correlation
Table 3 shows a matrix of correlation coefficients between the variables used in this
report. Strong, positive correlations can be seen between several variables (apart from the
control variables age, education level and sex), for instance a strong correlation
between Perceived Usefulness and Use Intention. The important correlations for the further
One observation that stands out in Table 3 is that the intended control variables do not
significantly correlate with the dependent variable. This suggests that using them in a control
model (in a hierarchical multiple regression) will not have any significant effect.
7. APP 3.56 1.59 -.073 -.065 .02 .42** .12 .24** (.93)
8. WS 5.78 .86 -.124 -.08 .01 .08 .44** .12 .07 (.75)
9. IF 4.68 1.64 -.08 -.003 -.08 .27** .37** .25** .20** .27** (.90)
10. SI 3.66 1.43 .04 .1 -.02 .50** .14 .31** .43** .07 .27** (.95)
11. UI 4.71 1.56 -.052 -.003 -.088 .62** .25** .57** .38** .10 .45** .53** (.94)
5.3. Data Analysis direct effects
In order to measure and explain the direct effects of all independent variables on Use
Intention, a multiple regression has been performed. Since the expected control variables are not
correlated with the dependent variable (Use Intention) and the hierarchical multiple regression
showed no significance for the control variables, a regular multiple regression has been done
excluding the intended control variables. (See Appendix 5 for the outcome of the
hierarchical multiple regression, including the intended control variables in step 1 of the
model).
5.3.2. Conclusions from Multiple Regression Analysis
To examine the direct effects of the independent variables on the Use Intention of the Smart
Speaker, a multiple regression has been done. As can be seen in Tables 4a and 4b in
paragraph 5.3.1., there are 4 variables having a significant direct effect on Use Intention.
These are Perceived Entertainment (β = 0.30, p < .001), Perceived Usefulness (β = 0.32, p <
.001), Interface Familiarity (β = 0.27, p < .001) and Social Influence (β = 0.18, p < .005). This
means, for instance, that if someone’s Perceived Entertainment increases by 1, the Use Intention
With these results 3 hypotheses can be significantly explained. For all these, the H0 hypotheses
Other than that, it is striking that Interface Familiarity has also proven to have a direct
positive effect on Use Intention. Nevertheless, the theory-based H3: “Interface Familiarity
has an indirect, positive effect on Use Intention via Perceived Usefulness.” is tested in
paragraph 5.4.
5.4. Data Analysis Indirect effects
The following indirect effects (mediated effects) are analyzed with the method created by
Hayes (2013), of which the model (Model 4 as seen in Figure 13) used for this analysis is
presented in paragraph 5.4.1. An example of the SPSS output testing this indirect effect (or
H3. Interface Familiarity has an indirect, positive effect on Use Intention via Perceived
Usefulness.
H4. Perceived Ease of Use has an indirect, positive effect on Use Intention via Perceived
Usefulness.
H5. Perceived Ease of Use has an indirect, positive effect on Use Intention via Perceived
Entertainment.
H6. Apprehensiveness has an indirect, positive effect on Use Intention via Perceived
Usefulness.
H7. Apprehensiveness has an indirect, positive effect on Use Intention via Perceived
Entertainment.
H8. Web Skills has an indirect positive effect on Use Intention via Perceived Entertainment.
5.4.1. Results of the mediation analysis
The first indirect hypothesis is explained in more detail in Tables 5a and 5b.
In the first table two different steps are presented. First, the mediator is used as outcome,
measuring the connection between the independent variable and the mediator (A1 presented
in Model 4 in Figure 13). Thereafter the dependent variable is used as outcome; the effect of
the mediator (Perceived Usefulness) on the dependent variable (Use Intention) is analysed
(B1), as well as the direct effect of the independent variable (Interface Familiarity) on the
dependent variable (C1’). The latter, if significant, can be used to suggest a direct
effect.
Figure 13: Model 4 template for PROCESS for SPSS and SAS by Andrew F. Hayes and The Guilford Press
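In equation form, the two steps described above correspond to the standard single-mediator model. The path labels A1, B1, C1 and C1’ are those used in the text; the intercepts and error terms are notation introduced here for illustration only.

```latex
\begin{align}
M   &= i_M + A_1 X + e_M \\
Y   &= i_Y + C_1' X + B_1 M + e_Y \\
C_1 &= C_1' + A_1 B_1
\end{align}
```

Here X is the independent variable (for H3: Interface Familiarity), M the mediator (Perceived Usefulness) and Y the dependent variable (Use Intention). The product A1B1 is the indirect effect, and the last line, which holds for ordinary least squares estimates, splits the total effect C1 into a direct and an indirect part.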
Table 5a: Indirect effect Interface Familiarity on Use Intention through Perceived Usefulness
Table 5b as shown below presents the total effect (C1) of the model and the
indirect effect (A1B1), the effect that is of interest (also known as the mediated effect). If the
bootstrapped confidence interval (taking 5,000 samples) is completely above zero, it can be stated there is
a significant indirect effect. The interval bounds are presented as Boot LLCI and Boot ULCI in Table 5b.
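The bootstrap logic behind such an interval can be sketched as follows. This is not the PROCESS macro and not the thesis data: the data are simulated, all function names are hypothetical, and a plain percentile bootstrap is used instead of the bias-corrected interval reported in the thesis.

```python
import random

def _mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """OLS slope of y on x (with intercept): the total effect of x on y."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    return sxy / sum((p - mx) ** 2 for p in x)

def paths(x, m, y):
    """Return (a, b, c_prime): a from M ~ X; b and c_prime from Y ~ X + M."""
    mx, mm, my = _mean(x), _mean(m), _mean(y)
    sxx = sum((v - mx) ** 2 for v in x)
    smm = sum((v - mm) ** 2 for v in m)
    sxm = sum((p - mx) * (q - mm) for p, q in zip(x, m))
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    smy = sum((p - mm) * (q - my) for p, q in zip(m, y))
    det = sxx * smm - sxm ** 2  # determinant of the 2x2 normal equations
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return sxm / sxx, b, c_prime

def bootstrap_indirect(x, m, y, n_boot=5000, seed=1):
    """95% percentile bootstrap interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        a_s, b_s, _cp = paths([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        estimates.append(a_s * b_s)
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

# Simulated data (not the thesis data): X affects Y directly and via M.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(171)]
m = [0.5 * xi + rng.gauss(0, 0.5) for xi in x]
y = [0.3 * xi + 0.6 * mi + rng.gauss(0, 0.5) for xi, mi in zip(x, m)]

a, b, c_prime = paths(x, m, y)
total = slope(x, y)
llci, ulci = bootstrap_indirect(x, m, y, n_boot=1000)
```

If the whole interval from `llci` to `ulci` lies above zero, the indirect effect would be called significant in the sense used above. A smaller `n_boot` is used in the example only to keep the run short.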
Table 5b: Indirect effect Interface Familiarity on Use Intention through Perceived Usefulness
For H3: Interface Familiarity has an indirect, positive effect on Use Intention via Perceived
Usefulness, the following can be concluded based on the results as presented in Tables 5a and
5b. Using a least squares path analysis, the mediation is analyzed, exploring the indirect effect
of Interface Familiarity on Use Intention through Perceived Usefulness. As seen in Table 5a,
if a person is familiar with the voice control interface, the Perceived Usefulness is also
estimated higher (A1 = .21, p < .001). Likewise, when a respondent has a higher
Perceived Usefulness, the Use Intention increases (as also already proven in the previous
Finally, in order to measure the mediated effect (A1B1 = .14), a bias-corrected bootstrap
confidence interval was computed based on 5,000 bootstrap samples; it was entirely above
zero (.047 to .23), as seen in Table 5b. This indicates that people who are familiar with the
interface of voice control perceive the Smart Speaker to be more useful, which finally results
The model also suggests a direct effect of Interface Familiarity on Use Intention (C1’ = .28, p
< .001). This is consistent with the direct effect found in the multiple regression analysis in
paragraph 5.3. The analyses for H4 to H8 have been carried out in the same way as for H3. An
5.4.2. Conclusions from mediated effects analysis
Table 6, presented in paragraph 5.4.1., shows how the hypotheses should be
concluded. For every hypothesized mediated effect a least squares path analysis is used to
analyze the indirect effect. For this, the main focus is the indirect effect with a bias-corrected
bootstrap confidence interval, based on 5,000 bootstrap samples. The requirement
for the indirect effect to be significant is that this interval has to be completely above zero.
This means that the same conclusion as for H3 can be drawn for the following hypotheses; i.e.
the H0’s can be rejected and the hypotheses are accepted:
H3. Interface Familiarity has an indirect, positive effect on Use Intention via Perceived
Usefulness.
H4. Perceived Ease of Use has an indirect, positive effect on Use Intention via Perceived
Usefulness.
H5. Perceived Ease of Use has an indirect, positive effect on Use Intention via Perceived
Entertainment.
H6. Apprehensiveness has an indirect, positive effect on Use Intention via Perceived
Usefulness.
H7. Apprehensiveness has an indirect, positive effect on Use Intention via Perceived
Entertainment.
From Table 6 it can also be concluded that a significant relationship between the independent
variable and the mediator can be seen for each of these hypotheses (A1).
Notably, the strongest indirect effects are those starting with
Perceived Ease of Use: H4 (A1B1 = .38, Boot [.19 to .60]) and H5 (A1B1 = .45, Boot [.26
to .69]).
Furthermore, it is interesting that in the model analyzing the indirect effect of H3, a direct
effect (C1’= .28, p < .001) of Interface Familiarity on Use Intention is suggested. This
effect was already noticed in the multiple regression analysis in paragraph 5.3.1. The same
could be said for H7, suggesting a direct effect of Apprehensiveness on Use Intention (C1’=
.25, p < .001), but in this case the multiple regression in paragraph 5.3.1. showed otherwise.
Finally, for H8: Web Skills has an indirect positive effect on Use Intention via Perceived
Entertainment, the H0 can’t be rejected and thus the effect is not significantly proven. The
indirect effect (A1B1=.38, Boot [-.05 to .29]) is not significant since the interval contains
zero. The effect of Web Skills on Perceived Entertainment (A1=.12, p=.18) is not
significant either, nor is the total effect of the model (C1= .18, p=.2).
In Figure 14 an overview is given of all the relevant results, giving the answers to all the
hypotheses. As shown, Web Skills is the only “red” variable, since this is the only hypothesis
that is rejected.
6. Discussion
“What are motivations and perceptions that affect people’s intention of adopting the AI-based
Smart Speaker?” This research question has been answered (partly) with this research based
on the Technology Acceptance Model. Under some conditions and having a rather
concentrated response group, several motivations and perceptions can be significantly proven
to be influencing the intention of using the Smart Speaker. First the detailed interpretation of
the results will be explained and thereafter the limitations that were part of this study will be
elaborated upon.
First of all, the Technology Acceptance Model and its further extensions formed the basis
for isolating the right variables and items, which in general were very good predictors for the
effects on the Use Intention (the dependent variable). Furthermore, the literature review on
the technologies that are an integrated part of the Smart Speaker provided a good motivation
to add variables to the research model that are specific to this technology.
Perceived Ease of Use, Apprehensiveness and Social Influence acted as expected; the H0’s
were rejected and the hypotheses were accepted (H1, H2, H4, H5, H6, H7 and H9). The
research model (based on the literature review and introduction) as presented turned out to be
fit for purpose and offered an adequate framework for carrying out this research. This means,
when looking back at the research question, several motivations and perceptions of people are
6.1.1. Interface Familiarity
One of the two hypotheses that are not mentioned in the discussion yet is actually
significantly proven, as shown by the result accepting H3 in paragraph 5.4.2. Nevertheless, this
variable needs mentioning since the multiple regression analysis showed that Interface
Familiarity has a significant (p < .001) direct effect on Use Intention. This is interesting since
the original variable “Experience”, as seen in the TAM 2 model (Venkatesh & Davis, 2000)
as well as the UTAUT model (Venkatesh, et al., 2003), was always used as a moderator, not a
This can be due to the development of the Spoken Language Dialog Systems (the Smart
Speaker is such a system), explained by Bertrand et al. (2010) in Figure 2. The technology
review mentions the development of the human-computer interface, which is more and more
focused on solving problems and tasks. At the basis of the value of this SLDS is a very simple
and rational understanding (Prabhakar & Sahu, 2013); speech is the most essential, efficient
and primary way of communication for a human being. In this line of thought, typing in a
computer can be seen as an inefficient detour. This explains the confirmed third hypothesis;
Interface Familiarity has an indirect effect on Use Intention through Perceived Usefulness.
It could also explain why Interface Familiarity has a larger impact on Use Intention:
speech as the way of communicating potentially makes the use of any system (not
only the Smart Speaker) more efficient, as also stated in the previously mentioned research of
Prabhakar & Sahu (2013). Interesting for future research would be to investigate just the
perception of this new human-computer interface, regardless of the hardware this software is
embedded in.
6.1.2. Web Skills
The last hypothesis that needs mentioning is that of the indirect effect of Web Skills on Use
Intention through Perceived Entertainment (H8), which was rejected. Looking at the
correlation matrix in Table 3, the correlation between Web Skills and Perceived Entertainment
was not significant, which already suggests the effect is not there.
Looking further at the correlation matrix, one correlation that stands out is that between
Web Skills and Perceived Ease of Use (which in its turn has a proven indirect effect on Use
Intention). Even though theory on the TAM doesn’t suggest such an effect, Appendix 7
presents model 6 from PROCESS (Hayes, 2013) and the output of the analysis of the following
multiple mediated effect: the indirect effect of Web Skills on Use Intention through
Perceived Ease of Use and Perceived Usefulness. The output is significant (Effect Ind2=.155,
Boot [.07 to .28], interval based on 5,000 bootstrapped samples), which supports this effect.
Rationally this makes sense, since certain Web Skills will influence one’s
Perceived Ease of Use. This suggests the outcome model should be closer to the TAM 2
presented in Figure 5 (Venkatesh & Davis, 2000). In this model Experience (or at least Web
Skills as part of one’s experience) contributes to the effect on the Perceived Usefulness of a
6.2. Limitations
Apart from the limitations of the research design mentioned in paragraph 4.4., the following
First of all, the intended control variables were not significant, so none of them were used
and the analysis was done without them. Furthermore, unfortunately several cases had to
be either removed or ignored in the analysis. As mentioned in paragraph 5.1.1., 217 people
started the questionnaire whereas 182 finished it. Then, out of the 182, some outliers and cases
with missing data were deleted listwise. For the results this left 171 data points to analyze.
Reasons for this missing data could for instance be sensitive questions (especially
Apprehensiveness questions), the survey being too lengthy or irrelevance of questions for
respondents.
The final important factor to mention is that the demographic data
is rather unbalanced. 80% of the respondents were between 18 and 25 years
old and 80% of the respondents had either a Bachelor’s degree or a Master’s degree (57% had
a Bachelor’s degree and 23% a Master’s degree). This makes the results and analysis not
representative for the whole population, which is the risk of having a non-probability,
convenience sample.
7. Conclusions
7.1. Contributions
The purpose of this thesis was to create a rational understanding of why a person would adopt
the Smart Speaker and why not. Once more, the following research question was formulated:
What are the main motivations and perceptions of people in their process of adopting the AI-based
Smart Speaker? Since the technology is rather young and very little research has been
done on the acceptance of the Smart Speaker, this study contributes to the understanding of
the motivations, perceptions and barriers in one’s intention to use a Smart Speaker as well as
Starting from the basic TAM a research model was created appropriate for the Smart Speaker.
Combining managerial reports (in the introduction), the literature review (both technology
and the evolution of the TAM) and the basis of the TAM, a number of variables were
identified and appropriate effects on the dependent variable (Use Intention) were determined.
were measured based on items of previous research with reliable variables and items. This
resulted in a data analysis of which the summary can be seen in Figure 14 in chapter 5.5.
As expected, based on literature, H1 to H7 and H9 were significantly proven in the results
(these hypotheses can be found in paragraph 3.3.). Apart from these hypotheses, 2
observations stood out. Firstly, H8 could not be significantly proven; i.e. there was no indirect
effect of Web Skills on Use Intention through Perceived Entertainment. Secondly, in the
discussion another effect was argued which could be interesting to consider (Web Skills
indirectly affecting Use Intention through Perceived Ease of Use and Perceived Usefulness).
Finally, Interface Familiarity turned out to be an interesting factor. The original
hypothesis was significantly proven (indirect effect on Use Intention through Perceived
Usefulness) and apart from that a direct effect was found of Interface Familiarity on Use
In general, the understanding of what a Smart Speaker is and what technologies it contains
can already be used as an implication, apart from the answers to the question of this research.
There are several factors proven to affect one’s intention to use this piece of
technology, both directly and indirectly, e.g. Social Influence, Perceived Entertainment or
The hedonic factor is something the original models for acceptance of information systems
did not consider. This research has proven that Perceived Entertainment (Joy) does have an
impact on one’s Use Intention when it concerns the Smart Speaker. This could be of great
A good example that is relevant for today is the effect of Apprehensiveness. As mentioned in
the introduction and later proven in the results, the amount of trust people have and the issues
they might have with regard to their privacy are an important factor for them to use or not use
a Smart Speaker. It is important to know this when working with this technology in whatever
context; e.g. developing an application for the Smart Speaker, using a Smart Speaker in any
context (Smart Home for instance) or developing the Smart Speaker as a technology.
Finally, it is important to understand the context in which you want to use the technology, i.e.
who will be using it, where it is used and with what aim. Good examples of variables to take
into account when thinking about the context are the previously mentioned Interface
Familiarity and Web Skills (which for the elderly might be lower than for younger people, or
different in a work context than at home). All these factors might be different for every
constricted to the Smart Speaker but also in context such as cars and phones and the
• Research in different contexts such as: different age groups such as the elderly (as
seen in the smart home theory, the research by Chan, et al. (2009)) or children who
• Focus on marketing, where entertainment is more important. How does one create an
advertisement on something that is off-screen? And how do the Smart Speakers work
in the “ecosystem” of the technological Big Four (e.g. Amazon and Google)?
• Interesting to look at the actual use of the Smart Speakers when they are released in
The Netherlands and what the influence of the language will be?
• Research with a more experimental design, let someone use the Smart Speaker for a
barriers.
8. Bibliography
Balta-Ozkan, N., Davidson, R., Bicket, M., & Whitmarsh, L. (2013). Social barriers to the
Bellegarda, J. R. (2013). Spoken Language Understanding for Natural Interaction: The Siri
Cakebread, C. (2017). The Google Home Mini secretly recorded peoples' conversations and
played into a big fear about Smart Speakers . Retrieved June 1, 2018 from Business insider:
http://www.businessinsider.com/consumers-say-no-thanks-to-expensive-smart-speakers-
chart-2017-10
Chan, M., Campo, E., Esteve, D., & Fourniols, J. Y. (2009). Smart homes - current features
Cho, H. (2011). Theoretical intersections among social influences, beliefs, and intentions in
the context of 3G mobile services in Singapore: Decomposing perceived critical mass and
Cook, D. J., Augusto, J. C., & Jakkula, V. R. (2009). Ambient intelligence: Technologies,
Dahl, D. A. (2013). Natural Language Processing: Past, Present and Future. (A. Neustein, &
J. Markowitz, Eds.) Mobile Speech and Advanced Natural Language Solutions , 49-73.
Dale, R. (2016). The return of the chatbots. Natural Language Engineering , 22 (5), 811-817.
Davis, F. D., Bagozzi, R. P., & Warshaw, P. R. (1989). User acceptance of computer
Gartner Inc. (2017, February). Control the Connected Home with Virtual Personal Assistants.
the-connected-home-with-virtual-personal-assistants/
Gartner Inc. (2016, October). Gartner Says Worldwide Spending on VPA-Enabled Wireless
Speakers Will Top $2 Billion by 2020. Retrieved February 2018 from gartner:
https://www.gartner.com/newsroom/id/3464317
Gartner. (2017, August). Top Trends in the Gartner Hype Cycle for Emerging Technologies,
https://www.gartner.com/smarterwithgartner/top-trends-in-the-gartner-hype-cycle-for-
emerging-technologies-2017/
Gefen, D. (2000). E-commerce: the role of familiarity and trust. Omega , 28 (6), 725-737.
Hong, S. J., & Tam, K. Y. (2006). Understanding the adoption of multipurpose information
appliances: The case of mobile data services. Information system research , 17 (2), 162-179.
Jaffe, J. (2018, January). Apple HomePod: Everything we know about the launch date, specs
and price. From cnet.com: Apple HomePod: Everything we know about the launch date,
Jeffs, M. (2017, January). OK Google, Siri, Alexa, Cortana; Can you tell me some stats on
https://www.branded3.com/blog/google-voice-search-stats-growth-trends/
Koo, H., S, K., & Nam, C. (2017). Speaker Wars begins: Which applications will be the killer
content for smart speaker? 14th International Telecommunications Society (ITS) Asia-Pacific
Koufaris, M. (2002). Applying the technology acceptance model and flow theory to online
Kwon, H. S., & Chidambaram, L. (2000). A test of the technology acceptance model: the case
Lerner, R. (2017, June). Smart Speaker Are The Future Of Audio. Retrieved February 2018
future-of-audio/#54b03e0966a9
Lowry, P. B., Gaskin, J., Twyman, N., Hammer, B., & Roberts, T. (2012). Taking ‘fun and
Marchick, A. (2017, January). The 2017 Voice Report by VoiceLabs. From VoiceLabs:
http://voicelabs.co/2017/01/15/the-2017-voice-report/
Montgomery, L. (2016, October). Amazon Alexa, Google Home: The Virtual Assistant Arms
home/amazon-alexa-google-home-virtualassistant-arms-race/
Moorthy, A. E., & Vu, K. P. (2014). Voice activated personal assistant: Acceptability of use
in the public space. In International Conference on Human Interface and the Management of
Mun, Y. Y., & Hwang, Y. (2003). Predicting the use of web-based information systems: self-
efficacy, enjoyment, learning goal orientation, and the technology acceptance model.
NPR & Edison Research. (2017). National Public Media. From The Smart Audio Report
report-fall-winter-2017/
Pavlou, P. A. (2003). Consumer acceptance of electronic commerce: Integrating trust and risk
with the technology acceptance model. International journal of electronic commerce , 7 (3),
101-134.
Prabhakar, O., & Sahu, N. (2013). A survey on: Voice command recognition technique.
, 3 (5), 576-585.
https://www.qualtrics.com/platform/
Schalkwyk, J., Beeferman, D., Beaufays, F., Byrne, B., Chelba, C., Cohen, M., et al. (2010).
“Your Word is my Command”: Google Search by Voice: A Case Study. In A. Neustein (Ed.),
Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics (pp. 61-90).
Simon, J., & Paper, D. (2007). User Acceptance of Voice Recognition Technology: An
Venkatesh, V., & Davis, F. (2000). A theoretical extension of the Technology Acceptance
Venkatesh, V., Morris, M., Davis, G., & Davis, F. (2003). User Acceptance of Information
Wueest, C. (2017). A guide to the security of voice-activated smart speakers. From Symantec:
https://www.symantec.com/blogs/threat-intelligence/security-voice-activated-smart-speakers
9. Appendices
Cops raid music fan’s flat after Alexa Amazon Echo device ‘holds a party on its own’ while
he was out. Oliver Haberstroh's door was broken down by irate cops after neighbours
A music fan has been left with a huge bill after his voice-operated Amazon Echo device
Cops were forced to break into Oliver Haberstroh's flat in Hamburg, Germany, after
neighbours complained about deafening music blasting from inside - but found the apartment
empty.
Mr Haberstroh claims he walked out of his flat to meet a friend on Friday night after checking
He wrote on Facebook: "While I was relaxed and enjoying a beer, Alexa managed on her
own, without command and without me using my mobile phone, to switch on at full volume
"She decided to have it at a very inconvenient time, between 1.50am and 3am. My neighbours
After knocking on the door, the officers called an expert to break the lock open - and refused
to hand over keys for the replacement until they'd been paid for the locksmith.
A police spokesman said the source of the noise was "a black jukebox which is usually
An Amazon spokesman said: "Working directly with the customer, we have identified the
"Echo was remotely activated and the volume increased through the customer’s third party
"Although the Alexa cloud service worked flawlessly, Amazon has offered the customer to
It comes just weeks after a mischievous parrot used Alexa to order itself a set of ten gift
The Amazon Echo is an intelligent personal assistant, which allows owners to play music and
Source: https://www.thesun.co.uk/news/4873155/cops-raid-german-blokes-house-after-his-
alexa-music-device-held-a-party-on-its-own-while-he-was-out/
Appendix 2: Gartner’s Hype Cycles
Appendix 3: Description of the Smart Speaker Technology
The smart speaker (such as Google Home, Amazon Alexa and Apple HomePod, as seen in
the pictures)
A hands-free speaker powered with digital voice assistant using two-way voice
computing technology (meaning you can talk to it by starting with a wake up word
such as “OK Google”, and it gives feedback by talking back) that is highly connected
Appendix 4: Items for survey
In this appendix, all variables and corresponding items used in the survey for this thesis
are presented (each based on the items of the article cited in brackets). The questions are also
• Using Smart Speaker would increase my chances of achieving things that are
important to me
Interface Familiarity (Gefen, 2000)
Appendix 5: Hierarchical multiple regression
This table shows that, in a hierarchical multiple regression, the intended control variables
are not significant (underlined) within the control model (step 1). A hierarchical
multiple regression analysis will therefore not provide the desired results. Thus, in the thesis a multiple
[Table not recovered; surviving labels: Usefulness, Use, Entertainment, Familiarity]
Appendix 6: PROCESS model 4 and output SPSS
Below is the outcome of the first mediated effect in this thesis. The indirect effect if
**************************************************************************
Model = 4
Y = UI_TOT
X = IF_TOT
M = PU_TOT
Sample size
172
**************************************************************************
Outcome: PU_TOT
Model Summary
Model
coeff se t p
constant 3.5889 .2908 12.3406 .0000
**************************************************************************
Outcome: UI_TOT
Model Summary
Model
coeff se t p
**
Outcome: UI_TOT
Model Summary
Model
coeff se t p
***************** TOTAL, DIRECT, AND INDIRECT EFFECTS ******************
**
Total effect of X on Y
Effect SE t p
Direct effect of X on Y
Effect SE t p
Indirect effect of X on Y
Ratio of indirect to direct effect of X on Y
Effect se Z p
****
5000
95.00
NOTE: Some cases were deleted due to missing data. The number of such cases was:
Appendix 7: PROCESS model 6 and output SPSS
Figure 17: Model 6 template for PROCESS for SPSS and SAS by Andrew F. Hayes and The Guilford Press
In the following SPSS output, based on Model 6 in Figure 17, the indirect effect (Ind2)
of Web Skills on Use Intention through Perceived Ease of Use and Perceived Usefulness is
Run MATRIX procedure:
**************************************************************************
Model = 6
Y = UI_TOT
X = WS_TOT
M1 = PEU_TOT
M2 = PU_TOT
Sample size
171
**************************************************************************
Outcome: PEU_TOT
Model Summary
Model
**************************************************************************
Outcome: PU_TOT
Model Summary
Model
**************************************************************************
Outcome: UI_TOT
Model Summary
Model
**
Direct effect of X on Y
Indirect effect(s) of X on Y
****
5000
95.00
NOTE: Some cases were deleted due to missing data. The number of such cases was:
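The serial mediation structure used here (X = WS_TOT, M1 = PEU_TOT, M2 = PU_TOT, Y = UI_TOT) splits the effect of X on Y into one direct and three indirect paths. The sketch below shows how those products of coefficients can be obtained from three ordinary least-squares regressions. It is an illustration under stated assumptions, not the PROCESS implementation: the helper names `ols` and `serial_indirect_effects` are hypothetical, the data are synthetic, and no bootstrap intervals are computed.

```python
import random


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented system (X'X | X'y).
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(rows[t][i] * y[t] for t in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def serial_indirect_effects(x, m1, m2, y):
    """Decompose the effect of X on Y for the serial model X -> M1 -> M2 -> Y."""
    a1 = ols([x], m1)[1]                        # X -> M1
    _, a2, d21 = ols([x, m1], m2)               # X -> M2 and M1 -> M2
    _, c_prime, b1, b2 = ols([x, m1, m2], y)    # direct effect, M1 -> Y, M2 -> Y
    return {
        "Ind1": a1 * b1,         # X -> M1 -> Y
        "Ind2": a1 * d21 * b2,   # X -> M1 -> M2 -> Y
        "Ind3": a2 * b2,         # X -> M2 -> Y
        "direct": c_prime,
    }


# Synthetic demonstration data (NOT the survey data from this thesis).
demo_rng = random.Random(7)
ws = [demo_rng.gauss(0, 1) for _ in range(171)]
peu = [0.5 * v + demo_rng.gauss(0, 1) for v in ws]
pu = [0.1 * v + 0.6 * p + demo_rng.gauss(0, 1) for v, p in zip(ws, peu)]
ui = [0.1 * v + 0.2 * p + 0.5 * u + demo_rng.gauss(0, 1) for v, p, u in zip(ws, peu, pu)]

effects = serial_indirect_effects(ws, peu, pu, ui)
for name, value in effects.items():
    print(name, value)
```

The four values sum to the total effect of X on Y from a regression of Y on X alone, which is a convenient check that the decomposition has been set up correctly.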