

RETAIL-787; No. of Pages 18 ARTICLE IN PRESS

Journal of Retailing xxx (xxx, xxxx) xxx–xxx

Keyword Selection Strategies in Search Engine Optimization: How Relevant is Relevance?☆
Mayank Nagpal, J. Andrew Petersen ∗
The Smeal College of Business, The Pennsylvania State University, University Park, PA 16802, USA

Abstract
We build an empirical framework using search queries and organic click data that provides model-based guidance to SEO practitioners for keyword selection and web content creation. Specifically, we study how search characteristics (search query popularity, search query competition, search query specificity, and search intent) and website characteristics (content relevance and online authority) interact to affect the expected organic clicks, as well as the organic rank, that a website receives on the search engine result page (SERP). Content relevance is often thought to be a key factor in improving the effectiveness of SEO. We find, however, that content relevance is an important driver of organic clicks only when the consumer is farther along in the customer journey and searching for ways to purchase a product. In contrast, when the customer is at the awareness stage and looking for product information, online authority is the key driver of organic clicks.
© 2020 New York University. Published by Elsevier Inc. All rights reserved.

Keywords: Search engine optimization; Tobit Model; Latent Semantic Analysis

Firms use Search Engine Marketing (SEM) techniques to promote websites by increasing their visibility on search engine results pages (SERPs). As a large majority of users begin their online browsing experience using search engines, SEM accounts for the largest share (47%) of digital marketing spend (Silverman 2010). SEM aims to increase the prominence of a link on the SERP by getting it to appear higher in the sponsored and/or the organic portions of the SERP (see Fig. 1 for an example SERP). SEM comprises two parts: (1) search engine optimization (SEO), which aims at getting a higher rank and more clicks from the organic search results on the SERP, and (2) sponsored search advertising (SSA), which aims at getting a higher rank and more clicks from the sponsored search results on the SERP.

While significant academic research has focused on SSA (e.g., Skiera and Nabout 2013; Wiesel, Pauwels, and Arts 2011; Li et al. 2016), research on SEO is relatively scarce. This seems surprising, given that organic links are considered more trustworthy by users (Purcell, Brenner, and Rainie 2012), account for the majority of clicks on a SERP (Baye, De los Santos, and Wildenbeest 2016; Jerath, Ma, and Park 2014), and also receive the lion's share of SEM spending ($65 billion1 against the $35 billion2 spent on sponsored links in 2016).

Organic links on SERPs are ranked by search engines using various criteria. These include factors such as the authority of the website, the quality of its incoming links, and the relevance of its webpage content to the search query.3 Firms use many different SEO techniques to increase their rankings and, with them, the number of organic clicks to their website. These techniques can be broadly divided into two distinct groups: on-page and off-page SEO. On-page SEO involves optimizing the content on the website to improve its quality and structure. Off-page

☆ We would like to thank a digital media advertising firm for providing the data used in this study. We would also like to thank Arvind Rangaswamy, Gary Lilien, and participants of the 2018 Marketing Science Conference for providing feedback on an earlier version of this paper. Finally, we would like to thank the Marketing Science Institute (MSI) and its Young Scholars program for providing financial support for this research (MSI Grant #4-1921). This research is part of the first author's dissertation.
∗ Corresponding author.
E-mail addresses: mayanknagpal@psu.edu (M. Nagpal), jap57@psu.edu (J.A. Petersen).
1 https://www.borrellassociates.com/industry-papers/papers/2016/trends-in-digital-marketing-services-april-16-detail.
2 https://www.statista.com/statistics/266627/projected-spending-on-search-marketing-in-the-us/.
3 https://moz.com/search-ranking-factors.

https://doi.org/10.1016/j.jretai.2020.12.002
0022-4359/© 2020 New York University. Published by Elsevier Inc. All rights reserved.

Please cite this article as: Nagpal, Mayank, and Petersen, J. Andrew, Keyword Selection Strategies in Search Engine Optimization: How
Relevant is Relevance? Journal of Retailing, https://doi.org/10.1016/j.jretai.2020.12.002

Fig. 1. Search engine results page.

SEO refers to actions, such as link building, that are taken off-site to increase the website's credibility, trustworthiness, and authority for users and search engines. Both on-page and off-page SEO work toward the common goal of achieving higher rankings on SERPs and more organic clicks, and they improve search engine rankings in a complementary fashion; however, practitioners generally advise focusing on on-page SEO to achieve short-run goals because it produces results faster than off-page SEO.4

Both SEO practitioners and academic researchers (Luh, Yang, and Huang 2016) have found content relevance to be among the top few factors affecting organic rank. According to Google's SEO guidelines, creating compelling and useful content will likely influence website ranking more than any other factor.5 Consistent with this, several market research surveys have found relevant content creation and keyword research to be the most effective on-page SEO tactics.6 To get a large number of clicks, a website must rank on the SERP for the appropriate search queries. When writing new content for their websites, managers need to select keywords7 that target the search queries on which they want to focus this content. A typical keyword selection process for SEO involves content marketers using keyword research tools to identify appropriate keywords for writing content. Using their best judgment and knowledge of the business area, they select keywords that fit the following three criteria. Selected keywords should: (1) have enough search volume to make it worthwhile to create and optimize content; (2) have a relatively low pay-per-click (PPC) competition score; and (3) fit naturally with the content on the website and match the search intent of the users being targeted.

Ideally, then, a firm would create web content about relevant keywords with high search volume and low competition. This would allow the firm to rank highly on the SERP and capture a large share of a large pool of potential clicks. However, such keywords are uncommon: search queries that receive higher search volume also tend to face a correspondingly higher level of competition. See Table 1 for an example list of healthcare-related search queries with their corresponding search traffic and relative number of competitors.

A broad search query, which represents a broader topic area, is typically searched by more users than a specific search query, which represents a narrower topic area, but it also typically attracts more websites creating landing pages relevant to it. Thus, firms often face a trade-off between creating pages relevant to broad search queries (potentially getting a small share of a large market) and pages relevant to specific search queries (potentially getting a large share of a small market). For example, a bicycle retailer's website needs to decide whether to create a landing page focused on a broad topic such as "Electric Bikes" or on a more specific sub-topic such as "Which electric bike is the fastest?".

Another consideration in the keyword selection decision is the intent of the user's search query, that is, whether the search query is informational, transactional, or navigational. Users conducting informational searches (e.g., "blenders vs. food processors", "wattage required for blenders") are looking for information about products and usually represent search

4 https://www.searchenginewatch.com/2019/09/16/10-takeaways-from-the-state-of-seo-survey/.
5 https://support.google.com/webmasters/answer/7451184?hl=en.
6 http://webpromo.expert/google-qa-march/.
7 The term keyword is used to refer to the topic area that the firm wants to write content about on a given webpage. The term search query is used to refer to the text that a user types to search on a search engine. For the purpose of this study, we assume the search queries are manifestations of underlying keywords.


Table 1
Example of search queries commonly used for healthcare-related searches.
# | Search query | Traffic | Relative # of competitors
1 | Malaria parasite life cycle | 36 | .15
2 | Symptoms of type 1 diabetes in a child | 451 | .21
3 | Sore throat medicine for toddlers | 1701 | .54
4 | Type 1 vs type 2 diabetes symptoms | 1701 | .07
5 | Malaria symptoms | 26651 | .13
6 | Sore throat medicine | 26651 | .97
7 | Type1 diabetes symptoms | 26651 | .80
8 | Type2 diabetes symptoms | 65701 | .83

*Data calculated based on information from Google AdWords.


Notes: Table 1 provides the traffic each search query receives and the number of sites bidding on each search query relative to all search queries across Google. The first four search queries in Table 1 are specific search queries. A specific, or long-tail, search query is often longer (here, each is at least four words long) and more descriptive, because the searcher has already defined a narrower topic to search. This leads to smaller search volumes for the query and, as a result, fewer firms wanting to compete on it. The last four search queries in Table 1 are broad search queries. A broad, or generic, search query is often shorter (here, each is three or fewer words long) and less descriptive, because the searcher is often still trying to define a narrower topic to search. This leads to larger search volumes for the query and, as a result, more firms wanting to compete on it. Firms therefore need to choose between getting a larger part of the small user base of specific search queries and getting a smaller part of the larger user base of broad search queries.
For instance, even though a specific search query such as "malaria parasite life cycle" has a low relative number of competitors (.15), meaning that it will be easier for a firm to rank highly with relevant content, it also has a relatively small volume of search traffic (36). Conversely, a broad search query such as "sore throat medicine" has a relatively high volume of search traffic (26,651), meaning that there are many potential clicks available, but it also has a high relative number of competitors (.97). Thus, a firm must decide whether it is better to create web content for search queries with relatively higher traffic and competition (i.e., potentially getting a smaller share of a bigger market) or for search queries with relatively lower traffic and competition (i.e., potentially getting a larger share of a smaller market).
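The trade-off in Table 1 can be illustrated with a short script. This is a minimal sketch under explicit assumptions, not the model estimated in this paper: `naive_click_potential` is a hypothetical proxy that multiplies a query's traffic by one minus its relative number of competitors, and `specificity` applies the word-count rule described in the notes (four or more words means specific, otherwise broad). Only the traffic and competition values are taken from Table 1.

```python
from dataclasses import dataclass


@dataclass
class Query:
    text: str
    traffic: int        # search traffic, as reported in Table 1
    competition: float  # relative number of competitors, as reported in Table 1


def specificity(query: Query) -> str:
    """Word-count rule from the Table 1 notes: 4+ words = specific, else broad."""
    return "specific" if len(query.text.split()) >= 4 else "broad"


def naive_click_potential(query: Query) -> float:
    """HYPOTHETICAL proxy: traffic discounted by the relative number of competitors.

    This is NOT the paper's model; it only illustrates the
    volume-versus-competition trade-off discussed in the text.
    """
    return query.traffic * (1.0 - query.competition)


TABLE_1 = [
    Query("Malaria parasite life cycle", 36, 0.15),
    Query("Symptoms of type 1 diabetes in a child", 451, 0.21),
    Query("Sore throat medicine for toddlers", 1701, 0.54),
    Query("Type 1 vs type 2 diabetes symptoms", 1701, 0.07),
    Query("Malaria symptoms", 26651, 0.13),
    Query("Sore throat medicine", 26651, 0.97),
    Query("Type1 diabetes symptoms", 26651, 0.80),
    Query("Type2 diabetes symptoms", 65701, 0.83),
]

if __name__ == "__main__":
    # Rank queries by the toy proxy, highest first.
    ranked = sorted(TABLE_1, key=naive_click_potential, reverse=True)
    for q in ranked:
        print(f"{q.text:<40} {specificity(q):<9} {naive_click_potential(q):>10.1f}")
```

Under this toy proxy, the broad query "Sore throat medicine" (about 800) only narrowly beats the specific "Sore throat medicine for toddlers" (about 782), despite having nearly 16 times the traffic, because its relative number of competitors (.97) is so high. A real analysis would replace the proxy with an estimated click model.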

behaviors at the top of the online purchase funnel. Users conducting transactional searches (e.g., "cheap food processors", "best food processors") usually search with the intent of making a transaction but are still searching for the retailer or brand from which they want to purchase. Users conducting navigational searches (e.g., "Blendtec blenders", "Ninja food processors") search with the intent of making a purchase from a specific website/retailer and usually represent the bottom of the online purchase funnel. Because user intent differs across types of search queries, firms need to understand which SEO strategy is more effective for targeting each type of search intent.

To help retailers solve this problem, we analyze how focusing content on keywords related to particular search queries with certain characteristics affects user click behavior on the SERP. Accordingly, we propose a modeling framework to study how key search characteristics (search query popularity, search query competition, search query specificity, and user search intent) and two key retailer/website characteristics (content relevance and online authority) affect the organic clicks a website receives for a search query. The framework provides model-based guidance to SEO practitioners in their keyword selection decisions by studying how the effectiveness of writing relevant content varies with search query and website type. We examine how search characteristics and retailer/website characteristics interact to influence user click behavior. This allows us to rank search queries by the estimated number of clicks a given website will receive.

We find that conducting on-page SEO by optimizing content (i.e., increasing the relevance of the content to the search query) for transactional, or lower-funnel, search queries is more effective in generating clicks than optimizing content for upper-funnel search queries with informational intent. To target informational search queries, retailers should focus more on their off-page SEO, using link-building techniques to improve their online authority. Because off-page SEO is a longer-term process, a direct short-run implication of this finding is that firms that have already invested in off-page SEO should leverage their higher authority to get clicks by optimizing content for informational search queries. In contrast, retailers with lower online authority should try to gain more clicks from transactional search queries, where the short-term strategy of content optimization through increased relevance is more effective.

In the following sections, we present an overview of the existing literature on both SSA and SEO and build on it by presenting a conceptual model that specifies the expected relationships between the search and website characteristics and the organic clicks a website receives for a given search query. We use data for 1,791 search queries relevant to three firms from three different industries (an online retailer, a culinary school, and an urgent health care provider) to empirically test our conceptual model. Finally, we discuss the results of our model and their implications for the theory and practice of SEO.

Literature Review

To date, extant work on SEM has focused primarily on SSA, with little research dedicated to SEO. In this literature review, we look at relevant research on both SSA and SEO. Table 2 summarizes the related papers in these fields along with the relevant contribution of each.

Research studying the relationship between SSA and SEO campaigns provides mixed evidence about the importance of sponsored results in SEO. On one hand, click-through rates, conversion rates, and revenues in the presence of both paid and organic search listings are significantly higher than those in the absence of paid search (Yang and Ghose 2010). On the other


Table 2
Literature review of SSA and SEO research.
Authors | Field | Relevant contribution
Nabout and Skiera (2012) | SSA | Return on investment in quality improvement is not always positive because of the adverse effect of the increased rank on CPC. Disentangled the negative direct price effect from the positive indirect price effect of quality improvement.
Skiera and Nabout (2013) | SSA | Developed and implemented the PROSAD (Profit Optimizing Search Engine Advertising) bidding decision support system to automatically determine optimized bids that maximize the advertiser's profit.
Li et al. (2016) | SSA | Studied the impact of attribution strategies on the realized ROI of keywords in search campaigns. They find that first-click attribution leads to lower revenue returns and a more pronounced decrease in CTR for more specific keywords.
Baye, De los Santos, and Wildenbeest (2016) | SEO | A retailer's investments in factors such as the quality and brand awareness of its site increase organic clicks both directly, by making the site more attractive to consumers, and indirectly, by improving its rank on the SERP.
Jerath, Ma, and Park (2014) | SEO/SSA | Consumers who search for less popular keywords expend more effort in their search for information and are closer to a purchase. This makes them more targetable for sponsored search advertising.
Kritzinger and Weideman (2015) | SEO/SSA | After a certain period of time, an investment in search engine optimization rather than a pay-per-click campaign appears to produce better results at lower cost.
Berman and Katona (2013) | SEO/SSA | SEO improves the search engine's ranking quality and thus customer satisfaction. This increases consumers' trust in organic links, lowering the search engine's revenue from sponsored links. They find an inverse U-shaped relationship between the minimum bid and search engine profits.
Yang and Ghose (2010) | SEO/SSA | There is a positive interdependence between the click-through rates on organic and paid listings. The positive impact of organic clicks on paid clicks is 3.5 times stronger than the opposite impact.
Taylor (2013) | SEO/SSA | As high-quality organic links cannibalize sponsored clicks, search engines have an incentive to degrade the quality of organic results to increase revenues.
White (2013) | SEO/SSA | When improvements in search quality benefit all users equally, advertisers will charge a higher price. However, when improvements in search quality provide a greater benefit to novice searchers, advertisers will charge a lower price.
Rutz and Bucklin (2011) | SSA | There is a significant spillover from generic to branded search, as generic search creates awareness of the brand's relevance. Incorporating this spillover considerably improves the financial performance of generic keywords for any firm.
Rutz and Bucklin (2007) | SSA | Developed a model for studying individual keyword performance using a hierarchical Bayes model, demonstrating the importance of keyword-level covariates and heterogeneity in conversion estimates.
Abou Nabout (2015) | SSA | Compared multiple algorithms for finding the optimal profit-maximizing bids.
Yao and Mela (2011) | SSA | A dynamic structural model of the sponsored search advertising market finds that: 1. enabling firms to vary bids by consumer segment produces revenue gains for both firms and the search engine while improving consumer welfare; 2. second-price auctions increase firms' bids; 3. consumer search tools increase consumer welfare and search engine revenues but reduce advertiser profits.
Chen, Liu, and Whinston (2009) | SSA | Find the optimal share of exposure allocated to each bidder by search engines and how this changes with the price elasticity of advertisers.
Feng, Bhargava, and Pennock (2007) | SSA | Propose a rank-revision strategy that weights clicks on lower ranked items more than clicks on higher ranked items. This method converges to the optimal ordering faster and more consistently.
De los Santos and Koulayev (2013) | SEO/SSA | Propose an optimal ranking strategy of search results that maximizes consumers' click-through rates (CTR) based on their preferences. This ranking system also increases consumer welfare.
Rutz, Bucklin, and Sonnier (2012) | SSA | Propose a modeling approach for assessing keyword performance in a sparse data environment. They find that higher positions have higher click-through and conversion rates.
Kang and Kim (2003) | Information processing | Compared the performance of multiple scoring algorithms for different types of user queries, classified based on user intent.
Agarwal, Hosanagar, and Smith (2011) | SSA | Evaluate the impact of ad placement on revenues and profits generated from sponsored search: 1. CTR decreases with position; 2. conversion rate increases and then decreases for long keywords.
Ghose and Yang (2009) | SSA | Analyzed the relationship between keyword covariates and SSA performance: 1. CTR and conversion rate decrease with rank; 2. CTR is lower for more specific keywords; 3. the top-ranked position is not the most profitable because of the difference in CPC.
White and Morris (2007) | SEO/SSA | There are differences in the queries, clicks, post-query browsing, and search success of advanced and novice users.
White, Dumais, and Teevan (2009) | SEO/SSA | Develop a model to predict expertise based on search behavior and describe how knowledge about domain expertise can be used to improve search results and help increase user expertise.


Table 2 (Continued)
Authors | Field | Relevant contribution
Shi and Trusov (2013) | SEO/SSA | Content of listings, textual information of previously viewed links, and search intent influence the scanning behavior of users on the SERP.
Broder (2002) | SEO/SSA | Classified information needs or search queries into informational, navigational, and transactional.
Rose and Levinson (2004) | SEO/SSA | Propose a framework for understanding the underlying goals of user searches.
Brynjolfsson, Hu, and Smith (2003) | Digital marketing | Increased product variety leads to an increase in consumer surplus in the online market (long-tail phenomenon).
Brynjolfsson, Hu, and Smith (2003) | Digital marketing | Identified supply-side and demand-side drivers of the long-tail phenomenon along with its effects on consumers as well as producers.
Brynjolfsson, Hu, and Simester (2011) | Digital marketing | Internet search and discovery tools, such as recommendation engines, are associated with an increase in the share of niche products.
Shani and Chalasani (1992) | Niche marketing | Provides a framework for implementing relationship marketing for niche markets in the packaged goods industry.
Skiera et al. (2010) | SSA | The top 20% of all keywords attract on average 98.16% of all searches and generate 97.21% of all clicks. Hence, advertisers do not need to worry much about the performance of keywords in the long tail.
Page and Brin (1998) | SEO/SSA | This paper describes PageRank, a method for rating web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them.
Luh, Yang, and Huang (2016) | SEO | PageRank (PR) is the most dominant factor in Google's ranking function. The title is the second most important, and the snippet and the URL have roughly equal importance, with variations among queries.
Mavridis and Symeonidis (2015) | Information technology | Developed a benchmark crawler called LHS Rank, which incorporates semantics, and compared its performance against established metrics.
Liu and Toubia (2015) | SEO/SSA | Develop a topic model, Hierarchically Dual Latent Dirichlet Allocation (HDLDA), to find the relationship between main topics in search queries and search results. This helps explain consumers' content preferences using the semantic mapping between search queries and results.
Evans (2007) | SEO | Studied the SEO techniques used by top practitioners and found that: 1. multiple pages were generated to influence ranking, with limited success; 2. PageRank is very important in SEO; 3. firms use older domains for higher rankings.

hand, a webpage with high attractiveness, which is likely to rank higher in the organic listings, has a lower incentive to bid for sponsored links because consumer trust in sponsored links is lower (Katona and Sarvary 2010). Further, SEO campaigns have been shown to be more cost-effective than SSA (Kritzinger and Weideman 2015) and to increase consumer satisfaction (Berman and Katona 2013). However, given that the two sets of search results are interconnected, it is important to consider past literature on both topics, as important insights can be obtained from past studies on the sponsored results.

Sponsored Search Advertising (SSA)

The first set of papers in the field of SSA studies how the link allocation and auction strategies of search engines (Feng, Bhargava, and Pennock 2007; De los Santos and Koulayev 2013) and the bidding and attribution strategies of websites (Skiera and Nabout 2013; Li et al. 2016; Abou Nabout 2015) affect the financial performance of firms and consumer satisfaction. Researchers in this field study the importance of incorporating consumer choice (De los Santos and Koulayev 2013) and content relevance (Feng, Bhargava, and Pennock 2007) in a search engine's ranking procedure, along with the importance of website quality improvement (Nabout and Skiera 2012) and keyword heterogeneity (Rutz and Bucklin 2007; Rutz, Bucklin, and Sonnier 2012; Kang and Kim 2003) in websites' SSA strategies. These findings collectively provide insights into how firms could improve rank and get more clicks through SSA. Factors related to auction and bidding strategies are not directly applicable to improving the ranking of organic search results, as the ranking algorithm for organic results does not consider auction bids when ordering results. However, we expect that factors identified in the SSA literature, such as website quality, content relevance, and search query heterogeneity, will apply to our context of SEO.

Another group of SSA researchers studies how consumer search patterns differ in terms of user expertise (White and Morris 2007; White, Dumais, and Teevan 2009), keyword type (Jerath, Ma, and Park 2014; Agarwal, Hosanagar, and Smith 2011; White, Dumais, and Teevan 2009), and search state, that is, the exploration state versus the evaluation state (Shi and Trusov 2013). Researchers in this area classify search queries based on the underlying needs of users (Broder 2002; Kang and Kim 2003; Rose and Levinson 2004), popularity (Jerath, Ma, and Park 2014), the stage of the purchase process (Li et al. 2016), user expertise (White and Morris 2007; White, Dumais, and Teevan 2009), and branded versus generic search queries (Rutz and Bucklin 2011). These classifications are useful for studying how different types of search queries can affect firms' SEM strategies. Rutz, Bucklin, and Sonnier (2012) showed that incorporating keyword heterogeneity into SSA strategies can be profitable for firms because consumer behavior differs across search queries. Others have shown that even though the higher ranked links have higher


CTRs, conversion rates increase as links move lower in the rankings (Agarwal, Hosanagar, and Smith 2011; Ghose and Yang 2009), because a larger proportion of users who click the lower-ranked links have high purchase intent. The increase in conversion rate is larger for more specific search queries (Agarwal, Hosanagar, and Smith 2011), as users tend to spend more effort finding relevant links and thus click on results much lower in the sponsored results. The SEM literature has also noted the importance of search intent (Schultz 2016; Schultz 2019; Rutz and Bucklin 2011). These studies use the informational, transactional, and navigational search intention classification (Broder 2002; Jansen, Booth, and Spink 2008; Jansen and Schuster 2011) to analyze the performance of SSA campaigns. They find differences across these intentions in advertising rank, click-through rate, cost per click, offline signing rate, and cost per contract. Although findings from sponsored listings may not be directly applicable to SEO, owing to differences in firms' competitive strategies and users' beliefs about sponsored links, they suggest that search query types play a crucial role in SEO strategy.

Search Engine Optimization (SEO)

Research on SEO is scarce, likely due to the lack of publicly available data for important variables such as clicks on each link, the complexity of ever-changing ranking algorithms, and the difficulty of measuring important variables such as the semantic relevance of website content. Extant research identifies the most important SEO strategies. For example, Baye, De los Santos, and Wildenbeest (2016) find that investments in quality and brand awareness increase organic traffic to a website both directly, by influencing consumer behavior on the SERP, and indirectly, by improving the rank or prominence of a link on the SERP. In addition to website quality and brand-related factors such as PageRank (Page and Brin 1998) and website authority, studies find content-related factors, such as the content relevance of the title and the snippet, to be among the most important factors in determining organic ranks on Google SERPs (Luh, Yang, and Huang 2016). Other studies suggest improving SEO by incorporating semantic factors (Mavridis and Symeonidis 2015) and consumer information needs (Liu and Toubia 2015).

These papers study the factors affecting organic rank and how SEO techniques, such as investing in brand awareness (Baye, De los Santos, and Wildenbeest 2016) and improving content relevance (Luh, Yang, and Huang 2016), can help firms. In contrast, we study how the effect of these content and quality improvement SEO techniques (on-page and off-page SEO) on organic clicks varies across types of search queries. By doing so, we aim to provide model-based guidance to retailers in making keyword selection choices based on search query type. A key difficulty for any retailer in determining which factors matter most when selecting keywords to achieve a high organic ranking is that search engines not only use many factors to rank the organic list but also continuously update their ranking algorithms (Evans 2007). We address this difficulty by building a modeling framework that focuses on certain fundamental website and search query characteristics whose importance is less likely to change over time.

Conceptual Model

When writing new web content, retailers need to select keywords on which to focus content. If we consider each keyword as a separate "market" of consumers, then the decision of selecting a keyword is analogous to selecting a target market. Past research has shown that the most important factors firms consider when selecting a target market are market size (Abratt 1993; Scaperlanda and Mauer 1969), the level of competition in the market, and the nature of customer needs (Abratt 1993). In our conceptual model, we assume that firms that invest in SEO select keywords, based on observed search queries, that are likely to get them the maximum number of organic clicks. Thus, we study the major factors that affect the number of organic clicks a website gets from the SERP of a given search query. The factors we include in our model are the rank of the website on the SERP, the characteristics of the retailer/website, the characteristics of the search query, and other control variables (see Fig. 2).

Fig. 2. Conceptual model.

Organic Rank

The probability that a user clicks on a link on a SERP depends on its organic rank (Baye, De los Santos, and Wildenbeest 2016; Feng, Bhargava, and Pennock 2007; Shi and Trusov 2013). The first three ranked links get about 60% of all clicks on the SERP, and the first page gets about 90% of the clicks.8 Thus, a major part of SEO involves firms aiming to improve their rank on the SERP to get a larger number of clicks. Organic rank is determined by search engines based on many factors. SEO practitioners have identified over 200 factors that affect the organic rank of a webpage on a SERP. These include domain-level factors such as domain authority and age, page-level factors such as page authority and content relevance, backlink factors such as the number and quality of inbound links, user interaction factors such as click-through rates (CTR), and social signals such as the number of Twitter mentions.

8 https://moz.com/beginners-guide-to-seo.


Fig. 3. Search and purchase funnel.

Retailer/Website Characteristics

We consider two website characteristics in our conceptual model: online authority and content relevance. Online authority represents the overall quality and popularity of the website in its domain of expertise.9 Content relevance represents the degree of overlap, or semantic and textual similarity, between the webpage content and the search query. Both characteristics make a website more attractive to users searching for any given topic. We therefore expect that an improvement in either of the two leads to a larger number of clicks (Mavridis and Symeonidis 2015; Liu and Toubia 2015).

Search Characteristics

We also consider the characteristics of the search query. The characteristics we examine are search query specificity, the search intent (or search type), and a dummy variable indicating whether the website is placed in the sponsored results on the SERP for the search query.

Search query specificity is a measure of how broad or specific a search query is (Li et al. 2016) within a product category (niche vs. broad market). For example, consider a retailer selling electric bikes. A search query such as “Electric Bike” is broader than a more specific search query such as “Class III Electric Bike”, which represents a more niche market. Past literature has shown that users conducting specific search queries are more advanced in the search process (Jerath, Ma, and Park 2014; White, Dumais, and Teevan 2009) and have higher purchase intent (Moe 2003). Specific search queries are niche segments (Skiera, Eckert, and Hinz 2010), where customers have a distinct set of needs and pay a premium to firms which best satisfy their needs (Kotler 2003). Researchers have explained how niches have greater growth and profit potential for firms due to economies of specialization (Kotler 2003; Shani and Chalasani 1992; Toften and Hammervoll 2010; Dalgic and Leeuw 1994). Thus, making content relevant to a more specific search query should get a larger number of organic clicks.

The search intent is defined as the consumer’s intent, or real meaning, behind the search query. It is important for firms to understand the real intent of searches in order to target the most appropriate type of users. Past literature has classified search intent into three broad categories (Broder 2002; Jansen, Booth, and Spink 2008; Jansen and Schuster 2011; Lewandowski 2006; Rose and Levinson 2004) (see Fig. 3):
• informational search queries, where the user is looking for some information;
• transactional search queries, where the user is looking to purchase a product;
• navigational search queries, where the user is looking for a specific website.

The three types of search queries represent the different stages of the SEO purchase funnel (Schultz 2019; Rutz and Bucklin 2011; Schultz 2016). Informational search queries are typically searched by users at the top of the SEO funnel who are searching the web for some information about a product (e.g., “What is the difference between a Blender and a Food Processor?”). These users are at the awareness stage of the purchase process and are looking to learn about the product they intend to buy. Transactional search queries are searched by users who have a buying intent but are at the consideration stage (e.g., “cheap food processors”, “best food processors”). At this stage, users have decided which product to buy but are still exploring different brands or retailers. Navigational search queries are searched by users who already know the brand or retailer website they want to buy from. These queries represent the purchasing stage, the lowest end of the SEO funnel (e.g., “Blendtec blenders”, “Ninja food processors”).

Moderating Role of Retailer Characteristics and User Intent

In our conceptual model, we also examine how the two retailer/website characteristics, content relevance and online authority, interact with the search intent of users. These interactions affect the number of organic clicks the retailer gets from a given search query. The interaction effects help us analyze whether the effectiveness of writing relevant content varies with search query and website characteristics.

Rutz and Bucklin (2011) showed that users conducting informational search queries have a lower awareness of relevance compared to users conducting navigational or transactional search queries. Users who conduct a generic informational search are at the early stage of awareness or information gathering and would be less involved in the search process (Jerath, Ma, and Park 2014; White, Dumais, and Teevan 2009). Thus, they might not be aware of the brand or product. Even when they are, they might not be aware that the brand or product is relevant for the search. As the main goal of such users is to gather information, we expect they would prefer to gain that information from a well-known website with higher authority in that area.

However, when searchers move to a more advanced stage of the online purchase process and conduct transactional or navigational search queries, we expect that users will expend more effort to find more relevant links (Agarwal, Hosanagar, and Smith 2011; Rutz and Bucklin 2011). Thus, we expect that, when users make the click decision, content relevance is more important to those conducting transactional or navigational search queries. By contrast, online authority is more important for users conducting informational search queries.

9 https://www.searchenginejournal.com/seo-guide/search-authority/.
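In the Empirical Application, two of these query characteristics are operationalized with simple rules. Specificity is the number of query terms left after stop-word removal. Search type follows a three-step rule:
• a brand or website name makes the query navigational;
• purchase- or location-related wording makes it transactional;
• anything else is informational.

The minimal sketch below illustrates those rules under stated assumptions. It is not the study's classification code. The small stop-word list and the brand, purchase, and location word lists are hypothetical stand-ins; the study uses the Page Analyzer English Stop Words List. The "mentions the product" clause of the transactional step is omitted because it would need a firm-specific product vocabulary. The example queries are the ones used in this section.

```python
from __future__ import annotations

import re

# HYPOTHETICAL stand-in for the Page Analyzer English Stop Words List.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "what", "between",
              "of", "for", "to", "in", "on", "with"}

# HYPOTHETICAL word lists: the text names "buy" and "cheap" as purchase words
# and gives "best food processors" as a transactional example.
BRAND_NAMES = {"blendtec", "ninja"}
PURCHASE_WORDS = {"buy", "cheap", "best", "price", "deal"}
LOCATION_WORDS = {"near", "nearby"}


def tokenize(query: str) -> list[str]:
    """Lowercase a query and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", query.lower())


def specificity(query: str) -> int:
    """Search query specificity: number of terms left after stop-word removal."""
    return sum(1 for term in tokenize(query) if term not in STOP_WORDS)


def search_type(query: str) -> str:
    """Three-step rule: navigational first, then transactional, else informational."""
    terms = set(tokenize(query))
    if terms & BRAND_NAMES:                        # step 1: brand/website name
        return "navigational"
    if terms & (PURCHASE_WORDS | LOCATION_WORDS):  # step 2: purchase or location wording
        return "transactional"
    return "informational"                         # step 3: catch-all


for q in ["What is the difference between a Blender and a Food Processor?",
          "cheap food processors", "Blendtec blenders"]:
    print(f"{q!r}: type={search_type(q)}, specificity={specificity(q)}")
```

The rules are checked in order, so a query naming a brand is navigational even if it also contains a purchase word. This mirrors the exclusion of branded queries before the transactional step.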

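The moderation hypotheses above map onto the interaction terms of Eq. (2). Navigational queries are the omitted category. The slope of the latent ln(organic clicks) index in a website characteristic for a given search type is therefore the main effect plus the matching interaction coefficient. The sketch below does this arithmetic with the Organic Clicks estimates reported in Table 5.

It is an illustration under simplifying assumptions. The results are slopes of the latent Tobit index, not marginal effects on expected clicks, and the Rank equation is ignored.

```python
# Organic Clicks model estimates copied from Table 5 (navigational = baseline).
MAIN = {"relevance": 2.622, "authority": 0.651}
INTERACTION = {
    ("relevance", "informational"): -2.448,
    ("authority", "informational"): 0.366,
    ("relevance", "transactional"): 1.307,
    ("authority", "transactional"): -1.448,
}


def slope(characteristic: str, search_type: str) -> float:
    """Main effect plus the matching interaction; the baseline has no interaction."""
    return MAIN[characteristic] + INTERACTION.get((characteristic, search_type), 0.0)


for st in ("navigational", "informational", "transactional"):
    print(f"{st}: relevance {slope('relevance', st):+.3f}, "
          f"authority {slope('authority', st):+.3f}")
# navigational: relevance +2.622, authority +0.651
# informational: relevance +0.174, authority +1.017
# transactional: relevance +3.929, authority -0.797
```

For informational queries, the authority slope exceeds the relevance slope. For transactional queries, the relevance slope dominates and the authority slope turns negative. This is the pattern the hypotheses anticipate.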

Model Development

In this section, we provide the methodology used for empirically validating the conceptual model presented in Fig. 3. Our goal is to translate the conceptual framework into a model that can be estimated. This allows us to empirically uncover the drivers of organic clicks for different retailers and different search queries. However, before we can directly estimate the model, we need to address several modeling challenges:
• the endogeneity of organic rank;
• unobserved heterogeneity;
• limited dependent variables;
• the correlation across equations.

Endogeneity of Rank

Search engines continuously update their SERP rankings10 to generate the most relevant search results. This means that our rank variable will depend on past clicks. The standard approach in the literature on clicks at platforms (e.g., clicks at price comparison sites or sponsored clicks at search engines) is to assume that such positions are exogenous. However, using the Wu–Hausman test for endogeneity on our sample, we reject the hypothesis that rank is exogenous in our data (p = .023). To control for the potential endogeneity of rank in the model, we take an instrumental variable (IV) approach.

We use search query competition, which represents the level of competition in the search query market, as an IV for this model. We expect that the level of competition in the search query market would affect the organic rank of a website, as the website would have to compete with a larger number of websites for a rank on the organic list. However, it should not have a direct impact on the user’s click decision on a SERP, except through its impact on rank, as users are largely unaware of the level of competition in the search query market. We believe this makes search query competition an appropriate IV for this model. To estimate the parameters of the model, we therefore estimate an IV-based regression model in which we control for the endogeneity of rank.

Unobserved Heterogeneity

The data has unobserved heterogeneity among firms as well as among search queries. There are unobserved differences among firms due to various factors, such as varying profitability and competition in the specific industries to which the firms belong. To account for this unobserved heterogeneity, we use firm fixed effects in the models to capture differences across firms.

We also need to capture the observed and unobserved differences among search queries. We control for the observed differences across search queries by using the three search query characteristics in our models. Additionally, to account for the unobserved differences within search queries, we use relative measures of the two website characteristics, online authority and content relevance. To measure these variables, we take the value for the focal website and subtract the average value of each measure for all other websites ranked on the top three pages for that search query. The result is a mean-differenced effect.

Limited Dependent Variables

The dependent variables, rank and organic clicks, are limited dependent variables. The number of organic clicks is censored below at 0. Organic rank is censored above at the maximum rank available in the data (i.e., 30) and below at the minimum rank a webpage can achieve (i.e., 1). Thus, estimating the models using OLS may lead to biased estimates (Heckman 1976). To overcome the issue of censoring in the dependent variables, we estimate these two variables with the Tobit model, an approach proposed by Tobin (1958) to model limited dependent variables.

Correlation Across Equations

Both dependent variables in our model, rank and organic clicks, are part of the same overall process. As such, it is likely that the models we estimate for these two dependent variables are inherently related. We therefore estimate the equations jointly using a Conditional Mixed Process (CMP) in Stata (Roodman 2017).

Modeling Framework

Based on these modeling challenges, Eqs. (1) and (2) below give the full model specification that is estimated:

$$
\begin{aligned}
\text{Rank}_{ik} = f_1\big(\,&\alpha_1 + \alpha_2\,\text{Online Authority Diff}_{ik} + \alpha_3\,\text{Content Relevance Diff}_{ik} \\
&+ \alpha_4\,\text{Informational}_{k} + \alpha_5\,\text{Transactional}_{k} + \alpha_6\,\text{Search Query Specificity}_{k} + \alpha_7\,\text{SSA}_{ik} \\
&+ \alpha_8\,\text{Informational}_{k} \times \text{Content Relevance Diff}_{ik} + \alpha_9\,\text{Informational}_{k} \times \text{Online Authority Diff}_{ik} \\
&+ \alpha_{10}\,\text{Transactional}_{k} \times \text{Content Relevance Diff}_{ik} + \alpha_{11}\,\text{Transactional}_{k} \times \text{Online Authority Diff}_{ik} \\
&+ \alpha_{12}\,\text{Search Query Competition}_{k} + F + v_{ik}\big)
\end{aligned}
\tag{1}
$$

$$
\begin{aligned}
\text{Organic Clicks}_{ik} = f_2\big(\,&\beta_1 + \beta_2\,\text{Rank}_{ik} + \beta_3\,\text{Online Authority Diff}_{ik} + \beta_4\,\text{Content Relevance Diff}_{ik} \\
&+ \beta_5\,\text{Informational}_{k} + \beta_6\,\text{Transactional}_{k} + \beta_7\,\text{Search Query Specificity}_{k} + \beta_8\,\text{SSA}_{ik} \\
&+ \beta_9\,\text{Informational}_{k} \times \text{Content Relevance Diff}_{ik} + \beta_{10}\,\text{Informational}_{k} \times \text{Online Authority Diff}_{ik} \\
&+ \beta_{11}\,\text{Transactional}_{k} \times \text{Content Relevance Diff}_{ik} + \beta_{12}\,\text{Transactional}_{k} \times \text{Online Authority Diff}_{ik} \\
&+ F + \varepsilon_{ik}\big)
\end{aligned}
\tag{2}
$$

where

• $\text{Rank}_{ik}$ is the rank of website $i$ on the organic list on the SERP for search query $k$.
• $\text{Online Authority Diff}_{ik}$ is the relative online authority of website $i$ on search query $k$.
• $\text{Content Relevance Diff}_{ik}$ is the relative content relevance of website $i$ to search query $k$.
• $\text{Informational}_{k}$ is a dummy indicating whether search query $k$ is informational.
• $\text{Transactional}_{k}$ is a dummy indicating whether search query $k$ is transactional.
• $\text{Search Query Specificity}_{k}$ is the specificity of search query $k$.
• $\text{SSA}_{ik}$ is a dummy indicating whether an ad for website $i$ appears in the sponsored results for search query $k$.
• $\text{Search Query Competition}_{k}$ is the level of competitiveness of search query $k$.
• $F$ represents the firm fixed effects.
• $f_1(\cdot)$ ($f_2(\cdot)$) is the two-sided (one-sided) Tobit functional form, matching the censoring of rank at both ends and of organic clicks from below.
• $v_{ik}$ and $\varepsilon_{ik}$ represent the error terms.

The two equations (Eqs. (1) and (2)) above are estimated to obtain the expected organic clicks for a website from any given search query, based on search query and website characteristics. The expected clicks can be used to measure how lucrative a keyword, based on a set of search queries, is to a firm and its website. Firms can use this in their keyword research campaigns to identify the most lucrative search queries. The identification rests on the relationship between the search query characteristics studied (search query competition, search query specificity, search type) and the expected organic clicks, given the website characteristics.

Empirical Application

Data Description

We empirically validate the relationships described in the conceptual model using data for three firms from three different industries. The firms include an online retailer, a culinary school, and an urgent health care provider. The dataset contains information on organic clicks on the three websites for search queries relevant to these three firms and their main competitors for a given month. We have data for the first 30 links of the SERP for 1,129, 331, and 331 search queries, respectively, for the three firms. We use data from the first three pages, as these pages typically account for more than 90% of the clicks from SERPs (Moz Study 2015).

The data contains information on search query traffic and the number of organic clicks for the focal firm accumulated over a month, along with the cost per click (CPC). It also contains information about the domain and page authority of the first 30 ranked links for each search query. Table 3 provides a list of all variables used in the model along with their descriptions and sources. Table 4a provides descriptive statistics and correlations of the variables, and Table 4b breaks down the descriptive statistics by retailer.

Dependent variables
Organic clicks are measured as the number of clicks received by a website from the organic list on the SERP. Organic rank is the minimum rank of a website on the SERP of a search query.11

Online authority (diff)
The online authority of a webpage is defined as the standing or impact the page has in its field of expertise (Kleinberg 1999). We use two metrics, domain authority and page authority, to derive our measure of online authority.12

Moz’s Page/Domain Authority is a metric of how highly a given webpage/domain is likely to rank in search results regardless of its content. It is based on the Linkscape web index and includes link count, MozRank, and several more metrics. The highest scores are achieved by pages/domains that are heavily linked and by pages that are near the top of SERPs. These scores are aggregates of several other metrics, including MozRank, MozTrust, the quality of the link profile, and other factors which are known to affect the rank of the website. They are represented as integer values from 1 to 100 on a logarithmic scale and are calculated by combining more than 40 parameters into a single score. Given that the authority metrics comprise several other important variables, they measure the overall quality of the site and the page. In a survey of over 150 leading search marketers conducted by SEOMoz, authority factors were considered the most important among the 90 ranking factors surveyed.

We create an online authority index from these two metrics by performing a Principal Component Analysis (PCA) on the two authority factors (page authority and domain authority). We take the first component as the measure of online authority of a website. In our model, we treat online authority as a relative measure; that is, we compare the online authority of a given website against the average online authority of the competitor websites on that search query. We use the following formula to measure a given website’s online authority relative to competing websites:

$$
\text{Online Authority Diff}_{ik} = \text{Online Authority}_{ik} - \overline{\text{Online Authority}}_{k}
\tag{3}
$$

10 https://moz.com/blog/how-often-does-google-update-its-algorithm.

11 The data contains instances where multiple webpages from the same domain are ranked for the same search query. In these cases, we take the minimum rank obtained by any webpage from the firm as the rank of that website. We use the minimum rank for estimation because we assume that all the organic clicks the website receives from the SERP come from the first instance in which the user sees a link from the domain or the website.

12 The domain authority part of online authority represents the overall authority the domain is perceived to have across all its webpages. As a robustness check, we found that the domain authority computed by Moz produces the same rank order of domains as the website ranking service Alexa.com. We use Moz to create our measure of online authority because it includes a composite of both the domain and webpage authority, whereas Alexa.com only provides a measure of the domain rank.
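The online authority construction above can be sketched in two steps:
1. take the first principal component of the two Moz metrics;
2. express each website's score relative to the other websites ranked for the same query, as in Eq. (3).

This is a minimal sketch under assumptions not stated in the text. PCA is run on the covariance matrix of the mean-centered metrics; whether the metrics are standardized first is not specified. The loadings are oriented so that more authority means a higher score. The four-website SERP and the function names are hypothetical. In practice, the component would be estimated on all ranked links in the data, not on a single query.

```python
import math
from statistics import mean


def first_pc_scores(page_auth, domain_auth):
    """Scores on the first principal component of two authority metrics.

    Assumption: PCA on the covariance matrix of mean-centered metrics, with
    loadings oriented so that higher authority gives a higher score.
    """
    mp, md = mean(page_auth), mean(domain_auth)
    xs = [p - mp for p in page_auth]
    ys = [d - md for d in domain_auth]
    n = len(xs)
    a = sum(x * x for x in xs) / (n - 1)               # var(page authority)
    c = sum(y * y for y in ys) / (n - 1)               # var(domain authority)
    b = sum(x * y for x, y in zip(xs, ys)) / (n - 1)   # covariance
    # Largest eigenvalue of the 2x2 symmetric matrix [[a, b], [b, c]].
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        v1, v2 = lam - c, b                             # eigenvector for lam
    else:
        v1, v2 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v1, v2)
    v1, v2 = v1 / norm, v2 / norm
    if v1 + v2 < 0:                                     # orient loadings positively
        v1, v2 = -v1, -v2
    return [v1 * x + v2 * y for x, y in zip(xs, ys)]


def relative_to_competitors(values, focal):
    """Eq. (3): the focal website's value minus the mean over the other websites."""
    others = [v for j, v in enumerate(values) if j != focal]
    return values[focal] - mean(others)


# HYPOTHETICAL Moz page/domain authority for four websites ranked on one query.
authority = first_pc_scores([48, 35, 60, 22], [55, 40, 70, 30])
print(relative_to_competitors(authority, focal=0))
```

The same `relative_to_competitors` step applies unchanged to content relevance in Eq. (4).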

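Under the censoring described in Limited Dependent Variables, each observation contributes one of two terms to the Tobit likelihood:
• a normal density, if it lies strictly between the limits;
• a tail probability, if it sits at a limit.

The single-equation sketch below shows that building block with the limits used here. Rank is censored at 1 and 30, and organic clicks are censored below at 0. It assumes a normal latent error with standard deviation `sigma`, and all names are illustrative. It is not the joint CMP estimator with correlated errors and an instrument used for Eqs. (1) and (2).

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def norm_logcdf(z):
    """Log of the standard normal CDF, computed via erfc for tail accuracy."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2)))


def tobit_loglik(y, xb, sigma, lower=-math.inf, upper=math.inf):
    """Log-likelihood of a Tobit model censored at `lower` and/or `upper`.

    y: observed outcomes; xb: linear predictions x'beta; sigma: latent error s.d.
    """
    total = 0.0
    for yi, mu in zip(y, xb):
        if yi <= lower:    # censored from below: P(y* <= lower)
            total += norm_logcdf((lower - mu) / sigma)
        elif yi >= upper:  # censored from above: P(y* >= upper)
            total += norm_logcdf((mu - upper) / sigma)
        else:              # uncensored: normal density of the residual
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total


# Two-limit Tobit for rank (limits 1 and 30); one-limit Tobit for clicks (limit 0).
print(tobit_loglik([1, 7, 30], [2.5, 8.0, 27.0], sigma=5.0, lower=1, upper=30))
print(tobit_loglik([0.0, 1.2], [0.3, 1.0], sigma=1.0, lower=0.0))
```

Maximizing this function over the coefficients behind `xb` and over `sigma` gives a single-equation Tobit fit. OLS on the same data would instead treat the piled-up limit values as exact outcomes.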

Table 3
Variable descriptions and data sources.

| Variable name | Description | Source |
|---|---|---|
| Outcome variable | | |
| Organic Clicks(ik) | Natural log of the number of organic clicks on website i from search query k | Focal firm |
| Drivers of organic clicks | | |
| Rank(ik) | Minimum rank of any webpage associated with site i on the SERP for search query k | Focal firm |
| Online authority difference(ik) | Online authority of focal website i minus the average online authority of all other websites ranked on the first three pages of search query k | Moz |
| Content relevance difference(ik) | Content relevance of focal website i minus the average content relevance of all other websites ranked on the first three pages of search query k | Computed |
| Search type(k) | Dummies representing whether search query k is informational or transactional (navigational is the baseline) | Computed |
| Search query specificity(k) | Length of search query k after removing stop words | Computed |
| Search query competition(k) | Average cost per click (CPC) for getting placed in the top three sponsored search results for search query k | Google AdWords |
| SSA(k) | Dummy variable representing whether the focal website has a sponsored ad on the SERP for search query k | SEMRush |
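Table 3 lists content relevance as a computed variable. The scoring procedure is described under Content relevance (diff). The title's terms are split into query terms and non-query terms. The query-term score is the product of a prominence score and a proximity score. The non-query score is the average semantic relationship between each non-query title term and each query term. Content relevance is the sum of the two.

The exact prominence, proximity, and LSA-based definitions are given in Web Appendix A and are not reproduced here. The sketch below mirrors only the two-part structure. The `prominence` and `proximity` functions are HYPOTHETICAL placeholders, and the similarity function stands in for the LSA-derived term relationships. No stop-word removal is applied.

```python
from __future__ import annotations

import re
from itertools import product
from statistics import mean
from typing import Callable


def terms(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def prominence(title: list[str], query: set[str]) -> float:
    # HYPOTHETICAL placeholder: share of title terms that are query terms.
    return sum(t in query for t in title) / len(title)


def proximity(title: list[str], query: set[str]) -> float:
    # HYPOTHETICAL placeholder: longest run of consecutive query terms in the
    # title, relative to the number of distinct query terms (capped at 1).
    best = run = 0
    for t in title:
        run = run + 1 if t in query else 0
        best = max(best, run)
    return min(best / len(query), 1.0)


def content_relevance(query: str, title: str,
                      sim: Callable[[str, str], float]) -> float:
    """Query-term score (prominence x proximity) plus non-query-term score
    (average similarity of each non-query title term to each query term)."""
    q_terms = terms(query)
    q_set = set(q_terms)
    t_terms = terms(title)
    if not q_set or not t_terms:
        return 0.0
    query_score = prominence(t_terms, q_set) * proximity(t_terms, q_set)
    non_query = [t for t in t_terms if t not in q_set]
    non_query_score = (mean(sim(a, b) for a, b in product(non_query, q_terms))
                       if non_query else 0.0)
    return query_score + non_query_score


# HYPOTHETICAL term similarities standing in for LSA-derived relationships.
SIM = {("kitchen", "food"): 0.6, ("appliances", "processors"): 0.6}
score = content_relevance("food processors",
                          "Food Processors | Shop Kitchen Appliances",
                          lambda a, b: SIM.get((a, b), 0.0))
print(score)
```

Because the non-query part rewards semantically related title terms, a title can score above its pure lexical overlap. That extra credit is the intended advantage over edit-distance measures.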

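Table 4a
Descriptive statistics and correlations.

| # | Variable name | Mean | Std dev | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Organic clicks | 16.849 | 282.789 | 1 | | | | | | | | | |
| 2 | Rank | 8.324 | 7.464 | −.013 | 1 | | | | | | | | |
| 3 | Online authority diff | −.131 | .786 | .153 | −.216 | 1 | | | | | | | |
| 4 | Content relevance diff | .111 | .294 | .250 | −.187 | −.307 | 1 | | | | | | |
| 5 | Search query specificity | 3.196 | 1.131 | .206 | −.263 | −.063 | .378 | 1 | | | | | |
| 6 | Search query competition | .603 | .826 | −.001 | .114 | .136 | −.248 | −.271 | 1 | | | | |
| 7 | Informational | .076 | N/A | .029 | .013 | −.112 | .248 | .174 | −.173 | 1 | | | |
| 8 | Transactional | .889 | N/A | −.078 | .042 | .067 | −.248 | −.240 | .187 | −.813 | 1 | | |
| 9 | Navigational | .035 | N/A | .092 | −.091 | .062 | .067 | .159 | −.070 | −.054 | −.537 | 1 | |
| 10 | SSA | .394 | N/A | .132 | −.026 | .036 | −.233 | −.280 | .449 | −.218 | .255 | −.123 | 1 |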

Table 4b
Descriptive statistics by retailer.

| # | Variable name | Online retailer: Mean | Online retailer: Std dev | Culinary school: Mean | Culinary school: Std dev | Urgent care healthcare provider: Mean | Urgent care healthcare provider: Std dev |
|---|---|---|---|---|---|---|---|
| 1 | Organic clicks | 4.368 | 17.923 | 14.987 | 57.578 | 61.287 | 653.334 |
| 2 | Rank | 8.852 | 7.420 | 9.549 | 7.197 | 5.296 | 7.104 |
| 3 | Online authority diff | −.217 | .618 | .623 | .972 | −.585 | .544 |
| 4 | Content relevance diff | .064 | .233 | −.075 | .196 | .455 | .288 |
| 5 | Search query specificity | 2.765 | .647 | 3.199 | .768 | 4.661 | 1.458 |
| 6 | Search query competition | .762 | .769 | .627 | 1.110 | .038 | .205 |
| 7 | Informational | .004 | N/A | .154 | N/A | .257 | N/A |
| 8 | Transactional | .990 | N/A | .770 | N/A | .634 | N/A |
| 9 | Navigational | .006 | N/A | .076 | N/A | .109 | N/A |
| 10 | SSA | .557 | N/A | .215 | N/A | .018 | N/A |
| | No. of observations | 1,129 | | 331 | | 331 | |
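The pooled means in Table 4a are observation-weighted averages of the retailer means in Table 4b, with weights of 1,129, 331, and 331 queries. The short check below uses values copied from Table 4b. Differences in the third decimal reflect rounding of the published retailer means. For SSA, the pooled share times 1,791 recovers the 706 queries with a sponsored ad reported under SSA in the Empirical Application.

```python
# Observation counts and retailer-level means copied from Table 4b
# (online retailer, culinary school, urgent care healthcare provider).
N = [1129, 331, 331]
RETAILER_MEANS = {
    "Organic clicks": [4.368, 14.987, 61.287],
    "Rank": [8.852, 9.549, 5.296],
    "SSA": [0.557, 0.215, 0.018],
}


def pooled_mean(means, counts):
    """Observation-weighted average of group means."""
    return sum(m * n for m, n in zip(means, counts)) / sum(counts)


for name, means in RETAILER_MEANS.items():
    print(name, round(pooled_mean(means, N), 3))
```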

Here we take the online authority of the focal website i for search query k and subtract the average online authority of all other websites that appear on the first three pages of the SERP for that query.

Content relevance (diff)
The content relevance of a webpage to a search query is the degree of similarity between the content of the webpage and the search query, or the degree of relevance of the webpage content to the search query. There are a number of textual relevance measures, such as the general edit distance or the Levenshtein edit distance (Levenshtein 1966), which measure the textual similarity of two phrases. However, these measures are not the most appropriate for measuring content relevance in the context of search engines. They do not consider the semantic relationship between phrases, and semantics play a very important role in SEM (Mavridis and Symeonidis 2015). Other researchers in the field of SEM have incorporated semantics in their ranking measures by using techniques such as Latent Semantic Analysis (Luh, Yang, and Huang 2016) and Latent Dirichlet Allocation (Mavridis and Symeonidis 2015; Liu and Toubia 2015). We calculate content relevance as the lexical as well as semantic similarity between the title of the webpage and the search query.

We use the webpage title for measuring content relevance for several reasons. First, most SEO practitioners recommend that the title should convey the most important and accurate message of the webpage and should contain the most important search queries contained in the page. Thus, firms are expected to write a title which most accurately represents the content of the webpage. Second, the searcher only sees the website title and a snippet of text from the website before making the click decision. So, the searcher’s click behavior is not a function of the actual content of the website. Rather, it is a function of the searcher’s expectations of that content, based on the website title and snippet. The website title is the more prominent part of the search result shown to the searcher on a SERP. We therefore expect that the title is primarily what the searcher uses to decide whether the link is relevant.

To calculate content relevance between a search query and the title of any document, we adapt the method used by Luh, Yang, and Huang (2016). Fig. 4 presents the entire process we follow for calculating the content relevance of the webpage to the search query. A more detailed description of this process is provided in Web Appendix A.

Fig. 4. Calculating search query relevance.

The method for measuring content relevance involves partitioning the terms in the title of the webpage into two groups:
• query terms, which are terms in the title that are present in the search query;
• non-query terms, which are terms in the title that are not present in the query.
The relevance scores for the query and non-query terms are calculated separately, and the overall content relevance is the sum of the non-query score and the query score.

The query score is calculated in three steps:
1. We calculate the prominence score, which measures how prominent the query terms are in the title.
2. We calculate the proximity score, which measures the degree to which all terms of the query or subquery occur together in the title.
3. We calculate the overall query score as the product of the prominence and proximity scores.

The second part of the relevance score, the score for the non-query terms, is based on the semantic relationship between the non-query and query terms. This relationship is derived using Latent Semantic Analysis (LSA), a statistical technique that extracts and infers relations of contextual usage of words in documents by using singular value decomposition. The overall non-query score between the title and a search query is the average semantic relationship score between each non-query term and each term in the search query (described in detail in Web Appendix A). This two-part approach incorporates the semantic similarity of texts in addition to their textual similarity. It therefore provides a more comprehensive measure than other typical methods, such as Levenshtein distance or general edit distance.

Similar to online authority, we operationalize content relevance as a relative measure; that is, we compare the content relevance of the focal website against the average content relevance of the competitor websites for a given search query. Thus, we use the following formula to calculate content relevance:

$$
\text{Content Relevance Diff}_{ik} = \text{Content Relevance}_{ik} - \overline{\text{Content Relevance}}_{k}
\tag{4}
$$

Here we take the content relevance of the focal website i for search query k and subtract the average content relevance of all other websites that appear on the first three pages of the SERP for that query.

Search query specificity
Search query specificity is the specificity or broadness level of the search query. We use the number of terms in the search query as a measure of specificity, as a longer search query is typically more specific (Ghose and Yang 2009; Rutz and Bucklin 2011; White, Dumais, and Teevan 2009). However, before calculating the number of terms in the search query, we took a commonly used stop word list (Page Analyzer English Stop Words List)


and removed any words in this list from our search queries before computing search query specificity.

Search query competition
Search query competition represents the level of competition in the search query market. We measure it using the cost per click (CPC), or bidding price, for each search query. We operationalize CPC as the average price in dollars that a website must pay for each click obtained from the sponsored links on the SERP for the top three positions in the sponsored search results for that search query.

Search type
The variable search type classifies search queries based on the intent of the user; that is, we classify search queries as informational, transactional, or navigational. The categorization approach is adapted from the manual classification of queries in previous research by Jansen, Booth, and Spink (2008) and Nabout and Skiera (2012). We use a three-step approach to classify the search queries into search types:
1. All search queries with website/brand names are classified as navigational. When a user includes a particular brand or website name in his search, he has likely decided where to make the purchase. His search intent in that case would be to navigate to a website which lets him buy the product from his chosen brand or website.
2. Out of the search queries not classified as navigational, those which mention the product, a service in a geography, or use words related to purchases such as “buy”, “cheap”, and so forth are classified as transactional. By including such words in her search, the user indicates an intent to make a transaction. Also, as we have excluded search queries which include brand names, a user conducting these search queries is likely still in the consideration stage of the buying process.
3. With the characteristics of navigational and transactional queries clearly defined, informational queries become the catchall by default. Thus, search queries which are classified as neither navigational nor transactional are classified as informational.

As the data used for our analysis is from three retailers, a majority of the search queries are classified as transactional (1,393 search queries). A much smaller number are classified as navigational (236 search queries) or informational (162 search queries).

SSA
The variable SSA is a dummy variable depicting whether the focal website appeared in the sponsored part of the SERP of a search query. We see that out of 1,791 search queries, the focal website was ranked in the sponsored results for 706 search queries.

Results

We provide the results for the joint estimation of the Rank and Organic Clicks models in Table 5. The table provides coefficient estimates and standard errors for the two models. Overall, we found that the majority of the variables in both models were statistically significant, suggesting a good model fit.

Rank Model

Search characteristics and user intent
We see that, relative to navigational search queries, the direct effects of informational and transactional search queries are positive and significant (β = 3.645; p < .01 and β = 2.095; p < .05). Because a larger rank number means a lower position on the SERP, this suggests that the average rankings of the focal websites are lower for informational and transactional search queries than for navigational search queries. We do see, though, that two kinds of queries lead to higher rankings on the SERP. The first are search queries with a higher degree of specificity (β = −1.398; p < .01). The second are queries for which the website also has a sponsored search link in addition to the organic link (β = −1.897; p < .01).

Website/retailer characteristics
We see that the direct effect of online authority is not significant (β = −.304; p > .10) when it comes to getting ranked on the SERP. However, the direct effect of content relevance is negative and highly significant, leading to a significantly better ranking on the SERP (β = −4.118; p < .01).

Interaction effects
We see that several of the interaction effects are significant. The interaction between content relevance and informational search queries is not significant (β = 3.095; p > .10). The interaction between online authority and informational search queries, however, is negative and significant (β = −2.613; p < .05). This suggests that websites with higher online authority will get better rankings when searchers are seeking to obtain more information (relative to navigational or transactional queries).

We also see that the interaction between content relevance and transactional search queries is positive and significant (β = 2.475; p < .01). The interaction between online authority and transactional search queries is negative and significant (β = −3.227; p < .01). This suggests that websites with more relevant content (higher online authority) will receive lower (higher) rankings when the search queries are transactional in nature (relative to navigational or informational queries).

Instrumental variable
We see that the instrumental variable, search query competition, is positive and highly significant (β = 6.055; p < .01). This suggests that the higher the competition for a search query, the lower the rank. This result, together with the instrument's low correlation with the main dependent variable (organic clicks), provides further evidence that search query competition is a valid instrument for the Rank model.

Organic Clicks Model

Rank
We see that the rank of the website on the SERP has a negative and significant impact on the number of organic clicks


(β = −.001; p < .01). This suggests that as the rank number increases, that is, as the website appears lower on the SERP, it receives fewer organic clicks.

Search characteristics and user intent
We see that, relative to navigational search queries, the main effects of informational and transactional search queries on organic clicks are negative and significant (β = −.051; p < .01 and β = −.003; p < .01). This suggests that the number of organic clicks is lower for informational and transactional search queries relative to navigational search queries. We also see that search query specificity (β = .291; p < .01) and SSA (β = .261; p < .01) are both positive and significant. This suggests that the website receives more organic clicks on average when search queries are more specific and when the website appears in the sponsored search results.

Website/retailer characteristics
We see that the direct effects of online authority and content relevance are both positive and significant (β = .651; p < .01; β = 2.622; p < .01). As expected, this suggests that websites which have higher online authority and are more relevant to the search query receive a higher number of organic clicks.

We see that the interaction between content relevance and transactional search queries is positive and significant (β = 1.307; p < .01). The interaction between online authority and transactional search queries is negative and significant (β = −1.448; p < .01). This suggests that websites with higher content relevance (online authority) will receive more (fewer) organic clicks when search queries are transactional in nature (relative to navigational or informational queries).

Table 5
Estimation results.

| Variables | Rank(ik) model coeff. (s.e.) | ln(organic clicks(ik)) model coeff. (s.e.) |
|---|---|---|
| Intercept | 10.794*** (1.139) | −.148 (.196) |
| Rank(ik) | – | −.001*** (.0002) |
| Search characteristics and user intent | | |
| Informational(k) | 3.645*** (1.147) | −.051*** (.016) |
| Transactional(k) | 2.095** (.935) | −.003*** (.001) |
| Search query specificity(k) | −1.398*** (.149) | .291*** (.023) |
| SSA(k) | −1.897*** (.307) | .261*** (.047) |
| Website/retailer characteristics | | |
| Online authority(ik) | −.304 (.906) | .651*** (.124) |
| Relevance(ik) | −4.118*** (1.475) | 2.622*** (.344) |
| Interaction effects | | |
| Relevance(ik) × informational(k) | 3.095 (2.870) | −2.448*** (.397) |
| Authority(ik) × informational(k) | −2.613** (1.033) | .366** (.146) |
| Relevance(ik) × transactional(k) | 2.475*** (.206) | 1.307*** (.339) |
| Authority(ik) × transactional(k) | −3.227*** (.935) | −1.448*** (.135) |
| Instrumental variable | | |
| Search query competition(k) | 6.055*** (.910) | – |
| Model fit | | |
| Log-likelihood (joint) | −3,716.473 | |
| No. of observations | 1,791 | |

* 0.05 < p < 0.1; ** 0.01 < p < 0.05; *** p < 0.01.

Discussion

To provide some additional insights from our findings, we used the estimates of our models to look at the impact of the main and interaction effects on expected organic clicks. In this case, we compare the expected number of organic clicks when we increase or decrease some of the key variables in the models by plus or minus one standard deviation.

Main Effects

To evaluate the main effects of rank, online authority and relevance, we computed the expected value of organic clicks
Interaction effects considering the results from the rank and organic clicks models.
We see that all of the interaction effects are significant. As expected, we found that websites which were ranked higher,
We see the interaction between content relevance and informa- had a higher online authority, and had a higher relevance to the
tional search queries is negative and significant (β = −2.448; search query led to significant higher expected clicks. Specif-
p < .01), the interaction between online authority and infor- ically, we find when rank is high (low), the expected number
mational search queries is positive and significant (β = .366; of organic clicks is 37.97 (4.36), when online authority is high
p < .05). This suggests that websites with higher content rele- (low), the expected number of organic clicks is 44.93 (8.67), and
vance (online authority) will get fewer (more) organic clicks when the relevance between the search query and the website
when search queries are informational in nature (relative to is high (low), the expected number of organic clicks is 47.99
navigational or transactional). (5.25).
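To make the interaction coefficients concrete, the following minimal sketch combines main and interaction terms from the ln(organic clicks) column of Table 5 into intent-specific slopes. It is an illustration under stated assumptions, not the authors' expected-clicks computation: it holds rank fixed (ignoring the indirect path through the Rank model), assumes a homoskedastic error so that the error term cancels in a high-versus-low ratio, and uses a hypothetical standard deviation (`HYPOTHETICAL_SD`) because the variables' standard deviations are not reported in this section. All function names are hypothetical.

```python
import math

# Coefficients of the ln(organic clicks) equation, copied from Table 5.
# Navigational queries are the omitted (baseline) intent category.
LN_CLICKS_COEF = {
    "rank": -0.001,
    "relevance": 2.622,
    "authority": 0.651,
    "relevance_x_informational": -2.448,
    "authority_x_informational": 0.366,
    "relevance_x_transactional": 1.307,
    "authority_x_transactional": -1.448,
}

INTENTS = ("navigational", "informational", "transactional")


def direct_slope(variable: str, intent: str) -> float:
    """Slope of ln(organic clicks) w.r.t. `variable` for one search intent,
    holding rank fixed: main effect plus the matching interaction term
    (no interaction term for the navigational baseline)."""
    if intent not in INTENTS:
        raise ValueError(f"unknown intent: {intent}")
    slope = LN_CLICKS_COEF[variable]
    if intent != "navigational":
        slope += LN_CLICKS_COEF[f"{variable}_x_{intent}"]
    return slope


def high_low_click_ratio(variable: str, intent: str, sd: float) -> float:
    """Predicted clicks at +1 SD divided by predicted clicks at -1 SD
    (a change of 2 * sd), using only the direct log-linear path.
    Assumes a homoskedastic error, so E[exp(error)] cancels in the ratio."""
    return math.exp(direct_slope(variable, intent) * 2 * sd)


def pct_change_per_rank_position() -> float:
    """Percent change in organic clicks from moving one position down."""
    return 100 * (math.exp(LN_CLICKS_COEF["rank"]) - 1)


if __name__ == "__main__":
    HYPOTHETICAL_SD = 0.2  # illustrative only; not reported in this section
    for var in ("relevance", "authority"):
        for intent in INTENTS:
            print(f"{var:9s} {intent:13s} "
                  f"slope={direct_slope(var, intent):+.3f} "
                  f"ratio={high_low_click_ratio(var, intent, HYPOTHETICAL_SD):.2f}")
    print(f"one position lower: {pct_change_per_rank_position():.3f}% clicks")
```

For example, the direct relevance slope is 3.929 for transactional queries but only .174 for informational ones, mirroring the signs of the interaction terms. The direct authority slope turns negative (−.797) for transactional queries, so a full expected-click comparison also depends on the Rank model's coefficients, which this sketch leaves out.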


Fig. 5. Moderating effects of informational search on relevance (A) and authority (B).

Interaction Effects

We also wanted to understand how search type moderates the link between online authority, relevance, and the expected number of organic clicks. To evaluate these interaction effects, we computed the expected number of clicks from the results of the rank and organic clicks models. We provide the results of this analysis in Figs. 5 and 6.

We see in Fig. 5A that when the search query type is informational (vs. navigational or transactional), the expected number of organic clicks is not statistically different regardless of whether content relevance is high or low (9.521 vs. 8.574; p > .10). Further, we see in Fig. 5B that when the search query type is informational (vs. navigational or transactional), the expected number of organic clicks is significantly higher when the website has higher online authority (44.158 vs. 1.848; p < .01). This suggests that when searchers are looking for informational content about products in general, that is, at the top of the funnel of the customer journey, they are more likely to click on websites which have the highest online authority, even if the content is not as relevant to the exact search query.

We see in Fig. 6A that when the search query type is transactional (vs. navigational or informational), the expected number of organic clicks is significantly higher when the website has higher content relevance (52.738 vs. 5.228; p < .01). Further, we see in Fig. 6B that when the search query type is transactional (vs. navigational or informational), the expected number of organic clicks is not statistically different whether the website has higher or lower online authority (11.132 vs. 10.135; p > .10). This suggests that when searchers are looking for transactional content about products they want to find and purchase, that is, at the middle-to-bottom of the funnel of the customer journey, they are more likely to click on websites which have the highest content relevance, even if the content is not from the website with the highest online authority.

Fig. 6. Moderating effects of transactional search on relevance (A) and authority (B).

Taken together, the results in Figs. 5 and 6 suggest that as customers move along the customer journey from the top to the bottom of the purchase funnel, their decisions to click on websites vary with the degree to which a website is higher or lower on online authority and content relevance. Specifically, searchers who are earlier in the search process are more likely to value online authority to get general information about retailers, brands, and/or products. But as the search process progresses, searchers are more likely to value finding exactly the retailer, brand, and/or product that best fits their needs (i.e., higher content relevance).

Implications to Marketing Theory and Practice

Keyword research forms an integral part of both SSA and SEO. For SSA, firms select keywords for auction bids, whereas in SEO, website content is built around the selected keywords to target a certain set of search queries. Though significant research has been done on keyword research in SSA, such literature in SEO is scarce. This has led many SEO strategies to rely on sets of common heuristics to select keywords. This can be primarily attributed to the fact that data on organic clicks are not readily available, and some of the metrics used in SEO keyword research, such as content relevance or search intent, are complicated and not easily measurable.

Thus, having obtained a dataset on organic clicks and certain website characteristics from three different online firms, we use the Latent Semantic Analysis (LSA) text mining approach to measure the content relevance of a webpage specific to any search query. This approach is a more comprehensive way to measure content relevance than other common measures such as Levenshtein distance or general edit distance, as it incorporates the semantic similarity of texts in addition to the textual similarity measured by those existing methods. Further, we build a modeling framework to study the effect of the measured search and website characteristics on rank and organic clicks to understand how SEO strategies can vary with search queries. Specifically, we study how key search characteristics (search query popularity, search query competition, search query specificity, and search intent) as well as two key retailer/website characteristics (content relevance and online authority) affect the organic clicks a website receives for a search query. The expected organic clicks can be used as a measure of the effectiveness of an SEO campaign, as they represent the number of users whom the SEO campaign attracts to the website. The findings from our model have implications for both marketing researchers and practitioners in the field of SEO.

Implications to Theory

This paper enhances the research available in the field of keyword research for SEO as it investigates the relationships between search and website characteristics and the expected rank and organic clicks. The modeling framework studies two important parts of the organic click generation process. It studies how the type of selected keyword for a given webpage influences


organic rank and then further studies how the rank a website receives translates into organic clicks. It provides researchers with a broad outlook on how these relationships work and how understanding them can help in selecting appropriate keywords for content creation and optimization. Additionally, our framework provides a more comprehensive approach to measuring important SEO metrics such as content relevance and search intent.

We show that the type of keyword the firm selects to write content about, in order to target certain search queries, not only influences the rank directly but also influences the ranking process which search engines use to order organic links. Results from our model show that online authority (content relevance) is more important for getting a higher rank for informational (transactional) search queries. This is likely because users conducting informational search queries have a lower awareness of relevance than users conducting navigational or transactional search queries (Rutz and Bucklin 2011). Users who conduct a generic informational search are at the early stage of awareness or information gathering and are less involved in the search process (Jerath, Ma, and Park 2014; White, Dumais, and Teevan 2009). Thus, they might not be aware of the brand or product, or when they are, they might not be aware that the brand or product is relevant for the search. As the main goal of such users is to gather information, they would prefer to gain that information from a well-known website with higher authority in that area. However, when they move to a more advanced stage of the online purchase process and conduct transactional or navigational search queries, they tend to click on results much lower in the sponsored list, spending more effort in finding relevant links (Agarwal, Hosanagar, and Smith 2011), as their awareness of relevance has increased (Rutz and Bucklin 2011). Thus, when making the click decision, content relevance is more important to users conducting transactional or navigational search queries, whereas online authority is more important to users conducting informational search queries.

Implications to SEO Practice

We study how the effectiveness of SEO strategies involving improvement of online authority and content relevance can vary across search intents. While improving online authority is akin to brand-building and could be viewed as a long-term investment, increasing content relevance to selected search queries can improve rank and organic clicks immediately. The managerial importance of the paper stems from the fact that keyword research is one of the primary methods SEO marketers use to enhance search traffic. SEO practitioners can use the modeling framework in this paper to understand how the characteristics of the selected search queries will affect the expected clicks the firm gets. As attracting visitors to the website is the first step toward getting more customers, this can serve as a measure of the effectiveness of an SEO campaign. Thus, the model provides guidance for identifying the search characteristics that would maximize the expected number of organic clicks to a firm's website and for selecting appropriate keywords to target search queries accordingly.

The paper sheds light on the moderating influence of search intent on the effectiveness of writing relevant content as well as the effectiveness of improving online authority. We find that improving content relevance is more effective in getting more clicks from transactional search queries, whereas improving online authority is more effective in getting clicks from informational search queries. As content optimization (on-page SEO) is a more short-run strategy than improving online authority (off-page SEO), this finding suggests that a website with lower online authority would be better off writing content about transactional (vs. informational or navigational) search queries in order to increase its expected organic clicks. This finding points out the importance of considering the intent behind search queries when optimizing website content.

Limitations and Future Research

As with any empirical analysis, there are several limitations to our study. Our analysis is based on data from a single snapshot of search and click behavior, where we focus on explaining variation across search queries, websites, and domains. It would be useful for future studies to examine these relationships using data across multiple time periods to see whether changes in firm strategies to target different types of search queries led to differences in the impact of content relevance on organic clicks, even though website content changes are usually infrequent. Also, as we use data for only a one-month period, the results may not generalize to longer timelines or larger datasets. Future research may want to study whether these results hold for different types of industries and over a longer timeline.

Further, our analysis is based on data from three retailers (an online retailer, a culinary school, and an urgent health care provider), and while we included retailer fixed effects to capture differences in the level of clicks for each retailer, it is also important to understand whether our findings for the other main effects varied across these retailers. We did estimate the same model by interacting the retailer fixed effects with each of the search and website characteristics. We did not find any significant interactions. This suggests that, in our sample, our main findings held for each retailer. However, future research could investigate whether the impact of search or website characteristics on organic clicks varies across different retailers.

As Google's search results grow increasingly personalized, SEO has become a challenge, as location, previous searches, and browser history now affect the results that users get. Our model does not consider these localized and personalized factors which affect search rankings. However, this should not affect the main findings of this paper, as our model considers certain fundamental search query and website factors whose importance is unlikely to vary much. Future research can incorporate user demographic factors in order to control for localization and personalization of search results.

Another limitation of our research is that we do not study the financial impact of SEO strategies and only look at their impact on organic clicks. As the purchase probability of the user varies, a large number of organic clicks does not always mean financial benefit for the firm, as the user who visits the website may not


end up making a financial transaction. As the conversion rate or the financial benefit from each click is unknown, we were unable to incorporate this in our model. However, as visiting a website is the first step toward making a purchase, the expected clicks can be used as a measure of the effectiveness of the SEO campaign. Future research could extend our findings by estimating the financial impact of keyword selection strategies to target search queries.

Funding

Financial support was provided through a grant from MSI and its Young Scholars program.

Appendix A. Supplementary data

Supplementary material related to this article can be found, in the online version, at https://doi.org/10.1016/j.jretai.2020.12.002.

References

Abou Nabout, Nadia (2015), “A Novel Approach for Bidding on Keywords in Newly Set-Up Search Advertising Campaigns,” European Journal of Marketing, 49 (5/6), 668–91.
Abratt, Russell (1993), “Market Segmentation Practices of Industrial Marketers,” Industrial Marketing Management, 22 (2), 79–84.
Agarwal, Ashish, Kartik Hosanagar and Michael D. Smith (2011), “Location, Location, Location: An Analysis of Profitability of Position in Online Advertising Markets,” Journal of Marketing Research, 48 (6), 1057–73.
Baye, Michael R, Babur De los Santos and Matthijs R. Wildenbeest (2016), “Search Engine Optimization: What Drives Organic Traffic to Retail Sites?,” Journal of Economics & Management Strategy, 25 (1), 6–31.
Berman, Ron and Zsolt Katona (2013), “The Role of Search Engine Optimization in Search Marketing,” Marketing Science, 32 (4), 644–51.
Broder, Andrei (2002), A Taxonomy of Web Search, ACM.
Brynjolfsson, Erik, Yu Hu and Michael D. Smith (2003), “Consumer surplus in the digital economy: Estimating the value of increased product variety at online booksellers,” Management Science, 49 (11), 1580–96.
Brynjolfsson, Erik, Yu Jeffrey Hu and Michael D. Smith (2006), “From niches to riches: Anatomy of the long tail,” Sloan Management Review, 47 (4), 67–71.
Brynjolfsson, Erik, Yu Hu and Duncan Simester (2011), “Goodbye pareto principle, hello long tail: The effect of search costs on the concentration of product sales,” Management Science, 57 (8), 1373–86.
Chen, Jianqing, De Liu and Andrew B. Whinston (2009), “Auctioning keywords in online search,” Journal of Marketing, 73 (4), 125–41.
Dalgic, Tevfik and Maarten Leeuw (1994), “Niche Marketing Revisited: Concept, Applications and Some European Cases,” European Journal of Marketing, 28 (4), 39–55.
De los Santos, Babur and Sergei Koulayev (2013), Optimizing Click-Through in Online Rankings for Partially Anonymous Consumers, Working Paper.
Evans, Michael P (2007), “Analysing Google Rankings Through Search Engine Optimization Data,” Internet Research, 17 (1), 21–37.
Feng, Juan, Hemant K Bhargava and David M. Pennock (2007), “Implementing Sponsored Search in Web Search Engines: Computational Evaluation of Alternative Mechanisms,” INFORMS Journal on Computing, 19 (1), 137–48.
Ghose, Anindya and Sha Yang (2009), “An Empirical Analysis of Search Engine Advertising: Sponsored Search in Electronic Markets,” Management Science, 55 (10), 1605–22.
Heckman, James J (1976), “The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models,” Annals of Economic and Social Measurement, 5 (4), NBER.
Jansen, Bernard J, Danielle L Booth and Amanda Spink (2008), “Determining the Informational, Navigational, and Transactional Intent of Web Queries,” Information Processing & Management, 44 (3), 1251–66.
Jansen, Bernard J and Simone Schuster (2011), “Bidding on the Buying Funnel for Sponsored Search and Keyword Advertising,” Journal of Electronic Commerce Research, 12 (1), 1–18.
Jerath, Kinshuk, Liye Ma and Young-Hoon Park (2014), “Consumer Click Behavior at a Search Engine: The Role of Keyword Popularity,” Journal of Marketing Research, 51 (4), 480–6.
Kang, In-Ho and Gil Chang Kim (2003), “Query type classification for web document retrieval,” Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval.
Katona, Zsolt and Miklos Sarvary (2010), “The Race for Sponsored Links: Bidding Patterns for Search Advertising,” Marketing Science, 29 (2), 199–215.
Kleinberg, Jon M (1999), “Authoritative Sources in a Hyperlinked Environment,” Journal of the ACM (JACM), 46 (5), 604–32.
Kotler, P. (2003), Marketing Management, 11th ed. Upper Saddle River, NJ: Prentice-Hall.
Kritzinger, Wouter T and Melius Weideman (2015), “Comparative Case Study on Website Traffic Generated by Search Engine Optimisation and a Pay-Per-Click Campaign, versus Marketing Expenditure,” South African Journal of Information Management, 17 (1), 1–12.
Levenshtein, Vladimir I (1966), “Binary Codes Capable of Correcting Deletions, Insertions, and Reversals,” Soviet Physics Doklady, 10 (8), 707–10.
Li, Hongshuang, PK Kannan, Siva Viswanathan and Abhishek Pani (2016), “Attribution Strategies and Return on Keyword Investment in Paid Search Advertising,” Marketing Science, 35 (6), 831–48.
Liu, Jia and Olivier Toubia (2015), A framework for Modeling How Consumers form Online Search Queries, Working Paper.
Luh, Cheng-Jye, Sheng-An Yang and Ting-Li Dean Huang (2016), “Estimating Google’s Search Engine Ranking Function from a Search Engine Optimization Perspective,” Online Information Review, 40 (2), 239–55.
Mavridis, Themistoklis and Andreas L Symeonidis (2015), “Identifying Valid Search Engine Ranking Factors in a Web 2.0 and Web 3.0 Context for Building Efficient SEO Mechanisms,” Engineering Applications of Artificial Intelligence, 41, 75–91.
Moe, Wendy W (2003), “Buying, Searching, or Browsing: Differentiating Between Online Shoppers using In-Store Navigational Clickstream,” Journal of Consumer Psychology, 13 (1–2), 29–39.
Nabout, Nadia Abou and Bernd Skiera (2012), “Return on Quality Improvements in Search Engine Marketing,” Journal of Interactive Marketing, 26 (3), 141–54.
Page, Lawrence and Sergey Brin (1998), “Pagerank, An Eigenvector Based Ranking Approach for Hypertext,” 21st Annual ACM/SIGIR International Conference on Research and Development in Information Retrieval.
Purcell, Kristen, Joanna Brenner and Lee Rainie (2012), Search Engine Use 2012, PEW Research Center.
Roodman, David (2017), CMP: Stata Module to Implement Conditional (Recursive) Mixed Process Estimator, Statistical Software Components.
Rose, Daniel E and Danny Levinson (2004), “Understanding User Goals in Web Search,” Proceedings of the 13th International Conference on World Wide Web: ACM.
Rutz, Oliver J and Randolph E Bucklin (2011), “From Generic to Branded: A Model of Spillover in Paid Search Advertising,” Journal of Marketing Research, 48 (1), 87–102.
Rutz, Oliver J and Randolph E Bucklin (2007), A Model of Individual Keyword Performance in Paid Search Advertising.
Rutz, Oliver J, Randolph E Bucklin and Garrett P Sonnier (2012), “A Latent Instrumental Variables Approach to Modeling Keyword Conversion in Paid Search Advertising,” Journal of Marketing Research, 49 (3), 306–19.
Scaperlanda, Anthony E and Laurence J Mauer (1969), “The Determinants of US Direct Investment in the EEC,” The American Economic Review, 59 (4), 558–68.
Schultz, Carsten D (2016), “Search Triangle: Integrating Search Intention in Search Engine Advertising,” in 2016 Global Marketing Conference at Hong Kong, 1669–84.


Schultz, Carsten D (2019), “Informational, Transactional, and Navigational Need of Information: Relevance of Search Intention in Search Engine Advertising,” Information Retrieval Journal, 1–19.
Shani, David and Sujana Chalasani (1992), “Exploiting Niches Using Relationship Marketing,” Journal of Consumer Marketing, 9 (3), 33–42.
Shi, Savannah Wei and Michael Trusov (2013), The Path to Click: Are You on It, Working Paper.
Silverman, D. (2010), IAB Internet Advertising Revenue Report, New York: Interactive Advertising Bureau, 26.
Skiera, Bernd and Nadia Abou Nabout (2013), “Practice Prize Paper—Prosad: A Bidding Decision Support System for Profit Optimizing Search Engine Advertising,” Marketing Science, 32 (2), 213–20.
Skiera, Bernd, Jochen Eckert and Oliver Hinz (2010), “An Analysis of the Importance of the Long Tail in Search Engine Marketing,” Electronic Commerce Research and Applications, 9 (6), 488–94.
Taylor, Greg (2013), “Search quality and revenue cannibalization by competing search engines,” Journal of Economics & Management Strategy, 22 (3), 445–67.
Tobin, James (1958), “Estimation of Relationships for Limited Dependent Variables,” Econometrica: Journal of the Econometric Society, 24–36.
Toften, Kjell and Trond Hammervoll (2010), “Niche Marketing and Strategic Capabilities: An Exploratory Study of Specialised Firms,” Marketing Intelligence & Planning, 28 (6), 736–53.
White, Ryen W, Susan T Dumais and Jaime Teevan (2009), “Characterizing the Influence of Domain Expertise on Web Search Behavior,” Proceedings of the Second ACM International Conference on Web Search and Data Mining: ACM.
White, Ryen W and Dan Morris (2007), “Investigating the Querying and Browsing Behavior of Advanced Search Engine Users,” Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval: ACM.
White, Alexander (2013), “Search engines: Left side quality versus right side profits,” International Journal of Industrial Organization, 31 (6), 690–701.
Wiesel, Thorsten, Koen Pauwels and Joep Arts (2011), “Practice Prize Paper—Marketing’s Profit Impact: Quantifying Online and Off-line Funnel Progression,” Marketing Science, 30 (4), 604–11.
Yang, Sha and Anindya Ghose (2010), “Analyzing the Relationship Between Organic and Sponsored Search Advertising: Positive, Negative, or Zero Interdependence?,” Marketing Science, 29 (4), 602–23.
Yao, Song and Carl F Mela (2011), “A dynamic model of sponsored search advertising,” Marketing Science, 30 (3), 447–68.
