February 2020
Abstract
Vertical integration is central to understanding patterns of economic
activity, but there has been limited empirical work measuring the extent
to which firms own and utilize direct upstream and downstream production
links for sourcing physical inputs. We use administrative data from
Karnataka, India, on the universe of goods shipments between any two
establishments to answer this question. Uniquely, we can identify whether two
establishments are under joint ownership, allowing us to map the flow of
goods both within and across firms. We find that 11% of input value can
be sourced from vertically integrated upstream establishments. Of this
potential 11%, between 30 and 40% of trade actually materializes. This
suggests that the supply of physical goods along the production chain is
an important rationale for vertical integration. Notably, within the set of
vertically integrated firms, firms that source at least one product from
within account for over three-quarters of economic activity. We look at
factors associated with the decision to source a given product from within
and find that firm size, distance to outside and within-firm suppliers,
frequency of input requirement, product relationship specificity, volume,
R&D requirements, and competition both upstream and downstream are
important factors. We also look at factors associated with the ownership
of a vertically integrated establishment and find that firm size, product
specificity, R&D requirements, and competition matter. Finally, we
estimate an establishment-level gravity specification to assess the importance
of integration for trade. While both distance and state borders are
important, integration emerges as by far the strongest driver of sourcing
decisions.
1 Introduction
Vertically integrated firms play a large and important role in the economy. There
exists a rich literature focused on integration decisions and their consequences,
but we know very little empirically about the nature of these vertical relationships.
Theory puts forward many possible rationales for the existence of these
relationships, such as mitigating contracting frictions (Coase, 1937; Williamson,
1971), economies of scale and scope (Stigler, 1951; Novak and Stern, 2009), or
strategic motives related to consolidating or extending market power (Perry,
1989; Rey and Tirole, 2007; Bresnahan and Levin, 2012). However, uncovering
the relative importance of different theories requires data on intra-firm activity
in vertically integrated firms, which are often unavailable. In this paper, we use
administrative data from Karnataka, India, on the movement of physical goods
both within and outside the firm to measure the extent to which firms own and
utilize direct upstream and downstream production links for sourcing physical
inputs.
Karnataka is one of the largest states in India, with a population of 61 million
people and a GDP greater than $220 billion USD. We observe the universe of
physical shipments above a threshold (∼ $700 USD) between any two
establishments through or within the state. Uniquely, we can identify whether two
establishments are under joint ownership, allowing us to map the flow of goods
both within and across firms. We measure the extent to which direct production
links exist within a firm and find that downstream establishments can potentially
source 11% of their input value from an integrated upstream establishment. We
then measure trade along these production links and find that between 30 and
40% of the potential trade materializes. Further, within the set of vertically
integrated firms, a large number source at least one product from within, and
these firms make up over three-quarters of economic activity. To our knowledge,
this is the first paper to provide empirical evidence on within-firm production
chains using linked establishment-to-establishment transaction data at the
product level. The literature so far has largely been limited to identifying
production linkages using industry-level input-output tables, while our data
allow us to do so at the firm level.
We run a series of robustness checks to verify our benchmark results. The first
measurement challenge is defining buyers and suppliers for a given product in
our data. We identify an establishment as a buyer of an input if its total inward
shipment value of that product exceeds its total outward shipment value
multiplied by some threshold. Similarly, we identify an establishment as a seller of
an input if its total outward shipment value of that product exceeds its total
inward shipment value multiplied by some threshold. Our results are robust to the
choice of threshold. Next, we identify vertically integrated production links by
whether a downstream establishment has a “potential” upstream supplier owned
by the same firm for a particular input. Our results are similar under different
definitions and scale requirements for the existence of a
“potential” supplier owned by the same firm. Our results are also robust to
excluding products that are not an establishment’s “primary input”, suppliers
that do not ship frequently, and suppliers that are not within a buyer’s district.
For our main analysis, we use four-digit product codes. We provide additional
results using different levels of product aggregation. First, we aggregate all
products into 21 broad categories and report the share of within-firm sourcing
for each category. Second, to check that our four-digit product categories are
sufficiently narrow, we repeat our analysis for observations that report an
eight-digit product code. As larger firms are more likely to report eight-digit
product codes, this sample selection is endogenous. Reassuringly, the proportion
of within-firm sourcing remains approximately the same.
After establishing that firms utilize their internal networks, we explore the
decision of whether to use existing within-firm production links. This is an
important margin because many firms face this decision: 60 percent of economic
activity takes place in firms that are vertically integrated in at least one
product. We explore the effect of distance, product specificity, product R&D
requirements, transaction frequency, and upstream and downstream competition
on the decision to source from within. To our knowledge, ours is the first paper
to explore this firm decision.
that are relationship-specific from within. We use the measure of specificity
developed in Rauch (1999) to test this prediction. A product is categorized as
relationship-specific if it is not listed on an exchange. We find that within-firm
sourcing for specific products is 7 to 9 percentage points greater than for
non-specific products. Further, the literature suggests that the contracting parties
may under-invest if they do not internalize the benefit to the other party. This
would drive higher sourcing from within-firm establishments for products where
investment is important. We find that doubling a product's R&D investment
intensity increases within-firm sourcing by 8 percentage points.
In the Lucas (1988) model, managerial talent drives the firm size distribution. If
larger firms have better managerial talent and sourcing from within is a decision
made by management, then larger firms are more likely to source from within.¹
We find that larger firms, in terms of total value, number of establishments, and
number of products, are far more likely to take advantage of their integrated
supply networks.
firms are more likely to export. Similarly, if there are fixed costs of sourcing from within then
it may be more beneficial for larger firms to source from within due to scale economies.
We similarly analyze competition in the downstream industry. Ideally, we would
conduct the analysis at the product level, as in the upstream case: if an
establishment faces higher competition for a product, we would like to know
whether it is more likely to source the inputs for that product from within.
However, while we see inputs at the establishment level, we do not see them at the
product level; i.e., for an establishment that sells multiple products, we see the
total inputs purchased, but not the inputs purchased for each product separately.
We get around this issue by assigning each downstream establishment a market
competition measure equal to the weighted average of the HHI over all its output
products, where we weight each output product's HHI by its output value. We
find that establishments operating in a more competitive environment are more
likely to source inputs from outside the firm. A 0.1 increase in our weighted
HHI measure faced by the downstream firm decreases within-firm sourcing by
3.6 percentage points. This result was not obvious ex ante. If firms operate in
a more competitive industry, then having an upstream supplier provides a
competitive advantage by reducing the incidence of double marginalization, making
it more likely that firms would source from within. On the other hand,
input quality and price matter more in more competitive industries, and firms
are more likely to source from the highest-quality supplier available, which may
be outside the firm.
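The downstream competition measure described above, an output-value-weighted average of product-level HHIs, can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function name and the dictionary inputs are hypothetical, and the product-level HHIs are taken as given (Section 4.6 describes how they are computed).

```python
import math


def downstream_competition(output_values, hhi):
    """Output-value-weighted average HHI across an establishment's output products.

    output_values: hypothetical mapping product -> establishment's outward shipment value
    hhi:           hypothetical mapping product -> HHI of that product's market
    """
    total = sum(output_values.values())
    if total <= 0:
        return math.nan  # no observed output, so the measure is undefined
    # Each output product's HHI is weighted by its share of the establishment's output.
    return sum(value * hhi[product] for product, value in output_values.items()) / total


# Hypothetical establishment selling two products: (0.2 * 300 + 0.6 * 100) / 400
measure = downstream_competition({"A": 300.0, "B": 100.0}, {"A": 0.2, "B": 0.6})
```

Weighting by output value means a product the establishment barely sells has little influence on the competition it is assigned.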
For our main analysis, we focus on the decision of downstream buyers to source
from within the firm. Another way to approach the question is to look at upstream
suppliers that have an integrated downstream buyer and measure the proportion
of their sales that stay within the firm. These two measures would diverge if the
size distributions of upstream and downstream establishments differ. We find
that, when measured from the suppliers' side, our results are approximately the
same.
Most of the discussion so far has focused on the intensive-margin decision of
utilizing an existing link. We also look at what is associated with the existence
of an integrated seller for a given product, i.e., the extensive-margin decision
over vertical ownership. We find that larger firms are more likely to have an
integrated seller. If an input is more relationship-specific, there is a higher
likelihood of having an integrated seller. More competitive product categories are
less likely to have an integrated buyer. Finally, if a product is more
R&D-intensive, it is more likely to have an integrated seller.
over 3000 potential suppliers, and the share of sourcing from any one of them is
small, at 0.02%. For the average firm in our data, a one standard deviation
reduction in distance from the supplying establishment increases the share of
sourcing to 0.026%, removing state border barriers increases the share to 0.07%,
and vertical integration of the supplying establishment increases the share to 2.03%.
While both distance and state borders are important, integration emerges as by
far the strongest driver of sourcing decisions.
Our results pertain to a specific context, and care must be taken in generalizing
them to other settings. We are studying firm behavior in a developing country
where contracting frictions might be especially high. Boehm and Oberfield (2018)
study how differences in contract enforcement across states in India distort
production behavior. They find that in industries that rely more heavily on
relationship-specific intermediate inputs, plants in states with more congested
courts shift their expenditures away from intermediate inputs and appear to be
more vertically integrated. Congestion in courts may also push firms to source
inputs from within rather than relying on outside suppliers. By contrast, Atalay,
Hortaçsu, and Syverson (2014) find that firms in the United States are much less
likely to source inputs from integrated suppliers. Ramondo, Rappoport, and Ruhl
(2016) study the question for MNCs based in the US and find that most MNCs do
not engage in trade with affiliates. Atalay, Hortaçsu, Li, et al. (2019) estimate a
gravity specification similar to ours at the seller-destination level to measure the
importance of distance and the existence of integrated suppliers for within-firm
trade in the United States. They also find that firm boundaries serve as
significant barriers to trade, though the magnitude of our results indicates that
these barriers are more important in our context.² Our different results, in a
developing country context, highlight the importance of taking into account the
legal and contracting environment when studying firm behavior.
The remainder of the paper presents our empirical results in more detail and is
organized as follows. In Section 2, we describe the data. In Sections 3 and 4, we
describe how our variables are defined and constructed. In Section 5, we present
our results on the ownership and utilization of vertically integrated links. In
Section 6, we analyze when firms utilize their vertically integrated networks. In
Section 7, we study the firm integration decision. In Section 8, we quantify the
relative importance of distance, state borders, and vertical integration in
affecting the volume of trade. We conclude in Section 9.
2 Both Atalay, Hortaçsu, and Syverson (2014) and Atalay, Hortaçsu, Li, et al. (2019) use
data from the US Commodity Flow Survey (CFS). Their data contain a sample of 40
randomly selected shipments per quarter for 9000 multi-unit firms and do not allow
them to distinguish whether a shipment is within the firm. To get around this issue,
they combine information on the destination of the shipment with information on the
presence of an integrated establishment at the destination. Unlike the CFS, our data
allow us to precisely classify each shipment as being within or outside the firm.
2 Data
In India, every registered business is required to submit an electronic document
(known as an e-way bill)³ to the government prior to any movement of goods
valued above the threshold of Rs. 50,000 (∼ $700 USD). This includes any
good transported by road, air, railways, or water vessel. If the consignor is a
registered taxpayer, they are responsible for generating an e-way bill. If they
are not registered, then generating the e-way bill becomes the responsibility of
the consignee or the person transporting the goods. Notably, the bill is generated
even if goods are shipped to a different establishment within the firm. The law
was introduced to increase tax compliance and reduce shipping times. Government
officials have the authority to intercept any conveyance to verify the e-way bill or
the e-way bill number for all inter- and intra-state shipments. The penalty for
non-compliance is Rs. 10,000 (∼ $141 USD) or the value of tax evaded, whichever
is greater. In its first phase, the law covered only interstate shipments; in later
phases, it was expanded to include intra-state shipments as well.⁴
We use administrative data on e-way bills from the state of Karnataka. Karnataka
was the first state to roll out e-way bills for intra-state shipments, starting on
April 1, 2018. Our dataset covers the universe of bills from April 1, 2018 to
August 29, 2019.
For each e-way bill, we observe the date of shipment, each firm's tax ID (GSTIN),
the distance, the ZIP code (PIN code) of the sender and the receiver, and the total
value of the shipment.⁵ A given shipment can contain multiple goods. For each
good within a shipment, we observe its HS product code, its total value, and its
quantity. Firms report either 4-digit or 8-digit HS product codes, so for most of
our analysis we define products by their 4-digit code. For robustness, however,
we repeat our analysis on the subset of observations for which we see 8-digit codes.
India starting in 2017. There is a small but growing literature studying the impact of the tax
regime. See for example Agarwal et al. (2019) and Leemput (2020).
4 For more information, refer to https://cleartax.in/s/eway-bill-gst-rules-compliance.
5 We top-code all values at the 99th percentile.
3 Measuring Vertical Integration and Within-Firm Sourcing
This section explains how we use our data to measure vertical integration and
whether an input is sourced from within the firm.
Next, we determine the set of inputs and outputs for each establishment. It
is possible that an establishment both ships in and ships out a given product.
Thus, we define an establishment to be a “net-buyer” of a good if the total
inward shipment value of that product observed in our data exceeds the total
outward shipment value multiplied by some threshold. In our preferred
specification, we use a threshold of 1.2: a given product is an input to an
establishment (the establishment is a “net-buyer” of that good) if its total
inward shipment value is greater than 1.2 times its total outward shipment value.
6 We use the word establishment to refer to a single unit within the firm. For example, if
a firm owns two factories then we will say that the firm has two establishments.
Similarly, we define an establishment to be a “net-seller” of a good if the total
outward shipment value of that product observed in our data exceeds 1.2 times
its total inward shipment value. We provide results for three such thresholds
(1, 1.2, and 1.5) for robustness and find that our results are not sensitive to the
choice of threshold.
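The net-buyer and net-seller rules above can be sketched as follows. This is a minimal illustration, not the authors' code: the flat `(sender, receiver, product, value)` shipment layout and the function name `classify_roles` are hypothetical, and the default threshold of 1.2 mirrors the preferred specification.

```python
from collections import defaultdict


def classify_roles(shipments, threshold=1.2):
    """Label each (establishment, product) pair as net-buyer, net-seller, or neither.

    `shipments` is a hypothetical iterable of (sender, receiver, product, value)
    tuples; it is not the e-way bill schema itself.
    """
    inflow = defaultdict(float)   # total inward shipment value per (establishment, product)
    outflow = defaultdict(float)  # total outward shipment value per (establishment, product)
    for sender, receiver, product, value in shipments:
        outflow[(sender, product)] += value
        inflow[(receiver, product)] += value

    roles = {}
    for key in set(inflow) | set(outflow):
        in_val, out_val = inflow[key], outflow[key]
        if in_val > threshold * out_val:
            roles[key] = "net-buyer"   # the product is treated as an input
        elif out_val > threshold * in_val:
            roles[key] = "net-seller"  # the product is treated as an output
        else:
            roles[key] = "neither"     # inward and outward flows roughly balance
    return roles


# Hypothetical flows: B receives 100 and re-ships 10; E receives 100 and re-ships 90.
example = [
    ("A", "B", "7208", 100.0),
    ("B", "C", "7208", 10.0),
    ("X", "E", "7208", 100.0),
    ("E", "Y", "7208", 90.0),
]
roles = classify_roles(example)
```

With the 1.2 threshold, establishment E is classified as neither, since 100 does not exceed 1.2 × 90; with a threshold of 1 it becomes a net-buyer, which is why the threshold is varied for robustness.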
Similarly, we determine whether an upstream establishment has a potential
downstream buyer by considering each establishment-product pair for which
the establishment is a “net-seller” in our data and checking for the existence of
a downstream “net-buyer” (“Downstream Integrated: Exists”), and also
conditioning on establishment-level scale (“Downstream Integrated: Large”) or total
scale (“Downstream Integrated: Total”).
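The three existence conditions can be sketched from the buyer side; the supplier-side version described above swaps the roles of buyer and seller. This is a minimal sketch under stated assumptions: the `>=` comparison, the `(firm, outward_value)` input format, and the function name are hypothetical, while the scale thresholds of 0.5 and 1 echo those used for the robustness rows of Table 8.

```python
def integrated_supplier_flags(buyer_firm, buyer_demand, net_sellers, alpha=0.5):
    """Check the Exists / Large / Total definitions for one net-buyer pair.

    buyer_firm:   firm that owns the net-buyer establishment
    buyer_demand: the buyer's total inward value of the product
    net_sellers:  hypothetical list of (firm, outward_value) for net-sellers of the product
    alpha:        scale threshold relative to the buyer's demand
    """
    own = [value for firm, value in net_sellers if firm == buyer_firm]
    return {
        "exists": bool(own),                                   # any same-firm net-seller
        "large": any(v >= alpha * buyer_demand for v in own),  # one same-firm seller is big enough
        "total": sum(own) >= alpha * buyer_demand,             # combined same-firm capacity is big enough
    }


# Hypothetical: firm F1 owns two small net-sellers; F2 owns a large outside one.
sellers = [("F1", 30.0), ("F1", 25.0), ("F2", 500.0)]
flags = integrated_supplier_flags("F1", 100.0, sellers)
```

Here no single F1 seller reaches half of the buyer's demand, but together they do, which is the case the "Total" definition is meant to capture.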
We can also aggregate to the firm level and measure the extent to which a given
firm $k$ utilizes its vertically integrated production links by taking the weighted
average of $WithinShare_{jp}$ over all establishment-product pairs in firm $k$
that have an upstream integrated supplier.
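The firm-level aggregation can be sketched as follows. The sketch assumes, since the formal definition is not reproduced here, that $WithinShare_{jp}$ is the fraction of establishment $j$'s inward value of product $p$ shipped from establishments of the same firm, and that the firm-level weights are input values; both are assumptions, and all names are hypothetical. The unweighted variant corresponds to the unweighted averages reported alongside the main tables.

```python
def within_share(inflows, firm_of, buyer):
    """Share of `buyer`'s inward value of one product shipped by same-firm establishments.

    inflows: hypothetical list of (sender, value) shipments of the product received by `buyer`
    firm_of: mapping from establishment to owning firm
    """
    total = sum(value for _, value in inflows)
    within = sum(value for sender, value in inflows if firm_of[sender] == firm_of[buyer])
    return within / total if total > 0 else 0.0


def firm_within_share(pairs, weighted=True):
    """Aggregate a firm's (share, input_value) pairs that have an integrated supplier."""
    if not pairs:
        return float("nan")  # firm has no pair with an integrated upstream supplier
    if weighted:
        total = sum(value for _, value in pairs)
        return sum(share * value for share, value in pairs) / total
    return sum(share for share, _ in pairs) / len(pairs)


# Hypothetical firm F: D1 gets 40 from sister plant U1 and 60 from outside firm X.
firm_of = {"U1": "F", "D1": "F", "X": "G"}
s1 = within_share([("U1", 40.0), ("X", 60.0)], firm_of, "D1")
pairs = [(s1, 100.0), (1.0, 300.0)]
```

Value weighting lets large input flows dominate the firm-level figure, which is one reason weighted and unweighted averages can differ substantially.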
4.1 Distance
Each e-way bill reports the distance over which the goods are transported. This
is the measure of distance we use when computing the average distance over
which goods are transported for supply relationships that exist, for instance, the
average shipment distance of a given input for a given establishment. However,
we are also interested in counterfactual distances for supply relationships that
fail to materialize. In particular, we are interested in the distances to potential
suppliers within and outside the firm boundary for each “net-buyer”
establishment-product pair, as well as how far a given establishment is from
other establishments owned by the same firm. Here, our measure of distance
between two establishments is the distance between their PIN code centroids.
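The text does not say how the distance between PIN code centroids is computed; one natural choice, assumed here, is the great-circle (haversine) distance. The sketch also computes the firm-dispersion measure used later, the average distance from an establishment to all other establishments of the same firm. Centroid coordinates and function names are hypothetical, and distances are in kilometers (multiply by 1000 for meters).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def centroid_distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) centroids given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def firm_dispersion(est, firm_establishments, centroid):
    """Average distance from `est` to every other establishment of the same firm."""
    others = [e for e in firm_establishments if e != est]
    if not others:
        return 0.0  # single-establishment firm: no dispersion
    return sum(centroid_distance_km(centroid[est], centroid[o]) for o in others) / len(others)


# Hypothetical PIN-code centroids (lat, lon), not real coordinates.
centroid = {"E1": (0.0, 0.0), "E2": (0.0, 1.0), "E3": (0.0, 2.0)}
spread = firm_dispersion("E1", ["E1", "E2", "E3"], centroid)
```

Centroid distances are a proxy: two establishments in the same PIN code get a distance of zero, whereas reported e-way bill distances reflect the actual route.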
both a liberal estimate and a conservative estimate. We use the liberal estimate
in our main specification and our results are not affected by this choice.
4.5 Size
We proxy for firm or establishment size by its total in-shipment value. We also
count the number of products that a firm or establishment buys and sells. For
firms, we additionally count the number of establishments that it owns.
4.6 Competition
We use the Herfindahl-Hirschman Index (HHI) at the product level as a measure
of competition. We calculate this measure by squaring the market share of each
firm competing in a market and then summing the resulting numbers.
To define market concentration, one needs to define a market within which firms
are assumed to compete. We define the market to be either the state or a district
in the state. We define the market share of firm j in product p to be its share
of total outward shipment value in market m.
$$
s_{jpm} = \frac{\sum_{i \in \chi(j)} value_{ip}}{\sum_{j' \in \Phi(m)} \sum_{i \in \chi(j')} value_{ip}},
$$
where $s_{jpm}$ is the market share of firm $j$ in product $p$ in market $m$, $value_{ip}$ is
the total value of product $p$ in shipment $i$, $\chi(j)$ is the set of shipments from firm
$j$, and $\Phi(m)$ is the set of firms in market $m$.
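The market share and HHI definitions can be sketched as follows, with shares expressed as fractions so that the HHI lies between 0 and 1, matching statements such as "a 0.1 increase in the HHI". The tuple layout and function name are hypothetical; in practice the computation would be run once with districts and once with the state as the market.

```python
from collections import defaultdict


def hhi_by_product_market(shipments):
    """HHI for each (product, market) from outward shipments.

    shipments: hypothetical iterable of (firm, product, market, value) tuples.
    """
    sales = defaultdict(lambda: defaultdict(float))
    for firm, product, market, value in shipments:
        sales[(product, market)][firm] += value  # firm's outward value in this market

    hhi = {}
    for key, by_firm in sales.items():
        total = sum(by_firm.values())
        if total > 0:
            # s_jpm = firm's value / market total; HHI = sum of squared shares
            hhi[key] = sum((v / total) ** 2 for v in by_firm.values())
    return hhi


example = [
    ("A", "1001", "district-1", 60.0),
    ("B", "1001", "district-1", 30.0),
    ("C", "1001", "district-1", 10.0),
    ("A", "1001", "district-2", 50.0),
]
hhi = hhi_by_product_market(example)
```

In district-1 the shares are 0.6, 0.3, and 0.1, giving an HHI of 0.46; district-2 is a monopoly with an HHI of 1.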
5 Results: Ownership and Utilisation
In this section, we present our empirical results related to the ownership of
vertically integrated production chains and the utilisation of these networks.
We find that downstream establishments can potentially source 11% of the total
input value from an integrated upstream establishment. When we only consider
firms that operate in more than one location, or multi-establishment firms, 13%
of the total input value can be sourced from an integrated upstream supplier.
We also take the suppliers' perspective and find that upstream establishments
can potentially sell 10% of their total output value to an integrated downstream
establishment. This measure increases to 11% when we only consider
multi-establishment firms.
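The "potential" share reported above can be sketched as the value of net-buyer inputs that have an integrated upstream supplier divided by all net-buyer input value, optionally restricted to multi-establishment firms. The input format and function name are hypothetical; this is an illustration of the ratio, not the authors' code.

```python
def potential_within_share(pairs, multi_only=False):
    """Share of input value that could be sourced from an integrated upstream establishment.

    pairs: hypothetical list of (input_value, has_integrated_supplier, n_firm_establishments)
           for every net-buyer establishment-product pair.
    multi_only: if True, keep only pairs in multi-establishment firms.
    """
    kept = [(v, flag) for v, flag, n_est in pairs if not multi_only or n_est > 1]
    total = sum(v for v, _ in kept)
    covered = sum(v for v, flag in kept if flag)
    return covered / total if total > 0 else float("nan")


pairs = [(100.0, True, 2), (900.0, False, 1), (100.0, False, 3)]
all_firms = potential_within_share(pairs)               # 100 / 1100
multi = potential_within_share(pairs, multi_only=True)  # 100 / 200
```

Dropping single-establishment firms removes input value that could never be sourced from within, so the share rises, as in the 11% versus 13% comparison.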
In Table 7, we show that a large share of firms in the economy are vertically
integrated. Firms that can source at least one product from within make up
60% of economic activity.
Our main results for this section are reported in Table 8. In this table, we
consider firms located within the state, as we see their complete transaction data.
As explained in Section 3, we define a firm to be a seller of a product if its total
out-value for that product exceeds a multiple of its total in-value. Each column
refers to a different threshold value for the multiple. We report results for 1,
1.2, and 1.5. Each row corresponds to a different definition of the firm having
an integrated supplier. In the first two rows, we consider the case where the
firm owns a single upstream seller at sufficient scale, using threshold values of
0.5 and 1 respectively, while in the next two rows we look at whether the total
capacity of integrated sellers is sufficiently large.
First, we remove from the sample the observations where the establishment
sources the product fewer than three times. Such purchases may represent one-off
transactions for which sourcing from within is not worthwhile. In Table A1, we
find that our baseline measures do not change much and range between 30 and 34
percent. We also report our unweighted results in Table A2.
Second, we consider only the primary input of each establishment, since minor
inputs may not be important enough to be sourced within the firm. We define an
establishment's primary input as the input with the largest total inward shipment
value. We find results similar to the baseline: they are reported in Table A5, and
our measure ranges from 33 to 38 percent. We also report our unweighted results
in Table A6.
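The two sample restrictions above, dropping infrequently sourced pairs and keeping only each establishment's primary input, can be sketched as simple filters. The data layouts and function names are hypothetical, and tie-breaking for the primary input (first maximum in dictionary order) is an arbitrary choice made here.

```python
def frequent_pairs(shipment_counts, min_shipments=3):
    """Keep (establishment, product) pairs sourced at least `min_shipments` times."""
    return {pair for pair, n in shipment_counts.items() if n >= min_shipments}


def primary_inputs(input_values):
    """Map each establishment to the input with the largest total inward shipment value.

    input_values: hypothetical mapping establishment -> {product: total inward value}
                  over the products for which it is a net-buyer.
    """
    return {
        est: max(products, key=products.get)
        for est, products in input_values.items()
        if products  # establishments with no inputs are skipped
    }


counts = {("E1", "7208"): 5, ("E1", "3901"): 1}
inputs = {"E1": {"7208": 500.0, "3901": 200.0}, "E2": {}}
```

Both filters shrink the sample rather than changing any definition, so the within-firm sourcing measure is recomputed unchanged on what remains.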
liberal definition of having an integrated supplier. We consider an establishment
to have an integrated supplier if there is any net-seller of the input within the
firm, irrespective of the scale of the net-seller. We find that the unweighted
average of within-firm sourcing is around 37% while the weighted average is
around 19%. This likely represents a lower bound on the amount of within-firm
sourcing.
In Table 11, we report that the share of total sales that takes place within firms,
out of the total potential sales that could take place within firms, ranges from
29% to 38% across specifications. We also report the unweighted average in
Table 12, which ranges from 42% to 48%.
We also report the unconditional share of within-firm sourcing in the table. This
is the amount of trade for the product category as a proportion of total trade
in the economy. The number ranges from 0.02 for wood/cork/straw articles to
0.19 for precious metals and stones.
5.2.7 Robustness to Product Code Level
For most of our specifications, we use 4-digit HS codes to define our product
categories. However, defining products at the 4-digit level may be too broad.
For example, we may wrongly classify a firm as having an integrated supplier at
the 4-digit level if products within a given 4-digit category are not substitutable.
This would bias our results downward. To see how much of a concern this is, we
repeat our baseline analysis using the observations for which we have 8-digit HS
codes. As larger firms are more likely to report 8-digit HS codes, this sample
selection is endogenous.
We report our results in Tables A9 and A10. The share of within-firm sourcing,
when weighted by value, lies between 41 and 46%, while the unweighted number lies
between 30 and 32%. These numbers are similar to our earlier results, suggesting
that product definition is unlikely to be a major concern.
6.1 Distance
In Table 15 we explore how distance affects the firm’s decision to source from
within. In columns (1) and (2), we find that doubling the distance over which
the input is sourced decreases within-firm shipments by around 5 percentage
points. Thus, input suppliers outside the firm are on average located farther
away than input suppliers within the firm, consistent with establishments within
a firm locating close to each other. To measure firm dispersion, we compute, for
a given establishment, the average distance to all other establishments within
the firm. In columns (3) and (4), we find that firms that are more geographically
dispersed are more likely to source from outside.
In columns (5) and (6), we look at the impact of distance to net-sellers within
and outside the firm. We find that doubling the average distance to integrated
sellers reduces within-firm sourcing by around 5 p.p., while doubling the average
distance to outside sellers increases within-firm sourcing by 7 to 12 p.p.
The attenuation in sourcing with increasing distance is stronger for outside
suppliers; i.e., sourcing is less sensitive to distance for within-firm suppliers.
We also explore the impact of the number of shipments in Table 18 and find a
positive effect: doubling the number of shipments increases within-firm sourcing
by 2.6 to 5 p.p.
6.4 Competition
In Table 19, we explore how upstream and downstream competition affect
within-firm sourcing. As outlined previously, upstream competition has an
ambiguous prediction for the level of within-firm sourcing. On the one hand, it
increases the options available to the downstream establishment, making it less
likely that the product will be sourced from within. On the other hand, in the
face of higher competition, the downstream establishment is a captive buyer
for the upstream establishment, which may increase the amount of within-firm
sourcing. Empirically, we find that an increase in upstream competition
increases within-firm sourcing. A 0.1 increase in the HHI increases within-firm
sourcing by 4 p.p.
6.5 R&D
In Table 20 we look at whether firms are more likely to source products that
are R&D-intensive from within. Highly R&D-intensive products may require
upfront investment by the supplier. However, a non-integrated supplier does not
internalize all of the benefits of this investment, which makes it more likely that
such products will be sourced from within. Using the measure of R&D intensity
from Nunn and Trefler (2013), we find that products that are more R&D-intensive
are more likely to be sourced from within. Doubling the R&D requirement
increases the share of within-firm sourcing by about 8 percentage points.
firm. We report results in Tables 22 and 23.
$$
\min_{k} \; \tau_{jk} + c_{jk} + \epsilon_{jk},
$$
where $\tau_{jk}$ captures the transportation costs associated with sourcing from
supplier $k$, $c_{jk}$ is the contracting cost, and $\epsilon_{jk}$ is an
establishment-supplier-specific idiosyncratic shock. We assume that $\epsilon_{jk}$
follows an EV1 distribution, yielding the following expression for the share of
inputs that firm $j$ sources from $k$:
$$
E\left[\frac{X_{jk}}{X_j}\right] = \frac{\exp(\tau_{jk} + c_{jk})}{\sum_{k'} \exp(\tau_{jk'} + c_{jk'})} \qquad (1)
$$
We parameterize $\tau_{jk}$ as follows:
where $distance_{jk}$ is the distance in meters from establishment $j$ to $k$,
$\mathbb{1}_{jk}(\text{within firm})$ is an indicator for whether the supplying
establishment is vertically integrated with $j$ (owned by the same firm), and
$\mathbb{1}_{jk}(\text{within state})$ is an indicator for whether the supplying
establishment is located within the same state.
where $\gamma_j$ is a fixed effect for the buying establishment, which absorbs the
denominator in Equation 1.
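Equation (1) is a multinomial logit share. The sketch below computes those shares for one buyer from a vector of supplier indices $v_k = \tau_{jk} + c_{jk}$, entered with the sign used in Equation (1), and shows how raising a single supplier's index changes its share when there are thousands of alternatives. The supplier count and the size of the shift are illustrative, not the paper's estimates, and the function name is hypothetical.

```python
import math


def logit_shares(index):
    """Expected sourcing shares exp(v_k) / sum_k' exp(v_k') for one buyer.

    `index` holds v_k for each potential supplier k. The maximum is subtracted
    before exponentiating for numerical stability; this leaves the shares unchanged.
    """
    top = max(index)
    weights = [math.exp(v - top) for v in index]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical: 3000 identical potential suppliers, then one supplier's index rises by log(2).
baseline = logit_shares([0.0] * 3000)
shifted = logit_shares([math.log(2.0)] + [0.0] * 2999)
```

Because the denominator is common to all suppliers of a given buyer, taking logs gives $\log E[X_{jk}/X_j] = (\tau_{jk} + c_{jk}) - \log \sum_{k'} \exp(\tau_{jk'} + c_{jk'})$, which is why a buyer fixed effect can absorb it in the Poisson estimation.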
8.2 Estimation
We estimate Equation 3 with a multinomial pseudo maximum likelihood estimator,
implemented via a Poisson regression. For each downstream establishment-input
pair, we define the set of potential suppliers to be all “net-sellers” of the input
that operate at a sufficient scale relative to the demand of the downstream
establishment. As before, we define “net-sellers” of the input to be establishments
with total outward shipment value exceeding 1.2 times their total inward shipment
value, and the “sufficient scale” condition requires that total sales of the input be
at least 50% of what the downstream establishment buys.
We also consider the analogous decision to sell to potential buyers from the
perspective of an upstream supplier. In Appendix Tables A11 and A12, we present
our estimates and find similar results.
Atalay, Hortaçsu, Li, et al. (2019) estimate a similar gravity specification with
data on seller-destination trade flows and find that the elasticity of bilateral
trade flows with respect to the addition of a same-firm establishment in a
destination is 0.89.⁷ Their main result is that having an additional vertically
integrated establishment in a given destination ZIP code has the same effect on
shipment volumes as a 40% reduction in distance. We replicate their specification
with our data in Appendix Table A13. In our setting, the elasticity of bilateral
trade flows with respect to operating a same-firm establishment in the destination
is 0.97.⁸ This implies that having a vertically integrated establishment in a given
destination has the same effect on shipment volumes as a 91% reduction in
distance. This is substantially larger than the effect found by Atalay, Hortaçsu,
Li, et al. (2019) in the US context.
7 Calculated by multiplying the same-firm ownership share coefficient (2.828) from a Poisson
regression by 1/(1+r) (0.315), where r is the average number of potential recipients in a
destination.
8 Calculated by multiplying the same-firm ownership share coefficient (2.411) from a Poisson
regression by the average same-firm ownership share given that a vertically integrated firm
exists in the destination (0.402). Following Atalay, Hortaçsu, Li, et al. (2019), we carry out
a simple calculation to compute the magnitude relative to distance:
$\exp\left(\frac{0.402 \times 2.411}{-0.401}\right) \approx 0.089$, i.e., a 91% reduction in distance,
where $-0.401$ is the coefficient on $\log(distance)$.
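The arithmetic behind footnotes 7 and 8 can be checked directly. This sketch assumes a log-linear gravity equation in which integration shifts log flows by coefficient × average same-firm share and distance enters as coefficient × log(distance); it also reads 0.315 in footnote 7 as the value of the factor 1/(1+r), since that is what reproduces the reported 0.89. Function names are hypothetical.

```python
import math


def same_firm_elasticity(coef, avg_share):
    """Elasticity of trade flows to a same-firm establishment: coefficient x average share."""
    return coef * avg_share


def distance_equivalent_reduction(coef, avg_share, dist_coef):
    """Proportional cut in distance with the same effect on log flows as integration.

    Solves dist_coef * log(d_new / d_old) = coef * avg_share for d_new / d_old.
    """
    return 1.0 - math.exp(coef * avg_share / dist_coef)


ours = same_firm_elasticity(2.411, 0.402)                       # footnote 8
ours_cut = distance_equivalent_reduction(2.411, 0.402, -0.401)  # footnote 8
theirs = same_firm_elasticity(2.828, 0.315)                     # footnote 7
print(round(ours, 2), round(ours_cut, 2), round(theirs, 2))     # prints: 0.97 0.91 0.89
```

Because the distance coefficient is negative, the ratio inside the exponential is negative and the implied distance factor is below one; a 91% cut means shipping distance would have to fall to roughly 9% of its original value to match the effect of integration.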
9 Conclusion
We use administrative data from a large state in India on the movement of
physical inputs both within and between firms. We make five main contributions.
First, we map out the supply network within and across firms, focusing on the
vertical flow of physical inputs. We document that approximately 11% of inputs,
by value, can be sourced from within under the current ownership structure.
Second, we measure how much of this trade materializes and find that around 30
to 40% of the potential trade takes place. This number is economically significant,
suggesting that sourcing of physical inputs from within the firm is important.
The result is robust to a variety of sampling and measurement choices.
Third, we explore the factors associated with the decision to source from within.
We find that distance to suppliers within and outside the firm, frequency and
volume of product requirements, market competition in upstream and downstream
industries, product specificity, product R&D intensity, and firm scale all matter in
explaining the decision.
Fourth, we look at the extensive-margin decision and examine what is associated
with the existence of an integrated buyer. We find that larger firms are more likely
to have an integrated buyer. At the product level, larger trade value in the product,
higher specificity, higher R&D intensity, and lower competition all increase the
probability of having an integrated supplier.
References
Agarwal, Sumit et al. (June 2019). “Tax-Pass-through, Pricing Strategy and
Consumer Spending Dynamics: The Indian GST Experience”. In: SSRN
Electronic Journal.
Anderson, James E. and Eric Van Wincoop (Sept. 2004). “Trade costs”. In:
Journal of Economic Literature 42.3, pp. 691–751.
Atalay, Enghin, Ali Hortaçsu, Mary Jialin Li, et al. (Nov. 2019). “How Wide Is
the Firm Border?” In: The Quarterly Journal of Economics 134.4, pp. 1845–1882.
Atalay, Enghin, Ali Hortaçsu, and Chad Syverson (2014). “Vertical integration
and input flows”. In: American Economic Review 104.4, pp. 1120–1148.
Boehm, Johannes and Ezra Oberfield (2018). “Misallocation in the Market for
Inputs: Enforcement and the Organization of Production”. In: National Bureau
of Economic Research Working Paper Series No. 24937.
Bresnahan, Timothy and Jonathan Levin (2012). “Vertical Integration and Market
Structure”. In: NBER Working Papers.
Coase, R. H. (Nov. 1937). “The Nature of the Firm”. In: Economica 4.16,
pp. 386–405.
Feenstra, Robert C and Gordon H Hanson (1996). “Globalization, Outsourcing,
and Wage Inequality”. In: The American Economic Review 86.2, pp. 240–245.
Grossman, Sanford and Oliver Hart (Aug. 1986). “The Costs and Benefits of
Ownership: A Theory of Vertical and Lateral Integration”. In: Journal of
Political Economy 94.4, pp. 691–719.
Hart, Oliver and John Moore (1990). “Property Rights and the Nature of the
Firm”. In: Journal of Political Economy 98.6, pp. 1119–1158.
Klein, Benjamin, Robert G. Crawford, and Armen A. Alchian (Oct. 1978). “Vertical
Integration, Appropriable Rents, and the Competitive Contracting Process”. In:
The Journal of Law and Economics 21.2, pp. 297–326.
Leemput, Eva Van (2020). “A Passage to India: Quantifying Internal and External
Barriers to Trade”.
Lucas, Robert E (1988). “On the Mechanics of Economic Development”. In:
Journal of Monetary Economics 22, pp. 3–42.
Melitz, Marc J. (2003). “The Impact of Trade on Intra-Industry Reallocations
and Aggregate Industry Productivity”. In: Econometrica 71.6, pp. 1695–1725.
Novak, Sharon and Scott Stern (2009). “Complementarity Among Vertical In-
tegration Decisions: Evidence from Automobile Product Development”. In:
Management Science 55.2, pp. 311–332.
Nunn, Nathan and Daniel Trefler (Oct. 2013). “Incomplete contracts and the
boundaries of the multinational firm”. In: Journal of Economic Behavior
and Organization 94, pp. 330–344.
Perry, Martin (1989). Chapter 4 Vertical integration: Determinants and effects.
Ramondo, Natalia, Veronica Rappoport, and Kim J. Ruhl (Jan. 2016). “In-
trafirm trade and vertical fragmentation in U.S. multinational corporations”.
In: Journal of International Economics 98, pp. 51–59.
Rauch, James E. (June 1999). “Networks versus markets in international trade”.
In: Journal of International Economics 48.1, pp. 7–35.
Rey, Patrick and Jean Tirole (2007). Chapter 33 A Primer on Foreclosure.
Spengler, Joseph J (1950). Vertical Integration and Antitrust Policy. Tech. rep.
4, pp. 347–352.
23
Stigler, George J (1951). The Division of Labor is Limited by the Extent of the
Market. Tech. rep. 3, pp. 185–193.
Williamson, O E (1985). The Economic Institutions of Capitalism: Firms, Mar-
kets, Relational Contracting. Free Press.
Williamson, Oliver (1975). Markets and hierarchies, analysis and antitrust im-
plications : a study in the economics of internal organization. Free Press,
p. 286.
Williamson, Oliver E (1971). “The Vertical Integration of Production: Market
Failure Considerations”. In: American Economic Review 61.2, pp. 112–123.
— (Oct. 1979). “Transaction-Cost Economics: The Governance of Contractual
Relations”. In: The Journal of Law and Economics 22.2, pp. 233–261.
10 Tables
Variable Count
Firms 1,191,778
Firm-Locations 2,094,156
Firm-Location-Products 9,054,299
Products 1,295
Locations 35,026
Locations in State 6,938
Districts in State 40
Notes: This table provides counts for each of the units in our data. Firms are identified by a tax identification number (GSTIN), locations are zip codes (6-digit PIN codes), products are 4-digit HS product codes, and districts are 3-digit PIN code prefixes.
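The units in this table nest inside one another, and a district is simply the first three digits of a PIN code. The sketch below counts each unit from shipment-level records. It assumes a hypothetical record layout of (GSTIN, 6-digit PIN code, 4-digit HS code); the GSTIN strings are placeholders, not real identifiers, and the helper names are ours.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # hypothetical layout: (gstin, pin_code, hs4)


def district_code(pin: str) -> str:
    """Return the 3-digit district prefix of a 6-digit PIN code."""
    pin = pin.strip()
    if len(pin) != 6 or not pin.isdigit():
        raise ValueError(f"expected a 6-digit PIN code, got {pin!r}")
    return pin[:3]


def unit_counts(records: Iterable[Record]) -> Dict[str, int]:
    """Count distinct firms, firm-locations, products, locations and districts."""
    firms, firm_locs, firm_loc_products = set(), set(), set()
    products, locations, districts = set(), set(), set()
    for gstin, pin, hs4 in records:
        firms.add(gstin)
        firm_locs.add((gstin, pin))
        firm_loc_products.add((gstin, pin, hs4))
        products.add(hs4)
        locations.add(pin)
        districts.add(district_code(pin))
    return {
        "Firms": len(firms),
        "Firm-Locations": len(firm_locs),
        "Firm-Location-Products": len(firm_loc_products),
        "Products": len(products),
        "Locations": len(locations),
        "Districts": len(districts),
    }


example = [
    ("GSTIN_A", "560001", "7208"),
    ("GSTIN_A", "560001", "7308"),
    ("GSTIN_A", "570002", "7208"),
    ("GSTIN_B", "560034", "7208"),
]
print(unit_counts(example))
```

This sketch counts districts over all records; reproducing the in-state rows ("Locations in State", "Districts in State") would additionally require filtering to PIN codes within the state.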
Table 2: Descriptive Statistics: Firm Level
Average Value of Shipments 202,291.10 368,527.30 50,352.47 89,000.00 187,995.50
Total Value of Inward Shipments 15,820,502.00 1,016,284,634.00 50,000 339,871.5 2,283,888.0
Total Value of Outward Shipments 15,818,995.00 1,216,740,171.00 0 0 735,155.2
Notes: This table provides descriptive statistics for the data aggregated to the firm level, where a firm is identified by a unique tax ID (GSTIN). We report the average number of locations the firm operates in, the average number of product categories that the firm trades, the average number of shipments, the average value of each shipment, and the average value of inward and outward shipments. The numbers are reported in the local currency, INR.
Table 3: Descriptive Statistics: Product Level
Number of Buying Firm-Locations 5,545.84 11,001.99 320.5 1,543 5,487.5
Number of Selling Firm-Locations 1,325.57 2,367.00 138.5 477 1,433.5
Number of In-Shipments 88,434.74 271,974.10 1,636.00 11,951.00 63,445.00
Number of Out-Shipments 80,958.96 207,496.70 1,618.00 11,743.00 59,994.50
Notes: This table provides descriptive statistics aggregated to the product level. The product is defined as a 4-digit HS product code. Total Value of Inward Shipments is defined as the sum of all inward shipment values for that product. Total Value of Outward Shipments is defined as the sum of all outward shipment values for that product. Firms are identified by a unique tax ID. Firm-locations are tax ID and zip code pairs. We define buying and selling firm-locations according to Section 3. The Number of In/Out Shipments is the count of shipments for each product.
Table 4: Descriptive Statistics: Buyers and Sellers - Product-Establishment
Level
Table 6: Potential Within Firm Sourcing
Table 8: Share of Within Firm Upstream Sourcing: Weighted
Table 10: Share of Vertically Integrated Firms That Ship Within Firm:
Weighted
Table 12: Share of Within Firm Downstream Sales: Unweighted
Table 13: Section Level Descriptive Statistics
Section | Section Name | Number of Firms | Number of Firm Locations | Volume (in billions of INR) | Number of Sellers | Number of Buyers | Number Upstream Integrated: Total
1 | Live Animals and Animal Products | 18,348 | 26,926 | 114 | 4,791 | 14,495 | 459
2 | Vegetable Products | 95,329 | 124,722 | 792 | 28,113 | 72,879 | 2,000
3 | Animal or Vegetable Fats and Oils | 22,353 | 32,539 | 317 | 4,264 | 19,424 | 521
4 | Prepared Foodstuffs, Beverages, Spirits and Vinegar, Tobacco | 81,908 | 128,727 | 1,029 | 20,052 | 67,748 | 2,264
5 | Mineral Products | 124,587 | 235,038 | 930 | 23,437 | 106,430 | 2,877
6 | Chemicals and Para-Chemical Products | 206,710 | 321,905 | 1,429 | 57,993 | 170,206 | 6,419
7 | Plastics and Rubber | 240,793 | 354,105 | 974 | 70,453 | 194,207 | 6,948
8 | Animal Hides and Skins | 36,978 | 57,987 | 64 | 10,101 | 28,201 | 1,256
9 | Wood, Cork, Straw and Articles thereof | 69,359 | 104,727 | 155 | 21,336 | 54,961 | 2,151
10 | Pulp of Wood, Paper, Paperboard, and Printed Products | 97,222 | 147,843 | 321 | 26,899 | 78,604 | 2,552
11 | Textiles | 275,939 | 362,863 | 1,826 | 119,294 | 193,341 | 6,473
12 | Footgear, Headgear, Umbrellas etc. | 31,141 | 47,063 | 143 | 11,336 | 22,207 | 1,192
13 | Articles made of Minerals, Stone, Plaster, Cement and Ceramic and Glass Products | 115,944 | 188,158 | 244 | 31,397 | 92,981 | 3,355
14 | Precious Metals and Stone | 8,102 | 11,912 | 31 | 2,859 | 5,651 | 318
15 | Base Metals and Articles thereof | 296,596 | 501,463 | 3,171 | 106,650 | 235,061 | 11,361
16 | Machinery and Mechanical Appliances | 423,106 | 738,885 | 3,423 | 140,887 | 351,028 | 20,193
17 | Vehicles and Transport Equipment | 58,825 | 86,060 | 1,410 | 19,154 | 43,361 | 1,793
18 | Photographic, Music, Medical equipment and Clocks | 100,466 | 153,144 | 381 | 29,163 | 81,378 | 3,575
19 | Arms and Ammunitions | 1,532 | 1,939 | 5 | 733 | 878 | 67
20 | Miscellaneous Manufactured Products | 129,754 | 209,443 | 287 | 36,793 | 101,509 | 4,522
21 | Art | 96,954 | 141,341 | 131 | 33,957 | 71,603 | 2,518
Notes: This table reports descriptive statistics for different aggregated product categories. For these broad categories, we report the number of firms that either buy or sell the product, the number of establishments that buy or sell the product, the total volume of trade, the number of sellers, the number of buyers, and the total number of integrated sellers.
Table 14: Section Level Share Within Firm
Dependent variable:
Share of Shipments Within Firm (Percent)
(1) (2) (3) (4) (5) (6)
Log Mean Distance −4.869∗∗∗ −5.466∗∗∗
(0.033) (0.033)
Firm Dispersion −7.444∗∗∗ −7.685∗∗∗
(0.036) (0.035)
Log Average Distance to Within Firm Sellers −4.833∗∗∗ −4.854∗∗∗
(0.032) (0.031)
Log Average Distance to Outside Firm Sellers 7.602∗∗∗ 12.335∗∗∗
(0.167) (0.316)
District FE Yes Yes Yes Yes Yes Yes
Product FE No Yes No Yes No Yes
Observations 1,075,224 1,075,224 1,033,873 1,033,873 1,004,406 1,004,406
R2 0.152 0.254 0.171 0.269 0.158 0.255
Adjusted R2 0.151 0.253 0.171 0.268 0.158 0.254
∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Notes: This table reports estimates from a linear regression of the share of within firm sourcing on various distance measures at the
product-establishment level. The share of within firm sourcing is defined by “Upstream Integrated: Total” with a scale requirement of 0.5 and a
buying/selling threshold of 1.2 according to Section 3. Columns (1) and (2) use self-reported shipment distance as the independent variable.
Columns (3) and (4) use firm dispersion, which we construct for an establishment as the average distance to the other establishments within the firm (see Section 4). Columns (5) and (6) use the average distances to within-firm and outside-firm net-sellers of the product. All specifications control for district fixed effects; even-numbered columns add product fixed effects.
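Firm dispersion, as described in these notes, is the average distance from an establishment to the other establishments of the same firm. A minimal sketch, assuming each establishment is represented by latitude/longitude coordinates (for example a PIN-code centroid) and that distance is great-circle distance; the construction in Section 4 may measure distance differently, and the function names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def firm_dispersion(establishments: Dict[str, LatLon]) -> Dict[str, Optional[float]]:
    """Mean distance from each establishment of ONE firm to its other establishments.

    Single-establishment firms have no within-firm peers, so they get None.
    """
    result: Dict[str, Optional[float]] = {}
    for est_id, loc in establishments.items():
        distances = [haversine_km(loc, other_loc)
                     for other_id, other_loc in establishments.items()
                     if other_id != est_id]
        result[est_id] = sum(distances) / len(distances) if distances else None
    return result


# Hypothetical two-establishment firm (approximate Bengaluru and Mysuru coordinates).
print(firm_dispersion({"blr": (12.97, 77.59), "mys": (12.30, 76.64)}))
```

Averaging the same distance over a product's within-firm or outside-firm net-sellers, rather than over the firm's own establishments, would give measures analogous to those in columns (5) and (6).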
Table 16: Impact of Specificity on Within Firm Shipping
Dependent variable:
Share of Shipments Within Firm (Percent)
(1) (2) (3) (4)
Listed on Exchange −9.336∗∗∗ −7.168∗∗∗
(0.220) (0.219)
Listed on Exchange or Reference Priced −5.154∗∗∗ −4.117∗∗∗
(0.119) (0.118)
In Total Value −2.089∗∗∗ −2.091∗∗∗
(0.015) (0.015)
District FE Yes Yes Yes Yes
Section FE Yes Yes Yes Yes
Observations 1,049,661 1,049,661 1,049,661 1,049,661
R2 0.162 0.162 0.177 0.177
Adjusted R2 0.161 0.161 0.176 0.176
∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Notes: This table reports estimates from a linear regression of the share of within firm
sourcing on various specificity measures at the product-establishment level. The share of
within firm sourcing is defined by “Upstream Integrated: Total” with a scale requirement of
0.5 and a buying/selling threshold of 1.2 according to Section 3. The independent variable in column (1) is an indicator for whether the product is listed on an exchange. The independent variable in column (2) is an indicator for whether the product is listed on an exchange or is reference priced. Columns (3) and (4) add In Total Value as a control. All specifications control for district and section fixed effects.
Table 17: Impact of Frequency on Within Firm Shipping
Dependent variable:
Share of Shipments Within Firm (Percent)
(1) (2) (3)
Frequency Index 27.259∗∗∗ 16.291∗∗∗ 12.564∗∗∗
(0.170) (0.164) (0.163)
District FE No Yes Yes
Product FE No No Yes
Observations 1,075,224 1,075,224 1,075,224
R2 0.023 0.143 0.239
Adjusted R2 0.023 0.142 0.237
∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Notes: This table reports estimates from a linear regression of the share of within firm
sourcing on the time frequency of shipments at the product-establishment level. The share
of within firm sourcing is defined by “Upstream Integrated: Total” with a scale requirement
of 0.5 and a buying/selling threshold of 1.2 according to Section 3. Frequency Index is the
fraction of months with at least one inward shipment for a given product. Columns (2) and
(3) include district fixed effects. Column (3) includes product fixed effects.
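The Frequency Index defined above can be computed directly from shipment dates. A minimal sketch, assuming shipments are available as calendar dates for one establishment-product pair; the length of the sample period is not stated in this excerpt, so it is passed in as a parameter, and the function name is hypothetical.

```python
from datetime import date
from typing import Iterable


def frequency_index(shipment_dates: Iterable[date], n_months_in_sample: int) -> float:
    """Fraction of sample months with at least one inward shipment of a product.

    Multiple shipments in the same calendar month count only once.
    """
    if n_months_in_sample <= 0:
        raise ValueError("the sample must span at least one month")
    months_with_shipment = {(d.year, d.month) for d in shipment_dates}
    if len(months_with_shipment) > n_months_in_sample:
        raise ValueError("more active months than months in the sample")
    return len(months_with_shipment) / n_months_in_sample


# Hypothetical establishment-product with three shipments in two distinct months,
# evaluated over an assumed 12-month sample.
shipments = [date(2019, 4, 2), date(2019, 4, 20), date(2019, 6, 1)]
print(round(frequency_index(shipments, 12), 3))  # prints 0.167
```

Counting distinct months rather than shipments is what separates this index from the shipment-count measure in the next table: many shipments bunched in one month still yield a low frequency.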
Dependent variable:
Share of Shipments Within Firm (Percent)
(1) (2) (3)
Log Number of Shipments 5.073∗∗∗ 3.238∗∗∗ 2.635∗∗∗
(0.028) (0.027) (0.028)
District FE No Yes Yes
Product FE No No Yes
Observations 1,095,636 1,095,636 1,095,636
R2 0.029 0.147 0.243
Adjusted R2 0.029 0.146 0.241
∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Notes: This table reports estimates from a linear regression of the share of within firm
sourcing on the log of the number of individual shipments at the product-establishment level. The
share of within firm sourcing is defined by “Upstream Integrated: Total” with a scale
requirement of 0.5 and a buying/selling threshold of 1.2 according to Section 3. Columns (2)
and (3) include district fixed effects. Column (3) includes product fixed effects.
Table 19: Impact of Competition on Within Firm Shipping
Dependent variable:
Share of Shipments Within Firm (Percent)
(1) (2) (3) (4)
Upstream HHI 43.768∗∗∗ 42.364∗∗∗
(0.660) (0.669)
Notes: This table reports estimates from a linear regression of the share of within firm
sourcing on upstream and downstream competition at the product-establishment level. The
share of within firm sourcing is defined by “Upstream Integrated: Total” with a scale
requirement of 0.5 and a buying/selling threshold of 1.2 according to Section 3. HHI is the sum of squared firm out-value shares for each product. Upstream HHI is the HHI of the establishment’s input product. Weighted Downstream HHI is the mean of the HHIs of all the establishment’s outputs, weighted by out-value. All specifications include district fixed effects. Columns (2) and (4) include product section fixed effects.
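The two concentration measures in these notes can be sketched as follows. This assumes firm shares are expressed as fractions, so the HHI lies between 0 and 1 (the scale used in the regressions is not stated here); product codes, firm labels and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hhi_by_product(out_shipments: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """HHI per product: sum over firms of squared shares of the product's out-value.

    Input rows are (product, firm, out_value); shares are fractions, so HHI is in (0, 1].
    """
    firm_value: Dict[Tuple[str, str], float] = defaultdict(float)
    product_total: Dict[str, float] = defaultdict(float)
    for product, firm, value in out_shipments:
        firm_value[(product, firm)] += value
        product_total[product] += value
    hhi: Dict[str, float] = defaultdict(float)
    for (product, _firm), value in firm_value.items():
        if product_total[product] > 0:
            hhi[product] += (value / product_total[product]) ** 2
    return dict(hhi)


def weighted_downstream_hhi(outputs: Dict[str, float], hhi: Dict[str, float]) -> float:
    """Out-value-weighted mean HHI over one establishment's output products."""
    total = sum(outputs.values())
    if total <= 0:
        raise ValueError("establishment has no positive out-value")
    return sum(value * hhi[product] for product, value in outputs.items()) / total


rows = [("7208", "F1", 60.0), ("7208", "F2", 40.0), ("7308", "F1", 100.0)]
hhi = hhi_by_product(rows)
upstream_hhi = hhi["7208"]  # if 7208 is the establishment's input product
downstream = weighted_downstream_hhi({"7208": 30.0, "7308": 10.0}, hhi)
print(round(upstream_hhi, 2), round(downstream, 2))  # prints 0.52 0.64
```

Squaring the shares is what makes the index informative: the unsquared shares of a product always sum to one.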
Table 20: Impact of R & D on Within Firm Shipping
Dependent variable:
Share of Shipments Within Firm (Percent)
(1) (2) (3)
R & D Intensity 9.461∗∗∗ 7.215∗∗∗ 8.031∗∗∗
(0.317) (0.379) (0.376)
Log In Total Value −2.118∗∗∗
(0.015)
District FE Yes Yes Yes
Section FE No Yes Yes
Observations 1,023,113 1,023,113 1,023,113
R2 0.137 0.157 0.173
Adjusted R2 0.137 0.157 0.172
∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Notes: This table reports estimates from a linear regression of the share of within firm
sourcing on the log of R&D intensity at the product-establishment level. The share of within
firm sourcing is defined by “Upstream Integrated: Total” with a scale requirement of 0.5 and
a buying/selling threshold of 1.2 according to Section 3. All columns contain district fixed effects; Columns (2) and (3) add product section fixed effects, and Column (3) also adds the log of in total value as a control.
Table 21: Impact of Firm Scale on Within Firm Shipping
Dependent variable:
Share of Shipments Within Firm (Percent)
(1) (2) (3)
Log Total In-Shipment Value 0.708∗∗∗
(0.043)
Notes: This table reports estimates from a linear regression of the share of within firm
sourcing on various measures of firm size at the firm level. The share of within firm sourcing
is defined by “Upstream Integrated: Total” with a scale requirement of 0.5 and a
buying/selling threshold of 1.2 according to Section 3. We aggregate to the firm level by
taking a weighted average by total in-value. Total In-Shipment Value is the sum of the values of all of the firm’s inward shipments. Number of Products is the number of products that the firm either buys or sells. Number of Locations is the number of zip codes in which the firm operates. We take logs of all independent variables.
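The firm-level aggregation described in these notes is an in-value-weighted average of product-establishment shares. A minimal sketch, assuming each input row already carries the within-firm sourcing share (in percent) and the inward value for one product-establishment; the row layout and function name are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def firm_level_share(rows: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """In-value-weighted average of within-firm sourcing shares, by firm.

    Each row is (firm, within_share_percent, in_value) for one product-establishment.
    Firms with zero total in-value are dropped, since the weights are undefined.
    """
    weighted_sum: Dict[str, float] = defaultdict(float)
    total_in_value: Dict[str, float] = defaultdict(float)
    for firm, share, in_value in rows:
        weighted_sum[firm] += share * in_value
        total_in_value[firm] += in_value
    return {firm: weighted_sum[firm] / total_in_value[firm]
            for firm in weighted_sum if total_in_value[firm] > 0}


rows = [("A", 50.0, 100.0), ("A", 0.0, 300.0), ("B", 20.0, 10.0)]
shares = firm_level_share(rows)
# Firm-size regressor for firm A: log of total in-shipment value (100 + 300).
log_in_value_a = math.log(100.0 + 300.0)
print(shares, round(log_in_value_a, 3))
```

Weighting by in-value means a firm's large input flows dominate its aggregate share, so firm A's 50% share on a small input is diluted to 12.5% by its larger input with no within-firm sourcing.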
Table 22: Existence of Integrated Seller
Dependent variable:
Integrated Seller Exists (Percent)
(1) (2) (3) (4) (5) (6) (7)
Log In Total Value 0.690∗∗∗
(0.002)
Distance to Outside Sellers −0.032
(0.070)
Notes: This table reports estimates from a linear regression of an indicator for having a vertically integrated supplier on variables measured at the product-establishment level. The indicator for having an integrated seller uses a scale requirement of 0.5 and a buying/selling threshold of 1.2 according to Section 3. Log In Total Value is the log of total inward shipments of the product at the establishment level. Log Number of Shipments is the log of the number of shipments of a product that an establishment receives. Listed on Exchange is an indicator for whether the product is traded on an exchange. Upstream HHI is the HHI of the establishment’s input product. Weighted Downstream HHI is the mean of the HHIs of all the establishment’s outputs, weighted by out-value. Distance to Outside Sellers is the log of the average distance to non-integrated net-sellers of the product. R&D intensity is measured at the product level. All specifications contain district fixed effects and product fixed effects, unless the independent variable is defined at the product level.
Table 23: Existence of Integrated Seller – Firm Level
Dependent variable:
Integrated Seller Exists (Percent)
(1) (2) (3)
Log In Total Value 0.362∗∗∗
(0.004)
Notes: This table reports estimates from a linear regression of the indicator variable for
having at least one vertically integrated upstream supplier on various variables at the firm
level. The indicator for having an integrated seller uses a scale requirement of 0.5 and a
buying/selling threshold of 1.2 according to Section 3. In Total Value is the total inward
shipment value for the firm. Number of Shipments is the total count of shipments that a
firm receives. Number of locations is the number of zipcodes that the firm operates in.
Table 24: Relative Value of Vertical Integration
Table 25: Relative Value of Vertical Integration – Interactions
A Appendix Tables
Table A1: Share of Within Firm Upstream Sourcing: Weighted (At least 3
shipments)
Table A2: Share of Within Firm Upstream Sourcing: Unweighted (At least 3
shipments)
Table A3: Share of Within Firm Upstream Sourcing: Weighted (Within District)
Table A4: Share of Within Firm Upstream Sourcing: Unweighted (Within District)
Table A5: Share of Within Firm Upstream Sourcing: Weighted (Primary Input)
Table A7: Share of Within Firm Upstream Sourcing: No Supplier Scale Requirement
Table A8: Share of Vertically Integrated Firms That Ship Within Firm: Unweighted
Table A9: Share of Within Firm Upstream Sourcing: Weighted (8 digit product)
Table A12: Relative Value of Vertical Integration - Interactions - Seller Side
Table A13: Relative Value of Vertical Integration - Replication of Atalay et al. (2019)