To cite this article: Pascal Wichmann, Alexandra Brintrup, Simon Baker, Philip Woodall & Duncan
McFarlane (2020): Extracting supply chain maps from news articles using deep neural networks,
International Journal of Production Research, DOI: 10.1080/00207543.2020.1720925
Extracting supply chain maps from news articles using deep neural networks
Pascal Wichmann (a,*), Alexandra Brintrup (a), Simon Baker (b), Philip Woodall (a) and Duncan McFarlane (a)
(a) Institute for Manufacturing, University of Cambridge, Cambridge, UK; (b) Language Technology Lab, University of Cambridge, Cambridge, UK
(Received 15 February 2019; accepted 12 January 2020)
Supply chains are increasingly global, complex and multi-tiered. Consequently, companies often struggle to maintain complete visibility of their supply network. This poses a problem as visibility of the network structure is required for tasks like
effectively managing supply chain risk. In this paper, we discuss automated supply chain mapping as a means of maintaining
structural visibility of a company’s supply chain, and we use Deep Learning to automatically extract buyer–supplier relations
from natural language text. Early results show that supply chain mapping solutions using Natural Language Processing and
Deep Learning could enable companies to (a) automatically generate rudimentary supply chain maps, (b) verify existing
supply chain maps, or (c) augment existing maps with additional supplier information.
Keywords: supply chain management; supply chain map; natural language processing; text mining; supply chain visibility;
supply chain mining; deep learning; machine learning
1. Introduction
Complex products, like cars or aircraft, can be composed of tens of thousands or even millions of parts. Rather than all
being produced in-house, parts and materials are sourced from a large number of suppliers spread across the world. As substantial parts of the value creation are outsourced to suppliers, who, in turn, also outsource to sub-tier suppliers themselves,
increasingly multi-tiered, complex and geographically distributed supply networks emerge (Christopher and Lee 2004).
Consequently, companies gradually lose visibility over the topology of their supply network. A study by Achilles, a provider
of supply chain management solutions, claims that ‘40% of companies who sourced only in the UK, and almost 20% who
sourced globally, had no supply chain information beyond their direct suppliers’ (Achilles Group 2013).
Lacking visibility of the supply chain structure poses a problem: by definition, a supply network is a network of dependencies on suppliers, and the performance of a company’s supply chain is crucial to its operations. Information about its
extended suppliers is a valuable input to various decision-making processes of a firm, such as managing the efficiency,
resilience, or sustainability of its supply chain. Furthermore, in recent years companies have come under increasing pressure to understand their supply chains to prevent modern forms of slavery and other human rights violations. In particular,
supply chain risk management without visibility of the supply network is problematic, while at the same time the emergence of longer, geographically distributed supply chains exposes companies to a greater number and a wider range of risks. Disruptions that
occur on a sub-tier can propagate through the network, creating a ripple effect (Ivanov, Sokolov, and Dolgui 2014), and halt the production of companies that never knew they had such a dependency on a sub-tier supplier. Studies show that the
share of supply chain disruptions that originate with suppliers further upstream than the direct suppliers can be as high as
50% (KPMG International & The Economist Intelligence Unit 2013; Business Continuity Institute 2014). Suppliers critical
to continued operations can be located anywhere in the multi-tiered network and do not have to correspond to large sales
volumes (Yan et al. 2015). Structural supply chain visibility cannot easily be achieved for a combination of reasons. One major reason is that companies consider information about their supply base proprietary and are
unwilling to share it. Supply chain mapping is frequently named as the recommended solution to the problem of limited
supply chain visibility, and various tools exist for visualising buyer–supplier relations, yet the actual issue of acquiring the
required data in the first place remains unaddressed (Farris 2010).
Even though data that can readily be used for supply chain mapping is still scarce, vast amounts of data have become
abundantly available at low cost via the Web. A large proportion of this data is in text form and contains valuable information about buyer–supplier relations. Since these text documents typically consist of running text in natural language (unstructured text) instead of tables with a known data schema, extracting information from them is a challenging problem.
Figure 1. Conceptual model of the research focus: automating the extraction of individual buyer–supplier relations from unstructured text.
Thanks to advances in Machine Learning (ML), in particular Deep Learning, and Natural Language Processing (NLP), the
extraction of information can now be increasingly automated. A solution that could at least partially automate the generation of supply chain maps from text documents would have many beneficiaries and use cases. Improved knowledge of the
supply chain structure would enable a company to better detect and mitigate risks in advance. For example, a company may
not be aware that some of its direct suppliers depend on the same sub-tier supplier, a potential single point of failure. In
this case, the true risk exposure is obscured. With this knowledge, a company could mitigate the risk, e.g. by increasing inventory levels, requiring suppliers to diversify their supply base, or by identifying substitute parts with different supply
chains. Knowledge of the supply chain structure would also enable a company to react to risk events more quickly and
appropriately. For example, knowledge about which sub-tier suppliers are located in a recently flooded region could allow a company to react quickly, e.g. by securing alternative supplies faster than any competitor. The
potential benefits are not limited to companies managing risks of their own supply chains. Other actors may also benefit from a better understanding of supply networks, such as governmental agencies, insurance companies, or management consultancies.
The overall aim of this research is to examine to what extent and how supply chain maps can be automatically generated
from unstructured, natural language text, such as news reports or blog posts. Given how frequently new algorithms and
network architectures are proposed in ML and NLP, the aim is not to identify the best possible algorithm but to test the
general feasibility of the idea. A prerequisite for the creation of supply chain maps is the extraction of individual buyer–supplier relations, which is the main focus of this paper. Figure 1 summarises the problem background.
This study builds upon a previous paper (Wichmann et al. 2018) where the idea of automatically generating supply chain
maps from natural language text was first introduced and its challenges discussed. In this paper, we focus on the automated
classification of buyer–supplier relations by creating a text corpus and using it to train and test a Deep Learning classifier.
The automatically extracted buyer–supplier relations are then visualised in a basic supply chain map.
After summarising the relevant background (Section 2), namely supply chains, supply chain visibility, and relevant concepts from Machine Learning and NLP, we define the problem of extracting individual buyer–supplier relations from
text (Section 3). We then outline the methodology for addressing the problem (Section 4). Subsequently, we summarise and
discuss the results (Section 5). The extracted buyer–supplier relations can be visualised in the form of a basic supply chain
map (Section 6). Finally, we provide concluding remarks and propose ideas for future research (Section 7).
2. Related work
2.1. Supply chains and supply chain mapping
2.1.1. Supply chains
A supply chain emerges as a focal company (hereafter also referred to as Original Equipment Manufacturer or OEM)
buys products or services from a supplier to produce their own products. Since supply chains are networks (Lambert and
Cooper 2000), they consist of nodes and directed links of ‘flows of products, services, finances, and/or information from
a source to a customer’ (Mentzer et al. 2001). The combination of nodes and links gives the network its structural dimensions. The horizontal structure refers to the number of tiers across the supply chain. The vertical structure refers to the
number of suppliers or customers represented within each tier (Lambert and Cooper 2000). The term ‘upstream’ denotes the direction towards the original supplier, whereas ‘downstream’ refers to the direction towards the ultimate customer.
International Journal of Production Research 3
elements of these supply chain maps typically consist of the companies (nodes of the network) and their inter-relations
(arcs of the network). The arcs commonly indicate the flow of goods but may also indicate flows of information or money.
In this paper, we refer to supply chain mapping as the overall process of creating and maintaining a supply chain map.
This process includes the steps of identifying the information needs, acquiring and analysing the information, and visualising the results at the required aggregation level. Only a few papers appear to exist that reflect on supply chain mapping as
a method, e.g. Gardner and Cooper (2003) and Farris (2010). Other papers approach the topic from the perspective of
lean management by extending value stream mapping to supply chains, e.g. Suarez-Barraza, Miguel-Davila, and Vasquez-García (2016). However, numerous papers report on the application of supply chain mapping to specific scenarios. For
example, Choi and Hong (2002) provide supply chain mapping case studies in the automotive industry. The supply chain
maps were limited to the centre console assembly of three different product lines. The data was collected manually through
interviews, from documents provided by the automotive companies, and via observations during a plant tour. Choi and
Hong (2002) compare the three resulting supply network structures from the points of view of formalisation, centralisation,
and complexity. Another example of a manual supply chain mapping exercise is a report by the US Geological Survey. This
report ‘uses the supply chain of tantalum (Ta) to investigate the complexity of mineral and metal supply chains in general
and show how they can be mapped’ (Soto-Viruet et al. 2013).
and Schmidhuber (1997) and address the problem of the vanishing gradient. Bidirectional RNNs combine the different
representations learned from reading data in both directions. For some types of sequential inputs, models perform similarly
well if the data is read ‘anti-chronologically’. However, because these RNNs trained on the reversed sequence learn a
different representation, it is useful to combine the outputs of RNNs trained on the normal and the reversed sequence
(Chollet 2018). Such network architectures are called bidirectional RNNs, and bidirectional versions also exist for RNN
sub-types (e.g. BiLSTM; Graves and Schmidhuber 2005). Even more recent developments, such as Google’s attention-based
transformers (Vaswani et al. 2017), were not considered within the scope of this paper.
recall. To capture a wider range of expressions and reduce the effort of manually defining extraction patterns, we proposed
to use Deep Learning.
4. Methodology
4.1. Overview
We address the problem in two successive stages: corpus creation and relation classification.
Figure 3. Focus of this paper: classification of buyer–supplier relations between two mentions of organisations.
Figure 4. Buyer–supplier extraction across multiple documents: The aim is to extract triples of Entity A, Entity B, and the relationship
class.
Corpus creation: The corpus creation consists of the collection and preparation of the text input as well as the process of labelling sentences by human annotators. The importance of the corpus is two-fold:
• ‘Gold standard’: In order to evaluate classification performance, a human-labelled text corpus is required that will
be considered the ground truth against which the classifiers’ predictions can be compared. The gold standard data
will allow us to measure both recall and precision. The dataset also serves as training data for a Machine Learning classifier and is thus not merely a pre-processing step but part of the solution.
• Suitability for automation: Furthermore, higher inter-annotator agreement suggests a more manageable, formal-
isable task that is more likely to be suitable for automation. If it is impossible to establish a ground truth among
human annotators, a classifier cannot be expected to perform well on the problem. It is not obvious that the task of
classifying buyer–supplier relations is simple or formalisable enough for annotators to agree.
The text corpus shall be representative of a general news set, such that a classifier’s performance measured on the dataset
is a good predictor of its performance on a previously unseen general news dataset. This is a challenging requirement given
the expected small size of the text corpus and the low share of sentences describing buyer–supplier relations in a general
news dataset.
Relation classification: Only once a gold standard dataset has been established can a supervised classifier be trained
and the best-performing one with respect to recall, precision and F1 score can be identified. The achievable performance
is dependent on the size and quality of the corpus. A high recall would not yet suggest that supply chains can be fully reconstructed; this would depend on information availability, which is not tested in this stage. Similarly, NER errors are not considered when the classification performance is measured.
Approach A: Sampling of documents into three partitions
Our first approach to address this trade-off was to sample documents into three separate partitions: one partition for random documents drawn from a general news dataset, a second
partition for documents that were retrieved using keywords related to selected key industries (aerospace and automotive),
and a third partition for documents that were retrieved based on a search for company names in these key industries. This
way, the trade-off between the expected relevance of a sentence and its bias could in principle be steered by adjusting the
proportion of each partition in the final sample.
Among other reasons, aerospace and automotive were chosen as key industries as they are known for having complex
and global supply chains. In addition, the assumption was that these industries are both well-covered in general news as
well as that news reports often include supply chain information. For the aerospace industry, for example, documents were
filtered for the existence of keywords, such as ‘aerospace’, ‘aircraft’ or ‘planemaker’. The RCV1 Reuters dataset contained
industry codes (‘Thomson Reuters Business Classification’) and could more accurately be filtered using 23 of these codes
instead. 50% of documents were sampled from the aerospace industry and 50% from the automotive industry. To filter
documents by company names, a list of the top 100 global automotive company names and brands was used as well as a list of
the top 100 global aerospace and defence companies. Relevant documents in each partition were subsequently segmented
into individual sentences that would then be drawn randomly. Initial tests quickly revealed that the proportion of positive
sentences even in the more relevant data partitions was too small for an efficient annotation by humans.
Approach B: Manual collection of candidate sentences
To address this problem without compromising the overall human annotation process, candidate positive sentences were manually collected by three researchers, in addition to the already created dataset, and stored in a further data partition. To prevent biases, these positive sentences could not simply be obtained
via a Web search that used potential features, such as ‘supplies Toyota with’. Otherwise, a classifier trained on the data would
8 P. Wichmann et al.
Figure 5. Documents were sampled from four different sources and assigned to three different partitions each. Manually collected
candidate positive sentences were added to the corpus to increase the share of positive examples.
be biased towards the patterns the positive sentences were found with in the first place. Instead, the following strategies were
deemed acceptable and used to collect the sentences:
• Using a Web search engine by using as a search term (a) a single company name or (b) the names of two companies
of which one is known or merely suspected to supply the other.
• Manually analysing websites that tend to publish industry news, such as recent deals and partnerships.
In all of these cases, sentences were manually identified in the search result summaries, headings or the original articles
that could describe a buyer–supplier relation, partnership or collaboration. Ambiguous sentences were not ignored but were also collected so that the overall dataset erred on the side of being too inclusive rather than too exclusive. Similar to the previous approach, the
focus was on aerospace and automotive companies but any by-catch from other industries was also added to the collection.
These candidate positive sentences were collected and stored without a label as it was up to the annotators to classify the
sentences.
The drawback of adding manually collected candidate positive sentences is the introduction of additional bias. This is
unavoidable as it is a direct consequence of the objective of manually collecting sentences. But it may lead to so-called
‘overfitting’ and result in false positives if the classifier is applied to previously unseen data. Some words may be reliable
indicators of buyer–supplier relations in the training data but not as reliable in a random general news dataset. For example, words such as ‘award’ or ‘buy’ may be over-represented in the positive examples of the training data. To address the issue, the classifier can be iteratively improved: by manually labelling sentences that turned out to be false positives and adding these labelled sentences to the training data.
Overall sampling process
The overall sampling and pre-processing methodology is shown in Figure 5 and combined both sampling approaches in equal parts. Documents were segmented into sentences using spaCy as an off-the-shelf solution. To facilitate the subsequent annotation, sentences were automatically NER-tagged, again using spaCy. The organisation names were not provided to the NER tagger in advance. To reduce false positives, the Flair and Stanford CoreNLP NER taggers were used in combination with spaCy in a simple ensemble. The results of all three libraries had to match
for an organisational named entity to be considered in the subsequent steps. Only sentences with two or more detected
organisational named entities were admitted to the annotation process. Automatic NER tagging performs well but is still
imperfect and organisations may not have been detected, erroneously detected, or detected with incorrect segmentation
boundaries.
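The ensemble filter described above can be sketched with the tagger outputs stubbed out. The function names are ours; in practice, the three span lists would come from spaCy, Flair and Stanford CoreNLP.

```python
# Sketch of the NER-ensemble filter: an organisation mention is kept only if
# all three taggers agree on it, and a sentence is admitted to annotation only
# if at least two organisations survive the intersection.

def ensemble_orgs(*tagger_outputs):
    """Intersect per-tagger sets of (start, end, text) organisation spans."""
    kept = set(tagger_outputs[0])
    for spans in tagger_outputs[1:]:
        kept &= set(spans)
    return kept

def admit_sentence(spacy_orgs, flair_orgs, corenlp_orgs):
    """Admit a sentence if >= 2 organisations survive the ensemble."""
    agreed = ensemble_orgs(spacy_orgs, flair_orgs, corenlp_orgs)
    return len(agreed) >= 2, agreed

# Example: all three taggers agree on two spans; a third span found only by
# one tagger is discarded.
spacy_out = [(0, 6, "Toyota"), (20, 25, "Denso"), (30, 35, "Acme!")]
flair_out = [(0, 6, "Toyota"), (20, 25, "Denso")]
corenlp_out = [(0, 6, "Toyota"), (20, 25, "Denso")]
ok, agreed = admit_sentence(spacy_out, flair_out, corenlp_out)
# ok is True; agreed contains only the two spans all taggers found
```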
4.2.2. Labelling
Classes
Due to the importance of directionality in supply relations, two classes are designed for explicitly expressed directed buyer–supplier relations: (a) ‘company A supplies company B’, and (b) ‘company A is supplied by company B’ (inverted
direction). The actual sentence in the news article may use different words to express this relation, such as ‘purchasing
from’ or ‘using parts from’. These relations are not limited to the purchase of goods, parts and material but also include the
use of services, such as logistics services. These relations may be expressed in any tense (past, present or future) since the
tense could later automatically be identified if necessary. Furthermore, these relations should be expressed as certain, factual
statements rather than as a possibility.
This leaves a set of other ways a buyer–supplier relation may be expressed: Collaborations, joint ventures, and other
forms of partnerships do not have an obvious directionality but may still result in dependencies that are relevant for a supply
chain map. Furthermore, buyer–supplier relations can be only implied, ambiguous or explicitly stated as uncertain, such as
‘company A is in talks with company B over the purchase of’ or ‘company A plans to buy from company B’. To avoid
too many different classes and to ensure that only one class is ever applicable, these cases are grouped into a single third class (c) of buyer–supplier relations that are undirected, implied, or stated as uncertain. This class also aims to ensure that
examples of the first two classes are as reliable as possible.
We decided to distinguish a further class of relations (d) where one organisation owns another (fully or partially) or is part
of another organisation. Without this class, such relations could be misinterpreted as normal buyer–supplier relations (e.g.
‘company A buys a stake in company B’ or ‘company A sells its manufacturing business B to company C’). The purpose of
this class is less to obtain ownership relations, which could be obtained from publicly available reports or databases, than to facilitate the annotation decisions and ensure the purity of the other buyer–supplier relationship classes.
Because the named entity recognition was performed automatically as part of the pre-processing of a sentence, errors
may occur where a labelled text sequence is not an organisation or was incorrectly segmented. To keep the task complexity
manageable and to ensure the identical NER tagging results as a starting point for all annotators, annotators were not asked
to rectify incorrect NER tags. Instead, a relation could be classified as ‘reject’ (e) in that case or in other circumstances where an annotator felt incapable of assigning a class.
Finally, the case that none of the above classes are appropriate was captured by a last class (f). This is the most common
case for sentences randomly obtained from news articles.
Masking
Company names in each sentence were automatically masked so that the classifier did not learn relations between specific organisational named entities but between any text sequences tagged as organisations. Three types of masks
were used: one mask each for the two organisational named entities in question (‘__NE_FROM__’ and ‘__NE_TO__’), and
one mask (‘__NE_OTHER__’) for all other organisational named entity mentions not in question but occurring in the
sentence. The exact character sequences of these masks are irrelevant; they just need to be uncommon enough to not be
confused with any words expected to appear in the input text. As Figure 7 shows, for each possible unordered pair of
two organisational named entities, the masking will be different. A sentence with three organisational named entities, for
instance, will result in three differently masked versions. For each of these masked versions, the classification algorithm
is supposed to consider the relation between the entities masked as ‘__NE_FROM__’ and ‘__NE_TO__’. This way it is
ensured that relations between all organisational named entities in a sentence are classified. For a given pair of organisational
named entities, it is sufficient to always mask the one mentioned first as ‘__NE_FROM__’ and the one mentioned thereafter
as ‘__NE_TO__’. This is because classes are either non-directional or there is a class for each directionality.
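The pair-wise masking described above can be sketched as follows; the helper name is ours, while the mask strings are those from the text.

```python
from itertools import combinations

# Sketch of the pair-wise masking step: each detected organisation span is
# replaced by a mask token. The pair under consideration becomes __NE_FROM__
# (mentioned first) and __NE_TO__ (mentioned second); every other organisation
# in the sentence becomes __NE_OTHER__.

def mask_pairs(tokens, org_indices):
    """Yield one masked token list per unordered pair of organisation mentions."""
    for i, j in combinations(sorted(org_indices), 2):
        masked = list(tokens)
        for k in org_indices:
            masked[k] = "__NE_OTHER__"
        masked[i] = "__NE_FROM__"   # first-mentioned entity of the pair
        masked[j] = "__NE_TO__"     # second-mentioned entity of the pair
        yield masked

tokens = ["Bosch", "supplies", "BMW", "and", "Audi", "."]
variants = list(mask_pairs(tokens, org_indices=[0, 2, 4]))
# Three organisations -> three masked variants, e.g. the first is
# ['__NE_FROM__', 'supplies', '__NE_TO__', 'and', '__NE_OTHER__', '.']
```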
Labelling process
The labelling was conducted independently by seven annotators who had received the same written instructions as well as an introductory labelling session. To facilitate the labelling, a Web app had been developed to provide an interactive user interface, as shown in Figure 6. For each pair of two organisational named entities that had been
automatically detected, the most appropriate relation could be chosen from a drop-down menu.
To measure inter-annotator agreement, a subset of all sentences had to be labelled by all annotators. In addition,
within each labelling session of 100 sentences, 5 sentences from the beginning of a session were randomly re-injected
towards the end to measure intra-annotator agreement. To obtain the final dataset, a simple majority vote was performed
for each organisation-to-organisation relation in each sentence across all annotators. The overall process is illustrated in
Figure 7.
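The majority vote over annotators' labels for a given entity pair can be sketched as follows. The paper does not specify tie-breaking; this sketch inherits `Counter.most_common`'s first-seen ordering, which is an assumption of ours.

```python
from collections import Counter

# Majority vote over annotator labels for one organisation-to-organisation
# relation. Ties are broken by first-seen order (an assumption of this sketch;
# the paper does not detail tie-breaking).

def majority_label(labels):
    """Return the most frequent label among all annotators' votes."""
    return Counter(labels).most_common(1)[0][0]

votes = ["A supplies B", "A supplies B", "none", "A supplies B", "reject"]
final = majority_label(votes)  # "A supplies B"
```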
The organisational named entities were not masked but revealed to the annotators. This decision was made deliberately
to not make the labelling task more difficult by adding a layer of abstraction. To avoid incorrect labels, annotators were
specifically instructed not to use the company names as a clue for their labelling decision, to only consider the information
provided by the sentence at hand, and to not use any personal background knowledge about the relationship between two
organisations.
4.3. Classification
Classes
For the training and testing of the classifier, the classes ‘no buyer–supplier relation’ and ‘rejected by annotator’ were merged, as any application based on the classifier would likely treat the ‘reject’ class the same as a non-existing relation,
especially in the case of incorrectly identified organisations. This results in five classes that need to be distinguished by the classifier.
Figure 6. Interface of the annotation app: each pair of already auto-detected organisational named entities was labelled by human annotators.
Figure 7. Each sentence may contain multiple pairs of organisational named entities; each pair gets labelled potentially multiple times and potentially by multiple annotators.
Used algorithms
In the domain of Machine Learning and NLP, new network architectures are published frequently and the state of the art is a fast-moving target. As a representative of the current state of the art, a BiLSTM deep neural network
was chosen. To add further points of comparison, we also chose a multi-layer perceptron (MLP) as well as a linear Support
Vector Machine (SVM) classifier (Boser, Guyon, and Vapnik 1992).
Baseline performance
To establish a baseline performance, we use two dummy classifiers: a random one and a stratified one. As the name suggests, the random dummy classifier votes fully randomly, resulting in a uniform distribution of assigned labels. The stratified baseline classifier votes randomly but respects the training set’s class distribution: if the class ‘None’ represents 70% of all assigned class labels, then the baseline classifier would vote ‘None’ 70% of the time. The stratified classifier is expected to outperform the fully random classifier.
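A stratified dummy baseline is easy to reproduce. Its expected accuracy (which equals the micro-averaged F1 score in a single-label setting) is the sum of squared class proportions, versus 1/k for the uniform random guesser. The class proportions below are illustrative, not the paper's.

```python
import random
from collections import Counter

# Stratified dummy baseline: predict by sampling labels from the training
# distribution. Its expected accuracy (= micro-F1 in single-label multi-class
# classification) is sum(p_c^2) over classes; a uniform random guesser scores
# 1/k. The 70/20/10 split below is illustrative only.

def stratified_dummy(train_labels, n, seed=0):
    rng = random.Random(seed)
    return [rng.choice(train_labels) for _ in range(n)]

def expected_accuracy(train_labels):
    total = len(train_labels)
    return sum((c / total) ** 2 for c in Counter(train_labels).values())

train = ["none"] * 70 + ["supplies"] * 20 + ["ownership"] * 10
preds = stratified_dummy(train, n=1000)
exp = expected_accuracy(train)  # 0.7^2 + 0.2^2 + 0.1^2 = 0.54
```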
Details on the neural network classifiers (MLP and BiLSTM)
The feature set is identical for both neural network classifiers and is based on the word embeddings obtained from the GloVe dataset. The dataset, originally consisting of
840B tokens and 300-dimensional vectors trained on Common Crawl, was filtered by those tokens actually present in the
training data. Each mask was assigned a separate embedding vector: the mask ‘__NE_FROM__’ was assigned the 300-dimensional vector (1, …, 1, 1, 1, 0), and the other masks were assigned the vectors (1, …, 1, 1, 0, 1) and (1, …, 1, 0, 1, 1), respectively.
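The mask-vector assignment can be sketched with NumPy. The helper name is ours; real words would receive their GloVe vectors, which are omitted here.

```python
import numpy as np

# Sketch: mask tokens get fixed, distinctive 300-dimensional vectors of the
# kind described above (all ones except a single zero near the end), while
# ordinary words would receive their GloVe vectors (not shown).

DIM = 300

def mask_vector(zero_position_from_end):
    v = np.ones(DIM)
    v[DIM - 1 - zero_position_from_end] = 0.0
    return v

embeddings = {
    "__NE_FROM__": mask_vector(0),   # (1, ..., 1, 1, 1, 0)
    "__NE_TO__": mask_vector(1),     # (1, ..., 1, 1, 0, 1)
    "__NE_OTHER__": mask_vector(2),  # (1, ..., 1, 0, 1, 1)
}
```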
The BiLSTM and the MLP architectures were designed to accept input sequences of up to 380 ‘words’, each represented by its corresponding 300-dimensional embedding. The BiLSTM has an embedding layer and considers 16 features in the hidden state of the LSTM layer. It also uses dropout with a probability of 0.5. The last layer is a dense layer with five output units, one for each class. The LSTM model used the standard hyperbolic tangent (‘tanh’) activation function. The MLP architecture is identical in terms of input and output: an embedding layer is followed by a single hidden layer of size 128, which then connects to the output layer of size 5. In initial tests, increasing the network depth did not lead to noticeable performance improvements. ReLU was used as the activation function for the MLP. In the case of both neural network classifiers,
a Softmax layer is used to normalise the outputs so that they can be interpreted as probabilities. As is common for single-label multi-class classification problems, categorical cross-entropy was used as the loss function for the neural networks. ‘Adam’ (Kingma and Ba 2014) was used as the optimisation algorithm. Each network is trained and tested multiple times and the results
are averaged over all runs.
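A PyTorch sketch of the described BiLSTM follows. The dimensions (380-token inputs, 300-dimensional embeddings, hidden size 16, dropout 0.5, five outputs) come from the text; the exact layer arrangement, the use of the final hidden states, and the randomly initialised embedding (in place of GloVe) are our assumptions.

```python
import torch
import torch.nn as nn

# Sketch of the described BiLSTM classifier. Dimensions follow the text;
# the layer arrangement and use of final hidden states are our reading, and
# the embedding is randomly initialised here rather than loaded from GloVe.

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=16, n_classes=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)  # tanh activation by default
        self.dropout = nn.Dropout(0.5)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):           # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)       # (batch, seq_len, 300)
        _, (h, _) = self.lstm(x)            # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=1)  # concatenate both directions
        return self.out(self.dropout(h))    # raw logits; softmax is in the loss

model = BiLSTMClassifier(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (4, 380)))  # batch of 4 sequences
# logits.shape == (4, 5); train with nn.CrossEntropyLoss and the Adam optimiser
```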
Details on the linear SVM classifier
Generally, an SVM is a discriminative classifier that estimates a separating hyperplane in a high-dimensional feature space given labelled training data. The algorithm outputs an optimal hyperplane which
can be used to categorise new examples. We use a grid search to tune the hyperparameter of the SVM classifier that is
commonly referred to as C. Simply speaking, SVMs aim to fit a hyperplane to separate data points such that (1) the largest
minimum margin between different classes is achieved and (2) as many instances as possible are correctly separated. As it
is not always possible to optimise both, the C parameter determines the importance of (1).
For the SVM model, a different data representation had to be chosen. We use a simple bag-of-words approach
(Joachims 1998), where the order of words is disregarded and a binary indicator (‘one-hot’) vector is used to represent a sentence: the vector’s length equals the vocabulary size of the training dataset (in our case 10,803 tokens), and a value of ‘1’ is assigned at a word’s index if that word appears in the given sentence, ‘0’ otherwise. The SVM does not
consider word order nor does it consider positional information about the organisational named entities. Similar to the other
algorithms, the SVM is not provided with the company names. The SVM classifier is trained on this representation using a
one-vs-rest setup.
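A minimal scikit-learn sketch of this setup: binary bag-of-words features, a linear SVM in its default one-vs-rest configuration, and a grid search over C. The toy sentences and labels are purely illustrative; the paper's vocabulary has 10,803 tokens.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Illustrative masked sentences and labels (not from the paper's corpus).
sentences = [
    "__NE_FROM__ supplies parts to __NE_TO__",
    "__NE_FROM__ ships components to __NE_TO__",
    "__NE_FROM__ buys engines from __NE_TO__",
    "__NE_FROM__ purchases seats from __NE_TO__",
    "__NE_FROM__ met with __NE_TO__ yesterday",
    "__NE_FROM__ and __NE_TO__ are rivals",
]
labels = ["A_supplies_B", "A_supplies_B", "B_supplies_A",
          "B_supplies_A", "none", "none"]

vectoriser = CountVectorizer(binary=True)   # binary bag-of-words features
X = vectoriser.fit_transform(sentences)

# LinearSVC is one-vs-rest by default; tune C by grid search as in the text.
grid = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1, 10]}, cv=2)
grid.fit(X, labels)
pred = grid.predict(vectoriser.transform(["__NE_FROM__ supplies wings to __NE_TO__"]))
```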
5.2. Classification
The achieved classification results are shown in Table 1. Given the class imbalances, we report the micro-averaged metrics
instead of macro-averaged ones. A macro-averaged metric would first be computed independently for each class and then averaged over all classes, which would treat all classes equally despite their different sizes. Furthermore, in multi-class single-label scenarios, the micro-averaged recall equals the micro-averaged precision, and hence also the F1 score.
Figure 8. Label distribution by annotator before majority vote (N = 14,632 assigned labels).
Oversampling
the minority classes was conducted but did not visibly improve classification performance. The dataset was partitioned into
training set (70%), validation set (10%), and test set (20%). The neural networks were implemented in Python 3.6 using
PyTorch and trained on a single Linux desktop machine using an NVidia GeForce GTX 1080 Ti GPU. Using the above
system and dataset, a single training and testing run (e.g. BiLSTM trained over 22 epochs and using a batch size of 32)
could be completed in approximately one minute. As is common practice, the loss and the F1 score on the validation data were observed while increasing the number of epochs, and the model with the best validation score was automatically saved to avoid under- or overfitting with respect to the number of epochs. The training was conducted up
to 100 epochs, and in an initial trial up to 1000 epochs. With a batch size of 32, the best BiLSTM model in our tests was
obtained between epoch 17 and 37.
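The equality of micro-averaged precision, recall, and F1 in a multi-class single-label setting, and the contrast with macro-averaging, can be illustrated with a small pure-Python sketch. The label vectors below are invented for illustration and are not taken from our dataset.

```python
from collections import Counter

# Invented gold labels and predictions for a five-class, single-label task
# (class 0 = 'no relation', classes 1-4 = relation types).
y_true = [0, 0, 0, 1, 1, 2, 3, 4, 4, 0]
y_pred = [0, 0, 1, 1, 0, 2, 3, 4, 0, 0]
classes = sorted(set(y_true))

tp, fp, fn = Counter(), Counter(), Counter()
for t, p in zip(y_true, y_pred):
    if t == p:
        tp[t] += 1
    else:
        fp[p] += 1  # wrongly predicted class p
        fn[t] += 1  # missed true class t

# Micro-averaging pools all decisions first. With exactly one label per
# sample, every false positive for one class is a false negative for
# another, so sum(fp) == sum(fn) and precision == recall == F1.
micro_p = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
micro_r = sum(tp.values()) / (sum(tp.values()) + sum(fn.values()))
assert micro_p == micro_r == 0.7

# Macro-averaging computes F1 per class first and then averages,
# weighting each class equally regardless of its size.
def f1(c):
    p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
    r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

macro_f1 = sum(f1(c) for c in classes) / len(classes)
```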
The overall results are provided in Table 1. As expected, the stratified dummy classifier outperforms the random one
and achieves a micro-averaged F1 score of 0.38. The actual classifiers perform well above this baseline.
The class-wise classification performance for the SVM, the MLP, and the BiLSTM is shown in Tables 2, 3, and 4,
respectively.
Although the BiLSTM achieved a micro-averaged F1 score of approximately 0.72, compared to 0.71 for the MLP, this
does not imply that the BiLSTM is generally the best algorithm for the problem at hand. Despite multiple runs, the
differences may still be due to chance, and not all algorithms, configurations, and data representations could be tested. The
trained classifiers appear to distinguish well between Class 0 and all others, as demonstrated by the F1 score of
0.85 for Class 0 achieved by both neural network architectures. In particular, recall for Classes 1–4 remains relatively
low, which is likely due to the small size of the obtained dataset. More concretely, the classifier may encounter completely
new linguistic expressions in the test phase that it did not encounter during training. This is true in particular for
the classes with small sample sizes, such as ‘ownership / part-of’ and ‘B supplies A’. The following limitations must be
considered when interpreting these results.
The obtained dataset focussed on two manufacturing industries, automotive and aerospace, which may limit its useful-
ness for other industries with different supply network structures. While some generic expressions, such as ‘supplies with’,
are used across industries and the classifier should perform well for these, other expressions may be more industry-specific
and including examples of these in the dataset could lead to the discovery of further relations.
Because the human annotations were collected for sentences that had already been NER-tagged, the error introduced by
the NER step itself is not reflected in the stated classification performance.
With regards to defining relationship classes, there appears to be a trade-off between the number of relationship classes
and the simplicity of the classification task. More classes may lead to a longer annotation time or lower labelling quality.
On the other hand, the defined classes are still limited in their ability to distinguish more subtle semantic differences in
relationships. For example, it may be useful to distinguish buyer–supplier relations that have recently ended from those
that explicitly never existed, and a relationship explicitly said to have never existed may in turn have to be distinguished
from one for which information is simply lacking.
Much of the information seems to be encapsulated in the words themselves rather than in the word order. However, a
model that does not consider word sequences cannot possibly always distinguish the class ‘A supplies B’ from the class
‘B supplies A’ correctly: for some of those sentences, the set of words is identical and only the order of the entity
mentions differs. Thus, models able to consider sequences (sequence models) are expected to outperform those that cannot.
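This limitation is easy to demonstrate: two sentences expressing opposite supply directions can have identical bags of words, so any order-free representation maps them to the same vector. The organisation names and product below are placeholders.

```python
# Two sentences with identical word sets but opposite relation direction.
s1 = "orga supplies orgb with brake systems"
s2 = "orgb supplies orga with brake systems"

vocab = sorted(set(s1.split()) | set(s2.split()))

def bag_of_words(sentence):
    words = set(sentence.split())
    return [1 if w in words else 0 for w in vocab]

# The representations are identical, so no classifier operating on them
# can separate 'A supplies B' from 'B supplies A' for this pair.
assert bag_of_words(s1) == bag_of_words(s2)
```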
Relations that were identified as directed are indicated as such in the map, with the arrow head indicating the detected
direction of material flow. The classifier interpreted the relation between Toyota and Suzuki as non-directional/partnership,
as indicated by the missing arrow head for this link. The input text described two relation types between Toyota and Denso:
a directed buyer–supplier relation and an ownership relation. The knowledge graph can grow arbitrarily large by processing
additional news data.
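As a sketch of how such a map could be assembled from classified sentence-level relations, extracted triples can be merged into an adjacency structure. The representation below is our illustrative choice, not the implementation used for the figures; the triples mirror the Toyota/Denso/Suzuki example above.

```python
from collections import defaultdict

# Example relations as extracted by the classifier: (head, tail, class).
relations = [
    ("Denso", "Toyota", "A supplies B"),        # directed: material flows Denso -> Toyota
    ("Toyota", "Denso", "ownership / part-of"), # a second, distinct relation type
    ("Toyota", "Suzuki", "partnership"),        # non-directional
]

UNDIRECTED = {"partnership"}
graph = defaultdict(list)  # node -> list of (neighbour, relation class)

for head, tail, rel in relations:
    graph[head].append((tail, rel))
    if rel in UNDIRECTED:
        # Represent non-directional relations as reciprocal arcs so that
        # they can be rendered without an arrow head.
        graph[tail].append((head, rel))
```

Processing further news articles only appends new nodes and arcs, so the graph can grow incrementally.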
Supply chain maps generated solely from simple company-to-company relations are clearly limited. For instance, the
visualisation appears to suggest that Velocity Composites is a sub-tier supplier of Airbus. However, the provided text
example alone does not provide sufficient evidence for this inference. One way to address this ‘transitivity problem’
is to also extract the end-product for which a part is intended, if this fact is mentioned in the context. The
visualisation can also be further enriched; e.g. the confidence in each relation classification could be indicated.
Samples of annotated sentences, resulting predictions, and code have been made available in a public GitHub
repository.13
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. Global Vectors for Word Representation (GloVe); https://nlp.stanford.edu/projects/glove/; last accessed: 7 January 2018.
2. Word2Vec; https://code.google.com/archive/p/word2vec/; last accessed 7 January 2018.
3. LION project; http://lbd.lionproject.net; last accessed 7 January 2018.
4. Reuters corpora TRC2 and RCV1 (https://trec.nist.gov/data/reuters/reuters.html).
5. NewsIR’16 dataset (https://research.signal-ai.com/newsir16/signal-dataset.html); last accessed: 10 January 2019.
6. Webhose.io (https://webhose.io/); last accessed: 10 January 2019.
7. spaCy (https://spacy.io/; last accessed: 9 January 2019) is a widely used open-source NLP software library.
8. Flair (https://github.com/zalandoresearch/flair; last accessed: 11 June 2019).
9. Stanford CoreNLP (https://stanfordnlp.github.io/CoreNLP/index.html; last accessed: 11 June 2019).
10. Stratified dummy classifier provided by the scikit-learn library (https://scikit-learn.org/stable/modules/generated/sklearn.dummy.
DummyClassifier.html; last accessed: 13 June 2019).
11. d3.js (https://d3js.org/); last accessed: 11 June 2019.
12. Cytoscape (https://cytoscape.org/); last accessed: 11 June 2019.
13. https://github.com/pwichmann/supply_chain_mining.
References
Achilles Group. 2013. “Supply Chain Mapping.” Accessed August 30, 2016. http://www.achilles.co.uk/images/pdf/communitypdfs/Automotive/Achilles-Supply-Chain-Mapping.pdf.
Bach, Nguyen, and Sameer Badaskar. 2007. “A Review of Relation Extraction.” Literature Review for Language and Statistics II 2: 1–15.
Barratt, Mark, and Adegoke Oke. 2007. “Antecedents of Supply Chain Visibility in Retail Supply Chains: A Resource-based Theory
Perspective.” Journal of Operations Management 25 (6): 1217–1233.
Basole, Rahul C., and Marcus A. Bellamy. 2014. “Supply Network Structure, Visibility, and Risk Diffusion: A Computational Approach.”
Decision Sciences 45 (4): 753–789.
Boser, Bernhard E., Isabelle M. Guyon, and Vladimir N. Vapnik. 1992. “A Training Algorithm for Optimal Margin Classifiers.” In
Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 144–152. Pittsburgh, ACM.
Business Continuity Institute. 2014. “Supply Chain Resilience 2014 Annual Survey.” http://www.bcifiles.com/supply-chain.pdf.
Choi, Thomas Y., and Yunsook Hong. 2002. “Unveiling the Structure of Supply Networks: Case Studies in Honda, Acura, and
DaimlerChrysler.” Journal of Operations Management 20 (5): 469–493.
Chollet, Francois. 2018. Deep Learning with Python. 1st ed. Shelter Island, NY: Manning.
Chowdhury, Gobinda G. 2003. “Natural Language Processing.” Annual Review of Information Science and Technology 37 (1): 51–89.
Christopher, Martin, and Hau Lee. 2004. “Mitigating Supply Chain Risk Through Improved Confidence.” International Journal of
Physical Distribution & Logistics Management 34 (5): 388–396. doi:10.1108/09600030410545436.
Christopher, Martin, and Helen Peck. 2004. “Building the Resilient Supply Chain.” International Journal of Logistics Management 15
(2): 1–13. doi:10.1108/09574090410700275.
Cohen, Jacob. 1960. “A Coefficient of Agreement for Nominal Scales.” Educational and Psychological Measurement 20 (1): 37–46.
Cowie, Jim, and Wendy Lehnert. 1996. “Information Extraction.” Communications of the ACM 39 (1): 80–91.
Farris, M. Theodore. 2010. “Solutions to Strategic Supply Chain Mapping Issues.” International Journal of Physical Distribution &
Logistics Management 40 (3): 164–180.
Gardner, John T., and Martha C Cooper. 2003. “Strategic Supply Chain Mapping Approaches.” Journal of Business Logistics 24 (2):
37–64.
Goh, Mark, Robert De Souza, Allan N. Zhang, Wei He, and P. S. Tan. 2009. “Supply Chain Visibility: A Decision Making Perspective.”
In 2009 4th IEEE Conference on Industrial Electronics and Applications, ICIEA 2009, 2546–2551, IEEE.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. 1st ed. Cambridge, MA: MIT Press.
Graves, Alex, and Jürgen Schmidhuber. 2005. “Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network
Architectures.” Neural Networks 18 (5–6): 602–610.
Hearst, Marti A. 1992. “Automatic Acquisition of Hyponyms from Large Text Corpora.” In Proceedings of the 14th conference on
Computational Linguistics, Vol. 2, 23–28, Nantes, ACL.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-term Memory.” Neural Computation 9 (8): 1735–1780.
Ivanov, Dmitry, Boris Sokolov, and Alexandre Dolgui. 2014. “The Ripple Effect in Supply Chains: Trade-off ‘Efficiency-
Flexibility-Resilience’ in Disruption Management.” International Journal of Production Research 52 (7): 2154–2172.
doi:10.1080/00207543.2013.858836.
Joachims, Thorsten. 1998. “Text Categorization with Support Vector Machines: Learning with Many Relevant Features.” In European
Conference on Machine Learning, 137–142. Springer, Berlin, Heidelberg.
Jüttner, Uta, Helen Peck, and Martin Christopher. 2003. “Supply Chain Risk Management: Outlining An Agenda for Future Research.”
International Journal of Logistics: Research and Applications 6 (4): 197–210.
Kingma, Diederik P., and Jimmy Ba. 2014. “Adam: A method for stochastic optimization.” arXiv preprint: 1412.6980.
KPMG International & The Economist Intelligence Unit. 2013. “The Global Manufacturing Outlook 2013 – Competitive Advantage:
Enhancing Supply Chain Networks for Efficiency and Innovation.” 27. http://www.kpmg.com/DE/de/Documents/global-manufacturing-outlook-2013-kpmg.pdf.
Lambert, Douglas M., and Martha C. Cooper. 2000. “Issues in Supply Chain Management.” Industrial Marketing Management 29 (1):
65–83.
Maedche, Alexander, and Steffen Staab. 2001. “Ontology Learning for the Semantic Web.” IEEE Intelligent Systems 16: 72–79.
Mentzer, J. T., William Dewitt, J. S. James, S. Keebler, Soonhong Min, Nancy W. Nix, Carlo D. Smith, and Zach G. Zacharia.
2001. “Defining Supply Chain Management.” Journal of Business Logistics 22 (2): 1–25. http://onlinelibrary.wiley.com/doi/
10.1002/j.2158-1592.2001.tb00001.x/abstract.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.”
arXiv preprint: 1301.3781.
Mintz, Mike, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. “Distant Supervision for Relation Extraction Without Labeled Data.”
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural
Language Processing of the AFNLP: Volume 2 – ACL-IJCNLP ’09 2 (2005): 1003. http://portal.acm.org/citation.cfm?doid=1690219.1690287.
Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. Cambridge, Massachusetts: MIT Press.
Nadeau, David, and Satoshi Sekine. 2007. “A Survey of Named Entity Recognition and Classification.” Lingvisticae Investigationes 30
(1): 3–26.
Pyysalo, Sampo, Simon Baker, Imran Ali, Stefan Haselwimmer, Tejas Shah, Andrew Young, and Yufan Guo. 2018. “LION LBD: A
Literature-Based Discovery System for Cancer Biology.” Bioinformatics 35 (9): 1553–1561. doi:10.1093/bioinformatics/bty845.
Sheffi, Yossi. 2005. The Resilient Enterprise: Overcoming Vulnerability for Competitive Advantage. Vol. 1. Cambridge, Massachusetts:
MIT Press.
Soto-Viruet, Yadira, W. David Menzie, John F. Papp, and Thomas R. Yager. 2013. “USGS Open-File Report 2013–1239 – An Exploration
in Mineral Supply Chain Mapping Using Tantalum as an Example.” Technical Report. http://pubs.usgs.gov/of/2013/1239/.
Suarez-Barraza, Manuel F., José Miguel-Davila, and C. Fabiola Vasquez-García. 2016. “Supply Chain Value Stream Mapping: A New
Tool of Operation Management.” International Journal of Quality and Reliability Management 33 (4): 518–534.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017.
“Attention Is All You Need.” In Advances in Neural Information Processing Systems, 5998–6008. Neural Information Processing
Systems (NIPS).
Wichmann, Pascal, Alexandra Brintrup, Simon Baker, Philip Woodall, and Duncan McFarlane. 2018. “Towards Automatically Generating
Supply Chain Maps From Natural Language Text.” IFAC-PapersOnLine 51 (11): 1726–1731. 16th IFAC Symposium on Informa-
tion Control Problems in Manufacturing INCOM 2018. http://www.sciencedirect.com/science/article/pii/S2405896318313284.
Yamamoto, Ayana, Yuichi Miyamura, Kouta Nakata, and Masayuki Okamoto. 2017. “Company Relation Extraction from Web News
Articles for Analyzing Industry Structure.” In 2017 IEEE 11th International Conference on Semantic Computing (ICSC), 89–92.
IEEE.
Yan, Tingting, Thomas Y. Choi, Yusoon Kim, and Yang Yang. 2015. “A Theory of the Nexus Supplier: A Critical Supplier From a
Network Perspective.” Journal of Supply Chain Management 51 (1): 52–66.