
Case(s) for supply chain analytics

Text analytics for supply chain risk management


(Software/tools)
Week 7
Text Analytics
Applying data analytics to derive knowledge from text

A huge amount of textual data is available in the form of:
• Social media posts
• Tweets
• NEWS
• Question answer forums
• Blogs
• YouTube video comments
• Product reviews
• News articles
How much textual data

• ~80 percent of corporate and SC data is in some kind of unstructured form (e.g., text)
• Tapping into these information sources is not optional; it is necessary to stay competitive
• Solution: text mining
• A semi-automated process
of extracting knowledge
from unstructured data
sources
Data Mining versus Text Mining
• Both seek novel and useful patterns
• Both are semi-automated processes
• Difference is the nature of the data:
• Structured versus unstructured data
• Structured data: in databases
• Unstructured data: Word documents, PDF files, text excerpts, XML files,
and so on
• Text mining: first impose structure on the data, then mine the
structured data
Stakeholders of text analytics
• Government
• What is the response of people towards a particular policy?
• Advertisers
• What is trending that could be used for advertisement?
• Movie Makers
• What did people dislike about a movie?
• Supply Chain analysts
• A company could use unstructured text mining to turn raw data from
contracts, social media feeds and news reports into structured data
which is relevant to the supply chain
• Academia
• Is this document plagiarized?
• Retrieve similar documents
Text analytics
• Benefits of text mining are obvious especially in text-rich data
environments
• e.g., law (court orders), academic research (research articles), finance
(quarterly reports), medicine (discharge summaries), biology
(molecular interactions), technology (patent files), customer
service (customer comments), etc.
• Electronic communication records (e.g., email)
• Spam filtering
• Email prioritization and categorization
• Automatic response generation
Text Mining
• Information extraction
• Identification of key phrases and relationships within text by looking for
predefined objects and sequences in text by way of pattern matching
• Topic tracking
• Based on a user profile and documents that a user views, text mining can
predict other documents of interest to the user
• Summarization
• Summarizing a document to save time on the part of the reader
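The information-extraction idea above (looking for predefined objects by pattern matching) can be illustrated with a minimal sketch. The two patterns, the `extract` helper, and the sample sentence are assumptions made for illustration only; real extraction systems use far richer rules or trained models.

```python
import re

# Hypothetical patterns for two "predefined objects": ISO dates and USD amounts.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")

def extract(text):
    """Return every date and amount found in the text."""
    return {"dates": DATE.findall(text), "amounts": AMOUNT.findall(text)}

# Invented sample sentence, e.g., from a supplier contract.
sample = "Contract signed 2023-04-01 for $1,250,000 with delivery by 2023-09-30."
print(extract(sample))
```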
Text Mining
• Categorization
• Identifying the main themes of a document and then placing the document
into a predefined set of categories based on those themes
• Clustering
• Grouping similar documents without having a predefined set of categories
• Concept linking
• Connects related documents by identifying their shared concepts
• Question answering
• Finding the best answer to a given question through knowledge-driven
pattern matching
Text Mining Terminology
• Unstructured or semistructured data
• Corpus
• Terms
• Concepts (feature)
• Stemming
• Stop words
• Synonyms
• Word frequency
• Part-of-speech tagging (assigning parts of speech to each word, such as noun, verb, adjective)
• Morphology/word structure
• Term-by-document matrix
• Singular value decomposition
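A few of these terms (terms, stop words, stemming) can be made concrete with a small sketch. The stop-word list and the suffix-stripping rule are deliberately crude assumptions, not a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "at"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that carry little topical meaning."""
    return [t for t in tokens if t not in STOP_WORDS]

def crude_stem(token):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

tokens = tokenize("The ports are delaying shipments of containers")
terms = [crude_stem(t) for t in remove_stop_words(tokens)]
print(terms)  # ['port', 'delay', 'shipment', 'container']
```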
Text analytics: Tasks
Text analytics: Text normalization
Text analytics: more basics
Text Mining Process
[Figure: Context diagram for the text mining process]
• Inputs: unstructured data (text); structured data (databases)
• Activity (A0): extract knowledge from available data sources
• Output: context-specific knowledge
• Constraints: software/hardware limitations; privacy issues; linguistic limitations
• Mechanisms: domain expertise; tools and techniques
Text Mining Process
[Figure: The three-step text mining process]
• Inputs: a variety of relevant unstructured (and semi-structured) data sources such as text, XML, HTML, etc.
• Task 1, Establish the Corpus: collect and organize the domain-specific unstructured data
• Output: a collection of documents in some digitized format for computer processing
• Task 2, Create the Term-Document Matrix: introduce structure to the corpus
• Output: a flat file called the term-document matrix, where the cells are populated with the term frequencies
• Task 3, Extract Knowledge: discover novel patterns from the T-D matrix
• Output: a number of problem-specific classification, association, and clustering models and visualizations
• Feedback flows between consecutive tasks
Text Mining Process
• Step 1: Establish the corpus
• Collect all relevant unstructured data
• (e.g., textual documents, XML files, emails, Web pages, short notes)
• Digitize, standardize the collection
• (e.g., all in ASCII text files)
• Place the collection in a common place
• (e.g., in a flat file, or in a directory as separate files)
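Step 1 can be sketched as follows, assuming the collection has already been digitized into plain-text files in one directory. The `load_corpus` name and the `.txt` convention are illustrative assumptions.

```python
from pathlib import Path

def load_corpus(directory):
    """Read every .txt file in `directory` into a {filename: text} dict."""
    corpus = {}
    for path in sorted(Path(directory).glob("*.txt")):
        # errors="replace" keeps going if a file has stray non-UTF-8 bytes.
        corpus[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return corpus

# Usage (the directory name is hypothetical):
# corpus = load_corpus("corpus/")
```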
Text Mining Process
• Step 2: Create the Term–by–Document Matrix
[Figure: example term-by-document matrix. Terms (columns): investment risk, project management, software engineering, development, SAP, ... Documents (rows): Document 1 through Document 6, ... Each cell holds the number of times a term occurs in a document; most cells are empty.]
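A minimal sketch of building such a matrix from raw counts is given below. The toy documents are invented for illustration and single words stand in for terms; they do not reproduce the values in the figure.

```python
from collections import Counter

# Hypothetical toy corpus; the documents are invented for illustration.
docs = {
    "Document 1": "investment risk and software engineering",
    "Document 2": "project management",
    "Document 3": "software engineering software development software",
}

def build_tdm(docs):
    """Return (terms, matrix) where matrix[doc] lists raw counts per term."""
    counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
    terms = sorted(set().union(*counts.values()))
    matrix = {name: [c[t] for t in terms] for name, c in counts.items()}
    return terms, matrix

terms, matrix = build_tdm(docs)
print(terms)
for name, row in matrix.items():
    print(name, row)
```

Most cells are zero even in this tiny example, which is why the matrix is described as sparse.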
Text Mining Process
• Step 2: Create the Term–by–Document Matrix (TDM)
• Should all terms be included?
• Stop words, include words
• Synonyms, homonyms
• Stemming
• What is the best representation of the indices (values in cells)?
• Raw counts; binary frequencies; log frequencies;
• Inverse document frequency
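The cell representations listed above can be compared on a single count. This is a sketch under stated assumptions: the `1 + log(count)` form of the log frequency is one common convention and is not given in the slides, and the natural logarithm is assumed throughout.

```python
import math

def binary(count):
    """1 if the term occurs in the document at all, else 0."""
    return 1 if count > 0 else 0

def log_frequency(count):
    """Dampened count: 1 + log(count) for nonzero counts (assumed convention)."""
    return 1 + math.log(count) if count > 0 else 0.0

def inverse_document_frequency(n_docs, doc_freq):
    """log(N / df): large for terms that appear in few documents."""
    return math.log(n_docs / doc_freq)

raw = 3  # hypothetical raw count of a term in one document
print(raw, binary(raw), log_frequency(raw))
print(inverse_document_frequency(n_docs=6, doc_freq=2))
```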
Text Mining Process
• Step 2: Create the Term–by–Document Matrix (TDM)
• TDM is a sparse matrix. How can we reduce the dimensionality of the
TDM?
• Manual - a domain expert goes through it
• Eliminate terms with very few occurrences in very few documents (?)
• Transform the matrix using SVD
• SVD is similar to principal component analysis
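The SVD-based reduction can be sketched with NumPy (assumed to be installed). The small count matrix and the choice of k = 2 are arbitrary illustrations.

```python
import numpy as np

# Hypothetical 4-document x 5-term count matrix (rows = documents).
tdm = np.array([
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 3, 1, 0],
    [2, 0, 1, 0, 1],
], dtype=float)

# Thin SVD: tdm = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(tdm, full_matrices=False)

k = 2  # number of latent dimensions to keep (an arbitrary choice here)
doc_coords = U[:, :k] * s[:k]    # each document as a k-dimensional vector
approx = doc_coords @ Vt[:k, :]  # rank-k approximation of the original matrix

print(doc_coords.shape)  # (4, 2)
```

Each document is now described by k numbers instead of one number per term, which is the dimensionality reduction the slide refers to.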
Text Mining Process
• Step 3: Extract patterns/knowledge
• Classification (text categorization)
• Clustering (natural groupings of text)
• Improve search recall
• Improve search precision
• Scatter/gather
• Query-specific clustering
• Association rules
• Trend Analysis
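Clustering and search both rely on a measure of how similar two documents are. A minimal sketch using cosine similarity on rows of a term-by-document matrix follows; cosine similarity is a common choice assumed here, not one named in the slides, and the vectors are invented.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical rows of a term-by-document matrix.
doc_a = [1, 0, 1, 0]
doc_b = [2, 0, 2, 0]  # same terms as doc_a, just longer
doc_c = [0, 3, 0, 1]  # shares no terms with doc_a

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

Documents with a similarity near 1 are candidates for the same cluster; those near 0 share little vocabulary.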
Exploratory data analysis
• Gives insight about the data such as:
• Class distribution
• Top occurring words in the dataset
• Distribution of words per document

• These insights help in formulating solution strategies for the task
• What preprocessing should be used?
• What classifier should be used?

• Sentiment Polarity Detection Dataset (example)
• Clothing products review text, reviewer info, rating and sentiment
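The three insights listed above can be computed with a few lines of standard-library code. The labeled reviews below are invented placeholders, not rows from the dataset mentioned in the slide.

```python
from collections import Counter

# Hypothetical labeled reviews, invented for illustration.
reviews = [
    ("love this dress fits well", "positive"),
    ("fabric feels cheap", "negative"),
    ("love the color and the fit", "positive"),
]

class_distribution = Counter(label for _, label in reviews)
word_counts = Counter(w for text, _ in reviews for w in text.split())
words_per_doc = [len(text.split()) for text, _ in reviews]

print(class_distribution)          # how balanced the classes are
print(word_counts.most_common(3))  # top occurring words
print(words_per_doc)               # distribution of words per document
```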
Exploratory data analysis
Sentiment Polarity
Detection Dataset
• Sentiment labels
• Rating distribution
• Distribution of age of
the reviewers
• Distribution of the text
length of the reviews
• Reviews per
department
Exploratory data analysis
• Frequency of top unigrams
before removing stopwords
• Frequency of top unigrams
after removing stopwords
• Frequency of top bigrams
before removing stopwords
• Frequency of top bigrams
after removing stopwords
• Frequency of top trigrams
before removing stopwords
• Frequency of top trigrams
after removing stopwords
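The before/after comparison of n-gram frequencies can be sketched as below. The stop-word list, the `top_ngrams` helper, and the two sample texts are assumptions for illustration.

```python
from collections import Counter

STOP_WORDS = {"the", "is", "of", "a", "in", "and"}  # hypothetical short list

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(texts, n, k, remove_stop_words=False):
    """Count n-grams over all texts and return the k most frequent."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        if remove_stop_words:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)

texts = ["the supply chain is in the news", "the supply chain shortage"]
print(top_ngrams(texts, n=1, k=1))                          # dominated by "the"
print(top_ngrams(texts, n=2, k=1, remove_stop_words=True))  # ("supply", "chain")
```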
Exploratory data analysis

A word cloud is an integral tool for text EDA


Mining Text from news media/Twitter
Case study of several clusters of words

• Trade in Asia
• Logistics
• Macroeconomic issues
• China issues
• Energy issues
• Semiconductor
shortages
• Automobile supply chain
Sadeek and Hanaoka, 2023. Social Network Analysis and Mining
Trade in Asia
• Risks in Asian Market
• Trade in Asia
• The combination: “asia world time china japan people nikkei
political system covid lot big markets point coming real risks.”
• This word combination indicates plausible risks in the Asian
market
• A cluster of “trade japan taiwan economic south india korea
china security minister australia president indopacific,”
• This indicates a trade issue in the Asian region
Logistics and supply chain: Shipping, Port and Logistics
• The word combination: “shipping port ports goods biden Canada cargo transport container freight logistics air
president white house.”
• This cluster covers issues related to shipping, ports, containers, freight, logistics, cargo, etc. The presence of other
words, such as biden, Canada, president, white house, may be indicative of the truck driver strikes at the USA–Canada
border.
• Retailers and Shopping risk
• The cluster is: “labor company stores holiday chain products forced retailers supply retail store vietnam cotton online
xinjiang.”
• Examining these words suggests that, during Omicron, Christmas and Black Friday might be dominant occasions in terms of
sales, and that possible labor shortages may be experienced during this time
• Supply Chain Revenue - Regional Supply Chain - Supply Chain Shortage
• Supply chain shortages were very common.
• From the beginning of the pandemic, “global shortage” and “disruptions” had been in the news constantly.
• In addition, “Regional Supply Chain” is represented by “billion company business companies group market investors
yuan million financial arm capital.”
• The word “regional” may indicate financial investment in the local market
Macroeconomic issues
• Price Surge risk
• The combination: “inflation bank rate central rates policy prices interest monetary fed
economy higher price market global consumer.”
• This topic represents the issue of price hikes, inflation and struggles experienced by
financial policymakers. “Price Surge” has been a problem since the beginning of the
COVID-19 pandemic due to issues of international trade
• Raw Material Import–Export
• It reflects issues in the importing and exporting of raw materials.
• Economic Growth
• The combination: “growth economic economy exports supply demand covid pandemic
domestic quarter expected gdp.”
• The pandemic and the war have both decreased economic growth in most countries.
Economic growth largely depends on imports, exports and trade. Therefore, news media
identified this topic as an important risk for supply chain management.
China issues
• China's Foreign Strategy
• China’s Covid Policy
• Due to China’s international policy, supply chain operations have
experienced issues related to trade, port congestion, and air and maritime
transport
• Energy issues
• Food and Oil Price Hit
• Energy Supply
• Energy prices have increased due to Russia and Ukraine no longer
exporting gas and oil
• Many countries are experiencing energy price hikes
Semiconductor shortage
• Chip Industry and Shortage
• Electronic Parts Production
• The word combination: “chip semiconductor chips industry taiwan manufacturing samsung global
billion production supply,”
• Another cluster “apple nikkei asia production told foxconn components suppliers
company iphone china tech supply maker.”
• These clusters of words clearly refer to the supply side of chip production and the
potential shortage of semiconductors for tech giants.
• Automobile supply chain
• Electric Car Production
• Automobile Supply Chain
• Supply chain disruption is largely related to the supply and production of electric and
electronic parts, semiconductor supply and energy supply (e.g., in Japan)
Logistics and supply chain issues
• In terms of logistics, some topics detected on Twitter:
• Cargo Shipping Restrictions
• Supply Delay—Retailers
• Supply Shortage—War
• Supply Chain Resiliency
• Logistics Tension
Text Mining Tools
• Commercial Software Tools
• SPSS PASW Text Miner
• Statistica Data Miner
• Free Software Tools
• Netlytic (https://netlytic.org/)
• Voyant tool (https://voyant-tools.org/)
• ATLAS.ti (https://atlasti.com/)
• Topic modeling tool from Google code
(https://code.google.com/archive/p/topic-modeling-tool/)
Vector space modeling
• Set-of-Words: Documents
represented by vectors $\in \{0, 1\}^{|\Sigma|}$
• Bag-of-Words: Documents
represented by term-frequency
vectors $\in \mathbb{N}^{|\Sigma|}$
Issues with Sets and Bag of Words
• The representation has high
computational complexity
• Dimensionality blow up, |Σ| could
be very large
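The two representations differ only in whether a cell records presence or a count. A minimal sketch, with a hypothetical four-word vocabulary standing in for Σ:

```python
vocabulary = ["ball", "bat", "pitch", "run"]  # hypothetical vocabulary Σ

def set_of_words(tokens, vocabulary):
    """Binary vector: 1 if the term occurs in the document, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]

def bag_of_words(tokens, vocabulary):
    """Term-frequency vector: how many times each term occurs."""
    return [tokens.count(term) for term in vocabulary]

tokens = "bat hits ball ball goes for a run".split()
print(set_of_words(tokens, vocabulary))  # [1, 1, 0, 1]
print(bag_of_words(tokens, vocabulary))  # [2, 1, 0, 1]
```

Both vectors have one entry per vocabulary term, so their length grows with |Σ| regardless of how short the document is.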
Vector space modeling
Tf-idf (term frequency-inverse document frequency) is a more refined
model for selecting features to represent texts
• Key idea is to find special words characterizing the document
• Frequency:
• Naive assumption: the most frequent words are the most significant in a doc
• But the most frequent words (“the”, “are”, “and”) help structure English sentences and build
ideas, and are not significant in characterizing documents
• Rarity: rare words are indicators of topics
• Words that are rare overall but concentrated in a few docs, e.g., “batsman”, “prime-minister”
• ball, bat, pitch, catch, run ⇒ cricket-related doc
TF-IDF

▪ $df_i$ = document frequency of term $i$ = number of documents containing term $i$
▪ $IDF_i$ = inverse document frequency of term $i$: $IDF_i = \log(N / df_i)$, where $N$ is the total number of documents
TF-IDF
TF-IDF-Example

$IDF_i = \log(N / df_i)$
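The IDF formula can be tried on a toy corpus. This sketch assumes the term weight is the raw term frequency multiplied by the IDF and uses the natural logarithm; the slides do not fix either choice, and the three documents are invented.

```python
import math
from collections import Counter

# Hypothetical tokenized documents.
docs = [
    "the batsman hit the ball".split(),
    "the prime-minister gave the speech".split(),
    "the ball and the bat".split(),
]

N = len(docs)
# df[t] = number of documents containing term t
df = Counter(term for doc in docs for term in set(doc))
idf = {term: math.log(N / count) for term, count in df.items()}

def tf_idf(doc):
    """Weight each term in a document by tf * IDF (assumed weighting)."""
    tf = Counter(doc)
    return {term: tf[term] * idf[term] for term in tf}

weights = tf_idf(docs[0])
for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

"the" appears in every document, so its IDF is log(3/3) = 0 and it drops out, while "batsman" appears in only one document and receives the highest weight.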


Natural Language Processing (NLP)
• Structuring a collection of text
• Old approach: bag-of-words
• New approach: natural language processing
• NLP is
• a very important concept in text mining
• a subfield of artificial intelligence and computational linguistics
• the study of "understanding" natural human language
• Syntax versus semantics-based text mining
Natural Language Processing (NLP)
• What is “Understanding” ?
• Humans understand; what about computers?
• Natural language is vague and context-driven
• True understanding requires extensive knowledge of a topic
Natural Language Processing (NLP)
• Challenges in NLP
• Part-of-speech tagging
• Text segmentation
• Word sense disambiguation
• Syntax ambiguity
• Imperfect or irregular input
• Speech acts

• Dream of AI community
• to have algorithms that are capable of automatically reading and
obtaining knowledge from text
NLP Task Categories
• Information retrieval
• Information extraction
• Named-entity recognition
• Question answering
• Automatic summarization
• Natural language generation and understanding
• Machine translation
• Foreign language reading and writing
• Text proofing
Use of NLP in SC logistics
Customer and partner communication:
• NLP-enabled chatbots are perfect for automating customer service
tasks and communicating with logistics partners
• They can assist with order tracking, scheduling, and resolving issues
Document automation
• In logistics, the processing of various documents like invoices,
shipping labels, and customs forms is tedious but critical
• NLP can automate this process by reading and interpreting text data,
even in different languages and formats, and populating databases or
systems as required
Use of NLP in SC logistics
Real-time monitoring and alerts
• Advanced NLP systems can analyze textual data from multiple sources
like emails, social media, news outlets, and other public records to
generate real-time alerts about potential disruptions in logistics
operations
• For example, they can send warnings about possible strikes, road
closures, or natural disasters that could impact the supply chain
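The alerting idea can be sketched with a naive keyword watch list. This is a simple stand-in for an NLP system, not one: the `WATCH_LIST` contents, the `detect_alerts` helper, and the sample headline are all assumptions for illustration.

```python
import re

# Hypothetical watch list mapping a disruption type to trigger words.
WATCH_LIST = {
    "strike": {"strike", "walkout"},
    "road closure": {"closure", "closed", "blockade"},
    "natural disaster": {"flood", "earthquake", "typhoon", "hurricane"},
}

def detect_alerts(message):
    """Return the disruption types whose trigger words appear in the message."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    return sorted(kind for kind, words in WATCH_LIST.items() if tokens & words)

headline = "Port workers announce strike as typhoon nears"  # invented example
print(detect_alerts(headline))  # ['natural disaster', 'strike']
```

A production system would need to handle negation, word sense, and entity resolution (which port, which road), which is where the NLP challenges listed earlier come in.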
