
From Words to Meaning to Insight

Julia Cretchley & Mike Neal


Outline

 Content Analysis
 What is Leximancer?
 Steps to your first analysis
 In-depth Leximancer
What Is Leximancer?

 Leximancer is a software tool designed for analyzing natural language text data
 Uses statistics-based algorithms
• Initial analysis in minutes
 Automatically analyzes a text collection
• User can direct search, add, remove, merge terms
 Extracts semantic (meaning) and relational
information (more later)
 Outputs include concept map, network cloud,
quantitative data, concept thesaurus
Leximancer Overview

Text
Let’s Look at Some Text

"We use the Laser 500 printer here at the office. We are pretty
happy with it. Once there was a leak and all the toner spilled
out of the machine, but a technician came out and fixed the
problem for us. We still have to top the toner up often. The
printer goes through ink quickly and the cartridges are
expensive, but we put up with this because it delivers good
results reliably. We are pleased with the quality of printing we
get. The Laser 500 can batch process, and collate the pages to
save us time. Sometimes paper gets jammed in the Laser 500.
Then we have to open it up to remove the crumpled paper.
We have tried other machines in the past, but have not found
an alternative that works better for us.”
What is this text about? (one main topic)
Concept Extraction

 Terms around a word indicate its meaning

 Word associations discover concepts; language-independent

 Leximancer concept: A group of related words that travel together in the text
• Evidence words include synonyms and adjectives
 Concepts begin as seed words for coding and evolve into a thesaurus
• Concepts may be word-like, name-like (proper nouns), or compounds (e.g. United States)
Concept Extraction (cont.)

 A few things to note...


• Several concepts may be in a single sentence
• Concept may span multiple sentences
• Adjustable resolution (default: 2 sentences)
• Stop lists remove common words (the, and)
 Algorithms
• A block of text is coded with a concept only if enough evidence words for that concept are present to reach a threshold
• A concept can be coded from evidence words alone, even if the actual seed word (e.g. printer) is not present
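The threshold idea above can be sketched in a few lines of Python. This is a minimal illustration, not Leximancer's actual algorithm: the stop list, the thesaurus entries, their weights, and the `code_block` helper are all assumptions made up for the example.

```python
import re

# Hypothetical stop list; a real one would be much longer.
STOP_WORDS = {"the", "and", "a", "to", "we", "it", "in", "up", "have", "then"}

# Hypothetical thesaurus: evidence words with made-up weights per concept.
THESAURUS = {
    "printer": {"printer": 1.0, "toner": 0.8, "machine": 0.5, "printing": 0.7},
    "paper": {"paper": 1.0, "pages": 0.7, "crumpled": 0.6, "jammed": 0.6},
}

def tokens(block):
    """Lower-case the block, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", block.lower())
    return [w for w in words if w not in STOP_WORDS]

def code_block(block, threshold=1.0):
    """Return the concepts whose summed evidence weight reaches the threshold."""
    words = tokens(block)
    coded = set()
    for concept, evidence in THESAURUS.items():
        score = sum(evidence.get(w, 0.0) for w in words)
        if score >= threshold:
            coded.add(concept)
    return coded

block = (
    "Once there was a leak and all the toner spilled out of the machine. "
    "We still have to top the toner up often."
)
print(code_block(block))  # the word "printer" never appears in this block
```

Here the block is coded with the printer concept purely from the evidence words toner and machine, while a block with no evidence words is left uncoded.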
Concept Extraction Units of Resolution

"We use the laser 500 printer here at the office. We are pretty
happy with it. Once there was a leak and all the toner spilled
out of the machine, but a technician came out and fixed the
problem for us. We still have to top the toner up often. The
printer goes through ink quickly and the cartridges are
expensive, but we put up with this because it delivers good
results reliably. We are pleased with the quality of printing we
get. The laser 500 can batch process, and collate the pages to
save us time. Sometimes paper gets jammed in the laser 500.
Then we have to open it up to remove the crumpled pages. We
have tried other machines in the past, but have not found an
alternative that works better for us.”

Leximancer divides the text into two-sentence units (configurable)
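As a rough sketch of what dividing text into sentence units could look like, the Python below groups sentences into consecutive blocks of two. The `sentence_blocks` function and its naive sentence splitter are assumptions for illustration only; Leximancer's own segmentation is not described in these slides.

```python
import re

def sentence_blocks(text, sentences_per_block=2):
    """Split text into consecutive blocks of N sentences (naive splitter)."""
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_block])
        for i in range(0, len(sentences), sentences_per_block)
    ]

sample = (
    "We use the Laser 500 printer here at the office. "
    "We are pretty happy with it. "
    "Once there was a leak and all the toner spilled out of the machine, "
    "but a technician came out and fixed the problem for us. "
    "We still have to top the toner up often. "
    "Sometimes paper gets jammed in the Laser 500."
)
blocks = sentence_blocks(sample)
for number, block in enumerate(blocks, start=1):
    print(number, block)
```

Changing `sentences_per_block` mimics adjusting the resolution: a larger value lets evidence words from neighbouring sentences count towards the same block.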


Concept Extraction Units of Resolution
"We use the Laser 500 printer here at the office. We are pretty
happy with it. Once there was a leak and all the toner spilled
out of the machine, but a technician came out and fixed the
problem for us. We still have to top the toner up often. The
printer goes through ink quickly and the cartridges are
expensive, but we put up with this because it delivers good
results reliably. We are pleased with the quality of printing we
get. The Laser 500 can batch process, and collate the pages to
save us time. Sometimes paper gets jammed in the Laser 500.
Then we have to open it up to remove the crumpled paper.
We have tried other machines in the past, but have not found
an alternative that works better for us.”

printer concept: laser 500, toner, machine, printing


paper concept: pages, crumpled, jammed
Semantic and Relational Analysis

 Semantic meaning created through conceptual analysis
• Presence and frequency of words and phrases
• Co-occurrence of words makes a concept
• Explicit and implicit concepts identified (e.g. tsunami and earthquake imply Japan)
 Relationships created through concept co-occurrence
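Concept co-occurrence can be illustrated by counting how often two concepts are coded in the same block of text. The sketch below assumes the blocks have already been coded; the `coded_blocks` list is hand-written for illustration and is not Leximancer output, and `cooccurrence_counts` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(coded_blocks):
    """Count, for every pair of concepts, the blocks in which both were coded."""
    counts = Counter()
    for concepts in coded_blocks:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return counts

# Hand-written example: one set of coded concepts per block of text.
coded_blocks = [
    {"printer"},
    {"printer"},
    {"printer"},
    {"printer", "paper"},
    {"printer", "paper"},
]
counts = cooccurrence_counts(coded_blocks)
print(counts[("paper", "printer")])
```

The more blocks two concepts share, the stronger the relationship between them.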
Themes and Concept Map

 Themes
• Collection of related concepts in close proximity on the map
• Theme name is most prominent concept
 Concept map display
• Size of a dot indicates the concept's frequency of occurrence
• Lines between concepts show relationships
• Map proximity reflects shared links, like shared connections on LinkedIn
 Concept map becomes interface to explore underlying text
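One simple way to picture themes is to group concepts that are linked by enough co-occurrences and name each group after its most frequent concept. The sketch below uses connected components as a stand-in for the map's real clustering, which these slides do not specify; the `themes_from_links` function, the `min_links` cutoff, and the sample frequencies and links (including the toner and technician concepts) are all hypothetical.

```python
def themes_from_links(frequency, links, min_links=2):
    """Group concepts joined by at least min_links co-occurrences into themes."""
    # Build an undirected adjacency map from sufficiently strong links.
    neighbours = {concept: set() for concept in frequency}
    for (a, b), count in links.items():
        if count >= min_links:
            neighbours[a].add(b)
            neighbours[b].add(a)

    themes, seen = {}, set()
    for start in frequency:
        if start in seen:
            continue
        # Depth-first walk collects one connected group of concepts.
        group, stack = set(), [start]
        while stack:
            concept = stack.pop()
            if concept in group:
                continue
            group.add(concept)
            stack.extend(neighbours[concept] - group)
        seen |= group
        # Name the theme after its most frequent (most prominent) concept.
        name = max(group, key=lambda c: frequency[c])
        themes[name] = group
    return themes

# Made-up concept frequencies and co-occurrence counts.
frequency = {"printer": 5, "paper": 2, "toner": 2, "technician": 1}
links = {
    ("paper", "printer"): 2,
    ("printer", "toner"): 2,
    ("technician", "toner"): 1,
}
themes = themes_from_links(frequency, links)
print(themes)
```

With this input, printer, paper, and toner fall into one theme named printer, while technician is linked too weakly and forms its own theme.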
Concept and Theme Creation

Evidence words (thesaurus)             Concepts

Laser 500, machine, toner, printing    printer
pages, crumpled, jammed                paper

2 co-occurrences of printer and paper
Additional Features

 Thesaurus (coding dictionary) automatically generated
• No manual coding required
• Profiling and directed coding supported
 Analyst can seed their own terms
 Sentiment lens feature for affective analysis
 Discourse analysis of speakers supported
 Survey data analysis supported
Key Points Summary

 Automated, statistical approach


• How do you do this manually?
• No data management, dictionary creation, or dictionary updates needed
 User does not have to formulate a coding scheme
• This saves time, and
• Avoids introduction of researcher bias (grounded theory)
 Nuances, subtleties, distinctions in expression
• Word association approach most likely to identify these
 Evidence words with links from Leximancer allow deeper exploration and documentation of findings
Questions?