Article info

Keywords:
Text mining
Data visualization tools
Patent information
Intellectual property analysis
Patent landscape analysis
Business intelligence

Abstract

A patent landscape analysis can be defined as a state-of-the-art patent search that provides graphic representations of information from search results. The focus is patents and patent applications from a given technology area or company patent portfolio. Unlike a traditional state-of-the-art search which provides relevant information in text format, patent landscape analysis provides graphics and charts to demonstrate patenting trends, leading patent assignees, collaboration partners, white space analysis, technology evaluations, etc.
In this article, we will illustrate two case studies from a more in-depth evaluation of some text mining tools. Output from these tools may be integrated into patent analysis workflow to yield critical visual views of the data and actionable business intelligence.
© 2009 Elsevier Ltd. All rights reserved.
In 2006, we initiated a study to investigate text mining and visualization tools that were available on the market. In Phase I, we examined three groups of tools and identified their strengths and potential limitations from a patent analyst perspective. These experiences have been summarized in an article in World Patent Information [1]. Previous studies by others have also evaluated one or more of these tools [2,3].
In mid 2007, we started our Phase II activities which included ongoing monitoring of additional tools and collaborations such as Megaputer, Innography, IFI PatentAtlas, Linguamatics, IBM Text Analytics and IRF Matrixware. In this article we describe two case studies focused on two types of patent landscape analysis using only the following tools: VantagePoint, STN AnalyzePlus and STN AnaVist.
Patent landscape analysis provides a break-down of patents and applications from a specific technology area or company patent portfolio. A patent landscape analysis report typically presents (i) a visual representation of information derived from a state-of-the-art search, (ii) graphics and charts that show patenting trends, key patent assignees, collaboration partners, and technology evaluations and (iii) text information related to the graphs and charts in an interactive database that comes with the analysis report.

Criteria for selection of tools for further study were based on ease of use and ability to work with the structured datasets that can be routinely downloaded from online databases. Cost was also a consideration. Some tools were excluded because their complexity or capabilities exceeded our requirements.
Tool selection was based on the following observations:

VantagePoint
o relatively inexpensive
o some unique features such as list generation, list clean up, and cross-correlation maps

STN AnalyzePlus
o easy to use tool that adds visualization to the familiar analyze command
o inexpensive
o quick and simple to access and generate charts

STN AnaVist
o free software
o more options compared to AnalyzePlus
o requires no filters for data import
o natural progression to this post processing tool since we are very familiar with STN
These three tools were chosen since they were thought to work well with structured data. All three tools enable quick processing of large sets of information in a short period of time and provide the basic charts and matrices to answer the who, when, where, and what type of questions.

This article has been developed from a presentation by the authors at the PIUG 2009 Annual Conference, San Antonio, TX, USA, May 2–7, 2009.
* Corresponding author. Tel.: +1 609 818 4726.
E-mail address: yunyun.yang@bms.com (Y. Yang).
0172-2190/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.wpi.2009.12.006
204 Y. Yang et al. / World Patent Information 32 (2010) 203–220

3. Brief description of selected tools

3.1. VantagePoint

VantagePoint is empowered by natural language processing (NLP). It allows rapid navigation through structured text, such as bibliographic information obtained from online hosts, to discover hidden patterns, trends, and relationships. Lists generated from various fields (including NLP phrases from title, abstracts, or claims) can be cleaned, using pre-defined thesauri, so that the concepts can be grouped, clustered, or categorized. The output of VantagePoint analysis is a series of matrices, factor maps, correlation maps, and charts.
VantagePoint comes with a reader module that enables end-users to interact with, but not edit, the output and data analyzed by a professional information analyst.
Fig. 1 shows a screen shot of data fields imported from Derwent World Patents Index (DWPI) for analysis by VantagePoint. The list is database dependent.

3.2. STN Analyze Plus

STN Analyze Plus is available with STN Express, Version 7.01 and higher. The Analyze Plus Wizard lets you easily (i) analyze, cross-tabulate, and chart data in Microsoft Excel from single- or multi-file search results for one or two fields of data, and (ii) group
4.1. Technology patent landscape – case study 1 – hepatitis C virus inhibitors

In general, a technology patent landscape includes a high-level visualization of a large dataset of patent information and provides a quick overview of the various segments in a particular technology domain to support decision-making. The data may be organized and highlighted to show patent filing trends, patent counts by country, top patenting organizations and other details. It may also be used to identify potential collaboration, acquisition and merger opportunities.
The primary objective of this case study was to determine if the three chosen tools were useful for technology assessment using patent information.
In the case study of HCV inhibitors, the following questions are addressed:

o Who are the major players?
o When have they been patenting and how active are they currently?
o Which companies and inventors are collaborating with whom?
o What HCV inhibitor chemotypes (type of compounds) are they working on?

4.1.1. VantagePoint

The dataset was created by running a simple text search in the MicroPatent full-text database (US, WO and EP patent documents), limited to patent documents published from 2001 to 2007. The following keywords or concepts were searched in the title, abstract and claims:

((ns1* OR ns2* OR ns3* OR ns4* OR ns5*) OR (((non ADJ struct*) OR nonstruct*) ADJ3 protein*1)) AND (HCV or hepatitis ADJ C) AND (inhibit* or *agoni* or block* or modulat*).

The search strategy was not intended to be comprehensive for this particular case study. The search retrieved 540 patent documents which represented 299 unique inventions (patent family groups). This dataset of 299 patent families was used for VantagePoint analysis.
Chart 1 is a 3D chart showing the top 15 patent assignees out of a total of 94 patent assignees in this dataset. Two thirds of these assignees had only one patent. The patenting activity was measured by plotting priority year vs. number of patents. Hoffmann-La Roche started filing patents in 2005. GSK started in 2004, but was less active compared to Roche. Boehringer Ingelheim started at least as early as 1998 and peaked in 2003 with 17 unique inventions.
Note that intellectual effort was required to clean up the assignee name list in VantagePoint to create Chart 1. In contrast, as you will see later, a company name thesaurus can be automatically applied to generate a standardized assignee list with minimal, if any, intellectual effort.

Fig. 4. STN Analyze Plus sample standardized company and inventor name clustering.

Chart 2 is a cross-correlation map showing 13 top assignees correlated by their major inventors. This map illustrates interactions among the various assignees based on whether documents have common inventors. Each dot, or node, represents one assignee, and in VantagePoint, moving the cursor over each dot brings up a list of the inventors associated with the patents that each dot represents.
Assignees connected by a solid line may be collaborating on certain aspects of their HCV research, for example Schering and Dendreon. Another example is Merck, which appears to be associated with three different partners. On the other hand, Bristol-Myers Squibb and Boehringer Ingelheim appear to be working independently in this area.
Chart 3 is a correlation map showing major assignees grouped together on the basis of International Patent Classification codes (IPCs, version 2007.01).
A cluster of patent assignees is shown in the lower left corner, suggesting that these companies, which include Bristol-Myers Squibb, are developing similar types of compounds as HCV inhibitors, while Chiron, Pfizer and others use different approaches.
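The idea behind the cross-correlation map in Chart 2, linking assignees whose documents share inventors, can be illustrated with a minimal Python sketch. This is not VantagePoint's algorithm, which is not described here; the record layout, the function names and all company and inventor names below are hypothetical and serve only to show the principle.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (assignee, inventor names on one document).
# All names are invented for illustration only.
records = [
    ("Company A", ["Smith J", "Lee K"]),
    ("Company B", ["Lee K", "Patel R"]),
    ("Company C", ["Garcia M"]),
    ("Company A", ["Smith J"]),
]


def inventors_by_assignee(records):
    """Collect the set of inventor names seen on each assignee's documents."""
    result = defaultdict(set)
    for assignee, inventors in records:
        result[assignee].update(inventors)
    return dict(result)


def shared_inventor_links(records):
    """Return {(assignee_1, assignee_2): shared inventors} for overlapping pairs."""
    by_assignee = inventors_by_assignee(records)
    links = {}
    for a, b in combinations(sorted(by_assignee), 2):
        shared = by_assignee[a] & by_assignee[b]
        if shared:  # a non-empty overlap corresponds to a line on the map
            links[(a, b)] = shared
    return links


links = shared_inventor_links(records)
```

In this toy dataset only Company A and Company B end up linked. With real data, a link of this kind only suggests a possible collaboration, since identical inventor names can also belong to different people or reflect a change of employer.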
Hovering over a patent assignee node displays a typical drop-down list. This example illustrates IPC codes associated with Chiron. Clicking on a particular assignee node generates a list of documents associated with that assignee.
So, while Chart 2 shows definite correlations between assignees based on inventor analysis, Chart 3 shows more possible technological relationships between assignees based on the IPC codes that describe the inventive concepts. These codes are assigned independently by the various patenting offices.

4.1.2. STN Analyze Plus

The dataset was created by running the same simple text search described earlier for VantagePoint. This search was conducted on STN in the Chemical Abstracts database and was limited to basic patents (as defined by Chemical Abstracts Service) published from 2001 to 2007, with keywords appearing in the title and abstract. The search retrieved 294 unique patent inventions, which were submitted for analysis by STN Analyze Plus.
STN Analyze Plus uses STN standardized company names. This feature allows for analysis of complete company information (patent assignee) without manual clustering. Similar automatic clustering may be done with inventor names as well. Fig. 4 illustrates this automatic clustering for patent assignees and inventors.
Chart 4 shows the patenting activities of the top 16 patent assignees out of a total of 82 patent assignees in this dataset. The patenting activity was measured by plotting patent assignee vs. publication year. This chart shows that the top four assignees have had consistent patent activity in this HCV inhibitor area from 2001 to 2007, with BMS peaking in 2007, Schering in 2005, and Boehringer in 2003.
Chart 5 shows the top five IPC codes plotted against the top five patent assignees. They have significant patenting activity in A61P0031 (therapeutic activity of chemical compounds or medicinal preparations), A61K0031 (medicinal preparations containing organic active ingredients) and A61K0038 (medicinal preparations containing peptides) areas, followed closely by C07K005 (organic chemistry; peptides).
Chart 6 shows the patenting activity of the top 16 patent assignees in the A61P0031 area (therapeutic activity of chemical compounds or medicinal preparations). This was the most frequently used IPC code in the dataset.
Chart 7 shows the patenting activities of the top eight patent assignees vs. CAS controlled terms. Hepatitis C virus, antiviral agents and interferons are common controlled terms indexed from documents from the major patent assignees.

4.1.3. STN AnaVist

The dataset was the same as that used for STN Analyze Plus. STN AnaVist also uses STN standardized company names.
Chart 8 shows a 4-pane display of publication year, documents, assignees and research landscape. The year 2007 is highlighted and those assignees that have basic patents (as defined by Chemical Abstracts Service) that published in 2007 are also shown (interactive highlighting).
Chart 9 shows a matrix (partial) of patent assignee vs. priority application year (left table). When Schering is highlighted (right table), Dendreon also becomes highlighted. One might deduce that five of the six Dendreon patents may have resulted from collaboration with Schering. Thus Dendreon and Schering may have partnered in this area prior to 2000.
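The assignee-by-year matrices described for Charts 4 and 9 rest on two steps: mapping assignee name variants to one standardized name, and counting documents per (assignee, year) cell. The following Python sketch shows both steps under stated assumptions. The thesaurus entries, the company names and the `standardize` and `assignee_year_matrix` helpers are hypothetical; they do not reproduce the STN company thesaurus or the STN tools.

```python
from collections import Counter

# Hypothetical thesaurus: normalized name variant -> standardized name.
# A real company thesaurus is far larger and curated; this is illustrative.
THESAURUS = {
    "acme pharma inc": "Acme Pharma",
    "acme pharmaceuticals": "Acme Pharma",
    "beta biotech ltd": "Beta Biotech",
}


def standardize(name):
    """Map a raw assignee string to its standardized form, if one is known."""
    key = " ".join(name.lower().replace(".", "").replace(",", "").split())
    return THESAURUS.get(key, name.strip())


def assignee_year_matrix(records):
    """Count documents per (standardized assignee, year) cell."""
    counts = Counter()
    for assignee, year in records:
        counts[(standardize(assignee), year)] += 1
    return counts


# Hypothetical (assignee, year) pairs as they might appear in raw records.
records = [
    ("Acme Pharma, Inc.", 2003),
    ("ACME Pharmaceuticals", 2003),
    ("Beta Biotech Ltd.", 2005),
    ("Gamma Labs", 2005),
]
matrix = assignee_year_matrix(records)
```

Without the thesaurus step, the two Acme variants would be counted as separate assignees, which is the manual clean-up burden noted for VantagePoint in Section 4.1.1.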
[Chart residue: vertical axis 0–14; horizontal axis years 1995–2006; legend: None given; HSP-90 Inhibitors; Cancer Cell Growth Inhibitors; Motilin Agonists; Peptide Inhibitors; Bacterial Growth Inhibitors; Tubulin Polymerization; Megalomicin Synthesis; Hydroxylase; Gene therapy; GPCR agonist; Antibiotic]
Chart 10 shows the research landscape for this particular set of HCV inhibitor documents. There is a large cluster of documents in the "viral nucleoside" space as well as the "protein" space.

4.1.4. Summary of case study 1

The objectives of case study 1 were met. Results show that VantagePoint, STN Analyze Plus and STN AnaVist are useful for technology assessment. Furthermore, the MicroPatent dataset was successfully imported into VantagePoint and subsequently used to assess the patent landscape for hepatitis C virus inhibitors. The corporate thesaurus function on STN proved to be very valuable for automatic clustering of patent assignee names.

4.2. Company patent landscape – case study 2 – Kosan Biosciences

A company patent landscape may provide a patent intelligence report that consists of visual representations of the patent assets of a specific company. It may also provide insights about the company in terms of patenting trends, geographic protections, top inventors, research focus and collaboration partners. The landscape may also illustrate the company's relative patenting strength and potential opportunities for future collaborations.
The primary objective in this case study was to see if the three chosen tools were useful for a company patent landscape. Specifically for VantagePoint, we wanted to see if the dataset from DWPI worked well with value added abstract sub-fields for novelty, use, and mechanism of action.
Typical questions for company patent landscape analysis include:

o Where does the company patent?
o Who are they collaborating with?
o Who are their top inventors?
o What are their research teams?
o What is each of the research teams working on?
o What is the research focus in terms of Novelty?
o What is the research focus in terms of Use?
o What is the research focus in terms of Mechanism of Action?
o What research have they done recently?
o What research may have been abandoned?

[Chart residue: vertical axis 0–7; horizontal axis years 1995–2006; legend: Polyketides; Hyperproliferative Disease; Cancer; Anti-Infective Agents; Gastric Motility Diseases; Recombinant DNA; Disease; For Treating Multiple Myeloma; For the Production of Synthetic Genes/Libraries; Chemical Deriv.; Epothilone Deriv.; Erythromycin Deriv.]

4.2.1. VantagePoint

A patent assignee name search (/pa) was performed in both HCAplus and DWPI files on STN. In HCAplus file, the company
name was expanded using CAS standardized company thesaurus (e kosan/co) to ensure a comprehensive search result. In DWPI, expanding the Derwent Patent Assignee Code KOSA-N (e kosa-n/paco) helped to quickly identify all company names associated with Kosan Biosciences, which were then searched in the patent assignee (/pa) field. The results were combined in DWPI and duplicates were removed. The final dataset of 123 patent family records was downloaded from DWPI.
Chart 11 is an auto-correlation map that shows who is working with whom within Kosan. It may indicate that Zhou, Meller and Johnson work in the same program.
Chart 12 shows the mechanism of action sub-field of the DWPI abstract and what programs are active. Heat-shock-protein-90 is the topic of 15 patents. There was a spike in patent filing in 2004 for HSP-90 inhibitors and the Motilin agonist program was active in 2006. Other mechanisms of action include cancer cell growth inhibitors and Motilin agonists. The NOT-GIVEN category comes from patents where the mechanisms of action are not disclosed or indexed.
Chart 13 plots the numbers of patents/patent applications vs. patent priority filing years in terms of Use (also a DWPI abstract sub-field). For example, the Use term "Erythromycin Derivatives" is found in documents filed in 1998 and 1999, but there was no related patenting activity after 1999. This program may have been discontinued or abandoned.

4.2.2. STN Analyze Plus

A patent assignee name search (/pa) was performed in HCAplus and DWPI files on STN. In HCAplus file, the company name was expanded using CAS standardized company thesaurus (e kosan/co) to ensure a comprehensive search result. This search result was used to generate Chart 14 below.
Charts 15–18 were generated using the dataset described in Section 4.2.1.
Chart 14 shows a two-field analysis using CAS controlled terms vs. the publication year, using the Chemical Abstracts dataset. The chart shows the steady publication activity in the areas of "gene, microbial" and "polyketides". The chart also shows a decreased publication trend in "molecular cloning" since 2004.
Chart 15 shows Kosan's main research focus in terms of IPC codes, using the DWPI dataset. The most frequently assigned is C12N0015 (mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; use of hosts therefor) and subclass 52 (genes encoding for enzymes or proenzymes for DNA or RNA fragments; modified forms thereof in recombinant DNA-technology) [4].
Chart 16 was created using Options (see Options → Custom → Field name and code), using the DWPI dataset. The STN Analyze Plus algorithm parses the sentences or phrases into individual words, so the output is not useful.
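The word-level parsing problem noted for Chart 16 can be made concrete with a small Python sketch that contrasts counting individual words with counting whole phrases. The field values are hypothetical examples modeled on the mechanism-of-action terms mentioned in this article, and whitespace splitting is only an assumed stand-in for how a tool might break text into words.

```python
from collections import Counter

# Hypothetical mechanism-of-action field values, one per document.
values = [
    "HSP-90 inhibitor",
    "Motilin agonist",
    "HSP-90 inhibitor",
    "Cancer cell growth inhibitor",
]


def count_words(values):
    """Count individual words, as happens when phrases are split apart."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


def count_phrases(values):
    """Count each field value as one intact phrase."""
    return Counter(value.strip().lower() for value in values)


word_counts = count_words(values)
phrase_counts = count_phrases(values)
```

At word level the most frequent term is the generic word "inhibitor", which merges unrelated programs, whereas the phrase-level count keeps "hsp-90 inhibitor" and "cancer cell growth inhibitor" apart.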
Please note that although STN Analyze Plus offers the list clean up option to group, ungroup, or rename the words, it is very difficult to clean up the list from the Derwent value added sub-fields such as Mechanism of Action (ACTN field), without knowing them beforehand.
Chart 17 shows title words in the Derwent Title Term field plotted against publication year, using the DWPI dataset. The chart is not useful since only the individual words, and not phrases, were plotted. However, if the title words were cleaned or re-grouped, the visualization output may present more useful insight as shown in Chart 18.
Chart 18, using the DWPI dataset, shows the cleaned Derwent title terms which could be potentially useful to depict the company's research areas. "Cancer" appears to be a major research focus in recent years, while "antibiotic" has decreased since 2000.

4.2.3. STN AnaVist

A patent assignee name search (/pa) was performed in HCAplus and DWPI files on STN. Charts 19 and 20 were generated using the dataset described in Section 4.2.1.
In HCAplus file, the company name was expanded using CAS standardized company thesaurus (e kosan/co) to ensure a comprehensive search result. This search result was used to generate Chart 21 below.
Chart 19 shows the Research Landscape generated from the titles/abstracts clustering field of the DWPI dataset. Each document is represented once on the landscape by a dot. The two most frequently occurring concepts are displayed in a cluster area. Clusters of documents with similar content indicate possible research areas.
Chart 20 is a four-panel view of clustering Concepts, Publication Year Trends, Key Researchers, and document titles charts. Yellow highlight shows the clustering concepts, year of publication, and the inventor names that contribute to the 14 documents selected from the "hsp 90" cluster from the Research Landscape shown in Chart 19 (the area not highlighted is non-contributing).

Chart 20. Clustering concepts, publication year trends, key researchers, and document titles.

Chart 21 shows the top 22 Technology Indicators out of 323 Technology Indicators generated from the entire dataset. Technology Indicators are CAS controlled terms that are only available in documents from CAplus, and chemistry-relevant USPATFULL and USPAT2 documents that have been enhanced with indexing.

4.2.4. Summary of case study 2

The objectives of case study 2 were met. Results show that VantagePoint, STN Analyze Plus and STN AnaVist were useful for generating a company patent landscape. The dataset from DWPI, especially the value added abstract sub-fields for novelty, use, and mechanism of action, worked very well with VantagePoint. However, STN Analyze Plus and STN AnaVist do not have this capability for in-depth analysis using DWPI sub-fields.

5. Overall tool function summary

In summary, all three tools accomplished the objectives with respect to technology and company landscape analysis. Features that stand out for VantagePoint include: sub-field analysis for Derwent value added information, such as mechanism of action, novelty and use, etc.; correlation maps and matrices that have the potential to provide insights into collaboration amongst companies and their researchers; and the VP reader.
An outstanding feature that is common to STN Analyze Plus and STN AnaVist is the ability to use the standardized company name thesaurus that provides an automatic comprehensive listing of companies, with minimal need for additional manual clean up. Features that stand out for STN Analyze Plus include: easy to generate visualization that is quick and cost effective per analysis; and flexibility that allows users to manipulate the charts generated in Excel and re-create charts at will. Features that stand out for STN AnaVist include: easy, automatic generation of three default visualization charts; technology indicators (unique to Chemical Abstracts datasets) and research landscape visualization charts; and an interactive highlighting manager that permits one to see contributing and non-contributing relationships in multiple charts.
The key features for VantagePoint, STN Analyze Plus and STN AnaVist are summarized in Table 1. The perceived strengths and limitations are summarized in Table 2.

Table 1
Summary of tool features.
Table 2
Perceived strengths and limitations summary.

Strengths

VantagePoint
o Analytical toolbox for technology and company assessment
o Sophisticated visualization output, i.e. correlation maps, factor maps, matrices, Excel charts
o Can provide further in-depth analysis on abstracts, particularly DWPI abstracts for the analysis of mechanism of action, novelty, and use
o A customized report to include the key information in a table format
o VP reader to enable file sharing with end-users
o Can work with multiple database sources, utilizing filters

STN Analyze Plus
o Use of the CA company thesaurus – this eliminates the need for manual clean up of the assignee field and provides a comprehensive listing of companies, including mergers and acquisitions (this feature greatly improves the precision of the information output)
o The charts are easy to generate and easy to manipulate

STN AnaVist
o Use of the CA company thesaurus – this eliminates the need for manual clean up of the assignee field and provides a comprehensive listing of companies, including mergers and acquisitions
o The charts and matrices are easy to generate and easy to navigate
o The data is interactive
o Highlighting manager: (a) automatically updates other related charts (b) multiple colors to perform comparative analysis

Limitations

VantagePoint
o List clean up can be tedious and time consuming
o Steep, initial learning curve
o Does not handle chemistry-related text well

STN Analyze Plus
o Limited number of default fields which can be analyzed
o Cannot cross-analyze the same field, e.g. cannot analyze author/inventor vs. author/inventor or corporate source/patent assignee vs. corporate source/patent assignee
o Limitation on charting capabilities (although there are several options for creating charts in Excel, such as column, bar, pie, among others, there is no capability for creating more complex matrices)
o Cost accumulation when generating multiple visualization charts
o Limited to databases on STN network

STN AnaVist
o Limitation on the clustering fields at the visualization stage
o Limitation on the default fields from which you can analyze (cannot analyze Derwent abstract sub-fields, i.e. mechanism of action, novelty, use, etc.)
o No Reader available – it is not possible to share .shx files across service centers at this time (however, .xta files may be shared between users with full-access STN login IDs)
o Limited to STN databases: CAplus; WPINDEX, WPIDS, or WPIX (access to WPIDS and WPIX for subscribers only); USPATFULL, including USPAT2; PCTFULL; EPFULL