
World Patent Information 32 (2010) 203–220

Contents lists available at ScienceDirect

World Patent Information


journal homepage: www.elsevier.com/locate/worpatin

Enhancing patent landscape analysis with visualization output ☆


Yun Yun Yang *, Lucy Akers, Cynthia Barcelon Yang, Thomas Klose, Shelley Pavlek
Bristol-Myers Squibb Co., P.O. Box 4000, Princeton, NJ 08543-4000, USA

ARTICLE INFO

Keywords:
Text mining
Data visualization tools
Patent information
Intellectual property analysis
Patent landscape analysis
Business intelligence

ABSTRACT

A patent landscape analysis can be defined as a state-of-the-art patent search that provides graphic representations of information from search results. The focus is patents and patent applications from a given technology area or company patent portfolio. Unlike a traditional state-of-the-art search which provides relevant information in text format, patent landscape analysis provides graphics and charts to demonstrate patenting trends, leading patent assignees, collaboration partners, white space analysis, technology evaluations, etc.
In this article, we will illustrate two case studies from a more in-depth evaluation of some text mining tools. Output from these tools may be integrated into patent analysis workflow to yield critical visual views of the data and actionable business intelligence.
© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

In 2006, we initiated a study to investigate text mining and visualization tools that were available on the market. In Phase I, we examined three groups of tools and identified their strengths and potential limitations from a patent analyst perspective. These experiences have been summarized in an article in World Patent Information [1]. Previous studies by others have also evaluated one or more of these tools [2,3].
In mid 2007, we started our Phase II activities which included ongoing monitoring of additional tools and collaborations such as Megaputer, Innography, IFI PatentAtlas, Linguamatics, IBM Text Analytics and IRF Matrixware. In this article we describe two case studies focused on two types of patent landscape analysis using only the following tools: VantagePoint, STN Analyze Plus and STN AnaVist.
Patent landscape analysis provides a break-down of patents and applications from a specific technology area or company patent portfolio. A patent landscape analysis report typically presents (i) a visual representation of information derived from a state-of-the-art search, (ii) graphics and charts that show patenting trends, key patent assignees, collaboration partners, and technology evaluations and (iii) text information related to the graphs and charts in an interactive database that comes with the analysis report.

2. Selection of tools for Phase II study

Criteria for selection of tools for further study were based on ease of use and ability to work with the structured datasets that can be routinely downloaded from online databases. Cost was also a consideration. Some tools were excluded because their complexity or capabilities exceeded our requirements.
Tool selection was based on the following observations:

• VantagePoint
  o relatively inexpensive
  o some unique features such as list generation, list clean up, and cross-correlation maps

• STN Analyze Plus
  o easy to use tool that adds visualization to the familiar analyze command
  o inexpensive
  o quick and simple to access and generate charts

• STN AnaVist
  o free software
  o more options compared to Analyze Plus
  o requires no filters for data import
  o natural progression to this post processing tool since we are very familiar with STN

These three tools were chosen since they were thought to work well with structured data. All three tools enable quick processing of large sets of information in a short period of time and provide the basic charts and matrices to answer the who, when, where, and what type of questions.

☆ This article has been developed from a presentation by the authors at the PIUG 2009 Annual Conference, San Antonio, TX, USA, May 2–7, 2009.
* Corresponding author. Tel.: +1 609 818 4726.
E-mail address: yunyun.yang@bms.com (Y. Yang).

0172-2190/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.wpi.2009.12.006
204 Y. Yang et al. / World Patent Information 32 (2010) 203–220

Fig. 1. VantagePoint sample data fields.

Fig. 2. STN Analyze Plus default data fields.

3. Brief description of selected tools

3.1. VantagePoint

VantagePoint is powered by natural language processing (NLP). It allows rapid navigation through structured text, such as bibliographic information obtained from online hosts, to discover hidden patterns, trends, and relationships. Lists generated from various fields (including NLP phrases from title, abstracts, or claims) can be cleaned, using pre-defined thesauri, so that the concepts can be grouped, clustered, or categorized. The output of VantagePoint analysis is a series of matrices, factor maps, correlation maps, and charts.
VantagePoint comes with a reader module that enables end-users to interact with, but not edit, the output and data analyzed by a professional information analyst.
Fig. 1 shows a screen shot of data fields imported from Derwent World Patents Index (DWPI) for analysis by VantagePoint. The list is database dependent.

3.2. STN Analyze Plus

STN Analyze Plus is available with STN Express, Version 7.01 and higher. The Analyze Plus Wizard lets you easily (i) analyze, cross-tabulate, and chart data in Microsoft Excel from single- or multi-file search results for one or two fields of data, and (ii) group
related author/inventor names and corporate source/patent assignee names.
One Analyze Plus feature that stands out is the utilization of the STN standardized company name thesaurus, which provides complete company historical detail including mergers and acquisitions. It also lists joint ventures to enhance the search results. This feature simplifies analysis of complete company information (patent assignee) without manually clustering them. It greatly reduces the time for post processing answer sets and improves the accuracy and the comprehensiveness of the information retrieved.
Fig. 2 is a screen shot of the default data fields available for analysis by STN Analyze Plus. Additional fields may be included via a customizable option.

3.3. STN AnaVist

STN AnaVist is an interactive analysis and visualization software that offers a variety of ways to analyze search results and visualize patterns and trends in the research landscape. The capabilities include integration of content from multiple databases such as CAplus or DWPI and the company name standardization. The visualization results can be shared with others within an organization.
One feature of STN AnaVist is the research landscape, which provides a visual representation of the key concepts across the entire document set. Other content view options include top organizations, collaboration partners and top researchers, as well as a list of the selected references. The tool also features a highlighting manager, with interactive functionality.
Three default charts are automatically generated: key researchers by publication year trends, key organizations or assignees, and research landscape. Fig. 3 shows ten data fields that are available for analysis through STN AnaVist. Additional charts and matrices may be generated using these fields.

Fig. 3. STN AnaVist data fields.

4. Patent landscape analysis – two case studies

The case studies are focused on two types of patent landscape analysis. Companies are often interested in profiling a technology area or an organization as a part of a technical intelligence or competitive study. Therefore, a case study has been chosen to illustrate each of these.
The first case study is a technology assessment in the hepatitis C virus (HCV) area. Since this is a technology assessment-related case study, it is focused on the current state and potential future direction of this technology.
The second case study is a company assessment of Kosan Biosciences. This is focused on understanding the work of a single company, so it is useful to identify key researchers, collaborators, and trends in research.

Chart 1. Major patent assignee vs. priority filing.



Chart 2. Major assignees correlated by major inventors.

4.1. Technology patent landscape – case study 1 – hepatitis C virus inhibitors

In general, a technology patent landscape includes a high-level visualization of a large dataset of patent information and provides a quick overview of the various segments in a particular technology domain to support decision-making. The data may be organized and highlighted to show patent filing trends, patent counts by country, top patenting organizations and other details. It may also be used to identify potential collaboration, acquisition and merger opportunities.
The primary objective of this case study was to determine if the three chosen tools were useful for technology assessment using patent information.
In the case study of HCV inhibitors, the following questions are addressed:

• Who are the major players?
• When have they been patenting and how active are they currently?
• Which companies and inventors are collaborating with whom?
• What HCV inhibitor chemotypes (type of compounds) are they working on?

4.1.1. VantagePoint
The dataset was created by running a simple text search in the MicroPatent full-text database (US, WO and EP patent documents), limited to patent documents published from 2001 to 2007. The following keywords or concepts were searched in the title, abstract and claims:
((ns1* OR ns2* OR ns3* OR ns4* OR ns5*) OR (((non ADJ struct*) OR nonstruct*) ADJ3 protein*1)) AND (HCV or hepatitis ADJ C) AND (inhibit* or *agoni* or block* or modulat*).
The search strategy was not intended to be comprehensive for this particular case study. The search retrieved 540 patent documents which represented 299 unique inventions (patent family groups). This dataset of 299 patents was used for VantagePoint analysis.
Chart 1 is a 3D chart showing the top 15 patent assignees out of a total of 94 patent assignees in this dataset. Two thirds of these assignees had only 1 patent. The patenting activity was measured by plotting priority year vs. number of patents. Hoffmann-La Roche started filing patents in 2005. GSK started in 2004, but was less active compared to Roche. Boehringer Ingelheim had been filing since at least 1998 and peaked in 2003 with 17 unique inventions.
Note that intellectual effort was required to clean up the assignee name list in VantagePoint to create Chart 1. In contrast, as shown later, a company name thesaurus can be automatically applied to generate a standardized assignee list with minimal, if any, intellectual effort.

Chart 3. Major assignees grouped by IPC.

Fig. 4. STN Analyze Plus sample standardized company and inventor name clustering.

Chart 2 is a cross-correlation map showing 13 top assignees correlated by their major inventors. This map illustrates interactions among the various assignees based on whether documents have common inventors. Each dot, or node, represents one assignee, and in VantagePoint, moving the cursor over each dot brings up a list of the inventors associated with the patents that each dot represents.
Assignees connected by a solid line may be collaborating on certain aspects of their HCV research – for example, Schering and Dendreon. Another example is Merck, which appears to be associated with three different partners. On the other hand, Bristol-Myers Squibb and Boehringer Ingelheim appear to be working independently in this area.
Chart 3 is a correlation map showing major assignees grouped together on the basis of International Patent Classification codes (IPCs, version 2007.01).
A cluster of patent assignees is shown in the lower left corner, suggesting that these companies, which include Bristol-Myers Squibb, are developing similar types of compounds as HCV inhibitors, while Chiron, Pfizer and others use different approaches.

Chart 4. Major patent assignee vs. publication year.

Chart 5. Major patent assignee vs. main IPC.



Chart 6. Major patent assignee vs. IPC class code A61P0031.

Hovering over a patent assignee node displays a typical drop down list. This example illustrates IPC codes associated with Chiron. Clicking on a particular assignee node generates a list of documents associated with that assignee.
So, while Chart 2 shows definite correlations between assignees based on inventor analysis, Chart 3 shows more possible technological relationships between assignees based on the IPC codes that describe the inventive concepts. These codes are assigned independently by the various patenting offices.

4.1.2. STN Analyze Plus
The dataset was created by running the same simple text search described earlier for VantagePoint. This search was conducted on STN in the Chemical Abstracts database and was limited to basic patents (as defined by Chemical Abstracts Service) published from 2001 to 2007, with keywords appearing in the title and abstract. The search retrieved 294 unique inventions which were submitted for analysis by STN Analyze Plus.
STN Analyze Plus uses STN standardized company names. This feature allows for analysis of complete company information (patent assignee) without manual clustering. Similar automatic clustering may be done with inventor names as well. Fig. 4 illustrates this automatic clustering for patent assignees and inventors.
Chart 4 shows the patenting activities of the top 16 patent assignees out of a total of 82 patent assignees in this dataset. The patenting activity was measured by plotting patent assignee vs. publication year. This chart shows that the top four assignees have had consistent patent activity in this HCV inhibitor area from 2001 to 2007, with BMS peaking in 2007, Schering in 2005, and Boehringer in 2003.
Chart 5 shows the top five IPC codes plotted against the top five patent assignees. They have significant patenting activity in A61P0031 (therapeutic activity of chemical compounds or medicinal preparations), A61K0031 (medicinal preparations containing organic active ingredients) and A61K0038 (medicinal preparations containing peptides) areas, followed closely by C07K005 (organic chemistry; peptides).
Chart 6 shows the patenting activity of the top 16 patent assignees in the A61P0031 area (therapeutic activity of chemical compounds or medicinal preparations). This was the most frequently used IPC code in the dataset.
Chart 7 shows the patenting activities of the top eight patent assignees vs. CAS controlled terms. Hepatitis C virus, antiviral agents and interferons are common controlled terms indexed from documents from the major patent assignees.

Chart 7. Major patent assignee vs. controlled terms.

4.1.3. STN AnaVist
The dataset was the same as that used for STN Analyze Plus. STN AnaVist also uses STN standardized company names.
Chart 8 shows a 4-pane display of publication year, documents, assignees and research landscape. The year 2007 is highlighted and those assignees that have basic patents (as defined by Chemical Abstracts Service) that were published in 2007 are also shown (interactive highlighting).

Chart 8. Publication year, documents, assignees and research landscape.

Chart 9. Patent assignee vs. priority application year.

Chart 10. Research landscape.

Chart 9 shows a matrix (partial) of patent assignee vs. priority application year (left table). When Schering is highlighted (right table), Dendreon also becomes highlighted. One might deduce that five of the six Dendreon patents may have resulted from collaboration with Schering. Thus Dendreon and Schering may have partnered in this area prior to 2000.
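The assignee vs. priority year matrix and the highlighting behavior described for Chart 9 can be approximated with a short Python sketch. It is an illustration only: STN AnaVist's internals are not described in the article, the record layout is assumed, and the company names and years are invented.

```python
from collections import Counter

# Hypothetical records: each patent family lists its assignees and priority year.
families = [
    {"assignees": ["Company X", "Company Y"], "priority_year": 1998},
    {"assignees": ["Company X", "Company Y"], "priority_year": 1999},
    {"assignees": ["Company Y"], "priority_year": 2003},
    {"assignees": ["Company X"], "priority_year": 2004},
    {"assignees": ["Company Z"], "priority_year": 2004},
]

def assignee_year_matrix(rows):
    """Count patent families per (assignee, priority year) cell."""
    matrix = Counter()
    for row in rows:
        for assignee in row["assignees"]:
            matrix[(assignee, row["priority_year"])] += 1
    return matrix

def highlight(rows, selected):
    """Count, per co-assignee, the families it shares with the selected assignee."""
    shared = Counter()
    for row in rows:
        if selected in row["assignees"]:
            for assignee in row["assignees"]:
                if assignee != selected:
                    shared[assignee] += 1
    return shared

matrix = assignee_year_matrix(families)
shared_with_x = highlight(families, "Company X")
print(matrix[("Company Y", 1998)], dict(shared_with_x))
```

Selecting one assignee and counting the families it shares with others is the same reasoning that leads from the highlighted cells to a possible partnership between two companies.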

Chart 11. Kosan’s research teams.

[Chart: "What research has Kosan done recently and what may they have abandoned?" Mechanism of action vs. priority year. Horizontal axis: priority years 1995–2006; vertical axis: 0–14. Legend: None given; HSP-90 Inhibitors; Cancer Cell Growth Inhibitors; Motilin Agonists; Peptide Inhibitors; Bacterial Growth Inhibitors; Tubulin Polymerization; Megalomicin Synthesis; Hydroxylase; Gene therapy; GPCR agonist; Antibiotic.]

Chart 12. Kosan research program profile – mechanism of action.

Chart 10 shows the research landscape for this particular set of HCV inhibitor documents. There is a large cluster of documents in the “viral nucleoside” space as well as the “protein” space.

4.1.4. Summary of case study 1
The objectives of case study 1 were met. Results show that VantagePoint, STN Analyze Plus and STN AnaVist are useful for technology assessment. Furthermore, the MicroPatent dataset was successfully imported into VantagePoint and subsequently used to assess the patent landscape for hepatitis C virus inhibitors. The corporate thesaurus function on STN proved to be very valuable for automatic clustering of patent assignee names.

4.2. Company patent landscape – case study 2 – Kosan Biosciences

A company patent landscape may provide a patent intelligence report that consists of visual representations of the patent assets of a specific company. It may also provide insights about the company in terms of patenting trends, geographic protections, top inventors, research focus and collaboration partners. The landscape may also illustrate the company’s relative patenting strength and potential opportunities for future collaborations.
The primary objective in this case study was to see if the three chosen tools were useful for a company patent landscape. Specifically for VantagePoint, we wanted to see if the dataset from DWPI worked well with value added abstract sub-fields for novelty, use, and mechanism of action.
Typical questions for company patent landscape analysis include:

• Where does the company patent?
• Who are they collaborating with?
• Who are their top inventors?
• What are their research teams?
• What is each of the research teams working on?
• What is the research focus in terms of Novelty?
• What is the research focus in terms of Use?
• What is the research focus in terms of Mechanism of Action?
• What research have they done recently?
• What research may have been abandoned?

4.2.1. VantagePoint
A patent assignee name search (/pa) was performed in both HCAplus and DWPI files on STN. In the HCAplus file, the company name was expanded using the CAS standardized company thesaurus (e kosan/co) to ensure a comprehensive search result. In DWPI, expanding the Derwent Patent Assignee Code KOSA-N (e kosa-n/paco) helped to quickly identify all company names associated with Kosan Biosciences, which were then searched in the patent assignee (/pa) field. The results were combined in DWPI and duplicates were removed. The final dataset of 123 patent family records was downloaded from DWPI.
Chart 11 is an auto correlation map that shows who is working with whom within Kosan. It may indicate that Zhou, Meller and Johnson work in the same program.
Chart 12 shows the mechanism of action sub-field of the DWPI abstract and what programs are active. Heat-shock-protein-90 is the topic of 15 patents. There was a spike in patent filing in 2004 for HSP-90 inhibitors and the Motilin agonist program was active in 2006. Other mechanisms of action include cancer cell growth inhibitors and Motilin agonists. The NOT-GIVEN category comes from patents where the mechanisms of action are not disclosed or indexed.
Chart 13 plots the numbers of patents/patent applications vs. patent priority filing years in terms of Use (also a DWPI abstract sub-field). For example, the Use term “Erythromycin Derivatives” is found in documents filed in 1998 and 1999, but there was no related patenting activity after 1999. This program may have been discontinued or abandoned.

[Chart: "What research has Kosan done recently and what they may have abandoned (in terms of Use)?" Horizontal axis: priority years 1995–2006; vertical axis: 0–7. Legend: Polyketides; Hyperproliferative Disease; Cancer; Anti-Infective Agents; Gastric Motility Diseases; Recombinant DNA; Disease; For Treating Multiple Myeloma; For the Production of Synthetic Genes/Libraries; Chemical Deriv.; Epothilone Deriv.; Erythromycin Deriv.]

Chart 13. Kosan research program profile – use.

4.2.2. STN Analyze Plus
A patent assignee name search (/pa) was performed in HCAplus and DWPI files on STN. In the HCAplus file, the company name was expanded using the CAS standardized company thesaurus (e kosan/co) to ensure a comprehensive search result. This search result was used to generate Chart 14 below.
Charts 15–18 were generated using the dataset described in Section 4.2.1.
Chart 14 shows a two field analysis using CAS controlled terms vs. the publication year, using the Chemical Abstracts dataset. The chart shows the steady publication activity in the areas of “gene, microbial” and “polyketides”. The chart also shows a decreased publication trend in “molecular cloning” since 2004.

Chart 14. Controlled terms vs. publication year.

Chart 15 shows Kosan’s main research focus in terms of IPC codes, using the DWPI dataset. The most frequently assigned is C12N0015 (mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; use of hosts therefor) and subclass 52 (genes encoding for enzymes or proenzymes for DNA or RNA fragments; modified forms thereof in recombinant DNA-technology) [4].

Chart 15. International classification codes.

Chart 16 was created using Options (see Options → Custom → Field name and code), using the DWPI dataset. The STN Analyze Plus algorithm parses the sentences or phrases into individual words so the output is not useful.
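The effect of word-level parsing on a phrase-based field can be illustrated with a small Python sketch. It does not reproduce the STN Analyze Plus parser; the sample entries are invented, and the ";" delimiter between phrases is an assumption made for the example.

```python
from collections import Counter

# Hypothetical mechanism-of-action entries; the ";" delimiter is an assumption.
entries = [
    "HSP-90 inhibitor; cancer cell growth inhibitor",
    "motilin agonist",
    "HSP-90 inhibitor",
]

def count_words(rows):
    """Split every entry into single words, losing the phrase context."""
    words = Counter()
    for row in rows:
        words.update(row.lower().replace(";", " ").split())
    return words

def count_phrases(rows, delimiter=";"):
    """Keep each delimited phrase intact so counts map to research programs."""
    phrases = Counter()
    for row in rows:
        phrases.update(p.strip().lower() for p in row.split(delimiter) if p.strip())
    return phrases

word_counts = count_words(entries)
phrase_counts = count_phrases(entries)
print(word_counts.most_common(3))
print(phrase_counts.most_common(3))
```

In the word-level count the generic term "inhibitor" dominates and no longer points to a specific program, whereas the phrase-level count keeps "hsp-90 inhibitor" and "cancer cell growth inhibitor" apart.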

Chart 16. Mechanism of action vs. publication year.

Chart 17. Derwent title terms vs. publication year.



Chart 18. Derwent title terms (cleaned) vs. publication year.

Please note that although STN Analyze Plus offers the list clean up option to group, ungroup, or rename the words, it is very difficult to clean up the list from the Derwent value added sub-fields such as Mechanism of Action (ACTN field), without knowing them beforehand.
Chart 17 shows title words in the Derwent Title Term field plotted against publication year, using the DWPI dataset. The chart is not useful since only the individual words, and not phrases, were plotted. However, if the title words were cleaned or re-grouped, the visualization output may present more useful insight as shown in Chart 18.
Chart 18, using the DWPI dataset, shows the cleaned Derwent title terms which could be potentially useful to depict the company’s research areas. “Cancer” appears to be a major research focus in recent years, while “antibiotic” has decreased since 2000.

4.2.3. STN AnaVist
A patent assignee name search (/pa) was performed in HCAplus and DWPI files on STN. Charts 19 and 20 were generated using the dataset described in Section 4.2.1.
In the HCAplus file, the company name was expanded using the CAS standardized company thesaurus (e kosan/co) to ensure a comprehensive search result. This search result was used to generate Chart 21 below.
Chart 19 shows the Research Landscape generated from the titles/abstracts clustering field from the DWPI dataset. Each document is represented once on the landscape by a dot. The two most frequently occurring concepts are displayed in a cluster area. Clusters of documents with similar content indicate possible research areas.

Chart 19. Research landscape.

Chart 20 is a four-panel view of clustering Concepts, Publication Year Trends, Key Researchers, and document titles charts. Yellow highlight shows the clustering concepts, year of publication, and the inventor names that contribute to the 14 documents selected from the “hsp 90” cluster from the Research Landscape shown in Chart 19 (the area not highlighted is non-contributing).

Chart 20. Clustering concepts, publication year trends, key researchers, and document titles.

Chart 21 shows the top 22 Technology Indicators out of 323 Technology Indicators generated from the entire dataset. Technology Indicators are CAS controlled terms that are only available in documents from CAplus, and chemistry-relevant USPATFULL and USPAT2 documents that have been enhanced with indexing.

Chart 21. Technology indicators.

4.2.4. Summary of case study 2
The objectives of case study 2 were met. Results show that VantagePoint, STN Analyze Plus and STN AnaVist were useful for generating a company patent landscape. The dataset from DWPI, especially the value added abstract sub-fields for novelty, use, and mechanism of action, worked very well with VantagePoint. However, STN Analyze Plus and STN AnaVist do not have this capability for in-depth analysis using DWPI sub-fields.

5. Overall tool function summary

In summary, all three tools accomplished the objectives with respect to technology and company landscape analysis. Features that stand out for VantagePoint include: sub-field analysis for Derwent value added information, such as mechanism of action, novelty and use, etc.; correlation maps and matrices that have the potential to provide insights into collaboration amongst companies and their researchers; and the VP reader.
An outstanding feature that is common to STN Analyze Plus and STN AnaVist is the ability to use the standardized company name thesaurus that provides an automatic comprehensive listing of companies, with minimal need for additional manual clean up. Features that stand out for STN Analyze Plus include: easy to generate visualization that is quick and cost effective per analysis; and flexibility that allows users to manipulate the charts generated in Excel and re-create charts at will. Features that stand out for STN AnaVist include: easy, automatic generation of three default visualization charts; technology indicators (unique to Chemical Abstracts datasets) and research landscape visualization charts; and an interactive highlighting manager that permits one to see contributing and non-contributing relationships in multiple charts.
The key features for VantagePoint, STN Analyze Plus and STN AnaVist are summarized in Table 1. The perceived strengths and limitations are summarized in Table 2.

Table 1
Summary of tool features.

Feature: Data import
  VantagePoint: Requires initial import filter for each data source but allows most sources with fielded data
  STN Analyze Plus: Not applicable
  STN AnaVist: Easy but only from select STN files

Feature: Maximum size for dataset
  VantagePoint: A 100 MB raw dataset. As a rule-of-thumb, this may equate to about 10,000 DWPI (Derwent World Patents Index) bibliographic records
  STN Analyze Plus: 50,000 records
  STN AnaVist: 20,000 records

Feature: Data clean up and list manipulation
  VantagePoint: Yes
  STN Analyze Plus: Yes
  STN AnaVist: Yes

Feature: Temporal analysis (publication/priority year)
  VantagePoint: Yes
  STN Analyze Plus: Yes
  STN AnaVist: Yes

Feature: Patent assignee analysis
  VantagePoint: Yes
  STN Analyze Plus: Yes
  STN AnaVist: Yes

Feature: Auto-generated patent assignee thesaurus/corporate history
  VantagePoint: No – allows user to create own thesauri
  STN Analyze Plus: Yes
  STN AnaVist: Yes

Feature: Classification analysis (IPC, ECLA, USPC)
  VantagePoint: Yes
  STN Analyze Plus: Yes
  STN AnaVist: Yes

Feature: Keyword analysis
  VantagePoint: Yes
  STN Analyze Plus: Yes – but the fields were parsed into individual words only. No practical use
  STN AnaVist: Yes – generated for research landscape only

Feature: Abstract sub-field analysis using Derwent data
  VantagePoint: Yes
  STN Analyze Plus: No
  STN AnaVist: No

Feature: CAS Controlled Term analysis
  VantagePoint: Yes
  STN Analyze Plus: Yes
  STN AnaVist: Yes – analyzed as Technology Indicator clustering field

Feature: Collaboration analysis
  VantagePoint: Yes
  STN Analyze Plus: No
  STN AnaVist: Yes

Feature: Customized reports and graphs
  VantagePoint: Yes
  STN Analyze Plus: Yes – cross-tabulated data can be saved in Excel and used to make additional charts
  STN AnaVist: Yes – chart data can be saved in comma delimited (.csv) format for use in Microsoft Excel

Feature: End-user interface
  VantagePoint: Yes – VP reader
  STN Analyze Plus: No
  STN AnaVist: No – it is not possible to share .shx files across service centers at this time. However, .xta files may be shared between users with full-access STN login IDs

Feature: Cost (Note: Irrespective of whether the database access is through fixed-fee, or pay as you go, this reflects the cost of using the tools)
  VantagePoint: Upfront software license required, with no analysis fee. Need to download and pay separately for fee-based bibliographic records
  STN Analyze Plus: Fixed-cost per field analysis. The analysis fee is based on the number of answers to be analyzed. The cost adds up when multiple charts are generated. Downloading bibliographic records is not required for analysis. However, a separate cost will apply if records are displayed
  STN AnaVist: Fixed-cost per analysis/visualization. The analysis fee is based on the number of answers to be analyzed. It is relatively higher than STN Analyze Plus, but covers multiple field analysis and sub-set analysis. Downloading bibliographic records is not required for analysis. However, a separate cost will apply if records are displayed
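The dataset size limits listed in Table 1 lend themselves to a quick pre-check before choosing a tool. The following Python sketch is only an illustration: the record limits are copied from Table 1, the VantagePoint figure is the article's rule-of-thumb for DWPI records under its 100 MB limit rather than a hard record cap, and the helper name `tools_that_fit` is hypothetical.

```python
# Record limits taken from Table 1; the VantagePoint figure is the article's
# rule-of-thumb for DWPI records under its 100 MB raw dataset limit.
RECORD_LIMITS = {
    "VantagePoint": 10_000,
    "STN Analyze Plus": 50_000,
    "STN AnaVist": 20_000,
}

def tools_that_fit(record_count, limits=RECORD_LIMITS):
    """Return the tools whose stated dataset limit covers `record_count` records."""
    return sorted(tool for tool, limit in limits.items() if record_count <= limit)

print(tools_that_fit(299))
print(tools_that_fit(15_000))
```

Both case study datasets (299 and 123 patent families) are far below all three limits, so size would only become a deciding factor for much larger landscapes.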

Table 2
Perceived Strengths and Limitations Summary.

Tool Strengths
VantagePoint  Analytical toolbox for technology and company assessment
 Sophisticated visualization output, i.e. correlation maps, factor maps, matrices, Excel charts
 Can provide further in-depth analysis on abstracts, particularly DWPI abstracts for the analysis of mechanism of action, novelty, and use
• A customized report to include the key information in a table format
• VP reader to enable file sharing with end-users
• Can work with multiple database sources, utilizing filters

STN Analyze Plus
• Use of the CA company thesaurus – this eliminates the need for manual clean up of the assignee field and provides a comprehensive listing of companies, including mergers and acquisitions (this feature greatly improves the precision of the information output)
• The charts are easy to generate and easy to manipulate

STN AnaVist
• Use of the CA company thesaurus – this eliminates the need for manual clean up of the assignee field and provides a comprehensive listing of companies, including mergers and acquisitions
• The charts and matrices are easy to generate and easy to navigate
• The data is interactive
• Highlighting manager: (a) automatically updates other related charts; (b) offers multiple colors to perform comparative analysis

Tool    Limitations

VantagePoint
• List clean up can be tedious and time consuming
• Steep initial learning curve
• Does not handle chemistry-related text well

STN Analyze Plus
• Limited number of default fields which can be analyzed
• Cannot cross-analyze the same field, e.g. cannot analyze author/inventor vs. author/inventor or corporate source/patent assignee vs. corporate source/patent assignee
• Limitation on charting capabilities (although there are several options for creating charts in Excel, such as column, bar, pie, among others, there is no capability for creating more complex matrices)
• Cost accumulation when generating multiple visualization charts
• Limited to databases on STN network

STN AnaVist
• Limitation on the clustering fields at the visualization stage
• Limitation on the default fields that can be analyzed (cannot analyze Derwent abstract sub-fields, i.e. mechanism of action, novelty, use, etc.)
• No Reader available – it is not possible to share .shx files across service centers at this time (however, .xta files may be shared between users with full-access STN login IDs)
• Limited to the following STN databases: CAplus; WPINDEX, WPIDS, or WPIX (access to WPIDS and WPIX for subscribers only); USPATFULL, including USPAT2; PCTFULL; EPFULL
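The contrast in the table between a pre-built company thesaurus and manual list clean up can be illustrated with a small sketch. The Python below is not how VantagePoint or the STN tools work internally; it only shows the general idea of mapping assignee name variants to one preferred name before counting patents. The thesaurus entries and the function names (normalize_key, preferred_assignee, count_by_assignee) are hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical thesaurus: normalized name variant -> preferred company name.
# Illustrative entries only; NOT taken from the CA company thesaurus.
THESAURUS = {
    "bristol myers squibb": "Bristol-Myers Squibb",
    "bristol myers squibb co": "Bristol-Myers Squibb",
    "e r squibb and sons": "Bristol-Myers Squibb",
    "du pont": "DuPont",
    "e i du pont de nemours and co": "DuPont",
}


def normalize_key(raw_name: str) -> str:
    """Lower-case, turn '&' into 'and', drop punctuation, collapse spaces."""
    key = raw_name.lower().replace("&", " and ")
    key = re.sub(r"[^a-z0-9 ]+", " ", key)
    return re.sub(r"\s+", " ", key).strip()


def preferred_assignee(raw_name: str) -> str:
    """Return the thesaurus name if the variant is known, else the raw name."""
    return THESAURUS.get(normalize_key(raw_name), raw_name.strip())


def count_by_assignee(raw_names):
    """Count records per preferred assignee name."""
    return Counter(preferred_assignee(name) for name in raw_names)


if __name__ == "__main__":
    records = [
        "Bristol-Myers Squibb Co.",
        "E. R. Squibb & Sons",
        "Du Pont",
        "Acme Pharma",  # unknown variant: left for manual clean up
    ]
    for name, n in count_by_assignee(records).most_common():
        print(name, n)
```

Names that the thesaurus does not cover pass through unchanged, which corresponds to the residual manual clean up an analyst would still have to do.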

6. Conclusions

We have discussed our Phase II investigation of text mining and visualization tools, and described two case studies using VantagePoint, STN Analyze Plus and STN AnaVist to enhance patent landscape analysis. The results of these case studies demonstrate that each of these tools has unique strengths as well as some limitations.

Charts showing patenting trends based on technology terms, company names, and other data fields can be easily generated using VantagePoint, STN Analyze Plus, and STN AnaVist. VantagePoint allows use of multiple data sources and analysis of any imported data fields, while STN Analyze Plus and STN AnaVist are limited to a select group of CAS sources and to a certain number of default data fields that can be analyzed. On the other hand, STN Analyze Plus and STN AnaVist are convenient to use without any preliminary preparation, while VantagePoint requires data clean up to deliver useful output.

STN AnaVist and STN Analyze Plus earn high marks for their user interface. STN AnaVist provides interactive charts with colorful highlighting features that automatically update other related charts when looking at a particular data point, making it easy to navigate through different views. It also allows for more detailed drill-down into a specific field. These STN tools also provide the convenience of automatic clustering for inventor names and patent assignees using a pre-defined CAS company name thesaurus. VantagePoint also provides a thesaurus capability, but it is based on individual user preferences and requires an investment of user time and effort; this can be an advantage for those more inclined to build their own thesauri/ontologies.

VantagePoint comes with a reader that enables sharing of analysis results with users. It also provides insights that would be almost impossible to discern via conventional manual analysis. For example, cross-correlation maps of patent assignees correlated by major inventors reveal interesting insights on possible collaboration relationships between companies. Correlation maps of patent assignees based on IPC codes help visualize the tenuous relationships among patent assignees based on the inventive concepts in their patents, as classified independently by the various patenting offices. STN Analyze Plus and STN AnaVist provide neither an online reader for users nor in-depth types of correlation analyses.

Patent analysts play a critical role in selecting the types of analysis and visualization options that are most appropriate to the dataset based on their scientific expertise, knowledge of the business, understanding of the clients' needs and knowledge of tools and databases. They collaborate interactively with users to refine the analysis criteria, execute the analysis strategy, iteratively refine the strategy against the desired output, and guide users in navigating the dynamic reports so that users realize the full value of the reports.

In summary, text mining and visualization tools enhance patent landscape analysis by providing easy-to-generate charts and insights that are hidden or not easy to identify by conventional manual analysis. Text mining and visualization tools facilitate and supplement human expert analysis. Patent analysts, who are responsible for selecting and implementing the right tools, executing the analysis strategy, and designing the appropriate output to facilitate decision-making, play a critical role in the overall success of patent landscape analysis.

Acknowledgement

As in our earlier article [1], special thanks to the vendors who demonstrated their products and allowed us to use their product information for this article.

References

[1] Yang Yun Yun, Akers L, Barcelon Yang C, Klose T. Text mining and visualization tools – impressions of emerging capabilities. World Patent Inform 2008;30:280–93.
[2] Laura Routsalainen. Data mining tools for technology and competitive intelligence. VTT Tiedotteita – Research Notes 2451. Utgivare Publisher: Julkaisija; 2008.
[3] Gerhard Fischer, Nicolas Lalyre. Analysis and visualization with host-based software – the features of STN® AnaVist™. World Patent Inform 2006;28(4):312–8.
[4] http://www.wipo.int/classifications/fulltext/new_ipc/ipcen.html.

Yun Yun Yang has almost 20 years of experience in the patent information field. She is currently a senior patent analyst with Bristol-Myers Squibb Company. In addition to her primary function of performing legally significant patent searches, she also leads an initiative to evaluate text mining and visualization tools for the Patent Analysis group. She started her career as a medicinal chemist at the University of Pennsylvania, where she conducted postdoctoral research on the synthesis of new imaging agents for Central Nervous System (CNS) receptors. Later, she worked as an information scientist at DuPont and then at DuPont Pharmaceutical before taking her present position. Yun Yun obtained her PhD degree in Organic Chemistry from Beijing Normal University. She is a member of the Patent Information Users Group (PIUG) and the American Chemical Society (ACS).

Lucy Teixeira Akers is a US Patent Agent with over 25 years of experience in the information field. She has formerly held patent information positions at Shell and ExxonMobil, and is currently a senior patent analyst with Bristol-Myers Squibb. From 2000 to 2004, Ms. Akers was Chair of the Patent Information Users Group, whose membership grew by over 20% during her term. Her leadership and work with the PIUG and connections with other professional organizations, such as the Patent and Trademark Group in the UK and WON in the Netherlands, make her a recognized international figure in the patent information world, particularly for her active role in patent information development, training and education. She is currently the Chair of the Selection Board for the International Patent Information Award.

Cynthia S. Barcelon Yang, Director of the Patent Analysis Group at Bristol-Myers Squibb, has over 20 years of experience in the information science field. Prior to joining BMS in 2001, Cynthia had worked at DuPont, initially as a bench chemist and later as information scientist, group leader, project manager, patent searcher, and team leader in the Patent Information Services group at the DuPont Pharmaceutical subsidiary. Cynthia obtained her M.S. degree in Chemistry from Marquette University and an M.S. degree in Information and Library Science from Drexel University. She is the current Chair of the Patent Information Users Group (PIUG), Inc. and is an IP Advisory Board Member of the Information Retrieval Facility (IRF), a Member of the STN Advisory Council and the Patent Documentation Group (PDG).

Thomas Klose is currently a Senior Patent Analyst in Bristol-Myers Squibb's Patent Analysis Group. He first worked with Eastman Kodak Co. in Rochester, NY as a Research Chemist. Later on, he transferred to Kodak's legal division and became a Patent Searcher. After a short interval as a Senior Information Scientist with American Cyanamid in Princeton, NJ, he moved to BMS in 2001. Tom graduated from the University of Adelaide with a PhD degree in Organic Chemistry. He is a member of the Patent Information Users Group (PIUG) and the American Chemical Society (ACS).

Shelley L. Pavlek, Senior Patent Analyst II at Bristol-Myers Squibb Company, has over 12 years of experience in the chemical and patent information field. Shelley started her career as a medicinal chemist at Hoechst Marion Roussel Pharmaceuticals (now Sanofi-Aventis) in 1993 and later worked as a chemical information scientist, followed by patent information scientist. Shelley obtained her B.S. degree in Chemistry from Gannon University and her M.S. degree in Organic Chemistry from Ohio University. Shelley is a member of the Patent Information Users Group, as well as the Chemistry and Law and the Chemical Information sections of the American Chemical Society.
