The Insider’s Guide to Accessing NLM Data

Welcome to E-utilities for PubMed


Download PubMed data your way!
Mike Davidson, MLS

National Library of Medicine


National Institutes of Health
U.S. Department of Health and Human Services

Today’s Agenda
• Look at some challenging PubMed questions
• Introduce you to a possible solution
• Provide some background
• Explore some practical tools, which can…
• Answer our questions!

How can I get “X” out of PubMed in
“Y” format?

An example:
• Which authors are currently doing research on
diabetes in pregnant women?
• Which organizations are funding research on
this topic?
PubMed’s CSV format groups all authors together
in one column and doesn’t include funders.

Another example:

The “Results by year” graph is not customizable
and cannot compare multiple result sets.
[Figure: Results by year graphs for “oxycodone abuse” and “fentanyl abuse”]

What do you need your tools to do?

Get exactly the data you need

…and only the data you need

…in the format you need.

E-utilities
An API for accessing PubMed
and other NCBI databases

What does that mean?

What is an API?
• API: Application Programming Interface
• A set of tools, routines, and protocols for
building software applications.

The E-utilities API
A set of tools, routines, and protocols that allow
you to interact directly with the data in 20+ NCBI
databases, including PubMed, the MeSH
database, and PMC.

Okay, but what does that mean?

The E-utilities API is just a series of
rules for querying a database.

URLs as Database Queries
• Each query is a URL.
• The response depends on how you build the
URL.
• There are tools that can build URLs for you,
but…
• First, we need to learn the basics.

The three parts of an E-utilities URL
• Base URL
– The address of the E-utilities server
• Utility name
– The specific tool you are using
• Parameters
– The details of your query

The three parts of an E-utilities URL
• Base URL
– https://eutils.ncbi.nlm.nih.gov/entrez/eutils/
• Utility name
– esearch.fcgi?
• Parameters
– db=pubmed&term="science"[journal]
+AND+breast+cancer+AND+2008[pdat]
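Assembling the three parts can also be scripted. The sketch below is a minimal, standard-library Python illustration, not an NCBI tool; the function name build_eutils_url is hypothetical. It appends ".fcgi?" to the utility name and URL-encodes the parameters, turning spaces into "+" as in the example above.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities server, as given on the slide.
BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_eutils_url(utility: str, **params: str) -> str:
    """Assemble base URL + utility name + parameters (hypothetical helper).

    `utility` is a short name such as "esearch" or "efetch"; ".fcgi?" is
    appended to it. Values are URL-encoded: spaces become "+", while
    brackets and commas are left as-is for readability.
    """
    query = urlencode(params, safe="[],")
    return f"{BASE_URL}{utility}.fcgi?{query}"


url = build_eutils_url(
    "esearch",
    db="pubmed",
    term="science[journal] AND breast cancer AND 2008[pdat]",
)
print(url)
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat]
```

Letting urlencode do the escaping saves hand-typing "+" signs and keeps characters such as quotation marks (encoded as %22) from breaking the URL.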

The Base URL

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/

E-utilities Utilities
• Each of the 9 utilities asks a different kind of
question
• ESearch (esearch.fcgi?)
– Searches a database and returns a list of unique
identifiers (UIDs)
– Does NOT retrieve the full records
• EFetch (efetch.fcgi?)
– Takes a series of UIDs and retrieves the full records
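ESearch and EFetch fit together because the UIDs ESearch returns are exactly what EFetch’s id parameter takes. As a sketch, assuming an ESearch response in XML (its default format), Python’s standard library can pull the UIDs out of the IdList element. The sample response is hand-written and trimmed, and uids_from_esearch is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed ESearch-style response (hypothetical sample data).
# <Count> is the total number of matches; <IdList> holds at most RetMax
# UIDs (20 by default), so the two can differ in real responses.
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>11748933</Id>
    <Id>11700088</Id>
  </IdList>
</eSearchResult>"""


def uids_from_esearch(xml_text: str) -> list[str]:
    """Return the UIDs listed under <IdList> in an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [id_el.text for id_el in root.findall("./IdList/Id")]


uids = uids_from_esearch(SAMPLE_ESEARCH_XML)
print(uids)            # ['11748933', '11700088']
print(",".join(uids))  # 11748933,11700088  (ready for EFetch's id parameter)
```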
E-utilities Utilities, cont’d
Plus seven more:
• EInfo
• ECitMatch
• EPost
• ESpell
• ELink
• ESummary
• EGQuery
See https://dataguide.nlm.nih.gov for more details

Parameters
• db (database)
– db=pubmed
• term (search term)
– term="science"[journal]
+AND+breast+cancer+AND+2008[pdat]

Parameters, cont’d
• id (list of UIDs)
– id=11748933,11700088
• retmode/rettype (return format)
– retmode=xml
– retmode=text&rettype=abstract
• Plus lots and lots more
– See https://dataguide.nlm.nih.gov for full documentation.
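The id and retmode/rettype parameters combine naturally in a small helper. This is a sketch that assumes the UIDs arrive as a Python list; efetch_url is a hypothetical name. With retmode="text" and rettype="abstract" it produces the plain-text abstract request for PMIDs 11748933 and 11700088.

```python
from typing import Optional
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db: str, ids: list[str], retmode: str,
               rettype: Optional[str] = None) -> str:
    """Build an EFetch URL for a list of UIDs (hypothetical helper).

    `id` takes a comma-separated list of UIDs; `retmode` and the optional
    `rettype` together pick the return format.
    """
    params = {"db": db, "id": ",".join(ids), "retmode": retmode}
    if rettype is not None:
        params["rettype"] = rettype
    # Keep commas readable instead of encoding them as %2C.
    return f"{EFETCH}?{urlencode(params, safe=',')}"


ids = ["11748933", "11700088"]
print(efetch_url("pubmed", ids, "xml"))
print(efetch_url("pubmed", ids, "text", "abstract"))
```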
The Three Parts…Together!
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
?db=pubmed&term="science"[journal]
+AND+breast+cancer+AND+2008[pdat]

Base URL + Utility Name + Parameters = E-utilities URL!

Using an E-utilities URL
• Just pop it in your browser and see what
happens!

Another E-utilities URL
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?
db=pubmed&id=11748933,11700088&retmode=text&rettype=abstract

The real power of E-utilities
• Combining multiple URLs in sequence
• Using E-utilities in a programming
environment
– Don’t have to create each URL by hand
– More options for easily manipulating output
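One common way to chain URLs is NCBI’s History server: ESearch called with usehistory=y returns a WebEnv string and a QueryKey, and EFetch can take those (as WebEnv and query_key) in place of a list of UIDs. The sketch below only builds the two URLs and parses a hand-written, trimmed step-one response. The WebEnv value is a placeholder, not a real session.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Step 1: ESearch with usehistory=y keeps the result set on the History
# server and reports where to find it.
step1 = BASE + "esearch.fcgi?" + urlencode(
    {"db": "pubmed", "term": "fentanyl abuse", "usehistory": "y"}
)

# Hypothetical, trimmed response to step 1 (WebEnv is a placeholder).
step1_response = """<eSearchResult>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
</eSearchResult>"""

root = ET.fromstring(step1_response)
query_key = root.findtext("QueryKey")
webenv = root.findtext("WebEnv")

# Step 2: EFetch retrieves the stored set by query_key + WebEnv.
step2 = BASE + "efetch.fcgi?" + urlencode(
    {"db": "pubmed", "query_key": query_key, "WebEnv": webenv, "retmode": "xml"}
)
print(step1)
print(step2)
```

Passing the History server reference instead of UIDs keeps the second URL short no matter how many records the search matched.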

Using E-utilities in a programming
environment
• Learn a programming language
• Develop functions that create E-utilities URLs

But wait…
• Call functions that send URLs to the E-utilities
server
• Parse the XML response
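The “send URLs” and “parse the XML response” steps can be sketched with Python’s standard library alone. get_xml and fetch_all are hypothetical helper names. The opener argument defaults to urllib’s urlopen; swapping in a stand-in, as in the usage lines, exercises the parsing without contacting NCBI. The pause reflects NCBI’s guidance of no more than three requests per second without an API key.

```python
import io
import time
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def get_xml(url: str, opener=urlopen, timeout: float = 30.0) -> ET.Element:
    """Send one E-utilities URL and parse the XML body it returns."""
    with opener(url, timeout=timeout) as response:
        return ET.fromstring(response.read())


def fetch_all(urls, opener=urlopen, pause: float = 0.34) -> list:
    """Fetch URLs one at a time, sleeping between requests so that at most
    about three are sent per second."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(pause)
        results.append(get_xml(url, opener=opener))
    return results


def fake_opener(url, timeout):
    # Offline stand-in for urlopen: returns a canned, hypothetical body.
    return io.BytesIO(b"<eSearchResult><Count>3</Count></eSearchResult>")


root = get_xml(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=asthma",
    opener=fake_opener,
)
print(root.findtext("Count"))  # 3
```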

EDirect
• Developed by NCBI
• Set of tools with the E-utilities URL creation
rules built in
• Can extract the specific data you need from
the PubMed XML
• Beginner-friendly, but very powerful

What do I need to get started with
EDirect?
• Unix command line terminal
– Mac OS X, Linux: Built in
– Windows: install Cygwin, a free Unix terminal
• EDirect installation package
– For instructions, see:
https://dataguide.nlm.nih.gov/edirect/install.html

What do I need to know to use
EDirect?
• Some knowledge of Unix
– Or, someone to answer questions for you
– Lots of help online
• Some familiarity with XML
• Some programming (Perl, Python) knowledge
– Helpful but not required
• The more you know, the more you can do.

EDirect in Action
$ esearch -db pubmed -query "fentanyl abuse"
<ENTREZ_DIRECT>
<Db>pubmed</Db>
<WebEnv>NCID_1_219455512_130.14.18.34_9001_1516220928_859419657_0MetA0_S_MegaStore_F_1</WebEnv>
<QueryKey>1</QueryKey>
<Count>820</Count>
<Step>1</Step>
</ENTREZ_DIRECT>

Don’t Panic!
It’s all going to be okay.
(I promise!)

Where did the URLs go?
$ esearch -db pubmed -query "fentanyl abuse"

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?
db=pubmed&term=fentanyl+abuse

Where did the URLs go? (cont’d)
$ efetch -db pubmed \
> -id 27196667,26954292,26768169,26552130,26437235,26400564,26364787,26284230,26227404 \
> -format xml

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?
db=pubmed&id=27196667,26954292,26768169,26552130,26437235,26400564,26364787,26284230,26227404
&retmode=xml
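Long ID lists make long URLs. NCBI’s documentation advises switching to HTTP POST once a request carries more than roughly 200 UIDs; a simpler alternative, sketched here, is to split the list into batches and build one EFetch URL per batch. efetch_urls and batched are hypothetical names, and the batch size of 200 is a tunable assumption rather than a hard limit.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batched(items: list, size: int):
    """Yield successive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_urls(pmids: list[str], batch_size: int = 200,
                retmode: str = "xml") -> list[str]:
    """Return one EFetch URL per batch of PubMed IDs."""
    urls = []
    for chunk in batched(pmids, batch_size):
        params = {"db": "pubmed", "id": ",".join(chunk), "retmode": retmode}
        urls.append(f"{EFETCH}?{urlencode(params, safe=',')}")
    return urls


pmids = ["27196667", "26954292", "26768169", "26552130", "26437235",
         "26400564", "26364787", "26284230", "26227404"]

# With the default batch size, these nine IDs fit in a single URL.
print(efetch_urls(pmids))
# A deliberately small batch size shows the splitting: 5 IDs, then 4.
for url in efetch_urls(pmids, batch_size=5):
    print(url)
```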
Putting EDirect commands together
$ esearch -db pubmed -query "fentanyl abuse" | \
> efetch -format xml

Answering questions with E-utilities

• Active authors on a topic
• Agencies funding a topic
• Comparing two Results by Year histograms

Active authors on a topic
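As a rough Python sketch, assuming the articles have already been retrieved as PubMed XML with EFetch, counting authors across PubmedArticle records shows who publishes most often on a topic. The sample records and author names are invented, and author_counts is a hypothetical helper. Limiting the results to current research would happen in the search itself, for example with a publication-date restriction.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Two invented, heavily trimmed PubmedArticle records (hypothetical data).
SAMPLE_PUBMED_XML = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article><AuthorList>
    <Author><LastName>Doe</LastName><Initials>J</Initials></Author>
    <Author><LastName>Roe</LastName><Initials>A</Initials></Author>
  </AuthorList></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article><AuthorList>
    <Author><LastName>Doe</LastName><Initials>J</Initials></Author>
    <Author><CollectiveName>Example Study Group</CollectiveName></Author>
  </AuthorList></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""


def author_counts(pubmed_xml: str) -> Counter:
    """Count author appearances, keyed as 'LastName Initials'."""
    root = ET.fromstring(pubmed_xml)
    counts = Counter()
    for author in root.iterfind(".//MedlineCitation/Article/AuthorList/Author"):
        last = author.findtext("LastName")
        if last is None:  # group authors use <CollectiveName> instead
            continue
        initials = author.findtext("Initials", default="")
        counts[f"{last} {initials}".strip()] += 1
    return counts


print(author_counts(SAMPLE_PUBMED_XML).most_common())
# [('Doe J', 2), ('Roe A', 1)]
```

Keying on last name plus initials is a simplification; different people can share the same key, so the counts are a starting point for review rather than a final answer.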

Agencies funding a topic
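In PubMed XML, funding information sits in the GrantList element, where each Grant can name an Agency. This sketch, again assuming PubMed XML from EFetch and using invented sample records, counts how many articles mention each agency. An agency named on several grants of the same article is counted once for that article. agency_counts is a hypothetical name.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Invented, trimmed records; agency names are placeholders.
SAMPLE_PUBMED_XML = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article><GrantList>
    <Grant><GrantID>X1</GrantID><Agency>Agency A</Agency></Grant>
    <Grant><GrantID>X2</GrantID><Agency>Agency A</Agency></Grant>
    <Grant><GrantID>Y1</GrantID><Agency>Agency B</Agency></Grant>
  </GrantList></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article><GrantList>
    <Grant><GrantID>X3</GrantID><Agency>Agency A</Agency></Grant>
  </GrantList></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article/></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""


def agency_counts(pubmed_xml: str) -> Counter:
    """Count the articles that list each funding agency at least once."""
    root = ET.fromstring(pubmed_xml)
    counts = Counter()
    for article in root.iter("PubmedArticle"):
        agencies = {
            el.text.strip()
            for el in article.iterfind(".//GrantList/Grant/Agency")
            if el.text
        }
        counts.update(agencies)  # a set, so each agency counts once per article
    return counts


print(agency_counts(SAMPLE_PUBMED_XML).most_common())
# [('Agency A', 2), ('Agency B', 1)]
```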

Comparing Results by Year
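A customizable Results by Year comparison comes down to counting publication years for each result set (for instance, the oxycodone abuse and fentanyl abuse searches) and lining the counts up. The sketch assumes PubMed XML from EFetch for each search and reads the journal issue’s PubDate, which may not match exactly how PubMed’s own graph assigns years. Records with only a free-text MedlineDate (such as "1998 Dec-1999 Jan") are assigned the year they start with, a simplifying assumption. All sample data and helper names (year_counts, compare_by_year, make_sample) are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def year_counts(pubmed_xml: str) -> Counter:
    """Count articles per publication year, using the journal issue PubDate."""
    root = ET.fromstring(pubmed_xml)
    counts = Counter()
    for pub_date in root.iterfind(".//Article/Journal/JournalIssue/PubDate"):
        year = pub_date.findtext("Year")
        if year is None:
            # Fall back to the leading year of a free-text <MedlineDate>.
            year = (pub_date.findtext("MedlineDate") or "")[:4]
        if year:
            counts[year] += 1
    return counts


def compare_by_year(a: Counter, b: Counter) -> list[tuple[str, int, int]]:
    """Align two year counts into (year, count_a, count_b) rows, oldest first."""
    return [(y, a.get(y, 0), b.get(y, 0)) for y in sorted(set(a) | set(b))]


def make_sample(*pub_dates: str) -> str:
    """Wrap PubDate contents in minimal PubmedArticle records (invented data)."""
    articles = "".join(
        "<PubmedArticle><MedlineCitation><Article><Journal><JournalIssue>"
        f"<PubDate>{d}</PubDate>"
        "</JournalIssue></Journal></Article></MedlineCitation></PubmedArticle>"
        for d in pub_dates
    )
    return f"<PubmedArticleSet>{articles}</PubmedArticleSet>"


set_a = make_sample("<Year>2015</Year>", "<Year>2016</Year>", "<Year>2016</Year>")
set_b = make_sample("<Year>2016</Year>", "<MedlineDate>2017 Jan-Feb</MedlineDate>")

for year, a, b in compare_by_year(year_counts(set_a), year_counts(set_b)):
    print(year, a, b)
# 2015 1 0
# 2016 2 1
# 2017 0 1
```

The aligned rows can then go to any charting tool, which is what makes the comparison customizable.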

E-utilities: the bigger picture
[Diagram: layers labeled Environment, Tool, API, and PubMed Data]

Where do we go from here?
• Hands-on workshops
– EDirect for PubMed
• More classes to come!
• What would you like to see?
• https://dataguide.nlm.nih.gov/classes.html

Where can I get help?
• The Insider’s Guide online
– https://dataguide.nlm.nih.gov
• E-utilities documentation
– https://eutils.ncbi.nlm.nih.gov
• EDirect documentation
– https://www.ncbi.nlm.nih.gov/books/NBK179288

One more example, to get you thinking…
• We want to know what topics our organization
is publishing on.
• Our organization has many authors and
research components with a variety of names.
• How can we efficiently search for our
organization and analyze the topics of
published articles?
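One possible starting point, sketched under stated assumptions: retrieve the organization’s articles (for example, with affiliation searches using PubMed’s [ad] field tag, one per name variant, combined with OR), fetch them as PubMed XML, and then tally their MeSH headings. mesh_topic_counts is a hypothetical helper, and the sample records are invented. The major_only switch checks only the descriptor’s own MajorTopicYN flag and ignores major-topic qualifiers.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Invented, trimmed records carrying only MeSH headings.
SAMPLE_PUBMED_XML = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><MeshHeadingList>
    <MeshHeading><DescriptorName MajorTopicYN="N">Humans</DescriptorName></MeshHeading>
    <MeshHeading><DescriptorName MajorTopicYN="Y">Pregnancy</DescriptorName></MeshHeading>
  </MeshHeadingList></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><MeshHeadingList>
    <MeshHeading><DescriptorName MajorTopicYN="N">Humans</DescriptorName></MeshHeading>
    <MeshHeading><DescriptorName MajorTopicYN="N">Pregnancy</DescriptorName></MeshHeading>
    <MeshHeading><DescriptorName MajorTopicYN="Y">Diabetes, Gestational</DescriptorName></MeshHeading>
  </MeshHeadingList></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""


def mesh_topic_counts(pubmed_xml: str, major_only: bool = False) -> Counter:
    """Count MeSH descriptors across articles, optionally major topics only."""
    root = ET.fromstring(pubmed_xml)
    counts = Counter()
    for descriptor in root.iterfind(".//MeshHeadingList/MeshHeading/DescriptorName"):
        if not descriptor.text:
            continue
        if major_only and descriptor.get("MajorTopicYN") != "Y":
            continue
        counts[descriptor.text] += 1
    return counts


print(mesh_topic_counts(SAMPLE_PUBMED_XML))
print(mesh_topic_counts(SAMPLE_PUBMED_XML, major_only=True))
```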
Questions?
