
DATA SCIENCE

GETTING DATA
AGENDA

I. GETTING DATA
II. REGEX / REQUESTS
III. API / WRAPPERS
INTRO TO DATA SCIENCE

I. GETTING DATA
GETTING DATA

Data lives all over the internet.

The question is whether or not the author of the data makes it easy for us to grab it.

II. REGEX / REQUESTS
REGEX / REQUESTS

REGular EXpressions
are how we capture patterns in text
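
As a minimal sketch of capturing a pattern in text, the snippet below uses Python's built-in `re` module to pull email-like strings out of a sentence. The sample text and the pattern are assumptions made for illustration; the pattern is deliberately simple and is not a complete email validator.

```python
import re

# Sample text, invented for this example.
text = "Contact alice@example.com or bob@example.org for details."

# A deliberately simple pattern: word characters (plus . + -), an @ sign,
# a domain name, a dot, and a suffix. Real email addresses can be more complex.
email_pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

# findall returns every non-overlapping match as a list of strings.
emails = email_pattern.findall(text)
print(emails)
```

`re.findall` is convenient when only the matched strings are needed; `re.finditer` would also give the position of each match.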

What regex can we use to capture this?



BEAUTIFULSOUP
BeautifulSoup is a Python-based HTML parser.
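
To show what an HTML parser does, here is a small sketch that collects the link targets from a page. So that it runs without extra installs, it uses the standard library's `html.parser` rather than BeautifulSoup; the `LinkCollector` class and the sample HTML are hypothetical names and data made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper that records the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Sample page, invented for this example.
html = (
    '<html><body><a href="/page1">One</a>'
    '<p>text</p><a href="/page2">Two</a></body></html>'
)

collector = LinkCollector()
collector.feed(html)
links = collector.links
print(links)
```

With BeautifulSoup installed, roughly the same result should come from `[a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)]`.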

WEB CRAWLERS
We just built one!

But be careful…
Hacking OKCupid: http://www.wired.com/2014/01/how-to-hack-okcupid/all/
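
One common way to be careful when crawling is to check a site's robots.txt before fetching pages and to pause between requests. The sketch below uses the standard library's `urllib.robotparser` on a made-up robots.txt; the rules, the crawler name, and the example.com URLs are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt, supplied as lines so the example runs offline.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
]

parser = RobotFileParser()
parser.parse(robots_lines)

agent = "example-class-crawler"  # hypothetical crawler name

# Ask whether this crawler may fetch each URL under the rules above.
allowed_public = parser.can_fetch(agent, "https://example.com/profiles/1")
allowed_private = parser.can_fetch(agent, "https://example.com/private/1")

# Seconds the site asks crawlers to wait between requests (None if unset).
delay = parser.crawl_delay(agent)

print(allowed_public, allowed_private, delay)
```

Against a live site, `parser.set_url(...)` followed by `parser.read()` would download the real robots.txt, and `time.sleep(delay)` between requests would honor the requested pause.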

III. APIS AND WRAPPERS
API / WRAPPERS

HTML Parsing vs. API

• HTML parsing: must call using requests and BeautifulSoup (imitate human behavior)
• API: makes the call for us (the author is “allowing us” to access the data)

http://www.pythonforbeginners.com/api/list-of-python-apis

API (n):
Application Programming Interface

Eases access to web-based software
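
As a rough sketch of what calling a web API looks like, the snippet below builds a request URL with query parameters and reads a JSON reply. The endpoint, parameter names, and response are hypothetical and hard-coded so the example runs offline; a real API documents its own URLs and fields.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, invented for this example.
base_url = "https://api.example.com/v1/prices"
params = {"item": "book", "currency": "USD"}

# An API call is usually just a URL with the request encoded in it.
request_url = f"{base_url}?{urlencode(params)}"

# Stand-in for the body a server might send back. With the requests library,
# requests.get(request_url).json() would perform the call and the parsing.
sample_response = '{"item": "book", "price": 12.5, "currency": "USD"}'

# The reply is structured data, so no HTML parsing is needed.
data = json.loads(sample_response)
price = data["price"]

print(request_url)
print(price)
```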



Examples of APIs:

• Amazon (price data)
• Twitter (tweets)
• Facebook (social network)
• Sentiment Analysis

Mashape.com has an extensive collection of APIs.



API vs. API wrapper

• API: may still be a bit confusing how to call the right page
• API wrapper: puts the API into a specific programming language; gives us Python functions

http://www.pythonforbeginners.com/api/list-of-python-apis
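
To illustrate how a wrapper turns "calling the right page" into a Python function, here is a minimal sketch. `ExampleTweetsClient`, its `search` method, the endpoint, and `fake_fetch` are all hypothetical; the fetch function is passed in so the example runs without a network connection, and this is not any real library's interface.

```python
import json
from urllib.parse import urlencode


class ExampleTweetsClient:
    """Hypothetical wrapper: hides URL building behind a Python method."""

    def __init__(self, base_url, fetch):
        self.base_url = base_url.rstrip("/")
        self.fetch = fetch  # any callable that takes a URL and returns JSON text

    def search(self, query, limit=10):
        # The wrapper knows which page to call, so the caller does not have to.
        query_string = urlencode({"q": query, "limit": limit})
        url = f"{self.base_url}/search?{query_string}"
        return json.loads(self.fetch(url))["results"]


def fake_fetch(url):
    # Stand-in for a real HTTP GET; returns a canned JSON reply.
    return json.dumps({"results": [{"text": "hello"}]})


client = ExampleTweetsClient("https://api.example.com/v1/", fake_fetch)
results = client.search("data science", limit=1)
print(results)
```

Passing the fetch function in keeps the sketch testable; a real wrapper would typically perform the HTTP request itself, for example with the requests library.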

Conclusion
Data is all over the web, but we must be
polite and conscious of what data is
available to us.
