Instantly Share Code, Notes, and Snippets.: Blech Netflix-Genre-Crawl - Py
blech / netflix-genre-crawl.py
Last active 13 hours ago
netflix-genre-crawl.py
#!/usr/bin/python
# TODO make range command line arguments
import time
import urllib2
import BeautifulSoup
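# The cookie value is truncated in this copy; paste the full Cookie header from your own logged-in browser session.
cookie = """ copy from eg Chrome's developer tools Request tab into here: NetflixId=[string];profiles... """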
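def get_genre(i):
    # Fetch the genre page for numeric genre ID i, sending the session cookie.
    opener = urllib2.build_opener()
    opener.addheaders.append(('Cookie', cookie))
    page = opener.open("http://movies.netflix.com/WiAltGenre?agid=%s" % i)
    soup = BeautifulSoup.BeautifulSoup(page.read())
    # NOTE: the line that extracts the genre name is missing from this copy.
    # Assumption (not the author's original selector): use the page <title>.
    title = soup.find('title')
    return title.string if title is not None else None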
for i in range(1,100):
    genre = get_genre(i)
    if genre:
        print "%s (%s)" % (genre, i)
    # Pause between requests to avoid hammering the server.
    time.sleep(0.1)
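The TODO at the top of the script asks for the ID range to come from the command line. One way this might look is sketched below, using the standard-library argparse module. The helper name parse_range and the --start/--stop flags are hypothetical and not part of the original gist; the defaults simply mirror the hard-coded range(1,100).

```python
import argparse


def parse_range(argv=None):
    """Return the range of genre IDs to crawl (hypothetical helper).

    argv=None makes argparse read sys.argv[1:]; passing a list is handy for testing.
    """
    parser = argparse.ArgumentParser(description="Crawl a range of Netflix genre IDs.")
    # Assumed flag names; the defaults match the original hard-coded range(1,100).
    parser.add_argument("--start", type=int, default=1,
                        help="first genre ID (inclusive)")
    parser.add_argument("--stop", type=int, default=100,
                        help="last genre ID (exclusive)")
    args = parser.parse_args(argv)
    if args.start >= args.stop:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--start must be less than --stop")
    return range(args.start, args.stop)
```

The main loop could then iterate over parse_range() instead of range(1,100), for example with: python netflix-genre-crawl.py --start 100 --stop 200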
results_00001-00100.txt
results_00001-05000-alpha.txt
results_00100-00200.txt
results_03200-03300.txt
Hi -
Would there be a way to also scrape the movies/shows associated with each genre category?