Scrapy - A Fast and Powerful Scraping and Web Crawling Framework
Scrapy 1.2
pip install scrapy
PyPI v1.3.0 · wheel available · coverage 83% · install from PyPI, Conda, or source
Terminal
pip install scrapy
cat > myspider.py <<EOF
import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        for title in response.css('h2.entry-title'):
            yield {'title': title.css('a::text').extract_first()}
        next_page = response.css('div.prev-post > a::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)
EOF
scrapy runspider myspider.py
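The parse callback above follows relative "next page" links through response.urljoin. A minimal stdlib-only sketch of that URL resolution (the URLs here are illustrative, not taken from the blog):

```python
from urllib.parse import urljoin

# Hypothetical values: `base` is the page being parsed, `href` is what the
# next-page CSS selector would extract.
base = 'https://blog.scrapinghub.com/page/1/'
href = '/page/2/'

# response.urljoin(href) resolves relative links against the response URL,
# just as urllib's urljoin resolves them against a base URL.
print(urljoin(base, href))  # https://blog.scrapinghub.com/page/2/
```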
Deploy them to Scrapy Cloud, or use Scrapyd to host the spiders on your own server.
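Scrapyd exposes an HTTP JSON API for scheduling hosted spiders. A sketch of the request such a call would send; the host, port, and project name are assumptions, and the request is only built here, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Scrapyd defaults: localhost:6800 and a project named 'myproject'.
data = urlencode({'project': 'myproject', 'spider': 'blogspider'}).encode()

# Supplying `data` makes this a POST to Scrapyd's schedule.json endpoint.
req = Request('http://localhost:6800/schedule.json', data=data)
print(req.get_method(), req.full_url)  # POST http://localhost:6800/schedule.json
```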
Terminal
shub login
Insert your Scrapinghub API Key: <API_KEY>
# Deploy the spider to Scrapy Cloud
shub deploy
# Schedule the spider for execution
shub schedule blogspider
Spider blogspider scheduled, watch it running here:
https://app.scrapinghub.com/p/26731/job/1/8
# Retrieve the scraped data
shub items 26731/1/8
{"title": "Improved Frontera: Web Crawling at Scale with Python 3 Support"}
{"title": "How to Crawl the Web Politely with Scrapy"}
...
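shub items emits one JSON object per line (JSON Lines). A sketch of consuming that stream in Python, using the two example records above:

```python
import json

# Captured output of `shub items`: one JSON object per line.
raw = '''{"title": "Improved Frontera: Web Crawling at Scale with Python 3 Support"}
{"title": "How to Crawl the Web Politely with Scrapy"}'''

# Parse each line independently; a real consumer would stream line by line.
items = [json.loads(line) for line in raw.splitlines()]
print(items[1]['title'])  # How to Crawl the Web Politely with Scrapy
```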
Easily extensible
Portable, Python
https://scrapy.org/
12/28/2016
Healthy community: 17,564 stars and 4,836 forks on GitHub