SCRAPING PROJECT
Running scrapy startproject <projectname> will create your project setup,
including the files needed to support your Scrapy project.
To define items, add them to the items.py file and then use them in the
spider:
import scrapy
from scrapy.item import Item, Field

class CustomItem(Item):
    one_field = Field()
    another_field = Field()

class Product(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()
    stock = scrapy.Field()
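Scrapy items behave like dicts restricted to their declared fields. As a rough illustration of that idea only (this is a toy stand-in, not Scrapy's actual implementation), a dict subclass that rejects undeclared keys behaves similarly:

```python
class Item(dict):
    """Toy stand-in for scrapy.Item: a dict that rejects undeclared fields."""
    fields = ()

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{key!r} is not a declared field")
        super().__setitem__(key, value)

class Product(Item):
    fields = ("name", "price", "stock")

p = Product()
p["name"] = "Widget"
p["price"] = 9.99
print(p)  # {'name': 'Widget', 'price': 9.99}
```

Real Scrapy items additionally carry per-field metadata via Field(); the point here is just the declared-keys-only behaviour.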
These three lines are equivalent ways to extract the first <span>'s text
from a response:
response.selector.xpath('//span/text()').get()
response.xpath('//span/text()').get()
response.css('span::text').get()
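These selector calls normally run against a live Scrapy response. As a standalone sketch of the same span-text extraction, the standard library's ElementTree supports a small XPath subset (no Scrapy required, assuming well-formed markup):

```python
import xml.etree.ElementTree as ET

# A small well-formed document standing in for a fetched page
html = "<html><body><span>hello</span></body></html>"
root = ET.fromstring(html)

# ElementTree's limited XPath: .//span finds the first matching element
text = root.find(".//span").text
print(text)  # hello
```

Scrapy's own selectors (backed by the parsel library) accept full XPath and CSS expressions and tolerate broken HTML, which ElementTree does not.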
Scrapy tool Global Commands:
You can obtain more info for each with: scrapy <command> -h
or see available commands with: scrapy -h
Launch the Scrapy shell with: scrapy shell <url>
Once the shell loads you will have the response object available and can
query it with CSS or XPath selectors, such as:
response.xpath('//title/text()')
Scrapy uses Python's standard logging module for event logging. Here are
some simple examples to get you started; for more advanced use cases it is
strongly recommended to read the logging documentation thoroughly.
import logging
logging.warning("This is a warning")
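A slightly fuller sketch, configuring a level and message format with the standard logging module (the logger name "myspider" is just an example):

```python
import logging

# Configure the root logger once, before any log calls are made
logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s: %(name)s: %(message)s",
)

# Named loggers let you tell which component emitted each message
logger = logging.getLogger("myspider")
logger.info("Spider started")
logger.warning("This is a warning")
```

Scrapy routes its own messages through this same module, so a spider's logging configuration applies to Scrapy's output as well.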
Email: Scrapy also provides its own facility for sending e-mail,
scrapy.mail.MailSender (see the Scrapy docs for details).
BEAUTIFULSOUP
soup.title
soup.p
soup.title.name
soup.find_all('a')
One common task is extracting all the URLs found within a page's <a>
tags:
for link in soup.find_all('a'):
    print(link.get('href'))
Another common task is extracting all the text from a page:
print(soup.get_text())
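If bs4 is not installed, the same kind of link extraction can be sketched with the standard library's html.parser (the class name LinkExtractor is just illustrative):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, like iterating soup.find_all('a')."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<a href="https://example.com">one</a><a href="/two">two</a>')
print(parser.links)  # ['https://example.com', '/two']
```

BeautifulSoup remains the more convenient choice for real pages, since it repairs malformed markup and offers searching and navigation beyond this event-driven style.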
import requests
from bs4 import BeautifulSoup

url = 'https://www.google.com'
headers = {'User-Agent': 'my-user-agent'}  # header info if needed
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'lxml')
Requests
>>> r.json()
https://requests.readthedocs.io/en/master/
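r.json() decodes a JSON response body into Python objects. A minimal sketch of the same decoding step without a network call, assuming the body has already been fetched:

```python
import json

# Simulate a response body as requests would expose it via r.text
body = '{"status": "ok", "items": [1, 2, 3]}'

# r.json() is essentially json.loads on the response text, plus
# encoding detection and error handling
data = json.loads(body)
print(data["items"])  # [1, 2, 3]
```

With a real response, prefer r.json() directly; it raises an error if the body is not valid JSON.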
find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector
Apart from the helpers listed above, Selenium provides two more general
methods, find_element and find_elements, which take a By locator and are
useful with locators in page objects. In Selenium 4 the find_element_by_*
helpers are deprecated in favour of these two.
Example usage:
element = driver.find_element(By.XPATH, '//button[text()="Some text"]')
Or:
elements = driver.find_elements(By.XPATH, '//button')
We can also add delays or give content time to load, either with a fixed
time.sleep() or, better, with an explicit wait:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

driver.get("localfileexample.html")
# document_initialised is a predicate function returning True once the page is ready
WebDriverWait(driver, timeout=10).until(document_initialised)
el = driver.find_element(By.TAG_NAME, "p")
assert el.text == "Hellooooo Scraping"
https://www.selenium.dev/documentation/en/webdriver/waits/
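WebDriverWait works by polling its condition until it returns a truthy value or the timeout expires. The core loop can be sketched in plain Python (the names below are illustrative, not Selenium's actual implementation):

```python
import time

class TimeoutException(Exception):
    """Raised when the condition is not met within the timeout."""

def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` elapses."""
    end = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end:
            raise TimeoutException("condition not met within timeout")
        time.sleep(poll_frequency)

# Usage: a condition that becomes true after a few polls
state = {"count": 0}
def ready():
    state["count"] += 1
    return state["count"] >= 3

print(wait_until(ready, timeout=5))  # True
```

This is why explicit waits are preferred over a fixed time.sleep(): the wait returns as soon as the condition holds instead of always sleeping the full duration.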