Name:
Answer:
______________________________________________________________________________
Question 2: Write a regular expression based on the HTML of the page to extract the price.
Answer:
______________________________________________________________________________
Question 3: Please write a regular expression based on the HTML of the page to extract the
canonical URL.
Answer:
______________________________________________________________________________
Question 4: Find what is common about the three URLs below and write one regular expression
to capture all three URLs.
https://www.homedepot.com/b/Furniture-Kitchen-Dining-Room-Furniture-Dining-Chairs/N-5yc1vZc7p6/Ntk-EnrichedProductInfo/Ntt-chair?Ntx=mode+matchpartialmax&NCNI-5
https://www.homedepot.com/b/Kitchen-Kitchenware/N-5yc1vZaqzo
https://www.homedepot.com/b/Appliances-Small-Kitchen-Appliances-Coffee-Espresso-Coffee-Makers/N-5yc1vZbv4w
Answer:
______________________________________________________________________________
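For reference on Question 4, here is a minimal sketch of one possible pattern, not a definitive answer. It assumes the common features are the "/b/" path prefix and a segment beginning with "N-5yc1v", and that anything after that segment (extra path parts or a query string) is optional. The name CATEGORY_URL is a hypothetical label chosen for illustration.

```python
import re

# Assumption: category pages share the "/b/" prefix, one slug segment,
# and an "N-5yc1v..." token; any trailing path or query string is optional.
CATEGORY_URL = re.compile(
    r"^https://www\.homedepot\.com/b/[^/?#]+/N-5yc1v[0-9A-Za-z]+(?:[/?#].*)?$"
)

urls = [
    "https://www.homedepot.com/b/Furniture-Kitchen-Dining-Room-Furniture-Dining-Chairs/"
    "N-5yc1vZc7p6/Ntk-EnrichedProductInfo/Ntt-chair?Ntx=mode+matchpartialmax&NCNI-5",
    "https://www.homedepot.com/b/Kitchen-Kitchenware/N-5yc1vZaqzo",
    "https://www.homedepot.com/b/Appliances-Small-Kitchen-Appliances-Coffee-Espresso-"
    "Coffee-Makers/N-5yc1vZbv4w",
]

# True for each URL the pattern accepts.
category_matches = [CATEGORY_URL.match(u) is not None for u in urls]
```

Anchoring with ^ and $ and requiring "/b/" keeps the pattern from also accepting product URLs, which use "/p/" instead.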
Question 5: Find what is common about the three URLs below and write one regular expression
to capture all three URLs.
Product URLs:
https://www.homedepot.com/p/Cuisinart-Triple-Rivet-15-Piece-White-Knife-Set-with-Storage-Block-C77TR-16P/304088574
https://www.homedepot.com/p/IMAX-Vintage-Silver-Camera-Boxes-Set-of-2-36130-2/204369237
https://www.homedepot.com/p/Cuisinart-14-Cup-Programmable-Black-Stainless-Steel-Drip-Coffee-Maker-DCC-3200BKSP1/312699251
Answer:
______________________________________________________________________________
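For reference on Question 5, a minimal sketch of one possible pattern follows, under the assumption that product URLs consist of the "/p/" prefix, a single descriptive slug, and a purely numeric ID as the final segment. The names PRODUCT_URL, slug, and product_id are hypothetical and chosen for illustration.

```python
import re

# Assumption: product pages are "/p/<slug>/<numeric id>" with nothing after the ID.
PRODUCT_URL = re.compile(
    r"^https://www\.homedepot\.com/p/(?P<slug>[^/?#]+)/(?P<product_id>\d+)$"
)

urls = [
    "https://www.homedepot.com/p/Cuisinart-Triple-Rivet-15-Piece-White-Knife-Set-"
    "with-Storage-Block-C77TR-16P/304088574",
    "https://www.homedepot.com/p/IMAX-Vintage-Silver-Camera-Boxes-Set-of-2-36130-2/204369237",
    "https://www.homedepot.com/p/Cuisinart-14-Cup-Programmable-Black-Stainless-Steel-"
    "Drip-Coffee-Maker-DCC-3200BKSP1/312699251",
]

# Collect the numeric ID captured from each URL (None if the URL does not match).
product_ids = []
for u in urls:
    m = PRODUCT_URL.match(u)
    product_ids.append(m.group("product_id") if m else None)
```

Using named groups makes the numeric ID available directly, which is often the value a scraper needs as a key.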
Part 2 – Troubleshooting
Question 6: Suppose the scraper you built for a website has been running successfully for 6
days, but on the 7th day you notice that the price is no longer being extracted. What are
some possible causes for this (list as many as you can)?
Answer(s):
______________________________________________________________________________
Question 7: Suppose the scraper you built for a website stopped working because you were
getting blocked. What are some things you can do to get around the blocking problem (list as
many as you can)?
Answer(s):
______________________________________________________________________________
Part 3 – Technical Assessment
Question 8: Please describe all the strategies/methodologies you would use if you were tasked
with creating a scraper to extract all the products from a website.
Answer(s):
______________________________________________________________________________
Question 9: Suppose you were tasked with extracting 30,000 products from a website 4x/day.
How would you accomplish this task?
Answer(s):
______________________________________________________________________________
Question 10: Please describe all the ways you can design a scraper to extract information from
a specific location (e.g., all products from a store with postal code 10001).
Answer(s):
______________________________________________________________________________