
Evaluation Questions

Name:

Part 1 - Regular Expressions


• Please try to keep your regular expressions as simple as possible. LESS IS MORE.
• All work must be done on https://regex101.com/
• Settings must match what is shown in the red boxes below.

Question 1: Visit the URL https://www.amazon.com/dp/B084JCXSL6/ and write a regular
expression based on the HTML of the page to extract the product title information.

Answer:

______________________________________________________________________________

Question 2: Write a regular expression based on the HTML of the page to extract the price.

Answer:

______________________________________________________________________________

Question 3: Please write a regular expression based on the HTML of the page to extract the
canonical URL.

Answer:

______________________________________________________________________________

Question 4: Find what is common about the three URLs below and write one regular expression
to capture all three URLs.

https://www.homedepot.com/b/Furniture-Kitchen-Dining-Room-Furniture-Dining-Chairs/N-5yc1vZc7p6/Ntk-EnrichedProductInfo/Ntt-chair?Ntx=mode+matchpartialmax&NCNI-5

https://www.homedepot.com/b/Kitchen-Kitchenware/N-5yc1vZaqzo

https://www.homedepot.com/b/Appliances-Small-Kitchen-Appliances-Coffee-Espresso-Coffee-Makers/N-5yc1vZbv4w

Answer:
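One possible shape for an answer, as a sketch only: all three URLs share the `/b/` path segment and a navigation token that begins with `N-5yc1vZ`. The Python snippet below checks that idea against the three URLs listed above, with the wrapped lines rejoined. The names `CATEGORY_RE` and `is_category_url` are illustrative, and the pattern assumes the `N-5yc1vZ` prefix is stable across category pages, which three samples cannot confirm.

```python
import re

# The three category URLs from Question 4, with wrapped lines rejoined.
CATEGORY_URLS = [
    "https://www.homedepot.com/b/Furniture-Kitchen-Dining-Room-Furniture-Dining-Chairs/"
    "N-5yc1vZc7p6/Ntk-EnrichedProductInfo/Ntt-chair?Ntx=mode+matchpartialmax&NCNI-5",
    "https://www.homedepot.com/b/Kitchen-Kitchenware/N-5yc1vZaqzo",
    "https://www.homedepot.com/b/Appliances-Small-Kitchen-Appliances-Coffee-Espresso-"
    "Coffee-Makers/N-5yc1vZbv4w",
]

# Assumption: every category URL has /b/<slug>/N-5yc1vZ<token>, optionally
# followed by further path segments or a query string.
CATEGORY_RE = re.compile(r"https://www\.homedepot\.com/b/[\w-]+/N-5yc1vZ\w+\S*")


def is_category_url(url: str) -> bool:
    """Return True when the whole URL matches the category pattern."""
    return CATEGORY_RE.fullmatch(url) is not None


if __name__ == "__main__":
    for url in CATEGORY_URLS:
        print(is_category_url(url), url)
```

The same pattern can be pasted into regex101 with the Python flavor selected; a product URL (path `/p/`) should not match it.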

______________________________________________________________________________

Question 5: Find what is common about the three URLs below and write one regular expression
to capture all three URLs.

Product URLs:
https://www.homedepot.com/p/Cuisinart-Triple-Rivet-15-Piece-White-Knife-Set-with-Storage-Block-C77TR-16P/304088574

https://www.homedepot.com/p/IMAX-Vintage-Silver-Camera-Boxes-Set-of-2-36130-2/204369237

https://www.homedepot.com/p/Cuisinart-14-Cup-Programmable-Black-Stainless-Steel-Drip-Coffee-Maker-DCC-3200BKSP1/312699251

Answer:
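One possible shape for an answer, as a sketch only: each product URL has the `/p/` path segment, a hyphenated slug, and a trailing numeric ID. The Python snippet below tests that reading against the three product URLs above, with the wrapped lines rejoined. The names `PRODUCT_RE` and `product_id` are illustrative, and treating the final number as a product ID is an assumption drawn from the URL layout alone.

```python
import re
from typing import Optional

# The three product URLs from Question 5, with wrapped lines rejoined.
PRODUCT_URLS = [
    "https://www.homedepot.com/p/Cuisinart-Triple-Rivet-15-Piece-White-Knife-Set-"
    "with-Storage-Block-C77TR-16P/304088574",
    "https://www.homedepot.com/p/IMAX-Vintage-Silver-Camera-Boxes-Set-of-2-36130-2/204369237",
    "https://www.homedepot.com/p/Cuisinart-14-Cup-Programmable-Black-Stainless-Steel-"
    "Drip-Coffee-Maker-DCC-3200BKSP1/312699251",
]

# Assumption: product URLs are /p/<slug>/<digits>; the digits are captured.
PRODUCT_RE = re.compile(r"https://www\.homedepot\.com/p/[\w-]+/(\d+)")


def product_id(url: str) -> Optional[str]:
    """Return the trailing numeric ID, or None if the URL is not a product URL."""
    match = PRODUCT_RE.fullmatch(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    for url in PRODUCT_URLS:
        print(product_id(url), url)
```

Using `fullmatch` keeps the check strict; on regex101 the equivalent is to anchor the pattern with `^` and `$`.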

______________________________________________________________________________

Part 2 – Troubleshooting

Question 6: Suppose the scraper you built for a website has been running successfully for 6
days, but on the 7th day you notice that the price is no longer being extracted. What are
some possible causes for this (list as many as you can)?

Answer(s):

______________________________________________________________________________

Question 7: Suppose the scraper you built for a website stopped working because you were
getting blocked. What are some things you can do to get around the blocking problem (list as
many as you can)?

Answer(s):

______________________________________________________________________________

Part 3 – Technical Assessment

Question 8: Please describe all the strategies and methodologies you would use if you were
tasked with creating a scraper to extract all the products from a website.

Answer(s):

______________________________________________________________________________

Question 9: Suppose you were tasked with extracting 30,000 products from a website 4x/day.
How would you accomplish this task?

Answer(s):
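A back-of-the-envelope sizing can help frame an answer. The Python sketch below computes the sustained request rate and the rough number of concurrent workers implied by 30,000 products four times a day. It assumes one request per product; the per-run window and the 2-second average response time are hypothetical inputs, not figures from the question.

```python
import math

PRODUCTS = 30_000      # from the question
RUNS_PER_DAY = 4       # from the question
SECONDS_PER_DAY = 24 * 60 * 60


def required_rate(products: int, window_seconds: float) -> float:
    """Requests per second needed to fetch `products` pages within the window.

    Assumes exactly one request per product (no retries, no pagination).
    """
    return products / window_seconds


def workers_needed(rate: float, avg_latency_seconds: float) -> int:
    """Concurrent workers needed to sustain `rate` at the given average latency."""
    return math.ceil(rate * avg_latency_seconds)


if __name__ == "__main__":
    # Spreading each run over its full 6-hour slot gives the lowest rate.
    slot = SECONDS_PER_DAY / RUNS_PER_DAY
    even_rate = required_rate(PRODUCTS, slot)
    print(f"Full 6-hour slot: {even_rate:.2f} req/s")

    # Hypothetical: each run must finish within one hour.
    burst_rate = required_rate(PRODUCTS, 60 * 60)
    print(f"One-hour window:  {burst_rate:.2f} req/s")

    # Hypothetical 2-second average response time.
    print("Workers at 2 s latency:", workers_needed(burst_rate, 2.0))
```

The totals are 120,000 requests per day; retries, pagination, and any rate limit imposed by the site would raise the real requirement.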

______________________________________________________________________________

Question 10: Please describe all the ways you can design a scraper to extract information from
a specific location (e.g., all products from a store with postal code 10001).

Answer(s):

______________________________________________________________________________

Question 11: Go to the URL https://www.amazon.com/dp/B08H99878P/ and find the endpoint
(URL/API) that contains all the information highlighted in the green box (see screenshot below).

(Hint: this is not the answer: https://www.amazon.com/dp/B08H99878P/ref=olp-opf-redir?aod=1&ie=UTF8&condition=ALL)

Answer:
