Automate your Browser with Python
Frank Anemaet
This book is for sale at http://leanpub.com/automateyourbrowserwithpython
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing
process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and
many iterations to get reader feedback, pivot until you have the right book and build traction once
you do.
Contents
Introduction
Chapter 1
  Setup
  Install on Ubuntu Linux
  Install on Mac OS X
  Install on Windows
Chapter 2
  Basics
  Selenium browser
  Navigate
  Take screenshot automatically
  Click button
  HTML Text input
  Get HTML element text
  Scroll page
  Test if webpage contains text
Chapter 3
  Locate elements by XPath
  Get all links from webpage
Chapter 4
  Get Links from Webpage
  Radiobuttons
  Dropdown
  Checkbox
  Table
  css selector
  Wait for page load
Chapter 5
  Maximize window
  Minimize window
  Hide window
  Window size and position
  Tabs
  Private mode
Introduction
Welcome to the Python selenium course. In this book you will learn how to automate the web
browser with Python.
You can automate any web app or website. This means you can do automated testing, automate
boring or repetitive tasks, do web scraping and a variety of other things.
How does it work?
In essence, a Python script controls a web browser such as Firefox or Chrome. The script
instructs the browser what to do: open new pages, navigate, collect data, click, or whatever
else the developer wants the browser to do.
Selenium is the intermediary, making it possible to control web browsers with Python. More details
on how it exactly works are discussed in this book.
Supported browsers are:
• Firefox
• Internet Explorer (version 11)
• Safari
• Opera
• Chrome
• Edge
You can run Python selenium scripts on Microsoft Windows, Apple macOS and Linux.
This is an intermediate course. You should already know the basics of Python. After reading this
book, you can control these web browsers with Python and you'll be able to automate any web
app or web site, do web scraping, web testing and a lot more!
Chapter 1
First let's talk about the basic setup for Python web automation. You start with Python code.
The code uses a web driver, and depending on which browser you want to automate, you need
a different driver.
For Chrome there's the chromedriver, for Firefox there's the geckodriver and so on.
A web driver controls a web browser. Depending on the driver, it controls Firefox, Chrome, Opera
etcetera.
So you are not controlling the web browser directly; you control it via a web driver.
Setup
You need to install the selenium module and a web driver.
I will show you how to do that. The first thing to do is to install the selenium module for Python.
You can do that with the command pip install selenium or pip3 install selenium, depending on
your operating system.
Most of the time it's pip install selenium.
Typing either command installs the selenium module.
Firefox
Download and install the Firefox browser. The next thing you want to do is to install the driver. You
can do that manually by downloading the Firefox driver from https://github.com/mozilla/geckodriver/releases/tag/v
(change to your version) . Then extract it, copy it to /usr/local/bin and make it executable using the
command chmod +x.
Alternatively, copy the contents of the script below and save it as firefox.sh
Then type this command to install the Firefox driver. It will install the web driver for you.
Chrome
If you want to use Chrome, install the Chrome web browser and the driver. You can download it
from https://chromedriver.storage.googleapis.com/
Then extract it and install it.
You can use the Chrome install script below; save it as chrome.sh:
    LATEST=$(wget -q -O - https://chromedriver.storage.googleapis.com/LATEST_RELEASE)
    wget https://chromedriver.storage.googleapis.com/$LATEST/chromedriver_linux64.zip
    unzip chromedriver_linux64.zip && sudo ln -s $PWD/chromedriver /usr/local/bin/chromedriver
If these commands ran successfully, the selenium module and the webdriver are installed.
The scripts test-firefox.py and test-chrome test whether the installation was successful.
Install on Mac OS X
If you use Apple Mac OS X, you can install selenium and the web driver too. The process is similar
to installation on Linux.
First open a terminal.
Firefox
Create this script and save it as firefox.sh
Chrome
For Chrome it’s similar. You should install the driver and the selenium module. If you don’t have
Chrome installed, you should install the browser too.
First install the selenium module
    LATEST=$(wget -q -O - https://chromedriver.storage.googleapis.com/LATEST_RELEASE)
    wget https://chromedriver.storage.googleapis.com/$LATEST/chromedriver_mac64.zip
    unzip chromedriver_mac64.zip && sudo ln -s $PWD/chromedriver /usr/local/bin/chromedriver
This will install both the web driver and the selenium module.
Install on Windows
To install Selenium on Windows 10:
1. Install Python
2. Install the latest web browser (Firefox, Chrome, Chromium…)
3. Download the web driver from https://www.seleniumhq.org/download/
4. Add the driver's path to the environment variable PATH
When using the program, you can set the path to the driver like this:
Selenium examples
Chapter 2
Selenium browser
You can programmatically control a web browser with selenium. The selenium module enables you
to control Firefox or Chrome or even mobile browsers.
Import the selenium module to control the web browser and the time module to create pauses;
the os module is nice to have. Then initialize Firefox and open a page:
    from selenium import webdriver
    driver = webdriver.Firefox()
    url = 'https://python.org'
    driver.get(url)
Navigate
Selenium is often used for testing. A simple test could be to check if the webpage title is equal to the
expected title.
You can open a browser and navigate to a URL as shown in the example below, but that alone doesn't
validate that the opened URL is actually working (you could have a lost internet connection, a timeout,
page not found etc).
You can access the title using the .title attribute. This contains the page title. Verification can be done
with an if-statement.
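A minimal sketch of such a title check is shown here. The helper name title_matches and the expected title string in the usage comment are assumptions made for illustration; the function works with any driver object that offers get() and the .title attribute.

```python
def title_matches(driver, url, expected_title):
    """Open url and report whether the page title equals expected_title."""
    driver.get(url)
    actual = driver.title
    if actual == expected_title:
        print(f"OK: title is {actual!r}")
        return True
    print(f"FAIL: expected {expected_title!r}, got {actual!r}")
    return False


# Usage with a real browser (needs Firefox and geckodriver installed;
# the expected title below is an assumption, check it for your page):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   title_matches(driver, "https://www.python.org", "Welcome to Python.org")
#   driver.quit()
```

A failed comparison does not tell you why the page is wrong, only that it is not the page you expected, which is usually enough for a first smoke test.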
    driver.get("https://stackoverflow.com")
    driver.quit()
Back, refresh
There are other navigation features you can use: go to the previous page with .back() or to the
next page with .forward(). The browser controlled by selenium actually keeps a history, just like
a regular browser does. You can refresh the page with .refresh().
The example below starts the Firefox browser, maximizes the window, opens a URL, then opens
another URL, goes back and refreshes. In short: all the navigational features of the browser you
normally use can be accessed programmatically.
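The steps described above can be sketched as a small function. The name navigation_tour and its parameters are made up for this illustration, and it assumes a driver that has already been started; treat it as a sketch under those assumptions.

```python
import time


def navigation_tour(driver, first_url, second_url, pause=1.0):
    """Visit two pages, go back, go forward, refresh; return the URLs seen."""
    seen = []
    driver.maximize_window()

    driver.get(first_url)
    seen.append(driver.current_url)

    driver.get(second_url)
    seen.append(driver.current_url)

    driver.back()                    # history: back to the first page
    time.sleep(pause)
    seen.append(driver.current_url)

    driver.forward()                 # history: forward to the second page
    time.sleep(pause)
    seen.append(driver.current_url)

    driver.refresh()                 # reload the current page
    return seen


# Usage with a real browser:
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   print(navigation_tour(driver, "https://python.org", "https://stackoverflow.com"))
#   driver.quit()
```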
    driver.get_screenshot_as_file(filename)
When you run this code, it opens the website and takes a screenshot. The screenshot is saved as
a .png file in the current working directory (usually the directory you run your Python script from).
You can give a full path if you want to save it elsewhere, like the Pictures directory.
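Saving into a specific directory could look like the sketch below. The helper name save_screenshot and the directory in the usage comment are assumptions; the only selenium call used is get_screenshot_as_file(), which returns False when the file cannot be written.

```python
from pathlib import Path


def save_screenshot(driver, directory, name="screenshot.png"):
    """Save a screenshot of the visible page into directory; return its path."""
    target = Path(directory).expanduser() / name
    target.parent.mkdir(parents=True, exist_ok=True)   # create the folder if needed
    if not driver.get_screenshot_as_file(str(target)):
        raise OSError(f"could not write screenshot to {target}")
    return target


# Usage with a real browser (the directory is just an example):
#   save_screenshot(driver, "~/Pictures", "python-org.png")
```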
Note that this only takes a screenshot of the visible part of the webpage. A full webpage
screenshot is shown a few pages further on.
Driver variables
The selenium driver object stores more variables: the current url and the webpage title. These
variables change each time you open a new url using the get() method.
The program below opens a website using driver.get(), then outputs the page URL and the page title.
Both variables, .title and .current_url, change with each new webpage you visit.
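You can also do this for a bunch of webpages:
    import time
    from selenium import webdriver

    driver = webdriver.Firefox()

    # Example list; substitute the pages you want to visit
    websites = ["https://python.org", "https://stackoverflow.com"]

    i = 0
    for w in websites:
        driver.get(w)

        # output page title
        print(driver.title)
        time.sleep(1)

        driver.get_screenshot_as_file(f'image-{i}.png')
        i = i + 1

    driver.quit()
When you run it, you'll see it visit each of those websites and print its title.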
To capture the whole page rather than only the visible part, resize the window to the page's full
scroll size before taking the screenshot:
    #coding=utf-8

    import time
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("https://python.org")

    # Ask the page for its full scroll width and height
    S = lambda X: driver.execute_script('return document.body.parentNode.scroll'+X)
    driver.set_window_size(S('Width'),S('Height')) # May need manual adjustment

    driver.find_element_by_tag_name('body').screenshot('screenshot.png')
    driver.quit()
Click button
You can make selenium click a button automatically. This can be any button on a web page.
You click a button like this:
    python_button.click()
To find the button, you can use the button text:
    find_element_by_xpath()
So what is an XPath?
XPath is a query language for selecting nodes from an XML document, in this case the webpage
(HTML). You don't have to worry too much about this, as you can get the XPath of any web element
using Chrome Developer Tools or Firefox Developer Tools. Both let you find the XPath (in the videos
we clearly show how).
A complete example is shown below, where it opens a webpage and clicks on the button. The button
reacts with a JavaScript alert, which selenium then accepts automatically too.
If the web browser creates a popup, you can also click on that. The driver lets you switch to an alert
and click the accept button, which makes the popup window go away.
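A rough sketch of the click-then-accept sequence follows. The function name click_and_accept_alert and the button text in the usage comment are assumptions. The locator is passed as the plain string "xpath" to the generic find_element(by, value) method, which is the same lookup that find_element_by_xpath() performs. On a real page the alert may need a moment to appear, so a short wait before switching can be necessary.

```python
def click_and_accept_alert(driver, button_xpath):
    """Click the element found by XPath, accept the alert it opens, return the alert text."""
    button = driver.find_element("xpath", button_xpath)
    button.click()

    alert = driver.switch_to.alert   # switch focus to the JavaScript alert
    message = alert.text             # read the text before dismissing it
    alert.accept()                   # same as pressing the OK button
    return message


# Usage with a real browser (the button text "Click me" is made up):
#   click_and_accept_alert(driver, "//button[text()='Click me']")
```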
Textfield input
Many websites provide some type of interaction through HTML input fields. Search engines let you
type a search query, forms let you fill in your information etc.
An input field can be completed automatically with Python selenium. The program below opens the
search engine duck.com and automatically types a search query.
The example uses the method find_element_by_id() to find the input box by its unique id. You can
see this id by viewing the webpage source in your browser. Sometimes an element doesn't have a
unique id, in which case you can use its XPath.
The method call send_keys() enters text into the selected text field. This can be any type of text you
want.
Finally, to do the actual search and not only type in the box, you can send the enter key. To send the
enter key, first add another import, then pass Keys.RETURN to send_keys():
    from selenium.webdriver.common.keys import Keys
    textfield.send_keys(Keys.RETURN)
The program below does a search on the website duck.com automatically, but this principle works
for any search engine.
If you have a web page with text fields like a first name field, a last name field or other input fields,
you can make the computer type into those text fields automatically.
Textfields
You can use the HTML element id or XPath to find the element.
    field1 = driver.find_element_by_id("FirstName")
    field1.send_keys('Donald')
You'll want to wait a second, as you don't want it to select the next element while typing.
    time.sleep(1)
Example code is shown below (you can run the attached code):
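As a rough sketch, filling several fields in one go could look like this. The helper name fill_fields is an assumption, as is the id "LastName" in the usage comment ("FirstName" is the id used above). It uses the generic find_element("id", ...) lookup, which does the same thing as find_element_by_id().

```python
import time


def fill_fields(driver, values, pause=1.0):
    """Type each value into the input field whose id is the dictionary key."""
    for field_id, text in values.items():
        field = driver.find_element("id", field_id)
        field.clear()            # remove any text already in the field
        field.send_keys(text)
        time.sleep(pause)        # short pause before moving to the next field


# Usage with a real browser (the id "LastName" is hypothetical):
#   fill_fields(driver, {"FirstName": "Donald", "LastName": "Duck"})
```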
Textbox input
If you want to fill a text area or text box (instead of a text field), that's possible too.
In principle it works the same, but you may want to add newline characters (\n) to separate lines.
Textbox
    print(driver.find_element_by_id("title1").text)
If not, you can use the XPath that you can grab using Firefox or Chrome developer tools.
Example code:
Scroll page
There are multiple ways to scroll down a webpage: by JavaScript injection and by sending keystrokes.
You can execute JavaScript by calling the method driver.execute_script(). To scroll to a vertical
position, replace Y below with the position in pixels:
    driver.execute_script("window.scrollTo(0, Y)")
To scroll all the way down, use the height of the page:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
The program below opens a website, then scrolls to the bottom of the page using javascript injection.
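The two JavaScript calls can be wrapped in small helpers, sketched here. The function names are made up for illustration. execute_script() passes extra arguments to the script as arguments[0], arguments[1] and so on, which avoids building the JavaScript string by hand.

```python
def scroll_to(driver, y):
    """Scroll the page to vertical position y (in pixels)."""
    driver.execute_script("window.scrollTo(0, arguments[0]);", y)


def scroll_to_bottom(driver):
    """Scroll to the end of the page and return the resulting vertical offset."""
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    return driver.execute_script("return window.pageYOffset;")


# Usage with a real browser:
#   driver.get("https://python.org")
#   scroll_to(driver, 500)
#   print(scroll_to_bottom(driver))
```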
As mentioned, you can also scroll by keypress; that works with the method send_keys().
This code opens a webpage and scrolls the page using send_keys().
This works for any type of element and text, so you can test if a web app or webpage gives the
correct output.
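One simple way to test whether a page contains some text is sketched below, under the assumption that checking the visible text of the body element is good enough for your case. The helper name page_contains is made up; "tag name" is the locator string that find_element_by_tag_name() uses internally.

```python
def page_contains(driver, text):
    """Return True if text occurs in the visible text of the page body."""
    body = driver.find_element("tag name", "body")
    return text in body.text


# Usage with a real browser:
#   driver.get("https://python.org")
#   if page_contains(driver, "Python"):
#       print("Text found")
```

The comparison is case-sensitive and only sees text that is rendered, so hidden elements do not count.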
Chapter 3
Locate elements by XPath
On a web page, each HTML element should ideally have a unique id or a unique name, which you can
use to get the HTML element.
    element = driver.find_element_by_name("...")
    element = driver.find_element_by_id("...")
In the web page code (HTML code) you can find those unique ids and names. For example, something
like this:
But in practice that's often not the case: elements do not always have a unique id or a unique name.
In that case you can select an HTML element using its XPath. XPath will give you the path to the
element.
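To get a feeling for what an XPath selects without starting a browser, you can try a query on a small document with Python's standard library. This is only an illustration: xml.etree.ElementTree supports a limited subset of XPath, and the PAGE snippet below is invented. Because the parsed root is the html element itself, the browser path /html/body/div[2]/span/a is written relative to it.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed page made up for this example
PAGE = """<html><body>
<div><a href="/first">First</a></div>
<div><span><a href="/second">Second</a></span></div>
</body></html>"""

root = ET.fromstring(PAGE)   # root is the <html> element

# Path to one element: the link inside the span of the second div
link = root.find("body/div[2]/span/a")
print(link.get("href"), link.text)    # prints: /second Second

# Every <a> element that has an href attribute, anywhere in the page
hrefs = [a.get("href") for a in root.findall(".//a[@href]")]
print(hrefs)                          # prints: ['/first', '/second']
```

The first query breaks as soon as the page layout changes (for example when a div is added), while the second keeps working, which is why shorter, attribute-based queries are usually more robust.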
The general way to grab an HTML element by XPath using selenium:
    element = driver.find_element_by_xpath("...")
To get the XPath you can use your browser. The way to get it depends on which web browser you
use. In general it’s like this: open your web browser on the page, start developer tools and grab the
XPath.
XPath in Firefox
In the Mozilla Firefox browser:
    element = driver.find_element_by_xpath("/html/body/div/div[2]/div[2]/div[5]/div[3]/span/a")
XPath in Chrome
From Chrome:
    /html/body/div/div[2]/div[2]/div[5]/div[3]/span/a
The developer tools give you the XPath of the selected element as a query string like the one above.
You can then grab your HTML element using the same XPath.
    element = driver.find_element_by_xpath("/html/body/div/div[2]/div[2]/div[5]/div[3]/span/a")
The program below gets the links from a webpage using the method find_elements_by_xpath(). It
then iterates over the list of links and outputs the href value (link value) for each one.
    elems = driver.find_elements_by_xpath("//a[@href]")
    for elem in elems:
        print(elem.get_attribute("href"))
In HTML, links do not only have a link value href, but also a description, available as .text.
    <a href="www.website.com">website</a>
    for elem in driver.find_elements_by_xpath("//a[@href]"):
        print(elem.get_attribute("href"), elem.text)
This will output all the links on the webpage, and all the text that is used to describe the links.
Chapter 4
Radiobuttons
If your webpage has radio buttons, you can click on them automatically. To do so, get the radio
button's id, name or XPath, then call the .click() method.
Radio buttons
Dropdown
You can select an item from a dropdown menu or listbox. This works slightly differently, as you have
to import Select from selenium. Then there are the methods select_by_visible_text() and
select_by_value().
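A small sketch of how those two methods might be used is shown here. The helper name choose_option is an assumption, as are the element id "cars" and the option "Volvo" in the usage comment. The function expects a Select wrapper that you create yourself, as shown in the comment.

```python
def choose_option(dropdown, visible_text=None, value=None):
    """Select one option on a Select wrapper, by visible text or by value attribute."""
    if visible_text is not None:
        dropdown.select_by_visible_text(visible_text)
    elif value is not None:
        dropdown.select_by_value(value)
    else:
        raise ValueError("pass visible_text or value")
    # Return the text of the option that is now selected
    return dropdown.first_selected_option.text


# Usage with a real browser (id and option text are made up):
#   from selenium.webdriver.support.ui import Select
#   dropdown = Select(driver.find_element("id", "cars"))
#   choose_option(dropdown, visible_text="Volvo")
```

Selecting by visible text matches what the user sees in the list, while selecting by value matches the value attribute in the HTML, which tends to change less often.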
Checkbox
Selenium can click on one or more checkboxes for you. Just get the element using xpath, id or name
and call the .click() method.
    # select element
    cbox = driver.find_element_by_xpath('/html/body/form/input[1]')
    cbox.click()
    time.sleep(2)
Checkbox
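Because .click() toggles a checkbox, clicking one that is already ticked unticks it. A hedged sketch of a helper that clicks only when needed follows; the name set_checked is made up, and it relies on the element's is_selected() method.

```python
def set_checked(element, checked=True):
    """Click a checkbox only if its current state differs from the wanted one."""
    if element.is_selected() != checked:
        element.click()
    return element.is_selected()


# Usage with a real browser:
#   cbox = driver.find_element("xpath", "/html/body/form/input[1]")
#   set_checked(cbox)           # make sure it is ticked
#   set_checked(cbox, False)    # make sure it is not ticked
```

For radio buttons this helper can only select an option; clicking a selected radio button does not deselect it.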
Table
Webpages often have tables, and you can grab all the table data with Python selenium. You can get all
rows, columns and other data. The first thing to do is to import the modules:
Then open a webpage. This can be a webpage on the internet or on your computer; as long as the
webpage has a table on it, it will work.
    driver.get(url)
Table on webpage
    <table>
    <tr>Row 1</tr>
    <tr>Row 2</tr>
    <tr>Row 3</tr>
    </table>
That's why counting the number of <tr> tags is counting the number of rows. Inside a row you can
find multiple cells (<td>), one per column, but those are irrelevant here because we count the number
of rows.
The example below opens a website and counts all the rows in the table shown. This works for any
driver including Firefox, Chrome etc. If you change the website, also change the table's XPath (you
can find it with the developer tools inspector).
    import time
    from selenium import webdriver

    # Create an instance of Firefox WebDriver
    driver = webdriver.Firefox()

    # Maximize the browser window
    driver.maximize_window()

    # Navigate to page
    driver.get("https://news.ycombinator.com")

    # Find the table element in the page
    table = driver.find_element_by_xpath("/html/body/center/table/tbody/tr[3]/td/table")

    # Find the <tr> elements in the table (the leading dot keeps the search inside it)
    rows = table.find_elements_by_xpath(".//tbody/descendant::tr")
    print(f"Total rows: {len(rows)}")

    # Pause the script for 3 seconds
    time.sleep(3)

    # Close the browser
    driver.close()
Table to list
Tables on any webpage can be converted into Python lists. Start by importing the selenium module
and starting a webdriver object; this can be Firefox, Chrome or any of the other supported browsers.
Optionally you can maximize the window, but even if the window is minimized or hidden, selenium
continues to work.
    import time
    from selenium import webdriver

    # Create an instance of Firefox WebDriver
    driver = webdriver.Firefox()

    # Maximize the browser window
    driver.maximize_window()
Open a webpage with a table on it. There can be more than one table on the page, as long as you
specify the correct xpath later.
    # Open webpage
    driver.get("https://www.w3schools.com/html/html_tables.asp")
The table element is located by xpath, then the rows are grabbed.
    # Get table
    table = driver.find_element_by_xpath("/html/body/div[6]/div[1]/div[1]/div[3]")

    # Get rows (the leading dot keeps the search inside the table element)
    rows = table.find_elements_by_xpath(".//tbody/descendant::tr")
You can always use the xpath, sometimes you can use id.
A loop goes over every row and every column. The text in each cell is added to a list.
After the loop has completed, the table data is in the Python list.
The list in question is named result_data, and it is filled by a nested loop: two loops in one. That's
because every row and every column is converted into a nested list (one inner list of cell texts per row).
The values len(rows) and len(cols) are the height and width of the table, counted in number of cells.
    for i in range(0,len(rows)):
        ...
        for j in range(0,len(cols)):
            ...
For every row in the loop, you can get the number of columns by counting the number of <td> tags.
    cols = rows[i].find_elements_by_tag_name('td')
Then you can get the cell data using cols[j].text.encode(‘utf-8’). The example below shows the
complete loop that converts the html table to a python list.
Then print the table, otherwise it's just in the computer's memory.
If you change the website, also change the XPath, because the XPath is specific to every website. You
can get the XPath using Firefox dev tools or Chrome dev tools.
    import time
    from selenium import webdriver

    # Create an instance of Firefox WebDriver
    driver = webdriver.Firefox()

    # Maximize the browser window
    driver.maximize_window()

    # Open webpage
    driver.get("https://www.w3schools.com/html/html_tables.asp")

    # Get table
    table = driver.find_element_by_xpath("/html/body/div[6]/div[1]/div[1]/div[3]")

    # Get rows (the leading dot keeps the search inside the table element)
    rows = table.find_elements_by_xpath(".//tbody/descendant::tr")

    # Create a list to store the text
    result_data = []

    # Get text in each column
    for i in range(0,len(rows)):
        cols = rows[i].find_elements_by_tag_name('td')
        cols_data = []
        for j in range(0,len(cols)):
            # Get the text
            cols_data.append(cols[j].text.encode('utf-8'))
        result_data.append(cols_data)

    # Print the result list
    print(result_data)

    # Pause the script for 3 sec
    time.sleep(3)

    # Close the browser
    driver.close()
css selector
You can also select elements with a CSS selector, the same pattern syntax that cascading style sheets
(CSS) use to pick the elements they style. You can use this to select any webpage element.
    driver.find_element_by_css_selector()
You can let selenium check if an HTML element exists. One way to do that is using CSS selectors.
The method find_elements_by_css_selector() returns a list of the elements matching the selector; the
list is empty if there are none.
    if driver.find_elements_by_css_selector('#element'):
        print("Element exists")
The program below opens a webpage and finds the submit button.
If unsure, you can grab the css selector using the developer tools in Firefox and Chrome.
Find by id
If an HTML element has a unique id, you can verify it exists by calling find_elements_by_id() as
shown in the program below.
Find by xpath
Finally, you can always use the XPath to find an element. That works as long as the webpage doesn’t
change.
The function below returns True if the element is found and False if it isn’t.
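A function of that kind might look like the sketch below. The name element_exists is an assumption. It uses the generic find_elements(by, value) method, where by is a locator string such as "xpath", "id" or "css selector", and which returns an empty list instead of raising an error when nothing matches.

```python
def element_exists(driver, by, value):
    """Return True if at least one element matches the locator, else False."""
    return len(driver.find_elements(by, value)) > 0


# Usage with a real browser (the locators are examples only):
#   element_exists(driver, "xpath", "/html/body/form/input[1]")
#   element_exists(driver, "id", "title1")
#   element_exists(driver, "css selector", "#element")
```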
Minimize window
Even if a web browser is not maximized or visible, it can still continue to do tasks. That means you
don't need to have the web browser open in full screen all the time; it can work in the background.
To minimize a web browser, you can use the script below:
A minimized window is still visible in the task bar; it's the same as pressing the minimize button.
Chapter 5 50
Hide window
You can hide the window completely. Unlike a minimized window that is visible in the taskbar, a
headless window is not shown anywhere. But you can still interact with it.
In headless mode everything still works, including clicking buttons, links, scrolling etc.
Firefox can be started in headless mode, by adding the options flag options.headless = True. The
program below opens a website while in headless mode and outputs the page title.
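A sketch of a headless run follows, with two caveats. The function names are made up, and instead of setting options.headless = True it adds Firefox's -headless command-line argument. That is a swapped-in alternative with the same effect; as far as I know, newer Selenium releases no longer accept the headless attribute. The driver class and the options object are passed in, so the same helper works with anything that follows the same interface.

```python
def make_headless(options):
    """Add Firefox's headless command-line flag to an options object."""
    options.add_argument("-headless")
    return options


def headless_page_title(driver_class, options, url):
    """Start a headless browser, open url, and return the page title."""
    driver = driver_class(options=make_headless(options))
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()    # always close the invisible browser again


# Usage with Firefox:
#   from selenium import webdriver
#   from selenium.webdriver.firefox.options import Options
#   print(headless_page_title(webdriver.Firefox, Options(), "https://python.org"))
```

Closing the driver in a finally block matters more in headless mode, because a forgotten browser process has no window to remind you it is still running.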
Tabs
Most modern web browsers support tabs. You usually have one or more tabs open, and you can let
Python do the same things: open tabs and switch tabs.
To open a new tab, you can execute JavaScript on the page. To switch tabs, you can call
switch_to.window().
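Both steps can be combined in one helper, sketched here under the assumption that the browser allows the page to open a new tab. The name open_in_new_tab is made up; window_handles and switch_to.window() are the driver features it relies on.

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab, switch to that tab, and return its window handle."""
    before = set(driver.window_handles)

    # Ask the page to open an empty tab
    driver.execute_script("window.open('about:blank', '_blank');")

    # The new tab is the handle that was not there before
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new tab appeared (popup blocked?)")

    driver.switch_to.window(new_handles[0])
    driver.get(url)
    return new_handles[0]


# Usage with a real browser:
#   first_tab = driver.current_window_handle
#   open_in_new_tab(driver, "https://python.org")
#   driver.switch_to.window(first_tab)   # go back to the first tab
```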
Private mode
Web browsers like Mozilla Firefox and Google Chrome support “private browsing”. This means your
web browser does not save history, cookies and other data, meaning on your local computer there
are no traces.
However, it does not mean "privacy mode". Websites can track a computer by IP address or use
techniques like browser fingerprinting or server-side cookies. That means that even if you clear
your cookies, websites can still track you using other techniques.
Cookies are files stored on the computer that are often used for tracking and authorization.
Many users confuse this, thinking private mode or incognito mode means privacy. It does not.
It means the browser doesn't store the data on your computer, that's all. By default, a selenium
browser starts with a clean, new profile.
That's the same as setting the private mode or opening a new private window.
Incognito mode and private mode are exactly the same: neither stores data like cookies and history
locally.