You are on page 1of 6

Automation

Cheat Sheet

Websites | WhatsApp | Emails


Excel | Google Sheets | Files & Folders

Artificial Corner
File & Folder Regex
We use regex to create patterns that help
Other Metacharacters
\b Word boundary

Operation
match text.
Metacharacters \B No word boundary

We can create folders and manipulate files in Python using Path.


\d Digit (0-9)
\1 Reference

Path
Import Path:
\D No digits (0-9)
Table Extraction
\w Word Character (a-z, A-Z, 0-9, _) We can use camelot to extract tables from PDFs
from pathlib import Path and pandas to extract tables from some websites.
\W Not a Word Character
Get current working directory:
>>> Path.cwd() Whitespace (space, tab, new line)
PDF
\s
'/Users/frank/Projects/DataScience' Import library:
List directory content: \S No Whitespace (space, tab, new line) import camelot
>>> list(Path().iterdir())
[PosixPath('script1.py'), PosixPath('script2.py')] . Any character except new line Read PDF:
List directory content within a folder: tables=camelot.read_pdf('foo.pdf',
\ Ignores any special character
>>> list(Path('Dataset').iterdir()) pages='1',
^ Beginning of a string flavor='lattice')
Joining paths:
>>> from pathlib import Path, PurePath Export tables:
>>> PurePath.joinpath(Path.cwd(), 'Dataset') $ End of a string tables.export('foo.csv',
'/Users/frank/Projects/DataScience/Dataset' f='csv',
Quantifiers & Groups compress=True)
Create a directory:
>>> Path('Dataset2').mkdir() * 0 or more (greedy)
>>> Path('Dataset2').mkdir(exist_ok=True)
Export first table to a CSV file:
Rename a file: + 1 or more (greedy)
tables[0].to_csv('foo.csv')
>>> current_path = Path('Data')
>>> target_path = Path('Dataset') ? 0 or 1
>>> Path.rename(current_path, target_path) Print as a dataframe:
{3} Exact number print(tables[0].df)
Check existing file:
>>> check_path = Path('Dataset')
>>> check_path.exists() # True/False {n,} More than n characters Websites
Import library:
Metadata: {3,4} Range of numbers (Min, Max)
>>> path = Path('test/expenses.csv') import pandas as pd
>>> path.parts () Group
('test', 'expenses.csv')
>>> path.name Read table:
[] Matches characters in brackets
expenses.csv tables=pd.read_html('https://xyz.com')
>>> path.stem
expenses [^ ] Matches characters not in brackets
>>> path.suffix Printing table:
.csv | Or print(tables[0])
Send Email & Create Reports
Message
We can create an Excel report in Python using openpyxl.

Excel
With Python we can send emails and WhatsApp messages. Create workbook:
from openpyxl import Workbook
Email
Import libraries: wb = Workbook() # create workbook
ws = wb.active # grab active worksheet
import smtplib ws['C1'] = 10 # assign data to a cell
import ssl wb.save("report.xlsx") # save workbook
from email.message import EmailMessage
Working with existing workbook:
Set variables: from openpyxl import load_workbook
email_sender = 'Write-sender-here'
email_password = 'Write-passwords-here' wb = load_workbook('pivot_table.xlsx')
email_receiver = 'Write-receiver-here' sheet = wb['Report'] # grab worksheet "Report"
subject = 'Check this out!' Cell references:
body = """ min_column = wb.active.min_column
I've just published a new video on YouTube max_column = wb.active.max_column
""" min_row = wb.active.min_row
Send email: max_row = wb.active.max_row
em = EmailMessage()
em['From'] = email_sender Create Barchart:
em['To'] = email_receiver from openpyxl.chart import BarChart, Reference
em['Subject'] = subject barchart = BarChart()
em.set_content(body)
context = ssl.create_default_context() Locate data:
data = Reference(sheet,
with smtplib.SMTP_SSL('smtp.gmail.com', 465, context=context) as smtp: min_col=min_column+1,
smtp.login(email_sender, email_password) max_col=max_column,
smtp.sendmail(email_sender, email_receiver, em.as_string()) min_row=min_row,
max_row=max_row)
WhatsApp Locate categories:
Import libraries: categories = Reference(sheet,
min_col=min_column,
import pywhatkit max_col=min_column,
Send message to a contact: min_row=min_row+1,
max_row=max_row)
# syntax: phone number with country code, message, hour and minutes
pywhatkit.sendwhatmsg('+1xxxxxxxx', 'Message 1', 18, 52) Add data and categories:
Send message to a contact and close tab after 2 seconds: barchart.add_data(data, titles_from_data=True)
barchart.set_categories(categories)
# syntax: same as above plus wait_time, tab_close and close_time
pywhatkit.sendwhatmsg(“+1xxxxxxx”, “Message 2”, 18, 55, 15, True, 2) Add chart:
Send message to a group: sheet.add_chart(barchart, "B12")
# syntax: group id, message, hour and minutes Save existing workbook:
pywhatkit.sendwhatmsg_to_group("write-id-here", "Message 3", 19, 2) wb.save('report_2021.xlsx')
Web Automation HTML for Web Automation
Let's take a look at the HTML element syntax.
Web automation is the process of automating web actions like
clicking on buttons, selecting elements within dropdowns, etc. Tag Attribute Attribute
name name value End tag
The most popular tool to do this in Python is Selenium.
Selenium 4 <h1 class="title"> Titanic (1997) </h1>
Note that there are a few changes between Selenium 3.x versions and
Selenium 4. Attribute Affected content
Import libraries:
from selenium import webdriver HTML Element
from selenium.webdriver.chrome.service import Service
This is a single HTML element, but the HTML code behind
web="www.google.com" a website has hundreds of them.
path='introduce chromedriver path'
service = Service(executable_path=path)
HTML code example
driver = webdriver.Chrome(service=service)
driver.get(web) <article class="main-article">
<h1> Titanic (1997) </h1>
Find an element
<p class="plot"> 84 years later ... </p>
driver.find_element(by="id", value="...")
<div class="full-script"> 13 meters. You ... </div>
Find elements </article>
driver.find_elements(by="xpath", value="...") # returns a list
The HTML code is structured with “nodes”. Each
Quit driver rectangle below represents a node (element, attribute
driver.quit() and text nodes)
Getting the text
data = element.text Root Element Parent Node
<article>
Implicit Waits
import time
time.sleep(2) Element Attribute Element Element
<h1> class="main-article" <p> <div>
Siblings
Explicit Waits
from selenium.webdriver.common.by import By Text Attribute Text Attribute Text
Titanic (1997) class="plot" 84 years later ... class="full-script"" 13 meters. You ...
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.ID, 'id_name'))) The “root node” is the top node. In this example,
# Wait 5 seconds until an element is clickable <article> is the root.
Every node has exactly one “parent”, except the
Options: Headless mode, change window size root. The <h1> node’s parent is the <article> node.
from selenium.webdriver.chrome.options import Options “Siblings” are nodes with the same parent.
options = Options() One of the best ways to find an element is building
options.headless = True its XPath
options.add_argument('window-size=1920x1080')
driver = webdriver.Chrome(service=service, options=options)
XPath
We need to learn how to build an XPath to
Google Sheets
Google Sheets is a cloud-based spreadsheet application that can store data in a structured way just like most
properly work with Selenium. database management systems. We can connect Google Sheets with Python by enabling the API and
downloading our credentials.
XPath Syntax
An XPath usually contains a tag name, attribute Import libraries:
from gspread
name, and attribute value. from oauth2client.service_account import ServiceAccountCredentials
//tagName[@AttributeName="Value"] Connect to Google Sheets:
scope = ['https://www.googleapis.com/auth/spreadsheets',
Let’s check some examples to locate the article, "https://www.googleapis.com/auth/drive"]
title, and transcript elements of the HTML code we
used before. credentials=ServiceAccountCredentials.from_json_keyfile_name("credentials.json",
scope)
client = gspread.authorize(credentials)
//article[@class="main-article"]
Create a blank spreadsheet:
//h1 sheet = client.create("FirstSheet")
//div[@class="full-script"]
Sharing Sheet:
sheet.share('write-your-email-here', perm_type='user', role='writer')
XPath Functions and Operators
Save spreadsheet to specific folder (first manually share the folder with the client email)
XPath functions
client.create("SecondSheet", folder_id='write-id-here')
//tag[contains(@AttributeName, "Value")]
Open a spreadsheet:
sheet = client.open("SecondSheet").sheet1
XPath Operators: and, or
Read csv with Pandas and export df to a sheet:
//tag[(expression 1) and (expression 2)]
df = pd.read_csv('football_news.csv')
sheet.update([df.columns.values.tolist()] + df.values.tolist())
XPath Special Characters
Print all the data:
Selects the children from the node set on the sheet.get_all_records()
/
left side of this character
Specifies that the matching node set should Append a new row:
// new_row = ['0', 'title0', 'subtitle0', 'link0']
be located at any level within the document sheet.append_row(new_row)
Specifies the current context should be used
. (refers to present node) Insert a new row at index 2:
sheet.insert_row(new_row, index=2)
.. Refers to a parent node
A wildcard character that selects all Update a cell using A1 notation:
* elements or attributes regardless of names sheet.update('A54', 'Hello World')
@ Select an attribute Update a range:
() Grouping an XPath expression sheet.update('A54:D54', [['51', 'title51', 'subtitle51', 'link51']])
Indicates that a node with index "n" should
[n] Update cell using row and column coordinates:
be selected sheet.update_cell(54, 1, 'Updated Data')
Pandas Selecting rows and columns Data export
Cheat Sheet
Select single column: Data as NumPy array:
df['col1'] df.values
Select multiple columns: Save data as CSV file:
Pandas provides data analysis tools for Python. All of the df[['col1', 'col2']] df.to_csv('output.csv', sep=",")
following code examples refer to the dataframe below.
Show first/last n rows: Format a dataframe as tabular string:
df.head(2) df.to_string()
axis 1 df.tail(2)
col1 col2
Convert a dataframe to a dictionary:
Select rows by index values:
A 1 4 df.to_dict()
df.loc['A'] df.loc[['A', 'B']]
axis 0 Save a dataframe as an Excel table:
df = B 2 5 Select rows by position:
df.iloc[1] df.iloc[1:]
df.to_excel('output.xlsx')
C 3 6
Data wrangling Pivot and Pivot Table
Getting Started Filter by value: Read csv file:
df[df['col1'] > 1] df_sales=pd.read_excel(
Import pandas:
'supermarket_sales.xlsx')
import pandas as pd Sort by one column:
df.sort_values('col1') Make pivot table:
Create a series: df_sales.pivot_table(index='Gender',
Sort by columns: aggfunc='sum')
s = pd.Series([1, 2, 3], df.sort_values(['col1', 'col2'],
index=['A', 'B', 'C'], ascending=[False, True]) Make a pivot tables that says how much male and
female spend in each category:
name='col1') Identify duplicate rows:
Create a dataframe: df.duplicated() df_sales.pivot_table(index='Gender',
data = [[1, 4], [2, 5], [3, 6]] columns='Product line',
Drop duplicates: values='Total',
index = ['A', 'B', 'C'] df = df.drop_duplicates(['col1']) aggfunc='sum')
df = pd.DataFrame(data, index=index,
Clone a data frame:
columns=['col1', 'col2'])
clone = df.copy() Below are my guides, tutorials and
Read a csv file with pandas:
Concatenate multiple dataframes vertically: complete web scraping course:
df = pd.read_csv('filename.csv')
df2 = df + 5 # new dataframe - Medium Guides
pd.concat([df,df2]) - YouTube Tutorials
Advanced parameters:
Concatenate multiple dataframes horizontally: - Data Science Course
df = pd.read_csv('filename.csv', sep=',',
df3 = pd.DataFrame([[7],[8],[9]], - Automation Course
names=['col1', 'col2'],
index=['A','B','C'], - Web Scraping Course
index_col=0, columns=['col3']) - Make Money Using Your Programming
encoding='utf-8',
pd.concat([df,df3], axis=1) & Data Science Skills
nrows=3)
Made by Frank Andrade artificialcorner.com

You might also like