COMP3321
Doc ID: 6689695
{% extends "base.html" %}
{% block body %}
<h1>Fruit Stand</h1>
<p>Available fruit:</p>
<ul>
{% for fruit in fruits.items() %}
<li><a href="/fruits/{{ fruit[0] }}/">{{ fruit[0] }}</a> ({{ fruit[1] }})</li>
{% endfor %}
Add a new Fruit
" indext)
print(resp.text[index1-64:index2-7])
ronoat - cnn]
"subject": "Testing", "redirect
"classification": "UNCLASSIFIED//FOR OFFICIAL USE ONLY"}
resp = ses.post("https://siteworks.web.nsa.ic.gov/main/emailform/", data=payload)
print(resp.text[resp.text.find("Your form"):])
(U) In this example, ses.cert could also be a list or tuple containing (certfile, keyfile), and keyfile can be a passwordless PEM file or a PEM file and password string tuple, so you aren't prompted for your password every time.
(U) Low-level socket connections with socket
(U) Communication over a socket requires a server (which listens) and a client (which connects to the server), so we'll need to open up two interactive interpreters. Both the server and the client can send and receive data. The server must
1. Bind to an IP address and port,
2. Announce that it's accepting connections,
3. Listen for connections,
4. Accept a connection,
5. Communicate on the established connection.
We'll run the server (immediately below) in the notebook and the client (below) in a separate Python window on the system where we're running our jupyter-notebook.
import socket
sock_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # IPv4, TCP
HOST = "127.0.0.1"
PORT = 50505  # USE YOUR OWN!
sock_server.bind((HOST, PORT))
sock_server.listen(1)
sock_conn, meta = sock_server.accept()
sock_conn.send(b"Hello, welcome to the server")
sock_conn.recv(4096)
(U) The client must
1. Connect to an existing (IP address, port) tuple where a server is listening,
2. Communicate on the established connection.
So for our purposes, we'll run the following in a separate Python window:
# THIS IS THE CLIENT
import socket
sock_client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
HOST, PORT = "127.0.0.1", 50505  # must match a known server
sock_client.connect((HOST, PORT))
sock_client.recv(512)
sock_client.send(b"Thank you. I am the client")
(U) Buffering, etc. are taken care of for you, mostly.
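(U) The server and client steps above can also be exercised in a single process. The following is a minimal sketch, assuming a background thread stands in for the second interpreter; the helper name run_server and the use of port 0 (which asks the operating system for any free port) are illustrative choices, not part of the original exercise.

```python
import socket
import threading

HOST = "127.0.0.1"

# Server side: bind and listen, then accept in a background thread.
sock_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # IPv4, TCP
sock_server.bind((HOST, 0))            # port 0: let the OS choose a free port
sock_server.listen(1)
PORT = sock_server.getsockname()[1]    # the port that was actually assigned

server_received = []

def run_server():
    """Accept one connection, greet the client, and record its reply."""
    conn, meta = sock_server.accept()
    with conn:
        conn.sendall(b"Hello, welcome to the server")
        server_received.append(conn.recv(4096))

server_thread = threading.Thread(target=run_server)
server_thread.start()

# Client side: connect, read the greeting, then answer.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock_client:
    sock_client.connect((HOST, PORT))
    greeting = sock_client.recv(512)
    sock_client.sendall(b"Thank you. I am the client")

server_thread.join()
sock_server.close()
print(greeting)
print(server_received[0])
```

Using sendall rather than send avoids having to check how many bytes were actually written.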
(U) Topics for Future Consideration:
+ SOAP with SOAPpy and/or SUDS
+ Using modules from the Standard Library
+ XMLRPC
+ Parsing HTML with BeautifulSoup
HTTPS and PKI Concepts
(U) Overview of HTTPS and PKI concepts.
HTTPS and PKI Concepts
PKI is confusing, especially given the mix of internal and external uses, but there are some core concepts.
Public Key Infrastructure (PKI)
Each PKI certificate has two parts, the private key and the public key. The two keys are mathematically linked, but the private key cannot feasibly be derived from the public key. It is important to keep the private key secret at all costs. A compromised private key would allow someone else to pretend to be the original owner.
Establishing Trust
When you go to amazon.com, your browser receives their server certificate. But how do you know you can trust it?
Certificate Authorities (CAs)
Buried in your browser is a long list of known certificate authorities, such as Verisign. The amazon.com server certificate has been digitally signed by one of these CAs. We know it's coming from amazon.com because only Amazon holds the private key that corresponds to the public key in the certificate, and only that private key can decrypt traffic encrypted to the public key. In other words, because your computer knows the public key is signed by a known CA, and your computer is sending data encrypted to that public key, only Amazon can decrypt it because they have the corresponding private key.
PKI in the IC
The IC, including NSA, has its own certificate authorities (CAs). Furthermore, both the users and the servers have certificates (generally only servers have certificates on the outside). These certificates are signed by the IC CAs, which are visible
at isk nab sac. covinapesicerifialeAuthories shim
Digital Signature (DS) Certificate
90% of the time, you're using your digital signature certificate. This certificate verifies that you are you to the various services you access on NSAnet. You also use your DS certificate for Secure Shell (SSH) to access systems like MACHINESHOP, LABBENCH, and OpenShift.
Key Encryption (KE) Certificate
On the rare occasion that you encrypt an email, you use your KE certificate. Your browser doesn't actually need this certificate.
Key Formats
PKCS12
NSA keys come in PKCS12 (.p12) format, which contains both the public and private key. With Python, you need the OpenSSL package to use PKCS12 certificates.
PEM
PEM format is by far the most widely supported format on the outside. Many languages and frameworks only support PEM, not PKCS12. However, you can convert your key from PKCS12 to PEM format using the openssl command.
To further complicate matters, many languages and frameworks only support unencrypted PEM certificates. You can unencrypt your PEM or PKCS12 certificate with the openssl command, but this is generally a no-no since it would allow anyone to masquerade as you.
PPK
PPK format is only used by PuTTY, the SSH tool for Windows. You can convert your key from PKCS12 to PPK format with the P12_to_PPK Converter tool.
PKI with Python
pypki2
Examples at nso co sac govlovnanynkhiobmasiae README md
By Hand with ssl Package
SSL is the Secure Sockets Layer, which implements HTTPS (Hyper Text Transfer Protocol Secure).
Python 2.7.9+
from getpass import getpass
from urllib2 import build_opener, HTTPCookieProcessor, HTTPError, HTTPSHandler, Request
import ssl
# pemCertFile, pemKeyFile, and pemCAFile are assumed to hold paths to your PEM files
pemPasswd = getpass("Enter your PKI password: ")
context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
context.load_cert_chain(pemCertFile, keyfile=pemKeyFile, password=pemPasswd)
context.load_verify_locations(cafile=pemCAFile)
opener = build_opener(HTTPCookieProcessor(), HTTPSHandler(context=context))
req = Request('https://wikipedia.nsa.ic.gov/en/Colossally_abundant_number')
resp = opener.open(req)
print(resp.read())
Python 3.4+
from getpass import getpass
from urllib.request import build_opener, HTTPCookieProcessor, HTTPSHandler, Request
import ssl
# pemCertFile, pemKeyFile, and pemCAFile are assumed to hold paths to your PEM files
pemPasswd = getpass("Enter your PKI password: ")
context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
context.load_cert_chain(pemCertFile, keyfile=pemKeyFile, password=pemPasswd)
context.load_verify_locations(cafile=pemCAFile)
opener = build_opener(HTTPCookieProcessor(), HTTPSHandler(context=context))
req = Request('https://wikipedia.nsa.ic.gov/en/Colossally_abundant_number')
resp = opener.open(req)
print(str(resp.read(), encoding='utf-8'))  # read() returns bytes type, which has to be converted to str type
External Packages
OpenSSL
Handles PKCS12 and many other key formats, but not part of the standard library. It is included with Anaconda/Jupyter.
Requests
Supports only unencrypted PEM format. Takes care of a lot of little things for you like HTTP redirects. More on HTTP Status Codes at https://wikipedia.nsa.ic.gov/en/List_of_HTTP_status_codes
Updated 3 months ago by [redacted] in COMP 3321
(U//FOUO) This notebook demonstrates how to interact with web resources over HTTPS when using LABBENCH. It primarily uses the requests_pki module.
UNCLASSIFIED//FOR OFFICIAL USE ONLY
(U//FOUO) Python, HTTPS, and LABBENCH
(U//FOUO) This notebook demonstrates how to interact with web resources over HTTPS when using LABBENCH. It uses the requests_pki and rest_api modules.
(U) HTTP with requests_pki
(U) There are complicated ways of interacting with the network using built-in libraries such as urllib, urllib2, and httplib. For basic (unsecured) interactions we can use requests. However, with Secure The Net, almost everything is now PKI enabled.
(U) Luckily there is a module for that! LABBENCH has native support for requests, which makes this an easy bar for us.
# pip install requests
import ipydeps
modules = ['requests-pki', 'pypac']
ipydeps.pip(modules)
(U) Example 1: Obligatory example of requests
import requests
# One of the few things not yet requiring a certificate for Secure The Net
resp = requests.get("http://airs.s2.org.nsa.ic.gov/")
print("Response code: {}".format(resp.status_code))
print("Length of content: {}".format(len(resp.content)))
print("Length of text: {}".format(len(resp.text)))
(U) That's interesting. Are content and text the same?
resp.content == resp.text
(U) It turns out that content stores the bytes of the response and text stores the unicode of the response. Let's look at the text:
print(resp.text)
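(U) The difference between bytes and text can be seen without any network access. This is a small sketch using a made-up string; raw plays the role of resp.content and text the role of resp.text.

```python
raw = "café".encode("utf-8")   # bytes, like resp.content
text = raw.decode("utf-8")     # str, like resp.text

print(type(raw), type(text))
print(raw == text)             # a bytes object never compares equal to a str in Python 3
print(len(raw), len(text))     # lengths can differ: 'é' takes two bytes in UTF-8
```

This is why the content and text lengths reported for a response need not match.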
(U) That's great if we want the raw HTML...which in many cases we may. However, we can render the HTML response natively within Jupyter!
from IPython.display import display, HTML
display(HTML(resp.text))
(U) Notice that we didn't get any of the images that go along with this webpage, but for our purposes now this is sufficient.
(U) requests_pki
(U//FOUO) LABBENCH has made interacting with secure webpages trivial! That's because the requests_pki module works seamlessly with LABBENCH to pass your PKI with your request. Let's see how easy it is!
(U) Example 2: nbGallery
import requests_pki
sess = requests_pki.Session()
resp = sess.get("https://nbgallery.nsa.ic.gov/")
resp.headers
display(HTML(resp.text))
(U) So maybe Jupyter isn't meant to be a full-fledged web browser after all.
(U) Example 3: Notebook Gallery search
(U) Search the Notebook Gallery for a term, get the results back as JSON, and parse the JSON. This adds a new headers argument to the GET request.
(U//FOUO) Normally a web server will respond with some default type of output. That may be application/html, application/xml, or something else. If you don't like that, you can try persuading the server to give you something else using an Accept header. That will tell the server your preferred response format (i.e. the format you prefer to accept). Servers often support multiple formats, but not all of them.
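(U) To see what an Accept header looks like on a request object without contacting any server, here is a short sketch using the standard library's urllib.request. The URL is a placeholder chosen for illustration, and nothing is sent over the network.

```python
from urllib.request import Request

# Hypothetical URL for illustration; the request is only built, never sent.
req = Request(
    "https://example.com/notebooks",
    headers={"Accept": "application/json"},
)
print(req.get_header("Accept"))
```

The same dictionary can be handed to a session's get call through its headers argument.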
import json
search_term = "beautifulsoup"
url = "https://nbgallery.nsa.ic.gov/notebooks"
params = { "q" : search_term, 'sort' : 'score' }
headers = { 'Accept' : 'application/json' }
resp = sess.get(url, params=params, headers=headers)
resp.url
print(resp.text)
# json.loads() will parse a JSON string into lists and hashes
resp_parsed = json.loads(resp.text)
type(resp_parsed)
# take a look at it and find what you want
resp_parsed
# print the titles of all notebooks that matched your search term
[ record['title'] for record in resp_parsed ]
(U) Example 4: Using a proxy
(U) Sometimes you need a proxy setup, particularly when working with second party sites. requests_pki and pypac make this setup quite easy!
sess = requests_pki.Session()
params = { 'type' : 'Community', 'project' : 'All', 'service' : 'All' }
respFromCSE = sess.get(url, params=params)
display(HTML(respFromCSE.text))
(U) Example 5: Post with JSON
Sometimes you'll need to 'post' data rather than do a 'get' request. The 'post' works similarly to the 'get', but you'll need to specify parameters for the post and usually need to set the headers as well. This one posts the parameters as a JSON object; another common content type is application/x-www-form-urlencoded, in which you'll need to use the urllib library to URL encode your parameters prior to posting them.
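For the form-encoded case, the encoding step can be sketched with urllib.parse.urlencode from the standard library. The parameter names and values below are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical form parameters; spaces become '+' and pairs are joined with '&'
params = {"language": "English", "sort": "name asc"}
body = urlencode(params)
print(body)
```

The resulting string would be sent as the post data along with a Content-Type header of application/x-www-form-urlencoded.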
base_url = "https://naningstuff-mestern.apps.0s08.platform.cloud.nsa.ic.gov/"
# with this post, we're telling the host that we are sending json, and want to receive json
# the post parameters are sent in the 'data' key, and must be json in this case
status_code = 0
tries = 0
while not status_code == 200:
    resp = sess.post(
        base_url + 'GetRecord/languages/languages',
        headers={"Accept": 'application/json', 'Content-Type': 'application/json'},
        data=json.dumps({'language': {"$ne": 'English'}})
    )
    status_code = resp.status_code
    tries += 1
    if tries > 3:
        break
print(resp.status_code)
languages = json.loads(resp.text)
print(len(languages))
print(languages[0])
(U) rest_api
The rest_api library is another resource for accessing HTTPS pages on NSANet. Like requests_pki, rest_api takes care of all the PKI authentication for you, but this library is built to enable you to create what's called an 'API wrapper', which means that we're wrapping our own class around the API, designed to make it easier to query the API and interpret the results. API, by the way, stands for Application Programming Interface, and is basically a clearly defined set of methods for communicating with a given service, or rules for interacting with data housed in a web service.
In general, if you want to hit a single web page, requests_pki is preferred because there's less overhead (you don't have to create a whole class to do it). But if you want to hit multiple pages at a website or API, then rest_api is probably the better way to go.
(U) Example 6: rest_api with TESTFLIGHT
This example shows a simple class that inherits from rest_api.AbstractRestAPI, and allows us to hit a couple of pages (called 'endpoints') of the TESTFLIGHT API. Notice we set host and headers as class variables. With these set, we don't have to define them every time we make a query to a TESTFLIGHT page. For each page we just add the actual page or endpoint and the class fills in the rest of the URL.
ipydeps.pip('rest-api')
import rest_api
class Testflight(rest_api.AbstractRestAPI):
    host = 'https://tf-saw.testflight.prof.nsa.ic.gov'
    headers = {'Accept': 'application/json'}
    def sources(self):
        """Returns a list of all sources that feed Testflight"""
        endpoint = '/SolanoService/rest/report/sources'
        return self._get(endpoint).json()
    def search(self, **kwargs):
        """Returns report summaries that match the given keyword arguments"""
        endpoint = '/SolanoService/rest/report/search/'
        return self._post(endpoint, data=kwargs).json()
from pprint import pprint
tf = Testflight()
print(tf.sources()[:3])
pprint(tf.search(originator='NSA', fields="subject serial nipf", start=0, rows=3, sort='Newest'))
(U) Other resources
+ (U//FOUO) Other notebooks on the Notebook Gallery that use requests (can you modify example 4 above to find them?)
+ (U//FOUO) pypki2, an open source module for working with your P12 certificate that originated at NSA. It's not part of Anaconda and works best in Jupyter on LABBENCH. It works with urllib requests instead.
(U) One more comment. Be careful when you try to display the HTML from webpages...some webpages may affect things more than you want.
resp = sess.get("https://home.web.nsa.ic.gov/")
# ) and cell (<td>) at a time, outputting the information to the screen as csv
# and to the output file in pipe ("|") delimited format
for table in tables:
    i = 1
    for tr in table.findAll("tr"):
        i += 1
        j = 0
        for td in tr.findAll('td'):
            print("{},".format(td.text), end="")
            outfile.write("element{}:{}|".format(j, td.text))
            j += 1
        outfile.write("\n")
        print()
outfile.close()
(U) Notice how we can display a hyperlink to our output - this might be handy if you don't want to go to Jupyter Home to display the file.
display itm (‘*.format(?[—Jabie. txt")
(U) Now let's try something a little trickier. Let's pull down the homepage and redisplay the "Current Activities" bulleted list in the notebook.
# We are going to use the rest_api module here. This is an NSA-specific package and has
# HTTPS support baked in. It makes pulling webpages using your PKI
# the package was really designed to access RESTful webservices and
urlstring =
parameters
headers = ( 'text/html',
'application/xhtml+xml',
'application/xml;q=0.9',
# create an api object for our host server
api = rest_api.AbstractRestAPI(host=urlstring)
# Get the homepage from the server. If you wanted sub-pages off the server you would put
# that path in the querystring as something like "/folder/page.html".
try:
    resp = api._get(querystring)
except Exception:
    print("Well that didn't work!")
# create our BeautifulSoup object
bsObj = BeautifulSoup(resp.text, "html.parser")
# Use the .find method to get the <body> of the HTML document. We will drill down to our
# list from there. .find() only returns the first matching HTML tag, which is OK in this
# case because you *should* only have one <body> tag in the document
body = bsObj.find("body")
(U//FOUO) Now for the sticky bit.
From the Chrome browser Tools -> Developer Tools console (could have done this in Firefox as well from Tools -> Web Developer -> Toggle Tools) I ascertained that the path through the HTML to the bulleted list I care about is
#section2 > div.item-container.item-container2.item-container-item147087 > div.item-content.item-content2 > div > div
more succinctly, as xpath it is
//*[@id="section2"]/div[2]/div[1]/div/div
but BeautifulSoup does not accept xpath (whomp, whomp). If you like to use xpath, the lxml module does a decent job of parsing HTML and does accept xpath syntax.
# Now I progress through the body object using the find_next method to get to the bulleted list
activities = body.find_next('div', {'id': 'section2'}).find_next('div', {'class': 'feedDisplay'})
(U) Now we have the right element in the activities object. We can use
the str() method to get the raw HTML from the object and either print it
inline in the notebook or we can just print the text using the .text attribute.
# print(activities)
display(HTML(activities.__str__()))
An easier way: using 'select'
In the 'Inspector' view in your Developer Tools, you can right-click on your desired tag and choose 'Copy Unique Selector' to copy the CSS selector path for your tag. Then you can use soup.select or soup.select_one to navigate directly to that tag, rather than crawling through the entire hierarchy to get to it. (Note: I ran this in Firefox, not sure what the right-click menu is like in Chrome.)
selector = ".rssEntries > li:nth-child(1) > div:nth-child(4)"
# at least for our version of bs4, you have to replace
# nth-child with nth-of-type
selector = selector.replace("nth-child", "nth-of-type")
# bsObj.select would find all tags with that path
bsObj.select_one(selector)
Module: Operations with Compression and Archives
(U) Module: Operations with Compression and Archives
user_string
name,username,city,state,zip_code,primary_workstation
json_string
[{"author": "Jane Austen", "title": "Pride and Prejudice"}, {"author":
"Fyodor Dostoevsky", "title": "Crime and Punishment"}, {
with open("user_file.csv", "w") as f:
    f.write(user_string)
with open("user_file.json", "w") as f:
    f.write(json_string)
zipfile
import io
import zipfile
with zipfile.ZipFile('user_file.zip', mode='w') as zf:
    zf.write('user_file.csv')
zf = zipfile.ZipFile('user_file.zip')  # with a filename
zf2 = zipfile.ZipFile(open('user_file.zip', 'rb'))  # with a file or file-like object
zf.namelist()
zf2.namelist()
z = zf.filelist[0]
z.filename, z.file_size
[(z.filename, z.file_size) for z in zf.filelist]
zf.getinfo("user_file.csv")
user_file_csv = zf.open("user_file.csv", 'r')  # returns a file-like object!
from csv import DictReader
# ZipFile.open() yields bytes in Python 3, so wrap it in a text layer for DictReader
user_file_text = io.TextIOWrapper(user_file_csv)
user_data = [_ for _ in DictReader(user_file_text)]
print(len(user_data))
user_data[0]
user_file_csv.read()
user_file_csv.close()
zf.extract(zf.filelist[0], 'zfextract')
gzip
import gzip
with gzip.open('user_file.csv.gz', 'wt') as gf:
    gf.write("This string will be stored as text")
gzip_users = gzip.open('user_file.csv.gz')  # takes a file name, returns a file-like object!
x = gzip_users.readlines()
gzip_users.close()
x[:3]
gzip_users = gzip.open("user_file.csv.gz", 'rt')
gz_user_dicts = list(DictReader(gzip_users))
gz_user_dicts[:2]
with open('user_file.csv.gz', 'rb') as f:
    still_gzipped = f.read()
still_gzipped[:10]
import io
# the gzipped data is bytes, so it is wrapped in io.BytesIO rather than StringIO
unpacked_users = gzip.GzipFile(fileobj=io.BytesIO(still_gzipped))  # what if you have bytes or a file-like object to unpack?
unpacked_users.readlines()[:3]
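The bytes-in-memory case can be tried without touching the file system. This is a minimal sketch with made-up sample data; it shows two ways to unpack gzipped bytes, either all at once or through a file-like wrapper.

```python
import gzip
import io

original = b"name,city\nalice,Denver\nbob,Austin\n"   # made-up sample data
still_gzipped = gzip.compress(original)

# Option 1: decompress the bytes directly
restored = gzip.decompress(still_gzipped)

# Option 2: wrap the bytes in a file-like object and read line by line
with gzip.GzipFile(fileobj=io.BytesIO(still_gzipped)) as unpacked:
    lines = unpacked.readlines()

print(restored == original)
print(lines[:2])
```

The second form is useful when a later step, such as a CSV reader, expects a file-like object rather than one large bytes value.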
tarfile
import tarfile
with tarfile.open("userfile.tar", mode='w') as tf:
    tf.add("user_file.csv")
    tf.add("user_file.json")
tarfile.is_tarfile('userfile.tar'), tarfile.is_tarfile('user_file.csv')
tf = tarfile.open('userfile.tar')  # don't need to unzip first!
tf.getmembers()
tf.getnames()
u = tf.extractfile("user_file.csv")
u2 = tf.extractfile(tf.getmembers()[1])
u.readline()
u2.read()[:50]
tf.extractall('from_tarball')
Module: Regular Expressions
(U) Module: Regular Expressions
(U) Regular Expressions (Regex)
(U) Now You've Got Two Problems...
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Jamie Zawinski, 1997
(U) A regular expression is a tool for finding and capturing patterns in text strings. It is very powerful and can be very complicated; the second problem referred to in the quote is a commentary on how regular expressions are essentially a separate programming language. As a rule of thumb, use the in operator or string methods like find or startswith if they are suitable for the task. When things get more complicated, use regular expressions, but try to use them sparingly, like a seasoning. At times it may be tempting to write one giant, powerful, super regular expression, but that is probably not the best thing to do.
(U) The power of regular expressions is found in the special characters. Some, like ^ and $, are roughly equivalent to string methods startswith and endswith, while others are more flexible, especially . and *, which allow flexible matching.
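(U) The rough equivalence between the anchors and the string methods can be checked directly. This is a small sketch using the sample word from the examples in this section.

```python
import re

word = "mike"

# ^ anchors at the start, much like startswith
starts = (word.startswith("mi"), bool(re.search("^mi", word)))
# $ anchors at the end, much like endswith
ends = (word.endswith("ke"), bool(re.search("ke$", word)))
# . matches any single character and * repeats the preceding item zero or more times
flexible = bool(re.search("^m.*e$", word))

print(starts, ends, flexible)
```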
(U) Getting Stuff Done without Regex
"mike" in "so many mikes!"
"mike".startswith("mi")
"mike".endswith("ke")
"mike".find("k")
"mike".isalpha()
"mike".isdigit()
"mike".replace("k", "e")
(U) Regular expressions in Python
There are only a few common methods for using the re module, but they don't always do what you would first expect. Some functionality is exposed through flags, which are actually constants (i.e. int values defined for the re module), which means that they can be combined by addition (or, more conventionally, with the bitwise OR operator |).
import re
re.match("c", "abcdef")
re.match("a", "abcdef")
re.search("c", "abcdef")
re.search("C", "abcdef")
re.search("C", "abcdef", re.I)  # re.IGNORECASE
re.search("^c", "ab\ncdef")
re.search("^c", "ab\ncdef", re.M)  # re.MULTILINE
re.search("^C", "ab\ncdef", re.M + re.I)
(U) In both match and search, the regular expression precedes the string to search. The difference between the two functions is that match works only at the beginning of the string, while search examines the whole string.
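(U) A short sketch of both points, using the same sample string as the cell above: match only looks at the start of the string, search scans all of it, and two distinct flags give the same value whether added or OR-ed together.

```python
import re

text = "ab\ncdef"

at_start = re.match("c", text)    # None: "c" is not at the beginning
anywhere = re.search("c", text)   # a match object for the "c" at index 3

flags_added = re.M + re.I         # addition, as in the cell above
flags_ored = re.M | re.I          # the more conventional bitwise OR
multi = re.search("^C", text, flags_added)

print(at_start, anywhere.start(), flags_added == flags_ored, multi.group())
```

One reason to prefer | is that OR-ing a flag with itself leaves it unchanged, while adding a flag to itself produces a different number.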
(U) When repeatedly using the same regular expression, compiling it can speed up processing. After a compiled regular expression is created, match, search, and other methods can be called on it, and given only the search string as a single argument.
c_re = re.compile("c")
c_re.search("abcde")
Regex Operators
. - matches a single character
^ - matches beginning of a string or newline
$ - matches end of string
* - 0 or more of something
+ - 1 or more of something
? - 0 or 1 of something
*?, +?, ?? - don't be greedy (see example below)
{3} - match 3 of something
{2,4} - match 2 to 4 of something
\ - escape character
[lrnLRN] - match any ONE of the letters l, r, n, L, R, N
[a-m] - match any ONE of letters from a to m
a|m - match letter a or m
\w - match a word character (letter, digit, or underscore)
\s - match a space
\d - match a digit
re.search(*f'ss", "hike Likes cheese\nand Mike Likes bees")
3
anda CGH MDERIH(2)-[H(4)", “vetto, 1 an a very bad terrorist. 1 you uanted to know, ay phone number 4s (303) 555-2
2
re.findall("mi.*ke", "I am looking for mike and not all this stuff in between mike")
re.findall("mi.*?ke", "I am looking for mike and not all this stuff in between mike")
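The difference between the greedy and non-greedy forms is easiest to see side by side. This sketch reuses the sentence from the two calls above and keeps the results in variables so they can be compared.

```python
import re

sentence = "I am looking for mike and not all this stuff in between mike"

greedy = re.findall("mi.*ke", sentence)   # .* grabs as much as it can
lazy = re.findall("mi.*?ke", sentence)    # .*? stops at the first chance

print(greedy)
print(lazy)
```

The greedy pattern returns one long match running from the first "mi" to the last "ke", while the non-greedy pattern returns each "mike" separately.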
Capture Groups
Put what you want to pull out of the strings in parentheses ()
ww.string © tien 4s the best tess Yor dein.
fesult = re.findali(™ je), my-strng)
print(restit)
Prine(result))
ro" gramming”
Matches and Groups
(U) The return value from a successful call of match or search is a match object; an unsuccessful call returns None. First, this is suitable for use in if statements, such as if c_re.search("abcde"): ... . For complicated regular expressions, the match object has all the details about the substring that was matched, as well as any captured groups, i.e. regions surrounded by parentheses in the regular expression. These are available via the group and groups methods. Group 0 is always the whole matching string, after which remaining groups (which can be nested) are ordered according to the opening parenthesis.
m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
m.group()
m.group(1)
m.group(2)
m.groups()
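(U) The ordering of nested groups by opening parenthesis can be illustrated with a small variation on the pattern above. The extra outer group and the trailing group are additions made only for this sketch.

```python
import re

# Group 1 wraps the full name, groups 2 and 3 are nested inside it, group 4 follows
m = re.match(r"((\w+) (\w+)), (\w+)", "Isaac Newton, physicist")

whole = m.group(0)     # the entire matched text
parts = m.groups()     # groups 1..4, ordered by where each "(" opens

print(whole)
print(parts)
```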
Other Methods
(U) Other regular expression methods work through all matches in the string, although what is returned is not always straightforward, especially when captured groups are involved. We demonstrate some basic uses without captured groups. When doing more complicated things, please remember: be careful, read the documentation, and do experiments to test!
re.findall("a.c", "abcadcaecafc")  # returns list of strings
re.finditer("a.c", "abcadcaecafc")  # returns iterator of match objects
re.split("a.", "abcadcaecafc")  # returns list of strings
(U) The sub method returns a modified copy of the target string. The first argument is the regular expression to match, the second argument is what to replace it with (which can be another string or a function), and the third argument is the string on which the substitutions are to be carried out. If the sub method is passed a function, the function should take a single match object as an argument and return a string. For some cases, if the substitution needs to reference captured groups from the regular expression, it can do so using the syntax \g<number>, which is the same as calling the group method within a function.
Pet, "ance", “abracadabra*)
re-swecra(.t2)e", “ager ecie", “abracadabra)
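def reverse_first_group(matchobj):
    match = matchobj.group()
    rev_group = matchobj.group(1)[::-1]
    # start() and end() index into the whole string, so shift them to be relative to this match
    offset = matchobj.start()
    return match[:matchobj.start(1) - offset] + rev_group + match[matchobj.end(1) - offset:]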
re. sub("a
)c* yreverse_First_group,“sbracadabra’
(U) In the above, we used start and end, which are methods on a match object that take a single numeric argument (the group number) and return the starting and ending indices in the string of the captured group.
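(U) The following sketch pulls these pieces together on the same target string: a string replacement that refers to a group with \g<1>, a function replacement that does the same job through the match object, and the start and end indices of a group. The patterns and the helper name upper_group are made up for illustration.

```python
import re

target = "abracadabra"

# String replacement: \g<1> stands for whatever group 1 captured
wrapped = re.sub(r"a(.)", r"[\g<1>]", target)

# Function replacement: receives a match object, returns the replacement text
def upper_group(matchobj):
    return matchobj.group(1).upper()

shouted = re.sub(r"a(.)", upper_group, target)

# start(1) and end(1) give the position of group 1 within the whole string
m = re.search(r"r(a)c", target)
span = (m.start(1), m.end(1))

print(wrapped, shouted, span)
```

The final "a" in the target is left alone in both substitutions because the pattern requires a character after it.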
(U) One final warning: a group can be captured more than once, for instance when its definition is followed by a + or a *; then only the last occurrence of the group will be captured and stored.
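(U) A quick demonstration of that warning, using a made-up pattern and string. The group matches three times, but only the final repetition is kept; findall on the unrepeated pattern is one way to recover every occurrence.

```python
import re

m = re.match(r"(\w\d)+", "a1b2c3")

whole = m.group(0)     # everything the repeated group consumed
last_only = m.group(1) # only the final repetition survives

every = re.findall(r"\w\d", "a1b2c3")   # each occurrence, without the repetition

print(whole, last_only, every)
```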
Hashes
(U) Computing Hashes in Python
(U) Hashes
(U) Let's start with hashes. Hashes map data of arbitrary size to data of fixed size and have a variety of uses:
+ securely storing passwords
+ verifying file integrity
+ efficiently determining if data is the same
(U) There are many different hashing algorithms. You've probably heard of some of the more common ones, such as MD5, SHA1, and SHA256.
(U) Hashes have some useful features:
+ they are one-way, meaning that given a hash, there isn't a function to convert it back to the original data
+ they map data to a fixed output, which is useful when comparing large amounts of data (such as files)
(U) So let's generate a hash.
from hashlib import sha256
sha256("abc".encode('ascii')).hexdigest()
(U) or
sha256(b'abc').hexdigest()
(U) We all know storing plaintext passwords is bad. A common technique for avoiding this is to store the hash of the password, then check if the hashes match. So we can create a short function to check if the typed password matches the stored hash:
def check_password(clear_password, password_hash):
    # clear_password must be bytes, e.g. b'secret' or 'secret'.encode('utf-8')
    return sha256(clear_password).hexdigest() == password_hash
(U) Does anyone know why storing the hash of a password is bad?
(U) If the password hash database was ever compromised, it would be vulnerable to a pre-computation attack (rainbow table), where an attacker pre-computes hashes of common passwords. There are tools such as scrypt to help mitigate this vulnerability.
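(U) One common mitigation is to add a random salt and use a deliberately slow key-derivation function. The sketch below uses PBKDF2 from the standard library's hashlib rather than scrypt, purely because it is always available; the function names, the salt length, and the iteration count are illustrative assumptions and not recommendations from the course.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # illustrative work factor, not a recommendation

def hash_password(clear_password, salt=None):
    """Return (salt, derived_key) for a password given as str."""
    if salt is None:
        salt = os.urandom(16)   # a fresh random salt defeats pre-computed tables
    key = hashlib.pbkdf2_hmac("sha256", clear_password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def check_salted_password(clear_password, salt, stored_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(clear_password, salt)
    return hmac.compare_digest(key, stored_key)

salt, stored = hash_password("correct horse")
print(check_salted_password("correct horse", salt, stored))
print(check_salted_password("wrong guess", salt, stored))
```

Because each password gets its own salt, two users with the same password end up with different stored values.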
(U) How about a safer use of hashes? Suppose you need to look for duplicate files. Doing a byte-per-byte comparison of every file to every other file would be really expensive. A better approach is to compute the hash of each file, then compare the hashes.
import os
from hashlib import md5
def get_file_checksum(filename):
    h = md5()
    chunk_size = 8198
    with open(filename, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if len(chunk) == 0:
                break
            h.update(chunk)
    return h.hexdigest()
(U) There is a small danger with this approach: collisions. Since we're mapping a lot of data to a smaller amount of data, there is the possibility that two files will map to the same hash. For SHA256, the chances that two files have the same hash are 1 in 2^256, or about 1 in 1.16e+77. So even with a lot of files, the chance of a collision is small.
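(U) Putting the duplicate-file idea into practice might look like the sketch below. It assumes SHA256 rather than MD5, writes three small made-up files to a temporary directory, and groups them by digest; the helper names file_sha256 and find_duplicates are hypothetical. The last line evaluates 2^256 to show the size of the number quoted above.

```python
import hashlib
import os
import tempfile
from collections import defaultdict

def file_sha256(filename, chunk_size=8192):
    """Hash a file in chunks so it never has to fit in memory at once."""
    h = hashlib.sha256()
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(paths):
    """Group paths by digest and keep only groups with more than one file."""
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[file_sha256(path)].append(path)
    return [group for group in by_hash.values() if len(group) > 1]

with tempfile.TemporaryDirectory() as tmp:
    contents = {"a.txt": b"same", "b.txt": b"same", "c.txt": b"different"}
    paths = []
    for name, data in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    duplicates = [[os.path.basename(p) for p in group]
                  for group in find_duplicates(paths)]

collision_odds = "{:.3e}".format(2 ** 256)
print(duplicates)
print(collision_odds)
```

Each file is read once, so the cost grows with the total amount of data rather than with the number of file pairs.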
(U) Notice that we don't need to read in the entire file at once. One really cool feature of hashes is they can be updated:
h = sha256(b'abc')
h.update(b'def')
h.hexdigest()
sha256(b'abcdef').hexdigest()
Module: SQL and Python
(U) Module: SQL and Python
(U) The Odd Couple: Programming and Databases
(U) It makes a lot of sense to keep your data in a database, and programming logic in a program. Therefore, it's worth overcoming the fundamental impedance mismatch between the two technologies. In the most common use cases, where the program isn't too terribly complicated and the data isn't too crazily interconnected, things usually work just fine.
(U) Python has a recommended Database API, although there are slight variations in the way this API is implemented, which is one reason to use a metalibrary like SQLAlchemy (we'll get to this later). The standard library only provides an implementation for SQLite, in the sqlite3 package. Connections to other database types require external packages, such as MySQLdb (confusingly, to get this you have to pip install MySQL-python).
(Image: Bobby Drop Tables)
(U) Basics with sqlite3
To interact with a database, a program must
1. Establish a connection
2. Create a cursor
3. Execute commands
   + Read the results
   + Commit the changes
4. Close the cursor and/or connection
(U) Using a basic adapter, commands are executed by passing strings containing SQL commands as arguments.
import sqlite3
conn = sqlite3.connect('test.db')  # SQLite specific: creates db if necessary
cur = conn.cursor()
cur.execute("""create table fruit (
    id integer primary key,
    name text not null,
    color text default "RED"
    )""")
cur.execute("""insert into fruit (name) values ("apple")""")  # not there yet
conn.commit()  # to make sure it's written
cur.execute("""select * from fruit""")  # returns the cursor--no need to capture it.
cur.fetchone()
(U) When making changes to the database, it's best to use parameter substitution instead of string substitution to automatically protect against unsanitized input. The sqlite3 module uses ? as its substitution placeholder, but this differs between database modules (which is a major headache when writing code that might have to connect to more than one type of database).
fruit_data = [('banana', 'yellow'),
              ('cranberry', 'crimson'),
              ('date', 'brown'),
              ('eggplant', 'purple'),
              ('fig', 'orange'),
              ('grape', 'purple')]
for f in fruit_data:
    cur.execute("""insert into fruit (name, color) values (?,?)""", f)
cur.execute("""select * from fruit""")  # DANGER! DATA HASN'T BEEN WRITTEN YET!
cur.fetchone()
cur.fetchmany(2)
cur.fetchall()
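(U) To see why parameter substitution matters, the sketch below stores a deliberately hostile value in a throwaway in-memory database. The table mirrors the fruit table used above; the hostile string is made up for illustration. Because the value is passed as a parameter, it is stored as plain data and nothing in it is executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()
cur.execute("create table fruit (id integer primary key, name text not null, color text)")

hostile_name = 'x", "y"); delete from fruit; --'   # made-up hostile input

# Parameter substitution: each value is bound as data, never parsed as SQL
cur.execute("insert into fruit (name, color) values (?, ?)", (hostile_name, "red"))
cur.execute("insert into fruit (name, color) values (?, ?)", ("apple", "green"))
conn.commit()

cur.execute("select name from fruit where color = ?", ("red",))
stored = cur.fetchall()
cur.execute("select count(*) from fruit")
row_count = cur.fetchone()[0]
conn.close()

print(stored)
print(row_count)
```

Building the same statement with string formatting would instead splice the hostile text into the SQL itself.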
(U) A cursor is iterable:
more_fruit = [('honeydew', 'green'), ('ice cream bean', 'brown'), ('jujube', 'red')]
cur.executemany("""insert into fruit (name, color) values (?,?)""", more_fruit)
cur.execute("""select * from fruit""")
[item[1] for item in cur]  # read the name
cur.execute("PRAGMA table_info(fruit)")
for line in cur:
    print(line)
cur.fetchall()
conn.commit()  # always remember to commit!
(U) In sqlite3, many of the methods associated with a cursor have shortcuts at the level of a connection; behind the scenes, the module creates a temporary cursor to perform the operations. We will not cover it because it isn't portable.
(U) Other Drivers
(U) The most common databases are MySQL and Postgres. Installing the packages to interact with them is often frustrating, because they have non-Python dependencies. Even worse, the most current version of mysql-python in PyPI is broken, so we request a different version:
(venv)
(venv)
(venv) $ pip install psycopg2 ...
Error: pg_config executable not found
(U) With enough exceptions to make life very frustrating, they work like sqlite3.
(U) SQLAlchemy
(U) SQLAlchemy is a very powerful, very complicated package that provides abstraction layers over interaction with SQL databases. It includes all kinds of useful features like connection pooling. We'll discuss two basic use cases; in both of them we just want to use it to get data in and out of Python.
(U) Cross-Database SQL
(U) Imagine the following scenario: during development you'd like to use SQLite, even though your production database is MySQL. You don't plan to do anything fancy; you already know the SQL statements you want to execute (although there are a couple of things you always wished sqlite3 would do for you, like returning a dict instead of a tuple).
(U) Enter SQLAlchemy. It does require that you have a driver installed, e.g. MySQLdb, to actually talk to the database, but it takes care of all the ticky-tack syntax details. By default, it even commits changes automatically!
import ipydeps
ipydeps.pip("sqlalchemy")
import sqlalchemy
engine = sqlalchemy.create_engine("sqlite:///test.db")  # database protocol and URL
result = engine.execute("select * from fruit")  # engine.execute is the legacy (pre-2.0) interface
ans = result.fetchall()
first_ans = ans[0]
type(first_ans)
first_ans[2]
first_ans.keys()
first_ans.values()
engine.execute('''insert into fruit (name) values (?)''', ('kumquat',))
engine.execute('''insert into fruit (name, color) values (?, ?)''', [('lime', 'green'), ('mango', 'green')])
result = engine.execute("select * from fruit")
result.fetchall()
(U) Now, to move to MySQL, all you have to do is use a different URL, which follows the pattern:
dialect+driver://username:password@host:port/database
The SQLAlchemy documentation lists all the databases and drivers.
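(U) To see how the pieces of that URL pattern line up, the sketch below pulls one apart with the standard library's `urllib.parse`. It only illustrates the pattern; it is not how SQLAlchemy itself parses URLs, and the credentials, host, and database name are made-up placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical connection URL following dialect+driver://user:pass@host:port/db
url = "mysql+mysqldb://scott:tiger@dbhost:3306/fruitdb"

parts = urlsplit(url)
dialect, driver = parts.scheme.split("+")  # "mysql", "mysqldb"
database = parts.path.lstrip("/")          # path is "/fruitdb"

print(dialect, driver)
print(parts.username, parts.password)
print(parts.hostname, parts.port)
print(database)
```

Switching from SQLite to MySQL changes only this string; the calls made on the engine stay the same.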
(U) As Object Relational Mapper
(U) The real power in SQLAlchemy is in using it to store and retrieve Python objects from a database without ever writing a single line of SQL. It takes a little bit of what looks like voodoo at first. We'll skip most of the details for now, at the risk of this being a complete cargo cult activity. Open up a new file called sql_fruit.py and put the following into it:
from sqlalchemy import create_engine, Column, Integer, String, Date
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
engine = create_engine('sqlite:///test.db')
Base = declarative_base()
Session = sessionmaker(bind=engine)
session = Session()
class Fruit(Base):
    __tablename__ = 'fruit'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    color = Column(String, default="Red")
    def __init__(self, name, color):
        self.name = name
        self.color = color
    def __repr__(self):
        return "<Fruit {}: {}, {}>".format(self.id, self.name, self.color)
(U) Now, in the interactive interpreter:
from sql_fruit import *
query = session.query(Fruit)
query.all()
query.first()
nectarine = Fruit('nectarine', 'orangered')
session.add(nectarine)
session.commit()
Easy Databases with sqlite3
Updated almost 3 years ago in COMP 3321
(U) Example on using sqlite3 to group and average data instead of using dictionaries.
The great thing about sqlite3 is that it allows you to create a simple, local database without having to install any servers or other tools. The entire
" + xml_records + "
with open("file.xml", "w") as f:
    f.write(xml_records)
(U) In a rapidly developed prototype with controlled input, this may not cause a problem. Given the way the real world works, though, someday this little snippet from a one-off script will become the long-forgotten key component of a huge, enterprise-wide project. Somebody will try to feed it data in just the wrong way at a crucial moment, and it will fail catastrophically.
(U) When that happens, you'll wish you had used a fully developed library that would have had a better chance against the malformed data. Thankfully, there are several, and they actually aren't any harder to get started with.
(U) Comma Separated Values (CSV)
(U) The most exciting things about the csv module are the DictReader and DictWriter classes. First, let's look at the plain vanilla options for reading and writing.
import csv
f = open('user_file.csv')
reader = csv.reader(f)
header = next(reader)
all_lines = [line for line in reader]
all_lines.sort()
g = open('user_file_sorted.csv', 'w')
writer = csv.writer(g)
writer.writerow(header)
writer.writerows(all_lines)
g.close()
(U) CSV readers and writers have other options involving dialects and separators. Note that the argument to csv.reader must be an open file (or file-like object), and the reading starts at the current cursor position.
(U) Accessing categorical data positionally is not ideal. That is why csv also provides the DictReader and DictWriter classes, which can also handle records with more or fewer fields than you expect. When given only a file as an argument, a DictReader uses the first line as the keys for the remaining lines; however, it is also possible to pass in fieldnames as an additional parameter.
f.seek(0)
d_reader = csv.DictReader(f)
records = [line for line in d_reader]
(U) To see the differences between reader and DictReader, look at how we might extract cities from the records in each.
# for the object from csv.reader
cities0 = [record[2] for record in all_lines]
# for the object from csv.DictReader
cities1 = [record['city'] for record in records]
cities0 == cities1
(U) In a DictWriter, the fieldnames parameter is required and headers are not written by default. If you want one, add it with the writeheader method. If the fieldnames argument does not include all the fields for every dictionary passed into the DictWriter, the keyword argument extrasaction must be specified.
g = open('names_only.csv', 'w')
d_writer = csv.DictWriter(g, ['name', 'primary_workstation'], extrasaction='ignore')
d_writer.writeheader()
d_writer.writerows(records)
g.close()
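(U) The same reader and writer calls can be exercised without touching the filesystem by using `io.StringIO` as the file-like object. This is a small sketch; the column names and sample rows are invented for illustration and are not the contents of user_file.csv.

```python
import csv
import io

# Hypothetical CSV content standing in for a real file.
raw = ("name,city,primary_workstation\n"
       "alice,Laurel,north-1\n"
       "bob,Columbia,south-2\n")

records = list(csv.DictReader(io.StringIO(raw)))  # first line becomes the keys

# Writing only a subset of the fields requires extrasaction="ignore".
out = io.StringIO()
writer = csv.DictWriter(out, ["name", "city"], extrasaction="ignore")
writer.writeheader()
writer.writerows(records)
lines = out.getvalue().splitlines()

# With the default extrasaction ("raise"), unexpected keys are an error.
strict = csv.DictWriter(io.StringIO(), ["name", "city"])
try:
    strict.writerow(records[0])
    raised = False
except ValueError:
    raised = True

print(lines, raised)
```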
(U) Javascript Object Notation (JSON)
(U) JSON is another structured data format. In many cases it looks very similar to nested Python dicts and lists. However, there are enough notable differences from those (e.g. only double quotation marks are allowed, boolean values have a lowercase initial letter) that it's wise to use a dedicated module to parse JSON data. Still, serializing and deserializing JSON data structures is relatively painless.
(U) For this section, our example will be a list of novels:
import json
novel_list = []
novel_list.append({'title': 'Pride and Prejudice', 'author': 'Jane Austen'})
novel_list.append({'title': 'Crime and Punishment', 'author': 'Fyodor Dostoevsky'})
novel_list.append({'title': 'The Unconsoled', 'author': 'Kazuo Ishiguro'})
json.dumps(novel_list)  # to string
with open('novel_list.json', 'w') as f:
    json.dump(novel_list, f)  # to file
the_hobbit = '{"title": "The Hobbit", "author": "J.R.R. Tolkien"}'
novel_list.append(json.loads(the_hobbit))  # from string
with open('war_and_peace.json') as f:  # <-- if this file existed
    novel_list.append(json.load(f))  # from file
(U) By default the load and loads methods return Unicode strings. It's possible to use the json module to define custom encoders and decoders, but this is not usually required.
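(U) The differences between JSON text and Python literals are easy to see by round-tripping a small record. The record below is a made-up example; the point is only how `True`, `None`, and quotation marks are handled.

```python
import json

record = {"title": "The Hobbit", "in_print": True, "sequel": None}

# Python True/None become JSON true/null; keys and strings use double quotes.
encoded = json.dumps(record, sort_keys=True)
decoded = json.loads(encoded)

# A Python-style literal with single quotes is not valid JSON.
try:
    json.loads("{'title': 'The Hobbit'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

print(encoded)
print(decoded == record, single_quotes_ok)
```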
(U) Extensible Markup Language (XML)
(U) This lesson is supposed to be simple, but XML is complicated. We'll cover only the basics of reading data from and writing data to files in a very basic XML format using the ElementTree API, which is just the most recent of at least three approaches to dealing with XML in the Python Standard Library. We will not discuss attributes or namespaces at all, which are very common features of XML. If you need to process lots of XML quickly, it's probably best to look outside the standard library (probably at a package called lxml).
(U) Although there are other ways to get started, an ElementTree can be created from a file by initializing with the keyword argument file:
from xml.etree import ElementTree
xml_file = open('user_file.xml')
user_tree = ElementTree.ElementTree(file=xml_file)
(U) To do much of anything, it's best to pull the root element out of the ElementTree. Elements are iterable, so they can be expanded in list comprehensions. To see what is inside an element, the ElementTree module provides two functions: dump (which prints to screen and returns None) and tostring. Each node has a text property, although in our example these are all empty except for leaf nodes.
root_elt = user_tree.getroot()
users = [u for u in root_elt]
print(ElementTree.tostring(users[0]))
u_children = [x for x in users[0]]
u_children[2].text
u_children[2].text = 'north-x5-1234'
ElementTree.dump(users[0])
(U) To get nested descendant elements directly, use findall, which returns a list of all matches, or find, which returns the first matched element. Note that these are the actual elements, not copies, so changes made here are visible in the whole element tree.
all_usernames = root_elt.findall('user/name/username')
[n.text for n in all_usernames[:10]]
(U) To construct an XML document:
+ make an Element,
+ append other Elements to it (repeating as necessary),
+ wrap it all up in an ElementTree, and
+ use the ElementTree.write method (which takes a file name or a file object opened for writing).
apple = ElementTree.Element('apple')
apple.attrib['color'] = 'red'
apple.set('variety', 'honeycrisp')
apple.text = "Tasty"
ElementTree.dump(apple)
fruit_basket = ElementTree.Element('basket')
fruit_basket.append(apple)
fruit_basket.append(ElementTree.XML('<orange color="orange"></orange>'))  # any well-formed XML string works here
ElementTree.dump(fruit_basket)
fruit_tree = ElementTree.ElementTree(fruit_basket)
fruit_tree.write('fruit_basket.xml')
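(U) The build-then-search cycle can also be run entirely in memory, which is handy for experimenting. In this sketch the element names, attribute, and fruit data are illustrative assumptions; `tostring` and `fromstring` stand in for writing to and reading from a file.

```python
from xml.etree import ElementTree

# Build a small tree: <basket><fruit color="...">name</fruit>...</basket>
basket = ElementTree.Element("basket")
for name, color in [("apple", "red"), ("pear", "green")]:
    fruit = ElementTree.SubElement(basket, "fruit")  # create and append in one step
    fruit.set("color", color)
    fruit.text = name

xml_bytes = ElementTree.tostring(basket)       # serialize to bytes
reparsed = ElementTree.fromstring(xml_bytes)   # parse back into an Element

names = [elt.text for elt in reparsed.findall("fruit")]  # all matches
first_color = reparsed.find("fruit").get("color")        # first match only

print(xml_bytes)
print(names, first_color)
```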
(U) Bonus Material: Pickles and Shelves
(U) At the expense of compatibility with other languages, Python also provides built-in serialization and data storage capabilities in the form of the pickle and shelve modules.
(U) Pickling
import pickle
pickleme = {}
pickleme['Title'] = 'Python is Cool'
pickleme['PageCount'] = 523
pickleme['Author'] = '[redacted]'
with open('/tmp/pickleddata.pick', 'wb') as p:
    pickle.dump(pickleme, p)  # dump returns None, so there is nothing to capture
with open('/tmp/pickleddata.pick', 'rb') as p:
    restored = pickle.load(p)
print(restored)
(U) Shelving
(U) Creating a Shelve
import shelve
pickleme = {}
pickleme['Title'] = 'Python is Cool'
pickleme['PageCount'] = 583
pickleme['Author'] = '[redacted]'
db = shelve.open('/tmp/shelve.dat')
db['book1'] = pickleme
db.sync()
pickleme['Title'] = 'Python is Cool -- The Next Phase'
pickleme['PageCount'] = 123
pickleme['Author'] = '[redacted]'
db['book2'] = pickleme
db.sync()
db.close()
(U) Opening a Shelve
db = shelve.open('/tmp/shelve.dat')
z = db.keys()
a = db['book1']
b = db['book2']
print(a)
print(b)
print(z)
db.close()
(U) Modifying a Shelve
db = shelve.open('/tmp/shelve.dat')
z = db.keys()
a = db['book1']
b = db['book2']
print(a)
print(b)
print(z)
a['PageCount'] = 588
b['PageCount'] = 123
db['book1'] = a
db['book2'] = b
db.close()
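(U) The reason the modified dictionaries are assigned back to their keys is that, by default, a shelf hands out copies: changing a retrieved object in place does not change what is stored. The sketch below demonstrates this in a temporary directory; the file name and page counts are arbitrary, and it assumes the default `writeback=False`.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "books")  # hypothetical shelf name

    with shelve.open(path) as db:
        db["book1"] = {"Title": "Python is Cool", "PageCount": 100}

    with shelve.open(path) as db:
        # This edits a temporary copy only; nothing is written back.
        db["book1"]["PageCount"] = 999
        unchanged = db["book1"]["PageCount"]

        # Fetch, modify, and reassign to actually update the shelf.
        book = db["book1"]
        book["PageCount"] = 200
        db["book1"] = book

    with shelve.open(path) as db:
        final = db["book1"]["PageCount"]

print(unchanged, final)
```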
UNCLASSIFIED
Module: System Interaction
(U) Basic operating system interaction using the os, shutil, and sys modules.
Updated over 3 years ago
(U) Introduction
(U) Python provides several modules for interacting with your operating system and the files and directories it holds. We will talk about three: os, shutil, and sys.
(U) Be aware that while this notebook is unclassified, your output may not be (depending on the fles you're displaying).
(U) os Module:
(U) This module helps you interact with the operating system, providing methods for almost anything you would want to do at a shell prompt. On POSIX systems, there are over 200 methods in the os module; we will just cover the most common ones. Be aware that the os module includes methods that are not cross-platform compatible; the documentation is helpfully annotated with Availability tags.
(U) Directory discovery and traversal is pretty basic:
import os
os.getcwd()
os.chdir('/tmp')  # Unix dir--choose different dir for Windows
os.listdir()
os.getcwd()
walker = os.walk(os.curdir)
type(walker)
list(walker)
(U) Avoid one common confusion: os.curdir is a module constant ('.' on Unix-like systems), while os.getcwd() is a function. Either one can be used in the method os.walk, which returns a generator that traverses the filesystem tree starting at the method's argument. Each successive value from the generator is a tuple of
(directory, [subdirectories], [files])
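(U) To make the shape of those tuples concrete, the sketch below builds a tiny directory tree in a temporary location and walks it. The file and directory names are arbitrary choices for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Lay out: root/top.txt and root/other_dir/nested.txt
    os.mkdir(os.path.join(root, "other_dir"))
    for rel in ("top.txt", os.path.join("other_dir", "nested.txt")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("x")

    # Each step of the generator yields (directory, [subdirectories], [files]).
    seen = {}
    for directory, subdirs, files in os.walk(root):
        key = os.path.relpath(directory, root)  # "." for the starting directory
        seen[key] = (sorted(subdirs), sorted(files))

print(seen)
```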
(U) A variety of methods allow you to examine, modify, and create or remove directories and files.
f = open('new_temp_file.txt', 'w')
f.close()
os.stat('new_temp_file.txt')
os.mkdir('other_dir')
os.rename('new_temp_file.txt', 'other_dir/tempfile.txt')
(U) The os.path submodule provides additional functionality, including cross-platform compatible methods for constructing and deconstructing paths. Note that while it is possible to join a path completely, deconstructing a path occurs one element at a time, right to left.
sample_path = os.path.join('ford', 'trucks', 'f250')
sample_path
os.path.split(sample_path)
os.path.exists(sample_path)
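(U) Since os.path.split only peels off the last element, fully deconstructing a path takes a loop. A minimal sketch for a relative path such as the one above:

```python
import os

sample_path = os.path.join("ford", "trucks", "f250")

# Repeatedly split off the rightmost element until nothing is left.
parts = []
head = sample_path
while head:
    head, tail = os.path.split(head)
    parts.append(tail)

print(parts)  # elements come out right to left
```

An absolute path would need an extra stopping condition, because splitting the root directory returns the root again.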
(U) Information about the current environment is also available, either via specific methods or in the os.environ object, which functions like a dictionary of environment variables. If os.environ is modified, spawned subprocesses inherit the changes.
os.getlogin()
os.getuid()  # Unix
os.getgroups()  # Unix
os.environ['NEW_TEMP_VAR'] = '323456'
os.uname()  # Unix
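(U) The inheritance of os.environ changes can be checked by launching a child Python interpreter and asking it to print the variable. The variable name `DEMO_TEMP_VAR` and its value are placeholders chosen for this sketch.

```python
import os
import subprocess
import sys

# Set a variable in this process's environment.
os.environ["DEMO_TEMP_VAR"] = "123456"

# A child process started afterwards sees the same variable.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_TEMP_VAR'])"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
child_value = child.stdout.strip()
print(child_value)
```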
(U) shutil Module
(U) Living on top of the os module, shutil makes high-level operations on files and collections of files somewhat easier. In particular, functions are provided which support file copying and removal, as well as cloning permissions and other metadata.
import shutil
shutil.copyfile(src, dest)  # overwrites dest
shutil.copymode(src, dest)  # permission bits
shutil.copystat(src, dest)  # permission bits and other metadata
shutil.copy(src, dest)  # works like cp if dest is a directory
shutil.copy2(src, dest)  # copy then copystat
shutil.copytree(src, dest)
shutil.rmtree(path)  # must be real directory, not a symlink
shutil.move(src, dest)  # works with directories
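(U) Because src, dest, and path above are left undefined, here is a self-contained sketch that exercises a few of the same calls inside a temporary directory. All file and directory names are made up for the example.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    with open(src, "w") as f:
        f.write("payload")

    dest_dir = os.path.join(tmp, "dest")
    os.mkdir(dest_dir)

    # Copying into a directory keeps the file name, like cp.
    copied = shutil.copy(src, dest_dir)
    copied_name = os.path.basename(copied)

    # Clone the whole directory, then remove the clone.
    clone = shutil.copytree(dest_dir, os.path.join(tmp, "clone"))
    clone_listing = os.listdir(clone)
    shutil.rmtree(clone)
    clone_gone = not os.path.exists(clone)

print(copied_name, clone_listing, clone_gone)
```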
(U) sys Module
(U) The sys module provides access to variables and functions used or maintained by the Python interpreter; it can be thought of as a way of accessing features from the layer between the underlying system and Python. Some of its constants are interesting, but not usually useful.
import sys
sys.maxsize
sys.byteorder
sys.version
(U) Other module attributes are sometimes useful, although fiddling with them can introduce problems with compatibility. For instance, sys.path is a list of where Python will look for modules when import is called. If it is modified within a script, then modules can be loaded from a new location, but there is no inherent guarantee that location will be present on a system other than your own! On the other hand, sys.exit() can be used to shut down a script, optionally signaling an error by passing a non-zero numeric argument.
Manipulating Microsoft Office Documents with win32com
(U) Demonstration of using win32com to create and modify Microsoft Office documents.
(U) Welcome To Automation with win32com!
(U) The win32com module connects Python to the Microsoft Component Object Model interface that enables inter-process communication and object creation within Microsoft Office applications.
(U) Note: win32com only exists on Windows platforms, so this notebook will not run on LABBENCH. In order to run this notebook, install Anaconda3 on your Windows platform and use jupyter-notebook.
(U) "Hello World" for Word
(U) We need to import the library, and open Word.
import win32com.client
word = win32com.client.Dispatch('Word.Application')
(U) Dispatch checks to see if Word is already open. If it is, it attaches to that instance. If you'd like to always open a new instance, use DispatchEx.
(U) By default, Word will start, but won't be visible. Set this to True if you want to see the application.
word.Visible = True
(U) Create a document and add some text, setting a font size that we like.
worddoc = word.Documents.Add()
worddoc.Content.Text = "Hello World"
worddoc.Content.Font.Size = 18
(U) Save the document and exit the application. Note that win32com bypasses the normal Python file object, so we need to account for the Windows directory separator.
(U//FOUO) Also, ClassifyTool may nag you for a classification. In order to prevent this, in Word select the "ClassifyTool" tab, click on "Options", and under "When Closing Document", uncheck "Always show Classification Form", and click "Save".
worddoc.SaveAs('u:\\private\\jupyter\\win32com\\hello.docx')
word.Quit()
(U) That's it!
(U) More Elaborate Word Example
(U) There's another option for starting the application:
word = win32com.client.gencache.EnsureDispatch('Word.Application')
(U) This can take slightly longer, but enables access to win32com constants, which are required for some methods. The alternative is to look through the win32com documentation for the value of the constants you need.
(U) Let's take a look at a possible use case. Say we have reports in a particular format that we need to regularly generate. We can create a template with the sections that will be replaced. In this case, they are ReportEvent, ReportTime, and ReportPlace. First, download the template. Then open the template and create a dictionary with the sections and the data that will be used.
constants = win32com.client.constants  # save some future typing
word.Visible = True
worddoc = word.Documents.Open('u:\\private\\jupyter\\win32com\\demo_template.docx')
event_details = {'ReportEvent': ...,
                 'ReportTime': ...,
                 'ReportPlace': ...
                }
(U) Now the magic happens. Let's iterate through the dictionary, replacing all of the sections with the data.
# Execute(FindText, MatchCase, MatchWholeWord, MatchWildcards, MatchSoundsLike, MatchAllWordForms,
#         Forward, Wrap, Format, ReplaceWith, Replace)
for tag, data in event_details.items():
    _ = word.Selection.Find.Execute(tag, False, False, False, False, False,
                                    True, constants.wdFindContinue, False, data, constants.wdReplaceAll)
(U) We can add a couple of paragraphs of additional info, and we're done.
paragraph = worddoc.Paragraphs.Add()
paragraph.Range.Text = 'Additional info\n'
footer = worddoc.Paragraphs.Add()
footer.Range.Text = 'Produced by me\n'
worddoc.SaveAs('u:\\private\\jupyter\\win32com\\demo_out.docx')
word.Quit()
(U) PowerPoint
(U) PowerPoint works very similarly. Again, download the template.
ppt = win32com.client.Dispatch('PowerPoint.Application')
presentation = ppt.Presentations.Open('u:\\private\\jupyter\\win32com\\MyTeam_template.pptx')
(U) Did you notice that we didn't need to set ppt.Visible? PowerPoint is always visible.
title = presentation.Slides(1)
(U) We know the first slide is the title slide, so we've set a variable to it. PowerPoint presentations are made up of slides, which in turn are collections of shapes. To modify a presentation, we need to know which shape is which. Let's take a look at title:
title
(U) Hmm. That's not very helpful. Let's see what methods we have:
dir(title)
(U) At this point you're probably realizing that COM objects don't act like normal Python objects.
help(title)
(U) So Python just takes anything you try to do with title and passes it on to the Windows COM library. Which means you'll need to consult Microsoft's Win32Com documentation if you have questions about something.
(U) Let's get back to working with this presentation. We still need to find out which shape is which:
for i, shape in enumerate(title.Shapes):
    shape.TextFrame.TextRange.Text = 'Shape #{0}'.format(i+1)
(U) This sets the text for each shape to its index number so we now have a number associated with each shape. You only need to do this when you're writing your script. Once you create your template, the shape numbers won't change. So the title is #1 and the subtitle #2.
(U) Undo a few times will remove the numbers.
(U) Let's update the title slide with today's date:
from datetime import date
today = date.today().strftime('%Y%m%d')
title.Shapes(2).TextFrame.TextRange.Text = today
(U) Now let's update the status of our two focus areas. We'll skip the step of identifying the shapes we want to modify.
focus1 = presentation.Slides(2)
focus1.Shapes(2).TextFrame.TextRange.Text = 'All Good, Boss'
focus2 = presentation.Slides(3)
focus2.Shapes(2).TextFrame.TextRange.Text = 'Sir, We have a problem'
(U) Now save the presentation with today's date, and Bob's your uncle.
presentation.SaveAs('u:\\private\\jupyter\\win32com\\MyTeam_{0}.pptx'.format(today))
presentation.Close()
ppt.Quit()
(U) Visio
(U) Starting the application should look familiar:
visio = win32com.client.Dispatch('Visio.Application')
documents = visio.Documents
document = documents.Add('Basic Network Diagram.vst')  # Start with a built-in template
document.Title = 'New Network Graph'  # Add a title
pages = visio.ActiveDocument.Pages
page = pages.Item(1)
(U) Visio is visible by default, but can be hidden if desired.
(U) So we've created a document and grabbed the page associated with it. Visio shapes are part of stencil packages, so let's add a couple.
NetworkStencil = visio.Documents.AddEx('periph_m.vss', 0, 16+64, 0)
ComputerStencil = visio.Documents.AddEx('Computers and Monitors.vss', 0, 16+64, 0)
(U) Other stencils are:
+ Network Locations: netloc_m...
+ Network Symbols: netsym...
+ Detailed Network shapes: dtlnet_m.vss
+ Legends: lgnd_m.vss
(U) Other stencil names can be found on the Internet.
(U) Now we need the shape masters that we'll use.
pc = ComputerStencil.Masters.Item('PC')
router = NetworkStencil.Masters.Item('Router')
server = NetworkStencil.Masters.Item('Server')
connector = NetworkStencil.Masters.Item('Dynamic Connector')
(U) The names match the names you see when you view the shapes in the stencil sidebar. Let's add a few shapes.
pc1 = page.Drop(pc, 2, 2)
pc1.Text = '10.1.1.2'
pc2 = page.Drop(pc, 12, 10)
pc2.Text = '20.2.'
server1 = page.Drop(server, 15, 5)
server1.Text = '10.2.1.200'
router1 = page.Drop(router, 8, 8)
router1.Text = '10.2.1.250'
(U) Some of the shapes went off the page, so resize. You can wait until the end to do this, but it's more fun to watch the connections being drawn.
page.ResizeToFitContents()
page.CenterDrawing()
(U) Now draw the connectors.
arrow = page.Drop(connector, 0, 0)
arrowBegin = arrow.CellsU('BeginX').GlueTo(pc1.CellsU('PinX'))
arrowEnd = arrow.CellsU('EndX').GlueTo(router1.CellsU('PinX'))
arrow.Text = 'pc1 connection'
(U) We can customize a connector:
arrow = page.Drop(connector, 0, 0)
arrowBegin = arrow.CellsU('BeginX').GlueTo(pc2.CellsU('PinX'))
arrowEnd = arrow.CellsU('EndX').GlueTo(router1.CellsU('PinX'))
arrow.CellsU('LineColor').Formula = '=RGB(255, 153, 3)'
arrow.CellsU('EndArrow').Formula = '5'
arrow.CellsU('EndArrowSize').Formula = '4'
arrow.CellsU('LineWeight').Formula = '1'
arrow.Text = 'pc2 connection'
arrow = page.Drop(connector, 0, 2)
arrowBegin = arrow.CellsU('BeginX').GlueTo(server1.CellsU('PinX'))
arrowEnd = arrow.CellsU('EndX').GlueTo(router1.CellsU('PinX'))
arrow.Text = 'server connection'
(U) Now resize, recenter, and save.
page.ResizeToFitContents()
page.CenterDrawing()
document.SaveAs('u:\\private\\jupyter\\win32com\\visio_demo.vsdx')
(U) Close the application.
visio.Quit()
(U) Excel
(U) win32com works with Excel too, but due to the slowness of the interface, you're probably better off using pandas.
Module: Threading and Subprocesses
(U) Module: Threading and Subprocesses
(U) Concurrency and Python's GIL - i.e. Python threads don't offer true
parallelism
(U) Python's Global Interpreter Lock (GIL) means that only one thread can
execute Python bytecode at any one time. However, threading in Python can be
immensely helpful in speeding up processing when your script can perform
subsequent steps that do not depend on the output of other steps, especially
when those steps spend most of their time waiting on the network or the disk.
Basically, it gives the illusion of being able to do two (or more) things at
the same time.
(U) Threading
(U) Threading allows you to spawn off "mini programs" called threads that work
independently of the main program (sort of). Threading allows you to send data off
to a function and let it work on getting results while you go on with your business. It
can also allow you to set up functions that will process items as you add them to a
work queue. This could be especially helpful if you have parts of your program that
take a long time to execute but are independent of other parts of your program. A
good example is using a thread to execute a slow RESTful web service query.
(U) This adds some complexity to your life. Threads act asynchronously - meaning
that you have limited control as to when they execute and finish. This can cause
problems if you are depending on return values from threads in subsequent code.
You have to think about if and how you need to wait on thread output which adds
extra things to worry about in terms of accessing data. Python provides a thread-
safe container named Queue. Queues will allow your threads access without
becoming unstable, unlike other containers (such as dictionaries and lists) which
may become corrupted or have unstable behavior if you access them via multiple
threads.
(U) Subprocess
(U) The subprocess module is useful for spinning off programs on the local system
and letting them run independently.
import ipydeps
modules = ['threading', 'queue']
for m in modules:
    installed_packages = [package.project_name for package in ipydeps._pip.get_installed_distributions()]
    if (m not in installed_packages) and (m not in ipydeps.sys.modules):
        ipydeps.pip(m)
import time
from threading import Thread, Timer, Lock
from queue import Queue
import random
result_q = Queue()
work_q = Queue()
work_list = []
# The worker thread pulls an item from the queue and processes it
def worker():
    while True:
        item = work_q.get()
        do_work(item)
        work_q.task_done()  # tell the queue that the current work_q task has completed
def do_work(item):
    # submit query, process results, and add result to Queue
    result_q.put(wait_random(item))
def wait_random(t):
    time.sleep(t[1])
    print('Finished task {}'.format(t[0]))
def hello():
    print('hello, world')
# loading up our work_q and work_list with the same random ints between 1 and 10
time_total = 0
for i in range(10):
    x = random.randint(1, 10)
    time_total += x
    work_q.put((i, x))
    work_list.append((i, x))
work_q.qsize()
len(work_list)
%%time
print('This should take {} seconds'.format(time_total))
for w in work_list:
    wait_random(w)
%%time
for i in range(5):
    t = Thread(target=worker)
    t.daemon = True  # thread dies when main thread exits. If we don't do this, then the threads will continue to
                     # "listen" to the work_q and take items out of the work_q and automatically process as you
                     # stick more items into the work_q
    t.start()  # you have to start a thread before it begins to execute
work_q.join()  # block until all tasks are done
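(U) For experimenting outside the notebook, the same worker-and-queue pattern can be reduced to a short script. This is a sketch under assumptions: the task count, the fixed 0.05-second delay, and the three-thread pool are arbitrary values chosen so that it finishes quickly.

```python
import time
from queue import Queue
from threading import Thread

work_q = Queue()
result_q = Queue()

def worker():
    # Pull tasks forever; daemon threads end when the main thread exits.
    while True:
        task_id, delay = work_q.get()
        time.sleep(delay)        # stand-in for a slow web query
        result_q.put(task_id)    # Queue is safe to share between threads
        work_q.task_done()       # lets work_q.join() know this task is finished

for task_id in range(6):
    work_q.put((task_id, 0.05))

start = time.perf_counter()
for _ in range(3):
    Thread(target=worker, daemon=True).start()
work_q.join()                    # block until every task has been marked done
elapsed = time.perf_counter() - start

finished = sorted(result_q.get() for _ in range(6))
print(finished, round(elapsed, 2))
```

With three workers the six short sleeps overlap, so the elapsed time should be noticeably less than the 0.3 seconds the tasks would take one after another, although the order in which tasks finish is not guaranteed.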
(U) You can also use the Timer class to specify that a thread should only
kick off after a set amount of time. This could be critical if you need to
give some other threads a head start or for various other reasons.
Remember, when we are doing threading you have to keep timing in
mind!
%%time
# stupid little example
t = Timer(5.0, hello)
t.daemon = True
t.start()  # after 5 seconds, "hello, world" will be printed
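(U) A Timer can be observed, and also cancelled before it fires, by having it set a threading.Event. The delays below are shortened to fractions of a second purely so the sketch runs quickly; they are not values from the lesson.

```python
from threading import Event, Timer

# A timer that fires: it sets the event after a short delay.
fired = Event()
t = Timer(0.05, fired.set)
t.start()
fired_eventually = fired.wait(timeout=2)  # True once the timer has run

# A timer that is cancelled before its delay elapses never runs.
never = Event()
t2 = Timer(0.05, never.set)
t2.start()
t2.cancel()
cancelled_fired = never.wait(timeout=0.2)  # stays False

print(fired_eventually, cancelled_fired)
```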
(U) You can mix these. The output below will most likely look like a bucket
of crazy because threads execute (sort of) independently.
# loading up our work_q and work_list with the same random ints between 1 and 10
for i in range(10):
    x = random.randint(1, 10)
    work_q.put((i, x))
%%time
for i in range(5):
    t = Thread(target=worker,)
    t.daemon = True
    t.start()
tl = Timer(5.0, hello)
tl.daemon = True
tl.start()  # tl *will probably* print "hello, world" before all the other threads finish,
            # or it might not; it depends on the work_q contents
work_q.join()  # block until all tasks are done
(U) Subprocesses
(U) For most subprocess creation you will usually want to use the
subprocess.run() convenience method. Please note, if you wish to access
the STDOUT or STDERR output you must specify a value for the stdout
and stderr arguments. Using the subprocess.PIPE constant puts the
results from STDOUT and STDERR into the CompletedProcess object's
attributes.
import subprocess
completed = subprocess.run(['ls', '-l'], stdout=subprocess.PIPE, universal_newlines=True)
print("ARGS:", completed.args)
print("STDOUT:", completed.stdout)
print("STDERR:", completed.stderr)
print("return code:", completed.returncode)
completed = subprocess.run(['ls', 'nosuchfile'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
print("return code:", completed.returncode)
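(U) The ls command is not available on every platform, so here is a portable sketch of the same idea that launches the current Python interpreter as the child process. The one-line child programs are invented for the example; everything else uses the subprocess.run arguments shown above.

```python
import subprocess
import sys

# A child that succeeds and writes to STDOUT.
ok = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True,
)

# A child that writes to STDERR and exits with a non-zero status.
bad = subprocess.run(
    [sys.executable, "-c", "import sys; sys.stderr.write('oops'); sys.exit(2)"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True,
)

print("ok :", ok.returncode, repr(ok.stdout))
print("bad:", bad.returncode, repr(bad.stderr))
```

Without the stdout and stderr arguments, the corresponding attributes on the CompletedProcess object would be None and the child's output would go straight to the terminal.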
import * syntax. Other things __init__.py commonly includes are import statements for subpackages, comments/documentation, and additional code that glues together the subpackages.
(U) Since our module is small and we have nothing to initialize, we'll leave __init__.py empty. If we wanted, we could actually put all the awesome.py code in __init__.py itself, but that can be confusing for other developers. We could also define:
__all__ = ['Awesome', 'cool']
(U) But since those would get imported anyway, we don't need to do that. You only need to define __all__ if the package is complex enough that you want to import some things and not others or ensure that subpackages get imported.
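(U) The effect of __all__ on a star import can be seen with a throwaway module. In this sketch the module name `awesome_demo` and its contents are hypothetical, written to a temporary directory just for the demonstration; it is not the awesome package built in this lesson.

```python
import os
import sys
import tempfile

source = (
    "__all__ = ['Awesome']\n"
    "class Awesome:\n"
    "    pass\n"
    "def cool():\n"
    "    return 'cool'\n"
)

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "awesome_demo.py"), "w") as f:
        f.write(source)

    sys.path.insert(0, tmp)  # temporarily make the module importable
    try:
        namespace = {}
        exec("from awesome_demo import *", namespace)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("awesome_demo", None)

# Only the names listed in __all__ were brought in by the star import.
star_names = sorted(n for n in namespace if not n.startswith("__"))
print(star_names)
```

Here cool is left out because it is missing from __all__; without an __all__ definition, every public name (one not starting with an underscore) would have been imported.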
(U//FOUO) For another example, see the __init__.py file for the parserror module on GitLab.
2. (U) Make the package into a distribution
(U) At this point, you could tar or zip up that package, give it to someone else, and they could extract it and use it. But what directory should they put the package in? What if your package depends on other modules to work? What if you change the package and want to keep track of the version? These considerations suggest that there is some package management that needs to go on. Python does have the pip package manager to handle a lot of that. So how do we distribute the package so you can just pip install awesome?
(U) First we need to add another layer of stuff around our package to help set it up for installation.
awesome/
├── README.md
├── awesome/
│   ├── __init__.py
│   └── awesome.py
├── setup.cfg
└── setup.py
(U) You can see that our original package directory has been moved down to a subdirectory. Above it we have a new project directory by the same name, and alongside it there are a few more files. Let's look at each.
(U) setup.py
(U) setup.py is the most important file. It mainly contains a setup() function call to configure the distribution. It's also used to run the various packaging tasks via python setup.py. The setup() function comes from the setuptools package, which is not part of the Python standard library. You may need to pip install setuptools first to use it. (setuptools improves the legacy distutils package that is part of the standard library and is the officially recommended distribution tool these days.)
(U//FOUO) Our setup.py looks like this:
from setuptools import setup
setup(
    name="awesome",
    version="1.0",
    description="(U) An awesome module for awesome things",
    long_description=open('README.md', 'r').read(),
    url="https://gitlab.coi.nsa.ic.gov/python/awesome.git",
    author="COMP3321",
    author_email="comp3321@...",
    scripts=[],
    packages=['awesome',],
    package_data={},
    install_requires=[
        ...
    ],
)
(U) This just scratches the surface of the arguments you can give to setup(). You can find more details on the outside from the Python Packaging Authority (https://packaging.python.org/tutorials/distributing-packages/). For example, you can specify which versions of Python are compatible using the classifiers and python_requires options, specify dependencies with install_requires and other *_requires options, and include non-code data files.
(U//FOUO) For another example, see parserror's setup.py.
(U) setup.cfg
(U) setup.cfg is an INI file that configures some defaults for the setup.py options. Ours is fairly simple:
[metadata]
description-file = README.md
(U) If you are using wheel, you may also want to set this if your code works for both Python 2 and 3:
[bdist_wheel]
universal = 1
(U) README.md
(U) README.md is just a Markdown file that gives an overview of what the package is for and describes how to install and use it. (You may also see README.rst files instead. The rst stands for reStructuredText, which is a popular Python-specific way to format text.)
3. (U) Push the project to GitLab
(U//FOUO) Of course, with a project like this, you should be using version control. And since the nsa-pip server ties into GitLab, you'll need to use Git. Learning Git could be a course in itself, so we'll just cover the basics here.
(U) For those who are unfamiliar, Git is a distributed version control manager. It's version control because it allows you to track and revert changes in your text files easily in a Git repository. You can view the history of your changes as a tree, and even branch off from it and merge back in easily if you need to. It's distributed because everybody who has a copy of the Git repository has a full copy of the history with all the changes. And it's a manager (like pip is a manager) because it manages all the tracking itself and gives you a bunch of commands to let you add and revert your changes.
a. (U) Install and configure Git
(U) On another Linux system, you can yum install git (for CentOS/RHEL) or apt... On Windows, you will probably want to use Git Bash or TortoiseGit or something similar. See the Windows directions here.
(U//FOUO) After Git is installed, you will need to configure it and SSH to connect to GitLab. On LABBENCH, the easiest way to do that is to run this (Ruby) notebook to set up Git and SSH in Jupyter-docker. It takes care of
+ making a .gitconfig,
+ making an overall .gitignore, and
+ making an encrypted RSA key pair based on your PKI cert for use with SSH.
(U//FOUO) If you are on another system, you will need to take care of those things yourself. For tips, see the original setup instructions on the wiki.
b. (U) Add version control to your project
(U) Now to work! Enter your local project directory (i.e. the top level where setup.py lives) and run git init . That will turn your project into a Git repository and get Git ready to track your files. (Note that it doesn't actually start tracking them yet; you have to explicitly tell it what to track first.)
(U) Next, since this is a Python package, you probably want to add a .gitignore file to your project directory containing:
build/
*.pyc
(U) That will tell Git to ignore temporary files made when you run setuptools commands.
(U) Now our project structure should look something like this:
awesome/
├── .git/
│   └── (...Git stuff...)
├── .gitignore
├── awesome/
│   ├── __init__.py
│   └── awesome.py
├── setup.cfg
└── setup.py
(U) Next, you can start tracking all these files for changes by running
git add *
git commit -m "Initial commit."
c. (U) Make a corresponding GitLab project
(U) Congratulations! Your package is now under version control. However, you have the only copy of it, so if your local copy goes away, your code is gone. To preserve it and enable others to work on it, you need to push it to GitLab.
(U//FOUO) The first step is to make a new GitLab project:
1. (U) Visit the new project page on GitLab.
2. (U) Enter your package name as the Project path (e.g., awesome ).
3. (U//FOUO) Enter the overall classification level of the code and files in your project.
4. (U) Leave the Global access level set to Reporter. That will allow others to both clone (copy) your code and file issues if there are problems with your package. See the GitLab permissions chart for a full description of the access levels.
5. (U//FOUO) Choose the Namespace. By default it will make a personal project under your name and SID. If you belong to a GitLab group (and have the right permissions), you can also add the project to that group. Consider joining the Python group and adding your project there.
6. (U) Add a short description of your package, if you want.
7. (U//FOUO) Select a visibility level. Since NSA GitLab uses PKI, there is no difference between "internal" and "public."
8. (U) Hit "Create Project."
(U) Afterwards, you can copy the URL from the main page of your new project and put it in the url argument to setup() in setup.py .
d. (U) Push your code out to GitLab
(U) Your new project page should have some instructions on how to push code from an existing folder or Git repository. Near the top of the page you should see a box with "SSH" highlighted to the left and a git@gitlab.coi.nsa.ic.gov... address in the box to the right. Copy that address. Then,
1. (U) cd to the top level of your project
2. (U//FOUO) git remote add origin git@gitlab.coi.nsa.ic.gov... (using the git address you just copied)
3. (U) git push -u origin master
(U) Afterwards, if you visit your GitLab project page, you should see a list of your project files and the rendered contents of your README.
4. (U//FOUO) Create a tag for the package release.
(U) Back in your local repository, tag your local project with pip- and the release version and push that tag to GitLab. For example, if your version is 0.1.0, run
git tag -a pip-0.1.0 -m "Releasing to the pip repo"
git push origin pip-0.1.0
(U) Ideally, version numbers should comply with Python's PEP 440 specification. In plain English, that means they should be of the form:
major.minor.micro
  |     |     \- changes every bugfix
  |     |
  |     \- changes every new feature
  |
  v
changes every time backwards-compatibility is broken
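The numbering rule above can be sketched as a small helper. This is a simplified illustration that handles only the plain major.minor.micro form rather than the full PEP 440 grammar; the function names parse_version, bump, and release_tag are made up for this example.

```python
import re

# Simplified pattern: only the plain major.minor.micro form described above,
# not the full PEP 440 grammar (no pre-releases, epochs, or local versions).
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(text):
    """Split 'major.minor.micro' into a tuple of three ints."""
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError("not a major.minor.micro version: {!r}".format(text))
    return tuple(int(part) for part in match.groups())


def bump(text, part):
    """Return the next version after a 'major', 'minor', or 'micro' change."""
    major, minor, micro = parse_version(text)
    if part == "major":   # backwards-compatibility broken
        return "{}.0.0".format(major + 1)
    if part == "minor":   # new feature
        return "{}.{}.0".format(major, minor + 1)
    if part == "micro":   # bugfix
        return "{}.{}.{}".format(major, minor, micro + 1)
    raise ValueError("unknown part: {!r}".format(part))


def release_tag(text):
    """Build the Git tag name used for a pip release, e.g. pip-0.1.0."""
    parse_version(text)  # validate before tagging
    return "pip-" + text


print(bump("0.1.0", "micro"))
print(bump("0.1.0", "minor"))
print(release_tag("0.1.0"))
```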
5. (U//FOUO) Create distributions
(U) Source distributions (sdist) and Wheel distributions are both collections of files and metadata that can be installed on a machine and then used as a Python library. Wheels are considered a "built distribution" while an sdist needs one extra build step, although that is transparent to a user if you are using pip to install. For most use cases, you want to build both an sdist and a bdist_wheel, upload both, and let pip work out which one to use (almost always the Wheel).
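The two kinds of distribution can usually be told apart by file extension: wheels end in .whl, and source distributions are archives such as .tar.gz. The sketch below applies that rule; the file names and the distribution_kind helper are hypothetical examples, not output from an actual build.

```python
def distribution_kind(filename):
    """Classify a file from dist/ by its extension."""
    if filename.endswith(".whl"):
        return "wheel"   # built distribution
    if filename.endswith((".tar.gz", ".zip")):
        return "sdist"   # source distribution
    return "unknown"


# Hypothetical file names of the kind a build step might leave in dist/.
built_files = ["awesome-0.1.0.tar.gz", "awesome-0.1.0-py3-none-any.whl"]
kinds = {name: distribution_kind(name) for name in built_files}
print(kinds)
```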
(U) An example of building and publishing the distributions is as follows:
pip install setuptools wheel twine
python3 setup.py sdist bdist_wheel
a. (U) Upload package to NSA-Pypi with Twine
(U) Twine is a Python library developed by the same team that maintains PyPI. It uses requests to make secure (https) uploads of Python packages, and offers some command-line arguments to make it easy to specify what repository, client cert, CA bundle, and username/password to use. The username and password can be blank (or anything you want) for uploading to NSA-Pypi, because NSA-Pypi will use your PKI and CASPORT to determine authentication/authorization.
(U) Note that the NSA-Pypi server supports XPE with Labbench, so using twine there on Labbench does not require uploading your personal cert. In non-Labbench environments, the following command should work:
twine upload dist/* ...
filename = "nb_classification_data.csv"
if not filename in os.listdir():
    file_content = bucket.get(filename).content
    open(filename, 'wb').write(file_content)
features_labels = pd.read_csv(filename).set_index("id")

def preview(df, name="data", nrows=3, sampled=True):
    # Show either a random sample or the first rows of the DataFrame
    if sampled:
        df = df.sample(n=nrows)
    print("Preview of {}:".format(name))
    display(df.head(nrows))

preview(features_labels)
print("Statistical summary of data:")
display(features_labels.describe())
print("Datatypes in dataset:")
0.8", inplace=True)
min_probability = max_probs_df["probability"].min()
max_probs_df['correct'] = max_probs_df['actual'] == max_probs_df['prediction']
possible_prob_scores = np.linspace(
    start=min_probability,
    stop=.95,
    num=int(np.ceil((.95 - min_probability) * 200))
)
prob_to_average_score = pd.DataFrame({'prob_score': possible_prob_scores})
average_scores = defaultdict(list)
for score in possible_prob_scores:
    # Accuracy of the predictions whose probability is at least this cut-off
    average_score = np.mean(max_probs_df.query("probability >= {0}".format(
        round(score, 3)
    ))["correct"])
    average_scores['total'].append(average_score)
prob_to_average_score['average_score'] = average_scores["total"]
plt.figure(dpi=100, figsize=(20,5))
plt.plot(prob_to_average_score['prob_score'], prob_to_average_score['average_score'])
plt.xlabel("Probability cut-off (lower bound)")
plt.ylabel("Average accuracy percentage")
plt.title("Average accuracy percentage of predictions above a certain probability score")
plt.show()
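The quantity plotted above, the accuracy of only those predictions whose probability clears a cut-off, can be followed on a handful of rows. This sketch uses plain Python instead of pandas, and the probability/correctness pairs and the accuracy_above helper are invented for illustration.

```python
# Each pair is (probability of the predicted class, was the prediction correct).
# These values are invented; they are not taken from the notebook's data.
predictions = [
    (0.95, True), (0.90, True), (0.80, True),
    (0.70, False), (0.60, True), (0.50, False),
]


def accuracy_above(cutoff, rows):
    """Fraction of correct predictions among rows with probability >= cutoff."""
    kept = [correct for probability, correct in rows if probability >= cutoff]
    if not kept:
        return None  # no predictions survive this cut-off
    return sum(kept) / len(kept)


for cutoff in (0.5, 0.65, 0.75):
    print(cutoff, accuracy_above(cutoff, predictions))
```

Raising the cut-off discards the low-confidence rows, so the accuracy of what remains tends to rise, which is the trend the plot is meant to reveal.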
Distribution of Probability Scores
This just shows the distribution of the probability scores. We want to see a left-skewed distribution, meaning that most of the predictions are made with a high probability score.
plt.figure(dpi=100, figsize=(10,5))
plt.hist(max_probs_df["probability"], bins=20)
plt.xlabel("Probability Scores")
plt.ylabel("Count")
plt.title("Distribution of Probability Scores from Predictions")
plt.show()
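Besides reading the histogram, skew can be checked numerically: a left-skewed sample has negative skewness. The sketch below computes the population skewness with the standard library only; the scores are invented, and the skewness helper is written for this example rather than taken from the notebook.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean cubed z-score of the values."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0  # all values identical, so there is no skew to measure
    return mean(((v - mu) / sigma) ** 3 for v in values)


# Invented scores: most are high, with a thin tail of low-confidence ones.
scores = [0.99, 0.98, 0.97, 0.95, 0.95, 0.93, 0.90, 0.60, 0.40]
print(skewness(scores) < 0)
```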
Show Stats on Categories of Notebooks
This is just showing the statistics on the categories of notebooks as of 14 November 2017. These are based only on the labels in our training data, not on the model predictions.
fig, ax = plt.subplots(figsize=(20,16))
features_labels["category"].value_counts().plot.pie(
    autopct="%1.2f%%",
    title="nbGallery Notebooks by Category as of 14 November 2017", ax=ax)
plt.show()
Run Model on a New Notebook
Just enter the URL for a notebook in nbGallery, and see what category the model predicts for it.
class NotebookExtractor(object):
    def __init__(self):
        self.ses = requests_pki.session()
        self.notebooks_url = 'https://nbgallery.nsa.ic.gov/notebooks/'
        self.tooltip_finder = re.compile("(?:notebook has been|health score)")

    def download_notebook(self, notebook_id):
        # Accept either a bare notebook id or a full notebook URL
        if "/" in notebook_id:
            notebook_id = notebook_id.rstrip("/")
            notebook_id = notebook_id.split("/")[-1]
        self.notebook_id = notebook_id.split("-")[0]
        resp = self.ses.get(
            self.notebooks_url + notebook_id,
            headers={
                'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=..."
            },
            timeout=30
        )
        return resp

    def to_soup(self, resp):
        return BeautifulSoup(resp.content)
    def extract_tooltips(self, soup):
        tooltips_dict = {}
        for link in soup.find_all('a', {'class': 'tooltips', 'title': self.tooltip_finder}):
            if "shared" in link["title"]:
                continue
            if "health score" in link["title"]:
                # Health is shown as a percentage; store it as a fraction
                health = re.search(r"(\d+)%", link["title"]).groups()[0]
                tooltips_dict["health"] = round(float(health) / 100, 5)
            else:
                # e.g. "viewed" -> "views", "run" -> "runs"
                key = re.search(r"notebook has been (\w+) ", link["title"]).groups()[0]
                key = re.sub("(?:ed)$", "", key)
                key += "s"
                total_val = re.search(r"(\d+) times", link["title"]).groups()[0]
                tooltips_dict[key] = int(total_val)
                unique_val = re.search(r"(\d+) users", link["title"])
                if unique_val:
                    tooltips_dict["unique_" + key] = int(unique_val.groups()[0])
        return tooltips_dict
    def extract_text(self, soup):
        extracted_text = []
        for a in soup.select("#notebookDisplay")[0].findAll():
            if a.name in ["h1", "h2", "h3", "h4", "h5", "p"]:
                extracted_text.append(a.text)
        return " ".join(extracted_text)

    def extract_description(self, soup):
        description = soup.find('meta', {'name': 'description'})['content']
        return description

    def extract_title(self, soup):
        title = soup.find('title').text
        return title

    def extract_classification(self, soup):
        classification = soup.find('div', {'class': re.compile("classBanner.*")}).text
        return classification

    def extract_owner_type(self, soup):
        group = soup.find('a', {'href': re.compile("^/groups/.*")})
        if group:
            return "Group"
        return "User"

    def extract_lang(self, soup):
        lang_tag = soup.find('img', {'title': re.compile("This notebook is written in ")})
        lang = re.search(r"written in (\w+)", lang_tag['title']).groups()[0].lower()
        return lang
    def extract(self, notebook_id, to_pandas=True):
        resp = self.download_notebook(notebook_id)
        soup = self.to_soup(resp)
        nb_dict = self.extract_tooltips(soup)
        nb_dict['notebook_text'] = self.extract_text(soup)
        nb_dict['description'] = self.extract_description(soup)
        nb_dict['title'] = self.extract_title(soup)
        nb_dict['classification'] = self.extract_classification(soup)
        nb_dict['owner_type'] = self.extract_owner_type(soup)
        nb_dict['lang'] = self.extract_lang(soup)
        nb_dict['id'] = self.notebook_id
        if to_pandas:
            return pd.DataFrame([nb_dict]).set_index('id')
        return nb_dict
class NotebookModel(object):
    def __init__(self, df):
        self.df = df.copy()
        self.class_mappings = {'TOP SECRET': 3, 'SECRET': 2, 'CONFIDENTIAL': 1,
                               'UNCLASSIFIED': 0}
        self.scaler = scaler
        self.features_to_scale = features_to_scale
        self.imputer = imputer
        self.features_to_impute = features_to_impute
        self.features_to_encode = features_to_encode
        self.encoded_columns = encoded_columns
        self.validate_columns()
        self.initialize_regex()
        self.stemmer = SnowballStemmer('english')
        self.mapper = mapper
        self.selected_features = selected_features
        self.all_features = list(training_data)
        self.clf = best_clf

    def initialize_regex(self):
        self.pm_stripper = re.compile(...)
        self.no_word_chars = re.compile(r"[^a-z]+$")
        self.strip_punct_digits = re.compile(r"[^\sa-z]")
        self.strip_extra_spaces = re.compile(r"\s{2,}")
        self.contains_digits = re.compile(r"(.*\d+.*)")
        self.strip_punct = re.compile(...)
    def validate_columns(self):
        for col in self.features_to_scale + self.features_to_impute + self.features_to_encode:
            if col not in list(self.df):
                self.df[col] = np.nan

    def classification_to_level(self):
        # Keep only the level before the first "//" and map it to a number
        self.df['classification_level'] = self.df['classification'].apply(lambda x: x.split('//')[0])
        self.df['classification_level'] = self.df['classification_level'].apply(lambda x: self.class_mappings[x])
        self.df.drop(['classification'], axis=1, inplace=True)

    def encode_datetime(self, x):
        strp_string = "%Y-%m-%d %H:%M:%S"
        if len(x) == 25:
            strp_string += " +0000"
        timestamp = datetime.strptime(x, strp_string)
        # Days between the timestamp and the 14 November 2017 snapshot date
        return (datetime(2017, 11, 14) - timestamp).days

    def datetime_to_days_ago(self, timezone=True):
        if 'created_at' in self.df.columns and 'updated_at' in self.df.columns:
            self.df['days_ago_created'] = self.df['created_at'].apply(lambda x: self.encode_datetime(x))
            self.df['days_ago_updated'] = self.df['updated_at'].apply(lambda x: self.encode_datetime(x))
            self.df.drop(['created_at', 'updated_at'], axis=1, inplace=True)
        else:
            self.df['days_ago_created'] = 0
            self.df['days_ago_updated'] = 0

    def calc_ratio(self, field):
        if field in self.df.columns:
            self.df['unique_{0}_to_{0}'.format(field)] = self.df['unique_' + field] / self.df[field]
        else:
            self.df[field] = 0.0
            self.df['unique_' + field] = 0.0
            self.df['unique_{0}_to_{0}'.format(field)] = 0.0

    def calc_ratios(self):
        self.calc_ratio('runs')
        self.calc_ratio('downloads')
        self.calc_ratio('views')
        if self.df['days_ago_created'].values[0] == 0:
            self.df['updated_to_created'] = 0.0
        else:
            self.df['updated_to_created'] = self.df['days_ago_updated'] / self.df['days_ago_created']
    def scale_features(self):
        self.df[self.features_to_scale] = self.scaler.transform(self.df[self.features_to_scale])

    def impute_missing(self):
        self.df[self.features_to_impute] = self.imputer.transform(self.df[self.features_to_impute])

    def encode_categories(self):
        self.df = pd.get_dummies(self.df, columns=self.features_to_encode)
        for col in self.encoded_columns:
            if col not in list(self.df):
                self.df[col] = 0

    def strip_portion_markings(self):
        self.text_features = [col for col in list(self.df) if self.df[col].dtype == object]
        for col in self.text_features:
            self.df[col] = self.df[col].apply(lambda x: self.pm_stripper.sub("", x))

    def normalize_text(self, text):
        normal = text.lower()
        # remove any words containing digits
        normal = " ".join([self.contains_digits.sub(" ", word) for word in normal.split(" ")])
        # remove words that contain no word characters (a-z)
        normal = " ".join([self.no_word_chars.sub(" ", word) for word in normal.split(" ")])
        # remove all punctuation and digits
        normal = self.strip_punct_digits.sub("", normal)
        # remove all punctuation
        normal = self.strip_punct.sub(" ", normal)
        # replace consecutive spaces with a single space
        normal = self.strip_extra_spaces.sub(" ", normal)
        # remove leading and trailing whitespace
        normal = normal.strip()
        return normal

    def normalize_text_features(self):
        for feature in self.text_features:
            self.df[feature] = self.df[feature].apply(self.normalize_text)

    def stem_text(self):
        for col in self.text_features:
            self.df[col] = self.df[col].apply(lambda x: " ".join([self.stemmer.stem(word) for word in x.split(" ")]))

    def fill_missing_columns(self):
        for feature in self.all_features:
            if feature not in list(self.df):
                self.df[feature] = 0
        self.df = self.df.fillna(0)
    def transform(self):
        self.classification_to_level()
        self.datetime_to_days_ago()
        self.calc_ratios()
        self.scale_features()
        self.impute_missing()
        self.encode_categories()
        self.strip_portion_markings()
        self.normalize_text_features()
        self.stem_text()
        self.fill_missing_columns()
        self.transformed = self.mapper.transform(self.df)

    def predict(self):
        return {'predicted_class': self.clf.predict(self.transformed)[0]}

    def predict_proba(self):
        probs = self.clf.predict_proba(self.transformed)
        max_prob = round(np.max(probs), 3)
        max_class = self.clf.classes_[np.argmax(probs)]
        return {'predicted_class': max_class, 'probability': max_prob}

    def transform_predict(self, probability=False):
        self.transform()
        if probability:
            return self.predict_proba()
        return self.predict()

notebook_url = input("Enter the nbGallery URL for a notebook: ")
ne = NotebookExtractor()
nb_df = ne.extract(notebook_url)
model = NotebookModel(nb_df)
prediction = model.transform_predict(probability=True)
print("The predicted category for {} is {} with a probability of {}.".format(
    notebook_url, prediction['predicted_class'].title(), prediction['probability']))
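The text clean-up performed by normalize_text can be tried on its own, outside the class. The sketch below follows the same sequence of steps, but its regular expressions are stand-ins written for this example and should not be read as the notebook's exact patterns.

```python
import re

# Assumed patterns: stand-ins written for this sketch, not the notebook's own.
CONTAINS_DIGIT = re.compile(r"^.*\d.*$")        # a whole word that has a digit
NOT_LETTER_OR_SPACE = re.compile(r"[^a-z\s]")   # punctuation and other symbols
EXTRA_SPACES = re.compile(r"\s{2,}")            # runs of whitespace


def normalize_text(text):
    """Lowercase, drop words with digits, strip punctuation, tidy spaces."""
    normal = text.lower()
    # Replace any word containing a digit with a space.
    words = [CONTAINS_DIGIT.sub(" ", word) for word in normal.split(" ")]
    normal = " ".join(words)
    # Remove everything that is not a lowercase letter or whitespace.
    normal = NOT_LETTER_OR_SPACE.sub("", normal)
    # Collapse consecutive whitespace and trim the ends.
    normal = EXTRA_SPACES.sub(" ", normal)
    return normal.strip()


print(normalize_text("Hello,  World! Python3 rocks (really)."))
```

Dropping digit-bearing words before stripping punctuation matters: done the other way round, a token such as "Python3" would survive as "python" instead of being discarded.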
COMP3321 Day02 Homework - GroceryDict.py
(U) Homework for Day 2 of COMP3321
(U) COMP3321 Day02 Homework - GroceryDict.py
# Grocery list
myGroceryList = ["apples", "bananas", "milk", "eggs", "bread",
                 "hamburgers", "hotdogs", "ketchup", "grapes",
                 "tilapia", "sweet potatoes", "cereal",
                 "paper plates", "napkins", "cookies",
                 "ice cream", "cherries", "shampoo"]

vegetables = ["sweet potatoes", "carrots", "broccoli", "spinach",
              "onions", "mushrooms", "peppers"]
fruit = ["bananas", "apples", "grapes", "plums", "cherries", "pineapple"]
cold_items = ["eggs", "milk", "orange juice", "cheese", "ice cream"]
proteins = ["turkey", "tilapia", "hamburgers", "hotdogs", "pork chops", "ham", "meatballs"]
boxed_items = ["pasta", "cereal", "oatmeal", "cookies", "ketchup", "bread"]
paper_products = ["toilet paper", "paper plates", "napkins", "paper towels"]
toiletry_items = ["toothbrush", "toothpaste", "deodorant", "shampoo", "soap"]

GroceryStore = dict({"vegetables": vegetables, "fruit": fruit, "cold_items": cold_items,
                     "proteins": proteins, "boxed_items": boxed_items,
                     "paper_products": paper_products, "toiletry_items": toiletry_items})

myNewGroceryList = dict()
(U) Fill in your code below. Sort the items in myGroceryList by type into a dictionary: myNewGroceryList . The keys of the dictionary should be:
["vegetables", "fruit", "cold_items",
 "proteins", "boxed_items", "paper_products",
 "toiletry_items"]
(U) Only use the GroceryStore dict , not the individual item lists, to do the sorting. Note: The keys for myNewGroceryList are the same as the keys for GroceryStore .
print("My vegetable list: ", myNewGroceryList.setdefault("vegetables", list()))
print("My fruit list: ", myNewGroceryList.setdefault("fruit", list()))
print("My cold item list: ", myNewGroceryList.setdefault("cold_items", list()))
print("My protein list: ", myNewGroceryList.setdefault("proteins", list()))
print("My boxed item list: ", myNewGroceryList.setdefault("boxed_items", list()))
print("My paper product list: ", myNewGroceryList.setdefault("paper_products", list()))
print("My toiletry item list: ", myNewGroceryList.setdefault("toiletry_items", list()))
Doc ID: 6589696
Password Project Instructions for COMP3321
ted 6 months ago by| ia COME 3521
(U) Password Project Instructions for COMP3321
(U) These are the password project instructions for COMP3321. You need to send a file containing your function(s) to the instructor, not a notebook. Files should be named SID_password...
(U) Password Checker Function
(U) Demonstrates the ability to loop over data, utilizing counters and checks to see if all requirements are met.
(U) Write a function called password_checker that takes as input a string, which is your password, and returns a boolean True if the password meets the following requirements and False otherwise.
1. Password must be at least 14 characters in length.
2. Password must contain at least one character from each of the four character sets defined below, and no other characters.
3. Passwords cannot contain more than three consecutive characters from the same character set as defined below.
(U) Character sets:
+ Uppercase characters ( string.ascii_uppercase )
+ Lowercase characters ( string.ascii_lowercase )
+ Numerical digits ( string.digits )
+ Special characters ( string.punctuation )
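As a starting point, the four sets can be gathered in a dictionary so that any single character can be labelled with the set it belongs to. This is only a sketch of one possible building block, not a password checker; the names CHARACTER_SETS and character_set_of are made up for this example.

```python
import string

# The four character sets named in the requirements above.
CHARACTER_SETS = {
    "upper": string.ascii_uppercase,
    "lower": string.ascii_lowercase,
    "digit": string.digits,
    "special": string.punctuation,
}


def character_set_of(char):
    """Return the name of the set containing char, or None if in no set."""
    for name, members in CHARACTER_SETS.items():
        if char in members:
            return name
    return None


print([character_set_of(c) for c in "aB3!"])
print(character_set_of(" "))
```

A space belongs to none of the sets, so a password containing one fails the "no other characters" requirement.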
You may want to write multiple functions that your password_checker function calls.
Due: End of Day 3
def password_checker(password):
    """
    This is my awesome docstring for my awesome password
    checker that the author should adjust to say something
    more meaningful.
    """
    return True
(U) Run the following bit of code to check your password_checker function. If your code is good, you should get four (4) True statements printed to the screen.
# This is a good password
print(password_checker("abcABC123!@#abcABC123!@#") == True)
# This is invalid because the runs of same character set are too long
print(password_checker("abcdefgABCDEFG1234567!@#$%^&") == False)
# This is invalid because there are no characters from string.punctuation
print(password_checker("abcABC123abcABC123") == False)
# This is invalid because it is too short
print(password_checker("aa861166") == False)
(U) Password Generator Function
(U) Demonstrates the ability to randomly insert characters into a string that meets specific password requirements.
(U) Write a function called password_generator that takes as an optional argument the password length and returns a valid password, as defined in Password Checker Function. If no length is passed to the function, it defaults to 14. The following code outline does not account for the optional argument. You must make a change to that.
(U) Do not use the password_checker function in your password_generator . You can use it after you get something returned from your password_generator for your own testing, but it should not be part of the function itself.
Due: End of Day 5
def password_generator(length):
    """
    This is my awesome docstring for my awesome password
    generator that the author should adjust to say something
    more meaningful.
    """
    return True
(U) Assuming you have a valid password_checker function, use the following code to check your password_generator . If no False is printed, you are good. Otherwise, something is up.
my_password = password_generator()
if len(my_password) != 14 or not password_checker(my_password):
    print(False)
my_password = password_generator(25)
if len(my_password) != 25 or not password_checker(my_password):
    print(False)
(U) If you really want to test it out, run the following. If False prints, something is wrong.
from random import randint
for i in range(10000):
    if not password_checker(password_generator(randint(24, 30))):
        print(False)
Doc ID: 6589697
Final Project Schedule Generator
(U) Little notebook for randomly generating a final project presentation schedule. Students will present every 30 minutes starting at the start time specified, with an optional hour blocked off for lunch.
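The scheduling rule itself, independent of the widget code, can be sketched in a few lines. The build_schedule function, its seed parameter, and the student names below are hypothetical and exist only for this illustration.

```python
import random
from datetime import datetime, timedelta


def build_schedule(students, start, lunch=None, seed=None):
    """Return (name, 'HHMM') pairs: 30-minute slots, skipping a lunch hour.

    start and lunch are 'HHMM' strings; seed only makes the shuffle repeatable.
    """
    order = list(students)              # leave the caller's list untouched
    random.Random(seed).shuffle(order)
    slot = datetime.strptime(start, "%H%M")
    lunch_start = datetime.strptime(lunch, "%H%M") if lunch else None
    schedule = []
    for name in order:
        if lunch_start is not None and slot == lunch_start:
            slot += timedelta(hours=1)  # nobody presents during lunch
        schedule.append((name, slot.strftime("%H%M")))
        slot += timedelta(minutes=30)
    return schedule


for name, time in build_schedule(["Ann", "Bob", "Cy", "Dee"], "1100", lunch="1200"):
    print(name, "presents at", time)
```

With a start of 1100 and lunch at 1200, the slots come out as 1100, 1130, 1300, and 1330, whatever order the shuffle produces.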
Import Dependencies
import ipydeps
ipydeps.pip(['query_input'])
import random
from datetime import datetime, timedelta
import ipywidgets as widgets
import re
from IPython.display import display, clear_output
from query_input import QueryInput
Run Random Generator!
This uses a RandomGenerator class that inherits from the QueryInput class from the query_input package we imported to make creating the widget box and extracting the values out a little easier.
class RandomGenerator(QueryInput):
    def __init__(self, title="Enter data for random project presentation time generator"):
        super(RandomGenerator, self).__init__(title)
        self.default_layout = {'l_width': '260px', 'r_width': '480px', 'r_justify_content': ...,
                               'box_justify_content': 'center'}

    def generate_times(self):
        # Build 'HHMM' strings on the hour and half hour
        start_times = []
        for i in range(800, 1500, 50):
            start_time = str(i)
            start_time = re.sub("50$", "30", start_time)
            if len(start_time) == 3:
                start_time = "0" + start_time
            start_times.append(start_time)
        return start_times

    def random_schedule(self, students, start_time, lunch_time=None):
        start_time = datetime.strptime(start_time, "%H%M")
        if lunch_time:
            lunch_time = datetime.strptime(lunch_time, "%H%M")
        random.shuffle(students)
        for student in students:
            print(f"{student} will present at {start_time.strftime('%H%M')}")
            # Skip the lunch hour if the next slot would fall on it
            if lunch_time and start_time == lunch_time - timedelta(minutes=30):
                start_time += timedelta(minutes=90)
            else:
                start_time += timedelta(minutes=30)

    def submit(self, b):
        clear_output(wait=True)
        self.validate_input()
        self.extract_input()
        students = [student.strip() for student in self.extracted_input['student_names'].split("\n") if student.strip()]
        self.random_schedule(students, self.extracted_input['start_time'], self.extracted_input.get('lunch_time'))

    def create_input_form(self, start_times):
        self.build_box(description="<b/>Enter student names, one per line:",
                       name='student_names', required=True, widget_type=widgets.Textarea, r_height='400px',
                       **self.default_layout)
        self.build_box(description="<b/>Select the start time:", options=start_times, name='start_time',
                       required=True, widget_type=widgets.Select, **self.default_layout)
        self.build_box(description="Select a lunch time (optional):", options=[None] + start_times, name='lunch_time',
                       required=False, widget_type=widgets.Select, **self.default_layout)
        self.build_box(description="", name='submit', widget_type=widgets.Button, button_text='Submit',
                       button_click=self.submit, required=False, box_justify_content='center', r_border='solid 3px',
                       r_width='200px', r_height='50px')

    def run(self):
        start_times = self.generate_times()
        self.create_input_form(start_times)
        self.display(border='solid 2px')

r = RandomGenerator()
r.run()