
INTRODUCTION

TO
PYTHON/JUPYTER NOTEBOOK

By

Dr Davison Moyo (PhD, Wits)

PYTHON/JUPYTER NOTEBOOK: Course Outline

Session 1: Python and Jupyter Notebook


• Introduction to the Python/Jupyter Notebook
• Important packages to use in data analysis
• Installing Python/Jupyter Notebook
• The Jupyter Notebook Interface
• Creating a working FOLDER and numerous operations

Session 2: Data Wrangling with [PANDAS] in Python/Jupyter Notebook

• Importing data into your working space


• Checking your data set

• Data selection and Subsetting

• Reading and writing files

• Sorting and ranking

• Handling missing data


• Date/time types
• Merging and joining DataFrame objects
• Concatenation
• Reshaping DataFrame objects
• Data transformation
• Permutation and sampling
• Data aggregation and GroupBy operations

Session 3: Exploring Data in Python/Jupyter Notebook


• Descriptive Statistics

• Data summarization
• Frequencies

• Crosstabs
• Normality Test (Checking that your data is normally distributed)

Session 4: Plotting and Visualization

Plotting in Pandas vs Matplotlib vs Seaborn

• Bar plots
• Stacked graphs

• Histograms

• Box plots
• Area graphs
• Line graphs

• Grouped plots

• Scatterplots

Session 5: Data Analysis in Jupyter Notebook


• Statistical modelling
• Fitting regression models
• Model selection
• Hypothesis Tests
• Independence Tests
• T-tests
• Nonparametric Tests
• ANOVA

This series of workshops is designed for students who plan to use the free
software PYTHON for statistical analysis and graphical presentation. Python can be
accessed through more than one interface. I will also introduce JUPYTER NOTEBOOK,
a coding environment that makes Python easier to use.

PYTHON has become the most popular software environment, followed by SQL
and R. In addition, it is free and open source, so if you can use Python, then
you will never be constrained by your future employer's choice of statistical
software. This means that the skills you learn now can follow you for the rest of your
life. Python is becoming the primary language of data science and statistics and
is being adopted across academia, government, and businesses to help manage
and learn from the growing volume of data being obtained. Hopefully, you will
get a sense of some of the power of Python from these workshops.

Look at the table below and see for yourself how Python compares to other
commercial statistical software packages (SPSS, SAS and STATA) available on the
market.
A comparison of tools for data analysis

Features          | SPSS                                | SAS                                       | STATA                                | R                  | Python 3.0
Learning curve    | Gradual                             | Pretty steep                              | Gradual                              | Pretty steep       | Steep
User interface    | Point-and-click                     | Programming                               | Programming / point-and-click        | Programming        | Programming
Data manipulation | Strong                              | Very strong                               | Strong                               | Very strong        | Very strong
Data analysis     | Very strong                         | Very strong                               | Very strong                          | Very strong        | Strong
Graphics          | Good                                | Good                                      | Very good                            | Excellent          | Excellent
Costs             | Expensive licence; student discount | Expensive licence; student version (2014) | Affordable licence; student discount | Open source (free) | Open source (free)
Released          | 1968                                | 1972                                      | 1985                                 | 1995               | 2008

What is Python?

Python is a high-level scripting language which can be used for a wide variety of
text processing, system administration and internet-related tasks. Unlike many
similar languages, its core language is very small and easy to master, while
allowing the addition of modules to perform a virtually limitless variety of tasks.
Python is a true object-oriented language, and is available on a wide variety of
platforms. There is even a Python interpreter written entirely in Java, further
enhancing Python’s position as an excellent solution for internet-based problems.

Python was developed in the early 1990s by Guido van Rossum, then at CWI in
Amsterdam, and later at CNRI in Virginia. In some ways, Python grew out of
a project to design a computer language which would be easy for beginners to
learn yet would be powerful enough for even advanced users. This heritage is
reflected in Python’s small, clean syntax and the thoroughness of the
implementation of ideas like object-oriented programming, without eliminating
the ability to program in a more traditional style. So, Python is an excellent choice
as a first programming language without sacrificing the power and advanced
capabilities that users will eventually need.

Python is an interpreted, high-level programming language for general-purpose
programming. Created by Guido van Rossum and first released in 1991, Python
has a design philosophy that emphasizes code readability. Python was
conceived in the late 1980s, and its implementation was begun in December 1989 by
Guido at Centrum Wiskunde & Informatica (CWI) in the Netherlands.

Python 2.0 was released on 16 October 2000 and had many major new features,
including a cycle-detecting garbage collector and support for Unicode.

Python 3.0 (py3k) was released on 3 December 2008 after a long testing period.

Although pictures of snakes often appear on python books and websites, the
name is derived from Guido van Rossum’s favourite TV show, “Monty Python’s
Flying Circus”. For this reason, lots of online and print documentation for the
language has a light and humorous touch. Interestingly, many experienced
programmers report that python has brought back a lot of the fun they used to
have programming, so van Rossum’s inspiration may be well expressed in the
language itself.

1. WHY PYTHON?
There are several reasons for data scientists to adopt Python as their preferred
programming language, including:
1. Open-source nature and active community
2. General purpose
- suitable for analysis of financial data
- web programming (e.g. DJANGO)
- other fields
3. High-level language: employs syntax closer to human language, which makes
the language easier to learn and implement
4. Shorter learning curve: easy to learn, with a syntax that is clear and
intuitive
5. A large ecosystem: a large collection of powerful and standardized
libraries (including THIRD-PARTY SOFTWARE) and a depth of good scientific
computation libraries
6. Very powerful: integrates with fast, compiled languages (e.g.
C/C++) for numerical computation primitives (as used in NumPy and
pandas)
7. Ease of integrating the core modeling process with database access,
wrangling and post-processing, such as visualization and web-serving
8. Availability and continued development of Pythonic interfaces to Big
Data frameworks such as Apache Spark or MongoDB
9. Support and development of Python libraries by large and influential
organizations such as Google or Facebook (e.g. TensorFlow and
PyTorch)
10. Python has certain advantages that can improve coding, especially in
large corporations and professional environments.

Summary of the technical advantages


• Free and constantly updated.
• Can be used in multiple domains.
• Intuitive syntax that allows for complex quantitative computations.
• Practical application

2. WHY JUPYTER?
Jupyter Notebook

- Jupyter Notebook, formerly known as the IPython Notebook, is a flexible tool
that helps you create readable analyses, as you can keep code, images,
comments, formulae and plots together.
- Jupyter Notebook provides a programming environment that is optimized for
interactive computing with Python.

Language kernels

Jupyter supports several language kernels: Python, R and Julia. Jupyter notebook
files have the file extension .ipynb (IPython notebook document).
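
An .ipynb file is a plain JSON text document, which is part of why notebooks are easy to share. The sketch below builds a minimal notebook in memory and serialises it. The cell contents are invented for illustration, and the field names follow the version 4 notebook format as commonly documented, so treat it as an approximation rather than a full specification.

```python
import json

# A minimal notebook in the version 4 format: one Markdown cell, one code cell.
# The cell contents are made up for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('Hello Jupyter')"],
        },
    ],
}

# Serialise to the text that would be stored in the .ipynb file
ipynb_text = json.dumps(notebook, indent=1)

# Reading the text back gives the same structure
cell_types = [cell["cell_type"] for cell in json.loads(ipynb_text)["cells"]]
print(cell_types)  # ['markdown', 'code']
```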

Jupyter facilitates communication tremendously

Text:    You can create your notes and save them together with your code and formulae [MARKDOWN]
Code:    You can type your code and execute it in the CODE MODE
Output:  You can get a variety of outputs in the PYTHON OUTPUT WINDOW:
         results, figures, graphs, pictures and others

Python Datatypes

Python offers several powerful data structures, and it pays off to make yourself
familiar with them.

One can use

o Tuples to group objects of different types.

o Lists to group objects, usually of the same type.

o Arrays to work with numerical data. (NumPy also offers the data type
matrix. However, it is recommended to use arrays, since many
numerical and scientific functions will not accept input data in matrix
format.)

o Dictionaries for named, structured data sets.

o DataFrames for statistical data analysis.
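
As a rough illustration of the five structures listed above, the following sketch creates one of each. The names and values are invented, and it assumes NumPy and pandas are installed (both come with Anaconda).

```python
import numpy as np
import pandas as pd

# Tuple: a fixed group of objects that may have different types
person = ("Thandi", 27, 1.68)

# List: an ordered, changeable group of objects, usually of the same type
ages = [27, 31, 45, 52]
ages.append(38)

# Array: numerical data that supports element-wise arithmetic
ages_array = np.array(ages)
ages_in_months = ages_array * 12

# Dictionary: named, structured data (keys map to values)
record = {"name": "Thandi", "age": 27}

# DataFrame: a table with named columns, used for statistical analysis
df = pd.DataFrame({"name": ["Thandi", "Sipho"], "age": [27, 31]})

print(person[0])          # Thandi
print(len(ages))          # 5
print(ages_in_months[0])  # 324
print(record["age"])      # 27
print(df["age"].mean())   # 29.0
```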

PYTHON LIBRARIES FOR DATA ANALYSIS

NumPy and SciPy – Fundamental Scientific Computing

Pandas – Data Manipulation and Analysis

Matplotlib – Plotting and Visualization

Scikit-learn – Machine Learning and Data Mining

StatsModels – Statistical Modeling, Testing, and Analysis

Seaborn – For Statistical Data Visualization
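
One way to see which of these libraries are available on your machine is to ask Python whether it can locate each one. The helper name `check_installed` below is hypothetical, and the list uses the import names (scikit-learn is imported as `sklearn`); this is only a sketch and uses the standard library alone.

```python
import importlib.util

# Import names of the libraries listed above
# (scikit-learn is imported as "sklearn")
LIBRARIES = ["numpy", "scipy", "pandas", "matplotlib",
             "sklearn", "statsmodels", "seaborn"]


def check_installed(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_installed(LIBRARIES)
for name, found in status.items():
    print(f"{name:12s} {'installed' if found else 'missing'}")
```

Any library reported as missing can be added later with the package manager that Anaconda installs.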

SETTING UP PYTHON

Installing Python

First, we want to download the Anaconda Python distribution. The Anaconda
distribution is not the only way to get Python – indeed, there is a good chance a
version of Python is already installed on your computer – but the Anaconda
installation does a number of very nice things:

• A clean installation of the latest version of Python


• Installation of many add-on libraries that are commonly used in scientific
computing
• Installation of a “package manager” – a tool for installing new packages in
the future and updating already installed packages.

To install the Anaconda distribution, just go to the Anaconda download
page and pick the appropriate installer. Make sure to get an installer for Python
3.x, not 2.x!

INSTALLING PYTHON AND JUPYTER

PLEASE copy and paste the following link (below) into your browser:

www.anaconda.com

DOWNLOAD on the home page

Scroll down till you get to this part (below) of the home page

Then click on the Individual Edition and this will take you to this page (below)

Click DOWNLOAD and then select the installer for one of the 3 operating systems
(PLEASE match 64-Bit or 32-Bit to your computer):
• Windows
• MacOS
• Linux

On the screen that will appear, check which system you have on your computer:

• Open the Control Panel and select System to verify the system you are using.

• Then go back and select the version of Python that is compatible with
your computer.

• Please check whether your operating system is 32-bit or 64-bit and
then choose the matching version of Python.

• Select Python version 3.8 or the latest version available on the
download page.

• After downloading, choose Execute or Run, then follow the instructions to
the end, as illustrated below for either the Windows or Mac OS operating
systems.
---------------------------------------------------------------------------------------------------------------------

A Step-By-Step Guide on How to Install Python and Jupyter Notebook in
Anaconda for the owners of computers running on Windows and Mac OS

Install Anaconda

What is Anaconda?
Anaconda is a free, open-source distribution of both the Python and R programming
languages. Anaconda is widely used by the scientific community and by data scientists
to carry out Machine Learning projects or data analysis.

Why use Anaconda?


Anaconda will help you to manage all the libraries required for Python, or R.
Anaconda will install all the required libraries and IDE into one single folder to
simplify package management. Otherwise, you would need to install them
separately.

FOR: Windows User

Step 1) Open the downloaded exe and click Next

Step 2) Accept the License Agreement

Step 3) Select Just Me and click Next

Step 4) Select Destination Folder and Click Next

Step 5) Click Install on the next screen

Step 6) Installation will begin

Once done, Anaconda will be installed.

FOR: Mac User

Step 1) Go to https://www.anaconda.com/download/ and Download Anaconda
for Python 3.6 for your OS.

By default, Chrome selects the download page for your system. In this section,
installation is done for Mac. If you run Windows or Linux, download the Anaconda
5.1 for Windows installer or the Anaconda 5.1 for Linux installer.

Step 2) You are now ready to install Anaconda. Double-click on the downloaded
file to begin the installation. It is .dmg for mac and .exe for windows. You will be
asked to confirm the installation. Click Continue button.

You are redirected to the Anaconda3 Installer.

Step 3) Next window displays the ReadMe. After you are done reading the
document, click Continue

Step 4) This window shows the Anaconda End User License Agreement. Click
Continue to agree.

Step 5) You are prompted to agree, click Agree to go to the next step.

Step 6) Click Change Install Location to set the location of Anaconda. By default,
Anaconda is installed in the user environment: Users/YOURNAME/.

Select the destination by clicking on Install for me only. It means Anaconda will
be accessible only to this user.

Step 7) You can install Anaconda now. Click Install to proceed. Anaconda takes
around 2.5 GB on your hard drive.

A message box will prompt you to confirm by typing your password. Then hit Install
Software

The installation may take some time. It depends on your machine.

Step 8) Anaconda asks you if you want to install Microsoft VSCode. You can
ignore it and hit Continue

Step 9) The installation is completed. You can close the window.

You are asked if you want to move "Anaconda3" installer to the Trash. Click Move
to Trash

You are done with the installation of Anaconda on a macOS system
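
Once the installation has finished on either system, a few lines of Python can confirm which interpreter you are running. This is only a sketch using the standard library. The comment about where the interpreter lives is an assumption about a default Anaconda setup.

```python
import platform
import sys

# Version of the Python interpreter that is running this code
print("Python version:", platform.python_version())

# A 64-bit interpreter can address integers larger than 2**32
is_64bit = sys.maxsize > 2**32
print("64-bit Python:", is_64bit)

# Operating system and machine type, e.g. Windows / AMD64
print("System:", platform.system(), platform.machine())

# Path of the interpreter; for an Anaconda install this usually
# points inside the Anaconda folder (an assumption about the default setup)
print("Interpreter:", sys.executable)
```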

THE JUPYTER NOTEBOOK INTERFACE

To create or open a JUPYTER NOTEBOOK, you can type either ANACONDA or
Jupyter Notebook in the START MENU

or Anaconda Navigator in the START MENU

or click on Anaconda Navigator (Anaconda 3)

The following two screens will then appear; click OK

After clicking OK, click LAUNCH on JUPYTER NOTEBOOK

or

and it will open in your browser

4. JUPYTER’S INTERFACE – THE DASHBOARD

✓ Duplicate/shutdown running file.

✓ Rename or delete folders

✓ Mark all folders/running files

Untitled 1 ipynb loaded running

2 ipynb

PLEASE TAKE NOTE: Cannot rename a running file

UPLOAD – lets you load a file (Python scripts, data sets or PDF documents) into
the same folder as your Jupyter notebook

NEW – lets you create a new:

• Text file

• Folder

• Notebook

FIRST CREATE A FOLDER FOR EACH PROJECT

Click NEW then select Folder

Tick the checkbox next to the Untitled Folder

Rename the FOLDER

Open the newly renamed FOLDER

Then click on NEW and select PYTHON 3; you will see a screen similar to the
one below
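
The project folder created through the dashboard above can also be created from Python itself. The sketch below is one possible way to do it, using the standard `pathlib` module. The function name `make_project_folder` and the folder name `MyFirstProject` are hypothetical.

```python
import tempfile
from pathlib import Path


def make_project_folder(base, name):
    """Create base/name (if it does not exist yet) and return its path."""
    folder = Path(base) / name
    folder.mkdir(parents=True, exist_ok=True)
    return folder


# Demonstration inside a temporary directory so nothing is left behind;
# in practice `base` would be your own working directory.
with tempfile.TemporaryDirectory() as base:
    project = make_project_folder(base, "MyFirstProject")
    created = project.is_dir()
    print(project.name, created)  # MyFirstProject True
```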

NOTEBOOK USER INTERFACE

When you create a new notebook document, you will be presented with
the notebook name, a menu bar, a toolbar and an empty code cell.

Notebook name: The name displayed at the top of the page, next to the
Jupyter logo, reflects the name of the .ipynb file. Clicking on the
notebook name brings up a dialog which allows you to rename it. Thus,
renaming a notebook from “Untitled” to “MyFirstNotebook” in the browser
renames the Untitled.ipynb file to MyFirstNotebook.ipynb.

Menu bar: The menu bar presents different options that may be used to
manipulate the way the notebook functions.

Toolbar: The tool bar gives a quick way of performing the most-used operations
within the notebook, by clicking on an icon.

Code cell: the default type of cell; read on for an explanation of cells.

NAMING
You will notice that at the top of the page is the word Untitled. This is the title
for the page and the name of your Notebook. Since that is not a very
descriptive name, let us change it!

Just move your mouse over the word Untitled and click on the text. You should
now see an in-browser dialog titled Rename Notebook. Let us rename this one
to Hello Jupyter:

Structure of a notebook document

CODE CELL

The notebook consists of a sequence of cells. A cell is a multiline text input field,
and its contents can be executed by using Shift-Enter, or by clicking either the
“Play” button in the toolbar, or Cell > Run in the menu bar. The execution
behaviour of a cell is determined by the cell’s type. There are three types of
cells: code cells, markdown cells, and raw cells. Every cell starts off being
a code cell, but its type can be changed by using a drop-down on the toolbar
(which will be “Code”, initially), or via keyboard shortcuts indicated below.

Keyboard shortcuts
All actions in the notebook can be performed with the mouse, but keyboard
shortcuts are also available for the most common ones. The essential shortcuts
to remember are the following:

• Shift-Enter: run cell
Execute the current cell, show any output, and jump to the next cell
below. If Shift-Enter is invoked on the last cell, it makes a new cell below.
This is equivalent to clicking the Cell > Run menu item, or the Play button
in the toolbar.
• Esc: Command mode
In command mode, you can navigate around the notebook using
keyboard shortcuts.
• Enter: Edit mode
In edit mode, you can edit text in cells.

For the full list of available shortcuts, click Help > Keyboard Shortcuts in the
notebook menus.

For more information on the different things you can do in a notebook, see
the collection of examples at the following link:
https://nbviewer.jupyter.org/github/jupyter/notebook/tree/master/docs/source/examples/Notebook/

Code cells
A code cell allows you to edit and write new code, with full syntax highlighting
and tab completion. The programming language you use depends on
the kernel, and the default kernel (IPython) runs Python code.

When a code cell is executed, the code that it contains is sent to the kernel
associated with the notebook. The results that are returned from this
computation are then displayed in the notebook as the cell’s output. The
output is not limited to text; many other forms of output are also
possible, including matplotlib figures and HTML tables (as used, for example, in
the pandas data analysis package). This is known as IPython’s rich
display capability.
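
To give a feel for how rich display works, here is a minimal sketch of an object that supplies its own HTML representation through a `_repr_html_` method, which the notebook looks for when showing a result. The class `ScoreTable` and its data are invented for illustration. Libraries such as pandas implement the same idea in a far more complete way.

```python
class ScoreTable:
    """A tiny object that knows how to show itself as an HTML table."""

    def __init__(self, scores):
        self.scores = scores  # dict of name -> score

    def _repr_html_(self):
        # Jupyter calls this method, if present, to build the rich output
        rows = "".join(
            f"<tr><td>{name}</td><td>{score}</td></tr>"
            for name, score in self.scores.items()
        )
        return f"<table><tr><th>Name</th><th>Score</th></tr>{rows}</table>"


table = ScoreTable({"Thandi": 81, "Sipho": 74})
html = table._repr_html_()
print(html)

# In a notebook, ending a cell with just `table` would display the rendered
# table instead of the raw HTML text.
```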

The MENU BAR

The Jupyter Notebook has several menus that you can use to interact with your
Notebook. The menu runs along the top of the Notebook just like menus do in
other applications. Here is a list of the current menus:
• File

• Edit

• View

• Insert

• Cell

• Kernel

• Widgets

• Help

Let us go over the menus one by one. I will not go into detail for every single
option in every menu, but I will focus on the items that are unique to the
Notebook application.

The first menu is the File menu. In it, you can create a new Notebook or open
a pre-existing one. This is also where you would go to rename a Notebook. I
think the most interesting menu item is the Save and Checkpoint option. This
allows you to create checkpoints that you can roll back to if you need to.

Next is the Edit menu. Here you can cut, copy, and paste cells. This is also where
you would go if you wanted to delete, split, or merge a cell. You can reorder
cells here too.

Note that some of the items in this menu are greyed-out. The reason for this is
that they do not apply to the currently selected cell. For example, a code cell
cannot have an image inserted into it, but a Markdown cell can. If you see a
greyed-out menu item, try changing the cell’s type and see if the item
becomes available to use.

The View menu is useful for toggling the visibility of the header and toolbar. You
can also toggle Line Numbers within cells on or off. This is also where you would
go if you want to mess about with the cell’s toolbar.

The Insert menu is just for inserting cells above or below the currently selected
cell.

The Cell menu allows you to run one cell, a group of cells, or all the cells. You
can also go here to change a cell’s type, although I personally find the toolbar
to be more intuitive for that.

The other handy feature in this menu is the ability to clear a cell’s output. If you
are planning to share your Notebook with others, you will probably want to
clear the output first so that the next person can run the cells themselves.
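
Because a notebook file is stored as JSON, clearing the output can also be sketched in a few lines of Python. The function `clear_outputs` below is a hypothetical helper, not part of Jupyter, and it works on a notebook that has already been loaded into a dictionary (for example with `json.load`). The example notebook is made up.

```python
import copy


def clear_outputs(notebook):
    """Return a copy of a notebook dict with all code-cell output removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A made-up notebook with one executed code cell
example = {
    "nbformat": 4, "nbformat_minor": 4, "metadata": {},
    "cells": [{
        "cell_type": "code", "metadata": {}, "execution_count": 1,
        "source": ["1 + 1"],
        "outputs": [{"output_type": "execute_result", "execution_count": 1,
                     "metadata": {}, "data": {"text/plain": ["2"]}}],
    }],
}

cleaned = clear_outputs(example)
print(cleaned["cells"][0]["outputs"])          # []
print(example["cells"][0]["execution_count"])  # 1 (original is untouched)
```

For everyday use, the menu option described above does the same job with one click.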

The Kernel menu is for working with the kernel that is running in the background.
Here you can restart the kernel, reconnect to it, shut it down, or even change
which kernel your Notebook is using.

You probably will not be working with the Kernel all that often, but there are
times when you are debugging a Notebook that you will find you need to
restart the Kernel. When that happens, this is where you would go.

The Widgets menu is for saving and clearing widget state. Widgets are
basically JavaScript widgets that you can add to your cells to make dynamic
content using Python (or another Kernel).

Finally, you have the Help menu, which is where you go to learn about the
Notebook’s keyboard shortcuts, a user interface tour, and lots of reference
material.

JUPYTER’S INTERFACE – Prerequisites for coding

INPUT FIELD
• A green border and a pen icon show that you are in EDIT MODE
• To leave EDIT MODE, press ESC so that you go to
COMMAND MODE

The left margin turns BLUE and the input field goes back to grey

• To execute the code, either press CTRL + ENTER or click the RUN
icon in the TOOL BAR

The output field appears with a red label, e.g. Out [1]: [1, 2, 3, 4]

The OUTPUT FIELD cannot be modified

• SHIFT + ENTER also runs the cell and then moves to the next cell

CUT, COPY & PASTE CELLS

X – allows you to cut a cell

C – allows you to copy a cell

V – pastes the cut or copied cell

The arrow buttons in the toolbar allow you to move the IN & OUT FIELDS TOGETHER either up or down

The RUN button is for executing code

In [*] Might take long to complete

Can use the … button to stop the code

TO INSERT CELLS ABOVE (A) AND BELOW (B)

Have the cell in Command mode [BLUE EDGE]

A – adds a cell above

B – adds a cell below

DELETING A CELL

Select it [BLUE EDGE] and press D twice

NOTEBOOK CELLS

The cell-type drop-down in a Jupyter Notebook lists 4 options:

CODE CELLS

- This is the cell where you write your Python code, which is run by
the IPython kernel; the output is displayed under the cell.

- Contents in this cell are treated as statements in the programming language
of the current kernel. The default kernel is Python.

- Here is an example of a code cell.

- When such a cell is run, its result is displayed in an output cell. The output may
be text, images, matplotlib plots or HTML tables. Code cells have rich output
capability.
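
A code cell of the kind described above might contain something like the following. The values are invented for illustration. Typing this into a cell and pressing Shift-Enter would print the two lines and then show the list as the cell's output.

```python
# Build a list and compute a simple summary
numbers = [1, 2, 3, 4]
total = sum(numbers)
average = total / len(numbers)

print("Total:", total)      # Total: 10
print("Average:", average)  # Average: 2.5

# In a notebook, a bare expression on the last line of a cell is shown
# as that cell's Out[ ] value
numbers
```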

MARKDOWN CELL

- This is where you add documentation by writing text formatted using
Markdown. The output is displayed in place of the cell when it is run.

- All kinds of formatting features are available, like making text bold and
italic, displaying ordered or unordered lists, rendering tabular contents etc.

- Markdown cells are especially useful for documenting the
computational process of the notebook; they are not executed as CODE.

To convert a cell into a Markdown Cell, select Markdown from the dropdown
menu as shown below

Select the CELL, PRESS M (the key shortcut) and then start typing

NB: to convert the Markdown cell back to Code, select CODE from the dropdown menu

OR

Select the desired cell and PRESS Y (the key shortcut)

Advantages of using Jupyter


1. As the code becomes longer, markdown cells allow you to leave
comments explaining how you have created the solution.
2. You can select any cell you need to run, and you need not run all the cells.
This allows you to solve a problem in pieces, which saves a lot of computational
time.

You can use backslash to generate literal characters which would otherwise
have special meaning in the Markdown syntax.

\*literal asterisks\*
*literal asterisks*

Use double backslash to generate the literal $ symbol.

RAW CELLS

Contents of raw cells are not evaluated by the notebook kernel. When passed
through nbconvert, they will be rendered as desired. If you type LaTeX in a raw
cell, rendering will happen after nbconvert is applied.

- The Raw NBConvert cell type is only intended for special use cases when
using the nbconvert command line tool.
- Basically, it allows you to control the formatting in a very specific way when
converting from your Jupyter notebook into another file format like PDF,
HTML, etc.

HEADING

- This is the same as writing a heading (a line starting with #) in Markdown. You
can use this to add a headline to your notebook.
- Whatever the type of cell, you are required to run each cell to see the output.
You can either use the Run option in the toolbar at the top or press
Shift + Enter on your keyboard.
- While a cell is running, it is marked with an asterisk,
like In[*], which turns into In[1] once the output is obtained.

MAKING TITLES AND SUBTITLES
You make titles using hash symbols (#). A single hash gives you a title, two
hashes give you a subtitle, and so on, as shown below:
# Title

## Subtitle

### SubSubtitle etc

Bold and italics

To make text bold, enclose it between two pairs of asterisks (with no space next to the text):

**bold**

Or to italicise, use single asterisks:

*italics*

(This also works in WhatsApp by the way)

You can make bulleted lists using an asterisk to mark each point:

* bullet 1

* bullet 2

PERFORMING BASIC MATHEMATICAL CALCULATIONS

Python Operator    Pandas Method(s)
+                  add()
-                  sub(), subtract()
*                  mul(), multiply()
/                  truediv(), div(), divide()
//                 floordiv()
%                  mod()
**                 pow()
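
The correspondence between operators and methods listed above can be checked with a small pandas Series. The numbers are invented, and the snippet assumes pandas is installed. It is meant as a quick sketch rather than a full tour of the methods.

```python
import pandas as pd

s = pd.Series([7, 8, 9])

# Each operator and its method form give the same result
print((s + 2).equals(s.add(2)))        # True
print((s - 2).equals(s.sub(2)))        # True
print((s * 2).equals(s.mul(2)))        # True
print((s / 2).equals(s.truediv(2)))    # True
print((s // 2).equals(s.floordiv(2)))  # True
print((s % 2).equals(s.mod(2)))        # True
print((s ** 2).equals(s.pow(2)))       # True

print((s / 2).tolist())   # [3.5, 4.0, 4.5]
print((s // 2).tolist())  # [3, 4, 4]
print((s % 2).tolist())   # [1, 0, 1]
print((s ** 2).tolist())  # [49, 64, 81]

# The method form accepts extra options, e.g. a fill value for missing data
t = pd.Series([1.0, None, 3.0])
filled_sum = s.add(t, fill_value=0)
print(filled_sum.tolist())  # [8.0, 8.0, 12.0]
```

The method form is mainly useful when you need options such as `fill_value`, which the plain operators cannot take.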

SHARING A NOTEBOOK FILE


1. Select the FILE
2. Open the Jupyter notebook you want to share
3. Click on FILE and select DOWNLOAD; the file will be downloaded and
saved with the .ipynb extension (e.g. MyFileName.ipynb) in your
FOLDER
4. You can then attach that file to an email and share it with anyone:
your colleagues, supervisors or team members

WELCOME TO THE WORLD OF PYTHON

&

HAPPY CODING!!

