
Dr Mohd Hilmi Hasan

DATA ANALYTICS
OAU5362/DAM5362

May 2021
OUTCOMES

At the end of this session, you will be able to:


• Demonstrate understanding of Python data structures.
• Solve data management problems in Python.

OUTLINE

• Jupyter Notebook
• First Python codes and Variables
• Variables and Strings
• Operators
• Decision
• Repetition
• Functions
• Library and Array
• Dictionary
• Data Frame (Table)

JUPYTER NOTEBOOK

• Tool used for Python programming

• A “notebook” is a document that combines code with rich-text elements (e.g. figures, links, equations) for data analysis

• The name Jupyter comes from Julia + Python + R

• Installation:
1. Download Anaconda from https://www.anaconda.com/products/individual (choose the latest Python version)
2. Install Anaconda by following the instructions on the download page (choose the default settings)
3. Well done!

• To open Jupyter Notebook, click Start → Jupyter Notebook (Anaconda 3). The app opens in the browser at http://localhost:8888/

JUPYTER NOTEBOOK

• The tabs:
o Files – lists the files and folders where your work is kept
o Running – shows the notebooks and terminals currently running
o Clusters – manages clusters for parallel computing
• To start a new notebook, click New → Python 3 (Fig. 1). Folders can also be created to organize files.
• A new notebook is shown in Fig. 2.
• Change the notebook name by clicking on “Untitled” (Fig. 3).

Fig. 1

Fig. 2

Fig. 3
FIRST PYTHON CODES & VARIABLES

• Type and run (execute) the following code: 2 + 2

• To run the code, click the “Run” button, or press Shift + Enter (run and move to the next cell) or Ctrl + Enter (run and stay on the same cell). The cell must be selected first.

• To create a new cell, click the “+” button, or press Esc then A (new cell above) or Esc then B (new cell below)

• Type and run: print("Hello World!")

• In a single cell, type and run the following:

▪ a = 10
▪ b = 15
▪ c = a + b
▪ d = a * b

Q: What does the above code do?

• To display the content of a variable, either use print(variable_name) or just type the variable name as the last line of the cell.

A variable name:
• Must start with a letter or the underscore character
• Cannot start with a number
• Can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9 and _)
• Is case-sensitive (name, Name and NAME are three different variables)
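The snippet below is a small sketch, not part of the original exercise, that checks the naming rules above with Python's built-in str.isidentifier() and keyword.iskeyword(); the helper name is_valid_name is hypothetical. Note that isidentifier() also accepts non-ASCII letters, so it is slightly more permissive than the A-Z rule on the slide, and reserved words such as for are rejected separately. The second half shows what the four assignments compute.

```python
import keyword

# Hypothetical helper: a name is valid if it follows the identifier rules
# and is not a reserved keyword such as "for" or "if".
def is_valid_name(name):
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["fruit", "_total", "price2", "2price", "my-name", "for"]:
    print(candidate, is_valid_name(candidate))

# The four assignments from the exercise
a = 10
b = 15
c = a + b   # 25
d = a * b   # 150
print(c, d)

# Names are case-sensitive: these are three separate variables
name, Name, NAME = "x", "y", "z"
print(name, Name, NAME)
```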
VARIABLES & STRINGS

• Strings:
▪ fr1 = "banana"
  fr2 = "mango"
  fr3, fr4, fr5 = "rambutan", "durian", "water melon"

▪ print("I like to eat " + fr4)

▪ print(fr1 + " and " + fr5 + " are in season year-round")

▪ print(fr2[0])
▪ print(fr2[3])
▪ print(len(fr3))

▪ frnew = fr5.split(" ")
  print(frnew[0])
  print(frnew[1])
Comments:
- A comment is not executed; Python ignores it when the code runs
- A comment in Python starts with #
- Comments are normally used to describe code
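For reference, here is a sketch that reproduces the indexing, len() and split() exercises and records the values they produce; the intermediate names first, fourth and length are introduced only for illustration. The trailing # text on each line is a comment and is skipped when the code runs.

```python
fr1 = "banana"
fr2 = "mango"
fr3, fr4, fr5 = "rambutan", "durian", "water melon"

# Indexing starts at 0: position 0 is the first character
first = fr2[0]     # "m"
fourth = fr2[3]    # "g"
length = len(fr3)  # 8 characters in "rambutan"

# split(" ") breaks the string at each space and returns a list
frnew = fr5.split(" ")

print(first, fourth, length)
print(frnew)       # ['water', 'melon']
```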
OPERATORS
• In addition to the arithmetic operators (+, -, *, /), Python has comparison and logical operators, which evaluate to either True or False

• Evaluate the following codes (comparison operators):


▪ x = 10
▪ y = 11
▪ print(x == y)
▪ print(x != y)
▪ print(x > y)
▪ print(x < y)
▪ print(x <= y)
▪ print(x >= y)
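As a check on the exercise, a minimal sketch that evaluates the six comparisons for x = 10 and y = 11 and stores each result in a dictionary (the name results is only for illustration):

```python
x = 10
y = 11

# Each comparison evaluates to a bool (True or False)
results = {
    "x == y": x == y,
    "x != y": x != y,
    "x > y": x > y,
    "x < y": x < y,
    "x <= y": x <= y,
    "x >= y": x >= y,
}
for expr, value in results.items():
    print(expr, "->", value)
```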

• Evaluate the following codes (logical operators):


▪ st1 = x < y and x == 11
st2 = x < y or x == 11
print(st1)
print(st2)
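With x = 10 and y = 11, x < y is True and x == 11 is False, so "and" gives False and "or" gives True. The sketch below confirms this and prints the full truth table for both operators.

```python
x = 10
y = 11

st1 = x < y and x == 11   # True and False -> False
st2 = x < y or x == 11    # True or False  -> True
print(st1, st2)

# Truth table for "and" / "or" over all combinations of two booleans
for p in (True, False):
    for q in (True, False):
        print(p, q, "and:", p and q, "or:", p or q)
```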
DECISION
• A decision checks a condition and chooses which action to take depending on whether the condition is true

• Type and run:


cr = 1.5
if cr == 1.5:
    print("Warning!")

Q: Change cr to another value and see what happens.

• Type and run:

cr = 0.8
if cr == 1.5:
    print("Warning!")
else:
    print("Normal")

• Try:

cr = 1.5
if cr >= 1.5:
    print("Critical")
elif cr >= 1.0 and cr < 1.5:
    print("Warning")
else:
    print("Normal")

Q: Change cr to 1.1 and 0.7 and see the output.
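To see every branch at once, this sketch runs the same if/elif/else chain for several cr values inside a loop: 1.5 and 0.8 from the examples, plus 1.1 and 0.7 from the question. The dictionary labels is introduced only to collect the results.

```python
labels = {}
for cr in [1.5, 1.1, 0.8, 0.7]:
    # Only the first branch whose condition is True runs
    if cr >= 1.5:
        status = "Critical"
    elif cr >= 1.0 and cr < 1.5:
        status = "Warning"
    else:
        status = "Normal"
    labels[cr] = status
    print(cr, status)
```

When cr comes from a calculation rather than being typed in, an exact test such as cr == 1.5 can fail because of floating-point rounding; a range test such as cr >= 1.5 is usually safer.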


DECISION
• Try:
spe = 9505
pre = 13000
tem = 165

if spe >= 9500:
    if pre >= 12800:
        if tem >= 150:
            print("Equipment FAIL")

Q: Change spe, pre and tem to other values and see what happens.

• Compare with:

if spe >= 9500 and pre >= 12800 and tem >= 150:
    print("Equipment FAIL")
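The nested version and the combined "and" version make the same decision. The sketch below checks this by wrapping each version in a function (the names nested_check and combined_check, and the "OK" return value, are hypothetical; the original code simply prints nothing in that case) and trying values just below and above each threshold with itertools.product.

```python
from itertools import product

# Hypothetical helper names, used only to compare the two styles
def nested_check(spe, pre, tem):
    if spe >= 9500:
        if pre >= 12800:
            if tem >= 150:
                return "Equipment FAIL"
    return "OK"

def combined_check(spe, pre, tem):
    if spe >= 9500 and pre >= 12800 and tem >= 150:
        return "Equipment FAIL"
    return "OK"

# Try values just below and at/above each threshold (8 combinations)
for spe, pre, tem in product([9499, 9505], [12799, 13000], [149, 165]):
    assert nested_check(spe, pre, tem) == combined_check(spe, pre, tem)

print(nested_check(9505, 13000, 165))   # Equipment FAIL
```

The combined form is shorter; the nested form is useful when each level needs its own extra action.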

REPETITION

• Repetition (a.k.a. a loop) is the process of executing a block of code several times. The number of repetitions is controlled by a condition or, in a for loop, by the sequence being looped over.

• Type and run:


for x in range(6):
    print(x)

• Type and run:


for x in range(6):
    print(x, end=" ")

• Type and run:


for num1 in range(3):
    for num2 in range(10, 14):
        print(num1, ",", num2)
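A short sketch of what the three loops iterate over: range(6) stops before 6, range(10, 14) stops before 14, and end=" " replaces the default newline. For the nested loop, the pairs are collected in a list (the name pairs is only for illustration) to show that the inner loop runs completely for each outer value.

```python
# range(6) produces 0, 1, 2, 3, 4, 5 (the stop value 6 is excluded)
print(list(range(6)))

# range(10, 14) starts at 10 and stops before 14
print(list(range(10, 14)))

# end=" " replaces the default newline, so values stay on one line
for x in range(6):
    print(x, end=" ")
print()  # finish the line

# Nested loops: the inner loop runs completely for each outer value
pairs = []
for num1 in range(3):
    for num2 in range(10, 14):
        pairs.append((num1, num2))
print(len(pairs))   # 3 outer values x 4 inner values = 12
```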

FUNCTIONS

• A function is a block of code that runs only when it is called by its name.

• So far, we have used the print function, which displays the values supplied inside its parentheses (these values are called arguments).

• The print function and many others are predefined functions provided by Python or its libraries.

• Besides predefined functions, we can also write our own; these are known as user-defined functions.

• Type and run:

▪ def ex_function():
      print("Hello from a function")

▪ ex_function()   # function call

• Type and run:

▪ def bmi_score(w, h):   # w = weight (kg), h = height (m)
      return w / (h * h)

▪ print(bmi_score(90, 1.75))
▪ print(bmi_score(51, 1.53))
▪ print(bmi_score(45, 1.51))
▪ print(bmi_score(89, 1.65))
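bmi_score implements BMI = weight / height², with weight in kilograms and height in metres. As a sketch, the four calls can be written as one loop over (weight, height) pairs, rounding each score to one decimal place with the built-in round(); the names people and scores are hypothetical.

```python
def bmi_score(w, h):
    # w: weight in kilograms, h: height in metres
    return w / (h * h)

people = [(90, 1.75), (51, 1.53), (45, 1.51), (89, 1.65)]
scores = []
for w, h in people:
    score = round(bmi_score(w, h), 1)   # round to 1 decimal place
    scores.append(score)
    print(w, h, score)
```

Defining the formula once in a function means every call uses the same calculation, and only the arguments change.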
LIBRARY & ARRAY
• Python comes with many libraries, each containing functions that we can use in our code.

• So far, the variables we have seen are ordinary variables that store a single value; assigning a new value replaces the old one. E.g.:
num1 = 3
num1 = 3 * 12
print(num1)

• An array variable can store multiple values. To use arrays, we import the numpy library:
import numpy  # importing the library
arr = numpy.array([10, 22, 35, 44, 51])  # creating an array with a library function
print(arr)
• Type and run:
▪ print(arr[1])
▪ print(arr[4])
▪ ans = arr[0] * arr[3]
  print(ans)
▪ print(arr[0:3])

• Type and run:
▪ print(arr[3:])
▪ print(arr[:3])
▪ print(arr[-2:])
▪ print(sum(arr))
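For checking answers, the same indexing and slicing exercises with the values they produce are shown below. A slice [start:stop] includes start and excludes stop, and a negative index counts from the end. The last line uses numpy's own arr.sum() method as an alternative to the built-in sum().

```python
import numpy

arr = numpy.array([10, 22, 35, 44, 51])

print(arr[1])      # 22  (second element; indexing starts at 0)
print(arr[4])      # 51  (last element)
ans = arr[0] * arr[3]
print(ans)         # 440 (10 * 44)

print(arr[0:3])    # [10 22 35]  start at 0, stop before 3
print(arr[3:])     # [44 51]     from index 3 to the end
print(arr[:3])     # [10 22 35]  same as arr[0:3]
print(arr[-2:])    # [44 51]     the last two elements
print(sum(arr))    # 162
print(arr.sum())   # 162, using numpy's method
```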
LIBRARY & ARRAY
• Try:

a_list = numpy.array([1,25,"Three"])
print(a_list[0]+a_list[1])

• Try this also:

b_list = numpy.array([1,25,3])
print(b_list[0]+b_list[1])
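The two results differ because a numpy array holds a single data type. When the input mixes numbers and text, numpy converts every element to a string, so + joins "1" and "25" instead of adding them. This sketch inspects each array's dtype; the exact dtype names printed (for example <U21 or int64) can vary by platform and numpy version, so the checks use only the dtype kind ("U" for Unicode string, "i" for signed integer).

```python
import numpy

a_list = numpy.array([1, 25, "Three"])
b_list = numpy.array([1, 25, 3])

# Mixed input: every element becomes a string (dtype kind "U")
print(a_list.dtype, a_list.dtype.kind)
print(a_list[0] + a_list[1])  # 125, string concatenation of "1" and "25"

# All integers: integer dtype (kind "i"), so + adds numbers
print(b_list.dtype, b_list.dtype.kind)
print(b_list[0] + b_list[1])  # 26
```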

DICTIONARY
• A dictionary is a changeable collection of key-value pairs, where each value is accessed by its key. Since Python 3.7, dictionaries keep their items in insertion order.

• Type and run:


cars = {
    "brand": "Proton",
    "model": "Preve",
    "year": 2015
}
print(cars)
m = cars['model']      # access a value by its key
m = cars.get('year')   # get() also returns the value for a key
print(m)

• To add an item:


cars["color"] = "Blue"
print(cars)

• To delete an item:
del cars["model"]
print(cars)
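The slide shows two ways to read a value: square brackets and get(). The difference appears once a key is missing. In this sketch, after "model" is deleted, cars["model"] raises a KeyError, while cars.get("model") returns None, or a default such as "unknown" (an illustrative value) when one is supplied.

```python
cars = {
    "brand": "Proton",
    "model": "Preve",
    "year": 2015,
}

print(cars["model"])       # Preve: square brackets need an existing key
print(cars.get("year"))    # 2015

cars["color"] = "Blue"     # assigning to a new key adds an item
del cars["model"]          # del removes the key and its value

# After the deletion, get() returns None (or a default) instead of failing
print(cars.get("model"))             # None
print(cars.get("model", "unknown"))  # unknown

try:
    cars["model"]
except KeyError:
    print("KeyError: 'model' is no longer in the dictionary")

print(cars)  # {'brand': 'Proton', 'year': 2015, 'color': 'Blue'}
```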
DATA FRAME (TABLE)

• With the pandas library, Python can also handle data in table form, called a data frame

• Type and run:


import pandas as pd
data = {'Name':['Carrol', 'Mike', 'John'],'Gender':['Female', 'Male', 'Male'],
'Height':[160,175,173], 'Weight':[49,89,77], 'Age':[35,36,41]}
df = pd.DataFrame(data)
print(df)

• Type and run:

▪ df['Height']
▪ df.loc[:, 'Height']
▪ df.loc[:, ['Name', 'Age']]
▪ df[['Name', 'Age']]
▪ df.loc[2]
▪ df.loc[1:2]
▪ df.loc[[1, 2]]
▪ df.loc[[0, 1], ['Name', 'Weight']]
▪ df.iloc[:, 2]
▪ df.iloc[2]
▪ df.iloc[2, 4]

• Compare these two pieces of code (using the print and sum functions):

▪ h1 = df[['Height']]
  print(h1)
  print(sum(h1))
▪ h2 = df['Height']
  print(h2)
  print(sum(h2))