MBAS901
Essential Elements for Business Analytics
Lecture: Foundations of Python for Data Analytics
Algorithm
•An algorithm is a step-by-step solution to a given problem.
•An algorithm can be expressed in plain spoken language, as a series of calculations, or as a chart such as a flowchart.
•An algorithm must:
  •Have clear and specific instructions.
  •Not miss any steps.
  •Give the steps in the correct order.
Algorithm: example
•Make a cup of black tea
1. Get an empty tea cup
2. Determine the type of tea
3. Put the selected tea bag in the cup
4. Determine if sugar is needed
5. Add the amount of sugar needed
6. Add water to a kettle
7. Turn on heat under the kettle
8. Bring water to a boil
9. Add boiling water into the cup
10. Stir the cup with the tea bag
11. Remove tea bag after a minute
12. Serve the tea
13. Done
Algorithm in computer terms
•An algorithm is a step-by-step process producing a solution for a
problem which can be translated into computer code.
•There are four main components of an algorithm: data
acquisition, sequence, selection and iteration.
•Writing an algorithm to solve a simple problem may not require
all these components.
•An algorithm is usually written in plain (simple) human language,
and then translated to a high-level programming language such as
Python.
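As a minimal sketch, the four components can be seen together in a short Python program. The task (counting the even numbers in a list) and the names numbers and even_count are illustrative assumptions, not part of the lecture.

```python
# Data acquisition: obtain the data to work on (hard-coded here for illustration)
numbers = [3, 8, 5, 12, 7]

# Sequence: statements run one after another, top to bottom
even_count = 0

# Iteration: repeat the same steps for every item in the list
for n in numbers:
    # Selection: choose what to do based on a condition
    if n % 2 == 0:
        even_count = even_count + 1

print("Even numbers found:", even_count)
```

A simpler problem might need only some of these components, for example sequence alone.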
Why Python Programming Language?
•Python can connect to a wide range of data sources and integrate with many applications, including machine learning, artificial intelligence and motion graphics.
•NumPy: package for scientific computing in Python.
•SciPy (pronounced “Sigh Pie”): library for mathematics, science, and engineering.
•pandas: high-performance, easy-to-use data structures and data analysis tools.
•Matplotlib: a Python 2D plotting library.
Getting Started with Python
• [Link]
• Log in to your Google account
• Open a Python Notebook (.ipynb)
• Write the first Python Program.
• print("I love Business Analytics")
• The above line will be executed
• It will display “I love Business Analytics” on the screen
• ‘Execute’ = processed by the computer (not hanged!)
• Every line you write in the notebook is executed!
• If you do not want a line to be executed, write it as a comment (using #)
# this is some text for fun
• Save your notebook (as .ipynb file)
Data Types
• In our daily life we use numbers and text to communicate
and perform certain processes.
• The primitive data types are numbers and text.
• Objects such as images and files are also part of data
types.
Data Types
Numeric types
• int
  decimal: 26
  binary: 0b11010
  octal: 0o32
  hex: 0x1A
  long (not available in Python 3): 26L
• float
  Examples: 10.5, 0.105e2
• complex
  Example: 1 + 3.14j (1 is the real part, 3.14 is the imaginary part)
Non-numeric types
• string
  ' hello world '
  " hello world "
  ''' use this for
  multi line string '''
• bool
  True
  False
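The literals listed above can be checked with the built-in type() function. This is a small sketch; the variable names same_float and z are chosen here for illustration only.

```python
# Integer literals in different bases all denote the same value
print(0b11010, 0o32, 0x1A)      # binary, octal and hexadecimal forms of 26

# type() reports the data type of a value
print(type(26))                 # int
print(type(10.5))               # float
print(type(1 + 3.14j))          # complex
print(type("hello world"))      # str
print(type(True))               # bool

# Scientific notation: 0.105e2 means 0.105 * 10**2
same_float = (0.105e2 == 10.5)
print(same_float)

# A complex number exposes its real and imaginary parts
z = 1 + 3.14j
print(z.real, z.imag)
```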
Variables
A variable stores a value of a particular data type.
The variable fname is used to store the first name of a person.
The variable fname is of type string.
The variable age is used to store the age of a person.
The variable age is of type int (or float if the age has a fractional part).
fname = "BoB"
age = 35
print(fname)
# Displays BoB
print(age)
# Displays 35
print("fname")
# Displays fname, because the quotes make it text rather than a variable name
print("age")
# Displays age
Getting User Input
•Python uses the input() function to get input from the user
•All input is stored as text (string data type)
capital = input("What is the capital of UAE? ")
print(capital)
•If required, the input may be converted to a number (int or float data type)
emirates = input("How many emirates are there in UAE? ")
emirates = int(emirates)
print(emirates)
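The sketch below shows why the conversion matters. Because input() waits for the keyboard, fixed strings stand in for the typed answers here; the names emirates_text and price_text and their values are assumptions made for illustration.

```python
# Stand-ins for what input() would return: always text (str)
emirates_text = "7"
price_text = "19.5"

# Adding two strings joins the text, it does not add numbers
joined = emirates_text + emirates_text
print(joined)

# Convert before doing arithmetic
emirates = int(emirates_text)    # whole number
price = float(price_text)        # number with a fractional part
print(emirates + 1)
print(price * 2)
```

Note that int() raises an error if the text is not a whole number, for example int("19.5").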
Arithmetic Operators
Operator | Name | Description | Example
+ | Addition | Adds the values on either side of the operator. | Z = a + b
- | Subtraction | Subtracts the right-hand operand from the left-hand operand. | Z = a - b
* | Multiplication | Multiplies the values on either side of the operator. | Z = a * b
/ | Division | Divides the left-hand operand by the right-hand operand. | Z = b / a
% | Modulus | Divides the left-hand operand by the right-hand operand and returns the remainder. | Z = b % a
** | Exponent | Raises the left-hand operand to the power of the right-hand operand (a to the power b). | Z = a ** b
// | Floor division | Returns the quotient rounded down to the nearest whole number. | Z = a // b
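A short sketch of each operator in use. The values a = 3 and b = 10 are arbitrary choices for illustration.

```python
a = 3
b = 10

print(a + b)    # addition
print(a - b)    # subtraction
print(a * b)    # multiplication
print(b / a)    # division always gives a float
print(b % a)    # remainder of b divided by a
print(a ** b)   # a to the power b
print(b // a)   # floor division: quotient rounded down
print(-b // a)  # rounding is downwards, so this gives -4 rather than -3
```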
Exercise: Operators and Expressions.
Convert the following mathematical expressions to Python expressions.
[Expressions 1 to 4: shown as images on the slide]
Selection or Decision
Selections: if statement
• In life there are moments when we need to decide on something using conditions.
• Example: “If it is raining outside, then I carry an umbrella.”
• Another example: you would like to buy an item and you find two similar products. You then set some conditions, such as a price limit or the experts' rating of the product.
Selection or Decision
Selections: if statement (Cont)
In programming, such logic is written using an if statement.
If the condition in the if statement evaluates to True, one block of code is executed; otherwise, a different block is executed.
The Python programming language provides the following types of decision-making statements:
• if...else
• nested if
• if...elif...else (Python's form of else if)
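The forms listed above can be sketched as follows. The functions describe_mark and can_buy, and all thresholds in them, are hypothetical examples rather than material from the lecture.

```python
def describe_mark(mark):
    """Return a label for a mark out of 100 (thresholds are made up)."""
    # if...elif...else: conditions are tested in order, the first True one wins
    if mark >= 85:
        label = "high"
    elif mark >= 50:
        label = "pass"
    else:
        label = "fail"
    return label


def can_buy(price, rating):
    """Nested if: the inner condition is checked only if the outer one holds."""
    if price <= 100:
        if rating >= 4:
            return True
        else:
            return False
    else:
        return False


print(describe_mark(60))
print(can_buy(80, 5))
```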
Selections: if statement
Structure of if statement in Python:
Rules:
• A lower case if keyword must be used
• The if statement must end with a colon :
• The keyword else must be lower case and start at the same indentation level as the keyword if
• The keyword else must end with a colon :
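A minimal sketch of the structure these rules describe, using the umbrella example from earlier. The variable names is_raining and action are assumptions made for illustration.

```python
is_raining = True   # hypothetical condition value

if is_raining:                      # lower-case if, the condition, then a colon
    action = "carry an umbrella"    # indented block: runs when the condition is True
else:                               # else lines up with if and ends with a colon
    action = "leave the umbrella"   # indented block: runs when the condition is False

print(action)
```

The indentation is what tells Python which lines belong to the if block and which to the else block.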
Condition with non-numeric data
capital = input("What is the capital of UAE? ")   # Input
if capital == "Abu Dhabi":                        # Condition
    print("Yes, you are correct")                 # When True
else:                                             # Otherwise
    print("No, it is Abu Dhabi")                  # When False
Note: String comparison is case sensitive.
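Because the comparison is case sensitive, an answer such as "abu dhabi" would be marked wrong. One common way around this, shown here as a sketch and not as part of the original example, is to normalise the text before comparing. The function name is_correct_capital is hypothetical.

```python
def is_correct_capital(answer):
    """Compare while ignoring letter case and surrounding spaces."""
    return answer.strip().lower() == "abu dhabi"


print("abu dhabi" == "Abu Dhabi")           # direct comparison: case differs
print(is_correct_capital("  ABU dhabi "))   # comparison after normalising
```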
Python Comparison Operators
Operator | Name | Description | Examples
== | Equal | True if the values of the two operands are equal, otherwise False. | 2 == 2 [True]; 3 == 2 [False]; "sum" == "sum" [True]; "ABC" == "Abc" [False]
!= | Not equal | True if the values of the two operands are not equal, otherwise False. The older form <> works only in Python 2. | 2 != 2 [False]; 3 != 2 [True]; "sum" != "sum" [False]; "ABC" != "Abc" [True]
> | Greater than | True if the value of the left operand is greater than the value of the right operand. | 3 > 2 [True]; 3 > 3 [False]
>= | Greater than or equal | True if the value of the left operand is greater than or equal to the value of the right operand. | 3 >= 2 [True]; 3 >= 3 [True]; 3 >= 4 [False]
< | Less than | True if the value of the left operand is less than the value of the right operand. | 2 < 3 [True]; 3 < 3 [False]
<= | Less than or equal | True if the value of the left operand is less than or equal to the value of the right operand. | 3 <= 4 [True]; 3 <= 3 [True]; 4 <= 3 [False]
Logical Operators
When you have more than one condition in the same if statement (a compound condition), you need to use a logical operator. Logical operators let you require that both conditions are met, or that at least one of them is.
• If both conditions must be True, use and.
• If at least one of the conditions must be True, use or.
Operator | Description | Example
and | True only if both operands are True. | 3 > 7 and 2 < 3 [False]
or | True if at least one of the two operands is True. | 7 > 7 or 2 < 3 [True]
Example
A child is eligible to enter a ride if the child's age is between 4 and 10. Write a Python program to read the child's age, then decide and display whether the child is eligible for the ride.
age = input("What is your age? ")                 # Input
age = int(age)
if age > 3 and age < 11:                          # Condition
    print("Yes, you are eligible to ride")        # True
else:                                             # Otherwise
    print("Sorry, you are not eligible to ride")  # False
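The eligibility rule can also be wrapped in a function so that it can be tried with several ages without typing input. This is a sketch; the function names are hypothetical, and the chained comparison is an alternative Python spelling of the same rule rather than the form used in the lecture.

```python
def is_eligible(age):
    """True when age is between 4 and 10, using the and operator."""
    return age > 3 and age < 11


def is_eligible_chained(age):
    """The same rule written as a chained comparison."""
    return 4 <= age <= 10


for age in [3, 4, 10, 11]:
    print(age, is_eligible(age))
```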
Iterations: The for loop
The for loop repeats a block of code a set number of times. In this example, the block is repeated four times.
for i in range(4):
• for: the loop keyword
• i: the counter
• 4: the number of times to repeat
Version 1: For loop with only end value
for i in range(endValue):
    Statements
The starting value of the loop is 0. For example, range(4) generates the four numbers 0, 1, 2 and 3, and the counter takes on each value one at a time.
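A small example of this form. The list visited is an addition made here only so that the counter values can be inspected after the loop.

```python
visited = []
for i in range(4):
    print("Repeat number", i)
    visited.append(i)   # keep each counter value for inspection afterwards

print(visited)
```

The loop body runs four times, with i taking the values 0, 1, 2 and 3.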
Version 2: For loop with start and end value
for i in range(startValue, endValue):
    Statements
The starting value of the loop can be set to any given number. The loop stops before the end value, so the end value itself is not used.
Note: The start value must be less than the end value, otherwise the loop will not be executed.
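A small example of this form, with start and end values chosen arbitrarily. Converting a range to a list, as done below, is just a convenient way to see every value it generates.

```python
for i in range(2, 6):
    print(i)                 # the end value 6 is not included

values = list(range(2, 6))
empty = list(range(6, 2))    # start is greater than end, so nothing is generated

print(values)
print(empty)
```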
Version 3: For loop with increment value
for i in range(startValue, endValue, stepValue):
    Statements
The starting value of the loop can be set to any given number. The step value can be changed from 1 to any other value except 0.
Note: The step value must be negative if the start value is greater than the end value.
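A small example of this form, counting up in steps of 3 and down in steps of 2. The particular start, end and step values are arbitrary.

```python
count_up = list(range(1, 10, 3))      # start 1, stop before 10, step 3
count_down = list(range(10, 0, -2))   # start 10, stop before 0, step -2

print(count_up)
print(count_down)

for i in range(10, 0, -2):
    print(i)
```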
What is Pandas?
• A Python library is a collection of program code that can be reused in different programs. It makes Python programming simpler and more convenient for the programmer.
• Pandas is an open source library providing high-performance,
easy-to-use data structures and data analysis tools for the
Python programming language.
• Pandas allows data users to work with high-level building blocks
for doing practical, real world data analysis in Python.
Importing Pandas to Python
Similar to importing the turtle library in unit 1, pandas must be imported before it can be used. The import line gives pandas the short name pd. From then on, you can use the name pd to perform pandas operations.
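A minimal sketch of the import, assuming the conventional alias pd. The small Series named scores is made up to show the pd prefix in use.

```python
import pandas as pd   # pd is the conventional short name for the pandas module

# Any pandas feature is now reached through the pd prefix
scores = pd.Series([4, 5, 3])
print(scores.mean())
```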
Data Files
• CSV data files are very common and are a safe format for working with data.
• These files have the extension .csv
• They can be opened, edited and saved in Microsoft Excel or Notepad
• In Python, we will work with data from CSV files
Accessing CSV file and getting familiar with the data set
You need to load the data from a Comma Separated Values (CSV) file into a Pandas DataFrame.
• One line imports pandas and creates the pd name.
• The file name must be given with the full extension.
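A sketch of loading CSV data with pandas' read_csv function. So that the example runs without a file on disk, the CSV text is held in a string and wrapped in io.StringIO; the survey data, the column names and the file name survey.csv mentioned in the comment are all hypothetical.

```python
import io
import pandas as pd

# Made-up survey data; io.StringIO lets this text behave like a file
csv_text = """Respondent,Question1,Question2,Question3,Question4
A,4,5,3,2
B,2,3,5,4
C,5,4,4,1
"""

# With a real file this would be, for example, pd.read_csv("survey.csv"),
# giving the file name with its full extension
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```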
Viewing Sample Data
You can view sample data from the top or bottom of the dataset.
Display the top 5 rows of the dataset.
[Link](5)
NOTE: In Python, counting starts with zero.
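A sketch of viewing rows from the top and bottom using the standard pandas head() and tail() methods. The DataFrame name df and its 12 rows of data are assumptions made for illustration.

```python
import pandas as pd

# Made-up data set with 12 rows; the default row index runs from 0 to 11
df = pd.DataFrame({"Question1": range(1, 13), "Question2": range(101, 113)})

top = df.head(5)      # first 5 rows: index 0 to 4
bottom = df.tail(3)   # last 3 rows: index 9 to 11

print(top)
print(bottom)
```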
Display Data in one or more columns
Example: Display data stored in column “Question1” only.
df["Question1"]
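A sketch covering both cases named in the heading: one column and more than one column. The data values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Question1": [4, 2, 5],
    "Question2": [5, 3, 4],
    "Question3": [3, 5, 4],
})

one_column = df["Question1"]                   # a single name gives a Series
two_columns = df[["Question1", "Question3"]]   # a list of names gives a DataFrame

print(one_column)
print(two_columns)
```

Note the double square brackets when selecting more than one column: the inner pair forms the list of names.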
Describing the data of a column
To display the summary of a column, including the number of records, minimum, maximum, mean, and standard deviation, use the function .describe(), as follows:
Dataset['ColumnName'].describe()
Example:
Display the summary of column “Question1”.
df['Question1'].describe()
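A sketch of describe() on a small made-up column, including how single figures can be read out of the summary by label.

```python
import pandas as pd

df = pd.DataFrame({"Question1": [4, 2, 5, 3, 1]})

summary = df["Question1"].describe()
print(summary)

# Individual figures can be read from the summary by their labels
print(summary["count"], summary["mean"], summary["min"], summary["max"])
```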
Working with loc in Pandas
The pandas loc indexer lets us select and slice data by both row index and column name. It is a powerful tool for focusing on the rows and columns that matter for our data analytics.
Working with loc in Pandas (Cont)
A loc expression is made up of the following parts, in order:
• Data frame name: the name of your data frame object, followed by .loc. In our example this is data2.
• Square brackets: please note the use of square brackets. Normal brackets will not work.
• First row: represents the first row in your targeted data. If you want data starting from row zero, then leave it empty.
• Colon: separates the start and end of the rows. It is a ‘must have’.
• Last row: represents the last row in your targeted data. If you want all data to the end of the set, then leave it empty.
• Comma: separates rows and columns. It is a ‘must have’.
• First column: here you specify the first column name. Please note that you should use the column name and not numbers.
• Colon: separates the start and end of the columns. It is a ‘must have’.
• Last column: here you specify the last column name. Please note that you should use the column name and not numbers.
Working with loc in Pandas (Cont)
Example:
Display rows 5 to 10 and only columns “Question1” and “Question2”.
[Link][5:10,"Question1":"Question2"]
Note that you need to use the index of the rows and the name of the columns.
In this example the row index is 5:10 and the columns are "Question1":"Question2".
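A sketch of this kind of slice using pandas' loc on a made-up DataFrame. The name df and the data values are assumptions made for illustration.

```python
import pandas as pd

# Made-up data set: 12 rows with the default index 0 to 11
df = pd.DataFrame({
    "Question1": range(12),
    "Question2": range(100, 112),
    "Question3": range(200, 212),
})

subset = df.loc[5:10, "Question1":"Question2"]
print(subset)
```

Unlike range(), a loc slice includes both ends, so rows 5, 6, 7, 8, 9 and 10 are all returned.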
Working with loc in Pandas (Cont)
You can display columns that are not in sequence. For example, you can display Question1 and Question4 without the columns between them.
To display selected columns or rows, you need to list them inside square brackets [ ].
Example:
Display rows 3, 8, and 20 and columns “Question1” and “Question4”.
[Link][[3,8,20],["Question1","Question4"]]
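A sketch of selecting rows and columns that are not next to each other, again using pandas' loc on a made-up DataFrame named df.

```python
import pandas as pd

# Made-up data set with 25 rows so that row 20 exists
df = pd.DataFrame({
    "Question1": range(25),
    "Question2": range(100, 125),
    "Question3": range(200, 225),
    "Question4": range(300, 325),
})

# One list for the row labels, one list for the column names
picked = df.loc[[3, 8, 20], ["Question1", "Question4"]]
print(picked)
```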
Sorting data
Sorting data is a simple technique that displays data in ascending or descending order based on one or more columns. The function that you need is .sort_values()
Syntax
df.sort_values("Question1")
By default, the data will be sorted in ascending order.
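A sketch of sorting in both directions on made-up data. The ascending=False argument is the standard pandas way to request descending order.

```python
import pandas as pd

df = pd.DataFrame({"Question1": [4, 2, 5], "Question2": [1, 3, 2]})

ascending = df.sort_values("Question1")                     # smallest first (default)
descending = df.sort_values("Question1", ascending=False)   # largest first

print(ascending)
print(descending)
# sort_values returns a new DataFrame; df itself keeps its original order
```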
Writing data to external file
Example:
Write the data you cleaned in the previous example to an external file.
In this example, the DataFrame data is stored in the Excel file ‘[Link]’, in a sheet with the name ‘Sheet1’.
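A sketch of writing a DataFrame to an external file. To keep the example runnable without extra packages it writes a CSV file with to_csv instead of an Excel file; the data, the temporary folder and the file name cleaned.csv are all hypothetical.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"Question1": [4, 2, 5], "Question2": [1, 3, 2]})

# Hypothetical output location inside a temporary folder
out_path = os.path.join(tempfile.mkdtemp(), "cleaned.csv")

# index=False leaves the row numbers out of the file
df.to_csv(out_path, index=False)

# Read the file back to confirm what was written
check = pd.read_csv(out_path)
print(check)

# The Excel counterpart is df.to_excel(path, sheet_name="Sheet1", index=False),
# which needs an Excel writer package such as openpyxl to be installed.
```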
Summary of Pandas Commands
Commands highlighted in yellow are covered in this course
Reading or Importing Data
pd.read_csv(filename) | From a CSV file
pd.read_table(filename) | From a delimited text file (like TSV)
pd.read_excel(filename) | From an Excel file
pd.read_html(URL) | From HTML page
Viewing/Inspecting Data
[Link](n) | First n rows of the DataFrame
[Link](n) | Last n rows of the DataFrame
[Link]() | Number of rows and columns
[Link]() | Index, Datatype and Memory information
[Link]() | Summary statistics for numerical columns
Selection
df[col] | Returns column with label col as Series
df[[col1, col2]] | Returns columns as a new DataFrame
[Link][0,:] | First row
[Link][0,0] | First element of first column
Data Cleaning
[Link] = ['a','b','c'] | Rename columns
[Link]() | Drop all rows that contain null values
[Link](x) | Replace all null values with x
[Link](columns={'old_name': 'new_name'}) | Selective renaming
df.set_index('column_one') | Change the index
Filter, Sort, and Groupby
df[df[col] > 0.5] | Rows where the column col is greater than 0.5
df[(df[col] > 0.5) & (df[col] < 0.7)] | Rows where 0.7 > col > 0.5
df.sort_values(col2,ascending=False) | Sort values by col2 in descending order
[Link](col) | Returns a groupby object for values from one column
[Link]([col1,col2]) | Returns groupby object for values from multiple columns
Statistics
[Link]() | Summary statistics for numerical columns
[Link]() | Returns the mean of all columns
[Link]() | Returns the correlation between columns in a DataFrame
[Link]() | Returns the number of non-null values in each DataFrame column
[Link]() | Returns the highest value in each column
[Link]() | Returns the lowest value in each column
[Link]() | Returns the median of each column
[Link]() | Returns the standard deviation of each column
Exporting/Writing Data
df.to_csv(filename) | Write to a CSV file
df.to_excel(filename) | Write to an Excel file
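A sketch of the filtering and grouping entries, which are not demonstrated elsewhere in these notes. The DataFrame, its column names Group and Score, and the values are made up; groupby, mean and shape are standard pandas features.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "B", "A", "B"],
    "Score": [0.4, 0.6, 0.9, 0.65],
})

high = df[df["Score"] > 0.5]                           # rows with Score above 0.5
band = df[(df["Score"] > 0.5) & (df["Score"] < 0.7)]   # rows with 0.5 < Score < 0.7
means = df.groupby("Group")["Score"].mean()            # mean Score for each group

print(high)
print(band)
print(means)
print(df.shape)   # (rows, columns); shape is an attribute, so it takes no parentheses
```

Each condition in a combined filter needs its own parentheses, and & is used in place of the word and.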
Questions
•Python Tutorial
•[Link]
•Python Software Online
•[Link]