
Selected Topics in Computer Science (CoSc4181)

Lecture 01: Data Science through Python Programming Language

Department of Computer Science


Dilla University

By: Tsegalem G/hiwot


2022 G.C

Part I: Contents of Python programming
• Introduction to Python

• Why Python in Data Science

• Python programming basic structure

• Python Variables, data types

• Python Operators

• Python Conditional Statements

• Python For & While Loops

• Function

• Modules

Introduction to Python
➢ Python is a high-level, interpreted, interactive, and object-oriented scripting language.

➢ Python is designed to be highly readable.

➢ It frequently uses English keywords where other languages use punctuation, and it has fewer syntactical constructions than other languages.
Why Python in Data Science
✓ Very simple and readable
✓ Powerful libraries
✓ Free and open source
✓ Large, active community
✓ An Integrated Development Environment (IDE) brings the programmer's tools together in one convenient place.
Examples:
o Integrated Development and Learning Environment (IDLE)
o PyCharm
o Thonny
Python Hello World — Create Your First Python Program

print("Hello World")
Note:
✓ Python Identifiers: A Python identifier is a name used to identify a variable, function, class, module, or other object.

✓ Reserved Words: Keywords such as if, for, and class are reserved, and you cannot use them as constant, variable, or any other identifier names.

✓ Lines and Indentation: In Python, blocks of code for class definitions, function definitions, and flow control are denoted by line indentation, which is rigidly enforced.

✓ Quotation in Python: Python accepts single ('), double ("), and triple (''' or """) quotes to denote string literals, as long as the same type of quote starts and ends the string.

✓ Comments in Python: Comments can be used to explain the code, to improve readability of the code, and to prevent execution when testing code.

• Single-line comment (#)

• Multi-line comment: Python has no dedicated multi-line comment syntax; a triple-quoted string (""" ... """) that is not assigned to anything is commonly used instead.
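The notes above can be tried out with a short sketch. The variable names (student_name, course, note) are made up for illustration; keyword.iskeyword() and str.isidentifier() come from the standard library.

```python
import keyword

# Identifiers: letters, digits and underscores, not starting with a digit.
student_name = 'Abebe'          # single quotes
course = "CoSc4181"             # double quotes
note = """This string
spans two lines."""             # triple quotes may span lines

"""
A bare triple-quoted string like this one is evaluated and discarded,
so it is often used as a multi-line comment.
"""

# Reserved words cannot be used as identifiers.
print(keyword.iskeyword("for"))           # True
print(keyword.iskeyword("student_name"))  # False
print("student_name".isidentifier())      # True
print("2nd_student".isidentifier())       # False: starts with a digit

# Indentation defines the block that belongs to the if statement.
if len(note.splitlines()) == 2:
    print("note has two lines")
```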
Python Variables, data types
➢ Variables hold data values so that they can be used in various computations in a program.

➢ Variables can store data of different types, and different types can do different things.
• Text Type: str
• Numeric Types: int, float, complex
• Sequence Types: list, tuple, range
• Mapping Type: dict
• Set Types: set, frozenset
• Boolean Type: bool
– counter=100 # An integer assignment
– miles=1000.0 # A floating point
– name="John" # A string
– print(counter)
– print(miles)
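As a small sketch of how a variable's type follows its value, the built-in type() can be used to inspect each variable. The variable is_enrolled is an extra, hypothetical example.

```python
counter = 100       # int
miles = 1000.0      # float
name = "John"       # str
is_enrolled = True  # bool (hypothetical extra variable)

for value in (counter, miles, name, is_enrolled):
    print(value, type(value).__name__)

# A variable can be rebound to a value of another type.
counter = "one hundred"
print(type(counter).__name__)   # str
```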
Collecting User Input

• text=input("Enter your input: ")

• print("Received input is : ", text)
Collecting User Input
o Example 1:
message = input("Tell me something, and I will repeat it back to you: ")
print(message)

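Example 2:
age = input("How old are you? ")
print("age")   # prints the literal text age
print(age)     # prints the value typed by the user (a string)

Example 3:
age = int(input("How old are you? "))
print("age")
print(age)     # prints the value as an integer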
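Because input() always returns a string, Example 3 fails with a ValueError if the user types something that is not a number. One possible way to guard against that is sketched below; the helper parse_age is hypothetical and takes the typed text as an argument so that it can be tried without a keyboard.

```python
def parse_age(text):
    """Convert text typed by a user into an int, or return None if invalid."""
    try:
        return int(text.strip())
    except ValueError:
        return None

print(parse_age("21"))       # 21
print(parse_age(" 34 "))     # 34
print(parse_age("twenty"))   # None

# Arithmetic needs the converted value, not the original string.
age = parse_age("21")
if age is not None:
    print("Next year you will be", age + 1)
```

In an interactive program the call would look like parse_age(input("How old are you? ")).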
Operators in Python
➢ Operators in Python are used to perform operations on variables and values.

➢ They are constructs or special symbols that manipulate the values of operands.

➢ The values used in an operation are called operands.

Operand  Operator  Operand  →  Result (for example, 2 + 3 → 5)
Types of operator in Python
1. Arithmetic Operators

Used to perform arithmetic operations on numeric values.

Addition (+)

Subtraction (-)

Multiplication (*)

Division (/)

Modulus (%)

Exponentiation (**)
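A quick sketch of the arithmetic operators, using two arbitrary sample values:

```python
a, b = 17, 5

print(a + b)    # 22
print(a - b)    # 12
print(a * b)    # 85
print(a / b)    # 3.4  (true division always returns a float)
print(a % b)    # 2    (remainder of 17 divided by 5)
print(a ** 2)   # 289
```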
Cont…
2. Assignment Operators

Used to assign the value of the right operand to the left operand.

Operator    Example    Equivalent to
=           x=10       x=10
+=          x+=10      x=x+10
-=          x-=10      x=x-10
*=          x*=10      x=x*10
/=          x/=10      x=x/10
**=         x**=10     x=x**10
%=          x%=10      x=x%10
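The compound assignment operators can be traced step by step. The values below are arbitrary and were chosen so that every intermediate result is exact.

```python
x = 10
x += 10     # x is now 20
x -= 5      # x is now 15
x *= 2      # x is now 30
x /= 4      # x is now 7.5 (division produces a float)
x **= 2     # x is now 56.25
x %= 10     # x is now 6.25
print(x)    # 6.25
```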
Cont…
3. Comparison Operators

Used to compare two values. Each comparison evaluates to True or False.

1. Equal (==): True if the operands are equal, otherwise False.

Example: a=3, b=4; print(a==b) prints False

2. Not equal (!=): True if the operands are not equal, otherwise False.

3. Greater than (>)

4. Greater than or equal to (>=)

5. Less than (<)

6. Less than or equal to (<=)
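All six comparison operators applied to the sample values from the example above:

```python
a, b = 3, 4

print(a == b)   # False
print(a != b)   # True
print(a > b)    # False
print(a >= b)   # False
print(a < b)    # True
print(a <= b)   # True

# Comparisons can also be chained.
print(1 < a < b)   # True
```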


Cont…
4. Logical Operators
Used to combine conditional statements.

1. Logical and: returns True if both conditions are true, otherwise False.

E.g. a=3, b=4, c=5, d=6

print((a<b) and (c<d)) prints True.

2. Logical or: returns True if at least one condition is true; returns False only if both conditions are false.

E.g. a=3, b=4, c=5, d=6

print((a>b) or (c>d)) prints False.

3. Logical not: returns the negation. E.g. a = 4>5; print(not a) prints True.
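The logical operators can be checked with the same sample values, followed by a small truth table:

```python
a, b, c, d = 3, 4, 5, 6

print((a < b) and (c < d))   # True: both comparisons hold
print((a > b) or (c > d))    # False: neither comparison holds
print((a > b) or (c < d))    # True: one comparison holds
print(not (a > b))           # True: negates False

# Truth table for and / or
for p in (False, True):
    for q in (False, True):
        print(p, q, p and q, p or q)
```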


Cont…
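5. Identity Operators

Identity operators check whether two names refer to the same object, not merely whether their values are equal. To compare values, use == instead.

Everything in Python is an object (numbers, strings, lists, functions, and so on).

1. is: returns True if both variables refer to the same object.

Example: x=15, y=15; print(x is y) prints True (CPython caches small integers, so both names refer to the same object)

x=15, y=5; print(x is y) prints False

2. is not: returns True if the variables do not refer to the same object.

Example: x=15, y=15; print(x is not y) prints False

x=15, y=5; print(x is not y) prints True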
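The difference between identity (is) and equality (==) is easier to see with lists than with small integers. The names first, alias, and duplicate are illustrative.

```python
first = [1, 2, 3]
alias = first               # second name for the same list object
duplicate = list(first)     # new list with equal contents

print(first == duplicate)       # True: same values
print(first is duplicate)       # False: different objects
print(first is alias)           # True: same object
print(first is not duplicate)   # True

alias.append(4)
print(first)       # [1, 2, 3, 4]: the change is visible through both names
print(duplicate)   # [1, 2, 3]
```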
Cont…
6. Membership Operators

Used to check whether a value is present in a sequence or other container (list, tuple, string, set, dictionary).

1. in: returns True if the specified value is present in the object.

Example: list1=[1,2,3,5,6]

list2=[1,2,3]

print(2 in list1) prints True, because 2 is an element of list1.

print(list2 in list1) prints False, because in tests whether list2 itself is an element of list1, not whether every element of list2 is present.

2. not in: returns True if the specified value is not present in the object. Based on the above example,
print(list1 not in list2) prints True.
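A short sketch of membership tests, including two possible ways to ask whether every element of one list occurs in another (a question that in alone does not answer):

```python
list1 = [1, 2, 3, 5, 6]
list2 = [1, 2, 3]

print(2 in list1)          # True
print(4 not in list1)      # True
print(list2 in list1)      # False: list1 has no element equal to [1, 2, 3]

# To ask whether every element of list2 occurs in list1:
print(all(item in list1 for item in list2))   # True
print(set(list2).issubset(list1))             # True

# For strings, in tests for a substring.
print("nan" in "banana")   # True
```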
Cont…
7. Bitwise Operators: used to operate on the binary (bit-level) representation of integers.

1. Bitwise AND (&): sets each bit to 1 if both bits are 1. E.g. 1010 (10) & 1000 (8) = 1000 (8)

2. Bitwise OR (|): sets each bit to 1 if at least one of the bits is 1. E.g. 1010 (10) | 1000 (8) = 1010 (10)

3. Bitwise XOR (^): compares two input bits and generates 1 if the bits are different, and 0 if the bits are the same.
x    y    x^y
0    0    0
0    1    1
1    0    1
1    1    0
Cont…
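4. Bitwise NOT (~): inverts all bits (changes bits from 0 to 1 and vice versa). For a Python integer x, ~x equals -(x+1).

5. Left shift (<<): shifts the bits left by pushing in zeros from the right. Python integers have no fixed width, so no bits fall off the left end; each shift by one position doubles the value.

Example: 10 in binary = 1010 and 10<<2 = 101000 (40). In a fixed 4-bit register the leftmost bits would fall off, leaving 1000 (8).

6. Right shift (>>): shifts the bits right by pushing in zeros from the left (for non-negative integers) and lets the rightmost bits fall off.

Example: 10 in binary = 1010 and 10>>2 = 0010 (2)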
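The bitwise operators can be checked with the built-in bin(), which shows the binary form of an integer. The 4-bit mask at the end is an illustrative assumption that mimics a fixed-width register.

```python
a, b = 10, 8            # 0b1010 and 0b1000

print(bin(a & b), a & b)    # 0b1000 8
print(bin(a | b), a | b)    # 0b1010 10
print(bin(a ^ b), a ^ b)    # 0b10 2
print(~a)                   # -11, because ~x == -(x + 1)
print(bin(a << 2), a << 2)  # 0b101000 40
print(bin(a >> 2), a >> 2)  # 0b10 2

# Masking with 0b1111 keeps only the lowest four bits,
# which mimics a 4-bit register where high bits fall off.
print(bin((a << 2) & 0b1111))   # 0b1000
```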
Python Collections
➢ There are four collection data types in the Python programming language:

– List is a collection that is ordered and changeable. Allows duplicate members.

– Tuple is a collection that is ordered and unchangeable. Allows duplicate members.

– Set is a collection that is unordered and unindexed. No duplicate members.

– Dictionary is a collection of key-value pairs that is changeable and indexed by key. No duplicate keys. Since Python 3.7, dictionaries preserve insertion order.
List
➢ A list is a data structure used to store multiple items in one variable. It is created using square brackets.

➢ Example: mylist=["apple", "banana", "cherry", "orange"]

print(mylist)        # the whole list

print(mylist[1])     # banana

print(mylist[-1])    # orange (negative indexes count from the end)

print(mylist[1:3])   # ['banana', 'cherry']

➢ Some methods of lists are: append(), clear(), copy(), count(), extend(), index(), insert(), pop(), remove(), reverse(), sort(), etc.
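A few of the list methods named above, applied in sequence. The extra fruit names are arbitrary.

```python
mylist = ["apple", "banana", "cherry", "orange"]

mylist.append("mango")          # add at the end
mylist.insert(1, "avocado")     # add at index 1
mylist.remove("cherry")         # remove the first matching value
last = mylist.pop()             # remove and return the last item
print(last)                     # mango
print(mylist)                   # ['apple', 'avocado', 'banana', 'orange']

mylist.sort(reverse=True)       # sort in place, descending
print(mylist)                   # ['orange', 'banana', 'avocado', 'apple']
print(mylist.index("banana"))   # 1
print(mylist.count("apple"))    # 1
```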
Tuple
➢ A tuple is like a list, except that you cannot change its values once it is defined. It is created using parentheses.

➢ Access Tuple Items: You can access tuple items by referring to the index number, inside square brackets:

➢ Example: mytuple=("apple", "banana", "cherry", "orange")

print(mytuple)

print(mytuple[1])

print(mytuple[-1])

print(mytuple[1:3])

➢ Some built-in functions used with tuples are: len(tuple), max(tuple), min(tuple), tuple(seq), etc. Tuples also have the methods count() and index(). (The cmp() function existed only in Python 2.)
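The following sketch shows the built-in functions working on a tuple, and what happens when code tries to change one:

```python
mytuple = ("apple", "banana", "cherry", "orange")

print(len(mytuple))             # 4
print(max(mytuple))             # orange (strings compare alphabetically)
print(mytuple.index("cherry"))  # 2
print(tuple([1, 2, 3]))         # (1, 2, 3)

# Tuples are immutable: item assignment raises TypeError.
try:
    mytuple[0] = "mango"
except TypeError as error:
    print("cannot modify a tuple:", error)

# A new tuple can be built from an old one instead.
updated = ("mango",) + mytuple[1:]
print(updated)                  # ('mango', 'banana', 'cherry', 'orange')
```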
Dictionary
➢ A dictionary is a collection of key-value pairs that is changeable and indexed by key. Since Python 3.7, dictionaries preserve insertion order. In Python, dictionaries are written with curly brackets, and they have keys and values.

➢ Accessing Items: You can access the items of a dictionary by referring to its key name, inside square brackets:
mydictionary={1:"apple", 2:"banana", 3:"cherry", 4:"orange"}

print(mydictionary)

print(mydictionary[2])

➢ Some methods of dictionaries are: dict.clear(), dict.copy(), dict.fromkeys(), dict.get(key, default=None), dict.items(), dict.keys(), dict.setdefault(key, default=None), dict.update(dict2), dict.values(), etc. (dict.has_key(key) existed only in Python 2; in Python 3 use key in dict.)
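A brief sketch of reading, testing, and changing dictionary entries. The added and replaced values are arbitrary.

```python
mydictionary = {1: "apple", 2: "banana", 3: "cherry", 4: "orange"}

print(mydictionary[2])                 # banana
print(mydictionary.get(9, "unknown"))  # unknown (no KeyError for a missing key)
print(3 in mydictionary)               # True (membership tests the keys)

mydictionary[5] = "mango"              # add a new key-value pair
mydictionary.update({1: "avocado"})    # overwrite the value for key 1
del mydictionary[4]                    # remove a key

for key, value in mydictionary.items():
    print(key, value)
```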
Set
➢ A set is a collection that is unordered and unindexed. In Python, sets are written with curly brackets.

• Access Items: You cannot access items in a set by referring to an index; since sets are unordered, the items have no index.

• But you can loop through the set items using a for loop, or ask if a specified value is present in a set by using the in keyword.
myset={"apple", "banana", "cherry", "orange"}

print(myset)

➢ Some methods of sets are: add(), clear(), copy(), difference(), difference_update(), discard(), intersection(), intersection_update(), isdisjoint(), issubset(), issuperset(), pop(), remove(), symmetric_difference(), symmetric_difference_update(), union(), update(), etc.
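A sketch of common set operations. The second set, other, is made up for illustration, and results are passed through sorted() because a set has no guaranteed display order.

```python
myset = {"apple", "banana", "cherry", "orange"}
other = {"cherry", "mango"}

myset.add("apple")                    # duplicates are ignored
print(len(myset))                     # 4
print("banana" in myset)              # True

print(sorted(myset & other))          # ['cherry']  (intersection)
print(sorted(myset - other))          # ['apple', 'banana', 'orange']  (difference)
print(sorted(myset | other))          # union, sorted for a stable display order

for fruit in sorted(myset):           # sets have no order, so sort before printing
    print(fruit)
```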
Python Conditional Statements
➢ The order in which statements are executed is called flow control (or control flow).

➢ By default, a Python program executes sequentially, one statement after another.

➢ Flow control determines what is executed during a run and what is not, and therefore affects the overall outcome of the program.

➢ The control flow of a Python program is regulated by conditional statements and loops.
If statements
➢ An "if statement" is written by using the if keyword.

➢ Syntax for the if statement:

if expression1:
    statement(s)

Example 1:
a=2
b=3
if b>a:
    print("b is greater than a")

Example 2:
a=int(input("Enter any num"))
if a%2==0:
    print("a is Even")

• Note: Python relies on indentation (whitespace at the beginning of a line) to define scope in the code. Other programming languages often use curly brackets for this purpose.
If-else
➢ The if-else statement allows you to specify two alternative statements: one which is executed if a condition is satisfied, and one which is executed if the condition is not satisfied.

➢ Syntax for the if-else statement:

if expression1:
    statement(s1)
else:
    statement(s2)

Example 1:
a=2
b=3
if a>b:
    print("a is greater")
else:
    print("b is greater")

Example 2:
a=int(input("Enter any num"))
if a%2==0:
    print("a is Even")
else:
    print("a is Odd")
elif
➢ The elif keyword says "if the condition in expression1 is not satisfied, then try the condition in expression2".

➢ Syntax for the elif statement:

if expression1:
    statement(s1)
elif expression2:
    statement(s2)
elif expression3:
    statement(s3)
...
else:
    statement(sn)

Example 1:
a=3
b=3
if b > a:
    print("b is greater than a")
elif a==b:
    print("a and b are equal")
Cont…

➢ Example,
a=20
b=33
if b>a:
    print("b is greater than a")
elif a==b:
    print("a and b are equal")
else:
    print("a is greater than b")
Cont…
and
➢ The and keyword is a logical operator, and is used to combine conditional statements:

➢ Example: Test if a is greater than b, AND if c is greater than a:

a=200
b=33
c=500
if a > b and c > a:
    print("Both conditions are True")
Cont…
or
➢ The or keyword is a logical operator, and is used to combine conditional statements:

Example: Test if a is greater than b, OR if a is greater than c:

a=200
b=33
c=500
if a>b or a>c:
    print("At least one of the conditions is True")
Cont…
Nested If
➢ You can have if statements inside if statements; these are called nested if statements.
➢ Example
x=41
if x>10:
    print("Above ten,")
    if x>20:
        print("and also above 20!")
    else:
        print("but not above 20.")
Loops
 A loop statement allows us to execute a statement or group of
statements multiple times.

 Python has two primitive loop commands:

– while loops

– for loops
The while Loop
➢ With the while loop we can execute a set of statements as long as a condition is true.

➢ Syntax for the while loop statement:

while expression:
    statement(s)

• Example: Print i as long as i is less than 6:

i=1
while i<6:
    print(i)
    i+=1
The break Statement
• With the break statement we can stop the loop even if the while condition is true:

• Example: Exit the loop when i is 3:

i=1
while i<6:
    print(i)
    if i==3:
        break
    i+=1
The continue Statement
• With the continue statement we can stop the current iteration, and continue with the next:

• Example: Continue to the next iteration if i is 3:

i=0
while i<6:
    i+=1
    if i==3:
        continue
    print(i)
Python for loops
➢ A for loop is used for iterating over a sequence (that is, a list, a tuple, a dictionary, a set, or a string).
➢ Example: Print each fruit in a fruit list:
fruits=["apple", "banana", "cherry"]
for x in fruits:
    print(x)
Looping through a String
➢ Even strings are iterable objects; they contain a sequence of characters:
• Example: Loop through the letters in the word "banana":
for x in "banana":
    print(x)
break Statement in Python for loops
➢ With the break statement we can stop the loop before it has looped through all the items:

➢ Example: Exit the loop when x is "banana":

fruits=["apple", "banana", "cherry"]
for x in fruits:
    print(x)
    if x=="banana":
        break
Cont…
➢ Example: Exit the loop when x is "banana", but this time the break comes before the print:

fruits=["apple", "banana", "cherry"]
for x in fruits:
    if x=="banana":
        break
    print(x)
The continue Statement in Python for Loop
➢ With the continue statement we can stop the current iteration of the loop, and continue with the next:

➢ Example: Do not print banana:

fruits = ["apple", "banana", "cherry"]
for x in fruits:
    if x == "banana":
        continue
    print(x)
The range() Function
➢ To loop through a set of code a specified number of times, we can use the range() function.
➢ The range() function returns a sequence of numbers, starting from 0 by default, incrementing by 1 (by default), and ending before a specified number.
Example: Using the range() function:
for x in range(6):
    print(x)
Example: Using the start parameter:
for x in range(2, 6):
    print(x)
Example: Increment the sequence with 3:
for x in range(2, 30, 3):
    print(x)
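Wrapping range() in list() makes the generated numbers visible at a glance, including the fact that the stop value itself is never produced. The negative step and the fruits list are extra illustrations.

```python
print(list(range(6)))          # [0, 1, 2, 3, 4, 5]
print(list(range(2, 6)))       # [2, 3, 4, 5]
print(list(range(2, 30, 3)))   # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]

# range() pairs naturally with len() to loop over indexes.
fruits = ["apple", "banana", "cherry"]
for index in range(len(fruits)):
    print(index, fruits[index])
```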
Nested Loops
➢ A nested loop is a loop inside a loop.

➢ The "inner loop" will be executed one time for each iteration of the "outer loop":

➢ Example: Print each adjective for every fruit:

adj=["red", "big", "tasty"]
fruits=["apple", "banana", "cherry"]
for x in adj:
    for y in fruits:
        print(x, y)
Python Functions
➢ A function is a block of code which only runs when it is called.

➢ Functions make programs easier to write, read, test, and fix.

➢ You can pass data, known as parameters, into a function. A function can return a result.
Cont…
Creating a function
• In Python a function is defined using the def keyword:
Example:

def my_function():
    print("Hello from a function")

Calling a Function

• To call a function, use the function name followed by parentheses. Example

def my_function():
    print("Hello from a function")

my_function()
Cont…
Function Arguments

• Information can be passed into functions as arguments.

• Arguments are specified after the function name, inside the parentheses. You can add as many arguments as you want; just separate them with a comma.

Example

def my_function(fname):
    print(fname + " is CS program student")
my_function("Abebe")
my_function("Hana")
my_function("Azeb")
Cont…

Return values

• To let a function return a value, we use the return statement.

Example
def my_function(x):
    return 5*x

print(my_function(3))
print(my_function(5))
print(my_function(10))

Output
15
25
50
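Arguments and return values can be combined in one sketch. The function rectangle_area and its default value are hypothetical, chosen only to show positional, default, and keyword arguments.

```python
def rectangle_area(width, height=1):
    """Return the area of a rectangle; height defaults to 1."""
    return width * height

print(rectangle_area(3, 4))                # 12 (positional arguments)
print(rectangle_area(5))                   # 5  (height uses its default)
print(rectangle_area(height=2, width=6))   # 12 (keyword arguments)

# A returned value can be stored and reused, unlike printed output.
total = rectangle_area(2, 3) + rectangle_area(4, 5)
print(total)                               # 26
```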
Scope of Variables
➢ In Python, variables are containers for storing data values.

• The region of a program where a variable can be found and accessed is called the scope of the variable.

• The scope of a variable can be either local or global.

1. Global variables

• Variables that are defined outside any function and are not local to any function.

• They can be used in any part of the program.

Cont…
Example of a global variable

x=3

y=1

def function1():
    print(x+y)

function1()   # prints 4

• In the above program x and y are global variables because they are defined outside of the function named function1.
Cont…
2. Local variables

➢ A variable created inside a function belongs to the local scope of that function, and can only be used inside that function.
Example,
A variable created inside a function is available inside that function:
def myfunc():
    x = 300
    print(x)
myfunc()
Cont…
Naming Variables
If you use the same variable name inside and outside of a function, Python will treat them as two separate variables: one available in the global scope (outside the function) and one available in the local scope (inside the function).
Example
x=20
def fun():
    x=10
    print(x)
fun()
print(x)

Output
10
20
Python Modules
➢ A module is a file containing Python statements and definitions.

➢ A file containing Python code, for example myfile.py, is called a module, and its module name would be myfile.

➢ We use modules to break down large programs into small, manageable, and organized files.
Cont…
➢ To create a module, just save the code you want in a file with the file extension .py

➢ Example: Save this code in a file named mymodule.py

def func(name):
    print("Hello, " + name)
➢ Now we can use the module we just created, by using the import statement:
import mymodule
mymodule.func("Abebe")
➢ Import From Module: You can choose to import only parts from a module, by using the from keyword.
Cont…
➢ Example: The module named mymodule has one function and one dictionary:
def func(name):
    print("Hello, " + name)

person1 = {
    "name": "John",
    "age": 36,
    "country": "Norway"
}

➢ Example: Import only the person1 dictionary from the module:
from mymodule import person1
print(person1["age"])
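The same import forms work for modules that ship with Python. The sketch below uses standard-library modules as stand-ins for mymodule (which would require a file on disk), so it runs as-is.

```python
import math                      # import the whole module
from statistics import mean      # import one name from a module
import random as rnd             # import under a shorter alias

print(math.sqrt(16))             # 4.0
print(mean([2, 4, 6]))           # 4
rnd.seed(0)                      # fixed seed so the run is repeatable
roll = rnd.randint(1, 6)         # a number between 1 and 6 inclusive
print(1 <= roll <= 6)            # True
```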
Part II: Contents of Data Science
• Describe what data science is and the role of data scientists.

• Differentiate data and information.

• Describe data processing life cycle

• Understand different data types from diverse perspectives

• Describe data science life cycle in era of big data.

An Overview of Data Science
➢ Data science means the extraction of knowledge from large volumes of data that are structured or unstructured.

➢ Data science is the process of using data to find solutions or to predict outcomes for a problem statement.

➢ Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights.

➢ It is much more than simply analyzing data.
Cont.…
➢ Today, data professionals understand that they must go beyond the traditional skills of analyzing large amounts of data, data mining, and programming.

➢ A data scientist is a professional who processes and transforms raw data into useful insights to support better business decisions.

➢ Data scientists must master the full data science life cycle and keep a level of flexibility and understanding to maximize returns at each phase of the process.
Cont…

➢ Data scientists need to be result-oriented, with specialized knowledge and communication skills that allow them to explain highly practical results.

➢ They have strong, measurable experience in

• Statistics,

• Linear algebra,

• Programming,

• Data warehousing,

• Mining, and modeling to build and analyze algorithms.
Typical jobs for data scientists
➢ Collecting large amounts of data and transforming it into a more usable format.

➢ Solving business-related problems using data-focused techniques.

➢ Working with a variety of programming tools such as Weka, R, and Python.

➢ Understanding statistics, including statistical tests and distributions.

➢ Keeping on top of analytical techniques such as machine learning, deep learning, and text analytics.
Cont…
➢ Communicating and collaborating with both IT and business.

➢ Looking for order and patterns in data, as well as recognizing trends that can help a business's bottom line.
Who are data scientists?
➢ Computer science: 20%

➢ Statistics and mathematics: 19%

➢ Economics and social science: 19%

➢ Data science and analysis: 13%

➢ Natural science (biology, chemistry, physics): 11%

➢ Engineering: 9%

➢ Others: 9%
What are data and information?
➢ Data: can be defined as a representation of facts, concepts, or instructions in a formalized manner, suitable for communication, interpretation, or processing by humans or electronic machines.

➢ It can be described as unprocessed facts and figures.

➢ It is represented with the help of characters such as letters (A-Z, a-z), digits (0-9), or special characters (+, -, /, *, <, >, =, etc.).

For example: 12122013
Cont.…
➢ Information: is the processed data on which decisions and actions are based.

➢ It is data that has been processed into a form that is meaningful to the receiver and is of real or perceived value in the current or future actions or decisions of the receiver.

➢ Furthermore, information is interpreted data, created from organized, structured, and processed data in a particular context.

For example: 12/12/2013 (the digits 12122013 interpreted as a date)
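The step from data to information can be sketched in code: the same eight digits become meaningful once a context (here, a date layout) is applied. The day-month-year layout is an assumption; the slides do not say which order is intended.

```python
from datetime import datetime

raw = "12122013"                             # data: an uninterpreted string of digits
parsed = datetime.strptime(raw, "%d%m%Y")    # assumed layout: day, month, year

print(parsed.strftime("%d/%m/%Y"))   # 12/12/2013
print(parsed.strftime("%A"))         # weekday name of that date
print(parsed.year)                   # 2013
```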
Data Processing Cycle
➢ Data processing is the restructuring or reordering of data by people or machines to increase its usefulness and add value for a particular purpose.

➢ It consists of the following basic steps:

• Input,

• Processing, and

• Output.
➢ These three steps constitute the data processing cycle.
Cont.…
Input: in this step, data is prepared in some convenient form for processing.

➢ The form depends on the processing machine.

• For example, when electronic computers are used, the input data can be recorded on any one of several types of storage media, such as a hard disk, CD, flash disk, and so on.

Processing: in this step, the input data is changed to produce data in a more useful form.
• For example, interest can be calculated on a deposit to a bank, or a summary of sales for the month can be calculated from the sales orders.
Cont.…
Output: at this stage, the result of the processing step is collected.

➢ The particular form of the output data depends on the use of the data.

• For example, output data may be the payroll for employees.
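The three steps can be sketched with the sales-summary example mentioned above. The order records and the helper summarize are made up for illustration.

```python
# Input: sales orders prepared in a convenient form (made-up sample data).
orders = [
    {"item": "pen", "quantity": 10, "unit_price": 5.0},
    {"item": "notebook", "quantity": 4, "unit_price": 20.0},
    {"item": "pen", "quantity": 6, "unit_price": 5.0},
]

# Processing: turn the raw orders into a more useful form (totals per item).
def summarize(order_list):
    totals = {}
    for order in order_list:
        amount = order["quantity"] * order["unit_price"]
        totals[order["item"]] = totals.get(order["item"], 0.0) + amount
    return totals

summary = summarize(orders)

# Output: collect and present the result.
for item, amount in summary.items():
    print(f"{item}: {amount:.2f}")
print(f"total: {sum(summary.values()):.2f}")
```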
Data types and their representation
➢ Data types can be described from diverse perspectives in computer science and computer programming.

➢ A data type is simply an attribute of data that tells the compiler or interpreter how the programmer intends to use the data.
Data types from Computer programming perspective
➢ Integers (int): used to store whole numbers, mathematically known as integers. Example: 1 and 3645

➢ Booleans (bool): used to represent values restricted to one of two values. Example: true or false

➢ Characters (char): used to store a single character. Example: 'a', '7', etc.

➢ Floating-point numbers (float): used to store real numbers. Example: 1.354

➢ Alphanumeric strings (string): used to store a combination of characters and numbers. Example: "a234", etc.
Data types from Data Analytics perspective
➢ From a data analytics point of view, there are three common types of data or data structures: structured, semi-structured, and unstructured.
Structured Data
➢ Structured data is data that adheres to a pre-defined data model and is therefore straightforward to analyze.

➢ It conforms to a tabular format with a relationship between the different rows and columns.

• Examples of structured data are Excel files or SQL databases.

➢ Each of these has structured rows and columns that can be sorted.
Semi-structured Data
➢ Data between structured and unstructured.

➢ Data with no rigid structure.

➢ Also known as a self-describing structure.

• Examples of semi-structured data include email, JSON, and XML.
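A small sketch of what "self-describing" means in practice, using the standard json module. The JSON text is made up: the two records do not share a fixed schema, but each key names the value it carries.

```python
import json

# A made-up JSON document: records share no fixed schema, but the keys
# describe the data (hence "self-describing").
text = '''
[
  {"name": "Abebe", "email": "abebe@example.com"},
  {"name": "Hana", "phones": ["0911", "0922"], "city": "Dilla"}
]
'''

records = json.loads(text)
for record in records:
    # Each record may carry different fields, so read them defensively.
    print(record["name"], sorted(record.keys()), record.get("city", "unknown"))
```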
Unstructured Data
➢ It is information that either does not have a pre-defined data model or is not organized in a pre-defined manner.

➢ It is typically text-heavy but may contain data such as dates, numbers, and facts as well.

➢ This results in irregularities and ambiguities that make it difficult to understand using traditional programs, as compared to data stored in structured databases.

• Examples of unstructured data include audio files, video files, or NoSQL databases.
Metadata – Data about Data
➢ The last category of data type is metadata.

➢ From a technical point of view, this is not a separate data structure, but it is one of the most important elements for big data analysis and big data solutions.

➢ Metadata is data about data.

➢ It provides additional information about a specific set of data.

• For example, in a set of photographs, metadata could describe when and where the photos were taken.
Data science life cycle
1. Business Requirements:
➢ Understanding the problems of the business.

➢ Understanding general objectives.

➢ Understanding the variables that need to be predicted.

➢ Data scientists need to work with business people and with those who have expertise in understanding the data.
2. Data collection
➢ The process of gathering data before it is put in a data warehouse or storage on which data analysis can be carried out.

➢ Specifying what data is needed for the problem.

➢ Understanding data sources (like Facebook, LinkedIn, Google, etc.)

➢ Selecting an efficient way to store the data and access all of it.

➢ It is one of the major big data challenges in terms of infrastructure requirements, storage, and management.
3. Data cleaning
➢ Sometimes unnecessary data is collected, which only increases the complexity of the problem.

➢ Transforming data into the desired format.

➢ Integration is challenging.

➢ Data cleaning deals with:

• Missing values and duplicate values

• Corrupted data

• Unnecessary data, which is removed

• Misspelled values

• Inconsistent data
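The cleaning problems listed above can be illustrated with plain Python. This is only a sketch under stated assumptions: the records, the CITY_FIXES spelling table, and the helper clean are all made up, and real projects would typically use a library such as pandas.

```python
# Made-up raw records with the problems listed above.
raw_rows = [
    {"name": "Abebe", "city": "dilla", "age": "21"},
    {"name": "Abebe", "city": "dilla", "age": "21"},     # duplicate
    {"name": "Hana", "city": "Dila", "age": ""},          # misspelled city, missing age
    {"name": "Azeb", "city": "DILLA ", "age": "abc"},     # inconsistent case, corrupted age
]

CITY_FIXES = {"dila": "dilla"}   # hypothetical spelling corrections

def clean(rows):
    cleaned, seen = [], set()
    for row in rows:
        city = row["city"].strip().lower()          # make case and spacing consistent
        city = CITY_FIXES.get(city, city)           # correct known misspellings
        age = int(row["age"]) if row["age"].isdigit() else None   # None marks missing/corrupt
        record = (row["name"], city, age)
        if record not in seen:                      # drop exact duplicates
            seen.add(record)
            cleaned.append({"name": row["name"], "city": city, "age": age})
    return cleaned

cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```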
4. Data exploration and analysis
➢ Making raw data ready to use in decision-making as well as domain-specific usage.

➢ Understanding patterns in data.

➢ Retrieving useful insights and forming hypotheses.

➢ Data analysis involves exploring data with the goal of highlighting relevant data and extracting useful hidden information with high potential from a business point of view.

➢ Related areas include data mining and machine learning.
5. Data modeling
➢ Create a model that predicts the target most accurately.

➢ Evaluate and test the efficiency of the model.

➢ Identify the model that best fits the business requirement.

➢ Visualization (e.g. a histogram) can support this step.
6. Deployment and optimization
➢ In the last stage, the model is deployed to users, and feedback is collected and maintained.

➢ Deploy the model in a test environment.

➢ Monitor the performance.
Data science process life cycle

Needs of data science
➢ Solving business problems.

➢ Better decisions (A or B?).

➢ Predictive analysis (what will happen next?).

➢ Pattern discovery (is there any hidden information in the data?).

➢ Deep understanding of your customers.

➢ An additional layer of evidence for stakeholders.

➢ Optimizing your resource use.
Seminar on Python Libraries for Data Science

 Some Python Libraries for Data Science


• Pandas
• NumPy
• SciPy
• Matplotlib
• scikit-learn
• TensorFlow
• Keras
