SEM 5 PDS IMP by GTU Medium
PYTHON FOR DATA SCIENCE
Python doesn’t know the type of variable until we run the code. It automatically assigns the data
type during execution. The programmer doesn’t need to worry about declaring variables and their
data types.
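A minimal sketch of this dynamic typing, using a hypothetical variable named value that is rebound to objects of two different types:

```python
# The same name can be bound to values of different types at run time;
# no type declaration is needed.
value = 42
first_type = type(value).__name__    # 'int'

value = "forty-two"
second_type = type(value).__name__   # 'str'

print(first_type, second_type)
```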
But even if you do, the Python package manager (pip) makes it easy to install other great
packages from the Python Package Index (PyPI), which hosts over 200,000 packages.
7. Portability
In many languages like C/C++, you need to change your code to run the program on different
platforms. That is not the case with Python: you write the code once and run it anywhere.
Numbers: The number types store numeric values. Integer, float, and complex values belong to
Python's Numbers data type. Python provides the type() function to find the data type of a
variable. Similarly, the isinstance() function checks whether an object belongs to a particular class.
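As a short illustration of type() and isinstance(), here is a sketch with three hypothetical variables, one for each numeric type:

```python
a = 5          # integer
b = 40.5       # float
c = 1 + 3j     # complex

print(type(a))                 # <class 'int'>
print(type(b))                 # <class 'float'>
print(type(c))                 # <class 'complex'>
print(isinstance(c, complex))  # True
print(isinstance(a, float))    # False
```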
Sequence Type: The string can be defined as the sequence of characters represented in the
quotation marks. In Python, we can use single, double, or triple quotes to define a string.
String handling in Python is a straightforward task since Python provides built-in functions and
operators to perform operations in the string.
In string handling, the + operator concatenates two strings: the operation
"hello"+" python" returns "hello python".
The * operator is known as the repetition operator: the operation "Python" *2 returns
'PythonPython'.
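The quoting styles and the two string operators can be checked with a few lines; the variable names are only illustrative:

```python
single = 'hello'
double = "hello"
triple = """a string
that spans two lines"""

greeting = "hello" + " python"   # concatenation
repeated = "Python" * 2          # repetition, no space is inserted

print(single == double)   # True
print(greeting)           # hello python
print(repeated)           # PythonPython
```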
Python Lists are similar to arrays in C. However, the list can contain data of different types. The
items stored in the list are separated with a comma (,) and enclosed within square brackets [].
We can use the slice [:] operator to access the data of the list. The concatenation operator (+) and
repetition operator (*) work with lists in the same way as they work with strings.
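A small sketch of slicing, concatenation, and repetition on a list; the list contents are made up for illustration:

```python
mixed = [1, "hi", 2.5, "Python"]   # items of different types

middle = mixed[1:3]        # slice: items at index 1 and 2
joined = mixed + [7, 8]    # concatenation builds a new list
doubled = mixed[:2] * 2    # repetition

print(middle)    # ['hi', 2.5]
print(joined)    # [1, 'hi', 2.5, 'Python', 7, 8]
print(doubled)   # [1, 'hi', 1, 'hi']
```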
A tuple is similar to the list in many ways. Like lists, tuples also contain the collection of the items
of different data types. The items of the tuple are separated with a comma (,) and enclosed in
parentheses ().
A tuple is a read-only data structure as we can't modify the size and value of the items of a tuple.
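The read-only behaviour can be seen by trying to assign to a tuple item. This is a sketch with a hypothetical tuple named point:

```python
point = ("hi", "Python", 2)

print(point[0])     # hi
print(point[1:])    # ('Python', 2)

# Assigning to an item raises TypeError because tuples are immutable.
try:
    point[2] = 3
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```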
Boolean: The Boolean type provides two built-in values, True and False, which are used to
determine whether a given statement is true or false. It is denoted by the class bool. In a Boolean
context, any non-zero number or non-empty object counts as true, whereas 0, None, and empty
objects such as '' count as false. Note that the strings 'T' and 'F' are both non-empty, so both
count as true.
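The truth value of any object can be inspected with bool(). The sample values below are chosen only for illustration:

```python
samples = [1, -3.5, "F", 0, 0.0, "", [], None]

# bool() applies the same truth test that an if statement uses.
truthiness = [bool(s) for s in samples]
print(truthiness)
# [True, True, True, False, False, False, False, False]

print(type(True))    # <class 'bool'>
print(True == 1)     # True, because bool is a subclass of int
```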
Set: A Python set is an unordered collection of items. It is iterable, mutable (can be modified after
creation), and has unique elements. Because the order of the elements is undefined, iterating over
a set may return them in a different sequence from the one in which they were added. A set is
created with the built-in function set() or by listing comma-separated elements inside curly
braces. It can contain values of various (hashable) types.
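A brief sketch of both ways to create a set, with made-up values; sorted() is used only to print the items in a predictable order:

```python
from_call = set([3, 1, 3, 2, 1])    # duplicates are dropped
from_braces = {"a", 1, 2.5, "a"}    # mixed types, still unique

from_call.add(10)       # sets are mutable: items can be added
from_call.discard(1)    # and removed

print(sorted(from_call))   # [2, 3, 10]
print(len(from_braces))    # 3

empty = set()   # note: {} would create an empty dictionary, not a set
```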
Dictionary: A dictionary is a collection of key-value pairs (since Python 3.7 it preserves insertion
order). It is like an associative array or a hash table where each key maps to a specific value. A key
can be any hashable (immutable) object, such as a number, string, or tuple, whereas a value is an
arbitrary Python object.
The items in the dictionary are separated with the comma (,) and enclosed in the curly braces {}.
A list can have any number of items and they may be of different types (integer, float,
string, etc.). A list can also have another list as an item. This is called a nested list.
List items are indexed: the first item has index [0], the second item has index [1], and so on.
If you add new items to a list, the new items will be placed at the end of the list.
The list is changeable, meaning that we can change, add, and remove items in a list after
it has been created.
To determine how many items a list has, use the len() function.
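The list properties above can be sketched as follows; the list named items and its contents are hypothetical:

```python
items = ["apple", 3, 4.5, ["nested", "list"]]   # mixed types and a nested list

first = items[0]       # indexing starts at 0
inner = items[3][0]    # first item of the nested list

items.append("new")    # new items go to the end
items[1] = 30          # change an existing item
items.remove(4.5)      # remove an item by value

count = len(items)
print(first, inner, count)   # apple nested 4
print(items)                 # ['apple', 30, ['nested', 'list'], 'new']
```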
Set is one of 4 built-in data types in Python used to store collections of data, the other 3 are List,
Tuple, and Dictionary, all with different qualities and usage.
Unordered means that the items in a set do not have a defined order.
Set items are unchangeable, meaning that we cannot change an item in place after the set has
been created, although we can still add and remove items.
To determine how many items a set has, use the len() function.
Tuples are ordered, which means that the items have a defined order, and that order will not
change.
Tuples are unchangeable, meaning that we cannot change, add or remove items after the
tuple has been created.
Because tuples are indexed, they can have items with the same values.
Tuple items are indexed: the first item has index [0], the second item has index [1], and so on.
An item has a key and a corresponding value that is expressed as a pair (key: value).
While the values can be of any data type and can repeat, keys must be of immutable type (string,
number or tuple with immutable elements) and must be unique.
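A minimal sketch of these rules, using a hypothetical student record. The last part shows that a mutable object such as a list is rejected as a key:

```python
student = {
    "name": "Asha",             # string key
    12: "roll number",          # number key
    (2023, "A"): "batch key",   # tuple of immutable elements as key
}

student["name"] = "Ravi"        # assigning to an existing key replaces its value
student["city"] = "Surat"       # a new key adds a new item

print(student["name"])          # Ravi
print(len(student))             # 4

# A list is mutable, so it cannot be used as a key.
try:
    bad = {[1, 2]: "list as key"}
except TypeError as exc:
    key_error = str(exc)

print(key_error)
```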
A literal expression evaluates to the value it represents. Following are some examples of literal
expressions:
10 => 10
Binary Expressions
A binary expression consists of a binary operator applied to two operand expressions. Examples:
2*6 => 12
8-6 => 2
Unary Expressions
A unary expression contains one operator and a single operand. Examples:
-(5/5) => -1.0
Compound Expressions
In a binary or unary expression, if an operand itself is an expression, such expression is known as
a compound expression. Examples:
3 * 2 + 1 => 7
2 + 6 * 2 => 14
(2 + 6) * 2 => 16
>>> x = 10
>>> (x + 2) * 4
48
>>> x
10
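The expression kinds above can be evaluated together in one short script. The dictionary named results is just a convenient container for this sketch:

```python
x = 10

results = {
    "literal": 10,
    "binary": 2 * 6,
    "unary": -(5 / 5),          # / is true division, so the result is a float
    "compound": 2 + 6 * 2,      # * binds tighter than +
    "grouped": (2 + 6) * 2,     # parentheses change the order
    "with_variable": (x + 2) * 4,
}

for kind, value in results.items():
    print(kind, "=>", value)
```

Evaluating (x + 2) * 4 does not change x, which is why x is still 10 afterwards.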
def function_name(parameters):
    """docstring"""
    statement1
    statement2….
    return [expr]
You can define functions to provide the required functionality. Here are simple rules to define a
function in Python.
Function blocks begin with the keyword def followed by the function name and
parentheses ( ( ) ).
Any input parameters or arguments should be placed within these parentheses. You can
also define parameters inside these parentheses.
The first statement of a function can be an optional statement - the documentation string
of the function or docstring.
The code block within every function starts with a colon (:) and is indented.
The statement return [expression] exits a function, optionally passing back an expression
to the caller. A return statement with no arguments is the same as return None.
def my_function():
    ...  # function body missing in the source

my_function()
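The rules above can be seen in a complete example. The two functions below, rectangle_area and greet, are hypothetical and not taken from the original notes:

```python
def rectangle_area(width, height=1):
    """Return the area of a rectangle."""
    return width * height


def greet(name):
    """Print a greeting; with no return statement the call evaluates to None."""
    print("Hello,", name)


area = rectangle_area(3, 4)
result = greet("GTU")

print(area)                      # 12
print(result)                    # None
print(rectangle_area.__doc__)    # Return the area of a rectangle.
```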
Return Type: Returns a slice object that selects only the elements in the given range.
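# String slicing
String = 'ASTRING'

# Build slice objects with slice(stop) and slice(start, stop, step)
s1 = slice(3)
s2 = slice(1, 5, 2)
# s3 is missing from the source; reconstructed to match the 'GITA' output
s3 = slice(-1, -12, -2)

print("String slicing")
print(String[s1])
print(String[s2])
print(String[s3])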
Output:
String slicing
AST
SR
GITA
This example starts at index 1, stops before index 5, and uses a step of 2, so it selects the
characters at indices 1 and 3. The second slice, [:3], takes the first three characters.
# String slicing
String = 'GEEKS'
print(String[1:5:2])
print(String[:3])
Output:
EK
GEE
Output:
The mangy, scrawny stray dog hurriedly gobbled down the grain-free, organic dog food.
Output:
We all are equal.
3. Formatting with string literals, called f-strings.
name = 'Ele'
print(f"My name is {name}.")
Output:
My name is Ele.
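For comparison, the same sentence can be produced in three ways. This sketch assumes that the two earlier methods in this list are the % operator and str.format(), which the surviving text does not show:

```python
name = "Ele"

old_style = "My name is %s." % name            # % operator
with_format = "My name is {}.".format(name)    # str.format()
with_fstring = f"My name is {name}."           # f-string

print(old_style)
print(with_format)
print(with_fstring)
```

All three lines print My name is Ele.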
# string interpolation
from string import Template

n1 = 'Hello'
n2 = 'GTU Medium'

# template string (reconstructed from the output; the original line is missing)
n = Template('$n3 ! This is $n4.')

print(n.substitute(n3=n1, n4=n2))
Output:
Hello ! This is GTU Medium.
You can nest one or more loops inside another while or for loop. (Python has no do..while loop.)
for iterating_var in sequence:
    for iterating_var in sequence:
        statements(s)
    statements(s)
While Loop
i = 1
while i < 6:
    print(i)
    i += 1
Output:
1
2
3
4
5
For loop
word = "anaconda"
for letter in word:
    print(letter)
Output:
a
n
a
c
o
n
d
a
Nested loops
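# lists reconstructed from the output below
adj = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]

for x in adj:
    for y in fruits:
        print(x, y)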
Output:
red apple
red banana
red cherry
big apple
big banana
big cherry
tasty apple
tasty banana
tasty cherry
1 break statement
Terminates the loop statement and transfers execution to the statement immediately
following the loop.
break
2 continue statement
Causes the loop to skip the remainder of its body and immediately retest its condition prior
to reiterating.
continue
3 pass statement
The pass statement in Python is used when a statement is required syntactically but you do
not want any command or code to execute.
pass
for i in range(10):
    print(i)
    if i == 2:
        break
Output:
0
1
2
# loop header reconstructed from the output below
for var in "Geeksforgeeks":
    if var == "e":
        continue
    print(var)
Output:
G
k
s
f
o
r
g
k
s
# list reconstructed from the output below
li = ['a', 'b', 'c', 'd']

for i in li:
    if i == 'a':
        pass
    else:
        print(i)
Output:
b
c
d
In terms of application areas, ML scientists prefer Python as well. For areas like building fraud
detection algorithms and network security, developers lean towards Java, while for applications
like natural language processing (NLP) and sentiment analysis they opt for Python, because it
provides a large collection of libraries that help solve complex business problems easily and
build robust systems and data applications.
Following are some useful features of Python language:
It uses an elegant syntax, so programs are easier to read.
It is an easy language to pick up, which makes it simple to get a program working.
It has a large standard library and strong community support.
Python's interactive mode makes it simple to test code.
It is also simple to extend Python by adding new modules that are
implemented in a compiled language such as C or C++.
Python is an expressive language that can be embedded into applications to offer a
programmable interface.
It lets developers run the code anywhere, including Windows, Mac OS X, UNIX, and
Linux.
It is free software: it costs nothing to download or use Python, or to add it to an
application.
They have to know math, statistics, programming, data management, visualization, and more
to be a “full-stack” data scientist. As I mentioned earlier, 80% of the work goes into preparing the
data for processing in an industry setting.
According to Professor Haider, the three important qualities needed to succeed as a data
scientist are being curious, judgemental, and proficient in programming.
The raw data undergoes different stages within a pipeline which are:
1) Fetching/Obtaining the Data
This stage involves identifying data from the internet or internal/external databases and
extracting it into useful formats. Prerequisite skills:
Distributed Storage: Hadoop, Apache Spark/Flink.
Prerequisite skills:
Python: NumPy, Matplotlib, Pandas, SciPy.
R: ggplot2, dplyr.
Statistics: Random sampling, Inferential.
6) Revision
As the nature of the business changes, there is the introduction of new features that may degrade
your existing models. Therefore, periodic reviews and updates are very important from both
business’s and data scientist’s point of view.
Data science is not about great machine learning algorithms, but about the solutions which you
provide with the use of those algorithms. It is also very important to make sure that your pipeline
remains solid from start till end, and you identify accurate business problems to be able to bring
forth precise solutions.
Advantages
Relation with Real world entities
Code reusability
Abstraction or data hiding
Disadvantages
Data protection
Not suitable for all types of problems
Slow Speed
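class Emp:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def info(self):
        print("Hello, %s. You are %s old." % (self.name, self.age))

# objects of class Emp are made here
# (the first entry is reconstructed from the output below)
emps = [Emp("John", 43),
        Emp("Hilbert", 16),
        Emp("Alice", 30)]

# objects of class Emp are used here
for emp in emps:
    emp.info()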
Output:
Hello, John. You are 43 old.
Hello, Hilbert. You are 16 old.
Hello, Alice. You are 30 old.
Example:
# Computes the sum
# of a list
# modularization is done by
# functional approach
def sum_the_list(mylist):
    res = 0
    for val in mylist:
        res += val
    return res

# list reconstructed to match the output below
mylist = [10, 20, 30, 40]
print(sum_the_list(mylist))
Output:
100
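import functools
# list reconstructed to match the output below
mylist = [11, 22, 33, 44]
def sum_the_list(mylist):
    if len(mylist) == 1:
        return mylist[0]
    else:
        return mylist[0] + sum_the_list(mylist[1:])
# the same sum computed with reduce and a lambda
print(functools.reduce(lambda x, y: x + y, mylist))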
Output:
110
The time complexity of the program, that is, of the algorithms used to build it. For example,
merge sort is much faster on average than bubble sort for large inputs, because the time
complexity of the former is of the order n log(n) and that of the latter is n^2.
The data that you want to process. Algorithms work differently on different sizes of input...
The power of the computer used to run the program: the number of cores, the architecture,
and the other hardware specifications.
The language used to build the program. For example, C++ is often faster than Java because it
is compiled to native code and is closer to the hardware, while Java runs on a virtual
machine.
Factors affecting Speed of Execution
1) Algorithm time complexity.
2) Input size.
3) CPU speed.
4) I/O waiting time.
5) Amount of running processes.
6) Amount of available memory.
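The first two factors can be observed with a rough timing experiment. This is only a sketch: bubble_sort is a hypothetical helper written for this comparison, Python's built-in sorted() stands in for an n log(n) algorithm, and the measured times depend on the machine and on the other factors listed above.

```python
import random
import timeit


def bubble_sort(values):
    """Return a sorted copy of values using bubble sort, an O(n^2) algorithm."""
    data = list(values)
    n = len(data)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:   # already sorted, stop early
            break
    return data


random.seed(0)
for size in (100, 400):
    sample = [random.random() for _ in range(size)]
    slow = timeit.timeit(lambda: bubble_sort(sample), number=3)
    fast = timeit.timeit(lambda: sorted(sample), number=3)
    print(f"n={size}: bubble sort {slow:.4f}s, built-in sorted {fast:.4f}s")
```

Increasing the input size widens the gap between the two, which is the effect of time complexity combined with input size.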