
DATA ANALYTICS WITH PYTHON

MCA 3rd SEM

UNIT – I

1. Outline the different types of statements in python with proper syntax and
examples. 10M
Any instruction that the Python interpreter can execute is called a statement.
An instruction is a command that a program gives to the processor to perform some mathematical or logical operation.
In simple words, a statement is the smallest executable unit of code that has an effect, such as creating a variable or
displaying a value. Most lines of code that we write are statements, and the interpreter executes them one after another.
Example: 1
x=3
print(x)
Output: 3
The first line is an assignment statement that gives a value to x.
The second line calls the print() function, which displays the value of x.
When you type a statement, the interpreter executes it, which means that it does whatever the statement says.
Some other kinds of statements in Python are the if statement, while statement, for statement, import statement,
etc., which are discussed in the later questions.
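The kinds of statements listed above can be seen together in one short program. This is only an illustrative sketch; the variable names (radius, area, size, total) are made up for the example.

```python
import math                      # import statement

radius = 2                       # assignment statement
area = math.pi * radius ** 2     # assignment whose right side is an expression

if area > 10:                    # if statement (a compound statement)
    size = "large"
else:
    size = "small"

total = 0
for n in [1, 2, 3]:              # for statement
    total += n                   # augmented assignment statement

print(size, total)               # expression statement (a function call)
```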

2. Briefly discuss about the looping techniques in Python with suitable examples.
10M

Python For loop is used for sequential traversal i.e. it is used for iterating over an iterable like string, tuple, list,
etc. It falls under the category of definite iteration. Definite iterations mean the number of repetitions is specified
explicitly in advance. In Python, there is no C style for loop, i.e., for (i=0; i<n; i++). There is “for in” loop which
is similar to for each loop in other languages. Let us learn how to use for in loop for sequential traversals.

Syntax:
for var in iterable:
    # statements

# Python program to illustrate iterating over a list
print("List Iteration")
l = ["geeks", "for", "geeks"]
for i in l:
    print(i)

# Iterating over a tuple (immutable)
print("\nTuple Iteration")
t = ("geeks", "for", "geeks")
for i in t:
    print(i)

# Iterating over a String
print("\nString Iteration")
s = "Geeks"
for i in s:
    print(i)

Loop Control Statements

Loop control statements change execution from its normal sequence. When execution leaves a scope, all
automatic objects that were created in that scope are destroyed. Python supports the following control statements.

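Continue : Python continue statement skips the rest of the current iteration and moves on to the next one.

for letter in 'geeksforgeeks':
    # skip the letters 'e' and 's'
    if letter == 'e' or letter == 's':
        continue
    print('Current Letter :', letter)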

Break : Python break statement brings control out of the loop.

for letter in 'geeksforgeeks':
    # break the loop as soon as it sees 'e' or 's'
    if letter == 'e' or letter == 's':
        break

# runs after the loop has ended; letter still holds the value that caused the break
print('Current Letter :', letter)


range() function :

Python range() is a built-in function that is used when a user needs to perform an action a specific number of
times. range() in Python 3 behaves like xrange() in Python 2: it generates the numbers on demand instead of
building a list. Depending on how many arguments are passed, the user decides where the sequence begins and
ends and how big the difference is between one number and the next. range() takes up to three arguments.
• start: integer from which the sequence begins (default 0)
• stop: integer before which the sequence ends; with a step of 1 the last value is stop – 1
• step: integer that determines the increment between consecutive values (default 1)
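The three arguments described above can be tried out as follows. This is a small sketch; the variable names are chosen only for illustration, and list() is used just to make the generated numbers visible.

```python
only_stop = list(range(5))            # stop only: starts at 0
start_stop = list(range(2, 8))        # start and stop
with_step = list(range(0, 10, 3))     # start, stop and step
counting_down = list(range(10, 0, -2))  # a negative step counts down

print(only_stop)       # [0, 1, 2, 3, 4]
print(start_stop)      # [2, 3, 4, 5, 6, 7]
print(with_step)       # [0, 3, 6, 9]
print(counting_down)   # [10, 8, 6, 4, 2]
```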

# Python Program to show range() basics

# printing the numbers 0 to 9
for i in range(10):
    print(i, end=" ")
print()

Output: 0 1 2 3 4 5 6 7 8 9

# using range for iteration
l = [10, 20, 30, 40]
for i in range(len(l)):
    print(l[i], end=" ")
print()

Output: 10 20 30 40

# performing sum of first 10 numbers (range(1, 11) produces 1 to 10)
total = 0
for i in range(1, 11):
    total = total + i
print("Sum of first 10 numbers :", total)

Output: Sum of first 10 numbers : 55

Python for loop with else

In most programming languages (C/C++, Java, etc.), else can be used only with if statements. Python also allows
an else clause on a for loop; it runs only when the loop finishes without executing a break.

for i in range(1, 4):
    print(i)
else:  # Executed because no break in for
    print("No Break\n")

Output :
1
2
3
No Break

for i in range(1, 4):
    print(i)
    break
else:  # Not executed as there is a break
    print("No Break")

Output : 1
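A common use of for with else is searching: the else branch handles the "not found" case. The sketch below assumes a hypothetical helper named find_first_negative; it is not part of any library.

```python
def find_first_negative(numbers):
    """Return the first negative number in numbers, or None if there is none."""
    for n in numbers:
        if n < 0:
            result = n
            break            # found one, so the else block is skipped
    else:
        result = None        # loop ended without break: nothing was found
    return result


print(find_first_negative([3, -1, -5]))   # -1
print(find_first_negative([1, 2, 3]))     # None
```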
Python While Loop

The while loop is used to execute a block of statements repeatedly as long as a given condition is true. When the
condition becomes false, the line immediately after the loop in the program is executed. While loop falls under the
category of indefinite iteration. Indefinite iteration means that the number of times the loop is executed isn’t specified
explicitly in advance.

Syntax :
while expression:
    statement(s)

count = 0
while (count < 3):
    count = count + 1
    print("Hello Geek")

3. Summarize the arithmetic, assignment, comparison and bitwise operators
with examples. 10M

Arithmetic Operators
Arithmetic operators are used to perform mathematical operations like addition, subtraction, multiplication,
and division.

Operator   Description                                                      Syntax
+          Addition: adds two operands                                      x + y
-          Subtraction: subtracts two operands                              x - y
*          Multiplication: multiplies two operands                          x * y
/          Division (float): divides the first operand by the second        x / y
//         Division (floor): divides the first operand by the second        x // y
%          Modulus: returns the remainder when the first operand
           is divided by the second                                         x % y
**         Power: returns the first operand raised to the power of second   x ** y

a=9
b=4
add = a + b
sub = a - b
mul = a * b
div1 = a / b
div2 = a // b
mod = a % b
p = a ** b
# print results
print(add)
print(sub)
print(mul)
print(div1)
print(div2)
print(mod)
print(p)
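The difference between /, // and % is easiest to see from the values they produce. The following sketch reuses a = 9 and b = 4 from the example above; the other names are chosen only for illustration.

```python
a, b = 9, 4

print(a / b)    # 2.25  (true division always gives a float)
print(a // b)   # 2     (floor division rounds down)
print(a % b)    # 1     (remainder)
print(a ** b)   # 6561  (9 * 9 * 9 * 9)

# divmod() returns the floor quotient and the remainder together
quotient, remainder = divmod(a, b)
check = b * quotient + remainder     # always equals a

# floor division rounds towards negative infinity, not towards zero
neg_floor = -9 // 4    # -3
neg_mod = -9 % 4       # 3, because 4 * (-3) + 3 == -9
```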

Comparison
These operators compare the values on either sides of them and decide the relation among them. They are also
called Relational operators.
Assume variable a holds 10 and variable b holds 20, then :

Operator   Description                                                             Example
==         True if the values of the two operands are equal.                       (a == b) is not true.
!=         True if the values of the two operands are not equal.                   (a != b) is true.
<>         Same as !=, but available only in Python 2; removed in Python 3.        (a <> b) is true in Python 2.
>          True if the left operand is greater than the right operand.             (a > b) is not true.
<          True if the left operand is less than the right operand.                (a < b) is true.
>=         True if the left operand is greater than or equal to the right one.     (a >= b) is not true.
<=         True if the left operand is less than or equal to the right one.        (a <= b) is true.

Example
a = 21
b = 10
if a == b:
    print("a is equal to b")
else:
    print("a is not equal to b")
a = 5
b = 20
if a <= b:
    print("a is either less than or equal to b")
else:
    print("a is neither less than nor equal to b")

if b >= a:
    print("b is either greater than or equal to a")
else:
    print("b is neither greater than nor equal to a")

Bitwise operators :
In Python, bitwise operators are used to perform bitwise calculations on integers. The integers are treated as
binary numbers and the operations are performed bit by bit, hence the name bitwise operators. The result is
returned as an ordinary (decimal) integer.

OPERATOR   DESCRIPTION            SYNTAX
&          Bitwise AND            x & y
|          Bitwise OR             x | y
~          Bitwise NOT            ~x
^          Bitwise XOR            x ^ y
>>         Bitwise right shift    x >> n
<<         Bitwise left shift     x << n

a = 10
b=4

# Print bitwise AND operation


print("a & b =", a & b)
# Print bitwise OR operation
print("a | b =", a | b)
# Print bitwise NOT operation
print("~a =", ~a)
# print bitwise XOR operation
print("a ^ b =", a ^ b)
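The shift operators from the table are not covered by the example above. The sketch below adds them and uses bin() to show the bit patterns; the result names are chosen only for illustration.

```python
a = 10   # binary 1010
b = 4    # binary 0100

print(bin(a), bin(b))    # 0b1010 0b100

and_result = a & b       # 0   (no bit is set in both numbers)
or_result = a | b        # 14  (binary 1110)
xor_result = a ^ b       # 14  (bits that differ)
not_result = ~a          # -11 (~x is always -(x + 1))

right = a >> 1           # 5   (dropping one bit halves the number)
left = a << 2            # 40  (adding two zero bits multiplies by 4)
print(right, left)
```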

4. Briefly explain the basic Input and Output statements of python. 5M

INPUT Statements
Sometimes a developer might want to take user input at some point in the program. To do this, Python provides
the input() function, which always returns the entered text as a string.
Syntax:
input("prompt")
Example 1: Python get user input with a message
# Taking input from the user
name = input("Enter your name: ")
# Output
print("Hello, " + name)
print(type(name))

Output Statement
Python provides the print() function to display output to the standard output devices.
Syntax:
print(value(s), sep=' ', end='\n', file=sys.stdout, flush=False)
Parameters:
value(s) : Any value, and as many as you like. Each is converted to a string before it is printed.

sep='separator' : (Optional) Specifies how to separate the values if there is more than one. Default: ' '

end='end' : (Optional) Specifies what to print at the end. Default: '\n'

file : (Optional) An object with a write method. Default: sys.stdout

flush : (Optional) A Boolean specifying whether the output is flushed (True) or buffered (False). Default: False

Returns: print() returns None; its effect is to write the text to the output stream (the screen by default).

# Python program to demonstrate print() function
print("GFG")
# several values are separated by a single space by default
print('G', 'F', 'G')

Output:
GFG
G F G
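The sep, end and file parameters described above can be tried as follows. This is a sketch: io.StringIO is used here only as a convenient in-memory object with a write() method, and the name buffer is an arbitrary choice.

```python
import io

buffer = io.StringIO()     # stands in for a file; anything with write() works

# sep='' joins the values with nothing in between
print("G", "F", "G", sep="", file=buffer)

# a custom separator and a custom line ending
print("2024", "01", "15", sep="-", end="!\n", file=buffer)

captured = buffer.getvalue()
print(captured, end="")    # show what was written to the buffer
```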

5. List and explain the basic types of Arguments in python with example.
6. Interpret the following in python functions with suitable code snippet:
i)Positional arguments ii) Keyword arguments iii) Variable length arguments

There are five types of arguments in a Python function definition:


1. Default arguments
2. Keyword argument
3. Positional arguments
4. Arbitrary positional arguments
5. Arbitrary keyword arguments
1. default arguments:
➢ Default arguments are values that are provided while defining functions.
➢ The assignment operator = is used to assign a default value to the argument.
➢ Default arguments become optional during the function calls.
➢ If we provide a value to the default arguments during function calls, it overrides the default value.
➢ The function can have any number of default arguments.
➢ Default arguments should follow non-default arguments.
Example:
def add(a, b=5, c=10):
    return (a + b + c)
print(add(10))          # 25
print(add(10, 20, 30))  # 60

2. Keyword Arguments:
Functions can also be called using keyword arguments of the form kwarg=value. During a function call, values
passed through arguments need not be in the order of parameters in the function definition. This can be achieved
by keyword arguments. But all the keyword arguments should match the parameters in the function definition.
Example:
def add(a, b, c):
    return (a + b + c)
print(add(b=10, a=5, c=20))  # 35
print(add(a=10, b=5, c=40))  # 55

3. Positional Arguments
During a function call, values passed without a parameter name are matched to the parameters in the order of the
function definition. These are called positional arguments. When both kinds are used in one call, keyword
arguments must follow positional arguments.
Example:
def add(a, b, c):
    return (a + b + c)
print(add(10, 5, 20))    # 35
print(add(10, 5, c=40))  # 55 (two positional arguments followed by a keyword argument)

4. Arbitrary positional arguments:


For arbitrary positional argument, an asterisk (*) is placed before a parameter in function definition which can
hold non-keyword variable-length arguments. These arguments will be wrapped up in a tuple. Before the variable
number of arguments, zero or more normal arguments may occur.
Example :
def add(*b):
    result = 0
    for i in b:
        result = result + i
    return result
print(add(1, 2, 3, 4, 5))  # 15
print(add(10, 20))         # 30
5. Arbitrary keyword arguments:
For arbitrary keyword arguments, a double asterisk (**) is placed before a parameter in a function definition, which
can then hold a variable number of keyword arguments. These arguments are collected in a dictionary.
Example:
def fn(**a):
    for i in a.items():
        print(i)
fn(numbers=5, colors="blue", fruits="apple")
Output :
('numbers', 5)
('colors', 'blue')
('fruits', 'apple')
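All five kinds of arguments can appear in a single function. The sketch below uses a hypothetical function named describe that simply returns what it received, so the way Python distributes the arguments becomes visible.

```python
def describe(name, greeting="Hello", *extras, **options):
    """Return the received arguments, grouped by how they were collected."""
    return {
        "name": name,            # positional parameter
        "greeting": greeting,    # parameter with a default value
        "extras": extras,        # arbitrary positional arguments (a tuple)
        "options": options,      # arbitrary keyword arguments (a dict)
    }


r1 = describe("Asha")                          # default greeting is used
r2 = describe("Asha", "Hi", 1, 2, lang="en")   # extras and options are filled
r3 = describe(greeting="Hey", name="Ravi")     # keyword arguments in any order

print(r1)
print(r2)
print(r3)
```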

7. Demonstrate the working of if statements with examples.


An if statement runs a block of code only when its condition is true; elif and else add alternative branches.
We can also have an if…elif…else statement inside another if…elif…else statement. This is called nesting in
computer programming. Any number of these statements can be nested inside one another. Indentation is the only
way to figure out the level of nesting. This can get confusing, so deep nesting should be avoided where possible.
Syntax:
if (condition1):
    stmts
elif (condition2):
    stmts
else:
    stmts
Example 1:
num = 15
if num > 0:
    print("Positive number")
elif num == 0:
    print("Zero")
else:
    print("Negative number")

Output: Positive number
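The nesting described above can be sketched as follows. The function name classify is made up for the example; it gives the same answers as Example 1 but places one if statement inside another.

```python
def classify(num):
    """Classify a number using a nested if statement."""
    if num >= 0:
        # inner if: runs only when the outer condition is true
        if num == 0:
            return "Zero"
        else:
            return "Positive number"
    else:
        return "Negative number"


for value in (15, 0, -7):
    print(value, "->", classify(value))
```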

8. Explain the basic primitives (data types) of python with example


Numbers
Number stores numeric values. The integer, float, and complex values belong to a Python Numbers data-type.
Python provides the type() function to know the data-type of the variable. Similarly, the isinstance() function is
used to check whether an object belongs to a particular class.
Python creates Number objects when a number is assigned to a variable.
For example:
a = 5
print("The type of a", type(a))
b = 40.5
print("The type of b", type(b))
c = 1+3j
print("The type of c", type(c))
print("c is a complex number:", isinstance(1+3j, complex))
Output:
The type of a <class 'int'>
The type of b <class 'float'>
The type of c <class 'complex'>
c is a complex number: True
String
A string can be defined as a sequence of characters enclosed in quotation marks. In Python, we can use
single, double, or triple quotes to define a string. String handling in Python is a straightforward task since Python
provides built-in functions and operators to perform operations on strings. In the case of string handling, the
operator + is used to concatenate two strings, as the operation
"hello" + " python" returns "hello python".
The operator * is known as the repetition operator, as the operation "Python" * 2 returns 'PythonPython'.
E.g:
str1 = "string using double quotes"
print(str1)
s = '''A multiline
string'''
print(s)
Output:
string using double quotes
A multiline
string
List
Python lists are similar to arrays in C. However, a list can contain data of different types. The items stored in
the list are separated with a comma (,) and enclosed within square brackets []. We can use the slice [:] operator to
access the data of the list. The concatenation operator (+) and repetition operator (*) work with lists in the
same way as they work with strings. Consider the following example.
list1 = [1, "hi", "Python", 2]
print(type(list1))    # Checking type of given list
print(list1)          # Printing the list1
print(list1[3:])      # List slicing
print(list1[0:2])     # List slicing
print(list1 + list1)  # List concatenation using + operator
print(list1 * 3)      # List repetition using * operator
Output:
<class 'list'>
[1, 'hi', 'Python', 2]
[2]
[1, 'hi']
[1, 'hi', 'Python', 2, 1, 'hi', 'Python', 2]
[1, 'hi', 'Python', 2, 1, 'hi', 'Python', 2, 1, 'hi', 'Python', 2]

Tuple
A tuple is similar to a list in many ways. Like lists, tuples can contain a collection of items of different
data types. The items of the tuple are separated with a comma (,) and enclosed in parentheses (). A tuple is a
read-only data structure, as we can't modify the size or the value of the items of a tuple.
Example :
tup = ("hi", "Python", 2)
print(type(tup))   # Checking type of tup
print(tup)         # Printing the tuple
print(tup[1:])     # Tuple slicing
print(tup[0:1])    # Tuple slicing
print(tup + tup)   # Tuple concatenation using + operator
print(tup * 3)     # Tuple repetition using * operator
tup[2] = "hi"      # Assigning a value to an item of tup. It will throw an error.
Output:
<class 'tuple'>
('hi', 'Python', 2)
('Python', 2)
('hi',)
('hi', 'Python', 2, 'hi', 'Python', 2)
('hi', 'Python', 2, 'hi', 'Python', 2, 'hi', 'Python', 2)
Traceback (most recent call last):
  ...
TypeError: 'tuple' object does not support item assignment
Dictionary
A dictionary is a collection of key-value pairs. It is like an associative array or a hash table where
each key stores a specific value. Since Python 3.7, a dictionary keeps its items in insertion order. A key can be any
hashable (immutable) object such as a number, a string, or a tuple, whereas a value is an arbitrary Python
object. The items in the dictionary are separated with the comma (,) and enclosed in the curly braces {}.
Example :
d = {1: 'Jimmy', 2: 'Alex', 3: 'john', 4: 'mike'}
print(d)                        # Printing dictionary
print("1st name is " + d[1])    # Accessing value using keys
print("4th name is " + d[4])
print(d.keys())                 # Printing keys
print(d.values())               # Printing values
Output
{1: 'Jimmy', 2: 'Alex', 3: 'john', 4: 'mike'}
1st name is Jimmy
4th name is mike
dict_keys([1, 2, 3, 4])
dict_values(['Jimmy', 'Alex', 'john', 'mike'])
Boolean
The Boolean type provides two built-in values, True and False. These values are used to decide whether a given
condition is true or false. It is denoted by the class bool. In a condition, any non-zero number or non-empty object is
treated as true, whereas 0, None, and empty objects are treated as false. The names are case-sensitive: true and false
are not defined.
Example :
print(type(True))
print(type(False))
print(false)   # lowercase 'false' is not a Python keyword, so this raises an error
Output:
<class 'bool'>
<class 'bool'>
NameError: name 'false' is not defined
Set
A Python set is an unordered collection of unique elements. It is iterable and mutable (it can be modified after
creation). The order of the elements in a set is undefined, so printing it may show the elements in a different order.
A set is created by using the built-in function set(), or by passing a sequence of elements in curly braces
separated by commas. It can contain values of different types, as long as they are hashable.
Example :
set1 = set() # Creating Empty set
set2 = {'James', 2, 3,'Python'} # Creating set with values
print(set2) #Printing Set value
set2.add(10) #Adding element to the set
print(set2) # Printing set after adding
set2.remove(2) # Removing element from the set
print(set2) # Printing set after removing
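The "unique elements" property described above can be checked with a short sketch. The list marks and the other names are made up for the example.

```python
marks = [70, 85, 70, 90, 85]

unique_marks = set(marks)      # duplicates are dropped automatically
has_90 = 90 in unique_marks    # membership test

unique_marks.add(70)           # adding an existing element changes nothing
size_after_add = len(unique_marks)

unique_marks.discard(100)      # discard() ignores a missing element

remove_failed = False
try:
    unique_marks.remove(100)   # remove() raises KeyError for a missing element
except KeyError:
    remove_failed = True

print(sorted(unique_marks), has_90, size_after_add, remove_failed)
```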

9. Define Function and give general form for defining and calling Functions
in Python with examples.
A function is a block of organized, reusable code that is used to perform a single, related action. Functions provide
better modularity for your application and a high degree of code reusing.
As you already know, Python gives you many built-in functions like print(), etc. but you can also create your own
functions. These functions are called user-defined functions.
Formal parameters are mentioned in the function definition.
Actual parameters(arguments) are passed during a function call.
Defining a Function
You can define functions to provide the required functionality. Here are simple rules to define a function in
Python.
• Function blocks begin with the keyword def followed by the function name and parentheses ( ( ) ).
• Any input parameters or arguments should be placed within these parentheses. You can also define
parameters inside these parentheses.
• The first statement of a function can be an optional statement - the documentation string of the function
or docstring.
• The code block within every function starts with a colon (:) and is indented.
• The statement return [expression] exits a function, optionally passing back an expression to the caller. A
return statement with no arguments is the same as return None.
Syntax
def functionname(parameters):
    stmts
    return [expression]
By default, parameters have a positional behavior and you need to pass them in the same order that they were
defined.
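The rules above (def keyword, parameters in parentheses, optional docstring, indented body, optional return) can be sketched with two small functions. The names rectangle_area and log_message are hypothetical and used only for illustration.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height


def log_message(message):
    """Print a message; there is no return statement, so None is returned."""
    print(message)


area = rectangle_area(3, 4)          # arguments are matched by position
nothing = log_message("done")        # the call prints, and nothing is None
print(area, nothing)
print(rectangle_area.__doc__)        # the docstring is stored on the function
```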

Example
The following function takes a string as input parameter and prints it to standard output.
def printme(str):
    print(str)
    return
Calling a Function
Defining a function only gives it a name, specifies the parameters that are to be included in the function and
structures the blocks of code.
Once the basic structure of a function is finalized, you can execute it by calling it from another function or directly
from the Python prompt. The following example calls the printme() function:
def printme(str):
    print(str)
    return
printme("I'm first call to user defined function!")
printme("Again second call to the same function")
Output :
I'm first call to user defined function!
Again second call to the same function

10. Interpret the working of Anonymous functions in python.


What are lambda functions in Python?
In Python, an anonymous function is a function that is defined without a name. While normal functions are defined
using the def keyword in Python, anonymous functions are defined using the lambda keyword. Hence, anonymous
functions are also called lambda functions.
Syntax of Lambda Function in python
lambda arguments: expression
Lambda functions can have any number of arguments but only one expression. The expression is evaluated and
returned. Lambda functions can be used wherever function objects are required.
Example of Lambda Function in python
Here is an example of lambda function that doubles the input value.
# Program to show the use of lambda functions
double = lambda x: x * 2
print(double(5))

Output
10
In the above program, lambda x: x * 2 is the lambda function. Here x is the argument and x * 2 is the expression
that gets evaluated and returned.
This function has no name. It returns a function object which is assigned to the identifier double. We can now
call it as a normal function. The statement
double = lambda x: x * 2
is nearly the same as:
def double(x):
    return x * 2
Use of Lambda Function in python
We use lambda functions when we require a nameless function for a short period of time.
In Python, we generally use it as an argument to a higher-order function (a function that takes in other functions
as arguments). Lambda functions are used along with built-in functions like filter(), map() etc.
Example use with filter()
The filter() function in Python takes in a function and an iterable (such as a list) as arguments.
The function is called with each item, and filter() returns an iterator over the items for which the
function evaluates to True. Wrapping the result in list() turns it into a new list.
Here is an example use of filter() function to keep only the even numbers from a list.
# Program to filter out only the even items from a list
my_list = [1, 5, 4, 6, 8, 11, 3, 12]
new_list = list(filter(lambda x: (x%2 == 0) , my_list))
print(new_list)
Output
[4, 6, 8, 12]
Example use with map()
The map() function in Python takes in a function and an iterable (such as a list).
The function is called with each item, and map() returns an iterator over the values returned by that
function for each item. Wrapping the result in list() turns it into a new list.
Here is an example use of map() function to double all the items in a list.
# Program to double each item in a list using map()
my_list = [1, 5, 4, 6, 8, 11, 3, 12]
new_list = list(map(lambda x: x * 2 , my_list))
print(new_list)
Output
[2, 10, 8, 12, 16, 22, 6, 24]
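Another higher-order function that is often paired with a lambda is the built-in sorted(), whose key parameter accepts a function. The sketch below uses made-up student data to illustrate this.

```python
# (name, marks) pairs; the data is invented for the example
students = [("Ravi", 72), ("Meena", 91), ("Arun", 85)]

# key tells sorted() what to compare: here the marks, i.e. item 1 of each pair
by_marks = sorted(students, key=lambda s: s[1], reverse=True)

# map() with a lambda extracts just the names, in ranked order
names = list(map(lambda s: s[0], by_marks))

print(by_marks)
print(names)    # ['Meena', 'Arun', 'Ravi']
```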
UNIT – II

1. Illustrate the basic operations of List in depth. 10M
The list is one of the most versatile data types available in Python. It is written as a list of comma-separated values
(items) between square brackets. An important thing about a list is that its items need not be of the same type.
Creating a list is as simple as putting different comma-separated values between square brackets.
For example –
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5]
list3 = ["a", "b", "c", "d"]

Accessing Values in Lists
To access values in lists, use square brackets with an index, or with a slice of indices, to obtain the values
available at those positions.
For example –
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5, 6, 7]
print("list1[0]: ", list1[0])
print("list2[1:5]: ", list2[1:5])

Updating Lists
You can update single or multiple elements of a list by giving the index or slice on the left-hand side of the
assignment operator, and you can add elements to a list with the append() method.
For example –
list1 = ['physics', 'chemistry', 1997, 2000]
print("Value available at index 2 : ")
print(list1[2])
list1[2] = 2001
print("New value available at index 2 : ")
print(list1[2])

Delete List Elements
To remove a list element, you can use either the del statement if you know the index of the element you are
deleting, or the remove() method if you know only its value.
For example –
list1 = ['physics', 'chemistry', 1997, 2000]
print(list1)
del list1[2]
print("After deleting value at index 2 : ")
print(list1)

List Methods and Operations
Each example below starts from myList = [1, 2, 3, 'EduCBA', 'makes learning fun!'].
1. append() : The append() method is used to add an element at the end of the list. This method can only add a
single element at a time. To add multiple elements, the append() method can be used inside a loop.
Code:
myList = [1, 2, 3, 'EduCBA', 'makes learning fun!']
myList.append(4)
myList.append(5)
myList.append(6)
for i in range(7, 9):
    myList.append(i)
print(myList)
Output:
[1, 2, 3, 'EduCBA', 'makes learning fun!', 4, 5, 6, 7, 8]

2. extend() : The extend() method adds all the elements of an iterable at the end of the list. Unlike append(),
it can add more than one element in a single call.
Code:
myList.extend([4, 5, 6])
for i in range(7, 9):
    myList.append(i)
print(myList)
Output:
[1, 2, 3, 'EduCBA', 'makes learning fun!', 4, 5, 6, 7, 8]

3. insert() : The insert() method can add an element at a given position in the list. Thus, unlike append(), it can
add elements at any position, but like append(), it can add only one element at a time. This method takes two
arguments. The first argument specifies the position, and the second argument specifies the element to be inserted.
Code:
myList.insert(3, 4)
myList.insert(4, 5)
myList.insert(5, 6)
print(myList)
Output:
[1, 2, 3, 4, 5, 6, 'EduCBA', 'makes learning fun!']

4. remove() : The remove() method is used to remove an element from the list. In the case of multiple occurrences
of the same element, only the first occurrence is removed.
Code:
myList.remove('makes learning fun!')
myList.insert(4, 'makes')
myList.insert(5, 'learning')
myList.insert(6, 'so much fun!')
print(myList)
Output:
[1, 2, 3, 'EduCBA', 'makes', 'learning', 'so much fun!']

5. pop() : The method pop() can remove an element from any position in the list. The parameter supplied to this
method is the index of the element to be removed.
Code:
myList.pop(4)
myList.insert(4, 'makes')
myList.insert(5, 'learning')
myList.insert(6, 'so much fun!')
print(myList)
Output:
[1, 2, 3, 'EduCBA', 'makes', 'learning', 'so much fun!']

6. slice : The slice operation is used to print a section of the list. The slice operation returns a specific range of
elements. It does not modify the original list.
Code:
print(myList[:4])   # from the beginning up to, but not including, index 4
print(myList[2:])   # from index 2 to the end of the list
print(myList[2:4])  # from index 2 up to, but not including, index 4
print(myList[:])    # the whole list, from beginning to end
Output:
[1, 2, 3, 'EduCBA']
[3, 'EduCBA', 'makes learning fun!']
[3, 'EduCBA']
[1, 2, 3, 'EduCBA', 'makes learning fun!']

7. reverse() : The reverse() operation is used to reverse the elements of the list. This method modifies the original
list. To reverse a list without modifying the original one, we use the slice operation with a negative step.
A step of -1 walks the list from the rear end to the front end.
Code:
print(myList[::-1])  # does not modify the original list
myList.reverse()     # modifies the original list
print(myList)
Output:
['makes learning fun!', 'EduCBA', 3, 2, 1]
['makes learning fun!', 'EduCBA', 3, 2, 1]

8. len() : The built-in len() function returns the length of the list, i.e. the number of elements in the list.
Code:
print(len(myList))
Output:
5

9. min() & max() : The built-in min() function returns the minimum value in the list. The max() function returns
the maximum value in the list. Both accept only lists whose elements can be compared with each other, i.e.
lists having elements of similar type.
Code:
print(min(myList))
Output:
TypeError: '<' not supported between instances of 'str' and 'int'
Code:
print(min([1, 2, 3]))
print(max([1, 2, 3]))
Output:
1
3

10. count() : The function count() returns the number of occurrences of a given element in the list.
Code:
print(myList.count(3))
Output:
1

11. concatenate The concatenate operation is used to merge two lists and return a single list. The + sign is used
to perform the concatenation. Note that the individual lists are not modified, and a new combined list is returned.
Code:
yourList = [4, 5, 'Python', 'is fun!']
print(myList+yourList)
Output:
[1, 2, 3, 'EduCBA', 'makes learning fun!', 4, 5, 'Python', 'is fun!']

12. multiply : Python also allows multiplying the list n times. The resultant list is the original list iterated n times.
Code:
print(myList*2)
Output:
[1, 2, 3, 'EduCBA', 'makes learning fun!', 1, 2, 3, 'EduCBA', 'makes learning fun!']

13. index() : The index() method returns the position of the first occurrence of the given element. It takes two
optional parameters – the begin index and the end index. These parameters define the start and end position of
the search area on the list. When supplied, the element is searched only in the sub-list bound by the begin and
end indices. When not supplied, the element is searched in the whole list.
Code:
print(myList.index('EduCBA'))        # searches in the whole list
print(myList.index('EduCBA', 0, 2))  # searches only indices 0 and 1 (the end index is excluded)
Output:
3
ValueError: 'EduCBA' is not in list

14. sort() : The sort() method sorts the list in place in ascending order. This operation can only be performed on
homogeneous lists, i.e. lists having elements of similar type.
Code:
yourList = [4, 2, 6, 5, 0, 1]
yourList.sort()
print(yourList)
Output:
[0, 1, 2, 4, 5, 6]

15. clear() : This method removes all the elements from the list, leaving it empty.
Code:
myList.clear()
print(myList)
Output:
[]
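The difference between append() and extend(), and between pop() with and without an index, is a frequent source of mistakes. The following sketch uses made-up lists (nested, flat) to show it.

```python
nested = [1, 2, 3]
nested.append([4, 5])      # the whole list is added as ONE element
print(nested)              # [1, 2, 3, [4, 5]]

flat = [1, 2, 3]
flat.extend([4, 5])        # each element is added separately
print(flat)                # [1, 2, 3, 4, 5]

last = flat.pop()          # no index: removes and returns the last element
second = flat.pop(1)       # index 1: removes and returns the element 2
print(last, second, flat)  # 5 2 [1, 3, 4]
```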
2. Implement a python program to simulate stack. 10M

# Python program to demonstrate stack implementation using collections.deque


from collections import deque
stack = deque()
stack.append('a')
stack.append('b')
stack.append('c')
print('Initial stack:')
print(stack)
print('\nElements popped from stack:')
print(stack.pop())
print(stack.pop())
print(stack.pop())
print('\nStack after elements are popped:')
print(stack)

# Stack implementation using a list


stack = []
stack.append('a')
stack.append('b')
stack.append('c')
print('Initial stack')
print(stack)
print('\nElements popped from stack:')
print(stack.pop())
print(stack.pop())
print(stack.pop())
print('\nStack after elements are popped:')
print(stack)
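The two programs above use append() and pop() directly. One possible way to simulate a stack more explicitly is to wrap the list in a small class with push, pop and peek operations. This is only a sketch; the class name Stack and its method names are choices made for this example, not a standard library API.

```python
class Stack:
    """A minimal LIFO stack backed by a Python list (illustrative sketch)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        # the end of the list is treated as the top of the stack
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        # look at the top element without removing it
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return len(self._items) == 0

    def size(self):
        return len(self._items)


stack = Stack()
for ch in "abc":
    stack.push(ch)

top = stack.peek()                                     # 'c', pushed last
popped = [stack.pop() for _ in range(stack.size())]    # last in, first out
print(top, popped, stack.is_empty())
```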

3. Show the slicing and indexing methods of strings with examples 10M

Indexing : Means referring to an element of an iterable by its position within the iterable. Each of a string’s
characters corresponds to an index number and each character can be accessed using their index number.
We can access characters in a String in Two ways :
1. Accessing Characters by Positive Index Number
2. Accessing Characters by Negative Index Number

Accessing Characters by Positive Index Number: In this type of Indexing, we pass a Positive index(which we
want to access) in square brackets. The index number start from index number 0 (which denotes the first character
of a string)
# declaring the string
str = "Geeks for Geeks !"
# accessing the character of str at 0th index
print(str[0])
# accessing the character of str at 6th index
print(str[6])
# accessing the character of str at 10th index
print(str[10])

Accessing Characters by Negative Index Number : In this type of Indexing, we pass the Negative index(which
we want to access) in square brackets. Here the index number starts from index number -1 (which denotes the
last character of a string).
# declaring the string
str = "Geeks for Geeks !"
# accessing the character of str at last index
print(str[-1])
# accessing the character of str at 5th index from the last
print(str[-5])
# accessing the character of str at 10th index from the last
print(str[-10])

Slicing
Slicing in Python is a feature that enables accessing parts of sequence. In slicing string, we create a substring,
which is essentially a string that exists within another string. We use slicing when we require a part of string and
not the complete string.
Syntax :
string[start : end : step]
start : We provide the starting index (default: the beginning of the string).
end : We provide the end index (this is not included in the substring).
step : It is an optional argument that determines the increment between each index used for slicing (default 1).
Example:
# declaring the string
str ="Geeks for Geeks !"
# slicing using indexing sequence
print(str[: 3])
print(str[1 : 5 : 2])
print(str[-1 : -12 : -2])
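The indexing and slicing rules above can be checked on a shorter string. The sketch below uses the made-up variable word so that every result is easy to verify by hand.

```python
word = "Python"          # indices 0..5, or -6..-1 from the end

first = word[0]          # 'P'
last = word[-1]          # 'n'
middle = word[1:4]       # 'yth'  (index 4 is not included)
every_second = word[::2] # 'Pto'  (step of 2)
reversed_word = word[::-1]   # 'nohtyP' (negative step walks backwards)

# splitting at any index and joining the two parts gives back the original
rejoined = word[:2] + word[2:]

print(first, last, middle, every_second, reversed_word, rejoined)
```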

4. Outline the basic string methods with examples 10M

A Python string is a sequence of Unicode characters that is enclosed in quotation marks. This section
discusses the built-in methods that Python provides to operate on strings.
# Python3 program to show the
# working of upper() function
text = 'geeKs For geEkS'
# upper() function to convert
# string to upper case
print("\nConverted String:")
print(text.upper())
# lower() function to convert
# string to lower case
print("\nConverted String:")
print(text.lower())
# title() converts the first character of each word
# to upper case and the rest to lower case
print("\nConverted String:")
print(text.title())
# original string never changes
print("\nOriginal String")
print(text)

Function Name Description

capitalize() Converts the first character of the string to a capital (uppercase) letter

casefold() Implements caseless string matching

center() Centers the string in the given width, padding both sides with the specified character

count() Returns the number of occurrences of a substring in the string.

encode() Encodes strings with the specified encoded scheme

endswith() Returns “True” if a string ends with the given suffix

expandtabs() Specifies the amount of space to be substituted with the “\t” symbol in the string

find() Returns the lowest index of the substring if it is found

format() Formats the string for printing it to console

format_map() Formats specified values in a string using a dictionary

index() Returns the position of the first occurrence of a substring in a string

isalnum() Checks whether all the characters in a given string is alphanumeric or not

isalpha() Returns “True” if all characters in the string are alphabets

isdecimal() Returns true if all characters in a string are decimal

isdigit() Returns “True” if all characters in the string are digits


isidentifier() Check whether a string is a valid identifier or not

islower() Checks if all characters in the string are lowercase

isnumeric() Returns “True” if all characters in the string are numeric characters

isprintable() Returns “True” if all characters in the string are printable or the string is empty

isspace() Returns “True” if all characters in the string are whitespace characters

istitle() Returns “True” if the string is a title cased string

isupper() Checks if all characters in the string are uppercase

join() Returns a concatenated String

ljust() Left aligns the string according to the width specified

lower() Converts all uppercase characters in a string into lowercase

lstrip() Returns the string with leading characters removed

maketrans() Returns a translation table

partition() Splits the string at the first occurrence of the separator

replace() Replaces all occurrences of a substring with another substring


rfind() Returns the highest index of the substring

rindex() Returns the highest index of the substring inside the string

rjust() Right aligns the string according to the width specified

rpartition() Split the given string into three parts

rsplit() Split the string from the right by the specified separator

rstrip() Removes trailing characters

splitlines() Split the lines at line boundaries

startswith() Returns “True” if a string starts with the given prefix

strip() Returns the string with both leading and trailing characters removed

swapcase() Converts all uppercase characters to lowercase and vice versa

title() Convert string to title case

translate() Modify string according to given translation mappings

upper() Converts all lowercase characters in a string into uppercase

zfill() Returns a copy of the string with ‘0’ characters padded to the left side of the string
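
A few of the methods from the table can be chained in a typical clean-up task. The following is only a small sketch on a made-up input string:

```python
raw = "  data,analytics,with,python  "

cleaned = raw.strip()               # remove surrounding whitespace
parts = cleaned.split(",")          # split on commas into a list
joined = " ".join(parts)            # join the pieces with single spaces
titled = joined.title()             # capitalize the first letter of each word
count_a = cleaned.count("a")        # number of occurrences of "a"
position = cleaned.find("python")   # lowest index of the substring, -1 if absent
padded = "42".zfill(5)              # left-pad with zeros to width 5

print(parts, titled, count_a, position, padded)
```

Each call returns a new value; the original string raw is never modified, because strings are immutable.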

5. With an example, explain the working of a dictionary in Python. 5M

A dictionary in Python is a collection of data values that works like a map. Unlike other data types, which hold
only a single value per element, a dictionary holds key:value pairs. Since Python 3.7, dictionaries preserve the
order in which keys were inserted.
Values are looked up by key, which keeps access fast even for large dictionaries.

Creating a Dictionary
In Python, a dictionary can be created by placing a sequence of elements within curly braces {}, separated by
commas. A dictionary holds pairs of values, one being the key and the other being the value associated with that
key, written as key:value. Values in a dictionary can be of any data type and can be duplicated, whereas keys
cannot be repeated and must be immutable.

# Creating a Dictionary
# with Integer Keys
Dict = {1: 'Geeks', 2: 'For', 3: 'Geeks'}
print("\nDictionary with the use of Integer Keys: ")
print(Dict)

# Creating a Dictionary
# with Mixed keys
Dict = {'Name': 'Geeks', 1: [1, 2, 3, 4]}
print("\nDictionary with the use of Mixed Keys: ")
print(Dict)

Adding elements to a Dictionary


In Python Dictionary, the Addition of elements can be done in multiple ways. One value at a time can be added
to a Dictionary by defining value along with the key e.g. Dict[Key] = ‘Value’. Updating an existing value in a
Dictionary can be done by using the built-in update() method. Nested key values can also be added to an existing
Dictionary.

# Creating an empty Dictionary


Dict = {}
print("Empty Dictionary: ")
print(Dict)

# Adding elements one at a time


Dict[0] = 'Geeks'
Dict[2] = 'For'
Dict[3] = 1
print("\nDictionary after adding 3 elements: ")
print(Dict)

# Adding set of values
# to a single Key
Dict['Value_set'] = 2, 3, 4
print("\nDictionary after adding a set of values: ")
print(Dict)

# Updating existing Key's Value


Dict[2] = 'Welcome'
print("\nUpdated key value: ")
print(Dict)

# Adding Nested Key value to Dictionary


Dict[5] = {'Nested' :{'1' : 'Life', '2' : 'Geeks'}}
print("\nAdding a Nested Key: ")
print(Dict)

Accessing elements from a Dictionary


In order to access the items of a dictionary refer to its key name. Key can be used inside square brackets.
# Python program to demonstrate
# accessing a element from a Dictionary

# Creating a Dictionary
Dict = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}

# accessing a element using key


print("Accessing a element using key:")
print(Dict['name'])

# accessing a element using key


print("Accessing a element using key:")
print(Dict[1])
Removing Elements from Dictionary
Using del keyword
In a Python dictionary, keys can be deleted with the del keyword. Using del, a specific key (together with its
value) or the whole dictionary can be deleted. Items in a nested dictionary can also be deleted by giving the key
of the nested dictionary followed by the key to be deleted from it.

# Initial Dictionary
Dict = { 5 : 'Welcome', 6 : 'To', 7 : 'Geeks',
'A' : {1 : 'Geeks', 2 : 'For', 3 : 'Geeks'},
'B' : {1 : 'Geeks', 2 : 'Life'}}
print("Initial Dictionary: ")
print(Dict)

# Deleting a Key value


del Dict[6]
print("\nDeleting a specific key: ")
print(Dict)

# Deleting a Key from


# Nested Dictionary
del Dict['A'][2]
print("\nDeleting a key from Nested Dictionary: ")
print(Dict)
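
The operations above can be combined with a few other standard dictionary methods. The sketch below uses a made-up marks dictionary to show get(), update(), pop() and items():

```python
marks = {"Asha": 82, "Ravi": 74}

# get() returns a default instead of raising KeyError for a missing key
missing = marks.get("Kiran", 0)

# update() adds new keys and overwrites existing ones in one call
marks.update({"Ravi": 78, "Kiran": 91})

# pop() removes a key and returns its value
removed = marks.pop("Asha")

# items() yields (key, value) pairs in insertion order
for name, score in marks.items():
    print(name, score)
```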

6. Explain the file handling methods of Python with examples. 10M

Working of open() function


Before performing any operation on a file, such as reading or writing, we first have to open it. For this, we use
Python's built-in function open().
At the time of opening, we have to specify the mode, which states the purpose for which the file is opened.
f = open(filename, mode)
# a file named "geek.txt" will be opened in reading mode.
file = open('geek.txt', 'r')
# This will print every line one by one in the file
for each in file:
    print(each)
file.close()
Working of the read() method
There is more than one way to read a file in Python. If you need a single string that contains all the characters
in the file, you can use file.read().

# Python code to illustrate read() mode


file = open("file.txt", "r")
print (file.read())

Creating a file using write mode

# Python code to create a file
file = open('geek.txt','w')
# write() does not add a line break, so "\n" is written explicitly
file.write("This is the write command\n")
file.write("It allows us to write in a particular file\n")
file.close()

Working of append mode

# Python code to illustrate append mode


file = open('geek.txt','a')
file.write("This will add this line")
file.close()

Using write() along with the with statement

# Python code to illustrate the with statement alongside write()
# (the file is closed automatically when the with block ends)
with open("file.txt", "w") as f:
    f.write("Hello World!!!")

split() using file handling


We can also split lines while reading a file in Python. By default, split() breaks a line wherever whitespace is
encountered. You can also split on any other character by passing it as an argument. Here is the code:

# Python code to illustrate split() function
with open("file.txt", "r") as file:
    data = file.readlines()
    for line in data:
        word = line.split()
        print(word)
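
The write, append and read modes can be seen working together in one short sketch. It runs inside a temporary directory so that no real file is touched; the file name notes.txt is an arbitrary choice for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")

    with open(path, "w") as f:   # "w" creates the file (or truncates an existing one)
        f.write("first line\n")

    with open(path, "a") as f:   # "a" appends to the end of the file
        f.write("second line\n")

    with open(path, "r") as f:   # "r" opens for reading
        lines = f.read().splitlines()

print(lines)
```

Using with for every open() call means each file is closed automatically, even if an error occurs inside the block.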

7. Summarize the basic concept of Exceptional handling in python 5M


8. Illustrate exception handing in Python with suitable examples. 10M
Errors in Python are of two types: syntax errors and exceptions. A syntax error is detected before the program
runs and prevents it from executing at all. An exception, on the other hand, is raised at run time when some
event occurs that changes the normal flow of the program.
Exceptions: Exceptions are raised when the program is syntactically correct, but the code results in an error
during execution. If the exception is handled, the program continues after the handler; if it is not handled, the
program stops with a traceback.
# initialize the marks variable
marks = 10000
# perform division by 0 (raises ZeroDivisionError)
a = marks / 0
print(a)

Try and Except Statement – Catching Exceptions


Try and except statements are used to catch and handle exceptions in Python. Statements that can raise exceptions
are kept inside the try clause and the statements that handle the exception are written inside except clause.
# Python program to handle simple runtime error
# Python 3
a = [1, 2, 3]
try:
    print ("Second element = %d" %(a[1]))
    # Raises IndexError since there are only 3 elements in the list
    print ("Fourth element = %d" %(a[3]))
except:
    print ("An error occurred")

Catching Specific Exception


A try statement can have more than one except clause, to specify handlers for different exceptions. Please note
that at most one handler will be executed. For example, we can add IndexError in the above code.
Syntax:
try:
    # statement(s)
except IndexError:
    # statement(s)
except ValueError:
    # statement(s)

Example:
# Program to handle multiple errors with
# separate except clauses
# Python 3
def fun(a):
    if a < 4:
        # raises ZeroDivisionError for a = 3
        b = a/(a-3)
    # for a >= 4, b was never assigned: raises UnboundLocalError,
    # which is a subclass of NameError
    print("Value of b = ", b)

try:
    fun(3)
    # not reached, because fun(3) has already raised an exception
    fun(5)
# only the first matching handler is executed
except ZeroDivisionError:
    print("ZeroDivisionError Occurred and Handled")
except NameError:
    print("NameError Occurred and Handled")

Try with Else Clause


In Python, you can also use an else clause on the try-except block, which must come after all the except
clauses. The code enters the else block only if the try clause does not raise an exception.
# Program to depict else clause with try-except
# Python 3
# Function which prints (a+b)/(a-b)
def AbyB(a , b):
    try:
        c = ((a+b) / (a-b))
    except ZeroDivisionError:
        print ("Cannot divide: a - b is 0")
    else:
        print (c)
# Driver program to test above function
AbyB(2.0, 3.0)
AbyB(3.0, 3.0)
Finally Keyword in Python
Python provides the keyword finally, whose block is always executed after the try and except blocks. The finally
block executes after normal termination of the try block as well as after the try block terminates due to an
exception.
Syntax:
try:
    # Some Code....
except:
    # optional block
    # Handling of exception (if required)
else:
    # execute if no exception
finally:
    # Some code .....(always executed)

Example:
# Python program to demonstrate finally
# An exception is raised in the try block
try:
    k = 5//0 # raises divide by zero exception.
    print(k)
# handles zerodivision exception
except ZeroDivisionError:
    print("Can't divide by zero")
finally:
    # this block is always executed
    # regardless of exception generation.
    print('This is always executed')
Raising Exception
The raise statement allows the programmer to force a specific exception to occur. The sole argument in raise
indicates the exception to be raised. This must be either an exception instance or an exception class (a class that
derives from Exception).
# Program to depict Raising Exception
try:
    raise NameError("Hi there") # Raise Error
except NameError:
    print ("An exception")
    raise # re-raises the caught exception so that it propagates to the caller
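
To see the order in which the try, except, else and finally clauses run, the following sketch records each clause it passes through. The function names safe_divide and check_age are illustrative and not part of any library.

```python
def safe_divide(a, b):
    """Return (a / b, visited clauses); the result is None when b is zero."""
    visited = []
    try:
        result = a / b
    except ZeroDivisionError:
        visited.append("except")   # runs only when the division fails
        result = None
    else:
        visited.append("else")     # runs only when no exception occurred
    finally:
        visited.append("finally")  # runs in both cases
    return result, visited


def check_age(age):
    """Raise ValueError for an invalid age, otherwise return it unchanged."""
    if age < 0:
        raise ValueError("age cannot be negative")
    return age


print(safe_divide(6, 3))
print(safe_divide(6, 0))

try:
    check_age(-1)
except ValueError as err:
    print("Handled:", err)
```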

9. Explain the concept of inheritance with an example. 10M

Inheritance is the capability of one class to derive or inherit the properties from another class. The benefits of
inheritance are:
1. It represents real-world relationships well.
2. It provides reusability of a code. We don’t have to write the same code again and again. Also, it allows
us to add more features to a class without modifying it.
3. It is transitive in nature, which means that if class B inherits from another class A, then all the subclasses
of B would automatically inherit from class A.

# A Python program to demonstrate inheritance

# Base or Super class. Note object in bracket.
# (Generally, object is made ancestor of all classes)
# In Python 3.x "class Person" is
# equivalent to "class Person(object)"
class Person(object):
    # Constructor
    def __init__(self, name):
        self.name = name
    # To get name
    def getName(self):
        return self.name
    # To check if this person is an employee
    def isEmployee(self):
        return False

# Inherited or Subclass (Note Person in bracket)
class Employee(Person):
    # Here we return true
    def isEmployee(self):
        return True

# Driver code
emp = Person("Geek1") # An Object of Person
print(emp.getName(), emp.isEmployee())
emp = Employee("Geek2") # An Object of Employee
print(emp.getName(), emp.isEmployee())
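
A subclass often needs its own constructor while still reusing the one from the base class. A minimal sketch of that pattern, using super(), is given below; the emp_id attribute and the describe() method are additions made for this illustration.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} (person)"


class Employee(Person):
    def __init__(self, name, emp_id):
        super().__init__(name)   # reuse the base-class constructor
        self.emp_id = emp_id     # attribute that only employees have

    def describe(self):          # overrides Person.describe
        return f"{self.name} (employee #{self.emp_id})"


emp = Employee("Geek2", 101)
print(emp.describe())
print(isinstance(emp, Person))   # an Employee is also a Person
```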

10. Implement the creation of constructors in python 5M


Constructors are used to initialize an object when it is instantiated. The task of a constructor is to initialize
(assign values to) the data members of the class when an object of the class is created. In Python, the __init__()
method is called the constructor and is called automatically whenever an object is created.
Syntax of constructor declaration :
def __init__(self):
    # body of the constructor
Types of constructors :

default constructor: The default constructor is a simple constructor which doesn’t accept any arguments. Its
definition has only one argument which is a reference to the instance being constructed.
parameterized constructor: constructor with parameters is known as parameterized constructor. The
parameterized constructor takes its first argument as a reference to the instance being constructed known as self
and the rest of the arguments are provided by the programmer.

Example of default constructor :


class GeekforGeeks:
    # default constructor
    def __init__(self):
        self.geek = "GeekforGeeks"
    # a method for printing data members
    def print_Geek(self):
        print(self.geek)

# creating object of the class
obj = GeekforGeeks()
# calling the instance method using the object obj
obj.print_Geek()

Example of the parameterized constructor :

class Addition:
    first = 0
    second = 0
    answer = 0
    # parameterized constructor
    def __init__(self, f, s):
        self.first = f
        self.second = s
    def display(self):
        print("First number = " + str(self.first))
        print("Second number = " + str(self.second))
        print("Addition of two numbers = " + str(self.answer))
    def calculate(self):
        self.answer = self.first + self.second

# creating object of the class
# this will invoke parameterized constructor
obj = Addition(1000, 2000)
# perform Addition
obj.calculate()
# display result
obj.display()
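
Python does not allow several __init__ methods in one class, so a single constructor with default argument values is a common way to get both behaviours. The Rectangle class below is a hypothetical example of this idea, not taken from the notes above.

```python
class Rectangle:
    # One constructor covers both cases: called without arguments it behaves
    # like a default constructor, with arguments like a parameterized one.
    def __init__(self, width=1, height=1):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


unit = Rectangle()        # uses the default values
custom = Rectangle(3, 4)  # supplies its own values
print(unit.area(), custom.area())
```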

11. Illustrate the creation of class and object with proper syntax and example.
A class is a user-defined blueprint or prototype from which objects are created. Classes provide a means of
bundling data and functionality together. Creating a new class creates a new type of object, allowing new
instances of that type to be made. Each class instance can have attributes attached to it for maintaining its state.
Class instances can also have methods (defined by their class) for modifying their state.
To understand the need for creating a class let’s consider an example, let’s say you wanted to track the number
of dogs that may have different attributes like breed, age. If a list is used, the first element could be the dog’s
breed while the second element could represent its age.
Let’s suppose there are 100 different dogs, then how would you know which element is supposed to be which?
What if you wanted to add other properties to these dogs? This lacks organization and it’s the exact need for
classes.
Class creates a user-defined data structure, which holds its own data members and member functions, which can
be accessed and used by creating an instance of that class.
A class is like a blueprint for an object.

Some points on Python class:


• Classes are created by keyword class.
• Attributes are the variables that belong to a class.
• Attributes are always public and can be accessed using the dot (.) operator. Eg.: Myclass.Myattribute

Class Definition Syntax:

class ClassName:
    # Statement-1
    .
    .
    .
    # Statement-N

Example:
# Python3 program to
# demonstrate defining
# a class

class Dog:
    pass

Class Object
An Object is an instance of a Class. A class is like a blueprint while an instance is a copy of the class with actual
values. It’s not an idea anymore, it’s an actual dog, like a dog of breed pug who’s seven years old. You can have
many dogs to create many different instances, but without the class as a guide, you would be lost, not knowing
what information is required.
An object consists of :
• State: It is represented by the attributes of an object. It also reflects the properties of an object.
• Behavior: It is represented by the methods of an object. It also reflects the response of an object to other
objects.
• Identity: It gives a unique name to an object and enables one object to interact with other objects.

Example:
# Python3 program to
# demonstrate instantiating
# a class
class Dog:
    # A simple class
    # attribute
    attr1 = "mammal"
    attr2 = "dog"
    # A sample method
    def fun(self):
        print("I'm a", self.attr1)
        print("I'm a", self.attr2)

# Driver code
# Object instantiation
Rodger = Dog()

# Accessing class attributes
# and method through objects
print(Rodger.attr1)
Rodger.fun()
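
The dog example from the discussion above (a pug that is seven years old) can be sketched with instance attributes, which hold the state of each individual object, next to a class attribute shared by all objects. The attribute and method names here are chosen for illustration only.

```python
class Dog:
    species = "dog"              # class attribute: shared by every instance

    def __init__(self, breed, age):
        self.breed = breed       # instance attributes: the state of one object
        self.age = age

    def describe(self):          # behaviour: a method that uses the state
        return f"{self.breed}, {self.age} years old"


pug = Dog("pug", 7)
lab = Dog("labrador", 3)

print(pug.describe())
print(lab.describe())
print(pug.species == lab.species)   # both share the class attribute
print(pug is lab)                   # but they are two distinct objects
```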
Unit - 3
01. Briefly discuss how a CSV file can be loaded in python (05 marks)

1. Import the csv library: import csv
2. Open the CSV file.
3. Use the csv.reader object to read the CSV file: csvreader = csv.reader(file)
4. Extract the field names (for example into a list called header).
5. Extract the rows/records.
6. Close the file.
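
The six steps above can be sketched as follows. So that the example is self-contained, it first writes a small sample file into a temporary directory; the file name students.csv and its contents are made up for this illustration.

```python
import csv                                   # step 1: import the csv library
import os
import tempfile

sample = "name,branch\nJagroop,MCA\nPraveen,EEE\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "students.csv")
    with open(path, "w", newline="") as f:   # create the sample file
        f.write(sample)

    with open(path, newline="") as file:     # step 2: open the CSV file
        csvreader = csv.reader(file)         # step 3: create the reader
        header = next(csvreader)             # step 4: first row = field names
        rows = [row for row in csvreader]    # step 5: remaining rows = records
    # step 6: the with statement has closed the file at this point

print(header)
print(rows)
```

Every field is read as a string, so numeric columns have to be converted explicitly (for example with int() or float()) before calculations.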

A CSV (Comma Separated Values) file is a plain-text file format used to store tabular data such as a spreadsheet
or a database table. It stores tabular data consisting of numbers and text as plain text. Most online services let
users export data from the website in CSV format. CSV files generally open in Excel, and nearly all databases
have tools to import them.
Every line of the file is called a record. Each record consists of fields separated by a delimiter. The comma is
the default delimiter; others include the pipe (|) and the semicolon (;). The structure of a normal
comma-separated CSV file can be seen in, for example, the Titanic data set.

Why is CSV File Format Used?

CSV is a plain-text format, which makes data interchange easier and makes the data easy to import into a
spreadsheet or database. For example, you might export the results of a statistical analysis to a CSV file and
then import it into a spreadsheet for further analysis. Overall, it makes the data easy to work with
programmatically. Any language that supports text files and string manipulation, like Python, can
work with CSV files directly.

As an example: CSV lab program

import pandas as pd
# Creating DataFrame
details = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    'NAME': ['Jagroop', 'Praveen', 'Harjot', 'Pooja', 'Rahul', 'Nikita', 'Saurabh', 'Ayush',
             'Dolly', 'Mohit'],
    'BRANCH': ['MCA', 'EEE', 'ISE', 'MECH', 'CSE', 'ECE', 'MBA', 'AERO', 'MCA',
               'CIVIL']})
# Creating DataFrame
fees_status = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    'PENDING': ['5000', '250', 'NIL', '9000', '15000', 'NIL', '4500', '1800', '250', 'NIL']})
# Merging DataFrames on the common ID column
print(pd.merge(details, fees_status, on='ID'))
# Concatenation (stacks the rows of both DataFrames)
print(pd.concat([details, fees_status]))
# Display non-duplicate values
df = pd.DataFrame(fees_status)
# df.duplicated('PENDING') flags rows whose PENDING value has already appeared.
# ~ (NOT) inverts the flags, so only the first occurrence of each value is kept.
non_duplicate = df[~df.duplicated('PENDING')]
# printing non-duplicate values
print(non_duplicate)
# Filtering
df_filtered = df.query('ID>105')
print(df_filtered)
02. Elaborate the steps involved in accessing SQL database in python(10m)

Ans: Python is a high-level, general-purpose, and very popular programming language. It was designed with an
emphasis on code readability, and programmers can express their concepts in fewer lines of code. We can also
use Python with SQL. This answer shows how to connect SQL with Python using the ‘MySQL Connector
Python‘ module. A connection request is sent from Python to MySQL Connector Python, the database accepts
it, and a cursor is then used to execute statements and return the result data.

SQL connection with Python

Connecting SQL with Python

To create a connection between the MySQL database and Python, the connect() method of mysql.connector
module is used. We pass the database details like HostName, username, and the password in the method call, and
then the method returns the connection object.

The following steps are required to connect SQL with Python:

Step 1: Download and install the free MySQL database.

Step 2: After installing the MySQL database, open your Command prompt.

Step 3: Navigate your Command prompt to the location of PIP.

Step 4: Now use pip to download and install “MySQL Connector” (it is published on PyPI under the name
mysql-connector-python). The mysql.connector module will then let you communicate with the MySQL database.
Step 5: To check if the installation was successful, or if you already installed “MySQL Connector”, go to your
IDE and run the given below code :

import mysql.connector

If the above code gets executed with no errors, “MySQL Connector” is ready to be used.

Step 6: Now connect SQL with Python by calling connect() from your IDE. The pieces involved are:
• mysql.connector allows Python programs to access MySQL databases.
• connect() method of the MySQL Connector class with the arguments will connect to MySQL and would
return a MySQLConnection object if the connection is established successfully.
• user = “yourusername”, here “yourusername” should be the same username as you set during MySQL
installation.
• password = “your_password”, here “your_password” should be the same password as you set during
MySQL installation.
• cursor() creates a cursor object, through which SQL statements are executed from Python.
• execute() method of the cursor runs a SQL statement on the database.
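
The connection steps can be sketched as below. This is only an outline under stated assumptions: MySQL Connector/Python must be installed, a MySQL server must be running, and the host name, user name and password are placeholders. The helper names open_mysql_connection and run_query are invented for this example.

```python
def open_mysql_connection(host, user, password):
    """Return a connection object created by mysql.connector.connect()."""
    # Imported inside the function so that run_query() below can be used
    # with any DB-API connection even when MySQL Connector is not installed.
    import mysql.connector
    return mysql.connector.connect(host=host, user=user, password=password)


def run_query(connection, sql):
    """Execute one SQL statement through a cursor and return all result rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


# Usage with placeholder credentials (requires a running MySQL server):
# conn = open_mysql_connection("localhost", "yourusername", "your_password")
# print(run_query(conn, "SHOW DATABASES"))
# conn.close()
```

Closing the cursor in a finally block and the connection after use releases the database resources even when a statement fails.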

03. Illustrate the concept of data normalization in Python 10M

Pandas: Pandas is an open-source library built on top of the NumPy library. It is a Python package that provides
various data structures and operations for manipulating numerical data and statistics. It is popular mainly
because it makes importing and analyzing data much easier. Pandas is fast and offers high performance and
productivity for users.

Data Normalization: Data normalization is a common practice in machine learning that consists of
transforming numeric columns to a common scale. In machine learning, some features have values many times
larger than others, and the features with higher values will dominate the learning process.
Steps Needed

Here, we will apply some techniques to normalize the data and discuss these with the help of examples. For this,
let’s understand the steps needed for data normalization with Pandas.

1. Import Library (Pandas)

2. Import / Load / Create data.

3. Use the technique to normalize the data.

Examples

Here, we create data by some random values and apply some normalization techniques to it.
import pandas as pd # importing packages

df = pd.DataFrame([
    [180000, 110, 18.9, 1400],
    [360000, 905, 23.4, 1800],
    [230000, 230, 14.0, 1300],
    [60000, 450, 13.5, 1500]],
    columns=['Col A', 'Col B', 'Col C', 'Col D']) # create data frame

display(df) # view data (display() is available in Jupyter/IPython; use print(df) in a plain script)

Output :

Normalization techniques one by one.

Using The maximum absolute scaling

The maximum absolute scaling rescales each feature between -1 and 1 by dividing every observation by its
maximum absolute value. We can apply the maximum absolute scaling in Pandas using the .max() and .abs()
methods, as shown below.

df_max_scaled = df.copy() # copy the data

for column in df_max_scaled.columns: # apply normalization techniques
    df_max_scaled[column] = df_max_scaled[column] / df_max_scaled[column].abs().max()

display(df_max_scaled) # view normalized data


Output :

Using The min-max feature scaling

The min-max approach (often called normalization) rescales each feature to the fixed range [0, 1] by
subtracting the minimum value of the feature and then dividing by the range (maximum minus minimum). We
can apply min-max scaling in Pandas using the .min() and .max() methods.

df_min_max_scaled = df.copy() # copy the data

for column in df_min_max_scaled.columns: # apply normalization techniques
    df_min_max_scaled[column] = ((df_min_max_scaled[column] - df_min_max_scaled[column].min()) /
                                 (df_min_max_scaled[column].max() - df_min_max_scaled[column].min()))

print(df_min_max_scaled) # view normalized data

Output :

Using The z-score method

The z-score method (often called standardization) transforms the data into a distribution with a mean of 0 and a
standard deviation of 1. Each standardized value is computed by subtracting the mean of the corresponding
feature and then dividing by the standard deviation.

df_z_scaled = df.copy() # copy the data

for column in df_z_scaled.columns: # apply normalization techniques
    df_z_scaled[column] = (df_z_scaled[column] - df_z_scaled[column].mean()) / df_z_scaled[column].std()

display(df_z_scaled) # view normalized data

Output :

Data normalization consists of transforming numeric columns to a common scale. In Python, we can implement
data normalization in a very simple way. The Pandas library contains multiple built-in methods for calculating
the most common descriptive statistics, which make data normalization techniques very easy to
implement.
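
To make the three formulas concrete, the sketch below applies them by hand to the values of one column ("Col B") using only the standard library. It is meant as a check of the arithmetic, not as a replacement for the Pandas code.

```python
import statistics

values = [110, 905, 230, 450]   # the "Col B" values from the example data

# Maximum absolute scaling: divide by the largest absolute value
max_abs = max(abs(v) for v in values)
max_abs_scaled = [v / max_abs for v in values]

# Min-max scaling: subtract the minimum, divide by the range
lo, hi = min(values), max(values)
min_max_scaled = [(v - lo) / (hi - lo) for v in values]

# Z-score: subtract the mean, divide by the standard deviation.
# statistics.stdev() is the sample standard deviation (n - 1 in the
# denominator), which is also what the Pandas .std() method uses by default.
mean = statistics.mean(values)
std = statistics.stdev(values)
z_scaled = [(v - mean) / std for v in values]

print(max_abs_scaled)
print(min_max_scaled)
print(z_scaled)
```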

04. Demonstrate the working of strip function in python(5m)

Ans : strip() is an inbuilt function in Python programming language that returns a copy of the string with both
leading and trailing characters removed (based on the string argument passed).

Remove spaces at the beginning and at the end of the string:

txt= " banana "


x= txt.strip()

print("of all fruits", x, "is my favorite")

Definition and Usage

The strip() method removes leading characters (at the beginning of the string) and trailing characters (at the
end of the string). Whitespace is removed by default when no argument is given.
Syntax
string.strip(characters)

Parameter Description

characters Optional. A set of characters to remove as leading/trailing characters


05. Depict the aspects of data preprocessing in python 10M
Ans :

Steps involved in data preprocessing :


1. Importing the required Libraries
2. Importing the data set
3. Handling the Missing Data.
4. Encoding Categorical Data.
5. Splitting the data set into test set and training set.
6. Feature Scaling.

Step 1: Importing the required Libraries

Every time we build a new model, we need to import NumPy and Pandas. NumPy is a library that contains
mathematical functions and is used for scientific computing, while Pandas is used to import and manage data
sets.

import pandas as pd
import numpy as np

Here we import the Pandas and NumPy libraries and assign them the shortcuts “pd” and “np” respectively.

Step 2: Importing the Dataset


Data sets are available in .csv format. A CSV file stores tabular data in plain text. Each line of the file is a data
record. We use the read_csv method of the pandas library to read a local CSV file as a dataframe.

dataset = pd.read_csv('Data.csv')

After carefully inspecting our dataset, we create a matrix of features (X) and a dependent vector (y) with their
respective observations. To read the columns, we use iloc of pandas (selection by integer position), which takes
two parameters: [row selection, column selection].
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

Step 3: Handling the Missing Data

An example of Missing data and Imputation


The data we get is rarely homogenous. Sometimes data can be missing and it needs to be handled so that it does
not reduce the performance of our machine learning model.
To do this we can replace the missing data by the mean or median of the entire column. For this we use the
sklearn.impute module, which contains a class called SimpleImputer that takes care of missing data. (Older
tutorials import a class called Imputer from sklearn.preprocessing; that class was removed in scikit-learn 0.22.)

import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values = np.nan, strategy = "mean")

Our object name is imputer. The SimpleImputer class can take parameters like :
1. missing_values : It is the placeholder for the missing values. All occurrences of missing_values will
be imputed. We can give it an integer or np.nan for it to find missing values.
2. strategy : It is the imputation strategy. If “mean”, then missing values are replaced using the mean of
each column. Other strategies include “median” and “most_frequent”.
SimpleImputer always imputes along columns, so the axis parameter of the old Imputer class is no longer needed.

Now we fit the imputer object to our data.

imputer = imputer.fit(X[:, 1:3])

Now replacing the missing values with the mean of the column by using transform method.

X[:, 1:3] = imputer.transform(X[:, 1:3])

Step 4: Encoding categorical data


Converting Categorical data into dummy variables
Any variable that is not quantitative is categorical. Examples include Hair color, gender, field of study, college
attended, political affiliation, status of disease infection.

But why encoding ?


We cannot use values like “Male” and “Female” in mathematical equations of the model so we need to encode
these variables into numbers.
To do this we import “LabelEncoder” class from “sklearn.preprocessing” library and create an object
labelencoder_X of the LabelEncoder class. After that we use the fit_transform method on the categorical features.

After encoding, the numeric labels must not be read as an ordering of the categories in the same column. To
avoid this, we use the OneHotEncoder class from the sklearn.preprocessing library.

One-Hot Encoding
One hot encoding transforms categorical features to a format that works better with classification and regression
algorithms.

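from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
# OneHotEncoder(categorical_features = [0]) was removed in scikit-learn 0.22.
# ColumnTransformer applies the encoder to column 0 and passes the other columns through;
# sparse_threshold = 0 makes it return a dense array.
onehotencoder = ColumnTransformer([("onehot", OneHotEncoder(), [0])],
                                  remainder = "passthrough", sparse_threshold = 0)
X = onehotencoder.fit_transform(X)
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)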

Step 5: Splitting the Data set into Training set and Test Set

Now we divide our data into two sets, one for training our model called the training set and the other for testing
the performance of our model called the test set. The split is generally 80/20. To do this we import the
“train_test_split” method of “sklearn.model_selection” library.

from sklearn.model_selection import train_test_split

Now to build our training and test sets, we will create 4 sets —
1. X_train (training part of the matrix of features),
2. X_test (test part of the matrix of features),
3. Y_train (training part of the dependent variable, associated with the rows of X_train and therefore sharing
the same indices),
4. Y_test (test part of the dependent variable, associated with the rows of X_test and therefore sharing the same
indices).
We assign to them the result of train_test_split, which takes the arrays (X and y), test_size (the fraction of
the data set to place in the test set) and random_state (a seed that makes the shuffle reproducible).

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

Step 6: Feature Scaling

Many machine learning algorithms use the Euclidean distance between two data points in their
computations. Because of this, features with high magnitudes will weigh more in the distance calculations than
features with low magnitudes. To avoid this, feature standardization (Z-score normalization) is used. This is
done with the “StandardScaler” class of “sklearn.preprocessing”.

from sklearn.preprocessing import StandardScaler


sc_X = StandardScaler()

We fit the scaler on X_train and transform it in one step, and then only transform X_test, so that the test set is
scaled with the mean and standard deviation learned from the training set.
The transform function puts all the data on the same standardized scale.

X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
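
To make the Z-score idea concrete, here is a minimal NumPy sketch of the same computation, z = (x - mean) / std. The two small matrices are hypothetical; the point is that the mean and standard deviation come from the training set only and are reused for the test set.

```python
import numpy as np

# Hypothetical feature matrices: column 0 is age, column 1 is salary
X_train = np.array([[25.0, 30000.0], [35.0, 50000.0], [45.0, 70000.0]])
X_test = np.array([[30.0, 40000.0]])

mean = X_train.mean(axis=0)   # per-column mean of the training set
std = X_train.std(axis=0)     # per-column (population) standard deviation

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std   # reuse the training statistics

print(X_train_scaled)
print(X_test_scaled)
```

After scaling, both training columns have mean 0 and standard deviation 1, so age and salary contribute on a comparable scale.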

06. Illustrate the working of the pandas Series object with an example (10m)


Series :A Pandas Series is like a column in a table. It is a one-dimensional array holding data of any type.

Ex Create a simple Pandas Series from a list:


import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
Labels : If nothing else is specified, the values are labeled with their index number. First value has index 0, second
value has index 1 etc. This label can be used to access a specified value.

Ex Return the first value of the Series:


print(myvar[0])
Create Labels : With the index argument, you can name your own labels.

Ex Create your own labels:


import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)

When you have created labels, you can access an item by referring to the label. Ex Return the value of "y":

print(myvar["y"])

Key/Value Objects as Series : You can also use a key/value object, like a dictionary, when creating a Series. Ex
Create a simple Pandas Series from a dictionary:
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)

Note: The keys of the dictionary become the labels.

To select only some of the items in the dictionary, use the index argument and specify only the items you want
to include in the Series. Ex Create a Series using only data from "day1" and "day2":

import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories, index = ["day1", "day2"])
print(myvar)

07. Demonstrate the working of pandas DataFrame with examples(10m)


DataFrames: Data sets in pandas are usually multi-dimensional tables, called DataFrames.

A Series is like a column; a DataFrame is the whole table.

A pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array, or a table with
rows and columns.
Create a simple Pandas DataFrame:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data) #load data into a DataFrame object:
print(df)

result:
   calories  duration
0       420        50
1       380        40
2       390        45

Locate Row : As you can see from the result above, the DataFrame is like a table with rows and columns.

Pandas uses the loc attribute to return one or more specified row(s):
#refer to the row index:
print(df.loc[0])

Result
calories    420
duration     50
Name: 0, dtype: int64

Return row 0 and 1:


#use a list of indexes:
print(df.loc[[0, 1]])

Result :
   calories  duration
0       420        50
1       380        40
Named Indexes: With the index argument, you can name your own indexes.

Ex: Add a list of names to give each row a name:


import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)

Result :
      calories  duration
day1       420        50
day2       380        40
day3       390        45

Locate Named Indexes : Use the named index in the loc attribute to return the specified row(s).

Ex Return "day2":
#refer to the named index:
print(df.loc["day2"])

Result :

calories    380
duration     40
Name: day2, dtype: int64
Load Files Into a DataFrame : If your data sets are stored in a file, Pandas can load them into a DataFrame.

Ex: Load a comma separated file (CSV file) into a DataFrame:

import pandas as pd
df = pd.read_csv('data.csv')
print(df)

Result :
   calories  duration
0       420        50
1       380        40
2       390        45

08. List and explain the steps involved in Data wrangling process in python(10m)

Data Wrangling in Python : Data wrangling is the process of gathering, cleaning, and transforming raw data
into another format so that it can be understood, accessed, and analyzed in less time and can support better
decision-making. Data wrangling is also known as data munging.

Data wrangling is a very important step. The following example shows why:
A book-selling website wants to show the top-selling books of different domains according to user preference. For
example, when a new user searches for motivational books, the site wants to show the motivational books that sell
the most or have a high rating.
But the website holds plenty of raw data from different users. This is where data munging, or data
wrangling, is used.

Data is not wrangled by the system on its own; this work is done by data scientists. The data scientist
wrangles the data so that it can be sorted by which motivational books sold the most, which have high ratings, or
which books users buy together as a package.

On the basis of that, the new user makes a choice. This shows the importance of data wrangling.

Data Wrangling in Python

Data wrangling is a crucial topic for data science and data analysis. The pandas library of Python is used for
data wrangling. Pandas is an open-source library developed specifically for data analysis and data science.

It supports operations such as data sorting, filtering and grouping.

Data wrangling in python deals with the below functionalities:

1. Data exploration: In this process, the data is studied, analyzed and understood by visualizing representations
of the data.
2. Dealing with missing values: Most large datasets contain missing values, shown as NaN. They need to be
taken care of, either by replacing them with the mean or the mode (the most frequent value) of
the column, or simply by dropping the rows that have a NaN value.
3. Reshaping data: In this process, data is manipulated according to the requirements; new data can be
added or pre-existing data can be modified.
4. Filtering data: Sometimes datasets contain unwanted rows or columns, which need to be
removed or filtered out.
5. Other: After applying the above functionalities to the raw dataset we get an efficient dataset that meets our
requirements, and it can then be used for the required purpose, such as data analysis, machine learning, data
visualization, model training etc.

Below is an example which implements the above functionalities on a raw dataset:


• Data exploration, here we assign the data, and then we visualize the data in a tabular format.

# Import the pandas and NumPy packages
import numpy as np
import pandas as pd
# Assign data
data = {'Name': ['Jai', 'Princi', 'Gaurav',
                 'Anuj', 'Ravi', 'Natasha', 'Riya'],
        'Age': [17, 17, 18, 17, 18, 17, 17],
        'Gender': ['M', 'F', 'M', 'M', 'M', 'F', 'F'],
        # np.nan marks a missing value; the string 'NaN' would make the column non-numeric
        'Marks': [90, 76, np.nan, 74, 65, np.nan, 71]}
# Convert into DataFrame
df = pd.DataFrame(data)
# Display data
print(df)
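
The remaining functionalities listed above can be sketched on the same data. This is only one possible treatment: filling with the column mean, recoding Gender as 0/1 and the threshold of 75 marks are illustrative choices, not the only way to wrangle this table.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Ravi', 'Natasha', 'Riya'],
    'Age': [17, 17, 18, 17, 18, 17, 17],
    'Gender': ['M', 'F', 'M', 'M', 'M', 'F', 'F'],
    'Marks': [90, 76, np.nan, 74, 65, np.nan, 71],
})

# Dealing with missing values: replace NaN with the column mean
mean_marks = df['Marks'].mean()              # NaN entries are skipped
df['Marks'] = df['Marks'].fillna(mean_marks)

# Reshaping data: recode the Gender labels as numbers
df['Gender'] = df['Gender'].map({'M': 0, 'F': 1})

# Filtering data: keep only the rows with at least 75 marks
top = df[df['Marks'] >= 75]
print(top)
```

Dropping the incomplete rows with df.dropna() instead of filling them is the other option mentioned in step 2.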
09. Illustrate the reshaping or pivoting operations of python(06m)

A pivot table is an operation that is commonly seen in spreadsheets and other programs that operate on
tabular data. The pivot table takes simple column-wise data as input, and groups the entries into a two-
dimensional table that provides a multidimensional summarization of the data.

Python libraries such as pandas and NumPy have operations for rearranging tabular data, known as reshaping or pivoting operations.
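
As a small illustration of pivoting, the sketch below uses pandas pivot_table on a made-up table of marks (the student and subject names are hypothetical). Each distinct subject becomes a column and each student becomes a row.

```python
import pandas as pd

# Hypothetical long-format data: one row per (student, subject) pair
df = pd.DataFrame({
    'student': ['Jai', 'Jai', 'Riya', 'Riya'],
    'subject': ['math', 'physics', 'math', 'physics'],
    'marks': [90, 80, 71, 85],
})

# Pivot: students as rows, subjects as columns, mean marks as cell values
wide = df.pivot_table(index='student', columns='subject',
                      values='marks', aggfunc='mean')
print(wide)
```

If several rows share the same student and subject, aggfunc decides how they are combined into one cell.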
What is reshaping in Python?
The numpy.reshape() function allows us to reshape an array in Python. Reshaping basically means, changing the
shape of an array. And the shape of an array is determined by the number of elements in each dimension.

Reshaping allows us to add or remove dimensions in an array. We can also change the number of elements in
each dimension.
Syntax and parameters
Here is the syntax of the function:
numpy.reshape(array, shape, order = 'C')

• array: Input array.


• shape: Integers or tuples of integers.
• order: 'C', 'F' or 'A'; this is an optional parameter. 'C' reads and writes the elements in C-like (row-major)
index order, with the last axis changing fastest. 'F' uses Fortran-like (column-major) index order, with the first
axis changing fastest. 'A' uses Fortran-like order if the array is Fortran-contiguous in memory and C-like order otherwise.

Return value: An array which is reshaped without any change to its data.
How to use the numpy.reshape() method in Python?
Let’s take the following one-dimensional NumPy array as an example.

Input:

import numpy as np
x = np.arange(20)
print(x) #prints out the array with its elements
print()
print(x.shape) #prints out the shape of the array
print()
print(x.ndim) #prints out the dimension value of the array
Output:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
(20,)
1
In this case, the numbers [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19] are the elements of the array x. The
output (20,) is the shape of the array. Finally, the value 1 is the number of dimensions of the array.

Now to try out the numpy reshape function, we need to specify the original array as the first argument and the
new shape as the second argument, given as a list or tuple. However, do keep in mind that if the new shape does
not match the number of elements in the original array, a ValueError will occur.

For example, the 1-D array with 20 elements defined above can be converted into a 3-D array of shape (2, 2, 5): the
outermost dimension holds 2 arrays, each of which contains 2 arrays with 5 elements.
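
A minimal sketch of this conversion, together with the ValueError case mentioned earlier. The shape (3, 7) is just an arbitrary shape that cannot hold exactly 20 elements.

```python
import numpy as np

x = np.arange(20)

# 2 * 2 * 5 = 20, so the element count matches
y = np.reshape(x, (2, 2, 5))
print(y)
print(y.shape)   # shape of the reshaped array
print(y.ndim)    # number of dimensions

# 3 * 7 = 21 does not match 20 elements, so NumPy raises ValueError
try:
    np.reshape(x, (3, 7))
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```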
Unit 4
01: Elaborate the process of web scraping in python. (10m)
To extract data using web scraping with python, you need to follow these basic steps:
1. Find the URL that you want to scrape.
2. Inspect the page.
3. Find the data you want to extract.
4. Write the code.
5. Run the code and extract the data.
6. Store the data in the required format.
Web scraping is a technique for extracting a large amount of data from websites. The term "scraping" refers
to obtaining the information from another source (webpages) and saving it into a local file. For example: Suppose
you are working on a project called "Phone comparing website," where you require the prices, ratings, and
model names of mobile phones to make comparisons between them. If you collect these details
by checking various sites by hand, it will take a lot of time. In that case web scraping plays an important role: by
writing a few lines of code you can get the desired results.

The main steps of web scraping are explained below.

Step -1: Find the URL that you want to scrape

First, you should understand what data your project requires. A webpage or website contains a
large amount of information, so scrape only the relevant information. In simple words, the developer should
be familiar with the data requirement.

Step - 2: Inspecting the Page

The data is extracted in raw HTML format, which must be carefully parsed to reduce the noise in the raw
data. In some cases the data can be as simple as a name and address, or as complex as high-dimensional weather and stock
market data.

Step - 3: Write the code

Write code to extract the relevant information, and run the code.

Step - 4: Store the data in the file

Store that information in the required file format, such as CSV, XML or JSON.
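
The steps above can be sketched with the Python standard library alone. Everything specific here is an assumption made for illustration: the HTML snippet, the class names "name" and "price", and the PhoneParser helper are hypothetical. A real scraper would first download the page (for example with urllib.request) and is often written with third-party libraries instead.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content standing in for a downloaded web page
HTML = """
<ul>
  <li class="phone"><span class="name">Phone A</span><span class="price">199</span></li>
  <li class="phone"><span class="name">Phone B</span><span class="price">349</span></li>
</ul>
"""

class PhoneParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []        # extracted [name, price] pairs
        self._field = None    # class of the <span> currently being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field == "name":
            self.rows.append([data.strip(), None])
        elif self._field == "price":
            self.rows[-1][1] = float(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

# Write the code and run it: parse the HTML and extract the data
parser = PhoneParser()
parser.feed(HTML)

# Store the data in CSV format (in memory here; a file object works the same way)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "price"])
writer.writerows(parser.rows)
print(buffer.getvalue())
```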

Why Web Scraping?

As discussed above, web scraping is used to extract data from websites. But we should also know how
to use that raw data. It can be used in various fields. Let's have a look at the uses of web scraping:

o Dynamic Price Monitoring

It is widely used to collect data from several online shopping sites, compare the prices of products and make
profitable pricing decisions. Price monitoring using web-scraped data lets companies know
the market condition and facilitates dynamic pricing. It helps companies stay competitive.

o Market Research
Web scraping is well suited to market trend analysis, that is, gaining insights into a particular
market. Large organizations require a great deal of data, and web scraping can supply that data
at scale.

o Email Gathering

Many companies use personal e-mail data for email marketing. They can target a specific audience for their
marketing.

o News and Content Monitoring

A single news cycle can create an outstanding effect or a genuine threat to your business. If your company
depends on news analysis of an organization, or frequently appears in the news itself, web scraping provides a
way to monitor and parse the most critical stories. News articles and social media platforms can
directly influence the stock market.

o Social Media Scraping

Web scraping plays an essential role in extracting data from social media websites such as Twitter,
Facebook, and Instagram, to find the trending topics.

o Research and Development

Large sets of data such as general information, statistics, and temperature are scraped from websites, and then
analyzed and used to carry out surveys or research and development.

02: Elaborate the creation and working of ndarrays in python with suitable examples.(10m)
Ans : Create a NumPy ndarray Object

NumPy is used to work with arrays. The array object in NumPy is called ndarray. We can create a
NumPy ndarray object by using the array() function.
Ex :
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))

type(): This built-in Python function tells us the type of the object passed to it. In the code above it shows that
arr is of type numpy.ndarray.

To create an ndarray, we can pass a list, tuple or any array-like object into the array() function, and it will be
converted into an ndarray:

Ex Use a tuple to create a NumPy array:

import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)

Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays).
Nested arrays are arrays that have arrays as their elements.
0-D Arrays
0-D arrays, or scalars, are the elements in an array. Each value in an array is a 0-D array.
Ex : Create a 0-D array with value 42

import numpy as np
arr = np.array(42)
print(arr)

1-D Arrays
An array that has 0-D arrays as its elements is called uni-dimensional or 1-D array.
These are the most common and basic arrays.
Ex : Create a 1-D array containing the values 1,2,3,4,5:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)

2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.
These are often used to represent matrices or 2nd order tensors.
NumPy has a whole submodule dedicated to linear algebra on such arrays, called numpy.linalg.
Ex : Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)

3-D arrays
An array that has 2-D arrays (matrices) as its elements is called 3-D array.
These are often used to represent a 3rd order tensor.
Ex : Create a 3-D array with two 2-D arrays, both containing two arrays with the values 1,2,3 and 4,5,6:

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)

Check Number of Dimensions


NumPy arrays provide the ndim attribute, which returns an integer that tells us how many dimensions the array
has.
Ex Check how many dimensions the arrays have:

import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

Higher Dimensional Arrays


An array can have any number of dimensions.
When the array is created, you can define the number of dimensions by using the ndmin argument.
Ex Create an array with 5 dimensions and verify that it has 5 dimensions:
import numpy as np
arr = np.array([1, 2, 3, 4], ndmin=5)
print(arr)
print('number of dimensions :', arr.ndim)

In this array the innermost dimension (5th dim) has 4 elements, the 4th dim has 1 element that is the vector, the
3rd dim has 1 element that is the matrix with the vector, the 2nd dim has 1 element that is 3D array and 1st dim
has 1 element that is a 4D array.
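
The description above can be checked through the shape attribute. This short sketch shows that ndmin pads the shape with leading dimensions of size 1.

```python
import numpy as np

arr = np.array([1, 2, 3, 4], ndmin=5)

# Four leading dimensions of size 1, then the 4 original elements
print(arr.shape)

# Indexing away the four outer dimensions gives back the original vector
inner = arr[0, 0, 0, 0]
print(inner)
```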

03: Briefly explain the NumPy array slicing with example(05m)


Ans : Slicing arrays
Slicing in python means taking elements from one given index to another given index.
We pass a slice instead of an index, like this: [start:end].
We can also define the step, like this: [start:end:step].
If we don't pass start, it is considered 0.
If we don't pass end, it is considered the length of the array in that dimension.
If we don't pass step, it is considered 1.
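
The three defaults can be verified directly. In this small sketch each shortened slice is compared with its fully written form.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

no_start = arr[:3]     # start omitted, same as arr[0:3]
no_end = arr[4:]       # end omitted, same as arr[4:7] (7 is the length)
no_step = arr[1:6]     # step omitted, same as arr[1:6:1]

print(no_start, no_end, no_step)
print(np.array_equal(no_start, arr[0:3]))
print(np.array_equal(no_end, arr[4:7]))
print(np.array_equal(no_step, arr[1:6:1]))
```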

Ex : Slice elements from index 1 to index 5 from the following array:


import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])

Note: The result includes the start index, but excludes the end index.
Ex : Slice elements from index 4 to the end of the array:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])

ex :Slice elements from the beginning to index 4 (not included):


import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[:4])

Negative Slicing

Use the minus operator to refer to an index from the end:


Ex :Slice from the index 3 from the end to index 1 from the end:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[-3:-1])

STEP
Use the step value to determine the step of the slicing:
Ex Return every other element from index 1 to index 5:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])
ex :Return every other element from the entire array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[::2])

Slicing 2-D Arrays


Ex From the second element, slice elements from index 1 to index 4 (not included):
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])

Note: Remember that second element has index 1.


Ex : From both elements, return index 2:

import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 2])

ex : From both elements, slice index 1 to index 4 (not included), this will return a 2-D array:

import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 1:4])

04 : Interpret the python universal function ufunc in brief.(10m)


Ans : ufunc stands for "universal function". ufuncs are NumPy functions that operate element-wise on ndarray objects.
Why use ufuncs?
ufuncs are used to implement vectorization in NumPy, which is typically much faster than iterating over the elements in a Python loop.
They also provide broadcasting and additional methods like reduce, accumulate etc. that are very helpful for
computation.

ufuncs also take additional arguments, like:


where boolean array or condition defining where the operations should take place.
dtype defining the return type of elements.
out output array where the return value should be copied.
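
A short sketch of these arguments, using the built-in ufunc np.add, together with the reduce and accumulate methods mentioned earlier. The arrays and the even-number mask are arbitrary illustration values.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# where + out: add only where a is even; other positions keep what out already holds
out = np.zeros(4, dtype=np.int64)
np.add(a, b, out=out, where=(a % 2 == 0))
print(out)

# dtype: choose the type used for the result
as_float = np.add(a, b, dtype=np.float64)
print(as_float)

# reduce collapses the array with the ufunc; accumulate keeps the running results
total = np.add.reduce(a)
running = np.add.accumulate(a)
print(total, running)
```

Passing out together with where matters: positions where the condition is False are left untouched, so out should be initialized first.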

To create your own ufunc, you have to define a function, like you do with normal functions in Python, then you
convert it into a NumPy ufunc with the frompyfunc() function.

The frompyfunc() function takes the following arguments:


1. function - the name of the function.
2. inputs - the number of input arguments (arrays).
3. outputs - the number of output arrays.

These are the kinds of operations that are carried out with ufuncs:


ufunc Simple Arithmetic
ufunc Rounding Decimals
ufunc Logs
ufunc Summations
ufunc Products
ufunc Differences
ufunc Finding LCM
ufunc Finding GCD
ufunc Trigonometric
ufunc Hyperbolic
ufunc Set Operations

Ex : Create your own ufunc for addition:

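import numpy as np
def myadd(x, y):
    # ordinary Python function that adds two values
    return x+y
# 2 input arguments, 1 output; the result is an array of dtype object
myadd = np.frompyfunc(myadd, 2, 1)
print(myadd([1, 2, 3, 4], [5, 6, 7, 8]))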

05 : Illustrate the working of the DataFrame.aggregate() function in pandas(05m)

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric
Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.
The DataFrame.aggregate() function is used to apply some aggregation across one or more columns. It aggregates using
a callable, string, dict, or list of strings/callables. The most frequently used aggregations are:

sum: Return the sum of the values for the requested axis
min: Return the minimum of the values for the requested axis
max: Return the maximum of the values for the requested axis

Syntax: DataFrame.aggregate(func, axis=0, *args, **kwargs)


Parameters:
func : callable, string, dictionary, or list of string/callables. Function to use for aggregating the data. If a function,
must either work when passed a DataFrame or when passed to DataFrame.apply. For a DataFrame, can pass a
dict, if the keys are DataFrame column names.
axis : (default 0) {0 or ‘index’, 1 or ‘columns’} 0 or ‘index’: apply function to each column. 1 or ‘columns’: apply
function to each row.
Returns: a scalar, Series or DataFrame, depending on the function(s) passed in func

Example #1: Aggregate ‘sum’ and ‘min’ function across all the columns in data frame.
import pandas as pd # importing pandas package
df = pd.read_csv("nba.csv") # making data frame from csv file
df[:10] # the first 10 rows of the dataframe
The sum and min aggregations used here are meaningful only for numeric columns, and recent pandas versions no
longer drop the other columns silently, so the numeric columns are selected first.
# Applying aggregation across all the numeric columns
# sum and min will be found for each
# numeric type column in df dataframe
df.select_dtypes(include='number').aggregate(['sum', 'min'])
Output:
For each column that holds numeric values, the minimum and the sum of all values have been found. For the dataframe
df, there are four such columns: Number, Age, Weight and Salary.

Example #2:
In Pandas, we can also apply different aggregation functions across different columns. For that, we need to pass
a dictionary with key containing the column names and values containing the list of aggregation functions for
any specific column.
import pandas as pd # importing pandas package
df = pd.read_csv("nba.csv") # making data frame from csv file
df.aggregate({"Number":['sum', 'min'],
"Age":['max', 'min'],
"Weight":['min', 'sum'],
"Salary":['sum']}) # We are going to find aggregation for these columns
Output:
A separate aggregation has been applied to each column. If a specific aggregation is not applied to a column,
the corresponding cell holds NaN.
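
Because nba.csv is not included here, the behaviour can be tried on a small made-up table. The column names mirror the example above, but the values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for a few numeric columns of the player table
df = pd.DataFrame({
    "Age": [25, 30, 22],
    "Weight": [180, 220, 200],
    "Salary": [1000, 2500, 1800],
})

# Same aggregations for every column
summary = df.aggregate(["sum", "min"])
print(summary)

# Different aggregations per column; unused combinations become NaN
per_column = df.aggregate({"Age": ["max", "min"], "Salary": ["sum"]})
print(per_column)
```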

06 : Demonstrate the NumPy data types in brief(05m)

Data Types in Python


By default Python has these data types:
• strings - used to represent text data, the text is given under quote marks. e.g. "ABCD"
• integer - used to represent integer numbers. e.g. -1, -2, -3
• float - used to represent real numbers. e.g. 1.2, 42.42
• boolean - used to represent True or False.
• complex - used to represent complex numbers. e.g. 1.0 + 2.0j, 1.5 + 2.5j

Data Types in NumPy


NumPy has some extra data types, and refers to data types with one character, like i for integers, u for unsigned
integers etc. Below is a list of the data types in NumPy and the characters used to represent them.
• i - integer
• b - boolean
• u - unsigned integer
• f - float
• c - complex float
• m - timedelta
• M - datetime
• O - object
• S - string
• U - unicode string
• V - fixed chunk of memory for other type ( void )

Checking the Data Type of an Array
The NumPy array object has a property called dtype that returns the data type of the array:
Ex : Get the data type of an array object:

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr.dtype)

ex : Get the data type of an array containing strings:


import numpy as np
arr = np.array(['apple', 'banana', 'cherry'])
print(arr.dtype)
Creating Arrays With a Defined Data Type
We use the array() function to create arrays, this function can take an optional argument: dtype that allows us to
define the expected data type of the array elements.
Ex : Create an array with data type string:
import numpy as np
arr = np.array([1, 2, 3, 4], dtype='S')
print(arr)
print(arr.dtype)

For i, u, f, S and U we can define size as well.


Ex : Create an array with data type 4 bytes integer:
import numpy as np
arr = np.array([1, 2, 3, 4], dtype='i4')
print(arr)
print(arr.dtype)

Converting Data Type on Existing Arrays


The best way to change the data type of an existing array, is to make a copy of the array with the astype() method.
The astype() function creates a copy of the array, and allows you to specify the data type as a parameter.

The data type can be specified using a string, like 'f' for float, 'i' for integer etc. or you can use the data type directly
like float for float and int for integer.
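
A small sketch of two points about astype(): it returns a new array and leaves the original unchanged, and converting floats to integers drops the fractional part (truncation toward zero). The values are chosen only to make the truncation visible.

```python
import numpy as np

arr = np.array([1.9, -2.7, 3.1])
newarr = arr.astype('i')   # copy with an integer data type

print(newarr)         # fractional parts are dropped, not rounded
print(newarr.dtype)
print(arr)            # the original array still holds floats
print(arr.dtype)
```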

Ex : Change data type from float to integer by using 'i' as parameter value:

import numpy as np
arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype('i')
print(newarr)
print(newarr.dtype)

ex : Change data type from float to integer by using int as parameter value:
import numpy as np
arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype(int)
print(newarr)
print(newarr.dtype)

ex :Change data type from integer to boolean:


import numpy as np
arr = np.array([1, 0, 3])
newarr = arr.astype(bool)
print(newarr)
print(newarr.dtype)

07 : Interpret the following in python functions with suitable code snippet:


1. Sum 2. Min 3. Max (10m)
Ans : The min(), max(), and sum() Tuple Functions
min(): gives the smallest element in the tuple as an output. Hence, the name is min().

For example,
>>> tup = (22, 3, 45, 4, 2.4, 2, 56, 890, 1)
>>> min(tup)
1

max(): gives the largest element in the tuple as an output. Hence, the name is max().

For example,
>>> tup = (22, 3, 45, 4, 2.4, 2, 56, 890, 1)
>>> max(tup)
890

sum(): gives the sum of the elements present in the tuple as an output.

For example,
>>> tup = (22, 3, 45, 4, 2, 56, 890, 1)
>>> sum(tup)
1023

08 : List and explain the basic numerical operations on NumPy arrays(10m)


Ans :
NumPy is a Python package whose name stands for ‘Numerical Python’. It is a library for scientific computing that
provides a powerful n-dimensional array object and tools for integrating C, C++ and similar code. It is also useful
for linear algebra, random number generation and so on. NumPy arrays can also be used as an efficient
multi-dimensional container for generic data.

NumPy Array: A NumPy array is a powerful N-dimensional array object; a 2-D array, for example, is laid out in rows and columns.
We can initialize NumPy arrays from nested Python lists and access their elements.

A Numpy array on a structural level is made up of a combination of:


• The Data pointer indicates the memory address of the first byte in the array.
• The Data type or dtype pointer describes the kind of elements that are contained within the array.
• The shape indicates the shape of the array.
• The strides are the number of bytes that should be skipped in memory to go to the next element.

Operations on Numpy Array


Arithmetic Operations:

# Python code to perform arithmetic operations on NumPy array

import numpy as np
arr1 = np.arange(4).reshape(2, 2) # Initializing the array as [[0, 1], [2, 3]]
print('First array:')
print(arr1)

print('\nSecond array:')
arr2 = np.array([12, 12])
print(arr2)

print('\nAdding the two arrays:')


print(np.add(arr1, arr2))
print('\nSubtracting the two arrays:')
print(np.subtract(arr1, arr2))

print('\nMultiplying the two arrays:')


print(np.multiply(arr1, arr2))

print('\nDividing the two arrays:')


print(np.divide(arr1, arr2))

Output:
First array:
[[ 0 1]
[ 2 3]]
Second array:
[12 12]
Adding the two arrays:
[[ 12 13]
[ 14 15]]
Subtracting the two arrays:
[[-12 -11]
[-10 -9]]
Multiplying the two arrays:
[[ 0 12]
[ 24 36]]
Dividing the two arrays:
[[ 0. 0.08333333]
[ 0.16666667 0.25 ]]

numpy.reciprocal()

This function returns the reciprocal of the argument, element-wise. It is not designed for integer arrays: for
integer elements with absolute values larger than 1 the result is always 0, because integer division is used, and
for integer 0 the result is an overflow.
Example:

# Python code to perform reciprocal operation on NumPy array


import numpy as np
arr = np.array([25, 1.33, 1, 1, 100])
print('Our array is:')
print(arr)
print('\nAfter applying reciprocal function:')
print(np.reciprocal(arr))
arr2 = np.array([25], dtype = int)
print('\nThe second array is:')
print(arr2)
print('\nAfter applying reciprocal function:')
print(np.reciprocal(arr2))

Output

Our array is:


[ 25. 1.33 1. 1. 100. ]
After applying reciprocal function:
[ 0.04 0.7518797 1. 1. 0.01 ]
The second array is:
[25]
After applying reciprocal function:
[0]

numpy.power()
This function treats elements in the first input array as the base and returns it raised to the power of the
corresponding element in the second input array.

# Python code to perform power operation on NumPy array


import numpy as np
arr = np.array([5, 10, 15])
print('First array is:')
print(arr)
print('\nApplying power function:')
print(np.power(arr, 2))
print('\nSecond array is:')
arr1 = np.array([1, 2, 3])
print(arr1)
print('\nApplying power function again:')
print(np.power(arr, arr1))

Output:
First array is:
[ 5 10 15]
Applying power function:
[ 25 100 225]
Second array is:
[1 2 3]
Applying power function again:
[ 5 100 3375]

numpy.mod()
This function returns the remainder of division of the corresponding elements in the input array. The
function numpy.remainder() also produces the same result.
# Python code to perform mod function on NumPy array

import numpy as np
arr = np.array([5, 15, 20])
arr1 = np.array([2, 5, 9])
print('First array:')
print(arr)
print('\nSecond array:')
print(arr1)
print('\nApplying mod() function:')
print(np.mod(arr, arr1))
print('\nApplying remainder() function:')
print(np.remainder(arr, arr1))

Output:
First array:
[ 5 15 20]
Second array:
[2 5 9]
Applying mod() function:
[1 0 2]
Applying remainder() function:
[1 0 2]

09: List the rules of array broadcasting (Computation on arrays) in python(10m)

Ans : NumPy Broadcasting


In mathematical operations, we may need to work with arrays of different shapes. NumPy can perform such
operations where arrays of different shapes are involved.

For example, if we consider element-wise multiplication, the operation is easily performed when the two arrays
have the same shape. However, we may also need to operate on arrays whose shapes are not the same.
Consider the following example, which multiplies two arrays.

Example
import numpy as np
a = np.array([1,2,3,4,5,6,7])
b = np.array([2,4,6,8,10,12,14])
c = a*b
print(c)

Output:
[ 2 8 18 32 50 72 98]

However, in the above example, if we consider arrays of different shapes, we will get the errors as shown below.
Example
import numpy as np
a = np.array([1,2,3,4,5,6,7])
b = np.array([2,4,6,8,10,12,14,19])
c = a*b
print(c)
Output:
ValueError: operands could not be broadcast together with shapes (7,) (8,)

In the above example, the shapes of the two arrays, (7,) and (8,), are not compatible and therefore the arrays
cannot be multiplied together. NumPy can operate on arrays of different shapes only when the shapes are compatible
under the concept of broadcasting.
In broadcasting, the smaller array is broadcast across the larger array to make their shapes compatible with each other.

Broadcasting Rules
Broadcasting follows these rules.
1. If the arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded
with '1' on its leading (left) side.
2. Size of each output dimension is the maximum of the input sizes in that dimension.
3. An input can be used in the calculation if its size in a particular dimension matches the output size or its
value is exactly 1.
4. If the input size is 1, then the first data entry is used for every calculation along that dimension.

Consequently, arrays can be broadcast together if one of the following conditions is satisfied.
1. All the input arrays have the same shape.
2. Arrays have the same number of dimensions, and the length of each dimension is either a common length
or 1.
3. The array with fewer dimensions can have '1' prepended to its shape so that condition 2 holds.

Let's see an example of broadcasting.
Example
import numpy as np
a = np.array([[1,2,3,4],[2,4,5,6],[10,20,39,3]])
b = np.array([2,4,6,8])
print("\nprinting array a..")
print(a)
print("\nprinting array b..")
print(b)
print("\nAdding arrays a and b ..")
c = a + b
print(c)

Output:

printing array a..
[[ 1  2  3  4]
 [ 2  4  5  6]
 [10 20 39  3]]

printing array b..
[2 4 6 8]

Adding arrays a and b ..
[[ 3  6  9 12]
 [ 4  8 11 14]
 [12 24 45 11]]

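
The broadcasting rules listed above can be checked directly. The following is a minimal sketch, not part of the original answer; the arrays col and row are made-up examples. It shows a (3, 1) array and a (4,) array broadcasting to shape (3, 4), and confirms that the earlier pair of shapes (7,) and (8,) is rejected.

```python
import numpy as np

# Hypothetical example arrays (not taken from the notes above).
col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,), treated as (1, 4)

# Each output dimension is the maximum of the input sizes, giving (3, 4).
result = col + row
print(result.shape)
print(result)

# np.broadcast_shapes reports the resulting shape without doing any arithmetic.
print(np.broadcast_shapes((3, 1), (4,)))

# Incompatible shapes: 7 and 8 differ and neither is 1, so a ValueError is raised.
try:
    np.broadcast_shapes((7,), (8,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)
```

Note that np.broadcast_shapes needs a reasonably recent NumPy release; on older versions the same check can be made by attempting the arithmetic itself.
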
numpy.array() in Python
The homogeneous multidimensional array is the main object of NumPy. It is basically a table of elements which
are all of the same type and indexed by a tuple of non-negative integers. The dimensions are called axes in NumPy.
NumPy's array class is known as ndarray, also referred to by the alias array.

The numpy.array is not the same as the standard Python library class array.array. The array.array handles only
one-dimensional arrays and provides less functionality.
Syntax
numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

Parameters
There are the following parameters in numpy.array() function.
1) object: array_like
An array, any object which exposes the array interface, an object whose __array__ method returns an array, or
any (nested) sequence.

2) dtype : optional data-type


This parameter is used to define the desired data type for the array elements. If we do not define the data type,
then it will be determined as the minimum type required to hold the objects in the sequence. This
parameter is intended for upcasting the array; for downcasting, use the astype() method.

3) copy: bool(optional)
If copy is True (the default), the object is copied. Otherwise, a copy is made only when the object is a nested
sequence, or when a copy is needed to satisfy any of the other requirements such as dtype, order, etc.

4) order : {'K', 'A', 'C', 'F'}, optional


The order parameter specifies the memory layout of the array. When the object is not an array, the newly created
array will be in C order (row-major) unless 'F' is specified. When 'F' is specified, it will be in Fortran
order (column-major). When the object is an array, the following holds.

order   no copy     copy=True
'K'     Unchanged   F and C order preserved, otherwise most similar order
'A'     Unchanged   F order if the input is F and not C, otherwise C order
'C'     C order     C order
'F'     F order     F order

When copy=False and a copy is made for other reasons, the result will be the same as copy=True, with some
exceptions for 'A'. The default order is 'K'.

5) subok : bool(optional)
When subok=True, then sub-classes will pass-through; otherwise, the returned array will force to be a base-class
array (default).

6) ndmin : int(optional)
This parameter specifies the minimum number of dimensions which the resulting array should have. Ones will
be prepended to the shape as needed to meet this requirement.
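
The parameters described above can be tried out in a few lines. This is a small illustrative sketch with made-up input values, not an exhaustive tour of numpy.array(); it exercises dtype, ndmin, order and copy.

```python
import numpy as np

# dtype: request a floating-point element type explicitly.
a = np.array([1, 2, 3], dtype=float)
print(a.dtype)

# ndmin: ones are prepended to the shape until it has at least 2 dimensions.
b = np.array([1, 2, 3], ndmin=2)
print(b.shape)

# order='F': store a new 2-D array in column-major (Fortran) layout.
c = np.array([[1, 2], [3, 4]], order='F')
print(c.flags['F_CONTIGUOUS'])

# copy=True (the default): the result does not share memory with the input,
# so changing d leaves a untouched.
d = np.array(a, copy=True)
d[0] = 99.0
print(a[0])
```
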
Unit 5
01 : Demonstrate the concept of data visualization in python(10m)
Ans : Python provides various libraries for visualizing data. Each library comes with different features and
supports various types of graphs. In this tutorial, we will be discussing four
such libraries.

• Matplotlib
• Seaborn
• Bokeh
• Plotly

• Matplotlib is a low level graph plotting library in python that serves as a visualization utility.

• Matplotlib was created by John D. Hunter.

• Matplotlib is open source and we can use it freely.

• Matplotlib is mostly written in python, a few segments are written in C, Objective-C and Javascript for
Platform compatibility.
Plotting x and y points

The plot() function is used to draw points (markers) in a diagram.

By default, the plot() function draws a line from point to point.

The function takes parameters for specifying points in the diagram.

Parameter 1 is an array containing the points on the x-axis.

Parameter 2 is an array containing the points on the y-axis.

If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8] and [3, 10] to the plot function.

Ex : Draw a line in a diagram from position (1, 3) to position (8, 10):

import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints)
plt.show()
The x-axis is the horizontal axis.
The y-axis is the vertical axis.

Plotting Without Line

To plot only the markers, you can use shortcut string notation parameter 'o', which means 'rings'.

Ex :Draw two points in the diagram, one at position (1, 3) and one in position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints, 'o')
plt.show()

Multiple Points

You can plot as many points as you like, just make sure you have the same number of points for both axes.

Ex :Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints)
plt.show()
If we do not specify the points on the x-axis, they will get the default values 0, 1, 2, 3, etc., depending on the
length of the y-points. So, if we pass only the y-points and leave out the x-points, the diagram will
look like this:

Ex : Plotting without x-points:


import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10, 5, 7])
plt.plot(ypoints)
plt.show()

02 : List and explain the different Python Libraries used for data visualization(05m)

Ans : Data Visualization is an extremely important part of Data Analysis. After all, there is no better way to
understand the hidden patterns and layers in the data than seeing them in a visual format! Don’t trust me? Well,
assume that you analyzed your company data and found out that a particular product was consistently losing
money for the company. Your boss may not pay that much attention to a written report but if you present a line
chart with the profits as a red line that is consistently going down, then your boss may pay much more attention!
This shows the power of Data Visualization!
Humans are visual creatures and hence, data visualization charts like bar charts, scatterplots, line charts,
geographical maps, etc. are extremely important. They tell you information just by looking at them whereas
normally you would have to read spreadsheets or text reports to understand the data. Python is one of the
most popular programming languages for data analytics as well as data visualization. Several libraries have
become available in recent years that create beautiful and complex data visualizations. These libraries are popular
because they let analysts and statisticians create visual data models to their own specifications,
with the interface and the data visualization tools provided in one place!

1. Matplotlib
Matplotlib is a data visualization library and 2-D plotting library for Python. It was initially released in 2003 and
it is the most popular and widely-used plotting library in the Python community. It comes with an interactive
environment across multiple platforms. Matplotlib can be used in Python scripts, the Python and IPython shells,
the Jupyter notebook, web application servers, etc. It can be used to embed plots into applications using
various GUI toolkits like Tkinter, GTK+, wxPython, Qt, etc. So you can use Matplotlib to create plots, bar
charts, pie charts, histograms, scatterplots, error charts, power spectra, stemplots, and whatever other
visualization charts you want! The Pyplot module also provides a MATLAB-like interface that is just as
versatile and useful as MATLAB while being free and open source.

2. Plotly
Plotly is a free open-source graphing library that can be used to form data visualizations. Plotly (plotly.py) is
built on top of the Plotly JavaScript library (plotly.js) and can be used to create web-based data visualizations
that can be displayed in Jupyter notebooks or web applications using Dash or saved as individual HTML files.
Plotly provides more than 40 unique chart types like scatter plots, histograms, line charts, bar charts, pie charts,
error bars, box plots, multiple axes, sparklines, dendrograms, 3-D charts, etc. Plotly also provides contour plots,
which are not that common in other data visualization libraries. In addition to all this, Plotly can be used offline
with no internet connection.

3. Seaborn
Seaborn is a Python data visualization library that is based on Matplotlib and closely integrated with the NumPy
and pandas data structures. Seaborn has various dataset-oriented plotting functions that operate on data frames
and arrays that have whole datasets within them. Then it internally performs the necessary statistical
aggregation and mapping functions to create informative plots that the user desires. It is a high-level interface
for creating beautiful and informative statistical graphics that are integral to exploring and understanding data.
The Seaborn data graphics can include bar charts, histograms, scatterplots, error charts, etc. Seaborn
also has various tools for choosing color palettes that can reveal patterns in the data.
4. GGplot
Ggplot is a Python data visualization library that is based on the implementation of ggplot2 which is created
for the programming language R. Ggplot can create data visualizations such as bar charts, pie charts,
histograms, scatterplots, error charts, etc. using high-level API. It also allows you to add different types of data
visualization components or layers in a single visualization. Once ggplot has been told which variables to map
to which aesthetics in the plot, it does the rest of the work so that the user can focus on interpreting the
visualizations and take less time in creating them. But this also means that it is not possible to create highly
customized graphics in ggplot. Ggplot is also deeply connected with pandas so it is best to keep the data in
DataFrames.

5. Altair
Altair is a statistical data visualization library in Python. It is based on Vega and Vega-Lite which are a sort of
declarative language for creating, saving, and sharing data visualization designs that are also interactive. Altair
can be used to create beautiful data visualizations of plots such as bar charts, pie charts, histograms, scatterplots,
error charts, power spectra, stemplots, etc. using a minimal amount of coding. Altair has dependencies which
include python 3.6, entrypoints, jsonschema, NumPy, Pandas, and Toolz which are automatically installed with
the Altair installation commands. You can open Jupyter Notebook or JupyterLab and execute any of the code
to obtain the data visualizations in Altair. Currently, the source for Altair is available on GitHub.

6. Bokeh
Bokeh is a data visualization library that provides detailed graphics with a high level of interactivity across
various datasets, whether they are large or small. Bokeh is based on The Grammar of Graphics like ggplot but
it is native to Python while ggplot is based on ggplot2 from R. Data visualization experts can create various
interactive plots for modern web browsers using bokeh which can be used in interactive web applications,
HTML documents, or JSON objects. Bokeh has 3 levels that can be used for creating visualizations. The first
level focuses only on creating the data plots quickly, the second level controls the basic building blocks of the
plot while the third level provides full autonomy for creating the charts with no pre-set defaults. This level is
suited to the data analysts and IT professionals that are well versed in the technical side of creating data
visualizations.

7. Pygal
Pygal is a Python data visualization library that is made for creating sexy charts! (According to their website!)
While Pygal is similar to Plotly or Bokeh in that it creates data visualization charts that can be embedded into
web pages and accessed using a web browser, a primary difference is that it can output charts in the form of
SVG’s or Scalable Vector Graphics. These SVG’s ensure that you can observe your charts clearly without
losing any of the quality even if you scale them. However, SVG’s are only useful with smaller datasets as too
many data points are difficult to render and the charts can become sluggish.
8. Geoplotlib
Most of the data visualization libraries don’t provide much support for creating maps or using geographical
data and that is why geoplotlib is such an important Python library. It supports the creation of geographical
maps in particular with many different types of maps available such as dot-density maps, choropleths, symbol
maps, etc. One thing to keep in mind is that it requires NumPy and pyglet as prerequisites before installation but
that is not a big disadvantage. Especially since you want to create geographical maps and geoplotlib is the only
excellent option for maps out there!

In conclusion, all these Python Libraries for Data Visualization are great options for creating beautiful and
informative data visualizations. Each of these has its strong points and advantages so you can select the one
that is perfect for your data visualization or project. For example, Matplotlib is extremely popular and well
suited to general 2-D plots while Geoplotlib is uniquely suited to geographical visualizations. So go on and
choose your library to create a stunning visualization in Python!

03 : Elaborate Matplotlib visualization library in depth(10m)


Ans : What Is Python Matplotlib

matplotlib.pyplot is a plotting library used for 2D graphics in the Python programming language. It can be used
in Python scripts, the shell, web application servers and other graphical user interface toolkits.

What is Matplotlib used for

Matplotlib is a Python library used for plotting. It provides object-oriented APIs for integrating plots into
applications.

Is Matplotlib Included in Python

Matplotlib is not part of the standard library that is installed by default with Python. Several toolkits are
available that extend Matplotlib's functionality. Some of them are separate downloads; others ship with the
Matplotlib source code but have external dependencies.
Python Matplotlib : Types of Plots

There are various plots which can be created using python matplotlib. A basic line plot looks like this:
from matplotlib import pyplot as plt
plt.plot([1,2,3],[4,5,1]) #Plotting to our canvas
plt.show() #Showing what we plotted

So, with three lines of code, you can generate a basic graph using python matplotlib. Simple, isn’t it?
Let us see how can we add title, labels to our graph created by python matplotlib library to bring in more meaning
to it. Consider the below example:

from matplotlib import pyplot as plt


x = [5,2,7]
y = [2,16,4]
plt.plot(x,y)
plt.title('Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
You can even try many styling techniques to create a better graph. What if you want to change the width or color
of a particular line or what if you want to have some grid lines, there you need styling! So, let me show you how
to add style to a graph using python matplotlib. First, you need to import the style package from python matplotlib
library and then use styling functions as shown in below code:

from matplotlib import pyplot as plt


from matplotlib import style
style.use('ggplot')
x = [5,8,10]
y = [12,16,6]
x2 = [6,9,11]
y2 = [6,15,7]
plt.plot(x,y,'g',label='line one', linewidth=5)
plt.plot(x2,y2,'c',label='line two',linewidth=5)
plt.title('Epic Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.legend()
plt.grid(True,color='k')
plt.show()

Python Matplotlib: Bar Graph


First, let us understand why we need a bar graph. A bar graph uses bars to compare data among different
categories. It is well suited when you want to measure changes over a period of time. It can be drawn
horizontally or vertically. Also, the important thing to keep in mind is that the longer the bar, the greater the
value. Now, let us practically implement it using python matplotlib.
from matplotlib import pyplot as plt
plt.bar([0.25,1.25,2.25,3.25,4.25],[50,40,70,80,20],
label="BMW",width=.5)
plt.bar([.75,1.75,2.75,3.75,4.75],[80,20,20,50,60],
label="Audi", color='r',width=.5)
plt.legend()
plt.xlabel('Days')
plt.ylabel('Distance (kms)')
plt.title('Information')
plt.show()

In the above plot, I have displayed the comparison between the distance covered by two cars BMW and Audi
over a period of 5 days. Next, let us move on to another kind of plot using python matplotlib – Histogram.

Python Matplotlib – Histogram


Let me first tell you the difference between a bar graph and a histogram. Histograms are used to show a
distribution whereas a bar chart is used to compare different entities. Histograms are useful when you have arrays
or a very long list. Let’s consider an example where I have to plot the age of population with respect to bin. Now,
bin refers to the range of values that are divided into series of intervals. Bins are usually created of the same size.
In the below code, I have created the bins in the interval of 10 which means the first bin contains elements from
0 to 9, then 10 to 19 and so on.

import matplotlib.pyplot as plt


population_age = [22,55,62,45,21,22,34,42,42,4,2,102,95,85,55,110,120,70,65,55,111,115,80,75,65,54,44,43,42,48]
bins = [0,10,20,30,40,50,60,70,80,90,100]
plt.hist(population_age, bins, histtype='bar', rwidth=0.8)
plt.xlabel('age groups')
plt.ylabel('Number of people')
plt.title('Histogram')
plt.show()
Output –

As you can see in the above plot, we got age groups with respect to the bins. Our biggest age group is between
40 and 50.
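
The bin counts behind this histogram can be verified numerically. The sketch below is an added illustration that reuses the population_age and bins lists from the code above and counts the values per bin with np.histogram, which applies the same rule as plt.hist: each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
import numpy as np

population_age = [22,55,62,45,21,22,34,42,42,4,2,102,95,85,55,110,120,70,65,55,
                  111,115,80,75,65,54,44,43,42,48]
bins = [0,10,20,30,40,50,60,70,80,90,100]

# counts[i] is the number of ages falling between edges[i] and edges[i+1].
counts, edges = np.histogram(population_age, bins=bins)
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo}-{hi}: {n}")

# Ages above the last bin edge (100) are not counted at all.
print("values counted:", counts.sum(), "of", len(population_age))
```

The 40 to 50 bin has the highest count, which agrees with the plot. Five ages in the list are above 100, so they fall outside the bins and do not appear in the histogram.
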

Python Matplotlib : Scatter Plot


Usually we need scatter plots in order to compare variables, for example, how much one variable is affected by
another variable to build a relation out of it. The data is displayed as a collection of points, each having the value
of one variable which determines the position on the horizontal axis and the value of other variable determines
the position on the vertical axis.

Consider the below example:

import matplotlib.pyplot as plt


x = [1,1.5,2,2.5,3,3.5,3.6]
y = [7.5,8,8.5,9,9.5,10,10.5]
x1=[8,8.5,9,9.5,10,10.5,11]
y1=[3,3.5,3.7,4,4.5,5,5.2]
plt.scatter(x,y, label='high income low saving',color='r')
plt.scatter(x1,y1,label='low income high savings',color='b')
plt.xlabel('saving*100')
plt.ylabel('income*1000')
plt.title('Scatter Plot')
plt.legend()
plt.show()

Output :

As you can see in the above graph, I have plotted two scatter plots based on the inputs specified in the above
code. The data is displayed as two collections of points labelled ‘high income low saving’ and ‘low income high
savings’.

Python Matplotlib : Area Plot

Area plots are pretty much similar to the line plot. They are also known as stack plots. These plots can be used to
track changes over time for two or more related groups that make up one whole category. For example, let’s
compile the work done during a day into categories, say sleeping, eating, working and playing. Consider the
below code:

import matplotlib.pyplot as plt


days = [1,2,3,4,5]
sleeping =[7,8,6,11,7]
eating = [2,3,4,3,2]
working =[7,8,7,2,2]
playing = [8,5,7,8,13]
plt.plot([],[],color='m', label='Sleeping', linewidth=5)
plt.plot([],[],color='c', label='Eating', linewidth=5)
plt.plot([],[],color='r', label='Working', linewidth=5)
plt.plot([],[],color='k', label='Playing', linewidth=5)
plt.stackplot(days, sleeping,eating,working,playing, colors=['m','c','r','k'])
plt.xlabel('x')
plt.ylabel('y')
plt.title('Stack Plot')
plt.legend()
plt.show()

Output –

As we can see in the above image, we have time spent based on the categories. Therefore, area plot or stack plot
is used to show trends over time, among different attributes. Next, let us move to our last yet most frequently
used plot – Pie chart.

Python Matplotlib : Pie Chart


A pie chart refers to a circular graph which is broken down into segments i.e. slices of pie. It is basically used to
show the percentage or proportional data where each slice of pie represents a category. Let’s have a look at the
below example:

import matplotlib.pyplot as plt


days = [1,2,3,4,5]
sleeping =[7,8,6,11,7]
eating = [2,3,4,3,2]
working =[7,8,7,2,2]
playing = [8,5,7,8,13]
slices = [7,2,2,13]
activities = ['sleeping','eating','working','playing']
cols = ['c','m','r','b']
plt.pie(slices,
labels=activities,
colors=cols,
startangle=90,
shadow= True,
explode=(0,0.1,0,0),
autopct='%1.1f%%')
plt.title('Pie Plot')
plt.show()

In the above pie chart, I have divided the circle into 4 sectors or slices which represent the respective categories
(sleeping, eating, working and playing) along with the percentage they hold. The slices here add up to 24 hrs,
but the percentage for each slice is calculated automatically for you. In this way, pie charts are
really useful as you don’t have to be the one who calculates the percentage or the slice of the pie.
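
To see what is being calculated automatically, the percentages can be worked out by hand. This short sketch is an added illustration that reuses the slices and activities lists from the code above and divides each slice by the total, which is the value that the autopct format string displays.

```python
slices = [7, 2, 2, 13]
activities = ['sleeping', 'eating', 'working', 'playing']

# Each slice's share is its value divided by the sum of all slices.
total = sum(slices)
percentages = [100 * hours / total for hours in slices]

for name, pct in zip(activities, percentages):
    # '%1.1f%%' in plt.pie formats the share with one decimal place.
    print(f"{name}: {pct:.1f}%")
```
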

Next in python matplotlib, let’s understand how to work with multiple plots.

Python Matplotlib : Working With Multiple Plots

I have discussed multiple types of plots in python matplotlib such as bar plot, scatter plot, pie plot, area
plot etc. Now, let me show you how to handle multiple plots. For this, I have to import the numpy module.
Let me implement it practically, consider the below example.

import numpy as np
import matplotlib.pyplot as plt

# Damped cosine used as sample data
def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)

t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)
plt.subplot(221)
plt.plot(t1, f(t1), 'bo', t2, f(t2))
plt.subplot(222)
plt.plot(t2, np.cos(2*np.pi*t2))
plt.show()

The code is pretty much similar to the previous examples that you have seen but there is one new concept here
i.e. subplot. The subplot() command takes numrows, numcols and fignum, where fignum ranges from 1 to
numrows*numcols. The commas in this command are optional if numrows*numcols<10. So subplot (221) is
identical to subplot (2,2,1). Subplots therefore help us to draw multiple graphs in one figure, arranged
vertically or horizontally. In the above example, I have aligned them horizontally.
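
The numbering scheme can be made concrete with a small helper. The two functions below are hypothetical illustrations written for these notes (they are not part of matplotlib); they split a three-digit code such as 221 into its parts and work out which grid cell a given fignum refers to, assuming the row-by-row numbering that starts at 1 in the upper-left corner.

```python
def split_subplot_code(code):
    """Split a three-digit subplot code such as 221 into (rows, cols, index)."""
    digits = str(code)
    if len(digits) != 3:
        raise ValueError("the shorthand form needs exactly three digits")
    rows, cols, index = (int(ch) for ch in digits)
    if not 1 <= index <= rows * cols:
        raise ValueError("index must be between 1 and rows*cols")
    return rows, cols, index


def grid_position(rows, cols, index):
    """Return the zero-based (row, column) of a 1-based index, counted row by row."""
    return divmod(index - 1, cols)


print(split_subplot_code(221))   # same arguments as subplot(2, 2, 1)
print(grid_position(2, 2, 1))    # top-left cell
print(grid_position(2, 2, 2))    # top-right cell
```

Both subplot(221) and subplot(222) land in the top row of the 2 x 2 grid, which is why the two graphs in the example appear side by side.
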

Python matplotlib also has some disadvantages. Some of them are listed below:

• It is heavily reliant on other packages, such as NumPy.

• It only works for python, so it is hard or impossible to use from languages other than python. (But it can
be used from Julia via the PyPlot package).

04 : Illustrate Seaborn visualization library(10m)

05: Elaborate the countplot and histogram features of visualization

Ans : Seaborn is a Python data visualization library based on the Matplotlib library. It provides a high-level
interface for drawing attractive and informative statistical graphs. Here in this article, we’ll learn how to create
basic plots using the Seaborn library. Such as:

▪ Scatter Plot
▪ Histogram
▪ Bar Plot
▪ Box and Whiskers Plot
▪ Pairwise Plots

Scatter Plot:

Scatter plots can be used to show the relationship between two variables. Using the seaborn library,
a scatter plot of price vs age with default arguments will be like this:

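import matplotlib.pyplot as plt
import seaborn as sns

# cars_data is assumed to be a pandas DataFrame with "Age" and "Price" columns
plt.style.use("ggplot")
plt.figure(figsize=(8,6))
sns.regplot(x = cars_data["Age"], y = cars_data["Price"])
plt.show()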

Here, regplot means Regression Plot. By default fit_reg = True. It estimates and plots a regression model
relating the x and y variable.
Histogram:

In order to draw a histogram in Seaborn, we have a function called distplot and inside that, we have to pass the
variable which we want to include. (In recent Seaborn releases distplot is deprecated in favour of histplot and
displot.) Histogram with default kernel density estimate:

plt.figure(figsize=(8,6))
sns.distplot(cars_data['Age'])
plt.show()

For the x-axis, we are giving Age, and the histogram by default includes a kernel density estimate (kde). The kernel
density estimate is the smooth curve drawn over the bins that follows the frequency of the Ages. If you want to
remove the kernel density estimate (kde) then use kde = False.

plt.figure(figsize=(8,6))
sns.distplot(cars_data['Age'],kde=False)
plt.show()
After that, you got frequency as the y-axis and the age of the car as the x-axis. If you want to organize all the
different intervals or bins, you can use the bins parameter on the distplot function. Let’s use bins = 5 on
the distplot function. It will organize your bins into five bins or intervals.

plt.figure(figsize=(8,6))
sns.distplot(cars_data['Age'],kde=False,bins=5)
plt.show()

Now you can say that from age 65 to 80 we have more than 500 cars.

Bar Plot:

Bar plot is for categorical variables. The bar plot is a commonly used plot because of its simplicity and because
data is easy to understand through it. You can plot a barplot in seaborn using the countplot function. It’s really simple.
Let’s plot a barplot of FuelType.

plt.figure(figsize=(8,6))
sns.countplot(x="FuelType", data=cars_data)
plt.show()
In the y-axis, we have got the frequency distribution of FuelType of the cars.

Grouped Bar Plot:


We can plot a barplot between two variables. That’s called grouped barplot. Let’s plot a barplot of FuelType
distributed by different values of the Automatic column.

plt.figure(figsize=(8,6))
sns.countplot(x="FuelType", data=cars_data,
hue="Automatic")
plt.show()
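
The bar heights that countplot draws are simply category counts, so they can be checked with pandas alone. The sketch below is an added illustration; the small cars_data frame is a made-up stand-in for the real dataset, which is not included in these notes. value_counts() gives the numbers behind the plain bar plot and pd.crosstab() gives the numbers behind the grouped bar plot with hue="Automatic".

```python
import pandas as pd

# Hypothetical stand-in for cars_data (the real dataset is not shown in the notes).
cars_data = pd.DataFrame({
    "FuelType": ["Petrol", "Diesel", "Petrol", "CNG", "Petrol", "Diesel"],
    "Automatic": [0, 0, 1, 0, 0, 1],
})

# One count per category: the bar heights of sns.countplot(x="FuelType", ...).
fuel_counts = cars_data["FuelType"].value_counts()
print(fuel_counts)

# Counts per category, split by a second column: the grouped bars with hue="Automatic".
grouped_counts = pd.crosstab(cars_data["FuelType"], cars_data["Automatic"])
print(grouped_counts)
```
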

05 : With a snippet of code demonstrate the univariate (box plot) nature of visualization.

Ans : BOX PLOTS :


A box-plot is a very useful and standardized way of displaying the distribution of data based on a five-number
summary (minimum, first quartile, second quartile(median), third quartile, maximum). It helps in understanding
these parameters of the distribution of data and is extremely helpful in detecting outliers.
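
The five-number summary can be computed directly before drawing anything. The following is a minimal sketch with made-up measurements. The 1.5 x IQR fences used to flag outliers are an assumption here: they are a common convention for drawing box plot whiskers, not something stated in the text above.

```python
import numpy as np

# Hypothetical measurements (not taken from the dataset used in the notes).
values = np.array([2.0, 2.9, 3.0, 3.0, 3.1, 3.2, 3.4, 3.5, 4.4])

# Five-number summary: minimum, first quartile, median, third quartile, maximum.
minimum, q1, median, q3, maximum = np.percentile(values, [0, 25, 50, 75, 100])
print(minimum, q1, median, q3, maximum)

# Assumed convention: points beyond 1.5 * IQR from the quartiles count as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(outliers)
```
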
Plotting box plot of variable ‘sepal.width’ :

Plotting box plots of all variables in one frame :

Since the box plot is for continuous variables, firstly create a data frame without the column ‘variety’. Then drop
the column from the DataFrame using the drop( ) function and specify axis=1 to indicate it.
In matplotlib, mention the labels separately to display it in the output.

The plotting box plot in seaborn :


Plotting the box plots of all variables in one frame :
Apply the pandas function pd.melt() on the modified data frame which is then passed onto
the sns.boxplot() function.
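
The data preparation described above can be sketched as follows. The three-row frame is a made-up stand-in for the real dataset; only the column names 'sepal.width' and 'variety' come from the text, and 'sepal.length' is an assumed additional numeric column.

```python
import pandas as pd

# Hypothetical three-row stand-in for the dataset.
df = pd.DataFrame({
    "sepal.length": [5.1, 7.0, 6.3],
    "sepal.width": [3.5, 3.2, 3.3],
    "variety": ["Setosa", "Versicolor", "Virginica"],
})

# Drop the categorical column; axis=1 means "drop a column", not a row.
numeric_df = df.drop("variety", axis=1)

# pd.melt reshapes wide data to long form with 'variable' and 'value' columns.
melted = pd.melt(numeric_df)
print(melted)
```

The long-form frame has one row per measurement, so it can be handed to seaborn as sns.boxplot(x="variable", y="value", data=melted) to draw one box per original column.
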

06 : With a snippet of code demonstrate the bivariate (scatter plot) nature of visualization

UNIVARIATE SCATTER PLOT :

This plots different observations/values of the same variable corresponding to the index/observation number.
Consider plotting of the variable ‘sepal length(cm)’ :
Use the plt.scatter() function of matplotlib to plot a univariate scatter diagram. The scatter() function requires
two parameters to plot. So, in this example, we plot the variable ‘sepal.width’ against the corresponding
observation number that is stored as the index of the data frame (df.index).

Then visualize the same plot by considering its variety using the sns.scatterplot() function of the seaborn library.

One of the interesting features in seaborn is the ‘hue’ parameter. In seaborn, the hue parameter determines which
column in the data frame should be used for color encoding. This helps to differentiate between the data values
according to the categories they belong to. The hue parameter takes the grouping variable as its input using
which it will produce points with different colors. The variable passed onto ‘hue’ can be either categorical or
numeric, although color mapping will behave differently in the latter case.
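
A univariate scatter plot coloured by category can be sketched with matplotlib alone. The example below is an added illustration with made-up values; it plots each observation against its observation number and imitates the effect of seaborn's hue parameter by calling scatter() once per category. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical values standing in for the 'sepal.width' and 'variety' columns.
sepal_width = [3.5, 3.0, 3.2, 2.8, 3.3, 2.7]
variety = ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica", "Virginica"]
index = list(range(len(sepal_width)))  # observation numbers, like df.index

fig, ax = plt.subplots()
# One scatter() call per category gives each category its own colour.
for name in sorted(set(variety)):
    xs = [i for i, v in zip(index, variety) if v == name]
    ys = [w for w, v in zip(sepal_width, variety) if v == name]
    ax.scatter(xs, ys, label=name)

ax.set_xlabel("observation number")
ax.set_ylabel("sepal.width")
ax.legend(title="variety")
```

In an interactive session, plt.show() displays the figure. With seaborn, the loop would be replaced by a single sns.scatterplot() call that passes the category column to hue.
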

Note: Every function has a wide variety of parameters to play with to produce better results. If one is using
Jupyter notebook, the various parameters of the function used can be explored by using the ‘Shift+Tab’ shortcut.

08: Illustrate the time series analysis in python


Ans : A time series is a sequence of observations over a certain period. The simplest example of a time series
that all of us come across on a day to day basis is the change in temperature throughout the day or week or month
or year.
The analysis of temporal data is capable of giving us useful insights on how a variable changes over time.
What is a Time Series?

Time series is a sequence of observations recorded at regular time intervals. Depending on the frequency of
observations, a time series may typically be hourly, daily, weekly, monthly, quarterly and annual.
Sometimes, you might have seconds and minute-wise time series as well, like, number of clicks and user
visits every minute etc. Why even analyze a time series? Because it is the preparatory step before you
develop a forecast of the series. Besides, time series forecasting has enormous commercial significance
because stuff that is important to a business like demand and sales, number of visitors to a website, stock
price etc are essentially time series data. So what does analyzing a time series involve? Time series analysis
involves understanding various aspects about the inherent nature of the series so that you are better informed
to create meaningful and accurate forecasts.
How to import time series in python?

So how to import time series data? The data for a time series is typically stored in .csv files or other spreadsheet
formats and contains two columns: the date and the measured value. Let’s use read_csv() from the pandas
package to read the time series dataset (a csv file on Australian Drug Sales) as a pandas dataframe. Adding
the parse_dates=['date'] argument makes pandas parse the date column as a date field.

from dateutil.parser import parse


import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})
# Import as Dataframe
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
df.head()

Dataframe Time Series


Alternately, you can import it with the date as the index. You just need to specify
the index_col argument in pd.read_csv() to do this. The result is still a DataFrame with a single value
column; selecting that column gives a pandas Series.

ser = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                  parse_dates=['date'], index_col='date')
ser.head()
Series Timeseries
Note, in the printed output, the ‘value’ column header is placed higher than date to show that date is the index.
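
The effect of parse_dates and index_col can be seen without downloading anything. The sketch below is an added illustration; the three CSV rows are made-up values that only imitate the date and value layout of the real file. It reads the text once with a parsed date column and once with the date as the index.

```python
import io
import pandas as pd

# Hypothetical CSV text in the same two-column layout (date, value).
csv_text = """date,value
1991-07-01,3.5
1991-08-01,3.2
1991-09-01,3.3
"""

# parse_dates turns the text column into datetime64 values.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
print(df.dtypes)

# index_col makes the parsed dates the index (a DatetimeIndex).
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')
print(type(indexed.index).__name__)

# Selecting the single column gives a pandas Series indexed by date.
ser = indexed['value']
print(ser.head())
```
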

Visualizing a time series

Let’s use matplotlib to visualise the series.

# Time series data source: fpp package in R.
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'],
                 index_col='date')

# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

plot_df(df, x=df.index, y=df.value, title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')
Visualizing Time Series
Patterns in a time series

Any time series may be split into the following components: Base Level + Trend + Seasonality + Error. A
trend is observed when there is an increasing or decreasing slope in the time series, whereas
seasonality is observed when there is a distinct repeated pattern at regular intervals due to
seasonal factors. It could be because of the month of the year, the day of the month, weekdays or even time
of the day. However, it is not mandatory that all time series have a trend and/or seasonality. A time
series may not have a distinct trend but have a seasonality. The opposite can also be true. So, a time series
may be imagined as a combination of the trend, seasonality and the error terms.

fig, axes = plt.subplots(1,3, figsize=(20,4), dpi=100)
pd.read_csv('hello.csv', parse_dates=['date'], index_col='date').plot(title='Trend Only', legend=False,
                                                                      ax=axes[0])
pd.read_csv('hii.csv', parse_dates=['date'], index_col='date').plot(title='Seasonality Only', legend=False,
                                                                    ax=axes[1])
pd.read_csv('ok.csv', parse_dates=['date'], index_col='date').plot(title='Trend and Seasonality',
                                                                   legend=False, ax=axes[2])

Patterns in Time Series
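
The additive view of a series (base level + trend + seasonality + error) can be illustrated with synthetic data. The sketch below is an added example and every number in it is made up: a constant base level, a linear trend, a 12-month sine wave for the seasonality and random noise for the error are summed into one monthly pandas Series.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the noise is reproducible

# Four years of month-start dates (hypothetical period).
dates = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(len(dates))

base_level = 10.0
trend = 0.5 * t                                   # steady upward slope
seasonality = 3.0 * np.sin(2 * np.pi * t / 12)    # pattern repeating every 12 months
error = rng.normal(scale=0.5, size=len(t))        # random noise

# The observed series is the sum of the four components.
series = pd.Series(base_level + trend + seasonality + error, index=dates)
print(series.head())
```

Setting the trend slope to zero leaves a seasonality-only series, and setting the sine amplitude to zero leaves a trend-only series, matching the three cases plotted above.
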

Another aspect to consider is the cyclic behaviour. It happens when the rise and fall pattern in the series
does not happen in fixed calendar-based intervals. Care should be taken to not confuse ‘cyclic’ effect with
‘seasonal’ effect. So, how to differentiate between a ‘cyclic’ and a ‘seasonal’ pattern? If the patterns are not of
fixed calendar-based frequencies, then it is cyclic. Unlike seasonality, cyclic effects are
typically influenced by business and other socio-economic factors.
