2022-2023 Python For Data Science

By GTU Medium

GTU Medium
PYTHON FOR DATA SCIENCE

Unit-1: Overview of Python and Data Structures

Question 1. List Advantages of Python.


Advantages of Python:

1. Easy to Read, Learn and Write


Python is a high-level programming language that has English-like syntax. This makes it easier to
read and understand the code.
Python is easy to pick up and learn, which is why many people recommend it to beginners. You
need fewer lines of code to perform the same task than in other major languages such as C/C++
and Java.
2. Improved Productivity
Python is a very productive language. Due to the simplicity of Python, developers can focus on
solving the problem. They don’t need to spend too much time in understanding the syntax or
behavior of the programming language. You write less code and get more things done.
3. Interpreted Language
Python is an interpreted language, which means that the interpreter executes the code statement
by statement. If an error occurs at run time, it stops further execution and reports the error.
Python reports only the first error it encounters, even if the program contains several. This makes
debugging easier.
4. Dynamically Typed

Python doesn’t know the type of variable until we run the code. It automatically assigns the data
type during execution. The programmer doesn’t need to worry about declaring variables and their
data types.
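
A minimal sketch of dynamic typing, assuming nothing beyond the built-in type() function; the variable names are illustrative only. The same name is rebound to objects of different types without any declaration.

```python
# The same name can refer to objects of different types at run time.
value = 42
first_type = type(value).__name__    # the name now refers to an int

value = "forty-two"
second_type = type(value).__name__   # the same name now refers to a str

value = [4, 2]
third_type = type(value).__name__    # and now to a list

print(first_type, second_type, third_type)   # int str list
```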

5. Free and Open-Source


Python comes under the OSI approved open-source license. This makes it free to use and
distribute. You can download the source code, modify it and even distribute your version of
Python. This is useful for organizations that want to modify some specific behavior and use their
version for development.

6. Vast Libraries Support


The standard library of Python is huge, and you can find almost all the functions needed for your
task, so you often don’t have to depend on external libraries.
But even if you do, the Python package manager (pip) makes it easy to install other great
packages from the Python Package Index (PyPI), which hosts over 200,000 packages.
7. Portability
In many languages like C/C++, you need to change your code to run the program on different
platforms. That is not the same with Python. You only write once and run it anywhere.

Question 2. Explain Data Types of Python with suitable example.

Numbers: The Number type stores numeric values. Integer, float, and complex values belong to the
Python Numbers data type. Python provides the type() function to find the data type of a
variable. Similarly, the isinstance() function checks whether an object belongs to a particular class.
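
The two functions mentioned above can be tried out as follows. This is a small sketch; the variable names and values are made up for illustration.

```python
count = 7          # int
price = 19.99      # float
signal = 3 + 4j    # complex

# type() reports the class of each object.
for number in (count, price, signal):
    print(number, type(number))

# isinstance() answers a yes/no question about class membership.
is_int = isinstance(count, int)
is_real = isinstance(price, (int, float))   # a tuple of classes is allowed
is_complex = isinstance(signal, complex)
print(is_int, is_real, is_complex)          # True True True
```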

Sequence Type: A string is a sequence of characters enclosed in quotation marks. In Python, we
can use single, double, or triple quotes to define a string.
String handling in Python is straightforward since Python provides built-in functions and
operators to perform operations on strings.

str1 = 'hello javatpoint'

In the case of string handling, the operator + is used to concatenate two strings: the operation
"hello"+" python" returns "hello python".
The operator * is known as the repetition operator: the operation "Python" *2 returns
'PythonPython'.
Python Lists are similar to arrays in C. However, the list can contain data of different types. The
items stored in the list are separated with a comma (,) and enclosed within square brackets [].

list1 = [1, "hi", "Python", 2]

We can use the slice [:] operator to access the data of the list. The concatenation operator (+) and
repetition operator (*) work with lists in the same way as they work with strings.
A tuple is similar to a list in many ways. Like lists, tuples can contain a collection of items
of different data types. The items of a tuple are separated with commas (,) and enclosed in
parentheses ().

tuple1 = ("hi", "Python", 2)

A tuple is a read-only data structure as we can't modify the size and value of the items of a tuple.
Boolean: The Boolean type provides two built-in values, True and False, which are used to
determine whether a given statement is true or false. It is denoted by the class bool. In a Boolean
context, any non-zero number or non-empty object (including the string 'F') is treated as true,
whereas 0, None, and empty objects such as '' or [] are treated as false.
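
The truth value of an object can be checked with the built-in bool() function. The sketch below uses a handful of sample values chosen for illustration.

```python
# Sample values chosen to cover numbers, strings, lists and None.
samples = [0, 1, -5, "", "F", [], [0], None]

# Map each value's printed form to its truth value.
truthiness = {repr(item): bool(item) for item in samples}
for text, result in truthiness.items():
    print(f"bool({text}) -> {result}")

# bool is a subclass of int, so True and False behave like 1 and 0 in arithmetic.
bool_is_int = isinstance(True, int)
total = True + True
print(bool_is_int, total)   # True 2
```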
Set: A Python set is an unordered collection of items. It is iterable, mutable (it can be modified
after creation), and has unique elements. Because the order of the elements is undefined, a set may
return its elements in a different sequence from the one in which they were written. A set is
created by using the built-in function set(), or by passing a sequence of elements in curly braces,
separated by commas. It can contain values of various types, as long as they are hashable.

set1 = {'James', 2, 3,'Python'}

Dictionary: A dictionary is a collection of key-value pairs (since Python 3.7 it preserves insertion
order). It is like an associative array or a hash table where each key stores a specific value. A key
can be any immutable (hashable) object such as a number, string, or tuple, whereas a value is an
arbitrary Python object.

d = {1:'Jimmy', 2:'Alex', 3:'john', 4:'mike'}

The items in the dictionary are separated with the comma (,) and enclosed in the curly braces {}.


Question 3. Explain List in Python with suitable example.


In Python, a list is created by placing elements inside square brackets [], separated by commas.

• A list can have any number of items, and they may be of different types (integer, float,
string, etc.). A list can also have another list as an item. This is called a nested list.

thislist = ["apple", "banana", "cherry"]
print(thislist)

• List items are ordered, changeable, and allow duplicate values.
• List items are indexed: the first item has index [0], the second item has index [1], etc.
• If you add new items to a list, the new items will be placed at the end of the list.
• The list is changeable, meaning that we can change, add, and remove items in a list after
it has been created.
• To determine how many items a list has, use the len() function:

thislist = ["apple", "banana", "cherry"]
print(len(thislist))
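
The properties listed above (ordered, indexed, changeable, nestable) can be seen in a short sketch. The fruit names other than those in the example above are made up for illustration.

```python
fruits = ["apple", "banana", "cherry"]

fruits.append("mango")       # new items go to the end
fruits[1] = "blueberry"      # change an item in place by index
fruits.remove("apple")       # remove an item by value
first_two = fruits[:2]       # slicing returns a new list

# A list can mix types and contain another list (a nested list).
nested = [1, "two", [3.0, 4]]
inner_item = nested[2][1]

print(fruits)                    # ['blueberry', 'cherry', 'mango']
print(first_two)                 # ['blueberry', 'cherry']
print(inner_item, len(nested))   # 4 3
```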

Question 4. Explain Set in Python with suitable example.


Sets are used to store multiple items in a single variable.

Set is one of the 4 built-in data types in Python used to store collections of data; the other 3 are
List, Tuple, and Dictionary, all with different qualities and usage.

• A set is a collection which is unordered, unindexed, and whose items are unchangeable.
• Unordered means that the items in a set do not have a defined order.
• Set items are unchangeable, meaning that we cannot change an item in place after the set has
been created; we can, however, remove items and add new ones.
• Sets cannot have two items with the same value.
• To determine how many items a set has, use the len() function.

thisset = {"apple", "banana", "cherry"}
print(len(thisset))
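
A short sketch of the set behaviour described above: duplicates are dropped, items can be added and removed, and membership tests and set algebra are available. The second set (citrus) is a made-up example.

```python
basket = {"apple", "banana", "cherry", "apple"}   # the duplicate "apple" is dropped
size_after_dedup = len(basket)

basket.add("mango")          # add a new item
basket.discard("banana")     # remove an item (no error if it is absent)
has_cherry = "cherry" in basket

citrus = {"orange", "lemon", "mango"}
common = basket & citrus     # intersection
combined = basket | citrus   # union

# sorted() is used only to get a stable order for printing.
print(size_after_dedup, sorted(basket))   # 3 ['apple', 'cherry', 'mango']
print(has_cherry, common, len(combined))  # True {'mango'} 5
```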

Question 5. Explain Tuple in Python with suitable example.


Tuples are used to store multiple items in a single variable.
Tuple is one of the 4 built-in data types in Python used to store collections of data; the other 3 are
List, Set, and Dictionary, all with different qualities and usage.

• A tuple is a collection which is ordered and unchangeable.
• Tuples are written with round brackets.

thistuple = ("apple", "banana", "cherry")
print(thistuple)

• Tuple items are ordered, unchangeable, and allow duplicate values.
• Tuples are ordered, which means that the items have a defined order, and that order will not
change.
• Tuples are unchangeable, meaning that we cannot change, add or remove items after the
tuple has been created.
• Since tuples are indexed, they can have items with the same value.
• Tuple items are indexed: the first item has index [0], the second item has index [1], etc.
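
The points above can be checked with a small sketch. The tuple contents are illustrative, and the attempt to assign to an item is expected to fail with a TypeError.

```python
fruits = ("apple", "banana", "cherry", "apple")   # duplicates are allowed

first = fruits[0]                     # indexing starts at 0
last = fruits[-1]                     # negative indices count from the end
apple_count = fruits.count("apple")

# Tuples are unchangeable: item assignment raises TypeError.
try:
    fruits[0] = "kiwi"
    mutation_error = None
except TypeError as exc:
    mutation_error = str(exc)

# Unpacking: the first item goes to head, the remaining items to a list.
head, *rest = fruits

print(first, last, apple_count)   # apple apple 2
print(mutation_error)
print(head, rest)                 # apple ['banana', 'cherry', 'apple']
```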


Question 6. Explain Dictionary in Python with suitable example.


A Python dictionary is a collection of items, each of which is a key/value pair. (Since Python 3.7,
a dictionary preserves the order in which items were inserted.)
Dictionaries are optimized to retrieve values when the key is known.
Creating a dictionary is as simple as placing items inside curly braces {}, separated by commas.

Dict = {"Name": "Tom", "Age": 22}

An item has a key and a corresponding value that is expressed as a pair (key: value).
While the values can be of any data type and can repeat, keys must be of immutable type (string,
number or tuple with immutable elements) and must be unique.
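
A brief sketch of common dictionary operations and of the rule that keys must be immutable. The "City" and "Email" entries and the grid dictionary are hypothetical values added for illustration.

```python
person = {"Name": "Tom", "Age": 22}

name = person["Name"]          # look up a value by key
person["Age"] = 23             # update an existing key
person["City"] = "Surat"       # add a new key (hypothetical value)
missing = person.get("Email", "not provided")   # default for an absent key
keys_in_order = list(person)   # keys come back in insertion order

# A tuple of immutable items is a valid key; a list is not.
grid = {(0, 0): "origin"}
try:
    bad = {[0, 0]: "origin"}
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

print(name, person["Age"], missing)   # Tom 23 not provided
print(keys_in_order)                  # ['Name', 'Age', 'City']
print(list_key_error)
```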

Question 7. Explain expression evaluation in Python with suitable example.


A Python program contains one or more statements. A statement contains zero or more
expressions. Python executes a statement by evaluating its expressions to values one by one.
Python evaluates an expression by evaluating the sub-expressions and substituting their values.
Literal Expressions

A literal expression evaluates to the value it represents. Following are some examples of literal
expressions:

10 => 10

'Welcome to Python' => 'Welcome to Python'

7.89 => 7.89

True => True

Binary Expressions

A binary expression consists of a binary operator applied to two operand expressions. Examples:

2*6 => 12

8-6 => 2

8==8 => True


1000 > 100 => True

'hello' + 'world' => 'helloworld'

Unary Expressions
A unary expression contains one operator and a single operand. Examples:

-(5/5) => -1.0

-(3*4) => -12

-(2**4) => -16

-10 => -10

Compound Expressions
In a binary or unary expression, if an operand itself is an expression, such expression is known as
a compound expression. Examples:

3 * 2 + 1 => 7

2 + 6 * 2 => 14

(2 + 6) * 2 => 16

Variable Access Expressions


A variable access expression allows us to access the value of a variable. Examples:

>>> x = 10
>>> (x + 2) * 4
48
>>> x
10
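
The evaluation rules above can be checked by letting Python evaluate a few expressions. This sketch also includes two cases not listed above: unary minus binds less tightly than **, and / always produces a float while // keeps integers.

```python
x = 10

# Each key is the expression as text; each value is what Python evaluates it to.
results = {
    "3 * 2 + 1": 3 * 2 + 1,          # multiplication before addition
    "2 + 6 * 2": 2 + 6 * 2,
    "(2 + 6) * 2": (2 + 6) * 2,      # parentheses are evaluated first
    "-(2 ** 4)": -(2 ** 4),
    "-2 ** 4": -2 ** 4,              # ** binds tighter than unary minus
    "5 / 5": 5 / 5,                  # true division gives a float
    "5 // 5": 5 // 5,                # floor division keeps an int
    "(x + 2) * 4": (x + 2) * 4,      # variable access inside an expression
}

for text, value in results.items():
    print(f"{text} => {value!r}")
```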


Question 8. Explain functions in Python with suitable example.


Python includes many built-in functions. These functions perform a predefined task and can be
called upon in any program, as per requirement. However, if you don't find a suitable built-in
function to serve your purpose, you can define one. We will now see how to define and use a
function in a Python program.
Defining a Function: A function is a reusable block of programming statements designed to perform
a certain task. To define a function, Python provides the def keyword. The following is the syntax
of defining a function.

def function_name(parameters):
    """docstring"""
    statement1
    statement2
    ...
    return [expr]

You can define functions to provide the required functionality. Here are simple rules to define a
function in Python.
• Function blocks begin with the keyword def followed by the function name and
parentheses ( ).
• Any input parameters should be placed within these parentheses.
• The first statement of a function can be an optional statement: the documentation string
of the function, or docstring.
• The function header ends with a colon (:), and the code block that follows is indented.
• The statement return [expression] exits a function, optionally passing back an expression
to the caller. A return statement with no arguments is the same as return None.

def my_function():
    print("Hello from a function")

my_function()
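
The rules above (parameters, docstring, return) can be sketched with two small functions. The function names rectangle_area and log_message are made up for this illustration.

```python
def rectangle_area(width, height=1.0):
    """Return the area of a rectangle; height defaults to 1.0."""
    return width * height


def log_message(text):
    """Print a message; there is no return statement, so the result is None."""
    print(f"[info] {text}")


area = rectangle_area(3, 4)                   # positional arguments
strip = rectangle_area(5)                     # default value used for height
keyword = rectangle_area(height=2, width=6)   # keyword arguments, any order
nothing = log_message("done")                 # a function without return gives None

print(area, strip, keyword, nothing)          # 12 5.0 12 None
print(rectangle_area.__doc__)                 # the docstring is stored on the function
```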


Question 9. Explain String Slicing in python with Example.


Python slicing is about obtaining a substring from a given string by slicing it from a start index
to an end index.
Method 1: Using the slice() constructor
The slice() constructor creates a slice object representing the set of indices specified by
range(start, stop, step).
Syntax:
• slice(stop)
• slice(start, stop, step)

Parameters:
start: Starting index where the slicing of the object starts.
stop: Ending index where the slicing of the object stops (the item at this index is not included).
step: An optional argument that determines the increment between each index for slicing.

Return Type: Returns a slice object; indexing a sequence with it yields only the elements in the
given range.

# Python program to demonstrate
# string slicing
String = 'ASTRING'

# Using slice constructor
s1 = slice(3)
s2 = slice(1, 5, 2)
s3 = slice(-1, -12, -2)

print("String slicing")
print(String[s1])
print(String[s2])
print(String[s3])

Output:
String slicing
AST
SR
GITA

Method 2: Using the indexing syntax [start:stop:step]
In this example, slicing starts at index 1 and stops before index 5 with a step of 2, so the
characters at indices 1 and 3 are selected. It is a good example of slicing a Python string by
character.

# Python program to demonstrate
# string slicing
String = 'GEEKS'

# Using indexing sequence
print(String[1:5:2])
print(String[:3])

Output:
EK
GEE
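
A few further slicing forms, shown here as a sketch on the same 'ASTRING' example string: negative indices, a negative step to reverse, and a stop index beyond the end of the string (which is allowed in a slice).

```python
text = "ASTRING"

last_three = text[-3:]        # from the third-last character to the end
reversed_text = text[::-1]    # a step of -1 walks the string backwards
every_other = text[::2]       # indices 0, 2, 4, 6
out_of_range = text[2:100]    # a stop past the end is clipped, not an error

# The bracket syntax and a slice object select the same characters.
same_as_slice = text[slice(1, 5, 2)] == text[1:5:2]

print(last_three)      # ING
print(reversed_text)   # GNIRTSA
print(every_other)     # ATIG
print(out_of_range)    # TRING
print(same_as_slice)   # True
```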


Question 10. Explain String Formatting in python with example.


There are four different ways to perform string formatting in Python:
1. Formatting with % Operator.

print("The mangy, scrawny stray dog %s gobbled down "
      "the grain-free, organic dog food." % 'hurriedly')

Output:
The mangy, scrawny stray dog hurriedly gobbled down the grain-free, organic dog food.

2. Formatting with format() string method.

print('We all are {}.'.format('equal'))

Output:
We all are equal.
3. Formatting with string literals, called f-strings.

name = 'Ele'

print(f"My name is {name}.")

Output:
My name is Ele.

4. Formatting with String Template Class

# Python program to demonstrate
# string interpolation
from string import Template

n1 = 'Hello'
n2 = 'GTU Medium'

# Make a template with two placeholders, $n3 and $n4
# (the formal names); n1 and n2 are the actual values.
n = Template('$n3 ! This is $n4.')

# Pass the actual values into the template string.
print(n.substitute(n3=n1, n4=n2))

Output:
Hello ! This is GTU Medium.
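
For comparison, the four approaches can be applied to the same values. This is a sketch with made-up data (name and score); the .2f specifier rounds the number to two decimal places.

```python
from string import Template

name = "Ele"
score = 91.4567

percent_style = "%s scored %.2f" % (name, score)         # 1. % operator
format_style = "{} scored {:.2f}".format(name, score)    # 2. str.format()
fstring_style = f"{name} scored {score:.2f}"             # 3. f-string

# 4. Template has no format specifiers, so the number is formatted first.
template_style = Template("$who scored $points").substitute(
    who=name, points=f"{score:.2f}"
)

for line in (percent_style, format_style, fstring_style, template_style):
    print(line)   # each line prints: Ele scored 91.46
```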

Question 11. Explain Loops in Python with suitable example.


Sr.No.  Loop Type & Description
1       while loop
        Repeats a statement or group of statements while a given condition is TRUE. It tests
        the condition before executing the loop body.
            while expression:
                statement(s)
2       for loop
        Executes a sequence of statements multiple times and abbreviates the code that
        manages the loop variable.
            for iterating_var in sequence:
                statement(s)
3       nested loops
        You can use one or more loops inside another while or for loop (Python has no
        do..while loop).
            for iterating_var in sequence:
                for iterating_var in sequence:
                    statement(s)
                statement(s)

While Loop

i = 1
while i < 6:
    print(i)
    i += 1

Output:
1
2
3
4
5

For loop

word = "anaconda"
for letter in word:
    print(letter)

Output:
a
n
a
c
o
n
d
a

Nested loops

adj = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]

for x in adj:
    for y in fruits:
        print(x, y)

Output:
red apple
red banana
red cherry
big apple
big banana
big cherry
tasty apple
tasty banana
tasty cherry


Sr.No.  Control Statement & Description
1       break statement
        Terminates the loop statement and transfers execution to the statement immediately
        following the loop.
            break
2       continue statement
        Causes the loop to skip the remainder of its body and immediately retest its condition
        prior to reiterating.
            continue
3       pass statement
        The pass statement in Python is used when a statement is required syntactically but you
        do not want any command or code to execute.
            pass

Python Break Statement

for i in range(10):
    print(i)
    if i == 2:
        break

Output:
0
1
2

Python continue statement

for var in "Geeksforgeeks":
    if var == "e":
        continue
    print(var)

Output:
G
k
s
f
o
r
g
k
s

Python pass Statement

li = ['a', 'b', 'c', 'd']

for i in li:
    if i == 'a':
        pass
    else:
        print(i)

Output:
b
c
d
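
The break and continue statements are often used together in one loop. The sketch below uses a hypothetical helper, first_even, and also relies on the optional else clause of a for loop, which is not covered in the notes above: it runs only when the loop finishes without hitting break.

```python
def first_even(numbers):
    """Return the first even number in numbers, or None if there is none."""
    for n in numbers:
        if n % 2:        # odd number: skip the rest of the body
            continue
        found = n        # even number: remember it and leave the loop
        break
    else:
        found = None     # reached only if the loop never executed break
    return found


print(first_even([3, 7, 8, 10]))   # 8
print(first_even([1, 3, 5]))       # None
```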


Unit-2: Data Science and Python


1. Justify why Python is the most suitable language for Data Science.
Python is an open-source, interpreted, high-level language that provides good support for
object-oriented programming. It is one of the languages most widely used by data scientists for
data science projects and applications. Python provides rich functionality for mathematics,
statistics, and scientific functions, along with strong libraries for data science applications.
One of the main reasons why Python is widely used in the scientific and research communities is
its ease of use and simple syntax, which make it easy to adopt for people who do not have an
engineering background. It is also well suited to quick prototyping.
According to engineers coming from academia and industry, deep learning frameworks available
with Python APIs, in addition to the scientific packages, have made Python incredibly productive
and versatile. Deep learning frameworks for Python have evolved considerably and continue to be
upgraded rapidly.

In terms of application areas, ML scientists prefer Python as well. For areas like building fraud
detection algorithms and network security, developers leaned towards Java, while for applications
like natural language processing (NLP) and sentiment analysis, developers opted for Python,
because it provides a large collection of libraries that help to solve complex business problems
easily and to build strong systems and data applications.
Following are some useful features of the Python language:

• It uses an elegant syntax, so programs are easier to read.
• It is an easy language to get started with, which makes it easy to get a program working.
• It has a large standard library and community support.
• The interactive mode of Python makes it simple to test code.
• It is simple to extend Python by adding new modules that are implemented in a compiled
language like C++ or C.
• Python is an expressive language that can be embedded into applications to offer a
programmable interface.
• It allows developers to run the code anywhere, including Windows, Mac OS X, UNIX, and
Linux.
• It is free software: it does not cost anything to download or use Python, or to include it in
an application.


2. Explain Core competencies of a data scientist.


A data scientist must demonstrate an understanding of how big data is used, the big data
ecosystem, and its major components. The data scientist must also demonstrate expertise with big
data platforms, such as Hadoop and Spark. Leadership and professional development matter as
well, and data scientists must be good problem solvers.
“The familiarity and ability to use Hadoop, Java, Python, SQL, Hive, and Pig are core essentials.
Programming itself and computer science in general is the very starting point of gathering data
and understanding how to 'get' data and piece it together,” writes Mitchell A.

To be a “full-stack” data scientist, one has to know math, statistics, programming, data
management, visualization, and more. In an industry setting, about 80% of the work goes into
preparing the data for processing.

According to professor Haider, the three important qualities needed to succeed as a data scientist
are curiosity, judgement, and proficiency in programming.

3. Explain steps of Data Science Pipeline.


The data science pipeline refers to the process and tools used to gather raw data from multiple
sources, analyze it, and present the results in an understandable format. Companies utilize the
process to answer specific business questions and create actionable insights based on real data.

The raw data undergoes different stages within a pipeline which are:
1) Fetching/Obtaining the Data
This stage involves identifying data from the internet or internal/external databases and
extracting it into useful formats. Prerequisite skills:
• Distributed Storage: Hadoop, Apache Spark/Flink.
• Database Management: MySQL, PostgreSQL, MongoDB.
• Querying Relational Databases.
• Retrieving Unstructured Data: text, videos, audio files, documents.


2) Scrubbing/Cleaning the Data


This is the most time-consuming stage and requires the most effort. It is further divided into two
stages:
• Examining Data:
    • identifying errors
    • identifying missing values
    • identifying corrupt records
• Cleaning of data:
    • replacing or filling missing values/errors
Prerequisite skills:
• Coding language: Python, R.
• Data Modifying Tools: Python libs, NumPy, Pandas, R.
• Distributed Processing: Hadoop, MapReduce/Spark.
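
The two sub-stages above (examining, then cleaning) can be sketched in plain Python. This is only an illustration on made-up records; in practice a library such as Pandas would normally be used. The validity rules (age must be present and non-negative, city must be non-empty) and the choice to fill with the mean are assumptions for the example.

```python
from statistics import mean

# Made-up records with a missing age, an empty city and an invalid age.
raw_records = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Surat"},
    {"id": 3, "age": 29, "city": ""},
    {"id": 4, "age": -5, "city": "Rajkot"},
]


def examine(records):
    """Return (record id, field) pairs that are missing or clearly invalid."""
    problems = []
    for row in records:
        if row["age"] is None or row["age"] < 0:
            problems.append((row["id"], "age"))
        if not row["city"]:
            problems.append((row["id"], "city"))
    return problems


def clean(records):
    """Fill bad ages with the mean of the valid ages and empty cities with 'unknown'."""
    valid_ages = [r["age"] for r in records if r["age"] is not None and r["age"] >= 0]
    fill_age = mean(valid_ages)
    cleaned = []
    for row in records:
        age = row["age"]
        if age is None or age < 0:
            age = fill_age
        cleaned.append({"id": row["id"], "age": age, "city": row["city"] or "unknown"})
    return cleaned


problems = examine(raw_records)
cleaned_records = clean(raw_records)
print(problems)                   # [(2, 'age'), (3, 'city'), (4, 'age')]
print(examine(cleaned_records))   # []
```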
3) Exploratory Data Analysis
When data reaches this stage of the pipeline, it is free from errors and missing values, and hence
is suitable for finding patterns using visualizations and charts.
Prerequisite skills:
• Python: NumPy, Matplotlib, Pandas, SciPy.
• R: ggplot2, dplyr.
• Statistics: Random sampling, Inferential.
• Data Visualization: Tableau.


4) Modeling the Data
This is the stage of the data science pipeline where machine learning comes into play. With the
help of machine learning, we create data models. Data models are general rules in a statistical
sense, which are used as a predictive tool to enhance business decision-making.
Prerequisite skills:
• Machine Learning: Supervised/Unsupervised algorithms.
• Evaluation methods.
• Machine Learning Libraries: Python (scikit-learn, NumPy).
• Linear algebra and Multivariate Calculus.
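
As a very small illustration of a "general rule in a statistical sense", the sketch below fits a straight line to made-up data by ordinary least squares in plain Python. The data (hours, scores) and the helper fit_line are hypothetical; a real project would normally use a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common factor 1/n cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data that follows score = 50 + 2 * hours exactly.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept   # use the model to predict an unseen input

print(slope, intercept, predicted)  # 2.0 50.0 62.0
```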


5) Interpreting the Data
This stage is similar to paraphrasing your data science model. Always remember, if you can’t
explain it to a six-year-old, you don’t understand it yourself, so communication becomes the key.
This is the most crucial stage of the pipeline, where, with the use of psychological techniques,
correct business domain knowledge, and strong storytelling abilities, you can explain your model
to a non-technical audience.
Prerequisite skills:
• Business domain knowledge.
• Data visualization tools: Tableau, D3.js, Matplotlib, ggplot2, Seaborn.
• Communication: Presenting/speaking and reporting/writing.

6) Revision
As the nature of the business changes, new features are introduced that may degrade your
existing models. Therefore, periodic reviews and updates are very important from both the
business’s and the data scientist’s point of view.

Data science is not about great machine learning algorithms, but about the solutions which you
provide with the use of those algorithms. It is also very important to make sure that your pipeline
remains solid from start to end, and that you identify accurate business problems to be able to
bring forth precise solutions.

4. Explain different programming styles (programming paradigms) in python.


A paradigm can be described as a method to solve a problem or perform a task. A programming
paradigm is an approach to solving a problem using a programming language, that is, a way of
applying the tools and techniques available to us according to some strategy. There are many
programming languages, and each of them follows some strategy when it is implemented; that
strategy is its paradigm. Just as there are many programming languages, there are many
paradigms to meet different demands.
Python supports three types of programming paradigms:

1. Object Oriented programming paradigms


In the object-oriented programming paradigm, objects are the key element. An object can simply
be defined as an instance of a class that contains both data members and methods.
Moreover, the object-oriented style ties data members and methods together, which supports
encapsulation, and with the help of inheritance the code can easily be reused. The major
disadvantage of the object-oriented programming paradigm is that if the code is not written
properly, the program becomes a monster.

Advantages
• Relation with real-world entities
• Code reusability
• Abstraction or data hiding
Disadvantages
• Data protection
• Not suitable for all types of problems
• Slow speed

# class Emp has been defined here
class Emp:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def info(self):
        print("Hello, %s. You are %s old." % (self.name, self.age))

# Objects of class Emp have been made here
Emps = [Emp("John", 43),
        Emp("Hilbert", 16),
        Emp("Alice", 30)]

# Objects of class Emp have been used here
for emp in Emps:
    emp.info()

Output:
Hello, John. You are 43 old.
Hello, Hilbert. You are 16 old.
Hello, Alice. You are 30 old.


2. Procedural programming paradigms


In the procedure-oriented programming paradigm, a series of computational steps is divided into
modules, which means that the code is grouped into functions and executed serially, step by
step. It combines serial code to instruct a computer, with each step performing a certain task.
This paradigm helps in the modularity of code, and modularization is usually done through
functions. It also helps in organizing related items easily, so each file acts as a container.
Advantages
• General-purpose programming
• Code reusability
• Portable source code
Disadvantages
• Data protection
• Not suitable for real-world objects
• Harder to write

Example:

# Procedural way of finding sum of a list
mylist = [10, 20, 30, 40]

# modularization is done by functional approach
def sum_the_list(mylist):
    res = 0
    for val in mylist:
        res += val
    return res

print(sum_the_list(mylist))

Output:
100

3. Functional programming paradigms


Functional programming is a paradigm in which everything is bound in the style of pure
mathematical functions. It is known as a declarative paradigm because it uses declarations over
statements. It uses mathematical functions and treats every statement as a functional expression,
since an expression is evaluated to produce a value. Lambda functions and recursion are the basic
approaches used for its implementation. The paradigm mainly focuses on “what to solve” rather
than “how to solve”. The ability to treat functions as values and pass them as arguments makes
the code more readable and understandable.
Advantages
• Simple to understand
• Making debugging and testing easier
• Enhances the comprehension and readability of the code
Disadvantages
• Low performance
• Writing programs is a daunting task
• Low readability of the code

# Functional way of finding sum of a list
import functools

mylist = [11, 22, 33, 44]

# Recursive functional approach
def sum_the_list(mylist):
    if len(mylist) == 1:
        return mylist[0]
    else:
        return mylist[0] + sum_the_list(mylist[1:])

# lambda function is used
print(functools.reduce(lambda x, y: x + y, mylist))

Output:
110

5. Explain Factors affecting Speed of Execution.

The time complexity of the program, that is, of the algorithms used to build it. For example,
merge sort is much faster on average than bubble sort because the time complexity of the former
is of the order of n log(n), while that of the latter is n^2.

The data that you want to process. Algorithms behave differently on different sizes of input.
The power of the computer used to run the program, such as the number of cores, the
architecture, and the other hardware specifications.
The language used to build the program. For example, C++ is generally faster than Java because
it is a lower-level programming language (closer to the hardware), while Java is a higher-level
programming language.
Factors affecting Speed of Execution
1) Algorithm time complexity.
2) Input size.
3) CPU speed.
4) I/O waiting time.
5) Number of running processes.
6) Amount of available memory.
7) Programming language used.
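
The effect of algorithm time complexity can be observed with the standard timeit module. In this sketch a hand-written bubble sort (order n^2) is compared with Python's built-in sorted(), which is used here as an n log(n) stand-in for merge sort. The input size of 500 and the repeat count are arbitrary choices, and the measured times will vary from machine to machine.

```python
import random
import timeit


def bubble_sort(values):
    """Return a sorted copy of values using bubble sort (order n^2)."""
    items = list(values)
    n = len(items)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:      # no swaps in a full pass: already sorted
            break
    return items


random.seed(0)   # fixed seed so the same data is generated on every run
data = [random.randint(0, 10_000) for _ in range(500)]

# Time 5 runs of each approach on the same input.
bubble_time = timeit.timeit(lambda: bubble_sort(data), number=5)
builtin_time = timeit.timeit(lambda: sorted(data), number=5)
same_result = bubble_sort(data) == sorted(data)

print(f"bubble sort: {bubble_time:.4f} s")
print(f"sorted():    {builtin_time:.4f} s")
print("same result:", same_result)
```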


Unit-3: Getting Your Hands Dirty With Data


1. List different IDEs for Python. Explain advantages and disadvantages of each.
2. Write a short note on Jupyter notebooks.
3. Explain Basic IO operations in Python.
4. Write a short note on Data Conditioning.
5. Write a short note on Data Shaping.
6. Differentiate NumPy and Pandas.
7. Explain NumPy Array with example.
8. Differentiate rand and randn function in NumPy.
9. List and Explain NumPy Aggregation functions with example.
10. Explain Series in Pandas with example.
11. Explain Data Frame in Pandas with example.
12. Explain Multi-Index DataFrame in pandas with example.
13. Explain Cross Section in DataFrame with Example.
14. Explain how to deal with missing data in Pandas.
15. Explain Groupby function in pandas with example.
16. Explain join function in pandas with example.
17. Explain merge function in pandas with example.
18. Differentiate join and merge functions in pandas.
19. Explain read_csv function in pandas with example.
20. Explain read_excel function in pandas with example.
21. Explain Web Scraping with Example using Beautiful Soup library.
22. Explain Bag of Words model.


Unit-4: Data Visualization


1. Write a short note on Data Visualization.
2. Write a short note on Matplotlib.
3. Explain Axes, Ticks and Grid in Matplotlib with example.
4. List and Explain different line appearances in Matplotlib.
5. Explain Labels, Annotation and Legends in Matplotlib.
6. List and Explain different graphs in Matplotlib.
7. Write a program in Python for creating a histogram/ pie chart/ bar chart/ boxplot/ scatterplot.

Unit-5: Data Wrangling


1. Write a short note on Data Wrangling.
2. Write a short note on Exploratory Data Analysis (EDA).
3. Differentiate Numerical Data and Categorical Data with suitable example. Also explain how to
handle such data types.
4. Write a short note on Classes in Scikit-learn library.
5. Differentiate Supervised and Unsupervised learning.
6. Explain Hashing Trick in Python with example.
7. Explain the %timeit magic command in Jupyter Notebook with example.
8. Explain Memory Profiler in Python.
9. Write a program in Python to perform different statistical operations.
