
Informatics Practices (2023-24)

CLASS XI Code No. 065

1. Prerequisite. None

2. Learning Outcomes

At the end of this course, students will be able to:


● Identify the components of computer system.
● Create Python programs using different data types, lists and
dictionaries.
● Understand database concepts and Relational Database
Management Systems.
● Retrieve and manipulate data in RDBMS using Structured
Query Language
● Identify the Emerging trends in the fields of Information
Technology.

3. Distribution of Marks and Periods

Unit  Unit Name                            Marks  Periods  Periods    Total
No                                                Theory   Practical  Periods
1     Introduction to computer system      10     10       -          10
2     Introduction to Python               25     35       28         63
3     Database concepts and the            30     23       17         40
      Structured Query Language
4     Introduction to Emerging Trends      5      7        -          7
      Practical                            30     -        -          -
      Total                                100    75       45         120

4. Unit Wise syllabus

Unit 1: Introduction to Computer System

Introduction to computer and computing: evolution of computing devices, components of a
computer system and their interconnections, Input/output devices.
Computer Memory: Units of memory, types of memory – primary and secondary, data deletion,
its recovery and related security concerns.
Software: purpose and types – system and application software, generic and specific purpose
software.
Unit 2: Introduction to Python

Basics of Python programming, Python interpreter - interactive and script mode, the structure of
a program, indentation, identifiers, keywords, constants, variables, types of operators,
precedence of operators, data types, mutable and immutable data types, statements,
expressions, evaluation and comments, input and output statements, data type conversion,
debugging.
Control Statements: if-else, if-elif-else, while loop, for loop

Lists: list operations - creating, initializing, traversing and manipulating lists, list methods and
built-in functions – len(),list(),append(),insert(), count(),index(),remove(), pop(), reverse(), sort(),
min(),max(),sum()

Dictionary: concept of key-value pair, creating, initializing, traversing, updating and deleting
elements, dictionary methods and built-in functions – dict(), len(), keys(), values(), items(),
update(), del(), clear()

Unit 3: Database concepts and the Structured Query Language

Database Concepts: Introduction to database concepts and its need, Database Management
System.

Relational data model: Concept of domain, tuple, relation, candidate key, primary key, alternate
key

Advantages of using Structured Query Language, Data Definition Language, Data Query
Language and Data Manipulation Language, Introduction to MySQL, creating a database using
MySQL, Data Types

Data Definition: CREATE DATABASE, CREATE TABLE, DROP, ALTER

Data Query: SELECT, FROM, WHERE with relational operators, BETWEEN, logical operators,
IS NULL, IS NOT NULL

Data Manipulation: INSERT, DELETE,UPDATE

Unit 4: Introduction to the Emerging Trends

Artificial Intelligence, Machine Learning, Natural Language Processing, Immersive experience
(AR, VR), Robotics, Big data and its characteristics, Internet of Things (IoT), Sensors, Smart
cities, Cloud Computing and Cloud Services (SaaS, IaaS, PaaS); Grid Computing, Block chain
technology.
Practical Marks Distribution

S.No.  Unit Name                                                   Marks
1      Problem solving using Python programming language           11
3      Creating database using MySQL and performing Queries        7
4      Practical file (minimum of 14 Python programs, and 14 SQL   7
       queries)
5      Viva-Voce                                                   5
       Total                                                       30

5. Suggested Practical List


5.1 Programming in Python
1. To find average and grade for given marks.
2. To find sale price of an item with given cost and discount (%).
3. To calculate perimeter/circumference and area of shapes such as triangle, rectangle,
square and circle.
4. To calculate Simple and Compound interest.
5. To calculate profit-loss for given Cost and Sell Price.
6. To calculate EMI for Amount, Period and Interest.
7. To calculate tax - GST / Income Tax.
8. To find the largest and smallest numbers in a list.
9. To find the third largest/smallest number in a list.
10. To find the sum of squares of the first 100 natural numbers.
11. To print the first ‘n’ multiples of given number.
12. To count the number of vowels in user entered string.
13. To print the words starting with a given alphabet in a user entered string.
14. To print number of occurrences of a given alphabet in each string.
15. Create a dictionary to store names of states and their capitals.
16. Create a dictionary of students to store names and marks obtained in 5 subjects.
17. To print the highest and lowest values in the dictionary.
5.3 Data Management: SQL Commands
18. To create a database
19. To create student table with the student id, class, section, gender, name, dob, and marks
as attributes where the student id is the primary key.
20. To insert the details of at least 10 students in the above table.
21. To display the entire content of table.
22. To display Rno, Name and Marks of those students who are scoring marks more than 50.
23. To display Rno, Name, DOB of those students who are born between ‘2005- 01-01’ and
‘2005-12-31’.

Suggested material
NCERT Informatics Practices - Text book for class - XI (ISBN- 978-93-5292-148-5 )
Informatics Practices
CLASS XII
Code No. 065
2023-2024

1. Prerequisite: Informatics Practices – Class XI

2. Learning Outcomes
At the end of this course, students will be able to:
● Create Series, Data frames and apply various operations.
● Visualize data using relevant graphs.
● Design SQL queries using aggregate functions.
● Import/Export data between SQL database and Pandas.
● Learn terminology related to networking and internet.
● Identify internet security issues and configure browser settings.
● Understand the impact of technology on society including gender and disability
issues.

3. Distribution of Marks and Periods


Unit  Unit Name                            Marks  Periods  Periods    Total
No                                                Theory   Practical  Periods
1     Data Handling using Pandas and       25     25       25         50
      Data Visualization
2     Database Query using SQL             25     20       17         37
3     Introduction to Computer Networks    10     12       0          12
4     Societal Impacts                     10     14       -          14
      Project                              -      -        7          7
      Practical                            30     -        -          -
      Total                                100    71       49         120

4. Unit Wise syllabus

Unit 1: Data Handling using Pandas -I

Introduction to Python libraries- Pandas, Matplotlib.
Data structures in Pandas - Series and Data Frames.
Series: Creation of Series from – ndarray, dictionary, scalar value; mathematical
operations; Head and Tail functions; Selection, Indexing and Slicing.

Data Frames: creation - from dictionary of Series, list of dictionaries, Text/CSV files;
display; iteration; Operations on rows and columns: add, select, delete, rename; Head and
Tail functions; Indexing using Labels, Boolean Indexing;

Importing/Exporting Data between CSV files and Data Frames.

Data Visualization
Purpose of plotting; drawing and saving following types of plots using Matplotlib – line
plot, bar graph, histogram

Customizing plots: adding label, title, and legend in plots.

Unit 2: Database Query using SQL

Revision of database concepts and SQL commands covered in class XI

Math functions: POWER (), ROUND (), MOD ().

Text functions: UCASE ()/UPPER (), LCASE ()/LOWER (), MID ()/SUBSTRING
()/SUBSTR (),
LENGTH (), LEFT (), RIGHT (), INSTR (), LTRIM (), RTRIM (), TRIM ().

Date Functions: NOW (), DATE (), MONTH (), MONTHNAME (), YEAR (), DAY (),
DAYNAME ().

Aggregate Functions: MAX (), MIN (), AVG (), SUM (), COUNT (); using COUNT (*).

Querying and manipulating data using Group by, Having, Order by.

Working with two tables using equi-join

Unit 3: Introduction to Computer Networks

Introduction to networks, Types of network: PAN, LAN, MAN, WAN.
Network Devices: modem, hub, switch, repeater, router, gateway
Network Topologies: Star, Bus, Tree, Mesh.
Introduction to Internet, URL, WWW, and its applications- Web, email, Chat, VoIP.
Website: Introduction, difference between a website and webpage, static vs dynamic
web page, web server and hosting of a website.

Web Browsers: Introduction, commonly used browsers, browser settings, add-ons and
plug-ins, cookies.

Unit 4: Societal Impacts

Digital footprint, net and communication etiquettes, data protection, intellectual property
rights (IPR), plagiarism, licensing and copyright, free and open source software (FOSS),
cybercrime and cyber laws, hacking, phishing, cyber bullying, overview of Indian IT Act.

E-waste: hazards and management.

Awareness about health concerns related to the usage of technology.

Project Work
The aim of the class project is to create tangible and useful IT application. The learner
may identify a real-world problem by exploring the environment. e.g. Students can
visit shops/business places, communities or other organizations in their localities
and enquire about the functioning of the organization, and how data are generated,
stored, and managed.

The learner can take data stored in csv or database file and analyze using Python libraries
and generate appropriate charts to visualize.

Learners can use Python libraries of their choice to develop software for their school or
any other social good.

Learners should be sensitized to avoid plagiarism and violation of copyright issues while
working on projects. Teachers should take necessary measures for this. Any
resources (data, image etc.) used in the project must be suitably referenced.

The project can be done individually or in groups of 2 to 3 students. The project should be
started by students at least 6 months before the submission deadline.

Practical Marks Distribution

S. No.  Unit Name                                                    Marks
1       Programs using Pandas and Matplotlib                         8
2       SQL Queries                                                  7
3       Practical file (minimum of 15 programs based on Pandas,      5
        4 based on Matplotlib and 15 SQL queries must be included)
4       Project Work (using concepts learned in class XI and XII)    5
5       Viva-Voce                                                    5
        TOTAL                                                        30

5. Suggested Practical List


5.1 Data Handling
1. Create a pandas Series from a dictionary of values and from an ndarray.
2. Given a Series, print all the elements that are above the 75th percentile.
3. Create a Data Frame quarterly sales where each row contains the item category, item
name, and expenditure. Group the rows by the category and print the total expenditure
per category.
4. Create a data frame for examination result and display row labels, column labels, data
types of each column and the dimensions.
5. Filter out rows based on different criteria such as duplicate rows.
6. Importing and exporting data between pandas and CSV file

5.2 Visualization
1. Given the school result data, analyse the performance of the students on different
parameters, e.g. subject wise or class wise.
2. For the Data frames created above, analyze, and plot appropriate charts with title and
legend.
3. Take data of your interest from an open source (e.g. data.gov.in), aggregate and
summarize it. Then plot it using different plotting functions of the Matplotlib library.
5.3 Data Management
1. Create a student table with the student id, name, and marks as attributes where the
student id is the primary key.
2. Insert the details of a new student in the above table.
3. Delete the details of a student in the above table.
4. Use the select command to get the details of the students with marks more than 80.
5. Find the min, max, sum, and average of the marks in a student marks table.
6. Find the total number of customers from each country in the table (customer ID,
customer Name, country) using group by.
7. Write a SQL query to order the (student ID, marks) table in descending order of the
marks.

KENDRIYA VIDYALAYA SANGATHAN


MUMBAI REGION
Split up syllabus (Theory & Practical)
Class:XII Subject:InformaticsPractices Max Marks: 70
2023-24
Unit Unit Name Marks (Theory)
I Data Handling using Pandas and Data Visualization 25
II Database Query using SQL 25
III Introduction to Computer Networks 10
IV Societal Impacts 10
Total 70

Approximate No. of working days from April 2023 – Nov 2023.


The following calculation may differ by a day or two from school to school.
Month            No. of working days    Possible theory   Possible practical
                 (after removing        periods           periods
                 Sundays, 2nd
                 Saturday, holidays)
April 2023       23                     20                16
May-June 2023    10                     9                 6
July 2023        23                     20                12
Aug 2023         23                     20                12
Sep 2023         25                     20                12
Oct 2023         16                     18                6
Nov 2023         24                     20                12
Total                                   127               70

Month-wise portion to be covered
(Theory periods / Practical periods / No. of working days available)

April 2023 (Theory: 20, Practical: 16, Working days: 23)
Unit 1: Data Handling using Pandas and Data Visualization
Data Handling using Pandas -I
Introduction to Python libraries- Pandas, Matplotlib. Data structures in Pandas - Series and DataFrames.
Series: Creation of Series from – ndarray, dictionary, scalar value; mathematical operations; Head and Tail
functions; Selection, Indexing and Slicing.
DataFrames: creation - from dictionary of Series, list of dictionaries, Text/CSV files; display; iteration;

May – June 2023 (Theory: 09, Practical: 06, Working days: 10)
Operations on rows and columns: add, select, delete, rename; Head and Tail functions; Indexing using Labels,
Boolean Indexing;
Importing/Exporting Data between CSV files and Data Frames.

July 2023 (Theory: 20, Practical: 12, Working days: 23)
Unit 1: Data Handling using Pandas and Data Visualization (Contd..)
Data Visualization
Purpose of plotting; drawing and saving following types of plots using Matplotlib – line plot, bar graph, histogram
Customizing plots: adding label, title, and legend in plots.
Unit 2: Database Query using SQL
Math functions: POWER (), ROUND (), MOD ().
Text functions: UCASE ()/UPPER (), LCASE ()/LOWER (), MID ()/SUBSTRING ()/SUBSTR (), LENGTH (), LEFT (),
RIGHT (), INSTR (), LTRIM (), RTRIM (), TRIM ().

August 2023 (Theory: 20, Practical: 12, Working days: 23)
Date Functions: NOW (), DATE (), MONTH (), MONTHNAME (), YEAR (), DAY (), DAYNAME ().
Aggregate Functions: MAX (), MIN (), AVG (), SUM (), COUNT (); using COUNT (*).
Querying and manipulating data using Group by, Having, Order by.
Unit 3: Introduction to Computer Networks
Introduction to networks, Types of network: LAN, MAN, WAN.
Network Devices: modem, hub, switch, repeater, router, gateway

September 2023 (Theory: 20, Practical: 12, Working days: 25)
Network Topologies: Star, Bus, Tree, Mesh.
Introduction to Internet, URL, WWW, and its applications- Web, email, Chat, VoIP.
Website: Introduction, difference between a website and webpage, static vs dynamic webpage, web server and
hosting of a website.
Web Browsers: Introduction, commonly used browsers, browser settings, add-ons and plug-ins, cookies.
Unit 4: Societal Impacts
Digital footprint, net and communication etiquettes,

October 2023 (Theory: 18, Practical: 06, Working days: 16)
Data protection, intellectual property rights (IPR), plagiarism, licensing and copyright, free and open source
software (FOSS), cybercrime and cyber laws, hacking, phishing, cyber bullying, overview of Indian IT Act.

November 2023 (Theory: 20, Practical: 12, Working days: 24)
E-waste: hazards and management.
Awareness about health concerns related to the usage of technology.
Revision Work

December 2023
Revision Work, Pre Board – I

January 2024 and February 2024
Remedial classes, Pre Board – II, Practical Examination & Board Exams

December 2023 to February 2024 (practical periods): Project Development / Practical file submission etc.

PRACTICAL MARK DISTRIBUTION


Data Handling using Pandas
and Data Visualization
1. Data Handling Using Pandas
Python module- A python module is a python script file(.py file) containing variables, python classes,
functions, statements etc.

Python Library/package- A Python library is a collection of modules that together cater to a specific type of
need or application. The advantage of using libraries is that we can directly use their functions/methods for
a specific task instead of rewriting the code for that task ourselves. A library is made available in a program
with the import statement, normally written at the top of the python code/script file, as-
import libraryname

Some examples of Python Libraries-


1. Python standard library- It is a collection of modules which is normally distributed along with the Python
installation. Some of them are-
a. math module- provides mathematical functions
b. random module- provides functions for generating pseudo-random numbers.
c. statistics module- provides statistical functions
2. Numpy (Numerical Python) library- It provides functions for working with large multi-dimensional
arrays(ndarrays) and matrices. NumPy provides a large set of mathematical functions that can
operate quickly on the entries of the ndarray without the need of loops.
3. Pandas (PANel + DAta) library- Pandas is a fast, powerful, flexible and easy to use open source data
analysis and manipulation tool. Pandas is built on top of NumPy, relying on ndarray and its fast and
efficient array based mathematical functions.
4. Matplotlib library- It provides functions for plotting and drawing graphs.
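
The libraries listed above can be tried out with a few lines of code. The following is only a minimal sketch; the marks list is made-up sample data and the variable names are chosen for illustration.

```python
import math
import statistics

import numpy as np
import pandas as pd

marks = [45, 67, 89, 72]               # made-up sample data

root = math.sqrt(16)                   # math module from the standard library
mean_marks = statistics.mean(marks)    # statistics module from the standard library

arr = np.array(marks)                  # NumPy ndarray holding the same values
doubled = arr * 2                      # element-wise operation, no loop required

s = pd.Series(marks)                   # pandas Series built from the list
print(root, mean_marks)
print(doubled)
print(s.max())
```

Note that NumPy and pandas are usually imported with the short aliases np and pd, which is the convention followed in the rest of these notes.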

Data Structure- Data structure is the arrangement of data in such a way that permits efficient access and
modification.

Pandas Data Structures- Pandas offers the following data structures-


a) Series - 1D array
b) DataFrame - 2D array
c) Panel - 3D array (not in syllabus; removed from pandas from version 0.25 onwards)

Series- Series is a one-dimensional array with homogeneous data.

Index/Label      0    1    2    3    4
1D Data values   abc  def  ghi  jkl  mno

Key features of Series-


• A Series has only one dimension, i.e. one axis
• Each element of the Series can be associated with an index/label that can be used to access the data
value. By default the index starts with 0,1,2,3… but it can be set to any other data type also.
• Series is data mutable i.e. the data values can be changed in-place in memory
• Series is size immutable i.e. once a series object is created in memory with a fixed number of
elements, then the number of elements cannot be changed in place. Although the series object can
be assigned a different set of values it will refer to a different location in memory.
• All the elements of the Series are homogeneous data i.e. their data type is the same. For example:
Index   0    1    2    3    4
Data    223  367  456  339  927     (all data is of int type)

Index   a    b    c     de   fg
Data    1    def  10.5  jkl  True   (all data is of object type)
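
The two example Series above can be reproduced to check their data types. This is a small sketch, assuming a standard pandas installation; the variable names s_int and s_mixed are chosen here for illustration.

```python
import pandas as pd

# All values are integers, so pandas picks an integer dtype
s_int = pd.Series([223, 367, 456, 339, 927])

# Values of different types are stored together under the object dtype
s_mixed = pd.Series([1, 'def', 10.5, 'jkl', True],
                    index=['a', 'b', 'c', 'de', 'fg'])

print(s_int.dtype)      # an integer dtype (int64 on most systems)
print(s_mixed.dtype)    # object

# Data mutability: a value can be changed in place using its label
s_int.loc[2] = 500
print(s_int)
```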
Creating a Series- A series object can be created by calling the Series() method in the following ways-
a) Create an empty Series- A Series object not containing any elements is an empty Series. It can be
created as follows-
import pandas as pd
s1=pd.Series(dtype='float64')   #dtype stated explicitly so that the output is the same on all pandas versions
print(s1)

o/p-
Series([], dtype: float64)

b) Create a series from array without index- A numpy 1D array can be used to create a Series object as
shown below. The default index is 0, 1, 2, …
import pandas as pd
import numpy as np
a1=np.array(['hello', 'world', 'good', np.nan])   #np.nan is the spelling accepted by all NumPy versions
s1=pd.Series(a1)
print(s1)

o/p-
0 hello
1 world
2 good
3 nan
dtype: object

c) Create a series from array with index- The default index for a Series object can be changed and
specified by the programmer by using the index parameter and enclosing the index labels in square
brackets. The number of elements of the array must match the number of index labels specified, otherwise
pandas raises a ValueError.
#Creating a Series object using numpy array and specifying index
import pandas as pd
import numpy as np
a1=np.array(['hello', 'world', 'good', 'morning'])
s1=pd.Series(a1, index=[101, 111, 121, 131])
print(s1)
o/p-
101 hello
111 world
121 good
131 morning
dtype: object
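
The length rule for the index parameter can be checked with a short sketch. Here the array has 4 elements but only 3 index labels are supplied (a deliberately wrong call made up for illustration), so pandas is expected to raise a ValueError.

```python
import numpy as np
import pandas as pd

a1 = np.array(['hello', 'world', 'good', 'morning'])

raised = False
try:
    # 4 data values but only 3 index labels
    s1 = pd.Series(a1, index=[101, 111, 121])
except ValueError as err:
    raised = True
    print('ValueError:', err)
```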

d) Create a Series from dictionary- Each element of the dictionary contains a key:value pair. The key of
the dictionary becomes the index of the Series object and the value of the dictionary becomes the
data.
#4 Creating a Series object from dictionary
import pandas as pd

d={101:'hello', 111:'world', 121:'good', 131:'morning'}


s1=pd.Series(d)
print(s1)

o/p-
101 hello
111 world
121 good
131 morning
dtype: object

e) Create a Series from dictionary, reordering the index- When we create a Series object from a
dictionary, we can choose which elements of the dictionary are included in the Series
object, and in which order, by specifying the index argument while calling the Series() method.
• If any key of the dictionary is missing in the index argument, then that element is not added
to the Series object.
• If the index argument contains a key not present in the dictionary then a value of NaN is
assigned to that particular index.
• The order in which the index arguments are specified determines the order of the elements
in the Series object.
#5 Creating a Series object from dictionary reordering the index
import pandas as pd

d={101:'hello', 111:'world', 121:'good', 131:'morning'}


s1=pd.Series(d, index=[131, 111, 121, 199])
print(s1)

o/p-
131 morning
111 world
121 good
199 NaN
dtype: object
f) Create a Series from a scalar value- A Series object can be created from a single value i.e. a scalar
value. The scalar value is repeated once for every label given in the index argument, so the number of
labels decides the number of elements.
#6 Creating a Series object from scalar value
import pandas as pd

s1=pd.Series(7, index=[101, 111, 121])


print(s1)

o/p-
101 7
111 7
121 7
dtype: int64

g) Create a Series from a List- A Series object can be created from a list as shown below.
#7 Creating a Series object from list
import pandas as pd
L=['abc', 'def', 'ghi', 'jkl']
s1=pd.Series(L)
print(s1)

o/p-
0 abc
1 def
2 ghi
3 jkl
dtype: object

h) Create a Series from a Numpy Array (using various array creation methods) - A Series object can be
created from a numpy array as shown below. All the methods of numpy array creation can be used
to create a Series object.
#7a Creating Series objects from numpy arrays
import pandas as pd
import numpy as np

#a. Create an array consisting of elements of a list [2,4,7,10, 13.5, 20.4]


a1=np.array([2,4,7,10, 13.5, 20.4])
s1=pd.Series(a1)
print('s1=', s1)

#b. Create an array consisting of ten zeros.


a2=np.zeros(10)
s2=pd.Series(a2, index=range(101, 111))
print('s2=', s2)

#c. Create an array consisting of five ones.


a3=np.ones(5)
s3=pd.Series(a3)
print('s3=', s3)
#d. Create an array consisting of the elements from 1.1, 1.2, 1.3,1.4, 1.5, 1.6, 1.7
a4=np.arange(1.1,1.8,0.1)
s4=pd.Series(a4)
print('s4=', s4)

#e. Create an array of 4 elements which are linearly spaced between 1 and 10 (both inclusive)
a5=np.linspace(1,10,4)
s5=pd.Series(a5)
print('s5=', s5)

#f. Create an array containing each of the characters of the word ‘helloworld’
a6=np.fromiter('helloworld', dtype='U1')
s6=pd.Series(a6)
print('s6=', s6)

o/p:
s1= 0 2.0
1 4.0
2 7.0
3 10.0
4 13.5
5 20.4
dtype: float64
s2= 101 0.0
102 0.0
103 0.0
104 0.0
105 0.0
106 0.0
107 0.0
108 0.0
109 0.0
110 0.0
dtype: float64
s3= 0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
dtype: float64
s4= 0 1.1
1 1.2
2 1.3
3 1.4
4 1.5
5 1.6
6 1.7
dtype: float64
s5= 0 1.0
1 4.0
2 7.0
3 10.0
dtype: float64
s6= 0 h
1 e
2 l
3 l
4 o
5 w
6 o
7 r
8 l
9 d
dtype: object
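
The difference between np.arange() and np.linspace() is easy to mix up, so a small sketch may help. In np.linspace(start, stop, num) the third argument is the number of elements and the stop value is included, whereas in np.arange(start, stop, step) the third argument is the step and the stop value is excluded. The variable names below are chosen for illustration.

```python
import numpy as np
import pandas as pd

ten_values = np.linspace(1, 10, 10)    # 10 elements from 1 to 10, both inclusive
four_values = np.linspace(1, 10, 4)    # 4 elements: 1, 4, 7, 10
stepped = np.arange(1, 10, 3)          # step of 3, stop value 10 is excluded

print(pd.Series(ten_values))
print(pd.Series(four_values))
print(pd.Series(stepped))
```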

Operations on Series objects-


1. Accessing elements of a Series object
The elements of a series object can be accessed using different methods as shown below-
a) Using the indexing operator []
The square brackets [] can be used to access a data value stored in a Series object. The index
of the element must be entered within the square brackets. If the index is a string then the
index must be written in quotes. If the index is a number then the index must be written
without the quotes. Attempting to use an index which does not exist raises a KeyError.
#8 Accessing elements of Series using index
import pandas as pd

d={101:'hello', 'abc':'world', 121:'good', 131:'morning'}


s=pd.Series(d)
print(s['abc'])
print(s[131])

o/p-
world
morning

b) Using the get() method


The get() method returns the data value associated with an index.
Syntax: seriesobject.get(key, default=None)
The first argument to the get method is the index of the element which we want to access.
Here if the key/index is not present in the series object and the second argument is not
specified then None is returned. If the key is not present and we want some default value to
be returned then it is specified using the default argument.
#9 Accessing elements of Series using get() method
import pandas as pd

d={101:'hello', 'abc':'world', 121:'good', 131:'morning'}


s=pd.Series(d)
print(s.get('abc'))
print(s.get(131))
print(s.get(200))
print(s.get(333, default='nice day'))

o/p-
world
morning
None
nice day
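
The difference between the [] operator and the get() method for a missing index can be seen side by side. This is a minimal sketch; the label 'xyz' is a made-up label that is not present in the Series.

```python
import pandas as pd

s = pd.Series({'abc': 'hello', 'def': 'world'})

missing_raises = False
try:
    s['xyz']                       # [] raises KeyError for a missing label
except KeyError:
    missing_raises = True

value = s.get('xyz')               # get() returns None instead of raising
fallback = s.get('xyz', default='not found')

print(missing_raises, value, fallback)
```

The get() method is therefore convenient when it is not known in advance whether a label exists.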

c) Using the at property of the Series object


The at property of a Series object can be used to access a single data value using its index/label. The
label can be a string or a number, but it must be the label itself and not the integer position. If the
label is not present in the Series object then a KeyError is raised.

#10 Accessing elements of Series using at property


import pandas as pd

d={'abc':'hello', 'def':'world', 'ghi':'good', 'jkl':'morning'}


s=pd.Series(d)
print(s.at['def'])

o/p-
world
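
As a further check, the at property can also be tried on a Series whose labels are integers, and it can be used to change a value. The following is a small sketch with made-up labels; 999 is a label that is deliberately absent.

```python
import pandas as pd

s = pd.Series(['hello', 'world', 'good'], index=[101, 111, 121])

first = s.at[101]          # access by label; the label here is an integer
s.at[111] = 'WORLD'        # at can also be used to modify a single value

missing_raises = False
try:
    s.at[999]              # label not present in the Series
except KeyError:
    missing_raises = True

print(first, s.at[111], missing_raises)
```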

d) Using the iat property of the Series object


The iat property of a Series object can be used to access a data value using the integer
position of the element. Here we can use the forward indexing method (i.e. the positions start
from 0, 1, 2, …) or the backward indexing method (i.e. last element to first having positions -1, -2,
-3, …). If the integer position is out of bounds then an IndexError is raised.

#11 Accessing elements of Series using iat property


import pandas as pd

d={'abc':'hello', 'def':'world', 'ghi':'good', 'jkl':'morning'}


s=pd.Series(d)
print(s.iat[0])
print(s.iat[-1])

o/p-
hello
morning

e) Using the loc property of the Series object


The loc property of a Series object can be used to access a range of data values using the
label/index name inside [] brackets in the following ways:
1. A single index can be passed to the loc property. This will return back a single value.
2. A list of indexes can be passed. This will return back a Series object containing the
multiple values
3. A slice notation using labels such as startlabel:stoplabel. Unlike normal Python
slicing, the value at the ending label is also included in the result.
4. A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

#12 Accessing elements of Series using loc property


import pandas as pd

d={'abc':'hello', 'def':'world', 'ghi':'good', 'jkl':'morning'}


s=pd.Series(d)
x=s.loc['def']
print(type(x))
print(x)
y=s.loc[['def', 'jkl']] #note the use of nested [[]]
print(type(y))
print('y=\n', y)
z=s.loc['def':'jkl']
print('z=\n', z)
m=s.loc[[False,True,True,False]] #note the use of nested [[]]
print('m=\n', m)

o/p-
<class 'str'>
world
<class 'pandas.core.series.Series'>
y=
def world
jkl morning
dtype: object
z=
def world
ghi good
jkl morning
dtype: object
m=
def world
ghi good
dtype: object

f) Using the iloc property of the Series object


The iloc property of a Series object can be used to access a range of data values using the
index position numbers inside [] brackets in the following ways:
1. A single int can be passed to the iloc property. This will return back a single value.
2. A list of int representing index position numbers can be passed. This will return back a
Series object containing the multiple values
3. A slice notation using index position numbers can be passed. The data values from the start
position up to, but not including, the stop position will be included in the returned Series object
4. A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

#13 Accessing elements of Series using iloc property


import pandas as pd

d={'abc':'hello', 'def':'world', 'ghi':'good', 'jkl':'morning'}


s=pd.Series(d)
x=s.iloc[2]
print(type(x))
print(x)
y=s.iloc[[0,2]] #note the use of nested [[]]
print(type(y))
print('y=\n', y)
z=s.iloc[1:]
print('z=\n', z)
m=s.iloc[[False,True,True,False]] #note the use of nested [[]]
print('m=\n', m)

o/p-
<class 'str'>
good
<class 'pandas.core.series.Series'>
y=
abc hello
ghi good
dtype: object
z=
def world
ghi good
jkl morning
dtype: object
m=
def world
ghi good
dtype: object
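
The main difference between slicing with loc and slicing with iloc is whether the end of the slice is included. The sketch below uses the same sample data as the examples above; the variable names by_label and by_position are chosen for illustration.

```python
import pandas as pd

d = {'abc': 'hello', 'def': 'world', 'ghi': 'good', 'jkl': 'morning'}
s = pd.Series(d)

by_label = s.loc['def':'ghi']     # label slice: the end label 'ghi' is included
by_position = s.iloc[1:2]         # position slice: the stop position 2 is excluded

print('by_label=\n', by_label)
print('by_position=\n', by_position)
```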

2. Accessing the top elements of a Series object


The head() method can be used to return the top elements of a Series object. This method returns
another Series object. If no parameter is passed to the head() method it returns the top 5
elements. If an integer parameter (say n) is passed to the head() method, then the top n elements of the
Series object are returned. The index of each element remains the same as it was in the original
object.
#14 Accessing the top elements of a Series object
import pandas as pd

L=[101, 111, 121, 131, 141, 151, 161, 171, 181, 191, 201, 211]
s=pd.Series(L)
x=s.head()
print('x=\n', x)
y=s.head(3)
print('y=\n', y)

o/p:
x=
0 101
1 111
2 121
3 131
4 141
dtype: int64
y=
0 101
1 111
2 121
dtype: int64

3. Accessing the bottom elements of a Series object


The tail() method can be used to return the bottom elements of a Series object. This method returns
another Series object. If no parameter is passed to the tail() method it returns the bottom 5
elements. If an integer parameter (say n) is passed to the tail() method, then the bottom n elements of the
Series object are returned. The index of each element remains the same as it was in the original
object.

#15 Accessing the bottom elements of a Series object


import pandas as pd

L=[101, 111, 121, 131, 141, 151, 161, 171, 181, 191, 201, 211]
s=pd.Series(L)
x=s.tail()
print('x=\n', x)
y=s.tail(3)
print('y=\n', y)
o/p:
x=
7 171
8 181
9 191
10 201
11 211
dtype: int64
y=
9 191
10 201
11 211
dtype: int64
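
Two less obvious cases of head() and tail() can be sketched as follows: a Series with fewer than 5 elements, and a negative value of n (which leaves out elements from the other end). The sample data is made up for illustration.

```python
import pandas as pd

s = pd.Series([101, 111, 121], index=['a', 'b', 'c'])

top = s.head()              # only 3 elements exist, so all 3 are returned
last_two = s.tail(2)        # bottom 2 elements, original labels retained
all_but_last = s.head(-1)   # negative n: everything except the last 1 element

print('top=\n', top)
print('last_two=\n', last_two)
print('all_but_last=\n', all_but_last)
```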
4. Indexing/Slicing a Series object-
The index [] operator can be used to perform indexing and slicing operations on a Series object. The index[]
operator can accept either-
a) Index/labels
b) Integer index positions

a) Using the index operator with labels-


The index operator can be used in the following ways-
i) Using a single label inside the square brackets- Using a single label/index inside the square brackets
will return only the corresponding element referred to by that label/index.
# 16 indexing a Series object single label
import pandas as pd

d={'a':101, 'b':102, 'c':103, 'd':104, 'e':105, 'f':106}


s=pd.Series(d)
t=s['b']
print(t)

o/p:
102

ii) Using multiple labels- We can pass multiple labels that are present in the Series object, in any order.
The multiple labels must be passed as a list i.e. the multiple labels must be separated by commas and
enclosed in double square brackets. Passing a label that is not present in the Series object
must be avoided: older pandas versions returned NaN for such a label with a warning, but current
versions (pandas 1.0 and later) raise a KeyError.
# 17 indexing a Series object multiple labels
import pandas as pd

d={'a':101, 'b':102, 'c':103, 'd':104, 'e':105, 'f':106}


s=pd.Series(d)
u=s[['b', 'a', 'f']]
print(u)

o/p:
b 102
a 101
f 106
dtype: int64
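
What happens with a label that is not present can be checked with a small sketch. It assumes a recent pandas installation (version 1.0 or later), where a list containing a missing label raises a KeyError; the reindex() method is shown as one way to get NaN for missing labels instead. The label 'z' is made up for illustration.

```python
import pandas as pd

d = {'a': 101, 'b': 102, 'c': 103}
s = pd.Series(d)

raised = False
try:
    s[['b', 'z']]                  # 'z' is not a label of the Series
except KeyError:
    raised = True

r = s.reindex(['b', 'z'])          # missing label 'z' gets the value NaN
print(raised)
print(r)
```

Because NaN is a float value, the Series returned by reindex() has the float64 data type even though the original values were integers.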

iii) Using slice notation startlabel:endlabel- Inside the index operator we can pass startlabel:endlabel.
Here, unlike normal Python slicing, all the items from the startlabel up to and including the endlabel
are returned.
# 18 indexing a Series object using startlabel:endlabel
import pandas as pd

d={'a':101, 'b':102, 'c':103, 'd':104, 'e':105, 'f':106}


s=pd.Series(d)
u=s['b': 'e']
print(u)

o/p:
b 102
c 103
d 104
e 105
dtype: int64

b) Slicing a Series object using Integer Index positions-


The concept of slicing a Series object is similar to that of slicing python lists, strings etc. Even though the data
type of the labels can be anything each element of the Series object is associated with two integer numbers:
• In forward indexing method the elements are numbered from 0,1,2,3, … with 0 being assigned to the
first element, 1 being assigned to the second element and so on.
• In backward indexing method the elements are numbered from -1,-2, -3, … with -1 being assigned to
the last element, -2 being assigned to the second last element and so on.
For example consider the following Series object-
d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}
s=pd.Series(d)
The Series object has the following integer index positions-
forward indexing --->     0    1    2    3    4    5
labels                    a    b    c    d    e    f
data values               101  111  121  131  141  151
backward indexing <---    -6   -5   -4   -3   -2   -1

Slice concept-
The basic concept of slicing using integer index positions is common to Python objects such as strings, lists,
tuples, Series, DataFrames etc. A slice creates a new object using elements of an existing object. It is written as:
ExistingObjectName[start : stop : step] where start, stop, step are integers

The basic rules of slice:


i. The slice generates the index positions start, start + step, start + step + step, and so on. When step
is positive, all the numbers generated must be less than the stop value.
ii. If the step value is missing then it is taken to be 1 by default.
iii. If the start value is missing and step is positive then the start value is taken as 0 by default.
iv. If the stop value is missing and step is positive then the slice runs till the ending index
(including the ending index).
v. A negative step value means the numbers are generated in backward order, i.e. start, then
start - |step|, then start - 2|step| and so on. All the numbers generated with a negative step must be
greater than the stop value.
vi. If the start value is missing and step is negative then the start value takes the default value -1
(the last element).
vii. If the stop value is missing and step is negative then the slice runs till the first element
(including the 0 index element).
#19 Slicing a Series object
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}


s=pd.Series(d)

x=s[1: :2]
print('x=\n', x)

y=s[-1: :-1]
print('y=\n', y)

z=s[1: -2: 2]
print('z=\n', z)

o/p:
x=
b 111
d 131
f 151
dtype: int64
y=
f 151
e 141
d 131
c 121
b 111
a 101
dtype: int64
z=
b 111
d 131
dtype: int64
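
The rules of slice can be checked with Python's built-in slice object, whose indices() method fills in the missing start/stop/step values for a given length. The sketch below is only an illustration for the three slices used above; the dictionary name positions is made up for this example.

```python
n = 6  # number of elements in the Series s used above

slices = {
    's[1::2]': slice(1, None, 2),
    's[-1::-1]': slice(-1, None, -1),
    's[1:-2:2]': slice(1, -2, 2),
}

positions = {}
for text, sl in slices.items():
    start, stop, step = sl.indices(n)        # defaults filled in for length n
    positions[text] = list(range(start, stop, step))
    print(text, '->', positions[text])
```

The positions printed are the forward index positions of the elements that appear in x, y and z.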

5. Modifying elements of Series object-


The elements of a Series object can be modified using any of the following methods-
a) Using index [] operator to modify single/multiple values
#20 Modifying a Series object index [] method
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}


s=pd.Series(d)

s['c'] = 555
s[['f','a']] = [666,777]
print('s=\n', s)

s['b':'d']=[0,1,2]
print('s=\n', s)
o/p:
s=
a 777
b 111
c 555
d 131
e 141
f 666
dtype: int64
s=
a 777
b 0
c 1
d 2
e 141
f 666
dtype: int64

b) Using at/iat property to modify a single value


#21 Modifying a Series object at iat property
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}


s=pd.Series(d)

s.at['d'] = 999
s.iat[-1] = 777
print('s=\n', s)

o/p:
s=
a 101
b 111
c 121
d 999
e 141
f 777
dtype: int64

c) Using loc, iloc property to modify single /multiple values


#22 Modifying a Series object loc iloc property
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}


s=pd.Series(d)

s.loc['b'] = 9
s.loc['e':'f'] = [8,7]
print('s=\n', s)
s.iloc[1: :2] = [33,44,55]
print('s=\n', s)

o/p:
s=
a 101
b 9
c 121
d 131
e 8
f 7
dtype: int64
s=
a 101
b 33
c 121
d 44
e 8
f 55
dtype: int64

d) Using slice method to modify multiple values


#23 Modifying a Series object slice method
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}


s=pd.Series(d)

s[1: :2] = [1,2,3]


print('s=\n', s)

o/p:
s=
a 101
b 1
c 121
d 2
e 141
f 3
dtype: int64

6. Changing indexes of Series object-


The index property can be used to change the indexes of a Series object.
#24 Changing indexes of Series object
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131}


s=pd.Series(d)

s.index = ['have','a','nice', 'day']


print('s=\n', s)
o/p:
s=
have 101
a 111
nice 121
day 131
dtype: int64

7. Vector/Arithmetic Operations on Series object-


All Arithmetic, Relational and Logical Operations are possible between a Series object and a scalar (single)
value as well as between two Series objects.

Vector Operations: When an operation is performed on a Series object then all the elements of that object
take part in that operation. Such operations are known as Vector Operations. A Series object supports Vector
operations and the result that is returned is also a Series object.

Data Alignment: When Vector/Arithmetic operations are performed between two Series objects then the
data is aligned based on the common/matching labels/indexes of the two Series objects. Where the data
does not align, because of incompatible data or because the corresponding index is not available in one
of the objects, NaN is usually assigned to that element of the result.
#25 Vector Operations on Series object
import pandas as pd

d={'a':21, 'b':5, 'c':11, 'd':3}


s1=pd.Series(d)

s2=s1*2
print('s2=\n', s2)

s3=s1>10
print('s3=\n', s3)

o/p:
s2=
a 42
b 10
c 22
d 6
dtype: int64
s3=
a True
b False
c True
d False
dtype: bool

#26 Arithmetic Operations on Series object


import pandas as pd
d1={'a':21, 'b':5, 'c':11, 'd':3}
s1=pd.Series(d1)

d2={'b':2, 'c':7, 'e':5 }


s2=pd.Series(d2)

s3=s1+s2
print('s3=\n', s3)

o/p:
s3=
a NaN
b 7.0
c 18.0
d NaN
e NaN
dtype: float64
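
If NaN is not wanted for the unmatched labels, the add() method with the fill_value parameter (described later under the mathematical methods) can be used in place of the + operator. The sketch below is only an illustration on the same two Series; the names plain and filled are made up for this example.

```python
import pandas as pd

s1 = pd.Series({'a': 21, 'b': 5, 'c': 11, 'd': 3})
s2 = pd.Series({'b': 2, 'c': 7, 'e': 5})

plain = s1 + s2                      # labels a, d, e have no partner, so they give NaN
filled = s1.add(s2, fill_value=0)    # a missing partner is treated as 0

print('plain=\n', plain)
print('filled=\n', filled)
```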

7. Deleting an element of Series object-


The following commands can be used to delete an element of a Series object.
a) del seriesobject[index]
The del command can be used to delete a particular element of the series object by specifying the
index.

b) seriesobject.drop(labels=[list_of_indexes], inplace=False)
The drop() method can be used to delete one or more elements by passing a list of labels/indexes to
the labels parameter. If the inplace parameter is not passed or is False then the new series object
with the elements deleted is returned back. If the inplace=True parameter is passed then the current
object is modified in place.
c) seriesobject.pop(index)
The pop() method is passed an index of the element that is to be deleted. This method returns back
the value of the element that is being deleted.

import pandas as pd

s1=pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
del s1['b']
print('s1=',s1)

s1.drop(labels=['c','e'], inplace=True)
print('s1=',s1)

x=s1.pop('d')
print('Deleted value:',x)
print('s1=',s1)

o/p:
s1= a 1
c 3
d 4
e 5
dtype: int64
s1= a 1
d 4
dtype: int64
Deleted value: 4
s1= a 1
dtype: int64
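
The effect of the inplace parameter of drop() can be seen by comparing the two calls below. This is only a small sketch; the name s2 is made up for this example.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

s2 = s1.drop(labels=['c', 'e'])        # inplace=False (default): s1 is left untouched
print(list(s1.index), list(s2.index))

s1.drop(labels=['a'], inplace=True)    # inplace=True: s1 itself is modified
print(list(s1.index))
```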
8. Boolean Indexing in Series object-
The index operator [], the loc and the iloc properties can be passed a boolean list containing the same
number of elements as that in the Series object. Wherever the True values are present, those elements are
selected and returned back as another Series object.

Inside the index operator a relational expression can also be passed that gives a boolean Series object having
the same number of elements as the Series object itself. If different boolean expressions are combined then
instead of using the 'and' operator the 'bitwise and' i.e. '&' is used, instead of 'or' operator the 'bitwise or'
i.e. '|' is used and instead of the 'not' operator the 'bitwise not' i.e. '~' is used.

Whenever multiple relational expressions are used for Boolean indexing then individual expressions must be
enclosed in parentheses since the bitwise and(&), or(|), not(~) operators have higher precedence than the
relational operators.
#Boolean indexing
import pandas as pd

s1=pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
s2=s1[[True,False,True,False,True]]
print('s2=',s2)

s3=s1[s1>=3]
print('s3=',s3)

s4=s1[(s1>=2)&(s1<=4)]
print('s4=',s4)

s5=s1[(s1<2)|(s1>4)]
print('s5=',s5)

s6=s1[~(s1==3)]
print('s6=',s6)

o/p:
s2= a 1
c 3
e 5
dtype: int64
s3= c 3
d 4
e 5
dtype: int64
s4= b 2
c 3
d 4
dtype: int64
s5= a 1
e 5
dtype: int64
s6= a 1
b 2
d 4
e 5
dtype: int64
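
The need for parentheses can be seen by leaving them out. The sketch below is only an illustration; the names ok, bad, mask, same_loc and same_iloc are made up for this example.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

ok = s1[(s1 >= 2) & (s1 <= 4)]     # each relational expression in its own parentheses

try:
    # Without parentheses this is read as s1 >= (2 & s1) <= 4, a chained
    # comparison that needs one True/False for a whole Series, so a
    # ValueError is raised.
    bad = s1[s1 >= 2 & s1 <= 4]
except ValueError as err:
    bad = None
    print('ValueError:', err)

# The same boolean list can be passed to [], loc and iloc
mask = [False, True, True, True, False]
same_loc = s1.loc[mask]
same_iloc = s1.iloc[mask]
print(ok)
```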

9. Mathematical Properties/Methods of Series object-


The following Mathematical functions are defined on Series objects:
Each entry below lists the Sr. No. and the Property/Method, followed by its Description and an Example
(import pandas as pd is assumed to be already present).

1 is_monotonic
Description: Returns True if the values in the object are monotonically increasing, otherwise returns
False. This property was removed in pandas 2.0; is_monotonic_increasing (entry 3) gives the same result.
Example:
d1={'a':21, 'b':55, 'c':61, 'd':93}
s1=pd.Series(d1)
print(s1.is_monotonic)

o/p:
True

2 is_monotonic_decreasing
Description: Returns True if the values in the object are monotonically decreasing, otherwise returns
False.
Example:
d1={'a':9, 'b':7, 'c':5, 'd':1}
s1=pd.Series(d1)
print(s1.is_monotonic_decreasing)

o/p:
True

3 is_monotonic_increasing
Description: Returns True if the values in the object are monotonically increasing, otherwise returns
False.
Example:
d1={'a':9, 'b':7, 'c':5, 'd':1}
s1=pd.Series(d1)
print(s1.is_monotonic_increasing)

o/p:
False

4 ndim
Description: Returns the number of dimensions of the underlying data. A Series object by definition has
only one dimension, so the value is 1.
Example:
d1={'a':9, 'b':1, 'c':7, 'd':2}
s1=pd.Series(d1)
print(s1.ndim)

o/p:
1

5 shape, size
Description: The shape property returns a tuple (n,) containing a single element, which is the number of
elements in the Series object. The size property returns an integer giving the number of elements in the
Series object.
Example:
d1={'a':9, 'b':1, 'c':7, 'd':2}
s1=pd.Series(d1)
print(s1.shape)
print(s1.size)

o/p:
(4,)
4
6 abs()
Description: Returns a Series object with the absolute, i.e. non-negative, value of each numeric element.
Example:
s1=pd.Series([2.5, -3.2, 4, -99], index=['a','b','c','d'])
s2=s1.abs()
print('s2=\n', s2)

o/p:
s2=
a 2.5
b 3.2
c 4.0
d 99.0
dtype: float64

7 obj1.add(obj2, fill_value=None)
Description: Returns the addition of the two Series element/index-wise.
fill_value: Fills existing missing (NaN) values, and any new element needed for successful Series
alignment, with this value before computation. If data in both corresponding Series locations is missing,
the result will be missing.
Example:
#28 add Function on Series object
import pandas as pd
import numpy as np

a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([10, 15, 20, np.nan], index=['a', 'b', 'd', 'e'])
c = a.add(b)
print('c=\n', c)

d=a.add(b, fill_value=100)
print('d=\n', d)

o/p:
c=
a 11.0
b 16.0
c NaN
d NaN
e NaN
dtype: float64
d=
a 11.0
b 16.0
c 101.0
d 120.0
e NaN
dtype: float64

8 obj1.radd(obj2, fill_value=None)
Description: Reverse addition. Technically it computes obj2+obj1; otherwise it is similar to the add
function.
9 agg()
aggregate()
Description: Both the agg() and aggregate() methods are used to perform aggregate functions over a Series
object. The aggregate operation, viz. 'min', 'max', 'sum', 'mean', 'median', 'mode', is passed as a string
parameter to the agg()/aggregate() method.
Example:
a = pd.Series([23,11,2,7,7,2], index=['a','b','c','d','e','f'])
b = a.agg('min')
print('b=', b)
c = a.agg('sum')
print('c=', c)

d = a.aggregate('mean')
print('d=', d)

e = a.aggregate('mode') #multiple modes
print('e=\n', e)

f = a.aggregate('median')
print('f=', f)

o/p:
b= 2
c= 52
d= 8.666666666666666
e=
0 2
1 7
dtype: int64
f= 7.0

10 obj1.count()
Description: Returns the number of non-NA/null observations in the Series.
Example:
import pandas as pd
import numpy as np

a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
print(a.count())

o/p:
3
11 obj1.div(obj2, fill_value=None)
obj1.divide(obj2, fill_value=None)
obj1.truediv(obj2, fill_value=None)
obj1.rtruediv(obj2, fill_value=None)
obj1.rdiv(obj2, fill_value=None)
Description: Returns the floating-point division of the two Series element/index-wise.
fill_value: Fills existing missing (NaN) values, and any new element needed for successful Series
alignment, with this value before computation. If data in both corresponding Series locations is missing,
the result will be missing.
rdiv (like rtruediv) does reverse division, i.e. obj2/obj1.
Example:
#30 divide Function on Series object
import pandas as pd
import numpy as np

a = pd.Series([10, 12, 13, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([2, 5, 6, np.nan], index=['a', 'b', 'd', 'e'])
c = a.div(b)
print('c=\n', c)

d=a.divide(b, fill_value=3)
print('d=\n', d)

e=a.rdiv(b)
print('e=\n', e)

o/p:
c=
a 5.0
b 2.4
c NaN
d NaN
e NaN
dtype: float64
d=
a 5.000000
b 2.400000
c 4.333333
d 0.500000
e NaN
dtype: float64
e=
a 0.200000
b 0.416667
c NaN
d NaN
e NaN
dtype: float64
12 Relational functions on Series object:
obj1.eq(obj2, fill_value=None)
obj1.ne(obj2, fill_value=None)
obj1.gt(obj2, fill_value=None)
obj1.ge(obj2, fill_value=None)
obj1.lt(obj2, fill_value=None)
obj1.le(obj2, fill_value=None)
Description: The eq, ne, gt, ge, lt, le functions are similar to the relational operators ==, !=, >, >=, <, <=.
These functions compare the elements index-wise and return a Series object of True/False values.
If only one of the two values is missing or NaN then the result is False (True for ne).
fill_value: Fills existing missing (NaN) values, and any new element needed for successful Series
alignment, with this value before computation.
If data in both corresponding Series locations is missing/NaN the result will be False (True for ne).
Example:
x = pd.Series([12,2,3,4,np.nan], index=['a', 'b', 'c', 'd','f'])
y = pd.Series([1,9,4,5,np.nan], index=['a', 'b', 'd', 'e','f'])
m = x.eq(y)
print('m=\n', m)

n = x.gt(y)
print('n=\n', n)

p=x.le(y)
print('p=\n', p)

q=x.le(y,fill_value=10)
print('q=\n', q)

o/p:
m=
a False
b False
c False
d True
e False
f False
dtype: bool
n=
a True
b False
c False
d False
e False
f False
dtype: bool
p=
a False
b True
c False
d True
e False
f False
dtype: bool
q=
a False
b True
c True
d True
e False
f False
dtype: bool
13 equals()
Description: It returns a single True value if all the elements of both Series objects match index-wise.
If both Series objects contain NaN at the same position then also it evaluates to True. Otherwise it
returns False.
Example:
x = pd.Series([1,2,3,4,np.nan], index=['a', 'b', 'c', 'd','e'])
y = pd.Series([1,2,3,4,np.nan], index=['a', 'b', 'c', 'd','e'])
print(x.equals(y))

o/p:
True

14 fillna(value, inplace=False)
Description: It fills NaN values with the value passed. If the inplace parameter is not passed or is
False, then it returns another Series object with the NaN values filled with the value that was passed.
If the inplace=True parameter is passed then the current object itself is modified.
Example:
x = pd.Series([1,2,3,4,np.nan], index=['a', 'b', 'c', 'd','e'])
y = x.fillna(0)
print('x=', x)
print('y=', y)

z = pd.Series([2,2,np.nan], index=['a', 'b', 'c'])
z.fillna(99,inplace=True)
print('z=', z)

o/p:
x= a 1.0
b 2.0
c 3.0
d 4.0
e NaN
dtype: float64
y= a 1.0
b 2.0
c 3.0
d 4.0
e 0.0
dtype: float64
z= a 2.0
b 2.0
c 99.0
dtype: float64

15 obj1.floordiv(obj2, fill_value=None)
obj1.rfloordiv(obj2, fill_value=None)
Description: Performs an action similar to //, i.e. integer/floor division, with the addition of the
fill_value argument.
Example:
x = pd.Series([10,7,9,5,np.nan], index=['a', 'b', 'c', 'd','e'])
y = pd.Series([3,2,5,4,7], index=['a', 'b', 'c', 'd','e'])
z = x.floordiv(y)
print('z=',z)

o/p:
z= a 3.0
b 3.0
c 1.0
d 1.0
e NaN
dtype: float64
16 max(), min(), mean(), median(), mode(), sum()
Description:
max-finds the max
min-finds the min
mean-finds the mean
median-finds the median
mode-finds the mode. Mode can return multiple values and it returns another Series object as the result.
sum-finds the sum.
If any data is NaN then it is not counted while doing the calculation.
Example:
a = pd.Series([2,1,2,1,np.nan], index=['a','b','c','d','e'])
b = a.min()
print('b=', b)

c = a.max()
print('c=', c)

d = a.mean()
print('d=', d)

e = a.median()
print('e=', e)

f = a.mode() #mode can return multiple values
print('f=\n', f)

g = a.sum()
print('g=', g)

o/p:
b= 1.0
c= 2.0
d= 1.5
e= 1.5
f=
0 1.0
1 2.0
dtype: float64
g= 6.0

17 obj1.mul(obj2, fill_value=None)
obj1.multiply(obj2, fill_value=None)
Description: Similar to the * operator with the addition of the fill_value argument.
Example:
a = pd.Series([2,1,3,4,np.nan], index=['a','b','c','d','e'])
b = pd.Series([4,5,7,np.nan], index=['a','b','d','e'])
c= a.mul(b)
print('c=\n', c)

o/p:
c=
a 8.0
b 5.0
c NaN
d 28.0
e NaN
dtype: float64

18 nlargest(), nsmallest()
Description: If no parameter is passed, it returns the 5 largest / smallest elements in the Series object.
If an integer parameter x is passed then it returns the 'x' largest/smallest elements in the Series object.
Example:
a = pd.Series([2,1,3,4,7,10, 19, 21, 8, np.nan])
b = a.nlargest()
print('b=\n', b)

c=a.nsmallest(3)
print('c=\n', c)

o/p:
b=
7 21.0
6 19.0
5 10.0
8 8.0
4 7.0
dtype: float64
c=
1 1.0
0 2.0
2 3.0
dtype: float64
19 obj1.pow(obj2, fill_value=None)
Description: Similar to the ** operator with the option of using the fill_value parameter to fill NaN
values.
Example:
a = pd.Series([2,1,3], index=['a','b','c'])
b = pd.Series([3,4,2], index=['a','b','d'])
c = a.pow(b)
print('c=\n', c)

o/p:
c=
a 8.0
b 1.0
c NaN
d NaN
dtype: float64

20 obj1.prod()
obj1.product()
Description: Returns the product of the values in the Series object.
Example:
a = pd.Series([2,4,3], index=['a','b','c'])
b = a.prod()
print('b=', b)

o/p:
b= 24

21 obj1.round(decimals=0)
Description: Rounds each value in a Series to the given number of decimals. The parameter decimals has
a default value of 0, i.e. if no parameter is specified then it rounds to integers. If decimals is
negative, it specifies the number of positions to the left of the decimal point.
Example:
a = pd.Series([212.542,452.987,327.192], index=['a','b','c'])
b = a.round()
print('b=\n', b)
c = a.round(2)
print('c=\n', c)
d = a.round(-2)
print('d=\n', d)

o/p:
b=
a 213.0
b 453.0
c 327.0
dtype: float64
c=
a 212.54
b 452.99
c 327.19
dtype: float64
d=
a 200.0
b 500.0
c 300.0
dtype: float64
22 obj1.std(ddof=1)
Description: std() without any parameters takes the default ddof parameter as 1 and calculates the sample
standard deviation:
s = sqrt( sum((x - mean)^2) / (N - 1) )
If we want to calculate the population standard deviation, then use obj.std(ddof=0), which is given by
the formula:
sigma = sqrt( sum((x - mean)^2) / N )
Example:
a = pd.Series([9, 2, 5, 4])
b = a.std() #calculates sample standard deviation
print('b=', b)
c = a.std(ddof=0) #calculates population standard deviation
print('c=', c)

o/p:
b= 2.943920288775949
c= 2.5495097567963922

23 obj1.var(ddof=1)
Description: var() without any parameters takes the default ddof parameter as 1 and calculates the sample
variance:
s^2 = sum((x - mean)^2) / (N - 1)
var(ddof=0) calculates the population variance:
sigma^2 = sum((x - mean)^2) / N
Example:
a = pd.Series([9, 2, 5, 4])
b = a.var() #calculates sample variance
print('b=', b)
c = a.var(ddof=0) #calculates population variance
print('c=', c)

o/p:
b= 8.666666666666666
c= 6.5

24 obj1.sub(obj2, fill_value=None)
obj1.subtract(obj2, fill_value=None)
obj1.rsub(obj2, fill_value=None)
Description: Similar to the - operator with the option of using the fill_value parameter to fill NaN
values. rsub performs the reverse subtract operation, i.e. obj2-obj1.
Example:
import numpy as np

a = pd.Series([9, 2, 5, 4])
b = pd.Series([1, 5, np.nan, 4])
c = a.sub(b)
print('c=\n', c)

d = a.rsub(b)
print('d=\n', d)

o/p:
c=
0 8.0
1 -3.0
2 NaN
3 0.0
dtype: float64
d=
0 -8.0
1 3.0
2 NaN
3 0.0
dtype: float64
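
The effect of the ddof parameter in std() and var() can be checked by doing the calculation by hand. The sketch below is only an illustration on the same data; the names squared_dev, sample_var and population_var are made up for this example.

```python
import math
import pandas as pd

a = pd.Series([9, 2, 5, 4])

n = len(a)
mean = sum(a) / n
squared_dev = sum((x - mean) ** 2 for x in a)    # sum of squared deviations from the mean

sample_var = squared_dev / (n - 1)      # divisor N - 1, same as var() with ddof=1
population_var = squared_dev / n        # divisor N, same as var(ddof=0)

print(sample_var, a.var())
print(population_var, a.var(ddof=0))
print(math.sqrt(sample_var), a.std())
```

In general the divisor used is N - ddof, where N is the number of non-NaN elements.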

10. Dropping empty/NaN values-


The dropna() method can be used to drop empty/NaN values. The syntax is:
seriesobject.dropna(inplace=False)
The empty string '' is not considered as an NaN value whereas the object None is considered as empty and is
removed by the dropna() method. The default value for inplace parameter is False and the dropna() method
returns a new Series object with the empty/NaN values removed. If the parameter inplace=True is passed
then the current object is modified inplace and None is returned back by the method.
#53 dropping empty NaN values
import pandas as pd
import numpy as np

s1=pd.Series([1,None,3,4,np.nan], index=['a','b','c','d','e'])
s2=s1.dropna()
print('s1=\n',s1)
print('s2=\n',s2)

s1.dropna(inplace=True)
print('s1=\n',s1)

o/p:
s1=
a 1.0
b NaN
c 3.0
d 4.0
e NaN
dtype: float64
s2=
a 1.0
c 3.0
d 4.0
dtype: float64
s1=
a 1.0
c 3.0
d 4.0
dtype: float64
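
The statement about the empty string and None can be checked on a Series of strings. This is only a small sketch; the names s and kept are made up for this example.

```python
import pandas as pd

s = pd.Series(['x', '', None, 'y'], index=['a', 'b', 'c', 'd'])

kept = s.dropna()     # None is treated as missing; the empty string '' is not
print(kept)
```

Only the element with label c is removed; the empty string at label b remains.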

11. Filling empty/NaN values-


The fillna() method can be used to fill empty/NaN values. The syntax is:
seriesobject.fillna(value, method=None,inplace=False,limit=None)
where -
value - is the value that is to be filled in place of empty/None/NaN values
method - is the filling method to be used, which can be one of : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}
• 'pad' / 'ffill' - front fill, propagate last valid observation forward to sequence of NaN values
till you reach the next valid value.
• 'backfill' / 'bfill' - back fill, take the next valid observation value and fill backwards the NaN
values till you reach a previous valid value
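[Note: Either the value or the method parameter can be used. Both cannot be used together. In recent
pandas versions the method parameter of fillna() is deprecated; seriesobject.ffill() and
seriesobject.bfill() perform the same forward and backward fills.]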
limit - the number of consecutive empty/NaN values to fill in. The remaining NaN values are left as is.
inplace - default value is False. If inplace=True is passed then the current object is modified inplace and None
is returned back by the method.
#54 filling empty NaN values
import pandas as pd
import numpy as np

s1=pd.Series([1,np.nan,np.nan,np.nan,5], index=['a','b','c','d','e'])
s2=s1.fillna(0)
print('s2=\n',s2)

s3=s1.fillna(method='ffill')
print('s3=\n',s3)

s4=s1.fillna(method='bfill')
print('s4=\n',s4)

s1.fillna(1.5,limit=2,inplace=True)
print('s1=\n',s1)

o/p:
s2=
a 1.0
b 0.0
c 0.0
d 0.0
e 5.0
dtype: float64
s3=
a 1.0
b 1.0
c 1.0
d 1.0
e 5.0
dtype: float64
s4=
a 1.0
b 5.0
c 5.0
d 5.0
e 5.0
dtype: float64
s1=
a 1.0
b 1.5
c 1.5
d NaN
e 5.0
dtype: float64
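
The forward and backward fills can also be written with the ffill() and bfill() methods of the Series object, which take the same limit parameter. The sketch below is only an illustration on the same data; the names forward, backward and limited are made up for this example.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, np.nan, np.nan, np.nan, 5], index=['a', 'b', 'c', 'd', 'e'])

forward = s1.ffill()          # last valid value is carried forward
backward = s1.bfill()         # next valid value is carried backward
limited = s1.ffill(limit=1)   # only 1 consecutive NaN value is filled

print('forward=\n', forward)
print('backward=\n', backward)
print('limited=\n', limited)
```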
DataFrame

A DataFrame is a two-dimensional data structure in the python pandas library which stores heterogeneous
(different kinds of) data in different columns.

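[Figure: layout of a DataFrame. The row labels (index) run down the rows along axis=0; the column labels
(columns) run across the columns along axis=1.]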

Key features of DataFrame

• The DataFrame contains labelled axes (rows and columns).


• The rows are also known as axis=0 and the row labels are also known as index.
• The columns are also known as axis=1 and the column labels are also known simply as columns.
• Any operations on DataFrame are aligned on both row as well as column labels.
• All elements within a single column have the same data type, but different columns can have different
data types.
• DataFrame is size mutable as well as data-mutable

For using the DataFrame object we must import the pandas library by using the statement:
import pandas as pd
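
Two of the key features, alignment on both row and column labels and size mutability, can be seen in a small sketch. The DataFrames df_a and df_b and their labels are made up for this example.

```python
import pandas as pd

df_a = pd.DataFrame({'x': [1, 2]}, index=['r1', 'r2'])
df_b = pd.DataFrame({'x': [10, 20], 'y': [5, 6]}, index=['r2', 'r3'])

total = df_a + df_b       # aligned on row labels as well as column labels
print('total=\n', total)

df_a['z'] = ['p', 'q']    # size-mutable: a column of a different data type is added
print('df_a=\n', df_a)
```

Only row r2 of column x is present in both objects, so it is the only element of total that is not NaN.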

Creating a DataFrame
The DataFrame() method is primarily used to create a DataFrame. It can accept different kinds of input, so
there are many different ways of creating a DataFrame, some of which are:

1. Creating an Empty DataFrame


The DataFrame method when it is called with no parameters creates an empty DataFrame.

#1 Creating an Empty DataFrame


import pandas as pd

df1=pd.DataFrame()
print('df1=\n',df1)

o/p:
df1=
Empty DataFrame
Columns: []
Index: []
2. Creating a DataFrame from List of Lists
A two-dimensional nested list can be used to create a DataFrame. The columns parameter is used to
pass the names of the columns as a list.
#2 Creating a DataFrame from List of Lists
import pandas as pd

L = [['abc', 15], ['def', 16], ['ghi', 17]]


df1=pd.DataFrame(L, columns=['name', 'age'])
print('df1=\n',df1)

o/p:
df1=
name age
0 abc 15
1 def 16
2 ghi 17

3. Creating a DataFrame from Dictionary of Lists/ndarrays/Series


A dictionary can be used to create a DataFrame. The key of the dictionary becomes the column label
and the values, which can be lists/ndarrays/Series objects, become the elements appearing under that
column. The row labels can be specified by passing to the index parameter, a list of row labels.
#3 Creating a DataFrame from Dictionary of Lists/ndarrays/Series
import pandas as pd
import numpy as np

d1 = {'name':['abc', 'def', 'ghi'], 'age':[15,16,17] }


df1 = pd.DataFrame(d1)
print('df1=\n',df1)

a1 = np.array(['jkl','mno','pqr'])
a2 = np.array([20,21,22])
d2 = { 'name' : a1, 'age' : a2 }
df2 = pd.DataFrame(d2, index=['r1', 'r2', 'r3'])
print('df2=\n', df2)

s1 = pd.Series(['stu','vw', 'xyz'])
s2 = pd.Series([23, 24, 25])
d3 = {'name':s1, 'age' : s2}
df3 = pd.DataFrame(d3)
print('df3=\n', df3)

o/p:
df1=
name age
0 abc 15
1 def 16
2 ghi 17
df2=
name age
r1 jkl 20
r2 mno 21
r3 pqr 22
df3=
name age
0 stu 23
1 vw 24
2 xyz 25

4. Creating a DataFrame from List of Dictionary


A DataFrame can be created from a list of dictionaries. The elements of each dictionary are key:value
pairs. The keys of the dictionaries become the column names in the DataFrame object and the values of
the dictionaries become the column-values of the DataFrame object. If any one of the column-names (keys)
is missing from a particular dictionary, then that column has a NaN value for that row in the DataFrame
object.
#4 Creating a DataFrame from List of Dictionary
import pandas as pd

d1 = [{'name':'abc', 'age':15 }, {'name': 'def', 'age':16, 'class':5} ]


df1 = pd.DataFrame(d1)
print('df1=\n',df1)

o/p:
df1=
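name age class
0 abc 15 NaN
1 def 16 5.0
(Current pandas versions keep the columns in the order in which the keys first appear; older versions
sorted them alphabetically as age, class, name.)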

5. Creating a DataFrame from List of Dictionary and specifying row index


Similar to the previous example we can create a DataFrame from a list of dictionaries, but in addition,
instead of the default row labels (0, 1, 2, 3, …), we can specify our own row labels by using the
index=[list_of_row_labels] parameter of the DataFrame() method.
#5 Creating a DataFrame from List of Dictionary and row index
import pandas as pd

d1 = [{'name':'abc', 'age':15 }, {'name': 'def', 'age':16, 'class':5} ]


df1 = pd.DataFrame(d1, index=['r1', 'r2'])
print('df1=\n',df1)

o/p:
df1=
name age class
r1 abc 15 NaN
r2 def 16 5.0

6. Creating a DataFrame from List of Dictionary and specifying row / column index
Similar to the previous two examples we can use the index=[list_of_row_labels] and
columns=[list_of_column_labels] to specify the row index as well as the column index.

Here, while specifying the column labels, we have the flexibility of specifying only a limited list of
column names, in which case only the columns appearing in the list appear in the DataFrame object.
Another flexibility is that if any additional column name is specified which does not exist in any of the
dictionaries then that column is created in the DataFrame object and all the values appear as NaN under
that column.
#6 Creating a DataFrame from List of Dictionary and row / column index
import pandas as pd

d1 = [{'name':'abc', 'age':15 }, {'name': 'def', 'age':16, 'class':5} ]

df1 = pd.DataFrame(d1, index=['r1', 'r2'], columns=['name','age']) #column class is left out


print('df1=\n',df1)

df2 = pd.DataFrame(d1, index=['r1', 'r2'], columns=['name','age','marks']) # column marks is added


print('df2=\n',df2)

o/p:
df1=
name age
r1 abc 15
r2 def 16
df2=
name age marks
r1 abc 15 NaN
r2 def 16 NaN

7. Creating a DataFrame using csv files / Writing to csv file


A csv file can be imported directly to a DataFrame object using the read_csv() method. The read_csv()
method has many parameters to control the kind of data imported.

The parameter sep='char' can be used to specify the character used to separate the column values, by
default it is the comma(,).

The parameter index_col=int can be used to specify the column from which the row labels are to be taken.
The int is the column number of the column containing the row labels. The first column has index 0, the
second column has index 1, and so on.

Similar to importing of data from a csv file, data from a DataFrame object can be exported to a csv file
using the to_csv() method. The to_csv() method has many parameters to control the kind of data
exported. The parameter index=False will not export the index as a column in the csv file. The
parameter header=False will omit writing of the column names to the csv file being exported.

Content of file 'students.csv':
name,age
abc,15
def,16
ghi,17

Content of file 'newdata.csv':
regno,name,age,hometown
101,abc,15,lll
111,def,16,mmm
121,ghi,17,nnn
#7 Creating a DataFrame using csv files / Writing to csv file
import pandas as pd
df1 = pd.read_csv('students.csv')
print('df1=\n',df1)

df2 = pd.read_csv('newdata.csv',sep=',', index_col=0 )


print('df2=\n',df2)

df1.to_csv('newfile1.csv')
df1.to_csv('newfile2.csv',index=False, header=False)

o/p:
df1=
name age
0 abc 15
1 def 16
2 ghi 17
df2=
name age hometown
regno
101 abc 15 lll
111 def 16 mmm
121 ghi 17 nnn

Content of file 'newfile1.csv':
,name,age
0,abc,15
1,def,16
2,ghi,17

Content of file 'newfile2.csv':
abc,15
def,16
ghi,17
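
The effect of the index_col, index and header parameters can be tried without creating any files. In the sketch below, which is only an illustration, the csv text is held in a string and read through io.StringIO, and to_csv() is called without a file name so that it returns the csv text as a string. The names text, with_labels and data_only are made up for this example.

```python
import io
import pandas as pd

# csv content held in a string instead of a file on disk
text = "regno,name,age,hometown\n101,abc,15,lll\n111,def,16,mmm\n121,ghi,17,nnn\n"

df = pd.read_csv(io.StringIO(text), sep=',', index_col=0)   # column 0 (regno) gives the row labels
print('df=\n', df)

with_labels = df.to_csv()                           # row labels and column names are written
data_only = df.to_csv(index=False, header=False)    # only the data values are written

print(with_labels)
print(data_only)
```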
Common properties/attributes of DataFrames
Assume DataFrame df1 is as defined below:
dict1={'students':['abc', 'def','ghi'],
'marks': [24.5, 27.5, 30],
'sports': ['cricket', 'badminton', 'football']}
df1=pd.DataFrame(dict1,index=['I','II','III'])
print('df1=\n',df1)

o/p:
df1=
students marks sports
I abc 24.5 cricket
II def 27.5 badminton
III ghi 30.0 football

SrNo Attribute Description Example


1 index displays the index (row labels) of DataFrame print('index is:\n', df1.index)
o/p:
index is:
Index(['I', 'II', 'III'], dtype='object')
2 columns displays the column labels of the DataFrame print('columns are:\n', df1.columns)
o/p:
columns are:
Index(['students', 'marks', 'sports'], dtype='object')
3 axes Returns a list containing both the axes elements print('axes are:\n', df1.axes)
o/p:
axes are:
[Index(['I', 'II', 'III'], dtype='object'),
Index(['students', 'marks', 'sports'], dtype='object')]
4 dtypes Returns the dtype of data of each columns print("dtypes are:\n", df1.dtypes)
o/p:
dtypes are:
students object
marks float64
sports object
dtype: object
5 size Returns the number of elements in the object print('size are:\n', df1.size)
o/p:
size are:
9
6 shape Returns a tuple () representing the dimensions print('shape is :\n', df1.shape)
(rows,columns) of the DataFrame o/p:
shape is :
(3, 3)
7 ndim Returns an int representing the number of axes print('ndim is :\n', df1.ndim)
o/p:
ndim is :
2
8 empty Returns True/False to show if the DataFrame is print('Is DataFrame empty:\n', df1.empty)
empty o/p:
Is DataFrame empty:
False
9 T Displays the Transpose of the DataFrame print('Transpose is:\n', df1.T)
o/p:
Transpose is:
I II III
students abc def ghi
marks 24.5 27.5 30.0
sports cricket badminton football
10 len Displays the number of rows of the DataFrame print('Number of rows of DataFrame is:', len(df1))
o/p:
Number of rows of DataFrame is: 3
DataFrame Operations
Assume DataFrame df1 is as defined below:
dict1={'students':['abc', 'def','ghi'],
'marks': [24.5, 27.5, 30],
'sports': ['cricket', 'badminton', 'football']}
df1=pd.DataFrame(dict1,index=['I','II','III'])
print('df1=\n',df1)

o/p:
df1=
students marks sports
I abc 24.5 cricket
II def 27.5 badminton
III ghi 30.0 football

#1. Selecting/Accessing a single column


print("Students column is :\n", df1['students']) #using square brackets to access column

o/p:
Students column is :
I abc
II def
III ghi
Name: students, dtype: object

print("Marks column is :\n", df1.marks) #using dot notation to access a column

o/p:
Marks column is :
I 24.5
II 27.5
III 30.0
Name: marks, dtype: float64
The square bracket notation (df1['students'], df1[2017]) can be used when the column names are
strings ('students') or numbers (2017). The dot notation (df1.marks) can only be used when the column
name is a string that is a valid Python identifier and does not clash with an existing DataFrame
attribute or method. Hence we use the square bracket notation in general for all cases.

#2. Selecting/Accessing Multiple columns


print("Students and Marks columns are:\n", df1[['students','marks']])
# use two square brackets only to access multiple columns
o/p:
Students and Marks columns are:
students marks
I abc 24.5
II def 27.5
III ghi 30.0

#3. Selecting subset of rows/columns from Dataframe using row names and column names
print('Displaying subset:\n', df1.loc['I':'II', 'students':'marks'])
o/p:
Displaying subset:
students marks
I abc 24.5
II def 27.5
<dataframe>.loc[ <startrow> : <endrow> , <startcolumn> : <endcolumn> ]
Used to access a subset of a DataFrame using row index names and column index names. Both the end row
and the end column are included in the result.
#4. Selecting subset of rows/columns using row numbers and column numbers
print('Displaying subset using row and column index numbers:\n', df1.iloc[0:2, 1:3])

o/p:
Displaying subset using row and column index numbers:
marks sports
I 24.5 cricket
II 27.5 badminton
<dataframe>.iloc[ <startrow index> : <endrow index> : <step value> ,
<startcolumn index> : <endcolumn index> : <step value> ]
Used to access a subset of a DataFrame using row index numbers and column index numbers, given as a
row slice and a column slice. The end index is excluded. If the step value is not written it is assumed
to be 1.
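
The two forms can be compared on the same DataFrame. The sketch below is only an illustration; the names by_name, by_number and every_other_row are made up for this example.

```python
import pandas as pd

dict1 = {'students': ['abc', 'def', 'ghi'],
         'marks': [24.5, 27.5, 30],
         'sports': ['cricket', 'badminton', 'football']}
df1 = pd.DataFrame(dict1, index=['I', 'II', 'III'])

by_name = df1.loc['I':'II', 'students':'marks']   # end labels 'II' and 'marks' are included
by_number = df1.iloc[0:2, 0:2]                    # end positions 2 and 2 are excluded

every_other_row = df1.iloc[::2, :]                # step value 2 over the rows, all columns

print('by_name=\n', by_name)
print('by_number=\n', by_number)
print('every_other_row=\n', every_other_row)
```

Here by_name and by_number select the same two rows and two columns.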

#5. Selecting/Accessing individual value using column name and row name
print("Value in row I column student is:\n", df1.students['I'])
# after dot there is column name and inside square bracket the row name
o/p:
Value in row I column student is:
abc

#6. Selecting/Accessing individual value using column name and row number
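print("Value in row 0 column sports is:\n", df1.sports.iloc[0])
# after dot there is column name and inside .iloc[] the row number
# (plain df1.sports[0] treats 0 as a position only as a fallback, which is deprecated in recent pandas versions)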
o/p:
Value in row 0 column sports is:
cricket

#7. Selecting/Accessing individual value using at attribute i.e. using row name and column
name
print("Accessing individual value using at attribute:\n", df1.at['II','students'])
# after .at there is square bracket and then inside it row name and column name
o/p:
Accessing individual value using at attribute:
def

#8. Selecting/Accessing individual value using iat attribute i.e. using numeric row index and
column index
print("Accessing individual value using iat attribute:\n", df1.iat[2,2])
# after .iat there is square bracket and then inside it row number and column number
o/p:
Accessing individual value using iat attribute:
football

Difference between at, iat, loc, iloc:


at –used to access a single element of a DataFrame using row index name and column index name
iat - used to access a single element of a DataFrame using row index number and column index
number
loc – used to access a group of rows and columns using row index name and column index name
iloc - used to access a group of rows and columns using row index number and column index number
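
The four accessors can be compared side by side. This is a minimal sketch; the DataFrame is an assumed copy of df1 rebuilt from the outputs above.

```python
import pandas as pd

# df1 rebuilt from the outputs shown above (assumed values)
df1 = pd.DataFrame(
    {"students": ["abc", "def", "ghi"],
     "marks": [24.5, 27.5, 30.0],
     "sports": ["cricket", "badminton", "football"]},
    index=["I", "II", "III"],
)

single_by_label = df1.at["II", "students"]                    # one value, by names
single_by_position = df1.iat[1, 0]                            # same value, by numbers
group_by_label = df1.loc[["I", "III"], ["students", "sports"]]  # group, by names
group_by_position = df1.iloc[[0, 2], [0, 2]]                  # same group, by numbers

print(single_by_label, single_by_position)
print(group_by_label)
print(group_by_position)
```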
#9. Changing a single data value
df1.students['I']='xyz'
df1.sports[0]='chess'
df1.at['II','students']='pqr'
df1.iat[2,2]='carrom'
print('df1(After updating values)=\n', df1)

o/p:
df1(After updating values)=
students marks sports
I xyz 24.5 chess
II pqr 27.5 badminton
III ghi 30.0 carrom
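All four methods described previously for accessing an individual value of a DataFrame can also
be used to change that value. In newer pandas versions the first two forms (column first, then row)
may raise a warning or leave df1 unchanged, whereas at and iat update the DataFrame directly.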
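
As a hedged sketch, the same updates can be written with single-step indexers only (at, iat, loc), which address the row and the column in one operation. The DataFrame is an assumed copy of df1 rebuilt from the outputs above.

```python
import pandas as pd

# df1 rebuilt from the outputs shown above (assumed values)
df1 = pd.DataFrame(
    {"students": ["abc", "def", "ghi"],
     "marks": [24.5, 27.5, 30.0],
     "sports": ["cricket", "badminton", "football"]},
    index=["I", "II", "III"],
)

df1.at["I", "students"] = "xyz"    # row name, column name
df1.loc["I", "sports"] = "chess"   # loc also accepts a single row/column pair
df1.at["II", "students"] = "pqr"
df1.iat[2, 2] = "carrom"           # row number, column number

print(df1)
```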

#10. Adding / Changing a column (same value in all rows)


df1['hometown'] = 'otp' # column named hometown is added to the DataFrame
print('df1(After adding column hometown)=\n', df1)

o/p:
df1(After adding column hometown)=
students marks sports hometown
I xyz 24.5 chess otp
II pqr 27.5 badminton otp
III ghi 30.0 carrom otp
The value 'otp' is appearing across all rows of the DataFrame.

#11. Adding / Changing a column (different values in all rows)


df1['hometown'] = ['ottapalam', 'shoranur', 'palakkad']
print('df1(After updating column hometown)=\n', df1)

o/p:
df1(After updating column hometown)=
students marks sports hometown
I xyz 24.5 chess ottapalam
II pqr 27.5 badminton shoranur
III ghi 30.0 carrom palakkad
The value ['ottapalam', 'shoranur', 'palakkad'] is appearing across the rows of the DataFrame.

#12. Adding / Changing row (same value in all columns)


df1.at['IV', :] = 'rrr'
print('df1(After adding row IV )=\n', df1)

o/p:
df1(After adding row IV )=
students marks sports hometown
I xyz 24.5 chess ottapalam
II pqr 27.5 badminton shoranur
III ghi 30 carrom palakkad
IV rrr rrr rrr rrr
The value 'rrr' is appearing across the columns of the DataFrame.

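Only the label-based indexers (at and loc) can be used to add a new row, because a new row is
identified by its row label; iat and iloc can only address positions that already exist. loc is the
usual choice for a whole row, since at is intended for single values.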

#13. Adding / Changing row (different values in all columns)


df1.loc['IV',: ] = ['mno', 25.5, 'football', 'delhi']
print('df1(After updating row 3 )=\n', df1)

o/p:
df1(After updating row 3 )=
students marks sports hometown
I xyz 24.5 chess ottapalam
II pqr 27.5 badminton shoranur
III ghi 30 carrom palakkad
IV mno 25.5 football delhi
The value ['mno', 25.5, 'football', 'delhi'] is appearing across the columns of the DataFrame.

As in #12, the row is added or modified through a label-based indexer (here loc), because rows are
identified by their row labels.

#14. Deleting a column


del df1['hometown']
print('df1(After deleting column hometown )=\n', df1)

df1.drop(['sports'], axis=1, inplace=True)


print('df1(After deleting column sports )=\n', df1)

s1=df1.pop('marks')
print('Deleted column is:\n', s1)
print('df1(After deleting column marks )=\n', df1)

o/p:
df1(After deleting column hometown )=
students marks sports
I xyz 24.5 chess
II pqr 27.5 badminton
III ghi 30 carrom
IV mno 25.5 football
df1(After deleting column sports )=
students marks
I xyz 24.5
II pqr 27.5
III ghi 30
IV mno 25.5
Deleted column is:
I 24.5
II 27.5
III 30
IV 25.5
Name: marks, dtype: object
df1(After deleting column marks )=
students
I xyz
II pqr
III ghi
IV mno
There are three ways of deleting a column of a DataFrame:
a) using the python del command as:
del dataframeobject[columnname]
b) using the dataframe drop() method
The drop command can be used to delete rows (axis=0) or columns(axis=1).
The first parameter is a list containing either the row index names or the column index
names.
The parameter inplace=True is used to modify/delete the dataframe df1 itself. If this
parameter is not specified or is False then the dataframe df1 is not modified, instead it
returns a new dataframe with the row or column deleted.
c) Using the pop('columnname') method
The pop() method is used to delete a single column from a DataFrame. In addition, the
column that was deleted is returned back as a Series object.
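
The effect of leaving out inplace=True, and the value returned by pop(), can be seen in a small sketch. The column names follow the example above; the data values are assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"students": ["xyz", "pqr", "ghi"],
     "marks": [24.5, 27.5, 30.0],
     "sports": ["chess", "badminton", "carrom"]},
    index=["I", "II", "III"],
)

# Without inplace=True, drop() returns a new DataFrame and df is left unchanged
without_sports = df.drop(["sports"], axis=1)
columns_after_drop = list(df.columns)

# pop() removes the column from df and returns it as a Series
marks = df.pop("marks")

# del removes the column from df and returns nothing
del df["sports"]
remaining_columns = list(df.columns)

print(without_sports)
print(marks)
print(df)
```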

#15. Deleting a row


df1.drop(['II','III'], axis=0,inplace=True) #first parameter contains multiple row labels to be deleted
print('df1(After deleting row 1 and 2 )=\n', df1)

o/p:
df1(After deleting row 1 and 2 )=
students marks sports
I xyz 24.5 chess
IV mno 25.5 football
The drop command can be used to delete rows (axis=0) or columns(axis=1).

If multiple rows are to be deleted then the first parameter must contain the list of row names to be
deleted.

#16. head() and tail() functions


The head() function is used to retrieve the top rows of a DataFrame whereas the tail() function is
used to retrieve the bottom rows of a DataFrame. If no parameter is passed, then it retrieves the top
5 or bottom 5 rows.

If a positive value, n, is passed to the head function then it retrieves the top n rows. If a negative
value, -n, is passed to the head function, then it returns all the rows except the last n rows.

Similarly, if a positive value, n, is passed to the tail function then it retrieves the bottom n rows of the
DataFrame. If a negative value, -n, is passed to the tail function, then all the rows except the first n
rows are retrieved.

These functions are useful for quickly verifying the data for example after sorting or adding rows.
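
The negative forms can be checked with a small sketch. The data here is assumed for illustration: ten rows with the default index 0 to 9.

```python
import pandas as pd

df = pd.DataFrame({"students": list("abcdefghij"),
                   "marks": list(range(10))})

top_three = df.head(3)          # rows 0, 1, 2
all_but_last_7 = df.head(-7)    # also rows 0, 1, 2 (10 - 7 = 3 rows remain)
bottom_two = df.tail(2)         # rows 8, 9
all_but_first_8 = df.tail(-8)   # also rows 8, 9

print(all_but_last_7)
print(all_but_first_8)
```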

#10 head and tail functions


import pandas as pd
import numpy as np

d={'students':['a', 'b','c','d','e','f','g','h','i','j'],
'marks': [25,21,8,9,15,29,np.nan,25,24,30]}
df1=pd.DataFrame(d)
df2=df1.head()
print('df2=\n',df2)

df3=df1.head(-7)
print('df3=\n',df3)

print(df1.tail(2))

o/p:
df2=
students marks
0 a 25.0
1 b 21.0
2 c 8.0
3 d 9.0
4 e 15.0
df3=
students marks
0 a 25.0
1 b 21.0
2 c 8.0
students marks
8 i 24.0
9 j 30.0

#17. Boolean Indexing


The following three methods of accessing DataFrame elements, can be passed a Boolean array to
select specific rows of the DataFrame:
a) index operator[]
b) loc property
c) iloc property
This method of accessing the rows of the DataFrame based on a Boolean array is known as Boolean
Indexing.

The length of the Boolean array passed must match the number of rows of the DataFrame, otherwise
an error is raised. The Boolean array can also be a Series object, derived by applying a relational
operator to one or more columns of the DataFrame. Different relational expressions can be combined
using the bitwise and (&), or (|), not (~) operators. When using the bitwise operators the individual
relational expressions must be enclosed in parentheses () as the bitwise operators have higher
precedence than the relational operators.
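
A minimal sketch of the three combining operators follows. The data is assumed for illustration; note the parentheses around every relational expression.

```python
import pandas as pd

df = pd.DataFrame({"students": list("abcde"),
                   "marks": [25, 21, 8, 29, 15]})

# or: marks below 10 or above 25
low_or_high = df[(df["marks"] < 10) | (df["marks"] > 25)]

# not: marks that are not above 20
not_above_20 = df[~(df["marks"] > 20)]

# and: marks from 20 to 25, both limits included, this time through loc
in_range = df.loc[(df["marks"] >= 20) & (df["marks"] <= 25)]

print(low_or_high)
print(not_above_20)
print(in_range)
```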

#11 Boolean Indexing


import pandas as pd
import numpy as np

d={'students':['a', 'b','c','d','e','f','g','h','i','j'],
'marks': [25,21,8,9,15,29,np.nan,25,24,30],
'hobby': ['mm','nn','oo','pp','qq','rr','ss','t','uu','vv']}
df1=pd.DataFrame(d)

df2=df1[[True, True, False,False,False,False,False,False,True, True]]


print('df2=\n',df2)

df3=df1.loc[[True, True, False,False,False,False,False,False,False, False]]


print('df3=\n',df3)

df4=df1.iloc[[False,False, False,False,False,False,False,False,True, True]]


print('df4=\n',df4)

#display the details of student having marks > 25


df5=df1[df1['marks'] >25]
print('df5=\n',df5)

#display details of students having marks in range 20-25


df6=df1[(df1['marks'] >20) & (df1['marks'] <25)]
print('df6=\n',df6)

o/p:
df2=
students marks hobby
0 a 25.0 mm
1 b 21.0 nn
8 i 24.0 uu
9 j 30.0 vv
df3=
students marks hobby
0 a 25.0 mm
1 b 21.0 nn
df4=
students marks hobby
8 i 24.0 uu
9 j 30.0 vv
df5=
students marks hobby
5 f 29.0 rr
9 j 30.0 vv
df6=
students marks hobby
1 b 21.0 nn
8 i 24.0 uu

Iterating over a DataFrame


Generally, when some columns of a DataFrame need to be worked on, they are extracted using
df[column_name] or a similar method, and when rows need to be processed, df.loc or df.iloc is used.
These methods should be preferred for processing a DataFrame wherever possible, as they are
optimized for performance.
On the rare occasions when we need to visit the rows or the columns of a DataFrame one at a time,
the iteration methods should be used. The following methods can be used to iterate over a DataFrame:
1. Iterate directly over a DataFrame
2. Use the df.items() method (called df.iteritems() in older pandas versions; removed in pandas 2.0)
3. Use the df.iterrows() method
4. Use the df.itertuples() method

When using any of the iteration methods, we usually work on a copy of the data. So we must not
modify the values obtained during iteration, as such changes are not reflected in the original DataFrame.
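
The point about working on a copy can be illustrated with a small sketch. It assumes a DataFrame with columns of different types, as in the examples that follow; in that case each row handed out by iterrows() is a separate Series.

```python
import pandas as pd

df = pd.DataFrame({"name": ["abc", "def", "ghi"],
                   "age": [19, 20, 21]},
                  index=["s1", "s2", "s3"])

# Changing the row Series inside the loop only changes a temporary copy
for label, row in df.iterrows():
    row["age"] = row["age"] + 1

ages_after_loop = list(df["age"])   # the DataFrame still holds the old ages

# Working on the whole column updates the DataFrame itself
df["age"] = df["age"] + 1
ages_after_column_update = list(df["age"])

print(ages_after_loop)
print(ages_after_column_update)
```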
#18. Iterating directly over a DataFrame
Iterating directly over a DataFrame gives the column names.
import pandas as pd

d={ 'name': ['abc','def','ghi'],
    'age': [19,20,21],
    'hobby':['reading','playing','gardening']}
df=pd.DataFrame(d,index=['s1','s2','s3'])
print(df)
print('Iterating over DataFrame')
for i in df:
    print(i)
o/p:
name age hobby
s1 abc 19 reading
s2 def 20 playing
s3 ghi 21 gardening
Iterating over DataFrame
name
age
hobby

#19. Use the df.iteritems() or df.items() method


The df.items() method returns two objects at each iteration - the first one is the column name and
the second one is a Series object having all the values of that particular column. In older pandas
versions df.iteritems() had the same effect; it was removed in pandas 2.0, so df.items() should be used.
#Using items (earlier iteritems)
import pandas as pd

d={ 'name': ['abc','def','ghi'],
    'age': [19,20,21],
    'hobby':['reading','playing','gardening']}
df=pd.DataFrame(d,index=['s1','s2','s3'])
print(df)
print('Using iteritems')
for cname,cseries in df.items():   #df.iteritems() gave the same result in older pandas
    print('cname:',cname)
    print('cseries:\n',cseries)

o/p:
name age hobby
s1 abc 19 reading
s2 def 20 playing
s3 ghi 21 gardening
Using iteritems
cname: name
cseries:
s1 abc
s2 def
s3 ghi
Name: name, dtype: object
cname: age
cseries:
s1 19
s2 20
s3 21
Name: age, dtype: int64
cname: hobby
cseries:
s1 reading
s2 playing
s3 gardening
Name: hobby, dtype: object

#20. Use the df.iterrows() method


Using the df.iterrows() method we get back two objects - the first object is the row label or index and
the second object is a Series object containing the elements of one particular row at each iteration.
The Series object has index as the column name and the value of Series object is the value under that
particular column for that particular row.
#Using iterrows
import pandas as pd

d={ 'name': ['abc','def','ghi'],
    'age': [19,20,21],
    'hobby':['reading','playing','gardening']}
df=pd.DataFrame(d,index=['s1','s2','s3'])
print(df)
print('Using iterrows')
for rname,rseries in df.iterrows():
    print('rname:',rname)
    print('rseries:\n',rseries)

o/p:
name age hobby
s1 abc 19 reading
s2 def 20 playing
s3 ghi 21 gardening
Using iterrows
rname: s1
rseries:
name abc
age 19
hobby reading
Name: s1, dtype: object
rname: s2
rseries:
name def
age 20
hobby playing
Name: s2, dtype: object
rname: s3
rseries:
name ghi
age 21
hobby gardening
Name: s3, dtype: object

#21. Use the df.itertuples() method


Using the df.itertuples() method we get back a named tuple for each row of the DataFrame. [ Note:
Named tuple is not there in syllabus ]

The first element of the named tuple is the row label and the remaining elements are the values
under different columns for that particular row.
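
Because each row arrives as a named tuple, its values can be read by name as well as by position. A short sketch using the same data as the program below (the summary list is only for illustration):

```python
import pandas as pd

d = {"name": ["abc", "def", "ghi"],
     "age": [19, 20, 21],
     "hobby": ["reading", "playing", "gardening"]}
df = pd.DataFrame(d, index=["s1", "s2", "s3"])

summary = []
for r in df.itertuples():
    # r.Index is the row label; r.name and r.age are values from the columns
    summary.append(f"{r.Index}: {r.name} is {r.age}")
    first_field = r[0]   # position 0 holds the row label as well

print(summary)
```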
#Using itertuples
import pandas as pd

d={ 'name': ['abc','def','ghi'],
    'age': [19,20,21],
    'hobby':['reading','playing','gardening']}
df=pd.DataFrame(d,index=['s1','s2','s3'])
print(df)
print('Using itertuples')
for r in df.itertuples():
    print(r)

o/p:
name age hobby
s1 abc 19 reading
s2 def 20 playing
s3 ghi 21 gardening
Using itertuples
Pandas(Index='s1', name='abc', age=19, hobby='reading')
Pandas(Index='s2', name='def', age=20, hobby='playing')
Pandas(Index='s3', name='ghi', age=21, hobby='gardening')
Pandas DataFrame

Features of DataFrame
- 2D structure
- Columns are potentially of different types
- Size - mutable
- Labeled axes (rows and columns)
- Can perform arithmetic operations on rows and columns

DataFrame attributes
axes    Return a list representing the axes of the DataFrame.
columns The column labels of the DataFrame.
dtypes  Return the dtypes in the DataFrame.
index   The index (row labels) of the DataFrame.
ndim    Return an int representing the number of axes / array dimensions.
shape   Return a tuple representing the dimensionality of the DataFrame.
size    Return an int representing the number of elements in this object.
values  Return a Numpy representation of the DataFrame.

Create DataFrame

pandas.DataFrame( data, index, columns, dtype)
data : takes various forms like ndarray, series, map, lists, dict, constants and also another DataFrame.
index : For the row labels
columns : For column labels
dtype : Data type of each column.

# Dataframe using list of lists
import pandas as pd
l=[[1,10,'a'],[2,20,'b'],[3,30,'c']]
df = pd.DataFrame(l)
print(df)
o/p:
   0   1  2
0  1  10  a
1  2  20  b
2  3  30  c

import pandas as pd
l=[[1,3,-9],[4,5,6],[7,5,8]]
df = pd.DataFrame(l,index=["r1","r2","r3"],columns=["c1","c2","c3"])
print(df)
o/p:
    c1  c2  c3
r1   1   3  -9
r2   4   5   6
r3   7   5   8
Note: Observe the arrangement of rows and columns.

#Dataframe using dictionary
import pandas as pd
dic = {'one':[1,2,3,4],'two':[5,6,7,8],'three':[9,10,11,12]}
df = pd.DataFrame(dic)
print(df)
o/p:
   one  two  three
0    1    5      9
1    2    6     10
2    3    7     11
3    4    8     12
Note: Observe the arrangement of rows and columns.

import pandas as pd
dic = {'one':[1,2,3,4],'two':[5,6,7,8],'three':[9,10,11,12]}
df = pd.DataFrame(dic,columns=['c1','c2','c3'])
print(df)
o/p:
Empty DataFrame
Columns: [c1, c2, c3]
Index: []
Note: As no columns match the keys, the DataFrame is empty. Even if one column matches, the
DataFrame would be created with that column's values and the remaining columns as NaN.

# Dataframe from dictionary using from_dict( )
import pandas as pd
dic = {'one':[1,2,3,4],'two':[5,6,7,8],'three':[9,10,11,12]}
df = pd.DataFrame.from_dict(dic,orient="index")
print(df)
df = pd.DataFrame.from_dict(dic,orient='columns')
print(df)
o/p:
       0   1   2   3
one    1   2   3   4
two    5   6   7   8
three  9  10  11  12
   one  two  three
0    1    5      9
1    2    6     10
2    3    7     11
3    4    8     12
Note: Observe the effect of orientation on the index and columns.

# Data frame using numpy array
import pandas as pd
import numpy as np
a = np.random.randint(0,20,size=(3,3))
df = pd.DataFrame(a)
print(df)
o/p (sample run):
    0   1   2
0  18  18  10
1  17   4   1
2  13   0  16
Note: Any numpy method to create an array can be used.

# Dataframe using Series
import pandas as pd
s1 = pd.Series([1,2,3,4])
s2 = pd.Series([5,6,7,8])
s3 = pd.Series([9,10,11,12])
df = pd.DataFrame([s1,s2,s3])
print(df)
o/p:
   0   1   2   3
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12

# Dataframe using Text file
import pandas as pd
df=pd.read_table("data.txt",header=0)
print(df)
o/p:
   regno  roll    name
0      1    34  Rajesh
1      2    42   Suman
Note: Write the complete path for the text file. header specifies the row that is to be taken as
column labels.

# Dataframe using CSV file
import pandas as pd
df=pd.read_csv("student1.csv",index_col=0)
print(df)
o/p:
            Name  Eng  Phy  Chem  Maths  IP
Roll No
12       Kishore   23   54    36     56  54
44         Tarun   34   65    45     46  52
Note: index_col gives the column to be used as index label.

# Dataframe to CSV
df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
                   'mask': ['red', 'purple'],
                   'weapon': ['sai', 'bo staff']})
df.to_csv(index=False)

Iteration over row

# row iteration
import pandas as pd
dic={"SName":["Radha","Sam","Ameer","Aman"],
     "Gender":['F','M','M','M'],"Age":[17,18,20,18]}
df = pd.DataFrame(dic)
print(df)
print("Iterate using iterrows:")
for i,j in df.iterrows():
    print(j)
    print("------")
print("Iterate using items:")
for key,value in df.items():   # df.iteritems() in older pandas versions
    print(value)
print("Iterate using itertuples:")
for i in df.itertuples():
    print(i)

Iteration over column

#column iteration
import pandas as pd
dic={"Name":["Radha","Sam","Ameer","Aman"],
     "Gender":['F','M','M','M'],"Age":[17,18,20,18]}
df = pd.DataFrame(dic)
print("Iterate over columns")
for i in df.columns:
    print(df[i])

Arithmetic Operations

import pandas as pd
import numpy as np
arr1 = np.random.randint(1,10,(3,3))
arr2 = np.random.randint(1,10,(3,3))
df1 = pd.DataFrame(arr1)
df2 = pd.DataFrame(arr2)
df3 = df1 + df2
print("The addition is:")
print(df3)
df3 = df1 - df2
print("The subtraction is:")
print(df3)
df3 = df1 * df2
print("The multiplication is:")
print(df3)
df3 = df1 / df2
print("The division is:")
print(df3)
df3 = df1 % df2
print("The modulus is:")
print(df3)
Note: All the corresponding elements of both DataFrames are taken as the two operands for the
operations.

Mathematical Operations with NaN values

import pandas as pd
import numpy as np
arr1 = np.random.randint(1,10,(3,4))
s1 = pd.Series([2,3,5,6])
s2 = pd.Series([1,5,8,4])
df1 = pd.DataFrame(arr1)
df2 = pd.DataFrame([s1,s2])
print("The Dataframe 1 is:")
print(df1)
print("The Dataframe 2 is:")
print(df2)
df3 = df1.add(df2 , axis = 0)
print("The addition along axis = 0 is:")
print(df3)
df3 = df1.add(df2, axis = 0,fill_value=1)
print("The addition along axis = 0 is:")
print(df3)
o/p (sample run):
The Dataframe 1 is:
   0  1  2  3
0  8  7  8  9
1  5  1  5  9
2  6  5  6  8
The Dataframe 2 is:
   0  1  2  3
0  2  3  5  6
1  1  5  8  4
The addition along axis = 0 is:
      0     1     2     3
0  10.0  10.0  13.0  15.0
1   6.0   6.0  13.0  13.0
2   NaN   NaN   NaN   NaN
The addition along axis = 0 is:
      0     1     2     3
0  10.0  10.0  13.0  15.0
1   6.0   6.0  13.0  13.0
2   7.0   6.0   7.0   9.0
Note: If there is a NaN value in the data, the complete data is considered to be float.

Selection using dot operator

# selection using dot operator
import pandas as pd
df = pd.read_csv("StudentsPerformance.csv",index_col=['Reg No'])
print(df)
print("Selected column values for gender are : ")
print(df.gender)
Note: Columns having labels with spaces cannot be accessed with the dot operator.

Selection using [ ]

# selection using [ ]
import pandas as pd
df = pd.read_csv("StudentsPerformance.csv",index_col=['Reg No'])
print("Selected column values for reading score are : ")
print(df[['reading score','writing score']])

Selection using loc

import pandas as pd
df = pd.read_csv("StudentsPerformance.csv",index_col="Reg No")
print(df)
#print("A selection of row based on label is:\n")
#print(df.loc[235 : 575,])
#print("A selection of multiple discrete rows based on label is:\n")
#print(df.loc[[231 , 475],])
#print("A selection of alternate rows based on label is:\n")
#print(df.loc[::2,])
#print("A selection cross section of a row and column is:\n")
#print(df.loc[[231,475],'race/ethnicity'])
#print("A selection cross section of a row and column is:\n")
#print(df.loc[[342,377],['race/ethnicity','math score']])
print("Selection using slicing of rows \n")
newdf=df.loc[332:543,'gender':'reading score':2]
print(newdf)
Remember : both the starting and the ending label are included in the range while using loc to make
a selection.

Selection using iloc

import pandas as pd
df= pd.read_csv("StudentsPerformance.csv",index_col='Reg No')
print(df)
#print("The 3rd row of Dataframe is:")
#print(df.iloc[3])
#print("\nThe Parental educational level of 4th row is:")
#print(df.iloc[4,4])
#print("Select multiple rows using list")
#print(df.iloc[[2,3,8],])
#print("Select multiple columns using list")
#print(df.iloc[:,[3,4,6]])
#print("Select multiple rows and multiple columns using list")
#print(df.iloc[[2,6,7],[3,4,6,7]])
print("Select multiple rows and multiple columns using slicing")
print(df.iloc[2:3,1:4])
Remember : while using a range with iloc the lower limit is included but the upper limit is excluded.

Boolean Selection

import pandas as pd
df = pd.read_csv("StudentsPerformance.csv",index_col='Reg No')
print(df['reading score']>80)
print("Select students whose Reading scores are greater than 80")
print(df[df['reading score']>80])
#print("Data of Male students")
#print(df[df['gender']=="male"])

Head and Tail function

import pandas as pd
df = pd.read_csv("StudentsPerformance.csv")
print(df.shape)
#first n rows
print(df.head(2))
#last n rows
print(df.tail(14))
Note : Default value of n is 5 i.e. df.tail() will display last 5 rows

Rename Row / Column

import pandas as pd
import numpy as np
a = np.random.randint(1,10,(4,5))
df = pd.DataFrame(a,columns = ['a','b','c','d','e'],index=['Row1','Row2','Row3','Row4'])
df2 = df.rename(index={'Row1':1,'Row2':2,'Row3':3,'Row4':4} )
print("The new Dataframe is:")
print(df2)
df2 = df.rename(columns={'a':1,'b':2,'c':3,'d':4,'e':5} )
print("The new Dataframe is:")
print(df2)

Delete Row / Column

import pandas as pd
import numpy as np
a = np.random.randint(1,10,(5,5))
df=pd.DataFrame(a)
print("The original Dataframe")
print(df)
df.drop(0,axis=0,inplace=True)
print("The Dataframe after deleting 0th row")
print(df)
#discrete rows to be deleted (the labels must still exist in df)
df.drop([1,3],axis=0,inplace = True)
print("The Dataframe after deleting rows")
print(df)

Add a row

import pandas as pd
import numpy as np
a = np.random.randint(1,10,(3,5))
df = pd.DataFrame(a,columns=['a','b','c','d','e'],index = [12,34,56])
df.loc[62]=[1,2,3,4,5]
print(df)
df2 = pd.concat([df, pd.DataFrame([{'a':1,'b':2,'c':3,'d':4,'e':5}])], ignore_index = True)
print(df2)
Note: When values are given for a non-existent index, the new index (row) is created. A row held in
a dictionary can also be added with pd.concat(); ignore_index = True renumbers the rows. Older
pandas versions offered df.append(dictionary, ignore_index = True) for this purpose; it was removed
in pandas 2.0.

Add Column

import pandas as pd
import numpy as np
a = np.random.randint(1,10,(3,5))
df = pd.DataFrame(a,columns=['a','b','c','d','e'],index = [12,34,56])
print(df)
df['e']=[0,0,0]
df['f'] = [10,15,12]
df['g'] = 50
print(df)
df.insert(4,column = 'add',value = [-1,-1,-1])
print(df)
Note: A new column is created when values are given for a non-existent column. In column 'g' a
scalar value is given for all indexes, which means the complete column will have the same value
i.e. 50. A column created this way is always the last column.
Note: insert( ) can insert the column in any position. In the example, 4 is the position and value
gives the values to be added.
Data Visualization using Pyplot
Data Visualization means representing the data in a graphical format which is easier to understand.
For Data Visualization in Python we are using the Matplotlib library

Matplotlib
Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy
formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python
and IPython shells, the Jupyter notebook, web application servers, and different graphical user interface
toolkits.

Types of plots/charts (to be studied as per syllabus):


Some examples of charts are:
1. Line plots
2. Bar plot
3. Histograms

Working with matplotlib


For working with matplotlib usually we use the following import command:
import matplotlib.pyplot as plt

and frequently we need numpy for creating datasets, so numpy is also imported as follows:
import numpy as np

Matplotlib Object Hierarchy


A plot drawn in matplotlib is a hierarchy of nested Python objects as shown below:

• A Figure object is the outermost container for a matplotlib graphic. It is the overall window/page on
which everything is drawn.
• The Axes is the area on which the data is plotted with functions such as plot() and scatter(). A
Figure can contain multiple Axes, but an Axes object is a part of only one Figure.
• Below the Axes in the hierarchy are smaller objects such as tick marks, individual lines, legends, and
text boxes. Almost every “element” of a chart is a Python object which can be manipulated, all the way
down to the ticks and labels.
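
The hierarchy can be explored from a script. The sketch below is only an illustration: the data is made up, and the non-interactive "Agg" backend is selected so that it also runs where no screen is available (remove that line and call plt.show() to see the window).

```python
import matplotlib
matplotlib.use("Agg")            # assumption: run without a display window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()         # one Figure containing one Axes
(line,) = ax.plot([1, 2, 3], [2, 4, 6], label="demo")
ax.legend()

print(type(fig).__name__)        # the outermost container
print(fig.axes == [ax])          # the Figure holds the Axes
print(ax.figure is fig)          # the Axes belongs to exactly one Figure
print(line in ax.get_lines())    # the plotted line is an object inside the Axes

tick_labels = ax.get_xticklabels()   # even tick labels are Python objects
print(len(tick_labels) > 0)
```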
Parts of a Figure/Plot

Basic Steps involved in drawing any plot

1. Identify the data you want to represent on the plot.


For plots such as line graph it means identify the values that will be represented in the X-axis as well as Y-
axis. For pie-charts, histograms etc. there will usually be only one dataset.
2. Identify the structure of the plot you want
The next step is identifying which plot will be suitable to represent the data accurately. It can be line plot,
bar plot, histogram etc. Also consider whether you want many sets of data to be represented in the same
plot or to show different plots for different sets of data.
3. Setup the different parameters of the plot
Each plot has different components such as the xticks, yticks, the shape/colour of markers/plots, legend
etc. Set the parameters of the plot.
4. Draw the plot.
Line plots:
A line chart or line plot or line graph or curve chart is a type of chart which displays information as a series of
data points called 'markers' connected by straight line segments.

A line chart is often used to visualize a trend in data over intervals of time – a time series – thus the line is often
drawn chronologically.
Plotting a Line Plot

#1 line plot - basic


import matplotlib.pyplot as plt

#1 setup the data


x=[1,2,3,4,5]
y=[2,4,6,8,10]

#2 setup the parameters for the plot


plt.plot(x,y)

#3 display the plot


plt.show()

Output:

1. For drawing any plot, matplotlib.pyplot is usually imported as plt.
2. The plot function plt.plot() draws a line graph by joining each pair of successive (x, y) points
with a straight line.
3. The plot function accepts two datasets: the first is a list of x-coordinates and the second a list of
the corresponding y-coordinates. Both lists must contain the same number of values.
4. plt.plot(x,y) draws the graph and plt.show() displays the plot on the screen.

Variations to the above program:


1. The x- and y-coordinates can be numpy arrays. The advantage of using numpy is that linearly spaced
values of x-coordinates can be generated as follows.
import numpy as np
x=np.linspace(10,120,100)
The above code generates 100 values that are linearly spaced between the end values of 10 and 120.

2. After generating the x-values if the y-values can be represented as an equation in terms of x-variable
then the y values can be generated directly as follows:
y = x*x +2*x + 6
The above code will generate the corresponding y-values for all the x- values that are generated above.
For generating the sin values the following can be used:
y=np.sin(x)

3. For smooth curves, the x-values must be spaced closely enough over the range to be displayed;
otherwise the curve will appear as jagged line segments.
#2 line plot - using numpy arrays
import matplotlib.pyplot as plt
import numpy as np

#1 setup the data


x=np.linspace(10,120,100) #generate 100 linearly spaced values betwn 10 & 120
y = x*x +2*x + 6 #generate the corresponding y-values

plt.plot(x,y) #plot and show


plt.show()

x=np.linspace(0,np.pi*2,100) #generate 100 values between 0 and 2*pi


y=np.sin(x)
plt.plot(x,y)
plt.show()

Output:

Setting the parameters on the graph


The plt object can be used to set the different parameters of the plot and then display the plot.
The various parameters that can be set are:
1. plt.xlabel('time') - xlabel sets the x-axis label
2. plt.ylabel('speed') - ylabel sets the y-axis label
3. plt.yticks([5,7,10]) - yticks sets the tick marks that appear on y-axis
4. plt.xticks([1,3,4],['abc','def','ghi']) - xticks sets the ticks to appear on the x-axis at
points [1,3,4] the second parameter changes the corresponding labels to ['abc','def','ghi'].
5. plt.grid() - displays the gridlines
6. plt.legend() - displays the legend using the labels for the corresponding plots. legend is drawn
only after the plot() is called since it takes the labels from plot function.
7. In addition, the plot function can accept a third parameter, the format string:
plt.plot(x,y,'>--c', label='car 1')
The format string (fmt) has the following specification:
fmt = '[marker][line][color]'
Each part is optional; the possible values for marker, line and color are shown below:

marker                              line style                        color
character  description              character  description            character  color
'.'        point marker             '-'        solid line style       'b'        blue
','        pixel marker             '--'       dashed line style      'g'        green
'o'        circle marker            '-.'       dash-dot line style    'r'        red
'v'        triangle_down marker     ':'        dotted line style      'c'        cyan
'^'        triangle_up marker                                         'm'        magenta
'<'        triangle_left marker                                       'y'        yellow
'>'        triangle_right marker                                      'k'        black
'1'        tri_down marker                                            'w'        white
'2'        tri_up marker
'3'        tri_left marker
'4'        tri_right marker
's'        square marker
'p'        pentagon marker
'*'        star marker
'h'        hexagon1 marker
'H'        hexagon2 marker
'+'        plus marker
'x'        x marker
'D'        diamond marker
'd'        thin_diamond marker
'|'        vline marker
'_'        hline marker
Example of format strings:
'b' # blue markers with default shape
'or' # red circles
'-g' # green solid line
'--' # dashed line with default color
'^k:' # black triangle_up markers connected by a dotted line
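
How a format string is split into its three parts can be checked by asking the plotted line for its properties. This is a minimal sketch with made-up data; the "Agg" backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")            # assumption: run without a display window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# '^k:' = triangle_up marker, black color, dotted line
(line,) = plt.plot(x, y, '^k:', label='car 1')

print(line.get_marker())         # marker part of the format string
print(line.get_linestyle())      # line style part
print(line.get_color())          # color part
```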

#3 line plot - setting parameters


import matplotlib.pyplot as plt

#1 setup the data


x=[1,2,3,4,5]
y=[2,4,6,8,10]

#2 setup the parameters for the plot


plt.xlabel('time') #xlabel - x axis label
plt.ylabel('speed') #ylabel - y axis label
plt.title('speed vs time') # title - title of plot
plt.xticks([1,3,4],['abc','def','ghi']) #xticks - ticks on x-axis
plt.yticks([5,7,10]) #yticks - ticks on y-axis
plt.plot(x,y,'>--c', label='car 1') #using format strings
plt.legend() #display legend using label of plot
plt.grid() #display gridlines

#3 display the plot


plt.show()

Output:
Multiple plots in the same figure
Multiple plots can be drawn in the same figure by any one of the following methods:
1. By using the plt.plot() function multiple times with different data sets and parameters each time.
2. By using a single plot function with multiple parameters for x and y variables as shown below:
plt.plot(x1,y1,'formatstring1', x2,y2,'formatstring2')
When using this method, the labels for both the plots must be passed as a list while calling the legend() function

#3 line plot - multiple plots on same figure


import matplotlib.pyplot as plt

#1 setup the data


x1=[1,2,3,4,5] #dataset for first plot
y1=[2,4,6,8,10]

x2=[3,7,9,12,15,17] #dataset for second plot


y2=[3,9,12,18,23,27]

#2 setup the common parameters


plt.xlabel('time')
plt.ylabel('location')
plt.title('speed vs time')
plt.xticks([8,13,16],['loc1','loc2','loc3']) #ticks can be assigned names
plt.yticks([5,12,25])
plt.grid()

#3 parameters for first plot


plt.plot(x1,y1,'>--c', label='car 1') #car 1 dataset is x1,y1

#4 parameters for second plot


plt.plot(x2,y2,'o-.g', label='car 2') #car 2 dataset is x2,y2

plt.legend() #display legend only after plotting all graphs

#5 display the plot


plt.show()

#6 another way of plotting multiple plots


#after show() the figure is cleared, so labels, ticks etc. set earlier are lost
#if needed, these parameters must be set again
plt.plot(x1,y1,'>--c', x2,y2,'o-.g') #plot does not accept multiple labels
plt.legend(['car 1', 'car 2']) #multiple entries in legend written here
plt.show()

Output:
Plotting from a DataFrame
The plot() function can accept the source data for the x- and y- coordinates from an object having tabular data
such as from a DataFrame. The syntax used for plot() in this case is:
plt.plot('column_name_for_x_axis', 'column_name_for_y_axis',data=DataFrameName, label='labelname')

While using this method, we can also draw multiple plots from the same DataFrame by calling the plot()
function multiple times with different x and y columns.
#4 line plot - plotting from a DataFrame
import matplotlib.pyplot as plt
import pandas as pd

df1=pd.read_csv('./chennai_reservoir_levels.csv')
#print('df1=\n', df1)

#1 plotting a single plot


plt.xticks(rotation=90) #rotate the xticks
plt.plot('Date','POONDI',data=df1, label='POONDI')
plt.legend()
plt.show()

#2 multiple plots in same Figure from DataFrame


plt.xticks(rotation=90) #rotate the xticks
plt.plot('Date','POONDI', data=df1)
plt.plot('Date','CHOLAVARAM', data=df1)
plt.plot('Date','REDHILLS', data=df1)
plt.plot('Date','CHEMBARAMBAKKAM', data=df1)
plt.legend(['POONDI','CHOLAVARAM','REDHILLS','CHEMBARAMBAKKAM'])
plt.show()

Output:
The source data in file 'chennai_reservoir_levels.csv' is shown below:
Date        POONDI  CHOLAVARAM  REDHILLS  CHEMBARAMBAKKAM
01-01-2018    1012         513      1585             1842
01-02-2018    1387         451      1368             1693
01-03-2018    2011         398      1194             1507
01-04-2018    1611         100      1660             1215
01-05-2018     396          70      1779             1198
01-06-2018     184          68      1427             1214
01-07-2018     132          61      1120              906
01-08-2018      50          26       920              628
01-09-2018      13           1       713              445
01-10-2018      93           8       478              338
01-11-2018     695          20       809              232
01-12-2018     381          40      1102              185
01-01-2019     298          48       941              102
01-03-2019     477          48       520               22
01-04-2019     333          42       301               10
01-05-2019     193          11       125                2
Plotting multiple subplots
We can plot multiple subplots in the same Figure object by dividing the Figure object into subplots as shown
below:
plt.subplots(num_of_rows, num_of_columns, sharex=False, sharey=False)

where
num_of_rows - is the number of rows in the Figure
num_of_columns - is the number of columns in the Figure
sharex - set sharex to True if the subplots should share the xticks. If sharex is True, the xtick labels are
shown only for the bottom-most plots. (Default value is False)
sharey - set sharey to True if the subplots should share the yticks. If sharey is True, the ytick labels are
shown only for the leftmost plots. (Default value is False)

The subplots() function returns two values, the figure object and the axes object. The figure object refers to the
entire drawing area. The axes objects can be used in two ways:
Method 1:
f1, (ax1,ax2) = plt.subplots(1,2,sharey=True)
The figure object is divided into 1 row and 2 columns i.e. two subplots are created. The first subplot is
assigned to object ax1 and the second subplot is assigned to ax2
Method 2:
f1, ax = plt.subplots(2,2,sharex=True, sharey=True)
The figure object is divided into 2 rows and 2 columns i.e. four subplots are created. All the four objects
are passed as a matrix to the ax object. Individual axes object is accessed using the matrix notation, i.e.
the first subplot is ax[0,0], the second subplot is ax[0,1], third subplot is ax[1,0], fourth subplot is ax[1,1].

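[Figure: layout of the axes objects. Left: f1, (ax1,ax2) = plt.subplots(1,2,sharey=True) places ax1 and ax2 side by
side. Right: f1, ax = plt.subplots(2,2,sharex=True, sharey=True) places ax[0,0] and ax[0,1] in the first row and
ax[1,0] and ax[1,1] in the second row.]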

After getting the individual axes objects we can use the plot() function with the individual axes objects and draw
independent plots in each of the axes object areas.
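
The matrix notation of Method 2 numbers the subplots row by row. As a minimal sketch (the helper name subplot_position is hypothetical and is not part of matplotlib), the row and column index of the k-th subplot can be worked out with divmod():

```python
def subplot_position(k, num_of_columns):
    """Return (row, column) of the k-th subplot, counting from 1, row by row."""
    return divmod(k - 1, num_of_columns)

# For a 2x2 grid such as the one created by plt.subplots(2, 2)
for k in range(1, 5):
    row, col = subplot_position(k, 2)
    print(f"subplot {k} -> ax[{row},{col}]")
```

The same helper works for any grid size, for example subplot_position(5, 3) for the fifth subplot of a grid with 3 columns.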

While using the individual axes objects, take care of the following:

1. For setting ticks on the x- and y-axes, instead of plt.xticks() and plt.yticks() use the functions
ax1.set_xticks() and ax1.set_yticks()
2. For rotating the labels, instead of plt.xticks(rotation=90) use:
ax[1,0].tick_params( axis='x', labelrotation =90)
#4 line plot - plotting multiple plots in different subplots
import matplotlib.pyplot as plt
import pandas as pd

df1=pd.read_csv('chennai_reservoir_levels.csv') #file is in the current working folder
#print('df1=\n', df1)

#1 creating 1x2 grid subplots


f1, (ax1,ax2) = plt.subplots(1,2,sharey=True) #yticks shown only once per row

#2 set the parameters for the two subplots


ax1.set_xticks([]) #with axis objects use set_xticks not xticks
ax2.set_xticks([])
ax1.plot('Date','POONDI', data=df1,label='POONDI')
ax2.plot('Date','CHOLAVARAM', data=df1, label='CHOLAVARAM')
ax1.legend()
ax2.legend()

#3 Display the plot


plt.show()

Output:
#5 line plot - 2x2 subplots, sharing axes, rotating labels
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df1=pd.read_csv('chennai_reservoir_levels.csv') #file is in the current working folder
#print('df1=\n', df1)

#1 creating 2x2 grid


f2, ax = plt.subplots(2,2,sharey=True,sharex=True) #using single axes object

#2 using array notation to access individual subplot


ax[0,0].plot('Date','POONDI', data=df1, label='POONDI' )
ax[0,1].plot('Date','CHOLAVARAM', data=df1, label='CHOLAVARAM')
ax[1,0].plot('Date','REDHILLS', data=df1, label='REDHILLS')
ax[1,1].plot('Date','CHEMBARAMBAKKAM', data=df1, label='CHEMBARAMBAKKAM')

ax[1,0].set_xticks(np.arange(0,18,5)) #show only 0,5,10,15 th reading dates

#rotate labels in x-axis by 90 degrees


ax[1,0].tick_params( axis='x', labelrotation =90)
ax[1,1].tick_params( axis='x', labelrotation =90)

#display all legends


ax[0,0].legend()
ax[0,1].legend()
ax[1,0].legend()
ax[1,1].legend()
plt.show()

Output:
Bar plot:
A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or
lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.

A bar graph shows comparisons among discrete categories. One axis of the chart shows the specific categories
being compared, and the other axis represents a measured value.
Plotting a Bar Graph

The syntax for plotting a Bar Graph is:


plt.bar(x, height, width=0.8, bottom=None, align='center', data=None)

where:
x : sequence of scalars which form the x coordinates of the bars
height: sequence of scalars which form the heights of the bars.
width: scalar or array-like, optional which are the width(s) of the bars (default: 0.8)
bottom: scalar or array-like, optional. The y coordinate(s) of the bars bases (default: 0)
align: {'center', 'edge'}, optional, default: 'center'. It shows the alignment of the bars to the x coordinates:
'center': Center the base on the x positions.
'edge': Align the left edges of the bars with the x positions.
data: If the source of the data is another matrix like structure such as a DataFrame then the name of the object
is mentioned here.

Explanation regarding width of bar graph


If the x-coordinates are numbers then width is in the same unit as the number on the x-axis.

If the x-coordinates are not numbers (e.g. strings) then the width is a fraction of the distance
between one xtick and the next. For example, consider the xticks 'a', 'b' and 'c'.

By default width=0.8. This means that between the xticks 'a' and 'b', 0.8 i.e. 80 percent of the space will be
occupied by the bars and 20 percent will be the space between one bar and the next bar.

The other parameters of the bar graph such as xlabel, ylabel, title, xticks, yticks, legend are same as the line
plots and can be set using the plt object.
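
The effect of width and align can be checked with a little arithmetic. The sketch below is only an illustration: bar_span is a hypothetical helper (not a matplotlib function), and it assumes that string categories are placed at positions 0, 1, 2 and so on along the x-axis.

```python
def bar_span(x, width=0.8, align="center"):
    """Return the (left, right) x-limits of one vertical bar."""
    if align == "center":
        return (x - width / 2, x + width / 2)
    if align == "edge":
        return (x, x + width)
    raise ValueError("align must be 'center' or 'edge'")

# Assumed positions of the string categories 'a', 'b', 'c'
positions = [0, 1, 2]
spans = [bar_span(p) for p in positions]

# Empty space between the bar at 'a' and the bar at 'b'
gap = spans[1][0] - spans[0][1]
print(spans[0], round(gap, 2))
```

With the default width of 0.8 the gap works out to 20 percent of the tick spacing, which matches the explanation above.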
# 7 Simple bar plot
import matplotlib.pyplot as plt

x1=[10, 20, 30, 40, 50]


y1 = [35, 60, 75, 25, 90]
plt.bar(x1,y1) #each bar is 0.8 units wide since x1 is numeric data
plt.show()

x2=['a', 'b', 'c', 'd', 'e']


y2 = [35, 60, 75, 25, 90]
plt.bar(x2,y2) #each bar takes 80% of the tick spacing since x2 is string data
plt.show()

Output:
# 8 bar plot setting parameters
import matplotlib.pyplot as plt

x=['a', 'b', 'c', 'd', 'e']


y = [35, 60, 75, 25, 90]

plt.xlabel('city') #xlabel - x axis label


plt.ylabel('number of birds') #ylabel - y axis label
plt.title('Birds in Cities') # title - title of plot
plt.yticks(y) #yticks - ticks on y-axis
plt.bar(x,y,label='Birds')

plt.legend() #display legend using label of plot


plt.grid() #display gridlines

plt.show()

Output:
Displaying Bar plot from a DataFrame

Consider the Excel file 'product_sales.xlsx' containing the following data, imported into the DataFrame df:

Sales Area Chocolate Cake Biscuit


Area A 20 5 20
Area M 30 9 12
Area B 12 12 18
Area N 8 7 23

For displaying data from a DataFrame df, we use the appropriate column names for the x- and y-coordinates and
pass the parameter data=df when using the bar() function.

# 9 bar plot from DataFrame


import matplotlib.pyplot as plt
import pandas as pd

df=pd.read_excel('product_sales.xlsx')

plt.bar('Sales Area','Chocolate',data=df, label='Chocolate')

plt.legend() #display legend using label of plot


plt.grid() #display gridlines

plt.show()

Output:
Displaying grouped bar chart

For displaying grouped data, we change the x-position on the x-axis where the bar for each of the individual
plots should appear. Consider that we want to draw three bar plots in the same figure for 'Area A', 'Area B' and
'Area C' to be shown on the x-axis. On the y-axis we want the data for 'Chocolate', 'Cake' and 'Biscuit' i.e. three
groups to be shown. The steps are as follows:

1. First change the xticks to start from 1,2,3 and so on. The names that are displayed on the tick marks can
be 'Area A', 'Area B' and 'Area C'.

[Figure: x-axis with xticks at positions 1, 2 and 3, labelled 'Area A', 'Area B' and 'Area C']

2. The distance between any two xticks is 1. This is the maximum width that is available for displaying all
the three bar plots. Select any one width such that the sum of all the widths of the three bar plots is less
than 1. For example if we select the width as 0.2, then since we display three bar plots, the combined
width becomes (0.2 x 3) = 0.6, which is less than 1. The remaining (1 - 0.6) =0.4 is the empty space
between one grouped bar plot and the next grouped bar plot.
3. The next step is to rearrange the x-positions of the three individual bar plots so that they are adjacent and do
not overlap.
For doing so, the following method is adopted, with the width of each individual bar plot wd=0.2:

[Figure: at each xtick position x (1, 2 and 3 for 'Area A', 'Area B' and 'Area C'), three bars of width wd are drawn
side by side, centred at (x-wd), (x) and (x+wd)]

a) The centre bar plot of blue colour is centered exactly at the xtick position, (x)
b) The first bar plot of green colour is centered at (x-wd)
c) The third bar plot of red colour is centered at (x+wd)
i.e. for displaying three grouped bar plots, the first bar plot will be plotted at (x-wd) position, second bar plot
will be plotted at (x) position and the third bar plot will be plotted at (x+wd) position.

Similar method is adopted for displaying any number of grouped data. For example for displaying two grouped
bar plots, the first bar plot will be plotted at (x-wd/2) position and the second bar plot will be plotted at
(x+wd/2) position.
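
The rule for two and three groups can be generalised. The following is a minimal sketch, assuming all bars share the same width wd; the function name group_offsets is hypothetical. It returns the amounts to add to each xtick position so that the bars of one group sit side by side and are centred on the tick.

```python
def group_offsets(num_groups, wd):
    """Offsets to add to each xtick so that num_groups bars of width wd
    sit side by side and are centred on the tick."""
    middle = (num_groups - 1) / 2
    return [(i - middle) * wd for i in range(num_groups)]

print(group_offsets(3, 0.2))   # three groups: x-wd, x, x+wd
print(group_offsets(2, 0.2))   # two groups: x-wd/2, x+wd/2
```

Each offset can then be added to the array of xtick positions, in the same way that x-wd and x+wd are used in the program below.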
# 10 Displaying grouped bar charts
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df=pd.read_excel('product_sales.xlsx')
x=np.arange(1, len(df)+1) #x values are 1,2,3,4…
wd=0.2 #width of bar plot=0.2
xlbl=df['Sales Area']

plt.xticks(x,xlbl)
plt.bar(x-wd,'Chocolate',data=df, label='Chocolate', width=wd)
plt.bar(x,'Cake',data=df, label='Cake', width=wd)
plt.bar(x+wd,'Biscuit',data=df, label='Biscuit', width=wd)

plt.legend() #display legend using label of plot


plt.grid() #display gridlines

plt.show()

Output:
Displaying Stacked bar plots

For displaying stacked bar plot we use the parameter bottom=[y values] while plotting the second bar plot. The
[y values] are the list of y-coordinate values of the first bar plot on which we want to stack the second bar plot.

Consider the following population data of Males and Females contained in the Excel file 'population.xlsx',
for which we want to show the stacked bar plot.

city Male Female Total


Area A 20 17 37
Area M 30 19 49
Area B 12 15 27
Area N 8 7 15
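
Before plotting, it may help to see what the bottom values amount to. This is a small sketch using the Male and Female columns of the table above; stack_bottoms is a hypothetical helper, not a matplotlib function. Each series starts where the running total of the previous series ends, so the final totals should equal the Total column.

```python
male = [20, 30, 12, 8]
female = [17, 19, 15, 7]

def stack_bottoms(*series):
    """For each series, return the y-values on which its bars should start,
    together with the overall height of every stacked bar."""
    bottoms = []
    running = [0] * len(series[0])
    for values in series:
        bottoms.append(list(running))
        running = [r + v for r, v in zip(running, values)]
    return bottoms, running

bottoms, totals = stack_bottoms(male, female)
print(bottoms[1])   # bottom of the Female bars = the Male values
print(totals)       # overall bar heights = the Total column
```

The same helper accepts three or more series, in which case each new series is stacked on the sum of all the earlier ones.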

# 11 Displaying stacked bar charts


import matplotlib.pyplot as plt
import pandas as pd

df=pd.read_excel('population.xlsx')

plt.bar('city','Male',data=df, label='Male')
plt.bar('city','Female',data=df, label='Female', bottom='Male')

plt.legend() #display legend using label of plot


plt.grid() #display gridlines

plt.show()

Output:
Horizontal Bar plot

To plot a horizontal bar plot use the barh() with the following syntax:

plt.barh(y, width, height=0.8, align='center', data=None)

where:
y : sequence of scalars which form the y coordinates of the bars
width: scalar or array-like, the width(s) of the bars, i.e. their lengths along the x-axis
height: scalar or array-like, optional. The height(s) of the bars, i.e. their thickness along the y-axis (default: 0.8)
align: {'center', 'edge'}, optional, default: 'center'. It shows the alignment of the bars to the y coordinates:
'center': Center the bars on the y positions.
'edge': Align the bottom edges of the bars with the y positions.
data: If the source of the data is another matrix-like structure such as a DataFrame then the name of the object
is mentioned here.

# 12 Displaying horizontal bar plots


import matplotlib.pyplot as plt
import pandas as pd

df=pd.read_excel('population.xlsx')

plt.barh('city','Male',data=df, label='Male')

plt.legend() #display legend using label of plot


plt.grid() #display gridlines

plt.show()

Output:
Histogram
Histogram is a graphical display of data using bars of different heights to group numbers into ranges. The height
of each bar shows how many of the data fall in that particular range.

A histogram is an accurate representation of the distribution of numerical data. It differs from a bar graph, in
the sense that a bar graph relates two variables, but a histogram relates only one variable.
How to draw a Histogram
Example 1: For the dataset containing the CGPA of 15 students shown below, draw the histogram using 10 bins:

6.1, 4.12, 8.2, 6.4, 3.6, 9.2, 5.5, 8.4, 6.2, 9.8, 5.3, 3.9, 8.1, 6.1, 2.7

Step 1: Calculate the range of the data set


range = largest value - smallest value = 9.8 - 2.7 = 7.1

Step 2: Divide the range by the number of groups (bins) you want.
For example, we want to divide the data set into 10 groups (in Python, if the number of bins is not mentioned then 10 is
taken as the default), so the width of each group is found by
class-width = range / number of groups = 7.1 / 10 = 0.71
Therefore class width = 0.71

Step 3: Use the class width to create your groups


The smallest value is 2.7 and class-width is 0.71, so first class or first bin is from 2.7 to (2.7 + 0.71) i.e. from 2.7
to 3.41.
The second class or second bin is from 3.41 to (3.41 +0.71) i.e. second bin is 3.41 to 4.12 and so on…

Draw the following table with the classes/bins:


Bin Classes Tally Frequency
First Bin [2.7 – 3.41)
Second Bin [3.41 – 4.12)
Third Bin [4.12 – 4.83)
Fourth Bin [4.83 – 5.54)
Fifth Bin [5.54 – 6.25)
Sixth Bin [6.25 – 6.96)
Seventh Bin [6.96-7.67)
Eighth Bin [7.67-8.38)
Ninth Bin [8.38-9.09)
Tenth Bin [9.09-9.8]

For any class/bin, a square bracket [ or ] means that the number is included and a round bracket ) means that the
number is excluded. For example, [2.7 – 3.41) means the range from 2.7 (including 2.7) till 3.41 (excluding 3.41).

Only the last bin has square brackets at both ends, which means that both 9.09 and 9.8 will be counted in the
last bin.

[Note: Due to limitations of storing floating point numbers accurately in a computer, sometimes the values
appearing at the boundaries are taken in the adjacent bin. For e.g. if 5.54 occurs in the data, then ideally it
should come in bin [5.54 – 6.25) but due to inaccuracies in floating point number calculation/storage it may be
counted in the bin [4.83 – 5.54). Apart from this, Python follows the boundary rules as explained above.]
Step 4: Fill the tally column
For each element in the dataset find the correct bin and place a tally mark ( | ) against that bin. The filled in
table is shown below:
Bin Classes Tally Frequency
First Bin [2.7 – 3.41) |
Second Bin [3.41 – 4.12) ||
Third Bin [4.12 – 4.83) |
Fourth Bin [4.83 – 5.54) ||
Fifth Bin [5.54 – 6.25) |||
Sixth Bin [6.25 – 6.96) |
Seventh Bin [6.96-7.67)
Eighth Bin [7.67-8.38) ||
Ninth Bin [8.38-9.09) |
Tenth Bin [9.09-9.8] ||

Step 5: Fill the Frequency column


Count the number of tally marks and fill the frequency column
Bin Classes Tally Frequency
First Bin [2.7 – 3.41) | 1
Second Bin [3.41 – 4.12) || 2
Third Bin [4.12 – 4.83) | 1
Fourth Bin [4.83 – 5.54) || 2
Fifth Bin [5.54 – 6.25) ||| 3
Sixth Bin [6.25 – 6.96) | 1
Seventh Bin [6.96-7.67) 0
Eighth Bin [7.67-8.38) || 2
Ninth Bin [8.38-9.09) | 1
Tenth Bin [9.09-9.8] || 2
Step 6: Draw the histogram
Take the classes in the X-axis and the Frequency on the Y-axis and draw the histogram.
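
Steps 1 to 5 can also be carried out in a few lines of Python. The sketch below is only an illustration of the manual method (manual_histogram is a hypothetical name, and this is not how pyplot itself computes the bins). It uses the decimal module so that the bin edges are exact, which sidesteps the floating point issue mentioned in the note above.

```python
from decimal import Decimal

cgpa = [6.1, 4.12, 8.2, 6.4, 3.6, 9.2, 5.5, 8.4, 6.2, 9.8, 5.3, 3.9, 8.1, 6.1, 2.7]

def manual_histogram(values, num_bins=10):
    """Follow Steps 1 to 5: range, class width, bin edges and frequencies."""
    data = [Decimal(str(v)) for v in values]        # exact decimal arithmetic
    smallest, largest = min(data), max(data)
    width = (largest - smallest) / num_bins          # Steps 1 and 2: range / number of bins
    edges = [smallest + i * width for i in range(num_bins + 1)]  # Step 3: bin edges
    freq = [0] * num_bins
    for v in data:                                   # Steps 4 and 5: tally and count
        index = int((v - smallest) / width)
        if index == num_bins:                        # the largest value goes in the last bin
            index -= 1
        freq[index] += 1
    return width, edges, freq

width, edges, freq = manual_histogram(cgpa)
print("class width =", width)
print("bin edges   =", [str(e) for e in edges])
print("frequencies =", freq)
```

The frequencies printed here can be compared with the Frequency column of the table in Step 5.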
Drawing histogram using hist() function of pyplot
The hist() function can be used to draw a histogram. In its simplest use it takes a one-dimensional (1D) array or a
list as the data. The other properties of the plot such as xlabel, ylabel, xticks, yticks etc. remain
the same as for line/bar plots.

In its simplest form histogram is drawn using the command:


plt.hist(data, bins=10)
where-
data - is the list or 1D array containing the data on which the histogram is to be created
bins - it can either be a number or a list. If it is a single number it denotes the number of intervals of the
histogram we want. If the bins parameter is a list, then the elements of the list are the bin edges. The
number of bin edges must be one greater than the number of intervals needed for the histogram. If the bins
parameter is not passed, a default value of 10 is taken.

Program : Drawing Histogram using pyplot

#18 Histogram using pyplot


import matplotlib.pyplot as plt
import pandas as pd

dict1={ 'student': ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10',
                    's11','s12','s13', 's14', 's15'],
        'cgpa': [6.1,4.12,8.2,6.4,3.6,9.2,5.5,8.4,6.2,9.8,
                 5.3,3.9,8.1,6.1,2.7],
        'numattempts':[1,2,1,3,1,1,2,1,1,2,1,1,3,1,2] }

df1=pd.DataFrame(dict1)
data = df1['cgpa'] #extract 1D data on which the histogram is to be drawn

plt.xlabel('cgpa range')
plt.ylabel('Number of Students')
plt.title('cgpa range vs Number of students')
plt.grid()
plt.hist(data,bins=10)
plt.show() #bin edges not shown automatically

Output:

Whenever Python draws a histogram, the xticks are calculated automatically and are not necessarily aligned to the bin edges.
Histogram - displaying bin edges correctly
There are two ways to display the bin edges correctly-
1. Create a list of bin edges. Use this list as the value for the bins parameter of the hist() function and set the xticks
to this list of bin edges.
2. Use the return value of the hist() function. The hist() function, when called, returns a tuple. The first
element of the tuple is the numpy array containing the frequencies of the intervals of the histogram.
The second element is the numpy array containing the bin edges of the histogram that was created. The
number of bin edges is one more than the number of frequencies.

#19 Histogram setting bin edges correctly


import matplotlib.pyplot as plt

data= [6.1,4.12,8.2,6.4,3.6,9.2,5.5,8.4,6.2,9.8,5.3,3.9,8.1,6.1,2.7]

#method 1 setting bin edges manually and setting the xticks to bin edges
b1=[3,5.5,7.5,10]
plt.hist(data, bins=b1)
plt.xticks(b1)
plt.grid()
plt.show()

#method2 accessing the bin edges returned by hist function


rtval = plt.hist(data)
print('rtval=', rtval)
print('rtval[0]=', rtval[0]) #contains frequencies
print('rtval[1]=', rtval[1]) #contains bin edges
plt.xticks(rtval[1]) #set xticks to bin edges
plt.grid()
plt.show()
Output:

rtval[0]= [1. 2. 1. 2. 3. 1. 0. 2. 1. 2.]


rtval[1]= [2.7 3.41 4.12 4.83 5.54 6.25 6.96 7.67 8.38 9.09 9.8 ]
Data Visualisation

Introduction to Matplotlib
matplotlib.pyplot is a collection of functions for 2D plotting.
To install the library
- pip install matplotlib
To import the library for plotting
- import matplotlib.pyplot as pl

Some of the types of plots
- Line
- Bar
- Histogram
- Pie
- Box

Basic steps to follow while plotting
- Choose the appropriate plot type and then the function
  - Bar plot : bar( )
  - Line plot : plot( )
  - Histogram : hist( )
- Understand the data. Make a list of x axis data and y axis data. If data is given in a DataFrame then select the
appropriate column from the DataFrame for x axis and y axis.
- Give the legend
- Give the axis labels
- Give plot title

Line Plot
import matplotlib.pyplot as pl
x = [1,2,3,4,5] #select data as given or as per the graph shown
y = [23,41,26,10,31]
pl.plot(x,y,'r-',label = "Sales",linewidth = 4) # this label is going to be used in legend
pl.title("Test Plot",loc="right")
pl.xlabel("X - AXIS") # Give appropriate name
pl.ylabel("Y - AXIS") # Give appropriate name
pl.legend()
pl.show() # Important: without this the graph is not displayed

Multiple line plots
import matplotlib.pyplot as pl
import pandas as pd
df = pd.read_csv("temp.csv")
x = df['year']
y1 = df['city1']
y2 = df['city2']
y3 = df['city3']
pl.plot(x,y1,'b-x',label = "City 1",linewidth = 2)
pl.plot(x,y2,'r-x',label = "City 2",linewidth = 2)
pl.plot(x,y3,'k-x',label = "City 3",linewidth = 2)
pl.title("City temperature for 26 years",loc="right")
pl.xlabel("Year")
pl.ylabel("Temp in F")
pl.grid()
pl.legend()
pl.show()

Bar Plot
import matplotlib.pyplot as pl
x = ['English','Hindi','Maths','Science','SST']
y = [34,54,41,44,37]
pl.bar(x,y, width =0.8 ,label= "Marks",color="brown",edgecolor="black")
pl.title("Marks of 5 subjects",loc="right")
pl.xlabel("Subject")
pl.ylabel("Marks")
pl.legend()
pl.show()

Multiple Bar Plots
import matplotlib.pyplot as pl
import pandas as pd
df = pd.read_csv("temp.csv")
x = df['year']
y1 = df['city1']
y2 = df['city2']
pl.bar(x,y1,width =0.4,label="City 1",color="yellow",edgecolor="black")
pl.bar(x+0.4,y2,width = 0.4 , label = "City 2",color="red",edgecolor = "black")
# offset the x axis and give same value as width for plotting second bar
pl.title("Temperature of two cities",loc="right")
pl.xlabel("Year")
pl.ylabel("Temp in F")
pl.legend()
pl.show()

Histogram
import matplotlib.pyplot as pl
math = [12,23,45,56,57,67,72,83,65,22,87,53,12,90,78,83,45,75,37,28]
freq,bin_edges,patches = pl.hist(math,bins=10,edgecolor = "black",label = "Math marks")
# freq gives the number of values in each bin
# bin_edges gives the edges of the 10 bins
# Check the number of bins given in the exam and set the bins parameter accordingly.
# patches holds the individual rectangle objects
pl.title("Performance of students",loc="right")
pl.xlabel("Marks in Maths")
pl.ylabel("Number of students")
pl.legend()
pl.show()
UNIT-2
DATABASE QUERY USING SQL
There are four types of SQL functions as per our CBSE syllabus:
1. SQL Math Functions
2. Text/String/Character Functions
3. Date and time functions
4. Aggregate functions

1. SQL Math Functions:


A mathematical function performs a mathematical operation, usually on input values that are
provided as arguments, and returns a numeric value as the result of the operation.

● POW( ) or POWER( ): POWER(A, B) or POW(A, B) returns the number A raised to the power of another
number B. Here the number A is the base and the number B is the exponent. It needs 2 numbers as
parameters.
Syntax: SELECT POW(A, B);
Examples:
1) mysql> select power(2,3);
+------------+
| power(2,3) |
+------------+
|          8 |
+------------+

2) mysql> select pow(2,3);
+----------+
| pow(2,3) |
+----------+
|        8 |
+----------+
1 row in set (0.00 sec)

● ROUND( )
This function is used to round the number to the specified number of decimal places. Parameters required:
the number to be rounded and the number of decimal places required. If the number of decimal places
required is not mentioned, then the result will not have decimal places. If the number of decimal places is
negative, digits to the left of the decimal point are rounded (for example, -1 rounds to the nearest ten).

Syntax: SELECT ROUND(NUMBER, NUMBER OF DECIMAL PLACES);


Examples:
(1) mysql> select round(2.25);
+-------------+
| round(2.25) |
+-------------+
|           2 |
+-------------+
1 row in set (0.01 sec)

(2) mysql> select round(2.25, 2);
+----------------+
| round(2.25, 2) |
+----------------+
|           2.25 |
+----------------+
1 row in set (0.00 sec)

(3) mysql> select round(135.55, -1);
+-------------------+
| round(135.55, -1) |
+-------------------+
|               140 |
+-------------------+
1 row in set (0.00 sec)

● MOD( ): This function is used to find the modulus (remainder) when one number is divided by
another.
Examples:
1) mysql> select mod(5,3);
+----------+
| mod(5,3) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

2) mysql> select mod(5,4);
+----------+
| mod(5,4) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

2. Text/String/Character Functions:
● UCASE( ) / UPPER( ) : Used to convert a character or text to uppercase.
Examples:
(1) mysql> SELECT UCASE('hello');
+----------------+
| UCASE('hello') |
+----------------+
| HELLO          |
+----------------+
1 row in set (0.00 sec)

(2) mysql> SELECT Upper('hello');
+----------------+
| Upper('hello') |
+----------------+
| HELLO          |
+----------------+
1 row in set (0.00 sec)

● LCASE( ) / LOWER( ) : Used to convert a character or text to lowercase.

Example:
mysql> select lcase('HELLO');
+----------------+
| lcase('HELLO') |
+----------------+
| hello          |
+----------------+
1 row in set (0.00 sec)

● MID( ) : To extract a specified number of characters from the string.

The first parameter is the text/string, the second parameter is the starting index and the third parameter is the
number of characters required. (Note: index starts with 1 and not 0.)

Example:
mysql> SELECT MID('ABCDEFGHIJKLMNOP', 1,4);
+------------------------------+
| MID('ABCDEFGHIJKLMNOP', 1,4) |
+------------------------------+
| ABCD                         |
+------------------------------+
1 row in set (0.00 sec)

● SUBSTRING( ) : Same as MID( ). To extract a specified number of characters from the string.
Examples:
(1) mysql> SELECT SUBSTRING('ABCDEFGHIJKLMNOP', 3,4);
+------------------------------------+
| SUBSTRING('ABCDEFGHIJKLMNOP', 3,4) |
+------------------------------------+
| CDEF                               |
+------------------------------------+
1 row in set (0.00 sec)

(2) mysql> SELECT SUBSTRING('ABCDEFGHIJKLMNOP', -4,2);
+-------------------------------------+
| SUBSTRING('ABCDEFGHIJKLMNOP', -4,2) |
+-------------------------------------+
| MN                                  |
+-------------------------------------+
1 row in set (0.00 sec)
(A negative starting index counts from the end of the string, so -4 starts at the fourth character from the end.)

● SUBSTR( ) : Same as MID( ) and SUBSTRING( )

Examples:
mysql> SELECT SUBSTR('ABCDEFGHIJKLMNOP', -4,3);
+----------------------------------+
| SUBSTR('ABCDEFGHIJKLMNOP', -4,3) |
+----------------------------------+
| MNO                              |
+----------------------------------+
1 row in set (0.00 sec)

● LENGTH( ) : This function returns the length of the given text. (The length is measured in bytes, which for plain
English text is the same as the number of characters.)

Example:
mysql> SELECT LENGTH('HELLO WORLD');
+-----------------------+
| LENGTH('HELLO WORLD') |
+-----------------------+
|                    11 |
+-----------------------+
1 row in set (0.00 sec)

● LEFT( ) : Returns the specified number of characters, including spaces, starting from the leftmost
character. Parameters required: text, number of characters to be extracted.

Examples:
mysql> SELECT LEFT('ABCDEFGHIJKLMNOP',1);
+----------------------------+
| LEFT('ABCDEFGHIJKLMNOP',1) |
+----------------------------+
| A                          |
+----------------------------+
1 row in set (0.00 sec)

● RIGHT( ) : Returns the specified number of characters, including spaces, starting from the right of the
text. Parameters required: text, number of characters to be extracted.
Examples:
1) mysql> SELECT RIGHT('ABCDEFGHIJKLMNOP',1);
+-----------------------------+
| RIGHT('ABCDEFGHIJKLMNOP',1) |
+-----------------------------+
| P                           |
+-----------------------------+
1 row in set (0.00 sec)
( Extracting 1 character )

● INSTR( ) : Checks whether the second string/text is present in the first string. If present, it returns the
starting index. Otherwise it returns 0.

Example:
mysql> SELECT INSTR('ABCDEFGHIJKLMNOP','ABC');
+---------------------------------+
| INSTR('ABCDEFGHIJKLMNOP','ABC') |
+---------------------------------+
|                               1 |
+---------------------------------+
1 row in set (0.00 sec)

● LTRIM( ) : To trim the spaces, if any, from the beginning of the text.

Example:
mysql> SELECT LTRIM(' HELLO');
+-----------------+
| LTRIM(' HELLO') |
+-----------------+
| HELLO           |
+-----------------+
1 row in set (0.00 sec)

● RTRIM( ) : To trim the spaces, if any, from the end of the text.
Examples:
1) mysql> SELECT RTRIM('HELLO ');
+-----------------+
| RTRIM('HELLO ') |
+-----------------+
| HELLO           |
+-----------------+
1 row in set (0.00 sec)

● TRIM( ) : To trim the spaces, if any, from the beginning and end of the text.

Examples:
mysql> SELECT CONCAT(TRIM('HELLO '), 'WORLD');
+---------------------------------+
| CONCAT(TRIM('HELLO '), 'WORLD') |
+---------------------------------+
| HELLOWORLD                      |
+---------------------------------+
1 row in set (0.00 sec)

Note: CONCAT( ) combines two strings/texts
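
For readers who know Python strings, the text functions above have close Python counterparts. The sketch below is an analogy only, not how MySQL works internally. The helpers mid and instr are hypothetical names, and their handling of edge cases (such as a starting index of 0) is an assumption.

```python
def mid(text, start, length):
    """Rough Python analogue of MID()/SUBSTRING()/SUBSTR() with a 1-based start.
    A negative start counts from the end of the text."""
    if start == 0 or length <= 0:
        return ""
    begin = start - 1 if start > 0 else len(text) + start
    if begin < 0:
        return ""
    return text[begin:begin + length]

def instr(text, part):
    """1-based position of part in text, or 0 if it is absent."""
    return text.find(part) + 1

s = "ABCDEFGHIJKLMNOP"
print(mid(s, 1, 4), mid(s, 3, 4), mid(s, -4, 2))   # like MID / SUBSTRING
print(s[:1], s[-1:])                                # like LEFT(s,1), RIGHT(s,1)
print(instr(s, "ABC"), len("HELLO WORLD"))          # like INSTR, LENGTH
print("HELLO ".strip(" ") + "WORLD")                # like CONCAT(TRIM(...), ...)
print("hello".upper(), "HELLO".lower())             # like UCASE/UPPER, LCASE/LOWER
```

The main difference to remember is that SQL counts positions from 1 while Python counts from 0.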

Try these yourself:


Write the output of the following:
1. SELECT POWER(3,3);
2. SELECT POW(3,2);
3. SELECT ROUND(123.45,1);
4. SELECT ROUND(123.45,-1);
5. SELECT ROUND(123.45,0);
6. SELECT ROUND(153.45,2);
7. SELECT ROUND(155.45,0);
8. SELECT ROUND(245,-2);
9. SELECT ROUND(255,-2);
10. SELECT ROUND(897, -3);
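
The answers to the ROUND() questions can be checked without a MySQL server. The sketch below is an approximation written in Python, under the assumption that the numbers are exact decimal values, which MySQL rounds half away from zero. Python's built-in round() uses a different tie-breaking rule, so the decimal module is used instead. The function name mysql_round is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def mysql_round(number, places=0):
    """Round half away from zero to the given number of decimal places.
    A negative places value rounds to the left of the decimal point."""
    value = Decimal(str(number))
    quantum = Decimal(1).scaleb(-places)       # 0.01 for places=2, 1E+1 for places=-1
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    return int(rounded) if places <= 0 else rounded

# The three worked examples from the ROUND( ) section
print(mysql_round(2.25))
print(mysql_round(2.25, 2))
print(mysql_round(135.55, -1))
```

Calling mysql_round with the values from the questions above gives a quick way to verify answers worked out by hand.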

3. DATE AND TIME FUNCTIONS:

Function           Description                     Example

CURDATE() /        Returns the current date        SELECT CURDATE();
CURRENT_DATE() /                                   SELECT CURRENT_DATE();
CURRENT_DATE
DATE()             Returns the date part of a      SELECT DATE('2018-08-15 12:30');
                   date-time expression            Output: 2018-08-15
MONTH()            Returns the month from a date   SELECT MONTH('2018-08-15');
                                                   Output: 8
YEAR()             Returns the year from a date    SELECT YEAR('2018-08-15');
                                                   Output: 2018
DAYNAME()          Returns the weekday name        SELECT DAYNAME('2018-12-04');
                                                   Output: Tuesday
DAYOFMONTH()       Returns a value from 1-31       SELECT DAYOFMONTH('2018-08-15');
                                                   Output: 15
DAYOFWEEK()        Returns the weekday index,      SELECT DAYOFWEEK('2018-12-04');
                   Sunday-1, Monday-2, ..          Output: 3
DAYOFYEAR()        Returns a value from 1-366      SELECT DAYOFYEAR('2018-02-10');
                                                   Output: 41
NOW()              Returns both the current date   SELECT NOW();
                   and time at which the
                   statement started executing
SYSDATE()          Returns both the current date   SELECT SYSDATE();
                   and time

Difference Between NOW() and SYSDATE() :

NOW() returns the date and time at which the statement began to execute, so every NOW() in the same
SELECT returns the same value. SYSDATE(), on the other hand, returns the date and time at which that particular
SYSDATE() call executes.
For example:
mysql> Select now(), sleep(2), now();
Output: 2018-12-04 10:26:20, 0, 2018-12-04 10:26:20
mysql> Select sysdate(), sleep(2), sysdate();
Output: 2018-12-04 10:27:08, 0, 2018-12-04 10:27:10
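
The outputs listed for the date functions can be cross-checked with Python's datetime module. This is only an analogy using the same sample dates; the list DAY_NAMES is defined here so that the result does not depend on the computer's language settings.

```python
from datetime import date, datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

d = date(2018, 12, 4)
print(DAY_NAMES[d.weekday()])                  # like DAYNAME()
print(d.isoweekday() % 7 + 1)                  # like DAYOFWEEK(): Sunday=1 ... Saturday=7
print(date(2018, 2, 10).timetuple().tm_yday)   # like DAYOFYEAR()

d2 = datetime(2018, 8, 15, 12, 30)
print(d2.date())                               # like DATE()
print(d2.month, d2.year, d2.day)               # like MONTH(), YEAR(), DAYOFMONTH()
```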

4. AGGREGATE FUNCTIONS:
An aggregate function performs a calculation on a group of rows
and returns a single summary value, such as the sum of salary or the average of salary.
The available aggregate functions are –
1. SUM ()
2. AVG ()
3. COUNT ()
4. MAX ()
5. MIN ()
6. COUNT (*)
Consider the following table:

Empno Name Dept Salary


1 Ravi Sales 24000
2 Sunny Sales 35000
3 Shobit IT 30000
4 Vikram IT 27000
5 nitin HR 45000

mysql> Select SUM(salary) from emp;
Output – 161000
mysql> Select SUM(salary) from emp where dept = 'Sales';
Output – 59000
mysql> Select AVG(salary) from emp;
Output – 32200
mysql> Select AVG(salary) from emp where dept = 'Sales';
Output – 29500
mysql> Select MAX(salary) from emp;
Output – 45000
mysql> Select MAX(salary) from emp where dept = 'Sales';
Output – 35000
mysql> Select MIN(salary) from emp;
Output – 24000
mysql> Select MIN(salary) from emp where dept = 'IT';
Output – 27000
(Note: write the function name and the opening bracket together, e.g. SUM(salary). By default MySQL reports an
error if there is a space between them.)
mysql> Select COUNT(*) from emp;
Output – 5
mysql>Select COUNT(salary) from emp;
Output – 5
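
The aggregate queries can be tried out from Python as well. The sketch below makes one loud assumption: it uses SQLite (through Python's built-in sqlite3 module) instead of MySQL, because it needs no server. The emp table is rebuilt from the rows shown above. The sixth employee added at the end is hypothetical and is there only to show how COUNT(*) and COUNT(salary) differ when a salary is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")    # throwaway in-memory database
conn.execute("CREATE TABLE emp (empno INTEGER, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(1, "Ravi", "Sales", 24000), (2, "Sunny", "Sales", 35000),
     (3, "Shobit", "IT", 30000), (4, "Vikram", "IT", 27000),
     (5, "nitin", "HR", 45000)],
)

def value(query):
    """Run a query that returns a single value and return that value."""
    return conn.execute(query).fetchone()[0]

print(value("SELECT SUM(salary) FROM emp"))
print(value("SELECT AVG(salary) FROM emp WHERE dept = 'Sales'"))
print(value("SELECT MAX(salary) FROM emp"), value("SELECT MIN(salary) FROM emp"))

# Hypothetical sixth employee whose salary is not yet known (NULL)
conn.execute("INSERT INTO emp VALUES (6, 'Asha', 'HR', NULL)")
print(value("SELECT COUNT(*) FROM emp"))        # counts every row
print(value("SELECT COUNT(salary) FROM emp"))   # counts only the non-NULL salaries
```

SUM(), AVG(), MAX() and MIN() also skip NULL values, so adding the sixth row does not change their results.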

ORDER BY Clause
SQL Order By is used to sort the data in the ascending or descending order.
● By default, SELECT returns rows in no particular order.
● ORDER BY returns the rows in a given sort order.
● Rows can be returned in ascending or descending sort order.
● It sorts the data in ascending order by default.
● To sort the data in descending order we use the DESC keyword.
ORDER BY syntax:
SELECT column-names
FROM table-name
WHERE condition
ORDER BY column-names [ASC | DESC];

ASC -- ascending sort order: low to high, a to z. This is the default.


DESC -- descending sort order: high to low, z to a.
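
A quick way to see ASC and DESC in action is to run them from Python. As an assumption for this sketch, SQLite (Python's built-in sqlite3 module) stands in for MySQL, and the emp table from the aggregate functions section is reused. Sorting is done on salary because the ordering of text that mixes upper and lower case can differ between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")    # throwaway in-memory database
conn.execute("CREATE TABLE emp (empno INTEGER, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(1, "Ravi", "Sales", 24000), (2, "Sunny", "Sales", 35000),
     (3, "Shobit", "IT", 30000), (4, "Vikram", "IT", 27000),
     (5, "nitin", "HR", 45000)],
)

# ASC is the default, so it does not have to be written
ascending = conn.execute("SELECT name, salary FROM emp ORDER BY salary").fetchall()
descending = conn.execute("SELECT name, salary FROM emp ORDER BY salary DESC").fetchall()

for row in ascending:
    print(row)
print("highest paid:", descending[0])
```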

Example: Consider the employees table having the following records –

SQL Query to List all employees in alphabetical order:

mysql>SELECT empid, empName,salary FROM employees ORDER BY empName;


Output:
Group By in SQL:
● The SQL GROUP BY clause is used to divide the rows in a table into smaller groups.
● The GROUP BY clause is used with the SELECT statement to make a group of rows based on the values
of a specific column or expression.
● The SQL aggregate functions can be used to get summary information for every group; they are
applied to each group individually.
● The WHERE clause is used to retrieve rows based on a certain condition, but it cannot be applied to
grouped results.
● To retrieve only those groups that satisfy some condition, the HAVING clause is used.

GROUP BY syntax:

SELECT <column_list> FROM <table name> [WHERE <condition>] GROUP BY <columns>
[HAVING <condition>];

Example : Consider the employees table having the following records –


SQL GROUP BY with COUNT() function
The following query displays the number of employees working in each department.

mysql>SELECT deptid "Department Code", COUNT(*) "No of Employees" FROM employees GROUP BY
deptid;
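The grouping described above can be sketched in Python. This is only an illustration: it assumes SQLite (Python's built-in sqlite3 module) rather than MySQL, and the six employee rows and department codes are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (empid INTEGER, empName TEXT, deptid TEXT, salary INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "D01", 30000),
        (2, "Bala", "D01", 34000),
        (3, "Chitra", "D02", 41000),
        (4, "Dev", "D01", 29000),
        (5, "Esha", "D03", 45000),
        (6, "Farid", "D02", 39000),
    ],
)

# One output row per department, with the number of employees in it.
per_dept = conn.execute(
    "SELECT deptid, COUNT(*) FROM employees GROUP BY deptid ORDER BY deptid"
).fetchall()

# HAVING filters the groups after aggregation: keep departments with 2 or more employees.
large_depts = conn.execute(
    "SELECT deptid, COUNT(*) FROM employees "
    "GROUP BY deptid HAVING COUNT(*) >= 2 ORDER BY deptid"
).fetchall()

print(per_dept)
print(large_depts)
```

WHERE could not be used for the second query, because the count of a group is known only after the rows have been grouped.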
Mind Map of Database Query using SQL

Text functions: UCASE()/UPPER(), LCASE()/LOWER(), MID()/SUBSTRING()/SUBSTR(), LENGTH(),
LEFT(), RIGHT(), INSTR(), LTRIM(), RTRIM(), TRIM()
Math functions: POWER(), ROUND(), MOD()
Date functions: NOW(), DATE(), MONTH(), MONTHNAME(), YEAR(), DAY(), DAYNAME()
Aggregate functions: MAX(), MIN(), AVG(), SUM(), COUNT(), COUNT(*)
Querying and data manipulation using GROUP BY: GROUP BY, ORDER BY, HAVING

Introduction to Computer Networks
Computer Networks – An Introduction
An interconnected collection of autonomous computers is called a computer network.
Two computers are said to be interconnected if they are able to exchange information.

NEED OF COMPUTER NETWORK

• Resource Sharing: Resource sharing means making all programs, data and peripherals
available to anyone on the network, irrespective of the physical location of the resources
and the user.
• Reliability: Reliability means keeping copies of a file on two or more different
machines, so that if one of them is unavailable (due to a hardware crash or any other
reason), the other copy can be used.
• Cost Factor: A network greatly reduces cost, since the resources can be shared.

BASIC TERMINOLOGIES:

1. Sender: The sender is the device or person that transmits data/information.
2. Receiver: The receiver is the device or person that receives data/information.
3. Transmission media: The communication channel used to transfer information from the
sender to the receiver.
4. Protocol: A set of rules used to transmit data/information between the sender and the
receiver.

Types of Network based on Geographical Spread

1. LAN (Local Area Network): It is spread over a limited geographical area such as an
office, a school building or a factory.
2. MAN (Metropolitan Area Network): It is spread over an area as big as a city.
3. WAN (Wide Area Network): It is spread across countries or over a very large
geographical area.

Difference between LAN and WAN

LAN
• It spreads over a small area.
• It costs less compared to a WAN.
• It is usually a single network.
• Error rates are very low because of the short distances.

WAN
• It spreads over a very large area.
• It costs more compared to a LAN.
• It is usually a network of networks.
• Error rates are higher because of the very long distances.

Difference between MAN and WAN

MAN
• It spreads over a city.
• It costs less compared to a WAN.
• Error rates are low compared to a WAN.

WAN
• It spreads over a very large area.
• It costs more compared to a MAN.
• Error rates are high compared to a MAN.

Network Devices

Modem
• A modem is a device that allows a computer to send and receive data over telephone
lines or cable connections.
• A modem converts the digital data from a computer into analog signals that can be carried
by telephone lines. The conversion from digital to analog signal is called modulation,
and the reverse conversion from analog to digital signal is called demodulation.
• There are two types of modems: internal modem and external modem.

Hub
• A hub is a hardware device used to connect several computers together.
• A similar term is concentrator. A concentrator is a device that provides the central
connection point for cables from workstations, servers and peripherals.
• There are two types of hubs:
– Active hub: An active hub amplifies the signal as it moves from one connected
device to another.
– Passive hub: A passive hub simply allows the signal to pass from one computer to
another without any change.

Switch
• A switch is a network device used to interconnect computers or other devices on a
network.
• It filters data packets and forwards each packet only to the device or devices for which
it is intended.

Difference Between Hub and Switch

HUB                                           SWITCH
A hub operates at the Physical Layer.         A switch operates at the Data Link Layer.
A hub uses broadcast transmission.            A switch uses unicast, multicast as well as
                                              broadcast transmission.
A hub shares bandwidth among its              A switch does not share bandwidth; each
connections.                                  connection gets the full bandwidth.
A hub uses half duplex transmission mode.     A switch uses full duplex transmission mode.
There is only one collision domain in a hub.  In a switch, each port has its own collision
                                              domain.
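The difference between broadcast and unicast forwarding can be illustrated with a small simulation. This is a simplified sketch and not how real devices are programmed: the class names Hub and Switch, the port numbers and the one-letter addresses are invented, and the switch is assumed to learn the port of each sender from the frames it sees.

```python
class Hub:
    """Repeats every frame out of all ports except the one it arrived on."""

    def __init__(self, port_count):
        self.port_count = port_count

    def forward(self, in_port, src, dst):
        return [p for p in range(self.port_count) if p != in_port]


class Switch:
    """Remembers which address is on which port and forwards selectively."""

    def __init__(self, port_count):
        self.port_count = port_count
        self.table = {}  # address -> port number

    def forward(self, in_port, src, dst):
        self.table[src] = in_port          # learn where the sender is
        if dst in self.table:
            return [self.table[dst]]       # unicast to the known port
        # Destination not known yet: send to all other ports, like a hub.
        return [p for p in range(self.port_count) if p != in_port]


hub = Hub(4)
switch = Switch(4)

hub_out = hub.forward(0, "A", "B")      # always every other port
first = switch.forward(0, "A", "B")     # B unknown, so all other ports
reply = switch.forward(1, "B", "A")     # A was learned on port 0
second = switch.forward(0, "A", "B")    # B was learned on port 1

print(hub_out, first, reply, second)
```

Once the switch has seen a frame from each device, traffic between A and B no longer reaches the other ports, which is why connections on a switch do not share bandwidth the way connections on a hub do.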

Repeater
A repeater is a network device that amplifies and restores signals for long-distance transmission.
There are two types of repeaters:
1. Amplifier: An amplifier amplifies all incoming signals on the network, which means it
amplifies both the signal and any accompanying noise.
2. Signal repeater: A signal repeater collects the inbound packet and then retransmits the
packet as if it were starting from the source station.

Router
• A router is a network device that forwards data from one network to another.
• A router works like a bridge but can handle different protocols.

Gateway
• A gateway is a network device that connects dissimilar networks.
• It establishes an intelligent connection between a local network and external networks
with completely different structures.

Network Topologies
The pattern of interconnection of nodes in a network is called the topology.
There are a number of factors to consider when selecting a topology; we consider three of
them here:
1. Cost: Cost is one of the most important factors. Everyone tries to minimize the
installation cost.
2. Flexibility: The topology should make it easy to reconfigure the network. This involves
moving existing nodes and adding new ones.
3. Reliability: The topology chosen for the network can help by allowing the location of a
fault to be detected and by providing some means of isolating it.

There are different types of topologies:

• Star topology
• Bus topology
• Tree topology
• Mesh topology

Star Topology
This topology consists of a central node to which all other nodes are connected by a single path.
Advantages of Star Topology
• Installation and maintenance of the network is easy and takes less time.
• It is easy to detect faults in this network, as all computers are connected to the central
node. This means that any problem which makes the network non-functional can be traced to
the central node.
• The rate of data transfer is fast, as all the data packets or messages are transferred
through the central node.
• As the nodes are not connected to each other, a problem in one node does not
hamper the performance of the other nodes in the network.
• A node can be added or removed easily without affecting the performance of the rest of
the network.

Disadvantages of Star Topology

• All nodes of a star topology depend on the central node; therefore, any problem in
the central node shuts down the entire network.
• The performance of the entire network depends directly on the performance of the
central node. If the central node is slow, the entire network slows down.
• More cabling is required than in a bus topology, as every node is connected directly to
the central node.

The Bus or Linear Topology

• Another popular topology for data networks is the linear (bus) topology.
• It consists of a single length of the transmission medium onto which the various
nodes are attached.

Advantages of Bus Topology

• Nodes can be connected to or removed from the bus network easily.
• It requires less cable length than a star topology.
• A bus network is easy to implement and can be extended up to a certain limit.
• It works well for small networks.

Disadvantages of Bus Topology

• If there is a fault or break in the main cable, the entire network shuts down.
• Terminators are required at both ends of the backbone cable.
• It is difficult to locate and isolate a fault once the entire network has shut down.
• When the network is required in more than one building, a bus network cannot be used.
• The signal becomes weaker as the number of nodes becomes large.
• Collision of data can take place, as several nodes can transmit data at the same time.

Tree Topology
• In a tree topology, some of the devices are connected to the central hub, which is an
active hub, and the remaining devices are connected to secondary hubs, which may be
active or passive hubs.
• An active hub contains a repeater that regenerates the signal when it becomes weaker
over long distances.
• A passive hub simply provides a connection between all the connecting nodes.
• It is a combination of the star and bus topologies.

Advantages of Tree Topology
• When one of the nodes stops working, it does not affect the other nodes.
• Each star segment gets a dedicated link from the central bus. Thus, the failure of one
segment does not affect the rest of the network.
• Fault identification is easy.
• The network can be expanded by adding secondary nodes. Thus, scalability is
achieved.

Disadvantages of Tree Topology

• If the backbone line breaks, the entire segment goes down.
• A large amount of cabling is needed.
• A lot of maintenance is needed, even though fault identification is easy.
• Though it is scalable, the number of nodes that can be added depends on the capacity of
the central bus and on the cable type.

Evolution of Internet
• ARPANET (Advanced Research Projects Agency NETwork): In 1969, the US Department of
Defense, through its Advanced Research Projects Agency (ARPA), set up a network named
ARPANET to connect computers at various universities and defence agencies.
• In the mid 1980s another federal agency, the National Science Foundation, created a new
high-capacity network called NSFnet, which was more capable than ARPANET.
• NSFnet allowed only academic research on its network and no private business of any
kind. So many private companies built their own networks, which were later
interconnected along with ARPANET and NSFnet to form the Internet.
• It was this internetworking, i.e. the linking of ARPANET, NSFnet and some private
networks, that was named the Internet.
• The original ARPANET was shut down in 1990 and government funding for NSFnet was
discontinued in 1995, but by then commercial Internet services had come into existence
and they continue as the Internet we use today.
• In a general sense, the Internet is a worldwide network of computer networks.
• The Internet is the global network of computing devices including desktops, laptops,
servers, tablets, mobile phones and other handheld devices, as well as peripheral devices
such as printers, scanners, etc.
• In addition, it also consists of networking devices such as routers, switches, gateways,
etc. Today, smart electronic appliances like TV, AC, refrigerator, fan, light, etc., can also
communicate through the Internet.
• Interspace is a client/server software program that allows multiple users to communicate
online with real-time audio, video and text chat in dynamic 3D environments.

How does the Internet work?

• Our computer or smartphone may be linked to the Internet through a phone line or a
mobile ISP (Internet Service Provider).
• Our computer may be part of a LAN. The LAN is then connected to an ISP using a high-speed
phone line such as a T1 line (1.5 Mbps), whereas a normal phone line with a modem typically
handles 30,000 to 50,000 bits per second.
• ISPs then connect to larger ISPs, and the largest ISPs maintain fiber optic backbones for
an entire region. Backbones around the world are connected through fiber optic lines,
undersea cables or satellite links.
• In this manner every computer on the Internet is connected to every other computer on
the Internet. So we can say that it is a kind of WAN, working with the help of various
networking devices and protocols (especially TCP/IP) to forward data from source to
destination devices without the constraints of dissimilar devices and architecture.
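To get a feel for the speeds mentioned above, the ideal time needed to transfer a file can be worked out as size in bits divided by bits per second. The sketch below uses a hypothetical 1 MB file (taken as 1,000,000 bytes) and ignores protocol overhead and delays, so real transfers would take longer.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time: 8 bits per byte, no overhead or latency."""
    return size_bytes * 8 / bits_per_second


FILE_SIZE = 1_000_000  # hypothetical file of 1 MB (decimal megabyte)

t1_time = transfer_seconds(FILE_SIZE, 1_500_000)    # T1 line, 1.5 Mbps
modem_fast = transfer_seconds(FILE_SIZE, 50_000)    # modem at 50,000 bps
modem_slow = transfer_seconds(FILE_SIZE, 30_000)    # modem at 30,000 bps

print(round(t1_time, 2), "seconds on a T1 line")
print(round(modem_fast), "to", round(modem_slow), "seconds on a modem")
```

Under these assumptions the T1 line needs a little over 5 seconds, while the modem needs between about 160 and 267 seconds for the same file.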

Applications of Internet
Following are some of the broad areas or services provided through the Internet:
• The World Wide Web (WWW)
• Electronic mail (Email)
• Chat
• Voice over Internet Protocol (VoIP)

URL

• URL stands for Uniform Resource Locator.
• The address of a website is called its URL.
• Each address is unique.
• The elements in a URL:
– Protocol://server's address/filename
– Example: http://www.google.com/index.html
where www.google.com is the domain name (or IP address) of the server where the resource
is located, and index.html is the name of the HTML document requested from the Google website.
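The elements of the example URL can be separated with Python's standard urllib.parse module. This is a small illustrative sketch; the variable names are chosen here only for readability.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.google.com/index.html")

protocol = parts.scheme            # the part before ://
server = parts.netloc              # the server's address (domain name)
filename = parts.path.lstrip("/")  # the document requested from the server

print(protocol, server, filename)
```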
Domain Names
• To communicate over the Internet we can use IP addresses, but it is not practical to
remember the IP address of every website or computer, as it consists of a series of
numbers.
• The Domain Name System (DNS) solves this problem by resolving easy-to-remember names
into IP addresses.
• It is the system which assigns names to some computers (web servers) and maintains a
database of these names and the corresponding IP addresses.
• Domain names are used in URLs to identify particular web servers.
• For example, in the URL https://www.cbse.nic.in/welcome.htm the domain name is
cbse.nic.in

A domain name consists of the following parts:

1. Top-level domain name or primary domain name
2. Sub-domain name(s)
• For example, in the domain name cbse.nic.in
– in is the primary domain name
– nic is the subdomain of in
– cbse is the subdomain of nic
The top-level domains are categorized into the following domain names:
• Generic domain names
– .com: Commercial business
– .edu: Educational institutions
– .gov: Government agencies
– .mil: Military
– .net: Network organizations
– .org: Organizations (non-profit)
• Country-specific domain names
– .in: India
– .au: Australia
– .ca: Canada
– .cn: China
– .nz: New Zealand
– .pk: Pakistan
– .jp: Japan
– .us: United States of America
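The parts of a domain name described above can be separated by splitting the name at the dots and reading it from right to left. The following is a minimal sketch; the function names domain_levels and tld_kind are made up for this example, and the two sets contain only some of the top-level domains listed in this section.

```python
# A few top-level domains from the lists above (not a complete registry).
GENERIC_TLDS = {"com", "edu", "gov", "mil", "net"}
COUNTRY_TLDS = {"in", "au", "ca", "nz", "pk", "jp", "us"}


def domain_levels(domain):
    """Return the labels from the top-level domain down to the last subdomain."""
    return domain.lower().split(".")[::-1]


def tld_kind(domain):
    """Classify the top-level domain using the two small sets above."""
    tld = domain_levels(domain)[0]
    if tld in GENERIC_TLDS:
        return "generic"
    if tld in COUNTRY_TLDS:
        return "country-specific"
    return "not in the lists above"


levels = domain_levels("cbse.nic.in")
print(levels)                      # top-level domain comes first
print(tld_kind("cbse.nic.in"))
print(tld_kind("www.google.com"))
```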

WWW
• Most people think that the WWW is the Internet and vice versa, which is not true.
• They are closely linked but quite different in their overall working and concept.

DIFFERENCE BETWEEN INTERNET & WWW

INTERNET                                          WWW
The Internet is a means of connecting a computer  The World Wide Web is a collection of information
to any other computer anywhere in the world.      which is accessed via the Internet.
The Internet is infrastructure.                   The WWW is a service on top of that infrastructure.
The Internet is a superset of the WWW.            The WWW is a subset of the Internet.
The Internet uses IP addresses.                   The WWW uses HTTP.

WEB
• Web 2.0: Web 2.0 refers to added features and applications that make the web more
interactive and support easy online information exchange and interoperability. Some
noticeable features of Web 2.0 are blogs, wikis, video-sharing websites, social
networking websites, etc.
• Web 3.0: It refers to the third generation of the web, where users interact using
artificial intelligence and 3-D portals. Web 3.0 supports the semantic web, which
improves web technologies to create, connect and share content through intelligent
search and analysis based on the meaning of words instead of on keywords
and numbers.

Email
• Email stands for Electronic Mail.
• E-mail or email is information stored on a computer that is exchanged between two
users over telecommunications.
• It is a fast and efficient way to communicate with friends or colleagues.
• We can communicate with one person at a time or with thousands; we can send and receive
files and other information.
• The message can be either text entered directly into the email application or an
attached file (text, image, audio, video, etc.) stored on secondary storage.
• An existing file can be sent as an attachment with the email, so there is no need to type
it again.
• To use an email service, one needs to register with an email service provider by creating
a mail account. These services may be free or paid.
• Some of the popular email service providers are Google (Gmail), Yahoo (Yahoo Mail),
Microsoft (Outlook), etc.
• However, many organizations nowadays get customized business email addresses for
their staff using their own domain name. For example, username@companyname.com

Features of Email
• Automatic/default reply to messages.
• Auto-forward and redirection of messages.
• Facility to send copies of a message to many people.
• Automatic filing and retrieval of messages.
• Addresses can be stored in an address book and retrieved instantly.
• Notification if a message cannot be delivered.

Parts of an Email Message

• Headers: The message headers contain information about the sender and the recipients.
Generally, headers contain the following information:
– Subject: A description of the topic of the message.
– Sender (From): The sender's email address.
– Date and time received (On): The date and time an email was sent is usually included
automatically.
– Recipient (To): First/Last name of the recipient.
– Recipient's email address: The receiver's email address.
– Cc and Bcc: When sending a message to multiple recipients, the Cc and Bcc options can be
used. Cc stands for Carbon copy. It allows all recipients to see the email addresses of
everyone the message was sent to, but the email addresses of recipients specified in the
Bcc (Blind carbon copy) field do not appear in the received message.
• Body: The body of a message contains text that is the actual content. The body of the
message may also include signatures or automatically generated text that is inserted by
the sender's email system.
• Attachment: It consists of files that are attached to the message. The attachment could
be a document, a picture or any other file type.
• Email address: An email address is a unique identifier for an email account. It is used for
both sending and receiving email messages. Every email address has two main parts: a
username and a domain name. The username comes before '@' and the domain name comes
after it. In the example given below, 'abc' is the username and 'gmail.com' is the
domain name:
abc@gmail.com
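The parts of a message listed above map directly onto Python's standard email.message.EmailMessage class. The sketch below only builds a message in memory and does not send it; every address except abc@gmail.com, the subject, the body text and the attachment are invented for illustration. Bcc recipients are kept in a separate list here, on the assumption that they are passed to the mail server for delivery but not written into the message.

```python
from email.message import EmailMessage

msg = EmailMessage()

# Headers
msg["From"] = "abc@gmail.com"
msg["To"] = "friend@example.com"        # hypothetical recipient
msg["Cc"] = "teacher@example.com"       # visible to all recipients
msg["Subject"] = "Project notes"
bcc_recipients = ["principal@example.com"]  # kept out of the headers

# Body
msg.set_content("Please find the notes attached.")

# Attachment (a few bytes standing in for a real file)
msg.add_attachment(
    b"Unit: Computer Networks",
    maintype="application",
    subtype="octet-stream",
    filename="notes.txt",
)

# Everyone who should receive a copy, including the hidden Bcc recipient.
delivery_list = ["friend@example.com", "teacher@example.com"] + bcc_recipients

body_text = msg.get_body(preferencelist=("plain",)).get_content()
attachment_names = [part.get_filename() for part in msg.iter_attachments()]

print(msg["Subject"], attachment_names, len(delivery_list))
```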

Basic Email Functions

• Send and receive mail messages
• Save your messages in a file
• Print mail messages
• Reply to mail messages
• Attach a file to a mail message

Protocols Used In Email

• IMAP (Internet Message Access Protocol): It is a standard protocol for accessing email
that is stored on a mail server.
• POP3 (Post Office Protocol 3): It provides a simple, standardized way for users to access
mailboxes and download messages to their computers.
• SMTP (Simple Mail Transfer Protocol): It is used when you send email to another
email user (the recipient). The SMTP protocol is used by the Mail Transfer Agent (MTA)
to deliver the sent email to the recipient's mail server.
• HTTP: The HTTP protocol is not dedicated to email communication, but it
can be used for accessing a mailbox.

Parts of Email ID
• The email address has three parts:
– a user name
– an "at" sign (@)
– the address of the user's mail server
• Example: vishantkhobragade@gmail.com

vishantkhobragade    @            gmail.com
(user name)          (separator)  (mail server)
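The three parts of an email ID can be separated in Python with the string method partition(). This is a minimal sketch; the function name split_email is made up here, and the check it performs is far simpler than real email address validation.

```python
def split_email(address):
    """Split an email ID into (user name, separator, mail server)."""
    user_name, separator, mail_server = address.partition("@")
    if not (user_name and separator and mail_server):
        raise ValueError("not a valid email ID: " + address)
    return user_name, separator, mail_server


parts = split_email("vishantkhobragade@gmail.com")
print(parts)
```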

Chat
• Chatting is another method of conversation over the Internet. It enables people connected
anywhere on the Internet to join in live discussions.
• Chat sessions allow many users to join in the same free-form conversation, usually
centered around a discussion topic.
• With ever-increasing Internet speed, it is now possible to send images, documents, audio
and video as well through instant messengers. The communicating parties can also talk
to each other through an audio call or a video call. Moreover, it is also possible
to chat through text, audio and video in a group. Thus, we can have group chats or group
calls.
• Applications such as WhatsApp, Slack, Skype, Yahoo Messenger, Google Talk, Facebook
Messenger, Google Hangout, etc., are examples of instant messengers. Some of these
applications support instant messaging through all the modes: text, audio and video.

VoIP

• Voice over Internet Protocol (VoIP) is a technology that allows us to make voice calls
using a broadband Internet connection instead of a regular (or analog) phone line.
• VoIP services convert our voice into a digital signal that travels over the Internet. If we
are calling a regular phone number, the signal is converted to a regular telephone signal
before it reaches the destination.
• VoIP allows us to make a call directly from a computer or from a special VoIP phone.
• In addition, wireless "hot spots" in locations such as airports, parks and cafes allow us
to connect to the Internet and may enable us to use VoIP service wirelessly.

Advantages & Disadvantages of VoIP:

Advantages:
• Less cost
• Accessibility
• Flexibility
• Voice quality
• Extra/less expensive features

Disadvantages:
• Reliable Internet connection required
• Power outages/emergencies
• Latency

Website
• A website is a collection of web pages which consist of text, images and all types of
multimedia files.
• A page of information stored on the Internet is called a web page.
Some of the common purposes for which websites are designed are listed below:
• Selling products and delivering services
• Posting and finding information on the Internet
• Communicating with each other
• Entertainment purposes
• Disseminating content and software

Difference Between Website & Webpage

Website                                            Webpage
1. A collection of web pages which are grouped     1. A document which can be displayed in a web
together and usually connected together in         browser such as Firefox, Google Chrome, Opera,
various ways. Often called a "web site" or         Microsoft Internet Explorer, etc.
simply a "site".
2. It has content about various entities.          2. It has content about a single entity.
3. More development time is required.              3. Less development time is required.
4. Website address does not depend on the          4. Webpage address depends on the website
webpage address.                                   address.

Difference between Static and Dynamic Webpage

Static Webpage                                     Dynamic Webpage
A static web page displays the same content        In a dynamic web page, the page content
each time someone visits it.                       changes according to the user.
It takes less time to load over the Internet.      It takes more time to load.
No database is used in a static web page.          A database is used at the server end in a
                                                   dynamic web page.
Changes rarely.                                    Changes frequently.
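The difference can be pictured with two small Python functions. This is only a toy sketch: the page text, the function names and the dictionary marks_db (standing in for a server-side database) are all invented for illustration.

```python
STATIC_PAGE = "<h1>Welcome to our school</h1>"

# Hypothetical data that would normally live in a server-side database.
marks_db = {"ravi": 82, "sunny": 91}


def static_page():
    """Every visitor receives exactly the same content."""
    return STATIC_PAGE


def dynamic_page(user_name):
    """The content is built at request time from data about the user."""
    marks = marks_db.get(user_name, "no record")
    return f"<h1>Hello, {user_name}</h1><p>Your marks: {marks}</p>"


print(static_page())
print(dynamic_page("ravi"))
print(dynamic_page("sunny"))
```

The extra lookup and page building on every request is also the reason a dynamic page usually takes longer to load.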

Web Server
• A web server is a WWW server that responds to the requests made by web browsers.
• A web server can be software or hardware.
• As computer hardware, a web server is a computer that stores web server software
and a website's component files (HTML pages, images, CSS style sheets and JavaScript
files). The server needs to be connected to the Internet so that its contents can be made
accessible to others.
• As software, a web server is a specialized program that understands URLs or web
addresses coming as requests from browsers, and responds to those requests.
• When a client sends a request for a web page, the web server searches for the requested
page. If the page is found, the server sends it to the client with an HTTP response. If
the requested web page is not found, the web server sends the HTTP response
"Error 404 Not Found".
• The basic objective of a web server is to store, process and deliver web pages to the
users using the Hypertext Transfer Protocol (HTTP). Apart from HTTP, a web server may
also support the SMTP (Simple Mail Transfer Protocol) and FTP (File Transfer Protocol)
protocols for e-mailing, for file transfer and storage.
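The search-and-respond behaviour described above can be sketched as a single Python function. This is a simplified model rather than a real web server: the dictionary SITE_FILES and the function name handle_request are invented, and only the status codes and phrases come from Python's standard http.HTTPStatus.

```python
from http import HTTPStatus

# Hypothetical website contents stored on the server: path -> page.
SITE_FILES = {
    "/index.html": "<h1>Home</h1>",
    "/about.html": "<h1>About us</h1>",
}


def handle_request(path):
    """Return (status code, reason phrase, body) for a requested path."""
    if path in SITE_FILES:
        status = HTTPStatus.OK
        body = SITE_FILES[path]
    else:
        status = HTTPStatus.NOT_FOUND
        body = "<h1>Error 404 Not Found</h1>"
    return status.value, status.phrase, body


print(handle_request("/index.html"))
print(handle_request("/missing.html"))
```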

Web Hosting
• Web hosting is an online service that enables you to publish your website or web
application on the Internet. When you sign up for a hosting service, you basically rent
some space on a server on which you can store all the files and data necessary for your
website to work properly.
• A server is a physical computer that runs without any interruption so that your website
is available all the time for anyone who wants to see it.

How to host a website?

To host a website, follow these steps:

• Select a web hosting service provider that will provide the web server space as well as
related technologies and services such as database, bandwidth, data backup, firewall
support, email service, etc. This has to be done keeping in mind the features and
services that we want to offer through our website.
• Identify a domain name which best suits our requirement, and get it registered through
a domain name registrar.
• Once we get the web space, create logins with appropriate rights and note down the IP
address needed to manage the web space. Upload the files in properly organized folders
on the allocated space.
• Get the domain name mapped to the IP address of the web server.
• The Domain Name System (DNS) is a service that does the mapping between a domain
name and an IP address. When the address of a website is entered in a browser, the DNS
finds out the IP address of the server corresponding to the requested domain name, and
the request is sent to that server.
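The mapping done by DNS can be pictured as a lookup table from domain names to IP addresses. The sketch below is a simulation only: the names and the IP addresses in DNS_RECORDS are made-up examples, and the function resolve is hypothetical. A real program would ask the operating system's resolver instead, for example through socket.gethostbyname().

```python
# Hypothetical DNS database: domain name -> IP address of the web server.
DNS_RECORDS = {
    "www.example.com": "192.0.2.10",
    "mail.example.com": "192.0.2.25",
}


def resolve(domain_name):
    """Return the IP address for a domain name, or None if it is not registered."""
    return DNS_RECORDS.get(domain_name.lower().rstrip("."))


web_ip = resolve("WWW.Example.com")   # domain names are not case-sensitive
unknown = resolve("www.unregistered.example")

print(web_ip, unknown)
```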

Web Browser
• A web browser, or simply "browser", is an application used to access and view websites.
Common web browsers include Microsoft Internet Explorer, Google Chrome, Mozilla
Firefox and Apple Safari.
• Mosaic, developed by the National Center for Supercomputing Applications (NCSA), was
one of the first widely used graphical web browsers.

Web Browser Settings

• Home Panel: This panel contains options to set the home page of the browser, and the
browser window and tab settings.
• Search Panel: This panel contains options to edit the settings of the search engine used
by Firefox.
• Privacy and Security Panel: This panel contains options to secure the browser and data.
It includes the following:
– enhanced tracking protection
– forms and passwords
– history and address bar
– cookies and site data
– permission to view pop-up windows and install add-ons
• Sync Panel: This panel contains options to set up and manage a Firefox account, which is
needed to access all the services given by Mozilla.
Make the desired settings and close the browser settings window. The changes made in the
browser settings will be applied.

Add-Ons and Plug-ins

• Add-ons and plug-ins are tools that help to extend and modify the functionality of
the browser. Both tools extend what the browser can do, but they are different
from each other.
• A plug-in is a complete program and may be third-party software. For example, Flash
and Java are plug-ins. A Flash player is required to play a Flash video in the browser. A
plug-in is software that is installed on the host computer and can be used by the browser
for multiple functionalities, and can even be used by other applications as well.
• On the other hand, an add-on is not a complete program and so is used to add only a
particular functionality to the browser. An add-on is also referred to as an extension in
some browsers. Adding the functionality of a sound and graphics card is an example of
an add-on.

Cookies
• Cookies are small files which are stored on a user's computer and contain information
such as the web pages visited in the past, login details, passwords, etc. They are designed
to hold a modest amount of data specific to a particular client and website, and can be
accessed by the web server or the client computer.
• Cookies are messages that a web server transmits to a web browser so that
the web server can keep track of the user's activity on a specific website.
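The exchange of a cookie between server and browser can be sketched with Python's standard http.cookies module. No network traffic takes place in this example; the cookie name last_page and its value are invented for illustration.

```python
from http.cookies import SimpleCookie

# Server side: build the cookie that is sent to the browser in a Set-Cookie header.
server_cookie = SimpleCookie()
server_cookie["last_page"] = "results.html"
server_cookie["last_page"]["path"] = "/"
header_line = server_cookie.output()

# Browser side: on the next request the browser sends the stored value back,
# and the server reads it to remember what the user did earlier.
request_cookie = SimpleCookie()
request_cookie.load("last_page=results.html")
remembered = request_cookie["last_page"].value

print(header_line)
print(remembered)
```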

PREPARED BY

VISHANT D KHOBRAGADE
PGT CS
K V VSN, NAGPUR
TYPES OF NETWORK
•LAN
•MAN
•WAN

NETWORK DEVICES
• MODEM
• HUB
• SWITCH
• REPEATER
• ROUTER
• GATEWAY

NETWORK TOPOLOGIES
• STAR
• BUS
• TREE
• MESH

INTRODUCTION TO INTERNET

WWW & ITS APPLICATION


* WEB
* EMAIL
* CHAT
* VOIP

WEBSITES

WEB BROWSERS

Societal Impacts
CLASS XII
INFORMATICS PRACTICES(065)
UNIT IV SOCIETAL IMPACTS
STUDY MATERIAL
Digital footprint
A digital footprint is the data that you leave on the Internet, or provide to it, through
your search queries or any other kind of online activity. It includes the websites you visit,
the emails you send, and the information you submit to online services.
Types of digital footprint
There are two types of digital footprint:
1. Active Digital Footprint: It includes data submitted through online forms, emails, and
responses given to emails or on websites as comments or feedback. An active digital
footprint is created by the user intentionally, with their consent.
2. Passive Digital Footprint: It includes data generated by a website, mobile app or any
other activity on the Internet. A passive digital footprint is created unintentionally,
without the user's consent.
Net and Communication etiquettes:
Etiquettes are the manners we follow in our social interactions. Similarly, we have to abide
by some manners and etiquettes online as well. You should be ethical, respectful and
responsible while using the Internet.
Anyone who uses digital technology along with the Internet is a digital citizen or a netizen.
Being a good netizen means practicing safe, ethical and legal use of digital technology. A
responsible netizen must abide by net etiquettes, communication etiquettes and social media
etiquettes.
Net etiquette includes avoiding copyright violations, respecting the privacy and diversity of
users, and avoiding cyber bullies and cyber trolls, besides sharing expertise.
Be Ethical
While using the Internet, you should be ethical. Follow these rules to be ethical on the
Internet.
1. No copyright violation: While uploading media like audio, video or images and creating
content, we should not use any material created by others without their consent. We
should always try to make our own content, or otherwise use royalty-free media released
under a Creative Commons licence that permits reuse.
2. Share the expertise: You can share your knowledge to help people on the Internet. There
are many platforms such as blogs, YouTube, podcasts and affiliate marketing. Check the
following points before sharing your knowledge on the Internet:
a) The information should be true.
b) You need to have enough knowledge about the topic.
c) Share the knowledge in your own words; do not copy and paste the work of others.
Be respectful
We should be respectful on the Internet in the following aspects:
1. Respect Privacy: This is one of the most important things to keep in mind always.
While using the Internet, we receive personal images, files, videos and other data from
other users. We should not share anything on the Internet that relates to others without
their consent. This is called respect for privacy.
2. Respect Diversity: On the Internet we follow forums, communities and groups on social
media like Facebook, WhatsApp, etc. These bring together different kinds of people with
different mindsets, opinions, knowledge, experience, culture and other aspects.
So we have to respect this diversity in the groups, communities or forums.
Be responsible
While using the Internet, we should be responsible for whatever we do.
1. Avoid cyberbullying: Cyberbullying refers to activities done on the Internet with the
intention to hurt or insult someone. It covers degrading or intimidating online behaviour
such as spreading or sharing rumours without any fact check, sharing threats online,
posting someone's personal information, sexual harassment, and comments that publicly
ridicule (mock or tease) someone. These types of activities have very serious
impacts on the victims. Always remember that your activities can be tracked through your
digital footprints.
Communication Etiquettes
Communication etiquette refers to the rules followed while sending emails,
chatting, sending SMS, calling, and posting on forums and social media. The communication etiquettes
are as follows:
1. Be Precise: Whenever we communicate, we should keep the message precise.
Respect the other person’s time and avoid forwarding or replying with useless messages. Do
not expect an instant reply. Avoid sending a large file
as an attachment. If it has to be sent, share a link to cloud storage like Google
Drive, OneDrive or Dropbox instead.
2. Be Polite: In communication we should always be polite whether we agree or disagree.
We should reply politely, without any aggression or abuse.
3. Be Credible: On a forum, credibility is built over
a period of time. We should always try to go through the previous comments and judge
their credibility before typing a comment.

Data protection
In this digital age, data or information protection is mainly about the privacy of data stored
digitally. Elements of data that can cause substantial harm, embarrassment, inconvenience and
unfairness to an individual, if breached or compromised, are called sensitive data.
Examples of sensitive data include biometric information, health information, financial
information, or other personal documents, images or audios or videos. Privacy of sensitive data
can be implemented by encryption, authentication, and other secure methods to ensure that
such data is accessible only to the authorised user and is for a legitimate purpose. All over the
world, each country has its own data protection policies (laws). These policies are legal
documents that provide guidelines to the user on processing, storage and transmission of
sensitive information. The motive behind implementation of these policies is to ensure that
sensitive information is appropriately protected from modification or disclosure.
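The idea of authentication mentioned above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names make_password_record and is_authorised, and the iteration count, are made up for this example and do not come from any data protection policy. The sketch stores a salted hash of a password instead of the password itself, so the stored value does not directly reveal the password if it is leaked. Note that this is hashing for authentication, not encryption of the data itself.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, chosen only for this illustration


def make_password_record(password):
    """Return (salt, digest) to be stored in place of the plain password."""
    salt = os.urandom(16)  # random salt so equal passwords give different digests
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def is_authorised(password, salt, stored_digest):
    """Recompute the digest for the typed password and compare it safely."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)


salt, digest = make_password_record("my secret phrase")
print(is_authorised("my secret phrase", salt, digest))  # True
print(is_authorised("wrong guess", salt, digest))       # False
```

Only the salt and the digest would be kept on disk in such a scheme; the plain password is never stored.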
Intellectual property rights (IPR)
When someone owns a house or a motorcycle, we say that the person owns that property.
Similarly, if someone comes out with a new idea, this original idea is that person’s intellectual
property. Intellectual Property refers to the inventions, literary and artistic expressions, designs
and symbols, names and logos. The ownership of such concepts lies with the creator, or the
holder of the intellectual property. This enables the creator or copyright owner to earn
recognition or financial benefit by using their creation or invention.
Intellectual Property is legally protected through copyrights, patents, trademarks, etc.
(A) Copyright: Copyright grants legal rights to creators for their original works like writing,
photographs, audio recordings, videos, sculptures, architectural works, computer software,
and other creative works like literary and artistic work. Copyrights are automatically
granted to creators and authors.
Copyright law gives the copyright holder a set of rights that they alone can avail legally.
The rights include right to copy (reproduce) a work, right to create derivative works based
upon it, right to distribute copies of the work to the public, and right to publicly display
or perform the work. It prevents others from copying, using or selling the work. For
example, writer Rudyard Kipling holds the copyright to his novel, ‘The Jungle Book’, which
tells the story of Mowgli, the jungle boy. It would be an infringement of the writer’s
copyright if someone used parts of the novel without permission. To use others’
copyrighted material, one needs to obtain a license from them.
(B) Patent: A patent is usually granted for inventions. Unlike copyright, the inventor needs
to apply (file) for patenting the invention. When a patent is granted, the owner gets an
exclusive right to prevent others from using, selling, or distributing the protected
invention. Patent gives full control to the patentee to decide whether or how the
invention can be used by others. Thus it encourages inventors to share their scientific or
technological findings with others. A patent protects an invention for 20 years, after
which it can be freely used. Recognition and/or financial benefit foster the right
environment, and provide motivation for more creativity and innovation.
(C) Trademark: Trademark includes any visual symbol, word, name, design, slogan, label,
etc., that distinguishes the brand or commercial enterprise, from other brands or
commercial enterprises. For example, no company other than Nike can use the Nike
brand to sell shoes or clothes. It also prevents others from using a confusingly similar
mark, including words or phrases. For example, confusing brands like “Nikke” cannot be
used. However, it may be possible to apply for the Nike trademark for unrelated goods
like notebooks.
Plagiarism
With the availability of the Internet, we can instantly copy or share text, pictures and videos.
Presenting someone else’s idea or work as one’s own idea or work is called plagiarism. If we copy
some content from the Internet, but do not mention the source or the original creator, then it is
considered an act of plagiarism. Further, if someone derives an idea or a product from an
already existing idea or product, but presents it as a new idea, then that is also plagiarism.
It is a serious ethical offence and is sometimes considered an act of fraud. Even if we take
content that is open for public use, we should cite the author or source to avoid plagiarism.
Licensing and copyright
Licensing and copyrights are two sides of the same coin.
1. A license is a type of contract or permission agreement between the creator of an
original work and someone who wishes to use that work, generally for some price; whereas
copyright is the set of legal rights of the creator for the protection of original work of different
types.
2. Licensing is the legal term used to describe the terms under which people are allowed to
use the copyrighted material. A software license is an agreement that provides legally
binding guidelines pertaining to the authorised use of digital material. The digital material
may include any software or any form of art, literature, photos, etc., in digital form. Any
such resource posted on the Internet constitutes intellectual property and must be
downloaded, used or distributed according to the guidelines given in the license
agreement. Failure to follow such guidelines is considered as an infringement of
Intellectual Property Rights (IPR), and is a criminal offence.
Free and open source software (FOSS)
Copyright sometimes puts restrictions on the usage of copyrighted works by anyone else. If
others are allowed to use and build upon the existing work, it encourages collaboration and
can result in new innovations in the same direction.
Licenses provide rules and guidelines for others to use the existing work. When authors share
their copyrighted works with others under a public license, it allows others to use and even modify
the content.
Open source licenses help others to contribute to existing work or project without seeking special
individual permission to do so.
The GNU General Public License (GPL) and the Creative Commons (CC) are two popular
categories of public licenses.
1. CC is used for all kinds of creative works like websites, music, film, literature, etc. CC
enables the free distribution of an otherwise copyrighted work. It is used when an author
wants to give people the right to share, use and build upon a work that they have created.
GPL, by contrast, is primarily designed for providing a public licence for software.
2. GNU GPL is a free software license, which provides end users the freedom to run,
study, share and modify the software, besides getting regular updates. Users or
companies who distribute GPL licensed works may charge a fee for copies or give them
free of charge. This distinguishes the GPL from the licenses of freeware such as
Skype, Adobe Acrobat Reader, etc. that allow copying for personal use but prohibit
commercial distribution, or proprietary licenses where copying is prohibited by copyright
law.
3. Much of the proprietary software that we use is sold commercially and its program
code (source code) is not shared or distributed. However, certain software is
available freely for anyone and its source code is also open for anyone to access,
modify, correct and improve.
4. Free and open source software (FOSS) has a large community of users and developers
who are contributing continuously towards adding new features or improving the existing
features. For example, Linux kernel-based operating systems like Ubuntu and Fedora
come under FOSS. Some of the popular FOSS tools are office packages like LibreOffice
and browsers like Mozilla Firefox.
5. Software piracy is the unauthorised use or distribution of software. Those who purchase
a license for a copy of the software do not have the rights to make additional copies
without the permission of the copyright owner. It amounts to copyright infringement
regardless of whether it is done for sale, for free distribution or for copier’s own use. One
should avoid software piracy. Using pirated software can not only degrade the
performance of a computer system, but also affects the software industry, which in turn
affects the economy of a country.
Cybercrime and cyber laws
Cybercrime: Cybercrime is defined as a crime in which a computer is the medium of the crime
(hacking, phishing, spamming), or the computer is used as a tool to commit crimes (extortion,
data breaches, theft).
Hacking : Hacking is the act of unauthorised access to a computer, computer network or any
digital system. Hackers usually have technical expertise of the hardware and software. They look
for bugs to exploit and break into the system.
Hacking, when done with a positive intent, is called ethical hacking. Such ethical hackers are
known as white hat hackers. They are specialists in exploring any vulnerability or loophole
during testing of the software. Thus, they help in improving the security of software. An ethical
hacker may exploit a website in order to discover its security loopholes or vulnerabilities, and then
report the findings to the website owner. Thus, ethical hacking actually prepares the owner
against any cyber attack. A non-ethical hacker is one who tries to gain unauthorised access
to computers or networks in order to steal sensitive data, or with the intent to damage or bring
down systems. They are called black hat hackers or crackers.
Phishing : Phishing is an unlawful activity where fake websites or emails that look original or
authentic are presented to the user to fraudulently collect sensitive and personal details,
particularly usernames, passwords, banking and credit card details.
The most common phishing method is email spoofing, where a fake or forged email
address is used and the user presumes it to be from an authentic source. So you might get an
email from an address that looks similar to that of your bank or educational institution, asking for your
information, but if you look carefully you will see that the address or URL is fake. Such emails often use
the logos of the original organisation, making them difficult to distinguish from the real ones. Phishing attempts through
phone calls or text messages are also common these days.
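Looking carefully at a link can also be sketched in Python. The following is a minimal illustration, not a complete phishing detector: the function name belongs_to_domain and the domain examplebank.com are hypothetical, made up for this example. It uses the standard urllib.parse module to extract the host name from a link and checks whether that host is the trusted domain or one of its subdomains.

```python
from urllib.parse import urlparse


def belongs_to_domain(link, trusted_domain):
    """Return True only if the link's host is the trusted domain or a subdomain of it."""
    host = urlparse(link).hostname  # None when the text is not a proper URL
    if host is None:
        return False
    host = host.lower()
    trusted = trusted_domain.lower()
    return host == trusted or host.endswith("." + trusted)


# A genuine-looking link on the (hypothetical) bank's own domain
print(belongs_to_domain("https://www.examplebank.com/login", "examplebank.com"))  # True
# The bank's name appears, but the real domain is verify-user.net
print(belongs_to_domain("https://examplebank.com.verify-user.net/login", "examplebank.com"))  # False
# A look-alike spelling with the digit 1 in place of the letter l
print(belongs_to_domain("https://examp1ebank.com/login", "examplebank.com"))  # False
```

The check compares the end of the host name because the part just before the first single slash decides which site is actually opened, regardless of what appears earlier in the address.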
Identity Theft: Identity thieves increasingly use personal information stolen from computers or
computer networks to commit fraud. A user’s identifiable
personal data like demographic details, email ID, banking credentials, passport, PAN, Aadhaar
number and various such personal data are stolen and misused by the criminal in the victim’s
name. Such data is often collected through phishing, and the intention is largely monetary gain. There
can be many ways in which the criminal takes advantage of an individual’s stolen identity.
Given below are a few examples:
• Financial identity theft: when the stolen identity is used for financial gain.
• Criminal identity theft: criminals use a victim’s stolen identity to avoid detection of their true
identity.
• Medical identity theft: criminals can seek medical drugs or treatment using a stolen
identity.
Ransomware: This is another kind of cyber crime where the attacker gains access to the computer
and blocks the user from accessing it, usually by encrypting the data. The attacker blackmails the
victim to pay for getting access to the data, or sometimes threatens to publish personal and
sensitive information or photographs unless a ransom is paid. Ransomware can get downloaded
when users visit malicious or unsecured websites or download software from doubtful
repositories.
Some ransomware is sent as email attachments in spam mails. It can also reach our system
when we click on a malicious advertisement on the Internet.
Cyber bullying: the use of the internet to harm or frighten another person, especially
by sending them unpleasant messages.
Overview of Indian IT Act
With the growth of the Internet, many cases of cyber crimes, frauds, cyber attacks and cyber bullying
are reported. The nature of fraudulent activities and crimes keeps changing. To deal with such
menaces, many countries have come up with legal measures for protection of sensitive personal
data and to safeguard the rights of Internet users. The Government of India’s Information
Technology Act, 2000 (also known as the IT Act), amended in 2008, provides guidelines to the user
on the processing, storage and transmission of sensitive information.
In many Indian states, there are cyber cells in police stations where one can report any cyber
crime. The act provides a legal framework for electronic governance by giving recognition to
electronic records and digital signatures. The act outlines cyber crimes and penalties for them.
A Cyber Appellate Tribunal has been established to resolve disputes arising from cyber crime, such
as tampering with computer source documents, hacking the computer system, using password
of another person, publishing sensitive personal data of others without their consent, etc. The
act is needed so that people can perform transactions over the Internet through credit cards
without fear of misuse. Not only people, the act empowers government departments also to
accept filing, creation and storage of official documents in the digital format.
E-waste: hazards and management
E-waste or electronic waste includes electric or electronic gadgets and devices that are no longer
in use. Hence, discarded computers, laptops, mobile phones, televisions, tablets, music systems,
speakers, printers, scanners etc. constitute e-waste when they are near or at the end of their useful
life.
E-waste is becoming one of the fastest growing environmental hazards in the world today. The
increased use of electronic equipment has also caused an exponential increase in the number of
discarded products. Lack of awareness and appropriate skill to manage it has further worsened
the problem. So, Waste Electrical and Electronic Equipment (WEEE) is becoming a major concern
for all countries across the world. Globally, e-waste constitutes more than 5 per cent of the
municipal solid waste. Therefore, it is very important that e-waste is disposed of in such a manner
that it causes minimum damage to the environment and society.
1. Impact of e-waste on environment
When e-waste is carelessly thrown or dumped in landfills or dumping grounds, certain
elements or metals used in production of electronic products cause air, water and soil
pollution. This is because when these products come in contact with air and moisture,
they tend to leach. As a result, the harmful chemicals seep into the soil, causing soil
pollution. Further, when these chemicals reach and contaminate the natural ground
water, it causes water pollution as the water becomes unfit for humans, animals and even
for agricultural use. When dust particles loaded with heavy metals enter the
atmosphere, they cause air pollution as well.
2. Impact of e-waste on humans
The electrical or electronic devices are manufactured using certain metals and other materials
like lead, beryllium, cadmium, plastics, etc. Most of these materials are difficult to recycle
and are considered to be toxic and carcinogenic. If e-waste is not disposed of in a proper
manner, it can be extremely harmful to humans, plants, animals and the environment as
discussed below:
• One of the most widely used metals in electronic devices (such as
monitors and batteries) is lead. When lead enters the human body through contaminated
food, water, air or soil, it causes lead poisoning which affects the kidneys, brain and
central nervous system. Children are particularly vulnerable to lead poisoning.
• When e-waste such as electronic circuit boards is burnt for disposal, harmful elements
contained in them, such as beryllium, are released. Beryllium causes skin diseases,
allergies and an increased risk of lung cancer. Burning of insulated wires to extract copper
can cause neurological disorders.
• Some of the electronic devices contain mercury which causes respiratory disorders and
brain damage.
• The cadmium found in semiconductors and resistors can damage kidneys, liver and
bones.
• Hardly any electronic device is manufactured without using plastics. When this
plastic reacts with air and moisture, it passes harmful chemicals into the soil and water
resources. When consumed, these chemicals damage the immune system of the body and also cause
various psychological problems like stress and anxiety.
Management of e-waste
E-waste management is the efficient disposal of e-waste. Although we cannot completely
destroy e-waste, certain steps and measures have to be taken to reduce harm to humans
and the environment. Some of the feasible methods of e-waste management are reduce, reuse and
recycle.
• Reduce: We should try to reduce the generation of e-waste by purchasing the electronic or
electrical devices only according to our need. Also, they should be used to their maximum
capacity and discarded only after their useful life has ended. Good maintenance of electronic
devices also increases the life of the devices.
• Reuse: It is the process of re-using the electronic or electric waste after slight modification. The
electronic equipment that is still functioning should be donated or sold to someone who is still
willing to use it. The process of re-selling old electronic goods at lower prices is called
refurbishing.
• Recycle: Recycling is the process of conversion of electronic devices into something that can
be used again and again in some or the other manner. Only those products should be recycled
that cannot be repaired, refurbished or re-used. To promote recycling of e-waste many
companies and NGOs are providing door-to-door pick up facilities for collecting the e-waste from
homes and offices.
E-waste Management in India
In India, the Environment (Protection) Act, 1986, has been enacted to hold people responsible
for causing any form of pollution by making them pay for the damage done to the natural environment.
According to the “Polluter Pays Principle” followed under this act, anyone causing any form of pollution will pay for
the damage caused. Any violation of the provisions of this act is liable for punishment.
The Central Pollution Control Board (CPCB) has issued a formal set of guidelines for proper
handling and disposal of e-waste. According to these guidelines, the manufacturer of any
electronic equipment will be “personally” responsible for the final safe disposal of the product
when it becomes an e-waste.
The Department of Information Technology (DIT), Ministry of Communication and Information
Technology, has also issued a comprehensive technical guide on “Environmental Management
for Information Technology Industry in India.” The industries have to follow these guidelines for
recycling and reuse of e-waste. In order to make the consumers aware of the recycling of e-
waste, prominent smartphone and computer manufacturing companies have started various
recycling programs.
Awareness about health concerns related to the usage of technology
As digital technologies have penetrated into different fields, we are spending more time in front
of screens, be it mobile, laptop, desktop, television, gaming console, music or sound device. But
interacting in an improper posture can be bad for us, both physically and mentally. Besides,
spending too much time on the Internet can be addictive and can have a negative impact on our
physical and psychological well-being.
However, these health concerns can be addressed to some extent by taking care of the way we
position such devices and the posture we adopt. Ergonomics is a branch of science
that deals with designing or arranging workplaces, including the furniture, equipment and
systems, so that they become safe and comfortable for the user. Ergonomics helps us in reducing
the strain on our bodies, including the fatigue and injuries due to prolonged use.

When we continuously look at the screen for watching, typing, chatting or playing games, our
eyes are continuously exposed to the glare coming from the screens. Looking at small handheld
devices makes it worse. Eye strain is a symptom commonly reported by users of digital
devices. Ergonomically maintaining the viewing distance and angle, along with the position can
be of some help. However, to get rid of dry, watering, or itchy eyes, it is better to periodically
focus on distant objects, and take a break for outdoor activities.
Bad posture, backaches, neck and shoulder pains can be prevented by arranging the workspace
as recommended by ergonomics. Overuse of keyboards (be it physical keyboard or touchscreen-
based virtual keyboard) not aligned ergonomically, can give rise to a painful condition of wrists
and fingers, and may require medical help in the long run. Stress, physical fatigue and obesity
are the other related impacts the body may face if one spends too much time using digital
devices.
CLASS XII
INFORMATICS PRACTICES(065)
UNIT IV SOCIETAL IMPACTS
MASTER CARD
