
PANDAS: DATA STRUCTURE

Pandas offers two data structures:

SERIES DATA STRUCTURE
DATAFRAME DATA STRUCTURE
1) Series Data Structure:
A series is a pandas data structure that represents a one-dimensional array-like object
containing an array of data (of any NumPy data type) and an associated array of data
labels called the index.
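As a small illustration of the definition above, the sketch below builds a series and reads its data and its index separately. The name `marks` and the labels are made up for this example.

```python
import pandas as pd

# A Series pairs an array of values with an array of labels (the index).
marks = pd.Series([45, 65, 24], index=["Term1", "Term2", "Term3"])

values = marks.values.tolist()   # the data part
labels = marks.index.tolist()    # the index part
print(values)                    # [45, 65, 24]
print(labels)                    # ['Term1', 'Term2', 'Term3']
print(marks["Term2"])            # look up a value by its label: 65
```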
Creating Series DS:
1. Create empty series
2. Create non-empty series
a) By using python lists/range
b) By using dictionary
c) By using scalar value
d) By using mathematical operations
e) By using numpy functions
CREATE EMPTY SERIES DS
Example 1)
import pandas as pd
a=pd.Series()
print(a)
OUTPUT 1)
Series([], dtype: float64)
NOTE: pandas 2.0 and later report dtype: object for an empty series.
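Because the data type of an empty series depends on the pandas version, one option is to state the data type explicitly. This is only a sketch of that idea, not the only way to create an empty series.

```python
import pandas as pd

# Passing dtype explicitly avoids relying on the version-dependent default.
a = pd.Series(dtype="float64")
print(a)
print(len(a))   # an empty series has length 0
```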

CREATE NON-EMPTY SERIES DS

• You can create a series data structure by using the Series() function.
• Default data types: integers are stored as int64, floats as float64, and strings as object.
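The default data types can be checked with the `dtype` attribute. The variable names below are chosen only for this sketch; the string case is left without a fixed expected value because it can vary between pandas versions.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
floats = pd.Series([1.5, 2.5])
words = pd.Series(["a", "b"])

print(ints.dtype)    # int64
print(floats.dtype)  # float64
print(words.dtype)   # object in many pandas versions; newer releases may report a string type
```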

1) By using python lists/ range


Example1 – List:
import pandas as pd
a=pd.Series([1,2,3])
print(a)
Output1 – List:
0    1
1    2
2    3
dtype: int64

Example2 – range():

import pandas as pd
a=pd.Series(range(7,11))
print(a)
Example 3:
import pandas as pd
a=pd.Series(range(2,10,2),index=['a',1,'b',2])
print(a)
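With a mixed index like the one in Example 3, values are looked up by label. The short sketch below uses `.loc` to make that explicit; note that the label 2 is not the same as position 2.

```python
import pandas as pd

a = pd.Series(range(2, 10, 2), index=['a', 1, 'b', 2])

by_b = a.loc['b']   # value stored under label 'b'
by_2 = a.loc[2]     # value stored under label 2 (the last entry), not position 2
print(by_b)         # 6
print(by_2)         # 8
```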

Q1) Write a python code to create the series data structure by using range().

Answer:
import pandas as pd
a=pd.Series(range(3,15,4))
print(a)
2) By using Dictionary
Example of a dictionary (key-value pairs):
D={'a': 'apple', 1: 'One', 'b': 'banana', 2: 'Two'}
Example1:
import pandas as pd
d={1:'One',2:'Two',3:'Three'}
s=pd.Series(d)
print(s)
NOTE: the data type of strings is object
Example2:
Consider a given Series, M1. Write a program in Python Pandas to create the series.

Answer:
import pandas as pd
d={'Term1':45,'Term2':65,'Term3':24,'Term4':89}
M1=pd.Series(d)
print(M1)
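When a series is built from a dictionary, the keys become the index and the values become the data. The sketch below also passes an optional `index` list; the label 'Term5' is a made-up key that is not in the dictionary, so it gets NaN.

```python
import pandas as pd

d = {'Term1': 45, 'Term2': 65, 'Term3': 24, 'Term4': 89}

M1 = pd.Series(d)
print(M1.index.tolist())   # ['Term1', 'Term2', 'Term3', 'Term4']

# Asking for a label that is missing from the dictionary gives NaN.
sub = pd.Series(d, index=['Term1', 'Term5'])
print(sub)
```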
3) By using scalar value
The data can be in the form of a single value or a scalar value. If the data
is a scalar and no index is given, the series has a single entry with index 0. If an
index sequence with one or more entries is provided, the scalar value is repeated to
match the length of the index.
Example 1):
import pandas as pd
a=pd.Series(10)
print(a)
Example 2):
import pandas as pd
a=pd.Series(10,index=[1,2,3])
print(a)

Example 3):
import pandas as pd
b=pd.Series('Yet to start',index=[5,10,15])
print(b)
Answer:
import pandas as pd
S1=pd.Series(500,index=[100,101,102,103,104])
print(S1)
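The two scalar cases can be compared side by side. This is a minimal sketch; the names `single` and `repeated` are used only here.

```python
import pandas as pd

single = pd.Series(10)                      # no index: one entry labelled 0
repeated = pd.Series(10, index=[1, 2, 3])   # scalar repeated once per index label

print(single.tolist())           # [10]
print(repeated.tolist())         # [10, 10, 10]
print(repeated.index.tolist())   # [1, 2, 3]
```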

4) By using Mathematical Operations:


The data passed to Series() can be an expression. The expression is evaluated first,
and the resulting values become the data of the series.
Example1:
import numpy as np
import pandas as pd
b=np.arange(5)
d=pd.Series(b*2,index=['a','b','c','a','b'])
print(d)

Example2:
import pandas as pd
a=[1,2,3,4]
b=pd.Series(a*2)
print(b)
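The two examples above behave differently: multiplying a NumPy array by 2 doubles each value, while multiplying a Python list by 2 repeats the list. The sketch below puts both next to each other; the variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

a = [1, 2, 3, 4]

repeated = pd.Series(a * 2)            # list repetition: 8 values
doubled = pd.Series(np.array(a) * 2)   # element-wise multiplication: 4 values

print(repeated.tolist())   # [1, 2, 3, 4, 1, 2, 3, 4]
print(doubled.tolist())    # [2, 4, 6, 8]
```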
HOME WORK
What will be the output of the following Python code?
(Assume the necessary modules are already imported.)
a1=numpy.arange(1,10,3)
a=pandas.Series(a1**2)
print(a)

5) By using numpy functions


We can use NumPy functions such as empty(), zeros(), ones(),
arange(), linspace() and array() to build the data for a series.
NOTE: We have to import numpy as np when using numpy functions.
NOTE: The default integer type of numpy depends on the platform (int32 on Windows with
NumPy versions before 2.0, int64 otherwise); float is float64. A pandas series stores strings as object.
The default data type of empty(), zeros() and ones() is float64.
1) numpy.empty(3) -> 3 uninitialised (arbitrary) values will be printed
2) numpy.zeros(2) -> 0.0 will be printed two times (since the default is float)
3) numpy.ones(7) -> 1.0 will be printed 7 times
4) numpy.arange() works like the range() function in Python but returns an array
5) numpy.array() builds an array from a list. E.g. numpy.array([2,7,1,0])
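The functions listed above can each be passed straight to Series(). The sketch below leaves out empty(), since its values are arbitrary, and adds linspace(start, stop, num), which returns num evenly spaced values from start to stop. All variable names are illustrative.

```python
import numpy as np
import pandas as pd

zeros = pd.Series(np.zeros(2))                  # two 0.0 values
ones = pd.Series(np.ones(3))                    # three 1.0 values
steps = pd.Series(np.arange(1, 8, 2))           # 1, 3, 5, 7
spaced = pd.Series(np.linspace(0, 1, 5))        # 0.0, 0.25, 0.5, 0.75, 1.0
from_list = pd.Series(np.array([2, 7, 1, 0]))   # array built from a list

for s in (zeros, ones, steps, spaced, from_list):
    print(s.tolist())
```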

Example1:
import numpy as np
import pandas as pd
a=pd.Series(np.ones(5))
print(a)
Output1:
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
dtype: float64
Example2:
import pandas as pd
import numpy as np
x=np.arange(1,8,2)
y=pd.Series(x)
print(y)
NOTE: the series takes its data type from the numpy array: int32 on Windows with
NumPy versions before 2.0, int64 otherwise.
Example3:
Consider a given Series, M1. Write a program in Python Pandas to create the series.

Answer:
import pandas as pd
import numpy as np
a=np.array([45,65,24,89])
M1=pd.Series(a, index=['Term1','Term2','Term3','Term4'])
print(M1)
Adding NaN values in a Series Object:
NaN means Not a number, used to represent absence of value or
missing value
Example1:
import numpy as np
import pandas as pd
a=pd.Series([5,np.nan,7])
print(a)
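A series that contains NaN can be inspected with isna() and count(). This is a brief sketch of those two methods; note that the integers 5 and 7 are stored as floats because NaN is a float value.

```python
import numpy as np
import pandas as pd

a = pd.Series([5, np.nan, 7])

print(a.dtype)             # float64: NaN turns the integer data into floats
print(a.isna().tolist())   # [False, True, False]
print(a.count())           # 2 (count() ignores missing values)
```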

Points to be noted about Numpy:


1. We can do all arithmetic operations like +,-,*,/,//,% on numpy arrays.
2. When we are adding two numpy arrays, they must be of the same size
(length), unless one of them is a single value, which is applied to every element.
3. If a and b are two numpy arrays, np.add(a,b), a+b, np.subtract(a,b), a*4,
np.divide(7,b) etc. can be done.
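The operations named in the points above can be tried on two small arrays. The values are chosen only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

total = np.add(a, b)       # same as a + b
diff = np.subtract(b, a)   # same as b - a
scaled = a * 4             # the single value 4 is applied to every element
ratio = np.divide(60, b)   # 60 divided by each element of b

print(total.tolist())    # [11, 22, 33]
print(diff.tolist())     # [9, 18, 27]
print(scaled.tolist())   # [4, 8, 12]
print(ratio.tolist())    # [6.0, 3.0, 2.0]
```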
POINTS TO BE REMEMBERED ABOUT LIST, NUMPY, PANDAS
1. Lists do not support element-wise arithmetic (with lists, + joins and * repeats). But
numpy and pandas (Series and DataFrame) support all arithmetic operations.
2. A numpy array has only the positions 0, 1, 2, ... as its index and these cannot be
changed, but in pandas we can change the index values.
3. In numpy, arithmetic operations between two arrays can be performed only on arrays of
the same length. For e.g., when we are multiplying two numpy arrays a and b, both must
have the same number of values.
4. But in pandas, series of unequal length can also be added. Values are matched by index
label, and a label present in only one series gives NaN.
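The difference between points 3 and 4 can be seen in a short sketch: two series of unequal length are added label by label, while two numpy arrays of lengths 3 and 2 raise a ValueError. The names used are illustrative.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([10, 20])

# Labels 0 and 1 exist in both series; label 2 exists only in s1, so it becomes NaN.
total = s1 + s2
print(total)

# The same addition with plain numpy arrays of lengths 3 and 2 is not allowed.
try:
    np.array([1, 2, 3]) + np.array([10, 20])
    numpy_ok = True
except ValueError:
    numpy_ok = False
print(numpy_ok)   # False
```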

Ex1) Consider a given Series, S1. Write a program in Python Pandas to create the series.

Answer:
import pandas as pd
d={'Eng':34,'Hindi':37,'Maths':30,'Sci':40}
S1=pd.Series(d)
print(S1)
