Pandas is a Python package providing fast, flexible, and expressive data structures designed to
make working with 'relational' or 'labeled' data both easy and intuitive. It aims to be the
fundamental high-level building block for doing practical, real-world data analysis in Python.
import pandas as pd
Now to the basic components of pandas.
DataFrames and Series are quite similar in that many operations that you can do
with one you can do with the other, such as filling in null values and calculating
the mean.
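As a quick illustration of the shared operations mentioned above, the sketch below applies fillna() and mean() to both a Series and a DataFrame. The sample values are made up for the example.

```python
import numpy as np
import pandas as pd

# Made-up sample data: a Series and a DataFrame, each with a missing value (NaN)
s = pd.Series([1.0, np.nan, 3.0])
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# fillna() replaces missing values in both structures the same way
s_filled = s.fillna(0)
df_filled = df.fillna(0)

# mean() skips NaN by default: a Series gives one number,
# a DataFrame gives one mean per column (returned as a Series)
s_mean = s.mean()
df_means = df.mean()

print(s_filled)
print(df_filled)
print(s_mean)    # 2.0
print(df_means)  # a: 2.0, b: 4.5
```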
import pandas as pd
import numpy as np
Pandas version:
import pandas as pd
print(pd.__version__)
Key and Imports
Create Series:
import pandas as pd
s = pd.Series([2, 4, 6, 8, 10])
print(s)
Sample Output:
0 2
1 4
2 6
3 8
4 10
dtype: int64
Create DataFrame:
import pandas as pd
df = pd.DataFrame({'X': [78, 85, 96, 80, 86],
                   'Y': [84, 94, 89, 83, 86],
                   'Z': [86, 97, 96, 72, 83]})
print(df)
Sample Output:
X Y Z
0 78 84 86
1 85 94 97
2 96 89 96
3 80 83 72
4 86 86 83
# Example: create a series from an array with a specified index
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d','e','f'])
s = pd.Series(data, index=[1000,1001,1002,1003,1004,1005])
print(s)
output:
1000 a
1001 b
1002 c
1003 d
1004 e
1005 f
dtype: object
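# Example: create a series from a dict with a specified index
import pandas as pd
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b','c','d','a'])
print(s)
output: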
b 1.0
c 2.0
d NaN
a 0.0
dtype: float64
import pandas as pd
import numpy as np
s = pd.Series(7, index=[0, 1, 2, 3])
print(s)
output:
0 7
1 7
2 7
3 7
dtype: int64
# create a series
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d','e','f'])
s = pd.Series(data)
# retrieve the first element
print(s[0])
output:
a
Access or Retrieve the first three elements in
the Series:
# create a series
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d','e','f'])
s = pd.Series(data)
# retrieve first three elements
print(s[:3])
output:
0 a
1 b
2 c
dtype: object
# create a series
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d','e','f'])
s = pd.Series(data)
# retrieve the last three elements
print(s[-3:])
output:
3 d
4 e
5 f
dtype: object
# create a series
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d','e','f'])
s=pd.Series(data,index=[100,101,102,103,104,105])
print(s[102])
output:
c
print(s[[102,103,104]])
output:
102 c
103 d
104 e
dtype: object
http://www.datasciencemadesimple.com/access-elements-series-python-pandas/
Series is a one-dimensional labeled array capable of holding data of any type (integer,
string, float, Python objects, etc.). The axis labels are collectively called the index.
pandas.Series
A pandas Series can be created using the following constructor −
pandas.Series(data, index, dtype, copy)
The parameters of the constructor are as follows −
data: data takes various forms like ndarray, list, constants.
index: Index values must be hashable and the same length as data; they do not have to be unique. Default np.arange(n) if no index is passed.
dtype: dtype is for data type. If None, the data type will be inferred.
copy: Copy input data. Default False.
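The sketch below passes data, index and dtype explicitly; the sample values and labels are made up for illustration. It also shows that an index whose length differs from the data is rejected.

```python
import numpy as np
import pandas as pd

data = np.array([1, 2, 3])

# index: one hashable label per element
# dtype: force the element type instead of letting pandas infer it
s = pd.Series(data, index=["x", "y", "z"], dtype="float64")
print(s)

# An index whose length differs from the data raises ValueError
length_mismatch_raised = False
try:
    pd.Series(data, index=["x", "y"])
except ValueError as err:
    length_mismatch_raised = True
    print("ValueError:", err)
```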
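A Series can be created from various inputs: an array, a dict, or a scalar value (constant).
Create an Empty Series
The most basic Series that can be created is an empty Series.
Example
import pandas as pd
# dtype is passed explicitly because the default dtype of an
# empty Series differs between pandas versions
s = pd.Series(dtype='float64')
print(s)
Its output is as follows −
Series([], dtype: float64)
Create a Series from an Array
Example 1
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)
Its output is as follows −
0 a
1 b
2 c
3 d
dtype: object
We did not pass any index, so by default it assigned the indexes ranging from 0
to len(data)-1, i.e., 0 to 3.
Example 2
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data, index=[100,101,102,103])
print(s)
Its output is as follows −
100 a
101 b
102 c
103 d
dtype: object
We passed the index values here, so the output shows the customized index labels.
Create a Series from a Dict
Example 1
import pandas as pd
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data)
print(s)
Its output is as follows −
a 0.0
b 1.0
c 2.0
dtype: float64
Observe − Dictionary keys are used to construct the index.
Example 2
import pandas as pd
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b','c','d','a'])
print(s)
Its output is as follows −
b 1.0
c 2.0
d NaN
a 0.0
dtype: float64
Observe − The order of the passed index is kept, and the missing label 'd' is filled with NaN (Not a
Number).
Create a Series from a Scalar Value
If data is a scalar value and an index is passed, the value is repeated once for each index label.
Example
import pandas as pd
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
Its output is as follows −
0 5
1 5
2 5
3 5
dtype: int64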
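Accessing Data from a Series by Position
Example 1
Retrieve the first element. Counting starts from zero, so the first element is stored at
position 0, the second at position 1, and so on.
import pandas as pd
s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
# retrieve the first element by position
print(s.iloc[0])
Its output is as follows −
1
Example 2
Retrieve the first three elements in the Series. A slice [:n] extracts the items before
position n, a slice [n:] extracts the items from position n onwards, and a slice [start:stop]
extracts the items between the two positions, not including the stop position.
import pandas as pd
s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
# retrieve the first three elements
print(s.iloc[:3])
Its output is as follows −
a 1
b 2
c 3
dtype: int64
Example 3
Retrieve the last three elements.
import pandas as pd
s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
# retrieve the last three elements
print(s.iloc[-3:])
Its output is as follows −
c 3
d 4
e 5
dtype: int64
Retrieving Data Using a Label (Index)
Example 1
Retrieve a single element using its index label.
import pandas as pd
s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s['a'])
Its output is as follows −
1
Example 2
Retrieve multiple elements using a list of index labels.
import pandas as pd
s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s[['a','c','d']])
Its output is as follows −
a 1
c 3
d 4
dtype: int64
Example 3
If a label is not contained in the index, an exception is raised.
import pandas as pd
s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s['f'])
Its output is as follows −
…
KeyError: 'f'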
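To avoid the KeyError above, you can test whether a label is present or ask for a default value. The sketch below uses the standard `in` operator, `Series.get()`, and the explicit `.loc` (label) and `.iloc` (position) accessors on the same example Series.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# `in` checks the index labels, not the values
has_f = "f" in s
print(has_f)            # False
print(3 in s)           # False, even though 3 is one of the values

# get() returns the given default instead of raising KeyError
value_f = s.get("f", 0)
print(value_f)          # 0

# .loc selects by label, .iloc by position; both reach the element 3 here
print(s.loc["c"], s.iloc[2])  # 3 3
```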
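Python Programs
import pandas as pd
import numpy as np
# Create a series from a list (default index 0 to n-1)
S1=pd.Series([101,102,103,104,105])
print(S1)
>>>
0 101
1 102
2 103
3 104
4 105
dtype: int64
# Create a series from a list with a user-defined index
S1=pd.Series([101,102,103,104,105],index=['A1','B1','C1','D1','E1'])
print(S1)
>>>
A1 101
B1 102
C1 103
D1 104
E1 105
dtype: int64
# Create a series using the range() function
S2=pd.Series(range(10,21))
print(S2)
>>>
0 10
1 11
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19
10 20
dtype: int64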
#4. Create a series using the range() function and change its data type
S2=pd.Series(range(10),dtype='float32')
print(S2)
>>>
0 0.0
1 1.0
2 2.0
3 3.0
4 4.0
5 5.0
6 6.0
7 7.0
8 8.0
9 9.0
dtype: float32
S3=pd.Series([20,np.nan,np.nan,45,67,89,54,45,23],index=['Anil',
'BN','BM','Ankit','Ram','Vishal','Ankita','Lokesh','Venkat'])
print(S3)
print(S3.index)
print(S3.values)
print(S3.dtype)
print(S3.shape)
print(S3.nbytes)
print(S3.ndim)
# Series.itemsize was removed in pandas 1.0; read it from the NumPy array instead
print(S3.values.itemsize)
print(S3.size)
print(S3.hasnans)
>>>
Anil 20.0
BN NaN
BM NaN
Ankit 45.0
Ram 67.0
Vishal 89.0
Ankita 54.0
Lokesh 45.0
Venkat 23.0
dtype: float64
Index(['Anil', 'BN', 'BM', 'Ankit', 'Ram', 'Vishal', 'Ankita', 'Lokesh', 'Venkat'], dtype='object')
[20. nan nan 45. 67. 89. 54. 45. 23.]
float64
(9,)
72
1
8
9
True
print(S3)
Anil 20.0
BN NaN
BM NaN
Ankit 45.0
Ram 67.0
Vishal 89.0
Ankita 54.0
Lokesh 45.0
Venkat 23.0
dtype: float64
# S3 has string labels, so access by position with iloc
print(S3.iloc[6])
>>>
54.0
print(S3[:2])
>>>
Anil 20.0
BN NaN
dtype: float64
print(S3[1:4])
>>>
BN NaN
BM NaN
Ankit 45.0
dtype: float64
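The slices above use positions. With a labelled Series like S3 you can also slice by label with .loc; note that a label slice includes its end label, whereas a position slice excludes its end position. A shortened copy of S3 keeps this sketch self-contained.

```python
import numpy as np
import pandas as pd

# A shortened copy of S3 from above
S3 = pd.Series([20, np.nan, np.nan, 45], index=["Anil", "BN", "BM", "Ankit"])

# iloc slices by position: the stop position (3) is excluded
by_position = S3.iloc[1:3]
# loc slices by label: the stop label ("Ankit") is included
by_label = S3.loc["BN":"Ankit"]

print(by_position)
print(by_label)
```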
dayno=[1,2,3,4,5,6,7]
dayname=["Monday","Tuesday","Wednesday","Thursday","Friday",
"Saturday","Sunday"]
ser_week=pd.Series(dayname,index=dayno)
print(ser_week)
>>>
1 Monday
2 Tuesday
3 Wednesday
4 Thursday
5 Friday
6 Saturday
7 Sunday
dtype: object
#8. Creating a series with integer, NaN and float values
#Look at the change of data type of the Series
import numpy as np
S1=pd.Series([101,102,103,104,np.nan,90.7])
print(S1)
>>>
0 101.0
1 102.0
2 103.0
3 104.0
4 NaN
5 90.7
dtype: float64
D1={'1':'Monday','2':'Tuesday','3':'Wednesday','4':'Thursday',
'5':'Friday','6':'Saturday','7':'Sunday'}
print(D1)
S5=pd.Series(D1)
print(S5)
>>>
{'1': 'Monday', '2': 'Tuesday', '3': 'Wednesday', '4': 'Thursday',
'5': 'Friday', '6': 'Saturday', '7': 'Sunday'}
1 Monday
2 Tuesday
3 Wednesday
4 Thursday
5 Friday
6 Saturday
7 Sunday
dtype: object
#10.Creating Series using a scalar/constant value
S9=pd.Series(90.7,index=['a','b','c','d','e','f','g'])
print(S9)
>>>
a 90.7
b 90.7
c 90.7
d 90.7
e 90.7
f 90.7
g 90.7
dtype: float64
S7=pd.Series(90)
print(S7)
>>>
0 90
dtype: int64
S8=pd.Series(90,index=[1])
print(S8)
>>>
1 90
dtype: int64
>>>
0 95
1 95
2 95
3 95
4 95
dtype: int64
S8=pd.Series([1,2,3,4,5,6,7],index=['a','b','c','d','e','f','g'])
print(S8.iloc[1:5])
>>>
b 2
c 3
d 4
e 5
dtype: int64
>>>
b 2
c 3
d 4
e 5
dtype: int64
dayno=[91,92,93,94,95,96,97]
dayname=["Monday","Tuesday","Wednesday","Thursday","Friday",
"Saturday","Sunday"]
ser_week=pd.Series(dayname,index=dayno)
print(ser_week)
>>>
91 Monday
92 Tuesday
93 Wednesday
94 Thursday
95 Friday
96 Saturday
97 Sunday
dtype: object
pos=[0,2,5]
print(ser_week.take(pos))
>>>
91 Monday
93 Wednesday
96 Saturday
dtype: object
print(ser_week[91])
>>>
Monday
ss1=pd.Series([1,2,3,4,5],index=[11,12,13,14,15])
ss2=pd.Series(['a','b','c','d','e'])
# Series.append() was removed in pandas 2.0; pd.concat() joins the series end to end
print(pd.concat([ss1, ss2]))
>>>
11 1
12 2
13 3
14 4
15 5
0 a
1 b
2 c
3 d
4 e
dtype: object
#The original index labels of both series are kept (they are not renumbered)
print(ss1)
>>>
11 1
12 2
13 3
14 4
15 5
dtype: int64
print(ss2)
>>>
0 a
1 b
2 c
3 d
4 e
dtype: object
ss3=pd.concat([ss1, ss2])
print(ss3)
11 1
12 2
13 3
14 4
15 5
0 a
1 b
2 c
3 d
4 e
dtype: object
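Series.append() was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() is the replacement. The sketch below also shows the ignore_index option, which renumbers the combined index instead of keeping the original labels.

```python
import pandas as pd

ss1 = pd.Series([1, 2, 3, 4, 5], index=[11, 12, 13, 14, 15])
ss2 = pd.Series(["a", "b", "c", "d", "e"])

# Keep the original labels of both Series
ss3 = pd.concat([ss1, ss2])
print(ss3.index.tolist())   # [11, 12, 13, 14, 15, 0, 1, 2, 3, 4]

# Discard the original labels and renumber from 0
ss4 = pd.concat([ss1, ss2], ignore_index=True)
print(ss4.index.tolist())   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because ss1 holds integers and ss2 holds strings, the combined Series has dtype object.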
head() with no arguments returns the first five elements of a Series.
tail() with no arguments returns the last five elements of a Series.
import pandas as pd
S8=pd.Series([1,2,3,4,5,6,7],index=['a','b','c','d','e','f','g'])
print("The Series is")
print(S8)
The Series is
a 1
b 2
c 3
d 4
e 5
f 6
g 7
dtype: int64
print(S8.tail())
Tail function output
c 3
d 4
e 5
f 6
g 7
dtype: int64
print(S8.head(7))
a 1
b 2
c 3
d 4
e 5
f 6
g 7
dtype: int64
print(S8.tail(6))
b 2
c 3
d 4
e 5
f 6
g 7
dtype: int64
print(S8.head(-4))
a 1
b 2
c 3
dtype: int64
# Syntax:
# import pandas as pd
# <series Object>=pd.Series(index=None,data=<expression [function]>)
import pandas as pd
import numpy as np
s1=np.arange(10,15)
print(s1)
[10 11 12 13 14]
sobj=pd.Series(index=s1,data=s1**2)
print(sobj)
10 100
11 121
12 144
13 169
14 196
dtype: int32
import pandas as pd
s1=pd.Series([11,12,13,14],index=[1,2,3,4])
print("Series s1")
print(s1)
Series s1
1 11
2 12
3 13
4 14
dtype: int64
s2=pd.Series([21,22,23,24],index=[1,2,3,4])
print("Series s2")
print(s2)
Series s2
1 21
2 22
3 23
4 24
dtype: int64
s3=pd.Series([21,22,23,24],index=[101,102,103,104])
print("Series s3=")
print(s3)
Series s3=
101 21
102 22
103 23
104 24
dtype: int64
print(s1+s2)
1 32
2 34
3 36
4 38
dtype: int64
print(s1*s2)
1 231
2 264
3 299
4 336
dtype: int64
print(s1/s2)
1 0.523810
2 0.545455
3 0.565217
4 0.583333
dtype: float64
print(s1+s3)
1 NaN
2 NaN
3 NaN
4 NaN
101 NaN
102 NaN
103 NaN
104 NaN
dtype: float64
print(s1+2)
1 13
2 14
3 15
4 16
dtype: int64
print(s2*3)
1 63
2 66
3 69
4 72
dtype: int64
print(s3**2)
101 441
102 484
103 529
104 576
dtype: int64
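As the s1+s3 result shows, arithmetic between two Series aligns on index labels, and a label present in only one Series produces NaN. If you would rather treat a missing label as 0, the method form Series.add() accepts a fill_value argument, as in this sketch.

```python
import pandas as pd

s1 = pd.Series([11, 12, 13, 14], index=[1, 2, 3, 4])
s3 = pd.Series([21, 22, 23, 24], index=[101, 102, 103, 104])

# "+" aligns on labels: no label is shared, so every result is NaN
plain = s1 + s3

# add() with fill_value=0 substitutes 0 for the missing side of each label
filled = s1.add(s3, fill_value=0)
print(filled)
```

The same fill_value argument exists on sub(), mul() and div().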
import pandas as pd
s=pd.Series([1.0000,1.414214,1.73205,2.000000])
print(s)
0 1.000000
1 1.414214
2 1.732050
3 2.000000
dtype: float64
print (s[s<2])
0 1.000000
1 1.414214
2 1.732050
dtype: float64
print (s[s>=2])
3 2.0
dtype: float64
print(s)
0 1.000000
1 1.414214
2 1.732050
3 2.000000
dtype: float64
print(s.drop(3))
0 1.000000
1 1.414214
2 1.732050
dtype: float64
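Several conditions can be combined into one boolean filter. The sketch below, on the same Series, keeps values strictly between 1 and 2 and confirms that drop() returns a new Series rather than modifying the original.

```python
import pandas as pd

s = pd.Series([1.0000, 1.414214, 1.73205, 2.000000])

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
mask = (s > 1) & (s < 2)
print(s[mask])

# drop() returns a new Series and leaves s unchanged
dropped = s.drop(3)
print(len(dropped), len(s))   # 3 4
```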
Pandas
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion
in rows and columns.
Features of DataFrame
Structure
Let us assume that we are creating a DataFrame with students' data.
You can think of it as an SQL table or a spreadsheet data representation.
pandas.DataFrame
A pandas DataFrame can be created using the following constructor −
pandas.DataFrame(data, index, columns, dtype, copy)
The parameters of the constructor are as follows −
data: data takes various forms like ndarray, Series, map, lists, dict, constants and also another DataFrame.
index: Labels for the rows of the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
columns: Labels for the columns. Optional; defaults to np.arange(n) if no column labels are passed.
dtype: Data type to force on the data. If None, the data type of each column is inferred.
copy: Whether to copy data from the inputs. Older pandas versions default to False; recent versions default to None, which copies dict input but not ndarray or DataFrame input.
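A minimal sketch passing index, columns and dtype explicitly; the array values and labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up 3x2 array of integers
data = np.array([[1, 2], [3, 4], [5, 6]])

df = pd.DataFrame(
    data,
    index=["r1", "r2", "r3"],   # row labels
    columns=["A", "B"],         # column labels
    dtype="float64",            # force every column to float64
)
print(df)
print(df.dtypes)
```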
Create DataFrame
A pandas DataFrame can be created using various inputs like −
Lists
dict
Series
Numpy ndarrays
Another DataFrame
In the subsequent sections of this chapter, we will see how to create a DataFrame using
these inputs.
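Example
import pandas as pd
# calling the constructor with no data creates an empty DataFrame
df = pd.DataFrame()
print(df)
Its output is as follows −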
Empty DataFrame
Columns: []
Index: []
Example 1
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
Its output is as follows −
0
0 1
1 2
2 3
3 4
4 5
Example 2
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data, columns=['Name','Age'])
print(df)
Its output is as follows −
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
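Example 3
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data, columns=['Name','Age'])
# convert only the Age column to floating point
df = df.astype({'Age': float})
print(df)
Its output is as follows −
Name Age
0 Alex 10.0
1 Bob 12.0
2 Clarke 13.0
Note − Observe, the Age column is now floating point. Older pandas versions also accepted
dtype=float in the constructor for this data and left the Name column as text; recent versions
may raise an error instead, because 'Alex' cannot be converted to float, so astype() is applied
to the Age column only.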
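Examples
>>> import numpy as np
>>> import pandas as pd
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
col1 col2
0 1 3
1 2 4
>>> df.dtypes
col1 int64
col2 int64
dtype: object
>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1 int8
col2 int8
dtype: object
>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
...                    columns=['a', 'b', 'c'])
>>> df2
a b c
0 1 2 3
1 4 5 6
2 7 8 9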
Example 1
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
Its output is as follows −
Name Age
0 Tom 28
1 Jack 34
2 Steve 29
3 Ricky 42
Note − Observe the values 0,1,2,3. They are the default index assigned to each row using the
function range(n).
Example 2
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
Its output is as follows −
Name Age
rank1 Tom 28
rank2 Jack 34
rank3 Steve 29
rank4 Ricky 42
Note − Observe, the index parameter assigns an index to each row.
Example 1
The following example shows how to create a DataFrame by passing a list of dictionaries.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
Its output is as follows −
a b c
0 1 2 NaN
1 5 10 20.0
Note − Observe, NaN (Not a Number) is filled in wherever a dictionary has no value for a column.
Example 2
The following example shows how to create a DataFrame by passing a list of dictionaries
and the row indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
Its output is as follows −
a b c
first 1 2 NaN
second 5 10 20.0
Example 3
The following example shows how to create a DataFrame with a list of dictionaries, row
indices, and column indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
#With two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
#With two column indices, one of which is not a dictionary key
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)
Its output is as follows −
#df1 output
a b
first 1 2
second 5 10
#df2 output
a b1
first 1 NaN
second 5 NaN
Note − Observe, df2 is created with a column index ('b1') that is not a dictionary key, so that
column is filled with NaN. df1 is created with column indices that match the dictionary keys, so
no NaN values appear; the key 'c' is simply left out.
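Example
A DataFrame can also be created from a dict of Series. The resulting index is the union of all
the Series indexes passed.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
Its output is as follows −
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
Note − Observe, the series one has no label 'd', so in the result the 'd' row of column one is
filled with NaN.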
Let us now understand column selection, addition, and deletion through examples.
Column Selection
We will understand this by selecting a column from the DataFrame.
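Example
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])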
Its output is as follows −
a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64
Column Addition
We will understand this by adding a new column to an existing data frame.
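Example
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# add a new column by assigning a Series; rows are matched by index label
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)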
Its output is as follows −
Adding a new column by passing as Series:
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
Column Deletion
Columns can be deleted or popped; let us take an example to understand how.
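Example
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'three' : pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)
Its output is as follows −
Our dataframe is:
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN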
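The deletion step itself can be sketched with the standard `del` statement and DataFrame.pop() on the same three-column DataFrame. Both remove the column in place; pop() additionally returns the removed column.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

# del removes the column from df in place and returns nothing
del df['one']

# pop() also removes the column in place, but returns it as a Series
two = df.pop('two')

print(df.columns.tolist())   # ['three']
print(two.tolist())          # [1, 2, 3, 4]
```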
Selection by Label
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
Its output is as follows −
one 2.0
two 2.0
Name: b, dtype: float64
The result is a series with labels as column names of the DataFrame. And, the Name of
the series is the label with which it is retrieved.
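Selection by Integer Location
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.iloc[2])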
Its output is as follows −
one 3.0
two 3.0
Name: c, dtype: float64
Slice Rows
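import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])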
Its output is as follows −
one two
c 3.0 3
d NaN 4
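loc and iloc also accept a column selector after the row selector. This sketch on the same DataFrame shows the one difference to keep in mind: a label slice with loc includes its end label, while a position slice with iloc excludes its end position.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# rows 'b' through 'c' (end label included), column 'two'
by_label = df.loc['b':'c', 'two']
# rows at positions 0 and 1 (end position 2 excluded), column at position 1 ('two')
by_position = df.iloc[0:2, 1]

print(by_label.tolist())      # [2, 3]
print(by_position.tolist())   # [1, 2]
```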
Addition of Rows
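Add new rows to a DataFrame with pd.concat(), which appends the rows at the end. (Older pandas
versions also provided DataFrame.append() for this; it was removed in pandas 2.0.)
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
print(df)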
Its output is as follows −
a b
0 1 2
1 3 4
0 5 6
1 7 8
Deletion of Rows
Use an index label to delete or drop rows from a DataFrame. If the label is duplicated, then
multiple rows will be dropped.
If you observe, in the above example, the labels are duplicated. Let us drop a label and see
how many rows get dropped.
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
# drop every row whose index label is 0
df = df.drop(0)
print(df)
Its output is as follows −
a b
1 3 4
1 7 8
In the above example, two rows were dropped because both contain the same label 0.
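If duplicate labels are not wanted, pass ignore_index=True to pd.concat() so the rows are renumbered. The sketch below compares how drop(0) behaves in each case, using the same two small DataFrames.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# Keeping the original labels gives 0, 1, 0, 1 ...
combined = pd.concat([df, df2])
# ... while ignore_index=True renumbers the rows 0, 1, 2, 3
renumbered = pd.concat([df, df2], ignore_index=True)

# drop(0) removes every row labelled 0: two rows in combined, one in renumbered
print(len(combined.drop(0)), len(renumbered.drop(0)))   # 2 3
```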
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in
rows and columns. A DataFrame can hold any number of rows and columns, and we can perform
many operations on this data, such as arithmetic operations, row/column selection, and
row/column addition.
A pandas DataFrame can be created in multiple ways. Let's discuss the different ways to create
a DataFrame one by one.
import pandas as pd
# calling the DataFrame constructor with no arguments creates an empty DataFrame
df = pd.DataFrame()
print(df)
Output :
Empty DataFrame
Columns: []
Index: []
Creating a DataFrame using a list:
A DataFrame can be created using a single list or a list of lists.
import pandas as pd
# list of strings (illustrative values)
lst = ['pandas', 'is', 'fun']
df = pd.DataFrame(lst)
print(df)
Output:
0
0 pandas
1 is
2 fun
Creating a DataFrame from a dict of ndarrays/lists:
To create a DataFrame from a dict of ndarrays/lists, all the arrays must be of the same length. If
an index is passed, then the length of the index should equal the length of the arrays. If no index
is passed, then by default the index will be range(n), where n is the array length.
# Python code to demonstrate creating a DataFrame from a dict of lists,
# using the default index
import pandas as pd
# dict of equal-length lists (illustrative values)
data = {'Name': ['Tom', 'Jack', 'Steve'], 'Age': [28, 34, 29]}
# Create DataFrame
df = pd.DataFrame(data)
print(df)
Output:
Name Age
0 Tom 28
1 Jack 34
2 Steve 29
Create a pandas DataFrame from lists using a dictionary:
A pandas DataFrame can be created from lists by first collecting them in a dictionary and then
passing that dictionary to pandas.DataFrame. Each key becomes a column label, and each list
becomes the values of that column.
# importing pandas as pd
import pandas as pd
# dictionary of lists (illustrative values); named data_dict to avoid
# shadowing Python's built-in dict
data_dict = {'name': ['Tom', 'Jack'], 'score': [90, 40]}
df = pd.DataFrame(data_dict)
print(df)
Output:
name score
0 Tom 90
1 Jack 40
Multiple ways of creating a DataFrame:
Create pandas dataframe from lists using zip
Create a Pandas DataFrame from List of Dicts
Create a Pandas Dataframe from a dict of equal length lists
Creating a dataframe using List
Create pandas dataframe from lists using dictionary
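Of the ways listed above, the zip approach is not demonstrated elsewhere in these notes. A minimal sketch, reusing the Name/Age values from the earlier dictionary example as sample data:

```python
import pandas as pd

# Two parallel lists (sample values reused from the Name/Age example)
names = ['Tom', 'Jack', 'Steve', 'Ricky']
ages = [28, 34, 29, 42]

# zip() pairs the lists element by element; each (name, age) tuple becomes one row
df = pd.DataFrame(list(zip(names, ages)), columns=['Name', 'Age'])
print(df)
```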
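import pandas as pd
data = {
'apples': [3, 2, 0, 1],
'oranges': [0, 3, 7, 2]
}
And then pass it to the pandas DataFrame constructor:
purchases = pd.DataFrame(data)
purchases
OUT:
apples oranges
0 3 0
1 2 3
2 0 7
3 1 2
Each (key, value) item in data corresponds to a column in the resulting DataFrame. To label the
rows with names instead of 0 to 3, pass them as the index:
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])
purchases
OUT:
apples oranges
June 3 0
Robert 2 3
Lily 0 7
David 1 2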
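With a custom index, rows can be looked up by name using .loc. A minimal sketch on the same purchases data:

```python
import pandas as pd

data = {'apples': [3, 2, 0, 1], 'oranges': [0, 3, 7, 2]}
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])

# Look up a whole row by its index label
print(purchases.loc['June'])
# Look up a single cell by (row label, column label)
print(purchases.loc['Lily', 'oranges'])   # 7
```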
# Scatter chart using pairs of points
import matplotlib.pyplot as plt
import numpy as np
# 200 random (X, Y) pairs drawn from a standard normal distribution
X = np.random.randn(200)
Y = np.random.randn(200)
plt.scatter(X, Y, color='r')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()