Professional Documents
Culture Documents
1. Write a Pandas program to create and display a one-dimensional array-like object containing an
array of data
Code:
import pandas as pd
ds = pd.Series([2, 4, 6, 8, 10])
print(ds)
Sample Output:
0 2
1 4
2 6
3 8
4 10
dtype: int64
2. Write a Pandas program to convert a Panda module Series to Python list and it’s type
Code:
import pandas as pd
ds = pd.Series([2, 4, 6, 8, 10])
print("Pandas Series and type")
print(ds)
print(type(ds))
print("Convert Pandas Series to Python list")
print(ds.tolist())
print(type(ds.tolist()))
Sample Output:
Pandas Series and type
0 2
1 4
2 6
3 8
4 10
dtype: int64
<class 'pandas.core.series.Series'>
Convert Pandas Series to Python list
[2, 4, 6, 8, 10]
<class 'list'>
3. Write a Pandas program to add, subtract, multiple and divide two Pandas Series
Sample Series: [2, 4, 6, 8, 10], [1, 3, 5, 7, 9]
Code:
import pandas as pd
ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 9])
ds = ds1 + ds2
print("Add two Series:")
print(ds)
print("Subtract two Series:")
ds = ds1 - ds2
print(ds)
print("Multiply two Series:")
ds = ds1 * ds2
print(ds)
print("Divide Series1 by Series2:")
ds = ds1 / ds2
print(ds)
Sample Output:
Add two Series:
0 3
1 7
2 11
3 15
4 19
dtype: int64
Subtract two Series:
0 1
1 1
2 1
3 1
4 1
dtype: int64
Multiply two Series:
0 2
1 12
2 30
3 56
4 90
dtype: int64
Divide Series1 by Series2:
0 2.000000
1 1.333333
2 1.200000
3 1.142857
4 1.111111
dtype: float64
Code:
import numpy as np
import pandas as pd
np_array = np.array([10, 20, 30, 40, 50])
print("NumPy array:")
print(np_array)
new_series = pd.Series(np_array)
print("Converted Pandas series:")
print(new_series)
Sample Output:
NumPy array:
[10 20 30 40 50]
Converted Pandas series:
0 10
1 20
2 30
3 40
4 50
dtype: int64
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
1. Write a Pandas program to create and display a DataFrame from a specified dictionary data which has
the index labels.
Code:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print(df)
Sample Output:
b 3 Dima no 9.0
d 3 James no NaN
e 2 Emily no 9.0
h 1 Laura no NaN
i 2 Kevin no 8.0
Code:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print("Original rows:")
print(df)
print(df)
Sample Output:
Original rows:
b 3 Dima no 9.0
d 3 James no NaN
e 2 Emily no 9.0
i 2 Kevin no 8.0
b 3 Dima no 9.0
d 3 Suresh no NaN
e 2 Emily no 9.0
h 1 Laura no NaN
i 2 Kevin no 8.0
Code:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print("Original rows:")
print(df)
color = ['Red','Blue','Orange','Red','White','White','Blue','Green','Green','Red']
df['color'] = color
print(df)
Sample Output:
Original rows:
b 3 Dima no 9.0
d 3 James no NaN
e 2 Emily no 9.0
h 1 Laura no NaN
i 2 Kevin no 8.0
Code:
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print(list(df.columns.values))
Sample Output:
--------->>>>>Pandas index:
1. Write a Pandas program to display the default index and set a column as an Index in a given
dataframe.
Test Data:
Code:
import pandas as pd
df = pd.DataFrame({
'school_code': ['s001','s002','s003','s001','s002','s004'],
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
'date_Of_Birth': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
print("Default Index:")
print(df.head(10))
df1 = df.set_index('school_code')
print(df1)
df2 = df.set_index('t_id')
print(df2)
Sample Output:
Default Index:
t_id
Test Data:
Code:
import pandas as pd
print("Create an Int64Index:")
df_i64 = pd.DataFrame({
'school_code': ['s001','s002','s003','s001','s002','s004'],
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
'date_Of_Birth': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
index=[1, 2, 3, 4, 5, 6])
print(df_i64)
print(df_i64.index)
df_f64 = pd.DataFrame({
'school_code': ['s001','s002','s003','s001','s002','s004'],
'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
print(df_f64)
print(df_f64.index)
Sample Output:
Create an Int64Index:
1. Write a Pandas program to convert all the string values to upper, lower cases in a given pandas series.
Also find the length of the string values.
Code:
import pandas as pd
import numpy as np
s = pd.Series(['X', 'Y', 'Z', 'Aaba', 'Baca', np.nan, 'CABA', None, 'bird', 'horse', 'dog'])
print("Original series:")
print(s)
print(s.str.upper())
print(s.str.lower())
print(s.str.len())
Sample Output:
Original series:
0 X
1 Y
2 Z
3 Aaba
4 Baca
5 NaN
6 CABA
7 None
8 bird
9 horse
10 dog
dtype: object
0 X
1 Y
2 Z
3 AABA
4 BACA
5 NaN
6 CABA
7 None
8 BIRD
9 HORSE
10 DOG
dtype: object
0 x
1 y
2 z
3 aaba
4 baca
5 NaN
6 caba
7 None
8 bird
9 horse
10 dog
dtype: object
0 1.0
1 1.0
2 1.0
3 4.0
4 4.0
5 NaN
6 4.0
7 NaN
8 4.0
9 5.0
10 3.0
dtype: float64
2. Write a Pandas program to remove whitespaces, left sided whitespaces and right sided white spaces
of the string values of a given panda series.
Code:
import pandas as pd
color1 = pd.Index([' Green', 'Black ', ' Red ', 'White', ' Pink '])
print("Original series:")
print(color1)
print("\nRemove whitespace")
print(color1.str.strip())
print(color1.str.lstrip())
print(color1.str.rstrip())
Sample Output:
Original series:
Index([' Green', 'Black ', ' Red ', 'White', ' Pink '], dtype='object')
Remove whitespace
import pandas as pd
df = pd.DataFrame({
})
print("Original DataFrame:")
print(df)
print(df)
Sample Output:
Original DataFrame:
4. Write a Pandas program to swap the cases of a specified character column in a given DataFrame.
Code:
import pandas as pd
df = pd.DataFrame({
'date_of_sale': ['12/05/2002','16/02/1999','25/09/1998','12/02/2022','15/09/1997'],
})
print("Original DataFrame:")
print(df)
print(df)
Sample Output:
Original DataFrame:
2 zefsalf .. ZEFSALF
[5 rows x 4 columns]
1. Write a Pandas program to join the two given dataframes along rows and assign all data.
Test Data:
student_data1:
3 S4 Ed Bernal 222
student_data2:
Code:
import pandas as pd
student_data1 = pd.DataFrame({
'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
student_data2 = pd.DataFrame({
'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'],
print("Original DataFrames:")
print(student_data1)
print("-------------------------------------")
print(student_data2)
print(result_data)
Sample Output:
Original DataFrames:
3 S4 Ed Bernal 222
-------------------------------------
3 S4 Ed Bernal 222
2. Write a Pandas program to append a list of dictioneries or series to a existing DataFrame and display
the combined data
Test Data:
student_id name marks
3 S4 Ed Bernal 222
Dictionary:
student_id S6
marks 205
dtype: object
Code:
import pandas as pd
student_data1 = pd.DataFrame({
'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
print("Original DataFrames:")
print(student_data1)
print("\nDictionary:")
print(s6)
print("\nCombined Data:")
print(combined_data)
Sample Output:
Original DataFrames:
3 S4 Ed Bernal 222
Dictionary:
student_id S6
marks 205
dtype: object
Combined Data:
3 S4 Ed Bernal 222
3. Write a Pandas program to join the two dataframes with matching records from both sides where
available.
Test Data:
student_data1:
3 S4 Ed Bernal 222
student_data2:
Code:
import pandas as pd
student_data1 = pd.DataFrame({
'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
'marks': [200, 210, 190, 222, 199]})
student_data2 = pd.DataFrame({
'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'],
print("Original DataFrames:")
print(student_data1)
print(student_data2)
print(merged_data)
Sample Output:
Original DataFrames:
3 S4 Ed Bernal 222
e) Current date.
Code:
import datetime
print(datetime(2012, 1, 15))
print(datetime.now())
print(datetime.date(datetime(2012, 5, 22)))
print("\nCurrent date:")
print(datetime.now().date())
print(datetime.now().time())
Sample Output:
2012-01-15 00:00:00
2011-01-15 21:20:00
2020-08-17 09:56:17.459790
2012-05-22
Current date:
2020-08-17
Time from a datetime:
18:12:00
09:56:17.461250
2. Write a Pandas program to create a date from a given year, month, day and another date from a given
string formats.
Code:
print(date1)
print(date2)
Sample Output:
2020-12-25 00:00:00
2021-01-01 00:00:00
3. Write a Pandas program to create a time-series with two index labels and random values. Also print
the type of the index.
Code:
import pandas as pd
import numpy as np
import datetime
print(time_series)
print(type(time_series.index))
Sample Output:
2011-09-01 -0.257567
2011-09-02 0.947341
dtype: float64
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
Consider dataset:
school class name date_Of_Birth age height weight address
1. Write a Pandas program to split the following dataframe into groups based on school code. Also check
the type of GroupBy object.
Code:
import pandas as pd
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
student_data = pd.DataFrame({
'school_code': ['s001','s002','s003','s001','s002','s004'],
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
print("Original DataFrame:")
print(student_data)
print('\nSplit the said data on school_code wise:')
result = student_data.groupby(['school_code'])
print("\nGroup:")
print(name)
print(group)
print(type(result))
Sample Output:
Original DataFrame:
[6 rows x 8 columns]
Group:
s001
Group:
s002
[2 rows x 8 columns]
Group:
s003
[1 rows x 8 columns]
Group:
s004
[1 rows x 8 columns]
<class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
2. Write a Pandas program to split the following dataframe by school code and get mean, min, and max
value of age for each school.
Code:
import pandas as pd
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
student_data = pd.DataFrame({
'school_code': ['s001','s002','s003','s001','s002','s004'],
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
print("Original DataFrame:")
print(student_data)
print('\nMean, min, and max value of age for each value of the school:')
print(grouped_single)
Sample Output:
Original DataFrame:
[6 rows x 8 columns]
Mean, min, and max value of age for each value of the school:
age
school_code
s001 12.5 12 13
s002 13.0 12 14
s003 13.0 13 13
s004 12.0 12 12
----------->>>>>Pandas styling
1. Create a dataframe of ten rows, four columns with random values. Write a Pandas program to
highlight the negative numbers red and positive numbers black
Code:
import pandas as pd
import numpy as np
np.random.seed(24)
print("Original array:")
print(df)
def color_negative_red(val):
df.style.applymap(color_negative_red)
Original array:
A B C D E
Sample Output:
2. Create a dataframe of ten rows, four columns with random values. Write a
Pandas program to highlight the maximum value in each column.
Code:
import pandas as pd
import numpy as np
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4),
columns=list('BCDE'))],
axis=1)
df.iloc[0, 2] = np.nan
df.iloc[3, 3] = np.nan
df.iloc[4, 1] = np.nan
df.iloc[9, 4] = np.nan
print("Original array:")
print(df)
def highlight_max(s):
'''
highlight the maximum in a Series green.
'''
is_max = s == s.max()
return ['background-color: green' if v else '' for v in is_max]
Sample Output:
3. Create a dataframe of ten rows, four columns with random values. Write a
Pandas program to highlight dataframe's specific columns.
Code:
import pandas as pd
import numpy as np
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4),
columns=list('BCDE'))],
axis=1)
df.iloc[0, 2] = np.nan
df.iloc[3, 3] = np.nan
df.iloc[4, 1] = np.nan
df.iloc[9, 4] = np.nan
print("Original array:")
print(df)
def highlight_cols(s):
color = 'grey'
return 'background-color: %s' % color
print("\nHighlight specific columns:")
df.style.applymap(highlight_cols, subset=pd.IndexSlice[:, ['B', 'C']])
Original array:
Original array:
A B C D E
Excel Data:
coalpublic2013.xlsx:
Code:
import pandas as pd
import numpy as np
df = pd.read_excel('E:\coalpublic2013.xlsx')
print(df.head)
Sample Output:
Year MSHA ID Mine_Name Production Labor_Hours
0 2013 103381 Highwall Miner 56004 22392
1 2013 103404 Reid School Mine 28807 28447
2 2013 100759 Underground Min 1440115 474784
3 2013 103246 Bear Creek 87587 29193
2. Write a Pandas program to find the sum, mean, max, min value of
'Production (short tons)' column of coalpublic2013.xlsx file
Excel Data:
coalpublic2013.xlsx:
Code:
import pandas as pd
import numpy as np
df = pd.read_excel('E:\coalpublic2013.xlsx')
print("Sum: ",df["Production"].sum())
print("Mean: ",df["Production"].mean())
print("Maximum: ",df["Production"].max())
print("Minimum: ",df["Production"].min())
Sample Output:
Sum: 1611713
Mean: 402928.25
Maximum: 14,40,115
Minimum: 28,807