Daily Task 6 & 7 - Explore Merge Function & Perform Data Cleaning - Jupyter Notebook

Daily Task 6 - Explore Merge Function

Example - 1

In [1]:

import pandas as pd

In [11]:

temp = pd.DataFrame({"City": ['Mumbai','Chennai','Nashik','Pune','Delhi','Banglore'],
                     "Temp": [25,23,22,21,20,26]})
temp

Out[11]:

City Temp

0 Mumbai 25

1 Chennai 23

2 Nashik 22

3 Pune 21

4 Delhi 20

5 Banglore 26

In [12]:

humidity = pd.DataFrame({"City": ['Pune','Mumbai','Chennai','Nashik','Delhi','Tamilnadu'],
                         "Humidity": [75,83,85,78,53,69]})
humidity

Out[12]:

City Humidity

0 Pune 75

1 Mumbai 83

2 Chennai 85

3 Nashik 78

4 Delhi 53

5 Tamilnadu 69


In [13]:

weather = pd.merge(temp,humidity)   ## It will merge only for same values in both dataframes
weather                             ## By default how = "inner", i.e. intersection of both dataframes

Out[13]:

City Temp Humidity

0 Mumbai 25 83

1 Chennai 23 85

2 Nashik 22 78

3 Pune 21 75

4 Delhi 20 53
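As a side note (a minimal sketch, not run in this notebook): pd.merge also accepts a validate argument, which raises a MergeError if the join key is duplicated on either side. Assuming it is available in this pandas version, it is a cheap check that an inner merge has not silently multiplied rows.

checked = pd.merge(temp, humidity, on="City", how="inner", validate="one_to_one")
checked   ## same result as the inner merge above, but it would fail loudly on duplicate City values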

In [17]:

weather = pd.merge(temp,humidity,on = "City",how="outer")   ## It will merge both datasets as a union of keys;
                                                            ## unmatched rows get NaN in the missing columns
weather

Out[17]:

City Temp Humidity

0 Mumbai 25.0 83.0

1 Chennai 23.0 85.0

2 Nashik 22.0 78.0

3 Pune 21.0 75.0

4 Delhi 20.0 53.0

5 Banglore 26.0 NaN

6 Tamilnadu NaN 69.0
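A possible follow-up sketch (not part of the original cells): after an outer merge the unmatched rows carry NaN, which can be filled before further analysis.

weather_filled = weather.copy()
weather_filled["Temp"] = weather_filled["Temp"].fillna(weather_filled["Temp"].mean())   ## fill missing Temp with the column mean
weather_filled["Humidity"] = weather_filled["Humidity"].fillna(0)                       ## placeholder value for missing Humidity
weather_filled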

In [21]:

weather = pd.merge(temp,humidity,on = "City",how="left",indicator = True)   ## It will merge keeping every row of the left
                                                                            ## dataframe; indicator adds a _merge column
weather

Out[21]:

City Temp Humidity _merge

0 Mumbai 25 83.0 both

1 Chennai 23 85.0 both

2 Nashik 22 78.0 both

3 Pune 21 75.0 both

4 Delhi 20 53.0 both

5 Banglore 26 NaN left_only
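The _merge indicator column makes it easy to isolate the unmatched rows. A short sketch using the left-merged frame above:

left_only = weather[weather["_merge"] == "left_only"]   ## cities present only in temp (here: Banglore)
matched   = weather[weather["_merge"] == "both"]        ## cities present in both dataframes
left_only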


In [20]:

weather = pd.merge(temp,humidity,on = "City",how="right",indicator = True)   ## It will merge keeping every row of the right
                                                                             ## dataframe; indicator adds a _merge column
weather

Out[20]:

City Temp Humidity _merge

0 Pune 21.0 75 both

1 Mumbai 25.0 83 both

2 Chennai 23.0 85 both

3 Nashik 22.0 78 both

4 Delhi 20.0 53 both

5 Tamilnadu NaN 69 right_only

Example - 2

In [4]:

electronics = pd.DataFrame({"Brands"    :["HP","LG","Panasonic","Sony"],
                            "Devices"   :["Laptop","Washing Machine","TV","Keyborad"],
                            "Department":["Purchase","HR","Quality","Design"]
                           })
electronics

Out[4]:

Brands Devices Department

0 HP Laptop Purchase

1 LG Washing Machine HR

2 Panasonic TV Quality

3 Sony Keyborad Design


In [5]:

electronics_new = pd.DataFrame({"Brands"    :["Intel","LG","Panasonic","Sony","Haier"],
                                "Devices"   :["Computer","Fridge","TV","AC","Oven"],
                                "Department":["Production","HR","Quality","Design","Purchase"]
                               })
electronics_new

Out[5]:

Brands Devices Department

0 Intel Computer Production

1 LG Fridge HR

2 Panasonic TV Quality

3 Sony AC Design

4 Haier Oven Purchase

In [8]:

accessories = pd.merge(electronics,electronics_new, on = "Brands")


accessories

Out[8]:

Brands Devices_x Department_x Devices_y Department_y

0 LG Washing Machine HR Fridge HR

1 Panasonic TV Quality TV Quality

2 Sony Keyborad Design AC Design

In [10]:

accessories = pd.merge(electronics,electronics_new, on = "Brands",suffixes = ('_left', '_right'))
accessories

Out[10]:

Brands Devices_left Department_left Devices_right Department_right

0 LG Washing Machine HR Fridge HR

1 Panasonic TV Quality TV Quality

2 Sony Keyborad Design AC Design
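pd.merge can also join on several key columns at once. A hypothetical extension of the example above (not in the original notebook): passing a list to on requires rows to agree on every listed column.

multi_key = pd.merge(electronics, electronics_new,
                     on=["Brands", "Department"],        ## rows must match on both keys
                     suffixes=("_old", "_new"))
multi_key   ## LG, Panasonic and Sony agree on Brands and Department, so only Devices gets suffixed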

===============================================

Daily Task 7 - Perform Data Cleaning


In [2]:

sales_data_2017 = pd.read_csv(r'E:\Data Science by John\pandas\Sales Transactions-2017.csv')
sales_data_2017

Out[2]:

Date Voucher Party Product Qty Rate Gross
0 1/4/2017 Sal:1 SOLANKI PLASTICS DONA-VAI-9100 2 1,690.00 3,380
1 1/4/2017 Sal:1 SOLANKI PLASTICS LITE FOAM(1200) 6 1,620.00 9,720
2 1/4/2017 Sal:2 SARNESWARA TRADERS VISHNU CHOTA WINE 500 23 11,500
3 1/4/2017 Sal:2 SARNESWARA TRADERS LITE FOAM(1200) 6 1,620.00 9,720
4 1/4/2017 Sal:2 SARNESWARA TRADERS DONA-VAI-9100 5 1,690.00 8,450
... ... ... ... ... ... ...
47285 31/03/2018 Sal:10042 Vkp 10*10 SHEET 25 137 3,425
47286 NaN NaN NaN NaN NaN NaN NaN
47287 NaN NaN NaN NaN NaN NaN NaN
47288 NaN Total NaN NaN 607,734.60 669,300.49 9,953,816
47289 NaN Total NaN NaN 7,593,062.00 8,309,116.00 115,778,725

47290 rows × 9 columns
(the Gross, Disc and Voucher Amount columns are cut off at the right edge of the page)


In [3]:

sales_data_2018 = pd.read_csv(r'E:\Data Science by John\pandas\Sales Transactions-2018.csv')
sales_data_2018

Out[3]:

Date Voucher Party Product Qty Rate Gross
0 1/4/2018 Sal:146 TP13 SILVER POUCH 9*12 50 85 4,250.00
1 1/4/2018 Sal:146 TP13 RUBBER 5 290 1,450.00
2 1/4/2018 Sal:146 TP13 DURGA 10*12 Blue 1,600.00 5.5 8,800.00
3 1/4/2018 Sal:146 TP13 DURGA 13*16 BLUE 400 11 4,400.00
4 1/4/2018 Sal:146 TP13 10*12 SARAS-NAT 600 8.1 4,860.00
... ... ... ... ... ... ...
44735 31/03/2019 Sal:9610 HAMPI FOODS SPOON SOOFY 200 40 8,000.00
44736 NaN NaN NaN NaN NaN NaN NaN
44737 NaN NaN NaN NaN NaN NaN NaN
44738 NaN Total NaN NaN 666,056.00 1,067,808.80 10,796,991.30 29,9
44739 NaN Total NaN NaN 7,097,803.00 10,024,197.00 117,897,671.80 720,2

44740 rows × 9 columns
(the Disc and Voucher Amount columns are cut off at the right edge of the page)


In [4]:

sales_data_2019 = pd.read_csv(r'E:\Data Science by John\pandas\Sales Transactions-2019.csv')
sales_data_2019

Out[4]:

Date Voucher Party Product Qty Rate Gross
0 1/4/2019 Sal:687 BALAJI PLASTICS DONA-VAI-9100 1 1,730.00 1,730.00
1 1/4/2019 Sal:687 BALAJI PLASTICS SMART BOUL(48) 1 1,730.00 1,730.00
2 1/4/2019 Sal:688 BALAJI PLASTICS Vishnu Ice 110 18.5 2,035.00
3 28/3 0 0
4 1/4/2019 Sal:689 BALAJI PLASTICS 100LEAF-SP 3 585 1,755.00
... ... ... ... ... ... ...
19171 10/10/2019 Sal:4935 K.SRIHARI 13*16 WHITE RK 400 16 6,400.00
19172 NaN NaN NaN NaN NaN NaN NaN
19173 NaN NaN NaN NaN NaN NaN NaN
19174 NaN Total NaN NaN 99,284.90 175,381.65 2,203,649.50 20
19175 NaN Total NaN NaN 2,710,193.00 5,519,888.40 53,360,791.40 672

19176 rows × 9 columns
(the Disc and Voucher Amount columns are cut off at the right edge of the page)

In [5]:

import warnings
warnings.filterwarnings(action = 'ignore')


In [6]:

sales_complete_data = sales_data_2017.append([sales_data_2018,sales_data_2019])
sales_complete_data

Out[6]:

Date Voucher Party Product Qty Rate Gross
0 1/4/2017 Sal:1 SOLANKI PLASTICS DONA-VAI-9100 2 1,690.00 3,380.0
1 1/4/2017 Sal:1 SOLANKI PLASTICS LITE FOAM(1200) 6 1,620.00 9,720.0
2 1/4/2017 Sal:2 SARNESWARA TRADERS VISHNU CHOTA WINE 500 23 11,500.0
3 1/4/2017 Sal:2 SARNESWARA TRADERS LITE FOAM(1200) 6 1,620.00 9,720.0
4 1/4/2017 Sal:2 SARNESWARA TRADERS DONA-VAI-9100 5 1,690.00 8,450.0
... ... ... ... ... ... ...
19171 10/10/2019 Sal:4935 K.SRIHARI 13*16 WHITE RK 400 16 6,400.0
19172 NaN NaN NaN NaN NaN NaN NaN
19173 NaN NaN NaN NaN NaN NaN NaN
19174 NaN Total NaN NaN 99,284.90 175,381.65 2,203,649.5
19175 NaN Total NaN NaN 2,710,193.00 5,519,888.40 53,360,791.4

111206 rows × 9 columns
(the Gross, Disc and Voucher Amount columns are cut off at the right edge of the page)

In [7]:

sales_complete_data.shape

Out[7]:

(111206, 9)
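DataFrame.append is deprecated in newer pandas releases; pd.concat produces the same stacked result. A short sketch assuming the same three yearly frames:

sales_complete_data = pd.concat([sales_data_2017, sales_data_2018, sales_data_2019])   ## keeps each file's original index, like append did
sales_complete_data.shape   ## (111206, 9), matching the append result above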


In [8]:

sales_complete_data.head(20)

Out[8]:

Date Voucher Party Product Qty Rate Gross Disc Voucher Amount
0 1/4/2017 Sal:1 SOLANKI PLASTICS DONA-VAI-9100 2 1,690.00 3,380.00 NaN 13,100.00
1 1/4/2017 Sal:1 SOLANKI PLASTICS LITE FOAM(1200) 6 1,620.00 9,720.00 NaN NaN
2 1/4/2017 Sal:2 SARNESWARA TRADERS VISHNU CHOTA WINE 500 23 11,500.00 NaN 30,990.00
3 1/4/2017 Sal:2 SARNESWARA TRADERS LITE FOAM(1200) 6 1,620.00 9,720.00 NaN NaN
4 1/4/2017 Sal:2 SARNESWARA TRADERS DONA-VAI-9100 5 1,690.00 8,450.00 NaN NaN
5 1/4/2017 Sal:2 SARNESWARA TRADERS CLASSIC ENJOY(750) 1 1,320.00 1,320.00 NaN NaN
6 1/4/2017 Sal:898 Lock Vishnu 250ml 100 30 3,000.00 100 5,400.00
7 1/4/2017 Sal:898 Lock BLACK DOG-350ML 100 26 2,600.00 100 NaN
8 khader vali late en
9 try
10 1/4/2017 Sal:2497 VAMSI KRISHNA FANCY Loose Items 1 800 800 NaN 800
11 NaN NaN #NAME? NaN NaN NaN NaN NaN NaN
12 DUMMY ENTRY
13 1/4/2017 Sal:9263 VAMSI KRISHNA FANCY Loose Items 1 280 280 NaN 280
14 NaN NaN #NAME? NaN NaN NaN NaN NaN NaN
15 dummy entry
16 1/4/2017 Sal:9545 Vkp Loose Items 1 695 695 NaN 695
17 dummy entry
18 2/4/2017 Sal:16 KPR LITE FOAM(1200) 1 1,620.00 1,620.00 NaN 1,620.00
19 3/4/2017 Sal:3 BALAJI PLASTICS 90ML RANGEELA 150 14.5 2,175.00 NaN 2,175.00

(rows 8-9, 12, 15 and 17 are malformed source rows; their remaining cells are blank or cut off in the export)

In [9]:

sales_cleaned_data = pd.read_csv(r'E:\Data Science by John\pandas\Sales-Transactions-Edited.csv')
sales_cleaned_data

Out[9]:

Date Voucher Party Product Qty Rate

0 1/4/2017 1 SOLANKI PLASTICS DONA-VAI-9100 2 1690.0

1 1/4/2017 1 SOLANKI PLASTICS LITE FOAM(1200) 6 1620.0

2 1/4/2017 2 SARNESWARA TRADERS VISHNU CHOTA WINE 500 23.0

3 1/4/2017 2 SARNESWARA TRADERS LITE FOAM(1200) 6 1620.0

4 1/4/2017 2 SARNESWARA TRADERS DONA-VAI-9100 5 1690.0

... ... ... ... ... ... ...

95557 12/9/2019 4265 TP13 SPOON MED M.W 20 11.0

95558 12/9/2019 4266 K.SRIHARI SMART BOUL(48) 1 1830.0

95559 12/9/2019 4267 SMS SMARTBOUL GLA(4000) 1 1520.0

95560 12/9/2019 4268 ANILFANCY RR WINEGLASS 100 20.0

95561 12/9/2019 4268 ANILFANCY RR WATER GLASS 100 20.0

95562 rows × 6 columns

In [10]:

sales_cleaned_data.shape

Out[10]:

(95562, 6)

In [11]:

sales_complete_data.dtypes

Out[11]:

Date object

Voucher object

Party object

Product object

Qty object

Rate object

Gross object

Disc object

Voucher Amount object

dtype: object


In [12]:

sales_cleaned_data.dtypes

Out[12]:

Date object

Voucher int64

Party object

Product object

Qty int64

Rate float64

dtype: object

In [15]:

sales_complete_data.info()

<class 'pandas.core.frame.DataFrame'>

Int64Index: 111206 entries, 0 to 19175

Data columns (total 9 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Date 98615 non-null object

1 Voucher 98649 non-null object

2 Party 111166 non-null object

3 Product 98615 non-null object

4 Qty 98649 non-null object

5 Rate 98648 non-null object

6 Gross 98648 non-null object

7 Disc 5597 non-null object

8 Voucher Amount 27560 non-null object

dtypes: object(9)

memory usage: 8.5+ MB

Step 1 - Detecting NaN Values

In [17]:

sales_complete_data.isna().sum()

Out[17]:

Date 12591

Voucher 12557

Party 40

Product 12591

Qty 12557

Rate 12558

Gross 12558

Disc 105609

Voucher Amount 83646

dtype: int64


In [18]:

sales_cleaned_data.isna().sum()

Out[18]:

Date 0

Voucher 0

Party 0

Product 0

Qty 0

Rate 1

dtype: int64
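Raw counts are easier to judge as percentages. A quick sketch on the uncombined frame above: Disc is roughly 95% empty and Voucher Amount roughly 75% empty, which is what justifies dropping those columns in the next step.

missing_pct = (sales_complete_data.isna().mean() * 100).round(2).sort_values(ascending=False)
missing_pct   ## share of missing values per column, largest first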

Step 2 - Remove all NaN values

In [19]:

sales_complete_data.drop(labels=["Gross","Disc","Voucher Amount"],axis = 1, inplace=True)

In [20]:

sales_complete_data

Out[20]:

Date Voucher Party Product Qty Rate
0 1/4/2017 Sal:1 SOLANKI PLASTICS DONA-VAI-9100 2 1,690.00
1 1/4/2017 Sal:1 SOLANKI PLASTICS LITE FOAM(1200) 6 1,620.00
2 1/4/2017 Sal:2 SARNESWARA TRADERS VISHNU CHOTA WINE 500 23
3 1/4/2017 Sal:2 SARNESWARA TRADERS LITE FOAM(1200) 6 1,620.00
4 1/4/2017 Sal:2 SARNESWARA TRADERS DONA-VAI-9100 5 1,690.00
... ... ... ... ... ...
19171 10/10/2019 Sal:4935 K.SRIHARI 13*16 WHITE RK 400 16
19172 NaN NaN NaN NaN NaN NaN
19173 NaN NaN NaN NaN NaN NaN
19174 NaN Total NaN NaN 99,284.90 175,381.65
19175 NaN Total NaN NaN 2,710,193.00 5,519,888.40

111206 rows × 6 columns


In [21]:

sales_complete_data.dropna(inplace = True)
sales_complete_data

Out[21]:

Date Voucher Party Product Qty Rate

0 1/4/2017 Sal:1 SOLANKI PLASTICS DONA-VAI-9100 2 1,690.00

1 1/4/2017 Sal:1 SOLANKI PLASTICS LITE FOAM(1200) 6 1,620.00

2 1/4/2017 Sal:2 SARNESWARA TRADERS VISHNU CHOTA WINE 500 23

3 1/4/2017 Sal:2 SARNESWARA TRADERS LITE FOAM(1200) 6 1,620.00

4 1/4/2017 Sal:2 SARNESWARA TRADERS DONA-VAI-9100 5 1,690.00

... ... ... ... ... ... ...

19167 10/10/2019 Sal:4935 K.SRIHARI 16*20(100-W) 140 26

19168 10/10/2019 Sal:4935 K.SRIHARI 10*12 KRISHNA-BK(10 600 8.4

19169 10/10/2019 Sal:4935 K.SRIHARI 13*16 Bk(100)KRISHN 320 16

19170 10/10/2019 Sal:4935 K.SRIHARI 10*12 RK 800 8.5

19171 10/10/2019 Sal:4935 K.SRIHARI 13*16 WHITE RK 400 16

98614 rows × 6 columns

In [22]:

sales_complete_data.shape

Out[22]:

(98614, 6)
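Dropping every row with any NaN is the bluntest option. Had that been too aggressive, dropna also takes subset and thresh arguments; a hedged sketch of the alternatives, written as they would be applied to the frame before the dropna above:

## keep rows only if the key business fields are present, tolerating a missing Rate
partial_clean = sales_complete_data.dropna(subset=["Date", "Voucher", "Party", "Product"])
## or keep any row that still has at least 4 non-null values out of the 6 columns
thresh_clean = sales_complete_data.dropna(thresh=4)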

In [23]:

print(sales_complete_data["Party"].unique())

['SOLANKI PLASTICS' 'SARNESWARA TRADERS' 'Lock' ... 'markfed -adurupalli'

'8/10 late entry' '10-Jul']


In [24]:

sales_complete_data.info()

<class 'pandas.core.frame.DataFrame'>

Int64Index: 98614 entries, 0 to 19171

Data columns (total 6 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Date 98614 non-null object

1 Voucher 98614 non-null object

2 Party 98614 non-null object

3 Product 98614 non-null object

4 Qty 98614 non-null object

5 Rate 98614 non-null object

dtypes: object(6)

memory usage: 5.3+ MB

In [25]:

sales_complete_data.describe(include = "all" )

Out[25]:

Date Voucher Party Product Qty Rate

count 98614 98614 98614 98614 98614 98614

unique 836 10043 1835 867 512 1075

top TP13 100

freq 3053 3053 13056 3053 12528 3051

In [26]:

sales_complete_data["Party"].value_counts()

Out[26]:

TP13 13056

K.SRIHARI 2537

KPR 2354

SVP-BUCHHI 1620

HAMPI FOODS 1419

...

g.subharao 1

svr brandi 1

VS 1

SK.BABU 1

10-Jul 1

Name: Party, Length: 1835, dtype: int64


In [27]:

sales_complete_data["Voucher"]

Out[27]:

0 Sal:1

1 Sal:1

2 Sal:2

3 Sal:2

4 Sal:2

...

19167 Sal:4935

19168 Sal:4935

19169 Sal:4935

19170 Sal:4935

19171 Sal:4935

Name: Voucher, Length: 98614, dtype: object

In [28]:

sales_complete_data["Voucher"] = sales_complete_data["Voucher"].str.replace("Sal:","")
sales_complete_data

Out[28]:

Date Voucher Party Product Qty Rate

0 1/4/2017 1 SOLANKI PLASTICS DONA-VAI-9100 2 1,690.00

1 1/4/2017 1 SOLANKI PLASTICS LITE FOAM(1200) 6 1,620.00

2 1/4/2017 2 SARNESWARA TRADERS VISHNU CHOTA WINE 500 23

3 1/4/2017 2 SARNESWARA TRADERS LITE FOAM(1200) 6 1,620.00

4 1/4/2017 2 SARNESWARA TRADERS DONA-VAI-9100 5 1,690.00

... ... ... ... ... ... ...

19167 10/10/2019 4935 K.SRIHARI 16*20(100-W) 140 26

19168 10/10/2019 4935 K.SRIHARI 10*12 KRISHNA-BK(10 600 8.4

19169 10/10/2019 4935 K.SRIHARI 13*16 Bk(100)KRISHN 320 16

19170 10/10/2019 4935 K.SRIHARI 10*12 RK 800 8.5

19171 10/10/2019 4935 K.SRIHARI 13*16 WHITE RK 400 16

98614 rows × 6 columns

In [29]:

sales_complete_data.dtypes

Out[29]:

Date object

Voucher object

Party object

Product object

Qty object

Rate object

dtype: object


In [30]:

sales_cleaned_data

Out[30]:

Date Voucher Party Product Qty Rate

0 1/4/2017 1 SOLANKI PLASTICS DONA-VAI-9100 2 1690.0

1 1/4/2017 1 SOLANKI PLASTICS LITE FOAM(1200) 6 1620.0

2 1/4/2017 2 SARNESWARA TRADERS VISHNU CHOTA WINE 500 23.0

3 1/4/2017 2 SARNESWARA TRADERS LITE FOAM(1200) 6 1620.0

4 1/4/2017 2 SARNESWARA TRADERS DONA-VAI-9100 5 1690.0

... ... ... ... ... ... ...

95557 12/9/2019 4265 TP13 SPOON MED M.W 20 11.0

95558 12/9/2019 4266 K.SRIHARI SMART BOUL(48) 1 1830.0

95559 12/9/2019 4267 SMS SMARTBOUL GLA(4000) 1 1520.0

95560 12/9/2019 4268 ANILFANCY RR WINEGLASS 100 20.0

95561 12/9/2019 4268 ANILFANCY RR WATER GLASS 100 20.0

95562 rows × 6 columns

In [31]:

sales_complete_data.groupby(by="Party")
sales_complete_data

Out[31]:

Date Voucher Party Product Qty Rate

0 1/4/2017 1 SOLANKI PLASTICS DONA-VAI-9100 2 1,690.00

1 1/4/2017 1 SOLANKI PLASTICS LITE FOAM(1200) 6 1,620.00

2 1/4/2017 2 SARNESWARA TRADERS VISHNU CHOTA WINE 500 23

3 1/4/2017 2 SARNESWARA TRADERS LITE FOAM(1200) 6 1,620.00

4 1/4/2017 2 SARNESWARA TRADERS DONA-VAI-9100 5 1,690.00

... ... ... ... ... ... ...

19167 10/10/2019 4935 K.SRIHARI 16*20(100-W) 140 26

19168 10/10/2019 4935 K.SRIHARI 10*12 KRISHNA-BK(10 600 8.4

19169 10/10/2019 4935 K.SRIHARI 13*16 Bk(100)KRISHN 320 16

19170 10/10/2019 4935 K.SRIHARI 10*12 RK 800 8.5

19171 10/10/2019 4935 K.SRIHARI 13*16 WHITE RK 400 16

98614 rows × 6 columns
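The groupby call above only builds a GroupBy object; nothing is aggregated, so the displayed frame is unchanged. A sketch of an actual aggregation at this stage (Qty and Rate are still strings, so a row count per Party is the safest choice):

rows_per_party = sales_complete_data.groupby("Party").size().sort_values(ascending=False)
rows_per_party.head()   ## TP13 should top the list, as the value_counts above showed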


In [63]:

sales_complete_data.head()

Out[63]:

Date Voucher Party Product Qty Rate

0 1/4/2017 1 SOLANKI PLASTICS DONA-VAI-9100 2 1,690.00

1 1/4/2017 1 SOLANKI PLASTICS LITE FOAM(1200) 6 1,620.00

2 1/4/2017 2 SARNESWARA TRADERS VISHNU CHOTA WINE 500 23

3 1/4/2017 2 SARNESWARA TRADERS LITE FOAM(1200) 6 1,620.00

4 1/4/2017 2 SARNESWARA TRADERS DONA-VAI-9100 5 1,690.00

In [64]:

sales_complete_data.dtypes

Out[64]:

Date object

Voucher object

Party object

Product object

Qty object

Rate object

dtype: object

In [65]:

sales_cleaned_data.dtypes

Out[65]:

Date object

Voucher int64

Party object

Product object

Qty int64

Rate float64

dtype: object


In [66]:

sales_complete_data["Voucher"] = sales_complete_data["Voucher"].astype("int")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [66], in <cell line: 1>()
----> 1 sales_complete_data["Voucher"] = sales_complete_data["Voucher"].astype("int")

File ~\anaconda3\lib\site-packages\pandas\core\generic.py:5912, in NDFrame.astype(self, dtype, copy, errors)
-> 5912 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)

File ~\anaconda3\lib\site-packages\pandas\core\internals\managers.py:419, in BaseBlockManager.astype(self, dtype, copy, errors)
--> 419 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)

File ~\anaconda3\lib\site-packages\pandas\core\internals\managers.py:304, in BaseBlockManager.apply(self, f, align_keys, ignore_failures, **kwargs)
--> 304 applied = getattr(b, f)(**kwargs)

File ~\anaconda3\lib\site-packages\pandas\core\internals\blocks.py:580, in Block.astype(self, dtype, copy, errors)
--> 580 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)

File ~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py:1292, in astype_array_safe(values, dtype, copy, errors)
-> 1292 new_values = astype_array(values, dtype, copy=copy)

File ~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py:1237, in astype_array(values, dtype, copy)
-> 1237 values = astype_nansafe(values, dtype, copy=copy)

File ~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py:1154, in astype_nansafe(arr, dtype, copy, skipna)
-> 1154 return lib.astype_intsafe(arr, dtype)

File ~\anaconda3\lib\site-packages\pandas\_libs\lib.pyx:668, in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: ' '
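The astype("int") call fails because some Voucher values are blank or whitespace-only strings, as the message "invalid literal for int() with base 10: ' '" shows. A hedged workaround sketch: pd.to_numeric with errors="coerce" turns unconvertible entries into NaN so they can be inspected and dropped instead of raising. Qty and Rate carry thousands separators such as "1,690.00", so they would additionally need the commas stripped before a similar conversion.

sales_complete_data["Voucher"] = pd.to_numeric(sales_complete_data["Voucher"].str.strip(), errors="coerce")
sales_complete_data[sales_complete_data["Voucher"].isna()]         ## inspect the rows that refused to convert
sales_complete_data = sales_complete_data.dropna(subset=["Voucher"]).copy()
sales_complete_data["Voucher"] = sales_complete_data["Voucher"].astype("int")

## Rate (and similarly Qty) needs its thousands separators removed first
sales_complete_data["Rate"] = pd.to_numeric(
    sales_complete_data["Rate"].str.replace(",", "", regex=False), errors="coerce")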

In [60]:

sales_complete_data.sort_values(by= 'Date',ascending= False)   ## Date is still a string here, so this sort is lexical, not chronological

Out[60]:

Date Voucher Party Product Qty Rate

16219 9/9/2019 4132 TP13 PP 16 153

16221 9/9/2019 4133 ATS TAMBULAM 7*10 90 12

16223 9/9/2019 4135 SEKHAR MARKET RR WATER GLASS 500 20

16224 9/9/2019 4136 SALESMAN SURESH BLACK DOG-350ML 80 22

16226 9/9/2019 4137 KARIMULLA-VGIRI CYCLE-BK-10*12 1,600.00 6.6

... ... ... ... ... ... ...

45206 credit bill

45203 my swami devastanam

45202 sri kodanda ramaswa

45142 credit bill

5947 gsr mallam

98614 rows × 6 columns


In [54]:

sales_complete_data.sort_values(by = "Voucher")

Out[54]:

Date Voucher Party Product Qty Rate

19145 INV 19

26053 khadervali

26065 credit bill

26142 directh autolo

26220 AUTULO

... ... ... ... ... ... ...

47108 31/03/2018 9998 KRISHNAPATNAM PORT GST AMOUNT 1 324

47110 31/03/2018 9999 PAPARAO GARUDA 13*16 BLUE 550 13

47111 31/03/2018 9999 PAPARAO 10*10 TEJA 25 128

47112 31/03/2018 9999 PAPARAO GARUDA 16*20 BLUE 60 24.5

47113 31/03/2018 9999 PAPARAO HIGH COUNT 16*20 BL 5 160

98614 rows × 6 columns


In [62]:

pd.to_datetime(sales_complete_data["Date"])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py:2211, in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object, allow_mixed)
   2210 try:
-> 2211     values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))

File ~\anaconda3\lib\site-packages\pandas\_libs\tslibs\conversion.pyx:360, in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

ParserError                               Traceback (most recent call last)
Input In [62], in <cell line: 1>()
----> 1 pd.to_datetime(sales_complete_data["Date"])

File ~\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py:1047, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
-> 1047 cache_array = _maybe_cache(arg, format, cache, convert_listlike)

File ~\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py:197, in _maybe_cache(arg, format, cache, convert_listlike)
--> 197 cache_dates = convert_listlike(unique_dates, format)

File ~\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py:402, in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
--> 402 result, tz_parsed = objects_to_datetime64ns(
    403     arg,
    404     dayfirst=dayfirst,
    405     yearfirst=yearfirst,
    406     utc=utc,
    407     errors=errors,
    408     require_iso8601=require_iso8601,
    409     allow_object=True,
    410 )

File ~\anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py:2217, in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object, allow_mixed)
-> 2217 raise err

File ~\anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py:2199, in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object, allow_mixed)
-> 2199 result, tz_parsed = tslib.array_to_datetime(
    2200     data.ravel("K"),
    2201     errors=errors,
    2202     utc=utc,
    2203     dayfirst=dayfirst,
    2204     yearfirst=yearfirst,
    2205     require_iso8601=require_iso8601,
    2206     allow_mixed=allow_mixed,
    2207 )

File ~\anaconda3\lib\site-packages\pandas\_libs\tslib.pyx:381, in pandas._libs.tslib.array_to_datetime()
File ~\anaconda3\lib\site-packages\pandas\_libs\tslib.pyx:613, in pandas._libs.tslib.array_to_datetime()
File ~\anaconda3\lib\site-packages\pandas\_libs\tslib.pyx:751, in pandas._libs.tslib._array_to_datetime_object()
File ~\anaconda3\lib\site-packages\pandas\_libs\tslib.pyx:742, in pandas._libs.tslib._array_to_datetime_object()
File ~\anaconda3\lib\site-packages\pandas\_libs\tslibs\parsing.pyx:281, in pandas._libs.tslibs.parsing.parse_datetime_string()

File ~\anaconda3\lib\site-packages\dateutil\parser\_parser.py:1368, in parse(timestr, parserinfo, **kwargs)
-> 1368 return DEFAULTPARSER.parse(timestr, **kwargs)

File ~\anaconda3\lib\site-packages\dateutil\parser\_parser.py:646, in parser.parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
--> 646 raise ParserError("String does not contain a date: %s", timestr)

ParserError: String does not contain a date:
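pd.to_datetime raises here because the Date column still contains junk strings such as "credit bill" (visible in the sort_values output above). A hedged sketch of a tolerant conversion: errors="coerce" turns unparseable strings into NaT, and dayfirst=True is assumed because the dates are written day-first (1/4/2017, 31/03/2018).

dates = pd.to_datetime(sales_complete_data["Date"], dayfirst=True, errors="coerce")
sales_complete_data.loc[dates.isna(), "Date"].unique()     ## the junk date strings that failed to parse
sales_complete_data["Date"] = dates
sales_complete_data = sales_complete_data.dropna(subset=["Date"])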

In [ ]:

