
SettingWithCopyWarning is one of the most common obstacles people run into when learning pandas, and even pandas itself does not guarantee a single outcome for two lines of code that may look identical.

This article explains why the warning is generated and shows you how to solve it.

The first thing to understand is that SettingWithCopyWarning is a warning, and not an error. It informs us that our operation might not have worked as expected and that
we should check the result to make sure there isn't any mistake.

It can be tempting to ignore the warning because the code still runs. This is bad practice: SettingWithCopyWarning should never be ignored. We should always take the time to understand why we are getting the warning before taking action.

import pandas as pd 

Reading the CSV file and showing the first five rows

movies = pd.read_csv('http://bit.ly/imdbratings')
movies.head()

Example 1

Showing the total number of null values in the column "content_rating"


movies['content_rating'].isnull().sum() 

Showing the rows where the column 'content_rating' has null values, i.e. NaN
movies[movies['content_rating'].isnull()] 

Showing the count of each unique content rating. "NOT RATED" appears 65 times; these entries are really missing values, so it is best to replace them with NaN
movies['content_rating'].value_counts() 


import numpy as np

Overwriting "NOT RATED" with NaN values



movies[movies['content_rating']=='NOT RATED']['content_rating']=np.nan

Warning

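The problem with the line above is that it involves two operations: first a selection, movies[movies['content_rating']=='NOT RATED'], and then an assignment to ['content_rating'] on whatever that selection returned. pandas has difficulty knowing whether the first operation returns a view or a copy. If it is a view, the assignment changes the underlying data; if it is a copy, the assignment does not affect the DataFrame. The second case is what happens here, which is why we get the warning.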
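To make the two operations visible, the chained assignment can be written as two separate steps. The following is only a minimal sketch on a small made-up DataFrame (the data and the names df and subset are hypothetical, not taken from the movies dataset). Depending on the pandas version, the assignment in step 2 may itself print a warning, which is expected.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the movies DataFrame
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "content_rating": ["R", "NOT RATED", "PG", "NOT RATED"],
})

# Step 1: the boolean selection builds a new object holding the matching rows
subset = df[df["content_rating"] == "NOT RATED"]

# Step 2: the assignment lands on that temporary object, not on df
subset["content_rating"] = np.nan

missing_in_subset = int(subset["content_rating"].isnull().sum())
missing_in_df = int(df["content_rating"].isnull().sum())
print(missing_in_subset, missing_in_df)
```

Only the temporary object receives the NaN values; the original df still has no missing ratings, which mirrors what happens to movies in the example above.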

movies['content_rating'].isnull().sum()

The output of the code above has not changed, which means the line "movies[movies['content_rating']=='NOT RATED']['content_rating']=np.nan" did not modify the DataFrame. This is why it is advisable not to ignore warnings.

The best way to fix the line above and avoid the warning is to use the loc indexer, which selects the rows and the column in a single operation

movies.loc[movies['content_rating']=='NOT RATED','content_rating']=np.nan 

We see that there is no warning when using loc

movies['content_rating'].isnull().sum() 

We can see that the output has changed to 68
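The same loc pattern can be tried on a small made-up DataFrame to see the count of missing values change. This is only a sketch under stated assumptions: the data and the names df, mask, missing_before and missing_after are hypothetical and chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one value is already missing, two are "NOT RATED"
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "content_rating": ["R", "NOT RATED", None, "NOT RATED"],
})

missing_before = int(df["content_rating"].isnull().sum())

# The row selector (boolean mask) and the column label go into one .loc call,
# so pandas performs a single assignment directly on df
mask = df["content_rating"] == "NOT RATED"
df.loc[mask, "content_rating"] = np.nan

missing_after = int(df["content_rating"].isnull().sum())
print(missing_before, missing_after)
```

Because the selection and the assignment happen in one call on df itself, there is no intermediate object that could turn out to be a copy.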

Example 2

Showing the rows where column "star_rating" is greater than 9



top_movies=movies.loc[movies['star_rating']>9]
top_movies

Using the assignment operator to set the duration in the first row to 150 in place of 142
top_movies.loc[0,'duration'] =150 

Warning

top_movies

In this output we can see that the duration has been changed from 142 to 150, even though a warning was shown.
The problem is with the line "top_movies=movies.loc[movies['star_rating']>9]", which makes it difficult for pandas to know whether top_movies is a view or a copy of movies.
We can improve that line by adding the copy() method, so that top_movies is explicitly an independent copy

top_movies=movies.loc[movies['star_rating']>9].copy()
top_movies.loc[0,'duration'] =150
top_movies.head()
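To check what copy() buys us, the same steps can be run on a small made-up DataFrame and the original compared with the copy afterwards. This is a minimal sketch: the data and the names df, top, top_duration and original_duration are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the movies DataFrame
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "star_rating": [9.3, 9.2, 8.9],
    "duration": [142, 175, 200],
})

# .copy() makes it explicit that top is an independent DataFrame
top = df.loc[df["star_rating"] > 9].copy()

# The assignment therefore affects only the copy
top.loc[0, "duration"] = 150

top_duration = int(top.loc[0, "duration"])
original_duration = int(df.loc[0, "duration"])
print(top_duration, original_duration)
```

The copy holds the new value while the original keeps the old one, so there is no ambiguity about which object was modified.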

So we see in the two examples above how ignoring the warning can lead to a wrong representation of the data
