This article explains why pandas generates SettingWithCopyWarning and shows how to resolve it.
The first thing to understand is that SettingWithCopyWarning is a warning, not an error. It tells us that an operation might not have worked as expected and that we should check the result to make sure there is no mistake.
Simply ignoring the warning is bad practice, and SettingWithCopyWarning should never be ignored. We should always take the time to understand why we are getting the warning before taking action.
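Because a warning does not stop execution, it is easy to miss. Below is a minimal sketch of recording warnings and then checking the result. It assumes a small hypothetical DataFrame in place of the IMDb data, and the names `caught` and `not_rated_left` are illustrative only. The exact warning class pandas emits depends on the installed version, so the code only prints it:

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the IMDb dataset (not real data).
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "content_rating": ["R", "NOT RATED", "PG"],
})

# Record warnings instead of letting them scroll past unnoticed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Chained assignment: two indexing operations in a row.
    movies[movies["content_rating"] == "NOT RATED"]["content_rating"] = np.nan

# Execution continued; show whichever warnings were emitted.
for w in caught:
    print(w.category.__name__, "-", w.message)

# Check the result rather than assuming the assignment worked.
not_rated_left = (movies["content_rating"] == "NOT RATED").sum()
print("Rows still marked NOT RATED:", not_rated_left)
```

The final check reveals that the 'NOT RATED' row is still there, even though the script ran to completion.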
import pandas as pd
# load the IMDb movie ratings dataset used throughout this article
movies=pd.read_csv('http://bit.ly/imdbratings')
movies.head()
Example 1
Show the rows where the 'content_rating' column has null values (i.e. NaN):
movies[movies['content_rating'].isnull()]
Show the count of each content rating. The output shows 65 movies marked NOT RATED. These should be represented as missing values, so it is best to replace them with NaN.
movies['content_rating'].value_counts()
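import numpy as np
# attempt to replace 'NOT RATED' with NaN using chained indexing
movies[movies['content_rating']=='NOT RATED']['content_rating']=np.nan
Running this line produces a SettingWithCopyWarning.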
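The problem with the above line of code is that it involves two operations. First, movies[movies['content_rating']=='NOT RATED'] selects the matching rows; then ['content_rating']=np.nan assigns to that intermediate result. pandas cannot reliably tell whether the intermediate result is a view or a copy of movies. If it were a view, the assignment would change the underlying data; if it is a copy, the assignment does not affect the original DataFrame. In this case it is a copy, which is why we get the warning.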
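Splitting the chained line into its two operations makes it visible where the assignment actually lands. This is a minimal sketch on hypothetical toy rows; the names `mask` and `intermediate` are illustrative, and older pandas versions may emit SettingWithCopyWarning on the second step:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, not the real IMDb dataset.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "content_rating": ["R", "NOT RATED", "NOT RATED"],
})

mask = movies["content_rating"] == "NOT RATED"

# Operation 1: boolean indexing builds a new DataFrame with the selected rows.
intermediate = movies[mask]

# Operation 2: the assignment modifies that intermediate object, not `movies`.
intermediate["content_rating"] = np.nan

print(intermediate["content_rating"].isnull().sum())  # both selected rows are now NaN
print(movies["content_rating"].isnull().sum())        # the original has no missing values
```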
movies['content_rating'].isnull().sum()
The output of the above code has not changed, which means that movies[movies['content_rating']=='NOT RATED']['content_rating']=np.nan did not modify movies at all. This is why warnings should not be ignored.
A better way to write this line, and to avoid the warning, is to use the .loc accessor. It selects the rows and the column in a single operation, so the assignment is applied directly to movies:
movies.loc[movies['content_rating']=='NOT RATED','content_rating']=np.nan
movies['content_rating'].isnull().sum()
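For comparison, here is a minimal sketch that runs both versions side by side on fresh copies of a small hypothetical DataFrame and counts the missing ratings afterwards. The helper `make_movies` and its rows are illustrative, not the IMDb data:

```python
import numpy as np
import pandas as pd


def make_movies():
    # Hypothetical toy rows; one rating is already missing.
    return pd.DataFrame({
        "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
        "content_rating": ["R", "NOT RATED", None, "NOT RATED"],
    })


# Chained assignment: select rows, then set a column on the intermediate result.
chained = make_movies()
chained[chained["content_rating"] == "NOT RATED"]["content_rating"] = np.nan

# .loc: rows and column are selected in one indexing call on the original object.
with_loc = make_movies()
with_loc.loc[with_loc["content_rating"] == "NOT RATED", "content_rating"] = np.nan

chained_nulls = chained["content_rating"].isnull().sum()
loc_nulls = with_loc["content_rating"].isnull().sum()
print("chained assignment:", chained_nulls)
print(".loc assignment:", loc_nulls)
```

Only the .loc version turns the two 'NOT RATED' entries into missing values; the chained version leaves just the one value that was missing from the start.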
Example 2
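First, select the movies with a star rating above 9:
top_movies=movies.loc[movies['star_rating']>9]
Then use the assignment operator to change the duration in the first row (index label 0) from 142 to 150: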
top_movies.loc[0,'duration'] =150
This line also produces a SettingWithCopyWarning.
top_movies
The output shows that the duration has changed from 142 to 150, even though a warning was shown.
The problem lies in the line "top_movies=movies.loc[movies['star_rating']>9]", which makes it difficult for pandas to tell whether top_movies is a view or a copy of movies.
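Here is a minimal sketch that checks which object the assignment changed. It assumes hypothetical toy rows in which the first movie has a star rating of 9.3 and a duration of 142. Whether pandas emits SettingWithCopyWarning here depends on the version; releases that use Copy-on-Write by default may not emit it:

```python
import pandas as pd

# Hypothetical toy rows, not the real IMDb dataset.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "star_rating": [9.3, 9.2, 8.1],
    "duration": [142, 175, 120],
})

# Boolean selection with .loc returns a new DataFrame holding rows 0 and 1.
top_movies = movies.loc[movies["star_rating"] > 9]

# The assignment changes top_movies only.
top_movies.loc[0, "duration"] = 150

print(top_movies.loc[0, "duration"])  # the subset shows 150
print(movies.loc[0, "duration"])      # the original still shows 142
```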
To fix this, call the .copy() method when creating top_movies. This tells pandas explicitly that top_movies is an independent copy:
top_movies=movies.loc[movies['star_rating']>9].copy()
top_movies.loc[0,'duration'] =150
top_movies.head()
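Here is a minimal sketch showing that the .copy() version runs without SettingWithCopyWarning and still leaves movies untouched. It uses the same kind of hypothetical toy rows as before, and the name `warning_names` is illustrative:

```python
import warnings

import pandas as pd

# Hypothetical toy rows, not the real IMDb dataset.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "star_rating": [9.3, 9.2, 8.1],
    "duration": [142, 175, 120],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # .copy() makes top_movies an explicit, independent DataFrame.
    top_movies = movies.loc[movies["star_rating"] > 9].copy()
    top_movies.loc[0, "duration"] = 150

warning_names = [w.category.__name__ for w in caught]
print("warnings:", warning_names)
print(top_movies.loc[0, "duration"], movies.loc[0, "duration"])
```

The explicit copy costs some extra memory for the selected rows, which is usually negligible for a filtered subset like this. In exchange, the intent is unambiguous.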
These two examples show how ignoring the warning can lead to an incorrect representation of the data.