DataFrame NOTES
FEATURES OF DataFrame
Columns can be of different types of data such as numeric, string, Boolean etc.
Size of data is mutable. ( rows / columns can be increased or decreased )
DataFrame values / data are mutable ( can be changed any time )
Labelled axes ( both row and column have index )
Arithmetic operations can be performed on rows and columns.
Indexes may consist of numbers, strings or letters.
Column indexes label the columns; row indexes label the rows (the first value in each row below):
0 Rahul X A 25
1 Vijay X B 34
2 Mukesh X A 20
3 Meghna X A 45
CREATION OF DataFrame
Syntax: –
Example:
/// columns and index are empty as no arguments have been passed to the DataFrame() method ///
A list passed as an argument to the DataFrame() method is converted into DataFrame elements.
Column labels and the index are created automatically. By default, both start from 0 if no values are provided.
/// NaN (Not a Number) is automatically added for missing values ///
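The creation cases described above can be sketched as follows. This is a minimal illustration that assumes pandas is installed; the column labels "Name" and "Marks" are hypothetical and are not taken from the notes.

```python
import pandas as pd

# No arguments: both columns and index are empty.
empty_df = pd.DataFrame()

# A flat list becomes one column; column label and index default to 0, 1, 2, ...
marks = pd.DataFrame([25, 34, 20, 45])

# A list of dictionaries: keys become column labels (hypothetical names here),
# and a key missing from one row is filled with NaN.
records = pd.DataFrame([{"Name": "Rahul", "Marks": 25}, {"Name": "Vijay"}])

print(empty_df)
print(marks)
print(records)
```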
Example:
This method cannot be used to add a row with an index label that already exists (a duplicate). In that case, the row with that index label is updated / modified instead.
(PRADEEP IP CLASSROOM) DataFrame Notes : 10 | P a g e
Adding a row with fewer values than the number of columns in the DataFrame results in a ValueError.
DataFrame.loc[] method can be used to change the data values of a row to a particular value.
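A short sketch of the row operations with loc[] described above. It assumes pandas is installed; the column labels and the second DataFrame named scores are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Rahul", "Vijay"], "Section": ["A", "B"], "Marks": [25, 34]}
)

# A new index label adds a row.
df.loc[2] = ["Mukesh", "A", 20]

# An existing index label updates that row instead of adding a duplicate.
df.loc[0] = ["Rahul", "A", 30]

# Fewer values than columns: pandas raises ValueError and the row is not added.
try:
    df.loc[3] = ["Meghna", "A"]
    error_raised = False
except ValueError:
    error_raised = True

# A single value assigned to a row label sets every cell of that row.
scores = pd.DataFrame({"Test1": [10, 20], "Test2": [30, 40]})
scores.loc[1] = 0

print(df)
print(scores)
```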
Example:
Syntax:
DataFrame_object[new_column_label] = new_value(s)
Example:
Assigning values to a new column label that does not exist will create a new column at the end.
If the column already exists in the DataFrame then the existing column values are updated.
We can change the data of an entire column to a particular value in a DataFrame.
Syntax:
DataFrame_object[column_label] = new_value
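The three column assignments described above can be sketched like this. It is a minimal example that assumes pandas is installed; the column labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rahul", "Vijay", "Mukesh"], "Marks": [25, 34, 20]})

# A label that does not exist yet: a new column is created at the end.
df["Section"] = ["A", "B", "A"]

# A label that already exists: the column's values are updated.
df["Marks"] = [26, 35, 21]

# A single value is repeated for every row of the column.
df["Class"] = "X"

print(df)
```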
By using insert() function, we can add a new column to the existing DataFrame at any position / column index.
Syntax:
Example:
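As an illustration of insert(), the sketch below places a new column at position 1. It assumes pandas is installed and uses the insert(loc, column, value) argument order; the column labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rahul", "Vijay"], "Marks": [25, 34]})

# insert(position, column_label, values): position 1 puts the new column
# between "Name" (position 0) and "Marks". The DataFrame is changed in place.
df.insert(1, "Section", ["A", "B"])

print(df)
```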
pop() deletes a column from a DataFrame when the name of the column is passed as an argument. It returns the deleted column along with its values.
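Deleting a column by name and getting the deleted column back is what DataFrame.pop() does, so the sketch below assumes that is the function meant here. It assumes pandas is installed; the column labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rahul", "Vijay"], "Marks": [25, 34]})

# pop() removes the column from df and returns it as a Series.
removed = df.pop("Marks")

print(df)
print(removed)
```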
DataFrame_object.columns=[new_column_name]
SYNTAX TO CHANGE THE NAME OF MULTIPLE COLUMNS:
DataFrame_object.rename(columns=d, inplace=True)
d : a dictionary whose keys are the columns to change and whose values are the new names for these columns.
inplace = True : argument of rename() that changes the column names in place.
Example:
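A minimal sketch of both ways of renaming columns, assuming pandas is installed; the old and new column labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rahul", "Vijay"], "Marks": [25, 34]})

# Assigning a list to .columns replaces every label at once;
# the list must have one entry per existing column.
df.columns = ["Student", "Score"]

# rename() changes only the columns that appear as keys in the dictionary.
df.rename(columns={"Score": "Total"}, inplace=True)

print(df)
```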
RENAMING ROWS NAME IN DataFrame
dataframe_object.rename(index=d, inplace=True)
d : a dictionary whose keys are the index labels to change and whose values are the new names for these index labels.
inplace=True : argument of rename() that changes the row names in place.
RENAMING ROWS / COLUMNS NAME USING AXIS ATTRIBUTE OF RENAME FUNCTION IN DataFrame
Syntax:
d : a dictionary whose keys are the columns / rows to change and whose values are the new names for these columns / rows.
axis : can be ‘rows’ / ‘columns’ ( to rename either rows or columns )
inplace=True : argument of rename() that changes the row / column names in place.
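The row renaming and the axis argument can be sketched as below. This assumes pandas is installed; the labels are hypothetical. The pandas documentation lists the row axis as 'index', so the sketch uses that value where the notes write 'rows'.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rahul", "Vijay"], "Marks": [25, 34]})

# Rename rows with the index= keyword.
df.rename(index={0: "R1", 1: "R2"}, inplace=True)

# Rename columns by passing the dictionary first and choosing the axis.
df.rename({"Marks": "Score"}, axis="columns", inplace=True)

# The same call with the row axis renames index labels instead.
df.rename({"R1": "First"}, axis="index", inplace=True)

print(df)
```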
When a single row label / column label is passed, it returns the row / column as a Series.
A column can be accessed using square brackets containing the name of the column as a string value.
Syntax:
DataFrame_object[column_name]
DataFrame_object.iat[row_number, column_number]
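A short sketch of the two access forms above, assuming pandas is installed; the column labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rahul", "Vijay"], "Marks": [25, 34]})

# Square brackets with a column name return that column as a Series.
names = df["Name"]

# iat[] takes integer positions: row number 1, column number 1.
value = df.iat[1, 1]

print(names)
print(value)
```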
In a DataFrame, Boolean values (‘True’ or ‘False’) can be associated with indexes and can be used to filter rows using DataFrame_object.loc[].
True is used for the rows to be shown and False for the rows to be hidden / omitted.
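One common way to apply this idea is to pass loc[] a list with one Boolean per row. The sketch below assumes pandas is installed; the column labels and the threshold of 30 are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Rahul", "Vijay", "Mukesh", "Meghna"], "Marks": [25, 34, 20, 45]}
)

# One Boolean per row: True shows the row, False omits it.
shown = df.loc[[True, False, True, False]]

# A comparison on a column builds the same kind of Boolean list automatically.
high = df.loc[df["Marks"] > 30]

print(shown)
print(high)
```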
head() function is used to get the first n rows from the object based on position. By default, it displays the first 5 rows.
tail() function is used to get the last n rows from the object based on position. By default, it displays the last 5 rows.
Syntax:
concat() function is used to combine or concatenate two or more DataFrames on the basis of rows (row-wise , axis=0) or
columns (column-wise, axis=1).
We can use the concat() method when two DataFrames have a similar structure.
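Row-wise and column-wise concatenation can be sketched as follows, assuming pandas is installed. The DataFrame names and column labels are hypothetical, and ignore_index=True is an optional extra used here to renumber the rows.

```python
import pandas as pd

class_a = pd.DataFrame({"Name": ["Rahul", "Mukesh"], "Marks": [25, 20]})
class_b = pd.DataFrame({"Name": ["Vijay"], "Marks": [34]})

# axis=0 stacks the rows of the second DataFrame under the first.
rows = pd.concat([class_a, class_b], axis=0, ignore_index=True)

# axis=1 places the DataFrames side by side, matching rows by index.
sections = pd.DataFrame({"Section": ["A", "A"]})
cols = pd.concat([class_a, sections], axis=1)

print(rows)
print(cols)
```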
Syntax:
Here: on is used when the merge is done on the basis of a column common to both DataFrames.
left_on / right_on are used when the merge is done on the basis of columns that have different names in the two DataFrames.
The argument “left_on” names the column in the left DataFrame and “right_on” names the column in the right DataFrame.
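The difference between on and left_on / right_on can be sketched with pandas.merge(). This assumes pandas is installed; the DataFrame names and the columns "RollNo" and "Roll" are hypothetical.

```python
import pandas as pd

students = pd.DataFrame({"RollNo": [1, 2, 3], "Name": ["Rahul", "Vijay", "Mukesh"]})
marks = pd.DataFrame({"RollNo": [1, 2, 3], "Marks": [25, 34, 20]})

# Both DataFrames share the column "RollNo", so on= is enough.
same_key = pd.merge(students, marks, on="RollNo")

# Here the key column is called "Roll" in the right DataFrame,
# so each side names its own key column.
marks_alt = pd.DataFrame({"Roll": [1, 2, 3], "Marks": [25, 34, 20]})
diff_key = pd.merge(students, marks_alt, left_on="RollNo", right_on="Roll")

print(same_key)
print(diff_key)
```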
MISSING VALUES (NaN, Not a Number): values with no computational significance, undefined values, unavailable values, or values that have not been entered.
DataFrame_object.fillna({column_name:constant_value})
Fills each missing value by copying the value from the adjacent cell above it (top cell).
Syntax:
DataFrame_object.fillna(method = 'ffill')
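A minimal sketch of both ways of filling missing values, assuming pandas is installed; the column labels are hypothetical. Since pandas 2.1 the method argument of fillna() is deprecated, so the sketch uses DataFrame.ffill() for the fill from the cell above.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"Test1": [10.0, nan, 30.0], "Test2": [nan, 5.0, nan]})

# A dictionary fills only the named column with the given constant.
filled_const = df.fillna({"Test1": 0})

# Forward fill: each NaN takes the value from the cell above it.
# A NaN in the first row stays NaN because nothing is above it.
filled_down = df.ffill()

print(filled_const)
print(filled_down)
```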
Syntax: DataFrame_object.attribute_name
ATTRIBUTE : DESCRIPTION
index : Used to display / fetch row labels (index names). Syntax: DataFrame_object.index
columns : Used to display / fetch column labels. Syntax: DataFrame_object.columns
axes : Used to fetch both index (row) and column names. Syntax: DataFrame_object.axes
dtypes : Used to display the datatype of each column in the DataFrame. Syntax: DataFrame_object.dtypes
size : Displays the size of the DataFrame, which is the product of the number of rows and columns. Syntax: DataFrame_object.size
shape : Displays the shape of the DataFrame, i.e. the number of rows and the number of columns. Syntax: DataFrame_object.shape
ndim : Used to display the number of dimensions ( number of axes ) of the given object; a DataFrame is always 2D. Syntax: DataFrame_object.ndim
isna() : Checks for the presence of NaNs (Not a Number) in the DataFrame. Syntax: DataFrame_object.isna()
T (Transpose) : Used to transpose the DataFrame, i.e. rows become columns and columns become rows. Syntax: DataFrame_object.T
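The attributes listed above can be tried on a small DataFrame. This sketch assumes pandas is installed; the column labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Rahul", "Vijay"], "Marks": [25, 34], "Passed": [True, True]}
)

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.axes)     # list holding the row labels and the column labels
print(df.dtypes)   # datatype of each column
print(df.size)     # rows * columns
print(df.shape)    # (rows, columns)
print(df.ndim)     # number of axes
print(df.isna())   # True wherever a value is NaN
print(df.T)        # rows and columns swapped
```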
Example
DataFrame_object.iterrows()
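iteritems()
It iterates over the DataFrame column-wise, giving each column label together with its column as a Series. iteritems() was removed in pandas 2.0; items() is its replacement.
Syntax:
DataFrame_object.iteritems()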
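Row-wise and column-wise iteration can be sketched as below, assuming pandas is installed; the column labels are hypothetical. The column-wise loop uses items(), which works in current pandas releases where iteritems() is no longer available.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rahul", "Vijay"], "Marks": [25, 34]})

# iterrows() yields (row label, row as a Series) pairs.
row_labels = []
for label, row in df.iterrows():
    row_labels.append(label)
    print(label, row["Name"], row["Marks"])

# items() yields (column label, column as a Series) pairs.
column_labels = []
for column_label, column in df.items():
    column_labels.append(column_label)
    print(column_label, column.tolist())
```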
loc[] : Label based indexing. When a single row label / column label is passed, it returns the row / column as a Series.
Syntax: DataFrame_object.loc[row_label] or DataFrame_object.loc[:, column_name]
Syntax: DataFrame_object.loc[[row_labels]] or DataFrame_object.loc[:, [column_names]] /// Multiple rows / columns.
iloc[] : Index (position) based indexing. Selects rows and columns of the DataFrame by their integer positions.
Syntax: DataFrame_object.iloc[row_indexes, column_indexes]
iat[] : Index (position) based indexing. Used to access a single value from the DataFrame by its row and column number.
Syntax: DataFrame_object.iat[row_number, column_number]
at[] : Label based indexing. Used to access a single value from the DataFrame by its row label and column name.
Syntax: DataFrame_object.at[row_label, column_name]
Boolean indexing : Uses .loc[] and .iloc[]. The argument passed is a Boolean value (True [indicated by 1] or False [indicated by 0]).
Syntax: DataFrame_object.loc/iloc[]
head() : Used to get the first n rows from the object based on position. By default, it displays the first 5 rows.
Syntax: DataFrame_object.head(n) /// Here, n is the number of rows to be extracted
tail() : Used to get the last n rows from the object based on position. By default, it displays the last 5 rows.
Syntax: DataFrame_object.tail(n) /// Here, n is the number of rows to be extracted
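The indexing forms summarised above can be compared side by side. This is a minimal sketch that assumes pandas is installed; the row labels "r1" to "r4" and the column labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Rahul", "Vijay", "Mukesh", "Meghna"], "Marks": [25, 34, 20, 45]},
    index=["r1", "r2", "r3", "r4"],
)

one_row = df.loc["r2"]            # single label: the row comes back as a Series
some_rows = df.loc[["r1", "r3"]]  # list of labels: a DataFrame with those rows
by_position = df.iloc[0:2, 0:1]   # first two rows, first column, by position
by_label = df.at["r4", "Marks"]   # single value by row label and column name
by_number = df.iat[3, 1]          # the same value by row and column number
first_two = df.head(2)            # first 2 rows
last_two = df.tail(2)             # last 2 rows

print(one_row)
print(some_rows)
print(by_position)
print(by_label, by_number)
print(first_two)
print(last_two)
```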