STATISTICS WITH PANDAS
DESCRIPTIVE STATISTICS WITH PANDAS
CALCULATING MAXIMUM VALUES
max()
Used to calculate the maximum values from the
DataFrame. By default it works column-wise.
SYNTAX:
DataFrame.max(axis=0, skipna=True, numeric_only=False)
axis: 0 (one result per column) / 1 (one result per row)
skipna: exclude NA/null values (True by default)
numeric_only: accepts a bool. True: numeric columns
only. False: every column.
(The default is None in older pandas releases and
False from pandas 2.0 onward.)
For a single column:
df['columnname'].max()
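As a small illustration of max(), the sketch below uses a made-up marks table (the column names and values are assumptions, not data from these slides) and shows the column-wise, row-wise and single-column forms.

```python
import pandas as pd

# Hypothetical marks table, used only for illustration.
df = pd.DataFrame({"maths": [78, 91, 65], "science": [88, 72, 95]})

col_max = df.max()            # axis=0: one maximum per column
row_max = df.max(axis=1)      # axis=1: one maximum per row
single = df["maths"].max()    # maximum of a single column

print(col_max)
print(row_max)
print(single)
```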
CALCULATING MINIMUM VALUES
min()
Used to calculate the minimum values from the
DataFrame; it also works on non-numeric columns
such as strings.
By default it works column-wise (axis=0).
SYNTAX:
DataFrame.min(axis=0, skipna=True, numeric_only=False)
axis: 0 (one result per column) / 1 (one result per row)
skipna: exclude NA/null values (True by default)
numeric_only: accepts a bool. True: numeric columns
only. False: every column.
(The default is None in older pandas releases and
False from pandas 2.0 onward.)
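The effect of skipna is easiest to see with a missing value in the data. The sketch below assumes a made-up table with one NaN; the names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing value in "maths".
df = pd.DataFrame({"maths": [78, np.nan, 65], "science": [88, 72, 95]})

col_min = df.min()                      # NaN is skipped by default
col_min_strict = df.min(skipna=False)   # NaN propagates into the result
row_min = df.min(axis=1)                # minimum of each row

print(col_min)
print(col_min_strict)
print(row_min)
```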
CALCULATING SUM
sum()
Used to add all of the values in a particular column
of a dataframe.
Skips all the missing values by default.
SYNTAX:
DataFrame.sum(axis=0, skipna=True, numeric_only=False)
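A minimal sketch of sum(), assuming a hypothetical order table; note how the missing quantity is treated as absent rather than making the total NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical order table; the second quantity is missing.
df = pd.DataFrame({"qty": [2, np.nan, 5], "price": [10.0, 20.0, 30.0]})

col_total = df.sum()          # sum of each column, NaN skipped
row_total = df.sum(axis=1)    # sum across each row

print(col_total)
print(row_total)
```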
CALCULATING COUNT
count()
Used to count the non-NA entries in each column
(axis=0) or each row (axis=1).
NA values (None, NaN, NaT) are not counted.
SYNTAX:
DataFrame.count(axis=0, numeric_only=False)
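The following sketch shows count() on a made-up table (names and values are assumptions) that has one missing entry in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing name and one missing age.
df = pd.DataFrame({"name": ["Asha", "Ravi", None], "age": [17, np.nan, 18]})

per_column = df.count()        # non-NA entries in each column
per_row = df.count(axis=1)     # non-NA entries in each row

print(per_column)
print(per_row)
```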
CALCULATING MODE
mode()
mode - the most frequently occurring value(s) in a
given set of numbers.
mode() function returns the most frequent value(s)
of each column or row along the selected axis.
Can return multiple values. If every value occurs
only once, all values are modes and are returned
in sorted order.
axis 0: get mode for each column
axis 1: get mode for each row
SYNTAX:
DataFrame.mode(axis=0, numeric_only=False)
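Because mode() can return more than one value, its result is a DataFrame rather than a single row. The sketch below, on made-up data, shows a column with one mode, a column with two modes, and a Series where every value is unique.

```python
import pandas as pd

# Hypothetical data: "a" has one mode (2), "b" has two modes (5 and 7).
df = pd.DataFrame({"a": [1, 2, 2, 3], "b": [5, 5, 7, 7]})

modes = df.mode()   # one row per mode; shorter columns are padded with NaN
print(modes)

# When every value occurs once, all values come back in sorted order.
all_unique = pd.Series([3, 1, 2]).mode()
print(all_unique)
```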
CALCULATING MEAN
mean() mean – average.
mean() function calculates the arithmetic mean
(average) of each column or row of a DataFrame.
SYNTAX:
DataFrame.mean(axis=0, skipna=True, numeric_only=False)
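A short sketch of mean() on a hypothetical marks table; the missing science mark is skipped, so that column's mean is taken over two values.

```python
import numpy as np
import pandas as pd

# Hypothetical marks; one science mark is missing.
df = pd.DataFrame({"maths": [70, 80, 90], "science": [60, np.nan, 100]})

col_mean = df.mean()          # mean of each column, NaN skipped
row_mean = df.mean(axis=1)    # mean across each row

print(col_mean)
print(row_mean)
```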
CALCULATING MEDIAN
median() median – middle value.
median() function calculates the median
(middle value) of each column or row of a DataFrame.
SYNTAX:
DataFrame.median(axis=0, skipna=True, numeric_only=False)
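The sketch below compares median() with mean() on made-up data containing one outlier, to show that the median is less affected by extreme values.

```python
import pandas as pd

# Hypothetical data; 100 in column "x" is an outlier.
df = pd.DataFrame({"x": [1, 3, 100, 5], "y": [10, 20, 30, 40]})

col_median = df.median()        # middle value of each column
row_median = df.median(axis=1)  # middle value of each row
x_mean = df["x"].mean()         # pulled upward by the outlier

print(col_median)
print(row_median)
print(x_mean)
```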
CALCULATING QUANTILE
quantile()
A quantile (fractile) is a cut point that divides a sample
into equal-sized subgroups.
quantile() function is used to get the quantile of each column
or row of the DataFrame.
Quartiles divide the data into four equal parts:
1st quartile 25% q = 0.25
2nd quartile 50% q = 0.5 (Median)
3rd quartile 75% q = 0.75
SYNTAX:
DataFrame.quantile(q=0.5, axis=0, numeric_only=False)
q: a float between 0 and 1, or a list of such floats.
quantile() has no skipna parameter.
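As an illustration, the three quartiles can be requested in one call by passing a list. The data below are made up, and the results rely on the default linear interpolation.

```python
import pandas as pd

# Hypothetical scores, evenly spaced so the quartiles are easy to verify.
df = pd.DataFrame({"score": [10, 20, 30, 40, 50]})

quartiles = df.quantile([0.25, 0.5, 0.75])   # one row per requested quantile
middle = df["score"].quantile(0.5)           # same as the median

print(quartiles)
print(middle)
```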
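describe()
describe() function reports count, mean, std, min,
the quartiles and max for each numeric column.
SYNTAX:
DataFrame.describe(percentiles=None, include=None, exclude=None)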
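A minimal sketch of describe() on a single made-up numeric column, showing which statistics appear in the result.

```python
import pandas as pd

# Hypothetical scores, used only for illustration.
df = pd.DataFrame({"score": [10, 20, 30, 40, 50]})

summary = df.describe()   # one row per statistic, one column per numeric column
print(summary)
```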
DATA AGGREGATIONS
Data aggregation is the process where data is collected
and presented in a summarized format for statistical
analysis.
“Aggregation means to transform the dataset and
produce a single numeric value from an array.”
Aggregation can be applied to one or more columns
together.
Aggregation functions include max(), min(), sum(), count(),
std(), var().
aggregate()
SYNTAX:
DataFrame.aggregate('function', axis=0)
DataFrame.aggregate(['function', 'function'], axis=0)
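The two forms of aggregate() return different shapes: a single function name gives a Series, a list of names gives a DataFrame with one row per function. The sketch below assumes a made-up marks table.

```python
import pandas as pd

# Hypothetical marks table.
df = pd.DataFrame({"maths": [70, 80, 90], "science": [60, 75, 100]})

single = df.aggregate("max")                  # Series: one value per column
multi = df.aggregate(["min", "max", "sum"])   # DataFrame: one row per function

print(single)
print(multi)
```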
SORTING
Sorting refers to the arrangement of data
elements in a specific order, which can be
either ascending or descending.
sort_values() function is used to sort the
values of a DataFrame.
SYNTAX:
DataFrame.sort_values(by, axis=0, ascending=True)
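A short sketch of sort_values() in both directions, using a hypothetical table of names and marks. The original DataFrame is left unchanged because inplace is not used.

```python
import pandas as pd

# Hypothetical table of students and marks.
df = pd.DataFrame({"name": ["Asha", "Ravi", "Meena"], "marks": [78, 91, 65]})

ascending = df.sort_values(by="marks")                     # lowest marks first
descending = df.sort_values(by="marks", ascending=False)   # highest marks first

print(ascending)
print(descending)
```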
CHECKING MISSING VALUES
Syntax: dataframe_object.isnull()
# checks every cell for NaN
dataframe_object['column_name'].isnull()
# checks a single column for NaN
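isnull() returns a table of True/False flags of the same shape as the input. The sketch below uses made-up data; chaining sum() to count missing values per column is a common idiom and is added here as an assumption, not something these slides show.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing name and one missing mark.
df = pd.DataFrame({"name": ["Asha", None, "Meena"], "marks": [78, 91, np.nan]})

mask = df.isnull()                    # True wherever a value is missing
marks_mask = df["marks"].isnull()     # same check for a single column
missing_per_column = mask.sum()       # True counts as 1, so this counts NaNs

print(mask)
print(marks_mask)
print(missing_per_column)
```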
GROUPBY
Syntax: dataframe_obj.groupby('column_name').aggregate_function()
aggregate_function stands for sum(), mean(), count(), etc.
first() - display the first entry from each group.
last() - display the last entry from each group.
size() - display the size of each group.
groups - display group data (group name, row index)
get_group() - display the data of a single group.
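The groupby helpers listed above can be tried on a small made-up salary table; the department and salary values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical salary table with two departments.
df = pd.DataFrame({
    "dept": ["A", "B", "A", "B", "A"],
    "salary": [100, 200, 300, 400, 500],
})

grouped = df.groupby("dept")

totals = grouped["salary"].sum()    # aggregate function applied per group
first = grouped.first()             # first entry of each group
last = grouped.last()               # last entry of each group
sizes = grouped.size()              # number of rows in each group
row_labels = grouped.groups         # mapping: group name -> row index labels
group_a = grouped.get_group("A")    # all rows belonging to group "A"

print(totals)
print(sizes)
print(group_a)
```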
PIVOTING
Reshapes data in a DataFrame.
Allows you to transform columns into rows and rows
into columns.
Summarizes large amounts of data.
Used to summarize, sort, reorganize, group,
count, total or average data stored in a table.
PIVOTING FUNCTION IN PYTHON PANDAS
1. pivot() 2. pivot_table()
pivot()
pivot() method creates a new DataFrame by reshaping the data
based on column values. It does not aggregate.
Takes three arguments: index, columns and values.
SYNTAX:
dataframe_object.pivot(index='col', columns='col', values='col')
index - column of the original table whose values become the
index of the new DataFrame.
columns - column of the original table whose values become the
column labels of the new DataFrame.
values - column of the original table whose values fill the
cells of the new DataFrame.
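As a sketch of pivot(), the made-up long table below (one row per day and city) is reshaped into a wide table with one row per day and one column per city. Each day/city pair appears only once, which pivot() requires.

```python
import pandas as pd

# Hypothetical long-format temperature readings.
df = pd.DataFrame({
    "date": ["Mon", "Mon", "Tue", "Tue"],
    "city": ["Delhi", "Pune", "Delhi", "Pune"],
    "temp": [30, 28, 31, 27],
})

# Rows come from "date", column labels from "city", cells from "temp".
wide = df.pivot(index="date", columns="city", values="temp")
print(wide)
```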
pivot_table()
Pivoting with aggregation.
pivot_table() method is used when the same index/column combination
occurs more than once; the duplicates are combined with an aggregation
function.
SYNTAX:
pd.pivot_table(DataFrame, index='col', columns='col', values='col',
aggfunc='mean')
DataFrame - pandas DataFrame
index - column of the original table whose values become the index of
the new DataFrame.
columns - column of the original table whose values become the column
labels of the new DataFrame.
values - column whose values are aggregated.
aggfunc - function to use, such as sum, max, min, mean, std, etc.
(mean by default).
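The difference from pivot() shows up when a row/column pair is repeated. In the made-up sales table below, Mon/Delhi occurs twice: pivot() rejects it, while pivot_table() adds the two values together.

```python
import pandas as pd

# Hypothetical sales table; the ("Mon", "Delhi") pair appears twice.
df = pd.DataFrame({
    "date": ["Mon", "Mon", "Mon", "Tue"],
    "city": ["Delhi", "Delhi", "Pune", "Delhi"],
    "sales": [10, 20, 5, 7],
})

# pivot() cannot reshape when index/column pairs are duplicated.
pivot_failed = False
try:
    df.pivot(index="date", columns="city", values="sales")
except ValueError:
    pivot_failed = True

# pivot_table() combines the duplicates using aggfunc.
table = pd.pivot_table(df, index="date", columns="city",
                       values="sales", aggfunc="sum")
print(table)
```

Combinations that never occur in the data (here Tue/Pune) are left as NaN.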
REINDEXING AND RENAMING
rename()
rename() method is used to rename the index
labels or column labels of a DataFrame.
Syntax:
dataframe_object.rename(index=mapping, columns=mapping, inplace=False)
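A minimal sketch of rename(), assuming a made-up one-column table. The mappings are dictionaries of old label to new label; without inplace=True the original DataFrame is left as it was.

```python
import pandas as pd

# Hypothetical table with the default integer index.
df = pd.DataFrame({"marks": [78, 91]})

renamed = df.rename(index={0: "first", 1: "second"},
                    columns={"marks": "score"})

print(renamed)
print(df)   # unchanged
```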
set_index()
set_index() method is used to make one of the
DataFrame's columns the index.
Syntax:
dataframe_object.set_index(column, inplace=True)
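The sketch below applies set_index() to a hypothetical student table so that rows can be looked up by roll number. It returns a new DataFrame instead of using inplace=True, which is a style choice for the example.

```python
import pandas as pd

# Hypothetical student table.
df = pd.DataFrame({"roll_no": [101, 102], "name": ["Asha", "Ravi"]})

indexed = df.set_index("roll_no")     # "roll_no" moves from columns to index
asha = indexed.loc[101, "name"]       # look up a row by its roll number

print(indexed)
print(asha)
```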
reset_index()
reset_index() method is used to replace the index
with a new continuous integer index (0, 1, 2, ...).
The old index is kept as a column unless drop=True.
Syntax:
dataframe_object.reset_index(inplace=True)
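A typical use of reset_index() is after sorting or filtering, when the row labels are no longer in order. The data below are made up for illustration.

```python
import pandas as pd

# Hypothetical table; sorting leaves the index out of order.
df = pd.DataFrame({"name": ["Asha", "Ravi", "Meena"], "marks": [78, 91, 65]})
sorted_df = df.sort_values(by="marks")     # index is now 2, 0, 1

kept = sorted_df.reset_index()             # old index saved in an "index" column
clean = sorted_df.reset_index(drop=True)   # old index discarded

print(sorted_df)
print(kept)
print(clean)
```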
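reindex()
reindex() method is used to conform a DataFrame to a
new set of row or column labels. Rows or columns are
reordered to match, and labels missing from the
original are filled with NaN.
It returns a new DataFrame and has no inplace parameter.
Syntax:
dataframe_object.reindex(index=labels, columns=labels)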
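To illustrate reindex(), the sketch below asks for the labels of a made-up table in a new order and includes one label ("d") that does not exist in the original.

```python
import pandas as pd

# Hypothetical table with string row labels.
df = pd.DataFrame({"marks": [78, 91, 65]}, index=["a", "b", "c"])

# "c" and "a" are reordered, "b" is dropped, "d" is new and becomes NaN.
reordered = df.reindex(["c", "a", "d"])

print(reordered)
```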