Introduction
In this session, the most common methods for obtaining statistics from a data set
are discussed. These methods are count, min, max, mean, median, mode and standard
deviation. The basic meaning of some of these methods is explained where they are
introduced.
Import pandas, read the CSV file “car_sales.csv” into a data frame, and display the
data frame as shown in figure 1.
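As a minimal sketch of this step, the snippet below reads CSV text into a data frame. Because the real car_sales.csv is not reproduced here, the rows are hypothetical stand-ins. Only the column names (Make, Year, Pct, Quantity, Price) and a few values such as 2884 are taken from this tutorial.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of car_sales.csv.
# Only the column names come from the tutorial; the rows are made up.
csv_text = """Make,Year,Pct,Quantity,Price
Toyota,2007,5.1,2884,12090
Honda,2007,4.3,1885,12090
Ford,2008,3.8,1833,15500
Nissan,2009,2.9,1300,18250
BMW,2010,1.7,950,30100
"""

# With the real file on disk this would be: pd.read_csv("car_sales.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Display the data frame
print(df)
```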
In order to find out the number of records present in the data set, the count()
function can be used. The data frame name should be specified when using this function.
Figure 3: Getting the count of a column which is having a null value cell
2.2 Getting the count of all columns
Please note that in the previous section we specified the column name. If we
don’t specify it, a column-wise record count is obtained. See figure 4, where the
count() function is called directly on the data frame.
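The two ways of counting described above can be sketched as follows. The data frame is a hypothetical sample (only the column names come from this tutorial), with the first Quantity cell left null to show that count() skips null cells.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Ford", "Nissan", "BMW"],
    "Year": [2007, 2007, 2008, 2009, 2010],
    "Pct": [5.1, 4.3, 3.8, 2.9, 1.7],
    "Quantity": [np.nan, 1885, 1833, 1300, 950],  # first cell left null
    "Price": [12090, 12090, 15500, 18250, 30100],
})

# Count for a single column: the null cell is not counted
quantity_count = df["Quantity"].count()

# Count for every column: one value per column
column_counts = df.count()

print(quantity_count)
print(column_counts)
```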
3.1 Maximum
1. First, let’s find the maximum value in the Quantity column. (Please
change the first cell value back to 2884, as we set it to null in the
previous section.) Specify the data frame and then the column name
with the max() function, as shown in figure 5.
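A minimal sketch of this call, using a hypothetical sample in which 2884 is the largest Quantity value:

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Ford", "Nissan", "BMW"],
    "Quantity": [2884, 1885, 1833, 1300, 950],
})

# Data frame, then column name, then max()
max_quantity = df["Quantity"].max()
print(max_quantity)
```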
The minimum can be taken in the same way as we have done with the max function,
but in order to take the minimum we have to use the min() function.
Figure 7 shows getting the minimum of all the columns. Refer to figure 6 to verify
whether the printed values are correct.
Figure-6
Figure 7: Getting the minimum
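The same pattern for the minimum might look like the sketch below, again on hypothetical sample rows. Calling min() on the whole data frame returns one value per column. For a text column such as Make, the result is the alphabetically first string.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Ford", "Nissan", "BMW"],
    "Year": [2007, 2007, 2008, 2009, 2010],
    "Quantity": [2884, 1885, 1833, 1300, 950],
    "Price": [12090, 12090, 15500, 18250, 30100],
})

# Minimum of a single column
min_quantity = df["Quantity"].min()

# Minimum of every column
column_mins = df.min()

print(min_quantity)
print(column_mins)
```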
4 Mean
The mean is the average value of a given set of values. The mean can be calculated
by using the mean() function. As with the functions we discussed previously, this
function can be used to get the mean of a particular column or of all the columns.
Figure 8: Getting the mean of the Quantity column
To get the column-wise mean, remove the column name from the above code.
Then execute it as shown in figure 9. Observe that the mean of the Make column is
not shown. This is because pandas skips columns that contain strings. (In pandas
2.0 and later, pass numeric_only=True to get this behaviour; otherwise a string
column raises an error.)
Figure 9: Getting the mean of the columns
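Both forms of the mean calculation can be sketched as below, on hypothetical sample rows. The numeric_only=True argument is passed explicitly as an assumption about the installed pandas version, since recent versions no longer drop string columns such as Make on their own.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Ford", "Nissan", "BMW"],
    "Year": [2007, 2007, 2008, 2009, 2010],
    "Quantity": [2884, 1885, 1833, 1300, 950],
    "Price": [12090, 12090, 15500, 18250, 30100],
})

# Mean of a single column
mean_quantity = df["Quantity"].mean()

# Column-wise mean; numeric_only=True leaves out the string column Make
column_means = df.mean(numeric_only=True)

print(mean_quantity)
print(column_means)
```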
5 Median
The median is the middle value of a given data set. The median can be calculated
using the median() function. Specify the data frame whose median you want to find
and then use the median function. As discussed in the above sections, this function
can also be used to find the median of a particular column or of all the columns
(figure 10).
Figure 10: Getting the median of all columns
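A short sketch of the median calls on hypothetical sample rows follows. As with the mean, numeric_only=True is an assumption made so that the string column is skipped on recent pandas versions.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Ford", "Nissan", "BMW"],
    "Year": [2007, 2007, 2008, 2009, 2010],
    "Quantity": [2884, 1885, 1833, 1300, 950],
    "Price": [12090, 12090, 15500, 18250, 30100],
})

# Median of a single column: the middle value once the values are sorted
median_quantity = df["Quantity"].median()

# Median of every numeric column
column_medians = df.median(numeric_only=True)

print(median_quantity)
print(column_medians)
```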
6 Mode
The mode is the most repeated value of a given data set. The mode can be obtained
using the mode() function. This can be used for a particular column.
To obtain the mode clearly, let’s first change multiple cell values to 2884 as shown
in figure 11.
Figure 11: Changing multiple cell values to 2884
Then let’s find the mode of the Quantity column. First specify the data frame,
then the column, and finally the mode() function, as shown in figure 12, and execute
it. As you can see, the mode is shown as 2884.
Figure 12: Getting the mode of the Quantity column
For demonstration purposes, let’s now apply the mode() function to all the columns,
as shown in figure 13, and execute it. For the Year column the mode is 2007; there
is no other mode, so the remaining cells in that column show NaN. The Pct and
Quantity columns have no repeated values, so all of their values are shown. The
Price column mode is 12090, so it appears in the first cell and the other cells in
that column are NaN.
Figure 13: Trying to find the mode of all the columns
Let’s assume that there are no repeated values in the Quantity column, and then
execute the code to calculate the mode of the Quantity column. As shown in
figure 14, it outputs all the values in that column, as there is no single mode.
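The three mode cases described in this section can be sketched together on hypothetical sample rows. The values 2884, 2007 and 12090 mirror the ones mentioned above; everything else is made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows; Quantity has no repeated values here.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Ford", "Nissan", "BMW"],
    "Year": [2007, 2007, 2008, 2009, 2010],
    "Pct": [5.1, 4.3, 3.8, 2.9, 1.7],
    "Quantity": [2884, 1885, 1833, 1300, 950],
    "Price": [12090, 12090, 15500, 18250, 30100],
})

# Case 1: change multiple Quantity cells to 2884, then take the mode
repeated = df.copy()
repeated.loc[[1, 2], "Quantity"] = 2884
quantity_mode = repeated["Quantity"].mode()  # a Series holding 2884

# Case 2: no repeated values, so every value is returned (sorted)
unique_mode = df["Quantity"].mode()

# Case 3: mode of all columns; shorter columns are padded with NaN
all_modes = df.mode()

print(quantity_mode)
print(unique_mode)
print(all_modes)
```

Note that mode() returns a Series (or a data frame) rather than a single number, because a data set can have more than one mode.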
In the same way as the other functions are used, the std() function can be used to
find the standard deviation.
In order to calculate the standard deviation of the Quantity column, specify the
data frame, then the column name, and lastly the std() function, as shown in
figure 16.
Figure 16: Getting the std of the Quantity column
The column-wise std can be obtained if we remove the column name. This is shown
in figure 17.
Figure 17: Obtaining the column-wise std
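A minimal sketch of both std calls on hypothetical sample rows is given below. By default pandas computes the sample standard deviation (ddof=1), so the result is compared here with statistics.stdev from the standard library. The numeric_only=True argument is an assumption made so that the string column is skipped on recent pandas versions.

```python
import statistics
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Ford", "Nissan", "BMW"],
    "Year": [2007, 2007, 2008, 2009, 2010],
    "Quantity": [2884, 1885, 1833, 1300, 950],
    "Price": [12090, 12090, 15500, 18250, 30100],
})

# Standard deviation of a single column (sample std, ddof=1)
std_quantity = df["Quantity"].std()

# Column-wise standard deviation of the numeric columns
column_stds = df.std(numeric_only=True)

# Reference value from the standard library, also a sample std
reference = statistics.stdev(df["Quantity"].tolist())

print(std_quantity, reference)
print(column_stds)
```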
8 Getting the information
There are two functions that can be used to obtain the statistical or concise
summary of the data frame. They are the describe() and info() functions.
Rather than obtaining the mean, median, std, etc. column by column using the relevant
functions, the describe() function can be used. It gives a summary of the count,
mean, std, min, max and percentile values (the 50% row is the median) of each
numeric column, as shown in figure 18.
Figure-18
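As a sketch, describe() on hypothetical sample rows produces one row per statistic and one column per numeric column. The string column Make is left out by default.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Ford", "Nissan", "BMW"],
    "Year": [2007, 2007, 2008, 2009, 2010],
    "Quantity": [2884, 1885, 1833, 1300, 950],
    "Price": [12090, 12090, 15500, 18250, 30100],
})

# Statistical summary of the numeric columns
summary = df.describe()
print(summary)

# Individual statistics can be read back from the summary
median_quantity = summary.loc["50%", "Quantity"]
```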
8.2 info function
The info() function can be used to get a summary of index and column data types,
non-null values, and the memory usage as shown in figure 19.
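A minimal sketch of info() on hypothetical sample rows follows. info() prints its summary rather than returning it, so the example writes it to a text buffer through the buf parameter in order to keep the text in a variable. Calling df.info() with no arguments prints the same summary directly.

```python
import io
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Ford", "Nissan", "BMW"],
    "Year": [2007, 2007, 2008, 2009, 2010],
    "Quantity": [2884, 1885, 1833, 1300, 950],
    "Price": [12090, 12090, 15500, 18250, 30100],
})

# Write the concise summary to a buffer instead of the console
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

print(info_text)
```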