
AI - ML Sessions :

Date : 09/06/2021

Descriptive Statistics

Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that,
for example, patterns might emerge from the data.

• Descriptive statistics summarizes or describes the characteristics of a data set.

• Descriptive statistics consists of two basic categories of measures: measures of central tendency and measures of variability (or
spread).

[Diagram: Descriptive Statistics splits into two branches, Measures of Central Tendency and Measures of Dispersion/Variance.]

• Measures of central tendency describe the center of a data set.
• Measures of variability or spread describe the dispersion of data within the set.

Measures of Central Tendency

A measure of central tendency is a one-number summary that typically describes the center of the data. There are three such
measures.

Mean : The mean is the ratio of the sum of all observations in the data to the total number of observations. It is also known
as the average. The mean is thus a number around which the entire data set is spread.
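
The definition above translates directly into code. The following is a minimal sketch; the sample values are hypothetical and chosen only for illustration.

```python
# Hypothetical sample data (not from the session), used only for illustration.
observations = [4, 8, 15, 16, 23, 42]

# Mean = sum of all observations / total number of observations.
mean = sum(observations) / len(observations)

print(mean)  # 18.0
```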

Median : The median is the point that divides the sorted data into two equal halves: half of the observations are less than or equal
to the median, and the other half are greater than or equal to it. It is calculated by first arranging the data in either ascending or
descending order.

• If the number of observations is odd, the median is the middle observation in the sorted data.
• If the number of observations is even, the median is the mean of the two middle observations in the sorted data.
• Note that the sort order (ascending or descending) does not affect the median.
• The median is also known as the 50th percentile, or P50.
• The median is robust to outliers: extreme values have little or no effect on it.
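
The odd and even rules can be sketched as a small function. This is one possible illustration, not a reference implementation; the function name `median` and the sample lists are assumptions made for the example.

```python
def median(values):
    """Return the median by sorting and picking the middle observation(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single middle observation.
        return ordered[middle]
    # Even count: mean of the two middle observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical sample data, used only for illustration.
odd_data = [7, 1, 5, 3, 9]        # sorted: 1 3 5 7 9
even_data = [7, 1, 5, 3]          # sorted: 1 3 5 7
with_outlier = [7, 1, 5, 3, 900]  # same as odd_data, but 9 replaced by 900

print(median(odd_data))      # 5
print(median(even_data))     # 4.0
print(median(with_outlier))  # 5
```

In this sample, replacing 9 with 900 leaves the median at 5, while the mean moves from 5.0 to 183.2.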

Mode : The mode is the value with the highest frequency in the data set; in other words, the value that appears the most times.
A data set can have one mode or more than one.

• If only one value appears the maximum number of times, the data has one mode and is called Uni-modal.
• If two values appear the maximum number of times, the data has two modes and is called Bi-modal.
• If more than two values appear the maximum number of times, the data has more than two modes and is called Multi-modal.
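
The three cases can be illustrated by counting frequencies. The sketch below uses `collections.Counter` from the standard library; the helper names `modes` and `modality` and the sample lists are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that appears the maximum number of times."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


def modality(values):
    """Label the data by its number of modes, following the list above."""
    k = len(modes(values))
    if k == 1:
        return "Uni-modal"
    if k == 2:
        return "Bi-modal"
    return "Multi-modal"


# Hypothetical sample data, used only for illustration.
print(modes([1, 2, 2, 3]), modality([1, 2, 2, 3]))        # [2] Uni-modal
print(modes([1, 1, 2, 2, 3]), modality([1, 1, 2, 2, 3]))  # [1, 2] Bi-modal
print(modes([1, 1, 2, 2, 3, 3, 4]))                       # [1, 2, 3]
```

On Python 3.8 or later, the standard library's `statistics.multimode` returns the same set of values, in order of first appearance rather than sorted.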

Type of Variable               Best Measure of Central Tendency
-----------------------------  --------------------------------
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median
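
To make the table concrete, the sketch below applies the suggested measure to one small example per variable type, using Python's standard `statistics` module. All data values are hypothetical, and encoding ordinal categories as ranks is just one possible approach.

```python
import statistics

# Nominal: categories without an order, so only the mode is meaningful.
blood_groups = ["A", "O", "B", "O", "AB", "O", "A"]
print(statistics.mode(blood_groups))  # O

# Ordinal: ordered categories. Encode each one as a rank, take the
# median rank, and map it back to its label.
levels = ["low", "medium", "high"]
ratings = ["low", "high", "medium", "medium", "high"]
ranks = [levels.index(r) for r in ratings]
print(levels[statistics.median_low(ranks)])  # medium

# Interval/ratio, skewed: one very large value pulls the mean upward,
# while the median stays near the bulk of the data.
salaries = [30, 32, 35, 38, 40, 400]
print(statistics.mean(salaries))    # about 95.8
print(statistics.median(salaries))  # 36.5
```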

NumPy:

NumPy is the fundamental package for scientific computing in Python. At the core of the NumPy package is the ndarray object, which
encapsulates n-dimensional arrays of homogeneous data types.

• The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory.
• NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of
an ndarray will create a new array and delete the original.
• Operations on arrays are applied element by element.
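
The properties listed above can be observed in a short session. This is a minimal sketch that assumes NumPy is installed; the array contents are arbitrary.

```python
import numpy as np

# Homogeneous data type: mixing ints and a float yields one common dtype.
a = np.array([1, 2, 3.5])
print(a.dtype)     # float64
print(a.itemsize)  # 8 (bytes per element, the same for every element)

# Fixed size: np.append does not grow the array in place. It returns
# a new array and leaves the original unchanged.
b = np.array([1, 2, 3])
c = np.append(b, 4)
print(b.size, c.size)  # 3 4
print(c is b)          # False

# Element-by-element operation: no explicit loop is written.
print(b * 2)  # [2 4 6]
```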

Vectorization and Broadcasting

Vectorization : Vectorization describes the absence of any explicit looping, indexing, etc., in the code. These steps still take
place, but behind the scenes in optimized, pre-compiled C code.

Broadcasting : Broadcasting is the term used to describe the implicit element-by-element behaviour of operations. Generally speaking,
all NumPy operations (not just arithmetic, but also logical, bit-wise, functional, etc.) behave in this implicit element-by-element
fashion.
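
The two ideas can be compared side by side. The sketch below assumes NumPy is installed; the variable names and values are hypothetical. It also shows the common case where broadcasting combines arrays of different shapes.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Without vectorization: an explicit Python loop over the elements.
totals_loop = []
for p, q in zip(prices, quantities):
    totals_loop.append(p * q)

# Vectorized: the same computation with no explicit loop or indexing.
totals_vec = prices * quantities
print(totals_vec)  # [10. 40. 90.]

# Broadcasting a scalar: 0.5 is applied to every element.
print(prices * 0.5)  # [ 5. 10. 15.]

# Broadcasting different shapes: a (3, 1) column plus a (3,) row
# gives a (3, 3) grid of pairwise sums.
grid = quantities.reshape(3, 1) + quantities
print(grid.shape)  # (3, 3)

# Logical operations are element by element as well.
mask = prices > 15
print(mask)  # [False  True  True]
```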
