
# Summarizing and Visualizing a Data Set

## Arun Kumar, Ravindra Gokhale, and Vinaysingh Chawan

Indian Institute of Management Indore

## Rubik's Cube Data Set

Rubik's Cube puzzle-solving contests are held in different countries every year. This data set contains the record (minimum) solving time for thirty-three countries.

Types of Data

Quantitative Data: Data for which arithmetic operations make sense. Ex: Age, Salary, Length.

Categorical Data: Data obtained by placing individuals into different categories. Ex: Gender, States of a country.

Visualization

## *Discuss Rubik's Cube data

Interpreting a Histogram

Shape: symmetric, skewed, unimodal, bimodal

Center: mean, median

Spread: range, standard deviation, interquartile range
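A histogram is built by splitting the number line into equal-width bins and counting how many observations fall in each one. The sketch below is a minimal text-only illustration, assuming bins of the form [left, left + width) with the last bin also including its right edge; the function names and the values in `data` are hypothetical and are not the Rubik's Cube data.

```python
def histogram_counts(data, start, width, num_bins):
    """Count observations in equal-width bins [start + k*width, start + (k+1)*width).

    The last bin also includes its right edge so the maximum value is counted.
    """
    upper = start + num_bins * width
    counts = [0] * num_bins
    for x in data:
        if x < start or x > upper:
            raise ValueError(f"{x} lies outside [{start}, {upper}]")
        # Index of the bin that x falls into (right edge goes to the last bin).
        k = min(int((x - start) // width), num_bins - 1)
        counts[k] += 1
    return counts


def print_histogram(data, start, width, num_bins):
    """Print one row of '*' per bin; the row length is the bin's count."""
    for k, c in enumerate(histogram_counts(data, start, width, num_bins)):
        left = start + k * width
        print(f"[{left:>3}, {left + width:>3})  {'*' * c}")


# Hypothetical values for illustration only (not the Rubik's Cube data).
data = [6, 7, 7, 8, 8, 8, 9, 9, 10, 12, 15]
print_histogram(data, start=5, width=2, num_bins=6)
```

The counts rise to a single peak and then trail off slowly to the right, which is what a unimodal, right-skewed shape looks like.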

## Measure of the Central Tendency of a Data Set

Mean: If we have a data set $x_1, \ldots, x_n$, then the mean of the data set is $\dfrac{x_1 + \cdots + x_n}{n}$.

Notation: $\bar{x}$
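The formula above translates directly into code: add up the observations and divide by how many there are. This is a minimal sketch; the function name `mean` is our own choice, and the data are the small example values used elsewhere in these slides.

```python
def mean(data):
    """Arithmetic mean: (x_1 + ... + x_n) / n."""
    if not data:
        raise ValueError("mean requires at least one observation")
    return sum(data) / len(data)


print(mean([0, 5, 1, 1, 3]))  # 10 / 5 = 2.0
```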

Mean: Example

## Measure of the Central Tendency of a Data Set

Median: The middle number in a sorted data set. When the number of observations (sample size) is even, there are two middle numbers; in that case, we take the average of the two middle numbers to obtain the median.
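The rule for the median (sort, then take the middle value, or the average of the two middle values when n is even) can be sketched as follows. The function name `median` and the even-sized data set `[4, 1, 3, 2]` are illustrative choices, not taken from the slides.

```python
def median(data):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]  # odd n: single middle value
    return (xs[mid - 1] + xs[mid]) / 2  # even n: average the two middle values


print(median([0, 5, 1, 1, 3]))  # sorted: 0, 1, 1, 3, 5 -> 1
print(median([4, 1, 3, 2]))     # sorted: 1, 2, 3, 4 -> (2 + 3) / 2 = 2.5
```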

Notation: $\tilde{x}$

Median: Example 1

## Calculate the median of 0,5,1,1,3.

Median: Example 2

## Measure of the Central Tendency of a Data Set

Mode: The observation in the data set with the largest frequency. Note that a data set can have more than one mode.
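Because a data set can have several modes, a sketch that returns *all* values tied for the largest frequency is more faithful to the definition than one that returns a single value. This uses `collections.Counter` from the Python standard library; the function name `modes` and the second data set are illustrative. (Python 3.8+ also provides `statistics.multimode`, which behaves similarly.)

```python
from collections import Counter


def modes(data):
    """Return every value whose frequency equals the largest frequency."""
    counts = Counter(data)
    if not counts:
        raise ValueError("mode requires at least one observation")
    top = max(counts.values())
    # Keep all values tied for the highest count (order of first appearance).
    return [x for x, c in counts.items() if c == top]


print(modes([0, 5, 1, 1, 3]))  # [1]
print(modes([2, 2, 7, 7, 4]))  # [2, 7]: two modes
```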

Mode: Example

## Calculate the mode of 0,5,1,1,3.

Effect of an Outlier

## Calculate mean, median, and mode of 0,5,1,1,3,100.
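One way to see the effect of the outlier is to compute all three measures with and without the value 100. This sketch uses the standard-library `statistics` module (`mean`, `median`, `multimode`); the variable names are our own.

```python
import statistics

without_outlier = [0, 5, 1, 1, 3]
with_outlier = without_outlier + [100]

for label, data in [("without 100", without_outlier), ("with 100", with_outlier)]:
    print(
        f"{label:12s} mean = {statistics.mean(data):6.2f}  "
        f"median = {statistics.median(data):4.1f}  "
        f"mode = {statistics.multimode(data)}"
    )
```

The mean jumps from 2 to about 18.33, the median only moves from 1 to 2 (the average of the middle values 1 and 3), and the mode stays at 1. The mean is pulled strongly toward the outlier, while the median and mode are much less sensitive.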

Effect of an Outlier

## Unit of Mean, Median, and Mode

Mean, median, and mode have the same unit as the data.

Symmetric:

Left skewed:

Right skewed:

Range: maximum − minimum

## Measure of the Spread of a Data Set

Variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Standard deviation:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

## Variance and Standard Deviation: Example

Calculate variance and standard deviation of 3,3,3,3,3.
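A minimal sketch of the spread measures, assuming the sample (n − 1) denominator for the variance; dividing by n instead gives the population variance. For comparison, Python's `statistics.variance` and `statistics.stdev` use n − 1, while `statistics.pvariance` and `statistics.pstdev` use n. The function names below are our own.

```python
import math


def sample_variance(data):
    """s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


def sample_std(data):
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(data))


def data_range(data):
    """Range: maximum minus minimum."""
    return max(data) - min(data)


print(sample_variance([3, 3, 3, 3, 3]), sample_std([3, 3, 3, 3, 3]))  # 0.0 0.0
print(sample_variance([0, 5, 1, 1, 3]), sample_std([0, 5, 1, 1, 3]))  # 4.0 2.0
```

When every observation equals the mean, every deviation is zero, so both the variance and the standard deviation are 0. For 0, 5, 1, 1, 3 the mean is 2, the squared deviations are 4, 9, 1, 1, 1 (sum 16), and 16 / 4 = 4.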

## Unit of Variance and Standard Deviation

Standard deviation has the same unit as the data, but the unit of variance is the square of the unit of the data.

Standard Deviation

Quartiles

Notation: Q1

Quartiles

Notation: Q3

Exercise

Quartiles

## Interquartile Range (IQR): Q3 − Q1

*IQR is a robust measure of spread: it is not affected much by skewness or outliers.
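The following sketch computes Q1, Q3, and the IQR, assuming the "median of each half" convention: Q1 is the median of the lower half of the sorted data and Q3 the median of the upper half, with the overall median left out of both halves when n is odd. Other conventions (for example `statistics.quantiles` in Python's standard library) can give slightly different quartiles for small data sets. The function names are our own.

```python
def median(xs):
    """Median of a list (sorted internally)."""
    xs = sorted(xs)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2


def quartiles(data):
    """(Q1, Q3) as medians of the lower and upper halves of the sorted data."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("quartiles need at least two observations")
    lower = xs[: n // 2]          # excludes the middle value when n is odd
    upper = xs[(n + 1) // 2 :]
    return median(lower), median(upper)


def iqr(data):
    q1, q3 = quartiles(data)
    return q3 - q1


small = [0, 5, 1, 1, 3]
big = small + [100]
for d in (small, big):
    print(quartiles(d), "IQR =", iqr(d), "range =", max(d) - min(d))
```

Adding the outlier 100 stretches the range from 5 to 100, but the IQR only moves from 3.5 to 4, which illustrates why the IQR is called robust.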

Exercise

## Five-Number Summary: Minimum, First Quartile, Median, Third Quartile, Maximum
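The five numbers listed above are exactly what a box plot draws. A self-contained sketch, again assuming the "median of each half" quartile convention (an assumption, since other quartile rules exist); the function names are our own.

```python
def _median(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2


def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum) of the data."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    q1 = _median(xs[: n // 2])
    q3 = _median(xs[(n + 1) // 2 :])
    return xs[0], q1, _median(xs), q3, xs[-1]


print(five_number_summary([0, 5, 1, 1, 3, 100]))  # (0, 1, 2.0, 5, 100)
```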

Boxplot

*We will create a box plot for the Rubik's Cube data set.

## Interpreting a Box Plot

Shape:

Outliers: Any observation outside the range [Q1 − 1.5 × IQR, Q3 + 1.5 × IQR] is considered an outlier (informal rule).
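The 1.5 × IQR rule can be sketched as follows. Quartiles here use the "median of each half" convention, which is an assumption; a different quartile rule can shift the fences slightly. The function name `iqr_outliers` is our own.

```python
def _median(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2


def iqr_outliers(data):
    """Observations outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    q1 = _median(xs[: n // 2])
    q3 = _median(xs[(n + 1) // 2 :])
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [x for x in data if x < low or x > high]


print(iqr_outliers([0, 5, 1, 1, 3, 100]))  # [100]
```

For 0, 5, 1, 1, 3, 100 this gives Q1 = 1, Q3 = 5, IQR = 4, and fences [−5, 11], so only 100 is flagged.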

Example

*Bar Chart

*Pie Chart
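Bar charts and pie charts both start from a frequency table of a categorical variable: bar heights show the counts, and pie slices show each category's share of the total. The sketch below builds that table with `collections.Counter`; the `responses` list is a hypothetical sample, and the text bars stand in for a real chart.

```python
from collections import Counter


def frequency_table(categories):
    """(category, count, percent) rows, most frequent category first."""
    counts = Counter(categories)
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]


responses = ["Female", "Male", "Female", "Female", "Male"]  # hypothetical sample
for cat, n, pct in frequency_table(responses):
    # Count -> bar height; percent -> pie slice share (slice angle = 3.6 * pct degrees).
    print(f"{cat:8s} {n:3d} {pct:5.1f}%  {'#' * n}")
```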