Aser Unit I Basics of Statistics

26-09-2023
PART 1
BASICS OF STATISTICS

DATA_SCIENCE_2019_20

RANDOM VARIABLE
A variable which can take different values is called a random variable. It is generally denoted by capital letters: A, B, ..., X, Y, Z, ...
Values taken by X are either numbers (numeric/quantitative data) or non-numeric (qualitative data).
Examples of Numeric Data
E.g. X = time taken to deliver a particular order,
Y = shipping record of the time from receipt of an order to its delivery,
Z = scores received by employees in a performance test,
A = time to failure in a test conducted on a motherboard,
B = business per employee in a public sector bank,
X = return on assets in a private bank.
Examples of Non-Numeric Data
E.g. X = gender of an employee,
Y = graduate stream of a candidate,
Z = specialization offered by MMS students,
A = sector to which a particular industry belongs,
B = names of states in government data,
X = names of car models in an automobile industry,
... etc.

NOTE:
In a given data set there can be a combination of numeric and non-numeric data.
For example:

No.  CGPA  UG Qualification  Specialisation  Work Experience  Age (in years)
1    3.24  B.Com.            Finance         0                23
2    3.14  B.Sc.             HR              1                21
3    3.72  BAF               Finance         2                23
4    3.06  B.E.              Systems         4                21
5    3.14  BMS               HR              7                22
6    3.14  CA                Finance         2                23
7    3.06  B.A. Economics    Operations      0                22
8    3.17  B.Sc.(IT)         Systems         3                21
9    2.97  BCA               Systems         2                22
10   3.14  B.Com.            Finance         0                23
11   3.69  BMS               Marketing       3                24
12   3.85  B.E.              Operations      0                25
13   3.92  BCA               Systems         0                23
Name                      Wealth in Crores  Sector
Ratan Tata                125674.12         Large diversified
P. R. S. Oberoi           183739.00         Hospitality
Azim H. Premji            64855.27          Software
Mukesh Ambani             56414.35          Petrochemicals
Sunil Mittal & Family     35558.22          Telecom
Anil Ambani               34993.98          Large diversified
Tulsi R. Tanti & Family   26139.69          Wind energy
Anil Agarwal              18108.75          Metals
Shiv Nadar                16698.47          Software & hardware
Kumarmangalam Birla       16643.04          Large diversified
Rahul Bajaj               12455.99          Auto
Dilip S. Shanghvi         10584.49          Pharmaceuticals
Baba Kalyani              7857.83           Auto components

Types of Data
Two types of data:
(1) Ungrouped data is data given in the form of scattered values:
X   x1  x2  x3  x4  ...  xn
(2) Grouped data is data consisting of values or class intervals along with their frequencies:
X              x1  x2  x3  x4  x5  x6  ...  xn
F (frequency)  f1  f2  f3  f4  f5  f6  ...  fn
(Here frequency = number of times a particular value repeats)
OR
Class Intervals  0 – 10  10 – 20  20 – 30  30 – 40  40 – 50  50 – 60  60 – 70
F (Frequency)    23      12       5        67       52       34       19
(Here frequency = number of observations/data points/values falling in a particular interval)
DATA ARRAY

Let X be a random variable.
Given data values of X as: x1, x2, x3, ..., xn.
The arrangement of these values in either ascending or descending order is called a "Data Array".
Example: Given the data
200 156 231 222 96 289 126 308
The data array is:
96 126 156 200 222 231 289 308

Frequency Distribution

To prepare a frequency distribution of n given values with k class intervals, proceed as follows:
Step I: Find the minimum (min) and maximum (max) value of the given data.
Step II: Find the width of each of the k class intervals using Width = w = (max – min)/k, or take a suitable uniform width slightly smaller or larger than this value.
Step III: Choose a proper lower limit for the 1st class interval (everything depends on this) and formulate the k class intervals of width w so as to include all the data values.
Step IV: Use the tally mark method to find the data values lying in each class interval. The number of tally marks against each class interval gives the frequency of that class interval.
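The data array and Steps I to IV can be sketched in Python as follows. This is only an illustration, not part of the original slides: the function names, the choice of k = 4, the lower limit 90 and the rounded width 60 are assumptions made for this example, and the classes are treated as exclusive type (lower limit included, upper limit excluded).

```python
def data_array(values, descending=False):
    """Arrange the values in ascending (or descending) order."""
    return sorted(values, reverse=descending)


def frequency_distribution(values, lower, width, k):
    """Count the values falling in k exclusive-type classes [L, U)."""
    counts = [0] * k
    for v in values:
        index = int((v - lower) // width)  # which class the value falls in
        if not 0 <= index < k:
            raise ValueError(f"{v} is not covered by the class intervals")
        counts[index] += 1
    return [((lower + i * width, lower + (i + 1) * width), counts[i])
            for i in range(k)]


values = [200, 156, 231, 222, 96, 289, 126, 308]
array = data_array(values)

# Step II: raw width for k = 4 classes; rounded up to 60 below (assumption).
raw_width = (max(values) - min(values)) / 4

# Steps III and IV: lower limit 90 (assumption), then count per class.
table = frequency_distribution(values, lower=90, width=60, k=4)
for (low, high), freq in table:
    print(f"{low} - {high}: {freq}")
```

Choosing the width slightly larger than (max – min)/k is what lets the maximum value fall inside the last exclusive-type class.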
Types of class intervals

Each class interval has a lower limit (L) and an upper limit (U).
Two types of class intervals:
(1) Exclusive type:
Example: 0 – 10, 10 – 20, 20 – 30, 30 – 40, 40 – 50
Here, the lower limit (L) is included but the upper limit (U) is not included.
(2) Inclusive type:
Example: 0 – 5, 6 – 11, 12 – 17, 18 – 23, 24 – 29
Here, both the lower limit (L) and the upper limit (U) are included in the class interval.
Convention: We will use the exclusive type of class interval unless it is specifically mentioned or asked to use the inclusive type.

Cumulative Frequency

Given a frequency distribution (grouped data) with class intervals,
"Cumulative frequency" of a particular class interval
= cumulative freq. of the previous class + freq. of that class
Consider an example:

Class intervals   FREQ  Cumulative Frequency
0 - 10            27    27
10 - 20           12    39
20 - 30           8     47
30 - 40           6     53
40 - 50           2     55
50 - 60           0     55
Total Frequency   55
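The running-total rule for cumulative frequency can be checked with a few lines of Python. This is a minimal sketch using the frequencies from the example table; the variable names are chosen here for illustration.

```python
from itertools import accumulate

classes = ["0 - 10", "10 - 20", "20 - 30", "30 - 40", "40 - 50", "50 - 60"]
freqs = [27, 12, 8, 6, 2, 0]

# Each cumulative frequency = previous cumulative frequency + own frequency.
cumulative = list(accumulate(freqs))

for label, f, cf in zip(classes, freqs, cumulative):
    print(f"{label:8} {f:3} {cf:3}")
```

The last cumulative frequency always equals the total frequency, which is a quick consistency check on a hand-built table.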
Relative Frequency

Given a frequency distribution (grouped data) with class intervals,
"Relative frequency" of a particular class interval
= Freq. of that class / Total Frequency

Class intervals  FREQ  Relative Frequency
0 - 10           27    27/55 = 0.490909091
10 - 20          12    12/55 = 0.218181818
20 - 30          8     8/55 = 0.145454545
30 - 40          6     6/55 = 0.109090909
40 - 50          2     2/55 = 0.036363636
50 - 60          0     0/55 = 0
Total            55    1

Finding Frequency Distribution Using Excel

Given ungrouped data with n values:
Step 1: Find the min, max, and width/number of class intervals. Formulate the class intervals to include all the given data values.
Step 2: Find the bin value of every class. Generally it is 1 unit point less than the upper limit of the class interval.
Step 3: Use the function FREQUENCY(range, bin value) to find the cumulative frequency of each class = number of data points <= bin value.
Range = column range in which the values are stored.
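The relative frequency column can be reproduced with a short Python sketch. It uses the frequencies from the table; the names are illustrative only.

```python
freqs = [27, 12, 8, 6, 2, 0]
total = sum(freqs)

# Relative frequency = frequency of the class / total frequency.
relative = [f / total for f in freqs]

for f, r in zip(freqs, relative):
    print(f"{f}/{total} = {r:.9f}")
```

The relative frequencies sum to 1 (up to floating-point rounding), matching the Total row of the table.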
Finding Frequency Distribution Using Excel

Step 4: Find the frequency of each class using frequency = cumulative freq. of that class – cumulative freq. of the previous class, to get the frequency distribution table.
>freq_distribution.xlsx
>Case_3.1 CGPA.xls

Graphs and diagrams

• Histogram
Given a frequency distribution,
find the class intervals i1, i2, i3, ..., in for which x1, x2, ..., xn are midpoints.
The histogram is the graph of intervals i vs. frequency, where we draw rectangles of (height, width): (f1, i1), (f2, i2), ..., (fn, in).
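The counting logic behind the spreadsheet procedure (count the data points <= each bin value, then take differences) can be mirrored in Python. This is a sketch of the idea only, not a reproduction of any spreadsheet function; the helper name `count_at_most`, the sample values and the class limits are assumptions made for this example.

```python
def count_at_most(values, bin_value):
    """Number of data points <= bin_value (the cumulative count)."""
    return sum(1 for v in values if v <= bin_value)


values = [200, 156, 231, 222, 96, 289, 126, 308]

# Classes 90-150, 150-210, 210-270, 270-330 (assumed for this example).
upper_limits = [150, 210, 270, 330]

# Bin value = 1 unit less than the upper limit of each class.
bin_values = [u - 1 for u in upper_limits]

# Cumulative frequency of each class.
cumulative = [count_at_most(values, b) for b in bin_values]

# Frequency = cumulative freq. of that class - cumulative freq. of previous class.
frequencies = [cumulative[0]] + [
    cumulative[i] - cumulative[i - 1] for i in range(1, len(cumulative))
]

print(cumulative)
print(frequencies)
```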
Graphs and diagrams

Histogram
[Figure: histogram of Frequency against X]

• Ogive
Given a frequency distribution:
X          x1   x2   ...  xn
Freq (F)   f1   f2   ...  fn
CF         cf1  cf2  ...  cfn
It is a graph of X vs. cumulative frequencies.
Plot the points (x1, cf1), (x2, cf2), ..., (xn, cfn).
Join the points by straight lines/smooth curves.
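The coordinates needed for a histogram and an ogive can be computed before any plotting is done. The sketch below assumes equally spaced midpoints and uses illustrative function names; the midpoints and frequencies are sample values chosen for this example.

```python
from itertools import accumulate


def intervals_from_midpoints(midpoints):
    """Class intervals for which the given (equally spaced) x values are midpoints."""
    h = midpoints[1] - midpoints[0]  # common class width
    return [(x - h / 2, x + h / 2) for x in midpoints]


def ogive_points(xs, freqs):
    """Points (x_i, cf_i) to be joined by straight lines or a smooth curve."""
    return list(zip(xs, accumulate(freqs)))


midpoints = [5, 15, 25, 35, 45, 55]
freqs = [27, 12, 8, 6, 2, 0]

# Histogram: one rectangle per class, with height f_i over interval i_i.
rectangles = list(zip(freqs, intervals_from_midpoints(midpoints)))

# Ogive: X against cumulative frequency.
points = ogive_points(midpoints, freqs)

print(rectangles)
print(points)
```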
Ogive Curve
[Figure: ogive curve, cumulative frequency against X]

Frequency Polygon
[Figure: frequency polygon over the class intervals 2.5 - 2.7, 2.7 - 2.9, 2.9 - 3.1, 3.1 - 3.3, 3.3 - 3.5, 3.5 - 3.7, 3.7 - 3.9]
Summarization of Data

Central average:
Mean, Weighted Mean, Geometric Mean
Median
Mode
Other descriptors: Quartiles, Deciles, Percentiles
Measure of Dispersion:
Standard Deviation, Coefficient of variation

For Ungrouped Data

For ungrouped data:
(1) Mean $\bar{X} = \frac{\sum x}{n}$, Weighted Mean $= \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$,
Geometric Mean $= \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$, Combined Mean $\bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$
(2) Median: Arrange the data in ascending order.
If n is odd then median = middle data point = value at the $\left(\frac{n+1}{2}\right)$th position.
If n is even then there will be two middle data points m1, m2, and median = (m1 + m2)/2.
(3) Mode: If the data points are all distinct then it is not possible to find the mode. If the data points are repeating then the most frequently appearing data point is the mode.
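The measures of central tendency for ungrouped data can be sketched in plain Python. The function names are illustrative, the weights in the weighted mean are an assumed input, and returning None for data with all distinct values is a design choice made here to reflect the rule that no mode can be found in that case.

```python
import math
from collections import Counter


def mean(xs):
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


def geometric_mean(xs):
    # n-th root of the product x1 * x2 * ... * xn (positive values assumed).
    return math.prod(xs) ** (1 / len(xs))


def combined_mean(n1, mean1, n2, mean2):
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def median(xs):
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    if n % 2 == 1:
        return s[mid]                 # value at the (n + 1)/2-th position
    return (s[mid - 1] + s[mid]) / 2  # average of the two middle points


def mode(xs):
    value, count = Counter(xs).most_common(1)[0]
    return value if count > 1 else None  # None when all points are distinct


data = [200, 156, 231, 222, 96, 289, 126, 308]
print(mean(data), median(data), mode(data))
```

If several values tie for the highest count, this sketch returns only one of them.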
For Ungrouped Data

For ungrouped data:
(4) Standard Deviation (SD) $= S = \sqrt{\frac{\sum (x - \bar{X})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \bar{X}^2}$
(5) Variance $= V(X) = S^2 = \frac{\sum (x - \bar{X})^2}{n} = \frac{\sum x^2}{n} - \bar{X}^2$
(6) Coefficient of Variation (CV) $= \frac{\text{SD}}{\text{Mean}} = \frac{S}{\bar{X}}$
(7) Quartiles: $Q_1, Q_2, Q_3$ where
$Q_1$ = data point appearing at the $\left[\frac{n+1}{4}\right]$ position,
$Q_2$ = data point appearing at the $\left[\frac{2(n+1)}{4}\right]$ position,
$Q_3$ = data point appearing at the $\left[\frac{3(n+1)}{4}\right]$ position.
(8) Deciles: $D_1, D_2, \ldots, D_9$ where each
$D_i$ = data point appearing at the $\left[i\left(\frac{n+1}{10}\right)\right]$ position.
(9) Percentiles: $P_1, P_2, \ldots, P_{99}$ where each
$P_i$ = data point appearing at the $\left[i\left(\frac{n+1}{100}\right)\right]$ position.
(10) Range = Max – Min
(11) Inter Quartile Range $= Q_3 - Q_1$
Semi-Inter Quartile Range $= (Q_3 - Q_1)/2$
(12) Inter-fractile Range between the $i$th and $j$th fractile $= D_j - D_i$ (or $= P_j - P_i$)
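The dispersion measures and position-based fractiles for ungrouped data can be sketched as below. The divisor n (not n - 1) follows the formulas given for S. Linear interpolation for fractional positions and clamping to the first or last data point are assumptions added for this example, since the position rule alone only covers whole-number positions.

```python
import math


def variance(xs):
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def std_dev(xs):
    return math.sqrt(variance(xs))


def coefficient_of_variation(xs):
    return std_dev(xs) / (sum(xs) / len(xs))


def fractile(xs, i, parts):
    """Value at position i * (n + 1) / parts in the sorted data (1-based).

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    """
    s = sorted(xs)
    n = len(s)
    pos = i * (n + 1) / parts
    if pos <= 1:
        return s[0]
    if pos >= n:
        return s[-1]
    lower = int(pos)        # whole part of the position
    frac = pos - lower      # fractional part, interpolated (assumption)
    return s[lower - 1] + frac * (s[lower] - s[lower - 1])


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), std_dev(data), coefficient_of_variation(data))

sample = [1, 3, 5, 7, 9, 11, 13]
q1, q2, q3 = (fractile(sample, i, 4) for i in (1, 2, 3))
print(q1, q2, q3, "IQR =", q3 - q1, "Range =", max(sample) - min(sample))
```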
For Grouped Data

Given frequency distribution:
(1) Mean $\bar{X} = \frac{\sum fX}{\sum f}$, where X are the given values or midpoints of the given class intervals and f are the frequencies.
(2) To find median:
Find cumulative frequencies for the given frequency distribution.
Find $N = \sum f$ and N/2, then find the first cumulative frequency covering (N/2).
The class corresponding to this cumulative frequency is the median class $l_1 - l_2$.
Median $= M = l_1 + \left(\frac{m - cf}{f}\right)(l_2 - l_1)$
where $l_1 - l_2$ is the median class,
f = the frequency of the median class, cf = cumulative freq. of the class preceding the median class,
m = N/2
(3) To find mode:
Identify the highest frequency, say $f_1$.
The class corresponding to this highest frequency $f_1$ is called the modal class: $l_1 - l_2$.
If $f_0$ = frequency of the class preceding the modal class,
$f_1$ = frequency of the modal class,
$f_2$ = frequency of the class succeeding the modal class, then
Mode $= l_1 + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right)(l_2 - l_1)$
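The grouped-data mean, median and mode can be worked through in Python. The sketch below uses a sample distribution of seven classes of width 10; the function names are illustrative, and treating a missing neighbouring class as frequency 0 in the mode formula is an assumption made here.

```python
from itertools import accumulate

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [23, 12, 5, 67, 52, 34, 19]


def grouped_mean(classes, freqs):
    mids = [(l1 + l2) / 2 for l1, l2 in classes]
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


def grouped_median(classes, freqs):
    m = sum(freqs) / 2
    cums = list(accumulate(freqs))
    # First class whose cumulative frequency covers N/2 is the median class.
    idx = next(i for i, c in enumerate(cums) if c >= m)
    l1, l2 = classes[idx]
    cf = cums[idx - 1] if idx > 0 else 0
    return l1 + (m - cf) / freqs[idx] * (l2 - l1)


def grouped_mode(classes, freqs):
    idx = freqs.index(max(freqs))          # modal class
    l1, l2 = classes[idx]
    f1 = freqs[idx]
    f0 = freqs[idx - 1] if idx > 0 else 0  # assumption: 0 if no preceding class
    f2 = freqs[idx + 1] if idx < len(freqs) - 1 else 0
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * (l2 - l1)


print(grouped_mean(classes, freqs))
print(grouped_median(classes, freqs))
print(grouped_mode(classes, freqs))
```

Here N = 212 and N/2 = 106, so the median class is 30 - 40 (cumulative frequency 107), which is also the modal class (frequency 67).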
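For Grouped Data

Given frequency distribution:
(4) Standard Deviation (SD) $= S = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum f}} = \sqrt{\frac{\sum fX^2}{\sum f} - \bar{X}^2}$, where $\bar{X} = \frac{\sum fX}{\sum f}$
(5) Variance $V(X) = S^2 = \frac{\sum f(X - \bar{X})^2}{\sum f} = \frac{\sum fX^2}{\sum f} - \bar{X}^2$
(6) Coefficient of Variation (CV) $= \frac{\text{SD}}{\text{Mean}} = \frac{S}{\bar{X}}$
(7) Quartiles: $Q_1, Q_2, Q_3$ where
$Q_1 = l_1 + \left(\frac{m - cf}{f}\right)(l_2 - l_1)$, $Q_1$-class: $l_1 - l_2$, m = (N/4), $N = \sum f$
cf = cumulative freq. of the class preceding the $Q_1$-class
f = frequency of the $Q_1$-class
$Q_2 = l_1 + \left(\frac{m - cf}{f}\right)(l_2 - l_1)$, $Q_2$-class: $l_1 - l_2$, m = (N/2), $N = \sum f$
cf = cumulative freq. of the class preceding the $Q_2$-class
f = frequency of the $Q_2$-class
$Q_3 = l_1 + \left(\frac{m - cf}{f}\right)(l_2 - l_1)$, $Q_3$-class: $l_1 - l_2$, m = (3N/4), $N = \sum f$
cf = cumulative freq. of the class preceding the $Q_3$-class
f = frequency of the $Q_3$-class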
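Since the three quartile formulas differ only in the value of m, one helper can serve all of them. The sketch below is illustrative: the function names and the sample distribution are assumptions made for this example, and the variance uses the shortcut form (sum of fX^2 over N, minus the squared mean).

```python
import math
from itertools import accumulate


def grouped_variance(classes, freqs):
    n = sum(freqs)
    mids = [(l1 + l2) / 2 for l1, l2 in classes]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    return sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2


def grouped_fractile(classes, freqs, m):
    """l1 + ((m - cf) / f) * (l2 - l1) for the first class whose cf covers m."""
    cums = list(accumulate(freqs))
    idx = next(i for i, c in enumerate(cums) if c >= m)
    l1, l2 = classes[idx]
    cf = cums[idx - 1] if idx > 0 else 0
    return l1 + (m - cf) / freqs[idx] * (l2 - l1)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [23, 12, 5, 67, 52, 34, 19]
n_total = sum(freqs)

q1 = grouped_fractile(classes, freqs, n_total / 4)      # m = N/4
q2 = grouped_fractile(classes, freqs, n_total / 2)      # m = N/2 (median)
q3 = grouped_fractile(classes, freqs, 3 * n_total / 4)  # m = 3N/4

sd = math.sqrt(grouped_variance(classes, freqs))
print(q1, q2, q3, sd)
```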
Relationships

(1) Mean – Mode = 3 (Mean – Median)
(2) Relative Dispersion $= \left(\frac{S}{\bar{X}} \times 100\right)\%$
(3) Interpretations of Coefficient of Variation. With it we measure:
• Consistency (performance of cricketers)
• Disparity (per capita income of different states in India)
• Volatility / Risk (return on equity capital invested in some shares)
• Uniformity (workload on different counters in the banks, wages in different organizations)

Skewness of Data

Skewness is an indicator of lack of symmetry in data. Data can be "skewed", meaning it tends to have a long tail on one side or the other.
[Figure: Negatively Skewed, Normal, and Positively Skewed distributions]
Pearson's Measure of Skewness $= \frac{\text{Mean} - \text{Mode}}{\text{SD}} = \frac{3(\text{Mean} - \text{Median})}{\text{SD}}$
Bowley's coefficient of skewness $= \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$
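The skewness measures and relative dispersion reduce to simple arithmetic once the summary statistics are known. The following sketch uses illustrative function names and made-up input values, purely to show the formulas in use.

```python
def pearson_skewness_mode(mean, mode, sd):
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    # Uses the empirical relation Mean - Mode = 3 (Mean - Median).
    return 3 * (mean - median) / sd


def bowley_skewness(q1, q2, q3):
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def relative_dispersion_percent(sd, mean):
    return sd / mean * 100


# Illustrative inputs (not taken from any table in the slides).
print(pearson_skewness_mode(mean=10, mode=8, sd=4))
print(pearson_skewness_median(mean=10, median=9, sd=2))
print(bowley_skewness(q1=2, q2=5, q3=10))
print(relative_dispersion_percent(sd=2, mean=5))
```

A positive value indicates a longer tail on the right (positively skewed), a negative value a longer tail on the left, and a value near zero an approximately symmetric distribution.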
