Attribution Non-Commercial (BY-NC)


The main objective of statistical analysis is to represent the data by a single value that shows where the data are concentrated. Such a value is called the central value, and it makes comparison between two or more series easier than comparison of the raw data. Quantitative data, organized or unorganized, tend to concentrate at certain values, usually somewhere in the centre of the distribution. The various measures employed to describe this tendency are called measures of central tendency. Constructing a frequency distribution of the raw data is the first step towards condensing large data into a compact form. The next step is to condense the data into a single value. Such a single value is called an average. In most data the average is a centre of concentration of the values. Therefore, the average is called a measure of central tendency. The values of the data are clustered around the average, and it carries the important properties of the data. In that sense, it is representative of the distribution. Two famous statisticians, Yule and Kendall, laid down the following requirements for an ideal average:

1. It should be rigidly defined.
2. Its computation should be based on all observations.
3. It should lend itself to algebraic treatment.
4. It should be least affected by extreme observations.
5. It should be easy to calculate and simple to understand.
6. It should not be affected by fluctuations of sampling.

The commonly used measures of central tendency are:
1. Arithmetic mean (AM)
2. Median
3. Mode
4. Quartiles
5. Geometric mean
6. Harmonic mean
7. Weighted mean

1. AM : It is the best-known & most widely used measure of central tendency. It is the sum of all observations divided by the number of observations.

Mean = Sum of all observations / No. of observations

Symbolically, if $X_1, X_2, \dots, X_N$ are the values of a variable, the mean is computed by the formula

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

∑ is read as sigma
$\bar{X}$ = The mean of values
$X_i$ = Values of the variable
N = No. of values
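The definition above can be sketched in a few lines of Python. The function name `arithmetic_mean` and the sample data are illustrative assumptions, not part of the notes.

```python
def arithmetic_mean(values):
    """Return the sum of all observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty series is undefined")
    return sum(values) / len(values)


# Illustrative data, not taken from the notes.
observations = [4, 8, 15, 16, 23, 42]
print(arithmetic_mean(observations))  # 18.0
```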

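For a discrete frequency distribution:

Mean = Sum of (frequency × value) / Total frequencies

Symbolically, if $X_1, X_2, \dots, X_k$ are the values of a variable and $f_1, f_2, \dots, f_k$ are their corresponding frequencies, the mean is computed by the formula

$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \dots + f_k X_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} = \frac{1}{N}\sum_{i=1}^{k} f_i X_i, \qquad N = \sum_{i=1}^{k} f_i$$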
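As a rough illustration of the frequency-weighted formula, the sketch below multiplies each value by its frequency and divides by the total frequency. The name `frequency_mean` and the data are assumptions made for the example.

```python
def frequency_mean(values, freqs):
    """Mean of a discrete frequency distribution: sum(f * x) / sum(f)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total = sum(freqs)  # N, the total frequency
    if total <= 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total


# Illustrative data: the value 1 occurs twice, 2 three times, and so on.
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]
print(frequency_mean(values, freqs))  # 2.4
```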

When the values are large, the mean can be calculated from an assumed mean by the formula

$$\bar{X} = A + \frac{\sum_{i=1}^{k} f_i \, dx_i}{N}$$

Where A stands for assumed mean
$dx_i$ = deviations of $x_i$ values from assumed mean ($dx_i = x_i - A$)
f = frequencies
N = total frequencies

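For a grouped (continuous) frequency distribution, it is assumed that all the observations which fall in a given class are located at the mid-point of that class. This assumption holds good only when the number of observations is large.

Under this assumption we take $X_1, X_2, \dots, X_k$ as the mid-values of the class intervals and calculate the arithmetic mean as

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{N}, \quad \text{where } \sum f = N$$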

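Computation procedure :
Step I : Write all class intervals serially in the first column and the corresponding frequencies in the second column.
Step II : Add the lower and upper limits of each class interval, divide the resultant quantity by 2 & put these mid-values in the third column.
Step III : Multiply each mid-value by the corresponding frequency and put the products in the fourth column. The addition of this column gives ∑ f X.
Step IV : Mean = Sum of fourth column / Sum of second column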
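The column-by-column procedure for grouped data might be sketched as follows. The representation of a class as a `(lower, upper)` tuple, the function name `grouped_mean`, and the data are assumptions for illustration only.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data, treating every observation as its class mid-value."""
    if len(classes) != len(freqs):
        raise ValueError("classes and freqs must have the same length")
    # Third column: mid-value of each class interval.
    mid_values = [(lower + upper) / 2 for lower, upper in classes]
    # Fourth column: frequency times mid-value.
    products = [f * x for f, x in zip(freqs, mid_values)]
    return sum(products) / sum(freqs)


# Illustrative class intervals and frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 10, 10, 5]
print(grouped_mean(classes, freqs))  # 20.0
```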

If the values of the variable are large in size, the calculation can be simplified by using the short-cut method.
Symbolically, $\bar{X} = A + \bar{d}$
Step – I Choose any value from the data, which is called the assumed mean (A).
Step – II Take the difference between each mid-value and the assumed mean, known as the deviation (d).
Step – III Multiply each d by the corresponding f.
Step – IV Calculate $\bar{d}$ by using the formula $\bar{d} = \sum f d / N$.
Step – V Use the formula $\bar{X} = A + \bar{d}$ to find the mean of the original data.
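The short-cut steps can be checked numerically. The sketch below is one possible reading of them, with `shortcut_mean` and the data being hypothetical; it also confirms that the result agrees with the direct calculation whatever assumed mean is chosen.

```python
def shortcut_mean(mid_values, freqs, assumed_mean):
    """Mean via the short-cut method: assumed mean plus the mean deviation from it."""
    total = sum(freqs)  # N, the total frequency
    # Steps II and III: deviations from the assumed mean, weighted by frequency.
    weighted_deviations = [f * (x - assumed_mean) for x, f in zip(mid_values, freqs)]
    # Step IV: mean of the deviations.
    d_bar = sum(weighted_deviations) / total
    # Step V: shift back by the assumed mean.
    return assumed_mean + d_bar


mid_values = [5, 15, 25, 35]   # illustrative class mid-values
freqs = [5, 10, 10, 5]
print(shortcut_mean(mid_values, freqs, assumed_mean=25))  # 20.0

# The direct formula gives the same answer.
direct = sum(f * x for x, f in zip(mid_values, freqs)) / sum(freqs)
print(direct)  # 20.0
```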

Merits of AM :-
1. It is rigidly defined.
2. It is easy to calculate & understand.
3. It is based upon all the observations.
4. It is capable of further mathematical treatment.
5. It is least affected by sampling fluctuations.

Demerits of AM :-
1. It is used only for quantitative data; the mean cannot be calculated for qualitative data like caste, religion and sex.
2. It is unduly affected by extreme observations.
3. It cannot be calculated when the frequency distribution has open-end classes.
4. Sometimes the AM may not be an observation in the data.
5. It cannot be determined graphically.

Combined mean of two groups:

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

where $\bar{x}_1, \bar{x}_2$ are the means of the first and second groups, with sizes $n_1, n_2$ respectively.
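A small sketch of the combined mean of two groups, weighting each group mean by its size. The function name `combined_mean` and the figures are illustrative assumptions.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Mean of two groups pooled together: (n1*mean1 + n2*mean2) / (n1 + n2)."""
    if n1 + n2 <= 0:
        raise ValueError("the combined group must not be empty")
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Illustrative figures: 40 observations averaging 50 and 60 averaging 60.
print(combined_mean(40, 50, 60, 60))  # 56.0
```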

Median:-
Since the AM cannot be calculated for a distribution with open-end classes or for qualitative variables like honesty, sex, religion etc., we use other measures of central tendency like the median.

Definition:-
The median may be defined as the central value of a variable when the values are arranged in order of magnitude, i.e., either in ascending or in descending order. The median divides the series into two equal parts: 50% of the observations will be smaller than the median while 50% of the observations will be larger than it.

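For ungrouped data, with the n observations arranged in order:

$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation, if } n \text{ is odd}$$

$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ observation}}{2}, \text{ if } n \text{ is even}$$

For grouped data:

$$\text{Median} = L_1 + \frac{\frac{N}{2} - cf}{f} \times h$$

L1 = Lower limit of median class
L2 = Upper limit of median class
f = Frequency of median class
cf = Cumulative frequency of the pre-median class
h = L2 – L1 class width
N = Total frequency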
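Both median rules can be sketched in Python. The names `median_ungrouped` and `median_grouped`, the `(lower, upper)` class representation, and the data are assumptions for illustration; the grouped version uses the interpolation formula $L_1 + \frac{N/2 - cf}{f} \times h$.

```python
def median_ungrouped(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                     # ((n + 1) / 2)th observation
    return (data[mid - 1] + data[mid]) / 2   # mean of (n/2)th and (n/2 + 1)th


def median_grouped(classes, freqs):
    """Median of grouped data by interpolation inside the median class."""
    total = sum(freqs)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= half:  # first class whose cumulative frequency reaches N/2
            return lower + (half - cf) / f * (upper - lower)
        cf += f


print(median_ungrouped([7, 3, 9, 1, 5]))  # 5
print(median_ungrouped([7, 3, 9, 1]))     # 5.0
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
print(median_grouped(classes, [5, 10, 10, 5]))  # 20.0
```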

Merits of median:-
(1) Easy to understand and easy to calculate.
(2) Can be computed for a distribution with open-end classes.
(3) Not affected by extreme observations.
(4) Applicable for quantitative as well as qualitative data.
(5) Can be determined graphically.

Demerits:-
(1) It is not based on all the observations, hence it is not a proper representative.

Mode:- The mode is the value of a variable that occurs most frequently in a series.

(1) Ungrouped data:- In this case the mode is obtained by inspection. For given data, the mode may or may not exist, and even if it exists, it is not necessarily unique.

(2) Grouped data:- The mode is computed by the formula

$$\text{Mode} = L + \frac{f_m - f_1}{2 f_m - f_1 - f_2} \times h$$

L = Lower boundary of modal class
fm = Frequency of modal class
f1 = Frequency of pre-modal class
f2 = Frequency of post-modal class
h = Width of modal class
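The grouped-data mode formula, $L + \frac{f_m - f_1}{2 f_m - f_1 - f_2} \times h$, could be applied as in the sketch below. The function name `grouped_mode` and the data are hypothetical; as a simplifying assumption, a missing neighbouring class is given frequency zero and the first class with the highest frequency is taken as the modal class.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data by interpolation inside the modal class."""
    # Index of the modal class (first class with the highest frequency).
    m = max(range(len(freqs)), key=lambda i: freqs[i])
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0               # pre-modal frequency
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0  # post-modal frequency
    lower, upper = classes[m]
    denominator = 2 * fm - f1 - f2
    if denominator == 0:
        raise ValueError("mode is not defined: neighbouring frequencies are equal")
    return lower + (fm - f1) / denominator * (upper - lower)


# Illustrative data: the modal class is 20-30 with frequency 12.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 6, 3]
print(grouped_mode(classes, freqs))  # 24.0
```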

As compared with the mean & median, the mode has very limited utility.

Merits:-
2) Not affected by extreme observations.
3) Can be determined even though the distribution has open-end classes.
4) Can be obtained graphically.

Demerits:-
i. It is not based on all the observations.
ii. Not capable of further mathematical treatment.
iii. It is not rigidly defined.
iv. The calculation of the mode is laborious & time consuming.

For a moderately asymmetrical distribution, the mean, median and mode are connected by an empirical relation,
i.e. Mean − Mode = 3(Mean − Median)

Quartiles :- The values which divide the given data into four equal parts when the observations are arranged in order of magnitude are known as quartiles. There are three quartiles, Q1, Q2 & Q3. Q1 is known as the lower quartile or first quartile and has 25% of the observations of the distribution below it and consequently 75% of the observations greater than it. The second quartile, Q2, is the median. Q3, the third quartile, has 75% of the observations below it & 25% of the observations above it.

For ungrouped data:-

$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ observation of the arranged data, if } n \text{ is odd}$$

$$Q_1 = \frac{\left(\frac{n}{4}\right)^{\text{th}} + \left(\frac{n}{4}+1\right)^{\text{th}} \text{ observation of the arranged data}}{2}, \text{ if } n \text{ is even}$$

For grouped data:- The formula for determining the first quartile is

$$Q_1 = l + \frac{K - \text{c.f.}}{f} \times h$$

where Q1 = first quartile, l = lower limit of the first quartile class, c.f. = cumulative frequency of the class previous to the first quartile class, f = frequency of the first quartile class, h = class width of the first quartile class, K = N/4, and N = total frequency.
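A sketch of the grouped-data quartile formula follows. It generalises K = N/4 to K = jN/4 for the j-th quartile, which is an assumption beyond the first-quartile formula given in the text; the function name `grouped_quartile` and the data are also illustrative.

```python
def grouped_quartile(classes, freqs, j=1):
    """j-th quartile (j = 1, 2, 3) of grouped data by interpolation."""
    if j not in (1, 2, 3):
        raise ValueError("j must be 1, 2 or 3")
    total = sum(freqs)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    k = j * total / 4  # K = N/4 for the first quartile
    cf = 0             # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= k:  # first class whose cumulative frequency reaches K
            return lower + (k - cf) / f * (upper - lower)
        cf += f


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]  # illustrative classes
freqs = [5, 10, 10, 5]
print(grouped_quartile(classes, freqs, 1))  # 12.5
print(grouped_quartile(classes, freqs, 2))  # 20.0 (the median)
print(grouped_quartile(classes, freqs, 3))  # 27.5
```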

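Geometric mean:- The geometric mean (GM) of N values is the Nth root of the product of all the values in the series. This is used when the data contain a few extremely large or small values.

If three values are given, 3, 9 & 27, the GM is computed as $G = (3 \times 9 \times 27)^{1/3} = 9$.

When the series consists of more than three numbers, it is difficult to extract the root. That is why logs are employed:

$$\log G = \frac{\log X_1 + \log X_2 + \dots + \log X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} \log x_i$$

Or,

$$G = \text{Antilog}\left[\frac{\sum_{i=1}^{N} \log x_i}{N}\right]$$

For a discrete series,

$$G = \text{Antilog}\left[\frac{\sum f \log x_i}{N}\right], \quad \text{where } N = \sum f$$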
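The log and antilog route to the geometric mean can be sketched as below. Natural logarithms are used here instead of base-10 tables, which gives the same result; the function name `geometric_mean` and the optional `freqs` argument are assumptions for the example.

```python
import math


def geometric_mean(values, freqs=None):
    """Geometric mean as the antilog of the (frequency-weighted) mean of logs."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: every value counts once
    if any(x <= 0 for x in values):
        raise ValueError("the geometric mean needs strictly positive values")
    total = sum(freqs)  # N
    mean_log = sum(f * math.log(x) for x, f in zip(values, freqs)) / total
    return math.exp(mean_log)  # antilog


print(round(geometric_mean([3, 9, 27]), 6))        # 9.0
print(round(geometric_mean([2, 8], [1, 1]), 6))    # 4.0
```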

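For ungrouped data the HM (harmonic mean) H is given by the formula

$$\frac{1}{H} = \frac{1}{N}\left(\frac{1}{X_1} + \frac{1}{X_2} + \dots + \frac{1}{X_N}\right)$$

Or,

$$H = \frac{N}{\frac{1}{X_1} + \frac{1}{X_2} + \dots + \frac{1}{X_N}}$$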
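A minimal sketch of the harmonic mean for ungrouped data, i.e. the number of values divided by the sum of their reciprocals. The function name `harmonic_mean` and the data are illustrative.

```python
def harmonic_mean(values):
    """N divided by the sum of the reciprocals of the values."""
    if not values:
        raise ValueError("the harmonic mean of an empty series is undefined")
    if any(x <= 0 for x in values):
        raise ValueError("this sketch assumes strictly positive values")
    return len(values) / sum(1 / x for x in values)


print(harmonic_mean([2, 4, 4]))  # 3.0
```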

MEASURES OF DISPERSION

As already discussed, the whole data is represented by a single value known as the average. It cannot describe the data completely. There may be two or more data sets with the same mean, yet the data sets may not be identical.

To assess the lack of uniformity in the observations, it is necessary to study the variation. The variation is also known as dispersion. It gives information on how the individual observations are scattered or dispersed around the mean.

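Deviation = observation − Mean

Different Measures of Dispersion :
(i) Range : A − B, where A is the largest and B the smallest observation
(ii) Quartile deviation : $\frac{Q_3 - Q_1}{2}$
(iii) Coefficient of quartile deviation : $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
(iv) Mean deviation : $MD = \frac{\sum |x - \bar{x}|}{n}$
(v) Standard deviation : $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
(vi) Variance : $\sigma^2$
(vii) Coefficient of variation : $\frac{\sigma}{\bar{x}} \times 100$

For a frequency distribution, each term in the sums is weighted by its frequency f, and n is replaced by N = ∑f.

Coefficient of mean deviation about mean = $\frac{\text{MD about mean}}{\text{mean}} = \frac{\sum |x - \bar{x}| / n}{\bar{x}}$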

Standard deviation : The positive square root of the arithmetic mean of the squares of the deviations taken from the mean, denoted by

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

When the population mean is not known, we can take the sample mean as an estimate of the population mean. In this case, only (n−1) deviations are independent. Therefore, when there are n observations in the data, the divisor is n−1. In statistical language, n−1 is called the degrees of freedom.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$$

On simplification, since $\sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2$,

$$\sigma^2 = \frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right)$$
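The two divisors and the simplification of the sum of squares can be compared side by side. This is a sketch under stated assumptions: the function names, the `sample` flag and the data are hypothetical, and the identity checked is $\sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2$.

```python
import math


def standard_deviation(values, sample=False):
    """Standard deviation with divisor n, or n - 1 when sample=True."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n  # n - 1 is the degrees of freedom
    return math.sqrt(squared_deviations / divisor)


def sum_of_squares_shortcut(values):
    """Sum of squared deviations computed as sum(x^2) - n * mean^2."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) - n * mean ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]                 # illustrative data, mean 5
print(standard_deviation(data))                 # 2.0
print(standard_deviation(data, sample=True))    # slightly larger, about 2.14
print(sum_of_squares_shortcut(data))            # 32.0
```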

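When the observations are large in size, the formula for SD is laborious, so the short-cut method may be used.
I- Decide the assumed mean ‘a’.
II- Obtain the deviation values d = x − a (for grouped data, taken in units of the class width h, so that d = (x − a)/h).
III- Compute the mean of the deviations, $\bar{d}$.
IV- Apply the formula

$$\sigma = \sqrt{\frac{\sum d^2 - n\bar{d}^2}{n-1}}$$

For grouped data, with n = ∑f,

$$\sigma = \sqrt{\frac{\sum f d^2 - n\bar{d}^2}{n-1}} \times h$$

6. Variance : The square of the standard deviation of a set of observations is called the variance & is denoted by $\sigma^2$.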

Merits of Standard deviation :
(i) It is rigidly defined.
(ii) It is based upon all observations.
(iii) It does not ignore the algebraic sign of deviation.
(iv) It is capable of further mathematical treatment.
(v) It is not much affected by sampling fluctuation.

Demerits of Standard deviation :
(i) It is difficult to understand & calculate.
(ii) It cannot be calculated for qualitative data.
(iii) It is unduly affected by extreme deviations.

Coefficient of variation :
For comparing the variability of two frequency distributions, a relative measure of dispersion is used, known as the coefficient of variation. It is always expressed as a percentage.

$$CV = \frac{\sigma}{\bar{x}} \times 100$$
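To illustrate how the coefficient of variation compares two series measured on different scales, here is a short sketch. It uses `statistics.pstdev` (divisor n) from the Python standard library; that choice of divisor, the function name and both data sets are assumptions made for the example.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean * 100


series_a = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative: mean 5, SD 2
series_b = [48, 50, 52]               # illustrative: mean 50, smaller relative spread
cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(round(cv_a, 2), round(cv_b, 2))
print("series_a is more variable" if cv_a > cv_b else "series_b is more variable")
```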

SUMMARY :
1. Standard deviation or variance is never negative.
2. When all observations are equal, standard deviation is zero.
3. When all the observations in the data are increased or decreased by a constant, the standard deviation remains the same.
4. When each of the observations is multiplied by a constant K, the standard deviation is |K| times the standard deviation of the original data.
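These properties are easy to check numerically. The sketch below uses `statistics.pstdev` from the standard library on made-up data; note that for a negative multiplier the standard deviation scales by the absolute value of K, since it can never be negative.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data
sd = statistics.pstdev(data)

# Property 2: identical observations have zero spread.
sd_constant = statistics.pstdev([7, 7, 7, 7])

# Property 3: adding a constant leaves the standard deviation unchanged.
sd_shifted = statistics.pstdev([x + 10 for x in data])

# Property 4: multiplying by K scales the standard deviation by |K|.
k = -3
sd_scaled = statistics.pstdev([k * x for x in data])

print(sd_constant == 0)
print(math.isclose(sd_shifted, sd))
print(math.isclose(sd_scaled, abs(k) * sd))
```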

Many times in statistics, the data relate to two variables; this is known as a bivariate distribution. One of the variables is denoted by ‘x’ & the other by ‘y’, & the observations are paired like (x, y). For example, blood pressure & weight, age of wife & husband, pulse rate & temperature, height of father & sons etc.

We are interested in studying whether or not there is a mutual relation between the two variables under consideration. The joint relation is called correlation. Two variables are said to be correlated when a change in the value of one variable is accompanied by a corresponding change in the value of the other variable. To study correlation, there must be a logical relationship between the two variables.

Positive Correlation :
An increase in the value of one variable is accompanied by an increase in the value of the other variable, or a decrease in the value of one variable is accompanied by a decrease in the value of the other variable. Correlation between these two variables is said to be positive. In other words, the direction of change in the values of the two variables is the same, e.g. temperature & pulse rate are positively correlated.

Negative Correlation :
An increase in the value of one variable is accompanied by a decrease in the value of the other variable & vice versa. The change in the values of the two variables is in opposite directions.

The simplest way to study correlation is the graphical method. Take ‘n’ paired observations (X1, Y1) ….. (Xn, Yn) and plot these points on graph paper. These points are scattered; thus this diagram is known as a scatter diagram.

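Correlation Coefficient :
Prof. Karl Pearson suggested a measure of the degree of correlation, the correlation coefficient, denoted by $r_{xy}$. It is also called the product moment correlation coefficient. It is calculated by the formula

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

or

$$r = \frac{\frac{1}{n}\sum xy - \bar{x}\bar{y}}{\sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2}\,\sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2}}$$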

Properties of Correlation Coefficient :
(i) It always lies between -1 & +1. Symbolically, -1 ≤ r ≤ +1.
(ii) r is a pure number, i.e. a unitless quantity.
(iii) Two independent variables are uncorrelated: when x & y are independent, then r = 0.
(iv) The absolute value of the correlation coefficient r is independent of change of origin & scale.
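The product moment formula and two of its properties can be checked with a short sketch. The function name `pearson_r` and the paired data are illustrative assumptions; the code implements $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient from the raw sums."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


xs = [1, 2, 3, 4, 5]   # illustrative paired observations
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))     # 0.7746

# Property (iv): changing origin and scale leaves |r| unchanged.
r_changed = pearson_r([10 * x + 3 for x in xs], [-2 * y + 7 for y in ys])
print(math.isclose(abs(r_changed), abs(r)))
```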

RANK CORRELATION :
Spearman's rank correlation coefficient is given by the formula :

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Where n = No. of paired observations.
d = difference between respective ranks.
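A sketch of the rank correlation calculation, using the standard form of Spearman's formula, $r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}$, with the factor 6 in the numerator. It assumes the ranks are already assigned and that there are no ties; the function name `spearman_rs` and the ranks are made up for illustration.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation from two lists of ranks (no ties assumed)."""
    if len(rank_x) != len(rank_y) or len(rank_x) < 2:
        raise ValueError("need two rank lists of equal length with at least 2 pairs")
    n = len(rank_x)
    # d is the difference between the respective ranks of each pair.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


rank_x = [1, 2, 3, 4, 5]   # illustrative ranks given by one judge
rank_y = [2, 1, 4, 3, 5]   # illustrative ranks given by another
print(round(spearman_rs(rank_x, rank_y), 4))  # 0.8
```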

LINEAR REGRESSION :
The term regression, first used by the British biometrician Galton, literally means stepping back towards the average. Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data. In regression analysis, there are two types of variables. The variable whose value is to be predicted is called the dependent variable & the variable which is used for prediction is called the independent variable. The independent variable is also known as the regressor, predictor or explanatory variable, while the dependent variable is also known as the regressed or explained variable.

Y = a + bx

LINE OF REGRESSION :
If the variables in a bivariate distribution are related, we will find that the points in the scatter diagram cluster round some curve called the curve of regression. If the curve is a straight line, it is called the line of regression & there is said to be linear regression between the two variables. The line of regression is the line which gives the best estimate of the value of one variable for any specific value of the other variable. Thus the line of regression is the “line of best fit” & is obtained by the principle of least squares.

Linear Equation : Points that satisfy an equation of the form Y = a + bx fall on a straight line, where a, b are constants.
Mathematically, a is the y-intercept &
b is the slope of the line.
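The notes do not spell out how a and b are obtained, so the sketch below uses the standard least-squares estimates for the line of Y on x: $b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and $a = \bar{y} - b\bar{x}$. The function name `fit_line` and the data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b of the line Y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator   # slope
    a = sum_y / n - b * sum_x / n                    # y-intercept
    return a, b


xs = [1, 2, 3, 4, 5]   # illustrative independent variable
ys = [2, 4, 5, 4, 5]   # illustrative dependent variable
a, b = fit_line(xs, ys)
print(round(a, 4), round(b, 4))   # 2.2 0.6
print(round(a + b * 6, 4))        # predicted Y at x = 6: 5.8
```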

| Correlation | Regression |
| --- | --- |
| Summarises the degree of relationship between two variables. | Summarises the nature of relationship between two variables. |
| Pairs of observations of the two variables are selected at random. | The values of one variable are selected at random by fixing the value of the other variable. |
| Applied to those cases where there is no direction of dependency. | Applied to those cases where there is a direction of dependency. |
| Cause & effect relationship between the two variables is not clear: x may be the cause of y, y may be the cause of x, or the correlation may be due to chance. | One variable is dependent & another is independent. |
