
“A short note on Statistics and Econometrics for the examinees of AD (Stat.), BB”

::Basic Statistics::

Q:: Define Statistics. Write down its importance & scope and mention its
limitations.

Statistics: Statistics as a subject is barely a century old. R. A. Fisher is known as the
father of modern Statistics. Statistics is the branch of knowledge that deals with the
collection, classification, summarization, presentation and analysis of data in any
field of inquiry.

Meaning of Statistics: The word “statistics” seems to have been derived from the
Latin word “Status”, the Italian word “Statista”, the German word “Statistik”
or the French word “Statistique”, each of which means “political state”. In ancient
times governments collected information about total population, land, wealth, and the
number of employees, soldiers, etc., to get an idea of the manpower of the country and
to frame the administrative set-up, new taxes and levies, and the fiscal and military
policies of the government.

The word Statistics is used in three different senses-

i. Statistics as a singular
ii. Statistics as a plural
iii. Statistics as a plural of Statistic.

Function of Statistics:

i. It presents facts in a definite form
ii. It simplifies a mass of data
iii. It facilitates comparison
iv. It helps in the formulation and testing of hypotheses
v. It helps in prediction
vi. It helps in the formation of policies

Importance and Scope of Statistics

In modern times, Statistics is viewed not as a mere device for collecting numerical
data but as a means of developing sound techniques for their handling and analysis
and drawing valid inferences from them.

1. Statistics and Planning: Statistics is indispensable to planning. In the modern
age, which is termed ‘the age of planning’, governments almost all over the world
are resorting to planning for economic development.

2. Statistics and Economics: Statistical data and techniques of statistical analysis
have proved immensely useful in solving a variety of economic problems, such as
wages, prices, analysis of time series and demand analysis.

3. Statistics and Business: Statistics is also an indispensable tool of production
control. Business executives are relying more and more on statistical techniques for
studying the needs and desires of consumers and for many other purposes.
The success of a businessman depends to a large extent on the accuracy and
precision of his statistical forecasting.

4. Statistics and Industry: In industry, Statistics is very widely used in ‘Quality
Control’. In production engineering, statistical tools such as inspection plans and
control charts are of extreme importance for finding out whether the product conforms
to specifications or not.

5. Statistics and Mathematics: Statistics and mathematics are very intimately related.
Recent advancements in statistical techniques are the outcome of wide applications of
advanced mathematics.

6. Statistics and Medical Science: In medical science, the statistical tools for the
collection, presentation and analysis of observed facts relating to the causes and
incidence of diseases and the results obtained from the use of various drugs and
medicines are of great importance.

7. Statistics and Psychology and Education: In education and psychology, too,
Statistics has found wide applications, e.g., to determine the reliability and validity
of a test, ‘Factor Analysis’, etc.

Limitations of Statistics:

1. Statistics is not suited to the study of qualitative phenomena: Statistics, being a
science dealing with sets of numerical data, is applicable only to those subjects of
enquiry which are capable of quantitative measurement. As such, qualitative phenomena
like honesty, poverty, culture, etc., which cannot be expressed numerically, are not
capable of direct statistical analysis.

2. Statistics does not study individuals: Individual items, taken separately, do not
constitute statistical data and are meaningless for statistical enquiry. Hence,
statistical analysis is suited only to those problems where group characteristics
are to be studied.

3. Statistical laws are not exact: On the basis of statistical analysis we can talk only
in terms of probability and chance and not in terms of certainty. Statistical
conclusions are not universally true; they are true only on average.

4. Statistics is liable to be misused: Statistical methods are the most dangerous
tools in the hands of the inexpert. The use of statistical tools by inexperienced
and untrained persons might lead to very fallacious conclusions.

…. ….
Q:: What is a Frequency Distribution? Describe the construction of a frequency
distribution from a raw data set.

Frequency Distribution: A frequency distribution is a table in which the observations
of a variable are divided into mutually exclusive categories. Each category is called
a class. Usually, the table contains three columns: the classes of the variable in the
first column, tally marks for the observations falling in each class in the second
column, and the frequency of each class in the third column.

Construction of a frequency distribution from a raw data set is given below:

1. Find the range by subtracting the lowest value from the highest value of
the variable x.
2. The number of class intervals should be neither too large nor too small; usually
it lies between 5 and 20, depending on the practical situation. Having
fixed the number of classes, divide the range by it; the nearest integer to the
result (rounded up if necessary so that every value is covered) gives the length
of the class interval. The class intervals should be exhaustive, mutually
exclusive and usually of equal length.
3. The table will have three columns named class interval, tally
marks and frequency. The first class interval will start with the smallest
value, and the intervals continue until the one containing the highest value of
the given series of data is reached.
4. Tick off each value in the original table of raw data
and put a tally mark against the appropriate class interval.
5. Count the number of tally marks corresponding to each class interval
and write the result in the frequency column.
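
The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `frequency_distribution`, the sample `marks` and the choice of four classes are assumptions made for the example, and the class length is rounded up so that every value is covered.

```python
import math
from collections import Counter


def frequency_distribution(data, num_classes):
    """Group raw data into equal-width classes; return (lower, upper, frequency) rows."""
    lowest, highest = min(data), max(data)
    value_range = highest - lowest                          # step 1: range
    width = max(1, math.ceil(value_range / num_classes))    # step 2: class length, rounded up
    counts = Counter()
    for x in data:                                          # steps 4-5: tally, then count
        # each class is [lower, upper); the highest value is placed in the last class
        index = min(int((x - lowest) // width), num_classes - 1)
        counts[index] += 1
    return [(lowest + i * width, lowest + (i + 1) * width, counts[i])
            for i in range(num_classes)]


marks = [12, 15, 21, 27, 30, 33, 38, 41, 45, 52]   # hypothetical raw data
for lower, upper, freq in frequency_distribution(marks, num_classes=4):
    print(f"{lower}-{upper}: {'|' * freq} {freq}")
```

Each printed row corresponds to one line of the three-column table: class interval, tally marks and frequency.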

…. ….
Q:: What do you mean by Measure of Central Tendency ?

Measure of Central Tendency: In a representative sample, the values of a series of
data tend to cluster around a certain point, usually at the center of the
series. This tendency of the values to cluster around the center of the series is
called central tendency, and its numerical measures are called measures
of central tendency (or location). There are five different measures of central location:

1. Arithmetic mean
2. Geometric mean
3. Harmonic mean
4. Median
5. Mode
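
As a quick illustration, all five measures can be computed with Python's standard `statistics` module. The helper name `central_tendency` and the sample values are assumptions made for this sketch, not part of the note.

```python
import statistics


def central_tendency(values):
    """Return the five measures of central location for positive numeric data."""
    return {
        "arithmetic mean": statistics.mean(values),
        "geometric mean": statistics.geometric_mean(values),   # needs positive values
        "harmonic mean": statistics.harmonic_mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


sample = [2, 4, 4, 8]   # hypothetical observations
for name, value in central_tendency(sample).items():
    print(f"{name}: {value:.4f}")
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean; the sample shows this ordering.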

…. ….

Q:: What do you mean by Measure of Dispersion? Why is it necessary?

Measure of Dispersion: A measure of central location gives us an idea of the
concentration of the observations about the central value of the distribution. It is
equally important to know how the observations of the variate cluster around or
are dispersed away from the central value of the distribution. Such variation is called
dispersion and its numerical measure is called a measure of dispersion. Measures of
dispersion give the degree of scatter about the central location and thus
measure the variability, or lack of homogeneity, of the data.

Following are the measures of dispersion:

a. Absolute measure
1. Range
2. Quartile deviation
3. Standard deviation
4. Mean deviation
b. Relative measure
1. Co-efficient of quartile deviation
2. Co-efficient of mean deviation
3. Co-efficient of variation

Necessity of measure of dispersion:

Dispersion tells us how compactly the individual values are distributed around the
central value. The dispersion of a single variable may not carry much meaning on its
own, while a comparison of the dispersion of two or more data sets is more useful for
decision making. Two series can have the same central value and yet differ widely in
variability, so a measure of central location alone can be misleading; a measure of
dispersion shows how well the central value represents the data.
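
A small sketch of such a comparison, assuming two hypothetical series with the same mean. The helper `coefficient_of_variation` is an illustrative name; it uses the population standard deviation from the standard `statistics` module.

```python
import statistics


def coefficient_of_variation(values):
    """Relative dispersion: standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


series_a = [48, 50, 52]   # hypothetical data, mean 50
series_b = [20, 50, 80]   # hypothetical data, mean 50

print(f"CV of A: {coefficient_of_variation(series_a):.2f}%")
print(f"CV of B: {coefficient_of_variation(series_b):.2f}%")
```

Both series have the same arithmetic mean, but the second has a much larger co-efficient of variation, so its mean is far less representative.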

…. ….

Q:: Write short notes on the following:

Correlation Co-efficient: Correlation is a statistical technique which measures and
analyses the degree or extent to which two or more variables fluctuate with
reference to one another. Correlation thus denotes the interdependence amongst
variates. The degree is expressed by a coefficient which ranges between -1 and
+1. The direction of change is indicated by the + or - sign.
If an increase (decrease) in one variable is accompanied by a corresponding increase
(decrease) in the other, i.e. if the changes are in the same direction, the variables
are positively correlated. For example, the heights and weights of a group of persons
are positively correlated, as are advertising and sales.
If an increase (decrease) in one variable is accompanied by a corresponding decrease
(increase) in the other, i.e. if the changes are in opposite directions, the
variables are negatively correlated. For example, T.V. registration and cinema
attendance are negatively correlated.
An absence of correlation is indicated by zero.
Correlation thus expresses the relationship through a relative measure of change
and it has nothing to do with the units in which the variables are expressed. The
numerical value of correlation is called the correlation co-efficient.

Karl Pearson's correlation co-efficient is given by,

$$r = \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right)}}$$

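Karl Pearson's co-efficient can be computed directly from the sums of X, Y, XY, X² and Y² over the N pairs. The sketch below assumes that computational form; the function name `pearson_r` and the data are illustrative only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's correlation co-efficient from raw sums over N pairs."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


heights = [1, 2, 3, 4, 5]     # hypothetical X values
weights = [2, 4, 6, 8, 10]    # hypothetical Y values, exactly 2 * X
print(pearson_r(heights, weights))
print(pearson_r(heights, weights[::-1]))
```

An exact increasing linear relation gives r = +1 and reversing one series gives r = -1, matching the range stated above.
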
Scatter Diagram: A scatter diagram (or dotogram or scattergram) is a simple and
attractive method of diagrammatic representation of a bivariate distribution for
ascertaining the nature of the correlation between the variables. Thus for the bivariate
distribution ( xi , yi ), i = 1, 2, ….., n, if the values of the variables X and Y are
plotted along the X axis and Y axis respectively in the XY plane, the diagram of dots so
obtained is known as a scatter diagram. Scatter plots are well suited for revealing the
relationship between two variables.

Index Number: Index number is a pure number which measures the relative
change of price or quantity or value of a commodity or a group of commodities of
a particular year called current year with respect to some standard year called base
year.

Time Series Analysis: A time series is a set of numerical measurements on a
time-dependent variable of interest arranged over regular intervals of time.

Simple Random Sampling: Simple random sampling refers to the sampling
technique in which each and every item of the population has an equal chance of
being included in the sample. Thus simple random sampling is a method of
selecting n units out of a population of N units such that every unit has an equal
probability of selection.

…. ….

Q:: What is Design of Experiment? What are the Principles of Design of
Experiment?

Design of Experiment: By the word experiment we mean a process of taking a
series of observations under conditions specified by the experimenter, in order to
confirm or disprove something doubtful and also to discover some principles or
effects. The design of an experiment means the logical construction of the experiment,
that is, the choice of the pattern of data collection that suits the above purpose.

Principles of Design of Experiment: According to Prof. R. A. Fisher, the basic
principles of design of experiment are-

A. Randomization
B. Replication
C. Error Control

Randomization: At first the treatments and experimental plots of the experiment
are decided. Randomization means that, for an objective comparison, the treatments
must be allotted randomly to the different experimental plots to avoid
any type of personal or subjective error. There are a number of ways of
randomizing, depending on the nature of the design of the experiment.

Replication: The repetition of the treatments under investigation on more than one
experimental plot is known as replication. Replication is necessary to increase the
accuracy of the estimates. The number of replications depends on the expenditure
and the degree of precision required.

Error Control: Though every experiment would provide an estimate of error
variance, it is not desirable to have a large experimental error. The measures for
reducing the error variance are usually called error control or local control.

…. ….

Q:: What is an index number? Why is Fisher's index called the ideal index number?
What do you mean by Cost of living index number?

Index Number: Index number is a pure number which measures the relative
change of price or quantity or value of a commodity or a group of commodities of
a particular year called current year with respect to some standard year called base
year.

Laspeyres' price index: In this method base year quantities are taken as weights.
The German economist Laspeyres suggested this formula in 1871. The formula is,

$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

Where, $P_{01}$ = Price index

$p_1$ = Price in the current year

$p_0$ = Price in the base year

$q_0$ = Quantity in the base year

Fisher's price index: Fisher's price index is the geometric mean of
Laspeyres' and Paasche's price indices. This formula was first used by Fisher in
1920. The formula is,

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$

Where, $P_{01}$ = Price index

$p_1$ = Price in the current year

$p_0$ = Price in the base year

$q_1$ = Quantity in the current year

$q_0$ = Quantity in the base year

Fisher’s index is called the ideal index because of the following reasons:

a. The formula is based on the geometric mean, which is theoretically
considered the best measure of average for constructing index numbers.
b. The formula takes into account prices and quantities of both current year
as well as base year.
c. The method is free from bias.
d. The method satisfies both time reversal and factor reversal tests which
justifies its superiority over other indices.
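
The time reversal property in point (d) can be checked numerically. The sketch below uses hypothetical prices and quantities for two commodities; the function names are illustrative, and Paasche's index is taken in its usual form with current year quantities as weights.

```python
from math import sqrt


def _value(prices, quantities):
    """Aggregate value: sum of price times quantity."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, q0, p1, q1):
    return _value(p1, q0) / _value(p0, q0) * 100    # base year quantities as weights


def paasche(p0, q0, p1, q1):
    return _value(p1, q1) / _value(p0, q1) * 100    # current year quantities as weights


def fisher(p0, q0, p1, q1):
    return sqrt(laspeyres(p0, q0, p1, q1) * paasche(p0, q0, p1, q1))


# hypothetical data for two commodities
p0, q0 = [10, 20], [5, 2]    # base year prices and quantities
p1, q1 = [12, 25], [6, 3]    # current year prices and quantities

forward = fisher(p0, q0, p1, q1) / 100     # base -> current, as a ratio
backward = fisher(p1, q1, p0, q0) / 100    # current -> base, as a ratio
print(forward * backward)                  # time reversal test: product should be 1
```

Swapping the roles of the two years and multiplying the two ratios gives 1 for Fisher's index, while the same product for Laspeyres' index alone generally differs from 1.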

Cost of living index number: The consumer price index, also known as the cost of
living index or retail price index, is constructed to study the effect of changes in the
prices of a basket of goods and services on the purchasing power of a particular
group of people during the current period as compared with some base period. A change
in the cost of living of an individual between two periods means the change in the
amount of money he needs to maintain the same standard of living in both
periods. Thus cost of living index numbers are intended to measure the average
increase in the cost of maintaining the same standard of living in a given year as in
the base year. Cost of living index numbers are, therefore, compiled to get a
measure of the general price movement of the commodities consumed by
different classes of people.

Family Budget Method: In this method the family budgets of a large number of
people, for whom the index is to be constructed, are carefully studied, and the
aggregate expenditure of an average family on various items is estimated. These
values constitute the weights. Mathematically, the consumer price index in this method
is,

$$\text{CPI} = \frac{\sum PV}{\sum V}$$

Where, the P's are the price relatives, that is $P = \frac{p_1}{p_0} \times 100$, and V is
the value weight, that is $V = p_0 q_0$.

…. ….

Q:: What is time series analysis ? Describe the components of time series analysis.
Also describe Mathematical models for time series analysis.

Time Series Analysis: A time series is a set of numerical measurements on a
time-dependent variable of interest arranged over regular intervals of time.

Components of Time Series Analysis:

The important components of a time series are

1. Secular trend
2. Seasonal variation
3. Cyclical fluctuation
4. Irregular variation

Seasonal Variation: Seasonal variations are like cycles, but they occur over
short and repetitive calendar periods. By seasonal variation we mean a periodic
movement that repeats itself with remarkable similarity at regular intervals of
time. Seasonal variation arises as a result of natural changes in the seasons
during the year, as well as of customs tied to the calendar.

Factors affecting seasonal variation:

i. Climate and weather factors.
ii. Social customs or religious factors.

Reasons for studying the seasonal component:

i. It allows us to establish the pattern of past changes.
ii. It is useful to project the past pattern into the future.
iii. Once the seasonal pattern that exists in a time series has been
established, it is possible to eliminate its effect from the data.

The following methods are used to measure the seasonal variation in
time series data:

i. Method of simple average.
ii. Ratio-to-trend method.
iii. Ratio-to-moving average method.
iv. Link relatives method.
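
The first of these, the method of simple averages, can be sketched as follows. It assumes the usual textbook procedure: average each season over the years, then express each seasonal average as a percentage of the grand average. The function name and the quarterly figures are hypothetical.

```python
def seasonal_indices(observations_by_year):
    """Method of simple averages.

    observations_by_year: one list per year, each holding the values for the
    same seasons (e.g. four quarters) in the same order.
    """
    years = len(observations_by_year)
    seasons = len(observations_by_year[0])
    # average of each season over all years
    season_means = [sum(year[s] for year in observations_by_year) / years
                    for s in range(seasons)]
    grand_mean = sum(season_means) / seasons
    # each seasonal average as a percentage of the grand average
    return [mean / grand_mean * 100 for mean in season_means]


quarterly_sales = [
    [10, 20, 30, 40],   # hypothetical year 1, quarters I-IV
    [20, 30, 40, 50],   # hypothetical year 2, quarters I-IV
]
print(seasonal_indices(quarterly_sales))
```

The indices average 100 across the seasons; a value above 100 marks a season that is typically above the yearly average.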

Mathematical models for time series analysis:

There are two mathematical models, which are commonly used for the
decomposition of a time series into different components. These are:

i. Multiplicative model.
ii. Additive model

In traditional time series analysis, it is assumed that there is a multiplicative
relationship among these four components. Let $Y_t$ denote the value of a time series
at time t. Symbolically,

$$Y_t = T_t \times S_t \times C_t \times I_t$$

Also, according to the additive model a time series is the sum of its four components.
Symbolically,

$$Y_t = T_t + S_t + C_t + I_t$$

Where, $T_t$ = Trend component; $S_t$ = Seasonal component; $C_t$ = Cyclical component
and $I_t$ = Irregular component.

…. ….
Q:: Write down the theorem of Total Probabilities and Bayes theorem.

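Theorem of Total Probabilities:

Let $B_1, B_2, \ldots, B_n$ be n mutually exclusive and exhaustive events in a
random experiment and A be any event in the same sample space, then,

$$P[A] = \sum_{i=1}^{n} P[B_i]\,P[A \mid B_i]$$

Proof: It is obvious from the Venn diagram that,

$$A = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_n)$$

Since the events $B_1, B_2, \ldots, B_n$ are mutually exclusive, the events
$A \cap B_1, A \cap B_2, \ldots, A \cap B_n$ are also mutually exclusive. Hence,

$$P[A] = \sum_{i=1}^{n} P[A \cap B_i]$$

Now, from the multiplicative law of probability we have,

$$P[A \cap B_i] = P[B_i]\,P[A \mid B_i], \quad \text{so that} \quad P[A] = \sum_{i=1}^{n} P[B_i]\,P[A \mid B_i].$$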

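Bayes Theorem:

Statement: Let $B_1, B_2, \ldots, B_n$ be n mutually exclusive and exhaustive events
in a random experiment and A be any event in the same sample space such that
P[A] > 0, then Bayes theorem states that,

$$P[B_i \mid A] = \frac{P[B_i]\,P[A \mid B_i]}{\sum_{j=1}^{n} P[B_j]\,P[A \mid B_j]}$$

Proof: From conditional probability we have,

$$P[B_i \mid A] = \frac{P[A \cap B_i]}{P[A]} \qquad (1)$$

From the theorem of total probabilities we have,

$$P[A] = \sum_{j=1}^{n} P[B_j]\,P[A \mid B_j]$$

Again, from the multiplicative law, we have, $P[A \cap B_i] = P[B_i]\,P[A \mid B_i]$

Now, from (1) we have,

$$P[B_i \mid A] = \frac{P[B_i]\,P[A \mid B_i]}{\sum_{j=1}^{n} P[B_j]\,P[A \mid B_j]}$$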
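
A small numerical sketch of both theorems, with hypothetical figures: three machines (the mutually exclusive and exhaustive events) produce 50%, 30% and 20% of the output, and their defect rates are 1%, 2% and 3%. The function names are illustrative.

```python
def total_probability(priors, likelihoods):
    """P[A] as the sum over i of P[B_i] * P[A | B_i]."""
    return sum(prior * like for prior, like in zip(priors, likelihoods))


def bayes_posteriors(priors, likelihoods):
    """P[B_i | A] for every i, by Bayes theorem; requires P[A] > 0."""
    p_a = total_probability(priors, likelihoods)
    if p_a <= 0:
        raise ValueError("P[A] must be positive")
    return [prior * like / p_a for prior, like in zip(priors, likelihoods)]


priors = [0.5, 0.3, 0.2]           # hypothetical P[B_i]: share of output per machine
likelihoods = [0.01, 0.02, 0.03]   # hypothetical P[A | B_i]: defect rate per machine

print(total_probability(priors, likelihoods))   # probability that an item is defective
print(bayes_posteriors(priors, likelihoods))    # which machine made a defective item
```

The posterior probabilities always sum to 1, because the denominator is exactly the total probability of A.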

Independent Events: Events of an experiment are said to be independent iff the
occurrence of each is not affected by the occurrence or non-occurrence of the
others. If A and B are two independent events then,

$$P[A \cap B] = P[A]\,P[B]$$

Example: Let three coins be tossed. Then the event “heads on the first coin” and
the event “tails on the last two” are independent.

…. ….
Q:: Write down the additive law and multiplicative law of probability.

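Additive law of probability for three events:

$$P[A_1 \cup A_2 \cup A_3] = \sum_{i=1}^{3} P[A_i] - \sum\sum_{i<j} P[A_i \cap A_j] + P[A_1 \cap A_2 \cap A_3]$$

Proof: We have $A_1 \cup A_2 \cup A_3 = A_1 \cup (A_2 \cup A_3)$. (1)

Let, $A = A_1$ and $B = A_2 \cup A_3$.

We know, for two events the additive law of probability is,

$$P[A \cup B] = P[A] + P[B] - P[A \cap B] \qquad (2)$$

Now, by substituting the values of A and B in (2) we have,

$$P[A_1 \cup A_2 \cup A_3] = P[A_1] + P[A_2 \cup A_3] - P[A_1 \cap (A_2 \cup A_3)] \qquad (3)$$

Here, $P[A_2 \cup A_3] = P[A_2] + P[A_3] - P[A_2 \cap A_3]$ and

$$P[A_1 \cap (A_2 \cup A_3)] = P[(A_1 \cap A_2) \cup (A_1 \cap A_3)] = P[A_1 \cap A_2] + P[A_1 \cap A_3] - P[A_1 \cap A_2 \cap A_3]$$

Now from (3) we have,

$$P[A_1 \cup A_2 \cup A_3] = \sum_{i=1}^{3} P[A_i] - \sum\sum_{i<j} P[A_i \cap A_j] + P[A_1 \cap A_2 \cap A_3]$$

This completes the proof of the theorem.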

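Multiplication Rule of Probability:

If A and B are two events, then

$$P[A \cap B] = P[A]\,P[B \mid A] = P[B]\,P[A \mid B]$$

Proof: From the definition of conditional probability, we have,

$$P[B \mid A] = \frac{P[A \cap B]}{P[A]}, \quad P[A] > 0$$

It follows that, $P[A \cap B] = P[A]\,P[B \mid A]$ (1)

Similarly,

$$P[A \mid B] = \frac{P[A \cap B]}{P[B]}, \quad P[B] > 0$$

It follows that, $P[A \cap B] = P[B]\,P[A \mid B]$ (2)

From (1) and (2) we have,

$$P[A \cap B] = P[A]\,P[B \mid A] = P[B]\,P[A \mid B]$$

This proves the theorem.

But if A and B are independent events, then,

$P[B \mid A] = P[B]$ and $P[A \mid B] = P[A]$

Hence, $P[A \cap B] = P[A]\,P[B]$.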

…. ...
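Q:: What do you mean by Binomial Experiment? Find the Mean and Variance of the
Binomial distribution.

Binomial Experiment: An experiment is called a binomial experiment when it has the
following properties:

1. The experiment consists of n Bernoulli trials
2. Each trial has two possible outcomes, namely success and failure
3. The probability of success p remains the same from trial to trial
4. The repeated trials are independent

Mean and Variance of Binomial distribution:

The probability function of the binomial distribution is,

$$P(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n; \quad q = 1 - p$$

Mean:

$$E(x) = \sum_{x=0}^{n} x \binom{n}{x} p^x q^{n-x} = np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} q^{n-x} = np\,(q + p)^{n-1} = np$$

Variance:

$$V(x) = E(x^2) - [E(x)]^2$$

Now, $E(x^2) = E[x(x-1)] + E(x)$

Here,

$$E[x(x-1)] = \sum_{x=0}^{n} x(x-1) \binom{n}{x} p^x q^{n-x} = n(n-1)p^2 \sum_{x=2}^{n} \binom{n-2}{x-2} p^{x-2} q^{n-x} = n(n-1)p^2$$

Again,

$$V(x) = n(n-1)p^2 + np - n^2p^2 = np - np^2 = np(1 - p) = npq$$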
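
The standard results mean = np and variance = npq can be checked numerically by summing over the binomial probability function. This is only a verification sketch; the function names and the values n = 10, p = 0.3 are chosen for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_mean_variance(n, p):
    """Mean and variance computed directly from the probability function."""
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(probs))
    second_moment = sum(x * x * px for x, px in enumerate(probs))
    return mean, second_moment - mean ** 2


mean, variance = binomial_mean_variance(n=10, p=0.3)
print(mean, variance)   # compare with n*p = 3 and n*p*q = 2.1
```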

…. ….
Q:: Derive Poisson distribution from Binomial distribution. Find its mean and
variance.

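Derivation of Poisson distribution from Binomial distribution:

The Poisson distribution can be derived from the Binomial distribution under the
following conditions:

i. p, the probability of success in a Bernoulli trial, is very small, that is
$p \to 0$.
ii. n, the number of trials, is very large, that is $n \to \infty$.
iii. $np = \lambda$ is a finite constant.

We know the probability function of the binomial distribution is,

$$P(x) = \binom{n}{x} p^x q^{n-x} = \frac{n(n-1)\cdots(n-x+1)}{x!} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x}$$

Now for fixed x, as $n \to \infty$,

$$\frac{n(n-1)\cdots(n-x+1)}{n^x} \to 1$$

and,

$$\left(1 - \frac{\lambda}{n}\right)^{n-x} \to e^{-\lambda}$$

Therefore,

$$P(x) \to \frac{e^{-\lambda} \lambda^x}{x!}$$

which is the probability function of the Poisson distribution.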

Poisson Distribution: A discrete random variable x is said to have a Poisson
distribution if its probability function is given by,

$$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$

Where, e = 2.71828 and $\lambda$ is the parameter of the distribution.

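Mean and Variance of Poisson distribution:

Mean:

$$E(x) = \sum_{x=0}^{\infty} x \frac{e^{-\lambda} \lambda^x}{x!} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$

Variance:
We know that, $V(x) = E(x^2) - [E(x)]^2$
Now, $E(x^2) = E[x(x-1)] + E(x)$
Here,

$$E[x(x-1)] = \sum_{x=0}^{\infty} x(x-1) \frac{e^{-\lambda} \lambda^x}{x!} = \lambda^2 e^{-\lambda} \sum_{x=2}^{\infty} \frac{\lambda^{x-2}}{(x-2)!} = \lambda^2$$

Now, $E(x^2) = \lambda^2 + \lambda$
Thus, $V(x) = \lambda^2 + \lambda - \lambda^2 = \lambda$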
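
The limiting argument can be illustrated numerically: holding np fixed at a constant λ, the binomial probabilities approach the Poisson probabilities as n grows. The sketch below assumes λ = 2 and compares the largest gap for a small and a large n; all names are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)


def max_gap(n, lam, upto=10):
    """Largest absolute difference between the two probability functions for x = 0..upto."""
    p = lam / n                      # keep n * p fixed at lam
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
               for x in range(upto + 1))


for n in (10, 100, 10000):
    print(n, max_gap(n, lam=2.0))
```

The gap shrinks as n increases, which is the numerical counterpart of conditions (i) to (iii) above.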

…. ….

Q:: Define Mathematical Expectation of a random variable. Derive the Additive Law
and Multiplicative Law for Mathematical Expectation.

Mathematical Expectation of Random Variable:

If X is a random variable which can take a finite or infinite sequence of possible
values $x_1, x_2, x_3, \ldots$ with corresponding probabilities $p_1, p_2, p_3, \ldots$, then the
mathematical expectation of the random variable X, denoted by µ, is defined by,

$$\mu = E(X) = \sum_{i} x_i p_i$$

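Additive Law for Mathematical Expectation:

Statement: The expected value of the sum of two random variables x
and y is the sum of their expected values, that is,

$$E(x + y) = E(x) + E(y)$$

Proof: Let x and y be two random variables which can take the values $x_1, x_2, \ldots, x_n$
and $y_1, y_2, \ldots, y_m$ with corresponding probabilities $p_1, p_2, \ldots, p_n$ and
$p'_1, p'_2, \ldots, p'_m$ respectively, and let $p_{ij} = P(x = x_i, y = y_j)$ be their joint
probabilities. Then,

$$E(x + y) = \sum_{i=1}^{n} \sum_{j=1}^{m} (x_i + y_j)\,p_{ij}$$

$$= \sum_{i=1}^{n} x_i \sum_{j=1}^{m} p_{ij} + \sum_{j=1}^{m} y_j \sum_{i=1}^{n} p_{ij}$$

$$= \sum_{i=1}^{n} x_i p_i + \sum_{j=1}^{m} y_j p'_j = E(x) + E(y)$$

which proves the theorem.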

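Multiplicative Law for Mathematical Expectation:

Statement: The expected value of the product of two independent random
variables x and y is the product of their expected values, that is,

$$E(xy) = E(x)\,E(y)$$

Proof: Let x and y be two independent random variables which can take the values
$x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$ with corresponding probabilities
$p_1, p_2, \ldots, p_n$ and $p'_1, p'_2, \ldots, p'_m$ respectively. Because of independence the
joint probability of $(x_i, y_j)$ is $p_i p'_j$. Then,

$$E(xy) = \sum_{i=1}^{n} \sum_{j=1}^{m} x_i y_j\,p_i p'_j = \left(\sum_{i=1}^{n} x_i p_i\right)\left(\sum_{j=1}^{m} y_j p'_j\right) = E(x)\,E(y)$$

which proves the theorem.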

…. ….
::Econometrics::

Q:: State the classical linear regression model (CLRM) and also state briefly its
important properties.

Ans:: A model which satisfies the following ten assumptions is known as the CLRM.

Statement:

1. The model should be linear in parameters.
2. The values of the X's are fixed in repeated sampling.
3. The disturbance term has zero mean, i.e. $E(u_i \mid X_i) = 0$.
4. The model should be homoscedastic.
5. No autocorrelation between the disturbance terms.
6. Zero covariance between the disturbance term and the X's, i.e. $\text{Cov}(u_i, X_i) = 0$.
7. The number of observations should be greater than the number of parameters.
8. Variability in the values of the X's.
9. The model should be correctly specified.
10. There is no perfect multicollinearity.

Properties:

1. The OLS estimators of the CLRM are linear and unbiased.
2. They have minimum variance in the class of linear unbiased estimators (BLUE).
3. They are consistent.
4. If the disturbances are in addition normally distributed, the estimators are
also normally distributed.

…. ….
Q:: What is multicollinearity? Describe any method to detect multicollinearity.

Multicollinearity: Multicollinearity refers to the existence of one or more exact
linear relationships among some or all explanatory variables of a regression model.
Suppose that for the K variable regression the explanatory variables $X_1, X_2, \ldots, X_k$
have the following linear relationship, $\lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_k X_k = 0$; where the
$\lambda_i$'s are constants, not all zero. Then the regression is affected by multicollinearity.

Types of Multicollinearity:

a. Perfect/Exact multicollinearity
b. Near/less than perfect multicollinearity

Detection of multicollinearity: Here I am going to describe the method of eigenvalues
and the condition index (CI) to detect multicollinearity.

At first we have to calculate the matrix $(X'X)$ from the data. Then using
$|X'X - \lambda I| = 0$, we get the values of $\lambda$, which are known as eigenvalues. Now we have,

$$CI = \sqrt{\frac{\text{maximum eigenvalue}}{\text{minimum eigenvalue}}}$$

After calculating CI, if we get 10 ≤ CI ≤ 30, then we can conclude there is moderate
multicollinearity. And if CI > 30, we can conclude there is severe multicollinearity.
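
For two regressors the calculation can be sketched without any matrix library. The sketch assumes the regressors are first centred and scaled to unit length, so that X'X is the 2 x 2 correlation matrix with eigenvalues 1 - r and 1 + r; this scaling, the function names and the data are assumptions of the example.

```python
from math import sqrt


def eigenvalues_2x2_symmetric(a, b, d):
    """Roots of |M - lambda*I| = 0 for the symmetric matrix M = [[a, b], [b, d]]."""
    centre = (a + d) / 2
    spread = sqrt(((a - d) / 2) ** 2 + b ** 2)
    return centre - spread, centre + spread


def condition_index(x1, x2):
    """CI = sqrt(max eigenvalue / min eigenvalue) of X'X for two standardized regressors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    # correlation between the regressors = off-diagonal element of the scaled X'X
    r = sum(a * b for a, b in zip(d1, d2)) / sqrt(sum(a * a for a in d1) * sum(b * b for b in d2))
    smallest, largest = eigenvalues_2x2_symmetric(1.0, r, 1.0)
    if smallest <= 0:
        return float("inf")          # perfect multicollinearity
    return sqrt(largest / smallest)


x1 = [1, 2, 3, 4, 5]                 # hypothetical regressor
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]      # hypothetical regressor, almost exactly 2 * x1
print(condition_index(x1, x2))       # well above 30: severe multicollinearity
```

With orthogonal regressors both eigenvalues equal 1 and the condition index is 1, its smallest possible value.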

…. ….
Q:: What are the consequences of multicollinearity? What remedial measures can
be taken to alleviate the problem of multicollinearity?

Consequences of multicollinearity: In case of near or high multicollinearity one is
likely to encounter the following consequences-

1. Although BLUE, the OLS estimators have large variances and covariances,
making precise estimation difficult.
2. Because of consequence-1, the confidence intervals tend to be much wider.
3. Also because of consequence-1, the t-ratios of one or more coefficients tend to be
statistically insignificant.
4. Although the t-ratios are statistically insignificant, R², the overall measure of
goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small
changes in the data.

Remedial measures:

1. Increasing the sample size
2. Using extraneous estimates
3. Dropping variables
4. Combining cross-sectional and time series data
5. Re-specifying the model
6. Using Ridge regression

…. ….
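Q:: Discuss the Farrar-Glauber test for detecting multicollinearity.

Test for detecting Multicollinearity:

Farrar-Glauber test: Farrar and Glauber (F-G) used three test statistics for detecting
multicollinearity.

a. $\chi^2$-test: for detecting the existence and severity of multicollinearity.
b. F-test: for locating which variables are multicollinear.
c. t-test: for finding the pattern of multicollinearity.

(a). $\chi^2$-test: Let us consider the determinant of the X's in the two-variable model,

$$\begin{vmatrix} \sum x_1^2 & \sum x_1 x_2 \\ \sum x_1 x_2 & \sum x_2^2 \end{vmatrix}$$

To obtain the standardized form, each element is divided by the square roots of the
corresponding sums of squares. For the first element we have,

$$\frac{\sum x_1^2}{\sqrt{\sum x_1^2}\,\sqrt{\sum x_1^2}} = 1$$

The second element is obtained in the same way, and so on. The standardized
determinant is,

$$|R| = \begin{vmatrix} \dfrac{\sum x_1^2}{\sqrt{\sum x_1^2}\sqrt{\sum x_1^2}} & \dfrac{\sum x_1 x_2}{\sqrt{\sum x_1^2}\sqrt{\sum x_2^2}} \\ \dfrac{\sum x_1 x_2}{\sqrt{\sum x_1^2}\sqrt{\sum x_2^2}} & \dfrac{\sum x_2^2}{\sqrt{\sum x_2^2}\sqrt{\sum x_2^2}} \end{vmatrix} = \begin{vmatrix} 1 & r_{12} \\ r_{12} & 1 \end{vmatrix}$$

Case-1: In case of perfect multicollinearity, $r_{12} = \pm 1$ and we have |R| = 0.

Case-2: In case of perfect orthogonality, $r_{12} = 0$ and we have |R| = 1.

Hypothesis: $H_0$: The X's are orthogonal vs $H_1$: The X's are not orthogonal.

Test statistic: $\chi^2 = -\left[n - 1 - \frac{1}{6}(2k + 5)\right] \ln|R|$ with $\frac{1}{2}k(k - 1)$ degrees of freedom.

Where, n = sample size and k = no of explanatory variables.

Comment: If $\chi^2_{cal} < \chi^2_{tab}$ at α% level of significance we may accept the null
hypothesis.

(b). F-test:

Hypothesis: $H_0: R^2_{x_i \cdot x_1 x_2 \ldots x_k} = 0$ vs $H_1: R^2_{x_i \cdot x_1 x_2 \ldots x_k} \neq 0$, where $R^2_{x_i \cdot x_1 x_2 \ldots x_k}$ is the
multiple coefficient of determination of $X_i$ on the other explanatory variables.

Test statistic: $F = \dfrac{R^2_{x_i \cdot x_1 x_2 \ldots x_k} / (k - 1)}{(1 - R^2_{x_i \cdot x_1 x_2 \ldots x_k}) / (n - k)}$ with $(k - 1, n - k)$ degrees of freedom.

Comment: If $F_{cal} < F_{tab}$ at α% level of significance we may accept the null
hypothesis.

(c). t-test:

Hypothesis: $H_0: r_{x_i x_j \cdot x_1 x_2 \ldots x_k} = 0$ vs $H_1: r_{x_i x_j \cdot x_1 x_2 \ldots x_k} \neq 0$, where $r_{x_i x_j \cdot x_1 x_2 \ldots x_k}$ is
the partial correlation coefficient between $X_i$ and $X_j$.

Test statistic: $t = \dfrac{r_{x_i x_j \cdot x_1 x_2 \ldots x_k} \sqrt{n - k}}{\sqrt{1 - r^2_{x_i x_j \cdot x_1 x_2 \ldots x_k}}}$ with $(n - k)$ degrees of freedom.

Comment: If $t_{cal} < t_{tab}$ at α% level of significance we may accept the null
hypothesis.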

…. ….
Q:: What is Autocorrelation? Describe the sources of Autocorrelation.

Auto correlation: Autocorrelation (AC) means correlation between the successive
values of a random variable. Let us consider the general linear regression model,

$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + U_t$$

One of the basic assumptions of this model is that the error terms $U_t$ are mutually
independent or uncorrelated, i.e., $E(U_t U_s) = 0$ for $t \neq s$. But this assumption may not
be valid in the case of time series or cross-sectional data. The violation of this
assumption is known as the autocorrelation problem.

Sources of Auto Correlation:

1. Omitted explanatory variables.
2. Misspecification of the mathematical form of the model.
3. Interpolation in the statistical observations.
4. Misspecification of the true random error U.
…. ….
Q:: What are the consequences of autocorrelation? What remedial measures can
be taken to alleviate the problem of autocorrelation?

Consequence of AC:

1. When the U's are serially correlated, the least squares estimators (LSE) are still
unbiased.
2. The variances of the OLS estimates are larger than those of estimates obtained by
methods that take the autocorrelation into account.
3. The LSE's are consistent even though the U's are autocorrelated.
4. But the LSE's aren't efficient.
5. The zero order AC coefficient is always unity.
6. The usual t and F tests of significance are no longer valid.
7. R² tends to be overestimated.

Remedial measures:

1. Try to find out if the AC is pure.
2. If it is pure AC, one can use an appropriate transformation of the original
model.
3. In case of a large sample one can use the Newey-West method to obtain
standard errors of the OLS estimates.
4. In some special cases we can continue to use the OLS method.

…. ….
Q:: Describe a procedure to detect autocorrelation.

Detection of AC: Run test:

Given an ordered sequence of two or more types of symbols, a run is defined to be
a succession of one or more identical symbols which are followed and preceded
by a different symbol or no symbol at all.

Hypothesis: $H_0$: The successive outcomes (signs of the residuals) are independent
vs $H_1$: They are not independent.

Let, n1 = no of positive symbols.

n2 = no of negative symbols.

k = no of runs

n = n1 + n2

1. Under the null hypothesis that successive outcomes are independent, and
assuming that n1 > 10 and n2 > 10, the no of runs is approximately normally distributed
with,

Mean, $E(k) = \dfrac{2 n_1 n_2}{n} + 1$ and Variance, $\sigma_k^2 = \dfrac{2 n_1 n_2 (2 n_1 n_2 - n)}{n^2 (n - 1)}$

2. If the hypothesis of randomness is sustainable we should expect k, the no of runs
obtained, to lie between $E(k) \pm 1.96\,\sigma_k$ with 95% confidence.
3. Decision rule: Don't reject the null hypothesis if $E(k) - 1.96\,\sigma_k \leq k \leq E(k) + 1.96\,\sigma_k$.

Note: The Generalized Least Squares method (GLS) is used to obtain the parameters if
there exists AC in any regression model.
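
The run test procedure can be sketched as below, assuming the usual large-sample formulas E(k) = 2·n1·n2/n + 1 and variance 2·n1·n2(2·n1·n2 − n) / (n²(n − 1)) and a 95% interval of ±1.96 standard deviations. The function name and the residual signs are hypothetical.

```python
from math import sqrt


def runs_test(residuals):
    """Return (k, (lower, upper), accept) for the run test on the signs of residuals."""
    signs = [r > 0 for r in residuals if r != 0]      # True for +, False for -
    n1 = sum(signs)                                   # no of positive symbols
    n2 = len(signs) - n1                              # no of negative symbols
    n = n1 + n2
    # a new run starts whenever the sign changes
    k = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    half_width = 1.96 * sqrt(variance)
    lower, upper = mean - half_width, mean + half_width
    return k, (lower, upper), lower <= k <= upper


# hypothetical residuals: twelve positive values followed by twelve negative values
clustered = [1.0] * 12 + [-1.0] * 12
print(runs_test(clustered))   # only 2 runs: too few, suggesting positive autocorrelation
```

Too few runs point to positive autocorrelation and too many runs to negative autocorrelation; in both cases k falls outside the interval and the null hypothesis is rejected.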
…. ….
Q:: Define heteroscedasticity with its consequences and remedial measures.

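Heteroscedasticity: The basic assumptions of the linear regression model are that,

a. $E(u_i) = 0$
b. $V(u_i) = E(u_i^2) = \sigma^2$ (a constant)
c. $E(u_i u_j) = 0$ for $i \neq j$

Assumption (b), that the disturbances have the same variance, is known as
homoscedasticity. If the assumptions are,

a. $E(u_i) = 0$
b. $V(u_i) = E(u_i^2) = \sigma_i^2$ (varying from observation to observation)
c. $E(u_i u_j) = 0$ for $i \neq j$

Then it is called heteroscedasticity.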

Consequence of Heteroscedasticity:

1. The estimates of the regression parameters are still unbiased but inefficient.
2. The usual estimates of the variances of the estimators are biased, so the
conventional tests of significance do not hold.
3. The prediction would not be efficient.
4. Heteroscedasticity doesn't destroy the unbiasedness of the OLS estimators but they
are no longer BLUE.

Remedial measures:

a. When $\sigma_i^2$ is known
b. When $\sigma_i^2$ is unknown

Note: The Spearman Rank Correlation test and the Goldfeld-Quandt test are used to test
for heteroscedasticity. The Weighted Least Squares method (WLS) is used to
obtain the parameters if there exists heteroscedasticity in any regression model.

…. ….

::Miscellaneous::

Q:: Explain why you want to start your career as a banker/ as a banker of BB ?

Q::Bangladesh Bank should accept equal number of male and female employees in
every department. To what extent do you agree or disagree ?

End of Part-1

Thankfully,

Abu Naser Mohammad Motaher Hossain Raju


Probationary Officer
Modhumoti Bank Limited
motaher.statcu16@gmail.com
