
ECONOMICS SEM 4

1. Statistics- a way to get information from collected data.


Objective: extract information from data.
Descriptive Stats- organising, summarizing, and presenting data in a convenient and
informative manner. Types- a) graphical techniques and b) numerical technique (e.g.-
average or mean)
A sample is a set of data drawn from the studied population. A descriptive measure of a
sample is called a statistic.
Population- the entire set of observations under study.
The median is one measure of central location.
Descriptive Measure of a population is called a parameter.
Average
Range- subtracting the smallest from the largest number
Histogram- shows the number of observations distributed in a particular range.
Inferential Stats- methods used to draw conclusions or inferences about characteristics of
populations based on sample data.
Statistical Inference- process of making an estimate, prediction, or decision about a
population based on sample data. Measures-
Confidence level- the proportion of times that an estimating procedure will be correct.
Significance Level-measures how frequently the conclusion will be wrong.
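The basic descriptive measures above (mean, median, range) can be sketched in Python; the exam marks below are hypothetical data made up for illustration:

```python
import statistics

# Hypothetical exam marks (illustrative data, not from the notes)
marks = [52, 61, 61, 70, 74, 78, 83, 90]

mean = statistics.mean(marks)          # arithmetic average
median = statistics.median(marks)      # middle observation
data_range = max(marks) - min(marks)   # largest minus smallest

print(mean, median, data_range)  # 71.125 72.0 38
```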

How does it help others?


Helps the managers (all sectors of the economy) to summarize and organize the data they
receive on a massive scale and this further helps in decision making.
Decision making is often performed with the help of Descriptive statistics. These methods
are straightforward.
Most management, business, and economics students will encounter numerous opportunities
to make valuable use of graphical and numerical descriptive techniques when preparing
reports and presentations in the workplace.
Types of Data and Information
Some terms:
Variable- some characteristic of a population or sample. E.g., the mark on a stats exam by a
student or price of a stock. Represented by uppercase letters like X, Z, Y.
Values- values of variables are the possible observations of the variable. E.g., price of stocks
(real number) ranging from 0 to 100 dollars.
Data- observed values of a variable. Data is the plural; datum is the singular.

Three types of Data-


1. Interval- real numbers, such as heights, weights, incomes, and distances.
2. Nominal- values which are categories, such as the marital status of people. Codes are
assigned to these non-numerical values. Nominal data are also called Qualitative or
Categorical data.
3. Ordinal- appear to be nominal but the difference is the order of their values has
meaning. Such as poor, fair, good, very good. It indicates a higher rating.
Accordingly, codes are assigned in ascending order.
* The difference between ordinal and interval data is that differences in interval data are
consistent and meaningful, whereas ordinal codes are arbitrary, so it is impossible to
compute and interpret differences.

Calculations of these data types- refer to pg. 17


1. Interval- all calculations can be performed.
2. Nominal- due to the arbitrary assignment of codes, calculations are not possible, but we can find the frequency.
3. Ordinal- permissible calculations are those involving a ranking process.

Describing a set of Nominal Data.


Frequency Distribution- Summarising the data in a table, which presents categories and
their counts.
Relative frequency distribution- lists the categories and the proportion with which each
occurs.
Function: COUNTIF (Input range, Criteria)
E.g., COUNTIF(X1:X250, 1)
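The COUNTIF-style frequency and relative frequency distributions can be sketched in Python; the category codes below are hypothetical:

```python
from collections import Counter

# Hypothetical nominal responses coded 1-3 (e.g., marital-status codes)
codes = [1, 2, 1, 3, 1, 2, 2, 1, 3, 1]

freq = Counter(codes)                                    # frequency distribution
rel_freq = {k: v / len(codes) for k, v in freq.items()}  # relative frequencies

print(freq[1])  # counterpart of COUNTIF(range, 1) -> 5
```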
Describing relationship between two nominal variables or two or more data sets.
One of the methods used is to create a cross-classification table and to produce a table
showing the row relative frequencies. Using Pivot Table (pg36), follow the instructions; then,
for the "Summarize values by" step, go to Values, right-click, and choose Count. To
convert into percentages, right-click and choose % of row.
Interpretation= If the two variables are unrelated, then the patterns exhibited in the bar
charts should be approximately the same. If some relationship exists, then some bar charts
will differ from others.

Graphical techniques to describe interval data


The frequency distribution provides information about how the numbers are distributed, the
information is more easily understood and imparted by drawing a picture or graph. The
graph is called a histogram. A histogram is created by drawing rectangles whose bases are
the intervals and whose heights are the frequencies.
Steps for histogram- go to Data, then Data Analysis, click Histogram, then enter the Bin
Range and Output Range, check Labels, and choose Chart Output.

Class interval width = (Largest Observation − Smallest Observation) / number of classes
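The class-width formula can be applied directly; the observations and number of classes below are hypothetical:

```python
# Class interval width = (largest observation - smallest observation) / number of classes
observations = [12, 47, 33, 8, 51, 29, 40]  # hypothetical data
num_classes = 5

width = (max(observations) - min(observations)) / num_classes
print(width)  # (51 - 8) / 5 = 8.6
```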


Shapes of Histogram-
Symmetry- the two sides are identical in shape and size.
Positively skewed- long tail extends to the right.
Negatively skewed- long tail extends to the left.
A skewed histogram is one with a long tail extending to either the right or the left.
Unimodal- single peak; Bimodal- two peaks

Time series- Time-series data are often graphically depicted on a line chart, which is a plot
of the variable over time. It is created by plotting the value of the variable on the vertical
axis and the time periods on the horizontal axis.

Scatter Diagram- Economists develop statistical techniques to describe the relationship


between such variables as unemployment rates and inflation. The technique is called a
scatter diagram. Pg 69 for graphs
Interpret- positive linear, negative linear, no relationship, or nonlinear relationship.
Ogives- An ogive is a freehand curve drawn to show the cumulative frequency
distribution. It is also known as a cumulative frequency polygon.

Unit 4 – Sampling
Direct observation= The simplest method of obtaining data is by direct observation. When
data are gathered in this way, they are said to be observational.
Experimental data- produced through experiments; this method is more expensive.
Surveys- One of the most familiar methods of collecting data is the survey, which solicits
information from people concerning such things as their income, family size, and opinions
on various issues.
Personal Interview- involves an interviewer soliciting information from a respondent by
asking prepared questions. A personal interview has the advantage of having a higher
expected response rate than other methods of data collection.
Telephone Interview- A telephone interview is usually less expensive, but it is also less
personal and has a lower expected response rate. Unless the issue is of interest, many people
will refuse to respond to telephone surveys.

Simple Random Sampling-


is a sample selected in such a way that every possible sample with the same number of
observations is equally likely to be chosen.
A stratified random sample is obtained by separating the population into mutually
exclusive sets, or strata, and then drawing simple random samples from each stratum.
(Making categories)
A cluster sample is a simple random sample of groups or clusters of elements.
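The three sampling schemes above can be sketched in Python; the population, strata boundaries, and sample sizes below are hypothetical:

```python
import random

random.seed(1)  # reproducible illustration

population = list(range(100))  # hypothetical population of 100 units

# Simple random sample: every possible sample of size 10 is equally likely
srs = random.sample(population, 10)

# Stratified random sample: mutually exclusive strata, each sampled separately
strata = {"low": population[:50], "high": population[50:]}
stratified = [x for s in strata.values() for x in random.sample(s, 5)]

# Cluster sample: a simple random sample of whole groups of elements
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen_clusters = random.sample(clusters, 2)
```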

Sampling Error-
refers to differences between the sample and the population that exist only because of the
observations that happened to be selected for the sample.
Sampling error is an error that we expect to occur when we make a statement about a
population that is based only on the observations contained in a sample taken from the
population
The difference between the true (unknown) value of the population mean and its estimate,
the sample mean, is the sampling error.
E.g., the population example.
Non sampling Error-
Non sampling error is more serious than sampling error because taking a larger sample will
not diminish the size, or the possibility of occurrence, of this error. Even a census can (and
probably will) contain nonsampling errors. Nonsampling errors result from mistakes made
in the acquisition of data or from the sample observations being selected improperly.
1. Errors in data acquisition- arises from the recording of incorrect responses.
2. Nonresponse error- refers to error (or bias) introduced when responses are not
obtained from some members of the sample.
3. Selection bias- occurs when the sampling plan is such that some members of the
target population cannot possibly be selected for inclusion in the sample.

Unit 5
Measures of central location-
Arithmetic mean= mean is computed by summing the observations and dividing by the
number of observations.
Population mean- μ; Sample mean- x̄
Function= AVERAGE ([Input Range]) or Descriptive Analysis.
Median= The median is calculated by placing all the observations in order (ascending or
descending). The observation that falls in the middle is the median.
Function= MEDIAN (Input Range)
The mode is defined as the observation (or observations) that occurs with the greatest
frequency. Both the statistic and parameter are computed in the same way.
Pg-93
Range- Largest obv-smallest obv
Variance- how the data vary around the arithmetic mean of the data set. Sample variance
denotes the variation between your sample data and the mean of your sample data. A sample
is drawn from the huge pool of the population (the entire available data).
Function: VAR (Input Range)
Population Varx and Sample Varx pg. 97
Variance cannot be negative because the deviations are squared, which eliminates any
possibility of a negative value.
Standard deviation: Shows how much your data deviates or varies from the average or the
mean of the data. Low SD means data is clustered around mean and high means there is a
high dispersion of data from mean.
Interpret= info depends on shape of histogram and if the histogram is bell shaped, use the
empirical rule.
1. Approximately 68% of all observations fall within one standard deviation of the mean.
2. Approximately 95% of all observations fall within two standard deviations of the mean.
3. Approximately 99.7% of all observations fall within three standard deviations of the
mean
The coefficient of variation of a set of observations is the standard deviation of the
observations divided by their mean: Population coefficient of variation: CV = σ/μ
Sample coefficient of variation: cv = s/x̄
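These dispersion measures can be sketched with Python's statistics module; the sample below is hypothetical:

```python
import statistics

sample = [4, 8, 6, 5, 3, 7]  # hypothetical sample data

s2 = statistics.variance(sample)   # sample variance (divides by n - 1)
s = statistics.stdev(sample)       # sample standard deviation
cv = s / statistics.mean(sample)   # sample coefficient of variation, s / x̄

print(s2)  # 3.5
```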

Percentile
The Pth percentile is the value for which P% are less than that value and (100 − P)% are
greater than that value.
Because these three statistics divide the set of data into quarters, these measures of relative
standing are also called quartiles. The first or lower quartile is labelled Q1. It is equal to the
25th percentile. The second quartile, Q2, is equal to the 50th percentile, which is also the
median. The third or upper quartile, Q3, is equal to the 75th percentile.
Quintiles divide the data into fifths, and deciles divide the data into tenths.
LP = (n + 1) P/100
The interquartile range measures the spread of the middle 50% of the observations. Large
values of this statistic mean that the first and third quartiles are far apart, indicating a high
level of variability.
=Q3-Q1
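The percentile-location formula LP = (n + 1)P/100 and the interquartile range can be illustrated with hypothetical data chosen so that the quartile locations fall on whole positions:

```python
# Location of the Pth percentile: L_P = (n + 1) * P / 100
data = sorted([5, 12, 9, 22, 14, 7, 18])  # hypothetical observations
n = len(data)  # n = 7, so quartile locations are whole numbers here

def percentile_location(p):
    return (n + 1) * p / 100

q1 = data[int(percentile_location(25)) - 1]  # position 2 -> first quartile
q3 = data[int(percentile_location(75)) - 1]  # position 6 -> third quartile
iqr = q3 - q1                                # interquartile range, Q3 - Q1

print(q1, q3, iqr)  # 7 18 11
```

When L_P is fractional, the percentile is found by interpolating between the two adjacent observations (not shown here).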

Random Variable and Probability Distribution


RV- a function or rule that assigns real numbers to the outcomes of a random experiment.
A random experiment is one whose possible outcomes are known in advance, but whose exact outcome is unknown.
unknown.
Discrete RV= one that can take on a countable number of values.
Continuous RV= one whose values are uncountable.
A probability distribution is a table, formula, or graph that describes the values of a random
variable and the probability associated with these values.
Requirements for a Distribution of a Discrete Random Variable
1. 0 ≤ P(x) ≤ 1 for all x
2. Σ P(x) = 1, summed over all x

Population mean (μ), sample mean, population variance, and sample variance. (Refer to
notebook)
SD= square root of variance.
E(X) = Σ x · P(X = x)
The expected value is the sum of each value taken by the random variable multiplied by its associated probability.
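The expected-value formula can be sketched with a hypothetical discrete distribution:

```python
# E(X) = sum over x of x * P(X = x)
# Hypothetical discrete probability distribution
dist = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

assert abs(sum(dist.values()) - 1) < 1e-9  # requirement: probabilities sum to 1

expected = sum(x * p for x, p in dist.items())
print(expected)  # approximately 1.7
```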

BINOMIAL DISTRIBUTION
Fixed no of trials
Has only two outcomes – success and failure
Probability of success- p
Probability of failure= 1-p
trials are independent, which means that the outcome of one trial does not affect the
outcomes of any other trials
Bernoulli trial= a single trial with only two outcomes: success (probability p) and failure
(probability 1 − p).
P(x) = [n! / (x!(n − x)!)] p^x (1 − p)^(n−x)
Function= BINOM.DIST
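The binomial probability can be computed directly from the formula; the n = 5 trials and p = 0.5 below are hypothetical choices:

```python
import math

def binom_pmf(x, n, p):
    # P(x) = [n! / (x! (n - x)!)] * p**x * (1 - p)**(n - x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical example: 2 successes in 5 independent trials with p = 0.5
print(binom_pmf(2, 5, 0.5))  # 0.3125
```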

POISSON DISTRIBUTION
If we want to find out the probability of successes in each interval of time or space, we use
Poisson distribution.
Function= POISSON.DIST(x, μ, cumulative)
P(x) = e^(−μ) μ^x / x!
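The Poisson formula can be sketched directly; the mean of μ = 2 below is hypothetical:

```python
import math

def poisson_pmf(x, mu):
    # P(x) = e**(-mu) * mu**x / x!  (counterpart of POISSON.DIST(x, mu, FALSE))
    return math.exp(-mu) * mu ** x / math.factorial(x)

# Hypothetical example: mean of 2 successes per interval, probability of exactly 0
print(poisson_pmf(0, 2))  # e**-2, about 0.135
```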
NORMAL DISTRIBUTION
Based on continuous random variables, values are within a particular interval.
Heights, Weights, and distance covered.
It is bell shaped and symmetric around the mean.
Functions= NORM.DIST(x, μ, SD, TRUE)
NORM.S.DIST(Z, TRUE)
NORM.INV
NORM.S.INV
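Python's statistics.NormalDist mirrors these Excel functions; the mean and standard deviation below are hypothetical:

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 100, standard deviation 15
nd = NormalDist(mu=100, sigma=15)

p = nd.cdf(115)      # like NORM.DIST(115, 100, 15, TRUE): P(X <= 115)
x = nd.inv_cdf(0.5)  # like NORM.INV(0.5, 100, 15): value at that probability

print(round(p, 4), x)  # 0.8413 100.0
```

Note that 115 is one standard deviation above the mean, so P(X ≤ 115) ≈ 0.84 matches the empirical rule (about 68% within one SD, plus the 50% below the mean over two).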

Regression
Regression analysis is used to predict the value of one variable on the basis of other
variables. It develops a mathematical equation or model that describes the relationship between the
dependent variable (the variable to be forecast) and the independent variable (believed by the
practitioner to be related to the dependent variable).
Simple Linear Regression Model- also called first order linear model
y = β0 + β1x + ε
y = dependent variable x = independent variable β0 = y-intercept β1 = slope of the line
(defined as rise/run) ε = error variable
The straight line that we wish to use to estimate β0 and β1 is the "best" straight line, best in
the sense that it comes closest to the sample data points. This best straight line is called the
least squares line. / Least squares method – Chp 4 pg 114

The slope is defined as rise/run, which means that it is the change in y (rise) for a one-unit
increase in x (run). Put less mathematically, the slope measures the marginal rate of change
in the dependent variable. The marginal rate of change refers to the effect of increasing the
independent variable by one additional unit.
Intercept- the point at which the regression line intersects the y-axis.
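The least squares slope and intercept can be computed from their usual formulas; the data points below are hypothetical:

```python
# Least squares estimates:
#   b1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)**2)   (slope)
#   b0 = ȳ - b1 * x̄                               (y-intercept)
xs = [1, 2, 3, 4, 5]  # hypothetical independent variable
ys = [2, 4, 5, 4, 5]  # hypothetical dependent variable

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

print(round(b1, 3), round(b0, 3))  # 0.6 2.2
```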

Coefficient of Correlation tells us about the linear relationship whether it is strong or weak.
Refer to pg 636 and 637 of textbook for all formulae and example.
Interpretation of Regression: for xr 16-04
1. Multiple R: The correlation coefficient between the independent variable and
dependent variable is 0.876 which indicates a strong positive relation.
2. R square: The coefficient of determination is 0.767, which means that 76.7% of the
variation in the dependent variable is explained by the independent variable.
3. Adjusted R: Adjusts the R-squared value for the number of predictors in the model. 0.749
indicates that the independent variable explains a substantial amount of variation in the
dependent variable.
4. Standard Error: The standard error of the estimate is 3.4. It measures the average distance
between the observed and predicted values.
5. Observations- 15
6. ANOVA table shows the sources of variation in the regression model.
Df- Degree of freedom. It is 1 for regression and 13 for residuals.
SS- sum of squares. Represents variation exp by regression model and the
unexplained variation of residuals.
MS- mean square. Dividing SS by Df.
F- The F statistic is the ratio of the mean square values and tests the overall significance of
the regression model. Here the p-value associated with F is small, indicating that the model
is statistically significant.
7. Intercept: The estimated intercept is 26.917, meaning that when x = 0 the predicted
value of the dependent variable is 26.917. It represents the predicted value of the
dependent variable when all independent variables are zero.
8. Overweight: The estimated coefficient for the independent variable "Overweight" is
0.794. It indicates that for each unit increase in the "Overweight" variable, the
predicted value of the dependent variable increases by 0.794.
9. This table shows the predicted values, residuals, and observed values for each
observation in the dataset.
Predicted Television: The predicted values of the dependent variable based on the
regression model.
Residuals: The differences between the observed values and the predicted values.
Positive values indicate that the actual values are higher than predicted, while
negative values indicate the opposite.
10. Based on this information, you can conclude that the regression model is statistically
significant and explains a significant portion of the variation in the dependent
variable. The "Overweight" variable has a positive and significant effect on the
predicted value of the dependent variable.
11. The intercept of 26.917 is the predicted value of the dependent variable when all
independent variables are zero; the slope coefficient is 0.794.

OR
The linear regression equation is ŷ (predicted value of y) = bX (slope, i.e., the rate of
increase or decrease of y for each one-unit increase in X) + a (y-intercept = level of y when x = 0).

SSE- Sum of squares for error is the minimized sum of squared deviations.

CORRELATION
It is a measure of linear association.
Karl Pearson
Spearman’s Correlation
