Stats

STATISTICS
- Is a branch of applied mathematics that involves the collection, description, analysis, and inference of conclusions from quantitative data.
- Is the study and manipulation of data: gathering, reviewing, and analyzing data, and drawing conclusions from it.
- Is the art and science of collecting, analyzing, and interpreting data.
  o Business Statistics
  o Descriptive Statistics – deals with methods of organizing, summarizing, and presenting data in a convenient and informative way (graphical techniques and numerical techniques)

DATA – raw, unorganized facts that need to be processed. Data can be something simple and seemingly random and useless until it is organized.
  o Unorganized and unrefined facts
  o Individual units that contain raw material and do not carry any specific meaning
  o Does not depend on information
  o Raw data -> insufficient for decision making

INFORMATION – data that has been processed, organized, structured, or presented in a given context.
  o Comprises processed, organized data presented in a meaningful context
  o Group of data that carries a logical meaning
  o Depends on data
  o Sufficient for decision making

STATISTICIAN
- A statistics practitioner is a person who uses statistical techniques properly.
- A statistician is an individual who works with the mathematics of statistics. The work involves research that develops techniques and concepts that may help the statistics practitioner.

KEY STATISTICAL CONCEPTS
• Population – the group of all items of interest to a statistics practitioner (frequently very large, sometimes infinitely large)
  ➢ Parameter – a descriptive measure of a population
  ➢ Census – the process of collecting data from the population
• Sample – a set of data drawn from the studied population
  ➢ Statistic – a descriptive measure of a sample
  ➢ Survey – the process of collecting data from the sample
• Statistical Inference – the process of making an estimate, prediction, or decision about a population based on sample data.
  ➢ It is far easier and cheaper to take a sample from the population of interest and draw conclusions or make estimates about the population on the basis of the information provided by the sample.
  ➢ Estimates > Test Hypotheses > Conclusion
• Measures of Reliability
  ➢ Confidence level – the proportion of times that an estimating procedure will be correct
  ➢ Significance level – measures how frequently the conclusion will be wrong

TYPES OF DATA AND INFORMATION
• Variable – some characteristic of a population or sample
• Values – the possible observations of the variable
• Data – the observed values of a variable
  ➢ Datum – the singular form of data; data is plural
• Interval – real numbers, such as heights, weights, incomes, and distances
  ➢ Quantitative/Numerical
  ➢ Ratio
• Nominal – categories
  ➢ Qualitative/Categorical
• Ordinal – appears to be nominal, but the difference is that the order of the values has meaning

HIERARCHY OF DATA – The data types can be placed in order of the permissible calculations.
• INTERVAL
  ➢ Values are real numbers
  ➢ All calculations are valid
  ➢ Data may be treated as ordinal or nominal
• ORDINAL
  ➢ Values must represent the ranked order of the data
  ➢ Calculations based on an ordering process are valid
  ➢ Data may be treated as nominal but not as interval
• NOMINAL
  ➢ Values are arbitrary numbers that represent categories
  ➢ Only calculations based on the frequencies or percentages of occurrence are valid
  ➢ Data may not be treated as ordinal or interval
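The hierarchy above can be illustrated with a minimal Python sketch that assumes small hypothetical data sets (not from the notes). Nominal values only get counts and proportions, ordinal values get an order-based median, and interval values get the mean. The helper names `summarize_nominal`, `median_ordinal`, and `mean_interval` are illustrative only.

```python
from collections import Counter
from statistics import mean, median_low


def summarize_nominal(values):
    """Nominal: only counts and proportions (percentages of occurrence) are valid."""
    n = len(values)
    return {category: (count, count / n) for category, count in Counter(values).items()}


def median_ordinal(values, order):
    """Ordinal: calculations based on ordering, such as the median, are valid.

    `order` lists the categories from lowest to highest rank. median_low always
    picks an actual rank, so the result is one of the categories.
    """
    ranks = [order.index(v) for v in values]
    return order[median_low(ranks)]


def mean_interval(values):
    """Interval: all calculations are valid, including the mean."""
    return mean(values)


# Hypothetical example data
answers = ["Yes", "No", "Yes", "Yes"]                     # nominal
scale = ["poor", "fair", "good", "excellent"]             # ordinal categories, low to high
ratings = ["good", "poor", "excellent", "good", "fair"]   # ordinal observations
heights_cm = [160.0, 172.5, 181.0]                        # interval

print(summarize_nominal(answers))       # {'Yes': (3, 0.75), 'No': (1, 0.25)}
print(median_ordinal(ratings, scale))   # good
print(mean_interval(heights_cm))
```

Computing a mean of the ordinal ranks would treat them as interval data, which the hierarchy does not allow. The sketch therefore reports a category rather than a number for the ordinal case.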

SAMPLING METHODS – A representative sampling method is needed for good statistical inference about the population.
• Probability (random) and non-probability (convenience) sampling
• Population (all) vs. sample (part)

SCALE OF MEASUREMENT
• Nominal Scale – consists of labels or names used to identify an attribute of the element
• Ordinal Scale – appears to be nominal, but the difference is that the order of the values has meaning
• Interval Scale – has the properties of the ordinal scale, and the interval between values is fixed (real numbers)
• Ratio Scale – has all the properties of the interval scale, and a value of zero indicates the absence of the variable

DATA BASED ON RESEARCH DESIGN
• Qualitative data
  - Include labels or names used to identify an attribute of each element
  - Qualitative data use either the nominal or ordinal scale of measurement and may be nonnumeric or numerical
  - Summarize: count the observations and compute the proportion in each category
• Quantitative data
  - Require numeric values that indicate how much or how many
  - Quantitative data are obtained using either the interval or ratio scale of measurement
  - Summarize: arithmetic operations are meaningful
  - Cross-sectional data – data collected at the same or approximately the same point in time
  - Time series data – data collected over several time periods

RELATIONSHIP BETWEEN VARIABLES
• Univariate – techniques applied to a single set of data
• Bivariate – techniques used to depict the relationship between two variables, which we wish to do in many situations
• Cross-classification table (cross-tabulation table) – used to describe the relationship between two nominal variables

DESCRIBING DATA
• Descriptive statistical methods are used to summarize data sets so that we can extract the relevant information.
• Bar charts, pie charts, and frequency distributions are employed to summarize single sets of nominal data.
• Interval Data
  ➢ Histogram – created by drawing rectangles whose bases are the intervals and whose heights are the frequencies.
  ➢ Classes – a frequency distribution for interval data is created by counting the number of observations that fall into each of a series of intervals (classes).
  ➢ Class Interval
    - Sturges formula: Number of class intervals = 1 + 3.3 log(n), where n is the sample size and log is base 10
    - Class width = (Largest observation – Smallest observation) / Number of classes

SHAPES OF HISTOGRAM
• Symmetry – a histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size.
• Skewness – a skewed histogram is one with a long tail extending to either the right or the left. The former is called positively skewed, and the latter is called negatively skewed.
• Number of Modal Classes – a mode is the observation that occurs with the greatest frequency. A modal class is the class with the largest number of observations.
• Unimodal histogram – one with a single peak. A special type of symmetric unimodal histogram is one that is bell shaped.
• Bimodal histogram – one with two peaks, not necessarily equal in height.
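Here is a minimal Python sketch of the Class Interval steps: the Sturges formula, then the class width, then counting observations per class. It assumes a hypothetical sample of 16 observations. Two choices are assumptions, and in practice both are often rounded to convenient values:
- the number of classes is rounded to the nearest whole number;
- the first class starts at the smallest observation.

The sketch also returns relative and cumulative relative frequencies and lists the modal class(es). `modal_classes` and the other helper names are illustrative.

```python
import math
from itertools import accumulate


def sturges_classes(n):
    """Sturges formula: 1 + 3.3 log10(n), rounded to the nearest whole number (assumption)."""
    return round(1 + 3.3 * math.log10(n))


def class_width(data, k):
    """(Largest observation - smallest observation) / number of classes."""
    return (max(data) - min(data)) / k


def frequency_distribution(data, k):
    """Counts, relative frequencies, and cumulative relative frequencies per class.

    Classes start at the smallest observation and include their lower limit;
    the largest observation is placed in the last class.
    """
    lower, width = min(data), class_width(data, k)
    counts = [0] * k
    for x in data:
        index = min(int((x - lower) // width), k - 1)
        counts[index] += 1
    n = len(data)
    relative = [c / n for c in counts]
    cumulative = list(accumulate(relative))  # running totals of the relative frequencies
    return counts, relative, cumulative


def modal_classes(counts):
    """Indexes of every class that has the largest number of observations."""
    top = max(counts)
    return [i for i, c in enumerate(counts) if c == top]


# Hypothetical sample, n = 16
data = [2, 3, 5, 7, 8, 11, 12, 13, 15, 18, 20, 21, 22, 25, 27, 30]
k = sturges_classes(len(data))  # 1 + 3.3 * log10(16) is about 4.97, so 5 classes
counts, relative, cumulative = frequency_distribution(data, k)

print(k, class_width(data, k))  # 5 5.6
print(counts)                   # [4, 4, 2, 3, 3]
print(cumulative)               # [0.25, 0.5, 0.625, 0.8125, 1.0]
print(modal_classes(counts))    # [0, 1]
```

In this sample the first two classes tie for the largest count, so the sketch reports two modal classes.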

• Cross-sectional Data – observations at the same point in time
• Time-series Data – represent measurements at successive points in time
  ➢ Line Chart – plot of a variable over time

MEASURING INFLATION
• Inflation – the increase in the prices of goods and services.
• Consumer Price Index (CPI) – works with a basket of some 300 goods and services in the United States (and similar baskets in other countries), including such diverse items as food, housing, clothing, transportation, health, and recreation.
• Basket – defined for the "typical" or "average" middle-income family; the set of items and their weights are revised periodically (every 10 years in the United States; every 7 years in Canada).
1. Compute the inflation-adjusted values
  ➢ Use the CPI
2. Convert the CPI from monthly to yearly values
  ➢ Use one year as the base for the index
3. Compute the inflation-adjusted values, using 2012 as the base year
  ➢ Compute the 2012 base
  ➢ Compute the 2012 CPI
  ➢ Compute the inflation-adjusted values

Scatter Diagram
- Used to show the relationship between two interval variables
- The two most important characteristics are the strength and direction of the linear relationship.
- To determine the strength of the linear relationship, draw a straight line through the points in such a way that the line represents the relationship. If most points fall close to the line, there is a linear relationship.
- There are other types of relationships, such as quadratic or exponential ones.
- Direction
  ➢ Positive – the dependent variable increases when the independent variable increases
  ➢ Negative – the dependent variable decreases when the independent variable increases
- In interpreting the results of a scatter diagram, it is important to understand that if two variables are linearly related, it does not mean that one is causing the other. We can express this more eloquently as: correlation is not causation.

GRAPHICAL EXCELLENCE
1. The graph represents large data sets concisely and coherently. Graphical techniques -> large data sets; small data sets -> table; one or two numbers -> sentence.
2. The ideas and concepts the statistics practitioner wants to deliver are clearly understood by the viewer. The chart is designed to describe what would otherwise be described in words.
3. The graph encourages the viewer to compare two or more variables. Graphs are often best used to depict relationships between two or more variables or to explain how and why the observed results occurred.
4. The display induces the viewer to address the substance of the data and not the form of the graph.
5. There is no distortion of what the data reveal.

GRAPHICAL DECEPTION
- Figure without a scale: no y-axis scale is shown.
- Graphs with different captions: for the same graph, the interpretation might differ because of the caption.
- Showing a big drop in your graph: for this, a percentage form on the y-axis is preferred.
- The first chart shows almost no difference because of its scale, but when the scale is adjusted, an increase in sales can be observed. An expanded scale is usually marked as truncated (zigzag) to show that the vertical axis does not begin at zero.
- The first chart shows volatility; the second chart shows stability.
- Bar chart widths should be the same.
- When using a pictogram, the widths should be the same.

GRAPHICAL TECHNIQUES
- Cross-sectional data: Nominal, Ordinal and Interval
- Interval Data (refer to activity)
  ➢ Histogram
  ➢ Stem-and-leaf display
  ➢ Ogive: relative frequency distribution
- Time Series (measuring inflation)
  ➢ Line chart
  ➢ Scatter diagram

OGIVE
• An ogive is a graphical representation of the cumulative relative frequency distribution.
• A frequency distribution lists the number of observations that fall into each class interval.
• A relative frequency distribution highlights the proportion of the observations that fall into each class.
• A cumulative relative frequency distribution highlights the proportion of observations that lie below each of the class limits.

APPLICATION OF HISTOGRAM IN FINANCE
• Stock and bond valuation – a basic understanding of how financial assets, such as stocks and bonds, are valued is critical to good financial management. Both are considered long-term financial assets. Valuation is necessary for capital budgeting and capital structure decisions.
• Return on Investment – calculated by dividing the gain (or loss) by the initial value of the investment.
  e.g. A $100 investment that is worth $106 after 1 year has a 6% rate of return. A $100 investment that loses $20 has a -20% rate of return.
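The return-on-investment example and the inflation-adjustment steps under MEASURING INFLATION reduce to simple arithmetic. The minimal Python sketch below covers both, under these assumptions:
- The CPI readings and salary figures are hypothetical.
- Converting monthly CPI to a yearly value is a simple average of the monthly readings.
- Rebasing divides each year's CPI by the base-year CPI and multiplies by 100.

The function names are illustrative.

```python
def rate_of_return(initial_value, final_value):
    """Gain (or loss) divided by the initial value of the investment."""
    return (final_value - initial_value) / initial_value


def annual_cpi(monthly_cpi):
    """Convert monthly CPI readings to one yearly value (simple average, an assumption)."""
    return sum(monthly_cpi) / len(monthly_cpi)


def rebase_cpi(cpi_by_year, base_year):
    """Re-express the index so that the base year equals 100."""
    base = cpi_by_year[base_year]
    return {year: cpi / base * 100 for year, cpi in cpi_by_year.items()}


def inflation_adjusted(values_by_year, rebased_cpi):
    """Express each nominal value in base-year money: value / (CPI / 100)."""
    return {year: value / (rebased_cpi[year] / 100) for year, value in values_by_year.items()}


print(rate_of_return(100, 106))  # 0.06, i.e. a 6% rate of return
print(rate_of_return(100, 80))   # -0.2, i.e. a -20% rate of return

# Hypothetical monthly CPI readings and salaries (not real CPI data)
cpi = {
    2012: annual_cpi([199.0, 201.0] * 6),  # averages to 200.0
    2013: annual_cpi([209.0, 211.0] * 6),  # averages to 210.0
}
rebased = rebase_cpi(cpi, 2012)             # 2012 = 100, 2013 is about 105
salaries = {2012: 50000.0, 2013: 52500.0}
print(inflation_adjusted(salaries, rebased))  # both years are about 50000 in 2012 money
```

In the hypothetical data, the 5% salary rise exactly matches the 5% rise in the index. The inflation-adjusted salary therefore does not change.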

COMPARING TWO INVESTMENTS
Finance
1. Maximize the rate of return on investment
  • Histogram – the center of the histogram gives us information about the return one might expect from the investment
2. Reduce risk
  • Histogram – the spread (variation) of the histogram provides guidance about the risk. A narrow spread means confidence in the prediction; a wide spread means uncertainty.

LINE CHART – a plot of the variable over time. It is created by plotting the value of the variable on the vertical axis and the time periods on the horizontal axis.

SCATTER DIAGRAM (Scatterplot) – describes the relationship between two variables.

DESCRIPTIVE TECHNIQUES
• Measures of Central Location
  ➢ Mean – the arithmetic mean, or simply the average.
    Code: =AVERAGE([input range])
  ➢ Median – calculated by placing all the observations in order; the median is the observation that falls in the middle. The sample and population medians are computed the same way.
    Code: =MEDIAN([input range])
  ➢ Mode – the observation that occurs with the greatest frequency. Both the statistic and the parameter are computed the same way.
What measure to use?
- Mean – usually the first choice; interval data
- Median – not sensitive to extreme values; ordinal or interval data
- Mode – nominal, ordinal, or interval data
- Geometric mean – growth rates; interval data
  ➢ Geometric Mean – let Ri denote the rate of return (in decimal form) in period i (i = 1, 2, …, n). The geometric mean Rg of the returns R1, R2, …, Rn is defined such that
    (1 + Rg)^n = (1 + R1)(1 + R2)…(1 + Rn)
    so Rg = [(1 + R1)(1 + R2)…(1 + Rn)]^(1/n) - 1
  ➢ Code: =GEOMEAN([input range])-1, where the input range holds the values 1 + Ri
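The central-location measures and the geometric-mean definition can be checked with Python's standard `statistics` module. The data and returns below are hypothetical. The helper `geometric_mean_return` is an illustrative name; it solves (1 + Rg)^n = (1 + R1)(1 + R2)…(1 + Rn) for Rg. Excel's GEOMEAN rejects zero or negative values, which is why it is applied to the values 1 + Ri rather than to the raw returns.

```python
import statistics


def geometric_mean_return(returns):
    """Solve (1 + Rg)^n = (1 + R1)(1 + R2)...(1 + Rn) for Rg; returns in decimal form."""
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth ** (1 / len(returns)) - 1


data = [3, 7, 7, 2, 9]             # hypothetical observations
print(statistics.mean(data))       # 5.6  (Excel: =AVERAGE)
print(statistics.median(data))     # 7    (Excel: =MEDIAN)
print(statistics.mode(data))       # 7    (most frequent value)

returns = [1.00, -0.50]            # hypothetical: +100% in year 1, -50% in year 2
print(statistics.mean(returns))         # 0.25, arithmetic mean of the returns
print(geometric_mean_return(returns))   # 0.0, geometric mean return
```

Take an investment that doubles and then halves. The arithmetic mean suggests a 25% average return per period. The geometric mean of 0 matches the fact that the investment ends where it started.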

What measure to use?
- Mean – usually the first choice
- Median – not sensitive to extreme values
- Geometric Mean – used to find the average growth rate, or rate of change, of a variable over time.
- Arithmetic mean of n returns (growth rates) – the appropriate mean to calculate if you wish to estimate the mean rate of return for any single period in the future.
➢ Factors that identify what measure to use
- Mean – interval
- Median – ordinal or interval (with extreme observations)
- Mode – nominal, ordinal, interval
• Measures of Variability – measure the spread or variability of the data
  ➢ Range – calculated from two values: the largest value minus the smallest value
  ➢ Mean absolute deviation (MAD) – the mean absolute deviation of a dataset is the average distance between each data point and the mean.
  ➢ Interpretation: the unit of the standard deviation is the same as the unit of the original data.

NORMAL DISTRIBUTION
• Mean = Median = Mode
• Symmetry about the center
• For a bell-shaped histogram, we apply the empirical rule:
  1. Approximately 68% of all observations fall within one standard deviation of the mean.
  2. Approximately 95% of all observations fall within two standard deviations of the mean.
  3. Approximately 99.7% of all observations fall within three standard deviations of the mean.

CHEBYSHEV'S THEOREM
• A more general interpretation of the standard deviation, which applies to all shapes of histograms.
• The proportion of observations in any sample or population that lie within k standard deviations of the mean is at least 1 - 1/k^2 (for k > 1).
• For skewed histograms:
  ➢ When k = 2, Chebyshev's theorem states that at least three-quarters (75%) of all observations lie within two standard deviations of the mean.
  ➢ When k = 3, Chebyshev's theorem states that at least eight-ninths (88.9%) of all observations lie within three standard deviations of the mean.

COEFFICIENT OF VARIATION (CV) – the coefficient of variation of a set of observations is the standard deviation of the observations divided by their mean:
  cv = s / x̄ (sample); CV = σ / μ (population)

MEASURES OF RELATIVE STANDING
• Describe the position of particular values relative to the entire data set.
• Observed in: median = 50th percentile = 2nd quartile
• Percentile – the Pth percentile is the value for which P percent of the observations are less than that value and (100 – P)% are greater than that value.
• Quartile – a measure of relative standing that divides the dataset into quarters.
• Q1 – first/lower quartile; Q2 – second/middle quartile; Q3 – third/upper quartile

INTERQUARTILE RANGE – measures the spread of the middle 50% of the observations: IQR = Q3 - Q1. Large values of this statistic mean that the first and third quartiles are far apart, indicating a high level of variability.

KURTOSIS – a measure of the tailedness of a distribution. Tailedness is how often outliers occur.

BOX PLOTS – this technique graphs five statistics: the minimum and maximum observations, and the first, second, and third quartiles. It also depicts other features of a set of data.
- The three vertical lines of the box are the first, second, and third quartiles. The lines extending to the left and right are called whiskers.
- Any points that lie outside the whiskers are called outliers. The whiskers extend outward to the smaller of 1.5 times the interquartile range or to the most extreme point that is not an outlier.
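The measures of variability, Chebyshev's bound, and the box-plot rules can be sketched in Python with the standard `statistics` module. The sketch assumes a hypothetical sample with one extreme value. Quartiles come from `statistics.quantiles` with its default "exclusive" method, which places the Pth percentile at position (n + 1)P/100. Other conventions, for example Excel's QUARTILE.INC, can give slightly different quartiles. Helper names are illustrative.

```python
import statistics


def mean_absolute_deviation(data):
    """Average distance between each observation and the mean."""
    m = statistics.mean(data)
    return sum(abs(x - m) for x in data) / len(data)


def coefficient_of_variation(data):
    """Sample coefficient of variation: s divided by the sample mean."""
    return statistics.stdev(data) / statistics.mean(data)


def share_within_k_sd(data, k):
    """Observed proportion of observations within k sample standard deviations of the mean."""
    m, s = statistics.mean(data), statistics.stdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)


def chebyshev_bound(k):
    """Chebyshev's theorem: at least 1 - 1/k^2 of the observations (k > 1)."""
    return 1 - 1 / k ** 2


def box_plot_summary(data):
    """Quartiles, IQR, whisker ends, and outliers using the 1.5 x IQR rule."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr,
            "whiskers": (min(inside), max(inside)), "outliers": outliers}


data = [2, 4, 5, 6, 7, 8, 9, 30]  # hypothetical sample with one extreme value
print(max(data) - min(data))                            # range: 28
print(mean_absolute_deviation(data))                    # 5.3125
print(round(coefficient_of_variation(data), 3))
print(share_within_k_sd(data, 2), chebyshev_bound(2))   # 0.875 0.75
print(box_plot_summary(data))
# {'q1': 4.25, 'median': 6.5, 'q3': 8.75, 'iqr': 4.5, 'whiskers': (2, 9), 'outliers': [30]}
```

In this sample, 7 of the 8 observations (87.5%) lie within two standard deviations of the mean, consistent with Chebyshev's lower bound of 75%. The value 30 falls beyond Q3 + 1.5 × IQR and is flagged as an outlier.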

OUTLIERS
- Are unusually large or small observations. Because an outlier is considerably removed from the main body of the data set, its validity is suspect.
- Outliers should be checked to determine that they are not the result of an error in recording their values.
- Can also represent unusual observations that should be investigated.

LEAST SQUARES METHOD
- Used to produce a straight-line equation
- Produces a straight line drawn through the points so that the sum of squared deviations between the points and the line is minimized.
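Here is a minimal Python sketch of the least squares method, assuming hypothetical x and y values. It uses the standard closed-form coefficients b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^2 and b0 = ȳ - b1·x̄, which the notes do not write out. It then checks that moving the line away from the fitted one increases the sum of squared deviations.

```python
def least_squares_line(x, y):
    """Return (b0, b1) of y-hat = b0 + b1 * x that minimizes the sum of squared deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_deviations(x, y, b0, b1):
    """Sum of squared vertical deviations between the points and the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
best = sum_squared_deviations(x, y, b0, b1)
print(round(b0, 4), round(b1, 4))   # 2.2 0.6
print(round(best, 4))               # 2.4

# Moving the line (intercept or slope) increases the sum of squared deviations
print(sum_squared_deviations(x, y, b0 + 0.1, b1) > best)   # True
print(sum_squared_deviations(x, y, b0, b1 - 0.05) > best)  # True
```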

DESCRIPTIVE TECHNIQUES
MEASURES OF LINEAR RELATIONSHIP
• Related to the scatter diagram, which shows the relationship between two interval variables. These numerical measures provide information only about the direction and strength of the linear relationship.
  ➢ Covariance
  ➢ Coefficient of Correlation
  ➢ Coefficient of Determination (r^2) – measures the amount of variation in the dependent variable that is explained by the variation in the independent variable; we calculate it by squaring the coefficient of correlation.
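The covariance, coefficient of correlation, and coefficient of determination can be sketched in Python as follows. The sketch assumes the usual sample definitions with an n - 1 divisor, since the notes name these measures but do not spell out their formulas, and uses hypothetical data. The sign of the covariance and of the correlation gives the direction of the linear relationship. r^2 is obtained by squaring r, as described above.

```python
import math


def sample_covariance(x, y):
    """s_xy = sum((x - x_bar)(y - y_bar)) / (n - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def correlation(x, y):
    """r = s_xy / (s_x * s_y); always between -1 and +1."""
    s_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable
r = correlation(x, y)
print(sample_covariance(x, y))   # 1.5 (positive: y tends to rise with x)
print(round(r, 4))               # 0.7746
print(round(r ** 2, 4))          # 0.6, coefficient of determination
```

Here r^2 = 0.6, which would be read as about 60% of the variation in y being explained by the variation in x.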
