
Business Statistics
Lecture 1
Muhammad Amin Qureshi

Presentation Credit: Asst. Prof. Ikram-e-Khuda


The Big Picture
[Figure: Inputs → Random system performing a random experiment → Mutually exclusive outputs]

The outputs make up the sample space, and each output has its own probability of occurrence.

The real value assigned to each output is the random variable.

Statistics uses this probability information to help us understand random systems and, ultimately, to draw inferences.
Concept of a Random Variable
A function whose value is a real number determined by each element in the sample space is called a random variable.
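To make the definition concrete, here is a minimal Python sketch (an illustration, not taken from the lecture) in which the random experiment is tossing two fair coins and the random variable X counts the heads in each outcome of the sample space. The function name `number_of_heads` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two fair coins: ("H", "H"), ("H", "T"), ...
sample_space = list(product("HT", repeat=2))

def number_of_heads(outcome):
    """Random variable X: assigns a real number (the count of heads) to an outcome."""
    return sum(1 for toss in outcome if toss == "H")

# Probability distribution of X, assuming the four outcomes are equally likely.
distribution = {}
for outcome in sample_space:
    x = number_of_heads(outcome)
    distribution[x] = distribution.get(x, 0) + Fraction(1, len(sample_space))

print(distribution)
```

Here X takes the values 0, 1 and 2 with probabilities 1/4, 1/2 and 1/4, and these probabilities sum to 1.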
Random Variable and Events
• A random variable is a variable whose domain is the set of basic events, and whose range (outcome) could be numerical or categorical.

• An event is an outcome or a union of outcomes, where the outcomes are the occurrences to which you can assign probabilities (or measures).
Random Variable Types
• Discrete
• Continuous
Measurement Scales of Sampling
• Measurement scales are used to categorize and/or quantify variables, which are the outcomes of a random experiment obtained through a sampling process.

• Four scales of measurement are commonly used in statistical analysis:
• Nominal
• Ordinal
• Interval
• Ratio
Nominal Scale of Measurement
The nominal scale of measurement satisfies only the identity property of measurement.
Examples
• Gender
• Religion
• Political affiliation
• Color
Ordinal Scale of Measurement
The ordinal scale has the properties of both identity and magnitude.
An ordinal variable is a categorical variable whose possible values are ordered.

Example
• Rank can be ordered as 1st, 2nd, 3rd, etc. (in a race).
• However, we cannot tell from this ordinal scale whether it was a close race or whether the winner won by a mile!

• Educational level might be ordered as:
1: Elementary school education
2: High school graduate
3: Some college
4: College graduate
5: Graduate degree
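Because an ordinal scale has identity and magnitude but not equal intervals, order-based summaries such as the median are meaningful, while the arithmetic mean of the codes is not (the gap between levels 1 and 2 need not equal the gap between 4 and 5). A minimal Python sketch, using hypothetical survey responses coded as above:

```python
from statistics import median_low

# Ordinal codes for educational level, as listed above (1 = lowest).
EDUCATION_LEVELS = {
    1: "Elementary school education",
    2: "High school graduate",
    3: "Some college",
    4: "College graduate",
    5: "Graduate degree",
}

# Hypothetical survey responses, stored as ordinal codes.
responses = [2, 4, 1, 5, 3, 2, 4]

# median_low always returns one of the observed codes, so the result is
# itself a valid category; only the ordering of the codes is used.
typical = median_low(responses)
print(typical, EDUCATION_LEVELS[typical])
```

`median_low` is chosen over `median` so that an even number of responses still yields an existing category rather than a half-way value between two levels.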
Interval Scale of Measurement
• The interval scale of measurement has the properties of
• Identity
• Magnitude
• Equal intervals.

Example
• Temperature
• A temperature scale such as Celsius or Fahrenheit is made up of equal temperature units, so that the difference between 40 and 50 degrees is equal to the difference between 50 and 60 degrees (for example).
• With an interval scale, you know not only whether different values are bigger or smaller, but also how much bigger or smaller they are.
• A true (absolute) zero is not defined: 0 degrees does not mean "no temperature", so a statement such as "20 degrees is twice as hot as 10 degrees" is not meaningful.
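A short Python sketch of the temperature example, assuming Celsius and Fahrenheit readings related by the standard conversion F = 9/5·C + 32, shows which comparisons survive a change of units on an interval scale:

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit: F = 9/5 * C + 32."""
    return 9 / 5 * c + 32

c_low, c_high = 10.0, 20.0
f_low = celsius_to_fahrenheit(c_low)    # 50 F
f_high = celsius_to_fahrenheit(c_high)  # 68 F

# Equal intervals: a 10-degree Celsius rise is an 18-degree Fahrenheit
# rise wherever it happens on the scale.
print(c_high - c_low, f_high - f_low)

# No true zero: the ratio depends on the arbitrary zero point,
# 2.0 in Celsius but 1.36 in Fahrenheit.
print(c_high / c_low, f_high / f_low)
```

Differences keep their meaning after the conversion, but the ratio changes, which is why "twice as hot" is not a valid statement on an interval scale.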
Ratio Scale of Measurement
The ratio scale of measurement satisfies all four properties of measurement: identity, magnitude, equal intervals, and a minimum value of zero.
Example
• Distance
• Length
• Height
• Width
• Area
• Age
• Cost price
• Selling price
Properties of Measurement Scales
Each scale of measurement satisfies one or more of the following properties of measurement.
• Identity. Each value on the measurement scale has a unique meaning.

• Magnitude. Values on the measurement scale have an ordered relationship to one another.
• That is, some values are larger and some are smaller.

• Equal intervals. Scale units along the scale are equal to one another. This means, for
example, that the difference between 1 and 2 would be equal to the difference between
19 and 20.

• A minimum value of zero. The scale has a true zero point, below which no values exist.
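The four properties can be summarised as a lookup table. This Python sketch is illustrative only: the mapping from each property to the comparison it permits (for example, a true zero permitting ratios) is an assumption made for the example, and the names `SCALES`, `OPERATION_NEEDS` and `meaningful_operations` are hypothetical.

```python
# Properties satisfied by each scale, following the list above.
SCALES = {
    "nominal": {"identity"},
    "ordinal": {"identity", "magnitude"},
    "interval": {"identity", "magnitude", "equal intervals"},
    "ratio": {"identity", "magnitude", "equal intervals", "true zero"},
}

# Illustrative mapping (an assumption for this sketch): the property a
# comparison between two values needs before it is meaningful.
OPERATION_NEEDS = {
    "equal / not equal": "identity",
    "greater / less than": "magnitude",
    "difference (a - b)": "equal intervals",
    "ratio (a / b)": "true zero",
}

def meaningful_operations(scale):
    """Return the comparisons that make sense for data on the given scale."""
    props = SCALES[scale]
    return [op for op, need in OPERATION_NEEDS.items() if need in props]

for name in SCALES:
    print(f"{name:>8}: {', '.join(meaningful_operations(name))}")
```

Each scale inherits the operations of the one before it and adds one more, mirroring how each scale adds one property.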
Summary of Different Variable Data or Random Variable Types
• Qualitative
  • Discrete (integer numbers)
    • Finite set: grouped or ungrouped
      • Nominal
      • Ordinal
• Quantitative
  • Continuous (real numbers)
    • Finite set: grouped or ungrouped
      • Nominal
      • Ordinal
      • Scaled
        • Interval (true zero point not defined)
        • Ratio (true zero point defined)
    • Infinite set
Descriptive Statistics vs. Statistical Inference
• Descriptive statistics gives first-hand knowledge about the data by
• Presenting, organizing and summarizing the data
• Probability calculations
• Frequency distribution tables
• Graphing the data
• Measures of central tendency
• Measures of dispersion
• Five-number summary
• Measures of shape

• Inferential statistics makes inferences/conclusions and predictions about a population based on a sample of data taken from the population in question.
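A minimal sketch of a few descriptive measures using Python's standard `statistics` module on a small hypothetical dataset. Quartile conventions vary between textbooks; `statistics.quantiles` uses its default "exclusive" method here, so Q1 and Q3 may differ slightly from hand calculations that use another rule.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample

mean = statistics.mean(data)      # central tendency
median = statistics.median(data)
sd = statistics.stdev(data)       # dispersion (sample standard deviation)

# Five-number summary: minimum, Q1, median, Q3, maximum.
# statistics.quantiles(data, n=4) returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
five_number = (min(data), q1, q2, q3, max(data))

print(mean, median, round(sd, 3), five_number)
```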
Descriptive vs. Inferential Statistics
Population and Sampling
Population and sample are two basic concepts of statistics.
• Population
• A population is the collection of all individuals or items under consideration in a statistical study.
• Sample
• A sample is the part of the population from which information is collected.
• Sampling
• Sampling is the process by which inference is made about the whole by examining a part.
• With a single grain of rice, we can test whether all the rice in the pot has boiled;
• from a cup of tea, a tea-taster determines the quality of the brand of tea; and
• a sample of moon rocks provides scientists with information on the origin of the moon.
• This process of examining a small sample to learn about the whole is called sampling.

Sampling Types
• Probability Sampling
• Non-probability sampling

• The population always represents the target of an investigation. We learn about the population by sampling from the collection.
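As an illustration of probability sampling, this Python sketch draws a simple random sample (every item equally likely to be chosen, without replacement) from a hypothetical population of 1,000 item weights and compares the sample mean with the population mean. The population values are generated for the example, not real data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: 1,000 item weights in grams.
population = [random.gauss(500, 20) for _ in range(1000)]

# Simple random sampling (a probability sampling method): every item has
# the same chance of selection, and sampling is without replacement.
sample = random.sample(population, k=50)

print(round(statistics.mean(population), 2), round(statistics.mean(sample), 2))
```

The sample mean is typically close to, but not exactly equal to, the population mean; quantifying that gap is the job of statistical inference.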
The Statistical Inference Process
• Graph your data (probability distributions)
• Look at the shape!
• Normal, skewed and kurtosis

• Estimation (estimating the population parameter from the sample)

• Point Estimation
• It gives a particular value as an estimate of the population parameter.
• Mean/weighted mean, median, mode, quartiles, percentiles, range, IQR, etc.

• Interval Estimation
• It gives a range of values which is likely to contain the population parameter.
• This interval is called a confidence interval.
• This procedure also tells how likely the interval is to contain the actual parameter.
• That value is called the confidence level.
• Usually α = 1 − confidence level. This implies that confidence level = 1 − α.
• A confidence interval is a random interval (its endpoints depend on the sample drawn).

• Inference
• Hypothesis testing
• One-tailed and two-tailed tests
• Type I and Type II errors
• Drawing conclusions and predictions
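A minimal sketch of interval estimation in Python, assuming a large sample so that the normal (z) critical value is appropriate; for small samples a t-based interval would normally be used instead. The function name `mean_confidence_interval` and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for the population mean (large-sample z interval)."""
    alpha = 1 - confidence                      # alpha = 1 - confidence level
    z = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided critical value
    x_bar = mean(data)                          # point estimate
    margin = z * stdev(data) / math.sqrt(len(data))
    return x_bar - margin, x_bar + margin

data = [48, 52, 50, 47, 53, 51, 49, 50, 52, 48] * 4  # hypothetical sample, n = 40
print(mean_confidence_interval(data, 0.95))
print(mean_confidence_interval(data, 0.99))
```

Raising the confidence level (lowering α) widens the interval: more confidence of capturing the parameter is paid for with less precision.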
Important Concepts and Formulae
1) Measures of position: quartiles, percentiles and deciles

2) Measures of central tendency: mean or mathematical expectation

3) Measures of dispersion: variance and standard deviation

4) Measures of shape: normal, skewed and kurtosis

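Analysis of Grouped Data: Formulae

Formula for Quartiles
Location or class boundary of the quartile group: $\frac{kN}{4}$, $k = 1, 2, 3$
Value: $Q_k = L + \frac{h}{f}\left(\frac{kN}{4} - C\right)$

Formula for Percentiles
Location or class boundary of the percentile group: $\frac{kN}{100}$, $k = 1, 2, \dots, 99$
Value: $P_k = L + \frac{h}{f}\left(\frac{kN}{100} - C\right)$

Formula for Deciles
Location or class boundary of the decile group: $\frac{kN}{10}$, $k = 1, 2, \dots, 9$
Value: $D_k = L + \frac{h}{f}\left(\frac{kN}{10} - C\right)$

Here $N = \sum f_i$ is the total frequency, $L$ is the lower class boundary of the group containing the location, $h$ is its class width, $f$ is its frequency, and $C$ is the cumulative frequency of the group just before it.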
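Quartiles, deciles and percentiles of grouped data differ only in the fraction of N used to locate the group, so one function can cover all three. This Python sketch assumes the common textbook interpolation formula value = L + (h/f)(position − C) with position = fraction·N; the function name `grouped_quantile` and the frequency table are hypothetical.

```python
def grouped_quantile(boundaries, freqs, fraction):
    """Quantile of grouped data by linear interpolation within one class.

    boundaries: class boundaries, one more entry than freqs.
    fraction: k/4 for quartiles, k/10 for deciles, k/100 for percentiles.
    """
    n_total = sum(freqs)
    position = fraction * n_total          # location of the quantile group
    cumulative = 0                         # C: cumulative frequency before the group
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= position:
            lower, upper = boundaries[i], boundaries[i + 1]   # L and L + h
            return lower + (upper - lower) / f * (position - cumulative)
        cumulative += f
    return boundaries[-1]

# Hypothetical table: classes 0-10, 10-20, ..., 40-50 with N = 40.
bounds = [0, 10, 20, 30, 40, 50]
freqs = [5, 8, 12, 10, 5]

q1 = grouped_quantile(bounds, freqs, 1 / 4)      # first quartile
d7 = grouped_quantile(bounds, freqs, 7 / 10)     # seventh decile
p90 = grouped_quantile(bounds, freqs, 90 / 100)  # 90th percentile
print(q1, d7, p90)
```

For Q1 the location is 40/4 = 10, which falls in the 10-20 class (cumulative frequency 5 before it, frequency 8), giving 10 + (10/8)(10 − 5) = 16.25.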
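Mean or Average Value of a Dataset
Let the grouped dataset be represented by the pairs $(x_i, f_i)$, $i = 1, 2, \dots, k$.

Here $f_i$ is the $i$th frequency, corresponding to the $i$th class or group, whose midpoint (class mark) is $x_i$; $k$ is the number of classes.

$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$

Variance (about the Mean)
$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i}$

Standard Deviation
The square root of the variance gives the standard deviation: $\sigma = \sqrt{\sigma^2}$.

Error = standard deviation

Predicted outcome = statistical model ± error of the model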
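A minimal Python sketch of the grouped mean, variance and standard deviation, assuming class midpoints $x_i$ as representative values and the divisor $N = \sum f_i$ (a sample variance would divide by $N - 1$ instead). The function name `grouped_summary` and the table are hypothetical.

```python
import math

def grouped_summary(midpoints, freqs):
    """Mean, variance and standard deviation of grouped data (divisor N = sum of f_i)."""
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n_total
    variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n_total
    return mean, variance, math.sqrt(variance)

# Hypothetical classes 0-10, ..., 40-50 with midpoints 5, 15, ..., 45.
midpoints = [5, 15, 25, 35, 45]
freqs = [5, 8, 12, 10, 5]
mean, var, sd = grouped_summary(midpoints, freqs)

# "Predicted outcome = statistical model ± error": here, mean ± one standard deviation.
print(f"{mean:.2f} ± {sd:.2f}")
```

For this table the mean is 25.5 and the variance is 144.75, so the standard deviation is about 12.03.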


Measures of Shape
A frequency distribution can have any shape.

These shapes are visible on bar charts and histograms.

Of the many possible shapes, we are particularly interested in the shape of the normal distribution.

The normal distribution is a symmetric, bell-shaped curve.

This curve appears as the envelope of a bar chart or histogram.

Deviation from the normal distribution along the horizontal axis (a lack of symmetry) is called skewness. Skewed deviations can be either positive or negative.

Deviation along the vertical axis from the height of the appropriate normal distribution is called kurtosis. Kurtosis can be either negative (platykurtic) or positive (leptokurtic).
[Figure: Bell-shaped normal curve compared with a left-skewed distribution (tail toward the left) and a right-skewed distribution (tail toward the right). Ref: https://develve.net/Skewness.html]
[Figure: Bell-shaped normal distribution curve. Ref: http://itfeature.com/statistics/measure-of-dispersion/measure-of-kurtosis]
Characteristics of Distributions

Characteristics of normal distributions
• Mean is the middle value of the dataset
• Mean = Median = Mode
• Skewness = 0

Characteristics of skewed distributions
• Mean ≠ Median
• Skewness > 0 => positive skewness (Mean > Median)
• Skewness < 0 => negative skewness (Mean < Median)
Mathematical Calculation of Skewness

Here, $k$ is the total number of groups or classes.
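One common way to compute skewness for grouped data over $k$ classes is the moment coefficient $Sk = m_3 / m_2^{3/2}$, with $m_r = \frac{1}{N}\sum_{i=1}^{k} f_i (x_i - \bar{x})^r$. This Python sketch assumes that definition; other formulas (for example Pearson's coefficient based on mean and median) give different values. The function name and tables are hypothetical.

```python
def grouped_skewness(midpoints, freqs):
    """Moment coefficient of skewness, m3 / m2**1.5, for grouped data over k classes."""
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n_total
    m2 = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n_total
    m3 = sum(f * (x - mean) ** 3 for x, f in zip(midpoints, freqs)) / n_total
    return m3 / m2 ** 1.5

midpoints = [5, 15, 25, 35, 45]                          # k = 5 classes
print(grouped_skewness(midpoints, [5, 10, 20, 10, 5]))   # symmetric frequencies
print(grouped_skewness(midpoints, [20, 12, 8, 6, 4]))    # long right tail
print(grouped_skewness(midpoints, [4, 6, 8, 12, 20]))    # long left tail
```

A symmetric table gives zero, a long right tail gives a positive value, and a long left tail gives a negative value, matching the sign convention above.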


Normal Shape
In summary, the following criteria can be used to check whether the shape of a frequency distribution is normal:

• Frequency distribution graph: all bar heights must be symmetric
• Mean = Median
• Skewness = 0

All three criteria must support each other.
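The three criteria can be checked together in a short Python sketch. It assumes equally spaced class midpoints and uses the moment coefficient of skewness; the tolerance `tol` and the function name `looks_normal` are hypothetical, and real data would need a looser, judgement-based tolerance.

```python
import statistics

def looks_normal(midpoints, freqs, tol=1e-9):
    """Apply the three normal-shape criteria to a grouped frequency distribution."""
    # Expand the table so each class midpoint appears f times.
    values = [x for x, f in zip(midpoints, freqs) for _ in range(f)]
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    skewness = 0.0 if sd == 0 else (
        sum((v - mean) ** 3 for v in values) / len(values) / sd ** 3
    )

    symmetric_heights = freqs == freqs[::-1]           # criterion 1
    mean_equals_median = abs(mean - median) <= tol     # criterion 2
    zero_skewness = abs(skewness) <= tol               # criterion 3
    # All three criteria must agree.
    return symmetric_heights and mean_equals_median and zero_skewness

midpoints = [5, 15, 25, 35, 45]
print(looks_normal(midpoints, [5, 10, 20, 10, 5]))   # symmetric table
print(looks_normal(midpoints, [20, 12, 8, 6, 4]))    # right-skewed table
```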


Probability Calculations
• Basic probability of one event
• Probability of two events occurring simultaneously
• Mutually exclusive case
• Probability of two events occurring one after the other
• Conditional probabilities and decision trees
• Bayes' theorem
Probability Review
• The probability of an event A is written P(A), and 0 ≤ P(A) ≤ 1.

Probability Formula 1 (classical)
$P(A) = \frac{\text{Total no. of favourable cases}}{\text{Total no. of cases in the sample space}}$

Probability Formula 2 (relative frequency)
$P(A) = \lim_{n \to \infty} \frac{m}{n}$, where $m$ = no. of times event A occurs and $n$ = no. of repetitions of the random experiment.

• The sum of the probabilities of all mutually exclusive and exhaustive events (together covering the whole sample space) is always equal to 1.
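A minimal Python sketch comparing the two formulas on one roll of a fair die, with A = "an even number". The classical value is computed exactly with fractions; the relative-frequency value is estimated from simulated rolls (the seed and number of repetitions are arbitrary choices for the illustration).

```python
import random
from fractions import Fraction

# Formula 1 (classical): favourable cases / total cases in the sample space.
sample_space = [1, 2, 3, 4, 5, 6]                        # one roll of a fair die
favourable = [x for x in sample_space if x % 2 == 0]     # event A: an even number
p_classical = Fraction(len(favourable), len(sample_space))

# The six outcomes are mutually exclusive and exhaustive, so their
# probabilities add up to 1.
p_total = sum(Fraction(1, len(sample_space)) for _ in sample_space)

# Formula 2 (relative frequency): m / n for a large number n of repetitions.
random.seed(42)                                          # reproducible illustration
n = 100_000
m = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 0)
p_relative = m / n

print(p_classical, p_total, p_relative)
```

As n grows, m/n settles ever closer to the classical value 1/2.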


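Other Rules of Probability
Rules of Sum and Rules of Product
Addition Rule 1 (mutually exclusive events): P(A or B) = P(A) + P(B)
Addition Rule 2 (events that are not mutually exclusive): P(A or B) = P(A) + P(B) − P(A and B)
Multiplication Rule 1 (independent events): P(A and B) = P(A) · P(B)
Multiplication Rule 2 (dependent events): P(A and B) = P(A) · P(B|A)
Bayes' Theorem: P(A|B) = P(A) · P(B|A) / P(B)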
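The rules can be checked by direct counting on a small sample space. This Python sketch uses one roll of a fair die with A = "even" and B = "greater than 3"; the helper names `P` and `P_given` are hypothetical.

```python
from fractions import Fraction

S = set(range(1, 7))                 # one roll of a fair die
A = {x for x in S if x % 2 == 0}     # even: {2, 4, 6}
B = {x for x in S if x > 3}          # greater than 3: {4, 5, 6}
C = {1}                              # shares no outcome with A

def P(event):
    """Classical probability: favourable outcomes / outcomes in S."""
    return Fraction(len(event), len(S))

def P_given(event, condition):
    """Conditional probability P(event | condition) by counting."""
    return Fraction(len(event & condition), len(condition))

assert P(A | C) == P(A) + P(C)                        # Addition Rule 1 (A, C mutually exclusive)
assert P(A | B) == P(A) + P(B) - P(A & B)             # Addition Rule 2
assert P(A & B) != P(A) * P(B)                        # A, B are dependent: Rule 1 does not apply
assert P(A & B) == P(A) * P_given(B, A)               # Multiplication Rule 2
assert P_given(A, B) == P(A) * P_given(B, A) / P(B)   # Bayes' theorem

print(P(A | B), P(A & B), P_given(A, B))
```

Here P(A or B) = 2/3, P(A and B) = 1/3 and P(A|B) = 2/3; because P(A)·P(B) = 1/4 differs from 1/3, the events are dependent and Multiplication Rule 2 is the one to use.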
Example

Let (A' and B' denote the complements of A and B):
P(A) = Probability of someone suffering from the disease = 0.5% = 0.005
P(A') = Probability of someone not suffering from the disease = 1 − 0.005 = 0.995
P(B) = Probability of someone testing positive (this is required in part a)
P(B') = Probability of someone testing negative
P(B|A) = Probability of someone testing positive, given that they are suffering = 0.95 (given)
P(B'|A) = Probability of someone testing negative, given that they are suffering = 1 − 0.95 = 0.05
P(B|A') = Probability of someone testing positive, given that they are not suffering = 0.1 (given)
P(B'|A') = Probability of someone testing negative, given that they are not suffering = 1 − 0.1 = 0.9

Part a requires the "total probability" of someone testing positive, regardless of whether they are suffering or not. The decision-tree branches include:
P(A&B) = P(A).P(B|A)
P(A&B') = P(A).P(B'|A)
P(A'&B) = P(A').P(B|A')

The answer for part a is the sum of the two branches that end in a positive test result:

P(B) = P(A&B) + P(A'&B)

Part b: given that a person got a positive test result, what is the probability that they are really suffering from the disease? => P(A|B)
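The two parts of the example can be worked through numerically. This Python sketch uses only the values given in the example; the variable names are hypothetical.

```python
# Values given in the example
p_A = 0.005             # P(A): has the disease
p_not_A = 1 - p_A       # P(A'): does not have the disease
p_B_given_A = 0.95      # P(B|A): positive test given disease
p_B_given_not_A = 0.1   # P(B|A'): positive test given no disease

# Part a: total probability of a positive test (sum of the two "positive" branches)
p_A_and_B = p_A * p_B_given_A              # P(A&B) = P(A).P(B|A)
p_not_A_and_B = p_not_A * p_B_given_not_A  # P(A'&B) = P(A').P(B|A')
p_B = p_A_and_B + p_not_A_and_B

# Part b: Bayes' theorem, P(A|B) = P(A&B) / P(B)
p_A_given_B = p_A_and_B / p_B

print(f"P(B) = {p_B:.5f}, P(A|B) = {p_A_given_B:.4f}")
```

This gives P(B) = 0.00475 + 0.0995 = 0.10425 and P(A|B) ≈ 0.0456. Even after a positive test, the chance of actually having the disease is under 5%, because the disease is rare and the false-positive rate (0.1) applies to the large healthy group.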
