• Definition
• Statistics is the science of conducting studies to:
• 1- Collect data
• 2- Organize data
• 3- Summarize data
• 4- Analyze data
• 5- Draw conclusions from data
Terminology
• A variable is a characteristic or attribute that
can assume different values
• Data are the values (measurements or
observations) that the variable can assume
• A random variable is a variable whose values
are determined by chance
• A data set is a collection of data values
Types of statistics
[Figure: data are divided into qualitative data and quantitative data;
quantitative data are measured at the interval or ratio level]
Summary - Levels of Measurement
Level     Type of data   Categorize  Order  Subtract  Form ratios
Nominal   Qualitative    Yes         No     No        No
Ordinal   Qualitative    Yes         Yes    No        No
Interval  Quantitative   Yes         Yes    Yes       No
Ratio     Quantitative   Yes         Yes    Yes       Yes
Methods of Data Collection
In an observational study, a researcher observes and
measures characteristics of interest of part of a
population.
In an experiment, a treatment is applied to part of a
population, and responses are observed.
A simulation is the use of a mathematical or physical
model to reproduce the conditions of a situation or
process.
A survey is an investigation of one or more characteristics
of a population.
A census is a measurement of an entire population.
Shape of the Distribution
• Symmetrical (mean is about equal to median)
• Skewed
– Negatively skewed: mean < median
– Positively skewed (example: income): mean > median
Frequency distribution shape
• The three most common shapes are:
• 1- positively skewed, 2- symmetric and 3- negatively
skewed.
• 1- In a positively skewed (right-skewed) distribution,
the majority of the data values fall to the left of
the mean and cluster at the lower end of the
distribution, with the tail to the right. The mean is to
the right of the median and the mode is to the left of
the median. E.g. an exam on which most of the students
get poor results, or the income of the population
in some countries.
• 2- In a symmetric distribution, the data are evenly
distributed around the mean. When the distribution is
also unimodal, the mean, the median and the mode
coincide at the centre of the distribution.
• 3- In a negatively skewed (left-skewed) distribution, the
majority of the data values fall to the right of the mean
and cluster at the upper end of the distribution, with the
tail to the left. The mean is to the left of the median and
the mode is to the right of the median. E.g. an exam on
which most of the students get high results.
• When the distribution is extremely skewed, the mean is
pulled toward the tail, so the majority of the data lie
above or below the mean, depending on the direction of
the skew. In this case the median is used instead.
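The mean-versus-median comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name `skew_direction`, the `tolerance` cut-off for "about equal", and both data sets are assumptions made up for the example, not part of the original notes.

```python
from statistics import mean, median

def skew_direction(values, tolerance=0.05):
    """Classify shape by comparing the mean with the median.

    `tolerance` is a hypothetical cut-off, expressed as a fraction of the
    data range, below which mean and median count as "about equal".
    """
    m, med = mean(values), median(values)
    spread = max(values) - min(values)
    if spread == 0 or abs(m - med) <= tolerance * spread:
        return "symmetric"
    # Mean pulled above the median: tail to the right (positive skew).
    return "positively skewed" if m > med else "negatively skewed"

# Made-up data: incomes with one very high value (long right tail).
incomes = [20, 22, 23, 25, 26, 28, 30, 95]
# Made-up data: exam scores with one very low value (long left tail).
scores = [35, 70, 80, 85, 88, 90, 92, 95]

print(skew_direction(incomes))          # mean 33.625 > median 25.5
print(skew_direction(scores))           # mean 79.375 < median 86.5
print(skew_direction([1, 2, 3, 4, 5]))  # mean equals median
```

Comparing the mean with the median is a quick check of direction only; it does not measure how strong the skew is.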
Central tendency shape
[Figure: central tendency in normally distributed and skewed distributions]
Outliers
lower quartile = (n + 1)/4 th value
median = (n + 1)/2 th value
upper quartile = 3(n + 1)/4 th value
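As a rough sketch, the three position formulas above can be applied to a sorted data set as follows. The helper names `value_at_position` and `quartiles`, the sample data, and the choice to interpolate linearly when a position is not a whole number are all assumptions for illustration; the notes do not say how fractional positions should be handled.

```python
def value_at_position(sorted_values, position):
    """Return the value at a 1-based position in sorted data.

    Fractional positions are linearly interpolated between neighbours
    (an assumption; other conventions exist).
    """
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

def quartiles(values):
    """Lower quartile, median and upper quartile via the (n + 1) rule."""
    data = sorted(values)
    n = len(data)
    q1 = value_at_position(data, (n + 1) / 4)
    med = value_at_position(data, (n + 1) / 2)
    q3 = value_at_position(data, 3 * (n + 1) / 4)
    return q1, med, q3

# Made-up data with n = 7, so the positions are the 2nd, 4th and 6th values.
data = [2, 4, 5, 7, 8, 10, 12]
q1, med, q3 = quartiles(data)
print(q1, med, q3)   # 4.0 7.0 10.0
print(q3 - q1)       # interquartile range: 6.0
```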
Choosing Appropriate
Measure of Variability
• If data are symmetric, with no serious outliers,
use range and standard deviation.
• If data are skewed, and/or have serious
outliers, use IQR.
• If comparing variation across two data sets,
use coefficient of variation.
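The measures named above can be computed with Python's standard `statistics` module. The sketch below is illustrative only: the function name `variability_summary` and the data are made up, the sample standard deviation is assumed, and the coefficient of variation is taken as the standard deviation divided by the mean, expressed as a percentage.

```python
from statistics import mean, quantiles, stdev

def variability_summary(values):
    """Range, sample SD, IQR and coefficient of variation (in percent)."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(data, n=4)
    sd = stdev(data)
    return {
        "range": data[-1] - data[0],
        "sd": sd,
        "iqr": q3 - q1,
        "cv_percent": 100 * sd / mean(data),
    }

# Made-up data set.
summary = variability_summary([2, 4, 5, 7, 8, 10, 12])
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Because the coefficient of variation is unit-free, it can be compared across data sets measured in different units or on different scales, which is why it is suggested for comparing variation between two data sets.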
Summary
• For parametric (normally distributed,
symmetrical) data, the mean and SD are the
appropriate measures of central tendency and
variability of the data.
• For non-parametric data, the median is the
appropriate central tendency measure and the
IQR is the appropriate measure of the
variability of the data.
Introduction
• Probability is the measurement of the likelihood
that an event will occur. It is an important part of
statistics and it is the basis of inferential
statistics, which makes decisions about a population
from a sample
• Used to make decisions in the face of uncertainty
• For example, a weather forecaster may predict
that there is an 80% chance of rain tomorrow.
Calculating probability
[Figure: Venn diagrams of events A and B showing "A and B" (the overlap)
and "A or B" (both circles together)]
The Addition Rule
The probability that one or the other of two events will
occur is: P(A or B) = P(A) + P(B) – P(A and B)
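A small check of the addition rule on a single fair die is sketched below. The events A and B and the helper `probability` are chosen only for illustration; exact fractions are used so the two sides can be compared without rounding.

```python
from fractions import Fraction

# Sample space for one roll of a fair die.
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    """Classical probability: favourable outcomes / all outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # illustrative event: an even number
B = {4, 5, 6}   # illustrative event: a number greater than 3

# Addition rule: P(A) + P(B) - P(A and B)
p_a_or_b = probability(A) + probability(B) - probability(A & B)

print(p_a_or_b)              # 2/3
print(probability(A | B))    # 2/3, counted directly from the union
```

Subtracting P(A and B) removes the outcomes 4 and 6, which would otherwise be counted twice.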
• Rule 1
• The probability of any event E is a number (either a fraction or a
decimal) between and including 0 and 1: 0 ≤ P(E) ≤ 1. It follows from
Rule 1 that a probability cannot be negative or greater than one.
• Rule 2
• If an event cannot occur, its probability is zero. E.g. when a single
die is rolled, the probability of getting a 9 is 0.
• Rule 3
• If an event is certain to occur, P(E) = 1. E.g. when a single die is
rolled, the probability of getting a number less than 7 is 1.
• Thus when the probability of an event is near zero the event is
unlikely to occur, and when it is greater than 0.5 the event is more
likely to occur than not.
• Rule 4
• The sum of the probabilities of all outcomes in the sample space is 1.
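The four rules can be checked on the single-die example with a short sketch. The helper `probability`, which takes a condition on the outcome, is a hypothetical name introduced for this illustration, and a fair die is assumed.

```python
from fractions import Fraction

# Outcomes of one roll of a fair die.
sample_space = [1, 2, 3, 4, 5, 6]

def probability(condition):
    """Fraction of outcomes in the sample space that satisfy `condition`."""
    favourable = [outcome for outcome in sample_space if condition(outcome)]
    return Fraction(len(favourable), len(sample_space))

p_nine = probability(lambda outcome: outcome == 9)           # Rule 2: impossible
p_less_than_seven = probability(lambda outcome: outcome < 7)  # Rule 3: certain
p_even = probability(lambda outcome: outcome % 2 == 0)        # an ordinary event

# Rule 4: probabilities of the individual outcomes add up to 1.
total = sum(probability(lambda outcome, k=k: outcome == k) for k in sample_space)

print(p_nine, p_less_than_seven, p_even, total)   # 0 1 1/2 1
```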
Correlation
A correlation is a relationship between two variables. The data can be represented by
the ordered pairs (x, y) where x is the independent (or explanatory) variable, and y is
the dependent (or response) variable. A correlation describes how the two variables
vary together; on its own it does not allow causal relationships to be inferred.
Example:
x 1 2 3 4 5
y -4 -2 -1 0 2
[Figure: scatter plot of the (x, y) pairs]
• What to study
• 1- Are two or more variables related?
• 2- If yes, what is the strength of the
relationship?
• To answer questions 1 and 2 we use the
correlation coefficient
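The notes do not state a formula for the correlation coefficient. As a sketch, the standard Pearson formula is assumed here, r = Sxy / sqrt(Sxx · Syy), where Sxy, Sxx and Syy are sums of products of deviations from the means. It is applied to the example pairs given earlier; the function name `pearson_r` is made up for the illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (assumed formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# The example data from the notes.
x = [1, 2, 3, 4, 5]
y = [-4, -2, -1, 0, 2]

r = pearson_r(x, y)
print(round(r, 2))   # 0.99
```

Here Sxy = 14, Sxx = 10 and Syy = 20, so r = 14 / sqrt(200), which is about 0.99: the variables are related, and the relationship is a strong positive one.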
Linear Correlation
[Figure: four scatter plots of y against x]
• Negative linear correlation: as x increases, y tends to decrease.
• Positive linear correlation: as x increases, y tends to increase.
• No correlation
• Nonlinear correlation
Linear Correlation
[Figure: four scatter plots of y against x with their correlation coefficients]
• r = -0.91: strong negative correlation
• r = 0.88: strong positive correlation
• r = 0.42: weak positive correlation
• r = 0.07: nonlinear correlation
Correlation coefficient (r)
• When the correlation coefficient approaches r = +1.00 (or is greater
than r = +.50), it means that there is a strong positive relationship,
or a high degree of relationship, between the two variables.
r = +1 is a perfect correlation.
• When the correlation coefficient approaches r = -1.00 (or is less
than r = -.50), it means that there is a strong negative relationship.
If there is a significant correlation between two variables, you should consider the
following possibilities.