
Statistics in Research

BASIC STATISTICS FOR SOCIAL SCIENCE RESEARCH

Dr. Azadeh Asgari

Purpose of Statistics
1. To describe phenomena.
2. To organize and summarize results more conveniently and meaningfully.
3. To make inferences or predictions.
4. To explain.
5. To draw conclusions.

Type of Statistics
1. Descriptive Statistics:
Concerned with summarizing the distribution of a single variable or measuring the relationship between two or more variables (e.g., frequency distributions, measures of central tendency, measures of dispersion).

2. Inferential Statistics:
Concerned with generalizing from a sample to a population (e.g., t-test, analysis of variance, chi-square test).

Concepts in Statistics
a. Population
- The entire group being observed, almost always assumed to be infinite in size.
- The total collection of all cases in which the researcher is interested and which the researcher wishes to understand.
- A group or set of human subjects or other entities (e.g., all students at UPM, all members of the government).

b. Sample
- A sub-group of the population.
- Generalizations based on a sample can accurately represent the population when the sample is drawn properly (for example, at random).

Concepts in Statistics (Cont.)
Population
i. Basic unit of interest
ii. Known as the universe
iii. Large in number
iv. Difficult to observe
v. Dynamic

Sample
i. A portion of a defined population
ii. Small in number
iii. Observable
iv. Allows inferences to be drawn about the population

Concepts in Statistics (Cont.)
VARIABLE
- An observable characteristic of an object or event that can be described according to a certain classification or scale of measurement.
- Independent variable: In a bivariate relationship, the variable taken as the cause, normally represented by the symbol X.

Concepts in Statistics (Cont.)
1. Dependent variable: In a bivariate relationship, the variable taken as the effect, normally represented by the symbol Y.
2. Continuous variable/data: A variable with a unit of measurement that can be subdivided infinitely, e.g., height = 150.3 cm.
3. Discrete variable/data: A variable with a basic unit of measurement that cannot be subdivided, e.g., sex (1 = Male, 2 = Female).

Measurement
… is the process of assigning a number to an object, place, or person.

Level of Measurement
- The mathematical characteristic of a variable as determined by the measurement process.
- A major criterion for selecting statistical procedures or techniques.

Level of Measurement (Type of Data)
1. Nominal
- Sorting elements with respect to certain characteristics.
- Elements are sorted into categories that are as homogeneous as possible.
- Lowest level of measurement: classification, naming, labeling.

2. Ordinal
- Grouping or classification of elements with a degree of order or ranking.
- May not be able to say exactly how much of the characteristic each element possesses.
- Elements can be arranged or placed on a single continuum, e.g., a Likert scale.

Level of Measurement (Type of Data)
3. Interval
- Ordering elements with respect to the degree to which they possess certain characteristics.
- Indicates the exact distance between them.
- Zero does not mean absence, e.g., 0 degrees Celsius.

4. Ratio
- Ordering elements with respect to the degree to which they possess certain characteristics.
- Indicates the exact distance between them.
- Zero means absence (an absolute zero).

Level of Measurement (Type of Data)
These four scales of measurement can be grouped into TWO categories:
1. Non-metric: includes the nominal and ordinal scales of measurement.
2. Metric: includes the interval and ratio scales of measurement.

1. Descriptive Statistics
a. Frequency Distribution
b. Measure of Central Tendency
c. Measure of Dispersion
d. Measure of Association

Data Presentation
A basic function of statistics is to organize and summarize data:
a. Frequency table
b. Graphic presentation
- Pie Chart
- Bar Chart
- Histogram
- Polygon
- Line Graph
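As a small illustration of a frequency table, the sketch below counts how often each category occurs and reports the percentage. The `responses` list and the `frequency_table` helper are made up for this example and do not come from the slides.

```python
from collections import Counter

# Hypothetical responses on a nominal variable (illustrative data only).
responses = ["agree", "disagree", "agree", "neutral", "agree", "disagree"]

def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

table = frequency_table(responses)
for category, count, percent in table:
    print(f"{category:<10}{count:>3}{percent:>8.1f}%")
```

The same counts could feed a pie chart or bar chart; the table is simply the tabular form of that summary.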

General Guides

- Use the mode when variables are nominal, or when you want a quick and easy measure for ordinal, interval, or ratio variables.
- Use the median when variables are ordinal, when you want to report the central score, or when interval- or ratio-level scores have a badly skewed distribution.
- Use the mean when variables are interval or ratio (except for badly skewed distributions), when you want to report the typical score, or when you anticipate additional statistical analysis.
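The effect of a skewed distribution on these three measures can be seen in a short sketch. The `incomes` values are hypothetical and chosen so that one extreme score pulls the mean upward while the median stays near the centre of the data.

```python
from statistics import mean, median, mode

# Hypothetical, badly skewed interval-ratio data: one extreme value.
incomes = [20, 22, 22, 25, 27, 30, 200]

central_mode = mode(incomes)      # most frequent score
central_median = median(incomes)  # middle score of the ordered data
central_mean = mean(incomes)      # arithmetic average, sensitive to extremes

print(f"mode={central_mode}, median={central_median}, mean={central_mean:.2f}")
```

Here the mean lies well above most of the scores, which is why the guide above prefers the median for badly skewed data.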

2. Inferential Statistics

- To enable the researcher to make a statement, summary, or decision about the population based on the sample.
- To enable the researcher to make a statement, summary, or decision about unseen data based on empirical data.
- To enable the researcher to make a statement, summary, or decision about a large group based on data from a small group.

Two Main Procedures of Inferential Statistics
1. ESTIMATES
2. HYPOTHESIS TESTING

Statistical Assumption

- A set of parameters or guidelines indicating the conditions under which a procedure can be most appropriately used.
- Every test has its own assumptions, which should not be violated.
- Inferential statistics rests on four main assumptions.

Main Assumption of Inferential Statistics

1. The sample is random.
2. Sample characteristics are related to the true population.
3. Multiple random samples from the same population yield similar statistics that cluster around the true population parameters.
4. The sampling error associated with a sample statistic can be calculated.

Normal Distribution
- The normal probability distribution is a continuous probability distribution.
- Data in a normal distribution are measured in standard deviations from the mean; these values are called standard scores or Z scores.
- Characteristics of the normal distribution:
1. It is a continuous probability distribution.
2. It is symmetrical (bell-shaped), with the mode, median, and mean all equal.
3. The distribution contains an infinite number of cases.
4. The distribution is asymptotic: the tails approach the abscissa without touching it, and the range runs from negative to positive infinity.
5. About 95% of the distribution lies within 2 standard deviations of the mean.
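The Z score and the "about 95%" figure can be checked numerically. The sketch below uses `NormalDist` from Python's standard library; the height values passed to the hypothetical `z_score` helper are invented for illustration.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

# Hypothetical example: a height of 170 cm when the mean is 160 and SD is 5.
example_z = z_score(170, 160, 5)

# Share of a normal distribution lying within 2 standard deviations of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(f"z = {example_z:.1f}, area within 2 SD = {within_2_sd:.4f}")
```

The computed area is slightly above 95%, which is why statistical tables use a Z value just under 2 for an exact 95% interval.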

Hypothesis Testing
- A hypothesis is a tentative statement about something.
- It is a statement concerning:
a. Differences between groups
b. Relationships or associations between variables
c. Changes that occur
- It relates to our prediction about population characteristics or relationships.
- It relates to the research question.
- It must be testable or verifiable.

Hypothesis Testing
Stating and testing hypotheses helps us with:
a. Drawing conclusions
b. Making implications
c. Making suggestions

Type of Hypothesis
- We do not set out to prove that a hypothesis is true; we test whether it can be shown to be false.
- A statistical test is used to test the hypothesis.
Two Types of Hypothesis:
- Null Hypothesis (Ho)
- Alternative/Research Hypothesis (Ha or H1)

Type of Hypothesis
Null Hypothesis:
A statement of no difference or no association (among variables, samples, etc.).

Alternative/Research Hypothesis:
A statement asserting that there is a difference or an association (among variables, samples, etc.).

Forms of Hypothesis
There are TWO forms of hypothesis:
1. Directional hypothesis, e.g., Ha: μ > 230 or Ha: μ < 230
2. Non-directional hypothesis, e.g., Ha: μ ≠ 230

5 Step Model for Hypothesis Testing
Step 1: Make assumptions
- Samples are selected randomly
- The population is defined
- Data are interval-ratio
- The sampling distribution is normal

Step 2: State the null and research hypotheses

Step 3: Select the appropriate distribution (such as z, t, F, or χ²), establish the level of significance and the critical region, and calculate the test statistic

5 Step Model for Hypothesis Testing
Step 4: State the level of significance and the critical region
- The commonly used level of significance (alpha level) is 0.05.
- The critical region determines whether the null hypothesis is rejected or not rejected.

Step 5: Make a decision
- If the test statistic falls in the critical region, reject the null hypothesis.
- If the test statistic does not fall in the critical region, fail to reject the null hypothesis at the predetermined alpha level.
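The five steps can be traced in a minimal sketch of a two-tailed one-sample z test. This is only one possible instance of the model: it assumes a known population standard deviation, and the sample figures (mean 236, n = 64, sigma 20) are invented. The null value of 230 echoes the earlier example Ha: μ ≠ 230.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z test; returns (z, critical_z, reject_null)."""
    # Step 3: test statistic from the z (standard normal) distribution.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: critical value for the chosen alpha, split over two tails.
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: reject Ho if the statistic falls in either critical region.
    return z, critical_z, abs(z) > critical_z

# Steps 1-2 (assumed): random sample, interval-ratio data,
# Ho: mu = 230, Ha: mu != 230. All sample values are hypothetical.
z, critical_z, reject = one_sample_z_test(sample_mean=236, mu0=230, sigma=20, n=64)
print(f"z = {z:.2f}, critical z = ±{critical_z:.2f}, reject Ho: {reject}")
```

With these made-up numbers the test statistic falls beyond the critical value, so the null hypothesis would be rejected at alpha = 0.05.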

Type I and Type II Error
a. Type I Error (ALPHA ERROR):
Rejecting a null hypothesis that is in fact true. Its probability is alpha.

b. Type II Error (BETA ERROR):
Failing to reject a null hypothesis that is in fact false. Its probability is beta.
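One way to see what the alpha error means is to simulate it. The sketch below draws many random samples from a population in which the null hypothesis is true (mean 0, standard deviation 1) and counts how often a two-tailed z test rejects it anyway. The function name, trial count, sample size, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(trials=4000, n=25, alpha=0.05, seed=1):
    """Share of samples that wrongly reject a true Ho (mu = 0, sigma = 1)."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = fmean(sample) / (1 / sqrt(n))  # (sample mean - 0) / standard error
        if abs(z) > critical_z:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen alpha, since every rejection in this simulation is, by construction, a Type I error.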

Level of Significance (Alpha Level)

- The area under the sampling distribution that contains unlikely sample outcomes, given that the null hypothesis is true.
- It is also the probability of a Type I error.
- Commonly written as alpha = 0.10, 0.05, or 0.01, corresponding to confidence levels of 90%, 95%, or 99%.
- A 95% confidence level corresponds to alpha = 0.05: when the null hypothesis is true, there is a 5% chance of wrongly rejecting it and a 95% chance of correctly retaining it.

Critical Region
- The area under the sampling distribution that, in advance of the test itself, is defined as including unlikely sample outcomes given that the null hypothesis is true.
- The critical value of the test statistic marks the boundary for rejecting the null hypothesis.
- The critical value is read from the test statistic table according to the level of significance and the degrees of freedom.

One-tailed and Two-tailed Test
The critical region lies on one side or both sides of the distribution, depending on the nature of the alternative (research) hypothesis. e.g.:
Two-tailed: Ho: a = b, Ha: a ≠ b
One-tailed: Ha: a > b or Ha: a < b
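The practical difference between one-tailed and two-tailed tests shows up in the critical value. The sketch below assumes a z (standard normal) sampling distribution and uses a hypothetical `critical_z` helper; for a two-tailed test alpha is split between the two tails, so the critical value sits further from the mean.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Positive critical z value for a one-tailed (1) or two-tailed (2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # All of alpha sits in one tail, or alpha / 2 in each of two tails.
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.10, 0.05, 0.01):
    one = critical_z(alpha, tails=1)
    two = critical_z(alpha, tails=2)
    print(f"alpha = {alpha:.2f}: one-tailed z = {one:.3f}, two-tailed z = ±{two:.3f}")
```

For a left-tailed test (Ha: a < b) the critical region lies below the negative of the one-tailed value.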

One-tailed Test
- A type of hypothesis test used when the direction of the difference between variables or samples can be predicted (directional hypothesis).
- A one-tailed test has one critical region, which corresponds to the direction of the research hypothesis.

Two-tailed Test
- A type of hypothesis test used when the direction of the difference between variables or samples cannot be predicted (non-directional hypothesis).
- A two-tailed test has two critical regions, one on each side of the distribution.