Analysis
• Statistics is a field of study concerned with
INFERENTIAL STATISTICS
Population?
For example, based on sample survey results reported in USA Today, only
46% of high school students can solve problems involving fractions,
decimals, and percentages.
Basic Concepts
Population
A population is a collection of all individuals or objects of interest.
Sample
Example
Average heights of all students of DUHS
Average price of all statistics books in a book fair
Basic Concepts
Statistic
Any value (like Mean, SD) calculated from the sample.
Examples
Average heights of any 30 students selected from DUHS
Average price of any 20 statistics books selected from a book fair
• A variable is something whose value can vary
Example (each column is a variable; the rows are the data):
32 Male A
24 Male B
40 Female A
Types of Variable
Categorical Variables :
• Gender
- Male
- Female
• Type of disease
- Diarrhea
- Fever
Nominal Categorical Variables
• The ordering of categories is completely arbitrary (No ordering)
Examples
- Gender: Male, Female
• Faculty Position
- Professor
- Associate Professor
- Assistant Professor
Continuous variables
Examples
Examples
• Number of houses in a city
Survey/Questionnaires Records
Experimentation
• The first step in describing a set of data is to organize it in a table
called a frequency distribution.
A grouping of data into non-overlapping classes showing the
number of observations in each class.
Frequency Distribution for Qualitative Variables
• The following table shows the frequency distribution of education
status for a group of 25 people surveyed.
Degrees Frequency
None 2
Bachelor 11
Master 7
Doctorate 5
Construct a frequency table for
• Gender
• Minority
ID Gender Minority
1 m No
2 m No
3 f No
4 f No
5 m No
6 m No
7 m No
8 f No
9 f No
10 f No
11 f No
12 m Yes
13 m Yes
14 f Yes
15 m No
16 m No
17 m No
18 m No
19 m No
20 f No
Frequency table for Gender
Gender frequency
Female 8
Male 12
Total 20
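The counting behind a frequency table like this one can be sketched in a few lines of Python. This is only an illustration using the standard library: the list names `gender` and `minority` and the helper `frequency_table` are made up for this example, and the values are typed in from the 20 records listed above.

```python
from collections import Counter

# Gender and Minority columns for IDs 1-20, copied from the table above.
gender = ["m", "m", "f", "f", "m", "m", "m", "f", "f", "f",
          "f", "m", "m", "f", "m", "m", "m", "m", "m", "f"]
minority = ["No"] * 11 + ["Yes"] * 3 + ["No"] * 6


def frequency_table(values):
    """Return a dict mapping each category to the number of times it occurs."""
    return dict(Counter(values))


gender_freq = frequency_table(gender)
minority_freq = frequency_table(minority)

print("Gender:", gender_freq, "Total:", sum(gender_freq.values()))
print("Minority:", minority_freq, "Total:", sum(minority_freq.values()))
```

The gender counts agree with the table above (8 female, 12 male, 20 in total), and the same helper gives the Minority table.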
Degree Frequency Relative Frequency (%)
None 2 2/25 × 100 = 8
Bachelor 11 11/25 × 100 = 44
Master 7 7/25 × 100 = 28
Doctorate 5 5/25 × 100 = 20
Total 25 25/25 × 100 = 100
Cumulative Relative Frequency
• It is the proportion of cases in a particular category and all preceding
categories.
Degree Frequency Relative Frequency Cumulative Relative Frequency
None 2 8% 8%
Bachelor 11 44% 52%
Master 7 28% 80%
Doctorate 5 20% 100%
Total 25 100%
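As a rough sketch, the relative and cumulative relative frequencies for the education data can be computed as follows. The function names are illustrative only; the counts come from the table of 25 people surveyed.

```python
# Frequency of each degree, in the order used in the table.
degrees = {"None": 2, "Bachelor": 11, "Master": 7, "Doctorate": 5}


def relative_frequencies(freq):
    """Convert counts to percentages of the total."""
    total = sum(freq.values())
    return {category: 100 * count / total for category, count in freq.items()}


def cumulative_frequencies(rel):
    """Running total of the relative frequencies, in table order."""
    running = 0.0
    cumulative = {}
    for category, value in rel.items():
        running += value
        cumulative[category] = running
    return cumulative


rel = relative_frequencies(degrees)
cum = cumulative_frequencies(rel)

for category in degrees:
    print(category, degrees[category], f"{rel[category]:.0f}%", f"{cum[category]:.0f}%")
```

The last cumulative value is always 100%, which is a quick check that no category was missed.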
The commonly used graphic forms are:
• Bar Graph
• Pie Chart
Bar Graph: A graph in which the classes are shown on the horizontal
axis and the class frequencies on the vertical axis. The taller (or
longer) the bar, the greater the value it represents.
Degree Frequency Relative Frequency (%)
None 2 8
Bachelor 11 44
Master 7 28
Doctorate 5 20
Total 25 100
• A researcher wishes to prepare a report showing the number of hours
per week students spend studying. He selects a random sample of 30
students and determines the number of hours each student studied last
week.
The commonly used graphic form is:
• Histogram
A histogram shows the shape of a distribution. In a histogram, the classes
are marked on the horizontal axis and the class frequencies on the vertical
axis.
Common Shapes of frequency distribution
Measures of Central Tendency
• A measure of central tendency is a measure which indicates where the
middle of the data is.
- Mean
- Median
- Mode
Mean:
For a given set of n observations x1, x2, x3, …, xn, the mean is the sum of
these numbers divided by n, and is denoted by $\bar{x}$:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
• The following are the weight losses of 10 individuals who enrolled in a
5-week weight-control program:
• 9, 7, 10, 11, 10, 11, 4, 8, 10, 9
$$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{n} = \frac{9 + 7 + 10 + \dots + 9}{10} = \frac{89}{10} = 8.9$$
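The same calculation can be checked with a short Python sketch. The function name `mean` and the list name `weight_losses` are chosen for this illustration only.

```python
# Weight losses of the 10 individuals in the example.
weight_losses = [9, 7, 10, 11, 10, 11, 4, 8, 10, 9]


def mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


total = sum(weight_losses)
average = mean(weight_losses)
print("sum =", total, "n =", len(weight_losses), "mean =", average)
```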
Median:
• It is the middle value of the ordered data set.
• It divides the data in such a way that half the observations
are less than that number and half the observations are
greater than that number.
• It is denoted by $\tilde{x}$.
Example:
• Find the median of the following three numbers: 2, 8, 5
• If n (number of observations) is odd, median = the $\frac{n+1}{2}$th ordered
observation.
Example:
Data: 1, 7, 6, 2, 5 (n = 5)
Ordered: 1, 2, 5, 6, 7
The median is the $\frac{n+1}{2} = \frac{5+1}{2} = 3$rd observation.
So, median = 5
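• If n (number of observations) is even, median = mean of the $\frac{n}{2}$th
observation and the $\left(\frac{n}{2} + 1\right)$th observation.
Example:
Data: 4, 6, 2, 7, 5, 8 (n = 6)
Ordered: 2, 4, 5, 6, 7, 8
The $\frac{n}{2} = \frac{6}{2} = 3$rd observation is 5,
and the $\left(\frac{n}{2} + 1\right) = (3 + 1) = 4$th observation is 6.
So, median $= \frac{5 + 6}{2} = 5.5$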
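The odd and even rules can be put together in one small function. The following is a minimal sketch; the name `median` is chosen here for illustration, and Python lists are indexed from 0, so the k-th ordered observation is `ordered[k - 1]`.

```python
def median(values):
    """Median by the (n+1)/2 rule for odd n and the n/2, n/2 + 1 rule for even n."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th ordered observation.
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the (n / 2)-th and (n / 2 + 1)-th ordered observations.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd_example = median([1, 7, 6, 2, 5])
even_example = median([4, 6, 2, 7, 5, 8])
print(odd_example, even_example)
```

Applied to the two worked examples, the function returns 5 and 5.5.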
Determine the median of the following two data sets:
1. {1, 2, 3, 4, 5}
2. {2, 3, 4, 5, 6, 7, 8, 9}
The Mode:
The value which occurs most frequently in the data set.
If all values are different there is no mode.
Sometimes there is more than one mode.
Example:
Data: 4, 5, 2, 2, 6, 8 (n = 6)
Mode = 2
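The three cases described above (one mode, no mode, more than one mode) can be sketched with a counter. The function name `modes` is made up for this example, and returning an empty list when every value is different is one possible convention, not the only one.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once, so there is no mode.
        return []
    return sorted(value for value, count in counts.items() if count == highest)


single = modes([4, 5, 2, 2, 6, 8])
none = modes([1, 2, 3, 4])
several = modes([1, 1, 2, 2, 3])
print(single, none, several)
```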
Median
The median is affected less than the mean by extremely high or
extremely low values and is therefore a valuable measure of central
tendency when such values occur.
• It is less sensitive to variations in the data
Measure of Dispersion
Figure: three data sets with different amounts of spread, each with Mean = 7
Example:
Omeprazole 0.55 0.32 0.36 0.37 0.39 0.43 0.43 0.47 0.52 0.53
Rabeprazole 0.26 0.26 0.43 0.23 0.47 0.51 0.52 0.55 0.59 0.55
The measures of central tendency obtained for this example show that
both drugs have the same mean, i.e. 0.437, but the two drugs still differ:
there is more variation in the values of Rabeprazole.
The commonly used measures of dispersion are:
- Range
- Variance
- Standard Deviation
Range = Maximum Value – Minimum Value
Let’s calculate the range for the previous example.
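A short sketch of that calculation in Python, using the ten values listed for each drug in the example. The list names and the helper `value_range` are illustrative only.

```python
# Values for the two drugs, copied from the example.
omeprazole = [0.55, 0.32, 0.36, 0.37, 0.39, 0.43, 0.43, 0.47, 0.52, 0.53]
rabeprazole = [0.26, 0.26, 0.43, 0.23, 0.47, 0.51, 0.52, 0.55, 0.59, 0.55]


def value_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)


range_omeprazole = value_range(omeprazole)
range_rabeprazole = value_range(rabeprazole)

# Rounded for display because of floating-point arithmetic.
print("Omeprazole range:", round(range_omeprazole, 2))
print("Rabeprazole range:", round(range_rabeprazole, 2))
```

The ranges come out to about 0.23 and 0.36, so Rabeprazole has the wider spread even though both means are 0.437.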
Because the variance is expressed in squared units, it is hard to interpret
as a measure of dispersion. To avoid this, the square root of the variance is
taken, which is called the standard deviation.
$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
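The formula can be followed step by step in a few lines of Python. This is a sketch under the assumption that the sample (n - 1) version shown above is wanted; the function names are illustrative, and the data are the two drug series from the dispersion example.

```python
import math

omeprazole = [0.55, 0.32, 0.36, 0.37, 0.39, 0.43, 0.43, 0.47, 0.52, 0.53]
rabeprazole = [0.26, 0.26, 0.43, 0.23, 0.47, 0.51, 0.52, 0.55, 0.59, 0.55]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


sd_omeprazole = sample_sd(omeprazole)
sd_rabeprazole = sample_sd(rabeprazole)
print("Omeprazole SD:", round(sd_omeprazole, 3))
print("Rabeprazole SD:", round(sd_rabeprazole, 3))
```

The standard deviation of Rabeprazole is clearly the larger of the two (roughly 0.14 versus 0.08), which matches the earlier remark that its values vary more.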
- symmetrical
- negatively skewed.
• Measure of skewness describes the shape of data
For a symmetrical distribution, the mean will equal the median, and the
skewness coefficient will be zero.
mean > median > mode
If the distribution is skewed to the left, the mean will be less than
the median, and the coefficient will be negative.
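The sign of the skewness coefficient can be illustrated with a small sketch. The slides do not say which coefficient is meant, so the moment-based coefficient (third central moment divided by the 1.5 power of the second) is assumed here, and the three small data sets are invented purely for illustration.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (an assumed choice of coefficient)."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n
    m3 = sum((x - x_bar) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Hypothetical data sets, made up for this example.
symmetric = [1, 2, 3, 4, 5]
left_skewed = [1, 8, 9, 9, 10, 10, 10]    # long tail toward low values
right_skewed = [1, 1, 1, 2, 2, 3, 10]     # long tail toward high values

skew_symmetric = skewness(symmetric)
skew_left = skewness(left_skewed)
skew_right = skewness(right_skewed)
print(skew_symmetric, skew_left, skew_right)
```

The symmetric set gives a coefficient of zero, the left-skewed set a negative one, and the right-skewed set a positive one.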
Output Viewer
• Displays output. Extension of the saved file will be “spv.”
• How would you put the following information into SPSS?
Patient ID Gender Age Height Smoking Status
1 2 50 5.4 2
2 2 45 5.1 2
3 1 43 5.6 1
4 2 35 6 1
5 1 29 5.9 2
6 2 32 5.6 2
7 2 36 5.8 2
8 2 55 5 2
9 1 49 5.4 1
10 1 43 5.11 1
• Sort the data by the ‘Height’ of students in descending order.
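Outside SPSS, the same sort can be sketched in Python. This is only an illustration of what a descending sort does, not the SPSS procedure itself; the field names are made up, and Height is treated as a plain number exactly as typed in the patient table (so 5.11 sorts below 5.4, whatever unit was intended).

```python
# Rows of the patient table: (patient_id, gender, age, height, smoking_status).
patients = [
    (1, 2, 50, 5.4, 2),
    (2, 2, 45, 5.1, 2),
    (3, 1, 43, 5.6, 1),
    (4, 2, 35, 6.0, 1),
    (5, 1, 29, 5.9, 2),
    (6, 2, 32, 5.6, 2),
    (7, 2, 36, 5.8, 2),
    (8, 2, 55, 5.0, 2),
    (9, 1, 49, 5.4, 1),
    (10, 1, 43, 5.11, 1),
]

HEIGHT = 3  # position of the Height column in each row

# sorted() is stable, so rows with equal heights keep their original order.
by_height_desc = sorted(patients, key=lambda row: row[HEIGHT], reverse=True)

sorted_ids = [row[0] for row in by_height_desc]
print(sorted_ids)
```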
Opening the sample data
• Open “Employee data.sav” from the SPSS “Samples” folder.
• Go to “File,” “Open,” and click “Data.”
• Go to the “Program Files,” “SPSSInc,” “SPSS16,” and “Samples” folders.
• Open the “Employee data.sav” file.
• Recoding into the same variable
• Recoding into different variables
• It is always recommended to recode into
different variables and not to alter the original
variable
• Click on Transform > Recode > Into different variables.
• Click Continue
• And then OK.
Basic Analysis with SPSS
• Frequencies
- This analysis produces frequency tables showing frequency counts
and percentages of the values of individual variables.
• Descriptives
- This analysis shows the maximum, minimum, mean, range, standard
deviation, etc. of the variables.
Frequencies
• Click ‘Analyze,’ ‘Descriptive statistics,’ then click ‘Frequencies’
• Click gender and put it into the variable box.
• Click ‘Charts.’
• Then click ‘Bar charts’ and click ‘Continue.’
• Finally Click OK in the Frequencies box.
Descriptives
• Click ‘Analyze,’ ‘Descriptive statistics,’ then click ‘Descriptives…’
• Click ‘Current Salary’ and ‘Beginning Salary,’ and put them into the
variable box.
• Click Options
• The options allow you to request other descriptive statistics besides the mean and standard deviation.
• Click ‘Variance,’ ‘Minimum,’ ‘Maximum,’ and ‘Range.’
• Finally click ‘Continue.’
• Finally Click OK in the Descriptives box. You will be able to see the
result of the analysis.
Select File > Open > Data
Choose Excel as the file type
Select the file you want to import
Then click Open
Key in values and labels for each variable
Run frequency for each variable
Check outputs to see if you have variables with
wrong values.
Check missing values against the physical surveys if you
use paper surveys, and make sure the values are truly
missing.
Sometimes, you need to recode string variables
into numeric variables
Recode variables
1. Select Transform > Recode > Into Different Variables
2. Select the variable that you want to transform (e.g. Q20): we want 1 = Yes and 0 = No
3. Click the Arrow button to put your variable into the right window
4. Under Output Variable: type a name and label for the new variable, then click Change
5. Click Old and New Values
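The idea of recoding into a different variable, leaving the original untouched, can be sketched in Python. Everything here is hypothetical and only for illustration: the records, the stored values "Yes" and "No" for Q20, the new variable name `Q20_num`, and the helper function. It is not the SPSS procedure itself.

```python
def recode_into_new_variable(records, source, target, mapping):
    """Return copies of the records with a new variable; originals are not altered."""
    recoded = []
    for row in records:
        new_row = dict(row)  # copy, so the original record stays unchanged
        # Values that are not in the mapping become None (treated as missing).
        new_row[target] = mapping.get(row[source])
        recoded.append(new_row)
    return recoded


# Hypothetical survey answers for question Q20.
survey = [
    {"id": 1, "Q20": "Yes"},
    {"id": 2, "Q20": "No"},
    {"id": 3, "Q20": "Yes"},
    {"id": 4, "Q20": ""},  # an empty answer, kept as missing
]

recoded_survey = recode_into_new_variable(
    survey, source="Q20", target="Q20_num", mapping={"Yes": 1, "No": 0}
)

for row in recoded_survey:
    print(row)
```

Keeping the old variable next to the new one makes it easy to compare the two and spot recoding mistakes, which is the reason for recoding into a different variable.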