and How of
Data Collection
IS 106 - FUNDAMENTALS OF COMPUTING
Objectives
o Describing data
1. Collect Data
3. Characterize Data
Remember: The purpose of any descriptive measure is to describe data. Your task will be to select the best one to use.
Figure. Spreadsheet of a textbook company showing the data for its 15 books.
Average = $\frac{\sum x_i}{n}$ = ₱37,533.33
o Categories
• Estimation
• Hypothesis Testing
o Estimation
• Used if we want to learn about a large set of data but it is impractical to work
with all the data
• Estimates are formed from a subset of the data
o Hypothesis Testing
• Remember hearing:
“more doctors recommend this”
Figure. Flowchart: Collect data → Check if accurate → Yes: Use data; No: go back and collect again
o Primary Data
• Data captured from data collection
• experiments, telephone surveys, written questionnaires, direct observation, personal interviews
o Secondary Data
• Data captured from data compilation
• scan through printed documents
◦ Closed-ended questions
• answered by selecting from a limited number of options
◦ Open-ended questions
• answer by freely expressing your thoughts and ideas
◦ Demographic questions
• questions about your personal information
◦ Leading Questions
• questions that can influence the respondent's response
◦ Conduct Interviews
• Avoid leading questions
1. Interviewer Bias
2. Nonresponse Bias
3. Selection Bias
4. Observer Bias
5. Measurement Error
6. Internal Validity
7. External Validity
1. Interviewer Bias
◦ Interviewers induce bias by:
• the way they ask questions, the way they look, and their tone of voice
2. Nonresponse Bias
◦ Affects the quality of data
◦ Instances like
• No email response
• Unanswered call
• Refusal to answer
3. Selection Bias
◦ Bias in the way respondents are selected
4. Observer Bias
◦ When data are collected through observation
5. Measurement Error
◦ Instances where data is in the form of measurement
6. Internal Validity
◦ A characteristic of an experiment in which data are collected in such a way
that external factors that could affect the experimental environment and test
subjects are eliminated/controlled.
7. External Validity
◦ A characteristic of an experiment whose results can be generalized beyond
the test environment so that outcomes can be replicated.
o Example:
• Population = 1000 tricycle drivers
o If we get the average income of all 1000 drivers, the result is called a _________.
o If we get the average income of just 300 of the drivers, the result is called a _________.
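The two averages in this example can be sketched in Python. The incomes below are synthetic values from a seeded random number generator, and the names `population`, `sample`, and `mean` are illustrative assumptions, not data from the course.

```python
import random

rng = random.Random(106)  # fixed seed so the sketch is repeatable

# Hypothetical daily incomes (in pesos) for a population of 1000 tricycle drivers.
population = [rng.uniform(300, 900) for _ in range(1000)]

# Draw 300 drivers without replacement.
sample = rng.sample(population, 300)


def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


population_average = mean(population)  # computed from all 1000 drivers
sample_average = mean(sample)          # computed from only 300 drivers

print(f"Average of all 1000 drivers: {population_average:.2f}")
print(f"Average of 300 sampled drivers: {sample_average:.2f}")
```

The two printed values are usually close but not identical: the second is computed from a subset and only estimates the first.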
Challenge:
◦ Volume of cherries
Solution:
◦ Convenience sampling
• Quality checker picks a sample from the lugs
• Grade the sample and assign overall quality to the lug
• This assumes the cherries on top are of the same quality as the others.
o Probability sampling
o Types
• Simple Random Sampling
• Stratified Sampling
• Cluster Sampling
• Systematic Sampling
• A method of sampling such that every item from the population has equal
chance of being selected
• Samples can be obtained from a table of random numbers or computer
random number generators
• Selection may be with or without replacement
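As a minimal sketch of selection with and without replacement, the snippet below uses Python's standard `random` module as the computer random number generator. The population of 20 numbered items and the sample size of 5 are assumptions chosen only for illustration.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population: items numbered 1 to 20.
population = list(range(1, 21))

# Without replacement: each item can appear at most once in the sample.
without_replacement = rng.sample(population, k=5)

# With replacement: the same item may be drawn more than once.
with_replacement = rng.choices(population, k=5)

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
```

In both calls every item has the same chance of being picked on each draw, which is the defining property of simple random sampling.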
o Population > Let n = 5 (Jundee, Clark, Godwin, El, Rodny)
o Sample > Let r = 3
o C (n, r) = C (5, 3) = 10 possible samples:
Jundee, Clark, Godwin
Jundee, Clark, El
Jundee, Clark, Rodny
Jundee, Godwin, El
Jundee, Godwin, Rodny
Jundee, El, Rodny
Clark, Godwin, El
Clark, Godwin, Rodny
Clark, El, Rodny
Godwin, El, Rodny
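The listing above can be reproduced with a short Python sketch. It assumes the population is the five names that appear in the listing and enumerates every sample of size 3 without replacement; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

# The five names that appear in the listing above.
population = ["Jundee", "Clark", "Godwin", "El", "Rodny"]
r = 3  # sample size

# Every possible sample of size r, ignoring order and without replacement.
samples = list(combinations(population, r))

for s in samples:
    print(", ".join(s))

# C(n, r) = n! / (r! (n - r)!) gives the number of possible samples.
print("Number of possible samples:", comb(len(population), r))
```

Under simple random sampling, each of these samples is equally likely to be the one selected.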
We are tasked to estimate the total cash holdings of the businesses in the
Philippines.
Challenge:
• Most are small, some are medium, and only a few are large
Solution:
• Stratified sampling
Group the businesses according to size
Combine the results
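One way to picture this solution is the Python sketch below. Everything in it is an assumption made for illustration: the number of businesses in each size group, the ranges of cash holdings, the per-group sample sizes, and the function name `estimate_total`. It samples each size group separately, scales each group's sample mean by the group's size, and adds the results.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: cash holdings (in pesos) grouped by business size.
strata = {
    "small": [rng.uniform(10_000, 100_000) for _ in range(900)],
    "medium": [rng.uniform(100_000, 1_000_000) for _ in range(90)],
    "large": [rng.uniform(1_000_000, 10_000_000) for _ in range(10)],
}

# Assumed number of businesses to sample from each group.
sample_sizes = {"small": 45, "medium": 15, "large": 5}


def estimate_total(strata, sample_sizes, rng):
    """Sample each stratum separately, then combine the stratum estimates."""
    total = 0.0
    for size, holdings in strata.items():
        sample = rng.sample(holdings, sample_sizes[size])
        stratum_mean = sum(sample) / len(sample)
        # Scale the sample mean up to the whole stratum.
        total += stratum_mean * len(holdings)
    return total


estimated_total = estimate_total(strata, sample_sizes, rng)
true_total = sum(sum(holdings) for holdings in strata.values())

print(f"Estimated total: {estimated_total:,.0f}")
print(f"True total:      {true_total:,.0f}")
```

Sampling the few large businesses at a higher rate keeps them from being missed, which a single random sample of the whole population could easily do.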
• How?
2. Get the ratio of the population size over the desired sample size; call this ratio k
4. Select every kth object after that until the desired sample size is
obtained.
o Solution:
o Decide on the sample size: n = 500
• When to use?
• Use it as an alternative to simple random sampling only when you can
assume the population is randomly ordered with respect to the measurement
being addressed in the survey.
• In this case, people's views on ethics are likely unrelated to the spelling of
their last name.
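Systematic sampling can be sketched in Python as follows. The sample size n = 500 comes from the example above; the population of 20,000 placeholder names and the function name `systematic_sample` are assumptions. The random starting point among the first k items is the usual convention and is stated here as an assumption, since that step is not shown in the steps above.

```python
import random


def systematic_sample(population, n, rng):
    """Pick a random start among the first k items, then take every kth item."""
    k = len(population) // n  # interval: population size over sample size
    start = rng.randrange(k)  # random starting index from 0 to k - 1
    return [population[start + i * k] for i in range(n)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population list, e.g. people ordered by last name.
population = [f"person_{i:05d}" for i in range(20_000)]

sample = systematic_sample(population, n=500, rng=rng)

print("Sample size:", len(sample))
print("First three selected:", sample[:3])
```

Because only the starting point is random, the method is only as good as the assumption that the list order is unrelated to what is being measured.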
• Involves dividing the population into clusters that are intended to be mini-populations
• A simple random sample of m clusters is then selected from the group of
clusters.
• If m clusters have been selected,
• All the items in the cluster can be used as sample; or
• Items can be chosen from each cluster using any probability technique
• How?
b) Use any sampling technique to select items from each of the m clusters
IS 106 – QUANTITATIVE METHODS
Cluster Sampling - Example
The Oakland Raiders of the National Football League plays its home
games at McAfee Coliseum in Oakland, California. Recently an outside
marketing group was retained by the Raiders to interview season
ticket holders about the potential for changing how season ticket
pricing is structured. The marketing firm plans to interview season
ticket holders just prior to home games during the current season.
o Challenge
• The geographical spread of the sample.
o Solution:
• Cluster Sampling
o Solution:
1. m clusters = stadium sections
3. Use all items from each cluster or select a random sample from each.
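A minimal Python sketch of this solution is shown below. The 12 stadium sections, the 50 season ticket holders per section, the choice of m = 3, and the function name `cluster_sample` are all hypothetical. The function covers both options: use everyone in each chosen cluster, or draw a random sample from each.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Hypothetical clusters: 12 stadium sections with 50 season ticket holders each.
clusters = {
    f"section_{s}": [f"holder_{s}_{i}" for i in range(50)]
    for s in range(1, 13)
}


def cluster_sample(clusters, m, per_cluster, rng):
    """Randomly pick m clusters, then take all or some members of each."""
    chosen = rng.sample(sorted(clusters), m)
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)  # use all items in the cluster
        else:
            sample.extend(rng.sample(members, per_cluster))
    return chosen, sample


chosen_sections, interviewees = cluster_sample(clusters, m=3, per_cluster=10, rng=rng)

print("Sections chosen:", chosen_sections)
print("People to interview:", len(interviewees))
```

Passing `per_cluster=None` interviews every ticket holder in the chosen sections instead of a random subset.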
o The data that statistical techniques deal with come in different types and
levels of measurement
1. Qualitative
result of categorizing or describing attributes of a population
• Data is inherently categorical
generally described by words or letters
1. Qualitative - Example
Hair Color
• Black, Dark brown, light
Blood Type
• AB+, O-, B+
2. Quantitative
result of counting or measuring attributes of a population
• data is inherently numerical
always numbers
• sometimes data are best expressed purely in numbers
e.g. amount of money, weight, number of people
2. Quantitative
a) Discrete
◦ result of counting (counted items)
b) Continuous
◦ result of measuring (measured characteristics)
2. Quantitative - Example
◦ Assume you and your friends carry backpacks with books in them to
school.
◦ Discrete data is ……..
◦ Continuous data is ……
3. Cross-sectional
A set of data values observed at a fixed point in time
4. Time-Series
A set of data values observed over time
o The higher the level of measurement, the more sophisticated the analysis that can
be done
1. Nominal
Lowest form of data but the most common
• categorical / qualitative
1. Nominal - Example
F - Female, M – Male
2. Ordinal
Ordinal or rank data
Data elements can be rank-ordered on the basis of some relationship among them
with the assigned values indicating the order
2. Ordinal - Example
Competition Placement
o 1st - Rodny, 2nd - Godwin, 3rd – Clark, 4th - Jundee
Level of Satisfaction
o 1 – Very Satisfied, 2 – Satisfied, 3 – Fair, 4 – Not Satisfied
3. Interval
Not only classifies and orders the measurements but also specifies the distance
between intervals on the scale, from low to high
Interval data allow us to precisely measure the difference between any
two values
In ordinal data, we only know that one value is larger than another
3. Interval
Differences can be calculated, but ratios between values cannot be meaningfully compared
The intervals between items are equal
No true zero
The zero entry represents a position on the scale; it does not mean the
absence of the quantity being measured
3. Interval - Example
A popular example of this level of measurement is temperature in centigrade,
where, for example, the distance between 94°C and 96°C is the same as the
distance between 100°C and 102°C.
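The temperature example can be checked with a short Python sketch. The conversion to Fahrenheit and the 50°C versus 100°C comparison are additions made here for illustration: they show that equal distances stay equal when the unit changes, while "twice as hot" does not survive the change, because zero on either scale is only a position.

```python
import math


def celsius_to_fahrenheit(c):
    """Standard conversion: F = C * 9/5 + 32."""
    return c * 9 / 5 + 32


# Distances from the example: both gaps are 2 degrees Celsius.
gap_low_c = 96 - 94
gap_high_c = 102 - 100

# The same two gaps expressed in Fahrenheit are still equal to each other.
gap_low_f = celsius_to_fahrenheit(96) - celsius_to_fahrenheit(94)
gap_high_f = celsius_to_fahrenheit(102) - celsius_to_fahrenheit(100)
gaps_equal_f = math.isclose(gap_low_f, gap_high_f)

# Ratios are not preserved (hypothetical comparison of 100 and 50 degrees Celsius).
ratio_c = 100 / 50
ratio_f = celsius_to_fahrenheit(100) / celsius_to_fahrenheit(50)

print("Gaps equal in Celsius:", gap_low_c == gap_high_c)
print("Gaps equal in Fahrenheit:", gaps_equal_f)
print(f"Ratio in Celsius: {ratio_c:.2f}, ratio in Fahrenheit: {ratio_f:.2f}")
```

The ratio changes with the unit, so a statement such as "100°C is twice as hot as 50°C" has no fixed meaning on an interval scale.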
4. Ratio
Data that have all the characteristics of interval data but have a true zero
Zero means a true zero on the scale, so the scale has no negative
values
4. Ratio - Example
Student Score (ratio scale)
0 20 80
This means that the student who scored 80 did 4 times as well as the student
who scored 20